A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion


Bart Dikmans, Dongwann Kang, Journal of Information Processing Systems Vol. 19, No. 5, pp. 602-613, Oct. 2023  

DOI: 10.3745/JIPS.04.0288
Keywords: Image Dataset Generation, Query Expansion, Web Scraping

Abstract

High-quality image datasets are in high demand for various applications. Although many online sources provide manually collected datasets, fully automating the dataset collection process remains a persistent challenge. In this study, we surveyed the field of automatic image dataset generation by analyzing a collection of existing studies. We also examined closely related topics, namely query expansion, web scraping, and dataset quality. We assessed how both noise and regional search engine differences can be addressed by automated search query expansion focused on hypernyms, while still allowing user-specific manual query expansion. Combining these aspects provides an outline of how a modern web scraping application can produce large-scale image datasets.
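
As a rough illustration of hypernym-focused query expansion combined with manual terms, the following minimal Python sketch appends broader category words to an ambiguous search term. It is not taken from the surveyed studies: the `HYPERNYMS` table, the `expand_query` function, and the example terms are all hypothetical, and a real system would more likely draw hypernyms from a lexical database than from a hand-written dictionary.

```python
# Hypothetical hypernym table, written by hand for illustration only.
HYPERNYMS = {
    "jaguar": ["big cat", "feline"],
    "crane": ["bird"],
}


def expand_query(term, manual_terms=(), hypernyms=HYPERNYMS):
    """Return the base query plus variants narrowed by hypernyms or user-supplied terms."""
    queries = [term]
    # Automated hypernyms come first, followed by any manual additions from the user.
    extras = list(hypernyms.get(term.lower(), [])) + list(manual_terms)
    for extra in extras:
        candidate = f"{term} {extra}"
        if candidate not in queries:  # skip duplicates while keeping order
            queries.append(candidate)
    return queries


print(expand_query("jaguar", manual_terms=["animal"]))
# ['jaguar', 'jaguar big cat', 'jaguar feline', 'jaguar animal']
```

Each expanded string could then be submitted to a search engine in place of the bare term, which is one plausible way to steer results away from unrelated senses of the word (here, the car brand).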


Cite this article
[APA Style]
Dikmans, B. & Kang, D. (2023). A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion. Journal of Information Processing Systems, 19(5), 602-613. DOI: 10.3745/JIPS.04.0288.

[IEEE Style]
B. Dikmans and D. Kang, "A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion," Journal of Information Processing Systems, vol. 19, no. 5, pp. 602-613, 2023. DOI: 10.3745/JIPS.04.0288.

[ACM Style]
Bart Dikmans and Dongwann Kang. 2023. A Brief Survey into the Field of Automatic Image Dataset Generation through Web Scraping and Query Expansion. Journal of Information Processing Systems, 19, 5, (2023), 602-613. DOI: 10.3745/JIPS.04.0288.