Spam Image Detection Model based on Deep Learning for Improving Spam Filter


Seong-Guk Nam, Dong-Gun Lee, Yeong-Seok Seo, Journal of Information Processing Systems Vol. 19, No. 3, pp. 289-301, Jun. 2023  

DOI: 10.3745/JIPS.04.0274
Keywords: Classification, Deep Learning, Image Processing, Image Spam, Obfuscated Feature, Spam

Abstract

Thanks to the development and spread of modern technology, anyone can easily communicate through services such as social network services (SNS) on a personal computer (PC) or smartphone. These technologies have brought many benefits, but they have also had negative effects, one of which is spam. Spam refers to unwanted or rejected information received by unspecified users. Continuous exposure to such information inconveniences service users, and if filtering is not performed correctly, the quality of service deteriorates. Recently, spammers (people who send spam) have been creating more malicious spam by distorting images of spam text so that optical character recognition (OCR)-based spam filters cannot easily detect it. Fortunately, the level of transformation applied to image spam circulated on social media is not yet serious. In mail systems, however, spammers have applied various modifications to spam images to neutralize OCR, so the same situation can arise with spam images on social media. Spammers have been shown to interfere with OCR reading through image transformations such as geometric distortion, noise addition, and blurring. Various techniques for filtering image spam have been studied, but methods that evade image spam identification by using obfuscated images are also continuously developing. In this paper, we propose a deep learning-based spam image detection model that improves on existing OCR-based spam image detection performance and compensates for its vulnerabilities. The proposed model extracts text features and image features from the image using four sub-models. First, the OCR-based text model extracts text-related features from the input image: whether the image contains spam words, and a word embedding vector. Then, the convolutional neural network-based image model extracts image obfuscation features and image feature vectors from the input image. The final spam image classifier then uses the extracted features to determine whether the image is spam. In the evaluation, the F1-score of the proposed model was about 14 points higher than that of OCR-based spam image detection.
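
The feature-fusion idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not specify the sub-model architectures, the spam vocabulary, or the classifier, so everything here is a hypothetical stand-in. The OCR output is assumed to be available as a plain string, the embedding and image feature vectors are assumed to be short lists of floats, and a simple logistic function stands in for the final spam image classifier. The `f1_score` helper shows how the reported metric is computed from binary labels.

```python
import math
from typing import Sequence

# Hypothetical spam vocabulary; the paper's actual word list is not given in the abstract.
SPAM_WORDS = {"free", "winner", "casino"}


def spam_word_flag(ocr_text: str) -> float:
    """Return 1.0 if any whitespace-separated token of the OCR text is a spam word."""
    tokens = ocr_text.lower().split()
    return 1.0 if any(token in SPAM_WORDS for token in tokens) else 0.0


def fuse_features(
    word_flag: float,
    text_embedding: Sequence[float],
    obfuscation_score: float,
    image_features: Sequence[float],
) -> list:
    """Concatenate the four sub-model outputs into a single feature vector."""
    return [word_flag, *text_embedding, obfuscation_score, *image_features]


def classify(
    features: Sequence[float],
    weights: Sequence[float],
    bias: float,
    threshold: float = 0.5,
) -> bool:
    """Logistic stand-in (an assumption) for the final spam image classifier."""
    z = sum(w * x for w, x in zip(weights, features)) + bias
    probability = 1.0 / (1.0 + math.exp(-z))
    return probability >= threshold


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Harmonic mean of precision and recall for binary labels (1 = spam)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

The point of the fusion step is that the obfuscation score and image features still reach the classifier even when distortion, noise, or blurring causes the OCR output to miss spam words, which is the vulnerability of OCR-only filtering that the abstract describes.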

