Fusion of Text and Image Features: A New Approach to Image Spam Filtering

  • Congfu Xu
  • Kevin Chiew
  • Yafang Chen
  • Juxin Liu
Part of the Advances in Intelligent and Soft Computing book series (AINSC, volume 124)


While enjoying the convenience of email communication, many users have also experienced annoying email spam. Although current spam detection approaches have gained a competitive edge against text-based email spam, they still face the challenge posed by image-based spam (image spam for short). Image spam typically embeds the spam message in images, in binary rather than text format, and so consumes more storage and bandwidth. In this paper, we propose a hybrid image spam filtering framework that detects spam images based on both extracted text features and image features. Our experimental results show that this approach achieves a significant improvement in detection accuracy over methods that use text features or image features alone, and that it works robustly on images with complex backgrounds or compression artifacts.
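The idea of combining text and image evidence can be illustrated with a minimal sketch. It is not the framework proposed in the paper, whose details are not given in this abstract. It assumes the text has already been extracted from the image (for example by OCR), uses mean color saturation as a stand-in image feature, and fuses the two scores with a fixed weighted sum. The term list, weights, threshold, and all function names are hypothetical choices made for illustration only.

```python
import colorsys

# Hypothetical spam vocabulary; the paper does not specify one.
SPAM_TERMS = {"viagra", "free", "winner", "stock", "buy"}


def text_score(extracted_text):
    """Fraction of tokens in the already-extracted text that are spam terms."""
    tokens = [t.strip(".,!?:;").lower() for t in extracted_text.split()]
    tokens = [t for t in tokens if t]
    if not tokens:
        return 0.0
    return sum(t in SPAM_TERMS for t in tokens) / len(tokens)


def mean_saturation(pixels):
    """Mean HSV saturation of (r, g, b) tuples with channels in 0..255."""
    sats = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[1]
            for r, g, b in pixels]
    return sum(sats) / len(sats) if sats else 0.0


def fused_score(extracted_text, pixels, w_text=0.6, w_image=0.4):
    """Weighted sum of the text score and the image score (illustrative weights)."""
    return (w_text * text_score(extracted_text)
            + w_image * mean_saturation(pixels))


def is_spam(extracted_text, pixels, threshold=0.5):
    """Flag the image as spam when the fused score reaches the threshold."""
    return fused_score(extracted_text, pixels) >= threshold
```

In practice the fixed weights and threshold would be replaced by a trained classifier; the sketch only shows how two independent feature sources can contribute to one decision.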


Support Vector Machine, Text Region, Color Saturation, Spam Detection, Email Spam
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Congfu Xu
    • 1
  • Kevin Chiew
    • 2
  • Yafang Chen
    • 1
  • Juxin Liu
    • 1
  1. Institute of Artificial Intelligence, Zhejiang University, Hangzhou, China
  2. School of Engineering, Tan Tao University, Long An, Vietnam
