Image spam analysis and detection

  • Annapurna Annadatha
  • Mark Stamp
Original Paper


Image spam is unsolicited bulk email in which the message is embedded in an image. Spammers use such images to evade text-based filters. In this research, we analyze and compare two methods for detecting spam images. First, we consider principal component analysis (PCA), where we determine eigenvectors corresponding to a set of spam images and compute scores by projecting images onto the resulting eigenspace. The second approach extracts a broad set of image features and selects an optimal subset using support vector machines (SVMs). Both detection strategies provide high accuracy with low computational complexity. Further, we develop a new dataset of spam images that evade detection by both our PCA and SVM approaches. This new dataset should prove valuable for improving image spam detection capabilities.
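
The PCA scoring idea in the abstract can be illustrated with a minimal sketch in the eigenfaces style. Everything here is an assumption made for illustration, not the authors' implementation: the function names `train_eigenspace` and `score`, the use of NumPy's SVD to obtain the eigenvectors, the number of retained components `k`, and the choice of score (the smallest Euclidean distance between a projected image and the projected training images). Images are assumed to be already flattened to equal-length vectors.

```python
import numpy as np

def train_eigenspace(train, k):
    """Build a k-dimensional eigenspace from training images.

    train: array of shape (n_images, n_pixels), one flattened image per row.
    Returns the mean image, the top-k eigenvectors (as rows), and the
    projections (weight vectors) of the training images.
    """
    mean = train.mean(axis=0)
    centered = train - mean
    # The rows of vt are unit eigenvectors of the covariance matrix of the
    # centered data, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:k]
    weights = centered @ basis.T  # shape (n_images, k)
    return mean, basis, weights

def score(image, mean, basis, weights):
    """Score an image by projecting it onto the eigenspace.

    Hypothetical scoring rule: the minimum Euclidean distance between the
    projected image and any projected training image. A lower score means
    the image lies closer to the training (spam) images.
    """
    w = (image - mean) @ basis.T
    return float(np.min(np.linalg.norm(weights - w, axis=1)))

if __name__ == "__main__":
    # Synthetic stand-in data: 20 random "images" of 64 pixels each.
    rng = np.random.default_rng(0)
    train = rng.normal(size=(20, 64))
    mean, basis, weights = train_eigenspace(train, k=5)
    print("score of a training image:", score(train[0], mean, basis, weights))
    print("score of a random image:  ", score(rng.normal(size=64), mean, basis, weights))
```

In a sketch like this, classification would reduce to comparing the score against a threshold chosen on held-out data; how that threshold is set is left open here.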


Copyright information

© Springer-Verlag France 2016

Authors and Affiliations

  1. Department of Computer Science, San Jose State University, San Jose, USA
