Cluster Computing, Volume 22, Supplement 1, pp 33–46

Visual and textual features based email spam classification using S-Cuckoo search and hybrid kernel support vector machine

  • T. Kumaresan
  • S. Saravanakumar
  • R. Balamurugan
Article

Abstract

Spam mail classification has become vital in recent years because of the uncontrolled growth of electronic media. The literature presents several classification-based algorithms for email spam filtering. In this paper, we propose a spam classification framework using S-Cuckoo search and a hybrid kernel based support vector machine (HKSVM). First, features are extracted from both the text and the images of the e-mails. For the textual features, term frequency (TF) is used. For the image-dependent features, the correlogram and wavelet moments are used. Because the combined feature set has a high dimension, the optimal features are selected with a hybrid algorithm called S-Cuckoo search. Classification is then performed with the proposed HKSVM model, whose hybrid kernel blends three different kernel functions and is used inside the SVM classifier. The additional image-based features and the modified SVM classifier provide a significant improvement over existing algorithms. Spam classification performance is measured on db1 (combining the bare Ling-Spam corpus with the Spam Archive corpus) and db2 (combining the lemm Ling-Spam corpus with the Spam Archive corpus). Experimental results show that the proposed framework achieves an accuracy of 97.235%, compared with 94.117% for the existing approach.
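
The two ends of the pipeline described above, term-frequency text features and a kernel blended from three base kernels, can be illustrated with a minimal sketch. The abstract does not say which three kernels are blended or how they are weighted, so the linear, polynomial and RBF kernels, the weights, the toy vocabulary and every function name below are assumptions made for illustration only; the image features and the S-Cuckoo selection step are not shown.

```python
import math
from collections import Counter


def term_frequency(text, vocabulary):
    """Relative frequency of each vocabulary term in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero for empty messages
    return [counts[term] / total for term in vocabulary]


def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2, coef0=1.0):
    return (linear_kernel(x, y) + coef0) ** degree


def rbf_kernel(x, y, gamma=1.0):
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def hybrid_kernel(x, y, weights=(0.2, 0.3, 0.5)):
    """Non-negative weighted sum of three base kernels.

    The choice of kernels and weights is hypothetical, not taken from the paper.
    """
    w_lin, w_poly, w_rbf = weights
    return (w_lin * linear_kernel(x, y)
            + w_poly * polynomial_kernel(x, y)
            + w_rbf * rbf_kernel(x, y))


def gram_matrix(samples, kernel=hybrid_kernel):
    """Pairwise kernel values between all samples."""
    return [[kernel(a, b) for b in samples] for a in samples]


# Toy data (hypothetical): a four-word vocabulary and two short messages.
vocabulary = ["free", "offer", "meeting", "report"]
emails = ["Free offer free prize", "Meeting report attached"]
features = [term_frequency(e, vocabulary) for e in emails]
K = gram_matrix(features)
```

A non-negative weighted sum of valid kernels is itself a valid kernel, which is why a blend of this form can be used inside an SVM. The resulting Gram matrix `K` could be passed to any SVM implementation that accepts a precomputed kernel.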

Keywords

Support vector machine, Cuckoo search, Spam, Correlogram, S-Cuckoo search

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2017

Authors and Affiliations

  1. Bannari Amman Institute of Technology, Erode, India
  2. Adithya Institute of Technology, Coimbatore, India
  3. Bharat Institute of Engineering and Technology, Ibrahimpatanam, Hyderabad, India
