A Comprehensive Comparative Study of Machine Learning Classifiers for Spam Filtering

  • Conference paper
International Conference on Cyber Security, Privacy and Networking (ICSPN 2022) (ICSPN 2021)

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 599)

Abstract

In July 2021, the global daily spam count reached 283 billion emails, constituting 84.12% of total email volume. The growing volume of spam, or unsolicited email, can hamper communication and has created a need for robust and reliable antispam filters. In recent years, spam filtration and monitoring have become significant concerns for mail and other internet services. Machine learning strategies are being employed as safeguards against internet spam. This study provides a systematic survey of spam filtering methods that use machine learning techniques. Four classifiers used for spam filtering (Logistic Regression, Random Forest, Naive Bayes, and Decision Tree) are compared on precision, recall, and accuracy using a dataset composed of Twitter tweets, Facebook posts, and YouTube comments. The preliminary discussion covers related work on spam filtering and the research gaps in the current literature. Each method is then discussed in detail. Our experiments indicate that Decision Trees achieve the best accuracy (97.02%) and precision (98.83%), while Logistic Regression achieves the highest recall (99.89%).
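
The three evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the toy labels, the `ConfusionCounts` class, and the helper function names are hypothetical, chosen only to show how precision, recall, and accuracy follow from the counts of true and false positives and negatives when spam is treated as the positive class.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Confusion-matrix cells for a binary spam filter (spam = positive)."""
    tp: int  # spam correctly flagged as spam
    fp: int  # legitimate messages wrongly flagged as spam
    fn: int  # spam wrongly passed as legitimate
    tn: int  # legitimate messages correctly passed


def count_outcomes(y_true, y_pred, positive="spam"):
    """Tally the four confusion-matrix cells from parallel label sequences."""
    tp = fp = fn = tn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, fn, tn)


def precision(c):
    """Share of messages flagged as spam that really are spam."""
    flagged = c.tp + c.fp
    return c.tp / flagged if flagged else 0.0


def recall(c):
    """Share of actual spam that the filter caught."""
    actual = c.tp + c.fn
    return c.tp / actual if actual else 0.0


def accuracy(c):
    """Share of all messages that were classified correctly."""
    total = c.tp + c.fp + c.fn + c.tn
    return (c.tp + c.tn) / total if total else 0.0


# Hypothetical toy labels (not taken from the paper's dataset).
Y_TRUE = ["spam"] * 4 + ["ham"] * 6
Y_PRED = ["spam", "spam", "spam", "ham", "spam",
          "spam", "ham", "ham", "ham", "ham"]

if __name__ == "__main__":
    counts = count_outcomes(Y_TRUE, Y_PRED)
    print(f"precision={precision(counts):.2f}")
    print(f"recall={recall(counts):.2f}")
    print(f"accuracy={accuracy(counts):.2f}")
```

On this toy sample the script reports a precision of 0.60, a recall of 0.75, and an accuracy of 0.70. Because the measures weigh false positives and false negatives differently, one classifier can lead on precision while another leads on recall, which is the pattern the abstract reports for Decision Trees and Logistic Regression.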

Author information

Corresponding author

Correspondence to Saksham Gupta.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gupta, S., Chhabra, A., Agrawal, S., Singh, S.K. (2023). A Comprehensive Comparative Study of Machine Learning Classifiers for Spam Filtering. In: Nedjah, N., Martínez Pérez, G., Gupta, B.B. (eds) International Conference on Cyber Security, Privacy and Networking (ICSPN 2022). ICSPN 2021. Lecture Notes in Networks and Systems, vol 599. Springer, Cham. https://doi.org/10.1007/978-3-031-22018-0_24
