
Comparative Study of Classification Algorithms for Spam Email Detection

  • Conference paper
Emerging Research in Computing, Information, Communication and Applications

Abstract

Spam email has become a major problem. Spam messages consume storage space and network bandwidth while being of no use to the recipient. Filtering spam is difficult because spammers continually try to circumvent the filtering mechanisms. Various classification algorithms are used to classify an email as spam or non-spam (ham). This paper compares and discusses the effectiveness of four machine learning classification algorithms belonging to different categories (probabilistic, decision tree, support vector machine and lazy algorithms) on the basis of various performance measures, using the data mining tool WEKA to analyze the algorithms. A preprocessed version of the Enron dataset, obtained from the Athens University of Economics and Business, is used, and J48 and BayesNet are found to perform better than SVM.
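To make the "lazy algorithm" category and the usual performance measures concrete, here is a minimal sketch in plain Python. It is not the authors' WEKA setup: the abstract does not name the lazy algorithm used, so a k-nearest-neighbour classifier (the textbook lazy learner) stands in for it, and the Jaccard similarity over word sets, the toy emails, and the helper names `tokenize`, `jaccard`, `knn_predict` and `evaluate` are all hypothetical choices for illustration. The metrics treat spam as the positive class: precision = TP/(TP+FP), recall = TP/(TP+FN), and F-measure is their harmonic mean.

```python
from collections import Counter


def tokenize(text):
    """Hypothetical preprocessing: lower-case bag of words as a set."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (an assumed distance choice)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def knn_predict(train, text, k=3):
    """Lazy learner: nothing is built at training time; each new email is
    labelled by a majority vote of its k most similar training emails."""
    tokens = tokenize(text)
    ranked = sorted(train, key=lambda item: jaccard(tokens, tokenize(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def evaluate(actual, predicted, positive="spam"):
    """Accuracy, precision, recall and F-measure with `positive` as the target class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    tn = sum(a != positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_measure": f_measure}


# Toy labelled emails (invented for illustration, not from the Enron dataset).
train = [
    ("win a free prize now", "spam"),
    ("cheap meds free shipping", "spam"),
    ("claim your free prize today", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the attached report", "ham"),
    ("lunch meeting on monday", "ham"),
]

print(knn_predict(train, "free prize waiting"))             # spam
print(knn_predict(train, "agenda for the monday meeting"))  # ham
print(evaluate(["spam", "spam", "ham", "ham", "spam"],
               ["spam", "ham", "ham", "spam", "spam"]))
```

In this toy evaluation, two spam emails are caught, one spam email is missed and one ham email is misflagged, so precision, recall and F-measure all come out as 2/3 and accuracy as 0.6. WEKA reports the same per-class measures in its classifier output, which is what allows algorithms from different categories to be compared on one footing.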



Acknowledgments

Aakanksha Sharaff and Naresh Kumar Nagwani would like to thank the National Institute of Technology, Raipur, India, for its support, for imparting knowledge, and for making available all the facilities required to complete this research.


Copyright information

© 2016 Springer India

About this paper

Cite this paper

Sharaff, A., Nagwani, N.K., Dhadse, A. (2016). Comparative Study of Classification Algorithms for Spam Email Detection. In: Shetty, N., Prasad, N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2553-9_23


  • DOI: https://doi.org/10.1007/978-81-322-2553-9_23


  • Publisher Name: Springer, New Delhi

  • Print ISBN: 978-81-322-2552-2

  • Online ISBN: 978-81-322-2553-9

  • eBook Packages: Engineering, Engineering (R0)
