Abstract
Email spam has become a major problem. Spam messages consume storage space and network bandwidth and are of no use to the receiver. Filtering spam is difficult because spammers adapt to circumvent the processes carried out by the filtering mechanism. Various classification algorithms are used to classify a mail as spam or non-spam (ham). This paper compares the effectiveness of four machine learning classification algorithms from different categories (probabilistic, decision tree, support vector machine and lazy algorithms) on the basis of various performance measures, using the WEKA data mining tool. The Enron dataset is taken in preprocessed form from the Athens University of Economics and Business. J48 and BayesNet are found to perform better than SVM.
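The abstract does not say which performance measures are used. As a minimal sketch, assuming they include the accuracy, precision, recall and F-measure commonly reported for spam filters, these can be derived from the confusion matrix of a single classifier as shown below. The class name, function names and counts are hypothetical illustrations and are not results or code from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts for a binary spam/ham classifier, with spam as the positive class."""
    tp: int  # spam correctly flagged as spam
    fp: int  # ham wrongly flagged as spam
    tn: int  # ham correctly passed as ham
    fn: int  # spam wrongly passed as ham


def _ratio(numerator: float, denominator: float) -> float:
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(c: ConfusionCounts) -> dict:
    """Compute accuracy, precision, recall and F-measure for the spam class."""
    total = c.tp + c.fp + c.tn + c.fn
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)
    return {
        "accuracy": _ratio(c.tp + c.tn, total),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for illustration only (not taken from the paper).
example = ConfusionCounts(tp=90, fp=10, tn=880, fn=20)
metrics = evaluate(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Comparing several classifiers then amounts to computing these values for each one on the same test split. For spam filtering, precision on the spam class deserves particular attention, since a false positive means a legitimate mail is lost.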
Acknowledgments
Aakanksha Sharaff and Naresh Kumar Nagwani would like to thank the National Institute of Technology, Raipur, India, for providing support, sharing knowledge, and making available all the facilities required to complete this research.
Copyright information
© 2016 Springer India
About this paper
Cite this paper
Sharaff, A., Nagwani, N.K., Dhadse, A. (2016). Comparative Study of Classification Algorithms for Spam Email Detection. In: Shetty, N., Prasad, N., Nalini, N. (eds) Emerging Research in Computing, Information, Communication and Applications. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2553-9_23
DOI: https://doi.org/10.1007/978-81-322-2553-9_23
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2552-2
Online ISBN: 978-81-322-2553-9
eBook Packages: Engineering, Engineering (R0)