Abstract
E-mail is one of the most popular means of communication because of its accessibility, low sending cost, and fast message transfer. However, spam e-mail is a severe problem for this Internet application. Filtering is an important approach for isolating spam. This paper proposes a spam-filtering approach based on classification techniques. The approach analyses the body of e-mail messages and assigns weights to terms (features) that help distinguish spam from clean (ham) e-mails. An adaptation is proposed to reduce the dimensionality of the extracted features: only meaningful terms, determined by consulting a dictionary, are retained. A thorough comparative study among different classification algorithms demonstrates the efficiency of the proposed filtering approach. The approach has been evaluated using the Enron dataset.
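The term-weighting and dictionary-filtering steps described in the abstract can be illustrated with a minimal sketch. The abstract does not name the weighting scheme, so TF-IDF is used here purely as an assumption; the `DICTIONARY` word set, the function names, and the sample messages are likewise hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real dictionary lookup (e.g. a word-list file).
DICTIONARY = {"free", "money", "meeting", "offer", "report", "win", "schedule"}


def tokenize(body):
    """Lowercase the message body and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", body.lower())


def dictionary_terms(body, dictionary=DICTIONARY):
    """Keep only tokens found in the dictionary (the dimensionality reduction step)."""
    return [t for t in tokenize(body) if t in dictionary]


def tfidf_weights(bodies, dictionary=DICTIONARY):
    """Return one {term: weight} map per message.

    TF-IDF is an assumed weighting scheme, chosen only for illustration.
    """
    docs = [dictionary_terms(b, dictionary) for b in bodies]
    n_docs = len(docs)
    # Number of messages in which each retained term occurs at least once.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # guard against messages with no dictionary terms
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Made-up sample messages; "xqzt" stands for a meaningless token.
emails = [
    "Win FREE money now!!! Limited offer xqzt",
    "Meeting schedule and report attached",
    "Free offer: win a prize",
]
weights = tfidf_weights(emails)
```

The resulting per-message weight maps could then be converted to feature vectors and passed to whichever classifiers are being compared; the abstract does not list those algorithms, so none is shown here.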
Copyright information
© 2016 Springer International Publishing Switzerland
About this paper
Cite this paper
Bahgat, E.M., Rady, S., Gad, W. (2016). An E-mail Filtering Approach Using Classification Techniques. In: Gaber, T., Hassanien, A., El-Bendary, N., Dey, N. (eds) The 1st International Conference on Advanced Intelligent System and Informatics (AISI2015), November 28-30, 2015, Beni Suef, Egypt. Advances in Intelligent Systems and Computing, vol 407. Springer, Cham. https://doi.org/10.1007/978-3-319-26690-9_29
DOI: https://doi.org/10.1007/978-3-319-26690-9_29
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-26688-6
Online ISBN: 978-3-319-26690-9
eBook Packages: Computer Science, Computer Science (R0)