Part of the book series: Advances in Intelligent Systems and Computing ((AISC,volume 407))

Abstract

E-mail is one of the most popular means of communication owing to its accessibility, low sending cost, and fast message transfer. However, spam has become a severe problem for this Internet application. Filtering is an important approach to isolating spam emails. This paper proposes a spam-filtering approach based on classification techniques. The approach analyses the body of email messages and assigns weights to terms (features) that help identify spam and clean (ham) emails. An adaptation is proposed to reduce the dimensionality of the extracted features: only meaningful terms, determined by consulting a dictionary, are retained. A thorough comparative study of different classification algorithms was conducted; it demonstrates the efficiency of the proposed filtering approach. The approach has been evaluated using the Enron dataset.
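
The pipeline described in the abstract (body analysis, dictionary-based term selection, term weighting, classification) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract names neither the weighting scheme nor the classifiers, so TF-IDF and a multinomial naive Bayes classifier are assumed here purely for illustration. The word list `DICTIONARY`, the toy messages, and all function names are hypothetical.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for a real dictionary resource (e.g. a word-list file).
DICTIONARY = {"free", "money", "offer", "win",
              "meeting", "report", "schedule", "project"}

def tokenize(body):
    """Lowercase the message body and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", body.lower())

def dictionary_terms(body, dictionary=DICTIONARY):
    """Keep only tokens found in the dictionary (the feature-reduction step)."""
    return [t for t in tokenize(body) if t in dictionary]

def tfidf_weights(docs):
    """Return one {term: weight} map per document (TF-IDF is an assumption)."""
    n = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: (c / len(doc)) * math.log(n / df[t])
                        for t, c in tf.items()})
    return weights

def train_nb(docs, labels):
    """Fit a multinomial naive Bayes model from term lists and class labels."""
    vocab = {t for doc in docs for t in doc}
    counts = {label: Counter() for label in set(labels)}
    for doc, label in zip(docs, labels):
        counts[label].update(doc)
    return vocab, counts, Counter(labels)

def predict_nb(model, doc):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    vocab, counts, priors = model
    total = sum(priors.values())
    best, best_score = None, -math.inf
    for label in sorted(counts):
        denom = sum(counts[label].values()) + len(vocab)
        score = math.log(priors[label] / total)
        for t in doc:
            if t in vocab:
                score += math.log((counts[label][t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Toy training data (invented for this sketch, not taken from the Enron dataset).
messages = [
    ("Win free money now!!! Free offer", "spam"),
    ("Exclusive offer: win money", "spam"),
    ("Project meeting schedule attached", "ham"),
    ("Please send the report before the meeting", "ham"),
]
docs = [dictionary_terms(body) for body, _ in messages]
labels = [label for _, label in messages]
model = train_nb(docs, labels)

print(predict_nb(model, dictionary_terms("Free money offer")))
```

Restricting the vocabulary to dictionary words drops tokens such as "now" or "exclusive" before any weighting takes place, which is the dimensionality reduction the abstract refers to. A real evaluation would replace the toy messages with a labelled corpus and compare several classifiers.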

Author information

Corresponding author

Correspondence to Eman M. Bahgat.

Copyright information

© 2016 Springer International Publishing Switzerland

About this paper

Cite this paper

Bahgat, E.M., Rady, S., Gad, W. (2016). An E-mail Filtering Approach Using Classification Techniques. In: Gaber, T., Hassanien, A., El-Bendary, N., Dey, N. (eds) The 1st International Conference on Advanced Intelligent System and Informatics (AISI2015), November 28-30, 2015, Beni Suef, Egypt. Advances in Intelligent Systems and Computing, vol 407. Springer, Cham. https://doi.org/10.1007/978-3-319-26690-9_29

  • DOI: https://doi.org/10.1007/978-3-319-26690-9_29

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-26688-6

  • Online ISBN: 978-3-319-26690-9

  • eBook Packages: Computer Science, Computer Science (R0)
