
A Supervised Approach for Spam Detection Using Text-Based Semantic Representation

  • Conference paper
E-Technologies: Embracing the Internet of Things (MCETECH 2017)

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 289)


Abstract

In this paper, we propose an approach for email spam detection based on semantic text analysis at two levels. The first level categorizes emails by specific domain (e.g., health, education, finance). The second level uses semantic features to detect spam within each specific domain. We show that the proposed method provides an efficient representation of the internal semantic structure of email content, which yields more precise and interpretable spam-filtering results than existing methods.
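The abstract does not name the classifiers or the semantic features used at either level, so the following is only a minimal sketch of the two-level routing idea under stated assumptions: both levels use a multinomial naive Bayes over plain bag-of-words tokens (a swapped-in stand-in, not the authors' semantic features), and the names `tokenize`, `NaiveBayes`, `TwoLevelSpamFilter` and the toy emails are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and keep alphabetic runs (hypothetical preprocessing; no stemming)."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (a stand-in classifier)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.class_counts}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Log prior plus the add-one-smoothed log likelihood of each token.
            score = math.log(n_c / n_docs)
            for tok in tokenize(doc):
                score += math.log((self.word_counts[c][tok] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


class TwoLevelSpamFilter:
    """Level 1 assigns a domain; level 2 applies that domain's own spam model."""

    def fit(self, emails, domains, spam_labels):
        self.domain_model = NaiveBayes().fit(emails, domains)
        self.spam_models = {}
        for d in set(domains):
            # Train each level-2 model only on the emails of its own domain.
            idx = [i for i, dom in enumerate(domains) if dom == d]
            self.spam_models[d] = NaiveBayes().fit(
                [emails[i] for i in idx], [spam_labels[i] for i in idx]
            )
        return self

    def predict(self, email):
        domain = self.domain_model.predict(email)
        return domain, self.spam_models[domain].predict(email)


# Hypothetical toy training data: (email text, domain, label).
train = [
    ("cheap pills online pharmacy discount", "health", "spam"),
    ("doctor appointment reminder for your checkup", "health", "ham"),
    ("win money lottery prize claim now", "finance", "spam"),
    ("your bank statement for march is available", "finance", "ham"),
]
emails, domains, labels = zip(*train)
f = TwoLevelSpamFilter().fit(list(emails), list(domains), list(labels))
print(f.predict("discount pharmacy pills"))  # ('health', 'spam')
```

In this sketch, each level-2 model sees only the emails of its own domain, so the spam/legitimate decision is learned separately per topic; replacing the bag-of-words features with richer semantic features would not change the routing structure.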


Notes

  1. http://tartarus.org/martin/PorterStemmer/index-old.html.
  2. https://www.cs.cmu.edu/./enron/.
  3. http://csmining.org/index.php/lingspam-datasets.html.


Author information

Corresponding author

Correspondence to Nadjate Saidani.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Saidani, N., Adi, K., Allili, M.S. (2017). A Supervised Approach for Spam Detection Using Text-Based Semantic Representation. In: Aïmeur, E., Ruhi, U., Weiss, M. (eds) E-Technologies: Embracing the Internet of Things. MCETECH 2017. Lecture Notes in Business Information Processing, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-319-59041-7_8


  • DOI: https://doi.org/10.1007/978-3-319-59041-7_8

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-59040-0

  • Online ISBN: 978-3-319-59041-7

  • eBook Packages: Computer Science, Computer Science (R0)
