A Supervised Approach for Spam Detection Using Text-Based Semantic Representation

Saidani, Nadjate; Adi, Kamel; Allili, Mouhand Said

doi:10.1007/978-3-319-59041-7_8

Part of the book series: Lecture Notes in Business Information Processing (LNBIP, volume 289)

Included in the following conference series:

International Conference on E-Technologies

Abstract

In this paper, we propose an approach for email spam detection based on semantic analysis of text at two levels. The first level categorizes emails by specific domain (e.g., health, education, finance). The second level uses semantic features to detect spam within each domain. We show that the proposed method provides an efficient representation of the internal semantic structure of email content, which allows for more precise and interpretable spam filtering results than existing methods.
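
The two-level structure described in the abstract can be illustrated with a minimal sketch. It is not the authors' method: a small bag-of-words naive Bayes classifier stands in for the semantic features, which the abstract does not specify, and the class names (`NaiveBayes`, `TwoLevelSpamFilter`) and the toy emails are hypothetical. The first level predicts a domain, and the second level applies a spam classifier trained only on emails from that domain.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Tiny multinomial naive Bayes over whitespace tokens (Laplace smoothing).

    Assumption: a stand-in for the paper's semantic features, which are not
    described in the abstract.
    """

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> token counts
        self.label_counts = Counter(labels)      # label -> number of documents
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        tokens = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(n_docs / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelSpamFilter:
    """Level 1 picks a domain; level 2 runs that domain's spam classifier."""

    def fit(self, emails):
        # emails: list of (text, domain, label) triples
        texts = [text for text, _, _ in emails]
        domains = [domain for _, domain, _ in emails]
        self.domain_clf = NaiveBayes().fit(texts, domains)

        by_domain = defaultdict(list)
        for text, domain, label in emails:
            by_domain[domain].append((text, label))
        self.spam_clfs = {
            domain: NaiveBayes().fit([t for t, _ in rows], [l for _, l in rows])
            for domain, rows in by_domain.items()
        }
        return self

    def predict(self, text):
        domain = self.domain_clf.predict(text)
        return domain, self.spam_clfs[domain].predict(text)


# Hypothetical toy data, for illustration only.
TOY_EMAILS = [
    ("cheap pills cure everything buy now", "health", "spam"),
    ("miracle diet pills buy now", "health", "spam"),
    ("your doctor appointment is confirmed for monday", "health", "ham"),
    ("clinic reminder bring your doctor referral", "health", "ham"),
    ("guaranteed loan approval wire money now", "finance", "spam"),
    ("win money fast guaranteed investment", "finance", "spam"),
    ("your bank statement for march is ready", "finance", "ham"),
    ("quarterly bank account summary attached", "finance", "ham"),
]

if __name__ == "__main__":
    spam_filter = TwoLevelSpamFilter().fit(TOY_EMAILS)
    print(spam_filter.predict("buy miracle pills now"))
    print(spam_filter.predict("your bank statement is ready"))
```

Returning the predicted domain alongside the spam label reflects the interpretability claim in the abstract: a decision can be traced to the domain-specific classifier that produced it.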

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Université du Québec en Outaouais, Gatineau, Canada
Nadjate Saidani, Kamel Adi & Mouhand Said Allili

Corresponding author

Correspondence to Nadjate Saidani.

Editor information

Editors and Affiliations

Université de Montréal, Montreal, Québec, Canada
Esma Aïmeur
University of Ottawa, Ottawa, Ontario, Canada
Umar Ruhi
Carleton University, Ottawa, Ontario, Canada
Michael Weiss

About this paper

Cite this paper

Saidani, N., Adi, K., Allili, M.S. (2017). A Supervised Approach for Spam Detection Using Text-Based Semantic Representation. In: Aïmeur, E., Ruhi, U., Weiss, M. (eds) E-Technologies: Embracing the Internet of Things. MCETECH 2017. Lecture Notes in Business Information Processing, vol 289. Springer, Cham. https://doi.org/10.1007/978-3-319-59041-7_8

DOI: https://doi.org/10.1007/978-3-319-59041-7_8
Published: 05 May 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-59040-0
Online ISBN: 978-3-319-59041-7
eBook Packages: Computer Science; Computer Science (R0)