An Interactive Hybrid System for Identifying and Filtering Unsolicited E-mail

del Castillo, M. Dolores; Serrano, J. Ignacio

doi:10.1007/11875581_94


Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Included in the following conference series:

International Conference on Intelligent Data Engineering and Automated Learning


Abstract

This paper presents a system for automatically detecting and filtering unsolicited electronic messages. The underlying hybrid filtering method is based on e-mail origin and content. The system classifies each of the three parts of an e-mail separately by using a single Bayesian filter together with a heuristic knowledge base. Instead of conducting a training stage on a historic body of pre-classified e-mails, the system extracts heuristic knowledge from a set of labelled words and uses it as the basis on which to begin filtering. The classifications resulting from each part are then integrated to achieve optimum effectiveness. The heuristic knowledge base allows the system to manage the growth of the filter vocabularies intelligently and thus ensures efficient classification. The system is dynamic and interactive, and the role of the user is essential: through incremental machine learning, user feedback keeps the system evolving in step with spam. The user can interact with the system over a customized, friendly interface, in real time or at intervals of the user’s choosing.
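The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the part names (`sender`, `subject`, `body`), the seed word lists, the Laplace-smoothed naive Bayes scoring, the equal class priors, the plain average used to integrate the per-part results, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
import re

# Hypothetical part names; the abstract only says an e-mail has three parts.
PARTS = ("sender", "subject", "body")


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class PartFilter:
    """Naive Bayes filter for one e-mail part, seeded from labelled words
    rather than from a corpus of pre-classified e-mails."""

    def __init__(self, seed_spam=(), seed_ham=()):
        self.counts = {"spam": {}, "ham": {}}
        self.totals = {"spam": 0, "ham": 0}
        self.learn(seed_spam, "spam")
        self.learn(seed_ham, "ham")

    def learn(self, tokens, label):
        """Incremental update, e.g. driven by user feedback."""
        for token in tokens:
            self.counts[label][token] = self.counts[label].get(token, 0) + 1
            self.totals[label] += 1

    def spam_probability(self, tokens):
        """Posterior spam probability under equal priors (an assumption).
        Tokens outside the vocabulary are ignored; 0.5 means 'no evidence'."""
        vocab = set(self.counts["spam"]) | set(self.counts["ham"])
        known = [t for t in tokens if t in vocab]
        if not known:
            return 0.5
        size = len(vocab)
        log_odds = 0.0
        for t in known:
            p_spam = (self.counts["spam"].get(t, 0) + 1) / (self.totals["spam"] + size)
            p_ham = (self.counts["ham"].get(t, 0) + 1) / (self.totals["ham"] + size)
            log_odds += math.log(p_spam / p_ham)
        log_odds = max(-30.0, min(30.0, log_odds))  # keep exp() in range
        return 1.0 / (1.0 + math.exp(-log_odds))


def classify(email, filters, threshold=0.5):
    """Score each part separately, then integrate by a plain average
    (the integration rule is an assumption, not taken from the paper)."""
    scores = {p: filters[p].spam_probability(tokenize(email.get(p, ""))) for p in PARTS}
    combined = sum(scores.values()) / len(PARTS)
    return ("spam" if combined > threshold else "ham"), scores


# Illustrative seed words standing in for the heuristic knowledge base.
SEED_SPAM = ("viagra", "free", "winner")
SEED_HAM = ("meeting", "report", "project")
filters = {p: PartFilter(SEED_SPAM, SEED_HAM) for p in PARTS}

spam_mail = {"sender": "promo@free-winner.example",
             "subject": "Free viagra",
             "body": "You are a winner"}
ham_mail = {"sender": "boss@company.example",
            "subject": "Project meeting",
            "body": "Please send the report"}

spam_label, spam_scores = classify(spam_mail, filters)
ham_label, ham_scores = classify(ham_mail, filters)

# User feedback: a word the filter has not seen is reported as spam.
before = filters["body"].spam_probability(["lottery"])
filters["body"].learn(["lottery"], "spam")
after = filters["body"].spam_probability(["lottery"])
```

In this sketch the vocabulary grows only when feedback arrives through `learn`, which loosely mirrors the interactive, incremental behaviour the abstract describes. A real system would need a more careful policy for deciding which new words to admit.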



Author information

Authors and Affiliations

Instituto de Automática Industrial, CSIC, Ctra. Campo Real km 0.200, La Poveda, 28500, Arganda del Rey, Madrid, Spain
M. Dolores del Castillo & J. Ignacio Serrano


Editor information

Editors and Affiliations

Escuela Politécnica Superior, GICAP Research Group, Universidad de Burgos, Calle Francisco de Vitoria S/N, Edificio C, Campus Vena, 09006, Burgos, Spain
Emilio Corchado
School of Electrical and Electronic Engineering, University of Manchester, UK
Hujun Yin
Department of Information Systems and Computation, Technical University of Valencia, Camino de Vera, Valencia, Spain
Vicente Botti
University of the West of Scotland, PA1 2BE, Paisley, Scotland
Colin Fyfe


About this paper

Cite this paper

del Castillo, M.D., Serrano, J.I. (2006). An Interactive Hybrid System for Identifying and Filtering Unsolicited E-mail. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_94


DOI: https://doi.org/10.1007/11875581_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)
