Abstract
This paper presents a system for automatically detecting and filtering unsolicited electronic messages. The underlying hybrid filtering method is based on e-mail origin and content. The system classifies each of the three parts of an e-mail separately, using a single Bayesian filter together with a heuristic knowledge base. Instead of running a training stage on a historical corpus of pre-classified e-mails, the system extracts heuristic knowledge from a set of labelled words and uses it as the starting point for filtering. The classifications obtained for the individual parts are then integrated to achieve optimum effectiveness. The heuristic knowledge base lets the system manage the growth of the filter vocabularies intelligently and thus keeps classification efficient. The system is dynamic and interactive: the user plays an essential role, since user feedback drives the incremental machine learning that keeps the system up to date as spam evolves. The user can interact with the system through a customized, user-friendly interface, in real time or at intervals of the user's choosing.
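The approach described above can be illustrated with a minimal sketch. It is not the authors' implementation: the part names (`sender`, `subject`, `body`), the Laplace smoothing, the equal class priors, and the weighted-average combination rule are all assumptions made for illustration, and the names `SeededBayesFilter` and `classify_email` are hypothetical. The sketch only shows the general idea of seeding a single Bayesian filter from labelled words, scoring each part separately, combining the scores, and learning incrementally from user feedback.

```python
import math
from collections import defaultdict


class SeededBayesFilter:
    """One Bayesian filter whose word counts start from labelled seed words
    rather than from a corpus of pre-classified e-mails (illustrative only)."""

    def __init__(self, seed_spam, seed_ham):
        self.counts = {"spam": defaultdict(int), "ham": defaultdict(int)}
        for word in seed_spam:
            self.counts["spam"][word] += 1
        for word in seed_ham:
            self.counts["ham"][word] += 1

    def spam_probability(self, tokens):
        """Return P(spam | tokens), assuming equal priors and Laplace smoothing."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"])) or 1
        n_spam = sum(self.counts["spam"].values())
        n_ham = sum(self.counts["ham"].values())
        log_odds = 0.0
        for token in tokens:
            p_spam = (self.counts["spam"].get(token, 0) + 1) / (n_spam + vocab_size)
            p_ham = (self.counts["ham"].get(token, 0) + 1) / (n_ham + vocab_size)
            log_odds += math.log(p_spam / p_ham)
        return 1.0 / (1.0 + math.exp(-log_odds))

    def learn(self, tokens, label):
        """Incremental update from user feedback; label is 'spam' or 'ham'."""
        for token in tokens:
            self.counts[label][token] += 1


def classify_email(flt, parts, weights=None, threshold=0.5):
    """Score each part separately with the same filter, then combine the
    per-part scores with a weighted average (an assumed combination rule)."""
    weights = weights or {name: 1.0 for name in parts}
    scores = {
        name: flt.spam_probability(text.lower().split())
        for name, text in parts.items()
    }
    total_weight = sum(weights[name] for name in scores)
    combined = sum(weights[name] * scores[name] for name in scores) / total_weight
    label = "spam" if combined > threshold else "ham"
    return label, scores


# Hypothetical seed words and messages, for demonstration only.
flt = SeededBayesFilter(
    seed_spam=["viagra", "free", "winner"],
    seed_ham=["meeting", "report", "thanks"],
)
spam_label, spam_scores = classify_email(flt, {
    "sender": "promo@example.com",
    "subject": "free winner",
    "body": "claim your free prize",
})
ham_label, ham_scores = classify_email(flt, {
    "sender": "colleague@example.com",
    "subject": "meeting report",
    "body": "thanks for the report",
})
```

Calling `flt.learn(tokens, "spam")` after the user flags a message would shift later scores for those tokens, which is one simple way to model the incremental, user-driven learning the abstract mentions.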
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
del Castillo, M.D., Serrano, J.I. (2006). An Interactive Hybrid System for Identifying and Filtering Unsolicited E-mail. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_94
DOI: https://doi.org/10.1007/11875581_94
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-45485-4
Online ISBN: 978-3-540-45487-8
eBook Packages: Computer Science, Computer Science (R0)