An Interactive Hybrid System for Identifying and Filtering Unsolicited E-mail

  • Conference paper
Intelligent Data Engineering and Automated Learning – IDEAL 2006 (IDEAL 2006)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4224)

Abstract

This paper presents a system for automatically detecting and filtering unsolicited electronic messages. The underlying hybrid filtering method is based on both the origin and the content of an e-mail. The system classifies each of the three parts of an e-mail separately, using a single Bayesian filter together with a heuristic knowledge base. Instead of requiring a training stage on a historical corpus of pre-classified e-mails, the system begins filtering from heuristic knowledge extracted from a set of labelled words. The classifications obtained for the individual parts are then combined to maximize effectiveness. The heuristic knowledge base lets the system manage the growth of the filter vocabularies intelligently, which keeps classification efficient. The system is dynamic and interactive: the user plays an essential role in keeping it up to date with the evolution of spam through incremental machine learning. The user can interact with the system through a customized, user-friendly interface, either in real time or at intervals of the user's choosing.
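The pipeline described in the abstract (classify each part with a Bayesian filter seeded from labelled words, combine the part-level results, then learn incrementally from user feedback) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The part names `sender`, `subject`, and `body`, the multinomial naive Bayes model with Laplace smoothing, the weighted-average combination rule, and the 0.5 threshold are all assumptions made for illustration. The heuristic knowledge base's vocabulary management is not modelled.

```python
import math
import re
from collections import Counter

# Hypothetical names for the three e-mail parts (an assumption, not from the paper).
PARTS = ("sender", "subject", "body")


def tokenize(text):
    """Lower-case alphanumeric tokens; a deliberately simple tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


class BayesianFilter:
    """Multinomial naive Bayes over word counts with Laplace smoothing (assumed model)."""

    def __init__(self):
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = {"spam": 0, "ham": 0}

    def seed(self, labelled_words):
        """Start from labelled words instead of a corpus of pre-classified e-mails."""
        for word, label in labelled_words.items():
            self.counts[label][word] += 1
        # One pseudo-document per class so that the class prior is defined.
        for label in self.docs:
            self.docs[label] = max(self.docs[label], 1)

    def update(self, tokens, label):
        """Incremental learning step, e.g. after the user confirms or corrects a decision."""
        self.counts[label].update(tokens)
        self.docs[label] += 1

    def log_odds(self, tokens):
        """log P(spam | tokens) - log P(ham | tokens) under the naive Bayes model."""
        vocab_size = len(set(self.counts["spam"]) | set(self.counts["ham"])) + 1
        totals = {c: sum(self.counts[c].values()) for c in self.counts}
        score = math.log(self.docs["spam"] / self.docs["ham"])
        for tok in tokens:
            p_spam = (self.counts["spam"][tok] + 1) / (totals["spam"] + vocab_size)
            p_ham = (self.counts["ham"][tok] + 1) / (totals["ham"] + vocab_size)
            score += math.log(p_spam / p_ham)
        return score


def spam_probability(score):
    """Convert log-odds to a probability without overflowing math.exp."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    e = math.exp(score)
    return e / (1.0 + e)


def classify(flt, email, weights=None, threshold=0.5):
    """Score each part separately with the same filter, then combine by weighted average."""
    weights = weights or {p: 1.0 for p in PARTS}
    per_part = {p: spam_probability(flt.log_odds(tokenize(email.get(p, "")))) for p in PARTS}
    combined = sum(weights[p] * per_part[p] for p in PARTS) / sum(weights[p] for p in PARTS)
    return combined >= threshold, per_part


if __name__ == "__main__":
    flt = BayesianFilter()
    flt.seed({"free": "spam", "winner": "spam", "meeting": "ham", "report": "ham"})
    msg = {"sender": "promo@example.com", "subject": "You are a winner",
           "body": "Claim your free prize"}
    is_spam, parts = classify(flt, msg)
    print(is_spam, parts)
    # User feedback: the user confirms the message is spam, so the filter learns its words.
    flt.update(tokenize(msg["subject"] + " " + msg["body"]), "spam")
```

In this sketch, the subject and body each contain one word seeded as spam, so both lean towards spam, while the unseen sender tokens stay neutral at 0.5. The weighted average of the three parts is what the threshold is applied to. Reusing one filter instance for every part is one reading of "a single Bayesian filter"; keeping a separate filter per part would be an equally plausible variant.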

Copyright information

© 2006 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

del Castillo, M.D., Serrano, J.I. (2006). An Interactive Hybrid System for Identifying and Filtering Unsolicited E-mail. In: Corchado, E., Yin, H., Botti, V., Fyfe, C. (eds) Intelligent Data Engineering and Automated Learning – IDEAL 2006. IDEAL 2006. Lecture Notes in Computer Science, vol 4224. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11875581_94

  • DOI: https://doi.org/10.1007/11875581_94

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-45485-4

  • Online ISBN: 978-3-540-45487-8

  • eBook Packages: Computer Science, Computer Science (R0)