Robust classification for spam filtering by back-propagation neural networks using behavior-based features

Applied Intelligence

Abstract

Earlier work on detecting spam e-mails usually compares the contents of e-mails against specific keywords, an approach that is not robust because spammers frequently change the terms they use. This paper presents a novel feature-extraction method for spam filtering. Instead of classifying e-mails according to keywords, this study analyzes spamming behaviors and extracts the representative ones as features that describe the characteristics of e-mails. A back-propagation neural network is designed and implemented; it builds a classification model from the behavior-based features revealed in e-mails' headers and syslogs. Because spamming behaviors change far less often than the keywords used in spam, behavior-based features are more robust over time, so the behavior-based filtering mechanism outperforms keyword-based filtering. The experimental results indicate that our method is more useful in distinguishing spam e-mails than keyword-based comparison.
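
To make the idea of a back-propagation classifier over behavior-based features concrete, here is a minimal sketch in Python. It is not the network described in the paper. The three binary features (a sender mismatch between header and syslog, a relay mismatch, and a recipient missing from the To field), the toy data, the hidden-layer size, and the learning rate are all hypothetical choices made for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class BackPropNet:
    """One hidden layer of sigmoid units, trained by online back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each weight row carries one trailing bias weight.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        xb = list(x) + [1.0]
        hidden = [sigmoid(sum(w * v for w, v in zip(row, xb)))
                  for row in self.w_hidden]
        out = sigmoid(sum(w * v for w, v in zip(self.w_out, hidden + [1.0])))
        return hidden, out

    def predict(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        xb = list(x) + [1.0]
        hb = hidden + [1.0]
        # Output delta for squared error with a sigmoid output unit.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas: push the output delta back through w_out
        # (computed before w_out is updated).
        delta_hidden = [delta_out * self.w_out[j] * hidden[j] * (1.0 - hidden[j])
                        for j in range(len(hidden))]
        for j in range(len(hb)):
            self.w_out[j] -= lr * delta_out * hb[j]
        for j, row in enumerate(self.w_hidden):
            for i in range(len(xb)):
                row[i] -= lr * delta_hidden[j] * xb[i]


def mean_squared_error(net, data):
    return sum((net.predict(x) - t) ** 2 for x, t in data) / len(data)


def train(net, data, epochs, lr):
    for _ in range(epochs):
        for x, target in data:
            net.train_step(x, target, lr)


# Hypothetical feature vectors: [sender_mismatch, relay_mismatch, no_recipient_in_to]
# Invented labels: 1 = spam when at least two suspicious behaviors are present.
TOY_DATA = [
    ([1, 1, 0], 1), ([1, 0, 1], 1), ([1, 1, 1], 1), ([0, 1, 1], 1),
    ([0, 0, 0], 0), ([0, 1, 0], 0), ([0, 0, 1], 0), ([1, 0, 0], 0),
]
```

Calling `train(BackPropNet(3, 4), TOY_DATA, epochs=2000, lr=0.5)` fits the toy labels; comparing `mean_squared_error` before and after training shows whether the weights have adapted. The point of the sketch is only that the inputs encode how a message was sent rather than which words it contains.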

Author information

Corresponding author

Correspondence to Chih-Hung Wu.

Additional information

Partially supported by National Science Council, Taiwan, under grant NSC 93-2213-E-390-009.

About this article

Cite this article

Wu, CH., Tsai, CH. Robust classification for spam filtering by back-propagation neural networks using behavior-based features. Appl Intell 31, 107–121 (2009). https://doi.org/10.1007/s10489-008-0116-0

  • DOI: https://doi.org/10.1007/s10489-008-0116-0
