Abstract
Earlier work on detecting spam e-mails usually compares the contents of e-mails against specific keywords, an approach that is not robust because spammers frequently change the terms used in e-mails. This paper presents a novel method of deriving features for spam filtering. Instead of classifying e-mails according to keywords, this study analyzes spamming behaviors and extracts the representative ones as features that describe the characteristics of e-mails. A back-propagation neural network is designed and implemented, which builds a classification model from the behavior-based features revealed by e-mail headers and syslogs. Because spamming behaviors change far less often than the keywords used in spam, behavior-based features are more robust over time, so the behavior-based filtering mechanism outperforms keyword-based filtering. The experimental results indicate that our method distinguishes spam e-mails more effectively than keyword-based comparison.
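The idea of training a back-propagation network on behavior-based features can be illustrated with a minimal sketch. The three binary features below (for example, a sender domain that does not match the relaying host) are hypothetical stand-ins chosen for illustration, not the features used in the paper, and the network size, learning rate, and epoch count are likewise assumptions.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train_bpnn(samples, labels, hidden=3, lr=0.5, epochs=2000, seed=0):
    """Train a one-hidden-layer network with plain online back-propagation."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Each weight row carries a trailing bias weight.
    w_h = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    w_o = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            xb = list(x) + [1.0]
            h = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h]
            hb = h + [1.0]
            out = sigmoid(sum(w * v for w, v in zip(w_o, hb)))
            # Output delta for squared error with a sigmoid output unit.
            d_out = (y - out) * out * (1.0 - out)
            # Hidden deltas are computed before the output weights change.
            d_hid = [d_out * w_o[j] * h[j] * (1.0 - h[j]) for j in range(hidden)]
            for j in range(hidden + 1):
                w_o[j] += lr * d_out * hb[j]
            for j in range(hidden):
                for i in range(n_in + 1):
                    w_h[j][i] += lr * d_hid[j] * xb[i]
    return w_h, w_o

def predict(model, x):
    """Return the network's spam score in (0, 1) for one feature vector."""
    w_h, w_o = model
    xb = list(x) + [1.0]
    hb = [sigmoid(sum(w * v for w, v in zip(row, xb))) for row in w_h] + [1.0]
    return sigmoid(sum(w * v for w, v in zip(w_o, hb)))

# Hypothetical toy data: [domain_mismatch, unresolvable_relay, many_recipients].
# In this toy set a message is spam (1) exactly when domain_mismatch is set.
SAMPLES = [[1, 1, 0], [1, 0, 1], [1, 1, 1], [0, 0, 0], [0, 1, 0], [0, 0, 1]]
LABELS = [1, 1, 1, 0, 0, 0]

model = train_bpnn(SAMPLES, LABELS)
```

Calling `predict(model, [1, 0, 0])` then returns a score that can be thresholded at 0.5. In a real filter the feature vector would be derived from parsed headers and mail-server logs rather than written by hand.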
Additional information
Partially supported by National Science Council, Taiwan, under grant NSC 93-2213-E-390-009.
Cite this article
Wu, CH., Tsai, CH. Robust classification for spam filtering by back-propagation neural networks using behavior-based features. Appl Intell 31, 107–121 (2009). https://doi.org/10.1007/s10489-008-0116-0
DOI: https://doi.org/10.1007/s10489-008-0116-0