Robust classification for spam filtering by back-propagation neural networks using behavior-based features

Wu, Chih-Hung; Tsai, Chiung-Hui

doi:10.1007/s10489-008-0116-0

Published: 01 February 2008

Volume 31, pages 107–121, (2009)

Applied Intelligence

Chih-Hung Wu¹ &
Chiung-Hui Tsai²

Abstract

Earlier work on detecting spam e-mails usually compares the contents of e-mails against specific keywords, an approach that is not robust because spammers frequently change the terms they use. This paper presents a novel feature-extraction method for spam filtering. Instead of classifying e-mails according to keywords, this study analyzes spamming behaviors and extracts the representative ones as features that describe the characteristics of e-mails. A back-propagation neural network is designed and implemented, which builds a classification model from the behavior-based features revealed by e-mail headers and syslogs. Because spamming behaviors change less frequently than the keywords used in spam, behavior-based features are more robust over time, so the behavior-based filtering mechanism outperforms keyword-based filtering. The experimental results indicate that our methods are more useful in distinguishing spam e-mails than keyword-based comparison.
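
The abstract does not list the actual behavior features or the network configuration, so the following is only a minimal sketch of the general idea: a one-hidden-layer network trained by back-propagation on binary behavior-style features. The four feature names, the toy data, the layer size, and the learning rate are all assumptions made for illustration and are not taken from the paper.

```python
import math
import random

# Hypothetical binary behavior features (NOT taken from the paper):
# [sender_domain_mismatch, unresolvable_relay, many_recipients, odd_hour]
# Label 1 = spam, 0 = legitimate. The data are invented for illustration.
TOY_DATA = [
    ([1, 1, 0, 0], 1), ([1, 0, 1, 0], 1), ([0, 1, 1, 1], 1), ([1, 1, 1, 1], 1),
    ([0, 0, 0, 0], 0), ([0, 0, 0, 1], 0), ([0, 1, 0, 0], 0), ([0, 0, 1, 0], 0),
]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyBPNN:
    """One-hidden-layer network trained with plain back-propagation."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        # Each row holds the weights of one unit; the last entry is its bias.
        self.w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)]
                         for _ in range(n_hidden)]
        self.w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]

    def forward(self, x):
        # zip stops at len(x), so the trailing bias is added separately.
        hidden = [sigmoid(sum(w * v for w, v in zip(row, x)) + row[-1])
                  for row in self.w_hidden]
        out = sigmoid(sum(w * h for w, h in zip(self.w_out, hidden))
                      + self.w_out[-1])
        return hidden, out

    def train_step(self, x, target, lr):
        hidden, out = self.forward(x)
        # Output delta for squared error with a sigmoid output unit.
        delta_out = (out - target) * out * (1.0 - out)
        # Hidden deltas: push the output error back through w_out
        # (computed before w_out is updated).
        delta_hidden = [delta_out * self.w_out[j] * h * (1.0 - h)
                        for j, h in enumerate(hidden)]
        # Gradient-descent updates for the output unit.
        for j, h in enumerate(hidden):
            self.w_out[j] -= lr * delta_out * h
        self.w_out[-1] -= lr * delta_out
        # Gradient-descent updates for the hidden units.
        for j, row in enumerate(self.w_hidden):
            for i, v in enumerate(x):
                row[i] -= lr * delta_hidden[j] * v
            row[-1] -= lr * delta_hidden[j]


def mean_squared_error(net, data):
    return sum((net.forward(x)[1] - t) ** 2 for x, t in data) / len(data)


def train(net, data, epochs=2000, lr=0.5):
    for _ in range(epochs):
        for x, t in data:
            net.train_step(x, t, lr)


net = TinyBPNN(n_in=4, n_hidden=3, seed=0)
loss_before = mean_squared_error(net, TOY_DATA)
train(net, TOY_DATA)
loss_after = mean_squared_error(net, TOY_DATA)
print(f"MSE before training: {loss_before:.4f}, after: {loss_after:.4f}")
```

In a real setting the feature vector would be derived from parsed header fields and mail-server log entries rather than written by hand, and the model would be evaluated on held-out mail instead of its own training set.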

Author information

Authors and Affiliations

Department of Electrical Engineering, National University of Kaohsiung, Kaohsiung, Taiwan
Chih-Hung Wu
Computer and Network Center, Chung Hwa University of Medical Technology, Tainan, Taiwan
Chiung-Hui Tsai

Corresponding author

Correspondence to Chih-Hung Wu.

Additional information

Partially supported by National Science Council, Taiwan, under grant NSC 93-2213-E-390-009.

About this article

Cite this article

Wu, CH., Tsai, CH. Robust classification for spam filtering by back-propagation neural networks using behavior-based features. Appl Intell 31, 107–121 (2009). https://doi.org/10.1007/s10489-008-0116-0

Received: 31 March 2007
Accepted: 21 January 2008
Published: 01 February 2008
Issue Date: October 2009
DOI: https://doi.org/10.1007/s10489-008-0116-0
