Abstract
Most e-mail users spend a non-trivial amount of time regularly deleting junk e-mail (spam) messages, while an ever-growing volume of such e-mail occupies server storage space and consumes network bandwidth. An ongoing challenge, therefore, lies in developing and refining automatic classifiers that can distinguish legitimate e-mail from spam. A few published studies have examined spam detectors that combine Naïve Bayesian approaches with large feature sets of binary attributes, each indicating whether a common spam keyword occurs in a message, and many commercial applications also use Naïve Bayesian techniques. Spammers recognize these attempts to block their messages and have developed tactics to circumvent such filters, but these evasive tactics are themselves patterns that human readers can often identify quickly. Therefore, in contrast to earlier approaches, our feature set uses descriptive characteristics of words and messages, similar to those a human reader would use to identify spam. This preliminary study tests this alternative approach using a neural network (NN) classifier on a corpus of e-mail messages from a single user. The results are compared with those of previous spam detectors that used Naïve Bayesian classifiers. It also appears that commercial spam detectors are now beginning to use descriptive features of the kind proposed here.
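The abstract does not list the paper's actual features or network architecture, so the following is only a minimal sketch under stated assumptions. The four features (ratio of all-capital words, ratio of words that mix letters with digits or symbols such as "v1agra", exclamation marks per word, and ratio of tokens containing "$"), along with the names `descriptive_features`, `is_obfuscated` and `TinyMLP`, are hypothetical. They are chosen only to show how descriptive message characteristics, rather than binary keyword indicators, can feed a small feed-forward network with one hidden layer trained by backpropagation.

```python
import math
import random
import re

# HYPOTHETICAL descriptive features for illustration only; the paper's
# actual feature set is not given in the abstract.
TOKEN_RE = re.compile(r"\S+")
EDGE_PUNCT = ".,!?;:'\"()"


def is_obfuscated(word: str) -> bool:
    """True for tokens mixing letters with digits/symbols, e.g. 'v1agra'."""
    core = word.strip(EDGE_PUNCT)
    has_alpha = any(c.isalpha() for c in core)
    # Apostrophes and hyphens are common in ordinary words, so ignore them.
    has_other = any(not c.isalpha() and c not in "'-" for c in core)
    return has_alpha and has_other


def descriptive_features(text: str) -> list:
    """Map a message to four hypothetical ratio features in [0, 1]."""
    words = TOKEN_RE.findall(text)
    n = max(len(words), 1)  # avoid dividing by zero for empty messages
    all_caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    obfuscated = sum(1 for w in words if is_obfuscated(w))
    exclaims = min(text.count("!") / n, 1.0)  # per word, capped at 1
    money = sum(1 for w in words if "$" in w)
    return [all_caps / n, obfuscated / n, exclaims, money / n]


def sigmoid(z: float) -> float:
    z = max(-60.0, min(60.0, z))  # clamp to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-z))


class TinyMLP:
    """One hidden layer of sigmoid units and a sigmoid output, trained by
    full-batch gradient descent on binary cross-entropy (backpropagation)."""

    def __init__(self, n_in: int, n_hidden: int = 4, seed: int = 0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def _forward(self, x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def predict_proba(self, x) -> float:
        """Estimated probability that feature vector x is spam."""
        return self._forward(x)[1]

    def fit(self, xs, ys, epochs: int = 500, lr: float = 0.5) -> list:
        """Train on feature vectors xs with 0/1 labels ys; return the mean
        loss measured at the start of each epoch."""
        n, losses = len(xs), []
        for _ in range(epochs):
            gw1 = [[0.0] * len(row) for row in self.w1]
            gb1, gw2, gb2, loss = [0.0] * len(self.b1), [0.0] * len(self.w2), 0.0, 0.0
            for x, t in zip(xs, ys):
                h, y = self._forward(x)
                yc = min(max(y, 1e-12), 1 - 1e-12)
                loss -= t * math.log(yc) + (1 - t) * math.log(1 - yc)
                d_out = y - t  # dLoss/dz for sigmoid output + cross-entropy
                gb2 += d_out
                for j, hj in enumerate(h):
                    gw2[j] += d_out * hj
                    d_hid = d_out * self.w2[j] * hj * (1 - hj)  # backprop
                    gb1[j] += d_hid
                    for i, xi in enumerate(x):
                        gw1[j][i] += d_hid * xi
            # Apply the averaged gradients once per epoch.
            for j in range(len(self.w2)):
                self.w2[j] -= lr * gw2[j] / n
                self.b1[j] -= lr * gb1[j] / n
                for i in range(len(self.w1[j])):
                    self.w1[j][i] -= lr * gw1[j][i] / n
            self.b2 -= lr * gb2 / n
            losses.append(loss / n)
        return losses


if __name__ == "__main__":
    data = [("FREE v1agra NOW!!! only $9.99", 1), ("WIN $$$ CA$H pr1ze!!!", 1),
            ("See you at lunch tomorrow.", 0),
            ("Minutes from Tuesday's meeting are attached.", 0)]
    xs = [descriptive_features(m) for m, _ in data]
    net = TinyMLP(n_in=len(xs[0]))
    net.fit(xs, [label for _, label in data])
    for (msg, label), x in zip(data, xs):
        print(f"label={label} p(spam)={net.predict_proba(x):.2f} {msg}")
```

The toy messages above are invented for illustration and say nothing about the paper's corpus or results. The design point is that each input is a normalized ratio, so evasive spellings such as "v1agra" or "CA$H", which defeat an exact keyword match, still shift the descriptive features the network sees.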
© 2004 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Stuart, I., Cha, SH., Tappert, C. (2004). A Neural Network Classifier for Junk E-Mail. In: Marinai, S., Dengel, A.R. (eds) Document Analysis Systems VI. DAS 2004. Lecture Notes in Computer Science, vol 3163. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-28640-0_42
DOI: https://doi.org/10.1007/978-3-540-28640-0_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-23060-1
Online ISBN: 978-3-540-28640-0
eBook Packages: Springer Book Archive