A Neural Network Classifier for Junk E-Mail

  • Ian Stuart
  • Sung-Hyuk Cha
  • Charles Tappert
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3163)

Abstract

Most e-mail readers spend a non-trivial amount of time regularly deleting junk e-mail (spam) messages, even as an expanding volume of such e-mail occupies server storage space and consumes network bandwidth. An ongoing challenge, therefore, lies in developing and refining automatic classifiers that can distinguish legitimate e-mail from spam. A few published studies have examined spam detectors using Naïve Bayesian approaches and large feature sets of binary attributes that indicate the presence of keywords common in spam, and many commercial applications also use Naïve Bayesian techniques. Spammers recognize these attempts to thwart their messages and have developed tactics to circumvent these filters, but these evasive tactics are themselves patterns that human readers can often identify quickly. Therefore, in contrast to earlier approaches, our feature set uses descriptive characteristics of words and messages similar to those that a human reader would use to identify spam. This preliminary study tests this alternative approach using a neural network (NN) classifier on a corpus of e-mail messages from one user. The results of this study are compared with those of previous spam detectors that have used Naïve Bayesian classifiers. It also appears that commercial spam detectors are now beginning to use descriptive features as proposed here.
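
To make the contrast with keyword-presence attributes concrete, the following is a minimal sketch of what descriptive features of words and messages could look like. The abstract does not list the paper's actual feature set, so the four features, the function name descriptive_features, and the sample message are all hypothetical illustrations. The resulting numeric vector is the kind of input a neural network classifier could be trained on.

```python
import re

VOWELS = set("aeiouAEIOU")

def descriptive_features(message: str) -> dict:
    """Compute a few illustrative (hypothetical) descriptive features.

    These are NOT the features used in the paper; they only show the idea
    of describing how words look rather than which keywords appear.
    """
    words = message.split()
    n = len(words) or 1  # avoid division by zero on an empty message

    def frac(pred):
        # Fraction of words for which the predicate holds.
        return sum(1 for w in words if pred(w)) / n

    return {
        # Words that contain letters but no vowels, e.g. "xyz".
        "no_vowel": frac(
            lambda w: any(c.isalpha() for c in w)
            and not any(c in VOWELS for c in w)
        ),
        # Words with digits or symbols wedged between letters, e.g. "V1agra".
        # Apostrophes and hyphens are excluded so "don't" is not flagged.
        "embedded_nonletter": frac(
            lambda w: re.search(r"[A-Za-z][^A-Za-z\s'\-]+[A-Za-z]", w) is not None
        ),
        # Words written entirely in capital letters, e.g. "FREE".
        "all_caps": frac(lambda w: len(w) > 1 and w.isalpha() and w.isupper()),
        # Exclamation marks per character of the message.
        "exclamation_rate": message.count("!") / max(len(message), 1),
    }

if __name__ == "__main__":
    sample = "FREE V1agra n0w!!! click xyz"  # made-up example message
    for name, value in descriptive_features(sample).items():
        print(f"{name}: {value:.3f}")
```

Each feature is a fraction or rate rather than a binary flag, so an obfuscated spelling such as "V1agra" still contributes to the vector even though no keyword list would contain it.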

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Ian Stuart (1)
  • Sung-Hyuk Cha (1)
  • Charles Tappert (1)
  1. Computer Science Department, Pace University, Pleasantville, USA