Pattern Analysis and Applications, Volume 9, Issue 4, pp 339–351

Using online linear classifiers to filter spam emails

Theoretical Advances


The performance of two online linear classifiers, the Perceptron and Littlestone’s Winnow, is explored on two anti-spam filtering benchmark corpora, PU1 and Ling-Spam. We study performance for varying numbers of features and for three feature selection methods: information gain (IG), document frequency (DF) and odds ratio. The size of the training set and the number of training iterations are also investigated for both classifiers. The experimental results show that both the Perceptron and Winnow perform much better with IG or DF than with odds ratio. With IG or DF, the classifiers are further shown to be insensitive to the number of features and the number of training iterations, and not greatly sensitive to the size of the training set. Winnow slightly outperforms the Perceptron, and both online classifiers perform much better than a standard Naïve Bayes method. Both the theoretical computational complexity and the implementation complexity of these two classifiers are very low, and both classifiers are easy to update adaptively. They outperform most of the published results while being significantly easier to train and adapt. The analysis and the promising experimental results indicate that the Perceptron and Winnow are two very competitive classifiers for anti-spam filtering.
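As an illustration of why these classifiers are cheap to train and easy to update adaptively, the following is a minimal sketch of the textbook mistake-driven updates: an additive update for the Perceptron and a multiplicative update for a basic positive Winnow. It is not taken from the paper. The learning rate `lr`, the promotion factor `alpha`, the threshold `theta` (set to the number of features), the variant of Winnow, and the toy binary features are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    return 1 if dot(w, x) + b > 0 else -1


def perceptron_train(X: Sequence[Sequence[float]], y: Sequence[int],
                     epochs: int = 5, lr: float = 1.0) -> Tuple[List[float], float]:
    """Additive update: weights change only when a message is misclassified."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):  # t is +1 (spam) or -1 (legitimate)
            if perceptron_predict(w, b, x) != t:
                w = [wi + lr * t * xi for wi, xi in zip(w, x)]
                b += lr * t
    return w, b


def winnow_predict(w: Sequence[float], theta: float, x: Sequence[float]) -> int:
    return 1 if dot(w, x) > theta else -1


def winnow_train(X: Sequence[Sequence[float]], y: Sequence[int],
                 epochs: int = 5, alpha: float = 2.0) -> Tuple[List[float], float]:
    """Multiplicative update on active features only (basic positive Winnow)."""
    n = len(X[0])
    w = [1.0] * n
    theta = float(n)  # assumed threshold; other choices are common
    for _ in range(epochs):
        for x, t in zip(X, y):
            if winnow_predict(w, theta, x) != t:
                # Promote on a missed spam message, demote on a false alarm.
                factor = alpha if t == 1 else 1.0 / alpha
                w = [wi * factor if xi > 0 else wi for wi, xi in zip(w, x)]
    return w, theta


# Hypothetical toy data: columns mark presence of "free", "meeting", "offer".
X_TOY = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 0], [1, 0, 1]]
Y_TOY = [1, -1, 1, 1, -1, 1]

if __name__ == "__main__":
    pw, pb = perceptron_train(X_TOY, Y_TOY)
    ww, wt = winnow_train(X_TOY, Y_TOY)
    print([perceptron_predict(pw, pb, x) for x in X_TOY])
    print([winnow_predict(ww, wt, x) for x in X_TOY])
```

In both routines a single pass over one message costs time linear in the number of features, and a newly labelled message can be folded in by applying the same update once. This is the property the abstract refers to as easy adaptive updating.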


Keywords: Online linear classifier; Perceptron; Winnow; Anti-spam filtering



Copyright information

© Springer-Verlag London Limited 2006

Authors and Affiliations

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
  2. School of Computing, Dublin City University, Dublin, Ireland
