Performance Analysis of Naïve Bayes Classification, Support Vector Machines and Neural Networks for Spam Categorization

  • A. Cüneyd Tantuğ
  • Gülşen Eryiğit
Part of the Advances in Soft Computing book series (AINSC, volume 34)

Abstract

Spam mail recognition is a new and growing field that brings together natural language processing and machine learning, since it is in essence a two-class classification of natural language texts. An important feature of spam recognition is that it is cost-sensitive: misclassifying a non-spam mail as spam is generally a more severe error than misclassifying a spam mail as non-spam. To be comparable, the methods applied to this field should all be evaluated on the same corpus and within the same cost-sensitive framework. In this paper, the performance of Support Vector Machines (SVM), Neural Networks (NN) and Naïve Bayes (NB) techniques is compared using a publicly available corpus (LINGSPAM) under different cost scenarios. The training time complexities of the methods are also evaluated. The results show that NN performs significantly better than the other two methods while having acceptable training times. NB gives better results than SVM when the cost is extremely high, while in all other cases SVM outperforms NB.
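
The cost-sensitive framing described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single cost ratio (the hypothetical parameter `cost_ratio`, meaning that blocking one legitimate mail is treated as `cost_ratio` times as costly as letting one spam mail through) and a classifier that outputs a spam probability. Under those assumptions, the expected cost is minimized by labelling a mail as spam only when its spam probability exceeds `cost_ratio / (1 + cost_ratio)`. All function names are illustrative.

```python
def spam_threshold(cost_ratio: float) -> float:
    """Probability threshold above which calling a mail 'spam' has lower expected cost.

    Assumption: a false positive (legitimate mail blocked) costs `cost_ratio`,
    a false negative (spam let through) costs 1.
    Calling spam is cheaper when (1 - p) * cost_ratio < p, i.e. p > cost_ratio / (1 + cost_ratio).
    """
    if cost_ratio <= 0:
        raise ValueError("cost_ratio must be positive")
    return cost_ratio / (1.0 + cost_ratio)


def classify(p_spam: float, cost_ratio: float) -> str:
    """Label a mail given its estimated spam probability and the assumed cost ratio."""
    return "spam" if p_spam > spam_threshold(cost_ratio) else "legitimate"


def total_cost(y_true, y_pred, cost_ratio: float) -> float:
    """Sum of misclassification costs over a labelled test set."""
    cost = 0.0
    for truth, pred in zip(y_true, y_pred):
        if truth == "legitimate" and pred == "spam":
            cost += cost_ratio  # severe error: legitimate mail blocked
        elif truth == "spam" and pred == "legitimate":
            cost += 1.0  # milder error: spam reaches the inbox
    return cost


if __name__ == "__main__":
    # The same probability is labelled differently as the assumed cost ratio grows.
    for ratio in (1, 9, 999):
        print(ratio, spam_threshold(ratio), classify(0.95, ratio))
```

With a cost ratio of 1 the rule reduces to the usual 0.5 threshold. As the ratio grows the filter becomes increasingly reluctant to block mail, which is one way to compare classifiers under different cost scenarios.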

Copyright information

© Springer 2006

Authors and Affiliations

  • A. Cüneyd Tantuğ (1)
  • Gülşen Eryiğit (1)
  1. Dept. of Computer Engineering, Istanbul Technical Univ., Turkey
