Combining Classifiers for Spam Detection

  • Fatiha Barigou
  • Naouel Barigou
  • Baghdad Atmani
Part of the Communications in Computer and Information Science book series (CCIS, volume 293)

Abstract

E-mail has become a fast and economical way to exchange information. However, unsolicited or junk e-mail, also known as spam, has quickly become a major problem on the Internet, and protecting users from it has become one of the most important research areas. Spam filtering is used to prevent access to undesirable e-mails. In this paper we propose a spam detection system called “3CA&1NB”, which uses machine learning to detect spam. “3CA&1NB” combines three cellular automata and one naïve Bayes algorithm. We discuss how combining learning-based methods can improve detection performance. Our preliminary results show that it can detect spam effectively.

Keywords

spam; cellular automaton; Naïve Bayes; classifier combination

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Fatiha Barigou (1)
  • Naouel Barigou (1)
  • Baghdad Atmani (1)

  1. Computer Science Laboratory of Oran, Computer Science Department, Faculty of Science, University of Oran, Oran, Algeria
