Combined Bayesian Classifiers Applied to Spam Filtering Problem

  • Karol Wrótniak
  • Michał Woźniak
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 189)

Abstract

This paper focuses on the problem of designing effective spam filters using combined Naïve Bayes classifiers. First, we describe different tokenization methods that allow us to extract valuable features from e-mails. These methods are used to create training sets for the individual Bayesian classifiers, because different methods of feature extraction ensure the desirable diversity of the classifier ensemble. Because adequate analytical methods of ensemble evaluation are lacking, the most valuable and diverse committees are chosen on the basis of computer experiments carried out on our own spam dataset. Then a number of well-known fusion methods using class labels and class supports are compared to establish the final proposition.
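The general idea of the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the tokenizers' parameters, the exact classifier variant, the dataset, or the fusion rules. The code below assumes three simplified tokenizers (word unigrams, word bigrams, and a basic form of orthogonal sparse bigrams), a multinomial Naïve Bayes model with add-one smoothing, majority voting as an example of fusion on class labels, and averaging of posterior probabilities as an example of fusion on class supports. All function names, the class name, and the toy messages are hypothetical.

```python
import math
from collections import Counter

def unigrams(text):
    return text.lower().split()

def bigrams(text):
    w = text.lower().split()
    return [f"{a} {b}" for a, b in zip(w, w[1:])]

def osb(text, window=5):
    # Simplified orthogonal sparse bigrams: pair each word with each of the
    # next (window - 1) words and record how many words were skipped.
    w = text.lower().split()
    feats = []
    for i, left in enumerate(w):
        for gap in range(1, window):
            if i + gap < len(w):
                feats.append(f"{left}|{gap - 1}|{w[i + gap]}")
    return feats

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, tokenizer):
        self.tokenizer = tokenizer
        self.counts = {"spam": Counter(), "ham": Counter()}
        self.docs = Counter()
        self.vocab = set()

    def fit(self, messages, labels):
        for text, label in zip(messages, labels):
            feats = self.tokenizer(text)
            self.counts[label].update(feats)
            self.docs[label] += 1
            self.vocab.update(feats)
        return self

    def supports(self, text):
        # Class supports = normalized posterior probabilities.
        feats = [f for f in self.tokenizer(text) if f in self.vocab]
        total_docs = sum(self.docs.values())
        log_post = {}
        for label, counts in self.counts.items():
            total = sum(counts.values()) + len(self.vocab)
            lp = math.log(self.docs[label] / total_docs)
            lp += sum(math.log((counts[f] + 1) / total) for f in feats)
            log_post[label] = lp
        m = max(log_post.values())
        exp = {c: math.exp(v - m) for c, v in log_post.items()}
        z = sum(exp.values())
        return {c: v / z for c, v in exp.items()}

    def predict(self, text):
        s = self.supports(text)
        return max(s, key=s.get)

def majority_vote(classifiers, text):
    # Fusion on class labels: each classifier casts one vote.
    votes = Counter(c.predict(text) for c in classifiers)
    return votes.most_common(1)[0][0]

def mean_support(classifiers, text):
    # Fusion on class supports: average the posteriors, then pick the maximum.
    avg = Counter()
    for c in classifiers:
        for label, p in c.supports(text).items():
            avg[label] += p / len(classifiers)
    return max(avg, key=avg.get)

# Toy training data (made up for illustration).
train = [
    ("win free money now", "spam"),
    ("free prize claim your money", "spam"),
    ("cheap pills free offer now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("please review the project report", "ham"),
    ("lunch on monday with the team", "ham"),
]
texts, labels = zip(*train)
ensemble = [NaiveBayes(t).fit(texts, labels) for t in (unigrams, bigrams, osb)]

print(majority_vote(ensemble, "claim your free money now"))
print(mean_support(ensemble, "project meeting on monday"))
```

Each ensemble member sees the same messages through a different feature extractor, which is the source of diversity the abstract refers to. Which committee and which fusion rule work best would have to be determined experimentally, as the abstract notes.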

Keywords

adaptive classifier, classifier ensemble, combined classifier, Naïve Bayes classifier, n-gram, OSB, SBPH, spam filtering

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Karol Wrótniak (1)
  • Michał Woźniak (1)
  1. Department of Systems and Computer Networks, Wroclaw University of Technology, Wroclaw, Poland
