Journal of Intelligent Information Systems, Volume 42, Issue 1, pp 19–45

Cost-sensitive three-way email spam filtering

  • Bing Zhou
  • Yiyu Yao
  • Jigang Luo


Email spam filtering is typically treated as a binary classification problem that can be solved by machine learning algorithms. We argue that a three-way decision approach gives users a more meaningful way to handle their incoming emails with caution. A three-way spam filtering system produces three email folders instead of two: a suspected folder is added so that users can examine suspicious emails further, thereby reducing the chance of misclassification. Unlike existing ternary email spam filtering systems, we focus on two less-studied issues: the computation of the thresholds required to define the three email categories, and the interpretation of the cost-sensitive characteristics of spam filtering. Instead of supplying the thresholds based on an intuitive understanding of the levels of tolerance for errors, we systematically calculate them using the decision-theoretic rough set model. A loss function is interpreted as the cost of making classification decisions, and the decision with the minimum overall cost is chosen. Experimental results show that the new approach reduces the error rate of misclassifying a legitimate email as spam and performs better with respect to cost sensitivity.
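The abstract does not give the threshold formulas, so the following is only a minimal sketch of how a loss function can yield the two thresholds and a minimum-cost decision. It assumes the closed-form thresholds commonly stated in the decision-theoretic rough set literature. The class `Loss`, the functions `thresholds`, `expected_costs` and `decide`, and all numeric loss values are hypothetical and are not taken from the paper. The spam probability is passed in as a plain number; in practice it might come from a classifier such as naive Bayes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loss:
    """Hypothetical costs: first letter = action, second = true class.

    Actions: p = file as spam, b = file as suspected, n = file as legitimate.
    True class: p = the email is spam, n = the email is legitimate.
    """
    pp: float  # spam filed as spam
    bp: float  # spam filed as suspected
    np_: float  # spam filed as legitimate
    pn: float  # legitimate filed as spam (usually the costliest error)
    bn: float  # legitimate filed as suspected
    nn: float  # legitimate filed as legitimate


def thresholds(loss):
    """Return (alpha, beta) derived from the loss values.

    Assumes pp <= bp < np_ and nn <= bn < pn, which keeps both
    denominators positive.
    """
    alpha = (loss.pn - loss.bn) / ((loss.pn - loss.bn) + (loss.bp - loss.pp))
    beta = (loss.bn - loss.nn) / ((loss.bn - loss.nn) + (loss.np_ - loss.bp))
    return alpha, beta


def expected_costs(p_spam, loss):
    """Expected cost of each folder choice given P(spam | email)."""
    p_legit = 1.0 - p_spam
    return {
        "spam": loss.pp * p_spam + loss.pn * p_legit,
        "suspected": loss.bp * p_spam + loss.bn * p_legit,
        "legitimate": loss.np_ * p_spam + loss.nn * p_legit,
    }


def decide(p_spam, loss):
    """Pick the folder whose expected cost is minimum."""
    costs = expected_costs(p_spam, loss)
    return min(costs, key=costs.get)


# Illustrative values only: losing a legitimate email costs far more
# than letting a spam email through.
LOSS = Loss(pp=0.0, bp=1.0, np_=4.0, pn=20.0, bn=2.0, nn=0.0)
ALPHA, BETA = thresholds(LOSS)
```

With these illustrative losses, an email is filed as spam only when its spam probability is at least `ALPHA`, as legitimate when it is at most `BETA`, and as suspected in between. Choosing the minimum expected cost directly, as `decide` does, gives the same folders away from the exact threshold values.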


Email spam filtering; Cost-sensitive learning; Ternary classification; Three-way decision; Naive Bayes classifier



The authors are grateful for the financial support from NSERC Canada.



Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  1. Department of Computer Science, Sam Houston State University, Huntsville, USA
  2. Department of Computer Science, University of Regina, Regina, Canada
