Abstract

Email categorization has become very popular in personal information management. However, most n-way classification methods suffer from the feature unevenness problem: features learned from training samples are distributed unevenly across folders. We argue that binarization approaches can handle this problem effectively. In this paper, three binarization techniques are implemented (one-against-rest, one-against-one and some-against-rest) using two assembling techniques (round robin and elimination). Experiments on email categorization show that these binarization approaches achieve significant improvement over an n-way baseline classifier.
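
The abstract does not define the two assembling techniques, so the following is only a minimal sketch under stated assumptions: round robin is taken to mean that every pair of folders is compared once and the folder with the most wins is chosen, and elimination is taken to mean a knockout in which the winner of each comparison meets the next folder. The `toy_decide` function and the `KEYWORDS` table are hypothetical stand-ins for trained binary classifiers and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations

# Hypothetical keyword table standing in for trained pairwise classifiers.
KEYWORDS = {
    "work": {"meeting", "report"},
    "family": {"dinner", "mom"},
    "spam": {"free", "offer"},
}


def toy_decide(a, b, message):
    """Toy binary decision: return whichever of folders a, b shares more keywords."""
    words = set(message.lower().split())
    score_a = len(words & KEYWORDS[a])
    score_b = len(words & KEYWORDS[b])
    return a if score_a >= score_b else b  # a wins ties


def round_robin(folders, decide, message):
    """Assumed round robin: compare every pair once, pick the folder with most wins."""
    votes = Counter({f: 0 for f in folders})
    for a, b in combinations(folders, 2):
        votes[decide(a, b, message)] += 1
    # max() returns the first maximum, so ties follow the order of `folders`.
    return max(folders, key=lambda f: votes[f])


def elimination(folders, decide, message):
    """Assumed elimination: the current winner meets each remaining folder in turn."""
    champion = folders[0]
    for challenger in folders[1:]:
        champion = decide(champion, challenger, message)
    return champion


folders = list(KEYWORDS)
print(round_robin(folders, toy_decide, "free offer inside"))
print(elimination(folders, toy_decide, "meeting report due"))
```

In this sketch, round robin makes n(n-1)/2 pairwise decisions for n folders, whereas elimination makes only n-1, at the cost of depending on the order in which folders are compared.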

Keywords

Binarization; assembling; email categorization; text classification

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Yunqing Xia (1)
  • Kam-Fai Wong (1)
  1. Department of Systems Engineering and Engineering Management, The Chinese University of Hong Kong, Shatin, Hong Kong
