Classification Decision Combination for Text Categorization: An Experimental Study

  • Yaxin Bi
  • David Bell
  • Hui Wang
  • Gongde Guo
  • Werner Dubitzky
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3180)

Abstract

This study investigates the combination of four classification methods for text categorization through experimental comparisons. The methods are the Support Vector Machine (SVM), k-nearest neighbours (kNN), the kNN model-based approach (kNNM), and Rocchio. We first review these learning methods and the method for combining the classifiers, and then present experimental results on the 20-newsgroup benchmark collection, with an emphasis on average group performance, that is, on the effectiveness of combining multiple classifiers for each category. To examine why combining the best and the second-best classifiers can achieve better performance, we propose an empirical measure called closeness as a basis for our experiments. Based on our empirical study, we verify the hypothesis that when a classifier has high closeness to the best classifier, their combination can achieve better performance.
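
The abstract does not state which combination rule the study uses, so the following is only a minimal sketch of the general idea of combining per-category decisions from two classifiers. It assumes each classifier returns a non-negative score per category, and it uses plain averaging of normalized scores as a stand-in rule. The function names (`normalize`, `combine`, `decide`) and all scores are hypothetical and chosen for illustration; they are not taken from the paper.

```python
from typing import Dict, List

Scores = Dict[str, float]  # category name -> non-negative classifier score


def normalize(scores: Scores) -> Scores:
    """Rescale scores so they sum to 1 (uniform if every score is zero)."""
    total = sum(scores.values())
    if total == 0:
        return {c: 1.0 / len(scores) for c in scores}
    return {c: s / total for c, s in scores.items()}


def combine(outputs: List[Scores]) -> Scores:
    """Average the normalized per-category scores of several classifiers.

    Assumption: simple averaging is used here only as an illustrative
    combination rule; the paper's actual rule is not given in the abstract.
    """
    normalized = [normalize(o) for o in outputs]
    categories = normalized[0].keys()
    return {c: sum(n[c] for n in normalized) / len(normalized) for c in categories}


def decide(scores: Scores) -> str:
    """Pick the category with the highest score."""
    return max(scores, key=scores.get)


# Hypothetical scores for a single document from two classifiers.
svm_scores = {"sci.space": 0.5, "rec.autos": 0.4, "talk.politics.misc": 0.1}
knn_scores = {"sci.space": 3.0, "rec.autos": 6.0, "talk.politics.misc": 1.0}

combined = combine([svm_scores, knn_scores])
print(decide(svm_scores), decide(knn_scores), decide(combined))
```

In this toy input the two classifiers disagree, and the averaged scores favour the category on which the second classifier is more confident. Normalizing first keeps a classifier with a larger raw score range from dominating the average.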

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Yaxin Bi (1, 2)
  • David Bell (1)
  • Hui Wang (3)
  • Gongde Guo (3)
  • Werner Dubitzky (2)

  1. School of Computer Science, Queen’s University of Belfast, Belfast, UK
  2. School of Biomedical Science, University of Ulster, Coleraine, Londonderry, UK
  3. School of Computing and Mathematics, University of Ulster, Newtownabbey, Co. Antrim, UK
