Improving Multiclass Text Classification with Error-Correcting Output Coding and Sub-class Partitions

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 6085)

Abstract

Error-Correcting Output Coding (ECOC) is a general framework for multiclass text classification with a set of binary classifiers. It can not only help a binary classifier solve multi-class classification problems, but also boost the performance of a multi-class classifier. When building each individual binary classifier in ECOC, the classes are randomly partitioned into two disjoint groups: positive and negative. However, when such a binary classifier is trained, the sub-class distribution within the positive and negative groups is neglected. Utilizing this information is expected to improve the binary classifier. We therefore design a simple binary classification strategy via multi-class categorization (2vM) that makes use of sub-class partition information and can lead to better performance than traditional binary classification. The proposed binary classification strategy is then applied to enhance ECOC. Experiments on document categorization and question classification show its effectiveness.
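
The two ideas in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the nearest-centroid model, the one-dimensional toy data, Hamming-distance decoding, and all function names (`random_code_matrix`, `hamming_decode`, `predict_collapsed`, `predict_2vm`) are assumptions chosen for illustration, and the abstract does not say which base classifier or decoding rule the paper uses. The first part builds a random ECOC code matrix and decodes a bit vector; the second contrasts a binary classifier trained on merged groups with a 2vM-style classifier that first predicts a sub-class and then maps it to its group.

```python
import random


def random_code_matrix(num_classes, num_bits, seed=0):
    """Return one codeword (list of 0/1 bits) per class.

    Each bit position is a random split of the classes into a
    positive group (1) and a negative group (0).
    """
    rng = random.Random(seed)
    columns = []
    while len(columns) < num_bits:
        column = [rng.randint(0, 1) for _ in range(num_classes)]
        if 0 < sum(column) < num_classes:  # both groups must be non-empty
            columns.append(column)
    return [[column[c] for column in columns] for c in range(num_classes)]


def hamming_decode(code, bits):
    """Pick the class whose codeword differs from `bits` in the fewest positions."""
    distances = [sum(b != r for b, r in zip(bits, row)) for row in code]
    return distances.index(min(distances))


def train_centroids(points, labels):
    """Toy 1-D nearest-centroid model (an assumption): label -> mean of its points."""
    sums, counts = {}, {}
    for x, y in zip(points, labels):
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}


def nearest_label(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(centroids[y] - x))


def predict_collapsed(points, classes, group_of, x):
    """Traditional binary training: sub-classes are merged before training."""
    model = train_centroids(points, [group_of[c] for c in classes])
    return nearest_label(model, x)


def predict_2vm(points, classes, group_of, x):
    """2vM-style prediction: predict a sub-class, then map it to its group."""
    model = train_centroids(points, classes)
    return group_of[nearest_label(model, x)]


# Toy data: classes A and C form the positive group and lie on either side of B.
points = [0.0, 1.0, 5.0, 6.0, 12.0, 13.0]
classes = ["A", "A", "B", "B", "C", "C"]
group_of = {"A": 1, "B": 0, "C": 1}
```

In this toy setting, merging A and C places the positive centroid at 6.5, next to the negative centroid at 5.5, so a point at 0.5 (well inside class A) is assigned to the negative group by `predict_collapsed`, while `predict_2vm` assigns it to the positive group. This only illustrates why sub-class information can matter; it makes no claim about the size of the effect reported in the paper.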

This research is supported by the Science Foundation Ireland (Grant 07/CE/I1142) as part of the Centre for Next Generation Localisation (www.cngl.ie) at Trinity College Dublin.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, B., Vogel, C. (2010). Improving Multiclass Text Classification with Error-Correcting Output Coding and Sub-class Partitions. In: Farzindar, A., Kešelj, V. (eds) Advances in Artificial Intelligence. Canadian AI 2010. Lecture Notes in Computer Science, vol 6085. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-13059-5_4

  • DOI: https://doi.org/10.1007/978-3-642-13059-5_4

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-13058-8

  • Online ISBN: 978-3-642-13059-5

  • eBook Packages: Computer Science, Computer Science (R0)
