
Ensemble of Binary Learners for Reliable Text Categorization with a Reject Option

  • Conference paper
Hybrid Artificial Intelligent Systems (HAIS 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7208)

Abstract

Text categorization is a key task in information retrieval and natural language processing. Attaching a reliability measure to the assignment of a text document to a particular category can improve the recognition rate and also tell the user how much confidence the output deserves. A novel reliability measure is proposed, based on running different binary classifiers within the Error-Correcting Output Codes (ECOC) framework: the larger the ECOC-computed distance between a document's assigned category and the next-ranked category, the higher the reliability associated with that classification. This is the main idea behind the proposed ECOC-based text classifier with a reject option. Experiments on several commonly used text categorization benchmark datasets demonstrate the potential of the proposed method.
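The decision rule summarized above can be illustrated with a small sketch. It is not the authors' implementation; it rests on several assumptions: the 4-category code matrix is hypothetical, each binary learner is assumed to emit a hard +1/-1 vote, decoding uses plain Hamming distance, reliability is taken to be the distance gap between the best and next-ranked categories, and `threshold` is a hypothetical rejection parameter.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical code matrix: 4 categories (rows) x 6 binary learners (columns).
# Every pair of codewords differs in 4 positions, so one flipped bit is corrected.
CODE_MATRIX: List[List[int]] = [
    [+1, +1, +1, -1, -1, -1],
    [+1, -1, -1, +1, +1, -1],
    [-1, +1, -1, +1, -1, +1],
    [-1, -1, +1, -1, +1, +1],
]


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of positions where two {-1, +1} vectors disagree."""
    return sum(x != y for x, y in zip(a, b))


def ecoc_decode_with_reject(
    outputs: Sequence[int],
    code_matrix: List[List[int]],
    threshold: int,
) -> Tuple[Optional[int], int]:
    """Return (category index, or None if rejected, and the reliability gap)."""
    distances = [hamming(outputs, row) for row in code_matrix]
    # Rank categories by distance (sorted() is stable, so ties keep index order).
    ranked = sorted(range(len(distances)), key=lambda k: distances[k])
    best, runner_up = ranked[0], ranked[1]
    # Assumed reliability: how much farther the next-ranked category is.
    reliability = distances[runner_up] - distances[best]
    if reliability < threshold:
        return None, reliability  # reject: too close to call
    return best, reliability


# Outputs with one bit flipped relative to category 0's codeword.
print(ecoc_decode_with_reject([-1, 1, 1, -1, -1, -1], CODE_MATRIX, threshold=2))  # (0, 2)
# Outputs equidistant from categories 0 and 1.
print(ecoc_decode_with_reject([1, -1, -1, -1, -1, -1], CODE_MATRIX, threshold=1))  # (None, 0)
```

Raising `threshold` rejects more documents. The intended trade-off of a reject option is that the documents which are accepted are classified more accurately, at the cost of leaving some of them uncategorized.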

Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Armano, G., Chira, C., Hatami, N. (2012). Ensemble of Binary Learners for Reliable Text Categorization with a Reject Option. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7208. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28942-2_13

  • DOI: https://doi.org/10.1007/978-3-642-28942-2_13

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28941-5

  • Online ISBN: 978-3-642-28942-2

  • eBook Packages: Computer Science, Computer Science (R0)