A Two-Stage Classifier with Reject Option for Text Categorisation

  • Giorgio Fumera
  • Ignazio Pillai
  • Fabio Roli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3138)

Abstract

In this paper, we investigate the usefulness of the reject option in text categorisation systems. The reject option is introduced by allowing a text classifier to withhold the decision of whether or not to assign a document to any subset of categories for which that decision is not considered sufficiently reliable. To handle rejections automatically, a two-stage classifier architecture is used: documents rejected at the first stage are classified automatically at the second stage, so that no rejections remain in the end. The performance improvement achievable with the reject option is assessed on a real text categorisation task, using the well-known Reuters data set.
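The abstract describes the architecture only at a high level. The following is a minimal Python sketch of the idea, under stated assumptions: each category has a real-valued first-stage score, and two per-category thresholds (the hypothetical names `t_low` and `t_high`) define a band in which the assign/not-assign decision is withheld; every withheld decision is then settled by a second-stage scorer with a single threshold. The scorers, thresholds and toy data are illustrative assumptions, not the classifiers or rejection rule used by the authors.

```python
from typing import Callable, Dict, Set, Tuple

# A per-category scorer maps a document to a real-valued confidence score.
# Hypothetical interface: any trained binary text classifier could play this role.
Scorer = Callable[[str], float]


def first_stage(
    doc: str,
    scorers: Dict[str, Scorer],
    t_low: Dict[str, float],
    t_high: Dict[str, float],
) -> Tuple[Set[str], Set[str]]:
    """Apply a per-category reject band to one document.

    Returns (assigned, rejected): a category is assigned when its score is
    >= t_high, rejected (decision withheld) when t_low < score < t_high,
    and confidently not assigned when score <= t_low.
    """
    assigned: Set[str] = set()
    rejected: Set[str] = set()
    for cat, score_fn in scorers.items():
        s = score_fn(doc)
        if s >= t_high[cat]:
            assigned.add(cat)
        elif s > t_low[cat]:
            rejected.add(cat)
    return assigned, rejected


def two_stage_classify(
    doc: str,
    first_scorers: Dict[str, Scorer],
    t_low: Dict[str, float],
    t_high: Dict[str, float],
    second_scorers: Dict[str, Scorer],
    t_second: Dict[str, float],
) -> Set[str]:
    """First stage with reject option; the second stage decides every
    rejected category with a single threshold, so no rejections remain."""
    assigned, rejected = first_stage(doc, first_scorers, t_low, t_high)
    for cat in rejected:
        if second_scorers[cat](doc) >= t_second[cat]:
            assigned.add(cat)
    return assigned


# Toy example with hand-made scorers (purely illustrative, not trained models).
first = {
    "grain": lambda d: 0.9 if "wheat" in d else 0.1,
    "trade": lambda d: 0.5 if "export" in d else 0.0,
}
second = {
    "grain": lambda d: 0.0,
    "trade": lambda d: 1.0 if "tariff" in d else 0.0,
}
t_low = {"grain": 0.3, "trade": 0.3}
t_high = {"grain": 0.7, "trade": 0.7}
t_second = {"grain": 0.5, "trade": 0.5}

if __name__ == "__main__":
    for doc in ["wheat export tariff", "wheat export", "oil prices"]:
        labels = two_stage_classify(doc, first, t_low, t_high, second, t_second)
        print(doc, "->", sorted(labels))
```

Running the script prints `wheat export tariff -> ['grain', 'trade']`, `wheat export -> ['grain']` and `oil prices -> []`: in the first two documents the "trade" decision falls inside the reject band and is settled by the second stage. Widening the band between `t_low` and `t_high` withholds more decisions at the first stage and shifts more of the work to the second stage.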

Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Giorgio Fumera (1)
  • Ignazio Pillai (1)
  • Fabio Roli (1)
  1. Dept. of Electrical and Electronic Eng., University of Cagliari, Cagliari, Italy

Personalised recommendations