Selection Strategies for Multi-label Text Categorization

  • Arturo Montejo-Ráez
  • Luis Alfonso Ureña-López
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4139)


In multi-label text categorization, determining the final set of classes that will label a given document is not trivial. It requires deciding, first, whether each class is suitable for assignment to the text and, second, how many classes to assign. This paper studies different strategies for determining the size of the final set of assigned labels. We analyze several classification algorithms along with two main selection strategies: keeping a fixed number of top-ranked labels, or applying per-class thresholds. Our experiments show the effects of each approach and the issues to consider when using them.
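As a rough illustration of the two selection strategies contrasted above, the following Python sketch applies each one to the scores for a single document. It assumes a classifier has already produced one score per class. The function names, label names, scores, and threshold values are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def select_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Rank-based selection: keep the k highest-scoring labels.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(scores, key=lambda label: (-scores[label], label))
    return ranked[:k]


def select_by_threshold(
    scores: Dict[str, float],
    thresholds: Dict[str, float],
    default: float = 0.5,
) -> List[str]:
    """Threshold-based selection: keep each label whose score reaches
    that label's own threshold (hypothetical default of 0.5 if none is given).
    """
    return sorted(
        label
        for label, score in scores.items()
        if score >= thresholds.get(label, default)
    )


# Hypothetical classifier scores for one document.
doc_scores = {"physics": 0.91, "detectors": 0.62, "cosmology": 0.40, "software": 0.08}

# Hypothetical per-class thresholds, e.g. tuned on held-out data.
class_thresholds = {"physics": 0.5, "detectors": 0.7, "cosmology": 0.3, "software": 0.5}

top_two = select_top_k(doc_scores, k=2)
above_threshold = select_by_threshold(doc_scores, class_thresholds)

print(top_two)          # ['physics', 'detectors']
print(above_threshold)  # ['cosmology', 'physics']
```

The two strategies can disagree on the same scores: the fixed-size cut always returns exactly k labels, while per-class thresholds let the size of the label set vary from document to document.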


Keywords: Information Retrieval, Text Categorization, Rank Strategy, Global Ranking, Adaptive Selection





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Arturo Montejo-Ráez (1)
  • Luis Alfonso Ureña-López (1)
  1. Department of Computer Science, University of Jaén, Spain
