How Many Labels? Determining the Number of Labels in Multi-Label Text Classification

Azarbonyad, Hosein; Marx, Maarten

doi:10.1007/978-3-030-28577-7_11

Conference paper
First Online: 03 August 2019

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 11696)

Abstract

Multi-Label Text Classification (MLTC) is a supervised machine learning task in which the goal is to learn a classifier that assigns multiple labels to text documents. When all documents have the same number of labels, this task is very close to ordinary (single-label) text classification. When this number varies, however, an additional classifier must determine, for each document, how many labels to assign. This additional classifier is the topic of this paper. We compare several baselines to a system that learns a dynamic threshold for a given text classifier. The thresholding classifier receives as input the ranked list of label scores for a document and returns a threshold score. All labels with a score higher than this threshold are then assigned to the document. Our results show, first, that this dynamic thresholding significantly improves recall but has the same precision as a static system that assigns the same number of classes (the mean) to each document, and second, that the accuracy of predicting the number of classes is positively related to the quality of the text classifier, measured by MAP.
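The two label-assignment strategies contrasted in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy scores, and the hand-picked threshold of 0.5 are all assumptions made for illustration. In the paper the threshold is predicted per document by a learned thresholding classifier, which is not reproduced here.

```python
from typing import Dict, List, Sequence, Set


def mean_label_count(train_labels: Sequence[Set[str]]) -> int:
    """Static baseline: rounded mean number of labels per training document."""
    return max(1, round(sum(len(s) for s in train_labels) / len(train_labels)))


def assign_top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Assign the k highest-scoring labels (the same k for every document)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def assign_by_threshold(scores: Dict[str, float], threshold: float) -> List[str]:
    """Assign every label whose score is strictly higher than the threshold."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return [label for label in ranked if scores[label] > threshold]


# Toy data (hypothetical): label sets of three training documents and the
# classifier scores for one new document.
train_labels = [{"a", "b"}, {"a"}, {"b", "c", "d"}]
doc_scores = {"a": 0.91, "b": 0.85, "c": 0.80, "d": 0.12}

k = mean_label_count(train_labels)  # mean of 2, 1 and 3 labels is 2
static_labels = assign_top_k(doc_scores, k)

# 0.5 stands in for the output of a learned thresholding classifier.
dynamic_labels = assign_by_threshold(doc_scores, threshold=0.5)
```

With these toy scores the static baseline stops after two labels, while the threshold also admits the third label, whose score lies close to the first two. This is the kind of case in which a per-document threshold can raise recall.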

Author information

Authors and Affiliations

KLM Royal Dutch Airlines, Amsterdam, The Netherlands
Hosein Azarbonyad
University of Amsterdam, Amsterdam, The Netherlands
Maarten Marx

Corresponding author

Correspondence to Hosein Azarbonyad.

Editor information

Editors and Affiliations

Università della Svizzera italiana, Lugano, Switzerland
Fabio Crestani
Zurich University of Applied Sciences, Winterthur, Switzerland
Martin Braschler
University of Neuchâtel, Neuchâtel, Switzerland
Jacques Savoy
Technische Universität Wien, Vienna, Austria
Andreas Rauber
HES-SO Valais-Wallis, Sierre, Switzerland
Henning Müller
University of Santiago de Compostela, Santiago de Compostela, Spain
David E. Losada
Swiss Alliance for Data-Intensive Services, Thun, Switzerland
Gundula Heinatz Bürki
University of Padua, Padua, Italy
Linda Cappellato
University of Padua, Padua, Italy
Nicola Ferro

About this paper

Cite this paper

Azarbonyad, H., Marx, M. (2019). How Many Labels? Determining the Number of Labels in Multi-Label Text Classification. In: Crestani, F., et al. Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF 2019. Lecture Notes in Computer Science, vol 11696. Springer, Cham. https://doi.org/10.1007/978-3-030-28577-7_11

DOI: https://doi.org/10.1007/978-3-030-28577-7_11
Published: 03 August 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-28576-0
Online ISBN: 978-3-030-28577-7
eBook Packages: Computer Science, Computer Science (R0)
