MCut: A Thresholding Strategy for Multi-label Classification

Largeron, Christine; Moulin, Christophe; Géry, Mathias

doi:10.1007/978-3-642-34156-4_17

Conference paper

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 7619)

Abstract

Multi-label classification is a frequent task in machine learning, notably in text categorization. When binary classifiers are not suited, an alternative is to use a multiclass classifier that provides a score per category for each document, and then to apply a thresholding strategy to select the set of categories assigned to the document. The common thresholding strategies, such as the RCut, PCut and SCut methods, need a training step to determine the value of the threshold. To overcome this limitation, we propose a new strategy, called MCut, which automatically estimates a value for the threshold. This method does not have to be trained and does not need any parametrization. Experiments performed on two textual corpora, the XML Mining 2009 and RCV1 collections, show that the results of the MCut strategy are on par with the state of the art, while MCut is easy to implement and parameter-free.
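
The abstract does not spell out how the threshold is estimated, so the following is only a minimal sketch of score-based thresholding under stated assumptions. It contrasts a rank-based cut in the spirit of RCut, which needs a parameter k, with a parameter-free "maximum gap" rule: sort the scores of one document, find the largest difference between successive scores, and place the threshold midway between them. The function names `rcut` and `max_gap_cut`, the example categories and the handling of a single-category input are illustrative assumptions, not the authors' implementation.

```python
from typing import Dict, List


def rcut(scores: Dict[str, float], k: int) -> List[str]:
    """Rank-based cut: keep the k best-scored categories (k must be tuned)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def max_gap_cut(scores: Dict[str, float]) -> List[str]:
    """Parameter-free cut (assumed rule): threshold in the middle of the
    largest gap between successive sorted scores."""
    if len(scores) < 2:
        # Assumption: with fewer than two scores there is no gap to measure,
        # so every available category is kept.
        return list(scores)
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    values = [value for _, value in ranked]
    # Differences between each score and the next lower one.
    gaps = [values[i] - values[i + 1] for i in range(len(values) - 1)]
    # Index of the largest gap; ties resolve to the highest-ranked position.
    cut = max(range(len(gaps)), key=gaps.__getitem__)
    threshold = (values[cut] + values[cut + 1]) / 2
    return [label for label, value in ranked if value > threshold]


if __name__ == "__main__":
    # Hypothetical classifier scores for one document.
    document_scores = {"sports": 0.9, "politics": 0.85, "economy": 0.3, "science": 0.1}
    print(rcut(document_scores, k=1))
    print(max_gap_cut(document_scores))
```

With these hypothetical scores the largest gap lies between 0.85 and 0.3, so the gap-based rule keeps "sports" and "politics" without any tuned parameter, whereas the rank-based rule returns whatever number of categories k dictates.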

Author information

Authors and Affiliations

Université de Lyon, F-42023, Saint-Étienne, France
Christine Largeron, Christophe Moulin & Mathias Géry
Laboratoire Hubert Curien, CNRS UMR 5516, France
Christine Largeron, Christophe Moulin & Mathias Géry
Université de Saint-Étienne, Jean Monnet, F-42023, France
Christine Largeron, Christophe Moulin & Mathias Géry

Editor information

Editors and Affiliations

Department of Information and Computer Science, Aalto University School of Science, P.O. Box 15400, 00076, Aalto, Finland
Jaakko Hollmén
Department of Computer Science, Ostfalia University of Applied Sciences, Salzdahlumer Straße 46/48, 38302, Wolfenbüttel, Germany
Frank Klawonn
School of Information Systems, Computing and Mathematics, Brunel University, UB8 3PH, Uxbridge, Middlesex, UK
Allan Tucker

About this paper

Cite this paper

Largeron, C., Moulin, C., Géry, M. (2012). MCut: A Thresholding Strategy for Multi-label Classification. In: Hollmén, J., Klawonn, F., Tucker, A. (eds) Advances in Intelligent Data Analysis XI. IDA 2012. Lecture Notes in Computer Science, vol 7619. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-34156-4_17

DOI: https://doi.org/10.1007/978-3-642-34156-4_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-34155-7
Online ISBN: 978-3-642-34156-4
eBook Packages: Computer Science (R0)
