MCut: A Thresholding Strategy for Multi-label Classification

  • Christine Largeron
  • Christophe Moulin
  • Mathias Géry
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7619)

Abstract

Multi-label classification is a frequent task in machine learning, notably in text categorization. When binary classifiers are not suited, an alternative is to use a multiclass classifier that provides a score per category for each document, and then to apply a thresholding strategy that selects the set of categories to assign to the document. The common thresholding strategies, such as the RCut, PCut and SCut methods, need a training step to determine the value of the threshold. To overcome this limitation, we propose a new strategy, called MCut, which automatically estimates a value for the threshold. This method requires neither training nor parametrization. Experiments performed on two textual corpora, the XML Mining 2009 and RCV1 collections, show that MCut achieves results on par with the state of the art while being easy to implement and parameter-free.
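To make the idea of score thresholding concrete, the following minimal Python sketch contrasts a rank-based cut in the spirit of RCut, whose parameter k has to be tuned, with a parameter-free cut. The abstract does not spell out the MCut rule, so the second function rests on an assumption: that the threshold is placed at the midpoint of the largest gap between consecutive sorted scores. The function names, category names and score values are hypothetical, chosen only for illustration.

```python
from typing import Dict, List


def rcut(scores: Dict[str, float], k: int) -> List[str]:
    """Rank-based cut: keep the k best-scored categories (k must be tuned)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k]


def max_gap_cut(scores: Dict[str, float]) -> List[str]:
    """Assumed MCut-style rule: threshold at the midpoint of the largest gap
    between consecutive scores sorted in decreasing order."""
    ranked = sorted(scores.values(), reverse=True)
    if len(ranked) < 2:
        # With zero or one category there is no gap to cut at.
        return list(scores)
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    i = max(range(len(gaps)), key=gaps.__getitem__)  # index of the largest gap
    threshold = (ranked[i] + ranked[i + 1]) / 2
    # Categories scoring strictly above the threshold are assigned.
    return [category for category, s in scores.items() if s > threshold]


# Hypothetical per-category scores for one document.
doc_scores = {"sports": 0.9, "politics": 0.8, "economy": 0.2, "science": 0.1}
print(rcut(doc_scores, k=1))
print(max_gap_cut(doc_scores))
```

In this toy example the largest gap lies between 0.8 and 0.2, so the gap-based cut keeps the two top-scored categories without any tuned parameter, whereas the rank-based cut returns exactly k categories whatever the score distribution looks like.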

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Christine Largeron (1, 2, 3)
  • Christophe Moulin (1, 2, 3)
  • Mathias Géry (1, 2, 3)
  1. Université de Lyon, Saint-Étienne, France
  2. Laboratoire Hubert Curien, CNRS UMR 5516, France
  3. Université de Saint-Étienne, France
