Classifier chains for multi-label classification

Abstract

The widely known binary relevance method for multi-label classification, which treats each label as an independent binary problem, has often been dismissed in the literature because it does not directly model correlations between labels. Most current methods invest considerable complexity in modelling these interdependencies. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without sacrificing scalability to large datasets. We exemplify this with a novel classifier chains method that models label correlations while maintaining acceptable computational complexity, and we extend the approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets and a variety of evaluation metrics. The results demonstrate that the chaining method is competitive with related and state-of-the-art methods in both predictive performance and time complexity.
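The chaining idea summarised in the abstract can be sketched in a few lines: train one binary classifier per label, feeding each classifier the original features plus the labels handled earlier in the chain. The following is a minimal illustrative re-implementation, not the authors' code; the `ClassifierChain` class name, the logistic-regression base learner, and the fixed column-order chaining are choices made here for the sketch.

```python
# A minimal, from-scratch sketch of a classifier chain: one binary model
# per label, where each link's feature vector is augmented with the labels
# (at training time) or predictions (at test time) of all earlier links.
import numpy as np
from sklearn.linear_model import LogisticRegression


class ClassifierChain:
    """Chain of binary classifiers for multi-label prediction."""

    def __init__(self, make_base=lambda: LogisticRegression(max_iter=1000)):
        self.make_base = make_base  # factory for the base binary learner
        self.models = []

    def fit(self, X, Y):
        # Y is an (n_samples, n_labels) 0/1 matrix; labels are chained
        # in column order here (random orders are also possible).
        X_aug = np.asarray(X, dtype=float)
        self.models = []
        for j in range(Y.shape[1]):
            model = self.make_base().fit(X_aug, Y[:, j])
            self.models.append(model)
            # true label j becomes an extra feature for later links,
            # which is how the chain captures label correlations
            X_aug = np.hstack([X_aug, Y[:, j:j + 1]])
        return self

    def predict(self, X):
        X_aug = np.asarray(X, dtype=float)
        preds = []
        for model in self.models:
            p = model.predict(X_aug).reshape(-1, 1)
            preds.append(p)
            # predictions propagate down the chain at test time
            X_aug = np.hstack([X_aug, p])
        return np.hstack(preds)
```

The ensemble framework mentioned in the abstract builds on this by training several such chains, each with a different random label order and training sample, and thresholding their summed per-label votes.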


Author information

Corresponding author

Correspondence to Jesse Read.

Additional information

Editor: Carla Brodley.

Cite this article

Read, J., Pfahringer, B., Holmes, G. et al. Classifier chains for multi-label classification. Mach Learn 85, 333 (2011). https://doi.org/10.1007/s10994-011-5256-5

Keywords

  • Multi-label classification
  • Problem transformation
  • Ensemble methods
  • Scalable methods