Machine Learning, 85:333

Classifier chains for multi-label classification

  • Jesse Read
  • Bernhard Pfahringer
  • Geoff Holmes
  • Eibe Frank


The widely known binary relevance method for multi-label classification, which considers each label as an independent binary problem, has often been overlooked in the literature due to the perceived inadequacy of not directly modelling label correlations. Most current methods invest considerable complexity to model interdependencies between labels. This paper shows that binary relevance-based methods have much to offer, and that high predictive performance can be obtained without impeding scalability to large datasets. We exemplify this with a novel classifier chains method that can model label correlations while maintaining acceptable computational complexity. We extend this approach further in an ensemble framework. An extensive empirical evaluation covers a broad range of multi-label datasets with a variety of evaluation metrics. The results illustrate the competitiveness of the chaining method against related and state-of-the-art methods, both in terms of predictive performance and time complexity.
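The contrast drawn above between binary relevance and classifier chains can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the 1-nearest-neighbour base learner, the function names, and the toy data are all hypothetical choices made for brevity. Binary relevance trains one binary classifier per label on the input features alone, whereas a chain extends the input of the j-th classifier with the preceding labels (true labels during training, predicted labels at prediction time), which is how it can pick up label correlations.

```python
from typing import List, Sequence


class OneNearestNeighbour:
    """Toy binary base learner (an illustrative assumption, not the paper's choice)."""

    def fit(self, X: Sequence[Sequence[float]], y: Sequence[int]) -> "OneNearestNeighbour":
        self.X = [list(row) for row in X]
        self.y = list(y)
        return self

    def predict_one(self, x: Sequence[float]) -> int:
        def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
            return sum((p - q) ** 2 for p, q in zip(a, b))

        # Index of the closest stored training point.
        best = min(range(len(self.X)), key=lambda i: sq_dist(self.X[i], x))
        return self.y[best]


def fit_binary_relevance(X, Y) -> List[OneNearestNeighbour]:
    """One independent binary model per label, trained on the features only."""
    n_labels = len(Y[0])
    return [OneNearestNeighbour().fit(X, [row[j] for row in Y]) for j in range(n_labels)]


def predict_binary_relevance(models, x) -> List[int]:
    return [m.predict_one(x) for m in models]


def fit_classifier_chain(X, Y) -> List[OneNearestNeighbour]:
    """Model j is trained on the features plus the true values of labels 0..j-1."""
    n_labels = len(Y[0])
    chain = []
    for j in range(n_labels):
        X_aug = [list(x) + list(y[:j]) for x, y in zip(X, Y)]
        chain.append(OneNearestNeighbour().fit(X_aug, [y[j] for y in Y]))
    return chain


def predict_classifier_chain(chain, x) -> List[int]:
    """Each prediction is appended to the input of the next model in the chain."""
    predicted: List[int] = []
    for model in chain:
        predicted.append(model.predict_one(list(x) + predicted))
    return predicted


# Hypothetical toy data: two features, three labels (the third is the XOR of the first two).
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
Y = [[0, 0, 0], [0, 1, 1], [1, 0, 1], [1, 1, 0]]

br_models = fit_binary_relevance(X, Y)
chain = fit_classifier_chain(X, Y)

print(predict_binary_relevance(br_models, [0.9, 0.1]))
print(predict_classifier_chain(chain, [0.9, 0.1]))
```

Both approaches train the same number of binary models, one per label; the chain differs only in the extra label attributes it passes along, which is the reason the added cost over binary relevance stays modest in this sketch.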


Keywords: Multi-label classification; Problem transformation; Ensemble methods; Scalable methods


Copyright information

© The Author(s) 2011

Authors and Affiliations

  • Jesse Read (1, 2)
  • Bernhard Pfahringer (1)
  • Geoff Holmes (1)
  • Eibe Frank (1)

  1. Department of Computer Science, The University of Waikato, Hamilton, New Zealand
  2. Department of Signal Theory and Communications, Universidad Carlos III, Madrid, Spain
