Journal of Intelligent Information Systems

, Volume 47, Issue 1, pp 57–90 | Cite as

The use of data-derived label hierarchies in multi-label classification

  • Gjorgji MadjarovEmail author
  • Dejan Gjorgjevikj
  • Ivica Dimitrovski
  • Sašo Džeroski


Instead of traditional (multi-class) learning approaches that assume label independency, multi-label learning approaches must deal with the existing label dependencies and relations. Many approaches try to model these dependencies in the process of learning and integrate them in the final predictive model, without making a clear difference between the learning process and the process of modeling the label dependencies. Also, the label relations incorporated in the learned model are not directly visible and can not be (re)used in conjunction with other learning approaches. In this paper, we investigate the use of label hierarchies in multi-label classification, constructed in a data-driven manner. We first consider flat label sets and construct label hierarchies from the label sets that appear in the annotations of the training data by using a hierarchical clustering approach. The obtained hierarchies are then used in conjunction with hierarchical multi-label classification (HMC) approaches (two local model approaches for HMC, based on SVMs and PCTs, and two global model approaches, based on PCTs for HMC and ensembles thereof). The experimental results reveal that the use of the data-derived label hierarchy can significantly improve the performance of single predictive models in multi-label classification as compared to the use of a flat label set, while this is not preserved for the ensemble models.


Multi-label Hierarchical Classification Ranking Learning 



We would like to acknowledge the support of the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944).


  1. Bauer, E., & Kohavi, R. (1999). An empirical comparison of voting classification algorithms: Bagging, boosting, and variants. Machine Learning, 36(1), 105–139.CrossRefGoogle Scholar
  2. Blockeel, H., Raedt, L.D., & Ramon, J. (1998). Top-down induction of clustering trees. In Proceedings of the 15th international conference on machine learning (pp. 55–63).Google Scholar
  3. Boutell, M.R., Luo, J., Shen, X., & Brown, C.M. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771.CrossRefGoogle Scholar
  4. Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140.MathSciNetzbMATHGoogle Scholar
  5. Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32.CrossRefzbMATHGoogle Scholar
  6. Breiman, L., Friedman, J., Olshen, R., & Stone, C.J. (1984). Classification and regression trees. Chapman & Hall/CRC.Google Scholar
  7. Brinker, K., Fürnkranz, J., & Hüllermeier, E. (2006). A unified model for multilabel classification and ranking. In Proceedings of the 2006 conference on ECAI 2006: 17th european conference on artificial intelligence August 29 – September 1, 2006, Riva del Garda, Italy (pp. 489–493).Google Scholar
  8. Chang, C.C., & Lin, C.J. (2001). LIBSVM: A library for support vector machines. Software available at
  9. Clare, A., & King, R.D. (2001). Knowledge discovery in multi-label phenotype data. In Proceedings of the 5th european conference on PKDD (pp. 42–53).Google Scholar
  10. Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 1–30.MathSciNetzbMATHGoogle Scholar
  11. Duygulu, P., Barnard, K., de Freitas, J., & Forsyth, D. (2002). Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In Proceedings of the 7th european conference on computer vision (pp. 349–354).Google Scholar
  12. Elisseeff, A., & Weston, J. (2005). A kernel method for Multi-Labelled classification. In Proceedings of the annual ACM conference on research and development in information retrieval (pp. 274–281).Google Scholar
  13. Friedman, M. (1940). A comparison of alternative tests of significance for the problem of m rankings. Annals of Mathematical Statistics, 11, 86–92.MathSciNetCrossRefzbMATHGoogle Scholar
  14. Gibaja, E., & Ventura, S. (2015). A tutorial on multilabel learning. ACM Computing Surveys, 47(3), 52:1–52:38.CrossRefGoogle Scholar
  15. Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., & Witten, I.H. (2009). The weka data mining software: an update. SIGKDD Explorations, 11, 10–18.CrossRefGoogle Scholar
  16. Katakis, I., Tsoumakas, G., & Vlahavas, I. (2008). Multilabel text classification for automated tag suggestion. In Proceedings of the ECML/PKDD discovery challenge (pp. 124–135).Google Scholar
  17. Klimt, B., & Yang, Y. (2004). The enron corpus: a new dataset for email classification research. In Proceedings of the 15th european conference on machine learning (pp. 217–226).Google Scholar
  18. Kocev, D. (2011). Ensembles for predicting structured outputs. Ph.D. thesis, IPS Jožef Stefan, Ljubljana, Slovenia.Google Scholar
  19. Kocev, D., Vens, C., Struyf, J., & Džeroski, S. (2007). Ensembles of multi-objective decision trees. In Proceedings of the 18th european conference on machine learning (pp. 624–631).Google Scholar
  20. Kocev, D., Vens, C., Struyf, J., & Džeroski, S. (2013). Tree ensembles for predicting structured outputs. Pattern Recognition, 46(3), 817–833.CrossRefGoogle Scholar
  21. Kong, X., & Yu, P.S. (2011). An ensemble-based approach to fast classification of multilabel data streams. In Proceedings of the 7th international conference on collaborative computing: Networking, Applications and Worksharing (pp. 95–104).Google Scholar
  22. Levatić, J., Kocev, D., & Džeroski, S. (2014). The importance of the label hierarchy in hierarchical multi-label classification. Journal of Intelligent Information Systems, 45(2), 247–271.CrossRefGoogle Scholar
  23. Li, P., Li, H., & Wu, M. (2013). Multi-label ensemble based on variable pairwise constraint projection. Information Sciences, 222(0), 269–281.MathSciNetCrossRefGoogle Scholar
  24. Madjarov, G., Dimitrovski, I., Gjorgjevikj, D., & Deroski, S. (2015). Evaluation of different data-derived label hierarchies in multi-label classification. In New frontiers in mining complex patterns, lecture notes in computer science, (Vol. 8983 pp. 19–37): Springer international publishing.Google Scholar
  25. Madjarov, G., Kocev, D., Gjorgjevikj, D., & Dzeroski, S. (2012). An extensive experimental comparison of methods for multi-label learning. Pattern Recognition, 45 (9), 3084–3104.CrossRefGoogle Scholar
  26. Nemenyi, P.B. (1963). Distribution-free multiple comparisons. Ph.D. thesis, Princeton University.Google Scholar
  27. Quinlan, J.R. (1993). C4.5: Programs for machine learning (Morgan Kaufmann series in machine learning) morgan kaufmann.Google Scholar
  28. Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2009). Classifier chains for multi-label classification. In Proceedings of the 20th european conference on machine learning (pp. 254–269).Google Scholar
  29. Silla Carlos, N.J., & Freitas, A. (2011). A survey of hierarchical classification across different application domains. Data Mining and Knowledge Discovery, 22, 31–72.MathSciNetCrossRefzbMATHGoogle Scholar
  30. Snoek, C.G.M., Worring, M., van Gemert, J.C., Geusebroek, J.M., & Smeulders, A.W.M. (2006). The challenge problem for automated detection of 101 semantic concepts in multimedia. In Proceedings of the 14th annual ACM international conference on multimedia (pp. 421–430).Google Scholar
  31. Srivastava, A., & Zane-Ulman, B. (2005). Discovering recurring anoMalies in text reports regarding complex space systems. In Proceedings of the IEEE aerospace conference (pp. 55–63).Google Scholar
  32. Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2008). Multilabel classification of music into emotions. In Proceedings of the 9th international conference on music information retrieval (pp. 320–330).Google Scholar
  33. Tsoumakas, G., & Katakis, I. (2007). Multi label classification: an overview. International Journal of Data Warehouse and Mining, 3(3), 1–13.CrossRefGoogle Scholar
  34. Tsoumakas, G., Katakis, I., & Vlahavas, I. (2008). Effective and efficient multilabel classification in domains with large number of labels. In Proceedings of the ECML/PKDD workshop on mining multidimensional data (pp. 30–44).Google Scholar
  35. Tsoumakas, G., & Vlahavas, I. (2007). Random k-labelsets: an ensemble method for multilabel classification. In Proceedings of the 18th european conference on machine learning (pp. 406–417).Google Scholar
  36. Vens, C., Struyf, J., Schietgat, L., Džeroski, S., & Blockeel, H. (2008). Decision trees for hierarchical multi-label classification. Machine Learning, 73 (2), 185–214.CrossRefGoogle Scholar
  37. Zhang, M.L., & Zhou, Z.H. (2014). A review on multi-label learning algorithms. IEEE Transactions on Knowledge and Data Engineering, 26(8), 1819–1837.CrossRefGoogle Scholar

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Gjorgji Madjarov
    • 1
    Email author
  • Dejan Gjorgjevikj
    • 1
  • Ivica Dimitrovski
    • 1
  • Sašo Džeroski
    • 2
  1. 1.Faculty of Computer Science and EngineeringSs. Cyril and Methodius UniversitySkopjeMacedonia
  2. 2.Department of Knowledge TechnologiesJožef Stefan InstituteLjubljanaSlovenia

Personalised recommendations