Advertisement

Evaluation of Different Data-Derived Label Hierarchies in Multi-label Classification

  • Gjorgji Madjarov
  • Ivica Dimitrovski
  • Dejan Gjorgjevikj
  • Sašo Džeroski
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8983)

Abstract

Motivated by an increasing number of new applications, the research community is devoting an increasing amount of attention to the task of multi-label classification (MLC). Many different approaches to solving multi-label classification problems have been recently developed. Recent empirical studies have comprehensively evaluated many of these approaches on many datasets using different evaluation measures. The studies have indicated that the predictive performance and efficiency of the approaches could be improved by using data derived (artificial) hierarchies, in the learning and prediction phases. In this paper, we compare different clustering algorithms for constructing the label hierarchies (in a data-driven manner), in multi-label classification. We consider flat label sets and construct the label hierarchies from the label sets that appear in the annotations of the training data by using four different clustering algorithms (balanced \(k\)-means, agglomerative clustering with single and complete linkage and predictive clustering trees). The hierarchies are then used in conjunction with global hierarchical multi-label classification (HMC) approaches. The results from the statistical and experimental evaluation reveal that the data-derived label hierarchies used in conjunction with global HMC methods greatly improve the performance of MLC methods. Additionally, multi-branch hierarchies appear much more suitable for the global HMC approaches as compared to the binary hierarchies.

Keywords

Multi-label Hierarchical Classification Clustering 

Notes

Acknowledgements

We would like to acknowledge the support of the European Commission through the project MAESTRA - Learning from Massive, Incompletely annotated, and Structured Data (Grant number ICT-2013-612944). Also, we would like to acknowledge the support of the Faculty of Computer Science and Engineering at the “Ss. Cyril and Methodius” University.

References

  1. 1.
    Madjarov, G., Kocev, D., Gjorgjevikj, D., Dzeroski, S.: An extensive experimental comparison of methods for multi-label learning. Pattern Recogn. 45(9), 3084–3104 (2012)CrossRefGoogle Scholar
  2. 2.
    Tsoumakas, G., Katakis, I., Vlahavas, I.: Effective and efficient multilabel classification in domains with large number of labels. In: Proceedings of the ECML/PKDD Workshop on Mining Multidimensional Data, pp. 30–44 (2008)Google Scholar
  3. 3.
    Kocev, D.: Ensembles for predicting structured outputs. Ph.D. thesis, IPS Jožef Stefan, Ljubljana, Slovenia (2011)Google Scholar
  4. 4.
    Tsoumakas, G., Katakis, I.: Multi label classification: an overview. Int. J. Data Warehouse Min. 3(3), 1–13 (2007)CrossRefGoogle Scholar
  5. 5.
    Mencía, E.L., Park, S.H., Fürnkranz, J.: Efficient voting prediction for pairwise multilabel classification. Neurocomputing 73, 1164–1176 (2010)CrossRefGoogle Scholar
  6. 6.
    Blockeel, H., Raedt, L.D., Ramon, J.: Top-down induction of clustering trees. In: Proceedings of the 15th International Conference on Machine Learning, pp. 55–63 (1998)Google Scholar
  7. 7.
    Vens, C., Struyf, J., Schietgat, L., Džeroski, S., Blockeel, H.: Decision trees for hierarchical multi-label classification. Mach. Learn. 73(2), 185–214 (2008)CrossRefGoogle Scholar
  8. 8.
    Manning, C.D., Raghavan, P., Schütze, H.: An Introduction to Information Retrieval. Cambridge University Press, Cambridge (2009)CrossRefGoogle Scholar
  9. 9.
    Kocev, D., Vens, C., Struyf, J., Džeroski, S.: Tree ensembles for predicting structured outputs. Pattern Recogn. 46(3), 817–833 (2013)CrossRefGoogle Scholar
  10. 10.
    de Carvalho, A.C.P.L.F., Freitas, A.A.: A tutorial on multi-label classification techniques. In: Abraham, A., Hassanien, A.-E., Snášel, V. (eds.) Foundations of Comput. Intel. Vol. 5. SCI, vol. 205, pp. 177–195. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  11. 11.
    Tsoumakas, G., Katakis, I., Vlahavas, I.: Mining multi-label data. In: Maimon, O., Rokach, L. (eds.) Data Mining and Knowledge Discovery Handbook, pp. 667–685. Springer, Heidelberg (2010) Google Scholar
  12. 12.
    Silla Jr., C.N., Freitas, A.: A survey of hierarchical classification across different application domains. Data Min. Knowl. Dis. 22, 31–72 (2011)CrossRefMATHMathSciNetGoogle Scholar
  13. 13.
    Dimitrovski, I., Kocev, D., Loskovska, S., Džeroski, S.: Fast and scalable image retrieval using predictive clustering trees. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds.) DS 2013. LNCS, vol. 8140, pp. 33–48. Springer, Heidelberg (2013) CrossRefGoogle Scholar
  14. 14.
    Levatić, J., Kocev, D., Džeroski, S.: The use of the label hierarchy in HMC improves performance: a case study in predicting community structure in ecology. In: Proceedings of the Workshop on New Frontiers in Mining Complex Patterns held in Conjunction with ECML/PKDD2013, pp. 189–201 (2013)Google Scholar
  15. 15.
    Trohidis, K., Tsoumakas, G., Kalliris, G., Vlahavas, I.: Multilabel classification of music into emotions. In: Proceedings of the 9th International Conference on Music Information Retrieval, pp. 320–330 (2008)Google Scholar
  16. 16.
    Boutell, M.R., Luo, J., Shen, X., Brown, C.M.: Learning multi-label scene classification. Pattern Recogn. 37(9), 1757–1771 (2004)CrossRefGoogle Scholar
  17. 17.
    Elisseeff, A., Weston, J.: A kernel method for multi-labelled classification. In: Proceedings of the Annual ACM Conference on Research and Development in Information Retrieval, pp. 274–281 (2005)Google Scholar
  18. 18.
    Read, J., Pfahringer, B., Holmes, G., Frank, E.: Classifier chains for multi-label classification. In: Buntine, W., Grobelnik, M., Mladenić, D., Shawe-Taylor, J. (eds.) ECML PKDD 2009, Part II. LNCS, vol. 5782, pp. 254–269. Springer, Heidelberg (2009) CrossRefGoogle Scholar
  19. 19.
    Klimt, B., Yang, Y.: The enron corpus: a new dataset for email classification research. In: Boulicaut, J.-F., Esposito, F., Giannotti, F., Pedreschi, D. (eds.) ECML 2004. LNCS (LNAI), vol. 3201, pp. 217–226. Springer, Heidelberg (2004) CrossRefGoogle Scholar
  20. 20.
    Duygulu, P., Barnard, K., de Freitas, J.F.G., Forsyth, D.: Object recognition as machine translation: learning a lexicon for a fixed image vocabulary. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part IV. LNCS, vol. 2353, pp. 97–112. Springer, Heidelberg (2002) CrossRefGoogle Scholar
  21. 21.
    Srivastava, A., Zane-Ulman, B.: Discovering recurring anomalies in text reports regarding complex space systems. In: Proceedings of the IEEE Aerospace Conference, pp. 55–63 (2005)Google Scholar
  22. 22.
    Snoek, C.G.M., Worring, M., van Gemert, J.C., Geusebroek, J.M., Smeulders, A.W.M.: The challenge problem for automated detection of 101 semantic concepts in multimedia. In: Proceedings of the 14th Annual ACM International Conference on Multimedia, pp. 421–430 (2006)Google Scholar
  23. 23.
    Katakis, I., Tsoumakas, G., Vlahavas, I.: Multilabel text classification for automated tag suggestion. In: Proceedings of the ECML/PKDD Discovery Challenge (2008)Google Scholar
  24. 24.
    Friedman, M.: A comparison of alternative tests of significance for the problem of m rankings. Ann. Math. Stat. 11, 86–92 (1940)CrossRefGoogle Scholar
  25. 25.
    Nemenyi, P.B.: Distribution-free multiple comparisons. Ph.D. thesis, Princeton University (1963)Google Scholar
  26. 26.
    Demšar, J.: Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. 7, 1–30 (2006)MATHMathSciNetGoogle Scholar
  27. 27.
    Pearson, E.S., Hartley, H.O.: Biometrika Tables for Statisticians, vol. 1. Cambridge University Press, Cambridge (1966) MATHGoogle Scholar

Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Gjorgji Madjarov
    • 1
  • Ivica Dimitrovski
    • 1
  • Dejan Gjorgjevikj
    • 1
  • Sašo Džeroski
    • 2
  1. 1.Faculty of Computer Science and EngineeringSs. Cyril and Methodius UniversitySkopjeMacedonia
  2. 2.Jožef Stefan InstituteLjubljanaSlovenia

Personalised recommendations