Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification, each using the hierarchy in a different way, but none provides a unified view of the problem. This paper studies the problem of evaluation in hierarchical classification by analysing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the-art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behaviour of existing approaches and show how the proposed methods overcome most of these problems across a range of cases.
Without loss of generality, we assume a subclass-of relationship among the classes, but in some cases a different relationship may hold, for example part-of. We assume, however, that the three properties always hold for the relationship.
The tool is available from http://nlp.cs.aueb.gr/software_and_datasets/HEMKit.zip.
Responsible editor: Chih-Jen Lin.
Kosmopoulos, A., Partalas, I., Gaussier, E. et al. Evaluation measures for hierarchical classification: a unified view and novel approaches. Data Min Knowl Disc 29, 820–865 (2015). https://doi.org/10.1007/s10618-014-0382-x
- Evaluation measures
- Hierarchical classification
- Tree-structured class hierarchies
- DAG-structured class hierarchies