Evaluation measures for hierarchical classification: a unified view and novel approaches


Hierarchical classification addresses the problem of classifying items into a hierarchy of classes. An important issue in hierarchical classification is the evaluation of different classification algorithms, which is complicated by the hierarchical relations among the classes. Several evaluation measures have been proposed for hierarchical classification; they use the hierarchy in different ways, but none provides a unified view of the problem. This paper studies the problem of evaluation in hierarchical classification by analysing and abstracting the key components of the existing performance measures. It also proposes two alternative generic views of hierarchical evaluation and introduces two corresponding novel measures. The proposed measures, along with the state-of-the-art ones, are empirically tested on three large datasets from the domain of text classification. The empirical results illustrate the undesirable behaviour of existing approaches and show how the proposed methods overcome most of these problems across a range of cases.
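As a rough illustration of how an evaluation measure can take the hierarchy into account, one existing family of set-based measures extends both the true and the predicted class sets with their ancestors before computing precision and recall. The sketch below assumes that formulation; it is not one of the two measures proposed in this paper, and the toy hierarchy, class names and function names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical toy hierarchy: each class maps to its set of parents.
# Class "d" has two parents, so this is a DAG rather than a tree.
PARENTS: Dict[str, Set[str]] = {
    "root": set(),
    "a": {"root"},
    "b": {"root"},
    "c": {"a"},
    "d": {"a", "b"},
}


def with_ancestors(classes: Iterable[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Return the given classes together with all of their ancestors."""
    closed: Set[str] = set()
    stack = list(classes)
    while stack:
        node = stack.pop()
        if node in closed:
            continue
        closed.add(node)
        stack.extend(parents.get(node, ()))
    return closed


def hierarchical_prf(
    true: Iterable[str],
    pred: Iterable[str],
    parents: Dict[str, Set[str]],
    root: str = "root",
) -> Tuple[float, float, float]:
    """Set-based precision, recall and F1 over ancestor-augmented class sets.

    The root is excluded because every class shares it, so counting it
    would give credit to any prediction at all.
    """
    t = with_ancestors(true, parents) - {root}
    p = with_ancestors(pred, parents) - {root}
    overlap = len(t & p)
    precision = overlap / len(p) if p else 0.0
    recall = overlap / len(t) if t else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # True class "c"; predicting "d" shares the ancestor "a", predicting "b" shares nothing.
    print(hierarchical_prf({"c"}, {"d"}, PARENTS))
    print(hierarchical_prf({"c"}, {"b"}, PARENTS))
```

A flat measure would treat both predictions in the usage example as equally wrong, whereas this sketch gives partial credit to the prediction that shares an ancestor with the true class.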




  1. Without loss of generality, we assume a subclass-of relationship among the classes, but in some cases a different relationship may hold, for example part-of. We assume, however, that the three properties always hold for the relationship.

     Fig. 1 A tree and a DAG class hierarchy

  3. The tool is available from http://nlp.cs.aueb.gr/software_and_datasets/HEMKit.zip.

  7. http://www.ncbi.nlm.nih.gov/pubmed and http://www.ncbi.nlm.nih.gov/mesh.



Author information



Corresponding author

Correspondence to Aris Kosmopoulos.

Additional information

Responsible editor: Chih-Jen Lin.

About this article

Cite this article

Kosmopoulos, A., Partalas, I., Gaussier, E. et al. Evaluation measures for hierarchical classification: a unified view and novel approaches. Data Min Knowl Disc 29, 820–865 (2015). https://doi.org/10.1007/s10618-014-0382-x


  • Evaluation
  • Evaluation measures
  • Hierarchical classification
  • Tree-structured class hierarchies
  • DAG-structured class hierarchies