Abstract
Hierarchical multi-label classification (HMC) is a variant of classification where instances may belong to multiple classes at the same time and these classes are organized in a hierarchy. This article presents several approaches to the induction of decision trees for HMC, as well as an empirical study of their use in functional genomics. We compare learning a single HMC tree (which makes predictions for all classes together) to two approaches that learn a set of regular classification trees (one for each class). The first approach defines an independent single-label classification task for each class (SC). Obviously, the hierarchy introduces dependencies between the classes. While they are ignored by the first approach, they are exploited by the second approach, named hierarchical single-label classification (HSC). Depending on the application at hand, the hierarchy of classes can be such that each class has at most one parent (tree structure) or such that classes may have multiple parents (DAG structure). The latter case has not been considered before and we show how the HMC and HSC approaches can be modified to support this setting. We compare the three approaches on 24 yeast data sets using as classification schemes MIPS’s FunCat (tree structure) and the Gene Ontology (DAG structure). We show that HMC trees outperform HSC and SC trees along three dimensions: predictive accuracy, model size, and induction time. We conclude that HMC trees should definitely be considered in HMC tasks where interpretable models are desired.
Article PDF
Similar content being viewed by others
References
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., & Lipman, D. J. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research, 25, 3389–3402.
Ashburner, M. et al. (2000). Gene Ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature Genetics, 25(1), 25–29.
Barutcuoglu, Z., Schapire, R. E., & Troyanskaya, O. G. (2006). Hierarchical multi-label prediction of gene function. Bioinformatics, 22(7), 830–836.
Blockeel, H., Bruynooghe, M., Džeroski, S., Ramon, J., & Struyf, J. (2002). Hierarchical multi-classification. In Proceedings of the ACM SIGKDD 2002 workshop on multi-relational data mining (MRDM 2002) (pp. 21–35).
Blockeel, H., De Raedt, L., & Ramon, J. (1998). Top-down induction of clustering trees. In Proceedings of the 15th international conference on machine learning (pp. 55–63).
Blockeel, H., Džeroski, S., & Grbović, J. (1999). Simultaneous prediction of multiple chemical parameters of river water quality with Tilde. In Proceedings of the 3rd European conference on principles of data mining and knowledge discovery (pp. 32–40).
Blockeel, H., Schietgat, L., Struyf, J., Džeroski, S., & Clare, A. (2006). Decision trees for hierarchical multilabel classification: a case study in functional genomics. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (pp. 18–29).
Breiman, L., Friedman, J. H., Olshen, R. A., & Stone, C. J. (1984). Classification and regression trees. Belmont: Wadsworth.
Cesa-Bianchi, N., Gentile, C., & Zaniboni, L. (2006). Incremental algorithms for hierarchical classification. Journal of Machine Learning Research, 7, 31–54.
Chu, S., DeRisi, J., Eisen, M., Mulholland, J., Botstein, D., Brown, P., & Herskowitz, I. (1998). The transcriptional program of sporulation in budding yeast. Science, 282, 699–705.
Clare, A. (2003). Machine learning and data mining for yeast functional genomics. PhD thesis, University of Wales, Aberystwyth.
Clare, A., & King, R. D. (2001). Knowledge discovery in multi-label phenotype data. In 5th European conference on principles of data mining and knowledge discovery (pp. 42–53).
Davis, J., & Goadrich, M. (2006), The relationship between precision-recall and ROC curves. In Proceedings of the 23rd international conference on machine learning (pp. 233–240)
Demšar, D., Džeroski, S., Larsen, T., Struyf, J., Axelsen, J., Bruus Pedersen, M., & Henning Krogh, P. (2006). Using multi-objective classification to model communities of soil microarthropods. Ecological Modelling, 191(1), 131–143.
DeRisi, J., Iyer, V., & Brown, P. (1997). Exploring the metabolic and genetic control of gene expression on a genomic scale. Science, 278, 680–686.
Džeroski, S., Slavkov, I., Gjorgjioski, V., & Struyf, J. (2006). Analysis of time series data with predictive clustering trees. In Proceedings of the 5th international workshop on knowledge discovery in inductive databases (pp. 47–58).
Eisen, M., Spellman, P., Brown, P., & Botstein, D. (1998). Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences of the USA, 95, 14863–14868.
Expasy (2008). ProtParam. http://www.expasy.org/tools/protparam.html.
Gasch, A., Huang, M., Metzner, S., Botstein, D., Elledge, S., & Brown, P. (2001). Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Molecular Biology of the Cell, 12(10), 2987–3000.
Gasch, A., Spellman, P., Kao, C., Carmel-Harel, O., Eisen, M., Storz, G., Botstein, D., & Brown, P. (2000). Genomic expression program in the response of yeast cells to environmental changes. Molecular Biology of the Cell, 11, 4241–4257.
Geurts, P., Wehenkel, L., & d’Alché-Buc, F. (2006). Kernelizing the output of tree-based methods. In Proceedings of the 23th international conference on machine learning (pp. 345–352)
Koller, D., & Sahami, M. (1997). Hierarchically classifying documents using very few words. In Proceedings of the 14th international conference on machine learning (pp. 170–178).
Kumar, A., Cheung, K. H., Ross-Macdonald, P., Coelho, P. S. R., Miller, P., & Snyder, M. (2000). TRIPLES: a database of gene function in S. cerevisiae. Nucleic Acids Research, 28, 81–84.
Mewes, H. W., Heumann, K., Kaps, A., Mayer, K., Pfeiffer, F., Stocker, S., & Frishman, D. (1999). MIPS: a database for protein sequences and complete genomes. Nucl. Acids Research, 27, 44–48.
Oliver, S. (1996). A network approach to the systematic analysis of yeast gene function. Trends in Genetics, 12(7), 241–242.
Ouali, M., & King, R. D. (2000). Cascaded multiple classifiers for secondary structure prediction. Protein Science, 9(6), 1162–1176.
Quinlan, J. R. (1993). C4.5: Programs for machine learning. San Mateo: Morgan Kaufmann.
Roth, F., Hughes, J., Estep, P., & Church, G. (1998). Finding DNA regulatory motifs within unaligned noncoding sequences clustered by whole-genome mRNA quantitation. Nature Biotechnology, 16, 939–945.
Rousu, J., Saunders, C., Szedmak, S., & Shawe-Taylor, J. (2006). Kernel-based learning of hierarchical multilabel classification models. Journal of Machine Learning Research, 7, 1601–1626.
Spellman, P., Sherlock, G., Zhang, M., Iyer, V., Anders, K., Eisen, M., Brown, P., Botstein, D., & Futcher, B. (1998). Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Molecular Biology of the Cell, 9, 3273–3297.
Stenger, B., Thayananthan, A., Torr, P., & Cipolla, R. (2007). Estimating 3D hand pose using hierarchical multi-label classification. Image and Vision Computing, 5(12), 1885–1894.
Struyf, J., & Džeroski, S. (2006). Constraint based induction of multi-objective regression trees. In Knowledge discovery in inductive databases, 4th international workshop, KDID’05, revised, selected and invited papers (pp. 222–233).
Struyf, J., & Džeroski, S. (2007). Clustering trees with instance level constraints. In Proceedings of the 18th European conference on machine learning (pp. 359–370)
Taskar, B., Guestrin, C., & Koller, D. (2003). Max-margin Markov networks. In Advances in neural information processing systems 16 16
Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2005). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6, 1453–1484.
Tsoumakas, G., & Vlahavas, I. (2007). Random k-labelsets: an ensemble method for multilabel classification. In Proceedings of the 18th European conference on machine learning (pp. 406–417).
Weiss, G. M., & Provost, F. J. (2003). Learning when training data are costly: the effect of class distribution on tree induction. The Journal of Artificial Intelligence Research, 19, 315–354.
Wilcoxon, F. (1945). Individual comparisons by ranking methods. Biometrics, 1, 80–83.
Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1, 69–90.
Author information
Authors and Affiliations
Corresponding author
Additional information
Editor: Johannes Fürnkranz.
Rights and permissions
About this article
Cite this article
Vens, C., Struyf, J., Schietgat, L. et al. Decision trees for hierarchical multi-label classification. Mach Learn 73, 185–214 (2008). https://doi.org/10.1007/s10994-008-5077-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s10994-008-5077-3