Predictive Bi-clustering Trees for Hierarchical Multi-label Classification

Santos, Bruna Z.; Nakano, Felipe K.; Cerri, Ricardo; Vens, Celine

doi:10.1007/978-3-030-67664-3_42


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12459)

Included in the following conference series:

Joint European Conference on Machine Learning and Knowledge Discovery in Databases

Abstract

In the recent literature on multi-label classification, much attention has been given to methods that exploit label dependencies. Most of these methods assume that the dependencies are static over the entire instance space. In contrast, we present an approach that dynamically adapts the label partitions in a multi-label decision tree learning context. In particular, we adapt the recently introduced predictive bi-clustering tree (PBCT) method to multi-label classification tasks. This way, tree nodes can split the instance-label matrix both horizontally and vertically. We focus on hierarchical multi-label classification (HMC) tasks and map the label hierarchy to a feature set over the label space. This feature set is exploited to infer vertical splits, which are regulated by a lookahead strategy in the tree building procedure. We evaluate the proposed method on benchmark datasets. Experiments demonstrate that our proposal (PBCT-HMC) obtains better or competitive results compared with its direct competitors, in terms of both predictive performance and model size. Compared with an HMC method that does not produce label partitions, however, our method yields larger models on average, while still producing equally large or smaller models in one third of the datasets by creating suitable label partitions.
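
The two kinds of split mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not say how the label hierarchy is mapped to label features, so the ancestor-indicator encoding below is an assumption, as are the toy hierarchy and the names `PARENT`, `label_features`, `horizontal_split`, and `vertical_split`. The lookahead strategy is not modelled.

```python
import numpy as np

# Hypothetical toy hierarchy: child label -> parent label (None marks a root).
PARENT = {"1": None, "1.1": "1", "1.2": "1", "2": None, "2.1": "2"}
LABELS = sorted(PARENT)  # ['1', '1.1', '1.2', '2', '2.1']


def ancestors_or_self(label):
    """Return the label together with all of its ancestors."""
    chain = []
    while label is not None:
        chain.append(label)
        label = PARENT[label]
    return chain


def label_features():
    """Assumed encoding: F[j, k] = 1 if label k is label j or an ancestor of j."""
    index = {name: i for i, name in enumerate(LABELS)}
    F = np.zeros((len(LABELS), len(LABELS)), dtype=int)
    for name in LABELS:
        for anc in ancestors_or_self(name):
            F[index[name], index[anc]] = 1
    return F


def horizontal_split(Y, X, feature, threshold):
    """Partition the rows (instances) of Y with a test on an instance feature."""
    mask = X[:, feature] <= threshold
    return Y[mask, :], Y[~mask, :]


def vertical_split(Y, F, label_feature):
    """Partition the columns (labels) of Y with a test on a label feature."""
    mask = F[:, label_feature] == 1
    return Y[:, mask], Y[:, ~mask]


# Toy data: 4 instances, 1 instance feature, 5 labels.
X = np.array([[0.2], [0.8], [0.5], [0.9]])
Y = np.array([
    [1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [1, 1, 0, 1, 1],
])
F = label_features()

top, bottom = horizontal_split(Y, X, feature=0, threshold=0.5)
left, right = vertical_split(Y, F, label_feature=LABELS.index("1"))
print(top.shape, bottom.shape)   # row partition of the instance-label matrix
print(left.shape, right.shape)   # column partition of the instance-label matrix
```

In this sketch a vertical split on the label feature for "1" separates the subtree rooted at "1" from the remaining labels, which is one way a hierarchy-derived feature can induce a label partition.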

Notes

1. https://dtai.cs.kuleuven.be/clus/.
2. In our implementation, we consider a greedy generation of the subsets.
3. Available at https://dtai.cs.kuleuven.be/clus/hmc-ens/.
4. Available at https://dtai.cs.kuleuven.be/clus/hmcdatasets/.
5. Available at https://itec.kuleuven-kulak.be/supportingmaterial.
6. Available at http://kt.ijs.si/DragiKocev/PhD/resources/doku.php.
7. Available at https://dtai.cs.kuleuven.be/clus.
8. Available at https://github.com/biomal/Clus-PBCT-HMC.

Acknowledgments

We acknowledge the São Paulo Research Foundation (FAPESP, grants #2017/13218-5 and #2016/25078-0) and the Research Fund Flanders (FWO) for financial support.

Author information

Authors and Affiliations

Department of Computer Science, Federal University of São Carlos, São Carlos, Brazil
Bruna Z. Santos & Ricardo Cerri
Department of Public Health and Primary Care, KU Leuven, Kortrijk, Belgium
Felipe K. Nakano & Celine Vens
ITEC, imec Research Group at KU Leuven, Kortrijk, Belgium
Felipe K. Nakano & Celine Vens

Corresponding author

Correspondence to Bruna Z. Santos.

Editor information

Editors and Affiliations

Albert-Ludwigs-Universität, Freiburg, Germany
Frank Hutter
TU Darmstadt, Darmstadt, Germany
Kristian Kersting
Ghent University, Ghent, Belgium
Jefrey Lijffijt
Saarland University, Saarbrücken, Germany
Isabel Valera

About this paper

Cite this paper

Santos, B.Z., Nakano, F.K., Cerri, R., Vens, C. (2021). Predictive Bi-clustering Trees for Hierarchical Multi-label Classification. In: Hutter, F., Kersting, K., Lijffijt, J., Valera, I. (eds) Machine Learning and Knowledge Discovery in Databases. ECML PKDD 2020. Lecture Notes in Computer Science, vol 12459. Springer, Cham. https://doi.org/10.1007/978-3-030-67664-3_42

DOI: https://doi.org/10.1007/978-3-030-67664-3_42
Published: 25 February 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-67663-6
Online ISBN: 978-3-030-67664-3
eBook Packages: Computer Science, Computer Science (R0)
