Abstract
Decision trees are among the most widely used predictive modelling methods, primarily because they are readily interpretable and fast to learn. These desirable properties, however, come at the price of reduced predictive performance. Moreover, standard decision tree induction suffers from myopia: a single split, selected greedily, is chosen at each internal node, so the resulting tree may be sub-optimal. To address these issues, option trees have been proposed; they can include several alternative splits in a new type of internal node, called an option node. In this light, an option tree can also be regarded as a condensed representation of an ensemble. In this work, we propose to extend predictive clustering trees (PCTs) for multi-target regression (MTR) with option nodes, i.e., to learn option predictive clustering trees (OPCTs). Multi-target regression is concerned with learning predictive models for tasks with multiple continuous target variables. We evaluate the proposed OPCTs on 11 benchmark MTR datasets. The results reveal that OPCTs achieve statistically significantly better predictive performance than a single PCT, while remaining competitive with bagging and random forests of PCTs. Finally, we demonstrate the potential of OPCTs for multifaceted interpretability and illustrate the potential for including domain knowledge in the tree learning process.
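To make the ensemble view of option trees concrete, the following is a minimal sketch (illustrative only, not the authors' OPCT implementation) of how prediction could work in an option tree for multi-target regression: an ordinary split node routes an example to one child, while an option node aggregates the multi-target predictions of all its alternative subtrees by per-target averaging. All class and variable names are hypothetical.

```python
# Sketch of prediction in an option tree for multi-target regression.
# Illustrative only; not the OPCT implementation from the paper.

class Leaf:
    def __init__(self, prototype):
        self.prototype = prototype  # mean vector of the targets in this leaf

    def predict(self, x):
        return self.prototype

class SplitNode:
    """Ordinary internal node: routes the example to exactly one child."""
    def __init__(self, feature, threshold, left, right):
        self.feature, self.threshold = feature, threshold
        self.left, self.right = left, right

    def predict(self, x):
        child = self.left if x[self.feature] <= self.threshold else self.right
        return child.predict(x)

class OptionNode:
    """Holds several alternative splits; predictions are averaged per target."""
    def __init__(self, options):
        self.options = options

    def predict(self, x):
        preds = [opt.predict(x) for opt in self.options]
        k = len(preds)
        return [sum(p[t] for p in preds) / k for t in range(len(preds[0]))]

# Example: an option node with two alternative splits, two target variables.
tree = OptionNode([
    SplitNode(0, 0.5, Leaf([1.0, 2.0]), Leaf([3.0, 4.0])),
    SplitNode(1, 0.0, Leaf([2.0, 2.0]), Leaf([4.0, 6.0])),
])
print(tree.predict([0.2, 0.7]))  # averages the two option predictions
```

Averaging over the subtrees below each option node is what makes the option tree behave like a condensed ensemble: each combination of chosen options corresponds to one embedded tree.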
Notes
- 1.
Not only is the best split included; the other candidate splits are also compared to it to determine their inclusion in the option tree.
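One common way to operationalize this comparison is to retain every candidate split whose heuristic score is within a fixed factor of the best split's score. This is a hypothetical sketch of such a criterion, not necessarily the exact rule used in the paper; the function name and the `decay` parameter are illustrative.

```python
def select_options(candidates, decay=0.9):
    """Keep candidate splits whose heuristic score (higher is better)
    is within a factor `decay` of the best split's score.
    Illustrative sketch; the paper's exact criterion may differ."""
    best = max(score for _, score in candidates)
    return [split for split, score in candidates if score >= decay * best]

# Example: four candidate splits with their heuristic scores.
candidates = [("x1<=0.5", 1.00), ("x2<=3.0", 0.95),
              ("x3<=1.2", 0.80), ("x4<=7.0", 0.40)]
print(select_options(candidates))  # keeps the splits closest to the best
```

If more than one split survives this test, an option node is created with the surviving splits as its alternatives; otherwise the node is an ordinary split node.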
Acknowledgments
We acknowledge the financial support of the European Commission through the grants ICT-2013-612944 MAESTRA and ICT-2013-604102 HBP, as well as the support of the Slovenian Research Agency through a young researcher grant and the program Knowledge Technologies (P2-0103).
Copyright information
© 2016 Springer International Publishing Switzerland
Cite this paper
Osojnik, A., Džeroski, S., Kocev, D. (2016). Option Predictive Clustering Trees for Multi-target Regression. In: Calders, T., Ceci, M., Malerba, D. (eds) Discovery Science. DS 2016. Lecture Notes in Computer Science, vol 9956. Springer, Cham. https://doi.org/10.1007/978-3-319-46307-0_8
Print ISBN: 978-3-319-46306-3
Online ISBN: 978-3-319-46307-0