Constraint Based Induction of Multi-objective Regression Trees

Struyf, Jan; Džeroski, Sašo

doi:10.1007/11733492_13

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 3933)

Included in the following conference series:

International Workshop on Knowledge Discovery in Inductive Databases

Abstract

Constraint-based inductive systems are a key component of inductive databases and are responsible for building the models that satisfy the constraints in the inductive queries. In this paper, we propose a constraint-based system for building multi-objective regression trees. A multi-objective regression tree is a decision tree capable of predicting several numeric variables at once. We focus on size and accuracy constraints. By specifying either a maximum size or a minimum accuracy, the user can trade off size (and thus interpretability) for accuracy. Our approach is to first build a large tree based on the training data and then, in a second step, to prune it to satisfy the user constraints. This has the advantage that the tree can be stored in the inductive database and used for answering inductive queries with different constraints. Besides size and accuracy constraints, we also briefly discuss syntactic constraints. We evaluate our system on a number of real-world data sets and measure the size versus accuracy trade-off.
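
The two-step approach in the abstract (grow a large tree once, then prune it per query) can be illustrated with a small sketch. This is not the authors' implementation: it assumes binary trees, counts size in nodes, measures error as the squared error summed over all target variables on the training data, and uses a simple dynamic program over subtree sizes. The names `Node`, `multi_target_sse`, `best_errors`, and `smallest_size` are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import List, Optional

INF = float("inf")


def multi_target_sse(rows: List[List[float]]) -> float:
    """Squared error around the per-target mean, summed over all targets."""
    n_targets = len(rows[0])
    total = 0.0
    for t in range(n_targets):
        values = [row[t] for row in rows]
        mean = sum(values) / len(values)
        total += sum((v - mean) ** 2 for v in values)
    return total


@dataclass
class Node:
    sse: float                      # error if this node is pruned to a leaf
    left: Optional["Node"] = None   # both children set, or both None
    right: Optional["Node"] = None


def best_errors(node: Node, k: int) -> List[float]:
    """err[s] = minimum training error of a pruned subtree rooted at `node`
    that uses at most s nodes, for s = 0..k (err[0] is infinite)."""
    err = [INF] + [node.sse] * k    # collapsing to a single leaf is always allowed
    if node.left is None or k < 3:
        return err
    left = best_errors(node.left, k - 2)
    right = best_errors(node.right, k - 2)
    for s in range(3, k + 1):
        # Keep the split: one node for the root, s - 1 shared by the children.
        for s_left in range(1, s - 1):
            s_right = s - 1 - s_left
            err[s] = min(err[s], left[s_left] + right[s_right])
    return err


def smallest_size(err: List[float], max_error: float) -> Optional[int]:
    """Accuracy constraint: smallest size whose error is within `max_error`."""
    for size, e in enumerate(err):
        if e <= max_error:
            return size
    return None


# A toy "large tree" over four training examples with two numeric targets.
rows = [[1.0, 10.0], [2.0, 12.0], [5.0, 20.0], [6.0, 22.0]]
root = Node(
    sse=multi_target_sse(rows),
    left=Node(
        sse=multi_target_sse(rows[:2]),
        left=Node(sse=multi_target_sse(rows[:1])),
        right=Node(sse=multi_target_sse(rows[1:2])),
    ),
    right=Node(sse=multi_target_sse(rows[2:])),
)

err = best_errors(root, 5)            # size constraint: at most 5 nodes
print(err)                            # [inf, 121.0, 121.0, 5.0, 5.0, 2.5]
print(smallest_size(err, 10.0))       # 3
```

Because the table returned by `best_errors` covers every size up to `k`, a single pass over the stored tree can answer both kinds of query: a maximum-size query reads one entry, and a minimum-accuracy query scans for the first entry under the error bound.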

Author information

Authors and Affiliations

Dept. of Computer Science, Katholieke Universiteit Leuven, Celestijnenlaan 200A, B-3001, Leuven, Belgium
Jan Struyf
Dept. of Knowledge Technologies, Jozef Stefan Institute, Jamova 39, 1000, Ljubljana, Slovenia
Sašo Džeroski

Editor information

Editors and Affiliations

Pisa KDD Laboratory, ISTI - C.N.R, Area della Ricerca di Pisa, Via Giuseppe Moruzzi 1, Pisa, Italy
Francesco Bonchi
INSA-Lyon, LIRIS CNRS UMR5205, F-69621, Villeurbanne, France
Jean-François Boulicaut

Cite this paper

Struyf, J., Džeroski, S. (2006). Constraint Based Induction of Multi-objective Regression Trees. In: Bonchi, F., Boulicaut, JF. (eds) Knowledge Discovery in Inductive Databases. KDID 2005. Lecture Notes in Computer Science, vol 3933. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11733492_13

DOI: https://doi.org/10.1007/11733492_13
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33292-3
Online ISBN: 978-3-540-33293-0
eBook Packages: Computer Science; Computer Science (R0)