Constraint Based Induction of Multi-objective Regression Trees

  • Jan Struyf
  • Sašo Džeroski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3933)

Abstract

Constraint-based inductive systems are a key component of inductive databases and are responsible for building the models that satisfy the constraints in the inductive queries. In this paper, we propose a constraint-based system for building multi-objective regression trees. A multi-objective regression tree is a decision tree capable of predicting several numeric variables at once. We focus on size and accuracy constraints. By specifying either a maximum size or a minimum accuracy, the user can trade off size (and thus interpretability) for accuracy. Our approach is to first build a large tree based on the training data and then to prune it in a second step to satisfy the user constraints. This has the advantage that the tree can be stored in the inductive database and used for answering inductive queries with different constraints. Besides size and accuracy constraints, we also briefly discuss syntactic constraints. We evaluate our system on a number of real-world data sets and measure the size versus accuracy trade-off.
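
The two-step idea in the abstract (grow a large tree once, then prune a copy of it for each query) can be illustrated with a small Python sketch. This is not the authors' system: the split heuristic (summed squared error over all targets, assuming the targets are on comparable scales), the greedy bottom-up pruning rule, and all names (Node, build, prune_to_size, predict) are assumptions made for illustration. In particular, greedy pruning does not guarantee the most accurate tree of a given size, and only the size constraint is shown.

```python
import copy
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    prediction: List[float]            # per-target mean of the rows in this node
    sse: float                         # squared error summed over all targets
    feature: Optional[int] = None      # split feature index (None for a leaf)
    threshold: Optional[float] = None  # rows with x[feature] <= threshold go left
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def mean_and_sse(rows):
    """rows is a list of (features, targets) pairs."""
    k = len(rows[0][1])
    means = [sum(y[t] for _, y in rows) / len(rows) for t in range(k)]
    sse = sum((y[t] - means[t]) ** 2 for _, y in rows for t in range(k))
    return means, sse


def build(rows):
    """Step 1: grow a large tree until no split reduces the training error."""
    means, sse = mean_and_sse(rows)
    node = Node(means, sse)
    best = None  # (children's sse, feature, threshold, left rows, right rows)
    for f in range(len(rows[0][0])):
        values = sorted({x[f] for x, _ in rows})
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2
            left = [r for r in rows if r[0][f] <= thr]
            right = [r for r in rows if r[0][f] > thr]
            if not left or not right:
                continue
            child_sse = mean_and_sse(left)[1] + mean_and_sse(right)[1]
            if best is None or child_sse < best[0]:
                best = (child_sse, f, thr, left, right)
    if best is not None and best[0] < sse:
        _, node.feature, node.threshold, left, right = best
        node.left, node.right = build(left), build(right)
    return node


def size(node):
    """Number of nodes (internal nodes plus leaves)."""
    return 1 if node.is_leaf() else 1 + size(node.left) + size(node.right)


def internal_nodes(node):
    if not node.is_leaf():
        yield node
        yield from internal_nodes(node.left)
        yield from internal_nodes(node.right)


def prune_to_size(tree, max_size):
    """Step 2: return a pruned copy with at most max_size nodes.

    Greedy simplification: repeatedly collapse the internal node with two
    leaf children whose removal increases the training error the least.
    The stored tree itself is left untouched.
    """
    if max_size < 1:
        raise ValueError("max_size must be at least 1")
    tree = copy.deepcopy(tree)
    while size(tree) > max_size:
        candidates = [n for n in internal_nodes(tree)
                      if n.left.is_leaf() and n.right.is_leaf()]
        victim = min(candidates, key=lambda n: n.sse - n.left.sse - n.right.sse)
        victim.left = victim.right = None
        victim.feature = victim.threshold = None
    return tree


def predict(node, x):
    """Return the vector of predicted target values for feature vector x."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction
```

Because prune_to_size works on a copy, the same stored tree can serve several queries with different size limits, for example prune_to_size(tree, 3) followed by prune_to_size(tree, 7).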

Keywords

Mean Square Error, Soil Quality, Regression Tree, Target Variable, Size Constraint
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Jan Struyf (1)
  • Sašo Džeroski (2)
  1. Dept. of Computer Science, Katholieke Universiteit Leuven, Leuven, Belgium
  2. Dept. of Knowledge Technologies, Jozef Stefan Institute, Ljubljana, Slovenia