Regression Trees and Rule-Based Models

Abstract

Tree-based models consist of one or more nested if-then statements for the predictors that partition the data. Within these partitions, a model is used to predict the outcome. Regression trees and regression model trees are basic partitioning models and are covered in Sections 8.1 and 8.2, respectively. In Section 8.3, we present rule-based models, which are models governed by if-then conditions (possibly created by a tree) that have been collapsed into independent conditions. Rules can be simplified or pruned in such a way that individual samples may be covered by multiple rules. Ensemble methods combine many trees (or rule-based models) into one model and tend to have much better predictive performance than a single tree- or rule-based model. Popular ensemble techniques are bagging (Section 8.4), random forests (Section 8.5), boosting (Section 8.6), and Cubist (Section 8.7). In the Computing section (Section 8.8), we demonstrate how to train each of these models in R. Finally, exercises are provided at the end of the chapter to solidify the concepts.
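As a rough preview of the Computing section, the sketch below shows how a model of each type discussed in this chapter might be fit in R with widely used packages (rpart, randomForest, gbm, and Cubist). The predictor data frame trainX and numeric outcome vector trainY are placeholder names rather than objects defined in the chapter, and the tuning values are illustrative, not the chapter's exact code.

    # Assumed inputs: trainX (data frame of predictors), trainY (numeric outcome)
    library(rpart)          # single regression trees (CART), Section 8.1
    library(randomForest)   # bagging and random forests, Sections 8.4 and 8.5
    library(gbm)            # stochastic gradient boosting, Section 8.6
    library(Cubist)         # Cubist rule-based models, Section 8.7

    # Single regression tree
    rpartFit <- rpart(y ~ ., data = data.frame(trainX, y = trainY))

    # Random forest; bagging corresponds to setting mtry = ncol(trainX)
    rfFit <- randomForest(trainX, trainY, ntree = 1000)

    # Boosted regression trees
    gbmFit <- gbm.fit(trainX, trainY, distribution = "gaussian",
                      n.trees = 100, interaction.depth = 3, shrinkage = 0.1)

    # Cubist model with a committee of rule-based models
    cubistFit <- cubist(trainX, trainY, committees = 10)

In practice, the tuning parameters shown here (number of trees, interaction depth, shrinkage, number of committees) would be chosen by resampling rather than fixed in advance.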


Notes

  1. Also, note that the first three splits here involve the same predictors as the regression tree shown in Fig. 8.4 (and two of the three split values are identical).

  2. We are indebted to the work of Chris Keefer, who extensively studied the Cubist source code.



Copyright information

© 2013 Springer Science+Business Media New York

Cite this chapter

Kuhn, M., Johnson, K. (2013). Regression Trees and Rule-Based Models. In: Applied Predictive Modeling. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-6849-3_8
