Abstract
In this chapter, we construct decision trees that estimate the relationship between the covariates and the response from the observed data. Starting from the root, we branch left or right at each internal node depending on whether a condition on the covariates is satisfied, until we reach a terminal node that gives the response. Because of its simple structure, a decision tree is less accurate than the methods we have considered thus far, but its visual representation makes the relationship between the covariates and the response easy to understand. For this reason, decision trees are often used to understand relationships rather than to predict the future, and they can be applied to both regression and classification. Decision trees have the drawback that the estimated tree shapes can differ greatly even between data sets that follow the same distribution. Therefore, as with the bootstrap discussed in Chap. 4, we can sample data sets of the same size from the original data many times and combine the resulting trees to reduce this variation. Finally, we introduce boosting, a method that, like the backfitting procedure learned in Chap. 7, produces many small decision trees to make highly accurate predictions.
Exercises 69–74

69.
Write the following functions in the R language, where each input y is a vector. For example, we write the function mode of vector y as follows:

a.
sq.loss that given input vector y, outputs the square sum of the differences between each element and the arithmetic average

b.
mis.match that given input vector y, outputs the number of mismatches between each element and the mode

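Possible solutions are sketched below (hedged; the book's own listings may differ). Note that naming a function mode masks base R's mode(), which returns a storage type rather than the most frequent value.

```r
## mode: the most frequent value in y (returned as a character string)
mode <- function(y) names(which.max(table(y)))

## sq.loss: sum of squared deviations from the arithmetic mean
sq.loss <- function(y) sum((y - mean(y))^2)

## mis.match: number of elements of y that differ from the mode
mis.match <- function(y) sum(y != mode(y))
```

The comparison in mis.match works for numeric, character, and factor inputs because R coerces both sides to character before testing equality.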
70.
We use the function branch below to construct a tree. Given a matrix x, a vector y, a loss function f, and a set of row indices S, the procedure outputs the division of S into two new index sets that minimizes the sum of the losses. Fill in the blanks and execute the program.
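One plausible shape of the completed branch is sketched below (an assumption, not the book's listing): for every variable j and every observed threshold, split S into the rows below the threshold and the rest, and keep the split with the smallest total loss.

```r
## Sketch of branch: exhaustive search over variables and thresholds.
## f is a loss function such as sq.loss or mis.match from Exercise 69.
branch <- function(x, y, f, S) {
  best <- Inf; info <- NULL
  for (j in 1:ncol(x)) for (i in S) {
    left  <- S[x[S, j] < x[i, j]]        # rows strictly below the threshold
    right <- setdiff(S, left)            # the remaining rows
    if (length(left) == 0 || length(right) == 0) next
    score <- f(y[left]) + f(y[right])    # total loss of the candidate split
    if (score < best) {
      best <- score
      info <- list(j = j, th = x[i, j], left = left, right = right, score = score)
    }
  }
  info
}
```

For a perfectly separable toy example, the returned score is zero at the separating threshold.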

71.
The following procedure constructs a decision tree using the function branch and a loss function. Execute the procedure for Fisher's iris data set with \(n.min=5\) and \(\alpha =0\), and draw the graph.
The decision tree is obtained using vertex.
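The book's recursive procedure is not reproduced here; as a hedged cross-check, the rpart package fits a comparable classification tree to the iris data, where minsplit = 5 and cp = 0 roughly mirror \(n.min=5\), \(\alpha=0\) (the correspondence is approximate, not exact).

```r
library(rpart)   # recursive partitioning trees

## fit a classification tree to Fisher's iris data
fit <- rpart(Species ~ ., data = iris,
             control = rpart.control(minsplit = 5, cp = 0))
plot(fit); text(fit)   # draw the tree
```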

72.
For the Boston data set, we consider finding the optimum \(0\le \alpha \le 1.5\) via tenfold CV. Fill in either train or test in each blank to execute the procedure.
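The skeleton below shows one way the train/test blanks of the ten-fold CV loop fit together; it uses synthetic indices (n = 100 stands in for the Boston rows), and the tree-fitting step is left as a comment.

```r
## Ten-fold CV skeleton (illustrative): each fold in turn is the test set,
## and the remaining nine folds form the training set.
set.seed(1)
n <- 100
index <- sample(1:n)                               # shuffle the row indices once
folds <- split(index, rep(1:10, length.out = n))   # ten disjoint folds
for (k in 1:10) {
  test  <- folds[[k]]                              # held-out fold
  train <- setdiff(index, test)                    # the other nine folds
  ## fit the tree on the train rows for each alpha,
  ## then accumulate the squared error over the test rows
}
```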

73.
We wish to modify branch to construct a random forest procedure. Fill in the blanks, and execute the procedure.
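The essential change can be sketched as follows (a hypothetical branch.rf, not the book's code): before the split search, draw a random subset of m candidate variables, with m near \(\sqrt{p}\) as the usual default, and search only over those columns.

```r
## Random-forest variant of the branch search: restrict the candidate
## variables to a random subset of size m before looking for the best split.
branch.rf <- function(x, y, f, S, m = floor(sqrt(ncol(x)))) {
  cols <- sample(1:ncol(x), m)           # random feature subset
  best <- Inf; info <- NULL
  for (j in cols) for (i in S) {
    left  <- S[x[S, j] < x[i, j]]
    right <- setdiff(S, left)
    if (length(left) == 0 || length(right) == 0) next
    score <- f(y[left]) + f(y[right])
    if (score < best) {
      best <- score
      info <- list(j = j, th = x[i, j], left = left, right = right)
    }
  }
  info
}
```

A full random forest would grow many such trees on bootstrap resamples of the rows and average (or vote over) their predictions.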

74.
We execute boosting using the gbm package for the Boston data set. Look up the gbm package, fill in the blanks, and draw the graph.
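One plausible setup is sketched below (hedged; the shrinkage, depth, and tree-count values are illustrative, not the book's choices), assuming the Boston data from the MASS package.

```r
library(MASS)   # Boston data set
library(gbm)    # gradient boosting machines

set.seed(1)
fit <- gbm(medv ~ ., data = Boston, distribution = "gaussian",
           n.trees = 1000, interaction.depth = 2, shrinkage = 0.01)

## the graph of test error against the number of trees can then be drawn
## from predict(fit, newdata = ..., n.trees = 1:1000)
```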
Suzuki, J. (2020). Decision Trees. In: Statistical Learning with Math and R. Springer, Singapore. https://doi.org/10.1007/978-981-15-7568-6_8