
Comments on: A random forest guided tour


Abstract

This paper is a comment on the survey paper by Biau and Scornet (TEST, 2016. doi:10.1007/s11749-016-0481-7) about random forests. We focus on the problem of quantifying the impact of each ingredient of random forests on their performance. We show that such a quantification is possible for a simple pure forest, leading to conclusions that could apply more generally. Then, we consider “hold-out” random forests, which are a good middle ground between “toy” pure forests and Breiman’s original random forests.




Author information


Corresponding author

Correspondence to Sylvain Arlot.

Additional information

This comment refers to the invited paper available at: doi:10.1007/s11749-016-0481-7.

The research of the authors was partly supported by the French Agence Nationale de la Recherche (ANR 2011 BS01 010 01 projet Calibration). S. Arlot was also partly supported by Institut des Hautes Études Scientifiques (IHES, Le Bois-Marie, 35, route de Chartres, 91440 Bures-Sur-Yvette, France).

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 152 KB)

Appendices

Appendix 1: Approximation and estimation errors

We state a general decomposition of the risk of a forest having the X-property (that is, when partitions are built independently of \((Y_i)_{1 \le i \le n}\)), which we need to prove the results of Sect. 1 but which can be useful more generally. We assume that \({\mathbb {E}}[Y_i^2]<+\infty \) for all i.

For any random forest \(m_{M,n}\) having the X-property, following Sections 2 and 3.2 of Biau and Scornet’s survey, we can write

$$\begin{aligned} \begin{array}{c} \displaystyle m_{M,n} ({\mathbf {x}};{\varTheta }_{1 \ldots M}, {\mathscr {D}}_n) =\sum _{i=1}^n W_{ni}({\mathbf {x}}) Y_i\\ \displaystyle \quad \text {where} \quad W_{ni}({\mathbf {x}}) = W_{ni}({\mathbf {x}} ;{\varTheta }_{1 \ldots M} , X_{1 \ldots n}) = \frac{1}{M} \sum _{j=1}^M \frac{C_i({\varTheta }_j) {\mathbf {1}}_{X_i \in A_n({\mathbf {x}}; {\varTheta }_j ; X_{1 \ldots n})}}{N_n({\mathbf {x}}; {\varTheta }_j ; X_{1 \ldots n})}, \end{array} \end{aligned}$$
(1)

Here, \(C_i({\varTheta }_j)\) is the number of times \((X_i,Y_i)\) appears in the j-th resample, \(A_n({\mathbf {x}}; {\varTheta }_j ; X_{1 \ldots n})\) is the cell containing \({\mathbf {x}}\) in the j-th tree, and

$$\begin{aligned} N_n({\mathbf {x}}; {\varTheta }_j ; X_{1 \ldots n}) = \sum _{i=1}^n C_i({\varTheta }_j) {\mathbf {1}}_{X_i \in A_n({\mathbf {x}}; {\varTheta }_j ; X_{1 \ldots n})}. \end{aligned}$$
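For illustration only (this sketch is not part of the original paper), the weights of Eq. (1) at a point \({\mathbf {x}}\) could be computed as follows in R, given hypothetical inputs: a matrix cell of cell labels of the \(X_i\) in each tree, the labels cell_x of the cell containing \({\mathbf {x}}\) in each tree, and a matrix C of resample counts \(C_i({\varTheta }_j)\).

```r
## Minimal sketch of the weights in Eq. (1) at one point x (illustrative inputs).
## 'cell': n x M matrix, cell label of X_i in tree j; 'cell_x': label of the cell of x
## in each tree; 'C': n x M matrix of resample counts C_i(Theta_j).
forest_weights <- function(cell, cell_x, C) {
  n <- nrow(cell); M <- ncol(cell)
  W <- numeric(n)
  for (j in seq_len(M)) {
    in_cell <- C[, j] * (cell[, j] == cell_x[j])  # C_i(Theta_j) * 1{X_i in A_n(x; Theta_j)}
    N <- sum(in_cell)                             # N_n(x; Theta_j; X_{1..n})
    if (N > 0) W <- W + in_cell / N               # empty cells are skipped (usual convention)
  }
  W / M                                           # the weights W_{ni}(x), i = 1, ..., n
}
```

The forest prediction of Eq. (1) at \({\mathbf {x}}\) is then the scalar product of the returned weights with \((Y_i)_{1 \le i \le n}\).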

Now, let us define

$$\begin{aligned} m^{\star }_{M,n} ({\mathbf {x}};{\varTheta }_{1 \ldots M}, X_{1 \ldots n})= & {} {\mathbb {E}}\Bigl [ m_{M,n} ({\mathbf {x}};{\varTheta }_{1 \ldots M}, {\mathscr {D}}_n) \, \big | \,X_{1 \ldots n}, {\varTheta }_{1 \ldots M} \Bigr ]\\= & {} \sum _{i=1}^n W_{ni}({\mathbf {x}};{\varTheta }_{1 \ldots M} , X_{1 \ldots n}) m(X_i)\\ \text {and}\quad {\overline{m}}^{\star }_{M,n} ({\mathbf {x}};{\varTheta }_{1 \ldots M})= & {} {\mathbb {E}}\Bigl [ m^{\star }_{M,n} ({\mathbf {x}};{\varTheta }_{1 \ldots M}, X_{1 \ldots n}) \, \big | \,{\varTheta }_{1 \ldots M} \Bigr ]. \end{aligned}$$

By definition of the conditional expectation (the cross terms vanish by the tower property), we can decompose the risk of \(m_{M,n}\) at \({\mathbf {x}}\) into three terms:

$$\begin{aligned} {\mathbb {E}}\Bigl [ \bigl ( m_{M,n}({\mathbf {x}}) - m ({\mathbf {x}}) \bigr )^2 \Bigr ]= & {} \underbrace{ {\mathbb {E}}\Bigl [ \bigl ( {\overline{m}}^{\star }_{M,n} ({\mathbf {x}}) - m({\mathbf {x}}) \bigr )^2 \Bigr ] }_{A =\mathrm{approximation}~\mathrm{error}}\nonumber \\&+ \underbrace{ {\mathbb {E}}\Bigl [ \bigl ( m^{\star }_{M,n} ({\mathbf {x}}) - {\overline{m}}^{\star }_{M,n}({\mathbf {x}}) \bigr )^2 \Bigr ] }_{{\varDelta }} \nonumber \\&+ \underbrace{ {\mathbb {E}}\Bigl [ \bigl ( m_{M,n}({\mathbf {x}}) - m^{\star }_{M,n} ({\mathbf {x}}) \bigr )^2 \Bigr ] }_{E = \mathrm{estimation}~\mathrm{error}}. \end{aligned}$$
(2)

In the fixed-design regression setting (where the \(X_i\) are deterministic), A is called approximation error, \({\varDelta }=0\), and E is called estimation error. Things are a bit more complicated in the random-design setting—when \((X_i,Y_i)_{1 \le i \le n}\) are independent and identically distributed—since \({\varDelta } \ne 0\) in general. Up to minor differences related to how \(m_{n}\) is defined on empty cells, A is still the approximation error, and the estimation error is \({\varDelta } + E\).

Let us finally assume that \((X_i,Y_i)_{1 \le i \le n}\) are independent and define

$$\begin{aligned} \sigma ^2(X_i) = {\mathbb {E}}{}\left[ \bigl ( m(X_i) - Y_i \bigr )^2 \, \big | \,X_i \right] {}. \end{aligned}$$

Then, since the weights \(W_{ni}({\mathbf {x}})\) only depend on \({\mathscr {D}}_n\) through \(X_{1 \ldots n}\), while the \(Y_i - m(X_i)\) are centered and independent conditionally on \(X_{1 \ldots n}\), we have the following formula for the estimation error:

$$\begin{aligned} E = {\mathbb {E}}{}\left[ {} \left( \sum _{i=1}^n W_{ni}({\mathbf {x}}) \bigl ( m(X_i) - Y_i \bigr ) \right) ^2 {} \right] {} = {\mathbb {E}}{}\left[ \sum _{i=1}^n W_{ni}({\mathbf {x}})^2 \sigma ^2(X_i) \right] {}. \end{aligned}$$

For instance, in the homoscedastic case, \(\sigma ^2(X_i) \equiv \sigma ^2\) and

$$\begin{aligned} E = {\mathbb {E}}{}\left[ {} \left( \sum _{i=1}^n W_{ni}({\mathbf {x}}) \bigl ( m(X_i) - Y_i \bigr ) \right) ^2 {} \right] {} = \sigma ^2 {\mathbb {E}}{}\left[ \sum _{i=1}^n W_{ni}({\mathbf {x}})^2 \right] {}. \end{aligned}$$
(3)
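For illustration (this remark is not needed for the sequel), consider a single tree (\(M=1\)) built without resampling, so that \(C_i \equiv 1\) and \(W_{ni}({\mathbf {x}}) = {\mathbf {1}}_{X_i \in A_n({\mathbf {x}})} / N_n({\mathbf {x}})\) whenever the cell \(A_n({\mathbf {x}})\) contains at least one observation. Then

$$\begin{aligned} \sum _{i=1}^n W_{ni}({\mathbf {x}})^2 = \frac{N_n({\mathbf {x}})}{N_n({\mathbf {x}})^2} = \frac{1}{N_n({\mathbf {x}})} , \end{aligned}$$

so by Eq. (3) the estimation error of a single tree is (up to the empty-cell event) \(\sigma ^2 \, {\mathbb {E}}[ 1 / N_n({\mathbf {x}}) ]\): it is driven by the inverse of the number of observations in the cell of \({\mathbf {x}}\).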

Appendix 2: Analysis of the toy forest: proofs

We prove the results stated in Sect. 1 for the one-dimensional toy forest.

Since the toy forest is purely random, all results of Appendix 1 apply, with \({\varTheta } = (T,I)\) and \(C_i({\varTheta }) = {\mathbf {1}}_{i \in I}\). It remains to compute the three terms of Eq. (2).

Since we assume m is of class \({\mathscr {C}}^3\), we can use the results of Arlot and Genuer (2014, Section 4) for the approximation error A (up to minor differences in the definition of \({\overline{m}}^{\star }_{M,n} ({\mathbf {x}})\), due to the event where \(A_n({\mathbf {x}};{\varTheta })\) is empty, which has small probability since \(a \gg k\)). We assume that \(m'({\mathbf {x}}) \ne 0\) and \(m''({\mathbf {x}}) \ne 0\) for simplicity, so that the quantities appearing in Table 1 indeed give the order of magnitude of A.

The middle term \({\varDelta }\) in decomposition (2) is negligible compared to E for a single tree, which can be proved using results from Arlot (2008), as soon as \(m'({\mathbf {x}}) / k \ll \sigma \) and \(a \gg k\). We assume that it can also be neglected for an infinite forest.

For the estimation error, we can use Eq. (3) and the following arguments. First, for every \(i \in \{1, \ldots , n\}\), \(X_i\) belongs to \(A_n({\mathbf {x}};{\varTheta })\) with probability 1/k. Combining this with the subsampling process, we get that

$$\begin{aligned} N_n({\mathbf {x}};{\varTheta }; X_{1 \ldots n}) \sim {\mathscr {B}}{}\left( n , \frac{a}{n k} \right) {} \end{aligned}$$

is close to its expectation a/k with probability close to one if \(a/k \gg \log (n)\). Assuming that this holds simultaneously for a large fraction of the subsamples, we get the approximation

$$\begin{aligned} W_{ni}^{\mathrm {toy}}({\mathbf {x}})= & {} \frac{1}{M} \sum _{j=1}^M \frac{{\mathbf {1}}_{i \in I_j} {\mathbf {1}}_{X_i \in A_n({\mathbf {x}}; {\varTheta }_j)}}{N_n({\mathbf {x}}; {\varTheta }_j ; X_{1 \ldots n})}\nonumber \\\approx & {} \frac{k}{a} \frac{1}{M} \sum _{j=1}^M {\mathbf {1}}_{i \in I_j} {\mathbf {1}}_{X_i \in A_n({\mathbf {x}}; {\varTheta }_j)} =: {\widetilde{W}}_{ni}^{\mathrm {toy}}({\mathbf {x}}). \end{aligned}$$
(4)

Now, we note that conditionally on \(X_{1 \ldots n}\), the variables \({\mathbf {1}}_{i \in I_j} {\mathbf {1}}_{X_i \in A_n({\mathbf {x}}; {\varTheta }_j)}\), \(j=1, \ldots , M\), are independent Bernoulli variables with common parameter

$$\begin{aligned} \frac{a}{n} \times \bigl ( 1-k|X_i-x| \bigr )_+. \end{aligned}$$

Therefore, since \({\mathbb {E}}\bigl [ \bigl ( \frac{1}{M} \sum _{j=1}^M B_j \bigr )^2 \bigr ] = \bigl ( 1 - \frac{1}{M} \bigr ) p^2 + \frac{p}{M}\) for independent Bernoulli variables \(B_1, \ldots , B_M\) with parameter p,

$$\begin{aligned} {\mathbb {E}}{}\left[ {\widetilde{W}}_{ni}^{\mathrm {toy} }({\mathbf {x}})^2 \, \big | \, X_{1 \ldots n} \right] {}= & {} \frac{k^2 }{ n a} {}\left[ {}\left( 1 - \frac{1}{M} \right) {} \frac{a}{n} \Bigl ( \bigl ( 1-k|X_i-x| \bigr )_+ \Bigr )^2 \right. \\&+\left. \frac{1}{M} \bigl ( 1-k|X_i-x| \bigr )_+ \right] {}\\ \text {hence} \quad {\mathbb {E}}{}\left[ {\widetilde{W}}_{ni}^{\mathrm {toy} }({\mathbf {x}})^2 \right] {}= & {} \frac{k}{n a} {}\left[ {}\left( 1 - \frac{1}{M} \right) {} \frac{2 a}{3 n} + \frac{1}{M} \right] {}. \end{aligned}$$

By Eq. (3), this ends the proof of the results in the bottom line of Table 1.

Similar arguments justify the top line of Table 1, where \(T_j=0\) almost surely.

Note that we have not given a fully rigorous proof of the results shown in Table 1, because of the approximation (4) and of the term \({\varDelta }\) that we have neglected. We are convinced that the parts of the proof that we have skipped would only require adding some technical assumptions, which would not help us reach our goal of better understanding random forests in general.
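As a small numerical sanity check (not part of the argument above), the following R simulation sketch estimates the probability that a given observation is both subsampled and falls in the cell of \({\mathbf {x}}\), to be compared with the Bernoulli parameter \(\frac{a}{n} ( 1-k|X_i-x| )_+\) used above. It assumes, as in Arlot and Genuer (2014), that the toy-forest partition is the regular grid of k intervals of [0, 1] shifted by \(T \sim {\mathscr {U}}([0,1/k))\), and that subsamples I of size a are drawn without replacement.

```r
## Minimal simulation sketch, assuming a uniformly shifted regular grid of k cells
## and subsamples of size a drawn without replacement.
set.seed(1)
n <- 200; a <- 50; k <- 10; M <- 2e4
x <- 0.5; Xi <- 0.53                               # the point x and the position of X_i
freq <- mean(replicate(M, {
  Tshift <- runif(1, 0, 1 / k)                     # uniform random shift of the grid
  same_cell <- floor((Xi - Tshift) * k) == floor((x - Tshift) * k)
  sampled <- 1 %in% sample.int(n, a)               # is observation i = 1 in the subsample I?
  same_cell && sampled
}))
freq                                               # empirical frequency over M trees
(a / n) * max(0, 1 - k * abs(Xi - x))              # theoretical parameter, here 0.175
```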

Appendix 3: Details about the experiments

This section describes the experiments whose results are shown in Sect. 2.

Data generation process We take \({\mathscr {X}}= [0,1]^{p}\), with \(p \in \{ 5, 10 \}\). Table 2 only shows the results for \(p=5\); results for \(p=10\) are shown in the supplementary material.

The data \((X_i,Y_i)_{1 \le i \le n_1 + n_2}\) are independent with the same distribution: \(X_i \sim {\mathscr {U}}([0,1]^{p})\), \(Y_i = m(X_i) + \varepsilon _i\) with \(\varepsilon _i \sim {\mathscr {N}}(0,\sigma ^2)\) independent of \(X_i\), \(\sigma ^2 = 1/16\), and the regression function m is defined by

$$\begin{aligned} m : {\mathbf {x}} \in [0,1]^{p} \mapsto \mathbf {1/10} \times [10 \sin (\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5]. \end{aligned}$$

The function m is proportional to the Friedman1 function, which was introduced by Friedman (1991). Note that when \(p>5\), m only depends on the first 5 coordinates of \({\mathbf {x}}\).

Then, the two subsamples are defined by \({\mathscr {D}}_{n_1}^1= (X_i,Y_i)_{1 \le i \le n_1}\) and \({\mathscr {D}}_{n_2}^2= (X_i,Y_i)_{n_1 + 1 \le i \le n_1+n_2}\).

We always take \(n_1 = 1280\) and \(n_2 = 25{,}600\).
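For illustration only, a minimal R sketch of this data generation process could read as follows (the object names m, D1 and D2 are illustrative, not those of the code actually used for the experiments).

```r
set.seed(42)                                         # arbitrary seed, for reproducibility
p <- 5; n1 <- 1280; n2 <- 25600; sigma <- sqrt(1/16)
m <- function(x) (10 * sin(pi * x[1] * x[2]) + 20 * (x[3] - 0.5)^2 +
                  10 * x[4] + 5 * x[5]) / 10         # scaled Friedman1 function
X <- matrix(runif((n1 + n2) * p), ncol = p)          # X_i ~ U([0,1]^p)
colnames(X) <- paste0("x", 1:p)
Y <- apply(X, 1, m) + rnorm(n1 + n2, sd = sigma)     # Y_i = m(X_i) + eps_i
D1 <- list(X = X[1:n1, ], Y = Y[1:n1])                               # D^1_{n1}
D2 <- list(X = X[(n1 + 1):(n1 + n2), ], Y = Y[(n1 + 1):(n1 + n2)])   # D^2_{n2}
```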

Trees and forests For each \(k \in \{2^5, 2^6, 2^7, 2^8\}\) and each experimental condition (bootstrap or not, \({\mathtt {mtry}}=p\) or \(\lfloor p/3 \rfloor \)), we build hold-out random trees and forests as defined in Sect. 2. These are built with the randomForest R package (Liaw and Wiener 2002; R Core Team 2015), with appropriate parameters (the number of leaves k is controlled by \({\mathtt {maxnodes}}\), while \({\mathtt {nodesize}}=1\)).

Resampling within \({\mathscr {D}}_{n_1}^1\) (when there is some resampling) is done with a bootstrap sample of size \(n_1\) (that is, with replacement and \(a_{n_1} = n_1\)).

“Large” forests are made of \(M=k\) trees (a number of trees suggested by Arlot and Genuer 2014).
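For illustration only, a minimal R sketch of the corresponding randomForest call, reusing D1 from the previous sketch, could read as follows (one experimental condition shown; the hold-out weights computed from \({\mathscr {D}}_{n_2}^2\) appear in a later sketch).

```r
library(randomForest)
k <- 2^7                                        # number of leaves per tree
forest <- randomForest(x = D1$X, y = D1$Y,
                       ntree    = k,            # "large" forest: M = k trees
                       replace  = TRUE,         # bootstrap within D^1_{n1}; replace = FALSE
                       sampsize = nrow(D1$X),   #   with sampsize = n1 disables resampling
                       mtry     = floor(ncol(D1$X) / 3),  # or mtry = p
                       nodesize = 1,
                       maxnodes = k)            # k is controlled by maxnodes
```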

Estimates of approximation and estimation error Estimating the approximation and estimation errors (as defined by Eq. (2)) requires estimating expectations over \({\varTheta }\) (which includes the randomness of \({\mathscr {D}}_{n_1}^1\) as well as the randomness of the choice of bootstrap subsamples of \({\mathscr {D}}_{n_1}^1\) and of the repeated choices of a subset \({\mathscr {M}}_{\mathrm {try}}\)). This is done with a Monte-Carlo approximation, with 500 replicates for trees and 10 replicates for forests. The latter number might seem small, but we observe that large forests are quite stable, so their expectations can be evaluated precisely from a small number of replicates.

We estimate the approximation error (integrated over \({\mathbf {x}}\)) as follows. For each partition that we build, we compute the corresponding “ideal” tree, which maps each piece of the partition to the average of m over it (this average can be computed almost exactly from the definition of m). Then, to each forest we associate the “ideal” forest \({\overline{m}}^{\star }_{M,n}\) which is the average of the ideal trees. We can thus compute \(( {\overline{m}}^{\star }_{M,n} ({\mathbf {x}}) - m({\mathbf {x}}) )^2\) for any \({\mathbf {x}} \in {\mathscr {X}}\), and estimate its expectation with respect to \({\varTheta }\). Averaging these estimates over 1000 uniform random points \({\mathbf {x}} \in {\mathscr {X}}\) provides our estimate of the approximation error.
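For illustration only, a minimal R sketch of this construction, reusing forest, m and X from the previous sketches, could read as follows; it relies on the terminal-node indicators returned by predict(..., nodes = TRUE) and approximates the average of m over each cell by Monte-Carlo over a large uniform sample Z, instead of the near-exact computation mentioned above.

```r
Z <- matrix(runif(2e5 * p), ncol = p); colnames(Z) <- colnames(X)   # points used to average m over cells
mZ <- apply(Z, 1, m)
x_eval <- matrix(runif(1000 * p), ncol = p); colnames(x_eval) <- colnames(X)
nodes_Z <- attr(predict(forest, Z, nodes = TRUE), "nodes")          # terminal node of each row of Z, per tree
nodes_x <- attr(predict(forest, x_eval, nodes = TRUE), "nodes")
ideal_trees <- sapply(seq_len(ncol(nodes_Z)), function(j) {
  cell_means <- tapply(mZ, nodes_Z[, j], mean)                      # "ideal" tree: average of m over each cell
  cell_means[as.character(nodes_x[, j])]                            # its value at the evaluation points
})
ideal_forest <- rowMeans(ideal_trees)                               # "ideal" forest = average of ideal trees
approx_error <- mean((ideal_forest - apply(x_eval, 1, m))^2)        # one Monte-Carlo replicate of A
```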

We estimate the estimation error (integrated over \({\mathbf {x}}\)) from Eq. (3); since \(\sigma ^2\) is known, we focus on the remaining term. Given some hold-out random forest, for any \({\mathbf {x}} \in {\mathscr {X}}\) and any \((X_i,Y_i) \in {\mathscr {D}}_{n_2}^2\), we can compute

$$\begin{aligned} W_{ni}({\mathbf {x}}) = \frac{1}{M} \sum _{j=1}^M \frac{{\mathbf {1}}_{X_i \in A_{n_1}({\mathbf {x}} ; {\varTheta }_j , {\mathscr {D}}_{n_1}^1) }}{N_{n_2}( {\mathbf {x}} ; {\varTheta }_j, {\mathscr {D}}_{n_1}^1, {\mathscr {D}}_{n_2}^2) }. \end{aligned}$$

Then, averaging \(\sum _i W_{ni}({\mathbf {x}})^2\) over several replicate trees/forests and over \(1\,000\) uniform random points \({\mathbf {x}} \in {\mathscr {X}}\), we get an estimate of the estimation error (divided by \(\sigma ^2\)).
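For illustration only, a minimal R sketch of this computation, reusing forest, D2 and x_eval from the previous sketches, could read as follows (it is slow but follows the formula above directly).

```r
nodes_D2 <- attr(predict(forest, D2$X, nodes = TRUE), "nodes")   # cells of the X_i of D^2_{n2}, per tree
nodes_x  <- attr(predict(forest, x_eval, nodes = TRUE), "nodes")
sum_W2 <- sapply(seq_len(nrow(x_eval)), function(ix) {
  W <- numeric(nrow(D2$X))
  for (j in seq_len(ncol(nodes_D2))) {
    in_cell <- nodes_D2[, j] == nodes_x[ix, j]    # 1{X_i in A_{n1}(x; Theta_j, D^1_{n1})}
    N <- sum(in_cell)                             # N_{n2}(x; Theta_j, D^1_{n1}, D^2_{n2})
    if (N > 0) W <- W + in_cell / N
  }
  sum((W / ncol(nodes_D2))^2)                     # sum_i W_{ni}(x)^2
})
estim_error <- sigma^2 * mean(sum_W2)             # estimate of E from Eq. (3), one replicate
```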

Summarizing the results in Table 2 Given the estimates of the (integrated) approximation and estimation errors that we obtain for every \(k \in \{2^5, 2^6, 2^7, 2^8\}\), we plot each kind of error as a function of k (in \(\mathrm {log}_2\)-\(\mathrm {log}_2\) scale for the approximation error), and we fit a simple linear model (with an intercept). The estimated parameters of the model directly give the results shown in Table 2 (in which the value of the intercept for the estimation error is omitted for simplicity). The corresponding graphs are shown in supplementary material.
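For illustration only, a minimal R sketch of this last step could read as follows, assuming hypothetical vectors approx_errors and estim_errors that collect the estimates above for each value of k.

```r
## 'approx_errors' and 'estim_errors': hypothetical vectors of estimates, one entry per k
k_values <- 2^(5:8)
fit_A <- lm(log2(approx_errors) ~ log2(k_values))  # log2-log2 fit for the approximation error
fit_E <- lm(estim_errors ~ k_values)               # linear fit (with intercept) for the estimation error
coef(fit_A); coef(fit_E)                           # slopes/intercepts reported in Table 2
```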


About this article


Cite this article

Arlot, S., Genuer, R. Comments on: A random forest guided tour. TEST 25, 228–238 (2016). https://doi.org/10.1007/s11749-016-0484-4


