This section describes the experiments whose results are shown in Sect. 2.

*Data generation process* We take \({\mathscr {X}}= [0,1]^{p}\), with \(p \in \{ 5, 10 \}\). Table 2 shows only the results for \(p=5\); results for \(p=10\) are given in the supplementary material.

The data \((X_i,Y_i)_{1 \le i \le n_1 + n_2}\) are independent with the same distribution: \(X_i \sim {\mathscr {U}}([0,1]^{p})\), \(Y_i = m(X_i) + \varepsilon _i\) with \(\varepsilon _i \sim {\mathscr {N}}(0,\sigma ^2)\) independent of \(X_i\), \(\sigma ^2 = 1/16\), and the regression function *m* defined by

$$\begin{aligned} m : {\mathbf {x}} \in [0,1]^{p} \mapsto \frac{1}{10} \left[ 10 \sin (\pi x_1 x_2) + 20 (x_3 - 0.5)^2 + 10 x_4 + 5 x_5 \right]. \end{aligned}$$

The function *m* is proportional to the **Friedman1** function introduced by Friedman (1991). Note that when \(p>5\), *m* depends only on the first 5 coordinates of \({\mathbf {x}}\).
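For concreteness, the data-generation process can be sketched as follows. This is a minimal NumPy stand-in for the setup above (the function and variable names are ours):

```python
import numpy as np

def m(x):
    """Regression function: (1/10) times Friedman1, on the first 5 coordinates."""
    return 0.1 * (10 * np.sin(np.pi * x[..., 0] * x[..., 1])
                  + 20 * (x[..., 2] - 0.5) ** 2
                  + 10 * x[..., 3]
                  + 5 * x[..., 4])

def generate_sample(n, p=5, sigma2=1 / 16, seed=None):
    """Draw n i.i.d. pairs (X_i, Y_i): X_i ~ U([0,1]^p), Gaussian noise N(0, sigma2)."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(0.0, 1.0, size=(n, p))
    Y = m(X) + rng.normal(0.0, np.sqrt(sigma2), size=n)
    return X, Y

# The two subsamples used in the experiments:
X1, Y1 = generate_sample(1280, seed=0)     # D^1_{n_1}
X2, Y2 = generate_sample(25_600, seed=1)   # D^2_{n_2}
```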

Then, the two subsamples are defined by \({\mathscr {D}}_{n_1}^1= (X_i,Y_i)_{1 \le i \le n_1}\) and \({\mathscr {D}}_{n_2}^2= (X_i,Y_i)_{n_1 + 1 \le i \le n_1+n_2}\).

We always take \(n_1 = 1280\) and \(n_2 = 25{,}600\).

*Trees and forests* For each \(k \in \{2^5, 2^6, 2^7, 2^8\}\) and each experimental condition (bootstrap or not, \({\mathtt {mtry}}=p\) or \(\lfloor p/3 \rfloor \)), we build hold-out random trees and forests as defined in Sect. 2. These are built with the randomForest R package (Liaw and Wiener 2002; R Core Team 2015), with appropriate parameters (*k* is controlled by maxnodes, while \({\mathtt {nodesize}}=1\)).

Resampling within \({\mathscr {D}}_{n_1}^1\) (when there is some resampling) is done with a bootstrap sample of size \(n_1\) (that is, with replacement and \(a_{n_1} = n_1\)).

“Large” forests are made of \(M=k\) trees (a number of trees suggested by Arlot and Genuer 2014).
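The tree-growing constraints above have rough analogues in scikit-learn, which we use here as a testable stand-in for the randomForest R calls. This is only a sketch of one grid point of the experimental conditions; it does not implement the hold-out construction, which additionally recomputes the leaf averages on \({\mathscr {D}}_{n_2}^2\):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, y = rng.uniform(size=(1280, 5)), rng.normal(size=1280)  # placeholder data

k = 2 ** 5                 # target number of leaves per tree
forest = RandomForestRegressor(
    n_estimators=k,        # "large" forest: M = k trees
    max_leaf_nodes=k,      # analogue of randomForest's maxnodes
    min_samples_leaf=1,    # analogue of nodesize = 1
    max_features=5 // 3,   # mtry = floor(p/3); set to 5 for mtry = p
    bootstrap=True,        # switch off for the "no resampling" condition
    random_state=0,
).fit(X, y)

n_leaves = [t.get_n_leaves() for t in forest.estimators_]
```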

*Estimates of approximation and estimation error* Estimating the approximation and estimation errors (as defined by Eq. (2)) requires estimating several expectations over \({\varTheta }\) (which includes the randomness of \({\mathscr {D}}_{n_1}^1\) as well as the randomness of the choice of bootstrap subsamples of \({\mathscr {D}}_{n_1}^1\) and of the repeated choices of a subset \({\mathscr {M}}_{\mathrm {try}}\)). This is done with a Monte-Carlo approximation, with 500 replicates for trees and 10 replicates for forests. The latter number might seem small, but large forests turn out to be quite stable, so their expectations can be evaluated precisely from a small number of replicates.

We estimate the approximation error (integrated over \({\mathbf {x}}\)) as follows. For each partition that we build, we compute the corresponding “ideal” tree, which maps each piece of the partition to the average of *m* over it (this average can be computed almost exactly from the definition of *m*). Then, to each forest we associate the “ideal” forest \({\overline{m}}^{\star }_{M,n}\) which is the average of the ideal trees. We can thus compute \(( {\overline{m}}^{\star }_{M,n} ({\mathbf {x}}) - m({\mathbf {x}}) )^2\) for any \({\mathbf {x}} \in {\mathscr {X}}\), and estimate its expectation with respect to \({\varTheta }\). Averaging these estimates over 1000 uniform random points \({\mathbf {x}} \in {\mathscr {X}}\) provides our estimate of the approximation error.
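The ideal-forest computation can be sketched as follows, with scikit-learn trees standing in for the hold-out partitions. Note one simplification: the authors compute the leaf-wise averages of *m* almost exactly from its definition, whereas this sketch replaces them with a Monte-Carlo average over a dense uniform grid; all names are ours:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def m(x):
    """(1/10) times Friedman1, on the first 5 coordinates."""
    return 0.1 * (10 * np.sin(np.pi * x[:, 0] * x[:, 1])
                  + 20 * (x[:, 2] - 0.5) ** 2 + 10 * x[:, 3] + 5 * x[:, 4])

rng = np.random.default_rng(1)
X1 = rng.uniform(size=(1280, 5))
Y1 = m(X1) + rng.normal(0, 0.25, size=1280)          # sigma = 1/4

# Fitted trees stand in for the hold-out partitions
forest = RandomForestRegressor(n_estimators=16, max_leaf_nodes=32,
                               random_state=1).fit(X1, Y1)

grid = rng.uniform(size=(100_000, 5))    # dense grid to average m over each leaf
x_eval = rng.uniform(size=(1000, 5))     # 1000 uniform evaluation points

ideal_pred = np.zeros(len(x_eval))
for tree in forest.estimators_:
    n_nodes = tree.tree_.node_count
    leaf_grid, leaf_eval = tree.apply(grid), tree.apply(x_eval)
    sums = np.bincount(leaf_grid, weights=m(grid), minlength=n_nodes)
    counts = np.bincount(leaf_grid, minlength=n_nodes)
    leaf_mean = sums / np.maximum(counts, 1)         # average of m over each leaf
    ideal_pred += leaf_mean[leaf_eval]               # "ideal" tree prediction
ideal_pred /= forest.n_estimators                    # "ideal" forest prediction

approx_error = np.mean((ideal_pred - m(x_eval)) ** 2)
```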

We estimate the estimation error (integrated over \({\mathbf {x}}\)) from Eq. (3); since \(\sigma ^2\) is known, we focus on the remaining term. Given some hold-out random forest, for any \({\mathbf {x}} \in {\mathscr {X}}\) and any \(i \in \{n_1+1, \ldots , n_1+n_2\}\), we can compute

$$\begin{aligned} W_{ni}({\mathbf {x}}) = \frac{1}{M} \sum _{j=1}^M \frac{{\mathbf {1}}_{X_i \in A_{n_1}({\mathbf {x}} ; {\varTheta }_j , {\mathscr {D}}_{n_1}^1) }}{N_{n_2}( {\mathbf {x}} ; {\varTheta }_j, {\mathscr {D}}_{n_1}^1, {\mathscr {D}}_{n_2}^2) }. \end{aligned}$$

Then, averaging \(\sum _i W_{ni}({\mathbf {x}})^2\) over several replicate trees/forests and over 1000 uniform random points \({\mathbf {x}} \in {\mathscr {X}}\), we get an estimate of the estimation error (divided by \(\sigma ^2\)).
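The weight computation admits a compact vectorized sketch. As before, scikit-learn trees stand in for the hold-out partitions, and all names are ours; `W[q, i]` plays the role of \(W_{ni}({\mathbf {x}}_q)\):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X1, Y1 = rng.uniform(size=(500, 5)), rng.normal(size=500)  # D^1: grows the partitions
X2 = rng.uniform(size=(2000, 5))                           # D^2: only covariates needed
x_eval = rng.uniform(size=(100, 5))                        # uniform query points x

forest = RandomForestRegressor(n_estimators=10, max_leaf_nodes=32,
                               random_state=2).fit(X1, Y1)

# W[q, i] = average over trees of 1{X_i in A(x_q; Theta_j)} / N(x_q; Theta_j)
W = np.zeros((len(x_eval), len(X2)))
for tree in forest.estimators_:
    leaves_eval, leaves_d2 = tree.apply(x_eval), tree.apply(X2)
    same_leaf = leaves_d2[None, :] == leaves_eval[:, None]  # indicator matrix
    counts = same_leaf.sum(axis=1)                          # N(x; Theta_j) per query
    W += same_leaf / np.maximum(counts, 1)[:, None]         # guard empty leaves
W /= forest.n_estimators

sum_w2 = (W ** 2).sum(axis=1)      # sum_i W_{ni}(x)^2, one value per query point
estimation_term = sum_w2.mean()    # estimation error divided by sigma^2
```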

*Summarizing the results in Table 2* Given the estimates of the (integrated) approximation and estimation errors obtained for every \(k \in \{2^5, 2^6, 2^7, 2^8\}\), we plot each kind of error as a function of *k* (in \(\log _2\)-\(\log _2\) scale for the approximation error) and fit a simple linear model (with an intercept). The estimated parameters of this model directly give the results shown in Table 2 (where the value of the intercept for the estimation error is omitted for simplicity). The corresponding graphs are shown in the supplementary material.
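The fits behind Table 2 can be sketched as follows. The error values below are synthetic stand-ins (the real inputs are the Monte-Carlo estimates described above); they are placed exactly on a power law and a line so that the recovered coefficients are known:

```python
import numpy as np

ks = np.array([2 ** 5, 2 ** 6, 2 ** 7, 2 ** 8], dtype=float)

# Synthetic stand-ins for the measured errors: a power law in k for the
# approximation error, a linear trend in k for the estimation error.
approx_err = 3.0 * ks ** -0.8
estim_err = 0.002 * ks + 0.01

# Approximation error: linear fit in log2-log2 scale (slope = exponent)
slope, intercept = np.polyfit(np.log2(ks), np.log2(approx_err), 1)

# Estimation error: linear fit directly against k
rate, offset = np.polyfit(ks, estim_err, 1)
```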