
Tree-based boosting with functional data

  • Original Paper
  • Published in Computational Statistics

Abstract

In this article we propose a boosting algorithm for regression with functional explanatory variables and scalar responses. The algorithm uses decision trees constructed with multiple projections as the “base-learners”, which we call “functional multi-index trees”. We establish identifiability conditions for these trees and introduce two algorithms to compute them. We use numerical experiments to investigate the performance of our method and compare it with several linear and nonlinear regression estimators, including recently proposed nonparametric and semiparametric functional additive estimators. Simulation studies show that the proposed method is consistently among the top performers, whereas the performance of existing alternatives can vary substantially across different settings. In a real example, we apply our method to predict electricity demand using price curves and show that our estimator provides better predictions compared to its competitors, especially when one adjusts for seasonality.
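The procedure summarized above — gradient boosting whose base learners are regression trees built on a few projections of each predictor curve — can be illustrated with a minimal, hypothetical sketch. The random projections, grid, and simulated data below are illustrative stand-ins only; this is not the authors' TFBoost implementation nor their index-selection algorithms.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical sketch of boosting with "functional multi-index trees": each
# base learner is a regression tree fit on K projections <X, beta_k> of the
# predictor curves. The random directions below stand in for the paper's
# index-selection algorithms; this is NOT the authors' TFBoost code.

rng = np.random.default_rng(0)
n, m = 200, 50                        # n curves observed on an m-point grid
t = np.linspace(0.0, 1.0, m)
dt = t[1] - t[0]
X = np.cumsum(rng.normal(size=(n, m)), axis=1) / np.sqrt(m)  # rough random curves
y = (X * np.sin(np.pi * t)).sum(axis=1) * dt + 0.1 * rng.normal(size=n)

def project(X, betas, dt):
    """Riemann-sum approximation of the inner products <X_i, beta_k>."""
    return X @ betas.T * dt

K, T, shrink = 2, 100, 0.05           # indices per tree, iterations, step size
F = np.full(n, y.mean())              # boosting starts from the sample mean
ensemble = []
for _ in range(T):
    betas = rng.normal(size=(K, m))
    betas /= np.sqrt((betas**2).sum(axis=1, keepdims=True) * dt)  # unit L2 norm
    Z = project(X, betas, dt)
    tree = DecisionTreeRegressor(max_depth=3).fit(Z, y - F)       # fit residuals
    F += shrink * tree.predict(Z)
    ensemble.append((betas, tree))

print(round(float(np.mean((y - F) ** 2)), 4))  # training MSE after T iterations
```

Replacing the random directions with optimized ones, as the paper's two algorithms do, is what turns each base learner into a "functional multi-index tree" in the authors' sense.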


References

  • Ait-Saïdi A, Ferraty F, Kassa R et al (2008) Cross-validated estimations in the single-functional index model. Statistics 42(6):475–494
  • Amato U, Antoniadis A, De Feis I (2006) Dimension reduction in functional regression with applications. Comput Stat Data Anal 50(9):2422–2446
  • Amiri A, Crambes C, Thiam B (2014) Recursive estimation of nonparametric regression with functional covariate. Comput Stat Data Anal 69:154–172
  • Avery M, Wu Y, Helen Zhang H et al (2014) RKHS-based functional nonparametric regression for sparse and irregular longitudinal data. Can J Stat 42(2):204–216
  • Baíllo A, Grané A (2009) Local linear regression for functional predictor and scalar response. J Multivar Anal 100(1):102–111
  • Barrientos-Marin J, Ferraty F, Vieu P (2010) Locally modelled regression and functional data. J Nonparametr Stat 22(5):617–632
  • Bates D, Mächler M, Bolker B et al (2015) Fitting linear mixed-effects models using lme4. J Stat Softw 67(1):1–48
  • Berlinet A, Elamine A, Mas A (2011) Local linear regression for functional data. Ann Inst Stat Math 63(5):1047–1075
  • Blumenson L (1960) A derivation of n-dimensional spherical coordinates. Am Math Mon 67(1):63–66
  • Boente G, Salibian-Barrera M (2021) Robust functional principal components for sparse longitudinal data. METRON 79(2):1–30
  • Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees, 1st edn. Routledge
  • Burba F, Ferraty F, Vieu P (2009) K-nearest neighbour method in functional nonparametric regression. J Nonparametr Stat 21(4):453–469
  • Cardot H, Sarda P (2005) Estimation in generalized linear models for functional data via penalized likelihood. J Multivar Anal 92(1):24–41
  • Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Probabil Lett 45(1):11–22
  • Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13(3):571–591
  • Carroll C, Gajardo A, Chen Y et al (2021) fdapace: functional data analysis and empirical dynamics. https://CRAN.R-project.org/package=fdapace, R package version 0.5.6
  • Chen D, Hall P, Müller HG et al (2011) Single and multiple index functional regression models with nonparametric link. Ann Stat 39(3):1720–1747
  • Dou WW, Pollard D, Zhou HH et al (2012) Estimation in functional regression for general exponential families. Ann Stat 40(5):2421–2451
  • Fan Y, James GM, Radchenko P et al (2015) Functional additive regression. Ann Stat 43(5):2296–2325
  • Febrero-Bande M, González-Manteiga W (2013) Generalized additive models for functional data. Test 22(2):278–292
  • Ferraty F, Vieu P (2002) The functional nonparametric model and application to spectrometric data. Comput Stat 17(4):545–564
  • Ferraty F, Vieu P (2006) Nonparametric functional data analysis: theory and practice. Springer, New York, NY
  • Ferraty F, Vieu P (2009) Additive prediction and boosting for functional data. Comput Stat Data Anal 53(4):1400–1413
  • Ferraty F, Peuch A, Vieu P (2003) Modèle à indice fonctionnel simple. CR Math 336(12):1025–1028
  • Ferraty F, Hall P, Vieu P (2010) Most-predictive design points for functional data predictors. Biometrika 97(4):807–824
  • Ferraty F, Park J, Vieu P (2011) Estimation of a functional single index model. In: Ferraty F (ed) Recent advances in functional data analysis and related topics. Physica-Verlag HD, Heidelberg
  • Ferraty F, Goia A, Salinelli E et al (2013) Functional projection pursuit regression. Test 22(2):293–320
  • Ferré L, Yao AF (2003) Functional sliced inverse regression analysis. Statistics 37(6):475–488
  • Friedman JH (2001) Greedy function approximation: a gradient boosting machine. Ann Stat 29(5):1189–1232
  • Geenens G et al (2011) Curse of dimensionality and related issues in nonparametric functional regression. Stat Surv 5:30–43
  • Goia A, Vieu P (2015) A partitioned single functional index model. Comput Stat 30(3):673–692
  • Goldsmith J, Scheipl F, Huang L et al (2020) refund: regression with functional data. https://CRAN.R-project.org/package=refund, R package version 0.1-23
  • Gregorutti B (2016) RFgroove: importance measure and selection for groups of variables with random forests. https://CRAN.R-project.org/package=RFgroove, R package version 1.1
  • Gregorutti B, Michel B, Saint-Pierre P (2015) Grouped variable importance with random forests and application to multiple functional data analysis. Comput Stat Data Anal 90:15–35
  • Greven S, Scheipl F (2017) A general framework for functional regression modelling. Stat Model 17(1–2):1–35
  • Hall P, Horowitz JL et al (2007) Methodology and convergence rates for functional linear regression. Ann Stat 35(1):70–91
  • Hastie T, Mallows C (1993) A statistical view of some chemometrics regression tools. Technometrics 35(2):140–143
  • James GM (2002) Generalized linear models with functional predictors. J R Stat Soc Ser B (Stat Methodol) 64(3):411–432
  • James GM, Silverman BW (2005) Functional adaptive model estimation. J Am Stat Assoc 100(470):565–576
  • Jiang CR, Wang JL et al (2011) Functional single index models for longitudinal data. Ann Stat 39(1):362–388
  • Kara LZ, Laksaci A, Rachdi M et al (2017) Data-driven KNN estimation in nonparametric functional data analysis. J Multivar Anal 153:176–188
  • Kudraszow NL, Vieu P (2013) Uniform consistency of KNN regressors for functional variables. Stat Probabil Lett 83(8):1863–1870
  • Li KC (1991) Sliced inverse regression for dimension reduction. J Am Stat Assoc 86(414):316–327
  • Lian H, Li G (2014) Series expansion for functional sufficient dimension reduction. J Multivar Anal 124:150–165
  • Liebl D et al (2013) Modeling and forecasting electricity spot prices: a functional data perspective. Ann Appl Stat 7(3):1562–1592
  • Ling N, Vieu P (2018) Nonparametric modelling for functional data: selected survey and tracks for future. Statistics 52(4):934–949
  • Ling N, Vieu P (2020) On semiparametric regression in functional data analysis. Wiley Interdisciplinary Reviews: Computational Statistics 1538. https://doi.org/10.1002/wics.1538
  • Mas A et al (2012) Lower bound in regression for functional data by representation of small ball probabilities. Electron J Stat 6:1745–1778
  • McLean MW, Hooker G, Staicu AM et al (2014) Functional generalized additive models. J Comput Graph Stat 23(1):249–269
  • Möller A, Tutz G, Gertheiss J (2016) Random forests for functional covariates. J Chemom 30(12):715–725
  • Müller HG, Yao F (2008) Functional additive models. J Am Stat Assoc 103(484):1534–1544
  • Müller HG, Stadtmüller U et al (2005) Generalized functional linear models. Ann Stat 33(2):774–805
  • Müller HG, Wu Y, Yao F (2013) Continuously additive models for nonlinear functional regression. Biometrika 100(3):607–622
  • Preda C (2007) Regression models for functional data by reproducing kernel Hilbert spaces methods. J Stat Plan Inference 137(3):829–840
  • Reiss PT, Ogden RT (2007) Functional principal component regression and functional partial least squares. J Am Stat Assoc 102(479):984–996
  • Shang HL (2016) A Bayesian approach for determining the optimal semi-metric and bandwidth in scalar-on-function quantile regression with unknown error density and dependent functional data. J Multivar Anal 146:95–104
  • Telgarsky M (2013) Margins, shrinkage, and boosting. Int Conf Mach Learn 28(2):307–315
  • Therneau T, Atkinson B (2019) rpart: recursive partitioning and regression trees. https://CRAN.R-project.org/package=rpart, R package version 4.1-15
  • Tutz G, Gertheiss J (2010) Feature extraction in signal regression: a boosting technique for functional data regression. J Comput Graph Stat 19(1):154–174
  • Wang G, Lin N, Zhang B (2014) Functional K-means inverse regression. Comput Stat Data Anal 70:172–182
  • Wood SN (2017) Generalized additive models: an introduction with R, 2nd edn. Chapman and Hall/CRC, Boca Raton
  • Yao F, Müller HG, Wang JL (2005) Functional data analysis for sparse longitudinal data. J Am Stat Assoc 100(470):577–590
  • Zhang T, Yu B (2005) Boosting with early stopping: convergence and consistency. Ann Stat 33(4):1538–1579
  • Zhao Y, Ogden RT, Reiss PT (2012) Wavelet-based lasso in functional linear regression. J Comput Graph Stat 21(3):600–617


Acknowledgements

The authors would like to thank Professors James and Ferraty for sharing the code used in their papers (James and Silverman 2005; Ferraty et al. 2013). In addition, we would like to thank two anonymous referees and an Associate Editor for their constructive comments on an earlier version of this work that resulted in a notably improved paper.

Funding

This research was supported by the Natural Sciences and Engineering Research Council of Canada [Discovery Grant RGPIN-2016-04288].

Author information


Corresponding author

Correspondence to Xiaomeng Ju.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

Proof of Theorem 1 in Sect. 2.1.1.

Proof

It is clear that if \(\{\beta _1,..., \beta _K \} = \left\{ (-1)^{l_1}\eta _1,..., (-1)^{l_K}\eta _K \right\}\) for some \(l_1,...,l_K \in \{0,1\}\), then \(g = {\tilde{g}}\). Therefore, it suffices to show that \(\{\beta _1,..., \beta _K \} = \left\{ (-1)^{l_1}\eta _1,..., (-1)^{l_K}\eta _K \right\}\) for some \(l_1,...,l_K \in \{0,1\}\). We show that if no such \(l_1,...,l_K\) exist, then there is a set of indices for which (12) is a constant function, contradicting Condition 2.

For simplicity, we let \({\tilde{\eta }}_j = (-1)^{l_j} \eta _j\). If \(\{\beta _1,..., \beta _K \} \ne \left\{ {\tilde{\eta }}_1,..., {\tilde{\eta }}_K \right\}\) for every choice of \(l_1,...,l_K\), we match the two sets so that equal vectors \(\beta _j\) and \({\tilde{\eta }}_j\) are aligned with each other. Let \(S = \{\beta _1,...,\beta _K \} \cap \{{\tilde{\eta }}_1,...,{\tilde{\eta }}_K \}\), so that \(\beta _j = {\tilde{\eta }}_j\) for \(j = 1,..., |S|\), \(\beta _j \notin \{{\tilde{\eta }}_1,...,{\tilde{\eta }}_K \}\) for \(j = |S|+1,..., K\), and \(|S| < K\). By Condition 2, there exists an \(x_0\) such that, for \(J = \{|S|+1,...,K\}\), (12) is not a constant function.

By (13), Conditions 1 and 2, for any \(t_1,..., t_K \in (-\delta , \delta )\)

$$\begin{aligned} h\left( \langle x_0 ,\beta _1\rangle + t_1, ..., \langle x_0, \beta _K \rangle + t_K\right)&= h\left( \langle x_0 + t_1\beta _1, \beta _1 \rangle , ..., \langle x_0 + t_K \beta _K, \beta _K \rangle \right) \\&={\tilde{h}} \left( \langle x_0 + t_1 \beta _1, \eta _1 \rangle , ..., \langle x_0 + t_K \beta _K, \eta _K \rangle \right) \\&= {\tilde{h}} \left( \langle x_0, \eta _1 \rangle + t_1 \langle \beta _1, \eta _1\rangle , ..., \langle x_0, \eta _K \rangle + t_K \langle \beta _K,\eta _K\rangle \right) \end{aligned}$$

and similarly

$$\begin{aligned} {\tilde{h}}\left( \langle x_0, \eta _1 \rangle + t_1, ..., \langle x_0, \eta _K \rangle + t_K\right)&= {\tilde{h}}\left( \langle x_0 + t_1 \eta _1 , \eta _1 \rangle , ..., \langle x_0 + t_K \eta _K, \eta _K\rangle \right) \\&=h \left( \langle x_0 + t_1 \eta _1, \beta _1 \rangle , ..., \langle x_0 + t_K \eta _K, \beta _K \rangle \right) \\&= h \left( \langle x_0, \beta _1 \rangle + t_1 \langle \beta _1, \eta _1 \rangle , ..., \langle x_0, \beta _K \rangle + t_K \langle \beta _K, \eta _K\rangle \right) \end{aligned}$$

By the Cauchy–Schwarz inequality and Condition 1, \((\langle \beta _j, \eta _j \rangle )^2 = 1\) for \(j = 1,..., |S|\) and \((\langle \beta _j, \eta _j \rangle )^2 < 1\) for \(j = |S|+1,..., K\). For any \(t_1,..., t_K \in (-\delta ,\delta )\),

$$\begin{aligned} h\left( \langle x_0, \beta _1 \rangle + t_1, ..., \langle x_0 , \beta _K \rangle + t_K\right)&= {\tilde{h}} ( \langle x_0,\eta _1 \rangle + t_1 \langle \beta _1, \eta _1 \rangle , ..., \nonumber \\&\quad \langle x_0, \eta _K \rangle + t_K \langle \beta _K, \eta _K\rangle ) \nonumber \\&= h ( \langle x_0, \beta _1 \rangle + t_1 \langle \beta _1, \eta _1 \rangle ^2, ..., \nonumber \\&\quad \langle x_0, \beta _K\rangle + t_K \langle \beta _K, \eta _K\rangle ^2 ) \nonumber \\&\vdots \nonumber \\&= h ( \langle x_0, \beta _1 \rangle + t_1 \langle \beta _1, \eta _1 \rangle ^{2n}, ..., \nonumber \\&\quad \langle x_0, \beta _K\rangle + t_K \langle \beta _K, \eta _K \rangle ^{2n} )\nonumber \\&\vdots \nonumber \\&= h \left( \langle x_0, \beta _1 \rangle + t_1I_1, ..., \langle x_0, \beta _K\rangle + t_KI_K\right) \end{aligned}$$
(22)

where \(I_j = 1\) for \(j = 1,...,|S|\) and \(I_j = 0\) for \(j = |S|+1,..., K\).

Let \(x = x_0 + t e\) for any unit function \(e \in L^2({\mathcal {I}})\) with \(\Vert e \Vert = 1\) and \(t \in (-\delta , \delta )\). As e and t vary, x ranges over the ball \(B(x_0, \delta )\). For \(j = 1,...,K\), we define

$$\begin{aligned} L_j(x)&= (1-I_j)x + I_j x_0 \\&= (1-I_j) (x_0 + te) + I_j x_0 \\&= x_0 + (1 - I_j) te , \end{aligned}$$

so that

$$\begin{aligned} h( \langle L_1(x),\beta _1 \rangle ,..., \langle L_K(x), \beta _K \rangle )&= h( \langle x_0 + (1 - I_1) te, \beta _1 \rangle ,...,\langle x_0 + (1 - I_K) te, \beta _K \rangle ) \\&= h( \langle x_0, \beta _1 \rangle + \langle (1 - I_1) te, \beta _1 \rangle ,..., \langle x_0, \beta _K \rangle + \langle (1 - I_K) te, \beta _K \rangle ) \\&= h( \langle x_0, \beta _1 \rangle + \langle I_1 (1 - I_1) te, \beta _1 \rangle ,..., \langle x_0, \beta _K \rangle + \langle I_K(1 - I_K) te, \beta _K \rangle ) \quad \text {by (22)} \\&=h( \langle x_0, \beta _1 \rangle ,..., \langle x_0, \beta _K \rangle ), \end{aligned}$$

which is a constant function of x, contradicting Condition 2. \(\square\)

Appendix B

The summary statistics of test MSEs from 100 independent runs of the simulation are provided in Tables 2, 5, 8 and 11, with bold font indicating the lowest two average test errors in each setting. Summary statistics of the tree depths selected by TFBoost are provided in Tables 3, 6, 9 and 12, and summary statistics of the early stopping times for TFBoost in Tables 4, 7, 10 and 13.

Table 2 Summary statistics of test errors for data generated from \(r_1\)
Table 3 Summary statistics of the tree depths selected by TFBoost for data generated from \(r_1\)
Table 4 Summary statistics of the early stopping times \(T_{\text {stop}}\) selected by TFBoost methods for data generated from \(r_1\)
Table 5 Summary statistics of test errors for data generated from \(r_2\)
Table 6 Summary statistics of the tree depths selected by TFBoost for data generated from \(r_2\)
Table 7 Summary statistics of the early stopping times \(T_{\text {stop}}\) selected by TFBoost methods for data generated from \(r_2\)
Table 8 Summary statistics of test errors for data generated from \(r_3\)
Table 9 Summary statistics of the tree depths selected by TFBoost for data generated from \(r_3\)
Table 10 Summary statistics of the early stopping times \(T_{\text {stop}}\) selected by TFBoost methods for data generated from \(r_3\)
Table 11 Summary statistics of test errors for data generated from \(r_4\)
Table 12 Summary statistics of the tree depths selected by TFBoost for data generated from \(r_4\)
Table 13 Summary statistics of the early stopping times \(T_{\text {stop}}\) selected by TFBoost methods for data generated from \(r_4\)

Figures 9, 10, 11 and 12 show the test MSEs averaged over 100 runs of the experiment versus the number of iterations for TFBoost. To make these averages well defined, for each run we set the test errors at iterations past the early stopping time equal to the test error obtained at the early stopping time. The figures show that the test errors usually drop quickly within the first 100 iterations.
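The averaging convention just described can be sketched as follows: each run's error curve is extended past its early stopping time by carrying the last value forward before taking the mean across runs. The error values below are made up for illustration.

```python
import numpy as np

# Each run records test errors only up to its early stopping time, so the
# curves have different lengths. Before averaging across runs, extend every
# curve by repeating the error obtained at the early stopping time.

T_max = 10
runs = [
    [1.0, 0.6, 0.4, 0.35],                  # run stopped after 4 iterations
    [1.1, 0.7, 0.5, 0.45, 0.44, 0.44],      # run stopped after 6 iterations
]

def pad_to(errors, T):
    """Carry the last test error forward for iterations past early stopping."""
    return errors + [errors[-1]] * (T - len(errors))

curves = np.array([pad_to(r, T_max) for r in runs])
mean_curve = curves.mean(axis=0)            # average test error per iteration
print(mean_curve[-1])                       # mean of the two final errors
```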

Fig. 9

Test MSEs averaged from 100 runs of the experiment for TFBoost in \(r_1\) settings

Fig. 10

Test MSEs averaged from 100 runs of the experiment for TFBoost in \(r_2\) settings

Fig. 11

Test MSEs averaged from 100 runs of the experiment for TFBoost in \(r_3\) settings

Fig. 12

Test MSEs averaged from 100 runs of the experiment for TFBoost in \(r_4\) settings

Appendix C

We consider another regression function that is linear:

$$\begin{aligned} r_5(X) = \int _{{\mathcal {I}}} \left( \sin \left( \frac{3}{2} \pi t \right) + \sin \left( \frac{1}{2} \pi t \right) \right) X(t)dt. \end{aligned}$$
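A small numerical sketch of this linear regression function: the coefficient function \(\sin (3\pi t/2) + \sin (\pi t/2)\) is evaluated on a grid and the integral is approximated by the trapezoidal rule. Taking \({\mathcal {I}} = [0, 1]\) is an assumption here, and the test curve X(t) = t is purely illustrative.

```python
import numpy as np

# Approximate r_5(X) = int_I (sin(3*pi*t/2) + sin(pi*t/2)) X(t) dt on a grid.
# The unit-interval domain and the example curve X(t) = t are assumptions
# made for illustration only.

t = np.linspace(0.0, 1.0, 2001)
beta = np.sin(1.5 * np.pi * t) + np.sin(0.5 * np.pi * t)

def trapezoid(f, t):
    """Trapezoidal rule for samples f on grid t."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)

def r5(x_vals):
    """Approximate r_5(X) for a curve X observed on the grid t."""
    return trapezoid(beta * x_vals, t)

print(round(r5(t), 4))   # r_5 applied to the curve X(t) = t
```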

The other specifications of the model remain the same as described in Sect. 3. Table 14 includes the summary statistics of test MSEs from 100 independent runs of the simulation, with bold font indicating the lowest two average test errors in each setting. Tables 15 and 16 include the summary statistics of the tree depths and early stopping times selected by TFBoost methods.

Table 14 Summary statistics of test errors for data generated from \(r_5\)
Table 15 Summary statistics of the tree depths selected by TFBoost for data generated from \(r_5\)
Table 16 Summary statistics of the early stopping times \(T_{\text {stop}}\) selected by TFBoost methods for data generated from \(r_5\)

Appendix D

The summary statistics of test MSEs from 100 random partitions of the German electricity data in Sect. 4 are provided in Table 17.

Table 17 Summary statistics of test MSEs displayed in the form of mean (sd).


About this article


Cite this article

Ju, X., Salibián-Barrera, M. Tree-based boosting with functional data. Comput Stat 39, 1587–1620 (2024). https://doi.org/10.1007/s00180-023-01364-2
