Abstract
A new data science tool named wavelet-based gradient boosting is proposed and tested. The approach is a special case of componentwise linear least squares gradient boosting and involves wavelet functions of the original predictors. Wavelet-based gradient boosting takes advantage of the approximate \(\ell _1\) penalization induced by gradient boosting to give appropriately penalized additive fits. The method is readily implemented in R and produces parsimonious and interpretable regression fits and classifiers.
References
Binder, H., Tutz, G.: A comparison of methods for the fitting of generalized additive models. Stat. Comput. 18, 87–99 (2008)
Bühlmann, P.: Boosting for high-dimensional linear models. Ann. Stat. 34, 559–583 (2006)
Bühlmann, P., Yu, B.: Sparse boosting. J. Mach. Learn. Res. 7, 1001–1024 (2006)
Bühlmann, P., Hothorn, T.: Boosting algorithms: regularization, prediction and model fitting (with discussion). Stat. Sci. 22, 477–522 (2007)
Donoho, D.L.: De-noising by soft-thresholding. IEEE Trans. Inf. Theor. 41, 613–627 (1995)
Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika 81, 425–456 (1994)
Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32, 407–451 (2004)
Friedman, J.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 29, 1189–1232 (2001)
Hansen, M.H., Yu, B.: Model selection and the principle of minimum description length. J. Am. Stat. Assoc. 96, 746–774 (2001)
Hastie, T.: Comment on paper by Bühlmann & Hothorn. Stat. Sci. 22, 513–515 (2007)
Hastie, T., Tibshirani, R., Friedman, J.: The Elements of Statistical Learning, 2nd edn. Springer, New York (2009)
Hothorn, T., Bühlmann, P., Kneib, T., Schmid, M., Hofner, B.: mboost 2.2. Model-based boosting. R package (2011). http://cran.r-project.org
Hurvich, C.M., Simonoff, J.S., Tsai, C.: Smoothing parameter selection in nonparametric regression using an improved Akaike information criterion. J. R. Stat. Soc. B 60, 271–293 (1998)
Hyndman, R.J.: hdrcde 2.15. Highest density regions and conditional density estimation. R package (2010). http://cran.r-project.org
Leitenstorfer, F., Tutz, G.: Knot selection by boosting techniques. Comput. Stat. Data Anal. 51, 4605–4621 (2007)
Nason, G.P.: Wavelet Methods in Statistics with R. Springer, New York (2008)
Nason, G.P.: wavethresh 4.5. Wavelets statistics and transforms. R package (2010). http://cran.r-project.org
R Development Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0 (2012). http://www.R-project.org
Ridgeway, G.: gbm 1.6. Generalized boosted regression models. R package (2012). http://cran.r-project.org
Samworth, R.J., Wand, M.P.: Asymptotics and optimal bandwidth selection for highest density region estimation. Ann. Stat. 38, 1767–1792 (2010)
Vidakovic, B.: Statistical Modeling by Wavelets. Wiley, New York (1999)
Wand, M.P., Jones, M.C.: Kernel Smoothing. Chapman and Hall, London (1995)
Wand, M.P., Ormerod, J.T.: Penalized wavelets: embedding wavelets into semiparametric regression. Electron. J. Stat. 5, 1654–1717 (2011)
Zou, H., Hastie, T., Tibshirani, R.: On the “degrees of freedom” of the lasso. Ann. Stat. 5, 2173–2192 (2007)
Acknowledgments
We are grateful to Andrew Chernih for his provision of the Sydney residential property price data and to Peter Green for his comments on aspects of this research. Partial support was provided by Australian Research Council Discovery Project DP0877055. Assistance from the University of Technology, Sydney’s Distinguished Visitor programme is gratefully acknowledged.
Appendix: Highest-density region grids
We now provide details of the highest density region (HDR) grids used in Figures 3 and 5.
Let \(\varvec{x}=(x_1,\ldots ,x_n)\) be a generic univariate sample and \({\widehat{p}}\) be a probability density estimate based on \(\varvec{x}\). Then a \(100(1-\tau )\%\) highest-density region estimate is
$${\widehat{R}}_{\tau }=\{x:{\widehat{p}}(x)\ge {\widehat{p}}_{\tau }\},$$
where \({\widehat{p}}_{\tau }\) is chosen so that the probability mass of \({\widehat{p}}\) over the set \({\widehat{R}}_{\tau }\) does not exceed \(1-\tau \). See, for example, Samworth and Wand (2010) for a precise mathematical definition of \({\widehat{p}}_{\tau }\).
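A common empirical device for the threshold (not necessarily the exact rule used here; see Samworth and Wand 2010 for the precise definition) is to take \({\widehat{p}}_{\tau }\) as the \(\tau \)-quantile of the density values at the sample points, so that about \(100(1-\tau )\%\) of the sample falls where the density is at least \({\widehat{p}}_{\tau }\). A minimal Python sketch of this device, with illustrative function names (the paper's computations are in R):

```python
import numpy as np

def hdr_threshold(density_at_sample, tau):
    """Empirical HDR threshold: the tau-quantile of the density values
    evaluated at the sample points, so that roughly 100(1-tau)% of the
    sample lies where the density is at least this threshold."""
    return np.quantile(density_at_sample, tau)

def hdr_indicator(density_at_x, p_tau):
    """Membership of points in the estimated HDR {x : p_hat(x) >= p_tau}."""
    return density_at_x >= p_tau
```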
The most commonly used estimator \({\widehat{p}}\) for HDR estimation is the kernel density estimator
$${\widehat{p}}(x)=\frac{1}{nh}\sum _{i=1}^{n}K\!\left( \frac{x-x_{i}}{h}\right) ,$$
where \(K\) is a kernel function and \(h>0\) is a bandwidth (see e.g. Wand and Jones 1995). Recently, Samworth and Wand (2010) devised an automatic rule for selecting \(h\) in the HDR estimation context. The R package hdrcde (Hyndman 2010) implements both HDR estimation and the Samworth-Wand bandwidth selector. Figure 8 shows the 80% HDR estimate for the distance to coastline variable in the Sydney residential property price data. The corresponding HDR grid of size 50 is shown at the base of the plot.
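The pieces above can be combined into a short Python sketch that produces an equally spaced HDR grid of the kind shown at the base of the plot. This is illustrative only: the paper uses R and the hdrcde package, Silverman's rule-of-thumb bandwidth stands in for the Samworth-Wand selector, and the grid simply spans the HDR's extent, which assumes the estimated region is a single interval:

```python
import numpy as np

def kde(x_grid, sample, h):
    """Gaussian kernel density estimate p_hat(x) = (1/(n h)) sum_i K((x - x_i)/h)."""
    u = (x_grid[:, None] - sample[None, :]) / h
    K = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)
    return K.mean(axis=1) / h

def hdr_grid(sample, tau=0.2, grid_size=50, h=None):
    """Grid of `grid_size` equally spaced points spanning the estimated
    100(1-tau)% HDR. Silverman's rule-of-thumb bandwidth is a stand-in
    for the Samworth-Wand selector; a disconnected HDR would need a
    separate grid per component."""
    sample = np.asarray(sample, dtype=float)
    n = len(sample)
    if h is None:
        h = 1.06 * np.std(sample) * n ** (-1 / 5)
    # Empirical threshold: tau-quantile of the density at the sample points.
    p_tau = np.quantile(kde(sample, sample, h), tau)
    x = np.linspace(sample.min(), sample.max(), 512)
    inside = x[kde(x, sample, h) >= p_tau]
    return np.linspace(inside.min(), inside.max(), grid_size)
```

With the default `tau=0.2`, the returned grid covers (approximately) the 80% HDR, matching the setting of Figure 8.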
Cite this article
Dubossarsky, E., Friedman, J.H., Ormerod, J.T. et al. Wavelet-based gradient boosting. Stat Comput 26, 93–105 (2016). https://doi.org/10.1007/s11222-014-9474-0