Estimation and regularization techniques for regression models with multidimensional prediction functions

Statistics and Computing

Abstract

Boosting is one of the most important methods for fitting regression models and building prediction rules. A notable feature of boosting is that the technique can be modified such that it includes a built-in mechanism for shrinking coefficient estimates and variable selection. This regularization mechanism makes boosting a suitable method for analyzing data characterized by small sample sizes and large numbers of predictors. We extend the existing methodology by developing a boosting method for prediction functions with multiple components. Such multidimensional functions occur in many types of statistical models, for example in count data models and in models involving outcome variables with a mixture distribution. As will be demonstrated, the new algorithm is suitable for both the estimation of the prediction function and regularization of the estimates. In addition, nuisance parameters can be estimated simultaneously with the prediction function.
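To make the regularization mechanism described above concrete, the following is a minimal sketch of component-wise gradient boosting with the squared-error loss, the one-dimensional setting that the paper generalizes to multidimensional prediction functions. The function name and all parameter choices are illustrative, not taken from the paper: in each iteration, every predictor is fit alone to the current residuals, and only the single best-fitting coefficient is updated, shrunk by a small step length. Predictors that are never selected keep a zero coefficient, which yields the built-in shrinkage and variable selection the abstract refers to.

```python
import numpy as np

def componentwise_l2_boost(X, y, n_steps=100, nu=0.1):
    """Illustrative component-wise L2 gradient boosting.

    In each step, fit each predictor alone to the current residuals
    by least squares and update only the best-fitting coefficient,
    shrunk by the step length nu. Predictors never selected retain a
    zero coefficient, giving built-in variable selection.
    """
    n, p = X.shape
    coef = np.zeros(p)
    intercept = y.mean()  # offset / starting value
    for _ in range(n_steps):
        # For the squared-error loss, the negative gradient is the residual.
        resid = y - (intercept + X @ coef)
        best_j, best_b, best_rss = 0, 0.0, np.inf
        for j in range(p):
            xj = X[:, j]
            b = xj @ resid / (xj @ xj)  # univariate least-squares fit
            rss = np.sum((resid - b * xj) ** 2)
            if rss < best_rss:
                best_j, best_b, best_rss = j, b, rss
        coef[best_j] += nu * best_b  # shrunken update of one component only
    return intercept, coef
```

Stopping the procedure early (small `n_steps`) is what regularizes the coefficient estimates; the paper's contribution is to carry this scheme over to prediction functions with several components, updating one component of the multidimensional function per iteration and estimating nuisance parameters alongside.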



Author information


Corresponding author

Correspondence to Matthias Schmid.


Cite this article

Schmid, M., Potapov, S., Pfahlberg, A. et al. Estimation and regularization techniques for regression models with multidimensional prediction functions. Stat Comput 20, 139–150 (2010). https://doi.org/10.1007/s11222-009-9162-7
