Bootstrap model selection for possibly dependent and heterogeneous data

Annals of the Institute of Statistical Mathematics

Abstract

This paper proposes the use of the bootstrap in penalized model selection for possibly dependent and heterogeneous data. The results show that we can establish (at least asymptotically) a direct relationship between estimation error and a data-based complexity penalization. This requires redefining the target function as the sum of the individual expected predicted risks. In this framework, the wild bootstrap and related approaches can be used to estimate the penalty with no need to account explicitly for the dependence or heterogeneity of the data. The methodology is illustrated by a simulation study whose results are particularly encouraging.
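To make the idea of a bootstrap-estimated penalty concrete, the following is a minimal sketch of one generic construction: a wild-bootstrap covariance penalty for choosing a polynomial degree under squared-error loss. It is not taken from the paper, whose exact estimator is not given in this abstract. The function names (`wild_bootstrap_penalty`, `select_degree`), the Rademacher multipliers, the least-squares model classes, and the simulated data are all assumptions made for illustration, and the example uses independent observations only.

```python
import numpy as np

def wild_bootstrap_penalty(X, y, n_boot=200, seed=None):
    """Hypothetical helper: wild-bootstrap estimate of the optimism of the
    in-sample squared error of least squares on design matrix X."""
    rng = np.random.default_rng(seed)
    n = len(y)
    H = X @ np.linalg.pinv(X)          # projection ("hat") matrix of least squares
    fitted = H @ y
    resid = y - fitted
    total = 0.0
    for _ in range(n_boot):
        w = rng.choice([-1.0, 1.0], size=n)   # Rademacher multipliers
        e_star = w * resid                    # each error keeps its own observation's scale
        fit_star = H @ (fitted + e_star)      # refit on the bootstrap responses
        # covariance-type penalty: 2/n * sum of e*_i times the change in fitted value
        total += 2.0 * np.mean(e_star * (fit_star - fitted))
    return total / n_boot

def select_degree(x, y, max_degree=6, n_boot=200, seed=0):
    """Pick the polynomial degree minimising empirical risk plus estimated penalty."""
    scores = {}
    for d in range(max_degree + 1):
        X = np.vander(x, d + 1, increasing=True)   # columns 1, x, ..., x^d
        fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
        emp_risk = np.mean((y - fitted) ** 2)
        scores[d] = emp_risk + wild_bootstrap_penalty(X, y, n_boot, seed)
    return min(scores, key=scores.get), scores

# Simulated heteroskedastic data (illustrative assumption, not from the paper)
rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 200)
y = 1 + 2 * x - 3 * x**2 + (0.2 + 0.5 * np.abs(x)) * rng.standard_normal(200)
best, scores = select_degree(x, y)
print("selected degree:", best)
```

Because each residual is multiplied by its own random sign, the bootstrap errors inherit the observation-specific variance without any explicit variance model, which is the property that makes wild-bootstrap schemes attractive for heterogeneous data.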

Author information

Corresponding author

Correspondence to Alessio Sancetta.

Additional information

I thank the associate editor and the referee for comments that improved the quality and presentation of the paper.

About this article

Cite this article

Sancetta, A. Bootstrap model selection for possibly dependent and heterogeneous data. Ann Inst Stat Math 62, 515–546 (2010). https://doi.org/10.1007/s10463-008-0183-3

  • DOI: https://doi.org/10.1007/s10463-008-0183-3
