Minimization and estimation of the variance of prediction errors for cross-validation designs

Fuchs, Mathias; Krautenbacher, Norbert

doi:10.1080/15598608.2016.1158675

Article
Published: 01 June 2016

Volume 10, pages 420–443, (2016)

Journal of Statistical Theory and Practice

Mathias Fuchs¹ & Norbert Krautenbacher²,³

Abstract

We consider the mean prediction error of a classification or regression procedure as well as its cross-validation estimates, and investigate the variance of this estimate as a function of an arbitrary cross-validation design. We decompose this variance into a scalar product of coefficients and certain covariance expressions, such that the coefficients depend solely on the resampling design, and the covariances depend solely on the data’s probability distribution. We rewrite this scalar product in such a form that the initially large number of summands can gradually be decreased down to three under the validity of a quadratic approximation to the core covariances. We show an analytical example in which this quadratic approximation holds true exactly. Moreover, in this example, we show that the leave-p-out estimator of the error depends on p only by means of a constant and can, therefore, be written in a much simpler form. Furthermore, there is an unbiased estimator of the variance of K-fold cross-validation, in contrast to a claim in the literature. As a consequence, we can show that balanced incomplete block designs have smaller variance than K-fold cross-validation. In a real data example from the UCI machine learning repository, this property can be confirmed. We finally show how to find balanced incomplete block designs in practice.
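
As a rough illustration of the design notion mentioned in the abstract (not the authors' procedure), the following Python sketch checks whether a collection of test sets forms a balanced incomplete block design, meaning that all blocks have the same size, every observation appears in the same number of blocks, and every pair of observations appears together in the same number of blocks. The function name `bibd_parameters` and the example blocks are assumptions made for this sketch; the Fano plane is used because it is a standard small design with 7 points and blocks of size 3.

```python
from collections import Counter
from itertools import combinations

# The Fano plane: a classical balanced incomplete block design on 7 points
# with block size 3, in which every pair of points shares exactly one block.
FANO_BLOCKS = [
    (0, 1, 2), (0, 3, 4), (0, 5, 6),
    (1, 3, 5), (1, 4, 6), (2, 3, 6), (2, 4, 5),
]


def bibd_parameters(blocks, v):
    """Return (k, r, lam) if `blocks` is balanced on points 0..v-1, else None.

    k   = common block size,
    r   = number of blocks containing each point,
    lam = number of blocks containing each pair of points.
    (Hypothetical helper written for this illustration.)
    """
    block_sets = [sorted(set(b)) for b in blocks]
    sizes = {len(b) for b in block_sets}
    point_counts = Counter(p for b in block_sets for p in b)
    pair_counts = Counter(
        pair for b in block_sets for pair in combinations(b, 2)
    )
    # A Counter returns 0 for points or pairs that never occur.
    reps = {point_counts[p] for p in range(v)}
    lams = {pair_counts[pair] for pair in combinations(range(v), 2)}
    if len(sizes) != 1 or len(reps) != 1 or len(lams) != 1:
        return None
    return sizes.pop(), reps.pop(), lams.pop()


# Test sets of 3-fold cross-validation on 6 observations (a partition).
KFOLD_BLOCKS = [(0, 1), (2, 3), (4, 5)]

fano_result = bibd_parameters(FANO_BLOCKS, 7)
kfold_result = bibd_parameters(KFOLD_BLOCKS, 6)
print(fano_result)
print(kfold_result)
```

In this sketch the K-fold test sets fail the check because pairs from different folds never share a test set, whereas in the Fano plane every pair is treated alike. Pair balance is the structural property that distinguishes the two kinds of design; the sketch makes no claim about the resulting variances.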

References

Arlot, S., and A. Celisse. 2010. A survey of cross-validation procedures for model selection. Statistics Surveys 4:40–79.
Bailey, R. A., and P. J. Cameron. 2013. Using graphs to find the best block designs. In Topics in structural graph theory, vol. 147 of Encyclopedia Math. Appl., 282–317. Cambridge, UK: Cambridge University Press.
Bengio, Y., and Y. Grandvalet. 2003/2004. No unbiased estimator of the variance of K-fold cross-validation. Journal of Machine Learning Research 5:1089–105.
Cramér, H. 1946. Mathematical methods of statistics. Princeton Mathematical Series, vol. 9. Princeton, NJ: Princeton University Press.
Fuchs, M., R. Hornung, R. De Bin, and A. L. Boulesteix. 2013. A u-statistic estimator for the variance of resampling-based error estimators. Technical report, Ludwig Maximilian University of Munich, Munich, Germany.
Hastie, T., R. Tibshirani, and J. Friedman. 2009. The elements of statistical learning, 2nd ed. Springer Series in Statistics. New York, NY: Springer.
Hoeffding, W. 1948. A class of statistics with asymptotically normal distribution. Annals of Mathematical Statistics 19:293–325.
I-Cheng, Y. 2007. Modeling slump flow of concrete using second-order regressions and artificial neural networks. Cement and Concrete Composites 29 (6):474–80.
Lee, A. J. 1990. U-statistics, vol. 110 of Statistics: Textbooks and monographs. New York, NY: Marcel Dekker.
Maesono, Y. 1998. Asymptotic comparisons of several variance estimators and their effects for Studentizations. Annals of the Institute of Statistical Mathematics 50 (3):451–70. doi:10.1023/A:1003521327411.
Nadeau, C., and Y. Bengio. 2003. Inference for the generalization error. Machine Learning 52 (3):239–81. doi:10.1023/A:1024068626366.
Stinson, D. R. 2004. Combinatorial designs. New York, NY: Springer-Verlag.
Tang, B. 1999. Balanced bootstrap in sample surveys and its relationship with balanced repeated replication. Journal of Statistical Planning and Inference 81 (1):121–27.
Wallis, W. D., ed. 1996. Computational and constructive design theory, vol. 368 of Mathematics and its applications. Dordrecht, The Netherlands: Kluwer Academic. doi:10.1007/978-1-4757-2497-4.
Wang, Q., and B. Lindsay. 2014. Variance estimation of a general u-statistic with application to cross-validation. Statistica Sinica 24:1117–41.
Zhang, Q., and P. Z. G. Qian. 2013. Designs for crossvalidating approximation models. Biometrika 100 (4):997–1004.

Author information

Authors and Affiliations

Institut für Medizinische Informationsverarbeitung Biometrie und Epidemiologie, Ludwig Maximilian University of Munich, Marchioninistr. 15, 81377, Munich, Germany
Mathias Fuchs
Institute of Computational Biology, Helmholtz Zentrum München, German Research Center for Environmental Health, Munich, Germany
Norbert Krautenbacher
Department of Mathematics, Technische Universität München, Munich, Germany
Norbert Krautenbacher

Corresponding author

Correspondence to Mathias Fuchs.

Additional information

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/ujsp.

Supplemental data for this article can be accessed on the publisher’s website.

Electronic supplementary material

Supplement: An exemplary computation of the quantities

About this article

Cite this article

Fuchs, M., Krautenbacher, N. Minimization and estimation of the variance of prediction errors for cross-validation designs. J Stat Theory Pract 10, 420–443 (2016). https://doi.org/10.1080/15598608.2016.1158675

Received: 08 August 2015
Accepted: 23 February 2016
Published: 01 June 2016
Issue Date: June 2016
DOI: https://doi.org/10.1080/15598608.2016.1158675
