On the use of cross-validation to assess performance in multivariate prediction

Abstract

We describe a Monte Carlo investigation of several variants of cross-validation for assessing the performance of predictive models, including different values of k in leave-k-out cross-validation and implementation in either a one-deep or a two-deep fashion. We assume an underlying linear model fitted by either ridge regression or partial least squares, and vary a number of design factors such as the sample size n relative to the number of variables p, and the error variance. The investigation encompasses both the non-singular (i.e. n > p) and the singular (i.e. n ≤ p) cases. The latter is now common in areas such as chemometrics but has so far received little rigorous investigation. The results of the experiments enable us to reach some definite conclusions and to make some practical recommendations.
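
For readers unfamiliar with the one-deep versus two-deep distinction, the sketch below illustrates it for a ridge-regression predictor in the singular (n ≤ p) case. This is not the authors' code: the scikit-learn estimators, the simulated data-generating settings (n, p, noise level) and the grid of ridge parameters are illustrative assumptions. In the one-deep scheme the same cross-validation both chooses the ridge parameter and reports the prediction error; in the two-deep (nested) scheme an inner loop tunes the parameter and an outer loop assesses the tuned predictor on data not used for tuning.

```python
# Minimal sketch (not from the paper) of one-deep vs. two-deep cross-validation
# for a ridge-regression predictor; all settings below are illustrative.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, GridSearchCV, cross_val_score

rng = np.random.default_rng(0)
n, p = 50, 100                                   # singular case: n <= p
X = rng.standard_normal((n, p))
beta = rng.standard_normal(p)
y = X @ beta + rng.standard_normal(n)            # underlying linear model plus noise

param_grid = {"alpha": np.logspace(-3, 3, 13)}   # candidate ridge parameters
outer = KFold(n_splits=5, shuffle=True, random_state=1)
inner = KFold(n_splits=5, shuffle=True, random_state=2)

# One-deep: a single cross-validation both tunes alpha and reports the error,
# so the reported figure tends to be optimistically biased.
one_deep = GridSearchCV(Ridge(), param_grid, cv=outer,
                        scoring="neg_mean_squared_error").fit(X, y)
print("one-deep CV MSE:", -one_deep.best_score_)

# Two-deep (nested): the inner loop tunes alpha, the outer loop assesses the
# tuned predictor on folds that played no part in the tuning.
nested_scores = cross_val_score(
    GridSearchCV(Ridge(), param_grid, cv=inner,
                 scoring="neg_mean_squared_error"),
    X, y, cv=outer, scoring="neg_mean_squared_error")
print("two-deep CV MSE:", -nested_scores.mean())

# A partial least squares predictor could be assessed the same way by swapping
# Ridge() for sklearn.cross_decomposition.PLSRegression and tuning n_components.
```

The contrast between the two reported figures is the kind of effect the simulation study examines.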

Cite this article

Jonathan, P., Krzanowski, W.J. & McCarthy, W.V. On the use of cross-validation to assess performance in multivariate prediction. Statistics and Computing 10, 209–229 (2000). https://doi.org/10.1023/A:1008987426876

Keywords

  • cross-validation
  • ridge regression
  • partial least squares
  • prediction
  • assessment of predictive models