Abstract
When using MLP regression models, some method of estimating generalisation ability is required to identify badly over- and underfitted models. If data are limited, it may be impossible to spare sufficient data for a test set, and leave-one-out cross-validation may be considered as an alternative means of estimating generalisation ability. However, this method is computationally intensive, and we suggest a faster, approximate version suitable for use with the MLP. This approximate method is tested on an artificial problem and then applied to a real modelling problem from the papermaking industry. We show that the basic method works quite well, but that the approximation may be poor under certain conditions. These conditions, and possible means of improving the approximation, are discussed in some detail.
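For orientation, exact leave-one-out cross-validation refits the model n times, each time holding out one observation and scoring the prediction on it; the cost of these n refits is what motivates the paper's faster approximation. The sketch below illustrates the exact procedure only, using an ordinary least-squares fit as a stand-in for the MLP (the function names `loo_cv_mse`, `fit` and `predict` are illustrative, not from the paper):

```python
import numpy as np

def loo_cv_mse(X, y, fit, predict):
    """Exact leave-one-out cross-validation: refit the model n times,
    each time holding out one observation, and average the squared
    prediction errors on the held-out points."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i          # drop observation i
        model = fit(X[mask], y[mask])     # refit on the remaining n-1 points
        errors[i] = (predict(model, X[i:i + 1])[0] - y[i]) ** 2
    return errors.mean()

# Stand-in model: ordinary least squares (the paper's setting would use an MLP,
# which is exactly why n refits become prohibitively expensive).
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda w, X: X @ w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=30)
print(loo_cv_mse(X, y, fit, predict))
```

The approximate method discussed in the paper avoids the n separate refits; the code above is only the exact baseline it approximates.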
Additional information
The authors would also like to thank Mike Smart and Robin Woodburn for finding the time to read copies of the manuscript and for many useful suggestions which have improved the readability. Any remaining errors are solely the responsibility of the authors.
Cite this article
Myles, A.J., Murray, A.F., Wallace, A.R. et al. Estimating MLP generalisation ability without a test set using fast, approximate leave-one-out cross-validation. Neural Comput & Applic 5, 134–151 (1997). https://doi.org/10.1007/BF01413859