Abstract
When using MLP regression models, some method of estimating generalisation ability is required to identify badly over- and underfitted models. If data are limited, it may be impossible to spare sufficient data for a test set, and leave-one-out cross-validation may be considered as an alternative means of estimating generalisation ability. However, this method is computationally intensive, and we suggest a faster, approximate version suitable for use with the MLP. This approximate method is tested on an artificial problem and then applied to a real modelling problem from the papermaking industry. We show that the basic method works quite well, but that the approximation may be poor under certain conditions. These conditions, and possible means of improving the approximation, are discussed in some detail.
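For orientation, exact leave-one-out cross-validation refits the model n times, each time holding out one observation and scoring the prediction on it; the cost of these n refits is what motivates the paper's faster approximation. The sketch below illustrates the exact procedure only, using an ordinary least-squares fit as a stand-in for the MLP (the function names `loo_cv_mse`, `fit` and `predict` are illustrative, not from the paper):

```python
import numpy as np

def loo_cv_mse(X, y, fit, predict):
    """Exact leave-one-out cross-validation: refit the model n times,
    each time holding out one observation, and average the squared
    prediction errors on the held-out points."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i          # drop observation i
        model = fit(X[mask], y[mask])     # refit on the remaining n-1 points
        errors[i] = (predict(model, X[i:i + 1])[0] - y[i]) ** 2
    return errors.mean()

# Stand-in model: ordinary least squares (the paper's setting would use an MLP,
# which is exactly why n refits become prohibitively expensive).
fit = lambda X, y: np.linalg.lstsq(X, y, rcond=None)[0]
predict = lambda w, X: X @ w

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(30), rng.normal(size=30)])
y = 2.0 + 3.0 * X[:, 1] + rng.normal(scale=0.1, size=30)
print(loo_cv_mse(X, y, fit, predict))
```

The approximate method discussed in the paper avoids the n separate refits; the code above is only the exact baseline it approximates.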
Additional information
The authors would also like to thank Mike Smart and Robin Woodburn for finding the time to read copies of the manuscript and for many useful suggestions which have improved the readability. Any remaining errors are solely the responsibility of the authors.
Cite this article
Myles, A.J., Murray, A.F., Wallace, A.R. et al. Estimating MLP generalisation ability without a test set using fast, approximate leave-one-out cross-validation. Neural Comput & Applic 5, 134–151 (1997). https://doi.org/10.1007/BF01413859