
Estimating MLP generalisation ability without a test set using fast, approximate leave-one-out cross-validation

Article · Published: 1997 · Neural Computing & Applications

Abstract

When using MLP regression models, some method of estimating generalisation ability is required to identify badly over- and underfitted models. If data is limited, it may be impossible to spare sufficient data for a test set, and leave-one-out cross-validation may be considered as an alternative method for estimating generalisation ability. However, this method is very computationally intensive, and we suggest a faster, approximate version suitable for use with the MLP. This approximate method is tested on an artificial problem, and is then applied to a real modelling problem from the papermaking industry. The basic method appears to work quite well, but the approximation may be poor under certain conditions. These conditions, and possible means of improving the approximation, are discussed in some detail.
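
To give a concrete sense of the idea, the usual route to a fast approximate leave-one-out estimate in nonlinear regression is to linearise the fitted model in its weights and reuse the linear-regression shortcut e_i / (1 − h_ii), where h_ii is the i-th diagonal element of the hat matrix built from the Jacobian of the predictions with respect to the weights. The numpy sketch below illustrates only that general construction, under the assumption that this is the flavour of approximation intended; the function name, ridge term and leverage clipping are illustrative, not the authors' exact algorithm.

```python
import numpy as np

def approx_loo_residuals(jacobian, residuals, ridge=1e-8):
    """Approximate leave-one-out residuals for a fitted nonlinear
    regression model such as an MLP, via local linearisation.

    Treating the trained network as linear in its weights near the
    solution, the linear-regression hat matrix H = J (J'J)^-1 J'
    applies, and the classic shortcut e_i / (1 - h_ii) replaces N
    separate refits.

    jacobian  -- (N, P) array: derivative of each prediction with
                 respect to each of the P weights, at the trained weights
    residuals -- (N,) array of training residuals y_i - f(x_i; w)
    ridge     -- small diagonal term guarding against an
                 ill-conditioned J'J (an illustrative safeguard)
    """
    J = np.asarray(jacobian, dtype=float)
    e = np.asarray(residuals, dtype=float)
    # Hat-matrix diagonal h_ii = j_i' (J'J)^-1 j_i, computed via a
    # linear solve rather than an explicit inverse for stability.
    JtJ = J.T @ J + ridge * np.eye(J.shape[1])
    h = np.einsum('ip,ip->i', J, np.linalg.solve(JtJ, J.T).T)
    # Leverages near 1 make the shortcut numerically unreliable,
    # so clip them; a fuller implementation would flag such points.
    h = np.clip(h, 0.0, 1.0 - 1e-6)
    return e / (1.0 - h)

# Estimated generalisation error: mean squared approximate LOO residual,
# e.g. mse_loo = np.mean(approx_loo_residuals(J, e) ** 2)
```

The attraction of this construction is cost: it needs one Jacobian evaluation and one P×P solve at the trained weights, rather than N complete retrainings for exact leave-one-out.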




Additional information

The authors would also like to thank Mike Smart and Robin Woodburn for finding the time to read copies of the manuscript and for many useful suggestions which have improved the readability. Any remaining errors are solely the responsibility of the authors.



Cite this article

Myles, A.J., Murray, A.F., Wallace, A.R. et al. Estimating MLP generalisation ability without a test set using fast, approximate leave-one-out cross-validation. Neural Comput & Applic 5, 134–151 (1997). https://doi.org/10.1007/BF01413859
