
Abstract

Choosing the architecture of a neural network is one of the most important problems in making neural networks practically useful, but accounts of applications usually sweep these details under the carpet. How many hidden units are needed? Should weight decay be used, and if so how much? What type of output units should be chosen? And so on.

We address these issues within the framework of statistical theory for model choice, which provides a number of workable approximate answers.
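As a rough illustration of the kind of model-choice question raised above (this sketch is ours, not the chapter's own procedure), the number of hidden units and the amount of weight decay can be chosen by k-fold cross-validation, one of the standard statistical criteria discussed in the model-choice literature. The sketch assumes scikit-learn is available and uses made-up data and candidate grids purely for illustration.

```python
# Minimal sketch: choose hidden-unit count and weight-decay penalty by
# 5-fold cross-validation. Data, grids, and settings are illustrative only.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)  # noisy 1-D regression target

param_grid = {
    "hidden_layer_sizes": [(2,), (4,), (8,), (16,)],   # candidate numbers of hidden units
    "alpha": [1e-4, 1e-3, 1e-2, 1e-1],                 # candidate weight-decay (L2) penalties
}

search = GridSearchCV(
    MLPRegressor(solver="lbfgs", max_iter=2000, random_state=0),
    param_grid,
    cv=5,                              # 5-fold cross-validation as the selection criterion
    scoring="neg_mean_squared_error",
)
search.fit(X, y)
print("selected architecture:", search.best_params_)
```

Cross-validation is only one of the approximate answers available; criteria such as AIC-like penalties or Bayesian model comparison trade computation for different assumptions, and the grid above would simply be scored under a different criterion.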




Copyright information

© 1995 Springer-Verlag London Limited

About this paper

Cite this paper

Ripley, B.D. (1995). Statistical Ideas for Selecting Network Architectures. In: Kappen, B., Gielen, S. (eds) Neural Networks: Artificial Intelligence and Industrial Applications. Springer, London. https://doi.org/10.1007/978-1-4471-3087-1_36


  • DOI: https://doi.org/10.1007/978-1-4471-3087-1_36

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-19992-2

  • Online ISBN: 978-1-4471-3087-1

  • eBook Packages: Springer Book Archive
