Abstract
Choosing the architecture of a neural network is one of the most important steps in making it practically useful, yet accounts of applications usually sweep these details under the carpet. How many hidden units are needed? Should weight decay be used, and if so how much? What type of output units should be chosen? And so on.
We address these issues within the framework of statistical theory for model choice, which provides a number of workable approximate answers.
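The questions raised above can be framed as a model-choice problem: fit each candidate architecture and compare them on held-out data. The following is a minimal illustrative sketch, not the paper's method; the toy data, network sizes, and hyperparameters are all invented for this example. It trains one-hidden-layer tanh networks with varying hidden-unit counts and weight-decay penalties, then picks the combination with the lowest validation error.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy regression problem: y = sin(x) + noise
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)

# Hold out part of the data for validation
X_tr, y_tr = X[:150], y[:150]
X_va, y_va = X[150:], y[150:]

def fit_mlp(X, y, hidden, decay, steps=2000, lr=0.05, seed=0):
    """Train a one-hidden-layer tanh network by full-batch gradient
    descent on squared error plus an L2 weight-decay penalty."""
    rs = np.random.default_rng(seed)
    W1 = rs.standard_normal((X.shape[1], hidden)) * 0.5
    b1 = np.zeros(hidden)
    W2 = rs.standard_normal(hidden) * 0.5
    b2 = 0.0
    n = len(y)
    for _ in range(steps):
        H = np.tanh(X @ W1 + b1)          # hidden activations, (n, hidden)
        err = H @ W2 + b2 - y             # residuals, (n,)
        # Gradients of (1/2n) * sum(err^2) + decay * ||W||^2
        gW2 = H.T @ err / n + 2 * decay * W2
        gb2 = err.mean()
        dH = np.outer(err, W2) * (1 - H ** 2)
        gW1 = X.T @ dH / n + 2 * decay * W1
        gb1 = dH.mean(axis=0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def mse(params, X, y):
    W1, b1, W2, b2 = params
    pred = np.tanh(X @ W1 + b1) @ W2 + b2
    return float(np.mean((pred - y) ** 2))

# Score each candidate (hidden units, weight decay) on the held-out set
results = {}
for hidden in (1, 4, 16):
    for decay in (0.0, 1e-3):
        params = fit_mlp(X_tr, y_tr, hidden, decay)
        results[(hidden, decay)] = mse(params, X_va, y_va)

best = min(results, key=results.get)
print("best (hidden, decay):", best, "val MSE:", round(results[best], 4))
```

Validation-set comparison is only one of the approximate answers the statistical framework offers; information criteria and cross-validation apply the same idea without sacrificing training data to a single held-out split.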
© 1995 Springer-Verlag London Limited
Ripley, B.D. (1995). Statistical Ideas for Selecting Network Architectures. In: Kappen, B., Gielen, S. (eds) Neural Networks: Artificial Intelligence and Industrial Applications. Springer, London. https://doi.org/10.1007/978-1-4471-3087-1_36
Print ISBN: 978-3-540-19992-2
Online ISBN: 978-1-4471-3087-1