Abstract
This paper addresses the optimization of neural network architectures. We suggest selecting the architecture whose model attains the minimal estimated average generalization error. A least-squares (LS) criterion is used for estimating neural network models, i.e., the associated model weights are estimated by minimizing the LS criterion. The quality of a particular estimated model is measured by the average generalization error, defined as the expected squared prediction error on a novel input-output sample, averaged over all possible training sets. An essential part of the suggested architecture-optimization scheme is the calculation of an estimate of the average generalization error. We suggest using the GEN-estimator [9, 10], which can handle nonlinear, incomplete models, i.e., models that are not capable of modeling the underlying nonlinear relationship perfectly. In most neural network applications it is impossible to suggest a perfect model, so the ability to handle incomplete models is essential. A concise derivation of the GEN-estimator is provided, and its qualities are demonstrated by comparative numerical studies.
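The selection scheme described above can be illustrated with a minimal sketch. The assumptions here are mine, not the paper's: the candidate models are polynomial regressors (linear in their weights, so the LS criterion has a closed-form minimizer), and a simple hold-out estimate stands in for the GEN-estimator, whose formula is not given in the abstract. Note that every candidate below is deliberately "incomplete" in the paper's sense, since no polynomial matches the sinusoidal target exactly.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data from a nonlinear system: y = sin(2*pi*x) + noise.
x = rng.uniform(-1.0, 1.0, 200)
y = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

# Split into a training set (for the LS fit) and a validation set
# (hold-out stand-in for the paper's generalization-error estimator).
x_tr, y_tr = x[:120], y[:120]
x_va, y_va = x[120:], y[120:]

def design(x, degree):
    """Polynomial basis, so the model is linear in its weights."""
    return np.vander(x, degree + 1, increasing=True)

def ls_fit(x, y, degree):
    """Weights minimizing the LS criterion (closed form via lstsq)."""
    w, *_ = np.linalg.lstsq(design(x, degree), y, rcond=None)
    return w

def est_gen_error(degree):
    """Estimated generalization error: mean squared prediction
    error on held-out data."""
    w = ls_fit(x_tr, y_tr, degree)
    resid = y_va - design(x_va, degree) @ w
    return float(np.mean(resid ** 2))

# Architecture optimization: pick the candidate with minimal
# estimated generalization error.
candidates = range(1, 12)
errors = {d: est_gen_error(d) for d in candidates}
best = min(errors, key=errors.get)
print("selected model order:", best)
```

In this toy setting the low-order candidates underfit the two-period sinusoid badly, so the minimum of the estimated error picks out a substantially higher order; the paper's GEN-estimator would replace the hold-out step while the surrounding selection loop stays the same.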
References
H. Akaike, "Fitting autoregressive models for prediction," Ann. Inst. Statistical Math., 21, 243 (1969).
N. R. Draper and H. Smith, Applied Regression Analysis, John Wiley and Sons, New York (1981).
D. B. Fogel, IEEE Trans. Neural Networks, 2, No. 5, 490 (1991).
L. K. Hansen, Neural Networks, 6, 393 (1993).
L. K. Hansen and P. Salamon, IEEE Trans. Pattern Analysis Machine Intelligence, 12, No. 10, 993 (1990).
J. Hertz, A. Krogh, and R. G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, California (1991).
K. Hornik, M. Stinchcombe, and H. White, Neural Networks, 3, No. 5, 551 (1990).
R. Kannurpatti and G. W. Hart, IEEE Trans. Information Theory, 37, No. 5, 1441 (1991).
J. Larsen, in: Neural Networks for Signals, S. Y. Kung, F. Fallside, J. A. Sorensen, and C. A. Kamm (eds.), IEEE, Piscataway, New Jersey (1992), p. 29.
J. Larsen, "Design of neural network filters," Ph.D. Thesis, The Technical University of Denmark, Electronics Institute, March, 1993.
J. Moody, in: Proceedings of the First IEEE Workshop on Neural Networks for Signal Processing, B. H. Juang, S. Y. Kung, and C. A. Kamm (eds.), IEEE, Piscataway, New Jersey (1991), p. 1.
J. Moody, in: Advances in Neural Information Processing Systems 4, Proceedings of the 1991 Conference, J. E. Moody, S. J. Hanson, and R. P. Lippmann (eds.), Morgan Kaufmann Publishers, San Mateo, California (1992), p. 847.
M. Rosenblatt, Stationary Sequences and Random Fields, Birkhäuser, Boston, Massachusetts (1985).
G. A. F. Seber and C. J. Wild, Nonlinear Regression, John Wiley and Sons, New York (1989).
Additional information
The Computational Neural Network Center, Electronics Institute, The Technical University of Denmark, Building 349. Translated from Izvestiya Vysshikh Uchebnykh Zavedenii, Radiofizika, Vol. 37, No. 9, pp. 1131–1147, September, 1994.
Cite this article
Larsen, J. Optimizing neural network architectures using generalization error estimators. Radiophys Quantum Electron 37, 729–740 (1994). https://doi.org/10.1007/BF01039612