Complexity Control and Generalization in Multilayer Perceptrons

  • Patrick Gallinari
  • Tautvydas Cibas
Part of the Advances in Computational Management Science book series (AICM, volume 1)


This paper presents simple and practical approaches for controlling the complexity of neural networks (NNs) in order to optimize their generalization ability. Several formal and heuristic methods have been proposed in the literature for improving the performance of NNs. It is of major importance for the user to understand which of these methods are of practical use and which are the most efficient. We try here to bridge the gap between specialists in these techniques and the user by presenting and analyzing a selection of methods chosen both for their simplicity and their efficiency. Only supervised learning is considered.
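As a concrete illustration of the kind of complexity control the paper discusses, the sketch below trains a one-hidden-layer MLP with weight decay (an L2 penalty on the weights, in the spirit of Tikhonov regularization), one of the classical techniques for limiting the effective complexity of a network. The code is not taken from the paper; the toy data, network size and decay strength lam are illustrative assumptions.

    # Minimal sketch (assumed example, not the authors' code): weight-decay
    # regularization of a one-hidden-layer MLP on a toy regression problem.
    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: y = sin(x) + noise
    X = rng.uniform(-3, 3, size=(200, 1))
    y = np.sin(X) + 0.1 * rng.standard_normal(X.shape)

    n_hidden, lam, lr = 20, 1e-3, 0.05      # lam is the weight-decay strength
    W1 = rng.standard_normal((1, n_hidden)) * 0.5
    b1 = np.zeros(n_hidden)
    W2 = rng.standard_normal((n_hidden, 1)) * 0.5
    b2 = np.zeros(1)

    for epoch in range(2000):
        # Forward pass
        h = np.tanh(X @ W1 + b1)
        y_hat = h @ W2 + b2

        # Regularized cost: mean squared error + lam * ||w||^2 (biases unpenalized)
        err = y_hat - y
        cost = np.mean(err ** 2) + lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))

        # Backward pass: gradients of the regularized cost
        g_yhat = 2 * err / len(X)
        gW2 = h.T @ g_yhat + 2 * lam * W2
        gb2 = g_yhat.sum(axis=0)
        g_h = (g_yhat @ W2.T) * (1 - h ** 2)
        gW1 = X.T @ g_h + 2 * lam * W1
        gb1 = g_h.sum(axis=0)

        # Gradient-descent update; the decay term shrinks weights toward zero,
        # limiting the effective complexity of the fitted function.
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    print(f"final regularized cost: {cost:.4f}")

Increasing lam shrinks the weights more strongly, trading training error against smoothness of the fitted function; in practice its value is chosen on a validation set.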


Keywords: Hidden Unit, Generalization Error, Minimum Description Length, Neural Computation, Complexity Control





Copyright information

© Springer Science+Business Media Dordrecht 1998

Authors and Affiliations

  • Patrick Gallinari (1)
  • Tautvydas Cibas (1)
  1. LIP6, Université Paris 6, Paris cedex 5, France
