
Generalization performance of overtrained back-propagation networks

  • Yves Chauvin
Part II: Theory, Algorithms
Part of the Lecture Notes in Computer Science book series (LNCS, volume 412)

Abstract

The performance of the back-propagation (BP) algorithm is investigated under overtraining for three different tasks. In the first case study, a network was trained to map a function composed of two discontinuous intervals. Interpolation performance is shown to decrease with overtraining and with the size of the sample space. In the second case study, a network was trained to map a continuous and continuously differentiable function known to produce the Runge effect (i.e., complete deterioration of polynomial interpolation performance despite an adequate number of degrees of freedom). Simulation results suggested that a minimal network strategy would remedy the observed overfitting effect. Constraints added to the BP Least Mean Square (LMS) error term were used to reduce the size of the network on-line during training. For a speech labeling task, this method eliminated the overfitting effects after overtraining. An interpretation of the results is given in terms of the properties of the back-propagation algorithm in relation to the data being learned.
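
The abstract describes adding a constraint term to the BP Least Mean Square error so that the effective size of the network is reduced on-line during training. The exact constraint is not specified here, so the sketch below uses plain quadratic weight decay on a deliberately over-parameterized one-hidden-layer network fitted to Runge's function f(x) = 1/(1 + 25x^2); the penalty form, network size, and hyperparameters are illustrative assumptions, not the paper's actual method.

    import numpy as np

    # Minimal sketch: back-propagation on Runge's function with a penalty
    # term added to the LMS (mean squared error) objective. The penalty used
    # here is plain quadratic weight decay (lam * sum of squared weights),
    # chosen only for illustration.

    rng = np.random.default_rng(0)

    def runge(x):
        return 1.0 / (1.0 + 25.0 * x ** 2)

    # Training sample: a small set of points on [-1, 1].
    x = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
    y = runge(x)

    n_hidden = 20          # deliberately over-parameterized
    lam = 1e-3             # penalty strength (illustrative value)
    lr = 0.05

    W1 = rng.normal(scale=0.5, size=(1, n_hidden))
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(scale=0.5, size=(n_hidden, 1))
    b2 = np.zeros(1)

    for epoch in range(20000):
        # Forward pass: one tanh hidden layer, linear output.
        h = np.tanh(x @ W1 + b1)
        y_hat = h @ W2 + b2

        # Objective: LMS error plus weight penalty.
        err = y_hat - y
        loss = np.mean(err ** 2) + lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
        if epoch % 5000 == 0:
            print(f"epoch {epoch}: penalized loss {loss:.5f}")

        # Backward pass: gradients of the penalized objective.
        g_out = 2.0 * err / len(x)
        gW2 = h.T @ g_out + 2.0 * lam * W2
        gb2 = g_out.sum(axis=0)
        g_h = (g_out @ W2.T) * (1.0 - h ** 2)
        gW1 = x.T @ g_h + 2.0 * lam * W1
        gb1 = g_h.sum(axis=0)

        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2

    # Interpolation test on points not in the training sample.
    x_test = np.linspace(-1.0, 1.0, 201).reshape(-1, 1)
    h_test = np.tanh(x_test @ W1 + b1)
    print("test MSE:", np.mean((h_test @ W2 + b2 - runge(x_test)) ** 2))

With lam set to zero, continued training lets the surplus hidden units fit the training points ever more tightly while interpolation between them degrades; a non-zero penalty keeps the fit smoother under overtraining, which is the qualitative behaviour the abstract describes.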



Copyright information

© Springer-Verlag Berlin Heidelberg 1990

Authors and Affiliations

  • Yves Chauvin
  1. Thomson-CSF, Inc./Pacific Rim Operations, Palo Alto, USA
