A Study of Non Mean Square Error Criteria for the Training of Neural Networks

Chapter in: Dealing with Complexity (Perspectives in Neural Computing)

Abstract

An important problem concerns the probabilistic interpretation that can be given to the output units of a neural network after training. This interpretation turns out to depend on the cost function used for training, and there has consequently been considerable interest in analysing the properties of the mean square error criterion. Several authors have shown that, when a multi-layer neural network is trained by minimizing a mean square error criterion, and assuming that this minimum is indeed attained after training, the output of the network provides an estimate of the conditional expectation of the desired output given the input pattern, whatever the characteristics of the noise affecting the data (for the continuous case: [1], [2]; for the binary case: [3], [4], [5], [6]; for a review, see [7]). This is in fact a fundamental result of mathematical statistics and, in particular, of estimation theory (see, for instance, [8], [9], [21], [22], [23], [24]).
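To see why this holds, here is a short sketch of the standard argument, in notation not taken from the chapter itself: x denotes the input pattern, d the desired output, and f the input-output mapping realised by the network; the only assumption is that d has a finite second moment. Conditioning on x and decomposing the inner expectation around the conditional mean gives

\begin{align*}
E\big[(d - f(x))^2\big]
  &= E_x\Big[\, E\big[(d - f(x))^2 \mid x \big] \Big] \\
  &= E_x\Big[ \operatorname{Var}(d \mid x) + \big( E[d \mid x] - f(x) \big)^2 \Big].
\end{align*}

The first term inside the outer expectation does not depend on f; the second is non-negative and vanishes exactly when f(x) = E[d | x]. Any mapping attaining the minimum of the mean square error therefore satisfies f*(x) = E[d | x], whatever the distribution of the noise affecting d.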


References

1. White H. (1989). Learning in artificial neural networks: A statistical perspective. Neural Computation, 1, pp. 425–464.
2. Wan E.A. (1990). Neural network classification: A Bayesian interpretation. IEEE Transactions on Neural Networks, NN-1, pp. 303–305.
3. Bourlard H. & Wellekens C. (1989). Links between Markov models and multilayer perceptrons. In Advances in Neural Information Processing Systems 1, Touretzky D. (editor), pp. 502–510. Morgan Kaufmann.
4. Ruck D.W., Rogers S.K., Kabrisky M., Oxley M.E. & Suter B.W. (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, NN-1, pp. 296–298.
5. Gish H. (1990). A probabilistic approach to the understanding and training of neural network classifiers. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1361–1364.
6. Shoemaker P.A. (1991). A note on least-squares learning procedures and classification by neural network models. IEEE Transactions on Neural Networks, NN-2, pp. 158–160.
7. Richard M.D. & Lippmann R.P. (1991). Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3, pp. 461–483.
8. Deutsch R. (1965). Estimation theory. Prentice-Hall.
9. Meditch J.S. (1969). Stochastic optimal linear estimation and control. McGraw-Hill.
10. Hampshire J.B. & Pearlmutter B. (1990). Equivalence proofs for multi-layer perceptron classifiers and the Bayesian discriminant function. In Proceedings of the 1990 Connectionist Models Summer School, Touretzky D., Elman J., Sejnowski T. & Hinton G. (editors), pp. 159–172. Morgan Kaufmann.
11. Santini S. & Del Bimbo A. (1995). Recurrent neural networks can be trained to be maximum a posteriori probability classifiers. Neural Networks, 8(1), pp. 25–29.
12. Miller J.W., Goodman R. & Smyth P. (1991). Objective functions for probability estimation. Proceedings of the IEEE International Joint Conference on Neural Networks, San Diego, pp. I-881–886.
13. Miller J.W., Goodman R. & Smyth P. (1993). On loss functions which minimize to conditional expected values and posterior probabilities. IEEE Transactions on Information Theory, IT-39(4), pp. 1404–1408.
14. Saerens M. (1996). Non mean square error criteria for the training of learning machines. Proceedings of the 13th International Conference on Machine Learning (ICML), July 1996, Bari, Italy, pp. 427–434.
15. Sorenson H.W. (1980). Parameter estimation: principles and problems. Marcel Dekker.
16. Bard Y. (1974). Nonlinear parameter estimation. Academic Press.
17. Gelfand I.M. & Fomin S.V. (1963). Calculus of variations. Prentice-Hall.
18. Wolfram S. (1991). Mathematica, 2nd ed. Addison-Wesley.
19. Brown J.L. (1962). Asymmetric non-mean-square error criteria. IEEE Transactions on Automatic Control, AC-7, pp. 64–66.
20. Sherman S. (1958). Non-mean-square error criteria. IEEE Transactions on Information Theory, IT-4, pp. 125–126.
21. Kay S.M. (1993). Fundamentals of statistical signal processing: Estimation theory. Prentice-Hall.
22. Melsa J.L. & Cohn D.L. (1978). Decision and estimation theory. McGraw-Hill.
23. Sage A.P. & Melsa J.L. (1971). Estimation theory with applications to communications and control. McGraw-Hill.
24. Nahi N.E. (1976). Estimation theory and applications. John Wiley & Sons.


Copyright information

© 1998 Springer-Verlag London Limited

About this chapter

Cite this chapter

Kárný, M., Warwick, K., Kůrková, V. (1998). A Study of Non Mean Square Error Criteria for the Training of Neural Networks. In: Kárný, M., Warwick, K., Kůrková, V. (eds) Dealing with Complexity. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-1523-6_6

  • DOI: https://doi.org/10.1007/978-1-4471-1523-6_6

  • Publisher Name: Springer, London

  • Print ISBN: 978-3-540-76160-0

  • Online ISBN: 978-1-4471-1523-6

  • eBook Packages: Springer Book Archive
