Abstract
An important problem concerns the probabilistic interpretation to be given to the output units of a neural network after training. This interpretation depends on the cost function used for training, and there has consequently been considerable interest in analysing the properties of the mean square error criterion. Several authors have shown that, when a multi-layer neural network is trained by minimizing a mean square error criterion, and assuming that this minimum is indeed attained after training, the output of the network provides an estimate of the conditional expectation of the desired output given the input pattern, whatever the characteristics of the noise affecting the data (for the continuous case: [1], [2]; for the binary case: [3], [4], [5], [6]; for a review, see [7]). This is in fact a fundamental result of mathematical statistics and, in particular, estimation theory (see, for instance, [8], [9], [21], [22], [23], [24]).
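The claim above can be checked numerically in a toy setting. The sketch below (a hypothetical illustration, not code from the chapter) uses the fact that, for a fixed input pattern, the constant output minimizing the mean square error is the sample mean of the targets, i.e. the empirical conditional expectation, even under asymmetric, non-Gaussian noise:

```python
import random
from statistics import mean

random.seed(0)

# Toy setup (assumed for illustration): two input patterns, each with
# noisy targets whose conditional mean is 2*x. The noise is asymmetric
# (shifted exponential) -- the result holds whatever the noise is.
def draw_target(x):
    return 2.0 * x + random.expovariate(1.0) - 1.0  # noise has mean 0

data = [(x, draw_target(x)) for x in (0.0, 1.0) for _ in range(20000)]

# For each pattern x, the output y(x) minimizing sum_i (y(x) - d_i)^2
# is the sample mean of the targets: the empirical E[d | x].
for x0 in (0.0, 1.0):
    targets = [d for (x, d) in data if x == x0]
    y = mean(targets)       # closed-form minimizer of the MSE
    print(x0, round(y, 2))  # close to the true E[d | x] = 2*x
```

A trained network with enough capacity plays the role of `y(x)` here: driving the mean square error to its minimum forces the output towards the conditional mean, not towards any other statistic of the noisy targets.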
References
White H. (1989). Learning in artificial neural networks: A statistical perspective. Neural Computation, 1, pp. 425–464.
Wan E.A. (1990). Neural network classification: A Bayesian interpretation. IEEE Transactions on Neural Networks, NN-1, pp. 303–305.
Bourlard H. & Wellekens C. (1989). Links between Markov models and multilayer perceptrons. In Advances in Neural Information Processing Systems I, Touretzky (editor), pp. 502–510. Morgan Kaufmann.
Ruck D.W., Rogers S.K., Kabrisky M., Oxley M.E. & Suter B.W. (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, NN-1, pp. 296–298.
Gish H. (1990). A probabilistic approach to the understanding and training of neural network classifiers. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1361–1364.
Shoemaker P.A. (1991). A note on least-squares learning procedures and classification by neural network models. IEEE Transactions on Neural Networks, NN-2, pp. 158–160.
Richard M.D. & Lippmann R.P. (1991). Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3, pp. 461–483.
Deutsch R. (1965). Estimation theory. Prentice-Hall.
Meditch J.S. (1969). Stochastic optimal linear estimation and control. McGraw-Hill.
Hampshire J.B. & Pearlmutter B. (1990). Equivalence proofs for multi-layer perceptron classifiers and the Bayesian discriminant function. In Proceedings of the 1990 Connectionist Models Summer School, Touretzky D., Elman J., Sejnowski T. & Hinton G. (editors), Morgan Kaufmann, pp. 159–172.
Santini S. & Del Bimbo A. (1995). Recurrent neural networks can be trained to be maximum a posteriori probability classifiers. Neural Networks, 8 (1), pp. 25–29.
Miller J.W., Goodman R. & Smyth P. (1991). Objective functions for probability estimation. Proceedings of the IEEE International Joint Conference on Neural Networks, San Diego, pp. I-881–886.
Miller J.W., Goodman R. & Smyth P. (1993). On loss functions which minimize to conditional expected values and posterior probabilities. IEEE Transactions on Information Theory, IT-39 (4), pp. 1404–1408.
Saerens M. (1996). Non mean square error criteria for the training of learning machines. Proceedings of the 13th International Conference on Machine Learning (ICML), July 1996, Bari (Italy), pp. 427–434.
Sorenson H.W. (1980). Parameter estimation, principles and problems. Marcel Dekker.
Bard Y. (1974). Nonlinear parameter estimation. Academic Press.
Gelfand I.M. & Fomin S.V. (1963). Calculus of variations. Prentice-Hall.
Wolfram S. (1991). Mathematica, 2nd ed. Addison-Wesley.
Brown J.L. (1962). Asymmetric non-mean-square error criteria. IEEE Transactions on Automatic Control, AC-7, pp. 64–66.
Sherman S. (1958). Non-mean-square error criteria. IEEE Transactions on Information Theory, IT-4, pp. 125–126.
Kay S.M. (1993). Fundamentals of statistical signal processing: Estimation theory. Prentice-Hall.
Melsa J.L. & Cohn D.L. (1978). Decision and estimation theory. McGraw-Hill.
Sage A.P. & Melsa J.L. (1971). Estimation theory with applications to communications and control. McGraw-Hill.
Nahi N.E. (1976). Estimation theory and applications. John Wiley & Sons.
Copyright information
© 1998 Springer-Verlag London Limited
Cite this chapter
Kárný, M., Warwick, K., Kůrková, V. (1998). A Study of Non Mean Square Error Criteria for the Training of Neural Networks. In: Kárný, M., Warwick, K., Kůrková, V. (eds) Dealing with Complexity. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-1523-6_6
Publisher Name: Springer, London
Print ISBN: 978-3-540-76160-0
Online ISBN: 978-1-4471-1523-6