Abstract
An important problem concerns the probabilistic interpretation to be given to the output units of a neural network after training. This interpretation depends on the cost function used for training, and there has consequently been considerable interest in analysing the properties of the mean square error criterion. Several authors have shown that, when a multi-layer neural network is trained by minimizing a mean square error criterion, and assuming that this minimum is indeed attained after training, the output of the network provides an estimate of the conditional expectation of the desired output given the input pattern, whatever the characteristics of the noise affecting the data (for the continuous case: [1], [2]; for the binary case: [3], [4], [5], [6]; for a review, see [7]). This is in fact a fundamental result of mathematical statistics and, in particular, estimation theory (see, for instance, [8], [9], [21], [22], [23], [24]).
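The claim above can be checked numerically in a toy setting. The sketch below (a hypothetical illustration, not code from the chapter) uses the fact that, for a fixed input pattern, the constant output minimizing the mean square error is the sample mean of the targets, i.e. the empirical conditional expectation, even under asymmetric, non-Gaussian noise:

```python
import random
from statistics import mean

random.seed(0)

# Toy setup (assumed for illustration): two input patterns, each with
# noisy targets whose conditional mean is 2*x. The noise is asymmetric
# (shifted exponential) -- the result holds whatever the noise is.
def draw_target(x):
    return 2.0 * x + random.expovariate(1.0) - 1.0  # noise has mean 0

data = [(x, draw_target(x)) for x in (0.0, 1.0) for _ in range(20000)]

# For each pattern x, the output y(x) minimizing sum_i (y(x) - d_i)^2
# is the sample mean of the targets: the empirical E[d | x].
for x0 in (0.0, 1.0):
    targets = [d for (x, d) in data if x == x0]
    y = mean(targets)       # closed-form minimizer of the MSE
    print(x0, round(y, 2))  # close to the true E[d | x] = 2*x
```

A trained network with enough capacity plays the role of `y(x)` here: driving the mean square error to its minimum forces the output towards the conditional mean, not towards any other statistic of the noisy targets.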
References
White H. (1989). Learning in artificial neural networks: A statistical perspective. Neural Computation, 1, pp. 425–464.
Wan E.A. (1990). Neural network classification: A Bayesian interpretation. IEEE Transactions on Neural Networks, NN-1, pp. 303–305.
Bourlard H. & Wellekens C. (1989). Links between Markov models and multilayer perceptrons. In Advances in Neural Information Processing Systems I, Touretzky (editor), pp. 502–510. Morgan Kaufmann.
Ruck D.W., Rogers S.K., Kabrisky M., Oxley M.E. & Suter B.W. (1990). The multilayer perceptron as an approximation to a Bayes optimal discriminant function. IEEE Transactions on Neural Networks, NN-1, pp. 296–298.
Gish H. (1990). A probabilistic approach to the understanding and training of neural network classifiers. Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), pp. 1361–1364.
Shoemaker P.A. (1991). A note on least-squares learning procedures and classification by neural network models. IEEE Transactions on Neural Networks, NN-2, pp. 158–160.
Richard M.D. & Lippmann R.P. (1991). Neural network classifiers estimate Bayesian a posteriori probabilities. Neural Computation, 3, pp. 461–483.
Deutsch R. (1965). Estimation theory. Prentice-Hall.
Meditch J.S. (1969). Stochastic optimal linear estimation and control. McGraw-Hill.
Hampshire J.B. & Pearlmutter B. (1990). Equivalence proofs for multi-layer perceptron classifiers and the Bayesian discriminant function. In Proceedings of the 1990 Connectionist Models Summer School, Touretzky D., Elman J., Sejnowski T. & Hinton G. (editors), Morgan Kaufmann, pp. 159–172.
Santini S. & Del Bimbo A. (1995). Recurrent neural networks can be trained to be maximum a posteriori probability classifiers. Neural Networks, 8 (1), pp. 25–29.
Miller J.W., Goodman R. & Smyth P. (1991). Objective functions for probability estimation. Proceedings of the IEEE International Joint Conference on Neural Networks, San Diego, pp. I-881–886.
Miller J.W., Goodman R. & Smyth P. (1993). On loss functions which minimize to conditional expected values and posterior probabilities. IEEE Transactions on Information Theory, IT-39 (4), pp. 1404–1408.
Saerens M. (1996). Non mean square error criteria for the training of learning machines. Proceedings of the 13th International Conference on Machine Learning (ICML), July 1996, Bari (Italy), pp. 427–434.
Sorenson H.W. (1980). Parameter estimation, principles and problems. Marcel Dekker.
Bard Y. (1974). Nonlinear parameter estimation. Academic Press.
Gelfand I.M. & Fomin S.V. (1963). Calculus of variations. Prentice-Hall.
Wolfram S. (1991). Mathematica, 2nd ed. Addison-Wesley.
Brown J.L. (1962). Asymmetric non-mean-square error criteria. IEEE Transactions on Automatic Control, AC-7, pp. 64–66.
Sherman S. (1958). Non-mean-square error criteria. IEEE Transactions on Information Theory, IT-4, pp. 125–126.
Kay S.M. (1993). Fundamentals of statistical signal processing: Estimation theory. Prentice-Hall.
Melsa J.L. & Cohn D.L. (1978). Decision and estimation theory. McGraw-Hill.
Sage A.P. & Melsa J.L. (1971). Estimation theory with applications to communications and control. McGraw-Hill.
Nahi N.E. (1976). Estimation theory and applications. John Wiley & Sons.
Copyright information
© 1998 Springer-Verlag London Limited
Cite this chapter
Kárný, M., Warwick, K., Kůrková, V. (1998). A Study of Non Mean Square Error Criteria for the Training of Neural Networks. In: Kárný, M., Warwick, K., Kůrková, V. (eds) Dealing with Complexity. Perspectives in Neural Computing. Springer, London. https://doi.org/10.1007/978-1-4471-1523-6_6
Publisher Name: Springer, London
Print ISBN: 978-3-540-76160-0
Online ISBN: 978-1-4471-1523-6