
Learning criteria for training neural network classifiers


Abstract

This paper presents a study of two learning criteria and two approaches to using them for training neural network classifiers, specifically Multi-Layer Perceptron (MLP) and Radial Basis Function (RBF) networks. The first, traditional, approach relies on two popular learning criteria: learning by minimising a Mean Squared Error (MSE) function or a Cross Entropy (CE) function. It is shown that the two criteria have different characteristics in learning speed and sensitivity to outliers, and that this approach does not necessarily result in minimal classification error. To make the methods better suited to classification tasks, the second approach introduces an empirical classification criterion for the testing process while the MSE or CE function is still used for training. Experimental results on several benchmarks indicate that the second approach leads to improved generalisation performance compared with the first, and that the CE function gives faster training and improved or equal generalisation performance compared with the MSE function.
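For reference, the two training criteria discussed above are conventionally written as follows, using the symbols from the Abbreviations list (a sketch of the standard definitions; the paper's exact normalisation may differ):

    E(\theta) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{2} \sum_{i=1}^{c} \bigl( y_i(\mathbf{x}^n;\theta) - t_i^n \bigr)^2    (mean sum-of-squares error)

    L(\theta) = - \frac{1}{N} \sum_{n=1}^{N} \sum_{i=1}^{c} t_i^n \ln y_i(\mathbf{x}^n;\theta)    (cross entropy)

The empirical classification criterion of the second approach can be read as the misclassification rate

    C(\theta) = \frac{1}{N} \sum_{n=1}^{N} \mathbf{1}\bigl[ \arg\max_i y_i(\mathbf{x}^n;\theta) \neq \arg\max_i t_i^n \bigr],

i.e. the fraction of patterns whose largest network output does not indicate the target class; this explicit form is an assumption made here for illustration, as the preview does not reproduce the paper's definition.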


Abbreviations

x : random input vector with d real-number components [x_1 ... x_d]
t : random target vector with c binary components [t_1 ... t_c]
y(·) : neural network function or output vector
θ : parameters of a neural model
η : learning rate
α : momentum
γ : decay factor
O : objective function
E : mean sum-of-squares error function
L : cross entropy function
n : nth training pattern
N : number of training patterns
φ(·) : transfer function in a neural unit
z_j : output of hidden unit-j
a_i : activation of unit-i
W_ij : weight from hidden unit-j to output unit-i
W^0_jl : weight from input unit-l to hidden unit-j
μ_j : centre vector [μ_j1 ... μ_jd] of RBF unit-j
σ_j : width vector [σ_j1 ... σ_jd] of RBF unit-j
p(·|·) : conditional probability function
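
To make the two approaches concrete, the sketch below trains a one-hidden-layer MLP by gradient descent on either the mean sum-of-squares criterion E or the cross-entropy criterion L, while the empirical classification (misclassification) criterion is monitored for evaluation, as in the second approach. It is an illustrative NumPy sketch under assumed data, architecture and hyperparameters, not the authors' implementation; the RBF network, momentum α and decay γ are omitted.

# Illustrative sketch only: MLP trained on either the MSE criterion E or the
# cross-entropy criterion L, with the empirical classification criterion
# monitored for evaluation. The data, sizes, softmax output layer and
# hyperparameters are assumptions made for this sketch.
import numpy as np

rng = np.random.default_rng(0)

def softmax(a):
    a = a - a.max(axis=1, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=1, keepdims=True)

def mse(y, t):
    # E: mean sum-of-squares error over the N patterns
    return 0.5 * np.mean(np.sum((y - t) ** 2, axis=1))

def cross_entropy(y, t):
    # L: cross entropy between binary targets t and network outputs y
    return -np.mean(np.sum(t * np.log(y + 1e-12), axis=1))

def classification_error(y, t):
    # Empirical classification criterion: fraction of patterns whose largest
    # output does not correspond to the target class
    return np.mean(np.argmax(y, axis=1) != np.argmax(t, axis=1))

# Toy data: N patterns, d inputs, c classes with one-of-c binary targets
N, d, c, H = 200, 4, 3, 8
x = rng.normal(size=(N, d))
t = np.eye(c)[rng.integers(0, c, size=N)]

# One hidden layer: W0 maps inputs to hidden units, W maps hidden to outputs
W0 = rng.normal(scale=0.5, size=(d, H))
W = rng.normal(scale=0.5, size=(H, c))

eta = 0.1            # learning rate (eta)
criterion = "ce"     # train on "ce" (cross entropy) or "mse"

for epoch in range(201):
    # Forward pass: z = tanh(x W0), y = softmax(z W)
    z = np.tanh(x @ W0)
    y = softmax(z @ W)

    # Gradient of the chosen criterion with respect to the output activations
    if criterion == "ce":
        delta_out = (y - t) / N                      # softmax + cross entropy
    else:
        err = (y - t) / N
        delta_out = y * (err - np.sum(err * y, axis=1, keepdims=True))  # softmax Jacobian for MSE

    # Backpropagate and take a gradient step
    grad_W = z.T @ delta_out
    delta_hid = (delta_out @ W.T) * (1 - z ** 2)
    grad_W0 = x.T @ delta_hid
    W -= eta * grad_W
    W0 -= eta * grad_W0

    if epoch % 50 == 0:
        print(f"epoch {epoch}: E={mse(y, t):.3f}  L={cross_entropy(y, t):.3f}  "
              f"class. error={classification_error(y, t):.3f}")

Under the second approach, the quantity reported (and used to compare trained networks) would be the classification error on held-out data rather than the value of E or L itself.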


About this article

Cite this article

Zhou, P., Austin, J. Learning criteria for training neural network classifiers. Neural Comput & Applic 7, 334–342 (1998). https://doi.org/10.1007/BF01428124
