Neural Network Classification and Prior Class Probabilities

  • Steve Lawrence
  • Ian Burns
  • Andrew Back
  • Ah Chung Tsoi
  • C. Lee Giles
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7700)


A commonly encountered problem in MLP (multi-layer perceptron) classification is related to the prior probabilities of the individual classes: if the number of training examples that correspond to each class varies significantly between the classes, then it may be harder for the network to learn the rarer classes. Such practical experience does not match theoretical results, which show that MLPs approximate Bayesian a posteriori probabilities (independent of the prior class probabilities). Our investigation of the problem shows that the difference between the theoretical and practical results lies with the assumptions made in the theory: accurate estimation of Bayesian a posteriori probabilities requires the network to be large enough, training to converge to a global minimum, infinite training data, and the a priori class probabilities of the test set to be correctly represented in the training set. Specifically, the problem can often be traced to the fact that efficient MLP training mechanisms lead to sub-optimal solutions for most practical problems. In this chapter, we demonstrate the problem, discuss possible methods for alleviating it, and introduce new heuristics which are shown to perform well on a sample ECG classification problem. The heuristics may also be used as a simple means of adjusting for unequal misclassification costs.
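The role of the prior class probabilities can be illustrated with a small sketch. It is not one of the chapter's heuristics, which the abstract does not spell out. It only applies Bayes' rule under the assumption that the network outputs are good posterior estimates for the training priors and that the class-conditional densities are the same at test time. The function names, the 90/10 and 50/50 splits, and the cost matrix are all hypothetical.

```python
def rescale_posteriors(posteriors, train_priors, test_priors):
    """Re-weight posteriors estimated under train_priors so that they
    reflect test_priors (Bayes' rule, likelihoods assumed unchanged)."""
    weighted = [p * (t / r)
                for p, r, t in zip(posteriors, train_priors, test_priors)]
    total = sum(weighted)
    return [w / total for w in weighted]


def min_cost_class(posteriors, cost):
    """Return the index of the class with the lowest expected cost.
    cost[i][j] is the cost of predicting class j when the truth is i."""
    n = len(posteriors)
    expected = [sum(posteriors[i] * cost[i][j] for i in range(n))
                for j in range(n)]
    return min(range(n), key=expected.__getitem__)


# Hypothetical numbers: a network trained on a 90/10 class split gives
# these outputs for one input; the deployment split is assumed to be 50/50.
outputs = [0.7, 0.3]
adjusted = rescale_posteriors(outputs, [0.9, 0.1], [0.5, 0.5])

# Hypothetical costs: missing the rare class (row 1) is five times worse.
decision = min_cost_class(outputs, [[0, 1], [5, 0]])
```

With these assumed numbers the rescaled posterior favours the rare class, and the asymmetric cost matrix also moves the decision to the rare class even without rescaling. Both adjustments act on the outputs only and do nothing about a network that has failed to learn the rare class during training, which is the problem the chapter addresses.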


Neural Network; Prior Probability; Probabilistic Sampling; Class Basis; Neural Information Processing System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Steve Lawrence (1)
  • Ian Burns (2)
  • Andrew Back (3)
  • Ah Chung Tsoi (4)
  • C. Lee Giles (1)
  1. NEC Research Institute, Princeton, USA
  2. Open Access Pty Ltd, Leonards, Australia
  3. RIKEN Brain Science Institute, Wako-shi, Japan
  4. Faculty of Informatics, University of Wollongong, Wollongong, Australia
