Neural Network Classification and Prior Class Probabilities

Chapter in: Neural Networks: Tricks of the Trade

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 1524)

Abstract

A commonly encountered problem in MLP (multi-layer perceptron) classification is related to the prior probabilities of the individual classes: if the number of training examples for each class varies significantly between the classes, the network may find the rarer classes harder to learn. This practical experience does not match theoretical results showing that MLPs approximate Bayesian a posteriori probabilities (independent of the prior class probabilities). Our investigation of the problem shows that the difference between the theoretical and practical results lies with the assumptions made in the theory: accurate estimation of Bayesian a posteriori probabilities requires the network to be large enough, training to converge to a global minimum, infinite training data, and the a priori class probabilities of the test set to be correctly represented in the training set. Specifically, the problem can often be traced to the fact that efficient MLP training mechanisms lead to sub-optimal solutions for most practical problems. In this chapter, we demonstrate the problem, discuss possible methods for alleviating it, and introduce new heuristics which are shown to perform well on a sample ECG classification problem. The heuristics may also be used as a simple means of adjusting for unequal misclassification costs.
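
As a concrete illustration of the prior-correction idea discussed above, the following is a minimal sketch, not the chapter's exact heuristic: a trained MLP's outputs approximate the Bayesian a posteriori probabilities P(class | x) under the training-set class priors, so when the deployment priors differ, Bayes' rule suggests rescaling each output by the ratio of test prior to training prior and renormalising. The sketch assumes NumPy, and the function name correct_priors is illustrative.

    import numpy as np

    def correct_priors(outputs, train_priors, test_priors):
        """Rescale posterior estimates from training priors to test priors.

        outputs      : (n_samples, n_classes) network outputs, approximating
                       P(class | x) under the training-set priors.
        train_priors : (n_classes,) class frequencies in the training set.
        test_priors  : (n_classes,) assumed class frequencies at deployment.
        """
        outputs = np.asarray(outputs, dtype=float)
        ratio = np.asarray(test_priors, dtype=float) / np.asarray(train_priors, dtype=float)
        scaled = outputs * ratio                            # Bayes' rule rescaling
        return scaled / scaled.sum(axis=1, keepdims=True)   # renormalise each row

    # A class that makes up 5% of the training data but an assumed 50% at
    # deployment has its posterior boosted: [0.7, 0.3] -> ~[0.11, 0.89].
    print(correct_priors([[0.7, 0.3]], [0.95, 0.05], [0.5, 0.5]))

The same rescaling can fold in unequal misclassification costs by multiplying each class's score by a relative cost weight before renormalising; this is one simple reading of the cost-adjustment use mentioned above, not necessarily the heuristic introduced in the chapter itself.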




Copyright information

© 1998 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Lawrence, S., Burns, I., Back, A., Tsoi, A.C., Giles, C.L. (1998). Neural Network Classification and Prior Class Probabilities. In: Orr, G.B., Müller, K.-R. (eds) Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science, vol 1524. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-49430-8_15

  • DOI: https://doi.org/10.1007/3-540-49430-8_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-65311-0

  • Online ISBN: 978-3-540-49430-0

