Toward Bayesian Classifiers with Accurate Probabilities

  • Conference paper

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

In most data mining applications, accurate ranking and probability estimation are essential. However, many traditional classifiers aim only at high classification accuracy (or a low error rate), even though they also produce probability estimates. Does high predictive accuracy imply better ranking and probability estimation? Is there a better evaluation method for such classifiers than classification accuracy, for the purposes of data mining applications? The answer is the area under the ROC (Receiver Operating Characteristics) curve, or simply AUC. We show that AUC provides a more discriminating evaluation of ranking and probability estimation than accuracy does. Further, we show that classifiers constructed to maximise the AUC score produce not only higher AUC values, but also higher classification accuracies. Our results are based on an experimental comparison between error-based and AUC-based learning algorithms for TAN (Tree-Augmented Naive Bayes).
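
To make the distinction between accuracy and AUC concrete, the following is a minimal Python sketch (not taken from the paper) that computes both measures from a binary classifier's probability estimates. AUC is computed here as the Wilcoxon-Mann-Whitney statistic: the fraction of (positive, negative) example pairs that the classifier ranks correctly, with ties counted as one half. The labels and probability estimates are made up purely for illustration.

def accuracy(y_true, p_pos, threshold=0.5):
    # Fraction of examples whose thresholded probability matches the true label.
    preds = [1 if p >= threshold else 0 for p in p_pos]
    return sum(int(pred == y) for pred, y in zip(preds, y_true)) / len(y_true)

def auc(y_true, p_pos):
    # Wilcoxon-Mann-Whitney estimate of the area under the ROC curve:
    # the fraction of (positive, negative) pairs ranked correctly (ties count 0.5).
    pos = [p for p, y in zip(p_pos, y_true) if y == 1]
    neg = [p for p, y in zip(p_pos, y_true) if y == 0]
    wins = sum(1.0 if pp > pn else 0.5 if pp == pn else 0.0
               for pp in pos for pn in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical probability estimates from two classifiers on the same five examples.
y = [1, 1, 1, 0, 0]
clf_a = [0.9, 0.8, 0.40, 0.30, 0.2]   # every positive ranked above every negative
clf_b = [0.9, 0.8, 0.40, 0.45, 0.2]   # one negative ranked above one positive

print(accuracy(y, clf_a), round(auc(y, clf_a), 3))   # 0.8 1.0
print(accuracy(y, clf_b), round(auc(y, clf_b), 3))   # 0.8 0.833

Both classifiers make the same thresholded predictions, so their accuracies are identical (0.8), yet the second ranks one negative example above a positive one and its AUC drops to 0.833. This is the sense in which AUC gives a more discriminating evaluation of ranking quality than accuracy.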

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Ling, C.X., Zhang, H. (2002). Toward Bayesian Classifiers with Accurate Probabilities. In: Chen, M.S., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_12

  • DOI: https://doi.org/10.1007/3-540-47887-6_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4
