Abstract
In most data mining applications, accurate ranking and probability estimation are essential. However, many traditional classifiers aim only at high classification accuracy (or a low error rate), even though they also produce probability estimates. Does high predictive accuracy imply better ranking and probability estimation? Is there a better evaluation method for such classifiers than classification accuracy, for the purpose of data mining applications? The answer is the area under the ROC (Receiver Operating Characteristic) curve, or simply AUC. We show that AUC provides a more discriminating evaluation of ranking and probability estimation than accuracy does. Further, we show that classifiers constructed to maximise the AUC score produce not only higher AUC values, but also higher classification accuracies. Our results are based on an experimental comparison between error-based and AUC-based learning algorithms for TAN (Tree-Augmented Naive Bayes).
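The contrast the abstract draws can be illustrated with a small sketch (not from the paper): AUC is equivalent to the Wilcoxon-Mann-Whitney statistic, the probability that a randomly chosen positive example is ranked above a randomly chosen negative one, so two classifiers can agree on every thresholded prediction (identical accuracy) yet differ in AUC. The data below is a made-up toy example chosen to show exactly that.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)

def auc(labels, probs):
    """AUC via the Wilcoxon-Mann-Whitney statistic: the probability that a
    random positive example is ranked above a random negative one (ties count 1/2)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels  = [1, 1, 1, 0, 0, 0]
probs_a = [0.9, 0.8, 0.4, 0.60, 0.2, 0.10]  # one positive ranked below one negative
probs_b = [0.9, 0.8, 0.4, 0.60, 0.2, 0.45]  # same thresholded predictions, worse ranking

# Both classifiers get 4/6 examples right at threshold 0.5,
# yet their AUC scores differ (8/9 vs 7/9): AUC separates them, accuracy cannot.
print(accuracy(labels, probs_a), accuracy(labels, probs_b))
print(auc(labels, probs_a), auc(labels, probs_b))
```

This is the sense in which AUC is "more discriminating": it scores the full ranking induced by the probability estimates, not just which side of the decision threshold each example falls on.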
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Ling, C.X., Zhang, H. (2002). Toward Bayesian Classifiers with Accurate Probabilities. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_12
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4