Abstract
In most data mining applications, accurate ranking and probability estimation are essential. However, many traditional classifiers aim only at high classification accuracy (or a low error rate), even though they also produce probability estimates. Does high predictive accuracy imply better ranking and probability estimation? Is there a better evaluation method for such classifiers than classification accuracy, for the purpose of data mining applications? The answer is the area under the ROC (Receiver Operating Characteristic) curve, or simply AUC. We show that AUC provides a more discriminating evaluation of ranking and probability estimation than accuracy does. Further, we show that classifiers constructed to maximise the AUC score produce not only higher AUC values, but also higher classification accuracies. Our results are based on an experimental comparison between error-based and AUC-based learning algorithms for TAN (Tree-Augmented Naive Bayes).
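The contrast the abstract draws can be illustrated with a small sketch (not from the paper): AUC is equivalent to the Wilcoxon-Mann-Whitney statistic, the probability that a randomly chosen positive example is ranked above a randomly chosen negative one, so two classifiers can agree on every thresholded prediction (identical accuracy) yet differ in AUC. The data below is a made-up toy example chosen to show exactly that.

```python
def accuracy(labels, probs, threshold=0.5):
    """Fraction of examples whose thresholded probability matches the 0/1 label."""
    return sum((p >= threshold) == bool(y) for y, p in zip(labels, probs)) / len(labels)

def auc(labels, probs):
    """AUC via the Wilcoxon-Mann-Whitney statistic: the probability that a
    random positive example is ranked above a random negative one (ties count 1/2)."""
    pos = [p for y, p in zip(labels, probs) if y == 1]
    neg = [p for y, p in zip(labels, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

labels  = [1, 1, 1, 0, 0, 0]
probs_a = [0.9, 0.8, 0.4, 0.60, 0.2, 0.10]  # one positive ranked below one negative
probs_b = [0.9, 0.8, 0.4, 0.60, 0.2, 0.45]  # same thresholded predictions, worse ranking

# Both classifiers get 4/6 examples right at threshold 0.5,
# yet their AUC scores differ (8/9 vs 7/9): AUC separates them, accuracy cannot.
print(accuracy(labels, probs_a), accuracy(labels, probs_b))
print(auc(labels, probs_a), auc(labels, probs_b))
```

This is the sense in which AUC is "more discriminating": it scores the full ranking induced by the probability estimates, not just which side of the decision threshold each example falls on.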
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Ling, C.X., Zhang, H. (2002). Toward Bayesian Classifiers with Accurate Probabilities. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_12
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4