Abstract
For a variety of applications, machine learning algorithms are required to construct models that minimize the total loss associated with the decisions, rather than the number of errors. One of the most effective approaches to building models that are sensitive to non-uniform costs of errors is to first estimate the class probabilities of the unseen instances and then to make decisions based on both the estimated probabilities and the loss function. Although any classification algorithm can be converted into one that computes class probability estimates, in practice these estimates often prove inaccurate, and considerable research effort has been devoted to improving their accuracy for different algorithms. This paper presents a novel approach to cost-sensitive learning that directly addresses the problem of minimizing the actual cost of the decisions, rather than improving the overall quality of the probability estimates. The decision-making step of our methods is based on the distribution of the individual scores computed by classifiers built by different types of ensembles of decision trees. Rather than trying to improve the quality of the estimates, the new approach relies on statistics that measure the probability that the computed estimates fall on one side or the other of the decision boundary. An experimental analysis of the algorithms developed from this approach gives new insight into cost-sensitive decision making and shows that, on some tasks, they outperform some of the best probability-based algorithms for cost-sensitive learning.
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
Cite this paper
Margineantu, D.D. (2002). Class Probability Estimation and Cost-Sensitive Classification Decisions. In: Elomaa, T., Mannila, H., Toivonen, H. (eds) Machine Learning: ECML 2002. Lecture Notes in Computer Science, vol. 2430. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36755-1_23