Abstract
Active selection of good training examples is an important approach to reducing data-collection costs in machine learning; however, most existing methods focus on maximizing classification accuracy. In many applications, such as those with unequal misclassification costs, producing good class probability estimates (CPEs) is more important than optimizing classification accuracy. We introduce novel approaches to active learning based on the algorithms Bootstrap-LV and ActiveDecorate, by using Jensen-Shannon divergence (a similarity measure for probability distributions) to improve sample selection for optimizing CPEs. Comprehensive experimental results demonstrate the benefits of our approaches.
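As an illustration of the selection idea, the following minimal sketch scores each unlabeled example by the Jensen-Shannon divergence among the class probability estimates of a committee of models and picks the example with the highest score. It is not the Bootstrap-LV or ActiveDecorate procedure itself. It assumes the standard generalized Jensen-Shannon divergence with equal weights (entropy of the mean distribution minus the mean of the individual entropies), and the function names and the toy `pool_predictions` data are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy, in bits, of a discrete distribution given as a list."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(dists):
    """Generalized Jensen-Shannon divergence of equally weighted distributions.

    Computed as H(mean of dists) - mean of H(dist). It is zero when all
    distributions are identical and grows as they disagree.
    """
    n = len(dists)
    mean = [sum(col) / n for col in zip(*dists)]
    return entropy(mean) - sum(entropy(d) for d in dists) / n


def select_most_disputed(pool_predictions):
    """Return the index of the pool example whose committee estimates disagree most.

    pool_predictions[i][j] is the class distribution that committee member j
    (an assumed, already-trained model) predicts for unlabeled example i.
    """
    scores = [js_divergence(d) for d in pool_predictions]
    return max(range(len(scores)), key=scores.__getitem__)


# Toy pool: three unlabeled examples, three committee members, two classes.
pool_predictions = [
    [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],    # full agreement
    [[0.9, 0.1], [0.1, 0.9], [0.5, 0.5]],    # strong disagreement
    [[0.6, 0.4], [0.5, 0.5], [0.55, 0.45]],  # mild disagreement
]
chosen = select_most_disputed(pool_predictions)
```

Unlike a vote-based disagreement measure, this score changes when committee members agree on the most likely class but differ in how confident they are, which is the property that matters when the goal is accurate probability estimates rather than accurate labels.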
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Melville, P., Yang, S.M., Saar-Tsechansky, M., Mooney, R. (2005). Active Learning for Probability Estimation Using Jensen-Shannon Divergence. In: Gama, J., Camacho, R., Brazdil, P.B., Jorge, A.M., Torgo, L. (eds) Machine Learning: ECML 2005. ECML 2005. Lecture Notes in Computer Science, vol 3720. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11564096_28
DOI: https://doi.org/10.1007/11564096_28
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-29243-2
Online ISBN: 978-3-540-31692-3
eBook Packages: Computer Science, Computer Science (R0)