Active Learning for Probability Estimation Using Jensen-Shannon Divergence

  • Prem Melville
  • Stewart M. Yang
  • Maytal Saar-Tsechansky
  • Raymond Mooney
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3720)


Active selection of good training examples is an important approach to reducing data-collection costs in machine learning; however, most existing methods focus on maximizing classification accuracy. In many applications, such as those with unequal misclassification costs, producing good class probability estimates (CPEs) is more important than optimizing classification accuracy. We introduce novel approaches to active learning, based on the Bootstrap-LV and ActiveDecorate algorithms, that use Jensen-Shannon divergence (a measure of the dissimilarity between probability distributions) to improve sample selection for optimizing CPEs. Comprehensive experimental results demonstrate the benefits of our approaches.
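The following is a minimal Python sketch of how an unlabeled pool might be scored with the generalized Jensen-Shannon divergence JS(P_1, ..., P_k) = H(sum_i w_i P_i) - sum_i w_i H(P_i). Here P_i is ensemble member i's class probability vector for one example, and H is Shannon entropy. The sketch assumes equal weights w_i = 1/k, natural logarithms, and CPEs that have already been computed, for example by models trained on bootstrap samples as in Bootstrap-LV, or by an ActiveDecorate committee. The function names are hypothetical. Selecting the m highest-scoring examples is also a simplifying assumption and is not necessarily the authors' selection rule.

```python
import math
from typing import List, Optional, Sequence


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability terms contribute 0."""
    return -sum(x * math.log(x) for x in p if x > 0)


def js_divergence(dists: Sequence[Sequence[float]],
                  weights: Optional[Sequence[float]] = None) -> float:
    """Generalized Jensen-Shannon divergence of k class-probability vectors:
    H(sum_i w_i * P_i) - sum_i w_i * H(P_i). It is 0 when all members agree."""
    k = len(dists)
    if weights is None:
        weights = [1.0 / k] * k  # assumption: every ensemble member weighted equally
    n_classes = len(dists[0])
    # Weighted mixture of the members' class-probability estimates.
    mixture = [sum(w * d[j] for w, d in zip(weights, dists)) for j in range(n_classes)]
    return entropy(mixture) - sum(w * entropy(d) for w, d in zip(weights, dists))


def rank_pool_by_js(pool_cpes: Sequence[Sequence[Sequence[float]]], m: int) -> List[int]:
    """Return the indices of the m pool examples with the highest JS score.
    pool_cpes[x][i] is ensemble member i's CPE vector for unlabeled example x.
    Taking the top m is a hypothetical, simplified selection rule."""
    scores = [js_divergence(cpes) for cpes in pool_cpes]
    return sorted(range(len(scores)), key=lambda x: scores[x], reverse=True)[:m]


# Hypothetical pool: three unlabeled examples, each scored by a 3-member ensemble
# on a binary task (class-probability vectors [P(c0), P(c1)]).
pool = [
    [[0.9, 0.1], [0.9, 0.1], [0.9, 0.1]],  # members agree: JS is (numerically) 0
    [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]],  # strong disagreement: JS ~ 0.245
    [[0.7, 0.3], [0.6, 0.4], [0.8, 0.2]],  # mild disagreement: JS ~ 0.016
]
print(rank_pool_by_js(pool, m=2))  # -> [1, 2]
```

The divergence is zero when all committee members produce the same CPE and grows as their estimates spread apart. This score therefore favors examples on which the estimated probabilities disagree, not only the predicted labels.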

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Prem Melville (1)
  • Stewart M. Yang (1)
  • Maytal Saar-Tsechansky (2)
  • Raymond Mooney (1)

  1. Dept. of Computer Sciences, Univ. of Texas at Austin
  2. Red McCombs School of Business, Univ. of Texas at Austin