Machine Learning

Volume 54, Issue 2, pp 153–178

Active Sampling for Class Probability Estimation and Ranking

  • Maytal Saar-Tsechansky
  • Foster Provost

Abstract

In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information for identifying informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV requires fewer examples to achieve a given estimation accuracy, and they provide insights into the behavior of both algorithms. Finally, we experiment with another new active sampling algorithm, drawing from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
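
The core selection step the abstract describes — train a set of bootstrap models, score each unlabeled example by the variance of its class-probability estimates, and draw the next batch by weighted sampling — can be sketched as follows. This is a minimal illustration under our own assumptions (a pool-based binary-class setting, a decision-tree base learner, and plain variance as the score); the paper's exact scoring formula and sampling schedule differ, and the function and parameter names here (bootstrap_lv_batch, n_boot, batch_size) are ours, not the authors'.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.utils import resample


def bootstrap_lv_batch(X_labeled, y_labeled, X_pool,
                       batch_size=10, n_boot=10, seed=0):
    """Pick a batch of pool examples to label, scored by the local variance
    (LV) of bootstrap class-probability estimates and drawn by weighted
    sampling. A sketch of the idea, not the paper's exact formula."""
    rng = np.random.default_rng(seed)

    # 1. Train n_boot models, each on a bootstrap resample of the labeled
    #    set (assumes both classes appear in each resample).
    estimates = []
    for b in range(n_boot):
        Xb, yb = resample(X_labeled, y_labeled, random_state=b)
        tree = DecisionTreeClassifier(random_state=b).fit(Xb, yb)
        estimates.append(tree.predict_proba(X_pool)[:, 1])  # P(class 1) per pool point
    estimates = np.vstack(estimates)  # shape (n_boot, n_pool)

    # 2. Score each unlabeled example by the variance of its estimates:
    #    high variance means the current models disagree, so a label
    #    there is likely informative.
    lv = estimates.var(axis=0)

    # 3. Weighted sampling: normalize the scores into a distribution and
    #    draw the batch from it, rather than greedily taking the top
    #    scorers, so selection reflects value across the input space.
    weights = lv + 1e-12              # smooth so every point stays drawable
    weights /= weights.sum()
    return rng.choice(len(X_pool), size=min(batch_size, len(X_pool)),
                      replace=False, p=weights)
```

For comparison, UNCERTAINTY SAMPLING would replace the variance score in step 2 with how close the mean estimate sits to 0.5 (e.g., 0.5 - |mean - 0.5|), selecting the examples the current model is least certain how to classify; the weighted draw in step 3 is what spreads BOOTSTRAP-LV's choices across the input space instead of concentrating them near a single decision-boundary region.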

Keywords: active learning · cost-sensitive learning · class probability estimation · ranking · supervised learning · decision trees · uncertainty sampling · selective sampling


Copyright information

© Kluwer Academic Publishers 2004

Authors and Affiliations

  • Maytal Saar-Tsechansky (1)
  • Foster Provost (2)
  1. Department of Management Science and Information Systems, Red McCombs School of Business, The University of Texas at Austin, Austin, USA
  2. Department of Information, Operations & Management Sciences, Leonard N. Stern School of Business, New York University, New York, USA
