# Active Sampling for Class Probability Estimation and Ranking


## Abstract

In many cost-sensitive environments, class probability estimates are used by decision makers to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, it is often very costly to obtain training data with class labels. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of the components of the algorithm and show that each provides valuable information for identifying informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV requires fewer examples to achieve a given estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm that draws from both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
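The core idea described above can be illustrated with a minimal sketch, not the authors' implementation: a bootstrap ensemble yields, for each unlabeled example, a set of class probability estimates; examples whose estimates vary most across the ensemble are scored highest, and the next batch is drawn by weighted sampling proportional to those scores rather than greedily taking the top-scoring items. The normalization by the smaller class probability and all function names here are simplifying assumptions for the binary case.

```python
import numpy as np

def bootstrap_lv_scores(models_probs, epsilon=1e-6):
    """Score unlabeled examples by the variance ('local variance') of their
    positive-class probability estimates across a bootstrap ensemble.

    models_probs: array of shape (k_models, n_unlabeled), entry [j, i] being
    model j's estimate of P(class=1 | x_i).
    """
    mean_p = models_probs.mean(axis=0)
    variance = models_probs.var(axis=0)
    # Normalize by the smaller estimated class probability so that
    # uncertainty about rarer outcomes is weighted more heavily
    # (a simplified stand-in for the paper's normalization).
    p_min = np.minimum(mean_p, 1.0 - mean_p) + epsilon
    return variance / p_min

def sample_for_labeling(scores, m, rng):
    """Weighted sampling: draw m distinct examples with probability
    proportional to their scores, rather than the greedy top-m."""
    p = scores / scores.sum()
    return rng.choice(len(scores), size=m, replace=False, p=p)

# Toy run: 3 bootstrap models, 4 unlabeled examples.
rng = np.random.default_rng(0)
probs = np.array([[0.5, 0.9, 0.2, 0.7],
                  [0.5, 0.9, 0.8, 0.6],
                  [0.1, 0.9, 0.5, 0.7]])
scores = bootstrap_lv_scores(probs)
chosen = sample_for_labeling(scores, m=2, rng=rng)
```

Examples on which the ensemble disagrees (columns 0 and 2) receive high scores, while the example with identical estimates from every model (column 1) scores near zero; the weighted draw still gives lower-scoring examples some chance of selection, which helps cover the rest of the input space.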
