Machine Learning, Volume 54, Issue 2, pp 153-178
Active Sampling for Class Probability Estimation and Ranking
Maytal Saar-Tsechansky, Department of Management Science and Information Systems, Red McCombs School of Business, The University of Texas at Austin
Foster Provost, Department of Information Operations & Management Sciences, Leonard N. Stern School of Business, New York University
Abstract
In many cost-sensitive environments, decision makers use class probability estimates to evaluate the expected utility of a set of alternatives. Supervised learning can be used to build class probability estimates; however, obtaining training data with class labels is often very costly. Active learning acquires data incrementally, at each phase identifying especially useful additional data for labeling, and can be used to economize on the examples needed for learning. We outline the critical features of an active learner and present a sampling-based active learning method for estimating class probabilities and class-based rankings. BOOTSTRAP-LV identifies particularly informative new data for learning based on the variance in probability estimates, and uses weighted sampling to account for a potential example's informative value for the rest of the input space. We show empirically that the method reduces the number of data items that must be obtained and labeled, across a wide variety of domains. We investigate the contribution of each component of the algorithm and show that each provides valuable information for identifying informative examples. We also compare BOOTSTRAP-LV with UNCERTAINTY SAMPLING, an existing active learning method designed to maximize classification accuracy. The results show that BOOTSTRAP-LV requires fewer examples to reach a given estimation accuracy, and they provide insights into the behavior of the algorithms. Finally, we experiment with another new active sampling algorithm that draws on both UNCERTAINTY SAMPLING and BOOTSTRAP-LV, and show that it is significantly more competitive with BOOTSTRAP-LV than UNCERTAINTY SAMPLING is. The analysis suggests more general implications for improving existing active sampling algorithms for classification.
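The two ingredients the abstract attributes to BOOTSTRAP-LV (scoring candidates by the variance of bootstrap probability estimates, then choosing new examples by weighted sampling rather than greedily) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' algorithm: the abstract does not give the exact scoring formula, so the score here is a plain variance, and the helper names (`fit_binned`, `variance_scores`, `sample_by_variance`) are hypothetical. The binned, Laplace-corrected estimator is a stand-in for the probability estimation trees one would normally use.

```python
import random
from statistics import pvariance


def fit_binned(labeled, n_bins=5):
    """Hypothetical stand-in for a probability estimator: Laplace-corrected
    P(y=1) within equal-width bins of one feature assumed to lie in [0, 1]."""
    counts = [[0, 0] for _ in range(n_bins)]  # [negatives, positives] per bin
    for x, y in labeled:
        counts[min(int(x * n_bins), n_bins - 1)][y] += 1

    def predict(x):
        neg, pos = counts[min(int(x * n_bins), n_bins - 1)]
        return (pos + 1) / (pos + neg + 2)  # Laplace correction

    return predict


def variance_scores(labeled, pool, k=10, rng=random):
    """Fit k estimators on bootstrap resamples of the labeled data and score
    each unlabeled pool example by the variance of its k estimates of P(y=1).
    Assumption: plain variance, not the paper's exact score."""
    models = []
    for _ in range(k):
        boot = [rng.choice(labeled) for _ in labeled]  # sample with replacement
        models.append(fit_binned(boot))
    return [pvariance([m(x) for m in models]) for x in pool]


def sample_by_variance(pool, scores, batch, rng=random):
    """Draw `batch` distinct pool indices, each draw made with probability
    proportional to score (weighted sampling instead of taking the top scores)."""
    remaining = list(range(len(pool)))
    chosen = []
    for _ in range(min(batch, len(pool))):
        weights = [scores[i] for i in remaining]
        if sum(weights) == 0:
            pick = rng.choice(remaining)  # no signal left: fall back to uniform
        else:
            pick = rng.choices(remaining, weights=weights)[0]
        chosen.append(pick)
        remaining.remove(pick)
    return chosen


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy labeled set: label is 1 when the feature exceeds 0.5, with ~10% noise.
    labeled = []
    for _ in range(30):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.1:
            y = 1 - y
        labeled.append((x, y))
    pool = [i / 20 for i in range(20)]  # unlabeled candidates
    scores = variance_scores(labeled, pool, k=10, rng=rng)
    print(sample_by_variance(pool, scores, batch=3, rng=rng))
```

Sampling in proportion to the score, rather than always taking the highest-variance candidates, keeps some chance of selecting examples from other regions of the input space; examples whose estimates agree across all bootstrap models receive zero weight and are never drawn while any positive-weight candidate remains.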
Title: Active Sampling for Class Probability Estimation and Ranking
Journal: Machine Learning
Volume 54, Issue 2, pp 153-178
Cover Date: 2004-02
DOI: 10.1023/B:MACH.0000011806.12374.c3
Print ISSN: 0885-6125
Online ISSN: 1573-0565
Publisher: Kluwer Academic Publishers-Plenum Publishers
 Keywords

 active learning
 cost-sensitive learning
 class probability estimation
 ranking
 supervised learning
 decision trees
 uncertainty sampling
 selective sampling
 Authors

 Maytal Saar-Tsechansky (1)
 Foster Provost (2)
 Author Affiliations

 1. Department of Management Science and Information Systems, Red McCombs School of Business, The University of Texas at Austin, Austin, Texas, 78712, USA
 2. Department of Information Operations & Management Sciences, Leonard N. Stern School of Business, New York University, 44 West Fourth Street, New York, NY, 10012, USA