Machine Learning, Volume 76, Issue 2–3, pp 257–270

A self-training approach to cost sensitive uncertainty sampling



Uncertainty sampling is an effective method for active learning that is computationally efficient compared to other active learning methods such as loss-reduction methods. Unlike loss-reduction methods, however, uncertainty sampling cannot minimize total misclassification cost when different errors incur different costs. This paper introduces a method for cost-sensitive uncertainty sampling that makes use of self-training. We show that, even when misclassification costs are equal, this self-training approach reduces loss faster as a function of the number of points labeled, and yields more reliable posterior probability estimates, than standard uncertainty sampling. We also show why other, more naive modifications of uncertainty sampling aimed at minimizing total misclassification cost will not always work well.
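The idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation; the dataset, model, confidence threshold, and cost values are all illustrative assumptions. It combines two ingredients the abstract names: a self-training step that pseudo-labels highly confident pool points to stabilize the posterior estimates, and a cost-sensitive query rule that samples the point closest to the cost-adjusted decision threshold t = C_fp / (C_fp + C_fn) (the threshold form from Elkan 2001) rather than closest to 0.5.

```python
# Hedged sketch of cost-sensitive uncertainty sampling with self-training.
# All names, thresholds, and costs below are illustrative assumptions,
# not the method exactly as published.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic binary task: two Gaussian blobs.
X = np.vstack([rng.normal(-1, 1, (200, 2)), rng.normal(1, 1, (200, 2))])
y = np.array([0] * 200 + [1] * 200)

labeled = list(rng.choice(400, 10, replace=False))  # small labeled seed set
pool = [i for i in range(400) if i not in labeled]  # unlabeled pool

C_fn, C_fp = 5.0, 1.0  # asymmetric misclassification costs (assumed)

for _ in range(20):  # active-learning rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])
    p = clf.predict_proba(X[pool])[:, 1]

    # Self-training step: temporarily add the model's own labels for very
    # confident pool points, then refit, so the posterior estimates used
    # for querying are based on more data.
    confident = [i for i, pi in zip(pool, p) if pi > 0.95 or pi < 0.05]
    if confident:
        pseudo_y = clf.predict(X[confident])
        clf = LogisticRegression(max_iter=1000).fit(
            np.vstack([X[labeled], X[confident]]),
            np.concatenate([y[labeled], pseudo_y]))
        p = clf.predict_proba(X[pool])[:, 1]

    # Cost-sensitive uncertainty: query the pool point whose posterior is
    # closest to the cost-adjusted decision threshold, i.e. the point the
    # current model is most indifferent about under these costs.
    t = C_fp / (C_fp + C_fn)
    query = pool[int(np.argmin(np.abs(p - t)))]
    labeled.append(query)
    pool.remove(query)

print(len(labeled), len(pool))
```

With equal costs the threshold t reduces to 0.5 and the query rule becomes ordinary uncertainty sampling; the self-training refit is what the abstract credits for the faster loss reduction even in that symmetric case.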


Keywords: Active learning · Cost-sensitive learning · Self-training



Copyright information

© Springer Science+Business Media, LLC 2009

Authors and Affiliations

  1. Department of Electrical and Computer Engineering, The University of Texas at Austin, Austin, USA
