Data Mining and Knowledge Discovery, Volume 31, Issue 1, pp 164–202

Evidence-based uncertainty sampling for active learning

  • Manali Sharma
  • Mustafa Bilgic


Active learning methods select informative instances to effectively learn a suitable classifier. Uncertainty sampling, a frequently used active learning strategy, selects instances about which the model is uncertain, but it does not consider why the model is uncertain. In this article, we present an evidence-based framework that can uncover the reasons why a model is uncertain about a given instance. Using this framework, we discuss two reasons for a model's uncertainty: the model can be uncertain about an instance because it has strong but conflicting evidence for both classes, or because it does not have enough evidence for either class. Our empirical evaluations on several real-world datasets show that distinguishing between these two types of uncertainty has a drastic impact on learning efficiency. We further provide empirical and analytical justifications for why distinguishing between the two types of uncertainty matters.
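
The distinction between the two kinds of uncertainty can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name a model or a formula, so the sketch assumes a binary linear classifier whose per-feature contributions to the log-odds are treated as evidence, and it assumes that total evidence is used to rank the most uncertain candidates. The names `evidence`, `uncertainty`, `select`, `top_k`, and `mode` are hypothetical.

```python
import math


def evidence(weights, x):
    """Split a linear model's log-odds into evidence for each class.

    Assumption: a positive contribution w_i * x_i counts as evidence for the
    positive class, a negative one as evidence for the negative class.
    """
    contributions = [w * v for w, v in zip(weights, x)]
    pos = sum(c for c in contributions if c > 0)
    neg = -sum(c for c in contributions if c < 0)
    return pos, neg


def uncertainty(weights, bias, x):
    """Least-confidence uncertainty: 1 - max(p, 1 - p), highest at p = 0.5."""
    pos, neg = evidence(weights, x)
    p = 1.0 / (1.0 + math.exp(-(pos - neg + bias)))
    return 1.0 - max(p, 1.0 - p)


def select(weights, bias, pool, top_k, mode):
    """Pick one instance index from the top_k most uncertain pool instances.

    mode="conflicting": most total evidence (strong evidence for both classes).
    mode="insufficient": least total evidence (little evidence for either class).
    """
    ranked = sorted(
        range(len(pool)),
        key=lambda i: uncertainty(weights, bias, pool[i]),
        reverse=True,
    )
    candidates = ranked[:top_k]

    def total_evidence(i):
        pos, neg = evidence(weights, pool[i])
        return pos + neg

    if mode == "conflicting":
        return max(candidates, key=total_evidence)
    if mode == "insufficient":
        return min(candidates, key=total_evidence)
    raise ValueError("mode must be 'conflicting' or 'insufficient'")


# Toy example with made-up weights and instances.
weights = [2.0, -2.0, 0.1, -0.1]
pool = [
    [1, 1, 0, 0],  # strong evidence for both classes, p = 0.5
    [0, 0, 1, 1],  # weak evidence for both classes, p = 0.5
    [1, 0, 0, 0],  # clear evidence for the positive class
]
conflicting_choice = select(weights, 0.0, pool, top_k=2, mode="conflicting")
insufficient_choice = select(weights, 0.0, pool, top_k=2, mode="insufficient")
```

In this toy pool the first two instances are equally uncertain, so plain uncertainty sampling cannot tell them apart; the evidence split is what separates the conflicting case from the insufficient one.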


Keywords: Active learning, Uncertainty sampling, Classification



This material is based upon work supported by the National Science Foundation CAREER award no. IIS-1350337.



Copyright information

© The Author(s) 2016

Authors and Affiliations

  1. Illinois Institute of Technology, Chicago, USA
