
Machine Learning, Volume 80, Issue 2–3, pp 111–139

The true sample complexity of active learning

  • Maria-Florina Balcan
  • Steve Hanneke
  • Jennifer Wortman Vaughan

Abstract

We describe and explore a new perspective on the sample complexity of active learning. In many situations where it was generally believed that active learning does not help, we show that active learning does help in the limit, often with exponential improvements in sample complexity. This contrasts with the traditional analysis of active learning problems such as non-homogeneous linear separators or depth-limited decision trees, in which Ω(1/ε) lower bounds are common. Such lower bounds should be interpreted carefully; indeed, we prove that it is always possible to learn an ε-good classifier with a number of samples asymptotically smaller than this. These new insights arise from a subtle variation on the traditional definition of sample complexity, not previously recognized in the active learning literature.
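The exponential improvement referred to above is easiest to see on the simplest nontrivial hypothesis class: threshold classifiers on the line. The sketch below is illustrative only and is not the paper's algorithm; it implements the textbook binary-search strategy, and the `oracle` labeling interface and function names are hypothetical. It contrasts the roughly 1/ε labeled examples passive learning requires with the roughly log(1/ε) label queries active learning needs.

```python
import random

# Minimal sketch (assumptions noted above): actively learning a
# one-dimensional threshold classifier on [0, 1]. Passive PAC
# learning of this class needs on the order of 1/epsilon labels;
# binary search needs only about log2(1/epsilon) label queries.

def active_threshold(oracle, epsilon):
    """Binary-search for the decision boundary on [0, 1].

    `oracle(x)` (an assumed labeling interface) returns the true
    0/1 label of point x; each call counts as one label query.
    Returns an estimate within epsilon of the true threshold.
    """
    lo, hi, queries = 0.0, 1.0, 0
    while hi - lo > epsilon:
        mid = (lo + hi) / 2.0
        queries += 1
        if oracle(mid) == 0:  # mid lies left of the true threshold
            lo = mid
        else:                 # mid lies at or right of it
            hi = mid
    return (lo + hi) / 2.0, queries

if __name__ == "__main__":
    true_t = random.random()             # unknown target threshold
    oracle = lambda x: int(x >= true_t)  # noiseless label oracle
    eps = 1e-4
    estimate, num_queries = active_threshold(oracle, eps)
    print(f"estimate={estimate:.5f}  truth={true_t:.5f}")
    print(f"active label queries: {num_queries} "
          f"(passive needs on the order of {int(1 / eps)})")
```

For ε = 10⁻⁴ the loop halves the interval about 14 times, versus roughly 10,000 labels for passive learning, which is the exponential gap in label complexity that the abstract describes.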

Keywords

Active learning · Sample complexity · Selective sampling · Sequential design · Learning theory · Classification

Copyright information

© The Author(s) 2010

Authors and Affiliations

  • Maria-Florina Balcan (1)
  • Steve Hanneke (2)
  • Jennifer Wortman Vaughan (3)

  1. College of Computing, School of Computer Science, Georgia Institute of Technology, Atlanta, USA
  2. Department of Statistics, Carnegie Mellon University, Pittsburgh, USA
  3. School of Engineering and Applied Sciences, Harvard University, Cambridge, USA
