Active Learning in the Non-realizable Case

  • Matti Kääriäinen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4264)


Most existing active learning algorithms are based on the realizability assumption: the learner’s hypothesis class is assumed to contain a target function that perfectly classifies all training and test examples. This assumption can hardly ever be justified in practice. In this paper, we study how relaxing the realizability assumption affects the sample complexity of active learning. First, we extend existing results on query learning to show that any active learning algorithm for the realizable case can be transformed to tolerate random bounded-rate class noise. Thus, bounded-rate class noise adds little extra complication to active learning, and in particular exponential label complexity savings over passive learning remain possible. However, it is questionable whether this noise model is any more realistic in practice than assuming no noise at all.
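The noise-tolerance transformation rests on a standard observation: under random class noise of rate η < 1/2, repeating the same label query and taking a majority vote recovers the true label with high probability, so a realizable-case learner can be run on the denoised answers. A minimal sketch of this repeated-query trick (the oracle, parameters, and repeat count below are illustrative, not taken from the paper):

```python
import math
import random

def noisy_oracle(true_label, eta, rng):
    """Return the true label in {0, 1}, flipped independently with probability eta."""
    return 1 - true_label if rng.random() < eta else true_label

def denoised_query(true_label, eta, delta, rng):
    """Recover the true label with probability >= 1 - delta by majority vote.

    By a Hoeffding/Chernoff bound, k >= log(1/delta) / (2 * (1/2 - eta)^2)
    repeated queries suffice; the cost per label grows as noise approaches 1/2.
    """
    k = math.ceil(math.log(1.0 / delta) / (2.0 * (0.5 - eta) ** 2))
    votes = sum(noisy_oracle(true_label, eta, rng) for _ in range(k))
    return int(votes > k / 2), k

rng = random.Random(0)
label, repeats = denoised_query(true_label=1, eta=0.3, delta=0.01, rng=rng)
print(label, repeats)
```

Since the per-label overhead is only a multiplicative O(log(1/δ)/(1/2 − η)²) factor, any exponential savings of the realizable-case algorithm carry over.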

Our second result shows that if we move to the truly non-realizable model of statistical learning theory, then the label complexity of active learning has the same dependence Ω(1/ε²) on the accuracy parameter ε as the passive learning label complexity. More specifically, we show that under the assumption that the best classifier in the learner’s hypothesis class has generalization error at most β > 0, the label complexity of active learning is Ω((β²/ε²)·log(1/δ)), where the accuracy parameter ε measures how close to optimal within the hypothesis class the active learner has to get and δ is the confidence parameter. The implication of this lower bound is that exponential savings should not be expected in realistic models of active learning, and thus the label complexity goals in active learning should be refined.
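Where each factor in the lower bound comes from can be motivated by the standard coin-distinguishing argument; the following is a back-of-the-envelope reconstruction of that intuition, not the paper's actual proof:

```latex
% Sketch: why \Omega((\beta^2/\varepsilon^2)\log(1/\delta)) labels are needed.
% Put probability mass p on a region R where labels are nearly uniformly noisy:
\[
\Pr[y = 1 \mid x \in R] = \tfrac{1}{2} + \gamma
\quad\text{versus}\quad
\Pr[y = 1 \mid x \in R] = \tfrac{1}{2} - \gamma .
\]
% The best classifier errs on R with probability about p/2, so matching a
% best-in-class error of \beta forces p \approx 2\beta.  The two candidate
% classifiers differ in error by 2\gamma p = \varepsilon, hence
\[
\gamma = \frac{\varepsilon}{2p} \approx \frac{\varepsilon}{4\beta}.
\]
% Every label query inside R is an independent flip of the same biased coin,
% regardless of how the active learner selects its queries, and telling apart
% coins of bias 1/2 \pm \gamma with confidence 1-\delta requires
\[
\Omega\!\left(\frac{\log(1/\delta)}{\gamma^2}\right)
= \Omega\!\left(\frac{\beta^2}{\varepsilon^2}\,\log\frac{1}{\delta}\right)
\]
% flips, which is the claimed lower bound.
```

The key point is that adaptivity buys nothing here: all informative queries land in the same noisy region, so active and passive learning face the same coin-estimation problem.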





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Matti Kääriäinen
    1. Department of Computer Science, University of Helsinki
