Boosting Active Learning to Optimality: A Tractable Monte-Carlo, Billiard-Based Algorithm

  • Philippe Rolet
  • Michèle Sebag
  • Olivier Teytaud
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5782)


This paper focuses on Active Learning (AL) with a limited number of queries. In application domains such as Numerical Engineering, computational constraints may limit the training set to a few dozen or a few hundred examples. Active Learning under bounded resources is formalized as a finite-horizon Reinforcement Learning problem, where the sampling strategy aims at minimizing the expected generalization error. The paper presents a tractable approximation of the optimal (intractable) policy: the Bandit-based Active Learner (BAAL) algorithm. Viewing Active Learning as a single-player game, BAAL combines UCT, the tree-structured multi-armed bandit algorithm proposed by Kocsis and Szepesvári (2006), with billiard algorithms. A proof-of-principle study shows good empirical convergence of the approach toward an optimal policy, as well as its ability to incorporate prior AL criteria. Its hybridization with the Query-by-Committee (QbC) approach is found to improve on both stand-alone BAAL and stand-alone QbC.
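The abstract states that BAAL builds on UCT, which decides which child of a search-tree node to explore next by maximizing the UCB1 bound: the child's mean reward plus an exploration bonus that shrinks as the child is visited more often. The Python sketch below illustrates only that selection-and-backup step, run on a flat three-armed bandit. The `Node` class, the function names, the exploration constant `c = sqrt(2)` and the Bernoulli rewards are all illustrative assumptions. This is not the BAAL algorithm itself; according to the abstract, BAAL's tree ranges over query sequences and its reward relates to the generalization error.

```python
import math
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical per-node statistics kept by a UCT-style search."""
    visits: int = 0
    total_reward: float = 0.0
    children: dict = field(default_factory=dict)

    @property
    def mean(self) -> float:
        # Empirical mean reward; 0.0 for a node that was never visited.
        return self.total_reward / self.visits if self.visits else 0.0


def ucb1_score(parent_visits: int, child: Node, c: float = math.sqrt(2)) -> float:
    """UCB1 bound: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return math.inf  # unvisited children are tried first
    return child.mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def select_child(node: Node, c: float = math.sqrt(2)):
    """Return the key of the child with the highest UCB1 score."""
    return max(node.children, key=lambda k: ucb1_score(node.visits, node.children[k], c))


def backup(node: Node, child_key, reward: float) -> None:
    """Propagate one simulated reward to the chosen child and its parent."""
    child = node.children[child_key]
    child.visits += 1
    child.total_reward += reward
    node.visits += 1


def run_bandit(probs, n_rounds: int, seed: int = 0):
    """Toy demo (assumption): each arm pays 1 with probability probs[i], else 0."""
    rng = random.Random(seed)
    root = Node(children={i: Node() for i in range(len(probs))})
    for _ in range(n_rounds):
        k = select_child(root)
        reward = 1.0 if rng.random() < probs[k] else 0.0
        backup(root, k, reward)
    return [root.children[i].visits for i in range(len(probs))]


counts = run_bandit([0.9, 0.1, 0.1], 2000)
print(counts)  # most visits should go to arm 0, the arm with the highest payoff
```

The exploration bonus keeps every arm in play, so the search never commits to an early estimate. Because the bonus shrinks as an arm is visited, the search concentrates its visits on the best-performing arm over time. In a full UCT search, the same rule is applied recursively at each node, from the root down to a leaf.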


References

  1. Kulkarni, S.R., Mitter, S.K., Tsitsiklis, J.N.: Active learning using arbitrary binary valued queries. Mach. Learn. 11(1), 23–35 (1993)
  2. Cohn, D., Atlas, L., Ladner, R.: Improving generalization with active learning. Mach. Learn. 15(2), 201–221 (1994)
  3. Schohn, G., Cohn, D.: Less is more: Active learning with support vector machines. Int. Conf. on Machine Learning 282, 285–286 (2000)
  4. Dasgupta, S.: Analysis of a greedy active learning strategy. In: NIPS 17, pp. 337–344. MIT Press, Cambridge (2005)
  5. Castro, R., Willett, R., Nowak, R.: Faster rates in regression via active learning. In: NIPS 18, pp. 179–186. MIT Press, Cambridge (2006)
  6. Hoi, S.C.H., Jin, R., Zhu, J., Lyu, M.R.: Batch mode active learning and its application to medical image classification. In: Int. Conf. on Machine Learning, pp. 417–424. ACM, New York (2006)
  7. Hanneke, S.: A bound on the label complexity of agnostic active learning. In: Int. Conf. on Machine Learning, pp. 353–360. ACM, New York (2007)
  8. Kocsis, L., Szepesvári, C.: Bandit-based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
  9. Gelly, S., Silver, D.: Combining online and offline knowledge in UCT. In: Int. Conf. on Machine Learning, pp. 273–280. ACM, New York (2007)
  10. Ruján, P.: Playing billiards in version space. Neural Computation 9(1), 99–122 (1997)
  11. Herbrich, R., Graepel, T., Campbell, C.: Bayes point machines. Journal of Machine Learning Research 1, 245–279 (2001)
  12. Warmuth, M.K., Liao, J., Rätsch, G., Mathieson, M., Putta, S., Lemmen, C.: Support vector machines for active learning in the drug discovery process. Journal of Chemical Information Sciences 43, 667–673 (2003)
  13. Coulom, R.: Efficient selectivity and backup operators in Monte-Carlo tree search. In: Ciancarini, P., van den Herik, H.J. (eds.) CG 2006. LNCS, vol. 4630, pp. 72–83. Springer, Heidelberg (2007)
  14. Chaslot, G., Winands, M., Uiterwijk, J., van den Herik, H., Bouzy, B.: Progressive strategies for Monte-Carlo tree search. In: Wang, P., et al. (eds.) Proc. of the 10th Joint Conf. on Information Sciences, pp. 655–661. World Scientific Publishing, Singapore (2007)
  15. Wang, Y., Audibert, J.Y., Munos, R.: Algorithms for infinitely many-armed bandits. In: NIPS 21, pp. 1729–1736 (2009)
  16. Seung, H.S., Opper, M., Sompolinsky, H.: Query by committee. In: COLT 1992, pp. 287–294. ACM, New York (1992)
  17. Freund, Y., Seung, H.S., Shamir, E., Tishby, N.: Selective sampling using the query by committee algorithm. Mach. Learn. 28(2-3), 133–168 (1997)
  18. Cohn, D., Ghahramani, Z., Jordan, M.: Active Learning with Statistical Models. Journal of Artificial Intelligence Research 4, 129–145 (1996)
  19. Roy, N., McCallum, A.: Toward optimal active learning through sampling estimation of error reduction. In: Int. Conf. on Machine Learning, pp. 441–448. Morgan Kaufmann, San Francisco (2001)
  20. Lindenbaum, M., Markovitch, S., Rusakov, D.: Selective sampling for nearest neighbor classifiers. Machine Learning 54, 125–152 (2004)
  21. Dasgupta, S., Kalai, A.T., Monteleoni, C.: Analysis of perceptron-based active learning. In: Auer, P., Meir, R. (eds.) COLT 2005. LNCS (LNAI), vol. 3559, pp. 249–263. Springer, Heidelberg (2005)
  22. Cesa-Bianchi, N., Conconi, A., Gentile, C.: Learning probabilistic linear-threshold classifiers via selective sampling. In: Schölkopf, B., Warmuth, M.K. (eds.) COLT/Kernel 2003. LNCS (LNAI), vol. 2777, pp. 373–387. Springer, Heidelberg (2003)
  23. Balcan, M.-F., Broder, A., Zhang, T.: Margin based active learning. In: Bshouty, N.H., Gentile, C. (eds.) COLT 2007. LNCS (LNAI), vol. 4539, pp. 35–50. Springer, Heidelberg (2007)
  24. Xiao, G., Southey, F., Holte, R.C., Wilkinson, D.: Software testing by active learning for commercial games. In: AAAI 2005, pp. 609–616 (2005)
  25. Vidyasagar, M.: A Theory of Learning and Generalization, with Applications to Neural Networks and Control Systems. Springer, Heidelberg (1997)
  26. Hegedüs, T.: Generalized teaching dimensions and the query complexity of learning. In: COLT 1995, pp. 108–117. ACM, New York (1995)
  27. Dasgupta, S.: Coarse sample complexity bounds for active learning. In: NIPS 18, pp. 235–242. MIT Press, Cambridge (2006)
  28. Haussler, D., Kearns, M., Schapire, R.E.: Bounds on the sample complexity of Bayesian learning using information theory and the VC dimension. Mach. Learn. 14(1), 83–113 (1994)
  29. MacKay, D.J.C.: Bayesian interpolation. Neural Computation 4, 415–447 (1992)
  30. Sutton, R., Barto, A.: Reinforcement learning: An introduction. MIT Press, Cambridge (1998)
  31. Rolet, P., Sebag, M., Teytaud, O.: Boosting active learning to optimality: some results on a tractable Monte-Carlo, billiard-based algorithm. Technical report, Laboratoire de Recherche en Informatique, Univ. Paris Sud. (2009)
  32. Bellman, R.: Dynamic Programming. Princeton Univ. Press, Princeton (1957)
  33. Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. The Journal of Machine Learning Research 3, 397–422 (2003)
  34. Wang, Y., Gelly, S.: Modifications of UCT and sequence-like simulations for Monte-Carlo Go. In: IEEE Symposium on Computational Intelligence and Games, Honolulu, Hawaii, pp. 175–182 (2007)
  35. Ruján, P., Marchand, M.: Computing the Bayes kernel classifier (1999)
  36. Comets, F., Popov, S., Schütz, G.M., Vachkovskaia, M.: Billiards in a General Domain with Random Reflections. Archive for Rational Mechanics and Analysis 191, 497–537 (2009)
  37. Kocsis, L., Szepesvári, C.: Bandit Based Monte-Carlo Planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006)
  38. Freund, Y., Schapire, R.: Large margin classification using the perceptron algorithm. In: COLT 1998. Morgan Kaufmann, San Francisco (1998)

Copyright information

© Springer-Verlag Berlin Heidelberg 2009

Authors and Affiliations

  • Philippe Rolet (1)
  • Michèle Sebag (1)
  • Olivier Teytaud (1)
  1. TAO, CNRS - INRIA - Univ. Paris-Sud, France
