Q-Learning with Double Progressive Widening: Application to Robotics

  • Nataliya Sokolovska
  • Olivier Teytaud
  • Mario Milone
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7064)


Discretization of the state and action spaces is a critical issue in Q-Learning. We propose adapting the discretization in real time using progressive widening, a technique already used in bandit-based methods. The results consistently converge to the optimum of the problem, without changing the parametrization for each new problem.
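The abstract does not state the widening rule itself. As a rough illustration only, one common form of progressive widening in the bandit literature lets the number of candidate actions grow sublinearly with the number of visits. The sketch below assumes a schedule of the form ceil(c * n^alpha); the names `allowed_arms` and `WideningActionSet`, the constants `c` and `alpha`, and the uniform sampling of new actions are all hypothetical choices made for this example and are not taken from the paper.

```python
import math
import random


def allowed_arms(visits, c=1.0, alpha=0.5):
    """Number of discrete actions allowed after `visits` visits.

    Assumed schedule: ceil(c * visits**alpha), with at least one action.
    """
    return max(1, math.ceil(c * visits ** alpha))


class WideningActionSet:
    """Grows a discrete set of actions drawn from a continuous interval."""

    def __init__(self, low, high, c=1.0, alpha=0.5, seed=0):
        self.low, self.high = low, high
        self.c, self.alpha = c, alpha
        self.rng = random.Random(seed)  # seeded for reproducibility
        self.actions = []
        self.visits = 0

    def visit(self):
        """Register one visit and widen the action set if the schedule allows."""
        self.visits += 1
        limit = allowed_arms(self.visits, self.c, self.alpha)
        while len(self.actions) < limit:
            # Assumption: new candidate actions are sampled uniformly.
            self.actions.append(self.rng.uniform(self.low, self.high))
        return self.actions


if __name__ == "__main__":
    action_set = WideningActionSet(low=-1.0, high=1.0)
    for _ in range(16):
        action_set.visit()
    print(len(action_set.actions), "actions after", action_set.visits, "visits")
```

With `alpha` below 1, the discretization stays coarse early on and is refined only as a state is revisited, so the same parameters can in principle be reused across problems. A learner could apply the same idea to the state discretization as well.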


Keywords: Q-Learning, discretization, applications




Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Nataliya Sokolovska (1)
  • Olivier Teytaud (1)
  • Mario Milone (1)
  1. INRIA Saclay, CNRS UMR 8623 & LRI, Université Paris Sud, Orsay, France
