Task-Driven Discretization of the Joint Space of Visual Percepts and Continuous Actions

  • Sébastien Jodogne
  • Justus H. Piater
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


We target the problem of closed-loop learning of control policies that map visual percepts to continuous actions. Our algorithm, called Reinforcement Learning of Joint Classes (RLJC), adaptively discretizes the joint space of visual percepts and continuous actions. In a sequence of attempts to remove perceptual aliasing, it incrementally builds a decision tree that applies tests either in the input perceptual space or in the output action space. The leaves of such a decision tree induce a piecewise constant, optimal state-action value function, which is computed through a reinforcement learning algorithm that uses the tree as a function approximator. The optimal policy is then derived by selecting the action that, given a percept, leads to the leaf that maximizes the value function. Our approach is quite general and applies also to learning mappings from continuous percepts to continuous actions. A simulated visual navigation problem illustrates the applicability of RLJC.
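The policy-extraction step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a one-dimensional action interval, a hand-built tree with made-up leaf values, and the midpoint of the winning leaf's action interval as the representative action. The class and function names (`Leaf`, `PerceptTest`, `ActionTest`, `reachable_leaves`, `greedy_action`) are hypothetical. How the tree is grown and how the leaf values are learned is not covered here.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Leaf:
    q: float  # constant Q-value of this joint (percept, action) class


@dataclass
class PerceptTest:
    feature: int      # index into the percept vector (assumed encoding)
    threshold: float
    below: "Node"
    above: "Node"


@dataclass
class ActionTest:
    threshold: float  # splits the 1-D action interval (assumption)
    below: "Node"
    above: "Node"


Node = Union[Leaf, PerceptTest, ActionTest]


def reachable_leaves(node: Node, percept: Sequence[float],
                     lo: float, hi: float) -> List[Tuple[float, float, float]]:
    """Return (q, action_lo, action_hi) for every leaf reachable from `percept`."""
    if isinstance(node, Leaf):
        return [(node.q, lo, hi)]
    if isinstance(node, PerceptTest):
        # The percept is fixed, so exactly one branch is followed.
        child = node.below if percept[node.feature] < node.threshold else node.above
        return reachable_leaves(child, percept, lo, hi)
    # Action test: the action is still free, so both branches are explored,
    # each with the action interval narrowed accordingly.
    leaves = []
    if lo < node.threshold:
        leaves += reachable_leaves(node.below, percept, lo, min(hi, node.threshold))
    if hi > node.threshold:
        leaves += reachable_leaves(node.above, percept, max(lo, node.threshold), hi)
    return leaves


def greedy_action(tree: Node, percept: Sequence[float],
                  lo: float = -1.0, hi: float = 1.0) -> Tuple[float, float]:
    """Pick the action whose leaf has the highest Q-value for this percept."""
    q, a_lo, a_hi = max(reachable_leaves(tree, percept, lo, hi), key=lambda t: t[0])
    return 0.5 * (a_lo + a_hi), q  # midpoint as representative action


# Toy tree with illustrative values only.
tree = PerceptTest(
    feature=0, threshold=0.5,
    below=ActionTest(threshold=0.0, below=Leaf(1.0), above=Leaf(3.0)),
    above=Leaf(2.0),
)
```

With this toy tree, `greedy_action(tree, [0.2])` follows the percept test to the action test, compares both leaves, and selects an action in the upper half of the interval. For `[0.9]` only one leaf is reachable, so any action in the full interval is equally valued.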


Keywords: Joint Space; Optimal Policy; Action Space; Interest Point; Action Feature



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sébastien Jodogne (1)
  • Justus H. Piater (1)

  1. Montefiore Institute (B28), University of Liège, Liège, Belgium
