Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks

  • Sébastien Jodogne
  • Cyril Briquet
  • Justus H. Piater
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)


Approximate Policy Iteration (API) is a reinforcement-learning paradigm that can solve high-dimensional, continuous control problems. We propose to exploit API for the closed-loop learning of mappings from images to actions. This approach requires a family of function approximators that map visual percepts to real values. For this purpose, we use Regression Extra-Trees, a fast yet accurate and versatile machine-learning algorithm. The inputs of the Extra-Trees consist of a set of visual features that summarize the informative patterns in the visual signal. We also show how to parallelize the Extra-Trees learning process to further reduce the computational expense, which is often essential in visual tasks. Experimental results on real-world images indicate that the combination of API with Extra-Trees is a promising framework for the interactive learning of visual tasks.
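To make the regression component concrete, the following is a minimal, illustrative sketch of an Extremely Randomized Trees (Extra-Trees) regressor in the spirit of Geurts, Ernst and Wehenkel: each split is chosen among K randomly drawn (feature, threshold) candidates, and an ensemble of such trees averages its leaf predictions. All function names and parameter values here are assumptions for illustration, not the authors' implementation.

```python
import random

def build_tree(X, y, k=5, n_min=2, rng=None):
    """Grow one extremely randomized regression tree on samples (X, y)."""
    rng = rng or random.Random()
    if len(y) < n_min or len(set(y)) == 1:
        return sum(y) / len(y)  # leaf node: mean of the outputs
    best = None
    for _ in range(k):  # K random (feature, threshold) split candidates
        f = rng.randrange(len(X[0]))
        lo, hi = min(x[f] for x in X), max(x[f] for x in X)
        if lo == hi:
            continue
        t = rng.uniform(lo, hi)
        left = [i for i in range(len(y)) if X[i][f] < t]
        right = [i for i in range(len(y)) if X[i][f] >= t]
        if not left or not right:
            continue
        # Score a candidate by its variance reduction.
        def sse(idx):
            m = sum(y[i] for i in idx) / len(idx)
            return sum((y[i] - m) ** 2 for i in idx)
        score = sse(range(len(y))) - sse(left) - sse(right)
        if best is None or score > best[0]:
            best = (score, f, t, left, right)
    if best is None:
        return sum(y) / len(y)  # no valid split found: make a leaf
    _, f, t, left, right = best
    return (f, t,
            build_tree([X[i] for i in left], [y[i] for i in left], k, n_min, rng),
            build_tree([X[i] for i in right], [y[i] for i in right], k, n_min, rng))

def predict_tree(node, x):
    """Descend from the root to a leaf and return its stored mean."""
    while isinstance(node, tuple):
        f, t, l, r = node
        node = l if x[f] < t else r
    return node

def fit_forest(X, y, n_trees=25, seed=0):
    """Fit an ensemble of independently randomized trees."""
    rng = random.Random(seed)
    return [build_tree(X, y, rng=random.Random(rng.random())) for _ in range(n_trees)]

def predict_forest(forest, x):
    """Average the predictions of all trees in the ensemble."""
    return sum(predict_tree(t, x) for t in forest) / len(forest)
```

Because the split thresholds are drawn at random rather than optimized, training each tree is cheap, and the trees are independent of one another, which is what makes the learning process easy to parallelize as the paper describes.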




Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sébastien Jodogne¹
  • Cyril Briquet¹
  • Justus H. Piater¹
  1. Montefiore Institute (B28), University of Liège, Liège, Belgium
