Approximate Policy Iteration for Closed-Loop Learning of Visual Tasks

  • Sébastien Jodogne
  • Cyril Briquet
  • Justus H. Piater
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4212)

Abstract

Approximate Policy Iteration (API) is a reinforcement learning paradigm that is able to solve high-dimensional, continuous control problems. We propose to exploit API for the closed-loop learning of mappings from images to actions. This approach requires a family of function approximators that maps visual percepts to a real-valued function. For this purpose, we use Regression Extra-Trees, a fast yet accurate and versatile machine learning algorithm. The inputs of the Extra-Trees consist of a set of visual features that digest the informative patterns in the visual signal. We also show how to parallelize the Extra-Trees learning process to further reduce the computational expense, which is often essential in visual tasks. Experimental results on real-world images indicate that the combination of API with Extra-Trees is a promising framework for the interactive learning of visual tasks.
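To make the pipeline concrete, the sketch below illustrates approximate policy iteration with an extremely randomized trees regressor standing in for the paper's Regression Extra-Trees. It is an assumption-laden illustration, not the authors' implementation: scikit-learn's ExtraTreesRegressor replaces the parallel Extra-Trees learner, the (features, action, reward, next_features) transition format and all hyper-parameters (GAMMA, ACTIONS, n_estimators, n_sweeps) are hypothetical, and the visual-feature extraction that would produce the feature vectors is omitted.

```python
# Illustrative sketch only: approximate policy iteration with an
# extremely randomized trees regressor as the Q-function approximator.
# Environment, feature extraction, and hyper-parameters are placeholders.
import numpy as np
from sklearn.ensemble import ExtraTreesRegressor

GAMMA = 0.95             # discount factor (assumed)
ACTIONS = [0, 1, 2, 3]   # hypothetical discrete action set


def approximate_policy_iteration(transitions, n_sweeps=10):
    """transitions: list of (features, action, reward, next_features),
    where `features` is a visual-feature vector extracted from an image."""
    feats = np.array([f for f, _, _, _ in transitions])
    acts = np.array([a for _, a, _, _ in transitions])
    rewards = np.array([r for _, _, r, _ in transitions])
    next_feats = np.array([fn for _, _, _, fn in transitions])

    # State-action inputs for the regressor: feature vector plus action index.
    X = np.column_stack([feats, acts])

    # Initial Q-function: fit on immediate rewards only.
    q = ExtraTreesRegressor(n_estimators=50).fit(X, rewards)

    for _ in range(n_sweeps):
        # Policy improvement: evaluate every action at the successor states
        # under the current Q-estimate and keep the greedy value.
        next_q = np.column_stack([
            q.predict(np.column_stack([next_feats,
                                       np.full(len(next_feats), a)]))
            for a in ACTIONS
        ])
        targets = rewards + GAMMA * next_q.max(axis=1)

        # Approximate policy evaluation: refit the Extra-Trees regressor
        # on the backed-up targets (one fitted Bellman backup per sweep).
        q = ExtraTreesRegressor(n_estimators=50).fit(X, targets)
    return q


def greedy_policy(q, features):
    """Return the greedy action for one visual-feature vector."""
    values = [q.predict(np.append(features, a).reshape(1, -1))[0]
              for a in ACTIONS]
    return ACTIONS[int(np.argmax(values))]
```

Each sweep performs a greedy improvement over the current Q-estimate followed by a single fitted backup as the evaluation step (modified-policy-iteration style); the evaluation scheme and the parallelized Extra-Trees learner described in the paper may differ from this stand-in.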

Keywords

Visual Feature · Markov Decision Process · Visual Task · Function Approximator · Policy Iteration


Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Sébastien Jodogne¹
  • Cyril Briquet¹
  • Justus H. Piater¹
  1. Montefiore Institute (B28), University of Liège, Liège, Belgium
