Learning from Demonstrations: Is It Worth Estimating a Reward Function?

  • Bilal Piot
  • Matthieu Geist
  • Olivier Pietquin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)

Abstract

This paper provides a comparative study of Inverse Reinforcement Learning (IRL) and Apprenticeship Learning (AL). IRL and AL are two frameworks, based on Markov Decision Processes (MDPs), that address the imitation learning problem, in which an agent tries to learn from demonstrations of an expert. In the AL framework, the agent tries to learn the expert's policy directly, whereas in the IRL framework, the agent tries to learn a reward that explains the expert's behavior; this reward is then optimized to imitate the expert. One may wonder whether it is worth estimating such a reward, or whether estimating a policy is sufficient. This quite natural question has not really been addressed in the literature so far. We provide partial answers, from both a theoretical and an empirical point of view.
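
To make the contrast concrete, here is a minimal, illustrative sketch (not the estimators or algorithms studied in the paper): the AL route estimates the expert's policy directly from demonstrations (simple behavioral cloning by majority vote), while the IRL route first estimates a reward (here a hypothetical visitation-frequency surrogate, chosen only for illustration) and then optimizes it by value iteration on a toy chain MDP.

```python
import numpy as np

# Toy deterministic chain MDP: 5 states, 2 actions (0 = left, 1 = right).
n_states, n_actions, gamma = 5, 2, 0.9

def step(s, a):
    # Deterministic transition on the chain.
    return max(s - 1, 0) if a == 0 else min(s + 1, n_states - 1)

# Expert demonstrations: (state, action) pairs of an expert that always moves right.
demos = [(s, 1) for s in range(n_states)] * 3

# --- AL-style route: estimate the expert policy directly ---
counts = np.zeros((n_states, n_actions))
for s, a in demos:
    counts[s, a] += 1
policy_al = counts.argmax(axis=1)  # majority vote per state (behavioral cloning)

# --- IRL-style route: estimate a reward, then optimize it ---
# Hypothetical surrogate reward: empirical (state, action) frequency in the demos.
reward = counts / counts.sum()

# Value iteration on the estimated reward.
Q = np.zeros((n_states, n_actions))
for _ in range(200):
    V = Q.max(axis=1)
    Q = np.array([[reward[s, a] + gamma * V[step(s, a)]
                   for a in range(n_actions)] for s in range(n_states)])
policy_irl = Q.argmax(axis=1)

print("AL policy: ", policy_al)   # -> [1 1 1 1 1]
print("IRL policy:", policy_irl)  # -> [1 1 1 1 1]
```

On this trivial example both routes recover the same right-moving policy; the question raised in the abstract is whether the extra step of estimating a reward pays off in less trivial settings.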

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Bilal Piot (1, 2)
  • Matthieu Geist (1)
  • Olivier Pietquin (1, 2)
  1. IMS-MaLIS Research Group, Supélec, France
  2. GeorgiaTech-CNRS UMI 2958, France