A Cascaded Supervised Learning Approach to Inverse Reinforcement Learning

  • Edouard Klein
  • Bilal Piot
  • Matthieu Geist
  • Olivier Pietquin
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8188)


This paper considers the Inverse Reinforcement Learning (IRL) problem, that is, inferring a reward function for which a demonstrated expert policy is optimal. We propose to break the IRL problem down into two generic Supervised Learning steps: this is the Cascaded Supervised IRL (CSI) approach. A classification step that defines a score function is followed by a regression step providing a reward function. A theoretical analysis shows that the demonstrated expert policy is near-optimal for the computed reward function. Two important advantages of the CSI approach are that it does not require repeatedly solving a Markov Decision Process (MDP) and that it can leverage existing classification and regression techniques. Empirically, CSI compares favorably with state-of-the-art approaches when using only transitions sampled according to the expert policy, up to the use of some heuristics. This is exemplified on two classical benchmarks (the mountain car problem and a highway driving simulator).
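The two-step structure described in the abstract can be illustrated on a toy problem. The sketch below is a minimal, hypothetical instantiation (not the authors' implementation): step 1 fits a score function q(s, a) with a simple multiclass perceptron that mimics the expert's action choices, and step 2 reads a reward off a Bellman-like relation r(s, a) = q(s, a) − γ·max_a' q(s', a') on the sampled expert transitions. The toy MDP, the perceptron classifier, and the tabular "regression" (direct assignment on one-hot features) are all assumptions made for illustration.

```python
import numpy as np

gamma = 0.9
n_states, n_actions = 3, 2

# Hypothetical expert demonstrations: the expert always takes action 1,
# which cycles the state forward: s -> (s + 1) % 3.
demos = [(s, 1, (s + 1) % 3) for s in range(n_states)]

# --- Step 1: classification ---
# Fit a score function q(s, a) that imitates the expert's action choices.
# A simple multiclass perceptron on tabular (one-hot) features; any
# score-based multiclass classifier could be substituted here.
q = np.zeros((n_states, n_actions))
for _ in range(10):  # a few epochs suffice on this toy problem
    for s, a, _ in demos:
        pred = int(np.argmax(q[s]))
        if pred != a:  # mispredicted the expert action: perceptron update
            q[s, a] += 1.0
            q[s, pred] -= 1.0

# --- Step 2: regression ---
# Treat q as an action-value function and extract a reward from a
# Bellman-like relation on the sampled expert transitions:
#   r(s, a) = q(s, a) - gamma * max_a' q(s', a')
# With tabular features the regression reduces to direct assignment;
# unvisited state-action pairs keep a default reward of 0.
r = np.zeros((n_states, n_actions))
for s, a, s_next in demos:
    r[s, a] = q[s, a] - gamma * np.max(q[s_next])

print(r)
```

On this toy MDP the recovered reward assigns the expert's action the highest value in every state, so the expert policy is optimal for it, which is the guarantee the paper establishes in general.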


Keywords: Optimal Policy, Relative Entropy, Markov Decision Process, Reward Function, Bellman Equation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.




References

  1. Abbeel, P., Ng, A.: Apprenticeship learning via inverse reinforcement learning. In: Proc. ICML (2004)
  2. Boularias, A., Kober, J., Peters, J.: Relative entropy inverse reinforcement learning. In: Proc. ICAPS, vol. 15, pp. 20–27 (2011)
  3. Dvijotham, K., Todorov, E.: Inverse optimal control with linearly-solvable MDPs. In: Proc. ICML (2010)
  4. Guermeur, Y.: A generic model of multi-class support vector machine. International Journal of Intelligent Information and Database Systems (2011)
  5. Klein, E., Geist, M., Piot, B., Pietquin, O.: Inverse reinforcement learning through structured classification. In: Proc. NIPS, Lake Tahoe, NV, USA (December 2012)
  6. Melo, F.S., Lopes, M.: Learning from demonstration using MDP induced metrics. In: Balcázar, J.L., Bonchi, F., Gionis, A., Sebag, M. (eds.) ECML PKDD 2010, Part II. LNCS, vol. 6322, pp. 385–401. Springer, Heidelberg (2010)
  7. Melo, F., Lopes, M., Ferreira, R.: Analysis of inverse reinforcement learning with perturbed demonstrations. In: Proc. ECAI, pp. 349–354. IOS Press (2010)
  8. Neu, G., Szepesvári, C.: Training parsers by inverse reinforcement learning. Machine Learning 77(2), 303–337 (2009)
  9. Ng, A., Russell, S.: Algorithms for inverse reinforcement learning. In: Proc. ICML, pp. 663–670. Morgan Kaufmann Publishers Inc. (2000)
  10. Puterman, M.: Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, Inc., New York (1994)
  11. Rasmussen, C., Williams, C.: Gaussian Processes for Machine Learning, vol. 1. MIT Press, Cambridge (2006)
  12. Ratliff, N., Bagnell, J., Srinivasa, S.: Imitation learning for locomotion and manipulation. In: International Conference on Humanoid Robots, pp. 392–397. IEEE (2007)
  13. Ratliff, N., Bagnell, J., Zinkevich, M.: Maximum margin planning. In: Proc. ICML, p. 736. ACM (2006)
  14. Regan, K., Boutilier, C.: Robust online optimization of reward-uncertain MDPs. In: Proc. IJCAI 2011 (2011)
  15. Russell, S.: Learning agents for uncertain environments (extended abstract). In: Annual Conference on Computational Learning Theory, p. 103. ACM (1998)
  16. Sutton, R., Barto, A.: Reinforcement Learning. MIT Press (1998)
  17. Syed, U., Bowling, M., Schapire, R.: Apprenticeship learning using linear programming. In: Proc. ICML, pp. 1032–1039. ACM (2008)
  18. Syed, U., Schapire, R.: A game-theoretic approach to apprenticeship learning. In: Proc. NIPS, vol. 20, pp. 1449–1456 (2008)
  19. Syed, U., Schapire, R.: A reduction from apprenticeship learning to classification. In: Proc. NIPS, vol. 24, pp. 2253–2261 (2010)
  20. Taskar, B., Chatalbashev, V., Koller, D., Guestrin, C.: Learning structured prediction models: A large margin approach. In: Proc. ICML, p. 903. ACM (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Edouard Klein (1, 2)
  • Bilal Piot (2, 3)
  • Matthieu Geist (2)
  • Olivier Pietquin (2, 3)

  1. ABC Team, LORIA-CNRS, France
  2. IMS-MaLIS Research Group, Supélec, France
  3. UMI 2958 (GeorgiaTech-CNRS), France
