Deep Inverse Reinforcement Learning by Logistic Regression

  • Conference paper
Neural Information Processing (ICONIP 2016)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9947)

Included in the following conference series: International Conference on Neural Information Processing (ICONIP)

Abstract

This study proposes a model-free deep inverse reinforcement learning method for finding nonlinear reward function structures. It builds on our previous method, which exploits the fact that, under linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward plus the difference of the value functions, so that the reward and value functions can be estimated by logistic regression. However, that method assumes the reward to be a linear function whose basis functions are prepared in advance. To overcome this limitation, we employ deep neural network frameworks to implement the logistic regression. Simulation results show that our method is comparable to previous model-based methods, at less computing cost, on the Objectworld benchmark. In addition, we show that an optimal policy trained with a shaping reward constructed from the estimated reward and value functions outperforms the policies used to collect the data in the game of Reversi.
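The relation the abstract alludes to can be written out explicitly. Under a linearly solvable Markov decision process, the log-ratio of the optimal state-transition probability π(y|x) to the baseline b(y|x) decomposes into reward and value terms, roughly ln(π(y|x)/b(y|x)) = r(x) + γV(y) − V(x), so a logistic classifier trained to discriminate demonstrated transitions from baseline transitions yields r and V through its log-odds. The sketch below illustrates this idea with deep networks in PyTorch; the architecture, names, and discounting convention are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RewardValueDiscriminator(nn.Module):
    """Logistic-regression IRL head: its log-odds are built from a reward
    network r(x) and a value network V(x), mirroring the LMDP identity
    ln(pi(y|x) / b(y|x)) = r(x) + gamma * V(y) - V(x).
    (Hypothetical layer sizes; not the paper's exact configuration.)"""

    def __init__(self, state_dim, hidden=64, gamma=0.95):
        super().__init__()
        self.gamma = gamma
        # Deep networks replace the hand-crafted linear basis functions
        # required by the earlier linear method.
        self.reward = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))
        self.value = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(), nn.Linear(hidden, 1))

    def logit(self, x, y):
        # Log-odds that a transition (x -> y) came from the optimal policy.
        return (self.reward(x) + self.gamma * self.value(y)
                - self.value(x)).squeeze(-1)


def irl_loss(model, x_opt, y_opt, x_base, y_base):
    """Binary cross-entropy: optimal transitions labeled 1, baseline 0."""
    bce = nn.BCEWithLogitsLoss()
    logit_opt = model.logit(x_opt, y_opt)
    logit_base = model.logit(x_base, y_base)
    return (bce(logit_opt, torch.ones_like(logit_opt))
            + bce(logit_base, torch.zeros_like(logit_base)))
```

After training, the estimated reward and value can define a potential-based shaping reward of the form r̂(x) + γV̂(y) − V̂(x) for training a policy by ordinary reinforcement learning, which is how the Reversi experiment uses them.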



Acknowledgements

This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Author information

Corresponding author

Correspondence to Eiji Uchibe.



Copyright information

© 2016 Springer International Publishing AG

About this paper

Cite this paper

Uchibe, E. (2016). Deep Inverse Reinforcement Learning by Logistic Regression. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol. 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_3

  • DOI: https://doi.org/10.1007/978-3-319-46687-3_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-46686-6

  • Online ISBN: 978-3-319-46687-3

  • eBook Packages: Computer Science; Computer Science (R0)
