Abstract
This study proposes a model-free deep inverse reinforcement learning method that discovers nonlinear reward function structures. It builds on our previous method, which exploits the fact that, under linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward plus the difference of the value functions; the reward and value functions are then estimated by logistic regression. However, that method assumes the reward is a linear combination of basis functions prepared in advance. To overcome this limitation, we implement the logistic regression with deep neural networks. Simulation results show that our method is comparable to previous model-based methods on the Objectworld benchmark while requiring less computation. In addition, we show that an optimal policy trained with a shaping reward constructed from the estimated reward and value functions outperforms the policies used to collect the data in the game of Reversi.
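To illustrate the core idea, the sketch below trains a logistic-regression discriminator between sampled "optimal" and baseline state transitions on a toy one-dimensional problem, parameterizing the classifier logit as r(y) + V(y) - V(x) so that fitting the discriminator recovers reward and value estimates. This is a minimal sketch under illustrative assumptions: it substitutes fixed random tanh features for the trained deep networks the paper proposes, and the toy dynamics, feature sizes, and learning rate are not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy data with exact additive log-ratio structure ------------------
# Baseline transitions: random walk y = x + 0.3*eps.
# "Optimal" transitions: the baseline reweighted by exp(V(y)) with
# V(y) = -4*y**2, which works out to y = 0.58*x + 0.23*eps. The log
# density ratio is then additive in x and y, matching r(y) + V(y) - V(x).
n = 1000
x = rng.uniform(-1, 1, n)
y_opt = 0.58 * x + 0.23 * rng.standard_normal(n)
y_base = x + 0.30 * rng.standard_normal(n)

X = np.concatenate([x, x])
Y = np.concatenate([y_opt, y_base])
t = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = optimal, 0 = baseline

# --- Fixed random tanh features stand in for a trained deep net --------
A = rng.standard_normal(32) * 2.0
B = rng.uniform(-np.pi, np.pi, 32)

def phi(s):
    return np.tanh(np.outer(s, A) + B)          # (len(s), 32) feature map

# Logit = w_r . phi(y) + w_v . (phi(y) - phi(x)) = r(y) + V(y) - V(x),
# so plain logistic regression on stacked features estimates both parts.
Z = np.hstack([phi(Y), phi(Y) - phi(X)])
theta = np.zeros(Z.shape[1])

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for _ in range(5000):                           # full-batch gradient descent
    p = sigmoid(Z @ theta)
    theta -= 0.1 * Z.T @ (p - t) / len(t)

acc = np.mean((sigmoid(Z @ theta) > 0.5) == (t == 1))
w_r, w_v = theta[:32], theta[32:]
reward = lambda s: phi(np.atleast_1d(s)) @ w_r  # estimated reward r(s)
value = lambda s: phi(np.atleast_1d(s)) @ w_v   # estimated value V(s)
print(f"discriminator accuracy: {acc:.2f}")
```

The discriminator should do better than chance because the two transition distributions genuinely differ; the recovered `reward` and `value` functions are only determined up to the shifts left free by the density-ratio formulation.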
Acknowledgements
This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Uchibe, E. (2016). Deep Inverse Reinforcement Learning by Logistic Regression. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_3
Print ISBN: 978-3-319-46686-6
Online ISBN: 978-3-319-46687-3