Abstract
This study proposes a model-free deep inverse reinforcement learning method that discovers nonlinear reward function structures. It builds on our previous method, which exploits the fact that, under linearly solvable Markov decision processes, the log of the ratio between an optimal state transition and a baseline one is given by a part of the reward plus the difference of the value functions; the reward and value functions are then estimated by logistic regression. However, that method assumes the reward is a linear combination of basis functions prepared in advance. To overcome this limitation, we implement the logistic regression with deep neural networks. Simulation results show that our method is comparable to previous model-based methods on the Objectworld benchmark while requiring less computation. In addition, we show that an optimal policy trained with a shaping reward constructed from the estimated reward and value functions outperforms the policies used to collect the data in the game of Reversi.
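To illustrate the core idea, the sketch below trains a logistic-regression discriminator between sampled "optimal" and baseline state transitions on a toy one-dimensional problem, parameterizing the classifier logit as r(y) + V(y) - V(x) so that fitting the discriminator recovers reward and value estimates. This is a minimal sketch under illustrative assumptions: it substitutes fixed random tanh features for the trained deep networks the paper proposes, and the toy dynamics, feature sizes, and learning rate are not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Toy data with exact additive log-ratio structure ------------------
# Baseline transitions: random walk y = x + 0.3*eps.
# "Optimal" transitions: the baseline reweighted by exp(V(y)) with
# V(y) = -4*y**2, which works out to y = 0.58*x + 0.23*eps. The log
# density ratio is then additive in x and y, matching r(y) + V(y) - V(x).
n = 1000
x = rng.uniform(-1, 1, n)
y_opt = 0.58 * x + 0.23 * rng.standard_normal(n)
y_base = x + 0.30 * rng.standard_normal(n)

X = np.concatenate([x, x])
Y = np.concatenate([y_opt, y_base])
t = np.concatenate([np.ones(n), np.zeros(n)])   # 1 = optimal, 0 = baseline

# --- Fixed random tanh features stand in for a trained deep net --------
A = rng.standard_normal(32) * 2.0
B = rng.uniform(-np.pi, np.pi, 32)

def phi(s):
    return np.tanh(np.outer(s, A) + B)          # (len(s), 32) feature map

# Logit = w_r . phi(y) + w_v . (phi(y) - phi(x)) = r(y) + V(y) - V(x),
# so plain logistic regression on stacked features estimates both parts.
Z = np.hstack([phi(Y), phi(Y) - phi(X)])
theta = np.zeros(Z.shape[1])

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

for _ in range(5000):                           # full-batch gradient descent
    p = sigmoid(Z @ theta)
    theta -= 0.1 * Z.T @ (p - t) / len(t)

acc = np.mean((sigmoid(Z @ theta) > 0.5) == (t == 1))
w_r, w_v = theta[:32], theta[32:]
reward = lambda s: phi(np.atleast_1d(s)) @ w_r  # estimated reward r(s)
value = lambda s: phi(np.atleast_1d(s)) @ w_v   # estimated value V(s)
print(f"discriminator accuracy: {acc:.2f}")
```

The discriminator should do better than chance because the two transition distributions genuinely differ; the recovered `reward` and `value` functions are only determined up to the shifts left free by the density-ratio formulation.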
Acknowledgements
This paper is based on results obtained from a project commissioned by the New Energy and Industrial Technology Development Organization (NEDO).
Copyright information
© 2016 Springer International Publishing AG
Cite this paper
Uchibe, E. (2016). Deep Inverse Reinforcement Learning by Logistic Regression. In: Hirose, A., Ozawa, S., Doya, K., Ikeda, K., Lee, M., Liu, D. (eds) Neural Information Processing. ICONIP 2016. Lecture Notes in Computer Science, vol 9947. Springer, Cham. https://doi.org/10.1007/978-3-319-46687-3_3
Print ISBN: 978-3-319-46686-6
Online ISBN: 978-3-319-46687-3