Abstract
In recent years, deep reinforcement learning (DRL) has accomplished a wide variety of tasks. However, when DRL is applied to tasks in real-world environments, designing an appropriate reward is difficult: reward signals obtained from actual hardware sensors may contain noise, misinterpretations, or failed observations. The learning instability caused by such unreliable signals remains an unsolved problem in DRL. In this work, we propose an approach that extends existing DRL models with a subtask that directly estimates the variance contained in the reward signal. The model feeds the feature map learned through this subtask in the critic network to the actor network, enabling stable learning that is robust to the effects of potential noise. Experiments in the Atari game domain with unstable reward signals show that our method stabilizes training convergence. We also discuss the extensibility of the model by visualizing feature maps. This approach has the potential to make DRL more practical for noisy, real-world scenarios.
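The abstract only sketches the architecture, so the snippet below is a minimal, hypothetical illustration of the general idea (not the authors' implementation): an actor-critic network with an auxiliary head that predicts the observed reward and its log-variance, trained with a heteroscedastic Gaussian negative log-likelihood. All class, function, and parameter names are our own assumptions.

```python
# Hypothetical sketch of a variance-estimating reward subtask attached to an
# actor-critic network; names and network shape are illustrative only.
import torch
import torch.nn as nn


class VarianceSubtaskActorCritic(nn.Module):
    def __init__(self, obs_dim: int, n_actions: int, hidden: int = 128):
        super().__init__()
        # Shared encoder; its feature map serves the critic, the variance
        # subtask, and the actor.
        self.encoder = nn.Sequential(nn.Linear(obs_dim, hidden), nn.ReLU())
        self.policy_head = nn.Linear(hidden, n_actions)  # actor: action logits
        self.value_head = nn.Linear(hidden, 1)           # critic: V(s)
        # Subtask: predict the observed reward and its log-variance.
        self.reward_mu = nn.Linear(hidden, 1)
        self.reward_logvar = nn.Linear(hidden, 1)

    def forward(self, obs: torch.Tensor):
        h = self.encoder(obs)
        return (self.policy_head(h), self.value_head(h),
                self.reward_mu(h), self.reward_logvar(h))


def reward_uncertainty_loss(mu, logvar, reward):
    # Heteroscedastic Gaussian negative log-likelihood: the network can
    # discount noisy reward samples by predicting a larger variance for them.
    return (0.5 * torch.exp(-logvar) * (reward - mu) ** 2
            + 0.5 * logvar).mean()
```

In a full training loop this auxiliary loss would be added to the usual policy and value losses; the predicted variance could also be used to down-weight errors on transitions whose rewards look unreliable.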
Acknowledgment
This work was supported by JST, ACT-X Grant Number JPMJAX190I, Japan.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Suzuki, K., Ogata, T. (2020). Stable Deep Reinforcement Learning Method by Predicting Uncertainty in Rewards as a Subtask. In: Yang, H., Pasupa, K., Leung, A.C.S., Kwok, J.T., Chan, J.H., King, I. (eds.) Neural Information Processing. ICONIP 2020. Lecture Notes in Computer Science, vol. 12533. Springer, Cham. https://doi.org/10.1007/978-3-030-63833-7_55
DOI: https://doi.org/10.1007/978-3-030-63833-7_55
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-63832-0
Online ISBN: 978-3-030-63833-7