Adaptively Shaping Reinforcement Learning Agents via Human Reward

Yu, Chao; Wang, Dongxu; Yang, Tianpei; Zhu, Wenxuan; Li, Yuchen; Ge, Hongwei; Ren, Jiankang

doi:10.1007/978-3-319-97304-3_7


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11012)

Included in the following conference series:

Pacific Rim International Conference on Artificial Intelligence

Abstract

The computational complexity of reinforcement learning algorithms increases exponentially with the size of the problem. An effective way to address this is to provide reinforcement learning agents with informationally rich human knowledge, so as to expedite the learning process. Various integration methods have been proposed to combine human reward with agent reward in reinforcement learning. However, the essential distinctions among these combination methods, and their respective advantages and disadvantages, remain unclear. In this paper, we propose an adaptive learning algorithm that selects the most suitable method from a portfolio of combination methods. We show empirically that our algorithm achieves better learning performance under various conditions than approaches that use a single combination method alone. By analyzing different ways of integrating human knowledge into reinforcement learning, our work provides insights into the role and impact of human factors in human-robot collaborative learning.

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants 61502072, 61572104, and 61403059, the Hong Kong Scholar Program under Grant XJ2017028, and the Dalian High Level Talent Innovation Support Program under Grant 2017RQ008.

Author information

Authors and Affiliations

School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024, Liaoning, China
Chao Yu, Dongxu Wang, Tianpei Yang, Wenxuan Zhu, Yuchen Li, Hongwei Ge & Jiankang Ren

Corresponding author

Correspondence to Chao Yu.

Editor information

Editors and Affiliations

Southeast University, Nanjing, China
Xin Geng
University of Tasmania, Hobart, Tasmania, Australia
Byeong-Ho Kang

Cite this paper

Yu, C. et al. (2018). Adaptively Shaping Reinforcement Learning Agents via Human Reward. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11012. Springer, Cham. https://doi.org/10.1007/978-3-319-97304-3_7

DOI: https://doi.org/10.1007/978-3-319-97304-3_7
Published: 27 July 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97303-6
Online ISBN: 978-3-319-97304-3
eBook Packages: Computer Science, Computer Science (R0)
