Abstract
The computational complexity of reinforcement learning algorithms increases exponentially with the size of the problem. An effective way to address this is to provide reinforcement learning agents with informationally rich human knowledge so as to expedite the learning process. Various methods have been proposed for combining human reward with agent reward in reinforcement learning. However, the essential distinctions among these combination methods, and their respective advantages and disadvantages, remain unclear. In this paper, we propose an adaptive learning algorithm that selects the most suitable method from a portfolio of combination methods. We show empirically that our algorithm achieves better learning performance under various conditions than approaches that use a single combination method alone. By analyzing different ways of integrating human knowledge into reinforcement learning, our work provides insights into the role and impact of human factors in human-robot collaborative learning.
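The idea of choosing among a portfolio of combination methods can be illustrated with a minimal sketch. The abstract does not specify the selection rule or the combination methods, so everything below is an assumption made for illustration: the three combination functions, the `PortfolioSelector` class, and the epsilon-greedy rule that scores each method by the mean episode return observed while it was in use are all hypothetical and are not the algorithm proposed in the paper.

```python
import random

# Hypothetical combination methods (not taken from the paper): each maps an
# environment reward and a human reward to the single reward the learner sees.
def env_only(r_env, r_human):
    return r_env

def additive(r_env, r_human, weight=1.0):
    return r_env + weight * r_human

def human_only(r_env, r_human):
    return r_human


class PortfolioSelector:
    """Epsilon-greedy choice over a portfolio, scored by mean observed return.

    This selection rule is an assumption for illustration only.
    """

    def __init__(self, methods, epsilon=0.1, seed=0):
        self.methods = dict(methods)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {name: 0 for name in self.methods}
        self.means = {name: 0.0 for name in self.methods}

    def select(self):
        # Try every method once before exploiting.
        untried = [name for name, count in self.counts.items() if count == 0]
        if untried:
            return untried[0]
        # Explore with probability epsilon, otherwise pick the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.methods))
        return max(self.means, key=self.means.get)

    def update(self, name, episode_return):
        # Incremental mean of the returns obtained while using this method.
        self.counts[name] += 1
        self.means[name] += (episode_return - self.means[name]) / self.counts[name]
```

In use, an agent would call `select()` at the start of each episode, learn with the chosen combination function, and then call `update()` with the return achieved in that episode. Scoring by environment return rather than by the combined reward is a design choice of this sketch, made so that methods with different reward scales stay comparable.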
Acknowledgments
This work is supported by the National Natural Science Foundation of China under Grants 61502072, 61572104, and 61403059, the Hong Kong Scholar Program under Grant XJ2017028, and the Dalian High Level Talent Innovation Support Program under Grant 2017RQ008.
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Yu, C. et al. (2018). Adaptively Shaping Reinforcement Learning Agents via Human Reward. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11012. Springer, Cham. https://doi.org/10.1007/978-3-319-97304-3_7
DOI: https://doi.org/10.1007/978-3-319-97304-3_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-97303-6
Online ISBN: 978-3-319-97304-3
eBook Packages: Computer Science, Computer Science (R0)