Adaptively Shaping Reinforcement Learning Agents via Human Reward

  • Conference paper
PRICAI 2018: Trends in Artificial Intelligence (PRICAI 2018)

Part of the book series: Lecture Notes in Computer Science ((LNAI,volume 11012))

Abstract

The computational complexity of reinforcement learning algorithms increases exponentially with the size of the problem. An effective way to mitigate this is to provide reinforcement learning agents with informationally rich human knowledge, so as to expedite the learning process. Various integration methods have been proposed to combine human reward with agent reward in reinforcement learning. However, the essential distinctions among these combination methods, and their respective advantages and disadvantages, are still unclear. In this paper, we propose an adaptive learning algorithm that selects the most suitable method from a portfolio of combination methods. We show empirically that our algorithm achieves better learning performance under various conditions than approaches that use a single combination method alone. By analyzing different ways of integrating human knowledge into reinforcement learning, our work provides insights into the role and impact of human factors in human-robot collaborative learning.
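The abstract does not say how the portfolio selection is carried out, so the following is only a minimal sketch of the general idea under stated assumptions. The three combination methods (`env_only`, `additive_shaping`, `human_only`), the `PortfolioSelector` class, and the epsilon-greedy rule scored by mean episode return are all hypothetical choices made for illustration and are not taken from the paper.

```python
import random
from typing import Dict, List

# Hypothetical combination methods (assumptions, not the paper's portfolio).
# Each maps an environment reward and a human reward to the scalar signal
# that the learner would be trained on.
def env_only(r_env: float, r_human: float) -> float:
    return r_env

def additive_shaping(r_env: float, r_human: float, weight: float = 1.0) -> float:
    return r_env + weight * r_human

def human_only(r_env: float, r_human: float) -> float:
    return r_human

METHODS = {
    "env_only": env_only,
    "additive_shaping": additive_shaping,
    "human_only": human_only,
}

class PortfolioSelector:
    """Epsilon-greedy choice over method names (an assumed selection rule).

    Each method is scored by the running mean of the episode returns
    observed while that method was in use.
    """

    def __init__(self, names: List[str], epsilon: float = 0.1, seed: int = 0):
        self.names = list(names)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts: Dict[str, int] = {n: 0 for n in self.names}
        self.means: Dict[str, float] = {n: 0.0 for n in self.names}

    def select(self) -> str:
        # Try every method once before comparing scores.
        for name in self.names:
            if self.counts[name] == 0:
                return name
        # Explore with probability epsilon, otherwise pick the best mean.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.names)
        return max(self.names, key=lambda n: self.means[n])

    def update(self, name: str, episode_return: float) -> None:
        # Incremental running-mean update for the method that was used.
        self.counts[name] += 1
        self.means[name] += (episode_return - self.means[name]) / self.counts[name]
```

In a training loop under these assumptions, one would call `select()` at the start of each episode, train the learner on `METHODS[name](r_env, r_human)` at every step, and call `update(name, episode_return)` with the environment return at the end of the episode. Scoring by environment return rather than the combined signal keeps the methods comparable, though this too is a design choice made here and not one stated in the source.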

Acknowledgments

This work is supported by the National Natural Science Foundation of China under Grants 61502072, 61572104, and 61403059, the Hong Kong Scholar Program under Grant XJ2017028, and the Dalian High Level Talent Innovation Support Program under Grant 2017RQ008.

Author information

Corresponding author

Correspondence to Chao Yu.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Yu, C. et al. (2018). Adaptively Shaping Reinforcement Learning Agents via Human Reward. In: Geng, X., Kang, BH. (eds) PRICAI 2018: Trends in Artificial Intelligence. PRICAI 2018. Lecture Notes in Computer Science, vol 11012. Springer, Cham. https://doi.org/10.1007/978-3-319-97304-3_7

  • DOI: https://doi.org/10.1007/978-3-319-97304-3_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-97303-6

  • Online ISBN: 978-3-319-97304-3

  • eBook Packages: Computer Science, Computer Science (R0)
