
Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay

Part of the Lecture Notes in Computer Science book series (LNAI, volume 13033)

Abstract

Hindsight experience replay (HER) is a goal relabelling technique typically used with off-policy deep reinforcement learning algorithms to solve goal-oriented tasks; it is well suited to robotic manipulation tasks that deliver only sparse rewards. In HER, both trajectories and transitions are sampled uniformly for training. However, not all of the agent’s experiences contribute equally to training, and so naive uniform sampling may lead to inefficient learning. In this paper, we propose diversity-based trajectory and goal selection with HER (DTGSH). Firstly, trajectories are sampled according to the diversity of the goal states as modelled by determinantal point processes (DPPs). Secondly, transitions with diverse goal states are selected from the trajectories by using k-DPPs. We evaluate DTGSH on five challenging robotic manipulation tasks in simulated robot environments, where we show that our method can learn more quickly and reach higher performance than other state-of-the-art approaches on all tasks.
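
As a concrete illustration of the two selection steps, below is a minimal sketch of DPP-based diversity scoring and fixed-size diverse goal selection. It is not the authors' implementation: the RBF kernel and its bandwidth, the function names (rbf_kernel, trajectory_diversity, greedy_k_dpp), and the greedy MAP approximation to k-DPP sampling are all illustrative assumptions, and achieved goal states are assumed to be flat NumPy vectors.

# Hypothetical sketch of diversity-based selection with determinantal point
# processes (DPPs), in the spirit of DTGSH; not the authors' code.
import numpy as np

def rbf_kernel(goals, bandwidth=1.0):
    """Similarity matrix L over goal states; L[i, j] is high for similar goals."""
    sq_dists = np.sum((goals[:, None, :] - goals[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def trajectory_diversity(goals, eps=1e-8):
    """Score a trajectory by det(L): large when its goal states are diverse,
    near zero when the goals are redundant (rows of L nearly collinear)."""
    L = rbf_kernel(goals)
    return np.linalg.det(L + eps * np.eye(len(goals)))

def greedy_k_dpp(goals, k):
    """Greedy MAP approximation to a k-DPP: pick k goal indices that
    (approximately) maximise the determinant of the selected submatrix."""
    L = rbf_kernel(goals)
    selected, remaining = [], list(range(len(goals)))
    for _ in range(k):
        best, best_det = None, -np.inf
        for i in remaining:
            idx = selected + [i]
            det = np.linalg.det(L[np.ix_(idx, idx)])
            if det > best_det:
                best, best_det = i, det
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy usage: prefer the more diverse of two trajectories, then pick 3 diverse
# goal states from the chosen one for hindsight relabelling.
rng = np.random.default_rng(0)
spread_out = rng.normal(size=(10, 3))  # goals scattered in space
clustered = np.tile(rng.normal(size=(1, 3)), (10, 1)) + 0.01 * rng.normal(size=(10, 3))
assert trajectory_diversity(spread_out) > trajectory_diversity(clustered)
print(greedy_k_dpp(spread_out, k=3))   # indices of 3 mutually dissimilar goals

In this sketch, a trajectory whose achieved goals are spread out receives a larger determinant score than one whose goals cluster around a single point; sampling trajectories in proportion to such a score, and then selecting a diverse fixed-size subset of goal states for relabelling, mirrors the trajectory- and transition-selection steps described in the abstract.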

Keywords

  • Deep reinforcement learning
  • Determinantal point processes
  • Hindsight experience replay


Notes

  1. https://github.com/openai/baselines.


Acknowledgements

This work was supported by JST, Moonshot R&D Grant Number JPMJMS2012.

Author information


Corresponding author

Correspondence to Tianhong Dai.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

Dai, T., Liu, H., Arulkumaran, K., Ren, G., Bharath, A.A. (2021). Diversity-Based Trajectory and Goal Selection with Hindsight Experience Replay. In: Pham, D.N., Theeramunkong, T., Governatori, G., Liu, F. (eds) PRICAI 2021: Trends in Artificial Intelligence. PRICAI 2021. Lecture Notes in Computer Science, vol 13033. Springer, Cham. https://doi.org/10.1007/978-3-030-89370-5_3


  • DOI: https://doi.org/10.1007/978-3-030-89370-5_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-89369-9

  • Online ISBN: 978-3-030-89370-5

  • eBook Packages: Computer Science (R0)