Abstract
Sparse rewards are a difficult problem in reinforcement learning. Reward shaping is commonly used to address them in specific tasks, but it often requires prior knowledge and manually designed rewards, which are costly in many cases. Hindsight experience replay (HER) addresses sparse rewards in multi-goal scenarios by replacing the goal of a failed trajectory with a virtual goal. Our method integrates the ideas of reward shaping and HER, which has two advantages: first, it performs reward shaping automatically, without manually designed reward functions; second, it addresses the problem arising from the use of virtual goals in HER. Experimental results show that our method significantly improves performance in both the Bit-Flipping environment and the MuJoCo environment.
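The goal replacement that HER performs can be illustrated with a minimal sketch on a bit-flipping task. This shows only the standard HER relabeling idea, not the method proposed in this paper. The names `Transition`, `rollout`, and `relabel_with_final_state` are hypothetical, and both the reward convention (-1 for every step that misses the goal, 0 on reaching it) and the choice of the final achieved state as the virtual goal are assumptions made for illustration.

```python
from typing import List, NamedTuple, Tuple

State = Tuple[int, ...]


class Transition(NamedTuple):
    state: State
    action: int        # index of the bit to flip
    next_state: State
    goal: State
    reward: float


def sparse_reward(next_state: State, goal: State) -> float:
    # Assumed sparse reward: 0 when the goal is reached, -1 otherwise.
    return 0.0 if next_state == goal else -1.0


def flip(state: State, action: int) -> State:
    # Bit-flipping dynamics: the action flips exactly one bit.
    bits = list(state)
    bits[action] = 1 - bits[action]
    return tuple(bits)


def rollout(start: State, actions: List[int], goal: State) -> List[Transition]:
    # Execute a fixed action sequence and record goal-conditioned transitions.
    transitions = []
    state = start
    for action in actions:
        next_state = flip(state, action)
        reward = sparse_reward(next_state, goal)
        transitions.append(Transition(state, action, next_state, goal, reward))
        state = next_state
    return transitions


def relabel_with_final_state(trajectory: List[Transition]) -> List[Transition]:
    # Replace the original goal with a virtual goal: the state the
    # trajectory actually ended in. Rewards are recomputed for that goal.
    virtual_goal = trajectory[-1].next_state
    return [
        t._replace(goal=virtual_goal,
                   reward=sparse_reward(t.next_state, virtual_goal))
        for t in trajectory
    ]


# A failed trajectory: two flips cannot reach the goal (1, 1, 1).
failed = rollout(start=(0, 0, 0), actions=[0, 1], goal=(1, 1, 1))
relabeled = relabel_with_final_state(failed)
```

In this sketch every reward in `failed` is -1, so the trajectory carries no learning signal about its original goal. After relabeling, the last transition reaches the virtual goal and receives reward 0, which is what makes the failed trajectory useful for training.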
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Shao, M., Jiang, F., Liu, S., Han, K., Zhao, D. (2023). Hindsight Balanced Reward Shaping. In: Tanveer, M., Agarwal, S., Ozawa, S., Ekbal, A., Jatowt, A. (eds) Neural Information Processing. ICONIP 2022. Communications in Computer and Information Science, vol 1792. Springer, Singapore. https://doi.org/10.1007/978-981-99-1642-9_42
DOI: https://doi.org/10.1007/978-981-99-1642-9_42
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-1641-2
Online ISBN: 978-981-99-1642-9
eBook Packages: Computer Science (R0)