Abstract
Hierarchical reinforcement learning composes subpolicies at different levels of a hierarchy to accomplish complex tasks. Automated subpolicy discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, existing methods can hardly deal with the degradation problem, in which subpolicies collapse into similar behaviors, because they either ignore diversity or rely on weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among their action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to further boost their performance. Experimental results demonstrate that the WDER improves performance and sample efficiency over prior work without modifying hyperparameters, which indicates its applicability and robustness.
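The core idea can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes each subpolicy outputs a diagonal-Gaussian action distribution, uses the closed-form 2-Wasserstein distance between Gaussians, and sums the pairwise distances into a diversity bonus that would be subtracted from the task loss. All function names here are illustrative.

```python
import numpy as np

def w2_gaussian(mu1, sigma1, mu2, sigma2):
    # Closed-form 2-Wasserstein distance between two diagonal Gaussians:
    # W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

def diversity_penalty(mus, sigmas):
    # Sum of pairwise W2 distances among the subpolicies' action
    # distributions. Maximizing this term (i.e., subtracting it from the
    # task loss) pushes subpolicies apart and counteracts degradation.
    n = len(mus)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += w2_gaussian(mus[i], sigmas[i], mus[j], sigmas[j])
    return total

# Two identical subpolicies contribute zero diversity; distinct ones do not.
mus = [np.array([0.0]), np.array([3.0])]
sigmas = [np.array([1.0]), np.array([1.0])]
bonus = diversity_penalty(mus, sigmas)  # here: W2 = 3.0
```

In the paper the regularizer is task-agnostic, so a weighted version of such a term can be added to the objective of an existing hierarchical method without retuning its hyperparameters.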
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0103405, the National Natural Science Foundation of China under Grants 72293573 and 72293575, as well as the Strategic Priority Research Program of Chinese Academy of Sciences under Grant XDA27030100.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, H., Liang, J., Li, L., Zeng, D. (2024). Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_44
Print ISBN: 978-981-99-8078-9
Online ISBN: 978-981-99-8079-6
eBook Packages: Computer Science (R0)