Abstract
Hierarchical reinforcement learning composes subpolicies at different levels of a hierarchy to accomplish complex tasks. Automated subpolicy discovery, which does not depend on domain knowledge, is a promising approach to generating subpolicies. However, existing methods can hardly deal with the degradation problem, in which subpolicies collapse into similar behaviors, because they either ignore diversity or rely on weak regularizers. In this paper, we propose a novel task-agnostic regularizer called the Wasserstein Diversity-Enriched Regularizer (WDER), which enlarges the diversity of subpolicies by maximizing the Wasserstein distances among their action distributions. The proposed WDER can be easily incorporated into the loss function of existing methods to further boost their performance. Experimental results demonstrate that the WDER improves performance and sample efficiency over prior work without modifying hyperparameters, which indicates its applicability and robustness.
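The core idea can be sketched in a few lines. The following is a minimal illustration, not the paper's implementation: it assumes each subpolicy outputs a diagonal-Gaussian action distribution, uses the closed-form 2-Wasserstein distance between Gaussians, and sums the pairwise distances into a diversity bonus that would be subtracted from the task loss. All function names here are illustrative.

```python
import numpy as np

def w2_gaussian(mu1, sigma1, mu2, sigma2):
    # Closed-form 2-Wasserstein distance between two diagonal Gaussians:
    # W2^2 = ||mu1 - mu2||^2 + ||sigma1 - sigma2||^2
    return np.sqrt(np.sum((mu1 - mu2) ** 2) + np.sum((sigma1 - sigma2) ** 2))

def diversity_penalty(mus, sigmas):
    # Sum of pairwise W2 distances among the subpolicies' action
    # distributions. Maximizing this term (i.e., subtracting it from the
    # task loss) pushes subpolicies apart and counteracts degradation.
    n = len(mus)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            total += w2_gaussian(mus[i], sigmas[i], mus[j], sigmas[j])
    return total

# Two identical subpolicies contribute zero diversity; distinct ones do not.
mus = [np.array([0.0]), np.array([3.0])]
sigmas = [np.array([1.0]), np.array([1.0])]
bonus = diversity_penalty(mus, sigmas)  # here: W2 = 3.0
```

In the paper the regularizer is task-agnostic, so a weighted version of such a term can be added to the objective of an existing hierarchical method without retuning its hyperparameters.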
Acknowledgements
This work was supported in part by the National Key Research and Development Program of China under Grant 2020AAA0103405, the National Natural Science Foundation of China under Grants 72293573 and 72293575, as well as the Strategic Priority Research Program of Chinese Academy of Sciences under Grant XDA27030100.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Li, H., Liang, J., Li, L., Zeng, D. (2024). Wasserstein Diversity-Enriched Regularizer for Hierarchical Reinforcement Learning. In: Luo, B., Cheng, L., Wu, ZG., Li, H., Li, C. (eds) Neural Information Processing. ICONIP 2023. Lecture Notes in Computer Science, vol 14447. Springer, Singapore. https://doi.org/10.1007/978-981-99-8079-6_44
Print ISBN: 978-981-99-8078-9
Online ISBN: 978-981-99-8079-6
eBook Packages: Computer Science (R0)