Asynchronous Distributed Proximal Policy Optimization Training Framework Based on GPU

Youlve, Chen; Kaiyun, Bi; Zhaoyang, Liu

doi:10.1007/978-981-16-6372-7_67

Chen Youlve, Bi Kaiyun & Liu Zhaoyang

Part of the book series: Lecture Notes in Electrical Engineering (LNEE, volume 801)


Abstract

We propose a new GPU-based asynchronous DPPO training framework (GAPPO) in which sampling and network updates are assigned to two different threads. The two threads exchange data through a buffer. By coordinating and synchronizing the cycles of the two threads, the training speed can be increased by about 1.5 times on average compared with the original DPPO algorithm. We apply this framework to five Atari games from the Arcade Learning Environment. The results indicate that GAPPO reaches the same score as DPPO in a shorter time and achieves higher scores than GA3C in four of the games.
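The two-thread arrangement described in the abstract can be illustrated with a minimal producer-consumer sketch. This is not the authors' implementation: the function names (`sampler`, `learner`, `run`), the bounded `queue.Queue` used as the buffer, and the placeholder "batches" are all assumptions made for illustration, and no environment, PPO loss, or GPU code is included.

```python
import queue
import threading


def sampler(buffer, n_batches, batch_size):
    """Sampling thread: produce rollout batches and push them into the buffer."""
    for batch_id in range(n_batches):
        # Placeholder for environment interaction (hypothetical data layout).
        batch = [(batch_id, step) for step in range(batch_size)]
        buffer.put(batch)  # blocks when the buffer is full, throttling the sampler
    buffer.put(None)  # sentinel: signals that sampling has finished


def learner(buffer, consumed):
    """Update thread: pull batches from the buffer and 'train' on them."""
    while True:
        batch = buffer.get()  # blocks while the buffer is empty
        if batch is None:
            break
        # Placeholder for the network update step; here we only record batch size.
        consumed.append(len(batch))


def run(n_batches=8, batch_size=4, capacity=2):
    """Run one sampler and one learner that share a bounded buffer."""
    buffer = queue.Queue(maxsize=capacity)
    consumed = []
    threads = [
        threading.Thread(target=sampler, args=(buffer, n_batches, batch_size)),
        threading.Thread(target=learner, args=(buffer, consumed)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return consumed
```

In this sketch the bounded buffer is what keeps the two loops in step: the sampler stalls when the learner falls behind, and the learner waits when no data is ready. How GAPPO actually coordinates the two cycles is described in the full chapter, not here.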



Author information

Authors and Affiliations

Tianjin University, Weijin Rd. 92, Tianjin, 300072, China
Chen Youlve, Bi Kaiyun & Liu Zhaoyang


Editor information

Editors and Affiliations

Tsinghua University, Beijing, China
Zhidong Deng


About this paper

Cite this paper

Youlve, C., Kaiyun, B., Zhaoyang, L. (2022). Asynchronous Distributed Proximal Policy Optimization Training Framework Based on GPU. In: Deng, Z. (ed.) Proceedings of 2021 Chinese Intelligent Automation Conference. Lecture Notes in Electrical Engineering, vol 801. Springer, Singapore. https://doi.org/10.1007/978-981-16-6372-7_67


DOI: https://doi.org/10.1007/978-981-16-6372-7_67
Published: 08 October 2021
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6371-0
Online ISBN: 978-981-16-6372-7
eBook Packages: Intelligent Technologies and Robotics (R0)
