Abstract
This paper explores a new approach to parallelizing Reinforcement Learning algorithms: simulating environments on the GPU. We build on the recently proposed CUDA Learning Environment (CuLE) framework. To demonstrate the viability of this approach, we ran experiments with the two main classes of Reinforcement Learning algorithms: value-based (Deep Q-Network) and policy-based (Proximal Policy Optimization). Our results validate the use of the GPU for environment emulation in Reinforcement Learning and provide insight into the convergence properties and performance of these algorithms.
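The core idea behind GPU-side environment simulation is that thousands of independent environment instances can be advanced with a single batched array operation instead of one Python loop iteration per environment. The toy sketch below illustrates this batched stepping pattern in NumPy; the class name, environment dynamics, and interface are invented for this example and do not correspond to CuLE's actual API.

```python
import numpy as np


class BatchedToyEnv:
    """A batch of N independent 1-D random-walk environments.

    All N environments advance in a single vectorized step() call,
    mirroring how GPU emulators step many environments at once.
    Illustrative only; not CuLE's real interface.
    """

    def __init__(self, num_envs: int, goal: float = 5.0):
        self.num_envs = num_envs
        self.goal = goal
        self.state = np.zeros(num_envs)

    def reset(self) -> np.ndarray:
        self.state = np.zeros(self.num_envs)
        return self.state.copy()

    def step(self, actions: np.ndarray):
        # actions in {0, 1} map to moves of -1 or +1;
        # one array operation updates every environment in the batch.
        self.state += np.where(actions == 1, 1.0, -1.0)
        done = np.abs(self.state) >= self.goal
        reward = np.where(done & (self.state >= self.goal), 1.0, 0.0)
        # auto-reset finished environments, as batched emulators typically do
        self.state[done] = 0.0
        return self.state.copy(), reward, done
```

On a GPU the same pattern applies with device-resident arrays, so the agent's network and the environment batch exchange data without host-device transfers on every step.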
References
Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., Kautz, J.: GA3C: GPU-based A3C for deep reinforcement learning. arXiv preprint arXiv:1611.06256 (2016)
Badia, A.P., et al.: Agent57: outperforming the Atari human benchmark. arXiv preprint arXiv:2003.13350 (2020)
Cho, H., Oh, P., Park, J., Jung, W., Lee, J.: FA3C: FPGA-accelerated deep reinforcement learning. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2019), pp. 499–513. Association for Computing Machinery, Providence, RI, USA (2019). https://doi.org/10.1145/3297858.3304058
Clemente, A.V., Castejón, H.N., Chandra, A.: Efficient parallel methods for deep reinforcement learning. arXiv preprint arXiv:1705.04862 (2017)
Dalton, S., Frosio, I., Garland, M.: GPU-accelerated Atari emulation for reinforcement learning. arXiv preprint arXiv:1907.08467 (2019)
Espeholt, L., et al.: IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561 (2018)
Fortunato, M., et al.: Noisy networks for exploration. arXiv preprint arXiv:1706.10295 (2017)
van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015)
Hernandez-Garcia, J.F., Sutton, R.S.: Understanding multi-step deep reinforcement learning: a systematic study of the DQN target. arXiv preprint arXiv:1901.07510 (2019)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298 (2017)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783 (2016)
Nair, A., et al.: Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296 (2015)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2016)
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. arXiv preprint arXiv:1502.05477 (2017)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, MA (2018)
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2016)
Acknowledgements
We thank the authors of CuLE for making their work freely available, which greatly facilitated our research.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kopel, M., Szczurek, W. (2021). Parallelization of Reinforcement Learning Algorithms for Video Games. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science(), vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_16
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6