Abstract
This paper explores a new approach to parallelizing Reinforcement Learning algorithms: simulating environments on the GPU. We build on the recently proposed CUDA Learning Environment (CuLE) framework. To demonstrate the viability of this approach, we ran experiments with the two main classes of Reinforcement Learning algorithms: value-based (Deep Q-Network) and policy-based (Proximal Policy Optimization). Our results validate the use of the GPU for environment emulation in Reinforcement Learning and provide insight into the convergence properties and performance of these algorithms.
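The core idea behind GPU-side environment simulation is that thousands of independent environment instances can be advanced with a single batched array operation instead of one Python loop iteration per environment. The toy sketch below illustrates this batched stepping pattern in NumPy; the class name, environment dynamics, and interface are invented for this example and do not correspond to CuLE's actual API.

```python
import numpy as np


class BatchedToyEnv:
    """A batch of N independent 1-D random-walk environments.

    All N environments advance in a single vectorized step() call,
    mirroring how GPU emulators step many environments at once.
    Illustrative only; not CuLE's real interface.
    """

    def __init__(self, num_envs: int, goal: float = 5.0):
        self.num_envs = num_envs
        self.goal = goal
        self.state = np.zeros(num_envs)

    def reset(self) -> np.ndarray:
        self.state = np.zeros(self.num_envs)
        return self.state.copy()

    def step(self, actions: np.ndarray):
        # actions in {0, 1} map to moves of -1 or +1;
        # one array operation updates every environment in the batch.
        self.state += np.where(actions == 1, 1.0, -1.0)
        done = np.abs(self.state) >= self.goal
        reward = np.where(done & (self.state >= self.goal), 1.0, 0.0)
        # auto-reset finished environments, as batched emulators typically do
        self.state[done] = 0.0
        return self.state.copy(), reward, done
```

On a GPU the same pattern applies with device-resident arrays, so the agent's network and the environment batch exchange data without host-device transfers on every step.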
References
Babaeizadeh, M., Frosio, I., Tyree, S., Clemons, J., Kautz, J.: GA3C: GPU-based A3C for deep reinforcement learning. arXiv preprint arXiv:1611.06256 (2016)
Badia, A.P., et al.: Agent57: outperforming the Atari human benchmark. arXiv preprint arXiv:2003.13350 (2020)
Cho, H., Oh, P., Park, J., Jung, W., Lee, J.: FA3C: FPGA-accelerated deep reinforcement learning. In: Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2019), pp. 499–513. Association for Computing Machinery, Providence, RI, USA (2019). https://doi.org/10.1145/3297858.3304058
Clemente, A.V., Castejón, H.N., Chandra, A.: Efficient parallel methods for deep reinforcement learning. arXiv preprint arXiv:1705.04862 (2017)
Dalton, S., Frosio, I., Garland, M.: GPU-accelerated Atari emulation for reinforcement learning. arXiv preprint arXiv:1907.08467 (2019)
Espeholt, L., et al.: IMPALA: scalable distributed deep-RL with importance weighted actor-learner architectures. arXiv preprint arXiv:1802.01561 (2018)
Fortunato, M., et al.: Noisy networks for exploration. arXiv preprint arXiv:1706.10295 (2017)
van Hasselt, H., Guez, A., Silver, D.: Deep reinforcement learning with double Q-learning. arXiv preprint arXiv:1509.06461 (2015)
Hernandez-Garcia, J.F., Sutton, R.S.: Understanding multi-step deep reinforcement learning: a systematic study of the DQN target. arXiv preprint arXiv:1901.07510 (2019)
Hessel, M., et al.: Rainbow: combining improvements in deep reinforcement learning. arXiv preprint arXiv:1710.02298 (2017)
Mnih, V., et al.: Asynchronous methods for deep reinforcement learning. arXiv preprint arXiv:1602.01783 (2016)
Nair, A., et al.: Massively parallel methods for deep reinforcement learning. arXiv preprint arXiv:1507.04296 (2015)
Schaul, T., Quan, J., Antonoglou, I., Silver, D.: Prioritized experience replay. arXiv preprint arXiv:1511.05952 (2016)
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. arXiv preprint arXiv:1502.05477 (2017)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347 (2017)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction, 2nd edn. Adaptive Computation and Machine Learning Series. The MIT Press, Cambridge, MA (2018)
Wang, Z., Schaul, T., Hessel, M., van Hasselt, H., Lanctot, M., de Freitas, N.: Dueling network architectures for deep reinforcement learning. arXiv preprint arXiv:1511.06581 (2016)
Acknowledgements
We thank the authors of CuLE for making their work freely available, which greatly facilitated our research.
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Kopel, M., Szczurek, W. (2021). Parallelization of Reinforcement Learning Algorithms for Video Games. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science(), vol 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_16
Print ISBN: 978-3-030-73279-0
Online ISBN: 978-3-030-73280-6