Parallelization of Reinforcement Learning Algorithms for Video Games

  • Conference paper
  • In: Intelligent Information and Database Systems (ACIIDS 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12672)

Abstract

This paper explores a new approach to parallelizing Reinforcement Learning algorithms: simulating the environments themselves on the GPU. We build on the recently proposed CUDA Learning Environment (CuLE) framework. To demonstrate the approach's viability, we ran experiments with the two main classes of Reinforcement Learning algorithms: value-based (Deep Q-Network) and policy-based (Proximal Policy Optimization). Our results validate the use of the GPU for environment emulation in Reinforcement Learning and give insight into the convergence properties and performance of these algorithms.
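The core idea, keeping many environment instances in GPU tensors so that one batched operation advances all of them at once, can be illustrated with a minimal sketch. The toy environment, its dynamics, and all names below are illustrative assumptions, not CuLE's actual API; only the batched-stepping pattern reflects the approach described in the abstract.

    import torch

    class ToyVecEnv:
        """Hypothetical batched environment: every instance is one tensor row,
        so stepping all of them is a single fused GPU operation, not a Python
        loop. A stand-in for illustration only, not CuLE's interface."""

        def __init__(self, num_envs: int):
            self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
            self.pos = torch.zeros(num_envs, device=self.device)

        def reset(self) -> torch.Tensor:
            self.pos.zero_()
            return self.pos.unsqueeze(1)  # observations, shape (num_envs, 1)

        def step(self, actions: torch.Tensor):
            # Toy dynamics: action 1 moves right, action 0 moves left; an
            # episode ends at |pos| >= 10, with reward 1 for reaching +10.
            self.pos += torch.where(actions == 1,
                                    torch.ones_like(self.pos),
                                    -torch.ones_like(self.pos))
            done = self.pos.abs() >= 10.0
            reward = (self.pos >= 10.0).float()
            # Reset finished episodes in place, again as one batched op.
            self.pos = torch.where(done, torch.zeros_like(self.pos), self.pos)
            return self.pos.unsqueeze(1), reward, done

    envs = ToyVecEnv(num_envs=4096)                 # thousands of envs at once
    obs = envs.reset()
    policy = torch.nn.Linear(1, 2).to(envs.device)  # stand-in for a DQN/PPO net
    for _ in range(100):
        with torch.no_grad():
            actions = policy(obs).argmax(dim=1)     # greedy actions, whole batch
        obs, reward, done = envs.step(actions)      # no per-env CPU round-trip

The point of the pattern is that the number of simulated environments scales with GPU memory and cores rather than with CPU processes, which is what makes the very large experience batches used by algorithms such as DQN and PPO practical.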

Acknowledgements

We would like to thank the authors of CuLE for making their work freely available, which greatly helped our research.

Author information

Corresponding author

Correspondence to Marek Kopel.

Appendices

See Tables 1, 2 and 3.

A Hyperparameters

Table 1. PPO hyperparameters
Table 2. DQN hyperparameters

B System specification

Table 3. Hardware and software specification

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Kopel, M., Szczurek, W. (2021). Parallelization of Reinforcement Learning Algorithms for Video Games. In: Nguyen, N.T., Chittayasothorn, S., Niyato, D., Trawiński, B. (eds) Intelligent Information and Database Systems. ACIIDS 2021. Lecture Notes in Computer Science, vol. 12672. Springer, Cham. https://doi.org/10.1007/978-3-030-73280-6_16

  • DOI: https://doi.org/10.1007/978-3-030-73280-6_16

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-73279-0

  • Online ISBN: 978-3-030-73280-6

  • eBook Packages: Computer Science, Computer Science (R0)
