Abstract
Deep reinforcement learning agents have outperformed human players in many games, such as the Atari 2600 games. In more complicated games, previous related works proposed curiosity-driven exploration for learning. Nevertheless, training such agents generally requires substantial computational resources. We design a method to assist our agent in exploring the environment. By utilizing prior learned experience more effectively, we develop a new memory replay mechanism consisting of two modules: a Trajectory Replay Module, which records the agent's movement trajectory using much less space, and a Trajectory Optimization Module, which formulates the recorded state information as a reward. We evaluate our approach on two popular side-scrolling video games: Super Mario Bros and Sonic the Hedgehog. The experimental results show that our method helps the agent explore the environment efficiently, pass through various tough scenarios, and successfully reach the goal in most of the tested game levels with merely four workers and ordinary CPU computational resources for training. Demo videos are available for Super Mario Bros and Sonic the Hedgehog.
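The two modules described above can be illustrated with a minimal sketch. This is a hypothetical interpretation, not the paper's actual implementation: the class names, the choice to record only (x, y) coordinates, and the progress-based reward formula are all assumptions made for illustration.

```python
class TrajectoryReplayModule:
    """Records coarse agent positions instead of full game frames,
    so a stored trajectory occupies far less space than raw state replay."""

    def __init__(self):
        self.trajectory = []  # list of (x, y) positions, in visit order

    def record(self, x, y):
        self.trajectory.append((x, y))


class TrajectoryOptimizationModule:
    """Formulates recorded position information as an auxiliary reward.
    Here the reward is progress beyond the furthest x-coordinate reached
    so far -- one plausible shaping for a side-scrolling level."""

    def __init__(self):
        self.best_x = 0

    def reward(self, x):
        bonus = max(0, x - self.best_x)  # reward only for new progress
        self.best_x = max(self.best_x, x)
        return bonus
```

In this sketch, backtracking yields zero bonus, so the shaped reward pushes the agent rightward through the level without penalizing the detours that tough scenarios sometimes require.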
Chiang, IH., Huang, CM., Cheng, NH. et al. Efficient Exploration in Side-Scrolling Video Games with Trajectory Replay. Comput Game J 9, 263–280 (2020). https://doi.org/10.1007/s40869-019-00089-x