Abstract
In recent years, deep reinforcement learning has developed rapidly, and many deep reinforcement learning models have been applied in simple game environments. However, many applications involve environments far more complex than simple games, so the performance of deep reinforcement learning models needs to be improved in several respects. In this paper, we explore two such aspects: faster training and stronger spatio-temporal representation. For the former, we propose to use depthwise separable Convolutional Neural Networks (CNNs) to accelerate the deep reinforcement learning model. For the latter, we introduce the convolutional long short-term memory network (ConvLSTM) to improve the representation of spatio-temporal features. We evaluate the models on StarCraft II [1], a strategy game that provides a complex environment for reinforcement learning. All of the agents learn game strategies of a certain level, such as ‘siege’ and ‘searching’. The experimental results show that the depthwise separable CNN effectively shortens training time, while the ConvLSTM provides better spatial and temporal feature representation and improves the performance of the agents.
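As background for the claimed speed-up, the parameter saving (and, roughly proportionally, the FLOP saving) of a depthwise separable convolution over a standard convolution can be sketched as follows. This is a minimal illustration in plain Python; the layer sizes are hypothetical and not taken from the paper.

```python
def standard_conv_params(c_in, c_out, k):
    """Weight count of a standard k x k convolution mapping c_in -> c_out channels."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    """One depthwise k x k filter per input channel, then a 1x1 pointwise convolution."""
    return c_in * k * k + c_in * c_out

# Hypothetical layer: 64 -> 128 channels with 3x3 kernels.
std = standard_conv_params(64, 128, 3)        # 73728 weights
sep = depthwise_separable_params(64, 128, 3)  # 8768 weights
print(std, sep, round(std / sep, 1))          # the separable layer is ~8.4x smaller
```

The same factorization underlies MobileNets [38]; fewer weights per layer mean fewer multiply-adds per forward pass, which is what shortens training time in the paper's first contribution.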
References
Vinyals, O., et al.: StarCraft II: A New Challenge for Reinforcement Learning. https://arxiv.org/abs/1708.04782. Accessed 16 Aug 2017
Yu, K., Jia, L., Chen, Y., Xu, W.: Deep learning: yesterday, today, and tomorrow. J. Comput. Res. Develop. 20(6), 1349 (2013)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: International Conference on Neural Information Processing Systems, pp. 1097–1105. Curran Associates Inc. (2012)
Russakovsky, O., Deng, J., Su, H., Krause, J., Satheesh, S.: ImageNet large scale visual recognition challenge. Int. J. Comput. Vis. 115(3), 211–252 (2015)
Karpathy, A., Toderici, G., Shetty, S., Leung, T., Sukthankar, R., Li, F.F.: Large-scale video classification with convolutional neural networks. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1725–1732. IEEE Computer Society (2014)
Cho, K., Merrienboer, B.V., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation. In: Proceedings of Conference on Empirical Methods in Natural Language Processing, Doha, Qatar, pp. 1724–1734 (2014)
Yang, Z., Tao, D.P., Zhang, S.Y., Jin, L.W.: Similar handwritten Chinese character recognition based on deep neural networks with big data. J. Commun. 35(9), 184–189 (2014)
Graves, A., Mohamed, A.R., Hinton, G.: Speech recognition with deep recurrent neural networks. In: IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 6645–6649. IEEE (2013)
Li, Y., Zhang, J., Pan, D., Hu, D.: A study of speech recognition based on RNN-RBM language model. J. Comput. Res. Develop. 51(9), 1936–1944 (2014)
Sun, Z.J., Xue, L., Xu, Y.M., Wang, Z.: Overview of deep learning. Appl. Res. Comput. 29(8), 2806–2810 (2012)
Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
Kober, J., Peters, J.: Reinforcement learning in robotics: a survey. Int. J. Robot. Res. 32(11), 1238–1274 (2013)
Tesauro, G.: TD-Gammon, a self-teaching backgammon program, achieves master-level play. Neural Comput. 6(2), 215–219 (1994)
Kocsis, L., Szepesvári, C.: Bandit based Monte-Carlo planning. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML 2006. LNCS (LNAI), vol. 4212, pp. 282–293. Springer, Heidelberg (2006). https://doi.org/10.1007/11871842_29
Fu, Q.M., Liu, Q., Wang, H., Xiao, F., Yu, J., Li, J.: A novel off policy Q(λ) algorithm based on linear function approximation. Chin. J. Comput. 37(3), 677–686 (2014)
Gao, Y., Zhou, R.Y., Wang, H., Cao, Z.X.: Study on an average reward reinforcement learning algorithm. Chin. J. Comput. 30(8), 1372–1378 (2007)
Wei, Y.Z., Zhao, M.Y.: A reinforcement learning-based approach to dynamic job-shop scheduling. Acta Autom. Sin. 31(5), 765–771 (2005)
Ipek, E., Mutlu, O., Carunana, R.: Self-optimizing memory controllers: a reinforcement learning approach. In: International Symposium on Computer Architecture, pp. 39–50. IEEE (2008)
Mnih, V., et al.: Playing Atari with Deep Reinforcement Learning. https://arxiv.org/abs/1312.5602 (2013)
Mnih, V., et al.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Silver, D., et al.: Mastering the game of Go with deep neural networks and tree search. Nature 529(7587), 484–489 (2016)
Oh, J., Guo, X., Lee, H., Lewis, R., Singh, S.: Action-conditional video prediction using deep networks in Atari games. In: Proceedings of the Neural Information Processing Systems, Montreal, Canada, pp. 2863–2871 (2015)
Caicedo, J.C., Lazebnik, S.: Active object localization with deep reinforcement learning. In: IEEE International Conference on Computer Vision, pp. 2488–2496. IEEE Computer Society (2015)
Lillicrap, T.P.: Continuous control with deep reinforcement learning. Comput. Sci. 8(6), A187 (2016)
Duan, Y., Chen, X., Houthooft, R., Schulman, J., Abbeel, P.: Benchmarking deep reinforcement learning for continuous control. In: Proceedings of the 33rd International Conference on Machine Learning, pp. 1329–1338 (2016)
Gu, S., Lillicrap, T., Sutskever, I., Levine, S.: Continuous deep Q-Learning with model-based acceleration. In: Proceeding of ICML 2016 Proceedings of the 33rd International Conference on International Conference on Machine Learning, vol. 48, pp. 2829–2838 (2016)
Hansen, S.: Using deep Q-learning to control optimization hyperparameters. https://arxiv.org/abs/1602.04062v2. Accessed 19 Jun 2016
Andrychowicz, M.: Learning to learn by gradient descent by gradient descent. In: Proceedings of the Conference on Neural Information Processing Systems, Barcelona, Spain, pp. 3981–3989 (2016)
Mnih, V.: Asynchronous methods for deep reinforcement learning. In: Proceedings of the International Conference on Machine Learning, New York, USA, pp. 1928–1937 (2016)
Schulman, J., Wolski, F., Dhariwal, P., Radford, A., Klimov, O.: Proximal Policy Optimization Algorithms. https://arxiv.org/abs/1707.06347v2. Accessed 28 Aug 2017
Schulman, J., Levine, S., Moritz, P., Jordan, M.I., Abbeel, P.: Trust region policy optimization. In: Proceedings of the 32nd International Conference on Machine Learning, pp. 1889–1897 (2015)
Howard, A.G., et al.: MobileNets: efficient convolutional neural networks for mobile vision applications. https://arxiv.org/abs/1704.04861. Accessed 17 Apr 2017
Shi, X., Chen, Z., Wang, H., Yeung, D.Y., Wong, W., Woo, W.: Convolutional LSTM Network: a machine learning approach for precipitation nowcasting. In: International Conference on Neural Information Processing Systems, pp. 802–810. MIT Press (2015)
Jaderberg, M., et al.: Reinforcement Learning with Unsupervised Auxiliary Tasks. https://arxiv.org/abs/1611.05397. Accessed 16 Nov 2016
Acknowledgements
This work is funded by the Shanghai Undergraduate Student Innovation Project, the National Natural Science Foundation of China (No. 61170155) and the Shanghai Innovation Action Plan Project (No. 16511101200).
Copyright information
© 2018 Springer Nature Switzerland AG
Cite this paper
Li, Y., Fang, Y. (2018). Accelerating Spatio-Temporal Deep Reinforcement Learning Model for Game Strategy. In: Cheng, L., Leung, A., Ozawa, S. (eds.) Neural Information Processing. ICONIP 2018. Lecture Notes in Computer Science, vol. 11303. Springer, Cham. https://doi.org/10.1007/978-3-030-04182-3_27
Print ISBN: 978-3-030-04181-6
Online ISBN: 978-3-030-04182-3