Abstract
Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following quantities: state values V, state-action values Q, and the policy \(\pi\). Owing to the high computational cost, the reinforcement learning problem is commonly formulated with task-specific representations built from hand-crafted input features. In this report, we discuss an alternative end-to-end approach in which the RL agent attempts to learn general task representations, in this context, learning how to play the Pong game from a sequence of screen snapshots. We apply artificial neural networks to approximate the policy of a reinforcement learning model. The policy network learns to play the game from a sequence of frames without any extra semantics apart from the pixel information and the score. Many games are simulated using different network architectures and different parameter settings. We examine the activations of the hidden nodes and the weights between the input and hidden layers, before and after the RL agent has successfully learned to play the game. Insights into the internal learning mechanisms and future research directions are discussed.
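The paper itself includes no code; the following is a minimal sketch of how such a pixels-to-policy setup can look, assuming a two-layer feed-forward network trained with a REINFORCE-style policy gradient. The layer sizes, the 80×80 difference-frame preprocessing, and the discount factor are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Minimal sketch of a pixels-to-policy network for Pong, assuming a
# REINFORCE-style policy gradient. All sizes and hyper-parameters below
# are illustrative assumptions, not the settings used in the paper.

D = 80 * 80   # input: one preprocessed (cropped, downsampled) difference frame
H = 200       # number of hidden units (assumption)
GAMMA = 0.99  # reward discount factor (assumption)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)  # input -> hidden weights
W2 = rng.standard_normal(H) / np.sqrt(H)       # hidden -> output weights

def policy_forward(x):
    """Map one flattened frame to P(action = UP); also return hidden layer."""
    h = np.maximum(0.0, W1 @ x)              # ReLU hidden activations
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # sigmoid: probability of UP
    return p, h

def discount_rewards(r):
    """Spread each game point backwards over the frames that led to it."""
    out, running = np.zeros_like(r), 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:          # in Pong a non-zero reward ends a rally
            running = 0.0      # so the running return is reset at each point
        running = running * GAMMA + r[t]
        out[t] = running
    return out
```

At each frame the agent samples UP with probability p and DOWN otherwise; only the pixels and the end-of-rally score enter the computation, matching the end-to-end setting described in the abstract.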
Notes
1. Our policy network is implemented as a feed-forward neural network. It successfully learned to win the game with an average winning margin of five points, e.g., 21:16 (a sketch of a possible policy-gradient update for such a network is given below). There are many other implementations with better scores, such as those demonstrated using asynchronous advantage actor-critic (A3C) and long short-term memory (LSTM) learning algorithms.
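As a companion to the sketch above, the update below shows how a finished episode could adjust such a two-layer feed-forward network by gradient ascent on the expected return. The names and shapes are hypothetical; this is not the paper's exact update rule.

```python
import numpy as np

# Sketch of a REINFORCE update for a two-layer feed-forward policy network
# (hypothetical shapes and names; not the paper's exact update rule).
D, H, LR = 80 * 80, 200, 1e-3
rng = np.random.default_rng(1)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)
W2 = rng.standard_normal(H) / np.sqrt(H)

def policy_gradient_step(xs, hs, dlogits, advantages):
    """Apply one episode's gradient.

    xs:         (T, D) inputs seen during the episode
    hs:         (T, H) hidden ReLU activations from the forward pass
    dlogits:    (T,)   grad of log pi(a_t | x_t) w.r.t. the output logit,
                       i.e. (a_t - p_t) for a sigmoid policy
    advantages: (T,)   discounted, standardised returns
    """
    global W1, W2
    g = dlogits * advantages        # modulate each step by its return
    dW2 = hs.T @ g                  # (H,) gradient for hidden -> output
    dh = np.outer(g, W2)            # (T, H) backprop into the hidden layer
    dh[hs <= 0] = 0.0               # gradient is zero where the ReLU was off
    dW1 = dh.T @ xs                 # (H, D) gradient for input -> hidden
    W2 += LR * dW2                  # gradient ascent on expected reward
    W1 += LR * dW1
```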
Acknowledgments
We wish to thank the anonymous reviewers for their comments, which have helped improve this paper. We would also like to thank the GSR office for the financial support given to this research.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Phon-Amnuaisuk, S. (2017). What Does a Policy Network Learn After Mastering a Pong Game? In: Phon-Amnuaisuk, S., Ang, S.P., Lee, S.Y. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_18
DOI: https://doi.org/10.1007/978-3-319-69456-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69455-9
Online ISBN: 978-3-319-69456-6
eBook Packages: Computer Science (R0)