Abstract
Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following quantities: state values V, state-action values Q, and the policy \(\pi\). Owing to the high computational cost, the reinforcement learning problem is commonly formulated with task-specific representations built from hand-crafted input features. In this report, we discuss an alternative end-to-end approach in which the RL agent attempts to learn general task representations, in this context, learning how to play the Pong game from a sequence of screen snapshots. We apply artificial neural networks to approximate the policy of a reinforcement learning model. The policy network learns to play the game from a sequence of frames without any extra semantics apart from the pixel information and the score. Many games are simulated using different network architectures and different parameter settings. We examine the activations of the hidden nodes and the weights between the input and hidden layers, before and after the RL agent has successfully learned to play the game. Insights into the internal learning mechanisms and future research directions are discussed.
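The paper itself includes no code; the following is a minimal sketch of how such a pixels-to-policy setup can look, assuming a two-layer feed-forward network trained with a REINFORCE-style policy gradient. The layer sizes, the 80×80 difference-frame preprocessing, and the discount factor are illustrative assumptions, not the authors' settings.

```python
import numpy as np

# Minimal sketch of a pixels-to-policy network for Pong, assuming a
# REINFORCE-style policy gradient. All sizes and hyper-parameters below
# are illustrative assumptions, not the settings used in the paper.

D = 80 * 80   # input: one preprocessed (cropped, downsampled) difference frame
H = 200       # number of hidden units (assumption)
GAMMA = 0.99  # reward discount factor (assumption)

rng = np.random.default_rng(0)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)  # input -> hidden weights
W2 = rng.standard_normal(H) / np.sqrt(H)       # hidden -> output weights

def policy_forward(x):
    """Map one flattened frame to P(action = UP); also return hidden layer."""
    h = np.maximum(0.0, W1 @ x)              # ReLU hidden activations
    p = 1.0 / (1.0 + np.exp(-(W2 @ h)))      # sigmoid: probability of UP
    return p, h

def discount_rewards(r):
    """Spread each game point backwards over the frames that led to it."""
    out, running = np.zeros_like(r), 0.0
    for t in reversed(range(len(r))):
        if r[t] != 0:          # in Pong a non-zero reward ends a rally
            running = 0.0      # so the running return is reset at each point
        running = running * GAMMA + r[t]
        out[t] = running
    return out
```

At each frame the agent samples UP with probability p and DOWN otherwise; only the pixels and the end-of-rally score enter the computation, matching the end-to-end setting described in the abstract.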
Notes
1. Our policy network is implemented as a feed-forward neural network. It successfully learned to win the game with an average winning margin of five points, e.g., 21:16 (a sketch of a possible policy-gradient update for such a network is given below). There are many other implementations with better scores, such as those demonstrated using asynchronous advantage actor-critic (A3C) and long short-term memory (LSTM) learning algorithms.
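As a companion to the sketch above, the update below shows how a finished episode could adjust such a two-layer feed-forward network by gradient ascent on the expected return. The names and shapes are hypothetical; this is not the paper's exact update rule.

```python
import numpy as np

# Sketch of a REINFORCE update for a two-layer feed-forward policy network
# (hypothetical shapes and names; not the paper's exact update rule).
D, H, LR = 80 * 80, 200, 1e-3
rng = np.random.default_rng(1)
W1 = rng.standard_normal((H, D)) / np.sqrt(D)
W2 = rng.standard_normal(H) / np.sqrt(H)

def policy_gradient_step(xs, hs, dlogits, advantages):
    """Apply one episode's gradient.

    xs:         (T, D) inputs seen during the episode
    hs:         (T, H) hidden ReLU activations from the forward pass
    dlogits:    (T,)   grad of log pi(a_t | x_t) w.r.t. the output logit,
                       i.e. (a_t - p_t) for a sigmoid policy
    advantages: (T,)   discounted, standardised returns
    """
    global W1, W2
    g = dlogits * advantages        # modulate each step by its return
    dW2 = hs.T @ g                  # (H,) gradient for hidden -> output
    dh = np.outer(g, W2)            # (T, H) backprop into the hidden layer
    dh[hs <= 0] = 0.0               # gradient is zero where the ReLU was off
    dW1 = dh.T @ xs                 # (H, D) gradient for input -> hidden
    W2 += LR * dW2                  # gradient ascent on expected reward
    W1 += LR * dW1
```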
Acknowledgments
We wish to thank the anonymous reviewers for their comments, which have helped improve this paper. We would also like to thank the GSR office for the financial support given to this research.
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Phon-Amnuaisuk, S. (2017). What Does a Policy Network Learn After Mastering a Pong Game? In: Phon-Amnuaisuk, S., Ang, S.P., Lee, S.Y. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science, vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_18
DOI: https://doi.org/10.1007/978-3-319-69456-6_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-69455-9
Online ISBN: 978-3-319-69456-6
eBook Packages: Computer Science (R0)