What Does a Policy Network Learn After Mastering a Pong Game?

  • Conference paper
  • First Online:
Multi-disciplinary Trends in Artificial Intelligence (MIWAI 2017)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 10607)


Abstract

Activities in reinforcement learning (RL) revolve around learning the Markov decision process (MDP) model, in particular the following quantities: state values V, state-action values Q, and the policy \(\pi \). Owing to the high computational cost, the reinforcement learning problem is commonly formulated for learning task-specific representations with hand-crafted input features. In this report, we discuss an alternative end-to-end approach in which the RL agent attempts to learn general task representations, in this context, learning how to play the Pong game from a sequence of screen snapshots. We apply artificial neural networks to approximate the policy of a reinforcement learning model. The policy network learns to play the game from a sequence of frames without any extra semantics apart from the pixel information and the score. Many games are simulated using different network architectures and different parameter settings. We examine the activations of the hidden nodes and the weights between the input and hidden layers, before and after the RL agent has successfully learned to play the game. Insights into the internal learning mechanisms and future research directions are discussed.
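
As a rough illustration of the setup the abstract describes (a policy network mapping raw Pong frames to actions, trained only from pixels and the score), the following is a minimal sketch of a single-hidden-layer policy network trained with a REINFORCE-style policy gradient. The preprocessing steps, layer sizes, variable names, and hyperparameters are illustrative assumptions, not the paper's actual implementation.

import numpy as np

# Illustrative setup (not the paper's code): 80x80 preprocessed frame
# differences as input, one hidden layer, a single sigmoid output giving
# the probability of moving the paddle UP.
D, H = 80 * 80, 200
rng = np.random.default_rng(0)
model = {
    "W1": rng.standard_normal((H, D)) / np.sqrt(D),  # input -> hidden weights
    "W2": rng.standard_normal(H) / np.sqrt(H),       # hidden -> output weights
}

def preprocess(frame, prev):
    """Crop and downsample a raw 210x160x3 Pong frame to a flat 80x80
    binary image, then return the difference from the previous frame so
    that the input encodes motion as well as position."""
    f = frame[35:195]                       # crop the playing field
    f = f[::2, ::2, 0].astype(np.float64)   # downsample by 2, keep one channel
    f[(f == 144) | (f == 109)] = 0          # erase background colours
    f[f != 0] = 1                           # paddles and ball become 1
    f = f.ravel()
    x = f - prev if prev is not None else np.zeros(D)
    return x, f

def policy_forward(x):
    """Return P(action = UP) and the hidden-layer activations."""
    h = np.maximum(0.0, model["W1"] @ x)             # ReLU hidden layer
    p = 1.0 / (1.0 + np.exp(-(model["W2"] @ h)))     # sigmoid output
    return p, h

def discounted_returns(rewards, gamma=0.99):
    """Discounted returns used to weight the policy gradient; the running
    sum is reset at game boundaries (non-zero reward in Pong)."""
    out = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        if rewards[t] != 0:
            running = 0.0
        running = running * gamma + rewards[t]
        out[t] = running
    return out

# During play, an action is sampled from the policy, e.g.
#   p_up, h = policy_forward(x)
#   action = UP if rng.random() < p_up else DOWN
# and after each episode the log-probability gradients are scaled by the
# discounted returns and applied to W1 and W2 (REINFORCE-style update).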

Notes

  1.

    Our policy network is implemented using a feedforward neural network. It could successfully learn to win the game, with an average winning margin of five points, e.g., 21:16. There are many other implementations with better scores, such as those demonstrated using asynchronous advantage actor-critic (A3C) and long short-term memory (LSTM) learning algorithms.
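
The paper's analysis examines the hidden-node activations and the input-to-hidden weights of such a feedforward policy network before and after learning. As a rough illustration of how this kind of inspection can be done, the sketch below reshapes each hidden unit's incoming weight vector back to the input-frame geometry and plots it; the variable names and the 80x80 shape follow the illustrative sketch above, not the paper's actual code.

import numpy as np
import matplotlib.pyplot as plt

def show_input_to_hidden_weights(W1, n_units=16, shape=(80, 80)):
    """Plot the incoming weight vectors of the first n_units hidden units
    as images, so that structure learned by the policy network (e.g.
    traces of ball trajectories and paddle positions) becomes visible."""
    rows = cols = int(np.ceil(np.sqrt(n_units)))
    fig, axes = plt.subplots(rows, cols, figsize=(2 * cols, 2 * rows))
    for i, ax in enumerate(np.ravel(axes)):
        if i < n_units and i < W1.shape[0]:
            ax.imshow(W1[i].reshape(shape), cmap="gray")
            ax.set_title(f"hidden unit {i}", fontsize=8)
        ax.axis("off")
    plt.tight_layout()
    plt.show()

# Example usage with the sketch above:
#   show_input_to_hidden_weights(model["W1"])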


Acknowledgments

We wish to thank the anonymous reviewers for their comments, which have helped improve this paper. We would also like to thank the GSR office for the financial support given to this research.

Author information

Corresponding author

Correspondence to Somnuk Phon-Amnuaisuk.


Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Phon-Amnuaisuk, S. (2017). What Does a Policy Network Learn After Mastering a Pong Game? In: Phon-Amnuaisuk, S., Ang, S.P., Lee, S.Y. (eds) Multi-disciplinary Trends in Artificial Intelligence. MIWAI 2017. Lecture Notes in Computer Science (LNAI), vol 10607. Springer, Cham. https://doi.org/10.1007/978-3-319-69456-6_18

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-69456-6_18

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-69455-9

  • Online ISBN: 978-3-319-69456-6

  • eBook Packages: Computer Science, Computer Science (R0)
