Multi-Dimensional Deep Memory Atari-Go Players for Parameter Exploring Policy Gradients

  • Mandy Grüttner
  • Frank Sehnke
  • Tom Schaul
  • Jürgen Schmidhuber
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6353)


Developing superior artificial board-game players is a widely studied area of Artificial Intelligence. Among the most challenging games is the Asian game of Go, which, despite its deceptively simple rules, has eluded the development of artificial expert players. In this paper we attempt to tackle this challenge through a combination of two recent developments in Machine Learning. We employ Multi-Dimensional Recurrent Neural Networks with Long Short-Term Memory cells to handle the multi-dimensional data of the board game in a very natural way. To improve the convergence rate, as well as the ultimate performance, we train these networks using Policy Gradients with Parameter-based Exploration, a recently developed Reinforcement Learning algorithm which has been found to have numerous advantages over Evolution Strategies. Our empirical results confirm the promise of this approach, and we discuss how it can be scaled up to expert-level Go players.
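To make the training idea concrete, the following is a minimal sketch of parameter-based exploration with symmetric sampling. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the update rules or hyperparameters, so the function name `pgpe_symmetric`, the learning rates, the moving-average baseline, and the toy quadratic reward (standing in for game outcomes) are all hypothetical. No network or Go environment is modelled, and refinements such as reward normalization are omitted.

```python
import random


def pgpe_symmetric(reward_fn, dim, iterations=1000, lr_mu=0.05,
                   lr_sigma=0.002, sigma_init=1.0, seed=0):
    """Simplified parameter-exploring policy gradient (illustrative only).

    Keeps one Gaussian (mu[i], sigma[i]) per parameter and perturbs the
    parameters themselves rather than the actions.
    """
    rng = random.Random(seed)
    mu = [0.0] * dim
    sigma = [sigma_init] * dim
    baseline = None  # moving average of rewards (assumed form)

    for _ in range(iterations):
        # Symmetric sampling: evaluate mu + eps and mu - eps.
        eps = [rng.gauss(0.0, s) for s in sigma]
        r_plus = reward_fn([m + e for m, e in zip(mu, eps)])
        r_minus = reward_fn([m - e for m, e in zip(mu, eps)])
        r_mean = 0.5 * (r_plus + r_minus)
        if baseline is None:
            baseline = r_mean

        for i in range(dim):
            # Move the mean toward the better of the two mirrored samples.
            mu[i] += lr_mu * 0.5 * (r_plus - r_minus) * eps[i]
            # Widen or narrow exploration depending on whether the pair
            # did better or worse than the baseline.
            sigma[i] += (lr_sigma * (r_mean - baseline)
                         * (eps[i] ** 2 - sigma[i] ** 2) / sigma[i])
            sigma[i] = max(sigma[i], 1e-3)  # keep sigma positive (assumption)

        baseline = 0.9 * baseline + 0.1 * r_mean

    return mu, sigma


# Hypothetical toy objective: reward is highest when theta equals TARGET.
TARGET = [1.0, -1.0]


def toy_reward(theta):
    return -sum((t - g) ** 2 for t, g in zip(theta, TARGET))


mu, sigma = pgpe_symmetric(toy_reward, dim=2)
print("learned mean:", mu)
print("final exploration widths:", sigma)
```

In a setting like the one the abstract describes, `reward_fn` would presumably load the sampled parameter vector into the network and return a game-based score; that wiring is left out here because the abstract does not specify it.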


Keywords (added by machine, not by the authors): Hide Layer; Recurrent Neural Network; Board Size; Neural Network Architecture; Board Game





Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Mandy Grüttner (1)
  • Frank Sehnke (1)
  • Tom Schaul (2)
  • Jürgen Schmidhuber (2)
  1. Faculty of Computer Science, Technische Universität München, Germany
  2. IDSIA, University of Lugano, Switzerland
