Reinforcement Learning with Neural Networks: Tricks of the Trade

  • Christopher J. Gatti
  • Mark J. Embrechts
Part of the Studies in Computational Intelligence book series (SCI, volume 410)


Reinforcement learning enables the learning of optimal behavior in tasks that require the selection of sequential actions. This method of learning is based on interactions between an agent and its environment. Through repeated interactions with the environment, and the receipt of rewards, the agent learns which actions are associated with the greatest cumulative reward.
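The interaction loop described above, in which an agent acts, receives rewards, and refines its estimate of cumulative reward, can be sketched with a tabular TD(0) learner. The five-state chain environment, the uniform-random policy, and all parameter values below are illustrative choices for this sketch, not details from the chapter:

```python
import random

# Illustrative 5-state chain: the agent starts at state 0 and a reward of
# 1.0 is received on reaching state 4 (terminal). TD(0) estimates the
# state-value function of a uniform-random policy from repeated episodes.
N_STATES, TERMINAL = 5, 4
V = [0.0] * N_STATES        # value estimate per state
alpha, gamma = 0.1, 0.9     # learning rate and discount factor (illustrative)

random.seed(0)
for episode in range(2000):
    s = 0
    while s != TERMINAL:
        a = random.choice([-1, 1])                  # random policy: left or right
        s2 = max(0, min(TERMINAL, s + a))           # environment transition
        r = 1.0 if s2 == TERMINAL else 0.0          # reward only at the goal
        V[s] += alpha * (r + gamma * V[s2] - V[s])  # TD(0) update toward r + gamma*V(s')
        s = s2

# States nearer the goal accumulate higher estimated cumulative reward.
print([round(v, 3) for v in V])
```

The update on each step moves the current state's value toward the bootstrapped target `r + gamma * V[s2]`; over many episodes the estimates reflect how much reward each state leads to, which is exactly the association between states, actions, and cumulative reward that the passage describes.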

This work describes the computational implementation of reinforcement learning. Specifically, we present reinforcement learning using a neural network to represent the value function of the agent, along with the temporal difference algorithm, which is used to train the neural network. The purpose of this work is to present the bare essentials of what one needs in order to apply reinforcement learning using a neural network. Additionally, we describe two example implementations of reinforcement learning using the board games Tic-Tac-Toe and Chung Toi, a challenging extension of Tic-Tac-Toe.
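The combination the abstract describes, a board state encoded as an input vector, a small neural network as the value function, and temporal difference learning to adjust the weights, can be sketched as follows. This is a minimal TD(0) sketch, not the chapter's actual implementation: the +1/-1/0 board encoding, the single hidden layer of 10 tanh units, the sigmoid output, and the learning rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Value network: 9 board cells -> 10 tanh hidden units -> 1 sigmoid output.
# Board encoding (illustrative): +1 for the agent's marks, -1 for the
# opponent's, 0 for empty cells, flattened to a length-9 vector.
H = 10
W1 = rng.normal(0.0, 0.1, (H, 9))   # input-to-hidden weights
b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, H)        # hidden-to-output weights
b2 = 0.0

def value(x):
    """Estimated value (e.g. win probability) of board state x."""
    h = np.tanh(W1 @ x + b1)
    v = 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))
    return v, h

def td_update(x, target, alpha=0.5):
    """One gradient step moving V(x) toward a TD target.

    For a non-terminal transition the target would be r + gamma * value(x_next)[0];
    for a terminal transition it is just the reward r.
    """
    global W1, b1, W2, b2
    v, h = value(x)
    delta = target - v                      # TD error
    g_out = delta * v * (1.0 - v)           # error through the sigmoid output
    g_hid = g_out * W2 * (1.0 - h ** 2)     # backprop through the tanh hidden layer
    W2 = W2 + alpha * g_out * h
    b2 = b2 + alpha * g_out
    W1 = W1 + alpha * np.outer(g_hid, x)
    b1 = b1 + alpha * g_hid
    return v

# A board where the agent (playing +1) has completed the top row:
x_win = np.array([1, 1, 1, -1, -1, 0, 0, 0, 0], float)
for _ in range(500):
    td_update(x_win, 1.0)   # terminal win: TD target is the reward, 1.0
print(round(value(x_win)[0], 2))  # value driven toward the terminal reward
```

In a full game implementation, the same `td_update` would be applied after every move in self-play, with the network's own evaluation of the successor position supplying the bootstrapped target; chapters such as this one typically generalize this to TD(lambda) with eligibility traces.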


Keywords (machine-generated, not supplied by the authors): hidden layer, reinforcement learning, encoding scheme, hidden node, network weight





Copyright information

© Springer Berlin Heidelberg 2013

Authors and Affiliations

  1. Rensselaer Polytechnic Institute, Troy, USA
