Neural Approximation of Monte Carlo Policy Evaluation Deployed in Connect Four

  • Stefan Faußer
  • Friedhelm Schwenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5064)


To win a board game, or more generally to reach a specific goal in a given Markov environment, it is essential to have a policy for choosing and taking actions that leads to one of several qualitatively good states. In this paper we describe a novel method for learning a game-winning strategy. The method predicts the probability of winning from a given game state using a state-value function that is approximated by a multi-layer perceptron. These predictions improve according to the rewards given in terminal states. We have deployed the method in the game Connect Four and compared its playing performance with that of Velena [5].
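The general idea of Monte Carlo policy evaluation with a neural state-value function can be illustrated with a minimal sketch. It is not the authors' implementation: the network size, the learning rate, the two-element state encoding, and the names `TinyMLP` and `monte_carlo_update` are all assumptions made for illustration. Every state visited in an episode is trained towards the reward observed in the terminal state (1 for a win, 0 for a loss), so the network output can be read as an estimated probability of winning.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyMLP:
    """Hypothetical one-hidden-layer perceptron with a scalar output in (0, 1)."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        # Hidden activations and output, both through a sigmoid.
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sigmoid(sum(w * hj for w, hj in zip(self.w2, h)) + self.b2)
        return h, y

    def value(self, x):
        return self.forward(x)[1]

    def train_step(self, x, target, lr):
        # One gradient-descent step on the squared error 0.5 * (y - target)^2.
        h, y = self.forward(x)
        delta_out = (y - target) * y * (1.0 - y)
        for j, hj in enumerate(h):
            # Backpropagate through the old output weight before changing it.
            delta_h = delta_out * self.w2[j] * hj * (1.0 - hj)
            self.w2[j] -= lr * delta_out * hj
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * delta_h * xi
            self.b1[j] -= lr * delta_h
        self.b2 -= lr * delta_out


def monte_carlo_update(net, visited_states, terminal_reward, lr=0.5):
    """Move the value of every visited state towards the terminal reward."""
    for state in visited_states:
        net.train_step(state, terminal_reward, lr)


# Toy data (assumed): one state that always precedes a win, one a loss.
winning_state = [1.0, 0.0]
losing_state = [0.0, 1.0]

net = TinyMLP(n_in=2, n_hidden=4)
for _ in range(3000):
    monte_carlo_update(net, [winning_state], terminal_reward=1.0)
    monte_carlo_update(net, [losing_state], terminal_reward=0.0)

v_win = net.value(winning_state)
v_loss = net.value(losing_state)
print(v_win, v_loss)
```

In a real game the episodes would come from self-play or play against an opponent, and the state encoding would describe the full board; both are left out here to keep the sketch short.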


References

  1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  2. Tesauro, G.: Temporal Difference Learning and TD-Gammon. Communications of the ACM 38(3) (1995)
  3. Thimm, G., Fiesler, E.: High order and multilayer perceptron initialization. IEEE Transactions on Neural Networks 8(2), 249–259 (1997)
  4. Thimm, G., Fiesler, E.: Optimal Setting of Weights, Learning Rate and Gain. IDIAP Research Report, Dalle Molle Institute for Perceptive Artificial Intelligence, Switzerland (April 2007)
  5. Bertoletti, G.: Velena: A Shannon C-type program which plays connect four perfectly (1997)
  6. Allis, V.: A Knowledge-based Approach of Connect-Four, Department of Mathematics and Computer Science, Vrije Universiteit, Amsterdam (1998)
  7. Lenze, B.: Einführung in die Mathematik neuronaler Netze. Logos Verlag, Berlin (2003)
  8. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice Hall, Englewood Cliffs (2002)
  9. Cybenko, G.V.: Approximation by Superpositions of a Sigmoidal function. Mathematics of Control, Signals and Systems 2, 303–314 (electronic version) (1989)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Stefan Faußer (1)
  • Friedhelm Schwenker (1)
  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany
