Co-Evolution in the Successful Learning of Backgammon Strategy


Following Tesauro's work on TD-Gammon, we used a 4,000-parameter feedforward neural network to develop a competitive backgammon evaluation function. Play proceeds by a roll of the dice, application of the network to all legal moves, and selection of the position with the highest evaluation. However, no backpropagation, reinforcement, or temporal difference learning methods were employed. Instead we apply simple hillclimbing in a relative fitness environment. We start with an initial champion of all zero weights and proceed simply by playing the current champion network against a slightly mutated challenger and changing weights if the challenger wins. Surprisingly, this worked rather well. We investigate how the peculiar dynamics of this domain enabled a previously discarded weak method to succeed, by preventing suboptimal equilibria in a “meta-game” of self-learning.
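The hillclimbing procedure the abstract describes can be sketched as a short loop: mutate the champion's weights, play the two networks against each other, and promote the challenger if it wins. The sketch below is a minimal illustration of that loop, not the authors' implementation: `play_game` is a hypothetical stand-in for a full backgammon match (here replaced by a toy fitness comparison so the sketch runs), and the weight count and mutation scale are assumed values, not the paper's.

```python
import random

N_WEIGHTS = 8          # toy size; the paper's network had roughly 4,000 parameters
MUTATION_SCALE = 0.05  # assumed mutation step size, not taken from the paper

def mutate(weights):
    """Return a slightly perturbed copy of the champion's weights."""
    return [w + random.gauss(0.0, MUTATION_SCALE) for w in weights]

def play_game(champion, challenger):
    """Hypothetical stand-in for a backgammon game between two networks.

    A real implementation would roll dice, apply each network to all
    legal moves, pick the highest-evaluated position, and return True
    if the challenger wins the game. Here a toy objective (closeness
    to all-ones weights) substitutes for actual play.
    """
    def score(ws):
        return -sum((w - 1.0) ** 2 for w in ws)
    return score(challenger) > score(champion)

def hillclimb(generations=500, seed=0):
    """Relative-fitness hillclimbing: champion vs. mutated challenger."""
    random.seed(seed)
    champion = [0.0] * N_WEIGHTS  # initial champion: all zero weights
    for _ in range(generations):
        challenger = mutate(champion)
        if play_game(champion, challenger):
            champion = challenger  # winner becomes the new champion
    return champion
```

Note that fitness here is purely relative: the champion is never scored against a fixed external benchmark, only against its own mutated offspring, which is the property the paper's "meta-game" analysis turns on.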


  1. Angeline, P.J. (1994). An alternate interpretation of the iterated prisoner's dilemma and the evolution of nonmutual cooperation. In R. Brooks, & P. Maes (Eds.), Proceedings 4th Artificial Life Conference (pp. 353–358). MIT Press.

  2. Angeline, P.J., & Pollack, J.B. (1994). Competitive environments evolve better solutions for complex tasks. In S. Forrest (Ed.), Genetic Algorithms: Proceedings of the Fifth International Conference.

  3. Axelrod, R. (1984). The Evolution of Cooperation. New York: Basic Books.

  4. Boyan, J.A. (1992). Modular neural networks for learning context-dependent game strategies. Master's thesis, Computer Speech and Language Processing, Cambridge University.

  5. Cliff, D., & Miller, G. (1995). Tracking the red queen: Measurements of adaptive progress in co-evolutionary simulations. Third European Conference on Artificial Life (pp. 200–218).

  6. Crites, R., & Barto, A. (1996). Improving elevator performance using reinforcement learning. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8, pp. 1024–1030).

  7. Epstein, S.L. (1994). Toward an ideal trainer. Machine Learning, 15, 251–277.

  8. Fogel, D.B. (1993). Using evolutionary programming to create neural networks that are capable of playing tic-tac-toe. International Conference on Neural Networks 1993 (pp. 875–880). IEEE Press.

  9. Hillis, D. (1992). Co-evolving parasites improve simulated evolution as an optimization procedure. In C. Langton, C. Taylor, J.D. Farmer & S. Rasmussen (Eds.), Artificial Life II (pp. 313–324). Addison-Wesley.

  10. Juille, H., & Pollack, J. (1995). Massively parallel genetic programming. In P. Angeline, & K. Kinnear (Eds.), Advances in Genetic Programming II. Cambridge: MIT Press.

  11. Littman, M.L. (1994). Markov games as a framework for multi-agent reinforcement learning. Machine Learning: Proceedings of the Eleventh International Conference (pp. 157–163). Morgan Kaufmann.

  12. Littman, M.L. (1996). Algorithms for sequential decision making. Ph.D. dissertation, Providence: Brown University Computer Science Department.

  13. Maynard Smith, J. (1982). Evolution and the Theory of Games. Cambridge: Cambridge University Press.

  14. Michie, D. (1961). Trial and error. Science Survey, part 2 (pp. 129–145). Penguin.

  15. Mitchell, M., Hraber, P.T., & Crutchfield, J.P. (1993). Revisiting the edge of chaos: Evolving cellular automata to perform computations. Complex Systems, 7, 89–130.

  16. Packard, N. (1988). Adaptation towards the edge of chaos. In J.A.S. Kelso, A.J. Mandell, & M.F. Shlesinger (Eds.), Dynamic Patterns in Complex Systems (pp. 293–301). World Scientific.

  17. Reynolds, C. (1994). Competition, coevolution, and the game of tag. Proceedings 4th Artificial Life Conference. MIT Press.

  18. Rosin, C.D., & Belew, R.K. (1995). Methods for competitive co-evolution: Finding opponents worth beating. Proceedings of the 6th International Conference on Genetic Algorithms (pp. 373–380). Morgan Kaufmann.

  19. Rumelhart, D., Hinton, G., & Williams, R. (1986). Learning representations by back-propagating errors. Nature, 323, 533–536.

  20. Samuel, A.L. (1959). Some studies of machine learning using the game of checkers. IBM Journal of Research and Development, 3, 210–229.

  21. Schraudolph, N.N., Dayan, P., & Sejnowski, T.J. (1994). Temporal difference learning of position evaluation in the game of Go. Advances in Neural Information Processing Systems (Vol. 6, pp. 817–824). Morgan Kaufmann.

  22. Sims, K. (1994). Evolving 3D morphology and behavior by competition. In R. Brooks, & P. Maes (Eds.), Proceedings 4th Artificial Life Conference. MIT Press.

  23. Sutton, R. (1988). Learning to predict by the methods of temporal differences. Machine Learning, 3, 9–44.

  24. Tesauro, G. (1989). Connectionist learning of expert preferences by comparison training. In D. Touretzky (Ed.), Advances in Neural Information Processing Systems, Denver (Vol. 1, pp. 99–106). San Mateo: Morgan Kaufmann.

  25. Tesauro, G. (1992). Practical issues in temporal difference learning. Machine Learning, 8, 257–277.

  26. Tesauro, G. (1995). Temporal difference learning and TD-Gammon. Communications of the ACM, 38(3), 58–68.

  27. Walker, S., Lister, R., & Downs, T. (1994). Temporal difference, non-determinism, and noise: A case study on the ‘othello’ board game. International Conference on Artificial Neural Networks 1994 (pp. 1428–1431). Sorrento, Italy.

  28. Zhang, W., & Dietterich, T. (1996). High-performance job-shop scheduling with a time-delay TD(λ) network. In D. Touretzky, M. Mozer, & M. Hasselmo (Eds.), Advances in Neural Information Processing Systems (Vol. 8).


Pollack, J.B., Blair, A.D. Co-Evolution in the Successful Learning of Backgammon Strategy. Machine Learning 32, 225–240 (1998).

  • coevolution
  • backgammon
  • reinforcement
  • temporal difference learning
  • self-learning