Abstract
Temporal difference and evolutionary methods are two of the most common approaches to solving reinforcement learning problems. However, there is little consensus on their relative merits and there have been few empirical studies that directly compare their performance. This article aims to address this shortcoming by presenting results of empirical comparisons between Sarsa and NEAT, two representative methods, in mountain car and keepaway, two benchmark reinforcement learning tasks. In each task, the methods are evaluated in combination with both linear and nonlinear representations to determine their best configurations. In addition, this article tests two specific hypotheses about the critical factors contributing to these methods’ relative performance: (1) that sensor noise reduces the final performance of Sarsa more than that of NEAT, because Sarsa’s learning updates are not reliable in the absence of the Markov property and (2) that stochasticity, by introducing noise in fitness estimates, reduces the learning speed of NEAT more than that of Sarsa. Experiments in variations of mountain car and keepaway designed to isolate these factors confirm both these hypotheses.
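Hypothesis (1) above turns on the form of Sarsa's bootstrapped learning update, which computes its target from the observed next state. As a point of reference, a minimal sketch of that update in Python; the variable names and the toy usage are illustrative, not taken from the paper's implementation:

```python
from collections import defaultdict

def sarsa_update(Q, s, a, r, s_next, a_next, alpha=0.1, gamma=0.99):
    """One on-policy TD (Sarsa) update: Q(s,a) += alpha * (TD error).

    If sensor noise aliases states (i.e., the Markov property fails from
    the agent's perspective), the bootstrapped target r + gamma*Q(s',a')
    is computed from an unreliable state estimate, corrupting the update.
    """
    td_target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (td_target - Q[(s, a)])
    return Q[(s, a)]

# Illustrative usage on a toy two-state problem.
Q = defaultdict(float)
sarsa_update(Q, s=0, a=1, r=1.0, s_next=1, a_next=0, alpha=0.5, gamma=0.9)
print(Q[(0, 1)])  # 0.5 * (1.0 + 0.9 * 0.0 - 0.0) = 0.5
```

By contrast, NEAT never bootstraps from successor states: it evaluates whole policies by their episodic returns, which is why stochasticity in those returns (hypothesis 2) affects it through noisy fitness estimates rather than through corrupted per-step targets.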
References
Albus J. S. (1981) Brains, behavior, and robotics. Byte Books, Peterborough, NH
Anderson, C. W. (1986). Learning and problem solving with multilayer connectionist systems. Ph.D. thesis, University of Massachusetts, Amherst, MA.
Baird, L., & Moore, A. (1999). Gradient descent for general reinforcement learning. In Advances in Neural Information Processing Systems (Vol. 11). Cambridge, MA: MIT Press.
Bakker, B. (2002). Reinforcement learning with long short-term memory. In Advances in Neural Information Processing Systems (Vol. 14, pp. 1475–1482).
Barto, A., & Duff, M. (1994). Monte Carlo matrix inversion and reinforcement learning. In Advances in Neural Information Processing Systems (Vol. 6, pp. 687–694).
Barto A. G., Sutton R. S., Anderson C. W. (1983) Neuronlike adaptive elements that can solve difficult learning control problems. IEEE Transactions on Systems, Man, and Cybernetics SMC-13(5): 834–846
Baxter J., Bartlett P. L. (2001) Infinite-horizon policy-gradient estimation. Journal of Artificial Intelligence Research 15: 319–350
Beielstein, T., & Markon, S. (2002). Threshold selection, hypothesis tests, and DOE methods. In Proceedings of the 2002 Congress on Evolutionary Computation (pp. 777–782).
Bellman R. E. (1956) A problem in the sequential design of experiments. Sankhya 16: 221–229
Bellman R. E. (1957) Dynamic programming. Princeton University Press, Princeton
Beyer, H.-G., & Sendhoff, B. (2007). Evolutionary algorithms in the presence of noise: To sample or not to sample. In Proceedings of the 1st IEEE Symposium on Foundations of Computational Intelligence (pp. 17–24).
Boyan, J. A., & Moore, A. W. (1995). Generalization in reinforcement learning: Safely approximating the value function. In Advances in Neural Information Processing Systems (Vol. 7).
Bradtke, S. J., & Duff, M. O. (1995). Reinforcement learning methods for continuous-time Markov decision problems. In Advances in Neural Information Processing Systems (Vol. 7, pp. 393–400).
Brafman R. I., Tennenholtz M. (2002) R-MAX—a general polynomial time algorithm for near-optimal reinforcement learning. Journal of Machine Learning Research 3: 213–231
Crites R. H., Barto A. G. (1998) Elevator group control using multiple reinforcement learning agents. Machine Learning 33(2-3): 235–262
Darwen, P. J. (2001). Why co-evolution beats temporal difference learning at backgammon for a linear architecture, but not a non-linear architecture. In Proceedings of the 2001 Congress on Evolutionary Computation (pp. 1003–1010).
Gauci, J. J., & Stanley, K. O. (2007). Generating large-scale neural networks through discovering geometric regularities. In Proceedings of the Genetic and Evolutionary Computation Conference.
Goldberg D. E. (1989) Genetic algorithms in search, optimization and machine learning. Addison-Wesley, Boston, MA
Gomez, F., & Miikkulainen, R. (1999). Solving non-Markovian control tasks with neuroevolution. In Proceedings of the International Joint Conference on Artificial Intelligence (pp. 1356–1361).
Gomez, F., & Schmidhuber, J. (2005). Co-evolving recurrent neurons learn deep memory POMDPs. In GECCO-05: Proceedings of the Genetic and Evolutionary Computation Conference (pp. 491–498).
Gomez, F., Schmidhuber, J., & Miikkulainen, R. (2006). Efficient non-linear control through neuroevolution. In Proceedings of the European Conference on Machine Learning.
Gruau, F., Whitley, D., & Pyeatt, L. (1996). A comparison between cellular encoding and direct encoding for genetic neural networks. In Genetic Programming 1996: Proceedings of the 1st Annual Conference (pp. 81–89).
Heidrich-Meisner, V., & Igel, C. (2008a). Evolution strategies for direct policy search. In Proceedings of the 10th International Conference on Parallel Problem Solving from Nature (pp. 428–437). Berlin, Heidelberg: Springer.
Heidrich-Meisner, V., & Igel, C. (2008b). Similarities and differences between policy gradient methods and evolution strategies. In Proceedings of the 16th European Symposium on Artificial Neural Networks (ESANN).
Heidrich-Meisner, V., & Igel, C. (2008c). Variable metric reinforcement learning methods applied to the noisy mountain car problem. In Recent Advances in Reinforcement Learning: 8th European Workshop (pp. 136–150). Berlin, Heidelberg: Springer.
Jong, N. K., & Stone, P. (2007). Model-based exploration in continuous state spaces. In The 7th Symposium on Abstraction, Reformulation, and Approximation.
Kakade, S. (2003). On the sample complexity of reinforcement learning. Ph.D. thesis, University College London, London, UK.
Kalyanakrishnan, S., & Stone, P. (2009). An empirical analysis of value function-based and policy search reinforcement learning. In Proceedings of the 8th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2009).
Kassahun, Y., & Sommer, G. (2005). Automatic neural robot controller design using evolutionary acquisition of neural topologies. In Fachgespräch Autonome Mobile Systeme (AMS 2005), Stuttgart, Germany, December 8–9, 2005, Informatik aktuell (Vol. 19, pp. 315–321). Springer.
Kearns M., Singh S. (2002) Near-optimal reinforcement learning in polynomial time. Machine Learning 49(2): 209–232
Keller, P., Mannor, S., & Precup, D.(2006). Automatic basis function construction for approximate dynamic programming and reinforcement learning. In Proceedings of the 23rd International Conference on Machine Learning (pp. 449–456).
Kohl, N., & Miikkulainen, R. (2008). Evolving neural networks for fractured domains. In Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1405–1412).
Kohl N., Miikkulainen R. (2009) Evolving neural networks for strategic decision-making problems. Neural Networks, Special Issue on Goal-Directed Neural Systems 22(3): 326–337
Kohl, N., & Stone, P. (2004). Policy gradient reinforcement learning for fast quadrupedal locomotion. In Proceedings of the IEEE International Conference on Robotics and Automation (pp. 2619–2624).
Kretchmar, R. M., & Anderson, C. W. (1997). Comparison of CMACs and radial basis functions for local function approximators in reinforcement learning. In International Conference on Neural Networks.
Lagoudakis M. G., Parr R. (2003) Least-squares policy iteration. Journal of Machine Learning Research 4: 1107–1149
Littman, M. L., Dean, T. L., & Kaelbling, L. P. (1995). On the complexity of solving Markov decision processes. In Proceedings of the 11th International Conference on Uncertainty in Artificial Intelligence (pp. 394–402).
Lucas, S. M., & Runarsson, T. P. (2006). Temporal difference learning versus co-evolution for acquiring Othello position evaluation. In IEEE Symposium on Computational Intelligence and Games.
Lucas, S. M., & Togelius, J. (2007). Point-to-point car racing: An initial study of evolution versus temporal difference learning. In IEEE Symposium on Computational Intelligence and Games (pp. 260–267).
Mahadevan, S. (2005). Samuel meets Amarel: Automating value function approximation using global state space analysis. In Proceedings of the 20th National Conference on Artificial Intelligence.
Mannor, S., Rubenstein, R., & Gat, Y. (2003). The cross-entropy method for fast policy search. In Proceedings of the 20th International Conference on Machine Learning (pp. 512–519).
Menache I., Mannor S., Shimkin N. (2005) Basis function adaptation in temporal difference reinforcement learning. Annals of Operations Research 134: 215–238
Metzen, J. H., Edgington, M., Kassahun, Y., & Kirchner, F. (2008). Analysis of an evolutionary reinforcement learning method in a multiagent domain. In Proceedings of the 7th International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2008) (pp. 291–298). Estoril, Portugal.
Moore A., Atkeson C. (1993) Prioritized sweeping: Reinforcement learning with less data and less real time. Machine Learning 13: 103–130
Moriarty D. E., Miikkulainen R. (1996) Efficient reinforcement learning through symbiotic evolution. Machine Learning 22(1–3): 11–32
Moriarty D. E., Schultz A. C., Grefenstette J. J. (1999) Evolutionary algorithms for reinforcement learning. Journal of Artificial Intelligence Research 11: 241–276
Ng, A. Y., Coates, A., Diel, M., Ganapathi, V., Schulte, J., Tse, B., et al. (2004). Inverted autonomous helicopter flight via reinforcement learning. In Proceedings of the International Symposium on Experimental Robotics.
Noda I., Matsubara H., Hiraki K., Frank I. (1998) Soccer server: A tool for research on multiagent systems. Applied Artificial Intelligence 12: 233–250
Pollack J., Blair A. (1998) Co-evolution in the successful learning of backgammon strategy. Machine Learning 32: 225–240
Potter M. A., De Jong K. A. (2000) Cooperative coevolution: An architecture for evolving coadapted subcomponents. Evolutionary Computation 8: 1–29
Powell M. (1987) Radial basis functions for multivariate interpolation: A review. In Algorithms for approximation. Clarendon Press, Oxford
Pyeatt, L. D., & Howe, A. E. (2001). Decision tree function approximation in reinforcement learning. In Proceedings of the 3rd International Symposium on Adaptive Systems: Evolutionary computation and probabilistic graphical models (pp. 70–77).
Radcliffe N. J. (1993) Genetic set recombination and its application to neural network topology optimization. Neural Computing and Applications 1(1): 67–90
Riedmiller, M. (2005). Neural fitted Q iteration—first experiences with a data efficient neural reinforcement learning method. In Proceedings of the 16th European Conference on Machine Learning (pp. 317–328).
Rummery, G., & Niranjan, M. (1994). On-line Q-learning using connectionist systems. Technical Report CUED/F-INFENG/TR 166, Cambridge University.
Runarsson T. P., Lucas S. M. (2005) Co-evolution versus self-play temporal difference learning for acquiring position evaluation in small-board Go. IEEE Transactions on Evolutionary Computation 9: 628–640
Saravanan N., Fogel D. B. (1995) Evolving neural control systems. IEEE Expert: Intelligent Systems and Their Applications 10(3): 23–27
Smart, W. D., & Kaelbling, L. P. (2000). Practical reinforcement learning in continuous spaces. In Proceedings of the 17th International Conference on Machine Learning (pp. 903–910).
Stagge P. (1998) Averaging efficiently in the presence of noise. Parallel Problem Solving from Nature 5: 188–197
Stanley K. O., Miikkulainen R. (2002) Evolving neural networks through augmenting topologies. Evolutionary Computation 10(2): 99–127
Stanley K. O., Miikkulainen R. (2004) Competitive coevolution through evolutionary complexification. Journal of Artificial Intelligence Research 21: 63–100
Stone P. (2000) Layered learning in multiagent systems: A winning approach to robotic soccer. MIT Press, Cambridge, MA
Stone, P., Kuhlmann, G., Taylor, M. E., & Liu, Y. (2005a). Keepaway soccer: From machine learning testbed to benchmark. In RoboCup-2005: Robot Soccer World Cup IX (Vol. 4020, pp. 93–105). Berlin: Springer.
Stone P., Sutton R. S., Kuhlmann G. (2005b) Learning in RoboCup-soccer keepaway. Adaptive Behavior 13(3): 165–188
Strehl, A., & Littman, M. (2005). A theoretical analysis of model-based interval estimation. In Proceedings of the 22nd International Conference on Machine Learning (pp. 856–863).
Sutton, R. (1996). Generalization in reinforcement learning: Successful examples using sparse coarse coding. In Advances in Neural Information Processing Systems (Vol. 8, pp. 1038–1044).
Sutton R. S. (1988) Learning to predict by the methods of temporal differences. Machine Learning 3: 9–44
Sutton, R. S. (1990). Integrated architectures for learning, planning, and reacting based on approximating dynamic programming. In Proceedings of the 7th International Conference on Machine Learning (pp. 216–224).
Sutton R. S., Barto A. G. (1998) Reinforcement learning: An introduction. MIT Press, Cambridge, MA
Sutton, R., McAllester, D., Singh, S., & Mansour, Y. (2000). Policy gradient methods for reinforcement learning with function approximation. In Advances in Neural Information Processing Systems (pp. 1057–1063).
Szita I., Lörincz A. (2006) Learning Tetris using the noisy cross-entropy method. Neural Computation 18(12): 2936–2941
Taylor, M. E., Whiteson, S., & Stone, P. (2006). Comparing evolutionary and temporal difference methods in a reinforcement learning domain. In GECCO 2006: Proceedings of the Genetic and Evolutionary Computation Conference (pp. 1321–1328).
Tesauro G. (1994) TD-gammon, a self-teaching backgammon program achieves master-level play. Neural Computation 6: 215–219
Tesauro G. (1998) Comments on “co-evolution in the successful learning of backgammon strategy”. Machine Learning 32(3): 241–243
Tesauro, G., Das, N. K. J. R., & Bennania, M. N. (2006). A hybrid reinforcement learning approach to autonomic resource allocation. In Proceedings of the 3rd International Conference on Autonomic Computing.
Watkins C., Dayan P. (1992) Q-learning. Machine Learning 8(3-4): 279–292
Wieland, A. (1991). Evolving neural network controllers for unstable systems. In International Joint Conference on Neural Networks (pp. 667–673).
Whiteson S., Kohl N., Miikkulainen R., Stone P. (2005) Evolving keepaway soccer players through task decomposition. Machine Learning 59(1): 5–30
Whiteson S., Stone P. (2006) Evolutionary function approximation for reinforcement learning. Journal of Machine Learning Research 7: 877–917
Whitley D., Dominic S., Das R., Anderson C. W. (1993) Genetic reinforcement learning for neurocontrol problems. Machine Learning 13: 259–284
Whitley, D., & Kauth, K. (1988). GENITOR: A different genetic algorithm. In Proceedings of the 1988 Rocky Mountain Conference on Artificial Intelligence (pp. 118–130).
Yao X. (1999) Evolving artificial neural networks. Proceedings of the IEEE 87(9): 1423–1447
Acknowledgements
We would like to thank Ken Stanley for help setting up NEAT in keepaway, as well as Shivaram Kalyanakrishnan, Nate Kohl, Frans Oliehoek, David Pardoe, Jefferson Provost, Joseph Reisinger, Ken Stanley, and the anonymous reviewers for helpful comments and suggestions. This research was supported in part by NSF CAREER award IIS-0237699 and NSF award EIA-0303609.
Open Access
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Additional information
This paper significantly extends an earlier conference paper, presented at the 2006 GECCO conference [72].
Cite this article
Whiteson, S., Taylor, M.E. & Stone, P. Critical factors in the empirical performance of temporal difference and evolutionary methods for reinforcement learning. Auton Agent Multi-Agent Syst 21, 1–35 (2010). https://doi.org/10.1007/s10458-009-9100-2