Ensemble Methods for Reinforcement Learning with Function Approximation

  • Stefan Faußer
  • Friedhelm Schwenker
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6713)


Ensemble methods combine multiple models to increase predictive performance, but they mostly rely on labelled data. In this paper we propose several ensemble methods to learn a combined parameterized state-value function from multiple agents. For this purpose, the Temporal-Difference (TD) and Residual-Gradient (RG) update methods, as well as a policy function, are adapted to learn from joint decisions. Such joint decisions include majority voting and averaging of the state-values. We apply these ensemble methods to the simple pencil-and-paper game Tic-Tac-Toe and show that an ensemble of three agents outperforms a single agent both in terms of the Mean-Squared Error (MSE) to the true values and in terms of the resulting policy. Further, we apply the same methods to learn the shortest path in a 20×20 maze and show empirically that in an ensemble of multiple agents learning is faster and the resulting policy, i.e. the number of correctly chosen actions, is better than with a single agent.
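The joint decisions described above can be sketched in code. The following is a hypothetical illustration (function and variable names are mine, not the authors'): each agent in the ensemble holds a linear state-value function V_i(s) = w_i · φ(s), and a joint decision over candidate successor states is made either by averaging the agents' state-values or by majority voting on each agent's greedy choice.

```python
import numpy as np

# Hypothetical sketch, assuming linear function approximation:
# each agent i evaluates a successor state s via V_i(s) = w_i . phi(s).

def average_decision(weights, successor_features):
    """Index of the successor with the highest mean state-value."""
    # values[i, k] = agent i's value estimate for successor state k
    values = np.array([[w @ phi for phi in successor_features]
                       for w in weights])
    return int(np.argmax(values.mean(axis=0)))

def majority_vote_decision(weights, successor_features):
    """Index of the successor most agents pick greedily (ties -> lowest index)."""
    votes = [int(np.argmax([w @ phi for phi in successor_features]))
             for w in weights]
    return int(np.argmax(np.bincount(votes)))
```

Note that the two schemes can disagree: two agents may narrowly prefer one successor while a third strongly prefers another, in which case voting follows the majority and averaging follows the strong preference.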


Keywords: Single-Agent Reinforcement Learning · Markov Decision Process · Ensemble Method · Joint Decision
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



References

  1. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  2. Baird, L.: Residual Algorithms: Reinforcement Learning with Function Approximation. In: Proceedings of the 12th International Conference on Machine Learning, pp. 30–37 (1995)
  3. Breiman, L.: Bagging Predictors. Machine Learning 24, 123–140 (1996)
  4. Schapire, R.E.: The Strength of Weak Learnability. Machine Learning 5(2), 197–227 (1990)
  5. Sun, R., Peterson, T.: Multi-Agent Reinforcement Learning: Weighting and Partitioning. Neural Networks 12(4-5), 727–753 (1999)
  6. Kok, J.R., Vlassis, N.: Collaborative Multiagent Reinforcement Learning by Payoff Propagation. Journal of Machine Learning Research 7, 1789–1828 (2006)
  7. Partalas, I., Feneris, I., Vlahavas, I.: Multi-Agent Reinforcement Learning using Strategies and Voting. In: 19th IEEE International Conference on Tools with Artificial Intelligence (ICTAI 2007), vol. 2, pp. 318–324 (2007)
  8. Abdallah, S., Lesser, V.: Multiagent Reinforcement Learning and Self-Organization in a Network of Agents. In: Proceedings of the Sixth International Joint Conference on Autonomous Agents and Multi-Agent Systems (AAMAS 2007), pp. 172–179 (2007)
  9. Wiering, M.A., van Hasselt, H.: Ensemble Algorithms in Reinforcement Learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics 38, 930–936 (2008)
  10. Faußer, S., Schwenker, F.: Learning a Strategy with Neural Approximated Temporal-Difference Methods in English Draughts. In: ICPR 2010, pp. 2925–2928 (2010)
  11. Bishop, C.M.: Neural Networks for Pattern Recognition. Oxford University Press, Oxford (1995)
  12. Russell, S.J., Norvig, P.: Artificial Intelligence: A Modern Approach, 2nd edn. Prentice-Hall, Englewood Cliffs (2002)

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Stefan Faußer (1)
  • Friedhelm Schwenker (1)

  1. Institute of Neural Information Processing, University of Ulm, Ulm, Germany