Ensemble Methods for Reinforcement Learning with Function Approximation
Ensemble methods combine multiple models to improve predictive performance, but they mostly rely on labelled data. In this paper we propose several ensemble methods for learning a combined parameterized state-value function from multiple agents. To this end, the Temporal-Difference (TD) and Residual-Gradient (RG) update methods, as well as a policy function, are adapted to learn from joint decisions. Such joint decisions include Majority Voting and Averaging of the state-values. We apply these ensemble methods to the simple pencil-and-paper game Tic-Tac-Toe and show that an ensemble of three agents outperforms a single agent both in terms of the Mean-Squared Error (MSE) to the true state-values and in terms of the resulting policy. Further, we apply the same methods to learning the shortest path in a 20×20 maze and empirically show that an ensemble of multiple agents learns faster and yields a better policy, i.e. a larger number of correctly chosen actions, than a single agent.
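The Averaging scheme described above can be illustrated with a minimal sketch (not the paper's exact algorithm): each agent holds a linear state-value approximator, the joint value of a state is the average of the members' estimates, and every agent's weights are moved by a TD(0) update toward the averaged target. All names, feature sizes, and constants below are illustrative assumptions.

```python
import random

random.seed(0)

N_FEATURES = 4   # size of the feature vector phi(s) (illustrative)
N_AGENTS = 3     # ensemble size, as in the Tic-Tac-Toe experiment
ALPHA = 0.1      # learning rate (assumed)
GAMMA = 0.9      # discount factor (assumed)

def value(weights, features):
    """Linear state-value estimate V(s) = w . phi(s)."""
    return sum(w * f for w, f in zip(weights, features))

def joint_value(ensemble, features):
    """Joint decision by Averaging: mean of the agents' state-values."""
    return sum(value(w, features) for w in ensemble) / len(ensemble)

def td_update(ensemble, phi_s, reward, phi_next):
    """TD(0) update of every agent toward the averaged (joint) target."""
    target = reward + GAMMA * joint_value(ensemble, phi_next)
    for weights in ensemble:
        delta = target - value(weights, phi_s)
        for i, f in enumerate(phi_s):
            weights[i] += ALPHA * delta * f

# Three agents with small random initial weights.
ensemble = [[random.uniform(-0.1, 0.1) for _ in range(N_FEATURES)]
            for _ in range(N_AGENTS)]

# One illustrative transition s -> s' with reward 1.
phi_s = [1.0, 0.0, 1.0, 0.0]
phi_next = [0.0, 1.0, 0.0, 1.0]
before = joint_value(ensemble, phi_s)
td_update(ensemble, phi_s, 1.0, phi_next)
after = joint_value(ensemble, phi_s)
print(before, after)
```

A Majority Voting variant would instead let each agent vote for its greedy action and pick the most frequent one; the RG update would replace the TD target's gradient treatment, but the ensemble combination step stays the same.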