A Convergent Multiagent Reinforcement Learning Approach for a Subclass of Cooperative Stochastic Games

  • Thomas Kemmerich
  • Hans Kleine Büning
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7113)


We present a distributed Q-learning approach for independently learning agents in a subclass of cooperative stochastic games called cooperative sequential stage games, in which several stage games are played one after another. We also propose a transformation function for this class and prove that the transformed and the original game have the same set of optimal joint strategies. We then prove that, provided the played game is obtained through this transformation, our approach converges to an optimal joint strategy for the last stage game of the transformed game, and thus also for the original game. In addition, we show that the approach converges to ε-optimal joint strategies for each of the stage games. The environment does not need to provide a state signal to the agents; instead, the transformation function yields an engineered reward from which the agents infer state changes. Agents therefore need not store a strategy for every state, but maintain a single strategy that is adapted to the stage game currently being played. As a result, the algorithm has very low space requirements, and its complexity is comparable to that of single-agent Q-learning. Besides the theoretical analysis, we support the convergence properties with experiments.
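The abstract does not spell out the update rule. As a minimal sketch, assuming Lauer and Riedmiller's optimistic distributed Q-learning rather than the authors' own algorithm, the Python below illustrates the property emphasized above: two independent learners with no state signal, each storing one value per own action, reach the optimal joint action of a single cooperative stage game. The climbing-game payoffs and the names `train`, `choose`, and `greedy` are illustrative assumptions; the transformation function, the engineered reward, and the sequence of stage games are not modeled.

```python
import random

# Climbing game (Claus and Boutilier, 1998): a fully cooperative stage game in
# which both agents receive the same reward PAYOFF[a1][a2]. Illustrative only.
PAYOFF = [
    [11, -30, 0],
    [-30, 7, 6],
    [0, 0, 5],
]


def greedy(q):
    """Index of the highest-valued own action; ties go to the lowest index."""
    return max(range(len(q)), key=lambda a: q[a])


def choose(q, epsilon, rng):
    """Epsilon-greedy choice over an agent's own actions only."""
    if rng.random() < epsilon:
        return rng.randrange(len(q))
    return greedy(q)


def train(episodes=2000, epsilon=0.2, seed=0):
    """Hypothetical helper: two stateless independent learners (assumed
    optimistic update in the style of Lauer and Riedmiller, not the paper's)."""
    rng = random.Random(seed)
    n = len(PAYOFF)
    # One value per own action and no state index: each agent stores O(|A_i|)
    # numbers. Start at -inf so the first observed reward is always kept,
    # even when it is negative.
    q1 = [float("-inf")] * n
    q2 = [float("-inf")] * n
    for _ in range(episodes):
        a1 = choose(q1, epsilon, rng)
        a2 = choose(q2, epsilon, rng)
        r = PAYOFF[a1][a2]  # shared reward; neither agent sees the other's action
        # Optimistic update for a deterministic, stateless game (gamma = 0):
        # keep the best reward ever obtained with this own action.
        q1[a1] = max(q1[a1], r)
        q2[a2] = max(q2[a2], r)
    return q1, q2


if __name__ == "__main__":
    q1, q2 = train()
    print(greedy(q1), greedy(q2))  # → 0 0, the optimal joint action (reward 11)
```

The optimistic max update suffices here because the game is deterministic and its optimum is unique; in the climbing game, independent learners using ordinary averaging Q-updates tend to settle on a worse joint action, which is why this rule was chosen for the illustration.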







Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Thomas Kemmerich (1)
  • Hans Kleine Büning (2)
  1. International Graduate School Dynamic Intelligent Systems, University of Paderborn, Paderborn, Germany
  2. Department of Computer Science, University of Paderborn, Paderborn, Germany
