Solving Sparse Delayed Coordination Problems in Multi-Agent Reinforcement Learning

  • Yann-Michaël De Hauwere
  • Peter Vrancx
  • Ann Nowé
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7113)


One of the main advantages of reinforcement learning is its ability to deal with a delayed reward signal. Using an appropriate backup diagram, rewards are backpropagated through the state space. This allows agents to learn to take the action that yields the highest future (discounted) reward, even if that action gives a suboptimal immediate reward in the current state. In a multi-agent environment, agents can use the same principles as in single-agent RL, but must apply them in the complete joint-state-joint-action space to guarantee optimality. Learning in such a state space, however, can be very slow. In this paper we present our approach for mitigating this problem. Future Coordinating Q-learning (FCQ-learning) detects strategic interactions between agents several timesteps before these interactions occur. FCQ-learning uses the same principles as CQ-learning [3] to detect the states in which interaction is required, but does so several timesteps before the need for coordination is reflected in the reward signal. In these states, the algorithm augments the state information with information about the other agents, which is then used to select actions. The techniques presented in this paper are the first to explicitly deal with a delayed reward signal when learning with sparse interactions.
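The delayed-reward backup described above can be illustrated with ordinary single-agent tabular Q-learning. The following is a minimal sketch of that general principle only, not of the paper's FCQ-learning; the toy chain MDP, constants, and function names are hypothetical:

```python
import random

# Hypothetical 4-state chain MDP: in every non-terminal state, action 0 pays
# an immediate +1 and ends the episode, while action 1 pays 0 now but moves
# the agent toward a terminal goal worth +10. Q-learning backs the delayed
# +10 up through the chain, so the greedy policy learns to forgo the +1.

GAMMA = 0.9    # discount factor
ALPHA = 0.5    # learning rate
N_STATES = 4   # states 0..3; state 3 is the (terminal) goal


def step(state, action):
    """Transition function of the toy chain; returns (next_state, reward)."""
    if action == 0:                    # take the immediate +1, episode ends
        return None, 1.0
    if state + 1 == N_STATES - 1:      # stepping into the goal pays +10
        return None, 10.0
    return state + 1, 0.0              # otherwise move right, no reward yet


Q = [[0.0, 0.0] for _ in range(N_STATES - 1)]
rng = random.Random(0)

for _ in range(2000):
    s = 0
    while s is not None:
        a = rng.randrange(2)           # pure random exploration, for brevity
        s2, r = step(s, a)
        # One-step Q-learning backup: this is how the delayed reward is
        # propagated backwards through the state space.
        target = r if s2 is None else r + GAMMA * max(Q[s2])
        Q[s][a] += ALPHA * (target - Q[s][a])
        s = s2

# In state 0 the greedy action is now 1 (value ~8.1 = 0.9^2 * 10), even
# though its immediate reward (0) is lower than action 0's (+1).
print(max(range(2), key=lambda a: Q[0][a]))  # → 1
```

FCQ-learning applies this same backup machinery, but in states flagged as requiring coordination it selects actions from Q-values defined over the augmented (joint) state rather than the agent's local state alone.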


Keywords: Optimal Policy, Reinforcement Learning, Multiagent System, Coordination Problem, Augmented State



  1. Boutilier, C.: Planning, learning and coordination in multiagent decision processes. In: Proceedings of the 6th Conference on Theoretical Aspects of Rationality and Knowledge, Renesse, Holland, pp. 195–210 (1996)
  2. Claus, C., Boutilier, C.: The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the 15th National Conference on Artificial Intelligence, pp. 746–752. AAAI Press (1998)
  3. De Hauwere, Y.-M., Vrancx, P., Nowé, A.: Learning multi-agent state space representations. In: Proceedings of the 9th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), Toronto, Canada, pp. 715–722 (2010)
  4. De Hauwere, Y.-M., Vrancx, P., Nowé, A.: Adaptive state representations for multi-agent reinforcement learning. In: Proceedings of the 3rd International Conference on Agents and Artificial Intelligence, Rome, Italy, pp. 181–189 (2011)
  5. Greenwald, A., Hall, K.: Correlated-Q learning. In: AAAI Spring Symposium, pp. 242–249. AAAI Press (2003)
  6. Hu, J., Wellman, M.: Nash Q-learning for general-sum stochastic games. Journal of Machine Learning Research 4, 1039–1069 (2003)
  7. Kok, J., 't Hoen, P., Bakker, B., Vlassis, N.: Utile coordination: Learning interdependencies among cooperative agents. In: Proceedings of the IEEE Symposium on Computational Intelligence and Games (CIG), pp. 29–36 (2005)
  8. Kok, J., Vlassis, N.: Sparse cooperative Q-learning. In: Proceedings of the 21st International Conference on Machine Learning (ICML). ACM, New York (2004)
  9. Littman, M.L.: Markov games as a framework for multi-agent reinforcement learning. In: Proceedings of the 11th International Conference on Machine Learning (ICML), pp. 157–163. Morgan Kaufmann (1994)
  10. Melo, F.S., Veloso, M.: Learning of coordination: Exploiting sparse interactions in multiagent systems. In: Proceedings of the 8th International Conference on Autonomous Agents and Multi-Agent Systems (AAMAS), pp. 773–780. International Foundation for Autonomous Agents and Multiagent Systems (2009)
  11. Melo, F., Veloso, M.: Local multiagent coordination in decentralized MDPs with sparse interactions. Tech. Rep. CMU-CS-10-133, School of Computer Science, Carnegie Mellon University (2010)
  12. Sutton, R.S., Barto, A.G.: Reinforcement Learning: An Introduction. MIT Press, Cambridge (1998)
  13. Tsitsiklis, J.: Asynchronous stochastic approximation and Q-learning. Machine Learning 16(3), 185–202 (1994)
  14. Vrancx, P., Verbeeck, K., Nowé, A.: Decentralized learning in Markov games. IEEE Transactions on Systems, Man and Cybernetics (Part B: Cybernetics) 38(4), 976–981 (2008)
  15. Watkins, C.: Learning from Delayed Rewards. Ph.D. thesis, University of Cambridge (1989)

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Yann-Michaël De Hauwere (1)
  • Peter Vrancx (1)
  • Ann Nowé (1)

  1. Computational Modeling Lab, Vrije Universiteit Brussel, Brussels, Belgium
