Hybrid Independent Learning in Cooperative Markov Games


Part of the Lecture Notes in Computer Science book series (LNAI, volume 12547)

Abstract

Independent agents learning by reinforcement must overcome several difficulties, including non-stationarity, miscoordination, and relative overgeneralization. An independent learner may receive different rewards for the same state and action at different time steps, depending on the actions of the other agents in that state. Existing multi-agent learning methods try to overcome these issues with techniques such as hysteresis or leniency; however, they all use the latest reward signal to update the Q-function. Instead, we propose to keep track of the rewards received for each state-action pair and to use a hybrid approach for updating the Q-values: the agents initially adopt an optimistic disposition by using the maximum reward observed, and then transform into average-reward learners. We show both analytically and empirically that this technique can improve the convergence and stability of learning, and deals robustly with relative overgeneralization, miscoordination, and a high degree of stochasticity in the reward and transition functions. Our method outperforms state-of-the-art multi-agent learning algorithms across a spectrum of stochastic and partially observable games, while requiring little parameter tuning.
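The hybrid update described above can be sketched as follows. This is a minimal illustration, not the paper's exact algorithm: the class name, the fixed step-count switch schedule, and the standard Q-learning update form are assumptions introduced here; the paper may use a different transition criterion between the optimistic and average phases.

```python
from collections import defaultdict


class HybridQLearner:
    """Sketch of a hybrid independent learner: Q-updates first use the
    maximum reward ever observed for a state-action pair (optimistic
    phase), then switch to the running-average reward (average phase)."""

    def __init__(self, actions, alpha=0.1, gamma=0.9, switch_step=1000):
        self.actions = actions
        self.alpha = alpha                # learning rate
        self.gamma = gamma                # discount factor
        self.switch_step = switch_step    # assumed switch point (hypothetical schedule)
        self.Q = defaultdict(float)
        self.max_r = defaultdict(lambda: float("-inf"))  # max reward seen per (s, a)
        self.sum_r = defaultdict(float)                  # running reward sum per (s, a)
        self.count = defaultdict(int)                    # visit count per (s, a)
        self.t = 0

    def update(self, s, a, r, s_next):
        """One Q-learning update using the hybrid effective reward."""
        self.t += 1
        self.max_r[(s, a)] = max(self.max_r[(s, a)], r)
        self.sum_r[(s, a)] += r
        self.count[(s, a)] += 1
        if self.t < self.switch_step:
            # Optimistic phase: use the best reward observed so far,
            # which ignores penalties caused by teammates' exploration.
            r_eff = self.max_r[(s, a)]
        else:
            # Average phase: use the mean reward for this pair, which is
            # robust to stochastic rewards.
            r_eff = self.sum_r[(s, a)] / self.count[(s, a)]
        target = r_eff + self.gamma * max(self.Q[(s_next, b)] for b in self.actions)
        self.Q[(s, a)] += self.alpha * (target - self.Q[(s, a)])
```

Each agent runs its own instance of such a learner, observing only its own action and the joint reward, which is what makes the setting "independent" in the sense used above.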

Keywords

  • Multi-agent reinforcement learning
  • Markov games
  • Independent learners
  • Distributed Q-learning


Notes

  1. We sometimes omit the subscript i when it is clear that we are referring to a specific agent.


Acknowledgments

This work is funded by the U.S. Air Force Research Laboratory (AFRL), BAA Number: FA8750-18-S-7007, and NSF grant no. 1816382.

Author information

Correspondence to Roi Yehoshua.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Yehoshua, R., Amato, C. (2020). Hybrid Independent Learning in Cooperative Markov Games. In: Taylor, M.E., Yu, Y., Elkind, E., Gao, Y. (eds) Distributed Artificial Intelligence. DAI 2020. Lecture Notes in Computer Science, vol 12547. Springer, Cham. https://doi.org/10.1007/978-3-030-64096-5_6


  • DOI: https://doi.org/10.1007/978-3-030-64096-5_6

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-64095-8

  • Online ISBN: 978-3-030-64096-5

  • eBook Packages: Computer Science, Computer Science (R0)