
Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games

  • Short Paper
  • Published:
Journal of Intelligent & Robotic Systems

Abstract

In this article, we introduce the Smooth Q-Learning algorithm for independent learners (distributed, non-communicative learners) in cooperative Markov games. Smooth Q-Learning aims to solve the relative over-generalization and stochasticity problems while also performing well in the presence of other non-coordination factors, such as the miscoordination problem (also known as the Pareto selection problem) and the non-stationarity problem. Smooth Q-Learning seeks a trade-off between two incompatible learning approaches, maximum-based learning and average-based learning, by dynamically adjusting the learning rate according to the value of the temporal-difference error so that the algorithm lies somewhere between the two. We compare Smooth Q-Learning against several algorithms from the literature: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and a recent version of Lenient Q-Learning called Lenient Multiagent Reinforcement Learning 2. The results show that Smooth Q-Learning is highly effective in the sense that it achieves the highest number of convergent trials. Unlike competing algorithms, Smooth Q-Learning is also easy to tune and does not require storing additional information.
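The abstract does not reproduce the update rule itself, so the following Python sketch only illustrates the mechanism it describes: a learning rate that varies smoothly with the temporal-difference (TD) error, so that each update lands somewhere between average-based and maximum-based learning. Everything here is an assumption introduced for illustration; the function name smooth_q_update, the sigmoid weighting, and the parameters alpha, beta, and tau are hypothetical, and the shape is deliberately reminiscent of a smoothed Hysteretic Q-Learning rule (one of the paper's baselines), not the authors' actual formula.

```python
import numpy as np

def smooth_q_update(Q, s, a, r, s_next,
                    alpha=0.5, beta=0.05, tau=10.0, gamma=0.95):
    """One hypothetical Smooth Q-Learning step for an independent learner.

    The effective learning rate moves smoothly between beta (cautious,
    applied to strongly negative TD errors, mimicking maximum-based
    learning) and alpha (applied to positive TD errors, as in ordinary
    average-based Q-learning).
    """
    # Standard Q-learning TD error for a tabular Q indexed [state, action].
    td = r + gamma * np.max(Q[s_next]) - Q[s, a]
    # Sigmoid weight in (0, 1): close to 1 for positive TD errors,
    # decaying toward 0 for negative ones; tau sets the sharpness.
    w = 1.0 / (1.0 + np.exp(-tau * td))
    # The effective learning rate interpolates smoothly between beta and alpha.
    lr = beta + (alpha - beta) * w
    Q[s, a] += lr * td
    return Q

# Toy usage: a two-state, two-action problem.
Q = np.zeros((2, 2))
Q = smooth_q_update(Q, s=0, a=1, r=1.0, s_next=1)
```

As tau approaches zero the weight flattens and the rule reduces to ordinary Q-learning with a fixed rate; as tau grows, updates driven by negative TD errors are increasingly damped and the behaviour approaches maximum-based (optimistic) learning, which is the trade-off the abstract attributes to Smooth Q-Learning.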


Code Availability

The code used in this paper is available from the corresponding author upon reasonable request.


Acknowledgements

The first author’s work is supported by the National Center for Scientific and Technical Research, Morocco.


Author information


Contributions

All authors contributed to the proposed approach. Elmehdi Amhraoui implemented the algorithm and wrote the first draft of the paper. Tawfik Masrour supervised the work and reviewed and edited the final version of the paper. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Elmehdi Amhraoui.

Ethics declarations

Ethics approval

Not applicable.

Consent to participate

All authors of this research paper have consented to participate in the research study.

Consent for publication

All authors of this research paper have read and approved the submitted version.

Conflict of Interest

The authors declare that they have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Elmehdi Amhraoui and Tawfik Masrour contributed equally to this work.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Amhraoui, E., Masrour, T. Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games. J Intell Robot Syst 108, 65 (2023). https://doi.org/10.1007/s10846-023-01917-z



  • DOI: https://doi.org/10.1007/s10846-023-01917-z
