Abstract
In this article, we introduce the Smooth Q-Learning algorithm for independent learners (distributed, non-communicating learners) in cooperative Markov games. Smooth Q-Learning aims to solve the relative over-generalization and stochasticity problems while also performing well in the presence of other non-coordination factors, such as miscoordination (also known as the Pareto selection problem) and non-stationarity. Smooth Q-Learning seeks a trade-off between two incompatible learning approaches, maximum-based learning and average-based learning, by dynamically adjusting the learning rate according to the temporal-difference error so that the algorithm always lies between the two. We compare Smooth Q-Learning against several algorithms from the literature: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and a recent version of Lenient Q-Learning called Lenient Multiagent Reinforcement Learning 2. The results show that Smooth Q-Learning is highly effective, achieving the highest number of convergent trials. Unlike competing algorithms, Smooth Q-Learning is also easy to tune and does not require storing additional information.
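The abstract describes the core mechanism: each independent learner applies a standard Q-learning update, but the learning rate is adjusted as a function of the temporal-difference error so that updates fall between average-based learning (equal weight to all outcomes) and maximum-based learning (favoring positive outcomes). The paper's exact update rule is not reproduced here; the following is a minimal sketch under the assumption of a sigmoid-shaped interpolation between a high rate `alpha` for positive TD errors and a low rate `beta` for negative ones (the names `smooth_lr`, `q_update`, and the constants are illustrative, not from the paper):

```python
import numpy as np

def smooth_lr(delta, alpha=0.5, beta=0.1, k=10.0):
    # Hypothetical smooth interpolation: the effective learning rate
    # moves toward alpha for strongly positive TD errors and toward
    # beta for strongly negative ones, via a sigmoid of the TD error.
    w = 1.0 / (1.0 + np.exp(-k * delta))  # weight -> 1 as delta >> 0
    return beta + (alpha - beta) * w

def q_update(Q, s, a, r, s_next, gamma=0.95):
    # Standard independent-learner Q-learning target and TD error;
    # only the step size depends on the sign/magnitude of the error.
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += smooth_lr(delta) * delta
    return delta
```

With `delta = 0` the effective rate is the midpoint `(alpha + beta) / 2`, and large negative errors (e.g. a penalty caused by a teammate's exploration) are absorbed with the small rate `beta`, which is the intuition behind tolerating relative over-generalization while still tracking stochastic rewards.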
Code Availability
The code used in this paper is available from the corresponding author upon reasonable request.
Acknowledgements
The first author’s work is supported by the National Center for Scientific and Technical Research, Morocco.
Funding
The first author’s work is supported by the National Center for Scientific and Technical Research, Morocco.
Author information
Contributions
All authors contributed to the proposed approach. Elmehdi Amhraoui implemented the algorithm and wrote the first draft of the paper. Tawfik Masrour supervised the work and reviewed and edited the final version of the paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
All authors of this research paper have consented to participate in the research study.
Consent for publication
All authors of this research paper have read and approved the submitted version.
Conflict of Interests
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Elmehdi Amhraoui and Tawfik Masrour contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Amhraoui, E., Masrour, T. Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games. J Intell Robot Syst 108, 65 (2023). https://doi.org/10.1007/s10846-023-01917-z