Abstract
In this article, we introduce the Smooth Q-Learning algorithm for independent learners (distributed, non-communicating learners) in cooperative Markov games. Smooth Q-Learning aims to solve the relative over-generalization and stochasticity problems while also performing well in the presence of other non-coordination factors, such as miscoordination (also known as the Pareto selection problem) and non-stationarity. Smooth Q-Learning seeks a trade-off between two incompatible learning approaches, maximum-based learning and average-based learning, by dynamically adjusting the learning rate according to the temporal-difference error so that the algorithm always lies between the two. We compare Smooth Q-Learning against several algorithms from the literature: Decentralized Q-Learning, Distributed Q-Learning, Hysteretic Q-Learning, and a recent version of Lenient Q-Learning called Lenient Multiagent Reinforcement Learning 2. The results show that Smooth Q-Learning is highly effective, achieving the highest number of convergent trials. Unlike competing algorithms, Smooth Q-Learning is also easy to tune and does not require storing additional information.
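The abstract describes the core mechanism: each independent learner applies a standard Q-learning update, but the learning rate is adjusted as a function of the temporal-difference error so that updates fall between average-based learning (equal weight to all outcomes) and maximum-based learning (favoring positive outcomes). The paper's exact update rule is not reproduced here; the following is a minimal sketch under the assumption of a sigmoid-shaped interpolation between a high rate `alpha` for positive TD errors and a low rate `beta` for negative ones (the names `smooth_lr`, `q_update`, and the constants are illustrative, not from the paper):

```python
import numpy as np

def smooth_lr(delta, alpha=0.5, beta=0.1, k=10.0):
    # Hypothetical smooth interpolation: the effective learning rate
    # moves toward alpha for strongly positive TD errors and toward
    # beta for strongly negative ones, via a sigmoid of the TD error.
    w = 1.0 / (1.0 + np.exp(-k * delta))  # weight -> 1 as delta >> 0
    return beta + (alpha - beta) * w

def q_update(Q, s, a, r, s_next, gamma=0.95):
    # Standard independent-learner Q-learning target and TD error;
    # only the step size depends on the sign/magnitude of the error.
    delta = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += smooth_lr(delta) * delta
    return delta
```

With `delta = 0` the effective rate is the midpoint `(alpha + beta) / 2`, and large negative errors (e.g. a penalty caused by a teammate's exploration) are absorbed with the small rate `beta`, which is the intuition behind tolerating relative over-generalization while still tracking stochastic rewards.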
Code Availability
The code used in this paper is available from the corresponding author upon reasonable request.
Acknowledgements
The first author’s work is supported by the National Center for Scientific and Technical Research, Morocco.
Funding
The first author’s work is supported by the National Center for Scientific and Technical Research, Morocco.
Author information
Contributions
All authors contributed to the proposed approach. Elmehdi Amhraoui implemented the algorithm and wrote the first draft of the paper. Tawfik Masrour supervised the work and reviewed and edited the final version of the paper. All authors read and approved the final manuscript.
Ethics declarations
Ethics approval
Not applicable.
Consent to participate
All authors of this research paper have consented to participate in the research study.
Consent for publication
All authors of this research paper have read and approved the submitted version.
Conflict of Interests
The authors declare that they have no conflicts of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Elmehdi Amhraoui and Tawfik Masrour contributed equally to this work.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Amhraoui, E., Masrour, T. Smooth Q-Learning: An Algorithm for Independent Learners in Stochastic Cooperative Markov Games. J Intell Robot Syst 108, 65 (2023). https://doi.org/10.1007/s10846-023-01917-z