
Exponential moving average based multiagent reinforcement learning algorithms

Published in: Artificial Intelligence Review 45, 299–332 (2016)

Abstract

Two multi-agent policy iteration learning algorithms are proposed in this work. Both algorithms use the exponential moving average (EMA) approach, with the Q-learning algorithm as a basis, to update the policy of the learning agent so that the agent's policy converges to a Nash equilibrium policy. The first algorithm uses a constant learning rate when updating the policy of the learning agent, while the second uses two decaying learning rates that are updated according to either the Win-or-Learn-Fast (WoLF) mechanism or the Win-or-Learn-Slow (WoLS) mechanism. The WoLS mechanism is introduced in this article to make the algorithm learn fast when it is winning and learn slowly when it is losing. The second algorithm uses the rewards received by the learning agent to decide which of the two mechanisms to apply to the game being learned. Both algorithms are analyzed theoretically, and a mathematical proof of convergence to a pure Nash equilibrium is provided for each. For games with a mixed Nash equilibrium, the analysis shows that the second algorithm converges to an equilibrium, though not explicitly that this equilibrium is a Nash equilibrium; simulation results, however, indicate that it does converge to a Nash equilibrium. The proposed algorithms are examined on a variety of matrix and stochastic games, and the simulation results show that the second algorithm converges in a wider variety of situations than state-of-the-art multi-agent reinforcement learning algorithms.
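To make the mechanism described above concrete, the Python sketch below shows one way an EMA-based policy update can sit on top of Q-learning, with the EMA rate switched between a slower and a faster value depending on whether the agent appears to be winning or losing. This is a minimal, single-state (matrix-game) illustration: the class name, the form of the EMA target vector, and the running-average "winning" test are assumptions made for the example, not the authors' exact update equations.

```python
import numpy as np

class EMAQLearner:
    """Single-state (matrix-game) learner: Q-learning values plus an
    exponential-moving-average policy update, in the spirit of the abstract."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.95,
                 eta_win=0.01, eta_lose=0.04):
        self.n = n_actions
        self.Q = np.zeros(n_actions)                   # action-value estimates
        self.pi = np.full(n_actions, 1.0 / n_actions)  # current mixed policy
        self.alpha, self.gamma = alpha, gamma
        # Two EMA rates: eta_lose > eta_win gives WoLF-style behaviour
        # (learn fast when losing); swapping them gives WoLS-style behaviour.
        self.eta_win, self.eta_lose = eta_win, eta_lose
        self.avg_reward = 0.0                          # running reward baseline (assumed criterion)

    def choose_action(self, rng):
        return rng.choice(self.n, p=self.pi)

    def update(self, action, reward, next_value=0.0):
        # Standard Q-learning update of the chosen action's value.
        self.Q[action] += self.alpha * (reward + self.gamma * next_value - self.Q[action])

        # EMA target vector: move toward the chosen action when it is greedy,
        # otherwise toward the remaining actions (assumed form for illustration).
        target = np.zeros(self.n)
        if action == np.argmax(self.Q):
            target[action] = 1.0
        else:
            target[:] = 1.0 / (self.n - 1)
            target[action] = 0.0

        # Choose the EMA rate from the received rewards: here "winning" simply
        # means beating a running reward average (an assumed stand-in for the
        # paper's reward-based mechanism selection).
        eta = self.eta_win if reward >= self.avg_reward else self.eta_lose
        self.avg_reward += 0.01 * (reward - self.avg_reward)

        # Exponential moving average of the policy toward the target vector.
        self.pi = (1.0 - eta) * self.pi + eta * target
        self.pi /= self.pi.sum()                       # renormalize for numerical safety
```

Using a single constant eta in place of the switched rates corresponds to the first, constant-learning-rate variant described in the abstract.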


Author information

Correspondence to Mostafa D. Awheda.


Cite this article

Awheda, M.D., Schwartz, H.M. Exponential moving average based multiagent reinforcement learning algorithms. Artif Intell Rev 45, 299–332 (2016). https://doi.org/10.1007/s10462-015-9447-5
