Distributed Reinforcement Learning for Robot Teams: a Review

Group Robotics (M Gini and F Amigoni, Section Editors)

Current Robotics Reports

Abstract

Purpose of Review

Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds to thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, and search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review provides an analysis of the state of the art in distributed MARL for multi-robot cooperation.

Recent Findings

Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. Building upon the “centralized training, decentralized execution” (CTDE) paradigm, recent MARL approaches fall into independent learning, centralized critic, value decomposition, and communication learning classes (the value decomposition class is illustrated in the sketch following this abstract). Cooperative behaviors are demonstrated through AI benchmarks and fundamental real-world robotic capabilities such as multi-robot motion/path planning.

Summary

This survey reviews the challenges surrounding decentralized, model-free MARL for multi-robot cooperation, as well as the existing classes of approaches. We present benchmarks and robotic applications, along with a discussion of current open avenues for research.
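
As an illustration of the value decomposition class named above, the following is a minimal sketch in the spirit of value-decomposition networks (VDN) [33] under centralized training, decentralized execution. It is not the implementation of any surveyed method: the PyTorch networks, the toy transition batch, and all names (AgentQNet, N_AGENTS, etc.) are illustrative assumptions. Each robot holds a local utility network it can execute on its own observation, while training fits the sum of these utilities to a single team reward.

# Minimal VDN-style sketch (illustrative assumptions, not a surveyed method's code):
# N robots each learn a local utility Q_i(o_i, a_i); training sums these
# utilities into a joint value fitted to a shared team reward (centralized
# training), while each robot acts on its own observation alone
# (decentralized execution).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS, BATCH, GAMMA = 3, 8, 4, 32, 0.99

class AgentQNet(nn.Module):
    """Per-agent utility network; sizes chosen arbitrarily for illustration."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, obs):            # obs: (batch, OBS_DIM)
        return self.net(obs)           # per-action utilities

agents = nn.ModuleList([AgentQNet() for _ in range(N_AGENTS)])
optimizer = torch.optim.Adam(agents.parameters(), lr=1e-3)

# A toy batch of transitions with a single shared team reward.
obs = torch.randn(BATCH, N_AGENTS, OBS_DIM)
actions = torch.randint(N_ACTIONS, (BATCH, N_AGENTS))
team_reward = torch.randn(BATCH)
next_obs = torch.randn(BATCH, N_AGENTS, OBS_DIM)

# Centralized training: the joint value is the SUM of per-agent utilities,
# so one team-level TD error trains every decentralized network at once.
q_taken = torch.stack(
    [agents[i](obs[:, i]).gather(1, actions[:, i:i + 1]).squeeze(1)
     for i in range(N_AGENTS)], dim=1).sum(dim=1)
with torch.no_grad():
    q_next = torch.stack(
        [agents[i](next_obs[:, i]).max(dim=1).values
         for i in range(N_AGENTS)], dim=1).sum(dim=1)
loss = ((team_reward + GAMMA * q_next - q_taken) ** 2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Decentralized execution: robot i acts greedily on its own observation,
# with no communication and no joint value needed at run time.
action_robot_0 = agents[0](obs[:1, 0]).argmax(dim=1)

Because only the sum of utilities is supervised, a single team-level TD error trains every robot's network, yet execution requires no communication or joint value. Richer factorizations such as QMIX [36] replace the sum with a learned monotonic mixing network.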

Data Availability

Not applicable

Code Availability

Not applicable

Supplementary Information

Not applicable

Notes

  1. In some cases, one may assume \(\pi^*\) to be deterministic, i.e., a discrete one-hot distribution or a continuous (Dirac) \(\delta\) distribution. In such cases, it is common to simplify it to \(\pi^*: S \rightarrow A\), since it then returns a single (optimal) action in each state.
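
For reference, the deterministic case admits a standard closed form: acting greedily with respect to the optimal action-value function, here denoted \(Q^*\) (standard notation, cf. Sutton and Barto [12], not notation introduced in this excerpt):

\[
\pi^*(s) \;=\; \operatorname*{arg\,max}_{a \in A} Q^*(s, a), \qquad \forall s \in S .
\]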

References

Papers of particular interest, published recently, have been highlighted as: • Of importance •• Of major importance

  1. Nägele L, Schierl A, Hoffmann A, Reif W. Multi-robot cooperation for assembly: Automated planning and optimization. In: International conference on informatics in control, automation and robotics. Springer; 2019. p. 169–192.

  2. Ma K, Ma Z, Liu L, Sukhatme GS. Multi-robot informative and adaptive planning for persistent environmental monitoring. In: Distributed autonomous robotic systems, the 13th international symposium, DARS 2016, Natural History Museum, London, UK, November 7-9, 2016. vol. 6; 2016. p. 285–298. Available from: https://doi.org/10.1007/978-3-319-73008-0_20

  3. Wang H, Zhang C, Song Y, Pang B. Master-Followed multiple robots cooperation SLAM adapted to search and rescue scenarios. In: IEEE international conference on information and automation, ICIA 2017, Macau, SAR, China, July 18-20, 2017; 2017. p. 579–585. Available from: https://doi.org/10.1109/ICInfA.2017.8078975

  4. Oroojlooyjadid A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. 2019. CoRR. arXiv:1908.03963

  5. Hernandez-Leal P, Kartal B, Taylor ME. A survey and critique of multiagent deep reinforcement learning. Auton Agents Multi Agent Syst. 2019;33(6):750–97. https://doi.org/10.1007/s10458-019-09421-1.

  6. Nguyen TT, Nguyen ND, Nahavandi S. Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans Cybern. 2020;50(9):3826–39. https://doi.org/10.1109/TCYB.2020.2977374.

  7. Gronauer S, Diepold K. Multi-agent deep reinforcement learning: a survey. Artif Intell Rev. 2022;55(2):895–943. https://doi.org/10.1007/s10462-021-09996-w.

  8. Papoudakis G, Christianos F, Rahman A, Albrecht SV. Dealing with non-stationarity in multi-agent deep reinforcement learning. 2019. CoRR. arXiv:1906.04737

  9. Cortés J, Egerstedt M. Coordinated control of multi-robot systems: A survey. SICE Journal of Control, Measurement, and System Integration. 2017;10(6):495–503.

  10. Tuci E, Alkilabi MHM, Akanyeti O. Cooperative object transport in multi-robot systems: a review of the State-of-the-Art. Frontiers Robotics AI. 2018;5:59. https://doi.org/10.3389/frobt.2018.00059.

  11. Feng Z, Hu G, Sun Y, Soon J. An overview of collaborative robotic manipulation in multi-robot systems. Annu Rev Control. 2020;49:113–27. https://doi.org/10.1016/j.arcontrol.2020.02.002.

  12. Sutton RS, Barto AG. Reinforcement learning: An introduction. MIT Press; 2018.

  13. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing atari with deep reinforcement learning. 2013. CoRR. arXiv:1312.5602

  14. Sutton RS, McAllester DA, Singh SP, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems 12, [NIPS Conference, Denver, Colorado, USA, November 29 - December 4, 1999]; 1999. p. 1057–1063.

  15. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd international conference on machine learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. vol. 48; 2016. p. 1928–1937. Available from: http://proceedings.mlr.press/v48/mniha16.html.

  16. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. 2017. CoRR. arXiv:1707.06347

  17. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft Actor-Critic: Off-Policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80; 2018. p. 1856–1865. Available from: http://proceedings.mlr.press/v80/haarnoja18b.html.

  18. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. In: 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings; 2016. Available from: arXiv:1509.02971

  19. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38. https://doi.org/10.1109/MSP.2017.2743240.

  20. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. J Artif Intell Res. 1996;4:237–85. https://doi.org/10.1613/jair.301.

  21. François-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J. An introduction to deep reinforcement learning. Found Trends Mach Learn. 2018;11(3–4):219–354. https://doi.org/10.1561/2200000071.

  22. Gupta JK, Egorov M, Kochenderfer MJ. Cooperative multi-agent control using deep reinforcement learning. In: Autonomous agents and multiagent systems - AAMAS 2017 Workshops, Best Papers, São Paulo, Brazil, May 8-12, 2017, Revised Selected Papers. vol. 10642; 2017. p. 66–83. Available from: https://doi.org/10.1007/978-3-319-71682-4_5

  23. Samvelyan M, Rashid T, de Witt CS, Farquhar G, Nardelli N, Rudner TGJ, et al. The StarCraft multi-agent challenge. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, AAMAS ’19, Montreal, QC, Canada, May 13-17, 2019; 2019. p. 2186–2188. Available from: http://dl.acm.org/citation.cfm?id=3332052.

  24. Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations. In: Proceedings of the Thirty-Second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018; 2018. p. 1495–1502. Available from: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17007.

  25. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. 2017. CoRR. arXiv:1712.01815

  26. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–33. https://doi.org/10.1038/nature14236.

  27. Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, et al. Multiagent cooperation and competition with deep reinforcement learning. 2015. CoRR. arXiv:1511.08779

  28. Resnick C, Eldridge W, Ha D, Britz D, Foerster JN, Togelius J, et al. Pommerman: A multi-agent playground. In: Joint Proceedings of the AIIDE 2018 workshops co-located with 14th AAAI conference on artificial intelligence and interactive digital entertainment (AIIDE 2018), Edmonton, Canada, November 13-14, 2018. vol. 2282; 2018. Available from: http://ceur-ws.org/Vol-2282/MARLO_104.pdf.

  29. Suarez J, Du Y, Isola P, Mordatch I. Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. 2019. CoRR. arXiv:1903.00784

  30. • Kim W, Park J, Sung Y. Communication in multi-agent reinforcement learning: intention sharing. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021; 2021. Available from: https://openreview.net/forum?id=qpsl2dR9twy. This work allows globally communicating agents to share intent by modeling the environment dynamics and other agents’ actions.

  31. •• Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, et al. Learning to schedule communication in multi-agent reinforcement learning. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019; 2019. Available from: https://openreview.net/forum?id=SJxu5iR9KQ. This work allows agents to learn to estimate the importance of their observations/knowledge, to selectively broadcast continuous messages to the whole team.

  32. Chu T, Wang J, Codecà L, Li Z. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans Intell Transp Syst. 2020;21(3):1086–95. https://doi.org/10.1109/TITS.2019.2901791.

  33. Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi VF, Jaderberg M, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018; 2018. p. 2085–2087. Available from: http://dl.acm.org/citation.cfm?id=3238080.

  34. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4-9, 2017, Long Beach, CA, USA; 2017. p. 6379–6390. Available from: https://proceedings.neurips.cc/paper/2017/hash/68a9750337a418a86fe06c1991a1d64c-Abstract.html.

  35. Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the Thirty-Second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018; 2018. p. 2974–2982. Available from: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17193.

  36. Rashid T, Samvelyan M, de Witt CS, Farquhar G, Foerster JN, Whiteson S. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80; 2018. p. 4292–4301. Available from: http://proceedings.mlr.press/v80/rashid18a.html.

  37. Hausknecht MJ, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015; 2015. p. 29–37. Available from: http://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11673.

  38. Foerster JN, Assael YM, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems 2016, December 5-10, 2016, Barcelona, Spain; 2016. p. 2137–2145. Available from: https://proceedings.neurips.cc/paper/2016/hash/c7635bfd99248a2cdef8249ef7bfbef4-Abstract.html.

  39. Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems 2016, December 5-10, 2016, Barcelona, Spain; 2016. p. 2244–2252. Available from: https://proceedings.neurips.cc/paper/2016/hash/55b1927fdafef39c48e5b73b5d61ea60-Abstract.html.

  40. Singh A, Jain T, Sukhbaatar S. Learning when to communicate at scale in multiagent cooperative and competitive tasks. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019; 2019. Available from: https://openreview.net/forum?id=rye7knCqK7.

  41. Lauer M, Riedmiller MA. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the seventeenth international conference on machine learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000; 2000. p. 535–542.

  42. Matignon L, Laurent GJ, Fort-Piat NL. Hysteretic q-learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: 2007 IEEE/RSJ international conference on intelligent robots and systems, October 29 - November 2, 2007, Sheraton Hotel and Marina, San Diego, California, USA; 2007. p. 64–69. Available from: https://doi.org/10.1109/IROS.2007.4399095

  43. Panait L, Sullivan K, Luke S. Lenient learners in cooperative multiagent systems. In: 5th international joint conference on autonomous agents and multiagent systems (AAMAS 2006), Hakodate, Japan, May 8-12, 2006; 2006. p. 801–803. Available from: https://doi.org/10.1145/1160633.1160776

  44. Palmer G, Tuyls K, Bloembergen D, Savani R. Lenient multi-agent deep reinforcement learning. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018; 2018. p. 443–451. Available from: http://dl.acm.org/citation.cfm?id=3237451.

  45. Omidshafiei S, Pazis J, Amato C, How JP, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. vol. 70; 2017. p. 2681–2690. Available from: http://proceedings.mlr.press/v70/omidshafiei17a.html.

  46. •• Jaques N, Lazaridou A, Hughes E, Gülçehre Ç, Ortega PA, Strouse D, et al. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 3040–3049. Available from: http://proceedings.mlr.press/v97/jaques19a.html. This work proposes to encourage cooperation among agents by relying on an intrinsic reward that aims at maximizing their influence over each other.

  47. Sun M, Devlin S, Hofmann K, Whiteson S. Monotonic improvement guarantees under non-stationarity for decentralized PPO. 2022. CoRR. arXiv:2202.00082

  48. • Yu C, Velu A, Vinitsky E, Wang Y, Bayen AM, Wu Y. The surprising effectiveness of MAPPO in cooperative, multi-agent games. 2021. CoRR. arXiv:2103.01955. This work shows that independent learning using on-policy algorithms such as PPO can perform effectively in fully cooperative MARL environments.

  49. Foerster JN, Nardelli N, Farquhar G, Afouras T, Torr PHS, Kohli P, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Proceedings of the 34th international conference on machine learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. vol. 70; 2017. p. 1146–1155. Available from: http://proceedings.mlr.press/v70/foerster17b.html.

  50. Tesauro G. Extending Q-learning to general adaptive multi-agent systems. In: Advances in neural information processing systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada]; 2003. p. 871–878. Available from: https://proceedings.neurips.cc/paper/2003/hash/e71e5cd119bbc5797164fb0cd7fd94a4-Abstract.html.

  51. •• Iqbal S, Sha F. Actor-Attention-Critic for multi-agent reinforcement learning. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 2961–2970. Available from: http://proceedings.mlr.press/v97/iqbal19a.html. This work uses an attention mechanism in the centralized critic to dynamically select relevant information.

  52. Mao H, Zhang Z, Xiao Z, Gong Z. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, AAMAS ’19, Montreal, QC, Canada, May 13-17, 2019; 2019. p. 1108–1116. Available from: http://dl.acm.org/citation.cfm?id=3331810.

  53. • Zhou M, Liu Z, Sui P, Li Y, Chung YY. Learning implicit credit assignment for cooperative multi-agent reinforcement learning. In: Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, December 6-12, 2020, virtual; 2020. Available from: https://proceedings.neurips.cc/paper/2020/hash/8977ecbb8cb82d77fb091c7a7f186163-Abstract.html. This work proposes a framework for implicit credit assignment which directly ascends approximate joint action value gradients of the centralized critic.

  54. • Son K, Kim D, Kang WJ, Hostallero D, Yi Y. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 5887–5896. Available from: http://proceedings.mlr.press/v97/son19a.html. This work aims to learn a general value factorization without any structural constraints by transforming the optimal value function into one which is easily factorizable.

  55. • Mahajan A, Rashid T, Samvelyan M, Whiteson S. MAVEN: Multi-agent variational exploration. In: Advances in neural information processing systems 32: Annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada; 2019. p. 7611–7622. Available from: https://proceedings.neurips.cc/paper/2019/hash/f816dc0acface7498e10496222e9db10-Abstract.html. This work extends QMIX and other value factorization methods by using a hierarchical policy to guide committed and temporally extended exploration.

  56. Mao H, Gong Z, Ni Y, Liu X, Wang Q, Ke W, et al. ACCNet: Actor-Coordinator-Critic Net for “Learning-to-Communicate” with deep multi-agent reinforcement learning. 2017. CoRR. arXiv:1706.03235

  57. Su J, Adams SC, Beling PA. Counterfactual multi-agent reinforcement learning with graph convolution communication. 2020. CoRR. arXiv:2004.00470

  58. Zhang SQ, Zhang Q, Lin J. Efficient communication in multi-agent reinforcement learning via variance based control. In: Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada; 2019. p. 3230–3239. Available from: https://proceedings.neurips.cc/paper/2019/hash/14cfdb59b5bda1fc245aadae15b1984a-Abstract.html.

  59. Jiang J, Lu Z. Learning attentional communication for multi-agent cooperation. In: Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada; 2018. p. 7265–7275. Available from: https://proceedings.neurips.cc/paper/2018/hash/6a8018b3a00b69c008601b8becae392b-Abstract.html.

  60. Jiang J, Dun C, Huang T, Lu Z. Graph convolutional reinforcement learning. In: 8th international conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020; 2020. Available from: https://openreview.net/forum?id=HkxdQkSYDB.

  61. • Ma Z, Luo Y, Ma H. Distributed heuristic multi-agent path finding with communication. In: IEEE international conference on robotics and automation, ICRA 2021, Xi’an, China, May 30 - June 5, 2021; 2021. p. 8699–8705. Available from: https://doi.org/10.1109/ICRA48506.2021.9560748. This work formalizes the multiagent system as a graph and lets agents communicate with neighbors via graph convolution to solve the multi-agent pathfinding task.

  62. • Liu Y, Wang W, Hu Y, Hao J, Chen X, Gao Y. Multi-agent game abstraction via graph attention neural network. In: The Thirty-Fourth AAAI conference on artificial intelligence, AAAI 2020, The Thirty-Second innovative applications of artificial intelligence conference, IAAI 2020, The Tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020; 2020. p. 7211–7218. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/6211. This work uses a two-stage attention network to estimate whether two agents should communicate and the importance of that communication instance.

  63. Kong X, Xin B, Liu F, Wang Y. Revisiting the master-slave architecture in multi-agent deep reinforcement learning. 2017. CoRR. arXiv:1712.07305

  64. Das A, Gervet T, Romoff J, Batra D, Parikh D, Rabbat M, et al. TarMAC: targeted multi-agent communication. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 1538–1546. Available from: http://proceedings.mlr.press/v97/das19a.html.

  65. Blumenkamp J, Prorok A. The emergence of adversarial communication in multi-agent reinforcement learning. In: 4th conference on robot learning, CoRL 2020, 16-18 November 2020, Virtual Event / Cambridge, MA, USA. vol. 155; 2020. p. 1394–1414. Available from: https://proceedings.mlr.press/v155/blumenkamp21a.html.

  66. Du Y, Liu B, Moens V, Liu Z, Ren Z, Wang J, et al. Learning correlated communication topology in multi-agent reinforcement learning. In: AAMAS ’21: 20th international conference on autonomous agents and multiagent systems, Virtual Event, United Kingdom, May 3-7, 2021; 2021. p. 456–464. Available from: https://dl.acm.org/doi/10.5555/3463952.3464010.

  67. •• Li W, Chen H, Jin B, Tan W, Zha H, Wang X. Multi-agent path finding with prioritized communication learning. 2022. CoRR. arXiv:2202.03634. This work relies on a conventional coupled planner to guide the learning of the communication topology in multi-agent pathfinding.

  68. Pesce E, Montana G. Connectivity-driven communication in multi-agent reinforcement learning through diffusion processes on graphs. 2020. CoRR. arXiv:2002.05233

  69. Peng P, Yuan Q, Wen Y, Yang Y, Tang Z, Long H, et al. Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. 2017. CoRR. arXiv:1703.10069

  70. Pesce E, Montana G. Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication. Mach Learn. 2020;109(9–10):1727–47. https://doi.org/10.1007/s10994-019-05864-5.

  71. Wang Y, Sartoretti G. FCMNet: Full communication memory net for team-level cooperation in multi-agent systems. 2022. CoRR. arXiv:2201.11994

  72. Agarwal A, Kumar S, Sycara KP, Lewis M. Learning transferable cooperative behavior in multi-agent teams. In: Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS ’20, Auckland, New Zealand, May 9-13, 2020; 2020. p. 1741–1743. Available from: https://dl.acm.org/doi/abs/10.5555/3398761.3398967.

  73. Cao K, Lazaridou A, Lanctot M, Leibo JZ, Tuyls K, Clark S. Emergent communication through negotiation. In: 6th international conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings; 2018. Available from: https://openreview.net/forum?id=Hk6WhagRW.

  74. Shaw S, Wenzel E, Walker A, Sartoretti G. ForMIC: Foraging via multiagent RL with implicit communication. IEEE Robotics Autom Lett. 2022;7(2):4877–84. https://doi.org/10.1109/LRA.2022.3152688.

  75. Ma Z, Luo Y, Pan J. Learning selective communication for multi-agent path finding. 2021. CoRR. arXiv:2109.05413

  76. Freed B, James R, Sartoretti G, Choset H. Sparse discrete communication learning for multi-agent cooperation through backpropagation. In: IEEE/RSJ international conference on intelligent robots and systems, IROS 2020, Las Vegas, NV, USA, October 24, 2020 - January 24, 2021; 2020. p. 7993–7998. Available from: https://doi.org/10.1109/IROS45743.2020.9341079

  77. Leibo JZ, Zambaldi VF, Lanctot M, Marecki J, Graepel T. Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th conference on autonomous agents and multiagent systems, AAMAS 2017, São Paulo, Brazil, May 8-12, 2017; 2017. p. 464–473. Available from: http://dl.acm.org/citation.cfm?id=3091194.

  78. Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the Fifteenth national conference on artificial intelligence and tenth innovative applications of artificial intelligence conference, AAAI 98, IAAI 98, July 26-30, 1998, Madison, Wisconsin, USA; 1998. p. 746–752. Available from: http://www.aaai.org/Library/AAAI/1998/aaai98-106.php.

  79. Pérolat J, Leibo JZ, Zambaldi VF, Beattie C, Tuyls K, Graepel T. A multi-agent reinforcement learning model of common-pool resource appropriation. In: Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4-9, 2017, Long Beach, CA, USA; 2017. p. 3643–3652. Available from: https://proceedings.neurips.cc/paper/2017/hash/2b0f658cbffd284984fb11d90254081f-Abstract.html.

  80. Wang WZ, Beliaev M, Biyik E, Lazar DA, Pedarsani R, Sadigh D. Emergent prosociality in multi-agent games through gifting. In: Proceedings of the Thirtieth international joint conference on artificial intelligence, IJCAI 2021, Virtual Event / Montreal, Canada, 19-27 August 2021; 2021. p. 434–442. Available from: https://doi.org/10.24963/ijcai.2021/61

  81. Mihai D, Hare JS. Learning to draw: emergent communication through sketching. 2021. CoRR. arXiv:2106.02067

  82. Li F, Bowling M. Ease-of-teaching and language structure from emergent communication. In: Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada; 2019. p. 15825–15835. Available from: https://proceedings.neurips.cc/paper/2019/hash/b0cf188d74589db9b23d5d277238a929-Abstract.html.

  83. Lewis M, Yarats D, Dauphin YN, Parikh D, Batra D. Deal or no deal? End-to-end learning for negotiation dialogues. 2017. CoRR. arXiv:1706.05125

  84. Noukhovitch M, LaCroix T, Lazaridou A, Courville AC. Emergent communication under competition. In: AAMAS ’21: 20th international conference on autonomous agents and multiagent systems, virtual event, United Kingdom, May 3-7, 2021; 2021. p. 974–982. Available from: https://dl.acm.org/doi/10.5555/3463952.3464066.

  85. Liu S, Lever G, Wang Z, Merel J, Eslami SMA, Hennes D, et al. From motor control to team play in simulated humanoid football. 2021. CoRR. arXiv:2105.12196

  86. Ding G, Koh JJ, Merckaert K, Vanderborght B, Nicotra MM, Heckman C, et al. Distributed reinforcement learning for cooperative multi-robot object manipulation. In: Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS ’20, Auckland, New Zealand, May 9-13, 2020; 2020. p. 1831–1833. Available from: https://dl.acm.org/doi/abs/10.5555/3398761.3398997.

  87. Cao Y, Sun Z, Sartoretti G. DAN: Decentralized attention-based neural network to solve the minmax multiple traveling salesman problem. 2021. CoRR. arXiv:2109.04205

  88. Hu J, Zhang H, Song L, Schober R, Poor HV. Cooperative internet of UAVs: Distributed trajectory design by multi-agent deep reinforcement learning. IEEE Trans Commun. 2020;68(11):6807–21. https://doi.org/10.1109/TCOMM.2020.3013599.

  89. •• Fan T, Long P, Liu W, Pan J. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Int J Robotics Res. 2020;39(7). https://doi.org/10.1177/0278364920916531. This work presents a deep-RL-based decentralized collision-avoidance framework for multi-robot path planning based on sensor inputs, with numerical and experimental validation results.

  90. Xiao Y, Hoffman J, Xia T, Amato C. Learning multi-robot decentralized macro-action-based policies via a centralized Q-Net. In: 2020 IEEE international conference on robotics and automation, ICRA 2020, Paris, France, May 31 - August 31, 2020; 2020. p. 10695–10701. Available from: https://doi.org/10.1109/ICRA40945.2020.9196684

  91. Wang D, Deng H, Pan Z. MRCDRL: Multi-robot coordination with deep reinforcement learning. Neurocomputing. 2020;406:68–76. https://doi.org/10.1016/j.neucom.2020.04.028.

  92. • Sartoretti G, Kerr J, Shi Y, Wagner G, Kumar TKS, Koenig S, et al. PRIMAL: pathfinding via reinforcement and imitation multi-agent learning. IEEE Robotics Autom Lett. 2019;4(3):2378–85. https://doi.org/10.1109/LRA.2019.2903261. This work introduces a scalable framework for multi-agent pathfinding which utilizes RL and imitation learning to learn decentralized policies that can scale to more than a thousand agents.

  93. Damani M, Luo Z, Wenzel E, Sartoretti G. PRIMAL2: Pathfinding via reinforcement and imitation multi-agent learning - Lifelong. IEEE Robotics Autom Lett. 2021;6(2):2666–73. https://doi.org/10.1109/LRA.2021.3062803.

  94. Marchesini E, Farinelli A. Centralizing state-values in dueling networks for multi-robot reinforcement learning mapless navigation. In: IEEE/RSJ international conference on intelligent robots and systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021; 2021. p. 4583–4588. Available from: https://doi.org/10.1109/IROS51168.2021.9636349

  95. Huang Y, Wu S, Mu Z, Long X, Chu S, Zhao G. A multi-agent reinforcement learning method for swarm robots in space collaborative exploration. In: 2020 6th international conference on control, automation and robotics (ICCAR); 2020. p. 139–144.

  96. He Z, Dong L, Song C, Sun C. Multi-agent soft actor-critic based hybrid motion planner for mobile robots. 2021. CoRR. arXiv:2112.06594

  97. de Witt CS, Peng B, Kamienny P, Torr PHS, Böhmer W, Whiteson S. Deep multi-agent reinforcement learning for decentralized continuous cooperative control. 2020. CoRR. arXiv:2003.06709

  98. Freed B, Kapoor A, Abraham I, Schneider JG, Choset H. Learning cooperative multi-agent policies with partial reward decoupling. 2021. CoRR. arXiv:2112.12740

  99. García J, Fernández F. A comprehensive survey on safe reinforcement learning. J Mach Learn Res. 2015;16:1437–80.

  100. Shalev-Shwartz S, Shammah S, Shashua A. Safe, Multi-agent, reinforcement learning for autonomous driving. 2016. CoRR. arXiv:1610.03295

  101. Zhang W, Bastani O. MAMPS: Safe multi-agent reinforcement learning via model predictive shielding. 2019. CoRR. arXiv:1910.12639

  102. Savva M, Chang AX, Dosovitskiy A, Funkhouser TA, Koltun V. MINOS: Multimodal indoor simulator for navigation in complex environments. 2017. CoRR. arXiv:1712.03931

  103. Erickson ZM, Gangaram V, Kapusta A, Liu CK, Kemp CC. Assistive gym: a physics simulation framework for assistive robotics. In: 2020 IEEE international conference on robotics and automation, ICRA 2020, Paris, France, May 31 - August 31, 2020; 2020. p. 10169–10176. Available from: https://doi.org/10.1109/ICRA40945.2020.9197411

  104. Fan L, Zhu Y, Zhu J, Liu Z, Zeng O, Gupta A, et al. SURREAL: Open-source reinforcement learning framework and robot manipulation benchmark. In: 2nd annual conference on robot learning, CoRL 2018, Zürich, Switzerland, 29-31 October 2018, Proceedings. vol. 87; 2018. p. 767–782. Available from: http://proceedings.mlr.press/v87/fan18a.html.

  105. Freed B, Sartoretti G, Choset H. Simultaneous policy and discrete communication learning for multi-agent cooperation. IEEE Robotics Autom Lett. 2020;5(2):2498–505. https://doi.org/10.1109/LRA.2020.2972862.

  106. Henderson P, Islam R, Bachman P, Pineau J, Precup D, Meger D. Deep reinforcement learning that matters. In: Proceedings of the Thirty-Second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018; 2018. p. 3207–3214. Available from: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16669.

  107. Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: Applications on robotics. J Intell Robotic Syst. 2017;86(2):153–73. https://doi.org/10.1007/s10846-017-0468-y.

  108. Thuruthel TG, Falotico E, Renda F, Laschi C. Model-based reinforcement learning for closed-loop dynamic control of soft robotic manipulators. IEEE Trans Robotics. 2019;35(1):124–34. https://doi.org/10.1109/TRO.2018.2878318.

  109. Thananjeyan B, Balakrishna A, Rosolia U, Li F, McAllister R, Gonzalez JE, et al. Safety augmented value estimation from demonstrations (SAVED): Safe deep model-based RL for sparse cost robotic tasks. IEEE Robotics Autom Lett. 2020;5(2):3612–9. https://doi.org/10.1109/LRA.2020.2976272.

  110. Zhang K, Yang Z, Basar T. Decentralized multi-agent reinforcement learning with networked agents: recent advances. Frontiers Inf Technol Electron Eng. 2021;22(6):802–14. https://doi.org/10.1631/FITEE.1900661.

  111. Zhang K, Yang Z, Liu H, Zhang T, Basar T. Fully decentralized multi-agent reinforcement learning with networked agents. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80; 2018. p. 5867–5876. Available from: http://proceedings.mlr.press/v80/zhang18n.html.

Acknowledgements

Not applicable

Funding

This work was supported by the Singapore Ministry of Education Academic Research Fund Tier 1.

Author information

Contributions

All authors contributed to the study conception, study design, original draft preparation, and review and editing. Yutong Wang performed figure design, as well as the literature search and data analysis for communication learning methods, challenges, and benchmarks, and for open avenues for research. Mehul Damani performed the literature search and data analysis for the background, as well as for communication-free cooperation methods and challenges. Pamela Wang co-performed the literature search and data analysis for communication learning and benchmarks, as well as Fig. 3. Yuhong Cao performed the literature search and data analysis for multi-robot applications, as well as Fig. 4. Guillaume Sartoretti performed project administration and supervision, and was involved in the literature search for all aspects of this survey.

Corresponding author

Correspondence to Guillaume Sartoretti.

Ethics declarations

Ethics Approval

Not applicable

Consent to Participate

Not applicable

Consent for Publication

Not applicable

Conflict of Interest

The authors declare no competing interests.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

This article is part of the Topical Collection on Group Robotics

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Damani, M., Wang, P. et al. Distributed Reinforcement Learning for Robot Teams: a Review. Curr Robot Rep 3, 239–257 (2022). https://doi.org/10.1007/s43154-022-00091-8

