Distributed Reinforcement Learning for Robot Teams: a Review

Group Robotics (M Gini and F Amigoni, Section Editors)

Current Robotics Reports

Abstract

Purpose of Review

Recent advances in sensing, actuation, and computation have opened the door to multi-robot systems consisting of hundreds to thousands of robots, with promising applications to automated manufacturing, disaster relief, harvesting, last-mile delivery, port/airport operations, and search and rescue. The community has leveraged model-free multi-agent reinforcement learning (MARL) to devise efficient, scalable controllers for multi-robot systems (MRS). This review provides an analysis of the state of the art in distributed MARL for multi-robot cooperation.

Recent Findings

Decentralized MRS face fundamental challenges, such as non-stationarity and partial observability. Building upon the “centralized training, decentralized execution” (CTDE) paradigm, recent MARL approaches fall into independent learning, centralized critic, value decomposition, and communication learning classes (the value decomposition class is illustrated in the sketch following this abstract). Cooperative behaviors are demonstrated through AI benchmarks and fundamental real-world robotic capabilities such as multi-robot motion/path planning.

Summary

This survey reviews the challenges surrounding decentralized, model-free MARL for multi-robot cooperation, as well as the existing classes of approaches. We present benchmarks and robotic applications, along with a discussion of current open avenues for research.
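
As an illustration of the value decomposition class named above, the following is a minimal sketch in the spirit of value-decomposition networks (VDN) [33] under centralized training, decentralized execution. It is not the implementation of any surveyed method: the PyTorch networks, the toy transition batch, and all names (AgentQNet, N_AGENTS, etc.) are illustrative assumptions. Each robot holds a local utility network it can execute on its own observation, while training fits the sum of these utilities to a single team reward.

# Minimal VDN-style sketch (illustrative assumptions, not a surveyed method's code):
# N robots each learn a local utility Q_i(o_i, a_i); training sums these
# utilities into a joint value fitted to a shared team reward (centralized
# training), while each robot acts on its own observation alone
# (decentralized execution).
import torch
import torch.nn as nn

N_AGENTS, OBS_DIM, N_ACTIONS, BATCH, GAMMA = 3, 8, 4, 32, 0.99

class AgentQNet(nn.Module):
    """Per-agent utility network; sizes chosen arbitrarily for illustration."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(OBS_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

    def forward(self, obs):            # obs: (batch, OBS_DIM)
        return self.net(obs)           # per-action utilities

agents = nn.ModuleList([AgentQNet() for _ in range(N_AGENTS)])
optimizer = torch.optim.Adam(agents.parameters(), lr=1e-3)

# A toy batch of transitions with a single shared team reward.
obs = torch.randn(BATCH, N_AGENTS, OBS_DIM)
actions = torch.randint(N_ACTIONS, (BATCH, N_AGENTS))
team_reward = torch.randn(BATCH)
next_obs = torch.randn(BATCH, N_AGENTS, OBS_DIM)

# Centralized training: the joint value is the SUM of per-agent utilities,
# so one team-level TD error trains every decentralized network at once.
q_taken = torch.stack(
    [agents[i](obs[:, i]).gather(1, actions[:, i:i + 1]).squeeze(1)
     for i in range(N_AGENTS)], dim=1).sum(dim=1)
with torch.no_grad():
    q_next = torch.stack(
        [agents[i](next_obs[:, i]).max(dim=1).values
         for i in range(N_AGENTS)], dim=1).sum(dim=1)
loss = ((team_reward + GAMMA * q_next - q_taken) ** 2).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()

# Decentralized execution: robot i acts greedily on its own observation,
# with no communication and no joint value needed at run time.
action_robot_0 = agents[0](obs[:1, 0]).argmax(dim=1)

Because only the sum of utilities is supervised, a single team-level TD error trains every robot's network, yet execution requires no communication or joint value. Richer factorizations such as QMIX [36] replace the sum with a learned monotonic mixing network.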

Data Availability

Not applicable

Code Availability

Not applicable

Supplementary Information

Not applicable

Notes

  1. In some cases, one may assume \(\pi^*\) to be deterministic, i.e., a discrete one-hot distribution or a continuous (Dirac) \(\delta\) distribution. In such cases, it is common to simplify it to \(\pi^*: S \rightarrow A\), since it then returns a single (optimal) action in each state.
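
For reference, the deterministic case admits a standard closed form: acting greedily with respect to the optimal action-value function, here denoted \(Q^*\) (standard notation, cf. Sutton and Barto [12], not notation introduced in this excerpt):

\[
\pi^*(s) \;=\; \operatorname*{arg\,max}_{a \in A} Q^*(s, a), \qquad \forall s \in S .
\]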

References

Papers of particular interest, published recently, have been highlighted as: • Of importance •• Of major importance

  1. Nägele L, Schierl A, Hoffmann A, Reif W. Multi-robot cooperation for assembly: Automated planning and optimization. In: International conference on informatics in control, automation and robotics. Springer; 2019. p. 169–192.

  2. Ma K, Ma Z, Liu L, Sukhatme GS. Multi-robot informative and adaptive planning for persistent environmental monitoring. In: Distributed autonomous robotic systems, the 13th international symposium, DARS 2016, Natural History Museum, London, UK, November 7-9, 2016. vol. 6; 2016. p. 285–298. Available from: https://doi.org/10.1007/978-3-319-73008-0_20

  3. Wang H, Zhang C, Song Y, Pang B. Master-Followed multiple robots cooperation SLAM adapted to search and rescue scenarios. In: IEEE international conference on information and automation, ICIA 2017, Macau, SAR, China, July 18-20, 2017; 2017. p. 579–585. Available from: https://doi.org/10.1109/ICInfA.2017.8078975

  4. Oroojlooyjadid A, Hajinezhad D. A review of cooperative multi-agent deep reinforcement learning. 2019. CoRR. arXiv:1908.03963

  5. Hernandez-Leal P, Kartal B, Taylor ME. A survey and critique of multiagent deep reinforcement learning. Auton Agents Multi Agent Syst. 2019;33(6):750–97. https://doi.org/10.1007/s10458-019-09421-1.

  6. Nguyen TT, Nguyen ND, Nahavandi S. Deep reinforcement learning for multiagent systems: a review of challenges, solutions, and applications. IEEE Trans Cybern. 2020;50(9):3826–39. https://doi.org/10.1109/TCYB.2020.2977374.

  7. Gronauer S, Diepold K. Multi-agent deep reinforcement learning: a survey. Artif Intell Rev. 2022;55(2):895–943. https://doi.org/10.1007/s10462-021-09996-w.

  8. Papoudakis G, Christianos F, Rahman A, Albrecht SV. Dealing with non-stationarity in multi-agent deep reinforcement learning. 2019. CoRR. arXiv:1906.04737

  9. Cortés J, Egerstedt M. Coordinated control of multi-robot systems: A survey. SICE Journal of Control, Measurement, and System Integration. 2017;10(6):495–503.

  10. Tuci E, Alkilabi MHM, Akanyeti O. Cooperative object transport in multi-robot systems: a review of the State-of-the-Art. Frontiers Robotics AI. 2018;5:59. https://doi.org/10.3389/frobt.2018.00059.

  11. Feng Z, Hu G, Sun Y, Soon J. An overview of collaborative robotic manipulation in multi-robot systems. Annu Rev Control. 2020;49:113–27. https://doi.org/10.1016/j.arcontrol.2020.02.002.

  12. Sutton RS, Barto AG. Reinforcement learning: An introduction. MIT Press; 2018.

  13. Mnih V, Kavukcuoglu K, Silver D, Graves A, Antonoglou I, Wierstra D, et al. Playing atari with deep reinforcement learning. 2013. CoRR. arXiv:1312.5602

  14. Sutton RS, McAllester DA, Singh SP, Mansour Y. Policy gradient methods for reinforcement learning with function approximation. In: Advances in neural information processing systems 12, [NIPS Conference, Denver, Colorado, USA, November 29 - December 4, 1999]; 1999. p. 1057–1063.

  15. Mnih V, Badia AP, Mirza M, Graves A, Lillicrap TP, Harley T, et al. Asynchronous methods for deep reinforcement learning. In: Proceedings of the 33rd international conference on machine learning, ICML 2016, New York City, NY, USA, June 19-24, 2016. vol. 48; 2016. p. 1928–1937. Available from: http://proceedings.mlr.press/v48/mniha16.html.

  16. Schulman J, Wolski F, Dhariwal P, Radford A, Klimov O. Proximal policy optimization algorithms. 2017. CoRR. arXiv:1707.06347

  17. Haarnoja T, Zhou A, Abbeel P, Levine S. Soft Actor-Critic: Off-Policy maximum entropy deep reinforcement learning with a stochastic actor. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80; 2018. p. 1856–1865. Available from: http://proceedings.mlr.press/v80/haarnoja18b.html.

  18. Lillicrap TP, Hunt JJ, Pritzel A, Heess N, Erez T, Tassa Y, et al. Continuous control with deep reinforcement learning. In: 4th international conference on learning representations, ICLR 2016, San Juan, Puerto Rico, May 2-4, 2016, Conference Track Proceedings; 2016. Available from: arXiv:1509.02971

  19. Arulkumaran K, Deisenroth MP, Brundage M, Bharath AA. Deep reinforcement learning: a brief survey. IEEE Signal Process Mag. 2017;34(6):26–38. https://doi.org/10.1109/MSP.2017.2743240.

  20. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: A survey. J Artif Intell Res. 1996;4:237–85. https://doi.org/10.1613/jair.301.

  21. François-Lavet V, Henderson P, Islam R, Bellemare MG, Pineau J. An introduction to deep reinforcement learning. Found Trends Mach Learn. 2018;11(3–4):219–354. https://doi.org/10.1561/2200000071.

  22. Gupta JK, Egorov M, Kochenderfer MJ. Cooperative multi-agent control using deep reinforcement learning. In: Autonomous agents and multiagent systems - AAMAS 2017 Workshops, Best Papers, São Paulo, Brazil, May 8-12, 2017, Revised Selected Papers. vol. 10642; 2017. p. 66–83. Available from: https://doi.org/10.1007/978-3-319-71682-4_5

  23. Samvelyan M, Rashid T, de Witt CS, Farquhar G, Nardelli N, Rudner TGJ, et al. The StarCraft multi-agent challenge. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, AAMAS ’19, Montreal, QC, Canada, May 13-17, 2019; 2019. p. 2186–2188. Available from: http://dl.acm.org/citation.cfm?id=3332052.

  24. Mordatch I, Abbeel P. Emergence of grounded compositional language in multi-agent populations. In: Proceedings of the Thirty-Second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018; 2018. p. 1495–1502. Available from: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17007.

  25. Silver D, Hubert T, Schrittwieser J, Antonoglou I, Lai M, Guez A, et al. Mastering chess and shogi by self-play with a general reinforcement learning algorithm. 2017. CoRR. arXiv:1712.01815

  26. Mnih V, Kavukcuoglu K, Silver D, Rusu AA, Veness J, Bellemare MG, et al. Human-level control through deep reinforcement learning. Nature. 2015;518(7540):529–33. https://doi.org/10.1038/nature14236.

  27. Tampuu A, Matiisen T, Kodelja D, Kuzovkin I, Korjus K, Aru J, et al. Multiagent cooperation and competition with deep reinforcement learning. 2015. CoRR. arXiv:1511.08779

  28. Resnick C, Eldridge W, Ha D, Britz D, Foerster JN, Togelius J, et al. Pommerman: A multi-agent playground. In: Joint Proceedings of the AIIDE 2018 workshops co-located with 14th AAAI conference on artificial intelligence and interactive digital entertainment (AIIDE 2018), Edmonton, Canada, November 13-14, 2018. vol. 2282; 2018. Available from: http://ceur-ws.org/Vol-2282/MARLO_104.pdf.

  29. Suarez J, Du Y, Isola P, Mordatch I. Neural MMO: A massively multiagent game environment for training and evaluating intelligent agents. 2019. CoRR. arXiv:1903.00784

  30. • Kim W, Park J, Sung Y. Communication in multi-agent reinforcement learning: intention sharing. In: 9th International conference on learning representations, ICLR 2021, Virtual Event, Austria, May 3-7, 2021; 2021. Available from: https://openreview.net/forum?id=qpsl2dR9twy. This work allows globally communicating agents to share intent by modeling the environment dynamics and other agents’ actions.

  31. •• Kim D, Moon S, Hostallero D, Kang WJ, Lee T, Son K, et al. Learning to schedule communication in multi-agent reinforcement learning. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019; 2019. Available from: https://openreview.net/forum?id=SJxu5iR9KQ. This work allows agents to learn to estimate the importance of their observations/knowledge, to selectively broadcast continuous messages to the whole team.

  32. Chu T, Wang J, Codecà L, Li Z. Multi-agent deep reinforcement learning for large-scale traffic signal control. IEEE Trans Intell Transp Syst. 2020;21(3):1086–95. https://doi.org/10.1109/TITS.2019.2901791.

  33. Sunehag P, Lever G, Gruslys A, Czarnecki WM, Zambaldi VF, Jaderberg M, et al. Value-decomposition networks for cooperative multi-agent learning based on team reward. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018; 2018. p. 2085–2087. Available from: http://dl.acm.org/citation.cfm?id=3238080.

  34. Lowe R, Wu Y, Tamar A, Harb J, Abbeel P, Mordatch I. Multi-agent actor-critic for mixed cooperative-competitive environments. In: Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4-9, 2017, Long Beach, CA, USA; 2017. p. 6379–6390. Available from: https://proceedings.neurips.cc/paper/2017/hash/68a9750337a418a86fe06c1991a1d64c-Abstract.html.

  35. Foerster JN, Farquhar G, Afouras T, Nardelli N, Whiteson S. Counterfactual multi-agent policy gradients. In: Proceedings of the Thirty-Second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018; 2018. p. 2974–2982. Available from: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/17193.

  36. Rashid T, Samvelyan M, de Witt CS, Farquhar G, Foerster JN, Whiteson S. QMIX: monotonic value function factorisation for deep multi-agent reinforcement learning. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80; 2018. p. 4292–4301. Available from: http://proceedings.mlr.press/v80/rashid18a.html.

  37. Hausknecht MJ, Stone P. Deep recurrent Q-learning for partially observable MDPs. In: 2015 AAAI Fall Symposia, Arlington, Virginia, USA, November 12-14, 2015; 2015. p. 29–37. Available from: http://www.aaai.org/ocs/index.php/FSS/FSS15/paper/view/11673.

  38. Foerster JN, Assael YM, de Freitas N, Whiteson S. Learning to communicate with deep multi-agent reinforcement learning. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems 2016, December 5-10, 2016, Barcelona, Spain; 2016. p. 2137–2145. Available from: https://proceedings.neurips.cc/paper/2016/hash/c7635bfd99248a2cdef8249ef7bfbef4-Abstract.html.

  39. Sukhbaatar S, Szlam A, Fergus R. Learning multiagent communication with backpropagation. In: Advances in neural information processing systems 29: Annual conference on neural information processing systems 2016, December 5-10, 2016, Barcelona, Spain; 2016. p. 2244–2252. Available from: https://proceedings.neurips.cc/paper/2016/hash/55b1927fdafef39c48e5b73b5d61ea60-Abstract.html.

  40. Singh A, Jain T, Sukhbaatar S. Learning when to communicate at scale in multiagent cooperative and competitive tasks. In: 7th international conference on learning representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019; 2019. Available from: https://openreview.net/forum?id=rye7knCqK7.

  41. Lauer M, Riedmiller MA. An algorithm for distributed reinforcement learning in cooperative multi-agent systems. In: Proceedings of the seventeenth international conference on machine learning (ICML 2000), Stanford University, Stanford, CA, USA, June 29 - July 2, 2000; 2000. p. 535–542.

  42. Matignon L, Laurent GJ, Fort-Piat NL. Hysteretic q-learning : an algorithm for decentralized reinforcement learning in cooperative multi-agent teams. In: 2007 IEEE/RSJ international conference on intelligent robots and systems, October 29 - November 2, 2007, Sheraton Hotel and Marina, San Diego, California, USA; 2007. p. 64–69. Available from: https://doi.org/10.1109/IROS.2007.4399095

  43. Panait L, Sullivan K, Luke S. Lenient learners in cooperative multiagent systems. In: 5th international joint conference on autonomous agents and multiagent systems (AAMAS 2006), Hakodate, Japan, May 8-12, 2006; 2006. p. 801–803. Available from: https://doi.org/10.1145/1160633.1160776

  44. Palmer G, Tuyls K, Bloembergen D, Savani R. Lenient multi-agent deep reinforcement learning. In: Proceedings of the 17th international conference on autonomous agents and multiagent systems, AAMAS 2018, Stockholm, Sweden, July 10-15, 2018; 2018. p. 443–451. Available from: http://dl.acm.org/citation.cfm?id=3237451.

  45. Omidshafiei S, Pazis J, Amato C, How JP, Vian J. Deep decentralized multi-task multi-agent reinforcement learning under partial observability. In: Proceedings of the 34th international conference on machine learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. vol. 70; 2017. p. 2681–2690. Available from: http://proceedings.mlr.press/v70/omidshafiei17a.html.

  46. •• Jaques N, Lazaridou A, Hughes E, Gülçehre Ç, Ortega PA, Strouse D, et al. Social influence as intrinsic motivation for multi-agent deep reinforcement learning. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 3040–3049. Available from: http://proceedings.mlr.press/v97/jaques19a.html. This work proposes to encourage cooperation among agents by relying on an intrinsic reward that aims at maximizing their influence over each other.

  47. Sun M, Devlin S, Hofmann K, Whiteson S. Monotonic improvement guarantees under non-stationarity for decentralized PPO. 2022. CoRR. arXiv:2202.00082

  48. • Yu C, Velu A, Vinitsky E, Wang Y, Bayen AM, Wu Y. The surprising effectiveness of MAPPO in cooperative, multi-agent games. 2021. CoRR. arXiv:2103.01955. This work shows that independent learning using on-policy algorithms such as PPO can perform effectively in fully cooperative MARL environments.

  49. Foerster JN, Nardelli N, Farquhar G, Afouras T, Torr PHS, Kohli P, et al. Stabilising experience replay for deep multi-agent reinforcement learning. In: Proceedings of the 34th international conference on machine learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017. vol. 70; 2017. p. 1146–1155. Available from: http://proceedings.mlr.press/v70/foerster17b.html.

  50. Tesauro G. Extending Q-learning to general adaptive multi-agent systems. In: Advances in neural information processing systems 16 [Neural Information Processing Systems, NIPS 2003, December 8-13, 2003, Vancouver and Whistler, British Columbia, Canada]; 2003. p. 871–878. Available from: https://proceedings.neurips.cc/paper/2003/hash/e71e5cd119bbc5797164fb0cd7fd94a4-Abstract.html.

  51. •• Iqbal S, Sha F. Actor-Attention-Critic for multi-agent reinforcement learning. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 2961–2970. Available from: http://proceedings.mlr.press/v97/iqbal19a.html. This work uses an attention mechanism in the centralized critic to dynamically select relevant information.

  52. Mao H, Zhang Z, Xiao Z, Gong Z. Modelling the dynamic joint policy of teammates with attention multi-agent DDPG. In: Proceedings of the 18th international conference on autonomous agents and multiagent systems, AAMAS ’19, Montreal, QC, Canada, May 13-17, 2019; 2019. p. 1108–1116. Available from: http://dl.acm.org/citation.cfm?id=3331810.

  53. • Zhou M, Liu Z, Sui P, Li Y, Chung YY. Learning implicit credit assignment for cooperative multi-agent reinforcement learning. In: Advances in neural information processing systems 33: annual conference on neural information processing systems 2020, NeurIPS 2020, December 6-12, 2020, virtual; 2020. Available from: https://proceedings.neurips.cc/paper/2020/hash/8977ecbb8cb82d77fb091c7a7f186163-Abstract.html. This work proposes a framework for implicit credit assignment which directly ascends approximate joint action value gradients of the centralized critic.

  54. • Son K, Kim D, Kang WJ, Hostallero D, Yi Y. QTRAN: Learning to factorize with transformation for cooperative multi-agent reinforcement learning. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 5887–5896. Available from: http://proceedings.mlr.press/v97/son19a.html. This work aims to learn a general value factorization without any structural constraints by transforming the optimal value function into one which is easily factorizable.

  55. • Mahajan A, Rashid T, Samvelyan M, Whiteson S. MAVEN: Multi-agent variational exploration. In: Advances in neural information processing systems 32: Annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada; 2019. p. 7611–7622. Available from: https://proceedings.neurips.cc/paper/2019/hash/f816dc0acface7498e10496222e9db10-Abstract.html. This work extends QMIX and other value factorization methods by using a hierarchical policy to guide committed and temporally extended exploration.

  56. Mao H, Gong Z, Ni Y, Liu X, Wang Q, Ke W, et al. ACCNet: Actor-Coordinator-Critic Net for “Learning-to-Communicate” with deep multi-agent reinforcement learning. 2017. CoRR. arXiv:1706.03235

  57. Su J, Adams SC, Beling PA. Counterfactual multi-agent reinforcement learning with graph convolution communication. 2020. CoRR. arXiv:2004.00470

  58. Zhang SQ, Zhang Q, Lin J. Efficient communication in multi-agent reinforcement learning via variance based control. In: Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada; 2019. p. 3230–3239. Available from: https://proceedings.neurips.cc/paper/2019/hash/14cfdb59b5bda1fc245aadae15b1984a-Abstract.html.

  59. Jiang J, Lu Z. Learning attentional communication for multi-agent cooperation. In: Advances in neural information processing systems 31: annual conference on neural information processing systems 2018, NeurIPS 2018, December 3-8, 2018, Montréal, Canada; 2018. p. 7265–7275. Available from: https://proceedings.neurips.cc/paper/2018/hash/6a8018b3a00b69c008601b8becae392b-Abstract.html.

  60. Jiang J, Dun C, Huang T, Lu Z. Graph convolutional reinforcement learning. In: 8th international conference on learning representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020; 2020. Available from: https://openreview.net/forum?id=HkxdQkSYDB.

  61. • Ma Z, Luo Y, Ma H. Distributed heuristic multi-agent path finding with communication. In: IEEE international conference on robotics and automation, ICRA 2021, Xi’an, China, May 30 - June 5, 2021; 2021. p. 8699–8705. Available from: https://doi.org/10.1109/ICRA48506.2021.9560748. This work formalizes the multiagent system as a graph and lets agents communicate with neighbors via graph convolution to solve the multi-agent pathfinding task.

  62. • Liu Y, Wang W, Hu Y, Hao J, Chen X, Gao Y. Multi-agent game abstraction via graph attention neural network. In: The Thirty-Fourth AAAI conference on artificial intelligence, AAAI 2020, The Thirty-Second innovative applications of artificial intelligence conference, IAAI 2020, The Tenth AAAI symposium on educational advances in artificial intelligence, EAAI 2020, New York, NY, USA, February 7-12, 2020; 2020. p. 7211–7218. Available from: https://ojs.aaai.org/index.php/AAAI/article/view/6211. This work uses a two-stage attention network to estimate whether two agents should communicate and the importance of that communication instance.

  63. Kong X, Xin B, Liu F, Wang Y. Revisiting the master-slave architecture in multi-agent deep reinforcement learning. 2017. CoRR. arXiv:1712.07305

  64. Das A, Gervet T, Romoff J, Batra D, Parikh D, Rabbat M, et al. TarMAC: targeted multi-agent communication. In: Proceedings of the 36th international conference on machine learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA. vol. 97; 2019. p. 1538–1546. Available from: http://proceedings.mlr.press/v97/das19a.html.

  65. Blumenkamp J, Prorok A. The emergence of adversarial communication in multi-agent reinforcement learning. In: 4th conference on robot learning, CoRL 2020, 16-18 November 2020, Virtual Event / Cambridge, MA, USA. vol. 155; 2020. p. 1394–1414. Available from: https://proceedings.mlr.press/v155/blumenkamp21a.html.

  66. Du Y, Liu B, Moens V, Liu Z, Ren Z, Wang J, et al. Learning correlated communication topology in multi-agent reinforcement learning. In: AAMAS ’21: 20th international conference on autonomous agents and multiagent systems, Virtual Event, United Kingdom, May 3-7, 2021; 2021. p. 456–464. Available from: https://dl.acm.org/doi/10.5555/3463952.3464010.

  67. •• Li W, Chen H, Jin B, Tan W, Zha H, Wang X. Multi-agent path finding with prioritized communication learning. 2022. CoRR. arXiv:2202.03634. This work relies on a conventional coupled planner to guide the learning of the communication topology in multi-agent pathfinding.

  68. Pesce E, Montana G. Connectivity-driven communication in multi-agent reinforcement learning through diffusion processes on graphs. 2020. CoRR. arXiv:2002.05233

  69. Peng P, Yuan Q, Wen Y, Yang Y, Tang Z, Long H, et al. Multiagent bidirectionally-coordinated nets for learning to play StarCraft combat games. 2017. CoRR. arXiv:1703.10069

  70. Pesce E, Montana G. Improving coordination in small-scale multi-agent deep reinforcement learning through memory-driven communication. Mach Learn. 2020;109(9–10):1727–47. https://doi.org/10.1007/s10994-019-05864-5.

  71. Wang Y, Sartoretti G. FCMNet: Full communication memory net for team-level cooperation in multi-agent systems. 2022. CoRR. arXiv:2201.11994

  72. Agarwal A, Kumar S, Sycara KP, Lewis M. Learning transferable cooperative behavior in multi-agent teams. In: Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS ’20, Auckland, New Zealand, May 9-13, 2020; 2020. p. 1741–1743. Available from: https://dl.acm.org/doi/abs/10.5555/3398761.3398967.

  73. Cao K, Lazaridou A, Lanctot M, Leibo JZ, Tuyls K, Clark S. Emergent communication through negotiation. In: 6th international conference on learning representations, ICLR 2018, Vancouver, BC, Canada, April 30 - May 3, 2018, Conference Track Proceedings; 2018. Available from: https://openreview.net/forum?id=Hk6WhagRW.

  74. Shaw S, Wenzel E, Walker A, Sartoretti G. ForMIC: Foraging via multiagent RL with implicit communication. IEEE Robotics Autom Lett. 2022;7(2):4877–84. https://doi.org/10.1109/LRA.2022.3152688.

  75. Ma Z, Luo Y, Pan J. Learning selective communication for multi-agent path finding. 2021. CoRR. arXiv:2109.05413

  76. Freed B, James R, Sartoretti G, Choset H. Sparse discrete communication learning for multi-agent cooperation through backpropagation. In: IEEE/RSJ international conference on intelligent robots and systems, IROS 2020, Las Vegas, NV, USA, October 24, 2020 - January 24, 2021; 2020. p. 7993–7998. Available from: https://doi.org/10.1109/IROS45743.2020.9341079

  77. Leibo JZ, Zambaldi VF, Lanctot M, Marecki J, Graepel T. Multi-agent reinforcement learning in sequential social dilemmas. In: Proceedings of the 16th conference on autonomous agents and multiagent systems, AAMAS 2017, São Paulo, Brazil, May 8-12, 2017; 2017. p. 464–473. Available from: http://dl.acm.org/citation.cfm?id=3091194.

  78. Claus C, Boutilier C. The dynamics of reinforcement learning in cooperative multiagent systems. In: Proceedings of the Fifteenth national conference on artificial intelligence and tenth innovative applications of artificial intelligence conference, AAAI 98, IAAI 98, July 26-30, 1998, Madison, Wisconsin, USA; 1998. p. 746–752. Available from: http://www.aaai.org/Library/AAAI/1998/aaai98-106.php.

  79. Pérolat J, Leibo JZ, Zambaldi VF, Beattie C, Tuyls K, Graepel T. A multi-agent reinforcement learning model of common-pool resource appropriation. In: Advances in neural information processing systems 30: annual conference on neural information processing systems 2017, December 4-9, 2017, Long Beach, CA, USA; 2017. p. 3643–3652. Available from: https://proceedings.neurips.cc/paper/2017/hash/2b0f658cbffd284984fb11d90254081f-Abstract.html.

  80. Wang WZ, Beliaev M, Biyik E, Lazar DA, Pedarsani R, Sadigh D. Emergent prosociality in multi-agent games through gifting. In: Proceedings of the Thirtieth international joint conference on artificial intelligence, IJCAI 2021, Virtual Event / Montreal, Canada, 19-27 August 2021; 2021. p. 434–442. Available from: https://doi.org/10.24963/ijcai.2021/61

  81. Mihai D, Hare JS. Learning to draw: emergent communication through sketching. 2021. CoRR. arXiv:2106.02067

  82. Li F, Bowling M. Ease-of-teaching and language structure from emergent communication. In: Advances in neural information processing systems 32: annual conference on neural information processing systems 2019, NeurIPS 2019, December 8-14, 2019, Vancouver, BC, Canada; 2019. p. 15825–15835. Available from: https://proceedings.neurips.cc/paper/2019/hash/b0cf188d74589db9b23d5d277238a929-Abstract.html.

  83. Lewis M, Yarats D, Dauphin YN, Parikh D, Batra D. Deal or no deal? End-to-end learning for negotiation dialogues. 2017. CoRR. arXiv:1706.05125

  84. Noukhovitch M, LaCroix T, Lazaridou A, Courville AC. Emergent communication under competition. In: AAMAS ’21: 20th international conference on autonomous agents and multiagent systems, virtual event, United Kingdom, May 3-7, 2021; 2021. p. 974–982. Available from: https://dl.acm.org/doi/10.5555/3463952.3464066.

  85. Liu S, Lever G, Wang Z, Merel J, Eslami SMA, Hennes D, et al. From motor control to team play in simulated humanoid football. 2021. CoRR. arXiv:2105.12196

  86. Ding G, Koh JJ, Merckaert K, Vanderborght B, Nicotra MM, Heckman C, et al. Distributed reinforcement learning for cooperative multi-robot object manipulation. In: Proceedings of the 19th international conference on autonomous agents and multiagent systems, AAMAS ’20, Auckland, New Zealand, May 9-13, 2020; 2020. p. 1831–1833. Available from: https://dl.acm.org/doi/abs/10.5555/3398761.3398997.

  87. Cao Y, Sun Z, Sartoretti G. DAN: Decentralized attention-based neural network to solve the minmax multiple traveling salesman problem. 2021. CoRR. arXiv:2109.04205

  88. Hu J, Zhang H, Song L, Schober R, Poor HV. Cooperative internet of UAVs: Distributed trajectory design by multi-agent deep reinforcement learning. IEEE Trans Commun. 2020;68(11):6807–21. https://doi.org/10.1109/TCOMM.2020.3013599.

  89. •• Fan T, Long P, Liu W, Pan J. Distributed multi-robot collision avoidance via deep reinforcement learning for navigation in complex scenarios. Int J Robotics Res. 2020;39(7). https://doi.org/10.1177/0278364920916531. This work presents a deep-RL-based decentralized collision-avoidance framework for multi-robot path planning based on sensor inputs, with numerical and experimental validation results.

  90. Xiao Y, Hoffman J, Xia T, Amato C. Learning multi-robot decentralized macro-action-based policies via a centralized Q-Net. In: 2020 IEEE international conference on robotics and automation, ICRA 2020, Paris, France, May 31 - August 31, 2020; 2020. p. 10695–10701. Available from: https://doi.org/10.1109/ICRA40945.2020.9196684

  91. Wang D, Deng H, Pan Z. MRCDRL: Multi-robot coordination with deep reinforcement learning. Neurocomputing. 2020;406:68–76. https://doi.org/10.1016/j.neucom.2020.04.028.

  92. • Sartoretti G, Kerr J, Shi Y, Wagner G, Kumar TKS, Koenig S, et al. PRIMAL: pathfinding via reinforcement and imitation multi-agent learning. IEEE Robotics Autom Lett. 2019;4(3):2378–85. https://doi.org/10.1109/LRA.2019.2903261. This work introduces a scalable framework for multi-agent pathfinding which utilizes RL and imitation learning to learn decentralized policies that can scale to more than a thousand agents.

  93. Damani M, Luo Z, Wenzel E, Sartoretti G. PRIMAL2: Pathfinding via reinforcement and imitation multi-agent learning - Lifelong. IEEE Robotics Autom Lett. 2021;6(2):2666–73. https://doi.org/10.1109/LRA.2021.3062803.

  94. Marchesini E, Farinelli A. Centralizing state-values in dueling networks for multi-robot reinforcement learning mapless navigation. In: IEEE/RSJ international conference on intelligent robots and systems, IROS 2021, Prague, Czech Republic, September 27 - Oct. 1, 2021; 2021. p. 4583–4588. Available from: https://doi.org/10.1109/IROS51168.2021.9636349

  95. Huang Y, Wu S, Mu Z, Long X, Chu S, Zhao G. A multi-agent reinforcement learning method for swarm robots in space collaborative exploration. In: 2020 6th international conference on control, automation and robotics (ICCAR); 2020. p. 139–144.

  96. He Z, Dong L, Song C, Sun C. Multi-agent soft actor-critic based hybrid motion planner for mobile robots. 2021. CoRR. arXiv:2112.06594

  97. de Witt CS, Peng B, Kamienny P, Torr PHS, Böhmer W, Whiteson S. Deep multi-agent reinforcement learning for decentralized continuous cooperative control. 2020. CoRR. arXiv:2003.06709

  98. Freed B, Kapoor A, Abraham I, Schneider JG, Choset H. Learning cooperative multi-agent policies with partial reward decoupling. 2021. CoRR. arXiv:2112.12740

  99. García J, Fernández F. A comprehensive survey on safe reinforcement learning. J Mach Learn Res. 2015;16:1437–80.

  100. Shalev-Shwartz S, Shammah S, Shashua A. Safe, Multi-agent, reinforcement learning for autonomous driving. 2016. CoRR. arXiv:1610.03295

  101. Zhang W, Bastani O. MAMPS: Safe multi-agent reinforcement learning via model predictive shielding. 2019. CoRR. arXiv:1910.12639

  102. Savva M, Chang AX, Dosovitskiy A, Funkhouser TA, Koltun V. MINOS: Multimodal indoor simulator for navigation in complex environments. 2017. CoRR. arXiv:1712.03931

  103. Erickson ZM, Gangaram V, Kapusta A, Liu CK, Kemp CC. Assistive gym: a physics simulation framework for assistive robotics. In: 2020 IEEE international conference on robotics and automation, ICRA 2020, Paris, France, May 31 - August 31, 2020; 2020. p. 10169–10176. Available from: https://doi.org/10.1109/ICRA40945.2020.9197411

  104. Fan L, Zhu Y, Zhu J, Liu Z, Zeng O, Gupta A, et al. SURREAL: Open-source reinforcement learning framework and robot manipulation benchmark. In: 2nd annual conference on robot learning, CoRL 2018, Zürich, Switzerland, 29-31 October 2018, Proceedings. vol. 87; 2018. p. 767–782. Available from: http://proceedings.mlr.press/v87/fan18a.html.

  105. Freed B, Sartoretti G, Choset H. Simultaneous policy and discrete communication learning for multi-agent cooperation. IEEE Robotics Autom Lett. 2020;5(2):2498–505. https://doi.org/10.1109/LRA.2020.2972862.

  106. Henderson P, Islam R, Bachman P, Pineau J, Precup D, Meger D. Deep reinforcement learning that matters. In: Proceedings of the Thirty-Second AAAI conference on artificial intelligence, (AAAI-18), the 30th innovative applications of artificial intelligence (IAAI-18), and the 8th AAAI symposium on educational advances in artificial intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2-7, 2018; 2018. p. 3207–3214. Available from: https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16669.

  107. Polydoros AS, Nalpantidis L. Survey of model-based reinforcement learning: Applications on robotics. J Intell Robotic Syst. 2017;86(2):153–73. https://doi.org/10.1007/s10846-017-0468-y.

  108. Thuruthel TG, Falotico E, Renda F, Laschi C. Model-based reinforcement learning for closed-loop dynamic control of soft robotic manipulators. IEEE Trans Robotics. 2019;35(1):124–34. https://doi.org/10.1109/TRO.2018.2878318.

  109. Thananjeyan B, Balakrishna A, Rosolia U, Li F, McAllister R, Gonzalez JE, et al. Safety augmented value estimation from demonstrations (SAVED): Safe deep model-based RL for sparse cost robotic tasks. IEEE Robotics Autom Lett. 2020;5(2):3612–9. https://doi.org/10.1109/LRA.2020.2976272.

  110. Zhang K, Yang Z, Basar T. Decentralized multi-agent reinforcement learning with networked agents: recent advances. Frontiers Inf Technol Electron Eng. 2021;22(6):802–14. https://doi.org/10.1631/FITEE.1900661.

  111. Zhang K, Yang Z, Liu H, Zhang T, Basar T. Fully decentralized multi-agent reinforcement learning with networked agents. In: Proceedings of the 35th international conference on machine learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10-15, 2018. vol. 80; 2018. p. 5867–5876. Available from: http://proceedings.mlr.press/v80/zhang18n.html.

Acknowledgements

Not applicable

Funding

This work was supported by the Singapore Ministry of Education Academic Research Fund Tier 1.

Author information

Contributions

All authors contributed to the study conception, study design, original draft preparation, and review and editing. Yutong Wang performed figure design, as well as the literature search and data analysis for communication learning methods, challenges, and benchmarks, and for open avenues for research. Mehul Damani performed the literature search and data analysis for the background, as well as for communication-free cooperation methods and challenges. Pamela Wang co-performed the literature search and data analysis for communication learning and benchmarks, as well as Fig. 3. Yuhong Cao performed the literature search and data analysis for multi-robot applications, as well as Fig. 4. Guillaume Sartoretti performed project administration and supervision, and was involved in the literature search for all aspects of this survey.

Corresponding author

Correspondence to Guillaume Sartoretti.

Ethics declarations

Ethics Approval

Not applicable

Consent to Participate

Not applicable

Consent for Publication

Not applicable

Conflict of Interest

The authors declare no competing interests.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

This article is part of the Topical Collection on Group Robotics

Rights and permissions

Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, Y., Damani, M., Wang, P. et al. Distributed Reinforcement Learning for Robot Teams: a Review. Curr Robot Rep 3, 239–257 (2022). https://doi.org/10.1007/s43154-022-00091-8

