Abstract
The popularity of multi-agent deep reinforcement learning (MADRL) is growing rapidly with the demand for large-scale real-world tasks that require swarm intelligence, and many studies have improved MADRL from the perspective of network structures or reinforcement learning methods. However, the application of MADRL in the real world is hampered by the low sample efficiency of the models and the high cost of collecting data. To improve practicability, an extension to the current MADRL training paradigm that improves sample efficiency is imperative. To this end, this paper proposes PEDMA, a flexible plugin unit for MADRL. It consists of three techniques: (i) Parallel Environments (PE), to accelerate data acquisition; (ii) Experience Augmentation (EA), a novel data augmentation method that exploits the permutation invariance of the multi-agent system to reduce the cost of acquiring data; and (iii) Delayed Updated Policies (DUP), to improve the data utilization efficiency of the MADRL model. We demonstrate both theoretically and empirically that the proposed EA method improves the performance, data efficiency, and convergence speed of MADRL models. Experiments on three multi-agent benchmark tasks show that the MAAC model trained with PEDMA outperforms the baselines and state-of-the-art algorithms, and ablation studies show the contribution and necessity of each component in PEDMA.
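The core idea behind Experience Augmentation is that when agents are interchangeable, permuting the agent order within a stored transition yields another equally valid transition for free. The following is a minimal sketch of that idea, assuming homogeneous agents and a replay buffer that stores per-agent tuples; the function name and signature are illustrative, not the paper's implementation.

```python
import itertools
import random

def augment_experience(obs, actions, rewards, next_obs, num_aug=3):
    """Generate extra transitions by permuting agent indices.

    Assumes agents are homogeneous (interchangeable), so applying the
    same permutation to every per-agent component of a transition
    produces another valid transition that can be added to the replay
    buffer at no extra environment-interaction cost.
    """
    n = len(obs)
    # All non-identity permutations of the agent indices.
    perms = list(itertools.permutations(range(n)))[1:]
    augmented = []
    for perm in random.sample(perms, min(num_aug, len(perms))):
        augmented.append((
            [obs[i] for i in perm],
            [actions[i] for i in perm],
            [rewards[i] for i in perm],
            [next_obs[i] for i in perm],
        ))
    return augmented
```

Each augmented transition preserves the joint state-action-reward correspondence (the same permutation is applied to every component), which is what makes the trick sound under permutation invariance.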
Acknowledgements
This work is supported by the National Key R&D Program of China, subproject "Emergent behavior recognition, training and interpretation techniques," under grant No. 2018AAA010230.
Cite this article
Ye, Z., Chen, Y., Jiang, X. et al. Improving sample efficiency in Multi-Agent Actor-Critic methods. Appl Intell 52, 3691–3704 (2022). https://doi.org/10.1007/s10489-021-02554-5