Abstract
The popularity of multi-agent deep reinforcement learning (MADRL) is growing rapidly with the demand for large-scale real-world tasks that require swarm intelligence, and many studies have improved MADRL from the perspective of network structures or reinforcement learning methods. However, the application of MADRL in the real world is hampered by the low sample efficiency of the models and the high cost of collecting data. To improve practicability, an extension to the current MADRL training paradigm that improves sample efficiency is imperative. To this end, this paper proposes PEDMA, a flexible plugin unit for MADRL. It consists of three techniques: (i) Parallel Environments (PE), to accelerate data acquisition; (ii) Experience Augmentation (EA), a novel data augmentation method that exploits the permutation invariance of the multi-agent system to reduce the cost of acquiring data; and (iii) Delayed Updated Policies (DUP), to improve the data utilization efficiency of the MADRL model. We demonstrate both theoretically and empirically that the proposed EA method improves the performance, data efficiency, and convergence speed of MADRL models. Experiments on three multi-agent benchmark tasks show that the MAAC model trained with PEDMA outperforms the baselines and state-of-the-art algorithms, and ablation studies show the contribution and necessity of each component in PEDMA.
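The core idea behind Experience Augmentation is that when agents are interchangeable, permuting the agent order within a stored transition yields another equally valid transition for free. The following is a minimal sketch of that idea, assuming homogeneous agents and a replay buffer that stores per-agent tuples; the function name and signature are illustrative, not the paper's implementation.

```python
import itertools
import random

def augment_experience(obs, actions, rewards, next_obs, num_aug=3):
    """Generate extra transitions by permuting agent indices.

    Assumes agents are homogeneous (interchangeable), so applying the
    same permutation to every per-agent component of a transition
    produces another valid transition that can be added to the replay
    buffer at no extra environment-interaction cost.
    """
    n = len(obs)
    # All non-identity permutations of the agent indices.
    perms = list(itertools.permutations(range(n)))[1:]
    augmented = []
    for perm in random.sample(perms, min(num_aug, len(perms))):
        augmented.append((
            [obs[i] for i in perm],
            [actions[i] for i in perm],
            [rewards[i] for i in perm],
            [next_obs[i] for i in perm],
        ))
    return augmented
```

Each augmented transition preserves the joint state-action-reward correspondence (the same permutation is applied to every component), which is what makes the trick sound under permutation invariance.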
Acknowledgements
This work is supported by the National Key R&D Program of China, subproject "Emergent behavior recognition, training and interpretation techniques," under grant No. 2018AAA010230.
Cite this article
Ye, Z., Chen, Y., Jiang, X. et al. Improving sample efficiency in Multi-Agent Actor-Critic methods. Appl Intell 52, 3691–3704 (2022). https://doi.org/10.1007/s10489-021-02554-5