Improving sample efficiency in Multi-Agent Actor-Critic methods

Abstract

The popularity of multi-agent deep reinforcement learning (MADRL) is growing rapidly with the demand for large-scale real-world tasks that require swarm intelligence, and many studies have improved MADRL from the perspective of network structures or reinforcement learning methods. However, the application of MADRL in the real world is hampered by the low sample efficiency of the models and the high cost of collecting data. To improve practicality, an extension to the current MADRL training paradigm that improves sample efficiency is imperative. To this end, this paper proposes PEDMA, a flexible plug-in unit for MADRL that combines three techniques: (i) Parallel Environments (PE), which accelerate data acquisition; (ii) Experience Augmentation (EA), a novel data augmentation method that exploits the permutation invariance of the multi-agent system to reduce the cost of acquiring data; and (iii) Delayed Updated Policies (DUP), which improve how efficiently the model uses the collected data. We demonstrate, both theoretically and empirically, that the proposed EA method improves the performance, data efficiency, and convergence speed of MADRL models. Experiments on three multi-agent benchmark tasks show that the MAAC model trained with PEDMA outperforms the baselines and state-of-the-art algorithms, and ablation studies confirm the contribution and necessity of each component of PEDMA.
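
To make the EA idea concrete, below is a minimal Python sketch (not the authors' implementation) of permutation-based experience augmentation. It assumes homogeneous agents, so that reordering the agent axis of a stored transition yields an equally valid transition; the function name augment_transition and the transition layout are illustrative assumptions.

import itertools
import random
import numpy as np

def augment_transition(obs, acts, rews, next_obs, dones, num_aug=2):
    # Each argument is an array whose first axis indexes the agents,
    # e.g. obs.shape == (n_agents, obs_dim).
    n_agents = obs.shape[0]
    augmented = [(obs, acts, rews, next_obs, dones)]
    # All non-identity permutations of the agent order; by permutation
    # invariance each one is assumed to give an equally valid transition.
    perms = list(itertools.permutations(range(n_agents)))[1:]
    for perm in random.sample(perms, min(num_aug, len(perms))):
        idx = np.array(perm)
        augmented.append((obs[idx], acts[idx], rews[idx],
                          next_obs[idx], dones[idx]))
    return augmented

# Usage sketch: each transition collected from the (parallel) environments
# would be expanded before being written to the replay buffer, so one
# environment step yields several training samples:
#   for t in augment_transition(obs, acts, rews, next_obs, dones):
#       replay_buffer.add(*t)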

Acknowledgements

This work is supported by the National Key R&D Program of China, sub-project “Emergent behavior recognition, training and interpretation techniques”, under grant No. 2018AAA010230.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Guanghua Song.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Ye, Z., Chen, Y., Jiang, X. et al. Improving sample efficiency in Multi-Agent Actor-Critic methods. Appl Intell 52, 3691–3704 (2022). https://doi.org/10.1007/s10489-021-02554-5
