SparseMAAC: Sparse Attention for Multi-agent Reinforcement Learning

  • Wenhao Li
  • Bo Jin
  • Xiangfeng Wang
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11448)

Abstract

In a multi-agent scenario, each agent needs to be aware of the other agents' information as well as the environment in order to improve the performance of reinforcement learning methods. However, as the number of agents increases, this procedure becomes significantly more complicated, and it is challenging to improve efficiency markedly. We introduce a sparse attention mechanism into the multi-agent reinforcement learning framework and propose a novel Multi-Agent Sparse Attention Actor Critic (SparseMAAC) algorithm. Our framework enables each agent to efficiently select and focus on the agents with critical impact during the early training stages, while simultaneously eliminating data noise. The experimental results show that the proposed SparseMAAC algorithm not only exceeds the baseline algorithms in reward performance, but is also significantly superior to them in convergence speed.
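To make the sparse attention idea concrete, the following is a minimal illustrative sketch, assuming the sparse weights are produced by the sparsemax transformation (Martins and Astudillo, 2016); the function name, the toy scores, and the weighting scenario are hypothetical and only show how non-critical agents can receive exactly zero attention weight, unlike with softmax.

    import numpy as np

    def sparsemax(z):
        # Sparsemax: Euclidean projection of the score vector z onto the probability
        # simplex. Unlike softmax, it can assign exactly zero weight to some entries.
        z = np.asarray(z, dtype=np.float64)
        z_sorted = np.sort(z)[::-1]                  # sort scores in decreasing order
        k = np.arange(1, z.size + 1)
        cumsum = np.cumsum(z_sorted)
        support = k[1.0 + k * z_sorted > cumsum]     # indices still inside the support
        k_max = support[-1]                          # size of the support set
        tau = (cumsum[k_max - 1] - 1.0) / k_max      # threshold subtracted from scores
        return np.maximum(z - tau, 0.0)

    # Hypothetical attention scores of one agent over three other agents.
    scores = np.array([2.1, 0.3, 1.9])
    print(sparsemax(scores))                         # [0.6 0.  0.4] -> the second agent is ignored
    print(np.exp(scores) / np.exp(scores).sum())     # softmax keeps every agent, however small

In a setting such as SparseMAAC, sparse weights of this kind would replace the dense softmax weights inside an attention-based critic, so that value estimates and gradients depend only on the selected, critical agents rather than on all agents.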

Keywords

Multi-agent deep reinforcement learning · Sparse attention mechanism · Actor-attention-critic

Notes

Acknowledgment

This work is supported by the National Natural Science Foundation of China (Grant No. 61702188, No. U1609220, No. U1509219 and No. 61672231).

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Shanghai Key Lab for Trustworthy Computing, School of Computer Science and Software Engineering, East China Normal University, Shanghai, China