Obstacle Avoidance in Multi-Agent Formation Process Based on Deep Reinforcement Learning


To address the difficulties of traditional multi-agent formation obstacle avoidance algorithms — complex control-law design, poor portability, and poor stability — a multi-agent formation obstacle avoidance method based on deep reinforcement learning (DRL) is proposed. The method combines the perception ability of convolutional neural networks (CNNs) with the decision-making ability of reinforcement learning in a general form, realizing direct control output from visual perception of the environment through end-to-end learning. A multi-agent system (MAS) model using the leader-follower formation method was designed, with the wheelbarrow (unicycle) as the control object. An improved deep Q network (DQN) algorithm was designed to achieve obstacle avoidance and collision avoidance while the multi-agent formation converges to the desired configuration: the discount factor and learning efficiency were improved, and a reward function was designed that accounts for both the distance between each agent and the obstacles and a coordination factor among the agents. Simulation results show that the proposed method achieves the expected goal of multi-agent formation obstacle avoidance and offers stronger portability than the traditional algorithm.
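The reward design described above — an obstacle-distance term combined with an inter-agent coordination factor — can be sketched as follows. This is a minimal illustration only: the thresholds, weights, and function names (`D_SAFE`, `D_FORM`, `formation_reward`, `w_coord`) are illustrative assumptions, not the paper's actual values.

```python
# Hypothetical reward shaping for formation obstacle avoidance.
# All constants and weights are illustrative assumptions.

D_SAFE = 1.0   # assumed minimum safe distance to an obstacle
D_FORM = 0.5   # assumed tolerance on the desired inter-agent spacing

def formation_reward(d_obstacle, d_neighbor, d_desired, w_coord=0.5):
    """Reward = obstacle-distance term + multi-agent coordination term."""
    # Obstacle term: heavy penalty inside the safe radius,
    # otherwise a mild penalty that decays with distance.
    if d_obstacle < D_SAFE:
        r_obs = -10.0
    else:
        r_obs = -1.0 / d_obstacle

    # Coordination factor: reward holding the desired spacing to the
    # leader/neighbor, penalize deviation proportionally.
    form_error = abs(d_neighbor - d_desired)
    r_coord = 1.0 if form_error <= D_FORM else -w_coord * form_error

    return r_obs + r_coord
```

Under this shaping, an agent that drifts inside the safe radius or breaks formation receives a lower reward than one that keeps both constraints, which is the property the DQN agent exploits during training.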




Author information



Corresponding author

Correspondence to Wenguang Luo.

Additional information

Foundation item

This work was supported by the National Natural Science Foundation of China (No. 61963006) and the Natural Science Foundation of Guangxi Province (Nos. 2020GXNSFDA238011, 2018GXNSFAA050029, and 2018GXNSFAA294085).



Cite this article

Ji, X., Hai, J., Luo, W. et al. Obstacle Avoidance in Multi-Agent Formation Process Based on Deep Reinforcement Learning. J. Shanghai Jiaotong Univ. (Sci.) 26, 680–685 (2021).


Key words

  • wheelbarrow
  • multi-agent
  • deep reinforcement learning (DRL)
  • formation
  • obstacle avoidance

CLC number

  • O 231.5

Document code

  • A