Multi-robot Target Encirclement Control with Collision Avoidance via Deep Reinforcement Learning


This paper investigates target encirclement control of multi-robot systems via deep reinforcement learning. Inspired by the encirclement behavior that dolphins use to entrap schools of fish, encirclement control drives the robots into a capturing formation around a target, and can be widely applied in areas such as coverage, patrolling, and escorting. Unlike traditional methods, we propose a deep reinforcement learning framework for multi-robot target encirclement formation control that combines the advantages of deep neural networks and the deterministic policy gradient algorithm, and is therefore free from the complicated work of building a control model and designing a control law. Our method provides a distributed control architecture in which each robot acts in a continuous action space, relies only on local teammate information, and determines its action at each time step with its own independent network. In addition, the robots and the moving target can be trained simultaneously, so that both cooperation and competition are captured. The results validate the effectiveness of the proposed algorithm.
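The encirclement objective described above can be made concrete with a reward sketch: each robot should hold a fixed distance from the target, spread out evenly in angle around it, and avoid collisions with teammates. The paper's actual reward function is not reproduced here; the desired radius `d`, collision threshold `r_min`, and the even-spacing term below are illustrative assumptions only.

```python
import math

def encirclement_reward(robots, target, d=2.0, r_min=0.5):
    """Hypothetical per-robot reward shaping for target encirclement.

    robots: list of (x, y) robot positions; target: (x, y) target position.
    Rewards each robot for holding the encircling radius d, for spacing
    itself 2*pi/n from its nearest teammate around the target, and
    penalizes it for coming within r_min of any teammate.
    """
    n = len(robots)
    rewards = []
    for i, (xi, yi) in enumerate(robots):
        # Stay on the desired encircling radius around the target.
        dist = math.hypot(xi - target[0], yi - target[1])
        r = -abs(dist - d)
        # Angle of this robot as seen from the target.
        ai = math.atan2(yi - target[1], xi - target[0])
        gaps = []
        for j, (xj, yj) in enumerate(robots):
            if j == i:
                continue
            aj = math.atan2(yj - target[1], xj - target[0])
            # Wrapped angular difference in [0, pi].
            gaps.append(abs((ai - aj + math.pi) % (2 * math.pi) - math.pi))
            # Collision-avoidance penalty for close teammates.
            if math.hypot(xi - xj, yi - yj) < r_min:
                r -= 10.0
        if gaps:
            # Even angular spacing: nearest teammate should sit 2*pi/n away.
            r -= abs(min(gaps) - 2 * math.pi / n)
        rewards.append(r)
    return rewards
```

With three robots placed evenly on a circle of radius `d` around the target, every term vanishes and each robot's reward is (numerically) zero; clustering two robots together triggers both the spacing and collision penalties.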




Author information



Corresponding author

Correspondence to Huimin Lu.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Junchong Ma and Huimin Lu contributed equally to this work.

Electronic supplementary material

Below is the link to the electronic supplementary material.

(MP4 20.7 MB)


About this article


Cite this article

Ma, J., Lu, H., Xiao, J. et al. Multi-robot Target Encirclement Control with Collision Avoidance via Deep Reinforcement Learning. J Intell Robot Syst 99, 371–386 (2020).



Keywords

  • Multi-robot
  • Deep reinforcement learning
  • Encirclement control
  • Collision avoidance