
Proximal policy optimization for formation navigation and obstacle avoidance

  • Regular Paper
  • Published in: International Journal of Intelligent Robotics and Applications


In this paper, a formation control problem for second-order holonomic agents is considered, where agents navigate around obstacles using proximal policy optimization (PPO)-based deep reinforcement learning (DRL). The formation is allowed to shrink and expand while maintaining its shape, in order to navigate the geometric centroid of the formation towards the goal. A bearing-based reward function is presented that depends on the bearing error of each agent towards its designated neighbors. The agents share a single policy that is trained in a centralized manner. Distance measurements, state information, error information regarding neighboring agents, and simulation information are used to train the policy in an end-to-end fashion. Simulation results using the proposed approach are compared with those obtained using an angle-based reward function.
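The bearing-based reward described above can be illustrated with a short sketch. The function names, data layout, and the specific penalty used here (the unweighted Euclidean norm of the error between actual and desired unit bearing vectors, summed over designated neighbors) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def bearing_reward(positions, desired_bearings, neighbors):
    """Sketch of a bearing-based reward for formation control.

    positions:        dict mapping agent id -> 2D position (np.ndarray)
    desired_bearings: dict mapping (i, j) -> desired unit bearing vector
                      from agent i towards its designated neighbor j
    neighbors:        dict mapping agent id -> list of designated neighbors

    Each agent is penalized by the norm of the difference between its
    actual and desired unit bearing vectors; a perfect formation (up to
    scaling, since bearings are scale-invariant) yields reward 0.
    """
    reward = 0.0
    for i, js in neighbors.items():
        for j in js:
            d = positions[j] - positions[i]
            g = d / np.linalg.norm(d)          # actual unit bearing i -> j
            g_star = desired_bearings[(i, j)]  # desired unit bearing i -> j
            reward -= np.linalg.norm(g - g_star)
    return reward
```

Because only bearing directions enter the penalty, uniformly scaling all inter-agent distances leaves the reward unchanged, which is consistent with letting the formation shrink and expand while keeping its shape.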

[Figs. 1–15 are available in the full-text PDF.]






This work was supported in part by the Faculty Research Support fund from Concordia University, Montreal.

Author information

Authors and Affiliations



Principal author: PS

Corresponding author

Correspondence to Rastko R. Selmic.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions


About this article


Cite this article

Sadhukhan, P., Selmic, R.R. Proximal policy optimization for formation navigation and obstacle avoidance. Int J Intell Robot Appl 6, 746–759 (2022).
