Journal of Intelligent & Robotic Systems, Volume 96, Issue 3–4, pp 591–601

Mapless Motion Planning System for an Autonomous Underwater Vehicle Using Policy Gradient-based Deep Reinforcement Learning

  • Yushan Sun
  • Junhan Cheng
  • Guocheng Zhang (corresponding author)
  • Hao Xu


This research addresses the motion planning problem encountered by underactuated autonomous underwater vehicles (AUVs) in mapless environments. A motion planning system based on deep reinforcement learning is proposed. The system directly optimizes the policy and is end to end: it takes sensor information as input and produces continuous surge force and yaw moment as output, and it can reach a sequence of target points while avoiding obstacles. In addition, this study proposes a reward curriculum training method to address the problem that the number of samples required for random exploration grows exponentially with the number of steps needed to obtain a reward, while also avoiding the negative impact of intermediate rewards. The proposed system demonstrates good planning ability in mapless environments, transfers well to other unknown environments, and is robust to current disturbances. Simulation results show that the proposed mapless motion planning system can guide an underactuated AUV to its desired targets without colliding with any obstacles.
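The end-to-end mapping the abstract describes (sensor readings in, continuous surge force and yaw moment out) can be sketched as a small policy network. This is an illustrative sketch only: the layer sizes, sensor layout, and force/moment limits below are assumptions for demonstration, not the authors' actual architecture or parameters.

```python
# Minimal sketch of an end-to-end policy for an underactuated AUV:
# sensor observation -> bounded (surge force, yaw moment).
# All dimensions and limits here are assumed, not from the paper.
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical observation: 10 sonar ranges plus relative distance and
# bearing to the current target point (12 values total).
OBS_DIM, ACT_DIM, HIDDEN = 12, 2, 64
MAX_SURGE_FORCE = 100.0   # N   (assumed actuator limit)
MAX_YAW_MOMENT = 50.0     # N*m (assumed actuator limit)

# Randomly initialized two-layer MLP; in training these weights would be
# updated by a policy-gradient method (e.g. PPO-style updates).
W1 = rng.normal(0.0, 0.1, (HIDDEN, OBS_DIM))
b1 = np.zeros(HIDDEN)
W2 = rng.normal(0.0, 0.1, (ACT_DIM, HIDDEN))
b2 = np.zeros(ACT_DIM)

def policy(obs):
    """Map a normalized sensor observation to (surge force, yaw moment)."""
    h = np.tanh(W1 @ obs + b1)   # hidden layer
    a = np.tanh(W2 @ h + b2)     # squash raw outputs into [-1, 1]
    # Scale to physical actuator ranges.
    return a * np.array([MAX_SURGE_FORCE, MAX_YAW_MOMENT])

obs = rng.uniform(0.0, 1.0, OBS_DIM)   # dummy normalized sensor frame
action = policy(obs)
print(action.shape)                    # (2,)
```

The tanh squashing guarantees the commanded surge force and yaw moment stay within the actuator limits, which is why bounded continuous outputs are a common choice for this kind of policy.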


Keywords: Autonomous underwater vehicle (AUV) · Motion planning · Deep reinforcement learning · Curriculum learning






Copyright information

© Springer Nature B.V. 2019

Authors and Affiliations

  • Yushan Sun¹
  • Junhan Cheng¹
  • Guocheng Zhang¹ (corresponding author)
  • Hao Xu¹

  1. Science and Technology on Underwater Vehicle Laboratory, Harbin Engineering University, Harbin, China
