
Reinforcement Learning Based Obstacle Avoidance for Autonomous Underwater Vehicle

  • Research Article
  • Published in: Journal of Marine Science and Application

Abstract

Obstacle avoidance is a challenging task for an autonomous underwater vehicle (AUV) exploring an unknown underwater environment. Successful control in such cases can be achieved with model-based classical techniques such as PID and MPC, but these require an accurate mathematical model of the AUV and may fail under parametric uncertainties, disturbances, or plant-model mismatch. A model-free reinforcement learning (RL) algorithm, by contrast, can be designed from the actual behavior of the AUV plant in an unknown environment, and the learned control is not affected by model uncertainties in the way a classical control approach is. Unlike model-based control, a model-free RL-based controller does not need to be manually retuned as the environment changes. Standard one-step Q-learning can be used for obstacle avoidance, but its tendency to explore all possible actions in a given state may increase the number of collisions. Hence, a modified Q-learning-based control approach is proposed to deal with these problems in an unknown environment. Furthermore, function approximation with a neural network (NN) is used to handle the continuous-state and large state-space problems that arise in RL-based controller design. The proposed modified Q-learning algorithm is validated in MATLAB simulations against the standard Q-learning algorithm for single obstacle avoidance, and the same algorithm is then applied to multiple obstacle avoidance.
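The one-step Q-learning baseline the abstract contrasts against can be sketched as a single tabular Bellman backup. The toy two-state example below is illustrative only, assuming a generic tabular setting rather than the paper's AUV environment:

```python
import numpy as np

# One-step (tabular) Q-learning backup (Watkins and Dayan 1992):
#   Q(s, a) <- Q(s, a) + alpha * [r + gamma * max_a' Q(s', a') - Q(s, a)]
def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply a single Bellman backup to the tabular Q estimate."""
    td_target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    return Q

# Hypothetical example: 2 states, 2 actions; reward 1 for taking
# action 1 in state 0 and landing in state 1.
Q = np.zeros((2, 2))
Q = q_update(Q, s=0, a=1, r=1.0, s_next=1)
# The (s=0, a=1) entry moves one step of size alpha toward the target: 0.1
```

The modified algorithm in the paper alters how actions are explored to reduce collisions; the backup itself is this standard update, with the table replaced by an NN function approximator for continuous states.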



References

  • Bhopale P, Bajaria P, Kazi F, Singh N (2016) LMI based depth control for autonomous underwater vehicle. International Conference on Control, Instrumentation, Communication and Computational Technologies (ICCICCT), Kumaracoil, India, 477–481

  • Bhopale P, Bajaria P, Kazi F, Singh N (2017) Enhancing reduced order model predictive control for autonomous underwater vehicle. In: Le NT, van Do T, Nguyen N, Thi H (eds) Advanced computational methods for knowledge engineering. ICCSAMA 2017. Advances in intelligent systems and computing, vol 629. Springer, Cham, 60–71


  • Cheng X, Qu J, Yan Z, Bian X (2010) H∞ robust fault-tolerant controller design for an autonomous underwater vehicle’s navigation control system. J Mar Sci Appl 9(1):87–92. https://doi.org/10.1007/s11804-010-8052-x


  • National Research Council (1996) Undersea vehicles and national needs. National Academies Press, Washington, DC, 1–6


  • Fossen T (2011) Handbook of marine craft hydrodynamics and motion control. John Wiley & Sons Ltd. Publication, 6–78

  • Hafner R, Riedmiller M (2011) Reinforcement learning in feedback control: challenges and benchmarks from technical process control. Mach Learn 84(1–2):137–169. https://doi.org/10.1007/s10994-011-5235-x

  • Kober J, Bagnell JA, Peters J (2013) Reinforcement learning in robotics: a survey. Int J Robot Res 32(11):1238–1274. https://doi.org/10.1177/0278364913495721


  • Paula M, Acosta G (2015) Trajectory tracking algorithm for autonomous vehicles using adaptive reinforcement learning. Oceans 2015, Washington, DC, 1–8


  • Phanthong T, Maki T, Ura T, Sakamaki T, Aiyarak P (2014) Application of A* algorithm for real-time path re-planning of an unmanned surface vehicle avoiding underwater obstacles. J Mar Sci Appl 13(1):105–116. https://doi.org/10.1007/s11804-014-1224-3


  • Powell W (2007) Approximate dynamic programming: solving the curses of dimensionality. John Wiley and Sons Publication, 1–25

  • Prestero T (2001) Verification of a six-degree of freedom simulation model for the REMUS autonomous underwater vehicle. MSc/ME thesis, Massachusetts Institute of Technology, Cambridge, 1–78


  • Qu Y, Xu H, Yu W, Feng H, Han X (2017) Inverse optimal control for speed-varying path following of marine vessels with actuator dynamics. J Mar Sci Appl 16(2):225–236. https://doi.org/10.1007/s11804-017-1410-1


  • Wynn RB, Huvenne VAI, Le Bas TP, Murton BJ, Connelly DP, Bett BJ, Ruhl HA, Morris KJ, Peakall J, Parsons DR, Sumner EJ, Darby SE, Dorrell RM, Hunt JE (2014) Autonomous underwater vehicles (AUVs): their past, present and future contributions to the advancement of marine geoscience. Mar Geol 352:451–468. https://doi.org/10.1016/j.margeo.2014.03.012


  • Su Y, Zhao J, Cao J, Zhang G (2013) Dynamics modeling and simulation of autonomous underwater vehicles with appendages. J Mar Sci Appl 12(1):45–51. https://doi.org/10.1007/s11804-013-1169-6


  • Sutton R, Barto A (1998) Reinforcement learning: an introduction. MIT Press, Cambridge, MA, USA, pp 1–150


  • Watkins CJCH, Dayan P (1992) Q-learning. Mach Learn 8(3–4):279–292. https://doi.org/10.1007/BF00992698


  • Yoo B, Kim J (2016) Path optimization for marine vehicles in ocean currents using reinforcement learning. J Mar Sci Technol 21(2):334–343. https://doi.org/10.1007/s00773-015-0355-9



Acknowledgements

The authors would like to acknowledge the support of the Centre of Excellence (CoE) in Complex and Nonlinear Dynamical Systems (CNDS), through TEQIP-II, VJTI, Mumbai, India.


Corresponding author

Correspondence to Prashant Bhopale.


Article Highlights

• To complete a given task in an unknown environment, the AUV must avoid collisions with obstacles.

• A modified Q-learning-based control is proposed to reduce the number of collisions and is compared with standard one-step Q-learning-based control.

• Function approximation is used along with RL to deal with continuous states and the large state-space problem.

• The proposed RL-based control is applied to multiple obstacle avoidance.

Appendix 1

The forward motion of the AUV in the horizontal plane is referred to as surge (x, longitudinal motion), sideways motion as sway (y, latitudinal motion), and angular motion about the vertical axis as yaw (ψ). The remaining three DOFs are heave (z, vertical motion), pitch (θ, rotation about the transverse axis), and roll (ϕ, rotation about the longitudinal axis) (Fossen 2011).

The following mathematical model of the AUV is adopted as the plant to mimic the behavior of the AUV motion for a given control command:

1.1 Kinematics

The 6-DOF kinematic equations for the body-frame to north-east-down (x-y-z in our case) transformation, using Euler angles and SNAME notation (Fossen 2011), for the position [x, y, z, ϕ, θ, ψ]T and velocity [u, v, w, p, q, r]T are as follows:

$$ \begin{aligned}
\dot{x} &= \left[\cos\theta\cos\psi\right]u + \left[\cos\psi\sin\phi\sin\theta - \cos\phi\sin\psi\right]v + \left[\sin\phi\sin\psi + \cos\phi\cos\psi\sin\theta\right]w\\
\dot{y} &= \left[\cos\theta\sin\psi\right]u + \left[\cos\phi\cos\psi + \sin\phi\sin\theta\sin\psi\right]v + \left[\cos\phi\sin\theta\sin\psi - \sin\phi\cos\psi\right]w\\
\dot{z} &= -\left[\sin\theta\right]u + \left[\sin\phi\cos\theta\right]v + \left[\cos\phi\cos\theta\right]w\\
\dot{\phi} &= p + \left[\sin\phi\tan\theta\right]q + \left[\cos\phi\tan\theta\right]r\\
\dot{\theta} &= \cos\phi\, q - \sin\phi\, r\\
\dot{\psi} &= \left[\frac{\sin\phi}{\cos\theta}\right]q + \left[\frac{\cos\phi}{\cos\theta}\right]r, \quad \theta \neq 90^{\circ}
\end{aligned} $$
(A1)
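As an independent sanity check of Eq. (A1), the transformation can be evaluated numerically; the sketch below assumes the standard SNAME convention and is not the authors' code. With zero Euler angles the body and NED axes coincide, so the NED rates must equal the body velocities:

```python
import numpy as np

def kinematics(eta, nu):
    """Eq. (A1): map body-frame velocities nu = [u, v, w, p, q, r] to
    NED-frame rates of eta = [x, y, z, phi, theta, psi]. Assumes theta != 90 deg."""
    u, v, w, p, q, r = nu
    phi, theta = eta[3], eta[4]
    psi = eta[5]
    cphi, sphi = np.cos(phi), np.sin(phi)
    cth, sth, tth = np.cos(theta), np.sin(theta), np.tan(theta)
    cpsi, spsi = np.cos(psi), np.sin(psi)
    xdot = cth*cpsi*u + (cpsi*sth*sphi - cphi*spsi)*v + (sphi*spsi + cphi*cpsi*sth)*w
    ydot = cth*spsi*u + (cphi*cpsi + sphi*sth*spsi)*v + (cphi*sth*spsi - sphi*cpsi)*w
    zdot = -sth*u + sphi*cth*v + cphi*cth*w
    phidot = p + sphi*tth*q + cphi*tth*r
    thetadot = cphi*q - sphi*r
    psidot = (sphi/cth)*q + (cphi/cth)*r
    return np.array([xdot, ydot, zdot, phidot, thetadot, psidot])

# Zero attitude: the rotation is the identity, so the rates equal nu.
rates = kinematics(np.zeros(6), np.array([1.5, 0.2, 0.1, 0.0, 0.0, 0.05]))
```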

1.2 Dynamics

The dynamic equations of motion can be stated as follows:

$$ \begin{aligned}
m\left[\dot{u} - vr + wq - x_g\left(q^2 + r^2\right) + y_g\left(pq - \dot{r}\right) + z_g\left(pr + \dot{q}\right)\right] &= X_{\mathrm{total}}\\
m\left[\dot{v} - wp + ur - y_g\left(r^2 + p^2\right) + z_g\left(qr + \dot{p}\right) + x_g\left(qp + \dot{r}\right)\right] &= Y_{\mathrm{total}}\\
m\left[\dot{w} - uq + vp - z_g\left(p^2 + q^2\right) + x_g\left(rp - \dot{q}\right) + y_g\left(rq + \dot{p}\right)\right] &= Z_{\mathrm{total}}\\
I_x\dot{p} + \left(I_z - I_y\right)qr - \left(\dot{r} + pq\right)I_{xz} + \left(r^2 - q^2\right)I_{yz} + \left(pr - \dot{q}\right)I_{xy} + m\left[y_g\left(\dot{w} - uq + vp\right) - z_g\left(\dot{v} - wp + ur\right)\right] &= K_{\mathrm{total}}\\
I_y\dot{q} + \left(I_x - I_z\right)rp - \left(\dot{p} + qr\right)I_{xy} + \left(p^2 - r^2\right)I_{zx} + \left(qp - \dot{r}\right)I_{yz} + m\left[z_g\left(\dot{u} - vr + wq\right) - x_g\left(\dot{w} - uq + vp\right)\right] &= M_{\mathrm{total}}\\
I_z\dot{r} + \left(I_y - I_x\right)pq - \left(\dot{q} + rp\right)I_{yz} + \left(q^2 - p^2\right)I_{xy} + \left(rp - \dot{p}\right)I_{zx} + m\left[x_g\left(\dot{v} - wp + ur\right) - y_g\left(\dot{u} - vr + wq\right)\right] &= N_{\mathrm{total}}
\end{aligned} $$
(A2)

Here, m is the mass of the craft, [u, v, w]T are the linear velocities, [p, q, r]T are the angular velocities, I(.) is the inertia about a particular axial or cross-axial direction, and (Xtotal, Ytotal, Ztotal) and (Ktotal, Mtotal, Ntotal) are the total external forces and moments in the linear and angular directions, respectively. The first three equations in (A1) and (A2) represent the translational motion, while the last three represent the rotational motion.

The external forces in the rigid-body equations of motion shown in (A2) play an important role in the modeling of an underwater vehicle. These forces and coefficients include restoring forces, hydrodynamic forces, thruster forces, and lift forces due to rudder (δr) and stern (δs) deflection. The total external force is calculated as:

$$ \tau_{\mathrm{total}} = \mathrm{hydrostatic\ force} + \mathrm{hydrodynamic\ force} + \mathrm{added\ mass\ force} + \mathrm{body\ lift\ force} + \mathrm{fin\ lift\ force} + \mathrm{propeller\ thrust} $$

Therefore, the external forces in rigid body equation of motion shown in Eq. (A2) can be calculated as

$$ \begin{aligned}
X_{\mathrm{total}} &= X_{HS} + X_{u|u|}u|u| + X_{\dot{u}}\dot{u} + X_{wq}wq + X_{qq}q^2 + X_{vr}vr + X_{rr}r^2 + X_{\mathrm{prop}}\\
Y_{\mathrm{total}} &= Y_{HS} + Y_{v|v|}v|v| + Y_{r|r|}r|r| + Y_{\dot{v}}\dot{v} + Y_{\dot{r}}\dot{r} + Y_{ur}ur + Y_{wp}wp + Y_{pq}pq + Y_{uv}uv + Y_{uu\delta_r}u^2\delta_r\\
Z_{\mathrm{total}} &= Z_{HS} + Z_{w|w|}w|w| + Z_{q|q|}q|q| + Z_{\dot{w}}\dot{w} + Z_{\dot{q}}\dot{q} + Z_{uq}uq + Z_{vp}vp + Z_{rp}rp + Z_{uw}uw + Z_{uu\delta_s}u^2\delta_s\\
K_{\mathrm{total}} &= K_{HS} + K_{p|p|}p|p| + K_{\dot{p}}\dot{p} + K_{\mathrm{prop}}\\
M_{\mathrm{total}} &= M_{HS} + M_{w|w|}w|w| + M_{q|q|}q|q| + M_{\dot{w}}\dot{w} + M_{\dot{q}}\dot{q} + M_{uq}uq + M_{vp}vp + M_{rp}rp + M_{uw}uw + M_{uu\delta_s}u^2\delta_s\\
N_{\mathrm{total}} &= N_{HS} + N_{v|v|}v|v| + N_{r|r|}r|r| + N_{\dot{v}}\dot{v} + N_{\dot{r}}\dot{r} + N_{ur}ur + N_{wp}wp + N_{pq}pq + N_{uv}uv + N_{uu\delta_r}u^2\delta_r
\end{aligned} $$
(A3)

Hence, Eqs. (A1) to (A3) represent the complete 12th-order, 6-DOF equations of motion of the AUV, which can be used for AUV simulations. The hydrodynamic coefficients, inertial constants, and physical parameters are taken from Prestero (2001).
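For simulation, the 12-state model is typically integrated numerically. The sketch below shows a generic forward-Euler loop; the right-hand-side functions are placeholders (here a toy surge-only model with linear damping), where in practice `eta_dot` would implement Eq. (A1) and `nu_dot` would implement Eq. (A2) solved for the accelerations with the Prestero (2001) coefficients:

```python
import numpy as np

def simulate(eta0, nu0, eta_dot, nu_dot, dt=0.01, steps=200):
    """Forward-Euler integration of the 12-state AUV model.
    eta_dot(eta, nu): position/attitude rates, i.e. Eq. (A1).
    nu_dot(eta, nu): body-frame accelerations, i.e. Eq. (A2) with the
    mass/inertia terms moved to the left-hand side and inverted."""
    eta = np.array(eta0, dtype=float)
    nu = np.array(nu0, dtype=float)
    for _ in range(steps):
        eta = eta + dt * eta_dot(eta, nu)
        nu = nu + dt * nu_dot(eta, nu)
    return eta, nu

# Toy check: surge-only dynamics u_dot = -0.5 u, x_dot = u.
# The surge speed decays roughly as exp(-0.5 t) over t = 2 s.
eta, nu = simulate(
    np.zeros(6), [1.0, 0, 0, 0, 0, 0],
    eta_dot=lambda e, n: np.r_[n[0], np.zeros(5)],
    nu_dot=lambda e, n: np.r_[-0.5 * n[0], np.zeros(5)],
)
```

Forward Euler is only a sketch; a fixed-step Runge-Kutta scheme (e.g. MATLAB's `ode4`-style integration) would normally be used for the full coupled model.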


About this article


Cite this article

Bhopale, P., Kazi, F. & Singh, N. Reinforcement Learning Based Obstacle Avoidance for Autonomous Underwater Vehicle. J. Marine. Sci. Appl. 18, 228–238 (2019). https://doi.org/10.1007/s11804-019-00089-3


Keywords

Navigation