Fuzzy Categorical Deep Reinforcement Learning of a Defensive Game for an Unmanned Surface Vessel

  • Yin Cheng
  • Zhijian Sun
  • Yuexin Huang
  • Weidong Zhang


Unmanned surface vessels (USVs) are significant and widely applied in many fields, but control laws designed with the analytical approach are often too complicated to implement at the current level of hardware development. When confronted with obstacles, USVs typically rely on conventional avoidance methods, but in many practical cases it is difficult to devise the path in advance. Moreover, prior knowledge, including expert experience, can be hard to introduce into a control system effectively. In this paper, a fuzzy categorical deep reinforcement learning-based framework is established to handle a sophisticated obstruction situation. The framework consists of an interactive observation module and a control module with fuzzy reward shaping. Experimental results verify that a USV using the framework performs better than a USV using the path-following method. In addition, the path of the USV need not be arranged beforehand; instead, the USV steers itself to the destination autonomously. Because the control law is simple, the architecture is suitable for various levels of hardware.


Fuzzy reinforcement learning, Categorical DQN, Unmanned surface vessels



This paper is partly supported by the National Science Foundation of China (61473183, 61521063, U1509211).



Copyright information

© Taiwan Fuzzy Systems Association and Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  • Yin Cheng (1)
  • Zhijian Sun (1)
  • Yuexin Huang (1)
  • Weidong Zhang (1)
  1. Department of Automation, Shanghai Jiao Tong University, Shanghai, People’s Republic of China
