A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Moving Platform

Journal of Intelligent & Robotic Systems

Abstract

The use of multi-rotor UAVs in industrial and civil applications has been strongly encouraged by rapid innovation in all the technologies involved. In particular, deep learning techniques for motion control have recently taken a major qualitative step since the successful application of Deep Q-Learning to Atari-like games with discrete actions. Building on these ideas, the Deep Deterministic Policy Gradients (DDPG) algorithm provided outstanding results with continuous state and action domains, which are a requirement in most robotics-related tasks. In this context, the research community still lacks an integration of realistic simulation systems with the reinforcement learning paradigm that would enable the application of deep reinforcement learning algorithms to the robotics field. In this paper, a versatile Gazebo-based reinforcement learning framework has been designed and validated with a continuous UAV landing task. The UAV landing maneuver on a moving platform has been solved by means of the novel DDPG algorithm, which has been integrated into our reinforcement learning framework. Several experiments have been performed in a wide variety of conditions for both simulated and real flights, demonstrating the generality of the approach. As an indirect result, a powerful workflow for robotics has been validated, where robots can learn in simulation and perform properly in real operation environments. To the best of the authors' knowledge, this is the first work that addresses the continuous UAV landing maneuver on a moving platform by means of a state-of-the-art deep reinforcement learning algorithm, trained in simulation and tested in real flights.
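
To make the notion of a landing task with continuous state and action domains concrete, the following is a minimal sketch in Python. It is not the authors' Gazebo-based framework: the class `ToyLandingEnv`, its parameters, the reward, and the `chase_policy` helper are all hypothetical, and the physics is reduced to 2D kinematics. It only illustrates the kind of `reset`/`step` interface that a learning agent such as DDPG would interact with.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class ToyLandingEnv:
    """Hypothetical 2D stand-in for a simulated landing task (not the paper's setup)."""
    dt: float = 0.1              # control period in seconds (assumed)
    platform_speed: float = 0.5  # platform speed along x in m/s (assumed)
    max_cmd: float = 1.0         # bound on each commanded velocity in m/s (assumed)
    land_radius: float = 0.2     # horizontal distance counted as touchdown in m (assumed)

    def reset(self, seed=None):
        # Place the UAV at a random horizontal offset; the platform starts at the origin.
        rng = random.Random(seed)
        self.uav = [rng.uniform(-2.0, 2.0), rng.uniform(-2.0, 2.0)]
        self.platform = [0.0, 0.0]
        return self._state()

    def _state(self):
        # Continuous state: UAV position relative to the platform, plus platform speed.
        return (self.uav[0] - self.platform[0],
                self.uav[1] - self.platform[1],
                self.platform_speed)

    def step(self, action):
        # Continuous action: commanded (vx, vy), clipped to the allowed range.
        vx, vy = (max(-self.max_cmd, min(self.max_cmd, a)) for a in action)
        self.uav[0] += vx * self.dt
        self.uav[1] += vy * self.dt
        self.platform[0] += self.platform_speed * self.dt
        state = self._state()
        dist = math.hypot(state[0], state[1])
        done = dist < self.land_radius
        # Assumed reward: negative distance, plus a bonus on touchdown.
        reward = -dist + (10.0 if done else 0.0)
        return state, reward, done


def chase_policy(state, gain=2.0):
    # Hand-written proportional controller used as a placeholder for a learned policy.
    dx, dy, v_platform = state
    return (-gain * dx + v_platform, -gain * dy)
```

A short rollout would call `state = env.reset(seed=0)` and then repeatedly `state, reward, done = env.step(chase_policy(state))` until `done` is true. In a learning setup, the placeholder controller would be replaced by the actor network, and the `(state, action, reward, next_state)` tuples would feed the training procedure.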

Acknowledgements

This work was supported by the Spanish Ministry of Science (Project DPI2014-60139-R). The LAL UPM and the MONCLOA Campus of International Excellence are also acknowledged for funding the predoctoral contract of one of the authors.

An introductory version of this paper was presented at the 2017 International Conference on Unmanned Aircraft Systems (ICUAS), held in Miami, FL, USA, on 13–16 June 2017.

Author information

Corresponding author

Correspondence to Alejandro Rodriguez-Ramos.

About this article

Cite this article

Rodriguez-Ramos, A., Sampedro, C., Bavle, H. et al. A Deep Reinforcement Learning Strategy for UAV Autonomous Landing on a Moving Platform. J Intell Robot Syst 93, 351–366 (2019). https://doi.org/10.1007/s10846-018-0891-8

  • DOI: https://doi.org/10.1007/s10846-018-0891-8
