Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm


This study proposes a real-time optimal control approach for the fuel-optimal Moon landing problem, based on an interactive deep reinforcement learning algorithm. Because of remote-communication restrictions and environmental uncertainties, Moon landing missions need advanced landing control techniques that meet strict requirements on real-time performance and autonomy. Deep reinforcement learning (DRL) algorithms have recently been developed for real-time optimal control, but they suffer from slow convergence and from the difficulty of designing reward functions. To address these problems, a DRL algorithm with an actor-indirect method architecture is developed to achieve optimal control of the Moon landing mission. In this algorithm, an indirect method generates the optimal control actions from which the deep neural networks (DNNs) learn, while the trained DNNs supply good initial guesses to the indirect method, making the generation of training data more efficient. After sufficient learning of the state-action relationship, the trained DNNs can approximate the optimal actions and steer the spacecraft to the target in real time. In addition, a nonlinear feedback controller is developed to improve the terminal landing accuracy. Numerical simulations verify the effectiveness of the proposed DRL algorithm and demonstrate the performance of the resulting optimal landing controller.
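The interplay between the indirect method and the learner can be illustrated with a minimal Python sketch. It is not the authors' implementation and it makes several assumptions: the Moon landing dynamics are replaced by a toy scalar problem (minimize the integral of u²/2 subject to x' = -sin(x) + u and x(T) = 0), the indirect method is a single-shooting secant iteration on the initial costate, and a least-squares line stands in for the DNN. All names (`terminal_state`, `solve_shooting`, `fit_linear`, `interactive_training`) and all constants are hypothetical.

```python
import math
import random

T, STEPS, TOL = 1.0, 50, 1e-8  # horizon, RK4 steps, shooting tolerance (all assumed)


def terminal_state(x0, lam0):
    """Integrate state and costate with RK4 and return x(T)."""
    def rhs(x, lam):
        # H = u^2/2 + lam*(-sin(x) + u)  =>  u* = -lam,  lam' = -dH/dx = lam*cos(x)
        return -math.sin(x) - lam, lam * math.cos(x)

    h = T / STEPS
    x, lam = x0, lam0
    for _ in range(STEPS):
        k1 = rhs(x, lam)
        k2 = rhs(x + 0.5 * h * k1[0], lam + 0.5 * h * k1[1])
        k3 = rhs(x + 0.5 * h * k2[0], lam + 0.5 * h * k2[1])
        k4 = rhs(x + h * k3[0], lam + h * k3[1])
        x += h / 6 * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0])
        lam += h / 6 * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1])
    return x


def solve_shooting(x0, guess, max_iter=50):
    """Secant iteration on lam(0) so that x(T) = 0; returns (lam0, updates)."""
    a, b = guess, guess + 1e-3
    ra, rb = terminal_state(x0, a), terminal_state(x0, b)
    for updates in range(max_iter):
        if abs(rb) < TOL:
            return b, updates
        a, b, ra = b, b - rb * (b - a) / (rb - ra), rb
        rb = terminal_state(x0, b)
    raise RuntimeError("shooting did not converge")


def fit_linear(data):
    """Least-squares line lam0 ~ f(x0); a stand-in for training the DNN."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    slope = sum((x - mx) * (y - my) for x, y in data) / sxx
    return lambda x: my + slope * (x - mx)


def interactive_training(n_samples=30, seed=0):
    """Alternate between indirect solves and refitting the surrogate."""
    rng = random.Random(seed)
    data, policy, log = [], None, []
    for _ in range(n_samples):
        x0 = rng.uniform(0.5, 1.5)
        # The surrogate, once available, supplies the initial costate guess.
        guess = policy(x0) if policy is not None else 0.0
        lam0, updates = solve_shooting(x0, guess)
        data.append((x0, lam0))           # optimal action at x0 is u = -lam0
        log.append((policy is not None, updates))
        if len(data) >= 5:
            policy = fit_linear(data)     # "retrain" on all data so far
    return data, policy, log


if __name__ == "__main__":
    data, policy, log = interactive_training()
    _, cold = solve_shooting(1.0, 0.0)
    _, warm = solve_shooting(1.0, policy(1.0))
    print("secant updates, cold start:", cold, "warm start:", warm)
```

In this toy setting the surrogate only needs to land near the true initial costate for the secant iteration to start close to its root, which is the mechanism the abstract describes for making data generation more efficient. A real implementation would replace the line fit with a neural network and the scalar dynamics with the landing equations of motion.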





This work is supported by the National Natural Science Foundation of China (Grant Nos. 11672146 and 11432001).

Author information



Corresponding author

Correspondence to Fanghua Jiang.

Additional information

Lin Cheng received his Ph.D. degree in automation science and electrical engineering from Beihang University, China, in 2017, and is working as a postdoctoral researcher in the School of Aerospace Engineering, Tsinghua University, China. His current interests include guidance and control, trajectory optimization, deep reinforcement learning, and real-time optimal control with AI.

Zhenbo Wang received his B.S. degree in aerospace engineering from Nanjing University of Aeronautics and Astronautics, China, in 2010, and his M.S. degree in control engineering from Beihang University, China, in 2013. In 2018, he received his Ph.D. degree in aerospace engineering from Purdue University, USA. Currently he is working as a tenure-track assistant professor in the Department of Mechanical, Aerospace, and Biomedical Engineering at the University of Tennessee, USA. His current interests are in the area of guidance, control, dynamics, and optimization, and specifically include optimal control, convex optimization, and machine learning with applications to guidance, control, and trajectory optimization for autonomous vehicle systems.

Fanghua Jiang was born in 1982 in Hunan Province, China. He received his B.S. degree in engineering mechanics and Ph.D. degree in mechanics from Tsinghua University, China, in 2004 and 2009, respectively. Since 2009, he has worked in the School of Aerospace Engineering at Tsinghua University, first as a postdoctoral researcher for two years and then as a research assistant for three years. Since then, he has been working as an associate professor for four years. His current research interests include astrodynamics, spacecraft formation flying, and interplanetary trajectory optimization.


Cite this article

Cheng, L., Wang, Z. & Jiang, F. Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm. Astrodyn 3, 375–386 (2019).



  • fuel-optimal landing problem
  • indirect methods
  • deep reinforcement learning
  • interactive network learning
  • real-time optimal control