In this study, a real-time optimal control approach using an interactive deep reinforcement learning algorithm is proposed for the fuel-optimal Moon landing problem. Given remote-communication restrictions and environmental uncertainties, advanced landing control techniques are needed to meet the demanding real-time and autonomy requirements of Moon landing missions. Deep reinforcement learning (DRL) algorithms have recently been developed for real-time optimal control but suffer from slow convergence and difficult reward-function design. To address these problems, a DRL algorithm with an actor-indirect method architecture is developed for optimal control of the Moon landing mission. In this algorithm, an indirect method generates the optimal control actions used to train the deep neural networks (DNNs), while the trained DNNs in turn provide good initial guesses for the indirect method, improving the efficiency of training-data generation. Through sufficient learning of the state-action relationship, the trained DNNs approximate the optimal actions and steer the spacecraft to the target in real time. Additionally, a nonlinear feedback controller is developed to improve terminal landing accuracy. Numerical simulations verify the effectiveness of the proposed DRL algorithm and demonstrate the performance of the developed optimal landing controller.
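The interactive loop described above — an indirect (shooting) solver producing optimal state-action pairs, and a learned surrogate feeding initial guesses back to the solver — can be illustrated with a minimal sketch. This is not the paper's algorithm: it uses a toy one-dimensional minimum-effort problem (dynamics x' = u, cost ∫u²/2, with constant costate λ and optimal control u = -λ) and a linear least-squares fit in place of the DNN; `solve_indirect`, `shoot`, and the surrogate coefficients are all illustrative names.

```python
import numpy as np
from scipy.optimize import root

T = 1.0  # fixed final time; optimal costate satisfies lambda = x0 / T

def shoot(lam, x0):
    # Integrate x' = u = -lambda over [0, T] (forward Euler);
    # the shooting residual is the terminal miss x(T) - 0.
    x, dt = x0, T / 100
    for _ in range(100):
        x += -lam * dt
    return x

def solve_indirect(x0, guess):
    # "Indirect method": find the initial costate zeroing the residual,
    # starting from the surrogate's guess.
    sol = root(lambda lam: shoot(lam[0], x0), [guess])
    return sol.x[0], sol.success

# Interactive loop: the surrogate "actor" supplies costate guesses,
# and converged indirect solutions supply its training data.
coeffs = np.zeros(2)            # linear surrogate lambda ~ a*x0 + b
states, costates = [], []
rng = np.random.default_rng(0)
for it in range(3):
    for x0 in rng.uniform(-5.0, 5.0, 20):
        guess = coeffs[0] * x0 + coeffs[1]
        lam, ok = solve_indirect(x0, guess)
        if ok:
            states.append(x0)
            costates.append(lam)
    coeffs = np.polyfit(states, costates, 1)  # stand-in for DNN training
```

After a few iterations the surrogate recovers the analytic map λ = x0/T, so later shooting problems start from near-exact guesses — the same mechanism by which the trained DNNs accelerate training-data generation in the proposed method.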
This work was supported by the National Natural Science Foundation of China (Grant Nos. 11672146 and 11432001).
Lin Cheng received his Ph.D. degree in automation science and electrical engineering from Beihang University, China, in 2017, and is working as a postdoctoral researcher in the School of Aerospace Engineering, Tsinghua University, China. His current interests include guidance and control, trajectory optimization, deep reinforcement learning, and real-time optimal control with AI.
Zhenbo Wang received his B.S. degree in aerospace engineering from Nanjing University of Aeronautics and Astronautics, China, in 2010, and his M.S. degree in control engineering from Beihang University, China, in 2013. In 2018, he received his Ph.D. degree in aerospace engineering from Purdue University, USA. He is currently a tenure-track assistant professor in the Department of Mechanical, Aerospace, and Biomedical Engineering at the University of Tennessee, USA. His current interests are in the area of guidance, control, dynamics, and optimization, and specifically include optimal control, convex optimization, and machine learning with applications to guidance, control, and trajectory optimization for autonomous vehicle systems.
Fanghua Jiang was born in 1982 in Hunan Province, China. He received his B.S. degree in engineering mechanics and Ph.D. degree in mechanics from Tsinghua University, China, in 2004 and 2009, respectively. Since 2009, he has worked in the School of Aerospace Engineering at Tsinghua University, first as a postdoctoral researcher for two years and then as a research assistant for three years; he has since served as an associate professor for four years. His current research interests include astrodynamics, spacecraft formation flying, and interplanetary trajectory optimization.
Cheng, L., Wang, Z. & Jiang, F. Real-time control for fuel-optimal Moon landing based on an interactive deep reinforcement learning algorithm. Astrodyn 3, 375–386 (2019). https://doi.org/10.1007/s42064-018-0052-2
- fuel-optimal landing problem
- indirect methods
- deep reinforcement learning
- interactive network learning
- real-time optimal control