Abstract
In this chapter, we discuss the opportunities and challenges associated with applying reinforcement learning (RL) in the aerospace domain. In particular, we focus on problems related to sensor resource management, autonomous navigation, advanced manufacturing, maintenance, repair, and overhaul (MRO) operations, and human-machine collaboration. We present two detailed RL case studies, one on sensor tasking for aerial surveillance and one on robot control in an additive manufacturing application, which employ different flavors of RL, including the more recent deep RL framework. Finally, we highlight ongoing research developments that could address key challenges in deploying RL in the aerospace domain.
Copyright information
© 2021 Springer Nature Switzerland AG
Cite this chapter
Surana, A. (2021). Reinforcement Learning: An Industrial Perspective. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-60989-4
Online ISBN: 978-3-030-60990-0
eBook Packages: Intelligent Technologies and Robotics (R0)