
Reinforcement Learning: An Industrial Perspective

Handbook of Reinforcement Learning and Control

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 325)

Abstract

In this chapter, we discuss potential opportunities and challenges associated with applications of reinforcement learning (RL) in the aerospace domain. In particular, we focus on problems related to sensor resource management; autonomous navigation; advanced manufacturing; maintenance, repair, and overhaul operations; and human-machine collaboration. We present two detailed RL case studies, one on sensor tasking for aerial surveillance and one on robot control in an additive manufacturing application, which together utilize different flavors of RL, including the more recent deep RL framework. Finally, we highlight ongoing research developments that could address key challenges in deploying RL in the aerospace domain.
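To make the sensor-tasking setting concrete, the sketch below shows a minimal tabular Q-learning loop on a hypothetical toy problem: a single sensor chooses one of several sectors to observe at each step while a target random-walks between sectors, and the agent is rewarded for tasking the sector containing the target. This is purely illustrative and is not the chapter's formulation; every name and parameter here (N_SECTORS, ALPHA, the random-walk dynamics, the "last seen" state) is an assumption made for the example.

    import random
    from collections import defaultdict

    # Hypothetical toy sensor-tasking problem (illustrative only, not from the chapter).
    N_SECTORS = 5                           # sectors the sensor can observe
    EPISODES, STEPS = 2000, 50
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

    def env_step(target, action):
        """Reward 1 if the tasked sector contains the target; the target then random-walks."""
        reward = 1.0 if action == target else 0.0
        target = (target + random.choice([-1, 0, 1])) % N_SECTORS
        return target, reward

    # State: sector where the target was last detected (a crude belief proxy).
    Q = defaultdict(lambda: [0.0] * N_SECTORS)

    for _ in range(EPISODES):
        target = random.randrange(N_SECTORS)
        last_seen = target
        for _ in range(STEPS):
            # Epsilon-greedy action selection over sectors.
            if random.random() < EPSILON:
                action = random.randrange(N_SECTORS)
            else:
                action = max(range(N_SECTORS), key=lambda a: Q[last_seen][a])
            target, reward = env_step(target, action)
            next_state = action if reward > 0 else last_seen
            # One-step Q-learning update.
            Q[last_seen][action] += ALPHA * (
                reward + GAMMA * max(Q[next_state]) - Q[last_seen][action]
            )
            last_seen = next_state

The deep RL flavor mentioned in the abstract would replace the Q table with a neural-network approximator over much richer state and action spaces, but the agent-environment-reward loop keeps the same shape.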



Author information

Corresponding author

Correspondence to Amit Surana.


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Surana, A. (2021). Reinforcement Learning: An Industrial Perspective. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_21
