
Reinforcement Learning: An Industrial Perspective

Handbook of Reinforcement Learning and Control

Part of the book series: Studies in Systems, Decision and Control (SSDC, volume 325)

Abstract

In this chapter, we discuss potential opportunities and challenges associated with applications of reinforcement learning (RL) in the aerospace domain. In particular, we focus on problems related to sensor resource management; autonomous navigation; advanced manufacturing; maintenance, repair, and overhaul operations; and human-machine collaboration. We present two detailed RL case studies, one on sensor tasking for aerial surveillance and one on robot control in an additive manufacturing application, which together utilize different flavors of RL, including the more recent deep RL framework. Finally, we highlight ongoing research developments that could address key challenges in deploying RL in the aerospace domain.
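To make the sensor-tasking setting concrete, the sketch below shows a minimal tabular Q-learning loop on a hypothetical toy problem: a single sensor chooses one of several sectors to observe at each step while a target random-walks between sectors, and the agent is rewarded for tasking the sector containing the target. This is purely illustrative and is not the chapter's formulation; every name and parameter here (N_SECTORS, ALPHA, the random-walk dynamics, the "last seen" state) is an assumption made for the example.

    import random
    from collections import defaultdict

    # Hypothetical toy sensor-tasking problem (illustrative only, not from the chapter).
    N_SECTORS = 5                           # sectors the sensor can observe
    EPISODES, STEPS = 2000, 50
    ALPHA, GAMMA, EPSILON = 0.1, 0.95, 0.1  # learning rate, discount, exploration rate

    def env_step(target, action):
        """Reward 1 if the tasked sector contains the target; the target then random-walks."""
        reward = 1.0 if action == target else 0.0
        target = (target + random.choice([-1, 0, 1])) % N_SECTORS
        return target, reward

    # State: sector where the target was last detected (a crude belief proxy).
    Q = defaultdict(lambda: [0.0] * N_SECTORS)

    for _ in range(EPISODES):
        target = random.randrange(N_SECTORS)
        last_seen = target
        for _ in range(STEPS):
            # Epsilon-greedy action selection over sectors.
            if random.random() < EPSILON:
                action = random.randrange(N_SECTORS)
            else:
                action = max(range(N_SECTORS), key=lambda a: Q[last_seen][a])
            target, reward = env_step(target, action)
            next_state = action if reward > 0 else last_seen
            # One-step Q-learning update.
            Q[last_seen][action] += ALPHA * (
                reward + GAMMA * max(Q[next_state]) - Q[last_seen][action]
            )
            last_seen = next_state

The deep RL flavor mentioned in the abstract would replace the Q table with a neural-network approximator over much richer state and action spaces, but the agent-environment-reward loop keeps the same shape.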



Author information

Corresponding author

Correspondence to Amit Surana.


Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter


Cite this chapter

Surana, A. (2021). Reinforcement Learning: An Industrial Perspective. In: Vamvoudakis, K.G., Wan, Y., Lewis, F.L., Cansever, D. (eds) Handbook of Reinforcement Learning and Control. Studies in Systems, Decision and Control, vol 325. Springer, Cham. https://doi.org/10.1007/978-3-030-60990-0_21
