Deep Reinforcement Learning: A New Frontier in Computer Vision Research

Vision, Sensing and Analytics: Integrative Approaches

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 207)

Abstract

Computer vision has advanced to the point where machines can, on many tasks, see and interpret scenes much as humans do. Deep learning in particular has raised the bar in computer vision. The recent emergence of deep reinforcement learning promises to reach even greater heights, as it combines deep neural networks with reinforcement learning and offers advantages over both. Because the technique is relatively recent, few works have applied it so far, and its true potential is yet to be unveiled. This chapter therefore sheds light on the fundamentals of deep reinforcement learning, starting with the preliminaries, followed by the theory, the basic algorithms, and some variations, namely attention-aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. The chapter also discusses existing deep reinforcement learning work in computer vision, covering image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition and prediction, and robotics. It further aims to elucidate the existing challenges and research prospects of deep reinforcement learning in computer vision. The chapter can serve as a starting point for aspiring researchers looking to apply deep reinforcement learning in computer vision and to tap into the potential the approach is showing.

Author information

Corresponding author

Correspondence to Sejuti Rahman.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Rahman, S., Sarker, S., Haque, A.K.M.N., Uttsha, M.M. (2021). Deep Reinforcement Learning: A New Frontier in Computer Vision Research. In: Ahad, M.A.R., Inoue, A. (eds) Vision, Sensing and Analytics: Integrative Approaches. Intelligent Systems Reference Library, vol 207. Springer, Cham. https://doi.org/10.1007/978-3-030-75490-7_2
