Deep Reinforcement Learning: A New Frontier in Computer Vision Research

Rahman, Sejuti; Sarker, Sujan; Haque, A. K. M. Nadimul; Uttsha, Monisha Mushtary

doi:10.1007/978-3-030-75490-7_2

Part of the book series: Intelligent Systems Reference Library (ISRL, volume 207)

Abstract

Computer vision has advanced so far that machines can now see and interpret scenes much as humans do, and deep learning in particular has raised the bar of excellence in the field. The recent emergence of deep reinforcement learning promises to reach even greater heights, as it combines deep neural networks with reinforcement learning and offers advantages over both. Because the technique is relatively recent, few works have applied it so far, and its true potential is yet to be unveiled. This chapter therefore sheds light on the fundamentals of deep reinforcement learning, starting with the preliminaries, followed by the theory, the basic algorithms, and some of its variations, namely attention-aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. The chapter also discusses existing deep reinforcement learning works in computer vision, covering image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition, and prediction, and robotics. It further aims to elucidate the existing challenges and research prospects of deep reinforcement learning in computer vision. The chapter may serve as a starting point for aspiring researchers who want to apply deep reinforcement learning to computer vision and tap into the potential the technique is showing.



Author information

Authors and Affiliations

Department of Robotics and Mechatronics Engineering, University of Dhaka, Dhaka, Bangladesh
Sejuti Rahman, Sujan Sarker, A. K. M. Nadimul Haque & Monisha Mushtary Uttsha

Corresponding author

Correspondence to Sejuti Rahman.

Editor information

Editors and Affiliations

Professor, Department of Electrical and Electronic Engineering, University of Dhaka, Dhaka, Bangladesh
Md Atiqur Rahman Ahad
Solutions Architect for Greenfield and Startups, Amazon Web Services, USA, Visiting Professor, Graduate School of Regional Innovation, Mie University, Tsu, Japan
Atsushi Inoue

About this chapter

Cite this chapter

Rahman, S., Sarker, S., Haque, A.K.M.N., Uttsha, M.M. (2021). Deep Reinforcement Learning: A New Frontier in Computer Vision Research. In: Ahad, M.A.R., Inoue, A. (eds) Vision, Sensing and Analytics: Integrative Approaches. Intelligent Systems Reference Library, vol 207. Springer, Cham. https://doi.org/10.1007/978-3-030-75490-7_2

DOI: https://doi.org/10.1007/978-3-030-75490-7_2
Published: 06 June 2021
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75489-1
Online ISBN: 978-3-030-75490-7
eBook Packages: Intelligent Technologies and Robotics, Intelligent Technologies and Robotics (R0)
