Abstract
Computer vision has advanced to the point where machines can now perform many visual tasks that once required human perception, and deep learning in particular has raised the bar for performance in the field. Deep reinforcement learning, a more recent development, promises to go further still. It combines deep neural networks with reinforcement learning and offers several advantages over either approach used alone. Because the technique is relatively new, comparatively few works have applied it, and its full potential has yet to be revealed. This chapter therefore introduces the fundamentals of deep reinforcement learning. It begins with the preliminaries and then covers the theory, the basic algorithms, and several variants, namely attention-aware deep reinforcement learning, deep progressive reinforcement learning, and multi-agent deep reinforcement learning. It also surveys existing deep reinforcement learning work in computer vision, including image processing and understanding, video captioning and summarization, visual search and tracking, action detection, recognition and prediction, and robotics. Finally, it outlines the open challenges and research prospects of deep reinforcement learning in computer vision. The chapter is intended as a starting point for researchers who want to apply deep reinforcement learning to computer vision problems and benefit from its considerable potential.
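The abstract describes deep reinforcement learning as the combination of deep neural networks with reinforcement learning. The following is a minimal illustration of that combination and is not the chapter's own code. It computes the regression targets used to train a deep Q-network (DQN) and its double-DQN variant. The sketch assumes the networks have already been evaluated on the next states, so their outputs are passed in as plain lists of per-action Q-values. The function names `dqn_targets` and `double_dqn_targets` are hypothetical.

```python
from typing import List, Sequence


def dqn_targets(
    rewards: Sequence[float],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """DQN target: y = r + gamma * max_a' Q_target(s', a'), or y = r at terminal states."""
    targets = []
    for r, q_next, done in zip(rewards, next_q_target, dones):
        # No bootstrapping past the end of an episode.
        bootstrap = 0.0 if done else gamma * max(q_next)
        targets.append(r + bootstrap)
    return targets


def double_dqn_targets(
    rewards: Sequence[float],
    next_q_online: Sequence[Sequence[float]],
    next_q_target: Sequence[Sequence[float]],
    dones: Sequence[bool],
    gamma: float = 0.99,
) -> List[float]:
    """Double DQN target: the online network selects a', the target network evaluates it."""
    targets = []
    for r, q_on, q_tgt, done in zip(rewards, next_q_online, next_q_target, dones):
        if done:
            targets.append(r)
            continue
        # Action selection uses the online network's estimates.
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # Action evaluation uses the target network's estimates.
        targets.append(r + gamma * q_tgt[best_action])
    return targets
```

For example, with `gamma=0.5`, a reward of 1.0 and next-state target Q-values `[2.0, 0.5]`, the DQN target is 1.0 + 0.5 × 2.0 = 2.0. If the online network instead prefers the second action, the double-DQN target is 1.0 + 0.5 × 0.5 = 1.25. Separating action selection from action evaluation in this way reduces the overestimation bias that comes from taking a maximum over noisy value estimates.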
Copyright information
© 2021 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Rahman, S., Sarker, S., Haque, A.K.M.N., Uttsha, M.M. (2021). Deep Reinforcement Learning: A New Frontier in Computer Vision Research. In: Ahad, M.A.R., Inoue, A. (eds) Vision, Sensing and Analytics: Integrative Approaches. Intelligent Systems Reference Library, vol 207. Springer, Cham. https://doi.org/10.1007/978-3-030-75490-7_2
DOI: https://doi.org/10.1007/978-3-030-75490-7_2
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-75489-1
Online ISBN: 978-3-030-75490-7
eBook Packages: Intelligent Technologies and Robotics; Intelligent Technologies and Robotics (R0)