Deep Learning Architectures for Computer Vision Applications: A Study

Bagi, Randheer; Dutta, Tanima; Gupta, Hari Prabhat

doi:10.1007/978-981-15-0694-9_56

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 94)

Abstract

Deep learning has become one of the most preferred solutions for many complex problems. It shows outstanding performance in computer vision tasks such as image classification, object detection, and image generation. Recently, many research efforts have focused on adapting deep learning architectures to a wider range of application domains. In this paper, we present a comprehensive survey of the issues and challenges faced by deep learning techniques. Furthermore, we analyze different deep learning architectures, describing the solutions they provide for computer vision tasks and their importance.

Acknowledgements

This work is supported by the Science and Engineering Research Board (SERB) under the Early Career Research Award scheme, file number ECR/2017/002419, for the project entitled "A Robust Medical Image Forensics System for Smart Healthcare".

Author information

Authors and Affiliations

Department of Computer Science and Engineering, IIT (BHU) Varanasi, Varanasi, India
Randheer Bagi, Tanima Dutta & Hari Prabhat Gupta

Corresponding author

Correspondence to Randheer Bagi.

Editor information

Editors and Affiliations

Smart Grid and Renewable Energy, University of Agder, Kristiansand, Norway
Mohan L. Kolhe
Department of Computer Science and Engineering, ABES Engineering College, Ghaziabad, Uttar Pradesh, India
Shailesh Tiwari
Department of Computer Science and Engineering, NIT Agartala, Tripura, India
Munesh C. Trivedi
Computer Science and Engineering, Motilal Nehru National Institute of Technology Allahabad, Prayagraj, India
Krishn K. Mishra

About this paper

Cite this paper

Bagi, R., Dutta, T., Gupta, H.P. (2020). Deep Learning Architectures for Computer Vision Applications: A Study. In: Kolhe, M., Tiwari, S., Trivedi, M., Mishra, K. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 94. Springer, Singapore. https://doi.org/10.1007/978-981-15-0694-9_56

DOI: https://doi.org/10.1007/978-981-15-0694-9_56
Published: 03 January 2020
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0693-2
Online ISBN: 978-981-15-0694-9
eBook Packages: Engineering, Engineering (R0)
