Deep Learning Models and Their Architectures for Computer Vision Applications: A Review

Dutta, Tanima; Bagi, Randheer; Gupta, Hari Prabhat

doi:10.1007/978-981-16-6186-0_2


Part of the book series: Signals and Communication Technology (SCT)


Abstract

Deep learning is a modern and inspiring field of machine learning with a built-in ability to overcome the shortcomings of conventional algorithms, which depend on hand-crafted features. Deep learning approaches have become one of the most recognizable solutions for multimedia and Internet of Things (IoT) applications, such as image segmentation, image enhancement, image classification, image generation, action recognition, pattern recognition, sequence prediction, and object detection. Much research has analyzed the many variants of deep neural network architectures across this wide range of application domains. In this chapter, we focus on the major ideas of deep learning and its architectures, from basic to advanced. We also discuss the motivation, methods, principal components, and limitations of each architecture. This chapter investigates and presents a comprehensive survey of the major architectures of deep learning. It also covers various issues and challenges encountered during the design of these architectures. Furthermore, we discuss the importance and utility of deep neural network architectures in real-world applications. Finally, we conclude the chapter by outlining future research directions.
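The abstract's central contrast is between hand-crafted features and features learned from data. As a toy illustration (not taken from the chapter), the following Python sketch learns a three-tap 1-D convolution kernel by plain gradient descent on mean squared error. The training targets are generated by a fixed central-difference filter `[-1, 0, 1]`, standing in for a hand-designed edge detector, and the learned kernel should converge toward it without the filter ever being written into the model. All names (`conv1d_valid`, `make_dataset`, `learn_kernel`, `HAND_CRAFTED`) and hyperparameters are hypothetical choices for this sketch.

```python
import random


def conv1d_valid(signal, kernel):
    """'Valid' 1-D cross-correlation, which deep-learning libraries call convolution."""
    k = len(kernel)
    return [sum(kernel[j] * signal[i + j] for j in range(k))
            for i in range(len(signal) - k + 1)]


# Hypothetical stand-in for a hand-crafted feature: a fixed central-difference edge filter.
HAND_CRAFTED = [-1.0, 0.0, 1.0]


def make_dataset(n_signals=50, length=32, seed=0):
    """Random input signals paired with the responses of the hand-crafted filter."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_signals):
        x = [rng.uniform(-1.0, 1.0) for _ in range(length)]
        y = conv1d_valid(x, HAND_CRAFTED)  # targets; the learner never sees the filter itself
        data.append((x, y))
    return data


def learn_kernel(data, k=3, lr=0.5, epochs=100, seed=1):
    """Fit a k-tap kernel by full-batch gradient descent on mean squared error."""
    rng = random.Random(seed)
    w = [rng.uniform(-0.1, 0.1) for _ in range(k)]  # small random initialization
    for _ in range(epochs):
        grad = [0.0] * k
        count = 0
        for x, y in data:
            pred = conv1d_valid(x, w)
            for i, (p, t) in enumerate(zip(pred, y)):
                err = p - t
                # d(err^2)/dw_j = 2 * err * x[i + j]
                for j in range(k):
                    grad[j] += 2.0 * err * x[i + j]
                count += 1
        # Average the gradient over all output positions and take one step.
        w = [wj - lr * gj / count for wj, gj in zip(w, grad)]
    return w


if __name__ == "__main__":
    learned = learn_kernel(make_dataset())
    print("learned kernel:", [round(v, 4) for v in learned])
```

Running `learn_kernel(make_dataset())` should return weights close to `[-1.0, 0.0, 1.0]`. A real convolutional network stacks many such kernels with nonlinearities and learns them from task labels rather than from a known filter; this sketch only shows the learning mechanism for a single linear filter.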



Author information

Authors and Affiliations

Department of Computer Science and Engineering, IIT (BHU) Varanasi, Varanasi, Uttar Pradesh, India
Tanima Dutta, Randheer Bagi & Hari Prabhat Gupta


Corresponding author

Correspondence to Randheer Bagi.

Editor information

Editors and Affiliations

University of Derby, Derby, UK
Aaisha Makkar
Thapar Institute of Engineering and Technology, Patiala, Punjab, India
Neeraj Kumar


About this chapter

Cite this chapter

Dutta, T., Bagi, R., Gupta, H.P. (2021). Deep Learning Models and Their Architectures for Computer Vision Applications: A Review. In: Makkar, A., Kumar, N. (eds) Deep Learning for Security and Privacy Preservation in IoT. Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-16-6186-0_2


DOI: https://doi.org/10.1007/978-981-16-6186-0_2
Published: 04 April 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6185-3
Online ISBN: 978-981-16-6186-0
eBook Packages: Computer Science, Computer Science (R0)
