Abstract
Deep learning is a modern and inspiring field of machine learning that overcomes a key shortcoming of conventional algorithms: their dependence on hand-crafted features. Deep learning approaches have become one of the most widely recognized solutions for multimedia and Internet of Things applications such as image segmentation, image enhancement, image classification, image generation, action recognition, pattern recognition, sequence prediction, and object detection. Much research has analyzed variants of deep neural network architectures across a wide range of application domains. In this chapter, we focus on the main ideas of deep learning and its architectures, from basic to advanced. We also discuss the motivation, methods, principal components, and limitations of each architecture. The chapter presents a comprehensive survey of the major deep learning architectures and covers the issues and challenges encountered when designing them. Furthermore, we discuss the importance and utility of deep neural network architectures in real-world applications. Finally, we conclude by outlining future research directions.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this chapter
Cite this chapter
Dutta, T., Bagi, R., Gupta, H.P. (2021). Deep Learning Models and Their Architectures for Computer Vision Applications: A Review. In: Makkar, A., Kumar, N. (eds) Deep Learning for Security and Privacy Preservation in IoT. Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-16-6186-0_2
DOI: https://doi.org/10.1007/978-981-16-6186-0_2
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-6185-3
Online ISBN: 978-981-16-6186-0
eBook Packages: Computer Science, Computer Science (R0)