Deep Learning Models and Their Architectures for Computer Vision Applications: A Review

  • Chapter
Deep Learning for Security and Privacy Preservation in IoT

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Deep learning is a modern and inspiring field of machine learning that overcomes a key shortcoming of conventional algorithms, namely their dependence on hand-crafted features. Deep learning approaches have become one of the most widely recognized solutions for multimedia and Internet of Things applications such as image segmentation, image enhancement, image classification, image generation, action recognition, pattern recognition, sequence prediction, and object detection. Much research has analyzed the many variants of deep neural network architectures across a wide range of application domains. In this chapter, we focus on the main ideas of deep learning and its architectures, from basic to advanced. We also discuss the motivation, methods, principal components, and limitations of each architecture. The chapter presents a comprehensive survey of the major deep learning architectures and covers the issues and challenges encountered during architecture design. Furthermore, we discuss the importance and utility of deep neural network architectures in real-world applications. Finally, we conclude the chapter by outlining future directions of research.

Author information

Corresponding author

Correspondence to Randheer Bagi.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this chapter

Cite this chapter

Dutta, T., Bagi, R., Gupta, H.P. (2021). Deep Learning Models and Their Architectures for Computer Vision Applications: A Review. In: Makkar, A., Kumar, N. (eds) Deep Learning for Security and Privacy Preservation in IoT. Signals and Communication Technology. Springer, Singapore. https://doi.org/10.1007/978-981-16-6186-0_2

  • DOI: https://doi.org/10.1007/978-981-16-6186-0_2

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-6185-3

  • Online ISBN: 978-981-16-6186-0

  • eBook Packages: Computer Science (R0)
