Deep Learning Architectures for Computer Vision Applications: A Study

  • Conference paper
Advances in Data and Information Sciences

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 94)

Abstract

Deep learning has become one of the most preferred solutions for many complex problems. It shows outstanding performance in computer vision on tasks such as image classification, object detection, and image generation. Recently, many research efforts have focused on adapting deep learning architectures to a wider range of application domains. In this paper, we present a comprehensive survey of the issues and challenges faced by deep learning techniques. Furthermore, we analyze different deep learning architectures that address computer vision tasks and discuss their importance.

Acknowledgements

This work is supported by the Science and Engineering Research Board (SERB) under the Early Career Research Award scheme, file number ECR/2017/002419, for the project entitled "A Robust Medical Image Forensics System for Smart Healthcare".

Author information

Corresponding author

Correspondence to Randheer Bagi.

Copyright information

© 2020 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Bagi, R., Dutta, T., Gupta, H.P. (2020). Deep Learning Architectures for Computer Vision Applications: A Study. In: Kolhe, M., Tiwari, S., Trivedi, M., Mishra, K. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 94. Springer, Singapore. https://doi.org/10.1007/978-981-15-0694-9_56

  • DOI: https://doi.org/10.1007/978-981-15-0694-9_56

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-15-0693-2

  • Online ISBN: 978-981-15-0694-9

  • eBook Packages: Engineering, Engineering (R0)
