Abstract
Deep learning has become one of the most preferred solution for many complex problems. It shows outstanding performance in the field of computer vision to perform tasks like, image classification, object detection, and image generation. Recently, many research efforts are focused on changing the deep learning architecture for widespread application domain. In this paper, we present a comprehensive survey on the various issues and challenges faced by deep learning techniques. Furthermore, we analyze different deep learning architectures to provide the solution for the computer vision tasks along with their importance.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Chen, L. C., Papandreou, G., Schroff, F., & Adam, H. (2017). Rethinking atrous convolution for semantic image segmentation. CoRR abs/1706.05587.
Choe, J. W., Nikoozadeh, A., & Oralkan, O., Khuri-Yakub, B.T. (2013). GPU-based real-time volumetric ultrasound image reconstruction for a ring array. IEEE Transactions on Medical Imaging,32(7), 1258–1264.
Choi, Y., Choi, M., Kim, M., Ha, J., Kim, S., & Choo, J. (2017). StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation. CoRR abs/1711.09020.
Forsyth, D. A., & Ponce, J. (2002). Computer vision: A modern approach. Pearson Education India.
Girshick, R. (2015). Fast R-CNN. In Proceedings of IEEE ICCV (pp. 1440–1448).
Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning. MIT Press.
Goodfellow, I. J., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., et al. (2014). Generative adversarial nets. In Proceedings of NIPS (pp. 2672–2680).
Graves, A., Mohamed, A., & Hinton, G. (2013). Speech recognition with deep recurrent neural networks. In Proceedings of IEEE ICASSP (pp. 6645–6649).
Guo, T., Dong, J., Li, H., & Gao, Y. (2017). Simple convolutional neural network on image classification. In Proceeding of IEEE ICBDA (pp. 721–724).
Hall, M. A., & Smith, L. A. (1999). Feature selection for machine learning: Comparing a correlation-based filter approach to the wrapper. In Proceedings of IFAIRSC (pp. 235–239).
Hatcher, W. G., & Yu, W. (2018). A survey of deep learning: Platforms, applications and emerging research trends. IEEE Access,6, 24411–24432.
He, K., Zhang, X., Ren, S., & Sun, J. (2015). Deep residual learning for image recognition. CoRR abs/1512.03385.
He, K., Gkioxari, G., Dollr, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of IEEE ICCV (pp. 2980–2988).
Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation,9(8), 1735–1780.
Hu, B., Lu, Z., Li, H., & Chen, Q. (2014). Convolutional neural network architectures for matching natural language sentences. Advances in Neural Information Processing Systems,27, 2042–2050.
Jaiswal, A., AbdAlmageed, W., & Natarajan, P. (2018). CapsuleGAN: Generative adversarial capsule network. In ECCV Workshops.
Kaiming, H., Zhang, X., Ren, S., Sun, J. (2015). Delving deep into rectifiers: Surpassing human-level performance on imageNet classification. In Proceedings of IEEE ICCV (pp. 1026–1034).
Karras, T., Laine, S., & Aila, T. (2018). A style-based generator architecture for generative adversarial networks. CoRR abs/1812.04948.
Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. Advances in Neural Information Processing Systems,25, 1097–1105.
Lecun, Y., Bottou, L., Bengio, Y., & Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE,86(11), 2278–2324.
O’Shea, K., & Nash, R. (2015). An introduction to convolutional neural networks. ArXiv e-prints.
Redmon, J., Divvala, S., Girshick, R., & Farhadi, A. (2016a). You only look once: Unified, real-time object detection. In Proceeding of IEEE CVPR (pp. 779–788).
Redmon, J., Divvala, S. K., Girshick, R. B., & Farhadi, A. (2016b). You only look once: Unified, real-time object detection. Proceeding of IEEE CVPR (pp. 779–788).
Ren, S., He, K., Girshick, R., & Sun, J. (2015). Faster R-CNN: Towards real-time object detection with region proposal networks. Advances in Neural Information Processing Systems,28, 91–99.
Sabour, S., Frosst, N., & Hinton, G. E. (2017). Dynamic routing between capsules. CoRR abs/1710.09829, 1710.09829.
Sak, H., Senior, A. W., & Beaufays, F. (2014). Long short-term memory based recurrent neural network architectures for large vocabulary speech recognition. CoRR abs/1402.1128.
Schuster, M., & Paliwal, K. (1997). Bidirectional recurrent neural networks. Transaction in Signal Processing,45(11), 2673–2681.
Shin, H., Roth, H. R., Gao, M., Lu, L., Xu, Z., Nogues, I., et al. (2016). Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging,35(5), 1285–1298.
Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556.
Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. Advances in Neural Information Processing Systems,27, 3104–3112.
Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., et al. (2015). Going deeper with convolutions. In Proceeding of IEEE CVPR (pp. 1–9).
Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., & Wojna, Z. (2016). Rethinking the inception architecture for computer vision. In Proceeding of IEEE CVPR.
Turner, C. R., Wolf, A. L., Fuggetta, A., & Lavazza, L. (1998). Feature engineering. In Proceedings of IWSSD (p. 162).
Vijay, B., Kendall, A., & Cipolla, R. (2017). SegNet: A deep convolutional encoder-decoder architecture for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence.
Zeiler, M. D., & Fergus, R. (2013). Visualizing and understanding convolutional networks. CoRR.
Zhao, Q., Sheng, T., Wang, Y., Tang, Z., Chen, Y., Cai, L., et al. (2018). M2Det: A single-shot object detector based on multi-level feature pyramid network. CoRR abs/1811.04533.
Zhu, J. Y., Krähenbühl, P., Shechtman, E., & Efros, A. A. (2016). Generative visual manipulation on the natural image manifold. CoRR abs/1609.03552.
Acknowledgements
This work is supported by Science and Engineering Research Board (SERB) file number ECR/2017/002419, project entitled as A Robust Medical Image Forensics System for Smart Healthcare, and scheme Early Career Research Award.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Bagi, R., Dutta, T., Gupta, H.P. (2020). Deep Learning Architectures for Computer Vision Applications: A Study. In: Kolhe, M., Tiwari, S., Trivedi, M., Mishra, K. (eds) Advances in Data and Information Sciences. Lecture Notes in Networks and Systems, vol 94. Springer, Singapore. https://doi.org/10.1007/978-981-15-0694-9_56
Download citation
DOI: https://doi.org/10.1007/978-981-15-0694-9_56
Published:
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-0693-2
Online ISBN: 978-981-15-0694-9
eBook Packages: EngineeringEngineering (R0)