Image Captioning Using Deep Learning Model

Patel, Disha; Gandhi, Ankita; Bhaidasna, Zubin

doi:10.1007/978-981-19-2500-9_16

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 444)

Abstract

Image captioning is the automatic generation of natural-language descriptions from the content of an image. It is an important aspect of scene comprehension because it combines knowledge from computer vision and natural language processing. Researchers have developed numerous methods and algorithms to increase the accuracy of image captioning, but obtaining an optimized captioning result remains one of the major open questions for future research. Furthermore, there are thousands of gray-scale images that are captioned. In the proposed work, different pre-trained Convolutional Neural Network (CNN) models are used to extract features from the colored and gray-scale images in a dataset, and the extracted features are then fed into a Long Short-Term Memory (LSTM) network, which generates captions for the images. Finally, the model's accuracy on color and gray-scale images is studied to determine its capability in captioning both types of images.
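
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the luma weights (ITU-R BT.601), the function names `to_grayscale_3ch` and `greedy_caption`, the `<start>`/`<end>` tokens, and the `next_word` callback standing in for a trained CNN-plus-LSTM model are all assumptions made for illustration. The sketch shows one common way to prepare a gray-scale copy of a color image so that the same pre-trained CNN can accept it, and how a decoder might emit a caption one word at a time.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]


def to_grayscale_3ch(image: Image) -> Image:
    """Convert an RGB image to gray-scale, replicated over three channels.

    Assumption: ITU-R BT.601 luma weights. The chapter does not state how
    its gray-scale images were produced. Keeping three identical channels
    lets a CNN that expects RGB input process the gray-scale image unchanged.
    """
    out: Image = []
    for row in image:
        new_row: List[Pixel] = []
        for r, g, b in row:
            y = int(round(0.299 * r + 0.587 * g + 0.114 * b))
            new_row.append((y, y, y))
        out.append(new_row)
    return out


def greedy_caption(
    features: Sequence[float],
    next_word: Callable[[Sequence[float], List[str]], str],
    max_len: int = 20,
    start: str = "<start>",
    end: str = "<end>",
) -> str:
    """Build a caption by repeatedly asking the decoder for the next word.

    `next_word` is a hypothetical stand-in for a trained LSTM decoder: it
    receives the image features and the words generated so far and returns
    the single most likely next word.
    """
    words = [start]
    for _ in range(max_len):
        word = next_word(features, words)
        if word == end:
            break
        words.append(word)
    return " ".join(words[1:])  # drop the start token


if __name__ == "__main__":
    # Toy decoder that replays a fixed script, ignoring the features.
    script = ["a", "dog", "runs", "<end>"]
    toy_decoder = lambda feats, words: script[len(words) - 1]
    print(to_grayscale_3ch([[(255, 0, 0), (0, 0, 0)]]))
    print(greedy_caption([0.0], toy_decoder))
```

In a real system the features would come from a pre-trained CNN and `next_word` would wrap the LSTM's prediction step; comparing caption quality on the original and the converted images is then a matter of running the same decoder on both feature sets.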

Author information

Authors and Affiliations

Parul Institute of Engineering and Technology, Vadodara, India
Disha Patel, Ankita Gandhi & Zubin Bhaidasna

Corresponding author

Correspondence to Disha Patel.

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, GITAM University, Bangalore, India
I. Jeena Jacob
Department of Mathematics and Computer Science, Ashland University, Ashland, OH, USA
Selvanayaki Kolandapalayam Shanmugam
Department of Telecommunication Engineering, Czech Technical University in Prague, Prague, Czech Republic
Robert Bestak

Cite this paper

Patel, D., Gandhi, A., Bhaidasna, Z. (2022). Image Captioning Using Deep Learning Model. In: Jacob, I.J., Kolandapalayam Shanmugam, S., Bestak, R. (eds) Expert Clouds and Applications. Lecture Notes in Networks and Systems, vol 444. Springer, Singapore. https://doi.org/10.1007/978-981-19-2500-9_16

DOI: https://doi.org/10.1007/978-981-19-2500-9_16
Published: 18 August 2022
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-2499-6
Online ISBN: 978-981-19-2500-9
eBook Packages: Intelligent Technologies and Robotics (R0)
