Abstract
Capturing image features and multi-object regions of an image and transferring them into a natural-language sentence is a research issue that needs to be addressed with natural language processing. Technically, an attention mechanism forces every word representation to attend to a corresponding image region; however, it sometimes neglects non-visual words such as 'the' in the description text, which misleads the text interpretation. Captioning an image involves not only detecting features from the image but also decoding the interactions between the detected objects into a meaningful sentence. The proposed work predicts the image sentence in a more detailed way for every region/frame of an image. To this end, image features are extracted using a CNN, and the image text is generated by an LSTM equipped with an adaptive attention mechanism, which is added to the LSTM layer to predict a better image sentence. The deep network methods have been analysed using two output combinations, and experiments have been implemented on the Flickr8k dataset. The analysis illustrates that the model with adaptive attention performs significantly better than the same image-sentence model without it and generates more meaningful captions than any of the individual models used. On the test images, the suggested network achieves an accuracy of 81.53% and a BLEU score of 61.94 with adaptive attention in the LSTM, against 73.53% and 57.94 without it.
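The adaptive-attention idea described above — letting the decoder fall back on a "visual sentinel" instead of an image region when generating non-visual words such as 'the' — can be illustrated with a minimal sketch. This is not the paper's implementation; all weight matrices and sizes below are illustrative assumptions, in the style of the visual-sentinel formulation of adaptive attention.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adaptive_attention(V, h, s, Wv, Wg, Ws, w):
    """One adaptive-attention step (illustrative sketch).

    V : (k, d) CNN region features
    h : (d,)   LSTM hidden state at this time step
    s : (d,)   visual sentinel (a learned "non-visual" fallback)
    Wv, Wg, Ws : (d, d) projection matrices; w : (d,) scoring vector
    """
    # Attention scores over the k image regions
    z = np.tanh(V @ Wv + h @ Wg) @ w          # (k,)
    # Extra score for the sentinel, competing with the regions
    z_s = np.tanh(s @ Ws + h @ Wg) @ w        # scalar
    alpha = softmax(np.append(z, z_s))        # (k + 1,) weights
    beta = alpha[-1]                          # sentinel gate in [0, 1]
    c = alpha[:-1] @ V                        # visual context vector (d,)
    # Blend: high beta means "rely on language, not the image"
    c_hat = beta * s + (1.0 - beta) * c
    return c_hat, alpha, beta

# Toy example: 4 regions, 8-dimensional features
k, d = 4, 8
V = rng.standard_normal((k, d))
h = rng.standard_normal(d)
s = rng.standard_normal(d)
Wv, Wg, Ws = (rng.standard_normal((d, d)) for _ in range(3))
w = rng.standard_normal(d)
c_hat, alpha, beta = adaptive_attention(V, h, s, Wv, Wg, Ws, w)
```

In a full model, `c_hat` would be combined with the LSTM hidden state to predict the next word; when the gate `beta` is near 1, the prediction depends mostly on the language model rather than on any image region.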
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Vidhya, K.A., Krishnakumar, S., Cynddia, B. (2023). Adaptive Multi-attention for Image Sentence Generator Using C-LSTM. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Seventh International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 448. Springer, Singapore. https://doi.org/10.1007/978-981-19-1610-6_51
Publisher Name: Springer, Singapore
Print ISBN: 978-981-19-1609-0
Online ISBN: 978-981-19-1610-6
eBook Packages: Engineering (R0)