
Adaptive Multi-attention for Image Sentence Generator Using C-LSTM

  • Conference paper
Proceedings of Seventh International Congress on Information and Communication Technology

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 448)


Abstract

Capturing the features and multi-object regions of an image and translating them into a natural-language sentence is a research problem that spans computer vision and natural language processing. Technically, an attention mechanism forces every word representation to align with a corresponding image region; however, it can mishandle function words such as 'the' in the description text, which misleads the interpretation. Captioning an image involves not only detecting features in various images but also decoding the interactions between the detected objects into a meaningful sentence. The proposed work predicts the image sentence in greater detail for every region/frame of an image: image features are extracted with a CNN, an LSTM generates the sentence, and an adaptive attention mechanism is added to the LSTM layer so that a better image sentence is constructed. The resulting deep network has been analyzed using two output combinations. Experiments were conducted on the Flickr8k dataset. The analysis shows that the model with adaptive attention performs significantly better than the one without it and generates more meaningful captions than any of the individual models. On the test images, the proposed network achieves an accuracy of 81.53% and a BLEU score of 61.94% with adaptive attention in the LSTM, compared with 73.53% and 57.94% without it.
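The authors' implementation is not reproduced on this page. As a rough illustration of the mechanism the abstract describes, the following is a minimal PyTorch sketch of adaptive attention with a visual sentinel, in the style of Lu et al.'s "Knowing When to Look": the decoder attends over CNN region features plus a sentinel vector, so it can fall back on its language state for non-visual words such as 'the'. All class names, dimensions, and tensors below are illustrative assumptions, not the paper's code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class AdaptiveAttention(nn.Module):
    """Adaptive attention with a visual sentinel (illustrative sketch).

    At each decoding step the scorer attends over k CNN region features
    plus a sentinel vector derived from the LSTM memory cell. Assumes
    feat_dim == hidden_dim so the visual and sentinel contexts can be
    mixed directly.
    """

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)    # project region features
        self.hid_proj = nn.Linear(hidden_dim, attn_dim)   # project decoder hidden state
        self.sent_proj = nn.Linear(hidden_dim, attn_dim)  # project sentinel vector
        self.score = nn.Linear(attn_dim, 1)               # scalar attention logits

    def forward(self, feats, hidden, sentinel):
        # feats: (B, k, feat_dim); hidden, sentinel: (B, hidden_dim)
        h = self.hid_proj(hidden).unsqueeze(1)                   # (B, 1, attn_dim)
        z_v = self.score(torch.tanh(self.feat_proj(feats) + h))  # (B, k, 1)
        s = self.sent_proj(sentinel).unsqueeze(1)                # (B, 1, attn_dim)
        z_s = self.score(torch.tanh(s + h))                      # (B, 1, 1)
        # Joint softmax over k regions + 1 sentinel slot.
        alpha = F.softmax(torch.cat([z_v, z_s], dim=1), dim=1)   # (B, k+1, 1)
        beta = alpha[:, -1]                                      # (B, 1): sentinel weight
        # Visual weights already sum to 1 - beta, so no renormalization is needed.
        visual_ctx = (alpha[:, :-1] * feats).sum(dim=1)          # (B, feat_dim)
        context = beta * sentinel + visual_ctx                   # adaptive context
        return context, alpha.squeeze(-1)


# Illustrative step: 49 region features (a 7x7 CNN grid), batch of 2.
attn = AdaptiveAttention(feat_dim=512, hidden_dim=512, attn_dim=256)
feats = torch.randn(2, 49, 512)       # CNN encoder output
hidden = torch.randn(2, 512)          # LSTM hidden state h_t
sentinel = torch.randn(2, 512)        # sentinel s_t = sigmoid(gate) * tanh(c_t)
context, alpha = attn(feats, hidden, sentinel)
print(context.shape, alpha.shape)     # torch.Size([2, 512]) torch.Size([2, 50])
```

A sentinel weight near 1 means the decoder is relying on its language model state for the next word rather than on any image region, which is the behavior the abstract motivates for words like 'the'.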



Author information


Corresponding author

Correspondence to K. A. Vidhya.


Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Vidhya, K.A., Krishnakumar, S., Cynddia, B. (2023). Adaptive Multi-attention for Image Sentence Generator Using C-LSTM. In: Yang, XS., Sherratt, S., Dey, N., Joshi, A. (eds) Proceedings of Seventh International Congress on Information and Communication Technology. Lecture Notes in Networks and Systems, vol 448. Springer, Singapore. https://doi.org/10.1007/978-981-19-1610-6_51


  • DOI: https://doi.org/10.1007/978-981-19-1610-6_51

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-1609-0

  • Online ISBN: 978-981-19-1610-6

  • eBook Packages: Engineering, Engineering (R0)
