
Generation and Evaluation of Hindi Image Captions of Visual Genome

Conference paper. In: Proceedings of the International Conference on Computing and Communication Systems.

Part of the book series: Lecture Notes in Networks and Systems (LNNS, volume 170)

Abstract

Automatic image caption generation with proper fluency and expressiveness is an emerging area of research. A lot of research has been done on image caption generation for English, but very little work has addressed generating and evaluating captions in Hindi. In this paper, the problem of generating and evaluating Hindi captions is addressed using a framework based on a convolutional neural network (CNN) and long short-term memory (LSTM). The model maximizes the likelihood of the target caption for an input image. The framework is evaluated on the Hindi Visual Genome dataset, and the generated output is assessed with human evaluation as well as pre-defined automatic evaluation metrics. The experimental results show that the model generates reasonably good Hindi captions.
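The abstract describes a standard encoder-decoder captioning pipeline: a CNN encodes the image into a feature vector that conditions an LSTM decoder, and training maximizes the likelihood of the reference Hindi caption. As a rough illustration only (the paper's exact architecture and hyperparameters are not given here), the sketch below shows one common way to wire such a model in PyTorch; the class name HindiCaptioner, the ResNet-50 backbone, and the embedding/hidden sizes are all illustrative assumptions.

```python
# Minimal sketch of a CNN + LSTM caption generator, in the spirit of the
# framework described in the abstract. Not the authors' code: the backbone,
# layer sizes, and names below are assumptions chosen for illustration.
import torch
import torch.nn as nn
import torchvision.models as models


class HindiCaptioner(nn.Module):
    def __init__(self, vocab_size, embed_dim=256, hidden_dim=512):
        super().__init__()
        # CNN encoder: pretrained ResNet-50 with its classification head removed.
        resnet = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
        self.encoder = nn.Sequential(*list(resnet.children())[:-1])
        self.img_proj = nn.Linear(resnet.fc.in_features, embed_dim)
        # LSTM decoder over Hindi caption tokens, conditioned on the image feature.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, images, captions):
        # images: (B, 3, H, W); captions: (B, T) token ids, teacher-forced.
        with torch.no_grad():
            feats = self.encoder(images).flatten(1)       # (B, 2048) image feature
        img_emb = self.img_proj(feats).unsqueeze(1)       # (B, 1, E), fed as step 0
        word_emb = self.embed(captions[:, :-1])           # (B, T-1, E)
        hidden, _ = self.lstm(torch.cat([img_emb, word_emb], dim=1))
        return self.out(hidden)                           # (B, T, vocab) logits


# Maximizing the caption likelihood is the usual cross-entropy objective:
#   loss = nn.CrossEntropyLoss(ignore_index=pad_id)(
#       logits.reshape(-1, vocab_size), captions.reshape(-1))
```

At inference time such a decoder would be unrolled greedily or with beam search from a start token, and the generated Hindi captions would then be compared against references using the automatic metrics and human evaluation mentioned in the abstract.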


Notes

  1. https://ufal.mff.cuni.cz/Hindi-visual-genome.


Acknowledgements

This work is supported by the Scheme for Promotion of Academic and Research Collaboration (SPARC), Project Code P995, No. SPARC/2018–2019/119/SL(IN), under MHRD, Govt. of India.

Author information

Corresponding author

Correspondence to Alok Singh.



Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Singh, A., Meetei, L.S., Singh, T.D., Bandyopadhyay, S. (2021). Generation and Evaluation of Hindi Image Captions of Visual Genome. In: Maji, A.K., Saha, G., Das, S., Basu, S., Tavares, J.M.R.S. (eds) Proceedings of the International Conference on Computing and Communication Systems. Lecture Notes in Networks and Systems, vol 170. Springer, Singapore. https://doi.org/10.1007/978-981-33-4084-8_7

