Abstract
Automatic generation of image captions with proper fluency and expressiveness is an emerging area of research. Extensive work exists on image caption generation for English, but very little has been done on generating and evaluating captions in Hindi. In this paper, the problem of generating and evaluating Hindi captions is addressed using a framework based on a convolutional neural network (CNN) and long short-term memory (LSTM), trained to maximize the likelihood of the target caption for a given input image. The framework is evaluated on the Hindi Visual Genome dataset, using both human evaluation and standard automatic evaluation metrics. The experimental results show that the model generates reasonably impressive Hindi captions.
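The CNN-plus-LSTM formulation described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact architecture: `CaptionDecoder`, the dimensions, and the toy data are all assumptions. A CNN image feature (here a random stand-in for, e.g., pretrained CNN output) is projected into the LSTM's input space and fed as the first step; the LSTM then predicts each caption token, and maximizing the caption's likelihood is implemented as minimizing next-token cross-entropy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class CaptionDecoder(nn.Module):
    """Hypothetical minimal caption model: project a CNN image feature into the
    LSTM input space, then unroll the LSTM over caption tokens."""
    def __init__(self, feat_dim, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)    # CNN feature -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)  # caption token embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # hidden state -> token logits

    def forward(self, img_feat, captions):
        img_tok = self.img_proj(img_feat).unsqueeze(1)           # (B, 1, E): image as first step
        seq = torch.cat([img_tok, self.embed(captions)], dim=1)  # (B, T+1, E)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                                  # (B, T+1, V)

vocab_size = 50
model = CaptionDecoder(feat_dim=512, vocab_size=vocab_size)
img_feat = torch.randn(2, 512)                   # stand-in for pretrained CNN features
captions = torch.randint(0, vocab_size, (2, 7))  # toy caption token ids

logits = model(img_feat, captions)               # (2, 8, 50)
# Step t predicts token t, so drop the final step and score against the caption.
# Minimizing this cross-entropy maximizes the caption's log-likelihood.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), captions.reshape(-1)
)
```

In practice the image feature would come from a pretrained CNN and the token ids from a Hindi vocabulary built over the training captions; at inference time, captions are decoded token by token (e.g., greedily or with beam search) rather than teacher-forced as above.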
Acknowledgements
This work is supported by the Scheme for Promotion of Academic and Research Collaboration (SPARC), Project Code P995, No. SPARC/2018–2019/119/SL(IN), under MHRD, Govt. of India.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Singh, A., Meetei, L.S., Singh, T.D., Bandyopadhyay, S. (2021). Generation and Evaluation of Hindi Image Captions of Visual Genome. In: Maji, A.K., Saha, G., Das, S., Basu, S., Tavares, J.M.R.S. (eds) Proceedings of the International Conference on Computing and Communication Systems. Lecture Notes in Networks and Systems, vol 170. Springer, Singapore. https://doi.org/10.1007/978-981-33-4084-8_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4083-1
Online ISBN: 978-981-33-4084-8
eBook Packages: Intelligent Technologies and Robotics (R0)