Abstract
Automatic generation of image captions with proper fluency and expressiveness is an emerging area of research. Extensive work exists on image caption generation for English, but very little has been done on generating and evaluating captions in Hindi. In this paper, the problem of generating and evaluating Hindi captions is addressed using a framework based on a convolutional neural network (CNN) and long short-term memory (LSTM), trained to maximize the likelihood of the target caption for a given input image. The framework is evaluated on the Hindi Visual Genome dataset, using both human evaluation and standard automatic evaluation metrics. The experimental results show that the model generates reasonably impressive Hindi captions.
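The CNN-plus-LSTM formulation described in the abstract can be sketched as follows. This is a minimal illustrative sketch, not the authors' exact architecture: `CaptionDecoder`, the dimensions, and the toy data are all assumptions. A CNN image feature (here a random stand-in for, e.g., pretrained CNN output) is projected into the LSTM's input space and fed as the first step; the LSTM then predicts each caption token, and maximizing the caption's likelihood is implemented as minimizing next-token cross-entropy.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class CaptionDecoder(nn.Module):
    """Hypothetical minimal caption model: project a CNN image feature into the
    LSTM input space, then unroll the LSTM over caption tokens."""
    def __init__(self, feat_dim, vocab_size, embed_dim=128, hidden_dim=128):
        super().__init__()
        self.img_proj = nn.Linear(feat_dim, embed_dim)    # CNN feature -> embedding space
        self.embed = nn.Embedding(vocab_size, embed_dim)  # caption token embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)      # hidden state -> token logits

    def forward(self, img_feat, captions):
        img_tok = self.img_proj(img_feat).unsqueeze(1)           # (B, 1, E): image as first step
        seq = torch.cat([img_tok, self.embed(captions)], dim=1)  # (B, T+1, E)
        hidden, _ = self.lstm(seq)
        return self.out(hidden)                                  # (B, T+1, V)

vocab_size = 50
model = CaptionDecoder(feat_dim=512, vocab_size=vocab_size)
img_feat = torch.randn(2, 512)                   # stand-in for pretrained CNN features
captions = torch.randint(0, vocab_size, (2, 7))  # toy caption token ids

logits = model(img_feat, captions)               # (2, 8, 50)
# Step t predicts token t, so drop the final step and score against the caption.
# Minimizing this cross-entropy maximizes the caption's log-likelihood.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, vocab_size), captions.reshape(-1)
)
```

In practice the image feature would come from a pretrained CNN and the token ids from a Hindi vocabulary built over the training captions; at inference time, captions are decoded token by token (e.g., greedily or with beam search) rather than teacher-forced as above.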
Acknowledgements
This work is supported by the Scheme for Promotion of Academic and Research Collaboration (SPARC), Project Code P995, No. SPARC/2018–2019/119/SL(IN), under MHRD, Govt. of India.
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Singh, A., Meetei, L.S., Singh, T.D., Bandyopadhyay, S. (2021). Generation and Evaluation of Hindi Image Captions of Visual Genome. In: Maji, A.K., Saha, G., Das, S., Basu, S., Tavares, J.M.R.S. (eds) Proceedings of the International Conference on Computing and Communication Systems. Lecture Notes in Networks and Systems, vol 170. Springer, Singapore. https://doi.org/10.1007/978-981-33-4084-8_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-33-4083-1
Online ISBN: 978-981-33-4084-8
eBook Packages: Intelligent Technologies and Robotics (R0)