Abstract
Description models aim to generate captions that describe the content of an image. Despite recent advances in machine learning and computer vision, generating discriminative captions remains a challenging problem. Traditional approaches imitate frequent language patterns without considering the semantic alignment of words. In this work, an image captioning framework is proposed that generates topic-sensitive descriptions. The model captures the semantic relations and the polysemous nature of the words that describe an image and, as a result, generates superior descriptions for the target images. The efficacy of the proposed model is demonstrated through evaluation on standard captioning benchmarks, where it shows promising performance compared to description models proposed in the recent literature.
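To illustrate the intuition behind topic-sensitive representations of polysemous words, the following toy sketch (an illustrative assumption, not the paper's actual model) mixes per-topic word vectors by a topic distribution inferred from context, so the same word receives different embeddings under different caption topics:

```python
import numpy as np

# Hypothetical example: a polysemous word (e.g. "bank") has one vector per
# topic; its context-dependent embedding is a topic-weighted mixture.
rng = np.random.default_rng(0)
n_topics, dim = 3, 4

# Per-topic embeddings of a single word (one row per topic).
word_topic_vecs = rng.normal(size=(n_topics, dim))

def topic_sensitive_embedding(word_topic_vecs, topic_dist):
    """Weighted average of per-topic vectors under p(topic | context)."""
    topic_dist = np.asarray(topic_dist, dtype=float)
    topic_dist = topic_dist / topic_dist.sum()   # normalize to a distribution
    return topic_dist @ word_topic_vecs          # (n_topics,) @ (n_topics, dim)

# Two different contexts yield two different embeddings for the same word.
river_context = topic_sensitive_embedding(word_topic_vecs, [0.9, 0.05, 0.05])
money_context = topic_sensitive_embedding(word_topic_vecs, [0.05, 0.9, 0.05])
```

Conditioning the embedding on a topic distribution in this way is one simple route to disambiguating word senses before caption generation.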
Zia, U., Riaz, M.M., Ghafoor, A. et al. Topic sensitive image descriptions. Neural Comput & Applic 32, 10471–10479 (2020). https://doi.org/10.1007/s00521-019-04587-x