Abstract
Although text-to-image generation has made significant progress in producing visually realistic images, the generated images are often not fully consistent with their text descriptions. In this paper, a novel generative adversarial network based on semantic consistency is proposed to generate semantically consistent and realistic images from text descriptions. The proposed method explores the semantic consistency between text and image to achieve efficient cross-modal generation that combines image generation with semantic correlation. A generation network with a hybrid attention mechanism is used to generate images at multiple resolutions, which improves the realism of the generated images. In addition, a semantic comparison module is presented that maps the texts and the generated images into the same semantic space and compares them through consistency refinement and information classification. Extensive experiments on public benchmark datasets demonstrate that the proposed method outperforms competing methods.
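To make the semantic comparison module concrete, the following is a minimal PyTorch sketch of the idea described in the abstract: image and text features are projected into a shared semantic space, a consistency term pulls matched pairs together (consistency refinement), and a classifier head supplies class-level supervision (information classification). The module name `SemanticComparison`, the feature dimensions, and the exact loss formulation are illustrative assumptions, not the paper's published implementation.

```python
# A minimal sketch of a semantic comparison module; all names, dimensions,
# and loss terms are assumptions for illustration, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SemanticComparison(nn.Module):
    """Maps image and text features into one semantic space, scores their
    consistency, and classifies both embeddings for extra supervision."""
    def __init__(self, img_dim=2048, txt_dim=256, sem_dim=256, n_classes=200):
        super().__init__()
        self.img_proj = nn.Linear(img_dim, sem_dim)   # image -> semantic space
        self.txt_proj = nn.Linear(txt_dim, sem_dim)   # text  -> semantic space
        self.classifier = nn.Linear(sem_dim, n_classes)

    def forward(self, img_feat, txt_feat, labels):
        z_img = F.normalize(self.img_proj(img_feat), dim=-1)
        z_txt = F.normalize(self.txt_proj(txt_feat), dim=-1)
        # Consistency refinement: maximize cosine similarity of matched pairs.
        consistency_loss = (1.0 - (z_img * z_txt).sum(dim=-1)).mean()
        # Information classification: both embeddings must predict the class.
        logits = self.classifier(torch.cat([z_img, z_txt], dim=0))
        cls_loss = F.cross_entropy(logits, torch.cat([labels, labels], dim=0))
        return consistency_loss + cls_loss
```

At training time, a loss of this form would be added to the adversarial objective so that the generator is penalized when an image drifts semantically from its caption.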
Data availability
The datasets generated and analysed during this study are available in the following repositories: http://cocodataset.org and http://www.vision.caltech.edu/visipedia/CUB-200-2011.html. All other data are available from the authors upon reasonable request.
Cite this article
Ma, Y., Liu, L., Zhang, H. et al. Generative adversarial network based on semantic consistency for text-to-image generation. Appl Intell 53, 4703–4716 (2023). https://doi.org/10.1007/s10489-022-03660-8