MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis

Yang, Yuyan; Ni, Xin; Hao, Yanbin; Liu, Chenyu; Wang, Wenshan; Liu, Yifeng; Xie, Haiyong

doi:10.1007/978-3-030-98358-1_4

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13141)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

The performance of text-to-image synthesis has improved significantly with the development of generative adversarial network (GAN) techniques. Current GAN-based methods for text-to-image generation mainly adopt multiple generator-discriminator pairs to exploit coarse- and fine-grained textual content (e.g., sentences and words); however, they only consider the semantic consistency between the text-image pair. One drawback of such a multi-stream structure is that it results in heavyweight models. In comparison, the single-stream counterpart suffers from insufficient use of the text. To alleviate these problems, we propose a Multi-conditional Fusion GAN (MF-GAN) that reaps the benefits of both the multi-stream and the single-stream methods. MF-GAN is a single-stream model, yet it exploits both coarse- and fine-grained textual information through a conditional residual block and a dual attention block. More specifically, the sentence and word features are repeatedly fed into different model stages to reinforce the textual information. Furthermore, we introduce a triple loss that closes the visual gap between the synthesized image and its positive image and enlarges the gap to its negative image. To thoroughly verify our method, we conduct extensive experiments on two benchmark datasets, CUB and COCO. Experimental results show that the proposed MF-GAN outperforms the state-of-the-art methods.
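
The triple loss described in the abstract pulls the synthesized image toward a positive image and pushes it away from a negative one. The sketch below illustrates that idea with a standard hinge-style triplet margin loss over feature vectors. It is only an illustration under stated assumptions: the Euclidean distance, the margin of 1.0, the function names, and the toy feature values are all hypothetical, and the abstract does not specify the paper's actual formulation or feature extractor.

```python
import math
from typing import Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(
    anchor: Sequence[float],
    positive: Sequence[float],
    negative: Sequence[float],
    margin: float = 1.0,  # assumed value, not taken from the paper
) -> float:
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy features (made-up numbers standing in for image embeddings).
synthesized = [0.0, 0.0]
positive = [3.0, 4.0]        # distance 5 from the synthesized image
far_negative = [6.0, 8.0]    # distance 10: already well separated
near_negative = [0.0, 5.0]   # distance 5: as close as the positive

loss_far = triplet_loss(synthesized, positive, far_negative)
loss_near = triplet_loss(synthesized, positive, near_negative)
```

With these toy values, the well-separated negative contributes no loss, while the negative that sits as close as the positive is penalized by the full margin. In a real training loop the vectors would come from an image feature extractor, which the abstract does not describe.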

Acknowledgments

We would like to thank the anonymous reviewers for their valuable suggestions. Haiyong Xie is the corresponding author. This work is supported in part by the National Key R&D Project (Grant No. SQ2021YFC3300088) and the Natural Science Foundation of China (Grant No. U19B2036).

Author information

Authors and Affiliations

University of Science and Technology of China, Hefei, 230026, Anhui, China
Yuyan Yang, Xin Ni & Yanbin Hao
Key Laboratory of Cyberculture Content Cognition and Detection, Ministry of Culture and Tourism, Hefei, 230026, Anhui, China
Yuyan Yang, Xin Ni, Yanbin Hao & Haiyong Xie
National Engineering Laboratory for Risk Perception and Prevention (NEL-RPP), Beijing, 100041, China
Chenyu Liu, Wenshan Wang & Yifeng Liu
Advanced Innovation Center for Human Brain Protection, Capital Medical University, Beijing, 100069, China
Haiyong Xie

Corresponding author

Correspondence to Haiyong Xie.

Editor information

Editors and Affiliations

IT University of Copenhagen, Copenhagen, Denmark
Björn Þór Jónsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Minh-Triet Tran
University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
National Tsing Hua University, Hsinchu, Taiwan
Anita Min-Chun Hu
Hanoi University of Science and Technology, Hanoi, Vietnam
Binh Huynh Thi Thanh
Median Technologies, Valbonne, France
Benoit Huet

About this paper

Cite this paper

Yang, Y. et al. (2022). MF-GAN: Multi-conditional Fusion Generative Adversarial Network for Text-to-Image Synthesis. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_4

DOI: https://doi.org/10.1007/978-3-030-98358-1_4
Published: 15 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science, Computer Science (R0)
