Abstract
Synthesizing high-resolution, realistic images from text descriptions with a single-pass Generative Adversarial Network (GAN) is difficult without additional techniques, because blurry artifacts and mode collapse commonly occur. To mitigate these problems, this paper proposes an Iterative Generative Adversarial Network (iGAN) that takes three iterations to synthesize a high-resolution, realistic image from its text description. In the \(1^{st}\) iteration, the GAN synthesizes a low-resolution \(64 \times 64\) pixel image with the basic shape and colors from the text description, with reduced mode collapse and blurry artifacts. In the \(2^{nd}\) iteration, the GAN takes the result of the \(1^{st}\) iteration together with the text description and synthesizes a higher-resolution \(128 \times 128\) pixel image with better shape and color, and with far fewer mode collapse and blurry-artifact problems. In the last iteration, the GAN takes the result of the \(2^{nd}\) iteration together with the text description and synthesizes a high-resolution \(256 \times 256\) pixel image with well-defined shape and almost no mode collapse or blurry artifacts. Our proposed iGAN shows significant performance on the CUB birds and Oxford-102 flowers datasets. Moreover, iGAN improves the inception score and human rank compared to other state-of-the-art methods.
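The three-iteration pipeline described above can be sketched in a few lines. This is a minimal NumPy sketch of the control flow only, not the paper's networks: `text_encoder` and `generator_stage` are hypothetical stand-ins for the real text embedding and conditional generators, and the "generation" here is random noise fused with an upsampled copy of the previous iteration's output, purely to illustrate how each stage conditions on both the text and the preceding image.

```python
import numpy as np

def text_encoder(text, dim=128):
    # Hypothetical stand-in: map a caption to a fixed-length embedding.
    # A deterministic seed from the characters keeps the sketch reproducible.
    rng = np.random.default_rng(sum(map(ord, text)) % (2**32))
    return rng.standard_normal(dim)

def generator_stage(text_emb, prev_image=None, out_res=64, seed=1):
    # Hypothetical stage: conditions on the text embedding and, from the
    # 2nd iteration on, also on the previous iteration's image.
    rng = np.random.default_rng(seed)
    img = rng.random((out_res, out_res, 3))  # placeholder "generation"
    if prev_image is not None:
        # Nearest-neighbour upsample of the previous result, then fuse it
        # with the new detail so structure is carried forward and refined.
        scale = out_res // prev_image.shape[0]
        up = prev_image.repeat(scale, axis=0).repeat(scale, axis=1)
        img = 0.5 * img + 0.5 * up
    return img

def igan_pipeline(text):
    emb = text_encoder(text)
    img = None
    # Three iterations: 64x64 -> 128x128 -> 256x256, as in the abstract.
    for i, res in enumerate((64, 128, 256)):
        img = generator_stage(emb, prev_image=img, out_res=res, seed=i)
    return img

img = igan_pipeline("a small bird with a red head and white belly")
print(img.shape)  # (256, 256, 3)
```

Each stage receives the caption embedding again, so later iterations can correct details the earlier, coarser stages missed rather than merely upscaling them.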
This study is funded by the General Program of the National Natural Science Foundation of China (No: 61977029).
Copyright information
© 2019 Springer Nature Switzerland AG
Ullah, A., Yu, X., Majid, A., Rahman, H.U., Mughal, M.F. (2019). High-Resolution Realistic Image Synthesis from Text Using Iterative Generative Adversarial Network. In: Lee, C., Su, Z., Sugimoto, A. (eds) Image and Video Technology. PSIVT 2019. Lecture Notes in Computer Science(), vol 11854. Springer, Cham. https://doi.org/10.1007/978-3-030-34879-3_17
Print ISBN: 978-3-030-34878-6
Online ISBN: 978-3-030-34879-3