Abstract
Synthesizing high-resolution, realistic images from text descriptions with a single-pass Generative Adversarial Network (GAN) is difficult without additional techniques, because blurry artifacts and mode collapse commonly occur. To mitigate these problems, this paper proposes an Iterative Generative Adversarial Network (iGAN) that takes three iterations to synthesize a high-resolution, realistic image from its text description. In the \(1^{st}\) iteration, the GAN synthesizes a low-resolution \(64 \times 64\) pixel image with the basic shape and colors from the text description, with reduced mode collapse and blurry artifacts. In the \(2^{nd}\) iteration, the GAN takes the result of the \(1^{st}\) iteration together with the text description and synthesizes a higher-resolution \(128 \times 128\) pixel image with better shape and color, and with far fewer mode collapse and blurry-artifact problems. In the last iteration, the GAN takes the result of the \(2^{nd}\) iteration together with the text description and synthesizes a high-resolution \(256 \times 256\) pixel image with well-defined shape and almost no mode collapse or blurry artifacts. Our proposed iGAN shows significant performance on the CUB birds and Oxford-102 flowers datasets. Moreover, iGAN improves the inception score and human rank compared to other state-of-the-art methods.
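The three-iteration pipeline described above can be sketched in a few lines. This is a minimal NumPy sketch of the control flow only, not the paper's networks: `text_encoder` and `generator_stage` are hypothetical stand-ins for the real text embedding and conditional generators, and the "generation" here is random noise fused with an upsampled copy of the previous iteration's output, purely to illustrate how each stage conditions on both the text and the preceding image.

```python
import numpy as np

def text_encoder(text, dim=128):
    # Hypothetical stand-in: map a caption to a fixed-length embedding.
    # A deterministic seed from the characters keeps the sketch reproducible.
    rng = np.random.default_rng(sum(map(ord, text)) % (2**32))
    return rng.standard_normal(dim)

def generator_stage(text_emb, prev_image=None, out_res=64, seed=1):
    # Hypothetical stage: conditions on the text embedding and, from the
    # 2nd iteration on, also on the previous iteration's image.
    rng = np.random.default_rng(seed)
    img = rng.random((out_res, out_res, 3))  # placeholder "generation"
    if prev_image is not None:
        # Nearest-neighbour upsample of the previous result, then fuse it
        # with the new detail so structure is carried forward and refined.
        scale = out_res // prev_image.shape[0]
        up = prev_image.repeat(scale, axis=0).repeat(scale, axis=1)
        img = 0.5 * img + 0.5 * up
    return img

def igan_pipeline(text):
    emb = text_encoder(text)
    img = None
    # Three iterations: 64x64 -> 128x128 -> 256x256, as in the abstract.
    for i, res in enumerate((64, 128, 256)):
        img = generator_stage(emb, prev_image=img, out_res=res, seed=i)
    return img

img = igan_pipeline("a small bird with a red head and white belly")
print(img.shape)  # (256, 256, 3)
```

Each stage receives the caption embedding again, so later iterations can correct details the earlier, coarser stages missed rather than merely upscaling them.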
This study is funded by the General Program of the National Natural Science Foundation of China (No: 61977029).
Copyright information
© 2019 Springer Nature Switzerland AG
Ullah, A., Yu, X., Majid, A., Rahman, H.U., Mughal, M.F. (2019). High-Resolution Realistic Image Synthesis from Text Using Iterative Generative Adversarial Network. In: Lee, C., Su, Z., Sugimoto, A. (eds) Image and Video Technology. PSIVT 2019. Lecture Notes in Computer Science(), vol 11854. Springer, Cham. https://doi.org/10.1007/978-3-030-34879-3_17
Print ISBN: 978-3-030-34878-6
Online ISBN: 978-3-030-34879-3