Abstract
This work applies an attentional generative adversarial network (AttnGAN) to text-to-image synthesis on the CUB dataset, which contains 12,000 images of 200 bird species with 10 captions per image (12,000 * 10 captions). The data were randomly split into training and testing sets of 49.2% and 50.8%, respectively. The method synthesizes fine-detailed images using a global attention mechanism that attends to the individual words of the textual description, together with a deep attentional multimodal similarity model (DAMSM) that computes the matching loss for the generator. Although this approach produces high-quality images, it incurs noticeable loss during training and the training itself is time-consuming. This paper proposes combining AttnGAN with a DenseNet architecture to reduce both the loss and the training time, thereby synthesizing images with more distinct features. The proposed technique reduced the loss by 1.62% and was faster by 768 s per iteration than the existing CNN architecture.
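The word-level attention the abstract refers to can be illustrated with a minimal NumPy sketch: each image sub-region attends over the caption's word features, yielding a word-context vector per region. This is an assumption-laden illustration, not the authors' implementation; the shapes, names, and the plain dot-product scoring are all hypothetical simplifications.

```python
import numpy as np

def word_attention(word_feats, region_feats):
    """Sketch of word-level attention for an attentional generator.

    word_feats:   (T, D) array, one feature vector per caption word
    region_feats: (N, D) array, one feature vector per image sub-region
    Returns an (N, D) array of word-context vectors, one per sub-region.
    (Shapes and the scoring function are illustrative assumptions.)
    """
    scores = region_feats @ word_feats.T           # (N, T) similarity scores
    scores -= scores.max(axis=1, keepdims=True)    # subtract max for stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over the T words
    return attn @ word_feats                       # weighted sum of word features

# Toy example: 3 words, 4 image regions, 5-dimensional features.
rng = np.random.default_rng(0)
ctx = word_attention(rng.standard_normal((3, 5)), rng.standard_normal((4, 5)))
print(ctx.shape)  # (4, 5)
```

In AttnGAN these context vectors are combined with the region features at each generator stage, so regions that match a word (e.g., "red wings") draw more detail from it; the DAMSM loss then scores how well the generated image matches the caption.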
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cruz, A.P., Jaiswal, J. (2021). Text-to-Image Classification Using AttnGAN with DenseNet Architecture. In: Mandal, J.K., Mukhopadhyay, S., Unal, A., Sen, S.K. (eds) Proceedings of International Conference on Innovations in Software Architecture and Computational Systems. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-16-4301-9_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4300-2
Online ISBN: 978-981-16-4301-9
eBook Packages: Computer Science (R0)