Abstract
This work applies an attentional generative adversarial network (AttnGAN) to text-to-image synthesis on the CUB dataset, which contains 12,000 images of 200 bird species with 10 captions per image (12,000 * 10 captions). The data were randomly split into training and testing sets of 49.2% and 50.8%, respectively. The method synthesizes fine-detailed images using a global attention mechanism that attends to the individual words of the textual description, together with a deep attentional multimodal similarity model (DAMSM) that computes the matching loss for the generator. Although this approach produces high-quality images, it incurs noticeable loss during training and the training itself is time-consuming. This paper proposes combining AttnGAN with a DenseNet architecture to reduce both the loss and the training time, thereby synthesizing images with more distinct features. The proposed technique reduced the loss by 1.62% and was faster by 768 s per iteration than the existing CNN architecture.
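The word-level attention the abstract refers to can be illustrated with a minimal NumPy sketch: each image sub-region attends over the caption's word features, yielding a word-context vector per region. This is an assumption-laden illustration, not the authors' implementation; the shapes, names, and the plain dot-product scoring are all hypothetical simplifications.

```python
import numpy as np

def word_attention(word_feats, region_feats):
    """Sketch of word-level attention for an attentional generator.

    word_feats:   (T, D) array, one feature vector per caption word
    region_feats: (N, D) array, one feature vector per image sub-region
    Returns an (N, D) array of word-context vectors, one per sub-region.
    (Shapes and the scoring function are illustrative assumptions.)
    """
    scores = region_feats @ word_feats.T           # (N, T) similarity scores
    scores -= scores.max(axis=1, keepdims=True)    # subtract max for stability
    attn = np.exp(scores)
    attn /= attn.sum(axis=1, keepdims=True)        # softmax over the T words
    return attn @ word_feats                       # weighted sum of word features

# Toy example: 3 words, 4 image regions, 5-dimensional features.
rng = np.random.default_rng(0)
ctx = word_attention(rng.standard_normal((3, 5)), rng.standard_normal((4, 5)))
print(ctx.shape)  # (4, 5)
```

In AttnGAN these context vectors are combined with the region features at each generator stage, so regions that match a word (e.g., "red wings") draw more detail from it; the DAMSM loss then scores how well the generated image matches the caption.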
Copyright information
© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Cruz, A.P., Jaiswal, J. (2021). Text-to-Image Classification Using AttnGAN with DenseNet Architecture. In: Mandal, J.K., Mukhopadhyay, S., Unal, A., Sen, S.K. (eds) Proceedings of International Conference on Innovations in Software Architecture and Computational Systems. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-16-4301-9_1
Publisher Name: Springer, Singapore
Print ISBN: 978-981-16-4300-2
Online ISBN: 978-981-16-4301-9
eBook Packages: Computer Science (R0)