
Abstract

This work applies an attentional generative adversarial network (AttnGAN) to text-to-image synthesis on the CUB dataset, which contains 12,000 images of 200 bird species with 10 captions per image (12,000 * 10 captions). A random split divides the data into training and testing sets of 49.2% and 50.8%, respectively. The method synthesizes fine-detailed images through an attention mechanism that focuses on the relevant words in the textual descriptions, while a deep attentional multimodal similarity model (DAMSM) computes the image-text matching loss used to train the generator. Although this approach produces high-quality images, training incurs some loss and takes considerable time. This paper therefore proposes combining AttnGAN with a DenseNet architecture to reduce both the loss and the training time, thereby synthesizing images with more distinct features. The proposed technique reduced the loss by 1.62% and produced results 768 s faster per iteration than the existing CNN architecture.
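The word-level attention at the core of AttnGAN can be illustrated with a minimal sketch: each image sub-region attends over the caption's word embeddings and receives a word-context vector that guides the generator toward the most relevant words. The feature dimension, grid size, and random features below are illustrative assumptions for shape-checking only, not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def word_attention(regions, words):
    """regions: (N, D) image sub-region features;
    words:   (T, D) word embeddings of the caption.
    Each region attends over all T words and mixes them by
    relevance, returning (N, D) word-context vectors."""
    scores = regions @ words.T       # (N, T): word-region relevance
    alpha = softmax(scores, axis=1)  # attention distribution over words
    return alpha @ words             # (N, D): attended word context

rng = np.random.default_rng(0)
regions = rng.standard_normal((17 * 17, 64))  # e.g. a 17x17 grid of region features
words = rng.standard_normal((10, 64))         # 10-word caption embedding
context = word_attention(regions, words)
print(context.shape)                          # (289, 64)
```

Each row of the attention matrix is a probability distribution over the caption's words, so regions describing, say, a bird's wing can weight colour or shape words more heavily than the rest of the sentence.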




Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Cruz, A.P., Jaiswal, J. (2021). Text-to-Image Classification Using AttnGAN with DenseNet Architecture. In: Mandal, J.K., Mukhopadhyay, S., Unal, A., Sen, S.K. (eds) Proceedings of International Conference on Innovations in Software Architecture and Computational Systems. Studies in Autonomic, Data-driven and Industrial Computing. Springer, Singapore. https://doi.org/10.1007/978-981-16-4301-9_1


  • DOI: https://doi.org/10.1007/978-981-16-4301-9_1


  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-16-4300-2

  • Online ISBN: 978-981-16-4301-9

  • eBook Packages: Computer Science (R0)
