Multi-title Attention Mechanism to Generate High-Quality Images on AttnGAN

  • Conference paper
Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD 2022)

Abstract

In text-to-image synthesis, the text is essentially a constraint on the generated image, and the generation network is guided to produce images that match the text under that constraint. However, if an image is generated from the given text constraint alone, it tends to lack rich detail, which reduces its visual quality. With that in mind, we introduce a Multi-title Attention Mechanism that treats the dataset as a prior. First, given a title, we select other titles in the dataset that are compatible with it, which is essentially an information retrieval process. We then use a self-attention mechanism to integrate the embeddings of the multiple titles, so that the final text representation contains rich detail information and guides the generation of high-quality images. In addition, to enable AttnGAN to generate a clear image in the first stage, we introduce a mixed attention mechanism and a Residual Dense Block (RDB) model. The mixed attention mechanism comprises channel attention and pixel attention: channel attention mainly guides what is generated in the image, while pixel attention is responsible for where it is generated. Experiments on the CUB dataset show that the proposed approach is significantly better than AttnGAN, improving the Inception Score (IS) and R-precision by 4.12% and 10.43%, respectively.
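
The two steps described in the abstract, retrieving compatible titles and then fusing their embeddings with attention, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`retrieve_titles`, `fuse_titles`), the use of cosine similarity for retrieval, the toy 3-dimensional embeddings, and the choice of plain scaled dot-product attention without learned query, key, or value projections are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_titles(query, candidates, k):
    """Retrieval step (assumed cosine ranking): keep the k most similar title embeddings."""
    ranked = sorted(candidates, key=lambda c: cosine(query, c), reverse=True)
    return ranked[:k]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_titles(query, titles):
    """Fusion step: the given title attends over itself plus the retrieved titles.

    Simplified scaled dot-product attention with no learned projections
    (an assumption; the paper's self-attention layer may differ).
    """
    keys = [query] + titles
    scale = math.sqrt(len(query))
    scores = [sum(q * x for q, x in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    fused = [sum(w * key[i] for w, key in zip(weights, keys))
             for i in range(len(query))]
    return fused, weights

# Toy embeddings (hypothetical values, not taken from the CUB dataset).
given = [1.0, 0.0, 0.5]
dataset = [[0.9, 0.1, 0.4], [-1.0, 0.2, 0.0], [0.8, -0.1, 0.6], [0.0, 1.0, 0.0]]

similar = retrieve_titles(given, dataset, k=2)
fused, weights = fuse_titles(given, similar)
```

Here `fused` is a single vector that mixes the given title with its two nearest neighbours, and `weights` sums to 1. In a real system the embeddings would come from a trained text encoder, and the fused vector would replace the single-title embedding as the conditioning input to the generator.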

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Item No. 61105064), the Scientific Research Project of Shaanxi Provincial Department of Education (Item No. 16JK1689), and the Key Laboratory of Network Data Analysis of Shaanxi Province. I would like to thank the reviewers for reading this paper, and my tutor and partners for their help during my experiments. Special thanks go to the key laboratory for providing a good learning environment and experimental support.

Corresponding author

Correspondence to Xiwang Gao.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Qiao, P., Gao, X. (2023). Multi-title Attention Mechanism to Generate High-Quality Images on AttnGAN. In: Xiong, N., Li, M., Li, K., Xiao, Z., Liao, L., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-031-20738-9_18
