Multi-title Attention Mechanism to Generate High-Quality Images on AttnGAN

Qiao, Pingan; Gao, Xiwang

doi:10.1007/978-3-031-20738-9_18

Part of the book series: Lecture Notes on Data Engineering and Communications Technologies (LNDECT, volume 153)

Included in the following conference series:

The International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery

Abstract

In text-to-image synthesis, the text is essentially a constraint on the generated image, and the generation network is guided by that constraint to produce images that match the text. However, if an image is generated from a single given text constraint alone, it tends to lack rich detail, which reduces its visual quality. With that in mind, we introduce a Multi-title Attention Mechanism that treats the dataset as a prior. First, given a title, we select other titles in the dataset that are compatible with it, which is essentially an information retrieval process. We then use a self-attention mechanism to integrate the embeddings of the multiple titles, so that the final text representation contains rich detail information and guides the generation of high-quality images. In addition, to enable AttnGAN to generate a clear image in the first stage, we introduce a mixed attention mechanism and a Residual Dense Block (RDB) model. The mixed attention mechanism comprises channel attention and pixel attention: channel attention mainly guides what is generated in the image, while pixel attention is responsible for where it is generated. Experiments on the CUB dataset show that the proposed approach is significantly better than AttnGAN, improving the Inception Score (IS) and R-precision by 4.12% and 10.43%, respectively.
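
The retrieve-then-fuse idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, the number of retrieved titles, or the form of the attention layer. The sketch assumes cosine similarity for retrieval and plain scaled dot-product attention without learned projections; the function names `retrieve_titles` and `fuse_titles` and the toy embeddings are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def retrieve_titles(query, candidates, k):
    """Return the k candidate embeddings most similar to the query.

    Assumption: cosine similarity stands in for the paper's retrieval step.
    """
    ranked = sorted(candidates, key=lambda c: cosine(query, c), reverse=True)
    return ranked[:k]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_titles(query, retrieved):
    """Fuse the given title with the retrieved titles.

    Assumption: the query embedding attends over itself and the retrieved
    embeddings with scaled dot-product attention and no learned weights.
    Returns the fused embedding and the attention weights.
    """
    titles = [query] + retrieved
    scale = math.sqrt(len(query))
    scores = [sum(q * t for q, t in zip(query, title)) / scale for title in titles]
    weights = softmax(scores)
    fused = [
        sum(w * title[i] for w, title in zip(weights, titles))
        for i in range(len(query))
    ]
    return fused, weights


# Toy 3-dimensional title embeddings (hypothetical values).
query = [1.0, 0.0, 0.0]
dataset = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.8, 0.0, 0.2]]

neighbours = retrieve_titles(query, dataset, k=2)
fused, weights = fuse_titles(query, neighbours)
```

In this toy run the unrelated title `[0.0, 1.0, 0.0]` is filtered out by retrieval, and the fused embedding is a weighted average in which the given title keeps the largest weight. A trained model would normally learn query, key, and value projections rather than use the raw embeddings.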

Acknowledgment

This work was supported by the National Natural Science Foundation of China (Item No. 61105064), the Scientific Research Project of Shaanxi Provincial Department of Education (Item No. 16JK1689), and the Key Laboratory of Network Data Analysis of Shaanxi Province. I would like to thank the reviewers for reading this paper, and my tutor and partners for their help during my experiments. I especially thank the key laboratory for providing a good learning environment and experimental support.

Author information

Authors and Affiliations

Xi’an University of Posts and Telecommunications, Xi’an, 710121, China
Pingan Qiao & Xiwang Gao

Corresponding author

Correspondence to Xiwang Gao.

Editor information

Editors and Affiliations

Division of Intelligent Future Technologies, Mälardalen University, Västerås, Västmanlands Län, Sweden
Ning Xiong
Department of Electronic and Computer Engineering, Brunel University London, Uxbridge, Middlesex, UK
Maozhen Li
School of Information Science and Technology, Hunan University, Changsha, Hunan, China
Kenli Li
School of Information Science and Technology, Hunan University, Changsha, Hunan, China
Zheng Xiao
College of Computer and Data Science, Fuzhou University, Fuzhou, Fujian, China
Longlong Liao
School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore, Singapore
Lipo Wang

About this paper

Cite this paper

Qiao, P., Gao, X. (2023). Multi-title Attention Mechanism to Generate High-Quality Images on AttnGAN. In: Xiong, N., Li, M., Li, K., Xiao, Z., Liao, L., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-031-20738-9_18

DOI: https://doi.org/10.1007/978-3-031-20738-9_18
Published: 30 January 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20737-2
Online ISBN: 978-3-031-20738-9
eBook Packages: Intelligent Technologies and Robotics (R0)
