Abstract
In text-to-image synthesis, the text acts as a constraint on the generated image, and the generation network is guided to produce images that match it. However, an image generated from a single text description alone tends to lack rich detail, which lowers its visual quality. To address this, we introduce a Multi-title Attention Mechanism that treats the dataset as a prior. Given a title, we first select other titles in the dataset that are compatible with it, which is essentially an information retrieval step, and then use self-attention to integrate the embeddings of the multiple titles. The resulting text representation carries rich detail and guides the generation of high-quality images. In addition, to enable AttnGAN to generate clear images in the first stage, we introduce a mixed attention mechanism and a Residual Dense Block (RDB) model. The mixed attention mechanism consists of channel attention and pixel attention: channel attention mainly guides what is generated, while pixel attention determines where it is generated. Experiments on the CUB dataset show that the proposed approach is significantly better than AttnGAN, improving the Inception Score (IS) and R-precision by 4.12% and 10.43%, respectively.
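The two steps of the multi-title idea, retrieval of compatible titles followed by self-attention over their embeddings, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the cosine-similarity ranking, the unparameterised scaled dot-product attention (no learned query, key, or value projections), the mean pooling, and the toy 3-dimensional embeddings are all assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_titles(query, candidates, k=2):
    """Retrieval step (assumed): keep the k candidate title embeddings
    most similar to the given title embedding."""
    ranked = sorted(candidates, key=lambda c: cosine(query, c), reverse=True)
    return ranked[:k]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention_fuse(embeddings):
    """Fusion step (assumed): scaled dot-product self-attention over the
    title embeddings, followed by mean pooling into one text embedding."""
    d = len(embeddings[0])
    attended = []
    for q in embeddings:
        # Attention weights of this title over all titles, itself included.
        scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(d)
                  for key in embeddings]
        weights = softmax(scores)
        # Weighted sum of the title embeddings, one component at a time.
        attended.append([sum(w * v[i] for w, v in zip(weights, embeddings))
                         for i in range(d)])
    n = len(attended)
    return [sum(row[i] for row in attended) / n for i in range(d)]

# Hypothetical toy embeddings: one given title and three dataset titles.
given = [1.0, 0.0, 0.0]
dataset = [[0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.8, 0.0, 0.2]]

selected = retrieve_titles(given, dataset, k=2)
fused = self_attention_fuse([given] + selected)
```

In this sketch `fused` is a convex combination of the given title and the retrieved titles, so details present only in the retrieved titles can enter the final text embedding. A trained model would normally add learned projections and use sentence embeddings from a text encoder.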
Acknowledgment
This work was supported by the National Natural Science Foundation of China (Item No. 61105064), the Scientific Research Project of Shaanxi Provincial Department of Education (Item No. 16JK1689), and the Key Laboratory of Network Data Analysis of Shaanxi Province. I would like to thank the reviewers for reading this paper, and my tutor and partners for their help during my experiments. I especially thank the key laboratory for providing a good learning environment and experimental support.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Qiao, P., Gao, X. (2023). Multi-title Attention Mechanism to Generate High-Quality Images on AttnGAN. In: Xiong, N., Li, M., Li, K., Xiao, Z., Liao, L., Wang, L. (eds) Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2022. Lecture Notes on Data Engineering and Communications Technologies, vol 153. Springer, Cham. https://doi.org/10.1007/978-3-031-20738-9_18
DOI: https://doi.org/10.1007/978-3-031-20738-9_18
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-20737-2
Online ISBN: 978-3-031-20738-9
eBook Packages: Intelligent Technologies and Robotics (R0)