Abstract
We propose an approach based on self-attention generative adversarial networks (GANs) for image completion that produces completed images that are both globally and locally consistent. Using self-attention GANs together with contextual and other constraints, the generator draws realistic images in which fine details are synthesized in the damaged region and semantically coordinated with the rest of the image. To train this consistent generator, i.e. the image completion network, we employ a global and a local discriminator: the global discriminator evaluates the consistency of the entire image, while the local discriminator assesses local consistency by examining only the areas containing the completed regions. In addition, an attentive recurrent neural block is introduced to produce an attention map of the missing part of the image, which helps the subsequent completion network to fill in content more accurately. Experimental comparisons with other approaches on the CelebA dataset show that our method achieves competitive results.
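To make the role of self-attention in the completion network concrete, the sketch below shows a minimal SAGAN-style self-attention layer of the kind described above (Zhang et al. 2018). It is an illustrative PyTorch implementation written from the abstract, not the authors' released code; the class name, layer sizes, and parameter names are our own assumptions.

```python
# Minimal sketch of a SAGAN-style self-attention block (assumed PyTorch setting;
# names and sizes are illustrative, not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SelfAttention(nn.Module):
    """Self-attention over the spatial positions of a feature map."""
    def __init__(self, channels):
        super().__init__()
        # 1x1 convolutions project features into query/key/value spaces
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        # gamma starts at 0, so attention is blended in gradually during training
        self.gamma = nn.Parameter(torch.zeros(1))

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)   # B x HW x C'
        k = self.key(x).view(b, -1, h * w)                        # B x C' x HW
        attn = F.softmax(torch.bmm(q, k), dim=-1)                 # B x HW x HW
        v = self.value(x).view(b, -1, h * w)                      # B x C  x HW
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x                               # residual output

# Usage: apply to an intermediate feature map of the generator or discriminator.
layer = SelfAttention(64)
feat = torch.randn(1, 64, 32, 32)
out = layer(feat)  # same shape as feat, with long-range spatial dependencies mixed in
```

Such a layer lets the generator relate a damaged region to distant but semantically similar areas (e.g. the symmetric half of a face), which is what allows the completed content to stay coherent with the whole image rather than only with its immediate surroundings.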
Acknowledgements
This work was supported by the National Key R&D Program of China under Grant 2018YFB1003401, and in part by the National Outstanding Youth Science Program of the National Natural Science Foundation of China under Grant 61625202.
Cite this article
Liu, X., Li, K. & Li, K. Attentive Semantic and Perceptual Faces Completion Using Self-attention Generative Adversarial Networks. Neural Process Lett 51, 211–229 (2020). https://doi.org/10.1007/s11063-019-10080-2