Abstract
Significant progress has been made in image inpainting in recent years, yet existing methods struggle to produce results that simultaneously exhibit reasonable structure, rich detail, and sharpness. In this paper, we propose the Pyramid-VAE-GAN network for image inpainting to address this limitation. Our network is built on a variational autoencoder (VAE) backbone whose high-level latent variables encode the complicated high-dimensional prior distributions of images; this prior helps reconstruct plausible structures during inpainting. We also adopt a pyramid structure in our model so that low-level latent variables retain rich detail. To reconcile the usually competing demands of reasonable structure and rich detail, we propose a novel cross-layer latent variable transfer module, which passes long-range structural information from high-level latent variables to the low-level latent variables that represent finer detail. We further use adversarial training to select the most plausible results and to improve image sharpness. Extensive experiments on multiple datasets demonstrate the superiority of our method. Our code is available at https://github.com/thy960112/Pyramid-VAE-GAN.
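The dataflow described in the abstract, a pyramid of latent variables plus a cross-layer transfer of high-level structure into low-level latents, can be sketched in toy NumPy form. This is a minimal illustration, not the paper's implementation: the function names (`encode_pyramid`, `cross_layer_transfer`) are hypothetical, average pooling stands in for learned encoder convolutions, and concatenation stands in for the learned transfer module.

```python
import numpy as np

def avg_pool2(x):
    """Downsample a (H, W, C) feature map by 2 with average pooling."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))

def upsample2(x):
    """Nearest-neighbour 2x upsampling of a (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def reparameterize(mu, log_var, rng):
    """Standard VAE reparameterization: z = mu + sigma * eps."""
    return mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)

def encode_pyramid(image, levels, rng):
    """Encode an image into a pyramid of latents, fine to coarse.

    Each level halves the spatial resolution; plain average pooling
    stands in here for the learned encoder at each pyramid level.
    """
    feats, x = [], image
    for _ in range(levels):
        x = avg_pool2(x)
        feats.append(x)
    # Treat each pooled map as the mean of a Gaussian with log_var = 0.
    return [reparameterize(f, np.zeros_like(f), rng) for f in feats]

def cross_layer_transfer(z_high, z_low):
    """Fuse long-range structure from a high-level latent into a low-level one.

    The coarse latent is upsampled to the fine latent's resolution and
    concatenated along channels; a learned fusion module would follow.
    """
    z_up = z_high
    while z_up.shape[0] < z_low.shape[0]:
        z_up = upsample2(z_up)
    return np.concatenate([z_low, z_up], axis=-1)

rng = np.random.default_rng(0)
image = rng.standard_normal((64, 64, 3))
z_fine, z_mid, z_coarse = encode_pyramid(image, levels=3, rng=rng)
fused = cross_layer_transfer(z_coarse, z_fine)
print(z_coarse.shape, z_fine.shape, fused.shape)
# coarse 8x8 latent, fine 32x32 latent, fused 32x32 with doubled channels
```

In the actual network the fused latents would feed a decoder that reconstructs the masked region, with a discriminator providing the adversarial signal; this sketch only shows the shape bookkeeping of the pyramid and the transfer step.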
Availability of data and materials
To ensure that the results are reproducible, the source code is publicly available at https://github.com/thy960112/Pyramid-VAE-GAN.
Acknowledgements
The authors gratefully acknowledge the financial support of the National Natural Science Foundation of China (Grant No. 61925603).
Author information
Contributions
Huiyuan Tian, Li Zhang, and Shijian Li contributed to the conception of the study; Huiyuan Tian, Li Zhang, and Min Yao developed the model; Huiyuan Tian and Li Zhang performed the experiments; Huiyuan Tian, Min Yao, and Gang Pan contributed significantly to analysis and manuscript preparation; Huiyuan Tian, Li Zhang, and Shijian Li performed data analysis and wrote the manuscript; Huiyuan Tian, Min Yao, and Gang Pan helped perform the analysis with constructive discussions.
Ethics declarations
The authors declare that they have no competing interests relevant to the content of this article.
Additional information
Huiyuan Tian received her bachelor's degree from Northwestern Polytechnic University in 2016. She is currently pursuing a Ph.D. degree in the College of Computer Science and Technology, Zhejiang University, Hangzhou, China. Her current research interests include computer vision, machine learning, and probabilistic graphical models.
Li Zhang received his B.Eng. and Ph.D. degrees from Zhejiang University in 2007 and 2013, respectively. He is currently an assistant researcher in the Department of Computer Science, Zhejiang University. In 2009, he was a visiting scholar at the University of Hong Kong. From 2013 to 2017, he was a researcher at Works Applications Co., Ltd. His current interests include deep learning, game theory, human–machine hybrid computing, and pervasive computing.
Shijian Li received his Ph.D. degree from Zhejiang University in 2006. In 2010, he was a visiting scholar with the Institute Telecom SudParis, France. He currently works in the College of Computer Science and Technology, Zhejiang University. His research interests include sensor networks, ubiquitous computing, and social computing. He serves as an Editor of the International Journal of Distributed Sensor Networks.
Min Yao received his Ph.D. degree in biomedical engineering and instruments from Zhejiang University in 1995. He is currently a professor in the College of Computer Science and Technology, Zhejiang University. His research interests include computational intelligence, pattern recognition, knowledge discovery, and knowledge services.
Gang Pan received his B.Eng. and Ph.D. degrees from Zhejiang University in 1998 and 2004, respectively. He is currently a professor in the Department of Computer Science, and deputy director of the State Key Lab of CAD&CG, Zhejiang University, China. His current interests include artificial intelligence, pervasive computing, brain-inspired computing, and brain-machine interfaces. He serves as an Associate Editor of IEEE Transactions on Neural Networks and Learning Systems, IEEE Systems Journal, and Pervasive and Mobile Computing.
Rights and permissions
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Tian, H., Zhang, L., Li, S. et al. Pyramid-VAE-GAN: Transferring hierarchical latent variables for image inpainting. Comp. Visual Media 9, 827–841 (2023). https://doi.org/10.1007/s41095-022-0331-3