Pluralistic Free-Form Image Completion

Abstract

Image completion involves filling missing regions of an image with plausible content. Current image completion methods produce only one result for a given masked image, although there may be many reasonable possibilities. In this paper, we present an approach for pluralistic image completion: the task of generating multiple, diverse, plausible solutions for free-form image completion. A major challenge faced by learning-based approaches is that there is usually only one ground-truth training instance per label for this multi-output problem. To overcome this, we propose a novel and probabilistically principled framework with two parallel paths. One is a reconstructive path that uses the single available ground truth to obtain a prior distribution over the missing patches and rebuilds the original image from this distribution. The other is a generative path in which the conditional prior is coupled to the distribution obtained in the reconstructive path. Both are supported by adversarial learning. We then introduce a new short+long term patch attention layer that exploits distant relations among decoder and encoder features to improve appearance consistency between the original visible regions and the newly generated regions. Experiments show that our method not only yields better results on various datasets than existing state-of-the-art methods, but also provides multiple and diverse outputs.
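As a rough illustration of the two ideas in the abstract (coupling two distributions and drawing several outputs for one input), the sketch below computes a KL divergence between two diagonal Gaussians and samples multiple latent vectors from a prior. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not say how the coupling is measured, so the KL term is an assumption borrowed from standard conditional variational auto-encoders, and the function names and numeric values are hypothetical.

```python
import math
import random


def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(q || p) for two diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mq, sq, mp, sp in zip(mu_q, sigma_q, mu_p, sigma_p):
        total += math.log(sp / sq) + (sq ** 2 + (mq - mp) ** 2) / (2 * sp ** 2) - 0.5
    return total


def sample_latents(mu, sigma, n, seed=0):
    """Draw n latent vectors z = mu + sigma * eps, with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    return [[m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]
            for _ in range(n)]


# Hypothetical parameters: a conditional prior (from the masked image only)
# and a distribution from the reconstructive path (which sees the ground truth).
mu_prior, sigma_prior = [0.0, 0.0], [1.0, 1.0]
mu_rec, sigma_rec = [0.5, -0.5], [0.5, 0.5]

# Assumed coupling term: zero when the two distributions match, positive otherwise.
kl_same = kl_diag_gaussians(mu_prior, sigma_prior, mu_prior, sigma_prior)
kl_diff = kl_diag_gaussians(mu_rec, sigma_rec, mu_prior, sigma_prior)

# At test time, each sampled latent would be decoded into a different completion.
latents = sample_latents(mu_prior, sigma_prior, n=3)
```

In a full model, each entry of `latents` would be passed, together with the masked image, to a decoder network; different latents are what would yield different completions of the same hole.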

Notes

  1. Code: https://github.com/lyndonzheng/Pluralistic-Inpainting Demo: http://www.chuanxiaz.com/project/pluralistic

  2. https://github.com/pathak22/context-encoder

  3. https://github.com/satoshiiizuka/siggraph2017_inpainting

  4. https://github.com/JiahuiYu/generative_inpainting

  5. https://github.com/NVIDIA/partialconv

  6. https://github.com/knazeri/edge-connect

  7. https://github.com/JiahuiYu/generative_inpainting

  8. http://www.chuanxiaz.com/project/pluralistic/

Acknowledgements

This study is supported under the RIE2020 Industry Alignment Fund - Industry Collaboration Projects (IAFICP) Funding Initiative, as well as cash and in-kind contribution from Singapore Telecommunications Limited (Singtel), through Singtel Cognitive and Artificial Intelligence Lab for Enterprises (SCALE@NTU). This research is also supported by the Monash FIT Start-up Grant.

Author information

Corresponding author

Correspondence to Chuanxia Zheng.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Communicated by Jian Sun.

About this article

Cite this article

Zheng, C., Cham, TJ. & Cai, J. Pluralistic Free-Form Image Completion. Int J Comput Vis (2021). https://doi.org/10.1007/s11263-021-01502-7

Keywords

  • Image completion
  • Multi-modal generative models
  • Image generation
  • Conditional variational auto-encoders