Skip to main content

Contrastive Learning for Diverse Disentangled Foreground Generation

  • Conference paper
  • First Online:
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 13676))

Included in the following conference series:

Abstract

We introduce a new method for diverse foreground generation with explicit control over various factors. Existing image inpainting based foreground generation methods often struggle to generate diverse results and rarely allow users to explicitly control specific factors of variation (e.g., varying the facial identity or expression for face inpainting results). We leverage contrastive learning with latent codes to generate diverse foreground results for the same masked input. Specifically, we define two sets of latent codes, where one controls a pre-defined factor (“known”), and the other controls the remaining factors (“unknown”). The sampled latent codes from the two sets jointly bi-modulate the convolution kernels to guide the generator to synthesize diverse results. Experiments demonstrate the superiority of our method over state-of-the-arts in result diversity and generation controllability.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 89.00
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 119.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. https://github.com/zllrunning/face-parsing.pytorch

  2. Ballester, C., Bertalmio, M., Caselles, V., Sapiro, G., Verdera, J.: Filling-in by joint interpolation of vector fields and gray levels. IEEE Trans. Image Process. 10(8), 1200–1211 (2001). https://doi.org/10.1109/83.935036

    Article  MathSciNet  MATH  Google Scholar 

  3. Barnes, C., Shechtman, E., Finkelstein, A., Goldman, D.B.: PatchMatch: a randomized correspondence algorithm for structural image editing. In: SIGGRAPH 2009 (2009)

    Google Scholar 

  4. Bertalmío, M., Sapiro, G., Caselles, V., Ballester, C.: Image inpainting, pp. 417–424 (2000)

    Google Scholar 

  5. Chen, T., Kornblith, S., Norouzi, M., Hinton, G.E.: A simple framework for contrastive learning of visual representations. arXiv:abs/2002.05709 (2020)

  6. Deng, J., Guo, J., Zafeiriou, S.: ArcFace: additive angular margin loss for deep face recognition. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4685–4694 (2019)

    Google Scholar 

  7. Denton, E.L., Birodkar, V.: Unsupervised learning of disentangled representations from video. arXiv:abs/1705.10915 (2017)

  8. Ding, D., Ram, S., Rodríguez, J.J.: Image inpainting using nonlocal texture matching and nonlinear filtering. IEEE Trans. Image Process. 28, 1705–1719 (2019)

    Article  MathSciNet  Google Scholar 

  9. Du, R., et al.: Fine-grained visual classification via progressive multi-granularity training of jigsaw patches. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12365, pp. 153–168. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58565-5_10

    Chapter  Google Scholar 

  10. Goodfellow, I., et al.: Generative adversarial nets. In: NeurIPS (2014)

    Google Scholar 

  11. Guo, X., Yang, H., Huang, D.: Image inpainting via conditional texture and structure dual generation. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 14134–14143, October 2021

    Google Scholar 

  12. Hays, J., Efros, A.A.: Scene completion using millions of photographs. In: SIGGRAPH 2007 (2007)

    Google Scholar 

  13. He, K., Fan, H., Wu, Y., Xie, S., Girshick, R.B.: Momentum contrast for unsupervised visual representation learning. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9726–9735 (2020)

    Google Scholar 

  14. He, K., Gkioxari, G., Dollár, P., Girshick, R.B.: Mask R-CNN. IEEE Trans. Pattern Anal. Mach. Intell. 42, 386–397 (2020)

    Article  Google Scholar 

  15. Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: GANs trained by a two time-scale update rule converge to a local Nash equilibrium. In: NIPS (2017)

    Google Scholar 

  16. Huang, Y., Wang, Y., Tai, Y., Liu, X., Shen, P., Li, S., Jilin Li, F.H.: CurricularFace: adaptive curriculum learning loss for deep face recognition, pp. 1–8 (2020)

    Google Scholar 

  17. Härkönen, E., Hertzmann, A., Lehtinen, J., Paris, S.: GANspace: Discovering interpretable GAN controls. In: Proceedings of NeurIPS (2020)

    Google Scholar 

  18. Iizuka, S., Simo-Serra, E., Ishikawa, H.: Globally and locally consistent image completion. ACM Trans. Graph. (TOG) 36, 1–14 (2017)

    Article  Google Scholar 

  19. Jahanian, A., Chai, L., Isola, P.: On the “steerability” of generative adversarial networks. In: International Conference on Learning Representations (2020)

    Google Scholar 

  20. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2018)

    Google Scholar 

  21. Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: CVPR (2019)

    Google Scholar 

  22. Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of StyleGAN. In: Proceedings of CVPR (2020)

    Google Scholar 

  23. Kingma, D.P., Welling, M.: Auto-encoding variational bayes. CoRR abs/1312.6114 (2014)

    Google Scholar 

  24. Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3D object representations for fine-grained categorization. In: 4th International IEEE Workshop on 3D Representation and Recognition (3dRR-2013), Sydney, Australia (2013)

    Google Scholar 

  25. Lee, C.H., Liu, Z., Wu, L., Luo, P.: MaskGAN: towards diverse and interactive facial image manipulation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

    Google Scholar 

  26. Li, Y., Li, Y., Lu, J., Shechtman, E., Lee, Y.J., Singh, K.K.: Collaging class-specific GANs for semantic image synthesis. In: ICCV (2021)

    Google Scholar 

  27. Li, Y., Singh, K.K., Ojha, U., Lee, Y.J.: MixnMatch: multifactor disentanglement and encoding for conditional image generation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 8036–8045 (2020)

    Google Scholar 

  28. Liu, G., Reda, F.A., Shih, K.J., Wang, T.-C., Tao, A., Catanzaro, B.: Image inpainting for irregular holes using partial convolutions. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11215, pp. 89–105. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01252-6_6

    Chapter  Google Scholar 

  29. Ma, X., Zhou, X., Huang, H., Chai, Z., Wei, X., He, R.: Free-form image inpainting via contrastive attention network. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 9242–9249 (2021)

    Google Scholar 

  30. Misra, I., van der Maaten, L.: Self-supervised learning of pretext-invariant representations. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6706–6716 (2020)

    Google Scholar 

  31. Park, T., Efros, A.A., Zhang, R., Zhu, J.Y.: Contrastive learning for unpaired image-to-image translation. In: ECCV (2020)

    Google Scholar 

  32. Pathak, D., Krähenbühl, P., Donahue, J., Darrell, T., Efros, A.A.: Context encoders: Feature learning by inpainting. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2536–2544 (2016)

    Google Scholar 

  33. Ren, Y., Yu, X., Zhang, R., Li, T.H., Liu, S., Li, G.: Structureflow: image inpainting via structure-aware appearance flow. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 181–190 (2019)

    Google Scholar 

  34. Sagong, M.C., Shin, Y.G., Kim, S.W., Park, S., Ko, S.: Pepsi : fast image inpainting with parallel decoding network. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11352–11360 (2019)

    Google Scholar 

  35. Shen, Y., Gu, J., Tang, X., Zhou, B.: Interpreting the latent space of GANs for semantic face editing. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 9240–9249 (2020)

    Google Scholar 

  36. Shen, Y., Zhou, B.: Closed-form factorization of latent semantics in GANs. arXiv:abs/2007.06600 (2020)

  37. Shu, Z., Sahasrabudhe, M., Alp Güler, R., Samaras, D., Paragios, N., Kokkinos, I.: Deforming autoencoders: unsupervised disentangling of shape and appearance. In: Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y. (eds.) ECCV 2018. LNCS, vol. 11214, pp. 664–680. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-01249-6_40

    Chapter  Google Scholar 

  38. Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. CoRR abs/1409.1556 (2015)

    Google Scholar 

  39. Singh, K.K., Ojha, U., Lee, Y.J.: FineGAN: unsupervised hierarchical disentanglement for fine-grained object generation and discovery. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6483–6492 (2019)

    Google Scholar 

  40. Suin, M., Purohit, K., Rajagopalan, A.N.: Distillation-guided image inpainting. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 2481–2490, October 2021

    Google Scholar 

  41. Telea, A.: An image inpainting technique based on the fast marching method. J. Graph. Tools. 9, 23–34 (004). https://doi.org/10.1080/10867651.2004.10487596

  42. Tian, Y., Krishnan, D., Isola, P.: Contrastive multiview coding. In: Vedaldi, A., Bischof, H., Brox, T., Frahm, J.-M. (eds.) ECCV 2020. LNCS, vol. 12356, pp. 776–794. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58621-8_45

    Chapter  Google Scholar 

  43. Voynov, A., Babenko, A.: Unsupervised discovery of interpretable directions in the GAN latent space. arXiv:abs/2002.03754 (2020)

  44. Wah, C., Branson, S., Welinder, P., Perona, P., Belongie, S.: The Caltech-UCSD Birds-200-2011 Dataset. Technical report, CNS-TR-2011-001 (2011)

    Google Scholar 

  45. Wan, Z., Zhang, J., Chen, D., Liao, J.: High-fidelity pluralistic image completion with transformers. In: ICCV (2021)

    Google Scholar 

  46. Wang, Y., Tao, X., Qi, X., Shen, X., Jia, J.: Image inpainting via generative multi-column convolutional neural networks. In: NeurIPS (2018)

    Google Scholar 

  47. Wu, Y., Kirillov, A., Massa, F., Lo, W.Y., Girshick, R.: Detectron2 (2019). https://github.com/facebookresearch/detectron2

  48. Xie, C., et al.: Image inpainting with learnable bidirectional attention maps. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 8857–8866 (2019)

    Google Scholar 

  49. Xing, X., Gao, R., Han, T., Zhu, S.C., Wu, Y.N.: Deformable generator network: Unsupervised disentanglement of appearance and geometry. IEEE Trans. Pattern Anal. Mach. Intell. (2020)

    Google Scholar 

  50. Xiong, W., et al.: Foreground-aware image inpainting. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5833–5841 (2019)

    Google Scholar 

  51. Yan, Z., Li, X., Li, M., Zuo, W., Shan, S.: Shift-Net: image inpainting via deep feature rearrangement. arXiv:abs/1801.09392 (2018)

  52. Yang, C., Shen, Y., Zhou, B.: Semantic hierarchy emerges in deep generative representations for scene synthesis. Int. J. Comput. Vis. 129, 1451–1466 (2021)

    Article  Google Scholar 

  53. Yang, C., Lu, X., Lin, Z.L., Shechtman, E., Wang, O., Li, H.: High-resolution image inpainting using multi-scale neural patch synthesis. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4076–4084 (2017)

    Google Scholar 

  54. Yu, F., Zhang, Y., Song, S., Seff, A., Xiao, J.: LSUN: construction of a large-scale image dataset using deep learning with humans in the loop. arXiv:abs/1506.03365 (2015)

  55. Yu, J., Lin, Z.L., Yang, J., Shen, X., Lu, X., Huang, T.S.: Free-form image inpainting with gated convolution. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 4470–4479 (2019)

    Google Scholar 

  56. Yu, Y., et al.: Diverse image inpainting with bidirectional and autoregressive transformers. In: Proceedings of the 29th ACM International Conference on Multimedia (2021)

    Google Scholar 

  57. Zeng, Y., Fu, J., Chao, H., Guo, B.: Learning pyramid-context encoder network for high-quality image inpainting. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1486–1494 (2019)

    Google Scholar 

  58. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)

    Google Scholar 

  59. Zhao, L., et al.: UCTGAN: diverse image inpainting based on unsupervised cross-space translation. In: 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5740–5749 (2020)

    Google Scholar 

  60. Zhao, S., et al.: Large scale image completion via co-modulated generative adversarial networks. arXiv:abs/2103.10428 (2021)

  61. Zheng, C., Cham, T., Cai, J.: Pluralistic image completion. In: 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1438–1447 (2019)

    Google Scholar 

  62. Zhou, X., Li, J., Wang, Z., He, R., Tan, T.: Image inpainting with contrastive relation network. In: 2020 25th International Conference on Pattern Recognition (ICPR), pp. 4420–4427 (2021)

    Google Scholar 

  63. Zhuang, C., Zhai, A., Yamins, D.: Local aggregation for unsupervised learning of visual embeddings. In: 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 6001–6011 (2019)

    Google Scholar 

Download references

Acknowledgement

This work was supported in part by Sony Focused Research Award, NSF CAREER IIS-2150012, Wisconsin Alumni Research Foundation, and NASA 80NSSC21K0295.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Yuheng Li .

Editor information

Editors and Affiliations

1 Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 4736 KB)

Rights and permissions

Reprints and permissions

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Li, Y., Li, Y., Lu, J., Shechtman, E., Lee, Y.J., Singh, K.K. (2022). Contrastive Learning for Diverse Disentangled Foreground Generation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13676. Springer, Cham. https://doi.org/10.1007/978-3-031-19787-1_19

Download citation

  • DOI: https://doi.org/10.1007/978-3-031-19787-1_19

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-19786-4

  • Online ISBN: 978-3-031-19787-1

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics