Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation

  • Conference paper
Computer Vision – ECCV 2022 (ECCV 2022)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13682)

Abstract

The Denoising Diffusion Probabilistic Model (DDPM) enables flexible conditional image generation from prior noise to real data by introducing an independent noise-aware classifier that provides conditional gradient guidance at each step of the denoising process. However, because the classifier can easily discriminate an incompletely generated image from its high-level structure alone, the gradient, which carries the class-information guidance, tends to vanish early, causing the conditional generation process to collapse into the unconditional one. To address this problem, we propose two simple but effective approaches. For the sampling procedure, we introduce the entropy of the predicted distribution as a measure of how far the guidance has vanished and propose an entropy-aware scaling method to adaptively recover the conditional semantic guidance. For the training stage, we propose entropy-aware optimization objectives to alleviate overconfident predictions on noisy data. On ImageNet1000 256 \(\times \) 256, with our proposed sampling scheme and trained classifier, the pretrained conditional and unconditional DDPM models achieve FID improvements of 10.89% (4.59 to 4.09) and 43.5% (12 to 6.78), respectively. Code is available at https://github.com/ZGCTroy/ED-DPM.
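To make the two ideas in the abstract concrete, the sketch below shows (a) an entropy-aware rescaling of the classifier-guidance gradient at sampling time and (b) a classifier training loss with an entropy term that discourages overconfident predictions on noisy inputs. This is a minimal PyTorch sketch under assumptions of ours, not the paper's implementation: the inverse-entropy scaling rule, the `base_scale` and `lam` weights, and the function names are all hypothetical; the exact formulations are given in the paper.

```python
import torch
import torch.nn.functional as F


def entropy_scaled_classifier_grad(classifier, x_t, t, y, base_scale=1.0, eps=1e-8):
    """Hypothetical sketch: classifier-guidance gradient with an
    entropy-aware scale. Low predictive entropy (an overconfident
    classifier) signals vanishing guidance, so the gradient is boosted
    in inverse proportion to the normalized entropy."""
    with torch.enable_grad():
        x_t = x_t.detach().requires_grad_(True)
        logits = classifier(x_t, t)                   # (B, num_classes)
        log_probs = F.log_softmax(logits, dim=-1)
        probs = log_probs.exp()

        # Normalized predictive entropy in [0, 1]: 1 = uniform, 0 = one-hot.
        entropy = -(probs * log_probs).sum(dim=-1)    # (B,)
        max_entropy = torch.log(
            torch.tensor(logits.shape[-1], dtype=logits.dtype, device=logits.device)
        )
        norm_entropy = (entropy / max_entropy).detach()

        # grad_x log p(y | x_t): the usual classifier-guidance direction.
        log_p_y = log_probs.gather(-1, y.unsqueeze(-1)).sum()
        grad = torch.autograd.grad(log_p_y, x_t)[0]

    # Per-sample scale that grows as the prediction becomes overconfident.
    scale = base_scale / (norm_entropy + eps)         # (B,)
    return grad * scale.view(-1, *([1] * (grad.dim() - 1)))


def entropy_aware_classifier_loss(logits, targets, lam=0.1):
    """Hypothetical sketch: cross-entropy plus an entropy bonus, so the
    noise-aware classifier is penalized for overconfident predictions."""
    log_probs = F.log_softmax(logits, dim=-1)
    ce = F.nll_loss(log_probs, targets)
    entropy = -(log_probs.exp() * log_probs).sum(dim=-1).mean()
    # Subtracting the entropy term rewards higher-entropy (less confident)
    # outputs; lam trades classification accuracy against calibration.
    return ce - lam * entropy
```

In a classifier-guided sampler, the returned gradient would stand in for the fixed-scale classifier term added to the model's predicted mean at each denoising step; with the entropy factor removed, the sketch reduces to ordinary classifier guidance.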

G. Zheng and S. Li—The first two authors contributed equally to this paper.

Acknowledgements

This work is supported in part by the National Key Research and Development Program of China under Grant 2020AAA0107400, the Zhejiang Provincial Natural Science Foundation of China under Grant LR19F020004, and the National Natural Science Foundation of China under Grant U20A20222.

Author information

Corresponding author

Correspondence to Xi Li.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 6061 KB)

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Zheng, G. et al. (2022). Entropy-Driven Sampling and Training Scheme for Conditional Diffusion Generation. In: Avidan, S., Brostow, G., Cissé, M., Farinella, G.M., Hassner, T. (eds) Computer Vision – ECCV 2022. ECCV 2022. Lecture Notes in Computer Science, vol 13682. Springer, Cham. https://doi.org/10.1007/978-3-031-20047-2_43

  • DOI: https://doi.org/10.1007/978-3-031-20047-2_43

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-20046-5

  • Online ISBN: 978-3-031-20047-2

  • eBook Packages: Computer Science; Computer Science (R0)
