Multi-resolution continuous normalizing flows

Voleti, Vikram; Finlay, Chris; Oberman, Adam; Pal, Christopher

doi:10.1007/s10472-024-09939-5

Abstract

Recent work has shown that Neural Ordinary Differential Equations (ODEs) can serve as generative models of images using the perspective of Continuous Normalizing Flows (CNFs). Such models offer exact likelihood calculation, and invertible generation/density estimation. In this work we introduce a Multi-Resolution variant of such models (MRCNF), by characterizing the conditional distribution over the additional information required to generate a fine image that is consistent with the coarse image. We introduce a transformation between resolutions that allows for no change in the log likelihood. We show that this approach yields comparable likelihood values for various image datasets, with improved performance at higher resolutions, with fewer parameters, using only one GPU. Further, we examine the out-of-distribution properties of MRCNFs, and find that they are similar to those of other likelihood-based generative models.

Data Availability Statement (DAS)

All data generated or analysed during this study are included in the respective published articles, as cited in the main text: CIFAR10 [61] and ImageNet [62].

References

Dinh, L., Sohl-Dickstein, J., Bengio, S.: Density estimation using Real NVP. In: International Conference on Learning Representations (2017)
Kingma, D.P., Dhariwal, P.: Glow: Generative flow with invertible 1x1 convolutions. In: Advances in Neural Information Processing Systems, pp. 10215–10224 (2018)
Ho, J., Chen, X., Srinivas, A., Duan, Y., Abbeel, P.: Flow++: Improving flow-based generative models with variational dequantization and architecture design. In: International Conference on Machine Learning (2019)
Yu, J., Derpanis, K., Brubaker, M.: Wavelet flow: Fast training of high resolution normalizing flows. In: Advances in Neural Information Processing Systems (2020)
Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016)
Kingma, D.P., Welling, M.: Auto-encoding variational bayes. Preprint arXiv:1312.6114 (2013)
Chen, R.T.Q., Rubanova, Y., Bettencourt, J., Duvenaud, D.: Neural ordinary differential equations. Adv. Neural Inf. Process. Syst. (2018)
Grathwohl, W., Chen, R.T.Q., Bettencourt, J., Sutskever, I., Duvenaud, D.: Ffjord: Free-form continuous dynamics for scalable reversible generative models. International Conference on Learning Representations (2019)
Finlay, C., Jacobsen, J.-H., Nurbekyan, L., Oberman, A.: How to train your neural ode: the world of jacobian and kinetic regularization. International Conference on Machine Learning (2020)
Lin, Z., Khetan, A., Fanti, G., Oh, S.: Pacgan: The power of two samples in generative adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1498–1507 (2018)
Arjovsky, M., Bottou, L.: Towards principled methods for training generative adversarial networks. Preprint arXiv:1701.04862 (2017)
Berard, H., Gidel, G., Almahairi, A., Vincent, P., Lacoste-Julien, S.: A closer look at the optimization landscapes of generative adversarial networks. In: International Conference on Machine Learning (2020)
Brock, A., Donahue, J., Simonyan, K.: Large scale GAN training for high fidelity natural image synthesis. In: International Conference on Learning Representations (2019)
Shaham, T.R., Dekel, T., Michaeli, T.: Singan: Learning a generative model from a single natural image. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 4570–4580 (2019)
Karras, T., Laine, S., Aittala, M., Hellsten, J., Lehtinen, J., Aila, T.: Analyzing and improving the image quality of stylegan. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8110–8119 (2020)
Vahdat, A., Kautz, J.: Nvae: A deep hierarchical variational autoencoder. In: Advances in Neural Information Processing Systems (2020)
Tabak, E.G., Turner, C.V.: A family of nonparametric density estimation algorithms. Commun. Pur. Appl. Math. 66(2), 145–164 (2013)
Jimenez Rezende, D., Mohamed, S.: Variational inference with normalizing flows. In: International Conference on Machine Learning, pp. 1530–1538 (2015)
Papamakarios, G., Nalisnick, E., Rezende, D.J., Mohamed, S., Lakshminarayanan, B.: Normalizing flows for probabilistic modeling and inference. Preprint arXiv:1912.02762 (2019)
Kobyzev, I., Prince, S., Brubaker, M.: Normalizing flows: An introduction and review of current methods. IEEE Trans. Pattern Anal. Mach. Intell. (2020)
Ghosh, A., Behl, H.S., Dupont, E., Torr, P.H., Namboodiri, V.: Steer: Simple temporal regularization for neural odes. In: Advances in Neural Information Processing Systems (2020)
Onken, D., Fung, S.W., Li, X., Ruthotto, L.: Ot-flow: Fast and accurate continuous normalizing flows via optimal transport. AAAI Conf. Artif. Intell. (2021)
Huang, H.-H., Yeh, M.-Y.: Accelerating continuous normalizing flow with trajectory polynomial regularization. AAAI Conf. Artif. Intell. (2021)
Burt, P.J.: Fast filter transform for image processing. Comput Graphics Image Process 16(1), 20–51 (1981)
Marr, D.: Vision: A computational investigation into the human representation and processing of visual information. (2010)
Witkin, A.P.: Scale-space filtering, 329–332 (1987)
Burt, P., Adelson, E.: The laplacian pyramid as a compact image code. IEEE Trans. Commun. 31(4), 532–540 (1983)
Mallat, S.G.: A theory for multiresolution signal decomposition: the wavelet representation. IEEE Trans. Pattern Anal. Mach. Intell. 11(7), 674–693 (1989)
Lindeberg, T.: Scale-space for discrete signals. IEEE Trans. Pattern Anal. Mach. Intell. 12(3), 234–254 (1990)
Adelson, E.H., Anderson, C.H., Bergen, J.R., Burt, P.J., Ogden, J.M.: Pyramid methods in image processing. RCA Eng. 29(6), 33–41 (1984)
Mallat, S.G., Peyré, G.: A Wavelet Tour of Signal Processing: the Sparse Way, (2009)
Yan, H., Du, J., Tan, V.Y.F., Feng, J.: On robustness of neural ordinary differential equations. International Conference on Learning Representations. (2020)
Denton, E.L., Chintala, S., Fergus, R., et al.: Deep generative image models using a laplacian pyramid of adversarial networks. In: Advances in Neural Information Processing Systems, pp. 1486–1494 (2015)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of gans for improved quality, stability, and variation. In: International Conference on Learning Representations (2018)
Karnewar, A., Wang, O.: Msg-gan: Multi-scale gradients for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7799–7808 (2020)
Razavi, A., Oord, A., Vinyals, O.: Generating diverse high-fidelity images with vq-vae-2. In: Advances in Neural Information Processing Systems, pp. 14866–14876 (2019)
Long, J., Shelhamer, E., Darrell, T.: Fully convolutional networks for semantic segmentation. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 3431–3440 (2015)
Radford, A., Metz, L., Chintala, S.: Unsupervised representation learning with deep convolutional generative adversarial networks. Preprint arXiv:1511.06434 (2015)
Oord, A.v.d., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. International Conference on Machine Learning. (2016)
Reed, S., Oord, A.v.d., Kalchbrenner, N., Colmenarejo, S.G., Wang, Z., Belov, D., De Freitas, N.: Parallel multiscale autoregressive density estimation. In: International Conference on Machine Learning (2017)
Menick, J., Kalchbrenner, N.: Generating high fidelity images with subscale pixel networks and multidimensional upscaling. In: International Conference on Learning Representations (2019)
Hoogeboom, E., Berg, R.v.d., Welling, M.: Emerging convolutions for generative normalizing flows. In: International Conference on Machine Learning (2019)
Hoogeboom, E., Peters, J., Berg, R., Welling, M.: Integer discrete flows and lossless compression. In: Advances in Neural Information Processing Systems, vol. 32, pp. 12134–12144 (2019). https://proceedings.neurips.cc/paper/2019/file/9e9a30b74c49d07d8150c8c83b1ccf07-Paper.pdf
Song, Y., Meng, C., Ermon, S.: Mintnet: Building invertible neural networks with masked convolutions. In: Advances in Neural Information Processing Systems, pp. 11004–11014 (2019)
Ma, X., Kong, X., Zhang, S., Hovy, E.: Macow: Masked convolutional generative flow. In: Advances in Neural Information Processing Systems, pp. 5893–5902 (2019)
Durkan, C., Bekasov, A., Murray, I., Papamakarios, G.: Neural spline flows. In: Advances in Neural Information Processing Systems, vol. 32, pp. 7511–7522 (2019). https://proceedings.neurips.cc/paper/2019/file/7ac71d433f282034e088473244df8c02-Paper.pdf
Chen, J., Lu, C., Chenli, B., Zhu, J., Tian, T.: Vflow: More expressive generative flows with variational data augmentation. In: International Conference on Machine Learning (2020)
Lee, S.-g., Kim, S., Yoon, S.: Nanoflow: Scalable normalizing flows with sublinear parameter complexity. In: Advances in Neural Information Processing Systems (2020)
Sohl-Dickstein, J., Weiss, E., Maheswaranathan, N., Ganguli, S.: Deep unsupervised learning using nonequilibrium thermodynamics. In: International Conference on Machine Learning, pp. 2256–2265 (2015). PMLR
Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. (2020)
Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. International Conference on Learning Representations. (2020)
Song, Y., Ermon, S.: Generative modeling by estimating gradients of the data distribution. Adv. Neural Inf. Process. Syst. (2019)
Song, Y., Ermon, S.: Improved techniques for training score-based generative models. Adv. Neural Inf. Process. Syst. (2020)
Jolicoeur-Martineau, A., Piché-Taillefer, R., Combes, R.T.d., Mitliagkas, I.: Adversarial score matching and improved sampling for image generation. International Conference on Learning Representations. (2021)
Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. International Conference on Learning Representations. (2021)
Oord, A., Kalchbrenner, N., Espeholt, L., Vinyals, O., Graves, A., et al.: Conditional image generation with pixelcnn decoders. In: Advances in Neural Information Processing Systems, pp. 4790–4798 (2016)
Child, R., Gray, S., Radford, A., Sutskever, I.: Generating long sequences with sparse transformers. Preprint arXiv:1904.10509. (2019)
Jun, H., Child, R., Chen, M., Schulman, J., Ramesh, A., Radford, A., Sutskever, I.: Distribution augmentation for generative modeling. In: International Conference on Machine Learning, pp. 10563–10576 (2020)
Grcić, M., Grubišić, I., Šegvić, S.: Densely connected normalizing flows. Preprint. (2021)
Chen, R.T., Behrmann, J., Duvenaud, D.K., Jacobsen, J.-H.: Residual flows for invertible generative modeling. In: Advances in Neural Information Processing Systems, pp. 9916–9926 (2019)
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images. Technical Report, University of Toronto. (2009)
Deng, J., Dong, W., Socher, R., Li, L.-J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255 (2009). IEEE
Makhzani, A., Shlens, J., Jaitly, N., Goodfellow, I., Frey, B.: Adversarial autoencoders. Preprint arXiv:1511.05644. (2015)
Grover, A., Dhar, M., Ermon, S.: Flow-gan: Combining maximum likelihood and adversarial learning in generative models. In: AAAI Conference on Artificial Intelligence (2018)
Lee, A.X., Zhang, R., Ebert, F., Abbeel, P., Finn, C., Levine, S.: Stochastic adversarial video prediction. Preprint arXiv:1804.01523 (2018)
Beckham, C., Honari, S., Verma, V., Lamb, A.M., Ghadiri, F., Hjelm, R.D., Bengio, Y., Pal, C.: On adversarial mixup resynthesis. In: Advances in Neural Information Processing Systems, pp. 4346–4357 (2019)
Heusel, M., Ramsauer, H., Unterthiner, T., Nessler, B., Hochreiter, S.: Gans trained by a two time-scale update rule converge to a local nash equilibrium. In: Advances in Neural Information Processing Systems, pp. 6626–6637 (2017)
Theis, L., Oord, A.v.d., Bethge, M.: A note on the evaluation of generative models. In: International Conference on Learning Representations (2016)
Nalisnick, E., Matsukawa, A., Teh, Y.W., Gorur, D., Lakshminarayanan, B.: Do deep generative models know what they don’t know? In: International Conference on Learning Representations (2019)
Serrà, J., Álvarez, D., Gómez, V., Slizovskaia, O., Núñez, J.F., Luque, J.: Input complexity and out-of-distribution detection with likelihood-based generative models. In: International Conference on Learning Representations (2020)
Nalisnick, E., Matsukawa, A., Teh, Y.W., Lakshminarayanan, B.: Detecting out-of-distribution inputs to deep generative models using a test for typicality. Preprint arXiv:1906.02994 (2019)
Netzer, Y., Wang, T., Coates, A., Bissacco, A., Wu, B., Ng, A.Y.: Reading digits in natural images with unsupervised feature learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning. (2011)
Choi, H., Jang, E., Alemi, A.A.: Waic, but why? generative ensembles for robust anomaly detection. Preprint arXiv:1810.01392 (2018)
Kirichenko, P., Izmailov, P., Wilson, A.G.: Why normalizing flows fail to detect out-of-distribution data. In: Advances in Neural Information Processing Systems, vol. 33 (2020)
Sneyers, J., Wuille, P.: Flif: Free lossless image format based on maniac compression. In: 2016 IEEE International Conference on Image Processing (ICIP), pp. 66–70 (2016). IEEE
Hendrycks, D., Mazeika, M., Dietterich, T.: Deep anomaly detection with outlier exposure. In: International Conference on Learning Representations (2019)
Hendrycks, D., Gimpel, K.: A baseline for detecting misclassified and out-of-distribution examples in neural networks. In: International Conference on Learning Representations (2017)
Liang, S., Li, Y., Srikant, R.: Enhancing the reliability of out-of-distribution image detection in neural networks. In: International Conference on Learning Representations (2018)
Lee, K., Lee, K., Lee, H., Shin, J.: A simple unified framework for detecting out-of-distribution samples and adversarial attacks. In: Advances in Neural Information Processing Systems, pp. 7167–7177 (2018)
Sabeti, E., Høst-Madsen, A.: Data discovery and anomaly detection using atypicality for real-valued data. Entropy 21(3), 219 (2019)
Høst-Madsen, A., Sabeti, E., Walton, C.: Data discovery and anomaly detection using atypicality: Theory. IEEE Trans. Inf. Theory 65(9), 5302–5322 (2019)
Parmar, N., Vaswani, A., Uszkoreit, J., Kaiser, Ł., Shazeer, N., Ku, A., Tran, D.: Image transformer. In: International Conference on Machine Learning (2018)
Chen, X., Mishra, N., Rohaninejad, M., Abbeel, P.: Pixelsnail: An improved autoregressive generative model. In: International Conference on Machine Learning, pp. 864–872 (2018). PMLR
Ho, J., Kalchbrenner, N., Weissenborn, D., Salimans, T.: Axial attention in multidimensional transformers. Preprint arXiv:1912.12180 (2019)
Nielsen, D., Winther, O.: Closing the dequantization gap: Pixelcnn as a single-layer flow. In: Advances in Neural Information Processing Systems (2020)
Kingma, D.P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., Welling, M.: Improving variational inference with inverse autoregressive flow. Preprint arXiv:1606.04934 (2016)
Behrmann, J., Grathwohl, W., Chen, R.T., Duvenaud, D., Jacobsen, J.-H.: Invertible residual networks. In: International Conference on Machine Learning, pp. 573–582 (2019)
Karami, M., Schuurmans, D., Sohl-Dickstein, J., Dinh, L., Duckworth, D.: Invertible convolutional flow. In: Advances in Neural Information Processing Systems, vol. 32, pp. 5635–5645 (2019). https://proceedings.neurips.cc/paper/2019/file/b1f62fa99de9f27a048344d55c5ef7a6-Paper.pdf
Huang, C.-W., Dinh, L., Courville, A.: Augmented normalizing flows: Bridging the gap between generative flows and latent variable models. Preprint arXiv:2002.07101 (2020)
Xiao, C., Liu, L.: Generative flows with matrix exponential. In: International Conference on Machine Learning (2020)
Lu, Y., Huang, B.: Woodbury transformations for deep generative flows. In: Advances in Neural Information Processing Systems (2020)
Hoogeboom, E., Satorras, V.G., Tomczak, J., Welling, M.: The convolution exponential and generalized sylvester flows. In: Advances in Neural Information Processing Systems (2020)
Kelly, J., Bettencourt, J., Johnson, M.J., Duvenaud, D.: Learning differential equations that are easy to solve. In: Advances in Neural Information Processing Systems (2020)
Pascanu, R., Mikolov, T., Bengio, Y.: On the difficulty of training recurrent neural networks. In: International Conference on Machine Learning, pp. 1310–1318 (2013)

Acknowledgements

Chris Finlay contributed to this paper while a postdoc at McGill University; he is now affiliated with Deep Render. His postdoc was funded in part by a Healthy Brains Healthy Lives Fellowship. Adam Oberman was supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0167 and by IVADO. Christopher Pal is funded in part by CIFAR. We thank CIFAR for their support through the CIFAR AI Chairs program. We also thank Samsung for partially supporting Vikram Voleti for this work. We thank Adam Ibrahim, Etienne Denis, Gauthier Gidel, Ioannis Mitliagkas, and Roger Girgis for their valuable feedback.

Funding

Chris Finlay contributed to this paper while a postdoc at McGill University, funded in part by a Healthy Brains Healthy Lives Fellowship. Adam Oberman was supported by the Air Force Office of Scientific Research under award number FA9550-18-1-0167 and by IVADO. Christopher Pal is funded in part by CIFAR. We thank CIFAR for their support through the CIFAR AI Chairs program.

Author information

Authors and Affiliations

Mila, 6666 Rue St. Urbain, Montreal, H2S 3H1, QC, Canada
Vikram Voleti, Chris Finlay, Adam Oberman & Christopher Pal
DIRO, University of Montreal, 2900 Bd Édouard-Montpetit, Montreal, H3T 1J4, QC, Canada
Vikram Voleti
McGill University, 845 Rue Sherbrooke O, Montreal, H3A 0G4, QC, Canada
Chris Finlay & Adam Oberman
École Polytechnique de Montréal, 2500 Chem. de Polytechnique, Montreal, H3T 1J4, QC, Canada
Christopher Pal
DeepRender, London, UK
Chris Finlay

Contributions

Vikram Voleti and Chris Finlay brainstormed over ideas for improving image generation using the continuous normalizing flows framework of Neural ODEs. Adam Oberman and Christopher Pal provided advice and guidance throughout the project and wrote parts of the paper. With help from Adam Oberman and Christopher Pal, Vikram derived the mathematical framework. With help from Chris Finlay, Vikram designed the experiments, wrote the code, ran experiments, proposed and executed on out-of-distribution analysis, and wrote the paper. All authors reviewed the manuscript.

Corresponding author

Correspondence to Vikram Voleti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Conflict of Interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A Full Table 1

Table 5 presents the full version of Table 1 including other results relevant to the conclusion but not mentioned in the main paper for brevity.

Table 5 Unconditional image generation metrics (lower is better in all cases): parameters in the model, bits-per-dimension, time (in hours)

Appendix B Qualitative samples

Here we present qualitative samples generated by our method on the MNIST and CIFAR10 datasets.

Appendix C Simple example of density estimation

For example, if we use the Euler method as our ODE solver, then for density estimation (2) reduces to:

$$\begin{aligned} {\textbf{v}}(t_1) = {\textbf{v}}(t_0) + (t_1 - t_0)f_s({\textbf{v}}(t_0), t_0 \mid {\textbf{c}}) \end{aligned}$$

(C1)

where $f_s$ is a neural network, $t_0$ is the "time" at which the state is the image ${\textbf{x}}$, and $t_1$ is the time at which the state is the noise ${\textbf{z}}$. We start at scale $S$ with an image sample ${\textbf{x}}_S$, and take $t_0 = 0$ and $t_1 = 1$:

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}{\textbf{z}}_S = {\textbf{x}}_S + f_S({\textbf{x}}_S,\ t_0 \mid {\textbf{x}}_{S-1})\\ &{}{\textbf{z}}_{S-1} = {\textbf{x}}_{S-1} + f_{S-1}({\textbf{x}}_{S-1},\ t_0 \mid {\textbf{x}}_{S-2})\\ &{}\vdots \\ &{}{\textbf{z}}_1 = {\textbf{x}}_1 + f_1({\textbf{x}}_1,\ t_0 \mid {\textbf{x}}_0)\\ &{}{\textbf{z}}_0 = {\textbf{x}}_0 + f_0({\textbf{x}}_0,\ t_0) \end{array}\right. } \end{aligned}$$

(C2)
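The cascade in (C2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a single Euler step per scale with $t_0 = 0$ and $t_1 = 1$, images stored as flat lists of floats, and a hand-written `toy_dynamics` function (hypothetical, not a trained network) standing in for each $f_s$.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# f_s(state, t, condition) -> velocity; a hypothetical stand-in for a neural network.
Dynamics = Callable[[Sequence[float], float, Optional[Sequence[float]]], Vector]


def euler_step(v: Sequence[float], t_from: float, t_to: float,
               f: Dynamics, cond: Optional[Sequence[float]] = None) -> Vector:
    """One Euler step: v(t_to) = v(t_from) + (t_to - t_from) * f(v(t_from), t_from | cond)."""
    dv = f(v, t_from, cond)
    return [vi + (t_to - t_from) * di for vi, di in zip(v, dv)]


def encode(xs: Sequence[Sequence[float]], fs: Sequence[Dynamics]) -> List[Vector]:
    """Map images x_0..x_S to noise z_0..z_S as in (C2), with t_0 = 0 and t_1 = 1."""
    zs = []
    for s, (x, f) in enumerate(zip(xs, fs)):
        cond = xs[s - 1] if s > 0 else None  # the coarsest scale is unconditional
        zs.append(euler_step(x, 0.0, 1.0, f, cond))
    return zs


def toy_dynamics(v, t, cond):
    # Illustrative only: shift by the mean of the conditioning image and shrink the state.
    shift = sum(cond) / len(cond) if cond is not None else 0.0
    return [shift - 0.5 * vi for vi in v]


xs = [[1.0], [2.0, 4.0]]  # toy "coarse" (x_0) and "fine" (x_1) images
zs = encode(xs, [toy_dynamics, toy_dynamics])
print(zs)  # → [[0.5], [2.0, 3.0]]
```

Each scale only needs the image at the immediately coarser scale, so the per-scale steps are independent of one another during density estimation.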

Appendix D Simple example of generation

For example, if we use the Euler method as our ODE solver, then for generation (2) reduces to:

$$\begin{aligned} {\textbf{v}}(t_0) = {\textbf{v}}(t_1) + (t_0 - t_1)f_s({\textbf{v}}(t_1), t_1 \mid {\textbf{c}}) \end{aligned}$$

(D3)

That is, the state is integrated backwards from $t_1$ (where it is the noise ${\textbf{z}}_s$) to $t_0$ (where it is the image ${\textbf{x}}_s$). We start at scale 0 with a noise sample ${\textbf{z}}_0$, and take $t_0 = 0$ and $t_1 = 1$:

$$\begin{aligned} {\left\{ \begin{array}{ll} &{}{\textbf{x}}_0 = {\textbf{z}}_0 - f_0({\textbf{z}}_0,\ t_1)\\ &{}{\textbf{x}}_1 = {\textbf{z}}_1 - f_1({\textbf{z}}_1,\ t_1 \mid {\textbf{x}}_0)\\ &{}\vdots \\ &{}{\textbf{x}}_{S-1} = {\textbf{z}}_{S-1} - f_{S-1}({\textbf{z}}_{S-1},\ t_1 \mid {\textbf{x}}_{S-2})\\ &{}{\textbf{x}}_S = {\textbf{z}}_S - f_S({\textbf{z}}_{S},\ t_1 \mid {\textbf{x}}_{S-1}) \end{array}\right. } \end{aligned}$$

(D4)
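The generation cascade in (D4) can be sketched in the same spirit. The assumptions are again a single backward Euler step per scale with $t_0 = 0$ and $t_1 = 1$, flat lists of floats as images, and a hypothetical `toy_dynamics` function in place of the trained networks; the noise values are fixed by hand rather than sampled so that the result is reproducible.

```python
from typing import Callable, List, Optional, Sequence

Vector = List[float]
# f_s(state, t, condition) -> velocity; a hypothetical stand-in for a neural network.
Dynamics = Callable[[Sequence[float], float, Optional[Sequence[float]]], Vector]


def euler_step_back(z: Sequence[float], f: Dynamics,
                    cond: Optional[Sequence[float]] = None,
                    t0: float = 0.0, t1: float = 1.0) -> Vector:
    """One backward Euler step as in (D3): v(t0) = v(t1) + (t0 - t1) * f(v(t1), t1 | cond)."""
    dz = f(z, t1, cond)
    return [zi + (t0 - t1) * di for zi, di in zip(z, dz)]


def generate(zs: Sequence[Sequence[float]], fs: Sequence[Dynamics]) -> List[Vector]:
    """Produce x_0..x_S from noise z_0..z_S as in (D4), coarse to fine."""
    xs: List[Vector] = []
    for s, (z, f) in enumerate(zip(zs, fs)):
        cond = xs[s - 1] if s > 0 else None  # condition on the image just generated
        xs.append(euler_step_back(z, f, cond))
    return xs


def toy_dynamics(v, t, cond):
    # Illustrative only: shift by the mean of the conditioning image and shrink the state.
    shift = sum(cond) / len(cond) if cond is not None else 0.0
    return [shift - 0.5 * vi for vi in v]


zs = [[2.0], [4.0, 8.0]]  # hand-picked "noise" at scales 0 and 1
xs = generate(zs, [toy_dynamics, toy_dynamics])
print(xs)  # → [[3.0], [3.0, 9.0]]
```

Unlike density estimation, generation is sequential across scales, because each scale conditions on the image produced at the previous one. A single explicit Euler step backwards is also only an approximate inverse of the forward step, which is one reason finer or adaptive solvers are typically used in practice.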

Appendix E Models

We used the same neural network architecture as in RNODE [9]. The CNF at each resolution consists of a stack of bl blocks, each a 4-layer convolutional network with 3×3 kernels, softplus activation functions, and 64 hidden dimensions, with the time t concatenated to the spatial input. In addition, except at the coarsest resolution, the immediately coarser image is also concatenated with the state. The integration time of each piece is [0, 1]. The number of blocks bl and the corresponding total number of parameters are given in Table 6.

Table 6 Number of parameters for different models with different total number of resolutions (res), and the number of channels (ch) and number of blocks (bl) per resolution

Appendix F Gradient norm

To avoid exploding gradients, we clipped the norm of the gradients [94] to a maximum value of 100.0. When using an adversarial loss, we first clip the gradients from the adversarial loss to a norm of 50.0, add the gradients from the log-likelihood loss, and then clip the summed gradients to a norm of 100.0.
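The two-stage clipping described above can be sketched as follows. This is a minimal illustration, assuming the gradients are flattened into a single list and that "norm" means the global L2 norm; the function names are hypothetical and not taken from the authors' code.

```python
import math
from typing import List, Sequence


def clip_by_norm(grad: Sequence[float], max_norm: float) -> List[float]:
    """Rescale grad so that its L2 norm is at most max_norm (no-op if already within the limit)."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


def combined_gradient(adv_grad: Sequence[float], nll_grad: Sequence[float],
                      adv_max: float = 50.0, total_max: float = 100.0) -> List[float]:
    """Clip the adversarial gradient, add the log-likelihood gradient, then clip the sum."""
    adv = clip_by_norm(adv_grad, adv_max)
    summed = [a + n for a, n in zip(adv, nll_grad)]
    return clip_by_norm(summed, total_max)


# Toy values chosen so the arithmetic is exact.
adv_grad = [60.0, 80.0]    # norm 100, clipped to norm 50 -> [30, 40]
nll_grad = [90.0, 120.0]   # sum is [120, 160] with norm 200, clipped to norm 100
print(combined_gradient(adv_grad, nll_grad))  # → [60.0, 80.0]
```

Clipping the adversarial term first bounds its share of the final update at half the total budget before the likelihood gradient is added.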

Appendix G 8-bit to uniform

The change-of-variables formula gives the change in probability due to the transformation of ${\textbf{u}}$ to ${\textbf{v}}$:

$$\begin{aligned} \log p({\textbf{u}}) = \log p({\textbf{v}}) + \log \left| \det \frac{{\textrm{d}}{\textbf{v}}}{{\textrm{d}}{\textbf{u}}}\right| \end{aligned}$$

Specifically, the change of variables from an 8-bit image to an image with pixel values in range [0, 1] is:

$$\begin{aligned} {\textbf{b}}_S^{(p)}&= \frac{{\textbf{a}}_S^{(p)}}{256}\\ \implies \log p({\textbf{a}}_S)&= \log p({\textbf{b}}_S) + \log \left| \det \frac{{\textrm{d}}{\textbf{b}}_S}{{\textrm{d}}{\textbf{a}}_S}\right| \\&= \log p({\textbf{b}}_S) + \log \left( \frac{1}{256}\right) ^{D_S} \\&= \log p({\textbf{b}}_S) - D_S \log 256\\ \implies \text {bpd}({\textbf{a}}_S)&= \frac{-\log p({\textbf{a}}_S)}{D_S \log 2} \\&= \frac{-(\log p({\textbf{b}}_S) - D_S \log 256)}{D_S \log 2} \\&= \frac{-\log p({\textbf{b}}_S)}{D_S \log 2} + \frac{\log 256}{\log 2}\\&= \text {bpd}({\textbf{x}}) + 8 \end{aligned}$$

where $\text {bpd}({\textbf{x}})$ is given by (17), evaluated on the image rescaled to [0, 1] (here ${\textbf{b}}_S$).
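The 8-bit offset can be checked numerically. In the sketch below, the function names and the log-likelihood value are made up for illustration; only the dimensionality $D_S = 3 \times 32 \times 32$ corresponds to a real setting (a CIFAR10-sized image).

```python
import math


def bpd_unit_interval(log_p_b: float, num_dims: int) -> float:
    """Bits per dimension of an image with pixel values in [0, 1], from its log-density (in nats)."""
    return -log_p_b / (num_dims * math.log(2))


def bpd_8bit(log_p_b: float, num_dims: int) -> float:
    """Bits per dimension of the 8-bit image, using log p(a) = log p(b) - D * log 256."""
    log_p_a = log_p_b - num_dims * math.log(256)
    return -log_p_a / (num_dims * math.log(2))


num_dims = 3 * 32 * 32                       # D_S for a 32x32 RGB image
log_p_b = 4.5 * num_dims * math.log(2)       # hypothetical log-density in nats
offset = bpd_8bit(log_p_b, num_dims) - bpd_unit_interval(log_p_b, num_dims)
print(round(offset, 6))  # → 8.0
```

The offset is independent of both the log-density and the number of dimensions, since $\log 256 / \log 2 = 8$.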

Appendix H FID v/s Temperature

Table 7 lists the FID values of generated images from MRCNF models trained on CIFAR10, with different temperature settings on the Gaussian.

Table 7 FID v/s temperature for MRCNF models trained on CIFAR10
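As a rough illustration of what a temperature setting means when sampling, the sketch below assumes the common convention in flow-based models that the temperature $T$ scales the standard deviation of the Gaussian noise, so that ${\textbf{z}} \sim \mathcal{N}(0, T^2 I)$. This appendix does not spell out the convention, so treat it as an assumption; the function name is hypothetical.

```python
import random
from typing import List, Optional


def sample_noise(num_dims: int, temperature: float = 1.0,
                 seed: Optional[int] = None) -> List[float]:
    """Draw num_dims values from N(0, temperature^2) by scaling standard normal draws."""
    rng = random.Random(seed)
    return [temperature * rng.gauss(0.0, 1.0) for _ in range(num_dims)]


# With the same seed, lowering the temperature shrinks every noise value by the same factor.
z_full = sample_noise(4, temperature=1.0, seed=0)
z_half = sample_noise(4, temperature=0.5, seed=0)
print(all(h == 0.5 * f for h, f in zip(z_half, z_full)))  # → True
```

Lower temperatures concentrate the noise near the mode of the Gaussian, which generally trades sample diversity for sample quality.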

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Voleti, V., Finlay, C., Oberman, A. et al. Multi-resolution continuous normalizing flows. Ann Math Artif Intell (2024). https://doi.org/10.1007/s10472-024-09939-5

Accepted: 12 March 2024
Published: 21 March 2024
DOI: https://doi.org/10.1007/s10472-024-09939-5

Mathematics Subject Classification (2010)

68T07
