Abstract
Deep generative modeling has led to new, state-of-the-art approaches for enforcing structural priors in a variety of inverse problems. In contrast to priors given by sparsity, deep models can provide direct, low-dimensional parameterizations of the manifold of images or signals belonging to a particular natural class, allowing recovery algorithms to be posed in a low-dimensional space. This dimensionality may even be lower than the sparsity level of the same signals when viewed in a fixed basis. What has not been known about these methods is whether there are computationally efficient algorithms whose sample complexity is optimal in the dimensionality of the representation given by the generative model. In this paper, we present such an algorithm and its analysis. Under the assumption that the generative model is a neural network that is sufficiently expansive at each layer and has Gaussian weights, we provide a gradient descent scheme and prove that, for noisy compressive measurements of a signal in the range of the model, the algorithm converges to that signal, up to the noise level. The sample complexity scales linearly in the input dimensionality of the generative prior, and thus cannot be improved except for constants and factors of other variables. To the best of the authors' knowledge, this is the first recovery guarantee for compressive sensing under generative priors achieved by a computationally efficient algorithm.
Notes
This implementation is available at https://www.caam.rice.edu/~optimization/L1/fpc/.
References
Allen-Zhu, Z., Li, Y., Song, Z.: A convergence theory for deep learning via over-parameterization. In: Proceedings of the 36th International Conference on Machine Learning, vol. 97, pp. 242–252. PMLR (2019)
Arora, S., Liang, Y., Ma, T.: Why are deep nets reversible: a simple theory, with implications for training. Preprint (2015). arXiv:1511.05653
Blanchard, J.D., Cartis, C., Tanner, J.: Compressed sensing: How sharp is the restricted isometry property? SIAM Rev. Soc. Ind. Appl. Math. 53(1), 105–125 (2011)
Bora, A., Jalal, A., Price, E., Dimakis, A.G.: Compressed sensing using generative models. In: Proceedings of the 34th International Conference on Machine Learning, vol. 70, pp. 537–546. PMLR (2017)
Candes, E.J., Tao, T.: Near-optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inf. Theory 52(12), 5406–5425 (2006)
Clason, C.: Nonsmooth analysis and optimization. Preprint (2017). arXiv:1708.04180
Du, S.S., Zhai, X., Poczos, B., Singh, A.: Gradient descent provably optimizes over-parameterized neural networks. In: Proceedings of the 7th International Conference on Learning Representations (2019)
Eldar, Y.C., Kutyniok, G.: Compressed Sensing: Theory and Applications. Cambridge University Press, Cambridge (2012)
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Birkhäuser/Springer, Boston (2013)
Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., Courville, A., Bengio, Y.: Generative adversarial networks. Commun. ACM 63(11), 139–144 (2020)
Hale, E.T., Yin, W., Zhang, Y.: Fixed-point continuation for \(\ell _1\)-minimization: methodology and convergence. SIAM J. Optim. 19(3), 1107–1130 (2008)
Hand, P., Voroninski, V.: Global guarantees for enforcing deep generative priors by empirical risk. IEEE Trans. Inf. Theory 66(1), 401–418 (2019)
Heckel, R., Huang, W., Hand, P., Voroninski, V.: Deep denoising: rate-optimal recovery of structured signals with a deep prior. Inf. Inference (2020, accepted)
Johnson, J., Alahi, A., Fei-Fei, L.: Perceptual losses for real-time style transfer and super-resolution. In: European Conference on Computer Vision, pp. 694–711. Springer, Cham (2016)
Karras, T., Aila, T., Laine, S., Lehtinen, J.: Progressive growing of GANs for improved quality, stability, and variation. Preprint (2018). arXiv:1710.10196
Kingma, D.P., Dhariwal, P.: Glow: Generative flow with invertible 1 \(\times \) 1 convolutions. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 10236–10245 (2018)
Kingma, D.P., Welling, M.: Auto-encoding variational Bayes. In: Proceedings of the 2nd International Conference on Learning Representations (2014)
Ledig, C., Theis, L., Huszár, F., Caballero, J., Cunningham, A., Acosta, A., Aitken, A., Tejani, A., Totz, J., Wang, Z., et al.: Photo-realistic single image super-resolution using a generative adversarial network. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 4681–4690 (2017)
Li, Y., Liang, Y.: Learning overparameterized neural networks via stochastic gradient descent on structured data. In: Proceedings of the 32nd International Conference on Neural Information Processing Systems, pp. 8168–8177 (2018)
Mao, X.-J., Shen, C., Yang, Y.-B.: Image restoration using convolutional auto-encoders with symmetric skip connections. Preprint (2016). arXiv:1606.08921
Mardani, M., Gong, E., Cheng, J.Y., Vasanawala, S.S., Zaharchuk, G., Xing, L., Pauly, J.M.: Deep generative adversarial neural networks for compressive sensing MRI. IEEE Trans. Med. Imaging 38(1), 167–179 (2019)
Mardani, M., Monajemi, H., Papyan, V., Vasanawala, S., Donoho, D., Pauly, J.: Recurrent generative adversarial networks for proximal learning and automated compressive image recovery. Preprint (2017). arXiv:1711.10046
Mousavi, A., Baraniuk, R.G.: Learning to invert: Signal recovery via deep convolutional networks. In: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2272–2276 (2017)
Mousavi, A., Patel, A.B., Baraniuk, R.G.: A deep learning approach to structured signal recovery. In: 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 1336–1343 (2015)
Nocedal, J., Wright, S.J.: Numerical Optimization, 2nd edn. Springer, Cham (2006)
Oymak, S., Soltanolkotabi, M.: Toward moderate overparameterization: global convergence guarantees for training shallow neural networks. IEEE J. Sel. Areas Inf. Theory 1(1), 84–105 (2020)
Rippel, O., Bourdev, L.: Real-time adaptive image compression. In: International Conference on Machine Learning, pp. 2922–2930. PMLR (2017)
Sønderby, C.K., Caballero, J., Theis, L., Shi, W., Huszár, F.: Amortised MAP inference for image super-resolution. In: Proceedings of the 5th International Conference on Learning Representations (2017)
Van Oord, A., Kalchbrenner, N., Kavukcuoglu, K.: Pixel recurrent neural networks. In: International Conference on Machine Learning, pp. 1747–1756. PMLR (2016)
Yeh, R.A., Chen, C., Lim, T.Y., Schwing, A.G., Hasegawa-Johnson, M., Do, M.N.: Semantic image inpainting with deep generative models. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 5485–5493 (2017)
Zhu, J.-Y., Krähenbühl, P., Shechtman, E., Efros, A.A.: Generative visual manipulation on the natural image manifold. In: European Conference on Computer Vision, pp. 597–613. Springer, Cham (2016)
Acknowledgements
W.H. is partially supported by the Fundamental Research Funds for the Central Universities (No. 20720190060) and the National Natural Science Foundation of China (No. 12001455). P.H. is partially supported by NSF CAREER Award DMS-1848087 and NSF Award DMS-2022205. R.H. is partially supported by NSF Award IIS-1816986.
Communicated by Roman Vershynin.
Appendix A: Supporting Lemmas
Lemma A.1 is used in proofs for Sect. 5.3 and Lemma 5.3.
Lemma A.1
Suppose that the WDC and RRIC hold with \(\epsilon < 1/(16 \pi d^2)^2\) and that the noise e satisfies \(\Vert e\Vert \le a_5 2^{-d/2} \Vert x_*\Vert \). Then, for all x and all \(v_x \in \partial f(x)\),
where \(a_5\) and \(a_6\) are universal constants.
Proof
Define for convenience \(\zeta _j=\prod _{i = j}^{d - 1} \frac{\pi - {\bar{\theta }}_{i, x, x_*}}{\pi }\). We have
where the second inequality follows from the definition of \(h_x\) and Lemma 5.2, the third inequality uses \(| \zeta _j | \le 1\), and the last inequality uses the assumption \(\Vert e\Vert \le a_5 2^{-d/2} \Vert x_*\Vert \). \(\square \)
Lemma A.2 is used in proofs for Lemma 5.1.
Lemma A.2
Suppose \(a_i, b_i \in [0, \pi ]\) for \(i = 1, \ldots , k\), and \(|a_i - b_i| \le |a_j - b_j|, \forall i \ge j\). Then it holds that
Proof
We prove the claim by induction. It is easy to verify that the inequality holds if \(k = 1\). Suppose the inequality holds with \(k = t - 1\). Then
\(\square \)
Lemma A.3 is used in proofs for Lemmas 5.2, 5.3, and 5.5.
Lemma A.3
Suppose the WDC and RRIC hold with \(\epsilon \le 1 / (16 \pi d^2)^2\). Then we have
where \(q_x = \left( \prod _{i = d}^1 W_{i, +, x}\right) ^T A^T e\). In addition, if G is differentiable at x, then we have
Proof
We have
where the second inequality follows from RRIC and the last inequality follows from [12, (10)]. Therefore, \(\left| x^T q_x\right| \le \frac{2}{2^{d/2}} \Vert e\Vert \Vert x\Vert \).
Suppose G is differentiable at x. Then the local linearity of G implies that \(G(x + z) - G(x) = \left( \prod _{i = d}^1 W_{i, +, x}\right) z\) for any sufficiently small \(z \in {\mathbb {R}}^k\). By the RRIC, we have
which implies
Therefore, we obtain
Combining the above inequality with \(\prod _{i = d}^1\Vert W_{i, +, x} \Vert \le (1 + 2 \epsilon d) / 2^{d/2} \le 1.5 / 2^{d/2}\) given in [12, (10)] yields
where the second inequality follows from the assumption on \(\epsilon \). Therefore, we obtain
\(\square \)
Lemma A.4 is used in proofs for Lemma 5.5.
Lemma A.4
For all \(d \ge 2\), it holds that
and \(a_8 = \min _{d \ge 2} \rho _d > 0\).
Proof
It holds that
where \(\theta _{x, y} = \angle (x, y)\).
We recall the results in [12, (35), (36), and (49)]:
Therefore, we have for all \(0 \le i \le d - 2\),
where the second and the fifth inequalities follow from (A.2) and (A.3) respectively. Since \(\pi ^3 / (12 (i + 1)^3) \le {\check{\theta }}_{i}^3 / 12 \le {\check{\theta }}_{i} - \sin {\check{\theta }}_{i} \le {\check{\theta }}_{i}^3 / 6 \le 27 \pi ^3 / (6 (i + 3)^3)\), we have that for all \(d \ge 3\)
and
where we use \(\sum _{i = 4}^\infty \frac{1}{i^2} \le \frac{\pi ^2}{6}\) and \(\sum _{i = 1}^n i^3 = O(n^4)\). Since \(\rho _d \ge 1 - 250 / (d+1)\) and \(\rho _d > 0\) for all \(d \ge 2\), we have \(\min _{d \ge 2} \rho _d > 0\). \(\square \)
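The proof above leans on the elementary bounds \(\theta ^3/12 \le \theta - \sin \theta \le \theta ^3/6\) for \(\theta \in (0, \pi ]\). These can be spot-checked numerically; the grid of test angles below is an arbitrary choice and the check is independent of the proof.

```python
import math

# Spot check theta^3/12 <= theta - sin(theta) <= theta^3/6 on a grid in (0, pi].
# The upper bound is the alternating Taylor series; the lower bound holds on (0, pi].
thetas = [i * math.pi / 1000 for i in range(1, 1001)]
ok = all(t ** 3 / 12 <= t - math.sin(t) <= t ** 3 / 6 for t in thetas)
```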
Lemma A.5 is used in proofs for Lemma 5.5.
Lemma A.5
Fix \(0< a_9 < \frac{1}{4 d^2 \pi }\). For any \(\phi _d \in [\rho _d, 1]\), it holds that
where \(a_8\) is defined in Lemma A.4.
Proof
If \(x \in {\mathcal {B}}(\phi _d x_*, a_9 \Vert x_*\Vert )\), then we have \(0 \le {\bar{\theta }}_{0, x, x_*} \le \arcsin (a_9 / \phi _d) \le \frac{\pi a_9}{2 \phi _d}\), \(0 \le {\bar{\theta }}_{0, x, x_*} \le {\bar{\theta }}_{i, x, x_*} \le \frac{\pi a_9}{2 \phi _d}\), and \(\phi _d \Vert x_*\Vert - a_9 \Vert x_*\Vert \le \Vert x\Vert \le \phi _d \Vert x_*\Vert + a_9 \Vert x_*\Vert \). Note that \(\cos \theta \ge 1 - \frac{\theta ^2}{2}, \forall \theta \in [0, \pi ]\). We have
where the last inequality is by Lemma A.4 and \(a_9 < 1 / (4 \pi )\).
If \(x \in {\mathcal {B}}(- \phi _d x_*, a_9 \Vert x_*\Vert )\), then we have \(0 \le \pi - {{\bar{\theta }}}_{0, x, x_*} \le \arcsin (a_9 \pi ) \le \frac{\pi ^2}{2} a_9\), and \(\phi _d \Vert x_*\Vert - a_9 \Vert x_*\Vert \le \Vert x\Vert \le \phi _d \Vert x_*\Vert + a_9 \Vert x_*\Vert \). It follows that
\(\square \)
Lemma A.6 is used in proofs for Lemma 5.5.
Lemma A.6
If the WDC and RRIC hold with \(\epsilon < 1 / (16 \pi d^2)^2\), then we have
Proof
For brevity of notation, let \(\Lambda _{z} = \prod _{i = d}^1 W_{i, +, z}\). We have
where the first inequality uses the WDC, the RRIC, and [12, Lemma 8]. \(\square \)
Lemma A.7 is used in proofs for Lemma A.8.
Lemma A.7
Suppose \(W \in {\mathbb {R}}^{n \times k}\) satisfies the WDC with constant \(\epsilon \). Then for any \(x, y \in {\mathbb {R}}^k\), it holds that
where \(\theta = \angle (x, y)\).
Proof
We have
By the WDC, we have
We also have
Combining (A.4), (A.6), and \(\Vert W_{i, +, x}\Vert ^2 \le 1/2 + \epsilon \) given in [12, (9)] yields the result. \(\square \)
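The simplest consequence of the WDC invoked here, \(\Vert W_{+, x}\Vert ^2 \approx 1/2\) and more precisely \(W_{+, x}^T W_{+, x} \approx \frac{1}{2} I\) for Gaussian W with rows of variance 1/n, can be illustrated by Monte Carlo; the sizes and seed below are arbitrary, and this is an illustration rather than a proof of concentration.

```python
import numpy as np

# For W with i.i.d. N(0, 1/n) entries and W_{+,x} = diag(1{Wx > 0}) W,
# E[W_{+,x}^T W_{+,x}] = I/2; with n >> k the empirical matrix concentrates.
rng = np.random.default_rng(1)
n, k = 20000, 10
W = rng.normal(0.0, 1.0 / np.sqrt(n), (n, k))
x = rng.normal(size=k)
Wp = ((W @ x) > 0)[:, None] * W           # keep rows of W where (Wx)_i > 0
dev = np.linalg.norm(Wp.T @ Wp - 0.5 * np.eye(k), 2)   # spectral deviation
```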
Lemma A.8 is used in proofs for Lemma 5.6 and Lemma A.9.
Lemma A.8
Suppose \(x \in {\mathcal {B}}(x_*, d \sqrt{\epsilon } \Vert x_*\Vert )\), and the WDC holds with \(\epsilon < 1/ (200)^4 / d^6\). Then it holds that
Proof
In this proof, we denote \(\theta _{i, x, x_*}\) and \({\bar{\theta }}_{i, x, x_*}\) by \(\theta _i\) and \({\bar{\theta }}_{i}\) respectively. Since \(x \in {\mathcal {B}}(x_*, d \sqrt{\epsilon } \Vert x_*\Vert )\), we have
By [12, (14)], we also have \(|\theta _{i} - {\bar{\theta }}_{i}| \le 4 i \sqrt{\epsilon } \le 4 d \sqrt{\epsilon }\). It follows that
Note that \(\sqrt{1 + 2 \epsilon } \le 1 + \epsilon \le 1 + \sqrt{d\sqrt{\epsilon }}\). We have
where the second inequality follows from the fact that \((1+x)^d \le 1 + 2dx\) whenever \(0< xd < 1\). Combining the above inequality with Lemma A.7 yields
\(\square \)
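The elementary inequality \((1+x)^d \le 1 + 2dx\) for \(0< xd < 1\), invoked at the end of the proof (it follows from \((1+x)^d \le e^{xd}\) and \(e^t \le 1 + 2t\) on \([0,1]\)), can be spot-checked numerically; the grid below is an arbitrary choice.

```python
# Check (1 + x)^d <= 1 + 2*d*x on a grid with 0 < x*d < 1.
checks = []
for d in range(1, 50):
    for j in range(1, 100):
        x = j / (100 * d)                 # ensures 0 < x*d <= 0.99 < 1
        checks.append((1 + x) ** d <= 1 + 2 * d * x)
ok = all(checks)
```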
Lemma A.9 is used in proofs for Lemma 5.6.
Lemma A.9
Suppose \(x \in {\mathcal {B}}(x_*, d \sqrt{\epsilon } \Vert x_*\Vert )\), and the WDC holds with \(\epsilon < 1/ (200)^4 / d^6\). Then it holds that
Proof
For brevity of notation, let \(\Lambda _{j, k, z} = \prod _{i = j}^k W_{i, +, z}\). We have
For \(T_1\), we have
For \(T_2\), we have
where the first equation is by [12, (10)]; the second equation is by (A.6); the third equation is by Lemma A.8 and (A.8). The result follows from (A.9), (A.10), and (A.11). \(\square \)
Huang, W., Hand, P., Heckel, R. et al. A Provably Convergent Scheme for Compressive Sensing Under Random Generative Priors. J Fourier Anal Appl 27, 19 (2021). https://doi.org/10.1007/s00041-021-09830-5