Abstract
Various iterative reconstruction algorithms for inverse problems can be unfolded as neural networks. Empirically, this approach has often led to improved results, but theoretical guarantees are still scarce. While some progress on the generalization properties of neural networks has been made, great challenges remain. In this chapter, we discuss and combine these topics to present a generalization error analysis for a class of neural networks suitable for sparse reconstruction from few linear measurements. The hypothesis class considered is inspired by the classical iterative soft-thresholding algorithm (ISTA). The neural networks in this class are obtained by unfolding iterations of ISTA and learning some of the weights. Based on training samples, we aim to learn the optimal network parameters via empirical risk minimization and thereby the optimal network that reconstructs signals from their compressive linear measurements. In particular, we may learn a sparsity basis that is shared by all of the iterations/layers and thereby obtain a new approach to dictionary learning. For this class of networks, we present a generalization bound, which is based on bounding the Rademacher complexity of hypothesis classes consisting of such deep networks via Dudley's integral. Remarkably, under realistic conditions, the generalization error scales only logarithmically in the number of layers and at most linearly in the number of measurements.
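To make the construction behind the hypothesis class concrete, the following minimal LaTeX sketch states the classical ISTA iteration that is unfolded into network layers. The iteration itself is the standard formulation from the sparse recovery literature; the remarks on which quantities become learnable (e.g., a dictionary shared across layers) are an assumption based on the abstract and are not meant to reproduce the chapter's exact parametrization.

```latex
\documentclass{article}
\usepackage{amsmath}
\begin{document}
% Classical ISTA iteration for  \min_x \tfrac12\|Ax-y\|_2^2 + \lambda\|x\|_1
% (standard formulation; which quantities are made learnable in the unfolded
% network, e.g. a shared sparsity basis, is only indicated here as an assumption).
\[
  x^{k+1} = S_{\gamma\lambda}\!\left(x^{k} - \gamma A^{\top}\!\left(A x^{k} - y\right)\right),
  \qquad
  S_{\theta}(z)_i = \operatorname{sign}(z_i)\,\max\{|z_i|-\theta,\,0\},
\]
% where $\gamma$ is a step size and $S_{\theta}$ acts componentwise (soft-thresholding).
% Unfolding $L$ such iterations gives an $L$-layer network; replacing selected fixed
% quantities by learnable parameters and training them via empirical risk minimization
% yields a hypothesis class of the kind analyzed in the chapter.
\end{document}
```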
Acknowledgements
The authors would like to thank Sebastian Lubjuhn for proofreading an earlier version of this paper and giving valuable suggestions for improvement. The third author acknowledges funding from the Deutsche Forschungsgemeinschaft (DFG) through the project Structured Compressive Sensing via Neural Network Learning (SCoSNeL, MA 1184/36-1) within the SPP 1798 Compressed Sensing in Information Processing (CoSIP).
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Behboodi, A., Rauhut, H., Schnoor, E. (2022). Compressive Sensing and Neural Networks from a Statistical Learning Perspective. In: Kutyniok, G., Rauhut, H., Kunsch, R.J. (eds) Compressed Sensing in Information Processing. Applied and Numerical Harmonic Analysis. Birkhäuser, Cham. https://doi.org/10.1007/978-3-031-09745-4_8
DOI: https://doi.org/10.1007/978-3-031-09745-4_8
Publisher Name: Birkhäuser, Cham
Print ISBN: 978-3-031-09744-7
Online ISBN: 978-3-031-09745-4
eBook Packages: Mathematics and Statistics (R0)