Foundations of Computational Mathematics, Volume 16, Issue 4, pp 965–1029

Sharp MSE Bounds for Proximal Denoising

  • Samet Oymak
  • Babak Hassibi


Abstract

Denoising is the problem of estimating a signal \(\mathbf {x}_0\) from its noisy observations \(\mathbf {y}=\mathbf {x}_0+\mathbf {z}\). In this paper, we focus on the “structured denoising problem,” in which the signal \(\mathbf {x}_0\) possesses a certain structure and \(\mathbf {z}\) has independent normally distributed entries with mean zero and variance \(\sigma ^2\). We employ a structure-inducing convex function \(f(\cdot )\) and solve \(\min _\mathbf {x}\{\frac{1}{2}\Vert \mathbf {y}-\mathbf {x}\Vert _2^2+\sigma {\lambda }f(\mathbf {x})\}\) to estimate \(\mathbf {x}_0\), for some \(\lambda >0\). Common choices for \(f(\cdot )\) include the \(\ell _1\) norm for sparse vectors, the \(\ell _1-\ell _2\) norm for block-sparse signals, and the nuclear norm for low-rank matrices. The metric we use to evaluate the performance of an estimate \(\mathbf {x}^*\) is the normalized mean-squared error \(\text {NMSE}(\sigma )=\frac{{\mathbb {E}}\Vert \mathbf {x}^*-\mathbf {x}_0\Vert _2^2}{\sigma ^2}\). We show that the NMSE is maximized as \(\sigma \rightarrow 0\), and we find the exact worst-case NMSE, which has a simple geometric interpretation: the mean-squared distance of a standard normal vector to the \({\lambda }\)-scaled subdifferential \({\lambda }\partial f(\mathbf {x}_0)\). When \({\lambda }\) is optimally tuned to minimize the worst-case NMSE, our results can be related to the constrained denoising problem \(\min _{f(\mathbf {x})\le f(\mathbf {x}_0)}\{\Vert \mathbf {y}-\mathbf {x}\Vert _2\}\). The paper also connects these results to the generalized LASSO problem, in which one solves \(\min _{f(\mathbf {x})\le f(\mathbf {x}_0)}\{\Vert \mathbf {y}-{\mathbf {A}}\mathbf {x}\Vert _2\}\) to estimate \(\mathbf {x}_0\) from noisy linear observations \(\mathbf {y}={\mathbf {A}}\mathbf {x}_0+\mathbf {z}\). We show that certain properties of the LASSO problem are closely related to the denoising problem.
In particular, we characterize the normalized LASSO cost and show that it exhibits a “phase transition” as a function of the number of observations. We also provide an order-optimal bound for the LASSO error in terms of the mean-squared distance. Our results are significant in two ways. First, we find a simple formula for the performance of a general convex estimator. Second, we establish a connection between the denoising and linear inverse problems.
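The small-\(\sigma\) worst-case formula can be checked numerically in the \(\ell_1\) case, where the proximal estimator has the closed form of entrywise soft thresholding. The following is a minimal Monte Carlo sketch (not from the paper; the dimension, sparsity, \(\lambda\), and \(\sigma\) below are illustrative assumptions): it compares the empirical NMSE of soft-threshold denoising at small \(\sigma\) with the mean-squared distance of a standard normal vector to \(\lambda \partial \Vert \mathbf{x}_0\Vert_1\).

```python
import numpy as np

rng = np.random.default_rng(0)

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: entrywise shrinkage toward zero."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

n, k = 200, 10      # ambient dimension and sparsity (illustrative choices)
lam = 2.0           # regularization weight lambda
sigma = 1e-3        # small noise level (the worst case is sigma -> 0)
trials = 2000

x0 = np.zeros(n)
x0[:k] = 1.0        # a k-sparse signal

# Empirical NMSE of x* = argmin_x { 0.5*||y - x||^2 + sigma*lam*||x||_1 },
# whose closed form is soft thresholding of y = x0 + z at level sigma*lam.
z = sigma * rng.standard_normal((trials, n))
x_star = soft_threshold(x0 + z, sigma * lam)
nmse = np.mean(np.sum((x_star - x0) ** 2, axis=1)) / sigma ** 2

# Predicted worst-case NMSE: E dist(g, lam * subdiff ||x0||_1)^2, g ~ N(0, I).
# For the l1 norm this splits coordinatewise: on the support, the scaled
# subdifferential is the single point lam*sign(x0_i); off the support it
# is the interval [-lam, lam].
g = rng.standard_normal((trials, n))
dist2 = (np.sum((g[:, :k] - lam) ** 2, axis=1)
         + np.sum(np.maximum(np.abs(g[:, k:]) - lam, 0.0) ** 2, axis=1))
predicted = dist2.mean()

print(nmse, predicted)
```

Both quantities are Monte Carlo estimates of the same geometric object, so they agree up to sampling error; the experiment only illustrates the identity and does not depend on the particular sparsity pattern chosen.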


Keywords

Convex optimization · Proximity operator · Structured sparsity · Statistical estimation · Model fitting · Stochastic noise · Linear inverse · Generalized LASSO

Mathematics Subject Classification

90C25 · 52A41 · 90C59 · 60G15



Acknowledgments

This work was supported in part by the National Science Foundation under Grants CCF-0729203, CNS-0932428, and CIF-1018927; by the Office of Naval Research under MURI Grant N00014-08-1-0747; and by a grant from Qualcomm Inc. The authors would like to thank Michael McCoy and Joel Tropp for stimulating discussions and helpful comments. Michael McCoy pointed out Lemma 12.1 and informed us of various recent results, most importantly Theorem 7.1. S.O. would also like to thank his colleagues Kishore Jaganathan and Christos Thrampoulidis for their support, and the anonymous reviewers for their valuable suggestions.


Copyright information

© SFoCM 2015

Authors and Affiliations

  1. Department of Electrical Engineering and Computer Science, UC Berkeley, Berkeley, USA
  2. Department of Electrical Engineering, Caltech, Pasadena, USA
