
Conjugate gradient acceleration of iteratively re-weighted least squares methods


Abstract

Iteratively re-weighted least squares (IRLS) is a method for solving minimization problems involving non-quadratic cost functions, possibly non-convex and non-smooth, which can nevertheless be described as the infimum over a family of quadratic functions. This transformation suggests an algorithmic scheme that solves a sequence of quadratic problems, each of which can be tackled efficiently by tools of numerical linear algebra. Its general scope and usually simple implementation, which transforms the initial non-convex and non-smooth minimization problem into a more familiar and easily solvable quadratic optimization problem, make IRLS a versatile algorithm. However, despite its simplicity, versatility, and elegant analysis, the complexity of IRLS depends strongly on how the successive quadratic optimizations are solved. For the important special case of compressed sensing and sparse recovery problems in signal processing, we investigate theoretically and numerically how accurately the quadratic problems need to be solved by means of the conjugate gradient (CG) method in each iteration in order to guarantee convergence. The use of the CG method may significantly speed up the numerical solution of the quadratic subproblems, in particular when fast matrix-vector multiplication (exploiting, for instance, the FFT) is available for the matrix involved. In addition, we study convergence rates. Our modified IRLS method outperforms state-of-the-art first-order methods such as Iterative Hard Thresholding (IHT) and the Fast Iterative Soft-Thresholding Algorithm (FISTA) in many situations, especially in large dimensions. Moreover, IRLS is often able to recover sparse vectors from fewer measurements than required by IHT and FISTA.
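
To make the scheme described above concrete, here is a minimal Python sketch of an IRLS loop for the constrained problem \(\min _{x} \sum _{i}(x_{i}^{2} + \varepsilon ^{2})^{\tau /2}\) subject to \(\varPhi x = y\), in which each weighted least-squares subproblem is solved by a plain conjugate gradient method applied matrix-free to \(\varPhi D \varPhi ^{T}\). It is an illustration under simplifying assumptions, not the authors' implementation: the function names (cg, irls_cg), the fixed geometric \(\varepsilon \) decrease, the stopping rules, and all parameter values are placeholders.

```python
import numpy as np

def cg(apply_A, b, x0, tol=1e-10, maxiter=500):
    """Plain conjugate gradient for a symmetric positive definite operator
    that is available only through matrix-vector products."""
    x = x0.copy()
    r = b - apply_A(x)                 # initial residual
    p = r.copy()                       # initial search direction
    rs = r @ r
    bnorm = np.linalg.norm(b)
    if np.sqrt(rs) <= tol * bnorm:     # warm start may already be accurate
        return x
    for _ in range(maxiter):
        Ap = apply_A(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * bnorm:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def irls_cg(Phi, y, tau=1.0, eps=1.0, n_iter=40):
    """Sketch of IRLS for min_x sum_i (x_i^2 + eps^2)^(tau/2) s.t. Phi x = y.
    Each subproblem min_x sum_i w_i x_i^2 s.t. Phi x = y is solved through
    the system (Phi D Phi^T) z = y with D = diag(1/w) and x = D Phi^T z,
    using CG (warm-started across outer iterations) as the inner solver."""
    m, _ = Phi.shape
    # minimal-norm feasible starting point; assumes Phi has full row rank
    x = Phi.T @ np.linalg.solve(Phi @ Phi.T, y)
    z = np.zeros(m)
    for _ in range(n_iter):
        w = (x**2 + eps**2) ** (tau / 2.0 - 1.0)            # IRLS weights
        d = 1.0 / w
        apply_A = lambda v, d=d: Phi @ (d * (Phi.T @ v))    # matrix-free Phi D Phi^T
        z = cg(apply_A, y, z)                               # warm-started inner solve
        x = d * (Phi.T @ z)
        eps *= 0.9   # illustrative epsilon decrease; the paper couples it to the iterates
    return x
```

Warm-starting the inner CG with the previous dual variable and applying \(\varPhi D \varPhi ^{T}\) only through matrix-vector products is what allows a fast transform such as the FFT to be exploited; how accurately each of these inner solves must be carried out is precisely the question analyzed in the paper.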


Acknowledgments

Massimo Fornasier acknowledges the support of the ERC Starting Grant HDSPCONTR “High-Dimensional Sparse Optimal Control” and the DFG Project “Optimal Adaptive Numerical Methods for p-Poisson Elliptic equations”. Steffen Peter acknowledges the support of the Project “SparsEO: Exploiting the Sparsity in Remote Sensing for Earth Observation” funded by Munich Aerospace. Holger Rauhut would like to thank the European Research Council (ERC) for support through the Starting Grant StG 258926 SPALORA (Sparse and Low Rank Recovery) and the Hausdorff Center for Mathematics at the University of Bonn, where this project started.

Corresponding author

Correspondence to Steffen Peter.

Appendix: Proof of Lemma 10
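
For readability of this excerpt, the two implications proved below can be summarized as follows; this summary is reconstructed from the proof itself, and the statement of Lemma 10 in the main text may carry additional context. For \(0 < \tau \leqslant 1\),

$$\begin{aligned} x = x^{{\varepsilon },1} \text { or } x\in \mathcal {X}_{{\varepsilon },\tau }(y) \quad \Longrightarrow \quad \left\langle x,\eta \right\rangle _{\hat{w}(x,{\varepsilon },\tau )} = 0 \text { for all } \eta \in \mathcal {N}_{\varPhi }, \end{aligned}$$

and, conversely, for \(\tau = 1\),

$$\begin{aligned} x\in \mathcal {F}_{\varPhi }(y) \text { and } \left\langle x,\eta \right\rangle _{\hat{w}(x,{\varepsilon },1)} = 0 \text { for all } \eta \in \mathcal {N}_{\varPhi } \quad \Longrightarrow \quad x = x^{{\varepsilon },1}. \end{aligned}$$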

“\(\Rightarrow \)” (in the case \(0 < \tau \leqslant 1\))

Let \(x = x^{{\varepsilon },1}\) or \(x\in \mathcal {X}_{{\varepsilon },\tau }(y)\), and let \(\eta \in \mathcal {N}_{\varPhi }\) be arbitrary. Recalling that \(f_{{\varepsilon },\tau }(z) = \sum _{i=1}^{N}\left( z_{i}^{2} + {\varepsilon }^{2}\right) ^{\tau /2}\), consider the function

$$\begin{aligned} G_{{\varepsilon },\tau }(t) \mathrel {\mathop :}=f_{{\varepsilon },\tau }\left( x + t\eta \right) - f_{{\varepsilon },\tau }\left( x \right) \end{aligned}$$

with its first derivative

$$\begin{aligned} G^{\prime }_{{\varepsilon },\tau }(t) = \tau \sum \limits _{i=1}^{N}\frac{x_{i}\eta _{i} +t\eta _{i}^2}{\left[ |x_{i} + t\eta _{i}|^{2} + {\varepsilon }^{2}\right] ^{\frac{2-\tau }{2}}}. \end{aligned}$$

Now \(G_{{\varepsilon },\tau }(0) = 0\), and the minimization property of \(x\) yields \(G_{{\varepsilon },\tau }(t) \ge 0\) for all \(t\), since \(x + t\eta \in \mathcal {F}_{\varPhi }(y)\). Hence \(t = 0\) minimizes \(G_{{\varepsilon },\tau }\) and, since \(\tau > 0\),

$$\begin{aligned} 0 = \tau ^{-1}G^{\prime }_{{\varepsilon },\tau }(0) = \sum \limits _{i=1}^{N}\frac{x_{i}\eta _{i}}{\left[ x_{i}^{2} + {\varepsilon }^{2}\right] ^{\frac{2-\tau }{2}}} = \left\langle x,\eta \right\rangle _{\hat{w}(x,{\varepsilon },\tau )}. \end{aligned}$$

“\(\Leftarrow \)” (only in the case \(\tau = 1\))

Now let \(x\in \mathcal {F}_{\varPhi }(y)\) with \(\left\langle x,\eta \right\rangle _{\hat{w}(x,{\varepsilon },1)} = 0\) for all \(\eta \in \mathcal {N}_{\varPhi }\). We want to show that \(x\) is the minimizer of \(f_{{\varepsilon },1}\) in \(\mathcal {F}_{\varPhi }(y)\). Consider the convex univariate function \(g(u)\mathrel {\mathop :}=[u^{2} + {\varepsilon }^{2}]^{1/2}\). For any point \(u_{0}\), convexity gives

$$\begin{aligned} {[}u^{2} + {\varepsilon }^{2}{]}^{1/2} \geqslant [u_{0}^{2} + {\varepsilon }^{2}]^{1/2} + {[}u_{0}^{2} + {\varepsilon }^{2}{]}^{-1/2}u_{0}(u-u_{0}) \end{aligned}$$

because the right-hand side is the linear function tangent to \(g\) at \(u_{0}\). It follows that for every point \(v\in \mathcal {F}_{\varPhi }(y)\) we have

$$\begin{aligned} f_{{\varepsilon },1}(v)\geqslant & {} f_{{\varepsilon },1}(x) + \sum \limits _{i=1}^{N}{[x_{i}^{2} + {\varepsilon }^{2}]^{-1/2}x_{i}(v_{i} - x_{i})}\\= & {} f_{{\varepsilon },1}(x) + \left\langle x, v-x\right\rangle _{\hat{w}(x,{\varepsilon },1)} = f_{{\varepsilon },1}(x), \end{aligned}$$

where we have used the orthogonality condition and the fact that \((v - x) \in \mathcal {N}_{\varPhi }\). Since v was chosen arbitrarily, \(x = x^{{\varepsilon },1}\) as claimed.
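
As a quick numerical sanity check of the characterization just proved (for \(\tau = 1\)), the following Python sketch minimizes \(f_{{\varepsilon },1}(x) = \sum _{i}(x_{i}^{2} + {\varepsilon }^{2})^{1/2}\) over \(\mathcal {F}_{\varPhi }(y)\) with a simple projected gradient descent, used here only as a generic stand-in solver rather than the paper's method, and then verifies that \(\left\langle x,\eta \right\rangle _{\hat{w}(x,{\varepsilon },1)}\) is numerically negligible for \(\eta \in \mathcal {N}_{\varPhi }\). All names, dimensions, and parameter values are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
m, N, eps = 20, 60, 0.1
Phi = rng.standard_normal((m, N))
y = Phi @ rng.standard_normal(N)

# Orthogonal projector onto the null space N_Phi and a feasible starting point
# (assumes Phi has full row rank).
P = np.eye(N) - Phi.T @ np.linalg.solve(Phi @ Phi.T, Phi)
x = Phi.T @ np.linalg.solve(Phi @ Phi.T, y)      # minimal-norm solution, feasible

# Projected gradient descent for f_{eps,1}(x) = sum_i (x_i^2 + eps^2)^{1/2}
# over {x : Phi x = y}; the gradient of f_{eps,1} is (1/eps)-Lipschitz,
# so the constant step size eps is admissible.
for _ in range(20000):
    grad = x / np.sqrt(x**2 + eps**2)
    x -= eps * (P @ grad)

# Weighted orthogonality <x, eta>_{w_hat(x,eps,1)} with w_hat_i = (x_i^2 + eps^2)^{-1/2}
# should vanish for every eta in the null space of Phi.
w_hat = 1.0 / np.sqrt(x**2 + eps**2)
eta = P @ rng.standard_normal(N)
print(abs(np.dot(w_hat * x, eta)))   # expected: numerically close to zero
```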


Cite this article

Fornasier, M., Peter, S., Rauhut, H.: Conjugate gradient acceleration of iteratively re-weighted least squares methods. Comput Optim Appl 65, 205–259 (2016). https://doi.org/10.1007/s10589-016-9839-8
