Abstract
Iteratively re-weighted least squares (IRLS) is a method for solving minimization problems involving non-quadratic cost functions, possibly non-convex and non-smooth, that can be described as the infimum over a family of quadratic functions. This characterization suggests an algorithmic scheme that solves a sequence of quadratic problems, each of which can be tackled efficiently by tools of numerical linear algebra. Its general scope and usually simple implementation, transforming the initial non-convex and non-smooth minimization problem into a more familiar and easily solvable quadratic optimization problem, make it a versatile algorithm. However, despite its simplicity, versatility, and elegant analysis, the complexity of IRLS depends strongly on how the solution of the successive quadratic optimizations is addressed. For the important special case of compressed sensing and sparse recovery problems in signal processing, we investigate theoretically and numerically how accurately the quadratic problems need to be solved by means of the conjugate gradient (CG) method in each iteration in order to guarantee convergence. The use of the CG method may significantly speed up the numerical solution of the quadratic subproblems, in particular when fast matrix-vector multiplication (exploiting, for instance, the FFT) is available for the matrix involved. In addition, we study convergence rates. Our modified IRLS method outperforms state-of-the-art first-order methods such as iterative hard thresholding (IHT) or the fast iterative soft-thresholding algorithm (FISTA) in many situations, especially in large dimensions. Moreover, IRLS is often able to recover sparse vectors from fewer measurements than IHT and FISTA require.
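To illustrate the idea described above, the following is a minimal sketch of an IRLS loop for \(\ell_1\)-constrained sparse recovery in which each weighted least-squares subproblem is solved by a hand-rolled conjugate gradient routine. The function names (`irls_cg`, `cg_solve`), the \(\varepsilon\)-update rule, and all parameter choices are illustrative assumptions in the spirit of standard IRLS schemes, not the exact algorithm or stopping criteria analyzed in the paper.

```python
import numpy as np

def cg_solve(A_mv, b, tol=1e-12, maxiter=200):
    """Plain conjugate gradient for an SPD system given only a matvec A_mv."""
    x = np.zeros_like(b)
    r = b.copy()          # residual b - A x (x = 0 initially)
    p = r.copy()          # search direction
    rs = float(r @ r)
    for _ in range(maxiter):
        if np.sqrt(rs) < tol:
            break
        Ap = A_mv(p)
        alpha = rs / float(p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = float(r @ r)
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

def irls_cg(Phi, y, k, n_iter=100, eps=1.0):
    """IRLS sketch for min ||x||_1 s.t. Phi x = y (k = assumed sparsity).

    Each iterate solves a weighted least-squares problem
    x = D Phi^T (Phi D Phi^T)^{-1} y with D = diag(sqrt(x_j^2 + eps^2)),
    and the inner SPD system is handled by CG.
    """
    m, N = Phi.shape
    # Minimum-norm least-squares initialization: x = Phi^T (Phi Phi^T)^{-1} y.
    x = Phi.T @ cg_solve(lambda z: Phi @ (Phi.T @ z), y)
    for _ in range(n_iter):
        d = np.sqrt(x**2 + eps**2)                     # inverse weights
        z = cg_solve(lambda u: Phi @ (d * (Phi.T @ u)), y)
        x = d * (Phi.T @ z)
        r = np.sort(np.abs(x))[::-1]                   # nonincreasing rearrangement
        eps = min(eps, r[k] / N)                       # illustrative eps-update
        if eps < 1e-8:                                 # weights degenerate: stop
            break
    return x
```

In this sketch the matrix `Phi` is only ever accessed through matrix-vector products, which is what makes a fast multiplication routine (e.g. an FFT-based one) directly usable in place of the dense products.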
References
Beck, A., Teboulle, M.: A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci. 2(1), 183–202 (2009). doi:10.1137/080716542
Bickel, P., Ritov, Y., Tsybakov, A.: Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37(4), 1705–1732 (2009)
Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009). doi:10.1016/j.acha.2009.04.002
Bredies, K., Lorenz, D.A.: Minimization of non-smooth, non-convex functionals by iterative thresholding. J. Optim. Theory Appl. 165, 78–112 (2015)
Candès, E.J., Tao, T., Romberg, J.: Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52(2), 489–509 (2006)
Candès, E.J., Plan, Y.: Near-ideal model selection by \(\ell _1\) minimization. Ann. Statist. 37(5A), 2145–2177 (2009). doi:10.1214/08-AOS653
Candès, E.J., Tao, T.: Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory 52(12), 5406–5425 (2006)
Chafai, D., Guédon, O., Lecué, G., Pajor, A.: Interactions between compressed sensing, random matrices and high dimensional geometry. Soc. Math. France (2012)
Chambolle, A., Lions, P.L.: Image recovery via total variation minimization and related problems. Numer. Math. 76(2), 167–188 (1997). doi:10.1007/s002110050258
Chartrand, R.: Exact reconstruction of sparse signals via nonconvex minimization. IEEE Sign. Process. Lett. 14(10), 707–710 (2007). doi:10.1109/LSP.2007.898300
Chartrand, R., Staneva, V.: Restricted isometry properties and nonconvex compressive sensing. Inverse Prob. 24(3), 035020 (2008). doi: 10.1088/0266-5611/24/3/035020
Chartrand, R., Yin, W.: Iteratively reweighted algorithms for compressive sensing. In: IEEE International Conference on Acoustics, Speech and Signal Processing, 2008. ICASSP 2008, pp. 3869–3872 (2008). doi: 10.1109/ICASSP.2008.4518498
Chen, S.S., Donoho, D.L., Saunders, M.A.: Atomic decomposition by Basis Pursuit. SIAM J. Sci. Comput. 20(1), 33–61 (1999)
Cline, A.K.: Rate of convergence of Lawson’s algorithm. Math. Comp. 26, 167–176 (1972)
Cohen, A., Dahmen, W., DeVore, R.A.: Compressed sensing and best \(k\)-term approximation. J. Am. Math. Soc. 22(1), 211–231 (2009)
Daubechies, I., DeVore, R., Fornasier, M., Güntürk, C.: Iteratively re-weighted least squares minimization for sparse recovery. Comm. Pure Appl. Math. 63(1), 1–38 (2010)
Dirksen, S., Lecué, G., Rauhut, H.: On the gap between RIP-properties and sparse recovery conditions. Preprint arXiv:1504.05073 (2015)
Donoho, D.L.: Compressed sensing. IEEE Trans. Inform. Theory 52(4), 1289–1306 (2006)
Fornasier, M., Rauhut, H., Ward, R.: Low-rank matrix recovery via iteratively reweighted least squares minimization. SIAM J. Optim. 21(4), 1614–1640 (2011). doi:10.1137/100811404
Foucart, S., Rauhut, H.: A Mathematical Introduction to Compressive Sensing. Springer, New York (2013). doi: 10.1007/978-0-8176-4948-7
Gorodnitsky, I.F., Rao, B.D.: Sparse signal reconstruction from limited data using FOCUSS: a recursive weighted norm minimization algorithm. IEEE Trans. Sign. Process. 45(3), 600–616 (1997)
Gribonval, R., Nielsen, M.: Sparse representations in unions of bases. IEEE Trans. Inform. Theory 49(12), 3320–3325 (2003)
Han, W., Jensen, S., Shimansky, I.: The Kačanov method for some nonlinear problems. Appl. Numer. Math. 24(1), 57–79 (1997)
Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stan. 49(6), 409–436 (1952)
Holland, P.W., Welsch, R.E.: Robust regression using iteratively reweighted least-squares. Commun. Stat. 6(9), 813–827 (1977)
Ito, K., Kunisch, K.: A variational approach to sparsity optimization based on Lagrange multiplier theory. Inverse Prob. 30(1), 015001 (2014). doi: 10.1088/0266-5611/30/1/015001
Jacobs, D.A.H.: A generalization of the conjugate-gradient method to solve complex systems. IMA J. Num. Anal. 6(4), 447–452 (1986)
Kabanava, M., Rauhut, H.: Analysis \(\ell _1\)-recovery with frames and Gaussian measurements. Acta Appl. Math. (to appear)
King, J.T.: A minimal error conjugate gradient method for ill-posed problems. J. Optim. Theory Appl. 60, 297–304 (1989). doi:10.1007/BF00940009
Krahmer, F., Mendelson, S., Rauhut, H.: Suprema of chaos processes and the restricted isometry property. Commun. Pure Appl. Math. 67(11), 1877–1904 (2014)
Lai, M.J., Xu, Y., Yin, W.: Improved iteratively reweighted least squares for unconstrained smoothed \(\ell _q\) minimization. SIAM J. Num. Anal. 51(2), 927–957 (2013)
Lawson, C.L.: Contributions to the Theory of Linear Least Maximum Approximation. Ph.D. Thesis. University of California, Los Angeles (1961)
Lecué, G., Mendelson, S.: Sparse recovery under weak moment assumptions. J. Eur. Math. Soc. (to appear)
Nocedal, J., Wright, S.: Conjugate Gradient Methods. Springer Series in Operations Research and Financial Engineering. pp. 101–134. Springer (2006)
Ochs, P., Dosovitskiy, A., Brox, T., Pock, T.: On iteratively reweighted algorithms for nonsmooth nonconvex optimization in computer vision. SIAM J. Imaging Sci. 8(1), 331–372 (2015). doi:10.1137/140971518
Osborne, M.R.: Wiley Series in Probability and Mathematical Statistics: Applied Probability and Statistics. Finite algorithms in optimization and data analysis. Wiley, Chichester (1985)
Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics. Texts in Applied Mathematics Series. Springer (2000). http://books.google.de/books?id=YVpyyi1M7vUC
Ramlau, R., Zarzer, C.A.: On the minimization of a Tikhonov functional with a non-convex sparsity constraint. Electron. Trans. Numer. Anal. 39, 476–507 (2012)
Rauhut, H.: Compressive sensing and structured random matrices. In: Fornasier, M (ed.) Theoretical foundations and numerical methods for sparse recovery, Radon Series Comp. Appl. Math., vol. 9, pp. 1–92. deGruyter (2010)
Rudelson, M., Vershynin, R.: On sparse reconstruction from Fourier and Gaussian measurements. Comm. Pure Appl. Math. 61, 1025–1045 (2008)
Rudin, L., Osher, S., Fatemi, E.: Nonlinear total variation based noise removal algorithms. Phys. D 60(1–4), 259–268 (1992)
Vogel, C.R., Oman, M.E.: Fast, robust total variation-based reconstruction of noisy, blurred images. IEEE Trans. Image Process. 7(6), 813–824 (1998). doi:10.1109/83.679423
Voronin, S.: Regularization of linear systems with sparsity constraints with applications to large scale inverse problems. Ph.D. thesis, Applied and Computational Mathematics Department, Princeton University (2012)
Voronin, S., Daubechies, I.: An Iteratively Reweighted Least Squares Algorithm for Sparse Regularization. arXiv:1511.08970 [math] (2015)
Zarzer, C.A.: On Tikhonov regularization with non-convex sparsity constraints. Inverse Prob. 25(2), 025006 (2009). doi: 10.1088/0266-5611/25/2/025006
Acknowledgments
Massimo Fornasier acknowledges the support of the ERC-Starting Grant HDSPCONTR “High-Dimensional Sparse Optimal Control” and the DFG Project “Optimal Adaptive Numerical Methods for p-Poisson Elliptic equations”. Steffen Peter acknowledges the support of the Project “SparsEO: Exploiting the Sparsity in Remote Sensing for Earth Observation” funded by Munich Aerospace. Holger Rauhut would like to thank the European Research Council (ERC) for support through the Starting Grant StG 258926 SPALORA (Sparse and Low Rank Recovery) and the Hausdorff Center for Mathematics at the University of Bonn where this project has started.
Appendix: Proof of Lemma 10
“\(\Rightarrow \)” (in the case \(0 < \tau \leqslant 1 \))
Let \(x = x^{{\varepsilon },1}\) or \(x\in \mathcal {X}_{{\varepsilon },\tau }(y)\), and let \(\eta \in \mathcal {N}_{\varPhi }\) be arbitrary. Consider the function
\[ G_{{\varepsilon },\tau }(t) \mathrel {\mathop :}= f_{{\varepsilon },\tau }(x + t\eta ) - f_{{\varepsilon },\tau }(x), \qquad t \in \mathbb {R}, \]
with its first derivative
\[ G_{{\varepsilon },\tau }'(t) = \tau \sum _{j} (x_j + t\eta _j)\left[ (x_j + t\eta _j)^{2} + {\varepsilon }^{2}\right] ^{\tau /2 - 1} \eta _j. \]
Now \(G_{{\varepsilon },\tau }(0) = 0\), and from the minimization property of \(f_{{\varepsilon },\tau }(x)\) we have \(G_{{\varepsilon },\tau }(t) \ge 0\) for all \(t\), so that \(t = 0\) is a minimum of the differentiable function \(G_{{\varepsilon },\tau }\). Therefore,
\[ 0 = G_{{\varepsilon },\tau }'(0) = \tau \sum _{j} x_j \left[ x_j^{2} + {\varepsilon }^{2}\right] ^{\tau /2 - 1} \eta _j = \tau \left\langle x, \eta \right\rangle _{\hat{w}(x,{\varepsilon },\tau )}. \]
“\(\Leftarrow \)” (only in the case \(\tau =1\))
Now let \(x\in \mathcal {F}_{\varPhi }(y)\) and \(\left\langle x,\eta \right\rangle _{\hat{w}(x,{\varepsilon },1)} = 0\) for all \(\eta \in \mathcal {N}_{\varPhi }\). We want to show that \(x\) is the minimizer of \(f_{{\varepsilon },1}\) in \(\mathcal {F}_{\varPhi }(y)\). Consider the convex univariate function \(g(u)\mathrel {\mathop :}=[u^{2} + {\varepsilon }^{2}]^{1/2}\). For any point \(u_{0}\) we have from convexity that
\[ g(u) \ge g(u_0) + g'(u_0)\,(u - u_0), \]
because the right-hand side is the linear function which is tangent to \(g\) at \(u_{0}\). It follows that for every point \(v\in \mathcal {F}_{\varPhi }(y)\) we have
\[ f_{{\varepsilon },1}(v) = \sum _j g(v_j) \ge \sum _j g(x_j) + \sum _j \frac{x_j}{\left[ x_j^{2} + {\varepsilon }^{2}\right] ^{1/2}}\,(v_j - x_j) = f_{{\varepsilon },1}(x) + \left\langle v - x, x\right\rangle _{\hat{w}(x,{\varepsilon },1)} = f_{{\varepsilon },1}(x), \]
where we have used the orthogonality condition and the fact that \((v - x) \in \mathcal {N}_{\varPhi }\). Since \(v\) was chosen arbitrarily, \(x = x^{{\varepsilon },1}\) as claimed.
About this article
Cite this article
Fornasier, M., Peter, S., Rauhut, H.: Conjugate gradient acceleration of iteratively re-weighted least squares methods. Comput Optim Appl 65, 205–259 (2016). https://doi.org/10.1007/s10589-016-9839-8