A simple homotopy proximal mapping algorithm for compressive sensing

  • Tianbao Yang
  • Lijun Zhang
  • Rong Jin
  • Shenghuo Zhu
  • Zhi-Hua Zhou


Abstract

In this paper, we present novel yet simple homotopy proximal mapping algorithms for reconstructing a sparse signal from (noisy) linear measurements of the signal, or for learning a sparse linear model from observed data; the former task is well known in the field of compressive sensing, and the latter is known as model selection in statistics and machine learning. The algorithms apply a simple proximal mapping of the \(\ell _1\) norm at each iteration and gradually reduce the regularization parameter of the \(\ell _1\) norm. We prove global linear convergence of the proposed homotopy proximal mapping (HPM) algorithms for recovering the sparse signal under three settings: (i) sparse signal recovery under noiseless measurements, (ii) sparse signal recovery under noisy measurements, and (iii) nearly-sparse signal recovery under sub-Gaussian noisy measurements. In particular, we show that when the measurement matrix satisfies the restricted isometry property (RIP), one of the proposed algorithms, with a parameter set appropriately based on the RIP constants, converges linearly to the optimal solution up to the noise level. In addition, in setting (iii), a practical variant of the proposed algorithms does not rely on the RIP constants, and our results for sparse signal recovery improve on previous results in the sense that our recovery error bound is smaller. Furthermore, our analysis explicitly shows that more observations lead not only to more accurate recovery but also to faster convergence. Finally, our empirical studies provide further support for the proposed algorithms and verify the theoretical results.


Keywords: Compressive sensing · Sparse signal recovery · Proximal mapping · Linear convergence
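The abstract describes the core mechanism: repeated proximal mappings of the \(\ell _1\) norm (soft-thresholding) while the regularization parameter is gradually reduced. The sketch below is a generic illustration of that homotopy idea, not the paper's exact algorithm; the stage count, decay factor, and step-size choice are illustrative assumptions.

```python
import numpy as np

def soft_threshold(z, lam):
    # Proximal mapping of lam * ||x||_1: component-wise shrinkage toward zero.
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)

def homotopy_proximal_mapping(A, y, lam0=1.0, decay=0.7, n_stages=30, inner_iters=50):
    """Illustrative homotopy scheme: proximal-gradient steps on
    0.5 * ||A x - y||^2 + lam * ||x||_1, warm-started across stages
    while lam is decreased geometrically. All parameters are
    hypothetical defaults, not values from the paper."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L, L = sigma_max(A)^2
    lam = lam0
    for _ in range(n_stages):
        for _ in range(inner_iters):
            grad = A.T @ (A @ x - y)
            x = soft_threshold(x - step * grad, step * lam)
        lam *= decay  # reduce the l1 regularization along the homotopy path
    return x

# Usage: recover a 5-sparse signal from noiseless Gaussian measurements.
rng = np.random.default_rng(0)
n, m, s = 200, 80, 5
x_true = np.zeros(n)
x_true[rng.choice(n, s, replace=False)] = rng.standard_normal(s)
A = rng.standard_normal((m, n)) / np.sqrt(m)
y = A @ x_true
x_hat = homotopy_proximal_mapping(A, y)
print(np.linalg.norm(x_hat - x_true))
```

Warm-starting each stage from the previous solution is what distinguishes the homotopy path from solving a single fixed-\(\lambda\) problem; as \(\lambda\) shrinks, the iterate tracks increasingly accurate approximations of the sparse signal.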



Acknowledgements

We thank all reviewers for their constructive comments. Z.-H. Zhou is partially supported by the National Key R&D Program of China (2018YFB1004300) and NSFC (61751306). L. Zhang is partially supported by JiangsuSF (BK20160658) and YESS (2017QNRC001). T. Yang is partially supported by NSF (1545995).



Copyright information

© The Author(s) 2018

Authors and Affiliations

  1. Department of Computer Science, University of Iowa, Iowa City, USA
  2. National Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, China
  3. Alibaba Group, Seattle, USA
