
Matrix Recipes for Hard Thresholding Methods

Journal of Mathematical Imaging and Vision


In this paper, we present and analyze a new set of low-rank recovery algorithms for linear inverse problems within the class of hard thresholding methods. We provide strategies for assembling these algorithms from basic ingredients into different configurations that trade off complexity against accuracy. Moreover, we study acceleration schemes via memory-based techniques and randomized, ϵ-approximate matrix projections to decrease the computational cost of the recovery process. For most of the configurations, we present theoretical analysis that guarantees convergence under mild problem conditions. Simulation results demonstrate notable performance improvements over state-of-the-art algorithms, both in reconstruction accuracy and in computational complexity.
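To make the recipe concrete, the core recursion behind this class of methods — a gradient step on the data-fidelity term followed by projection onto the set of rank-k matrices via a truncated SVD, with a normalized step size in the spirit of [37] — can be sketched as follows. This is a minimal illustration on a hypothetical dense Gaussian sensing operator, not the exact Matrix ALPS configurations analyzed in the paper; all function and variable names are illustrative.

```python
import numpy as np

def hard_threshold(X, k):
    """P_k: best rank-k approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

def iht_low_rank(A, y, m, n, k, iters=500):
    """Basic hard-thresholding iteration: X <- P_k(X + mu * A^*(y - A X))."""
    X = np.zeros((m, n))
    for _ in range(iters):
        grad = (A.T @ (y - A @ X.ravel())).reshape(m, n)
        # Normalized step size computed on the rank-k part of the gradient.
        Gk = hard_threshold(grad, k)
        denom = np.linalg.norm(A @ Gk.ravel()) ** 2
        mu = np.linalg.norm(Gk) ** 2 / denom if denom > 0 else 1.0
        X = hard_threshold(X + mu * grad, k)
    return X

# Toy instance: recover a rank-1, unit-norm 10x10 matrix from 85 measurements.
rng = np.random.default_rng(0)
m, n, k, p = 10, 10, 1, 85
X_true = np.outer(rng.standard_normal(m), rng.standard_normal(n))
X_true /= np.linalg.norm(X_true)
A = rng.standard_normal((p, m * n)) / np.sqrt(p)  # near-isometry on low-rank matrices
y = A @ X_true.ravel()

X_hat = iht_low_rank(A, y, m, n, k)
rel_err = np.linalg.norm(X_hat - X_true) / np.linalg.norm(X_true)
```

With this sampling regime (85 generic measurements for roughly 19 degrees of freedom), the iteration sits well inside the empirical success region of such methods and converges to the planted matrix.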





  1. The distinction between \(\mathcal{P}_{\mathcal{S}} \) and \(\mathcal{P}_{k} \) for a positive integer k is apparent from the context.

  2. In the case of multiple identical singular values, ties are broken lexicographically.

  3. From a different perspective and for a different problem case, similar ideas have been used in [18].

  4. We can move between these two cases by a simple transpose of the problem.

  5. While such an operation has \(O(\max\lbrace m^{2}n, mn^{2}\rbrace)\) complexity, each application of \(\mathcal{P}_{\mathcal{S}} \boldsymbol{X}\) requires three matrix-matrix multiplications. To reduce this computational cost, we relax this operation in Sect. 10, where in practice we use only \(\mathcal{P}_{\mathcal{U}}\), which needs one matrix-matrix multiplication.
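The relaxation in the footnote can be illustrated as follows (variable names are hypothetical): when \(\mathcal{U}\) is the subspace spanned by the top-k left singular vectors of X itself, projecting the columns alone already reproduces the best rank-k approximation, at the cost of a single projection instead of the full two-sided operation.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 5))
k = 2

# Full best rank-k approximation P_k(X) via truncated SVD.
U, s, Vt = np.linalg.svd(X, full_matrices=False)
P_k_X = (U[:, :k] * s[:k]) @ Vt[:k, :]

# Relaxed projection P_U X = U_k U_k^T X onto the top-k column subspace.
U_k = U[:, :k]
P_U_X = U_k @ (U_k.T @ X)
```

Since \(U_k^T X = \varSigma_k V_k^T\), the two coincide here; in the algorithms, \(\mathcal{P}_{\mathcal{U}}\) is applied with a subspace computed earlier, trading exactness for speed.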


References

  1. Baraniuk, R.G., Cevher, V., Wakin, M.B.: Low-dimensional models for dimensionality reduction and signal recovery: a geometric perspective. Proc. IEEE 98(6), 959–971 (2010)

  2. Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717–772 (2009)

  3. Meka, R., Jain, P., Dhillon, I.S.: Guaranteed rank minimization via singular value projection. In: NIPS Workshop on Discrete Optimization in Machine Learning (2010)

  4. Tyagi, H., Cevher, V.: Learning ridge functions with randomized sampling in high dimensions. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 2025–2028. IEEE Press, New York (2012)

  5. Tyagi, H., Cevher, V.: Learning non-parametric basis independent models from point queries via low-rank methods. Technical report, EPFL (2012)

  6. Liu, Y.K.: Universal low-rank matrix recovery from Pauli measurements (2011)

  7. Tyagi, H., Cevher, V.: Active learning of multi-index function models. In: Advances in Neural Information Processing Systems, vol. 25, pp. 1475–1483 (2012)

  8. Candès, E.J., Li, X.: Solving quadratic equations via PhaseLift when there are about as many equations as unknowns (2012). Preprint arXiv:1208.6247

  9. Bennett, J., Lanning, S.: The Netflix Prize. In: KDD Cup and Workshop in Conjunction with KDD (2007)

  10. Candès, E.J., Li, X., Ma, Y., Wright, J.: Robust principal component analysis? J. ACM 58(3) (2011)

  11. Kyrillidis, A., Cevher, V.: Matrix ALPS: accelerated low rank and sparse matrix reconstruction. Technical report, EPFL (2012)

  12. Waters, A.E., Sankaranarayanan, A.C., Baraniuk, R.G.: SpaRCS: recovering low-rank and sparse matrices from compressive measurements. In: NIPS (2011)

  13. Fazel, M., Recht, B., Parrilo, P.A.: Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010)

  14. Liu, Z., Vandenberghe, L.: Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl. 31, 1235–1256 (2009)

  15. Mohan, K., Fazel, M.: Reweighted nuclear norm minimization with application to system identification. In: American Control Conference (ACC). IEEE Press, New York (2010)

  16. Cai, J.-F., Candès, E.J., Shen, Z.: A singular value thresholding algorithm for matrix completion. SIAM J. Optim. 20, 1956–1982 (2010)

  17. Recht, B., Ré, C.: Parallel stochastic gradient algorithms for large-scale matrix completion. Preprint (2011)

  18. Lin, Z., Chen, M., Ma, Y.: The augmented Lagrange multiplier method for exact recovery of corrupted low-rank matrices (2010). Preprint arXiv:1009.5055

  19. Wright, J., Wu, L., Chen, M., Lin, Z., Ganesh, A., Ma, Y.: Fast convex optimization algorithms for exact recovery of a corrupted low-rank matrix. UIUC Technical Report UILU-ENG-09-2214

  20. Natarajan, B.K.: Sparse approximate solutions to linear systems. SIAM J. Comput. 24(2), 227–234 (1995)

  21. Lee, K., Bresler, Y.: ADMiRA: atomic decomposition for minimum rank approximation. IEEE Trans. Inf. Theory 56(9), 4402–4416 (2010)

  22. Goldfarb, D., Ma, S.: Convergence of fixed-point continuation algorithms for matrix rank minimization. Found. Comput. Math. 11, 183–210 (2011)

  23. Beck, A., Teboulle, M.: A linearly convergent algorithm for solving a class of nonconvex/affine feasibility problems. In: Fixed-Point Algorithms for Inverse Problems in Science and Engineering, pp. 33–48 (2011)

  24. Kyrillidis, A., Cevher, V.: Recipes on hard thresholding methods. In: Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP) (2011)

  25. Kyrillidis, A., Cevher, V.: Combinatorial selection and least absolute shrinkage via the CLASH algorithm. In: IEEE International Symposium on Information Theory (2012)

  26. Halko, N., Martinsson, P.G., Tropp, J.A.: Finding structure with randomness: probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53, 217–288 (2011)

  27. Bertsekas, D.: Nonlinear Programming. Athena Scientific, Nashua (1995)

  28. Horn, R.A., Johnson, C.R.: Matrix Analysis. Cambridge Univ. Press, Cambridge (1990)

  29. Cohen, A., Dahmen, W., DeVore, R.: Compressed sensing and best k-term approximation. J. Am. Math. Soc. 22(1), 211–231 (2009)

  30. Tropp, J.A.: Greed is good: algorithmic results for sparse approximation. IEEE Trans. Inf. Theory 50(10), 2231–2242 (2004)

  31. Cevher, V.: An ALPS view of sparse recovery. In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5808–5811. IEEE Press, New York (2011)

  32. Needell, D., Tropp, J.A.: CoSaMP: iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal. 26(3), 301–321 (2009)

  33. Dai, W., Milenkovic, O.: Subspace pursuit for compressive sensing signal reconstruction. IEEE Trans. Inf. Theory 55, 2230–2249 (2009)

  34. Foucart, S.: Hard thresholding pursuit: an algorithm for compressed sensing. SIAM J. Numer. Anal. 49(6), 2543–2563 (2011)

  35. Blumensath, T., Davies, M.E.: Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal. 27(3), 265–274 (2009)

  36. Garg, R., Khandekar, R.: Gradient descent with sparsification: an iterative algorithm for sparse recovery with restricted isometry property. In: ICML. ACM Press, New York (2009)

  37. Blumensath, T., Davies, M.E.: Normalized iterative hard thresholding: guaranteed stability and performance. IEEE J. Sel. Top. Signal Process. 4(2), 298–309 (2010)

  38. Blumensath, T.: Accelerated iterative hard thresholding. Signal Process. 92, 752–756 (2012)

  39. Tanner, J., Wei, K.: Normalized iterative hard thresholding for matrix completion. Preprint (2012)

  40. Coifman, R., Geshwind, F., Meyer, Y.: Noiselets. Appl. Comput. Harmon. Anal. 10(1), 27–44 (2001)

  41. Foucart, S.: Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants. In: Proceedings of the 13th International Conference on Approximation Theory (2010)

  42. Nesterov, Y.: Gradient methods for minimizing composite objective function. CORE Discussion Paper 2007076, Université catholique de Louvain, Center for Operations Research and Econometrics (CORE) (2007)

  43. Nesterov, Y.: Introductory Lectures on Convex Optimization. Kluwer Academic, Dordrecht (1996)

  44. Drineas, P., Frieze, A., Kannan, R., Vempala, S., Vinay, V.: Clustering large graphs via the singular value decomposition. Mach. Learn. 56(1), 9–33 (2004)

  45. Drineas, P., Kannan, R., Mahoney, M.W.: Fast Monte Carlo algorithms for matrices II: computing a low-rank approximation to a matrix. SIAM J. Comput. 36, 158–183 (2006)

  46. Deshpande, A., Rademacher, L., Vempala, S., Wang, G.: Matrix approximation and projective clustering via volume sampling. In: Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA '06), pp. 1117–1126. ACM Press, New York (2006)

  47. Deshpande, A., Vempala, S.: Adaptive sampling and fast low-rank matrix approximation. Electron. Colloq. Comput. Complex. 13, 042 (2006)

  48. Keshavan, R.H., Montanari, A., Oh, S.: Matrix completion from a few entries. IEEE Trans. Inf. Theory 56(6), 2980–2998 (2010)

  49. Balzano, L., Nowak, R., Recht, B.: Online identification and tracking of subspaces from highly incomplete information. In: 48th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pp. 704–711. IEEE Press, New York (2010)

  50. He, J., Balzano, L., Lui, J.C.S.: Online robust subspace tracking from partial information (2011). Preprint arXiv:1109.3827

  51. Boumal, N., Absil, P.-A.: RTRMC: a Riemannian trust-region method for low-rank matrix completion. In: NIPS (2011)

  52. Wen, Z., Yin, W., Zhang, Y.: Solving a low-rank factorization model for matrix completion by a nonlinear successive over-relaxation algorithm. Rice University CAAM Technical Report TR10-07 (2010)

  53. Larsen, R.M.: PROPACK: software for large and sparse SVD calculations

  54. Shi, X., Yu, P.S.: Limitations of matrix completion via trace norm minimization. ACM SIGKDD Explor. Newsl. 12(2), 16–20 (2011)



This work was supported in part by the European Commission under Grant MIRG-268398, ERC Future Proof, SNF 200021-132548, and the DARPA KeCoM program #11-DARPA-1055. V.C. would also like to thank Rice University for his Faculty Fellowship.


Corresponding author

Correspondence to Anastasios Kyrillidis.



Appendix

Remark 1

Let \(\boldsymbol {X}\in\mathbb{R}^{m \times n}\) with SVD: X=UΣV T, and \(\boldsymbol{Y} \in\mathbb{R}^{m \times n}\) with SVD: \(\boldsymbol {Y} = \widetilde{\boldsymbol{U}} \widetilde{\boldsymbol{\varSigma}} \widetilde{\boldsymbol{V}}^{T}\). Assume two sets: (i) \(\mathcal{S}_{1} = \lbrace\boldsymbol{u}_{i}\boldsymbol{u}_{i}^{T}: i \in\mathcal {I}_{1} \rbrace\) where u i is the i-th singular vector of X and \(\mathcal{I}_{1} \subseteq\lbrace1, \dots,\allowbreak \operatorname{rank}(\boldsymbol {X}) \rbrace\) and, (ii) \(\mathcal{S}_{2} = \lbrace \boldsymbol{u}_{i}\boldsymbol{u}_{i}^{T}, \tilde{\boldsymbol{u}_{j}} \tilde {\boldsymbol{u}_{j}}^{T}: i \in\mathcal{I}_{2},~j \in\mathcal{I}_{3} \rbrace\) where \(\tilde{\boldsymbol{u}_{i}} \) is the i-th singular vector of Y, \(\mathcal{I}_{1} \subseteq\mathcal{I}_{2} \subseteq\lbrace1, \dots, \operatorname{rank}(\boldsymbol {X}) \rbrace\) and, \(\mathcal{I}_{3} \subseteq\lbrace1, \dots, \operatorname{rank}(\boldsymbol {Y}) \rbrace\). We observe that the subspaces defined by \(\boldsymbol {u}_{i}\boldsymbol{u}_{i}^{T}\) and \(\tilde{\boldsymbol{u}_{j}} \tilde {\boldsymbol{u}_{j}}^{T}\) are not necessarily orthogonal.

To this end, let \(\widehat{\mathcal{S}}_{2} = \text{ortho}(\mathcal{S}_{2})\); this operation is easily computed via the SVD. Then, since \(\operatorname{span}(\mathcal{S}_{1}) \subseteq \operatorname{span}(\widehat{\mathcal{S}}_{2})\), the following commutativity property holds for any matrix \(\boldsymbol{W} \in \mathbb{R}^{m \times n}\):

$$\mathcal{P}_{\widehat{\mathcal{S}}_{2}} \mathcal{P}_{\mathcal{S}_{1}} \boldsymbol{W} = \mathcal{P}_{\mathcal{S}_{1}} \mathcal{P}_{\widehat{\mathcal{S}}_{2}} \boldsymbol{W} = \mathcal{P}_{\mathcal{S}_{1}} \boldsymbol{W}.$$
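A minimal sketch of the ortho(·) operation discussed above — orthonormalizing the union of two possibly non-orthogonal subspace bases via an SVD; the function and variable names are illustrative, working with column-subspace bases rather than the sets of rank-1 matrices themselves:

```python
import numpy as np

def ortho(*bases, tol=1e-10):
    """Orthonormal basis for the union of the column subspaces of the inputs."""
    U, s, _ = np.linalg.svd(np.hstack(bases), full_matrices=False)
    return U[:, s > tol]     # keep directions with non-negligible singular value

rng = np.random.default_rng(2)
U1 = np.linalg.qr(rng.standard_normal((8, 2)))[0]  # basis of span(S_1)
U2 = np.linalg.qr(rng.standard_normal((8, 3)))[0]  # basis from Y; not orthogonal to U1
Q = ortho(U1, U2)
```

The result spans both inputs, and projecting onto span(Q) leaves each original basis vector unchanged.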


A.1 Proof of Lemma 6

Given \(\mathcal{X}^{\ast}\leftarrow\mathcal{P}_{k}(\boldsymbol{X}^{\ast})\) using the SVD factorization, we define the following quantities: \(\mathcal{S}_{i} \leftarrow \mathcal{X}_{i} \cup \mathcal{D}_{i}\), \(\mathcal{S}^{\ast}_{i} \leftarrow \text{ortho}(\mathcal{X}_{i} \cup \mathcal{X}^{\ast})\). Then, given the structure of the sets \(\mathcal{S}_{i}\) and \(\mathcal{S}_{i}^{\ast}\), we have:




Since the subspace defined by \(\mathcal{D}_{i}\) is the best rank-k subspace orthogonal to the subspace spanned by \(\mathcal{X}_{i}\), the following holds:

Removing the common subspaces in \(\mathcal{S}_{i} \) and \(\mathcal {S}_{i}^{\ast}\) by the commutativity property of the projection operation and using the shortcut \(\mathcal{P}_{\mathcal{A} \setminus\mathcal {B}} \equiv\mathcal{P}_{\mathcal{A}} \mathcal{P}_{\mathcal {B}^{\bot}} \) for sets \(\mathcal{A}\), \(\mathcal{B}\), we get:


Next, we assume that \(\mathcal{P}_{(\mathcal{A} \setminus\mathcal {B})^{\bot}}\) denotes the orthogonal projection onto the subspace spanned by \(\mathcal{P}_{\mathcal{A}} \mathcal{P}_{\mathcal {B}^{\bot}}\). Then, on the left hand side of (39), we have:


where (i) is due to the triangle inequality for the Frobenius norm, (ii) holds since \(\mathcal{P}_{\mathcal{S}_{i} \setminus\mathcal{S}_{i}^{\ast}} (\boldsymbol{X}(i) - \boldsymbol{X}^{\ast}) = \mathbf{0}\), (iii) follows from the decomposition \(\boldsymbol{X}(i) - \boldsymbol{X}^{\ast} := \mathcal{P}_{\mathcal{S}_{i} \setminus\mathcal{S}_{i}^{\ast}}(\boldsymbol{X}(i) - \boldsymbol{X}^{\ast}) + \mathcal{P}_{(\mathcal{S}_{i} \setminus \mathcal{S}_{i}^{\ast})^{\bot}}(\boldsymbol{X}(i) - \boldsymbol{X}^{\ast})\), (iv) is due to Lemma 4, (v) is due to Lemma 5 and (vi) holds since \(\Vert\mathcal{P}_{(\mathcal{S}_{i} \setminus\mathcal{S}_{i}^{\ast})^{\bot}} (\boldsymbol{X}^{\ast}- \boldsymbol{X}(i)) \Vert_{F} \leq \Vert \boldsymbol{X}(i) - \boldsymbol{X}^{\ast}\Vert_{F}\).

For the right hand side of (39), we calculate:


by using Lemmas 4 and 5. Combining (40) and (41) in (39), we get:

A.2 Proof of Theorem 1

Let \(\mathcal{X}^{\ast}\leftarrow\mathcal{P}_{k}(\boldsymbol{X}^{\ast})\) be a set of orthonormal, rank-1 matrices that span the range of \(\boldsymbol{X}^{\ast}\). In Algorithm 1, \(\boldsymbol{W}(i) \leftarrow \mathcal{P}_{k}(\boldsymbol{V}(i))\). Thus:


From Algorithm 1, (i) \(\boldsymbol{V}(i) \in\text {span}(\mathcal {S}_{i}) \), (ii) \(\boldsymbol {X}(i) \in\operatorname{span}( \mathcal{S}_{i}) \) and (iii) \(\boldsymbol{W}(i) \in\operatorname{span}(\mathcal{S}_{i}) \). We define \(\mathcal{E} \leftarrow \text{ortho}(\mathcal{S}_{i} \cup \mathcal{X}^{\ast}) \) where \(\operatorname{rank}(\operatorname{span}(\mathcal {E})) \leq3k\) and let \(\mathcal{P}_{\mathcal{E}} \) be the orthogonal projection onto the subspace defined by \(\mathcal{E} \).

Since \(\boldsymbol{W}(i) - \boldsymbol{X}^{\ast}\in\text {span}(\mathcal{E})\) and \(\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}\in\text {span}(\mathcal{E})\), the following hold true:

Then, (42) can be written as:


In B, we observe:


where (i) holds since \(\mathcal{P}_{\mathcal{S}_{i}} \mathcal{P}_{\mathcal{E}} = \mathcal{P}_{\mathcal{E}}\mathcal{P}_{\mathcal{S}_{i}} = \mathcal{P}_{\mathcal{S}_{i}}\) for \(\operatorname{span}(\mathcal{S}_{i}) \subseteq \operatorname{span}(\mathcal{E})\), (ii) is due to the Cauchy-Schwarz inequality and (iii) is easily derived using Lemma 2.

In A, we perform the following motions:


where (i) is due to \(\mathcal{P}_{\mathcal{E}}(\boldsymbol{X}(i) - \boldsymbol{X}^{\ast}) := \mathcal{P}_{\mathcal{S}_{i}} \mathcal{P}_{\mathcal{E}}(\boldsymbol{X}(i) - \boldsymbol{X}^{\ast}) + \mathcal{P}_{\mathcal{S}_{i}^{\bot}} \mathcal{P}_{\mathcal{E}}(\boldsymbol{X}(i) - \boldsymbol{X}^{\ast})\) and (ii) follows from the Cauchy-Schwarz inequality. Since \(\frac{1}{1+\delta_{2k}} \leq \mu_{i} \leq \frac{1}{1-\delta_{2k}}\), Lemma 4 implies:

and thus:

Furthermore, according to Lemma 5:

since \(\operatorname{rank}(\mathcal{P}_{\mathcal{K}}\boldsymbol {X}) \leq3k\), \(\forall \boldsymbol {X}\in\mathbb{R}^{m \times n} \) for \(\mathcal{K} \leftarrow\text{ortho}(\mathcal{E} \cup\mathcal{S}_{i})\). Since \(\mathcal{P}_{\mathcal{S}_{i}^{\bot}}\mathcal{P}_{\mathcal {E}}(\boldsymbol {X}(i) - \boldsymbol{X}^{\ast}) = \mathcal{P}_{\mathcal {X}^{\ast}\setminus(\mathcal{D}_{i} \cup\mathcal{X}_{i})}\boldsymbol{X}^{\ast}\) where


using Lemma 6. Combining the above in (45), we compute:


Combining (44) and (46) in (43), we get:


Focusing on steps 5 and 6 of Algorithm 1, we perform similar motions to obtain:


Combining the recursions in (47) and (48), we finally compute:

for \(\rho:= ( \frac{1 + 2\delta_{2k}}{1-\delta_{2k}} ) (\frac{4\delta_{2k}}{1-\delta_{2k}} + (2\delta_{2k} + 2\delta_{3k})\frac{2\delta_{3k}}{1-\delta_{2k}} )\) and

For the convergence parameter ρ, further compute:


for \(\delta_{k} \leq \delta_{2k} \leq \delta_{3k}\). Calculating the roots of this expression, we easily observe that \(\rho < \hat{\rho} < 1\) for \(\delta_{3k} < 0.1235\).
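The stated constant can be checked numerically: setting \(\delta_{2k} = \delta_{3k} = \delta\) in the expression for ρ given above and bisecting for \(\hat{\rho}(\delta) = 1\) recovers the 0.1235 threshold. This is a verification sketch, not part of the original proof:

```python
def rho_hat(d):
    # rho with delta_2k = delta_3k = d, from the expression for rho above;
    # algebraically this simplifies to 4*d*(1 + 2*d)**2 / (1 - d)**2.
    return ((1 + 2*d) / (1 - d)) * (4*d / (1 - d) + (2*d + 2*d) * (2*d / (1 - d)))

# Bisection for the root of rho_hat(d) = 1 on (0, 0.5), where rho_hat is increasing.
lo, hi = 0.0, 0.5
for _ in range(60):
    mid = (lo + hi) / 2
    if rho_hat(mid) < 1.0:
        lo = mid
    else:
        hi = mid
# lo is approximately 0.1235: rho_hat < 1 whenever delta_3k < 0.1235.
```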

A.3 Proof of Theorem 2

Before we present the proof of Theorem 2, we list a series of lemmas that correspond to the motions Algorithm 2 performs.

Lemma 9

[Error norm reduction via least-squares optimization]

Let \(\mathcal{S}_{i} \) be a set of orthonormal, rank-1 matrices that span a rank-2k subspace in \(\mathbb{R}^{m \times n} \). Then, the least squares solution V(i) given by:





We observe that \(\Vert\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}\Vert_{F}^{2} \) is decomposed as follows:


In (50), V(i) is the minimizer over the low-rank subspace spanned by \(\mathcal{S}_{i} \) with \(\text {rank}(\operatorname{span}(\mathcal{S}_{i})) \leq2k\). Using the optimality condition (Lemma 1) over the convex set \(\varTheta= \lbrace \boldsymbol {X}: \operatorname{span}(\boldsymbol {X}) \in\mathcal{S}_{i} \rbrace\), we have:


for \(\mathcal{P}_{\mathcal{S}_{i}}\boldsymbol{X}^{\ast}\in\text {span}(\mathcal{S}_{i}) \). Given condition (53), the first term on the right hand side of (52) becomes:


Focusing on the term \(| \langle\boldsymbol{V}(i) - \boldsymbol {X}^{\ast}, (\mathbf {I}- \boldsymbol {\mathcal {A}}^{\ast} \boldsymbol {\mathcal {A}})\mathcal{P}_{\mathcal {S}_{i}}(\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}) \rangle| \), we derive the following:

where (i) follows from the facts that \(\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}\in \operatorname{span}(\text{ortho}(\mathcal{S}_{i} \cup \mathcal {X}^{\ast})) \) and thus \(\mathcal{P}_{\mathcal{S}_{i} \cup\mathcal {X}^{\ast}}(\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}) = \boldsymbol {V}(i) - \boldsymbol{X}^{\ast}\) and (ii) is due to \(\mathcal{P}_{\mathcal{S}_{i} \cup\mathcal{X}^{\ast}} \mathcal{P}_{\mathcal{S}_{i}} = \mathcal {P}_{\mathcal{S}_{i}} \) since \(\operatorname{span}(\mathcal{S}_{i}) \subseteq \operatorname{span}(\text{ortho}(\mathcal{S}_{i} \cup\mathcal{X}^{\ast})) \). Then, (54) becomes:


where (i) comes from the Cauchy-Schwarz inequality and (ii) is due to Lemmas 2 and 4. Simplifying the above quadratic expression, we obtain:


As a consequence, (52) can be upper bounded by:


We form the quadratic polynomial for this inequality, treating the quantity \(\Vert \boldsymbol{V}(i) - \boldsymbol{X}^{\ast} \Vert_{F}\) as the unknown variable. Bounding it by the largest root of the resulting polynomial, we get:
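The displayed bound did not survive extraction; the algebra behind "bounding by the largest root" is the standard step below, with generic constants b, c standing in for the coefficients of the preceding inequality:

```latex
% If x := \Vert \boldsymbol{V}(i) - \boldsymbol{X}^{\ast} \Vert_F \geq 0 satisfies a
% quadratic inequality with nonnegative coefficients, x is at most the largest root:
x^{2} \leq b\,x + c
\;\Longrightarrow\;
x \leq \frac{b + \sqrt{b^{2} + 4c}}{2} \leq b + \sqrt{c},
```

since the quadratic \(t^{2} - b\,t - c\) is nonpositive only between its two roots, and \(\sqrt{b^{2}+4c} \leq b + 2\sqrt{c}\) for \(b, c \geq 0\).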



The following lemma characterizes how subspace pruning affects the recovered energy:

Lemma 10

[Best rank-k subspace selection]

Let \(\boldsymbol{V}(i) \in\mathbb {R}^{m \times n} \) be a rank-2k proxy matrix in the subspace spanned by \(\mathcal{S}_{i} \) and let \(\boldsymbol {X}(i+1) \leftarrow \mathcal{P}_{k}(\boldsymbol{V}(i)) \) denote the best rank-k approximation to V(i), according to (5). Then:



Since X(i+1) denotes the best rank-k approximation to V(i), the following inequality holds for any rank-k matrix \(\boldsymbol {X}\in\mathbb{R}^{m \times n} \) in the subspace spanned by \(\mathcal{S}_{i} \), i.e. \(\forall \boldsymbol {X}\in \operatorname{span}(\mathcal{S}_{i}) \):


Since \(\mathcal{P}_{\mathcal{S}_{i}} \boldsymbol{V}(i) = \boldsymbol {V}(i) \), the left inequality in (59) is satisfied for \(\boldsymbol {X}:= \mathcal{P}_{\mathcal{S}_{i}} \boldsymbol{X}^{\ast}\) in (60). □
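The best rank-k selection used in this lemma is the Eckart-Young optimality of the truncated SVD, which is easy to sanity-check numerically; this sketch is illustrative and the names are hypothetical:

```python
import numpy as np

def best_rank_k(V, k):
    """P_k(V): best rank-k approximation in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

rng = np.random.default_rng(3)
V = rng.standard_normal((7, 6))
k = 2
Xk = best_rank_k(V, k)
err = np.linalg.norm(V - Xk)

# The error equals the l2 norm of the discarded tail singular values ...
s = np.linalg.svd(V, compute_uv=False)
tail = np.sqrt(np.sum(s[k:] ** 2))

# ... and no other rank-k matrix does better.
competitors = [rng.standard_normal((7, k)) @ rng.standard_normal((k, 6))
               for _ in range(100)]
```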

Lemma 11

Let \(\boldsymbol{V}(i)\) be the least squares solution in Step 2 of the ADMiRA algorithm and let \(\boldsymbol{X}(i+1)\) be a proxy, rank-k matrix to \(\boldsymbol{V}(i)\) according to: \(\boldsymbol{X}(i+1) \leftarrow\mathcal{P}_{k}(\boldsymbol{V}(i))\). Then, \(\Vert \boldsymbol{X}(i+1) - \boldsymbol{X}^{\ast} \Vert_{F}\) can be expressed in terms of the distance from \(\boldsymbol{V}(i)\) to \(\boldsymbol{X}^{\ast}\) as follows:



We observe the following:


Focusing on the right hand side of expression (62), the term \(\langle \boldsymbol{V}(i) - \boldsymbol{X}^{\ast}, \boldsymbol{V}(i) - \boldsymbol{X}(i+1) \rangle = \langle\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}, \mathcal{P}_{\mathcal{S}_{i}}(\boldsymbol{V}(i) - \boldsymbol{X}(i+1)) \rangle\) can be analyzed similarly to Lemma 10, whereby we obtain the following expression:


Now, expression (62) can be further transformed as:


where (i) is due to (63). Using Lemma 10, we further have:


Furthermore, replacing \(\Vert \mathcal{P}_{\mathcal {S}_{i}}(\boldsymbol{X}^{\ast}- \boldsymbol{V}(i))\Vert _{F} \) with its upper bound defined in (56), we get:


where (i) is obtained by completing the squares and eliminating negative terms. □

Applying basic algebra tools in (61) and (51), we get:

Since \(\boldsymbol{V}(i) \in\operatorname{span}(\mathcal{S}_{i}) \), we observe \(\mathcal{P}_{\mathcal{S}_{i}^{\bot}}(\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}) = -\mathcal{P}_{\mathcal{S}_{i}^{\bot}} \boldsymbol{X}^{\ast}= -\mathcal{P}_{\mathcal{X}^{\ast}\setminus (\mathcal{D}_{i} \cup\mathcal{X}_{i})} \boldsymbol{X}^{\ast}\). Then, using Lemma 6, we obtain:


Given \(\delta_{2k} \leq \delta_{3k}\), ρ is upper bounded by \(\rho < 4\delta_{3k}\sqrt{\frac{1+3\delta_{3k}}{1-\delta_{3k}^{2}}}\). Then, \(4\delta_{3k}\sqrt{\frac{1+3\delta_{3k}}{1-\delta_{3k}^{2}}} < 1 \Leftrightarrow \delta_{3k} < 0.2267\).

A.4 Proof of Theorem 3

Let \(\mathcal{X}^{\ast}\leftarrow\mathcal{P}_{k}(\boldsymbol{X}^{\ast})\) be a set of orthonormal, rank-1 matrices that span the range of \(\boldsymbol{X}^{\ast}\). In Algorithm 3, \(\boldsymbol{X}(i+1)\) is the best rank-k approximation of \(\boldsymbol{V}(i)\). Thus:


From Algorithm 3, (i) \(\boldsymbol{V}(i) \in\operatorname{span}(\mathcal {S}_{i}) \), (ii) \(\boldsymbol{Q}_{i} \in \operatorname{span}( \mathcal{S}_{i}) \) and (iii) \(\boldsymbol{W}(i) \in\operatorname{span}(\mathcal{S}_{i}) \). We define \(\mathcal{E} \leftarrow\text{ortho}(\mathcal{S}_{i} \cup \mathcal{X}^{\ast}) \) where we observe \(\operatorname{rank}(\text {span}(\mathcal{E})) \leq4k\) and let \(\mathcal{P}_{\mathcal {E}} \) be the orthogonal projection onto the subspace defined by \(\mathcal{E} \).

Since \(\boldsymbol {X}(i+1) - \boldsymbol{X}^{\ast}\in\operatorname{span}(\mathcal {E}) \) and \(\boldsymbol{V}(i) - \boldsymbol{X}^{\ast}\in \operatorname{span}(\mathcal{E}) \), the following hold true:



Then, (68) can be written as:


where (i) is due to \(\mathcal{P}_{\mathcal{E}}(\boldsymbol{Q}_{i} - \boldsymbol{X}^{\ast}) := \mathcal{P}_{\mathcal{S}_{i}} \mathcal{P}_{\mathcal{E}}(\boldsymbol{Q}_{i} - \boldsymbol{X}^{\ast}) + \mathcal{P}_{\mathcal{S}_{i}^{\bot}} \mathcal{P}_{\mathcal{E}}(\boldsymbol{Q}_{i} - \boldsymbol{X}^{\ast})\) and (ii) follows from the Cauchy-Schwarz inequality. Since \(\frac{1}{1+\delta_{3k}} \leq \mu_{i} \leq \frac{1}{1-\delta_{3k}}\), Lemma 4 implies:

and thus:

Furthermore, according to Lemma 5:

since \(\operatorname{rank}(\mathcal{P}_{\mathcal{K}}\boldsymbol{Q}) \leq 4k\), \(\forall\boldsymbol{Q} \in\mathbb{R}^{m \times n} \) where \(\mathcal{K} \leftarrow\text{ortho}(\mathcal{E} \cup\mathcal{S}_{i})\). Since \(\mathcal{P}_{\mathcal{S}_{i}^{\bot}}\mathcal{P}_{\mathcal{E}} (\boldsymbol{Q}_{i} - \boldsymbol{X}^{\ast}) = \mathcal{P}_{\mathcal{X}^{\ast}\setminus(\mathcal{D}_{i} \cup\mathcal {X}_{i})}\boldsymbol{X}^{\ast}\) where



using Lemma 6. Using the above in (70), we compute:




Combining (72) and (73), we get:


Let \(\alpha := \frac{4\delta_{3k}}{1-\delta_{3k}} + (2\delta_{3k} + 2\delta_{4k})\frac{2\delta_{3k}}{1-\delta_{3k}}\) and \(g(i) := \Vert \boldsymbol{X}(i+1) - \boldsymbol{X}^{\ast} \Vert_{F}\). Then, (74) defines the following homogeneous recurrence:


Using the method of characteristic roots to solve the above recurrence, we assume that the homogeneous linear recursion has a solution of the form \(g(i) = r^{i}\) for \(r \in\mathbb{R}\). Thus, substituting \(g(i) = r^{i}\) in (75) and factoring out \(r^{i-2}\), we form the following characteristic polynomial:


Focusing on the worst case where (76) is satisfied with equality, we compute the roots \(r_{1,2}\) of the quadratic characteristic polynomial as:

Then, as a general solution, we combine the above roots with unknown coefficients \(b_{1}, b_{2}\) to obtain (69). Using the initial condition \(g(0) := \Vert \boldsymbol{X}(0) - \boldsymbol{X}^{\ast} \Vert_{F} \stackrel{\boldsymbol{X}(0) = \mathbf{0}}{=} \Vert \boldsymbol{X}^{\ast} \Vert_{F} = 1\), we get \(b_{1}+b_{2}=1\). Thus, we arrive at the following recurrence:
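The characteristic-root technique can be sketched numerically; the coefficients c1, c2 below are hypothetical stand-ins for the constants of recurrence (75), not the paper's actual values:

```python
import numpy as np

# Hypothetical two-term recurrence g(i) = c1*g(i-1) + c2*g(i-2).
c1, c2 = 0.5, 0.2
disc = np.sqrt(c1**2 + 4*c2)
r1, r2 = (c1 + disc) / 2, (c1 - disc) / 2   # roots of r^2 - c1*r - c2 = 0

# Unroll the recurrence from g(0) = 1 (X(0) = 0, ||X*||_F = 1) and some g(1).
g = [1.0, 0.4]
for i in range(2, 12):
    g.append(c1 * g[i - 1] + c2 * g[i - 2])

# Fit b1 + b2 = g(0), b1*r1 + b2*r2 = g(1); compare with the closed form.
b1, b2 = np.linalg.solve(np.array([[1.0, 1.0], [r1, r2]]), np.array(g[:2]))
closed = [b1 * r1**i + b2 * r2**i for i in range(12)]
```

The unrolled sequence matches \(b_1 r_1^i + b_2 r_2^i\) exactly, and the iterates contract since the dominant root satisfies \(r_1 < 1\) whenever \(c_1 + c_2 < 1\).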

A.5 Proof of Lemma 7

Let \(\mathcal{D}_{i}^{\epsilon} \leftarrow\mathcal{P}_{k}^{\epsilon }(\mathcal{P}_{\mathcal{X}_{i}^{\bot}} \nabla f(\boldsymbol {X}(i))) \) and \(\mathcal{D}_{i} \leftarrow \mathcal{P}_{k}(\mathcal{P}_{\mathcal {X}_{i}^{\bot}} \nabla f(\boldsymbol {X}(i)))\). Using Definition 4, the following holds true:


Furthermore, we observe:


Here, we use the notation defined in the proof of Lemma 6. Since \(\mathcal{P}_{\mathcal{D}_{i}} \nabla f(\boldsymbol {X}(i)) \) is the best rank-k approximation to ∇f(X(i)), we have:


where \(\operatorname{rank}(\operatorname{span}(\text{ortho}(\mathcal{X}^{\ast}\setminus\mathcal{X}_{i}))) \leq k\). Using (77) in (79), we obtain the following series of inequalities:


Now, in (78), we compute the series of inequalities in (81)-(82).


Focusing on \(\Vert \mathcal{P}_{\mathcal{X}^{\ast}\setminus \mathcal{X}_{i}}^{\bot} \boldsymbol {\mathcal {A}}^{\ast}(\boldsymbol {y}- \boldsymbol {\mathcal {A}}\boldsymbol {X}(i))\Vert _{F} \), we observe:


Moreover, we know the following hold true from Lemma 6:




Combining (83)–(85) in (82), we obtain:

A.6 Proof of Theorem 4

To prove Theorem 4, we combine the following series of lemmas for each step of Algorithm 1.

Lemma 12

[Error norm reduction via gradient descent]

Let \(\mathcal{S}_{i} \leftarrow\text{ortho}(\mathcal{X}_{i} \cup\mathcal{D}_{i}^{\epsilon})\) be a set of orthonormal, rank-1 matrices that span a rank-2k subspace in \(\mathbb{R}^{m \times n}\). Then (86) holds.



We observe the following:


The following equations hold true:

Furthermore, we compute:


where (i) is due to Lemmas 2, 4, 5 and \(\frac{1}{1+\delta_{2k}} \leq\mu_{i} \leq\frac {1}{1-\delta_{2k}} \).

Using the subadditivity property of the square root in (87), (88), Lemma 7 and the fact that \(\Vert \mathcal {P}_{\mathcal{S}_{i}}(\boldsymbol {X}(i) - \boldsymbol{X}^{\ast})\Vert _{F} \leq \Vert \boldsymbol {X}(i) - \boldsymbol{X}^{\ast} \Vert _{F} \), we obtain:


where \(\hat{\rho} := ( 1+ \frac{\delta_{3k}}{1-\delta _{2k}} ) (2\delta_{2k} + 2\delta_{3k} ) + \frac{2\delta_{2k}}{1-\delta_{2k}}\). □

We exploit Lemma 8 to obtain the following inequalities:


where the last inequality holds since \(\boldsymbol{W}(i)\) is the best rank-k matrix estimate of \(\boldsymbol{V}(i)\) and, thus, \(\Vert \boldsymbol{W}(i) - \boldsymbol{V}(i) \Vert_{F} \leq \Vert \boldsymbol{V}(i) - \boldsymbol{X}^{\ast} \Vert_{F}\).

Following similar motions for steps 6 and 7 in Matrix ALPS I, we obtain:


Combining (91), (90) and (89), we obtain the desired inequality.


Cite this article

Kyrillidis, A., Cevher, V. Matrix Recipes for Hard Thresholding Methods. J Math Imaging Vis 48, 235–265 (2014).
