
Convex Low Rank Approximation



Low rank approximation is an important tool in many applications. Given an observed matrix with elements corrupted by Gaussian noise, it is possible to find the best approximating matrix of a given rank through singular value decomposition. However, due to the non-convexity of the formulation, it is not possible to incorporate any additional knowledge of the sought matrix without resorting to heuristic optimization techniques. In this paper we propose a convex formulation that is more flexible in that it can be combined with any other convex constraints and penalty functions. The formulation uses the so-called convex envelope, which is the provably best possible convex relaxation. We show that for a general class of problems the envelope can be efficiently computed and may in some cases even have a closed-form expression. We test the algorithm on a number of real and synthetic data sets and show state-of-the-art results.




  1. Since it is possible to restrict the minimization in X to a compact set, the existence of a saddle point can be guaranteed (see Rockafellar (1997) for details).

  2. Note that we choose \(g(k) = \mu k\) here to allow for a comparison with the nuclear norm. In general, when we solve the missing data problem we use \(g(k) =\mathbb {I}(k \le r_0)\).

  3. Here \({\mathcal {P}}_k(U)\) denotes the rows of U corresponding to block k.


  1. Andersson, F., Carlsson, M., Tourneret, J. Y., & Wendt, H. (2014). A new frequency estimation method for equally and unequally spaced data. IEEE Transactions on Signal Processing, 62(21), 5761–5774.

  2. Angst, R., Zach, C., & Pollefeys, M. (2011). The generalized trace-norm and its application to structure-from-motion problems. In International Conference on Computer Vision

  3. Aguiar, P. M. Q., Stosic, M., & Xavier, J. M. F. (2008). Spectrally optimal factorization of incomplete matrices. In IEEE Conference on Computer Vision and Pattern Recognition

  4. Argyriou, A., Foygel, R., & Srebro, N. (2012). Sparse prediction with the k-support norm. In Advances in Neural Information Processing Systems

  5. Basri, R., Jacobs, D., & Kemelmacher, I. (2007). Photometric stereo with general, unknown lighting. International Journal of Computer Vision, 72(3), 239–257.

  6. Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends in Machine Learning, 3(1), 1–122.

  7. Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In IEEE Conference on Computer Vision and Pattern Recognition

  8. Buchanan, A. M., & Fitzgibbon, A. W. (2005). Damped Newton algorithms for matrix factorization with missing data. In IEEE Conference on Computer Vision and Pattern Recognition

  9. Cabral, R., de la Torre, F., Costeira, J., & Bernardino, A. (2013). Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In International Conference on Computer Vision

  10. Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. SIAM Journal on Optimization, 20(4), 1956–1982.

  11. Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? Journal of the ACM, 58(3), 11:1–11:37.

  12. Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. Psychometrika, 1(3), 211–218.

  13. Eriksson, A., & Hengel, A. (2012). Efficient computation of robust weighted low-rank matrix approximations using the \(L_1\) norm. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(9), 1681–1690.

  14. Eriksson, A., Thanh, P. T., Chin, T.J., & Reid, I. (2015). The k-support norm and convex envelopes of cardinality and rank. In The IEEE Conference on Computer Vision and Pattern Recognition

  15. Favaro, P., Vidal, R., & Ravichandran, A. (2011). A closed form solution to robust subspace estimation and clustering. In IEEE Conference on Computer Vision and Pattern Recognition

  16. Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. In American Control Conference

  17. Garg, R., Roussos, A., & de Agapito, L. (2013). Dense variational reconstruction of non-rigid surfaces from monocular video. In IEEE Conference on Computer Vision and Pattern Recognition

  18. Garg, R., Roussos, A., & Agapito, L. (2013). A variational approach to video registration with subspace constraints. International Journal of Computer Vision, 104(3), 286–314.

  19. Gillis, N., & Glineur, F. (2011). Low-rank matrix approximation with weights or missing data is NP-hard. SIAM Journal on Matrix Analysis and Applications, 32(4).

  20. Hu, Y., Zhang, D., Ye, J., Li, X., & He, X. (2013). Fast and accurate matrix completion via truncated nuclear norm regularization. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(9), 2117–2130. doi:10.1109/TPAMI.2012.271.

  21. Jacobs, D. (1997). Linear fitting with missing data: Applications to structure-from-motion and to characterizing intensity images. In IEEE Conference on Computer Vision and Pattern Recognition

  22. Jojic, V., Saria, S., & Koller, D. (2011). Convex envelopes of complexity controlling penalties: The case against premature envelopment. In International Conference on Artificial Intelligence and Statistics

  23. Ke, Q., & Kanade, T. (2005). Robust l1 norm factorization in the presence of outliers and missing data by alternative convex programming. In IEEE Conference on Computer Vision and Pattern Recognition

  24. Keshavan, R. H., Montanari, A., & Oh, S. (2010). Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6), 2980–2998.

  25. Lai, H., Pan, Y., Lu, C., Tang, Y., & Yan, S. (2014). Efficient k-support matrix pursuit. In European Conference on Computer Vision, vol. 8690, 2014

  26. Larsson, V., Bylow, E., Olsson, C., & Kahl, F. (2014). Rank minimization with structured data patterns. In European Conference on Computer Vision

  27. Larsson, V., & Olsson, C. (2015). Convex envelopes for low rank approximation. In International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition

  28. Lin, Z., Chen, M., & Ma, Y. (2010). The augmented Lagrange multiplier method for exact recovery of corrupted low rank matrices. Mathematical Programming. http://arxiv.org/abs/1009.5055.

  29. Mazumder, R., Hastie, T., & Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. Journal of Machine Learning Research, 11, 2287–2322.

  30. McDonald, A.M., Pontil, M., & Stamos, D. (2014). Spectral k-support norm regularization. In Advances in Neural Information Processing Systems

  31. Okatani, T., & Deguchi, K. (2007). On the Wiberg algorithm for factorization with missing data. International Journal of Computer Vision, 72(3), 329–337.

  32. Okatani, T., Yoshida, T., & Deguchi, K. (2011). Efficient algorithm for low-rank matrix factorization with missing components and performance comparison of latest algorithms. In Proceedings of the International Conference on Computer Vision

  33. Olsen, S., & Bartoli, A. (2008). Implicit non-rigid structure-from-motion with priors. Journal of Mathematical Imaging and Vision, 31(2–3), 233–244. doi:10.1007/s10851-007-0060-3.

  34. Olsson, C., & Oskarsson, M. (2009). A convex approach to low rank matrix approximation with missing data. In Scandinavian Conference on Image Analysis

  35. Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Review, 52(3), 471–501.

  36. Rockafellar, R. (1997). Convex Analysis. Princeton: Princeton University Press.

  37. Strelow, D. (2012). General and nested Wiberg minimization. In IEEE Conference on Computer Vision and Pattern Recognition

  38. Sturm, J. F. (1999). Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. Optimization Methods and Software, 11–12, 625–653.

  39. The MOSEK optimization toolbox for MATLAB manual. (2016). www.mosek.com

  40. Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: A factorization method. International Journal of Computer Vision, 9(2), 137–154.

  41. Wang, S., Liu, D., & Zhang, Z. (2013). Nonconvex relaxation approaches to robust matrix recovery. In International Joint Conference on Artificial Intelligence

  42. Wiberg, T. (2013). Computation of principal components when data are missing. In Proceedings of Second Symposium on Computational Statistics

  43. Yan, J., & Pollefeys, M. (2008). A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(5), 865–877.

  44. Zheng, Y., Liu, G., Sugimoto, S., Yan, S., & Okutomi, M. (2012). Practical low-rank matrix approximation under robust \(L_1\)-norm. In IEEE Conference on Computer Vision and Pattern Recognition

  45. Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67, 301–320.



This work has been funded by the Swedish Research Council (Grant No. 2012-4213) and the Crafoord Foundation. This paper has benefited from discussions with Erik Bylow and Fredrik Kahl.

Author information

Correspondence to Viktor Larsson.

Additional information

Communicated by Daniel Cremers.



The Sequence of Unconstrained Minimizers

In this section we prove that with the definitions of p and q as in Sect. 3.1 the sequence of unconstrained minimizers defined by Eq. (49) will have the shape illustrated in Fig. 6.

Lemma 2

If p and q are selected such that the sequence \(\{s_i\}\) defined by (49) is non-increasing for \(i \le p\) and \(q \le i\), non-decreasing for \(p \le i \le q\) and \(s_p < s_q\) then

$$\begin{aligned} s_i = {\left\{ \begin{array}{ll} \max (\sigma _i(Y),s_p), &{} i \le p\\ \min (\max (\sqrt{g_i},s_p),s_q) &{} p \le i\le q \\ \min ((\rho +1)\sigma _i(Y),s_q), &{} i \ge q \end{array}\right. }. \end{aligned}$$


We first note that (49) can equivalently be written

$$\begin{aligned} s_i = \min \left( \max ( \sqrt{g_i},\sigma _i(Y)),(\rho +1) \sigma _i(Y)\right) . \end{aligned} \tag{84}$$

We begin with the case \(i \le p\). Since

$$\begin{aligned} s_p < s_q \le (\rho +1)\sigma _q(Y) \le (\rho +1)\sigma _p(Y), \end{aligned}$$

we have by (84) that \(s_p = \max (\sqrt{g_p},\sigma _p(Y)) < (\rho +1)\sigma _p(Y)\) and therefore \(\sqrt{g_p} \le (\rho +1)\sigma _p(Y)\). Furthermore, since \(g_i\) is non-decreasing and \(\sigma _i(Y)\) is non-increasing, \(\sqrt{g_i} \le (\rho +1)\sigma _i(Y)\) for all \(i \le p\). For \(i<p\) we now get that if \(s_i > s_p\) then \(s_i > \sqrt{g_p} \ge \sqrt{g_i}\), and therefore by (84) \(s_i = \sigma _i(Y)\).

For \(i \ge q\) we similarly have

$$\begin{aligned} s_q > s_p \ge \sigma _p(Y) \ge \sigma _q(Y), \end{aligned}$$

which together with (84) gives that \(s_q = \min \left( \sqrt{g_q},(\rho +1)\sigma _q(Y)\right) \) and \(\sqrt{g_q} \ge \sigma _q(Y)\). Moreover, since \(g_i\) is non-decreasing and \(\sigma _i(Y)\) is non-increasing, \(\sqrt{g_i} \ge \sigma _i(Y)\) for all \(i \ge q\). For \(i>q\) we now get that if \(s_i < s_q\) then \(s_i < \sqrt{g_q} \le \sqrt{g_i}\), and therefore by (84) \(s_i = (\rho +1)\sigma _i(Y)\).

For the final case \(p\le i\le q\) we note that \(s_p \le s_i \le s_q\) since \(s_i\) is non-decreasing between p and q. If \(s_i > s_p\) then

$$\begin{aligned} s_i > s_p \ge \sigma _p(Y) \ge \sigma _i(Y). \end{aligned}$$

If \(s_i < s_q\) then

$$\begin{aligned} s_i < s_q \le (\rho +1)\sigma _q(Y) \le (\rho +1)\sigma _i(Y). \end{aligned}$$

Therefore if \(s_p < s_i < s_q\) then \(s_i = \sqrt{g_i}\).
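The statement of Lemma 2 can be checked numerically on a small example. The following is a minimal sketch, not the authors' implementation: the values of `sigma_Y`, `g` and `rho` are made up for illustration, and the helper `find_p_q` is a hypothetical stand-in that reads p and q off the shape of the sequence (it is not the definition from Sect. 3.1). Indices are 0-based in the code.

```python
import numpy as np

def unconstrained_minimizers(sigma_Y, g, rho):
    """s_i = min(max(sqrt(g_i), sigma_i(Y)), (rho + 1) * sigma_i(Y))."""
    sigma_Y = np.asarray(sigma_Y, dtype=float)
    sqrt_g = np.sqrt(np.asarray(g, dtype=float))
    return np.minimum(np.maximum(sqrt_g, sigma_Y), (rho + 1.0) * sigma_Y)

def find_p_q(s):
    """Hypothetical helper: p ends the initial non-increasing run,
    q ends the non-decreasing run that follows it."""
    n = len(s)
    p = 0
    while p + 1 < n and s[p + 1] <= s[p]:
        p += 1
    q = p
    while q + 1 < n and s[q + 1] >= s[q]:
        q += 1
    return p, q

def lemma2_formula(sigma_Y, g, rho, s_p, s_q, p, q):
    """Piecewise expression from Lemma 2 (0-based p and q)."""
    sigma_Y = np.asarray(sigma_Y, dtype=float)
    sqrt_g = np.sqrt(np.asarray(g, dtype=float))
    out = np.empty_like(sigma_Y)
    out[: p + 1] = np.maximum(sigma_Y[: p + 1], s_p)                      # i <= p
    out[p : q + 1] = np.minimum(np.maximum(sqrt_g[p : q + 1], s_p), s_q)  # p <= i <= q
    out[q:] = np.minimum((rho + 1.0) * sigma_Y[q:], s_q)                  # i >= q
    return out

# Illustrative data: sigma_Y is non-increasing, g is non-decreasing.
sigma_Y = [5.0, 4.0, 2.0, 1.8, 1.6, 1.0, 0.5]
g = [0.25, 1.0, 2.25, 6.25, 12.25, 16.0, 25.0]
rho = 1.0

s = unconstrained_minimizers(sigma_Y, g, rho)
p, q = find_p_q(s)
s_check = lemma2_formula(sigma_Y, g, rho, s[p], s[q], p, q)
print(s, p, q, np.allclose(s, s_check))
```

In this example the sequence first follows \(\sigma _i(Y)\), then \(\sqrt{g_i}\), and finally \((\rho +1)\sigma _i(Y)\), which is the valley-and-peak shape assumed in the lemma.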

Properties of Feasible Minimizers

In this section we give a result that enables us to efficiently search for optimal sequences of singular values. The key observation is that for concave costs the optimum is either in a stationary point (determined by minimizing each singular value separately) or constrained by one of its neighboring singular values. Using this information it is possible to single out a 1-parameter family of singular value configurations guaranteed to contain the optimal one.

We let \(s_i\) be a sequence of non-negative numbers. We require that the sequence is non-increasing for \(i \le p\), non-decreasing for \(p \le i \le q\), and non-increasing for \(q \le i\). Note that by this definition \(s_p\) and \(s_q\) are local extreme points of the sequence \(\big (s_{p-1} \ge s_{p} \le s_{p+1}\) and \(s_{q-1} \le s_{q} \ge s_{q+1}\big )\).

Lemma 3

Let \(\{s_i\}\) be the unconstrained maximizers of \(f_i(s)\), where \(f_i\) are strictly concave (and therefore have unique maximizers). Then the maximizer of \(g({\varvec{\sigma }}) = \sum _i f_i(\sigma _i)\), such that \(\sigma _1 \ge \sigma _2 \ge \cdots \ge \sigma _n\), fulfills

$$\begin{aligned} \sigma _i =&\max (s_i,\sigma _{i+1}),&1 \le i \le p \end{aligned}$$
$$\begin{aligned} \sigma _i =&\sigma _{i+1},&p\le i \le q-1\end{aligned}$$
$$\begin{aligned} \sigma _i =&\min (s_i,\sigma _{i-1}),&i \ge q \end{aligned}$$


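Since each \(f_i\) is (strictly) concave and \(\sigma _{i+1} \le \sigma _{i-1}\), the optimization over \(\sigma _i\) with its neighbors held fixed can be limited to three choices:

$$\begin{aligned} \sigma _i = {\left\{ \begin{array}{ll} s_i, &{} \sigma _{i+1} \le s_i \le \sigma _{i-1}\\ \sigma _{i-1}, &{} s_i > \sigma _{i-1}\\ \sigma _{i+1}, &{} s_i < \sigma _{i+1} \end{array}\right. } \end{aligned} \tag{92}$$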


Using induction we first prove the recursion

$$\begin{aligned} \sigma _i = \max (s_i,\sigma _{i+1}) \quad \text {for } i \le p. \end{aligned} \tag{93}$$

For \(i=1\) we see from (92) that \(s_1\) is the optimal choice if \(s_1 > \sigma _{2}\) otherwise \(\sigma _{2}\) is optimal. Therefore \(\sigma _1 = \max (s_1,\sigma _2)\). Next assume that \(\sigma _{i-1} = \max (s_{i-1},\sigma _i)\) for some \(i \le p\). Then

$$\begin{aligned} \sigma _{i-1} \ge s_{i-1} \ge s_i, \end{aligned}$$

therefore we can ignore the second case in (92), which proves the recursion (93).

Next we show that the sequence \(\{\sigma _i\}\) is constant in \(p \le i \le q\). We consider \(\sigma _i\) for some \(p \le i \le q-1\). If \(\sigma _i > s_i\) it must have been bounded from below in (92), i.e. \(\sigma _i = \sigma _{i+1}\). If instead \(\sigma _i \le s_i\) we have \(\sigma _{i+1} \le \sigma _i \le s_i \le s_{i+1}\). Then similarly \(\sigma _{i+1}\) is bounded from above in (92) which implies \(\sigma _{i+1} = \sigma _i\).

For the final part we consider \(i \ge q\) and show that

$$\begin{aligned} \sigma _i = \min (s_i, \sigma _{i-1}) \quad \text {for} \quad i \ge q. \end{aligned} \tag{95}$$

It is clear from (92) that this holds for \(i = n\). We continue using induction by assuming \(\sigma _{i+1} = \min (s_{i+1}, \sigma _i)\) holds. Then

$$\begin{aligned} \sigma _{i+1} \le s_{i+1} \le s_i, \end{aligned}$$

since \(s_i\) are non-increasing for \(i \ge q\). This means that for \(\sigma _i\) we can ignore the third case in (92). Thus it follows that \(\sigma _{i} = \min (s_{i}, \sigma _{i-1})\). So (95) holds for all \(i \ge q\).

Theorem 2

Under the assumptions of Lemma 3, the maximizer \({\varvec{\sigma }}\) can be written

$$\begin{aligned} \sigma _i= & {} \max (s_i,s),\quad 1 \le i \le p \end{aligned}$$
$$\begin{aligned} \sigma _i= & {} s, \quad \quad \quad \quad \quad \quad p\le i \le q-1 \end{aligned}$$
$$\begin{aligned} \sigma _i= & {} \min (s_i,s), \quad i \ge q, \end{aligned}$$

where s fulfills \(s_p \le s \le s_q\).


We first consider \(i \le p\). Assume \(\sigma _i \ne s_i\) for some \(i < p\). From (93) it follows that

$$\begin{aligned} \sigma _i = \sigma _{i+1} > s_i. \end{aligned}$$

But \(s_i\) is non-increasing for \(i \le p\) which implies that \(\sigma _{i+1} > s_{i+1}\). By repeating the argument it follows that

$$\begin{aligned} \sigma _i = \sigma _{i+1} = \sigma _{i+2} = \dots = \sigma _{p}. \end{aligned}$$

We let \(s = \sigma _p\) and note that due to (93) \(s \ge s_p\). By Lemma 3 we also have

$$\begin{aligned} \sigma _p = \sigma _{p+1} = \sigma _{p+2} = \dots = \sigma _{q}. \end{aligned}$$

Therefore \(s=\sigma _{q}\) and by (95) we get \(s\le s_q\).

Now assume that for some \(i \ge q\) we have \(\sigma _i \ne s_i\). By (95) we must have that

$$\begin{aligned} \sigma _i = \sigma _{i-1} < s_i \le s_{i-1}. \end{aligned}$$

By repeating the argument we get

$$\begin{aligned} \sigma _i = \sigma _{i-1} = \sigma _{i-2} = \dots = \sigma _q \end{aligned}$$

and the result follows. \(\square \)
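Theorem 2 reduces the search for the ordered maximizer to a single scalar s in \([s_p, s_q]\). The sketch below illustrates this under stated assumptions: the sequence `s_seq`, the indices `p` and `q` (0-based), and the quadratic choice \(f_i(x) = -(x-s_i)^2\) are all illustrative, and the ternary search assumes the objective is concave as a function of s. It is one possible way to scan the family, not necessarily the search used by the authors.

```python
import numpy as np

def sigma_family(s_seq, p, q, s):
    """Configuration from Theorem 2 for a given free parameter s (0-based p, q)."""
    s_seq = np.asarray(s_seq, dtype=float)
    sigma = np.empty_like(s_seq)
    sigma[: p + 1] = np.maximum(s_seq[: p + 1], s)  # i <= p
    sigma[p:q] = s                                   # p <= i <= q - 1
    sigma[q:] = np.minimum(s_seq[q:], s)             # i >= q
    return sigma

def best_in_family(s_seq, p, q, objective, iters=100):
    """Ternary search for the best s in [s_p, s_q] (assumes concavity in s)."""
    lo, hi = float(s_seq[p]), float(s_seq[q])
    for _ in range(iters):
        a = lo + (hi - lo) / 3.0
        b = hi - (hi - lo) / 3.0
        if objective(sigma_family(s_seq, p, q, a)) < objective(sigma_family(s_seq, p, q, b)):
            lo = a  # the maximizer lies to the right of a
        else:
            hi = b  # the maximizer lies to the left of b
    s = 0.5 * (lo + hi)
    return s, sigma_family(s_seq, p, q, s)

# Illustrative unconstrained maximizers with a valley at p and a peak at q.
s_seq = np.array([5.0, 4.0, 2.0, 2.5, 3.2, 2.0, 1.0])
p, q = 2, 4

def objective(sigma):
    # Sum of strictly concave terms f_i(x) = -(x - s_i)^2, each maximized at s_i.
    return -np.sum((sigma - s_seq) ** 2)

s_opt, sigma_opt = best_in_family(s_seq, p, q, objective)
print(s_opt, sigma_opt)
```

For this quadratic choice the three entries between the valley and the peak are replaced by their common mean, while the entries outside keep their unconstrained values.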

Extension Outside the Blocks

In this section we show how to extend a partial low rank solution computed on overlapping blocks of the matrix to a complete solution. The approach hinges on the following result.

Fig. 24

Two overlapping blocks \(X_1\) and \(X_2\). The goal of the extension is to find the unknown \(X_{13}\) and \(X_{31}\) such that the rank is not increased, i.e. \({{\mathrm{rank}}}(X) = \max ({{\mathrm{rank}}}(X_1),{{\mathrm{rank}}}(X_2))\)

Lemma 4

Let \(X_1\) and \(X_2\) be two overlapping blocks such that they agree on the overlap \(X_{22}\) (in the notation from Fig. 24). If the overlap satisfies

$$\begin{aligned} {{\mathrm{rank}}}(X_{22}) = \min ({{\mathrm{rank}}}(X_1),{{\mathrm{rank}}}(X_2)) \end{aligned}$$

then there exist \(X_{13}\) and \(X_{31}\) such that

$$\begin{aligned} {{\mathrm{rank}}}(X) = \max ({{\mathrm{rank}}}(X_1),~{{\mathrm{rank}}}(X_2)). \end{aligned}$$

Furthermore, if \({{\mathrm{rank}}}(X_1) = {{\mathrm{rank}}}(X_2)\) the extension is unique.


Without loss of generality assume that \({{\mathrm{rank}}}(X_{22})={{\mathrm{rank}}}(X_2) \le {{\mathrm{rank}}}(X_1)\). Then the column space of \(X_2\) must be spanned by \(\begin{bmatrix} X_{22} \\ X_{32} \end{bmatrix}\) and similarly the row space by \(\begin{bmatrix} X_{22}&X_{23} \end{bmatrix}\). There exist coefficient matrices \(C_1\) and \(C_2\) such that

$$\begin{aligned} \begin{bmatrix} X_{22} \\ X_{32} \end{bmatrix}C_1 = \begin{bmatrix} X_{23}\\X_{33} \end{bmatrix} \text {~~and~~}C_2 \begin{bmatrix} X_{22}&X_{23} \end{bmatrix} = \begin{bmatrix} X_{32}&X_{33} \end{bmatrix}. \end{aligned}$$

For the extension we can then take

$$\begin{aligned} X_{13} := X_{12}C_1 \quad \text {and} \quad X_{31} := C_2 X_{21}. \end{aligned}$$

To see that this does not increase the rank we note that

$$\begin{aligned} \begin{bmatrix} X_{12} \\ X_{22} \\ X_{32} \end{bmatrix}C_1 = \begin{bmatrix} X_{13} \\ X_{23} \\ X_{33} \end{bmatrix}, \end{aligned}$$

and similarly for the rows. This means that the number of linearly independent columns and rows has not increased and the rank must be preserved.

Now assume that \({{\mathrm{rank}}}(X_1) = {{\mathrm{rank}}}(X_2)\). We prove uniqueness by means of contradiction. Assume there exist two different extensions

$$\begin{aligned} X_{13} = X_{12}C_1 \quad \text {and} \quad \tilde{X}_{13} = X_{12} \tilde{C}_1. \end{aligned}$$

To be extensions which preserve the rank \(C_1\) and \(\tilde{C}_1\) must satisfy

$$\begin{aligned} \begin{bmatrix} X_{23}\\X_{33} \end{bmatrix} = \begin{bmatrix} X_{22} \\ X_{32} \end{bmatrix}C_1 = \begin{bmatrix} X_{22} \\ X_{32} \end{bmatrix}\tilde{C}_1. \end{aligned}$$

This implies that \(C_1 - \tilde{C}_1\) lies in the nullspace of \(\begin{bmatrix} X_{22}^T&X_{32}^T \end{bmatrix}^T\). But by assumption we have

$$\begin{aligned} {{\mathrm{rank}}}\left( \begin{bmatrix} X_{22} \\ X_{32} \end{bmatrix}\right) = {{\mathrm{rank}}}\left( \begin{bmatrix} X_{12} \\ X_{22} \\ X_{32} \end{bmatrix}\right) . \end{aligned}$$

This implies that \(C_1 - \tilde{C}_1\) must also lie in the nullspace of \(X_{12}\), i.e.

$$\begin{aligned} X_{12}(C_1 - \tilde{C}_1) = 0 \Leftrightarrow X_{13} = \tilde{X}_{13}, \end{aligned}$$

which is a contradiction. \(\square \)

The previous lemma showed that each pair of overlapping blocks has an extension which preserves the rank. If we assume that the blocks are chosen to be connected (in a graph sense) we can iterate this construction to find an extension to the whole matrix.
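The construction in the proof of Lemma 4 can be sketched for a single pair of blocks as follows. This is an illustration under assumptions, not the authors' code: the coefficient matrices \(C_1\) and \(C_2\) are obtained with a pseudoinverse (one of possibly many valid choices), the function name `extend_blocks` is hypothetical, and the test matrix is a random rank-2 matrix split into a 3-by-3 grid of 2-by-2 sub-blocks so that all ranks coincide.

```python
import numpy as np

def extend_blocks(X12, X21, X22, X23, X32, X33):
    """Rank-preserving extension of two overlapping blocks.

    Solves [X22; X32] C1 = [X23; X33] and C2 [X22 X23] = [X32 X33]
    with pseudoinverses, then returns X13 = X12 C1 and X31 = C2 X21.
    """
    C1 = np.linalg.pinv(np.vstack([X22, X32])) @ np.vstack([X23, X33])
    C2 = np.hstack([X32, X33]) @ np.linalg.pinv(np.hstack([X22, X23]))
    return X12 @ C1, C2 @ X21

# Illustrative ground truth: a random 6 x 6 matrix of rank 2.
rng = np.random.default_rng(0)
r = 2
X_true = rng.standard_normal((6, r)) @ rng.standard_normal((r, 6))

parts = [slice(0, 2), slice(2, 4), slice(4, 6)]

def blk(i, j):
    """Sub-block X_ij in the notation of Fig. 24 (1-based indices)."""
    return X_true[parts[i - 1], parts[j - 1]]

X13, X31 = extend_blocks(blk(1, 2), blk(2, 1), blk(2, 2),
                         blk(2, 3), blk(3, 2), blk(3, 3))

# Assemble the completed matrix from the two known blocks and the extension.
X_ext = X_true.copy()
X_ext[parts[0], parts[2]] = X13
X_ext[parts[2], parts[0]] = X31
print(np.linalg.matrix_rank(X_ext))
```

Because the two blocks and the overlap all have the same rank here, the extension is unique and coincides with the corresponding entries of the ground truth matrix.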

Computing the Extension

To find the extension beyond the blocks we employ a nullspace matching scheme which has previously been used in Olsen and Bartoli (2008) and Jacobs (1997). The goal is to find a rank-r factorization \(X = UV^T\) of the full solution given the solution on the blocks. Each block \({\mathcal {P}}_k(X)\) can be factorized as \({\mathcal {P}}_k(X) = U_k V_k^T\). Then \({\mathcal {P}}_k(U)\) (see footnote 3) must lie in the column space of \(U_k\), or equivalently it must be orthogonal to the complement, i.e. \((U_k^\perp )^T {\mathcal {P}}_k(U) = 0\). We can also write this as

$$\begin{aligned} A_k U = \big [~0 \quad (U_k^\perp )^T \quad 0~\big ]~U = 0. \end{aligned}$$

Collecting these into a single matrix equation, \(AU = 0\), we can find U by minimizing \(\Vert AU \Vert \). Since the scale of U is arbitrary we can consider this as a homogeneous least squares problem, which can be solved using SVD. Once U is known, we can find V by minimizing \(\Vert W\odot (X-UV^T)\Vert \).
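The nullspace matching step can be sketched as below. This is a minimal illustration under assumptions rather than the authors' implementation: the function name `nullspace_matching`, the representation of blocks as (row indices, column indices) pairs, and the binary mask `W` are all choices made for this example, and the blocks are assumed to be exactly rank r and to overlap enough for the nullspace of A to have dimension r.

```python
import numpy as np

def nullspace_matching(X_obs, W, blocks, r):
    """Recover a rank-r factorization X = U V^T from solutions on blocks.

    X_obs  : matrix holding the block solutions (values elsewhere are ignored)
    W      : binary mask, 1 where X_obs is known
    blocks : list of (row_indices, col_indices) pairs
    """
    m, n = X_obs.shape
    constraints = []
    for rows, cols in blocks:
        block = X_obs[np.ix_(rows, cols)]
        # Left singular vectors beyond the first r span the complement of U_k.
        Uk_full, _, _ = np.linalg.svd(block, full_matrices=True)
        Uk_perp = Uk_full[:, r:]
        A_k = np.zeros((Uk_perp.shape[1], m))
        A_k[:, rows] = Uk_perp.T  # A_k = [0  (U_k^perp)^T  0]
        constraints.append(A_k)
    A = np.vstack(constraints)
    # Homogeneous least squares: right singular vectors of A that belong
    # to the r smallest singular values.
    _, _, Vt = np.linalg.svd(A, full_matrices=True)
    U = Vt[-r:].T
    # With U fixed, each row of V solves a least squares problem over the
    # observed entries of the corresponding column of X.
    V = np.zeros((n, r))
    for j in range(n):
        obs = W[:, j] > 0
        V[j] = np.linalg.lstsq(U[obs], X_obs[obs, j], rcond=None)[0]
    return U, V

# Illustrative example: rank-2 matrix observed on two overlapping 4 x 4 blocks.
rng = np.random.default_rng(1)
r = 2
X_true = rng.standard_normal((6, r)) @ rng.standard_normal((r, 6))
blocks = [(np.arange(0, 4), np.arange(0, 4)), (np.arange(2, 6), np.arange(2, 6))]
W = np.zeros_like(X_true)
for rows, cols in blocks:
    W[np.ix_(rows, cols)] = 1.0
X_obs = W * X_true

U, V = nullspace_matching(X_obs, W, blocks, r)
X_hat = U @ V.T
print(np.max(np.abs(X_hat - X_true)))
```

In this noise-free setting the product \(UV^T\) reproduces the entries outside the blocks as well; with noisy block solutions the same steps give a least squares fit instead of an exact match.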


Cite this article

Larsson, V., Olsson, C. Convex Low Rank Approximation. Int J Comput Vis 120, 194–214 (2016). https://doi.org/10.1007/s11263-016-0904-7



  • Low rank approximation
  • Convex relaxation
  • Convex envelope
  • Structure from motion