## Abstract

Low rank approximation is an important tool in many applications. Given an observed matrix with elements corrupted by Gaussian noise, it is possible to find the best approximating matrix of a given rank through singular value decomposition. However, due to the non-convexity of the formulation, it is not possible to incorporate additional knowledge of the sought matrix without resorting to heuristic optimization techniques. In this paper we propose a convex formulation that is more flexible, in that it can be combined with any other convex constraints and penalty functions. The formulation uses the so-called convex envelope, which is the provably best possible convex relaxation. We show that for a general class of problems the envelope can be efficiently computed and may in some cases even have a closed-form expression. We test the algorithm on a number of real and synthetic data sets and show state-of-the-art results.

## Notes

- 1. Since it is possible to restrict the minimization in *X* to a compact set, the existence of a saddle point can be guaranteed (see Rockafellar (1997) for details).
- 2. Note that we choose \(g(k) = \mu k\) here to allow for a comparison with the nuclear norm. In general, when we solve the missing data problem we use \(g(k) = \mathbb {I}(k \le r_0)\).
- 3. Here \({\mathcal {P}}_k(U)\) denotes the rows corresponding to block *k*.
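Note 2 above contrasts the affine penalty \(g(k) = \mu k\) (which corresponds to the nuclear norm) with the hard constraint \(g(k) = \mathbb {I}(k \le r_0)\). As a point of reference only, the sketch below shows the two baseline singular value operations these choices are usually compared against, soft thresholding (Cai et al. 2010) and hard rank truncation (Eckart and Young 1936); it does not implement the paper's convex-envelope proximal operator.

```python
import numpy as np

def svt(Y, mu):
    """Proximal operator of the nuclear norm mu*||X||_*:
    soft-threshold the singular values of Y by mu."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return U @ np.diag(np.maximum(s - mu, 0.0)) @ Vt

def hard_rank(Y, r0):
    """Best rank-r0 approximation (Eckart-Young): keep the r0
    largest singular values and zero out the rest."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    s[r0:] = 0.0
    return U @ np.diag(s) @ Vt
```

The soft threshold shrinks every singular value by \(\mu\), while the hard truncation leaves the retained singular values unbiased, which is the motivation given in the paper for moving beyond the nuclear norm.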

## References

- Andersson, F., Carlsson, M., Tourneret, J. Y., & Wendt, H. (2014). A new frequency estimation method for equally and unequally spaced data. *IEEE Transactions on Signal Processing*, 62(21), 5761–5774.
- Angst, R., Zach, C., & Pollefeys, M. (2011). The generalized trace-norm and its application to structure-from-motion problems. In *International Conference on Computer Vision*.
- Aguiar, P. M. Q., Stosic, M., & Xavier, J. M. F. (2008). Spectrally optimal factorization of incomplete matrices. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Argyriou, A., Foygel, R., & Srebro, N. (2012). Sparse prediction with the k-support norm. In *Advances in Neural Information Processing Systems*.
- Basri, R., Jacobs, D., & Kemelmacher, I. (2007). Photometric stereo with general, unknown lighting. *International Journal of Computer Vision*, 72(3), 239–257.
- Boyd, S., Parikh, N., Chu, E., Peleato, B., & Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. *Foundations and Trends in Machine Learning*, 3(1), 1–122.
- Bregler, C., Hertzmann, A., & Biermann, H. (2000). Recovering non-rigid 3D shape from image streams. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Buchanan, A. M., & Fitzgibbon, A. W. (2005). Damped Newton algorithms for matrix factorization with missing data. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Cabral, R., de la Torre, F., Costeira, J., & Bernardino, A. (2013). Unifying nuclear norm and bilinear factorization approaches for low-rank matrix decomposition. In *International Conference on Computer Vision*.
- Cai, J. F., Candès, E. J., & Shen, Z. (2010). A singular value thresholding algorithm for matrix completion. *SIAM Journal on Optimization*, 20(4), 1956–1982.
- Candès, E. J., Li, X., Ma, Y., & Wright, J. (2011). Robust principal component analysis? *Journal of the ACM*, 58(3), 11:1–11:37.
- Eckart, C., & Young, G. (1936). The approximation of one matrix by another of lower rank. *Psychometrika*, 1(3), 211–218.
- Eriksson, A., & Hengel, A. (2012). Efficient computation of robust weighted low-rank matrix approximations using the \(L_1\) norm. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 34(9), 1681–1690.
- Eriksson, A., Thanh, P. T., Chin, T. J., & Reid, I. (2015). The k-support norm and convex envelopes of cardinality and rank. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Favaro, P., Vidal, R., & Ravichandran, A. (2011). A closed form solution to robust subspace estimation and clustering. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Fazel, M., Hindi, H., & Boyd, S. P. (2001). A rank minimization heuristic with application to minimum order system approximation. In *American Control Conference*.
- Garg, R., Roussos, A., & Agapito, L. (2013). Dense variational reconstruction of non-rigid surfaces from monocular video. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Garg, R., Roussos, A., & Agapito, L. (2013). A variational approach to video registration with subspace constraints. *International Journal of Computer Vision*, 104(3), 286–314.
- Gillis, N., & Glineur, F. (2011). Low-rank matrix approximation with weights or missing data is NP-hard. *SIAM Journal on Matrix Analysis and Applications*, 32(4).
- Hu, Y., Zhang, D., Ye, J., Li, X., & He, X. (2013). Fast and accurate matrix completion via truncated nuclear norm regularization. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 35(9), 2117–2130. doi:10.1109/TPAMI.2012.271.
- Jacobs, D. (1997). Linear fitting with missing data: Applications to structure-from-motion and to characterizing intensity images. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Jojic, V., Saria, S., & Koller, D. (2011). Convex envelopes of complexity controlling penalties: The case against premature envelopment. In *International Conference on Artificial Intelligence and Statistics*.
- Ke, Q., & Kanade, T. (2005). Robust \(L_1\) norm factorization in the presence of outliers and missing data by alternative convex programming. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Keshavan, R. H., Montanari, A., & Oh, S. (2010). Matrix completion from a few entries. *IEEE Transactions on Information Theory*, 56(6), 2980–2998.
- Lai, H., Pan, Y., Lu, C., Tang, Y., & Yan, S. (2014). Efficient k-support matrix pursuit. In *European Conference on Computer Vision*, vol. 8690.
- Larsson, V., Bylow, E., Olsson, C., & Kahl, F. (2014). Rank minimization with structured data patterns. In *European Conference on Computer Vision*.
- Larsson, V., & Olsson, C. (2015). Convex envelopes for low rank approximation. In *International Conference on Energy Minimization Methods in Computer Vision and Pattern Recognition*.
- Lin, Z., Chen, M., & Ma, Y. (2010). The augmented Lagrange multiplier method for exact recovery of corrupted low rank matrices. *Mathematical Programming*. http://arxiv.org/abs/1009.5055.
- Mazumder, R., Hastie, T., & Tibshirani, R. (2010). Spectral regularization algorithms for learning large incomplete matrices. *Journal of Machine Learning Research*, 11, 2287–2322.
- McDonald, A. M., Pontil, M., & Stamos, D. (2014). Spectral k-support norm regularization. In *Advances in Neural Information Processing Systems*.
- Okatani, T., & Deguchi, K. (2007). On the Wiberg algorithm for factorization with missing data. *International Journal of Computer Vision*, 72(3), 329–337.
- Okatani, T., Yoshida, T., & Deguchi, K. (2011). Efficient algorithm for low-rank matrix factorization with missing components and performance comparison of latest algorithms. In *International Conference on Computer Vision*.
- Olsen, S., & Bartoli, A. (2008). Implicit non-rigid structure-from-motion with priors. *Journal of Mathematical Imaging and Vision*, 31(2–3), 233–244. doi:10.1007/s10851-007-0060-3.
- Olsson, C., & Oskarsson, M. (2009). A convex approach to low rank matrix approximation with missing data. In *Scandinavian Conference on Image Analysis*.
- Recht, B., Fazel, M., & Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. *SIAM Review*, 52(3), 471–501.
- Rockafellar, R. (1997). *Convex Analysis*. Princeton: Princeton University Press.
- Strelow, D. (2012). General and nested Wiberg minimization. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Sturm, J. F. (1999). Using SeDuMi 1.02, a Matlab toolbox for optimization over symmetric cones. *Optimization Methods and Software*, 11–12, 625–653.
- The MOSEK optimization toolbox for MATLAB manual. (2016). www.mosek.com.
- Tomasi, C., & Kanade, T. (1992). Shape and motion from image streams under orthography: A factorization method. *International Journal of Computer Vision*, 9(2), 137–154.
- Wang, S., Liu, D., & Zhang, Z. (2013). Nonconvex relaxation approaches to robust matrix recovery. In *International Joint Conference on Artificial Intelligence*.
- Wiberg, T. (1976). Computation of principal components when data are missing. In *Proceedings of the Second Symposium on Computational Statistics*.
- Yan, J., & Pollefeys, M. (2008). A factorization-based approach for articulated nonrigid shape, motion and kinematic chain recovery from video. *IEEE Transactions on Pattern Analysis and Machine Intelligence*, 30(5), 865–877.
- Zheng, Y., Liu, G., Sugimoto, S., Yan, S., & Okutomi, M. (2012). Practical low-rank matrix approximation under robust \(L_1\)-norm. In *IEEE Conference on Computer Vision and Pattern Recognition*.
- Zou, H., & Hastie, T. (2005). Regularization and variable selection via the elastic net. *Journal of the Royal Statistical Society B*, 67, 301–320.

## Acknowledgments

This work has been funded by the Swedish Research Council (Grant No. 2012-4213) and the Crafoord Foundation. This paper has benefited from discussions with Erik Bylow and Fredrik Kahl.

## Additional information

Communicated by Daniel Cremers.

## Appendix


### The Sequence of Unconstrained Minimizers

In this section we prove that with the definitions of *p* and *q* as in Sect. 3.1 the sequence of unconstrained minimizers defined by Eq. (49) will have the shape illustrated in Fig. 6.

### Lemma 2

If *p* and *q* are selected such that the sequence \(\{s_i\}\) defined by (49) is non-increasing for \(i \le p\) and \(q \le i\), non-decreasing for \(p \le i \le q\), and \(s_p < s_q\), then

### *Proof*

We first note that (49) can equivalently be written

We begin with the case \(i \le p\). Since

we have by (84) that \(s_p = \max (\sqrt{g_p},\sigma _p(Y)) < (\rho +1)\sigma _p(Y)\) and therefore \(\sqrt{g_p} \le (\rho +1)\sigma _p(Y)\). Furthermore, since \(g_i\) is non-decreasing and \(\sigma _i(Y)\) is non-increasing, \(\sqrt{g_i} \le (\rho +1)\sigma _i(Y)\) for all \(i \le p\). For \(i<p\) we now get that if \(s_i > s_p\) then \(s_i > \sqrt{g_p} \ge \sqrt{g_i}\), and therefore by (84) \(s_i = \sigma _i(Y)\).

For \(i \ge q\) we similarly have

which together with (84) gives that \(s_q = \min \left( \sqrt{g_q}, (\rho +1)\sigma _q(Y)\right) \) and \(\sqrt{g_q} \ge \sigma _q(Y)\). Moreover, since \(g_i\) is non-decreasing and \(\sigma _i(Y)\) is non-increasing, \(\sqrt{g_i} \ge \sigma _i(Y)\) for all \(i \ge q\). For \(i>q\) we now get that if \(s_i < s_q\) then \(s_i < \sqrt{g_q} \le \sqrt{g_i}\), and therefore by (84) \(s_i = (\rho +1)\sigma _i(Y)\).

For the final case \(p\le i\le q\) we note that \(s_p \le s_i \le s_q\) since \(s_i\) is non-decreasing between *p* and *q*. If \(s_i > s_p\) then

If \(s_i < s_q\) then

Therefore, if \(s_p < s_i < s_q\), then \(s_i = \sqrt{g_i}\). \(\square \)

### Properties of Feasible Minimizers

In this section we give a result that enables us to efficiently search for optimal sequences of singular values. The key observation is that for concave costs the optimum is either in a stationary point (determined by minimizing each singular value separately) or constrained by one of its neighboring singular values. Using this information it is possible to single out a 1-parameter family of singular value configurations guaranteed to contain the optimal one.

We let \(s_i\) be a sequence of non-negative numbers. For \(i \le p\) we require that the sequence is non-increasing, for \(p \le i \le q\) non-decreasing, and for \(q \le i\) non-increasing. Note that by this definition \(s_p\) and \(s_q\) are local extreme points of the sequence (\(s_{p-1} \ge s_{p} \le s_{p+1}\) and \(s_{q-1} \le s_{q} \ge s_{q+1}\)).

### Lemma 3

Let \(\{s_i\}\) be the unconstrained maximizers of \(f_i(s)\), where the \(f_i\) are strictly concave (and therefore have unique maximizers). Then the maximizer of \(g({\varvec{\sigma }}) = \sum _i f_i(\sigma _i)\), subject to \(\sigma _1 \ge \sigma _2 \ge \cdots \ge \sigma _n\), fulfills

### *Proof*

Since each \(f_i\) is (strictly) concave and \(\sigma _{i+1} \le \sigma _{i-1}\) the optimization over \(\sigma _i\) can be limited to three choices

Using induction we first prove the recursion

For \(i=1\) we see from (92) that \(s_1\) is the optimal choice if \(s_1 > \sigma _{2}\); otherwise \(\sigma _{2}\) is optimal. Therefore \(\sigma _1 = \max (s_1,\sigma _2)\). Next assume that \(\sigma _{i-1} = \max (s_{i-1},\sigma _i)\) for some \(i \le p\). Then

therefore we can ignore the second case in (92), which proves the recursion (93).

Next we show that the sequence \(\{\sigma _i\}\) is constant in \(p \le i \le q\). We consider \(\sigma _i\) for some \(p \le i \le q-1\). If \(\sigma _i > s_i\) it must have been bounded from below in (92), i.e. \(\sigma _i = \sigma _{i+1}\). If instead \(\sigma _i \le s_i\) we have \(\sigma _{i+1} \le \sigma _i \le s_i \le s_{i+1}\). Then similarly \(\sigma _{i+1}\) is bounded from above in (92) which implies \(\sigma _{i+1} = \sigma _i\).

For the final part we consider \(i \ge q\) and show that

It is clear from (92) that this holds for \(i = n\). We continue using induction by assuming \(\sigma _{i+1} = \min (s_{i+1}, \sigma _i)\) holds. Then

since the \(s_i\) are non-increasing for \(i \ge q\). This means that for \(\sigma _i\) we can ignore the third case in (92), and it follows that \(\sigma _{i} = \min (s_{i}, \sigma _{i-1})\). Hence (95) holds for all \(i \ge q\). \(\square \)

### Theorem 2

Then the maximizer \({\varvec{\sigma }}\) can be written

where *s* fulfills \(s_p \le s \le s_q\).

### *Proof*

We first consider \(i \le p\). Assume \(\sigma _i \ne s_i\) for some \(i < p\). From (93) it follows that

But \(s_i\) is non-increasing for \(i \le p\) which implies that \(\sigma _{i+1} > s_{i+1}\). By repeating the argument it follows that

We let \(s = \sigma _p\) and note that due to (93) \(s \ge s_p\). By Lemma 3 we also have

Therefore \(s=\sigma _{q}\) and by (95) we get \(s\le s_q\).

Now assume that for some \(i \ge q\) we have \(\sigma _i \ne s_i\). By (95) we must have that

By repeating the argument we get

and the result follows. \(\square \)
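The recursions (93) and (95) together with Theorem 2 suggest a simple numerical construction: for a fixed plateau value \(s\) with \(s_p \le s \le s_q\), the candidate maximizer is obtained by running the max-recursion backwards over \(i < p\) and the min-recursion forwards over \(i > q\). The NumPy sketch below illustrates this structure only; the function name and interface are our own, not the paper's.

```python
import numpy as np

def candidate_sequence(s_unc, p, q, s):
    """Build the candidate maximizer of Theorem 2 for plateau value s.

    s_unc : unconstrained maximizers s_i (0-indexed array), assumed
            non-increasing up to p, non-decreasing on [p, q], and
            non-increasing from q on (p, q are 1-indexed as in the text).
    Returns a non-increasing sequence sigma with
        sigma_i = max(s_i, sigma_{i+1})  for i < p   (recursion (93)),
        sigma_i = s                      for p <= i <= q,
        sigma_i = min(s_i, sigma_{i-1})  for i > q   (recursion (95)).
    """
    n = len(s_unc)
    sigma = np.empty(n)
    sigma[p - 1:q] = s                 # constant middle block
    for i in range(p - 2, -1, -1):     # backwards recursion for i < p
        sigma[i] = max(s_unc[i], sigma[i + 1])
    for i in range(q, n):              # forwards recursion for i > q
        sigma[i] = min(s_unc[i], sigma[i - 1])
    return sigma
```

Sweeping \(s\) over \([s_p, s_q]\) then traces out the 1-parameter family of feasible configurations that, by Theorem 2, contains the optimal one.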

### Extension Outside the Blocks

In this section we show how to extend a partial low rank solution computed on overlapping blocks of the matrix to a complete solution. The approach hinges on the following result.

### Lemma 4

Let \(X_1\) and \(X_2\) be two overlapping blocks such that they agree on the overlap \(X_{22}\) (in the notation from Fig. 24). If the overlap satisfies

then there exist \(X_{13}\) and \(X_{31}\) such that

Furthermore, if \({{\mathrm{rank}}}(X_1) = {{\mathrm{rank}}}(X_2)\) the extension is unique.

### *Proof*

Without loss of generality assume that \({{\mathrm{rank}}}(X_{22})={{\mathrm{rank}}}(X_2) \le {{\mathrm{rank}}}(X_1)\). Then the column space of \(X_2\) must be spanned by \(\begin{bmatrix} X_{22} \\ X_{32} \end{bmatrix}\) and similarly the row space by \(\begin{bmatrix} X_{22}&X_{23} \end{bmatrix}\). There exist coefficient matrices \(C_1\) and \(C_2\) such that

For the extension we can then take

To see that this does not increase the rank we note that

and similarly for the rows. This means that the number of linearly independent columns and rows has not increased, and the rank must be preserved.

Now assume that \({{\mathrm{rank}}}(X_1) = {{\mathrm{rank}}}(X_2)\). We prove uniqueness by means of contradiction. Assume there exist two different extensions

To be rank-preserving extensions, \(C_1\) and \(\tilde{C}_1\) must satisfy

This implies that \(C_1 - \tilde{C}_1\) lies in the nullspace of \(\begin{bmatrix} X_{22}^T&X_{32}^T \end{bmatrix}^T\). But by assumption we have

This implies that \(C_1 - \tilde{C}_1\) must also lie in the nullspace of \(X_{12}\), i.e.

which is a contradiction. \(\square \)

The previous lemma showed that each pair of overlapping blocks has an extension which preserves the rank. If we assume that the blocks are chosen to be connected (in a graph sense) we can iterate this construction to find an extension to the whole matrix.
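As a concrete illustration of the construction in the proof of Lemma 4, the sketch below fills in the two unknown corner blocks with NumPy. The block layout is our reading of Fig. 24 (not reproduced here): \(X_1\) is the top-left group of blocks, \(X_2\) the bottom-right one, they share \(X_{22}\), and the coefficient matrices are assumed to solve \(X_{23} = X_{22}C_2\) and \(X_{32} = C_1 X_{22}\).

```python
import numpy as np

def extend_blocks(X11, X12, X21, X22, X23, X32, X33):
    """Fill the two unobserved corner blocks of
        X = [[X11, X12, X13],
             [X21, X22, X23],
             [X31, X32, X33]]
    given the overlapping low-rank blocks X1 = [[X11, X12], [X21, X22]]
    and X2 = [[X22, X23], [X32, X33]].
    Assumes rank(X22) = rank(X2) <= rank(X1), as in the proof.
    """
    # X23 = X22 @ C2: columns of X23 as combinations of columns of X22.
    C2 = np.linalg.lstsq(X22, X23, rcond=None)[0]
    # X32 = C1 @ X22: rows of X32 as combinations of rows of X22.
    C1 = np.linalg.lstsq(X22.T, X32.T, rcond=None)[0].T
    X13 = X12 @ C2          # extend the column relation to the top rows
    X31 = C1 @ X21          # extend the row relation to the left columns
    return np.block([[X11, X12, X13],
                     [X21, X22, X23],
                     [X31, X32, X33]])
```

When \({{\mathrm{rank}}}(X_{22}) = {{\mathrm{rank}}}(X_1) = {{\mathrm{rank}}}(X_2)\), the lemma guarantees this completion is the unique rank-preserving one.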

#### Computing the Extension

To find the extension beyond the blocks we employ a nullspace matching scheme which has previously been used in Olsen and Bartoli (2008) and Jacobs (1997). The goal is to find a rank *r* factorization \(X = UV^T\) of the full solution, given the solution on the blocks. Each block \({\mathcal {P}}_k(X)\) can be factorized as \({\mathcal {P}}_k(X) = U_k V_k^T\). Then \({\mathcal {P}}_k(U)\) (see Note 3) must lie in the column space of \(U_k\), or equivalently it must be orthogonal to its complement, i.e. \((U_k^\perp )^T {\mathcal {P}}_k(U) = 0\). We can also write this as

Collecting these constraints into a matrix \(A\), so that \(AU = 0\), we can find *U* by minimizing \(\Vert AU \Vert \). Since the scale of *U* is arbitrary, we can treat this as a homogeneous least squares problem and solve it using the SVD. For known *U* we then find *V* by minimizing \(\Vert W\odot (X-UV^T)\Vert \).
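A minimal NumPy sketch of this nullspace matching step, under the simplifying assumption that each block covers a subset of the rows of \(X\) (the function and its interface are illustrative, not the paper's implementation):

```python
import numpy as np

def match_nullspaces(blocks, factors, m, r):
    """Recover a global column-space factor U (m x r) from per-block
    factorizations P_k(X) = U_k V_k^T by nullspace matching.

    blocks  : list of row-index arrays (the rows each block covers).
    factors : list of the corresponding U_k (len(rows_k) x r).
    Stacks the constraints (U_k^perp)^T P_k(U) = 0 into A U = 0 and
    returns the r right singular vectors of A with smallest singular
    values (the homogeneous least squares solution).
    """
    A_rows = []
    for rows, Uk in zip(blocks, factors):
        # Orthogonal complement of the column space of U_k.
        Q, _ = np.linalg.qr(Uk, mode='complete')
        Uk_perp = Q[:, Uk.shape[1]:]
        Ak = np.zeros((Uk_perp.shape[1], m))
        Ak[:, rows] = Uk_perp.T       # constraint acts on rows of block k
        A_rows.append(Ak)
    A = np.vstack(A_rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-r:].T                  # m x r basis for the nullspace of A
```

For known *U*, each column of *V* then follows from a (weighted) least squares fit of the observed entries, as described above.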

## About this article

### Cite this article

Larsson, V., & Olsson, C. Convex Low Rank Approximation. *International Journal of Computer Vision*, **120**, 194–214 (2016). https://doi.org/10.1007/s11263-016-0904-7


### Keywords

- Low rank approximation
- Convex relaxation
- Convex envelope
- Structure from motion