# Convex envelopes for fixed rank approximation


## Abstract

A convex envelope for the problem of finding the best approximation to a given matrix with a prescribed rank is constructed. This convex envelope allows the use of traditional optimization techniques when additional constraints are added to the finite rank approximation problem. An expression for the dependence of the convex envelope on the singular values of the given matrix is derived, and global minimization properties are established. The corresponding proximity operator is also studied.

### Keywords

Convex envelope, Rank constraint, Approximation, Proximity operator

## 1 Introduction

*F* and keeping only the *K* largest singular values. However, if additional constraints are added, there will typically be no explicit expression for the best approximation.

*A*), and consider

*g* describes the condition that *A* is a Hankel matrix and *F* is the Hankel matrix generated by some vector *f*, then the minimization problem above is related to that of approximating *f* by *K* exponential functions [9]. This particular case of (1.3) was studied, for instance, in [1].

*K* is obtained.

In contrast to \({\mathcal {R}}_K(A)\), the nuclear norm \(\Vert A\Vert _*\) is a convex function, and hence (1.4) is much easier to solve than (1.3). In fact, the nuclear norm is the convex envelope of the rank function restricted to matrices with operator norm \(\le 1\) [5], which motivates replacing \({\mathcal {R}}_K(A)\) with \(\mu _K \Vert A\Vert _*\) (for a suitable choice of \(\mu _K\)).

*F* individually. In this paper we present explicit expressions for the l.s.c. convex envelope of (1.6) in terms of the singular values \((\alpha _j)_{j=1}^{\min (m,n)}\) of *A*, as well as detailed information about global minimizers. More precisely, in Theorem 1 we show that the l.s.c. convex envelope of (1.6) is given by

\[{\mathcal {I}}^{**}(A)=\frac{1}{k_*}\Big (\sum _{j>K-k_*}\alpha _j\Big )^2-\sum _{j>K-k_*}\alpha _j^2+\Vert A-F\Vert ^2, \qquad (1.7)\]

where \(k_*\) is an integer between 1 and *K* that depends on \(\alpha \) [see (2.1)]. This article also contains further information on how the l.s.c. convex envelope can be used in optimization problems. Since (1.7) is finite at all points, it is also continuous, so we will sometimes write “convex envelope” instead of “l.s.c. convex envelope”.
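The envelope is straightforward to evaluate from the singular values of *A*. The following is a minimal NumPy sketch, not code from this paper. It assumes real matrices, that \(\Vert \cdot \Vert \) is the Frobenius norm and that \(1\le K\le \min (m,n)\); it also takes \(k_*\) to be the largest \(k\in \{1,\dots ,K\}\) with \(\sum _{j>K-k}\alpha _j\ge k\,\alpha _{K+1-k}\), which is our reading of (2.1). The function names `k_star` and `envelope` are hypothetical.

```python
import numpy as np


def k_star(alpha: np.ndarray, K: int) -> int:
    """Largest k in {1, ..., K} with sum_{j>K-k} alpha_j >= k * alpha_{K+1-k}.

    alpha holds the singular values of A in non-increasing order (0-based array,
    so alpha_{K+1-k} is alpha[K - k]). k = 1 always qualifies. This rule is an
    assumed reading of (2.1).
    """
    ks = [k for k in range(1, K + 1) if alpha[K - k:].sum() >= k * alpha[K - k]]
    return max(ks, default=1)


def envelope(A: np.ndarray, F: np.ndarray, K: int) -> float:
    """(1/k*) (sum_{j>K-k*} alpha_j)^2 - sum_{j>K-k*} alpha_j^2 + ||A - F||_F^2."""
    alpha = np.linalg.svd(A, compute_uv=False)  # non-increasing, length min(m, n)
    k = k_star(alpha, K)
    tail = alpha[K - k:]
    return float(tail.sum() ** 2 / k - np.sum(tail ** 2) + np.sum((A - F) ** 2))


rng = np.random.default_rng(0)
m, n, K = 6, 4, 2
F = rng.standard_normal((m, n))

# On matrices of rank <= K the envelope equals ||A - F||^2, i.e. it agrees with (1.6).
A_low = rng.standard_normal((m, K)) @ rng.standard_normal((K, n))
print(np.isclose(envelope(A_low, F, K), np.sum((A_low - F) ** 2)))  # True

# Midpoint convexity along a random segment.
A1, A2 = rng.standard_normal((m, n)), rng.standard_normal((m, n))
mid = envelope((A1 + A2) / 2, F, K)
print(mid <= (envelope(A1, F, K) + envelope(A2, F, K)) / 2 + 1e-9)  # True
```

Working on the singular values keeps the evaluation at the cost of one SVD; the random checks only illustrate the claimed properties and are not a proof.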

The second main result of this note is Theorem 2, where the global minimizers of (1.7) are found. If the *K*th singular value of *F* (denoted \(\phi _K\)) has multiplicity one, then the minimizer of (1.7) is unique and coincides with that of (1.6), given by the Eckart–Young–Schmidt theorem. If \(\phi _K\) has multiplicity *M* and is constant between the indices \(J\le K\le L\), it turns out that the singular values \(\alpha _j\), \(J\le j\le L\), of the global minimizers *A* lie on a certain simplex in \({\mathbb {R}}^M\). We refer to Sect. 3, in particular (3.3), for further details.

*F* has a sufficient gap between the *K*th and \((K+1)\)th singular values.

Since the submission of this article, two related papers [8] and [7] have appeared. In [8] the convex envelope of (1.6) and its proximal operator are computed. In [7] these results are generalized to arbitrary unitarily invariant norms when \(F=0\).

## 2 Fenchel conjugates and the l.s.c. convex envelope

*f* is defined by

\[f^*(B)=\sup _{A}\big (\langle A,B\rangle -f(A)\big ).\]

*A* is achieved for a matrix *A* with the same Schmidt vectors (singular vectors) as *B*, by von Neumann’s inequality [12]. More precisely, denote the singular values of *A* and *B* by \(\alpha \) and \(\beta \), and denote the singular value decomposition by \(A=U_A\Sigma _\alpha V_A^*\), where \(\Sigma _\alpha \) is the \(m\times n\) diagonal matrix whose diagonal holds the \(N=\min (m,n)\) entries of \(\alpha \). We then have:

### Proposition 1

For any \(A,B\in {\mathbb {M}}_{m,n}\) we have \(\langle {A,B}\rangle \le \sum _{j=1}^{N} \alpha _j\beta _j\) with equality if and only if the singular vectors can be chosen such that \(U_A=U_B\) and \(V_A=V_B\).

See [3] for a discussion regarding the proof and the original formulation of von Neumann.
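Proposition 1 is easy to check numerically. The sketch below assumes real matrices and the inner product \(\langle A,B\rangle =\mathrm {tr}(AB^T)\) (for complex matrices one would take the real part of \(\mathrm {tr}(AB^*)\)); the helper names `inner` and `von_neumann_bound` are hypothetical.

```python
import numpy as np


def inner(A: np.ndarray, B: np.ndarray) -> float:
    """Real inner product <A, B> = tr(A B^T) (assumed convention, real matrices)."""
    return float(np.sum(A * B))


def von_neumann_bound(A: np.ndarray, B: np.ndarray) -> float:
    """Right-hand side sum_j alpha_j beta_j of Proposition 1."""
    alpha = np.linalg.svd(A, compute_uv=False)  # both sorted non-increasingly
    beta = np.linalg.svd(B, compute_uv=False)
    return float(alpha @ beta)


rng = np.random.default_rng(0)
A, B = rng.standard_normal((5, 3)), rng.standard_normal((5, 3))
print(inner(A, B) <= von_neumann_bound(A, B) + 1e-12)  # True

# Equality case: give A the singular vectors of B (U_A = U_B, V_A = V_B).
U, beta, Vh = np.linalg.svd(B, full_matrices=False)
alpha = np.sort(rng.random(beta.size))[::-1]  # any non-increasing alpha >= 0
A_aligned = (U * alpha) @ Vh                  # U diag(alpha) V^T
print(np.isclose(inner(A_aligned, B), von_neumann_bound(A_aligned, B)))  # True
```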

### Proposition 2

### Proof

*A* of the expression

*A*, then the last three terms are independent of the singular vectors. By Proposition 1, the maximum value is attained for a matrix *A* which has the same singular vectors as \(F+\frac{B}{2}\). We denote \(\sigma _j\left( F+\frac{B}{2}\right) \) by \(\gamma _j\) and \(\sigma _j(A)\) by \(\alpha _j\), and write \({\mathcal {R}}_K(\alpha )\) in place of \({\mathcal {R}}_K(A)\) (since the singular vectors are irrelevant for this functional). Combining the above results gives

The computation of \({\mathcal {I}}^{**}\) is a bit more involved.
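Before turning to \({\mathcal {I}}^{**}\), the conjugate from the proof of Proposition 2 can be sanity-checked. Assuming \({\mathcal {I}}(A)={\mathcal {R}}_K(A)+\Vert A-F\Vert ^2\) with the Frobenius norm and real matrices (our reading of (1.6)), the identity \(\langle A,B\rangle -\Vert A-F\Vert ^2=\Vert F+\frac{B}{2}\Vert ^2-\Vert F\Vert ^2-\Vert A-(F+\frac{B}{2})\Vert ^2\) shows that the supremum over rank-*K* matrices is attained at a best rank-*K* approximation of \(F+\frac{B}{2}\), with value \(\sum _{j\le K}\gamma _j^2-\Vert F\Vert ^2\). The sketch compares this value with random rank-*K* competitors; it illustrates our derivation and is not a quotation of Proposition 2. All function names are hypothetical.

```python
import numpy as np


def best_rank_k(Y: np.ndarray, K: int) -> np.ndarray:
    """A best rank-K approximation of Y (truncated SVD)."""
    U, s, Vh = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :K] * s[:K]) @ Vh[:K, :]


def conjugate_closed_form(B: np.ndarray, F: np.ndarray, K: int) -> float:
    """sum_{j<=K} gamma_j^2 - ||F||^2 with gamma = sigma(F + B/2) (derived, assumed)."""
    gamma = np.linalg.svd(F + B / 2, compute_uv=False)
    return float(np.sum(gamma[:K] ** 2) - np.sum(F ** 2))


def conjugate_objective(A: np.ndarray, B: np.ndarray, F: np.ndarray) -> float:
    """<A, B> - ||A - F||^2, the quantity maximised over rank-K matrices A."""
    return float(np.sum(A * B) - np.sum((A - F) ** 2))


rng = np.random.default_rng(1)
m, n, K = 6, 4, 2
F, B = rng.standard_normal((m, n)), rng.standard_normal((m, n))

A_best = best_rank_k(F + B / 2, K)  # shares singular vectors with F + B/2
value = conjugate_objective(A_best, B, F)
print(np.isclose(value, conjugate_closed_form(B, F, K)))  # True

# Random rank-K competitors never exceed this value.
print(all(conjugate_objective(rng.standard_normal((m, K)) @ rng.standard_normal((K, n)), B, F)
          <= value + 1e-9 for _ in range(1000)))  # True
```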

### Theorem 1

*A*. Since \({\mathcal {I}}^{**}(A)\) is the largest l.s.c. convex lower bound on \({\mathcal {I}}(A)\), we therefore have \({\mathcal {I}}^{**}(A) \ge \Vert A-F\Vert ^2\), which shows that

### Proof

*f* is clearly differentiable with derivative

*t*. In particular, the sequence \((f'(\alpha _{K+1-k}))_{k=1}^{K}\) is non-increasing and, up to a factor of 2, it equals (2.1), which proves the first claim in the theorem. Moreover, \(f'(\alpha _K)=\sum _{j=K+1}^N\alpha _j\) and \(\lim _{t\rightarrow \infty }f'(t)=-\infty \), whereby it follows that *f* has a maximum in \((\alpha _K,\infty )\) at a point \(t_*\) where \(f'(t_*)=0\). It also follows that \(k_*\) is the largest integer *k* such that \(f'(\alpha _{K+1-k})\ge 0\), and hence \(t_*\) lies in the interval \([\alpha _{K+1-k_*},\alpha _{K-k_*})\) (with the convention \(\alpha _0=\infty \) in case \(k_*=K\)). In this interval we have

## 3 Global minimizers

*K*, i.e.

*The elements of* \(\underset{A}{\mathrm {argmin~}} {\mathcal {I}}(A)\) *are all matrices of the form* \(A_*=U\Sigma _{\tilde{\phi }} V^*\), *where* \(U\Sigma _\phi V^*\) *is any singular value decomposition of* *F*. *In particular,* \(A_*\) *is unique if and only if* \(\phi _K \ne \phi _{K+1}\).
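The Eckart–Young–Schmidt minimizer is a truncated SVD. In the sketch below \(\tilde{\phi }\) is taken to be \(\phi \) with all entries after the *K*th set to zero, which is our reading of the definition preceding this statement; the function name `eckart_young` is hypothetical.

```python
import numpy as np


def eckart_young(F: np.ndarray, K: int):
    """Return A_* = U Sigma_{phi~} V^* and the singular values phi of F."""
    U, phi, Vh = np.linalg.svd(F, full_matrices=False)
    phi_tilde = np.where(np.arange(phi.size) < K, phi, 0.0)  # keep the K largest
    return (U * phi_tilde) @ Vh, phi


rng = np.random.default_rng(3)
F = rng.standard_normal((7, 5))
K = 2
A_star, phi = eckart_young(F, K)

# The minimal value of (1.6) is the energy in the discarded singular values.
best = np.sum((A_star - F) ** 2)
print(np.isclose(best, np.sum(phi[K:] ** 2)))  # True

# No random rank-K matrix does better.
print(all(np.sum((rng.standard_normal((7, K)) @ rng.standard_normal((K, 5)) - F) ** 2) >= best
          for _ in range(500)))  # True
```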

### Theorem 2

Let \(K\in {\mathbb {N}}\) be given, let *F* be a fixed matrix and let \(\phi \) denote its singular values. Let \(\phi _J\) (respectively \(\phi _L\)) be the first (respectively last) singular value that equals \(\phi _K\), and set \(M=L+1-J\) (that is, the multiplicity of \(\phi _K\)). Finally, set \(m=K+1-J\) (that is, the multiplicity of \(\tilde{\phi }_K\)). Figure 1 illustrates the setup.

*F*, and \(\alpha \) is a non-increasing sequence satisfying:

*K* and the maximal rank is *L*. In particular, \(A_*\) is unique if and only if \(\phi _K \ne \phi _{K+1}\).
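A small numerical illustration of Theorem 2. We assume that (3.3) prescribes \(\alpha _j=\phi _j\) for \(j<J\), \(\alpha _j=0\) for \(j>L\), and, for \(J\le j\le L\), non-increasing values in \([0,\phi _K]\) summing to \(m\phi _K\); this is our reading of the simplex condition, based on the rank argument in the proof. Spreading \(m\phi _K\) evenly over the indices \(J,\dots ,L\) gives a matrix of rank \(L>K\) (so \({\mathcal {I}}=\infty \) there) at which the envelope (1.7) nevertheless attains the minimum \(\sum _{j>K}\phi _j^2\). The rule for \(k_*\) below is our reading of (2.1), and the example data are made up.

```python
import numpy as np


def envelope(A: np.ndarray, F: np.ndarray, K: int) -> float:
    """(1/k*) (sum_{j>K-k*} alpha_j)^2 - sum_{j>K-k*} alpha_j^2 + ||A - F||_F^2."""
    alpha = np.linalg.svd(A, compute_uv=False)
    # k* = largest k <= K with sum_{j>K-k} alpha_j >= k * alpha_{K+1-k} (assumed (2.1))
    k = max((k for k in range(1, K + 1) if alpha[K - k:].sum() >= k * alpha[K - k]), default=1)
    tail = alpha[K - k:]
    return float(tail.sum() ** 2 / k - np.sum(tail ** 2) + np.sum((A - F) ** 2))


rng = np.random.default_rng(4)
U, _ = np.linalg.qr(rng.standard_normal((6, 6)))  # random orthogonal factors
V, _ = np.linalg.qr(rng.standard_normal((6, 6)))

phi = np.array([3.0, 2.0, 1.0, 1.0, 1.0, 0.5])
K, J, L = 4, 3, 5               # 1-based: phi_3 = phi_4 = phi_5 = phi_K = 1
M, m = L + 1 - J, K + 1 - J     # M = 3, m = 2
F = U @ np.diag(phi) @ V.T

alpha = phi.copy()
alpha[J - 1:L] = m * phi[K - 1] / M  # alpha_J = ... = alpha_L = 2/3
alpha[L:] = 0.0
A = U @ np.diag(alpha) @ V.T        # rank L = 5 > K

print(np.isclose(envelope(A, F, K), np.sum(phi[K:] ** 2)))  # True: both equal 1.25
```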

### Proof

The fact that the minimum values of \({\mathcal {I}}\) and \({\mathcal {I}}^{**}\) coincide follows immediately since \({\mathcal {I}}^{**}\) is the l.s.c. convex envelope of \({\mathcal {I}}\), and the fact that this value is \(\sum _{j>K}\phi _j^2\) follows by the Eckart–Young–Schmidt theorem.

*U* and *V* such that \(A_*=U\Sigma _\alpha V^*\) and \(F=U\Sigma _\phi V^*\) are singular value decompositions of \(A_*\) and *F*, respectively. Set \(\tilde{F}=U\Sigma _{\tilde{\phi }}V^*\) and note that \(\tilde{F}\) is also a minimizer of (3.2), by the first part of the proof. Since \({\mathcal {I}}^{**}\) is the l.s.c. convex envelope of \({\mathcal {I}}\), it follows that all matrices

*A*(*t*) equals \(\tilde{\phi }+t\epsilon =t\alpha +(1-t)\tilde{\phi }\), which is non-increasing for all \(t\in [0,1]\), being a weighted mean of two non-increasing sequences. Since \(\tilde{F}\) satisfies (3.3), it follows that

*A*(*t*) satisfies (3.3) if and only if

*t*, and hence it suffices to prove (3.3) for some fixed \(A(t_0)\) in order for \(A_*=A(1)\) to satisfy (3.3) as well. In other words, we may assume that \(\epsilon \) in (3.5) is arbitrarily small [by redefining \(A_*\) to equal \(A(t_0)\)]. With this at hand, we evaluate the testing condition for \(k_*\) [recall (2.1)] at \(k=m+1\):

Finally, the uniqueness statement is immediate. Clearly we can pick \(\alpha _j\) in accordance with (3.3) to get *L* non-zero entries, but not more, so the maximal possible rank is *L*. In order to have as few non-zero entries as possible, the condition \(\sum _{j=J}^L\alpha _j =\phi _K m\) together with \(\alpha _j\le \phi _K\) for \(J\le j\le L\) clearly forces at least *m* non-zero entries in \(J\le j\le L\), so the minimal possible rank is \(J-1+m=K\). \(\square \)

## 4 The proximal operator

### Theorem 3

*s* is found by minimizing the convex function

*s*, \(k_1\) is the smallest index with \(\phi _{k_1}<s\) and \(k_2\) is the last index with \(\phi _{k_2}>\frac{s}{1+\rho }\). In particular, \(\alpha \) is a non-increasing sequence and \(\alpha \le \phi \). In other words, the proximal operator is a contraction.

The theorem can be deduced by working directly with the expression for \({\mathcal {I}}^{**}\), but it turns out to be easier to follow the approach in [10], which is based on the minimax theorem and an analysis of the simpler functional \({\mathcal {I}}^*\). Note in particular that the proximal operator (given by Theorem 3) reduces to the “Eckart–Young approximation” (3.1) if \(\phi _K\ge (1+\rho )\phi _{K+1}\).
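The structure of Theorem 3 can be turned into a short routine acting on the singular values \(\phi \) of *F*. This sketch is our own reconstruction, not code from the paper. We read the proximal operator as \(\mathrm {argmin}_A\, {\mathcal {I}}^{**}(A)+\rho \Vert A-F\Vert ^2\), which is consistent with the relation \(A=((1+\rho )F-Z)/\rho \) in the proof. We take the convex function of *s* to be \(g(s)=(1+\rho )\sum _{j\le K}(\max (\phi _j,s)-\phi _j)^2+\sum _{j>K}(\min ((1+\rho )\phi _j,s)-(1+\rho )\phi _j)^2\), derived from the dual problem; it may differ from (4.3) by constants or scaling. The routine finds *s* by bisection on \(g'\) over \([\phi _K,(1+\rho )\phi _{K+1}]\) and then applies the three-piece rule with \(k_1\) and \(k_2\). The names `prox_singular_values` and `objective` are hypothetical.

```python
import numpy as np


def prox_singular_values(phi, K: int, rho: float, iters: int = 200) -> np.ndarray:
    """Singular values alpha of argmin_A I**(A) + rho ||A - F||^2 (our reconstruction).

    phi: singular values of F, non-increasing; assumes 1 <= K <= len(phi), rho > 0.
    The minimizer shares singular vectors with F, so A = U diag(alpha) V^*.
    """
    phi = np.asarray(phi, dtype=float)
    head, tail = phi[:K], phi[K:]
    if tail.size == 0 or phi[K - 1] >= (1 + rho) * phi[K]:
        return np.where(np.arange(phi.size) < K, phi, 0.0)  # Eckart-Young case

    def dg(s: float) -> float:
        # derivative of the assumed convex function g(s), divided by 2
        return ((1 + rho) * np.sum(np.maximum(s - head, 0.0))
                - np.sum(np.maximum((1 + rho) * tail - s, 0.0)))

    lo, hi = phi[K - 1], (1 + rho) * phi[K]  # dg(lo) < 0 <= dg(hi)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dg(mid) < 0:
            lo = mid
        else:
            hi = mid
    s = 0.5 * (lo + hi)
    # alpha_j = phi_j if phi_j >= s; ((1+rho) phi_j - s)/rho in between; 0 if phi_j <= s/(1+rho)
    shrunk = np.clip(((1 + rho) * phi - s) / rho, 0.0, None)
    return np.where(phi >= s, phi, shrunk)


def objective(alpha, phi, K: int, rho: float) -> float:
    """I**(A) + rho ||A - F||^2 when A and F share singular vectors (assumed (1.7))."""
    alpha, phi = np.asarray(alpha, dtype=float), np.asarray(phi, dtype=float)
    k = max((k for k in range(1, K + 1) if alpha[K - k:].sum() >= k * alpha[K - k]), default=1)
    tail = alpha[K - k:]
    return float(tail.sum() ** 2 / k - np.sum(tail ** 2) + (1 + rho) * np.sum((alpha - phi) ** 2))


alpha = prox_singular_values([3.0, 1.2, 1.0, 0.4], K=2, rho=0.5)
print(alpha)  # values approx. 3.0, 0.96, 0.36, 0.0 (here s = 1.32)
```

Bisection is used because \(g'\) is continuous, piecewise linear and non-decreasing, so no case analysis over \(k_1,k_2\) is needed; `objective` lets one compare the result against perturbed candidates.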

### Proof

*A* is outside the compact convex set \({\mathcal {C}}=\{A:\Vert A-F\Vert \le \Vert F\Vert \}\) [recall (2.3)]. This, combined with Proposition 2 and some algebraic simplifications, shows that

*f*(*A*, *Z*), and note that by construction it is convex in *A* and concave in *Z*. By Sion’s minimax theorem [15] the order of \(\max \) and \(\min \) can be switched (giving the relation \(A=((1+\rho )F-Z)/\rho \)), and the above \(\min \max \) thus equals

*Z* shares singular vectors with *F*, so the problem reduces to that of minimizing

*s* is a parameter between \(\phi _K\) and \((1+\rho )\phi _{K+1}\). Inserting this in the previous expression gives (4.3), and the appropriate value of *s* is easily found. Let \(k_1\) (resp. \(k_2\)) be the first (resp. last) index where *s* shows up in \(\zeta \). Formula (4.2) is now an easy consequence of (4.6). \(\square \)

## 5 Conclusions

We have analyzed and derived expressions for how to compute the l.s.c. convex envelope corresponding to the problem of finding the best approximation to a given matrix with a prescribed rank. These expressions work directly on the singular values.

## Notes

### Acknowledgements

This research is partially supported by the Swedish Research Council, Grants Nos. 2011-5589, 2012-4213 and 2015-03780; and the Crafoord Foundation. We are also indebted to an anonymous referee for significantly improving the manuscript.

### References

- 1. Andersson, F., Carlsson, M., Tourneret, J.-Y., Wendt, H.: A new frequency estimation method for equally and unequally spaced data. IEEE Trans. Signal Process. **62**(21), 5761–5774 (2014)
- 2. Carlsson, M.: On convexification/optimization of functionals including an l2-misfit term. arXiv preprint arXiv:1609.09378 (2016)
- 3. de Sá, E.M.: Exposed faces and duality for symmetric and unitarily invariant norms. Linear Algebra Appl. **197**, 429–450 (1994)
- 4. Eckart, C., Young, G.: The approximation of one matrix by another of lower rank. Psychometrika **1**(3), 211–218 (1936)
- 5. Fazel, M.: Matrix rank minimization with applications. PhD thesis, Stanford University (2002)
- 6. Grussler, C., Rantzer, A.: On optimal low-rank approximation of non-negative matrices. In: 2015 54th IEEE Conference on Decision and Control (CDC), pp. 5278–5283 (2015)
- 7. Grussler, C., Giselsson, P.: Low-rank inducing norms with optimality interpretations. CoRR arXiv:1612.03186 (2016)
- 8. Grussler, C., Rantzer, A., Giselsson, P.: Low-rank optimization with convex constraints. CoRR arXiv:1606.01793 (2016)
- 9. Kronecker, L.: Zur Theorie der Elimination einer Variabeln aus zwei algebraischen Gleichungen. Königliche Akad. der Wissenschaften (1881)
- 10. Larsson, V., Olsson, C.: Convex envelopes for low rank approximation. In: Energy Minimization Methods in Computer Vision and Pattern Recognition, pp. 1–14. Springer, Berlin (2015)
- 11. Larsson, V., Olsson, C., Bylow, E., Kahl, F.: Rank minimization with structured data patterns. In: Computer Vision – ECCV 2014, pp. 250–265. Springer, Berlin (2014)
- 12. Mirsky, L.: A trace inequality of John von Neumann. Monatshefte für Mathematik **79**(4), 303–306 (1975)
- 13. Rockafellar, R.T.: Convex Analysis. Princeton University Press, Princeton (2015)
- 14. Schmidt, E.: Zur Theorie der linearen und nichtlinearen Integralgleichungen. III. Teil. Mathematische Annalen **65**(3), 370–399 (1908)
- 15. Sion, M.: On general minimax theorems. Pac. J. Math. **8**(1), 171–176 (1958)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.