1 Introduction

Rank optimization is a well-developed topic that has found a tremendous number of applications in recent years (see [23] and references therein). Most of the problems one encounters involve an underdetermined linear data model together with a very poorly behaved “sparsity function”: either the rank of a matrix or the number of nonzero entries of an array. A common approach to solving sparsity optimization problems is via a convex surrogate, most often the ℓ1 norm or (in the case of matrices) the nuclear norm. The rationale for working with such surrogates is that the original problem is NP-complete, and thus should be avoided. Inspired by earlier work proving local convergence of cyclic projections onto nonconvex sets with an application to sparse signal recovery [6], and a more recent projection-reflection algorithm for X-ray imaging [20] that appears to be very successful at working with a proximal operator of the ℓ0 function and a nonlinear imaging model, we set out in the present note to determine whether sets with sparsity constraints have some sort of regularity that might justify working directly with sparsity rather than through convex surrogates.

Based on the work of Lewis and Sendov [15, 16], Le has obtained an explicit formula for the subdifferential of the rank function [11]. This formula shows that every point is a critical point of the rank function [8], and so reasonable algorithmic strategies should not make direct use of the rank function itself. Instead, we consider the lower level sets of the rank function. While sets of matrices of rank less than or equal to a specified level are not manifolds, we show here that they are quite regular, in fact prox-regular. While prox-regularity of these sets is not new [14], the proof of this fact given in Sect. 3 uses elementary tools, at the center of which is a particularly simple and apparently new characterization of the normal cone to these sets, established in Proposition 3.6.

Prox-regularity of the lower level sets of the rank function immediately yields local linear convergence of fundamental algorithms, either for finding a point in the intersection of the rank constraint set with another set determined by some (nonlinear) data model, or for minimizing the sum of distances to a rank constraint set and a data set. The result, detailed in Sect. 4, is quite general and extends to nonconvex imaging data models with rank constraints. Our results extend results established recently in [3] for the vector case, though at the cost of additional assumptions on the regularity of the solution set. In particular, [3] establishes local linear convergence, with estimates of the radius of convergence, of alternating projections between an affine constraint and the set of vectors with no more than s nonzero elements, without any assumption on the regularity of the intersection of these sets beyond nonemptiness. Our results, in contrast, are modeled after those of [14] and [13], where a stronger regularity of the intersection is assumed. We discuss the difficulties in extending the tools developed in [4] to the matrix case in the conclusion. In any case, avoiding convex surrogates comes at the cost of global convergence guarantees: our results are local and offer no panacea for solving rank optimization problems. Rather, the analysis shows that certain macro-regularity assumptions such as restricted isometry or mutual coherence (see [23] and references therein) play no role asymptotically in the convergence of algorithms; they have bearing only on the radius of convergence. We begin this note with a review of notation and of basic results and definitions upon which we build.

2 Notation

Throughout this paper \(\mathcal{X}\) and \(\mathcal{Y}\) are Euclidean spaces. In particular we are interested in Euclidean spaces defined on ℝm×n where we derive the norm from the trace inner product,

$$\langle x, y\rangle :=\operatorname{tr} \bigl(x^T y \bigr),\qquad \|x\|:=\sqrt{\langle x, x\rangle }. $$
This naturally specializes to the case of ℝn when m=n above and x∈ℝn×n is restricted to the subspace of diagonal matrices. For x∈ℝm×n we denote the span of the rows of x by \(\operatorname{\mathrm{range}} (x^{T})\) and recall that this is orthogonal to the nullspace of the linear mapping x:ℝn→ℝm,

$$\operatorname{\mathrm{range}} \bigl(x^T \bigr)=\ker(x)^\perp. $$

For x∈{z∈ℝn×n ∣ z_{ij}=0 for i≠j} (that is, when x is square diagonal) this corresponds exactly to the usual support of vectors on ℝn:

where \(\operatorname{Diag}(x)\) maps the diagonal of the matrix x∈ℝm×n to a vector in ℝr with r=min{m,n}. In order to emphasize this connection to the support of vectors, and reduce notational clutter we will denote the span of the rows of x by

$$\operatorname{Supp}(x):=\operatorname{\mathrm{range}} \bigl(x^T \bigr). $$

We denote the rank of x by \(\operatorname{\mathrm{rank}}(x)\) and recall that \(\operatorname{\mathrm{rank}}(x)\) is the dimension of the span of the columns (or equivalently of the rows) of x, which equals the number of nonzero singular values. The singular values of x∈ℝm×n are the (nonnegative) square roots of the eigenvalues of xx^T; these are denoted by σ_j(x) and are assumed to be ordered so that σ_i(x)≥σ_j(x) for i<j. We denote by σ(x):=(σ_1(x),σ_2(x),…,σ_r(x))^T (r=min{m,n}) the ordered vector of singular values of x. The corresponding diagonal matrix is denoted \(\varSigma(x):=\operatorname{diag}(\sigma(x))\in{\mathbb {R}^{m\times n}}\) where \(\operatorname{diag}(\cdot)\) maps vectors in ℝr to matrices in ℝm×n. Following [12, 15, 16] we denote the (Lie) group of n×n orthogonal matrices by O(n) and the product O(m)×O(n) by O(m,n). A singular value decomposition of x∈ℝm×n consistent with the above ordering is then any pair of orthogonal matrices (U,V)∈O(m,n) together with Σ(x) such that x=UΣ(x)V^T. We will denote the set of pairs of orthogonal matrices that comprise singular systems for x by \(\mathcal{U}(x):= \{ (U,V)\in O(m,n) \mid x=U\varSigma(x)V^{T} \}\).
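For readers who wish to experiment with these conventions, the following minimal NumPy sketch (our own illustration, not part of the original text; the helper name svd_system is hypothetical) checks the decreasing ordering of σ(x), the construction of Σ(x) by diag(·), and the factorization x=UΣ(x)V^T.

```python
import numpy as np

def svd_system(x):
    """Return (U, sigma, V) with x = U @ Sigma @ V.T, singular values ordered
    decreasingly, U in O(m) and V in O(n)."""
    U, sigma, Vt = np.linalg.svd(x)   # full matrices: U is m x m, Vt is n x n
    return U, sigma, Vt.T

m, n = 5, 3
rng = np.random.default_rng(0)
x = rng.standard_normal((m, n))

U, sigma, V = svd_system(x)
r = min(m, n)
Sigma = np.zeros((m, n))
Sigma[:r, :r] = np.diag(sigma)        # diag(.): vector in R^r -> matrix in R^{m x n}

assert np.allclose(U @ Sigma @ V.T, x)            # x = U Sigma(x) V^T
assert np.allclose(sigma, np.sort(sigma)[::-1])   # sigma_1 >= sigma_2 >= ... >= sigma_r
print("rank(x) =", int(np.sum(sigma > 1e-12)))    # rank = number of nonzero singular values
```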

The closed ball centered at x with radius ρ is denoted by \({\mathbb{B}}(x,\rho)\); the unit ball centered at the origin is simply denoted by \({\mathbb{B}}\). Given a set \(\varOmega\subset\mathcal{X}\), we denote the distance of a point \(x\in\mathcal{X}\) to Ω by d Ω (x) where

$$d_\varOmega(x) :=\inf_{y\in\varOmega}\|y-x\|. $$

If Ω is empty then we use the convention that the distance to this set is +∞. The corresponding (multivalued) projection operator of x onto Ω, denoted P Ω (x), is defined by

$$P_\varOmega(x):=\mathop{\mathrm{argmin}}_{z\in\varOmega}\|z-x\|. $$

If Ω is nonempty and closed, then the projection of any point in \(\mathcal{X}\) onto Ω is nonempty.

We define the normal cone to a closed set \(\varOmega\subset\mathcal{X}\) following [24, Definition 6.3]:

Definition 2.1

(normal cone)

A vector \(v\in\mathcal{X}\) is normal to a closed set \(\varOmega\subset \mathcal{X}\) at \({\overline{x}}\in\varOmega\), written \(v\in N_{\varOmega}({\overline{x}})\), if there are sequences (x k) k∈ℕ in Ω with \(x^{k}\mathop{\rightarrow_{\hspace*{-3.5mm}{_{\varOmega}}}\hspace*{1.2mm}}{\overline{x}}\) and (v k) k∈ℕ in \(\mathcal{X}\) with v k→v such that, for each k,

$$\limsup_{x\to x^k,\ x\in\varOmega,\ x\neq x^k} \frac{\langle v^k,\, x-x^k\rangle}{\|x-x^k\|}\leq0. $$
The vectors v k are regular normals to Ω at x k and the cone of regular normals at x k is denoted \({\widehat {N}}_{\varOmega}(x^{k})\).

What we are calling regular normals are called Fréchet normals in [19, Definition 1.1].

Here and elsewhere we use the notation \(x\mathop{\rightarrow_{\hspace*{-3.5mm}{_{\varOmega}}}\hspace*{1.2mm}}{\overline{x}}\) to mean that \(x\to{\overline{x}}\) with xΩ. An important example of a regular normal is a proximal normal, defined as any vector \(v\in\mathcal{X}\) that can be written as \(v=\lambda(x-{\overline{x}})\) for λ≥0 and \({\overline {x}}\in P_{\varOmega}(x)\) for some \(x\in\mathcal{X}\). We denote the set of proximal normals to Ω at xΩ by \(N^{P}_{\varOmega}(x)\). For Ω closed and nonempty, any normal \({\overline{v}}\in N_{\varOmega}({\overline{x}})\) can be approximated arbitrarily closely by a proximal normal [24, Exercise 6.18]. Thus we have the next result which is key to our analysis.

Proposition 2.2

(Theorem 1.6 of [19])

Let \(\varOmega\subset\mathcal{X}\) be closed and \({\overline{x}}\in \varOmega\). Then

$$ N_\varOmega(\overline{x})= \bigl\{ v\in\mathcal{X} \bigm| \exists\, x^k\mathop{\rightarrow_{\hspace*{-3.5mm}{_{\varOmega}}}\hspace*{1.2mm}}\overline{x} \text{ and } v^k\to v \text{ with } v^k\in N^P_{\varOmega} \bigl(x^k \bigr) \bigr\}. $$
(2.1)

Central to our results is the regularity of the intersection of sets, which we define in terms of a type of constraint qualification formulated with the normal cones to the sets at points in the intersection.

Definition 2.3

(basic set intersection qualification)

A family of closed sets Ω 1,Ω 2,… Ω m \(\subset\mathcal{X}\) satisfies the basic set intersection qualification at a point \(\overline{x} \in\cap_{i} \varOmega_{i}\), if the only solution to

$${\sum_{i=1}^m} y_i = 0,\quad y_i \in N_{\varOmega_i}(\overline{x}) \ (i=1,2,\ldots,m) $$

is y i =0 for i=1,2,…,m. We say that the intersection is strongly regular at \(\overline{x}\) if the basic set intersection qualification is satisfied there.

In the case m=2, this condition can be written

$$N_{\varOmega_1}(\bar{x}) \cap- N_{\varOmega_2}(\bar{x}) =\{0\}. $$

The two set case is called the basic constraint qualification for sets in [19, Definition 3.2] and has its origins in the generalized property of nonseparability [18] which is the n-set case. It was later recovered as a dual characterization of what is called strong regularity of the intersection in [10, Theorem 3]. It is called linear regularity in [13].

The case of two sets also yields the following simple quantitative characterization of strong regularity.

Proposition 2.4

(Theorem 5.16 of [13])

Suppose that Ω 1 and Ω 2 are closed subsets of \(\mathcal{X} \). The intersection Ω 1Ω 2 satisfies the basic set intersection qualification at \(\overline{x}\) if and only if the constant

$$ \overline{c} := \sup \bigl\{ \langle u, v \rangle \mid u \in N_{\varOmega _1}(\overline{x}) \cap {\mathbb{B}},\ v \in-N_{\varOmega_2}( \overline{x}) \cap{\mathbb {B}} \bigr\}<1. $$
(2.2)

Definition 2.5

(angle of regular intersection)

Suppose that Ω 1 and Ω 2 are closed subsets of \(\mathcal{X}\). We say that the intersection Ω 1Ω 2 is strongly regular at \(\overline{x}\in\varOmega_{1}\cap \varOmega_{2}\) with angle \(\overline{\theta}:=\cos^{-1}(\overline{c})>0\) when the constant \(\overline{c}\) given by (2.2) is less than 1.
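To make the constant (2.2) concrete: when Ω 1 and Ω 2 are linear subspaces, the normal cones at any point of the intersection are the orthogonal complements, and \(\overline{c}\) is the largest singular value of \(B_1^TB_2\) for orthonormal bases B 1, B 2 of those complements. The following sketch (our own, with hypothetical helper names, not from the original text) computes \(\overline{c}\) and \(\overline{\theta}\) for two lines in ℝ2 meeting at 60 degrees.

```python
import numpy as np

def regularity_constant(B1, B2):
    """c-bar of (2.2) for two subspaces whose normal spaces have the columns of
    B1 and B2 as orthonormal bases: the largest singular value of B1^T B2."""
    if B1.shape[1] == 0 or B2.shape[1] == 0:
        return 0.0
    return np.linalg.svd(B1.T @ B2, compute_uv=False).max()

theta = np.pi / 3                                    # two lines in R^2 meeting at 60 degrees
N1 = np.array([[0.0], [1.0]])                        # normal space of span{(1, 0)}
N2 = np.array([[-np.sin(theta)], [np.cos(theta)]])   # normal space of span{(cos t, sin t)}

c_bar = regularity_constant(N1, N2)
print("c_bar =", c_bar, " angle (deg) =", np.degrees(np.arccos(c_bar)))   # ~0.5 and 60
```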

We will also require certain regularity of the sets themselves, not just of their intersection. The following definition of prox-regularity of sets is a modern manifestation of a notion that can be traced back to [7] and sets of positive reach. What we use here as a definition actually follows from the equivalence of prox-regularity of sets as defined in [22, Definition 1.1] and the single-valuedness of the projection operator on neighborhoods of the set [22, Theorem 1.3].

Definition 2.6

(prox-regularity) A nonempty closed set \(\varOmega\subset\mathcal{X}\) is prox-regular at a point \(\overline{x}\in\varOmega\) if \(P_{\varOmega}(x)\) is single-valued for all x in a neighborhood of \({\overline{x}}\).

3 Properties of Lower Level Sets of the Rank Function

We collect here some facts that will be used repeatedly in what follows.

Proposition 3.1

For any point \(\overline{x}\in\mathbb{R}^{m\times n}\) and any sequence (x k) k∈ℕ converging to \(\overline{x}\) there is a K∈ℕ such that \(\operatorname{\mathrm{rank}}(\overline{x})\leq\operatorname {\mathrm{rank}}(x^{k})\) for all k>K.

Proof

This follows immediately from continuity of the singular values as a function of x. (See, for instance, [9, Appendix D].) □

For the remainder of this note we consider real m×n matrices and denote by r the minimum of {m,n}. The rank lower level set will be denoted by \(S:= \{y\in\mathbb{R}^{m\times n} \mid\operatorname{\mathrm{rank}}(y)\leq s \}\) for s∈{0,1,…,r}. As can be found in textbooks on matrix analysis, the projection onto this set is obtained simply by setting the r−s smallest singular values to zero; in the case of a tie for the sth largest singular value, the projection is the set of all such truncations, one for each possible selection of s largest singular values.
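As a concrete illustration of this truncation (our own sketch, not from the original text; the function name is hypothetical), the following NumPy snippet computes one element of P S (x); in the tie case just described it simply returns the selection produced by the SVD routine.

```python
import numpy as np

def project_rank(x, s):
    """Return one element of P_S(x), S = {y : rank(y) <= s}: keep the s largest
    singular values of x and set the remaining r - s to zero (cf. Lemma 3.2)."""
    U, sigma, Vt = np.linalg.svd(x, full_matrices=False)   # U: m x r, Vt: r x n
    sigma[s:] = 0.0                                         # truncate the r - s smallest
    return (U * sigma) @ Vt                                 # U diag(sigma) V^T

rng = np.random.default_rng(1)
x = rng.standard_normal((6, 4))
y = project_rank(x, s=2)
print(np.linalg.matrix_rank(y))                             # 2 (generically)
```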

Lemma 3.2

(projection onto S)

For x∈ℝm×n, define

$$\varSigma_s(x):=\operatorname{diag} \bigl( \bigl(\sigma_1(x),\sigma_2(x),\dots,\sigma_s(x),0,\dots,0 \bigr)^T \bigr)\in\mathbb{R}^{m\times n}. $$

The projection P_S(x) is given by

$$P_S(x)=\bigcup_{(U,V)\in\mathcal{U}(x)} \bigl\{y \mid y=U \varSigma_s(x)V^T \bigr\}. $$

Proof

By [9, Theorem 7.4.51] any matrix y∈S satisfies ∥x−y∥≥∥Σ(x)−Σ(y)∥. The relation holds with equality whenever y=UΣ_s(x)V^T for some \((U,V)\in\mathcal{U}(x)\), hence \(P_{S}(x)\supset\bigcup_{(U,V)\in\mathcal{U}(x)} \{y \mid y=U\varSigma_{s}(x)V^{T} \}\neq\emptyset\). On the other hand, if \(\overline{y}\in P_{S}(x)\), then \(\|x-\overline{y}\|\leq\|x-y\|\) for all y∈S. In particular, for y=UΣ_s(x)V^T with \((U,V)\in\mathcal{U}(x)\) we have

$$\bigl\|\varSigma(x)-\varSigma(\overline{y}) \bigr\|\leq\|x-\overline{y}\|\leq \bigl\|x-U\varSigma_s(x)V^T \bigr\|= \bigl\|\varSigma(x)-\varSigma_s(x) \bigr\|\leq \bigl\|\varSigma(x)-\varSigma(\overline{y}) \bigr\|, $$

hence \(\varSigma_{s}(x)=\varSigma(\overline{y})\) and \(\overline{y}\in \bigcup_{(U,V)\in\mathcal{U}(x)} \{y \mid y= U\varSigma_{s}(x)V^{T} \}\). □

The next results establish that the set S is prox-regular at all points where \(\operatorname{\mathrm{rank}}(x)=s\). We make use of the following tools. For r=min{m,n} define the mappings \(\mathbb{J}:~\mathbb{R}^{m\times n}\times (\mathbb{R}_{+}\cup\{+\infty\})\to2^{\{1,2,\dots,r\}}\) and \(\alpha_s:\mathbb{R}^{m\times n}\to[0,+\infty]\) by

$$\mathbb{J}(x,\alpha):= \bigl\{j\in\{1,2,\dots,r\} \bigm| \sigma_j(x)\geq\alpha \bigr\} \quad\text{and}\quad \alpha_s(x):=\sup \bigl\{\alpha\in\mathbb{R}_+\cup\{+\infty\} \bigm| \bigl\vert\mathbb{J}(x,\alpha) \bigr\vert\geq s \bigr\}, $$

where \(|\mathbb{J}(x,\alpha)|\) denotes the cardinality of this discrete set. We define \(\mathbb{J}(x,+\infty)\) to be the empty set. Before proceeding with our results, we collect some observations about these objects.

Lemma 3.3

  1. (i)

    For all s∈{1,2,…,r} the value of the supremum in the definition of α s (x) is bounded and attained. If s=0 then α 0(x)=+∞.

  2. (ii)

    If \(\vert\mathbb{J}(x,\alpha_{s})\vert>s\) then \(\operatorname{\mathrm{rank}}(x)>s>0\).

  3. (iii)

    If \(\operatorname{\mathrm{rank}}(x)>s\) then α s >0.

  4. (iv)

    If \(\operatorname{\mathrm{rank}}(x)<s\leq r\) then α s (x)=0.

Proof

(i) Since the cardinality of the empty set is zero, the supremum in the definition of α 0 is unbounded. In any case, for x fixed the cardinality of \(\mathbb{J}(x,\alpha)\) is monotonically decreasing with respect to α, from a value of r at α=0 to 0 for all α>σ 1(x). Thus for x fixed α s is bounded for all s∈{1,2,…,r}. The value of α at which the cardinality s≥1 is achieved is attained, precisely when α=σ j (x) for some j. (ii) By definition, at s=0, α 0=+∞ and \(|\mathbb {J}(x,+\infty)|:= 0\), so \(\vert\mathbb{J}(x,\alpha_{s})\vert>s\) implies that s>0, and the implication \(\vert\mathbb{J}(x,\alpha_{s})\vert>s~\Rightarrow~\operatorname{\mathrm{rank}}(x)>s>0\) follows immediately. (iii) If \(\operatorname{\mathrm{rank}}(x)>s\) and s=0, then the result is trivial since α 0:=+∞. If \(\operatorname{\mathrm{rank}}(x)>s\) and s>0 then s∈{1,…,r−1} (it is impossible to have rank greater than r) and there exists an α>0 such that \(|\mathbb{J}(x,\alpha)|\geq s+1\). As α s+1 is the maximum of these, α s+1>0. By the argument in (i), α s ≥α s+1, which yields the result. (iv) In this case, only by including the zero singular values of x can the inequality \(|\mathbb{J}(x,\alpha)|\geq s\) be achieved, that is, by taking α=0. □

Proposition 3.4

(properties of the projection)

The following are equivalent.

  1. (i)

    P S (x) is multi-valued;

  2. (ii)

    \(\vert\mathbb{J}(x,\alpha_{s})\vert>s\).

Proof

To show that (i) implies (ii), let y and zP S (x) with yz. By Lemma 3.2 \(y=U_{y}\varSigma_{s}(x)V^{T}_{y}\) and \(z=U_{z}\varSigma_{s}(x)V^{T}_{z}\) for (U y ,V y ) and \((U_{z},V_{z})\in\mathcal{U}(x)\). Then by [9, Theorem 7.4.51]

hence \(\operatorname{\mathrm{rank}}(x)>s\). Since yz and they have the same singular values, the multiplicity of singular values σ j (x) with value α s must be greater than one, hence \(\vert\mathbb{J}(x,\alpha_{s})\vert>s\).

Conversely, to show that (ii) implies (i), first note that by Lemma 3.3(ii) \(\operatorname{\mathrm{rank}}(x)>s>0\). Now fix y∈P S (x) with \(y=U_{y}\varSigma_{s}(x)V^{T}_{y}\) for \((U_{y},V_{y})\in \mathcal{U}(x)\). The corresponding decomposition for x is \(U_{y}\varSigma(x)V^{T}_{y}\). Now construct the orthogonal matrix \(\widetilde{V}\) by switching the (s+1)st column of V y with the sth column of V y . Since \(\vert\mathbb{J}(x,\alpha_{s})\vert>s\) we have that \(x=U_{y}\varSigma(x)\widetilde{V}^{T}\). Define \(z:= U_{y}\varSigma_{s}(x)\widetilde{V}^{T}\). By Lemma 3.2 z∈P S (x) with \(\operatorname{\mathrm{rank}}(z)=s\), but the sth column of \(\widetilde{V}\) is in the nullspace of y, so z≠y and the projection is thus multi-valued. This completes the proof. □
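The criterion of Proposition 3.4 is easy to test numerically: with the definitions of \(\mathbb{J}\) and α s above, and for x of rank greater than s, the projection is multi-valued exactly when the sth largest singular value is tied with the (s+1)st. The following sketch is our own illustration (the function name is hypothetical).

```python
import numpy as np

def projection_is_multivalued(x, s, tol=1e-12):
    """Check the criterion of Proposition 3.4: |J(x, alpha_s)| > s, i.e. the
    s-th largest singular value is tied with the (s+1)-st (a sketch; assumes
    0 < s < min(x.shape))."""
    sigma = np.linalg.svd(x, compute_uv=False)      # ordered decreasingly
    alpha_s = sigma[s - 1]                          # alpha_s(x) = sigma_s(x)
    card_J = int(np.sum(sigma >= alpha_s - tol))    # |J(x, alpha_s)|
    return card_J > s

# A tie for the s-th largest singular value: sigma(x) = (2, 1, 1).
x = np.diag([2.0, 1.0, 1.0])
print(projection_is_multivalued(x, s=2))            # True:  sigma_2 = sigma_3
print(projection_is_multivalued(x, s=1))            # False: sigma_1 > sigma_2
```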

An immediate consequence of the above is that the projection onto the trivial rank level sets corresponding to s=0 and s=r is single-valued.

Corollary 3.5

For x∈ℝm×n, if s=0 or s=r then P S (x) is single-valued.

The normal cone of this set has the following simple characterization.

Proposition 3.6

(the normal cone to S)

At a point \(\overline{x}\in S\)

(3.1)

Moreover, \(N_{S}(\overline{x})=N^{P}_{S}(\overline{x})\) at every \(\overline {x}\) with \(\operatorname{\mathrm{rank}}(\overline{x})=s\), while \(N^{P}_{S}(\overline{x})=\{0\}\) at every \(\overline{x}\) with \(\operatorname{\mathrm{rank}} (\overline{x})<s\).

Proof

Using the definition of \(\operatorname{Supp}(x):=\ker (x)^{\perp}\) define the sets

We first show that W is nonempty and hence Z(w) for wW is nonempty. For all \(\overline{x}\in\mathbb{R}^{m\times n}\) and s∈{0,1,2,…,r} the zero matrix 0∈W, hence W is nonempty. Next note that for wW, Z(w)⊂ker(w) with dim(ker(w))≥s≥0, and it is always possible to find an element z of ker(w) with \(\operatorname{\mathrm{rank}} (\overline{x}+z)=s\).

Now, choose any wW and z 0Z(w) and construct the sequences (x k) k∈ℕ and (w k) k∈ℕ by

There is a K∈ℕ such that for all k>K

Thus for all k>K

Note that by Proposition 3.4 and Lemma 3.3(ii) the representation of the projection above holds with equality since \(\operatorname{\mathrm{rank}}(\overline{x} +\tfrac{1}{\sqrt{k}}z_{0} )=s\). Since \(x^{k}\to\overline{x}\), by definition, \(w\in N_{S}(\overline{x})\). As w was arbitrary, we have \(W\subset N_{S}(\overline{x})\).

We show next that, conversely, \(N_{S}(\overline{x})\subset W\) for \(\overline{x}\in S\). The matrix w=0 trivially belongs to W, so we assume that w≠0. By Proposition 2.2 we can write w as a limit of proximal normals, that is, the limit of sequences (x k) and (w k) with x kS and w kw for w k=t k(x ky k) for y kP S (x k). We consider the corresponding singular value decompositions by \(y^{k}=U_{k}\varSigma_{s}(x^{k})V_{k}^{T}\) for \((U_{k},V_{k})\in\mathcal{U}(x^{k})\) and \(\varSigma_{s}(x):=\operatorname{diag}((\sigma_{1}(x),\sigma_{2}(x),\dots ,\sigma_{s}(x),0,\dots,0)^{T})\in\mathbb{R}^{m\times n}\) (see Lemma 3.2). Note that x k and y k have the same left and right singular vectors with the usual ordering. The matrices U k and V k are also collections of left and right singular vectors for w k, although they do not yield the usual ordering of singular values of w k:

Let \((\overline{U}, \overline{V})\in\mathcal{U}(\overline{x})\) be the limit of left and right singular vectors of x k, that is, \(U_{k}\to\overline{U}\), \(V_{k}\to\overline{V}\) where \(x^{k}=U_{k}\varSigma (x^{k})V_{k}^{T}\to\overline{U}\varSigma(\overline{x})\overline {V}^{T}=\overline{x}\). Then \(y^{k}= U_{k}\varSigma_{s}(x^{k})V_{k}^{T}\to\overline{U}\varSigma(\overline {x})\overline{V}^{T}\) and \(w^{k}=t_{k} U_{k}\widetilde{\varSigma}_{s}(x^{k})V_{k}^{T}\to \overline{U} (\lim_{k\to\infty}t_{k}\widetilde{\varSigma}_{s}(x^{k}) )\overline{V}^{T}=w\). It follows immediately that \(\operatorname{\mathrm{rank}}(w)\leq r-s\) and \(\operatorname{Supp}(w)\perp\operatorname{Supp} (\overline{x})\), which completes the proof of the inclusion.

To see that each normal to the set S at \(\overline{x}\) with \(\operatorname{\mathrm{rank}} (\overline{x})=s\) is actually a proximal normal, note that if \(\operatorname{\mathrm{rank}}(\overline{x})=s\) then by (3.1) every point \(v\in N_{S}(\overline{x})\) can be written as \(v=\frac{1}{\tau} ((\tau v+\overline{x})-P_{S}(\tau v+\overline{x}) )\) for τ>0 small enough. Suppose, on the other hand, that \(\operatorname{\mathrm {rank}}(\overline{x})<s\). Then \(P_{S}(\tau v+\overline{x}) = \overline{x}\) for τ>0 exactly when v=0: for if \(\tau v+\overline{x}\in S\) then \(P_{S}(\tau v+\overline{x}) = \tau v+\overline{x}=\overline{x}\) exactly when v=0, and if \(\tau v+\overline{x}\notin S\) then \(\operatorname{\mathrm {rank}}(P_{S}(\tau v+\overline {x}))=s\) hence \(P_{S}(\tau v+\overline{x})\neq\overline{x}\). Consequently the only proximal normal at these points is v=0. This completes the proof. □

The normal cone condition \(N_{S}(\overline{x})\cap(-N_{\varOmega}(\overline{x}))=\{0\}\) can easily be checked by determining the nullspaces of matrices in \(N_{\varOmega}(\overline{x})\), as the next proposition shows.

Proposition 3.7

(strong regularity of intersections with a sparsity set)

Let Ω⊂ℝm×n be closed. If at a point \(\overline{x}\in\varOmega\cap S\) all nonzero \(v\in N_{\varOmega}(\overline{x})\) have \(\ker(v)^{\perp}\cap\ker (\overline{x})^{\perp}\neq\{0\}\), then the intersection is strongly regular there.

Proof

Suppose \(v\in N_{\varOmega}(\overline{x})\) and \(w\in N_{S}(\overline{x})\) satisfy v+w=0, so that \(\operatorname{Supp}(v)=\operatorname{Supp}(w)\). By (3.1), \(N_{S}(\overline{x})\) is a subset of matrices w with \(\operatorname{Supp}(w)\cap\operatorname{Supp}(\overline{x})=\{0\}\), while by assumption every nonzero \(v\in N_{\varOmega}(\overline{x})\) has \(\operatorname{Supp}(v)\cap\operatorname{Supp} (\overline{x})\neq\{0\}\). Hence the only solution is v=w=0. □
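The hypothesis of Proposition 3.7 can also be checked numerically: ker(v)^⊥ is the row space of v, and the dimension of the intersection of two row spaces is rank(v)+rank(x̄) minus the rank of the stacked matrix. The following sketch is our own illustration (the helper name is hypothetical).

```python
import numpy as np

def row_spaces_intersect(v, xbar, tol=1e-10):
    """Check the hypothesis of Proposition 3.7 for a single normal v:
    ker(v)^perp intersect ker(xbar)^perp != {0}.  Uses
    dim(row(v) ∩ row(xbar)) = rank(v) + rank(xbar) - rank([v; xbar])."""
    rk = lambda a: np.linalg.matrix_rank(a, tol=tol)
    return rk(v) + rk(xbar) - rk(np.vstack([v, xbar])) > 0

xbar = np.diag([1.0, 1.0, 0.0])           # rank 2, row space span{e1, e2}
v1 = np.outer([0, 0, 1.0], [1.0, 0, 0])   # row space span{e1}: nontrivial intersection
v2 = np.outer([1.0, 0, 0], [0, 0, 1.0])   # row space span{e3}: orthogonal to row(xbar)
print(row_spaces_intersect(v1, xbar))     # True
print(row_spaces_intersect(v2, xbar))     # False
```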

It is known that the set of matrices with rank s is a smooth manifold [1] (although the set of matrices with rank less than or equal to s is not), from which it follows that S is prox-regular at points with rank s [14, Lemma 2.1 and Example 2.3]. We present here a simple proof of this fact based on the characterization of the normal cone.

Proposition 3.8

(prox-regularity of S)

The set S is prox-regular at all points \(\overline{x}\in\mathbb {R}^{m\times n}\) with \(\operatorname{\mathrm{rank}}(\overline{x})=s\).

Proof

Let (x^k)_{k∈ℕ} in ℝm×n be any sequence converging to \(\overline{x}\) with corresponding singular value decompositions \(U_{k}\varSigma(x^{k})V_{k}^{T}\). Decompose x^k into the sum y^k+z^k=x^k where \(y^{k}=U_{k}\varSigma_{s}(x^{k})V_{k}^{T}\) and \(z^{k}=U_{k}\widetilde{\varSigma }_{s}(x^{k})V_{k}^{T}\) with \(\varSigma_{s}(x^{k}):=\operatorname{diag}((\sigma_{1}(x^{k}),\sigma_{2}(x^{k}),\dots,\sigma_{s}(x^{k}),0,\dots,0)^{T})\) and \(\widetilde{\varSigma}_{s}(x^{k}):= \operatorname{diag}((0,\dots,0,\sigma_{s+1}(x^{k}),\sigma_{s+2}(x^{k}),\dots, \sigma_{r}(x^{k}))^{T})\) for r=min{m,n}. Note that \(y^{k}\to\overline{x}\) with \(\operatorname{\mathrm {rank}}(y^{k})=\operatorname{\mathrm{rank}}(\overline {x})=s\) for all k large enough, while by Proposition 3.6 z^k→0 with z^k∈N S (y^k) for all k. Then for all k large enough \(\max_{j}\{\sigma_{j}(z^{k})\}=\sigma_{s+1}(x^{k})<\sigma_{s}(x^{k})=\min_{j\leq s}\{\sigma_{j}(y^{k})\}\) and \(|\mathbb {J}(x^{k},\alpha_{s})|=s\). By Proposition 3.4 the projection P S (x^k) is single-valued. Since the sequence was arbitrarily chosen, it follows that the projection is single-valued on a neighborhood of \(\overline{x}\), hence S is prox-regular. □

4 Algorithms for Optimization with a Rank Constraint

The prox-regularity of the set S has a number of important implications for numerical algorithms. Principal among these is local linear convergence of the elementary alternating projection and steepest descent algorithms. There have been a tremendous number of articles published in recent years on convex (and nonconvex) relaxations of the rank function, and on when solutions of the relaxed optimization problems correspond to solutions of the original problem with the rank function (see the review article [23] and references therein). The motivation for such relaxations is that there are polynomial-time algorithms for the solution of the relaxed problems, while rank minimization is NP-complete. As we show in this section, the above theory implies that in a neighborhood of a solution there are polynomial-time algorithms for the solution of optimization problems with rank constraints. This observation was anticipated in [2] and notably [5], where a (globally) linearly convergent projected gradient algorithm with a rank constraint was presented. Without further assumptions, however, such assurances of convergence for problems with rank constraints come at the cost of global guarantees of convergence.

4.1 Inexact, Extrapolated Alternating Projections

To the extent that the singular value decomposition can be computed exactly, the projection of a point x onto the rank lower level set S can be calculated exactly simply by ordering the singular values of x and truncating. The above analysis immediately yields local linear convergence of exact and inexact alternating projections for finding the intersection SM for M closed on neighborhoods of points where the intersection is strongly regular. The following algorithm allows for inexact evaluation of the fixed point operator, and hence implementable algorithms.

Algorithm 4.1

(inexact alternating projections [17])

Fix γ>0 and choose x 0S and x 1M. For k=1,2,3,… generate the sequence (x 2k) in S with x 2kP S (x 2k−1) where the sequence (x 2k+1) in M satisfies

(4.1a)
(4.1b)

and

$$ d_{N_M(x_*^{2k+1})} \bigl(\hat{z}^k \bigr)\le\gamma $$
(4.1c)

for

$$x_*^{2k+1}= P_{M\cap\{x^{2k}-\tau\hat{z}^k,~\tau\ge0\}} \bigl(x^{2k} \bigr) $$

and

$$\hat{z}^k:= \begin{cases} \frac{x^{2k} - x^{2k+1}}{\|x^{2k} - x^{2k+1}\|} & \text{if}\ x_*^{2k+1}\neq x^{2k}\\ 0& \text{if}\ x_*^{2k+1}=x^{2k}. \end{cases} $$

For γ=0 and \(x^{2k+1}= x^{2k+1}_{*}\) the inexact algorithm reduces to the usual alternating projections algorithm. Note that the odd iterates x 2k+1 can lie in the interior of M. This is the major difference between Algorithm 4.1 and the one specified in [13], where all of the iterates are assumed to lie on the boundary of M. We include this feature to allow for extrapolated iterates in the case where M has interior.
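For the exact case (γ=0, x^{2k+1}=x^{2k+1}_*) and an affine data model M={x ∣ A vec(x)=b}, the iteration is simple to sketch. The following toy NumPy example is our own construction (problem sizes and helper names are hypothetical); it alternates the truncated-SVD projection onto S of Lemma 3.2 with the least-squares projection onto M, and convergence is only guaranteed locally by Theorem 4.2 below.

```python
import numpy as np

def project_rank(x, s):
    U, sigma, Vt = np.linalg.svd(x, full_matrices=False)
    sigma[s:] = 0.0
    return (U * sigma) @ Vt

def project_affine(x, A, b):
    """Projection onto M = {x : A vec(x) = b}; A is assumed to have full row rank."""
    z = x.ravel()
    corr = A.T @ np.linalg.solve(A @ A.T, A @ z - b)
    return (z - corr).reshape(x.shape)

# Toy instance in which b is consistent with a rank-s matrix, so M \cap S is nonempty.
rng = np.random.default_rng(2)
m, n, s, p = 8, 6, 2, 30
x_true = rng.standard_normal((m, s)) @ rng.standard_normal((s, n))   # rank s
A = rng.standard_normal((p, m * n))
b = A @ x_true.ravel()

x = project_affine(rng.standard_normal((m, n)), A, b)                # x^1 in M
for _ in range(200):
    x = project_affine(project_rank(x, s), A, b)                     # x^{2k}, then x^{2k+1}

print("residual:", np.linalg.norm(A @ x.ravel() - b))
print("rank:", np.linalg.matrix_rank(x, tol=1e-8))
```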

Theorem 4.2

(inexact alternating projections with a rank lower level set)

Let M,S⊂ℝm×n be closed with \(S:= \{y \mid\operatorname{\mathrm{rank}} (y)\leq s \}\) and suppose there is an \(\overline{x}\in M\cap S\) with \(\operatorname{\mathrm {rank}}(\overline{x})=s\). Suppose furthermore that M and S have strongly regular intersection at \(\overline{x}\) with angle \(\overline{\theta }\). Define \(\overline{c}:=\cos(\overline{\theta})<1\) and fix the constants \(c \in(\overline{c},1)\) and \(\gamma< \sqrt{1-c^{2}}\). For x 0 and x 1 close enough to \(\overline{x}\), the iterates in Algorithm 4.1 converge to a point in MS with R-linear rate

$$\sqrt{ c\sqrt{1-\gamma^2} + \gamma\sqrt{1-c^2} } < 1. $$

If, in addition, M is prox-regular at \(\overline{x}\), then the iterates converge with rate

$$c\sqrt{1-\gamma^2} + \gamma\sqrt{1-c^2} < 1. $$

Proof

Since by Proposition 3.8 S is prox-regular at \(\overline{x}\), the results follow immediately from [17, Theorem 4.4]. □

Remark 4.3

The above result requires only closedness of the set M. For example, this yields convergence for affine sets M={x ∣ Ax=b}, which are not only closed but convex. The result is not restricted to such nice sets, however. Another important example is inverse scattering with sparsity constraints [20]. Here the set M is M={x∈ℂ^n ∣ |(Fx)_j|^2=b_j, j=1,2,…,n} where F is a linear mapping (the discrete Fourier or Fresnel transform) and b is some measurement (a far-field intensity measurement). This set is not convex, but it is certainly closed (in fact prox-regular), so again we can apply the above results to provide local guarantees of convergence for nonconvex alternating projections with a sparsity set.

Paradoxically, in the vector case it is the projection onto the affine constraint that in general cannot be evaluated exactly, while the projection onto the sparsity set S can be implemented exactly by simply (hard) thresholding the vectors. In the matrix case, this is no longer possible since in general the singular values cannot be evaluated exactly. In order to accommodate both projections being approximately evaluated, we explore one possible solution using a common reformulation of the problem on a product space. This is explained next.

4.2 Approximate Steepest Descent

Another fundamental approach to solving such problems is simply to minimize the sum of the (squared) distances to the sets M and S:

$$\mathop{\mathrm{minimize}}_{x\in {\mathbb{R}^{m\times n}}}\frac{1}{2} \bigl(d^2(x,S)+d^2(x,M) \bigr) $$

Steepest descent without line search is: given x 0∈ℝm×n generate the sequence (x k) k∈ℕ in ℝm×n via

$$x^{k+1}=x^k-\nabla\frac{1}{2} \bigl(d^2 \bigl(x^k,S \bigr)+d^2 \bigl(x^k,M \bigr) \bigr). $$

If S and M were convex and the distance function the Euclidean distance, it is well-known that this would be equivalent to averaged projections:

(4.2)

If we assume that M is prox-regular, then, since we have already established the prox-regularity of S, the correspondence between the derivative of the sum of squared distances to these sets and the projection operators in (4.2) holds on (common) open neighborhoods of M and S [22, Theorem 1.3]. Using a common product space formulation due to [21] we can show that (4.2) is equivalent to alternating projections between the sets

$$D:= \bigl\{(x,y)\in{\mathbb{R}^{m\times n}}\times{\mathbb {R}^{m\times n}} \bigm| x=y \bigr\} $$

and

$$\varOmega:= \bigl\{(x,y)\in{\mathbb{R}^{m\times n}}\times{\mathbb {R}^{m\times n}} \bigm| x\in S, y\in M \bigr\}, $$

that is,

$$\bigl(x^{k+1},x^{k+1} \bigr) = P_D \bigl(P_\varOmega \bigl( \bigl(x^k,x^k \bigr) \bigr) \bigr) $$

where x k+1 is given by (4.2). The set Ω is prox-regular if M and S are, and the set D is convex, so Theorem 4.2 guarantees local linear convergence of the sequence of iterates (x k) k∈ℕ with rate depending on the angle of strong regularity of the intersection of the sets D and Ω. We cannot expect to be able to compute the projection onto the set Ω exactly, but we can reasonably expect to compute the projection onto the diagonal D exactly, even if the elements of P S (x k) and P M (x k) differ by orders of magnitude beyond our numerical precision. Indeed, since the projection operators P S and P M are Lipschitz continuous for S and M prox-regular [22, Theorem 1.3], we can attribute any error we in fact make in the evaluation of P D to the evaluation of P Ω, which we compute approximately according to Algorithm 4.1. Again, Theorem 4.2 guarantees local linear convergence with rate governed by the angle of strong regularity between D and Ω and the accuracy of the approximate projection onto Ω.
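The following sketch (our own illustration; helper names and problem data are hypothetical) makes the product-space reduction concrete: projecting (x,x) onto Ω acts componentwise, and projecting the result back onto the diagonal D averages the two components, so one sweep of P D ∘P Ω averages the two projections of x^k.

```python
import numpy as np

def project_rank(x, s):
    U, sigma, Vt = np.linalg.svd(x, full_matrices=False)
    sigma[s:] = 0.0
    return (U * sigma) @ Vt

def project_affine(x, A, b):
    z = x.ravel()
    corr = A.T @ np.linalg.solve(A @ A.T, A @ z - b)
    return (z - corr).reshape(x.shape)

def product_space_step(x, s, A, b):
    """One sweep of P_D(P_Omega(.)): project (x, x) onto Omega = S x M
    componentwise, then onto the diagonal D by averaging the components."""
    y_S = project_rank(x, s)        # first component of P_Omega((x, x))
    y_M = project_affine(x, A, b)   # second component
    return 0.5 * (y_S + y_M)        # P_D((y_S, y_M)) has both components equal to this

rng = np.random.default_rng(3)
m, n, s, p = 8, 6, 2, 30
x_true = rng.standard_normal((m, s)) @ rng.standard_normal((s, n))
A = rng.standard_normal((p, m * n))
b = A @ x_true.ravel()

x = rng.standard_normal((m, n))
for _ in range(500):
    x = product_space_step(x, s, A, b)
print("d(x, M) + d(x, S) ~",
      np.linalg.norm(project_affine(x, A, b) - x) + np.linalg.norm(project_rank(x, s) - x))
```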

5 Conclusion

We have developed a novel characterization of the normal cone to the lower level sets of the rank function. This enables a simple proof of the prox-regularity of such sets. This property in turn allows for a straightforward application of previous results on the local linear convergence of approximate alternating projections for finding the intersection of a rank constraint set with another closed set, as long as the intersection is strongly regular at a reference point \(\overline{x}\). Our characterization of the normal cone to rank constraint sets also allows for easy verification of the strong regularity of intersections of these sets with other sets. The results extend as well to the elementary steepest descent algorithm for minimizing the sum of squared distances to sets, one of which is a rank constraint set. This implies that, in the neighborhood of a solution with sufficient regularity, there are polynomial-time algorithms for directly solving rank constrained problems without resorting to convex relaxations or heuristics.

What remains to be determined is the radius of convergence of these algorithms. Using the restricted normal cone developed in [4], Bauschke and coauthors [3] obtained linear rates, with estimates of the radius of convergence, for alternating projections applied to affine sparsity constrained problems (that is, the vector affine case of the setting considered here) assuming only existence of solutions. The restricted normal cone is not immediately applicable here since the restrictions in [3] are over countable collections of subspaces representing all possible s-sparse vectors. For the rank function this is problematic since the collection of all possible rank-s matrices is not countable. Extending the tools of [4] to the matrix case is the focus of future research.

The results one might obtain using the tools of [4] or similar, however, are based on the regularity near the solution, what we call micro-regularity. We cannot expect the estimates for the radius of convergence to extend very far using these tools, unless certain local-to-global properties like convexity are assumed. In [5] a scalable restricted isometry property is used to prove global convergence of a projected gradient algorithm to the unique solution to the problem of minimizing the distance to an affine subspace subject to a rank constraint. The (scalable) restricted isometry property and other properties like it (mutual coherence, etc.) directly concern uniqueness of solutions and indirectly provide sufficient conditions for global convergence of algorithms for solving relaxations of the original sparsity/rank optimization problem. A natural question is whether there is a more general macro-regularity property than the scalable restricted isometry property, one independent of considerations of uniqueness of solutions, that guarantees global convergence.