Efficient Proximal Mapping Computation for Low-Rank Inducing Norms

Grussler, Christian; Giselsson, Pontus

doi:10.1007/s10957-021-01956-2

Efficient Proximal Mapping Computation for Low-Rank Inducing Norms

Open access
Published: 30 November 2021

Volume 192, pages 168–194, (2022)
Cite this article

Download PDF

You have full access to this open access article

Journal of Optimization Theory and Applications Aims and scope Submit manuscript

Efficient Proximal Mapping Computation for Low-Rank Inducing Norms

Download PDF

1198 Accesses
2 Altmetric
Explore all metrics

Abstract

Low-rank inducing unitarily invariant norms have been introduced to convexify problems with a low-rank/sparsity constraint. The most well-known member of this family is the so-called nuclear norm. To solve optimization problems involving such norms with proximal splitting methods, efficient ways of evaluating the proximal mapping of the low-rank inducing norms are needed. This is known for the nuclear norm, but not for most other members of the low-rank inducing family. This work supplies a framework that reduces the proximal mapping evaluation into a nested binary search, in which each iteration requires the solution of a much simpler problem. The simpler problem can often be solved analytically as demonstrated for the so-called low-rank inducing Frobenius and spectral norms. The framework also allows to compute the proximal mapping of increasing convex functions composed with these norms as well as projections onto their epigraphs.

Nonconvex Sparse Regularization and Splitting Algorithms

Proximal linearization methods for Schatten p-quasi-norm minimization

Article 02 December 2022

A Relaxed Interior Point Method for Low-Rank Semidefinite Programming Problems with Applications to Matrix Completion

Article Open access 11 October 2021

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Non-convex optimization problems with a rank or cardinality constraint appear in many data driven areas such as machine learning, image analysis and multivariate linear regression [6, 8, 13, 26, 40] as well as areas within control such as system identification, model reduction, low-order controller design, and low-complexity modeling [2, 15, 18, 35, 47]. Besides the low-rank constraint, these problems are often convex and therefore one of the most common techniques for solving such problems is to convexify them by using regularizers or taking convex envelopes [8, 15, 17, 19]. A promising class of such regularizers and convex envelopes is the class of so-called unitarily invariant low-rank inducing norms [17], i.e., convex envelopes of unitarily invariant norms whose domain is restricted to matrices with prescribed bounded rank. As many common loss functions, e.g., squared distance in the Frobenius norm, contain terms of unitarily invariant norms, these norms have the attractive feature to exactly convexify, i.e., the convexified problem in terms of the low-rank inducing norm coincides on with the original at all low-rank matrices of prescribed bounded rank. Therefore, if the convexified problem has a low-rank solution, it is guaranteed to be a solution to the non-convex one.

Although low-rank inducing norms often admit a representation as semi-definite programs (SDP) [17], proximal splitting algorithms [9] are often used for large-scale problems, where standard interior-point method solvers have too costly iterations [39]. The main objective of this work is to efficiently compute the needed proximal mappings of low-rank inducing norms that are composed with increasing convex functions. To this end, we develop a generic nested binary search algorithm, which in each iteration solves a simple problem. While for well-known low-rank inducing norms such as the nuclear norm [38] and the low-rank inducing Frobenius norm [1, 14, 29, 30], our algorithm will recover the same efficiency, for other norms such as the low-rank inducing spectral norm [46], our approach improves the computational complexity significantly, especially in the vector-valued case. Finally, [45] proposes a non-analytic approach for an extended class of not necessarily unitarily invariant low-rank inducing norms (see [27]). This approach, however, depends on the complexity and convergence rates of other optimization algorithms.

The paper is organized as follows. We start by introducing some preliminaries on norms and convex optimization. Subsequently, a formal definition of the class of low-rank inducing norms, including their application to rank constrained optimization problems is outlined. Then, we discuss and derive our main results, the binary search framework and outline an algorithm for evaluating their epigraph projections. For the low-rank inducing Frobenius and spectral norms, we make these computations explicit and arrive at implementable algorithms for which the computational cost is analyzed. Subsequently, a case study is performed in order to illustrate the performance of our algorithm through proximal splitting. Finally, we draw a conclusion and point the reader to our freely available implementations of these algorithms in MATLAB and Python.

2 Preliminaries

The set of reals is denoted by $\mathbb {R}$, the set of real vectors by $\mathbb {R}^n$, the set of vectors with nonnegative entries by $\mathbb {R}^n_{\ge 0}$ and the set of real matrices by $\mathbb {R}^{n \times m}$. In the remainder of the paper, we assume without loss of generality that $n \le m$. The singular value decomposition of $X \in \mathbb {R}^{n \times m}$ is denoted by $X = \sum _{i=1}^{n} \sigma _i(X) u_i v_i^\mathsf {T}$ with non-increasingly ordered singular values $\sigma _1(X) \ge \dots \ge \sigma _{n}(X)$ (counted with multiplicity). The corresponding vector of all singular values is given by $\sigma (X):=(\sigma _1(X),\ldots ,\sigma _{n}(X)).$ For all $x=(x_1,\ldots ,x_n)\in \mathbb {R}^n$, we define the $\ell _p$ norms with $1\le p<\infty $ by $\ell _p(x):=\left( \sum _{i=1}^{q} |x_i|^p\right) ^{\frac{1}{p}}$ and $\ell _{\infty }(x):=\max _{i}|x_i|$, where $|\cdot |$ denotes the absolute value.

A matrix norm $\Vert \cdot \Vert : \mathbb {R}^{n \times m}\rightarrow \mathbb {R}_{\ge 0}$ is called unitarily invariant if for all unitary matrices $U \in \mathbb {R}^{n \times n}$ and $V \in \mathbb {R}^{m \times m}$ and all $X \in \mathbb {R}^{n \times m}$ it holds that $\Vert UXV\Vert = \Vert X\Vert $. Equivalently, unitary invariance can be characterized by symmetric gauge functions (see e.g., [25, Theorem 7.4.7.2]):

Definition 1

A function $g: \mathbb {R}^n \rightarrow \mathbb {R}_{\ge 0}$ is a symmetric gauge function if

i.
g is a norm.
ii.
$\forall x \in \mathbb {R}^{n}: g(|x|) = g(x)$, where |x| denotes the element-wise absolute value.
iii.
$g(Px) = g(x)$ for all permutation matrices $P\in \mathbb {R}^{n\times n}$ and all $x\in \mathbb {R}^n$.

Proposition 1

The norm $\Vert \cdot \Vert :\mathbb {R}^{n \times m}\rightarrow \mathbb {R}_{\ge 0}$ is unitarily invariant if and only if $\Vert \cdot \Vert = g(\sigma _1(\cdot ),\dots ,\sigma _{n }(\cdot ))$, where g is a symmetric gauge function.

Throughout this work, we use the notation $\Vert X\Vert _g := g(\sigma (X))$. For $X,Y \in \mathbb {R}^{n \times m}$ the Frobenius inner product is defined as $\langle X , Y \rangle := \sum _{i=1}^{m}\sum _{j=n}^{n} x_{ij} y_{ij} = \text {trace}(X^\mathsf {T}Y)$ with Frobenius norm $\Vert X\Vert _{\ell _2} := \ell _2(\sigma (X)) = \sqrt{\langle X , X \rangle }.$ The nuclear norm and the spectral norm are given by $\Vert \cdot \Vert _{\ell _1}:=\ell _{1}(\sigma (\cdot ))$ and $\Vert \cdot \Vert _{\ell _\infty }:=\ell _{\infty }(\sigma (\cdot ))=\sigma _1(\cdot )$. The dual norm to $\Vert \cdot \Vert _g$ is defined as

$$\begin{aligned} \Vert \cdot \Vert _{g^D}&:= \max _{\Vert X\Vert _g \le 1} \langle \cdot , X \rangle =: g^D(\sigma _1(\cdot ),\dots ,\sigma _n(\cdot )). \end{aligned}$$

(1)

Dual norms inherit the unitary invariance as well as the duality relationship for $\ell _p$ norms, i.e., $g = \ell _p$ implies $g^D = \ell _q$ with $p,q \in [1,\infty ]$ satisfying $\frac{1}{p} + \frac{1}{q} = 1$. We will also make use of truncated dual gauge functions. Let $y\in \mathbb {R}^n$, $r\in \{1,\ldots ,n\}$, and $g^D:\mathbb {R}^n\rightarrow \mathbb {R}_{\ge 0}$. The truncated dual gauge function is then defined as

$$\begin{aligned} g^D_r(y) := g^D(\text {sort}(y)_1,\ldots ,\text {sort}(y)_r,0,\ldots ,0), \end{aligned}$$

(2)

where $\text {sort}: \mathbb {R}^n \rightarrow \mathbb {R}^n$ denotes sorting in descending order.

Next, we introduce some standard notation and results from convex optimization [5, 41]. For $f: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace $, we denote by $\text {dom}(f)$ and $\text {epi}(f)$ the effective domain and epigraph of f, respectively. Its subdifferential at $X \in \mathbb {R}^{n \times m}$ is written as $\partial f(X)$. In particular, by [24, Example VI.3.1]

$$\begin{aligned} \partial \Vert X\Vert _g = \{ G \in \mathbb {R}^{n \times m}: \langle G, X\rangle = \Vert X\Vert _g, \ \Vert G\Vert _{g^D} = 1 \}. \end{aligned}$$

(3)

Further, f is said to be proper if $\text {dom}f \ne \emptyset $ and closed if $\text {epi}(f)$ is a closed set. The conjugate (dual) function of f is denoted by $f^*$ and $f^{**} := (f^*)^*$ is called the biconjugate function or convex envelope of f. For $f:\mathbb {R} \rightarrow \mathbb {R}\cup \lbrace \infty \rbrace $, we say that f increasing if $x \le y \ \Rightarrow \ f(x) \le f(y) \ \text { for all } \ x,y \in \text {dom}(f)$ and if there exist $x,y\in \mathbb {R}$ such that $x<y$ and $f(x)<f(y)$. Moreover, its monotone conjugate is defined as [41] $f^+(y) := \sup _{x \ge 0} \left[ xy - f(x) \right] \ \text { for all } y\in \mathbb {R}.$ The 0-infinity indicator (or characteristic) function of a set $\mathcal {S} \subset \mathbb {R}^{n \times m}$ is denoted by $\chi _{\mathcal {S}}$, which we also use for the indicator function of the set of matrices with at most rank r, i.e., $\chi _{\text {rank}(\cdot )\le r}$. For any $Z \in \mathbb {R}^{n \times m}$, the proximal mapping of a closed, proper and convex function $f: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace $ is defined as

$$\begin{aligned} \text {prox}_{\gamma f}(Z) := \mathop {\text {argmin}}\limits _X \left( f(X)+\dfrac{1}{2\gamma } \Vert X-Z\Vert ^2_{\ell _2} \right) . \end{aligned}$$

(4)

In particular, $\text {prox}_{\gamma \chi _{\mathcal {C}}}(Z)$ coincides with the unique Euclidean projection

$$\begin{aligned} \Pi _{\mathcal {C}}(Z) := \mathop {\text {argmin}}\limits _{X \in \mathcal {C}} \Vert X-Z\Vert _{\ell _2} \end{aligned}$$

onto $\mathcal {C}$ for any closed, non-empty, convex set $\mathcal {C} \subset \mathbb {R}^{n \times m}$. Moreover, by the extended Moreau decomposition it holds for all $f: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace $, $Z \in \mathbb {R}^{n \times m}$ and $\gamma > 0$ that (see [4, Theorem 6.29])

$$\begin{aligned} \text {prox}_{\gamma f}(Z) = Z - \gamma \text {prox}_{\gamma ^{-1} f^*}(\gamma ^{-1}Z). \end{aligned}$$

(5)

Finally, we denote compositions of two functions f and g by $ (f \circ g)(\cdot ) := f(g(\cdot ))$.

3 Low-Rank Inducing Norms

This section introduces the family of unitarily invariant low-rank inducing norms, which has been discussed in [17]. Besides recapping some elementary properties, we briefly motivate the usefulness of these norms as convex envelopes or additive regularizers in optimization problems to promote low-rank solutions.

Low-rank inducing norms are defined as the dual norm of a rank constrained dual norm

$$\begin{aligned} \Vert Y \Vert _{g^D,r} := \max _{{\mathop {\Vert X\Vert _g \le 1}\limits ^{\text {rank}(X) \le r}}} \langle X, Y \rangle . \end{aligned}$$

(6)

This means that the low-rank inducing norms corresponding to $\Vert \cdot \Vert _g$ are

$$\begin{aligned} \Vert X\Vert _{g,r*} := \max _{\Vert Y\Vert _{g^D,r}\le 1}\langle Y,X\rangle . \end{aligned}$$

(7)

For $r=n$, the rank constraint in (6) is redundant and $\Vert \cdot \Vert _g \equiv \Vert \cdot \Vert _{g,r*}$. Some important properties of these norms are summarized next [17].

Lemma 1

Let $X,Y \in \mathbb {R}^{n \times m}$, $r \in \mathbb {N}$ be such that $1\le r \le n$, and $g: \mathbb {R}^n \rightarrow \mathbb {R}_{\ge 0}$ be a symmetric gauge function. Then $\Vert \cdot \Vert _{g^D,r}$ is a unitarily invariant norm with

$$\begin{aligned} \Vert Y \Vert _{g^D,r} = g^D_r(\sigma (Y)), \end{aligned}$$

(8)

where $g^D_r$ is defined in (2). Its dual norm $\Vert \cdot \Vert _{g,r*}$ satisfies

$$\begin{aligned}&\Vert \cdot \Vert _{g,r*} = (\Vert \cdot \Vert _g + \chi _{\text {rank}(\cdot )\le r}(\cdot ))^{**}. \end{aligned}$$

(9)

In this work, we especially consider the so-called low-rank inducing Frobenius and spectral norms, i.e., the cases when $g = \ell _2$ and $g = \ell _{\infty }$. Since $\ell _2^D = \ell _2$ and $\ell _{\infty }^D = \ell _1$, it follows from (8) that $\Vert X\Vert _{\ell _2,r*} :=\max _{\Vert Y\Vert _{\ell _2,r} \le 1}$ with $\Vert Y\Vert _{\ell _2^D,r}:=\sqrt{\sum _{i=1}^r \sigma _i^2(Y)}$ and $\left\| {} X {}\right\| _{\ell _{\infty },r*} :=\max _{\left\| {} Y{}\right\| _{\ell _1,r}\le 1}\langle Y,X\rangle $ with $\left\| {} Y{}\right\| _{\ell _1,r} = \sum _{i=1}^r \sigma _i(Y)$.

The following motivates the main interest in low-rank inducing norms (see [16, 17, 19] for details).

Proposition 2

Assume that $f_0: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace $ is a proper closed convex function, and that $r \in \mathbb {N}$ is such that $1 \le r \le \min \{ m,n \}$. Let $f_1: \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \cup \{ \infty \}$ be an increasing, proper closed convex function, and let $\theta >0$. Then

$$\begin{aligned} (f_1 \circ \left\| {} \cdot {} \right\| _{g,r*})^*= f_1^+(\left\| {} \cdot {}\right\| _{g^D,r}) \end{aligned}$$

(10)

and

$$\begin{aligned} \inf _{{\mathop {\text {rank}(X) \le r}\limits ^{X \in \mathbb {R}^{n \times m}}}} \left[ f_0(X) + \theta f_1(\Vert X\Vert _g) \right]&\ge -\inf _{D \in \mathbb {R}^{n \times m}} \left[ f_0^*(D) + \theta f^{+}(\theta ^{-1}\Vert D\Vert _{g^D,r}) \right] \end{aligned}$$

(11)

$$\begin{aligned}&= \inf _{X \in \mathbb {R}^{n \times m}} \left[ f_0(X) + \theta f_1(\Vert X\Vert _{g,r*}) \right] . \end{aligned}$$

(12)

If $X^\star $ solves (12) such that $\text {rank}(X^\star ) \le r$, then equality holds and $X^\star $ is also a solution to the problem on the left of (11).

In other words, Proposition 2 shows that low-rank inducing norms can be used both as additive regularizers and direct convex envelopes to find (approximate) solutions to

$$\begin{aligned} \begin{aligned}&\underset{X}{\text {minimize}}&L(X)\\&\text {subject to}&\text {rank}(X) \le r. \end{aligned} \end{aligned}$$

(13)

For regularization as in [15, 42], we set $f_0 = L$ and choose a suitable $f_1$ and $\theta $ to find an approximate solution. In the second case, when L can be split into $L = f_0 + f_1(\Vert \cdot \Vert _g)$ as in Proposition 2, then

$$\begin{aligned} \min _{X \in \mathbb {R}^{n \times m}} \left[ f_0(X) + f_1(\Vert X\Vert _{g,r*}) \right] \end{aligned}$$

(14)

may return an (exact) solution to (13).

4 Proximal Mappings

For problems of small dimensions, it is often convenient to solve (14) through semi-definite programming (SDP). However, conventional SDP solvers are typically based on interior-point methods (see [39]) with an iteration cost that grows unfavorably with the problem dimension. For large-scale problems, proximal splitting methods can be used [4, 9]. To efficiently solve (14), proximal splitting methods require efficient computation of the proximal mapping of $f_1(\Vert \cdot \Vert _{g,r*})$.

In this section, we present our main results on developing a nested binary search framework for computing this proximal mapping for simple choices of $f_1$, efficiently. Explicit and implementable steps for these computations will be shown for the common cases $f_1 = (\cdot )$ and $f_1 = (\cdot )^2$ with $g = \ell _2$ and $g = \ell _{\infty }$ [3, 17, 19, 37]. In Sect. 4.3, the computational complexity of our generic algorithm as well as these particular cases is derived. In cases where $f_1$ is not simple, we can write (14) as

$$\begin{aligned} \min _{t \in \mathbb {R},~X \in \mathbb {R}^{n \times m}} f_0(X) + f_1(t) + \chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}(X,t), \end{aligned}$$

(15)

where $\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}$ is the indicator function of the epigraph to $\Vert \cdot \Vert _{g,r*}$. Then a consensus formulation for proximal splitting methods (see [9]) requires an evaluation of the proximal mappings for $f_1$ and $\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}$. Since $f_1$ is one-dimensional, convex, proper and increasing, its proximal mapping is fast to evaluate. We will see as part of our complexity analysis in Sect. 4.3 that computing $\text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}$, $\text {prox}_{\Vert \cdot \Vert _{g,r*}}$ and $\text {prox}_{\Vert \cdot \Vert _{g,r*}^2}$, i.e., $f_1 = (\cdot )$ and $f_1 = (\cdot )^2$, is equally costly.

Note that in contrast to $\left\| {} \cdot {}\right\| _{ g,r*}$, its dual norm $\left\| {} \cdot {}\right\| _{ g^D,r}$ is explicitly known by its definition (8), which is why we derive our search framework for

$$\begin{aligned} \text {prox}_{\gamma ^{-1}f_1^+(\Vert \cdot \Vert _{g^D,r})}(Z) \quad \text {and} \quad \Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v), \end{aligned}$$

(16)

with

$$\begin{aligned} -\text {epi}(\left\| {} \cdot {}\right\| _{g^D,r}) :=\{(Y,-w) : \left\| {} Y{}\right\| _{g^D,r}\le w\} \end{aligned}$$

which by (5) and (10) yields the sought proximal mappings

$$\begin{aligned} \text {prox}_{\gamma f_1(\left\| {} \cdot {} \right\| _{g,r*})}(Z)&= Z - \gamma \text {prox}_{\gamma ^{-1} f_1^+(\Vert \cdot \Vert _{g^D,r})}(\gamma ^{-1}Z) \end{aligned}$$

(17a)

$$\begin{aligned} \text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v)&= \Pi _{\text {epi}(\left\| {} \cdot {} \right\| _{g,r*})}(Z,z_v) = (Z,z_v) - \Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v). \end{aligned}$$

(17b)

4.1 Search Framework

Next, we present our main result, which shows that (16), and hence Eqs. (17a) and (17b), can be computed by a nested parameter search. Since the computations of (16) can be compactly unified as

$$\begin{aligned} \begin{aligned}&\underset{Y,w}{\text {minimize}}&f(w) + \frac{\gamma }{2} \Vert Y-Z\Vert _{\ell _2}^2\\&\text {subject to}&w \ge \left\| {} Y{}\right\| _{g^D,r}, \ Y \in \mathbb {R}^{n \times m}, \end{aligned} \end{aligned}$$

(18)

where f is closed, proper and convex, our results are stated for all such problems. Table 1 summaries common choices for f and its relationship to Eqs. (17a) and (17b) via (16).

Table 1 Example choices for f and $\gamma $ in (18) for the computation of (16) and thus Eqs. (17a) and (17b). $\chi _{\Vert \cdot \Vert _{g^D} \le \gamma }$ stands for the indicator function of the set $\{X: \Vert X\Vert _{g^D} \le \gamma \}$

Full size table

Before we state the main theorem on how to solve (18) with a nested binary search method, we outline the steps that give rise to this algorithm. It is well-known that the solution $Y^\star $ to (18) and Z have a simultaneous SVD [31, 36] and, therefore, only the singular values of $Y^\star $ need to be computed. Let $y_i=\sigma _i(Y)$ and $z_i=\sigma _i(Z)$, then it follows that (18) reduces to the vector-valued problem

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n. \end{aligned} \end{aligned}$$

(19)

Since $z \in \mathbb {R}^n_{\ge 0}$ is monotonically decreasing, it can be shown that the minimizer of (19) is nonnegative. The problem is, therefore, equivalent to solving

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D(y_1,\ldots ,y_r,0,\ldots ,0), \ y \in \mathbb {R}^n,\\&&y_1 \ge \dots \ge y_n. \end{aligned} \end{aligned}$$

(20)

Since only the r first elements in y are included in the norm constraint, the solution may have a chain of equalities around $y_r$, i.e., there exist integers $t\ge 1$ and $s\ge 0$ such that

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D(y_1,\ldots ,y_r,0,\ldots ,0), \ y \in \mathbb {R}^n,\\&&y_1\ge \dots \ge y_{r-t+2}>y_{r-t+1} = \dots = y_{r}\\&\qquad \qquad \qquad =\dots = y_{r+s}>y_{r+s+1} \ge \dots \ge y_n. \end{aligned} \end{aligned}$$

(21)

The base case $t=1$ and $s=0$ implies that $y_{r-1}>y_r>y_{r+1}$, i.e., the chain has length one. Thus, if we can solve (21) for an arbitrary, but fixed pair (t, s), an optimal $(t^*,s^*)$ could be determined by comparison with all pairs. As this would be very inefficient, the proposed search rules are devised to find $(t^*,s^*)$ by only considering a few pairs.

To state these rules, we need to introduce the truncated gauge function of $g^D$ as

$$\begin{aligned} g^D_{r,s,t}(x) := g^D\bigg ((Tx)_1,\dots ,(Tx)_{r-t},\underbrace{(Tx)_{r-t+1},\dots ,(Tx)_{r-t+1}}_{t~\text {times}},0,\dots ,0\bigg ), \end{aligned}$$

where $x \in \mathbb {R}^n$ and the truncation operator $T: \mathbb {R}^{n} \rightarrow \mathbb {R}^{r-t+1}$ is defined for all $1 \le r \le n$ and $(t,s) \in \{1,\dots ,r\} \times \{0,\dots ,n-r\}$ as

$$\begin{aligned} (Tx)_i := {\left\{ \begin{array}{ll} \text {sort}(x)_i, &{}\text {if } 1\le i \le r-t, \\ \dfrac{\sum _{i = r-t+1}^{r+s} \text {sort}(x)_i}{\sqrt{t+s}}, &{}\text {if } i = r-t+1. \end{array}\right. } \end{aligned}$$

(22)

Note that $g^D_{r,s,t}$ is indeed a gauge function with dual gauge function [23, Lemma 2.2.2])

$$\begin{aligned} g_{r,s,t}(x) := g((Tx)_1,\dots ,(Tx)_{r-t},\underbrace{\tfrac{{(Tx)}_{r-t+1}(s+t)}{t},\dots ,\tfrac{(Tx)_{r-t+1}(s+t)}{t}}_{t~\text {times}},0,\dots ,0). \end{aligned}$$

For the special case $(t,s) = (1,0)$, it reduces to $g^D_r$ in (2). We are now ready to state our main theorem.

Theorem 1

Let $Z = \sum _{i = 1}^n \sigma _i(Z) u_i v_i^\mathsf {T}\in \mathbb {R}^{n \times m}$, $\gamma > 0$, $1\le r\le n$, $g: \mathbb {R}^n \rightarrow \mathbb {R}$ be a gauge function, and $f:\mathbb {R} \rightarrow \mathbb {R} $ be proper, closed and convex. For each $(t,s) \in \{1, \dots , r\} \times \{0,\dots , n-r\}$ let $(y^{(t,s)},w^{(t,s)}) \in \mathbb {R}^{n+1}$ be defined as

$$\begin{aligned} y^{(t,s)}_i&:= {\left\{ \begin{array}{ll} \tilde{y}_i, &{}\text {if } 1\le i \le r-t,\\ \dfrac{\tilde{y}_i}{\sqrt{t+s}}, &{}\text {if } r-t+1 \le i \le r+s, \\ \sigma _i(Z) &{}\text {if } i \ge r+s+1.\\ \end{array}\right. } \end{aligned}$$

(23a)

$$\begin{aligned} w^{(t,s)}&:= \tilde{w} \end{aligned}$$

(23b)

where $(\tilde{y},\tilde{w}) \in \mathbb {R}^{r-t+2}$ solves

$$\begin{aligned} \begin{aligned}&\underset{\tilde{y},\tilde{w}}{\text {minimize}}&f(\tilde{w})+ \frac{\gamma }{2}\sum _{i=1}^{r-t+1}(\tilde{y}_i - \tilde{z}_i)^2 \\&\text {subject to}&\tilde{w} \ge g_{r,s,t}^D(\tilde{y}), \ \tilde{y} \in \mathbb {R}^{r-t+1}, \end{aligned} \end{aligned}$$

(24)

and $\tilde{z} := T \sigma (Z)$ is given by (22). Then $(Y^\star ,w^\star ) = (\sum _{i = 1}^n y_i^{(t^\star ,s^\star )}u_i v_i^\mathsf {T},w^{(t^\star ,s^\star )})$ is the solution to (18), where

$$\begin{aligned} t^\star&:= \min \left\{ \lbrace t: y^{(t,s_t^\star )}_{r-t} > y_{r-t+1}^{(t,s_t^\star )} \rbrace \cup \lbrace r \rbrace \right\} \end{aligned}$$

(25a)

$$\begin{aligned} s_t^\star&:= \min \left\{ \lbrace s: y^{(t,s)}_{r+s} > y_{r+s+1}^{(t,s)} \rbrace \cup \lbrace n-r \rbrace \right\} \nonumber \\ s^\star&:= s_{t^\star }^\star \end{aligned}$$

(25b)

In particular, $(t^\star ,s^\star )$ can be found by a nested binary search over t and s with the following rules for increasing/decreasing t and s:

I.
$y^{(t,s_t^\star )}_{r-t} \ge y_{r-t+1}^{(t,s_t^\star )}$ for all $t \ge t^\star $.
II.
$y^{(t,s_t^\star )}_{r-t} \le y_{r-t+1}^{(t,s_t^\star )}$ for all $t < t^\star $.
III.
If $t < t^\star $ and $y^{(t,s_t^\star )}_{r-t} = y_{r-t+1}^{(t,s_t^\star )}$ then $\left( y^{(t,s_t^\star )},w^{(t,s_t^\star )} \right) = \left( y^{(t^\star ,s^\star )},w^{(t^\star ,s^\star )} \right) $.
IV.
$y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)}$ for all $s \ge s_t^\star $.
V.
$y^{(t,s)}_{r+s} \le y_{r+s+1}^{(t,s)}$ for all $s < s_t^\star $.
VI.
If $s < s_t^\star $ and $y^{(t,s)}_{r+s} = y_{r+s+1}^{(t,s)}$ then $\left( y^{(t,s)},w^{(t,s)} \right) = \left( y^{(t,s_t^\star )},w^{(t,s_t^\star )} \right) $.

A few words on Theorem 1 may be helpful. The first part simply makes explicit that $(y^{(t,s)},w^{(t,s)})$ in Eqs. (23a) and (23b) is the solution of (21) with fixed t and s, i.e., it solves

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^{n}(y_i - z_i)^2\\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n,\\&&y_{r-t+1} = \dots = y_{r+s}, \end{aligned} \end{aligned}$$

(26)

via the solution of the lower-dimensional problem (24). For fixed t in (21), the search rules for s (Items IV. to VI.) can be used to find an optimal $s = s_t^\star $ that minimizes the cost in (21) among all choices of s that fulfil the constraint $y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)} \ge \dots \ge y_{n}^{(t,s)}.$ Since $y_i^{(t,s)} = z_i$ for $i \ge r+s+1$ by (23a), it suffices to check that $y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)}$, where by (25b) $s_t^\star $ is the smallest of such s. Similarly, the search rules for finding an optimal $t =t^\star $ minimize the cost in (21) among all choices $(t,s) = (t,s_t^\star )$ that do not violate the constraint $y^{(t,s_t^\star )}_{r-t} \ge y_{r-t+1}^{(t,s_t^\star )}$. Using nested binary search (see [28]) over s (inner loop) and t (outer loop), an optimal $(t^*,s^*)$ can be found efficiently under the assumption that (24) has an efficiently computable solution for all choices (t, s). For more details, see the derivation of the proof to Theorem 1 in Appendix 2 and our explicit implementation for determining $\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)$ in Algorithm 1.

4.2 Low-Rank Inducing Frobenius and Spectral Norms

Next, we exemplify solutions to (24) for the instances in Table 1 with $g = \ell _2$ and $g = \ell _\infty $. A general result on the solvability of (24) is given in Appendix 3.

In particular, we will discuss solutions to (24) for all $g = \tau \ell _2$ and $g = \tau \ell _\infty $, $\tau > 0$, because this enables us to handle the first two cases in Table 1 simultaneously through the identity

$$\begin{aligned} \text {prox}_{\frac{\tau ^2}{2}{\left\| {}\cdot {}\right\| _{ g,r*}^2}}(Z) = Z - \text {prox}_{\frac{1}{2}{\left\| {} \cdot {}\right\| _{ \frac{g^D}{\tau },r}^2}}(Z) = Z- \Pi _{Y}\bigg (\Pi _{-\text {epi}(\left\| {} \cdot {}\right\| _{ \frac{g^D}{\tau },r})}(Z,0)\bigg ), \end{aligned}$$

where $\Pi _{Y} (Y,w) := Y$ and $\tau > 0$. It is easy to adjust these computations for the third case, because $\text {prox}_{\tau {\left\| {}\cdot {}\right\| _{ g,r*}}}(Z) = \text {prox}_{{\left\| {}\cdot {}\right\| _{ \tau g,r*}}}(Z)$.

Proposition 3

Let $Z = \sum _{i = 1}^n \sigma _i(Z) u_i v_i^\mathsf {T}\in \mathbb {R}^{n \times m}$, $g = \tau \ell _2$ with $\tau > 0$, $1\le r\le n$, $\gamma = 1$, $z_v \in \mathbb {R}$ and $\tilde{z} := T \sigma (z)$. Then, $\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)$ can be computed via Theorem 1 with $f(w) = \frac{1}{2}(w+z_v)^2$, where the solution to (24) is characterized by one of the following three distinct cases:

$$\begin{aligned}&(\tilde{y},\tilde{w}) = (\tilde{z},z_v) ~\ \Longleftrightarrow ~\ \sqrt{\sum _{i=1}^{r-t} \tilde{z}_i^2 + \frac{t}{s+t} \tilde{z}_{r-t+1}^2} \le -\tau z_v, \end{aligned}$$

(27a)

$$\begin{aligned}&(\tilde{y},\tilde{w}) = 0 ~\ \Longleftrightarrow ~\ \sqrt{\sum _{i=1}^{r-t} \tilde{y}_i^2 + \frac{s+t}{t} \tilde{y}_{r-t+1}^2} \le \frac{z_v}{\tau } \end{aligned}$$

(27b)

and otherwise

$$\begin{aligned} \tilde{y}_i&= \frac{\tilde{z}_i}{1+\frac{\mu }{\tau ^2 \tilde{w}}}, \ 1\le i \le r-t \end{aligned}$$

(27c)

$$\begin{aligned} \tilde{y}_{r-t+1}&= \frac{ \tilde{z}_{r-t+1}}{1+\frac{\mu t}{\tau ^2 \tilde{w} (s+t)}} \end{aligned}$$

(27d)

$$\begin{aligned} \tilde{w}&= \mu - z_v \end{aligned}$$

(27e)

where the unique $\mu \ge 0$ is a solution to the fourth-order polynomial

$$\begin{aligned} \left[ \left( \tilde{w} \tau +\frac{\mu }{\tau }\right) ^2-c_1\right] \left[ (t + s)\tau \tilde{w} + \frac{\mu }{\tau } t\right] ^2 - t c_2^2\left( \tilde{w} \tau +\frac{\mu }{\tau }\right) ^2 = 0 \end{aligned}$$

(27f)

$c_1 := \sum _{i=1}^{r-t} \tilde{z}^2_i$ and $c_2 := \sqrt{t+s}\tilde{z}_{r-t+1}$.

Similarly, $\text {prox}_{\chi _{\Vert \cdot \Vert _{\tau g^D,r} \le \gamma }}(Z)$ can be determined by setting $f(w) = \chi _{[0,\gamma ]}(w)$, where it suffices to consider the two cases: (27a) with $z_v = -1$, and Eqs. (27c), (27d) and (27f) with $\tilde{w} = 1$.

Proposition 4

Let $Z = \sum _{i = 1}^n \sigma _i(Z) u_i v_i^\mathsf {T}\in \mathbb {R}^{n \times m}$, $g = \tau \ell _\infty $ with $\tau > 0$, $1\le r\le n$, $\gamma = 1$, $z_v \in \mathbb {R}$ and $\tilde{z} := T \sigma (z)$. Further, let

$$\begin{aligned} \hat{z}&:=\left( \tilde{z}_1,\ldots ,\tilde{z}_j,\frac{t}{\sqrt{(t+s)}}\tilde{z}_{r-t+1},\tilde{z}_{j+1},\ldots ,\tilde{z}_{r-t}\right) \in \mathbb {R}^{r-t+1},\\ \alpha&:= \left( \underbrace{1,\ldots ,1}_{\text {length } j},\frac{t^2}{(t+s)},1,\dots ,1\right) \in \mathbb {R}^{r-t+1}. \end{aligned}$$

where j is chosen such that

$$\begin{aligned} \tilde{z}_j>\tfrac{\sqrt{(t+s)}}{t}\tilde{z}_{r-t+1}\ge \tilde{z}_{j+1} \quad \text {or} \quad \tilde{z}_{r-t} \ge \tfrac{\sqrt{(t+s)}}{t}\tilde{z}_{r-t+1} . \end{aligned}$$

(28)

Then, $\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)$ can be computed via Theorem 1 with $f(w) = \frac{1}{2}(w+z_v)^2$, where the solution to (24) is characterized by one of the following three distinct cases:

$$\begin{aligned}&(\tilde{y},\tilde{w}) = (\tilde{z},zv) ~\ \Longleftrightarrow ~\ \sum _{i=1}^{r-t} |\tilde{z}_i| + \frac{t}{\sqrt{t+s}} |\tilde{z}_{r-t+1}| \le -\tau z_v \end{aligned}$$

(29a)

$$\begin{aligned}&(\tilde{y},\tilde{w}) = 0 ~\ \Longleftrightarrow ~\ \max \left( \tilde{z}_1,\dfrac{\sqrt{t+s}}{t} \tilde{z}_{r-t+1} \right) \le \frac{z_v}{\tau } \end{aligned}$$

(29b)

and otherwise

$$\begin{aligned} \tilde{y}_i&= \max \left( \tilde{z}_i - \tfrac{\mu }{\tau },0 \right) , \ 1\le i \le r-t, \end{aligned}$$

(29c)

$$\begin{aligned} \tilde{y}_{r-t+1}&= \max \left( \tilde{z}_{r-t+1} - \tfrac{t \mu }{\sqrt{(t + s)}\tau },0 \right) , \end{aligned}$$

(29d)

$$\begin{aligned} \tilde{w}&= \mu - z_v \end{aligned}$$

(29e)

where $\mu = \hat{\mu }_{k^\star }$ with $\hat{\mu }_k= \frac{z_v+\sum _{i=1}^{k}\hat{z}_{i}}{1+\sum _{i=1}^k\alpha _i}$ and $k^\star $ can be identified by a search over k with the following rules for increasing/decreasing k:

I.
$k^\star =\max \{k : \hat{z}_k-\alpha _k\hat{\mu }_k\ge 0\}$
II.
$\hat{z}_k-\alpha _k\hat{\mu }_k\ge 0$ for all $k \le k^\star $
III.
$\hat{z}_k-\alpha _k\hat{\mu }_k< 0$ for all $k> k^\star $

Similarly, $\text {prox}_{\chi _{\Vert \cdot \Vert _{\tau g^D,r} \le \gamma }}(Z)$ can be determined by setting $f(w) = \chi _{[0,\gamma ]}(w)$, where it suffices to consider the two cases: (29a) with $z_v = -1$ and Eqs. (29c) and (29d), where $\mu = \hat{\mu }_{k^\star }$ can be found with the search rules from above and $\hat{\mu }_k = \frac{\sum _{i=1}^{k}\hat{z}_{i}}{\sum _{i=1}^k\alpha _i}$.

Propositions 3 and 4 are proved in Appendixces 4 and 6, respectively, and implementations for the user are available for MATLAB and Python at [20, 21].

4.3 Computational Complexity

In the following, we evaluate the computational complexity, i.e., counting all flops (see [44]) of the discussed approaches for computing

$$\begin{aligned} \text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v) = \Pi _{\text {epi}(\left\| {} \cdot {} \right\| _{g,r*})}(Z,z_v) = (Z,z_v) - \Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v). \end{aligned}$$

Since the same analysis also applies to the other cases discussed in Table 1, this will allow us to compare our approach to existing methods. Our evaluation starts with a discussion of Algorithm 1 for a general gauge function, followed by an explicit discussion for the cases of $g=\ell _2$ and $g = \ell _{\infty }$ in Sects. 4.3.1 and 4.3.2 of which a summary is given in Table 2.

Table 2 Complexity of computing $\Pi _{\text {epi}(\Vert \cdot \Vert _{\ell _2,r*})}(Z,z_v)$ and $\Pi _{\text {epi}(\Vert \cdot \Vert _{\ell _1,r*})}(Z,z_v)$ given a pre-computed SVD of Z in comparison with others

Full size table

In order to apply the binary search rules in Theorem 1, we only need to determine $(y^{(t,s)}_{r-t},y^{(t,s)}_{r-t+1},y^{(t,s)}_{r+s},y^{(t,s)}_{r+s+1}),$ whose computational cost we assume to be bounded by C(n, r). Then, the complexity of Algorithm 1 is the sum of:

1.
SVD for Z providing all $\sigma _i(Z)$ and $u_iv_i^\mathsf {T}$ such that $Z = \sum _{i = 1}^n \sigma _i(Z)u_iv_i^\mathsf {T}$ (see [44]): $\mathcal {O}(mn^2)$.
2.
Binary search rules (see [28]) in Theorem 1 for t and s:
$$\begin{aligned} \mathcal {O}(C(n,r) \log (r)\log (n-r)) \end{aligned}$$
3.
Determine the final full solution: $\mathcal {O}(n)$.
4.
Compute $\text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v)$ from $\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)$: $\mathcal {O}(n)$

In practise, the first cost may be significantly reduced by employing sparse SVD solvers (see e.g., [32, 34]). In particular, for the vector-valued case, this corresponds to a simple sorting of the entries. The second cost is determined by the coordinate transformation (23a), i.e.,

$$\begin{aligned} (y^{(t,s)}_{r-t},y^{(t,s)}_{r-t+1},y^{(t,s)}_{r+s},y^{(t,s)}_{r+s+1})= \left( \tilde{y}_{r-t},\frac{1}{\sqrt{s+t}}\tilde{y}_{r-t+1},\frac{1}{\sqrt{s+t}}\tilde{y}_{r-t+1},\sigma _{r+s+1}(Z)\right) \end{aligned}$$

and therefore the cost for C(n, r) equals the cost $\tilde{C}(n,r)$ for solving (24) to find $(\tilde{y}_{r-t},\tilde{y}_{r-t+1})$. To compute the full solution $y^{(t^\star ,s^\star )}$, once an optimal pair $(t^\star ,s^\star )$ is found, the cost for these pre- and post-computing steps is at most $\mathcal {O}(n)$. Finally, computing $\text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v)$ from $\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)$ only contributes an additional $n+1$ subtractions.

Remark 1

The cost for computing $\tilde{z}_{r-t+1}$ is given by the cost for knowing $\sum _{i=r+1}^{r+s} \sigma _i(Z)$ (for $s > 0$) and $\sum _{i=r-t+1}^{r} \sigma _i(Z)$. Both sums could be computed a priori for all t and s through incremental summation with cost $\mathcal {O}(n)$. However, in practice it may be cheaper to store and re-use the intermediate sums, when deriving $\sum _{i=r-t+1}^{r} z_i$ and $\sum _{i=r+1}^{r+s} z_i$. This means we only need to compute additional intermediate sums whenever t and s get increased within the binary search.

4.3.1 Low-Rank Inducing Frobenius Norms

In order to determine the computational cost $\tilde{C}(n,r)$ for $g = \ell _2$, we need to analyze the complexity of the three cases in Proposition 3. All cases require the evaluation of $\sum _{i=1}^{r-t} \tilde{z}_i^2 = \sum _{i=1}^{r-t} \sigma _i^2(Z)$ as either part of the inequalities Eqs. (27a) and (27b) or as coefficients in polynomial (27f). These sums can be computed once for all $t \in \{1,\dots ,r\}$ with cost $\mathcal {O}(r)$. Then testing Eqs. (27a) and (27b) as well as solving the fourth-order polynomial (27f) are of cost $\mathcal {O}(1)$. Our generic approach, therefore, recovers in this special case the same complexity as the algorithms in [14, 29] (see Table 2).

4.3.2 Low-Rank Inducing Spectral Norms

As in the previous case, determining $\tilde{C}(n,r)$ for $g = \ell _\infty $ requires us to compute the complexity of the three cases in Proposition 4. The cases Eqs. (29a) and (29b) require the evaluation of $\sum _{i=1}^{r-t}\tilde{z}_i = \sum _{i=1}^{r-t}\sigma _i(Z)$. This can be done once for all $t \in \{1,\dots ,r\}$ with cost $\mathcal {O}(r)$, and verifying the corresponding inequalities is then of complexity $\mathcal {O}(1)$.

Determining $\mu $ in the third case of Proposition 4 requires:

a)
Find j in (28): $\mathcal {O}(\log (r-t+1))$, because $\tilde{z}_1 \ge \dots \ge \tilde{z}_{r-t}$.
b)
Determine $\mu _{k^{\star }} = \mu $ through binary search: $\mathcal {O}(r-t+1)$, because $\sum _{i=1}^{r-t+1} \hat{z}_i$ may need to be computed.

Thus, $\tilde{C}(n,r)$ is dominated by the complexity of determining $\mu $, which by the preceding analysis is at most $\mathcal {O}(r)$. Compared to [46], our approach reduces the overall cost significantly (see Table 2), which is especially important for the corresponding vector-valued problem.

5 Case Study: Matrix Completion

In the following, we will see how the binary search parameters (t, s, k) from Algorithm 1 and Proposition 4 evolve when solving an optimization problem with a proximal splitting method. We consider the convexified low-rank matrix completion problem (see, e.g., [7, 8, 17] for motivation and examples)

$$\begin{aligned} \begin{aligned}&\underset{M}{\text {minimize}}&\Vert M\Vert _{\ell _\infty ,r*} \\&\text {subject to}&m_{ij} = n_{ij}, \ (i,j) \in \mathcal {I} \end{aligned} \end{aligned}$$

(30)

with $r =50$, $\mathcal {I} := \{n_{ij}: n_{ij} > 0 \}$ and $N = \sum _{i=1}^r u_i u_i^\mathsf {T}$ being defined through the SVD of

Note that a smaller version of this example has been solved successfully in [17] by using an SDP-solver, but this larger example is far out of the scope of typical SDP-solvers [39, 43]. Therefore, we apply the following Douglas–Rachford splitting scheme (see [9, 11, 33]):

$$\begin{aligned}&X_{i} = \text {prox}_{{\left\| {}\cdot {}\right\| _{ \ell _{\infty },r*}}}(Z_{i-1}), \quad Y_{i}= \Pi _{\mathcal {L}}(2X_{i} - Z_{i-1}), \quad Z_{i} = Z_{i-1} + Y_{i}-X_{i} \end{aligned}$$

(32)

with $\mathcal {L} := \{X \in \mathbb {R}^{500\times 500}: x_{ij} = n_{ij}, \ (i,j) \in \mathcal {I} \}$, $Z_0 = 0$ and $\lim _{i \rightarrow \infty } X_i = \lim _{i \rightarrow \infty } Y_i$ being a solution to (30). By the construction of N, it can be shown that $\lim _{i \rightarrow \infty } X_i = N$ (see [17]).

The parameter path of (t, s, k) for computing $X_i$ is shown in Fig. 1. We observe that as $X_i$ approaches N, the values of t, s and k start plateauing. Thus, by using the values from one iterate in the subsequent iterate, the practical computational cost may reduce significantly. Finally, after the initial transient, the variance of each parameter is small compared to the overall 500 singular values. As a result, sparse SVD algorithms, which only compute a small predefined number of largest singular values (see, e.g., [32, 34]), can be effectively applied. This emphasizing that our complexity analysis is important to both, vector- and matrix-valued problems.

6 Conclusion

This work presents a binary search framework for computing the proximal mappings of all unitarily invariant low-rank inducing norms and their epigraph projections. In particular, complete algorithms for the low-rank inducing Frobenius and spectral norms are presented. Our framework unifies and extends the known proximal mapping computations in the following sense: (i) So far, only proximal mappings for the squared low-rank inducing Frobenius norm [14] and the (non-squared) low-rank inducing spectral norm [46] have been derived. This framework is independent of the particular unitary invariant norm and its composition with a convex increasing function. (ii) Excluding the cost for an SVD, i.e., the cost for the analogous vector-valued problem, we recover the same complexity for the squared low-rank inducing Frobenius norm as in [14, 29], but significantly decrease the complexity for the (non-squared) low-rank inducing spectral norm. Further, we show that these costs also transfer to compositions with simple functions.

Finally, in our case study we have seen that within a proximal splitting method, the computational cost of our proximal mappings may be reduced to approximately linear cost, besides the singular value decomposition, after a small number of iterations and is therefore roughly the same as in case of the nuclear norm/spectral norm. Further, our example also demonstrates that sparse singular value decomposition (see e.g., [32, 34]) can be effectively applied, underlining the importance of our analysis even for the matrix case. Implementations for the low-rank inducing Frobenius and spectral norms are available for MATLAB and Python at [20, 21].

References

Andersson, F., Carlsson, M., Olsson, C.: Convex envelopes for fixed rank approximation. Optim. Lett. 11(8), 1783–1795 (2017). https://doi.org/10.1007/s11590-017-1146-5
Article MathSciNet MATH Google Scholar
Antoulas, A.C.: On the approximation of Hankel matrices. In: Helmke, U. (ed.) Operators. Systems and Linear Algebra: Three Decades of Algebraic Systems Theory, pp. 17–22. Vieweg+Teubner Verlag, Wiesbaden (2013)
Google Scholar
Argyriou, A., Foygel, R., Srebro, N.: Sparse prediction with the k-support norm. In: Pereira, F., Burges, C.J.C., Bottou, L., Weinberger, K.Q. (eds.) Advances in Neural Information Processing Systems, vol. 25, pp. 1457–1465. Curran Associates Inc, London (2012)
Google Scholar
Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics. Springer, New York (2011). https://doi.org/10.1007/978-3-319-48311-5
Book MATH Google Scholar
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004). https://doi.org/10.1017/CBO9780511804441
Book MATH Google Scholar
Candès, E.J., Plan, Y.: Matrix completion with noise. Proc. IEEE 98(6), 925–936 (2010). https://doi.org/10.1109/JPROC.2009.2035722
Article Google Scholar
Candès, E.J., Recht, B.: Exact matrix completion via convex optimization. Found. Comput. Math. 9(6), 717 (2009). https://doi.org/10.1007/s10208-009-9045-5
Article MathSciNet MATH Google Scholar
Chandrasekaran, V., Recht, B., Parrilo, P.A., Willsky, A.S.: The convex geometry of linear inverse problems. Found. Comput. Math. 12(6), 805–849 (2012). https://doi.org/10.1007/s10208-012-9135-7
Article MathSciNet MATH Google Scholar
Combettes, P.L., Pesquet, J.-C.: Proximal Splitting Methods in Signal Processing, pp. 185–212. Springer, New York (2011). https://doi.org/10.1007/978-1-4419-9569-8_10
Book MATH Google Scholar
Condat, L.: Fast projection onto the simplex and the $\ell _1$ ball. Math. Program. 158(1), 575–585 (2016). https://doi.org/10.1007/s10107-015-0946-6
Article MathSciNet MATH Google Scholar
Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)
Article MathSciNet Google Scholar
Duchi, J., Shalev-Shwartz, S., Singer, Y., Chandra, T.: Efficient projections onto the L1-ball for learning in high dimensions. In: ICML ’08: Proceedings of the 25th International Conference on Machine Learning, 25th International Conference on Machine Learning (ICML), pp. 272–279, New York (2008). https://doi.org/10.1145/1390156.1390191
Eldén, L.: Matrix methods in data mining and pattern recognition. SIAM (2007). https://doi.org/10.1137/1.9780898718867
Article MATH Google Scholar
Eriksson, A., Thanh Pham, T., Chin, T.-J., Reid, I.: The k-support norm and convex envelopes of cardinality and rank. In: 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 3349–3357(2015)
Fazel, M., Hindi, H.., Boyd, S.P.: A rank minimization heuristic with application to minimum order system approximation. In: Proceedings of the 2001 American Control Conference, vol 6, pp. 4734–4739 (2001). https://doi.org/10.1109/ACC.2001.945730
Grussler, C., Giselsson, P.: Local convergence of proximal splitting methods for rank constrained problems. In: 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pp. 702–708. Melbourne (2017). https://doi.org/10.1109/CDC.2017.8263743
Grussler, C., Giselsson, P.: Low-rank inducing norms with optimality interpretations. SIAM J. Optim. 28(4), 3057–3078 (2018). https://doi.org/10.1137/17M1115770
Article MathSciNet MATH Google Scholar
Grussler, C., Zare, A., Jovanović, M.R., Rantzer, A.: The use of the $r\ast $ heuristic in covariance completion problems. In|: 2016 IEEE 55th Conference on Decision and Control (CDC), pp. 1978–1983. Las Vegas (2016). https://doi.org/10.1109/CDC.2016.7798554
Grussler, C., Rantzer, A., Giselsson, P.: Low-rank optimization with convex constraints. IEEE Trans. Autom. Control 63(11), 4000–4007 (2018). https://doi.org/10.1109/TAC.2018.2813009
Article MathSciNet MATH Google Scholar
Grussler, C.: LRINorm—A MATLAB package for rank constrained optimization by low-rank inducing norms and non-convex proximal splitting methods. https://github.com/LowRankOpt/LRINorm (2018a)
Grussler, C.: LRIPy—a python package for rank constrained optimization by low-rank inducing norms and non-convex proximal splitting methods. https://github.com/LowRankOpt/LRIPy (2018b)
Held, M., Wolfe, P., Crowder, H.P.: Validation of subgradient optimization. Math. Program. 6(1), 62–88 (1974). https://doi.org/10.1007/BF01580223
Article MathSciNet MATH Google Scholar
Hiriart-Urruty, J.-B., Lemaréchal, C.: , onvex Analysis and Minimization Algorithms II: Advanced Theory and Bundle Methods. Grundlehren der mathematischen Wissenschaften. Springer, Berlin Heidelberg (1993). https://doi.org/10.1007/978-3-662-06409-2
Book MATH Google Scholar
Hiriart-Urruty, J.-B., Lemaréchal, C.: Convex Analysis and Minimization Algorithms I: Fundamentals. Grundlehren der mathematischen Wissenschaften. Springer, Berlin, Heidelberg (1996). https://doi.org/10.1007/978-3-662-02796-7
Book Google Scholar
Horn, R.A., Johnson, C.R.: Matrix Analysis, 2nd edn. Cambridge University Press, Cambridge (2012). https://doi.org/10.1017/9781139020411
Book Google Scholar
Izenman, A.J.: Reduced-rank regression for the multivariate linear model. J. Multivar. Anal. 5(2), 248–264 (1975). https://doi.org/10.1016/0047-259X(75)90042-1
Article MathSciNet MATH Google Scholar
Jacob, L., Obozinski, G., Vert, J.-P.: Group lasso with overlaps and graph lasso. In: Bottou, L., Littman, M. (eds) Proceedings of the 26th International Conference on Machine Learning, pp. 433–440. Montreal. Omnipress (2009)
Knuth, D.E.: The Art of Computer Programming: Sorting and Searching, vol. 3. Pearson Education, New York (1998)
MATH Google Scholar
Lai, H., Pan, Y., Canyi, L., Tang, Y., Yan, S.: Efficient k-Support Matrix Pursuit, pp. 617–631. Springer, Berlin (2014)
Google Scholar
Larsson, V., Olsson, C.: Convex low rank approximation. Int. J. Comput. Vis. 120(2), 194–214 (2016)
Article MathSciNet Google Scholar
Lewis, A.S.: The convex analysis of unitarily invariant matrix functions. J. Convex Anal. 2(1), 173–183 (1995)
MathSciNet MATH Google Scholar
Li, Y., Liu, H., Wen, Z., Yuan, Y.: Low-rank matrix iteration using polynomial-filtered subspace extraction. SIAM J. Sci. Comput. 42(3), A1686–A1713 (2020). https://doi.org/10.1137/19M1259444
Article MathSciNet MATH Google Scholar
Lions, P.-L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979). https://doi.org/10.1137/0716071
Article MathSciNet MATH Google Scholar
Liu, X., Wen, Z., Zhang, Y.: Limited memory block Krylov subspace optimization for computing dominant singular value decompositions. SIAM J. Sci. Comput. 35(3), A1641–A1668 (2013a). https://doi.org/10.1137/120871328
Article MathSciNet MATH Google Scholar
Liu, Z., Hansson, A., Vandenberghe, L.: Nuclear norm system identification with missing inputs and outputs. Syst. Control Lett. 62(8), 605–612 (2013b). https://doi.org/10.1016/j.sysconle.2013.04.005
Article MathSciNet MATH Google Scholar
Lu, Z., Yong, Z., Li, X.: Penalty decomposition methods for rank minimization. Optim. Methods Softw. 30(3), 531–558 (2015). https://doi.org/10.1080/10556788.2014.936438
Article MathSciNet MATH Google Scholar
McDonald, A.M., Pontil, M., Stamos, D.: New perspectives on k-support and cluster norms. J. Mach. Learn. Res. 17(155), 1–38 (2016)
MathSciNet MATH Google Scholar
Parikh, N., Boyd, S.: Proximal algorithms. Found. Trends Optim. 1(3), 127–239 (2014). https://doi.org/10.1561/2400000003
Article Google Scholar
Peaucelle, D., Henrion, D., Labit, Y., Taitz, K.: User’s guide for SEDUMI INTERFACE 1.04. 2002. LAAS-CNRS, Toulouse
Recht, B., Fazel, M., Parrilo, P.A.: Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52(3), 471–501 (2010). https://doi.org/10.1137/070697835
Article MathSciNet MATH Google Scholar
Tyrrell, R.R.: Convex Analysis. Princeton University Press, Princeton (1970)
MATH Google Scholar
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. 58(1), 267–288 (1996)
MathSciNet MATH Google Scholar
Toh, K.C., Tutuncu, R.H., Todd, M.J.: On the implementation of SDPT3 (version 3.1)—a MATLAB software package for semidefinite-quadratic-linear programming. In: IEEE International Conference on Robotics and Automation, pp. 290–296 (2004)
Trefethen, L.N., David Bau, I.I.I.: Numerical Linear Algebra. SIAM, New York (1997)
Book Google Scholar
Villa, S., Rosasco, L., Mosci, S., Verri, A.: Proximal methods for the latent group lasso penalty. Comput. Optim. Appl. 58(2), 381–407 (2014). https://doi.org/10.1007/s10589-013-9628-6
Article MathSciNet MATH Google Scholar
Wu, B., Ding, C., Sun, D., Toh, K.-C.: On the Moreau–Yosida regularization of the vector $k$-norm related functions. SIAM J. Optim. 24(2), 766–794 (2014). https://doi.org/10.1137/110827144
Article MathSciNet MATH Google Scholar
Zoltowski, D.M., Dhingra, N., Lin, F., Jovanović, M.R.: Sparsity-promoting optimal control of spatially-invariant systems. In: 2014 American Control Conference, pp. 1255–1260 (2014). https://doi.org/10.1109/ACC.2014.6859491

Download references

Acknowledgements

This work was completed while both authors were members of the LCCC Linnaeus Center and the eLLIIT Excellence Center at Lund University. It was financially supported by the Swedish Foundation for Strategic Research and the Swedish Research Council through the Project 621-2012-5357.

Funding

Open access funding provided by Lund University.

Author information

Authors and Affiliations

University of California, Berkeley, Berkeley, USA
Christian Grussler
Lund University, Lund, Sweden
Christian Grussler & Pontus Giselsson

Authors

Christian Grussler
View author publications
You can also search for this author in PubMed Google Scholar
Pontus Giselsson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Christian Grussler.

Additional information

Communicated by Shoham Sabach.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lemmas, Proofs and Additional Discussion

1.1 Search Rules

Lemma 2

Let f be proper, closed and convex, $z_1 \ge \dots \ge z_n\ge 0$ and $\left( y^{(t)},w^{(t)}\right) $ denote the t-dependent solution to

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n,\\&&y_{r-t+1} = \dots = y_{r} \ge \dots \ge y_n. \end{aligned} \end{aligned}$$

(33)

where $1\le t \le r$. Then there exists $t^\star $ such that $\left( y^{(t^\star )},w^{(t^\star )}\right) $ is the solution to

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n,\\&&y_1 \ge \dots \ge y_n \ge 0, \end{aligned} \end{aligned}$$

(34)

with $y^{(t^\star )}_{r-t^\star } > y_{r-t^\star +1}^{(t^\star )}$ and $y^{(t^\star )}_{r-t^\star } = y_{r-t^\star +1}^{(t^\star )}$ if $t^\star = r$. Further,

i.
$t^\star = \min \left\{ \lbrace t: y^{(t)}_{r-t} > y_{r-t+1}^{(t)} \rbrace \cup \lbrace r \rbrace \right\} $.
ii.
If $y^{(t')}_{r-t'} \ge y_{r-t'+1}^{(t')}$ then $y^{(t)}_{r-t} \ge y_{r-t+1}^{(t)}$ for all $t \ge t'$.
iii.
If $y^{(t')}_{r-t'} < y_{r-t'+1}^{(t')}$ then $y^{(t)}_{r-t} < y_{r-t+1}^{(t)}$ for all $t \le t'$.

In particular, $t^\star $ can be found by a search over t, where t is increased/decreased according to the following rules:

I
$y^{(t)}_{r-t} \ge y_{r-t+1}^{(t)}$ for all $t \ge t^\star $.
II
$y^{(t)}_{r-t} \le y_{r-t+1}^{(t)}$ for all $t < t^\star $.
III
If $t < t^\star $ and $y^{(t)}_{r-t} = y_{r-t+1}^{(t)}$ then $\left( y^{(t)},w^{(t)} \right) = \left( y^{(t^\star )},w^{(t^\star )} \right) $.

Proof

First we show the equivalence between Eqs. (34) and (33). To this end note that it is not necessary to explicitly restrict y to be nonnegative. The unique solution $(y^\star ,w^\star )$ to (34) fulfills $0 \le y_i^\star \le z_i$ for $1 \le i \le n$. The upper bound holds, because otherwise by [25, Theorem 7.4.8.4] $g^D_r(\bar{y}) \le g^D_r(y^\star )$ with $\bar{y}_i^\star := \min \{z_i,y_i^\star \}$ and thus $\bar{y}^\star $ is a feasible solution to (34) with smaller cost. Similarly, the lower bound holds, because otherwise $\bar{y}^\star $ with $\bar{y}_i^\star = \max \{0,y_i^\star \}$ is a feasible solution to (34) with smaller cost by Definition 1 (ii). Then there exists $t^\star $ such that $ y^\star _{r-t^\star } > y^\star _{r-t^\star +1} = \dots = y^\star _r$ where $t^\star = r$ if $y^\star _1 = y^\star _r$, which implies that $y_{r-t^\star } \ge y_{r-t^\star +1}$ is assumed to be inactive and therefore can be removed from (34). Thus, also the constraints $y_1 \ge \dots \ge y_{r-t^\star }$ can be removed, because the cost function and the sorting of z ensures that the solution will always fulfill them. This yields solving (34) reduces to finding $t^\star $ such that (33) solves (34).

Next, we characterize $t^\star $ in terms of solution to (33). In the following, we let p(t) denote the optimal cost of (33) as a function of t. Since adding constraints cannot reduce the optimal cost, p is a nondecreasing function.

Item i.: By the same reasoning that led to the equivalence between Eqs. (34) and (33), it holds that $y_1^{(t)} \ge \dots \ge y_{r-t}^{(t)}, \ 1\le t \le r$, which is why the set $\lbrace t: y^{(t)}_{r-t} > y_{r-t+1}^{(t)} \rbrace \cup \lbrace r \rbrace $ contains all t for which the solution of (33) is feasible for (34). Since p is nondecreasing and $\left( y^{(t^\star )},w^{(t^\star )}\right) $ is unique, the first claim follows.

Item ii.: The second claim is proven by contradiction. Let $(y^{(t')},w^{(t')})$ be such that $y^{(t')}_{r-t'} \ge y_{r-t'+1}^{(t')}$. Further assume that $y^{(t'+1)}_{r-t'-1} < y_{r-t'}^{(t'+1)}$. In the following, we construct another solution $(\tilde{y},\tilde{w}) \in \mathbb {R}^{q+1}$ to (33) with $t = t' +1$, which has a cost that is no larger than $p(t' +1)$. However, (33) has a unique solution due to strong convexity of the cost function. This yields the desired contradiction. The contradicting solution is constructed as a convex combination $\tilde{w}=(1-\alpha )w^{(t'+1)}+\alpha w^{(t')}$ with $\alpha \in (0,1]$ and a partially sorted convex combination of $y^{(t')}$ and $y^{(t'+1)}$ with the same $\alpha $. Let $\hat{y}:=(1-\alpha )y^{(t'+1)}+\alpha y^{(t')}$ and let

$$\begin{aligned} \tilde{y}:=(\text {sort}(\hat{y}_1,\ldots ,\hat{y}_{r-t'-2},\hat{y}_{r-t'}),\hat{y}_{r-t'-1},\hat{y}_{r-t'+1},\ldots ,\hat{y}_{q}) \end{aligned}$$

be the partially sorted convex combination. To select $\alpha $, we note that by assumption, $y^{(t')}_{r-t'-1}\ge y^{(t')}_{r-t'} \ge y_{r-t'+1}^{(t')}$ and $y^{(t'+1)}_{r-t'-1} < y_{r-t'}^{(t'+1)}=y_{r-t'+1}^{(t'+1)}$. Therefore, there exists an $\alpha \in (0,1]$ such that

$$\begin{aligned} \tilde{y}_{r-t'}=\hat{y}_{r-t'-1}&=(1-\alpha ) y^{(t'+1)}_{r-t'-1} + \alpha y^{(t')}_{r-t'-1}\\&= (1-\alpha ) y^{(t'+1)}_{r-t'+1} + \alpha y^{(t')}_{r-t'+1}=\hat{y}_{r-t'+1}=\tilde{y}_{r-t'+1}. \end{aligned}$$

Since $y_{r-t'+1}^{(t')}=\cdots =y_{r}^{(t')}$ and $ y_{r-t'-1}^{(t'+1)}=\cdots =y_{r}^{(t'+1)},$ it follows that $\tilde{y}_{r-t'}=\cdots =\tilde{y}_{r}.$ Furthermore, the construction of $\tilde{y}$ as well as the sorting yield that $\tilde{y}_{r}\ge \cdots \ge \tilde{y}_q$ and $\tilde{y}_{1}\ge \cdots \ge \tilde{y}_{r-t'-1}$, which is why $\tilde{y}$ satisfies the chain of inequalities in (33) for $t=t'+1$.

It remains to show that $\tilde{y}$ satisfies the epigraph constraint and that the cost is not higher than $p(t'+1)$. These properties are already fulfilled for $\hat{y}$ being a convex combination of two feasible points with costs $p(t')$ and $p(t'+1)$, respectively, where $p(t') \le p(t'+1)$. Therefore, it is left to show that the sorting involved in $\tilde{y}$ maintains these properties. First, we show that sorting of any sub-vector in $\hat{y}$ does not increase the cost. Suppose that $z_i\ge z_j$, $\hat{y}_i\le \hat{y}_j$, i.e., $\hat{y}$ is not sorted the same way as z. Then

$$\begin{aligned} \tfrac{1}{2}\left( (z_i-\hat{y}_i)^2+(z_j-\hat{y}_j)^2\right)&= (z_i-z_j)(\hat{y}_j-\hat{y}_i)+\tfrac{1}{2}\left( (z_i-\hat{y}_j)^2+(z_j-\hat{y}_i)^2\right) \\&\ge \tfrac{1}{2}\left( (z_i-\hat{y}_j)^2+(z_j-\hat{y}_i)^2\right) , \end{aligned}$$

and thus the cost is not increased by sorting $\hat{y}$ or any sub-vector of it. Further, a permutation of the first r elements of $\hat{y}$ does not influence the epigraph constraint, because $g^D_r(\hat{y})$ is permutation invariant by definition.

Next notice that $\tilde{y}$ is obtained from $\hat{y}$ by first swapping $\hat{y}_{r-t'-1}$ and $\hat{y}_{r-t'}$. From the choice of $\alpha $, we conclude that

$$\begin{aligned} \hat{y}_{r-t'}=(1-\alpha )y_{r-t'}^{(t'+1)}+\alpha y_{r-t'}^{(t')}\ge (1-\alpha )y_{r-t'+1}^{(t'+1)}+\alpha y_{r-t'+1}^{(t')}=\hat{y}_{r-t'+1}=\hat{y}_{r-t'-1}. \end{aligned}$$

Thus, this swap is a sorting which does neither increase the cost, nor does it violate the epigraph constraint. Analogously, sorting the first $r-t'$ elements of the resulting vector to obtain $\tilde{y}$ has the same effect and therefore we receive the desired contradiction.

Item iii.: Suppose that there exist t and $t'$ with $t'>t$ such that $y_{r-t'}^{(t')}<y_{r-t'+1}^{(t')}$ and $y_{r-t}^{(t)}\ge y_{r-t+1}^{(t)}$. Then Item ii. shows that $y_{r-t'}^{(t')}\ge y_{r-t'+1}^{(t')}$, which is a contradiction.

Items I. to III.: The statements follow immediately from Items i. to iii.

Lemma 3

Let f and z be as in Lemma 2 and $\left( y^{(t,s)},w^{(t,s)}\right) $ denote the (t, s)-dependent solution to

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^{n}(y_i - z_i)^2\\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n,\\&&y_{r-t+1} = \dots = y_{r+s}. \end{aligned} \end{aligned}$$

(35)

where $0 \le s \le r -n$ and t is fixed within $1 \le t \le r$. Then, there exists $s^\star $ such that $\left( y^{(t,s^\star )},w^{(t,s^\star )}\right) $ is the solution to (33) with $y^{(t,s^\star )}_{r+s^\star } > y_{r+s^\star +1}^{(t,s^\star )}$ and $y^{(t,s^\star )}_{r+s^\star } = y_{r+s^\star +1}^{(t)}$ if $s^\star = n-r$. Further,

i
$s^\star = \min \left\{ \lbrace s: y^{(t,s^\star )}_{r+s^\star } > y_{r+s^\star +1}^{(t,s^\star )} \rbrace \cup \lbrace n-r \rbrace \right\} $.
ii
If $y^{(t,s')}_{r+s'} \ge y_{r+s'+1}^{(t,s')}$ then $y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)}$ for all $s \ge s'$.
iii
If $y^{(t,s')}_{r+s'} < y_{r+s'+1}^{(t,s')}$ then $y^{(t,s)}_{r+s} < y_{r+s+1}^{(t,s)}$ for all $s \le s'$.

In particular, $s^\star $ can be found by a search over s, where s is increased/decreased according to the following rules:

I
$y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)}$ for all $s \ge s^\star $.
II
$y^{(t,s)}_{r+s} \le y_{r+s+1}^{(t,s)}$ for all $s < s^\star $.
III
If $s < s^\star $ and $y^{(t,s)}_{r+s} = y_{r+s+1}^{(t,s)}$ then $\left( y^{(t,s)},w^{(t,s)} \right) = \left( y^{(t,s^\star )},w^{(t,s^\star )} \right) $.

The proof of Lemma 3 goes analogously to the proof of Lemma 2 and is therefore omitted.

Lemma 4

Let f and z be as in Lemma 2, $1 \le t \le r$ and $ 0 \le s \le n-r$. Moreover, let $\tilde{z} := Tz \in \mathbb {R}^{r-t+1}$ be defined by (22) and be $(\tilde{y}^{(t,s)},w^{(t,s)})$ the (t, s)-dependent solution to

$$\begin{aligned} \begin{aligned}&\underset{\tilde{y},w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^{r-t+1}(\tilde{y}_i - \tilde{z}_i)^2 \\&\text {subject to}&w \ge g_{r,s,t}^D(\tilde{y}), \ \tilde{y} \in \mathbb {R}^{r-t+1} \end{aligned} \end{aligned}$$

(36)

Then $(y^{(t,s)},w^{(t,s)})$ is a solution to (35), where

$$\begin{aligned} y^{(t,s)}_i := {\left\{ \begin{array}{ll} \tilde{y}_i^{(t,s)}, &{}\text {if } 1\le i \le r-t,\\ \dfrac{\tilde{y}_i^{(t,s)}}{\sqrt{t+s}}, &{}\text {if } r-t+1 \le i \le r+s, \\ z_i, &{}\text {if } i \ge r+s+1.\\ \end{array}\right. } \end{aligned}$$

(37)

Proof

Letting $\tilde{y} \in \mathbb {R}^{r-t+1}$ be defined as

$$\begin{aligned} \tilde{y}_i = {\left\{ \begin{array}{ll} y_i, \text { if } 1\le i \le r-t,\\ \sqrt{t+s}y_{r-t+1}, \text { if } i = r-t+1, \end{array}\right. } \end{aligned}$$

(38)

and notice that

$$\begin{aligned} \sum _{i = r-t+1}^{r+s} (y_r - z_i)^2&= \left( \tilde{y}_{r-t+1}- \tilde{z}_{r-t+1} \right) ^2 + \sum _{i = r-t+1}^{r+s} z_i^2 - \left( \frac{1}{\sqrt{t+s}}\sum _{i = r-t+1}^{r+s} z_i \right) ^2, \end{aligned}$$

yields the reduced dimensional problem (36).

1.2 Proof to Theorem 1

By Lemma 2, (34) can be solved by performing a search over the t-dependent solutions to (33), where by Lemma 3 these solutions can be determined for each t by a search over the s-dependent solutions to (35). In order to solve (35), we apply Lemma 4 to reduce (35) to solving (24) in Theorem 1. Hence, the remainder of the theorem is a direct application of Lemmas 3 and 2 and thus a nested search with the stated rules succeeds in finding $(t^\star ,s^\star )$.

1.3 General Solution to (24)

In every step of the binary search (24) must be solved. Provided a very mild constraint qualification holds (which it does for our functions of interest), the solution will fall into one of three cases, depending on f and the singular values of Z. The different cases are described in the following.

Proposition 5

Suppose that there exits $(\bar{y},\bar{w})$ such that $\bar{w}\in \mathrm{{relint}} (\text {dom}f)$ and $\bar{w}>g^D_{r,s,t}(\bar{y})$. Then $(\tilde{y},\tilde{w})$ is a solution to (24) if and only if one of the following cases applies:

$$\begin{aligned}&\text {Case 1: }\tilde{y} = \tilde{z} ~\ \Longleftrightarrow ~\ \tilde{w} = \mathop {\text {argmin}}\limits _w f \text { and } \tilde{w} \ge g^D_{r,s,t}(\tilde{z}) \end{aligned}$$

(C1)

$$\begin{aligned}&\text {Case 2: } (\tilde{y},\tilde{w}) = 0 ~\ \Longleftrightarrow ~ g_{r,s,t}(\tilde{z}) \le \frac{\mu }{\gamma } \text { and } \mu \in \partial f(0) \end{aligned}$$

(C2)

$$\begin{aligned}&\text {Case 3: } \frac{\gamma }{\mu } (\tilde{z}-\tilde{y}) \in \partial g^D_{r,s,t}(\tilde{y}) ~\ \mu \in \partial f(\tilde{w})\cap \mathbb {R}_{\ge 0} ~\ \text {and} ~\ \tilde{w} = g^D_{r,s,t}(\tilde{y}) \end{aligned}$$

(C3)

where $\tilde{z} := T \sigma (Z)$ is given by (22).

Proof

A solution $(\tilde{y},\tilde{w})$ to (24) fulfills $0\in \partial (f(\tilde{w})+\tfrac{\gamma }{2}\Vert \tilde{y}-\tilde{z}\Vert ^2+\chi _{\text {epi}(g^D_{r,s,t})}(\tilde{y},\tilde{w}))$ by [24, Theorem VI.2.2.1], which under the assumed constraint qualification is equivalent to

$$\begin{aligned} 0 \in \begin{pmatrix} \gamma (\tilde{y} -\tilde{z}) \\ \partial f(\tilde{w}) \end{pmatrix} + \mathcal {N}_{\text {epi}(g_{r,s,t}^D)}(\tilde{y},\tilde{w}) \end{aligned}$$

(40)

where $\mathcal {N}$ denotes the normal cone to $\text {epi}(g^D_{r,s,t})$ and the summation is understood set-wise. Then by [24, Proposition VI.1.3.1]

$$\begin{aligned} \mathcal {N}_{\text {epi}(g^D_{r,s,t})}(\tilde{y},\tilde{w}) = {\left\{ \begin{array}{ll} \{(\mu G, -\mu ): G \in \partial g^D_{r,s,t}(\tilde{y}), \ \mu \ge 0 \} &{} \text {if } \tilde{w} = g^D_{r,s,t}(\tilde{y})\\ \{0\} &{} \text {if } (\tilde{y},\tilde{w}) \in \text {int}(\text {epi}(g^D_{r,s,t})) \end{array}\right. } \end{aligned}$$

(41)

which is why we need to distinguish the cases $\tilde{y} = \tilde{z}$ and $\tilde{w} = g^D_{r,s,t}(\tilde{y})$. Thus, the proof follows by invoking (3).

Remark 2

In the epigraph case with $f(x)=\tfrac{1}{2}(w+z_v)^2$ and $\gamma = 1$, (C1) corresponds to that $(z,-z_v)$ is in the cone given by the epigraph of $g^D_{r,s,t}$, (C2) corresponds to that $(z,z_v)$ is in the cone given by the epigraph of the dual gauge function $g_{r,s,t}$, and (C3) covers the remaining cases.

The problem of solving (18) therefore reduces to checking Eqs. (C1), (C2) and (C3) within the nested binary search, which has been made explicit for $g^D=\ell _2$ in Appendix 4 and $g^D=\ell _{1}$ in Appendix 6.

1.4 Proof to Proposition 3

For $\tau > 0$ and a gauge function $\tilde{g}$ it holds that $g = \tau \tilde{g}$ is gauge function with $g^D = \frac{\tilde{g}}{\tau }$. Setting $\gamma =1$ and $f(w) = \frac{1}{2}(w +z_v)$ in Theorem 1, Eqs. (C1), (C2), and (C3) in Proposition 5 then become

$$\begin{aligned}&(\tilde{y},\tilde{w}) = (\tilde{z},zv) ~\ \Longleftrightarrow ~\ -\tau z_v \ge \tilde{g}^D_{r,s,t}(\tilde{z}) \end{aligned}$$

(42a)

$$\begin{aligned}&(\tilde{y},\tilde{w}) = 0 ~\ \Longleftrightarrow ~\ \tilde{g}_{r,s,t}(\tilde{z}) \le \frac{z_v}{\tau } \end{aligned}$$

(42b)

$$\begin{aligned}&\frac{\tau }{\mu }(\tilde{z}-\tilde{y}) \in \partial {\tilde{g}^D_{r,s,t}}(\tilde{y}),~\ \mu = \tilde{w} + z_v \ge 0 ~\ \text {and} ~\ \tau \tilde{w} = \tilde{g}^D_{r,s,t}(\tilde{y}). \end{aligned}$$

(42c)

For our particular case $\tilde{g} = \ell _2$, it follows immediately that Eqs. (42a) and (42b) correspond to Eqs. (27a) and (27b). Furthermore, by taking the gradient of $g^D_{r,s,t}$, (42c) becomes Eqs. (27c), (27e) and (27d) with the constraints $\mu \ge 0$ and $\tau \tilde{w} = {g^D_{r,s,t}(\tilde{y})}$. Thus, it is left to compute $\mu \ge 0$. Plugging Eqs. (27c), (27e) and (27d) into $\tau ^2 \tilde{w}^2 = {g^D_{r,s,t}(\tilde{y})}^2$ and making some rearrangements yields

$$\begin{aligned} 1 = \frac{\sum _{i=1}^{r-t}\tilde{z}_i^2}{\left( \tilde{w} \tau +\frac{\mu }{\tau }\right) ^2} + \frac{t}{s+t}\frac{\tilde{z}_{r-t+1}^2}{\left( \tilde{w} \tau +\frac{\mu t}{(s+t)\tau }\right) ^2}. \end{aligned}$$

Then defining $c_1 := \sum _{i=1}^{r-t} \tilde{z}^2_i$ and $c_2 := \sqrt{t+s}\tilde{z}_{r-t+1}$, this can be rewritten as the fourth-order polynomial equation (27f) which can be solved explicitly for unique $\mu \ge 0$ after the substitution (27e) is performed. This proves the first part of Proposition 3. For $f(w) = \chi _{[0,\gamma ]}(w)$, Eqs. (C1), (C2) and (C3) are

$$\begin{aligned}&\tilde{y} = \tilde{z} ~\ \Longleftrightarrow ~\ \tau \ge \tilde{g}^D_{r,s,t}(\tilde{z}) \end{aligned}$$

(43a)

$$\begin{aligned}&\frac{\gamma }{\mu }(\tilde{z}-\tilde{y}) \in \partial \tilde{g}^D_{r,s,t}(\tilde{y}),~\ \mu \ge 0 ~\ \text {and} \ \tau = \tilde{g}^D_{r,s,t}(\tilde{y}). \end{aligned}$$

(43b)

Note that (C2) is redundant here, because it coincides with (43a). Hence, for $g = \ell _2$ (43a) becomes (27a) with $z_v = -1$ and (43b) is equivalent to Eqs. (27f), (27c) and (27d) with $\tilde{w} = 1$.

1.5 Break Point Search

Lemma 5

Let $(\tilde{z},z_v)$ fulfill neither of Eqs. (29a) and (29b), and $\hat{z}$ and $\alpha $ be as in Proposition 4. Further, let $\mu ^\star $ be the solution to $ \sum _{i=1}^{r-t+1}\max (\hat{z}_{i}-\alpha _i\mu ,0) +z_v - \mu =0$ and $\hat{\mu }_k$ be the solution to $\left( \sum _{i=1}^{k} \hat{z}_{i}-\alpha _i\mu \right) +z_v - \mu =0$, i.e., $\hat{\mu }_k = \frac{z_v+\sum _{i=1}^{k}\hat{z}_{i}}{1+\sum _{i=1}^k\alpha _i}.$ Then there exists $k^\star \in \{1,\dots r-t+1 \}$ such that $\hat{z}_{k^\star }- \alpha _{k^\star }{\mu ^\star } \ge 0$, $\hat{z}_{i}-\alpha _{i}{\mu ^\star } < 0$ for all $i > k^\star $ and

i.
$\hat{\mu }_{k^\star } = \mu ^\star .$
ii.
$k^\star =\max \{k : \hat{z}_k-\alpha _k\hat{\mu }_k\ge 0\}$.
iii.
If $\hat{z}_k-\alpha _k\hat{\mu }_k\ge 0$, then $\hat{z}_i-\alpha _i\hat{\mu }_i\ge 0$ for all $i \le k$.
vi.
If $\hat{z}_k-\alpha _k\hat{\mu }_k< 0$, then $\hat{z}_i-\alpha _i\hat{\mu }_i< 0$ for all $i\ge k$.

In particular,

I.
$\hat{z}_k-\alpha _k\hat{\mu }_k\ge 0$ for all $k \le k^\star $.
II.
$\hat{z}_k-\alpha _k\hat{\mu }_k< 0$ for all $k> k^\star $.

Proof

We first show some results needed to prove Items ii. and iii. Let $ g_k(\mu ):=\sum _{i=1}^{k}\max (\hat{z}_{i}-\alpha _i\mu ,0)+z_v-\mu ,$ and let $\mu _k$ be the unique solution to the equation $g_k(\mu )=0$. Since all $g_i$ are strictly decreasing in $\mu $ and $g_k(\mu ) = g_{k-1}(\mu )+ \max (\hat{z}_{k}-\alpha _k\mu ,0) \ge g_{k-1}(\mu ),$ we have

a.
$\mu _{k-1}\le \mu _{k}.$
b.
$\hat{z}_k-\alpha _k\mu _k\le 0 \ \Leftrightarrow \ g_{k-1}(\mu _k)=g_k(\mu _k)=0 \ \Leftrightarrow \ \mu _{k-1}=\mu _k$.

Moreover, the break point sorting in $\hat{z}$ implies that if l and $\mu $ are such that $\hat{z}_l-\alpha _l\mu \ge 0$, then also $\hat{z}_i-\alpha _i\mu \ge 0$ for all $i\le l$. Thus,

$$\begin{aligned} \hat{z}_k-\alpha _k\mu \ge 0 \ \Leftrightarrow \ \sum _{i=1}^k\max (\hat{z}_i-\alpha _i\mu ,0)+z_v-\mu =\left( \sum _{i=1}^k\hat{z}_i-\alpha _i\mu \right) +z_v-\mu . \end{aligned}$$

In conjunction with the uniqueness of ${\mu }_k$, this implies that

c.
$\hat{z}_k-\alpha _k\mu _k\ge 0$ or $\hat{z}_k-\alpha _{k}\hat{\mu }_k\ge 0 \ \Leftrightarrow \ \hat{\mu }_k=\mu _k$.

Item i.: This has already been proven in the discussion before Lemma 5.

Item ii.: By the definition of $k^\star $ and Item i. it holds that

$$\begin{aligned} \hat{z}_{k^\star }-\alpha _{k^\star }\hat{\mu }_{k^\star }\ge 0 \qquad \text {and} \qquad \hat{z}_i-\alpha _i\hat{\mu }_{k^\star } < 0 \ \ \text {for all} \ \ i > k^\star . \end{aligned}$$

(44)

Thus, by Item c. $\hat{\mu }_{k^{\star }} = \mu ^\star = \mu _{k^{\star }}$ and $\hat{z}_i-\alpha _i \mu _{i} < 0$ for all $i > k^\star .$ Then Item b. implies that $ \hat{\mu }_{k^{\star }} = \mu ^\star = \mu _{r-t+1} = \mu _{r-t} = \dots = \mu _{k^{\star }}.$ Therefore, if there exists $k > k^\star $ with $\hat{z}_k-\alpha _k\hat{\mu }_k\ge 0$, it will hold by Item c. that $\hat{\mu }_k = \mu _k = \hat{\mu }_{k^{\star }}$, which contradicts (44), because $0 \le \hat{z}_k-\alpha _k\hat{\mu }_k = \hat{z}_k-\alpha _k\hat{\mu }_{k^\star } < 0.$ This proves that $k^\star = \max \{k: \hat{z}_k-\alpha _k\hat{\mu }_k\ge 0\}$.

Item iii.: Assume that $\hat{z}_{k}-\alpha _{k}\hat{\mu }_k\ge 0$. Then, by the break point sorting it holds that $ \hat{z}_{k-1}-\alpha _{k-1}\hat{\mu }_k\ge 0$ and by Items a. and c. that $\hat{\mu }_k = \mu _k \ge \mu _{k-1}$. Thus, we conclude that

$$\begin{aligned} 0\le \hat{z}_{k-1}-\alpha _{k-1}\hat{\mu }_k=\hat{z}_{k-1}-\alpha _{k-1}\mu _k\le \hat{z}_{k-1}-\alpha _{k-1}\mu _{k-1}=\hat{z}_{k-1}-\alpha _{k-1}\hat{\mu }_{k-1}, \end{aligned}$$

where the last equality follows again by Item c. The other indices follow inductively.

Item iv.: Let on the contrary k be such that $\hat{z}_k-\alpha _k\hat{\mu }_k<0$, but with $i\in \{k,\ldots ,r-t+1\}$ such that $\hat{z}_i-\alpha _i\hat{\mu }_i\ge 0$. Then, by Item iii., $\hat{z}_k-\alpha _k\hat{\mu }_k\ge 0$, which is a contradiction.

Items I. and II.: Follow immediately from Items ii. to iv.

1.6 Proof to Proposition 4

Analogous to showing Proposition 3, Eqs. (42a) and (42b) correspond to Eqs. (C1) and (C2) in Proposition 5, which translate for $\tilde{g} = \ell _{\infty }$ to

$$\begin{aligned}&(\tilde{y}^\star ,w^\star ) = (\tilde{z},zv) ~\ \Longleftrightarrow ~\ \sum _{i=1}^{r-t} |\tilde{z}_i| + \frac{t}{\sqrt{t+s}} |\tilde{z}_{r-t+1}| \le - \tau z_v \\&(\tilde{y}^\star ,w^\star ) = 0 ~\ \Longleftrightarrow ~\ \max \left( |\tilde{z}_1|,\ldots ,|\tilde{z}_{r-t-2}|,\dfrac{\sqrt{t+s}}{t}|\tilde{z}_{r-t+1}|\right) \le \frac{z_v}{\gamma } \end{aligned}$$

Since $\tilde{z}$ is nonnegative and decreasingly sorted, the second case simplifies to (29b). For (42c), we need to note that $\tilde{y} \in \mathbb {R}^{r-t+1}_{\ge 0}$ and therefore the conditions for $\tilde{y}_i=0$ and $\tilde{y}_i>0$ become

$$\begin{aligned} \tilde{y}_i =0\Leftrightarrow \tilde{z}_i\in \left[ 0,\tfrac{\mu }{\tau }\right] , \qquad \tilde{y}_i>0\Leftrightarrow \tilde{y}_i=\tilde{z}_i-\tfrac{\mu }{\tau } \end{aligned}$$

for all $i \in \{1,\ldots ,r-t \}$. These equivalences also hold for $\tilde{y}_{r-t+1}$ with $\mu $ multiplied by $t/\sqrt{s+t}$. Therefore, Eqs. (29c), (29d) and (29e) follow together with the constraints $\tau \tilde{w} = \tilde{g}^D_{r,s,t}(\tilde{y})$ and $\mu \ge 0$. Then, plugging Eqs. (29c) and (29d) into $\tau w^\star = \tilde{g}^D_{r,s,t}(\tilde{y})$ yields

$$\begin{aligned} 0&= \frac{1}{\tau }\sum _{i=1}^{r-t} |\tilde{y}_i| + \dfrac{t}{\sqrt{t+s}}|\tilde{y}_{r-t+1}| - \tilde{w} \nonumber \\&= \sum _{i=1}^{r-t} \max \left( \frac{\tilde{z}_i}{\gamma } - \frac{\mu }{\gamma ^2},0 \right) + \max \left( \dfrac{t}{\sqrt{t + s}\gamma }\tilde{z}_{r-t+1} - \dfrac{t^2 \mu }{(t + s)\gamma ^2},0 \right) + z_v - \mu . \end{aligned}$$

(45a)

which determines the unique solution to $\mu \ge 0$. We solve the equation by using a so-called break point searching algorithm, as it has been done for similar problems in [10, 12, 22].

In our case, the break points are given by the smallest values of $\mu $ for which each max expressions as function of $\mu $ becomes zero, i.e., $\left( \gamma \tilde{z}_1, \dots , \gamma \tilde{z}_{r-t}, \frac{\gamma \sqrt{s+t}}{t} \tilde{z}_{r-t+1} \right) $. Then we define $\hat{z} :=\frac{1}{\gamma }\left( \tilde{z}_1,\ldots ,\tilde{z}_j,\dfrac{t}{\sqrt{(t+s)}}\tilde{z}_{r-t+1},\tilde{z}_{j+1},\ldots ,\tilde{z}_{r-t}\right) $, to be the vector that sorts $\frac{1}{\gamma }\left( \tilde{z}_1,\dots ,\tilde{z}_{r-t},\frac{t}{\sqrt{t+s}} \tilde{z}_{r-t+1}\right) $ by decreasing break points, i.e., j fulfills

$$\begin{aligned} \tilde{z}_j>\tfrac{\sqrt{(t+s)}}{t}\tilde{z}_{r-t+1}\ge \tilde{z}_{j+1} \quad \text {or} \quad \tilde{z}_{r-t} \ge \tfrac{\sqrt{(t+s)}}{t}\tilde{z}_{r-t+1}. \end{aligned}$$

(46a)

Therefore, (45a) can be equivalently written as

$$\begin{aligned} \sum _{i=1}^{r-t+1}\max (\hat{z}_{i}-\alpha _i\mu ,0) +z_v - \mu =0 \end{aligned}$$

(46b)

with $\alpha =\frac{1}{\gamma ^2}\left( 1,\ldots ,1,\dfrac{t^2}{(t+s)},1,\dots ,1\right) .$ Hence, there exists an index $k^\star \in \{1,\dots ,r-t+1\}$ such that the unique solution $\mu \ge 0$ to (46b) fulfills

$$\begin{aligned} \hat{z}_{k^\star }- \alpha _{k^\star }\mu \ge 0{\text {\qquad and \qquad }} \hat{z}_{i}-\alpha _{i}\mu < 0 \ \text { for all } \ i > k^\star , \end{aligned}$$

(46c)

which is why $\mu $ can be determined as

$$\begin{aligned} \mu =\dfrac{z_v+\sum _{i=1}^{k^\star }\hat{z}_{i}}{1+\sum _{i=1}^{k^\star }\alpha _i} . \end{aligned}$$

(46d)

Consequently, computing $\mu $ equals a search for $k^\star \in \{1,\dots ,r-t+1\}$ for which (46d) satisfies (46c). This can be done with the search rules in Lemma 5.

Finally, if $f(w) = \chi _{[0,\gamma ]}(w)$, then Eqs. (C1), (C2) and (C3) are given by Eqs. (43a) and (43b). For $\tilde{g} = \ell _{\infty }$, this corresponds to (29a) with $z_v = -1$, and Eqs. (29c) and (29d) with the constraint that $ \sum _{i=1}^{r-t+1}\max (\hat{z}_{i}-\alpha _i\mu ,0)=\tau ,$ respectively. Therefore, $\hat{\mu }_k = \frac{\sum _{i=1}^{k}\hat{z}_{i}}{\sum _{i=1}^k\alpha _i}$, $\mu =\dfrac{\sum _{i=1}^{k^\star }\hat{z}_{i}}{\sum _{i=1}^{k^\star }\alpha _i}$ and it is readily seen that $k^\star $ obeys the same rules as in Lemma 5.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Grussler, C., Giselsson, P. Efficient Proximal Mapping Computation for Low-Rank Inducing Norms. J Optim Theory Appl 192, 168–194 (2022). https://doi.org/10.1007/s10957-021-01956-2

Download citation

Received: 15 December 2020
Accepted: 29 September 2021
Published: 30 November 2021
Issue Date: January 2022
DOI: https://doi.org/10.1007/s10957-021-01956-2

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Efficient Proximal Mapping Computation for Low-Rank Inducing Norms

Abstract

Similar content being viewed by others

Nonconvex Sparse Regularization and Splitting Algorithms

Proximal linearization methods for Schatten p-quasi-norm minimization

A Relaxed Interior Point Method for Low-Rank Semidefinite Programming Problems with Applications to Matrix Completion

1 Introduction

2 Preliminaries

Definition 1

Proposition 1

3 Low-Rank Inducing Norms

Lemma 1

Proposition 2

4 Proximal Mappings

4.1 Search Framework

Theorem 1

4.2 Low-Rank Inducing Frobenius and Spectral Norms

Proposition 3

Proposition 4

4.3 Computational Complexity

Remark 1

4.3.1 Low-Rank Inducing Frobenius Norms

4.3.2 Low-Rank Inducing Spectral Norms

5 Case Study: Matrix Completion

6 Conclusion

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Lemmas, Proofs and Additional Discussion

Lemmas, Proofs and Additional Discussion

1.1 Search Rules

Lemma 2

Proof

Lemma 3

Lemma 4

Proof

1.2 Proof to Theorem 1

1.3 General Solution to (24)

Proposition 5

Proof

Remark 2

1.4 Proof to Proposition 3

1.5 Break Point Search

Lemma 5

Proof

1.6 Proof to Proposition 4

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation