1 Introduction

Non-convex optimization problems with a rank or cardinality constraint appear in many data-driven areas such as machine learning, image analysis and multivariate linear regression [6, 8, 13, 26, 40], as well as in areas within control such as system identification, model reduction, low-order controller design, and low-complexity modeling [2, 15, 18, 35, 47]. Aside from the low-rank constraint, these problems are often convex, and therefore one of the most common solution techniques is to convexify them by using regularizers or taking convex envelopes [8, 15, 17, 19]. A promising class of such regularizers and convex envelopes is the class of so-called unitarily invariant low-rank inducing norms [17], i.e., convex envelopes of unitarily invariant norms whose domain is restricted to matrices of prescribed bounded rank. Since many common loss functions, e.g., the squared Frobenius distance, contain terms of unitarily invariant norms, these norms have the attractive feature of convexifying exactly: the convexified problem in terms of the low-rank inducing norm coincides with the original at all matrices of prescribed bounded rank. Therefore, if the convexified problem has a low-rank solution, it is guaranteed to be a solution to the non-convex one.

Although low-rank inducing norms often admit a representation as semi-definite programs (SDPs) [17], proximal splitting algorithms [9] are often preferred for large-scale problems, where standard interior-point solvers have too costly iterations [39]. The main objective of this work is to efficiently compute the required proximal mappings of low-rank inducing norms composed with increasing convex functions. To this end, we develop a generic nested binary search algorithm, which in each iteration solves a simple problem. For well-known low-rank inducing norms such as the nuclear norm [38] and the low-rank inducing Frobenius norm [1, 14, 29, 30], our algorithm recovers the efficiency of existing methods, whereas for other norms, such as the low-rank inducing spectral norm [46], it improves the computational complexity significantly, especially in the vector-valued case. Finally, [45] proposes a non-analytic approach for an extended class of not necessarily unitarily invariant low-rank inducing norms (see [27]). This approach, however, depends on the complexity and convergence rates of other optimization algorithms.

The paper is organized as follows. We start by introducing some preliminaries on norms and convex optimization. Subsequently, we give a formal definition of the class of low-rank inducing norms, including their application to rank-constrained optimization problems. Then, we derive our main result, the binary search framework, and outline an algorithm for evaluating the associated epigraph projections. For the low-rank inducing Frobenius and spectral norms, we make these computations explicit and arrive at implementable algorithms whose computational cost is analyzed. Subsequently, a case study illustrates the performance of our algorithm within proximal splitting methods. Finally, we draw conclusions and point the reader to our freely available MATLAB and Python implementations of these algorithms.

2 Preliminaries

The set of reals is denoted by \(\mathbb {R}\), the set of real vectors by \(\mathbb {R}^n\), the set of vectors with nonnegative entries by \(\mathbb {R}^n_{\ge 0}\) and the set of real matrices by \(\mathbb {R}^{n \times m}\). In the remainder of the paper, we assume without loss of generality that \(n \le m\). The singular value decomposition of \(X \in \mathbb {R}^{n \times m}\) is denoted by \(X = \sum _{i=1}^{n} \sigma _i(X) u_i v_i^\mathsf {T}\) with non-increasingly ordered singular values \(\sigma _1(X) \ge \dots \ge \sigma _{n}(X)\) (counted with multiplicity). The corresponding vector of all singular values is given by \(\sigma (X):=(\sigma _1(X),\ldots ,\sigma _{n}(X)).\) For all \(x=(x_1,\ldots ,x_n)\in \mathbb {R}^n\), we define the \(\ell _p\) norms with \(1\le p<\infty \) by \(\ell _p(x):=\left( \sum _{i=1}^{n} |x_i|^p\right) ^{\frac{1}{p}}\) and \(\ell _{\infty }(x):=\max _{i}|x_i|\), where \(|\cdot |\) denotes the absolute value.
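These quantities are cheap to form numerically. The following minimal sketch (assuming NumPy; the matrix is a hypothetical example) computes the non-increasingly ordered singular values and an \(\ell _p\) gauge:

```python
import numpy as np

def ell_p(x, p):
    # l_p gauge of a vector for 1 <= p < infinity
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

# Hypothetical 3x4 example matrix (n = 3 <= m = 4, as assumed above).
X = np.arange(12.0).reshape(3, 4)
sigma = np.linalg.svd(X, compute_uv=False)  # sigma_1 >= ... >= sigma_n
```

Note that `ell_p(sigma, 2)` then equals the Frobenius norm of X, in line with the matrix norms introduced below.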

A matrix norm \(\Vert \cdot \Vert : \mathbb {R}^{n \times m}\rightarrow \mathbb {R}_{\ge 0}\) is called unitarily invariant if for all unitary matrices \(U \in \mathbb {R}^{n \times n}\) and \(V \in \mathbb {R}^{m \times m}\) and all \(X \in \mathbb {R}^{n \times m}\) it holds that \(\Vert UXV\Vert = \Vert X\Vert \). Equivalently, unitary invariance can be characterized by symmetric gauge functions (see e.g., [25, Theorem 7.4.7.2]):

Definition 1

A function \(g: \mathbb {R}^n \rightarrow \mathbb {R}_{\ge 0}\) is a symmetric gauge function if

  i. g is a norm.

  ii. \(\forall x \in \mathbb {R}^{n}: g(|x|) = g(x)\), where |x| denotes the element-wise absolute value.

  iii. \(g(Px) = g(x)\) for all permutation matrices \(P\in \mathbb {R}^{n\times n}\) and all \(x\in \mathbb {R}^n\).

Proposition 1

The norm \(\Vert \cdot \Vert :\mathbb {R}^{n \times m}\rightarrow \mathbb {R}_{\ge 0}\) is unitarily invariant if and only if \(\Vert \cdot \Vert = g(\sigma _1(\cdot ),\dots ,\sigma _{n }(\cdot ))\), where g is a symmetric gauge function.

Throughout this work, we use the notation \(\Vert X\Vert _g := g(\sigma (X))\). For \(X,Y \in \mathbb {R}^{n \times m}\) the Frobenius inner product is defined as \(\langle X , Y \rangle := \sum _{i=1}^{n}\sum _{j=1}^{m} x_{ij} y_{ij} = \text {trace}(X^\mathsf {T}Y)\) with Frobenius norm \(\Vert X\Vert _{\ell _2} := \ell _2(\sigma (X)) = \sqrt{\langle X , X \rangle }.\) The nuclear norm and the spectral norm are given by \(\Vert \cdot \Vert _{\ell _1}:=\ell _{1}(\sigma (\cdot ))\) and \(\Vert \cdot \Vert _{\ell _\infty }:=\ell _{\infty }(\sigma (\cdot ))=\sigma _1(\cdot )\). The dual norm to \(\Vert \cdot \Vert _g\) is defined as

$$\begin{aligned} \Vert \cdot \Vert _{g^D}&:= \max _{\Vert X\Vert _g \le 1} \langle \cdot , X \rangle =: g^D(\sigma _1(\cdot ),\dots ,\sigma _n(\cdot )). \end{aligned}$$
(1)

Dual norms inherit the unitary invariance as well as the duality relationship for \(\ell _p\) norms, i.e., \(g = \ell _p\) implies \(g^D = \ell _q\) with \(p,q \in [1,\infty ]\) satisfying \(\frac{1}{p} + \frac{1}{q} = 1\). We will also make use of truncated dual gauge functions. Let \(y\in \mathbb {R}^n\), \(r\in \{1,\ldots ,n\}\), and \(g^D:\mathbb {R}^n\rightarrow \mathbb {R}_{\ge 0}\). The truncated dual gauge function is then defined as

$$\begin{aligned} g^D_r(y) := g^D(\text {sort}(y)_1,\ldots ,\text {sort}(y)_r,0,\ldots ,0), \end{aligned}$$
(2)

where \(\text {sort}: \mathbb {R}^n \rightarrow \mathbb {R}^n\) denotes sorting in descending order.
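In code, the truncated dual gauge (2) amounts to sorting, keeping the r largest entries, and applying \(g^D\) to the zero-padded result. A minimal NumPy sketch (with a hypothetical helper name):

```python
import numpy as np

def g_dual_r(y, r, g_dual):
    # Truncated dual gauge (2): keep the r largest entries (descending sort),
    # zero out the rest, and apply g^D.
    truncated = np.zeros_like(y, dtype=float)
    truncated[:r] = np.sort(y)[::-1][:r]
    return g_dual(truncated)

y = np.array([3.0, 1.0, 4.0, 2.0])
# For g = l_infinity we have g^D = l_1, so g^D_r sums the r largest entries: 4 + 3.
val = g_dual_r(y, 2, lambda v: np.sum(np.abs(v)))
```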

Next, we introduce some standard notation and results from convex optimization [5, 41]. For \(f: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace \), we denote by \(\text {dom}(f)\) and \(\text {epi}(f)\) the effective domain and epigraph of f, respectively. Its subdifferential at \(X \in \mathbb {R}^{n \times m}\) is written as \(\partial f(X)\). In particular, by [24, Example VI.3.1]

$$\begin{aligned} \partial \Vert X\Vert _g = \{ G \in \mathbb {R}^{n \times m}: \langle G, X\rangle = \Vert X\Vert _g, \ \Vert G\Vert _{g^D} = 1 \}. \end{aligned}$$
(3)

Further, f is said to be proper if \(\text {dom}(f) \ne \emptyset \) and closed if \(\text {epi}(f)\) is a closed set. The conjugate (dual) function of f is denoted by \(f^*\) and \(f^{**} := (f^*)^*\) is called the biconjugate function or convex envelope of f. For \(f:\mathbb {R} \rightarrow \mathbb {R}\cup \lbrace \infty \rbrace \), we say that f is increasing if \(x \le y \ \Rightarrow \ f(x) \le f(y) \ \text { for all } \ x,y \in \text {dom}(f)\) and if there exist \(x,y\in \mathbb {R}\) such that \(x<y\) and \(f(x)<f(y)\). Moreover, its monotone conjugate is defined as [41] \(f^+(y) := \sup _{x \ge 0} \left[ xy - f(x) \right] \ \text { for all } y\in \mathbb {R}.\) The 0-infinity indicator (or characteristic) function of a set \(\mathcal {S} \subset \mathbb {R}^{n \times m}\) is denoted by \(\chi _{\mathcal {S}}\), which we also use for the indicator function of the set of matrices with at most rank r, i.e., \(\chi _{\text {rank}(\cdot )\le r}\). For any \(Z \in \mathbb {R}^{n \times m}\), the proximal mapping of a closed, proper and convex function \(f: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace \) is defined as

$$\begin{aligned} \text {prox}_{\gamma f}(Z) := \mathop {\text {argmin}}\limits _X \left( f(X)+\dfrac{1}{2\gamma } \Vert X-Z\Vert ^2_{\ell _2} \right) . \end{aligned}$$
(4)

In particular, \(\text {prox}_{\gamma \chi _{\mathcal {C}}}(Z)\) coincides with the unique Euclidean projection

$$\begin{aligned} \Pi _{\mathcal {C}}(Z) := \mathop {\text {argmin}}\limits _{X \in \mathcal {C}} \Vert X-Z\Vert _{\ell _2} \end{aligned}$$

onto \(\mathcal {C}\) for any closed, non-empty, convex set \(\mathcal {C} \subset \mathbb {R}^{n \times m}\). Moreover, by the extended Moreau decomposition it holds for all \(f: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace \), \(Z \in \mathbb {R}^{n \times m}\) and \(\gamma > 0\) that (see [4, Theorem 6.29])

$$\begin{aligned} \text {prox}_{\gamma f}(Z) = Z - \gamma \text {prox}_{\gamma ^{-1} f^*}(\gamma ^{-1}Z). \end{aligned}$$
(5)
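The decomposition (5) is easy to sanity-check numerically. The following sketch (assuming NumPy) uses the vector case \(f = \ell _1\), whose proximal mapping is soft-thresholding and whose conjugate is the indicator of the \(\ell _\infty \) unit ball, so the prox of the conjugate is entrywise clipping:

```python
import numpy as np

gamma = 0.7
z = np.array([1.5, -0.3, 0.9, -2.0])

# prox of f = l1 (vector) norm: soft-thresholding.
prox_f = np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)

# f* is the indicator of the l_infinity unit ball; its prox is the
# projection, i.e., entrywise clipping to [-1, 1].
prox_conj = np.clip(z / gamma, -1.0, 1.0)

# Extended Moreau decomposition (5):
# prox_{gamma f}(z) = z - gamma * prox_{f*/gamma}(z / gamma).
lhs, rhs = prox_f, z - gamma * prox_conj
```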

Finally, we denote compositions of two functions f and g by \( (f \circ g)(\cdot ) := f(g(\cdot ))\).

3 Low-Rank Inducing Norms

This section introduces the family of unitarily invariant low-rank inducing norms, which has been discussed in [17]. Besides recapping some elementary properties, we briefly motivate the usefulness of these norms as convex envelopes or additive regularizers in optimization problems to promote low-rank solutions.

Low-rank inducing norms are defined as the dual norm of a rank constrained dual norm

$$\begin{aligned} \Vert Y \Vert _{g^D,r} := \max _{{\mathop {\Vert X\Vert _g \le 1}\limits ^{\text {rank}(X) \le r}}} \langle X, Y \rangle . \end{aligned}$$
(6)

This means that the low-rank inducing norms corresponding to \(\Vert \cdot \Vert _g\) are

$$\begin{aligned} \Vert X\Vert _{g,r*} := \max _{\Vert Y\Vert _{g^D,r}\le 1}\langle Y,X\rangle . \end{aligned}$$
(7)

For \(r=n\), the rank constraint in (6) is redundant and \(\Vert \cdot \Vert _g \equiv \Vert \cdot \Vert _{g,r*}\). Some important properties of these norms are summarized next [17].

Lemma 1

Let \(X,Y \in \mathbb {R}^{n \times m}\), \(r \in \mathbb {N}\) be such that \(1\le r \le n\), and \(g: \mathbb {R}^n \rightarrow \mathbb {R}_{\ge 0}\) be a symmetric gauge function. Then \(\Vert \cdot \Vert _{g^D,r}\) is a unitarily invariant norm with

$$\begin{aligned} \Vert Y \Vert _{g^D,r} = g^D_r(\sigma (Y)), \end{aligned}$$
(8)

where \(g^D_r\) is defined in (2). Its dual norm \(\Vert \cdot \Vert _{g,r*}\) satisfies

$$\begin{aligned}&\Vert \cdot \Vert _{g,r*} = (\Vert \cdot \Vert _g + \chi _{\text {rank}(\cdot )\le r}(\cdot ))^{**}. \end{aligned}$$
(9)

In this work, we especially consider the so-called low-rank inducing Frobenius and spectral norms, i.e., the cases \(g = \ell _2\) and \(g = \ell _{\infty }\). Since \(\ell _2^D = \ell _2\) and \(\ell _{\infty }^D = \ell _1\), it follows from (8) that \(\Vert X\Vert _{\ell _2,r*} :=\max _{\Vert Y\Vert _{\ell _2^D,r} \le 1}\langle Y,X\rangle \) with \(\Vert Y\Vert _{\ell _2^D,r}:=\sqrt{\sum _{i=1}^r \sigma _i^2(Y)}\) and \(\left\| {} X {}\right\| _{\ell _{\infty },r*} :=\max _{\left\| {} Y{}\right\| _{\ell _1,r}\le 1}\langle Y,X\rangle \) with \(\left\| {} Y{}\right\| _{\ell _1,r} = \sum _{i=1}^r \sigma _i(Y)\).
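Given an SVD, both dual norms are straightforward to evaluate from the r largest singular values; a minimal NumPy sketch (hypothetical helper names):

```python
import numpy as np

def dual_norm_l2_r(Y, r):
    # ||Y||_{l2^D, r}: l2 gauge of the r largest singular values.
    s = np.linalg.svd(Y, compute_uv=False)
    return np.sqrt(np.sum(s[:r] ** 2))

def dual_norm_l1_r(Y, r):
    # ||Y||_{l1, r}: sum of the r largest singular values (Ky Fan r-norm).
    s = np.linalg.svd(Y, compute_uv=False)
    return np.sum(s[:r])

Y = np.diag([3.0, 2.0, 1.0])
```

For r = n these reduce to the Frobenius and nuclear norms, respectively, consistent with the remark after (7).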

The following motivates the main interest in low-rank inducing norms (see [16, 17, 19] for details).

Proposition 2

Assume that \(f_0: \mathbb {R}^{n \times m}\rightarrow \mathbb {R} \cup \lbrace \infty \rbrace \) is a proper closed convex function, and that \(r \in \mathbb {N}\) is such that \(1 \le r \le \min \{ m,n \}\). Let \(f_1: \mathbb {R}_{\ge 0} \rightarrow \mathbb {R} \cup \{ \infty \}\) be an increasing, proper closed convex function, and let \(\theta >0\). Then

$$\begin{aligned} (f_1 \circ \left\| {} \cdot {} \right\| _{g,r*})^*= f_1^+(\left\| {} \cdot {}\right\| _{g^D,r}) \end{aligned}$$
(10)

and

$$\begin{aligned} \inf _{{\mathop {\text {rank}(X) \le r}\limits ^{X \in \mathbb {R}^{n \times m}}}} \left[ f_0(X) + \theta f_1(\Vert X\Vert _g) \right]&\ge -\inf _{D \in \mathbb {R}^{n \times m}} \left[ f_0^*(D) + \theta f_1^{+}(\theta ^{-1}\Vert D\Vert _{g^D,r}) \right] \end{aligned}$$
(11)
$$\begin{aligned}&= \inf _{X \in \mathbb {R}^{n \times m}} \left[ f_0(X) + \theta f_1(\Vert X\Vert _{g,r*}) \right] . \end{aligned}$$
(12)

If \(X^\star \) solves (12) such that \(\text {rank}(X^\star ) \le r\), then equality holds and \(X^\star \) is also a solution to the problem on the left of (11).

In other words, Proposition 2 shows that low-rank inducing norms can be used both as additive regularizers and direct convex envelopes to find (approximate) solutions to

$$\begin{aligned} \begin{aligned}&\underset{X}{\text {minimize}}&L(X)\\&\text {subject to}&\text {rank}(X) \le r. \end{aligned} \end{aligned}$$
(13)

For regularization as in [15, 42], we set \(f_0 = L\) and choose a suitable \(f_1\) and \(\theta \) to find an approximate solution. In the second case, when L can be split into \(L = f_0 + f_1(\Vert \cdot \Vert _g)\) as in Proposition 2, then

$$\begin{aligned} \min _{X \in \mathbb {R}^{n \times m}} \left[ f_0(X) + f_1(\Vert X\Vert _{g,r*}) \right] \end{aligned}$$
(14)

may return an (exact) solution to (13).

4 Proximal Mappings

For problems of small dimensions, it is often convenient to solve (14) through semi-definite programming (SDP). However, conventional SDP solvers are typically based on interior-point methods (see [39]) with an iteration cost that grows unfavorably with the problem dimension. For large-scale problems, proximal splitting methods can be used [4, 9]. To efficiently solve (14), proximal splitting methods require efficient computation of the proximal mapping of \(f_1(\Vert \cdot \Vert _{g,r*})\).

In this section, we present our main result, a nested binary search framework for computing this proximal mapping efficiently for simple choices of \(f_1\). Explicit and implementable steps for these computations are given for the common cases \(f_1 = (\cdot )\) and \(f_1 = (\cdot )^2\) with \(g = \ell _2\) and \(g = \ell _{\infty }\) [3, 17, 19, 37]. In Sect. 4.3, the computational complexity of our generic algorithm as well as of these particular cases is derived. In cases where \(f_1\) is not simple, we can write (14) as

$$\begin{aligned} \min _{t \in \mathbb {R},~X \in \mathbb {R}^{n \times m}} f_0(X) + f_1(t) + \chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}(X,t), \end{aligned}$$
(15)

where \(\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}\) is the indicator function of the epigraph to \(\Vert \cdot \Vert _{g,r*}\). Then a consensus formulation for proximal splitting methods (see [9]) requires an evaluation of the proximal mappings for \(f_1\) and \(\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}\). Since \(f_1\) is one-dimensional, convex, proper and increasing, its proximal mapping is fast to evaluate. We will see as part of our complexity analysis in Sect. 4.3 that computing \(\text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}\), \(\text {prox}_{\Vert \cdot \Vert _{g,r*}}\) and \(\text {prox}_{\Vert \cdot \Vert _{g,r*}^2}\), i.e., \(f_1 = (\cdot )\) and \(f_1 = (\cdot )^2\), is equally costly.

Note that in contrast to \(\left\| {} \cdot {}\right\| _{ g,r*}\), its dual norm \(\left\| {} \cdot {}\right\| _{ g^D,r}\) is explicitly known by its definition (8), which is why we derive our search framework for

$$\begin{aligned} \text {prox}_{\gamma ^{-1}f_1^+(\Vert \cdot \Vert _{g^D,r})}(Z) \quad \text {and} \quad \Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v), \end{aligned}$$
(16)

with

$$\begin{aligned} -\text {epi}(\left\| {} \cdot {}\right\| _{g^D,r}) :=\{(Y,-w) : \left\| {} Y{}\right\| _{g^D,r}\le w\} \end{aligned}$$

which by (5) and (10) yields the sought proximal mappings

$$\begin{aligned} \text {prox}_{\gamma f_1(\left\| {} \cdot {} \right\| _{g,r*})}(Z)&= Z - \gamma \text {prox}_{\gamma ^{-1} f_1^+(\Vert \cdot \Vert _{g^D,r})}(\gamma ^{-1}Z) \end{aligned}$$
(17a)
$$\begin{aligned} \text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v)&= \Pi _{\text {epi}(\left\| {} \cdot {} \right\| _{g,r*})}(Z,z_v) = (Z,z_v) - \Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v). \end{aligned}$$
(17b)

4.1 Search Framework

Next, we present our main result, which shows that (16), and hence Eqs. (17a) and (17b), can be computed by a nested parameter search. Since the computations of (16) can be compactly unified as

$$\begin{aligned} \begin{aligned}&\underset{Y,w}{\text {minimize}}&f(w) + \frac{\gamma }{2} \Vert Y-Z\Vert _{\ell _2}^2\\&\text {subject to}&w \ge \left\| {} Y{}\right\| _{g^D,r}, \ Y \in \mathbb {R}^{n \times m}, \end{aligned} \end{aligned}$$
(18)

where f is closed, proper and convex, our results are stated for all such problems. Table 1 summarizes common choices of f and their relationship to Eqs. (17a) and (17b) via (16).

Table 1 Example choices for f and \(\gamma \) in (18) for the computation of (16) and thus Eqs. (17a) and (17b). \(\chi _{\Vert \cdot \Vert _{g^D} \le \gamma }\) stands for the indicator function of the set \(\{X: \Vert X\Vert _{g^D} \le \gamma \}\)

Before we state the main theorem on how to solve (18) with a nested binary search method, we outline the steps that give rise to this algorithm. It is well-known that the solution \(Y^\star \) to (18) and Z have a simultaneous SVD [31, 36] and, therefore, only the singular values of \(Y^\star \) need to be computed. Let \(y_i=\sigma _i(Y)\) and \(z_i=\sigma _i(Z)\), then it follows that (18) reduces to the vector-valued problem

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n. \end{aligned} \end{aligned}$$
(19)

Since the entries of \(z \in \mathbb {R}^n_{\ge 0}\) are nonnegative and non-increasingly ordered, it can be shown that the minimizer of (19) is nonnegative. The problem is, therefore, equivalent to solving

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D(y_1,\ldots ,y_r,0,\ldots ,0), \ y \in \mathbb {R}^n,\\&&y_1 \ge \dots \ge y_n. \end{aligned} \end{aligned}$$
(20)

Since only the first r elements of y enter the norm constraint, the solution may have a chain of equal entries around \(y_r\), i.e., there exist integers \(t\ge 1\) and \(s\ge 0\) such that (20) is equivalent to

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^n(y_i - z_i)^2 \\&\text {subject to}&w \ge g^D(y_1,\ldots ,y_r,0,\ldots ,0), \ y \in \mathbb {R}^n,\\&&y_1\ge \dots \ge y_{r-t+2}>y_{r-t+1} = \dots = y_{r}\\&\qquad \qquad \qquad =\dots = y_{r+s}>y_{r+s+1} \ge \dots \ge y_n. \end{aligned} \end{aligned}$$
(21)

The base case \(t=1\) and \(s=0\) implies that \(y_{r-1}>y_r>y_{r+1}\), i.e., the chain has length one. Thus, if we can solve (21) for an arbitrary but fixed pair (t, s), an optimal \((t^\star ,s^\star )\) could be determined by comparing all pairs. As this would be very inefficient, the proposed search rules are devised to find \((t^\star ,s^\star )\) by considering only a few pairs.

To state these rules, we need to introduce the truncated gauge function of \(g^D\) as

$$\begin{aligned} g^D_{r,s,t}(x) := g^D\bigg ((Tx)_1,\dots ,(Tx)_{r-t},\underbrace{(Tx)_{r-t+1},\dots ,(Tx)_{r-t+1}}_{t~\text {times}},0,\dots ,0\bigg ), \end{aligned}$$

where \(x \in \mathbb {R}^n\) and the truncation operator \(T: \mathbb {R}^{n} \rightarrow \mathbb {R}^{r-t+1}\) is defined for all \(1 \le r \le n\) and \((t,s) \in \{1,\dots ,r\} \times \{0,\dots ,n-r\}\) as

$$\begin{aligned} (Tx)_i := {\left\{ \begin{array}{ll} \text {sort}(x)_i, &{}\text {if } 1\le i \le r-t, \\ \dfrac{\sum _{i = r-t+1}^{r+s} \text {sort}(x)_i}{\sqrt{t+s}}, &{}\text {if } i = r-t+1. \end{array}\right. } \end{aligned}$$
(22)
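The truncation operator T can be implemented directly from (22): sort, keep the first \(r-t\) entries, and merge the next \(t+s\) entries into a single scaled entry. A NumPy sketch (hypothetical function name):

```python
import numpy as np

def truncate(x, r, s, t):
    # Truncation operator T in (22): maps R^n to R^{r-t+1}.
    xs = np.sort(x)[::-1]                             # sort in descending order
    merged = np.sum(xs[r - t : r + s]) / np.sqrt(t + s)
    return np.append(xs[: r - t], merged)

x = np.array([5.0, 4.0, 3.0, 2.0, 1.0])
# r = 3, t = 2, s = 1: keep one entry, merge entries 2..4 scaled by 1/sqrt(3).
Tx = truncate(x, r=3, s=1, t=2)
```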

Note that \(g^D_{r,s,t}\) is indeed a gauge function with dual gauge function (see [23, Lemma 2.2.2])

$$\begin{aligned} g_{r,s,t}(x) := g((Tx)_1,\dots ,(Tx)_{r-t},\underbrace{\tfrac{{(Tx)}_{r-t+1}(s+t)}{t},\dots ,\tfrac{(Tx)_{r-t+1}(s+t)}{t}}_{t~\text {times}},0,\dots ,0). \end{aligned}$$

For the special case \((t,s) = (1,0)\), it reduces to \(g^D_r\) in (2). We are now ready to state our main theorem.

Theorem 1

Let \(Z = \sum _{i = 1}^n \sigma _i(Z) u_i v_i^\mathsf {T}\in \mathbb {R}^{n \times m}\), \(\gamma > 0\), \(1\le r\le n\), \(g: \mathbb {R}^n \rightarrow \mathbb {R}\) be a gauge function, and \(f:\mathbb {R} \rightarrow \mathbb {R} \) be proper, closed and convex. For each \((t,s) \in \{1, \dots , r\} \times \{0,\dots , n-r\}\) let \((y^{(t,s)},w^{(t,s)}) \in \mathbb {R}^{n+1}\) be defined as

$$\begin{aligned} y^{(t,s)}_i&:= {\left\{ \begin{array}{ll} \tilde{y}_i, &{}\text {if } 1\le i \le r-t,\\ \dfrac{\tilde{y}_i}{\sqrt{t+s}}, &{}\text {if } r-t+1 \le i \le r+s, \\ \sigma _i(Z) &{}\text {if } i \ge r+s+1.\\ \end{array}\right. } \end{aligned}$$
(23a)
$$\begin{aligned} w^{(t,s)}&:= \tilde{w} \end{aligned}$$
(23b)

where \((\tilde{y},\tilde{w}) \in \mathbb {R}^{r-t+2}\) solves

$$\begin{aligned} \begin{aligned}&\underset{\tilde{y},\tilde{w}}{\text {minimize}}&f(\tilde{w})+ \frac{\gamma }{2}\sum _{i=1}^{r-t+1}(\tilde{y}_i - \tilde{z}_i)^2 \\&\text {subject to}&\tilde{w} \ge g_{r,s,t}^D(\tilde{y}), \ \tilde{y} \in \mathbb {R}^{r-t+1}, \end{aligned} \end{aligned}$$
(24)

and \(\tilde{z} := T \sigma (Z)\) is given by (22). Then \((Y^\star ,w^\star ) = (\sum _{i = 1}^n y_i^{(t^\star ,s^\star )}u_i v_i^\mathsf {T},w^{(t^\star ,s^\star )})\) is the solution to (18), where

$$\begin{aligned} t^\star&:= \min \left\{ \lbrace t: y^{(t,s_t^\star )}_{r-t} > y_{r-t+1}^{(t,s_t^\star )} \rbrace \cup \lbrace r \rbrace \right\} \end{aligned}$$
(25a)
$$\begin{aligned} s_t^\star&:= \min \left\{ \lbrace s: y^{(t,s)}_{r+s} > y_{r+s+1}^{(t,s)} \rbrace \cup \lbrace n-r \rbrace \right\} \nonumber \\ s^\star&:= s_{t^\star }^\star \end{aligned}$$
(25b)

In particular, \((t^\star ,s^\star )\) can be found by a nested binary search over t and s with the following rules for increasing/decreasing t and s:

  I. \(y^{(t,s_t^\star )}_{r-t} \ge y_{r-t+1}^{(t,s_t^\star )}\) for all \(t \ge t^\star \).

  II. \(y^{(t,s_t^\star )}_{r-t} \le y_{r-t+1}^{(t,s_t^\star )}\) for all \(t < t^\star \).

  III. If \(t < t^\star \) and \(y^{(t,s_t^\star )}_{r-t} = y_{r-t+1}^{(t,s_t^\star )}\) then \(\left( y^{(t,s_t^\star )},w^{(t,s_t^\star )} \right) = \left( y^{(t^\star ,s^\star )},w^{(t^\star ,s^\star )} \right) \).

  IV. \(y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)}\) for all \(s \ge s_t^\star \).

  V. \(y^{(t,s)}_{r+s} \le y_{r+s+1}^{(t,s)}\) for all \(s < s_t^\star \).

  VI. If \(s < s_t^\star \) and \(y^{(t,s)}_{r+s} = y_{r+s+1}^{(t,s)}\) then \(\left( y^{(t,s)},w^{(t,s)} \right) = \left( y^{(t,s_t^\star )},w^{(t,s_t^\star )} \right) \).

A few words on Theorem 1 may be helpful. The first part simply makes explicit that \((y^{(t,s)},w^{(t,s)})\) in Eqs. (23a) and (23b) is the solution of (21) with fixed t and s, i.e., it solves

$$\begin{aligned} \begin{aligned}&\underset{y,w}{\text {minimize}}&f(w)+ \frac{\gamma }{2}\sum _{i=1}^{n}(y_i - z_i)^2\\&\text {subject to}&w \ge g^D_r(y), \ y \in \mathbb {R}^n,\\&&y_{r-t+1} = \dots = y_{r+s}, \end{aligned} \end{aligned}$$
(26)

via the solution of the lower-dimensional problem (24). For fixed t in (21), the search rules for s (Items IV. to VI.) can be used to find an optimal \(s = s_t^\star \) that minimizes the cost in (21) among all choices of s that fulfil the constraint \(y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)} \ge \dots \ge y_{n}^{(t,s)}.\) Since \(y_i^{(t,s)} = z_i\) for \(i \ge r+s+1\) by (23a), it suffices to check that \(y^{(t,s)}_{r+s} \ge y_{r+s+1}^{(t,s)}\), where by (25b) \(s_t^\star \) is the smallest of such s. Similarly, the search rules for finding an optimal \(t =t^\star \) minimize the cost in (21) among all choices \((t,s) = (t,s_t^\star )\) that do not violate the constraint \(y^{(t,s_t^\star )}_{r-t} \ge y_{r-t+1}^{(t,s_t^\star )}\). Using nested binary search (see [28]) over s (inner loop) and t (outer loop), an optimal \((t^*,s^*)\) can be found efficiently under the assumption that (24) has an efficiently computable solution for all choices (ts). For more details, see the derivation of the proof to Theorem 1 in Appendix 2 and our explicit implementation for determining \(\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)\) in Algorithm 1.

Algorithm 1 (pseudocode figure)

4.2 Low-Rank Inducing Frobenius and Spectral Norms

Next, we exemplify solutions to (24) for the instances in Table 1 with \(g = \ell _2\) and \(g = \ell _\infty \). A general result on the solvability of (24) is given in Appendix 3.

In particular, we will discuss solutions to (24) for all \(g = \tau \ell _2\) and \(g = \tau \ell _\infty \), \(\tau > 0\), because this enables us to handle the first two cases in Table 1 simultaneously through the identity

$$\begin{aligned} \text {prox}_{\frac{\tau ^2}{2}{\left\| {}\cdot {}\right\| _{ g,r*}^2}}(Z) = Z - \text {prox}_{\frac{1}{2}{\left\| {} \cdot {}\right\| _{ \frac{g^D}{\tau },r}^2}}(Z) = Z- \Pi _{Y}\bigg (\Pi _{-\text {epi}(\left\| {} \cdot {}\right\| _{ \frac{g^D}{\tau },r})}(Z,0)\bigg ), \end{aligned}$$

where \(\Pi _{Y} (Y,w) := Y\) and \(\tau > 0\). It is easy to adjust these computations for the third case, because \(\text {prox}_{\tau {\left\| {}\cdot {}\right\| _{ g,r*}}}(Z) = \text {prox}_{{\left\| {}\cdot {}\right\| _{ \tau g,r*}}}(Z)\).

Proposition 3

Let \(Z = \sum _{i = 1}^n \sigma _i(Z) u_i v_i^\mathsf {T}\in \mathbb {R}^{n \times m}\), \(g = \tau \ell _2\) with \(\tau > 0\), \(1\le r\le n\), \(\gamma = 1\), \(z_v \in \mathbb {R}\) and \(\tilde{z} := T \sigma (Z)\). Then, \(\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)\) can be computed via Theorem 1 with \(f(w) = \frac{1}{2}(w+z_v)^2\), where the solution to (24) is characterized by one of the following three distinct cases:

$$\begin{aligned}&(\tilde{y},\tilde{w}) = (\tilde{z},z_v) ~\ \Longleftrightarrow ~\ \sqrt{\sum _{i=1}^{r-t} \tilde{z}_i^2 + \frac{t}{s+t} \tilde{z}_{r-t+1}^2} \le -\tau z_v, \end{aligned}$$
(27a)
$$\begin{aligned}&(\tilde{y},\tilde{w}) = 0 ~\ \Longleftrightarrow ~\ \sqrt{\sum _{i=1}^{r-t} \tilde{z}_i^2 + \frac{s+t}{t} \tilde{z}_{r-t+1}^2} \le \frac{z_v}{\tau } \end{aligned}$$
(27b)

and otherwise

$$\begin{aligned} \tilde{y}_i&= \frac{\tilde{z}_i}{1+\frac{\mu }{\tau ^2 \tilde{w}}}, \ 1\le i \le r-t \end{aligned}$$
(27c)
$$\begin{aligned} \tilde{y}_{r-t+1}&= \frac{ \tilde{z}_{r-t+1}}{1+\frac{\mu t}{\tau ^2 \tilde{w} (s+t)}} \end{aligned}$$
(27d)
$$\begin{aligned} \tilde{w}&= \mu - z_v \end{aligned}$$
(27e)

where the unique \(\mu \ge 0\) is a solution to the fourth-order polynomial

$$\begin{aligned} \left[ \left( \tilde{w} \tau +\frac{\mu }{\tau }\right) ^2-c_1\right] \left[ (t + s)\tau \tilde{w} + \frac{\mu }{\tau } t\right] ^2 - t c_2^2\left( \tilde{w} \tau +\frac{\mu }{\tau }\right) ^2 = 0 \end{aligned}$$
(27f)

with \(c_1 := \sum _{i=1}^{r-t} \tilde{z}^2_i\) and \(c_2 := \sqrt{t+s}\,\tilde{z}_{r-t+1}\).

Similarly, \(\text {prox}_{\chi _{\Vert \cdot \Vert _{\tau g^D,r} \le \gamma }}(Z)\) can be determined by setting \(f(w) = \chi _{[0,\gamma ]}(w)\), where it suffices to consider the two cases: (27a) with \(z_v = -1\), and Eqs. (27c), (27d) and (27f) with \(\tilde{w} = 1\).

Proposition 4

Let \(Z = \sum _{i = 1}^n \sigma _i(Z) u_i v_i^\mathsf {T}\in \mathbb {R}^{n \times m}\), \(g = \tau \ell _\infty \) with \(\tau > 0\), \(1\le r\le n\), \(\gamma = 1\), \(z_v \in \mathbb {R}\) and \(\tilde{z} := T \sigma (Z)\). Further, let

$$\begin{aligned} \hat{z}&:=\left( \tilde{z}_1,\ldots ,\tilde{z}_j,\frac{t}{\sqrt{(t+s)}}\tilde{z}_{r-t+1},\tilde{z}_{j+1},\ldots ,\tilde{z}_{r-t}\right) \in \mathbb {R}^{r-t+1},\\ \alpha&:= \left( \underbrace{1,\ldots ,1}_{\text {length } j},\frac{t^2}{(t+s)},1,\dots ,1\right) \in \mathbb {R}^{r-t+1}. \end{aligned}$$

where j is chosen such that

$$\begin{aligned} \tilde{z}_j>\tfrac{\sqrt{(t+s)}}{t}\tilde{z}_{r-t+1}\ge \tilde{z}_{j+1} \quad \text {or} \quad \tilde{z}_{r-t} \ge \tfrac{\sqrt{(t+s)}}{t}\tilde{z}_{r-t+1} . \end{aligned}$$
(28)

Then, \(\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)\) can be computed via Theorem 1 with \(f(w) = \frac{1}{2}(w+z_v)^2\), where the solution to (24) is characterized by one of the following three distinct cases:

$$\begin{aligned}&(\tilde{y},\tilde{w}) = (\tilde{z},z_v) ~\ \Longleftrightarrow ~\ \sum _{i=1}^{r-t} |\tilde{z}_i| + \frac{t}{\sqrt{t+s}} |\tilde{z}_{r-t+1}| \le -\tau z_v \end{aligned}$$
(29a)
$$\begin{aligned}&(\tilde{y},\tilde{w}) = 0 ~\ \Longleftrightarrow ~\ \max \left( \tilde{z}_1,\dfrac{\sqrt{t+s}}{t} \tilde{z}_{r-t+1} \right) \le \frac{z_v}{\tau } \end{aligned}$$
(29b)

and otherwise

$$\begin{aligned} \tilde{y}_i&= \max \left( \tilde{z}_i - \tfrac{\mu }{\tau },0 \right) , \ 1\le i \le r-t, \end{aligned}$$
(29c)
$$\begin{aligned} \tilde{y}_{r-t+1}&= \max \left( \tilde{z}_{r-t+1} - \tfrac{t \mu }{\sqrt{(t + s)}\tau },0 \right) , \end{aligned}$$
(29d)
$$\begin{aligned} \tilde{w}&= \mu - z_v \end{aligned}$$
(29e)

where \(\mu = \hat{\mu }_{k^\star }\) with \(\hat{\mu }_k= \frac{z_v+\sum _{i=1}^{k}\hat{z}_{i}}{1+\sum _{i=1}^k\alpha _i}\) and \(k^\star \) can be identified by a search over k with the following rules for increasing/decreasing k:

  I. \(k^\star =\max \{k : \hat{z}_k-\alpha _k\hat{\mu }_k\ge 0\}\)

  II. \(\hat{z}_k-\alpha _k\hat{\mu }_k\ge 0\) for all \(k \le k^\star \)

  III. \(\hat{z}_k-\alpha _k\hat{\mu }_k< 0\) for all \(k> k^\star \)
Similarly, \(\text {prox}_{\chi _{\Vert \cdot \Vert _{\tau g^D,r} \le \gamma }}(Z)\) can be determined by setting \(f(w) = \chi _{[0,\gamma ]}(w)\), where it suffices to consider the two cases: (29a) with \(z_v = -1\) and Eqs. (29c) and (29d), where \(\mu = \hat{\mu }_{k^\star }\) can be found with the search rules from above and \(\hat{\mu }_k = \frac{\sum _{i=1}^{k}\hat{z}_{i}}{\sum _{i=1}^k\alpha _i}\).
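The search for \(\mu \) in Proposition 4 admits a simple linear-scan sketch (NumPy, hypothetical helper name); by rules II and III the feasible indices form a prefix, so a binary search over k would work equally well:

```python
import numpy as np

def find_mu(z_hat, alpha, z_v):
    # mu_k = (z_v + sum_{i<=k} z_hat_i) / (1 + sum_{i<=k} alpha_i); by rule I,
    # k* is the largest k with z_hat_k - alpha_k * mu_k >= 0 (assumed non-empty).
    mu = (z_v + np.cumsum(z_hat)) / (1.0 + np.cumsum(alpha))
    feasible = z_hat - alpha * mu >= 0
    k_star = np.max(np.nonzero(feasible)[0])  # zero-based index of k*
    return mu[k_star]

# Toy data: z_hat = (3, 2, 1), alpha = (1, 1, 1), z_v = 0 gives mu = 5/3.
mu = find_mu(np.array([3.0, 2.0, 1.0]), np.ones(3), 0.0)
```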

Propositions 3 and 4 are proved in Appendices 4 and 6, respectively, and implementations are available in MATLAB and Python at [20, 21].

4.3 Computational Complexity

In the following, we evaluate the computational complexity, i.e., the total flop count (see [44]), of the discussed approaches for computing

$$\begin{aligned} \text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v) = \Pi _{\text {epi}(\left\| {} \cdot {} \right\| _{g,r*})}(Z,z_v) = (Z,z_v) - \Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v). \end{aligned}$$

Since the same analysis also applies to the other cases discussed in Table 1, this will allow us to compare our approach to existing methods. Our evaluation starts with a discussion of Algorithm 1 for a general gauge function, followed by an explicit discussion for the cases of \(g=\ell _2\) and \(g = \ell _{\infty }\) in Sects. 4.3.1 and 4.3.2 of which a summary is given in Table 2.

Table 2 Complexity of computing \(\Pi _{\text {epi}(\Vert \cdot \Vert _{\ell _2,r*})}(Z,z_v)\) and \(\Pi _{\text {epi}(\Vert \cdot \Vert _{\ell _1,r*})}(Z,z_v)\) given a pre-computed SVD of Z, in comparison with existing methods

In order to apply the binary search rules in Theorem 1, we only need to determine \((y^{(t,s)}_{r-t},y^{(t,s)}_{r-t+1},y^{(t,s)}_{r+s},y^{(t,s)}_{r+s+1})\), whose computational cost we assume to be bounded by C(n, r). Then, the complexity of Algorithm 1 is the sum of:

  1. SVD for Z providing all \(\sigma _i(Z)\) and \(u_iv_i^\mathsf {T}\) such that \(Z = \sum _{i = 1}^n \sigma _i(Z)u_iv_i^\mathsf {T}\) (see [44]): \(\mathcal {O}(mn^2)\).

  2. Binary search rules (see [28]) in Theorem 1 for t and s:

     $$\begin{aligned} \mathcal {O}(C(n,r) \log (r)\log (n-r)) \end{aligned}$$

  3. Determine the final full solution: \(\mathcal {O}(n)\).

  4. Compute \(\text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v)\) from \(\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)\): \(\mathcal {O}(n)\).
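The second cost contribution can be illustrated by a generic nested binary search. The sketch below is a simplification: `accept` is a hypothetical stand-in for the O(C(n, r)) test of Theorem 1 and is assumed monotone in both arguments, whereas the actual search rules in the paper are more refined.

```python
def binary_search_smallest(lo, hi, pred):
    """Smallest x in [lo, hi] with pred(x) True, assuming pred is monotone
    (all False up to some point, then all True); returns hi if no x
    satisfies pred. Uses O(log(hi - lo)) predicate evaluations."""
    while lo < hi:
        mid = (lo + hi) // 2
        if pred(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def nested_search(r, n, accept):
    """Generic nested binary search over t in {0,...,r} and s in {0,...,n-r}.
    Each accept-call models one O(C(n, r)) test, giving the overall
    O(C(n, r) log(r) log(n - r)) bound of step 2."""
    def outer(t):
        s = binary_search_smallest(0, n - r, lambda s: accept(t, s))
        return accept(t, s)

    t_star = binary_search_smallest(0, r, outer)
    s_star = binary_search_smallest(0, n - r, lambda s: accept(t_star, s))
    return t_star, s_star
```

For example, with the toy predicate `accept = lambda t, s: t >= 3 and s >= 2`, `nested_search(10, 20, accept)` returns `(3, 2)` after logarithmically many tests in each coordinate.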

In practice, the first cost may be significantly reduced by employing sparse SVD solvers (see, e.g., [32, 34]). In particular, in the vector-valued case, this step reduces to a simple sorting of the entries. The second cost is determined by the coordinate transformation (23a), i.e.,

$$\begin{aligned} (y^{(t,s)}_{r-t},y^{(t,s)}_{r-t+1},y^{(t,s)}_{r+s},y^{(t,s)}_{r+s+1})= \left( \tilde{y}_{r-t},\frac{1}{\sqrt{s+t}}\tilde{y}_{r-t+1},\frac{1}{\sqrt{s+t}}\tilde{y}_{r-t+1},\sigma _{r+s+1}(Z)\right) \end{aligned}$$

and therefore the cost C(n, r) equals the cost \(\tilde{C}(n,r)\) for solving (24) to find \((\tilde{y}_{r-t},\tilde{y}_{r-t+1})\). Once an optimal pair \((t^\star ,s^\star )\) is found, computing the full solution \(y^{(t^\star ,s^\star )}\) via these pre- and post-processing steps costs at most \(\mathcal {O}(n)\). Finally, computing \(\text {prox}_{\chi _{\text {epi}(\Vert \cdot \Vert _{g,r*})}}(Z,z_v)\) from \(\Pi _{-\text {epi}(\Vert \cdot \Vert _{g^D,r})}(Z,z_v)\) only contributes an additional \(n+1\) subtractions.

Remark 1

The cost for computing \(\tilde{z}_{r-t+1}\) is given by the cost of knowing \(\sum _{i=r+1}^{r+s} \sigma _i(Z)\) (for \(s > 0\)) and \(\sum _{i=r-t+1}^{r} \sigma _i(Z)\). Both sums could be computed a priori for all t and s through incremental summation with cost \(\mathcal {O}(n)\). In practice, however, it may be cheaper to store and re-use the intermediate sums when deriving \(\sum _{i=r-t+1}^{r} z_i\) and \(\sum _{i=r+1}^{r+s} z_i\); then additional intermediate sums only need to be computed whenever t or s is increased within the binary search.
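A minimal sketch of this remark: after one O(n) pass building a prefix-sum array over the singular values, any of the partial sums above costs O(1) per query (the text is 1-based, so the helper keeps 1-based inclusive bounds):

```python
import numpy as np

# Placeholder singular values with sigma_1 >= ... >= sigma_n.
sigma = np.array([5.0, 4.0, 2.5, 1.0, 0.5])
# prefix[k] = sum of the first k singular values; one O(n) pass.
prefix = np.concatenate(([0.0], np.cumsum(sigma)))

def partial_sum(a, b):
    """sum_{i=a}^{b} sigma_i with 1-based inclusive bounds, in O(1)."""
    return prefix[b] - prefix[a - 1]

r, t, s = 3, 1, 2
tail_low = partial_sum(r - t + 1, r)    # sum_{i=r-t+1}^{r} sigma_i
tail_high = partial_sum(r + 1, r + s)   # sum_{i=r+1}^{r+s} sigma_i
```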

4.3.1 Low-Rank Inducing Frobenius Norms

In order to determine the computational cost \(\tilde{C}(n,r)\) for \(g = \ell _2\), we need to analyze the complexity of the three cases in Proposition 3. All cases require the evaluation of \(\sum _{i=1}^{r-t} \tilde{z}_i^2 = \sum _{i=1}^{r-t} \sigma _i^2(Z)\) as either part of the inequalities Eqs. (27a) and (27b) or as coefficients in polynomial (27f). These sums can be computed once for all \(t \in \{1,\dots ,r\}\) with cost \(\mathcal {O}(r)\). Then testing Eqs. (27a) and (27b) as well as solving the fourth-order polynomial (27f) are of cost \(\mathcal {O}(1)\). Our generic approach, therefore, recovers in this special case the same complexity as the algorithms in [14, 29] (see Table 2).
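The cost count for \(g = \ell _2\) can be sketched as follows; the quartic coefficients below are placeholders and not the actual coefficients of (27f) from Proposition 3:

```python
import numpy as np

# Placeholder singular values; the squared prefix sums entering (27a),
# (27b) and (27f) are computed once in O(r), after which each of the
# three case tests is O(1).
sigma = np.array([4.0, 3.0, 2.0, 1.0, 0.5])
sq_prefix = np.concatenate(([0.0], np.cumsum(sigma ** 2)))

# The fourth-order polynomial is solved in constant time, e.g. via the
# eigenvalues of its fixed-size 4x4 companion matrix (what np.roots does).
coeffs = [1.0, -2.0, 0.0, 0.0, 1.0]   # placeholder quartic: x^4 - 2x^3 + 1
roots = np.roots(coeffs)
real_roots = roots[np.abs(roots.imag) < 1e-10].real
```

Since both the prefix sums and each constant-time quartic solve are independent of n, the per-test cost \(\tilde{C}(n,r)\) stays bounded as claimed.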

4.3.2 Low-Rank Inducing Spectral Norms

As in the previous case, determining \(\tilde{C}(n,r)\) for \(g = \ell _\infty \) requires us to compute the complexity of the three cases in Proposition 4. The cases Eqs. (29a) and (29b) require the evaluation of \(\sum _{i=1}^{r-t}\tilde{z}_i = \sum _{i=1}^{r-t}\sigma _i(Z)\). This can be done once for all \(t \in \{1,\dots ,r\}\) with cost \(\mathcal {O}(r)\), and verifying the corresponding inequalities is then of complexity \(\mathcal {O}(1)\).

Determining \(\mu \) in the third case of Proposition 4 requires:

  a) Find j in (28): \(\mathcal {O}(\log (r-t+1))\), because \(\tilde{z}_1 \ge \dots \ge \tilde{z}_{r-t}\).

  b) Determine \(\mu _{k^{\star }} = \mu \) through binary search: \(\mathcal {O}(r-t+1)\), because \(\sum _{i=1}^{r-t+1} \hat{z}_i\) may need to be computed.

Thus, \(\tilde{C}(n,r)\) is dominated by the complexity of determining \(\mu \), which by the preceding analysis is at most \(\mathcal {O}(r)\). Compared to [46], our approach reduces the overall cost significantly (see Table 2), which is especially important for the corresponding vector-valued problem.
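Step a) above exploits the decreasing order of \(\tilde{z}_1 \ge \dots \ge \tilde{z}_{r-t}\). As a hedged illustration (the precise rule defining j in (28) may differ), suppose j counts the entries strictly larger than a threshold mu; then j is found by binary search in logarithmic time:

```python
import bisect

# Decreasing placeholder data z~_1 >= ... >= z~_{r-t}.
z_tilde = [9.0, 7.0, 5.0, 3.0, 1.0]
mu = 4.0

# bisect expects an increasing sequence, so search the negated values;
# this costs O(log(r - t + 1)) comparisons.
j = bisect.bisect_left([-v for v in z_tilde], -mu)
```

Here `j == 3`, since exactly three entries of `z_tilde` exceed `mu`.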

5 Case Study: Matrix Completion

In the following, we will see how the binary search parameters (t, s, k) from Algorithm 1 and Proposition 4 evolve when solving an optimization problem with a proximal splitting method. We consider the convexified low-rank matrix completion problem (see, e.g., [7, 8, 17] for motivation and examples)

$$\begin{aligned} \begin{aligned}&\underset{M}{\text {minimize}}&\Vert M\Vert _{\ell _\infty ,r*} \\&\text {subject to}&m_{ij} = n_{ij}, \ (i,j) \in \mathcal {I} \end{aligned} \end{aligned}$$
(30)

with \(r =50\), \(\mathcal {I} := \{(i,j): n_{ij} > 0 \}\) and \(N = \sum _{i=1}^r u_i u_i^\mathsf {T}\) being defined through the SVD of the matrix shown in figure b.

Note that a smaller version of this example has been solved successfully in [17] by using an SDP solver, but this larger instance is well beyond the reach of typical SDP solvers [39, 43]. Therefore, we apply the following Douglas–Rachford splitting scheme (see [9, 11, 33]):

$$\begin{aligned}&X_{i} = \text {prox}_{{\left\| {}\cdot {}\right\| _{ \ell _{\infty },r*}}}(Z_{i-1}), \quad Y_{i}= \Pi _{\mathcal {L}}(2X_{i} - Z_{i-1}), \quad Z_{i} = Z_{i-1} + Y_{i}-X_{i} \end{aligned}$$
(32)

with \(\mathcal {L} := \{X \in \mathbb {R}^{500\times 500}: x_{ij} = n_{ij}, \ (i,j) \in \mathcal {I} \}\), \(Z_0 = 0\) and \(\lim _{i \rightarrow \infty } X_i = \lim _{i \rightarrow \infty } Y_i\) being a solution to (30). By the construction of N, it can be shown that \(\lim _{i \rightarrow \infty } X_i = N\) (see [17]).
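The iteration (32) can be sketched with the two operators passed in as oracles. In this sketch, `prox_norm` stands in for the proximal mapping of the low-rank inducing spectral norm (which the sketch does not implement) and `proj_L` for the projection onto the affine set \(\mathcal {L}\) of matrices matching the known entries:

```python
import numpy as np

def douglas_rachford(prox_norm, proj_L, shape, tol=1e-8, max_iter=1000):
    """Sketch of the splitting scheme (32) with generic oracles."""
    Z = np.zeros(shape)
    X = Z
    for _ in range(max_iter):
        X = prox_norm(Z)            # X_i = prox(Z_{i-1})
        Y = proj_L(2 * X - Z)       # Y_i = Pi_L(2 X_i - Z_{i-1})
        Z = Z + Y - X               # Z_i = Z_{i-1} + Y_i - X_i
        if np.linalg.norm(X - Y, 'fro') <= tol:
            break
    return X

# Toy usage with the identity in place of the actual prox (i.e., a zero
# regularizer): the scheme then simply fills in the known entries.
mask = np.array([[True, False], [False, True]])
data = np.array([[1.0, 0.0], [0.0, 2.0]])
proj = lambda X: np.where(mask, data, X)
X_lim = douglas_rachford(lambda Z: Z, proj, (2, 2))
```

Projecting onto \(\mathcal {L}\) only overwrites the known entries, so each iteration is dominated by the cost of the prox oracle, as analyzed in Sect. 4.3.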

Fig. 1
figure 1

Parameter path of (t, s, k) from Algorithm 1 and Proposition 4 when computing \(\text {prox}_{{\left\| {}\cdot {}\right\| _{ \ell _{\infty },r*}}}\) within the Douglas–Rachford iterations (32). There are no values for the first iterate, because \(\text {prox}_{{\left\| {}\cdot {}\right\| _{ \ell _{\infty },r*}}}(Z_{0}) = 0\), and the iterations are stopped when \(\Vert X_i-Y_i\Vert _F \le 10^{-8}\). Strict inequalities in Algorithm 1 are considered valid if the corresponding nonnegative difference is above the relative threshold \(10^{-12} \sum _{i=1}^r \sigma _{i}(Z)\). The local plateauing after relatively few iterations suggests using (t, s, k) from the previous iteration as an initial guess for the current one, as well as employing sparse SVD solvers in order to save computational time

The parameter path of (t, s, k) for computing \(X_i\) is shown in Fig. 1. We observe that as \(X_i\) approaches N, the values of t, s and k start to plateau. Thus, by using the values from one iterate in the subsequent iterate, the practical computational cost may be reduced significantly. Finally, after the initial transient, the variance of each parameter is small compared to the overall 500 singular values. As a result, sparse SVD algorithms, which only compute a small predefined number of the largest singular values (see, e.g., [32, 34]), can be applied effectively. This emphasizes that our complexity analysis is important for both vector- and matrix-valued problems.

6 Conclusion

This work presents a binary search framework for computing the proximal mappings of all unitarily invariant low-rank inducing norms and their epigraph projections. In particular, complete algorithms for the low-rank inducing Frobenius and spectral norms are presented. Our framework unifies and extends the known proximal mapping computations in the following sense: (i) So far, only proximal mappings for the squared low-rank inducing Frobenius norm [14] and the (non-squared) low-rank inducing spectral norm [46] have been derived. Our framework is independent of the particular unitarily invariant norm and of its composition with a convex increasing function. (ii) Excluding the cost for an SVD, i.e., the cost for the analogous vector-valued problem, we recover the same complexity for the squared low-rank inducing Frobenius norm as in [14, 29], but significantly decrease the complexity for the (non-squared) low-rank inducing spectral norm. Further, we show that these costs also transfer to compositions with simple functions.

Finally, our case study has shown that, within a proximal splitting method, the computational cost of our proximal mappings may be reduced after a small number of iterations to approximately linear cost aside from the singular value decomposition, and is therefore roughly the same as in the case of the nuclear or spectral norm. Our example also demonstrates that sparse singular value decompositions (see, e.g., [32, 34]) can be applied effectively, underlining the importance of our analysis even in the matrix case. Implementations for the low-rank inducing Frobenius and spectral norms are available for MATLAB and Python at [20, 21].