1 Introduction

The present work is the extension of a chain of ideas with its roots in compressed sensing. \(\ell ^1-\ell ^2\)-minimization tricks have a long history and got renewed attention with the work of Donoho, Candès and Tao among others [1,2,3]. In the same spirit, the nuclear norm minimization strategy was investigated by Fazel and coworkers [4, 5], and in both cases, it was shown that these methods yield perfect reconstructions in the case of no noise. However, in realistic scenarios, these results often do not apply and there is, in addition, noise, in which case the methods come with a (sometimes severe) bias. Moreover, they are slow, since one needs to find an appropriate value of the penalty parameters involved.

Due to such issues, there is a wealth of non-convex variations to replace \(\ell ^1/\)nuclear norm in the area of compressed sensing; we refer to [6] for a survey. Two fairly recent contributions in this vein are the work by Carl Olsson and coworkers [7] as well as by Gilles Aubert and coworkers [8]. The former paper deals with non-convex matrix minimization problems with subspace constraints, the latter with sparse reconstructions, and in particular, the latter shows that the concrete regularizer considered there has the desirable property of not moving global minima. In this paper, we find a unifying framework and show that all these penalties are particular cases of the so-called proximal hull or quadratic envelope. We systematically study this as a regularizer, and in particular, we lift the result of Aubert et al. to a general context. In order to do so, we provide new results on the structure of lower semi-continuous (abbreviated l.s.c.) convex envelopes which are interesting in their own right. More precisely, we show that whenever an l.s.c. convex envelope is not in touch with the function that generates it, then it necessarily has a direction in which it is affine linear.

2 Outline and Motivation

We develop methods to compute the lower semi-continuous convex envelope of functionals of the form

$$\begin{aligned} f(x)+\frac{1}{2}\Vert x-d\Vert ^2_2, \end{aligned}$$
(1)

and show that this is of the form \(\mathcal {Q}(f)(x)+\frac{1}{2}\Vert x-d\Vert ^2_2\), where \(\mathcal {Q}(f)\) is the proximal hull or “quadratic envelope”, as we shall call it. Here, x can be in any separable Hilbert space, but f needs to be such that the global minimization of (1) is computable. The practical applications of \(\mathcal {Q}(f)\) pertain to optimization of (1) with additional constraints, as well as unconstrained optimization of

$$\begin{aligned} f(x)+\frac{1}{2}\Vert Ax-d\Vert ^2_2, \end{aligned}$$
(2)

where A is a linear operator.

To introduce the main ideas behind this work, we consider two concrete problems. A multitude of applications can be posed mathematically as finding the lowest rank matrix X satisfying some equation \({A}(X)=d\), where A is a linear operator and d is a measurement (see, e.g., [5, 9]). Usually, the measurement d is not perfect, so in practice, one wishes to find the minimum rank given some accepted error: \(\Vert A(X)-d\Vert \le \rho \). The dual formulation of this problem is

$$\begin{aligned} \mathop {{{\,\mathrm{arg\,min ~}\,}}}\limits _{X}\left[ \lambda \textsf {rank}(X)+\Vert A(X)-d\Vert ^2\right] , \end{aligned}$$
(3)

where \(\lambda \) is a parameter. However, the functional \(\textsf {rank}(X)\) is non-convex and highly discontinuous, so the problem cannot be solved as stated (in general). It can be solved for the case \(A=I\), but the problem is still hard when combined with additional priors, see, e.g., Section 1.1 in [7] for an overview and applications in signal processing and imaging.

Due to the problematic nature of \(\textsf {rank}(X)\), it has become popular to replace \(\textsf {rank}(X)\) with the nuclear norm of X. However, \(\textsf {rank}(X)\) and the nuclear norm are quite far apart and the method leads to a bias in the solution, which led the authors of [7] to suggest working instead with the convex envelope of \(\textsf {rank}(X)+\frac{1}{2}\Vert X-D\Vert ^2_F\) for which they obtained an explicit expression. They also provided the convex envelope when \(\textsf {rank}(X)\) is replaced by the indicator functional of the set \(\{X:~\textsf {rank}(X)\le K\}\), in order to treat problems where a matrix of a fixed rank is sought, and this convex envelope was further studied in [10].

Independently, convex envelopes were used in [8] to suggest a regularizer to functionals of the type

$$\begin{aligned} \Vert x\Vert _0+\frac{1}{2}\Vert Ax-d\Vert ^2_2,\quad x\in \mathbb {R}^n, \end{aligned}$$
(4)

which is usually dealt with by replacing \(\Vert x\Vert _0\) by \(\lambda \Vert x\Vert _1\). The main contribution of their work is to show that their regularizer does not move global minima. A common misconception is that the same holds for \(\ell ^1\)-methods, which is true only if there is no noise [11]. In the presence of noise, the estimates for \(\ell ^1\)-methods are rather poor, and [8] provides the first framework which allows for regularization without moving minima in a more realistic scenario.

This paper presents a unified approach to this circle of ideas by connecting them with the “quadratic envelope” \(\mathcal {Q}(f)\). We also extend the findings of [8] to any problem of the form (2) as long as \(\mathcal {Q}(f)\) is computable. An expanded version of this article is found in [12], which contains a long list of instances where \(\mathcal {Q}(f)\) is computable.

In particular, \(\mathcal {Q}(f)\) is computable for \(\iota _K\); the indicator functional for

$$\begin{aligned} \{x\in \mathbb {C}^n:~\Vert x\Vert _0\le K\}. \end{aligned}$$

As a proof of concept, we compare performance of (4) with \(\Vert x\Vert _0\) replaced by \(\lambda \Vert x\Vert _1\), with \(\mathcal {Q}(\textsf {card})\) and with \(\mathcal {Q}(\iota _K)\) (where \(\textsf {card}(x)=\Vert x\Vert _0\)). We use a \(100\times 200\) matrix A and minimize the regularized version of (4) for d of the form \(Ax_0+\epsilon \), where \(x_0\) has cardinality 10 and \(\epsilon \) takes on various levels of noise. As noted in [13], the best one can hope for is then to recover the so-called oracle solution \(x_S\), (obtained if an oracle a priori revealed the correct support). As Fig. 1 shows, both \(\mathcal {Q}(\textsf {card})\) and \(\mathcal {Q}(\iota _{10})\) outperform \(\ell ^1\) and find the oracle solution for fairly large levels of noise. Also, \(\mathcal {Q}(\iota _{10})\) beats \(\mathcal {Q}(\textsf {card})\), which is no surprise since it contains additional information about the problem built into it and demonstrates the versatility of the new \(\mathcal {Q}\)-transform. The article [6] contains much more information about this particular case.

Fig. 1 Regularizing (17) by \(\mathcal {Q}(\textsf {card})\) and \(\mathcal {Q}(\iota _{10})\) finds the oracle solution up to noise levels of around \(\Vert \epsilon \Vert =3\) and 4 (roughly 30% of \(\Vert d\Vert \)), whereas \(\ell ^1\)-regularization only finds this solution with no noise

We now outline the main contributions of this paper in greater detail. Consider any functional of the form

$$\begin{aligned} f(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2_\mathcal {V}\end{aligned}$$
(5)

where \(\gamma >0\) is a parameter, \(\mathcal {V}\) is an arbitrary separable Hilbert space and f a non-negative functional on \(\mathcal {V}\). In Sect. 3, we introduce the transform \(\mathcal {Q}_\gamma \) and show that the l.s.c. convex envelope of the functional in (5) is

$$\begin{aligned} \mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2_\mathcal {V}. \end{aligned}$$
(6)

In order for \(\mathcal {Q}_\gamma (f)\) to be computable, the global minimization of (5) needs to be solvable, and hence, the problem of minimizing (5) in itself is not an instance where the \(\mathcal {Q}_\gamma \)-transform is useful. However, it is useful for finding global minimizers of (5) in combination with additional prior restrictions. To illustrate, consider the problem

$$\begin{aligned} \mathop {{{\,\mathrm{arg\,min ~}\,}}}\limits _{x\in \mathcal {H}} \left[ f(x)+\frac{1}{2}\Vert x-d\Vert ^2\right] , \end{aligned}$$
(7)

where \(\mathcal {H}\) is a closed and convex subset of \(\mathcal {V}\), and suppose we are unable to find a closed form solution. Upon replacing (7) with

$$\begin{aligned} \mathop {{{\,\mathrm{arg\,min ~}\,}}}\limits _{x\in \mathcal {H}}\left[ \mathcal {Q}_\gamma (f)(x)+\frac{1}{2}\Vert x-d\Vert ^2\right] , \end{aligned}$$
(8)

for some fixed \(\gamma \le 1\), we obtain a convex problem which can be solved. However, even for \(\gamma =1\), it is possible that (7) and (8) have different solutions, despite the functional in (8) being the l.s.c. convex envelope of the one in (7). The rationale behind replacing (7) by (8) is pragmatic; since the latter is convex, the solution may be found using convex optimization routines. This may seem ad hoc, but we remind the reader that replacing, e.g., \(\Vert x\Vert _0\) by \(\Vert x\Vert _1\) or \(\textsf {rank}(X)\) by the nuclear norm \(\Vert X\Vert _1\) has had a substantial impact, and that for these concrete cases the modification \(\mathcal {Q}_\gamma (f)\) is much closer to the original functional f (which leads to a better performance as an estimator, see the numerical sections of [6, 7]). A reason for this is that \(\mathcal {Q}_\gamma (f)\) has the desirable feature that \(\mathcal {Q}_\gamma (f)(x)=f(x)\) often holds, and since (8) is a convex problem below the original problem (7), it is easy to see that a minimizer \(\hat{x}\) of (8) is a solution of (7) whenever \(\mathcal {Q}_\gamma (f)(\hat{x})=f(\hat{x})\). This is highlighted in Fig. 2 where the two problems have the same solution. More information and examples on this type of problem are found in Part II of [12].

Fig. 2 Illustration of a non-convex optimization problem with linear constraints. The left panel shows a non-convex functional along with its level sets. The gray line represents the subspace we are interested in, and the blue curve represents the values of the functional restricted to the subspace. The right panel shows the same setup, but here the convex envelope is shown as well in orange/yellow. The values of the convex envelope over the subspace are shown in the red curve. In this case, the minima of blue and red function coincide

In Sect. 5, we consider regularization of functionals like (3) and (4), or more generally

$$\begin{aligned} f(x)+\frac{1}{2}\left\| A x-d\right\| ^2_\mathcal {W}\end{aligned}$$
(9)

for arbitrary non-negative f, where \(A:\mathcal {V}\rightarrow \mathcal {W}\) is a linear operator between separable Hilbert spaces. We assume that \(\mathcal {V}\) is such that \(\mathcal {Q}_\gamma (f)\) is computable and that the convex envelope of (9) is intractable. We propose to use as regularizer the function \(\mathcal {Q}_\gamma (f)\), i.e., we will study the relationship between minimizers of (9) and those of

$$\begin{aligned} \mathcal {Q}_\gamma (f)(x)+\frac{1}{2}\left\| A x-d\right\| ^2_\mathcal {W}. \end{aligned}$$
(10)

Since it often holds that \(\mathcal {Q}_\gamma (f)(x)=f(x)\), we again see that a global minimizer of (10) for which this is the case must also be a global minimizer of (9), in view of the inequality \(\mathcal {Q}_\gamma (f)\le f\) (shown in Sect. 3). The parameter \(\gamma \) now becomes a useful tool as it tunes the curvature of \(\mathcal {Q}_\gamma (f)\), and we pause to illustrate its role by considering a toy problem in one variable; see Fig. 3. We let \(|x|_0\) be the function equalling 1 on \(\mathbb {R}{\setminus }\{0\}\) and zero at \(x=0\). In red, we see the functional \(|x|_0+\frac{1}{2}|x-1|^2\) (which is a particular case of both (3) and (4) in one dimension; the matrix A is here the number 1), in blue its convex envelope and in pink the \(\ell ^1\) convex relaxation \(|x|+\frac{1}{2}|x-1|^2\). Clearly, the global minima of the red and blue curves coincide, but the global minimum of the \(\ell ^1\)-relaxation is different. For (10), we have two options, either \(|A|^2>\gamma \) or \(|A|^2<\gamma \). The regularizer (10) is illustrated in black for these two cases in Fig. 3. The circles represent global minima of the respective functions. In the case \(|A|^2>\gamma \), we see that (10) is a convex minorant of (9) whose global minimum (for this choice of parameters) is equal to that of (9). In the case \(|A|^2<\gamma \), (10) is no longer convex but the local minima of (10) are also minima of (9), and (10) has fewer local minima. In particular, the global minima coincide. The main point of the paper is loosely that the general behavior is the same.

Fig. 3 The black curve shows two regularizations of the red curve, for different levels of \(\gamma \)
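The behavior of the one-variable toy problem can be reproduced numerically. The sketch below assumes the closed form \(\mathcal {Q}_\gamma (|\cdot |_0)(x)=1-\max (1-\sqrt{\gamma /2}\,|x|,0)^2\), which follows from the tangent-line construction of the convex envelope of \(|x|_0+\frac{\gamma }{2}x^2\) but is not stated in the text above. The function names, the grid and the data value \(d=2\) (chosen for this sketch so that the nonzero minimizer is the global one; the figure itself uses 1) are likewise illustrative assumptions.

```python
import numpy as np

def card(x):
    """|x|_0 in one variable: 1 away from the origin, 0 at the origin."""
    return (x != 0).astype(float)

def q_card(x, gamma):
    """Assumed closed form of Q_gamma(|.|_0) in one variable."""
    return 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2

# Grid containing the points 0, 1 and 2 exactly (hypothetical resolution).
x = np.arange(-300, 501) / 100.0
A, d = 1.0, 2.0                       # A is the number 1; d = 2 is an assumption
fit = 0.5 * (A * x - d) ** 2          # data term 1/2 |Ax - d|^2

argmin = {
    "original (9)": x[np.argmin(card(x) + fit)],
    "l1 relaxation": x[np.argmin(np.abs(x) + fit)],
    # gamma < |A|^2 gives a convex functional (10), gamma > |A|^2 a non-convex one.
    "Q, gamma=0.5": x[np.argmin(q_card(x, 0.5) + fit)],
    "Q, gamma=2.0": x[np.argmin(q_card(x, 2.0) + fit)],
}
for name, value in argmin.items():
    print(f"{name:>14s}: minimizer at x = {value:.2f}")
```

For these values, both choices of \(\gamma \) place the minimizer of (10) at \(x=2\), the minimizer of the original functional, whereas the \(\ell ^1\) relaxation is minimized at \(x=1\).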

In Sect. 5.1, we generalize the situation in Fig. 3 (left) and assume that \(\gamma \) satisfies \(A^*A\succcurlyeq \gamma I\), i.e., that

$$\begin{aligned} \Vert Ax\Vert ^2\ge \gamma \Vert x\Vert ^2. \end{aligned}$$
(11)

For such choice of \(\gamma \), we prove that the functional (10) is a convex functional below (9) and hence minimization of (10) will produce a minimizer which, although not necessarily equal to the minimizer of the original problem, potentially is closer than that obtained by other convex relaxation methods.

For the problem (4), A is usually a matrix with a large kernel which rules out the above approach. In Sect. 5.2, we consider the case

$$\begin{aligned} \Vert A\Vert ^2\le \gamma , \end{aligned}$$
(12)

generalizing the situation in the right picture of Fig. 3. We can then show that (10) is a continuous (but not convex) functional with the following desirable properties:

(i):

Equation (10) lies between (9) and its l.s.c. convex envelope,

(ii):

any local minimizer of (10) is a local minimizer of (9),

(iii):

the global minimizers of (10) and (9) coincide.

These findings in turn rely on general results about l.s.c. convex envelopes which we provide in Sect. 4. The computation of the l.s.c. convex envelope of \(f(x)+\frac{\gamma }{2}\Vert x\Vert ^2\) can be thought of as stretching a plastic foil from below onto the graph of \(f(x)+\frac{\gamma }{2}\Vert x\Vert ^2\) (see Fig. 2). Consider a point x where the plastic foil is not in contact with the graph, i.e., where \(\mathcal {Q}_\gamma (f)(x)<f(x)\). It is intuitively obvious that the plastic foil, i.e., the graph of \(\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\Vert x\Vert ^2\) has some direction in which it is affine linear and thus \(\mathcal {Q}_\gamma (f)\) should have some direction in which the curvature is \(-\gamma \). This is surprisingly difficult to show and despite the wealth of results on l.s.c. convex envelopes it is not found in any standard reference on the topic. The statement is shown in the Ph.D. thesis [14] for the finite-dimensional case. Here, we provide a proof based on an extension of Milman’s theorem due to Arne Brøndsted [15] in a short note from 1966.

The final Sect. 6 is more practical in nature. Critical points of (10) can be found using the forward–backward splitting method (FBS), given that \(\mathcal {Q}_\gamma (f)\) is “semi-algebraic”, as was shown in [16]. To simplify verification of when \(\mathcal {Q}_\gamma (f)\) is semi-algebraic, we show in Sect. 6 that this is true as long as f itself is semi-algebraic. Further tools to compute \(\mathcal {Q}_\gamma (f)\) as well as related proximal operators are found in [12].
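To indicate what a forward–backward iteration for (10) might look like, here is a minimal sketch for \(f=\textsf {card}\) on \(\mathbb {R}^n\). Everything in it is an assumption made for illustration and not taken from the text: the closed form of \(\mathcal {Q}_\gamma (\textsf {card})\), the proximal formula (derived coordinatewise for a step parameter \(\rho >\gamma \)), the names `q_card`, `prox_q_card` and `fbs`, the problem sizes and the random test data. It is not claimed to be the scheme analyzed in [16].

```python
import numpy as np

def q_card(x, gamma):
    """Assumed closed form of Q_gamma(card), summed over the coordinates of x."""
    return np.sum(1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2)

def prox_q_card(z, gamma, rho):
    """Coordinatewise minimizer of Q_gamma(card)(x) + rho/2 |x - z|^2 (needs rho > gamma)."""
    a = np.abs(z)
    lower, upper = np.sqrt(2.0 * gamma) / rho, np.sqrt(2.0 / gamma)
    middle = (rho * a - np.sqrt(2.0 * gamma)) / (rho - gamma)
    return np.sign(z) * np.where(a <= lower, 0.0, np.where(a >= upper, a, middle))

def objective(x, A, d, gamma):
    """Functional (10) for f = card."""
    return q_card(x, gamma) + 0.5 * np.sum((A @ x - d) ** 2)

def fbs(A, d, gamma, rho, n_iter=200):
    """Forward-backward splitting with fixed step 1/rho, started at zero."""
    x = np.zeros(A.shape[1])
    history = [objective(x, A, d, gamma)]
    for _ in range(n_iter):
        grad = A.T @ (A @ x - d)                      # forward step on the data term
        x = prox_q_card(x - grad / rho, gamma, rho)   # backward (proximal) step
        history.append(objective(x, A, d, gamma))
    return x, np.array(history)

# Hypothetical test problem: sizes, sparsity pattern and noise level are arbitrary.
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 40))
A /= np.linalg.norm(A, 2)                 # rescale so that ||A|| = 1
x_true = np.zeros(40)
x_true[[3, 17, 29]] = [30.0, -25.0, 40.0]
d = A @ x_true + 0.01 * rng.standard_normal(20)

gamma, rho = 1.0, 2.0                     # ||A||^2 <= gamma < rho
x_hat, history = fbs(A, d, gamma, rho)
print("objective: first", history[0], "last", history[-1])
```

With \(\rho >\gamma \ge \Vert A\Vert ^2\) each proximal subproblem is strictly convex, so the recorded objective values should be non-increasing; the sketch makes no claim about which critical point is reached.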

3 The Quadratic Envelope

Let \(\mathcal {V}\) be a separable Hilbert space over \(\mathbb {R}\) or \(\mathbb {C}\), such as \(\mathbb {C}^n\) or the set of matrices of a certain size equipped with the Frobenius norm. All Hilbert spaces over \(\mathbb {C}\) are also Hilbert spaces over \(\mathbb {R}\) with the scalar product \( \left\langle x,y \right\rangle _\mathbb {R}=\textsf {Re}\left\langle x,y \right\rangle \), and hence, it is no restriction to assume that \(\mathcal {V}\) is a real Hilbert space wherever needed. Even if \(\mathcal {V}\) is a Hilbert space over \(\mathbb {C}\), we will implicitly assume that the scalar product is \(\left\langle x,y \right\rangle _\mathbb {R}\).

Given any functional \(f:\mathcal {V}\rightarrow \mathbb {R}\cup \{\infty \}\) and parameter \(\gamma >0\), we introduce the “quadratic envelope” \(\mathcal {Q}_\gamma \) as the supremum of all minorants of the form \(\alpha -\frac{\gamma }{2}\Vert x-y\Vert ^2\) for \(\alpha \in \mathbb {R}\) and \(y\in \mathcal {V}\):

$$\begin{aligned} \mathcal {Q}_{\gamma }(f)(x)=\sup _{\alpha \in \mathbb {R},y\in \mathcal {V}}\left\{ \alpha -\frac{\gamma }{2}\Vert x-y\Vert ^2:~\alpha -\frac{\gamma }{2}\Vert \cdot -y\Vert ^2\le f\right\} . \end{aligned}$$
(13)

Fig. 4 Illustration of a non-convex function f (red) and its quadratic envelope \(\mathcal {Q}_2(f)\) (black). The black graph lies slightly below for illustration only

The quadratic envelope has appeared previously, e.g., in [17] under the name “proximal hull”, denoted \(h_{\gamma ^{-1}}\) (Example 1.44), but it seems that the term is not widespread (see the discussion in Sect. 7), and it seems that its connection with convex envelopes has not been noted, or at least not systematically studied. We prefer the term quadratic envelope since it is more suggestive and prefer the notation \(\mathcal {Q}_\gamma \) since it would be messy to always have to invert \(\gamma \) which in this context has a concrete meaning; the parameter \(\gamma \) basically tunes the maximum negative curvature of \(\mathcal {Q}_\gamma (f)\), as we shall see in Sect. 4 (Corollary 4.2). When \(\gamma =1\), we simply write \(\mathcal {Q}\) as opposed to \(\mathcal {Q}_\gamma \). In this section, we first provide some tools to compute \(\mathcal {Q}_\gamma \) and then prove the connection with l.s.c. convex envelopes and end with some auxiliary results.
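To make definition (13) concrete, we include a worked example that is not part of the original argument; the choice of the one-variable functional \(|x|_0\) from Sect. 2 is an illustrative assumption. For fixed \(y\in \mathbb {R}\), the parabola \(\alpha -\frac{\gamma }{2}|\cdot -y|^2\) lies below \(|\cdot |_0\) if and only if \(\alpha \le \frac{\gamma }{2}y^2\) (the constraint at the origin) and \(\alpha \le 1\) (the constraint away from the origin), so the largest admissible \(\alpha \) is \(\min (1,\frac{\gamma }{2}y^2)\). Hence

$$\begin{aligned} \mathcal {Q}_\gamma (|\cdot |_0)(x)&=\sup _{y\in \mathbb {R}}\left\{ \min \left( 1,\frac{\gamma }{2}y^2\right) -\frac{\gamma }{2}|x-y|^2\right\} \\&={\left\{ \begin{array}{ll} \sqrt{2\gamma }\,|x|-\frac{\gamma }{2}x^2, &{} |x|\le \sqrt{2/\gamma },\\ 1, &{} |x|>\sqrt{2/\gamma }, \end{array}\right. } \end{aligned}$$

where the supremum is attained at \(y=\pm \sqrt{2/\gamma }\) (with the sign of x) in the first case and at \(y=x\) in the second. Equivalently, \(\mathcal {Q}_\gamma (|\cdot |_0)(x)=1-\max (1-\sqrt{\gamma /2}\,|x|,0)^2\), which agrees with \(|x|_0\) at \(x=0\) and for \(|x|\ge \sqrt{2/\gamma }\).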

The Legendre transform (or Fenchel conjugate) is defined as \(g^*(y):=\sup _x \left\langle x,y \right\rangle -g(x).\) We remind the reader that \(g^*\) is l.s.c. convex and that \(g^{**}\) equals the l.s.c. convex envelope of g by the Fenchel–Moreau theorem (see, e.g., Proposition 13.11 and 13.39 in [18]). We now introduce the transform \(\mathcal {S}_\gamma \) defined as follows:

$$\begin{aligned} \mathcal {S}_\gamma (f)(y):= \left( f(\cdot )+\frac{\gamma }{2}\Vert \cdot \Vert ^2\right) ^*(\gamma y)-\frac{\gamma }{2}\Vert y\Vert ^2=\sup _x -f(x)-\frac{\gamma }{2}\left\| x-y\right\| ^2. \end{aligned}$$
(14)

\(\mathcal {S}_{\gamma }\) is simply the negative of the Moreau envelope computed with constant \(\gamma ^{-1}\). If we set \(q_\gamma (x,y)=-\frac{\gamma }{2}\Vert x-y\Vert ^2\) then, in the terminology of [17] Sec. 11.L, \(\mathcal {S}_\gamma (f)\) is the \(q_\gamma \)-conjugate of f and \(\mathcal {Q}_\gamma (f)\) the \(q_\gamma \)-envelope of f (reinforcing our choice of terminology “quadratic envelope” for \(\mathcal {Q}_\gamma \)). We introduce the symbol \(\mathcal {S}_{\gamma }\) mainly since we believe the notation \(-e_{\gamma ^{-1}}(f)\) or \({}^{q_{\gamma }}f\) (cf. [17]) or \(-{}^{\gamma ^{-1}}f\) (cf. [18]) would be confusing for our present purposes. Its connection to the quadratic envelope is described by the following proposition (Fig. 4):

Proposition 3.1

Let \(\gamma >0\) and let f be a \([0,\infty ]\)-valued l.s.c. functional on a separable Hilbert space \(\mathcal {V}\). We have \(\mathcal {Q}_{\gamma }=\mathcal {S}_\gamma \circ \mathcal {S}_\gamma :=\mathcal {S}_{\gamma }^2\), i.e., 

$$\begin{aligned} \mathcal {Q}_{\gamma }(f)(x)=\sup _y\left( \inf _w f(w)+\frac{\gamma }{2}\left\| w-y\right\| ^2\right) -\frac{\gamma }{2}\left\| x-y\right\| ^2 \end{aligned}$$
(15)

Proof

The argument is a replica of Example 1.44 of [17] but is included for completeness. We have \(\alpha -\frac{\gamma }{2}\Vert \cdot -y\Vert ^2\le f\) iff \(\alpha \le f+\frac{\gamma }{2}\Vert \cdot -y\Vert ^2\), so the maximal \(\alpha \) for fixed y is given by \(\alpha =-\mathcal {S}_{\gamma }(f)(y)\). Thus, \(\mathcal {Q}_\gamma (f)(x)=\sup _{y\in \mathcal {V}}-\mathcal {S}_\gamma (f)(y)-\frac{\gamma }{2}\Vert x-y\Vert ^2\) as desired. \(\square \)
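Formula (15) can be evaluated by brute force on a grid, which gives a quick numerical check of the sup-inf structure. The sketch below does this for the one-variable functional \(f=|\cdot |_0\) and compares the result with the closed form \(1-\max (1-\sqrt{\gamma /2}\,|x|,0)^2\). The grid, the window \([-3,3]\), the value \(\gamma =2\) (chosen so that the point \(\sqrt{2/\gamma }=1\) lies on the grid) and the closed form itself are assumptions made for this illustration.

```python
import numpy as np

gamma = 2.0                                   # then sqrt(2/gamma) = 1 is a grid point
grid = np.arange(-300, 301) / 100.0           # hypothetical window [-3, 3], step 0.01
f = (grid != 0).astype(float)                 # f = |.|_0 sampled on the grid

# sq_dist[i, j] = |grid_i - grid_j|^2, reused for both steps.
sq_dist = (grid[:, None] - grid[None, :]) ** 2

# Inner step of (15): -S_gamma(f)(y) = inf_w f(w) + gamma/2 |w - y|^2 (rows: w, columns: y).
neg_S = np.min(f[:, None] + 0.5 * gamma * sq_dist, axis=0)

# Outer step of (15): Q_gamma(f)(x) = sup_y -S_gamma(f)(y) - gamma/2 |x - y|^2 (rows: y, columns: x).
Q_brute = np.max(neg_S[:, None] - 0.5 * gamma * sq_dist, axis=0)

# Assumed closed form of Q_gamma(|.|_0), for comparison.
Q_closed = 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(grid), 0.0) ** 2

max_gap = np.max(np.abs(Q_brute - Q_closed))
print("largest deviation between (15) on the grid and the closed form:", max_gap)
```

The cost is quadratic in the number of grid points, so this is only meant as a sanity check in low dimension.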

The next proposition contains some basic observations about the behavior of \(\mathcal {S}_\gamma \) and \(\mathcal {Q}_\gamma \).

Proposition 3.2

Let \(\gamma >0\) and let f be a \([0,\infty ]\)-valued l.s.c. functional on a separable Hilbert space \(\mathcal {V}\). Then, \(\mathcal {S}_\gamma (f)\) takes values in \(]-\infty ,0]\) and is continuous, whereas \(\mathcal {Q}_\gamma (f)\) is lower semi-continuous, takes values in \([0,\infty ]\) and is continuous in the interior of \(\textsf {dom}(\mathcal {Q}_\gamma (f))\).

Proof

The statements about the signs follow easily from the last expression in (14), which also shows that \(\mathcal {S}_\gamma (f)\) avoids \(-\infty \). By (14), it follows that \(\mathcal {S}_\gamma (f)\) (and \(\mathcal {Q}_\gamma (f)\) by Proposition 3.1) is the difference of an l.s.c. convex functional and a quadratic term. With this in mind, the continuity statements follow by the standard properties of l.s.c. convex functionals (see, e.g., Corollary 8.30 [18]). \(\square \)

The following result is the key result of this section connecting the \(\mathcal {Q}_\gamma \)-transform with l.s.c. convex envelopes.

Theorem 3.1

Let \(\gamma >0\) and let f be a \([0,\infty ]\)-valued functional on a separable Hilbert space \(\mathcal {V}\). Then, \(\left( f+\frac{\gamma }{2}\Vert \cdot -d\Vert ^2\right) ^*(y)=\mathcal {S}_\gamma (f) \left( \frac{y}{\gamma }+d\right) +\frac{\gamma }{2}\left\| \frac{y}{\gamma }+d\right\| ^2-\frac{\gamma }{2}\Vert d\Vert ^2\) and

$$\begin{aligned} \left( f+\frac{\gamma }{2}\Vert \cdot -d\Vert ^2\right) ^{**}(x)=\mathcal {Q}_\gamma (f)(x) +\frac{\gamma }{2}\Vert x-d\Vert ^2. \end{aligned}$$

In particular, \(\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2\) is the l.s.c. convex envelope of \(f(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2\) and

$$\begin{aligned} 0\le \mathcal {Q}_\gamma (f)\le f. \end{aligned}$$

Proof

We have

$$\begin{aligned} \left( f(\cdot )+\frac{\gamma }{2}\Vert \cdot -d\Vert ^2\right) ^*(y)&=\sup _x \left\langle x,y \right\rangle -f(x)-\frac{\gamma }{2}\Vert x-d\Vert ^2\\&=\sup _x -f(x)-\frac{\gamma }{2}\left\| x-\left( \frac{y}{\gamma }+d\right) \right\| ^2\\&\quad +\frac{\gamma }{2}\left\| \frac{y}{\gamma }+d\right\| ^2-\frac{\gamma }{2} \Vert d\Vert ^2 \end{aligned}$$

from which the first identity follows. Similarly,

$$\begin{aligned} \left( f(\cdot )+\frac{\gamma }{2}\Vert \cdot -d\Vert ^2\right) ^{**}(x)&= \left( \mathcal {S}_\gamma (f)\left( \frac{\cdot }{\gamma }+d\right) +\frac{\gamma }{2} \left\| \frac{\cdot }{\gamma }+d\right\| ^2-\frac{\gamma }{2}\Vert d\Vert ^2\right) ^*(x)\\&=\sup _y \left\langle x,y \right\rangle {-}\mathcal {S}_\gamma (f)\left( \frac{y}{\gamma }{+}d\right) {-}\frac{\gamma }{2} \left\| \frac{y}{\gamma }+d\right\| ^2+\frac{\gamma }{2}\Vert d\Vert ^2\\&=\sup _y -\mathcal {S}_\gamma (f)\left( \frac{y}{\gamma }+d\right) -\frac{\gamma }{2}\left\| \frac{y}{\gamma }+d-x\right\| ^2\\&\quad +\frac{\gamma }{2}\Vert x-d\Vert ^2 =\mathcal {S}_\gamma ^2(f)(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2. \end{aligned}$$

The statement about the convex envelope follows by the Fenchel–Moreau theorem and Proposition 3.1, which also gives \(\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2\le f(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2\). This implies the latter part of the inequality \(0\le \mathcal {Q}_\gamma (f)\le f\), whereas the former has already been noticed in Proposition 3.2. \(\square \)
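Theorem 3.1 can be checked numerically in one variable by computing the convex envelope of sampled values as a lower convex hull and comparing it with \(\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}|x-d|^2\). The following sketch does so for \(f=|\cdot |_0\); the helper `lower_convex_hull`, the grid, the parameters \(\gamma =2\), \(d=0.5\) and the closed form used for \(\mathcal {Q}_\gamma (|\cdot |_0)\) are assumptions introduced for this illustration.

```python
import numpy as np

def lower_convex_hull(x, y):
    """Vertices of the lower convex hull of the points (x_i, y_i), with x increasing."""
    hull = []
    for px, py in zip(x, y):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # Non-positive cross product: hull[-1] lies on or above the chord, drop it.
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return np.array(hull)

gamma, d = 2.0, 0.5                            # hypothetical parameters
x = np.arange(-300, 301) / 100.0               # grid containing 0 and +-1 exactly
g = (x != 0).astype(float) + 0.5 * gamma * (x - d) ** 2   # f + gamma/2 |x - d|^2

# Convex envelope of the sampled functional, evaluated on the grid.
hull = lower_convex_hull(x, g)
envelope_hull = np.interp(x, hull[:, 0], hull[:, 1])

# Right-hand side of Theorem 3.1 with the assumed closed form of Q_gamma(|.|_0).
q_closed = 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2
envelope_thm = q_closed + 0.5 * gamma * (x - d) ** 2

max_gap = np.max(np.abs(envelope_hull - envelope_thm))
print("largest deviation between hull and Q + quadratic term:", max_gap)
```

The hull is computed on a bounded window only; this suffices here because the functional coincides with its envelope outside \([-1,1]\).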

We end this section with some observations about the behavior of \(\mathcal {Q}_\gamma (f)\) as a function of \(\gamma \).

Proposition 3.3

Let f be an l.s.c. \([0,\infty ]\)-valued functional. Then, \(\mathcal {Q}_\gamma (f)(x)\) is increasing as a function of \(\gamma \). Moreover,

$$\begin{aligned} \lim _{\gamma \rightarrow \infty }\mathcal {Q}_\gamma (f)(x)=f(x) \end{aligned}$$
(16)

whereas the limit as \({\gamma \searrow 0}\) equals a convex minorant of f above the l.s.c. convex envelope of f.

We remark that (16) is shown in [17], whereas nothing is said about the case \(\gamma \searrow 0\). In fact, \(\lim _{\gamma \searrow 0}\mathcal {Q}_\gamma (f)\) usually equals the l.s.c. convex envelope of f, but this is not necessarily the case in general, which is a surprise at least for the author. To see this, consider \(P=\{x\in \mathbb {R}^2:~x_1>0,~x_2=\sqrt{x_1}\}\), \(Q=\{x\in \mathbb {R}^2:~x_1>0,~0<x_2\le \sqrt{x_1}\}\cup \{0\}\) and \(f=\iota _{P}\), where \(\iota _P\) is the indicator functional of P. It is easy to see that the l.s.c. convex envelope of \(\iota _P\) equals \(\iota _{cl(Q)}\) (where cl denotes closure), whereas some thinking reveals that \(\lim _{\gamma \searrow 0}\mathcal {Q}_\gamma (f)=\iota _Q\). However, if \(\mathcal {V}\) is finite dimensional and \(\lim _{\gamma \searrow 0}\mathcal {Q}_\gamma (f)\) is everywhere finite, then it is automatically continuous (Corollary 8.30 in [18]), and hence it must equal the l.s.c. convex envelope of f.

Proof

If \(\gamma _1>\gamma _2\), then \(\mathcal {Q}_{\gamma _2}(f)(x)+\frac{\gamma _1}{2}\Vert x\Vert ^2\) equals the l.s.c. convex functional \(\mathcal {Q}_{\gamma _2}(f)(x)+\frac{\gamma _2}{2}\Vert x\Vert ^2\) plus the term \(\frac{\gamma _1-\gamma _2}{2}\Vert x\Vert ^2\) so it is l.s.c. and convex. In view of \(\mathcal {Q}_{\gamma _2}(f)\le f\), it also lies below \(f+\frac{\gamma _1}{2}\Vert x\Vert ^2\), and so, we conclude that

$$\begin{aligned} \mathcal {Q}_{\gamma _2}(f)(x)+\frac{\gamma _1}{2}\Vert x\Vert ^2\le \big (f+\frac{\gamma _1}{2}\Vert x\Vert ^2\big )^{**}= \mathcal {Q}_{\gamma _1}(f)(x)+\frac{\gamma _1}{2}\Vert x\Vert ^2. \end{aligned}$$

The first claim follows. To see (16), let \(\alpha <f(x)\) be arbitrary. Since f is l.s.c., the set \(\{y:f(y)>\alpha \}\) is open and, as \(f\ge 0\), it follows that for any \(\gamma \) large enough we have \(\alpha -\frac{\gamma }{2}\Vert \cdot -x\Vert ^2\le f\). For such \(\gamma \), we thus have \(\alpha \le \mathcal {Q}_{\gamma }(f)(x)\le f(x)\) by (13) and Theorem 3.1, so (16) follows.

Concerning the limit as \({\gamma \searrow 0}\), set \(g(x)=\lim _{\gamma \searrow 0}\mathcal {Q}_\gamma (f)(x)\), which exists by the first part of this proposition. Now note that

$$\begin{aligned} g(x)=\lim _{\gamma \searrow 0}\mathcal {Q}_\gamma (f)(x)=\lim _{\gamma \searrow 0}\mathcal {Q}_\gamma (f)(x){+}\frac{\gamma }{2}\Vert x\Vert ^2{=}\lim _{\gamma \searrow 0}\big (f+\frac{\gamma }{2}\Vert \cdot \Vert ^2\big )^{**}(x)\ge f^{**}(x). \end{aligned}$$

We also see that g is the limit of a decreasing sequence of convex functions, hence it is also convex (Proposition 8.16 [18]). Finally, \(g\le f\) by Theorem 3.1. \(\square \)
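The dependence on \(\gamma \) described in Proposition 3.3 is easy to observe for the one-variable functional \(f=|\cdot |_0\), whose l.s.c. convex envelope is identically zero. The sketch below again relies on the assumed closed form \(1-\max (1-\sqrt{\gamma /2}\,|x|,0)^2\) for \(\mathcal {Q}_\gamma (f)\); the grid and the range of parameters are hypothetical choices.

```python
import numpy as np

def q_card(x, gamma):
    """Assumed closed form of Q_gamma(|.|_0) in one variable."""
    return 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2

x = np.arange(-300, 301) / 100.0               # window [-3, 3], step 0.01
f = (x != 0).astype(float)
gammas = np.logspace(-6, 6, 25)                # increasing sequence of parameters
Q = np.array([q_card(x, g) for g in gammas])   # Q[k] = Q_{gammas[k]}(f) on the grid

# Monotonicity in gamma at every grid point.
increasing = bool(np.all(np.diff(Q, axis=0) >= -1e-15))
# Largest gamma: Q_gamma(f) should coincide with f on this grid, cf. (16).
gap_to_f = np.max(np.abs(Q[-1] - f))
# Smallest gamma: Q_gamma(f) should be close to the convex envelope f** = 0 on the window.
gap_to_zero = np.max(Q[0])

print(increasing, gap_to_f, gap_to_zero)
```

Note that the convergence as \(\gamma \rightarrow \infty \) is pointwise: on a finer grid, a larger \(\gamma \) is needed before \(\mathcal {Q}_\gamma (f)\) agrees with f at the grid points closest to the origin.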

4 Finer Properties of Convex and Quadratic Envelopes

In this section, we prove a result about the structure of l.s.c. convex envelopes which seems relatively unknown. For this, we need the concept of weak lower semi-continuity, which is nothing but semi-continuity with respect to the weak topology of the underlying separable Hilbert space \(\mathcal {V}\). We remind the reader that for convex proper functionals there is no difference (Theorem 9.1 [18]) between weakly l.s.c. functionals and the standard l.s.c. functionals. Also, if \(\mathcal {V}\) is finite dimensional and the topology is Hausdorff, the two topologies are the same so there is no difference in this case either. However, we wish to underline that the difficulty in proving the coming results is present also in the finite-dimensional setting.

We begin with a neat fact concerning weakly l.s.c. convex envelopes, which does not seem to have made its way into the modern literature on the subject. It is a reformulation of Arne Brøndsted’s extension of Milman’s theorem [15]. To state it, we remind the reader that a functional g is coercive if and only if its (lower) level sets are bounded (see, e.g., Proposition 11.11 [18]). Note that l.s.c. convex envelopes of the type \(\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\Vert x-d\Vert ^2\) (for non-negative f) are always coercive, by virtue of Proposition 3.2 and the quadratic term. A function f on \(\mathbb {R}\) is called affine if it is of the form \(f(t)=at+b\) with \(a,b\in \mathbb {R}\).

Theorem 4.1

Let g be a weakly l.s.c. functional on a separable Hilbert space \(\mathcal {V}\) such that \(g^{**}\) is coercive. Given any \(x_0\in \mathcal {V}\) such that \(g(x_0)\ne g^{**}(x_0)\) there exists a unit vector \(\nu \) and \(t_0>0\) such that the function \(h(t)= g^{**}(x_0+t\nu )\) is affine on \(]-t_0,t_0[\).

To prove Theorem 4.1, we recall some concepts from [15]. Given a convex function f, a point x is called extremal if and only if \((x,f(x))\) is extremal for the epigraph of f, denoted [f]. Equivalently, x is extremal if and only if \(x\in \textsf {dom}~f\) and f is not affine on any relatively open segment containing x. Moreover, \(f_{ext}\) denotes the functional which equals f(x) for all extremal points x and \(\infty \) elsewhere.

To illustrate, consider the convex functional \(g:\mathbb {R}^2\rightarrow ]-\infty ,\infty ]\) given by \(g(x,\pm 1)=-\sqrt{1-x^2}\) for \((x,y)\) in \([-1,1]\times \{\pm 1\}\), and which equals \(\infty \) elsewhere. Then, \(g^{**}(x,y)=-\sqrt{1-x^2}\) on \([-1,1]^2\) and \(g^{**}=\infty \) elsewhere. The extremal points then equal \([-1,1]\times \{\pm 1\}\), which should not be confused with the exposed points which are \(]-1,1[\times \{\pm 1\}\) (see, e.g., Ch. 25 of [19]). Note that on the other two edges \(\{\pm 1\}\times ]-1,1[\), the graph of \(g^{**}\) is indeed affine, but that at these points, \(g^{**}\) is not sub-differentiable. Also note that \((g^{**})_{ext}= g\) for this particular function. The result we need from [15] reads as follows.

Theorem 4.2

Let g be a weakly l.s.c. functional on a separable Hilbert space \(\mathcal {V}\) such that \(g^{**}\) is coercive, then

$$\begin{aligned}{}[(g^{**})_{ext}]\subset [g]. \end{aligned}$$

Proof

In the setting of [15], we let E be the separable Hilbert space \(\mathcal {V}\) with the weak topology. Since convex functionals are l.s.c. with respect to the weak topology if and only if they are with respect to the norm topology, it follows that the l.s.c. convex envelope of g equals the weakly l.s.c. convex envelope. In the notation of Theorem 1 of [15], we can then take \(f=g^{**}\) and the theorem states that \([f_{ext}]\subset [g_{cl}]\), where \(g_{cl}\) is the greatest l.s.c. minorant of g. Since g is assumed to be l.s.c., we have \(g=g_{cl}\), and the desired inclusion follows. It remains to check that the conditions of Theorem 1 are fulfilled, which is that “g is inf-compact in some direction” (with respect to the weak topology, referring to the terminology of [15]). For this, it suffices to check that \(g^{**}\) is inf-compact, i.e., that all level sets are compact. The level sets of \(g^{**}\) are closed and convex, and since \(g^{**}\) is assumed coercive they are also bounded. It follows that such level sets are compact in the weak topology and the proof is complete. \(\square \)

Based on this, we can now easily prove Theorem 4.1.

Proof of Theorem 4.1

Since \(g\ge g^{**}\), Theorem 4.2 clearly implies that \(g(x)=g^{**}(x)\) for all extremal points x for \(g^{**}\). Consequently, if \(g(x)= g^{**}(x)\) does not hold, then x is not extremal for \(g^{**}\), and the existence of \(\nu \) follows by the definition of an extremal point for \(g^{**}\). \(\square \)

Next, we discuss what the theorem implies about minimizers of g versus \(g^{**}\). Denote by G the set of global minimizers of g, and by \(G^{**}\) the set of global minimizers of \(g^{**}\).

Corollary 4.1

Let g be a weakly l.s.c. functional on a separable Hilbert space \(\mathcal {V}\) such that \(g^{**}\) is coercive. Then, \(G^{**}\) is a closed bounded convex set containing G. Letting \(G^{**}_{ext}\) denote the extremal points of \(G^{**}\), we also have that \(G^{**}_{ext}\subset G\). Finally, the closed convex hull of \(G^{**}_{ext}\) equals \(G^{**}\).

Proof

The convexity of \(G^{**}\) and the inclusion \(G\subset G^{**}\) are immediate. The boundedness of \(G^{**}\) follows since \(g^{**}\) is coercive. Let x be in the closure of \(G^{**}\), and let c be the value of the global minimum. Then, \(g^{**}(x)\le c\) follows by l.s.c., and the reverse inequality is obvious from the fact that c is a global minimum. It follows that \(x\in G^{**}\), and hence \(G^{**}\) is closed.

The existence of points in \(G^{**}_{ext}\), and the statement concerning the closed convex hull, are now immediate consequences of the Krein-Milman theorem and the fact that bounded closed convex sets are weakly compact in separable Hilbert spaces (Theorem 3.33, [18]). It remains to prove that \(G^{**}_{ext}\subset G\). Let \(x_0\in G^{**}_{ext}\) and suppose \(x_0\not \in G\), so that \(g(x_0)>g^{**}(x_0)\). Then, Theorem 4.1 implies the existence of a direction \(\nu \) along which \(g^{**}\) is affine near \(x_0\). Since \(x_0\) minimizes \(g^{**}\), the functional \(g^{**}\) must in fact be constant along \(\nu \) near \(x_0\). Hence a small segment around \(x_0\) in the direction \(\nu \) lies in \(G^{**}\), contradicting that \(x_0\) is an extremal point. \(\square \)

We end by noting that Theorem 4.1 implies that \(\gamma \) tunes the maximum negative curvature in the \(\mathcal {Q}_\gamma \)-transform, as discussed in the introduction.

Corollary 4.2

Let f be a weakly l.s.c. \([0,\infty ]\)-valued functional on a separable Hilbert space \(\mathcal {V}\). For each \(x_0\in \mathcal {V}\) with \(f(x_0)>\mathcal {Q}_\gamma (f)(x_0)\), there exists a unit vector \(\nu \) such that

$$\begin{aligned} \mathcal {Q}_\gamma (f)(x_0+t\nu )=a+bt-\frac{\gamma }{2}t^2 \end{aligned}$$

for t near 0 and some \(a,b\in \mathbb {R}\).

Proof

Set \(g(x)=f(x)+\frac{\gamma }{2}\Vert x\Vert ^2\). By Theorem 3.1, we have \(\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\Vert x\Vert ^2=g^{**}(x)\), by which it is immediate that \(g^{**}\) is coercive (since \(\mathcal {Q}_\gamma (f)\ge 0\) by Proposition 3.2). It also follows that \(g(x_0)>g^{**}(x_0)\), and hence, Theorem 4.1 implies that a unit vector \(\nu \) exists such that \(t\mapsto \mathcal {Q}_\gamma (f)(x_0+t\nu )\) equals an affine function minus \(\frac{\gamma }{2}\Vert x_0+t\nu \Vert ^2\) in a neighborhood of \(t=0\). \(\square \)
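As a numerical sanity check of Corollary 4.2, the following sketch evaluates \(\mathcal {Q}_\gamma (f)\) on a grid for the one-dimensional example \(f(0)=0\), \(f(x)=1\) for \(x\ne 0\), using the identity \(\mathcal {Q}_\gamma (f)+\frac{\gamma }{2}\Vert \cdot \Vert ^2=g^{**}\) from the proof above. The function names, the grid and the choice \(\gamma =2\) are illustrative assumptions, and the lower convex hull of the sampled points is used as a discrete stand-in for the l.s.c. convex envelope.

```python
import numpy as np

def lower_convex_envelope(x, g):
    """Lower convex hull of the sample points (x[i], g[i]), evaluated on x.

    x must be strictly increasing. Discrete stand-in for g** (an assumption).
    """
    hull = []  # indices of the hull vertices found so far
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (g[i] - g[a]) - (g[b] - g[a]) * (x[i] - x[a])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    return np.interp(x, x[hull], g[hull])

def quadratic_envelope(x, f, gamma):
    """Q_gamma(f) on the grid via Q_gamma(f) + gamma/2 x^2 = (f + gamma/2 x^2)**."""
    quad = 0.5 * gamma * x**2
    return lower_convex_envelope(x, f + quad) - quad

gamma = 2.0
x = np.arange(-300, 301) / 100.0      # grid on [-3, 3] that contains 0 exactly
f = np.where(x == 0.0, 0.0, 1.0)      # f(0) = 0, f(x) = 1 otherwise
Q = quadratic_envelope(x, f, gamma)

# Second differences of Q at grid points where f > Q on the whole stencil.
h = x[1] - x[0]
curvature = (Q[2:] - 2.0 * Q[1:-1] + Q[:-2]) / h**2
gap = f - Q > 1e-9
inside = gap[2:] & gap[1:-1] & gap[:-2]
print("curvature where f > Q:", curvature[inside].min(), curvature[inside].max())
```

In this example the set where \(f>\mathcal {Q}_\gamma (f)\) is \(0<|x|<1\), and the printed second differences should be close to \(-\gamma \), in line with the corollary.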

5 The Quadratic Envelope as a Regularizer

We now let \(A:\mathcal {V}\rightarrow \mathcal {W}\) be a bounded linear operator, where \(\mathcal {V},\mathcal {W}\) are possibly different (separable) Hilbert spaces, and consider functionals of the type

$$\begin{aligned} \mathscr {J}(x)=f(x)+\frac{1}{2}\left\| A x-d\right\| ^2_\mathcal {W},\quad x\in \mathcal {V}. \end{aligned}$$
(17)

Our aim is to develop strategies to deal with the general problem of minimizing (17) in the case when f is a \([0,\infty ]\)-valued functional such that \(\mathcal {Q}_\gamma (f)\) is computable, with a focus on computing (explicit) approximations of the l.s.c. convex envelope of \(\mathscr {J}\). The theory splits into two cases: either we approximate the convex envelope from below by a convex functional, or we approximate it from above with a non-convex functional having a number of desirable properties, most notably the fact that its local minimizers are also local minimizers of the original functional. More precisely, we will study the relationship between the original functional (17) and the modified functional

$$\begin{aligned} \mathscr {J}_\gamma (x)=\mathcal {Q}_\gamma (f)(x)+\frac{1}{2}\left\| A x-d\right\| ^2_\mathcal {W},\quad x\in \mathcal {V}, \end{aligned}$$
(18)

under the assumption that either \(\gamma I\preccurlyeq A^*A\) or \(\gamma I\succcurlyeq A^*A\) (cf. (11)–(12) and recall Fig. 3). Note that \(\gamma I\succcurlyeq A^*A\) if and only if \(\gamma \ge \Vert A\Vert ^2\).

5.1 Case \(A^*A\succcurlyeq \gamma I\)

Let f be a \([0,\infty ]\)-valued functional and \(A:\mathcal {V}\rightarrow \mathcal {W}\) a bounded linear operator. The main result of this section states that \(\mathscr {J}_\gamma \) is a convex minorant of the l.s.c. convex envelope \(\mathscr {J}^{**}\).

Theorem 5.1

For \(\gamma >0\) such that \(A^*A\succcurlyeq \gamma I\), \(\mathscr {J}_\gamma \) is convex and \(\mathscr {J}_\gamma \le \mathscr {J}^{**}\). Moreover, if \(A^*A\succ \gamma I\) then it is strongly convex, in which case it has a unique minimizer. Finally, a minimizer \(\hat{x}\) of \(\mathscr {J}_{\gamma }\) is a minimizer of \(\mathscr {J}\) whenever \(f(\hat{x})=\mathcal {Q}_\gamma (f)(\hat{x})\).

Proof

Upon expanding \(\left\| Ax-d\right\| ^2=\Vert Ax\Vert ^2-2\left\langle Ax,d \right\rangle +\left\| d\right\| ^2\) and noting that the latter two terms are affine linear, it is easily seen that it suffices to prove the first part of the statement for \(d=0\). By Theorem 3.1, it is clear that \(\mathscr {J}_\gamma \) is l.s.c. and that \(\mathscr {J}_\gamma \le \mathscr {J}\), and thus \(\mathscr {J}_\gamma \le \mathscr {J}^{**}\) follows immediately upon showing that \(\mathscr {J}_\gamma \) is convex. Define

$$\begin{aligned} \left\langle x,y \right\rangle _{\mathcal {U}}=\left\langle Ax,Ay \right\rangle _{\mathcal {W}}-\gamma \left\langle x,y \right\rangle _{\mathcal {V}} \end{aligned}$$

and note that this is a semi-inner product, as long as \(A^*A\succcurlyeq \gamma I\). It is also an inner product if the inequality is strict. In either case, \(\left\| x\right\| ^2_{\mathcal {U}}:=\left\langle x,x \right\rangle _{\mathcal {U}}\) is convex. It follows that

$$\begin{aligned} \mathcal {Q}_\gamma (f)(x)+\frac{1}{2}\left\| Ax\right\| ^2_{\mathcal {W}}=\Big (\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\left\| x\right\| ^2_{\mathcal {V}}\Big )+\frac{1}{2}\left\| x\right\| ^2_{\mathcal {U}}, \end{aligned}$$

which by Theorem 3.1 implies that \(\mathscr {J}_\gamma \) equals the l.s.c. convex envelope of \(f(x)+\frac{\gamma }{2}\left\| x\right\| ^2_{\mathcal {V}}\) plus the term \(\frac{1}{2}\left\| x\right\| ^2_{\mathcal {U}}\). We conclude that \(\mathscr {J}_\gamma \) is a convex functional, which is strongly convex when \(A^*A\succ \gamma I\). In the latter case, the existence of a unique minimizer follows by Corollary 11.15 in [18] (supercoercivity of \(\mathscr {J}_\gamma \) is obvious by the term \(\frac{1}{2}\left\| x\right\| ^2_{\mathcal {U}}\)). Finally, let d be fixed and let \(\hat{x}\) be a minimizer of \(\mathscr {J}_\gamma \). Suppose that \(f(\hat{x})=\mathcal {Q}_\gamma (f)(\hat{x})\), and let \(y\in \mathcal {V}\) be arbitrary. Then

$$\begin{aligned} \mathscr {J}(y)\ge \mathscr {J}_\gamma (y)\ge \mathscr {J}_\gamma (\hat{x})=\mathscr {J}(\hat{x}), \end{aligned}$$

showing that \(\hat{x}\) is a global minimizer of \(\mathscr {J}\). \(\square \)
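The statements of Theorem 5.1 can be checked numerically in a one-dimensional toy setting. The sketch below is an illustration under stated assumptions, not part of the proof: A is the scalar \(a=2\), \(\gamma =1\), \(d=3\), and \(f(0)=0\), \(f(x)=1\) otherwise. All names are hypothetical, and the lower convex hull of grid samples replaces the l.s.c. convex envelope.

```python
import numpy as np

def lower_convex_envelope(x, g):
    """Lower convex hull of the points (x[i], g[i]) on the increasing grid x."""
    hull = []
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (g[i] - g[a]) - (g[b] - g[a]) * (x[i] - x[a])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    return np.interp(x, x[hull], g[hull])

gamma, a, d = 1.0, 2.0, 3.0              # scalar "operator" with a^2 >= gamma
x = np.arange(-300, 301) / 100.0         # grid on [-3, 3]
f = np.where(x == 0.0, 0.0, 1.0)
data = 0.5 * (a * x - d) ** 2            # data fidelity term

quad = 0.5 * gamma * x**2
Q = lower_convex_envelope(x, f + quad) - quad   # Q_gamma(f), cf. Theorem 3.1
J = f + data
J_gamma = Q + data
J_env = lower_convex_envelope(x, J)             # discrete stand-in for J**

second_diff = J_gamma[2:] - 2.0 * J_gamma[1:-1] + J_gamma[:-2]
print("J_gamma convex on the grid:", bool(np.all(second_diff >= -1e-9)))
print("J_gamma <= J**:", bool(np.all(J_gamma <= J_env + 1e-9)))

i_hat = int(np.argmin(J_gamma))
print("minimizer of J_gamma:", x[i_hat])
print("f equals Q_gamma(f) there:", bool(np.isclose(f[i_hat], Q[i_hat])))
print("minimizer of J:", x[int(np.argmin(J))])
```

Here the minimizer of \(\mathscr {J}_\gamma \) lands at a point where \(f=\mathcal {Q}_\gamma (f)\), so by the last part of the theorem it is also the minimizer of \(\mathscr {J}\).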

5.2 Case \(A^*A\preccurlyeq \gamma I\)

Let f be a \([0,\infty ]\)-valued functional, and \(A:\mathcal {V}\rightarrow \mathcal {W}\) a bounded linear operator. Again, we are interested in the relationship between \(\mathscr {J}\) and \(\mathscr {J}_\gamma \), defined in (17) and (18), respectively. The main result of this section is that \(\mathscr {J}_\gamma \) does not move minima for \(\gamma \) in the stated range. We begin by noting the following inequalities, the first one being the reverse of the one proved in Theorem 5.1.

Proposition 5.1

For \(\gamma \) such that \(\Vert A\Vert ^2\le \gamma \) we have \(\mathscr {J}^{**}\le \mathscr {J}_\gamma \le \mathscr {J}.\)

Proof

The right inequality is immediate since \(\mathcal {Q}_\gamma (f)\le f\) by Theorem 3.1. As in Theorem 5.1, we moreover see that it suffices to prove the left inequality for \(d=0\). To this end, set \(h(x)=\mathscr {J}^{**}(x)-\frac{1}{2}\Vert Ax\Vert ^2\). Since \(\mathscr {J}^{**}\le f+\frac{1}{2}\left\| Ax\right\| ^2\), we have \(h\le f\), and moreover

$$\begin{aligned} h(x)+\frac{\gamma }{2}\left\| x\right\| ^2=\mathscr {J}^{**}(x)+\left( \frac{\gamma }{2} \left\| x\right\| ^2-\frac{1}{2}\left\| Ax\right\| ^2\right) . \end{aligned}$$

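The right-hand side is convex and l.s.c. (recall that \(\gamma \ge \Vert A\Vert ^2\)), and since \(h\le f\), the left-hand side is a convex l.s.c. minorant of \(f+\frac{\gamma }{2}\left\| \cdot \right\| ^2\). We conclude that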

$$\begin{aligned} h(x)+\frac{\gamma }{2}\left\| x\right\| ^2\le \big (f+\frac{\gamma }{2}\left\| \cdot \right\| ^2\big )^{**}(x)=\mathcal {Q}_\gamma (f)(x)+\frac{\gamma }{2}\left\| x\right\| ^2, \end{aligned}$$

(where the last identity follows by Theorem 3.1), which gives \(h(x)\le \mathcal {Q}_\gamma (f)(x)\). In other words, \(\mathscr {J}^{**}(x)\le \mathcal {Q}_\gamma (f)(x)+\frac{1}{2}\Vert Ax\Vert ^2,\) which is the desired inequality (for \(d=0\)). \(\square \)

We now come to the main theorem of this section, inspired by Theorems 4.5 and 4.8 in [8]. We say that x is a local minimizer of \(\mathscr {J}\) if there exists a neighborhood U of x in \(\mathcal {V}\) such that \(\mathscr {J}(y)\ge \mathscr {J}(x)\) for all \(y\in U\), and we say that x is a strict local minimizer of \(\mathscr {J}\) if the inequality is strict for \(y\ne x\).

Theorem 5.2

Suppose that \(\Vert A\Vert ^2< \gamma \). If x is a local minimizer (resp. strict local minimizer) of \(\mathscr {J}_\gamma \), then it is also a local minimizer (resp. strict local minimizer) of \(\mathscr {J}\), and \(\mathscr {J}_\gamma (x)=\mathscr {J}(x)\). In addition, the global minimizers coincide.

Proof

Let x be a local minimizer of \(\mathscr {J}_\gamma \). If \(\mathcal {Q}_\gamma (f)(x)=f(x)\) does not hold, then Corollary 4.2 implies that there exists a unit vector \(\nu \) such that

$$\begin{aligned}&\frac{\mathrm{d}^2}{\mathrm{d}t^2}\mathscr {J}_\gamma (x+t\nu )(0)\nonumber \\&\quad =\frac{\mathrm{d}^2}{\mathrm{d}t^2}\left( \mathcal {Q}_\gamma (f)(x+t\nu )+\frac{1}{2}\left\| A(x+t\nu )-d\right\| ^2_{\mathcal {W}}\right) (0)=\Vert A\nu \Vert ^2-\gamma <0.\nonumber \\ \end{aligned}$$
(19)

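This contradicts the assumption that x is a local minimizer of \(\mathscr {J}_\gamma \), since \(t\mapsto \mathscr {J}_\gamma (x+t\nu )\) would be strictly concave near \(t=0\). We thus conclude that \(\mathcal {Q}_\gamma (f)(x)=f(x)\) holds, which immediately gives that \(\mathscr {J}_\gamma (x)=\mathscr {J}(x)\). In view of Proposition 5.1, it follows that x is a local minimizer also for \(\mathscr {J}\). The same argument applies to strict local minimizers.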

We now prove that the global minimizers coincide. Note that global minimizers of \(\mathscr {J}\) are global minimizers of \(\mathscr {J}_\gamma \), in view of Proposition 5.1 and the fact that \(\mathscr {J}(x)=\mathscr {J}^{**}(x)\) for all global minimizers x. From this, we also see that the global minima of \(\mathscr {J}\) and \(\mathscr {J}_\gamma \) coincide; let us denote this value by c. Conversely, suppose that x is a global minimizer of \(\mathscr {J}_\gamma \) (i.e., \(\mathscr {J}_\gamma (x)=c\)). Then, it is a local minimizer of \(\mathscr {J}\) with \(\mathscr {J}(x)=c\) by the first part, and it is automatically global for \(\mathscr {J}\), since we otherwise would have \(\mathscr {J}(y)<c\) for some other point y. The proof is complete. \(\square \)
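The following sketch illustrates Proposition 5.1 and Theorem 5.2 in a one-dimensional toy setting with the assumed values \(a=1\), \(\gamma =2\), \(d=0.5\) and \(f(0)=0\), \(f(x)=1\) otherwise. Names and grid are hypothetical, the convex envelopes are replaced by lower convex hulls of grid samples, and "local minimizer" is interpreted on the grid as a value strictly below both neighbours.

```python
import numpy as np

def lower_convex_envelope(x, g):
    """Lower convex hull of the points (x[i], g[i]) on the increasing grid x."""
    hull = []
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (g[i] - g[a]) - (g[b] - g[a]) * (x[i] - x[a])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    return np.interp(x, x[hull], g[hull])

def strict_local_minimizers(values):
    """Interior grid indices whose value is strictly below both neighbours."""
    inner = (values[1:-1] < values[:-2]) & (values[1:-1] < values[2:])
    return set((np.flatnonzero(inner) + 1).tolist())

gamma, a, d = 2.0, 1.0, 0.5              # scalar "operator" with a^2 < gamma
x = np.arange(-300, 301) / 100.0         # grid on [-3, 3]
f = np.where(x == 0.0, 0.0, 1.0)
data = 0.5 * (a * x - d) ** 2

quad = 0.5 * gamma * x**2
Q = lower_convex_envelope(x, f + quad) - quad   # Q_gamma(f), cf. Theorem 3.1
J, J_gamma = f + data, Q + data
J_env = lower_convex_envelope(x, J)             # discrete stand-in for J**

print("J** <= J_gamma <= J:",
      bool(np.all(J_env <= J_gamma + 1e-9) and np.all(J_gamma <= J + 1e-9)))

loc_J = strict_local_minimizers(J)
loc_Jg = strict_local_minimizers(J_gamma)
print("local minimizers of J:      ", sorted(x[i] for i in loc_J))
print("local minimizers of J_gamma:", sorted(x[i] for i in loc_Jg))
print("global minimizers agree:", int(np.argmin(J)) == int(np.argmin(J_gamma)))
```

In this example \(\mathscr {J}\) has the two local minimizers 0 and d, whereas \(\mathscr {J}_\gamma \) keeps only the global one, which shows that the inclusion of local minimizers in Theorem 5.2 can be strict.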

The situation when \(\gamma =\Vert A\Vert ^2\) is a bit more involved, so we content ourselves with the following statement concerning the global minimizers.

Theorem 5.3

Set \(\gamma =\Vert A\Vert ^2\), let G be the global minimizers of \(\mathscr {J}\), and \(G_\gamma \) the global minimizers of \(\mathscr {J}_\gamma \). Then, \(G\subset G_\gamma \), and each connected component of \(G_\gamma \) contains points of G.

Proof

The statement \(G\subset G_\gamma \) follows as in the above proof, as well as the fact that the global minima of \(\mathscr {J}\) and \(\mathscr {J}_\gamma \) coincide; we denote the common value by c. If \(x\in G_\gamma \) and \(\mathscr {J}(x)> c\), then it follows by (19) that there exists a unit vector \(\nu \) such that \(\frac{\mathrm{d}^2}{\mathrm{d}t^2}\mathscr {J}_\gamma (x+t\nu )\le 0\) in a neighborhood of \(t=0\). Strict inequality contradicts the assumption that x is a global minimizer, so we deduce that \(\gamma \Vert \nu \Vert ^2=\Vert A\nu \Vert ^2\). Introducing the semi-norm \(\Vert x\Vert _{\mathcal {U}}^2=\gamma \Vert x\Vert _\mathcal {V}^2-\Vert Ax\Vert _\mathcal {W}^2\), this means that \(\Vert \nu \Vert _{\mathcal {U}}=0\), i.e., that \(\nu \) lies in the kernel of the semi-norm \(\Vert \cdot \Vert _{\mathcal {U}}\) (which is a linear subspace by the homogeneity and the triangle inequality of the semi-norm). Let P be the affine subspace \(P=x+\ker \Vert \cdot \Vert _{\mathcal {U}}\), and set \(S=P\cap G_\gamma .\) For \(y\in \ker \Vert \cdot \Vert _{\mathcal {U}}\), we have

$$\begin{aligned}&\mathscr {J}_\gamma (x+y)=\left( \mathcal {Q}_{\gamma }(f)(x+y)+\frac{\gamma }{2}\left\| x+y\right\| ^2_{\mathcal {V}}\right) \nonumber \\&\quad -\frac{1}{2}\left\| x\right\| ^2_{\mathcal {U}}- \left\langle A(x+y),d \right\rangle _\mathcal {W}+\frac{1}{2}\Vert d\Vert ^2_\mathcal {W}, \end{aligned}$$
(20)

so Theorem 3.1 implies that \(\mathscr {J}_\gamma \) is convex on P. In particular, S is convex. Since \(\mathscr {J}_\gamma \) is l.s.c., S is also closed. Moreover, S is bounded due to the quadratic term \(\left\| x+y\right\| ^2_{\mathcal {V}}\) in (20). S is therefore weakly compact, and hence it equals the closed convex hull of its extremal points by the Krein-Milman theorem. If x now is one of these extremal points, then we can argue as in the beginning of this proof and conclude that \(\mathscr {J}_\gamma (x)=\mathscr {J}(x)\), since the existence of a \(\nu \) with the properties stated initially would contradict that x is an extremal point of S. \(\square \)
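A small one-dimensional example shows that \(G_\gamma \) can indeed be strictly larger than G when \(\gamma =\Vert A\Vert ^2\). The sketch below assumes \(a=1\), \(\gamma =1\), \(d=2\) and \(f(0)=0\), \(f(x)=2\) otherwise; these values are chosen for illustration only, the names are hypothetical, and minimizers are detected on a grid up to a small tolerance.

```python
import numpy as np

def lower_convex_envelope(x, g):
    """Lower convex hull of the points (x[i], g[i]) on the increasing grid x."""
    hull = []
    for i in range(len(x)):
        # Drop the last vertex while it lies on or above the chord to point i.
        while len(hull) >= 2:
            a, b = hull[-2], hull[-1]
            cross = (x[b] - x[a]) * (g[i] - g[a]) - (g[b] - g[a]) * (x[i] - x[a])
            if cross > 0:
                break
            hull.pop()
        hull.append(i)
    return np.interp(x, x[hull], g[hull])

gamma, a, d = 1.0, 1.0, 2.0              # borderline case gamma = a^2
x = np.arange(-400, 401) / 100.0         # grid on [-4, 4]
f = np.where(x == 0.0, 0.0, 2.0)
data = 0.5 * (a * x - d) ** 2

quad = 0.5 * gamma * x**2
Q = lower_convex_envelope(x, f + quad) - quad   # Q_gamma(f), cf. Theorem 3.1
J, J_gamma = f + data, Q + data

tol = 1e-9
in_G = J <= J.min() + tol                       # global minimizers of J
in_G_gamma = J_gamma <= J_gamma.min() + tol     # global minimizers of J_gamma

print("G       =", x[in_G])
print("G_gamma = grid points in [%.2f, %.2f]" % (x[in_G_gamma].min(), x[in_G_gamma].max()))
print("G is contained in G_gamma:", bool(np.all(in_G_gamma[in_G])))
```

In this example \(\mathscr {J}_\gamma \) is constant on an interval whose two end points are exactly the global minimizers of \(\mathscr {J}\), which matches the extremal point argument in the proof.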

6 The Quadratic Envelope and Semi-algebraicity

We briefly treat semi-algebraicity of \(\mathcal {Q}_\gamma (f)\), since it was shown in [16] that this is connected with the convergence of the forward–backward splitting method, applied to functionals of the type (18). We remind the reader that a function on a finite-dimensional space is semi-algebraic if its graph is a semi-algebraic set [20].

Theorem 6.1

If \(\mathcal {V}\) is finite dimensional and f is semi-algebraic then so are \(\mathcal {S}_\gamma (f)\) and \(\mathcal {Q}_\gamma (f)\).

Proof

We assume for simplicity that \(\gamma =1\). It is a consequence of the Tarski-Seidenberg theorem that the set of semi-algebraic functions is closed under addition (see, e.g., Prop. 2.2.6 in [20]), and similarly, one can prove that the epigraph of a semi-algebraic function is a semi-algebraic set. If f is semi-algebraic on \(\mathbb {R}^n\) it follows that \(g(x,y)=\left\langle x,y \right\rangle -(f(x)+\frac{1}{2}\left\| x\right\| ^2)\) is semi-algebraic on \(\mathbb {R}^{2n}\), and by the argument following Theorem 2.2 in [16], it follows that the Legendre transform of \(f+\frac{1}{2}\left\| x\right\| ^2\) is semi-algebraic. The first result now follows since this function minus \(\frac{\gamma }{2}\left\| y\right\| ^2\) equals \(\mathcal {S}_\gamma (f)(y)\) by (14), and the second is immediate by Proposition 3.1. \(\square \)

7 Related Works

The operations \(\mathcal {S}_{\gamma }(f)\) and \(\mathcal {Q}_\gamma (f)\) were introduced around 1970 in greater generality by Moreau [21] and (seemingly independently) Weiss [22] and were further studied around 1990 by Poliquin [23] with a focus on smoothness properties. Variations of Propositions 3.1 and 3.2 date back to these early articles, and are also found, e.g., in Rockafellar–Wets [17] Section 11.L. The transforms \(\mathcal {S}_\gamma \) and \(\mathcal {Q}_{\gamma }\) go under names like “\(\Phi \)-conjugate”/“proximal transform” and “\(\Phi \)-biconjugate”/“\(\Phi \)-convex envelope,” and arise by the concrete choice \(\Phi (x,y)=q_{\gamma }(x,y)=-\frac{\gamma }{2}\Vert x-y\Vert ^2\). Following Rockafellar–Wets [17], \(\mathcal {Q}_\gamma (f)\) should be called “proximal hull” or “\(q_\gamma \)-envelope”. We believe that the “quadratic envelope”, which is closer to the latter, is more suggestive. Functions that satisfy \(\mathcal {Q}_\gamma (f)=f\) have been called, e.g., \(\gamma ^{-1}\)-proximal or quadratically convex.

However they are called, it seems that the connection with convex envelopes à la Theorem 3.1 has not been investigated, which is the main novelty of this publication along with the structural result Corollary 4.2 and its applications to regularization in Sect. 5. Apart from the already mentioned works by Aubert, Blanc-Féraud, Soubies and Larsson, Olsson, we have not found any similar result in the literature. The fairly recent survey paper [24] is about the closely related concept of computing Fenchel conjugates and also mentions proximal hulls, yet it has no overlap with the present paper. It primarily deals with numeric computation of convex envelopes in cases when symbolic formulas are not available, and as such it is an interesting alternative to the methods developed here. The same goes for the papers [25, 26]. The importance of computing convex envelopes is stressed in [27], where techniques for computing convex envelopes of so-called convex polyhedral functions are developed. Convex approximations from below are considered in [28], which should be compared with the results in Sect. 5.1. An alternative to approximating the convex envelope is to numerically try to compute the proximal operator of the original functional directly, which is pursued in [29]. The papers [30, 31] deal with Lasry–Lions approximants in Hilbert space, but do not make the connection with the convex envelopes. For parameters \(s<t\), the Lasry–Lions approximation of f [32] is defined by

$$\begin{aligned} \begin{aligned} \mathcal {S}_{1/s}\mathcal {S}_{1/t}(f)(x)&=-\left( \inf _y-\left( \inf _w f(w)+\frac{1}{2t}\left\| w-y\right\| ^2\right) +\frac{1}{2s}\left\| x-y\right\| ^2\right) \\&=\sup _y\left( \inf _w f(w)+\frac{1}{2t}\left\| w-y\right\| ^2\right) -\frac{1}{2s}\left\| x-y\right\| ^2, \end{aligned} \end{aligned}$$
(21)

which for \(s=t\) gives \(\mathcal {Q}_{s^{-1}}\). This regularization is also studied in Sect. 6 of the more recent publication [31] (with the notation C(1)f), mainly with a focus on differentiability results. It is also closely connected to the more general “proximal average,” see, e.g., [33, 34] and regularization by “self-dual smoothing” [35]. However, these techniques have been used mainly for modification of convex functions f, whereas \(\mathcal {Q}_\gamma (f)=f\) for any l.s.c. convex function.
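The double transform in (21) can be evaluated by brute force on a grid, which gives a quick check of the remark that \(s=t\) reproduces \(\mathcal {Q}_{s^{-1}}\). The sketch below is a one-dimensional illustration with hypothetical names `S` and `lasry_lions`. The comparison formula \(1-\max (1-|x|,0)^2\) for \(\mathcal {Q}_2(f)\), with \(f(0)=0\) and \(f(x)=1\) otherwise, is a hand computation for this particular example and is not taken from the text.

```python
import numpy as np

def S(x, h, gamma):
    """S_gamma(h)(y) = sup_w ( -h(w) - gamma/2 (w - y)^2 ), with w and y on the grid x."""
    # Rows index w, columns index y; the maximum runs over w.
    return np.max(-h[:, None] - 0.5 * gamma * (x[:, None] - x[None, :]) ** 2, axis=0)

def lasry_lions(x, f, s, t):
    """S_{1/s} S_{1/t} (f) on the grid, cf. (21)."""
    return S(x, S(x, f, 1.0 / t), 1.0 / s)

x = np.arange(-300, 301) / 100.0      # grid on [-3, 3]
f = np.where(x == 0.0, 0.0, 1.0)

# s = t = 1/2 corresponds to the quadratic envelope with gamma = 2.
Q_ll = lasry_lions(x, f, 0.5, 0.5)
closed_form = 1.0 - np.maximum(1.0 - np.abs(x), 0.0) ** 2   # hand-derived, see lead-in
print("max deviation from the closed form:", np.max(np.abs(Q_ll - closed_form)))

# For s < t the approximation still lies below f.
LL = lasry_lions(x, f, 0.25, 0.5)
print("Lasry-Lions approximation <= f:", bool(np.all(LL <= f + 1e-12)))
```

The brute-force evaluation costs a quadratic number of operations in the grid size, so it is only meant for small one-dimensional experiments.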

8 Conclusions

We have provided theory for computing l.s.c. convex envelopes of certain functionals and shown a connection with quadratic envelopes (a.k.a. proximal hulls), which was then used to regularize more intricate problems. We showed that, for sufficiently small values of the parameter \(\gamma \), this yields convex functionals below the original functional, which coincide with the original functional on a large part of the underlying Hilbert space. For \(\gamma \) sufficiently large, on the other hand, we lose convexity, but gain the desirable feature that the modified functional has the same global minimizers as the original one and fewer local ones. This in turn was based on results regarding the structure of l.s.c. convex envelopes. The results are inspired by prior work from Carl Olsson and Viktor Larsson as well as Emmanuel Soubies, Laure Blanc-Féraud and Gilles Aubert.

Particular cases of these ideas have already been applied to compressed sensing [6], computer vision [36], signal processing and frequency estimation [37, 38], as well as structured low-rank approximation [39]. Currently, we are working on more concrete results regarding low-rank approximation, total variation denoising, as well as an application to the classical phase retrieval problem. We hope that other researchers will try these methods on their problems and find that the method is a valuable tool. To aid with this task, an expanded version of this article is available on arXiv [12] with many more examples and useful details.