On Convex Envelopes and Regularization of Non-Convex Functionals without moving Global Minima

We provide theory for the computation of convex envelopes of non-convex functionals including an $\ell^2$-term, and use these to suggest a method for regularizing a more general set of problems. The applications are particularly aimed at compressed sensing and low rank recovery problems, but the theory relies on results which could potentially be useful also for other types of non-convex problems. For optimization problems where the $\ell^2$-term contains a singular matrix we prove that the regularizations never move the global minima. This result in turn relies on a theorem concerning the structure of convex envelopes which is interesting in its own right. It says that at any point where the convex envelope does not touch the non-convex functional we necessarily have a direction in which the convex envelope is affine.


Introduction
This article is a compressed and improved version of [13], which contains more information and potentially more errors. The present work is the extension of a chain of ideas with its roots in compressed sensing. $\ell^1$-$\ell^2$-minimization tricks have a long history and got renewed attention with the work of Donoho, Candès and Tao among others [15,16,12]. In the same spirit the nuclear norm minimization strategy was investigated by Fazel and coworkers [17,28], and in both cases it was shown that these methods yield perfect reconstructions in the case of no noise. However, in realistic scenarios these results often do not apply, and moreover there is of course noise, in which case the methods come with a (sometimes severe) bias. Moreover they are slow, since one needs to find an appropriate value of the involved penalty parameters.
Due to such issues there is a wealth of non-convex variations to replace the $\ell^1$/nuclear norm in the area of compressed sensing; we refer to [14] for a survey. Two fairly recent contributions in this vein are the work by Carl Olsson and coworkers [20] as well as by Gilles Aubert and coworkers [30]. The former paper deals with non-convex matrix minimization problems with subspace constraints, the latter with sparse reconstructions, and in particular the latter shows that the concrete regularizer considered there has the desirable property of not moving global minima. In this paper we find a unifying framework and show that all these penalties are particular cases of the so called "proximal hull" or "quadratic envelope". We systematically study this as a regularizer and in particular we lift the result of Aubert et al. to a general context. In order to do so we provide new results on the structure of lower semi-continuous (abbreviated l.s.c.) convex envelopes which are interesting in their own right. More precisely, we show that whenever an l.s.c. convex envelope is not in touch with the function that generates it, then it necessarily has a direction in which it is affine linear.

Outline and motivation
We develop methods to compute the lower semi-continuous convex envelope of functionals of the form
$$f(x)+\frac{1}{2}\|x-d\|_2^2 \qquad (1)$$
and show that this is of the form $Q(f)(x)+\frac{1}{2}\|x-d\|_2^2$, where $Q(f)$ is the proximal hull or "quadratic envelope", as we shall call it. Here $x$ can be in any separable Hilbert space, but $f$ needs to be such that the global minimization of (1) is computable. The practical applications of $Q(f)$ pertain to optimization of (1) with additional constraints, as well as unconstrained optimization of
$$f(x)+\frac{1}{2}\|Ax-d\|_2^2 \qquad (2)$$
where $A$ is a linear operator.
To introduce the main ideas behind this work we consider two concrete problems. A multitude of applications can be posed mathematically as finding the lowest rank matrix $X$ satisfying some equation $A(X)=d$, where $A$ is a linear operator and $d$ is a measurement (see e.g. [28,32]). Usually the measurement $d$ is not perfect, so in practice one wishes to find the minimum rank given some accepted error, $\|A(X)-d\|\le\rho$. The dual formulation of this problem is
$$\arg\min_X\ \lambda\,\mathrm{rank}(X)+\frac{1}{2}\|A(X)-d\|^2 \qquad (3)$$
where $\lambda$ is a parameter. However, the functional $\mathrm{rank}(X)$ is non-convex and highly discontinuous, so the problem cannot be solved as stated (in general). It can be solved for the case $A=I$, but the problem is still hard when combined with additional priors; see e.g. Section 1.1 in [20] for an overview and applications in signal processing and imaging.
Due to the problematic nature of $\mathrm{rank}(X)$ it has become popular to replace $\mathrm{rank}(X)$ with the nuclear norm of $X$. However, $\mathrm{rank}(X)$ and the nuclear norm are quite far apart and the method leads to a bias in the solution, which led the authors of [20] to suggest working instead with the convex envelope of $\mathrm{rank}(X)+\frac{1}{2}\|X-D\|_F^2$, for which they obtained an explicit expression. They also provided the convex envelope when $\mathrm{rank}(X)$ is replaced by the indicator functional of the set $\{X:\mathrm{rank}(X)\le K\}$, in order to treat problems where a matrix of a fixed rank is sought, and this convex envelope was further studied in [1].
Independently, convex envelopes were used in [30] to suggest a regularizer for functionals of the type
$$\|x\|_0+\frac{1}{2}\|Ax-d\|_2^2 \qquad (4)$$
which is usually dealt with by replacing $\|x\|_0$ by $\lambda\|x\|_1$. The main contribution of their work is to show that their regularizer does not move global minima. A common misconception is that the same holds for $\ell^1$-methods, which is true only if there is no noise [11]. In the presence of noise the estimates for $\ell^1$-methods are rather poor, and [30] is the first framework which allows for regularization without moving minima in a more realistic scenario. This paper presents a unified approach to this circle of ideas by connecting them with the "quadratic envelope" $Q(f)$. We also extend the findings of [30] to any problem of the form (2) as long as $Q(f)$ is computable. An expanded version of this article is found in [13], which contains a long list of instances where $Q(f)$ is computable.
In particular $Q(f)$ is computable for $\iota_K$, the indicator functional of $\{x\in\mathbb{C}^n:\|x\|_0\le K\}$. As a proof of concept, we compare the performance of (4) with $\|x\|_0$ replaced by $\lambda\|x\|_1$, by $Q(\mathrm{card})$ and by $Q(\iota_K)$ (where $\mathrm{card}(x)=\|x\|_0$). We use a $100\times 200$ matrix $A$ and minimize the regularized version of (4) for $d$ of the form $Ax_0+\epsilon$, where $x_0$ has cardinality 10 and $\epsilon$ takes on various levels of noise. As noted in [10], the best one can hope for is then to recover the so called "oracle solution" $x_S$ (obtained if an oracle a priori revealed the correct support). As Figure 1 shows, both $Q(\mathrm{card})$ and $Q(\iota_{10})$ outperform $\ell^1$ and find the oracle solution for fairly large levels of noise. Also $Q(\iota_{10})$ beats $Q(\mathrm{card})$, which is no surprise since it has additional information about the problem built into it, and this demonstrates the versatility of the new $Q$-transform. The article [14] contains much more information about this particular case.
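The notion of an oracle solution can be made concrete with a minimal sketch. The sizes (a 100 × 200 matrix, cardinality 10) follow the text; the Gaussian matrix, the noise level `sigma`, the way the nonzero entries are drawn and the random seed are assumptions made only for this illustration and are not taken from the experiment behind Figure 1.

```python
import numpy as np

rng = np.random.default_rng(0)            # fixed seed (assumption)
m, n, k = 100, 200, 10                    # sizes taken from the text
sigma = 0.1                               # hypothetical noise level

A = rng.standard_normal((m, n)) / np.sqrt(m)     # assumed Gaussian measurement matrix
support = rng.choice(n, size=k, replace=False)   # true support S
x0 = np.zeros(n)
x0[support] = rng.standard_normal(k) + np.sign(rng.standard_normal(k))
d = A @ x0 + sigma * rng.standard_normal(m)      # d = A x0 + noise

# Oracle solution: least squares restricted to the columns in the true support.
coef, *_ = np.linalg.lstsq(A[:, support], d, rcond=None)
x_oracle = np.zeros(n)
x_oracle[support] = coef

res_oracle = np.linalg.norm(A @ x_oracle - d)    # residual of the oracle solution
res_truth = np.linalg.norm(A @ x0 - d)           # residual of the noise-free signal
print("oracle residual:", res_oracle, "residual of x0:", res_truth)
```

Since the oracle solution minimizes the residual over all vectors supported in $S$, its residual can never exceed that of $x_0$ itself, which is why it is the natural benchmark in the presence of noise.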
We now outline the main contributions of this paper in greater detail. Consider any functional of the form
$$f(x)+\frac{\gamma}{2}\|x-d\|^2 \qquad (5)$$
where $\gamma>0$ is a parameter, $V$ is an arbitrary separable Hilbert space and $f$ a non-negative functional on $V$. In Section 3 we introduce the transform $Q_\gamma$ and show that the l.s.c. convex envelope of the functional in (5) is
$$Q_\gamma(f)(x)+\frac{\gamma}{2}\|x-d\|^2. \qquad (6)$$
In order for $Q_\gamma(f)$ to be computable, the global minimization of (5) needs to be solvable, and hence the problem of minimizing (5) in itself is not an instance where the $Q_\gamma$-transform is useful. However, it is useful for finding global minimizers of (5) in combination with additional prior restrictions. To illustrate, consider the problem
$$\arg\min_{x\in H}\ f(x)+\frac{1}{2}\|x-d\|^2 \qquad (7)$$
where $H$ is a closed convex subset of $V$, and suppose we are unable to find a closed form solution. Upon replacing (7) with
$$\arg\min_{x\in H}\ Q_\gamma(f)(x)+\frac{1}{2}\|x-d\|^2 \qquad (8)$$
for some fixed $\gamma\le 1$, we obtain a convex problem which can be solved. However, even for $\gamma=1$ it is possible that (7) and (8) have different solutions, despite the functional in (8) being the l.s.c. convex envelope of the one in (7). The rationale behind replacing (7) by (8) is pragmatic: since the latter is convex, the solution may be found using convex optimization routines. This may seem ad hoc, but we remind the reader that replacing e.g. $\|x\|_0$ by $\|x\|_1$ or $\mathrm{rank}(X)$ by the nuclear norm $\|X\|_1$ has had a substantial impact, and that for these concrete cases the modification $Q_\gamma(f)$ is much closer to the original functional $f$ (which leads to a better performance as an estimator, see the numerical sections of [14,20]). A reason for this is that $Q_\gamma(f)$ has the desirable feature that $Q_\gamma(f)(x)=f(x)$ often holds, and since (8) is a convex problem below the original problem (7), it is easy to see that a minimizer $\hat x$ of (8) is the solution to (7) whenever $Q_\gamma(f)(\hat x)=f(\hat x)$. This is highlighted in Figure 2, where the two problems have the same solution. More information and examples on this type of problem are found in Part II of [13].
In Section 5 we consider regularization of functionals like (3) and (4), or more generally
$$f(x)+\frac{1}{2}\|Ax-d\|^2 \qquad (9)$$
for arbitrary non-negative $f$, where $A:V\to W$ is a linear operator between separable Hilbert spaces. We assume that $f$ is such that $Q_\gamma(f)$ is computable and that the convex envelope of (9) is intractable. We propose to use as regularizer the function $Q_\gamma(f)$, i.e. we will study the relationship between minimizers of (9) and those of
$$Q_\gamma(f)(x)+\frac{1}{2}\|Ax-d\|^2. \qquad (10)$$
Since it often holds that $Q_\gamma(f)(x)=f(x)$, we again see that a global minimizer of (10) for which this is the case must also be a global minimizer of (9), in view of the inequality $Q_\gamma(f)\le f$ (shown in Section 3). The parameter $\gamma$ now becomes a useful tool as it tunes the curvature of $Q_\gamma(f)$, and we pause to illustrate its role by considering a toy problem in one variable; see Figure 3. We let $|x|_0$ be the function equalling 1 on $\mathbb{R}\setminus\{0\}$ and zero at $x=0$. In red we see the functional $|x|_0+\frac{1}{2}|x-1|^2$ (which is a particular case of both (3) and (4) in dimension 1; the matrix $A$ is here the number 1), in blue its convex envelope and in pink the $\ell^1$ convex relaxation $|x|+\frac{1}{2}|x-1|^2$. Clearly the global minima of the red and blue graphs coincide, but the global minimum of the $\ell^1$-relaxation is different. For (10) we have two options, either $|A|^2>\gamma$ or $|A|^2<\gamma$. The regularizer (10) is illustrated in black for these two cases in Figure 3. The circles represent global minima of the respective functions. In the case $|A|^2>\gamma$ we see that (10) is a convex minorant of (9) whose global minimum (for this choice of parameters) is equal to that of (9). In the case $|A|^2<\gamma$, (10) is no longer convex but the local minima of (10) are also minima of (9), and (10) has fewer local minima. In particular the global minima coincide. The main point of the paper is loosely that the general behavior is the same. In Section 5.1 we generalize the situation in Figure 3 (left) and assume that $\gamma$ satisfies
$$A^*A\succ\gamma I,\quad\text{i.e. that}\quad \|Ax\|^2>\gamma\|x\|^2\ \text{for } x\neq 0. \qquad (11)$$
For such a choice of $\gamma$ we prove that the functional (10) is a convex functional below (9), and hence minimization of (10) will produce a minimizer which, although not necessarily equal to the minimizer of the original problem, is potentially closer than that obtained by other convex relaxation methods.
For the problem (4), $A$ is usually a matrix with a large kernel, which rules out the above approach. In Section 5.2 we consider the opposite case
$$A^*A\prec\gamma I \qquad (12)$$
generalizing the situation in the right picture of Figure 3. We can then show that (10) is a continuous (but not convex) functional with the following desirable properties: i) (10) lies between (9) and its l.s.c. convex envelope, ii) any local minimizer of (10) is a local minimizer of (9), iii) the global minimizers of (10) and (9) coincide.
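The two regimes can be explored numerically in one variable. The sketch below is not the computation behind Figure 3: it assumes the closed form $Q_\gamma(|\cdot|_0)(x)=1-\max(1-\sqrt{\gamma/2}\,|x|,0)^2$ (which follows from a short direct calculation of the quadratic envelope of $|x|_0$), the scalar $A=1$, and the data value $d=2$ chosen here purely for illustration. The helper names `q_card` and `argmin_on_grid` are hypothetical.

```python
import numpy as np

def q_card(x, gamma):
    """Assumed closed form of Q_gamma(|.|_0) in one variable, applied entrywise."""
    return 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2

a, d = 1.0, 2.0                          # scalar "matrix" A and data (d = 2 is an assumption)
x = np.linspace(-1.0, 4.0, 5001)         # grid containing 0 and 2 up to rounding
card = (np.abs(x) > 1e-9).astype(float)  # |x|_0 on the grid
fit = 0.5 * (a * x - d) ** 2

J = card + fit                           # original functional, as in (9)
J_l1 = np.abs(x) + fit                   # l1 relaxation
J_small = q_card(x, 0.5) + fit           # (10) with gamma = 0.5 < |A|^2
J_large = q_card(x, 2.0) + fit           # (10) with gamma = 2.0 > |A|^2

def argmin_on_grid(values):
    """Grid point at which the sampled functional is smallest."""
    return x[np.argmin(values)]

print("argmin J        :", argmin_on_grid(J))
print("argmin l1       :", argmin_on_grid(J_l1))
print("argmin gamma=0.5:", argmin_on_grid(J_small))
print("argmin gamma=2.0:", argmin_on_grid(J_large))
# Second differences: nonnegative for a convex function sampled on a uniform grid.
print("min 2nd difference, gamma=0.5:", np.diff(J_small, 2).min())
print("min 2nd difference, gamma=2.0:", np.diff(J_large, 2).min())
```

With these assumed parameters both choices of $\gamma$ keep the minimizer of the original functional, the small $\gamma$ gives a convex functional, the large $\gamma$ does not, and the $\ell^1$ relaxation moves the minimizer.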
These findings in turn rely on general results about l.s.c. convex envelopes which we provide in Section 4. The computation of the l.s.c. convex envelope of $f(x)+\frac{\gamma}{2}\|x\|^2$ can be thought of as stretching a plastic foil from below onto the graph of $f(x)+\frac{\gamma}{2}\|x\|^2$ (see Figure 2). Consider a point $x$ where the plastic foil is not in contact with the graph, i.e. where $Q_\gamma(f)(x)<f(x)$. It is intuitively obvious that the plastic foil, i.e. the graph of $Q_\gamma(f)(x)+\frac{\gamma}{2}\|x\|^2$, has some direction in which it is affine linear, and thus $Q_\gamma(f)$ should have some direction in which the curvature is $-\gamma$. This is surprisingly difficult to show, and despite the wealth of results on l.s.c. convex envelopes it is not found in any standard reference on the topic. The statement is shown in the PhD thesis [22] for the finite dimensional case. Here we provide a proof based on an extension of Milman's theorem due to Arne Brøndsted [9], published in a short note from 1966.
The final Section 6 is more practical in nature. Critical points of (10) can be found using the forward-backward splitting method (FBS), given that $Q_\gamma(f)$ is "semi-algebraic", as was shown in [3]. To simplify verification of when $Q_\gamma(f)$ is semi-algebraic we show in Section 6 that this is true as long as $f$ itself is semi-algebraic. Further tools to compute $Q_\gamma(f)$ as well as related proximal operators are found in [13].

The quadratic envelope
Let $V$ be a separable Hilbert space over $\mathbb{R}$ or $\mathbb{C}$, such as $\mathbb{C}^n$ with the canonical norm $\|x\|_2^2=\sum_{j=1}^n|x_j|^2$ or $\mathbb{M}_{m,n}$, equipped with the Frobenius norm which we denote $\|X\|_F$. All Hilbert spaces over $\mathbb{C}$ are also Hilbert spaces over $\mathbb{R}$ with the scalar product $\langle x,y\rangle_{\mathbb{R}}=\mathrm{Re}\,\langle x,y\rangle$, and hence it is no restriction to assume that $V$ is a real Hilbert space wherever needed. Even if $V$ is a Hilbert space over $\mathbb{C}$ we will implicitly assume that the scalar product is $\langle x,y\rangle_{\mathbb{R}}$.
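Given any functional $f:V\to\mathbb{R}\cup\{\infty\}$ and parameter $\gamma>0$ we introduce the "quadratic envelope" $Q_\gamma$ as the supremum of all minorants of the form $\alpha-\frac{\gamma}{2}\|x-y\|^2$ for $\alpha\in\mathbb{R}$ and $y\in V$;
$$Q_\gamma(f)(x)=\sup_{\alpha\in\mathbb{R},\,y\in V}\left\{\alpha-\frac{\gamma}{2}\|x-y\|^2\ :\ \alpha-\frac{\gamma}{2}\|\cdot-y\|^2\le f\right\}. \qquad (13)$$
[Figure caption fragment: The black graph lies slightly below for illustration only.]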
The quadratic envelope has appeared previously e.g. in [29] under the name "proximal hull", denoted $h_{\gamma^{-1}}$ (Example 1.44), but it seems that the term is not widespread (see the discussion in Section 7) and that its connection with convex envelopes has not been noted, or at least not systematically studied. We prefer the term quadratic envelope since it is more illustrative, and prefer the notation $Q_\gamma$ since it would be messy to always have to invert $\gamma$, which in this context has a concrete meaning. The parameter $\gamma$ basically tunes the maximum negative curvature of $Q_\gamma(f)$, as we shall see in Section 4 (Corollary 4.4). When $\gamma=1$ we simply write $Q$ as opposed to $Q_\gamma$. In this section we first provide some tools to compute $Q_\gamma$, then prove the connection with l.s.c. convex envelopes, and end with some auxiliary results and a discussion of connections to previous concepts and terminology.
The Legendre transform (or Fenchel conjugate) is defined as $g^*(y):=\sup_x\langle x,y\rangle-g(x)$. We remind the reader that $g^*$ is l.s.c. convex and that $g^{**}$ equals the l.s.c. convex envelope of $g$ by the Fenchel-Moreau theorem (see e.g. Proposition 13.11 and 13.39 in [4]). We now introduce the transform $S_\gamma$ defined as follows:
$$\begin{aligned} S_\gamma(f)(y) &:= \sup_x\ -f(x)-\frac{\gamma}{2}\|x-y\|^2 \\ &= \sup_x\ \gamma\langle x,y\rangle-\Big(f(x)+\frac{\gamma}{2}\|x\|^2\Big)-\frac{\gamma}{2}\|y\|^2 \\ &= \Big(f+\frac{\gamma}{2}\|\cdot\|^2\Big)^*(\gamma y)-\frac{\gamma}{2}\|y\|^2. \end{aligned} \qquad (14)$$
$S_\gamma$ is simply the negative of the Moreau envelope computed with constant $\gamma^{-1}$. If we set $q_\gamma(x,y)=-\frac{\gamma}{2}\|x-y\|^2$ then, in the terminology of [29] Sec. 11.L, $S_\gamma(f)$ is the $q_\gamma$-conjugate of $f$ and $Q_\gamma(f)$ the $q_\gamma$-envelope of $f$ (reinforcing our choice of terminology "quadratic envelope" for $Q_\gamma$). We introduce the symbol $S_\gamma$ mainly since we believe the notation $-e_{\gamma^{-1}}(f)$ or $^{q_\gamma}f$ (c.f. [29]) or $-\,^{\gamma^{-1}}f$ (c.f. [4]) would be confusing for our present purposes. Its connection to the quadratic envelope is described by the following proposition.

Proposition 3.1. $Q_\gamma(f)=S_\gamma(S_\gamma(f))$.

Proof. The argument is a replica of Example 1.44 of [29], but is included for completeness. We have $\alpha-\frac{\gamma}{2}\|\cdot-y\|^2\le f$ iff $\alpha\le f+\frac{\gamma}{2}\|\cdot-y\|^2$, so the maximal $\alpha$ for fixed $y$ is given by $\alpha=\inf_x f(x)+\frac{\gamma}{2}\|x-y\|^2=-S_\gamma(f)(y)$. Inserting this in the definition of $Q_\gamma$ gives $Q_\gamma(f)(x)=\sup_y -S_\gamma(f)(y)-\frac{\gamma}{2}\|x-y\|^2=S_\gamma(S_\gamma(f))(x)$, as desired.

The next proposition contains some basic observations on the behavior of $S_\gamma$ and $Q_\gamma$.

Proof. The statement of the interchanging signs follows easily by the last line of (14), which also shows that $S_\gamma(f)$ avoids $-\infty$. By (14) it follows that $S_\gamma(f)$ (and $Q_\gamma(f)$ by Proposition 3.1) is the difference of an l.s.c. convex functional and a quadratic term. With this in mind the continuity statements follow by standard properties of l.s.c. convex functionals (see e.g. Corollary 8.30 [4]).
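The relation between $S_\gamma$ and $Q_\gamma$ lends itself to a brute-force numerical check in one variable. The following is only a sketch: it replaces the suprema by maxima over a finite grid, takes $f=|x|_0$ as test function, and compares with the closed form $1-\max(1-\sqrt{\gamma/2}\,|x|,0)^2$, which is an assumption obtained from a direct calculation and is not quoted from the text. The function names are hypothetical.

```python
import numpy as np

def s_transform(values, grid, gamma):
    """Grid approximation of S_gamma(f)(y) = sup_x -f(x) - gamma/2 |x - y|^2."""
    diff2 = (grid[:, None] - grid[None, :]) ** 2       # |x_i - y_j|^2
    return np.max(-values[:, None] - 0.5 * gamma * diff2, axis=0)

def q_transform(values, grid, gamma):
    """Q_gamma(f) approximated as S_gamma applied twice."""
    return s_transform(s_transform(values, grid, gamma), grid, gamma)

gamma = 2.0
grid = np.linspace(-4.0, 4.0, 801)
f = (np.abs(grid) > 1e-9).astype(float)                # f = |x|_0 in one variable

Q_num = q_transform(f, grid, gamma)
Q_closed = 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(grid), 0.0) ** 2
err = np.max(np.abs(Q_num - Q_closed))
print("max deviation from the assumed closed form:", err)
```

The same two functions can be applied to any sampled $f$ on a bounded grid, with the caveat that the maxima are only taken over grid points, so the result is an approximation whose quality depends on the grid spacing and on the truncation of the domain.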
The following result is the key result of this section, connecting the $Q_\gamma$-transform with l.s.c. convex envelopes.
In particular, from which the first identity follows. Similarly The statement about the convex envelope follows by the Fenchel-Moreau theorem and Proposition 3.1, which also gives This implies the latter part of the inequality $0\le Q_\gamma(f)\le f$, whereas the former has already been noticed in Proposition 3.2.
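The link between the quadratic envelope and the l.s.c. convex envelope can be traced in a few lines. The following is a sketch under two assumptions: that $S_\gamma(f)(y)=\sup_x\,-f(x)-\frac{\gamma}{2}\|x-y\|^2$ (the negative of the Moreau envelope with constant $\gamma^{-1}$), and that $Q_\gamma(f)=S_\gamma(S_\gamma(f))$ as in Example 1.44 of [29]. The auxiliary symbol $g$ is introduced here for convenience.

```latex
% Notation (assumption): g := f + (gamma/2)||.||^2.
\begin{aligned}
S_\gamma(f)(y)
  &= \sup_x \Big( \gamma\langle x,y\rangle - f(x) - \tfrac{\gamma}{2}\|x\|^2 \Big) - \tfrac{\gamma}{2}\|y\|^2
   = g^*(\gamma y) - \tfrac{\gamma}{2}\|y\|^2, \\
Q_\gamma(f)(x)
  &= \sup_y \Big( -S_\gamma(f)(y) - \tfrac{\gamma}{2}\|x-y\|^2 \Big) \\
  &= \sup_y \Big( -g^*(\gamma y) + \tfrac{\gamma}{2}\|y\|^2
       - \tfrac{\gamma}{2}\|x\|^2 + \gamma\langle x,y\rangle - \tfrac{\gamma}{2}\|y\|^2 \Big) \\
  &= \sup_y \Big( \langle x,\gamma y\rangle - g^*(\gamma y) \Big) - \tfrac{\gamma}{2}\|x\|^2
   = g^{**}(x) - \tfrac{\gamma}{2}\|x\|^2 .
\end{aligned}
```

In other words $Q_\gamma(f)(x)+\frac{\gamma}{2}\|x\|^2=\big(f+\frac{\gamma}{2}\|\cdot\|^2\big)^{**}(x)$, and since $g^{**}\le g$ this also gives $Q_\gamma(f)\le f$.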
We end this section with some observations about the behavior of $Q_\gamma(f)$ as a function of $\gamma$.
whereas the limit as $\gamma\searrow 0$ equals a convex minorant of $f$ above the l.s.c. convex envelope of $f$.
We remark that (16) is shown in [29], whereas nothing is said about the case $\gamma\searrow 0$. In fact, $\lim_{\gamma\searrow 0}Q_\gamma(f)$ usually equals the l.s.c. convex envelope of $f$, but this is not necessarily the case in general, which is a surprise at least for the author. To see this, consider $P=\{x\in\mathbb{R}^2:\ \dots\}$ where $\iota_P$ is the indicator functional of $P$. It is easy to see that the l.s.c. convex envelope of $\iota_P$ equals $\iota_{\mathrm{cl}(Q)}$ (where cl denotes closure), whereas some thinking reveals that $\lim_{\gamma\searrow 0}Q_\gamma(f)=\iota_Q$. However, if $V$ is finite dimensional and $\lim_{\gamma\searrow 0}Q_\gamma(f)$ is everywhere finite, then it is automatically continuous (Corollary 8.30 in [4]), and hence it must equal the l.s.c. convex envelope of $f$.
Concerning the limit as $\gamma\searrow 0$, set $g(x)=\lim_{\gamma\searrow 0}Q_\gamma(f)(x)$, which exists by the first part of this proposition. Since $g(x)=\lim_{\gamma\searrow 0}\big(Q_\gamma(f)(x)+\frac{\gamma}{2}\|x\|^2\big)$ we see that $g$ is the limit of a decreasing sequence of convex functions, hence it is also convex (Proposition 8.16 [4]), and clearly $g\le f$ by Theorem 3.3.
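The dependence on $\gamma$ can be illustrated for $f=|x|_0$ in one variable. The sketch below assumes the closed form $Q_\gamma(|\cdot|_0)(x)=1-\max(1-\sqrt{\gamma/2}\,|x|,0)^2$ (obtained from a direct calculation, not quoted from the text); the grid and the list of $\gamma$ values are arbitrary choices. For this particular $f$ the convex envelope on $\mathbb{R}$ is identically zero.

```python
import numpy as np

def q_card(x, gamma):
    """Assumed closed form of Q_gamma(|.|_0) in one variable, applied entrywise."""
    return 1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2

x = np.linspace(-3.0, 3.0, 601)
f = (np.abs(x) > 1e-9).astype(float)           # |x|_0 on the grid
gammas = [1e-4, 1e-2, 1.0, 1e2, 1e4]           # increasing values of gamma
envelopes = np.array([q_card(x, g) for g in gammas])

# Q_gamma(f) should not decrease when gamma increases (row i+1 >= row i).
monotone = bool(np.all(np.diff(envelopes, axis=0) >= -1e-12))
gap_small = np.max(envelopes[0])               # distance to the convex envelope (zero)
gap_large = np.max(np.abs(envelopes[-1] - f))  # distance to f itself on the grid
print(monotone, gap_small, gap_large)
```

The convergence is pointwise: for large $\gamma$ the envelope agrees with $f$ except in a shrinking neighborhood of the origin, and for small $\gamma$ it flattens towards zero on any bounded set.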

Finer Properties of Convex and Quadratic Envelopes
In this section, we prove a result about the structure of l.s.c. convex envelopes which seems relatively unknown. For this we need the concept of weak lower semi-continuity, which is nothing but semicontinuity with respect to the weak topology of the underlying separable Hilbert space $V$. We remind the reader that for convex proper functionals there is no difference (Theorem 9.1 [4]) between weakly l.s.c. functionals and standard l.s.c. functionals. Also, if $V$ is finite dimensional and the topology is Hausdorff, the two topologies are the same, so there is no difference in this case either. However, we wish to underline that the difficulty in proving the coming results is present also in the finite-dimensional setting. We begin with a neat fact concerning weakly l.s.c. convex envelopes which does not seem to have made its way into the modern literature on the subject. It is a reformulation of Arne Brøndsted's extension of Milman's theorem [9]. To state it we remind the reader that a functional $g$ is coercive if and only if its (lower) level sets are bounded (see e.g. Proposition 11.11 [4]). Note that l.s.c. convex envelopes of the type $Q_\gamma(f)(x)+\frac{\gamma}{2}\|x-d\|^2$ (for positive $f$) are always coercive, by virtue of Proposition 3.2 and the quadratic term. A function $f$ on $\mathbb{R}$ is called affine if it is of the form $f(t)=at+b$ with $a,b\in\mathbb{R}$.
Theorem 4.1. Let $g$ be a weakly l.s.c. functional on a separable Hilbert space $V$ such that $g^{**}$ is coercive. Given any $x_0\in V$ such that $g(x_0)\neq g^{**}(x_0)$ there exists a unit vector $\nu$ and $t_0>0$ such that the function $h(t)=g^{**}(x_0+t\nu)$ is affine on $(-t_0,t_0)$.
To prove Theorem 4.1 we recall some concepts from [9]. Given a convex function $f$, a point $x$ is called extremal if and only if $(x,f(x))$ is extremal for the epigraph of $f$, denoted $[f]$. Equivalently, $x$ is extremal if and only if $x\in\mathrm{dom}\,f$ and $f$ is not affine on any relatively open segment containing $x$. Moreover $f_{ext}$ denotes the functional which equals $f(x)$ for all extremal points $x$ and $\infty$ else. As a consequence of Theorem 1 in [9] we have:
Theorem 4.2. Let $g$ be a weakly l.s.c. functional on a separable Hilbert space $V$ such that $g^{**}$ is coercive, then $[(g^{**})_{ext}]\subset[g]$.
Proof. In the setting of [9] we let $E$ be the separable Hilbert space $V$ with the weak topology. Since convex functionals are l.s.c. with respect to the weak topology if and only if they are with respect to the norm topology, it follows that the l.s.c. convex envelope of $g$ equals the weakly l.s.c. convex envelope. In the notation of Theorem 1 of [9] we can then take $f=g^{**}$, and the theorem states that $[f_{ext}]\subset[g_{cl}]$, where $g_{cl}$ is the greatest l.s.c. minorant of $g$. Since $g$ is assumed to be l.s.c. we have $g=g_{cl}$ and the desired inclusion follows. It remains to check that the conditions of Theorem 1 are fulfilled, which is that "$g$ is inf-compact in some direction" (with respect to the weak topology, referring to the terminology of [9]). For this it suffices to check that $g^{**}$ is inf-compact, i.e. that all level sets are compact. The level sets of $g^{**}$ are closed and convex, and since $g^{**}$ is assumed coercive they are also bounded. It follows that such level sets are compact in the weak topology and the proof is complete.
Based on this we can now easily prove Theorem 4.1.
Proof of Theorem 4.1. Since $g\ge g^{**}$, Theorem 4.2 clearly implies that $g(x)=g^{**}(x)$ for all extremal points $x$ for $g^{**}$. Consequently, if $g(x)=g^{**}(x)$ does not hold, then $x$ is not extremal for $g^{**}$ and the existence of $\nu$ follows by the definition of an extremal point for $g^{**}$.
Next we discuss what the theorem implies about minimizers of $g$ versus $g^{**}$. Denote by $G$ the set of global minimizers of $g$ and by $G^{**}$ the set of global minimizers of $g^{**}$.
Corollary 4.3. Let $g$ be a weakly l.s.c. functional on a separable Hilbert space $V$ such that $g^{**}$ is coercive. Then $G^{**}$ is a closed bounded convex set containing $G$. Letting $G^{**}_{ext}$ denote the extremal points of $G^{**}$ we also have that $G^{**}_{ext}\subset G$. Finally the closed convex hull of $G^{**}_{ext}$ equals $G^{**}$.
Proof. The convexity of $G^{**}$ and the inclusion $G\subset G^{**}$ are immediate. The boundedness of $G^{**}$ follows since $g^{**}$ is coercive. Let $x$ be in the closure of $G^{**}$ and let $c$ be the value of the global minimum. Then $g^{**}(x)\le c$ follows by l.s.c. and the reverse inequality is obvious from the fact that $c$ is a global minimum. It follows that $x\in G^{**}$ and hence $G^{**}$ is closed.
The existence of points in $G^{**}_{ext}$ and the statement concerning the closed convex hull are now immediate consequences of the Krein-Milman theorem and the fact that bounded closed convex sets are weakly compact in separable Hilbert spaces (Theorem 3.33, [4]). It remains to prove that $G^{**}_{ext}\subset G$. Let $x_0\in G^{**}_{ext}$ and suppose $x_0\notin G$. Then Theorem 4.1 implies the existence of a direction $\nu$ on which $g^{**}$ is constant near $x_0$, contradicting that $x_0$ is an extremal point.
We end by noting that Theorem 4.1 implies that $\gamma$ tunes the maximum negative curvature in the $Q_\gamma$-transform, as discussed in the introduction.
Corollary 4.4. For each $x_0\in V$ with $f(x_0)>Q_\gamma(f)(x_0)$ there exists a unit vector $\nu$ such that $Q_\gamma(f)(x_0+t\nu)=a+bt-\frac{\gamma}{2}t^2$ for $t$ near 0 and some $a,b\in\mathbb{R}$.
Proof. Set $g(x)=f(x)+\frac{\gamma}{2}\|x\|^2$. By Theorem 3.3 we have $Q_\gamma(f)(x)+\frac{\gamma}{2}\|x\|^2=g^{**}(x)$, by which it is immediate that $g^{**}$ is coercive (since $Q_\gamma(f)\ge 0$ by Proposition 3.2). It also follows that $g(x_0)>g^{**}(x_0)$, and hence Theorem 4.1 implies that a unit vector $\nu$ exists such that $t\mapsto Q_\gamma(f)(x_0+t\nu)$ equals an affine function minus $\frac{\gamma}{2}\|x_0+t\nu\|^2$ in a neighborhood of $t=0$. Since $\|x_0+t\nu\|^2=\|x_0\|^2+2t\langle x_0,\nu\rangle+t^2$, the claim follows.
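A one-variable example makes the curvature statement tangible. The computation below is a sketch for $f=|x|_0$ on $\mathbb{R}$, based on the closed form of its quadratic envelope, which is an assumption obtained by direct calculation rather than a formula quoted from the text.

```latex
% Assumed closed form for f(x) = |x|_0 on the real line:
Q_\gamma(f)(x) = 1 - \max\!\Big(1 - \sqrt{\tfrac{\gamma}{2}}\,|x|,\ 0\Big)^{2}
  = \begin{cases}
      \sqrt{2\gamma}\,|x| - \tfrac{\gamma}{2}x^{2}, & |x| \le \sqrt{2/\gamma},\\[2pt]
      1, & |x| \ge \sqrt{2/\gamma}.
    \end{cases}
% For 0 < x_0 < \sqrt{2/\gamma} we have Q_\gamma(f)(x_0) < 1 = f(x_0), and with \nu = 1:
Q_\gamma(f)(x_0 + t)
  = \underbrace{\sqrt{2\gamma}\,x_0 - \tfrac{\gamma}{2}x_0^{2}}_{a}
  + \underbrace{\big(\sqrt{2\gamma} - \gamma x_0\big)}_{b}\,t
  - \tfrac{\gamma}{2}\,t^{2}
  \qquad \text{for } t \text{ near } 0 .
```

So exactly on the set where the envelope lies strictly below $f$, the second derivative equals $-\gamma$, while on $\{|x|\ge\sqrt{2/\gamma}\}\cup\{0\}$ the envelope touches $f$.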

The Quadratic Envelope as a Regularizer
We now let $A:V\to W$ be a bounded linear operator, where $V$, $W$ are possibly different (separable) Hilbert spaces, and consider functionals of the type
$$J(x)=f(x)+\frac{1}{2}\|Ax-d\|^2. \qquad (17)$$
Our aim is to develop strategies to deal with the general problem (17), in the case when $f$ is a $[0,\infty]$-valued functional such that $Q_\gamma(f)$ is computable, and we focus on computing (explicit) approximations of the l.s.c. convex envelope of $J$. The theory is split in two cases: either we approximate the convex envelope from below by a convex functional, or we approximate it from above with a non-convex functional having a number of desirable properties, most notably the fact that local minimizers do not change. More precisely, we will study the relationship between the original problem (17) and the modified problem
$$J_\gamma(x)=Q_\gamma(f)(x)+\frac{1}{2}\|Ax-d\|^2 \qquad (18)$$
under the assumption that $\gamma I\preceq A^*A$ or $\gamma I\succeq A^*A$ (c.f. (11)-(12) and recall Figure 3). Note that $\gamma I\succeq A^*A$ if and only if $\gamma\ge\|A\|^2$.

Case $A^*A\succeq\gamma I$.
Let $f$ be a $[0,\infty]$-valued functional and $A:V\to W$ a bounded linear operator. The main result of this section states that $J_\gamma$ is a convex minorant of the l.s.c. convex envelope $J^{**}$.
Theorem 5.1. For $\gamma>0$ such that $A^*A\succeq\gamma I$, $J_\gamma$ is convex and $J_\gamma\le J^{**}$. Moreover, if $A^*A\succ\gamma I$ then it is strongly convex, in which case it has a unique minimizer. Finally, a minimizer $x$ of $J_\gamma$ is a minimizer of $J$ whenever $f(x)=Q_\gamma(f)(x)$.
Proof. Upon expanding $\|Ax-d\|^2=\|Ax\|^2-2\langle Ax,d\rangle+\|d\|^2$ and noting that the latter two terms are affine linear, it is easily seen that it suffices to prove the first part of the statement for $d=0$. That $J_\gamma$ is l.s.c. and that $J_\gamma\le J$ follows immediately by Theorem 3.3, and thus $J_\gamma\le J^{**}$ follows immediately upon showing that $J_\gamma$ is convex. Define $\langle x,y\rangle_U=\langle Ax,Ay\rangle_W-\gamma\langle x,y\rangle_V$ and note that this is a semi-inner product as long as $A^*A\succeq\gamma I$, which is an inner product if the inequality is strict. In either case $\|x\|_U^2:=\langle x,x\rangle_U$ is convex. It follows that
$$J_\gamma(x)=Q_\gamma(f)(x)+\frac{\gamma}{2}\|x\|_V^2+\frac{1}{2}\|x\|_U^2,$$
which by Theorem 3.3 implies that $J_\gamma$ equals the l.s.c. convex envelope of $f(x)+\frac{\gamma}{2}\|x\|_V^2$ plus the term $\frac{1}{2}\|x\|_U^2$. We conclude that $J_\gamma$ is a convex functional which is strongly convex when $A^*A\succ\gamma I$. In the latter case the existence of a unique minimizer follows by Corollary 11.15 in [4] (supercoercivity of $J_\gamma$ is obvious by the term $\frac{1}{2}\|x\|_U^2$). Finally let $d$ be fixed and let $\hat x$ be a minimizer of $J_\gamma$. Suppose that $f(\hat x)=Q_\gamma(f)(\hat x)$ and let $y\in V$ be arbitrary. Then $J(y)\ge J_\gamma(y)\ge J_\gamma(\hat x)=J(\hat x)$, showing that $\hat x$ is a global minimizer of $J$.
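The convexity claim of Theorem 5.1 can be probed numerically for $f=\|x\|_0$ on $\mathbb{R}^n$. The sketch below rests on assumptions: the separable closed form $Q_\gamma(\|\cdot\|_0)(x)=\sum_i\big(1-\max(1-\sqrt{\gamma/2}\,|x_i|,0)^2\big)$ (derived by a direct calculation, not quoted from the text), a random tall matrix, and a midpoint convexity test on random pairs, which can only falsify convexity and never prove it.

```python
import numpy as np

def q_card(x, gamma):
    """Assumed separable closed form of Q_gamma(||.||_0), summed over the entries of x."""
    return np.sum(1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2)

rng = np.random.default_rng(1)
m, n = 8, 4                                    # tall matrix, so A^T A is invertible
A = rng.standard_normal((m, n))
d = rng.standard_normal(m)
gamma = 0.9 * np.linalg.eigvalsh(A.T @ A)[0]   # gamma below the smallest eigenvalue of A^T A

def J_gamma(x):
    """The regularized functional Q_gamma(f)(x) + 1/2 ||Ax - d||^2."""
    return q_card(x, gamma) + 0.5 * np.sum((A @ x - d) ** 2)

# Midpoint convexity: J((x+y)/2) - (J(x)+J(y))/2 must be <= 0 for a convex J.
worst = -np.inf
for _ in range(2000):
    x, y = 3.0 * rng.standard_normal(n), 3.0 * rng.standard_normal(n)
    gap = J_gamma(0.5 * (x + y)) - 0.5 * (J_gamma(x) + J_gamma(y))
    worst = max(worst, gap)
print("largest midpoint gap (should not be positive):", worst)
```

Choosing `gamma` above the smallest eigenvalue instead would typically produce positive gaps for some pairs, in line with the loss of convexity in the other regime.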

Case $A^*A\preceq\gamma I$.
Let $f$ be a $[0,\infty]$-valued functional and $A:V\to W$ a bounded linear operator. Again we are interested in the relationship between $J$ and $J_\gamma$ defined in (17) and (18) respectively. The main result of this section is that $J_\gamma$ does not move minima for $\gamma$ in the stated range, but we begin by noting the following inequalities, the first one being the reverse of the one proved in Theorem 5.1.
Proposition 5.2. If $A^*A\preceq\gamma I$ then $J^{**}\le J_\gamma\le J$.
Proof. The right inequality is immediate since $Q_\gamma(f)\le f$ by Theorem 3.3. As in Theorem 5.1 we moreover see that it suffices to prove the left inequality for $d=0$. To this end set $h(x)=J^{**}(x)-\frac{1}{2}\|Ax\|^2$. Since $J^{**}\le f+\frac{1}{2}\|Ax\|^2$ we have $h\le f$ and moreover
$$h(x)+\frac{\gamma}{2}\|x\|^2=J^{**}(x)+\frac{1}{2}\big(\gamma\|x\|^2-\|Ax\|^2\big).$$
The right hand side is convex and l.s.c., by which we conclude that
$$h(x)+\frac{\gamma}{2}\|x\|^2\le\Big(f+\frac{\gamma}{2}\|\cdot\|^2\Big)^{**}(x)=Q_\gamma(f)(x)+\frac{\gamma}{2}\|x\|^2$$
(the last identity follows by Theorem 3.3), which gives $h(x)\le Q_\gamma(f)(x)$. In other words $J^{**}(x)\le Q_\gamma(f)(x)+\frac{1}{2}\|Ax\|^2$, which is the desired inequality (for $d=0$).
We now come to the main theorem of this section, inspired by Theorems 4.5 and 4.8 in [30]. We say that $x$ is a local minimizer of $J$ if there exists a neighborhood $U$ of $x$ in $V$ such that $J(y)\ge J(x)$ for all $y\in U$, and we say that $x$ is a strict local minimizer of $J$ if the inequality is strict for $y\neq x$.
Theorem 5.3. Suppose that $\|A\|^2<\gamma$. If $x$ is a local minimizer (resp. strict local minimizer) of $J_\gamma$ then it is also a local minimizer (resp. strict local minimizer) of $J$, and $J_\gamma(x)=J(x)$. In addition the global minimizers coincide.
Proof. Let $x$ be a local minimizer of $J_\gamma$. If $Q_\gamma(f)(x)=f(x)$ does not hold then Corollary 4.4 implies that there exists a unit vector $\nu$ such that
$$\frac{d^2}{dt^2}J_\gamma(x+t\nu)=\|A\nu\|^2-\gamma \qquad (19)$$
for $t$ near 0. Since $\|A\nu\|^2\le\|A\|^2<\gamma$ this is negative, which contradicts that $x$ is a local minimizer. We thus conclude that $Q_\gamma(f)(x)=f(x)$ holds, which immediately gives that $J_\gamma(x)=J(x)$. In view of Proposition 5.2 it follows that $x$ is a local minimizer also for $J$. The same argument applies to strict local minimizers. We now prove that the global minimizers coincide. Note that global minimizers of $J$ are global minimizers of $J_\gamma$ in view of Proposition 5.2 and the fact that $J(x)=J^{**}(x)$ for all global minimizers $x$. From this we also see that the global minima of $J$ and $J_\gamma$ coincide; let us denote this value by $c$. Conversely suppose that $x$ is a global minimizer of $J_\gamma$ (i.e. $J_\gamma(x)=c$). Then it is a local minimizer of $J$ by the first part, which automatically is global for $J$ since we otherwise would have $J(y)<c$ for some other value $y$. The proof is complete.
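For $f=\|x\|_0$ and a small wide matrix, the statement about global minima can be checked by brute force. The sketch below makes several assumptions: the separable closed form of $Q_\gamma(\|\cdot\|_0)$ (from a direct calculation, not quoted from the text), random data, enumeration of all supports to find the global minimizer of $J$, and random sampling as a crude stand-in for minimizing the non-convex $J_\gamma$. Sampling can only fail to find a point below the common minimum; it does not locate the minimizers of $J_\gamma$.

```python
import itertools
import numpy as np

def q_card(x, gamma):
    """Assumed separable closed form of Q_gamma(||.||_0)."""
    return np.sum(1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2)

rng = np.random.default_rng(2)
m, n = 3, 6                                    # wide matrix with a nontrivial kernel
A = rng.standard_normal((m, n))
d = 3.0 * rng.standard_normal(m)
gamma = 1.1 * np.linalg.norm(A, 2) ** 2        # ||A||^2 < gamma

def fit(x):
    return 0.5 * np.sum((A @ x - d) ** 2)

def J(x):
    return np.count_nonzero(x) + fit(x)

def J_gamma(x):
    return q_card(x, gamma) + fit(x)

# Global minimizer of J by brute force: least squares on every possible support.
best_val, best_x = J(np.zeros(n)), np.zeros(n)
for k in range(1, n + 1):
    for S in itertools.combinations(range(n), k):
        coef, *_ = np.linalg.lstsq(A[:, list(S)], d, rcond=None)
        x = np.zeros(n)
        x[list(S)] = coef
        if J(x) < best_val:
            best_val, best_x = J(x), x

samples = 4.0 * rng.standard_normal((5000, n))
lowest_sample = min(J_gamma(x) for x in samples)
print("min J:", best_val, " J_gamma at that point:", J_gamma(best_x))
print("lowest sampled J_gamma:", lowest_sample)
```

The two printed values on the first line should agree, and no sampled value of $J_\gamma$ should fall below them, which is what the theorem predicts for this regime.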
The situation when $\gamma=\|A\|^2$ is a bit more involved, so we content ourselves with the following statement concerning the global minimizers.
Theorem 5.4. Set $\gamma=\|A\|^2$, let $G$ be the global minimizers of $J$ and $G_\gamma$ the global minimizers of $J_\gamma$. Then $G\subset G_\gamma$ and each connected component of $G_\gamma$ contains points of $G$.
Proof. The statement $G\subset G_\gamma$ follows as in the above proof, as well as the fact that the global minima of $J$ and $J_\gamma$ coincide; we denote the common value by $c$. If $x\in G_\gamma$ and $J(x)>c$ then it follows by (19) that there exists a unit vector $\nu$ such that $\frac{d^2}{dt^2}J_\gamma(x+t\nu)\le 0$ in a neighborhood of $t=0$. Strict inequality contradicts the assumption of a global minimum, so we deduce that $\gamma\|\nu\|^2=\|A\nu\|^2$. Introducing the semi-norm $\|x\|_U^2=\gamma\|x\|_V^2-\|Ax\|_W^2$, this means that $\|\nu\|_U=0$, i.e. that $\nu$ lies in the kernel of the semi-norm $\|\cdot\|_U$ (which is a linear subspace by convexity of the semi-norm). Let $P$ be the affine hyperplane $P=x+\ker\|\cdot\|_U$ and set $S=P\cap G_\gamma$. For $y\in\ker\|\cdot\|_U$ we have
$$J_\gamma(x+y)=Q_\gamma(f)(x+y)+\frac{\gamma}{2}\|x+y\|_V^2-\frac{1}{2}\|x\|_U^2-\langle A(x+y),d\rangle+\frac{1}{2}\|d\|^2 \qquad (20)$$
so Theorem 3.3 implies that $J_\gamma$ is convex on $P$. In particular $S$ is convex. Since $J_\gamma$ is l.s.c. it is also closed. Moreover $S$ is bounded due to the quadratic term $\|x+y\|_V^2$ in (20). $S$ is therefore weakly compact, and hence it equals the closed convex hull of its extremal points by the Krein-Milman theorem. If $x$ now is one of these extremal points then we can argue as in the beginning of this proof and conclude that $J_\gamma(x)=J(x)$, since the existence of a $\nu$ with the properties stated initially would contradict that $x$ is an extremal point of $S$.

The S-Transform and Semi-Algebraicity
We briefly treat semi-algebraicity of $Q_\gamma(f)$, since it was shown in [3] that this is a condition under which the forward-backward splitting method converges in the non-convex setting. We remind the reader that a function on a finite dimensional space is semi-algebraic if its graph is a semi-algebraic set [6].
Theorem 6.1. If $V$ is finite dimensional and $f$ is semi-algebraic then so are $S_\gamma(f)$ and $Q_\gamma(f)$.
Proof. We assume for simplicity that $\gamma=1$. It is a consequence of the Tarski-Seidenberg theorem that the set of semi-algebraic functions is closed under addition (see e.g. Prop. 2.2.6 in [6]), and similarly one can prove that the epigraph of a semi-algebraic function is a semi-algebraic set. If $f$ is semi-algebraic on $\mathbb{R}^n$ it follows that $g(x,y)=\langle x,y\rangle-(f(x)+\frac{1}{2}\|x\|^2)$ is semi-algebraic on $\mathbb{R}^{2n}$, and by the argument following Theorem 2.2 in [3] it follows that the Legendre transform of $f+\frac{1}{2}\|x\|^2$ is semi-algebraic. The first result now follows since this function minus $\frac{\gamma}{2}\|y\|^2$ equals $S_\gamma(f)(y)$ by (14), and the second is immediate by Proposition 3.1.
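To indicate how forward-backward splitting could be applied to $Q_\gamma(f)(x)+\frac{1}{2}\|Ax-d\|^2$, here is a minimal sketch for $f=\|x\|_0$. Everything specific in it is an assumption: the separable closed form of $Q_\gamma(\|\cdot\|_0)$, the proximal formula in `prox_q_card` (derived here from that closed form by solving the one-dimensional stationarity condition, valid only when $\tau\gamma<1$), the random test problem, the step size and the iteration count. It is not the implementation used in [13] or [14].

```python
import numpy as np

def q_card(x, gamma):
    """Assumed separable closed form of Q_gamma(||.||_0)."""
    return np.sum(1.0 - np.maximum(1.0 - np.sqrt(gamma / 2.0) * np.abs(x), 0.0) ** 2)

def prox_q_card(z, gamma, tau):
    """Entrywise minimizer of tau*Q_gamma(|.|_0)(x) + (x - z)^2 / 2, assuming tau*gamma < 1."""
    shrunk = np.maximum(np.abs(z) - tau * np.sqrt(2.0 * gamma), 0.0) / (1.0 - tau * gamma)
    return np.sign(z) * np.minimum(np.abs(z), shrunk)

rng = np.random.default_rng(3)
m, n = 20, 40
A = rng.standard_normal((m, n)) / np.sqrt(m)
x_true = np.zeros(n)
x_true[:3] = [3.0, -4.0, 5.0]                  # hypothetical sparse signal
d = A @ x_true + 0.01 * rng.standard_normal(m)

op_norm2 = np.linalg.norm(A, 2) ** 2
gamma = 1.1 * op_norm2           # ||A||^2 < gamma, the regime of Theorem 5.3
tau = 1.0 / (1.5 * op_norm2)     # step size with tau*||A||^2 < 1 and tau*gamma < 1

def J_gamma(x):
    return q_card(x, gamma) + 0.5 * np.sum((A @ x - d) ** 2)

x = np.zeros(n)
history = [J_gamma(x)]
for _ in range(500):
    grad = A.T @ (A @ x - d)                     # forward (gradient) step on the data fit
    x = prox_q_card(x - tau * grad, gamma, tau)  # backward (proximal) step on tau*Q_gamma
    history.append(J_gamma(x))

print("objective at start and end:", history[0], history[-1])
print("number of nonzeros in the iterate:", np.count_nonzero(x))
```

With a step size below the reciprocal of $\|A\|^2$ the objective values along the iteration should not increase; which critical point the iteration reaches depends on the starting point, so this sketch makes no claim of reaching a global minimizer.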

Related Works
The operations $S_\gamma(f)$ and $Q_\gamma(f)$ were introduced around 1970 in greater generality by J-J. Moreau [26] and (seemingly independently) E-A. Weiss [33], and were further studied around 1990 by R. Poliquin [27] with a focus on smoothness properties. Variations of Propositions 3.1 and 3.2 date back to these early articles, and are also found e.g. in Rockafellar-Wets [29] Section 11.L. The transforms $S_\gamma$ and $Q_\gamma$ go under names like "$\Phi$-conjugate"/"proximal transform" and "$\Phi$-biconjugate"/"$\Phi$-convex envelope", and arise by the concrete choice $\Phi(x,y)=q_\gamma(x,y)=-\frac{\gamma}{2}\|x-y\|^2$. Following Rockafellar-Wets [29], $Q_\gamma(f)$ should be called "proximal hull" or "$q_\gamma$-envelope". We believe that the "quadratic envelope", which is closer to the latter, is more suggestive. Functions that satisfy $Q_\gamma(f)=f$ have been called e.g. $\gamma^{-1}$-proximal or quadratically convex.
However they are called, it seems that the connection with convex envelopes à la Theorem 3.3 has not been investigated, which is the main novelty of this publication along with the structural result Corollary 4.4 and its applications to regularization in Section 5. Apart from the already mentioned works by Aubert, Blanc-Féraud, Soubies and Larsson, Olsson we have not found any similar result in the literature. The fairly recent survey paper [23] is about the closely related concept of computing Fenchel conjugates, and also mentions proximal hulls, yet it has no overlap with the present paper despite citing 262 other papers. It primarily deals with numeric computation of convex envelopes in cases when symbolic formulas are not available, and as such it is an interesting alternative to the methods developed here. The same goes for the papers [24] and [7]. The importance of computing convex envelopes is stressed in [25], where techniques for computing convex envelopes of so called "convex polyhedral" functions are developed. Convex approximations from below are considered in [8], which should be compared with the results in Section 5.1. An alternative to approximating the convex envelope is to numerically try to compute the proximal operator of the original functional directly, which is pursued in [19]. The papers [2, 31] deal with Lasry-Lions approximants in Hilbert space but do not make the connection with convex envelopes. For parameters $s<t$ the Lasry-Lions approximation of $f$ [21] is defined by
$$(f_t)^s(x)=\sup_y\inf_z\ f(z)+\frac{1}{2t}\|z-y\|^2-\frac{1}{2s}\|x-y\|^2,$$
which for $s=t$ gives $Q_{s^{-1}}$. This regularization is also studied in Section 6 of the more recent publication [31] (with the notation $C(1)f$), mainly with focus on differentiability results. It is also closely connected to the more general "proximal average", see e.g. [5,18]. However, the proximal average has been used mainly for modification of convex functions, whereas $Q_\gamma(f)=f$ for any l.s.c. convex function.

Conclusions
We have provided theory for computing l.s.c. convex envelopes of certain functionals and shown a connection with quadratic envelopes (a.k.a. proximal hulls), which was then used to regularize more intricate problems. We showed that for sufficiently small values of the parameter $\gamma$, this yields convex functionals below the original functional, which coincide with the original functional on a large part of the underlying Hilbert space. For $\gamma$ sufficiently large, on the other hand, we lose convexity but gain the desirable feature that the modified functional has the same global minimizers as the original one, and fewer local ones. This in turn was based on results regarding the structure of l.s.c. convex envelopes. The results are inspired by prior work by Carl Olsson and Viktor Larsson as well as Emmanuel Soubies, Laure Blanc-Féraud and Gilles Aubert. Particular cases of these ideas have already been applied to compressed sensing, imaging, signal processing and frequency estimation. Currently we are working on more concrete results regarding low rank approximation, improvements of frequency estimation techniques, as well as an application to the classical phase retrieval problem. We hope that other researchers will try these methods on their problems and find that the method is a valuable tool. To aid with this task an expanded version of this article is available on arXiv [13] with many more examples and useful details.