Envelope Functions: Unifications and Further Properties

Recently, the forward-backward and Douglas-Rachford envelope functions were proposed in the literature. The stationary points of these envelope functions have a close relationship with the solutions of the possibly nonsmooth optimization problem to be solved. The envelopes were shown to be smooth and convex under some additional assumptions. Therefore, these envelope functions create powerful bridges between nonsmooth and smooth optimization. In this paper, we present a general envelope function that unifies and generalizes these envelope functions. We provide properties of the general envelope function that sharpen corresponding known results for the special cases. We also present an envelope function for the generalized alternating projections method (GAP), named the GAP envelope. It enables convex feasibility problems with two sets, one of which is affine, to be solved by finding any stationary point of the smooth and, under some assumptions, convex GAP envelope.

All these methods seek a fixed-point by performing an averaged iteration of the nonexpansive mapping. The averaging is the key to guaranteeing convergence of the iterates to a fixed-point of the nonexpansive mapping, see [9]. The rate of convergence can, however, be very slow in practice. One way to improve convergence of such methods is to precondition the problem data. This approach has been extensively studied in the literature and has proven very successful in practice; see, e.g., [4,6,23,16,18,19,17] for a limited selection of such approaches. The underlying idea is to incorporate static second-order information in the respective algorithms.
The performance of the forward-backward and the Douglas-Rachford methods can be further improved by exploiting the properties of the recently proposed forward-backward envelope in [30,34] and Douglas-Rachford envelope in [29]. As shown in [30,34,29], the stationary points of these envelope functions agree with the fixed-points of the corresponding operator. The envelopes are also shown to be convex and to have Lipschitz continuous gradients (under certain assumptions). Therefore, the original nonsmooth problem to be solved using forward-backward splitting or Douglas-Rachford splitting can be solved by finding a stationary point of the corresponding smooth envelope functions. In [30,34], it is shown how truncated Newton methods or quasi-Newton methods can be applied to the forward-backward envelope function to improve local convergence.
A unifying property of forward-backward splitting and Douglas-Rachford splitting (for convex optimization) is that they are averaged iterations of a nonexpansive mapping S, where S = S_2S_1 is composed of two nonexpansive mappings. These mappings are gradients of functions f_1 and f_2 respectively, i.e., S_1 = ∇f_1 and S_2 = ∇f_2. What unifies their envelopes is the assumption that f_1 is twice continuously differentiable. For averaged iterations of such operators, we propose a differentiable envelope function that has the forward-backward and Douglas-Rachford envelopes as special cases. Other special cases include the Moreau envelope and the ADMM envelope (which is a special case of the Douglas-Rachford envelope since ADMM is Douglas-Rachford splitting applied to the Fenchel dual problem, see [14]).
We analyze this general envelope function in the more restrictive setting of f 1 being quadratic, or equivalently S 1 = ∇f 1 being affine, i.e., of the form S 1 = P (·) + q, with P linear. We show that if P is nonsingular, the stationary points of the envelope coincide with the fixed-points of S = S 2 S 1 . We provide quadratic upper and lower bounds to the envelope function that improve corresponding results for the known special cases in the literature. The bounds imply, e.g., that the gradient of the envelope function is always 2-Lipschitz continuous. If in addition the linear operator P that defines S 1 is positive semidefinite, the envelope function is convex. Since the fixed-points of S and the stationary points of the envelope coincide, a fixed-point to S can, when P is positive semidefinite, be found by minimizing a smooth and convex envelope function.
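To make the correspondence between fixed-points and stationary points concrete, consider the simplest special case (the Moreau envelope, treated in Section 4): for f(x) = |x|, the envelope is the Huber function, smooth and convex, and its unique stationary point x = 0 is exactly the fixed-point of the proximal operator. The following is our own numerical sketch of this fact, not code from the paper.

```python
import numpy as np

# Our own sketch: for f(x) = |x| the Moreau envelope is the Huber function,
# smooth and convex, and its only stationary point x = 0 is exactly the
# fixed-point of the proximal operator of f.
gamma = 0.5

def prox_abs(x, g):
    # prox of g*|.| (soft thresholding)
    return np.sign(x) * np.maximum(np.abs(x) - g, 0.0)

def moreau_env(x, g):
    z = prox_abs(x, g)                        # minimizer in the envelope infimum
    return np.abs(z) + (z - x) ** 2 / (2 * g)

def grad_env(x, g):
    return (x - prox_abs(x, g)) / g           # gradient of the Moreau envelope

xs = np.linspace(-2.0, 2.0, 81)
stationary = xs[np.isclose(grad_env(xs, gamma), 0.0)]
```

On this grid the envelope gradient vanishes only at x = 0, which is also the unique fixed-point of the prox.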
In [30,34,29] it was shown that forward-backward splitting and Douglas-Rachford splitting can be seen as variable metric gradient methods applied to the respective envelope functions. If S_1 is affine, they show that the iteration instead is a scaled gradient method with a fixed metric. This generalizes to our setting as well, i.e., an averaged iteration of a nonexpansive mapping can be interpreted as a scaled gradient method applied to the envelope function. Since the envelope function has nice smoothness properties and is in some cases convex, more efficient methods to find a fixed-point of S, or equivalently a stationary point of the envelope, probably exist. For instance, quasi-Newton, nonlinear conjugate gradient, or truncated Newton methods, some of which have been proposed for use with the forward-backward envelope in [30,34], can be used to improve local convergence (see [28] for details on the methods). Devising new algorithms or suggesting which existing ones are most efficient is, however, outside the scope of this paper.
We also provide a new envelope function that is a special case of the general envelope, namely the generalized alternating projections (GAP) envelope. Generalized alternating projections [22,1,26,13,7] (also referred to as the method of alternating relaxed projections, e.g., in [3]) solves feasibility problems involving a finite number of nonempty closed and convex sets. This is done by alternating relaxed projections onto the sets. The method can use either under-relaxation, in which the step does not go all the way to the projection point, or over-relaxation, in which the step goes past the projection point, up towards the reflection point. Our envelope function applies to problems with two sets, one of which is nonempty closed and convex and one of which is affine. Since the general envelope function always has a Lipschitz continuous gradient, so does the GAP envelope. If, in addition, the first relaxed projection (onto the affine set) is an under-relaxation, the GAP envelope is convex. Therefore, all feasibility problems with an affine subspace and a convex set can be solved by minimizing a smooth convex function.
Our contributions are as follows: i) we propose a general envelope function that has several known envelope functions as special cases, ii) we provide properties of the general envelope that sharpen (sometimes considerably) and generalize corresponding known results for the special cases, iii) we provide new insights on the relation between the Douglas-Rachford envelope and the ADMM envelope, iv) we present a new envelope function, the GAP envelope, and characterize its properties.

Notation
We denote by R the set of real numbers, R^n the set of real column-vectors of length n, and R^{m×n} the set of real matrices with m rows and n columns. Further, R̄ := R ∪ {∞} denotes the extended real line. We denote inner products on R^n by ⟨·, ·⟩ and the induced norm by ‖·‖. We will also use scaled norms ‖x‖_P := √⟨Px, x⟩, where P is a positive definite operator (defined in Definition 2.2). We will use the same notation for scaled semi-norms, i.e., ‖x‖_P := √⟨Px, x⟩, where P is a positive semidefinite operator (defined in Definition 2.1). The identity operator is denoted by Id. The conjugate function is denoted and defined by f*(y) := sup_x {⟨y, x⟩ − f(x)}. The adjoint operator of a linear operator L : R^n → R^m is defined as the unique operator L* : R^m → R^n that satisfies ⟨Lx, y⟩ = ⟨x, L*y⟩. The linear operator L : R^n → R^n is self-adjoint if L = L*. The notation argmin_x f(x) refers to any element that minimizes f, while Argmin_x f(x) refers to the set of minimizers. Finally, ι_C denotes the indicator function for the set C, which satisfies ι_C(x) = 0 if x ∈ C and ι_C(x) = ∞ otherwise.

Background
In this section, we introduce some standard definitions that can be found, e.g. in [2,32].

Operator Properties
Definition 2.1 (Positive semidefiniteness) A linear operator L : R^n → R^n is positive semidefinite if it is self-adjoint and all eigenvalues satisfy λ_i(L) ≥ 0.

Remark 2.1 An equivalent characterization of a positive semidefinite operator is that ⟨Lx, x⟩ ≥ 0 for all x ∈ R^n.

Definition 2.2 (Positive definiteness) A linear operator L : R^n → R^n is positive definite if it is self-adjoint and all eigenvalues satisfy λ_i(L) ≥ m for some m > 0.

Remark 2.2 An equivalent characterization of a positive definite operator L is that ⟨Lx, x⟩ ≥ m‖x‖² for some m > 0 and all x ∈ R^n.

Remark 2.3
For notational convenience, we have included α = 1 and β = 1 in the definitions of (negative) averagedness, which both are equivalent to nonexpansiveness. For values of α ∈ (0, 1) and β ∈ (0, 1) averagedness is a stronger property than nonexpansiveness. For more on negatively averaged operators, see [17] where they were introduced.
Note that if a gradient operator ∇f is α-averaged and β-negatively averaged, then it must hold that α + β ≥ 1. This follows immediately from Lemma C.3 and Lemma C.4 in Appendix C.

Remark 2.4
This cocoercivity definition implies that a β-cocoercive mapping T can be expressed as T = (1/(2β))(Id + S) for some nonexpansive operator S. We also note that 1-cocoercivity is equivalent to ½-averagedness (which is also called firm nonexpansiveness).
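As a numerical sanity check (our own sketch, not from the paper), the following verifies the 1-cocoercive case of the identity above: with S an orthogonal (hence nonexpansive) matrix, T = (Id + S)/2 satisfies ⟨Tx − Ty, x − y⟩ ≥ ‖Tx − Ty‖² on random points.

```python
import numpy as np

# Our own sketch: with S an orthogonal (hence nonexpansive) matrix, the map
# T = (Id + S)/2 is firmly nonexpansive, i.e., 1-cocoercive:
# <Tx - Ty, x - y> >= ||Tx - Ty||^2 for all x, y.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((5, 5)))  # orthogonal matrix
T = 0.5 * (np.eye(5) + Q)

ok = True
for _ in range(100):
    x, y = rng.standard_normal(5), rng.standard_normal(5)
    d, Td = x - y, T @ x - T @ y
    ok = ok and (d @ Td >= Td @ Td - 1e-10)       # cocoercivity inequality
```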
We conclude this subsection with a result relating Lipschitz continuity and cocoercivity to averagedness and negative averagedness.
Proposition 2.1 Suppose that ∇f : R^n → R^n is the gradient of some function f : R^n → R. Then the following hold: (i) ∇f is nonexpansive if and only if it is ½-averaged; (ii) for δ ∈ (0, 1], (1 + δ)/2-negative averagedness of ∇f is equivalent to that
‖∇f(x) − ∇f(y)‖² ≤ δ⟨∇f(x) − ∇f(y), x − y⟩
holds for all x, y ∈ R^n. This is equivalent to that ∇f is 1/δ-cocoercive, see [2].

Function Properties
Definition 2.7 (Strong convexity) Let P : R^n → R^n be positive definite. A proper and closed function f : R^n → R ∪ {∞} is σ-strongly convex w.r.t. ‖·‖_P if f − (σ/2)‖·‖²_P is convex.

Remark 2.5 If f is differentiable, σ-strong convexity w.r.t. ‖·‖_P can equivalently be defined as that
f(y) ≥ f(x) + ⟨∇f(x), y − x⟩ + (σ/2)‖x − y‖²_P
holds for all x, y ∈ R^n. If P = Id, i.e., if the norm is the induced norm, we merely say that f is σ-strongly convex. If σ = 0, the definition reduces to plain convexity.
There are many smoothness definitions for functions in the literature. We will use the following one, which implies that the function is at every point majorized and minorized by a norm-squared function.

Definition 2.8 (Smoothness) Let L : R^n → R^n be positive definite. A differentiable function f : R^n → R is β-smooth w.r.t. ‖·‖_L if
|f(y) − f(x) − ⟨∇f(x), y − x⟩| ≤ (β/2)‖x − y‖²_L
holds for all x, y ∈ R^n.

Connections
We will later show that our envelope function satisfies upper and lower bounds of the form
(1/2)⟨M(x − y), x − y⟩ ≤ f(y) − f(x) − ⟨∇f(x), y − x⟩ ≤ (1/2)⟨L(x − y), x − y⟩   (4)
for all x, y ∈ R^n and for different linear operators M : R^n → R^n and L : R^n → R^n. Depending on M and L, we get different properties of f and its gradient ∇f. Some of these are stated below. The results follow immediately from Lemma C.2 in Appendix C and the definitions of strong convexity and smoothness in Definition 2.7 and Definition 2.8, respectively.
Proposition 2.4 Assume that L = −M and that L is positive definite. Then (4) is equivalent to that f is 1-smooth w.r.t. ‖·‖_L.

Proposition 2.5 Assume that M and L are positive definite. Then (4) is equivalent to that f is 1-smooth w.r.t. ‖·‖_L and 1-strongly convex w.r.t. ‖·‖_M.

Envelope Functions
Finding a fixed-point of a nonexpansive mapping S by performing an averaged iteration of that mapping is the basis for many first-order optimization methods. Based on ideas from [30,29], we present another way to find such a fixed-point. We create an envelope function whose stationary points coincide with the fixed-points of the operator S. For forward-backward splitting and Douglas-Rachford splitting, such envelopes have been proposed in [30] and [29], respectively. These envelope functions turn out to be special cases of the envelopes we propose, see Section 4. The envelope functions often possess favorable properties such as convexity and Lipschitz continuity of the gradient. Then, any method that finds a stationary point (in the convex case, a minimizer) of the envelope function can be used to find a fixed-point of the nonexpansive mapping S.
To formulate our envelope function, we assume that the nonexpansive operator S is a composition of S_2 and S_1, i.e., S = S_2S_1. We make the following basic assumptions on S_1 and S_2, which sometimes will be sharpened or relaxed:

Assumption 3.1 (i) S_2 = ∇f_2 is nonexpansive; (ii) S_1 = ∇f_1 is affine, i.e., S_1 = P(·) + q, where (iii) P is a self-adjoint nonexpansive linear operator and q ∈ R^n.

Remark 3.1 Part (iii) of the assumption means that P is symmetric with eigenvalues in the interval [−1, 1].

Now, we are ready to define the general envelope function whose properties we will investigate in this paper:
F(x) := (1/2)⟨Px, x⟩ − f_2(Px + q).   (5)
The gradient of this function is given by
∇F(x) = Px − P∇f_2(Px + q) = P(x − S_2S_1x).   (6)
The set of stationary points of the envelope function F is the set of points for which the gradient is zero, i.e., the set {x ∈ R^n | ∇F(x) = 0}.
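The definitions above can be exercised numerically. The sketch below (our own, with hypothetical data) instantiates (5) with f_2(y) = ½‖y‖² − ½‖y − Π_C(y)‖², whose gradient S_2 = Π_C is the (nonexpansive) projection onto a box C, and verifies the gradient formula (6) by central finite differences.

```python
import numpy as np

# Our own sketch (hypothetical data): instantiate the envelope
# F(x) = 1/2 <Px, x> - f2(P x + q) with
# f2(y) = 1/2 ||y||^2 - 1/2 ||y - Pi_C(y)||^2, whose gradient S2 = Pi_C is
# the projection onto C = [-1, 1]^n, and verify
# grad F(x) = P (x - S2(P x + q)) by finite differences.
rng = np.random.default_rng(1)
n = 4
B = rng.standard_normal((n, n))
P = 0.5 * (B + B.T)
P /= 1.1 * np.linalg.norm(P, 2)     # self-adjoint with eigenvalues in (-1, 1)
q = rng.standard_normal(n)

proj = lambda y: np.clip(y, -1.0, 1.0)
f2 = lambda y: 0.5 * y @ y - 0.5 * (y - proj(y)) @ (y - proj(y))
F = lambda x: 0.5 * x @ (P @ x) - f2(P @ x + q)
gradF = lambda x: P @ (x - proj(P @ x + q))

x = rng.standard_normal(n)
eps = 1e-6
fd = np.array([(F(x + eps * e) - F(x - eps * e)) / (2 * eps) for e in np.eye(n)])
err = np.linalg.norm(fd - gradF(x))
```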

Basic Properties of the Envelope Function
Here, we list some basic properties of the envelope function (5). The first two results are special cases and direct corollaries of a more general result in Theorem 3.1, and therefore not proven here.
Proposition 3.1 Suppose that Assumption 3.1 holds. Then the gradient of F is 2-Lipschitz continuous. That is, ∇F satisfies
‖∇F(x) − ∇F(y)‖ ≤ 2‖x − y‖
for all x, y ∈ R^n.

Proposition 3.2 Suppose that Assumption 3.1 holds and that P, the operator defining the linear part of S_1, is positive semidefinite. Then F is convex.
So, if P is positive semidefinite, then the envelope function F is convex and differentiable with a Lipschitz continuous gradient. The set of stationary points of F also has a close relationship with the fixed-points of S = S 2 S 1 . This is shown next.

Proposition 3.3 Suppose that Assumption 3.1 holds and that P is nonsingular. Then the set of stationary points of F coincides with the fixed-point set of S = S_2S_1. If in addition P is positive definite, this set also coincides with the set of minimizers of F.
Proof. The first claim follows directly from (6). The second claim follows from (6) and that F is convex when P is positive (semi)definite, see Proposition 3.2.
These three results show that if P is positive definite, a fixed-point to S 2 S 1 can be found by minimizing the differentiable convex function F , which has a 2-Lipschitz continuous gradient.

Finer Properties of the Envelope Function
Here, we establish some finer properties of the envelope function. We start with a general result on upper and lower bounds for the envelope function. This result uses stronger assumptions on S_2 than nonexpansiveness, namely that it is α-averaged and β-negatively averaged with α, β ∈ (0, 1], see Definition 2.4 and Definition 2.5. We state this as an assumption.

Assumption 3.2 The operator S_2 = ∇f_2 is α-averaged and β-negatively averaged with α, β ∈ (0, 1].

Theorem 3.1 Suppose that Assumption 3.1 and Assumption 3.2 hold, and let δ_α = 2α − 1 and δ_β = 2β − 1. Then the envelope function F satisfies
(1/2)⟨(P − δ_βP²)(x − y), x − y⟩ ≤ F(y) − F(x) − ⟨∇F(x), y − x⟩ ≤ (1/2)⟨(P + δ_αP²)(x − y), x − y⟩
for all x, y ∈ R^n.
As seen in Section 2.2.3, such bounds have many implications for the properties of the function. Next, we provide some of them in the form of corollaries.
Corollary 3.1 Suppose that Assumption 3.1 and Assumption 3.2 hold and that P is positive semidefinite. Let δ_α = 2α − 1 and δ_β = 2β − 1. Then F is convex and (1 + δ_α)-smooth, i.e.,
0 ≤ F(y) − F(x) − ⟨∇F(x), y − x⟩ ≤ ((1 + δ_α)/2)‖x − y‖²
holds for all x, y ∈ R^n.

Proof. It follows directly from Theorem 3.1 and Lemma C.5 in Appendix C.
Corollary 3.2 Suppose that Assumption 3.1 and Assumption 3.2 hold and that either of the following holds:
(i) P is positive definite and contractive,
(ii) P is positive definite and β ∈ (0, 1) in the negative averagedness.
Then F is 1-strongly convex w.r.t. ‖·‖_{P−δ_βP²} and 1-smooth w.r.t. ‖·‖_{P+δ_αP²}.

Proof. To show the strong convexity claim, it is sufficient to apply Theorem 3.1 and show that P − δ_βP² is positive definite, i.e., that λ_min(P − δ_βP²) is positive. In (i), λ_i(P) ∈ (0, 1) and δ_β ∈ (−1, 1], and in (ii), λ_i(P) ∈ (0, 1] and δ_β ∈ (−1, 1). From Lemma C.5 it follows that in both cases, λ_min(P − δ_βP²) is positive. The smoothness claim follows immediately from Theorem 3.1 and Definition 2.8.

Next, we show a less tight characterization of the envelope function that does not take the shape of the upper and lower bounds into account.
From Corollary 3.3, the following two results are immediate.

Corollary 3.4 Suppose that Assumption 3.1 and Assumption 3.2 hold. Let δ_α = 2α − 1, δ_β = 2β − 1, m = λ_min(P), and L = λ_max(P), and suppose that either of the following two conditions holds:

Corollary 3.5 Suppose that Assumption 3.1 and Assumption 3.2 hold and that P is positive semidefinite, i.e., that λ_min(P) ≥ 0. Let L = λ_max(P).

The results in Theorem 3.1 and its corollaries hold for α-averaged and β-negatively averaged operators S_2. In Proposition 2.1, some properties that are equivalent to averagedness and negative averagedness are stated. Therefore, we can use these equivalent properties instead when stating the above results. This is done in the following two propositions.

Proposition 3.5 Suppose that Assumption 3.1 holds and that S_2 = ∇f_2 is 1/δ-cocoercive with δ ∈ (0, 1]. Then all results in this section hold with δ_β = δ and δ_α = 0.

Relation to Averaged Operator Iteration
As noted in [30,29], the forward-backward and Douglas-Rachford splitting methods are variable metric gradient methods applied to their respective envelope functions. In our setting, with S_1 being affine, this reduces to a scaled gradient method with a fixed metric. Here, we show that this observation indeed holds in our setting.
We apply the following scaled gradient method to the envelope function F, assuming that P is nonsingular:
x^{k+1} = x^k − αP^{-1}∇F(x^k).   (7)
This gives
x^{k+1} = x^k − α(x^k − S_2S_1x^k) = (1 − α)x^k + αS_2S_1x^k,
which is an averaged iteration of the nonexpansive mapping S_2S_1 for α ∈ (0, 1). Therefore, the basic averaged iteration can be interpreted as a scaled gradient method applied to the envelope function. This is most probably not the most efficient way to find a stationary point of the envelope function (or equivalently a fixed-point of S_2S_1). At least in the convex setting (for the envelope), there are numerous alternative methods that can minimize smooth functions, such as truncated Newton methods, quasi-Newton methods, and nonlinear conjugate gradient descent. See [28] for an overview of such methods and [30,34] for some of these methods applied to the forward-backward envelope. Evaluating which ones are most efficient and devising new methods to improve performance is outside the scope of this paper.
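This equivalence is easy to verify numerically. The sketch below (our own, with hypothetical data) builds an affine S_1 = P(·) + q with nonsingular P, uses a box projection as S_2, and checks that one scaled gradient step x − αP^{-1}∇F(x) equals one averaged step (1 − α)x + αS_2S_1x.

```python
import numpy as np

# Our own sketch (hypothetical data): one scaled gradient step
# x - alpha P^{-1} grad F(x) on the envelope equals one averaged iteration
# (1 - alpha) x + alpha S2 S1 x, since grad F(x) = P (x - S2 S1 x).
rng = np.random.default_rng(2)
n = 4
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
P = V @ np.diag([0.9, 0.5, -0.4, 0.3]) @ V.T   # self-adjoint, nonexpansive, nonsingular
q = rng.standard_normal(n)
S2 = lambda y: np.clip(y, -1.0, 1.0)           # nonexpansive: projection onto a box
S = lambda x: S2(P @ x + q)                    # S = S2 S1 with S1 x = P x + q
gradF = lambda x: P @ (x - S(x))

alpha = 0.7
x = rng.standard_normal(n)
x_avg = (1 - alpha) * x + alpha * S(x)
x_grad = x - alpha * np.linalg.solve(P, gradF(x))
gap = np.linalg.norm(x_avg - x_grad)
```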

Special Cases
In this section, we present a generalization of the envelope function in the previous section. This envelope has four known special cases, namely the Moreau envelope [25], the forward-backward envelope [30,34], the Douglas-Rachford envelope [29], and the ADMM envelope (which is a special case of the Douglas-Rachford envelope).
The generalization incorporates envelopes for iterations where the function f_1 that defines S_1 through S_1 = ∇f_1 is twice continuously differentiable (as opposed to quadratic in the previous section). The more general envelope function is
F(x) := ⟨∇f_1(x), x⟩ − f_1(x) − f_2(∇f_1(x)).   (8)
When f_1(x) = (1/2)⟨Px, x⟩ + ⟨q, x⟩, it reduces to (5) since then ∇f_1(x) = Px + q and ⟨∇f_1(x), x⟩ − f_1(x) = (1/2)⟨Px, x⟩. The gradient of the envelope function in (8) is
∇F(x) = ∇²f_1(x)(x − ∇f_2(∇f_1(x))) = ∇²f_1(x)(x − S_2S_1x).
If ∇²f_1(x) is nonsingular for all x, the set of stationary points of the envelope coincides with the fixed-point set of S = S_2S_1. We do not provide any properties of the envelope function in this more general setting (it is left as future work), but merely show that it generalizes the previously known envelope functions.
In the more restricted setting with S 1 = ∇f 1 being affine, we provide envelope function properties that coincide with or sharpen corresponding results in the literature for the special cases.

Preliminaries
Before we present the special cases, we introduce some functions whose gradients are operators that are used in the respective underlying methods. Most importantly, we will introduce a function whose gradient is the proximal operator, which is defined as follows:
prox_{γf}(x) := argmin_z { f(z) + (1/2γ)‖z − x‖² },
where γ > 0 is a parameter. To do this, we introduce the following function, which is a scaling and regularization of f:
r_{γf} := γf + (1/2)‖·‖².   (9)
This is related to the proximal operator of f as follows:

Proposition 4.1 Suppose that f : R^n → R ∪ {∞} is proper closed and convex and that γ > 0. The proximal operator prox_{γf} then satisfies prox_{γf} = ∇r*_{γf}, where r_{γf} is defined in (9).
This result is from [31, Theorem 31.5, Theorem 16.4] and implies that the proximal operator is the gradient of a convex function. A special case is when f = ι_C, where ι_C is the indicator function for the nonempty closed and convex set C. The proximal operator then reduces to the projection operator. The projection operator onto C is denoted by Π_C and the corresponding regularized function is denoted and defined by
r_C := ι_C + (1/2)‖·‖².   (10)
With this notation, Π_C(x) = ∇r*_C(x). Next, we introduce a linear combination between r*_{γf} and (1/2)‖·‖², namely
p^α_{γf} := αr*_{γf} + (1 − α)(1/2)‖·‖²,   (11)
where we typically require that α ∈ (0, 2]. The gradient of p^α_{γf} is denoted by P^α_{γf} and is given by
P^α_{γf} = αprox_{γf} + (1 − α)Id.   (12)
This is called a relaxed proximal mapping. Some special cases of this will have their own notation. Letting α = 2, we get the reflected proximal operator
R_{γf} := 2prox_{γf} − Id.   (13)
When f = ι_C, we will use the notation p^α_C, P^α_C, and R_C for (11), (12), and (13) respectively. That is,
p^α_C := αr*_C + (1 − α)(1/2)‖·‖²,   (14)
P^α_C := αΠ_C + (1 − α)Id,   (15)
R_C := 2Π_C − Id.   (16)
We refer to (15) as a relaxed projection and to (16) as a reflection. So the proximal and projection operators and their relaxed and reflected variants are all gradients of functions. We conclude with the straightforward observation that Id − γ∇f = ∇((1/2)‖·‖² − γf). That is, the gradient step operator is the gradient of the function (1/2)‖x‖² − γf(x).
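A minimal implementation sketch of these operators (our own) for the box C = [−1, 1]^n: the relaxed projection P^α_C is the gradient of p^α_C, which we verify by finite differences, and α = 2 gives the reflection R_C = 2Π_C − Id.

```python
import numpy as np

# Our own sketch for the box C = [-1, 1]^n: the relaxed projection
# P^alpha_C = alpha Pi_C + (1 - alpha) Id is the gradient of
# p^alpha_C = alpha r*_C + (1 - alpha) 1/2 ||.||^2, with
# r*_C(y) = 1/2 ||y||^2 - 1/2 ||y - Pi_C(y)||^2.
proj = lambda y: np.clip(y, -1.0, 1.0)                  # Pi_C
rstar = lambda y: 0.5 * y @ y - 0.5 * (y - proj(y)) @ (y - proj(y))

def p_alpha(y, alpha):
    return alpha * rstar(y) + (1 - alpha) * 0.5 * (y @ y)

def P_alpha(y, alpha):
    return alpha * proj(y) + (1 - alpha) * y

rng = np.random.default_rng(3)
y = 3.0 * rng.standard_normal(4)
alpha = 1.5                                             # over-relaxation
eps = 1e-6
fd = np.array([(p_alpha(y + eps * e, alpha) - p_alpha(y - eps * e, alpha)) / (2 * eps)
               for e in np.eye(4)])
err = np.linalg.norm(fd - P_alpha(y, alpha))
refl_err = np.linalg.norm(P_alpha(y, 2.0) - (2 * proj(y) - y))  # R_C = 2 Pi_C - Id
```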

The Proximal Point Algorithm
The proximal point algorithm solves problems of the form
minimize f(x),
where f : R^n → R ∪ {∞} is proper closed and convex.

The algorithm repeatedly applies the proximal operator of f and is given by
x^{k+1} = prox_{γf}(x^k),
where γ > 0 is a parameter. This algorithm is mostly of conceptual interest since it is often as computationally demanding to evaluate the prox as to minimize the function f itself. Its envelope function, which is called the Moreau envelope [25], is a scaled version of our envelope F in (5). The scaling factor is γ^{-1}, and F in (5) is obtained by letting S_1x = ∇f_1(x) = x, i.e., P = Id and q = 0, and f_2 = r*_{γf}, where r_{γf} is defined in (9). The resulting envelope function f^γ is given by
f^γ(x) = γ^{-1}((1/2)‖x‖² − r*_{γf}(x)) = min_z { f(z) + (1/2γ)‖z − x‖² }
and its gradient satisfies
∇f^γ(x) = γ^{-1}(x − prox_{γf}(x)).
The following properties of the Moreau envelope follow directly from Corollary 3.5 and Proposition 3.5, since the proximal operator is 1-cocoercive (see, e.g., [2]).
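The identity between the Moreau envelope and the scaled envelope function can be checked numerically. Below is our own sketch for f = |·|, evaluating both min_z { f(z) + (1/2γ)(z − x)² } and the conjugate r*_{γf} by brute force on a grid.

```python
import numpy as np

# Our own sketch: for f = |.|, compare the Moreau envelope
# f_gamma(x) = min_z { |z| + (z - x)^2 / (2 gamma) } with the scaled
# envelope (1/gamma) (1/2 x^2 - r*(x)), where r(z) = gamma |z| + 1/2 z^2.
# Both the infimum and the conjugate are evaluated by brute force on a grid.
gamma = 0.7
z = np.linspace(-10.0, 10.0, 200001)   # fine grid for the two extrema

def moreau(x):
    return np.min(np.abs(z) + (z - x) ** 2 / (2 * gamma))

def rstar(x):
    # conjugate of r(z) = gamma |z| + 1/2 z^2
    return np.max(x * z - gamma * np.abs(z) - 0.5 * z ** 2)

xs = [-2.0, -0.3, 0.0, 1.4]
errs = [abs(moreau(x) - (0.5 * x ** 2 - rstar(x)) / gamma) for x in xs]
```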

Forward-Backward Splitting
Forward-backward splitting solves problems of the form
minimize f(x) + g(x),
where f : R^n → R is convex with an L-Lipschitz continuous (or equivalently 1/L-cocoercive) gradient, and g : R^n → R ∪ {∞} is proper closed and convex.

The algorithm performs a forward step followed by a backward step and is given by
x^{k+1} = prox_{γg}(x^k − γ∇f(x^k)),
where γ ∈ (0, 2/L) is a parameter. The envelope function, which is called the forward-backward envelope [30,34], is a scaled version of our envelope F in (8) and applies when f is twice continuously differentiable with Lipschitz continuous gradient. The scaling factor is γ^{-1}, and F in (8) is obtained by letting f_1 = (1/2)‖·‖² − γf and f_2 = r*_{γg}, where r_{γg} is defined in (9). The resulting forward-backward envelope function is
F^FB_γ(x) = γ^{-1}(⟨x − γ∇f(x), x⟩ − ((1/2)‖x‖² − γf(x)) − r*_{γg}(x − γ∇f(x))).

The gradient of this function is
∇F^FB_γ(x) = γ^{-1}(Id − γ∇²f(x))(x − prox_{γg}(x − γ∇f(x))),
which coincides with the gradient in [30,34]. As described in [30,34], the stationary points of the envelope coincide with the fixed-points of prox_{γg}(Id − γ∇f) if (Id − γ∇²f(x)) is nonsingular.

S 1 affine
We provide properties of the forward-backward envelope in the more restrictive setting where S_1 = ∇f_1 = (Id − γ∇f) is affine. This happens if f is convex quadratic, i.e., f(x) = (1/2)⟨Hx, x⟩ + ⟨h, x⟩ with H ∈ R^{n×n} positive semidefinite and h ∈ R^n. Then S_1x = Px + q with P = (Id − γH) and q = −γh.
Proposition 4.3 Suppose that f(x) = (1/2)⟨Hx, x⟩ + ⟨h, x⟩ with H positive semidefinite, that g is proper closed and convex, and that γ ∈ (0, 1/λ_max(H)). Then
(1/2γ)⟨(P − P²)(x − y), x − y⟩ ≤ F^FB_γ(y) − F^FB_γ(x) − ⟨∇F^FB_γ(x), y − x⟩ ≤ (1/2γ)⟨P(x − y), x − y⟩
for all x, y ∈ R^n, where P = (Id − γH) is positive definite. If in addition λ_min(H) = m > 0, then P − P² is positive definite and F^FB_γ is γ^{-1}-strongly convex w.r.t. ‖·‖_{P−P²}.
Less tight bounds for the forward-backward envelope are provided next. These follow immediately from Corollary 3.4, Corollary 3.5, and Proposition 3.5. The result is a less tight version of Proposition 4.3, but it slightly improves on the corresponding result in [30, Theorem 2.3]. The strong convexity moduli are the same, but the smoothness constant is a factor of two smaller.
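As a numerical illustration of this affine setting (our own sketch, with hypothetical data), the following runs forward-backward splitting on a quadratic f plus an ℓ1-norm g and checks that its limit is a stationary point of the forward-backward envelope, using the gradient expression with P = Id − γH.

```python
import numpy as np

# Our own sketch (hypothetical data): forward-backward splitting for
# f(x) = 1/2 <Hx, x> + <h, x> and g = ||.||_1. Its fixed-point should be a
# stationary point of the forward-backward envelope, whose gradient is
# grad F(x) = (1/gamma) (I - gamma H)(x - prox_{gamma g}(x - gamma grad f(x))).
rng = np.random.default_rng(4)
n = 5
B = rng.standard_normal((n, n))
H = B.T @ B + 0.1 * np.eye(n)      # positive definite Hessian
h = rng.standard_normal(n)
Lf = np.linalg.norm(H, 2)          # Lipschitz constant of grad f
gamma = 0.9 / Lf                   # gamma < 1/Lf, so P = I - gamma H is pos. def.
pd = bool(np.all(np.linalg.eigvalsh(np.eye(n) - gamma * H) > 0))

soft = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t, 0.0)  # prox of t ||.||_1

x = np.zeros(n)
for _ in range(20000):
    x = soft(x - gamma * (H @ x + h), gamma)   # forward-backward step

gradF = (np.eye(n) - gamma * H) @ (x - soft(x - gamma * (H @ x + h), gamma)) / gamma
res = np.linalg.norm(gradF)
```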

Douglas-Rachford Splitting
Douglas-Rachford splitting solves problems of the form
minimize f(x) + g(x),   (21)
where f : R^n → R ∪ {∞} and g : R^n → R ∪ {∞} are proper closed and convex functions. The algorithm performs two reflection steps (13), then an averaging according to
z^{k+1} = (1 − α)z^k + αR_{γg}R_{γf}z^k,   (22)
where γ > 0 and α ∈ (0, 1) are parameters. The objective is to find a fixed-point z̄ of R_{γg}R_{γf}, from which a solution to (21) can be computed as prox_{γf}z̄, see [2, Proposition 25.1]. The envelope function from [29], which is called the Douglas-Rachford envelope, is a scaled version of the basic envelope function F in (8) and applies when f is twice continuously differentiable with Lipschitz continuous gradient. The scaling factor is (2γ)^{-1}, and F is obtained by letting f_1 = p²_{γf} with gradient ∇f_1 = S_1 = R_{γf}, and f_2 = p²_{γg}, where p²_{γg} is defined in (11). The Douglas-Rachford envelope function becomes
F^DR_γ(x) = (2γ)^{-1}(⟨R_{γf}x, x⟩ − p²_{γf}(x) − p²_{γg}(R_{γf}x)).   (23)
The gradient of this function is
∇F^DR_γ(x) = (2γ)^{-1}∇R_{γf}(x)(x − R_{γg}R_{γf}x),
which coincides with the gradient in [29] since ∇R_{γf} = 2∇prox_{γf} − Id. As described in [29], the stationary points of the envelope coincide with the fixed-points of R_{γg}R_{γf} if ∇R_{γf} is nonsingular.

S 1 affine
We state properties of the Douglas-Rachford envelope in the more restrictive setting where S_1 = R_{γf} is affine. This holds if f is convex quadratic, i.e., of the form f(x) = (1/2)⟨Hx, x⟩ + ⟨h, x⟩ with H positive semidefinite. The operator S_1 becomes
S_1x = R_{γf}x = 2prox_{γf}(x) − x = 2(Id + γH)^{-1}(x − γh) − x,
which confirms that it is affine. We implicitly define P and q through S_1 = R_{γf} = P(·) + q, and note that they are given by P = 2(Id + γH)^{-1} − Id and q = −2γ(Id + γH)^{-1}h. In this setting, the following result follows immediately from Corollary 3.1 since S_2 = R_{γg} is nonexpansive (1-averaged and 1-negatively averaged).
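These formulas are straightforward to verify numerically; the sketch below (our own, with hypothetical data) checks the affine representation of R_{γf} and that P is self-adjoint with eigenvalues in (−1, 1], as required by Assumption 3.1.

```python
import numpy as np

# Our own sketch (hypothetical data): for f(x) = 1/2 <Hx, x> + <h, x>,
# prox_{gamma f}(x) solves (I + gamma H) z = x - gamma h, so the reflected
# proximal operator R_{gamma f} = 2 prox_{gamma f} - Id is affine with
# P = 2 (I + gamma H)^{-1} - I and q = -2 gamma (I + gamma H)^{-1} h.
rng = np.random.default_rng(5)
n = 4
B = rng.standard_normal((n, n))
H = B.T @ B                        # positive semidefinite
h = rng.standard_normal(n)
gamma = 0.5
M = np.linalg.inv(np.eye(n) + gamma * H)
P, q = 2 * M - np.eye(n), -2 * gamma * M @ h

def reflect(x):
    z = np.linalg.solve(np.eye(n) + gamma * H, x - gamma * h)  # prox_{gamma f}
    return 2 * z - x

x = rng.standard_normal(n)
err = np.linalg.norm(reflect(x) - (P @ x + q))
eigs = np.linalg.eigvalsh(P)       # self-adjoint, eigenvalues in (-1, 1]
```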
The following less tight characterization of the Douglas-Rachford envelope follows from Corollary 3.4 and Corollary 3.5. This result is more conservative than the one in Proposition 4.5, but improves on [29, Theorem 2]. The strong convexity modulus coincides with the corresponding one in [29, Theorem 2]. The smoothness constant is 1/(1 + γm) times that in [29, Theorem 2], i.e., it is slightly smaller.

ADMM
The alternating direction method of multipliers (ADMM) solves problems of the form (21). It is well known [14] that ADMM can be interpreted as Douglas-Rachford splitting applied to the dual of (21), namely to
minimize f*(µ) + g*(−µ).   (24)
So the algorithm is given by
v^{k+1} = (1 − α)v^k + αR_{ρ(g*∘−Id)}R_{ρf*}v^k,   (25)
where ρ > 0 is a parameter, R_{ρf*} is the reflected proximal operator (13), and (g* ∘ −Id) is the composition that satisfies (g* ∘ −Id)(µ) = g*(−µ).
In accordance with the Douglas-Rachford envelope (23), the ADMM envelope is defined as
F^ADMM_ρ(v) = (2ρ)^{-1}(⟨R_{ρf*}v, v⟩ − p²_{ρf*}(v) − p²_{ρ(g*∘−Id)}(R_{ρf*}v))   (26)
and its gradient becomes
∇F^ADMM_ρ(v) = (2ρ)^{-1}∇R_{ρf*}(v)(v − R_{ρ(g*∘−Id)}R_{ρf*}v).
In this section, we relate the ADMM algorithm and its envelope function to the Douglas-Rachford counterparts. To do so, we need the following lemma, which is proven in Appendix B.
where R ρg is defined in (13) and p 2 ρg is defined in (11).
First, we show that the z k sequence in (primal) Douglas-Rachford (22) and the v k sequence in ADMM (i.e., dual Douglas-Rachford) in (25) differ by a factor only. This is well known [12], but the relation is stated next with a simple proof.
There is also a tight relationship between the ADMM and Douglas-Rachford envelopes. Essentially, they have opposite signs.
This result implies that the ADMM envelope is concave when the DR envelope is convex, and vice versa. We know from Section 4.4 that the operator S_1 = R_{ρf*} is affine when f* is quadratic. This happens when f is of the form
f(x) = (1/2)⟨Hx, x⟩ + ⟨h, x⟩ + ι_{{x | Ax = b}}(x)
and H is positive definite on the nullspace of A. From Proposition 4.5 and Proposition 4.6, we conclude that, for an appropriate choice of ρ, the ADMM envelope is convex, which implies that the Douglas-Rachford envelope is concave.

Remark 4.1
The standard ADMM formulation is applied to solve problems of the form
minimize f̂(x) + ĝ(z)
subject to Ax + Bz = c.
Using infimal post-compositions, also called image functions, the dual of this problem is of the form (24), see, e.g., [20, Appendix B] for details. So this setting is also implicitly considered.

The GAP Envelope
In this section, we provide an envelope function for a generalization of the classic alternating projections method in [35]. The generalization uses relaxed projections and is sometimes referred to as the method of alternating relaxed projections (MARP) [3], but we will refer to it as generalized alternating projections (GAP). The algorithm is analyzed in [22,1,26,13,7] and a more general formulation is treated in [9]. GAP solves feasibility problems with a finite number of nonempty closed and convex sets that have a nonempty intersection. Here, we consider feasibility problems with two sets:
find x ∈ C ∩ D,
where C ⊂ R^n and D ⊂ R^n are nonempty closed and convex.
The generalized alternating projections method is given by
x^{k+1} = (1 − α)x^k + αP^{α_2}_C P^{α_1}_D x^k,
where P^α_C is the relaxed projection in (15), and α ∈ (0, 1] and α_1, α_2 ∈ (0, 2]. These assumptions imply that P^{α_2}_C is α_2/2-averaged if α_2 ∈ (0, 2) and nonexpansive if α_2 ∈ (0, 2] (and similarly for P^{α_1}_D). If α_1 = 2 or α_2 = 2, the composition P^{α_2}_C P^{α_1}_D is nonexpansive and we need α ∈ (0, 1) to arrive at an averaged iteration that guarantees convergence to a fixed-point. If α_1 = α_2 = 2, the algorithm is Douglas-Rachford splitting (see Section 4.4) applied to a feasibility problem. In this case, we have Π_D(fix(P^{α_2}_C P^{α_1}_D)) = C ∩ D. For all other feasible choices of α_1 and α_2, the fixed-point set satisfies fix(P^{α_2}_C P^{α_1}_D) = C ∩ D. In either case, the algorithm performs an averaged iteration to find a fixed-point of the nonexpansive operator P^{α_2}_C P^{α_1}_D.

The algorithm is of the general form we consider, and we identify S_2 in Assumption 3.1 with P^{α_2}_C and S_1 with P^{α_1}_D. We consider in particular the case when S_1 = P^{α_1}_D is affine, i.e., S_1 = P(·) + q. This holds if D is an affine set, i.e., if D = {x ∈ R^n | Ax = b} for some linear operator A. Let N denote the linear part of the projection Π_D onto the affine set, i.e.,
N = Π_{D_0},   (28)
where D_0 = {x ∈ R^n | Ax = 0}, and let d denote the constant part, to get Π_Dx = Nx + d. The operator S_1 then satisfies
S_1x = P^{α_1}_D x = α_1Π_Dx + (1 − α_1)x = ((1 − α_1)Id + α_1N)x + α_1d.
This implies that the P and q that define the affine operator S_1 = P(·) + q satisfy
P = (1 − α_1)Id + α_1N  and  q = α_1d.   (29)

The GAP envelope function follows from the general envelope in (5) and is given by
F^GAP_{α_1,α_2}(x) = (1/2)⟨Px, x⟩ − p^{α_2}_C(Px + q),
where p^{α_2}_C is defined in (14) and P is from (29). Since P^{α_1}_D x = Px + q and ∇p^{α_2}_C = P^{α_2}_C, its gradient satisfies
∇F^GAP_{α_1,α_2}(x) = Px − PP^{α_2}_C(Px + q) = P(x − P^{α_2}_C P^{α_1}_D x).
So if P is nonsingular, the stationary points of the GAP envelope coincide with the fixed-points of P^{α_2}_C P^{α_1}_D. The following proposition follows immediately from Proposition 3.3.
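The affine structure of S_1 is easy to verify numerically. In the sketch below (our own, with hypothetical data), the linear part of Π_D is computed via the pseudoinverse, N = Id − A†A, and the eigenvalues of P are checked against the two values 1 − α_1 and 1.

```python
import numpy as np

# Our own sketch (hypothetical data): the relaxed projection onto the
# affine set D = {x : Ax = b} is affine, S1 = P(.) + q with
# P = (1 - alpha1) I + alpha1 N and q = alpha1 d, where
# Pi_D x = N x + d and N = I - A^+ A projects onto {x : Ax = 0}.
rng = np.random.default_rng(6)
m, n = 2, 5
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
Ap = np.linalg.pinv(A)
N, d = np.eye(n) - Ap @ A, Ap @ b

alpha1 = 0.8
P = (1 - alpha1) * np.eye(n) + alpha1 * N
q = alpha1 * d

def S1(x):
    return alpha1 * (N @ x + d) + (1 - alpha1) * x   # relaxed projection onto D

x = rng.standard_normal(n)
err = np.linalg.norm(S1(x) - (P @ x + q))
eigs = np.sort(np.linalg.eigvalsh(P))  # {1 - alpha1 (mult. m), 1 (mult. n - m)}
```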
Proposition 5.1 Suppose that α_1, α_2 ∈ (0, 2] and that α_1 ≠ 1. Then the set of stationary points of the GAP envelope F^GAP_{α_1,α_2} is the fixed-point set of P^{α_2}_C P^{α_1}_D.
Next, we state some properties of the GAP envelope.
Proposition 5.2 Suppose that α_1, α_2 ∈ (0, 2]. Then the GAP envelope satisfies
(1/2)⟨M(x − y), x − y⟩ ≤ F^GAP_{α_1,α_2}(y) − F^GAP_{α_1,α_2}(x) − ⟨∇F^GAP_{α_1,α_2}(x), y − x⟩ ≤ (1/2)⟨L(x − y), x − y⟩
for all x, y ∈ R^n, where
M = P − P² = α_1(1 − α_1)(Id − N)   (30)
and
L = P + (α_2 − 1)P².   (31)

Proof. The operator P^{α_2}_C is α_2/2-averaged and 1-negatively averaged (nonexpansive). So we can apply Theorem 3.1 with δ_β = 1, δ_α = α_2 − 1, and P in (29). Using N = N² (which holds since N is a projection onto a linear subspace), we conclude that M = P − δ_βP² = α_1(1 − α_1)(Id − N) and that L = P + δ_αP² = P + (α_2 − 1)P². This concludes the proof.
Since N is a projection operator onto a linear subspace, it has only two distinct eigenvalues, namely zero and one. Therefore, there are only two distinct eigenvalues of M and of L in (30) and (31). Expressions for these eigenvalues are given in the following proposition.

Proposition 5.3 The eigenvalues of M in (30) are
λ_i(M) = α_1(1 − α_1)(1 − λ_i(N))   (32)
and the eigenvalues of L in (31) are
λ_i(L) = α_2 if λ_i(N) = 1,  and  λ_i(L) = (1 − α_1)(1 + (α_2 − 1)(1 − α_1)) if λ_i(N) = 0,   (33)
with N defined in (28).

Proof. First note that λ_i(a_1Id + a_2N) = a_1 + a_2λ_i(N). This implies that (32) is proven. It also implies that
λ_i(L) = (1 − α_1) + α_1λ_i(N) + (α_2 − 1)((1 − α_1) + α_1λ_i(N))².
For λ_i(N) = 0, we see that (33) holds. In the case of λ_i(N) = 1, we conclude that λ_i(L) = 1 + (α_2 − 1) = α_2. This concludes the proof.
Using this, we can show that for α_1 ∈ [1, 2], the GAP envelope is convex on the nullspace of A and concave on its orthogonal complement, the rangespace of A*.

Proposition 5.4 Let N(A) denote the nullspace of A and let R(A*) denote its orthogonal complement, the rangespace of A*. Then the GAP envelope is convex and α_2-smooth when restricted to N(A). If α_1 ∈ [1, 2], the GAP envelope is concave and α_1(α_1 − 1)-smooth when restricted to R(A*).
Proof. The subspace N(A) is spanned by the eigenvectors of N corresponding to λ_i(N) = 1. Therefore, Proposition 5.3 implies that for all x, y ∈ N(A), the lower bound in Proposition 5.2 becomes (1/2)⟨M(x − y), x − y⟩ = 0 and the upper bound in Proposition 5.2 satisfies (1/2)⟨L(x − y), x − y⟩ = (α_2/2)‖x − y‖². This proves the first claim. The second claim follows analogously: for x, y ∈ R(A*), we have λ_i(N) = 0, so the lower bound becomes −(α_1(α_1 − 1)/2)‖x − y‖² and the upper bound becomes ((1 − α_1)(1 + (α_2 − 1)(1 − α_1))/2)‖x − y‖², which is nonpositive when α_1 ∈ [1, 2].
The following proposition is a straightforward consequence of Proposition 5.2 and Proposition 5.3 and is stated without a proof.
Proposition 5.5 Suppose that α_1 ∈ (0, 2] and α_2 ∈ (0, 2]. Then the GAP envelope F^GAP_{α_1,α_2} satisfies
(1/2)min(0, α_1(1 − α_1))‖x − y‖² ≤ F^GAP_{α_1,α_2}(y) − F^GAP_{α_1,α_2}(x) − ⟨∇F^GAP_{α_1,α_2}(x), y − x⟩ ≤ (1/2)max(α_2, (1 − α_1)(1 + (α_2 − 1)(1 − α_1)))‖x − y‖²
for all x, y ∈ R^n. If the first relaxed projection is under-relaxed, i.e., if α_1 ∈ (0, 1], then the GAP envelope is convex. From Proposition 5.1, we also know that if α_1 ≠ 1 its set of stationary points is the fixed-point set of P^{α_2}_C P^{α_1}_D. For convex functions, all stationary points are minimizers. Therefore, all convex feasibility problems where one set is affine can be solved by minimizing the smooth convex GAP envelope function with α_1 ∈ (0, 1). In Section ??, we will see that most convex optimization problems can actually be cast on this feasibility form.
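The following sketch (our own, with hypothetical data) illustrates this: it solves a feasibility problem with an affine set and a box by plain gradient descent on the GAP envelope with an under-relaxed first projection (α_1 < 1), and checks that the limit lies in both sets.

```python
import numpy as np

# Our own sketch (hypothetical data): solve a feasibility problem with an
# affine set D = {x : Ax = b} and the box C = [-1, 1]^n by gradient descent
# on the GAP envelope. With alpha1 < 1 the envelope is convex, so any
# stationary point, i.e., any fixed-point of P^alpha2_C P^alpha1_D, solves
# the problem.
rng = np.random.default_rng(7)
m, n = 2, 6
A = rng.standard_normal((m, n))
x_feas = 0.5 * rng.uniform(-1, 1, n)          # a point in C, used to define b
b = A @ x_feas
Ap = np.linalg.pinv(A)
N, d = np.eye(n) - Ap @ A, Ap @ b             # Pi_D x = N x + d

alpha1, alpha2 = 0.9, 1.5                     # under-relaxed affine projection
P = (1 - alpha1) * np.eye(n) + alpha1 * N
q = alpha1 * d
proj_C = lambda y: np.clip(y, -1.0, 1.0)
relax_C = lambda y: alpha2 * proj_C(y) + (1 - alpha2) * y   # P^alpha2_C

x = rng.standard_normal(n)
for _ in range(30000):
    x = x - 0.5 * P @ (x - relax_C(P @ x + q))  # gradient step on the envelope

dist_D = np.linalg.norm(A @ x - b)
dist_C = np.linalg.norm(x - proj_C(x))
```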

Conclusions
We have presented a unified framework for envelope functions. Special cases include the Moreau envelope, the forward-backward envelope, the Douglas-Rachford envelope, and the ADMM envelope. We also presented a new envelope function, namely the generalized alternating projections (GAP) envelope. Under additional assumptions, we have provided quadratic upper and lower bounds to the general envelope function. These coincide with or sharpen corresponding results for the known special cases in the literature.

C Technical Lemmas
Lemma C.1 Assume that f : R^n → R is differentiable and that M : R^n → R^n and L : R^n → R^n are linear operators. Then
(1/2)⟨M(x − y), x − y⟩ ≤ f(y) − f(x) − ⟨∇f(x), y − x⟩ ≤ (1/2)⟨L(x − y), x − y⟩   (36)
holds for all x, y ∈ R^n if and only if
⟨M(x − y), x − y⟩ ≤ ⟨∇f(x) − ∇f(y), x − y⟩ ≤ ⟨L(x − y), x − y⟩   (37)
holds for all x, y ∈ R^n.

Proof. Adding two copies of (36) with x and y interchanged gives (37). This shows that (36) implies (37). For the converse, note that
f(y) − f(x) − ⟨∇f(x), y − x⟩ = ∫_0^1 ⟨∇f(x + τ(y − x)) − ∇f(x), y − x⟩ dτ.
Using the upper bound in (37), we get f(y) − f(x) − ⟨∇f(x), y − x⟩ ≤ ∫_0^1 τ⟨L(y − x), y − x⟩ dτ = (1/2)⟨L(y − x), y − x⟩, and the lower bound follows analogously.

Lemma C.2 Assume that f : R^n → R is differentiable and that L is positive definite. Then that f is β-smooth w.r.t. ‖·‖_L, i.e., that
|f(y) − f(x) − ⟨∇f(x), y − x⟩| ≤ (β/2)‖x − y‖²_L   (39)
holds for all x, y ∈ R^n, is equivalent to that ∇f is β-Lipschitz continuous w.r.t. ‖·‖_L, i.e., that
‖∇f(x) − ∇f(y)‖_{L^{-1}} ≤ β‖x − y‖_L   (40)
holds for all x, y ∈ R^n.
Proof. We start by proving the result using the induced norm ‖·‖ only, i.e., in the Hilbert space setting. (This covers, e.g., the setting with inner-product ⟨x, y⟩_H = ⟨Hx, y⟩ and scaled norm ‖·‖_H that will be used later.) To do this, we introduce the functions h := (1/β)f and r := (1/2)(h + (1/2)‖·‖²). Since L = Id in this norm, the condition (40) is β-Lipschitz continuity of ∇f (w.r.t. ‖·‖). This is equivalent to that ∇h = (1/β)∇f is nonexpansive, which by [2, Proposition 4.2] is equivalent to that (1/2)(∇h + Id) = ∇(1/2)(h + (1/2)‖·‖²) = ∇r is firmly nonexpansive (or equivalently 1-cocoercive). This is in turn equivalent to that
⟨∇r(x) − ∇r(y), x − y⟩ ≥ ‖∇r(x) − ∇r(y)‖²
holds for all x, y ∈ R^n. Multiplying by 2 and using 2r = h + (1/2)‖·‖², this is equivalent to that |h(y) − h(x) − ⟨∇h(x), y − x⟩| ≤ (1/2)‖x − y‖² holds for all x, y ∈ R^n. Multiplying by β and using f = βh, this is equivalent to (39). This chain of equivalences shows that the conditions are equivalent when L = Id. Next, we show that the scaled version holds. To do this, introduce the space H_H with inner-product ⟨x, y⟩_H = ⟨Hx, y⟩ and induced norm ‖x‖_H = √⟨Hx, x⟩ and the space E_L with inner-product ⟨x, y⟩ and norm ‖x‖_L = √⟨Lx, x⟩. Further, let H = L and define f_h : H_H → R and f_l : E_L → R that satisfy f_h(x) = f_l(x) for all x ∈ R^n. We have already shown that (39) is equivalent to (40) on any Hilbert space, in particular on H_H. By definition of the gradient, ∇f_l and ∇f_h must satisfy ⟨∇f_l(y), x − y⟩ = ⟨∇f_h(y), x − y⟩_H = ⟨H∇f_h(y), x − y⟩ for all x, y ∈ R^n. This implies that ∇f_h = H^{-1}∇f_l = L^{-1}∇f_l. Therefore, that (39) holds for f_l on E_L is equivalent to that it holds for f_h on H_H. Further,
‖∇f_h(x) − ∇f_h(y)‖_H = ‖L^{-1}(∇f_l(x) − ∇f_l(y))‖_L = ‖∇f_l(x) − ∇f_l(y)‖_{L^{-1}}.
So that (40) holds for f_l on E_L is equivalent to that it holds for f_h on H_H. This concludes the proof.