# Envelope Functions: Unifications and Further Properties


## Abstract

Forward–backward and Douglas–Rachford splitting are methods for structured nonsmooth optimization. With the aim of using smooth optimization techniques for nonsmooth problems, the forward–backward and Douglas–Rachford envelopes were recently proposed. Under specific problem assumptions, these envelope functions have favorable smoothness and convexity properties, and their stationary points coincide with the fixed-points of the underlying algorithm operators. This allows for solving such nonsmooth optimization problems by minimizing the corresponding smooth convex envelope function. In this paper, we present a general envelope function that unifies and generalizes existing ones. We provide properties of the general envelope function that sharpen corresponding known results for the special cases. We also present a new interpretation of the underlying methods as majorization–minimization algorithms applied to their respective envelope functions.

## Keywords

First-order methods · Envelope functions · Nonsmooth optimization · Smooth reformulations · Large-scale optimization

## Mathematics Subject Classification

90C30 · 47J25

## 1 Introduction

Many convex optimization problems can be reformulated into a problem of finding a fixed-point of a nonexpansive operator. This is the basis for many first-order optimization algorithms such as: forward–backward splitting [1], Douglas–Rachford splitting [2, 3], the alternating direction method of multipliers (ADMM) [4, 5, 6] and its linearized versions [7], the three-operator splitting method [8], and (generalized) alternating projections [9, 10, 11, 12, 13, 14].

In these methods, a fixed-point is found by performing an averaged iteration of the nonexpansive mapping. This scheme guarantees global convergence, but the rate of convergence can be slow. A well studied approach for improving practical convergence—that has proven very successful in practice—is preconditioning of the problem data; see, e.g., [15, 16, 17, 18, 19, 20, 21] for a limited selection of such methods. The underlying idea is to incorporate static second-order information in the respective algorithms.

The performance of the forward–backward and the Douglas–Rachford methods can be further improved by exploiting the properties of the recently proposed forward–backward envelope [22, 23] and Douglas–Rachford envelope [24]. As shown in [22, 23, 24], the stationary points of these envelope functions agree with the fixed-points of the corresponding algorithm operator. Under certain assumptions, they have favorable properties such as convexity and Lipschitz continuity of the gradient. These properties make it possible to solve nonsmooth problems by finding a stationary point of a smooth and convex envelope function. In [22, 23], truncated Newton methods and quasi-Newton methods are applied to the forward–backward envelope function to improve local convergence. While this paper was under review, these works were extended to the nonconvex setting in [25, 26] for both forward–backward splitting and Douglas–Rachford splitting.

A unifying property of forward–backward and Douglas–Rachford splitting (for convex optimization) is that they are averaged iterations of a nonexpansive mapping. This mapping is composed of two nonexpansive mappings that are gradients of functions. Based on this observation, we present a general envelope function that has the forward–backward envelope and the Douglas–Rachford envelope as special cases. Other special cases include the Moreau envelope and the ADMM envelope [27], since they are special cases of the forward–backward and Douglas–Rachford envelopes respectively. We also explicitly characterize the relationship between the ADMM and Douglas–Rachford envelopes as being essentially the negatives of each other.

The analyses of the envelope functions in [22, 23, 24] require, translated to our setting, that one of the functions defining the nonexpansive operators in the composition is twice continuously differentiable. In this paper, we analyze the proposed general envelope function in the more restrictive setting of this twice continuously differentiable function being quadratic, or equivalently its gradient being affine. We show that if the Hessian matrix of this function is nonsingular, the stationary points of the envelope coincide with the fixed-points of the nonexpansive operator. We provide sharp quadratic upper and lower bounds for the envelope function that improve corresponding results for the known special cases in the literature. One implication of these bounds is that the gradient of the envelope function is Lipschitz continuous with constant two. If, in addition, the aforementioned Hessian matrix is positive semidefinite, the envelope function is convex, implying that a fixed-point of the nonexpansive operator can be found by minimizing a smooth and convex envelope function.

We also provide an interpretation of the basic averaged fixed-point iteration as a majorization–minimization step on the envelope function. We show that the majorizing function is a quadratic upper bound, which is slightly more conservative than the provided sharp quadratic upper bound. We also note that using the sharp quadratic upper bound as majorizing function would result in computationally more expensive algorithm iterations.

Our contributions are as follows: (i) we propose a general envelope function that has several known envelope functions as special cases, (ii) we provide properties of the general envelope that sharpen (sometimes considerably) and generalize corresponding known results for the special cases, (iii) we provide an interpretation of the basic averaged iteration as a suboptimal majorization–minimization step on the envelope, and (iv) we provide new insights on the relation between the Douglas–Rachford envelope and the ADMM envelope.

## 2 Preliminaries

### 2.1 Notation

We denote by \(\mathbb {R}\) the set of real numbers, \(\mathbb {R}^n\) the set of real *n*-dimensional vectors, and \(\mathbb {R}^{m\times n}\) the set of real \(m\times n\)-matrices. Further \(\overline{\mathbb {R}}:=\mathbb {R}\cup \{\infty \}\) denotes the extended real line. We denote inner-products on \(\mathbb {R}^n\) by \(\langle \cdot ,\cdot \rangle \) and their induced norms by \(\Vert \cdot \Vert \). We define the scaled norm \(\Vert x\Vert _P:=\sqrt{\langle Px,x\rangle }\), where *P* is a positive definite operator (defined in Definition 2.2). We will use the same notation for scaled semi-norms, i.e., \(\Vert x\Vert _P:=\sqrt{\langle Px,x\rangle }\), where *P* is a positive semidefinite operator (defined in Definition 2.1). The identity operator is denoted by \(\mathrm {Id}\). The conjugate function is denoted and defined by \(f^{*}(y)\triangleq \sup _{x}\left\{ \langle y,x\rangle -f(x)\right\} \). The adjoint operator to a linear operator \(L:\mathbb {R}^n\rightarrow \mathbb {R}^m\) is defined as the unique operator \(L^*:\mathbb {R}^m\rightarrow \mathbb {R}^n\) that satisfies \(\langle Lx,y\rangle =\langle x,L^*y\rangle \). The linear operator \(L:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is self-adjoint if \(L=L^*\). The notation \({\mathrm{argmin}}_x f(x)\) refers to any element that minimizes *f*. Finally, \(\iota _C\) denotes the indicator function for the set *C* that satisfies \(\iota _C(x)=0\) if \(x\in C\) and \(\iota _C(x)=\infty \) if \(x\not \in C\).

### 2.2 Background

In this section, we introduce some standard definitions that can be found, e.g., in [28, 29].

#### 2.2.1 Operator Properties

### Definition 2.1

(*Positive semidefinite*) A linear operator \(L:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is *positive semidefinite*, if it is self-adjoint and all eigenvalues \(\lambda _i(L)\ge 0\).

### Remark 2.1

An equivalent characterization of a positive semidefinite operator is that \(\langle Lx,x\rangle \ge 0\) for all \(x\in \mathbb {R}^n\).

### Definition 2.2

(*Positive definite*) A linear operator \(L:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is *positive definite*, if it is self-adjoint and if all eigenvalues \(\lambda _i(L)\ge m\) with \(m>0\).

### Remark 2.2

An equivalent characterization of a positive definite operator *L* is that \(\langle Lx,x\rangle \ge m\Vert x\Vert ^2\) for some \(m>0\) and all \(x\in \mathbb {R}^n\).

### Definition 2.3

(*Lipschitz continuous*) A mapping \(T:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is \(\delta \)-*Lipschitz continuous* with \(\delta \ge 0\) if \(\Vert Tx-Ty\Vert \le \delta \Vert x-y\Vert \) for all \(x,y\in \mathbb {R}^n\). If \(\delta =1\), then *T* is *nonexpansive*, and if \(\delta \in [0,1[\), then *T* is \(\delta \)-*contractive*.

### Definition 2.4

(*Averaged*) A mapping \(T:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is \(\alpha \)-*averaged* if there exists a nonexpansive mapping \(S:\mathbb {R}^n\rightarrow \mathbb {R}^n\) and an \(\alpha \in ]0,1]\) such that \(T=(1-\alpha )\mathrm {Id}+\alpha S\).

### Definition 2.5

(*Negatively averaged*) A mapping \(T:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is \(\beta \)-*negatively averaged* with \(\beta \in ]0,1]\) if \(-T\) is \(\beta \)-averaged.

### Remark 2.3

For notational convenience, we have included \(\alpha =1\) and \(\beta =1\) in the definitions of (negative) averagedness, which both are equivalent to nonexpansiveness. For values of \(\alpha \in ]0,1[\) and \(\beta \in ]0,1[\) averagedness is a stronger property than nonexpansiveness. For more on negatively averaged operators, see [21] where they were introduced.

If a gradient operator \(\nabla f\) is \(\alpha \)-averaged and \(\beta \)-negatively averaged, then it must hold that \(\alpha +\beta \ge 1\). This follows immediately from Lemma 3.1.

### Definition 2.6

(*Cocoerciveness*) A mapping \(T:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is \(\delta \)-*cocoercive* with \(\delta > 0\) if \(\delta T\) is \(\tfrac{1}{2}\)-averaged.

### Remark 2.4

A \(\delta \)-cocoercive operator *T* can be expressed as \(T=\tfrac{1}{2\delta }(\mathrm {Id}+S)\), where *S* is a nonexpansive operator. Therefore, 1-cocoercivity is equivalent to \(\tfrac{1}{2}\)-averagedness (which is also called firm nonexpansiveness).
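As an illustration (our own, not from the paper), the proximal operator of a proper, closed, and convex function is firmly nonexpansive; a quick numerical check for soft-thresholding, the prox of \(\gamma \Vert \cdot \Vert _1\):

```python
import numpy as np

def soft_threshold(x, gamma):
    # prox of gamma*||.||_1: shrink each component toward zero
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

rng = np.random.default_rng(0)
gamma = 0.7
for _ in range(1000):
    x, y = rng.normal(size=5), rng.normal(size=5)
    px, py = soft_threshold(x, gamma), soft_threshold(y, gamma)
    # firm nonexpansiveness (1/2-averagedness):
    # ||Tx - Ty||^2 <= <Tx - Ty, x - y>
    assert np.dot(px - py, px - py) <= np.dot(px - py, x - y) + 1e-12
print("soft-thresholding is firmly nonexpansive on all sampled pairs")
```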

#### 2.2.2 Function Properties

### Definition 2.7

(*Strongly convex*) Let \(P:\mathbb {R}^n\rightarrow \mathbb {R}^n\) be positive definite. A proper and closed function \(f:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) is \(\sigma \)-*strongly convex* w.r.t. \(\Vert \cdot \Vert _P\) with \(\sigma >0\) if \(f-\tfrac{\sigma }{2}\Vert \cdot \Vert _P^2\) is convex.

### Remark 2.5

If *f* is differentiable, \(\sigma \)-strong convexity w.r.t. \(\Vert \cdot \Vert _P\) can equivalently be defined by the requirement that \(f(y)\ge f(x)+\langle \nabla f(x),y-x\rangle +\tfrac{\sigma }{2}\Vert y-x\Vert _P^2\) for all \(x,y\in \mathbb {R}^n\). If \(P=\mathrm {Id}\), this reduces to the standard definition that *f* is \(\sigma \)-strongly convex. If \(\sigma =0\), the function is convex.

There are many smoothness definitions for functions in the literature. We will use the following, which describes the existence of majorizing and minimizing quadratic functions.

### Definition 2.8

(*Smooth*) Let \(P:\mathbb {R}^n\rightarrow \mathbb {R}^n\) be positive semidefinite. A function \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) is \(\beta \)-*smooth* w.r.t. \(\Vert \cdot \Vert _P\) with \(\beta \ge 0\) if it is differentiable and \(\vert f(y)-f(x)-\langle \nabla f(x),y-x\rangle \vert \le \tfrac{\beta }{2}\Vert y-x\Vert _P^2\) for all \(x,y\in \mathbb {R}^n\).

#### 2.2.3 Connections

Suppose that *f* is differentiable and satisfies the quadratic lower and upper bounds
\(\tfrac{1}{2}\Vert y-x\Vert _M^2\le f(y)-f(x)-\langle \nabla f(x),y-x\rangle \le \tfrac{1}{2}\Vert y-x\Vert _L^2\) (4)
for all \(x,y\in \mathbb {R}^n\), where *M* and *L* are self-adjoint linear operators. Depending on the properties of *M* and *L*, we get different properties of *f* and its gradient \(\nabla f\). Some of these are stated below. The results follow immediately from Lemma D.2 in Appendix D and the definitions of smoothness and strong convexity in Definitions 2.7 and 2.8, respectively.

### Proposition 2.1

Assume that \(L=-M=\beta I\) with \(\beta \ge 0\) in (4). Then, (4) is equivalent to \(\nabla f\) being \(\beta \)-Lipschitz continuous.

### Proposition 2.2

Assume that \(M=\sigma I\) and \(L=\beta I\) with \(0\le \sigma \le \beta \) in (4). Then, (4) is equivalent to \(\nabla f\) being \(\beta \)-Lipschitz continuous and *f* being \(\sigma \)-strongly convex.

### Proposition 2.3

Assume that \(L=-M\) and that *L* is positive definite. Then, (4) is equivalent to *f* being 1-smooth w.r.t. \(\Vert \cdot \Vert _L\).

### Proposition 2.4

Assume that *M* and *L* are positive definite. Then, (4) is equivalent to *f* being 1-smooth w.r.t. \(\Vert \cdot \Vert _L\) and 1-strongly convex w.r.t. \(\Vert \cdot \Vert _M\).
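For a convex quadratic, the constants in Proposition 2.2 are the extreme eigenvalues of the Hessian; a quick numerical check (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 6
C = rng.normal(size=(n, n))
H = C @ C.T + 0.5 * np.eye(n)        # positive definite Hessian of f(x) = <Hx, x>/2
beta  = np.linalg.eigvalsh(H).max()  # Lipschitz constant of grad f
sigma = np.linalg.eigvalsh(H).min()  # strong convexity modulus

for _ in range(200):
    x, y = rng.normal(size=n), rng.normal(size=n)
    gx, gy = H @ x, H @ y
    # grad f is beta-Lipschitz ...
    assert np.linalg.norm(gx - gy) <= beta * np.linalg.norm(x - y) + 1e-9
    # ... and f is sigma-strongly convex
    assert np.dot(gx - gy, x - y) >= sigma * np.dot(x - y, x - y) - 1e-9
print("quadratic f: grad f is beta-Lipschitz and f is sigma-strongly convex")
```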

## 3 Envelope Function

In [22, 24], the forward–backward and Douglas–Rachford envelope functions are proposed. Under certain problem data assumptions, these envelope functions have favorable properties; they are convex, they have Lipschitz continuous gradients, and their minimizers are fixed-points of the nonexpansive operator *S* that defines the respective algorithms. In this section, we will present a general envelope function that has the forward–backward and Douglas–Rachford envelopes as special cases. We will also provide properties of the general envelope that are sharper than what is known for the special cases.

We assume that the nonexpansive operator *S* that defines the algorithm is a composition of \(S_1\) and \(S_2\), i.e., \(S=S_2S_1\), where \(S_1\) and \(S_2\) satisfy the following basic assumptions (that sometimes will be sharpened or relaxed).

### Assumption 3.1

- (i)
\(S_1:\mathbb {R}^n\rightarrow \mathbb {R}^n\) and \(S_2:\mathbb {R}^n\rightarrow \mathbb {R}^n\) are nonexpansive.

- (ii)
\(S_1=\nabla f_1\) and \(S_2=\nabla f_2\) for some differentiable functions \(f_1:\mathbb {R}^n\rightarrow \mathbb {R}\) and \(f_2:\mathbb {R}^n\rightarrow \mathbb {R}\).

- (iii)
\(f_1:\mathbb {R}^n\rightarrow \mathbb {R}\) is twice continuously differentiable.

Under Assumption 3.1, we define the envelope function as
\(F(x):=\langle S_1x,x\rangle -f_1(x)-f_2(S_1x)\). (5)
Since \(S_1=\nabla f_1\), the gradient of the envelope is given by
\(\nabla F(x)=\nabla ^2f_1(x)\left( x-S_2S_1x\right) \). (6)
If the Hessian \(\nabla ^2f_1(x)\) is nonsingular for all *x*, then the set of stationary points of the envelope coincides with the fixed-points of \(S_2S_1\).

### Proposition 3.1

Suppose that Assumption 3.1 holds and that \(\nabla ^2f_1(x)\) is nonsingular for all \(x\in \mathbb {R}^n\). Then, \(\bar{x}\in \mathbb {R}^n\) is a stationary point of *F* if and only if \(\bar{x}=S_2S_1\bar{x}\).
### Proof

The statement follows trivially from (6). \(\square \)

In Sect. 4, we show that the forward–backward and Douglas–Rachford envelopes are special cases of (5). In this section, we will provide properties of the general envelope under the following restriction to Assumption 3.1.

### Assumption 3.2

Suppose that Assumption 3.1 holds and that, in addition, \(S_1:\mathbb {R}^n\rightarrow \mathbb {R}^n\) is affine, i.e., \(S_1x=Px+q\) and \(f_1(x)=\tfrac{1}{2}\langle Px,x\rangle +\langle q,x\rangle \), where \(P\in \mathbb {R}^{n\times n}\) is a self-adjoint nonexpansive linear operator and \(q\in \mathbb {R}^n\).

### Remark 3.1

That *P* is a self-adjoint nonexpansive linear operator means that it is symmetric with eigenvalues in the interval \([-1,1]\).

### 3.1 Basic Properties of the Envelope Function

Under Assumption 3.2, the envelope function in (5) and its gradient reduce to
\(F(x)=\tfrac{1}{2}\langle Px,x\rangle -f_2(S_1x)\) and \(\nabla F(x)=P\left( x-S_2S_1x\right) \). (7)
The following two results are special cases and direct corollaries of a more general result in Theorem 3.1, to be presented later. Proofs are therefore omitted.

### Proposition 3.2

Suppose that Assumption 3.2 holds. Then, the gradient \(\nabla F\) of the envelope function *F* is 2-Lipschitz continuous. That is, \(\nabla F\) satisfies \(\Vert \nabla F(x)-\nabla F(y)\Vert \le 2\Vert x-y\Vert \) for all \(x,y\in \mathbb {R}^n\).

### Proposition 3.3

Suppose that Assumption 3.2 holds and that *P*, that defines the linear part of \(S_1\), is positive semidefinite. Then, *F* is convex.

If *P* is positive semidefinite, then the envelope function *F* is convex and differentiable with a Lipschitz continuous gradient. This implies, e.g., that all stationary points are minimizers. If *P* is positive definite we know from Proposition 3.1 that the set of stationary points coincides with the fixed-point set of \(S=S_2S_1\). Therefore, a fixed-point to \(S_2S_1\) can be found by minimizing the smooth convex envelope function *F*.
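As a numerical sketch of this observation (our own illustration, not from the paper, and assuming that in the affine case the envelope gradient takes the form \(\nabla F(x)=P(x-S_2S_1x)\)): plain gradient descent on *F* converges to a fixed-point of \(S_2S_1\) when *P* is positive definite. Here \(S_2\) is a soft-thresholding operator, which is a proximal map and hence the gradient of a convex function:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
B = rng.normal(size=(n, n)); B = (B + B.T) / 2
P = 0.6 * np.eye(n) + 0.3 * B / np.linalg.norm(B, 2)  # symmetric, eigenvalues in [0.3, 0.9]
q = 0.1 * rng.normal(size=n)

S1 = lambda x: P @ x + q                                       # affine, P positive definite
S2 = lambda x: np.sign(x) * np.maximum(np.abs(x) - 0.05, 0.0)  # prox of 0.05*||.||_1

# assumed affine-case envelope gradient: grad F(x) = P (x - S2(S1(x)))
grad_F = lambda x: P @ (x - S2(S1(x)))

x = np.zeros(n)
for _ in range(2000):
    x = x - 0.5 * grad_F(x)   # plain gradient descent on the envelope

# a stationary point of F is a fixed-point of S2 S1 (P is nonsingular)
residual = np.linalg.norm(x - S2(S1(x)))
assert residual < 1e-6
print("fixed-point residual:", residual)
```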

### 3.2 Finer Properties of the Envelope Function

In this section, we establish sharp upper and lower bounds for the envelope function (7). These results use stronger assumptions on \(S_2\) than nonexpansiveness, namely that \(S_2\) is \(\alpha \)-averaged and \(\beta \)-negatively averaged:

### Assumption 3.3

The operator \(S_2\) is \(\alpha \)-averaged and \(\beta \)-negatively averaged with \(\alpha \in ]0,1]\) and \(\beta \in ]0,1]\).

Before we proceed, we state a result on how averaged and negatively averaged gradient operators can equivalently be characterized. The result is proven in Appendix A.

### Lemma 3.1

Suppose that *f* is differentiable. Then, \(\nabla f\) is \(\alpha \)-averaged with \(\alpha \in ]0,1]\) and \(\beta \)-negatively averaged with \(\beta \in ]0,1]\) if and only if
\(-\tfrac{\delta _{\alpha }}{2}\Vert y-x\Vert ^2\le f(y)-f(x)-\langle \nabla f(x),y-x\rangle \le \tfrac{\delta _{\beta }}{2}\Vert y-x\Vert ^2\)
holds for all \(x,y\in \mathbb {R}^n\), where \(\delta _{\alpha }:=2\alpha -1\) and \(\delta _{\beta }:=2\beta -1\).

These properties relate to smoothness and strong convexity properties of *f*. More precisely, they imply that *f* is \(\max ((2\alpha -1),(2\beta -1))\)-smooth and, if \(\alpha <\tfrac{1}{2}\), \((1-2\alpha )\)-strongly convex. With this interpretation in mind, we state the main theorem.

### Theorem 3.1

Suppose that Assumptions 3.2 and 3.3 hold. Then, the envelope function *F* in (7) satisfies, for all \(x,y\in \mathbb {R}^n\),
\(\tfrac{1}{2}\langle (P-\delta _{\beta }P^2)(y-x),y-x\rangle \le F(y)-F(x)-\langle \nabla F(x),y-x\rangle \le \tfrac{1}{2}\langle (P+\delta _{\alpha }P^2)(y-x),y-x\rangle .\)

A proof of this result is found in “Appendix B”.

Utilizing connections established in Sect. 2.2.3, we next derive different properties of the envelope function. Especially, we provide conditions under which the envelope function is convex and strongly convex.

### Corollary 3.1

Suppose that Assumptions 3.2 and 3.3 hold and that *P* is positive semidefinite. Then, *F* is convex and 1-smooth w.r.t. \(\Vert \cdot \Vert _{P+\delta _{\alpha } P^2}\). If, in addition, *P* is positive definite and either of the following holds:

- (i)
  *P* is contractive,
- (ii)
  \(\beta \in ]0,1[\), i.e., \(\delta _{\beta }\in ]-1,1[\),

then *F* is 1-strongly convex w.r.t. \(\Vert \cdot \Vert _{P-\delta _{\beta }P^2}\) and 1-smooth w.r.t. \(\Vert \cdot \Vert _{P+\delta _{\alpha } P^2}\).

### Proof

The results follow from Theorem 3.1, the definition of (strong) convexity, and by utilizing Lemma D.3 in “Appendix D” to show that the smallest eigenvalue of \(P-\delta _{\beta }P^2\) is nonnegative and positive, respectively. \(\square \)

Less sharp, but unscaled, versions of these bounds can easily be obtained from Theorem 3.1.

### Corollary 3.2

Suppose that Assumptions 3.2 and 3.3 hold. Then, *F* satisfies
\(\tfrac{\beta _l}{2}\Vert y-x\Vert ^2\le F(y)-F(x)-\langle \nabla F(x),y-x\rangle \le \tfrac{\beta _u}{2}\Vert y-x\Vert ^2\)
for all \(x,y\in \mathbb {R}^n\), where \(\beta _l:=\lambda _{\min }(P-\delta _{\beta }P^2)\) and \(\beta _u:=\lambda _{\max }(P+\delta _{\alpha }P^2)\). Values of \(\beta _l\) and \(\beta _u\) for different assumptions on *P*, \(\delta _{\alpha }\) and \(\delta _{\beta }\) can be obtained from Lemma D.3 in “Appendix D”.

The results in Theorem 3.1 and its corollaries are stated for \(\alpha \)-averaged and \(\beta \)-negatively averaged operators \(S_2=\nabla f_2\). Using Lemmas 3.1 and D.2, we conclude that \(\delta \)-contractive operators are \(\alpha \)-averaged and \(\beta \)-negatively averaged with \(\alpha \) and \(\beta \) satisfying \(\delta =\delta _{\alpha }=\delta _{\beta }\). This gives the following result.

### Proposition 3.4

Suppose that Assumption 3.2 holds and that \(S_2\) is \(\delta \)-Lipschitz continuous with \(\delta \in [0,1]\). Then, all results in this section hold with \(\delta _{\beta }\) and \(\delta _{\alpha }\) replaced by \(\delta \).

### Proposition 3.5

Suppose that Assumption 3.2 holds and that \(S_2\) is \(\tfrac{1}{\delta }\)-cocoercive with \(\delta \in ]0,1]\). Then, all results in this section hold with \(\delta _{\beta }=\delta \) and \(\delta _{\alpha }=0\).

### 3.3 Majorization–Minimization Interpretation of Averaged Iteration

As noted in [22, 24], the forward–backward and Douglas–Rachford splitting methods are variable metric gradient methods applied to their respective envelope functions. In our setting, with \(S_1\) being affine, they reduce to fixed-metric scaled gradient methods. In this section, we provide a different interpretation. We show that a step in the basic iteration is obtained by performing majorization–minimization on the envelope. The majorizing function is closely related to the upper bound provided in Corollary 3.1.

Throughout this subsection, we assume that *P* is positive definite, besides being nonexpansive. This implies that the envelope is convex, see Corollary 3.1. It is straightforward to verify that \(P+\delta _{\alpha }P^2\preceq (1+\delta _{\alpha })P\). Therefore, we can construct the following more conservative upper bound to the envelope, compared to Corollary 3.1:
\(F(y)\le F(x)+\langle \nabla F(x),y-x\rangle +\tfrac{1+\delta _{\alpha }}{2}\Vert y-x\Vert _P^2.\) (11)

Minimizing this majorizing function at every iteration *k* gives
\(x^{k+1}={\mathrm{argmin}}_y\left\{ F(x^k)+\langle \nabla F(x^k),y-x^k\rangle +\tfrac{1+\delta _{\alpha }}{2}\Vert y-x^k\Vert _P^2\right\} =x^k-\tfrac{1}{1+\delta _{\alpha }}P^{-1}\nabla F(x^k)=x^k-\tfrac{1}{1+\delta _{\alpha }}(x^k-S_2S_1x^k),\)
which is an averaged iteration of \(S_2S_1\). It is also a scaled gradient step on the envelope with metric *P* and step-length \(\tfrac{1}{1+\delta _{\alpha }}\). A standard gradient method allows for step-lengths in \(]0,\tfrac{2}{L}[\), where *L* is a Lipschitz constant. In this case, the upper bound (11) guarantees a Lipschitz constant to \(\nabla F\) of \(L=1+\delta _{\alpha }\) in the \(\Vert \cdot \Vert _P\)-norm, see Lemma D.2. Selecting a step-length within the allowed range yields an averaged iteration with \(\tfrac{1}{1+\delta _{\alpha }}\) replaced by \(\alpha \in ]0,\tfrac{2}{1+\delta _{\alpha }}[\).
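The equality between the majorization–minimization step and the averaged iteration can be checked numerically (our own sketch, assuming the affine-case gradient \(\nabla F(x)=P(x-S_2S_1x)\) and taking \(\delta _{\alpha }=1\), i.e., \(S_2\) only assumed nonexpansive):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
B = rng.normal(size=(n, n)); B = (B + B.T) / 2
P = 0.6 * np.eye(n) + 0.3 * B / np.linalg.norm(B, 2)  # symmetric, eigenvalues in [0.3, 0.9]
q = 0.1 * rng.normal(size=n)

S1 = lambda x: P @ x + q
S2 = lambda x: np.tanh(x)    # nonexpansive gradient (of x -> sum(log(cosh(x))))
delta_alpha = 1.0            # S2 only assumed nonexpansive (alpha = 1)

xk = rng.normal(size=n)
gradF = P @ (xk - S2(S1(xk)))  # assumed affine-case envelope gradient

# minimizer of the quadratic majorizer
#   F(xk) + <gradF, y - xk> + (1 + delta_alpha)/2 * ||y - xk||_P^2,
# which solves (1 + delta_alpha) P (y - xk) = -gradF
y = xk + np.linalg.solve((1 + delta_alpha) * P, -gradF)

# the same point via the averaged iteration of S2 S1
z = xk - 1.0 / (1 + delta_alpha) * (xk - S2(S1(xk)))

assert np.allclose(y, z)
print("MM step equals averaged step:", np.allclose(y, z))
```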

None of these methods is necessarily the most efficient way to find a stationary point of the envelope function (or, equivalently, a fixed-point of \(S_2S_1\)). At least in the convex setting (for the envelope), there are numerous alternative methods for minimizing smooth functions, such as truncated Newton methods, quasi-Newton methods, and nonlinear conjugate gradient methods. See [31] for an overview of such methods and [22, 23] for some of these methods applied to the forward–backward envelope. Evaluating which ones are most efficient and devising new methods to improve performance is outside the scope of this paper.

## 4 Special Cases

In this section, we show that our envelope in (5) has four known special cases, namely the Moreau envelope [32], the forward–backward envelope [22, 23], the Douglas–Rachford envelope [24], and the ADMM envelope [27] (which is a special case of the Douglas–Rachford envelope).

We also show that our envelope bounds for \(S_1=\nabla f_1\) being affine coincide with or sharpen corresponding results in the literature for the special cases.

### 4.1 Algorithm Building Blocks

### Proposition 4.1

Let \(f:\mathbb {R}^n\rightarrow \overline{\mathbb {R}}\) be proper, closed, and convex and let \(\gamma >0\). Define \(r_{\gamma f}:=\gamma f+\tfrac{1}{2}\Vert \cdot \Vert ^2\). (12) Then, the proximal operator \(\mathrm{prox}_{\gamma f}(z):={\mathrm{argmin}}_x\left\{ f(x)+\tfrac{1}{2\gamma }\Vert x-z\Vert ^2\right\} \) satisfies \(\mathrm{prox}_{\gamma f}=\nabla r_{\gamma f}^*\), and the reflected proximal operator \(R_{\gamma f}:=2\mathrm{prox}_{\gamma f}-\mathrm {Id}\) satisfies \(R_{\gamma f}=\nabla p_{\gamma f}\), where \(p_{\gamma f}:=2r_{\gamma f}^*-\tfrac{1}{2}\Vert \cdot \Vert ^2\). (14)

This proximal map interpretation is from [33, Theorems 31.5, 16.4] and implies that the proximal operator is the gradient of a convex function. The reflected proximal operator interpretation follows trivially from the prox interpretation.

### 4.2 The Proximal Point Algorithm

The proximal point algorithm finds a minimizer of a proper, closed, and convex function *f* and is given by the iteration
\(x^{k+1}=\mathrm{prox}_{\gamma f}(x^k).\)

The corresponding envelope function is a scaled version of the Moreau envelope, which is a smooth approximation of *f* that has the same set of minimizers as *f* itself.

The Moreau envelope is a special case of the general envelope *F* in (7). The scaling factor is \(\gamma ^{-1}\) and the Moreau envelope \(f^{\gamma }\) is obtained by letting \(S_1x=\nabla f_1(x)=x\), i.e., \(P=\mathrm {Id}\) and \(q=0\), and \(f_2=r_{\gamma f}^*\) in (7), where \(r_{\gamma f}\) is defined in (12):
\(f^{\gamma }(x)=\gamma ^{-1}F(x)=\gamma ^{-1}\left( \tfrac{1}{2}\Vert x\Vert ^2-r_{\gamma f}^*(x)\right) .\) (16)

### Proposition 4.2

The Moreau envelope \(f^{\gamma }\) in (16) is differentiable and convex and \(\nabla f^{\gamma }\) is \(\gamma ^{-1}\)-Lipschitz continuous.

This coincides with previously known properties of the Moreau envelope, see [28, Chapter 12].
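As a concrete illustration (our own, not from the paper), the Moreau envelope of the absolute value is the Huber function, and its gradient \(\nabla f^{\gamma }(x)=\gamma ^{-1}(x-\mathrm{prox}_{\gamma f}(x))\) is \(\gamma ^{-1}\)-Lipschitz; a quick numerical check:

```python
import numpy as np

gamma = 0.5
prox_abs = lambda x: np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)  # prox of gamma*|.|

def moreau_env(x):
    # f^gamma(x) = f(prox(x)) + ||x - prox(x)||^2 / (2 gamma) with f = |.|
    p = prox_abs(x)
    return np.abs(p) + (x - p) ** 2 / (2 * gamma)

def huber(x):
    # known closed form of the Moreau envelope of |.|
    return np.where(np.abs(x) <= gamma, x**2 / (2 * gamma), np.abs(x) - gamma / 2)

xs = np.linspace(-3, 3, 601)
assert np.allclose(moreau_env(xs), huber(xs))

# gradient of the Moreau envelope: (x - prox(x))/gamma, Lipschitz with constant 1/gamma
grads = (xs - prox_abs(xs)) / gamma
slopes = np.abs(np.diff(grads) / np.diff(xs))
assert slopes.max() <= 1 / gamma + 1e-9
print("Moreau envelope of |x| matches Huber; gradient Lipschitz constant <= 1/gamma")
```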

### 4.3 Forward–Backward Splitting

Forward–backward splitting solves composite problems of the form to minimize \(f(x)+g(x)\), where \(f:\mathbb {R}^n\rightarrow \mathbb {R}\) is convex with an *L*-Lipschitz (or equivalently \(\tfrac{1}{L}\)-cocoercive) gradient, and \(g:\mathbb {R}^n\rightarrow \mathbb {R}\cup \{\infty \}\) is proper, closed, and convex. The algorithm is given by the iteration
\(x^{k+1}=\mathrm{prox}_{\gamma g}(x^k-\gamma \nabla f(x^k)).\)

The forward–backward envelope is a special case of the general envelope *F* in (5) and applies when *f* is twice continuously differentiable. The scaling factor is \(\gamma ^{-1}\) and the forward–backward envelope is obtained by letting \(f_1=\tfrac{1}{2}\Vert \cdot \Vert ^2-\gamma f\) and \(f_2=r_{\gamma g}^*\) in (5), where \(r_{\gamma g}\) is defined in (12). The resulting forward–backward envelope function is
\(F_{\gamma }^\mathrm{{FB}}(x)=\gamma ^{-1}F(x)=f(x)-\tfrac{\gamma }{2}\Vert \nabla f(x)\Vert ^2+g^{\gamma }(x-\gamma \nabla f(x)).\)

#### 4.3.1 \(S_1\) Affine

We provide properties of the forward–backward envelope in the more restrictive setting of \(S_1=\nabla f_1=(\mathrm {Id}-\gamma \nabla f)\) being affine. This applies when *f* is a convex quadratic, \(f(x)=\tfrac{1}{2}\langle Hx,x\rangle +\langle h,x\rangle \) with \(H\in \mathbb {R}^{n\times n}\) positive semidefinite and \(h\in \mathbb {R}^n\). Then, \(S_1x=Px+q\) with \(P=(\mathrm {Id}-\gamma H)\) and \(q=-\gamma h\).

In this setting, the following result follows immediately from Corollary 3.1 and Proposition 3.5 (where Proposition 3.5 is invoked since \(S_2=\mathrm{{prox}}_{\gamma g}\) is 1-cocoercive, see Remark 2.4 and [28, Proposition 12.27]).

### Proposition 4.3

Assume that \(f(x)=\tfrac{1}{2}\langle Hx,x\rangle +\langle h,x\rangle \) with \(H\in \mathbb {R}^{n\times n}\) positive semidefinite and that \(\gamma \in ]0,\tfrac{1}{L}[\), where \(L=\lambda _{\max }(H)\). Then, the forward–backward envelope \(F_{\gamma }^\mathrm{{FB}}\) is \(\gamma ^{-1}\)-smooth w.r.t. \(\Vert \cdot \Vert _P\) and \(\gamma ^{-1}\)-strongly convex w.r.t. \(\Vert \cdot \Vert _{P-P^2}\), where \(P=\mathrm {Id}-\gamma H\).
Less tight bounds for the forward–backward envelope are provided next. These follow immediately from the above and Lemma D.3.

### Proposition 4.4

Assume that \(f(x)=\tfrac{1}{2}\langle Hx,x\rangle +\langle h,x\rangle \), that \(\gamma \in ]0,\tfrac{1}{L}[\) where \(L=\lambda _{\max }(H)\), and that \(m=\lambda _{\min }(H)\ge 0\). Then, the forward–backward envelope \(F_{\gamma }^\mathrm{{FB}}\) is \(\gamma ^{-1}(1-\gamma m)\)-smooth and \(\min \left( (1-\gamma m)m,(1-\gamma L)L\right) \)-strongly convex (both w.r.t. the induced norm \(\Vert \cdot \Vert \)).

This result is a less tight version of Proposition 4.3, but is a slight improvement of the corresponding result in [22, Theorem 2.3]. The strong convexity moduli are the same, but our smoothness constant is a factor two smaller.
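As an illustration (our own, not from the paper), the forward–backward envelope can be evaluated directly from its definition on a small lasso-type problem, and two standard facts can be checked numerically: for \(\gamma <\tfrac{1}{L}\) the envelope decreases along forward–backward iterations, and it equals \(f+g\) at a fixed-point:

```python
import numpy as np

rng = np.random.default_rng(3)
m, n = 20, 8
A, b = rng.normal(size=(m, n)), rng.normal(size=m)
lam = 0.5
H = A.T @ A
L = np.linalg.norm(H, 2)
gamma = 0.9 / L

f      = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad_f = lambda x: A.T @ (A @ x - b)
g      = lambda x: lam * np.sum(np.abs(x))
prox_g = lambda z: np.sign(z) * np.maximum(np.abs(z) - gamma * lam, 0.0)

def g_moreau(z):
    p = prox_g(z)
    return g(p) + np.sum((z - p) ** 2) / (2 * gamma)

def fbe(x):
    # forward-backward envelope: f - (gamma/2)||grad f||^2 + g^gamma(x - gamma grad f)
    gf = grad_f(x)
    return f(x) - gamma / 2 * np.sum(gf ** 2) + g_moreau(x - gamma * gf)

x = np.zeros(n)
vals = []
for _ in range(3000):
    vals.append(fbe(x))
    x = prox_g(x - gamma * grad_f(x))   # forward-backward step

# FBE decreases along the iteration and equals f+g at the fixed point
assert all(vals[k + 1] <= vals[k] + 1e-10 for k in range(len(vals) - 1))
assert abs(fbe(x) - (f(x) + g(x))) < 1e-8
print("FBE is monotone along FB iterations and FBE(x*) = f(x*) + g(x*)")
```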

### 4.4 Douglas–Rachford Splitting

Douglas–Rachford splitting solves problems of the form to minimize \(f(x)+g(x)\), where *f* and *g* are proper, closed, and convex. The Douglas–Rachford envelope is a special case of the general envelope *F* in (5) and applies when *f* is twice continuously differentiable and \(\nabla f\) is Lipschitz continuous. The scaling factor is \((2\gamma )^{-1}\) and the Douglas–Rachford envelope is obtained by, in (5), letting \(f_1=p_{\gamma f}\) with gradient \(\nabla f_1=S_1=R_{\gamma f}\) and \(f_2 = p_{\gamma g}\), where \(p_{\gamma g}\) is defined in (14). The Douglas–Rachford envelope function becomes
\(F_{\gamma }^\mathrm{{DR}}(x)=(2\gamma )^{-1}F(x)=(2\gamma )^{-1}\left( \langle R_{\gamma f}x,x\rangle -p_{\gamma f}(x)-p_{\gamma g}(R_{\gamma f}x)\right) .\)

#### 4.4.1 \(S_1\) Affine

We provide properties of the Douglas–Rachford envelope in the more restrictive setting of \(S_1=R_{\gamma f}\) being affine. This applies for convex quadratic *f*:
\(f(x)=\tfrac{1}{2}\langle Hx,x\rangle +\langle h,x\rangle ,\)
where *H* is positive semidefinite and \(h\in \mathbb {R}^n\). The operator \(S_1\) becomes
\(S_1x=R_{\gamma f}x=2\mathrm{prox}_{\gamma f}(x)-x=2(\mathrm {Id}+\gamma H)^{-1}(x-\gamma h)-x.\)
We identify *P* and *q* through the relation \(S_1=R_{\gamma f}=P(\cdot )+q\), and note that they are given by the expressions \(P=2(\mathrm {Id}+\gamma H)^{-1}-\mathrm {Id}\) and \(q=-2\gamma (\mathrm {Id}+\gamma H)^{-1} h\), respectively.
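The identification of *P* and *q* above can be verified numerically (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
C = rng.normal(size=(n, n))
H = C @ C.T + 0.1 * np.eye(n)   # positive definite quadratic term
h = rng.normal(size=n)
gamma = 0.3

def prox_f(x):
    # prox of gamma*f for quadratic f: solves (I + gamma H) y = x - gamma h
    return np.linalg.solve(np.eye(n) + gamma * H, x - gamma * h)

reflected = lambda x: 2 * prox_f(x) - x   # R_{gamma f}

Iinv = np.linalg.inv(np.eye(n) + gamma * H)
P = 2 * Iinv - np.eye(n)
q = -2 * gamma * Iinv @ h

x = rng.normal(size=n)
assert np.allclose(reflected(x), P @ x + q)

# eigenvalues of P are (1 - gamma*lam)/(1 + gamma*lam), so P is contractive for H > 0
eig = np.linalg.eigvalsh(P)
assert -1 < eig.min() and eig.max() < 1
print("R_{gamma f}x = Px + q verified; eigenvalues of P lie in ]-1, 1[")
```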

In this setting, the following result follows immediately from Corollary 3.1 since \(S_2=R_{\gamma g}\) is nonexpansive (1-averaged and 1-negatively averaged).

### Proposition 4.5

Assume that \(f(x)=\tfrac{1}{2}\langle Hx,x\rangle +\langle h,x\rangle \) with \(H\in \mathbb {R}^{n\times n}\) positive semidefinite and that \(\gamma \in ]0,\tfrac{1}{L}[\), where \(L=\lambda _{\max }(H)\). Then, the Douglas–Rachford envelope \(F_{\gamma }^\mathrm{{DR}}\) is \((2\gamma )^{-1}\)-smooth w.r.t. \(\Vert \cdot \Vert _{P+P^2}\) and \((2\gamma )^{-1}\)-strongly convex w.r.t. \(\Vert \cdot \Vert _{P-P^2}\), where \(P=2(\mathrm {Id}+\gamma H)^{-1}-\mathrm {Id}\).
The following less tight characterization of the Douglas–Rachford envelope follows from the above and Lemma D.3.

### Proposition 4.6

Assume that \(f(x)=\tfrac{1}{2}\langle Hx,x\rangle +\langle h,x\rangle \), that \(\gamma \in ]0,\tfrac{1}{L}[\), where \(L=\lambda _{\max }(H)\), and that \(m=\lambda _{\min }(H)\ge 0\). Then, the Douglas–Rachford envelope \(F_{\gamma }^\mathrm{{DR}}\) is \(\tfrac{1-\gamma m}{(1+\gamma m)^2}\gamma ^{-1}\)-smooth and \(\min \left( \tfrac{(1-\gamma m) m}{(1+\gamma m)^2},\tfrac{(1-\gamma L)L}{(1+\gamma L)^2}\right) \)-strongly convex.

This result is more conservative than the one in Proposition 4.5, but improves on [24, Theorem 2]. The strong convexity modulus coincides with the corresponding one in [24, Theorem 2]. The smoothness constant is \(\tfrac{1}{1+\gamma m}\) times that in [24, Theorem 2], i.e., it is slightly smaller.

### 4.5 ADMM

### Lemma 4.1

Before we state the result, we show that the \(z^k\) sequence in (primal) Douglas–Rachford (20) and the \(v^k\) sequence in ADMM (i.e., dual Douglas–Rachford) in (23) differ by a factor only. This is well known [35], but the relation is stated next with a simple proof.

### Proposition 4.7

Assume that \(\rho >0\) and \(\gamma >0\) satisfy \(\rho ^{-1}=\gamma \) and that \(z^0 = \rho ^{-1}v^0\). Then, \(z^k=\rho ^{-1}v^{k}\) for all \(k\ge 1\), where \(\{z^k\}\) is the primal Douglas–Rachford sequence defined in (20) and \(\{v^k\}\) is the ADMM sequence defined in (23).

### Proof

There is also a tight relationship between the ADMM and Douglas–Rachford envelopes. Essentially, they have opposite signs.

### Proposition 4.8

### Proof

Suppose that *H* is positive definite on the nullspace of *A*. From Propositions 4.5 and 4.6, we conclude that, for an appropriate choice of \(\rho \), the ADMM envelope is convex, which implies that the Douglas–Rachford envelope is concave.

### Remark 4.1

## 5 Conclusions

We have presented an envelope function that unifies the Moreau envelope, the forward–backward envelope, the Douglas–Rachford envelope, and the ADMM envelope. We have provided quadratic upper and lower bounds for the envelope that coincide with or improve on corresponding results in the literature for the special cases. We have also provided a novel interpretation of the underlying algorithms as being majorization–minimization algorithms applied to their respective envelopes. Finally, we have shown how the ADMM and DR envelopes relate to each other.

## Notes

### Acknowledgements

Pontus Giselsson and Mattias Fält are financially supported by the Swedish Foundation for Strategic Research and members of the LCCC Linnaeus Center at Lund University. Pontus Giselsson is also financed by the Swedish Research Council. The reviewers are gratefully acknowledged for useful comments that have considerably improved the paper.

## References

- 1. Combettes, P.L.: Solving monotone inclusions via compositions of nonexpansive averaged operators. Optimization **53**(5–6), 475–504 (2004)
- 2. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. **82**, 421–439 (1956)
- 3. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. **16**(6), 964–979 (1979)
- 4. Gabay, D., Mercier, B.: A dual algorithm for the solution of nonlinear variational problems via finite element approximation. Comput. Math. Appl. **2**(1), 17–40 (1976)
- 5. Glowinski, R., Marroco, A.: Sur l'approximation, par éléments finis d'ordre un, et la résolution, par pénalisation-dualité d'une classe de problèmes de Dirichlet non linéaires. ESAIM: Math. Model. Numer. Anal. **9**, 41–76 (1975)
- 6. Boyd, S., Parikh, N., Chu, E., Peleato, B., Eckstein, J.: Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. **3**(1), 1–122 (2011)
- 7. Chambolle, A., Pock, T.: A first-order primal–dual algorithm for convex problems with applications to imaging. J. Math. Imag. Vis. **40**(1), 120–145 (2011)
- 8. Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications (2015). arXiv:1504.01032
- 9. Gubin, L.G., Polyak, B.T., Raik, E.V.: The method of projections for finding the common point of convex sets. USSR Comput. Math. Math. Phys. **7**(6), 1–24 (1967)
- 10. Agmon, S.: The relaxation method for linear inequalities. Can. J. Math. **6**(3), 382–392 (1954)
- 11. Motzkin, T.S., Shoenberg, I.: The relaxation method for linear inequalities. Can. J. Math. **6**(3), 383–404 (1954)
- 12. Eremin, I.I.: Generalization of the Motzkin–Agmon relaxation method. Usp. Mat. Nauk **20**(2), 183–188 (1965)
- 13. Bregman, L.M.: Finding the common point of convex sets by the method of successive projection. Dokl. Akad. Nauk SSSR **162**(3), 487–490 (1965)
- 14. von Neumann, J.: Functional Operators. Volume II. The Geometry of Orthogonal Spaces. Annals of Mathematics Studies. Princeton University Press, Princeton (1950). (Reprint of 1933 lecture notes)
- 15. Benzi, M.: Preconditioning techniques for large linear systems: a survey. J. Comput. Phys. **182**(2), 418–477 (2002)
- 16. Bramble, J.H., Pasciak, J.E., Vassilev, A.T.: Analysis of the inexact Uzawa algorithm for saddle point problems. SIAM J. Numer. Anal. **34**(3), 1072–1092 (1997)
- 17. Hu, Q., Zou, J.: Nonlinear inexact Uzawa algorithms for linear and nonlinear saddle-point problems. SIAM J. Optim. **16**(3), 798–825 (2006)
- 18. Ghadimi, E., Teixeira, A., Shames, I., Johansson, M.: Optimal parameter selection for the alternating direction method of multipliers (ADMM): quadratic problems. IEEE Trans. Autom. Control **60**(3), 644–658 (2015)
- 19. Giselsson, P., Boyd, S.: Metric selection in fast dual forward–backward splitting. Automatica **62**, 1–10 (2015)
- 20. Giselsson, P., Boyd, S.: Linear convergence and metric selection for Douglas–Rachford splitting and ADMM. IEEE Trans. Autom. Control **62**(2), 532–544 (2017)
- 21. Giselsson, P.: Tight global linear convergence rate bounds for Douglas–Rachford splitting. J. Fixed Point Theory Appl. (2017). https://doi.org/10.1007/s11784-017-0417-1
- 22. Patrinos, P., Stella, L., Bemporad, A.: Forward–backward truncated Newton methods for convex composite optimization (2014). arXiv:1402.6655
- 23. Stella, L., Themelis, A., Patrinos, P.: Forward–backward quasi-Newton methods for nonsmooth optimization problems. Comput. Optim. Appl. **67**(3), 443–487 (2017)
- 24. Patrinos, P., Stella, L., Bemporad, A.: Douglas–Rachford splitting: complexity estimates and accelerated variants. In: Proceedings of the 53rd IEEE Conference on Decision and Control, pp. 4234–4239. Los Angeles, CA (2014)
- 25. Themelis, A., Stella, L., Patrinos, P.: Forward–backward envelope for the sum of two nonconvex functions: further properties and nonmonotone line-search algorithms (2016). arXiv:1606.06256
- 26. Themelis, A., Stella, L., Patrinos, P.: Douglas–Rachford splitting and ADMM for nonconvex optimization: new convergence results and accelerated versions (2017). arXiv:1709.05747
- 27. Pejcic, I., Jones, C.N.: Accelerated ADMM based on accelerated Douglas–Rachford splitting. In: 2016 European Control Conference (ECC), pp. 1952–1957 (2016)
- 28. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. Springer, New York (2011)
- 29. Rockafellar, R.T., Wets, R.J.B.: Variational Analysis. Springer, Berlin (1998)
- 30. Nesterov, Y.: Introductory Lectures on Convex Optimization: A Basic Course, 1st edn. Springer, Dordrecht (2003)
- 31. Nocedal, J., Wright, S.: Numerical Optimization. Springer Series in Operations Research and Financial Engineering, 2nd edn. Springer, New York (2006)
- 32. Moreau, J.J.: Proximité et dualité dans un espace hilbertien. Bulletin de la Société Mathématique de France **93**, 273–299 (1965)
- 33. Rockafellar, R.T.: Convex Analysis, vol. 28. Princeton University Press, Princeton (1970)
- 34. Gabay, D.: Applications of the method of multipliers to variational inequalities. In: Fortin, M., Glowinski, R. (eds.) Augmented Lagrangian Methods: Applications to the Solution of Boundary-Value Problems. North-Holland, Amsterdam (1983)
- 35. Eckstein, J.: Splitting methods for monotone operators with applications to parallel optimization. Ph.D. thesis, MIT (1989)
- 36. Giselsson, P., Fält, M., Boyd, S.: Line search for averaged operator iteration (2016). arXiv:1603.06772
- 37. Giselsson, P., Fält, M., Boyd, S.: Line search for averaged operator iteration. In: Proceedings of the 55th IEEE Conference on Decision and Control. Las Vegas, USA (2016)
- 38. Clarke, F.: Optimization and Nonsmooth Analysis. Wiley, New York (1983)
- 39. Sion, M.: On general minimax theorems. Pac. J. Math. **8**(1), 171–176 (1958)

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.