Finding the Forward-Douglas–Rachford-Forward Method

Abstract

We consider the monotone inclusion problem with a sum of 3 operators, in which 2 are monotone and 1 is monotone-Lipschitz. The classical Douglas–Rachford and forward–backward–forward methods, respectively, solve the monotone inclusion problem with a sum of 2 monotone operators and a sum of 1 monotone and 1 monotone-Lipschitz operators. We first present a method that naturally combines Douglas–Rachford and forward–backward–forward and show that it solves the 3-operator problem under further assumptions, but fails in general. We then present a method that naturally combines Douglas–Rachford and forward–reflected–backward, a recently proposed alternative to forward–backward–forward by Malitsky and Tam (A forward–backward splitting method for monotone inclusions without cocoercivity, 2018. arXiv:1808.04162). We show that this second method solves the 3-operator problem generally, without further assumptions.

Introduction

We consider the monotone inclusion problem of finding a zero of the sum of 2 maximal monotone and 1 Lipschitz-monotone operators. The classical Douglas–Rachford (DR) splitting by Lions and Mercier [1] solves the problem with a sum of 2 maximal monotone operators. The classical forward–backward–forward (FBF) splitting by Tseng [2] solves the problem with a sum of 1 maximal monotone and 1 monotone-Lipschitz operators. We consider the generalization of the setups of DR and FBF.

Recently, there has been much work developing splitting methods combining and unifying classical ones. Another classical method is forward–backward (FB) splitting [3, 4], which solves the problem with a sum of 1 monotone and 1 cocoercive operators. The effort of combining DR and FB was started by Raguet et al. [5, 6], extended by Briceño-Arias [7], and completed by Davis and Yin [8] as they proved convergence for the sum of 2 monotone and 1 cocoercive operators. FB and FBF were combined by Briceño-Arias and Davis [9] as they proved convergence for 1 monotone, 1 cocoercive, and 1 monotone-Lipschitz operators. These combined splitting methods can efficiently solve monotone inclusion problems with more complex structure.

On the other hand, DR and FBF have not been fully combined, to the best of our knowledge. Banert’s relaxed forward–backward (in the thesis [10]) and Briceño-Arias’s forward–partial inverse–forward [11] combine DR and FBF in the setup where one operator is a normal cone operator with respect to a closed subspace. However, neither method applies to the general setup with 2 maximal monotone and 1 Lipschitz-monotone operators.

In this work, we first present a method that naturally combines and unifies DR and FBF. We prove convergence under further assumptions, and we prove, through a counterexample, that convergence cannot be established in full generality. We then propose a second method that naturally combines and unifies DR and forward–reflected–backward (FRB), a recently proposed alternative to FBF by Malitsky and Tam [12]. We show that this combination of DR and FRB does converge in full generality.

The paper is organized as follows. Section 2 states the problem formally. Section 3 reviews preliminary information and sets up the notation. Section 4 presents our first proposed method combining DR and FBF, proves convergence under certain further assumptions, and proves divergence in the fully general case. Section 5 presents our second proposed method combining DR and FRB and proves convergence in the fully general case. Section 6 compares our presented method with other similar and relevant methods.

Problem Statement, Contribution, and Prior Work

Consider the monotone inclusion problem

$$\begin{aligned} \text{ find } x\in {\mathcal {H}}\quad \hbox {such that}\quad 0\in Ax+Bx+Cx, \end{aligned}$$
(1)

where \({\mathcal {H}}\) is a real Hilbert space. Throughout, we assume for some \(\mu \in ]0,\infty [\):

$$\begin{aligned} A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\text { and } B:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\text { are maximal monotone.} \end{aligned}$$
(A1)
$$\begin{aligned} C:{\mathcal {H}}\rightarrow {\mathcal {H}}\text { is monotone and }\mu \text {-Lipschitz continuous.} \end{aligned}$$
(A2)
$$\begin{aligned} \text {zer}(A+B+C)\text { is not empty.} \end{aligned}$$
(A3)

Let \(J_{\gamma A}\), \(J_{\gamma B}\), and \(J_{\gamma C}\), respectively, denote the resolvents with respect to A, B, and C with parameter \(\gamma \). We informally assume \(J_{\gamma A}(x)\), \(J_{\gamma B}(x)\), and C(x) can be evaluated efficiently for any input \(x\in {\mathcal {H}}\). However, \(J_{\gamma C}(x)\) may be difficult to evaluate. Therefore, we restrict our attention to methods that activate C through direct evaluations, rather than through \(J_{\gamma C}\). The monotone-Lipschitz operator C of Problem (1) arises, for example, as a skew linear operator in primal-dual optimization [13, 14] and in saddle point problems [15].

When \(C=0\), we can use the classical Douglas–Rachford (DR) splitting presented by Lions and Mercier [1]:

$$\begin{aligned} x_{n+1}&=J_{\gamma B}(z_{n})\\ y_{n+1}&=J_{\gamma A}(2x_{n+1}-z_{n})\\ z_{n+1}&=z_{n}+y_{n+1}-x_{n+1}, \end{aligned}$$

where the step size parameter satisfies \(\gamma \in \left]0,+\infty \right[\). When \(B=0\), we can use the classical forward–backward–forward (FBF) splitting by Tseng [2]:

$$\begin{aligned} y_{n+1}&=J_{\gamma A}(x_{n}-\gamma Cx_{n})\\ x_{n+1}&=y_{n+1} -\gamma (Cy_{n+1}-Cx_{n}), \end{aligned}$$

where the step size parameter satisfies \(\gamma \in \left]0,1/\mu \right[\). Recently, Malitsky and Tam [12] have proposed forward–reflected–backward (FRB) splitting, another method for the case \(B=0\):

$$\begin{aligned} x_{n+1}=J_{\gamma A}(x_{n}-\gamma (2Cx_{n}-Cx_{n-1})), \end{aligned}$$

where the step size parameter satisfies \(\gamma \in \left]0,1/(2\mu )\right[\).
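These three iterations are simple enough to run directly. The following sketch applies DR to \(0\in Ax+Bx\) and FBF/FRB to \(0\in Ax+Cx\), each with a step size in the stated range; the scalar operators A = N_{[0,∞)}, B(x) = x − 2, and C(x) = x − 2 are our illustrative choices, not from the paper. All three runs converge to the solution x = 2.

```python
# Toy operators on the real line (illustrative choices, not from the paper):
# A = normal cone of [0, inf), so J_{gA} is the projection onto [0, inf);
# B(x) = x - 2, so J_{gB}(z) = (z + 2g)/(1 + g);
# C(x) = x - 2, monotone and mu-Lipschitz with mu = 1.
J_A = lambda v: max(v, 0.0)
J_B = lambda z, g: (z + 2 * g) / (1 + g)
C = lambda x: x - 2.0

# Douglas-Rachford for 0 in Ax + Bx (solution x = 2); any gamma > 0 works.
g, z = 0.5, 5.0
for _ in range(200):
    x = J_B(z, g)
    y = J_A(2 * x - z)
    z = z + y - x

# Forward-backward-forward for 0 in Ax + Cx (solution x = 2); gamma < 1/mu.
g, x_fbf = 0.5, 5.0
for _ in range(200):
    y = J_A(x_fbf - g * C(x_fbf))
    x_fbf = y - g * (C(y) - C(x_fbf))

# Forward-reflected-backward for 0 in Ax + Cx; gamma < 1/(2 mu).
g, x_frb, x_prev = 0.4, 5.0, 5.0
for _ in range(400):
    x_new = J_A(x_frb - g * (2 * C(x_frb) - C(x_prev)))
    x_prev, x_frb = x_frb, x_new
```

Note that FRB reuses \(Cx_{n-1}\) from the previous iteration, so it needs only one evaluation of C per iteration, while FBF needs two.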

The contribution of this work is the study of splitting methods combining DR with other methods to incorporate an additional monotone-Lipschitz operator. We characterize to what extent \(\hbox {DR}+\hbox {FBF}\) works and to what extent it fails. We then demonstrate that \(\hbox {DR}+\hbox {FRB}\) is a successful combination.

Several other 3-operator splitting methods have been presented in recent years. Combettes and Pesquet’s PPXA [13], Boţ–Hendrich [16], Latafat and Patrinos’s AFBA [17], and Ryu’s 3-operator resolvent-splitting [18] solve the problem with 3 or more monotone operators by activating the operators through their individual resolvents. Condat–Vũ [19, 20], FDR [5,6,7,8], and Yan’s PD3O [21] solve the problem with 2 monotone and 1 cocoercive operators by activating the 2 monotone operators through their resolvents and the cocoercive operator through forward evaluations. FBHF [9] solves the problem with 1 monotone, 1 cocoercive, and 1 monotone-Lipschitz operators by activating the monotone operator through its resolvent and the cocoercive and monotone-Lipschitz operators through forward evaluations. These methods do not apply to our setup since we have 2 monotone operators, which we activate through their resolvents, and 1 monotone-Lipschitz operator, which we activate through forward evaluations.

The primal-dual method by Combettes and Pesquet [13] and the instances of projective splitting by Johnstone and Eckstein [22, 23] are existing methods that do solve Problem (1). However, these methods do not reduce to DR. We compare the form of these methods in Sect. 6.

While this paper was under review, there have been exciting developments on splitting methods based on cutting planes (separating hyperplanes): Warped proximal iterations by Bùi and Combettes [24] and NOFOB by Giselsson [25] are general frameworks that can solve Problem (1). The methods that arise from these frameworks are different from the methods we present.

Preliminaries

In this section, we quickly review known results and set up the notation. The notation and results we discuss are standard, and interested readers can find further information in [26, 27].

Write \({\mathcal {H}}\) for a real Hilbert space and, respectively, write \(\left\langle {\cdot }, {\cdot } \right\rangle \) and \(\Vert {\cdot }\Vert \) for its associated scalar product and norm. Write \({\text {Id}}:{\mathcal {H}}\rightarrow {\mathcal {H}}\) for the identity operator. Write \(A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}\) to denote that A is a set-valued operator. For simplicity, we also write \(Ax := A(x)\). When A maps a point to a singleton, we also write \(Ax = y\) instead of \(Ax = \{y\}\). Write \({\text {dom}}(A):=\big \{{x\in {\mathcal {H}}}: {Ax\not ={\varnothing }}\big \}\) for the domain of A and \({\text {ran}}(A) := \big \{{u\in {\mathcal {H}}}: {(\exists x\in {\mathcal {H}})\, u\in Ax }\big \}\) for the range of A. Write \({\text {gra}}(A) := \big \{{(x,u)\in {\mathcal {H}}\times {\mathcal {H}}}: {u\in Ax}\big \}\) for the graph of A. The inverse of A is the set-valued operator defined by \(A^{-1}:u \mapsto \big \{{x}: { u\in Ax}\big \}\). The zero set of A is \({\text {zer}}(A) := A^{-1}0\). We say that A is monotone if

$$\begin{aligned} \big (\forall (x,u), (y,v)\in {\text {gra}}A\big ) \quad \left\langle {x-y}, {u-v} \right\rangle \ge 0, \end{aligned}$$

and it is maximally monotone if there exists no monotone operator B such that \({\text {gra}}(B)\) properly contains \({\text {gra}}(A)\). The resolvent of A is \( J_A:=({\text {Id}}+ A)^{-1}\). When A is maximal monotone, \(J_A\) is single-valued and \({\text {dom}}J_A={\mathcal {H}}\). A single-valued operator \(B:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\kappa \)-cocoercive for \(\kappa \in ]0,\infty [\) if

$$\begin{aligned} (\forall x,y\in {\mathcal {H}})\quad \left\langle {x-y}, {Bx-By} \right\rangle \ge \kappa \Vert Bx-By\Vert ^2. \end{aligned}$$

A single-valued operator \(C:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\mu \)-Lipschitz for \(\mu \in ]0,\infty [\) if

$$\begin{aligned} \left( \forall x,y\in {\mathcal {H}}\right) \quad \Vert Cx-Cy\Vert \le \mu \Vert x-y\Vert . \end{aligned}$$

A single-valued operator \(R:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is nonexpansive if

$$\begin{aligned} \left( \forall x,y\in {\mathcal {H}}\right) \quad \Vert Rx-Ry\Vert \le \Vert x-y\Vert , \end{aligned}$$

i.e., if R is 1-Lipschitz. Let \(\theta \in \left[ 0,1\right] \). A single-valued operator \(T:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\theta \)-averaged if \(T=(1-\theta ){\text {Id}}+\theta R\) for some nonexpansive operator R. Define the normal cone operator with respect to a nonempty closed convex set \(C\subseteq {\mathcal {H}}\) as

$$\begin{aligned} N_C(x) = \left\{ \begin{array}{l@{\quad }l} {\varnothing }, &{} \text {if}\; x \not \in C\\ \big \{{ y\in {\mathcal {H}}}: {\left\langle {y}, {z-x} \right\rangle \le 0~\forall z\in C}\big \}, &{} \text {if}\; x \in C. \end{array}\right. \end{aligned}$$

The Cauchy–Schwarz inequality states \(\left\langle {u}, {v} \right\rangle \le \Vert u\Vert \Vert v\Vert \) for any \(u,v\in {\mathcal {H}}\). Young’s inequality states

$$\begin{aligned} \left\langle {u}, {v} \right\rangle \le \frac{\eta }{2}\Vert u\Vert ^2+\frac{1}{2\eta }\Vert v\Vert ^2, \end{aligned}$$

for any \(u,v\in {\mathcal {H}}\) and \(\eta >0\).

Lemma 3.1

If \(C:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\mu \)-Lipschitz continuous, then \({\text {Id}}-\gamma C\) is one-to-one for \(\gamma \in ]0,1/\mu [\).

Proof

Although this result follows immediately from the machinery of scaled relative graphs [28], we provide a proof based on first principles. Let \(x,y\in {\mathcal {H}}\). Then \( \Vert Cx-Cy\Vert \le \mu \Vert x-y\Vert \) and

$$\begin{aligned} \Vert ({\text {Id}}-\gamma C)x-({\text {Id}}-\gamma C)y\Vert \ge \Vert x-y\Vert -\gamma \Vert Cx-Cy\Vert \ge (1-\gamma \mu )\Vert x-y\Vert , \end{aligned}$$

by Cauchy–Schwarz and \(\mu \)-Lipschitz continuity. Thus, since \(1-\gamma \mu >0\), we have \(({\text {Id}}-\gamma C)x=({\text {Id}}-\gamma C)y\) if and only if \(x=y\). \(\square \)
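The lower bound in the proof can be checked numerically. A minimal sketch, using our illustrative choice of a skew (hence monotone) μ-Lipschitz rotation for C:

```python
import numpy as np

# Illustrative operator: C(x) = K x with K skew, so C is monotone,
# and ||K|| = mu, so C is mu-Lipschitz. We verify the injectivity bound
# ||(Id - gC)x - (Id - gC)y|| >= (1 - g*mu) ||x - y|| from Lemma 3.1.
mu, g = 2.0, 0.4                      # g in ]0, 1/mu[
K = mu * np.array([[0.0, 1.0], [-1.0, 0.0]])
C = lambda x: K @ x

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    lhs = np.linalg.norm((x - g * C(x)) - (y - g * C(y)))
    assert lhs >= (1 - g * mu) * np.linalg.norm(x - y) - 1e-12
```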

A classical result states that \(J_{B}\) is (1/2)-averaged if B is maximal monotone [26, Proposition 23.8]. The following lemma states that \(J_{B}\) is furthermore \(\theta \)-averaged with \(\theta <1/2\) if B is cocoercive.

Lemma 3.2

Let \(\gamma ,\kappa \in ]0,\infty [\). If \(B:{\mathcal {H}}\rightarrow {\mathcal {H}}\) is \(\kappa \)-cocoercive, then \(J_{\gamma B}\) is \(\frac{1}{2(1+\kappa /\gamma )}\)-averaged and

$$\begin{aligned} \Vert J_{\gamma B}x-J_{\gamma B}y\Vert ^2\le \Vert x-y \Vert ^2 - \left( 1 + \frac{2\kappa }{\gamma }\right) \Vert ({\text {Id}}-J_{\gamma B})x- ({\text {Id}}-J_{\gamma B})y \Vert ^2. \end{aligned}$$

for any \(x,y\in {\mathcal {H}}\).

Proof

Although this result follows immediately from the machinery of scaled relative graphs [28], we provide a proof based on first principles. Let \(u=J_{\gamma B}x\) and \(v=J_{\gamma B}y\), i.e., \(\gamma ^{-1}(x-u)= Bu\) and \(\gamma ^{-1}(y-v)= Bv\). Since B is \(\kappa \)-cocoercive, we have

$$\begin{aligned} \kappa \Vert \gamma ^{-1}(x-u) -\gamma ^{-1}(y-v) \Vert ^2 \le \left\langle {u-v}, {\gamma ^{-1}(x-u) -\gamma ^{-1}(y-v)} \right\rangle . \end{aligned}$$

This implies

$$\begin{aligned} (\kappa /\gamma ) \Vert x-u -y+v \Vert ^2&\le \left\langle {u-v}, {x-u -y+v} \right\rangle \\&= \left\langle {u-v}, {x-y} \right\rangle - \Vert u-v\Vert ^2\\&= -\frac{1}{2} \Vert u-v\Vert ^2 +\frac{1}{2} \Vert x-y\Vert ^2 - \frac{1}{2}\Vert x-u -y+v \Vert ^2, \end{aligned}$$

which proves the stated inequality. Finally, this inequality is equivalent to \(\frac{1}{2(1+\kappa /\gamma )}\)-averagedness of \(J_{\gamma B}\) by [26, Proposition 4.35]. \(\square \)
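The inequality of Lemma 3.2 can also be checked numerically. A sketch with the illustrative choice B = 2·Id, which is κ-cocoercive with κ = 1/2 (for this linear B the inequality in fact holds with equality):

```python
import numpy as np

# Illustrative cocoercive operator: B = L*Id with L = 2 is (1/L)-cocoercive,
# i.e., kappa = 0.5, and its resolvent is J_{gB}(x) = x/(1 + g*L).
# We verify the inequality of Lemma 3.2 on random pairs of points.
L, kappa, g = 2.0, 0.5, 1.0
JB = lambda x: x / (1 + g * L)

rng = np.random.default_rng(1)
for _ in range(1000):
    x, y = rng.standard_normal(3), rng.standard_normal(3)
    lhs = np.linalg.norm(JB(x) - JB(y)) ** 2
    rhs = (np.linalg.norm(x - y) ** 2
           - (1 + 2 * kappa / g)
           * np.linalg.norm((x - JB(x)) - (y - JB(y))) ** 2)
    assert lhs <= rhs + 1e-9   # equality for this linear B
```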

\(\hbox {FBF}+\hbox {DR}\): Convergent with Further Assumptions

To solve Problem (1), we propose the following iteration

$$\begin{aligned} x_{n+1}&= J_{\gamma B}z_n\\ y_{n+1}&= J_{\gamma A}(2x_{n+1}- z_n -\gamma Cx_{n+1})\\ z_{n+1}&= z_n+y_{n+1}-x_{n+1}-\gamma (Cy_{n+1}-Cx_{n+1}) \end{aligned}$$
(FDRF)

for \(n=0,1,\ldots \) where \(z_0\in {\mathcal {H}}\) is a starting point and \(\gamma >0\). We call this method forward-Douglas–Rachford-forward (FDRF) splitting as it combines Tseng’s FBF [2] and Douglas–Rachford [1]; FDRF reduces to FBF when \(B=0\) and to DR when \(C=0\).
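As a quick illustration of the iteration, the following sketch runs FDRF on a toy instance that satisfies condition (i) of Theorem 4.1 below; the operators A = N_{[0,∞)}, B = Id (1-cocoercive), and C(x) = x − 2 (monotone, μ = 1) and the step size are our illustrative choices, not from the paper. The unique zero of A + B + C is x = 1.

```python
# Toy FDRF run (illustrative operators, not from the paper):
# A = normal cone of [0, inf), so J_{gA} is the projection onto [0, inf);
# B(x) = x, which is 1-cocoercive, so J_{gB}(z) = z/(1 + g);
# C(x) = x - 2, monotone and 1-Lipschitz. Solution of 0 in Ax+Bx+Cx: x = 1.
g = 0.5                          # g < min{kappa, sqrt(2/3)/mu} = sqrt(2/3)
J_A = lambda v: max(v, 0.0)
J_B = lambda z: z / (1 + g)
C = lambda x: x - 2.0

z = 4.0
for _ in range(200):
    x = J_B(z)
    y = J_A(2 * x - z - g * C(x))
    z = z + y - x - g * (C(y) - C(x))
```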

We can view FDRF as a fixed-point iteration \(z_{n+1}=Tz_n\) with

$$\begin{aligned} T:= ({\text {Id}}-\gamma C) J_{\gamma A}(2J_{\gamma B} -{\text {Id}}- \gamma C J_{\gamma B}) +{\text {Id}}- ({\text {Id}}-\gamma C) J_{\gamma B}. \end{aligned}$$

The following result states that T is a fixed-point encoding for Problem (1).

Lemma 4.1

Assume (A1) and (A2). If \(\gamma \in ]0,1/\mu [\), then

$$\begin{aligned} {\text {zer}}(A+B+C) = J_{\gamma B}({\text {Fix}}(T)), \end{aligned}$$

where \({\text {Fix}}(T):=\big \{{x\in {\mathcal {H}}}: {Tx=x}\big \}\).

Proof

Let \(x\in {\text {zer}}(A+B+C)\). Then, there exists \(u\in Ax\) and \(v\in Bx\) such that \(0= u+v+Cx\). It follows from \(v\in Bx\) that \(x = J_{\gamma B}z\) where \(z= x+\gamma v\). We have \(2 J_{\gamma B}z-z-\gamma CJ_{\gamma B}z= 2x-z-\gamma Cx = x+\gamma u\in ({\text {Id}}+\gamma A)x\) and \(x= J_{\gamma A}(2 J_{\gamma B}z-z-\gamma CJ_{\gamma B}z)\). Therefore,

$$\begin{aligned} ({\text {Id}}-\gamma C)J_{\gamma B}z = ({\text {Id}}-\gamma C) J_{\gamma A}(2 J_{\gamma B}z-z-\gamma CJ_{\gamma B}z), \end{aligned}$$

which shows that \(Tz=z\) and \({\text {zer}}(A+B+C) \subset J_{\gamma B}({\text {Fix}}(T))\). Now, let \(z\in {\text {Fix}}(T)\). By Lemma 3.1, we have \(J_{\gamma B}z=J_{\gamma A}(2 J_{\gamma B}z-z-\gamma CJ_{\gamma B}z)\). Set \(x= J_{\gamma B}z\). Then, \(z-x\in \gamma Bx\) and \((2x-z-\gamma Cx)-x \in \gamma Ax\). Therefore, \(0\in Ax +Bx+Cx\) and hence \(J_{\gamma B}({\text {Fix}}(T))\subset {\text {zer}}(A+B+C)\). \(\square \)

Under further assumptions, FDRF’s \((x_n)_{n\in \mathbb N}\) sequence converges weakly to a solution of (1).

Theorem 4.1

Assume (A1), (A2), and (A3). If furthermore one of the following conditions holds

  1. (i)

    B is \(\kappa \)-cocoercive and \(\gamma \in \big ]0,\mu ^{-1}/\sqrt{1 +\gamma /(2\kappa )}\big [\), which is satisfied, for example, if \(0< \gamma < \min \{\kappa , \mu ^{-1}\sqrt{2/3}\}\).

  2. (ii)

    \(B= N_{V}\) and \(C = P_VC_1P_V\) for some closed vector space V and single-valued operator \(C_1:{\mathcal {H}}\rightarrow {\mathcal {H}}\) and \(\gamma \in ]0,1/\mu [\),

then \(z_n\rightharpoonup z_\star \in {\text {Fix}}(T)\) and \(x_n\rightharpoonup J_{\gamma B}z_\star \in {\text {zer}}(A+B+C)\) and \(y_n\rightharpoonup J_{\gamma B}z_\star \) for (FDRF).

Proof

Let \(z_\star \in {\text {Fix}}(T)\) and \(x_\star :=J_{\gamma B}z_\star \). Set \(u_{n+1} := x_{n+1} - z_n +z_\star -x_\star \) and we have

$$\begin{aligned} z_{n+1} - z_\star= & {} z_n + y_{n+1} - x_{n+1}- \gamma (Cy_{n+1}-Cx_{n+1}) - z_\star \\= & {} y_{n+1}-x_\star + \gamma (Cx_{n+1}-Cy_{n+1})-u_{n+1}. \end{aligned}$$

Expanding the square, we get

$$\begin{aligned} \Vert z_{n+1}-z_\star \Vert ^2&= \Vert y_{n+1} - x_\star +\gamma (Cx_{n+1}-Cy_{n+1})\Vert ^2 \nonumber \\&\quad -2\left\langle {y_{n+1} - x_\star +\gamma (Cx_{n+1}-Cy_{n+1})}, {u_{n+1}} \right\rangle + \Vert u_{n+1}\Vert ^2. \end{aligned}$$
(2)

We expand the first term to get

$$\begin{aligned}&\Vert y_{n+1} - x_\star +\gamma (Cx_{n+1}-Cy_{n+1})\Vert ^2 \nonumber \\&\quad = \Vert y_{n+1}-x_\star \Vert ^2 + 2\gamma \left\langle {y_{n+1}-x_\star }, {Cx_{n+1}-Cy_{n+1}} \right\rangle + \gamma ^2\Vert Cx_{n+1}-Cy_{n+1} \Vert ^2. \end{aligned}$$
(3)

Note

$$\begin{aligned}&2x_{n+1} -z_n - \gamma Cx_{n+1} -y_{n+1} \in \gamma Ay_{n+1},\\&x_\star -z_\star - \gamma Cx_\star \in \gamma Ax_\star . \end{aligned}$$

Since A and C are monotone, we have

$$\begin{aligned} 0\le & {} \left\langle {y_{n+1}-x_\star }, {2x_{n+1} -z_n - \gamma Cx_{n+1} -y_{n+1} -x_\star +z_\star +\gamma Cx_\star } \right\rangle \\= & {} \left\langle {y_{n+1}-x_\star }, { x_{n+1} -y_{n+1} -\gamma Cx_{n+1} + \gamma Cx_\star } \right\rangle + \left\langle {y_{n+1}-x_\star }, {u_{n+1}} \right\rangle \\\le & {} \left\langle {y_{n+1}-x_\star }, { x_{n+1} -y_{n+1}+ \gamma Cy_{n+1}-\gamma Cx_{n+1} } \right\rangle + \left\langle {y_{n+1}-x_\star }, {u_{n+1}} \right\rangle , \end{aligned}$$

which implies that

$$\begin{aligned}&2\gamma \left\langle {y_{n+1}-x_\star }, {Cx_{n+1} - Cy_{n+1}} \right\rangle \nonumber \\&\quad \le 2 \left\langle {y_{n+1}-x_\star }, {x_{n+1}-y_{n+1}} \right\rangle + 2\left\langle {y_{n+1}-x_\star }, {u_{n+1}} \right\rangle \nonumber \\&\quad = \Vert x_{n+1}-x_\star \Vert ^2 - \Vert y_{n+1}-x_\star \Vert ^2 - \Vert x_{n+1}-y_{n+1}\Vert ^2 + 2\left\langle {y_{n+1}-x_\star }, {u_{n+1}} \right\rangle . \end{aligned}$$
(4)

Combining (3) and (4), we get

$$\begin{aligned}&\Vert y_{n+1} - x_\star +\gamma (Cx_{n+1}-Cy_{n+1})\Vert ^2 \\&\quad \le \Vert x_{n+1}-x_\star \Vert ^2 - \Vert x_{n+1}-y_{n+1}\Vert ^2+ \gamma ^2\Vert Cx_{n+1}-Cy_{n+1} \Vert ^2\\&\quad \quad +2\left\langle {y_{n+1}-x_\star }, {u_{n+1}} \right\rangle . \end{aligned}$$

Applying this bound to (2), we get

$$\begin{aligned} \Vert z_{n+1}-z_\star \Vert ^2&\le \Vert x_{n+1}-x_\star \Vert ^2 - \Vert x_{n+1}-y_{n+1}\Vert ^2 + \gamma ^2\Vert Cx_{n+1}-Cy_{n+1} \Vert ^2 \nonumber \\&\quad -2\gamma \left\langle {Cx_{n+1}-Cy_{n+1}}, {u_{n+1}} \right\rangle + \Vert u_{n+1}\Vert ^2. \end{aligned}$$
(5)
  1. (i)

    We consider the case where B is \(\kappa \)-cocoercive. From (5), we get

    $$\begin{aligned} \Vert z_{n+1}-z_\star \Vert ^2&\le \Vert z_n-z_\star \Vert ^2- \Vert x_{n+1}-y_{n+1}\Vert ^2+ \gamma ^2\Vert Cx_{n+1}-Cy_{n+1} \Vert ^2 \nonumber \\&\quad -2\gamma \left\langle {Cx_{n+1}-Cy_{n+1}}, {u_{n+1}} \right\rangle -\frac{2\kappa }{\gamma }\Vert u_{n+1}\Vert ^2\nonumber \\&\le \Vert z_n-z_\star \Vert ^2- \Vert x_{n+1}-y_{n+1}\Vert ^2 +\gamma ^2 \left( 1 +{\frac{\gamma }{2\kappa (1-\varepsilon ')}}\right) \Vert Cx_{n+1}-C y_{n+1} \Vert ^2 \nonumber \\&\quad -\frac{2\kappa \varepsilon '}{\gamma }\Vert u_{n+1}\Vert ^2\nonumber \nonumber \\&\le \Vert z_n-z_\star \Vert ^2- \Vert x_{n+1}-y_{n+1}\Vert ^2 +\gamma ^2 \left( 1 +{\frac{\gamma }{2\kappa (1-\varepsilon ')}}\right) \mu ^2 \Vert x_{n+1}- y_{n+1} \Vert ^2 \nonumber \\&\quad -\frac{2\kappa \varepsilon '}{\gamma }\Vert u_{n+1}\Vert ^2\nonumber \\&= \Vert z_n-z_\star \Vert ^2- \varepsilon \Vert x_{n+1}-y_{n+1}\Vert ^2 -\frac{2\kappa \varepsilon '}{\gamma }\Vert u_{n+1}\Vert ^2, \end{aligned}$$
    (6)

    where \(0<\varepsilon '<1\). The first inequality follows from Lemma 3.2, the second inequality follows from Young’s inequality, the third inequality follows from \(\mu \)-Lipschitz continuity of C, and the final equality follows from the definition \(\varepsilon :=1-\gamma ^2 \left( 1 +{\frac{\gamma }{2\kappa (1-\varepsilon ')}}\right) \mu ^2\). We choose \(\varepsilon '>0\) small enough so that \(\varepsilon >0\).

  2. (ii)

    If \(B= N_V\) and \(C = P_V C_1 P_V\), then

    $$\begin{aligned}&\left\langle {Cx_{n+1}-Cy_{n+1}}, {u_{n+1}} \right\rangle \\&\quad = \left\langle {C_1P_V x_{n+1} - C_1 P_V y_{n+1}}, {P_V(x_{n+1}-z_n) + P_V(z_\star -x_\star )} \right\rangle = 0. \end{aligned}$$

    Hence, (5) becomes,

    $$\begin{aligned} \Vert z_{n+1}-z_\star \Vert ^2&\le \Vert x_{n+1}-x_\star \Vert ^2 - \Vert x_{n+1}-y_{n+1}\Vert ^2 + \gamma ^2\Vert Cx_{n+1}-Cy_{n+1} \Vert ^2 + \Vert u_{n+1}\Vert ^2 \\&\le \Vert z_n-z_\star \Vert ^2 - \Vert x_{n+1}-y_{n+1}\Vert ^2 + \gamma ^2\Vert Cx_{n+1}-Cy_{n+1} \Vert ^2\\&\le \Vert z_n-z_\star \Vert ^2 - \Vert x_{n+1}-y_{n+1}\Vert ^2 + \gamma ^2\mu ^2\Vert x_{n+1}-y_{n+1} \Vert ^2\\&=\Vert z_n-z_\star \Vert ^2 - \varepsilon \Vert x_{n+1}-y_{n+1}\Vert ^2, \end{aligned}$$

    where the second inequality follows from \(\Vert x_{n+1}-x_\star \Vert ^2 \le \Vert z_n-z_\star \Vert ^2-\Vert u_{n+1}\Vert ^2 \), which in turn follows from (1/2)-averagedness of \(P_V\), the third inequality follows from \(\mu \)-Lipschitz continuity of C, and the final equality follows from the definition \(\varepsilon :=1-\gamma ^2\mu ^2 >0\).

In cases (i) and (ii) both, we have

$$\begin{aligned} (\forall z_\star \in {\text {Fix}}(T))\quad \Vert z_{n+1}-z_\star \Vert ^2 \le \Vert z_n-z_\star \Vert ^2 - \varepsilon \Vert x_{n+1}-y_{n+1}\Vert ^2 \end{aligned}$$

with \(\varepsilon >0\), which shows that \((z_n)_{n\in \mathbb N}\) is Fejér monotone with respect to \({\text {Fix}}(T)\) and

$$\begin{aligned} \sum _{n\in \mathbb N} \Vert x_{n+1}-y_{n+1}\Vert ^2 < +\infty , \end{aligned}$$

which implies \(x_{n+1}-y_{n+1}\rightarrow 0\). Let us prove that every weak cluster point of \((z_n)_{n\in \mathbb N}\) is in \({\text {Fix}}(T)\). Let \(\overline{z}\) be a weak cluster point of \((z_n)_{n\in \mathbb N}\), i.e., there exists a subsequence \((z_{k_n})_{n\in \mathbb N}\) such that \(z_{k_n}\rightharpoonup \overline{z}\). Consider two cases:

  1. (i)

    We consider the case where B is \(\kappa \)-cocoercive. From the second negative term in (6), we get

    $$\begin{aligned} \sum _{n\in \mathbb N} \Vert u_n\Vert ^2 < +\infty&\Longrightarrow \quad u_n \rightarrow 0 \quad \Longrightarrow \quad x_{n+1} -z_n \rightarrow x_\star -z_\star \\&\Longrightarrow \quad Bx_{n+1}\rightarrow Bx_\star = B J_{\gamma B}z_\star \end{aligned}$$

    where the last implication follows from \(x_{n+1}=J_{\gamma B}z_n\), \(z_{n} -x_{n+1}=\gamma Bx_{n+1}\), \(x_\star = J_{\gamma B}z_\star \), and \(z_{\star }-x_{\star }=\gamma Bx_\star \). Since \(z_{k_n}\rightharpoonup \overline{z}\), we have \(x_{1+k_n} \rightharpoonup \overline{x} = \overline{z} - \gamma Bx_\star \) and \(y_{1+k_n} \rightharpoonup \overline{z} -\gamma Bx_\star \). Since \(x_{1+k_n}\rightharpoonup \overline{x}\) and \(Bx_{1+k_n} \rightarrow Bx_\star \) and \({\text {gra}}(B)\) is closed under \({\mathcal {H}}^{\mathrm{weak}} \times {\mathcal {H}}^{\mathrm{strong}}\) [26, Proposition 20.38], we get \(Bx_\star = B\overline{x}\). Hence, \(\overline{x} = J_{\gamma B}\overline{z}\). By definition of the FDRF iteration, we have

    $$\begin{aligned} \underbrace{x_{1+k_n} -z_{k_n}}_{\rightarrow -\gamma B \overline{x}} + \underbrace{x_{1+k_n} -y_{1+k_n}}_{\rightarrow 0} + \underbrace{\gamma Cy_{1+k_n}- \gamma Cx_{1+k_n}}_{\rightarrow 0} \in \gamma A\underbrace{y_{1+k_n}}_{\rightharpoonup \overline{x}}+ \gamma C \underbrace{y_{1+k_n}}_{\rightharpoonup \overline{x}}. \end{aligned}$$

    Since \(A+C\) is maximal monotone (A and C are maximal monotone with \({\text {dom}}C={\mathcal {H}}\) [26, Corollary 25.5]), \({\text {gra}}(A+C)\) is closed under \({\mathcal {H}}^{\mathrm{weak}} \times {\mathcal {H}}^{\mathrm{strong}}\) [26, Proposition 20.38], and we get

    $$\begin{aligned} -\gamma B \overline{x} \in \gamma A \overline{x} + \gamma C \overline{x}, \end{aligned}$$

    which shows that \(\overline{x} \in {\text {zer}}(A+B+C)\). Furthermore, \(\overline{x}-\overline{z} \in \gamma A \overline{x} + \gamma C \overline{x}\) and hence \(\overline{x} = J_{\gamma A}(2\overline{x} -\overline{z} -\gamma C\overline{x}) = J_{\gamma B}\overline{z}\). Therefore,

    $$\begin{aligned} ({\text {Id}}-\gamma C)J_{\gamma A}(2\overline{x} -\overline{z} -\gamma C\overline{x}) + \overline{z} - ({\text {Id}}-\gamma C) J_{\gamma B}\overline{z} = \overline{z}, \end{aligned}$$

    or equivalently \(T\overline{z} = \overline{z}\). Hence, \(z_n\rightharpoonup z_{\star }\) and \(x_n \rightharpoonup J_{\gamma B}z_{\star }\). Since we have \(x_{n+1}-y_{n+1}\rightarrow 0\), we conclude \(y_n\rightharpoonup J_{\gamma B}z_{\star }\).

  2. (ii)

    We consider the case where \(B = N_V\). Then, \(J_{\gamma B} = P_V\) is weakly continuous and hence \(x_{1+k_n} \rightharpoonup \overline{x} = P_V \overline{z}\). Then, we have

    $$\begin{aligned} p_{k_n}:= & {} \underbrace{x_{1+k_n} -z_{k_n}}_{\rightharpoonup \overline{x} - \overline{z}} + \underbrace{x_{1+k_n} -y_{1+k_n}}_{\rightarrow 0} +\underbrace{\gamma Cy_{1+k_n} - \gamma Cx_{1+k_n}}_{\rightarrow 0} \\\in & {} \gamma A\underbrace{y_{1+k_n}}_{\rightharpoonup \overline{x}}+ \gamma C \underbrace{y_{1+k_n}}_{\rightharpoonup \overline{x}}. \end{aligned}$$

Since \(A+C\) is maximal monotone, \(x_{1+k_n}=P_Vz_{k_n}\), and \(x_{1+k_n}-y_{1+k_n}\rightarrow 0\), we have

$$\begin{aligned} p_{k_n} \rightharpoonup \overline{x} - \overline{z}, \quad P_{V^\perp } y_{1+k_n} \rightarrow 0,\quad \text {and}\quad P_{V} p_{k_n} \rightarrow 0. \end{aligned}$$

From [26, Example 26.7] we have \(\overline{x} \in {\text {zer}}(A+C+N_V)\) and \(\overline{x} - \overline{z} \in (\gamma A+\gamma C)\overline{x}\). Hence \(\overline{x} = J_{\gamma A}(2\overline{x} -\overline{z} -\gamma C\overline{x}) = J_{\gamma B}\overline{z}\). Therefore,

$$\begin{aligned} ({\text {Id}}-\gamma C)J_{\gamma A}(2\overline{x} -\overline{z} -\gamma C\overline{x}) + \overline{z} - ({\text {Id}}-\gamma C) J_{\gamma B}\overline{z} = \overline{z}, \end{aligned}$$

or equivalently \(T\overline{z} = \overline{z}\). Hence \(z_n\rightharpoonup \overline{z}\) and \(x_n \rightharpoonup J_{\gamma B}\overline{z}\). \(\square \)

Under condition (i), B is single-valued and one can alternatively use FBF [2] or FBHF [9] by utilizing forward evaluations of B rather than the resolvent \(J_{\gamma B}\). However, many cocoercive operators B require similar computational costs for evaluating B and \(J_B\), and, in such cases, it may be advantageous to use \(J_B\) instead of the forward evaluation B [29].

Under condition (ii), the operator \(B=N_V\) enforces a linear equality constraint. Consider

$$\begin{aligned} \text{ find } x\in {\mathcal {H}}\quad \text{ such } \text{ that } \quad 0\in \sum ^m_{i=1}A_ix+C_i x, \end{aligned}$$

where \(A_1,\ldots ,A_m\) are maximal monotone and \(C_1,\ldots ,C_m\) are monotone and Lipschitz. The equivalent formulation

$$\begin{aligned} \text{ find } \mathbf {x}\in {\mathcal {H}}^m \quad \text{ such } \text{ that } \quad 0\in N_V(\mathbf {x})+\sum ^m_{i=1}(A_ix_i+C_ix_i), \end{aligned}$$

where \(\mathbf {x}=(x_1,\ldots ,x_m)\) and \(V=\big \{{\mathbf {x}\in {\mathcal {H}}^m}: {x_1=\dots =x_m}\big \}\) is the consensus set, is an important instance of case (ii). (This problem class is the motivation for Raguet et al.’s forward-Douglas–Rachford [5, 6].) When \(B=N_V\) and V is the consensus set, FDRF reduces to Banert’s relaxed forward–backward, presented in the thesis [10]. Finally, Briceño-Arias’s forward–partial inverse–forward [11] is also applicable under this setup. Briceño-Arias’s method is different from our FDRF, but it can also be considered a “forward-Douglas–Rachford-forward splitting” as it reduces to DR and FBF as special cases.

FDRF resembles forward-Douglas–Rachford (FDR) splitting [5,6,7,8] but is different due to the correction term \(\gamma (Cy_{n+1}-Cx_{n+1})\). For convergence, FDR requires C to be cocoercive or (with a slight modification) B to be strongly monotone [8, Theorems 1.1 and 1.2]. In contrast, Theorem 4.1 states that FDRF converges when B is cocoercive.

However, FDRF does not converge in full generality. The following result establishes that Assumptions (A1), (A2), and (A3) are not sufficient to ensure that FDRF converges.

Theorem 4.2

Given any \(\gamma > 0\), there exist operators A, B, and C satisfying Assumptions (A1), (A2), and (A3) such that the FDRF iterates \((z_n)_{n\in \mathbb N}\) and \((x_n)_{n\in \mathbb N}\) diverge.

Proof

Let \({\mathcal {H}}=\mathbb R^2\) and let A, B, and C satisfy

$$\begin{aligned} J_{\gamma A}(x,y)= & {} \begin{bmatrix} 0&\quad 0\\ 0&\quad 0 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix},\qquad B(x,y)= \begin{bmatrix} 0&\quad \gamma ^{-1}\cot (\omega /2)\\ -\gamma ^{-1}\cot (\omega /2)&\quad 0 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix},\\ C(x,y)= & {} \begin{bmatrix} 0&\quad \mu \\ -\mu&\quad 0 \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix} , \end{aligned}$$

where \(\cot \) denotes cotangent and \(\omega >0\) is small. Then, A, B, and C are maximally monotone and \(\{0\}={\text {zer}}(A+B+C)\). With direct calculations, we get

$$\begin{aligned} T(x,y)= \begin{bmatrix} \frac{1}{2}(1+\cos (\omega )+\gamma \mu \sin (\omega ))&\frac{1}{2}(\gamma \mu -\gamma \mu \cos (\omega )+ \sin (\omega ))\\ \frac{1}{2}(-\gamma \mu +\gamma \mu \cos (\omega )- \sin (\omega ))&\frac{1}{2}(1+\cos (\omega )+\gamma \mu \sin (\omega )) \end{bmatrix} \begin{bmatrix} x\\y \end{bmatrix}. \end{aligned}$$

The \(2\times 2\) matrix defining T has eigenvalues

$$\begin{aligned} |\lambda _1|^2=|\lambda _2|^2= (\cos (\omega /2)+\gamma \mu \sin (\omega /2))^2=1+\gamma \mu \omega +\mathcal {O}(\omega ^2). \end{aligned}$$

So \(|\lambda _1|^2=|\lambda _2|^2>1\) for small enough \(\omega \). Therefore, FDRF with \(z_0\ne 0\) diverges in the sense that \(\Vert z_n\Vert \rightarrow \infty \) and \(\Vert x_n\Vert \rightarrow \infty \). \(\square \)
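The counterexample can be checked numerically. The sketch below (with the illustrative parameters γ = μ = 1 and ω = 0.1) builds the 2×2 matrix defining T, confirms that its spectral radius equals \(\cos (\omega /2)+\gamma \mu \sin (\omega /2)>1\), and observes the predicted blow-up of the iterates \(z_{n+1}=Tz_n\):

```python
import numpy as np

# Matrix defining T from the proof of Theorem 4.2,
# with illustrative parameters gamma = mu = 1, omega = 0.1.
g, mu, w = 1.0, 1.0, 0.1
a = 0.5 * (1 + np.cos(w) + g * mu * np.sin(w))
b = 0.5 * (g * mu - g * mu * np.cos(w) + np.sin(w))
T = np.array([[a, b], [-b, a]])

# For this rotation-scaling matrix, |lambda_1| = |lambda_2| = sqrt(a^2 + b^2).
rho = max(abs(np.linalg.eigvals(T)))
assert abs(rho - (np.cos(w / 2) + g * mu * np.sin(w / 2))) < 1e-12
assert rho > 1

# Iterating z_{n+1} = T z_n from z_0 != 0 diverges.
z = np.array([1.0, 0.0])
for _ in range(200):
    z = T @ z
assert np.linalg.norm(z) > 1e3
```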

In splitting methods, step size requirements often depend on the assumptions, rather than on the specific operators. Theorem 4.2 rules out the possibility of proving a result like “Assuming (A1), (A2), and (A3), FDRF converges for \(\gamma \in ]0,\gamma _\mathrm {max}(\mu )[\),” where \(\gamma _\mathrm {max}(\mu )\) is some function that depends on \(\mu \). However, Theorem 4.2 does not rule out the possibility that one can examine the specific operators A, B, and C (to gain more information beyond the Lipschitz parameter of C) and then select \(\gamma >0\) to obtain convergence.

\(\hbox {FRB}+\hbox {DR}\): Convergent in General

To solve Problem (1), we propose the following iteration

$$\begin{aligned} x_{n+1}&=J_{\gamma B}(x_n-\gamma u_n-\gamma (2Cx_n-Cx_{n-1}))\\ y_{n+1}&=J_{\beta A}(2x_{n+1}-x_n+\beta u_n)\\ u_{n+1}&=u_n+\frac{1}{\beta }(2x_{n+1}-x_n-y_{n+1}) \end{aligned}$$
(FRDR)

for \(n=0,1,\ldots \), where \(x_0,x_{-1},u_0\in {\mathcal {H}}\) are starting points and \(\gamma >0\), \(\beta > 0\). We call this method forward–reflected-Douglas–Rachford (FRDR) splitting as it combines Malitsky and Tam’s FRB [12] and Douglas–Rachford [1]. Note FRDR evaluates operator C only once per iteration, since the evaluation of \(Cx_{n-1}\) from the previous iteration can be reused. In contrast, FDRF evaluates C twice per iteration.

FRDR reduces to FRB when \(A=0\) and to DR when \(C=0\) and \(\beta =\gamma \). When \(A=0\), we have \(J_{\beta A}={\text {Id}}\), \(u_n=0\), and the iteration is independent of \(\beta \). FRB converges when \(\gamma <1/(2\mu )\) [12, Theorem 2.5], which is consistent with the parameter range of Theorem 5.1 with \(\beta \rightarrow \infty \). When \(C=0\) and \(\beta =\gamma \), one recovers DR with \(z_n=x_n-\gamma u_n\).
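To make the iteration concrete, the following self-contained Python sketch runs FRDR on a toy affine problem of our own construction (random positive definite A and B, skew-symmetric C, a constant shift b folded into B, and a factor 0.9 on the step size; none of these choices come from the text):

```python
import numpy as np

# Toy affine instance (our construction): A, B symmetric positive definite
# (maximal monotone), C skew-symmetric (monotone, mu-Lipschitz with mu = ||C||_2).
# A constant shift b is folded into B, so we seek the zero of x -> (A+B+C)x - b.
rng = np.random.default_rng(0)
d = 5
G = rng.standard_normal((d, d)); A = G @ G.T + np.eye(d)
H = rng.standard_normal((d, d)); B = H @ H.T + np.eye(d)
S = rng.standard_normal((d, d)); C = S - S.T
b = rng.standard_normal(d)
x_star = np.linalg.solve(A + B + C, b)        # the unique zero

mu = np.linalg.norm(C, 2)                     # Lipschitz constant of C
beta = 1.0
gamma = 0.9 * beta / (1 + 2 * mu * beta)      # inside the range of Theorem 5.1

I = np.eye(d)
JB = np.linalg.inv(I + gamma * B)             # resolvent of gamma*(B(.) - b): v -> JB @ (v + gamma*b)
JA = np.linalg.inv(I + beta * A)              # resolvent of beta*A

x_prev = x = np.zeros(d)
u = np.zeros(d)
for _ in range(20000):
    x_next = JB @ (x - gamma * u - gamma * (2 * C @ x - C @ x_prev) + gamma * b)
    y = JA @ (2 * x_next - x + beta * u)
    u = u + (2 * x_next - x - y) / beta
    x_prev, x = x, x_next
```

At a fixed point, the updates force \(y=x\), \(u=Ax\), and \(Bx-b+Cx+u=0\), so the limit satisfies \((A+B+C)x=b\), consistent with the convergence guarantee of Theorem 5.1.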

Without any further assumptions, the sequence \((x_n)_{n\in \mathbb N}\) generated by FRDR converges weakly to a solution of (1).

Theorem 5.1

Assume (A1), (A2), (A3), \(0<\beta \), and \(0<\gamma <\beta /(1+2\mu \beta )\). Then, \(x_n\rightharpoonup x_\star \in {\text {zer}}(A+B+C)\) for (FRDR).

Proof

Consider the Hilbert space \({\mathcal {H}}\times {\mathcal {H}}\) equipped with an alternative inner product and norm

$$\begin{aligned} \left\langle {(x,u)}, {(y,v)} \right\rangle _{{\mathcal {H}}\times {\mathcal {H}}}&:=(1/\gamma )\left\langle { x}, {y} \right\rangle -\left\langle {x}, {v} \right\rangle -\left\langle { y}, {u} \right\rangle +\beta \left\langle {u}, {v} \right\rangle ,\\ \Vert (x,u)\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}&:=(1/\gamma )\Vert x\Vert ^2-2\left\langle {x}, {u} \right\rangle +\beta \Vert u\Vert ^2. \end{aligned}$$

Since \(0<\gamma <\beta \), the bilinear form above is a valid inner product and the norm is positive definite. Let \(x_\star \in {\text {zer}}(A+B+C)\) and choose \(u_\star \in Ax_\star \) such that \(-u_\star \in (B+C)x_\star \).

Define

$$\begin{aligned} \tilde{A}y_{n+1}&:=u_n+\frac{1}{\beta }(2x_{n+1}-x_n-y_{n+1}),\\ \tilde{B}x_{n+1}&:=\frac{1}{\gamma }(x_n-x_{n+1})- u_n- 2Cx_n+Cx_{n-1}, \end{aligned}$$

\(\tilde{A}x_\star :=u_\star \), and \(\tilde{B}x_\star :=-u_\star -Cx_\star \) so that \(\tilde{A}y_{n+1}\in Ay_{n+1}\), \(\tilde{B}x_{n+1}\in Bx_{n+1}\), \(\tilde{A}x_\star \in Ax_\star \), and \(\tilde{B}x_\star \in Bx_\star \). Define

$$\begin{aligned} V_{n}&:=\Vert (x_n,u_n)-(x_\star ,u_\star )\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}+\frac{1}{2}\Vert (x_n,u_n)-(x_{n-1},u_{n-1})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}\\&\qquad +2\left\langle { Cx_{n}-Cx_{n-1}}, {x_\star -x_{n}} \right\rangle \\ S_{n}&:= \frac{1}{2}\Vert (x_n,u_n)-(x_{n+1},u_{n+1})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}} +\frac{1}{2}\Vert (x_n,u_n)-(x_{n-1},u_{n-1})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}. \end{aligned}$$

We have

$$\begin{aligned}&\Vert (x_{n+1},u_{n+1})-(x_{\star },u_{\star })\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}\\&\quad = \Vert (x_{n},u_{n})-(x_{\star },u_{\star })\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}} - \Vert (x_{n+1},u_{n+1})-(x_{n},u_{n})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}\\&\quad \quad - 2\left\langle {\tilde{B}x_{n+1}-\tilde{B}x_\star }, {x_{n+1}-x_\star } \right\rangle - 2\left\langle { \tilde{A}y_{n+1}-\tilde{A}x_\star }, {y_{n+1}-x_\star } \right\rangle \\&\quad \quad +2\left\langle { Cx_{n}-Cx_{n-1}}, {x_{n}-x_{n+1}} \right\rangle -2\left\langle {Cx_{\star }-Cx_{n}}, {x_{\star }-x_{n+1}} \right\rangle \\&\quad \quad +2\left\langle {Cx_{n}-Cx_{n-1}}, {x_{\star }-x_{n}} \right\rangle \\&\quad {\mathop {\le }\limits ^{\text {(a)}}} \Vert (x_{n},u_{n})-(x_{\star },u_{\star })\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}} - \Vert (x_{n+1},u_{n+1})-(x_{n},u_{n})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}\\&\quad \quad +2\left\langle { Cx_{n}-Cx_{n-1}}, {x_{n}-x_{n+1}} \right\rangle -2\left\langle {Cx_{\star }-Cx_{n}}, {x_{\star }-x_{n+1}} \right\rangle \\&\quad \quad +2\left\langle {Cx_{n}-Cx_{n-1}}, {x_{\star }-x_{n}} \right\rangle \\&\quad {\mathop {\le }\limits ^{\text {(b)}}} \Vert (x_{n},u_{n})-(x_{\star },u_{\star })\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}} - \Vert (x_{n+1},u_{n+1})-(x_{n},u_{n})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}\\&\quad \quad +\mu \Vert x_{n}-x_{n-1}\Vert ^2+\mu \Vert x_{n+1}-x_{n}\Vert ^2 -2\left\langle {Cx_{n+1}-Cx_{n}}, {x_{\star }-x_{n+1}} \right\rangle \\&\quad \quad +2\left\langle {Cx_{n}-Cx_{n-1}}, {x_{\star }-x_{n}} \right\rangle . \end{aligned}$$

Inequality (a) follows from monotonicity of A and B. Inequality (b) follows from

$$\begin{aligned} 2\left\langle { Cx_{n}-Cx_{n-1}}, {x_{n}-x_{n+1}} \right\rangle&\le \frac{1}{\mu } \Vert Cx_{n}-Cx_{n-1}\Vert ^2+\mu \Vert x_{n}-x_{n+1}\Vert ^2\\&\le \mu \left( \Vert x_{n}-x_{n+1}\Vert ^2+\Vert x_{n}-x_{n-1}\Vert ^2\right) , \end{aligned}$$

which follows from Young’s inequality and the Lipschitz continuity of C, and

$$\begin{aligned} -2\left\langle {Cx_{\star }-Cx_{n}}, {x_{\star }-x_{n+1}} \right\rangle \le -2\left\langle {Cx_{n+1}-Cx_{n}}, {x_{\star }-x_{n+1}} \right\rangle \end{aligned}$$

which follows from monotonicity of C. Reorganizing, we get

$$\begin{aligned} V_{n+1}&\le V_{n}-\frac{1}{2} \left( \Vert (x_{n+1},u_{n+1})-(x_{n},u_{n})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}} +\Vert (x_{n},u_{n})-(x_{n-1},u_{n-1})\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}} \right) \\&\quad +\mu \left( \Vert x_{n}-x_{n-1}\Vert ^2+\Vert x_{n+1}-x_{n}\Vert ^2\right) . \end{aligned}$$

We now add

$$\begin{aligned}&\frac{\beta \gamma \mu }{\beta -\gamma } \left( \frac{1}{\beta }\Vert x_{n}-x_{n+1}\Vert ^2-2\left\langle { x_{n}-x_{n+1}}, {u_{n}-u_{n+1}} \right\rangle +\beta \Vert u_{n}-u_{n+1}\Vert ^2\right) \ge 0\\&\frac{\beta \gamma \mu }{\beta -\gamma } \left( \frac{1}{\beta }\Vert x_{n}-x_{n-1}\Vert ^2-2\left\langle {x_{n}-x_{n-1}}, {u_{n}-u_{n-1}} \right\rangle +\beta \Vert u_{n}-u_{n-1}\Vert ^2 \right) \ge 0 \end{aligned}$$

to the right-hand side (nonnegativity follows from Young’s inequality) to obtain

$$\begin{aligned} V_{n+1}&\le V_{n}-\frac{\beta -\gamma -2\mu \gamma \beta }{2(\beta -\gamma )}S_{n}. \end{aligned}$$
(7)

Using the telescoping sum argument with (7), we get

$$\begin{aligned} V_0-\frac{\beta -\gamma -2\mu \gamma \beta }{2(\beta -\gamma )}\sum ^n_{i=0}S_i\ge V_n \end{aligned}$$

Next, we have

$$\begin{aligned} V_{n}&{\mathop {\ge }\limits ^{\text {(a)}}} \frac{1}{\gamma }\Vert x_{n}-x_\star \Vert ^2-2\left\langle {x_{n}-x_\star }, {u_{n}-u_\star } \right\rangle +\beta \Vert u_{n}-u_\star \Vert ^2\\&\qquad + \frac{1}{2\gamma }\Vert x_{n}-x_{n-1}\Vert ^2-\left\langle {x_{n}-x_{n-1}}, {u_{n}-u_{n-1}} \right\rangle +\frac{\beta }{2}\Vert u_{n}-u_{n-1}\Vert ^2\\&\qquad - \mu \left( \Vert x_{n}-x_{n-1}\Vert ^2+\Vert x_\star -x_{n}\Vert ^2\right) \\&{\mathop {\ge }\limits ^{\text {(b)}}}\frac{1}{2}\Vert (x_n,u_n)-(x_\star ,u_\star )\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}\\&\qquad + \left( \frac{1}{2\gamma }-\frac{1}{2\beta }-\mu \right) \Vert x_{n}-x_{\star }\Vert ^2 + \left( \frac{1}{2\gamma }-\frac{1}{2\beta }-\mu \right) \Vert x_{n}-x_{n-1}\Vert ^2\\&{\mathop {\ge }\limits ^{\text {(c)}}} \frac{1}{2}\Vert (x_n,u_n)-(x_\star ,u_\star )\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}. \end{aligned}$$

Inequality (a) follows from

$$\begin{aligned} 2\left\langle { Cx_{n}-Cx_{n-1}}, {x_\star -x_{n}} \right\rangle&\le \frac{1}{\mu } \Vert Cx_{n}-Cx_{n-1}\Vert ^2+\mu \Vert x_\star -x_{n}\Vert ^2\\&\le \mu \left( \Vert x_{n}-x_{n-1}\Vert ^2+\Vert x_\star -x_{n}\Vert ^2\right) , \end{aligned}$$

which follows from Young’s inequality and the Lipschitz continuity of C; inequality (b) follows from

$$\begin{aligned} \frac{\beta }{2}\Vert u_{n}-u_\star \Vert ^2-\left\langle {x_{n}-x_\star }, {u_{n}-u_\star } \right\rangle&\ge -\frac{1}{2\beta } \Vert x_{n}-x_\star \Vert ^2\\ \frac{\beta }{2} \Vert u_{n}-u_{n-1}\Vert ^2-\left\langle {x_{n}-x_{n-1}}, {u_{n}-u_{n-1}} \right\rangle&\ge -\frac{1}{2\beta } \Vert x_{n}-x_{n-1}\Vert ^2, \end{aligned}$$

both instances of Young’s inequality; and inequality (c) follows from \(\gamma <\beta /(1+2\mu \beta )\). Putting these together, we have

$$\begin{aligned} V_0-\frac{\beta -\gamma -2\mu \gamma \beta }{2(\beta -\gamma )}\sum ^n_{i=0}S_i\ge \frac{1}{2}\Vert (x_n,u_n)-(x_\star ,u_\star )\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}. \end{aligned}$$

This implies that the sequence \(\left( (x_n,u_n)\right) _{n\in \mathbb N}\) is bounded and \(S_n\rightarrow 0\). Since

$$\begin{aligned} S_n&\ge \frac{1}{2}\Vert (x_{n+1},u_{n+1})-(x_n,u_n)\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}, \end{aligned}$$

\(S_n\rightarrow 0\) implies \(x_{n+1}-x_n\rightarrow 0\) and \(u_{n+1}-u_n\rightarrow 0\). Since

$$\begin{aligned} u_{n+1}-u_n=(1/\beta )(2x_{n+1}-x_n-y_{n+1}) \end{aligned}$$

we also have \(x_{n+1}-y_{n+1}\rightarrow 0\).

Now consider a weakly convergent subsequence \(\left( (x_{k_n},u_{k_n})\right) _{n\in \mathbb N}\) such that \((x_{k_n},u_{k_n})\rightharpoonup (\overline{x},\overline{u})\). Note that \(x_{n+1}\) and \(y_{n+1}\) are defined by the inclusion

$$\begin{aligned} \begin{bmatrix} \frac{1}{\gamma }\left( x_n-x_{n+1}\right) +2Cx_n-Cx_{n-1}-Cx_{n+1}\\ \frac{1}{\beta }\left( 2x_{n+1}-y_{n+1}-x_{n}\right) \end{bmatrix} \in \begin{bmatrix} (B+C)x_{n+1}+u_n\\ Ay_{n+1}-u_n \end{bmatrix}. \end{aligned}$$

The right-hand side is a maximal monotone operator on \({\mathcal {H}}\times {\mathcal {H}}\) (equipped with the usual inner product) [26, Propositions 20.23, 20.38] and the left-hand side strongly converges to 0 since C is continuous. Since \({\text {gra}}(B+C)\) is closed under \({\mathcal {H}}^{\mathrm{weak}} \times {\mathcal {H}}^{\mathrm{strong}}\) [26, Proposition 20.38], we have

$$\begin{aligned} -\overline{u}&\in (B+C)\overline{x},\qquad \overline{u}\in A\overline{x}. \end{aligned}$$

Adding these, we get \(\overline{x}\in {\text {zer}}(A+B+C)\). Finally, since \((V_n)_{n\in \mathbb N}\) is a monotonically decreasing nonnegative sequence, it has a limit. Since C is continuous, \((x_n)_{n\in \mathbb N}\) and \((u_n)_{n\in \mathbb N}\) are bounded sequences, and \(x_{n}-x_{n-1}\rightarrow 0\) and \(u_{n}-u_{n-1}\rightarrow 0\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }V_n=\lim _{n\rightarrow \infty }\Vert (x_n,u_n)-(x_\star ,u_\star )\Vert ^2_{{\mathcal {H}}\times {\mathcal {H}}}. \end{aligned}$$

By plugging in \((x_\star ,u_\star )=(\overline{x},\overline{u})\), we conclude that the entire sequence weakly converges to \((\overline{x},\overline{u})\). \(\square \)

The proof of Theorem 5.1 closely follows Malitsky and Tam’s analysis [12, Lemma 2.4]. In fact, FRDR can be thought of as an instance of FRB applied to a primal-dual system with an auxiliary metric. Naively translating Malitsky and Tam’s convergence analysis via a change of coordinates leads to the step size requirement \(\mu <(\beta -\gamma )/(1+\gamma \beta +\sqrt{(1+\gamma \beta )^2-4\gamma (\beta -\gamma )})\) with \(0<\gamma <\beta \). With a direct analysis, we obtain the better (and simpler) requirement \(\mu <(\beta -\gamma )/(2\beta \gamma )\).
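That the direct requirement is indeed no more restrictive can be checked numerically over a grid of parameters (the grid values below are illustrative choices of ours):

```python
import numpy as np

# Compare the two step-size bounds on mu: the one obtained by naively
# translating the FRB analysis, and the direct one. The direct bound should
# always be at least as large (i.e., less restrictive) for 0 < gamma < beta.
def naive_bound(gamma, beta):
    disc = (1 + gamma * beta) ** 2 - 4 * gamma * (beta - gamma)
    # disc = (1 - gamma*beta)^2 + 4*gamma^2 >= 0, so the sqrt is real
    return (beta - gamma) / (1 + gamma * beta + np.sqrt(disc))

def direct_bound(gamma, beta):
    return (beta - gamma) / (2 * beta * gamma)

ok = True
for beta in [0.5, 1.0, 2.0, 10.0]:
    for gamma in np.linspace(0.01, 0.99, 50) * beta:
        ok &= direct_bound(gamma, beta) >= naive_bound(gamma, beta) - 1e-12
```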

The discovery of this proof was computer-assisted in the sense that we used the performance estimation problem (PEP) [30,31,32] and a computer algebra system (CAS). We briefly describe the strategy here.

The proof of Theorem 5.1 crucially relies on finding the Lyapunov function \(V_n\) and showing an inequality of the form \(V_{n+1}\le V_n-cS_n\) with \(c>0\). Roughly speaking, for fixed numerical values of \(\beta \), \(\gamma \), and \(\mu \), the PEP allows us to pose a semidefinite program (SDP) whose solution indicates whether a candidate Lyapunov function is nonincreasing. (A proof establishing that a Lyapunov function is nonincreasing is a nonnegative combination of known inequalities, and the SDP automates the search.) We used the SDP to quickly experiment with many candidate Lyapunov functions and identified that the \(V_n\) used in the proof of Theorem 5.1 is a nonincreasing quantity.

To get a general proof, we numerically solved the SDP for many values of \(\beta \), \(\gamma \), and \(\mu \) and deduced the general symbolic solution. (The general proof is equivalent to a general solution of the SDP that symbolically depends on \(\beta \), \(\gamma \), and \(\mu \).) We relied on a CAS, Mathematica, to work through the symbolic calculations. In deducing the symbolic form of the proof, we utilized the observed structure of the solution. For example, the optimal SDP matrices were rank deficient, so we set the determinant of the corresponding symbolic matrix to 0 and eliminated a degree of freedom.

Finally, we translated the symbolic calculations into a traditional proof that is verifiable by humans without the aid of computer software. This step involved some further simplifications, such as replacing the identity

$$\begin{aligned} 2\left\langle {u}, {v} \right\rangle = \eta \Vert u\Vert ^2+\frac{1}{\eta }\Vert v\Vert ^2-\Vert \sqrt{\eta }u-(1/\sqrt{\eta })v\Vert ^2 \end{aligned}$$

with Young’s inequality

$$\begin{aligned} 2\left\langle {u}, {v} \right\rangle \le \eta \Vert u\Vert ^2+\frac{1}{\eta }\Vert v\Vert ^2. \end{aligned}$$
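Both the identity and the inequality obtained by dropping its nonnegative residual term can be verified directly; the random vectors and the value of \(\eta \) below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)
u = rng.standard_normal(4)
v = rng.standard_normal(4)
eta = 0.7                                    # any eta > 0 works

lhs = 2 * (u @ v)
# Exact identity: 2<u,v> = eta||u||^2 + (1/eta)||v||^2 - ||sqrt(eta)u - v/sqrt(eta)||^2
residual = np.sum((np.sqrt(eta) * u - v / np.sqrt(eta)) ** 2)
identity_rhs = eta * (u @ u) + (v @ v) / eta - residual
# Dropping the nonnegative residual gives Young's inequality:
young_rhs = eta * (u @ u) + (v @ v) / eta
```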

Comparison with Other Methods

We now briefly examine other existing methods applied to Problem (1) to see how they differ from FDRF and FRDR. We leave a comparison of these methods in terms of computational effectiveness as a direction of future work. Note that Problem (1) can be reformulated into the primal-dual system

$$\begin{aligned} \text{ find } x,u\in {\mathcal {H}}\quad \text{ such } \text{ that } \quad \begin{bmatrix} 0\\ 0 \end{bmatrix} \in \begin{bmatrix} Bx \\ A^{-1}u \end{bmatrix} +\begin{bmatrix} C&\quad {\text {Id}}\\ -{\text {Id}}&\quad 0 \end{bmatrix} \begin{bmatrix} x\\ u \end{bmatrix}. \end{aligned}$$
(8)
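As a sanity check in a finite-dimensional linear toy setting (our construction: A positive definite so that \(A^{-1}\) is a single-valued linear map, B positive semidefinite, C skew-symmetric), eliminating u from (8) via \(u=Ax\) recovers \((A+B+C)x=0\):

```python
import numpy as np

rng = np.random.default_rng(2)
d = 4
G = rng.standard_normal((d, d)); A = G @ G.T + np.eye(d)   # positive definite, so A^{-1} exists
H = rng.standard_normal((d, d)); B = H @ H.T               # positive semidefinite
S = rng.standard_normal((d, d)); C = S - S.T               # skew-symmetric

I, Z = np.eye(d), np.zeros((d, d))
# The block operator of system (8): [[B, 0], [0, A^{-1}]] + [[C, I], [-I, 0]]
M = np.block([[B, Z], [Z, np.linalg.inv(A)]]) + np.block([[C, I], [-I, Z]])

# For any x, setting u = A x makes the second block row vanish, and the first
# block row becomes (B + C + A) x; so (8) reduces to (A+B+C)x = 0.
x = rng.standard_normal(d)
u = A @ x
r = M @ np.concatenate([x, u])
expected = np.concatenate([(A + B + C) @ x, np.zeros(d)])
```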

Combettes–Pesquet

The method of [13] can be thought of as FBF applied to the primal-dual system (8):

$$\begin{aligned} \overline{x}_{n+1}= & {} J_{\gamma B}(x_n -\gamma (Cx_n+ u_n))\\ y_{n+1}= & {} J_{\gamma ^{-1}A}(x_n+\gamma ^{-1}u_n)\\ x_{n+1}= & {} \overline{x}_{n+1} - \gamma (C\overline{x}_{n+1}- Cx_n)-\gamma ^2(x_n-y_{n+1})\\ u_{n+1}= & {} u_n+\gamma (\overline{x}_{n+1}-y_{n+1}). \end{aligned}$$

This method solves Problem (1) with an appropriate choice of \(\gamma >0\). This method does not reduce to DR when \(C=0\).

Malitsky–Tam

Plainly applying FRB [12] to (8) yields:

$$\begin{aligned} x_{n+1}= & {} J_{\gamma B}(x_n -\gamma (2Cx_n - Cx_{n-1} + 2u_n -u_{n-1}))\\ y_{n+1}= & {} J_{\gamma ^{-1} A}(2x_n - x_{n-1}+\gamma ^{-1}u_n)\\ u_{n+1}= & {} u_n+\gamma (2x_n-x_{n-1}-y_{n+1}). \end{aligned}$$

This method solves Problem (1) with an appropriate choice of \(\gamma >0\). This method does not reduce to DR when \(C=0\).

Briceño-Arias

When \(B=N_V\) and \(V\subset {\mathcal {H}}\) is a closed vector space, i.e., in case (ii) of Theorem 4.1, forward–partial inverse–forward [11] applies:

$$\begin{aligned} x_{n+1}&=J_{\gamma A}(z_n-\gamma J_{\gamma B}CJ_{\gamma B}z_n)\\ y_{n+1}&=J_{\gamma B}(2x_{n+1}-z_n+\gamma J_{\gamma B}CJ_{\gamma B}z_n)-x_{n+1}+z_n-\gamma J_{\gamma B}CJ_{\gamma B}z_n\\ z_{n+1}&=y_{n+1}-\gamma (J_{\gamma B}CJ_{\gamma B}y_{n+1}-J_{\gamma B}CJ_{\gamma B}z_{n+1}). \end{aligned}$$

This method reduces to DR when \(C=0\) and to FBF when \(B=0\). However, it does not apply in the general setup, when B is not a normal cone operator.

Johnstone–Eckstein

The method of [22, 23] is based on the notion of projective splitting and bears little resemblance to the other methods. The method is very flexible, and there are multiple instances applicable to Problem (1). The following instance follows the presentation of [22]:

$$\begin{aligned} x_{n+1}^A&=J_{\gamma A}(z_n+\gamma w_n^A)\\ x_{n+1}^B&=J_{\gamma B}(z_n+\gamma w_n^B)\\ x_{n+1}^C&=z_n-\gamma (Cz_n- w_n^C)\\ z_{n+1}&=z_n-\frac{\alpha _n}{\gamma ^2}\left( 3z_n-x^A_{n+1}-x^B_{n+1}-\gamma Cx_{n+1}^C+\gamma (w_n^A+w_n^B+w_n^C)\right) \\ w_{n+1}^A&=w_{n}^A-\alpha _n(x_{n+1}^A-x_n^C)\\ w_{n+1}^B&=w_{n}^B-\alpha _n(x_{n+1}^B-x_n^C)\\ w_{n+1}^C&=-w_{n+1}^A-w_{n+1}^B. \end{aligned}$$

The scalar parameter \(\alpha _n\) is computed each iteration with a formula given in [22]. This method does not reduce to DR or FBF.

Conclusions

In this paper, we considered the monotone inclusion problem with a sum of 3 operators, in which 2 are monotone and 1 is monotone-Lipschitz, and studied combined methods of the “forward-Douglas–Rachford-forward” type. We presented FDRF, a combination of DR and FBF, and showed that it converges with further assumptions, but not generally. We then presented FRDR, a combination of DR and FRB, and showed that it converges in general. Moreover, FRDR has a lower computational cost per iteration since it evaluates the monotone-Lipschitz operator only once per iteration. Therefore, we conclude that FRDR is the better forward-Douglas–Rachford-forward method.

References

  1. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)
  2. Tseng, P.: A modified forward–backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38(2), 431–446 (2000)
  3. Bruck, R.E.: On the weak convergence of an ergodic iteration for the solution of variational inequalities for monotone operators in Hilbert space. J. Math. Anal. Appl. 61(1), 159–164 (1977)
  4. Passty, G.B.: Ergodic convergence to a zero of the sum of monotone operators in Hilbert space. J. Math. Anal. Appl. 72(2), 383–390 (1979)
  5. Raguet, H., Fadili, J., Peyré, G.: Generalized forward–backward splitting. SIAM J. Imaging Sci. 6(3), 1199–1226 (2013)
  6. Raguet, H.: A note on the forward-Douglas–Rachford splitting for monotone inclusion and convex optimization. Optim. Lett. 13(4), 717–740 (2018)
  7. Briceño-Arias, L.M.: Forward-Douglas–Rachford splitting and forward–partial inverse method for solving monotone inclusions. Optimization 64(5), 1239–1261 (2015)
  8. Davis, D., Yin, W.: A three-operator splitting scheme and its optimization applications. Set Valued Var. Anal. 25(4), 829–858 (2017)
  9. Briceño-Arias, L.M., Davis, D.: Forward–backward–half forward algorithm for solving monotone inclusions. SIAM J. Optim. 28(4), 2839–2871 (2018)
  10. Banert, S.: A relaxed forward–backward splitting algorithm for inclusions of sums of monotone operators. Master’s Thesis, Technische Universität Chemnitz (2012)
  11. Briceño-Arias, L.M.: Forward–partial inverse–forward splitting for solving monotone inclusions. J. Optim. Theory Appl. 166(2), 391–413 (2015)
  12. Malitsky, Y., Tam, M.K.: A forward–backward splitting method for monotone inclusions without cocoercivity. arXiv:1808.04162 (2018)
  13. Combettes, P.L., Pesquet, J.-C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set Valued Var. Anal. 20(2), 307–330 (2012)
  14. Combettes, P.L.: Systems of structured monotone inclusions: duality, algorithms, and applications. SIAM J. Optim. 23(4), 2420–2447 (2013)
  15. Rockafellar, R.T.: Monotone operators associated with saddle-functions and minimax problems. In: Browder, F.E. (ed.) Nonlinear Functional Analysis, Part 1, Proceedings of Symposia in Pure Mathematics, vol. 18, pp. 241–250 (1970)
  16. Boţ, R.I., Hendrich, C.: A Douglas–Rachford type primal-dual method for solving inclusions with mixtures of composite and parallel-sum type monotone operators. SIAM J. Optim. 23(4), 2541–2565 (2013)
  17. Latafat, P., Patrinos, P.: Asymmetric forward–backward–adjoint splitting for solving monotone inclusions involving three operators. Comput. Optim. Appl. 68(1), 57–93 (2017)
  18. Ryu, E.K.: Uniqueness of DRS as the 2 operator resolvent-splitting and impossibility of 3 operator resolvent-splitting. Math. Program. (2019). https://doi.org/10.1007/s10107-019-01403-1
  19. Condat, L.: A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms. J. Optim. Theory Appl. 158(2), 460–479 (2013)
  20. Vũ, B.C.: A splitting algorithm for dual monotone inclusions involving cocoercive operators. Adv. Comput. Math. 38(3), 667–681 (2013)
  21. Yan, M.: A new primal-dual algorithm for minimizing the sum of three functions with a linear operator. J. Sci. Comput. 76(3), 1698–1717 (2018)
  22. Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps: asynchronous and block-iterative operator splitting. arXiv:1803.07043 (2018)
  23. Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps only requires continuity. arXiv:1809.07180 (2018)
  24. Bùi, M.N., Combettes, P.L.: Warped proximal iterations for monotone inclusions. arXiv:1908.07077 (2019)
  25. Giselsson, P.: Nonlinear forward–backward splitting with projection correction. arXiv:1908.07449 (2019)
  26. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd edn. Springer, New York (2017)
  27. Ryu, E.K., Boyd, S.: Primer on monotone operator methods. Appl. Comput. Math. 15(1), 3–43 (2016)
  28. Ryu, E.K., Hannah, R., Yin, W.: Scaled relative graph: nonexpansive operators via 2D Euclidean geometry. arXiv:1902.09788 (2019)
  29. Combettes, P.L., Glaudin, L.E.: Proximal activation of smooth functions in splitting algorithms for convex minimization. arXiv:1803.02919v2 (2018)
  30. Drori, Y., Teboulle, M.: Performance of first-order methods for smooth convex minimization: a novel approach. Math. Program. 145(1–2), 451–482 (2014)
  31. Taylor, A.B., Hendrickx, J.M., Glineur, F.: Smooth strongly convex interpolation and exact worst-case performance of first-order methods. Math. Program. 161(1–2), 307–345 (2017)
  32. Ryu, E.K., Taylor, A.B., Bergeling, C., Giselsson, P.: Operator splitting performance estimation: tight contraction factors and optimal parameter selection. arXiv:1812.00146 (2018)


Acknowledgements

Ernest Ryu was partially supported by AFOSR MURI FA9550-18-1-0502, NSF Grant DMS-1720237, and ONR Grant N000141712162. Bằng Công Vũ’s research work was funded by Vietnam National Foundation for Science and Technology Development (NAFOSTED) under Grant No. 102.01-2017.05.


Corresponding author

Correspondence to Ernest K. Ryu.


Communicated by Jalal Fadili.


Cite this article

Ryu, E.K., Vũ, B.C. Finding the Forward-Douglas–Rachford-Forward Method. J Optim Theory Appl 184, 858–876 (2020). https://doi.org/10.1007/s10957-019-01601-z


Keywords

  • Douglas–Rachford
  • Forward–backward–forward
  • Forward–reflected–backward
  • Monotone inclusion

Mathematics Subject Classification

  • 47H05
  • 47H09
  • 90C25