# Shadow Douglas–Rachford Splitting for Monotone Inclusions

## Abstract

In this work, we propose a new algorithm for finding a zero of the sum of two monotone operators where one is assumed to be single-valued and Lipschitz continuous. This algorithm naturally arises from a non-standard discretization of a continuous dynamical system associated with the Douglas–Rachford splitting algorithm. More precisely, it is obtained by performing an explicit, rather than implicit, discretization with respect to one of the operators involved. Each iteration of the proposed algorithm requires the evaluation of one forward and one backward operator.

## Introduction

The study of continuous time dynamical systems associated with iterative algorithms for solving optimization problems has a long history which can be traced back at least to the 1950s [4, 14]. The relationship between the continuous and discrete versions of an algorithm provides a unifying perspective which gives insights into their behavior and properties. As we will see in this work, this includes suggesting new algorithmic schemes as well as appropriate Lyapunov functions for analyzing their convergence properties. The interplay between continuous and discrete dynamical systems has been studied by many authors including [1,2,3, 5, 6, 9, 10, 23, 24].

The following well-known idea will help to motivate the approach used in this work. Let $${\mathcal {H}}$$ be a real Hilbert space and suppose $${B:{\mathcal {H}}\rightarrow {\mathcal {H}}}$$ is a maximal monotone operator. Consider the monotone equation

\begin{aligned} \text {find}~x\in {\mathcal {H}}\quad \text {such that}\quad 0=B(x), \end{aligned}
(1)

to which the following continuous time dynamical system can be attached

\begin{aligned} {\dot{x}}(t) = - B(x(t)). \end{aligned}
(2)

Let $$\lambda >0$$. We now devise two iterative algorithms for solving (1) by using different discretizations of $${\dot{x}}(t)$$ in (2). To this end, let us first approximate the trajectory x(t) in (2) by discretizing at the points $$(k\lambda )_{k\in {\mathbb {Z}}_+}$$, and denote the discretized trajectory by $$x_k := x(k\lambda )$$.

Now, on one hand, using the forward discretization $${\dot{x}}(t) \approx \frac{x_{k+1}-x_k}{\lambda }$$ gives

\begin{aligned} x_{k+1} = x_k - \lambda B(x_k). \end{aligned}
(3)

In the particular case when B is the gradient of a function, (3) is nothing more than the classical gradient descent method. On the other hand, using the backward discretization $${\dot{x}}(t) \approx \frac{x_{k}-x_{k-1}}{\lambda }$$ gives

\begin{aligned} x_{k} = J_{\lambda B}(x_{k-1}), \end{aligned}
(4)

where $$J_{A}:=({{\,\mathrm{Id}\,}}+A)^{-1}$$ denotes the resolvent of a (potentially multi-valued) maximal monotone operator $$A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$. This iteration is precisely the proximal point algorithm for the monotone inclusion (1). It is worth emphasizing that (3) and (4) are different iterative algorithms which, in general, do not converge under the same conditions. In particular, if B is monotone but not cocoercive, then (4) converges to a solution for any $$\lambda >0$$, whereas (3) may fail to converge. Nevertheless, both algorithms correspond to the same continuous dynamical system (2).
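To make the contrast concrete, the following self-contained sketch (our own illustrative example, not from the text) runs both discretizations on the rotation operator $$B(x,y)=(y,-x)$$, which is monotone and 1-Lipschitz but not cocoercive:

```python
import numpy as np

# B(x, y) = (y, -x): monotone and 1-Lipschitz, but not cocoercive.
B = np.array([[0.0, 1.0], [-1.0, 0.0]])

lam = 0.5
x_fwd = np.array([1.0, 0.0])            # explicit iteration (3)
x_bwd = np.array([1.0, 0.0])            # proximal point iteration (4)
J = np.linalg.inv(np.eye(2) + lam * B)  # resolvent of lam*B is linear here

for _ in range(100):
    x_fwd = x_fwd - lam * (B @ x_fwd)   # norm grows by sqrt(1 + lam^2) each step
    x_bwd = J @ x_bwd                   # norm shrinks by 1/sqrt(1 + lam^2)

print(np.linalg.norm(x_fwd))  # diverges
print(np.linalg.norm(x_bwd))  # tends to 0
```

Since $${{\,\mathrm{Id}\,}}\pm \lambda B$$ has singular values $$\sqrt{1+\lambda ^2}$$, the explicit step expands the norm by exactly that factor at every iteration, while the resolvent step contracts it by the same factor.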

In this work, we exploit the same type of relationship between continuous and discrete dynamical systems to discover a new algorithm for monotone inclusions of the form

\begin{aligned} \text {find}~x\in {\mathcal {H}}\quad \text {such that}\quad 0\in (A+B)(x), \end{aligned}
(5)

where $$A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ and $$B:{\mathcal {H}}\rightarrow {\mathcal {H}}$$ are (maximally) monotone operators with B being L-Lipschitz continuous (but not necessarily cocoercive). More precisely, by using a non-standard discretization of the continuous time Douglas–Rachford algorithm, we obtain

\begin{aligned} x_{k+1} = J_{\lambda A }\bigl (x_k - \lambda B(x_k)\bigr ) - \lambda \bigl (B(x_k)- B(x_{k-1})\bigr ), \end{aligned}
(6)

which, as we will show, converges weakly to a solution of (5) whenever $$\lambda \in (0,\frac{1}{3L})$$. Note also that, by choosing the operators A and B appropriately, the setting of (5) covers smooth–nonsmooth convex minimization, monotone inclusions through duality, and saddle point problems with smooth convex–concave couplings. For further details, see .
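As a quick numerical sanity check of (6) (our own toy instance; the operators below are illustrative choices, not from the paper), take A to be the normal cone of the box $$[1,2]^n$$, so that $$J_{\lambda A}$$ is the projection onto it, and let B be an affine monotone Lipschitz operator:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
S = rng.standard_normal((n, n))
M = (S - S.T) + 0.5 * np.eye(n)      # monotone linear part (symmetric part 0.5*Id)
b = rng.standard_normal(n)
B = lambda x: M @ x + b              # single-valued, Lipschitz operator
L = np.linalg.norm(M, 2)             # its Lipschitz constant

proj = lambda x: np.clip(x, 1.0, 2.0)  # J_{lam*A} for A = N_C, C = [1, 2]^n

lam = 0.9 / (3 * L)                  # step size inside (0, 1/(3L))
x_prev = np.ones(n)
x = np.ones(n)
for _ in range(50000):
    x_next = proj(x - lam * B(x)) - lam * (B(x) - B(x_prev))  # iteration (6)
    x_prev, x = x, x_next

# x solves 0 in A(x) + B(x) iff x is a fixed point of x -> proj(x - lam*B(x))
residual = np.linalg.norm(x - proj(x - lam * B(x)))
print(residual)
```

Here B happens to be strongly monotone, which is used only so that the fixed-point residual settles quickly in a short run; the convergence theory below requires just monotonicity and Lipschitz continuity.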

Despite substantial progress in monotone operator theory, there are relatively few splitting algorithms for solving monotone inclusions of the form (5) which use forward evaluations of B. Tseng’s forward-backward-forward algorithm [25], published in 2000, was the first such method capable of solving (5). Until recently, it was the only known method with this property; however, there has since been progress in the area with the discovery of further methods having this property [16, 17, 22]. In this connection, see also [8, 12].

The remainder of this work is organized as follows. In Sect. 2, we discuss the classical Douglas–Rachford algorithm and study an alternative form of its continuous time dynamical system. In Sect. 3, we discretize this alternative form to obtain (6) and prove its convergence. In Sect. 4, we briefly show how the same idea can be applied to recover a primal–dual algorithm which was recently proposed in [19, Algorithm 1] and [18]. Section 5 concludes our work by suggesting avenues for further investigation.

## From the Discrete to the Continuous

The Douglas–Rachford method is an algorithm for finding a zero of the sum of two maximally monotone operators A and B. This popular splitting method requires only the evaluation of the resolvent of each operator individually, rather than the resolvent of their sum. The method was first formulated for solving linear equations in [13] and later generalized to monotone inclusions in [20].

The method can be compactly described as the fixed point iteration

\begin{aligned} z_{k+1} = \left( \frac{{{\,\mathrm{Id}\,}}+ R_{\lambda A} R_{\lambda B}}{2}\right) z_k, \end{aligned}
(7)

where $$R_{\lambda B}=2J_{\lambda B}-{{\,\mathrm{Id}\,}}$$ denotes the reflected resolvent of a monotone operator $$\lambda B$$. Its behavior is summarized in the following theorem.

### Theorem 1

[7, Theorem 25.6] Let $$A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ and $$B:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ be maximally monotone operators with $${{\,\mathrm{zer}\,}}(A+B) \ne \varnothing$$. Let $$\lambda >0$$ and $$z_0\in {\mathcal {H}}$$. Then the sequence $$(z_k)$$, generated by (7), satisfies

(i) $$(z_k)$$ converges weakly to a point $$z\in {{\,\mathrm{Fix}\,}}(R_{\lambda A} R_{\lambda B})$$;

(ii) $$(J_{\lambda B}z_k)$$ converges weakly to $$J_{\lambda B}z \in {{\,\mathrm{zer}\,}}(A+B)$$.
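Theorem 1 is easy to observe numerically. The sketch below (our illustration; the operators are placeholder choices for which both resolvents are explicit) applies iteration (7) with $$A=N_{[1,2]^n}$$, whose resolvent is the projection onto the box, and an affine strongly monotone B, whose resolvent is a linear solve:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
S = rng.standard_normal((n, n))
M = (S - S.T) + 0.5 * np.eye(n)            # B(x) = M x + b is maximally monotone
b = rng.standard_normal(n)
lam = 0.7

J_B = lambda z: np.linalg.solve(np.eye(n) + lam * M, z - lam * b)  # J_{lam*B}
J_A = lambda z: np.clip(z, 1.0, 2.0)       # J_{lam*A} for A = N_{[1,2]^n}
R_A = lambda z: 2 * J_A(z) - z             # reflected resolvents
R_B = lambda z: 2 * J_B(z) - z

z = rng.standard_normal(n)
for _ in range(5000):
    z = 0.5 * (z + R_A(R_B(z)))            # iteration (7)

x = J_B(z)                                 # the shadow point of Theorem 1(ii)
residual = np.linalg.norm(x - J_A(x - lam * (M @ x + b)))
print(residual)
```

At convergence, $$x=J_{\lambda B}(z)$$ is a fixed point of $$x\mapsto J_{\lambda A}(x-\lambda B(x))$$, which characterizes membership in $${{\,\mathrm{zer}\,}}(A+B)$$.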

The iteration (7) can be viewed as a discretization of the continuous time dynamical system

\begin{aligned} {\dot{z}}(t)+z(t) = \left( \frac{{{\,\mathrm{Id}\,}}+ R_{\lambda A} R_{\lambda B}}{2}\right) z(t), \end{aligned}
(8)

where the discretizations $${\dot{z}}(t)\approx z_{k+1}-z_k$$ and $$z(t)\approx z_k$$ are used. Since the operator $$R_{\lambda A} R_{\lambda B}$$ is nonexpansive (i.e., 1-Lipschitz), the Picard–Lindelöf theorem [15, Theorem 2.2] implies that, for any $$z_0\in {\mathcal {H}}$$, there exists a unique trajectory z(t) satisfying (8) and the initial condition $$z(0) = z_0$$.

Let us now express this dynamical system in an alternative form. First, by using the definition of the reflected resolvent, we observe that (8) can be written as

\begin{aligned} {\dot{z}}(t) = J_{\lambda A}\bigl (2J_{\lambda B}(z(t)) - z(t) \bigr ) - J_{\lambda B}(z(t)). \end{aligned}
(9)

Denote $$x(t) = J_{\lambda B}(z(t))$$ and $$y(t) = z(t) - x(t)$$. From the definition of the resolvent, $$y(t)\in \lambda B(x(t))$$ and we therefore have

\begin{aligned} z(t) = x(t) + y(t),\quad \text {and} \quad {\dot{z}}(t) = {\dot{x}}(t) + {\dot{y}}(t). \end{aligned}
(10)

By using these identities to eliminate z from (9), we obtain

\begin{aligned} \left\{ \begin{array}{ll} {\dot{x}}(t)+x(t) = J_{\lambda A}\left( x(t)- y(t)\right) -{\dot{y}}(t), \\ y(t)\in \lambda B(x(t)). \end{array}\right. \end{aligned}
(11)

This system can be viewed as the continuous dynamical system associated with the shadow trajectories, x(t), of the Douglas–Rachford system (8) specified by z(t). In particular, this fact implies the existence of the trajectories x(t) and y(t). In a later section, we will use a discretization of this system to obtain a new splitting algorithm. Note also that, by using the definition of the resolvent $$J_{\lambda A}$$, (11) can be equivalently expressed as

\begin{aligned} \left\{ \begin{array}{ll} {\dot{x}}(t)+y(t)+{\dot{y}}(t) &{}\in -\lambda A\bigl ({\dot{x}}(t)+x(t)+{\dot{y}}(t)\bigr ), \\ y(t) &{}\in \lambda B(x(t)). \end{array}\right. \end{aligned}
(12)

We begin with a theorem concerning the asymptotic behavior of (11). Although this result can be obtained, with some work, from [10, Theorem 6], we give a more direct proof which serves the additional purpose of providing insights useful for the analysis of the discrete case. We require the following two preparatory lemmas.

### Lemma 1

Let $$\lambda >0$$. Suppose $$A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ and $$B:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ are maximally monotone operators. Then the set-valued operator on $${\mathcal {H}}\times {\mathcal {H}}$$ defined by

## From the Continuous to the Discrete

In this section, we devise a new splitting algorithm by considering different discretizations of the dynamical system (11). For the remainder of this work, we will suppose that B is a single-valued operator. In this case, the system (11) simplifies to

\begin{aligned} \left\{ \begin{array}{ll} {\dot{x}}(t)+x(t) &{}= J_{\lambda A}\left( x(t)- y(t)\right) -{\dot{y}}(t), \\ y(t) &{}= \lambda B(x(t)). \end{array} \right. \end{aligned}
(17)

In order to discretize this system, let us replace $$x(t)\approx x_k$$ and $$y(t)\approx y_k$$. As two derivatives appear in (17), there are many combinations of possible discretizations. One involves using forward discretizations of both $${\dot{x}}(t)$$ and $${\dot{y}}(t)$$, that is,

\begin{aligned} {\dot{x}}(t) \approx x_{k+1}-x_k,\quad {\dot{y}}(t) \approx y_{k+1}-y_k. \end{aligned}
(18)

Under this discretization, (17) becomes

\begin{aligned} x_{k+1} = J_{\lambda A}(x_k-\lambda B(x_k)) - \lambda \bigl (B(x_{k+1})-B(x_k)\bigr ). \end{aligned}
(19)

As written, this expression does not give rise to a useful algorithm, since $$x_{k+1}$$ appears on both sides of the equation. However, we note that by taking $$z_k=x_k+y_k=({{\,\mathrm{Id}\,}}+\lambda B)(x_k)$$ and rearranging, we obtain

\begin{aligned} z_{k+1} = z_k + J_{\lambda A}(2J_{\lambda B}(z_k)-z_k) - J_{\lambda B}(z_k), \end{aligned}

which is precisely the usual Douglas–Rachford algorithm given in (7).
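For completeness, the rearrangement can be spelled out. Adding $$y_{k+1}$$ to both sides of (19) and using $$z_k=x_k+y_k$$ with $$x_k=J_{\lambda B}(z_k)$$, so that $$x_k-y_k=2x_k-z_k$$ and $$y_k=z_k-x_k$$, gives

\begin{aligned} z_{k+1} = x_{k+1}+y_{k+1} = J_{\lambda A}(x_k-y_k) + y_k = J_{\lambda A}\bigl (2J_{\lambda B}(z_k)-z_k\bigr ) + z_k - J_{\lambda B}(z_k), \end{aligned}

which is exactly the displayed recursion.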

To derive a new algorithm, we consider a different discretization of (17). To this end, we perform a forward discretization of $${\dot{x}}(t)$$ and a backward discretization of $${\dot{y}}(t)$$, that is,

\begin{aligned} {\dot{x}}(t) \approx x_{k+1}-x_k,\quad {\dot{y}}(t) \approx y_{k}-y_{k-1}. \end{aligned}
(20)

Under this discretization, (17) becomes

\begin{aligned} x_{k+1} = J_{\lambda A }\bigl (x_k - \lambda B(x_k)\bigr ) - \lambda \bigl (B(x_k)- B(x_{k-1})\bigr ). \end{aligned}
(21)

Although not surprising, it is interesting to note that (19) and (21) only differ in the indices which appear in the last two terms. In particular, in this expression, $$x_{k+1}$$ does not appear on the right-hand side.

### Remark 1

(Timestep in the discretization) In the above derivation, we assumed that the discretizations of $${\dot{x}}(t)$$ and $${\dot{y}}(t)$$ were performed with respect to a unit timestep for simplicity of exposition. However, if a timestep $$\gamma >0$$ is used, then (20) becomes

\begin{aligned} {\dot{x}}(t) \approx \frac{x_{k+1}-x_k}{\gamma },\quad {\dot{y}}(t) \approx \frac{y_{k}-y_{k-1}}{\gamma }. \end{aligned}

Under this discretization, (17) becomes

\begin{aligned} x_{k+1} = (1-\gamma )x_k+\gamma J_{\lambda A}(x_k-\lambda B(x_k)) - \lambda \bigl (B(x_{k})-B(x_{k-1})\bigr ). \end{aligned}

In other words, for timesteps $$\gamma \in (0,1)$$, the resolvent term in (21) is replaced by a convex combination of the resolvent with the previous iterate.

Before turning our attention to the convergence properties of this iteration, we make the following remark.

### Remark 2

Backward (resp. forward) discretizations of a derivative usually give rise to backward (resp. forward) steps in the discrete counterpart of an algorithm. This is, for instance, the case for the forward-backward method, which includes the discussion from Sect. 1 as a special case. It is curious to note, however, that here forward (resp. backward) discretizations gave rise to backward (resp. forward) operators in the discrete counterparts. In particular, two forward discretizations of (17) gave rise to the Douglas–Rachford algorithm, which has two backward steps, whereas one forward and one backward discretization produced a method also having one forward and one backward step.

We now prove the following lemma, which may be of interest in its own right due to the very general form of the recurrence relation.

### Lemma 3

Let $$A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ be a maximal monotone operator and let $$(y_k)\subset {\mathcal {H}}$$ be an arbitrary sequence. Let $$x_0\in {\mathcal {H}}$$ and consider $$(x_k)$$ defined by

\begin{aligned} x_{k+1} = J_{A}(x_k-y_k) - (y_k - y_{k-1}). \end{aligned}
(22)

Then, for all $$x\in {\mathcal {H}}$$ and $$y \in -A(x)$$, we have

\begin{aligned}&\Vert (x_{k+1} +y_k) - (x+y)\Vert ^2 \le \left\| (x_{k}+y_{k-1}) - (x+y) \right\| ^2 - 2 \left\langle y_k - y, x_k - x\right\rangle \nonumber \\&\quad + 4\left\langle y_k-y_{k-1}, x_k-x_{k+1}\right\rangle -\left\| x_{k+1}-x_k \right\| ^2 -3\left\| y_k-y_{k-1} \right\| ^2. \end{aligned}
(23)

### Proof

By the definition of the resolvent and (22), it follows that

\begin{aligned} x_{k+1}-x_k + y_k + ( y_k - y_{k-1}) \in - A\bigl (x_{k+1} + y_k-y_{k-1}\bigr ). \end{aligned}
(24)

Since $$-y \in A(x)$$ and A is monotone, we have

\begin{aligned} 0\le \left\langle x_{k+1}-x_k + y_k + (y_k-y_{k-1}) - y, x - x_{k+1}- (y_k-y_{k-1}) \right\rangle , \end{aligned}

which is equivalent to

\begin{aligned} 0\le & {} \left\langle x_{k+1}-x_k,x-x_{k+1}\right\rangle + \left\langle y_k-y,x-x_{k+1}\right\rangle + \left\langle y_k-y_{k-1},x-x_{k+1}\right\rangle \nonumber \\&+ \left\langle y_k - y_{k-1},x_k - x_{k+1}\right\rangle + \left\langle y_k-y_{k-1}, y- y_k\right\rangle - \left\| y_k - y_{k-1} \right\| ^2. \end{aligned}
(25)

To simplify (25), we note that

\begin{aligned} 2\left\langle x_{k+1}-x_k, x-x_{k+1}\right\rangle&= \left\| x_{k}-x \right\| ^2 - \left\| x_{k+1} -x_k \right\| ^2 - \left\| x_{k+1} -x \right\| ^2,\\ 2 \left\langle y_k-y_{k-1}, y-y_k \right\rangle&= \left\| y_{k-1} -y \right\| ^2 -\left\| y_k-y_{k-1} \right\| ^2 -\left\| y_k -y \right\| ^2 ,\\ \left\langle y_k-y_{k-1}, x-x_{k+1}\right\rangle&=\left\langle y_k-y_{k-1},x_k-x_{k+1}\right\rangle \\&\quad +\left\langle y_{k-1}-y, x_k-x\right\rangle +\left\langle y-y_k, x_k-x\right\rangle . \end{aligned}

Now, using the above three identities in (25), we obtain

\begin{aligned}&\left\| x_{k+1}-x \right\| ^2 + 2\left\langle y_k-y, x_{k+1}-x\right\rangle + \left\| y_k-y \right\| ^2 \nonumber \\&\quad \le \left\| x_{k}-x \right\| ^2 + 2\left\langle y_{k-1}-y, x_{k}-x\right\rangle + \left\| y_{k-1}-y \right\| ^2 \nonumber \\&{\qquad } + 4\left\langle y_k-y_{k-1}, x_k-x_{k+1}\right\rangle -\left\| x_{k+1}-x_k \right\| ^2\nonumber \\&{\qquad } - 3\left\| y_k-y_{k-1} \right\| ^2 - 2\left\langle y_k-y, x_k-x\right\rangle . \end{aligned}
(26)

The equivalence between the last inequality and (23) follows by noting that $$\left\| x_{k+1}-x \right\| ^2 + 2\left\langle y_k-y, x_{k+1}-x\right\rangle + \left\| y_k-y \right\| ^2 = \Vert (x_{k+1}+y_k)-(x+y)\Vert ^2$$, and similarly for the terms with index k. $$\square$$

Since (21) is of the form specified by Lemma 3, the lemma suggests one possible way to prove convergence of (21): the quantity $$\left\| x_k+y_{k-1} - x - y \right\| ^2$$ will decrease if the remaining terms on the right-hand side of (23) can be estimated appropriately. The following theorem, which is our main result regarding convergence of (21), makes use of this observation.

### Theorem 3

Let $$A:{\mathcal {H}}\rightrightarrows {\mathcal {H}}$$ be maximally monotone and $$B:{\mathcal {H}}\rightarrow {\mathcal {H}}$$ be monotone and L-Lipschitz with $${{\,\mathrm{zer}\,}}(A+B)\ne \varnothing$$. Let $$\varepsilon >0$$, $$\lambda \in \left[ \varepsilon ,\frac{1-3\varepsilon }{3L}\right]$$ and let $$x_0,x_{-1}\in {\mathcal {H}}$$. Then the sequence $$(x_k)$$, generated by (21), satisfies

(i) $$(x_k)$$ converges weakly to a point $${\bar{x}}\in {{\,\mathrm{zer}\,}}(A+B)$$;

(ii) $$(B(x_k))$$ converges weakly to $$B({\bar{x}})$$.

### Proof

Let $$x\in {{\,\mathrm{zer}\,}}(A+B)$$ and set $$y=\lambda B(x)\in -\lambda A(x)$$. Since (21) is of the form specified by (22), we apply Lemma 3 to the monotone operator $$\lambda A$$ with $$y_k=\lambda B(x_k)$$ to deduce that the inequality (23) holds. Now, using that B is monotone, we have $$\left\langle y_k-y, x_k-x\right\rangle \ge 0$$ and hence

\begin{aligned}&\Vert (x_{k+1} +y_k) - (x+y)\Vert ^2 \le \left\| (x_{k}+y_{k-1}) - (x+y) \right\| ^2 \nonumber \\&\quad + 4\left\langle y_k-y_{k-1}, x_k-x_{k+1}\right\rangle -\left\| x_{k+1}-x_k \right\| ^2 -3\left\| y_k-y_{k-1} \right\| ^2. \end{aligned}
(27)

Next, we estimate the inner product in the last line of (27). To this end, note that Young’s inequality gives

\begin{aligned} 2 \left\langle y_k-y_{k-1},x_k-x_{k+1}\right\rangle \le \frac{1}{3} \left\| x_{k+1}-x_k \right\| ^2 + 3 \left\| y_k-y_{k-1} \right\| ^2, \end{aligned}
(28)

and that the Lipschitz continuity of B yields

\begin{aligned} 2 \left\langle y_k-y_{k-1},x_k-x_{k+1}\right\rangle&\le \lambda L \bigl (\left\| x_{k}-x_{k-1} \right\| ^2 + \left\| x_{k+1}-x_k \right\| ^2\bigr ). \end{aligned}
(29)

Combining these two estimates with (27) gives the inequality

\begin{aligned}&\left\| x_{k+1}+y_k -x-y \right\| ^2 + \left( \frac{2}{3} -\lambda L\right) \left\| x_{k+1}-x_k \right\| ^2\\&\quad \le \left\| x_{k}+ y_{k-1}-x -y \right\| ^2 + \lambda L \left\| x_k-x_{k-1} \right\| ^2. \end{aligned}

By denoting $$z_{k}=x_k+y_{k-1}$$ and $$z=x+y$$, the previous inequality implies

\begin{aligned} \left\| z_{k+1} -z \right\| ^2 + \left( \frac{1}{3}+\varepsilon \right) \left\| x_{k+1}-x_k \right\| ^2\le \left\| z_k - z \right\| ^2 + \frac{1}{3} \left\| x_k-x_{k-1} \right\| ^2, \end{aligned}
(30)

which telescopes to yield

\begin{aligned} \left\| z_{k+1} -z \right\| ^2 + \frac{1}{3}\left\| x_{k+1}-x_k \right\| ^2 +\varepsilon \sum _{i=1}^k\left\| x_{i+1}-x_i \right\| ^2 \le \left\| z_1 - z \right\| ^2 + \frac{1}{3} \left\| x_1-x_{0} \right\| ^2. \end{aligned}

From this, it follows that $$(z_k)$$ is bounded and that $${\left\| x_k-x_{k-1} \right\| \rightarrow 0}$$. The latter, together with Lipschitz continuity of B, implies $${\left\| y_{k}-y_{k-1} \right\| \rightarrow 0}$$ and, consequently, we also have that $$\left\| z_k-z_{k-1} \right\| \rightarrow 0$$. Since $$z_k = ({{\,\mathrm{Id}\,}}+\lambda B)(x_k) + (y_{k-1}-y_k)$$, we have

\begin{aligned} x_k = J_{\lambda B}\left( z_k - (y_{k-1}-y_k) \right) . \end{aligned}

Since $$(z_k)$$ is bounded, $$\left\| y_k-y_{k-1} \right\| \rightarrow 0$$ and $$J_{\lambda B}$$ is nonexpansive, it follows that the sequence $$(x_k)$$ is also bounded. Moreover, due to (30), the following limit exists

\begin{aligned} \lim _{k\rightarrow \infty }\left( \left\| z_k-z \right\| ^2+\frac{1}{3}\left\| x_{k}-x_{k-1} \right\| ^2\right) =\lim _{k\rightarrow \infty }\left\| z_k-z \right\| ^2. \end{aligned}

Now, by using the definition of the resolvent $$J_{\lambda A}$$, we can express (24) in the form

\begin{aligned} -\left( {\begin{array}{c} z_{k+1}-z_k \\ z_{k+1}-z_k \end{array}}\right) \in \left( \begin{bmatrix} \lambda A &{} 0 \\ 0 &{} (\lambda B)^{-1} \end{bmatrix}+\begin{bmatrix} 0 &{} {{\,\mathrm{Id}\,}}\\ -{{\,\mathrm{Id}\,}} &{} 0 \\ \end{bmatrix}\right) \left( {\begin{array}{c}z_{k+1}-z_k+x_k\\ z_{k+1}-x_{k+1}\end{array}}\right) . \end{aligned}

(31)

Let $$(x,z)$$ be a weak cluster point of the bounded sequence $$(x_k,z_k)$$. Taking the limit along this subsequence in (31), using Lemma 1, and unravelling the resulting expression gives

\begin{aligned} \left\{ \begin{aligned} 0&\in \lambda A(x)+(z-x), \\ x&\in (\lambda B)^{-1}(z-x),\\ \end{aligned}\right. \quad \implies \quad \left\{ \begin{aligned} x&\in {{\,\mathrm{zer}\,}}(A+B),\\ z&= x+\lambda B(x).\\ \end{aligned}\right. \end{aligned}

(32)

Applying Opial’s Lemma [7, Lemma 2.39], it then follows that $$(z_k)$$ converges weakly to a point $${\bar{z}}={\bar{x}}+\lambda B({\bar{x}})$$, where $${\bar{x}}$$ is a weak cluster point of $$(x_k)$$. But then the definition of $$J_{\lambda B}$$ yields $${\bar{x}}=J_{\lambda B}({\bar{z}})$$, which implies that $$J_{\lambda B}({\bar{z}})$$ is the unique weak cluster point of $$(x_k)$$. The sequence $$(x_k)$$ therefore converges weakly to a point $${\bar{x}}\in {{\,\mathrm{zer}\,}}(A+B)$$. To complete the proof, simply note that $$y_{k-1}=z_{k}-x_k\rightharpoonup {\bar{z}}-{\bar{x}} = \lambda B({\bar{x}})$$ as $$k\rightarrow \infty$$. $$\square$$

Some remarks regarding Theorem 3 and its proof are in order.

### Remark 3

(Continuous and discrete proofs) The sequence $$z_{k}=x_k+y_{k-1}$$ plays a similar role in Theorem 3 to the trajectory $$z(t)=x(t)+y(t)$$ in Theorem 2. This does, however, highlight a subtle difference between the two proofs: in the discrete case, we have $$x_k=J_{\lambda B}\bigl (z_k+(y_k-y_{k-1})\bigr )$$ whereas, in the continuous case, we have $$x(t)=J_{\lambda B}(z(t))$$.
### Remark 4

In the original submission of this manuscript, we conjectured that the interval in which $$\lambda$$ lies could be extended to $$\lambda \in (0,\frac{1}{2L})$$. Later, in a private communication, Sebastian Banert constructed a counterexample showing that this is not the case. Indeed, consider the setting with $${\mathcal {H}}={\mathbb {R}}^2$$, $$A=3B$$ and $$B(x,y)=(y,-x)$$. Then choosing $$\lambda =1/3$$ and initializing with $$x_0=-B(x_{-1})$$ yields that (21) satisfies $$x_{k+2}=-x_k$$ for all $$k\in {\mathbb {N}}$$. In particular, the sequence $$(x_k)$$ does not converge when $$x_0$$ is nonzero.

Interestingly, our original motivation for considering the continuous dynamical system (11) did not arise from its connection to the Douglas–Rachford algorithm, but rather from its connection to the operator splitting method studied in [22] given by

\begin{aligned} x_{k+1} = J_{\lambda A }\bigl (x_k - \lambda B(x_k)- \lambda (B(x_k)- B(x_{k-1})) \bigr ). \end{aligned}

(33)

Note that the iterations (21) and (33) look very similar and, in fact, coincide if $$A=0$$. For (33), convergence has been established when $$\lambda < \frac{1}{2L}$$, which is slightly better than for (21). On the other hand, the analysis of dynamical systems corresponding to (33) is more complicated. In particular, a natural candidate for a continuous analogue of (33) is given by

\begin{aligned} \left\{ \begin{array}{ll} {\dot{x}}(t)+x(t) &{}= J_{\lambda A}\left( x(t)- y(t) -{\dot{y}}(t)\right) , \\ y(t) &{}= \lambda B(x(t)). \end{array}\right. \end{aligned}

(34)

Because we are unable to decouple the derivatives $${\dot{x}}(t)$$ and $${\dot{y}}(t)$$ in (34) in general, it is not clear how to prove existence of its trajectory x(t).

## Primal–Dual Algorithms

In this section, we illustrate another application of Lemma 3 in the analysis of a primal–dual algorithm.
Consider the bilinear convex–concave saddle point problem

\begin{aligned} \min _{u\in {\mathcal {H}}_1} \max _{v\in {\mathcal {H}}_2} \, g(u) + \left\langle Ku,v\right\rangle - f^*(v), \end{aligned}

(35)

where $$g:{\mathcal {H}}_1\rightarrow (-\infty , +\infty ]$$ and $$f:{\mathcal {H}}_2 \rightarrow (-\infty , +\infty ]$$ are proper convex lsc functions, $$K:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2$$ is a bounded linear operator with norm $$\left\| K \right\|$$, and $$f^*$$ denotes the Fenchel conjugate of f. A popular method for solving this problem is the Chambolle–Pock primal–dual method [11] defined by

\begin{aligned} \left\{ \begin{aligned} u_{k+1}&= {{\,\mathrm{prox}\,}}_{\tau g}(u_k - \tau K^*v_k),\\ v_{k+1}&= {{\,\mathrm{prox}\,}}_{\sigma f^*}(v_k + \sigma K(2u_{k+1}-u_k)). \end{aligned}\right. \end{aligned}

(36)

Under the assumption that the solution set of (35) is nonempty and that $$\tau \sigma \left\| K \right\| ^2 < 1$$, one can prove that the sequence $$(u_k, v_k)$$ converges weakly to a saddle point of (35). In the spirit of (21), we might consider another primal–dual algorithm:

\begin{aligned} \left\{ \begin{aligned} u_{k+1}&= {{\,\mathrm{prox}\,}}_{\tau g}(u_k - \tau K^*v_k),\\ v_{k+1}&= {{\,\mathrm{prox}\,}}_{\sigma f^*}(v_k + \sigma Ku_{k+1}) + \sigma (Ku_{k+1}-Ku_k). \end{aligned}\right. \end{aligned}

(37)

This algorithm can be viewed as a special case of [18, Algorithm 1] corresponding to the choice of parameters $$\mu =0$$ and $$\theta =\lambda =1$$ (see also [19]). In what follows, we provide an alternative derivation of its convergence using the same lemma as in the analysis of the shadow Douglas–Rachford algorithm. Rather than present the full proof, we focus only on the most important ingredient: the fact that $$(u_k)$$ and $$(v_k)$$ remain bounded. Once this is established, the rest follows the same argument as in Theorem 3.
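As an illustrative sanity check (our own toy instance, not from the paper), the sketch below runs (37) with the quadratic choices $$g(u)=\frac{1}{2}\Vert u-a\Vert ^2$$ and $$f^*(v)=\frac{1}{2}\Vert v\Vert ^2$$, for which both proximal maps are affine and the saddle point can be computed directly from the optimality conditions $$u = a - K^*v$$ and $$v = Ku$$:

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 3, 4
K = rng.standard_normal((m, n))
a = rng.standard_normal(n)

tau = sigma = 0.9 / np.linalg.norm(K, 2)     # tau*sigma*||K||^2 = 0.81 < 1
prox_g = lambda w: (w + tau * a) / (1 + tau)  # prox of tau*g
prox_fs = lambda w: w / (1 + sigma)           # prox of sigma*f^*

u, v = np.zeros(n), np.zeros(m)
for _ in range(5000):
    u_new = prox_g(u - tau * K.T @ v)
    v = prox_fs(v + sigma * K @ u_new) + sigma * K @ (u_new - u)  # iteration (37)
    u = u_new

# optimality: u = a - K^T v and v = K u, hence (I + K K^T) v = K a
v_star = np.linalg.solve(np.eye(m) + K @ K.T, K @ a)
print(np.linalg.norm(v - v_star), np.linalg.norm(u - (a - K.T @ v_star)))
```

The strong convexity of this toy instance is incidental; it merely makes the iterates settle well within the short run above.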
### Theorem 4

Let $$g:{\mathcal {H}}_1 \rightarrow (-\infty , +\infty ]$$ and $$f:{\mathcal {H}}_2\rightarrow (-\infty , +\infty ]$$ be proper convex lsc functions and let $$K:{\mathcal {H}}_1 \rightarrow {\mathcal {H}}_2$$ be a bounded linear operator with norm $$\left\| K \right\|$$ such that the solution set of (35) is nonempty. Let $$\tau \sigma \left\| K \right\| ^2 < 1$$, let $$u_0\in {\mathcal {H}}_1$$, and let $$v_0\in {\mathcal {H}}_2$$. Then the sequence $$(u_k, v_k)$$, generated by (37), converges weakly to a solution of (35).

### Proof

Let (u, v) be a saddle point of (35). Then the first-order optimality conditions give $$-K^*v \in \partial g(u)$$ and $$Ku \in \partial f^*(v)$$. By applying Lemma 3 for a fixed $$k\in {\mathbb {N}}$$ with

\begin{aligned} A = \tau \partial g,\ x_k = u_k,\ y_k = y_{k-1} = \tau K^*v_k,\ x=u,\ y = \tau K^*v, \end{aligned}

we obtain

\begin{aligned} \left\| u_{k+1} - u \right\| ^2 + 2 \tau \left\langle K^* v_k - K^*v, u_{k+1}-u\right\rangle \le \Vert u_{k} -u \Vert ^2 -\left\| u_{k+1}-u_k \right\| ^2, \end{aligned}

(38)

where, instead of (23), we used its equivalent form (26). Similarly, by applying Lemma 3 for a fixed $$k\in {\mathbb {N}}$$ with

\begin{aligned} A = \sigma \partial f^*,\ x_k = v_k,\ y_k = -\sigma Ku_{k+1},\ y_{k-1} = -\sigma Ku_k, \ x= v, \ y = -\sigma Ku, \end{aligned}

we obtain

\begin{aligned}&\left\| (v_{k+1}- \sigma Ku_{k+1}) - (v - \sigma Ku) \right\| ^2 \le \left\| (v_{k}- \sigma Ku_{k}) - (v - \sigma Ku) \right\| ^2\nonumber \\&\quad -\left\| v_{k+1}-v_k \right\| ^2 - 3\sigma ^2 \left\| K(u_{k+1}-u_k) \right\| ^2 \nonumber \\&\quad - 4\sigma \left\langle Ku_{k+1} - Ku_k, v_k-v_{k+1}\right\rangle + 2\sigma \left\langle K(u_{k+1}-u),v_k-v\right\rangle . \end{aligned}

(39)

By applying Young’s inequality and using the inequality $$\tau \sigma \left\| K \right\| ^2 <1$$, we have

\begin{aligned}&-4 \left\langle Ku_{k+1} - Ku_k, v_k-v_{k+1}\right\rangle \le 4\sigma \left\| Ku_{k+1}-Ku_k \right\| ^2+\frac{1}{\sigma }\left\| v_{k+1}-v_k \right\| ^2 \nonumber \\&\quad \le \frac{1}{\tau }\left\| u_{k+1}-u_k \right\| ^2 +\frac{1}{\sigma }\left\| v_{k+1}-v_k \right\| ^2 + 3\sigma \left\| K(u_{k+1}-u_k) \right\| ^2. \end{aligned}

(40)

Now, multiplying (38) by $$1/\tau$$ and (39) by $$1/\sigma$$, summing the two inequalities, and then using the estimate (40) yields

\begin{aligned}&\frac{1}{\tau }\left\| u_{k+1} - u \right\| ^2 + \frac{1}{\sigma }\left\| (v_{k+1}- \sigma Ku_{k+1}) - (v - \sigma Ku) \right\| ^2 \nonumber \\&\quad \le \frac{1}{\tau }\left\| u_{k} - u \right\| ^2 + \frac{1}{\sigma }\left\| (v_{k}- \sigma Ku_{k}) - (v - \sigma Ku) \right\| ^2. \end{aligned}

(41)

By telescoping this inequality, one obtains boundedness of $$(u_k)$$ and $$(v_k)$$. In fact, a slightly tighter estimate in (40) would yield $$\left\| u_k-u_{k-1} \right\| \rightarrow 0$$ and $$\left\| v_k-v_{k-1} \right\| \rightarrow 0$$ (since the inequality $$\tau \sigma \left\| K \right\| ^2 < 1$$ is strict). $$\square$$

## Concluding Remarks/Future Directions

In this work, we proposed and analyzed a new algorithm for finding a zero of the sum of two monotone operators, one of which is assumed to be Lipschitz continuous. This algorithm arises naturally from a non-standard discretization of a continuous dynamical system associated with the Douglas–Rachford algorithm. To conclude, we outline possible directions for future work.

• Linesearch It would be interesting to incorporate a linesearch procedure into the shadow Douglas–Rachford method. Similarly, it makes sense to consider a continuous dynamical scheme with variable steps, as was done, for example, in  for Tseng’s method.
• Inertial terms It is important to study extensions of (11) and (21) which incorporate additional inertial and relaxation terms, as was done in the recent work [5] for the forward–backward method. Combining inertial and relaxation effects allows one to go beyond the standard bound of $$\frac{1}{3}$$ for the stepsize associated with the inertial term.

• Role of reflection Perhaps the most interesting and challenging direction for future work is to understand why the inclusion of a “reflection term” in an algorithm allows convergence to be proven under milder hypotheses. For instance, applied to the saddle point problem (35), the famous Arrow–Hurwicz algorithm [4] can fail to converge. In contrast, both (36) and (37), which can be viewed as its “reflected” modifications, do converge. Similarly, for the monotone variational inequality $$0\in N_C(x) + B(x)$$, where C is a closed convex set and $$N_C$$ is its normal cone, the projected gradient algorithm

\begin{aligned} x_{k+1} = P_C (x_k - \lambda B(x_k)) \end{aligned}

need not converge, but its “reflected” modification [21] given by

\begin{aligned} x_{k+1} = P_C (x_k - \lambda B(2x_k-x_{k-1})) \end{aligned}

does converge to a solution. For the more general monotone inclusion $$0 \in A(x) + B(x)$$, the forward-backward method also need not converge, whereas both of its “reflected” modifications, (21) and (33), do. We note, however, that although all of the aforementioned algorithms share the same “reflected term”, their analyses are not the same. It would be interesting to understand the deeper reasons for their success.

## References

1. Abbas, B., Attouch, H., Svaiter, B.F.: Newton-like dynamics and forward-backward methods for structured monotone inclusions in Hilbert spaces. J. Optim. Theory Appl. 161(2), 331–360 (2014)

2. Al’ber, Y.I.: Continuous regularization of linear operator equations in a Hilbert space. Math. Notes 4(5), 793–797 (1968)

3. Antipin, A.S.: Minimization of convex functions on convex sets by means of differential equations. Diff. Equ. 30(9), 1365–1375 (1994)

4. Arrow, K., Hurwicz, L.: Gradient methods for constrained maxima. Oper. Res. 5(2), 258–265 (1957)

5. Attouch, H., Cabot, A.: Convergence of a relaxed inertial forward-backward algorithm for structured monotone inclusions. Appl. Math. Optim. (2019). https://doi.org/10.1007/s00245-019-09584-z

6. Banert, S., Boţ, R.I.: A forward-backward-forward differential equation and its asymptotic properties. J. Convex Anal. 25(2), 371–388 (2018)

7. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 1st edn. Springer, New York (2011)

8. Bello Cruz, J., Díaz Millán, R.: A variant of forward-backward splitting method for the sum of two monotone operators with a new search strategy. Optimization 64(7), 1471–1486 (2015)

9. Bolte, J.: Continuous gradient projection method in Hilbert spaces. J. Optim. Theory Appl. 119(2), 235–259 (2003)

10. Boţ, R.I., Csetnek, E.R.: A dynamical system associated with the fixed points set of a nonexpansive operator. J. Dyn. Differ. Equ. 29(1), 155–168 (2017)

11. Chambolle, A., Pock, T.: A first-order primal-dual algorithm for convex problems with applications to imaging. J. Math. Imaging Vis. 40(1), 120–145 (2011)

12. Combettes, P.L., Pesquet, J.C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20(2), 307–330 (2012)

13. Douglas, J., Rachford, H.H.: On the numerical solution of heat conduction problems in two and three space variables. Trans. Am. Math. Soc. 82(2), 421–439 (1956)

14. Gavurin, M.K.: Nonlinear functional equations and continuous analogues of iterative methods (in Russian). Isvestiya Vuzov Matem 5, 18–31 (1958)

15. Granas, A., Dugundji, J.: Fixed Point Theory. Springer, New York (2013)

16. Johnstone, P.R., Eckstein, J.: Projective splitting with forward steps: asynchronous and block-iterative operator splitting. arXiv:1803.07043 (2018)

17. Johnstone, P.R., Eckstein, J.: Single-forward-step projective splitting: exploiting cocoercivity. arXiv:1902.09025 (2019)

18. Latafat, P., Patrinos, P.: Primal-dual proximal algorithms for structured convex optimization: a unifying framework. In: Giselsson, P., Rantzer, A. (eds.) Large-Scale and Distributed Optimization, pp. 97–120. Springer, Cham (2018)

19. Latafat, P., Stella, L., Patrinos, P.: New primal-dual proximal algorithm for distributed optimization. In: IEEE 55th Conference on Decision and Control (CDC), pp. 1959–1964 (2016)

20. Lions, P.L., Mercier, B.: Splitting algorithms for the sum of two nonlinear operators. SIAM J. Numer. Anal. 16(6), 964–979 (1979)

21. Malitsky, Y.: Reflected projected gradient method for solving monotone variational inequalities. SIAM J. Optim. 25(1), 502–520 (2015)

22. Malitsky, Y., Tam, M.K.: A forward-backward splitting method for monotone inclusions without cocoercivity. arXiv:1808.04162 (2018)

23. Peypouquet, J., Sorin, S.: Evolution equations for maximal monotone operators: asymptotic analysis in continuous and discrete time. J. Convex Anal. 17(3&4), 1113–1163 (2010)

24. Polyak, B.T.: Some methods of speeding up the convergence of iteration methods. USSR Comput. Math. Math. Phys. 4(5), 1–17 (1964)

25. Tseng, P.: A modified forward-backward splitting method for maximal monotone mappings. SIAM J. Control Optim. 38, 431–446 (2000)


## Acknowledgements

The authors would like to thank the Erwin Schrödinger Institute for their support and hospitality during the thematic program “Modern Maximal Monotone Operator Theory: From Nonsmooth Optimization to Differential Inclusions”. The authors would also like to thank the two anonymous referees for their helpful comments as well as Sebastian Banert for sharing his nice counterexample that we mentioned in Remark 4.

## Funding

ERC was supported by Austrian Science Fund Project P 29809-N32. YM was supported by German Research Foundation Grant No. SFB755-A4.

## Author information


### Corresponding author

Correspondence to Yura Malitsky.

## Additional information

### Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
