Tight Global Linear Convergence Rate Bounds for Douglas-Rachford Splitting

Recently, several authors have shown local and global convergence rate results for Douglas-Rachford splitting under strong monotonicity, Lipschitz continuity, and cocoercivity assumptions. Most of these focus on the convex optimization setting. In the more general monotone inclusion setting, Lions and Mercier showed a linear convergence rate bound under the assumption that one of the two operators is strongly monotone and Lipschitz continuous. We show that this bound is not tight, meaning that no problem from the considered class converges exactly with that rate. In this paper, we present tight global linear convergence rate bounds for that class of problems. We also provide tight linear convergence rate bounds under the assumptions that one of the operators is strongly monotone and cocoercive, and that one of the operators is strongly monotone and the other is cocoercive. All our linear convergence results are obtained by proving the stronger property that the Douglas-Rachford operator is contractive.


Introduction
Douglas-Rachford splitting [12,24] is an algorithm that solves monotone inclusion problems of the form 0 ∈ Ax + Bx, where A and B are maximally monotone operators. An important special case is the composite convex optimization problem
$$\text{minimize}\quad f(x) + g(x) \tag{1.1}$$
where f and g are proper, closed, and convex functions. This holds since the subdifferentials of proper, closed, and convex functions are maximally monotone operators, and since Fermat's rule says that the optimality condition for solving (1.1) is 0 ∈ ∂f(x) + ∂g(x), under a suitable qualification condition. The algorithm has shown great potential in many applications such as signal processing [6], image denoising [32], and statistical estimation [5] (where the dual algorithm ADMM is discussed). It has long been known that Douglas-Rachford splitting converges under quite mild assumptions, see [14,24,13]. However, the rate of convergence in the general case has only recently been shown to be O(1/√k) for the fixed-point residual [18,10,9]. For general maximal monotone operator problems, where one of the operators is strongly monotone and Lipschitz continuous, Lions and Mercier showed in [24] that the Douglas-Rachford algorithm enjoys a linear convergence rate. To the author's knowledge, this was the sole linear convergence rate result for these methods for a long period of time. Recently, however, many works have shown linear convergence rates for Douglas-Rachford splitting and its dual version, ADMM, see [19,20,30,10,8,11,15,28,21,22,4,33,17,16,1,2]. The works in [19,20,10,4,30] concern local linear convergence under different assumptions. The works in [21,22,33] consider distributed formulations, while the works in [8,11,15,28,24,31,27,17,16,1,2] show global convergence rate bounds under various assumptions. Of these, the works in [16,1,2] show tight linear convergence rate bounds. The works in [1,2] show tight convergence rate results for the problem of finding a point in the intersection of two subspaces.
In [16] it is shown that the linear convergence rate bounds in [17] (which are generalizations of the bounds in [15]) are tight for composite convex optimization problems where one function is strongly convex and smooth. All these results, except the one by Lions and Mercier, are stated in the convex optimization setting. In this paper, we will provide tight linear convergence rate bounds for monotone inclusion problems.
We consider three different sets of assumptions under which we provide linear convergence rate bounds. In all cases, the properties of Lipschitz continuity or cocoercivity, and strong monotonicity, are attributed to the operators. In the first case, we assume that one operator is strongly monotone and the other is cocoercive. In the second case, we assume that one operator is both strongly monotone and Lipschitz continuous. This is the setting considered by Lions and Mercier in [24], where a non-tight linear convergence rate bound is presented. In the third case, we assume that one operator is both strongly monotone and cocoercive. We show in all these settings that our bounds are tight, meaning that there exist problems from the respective classes that converge exactly with the provided rate bound. In the second and third cases, the rates are tight for all feasible algorithm parameters, while in the first case, the rate is tight for many algorithm parameters.
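From [3,Proposition 4.25], we know that an operator T : H → H is α-averaged if and only if it satisfies
$$\|Tx - Ty\|^2 + \tfrac{1-\alpha}{\alpha}\|(\mathrm{Id} - T)x - (\mathrm{Id} - T)y\|^2 \le \|x - y\|^2 \tag{2.1}$$
for all x, y ∈ H.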
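As a small numerical illustration (not part of the original development), the sketch below checks this characterization on R² identified with the complex plane. It assumes the standard form of [3,Proposition 4.25], namely ‖Tx − Ty‖² + ((1 − α)/α)‖(Id − T)x − (Id − T)y‖² ≤ ‖x − y‖², and builds T = (1 − α)Id + αR with R a rotation. The helper name `averaged_gap` is hypothetical.

```python
import cmath
import random


def averaged_gap(t, alpha):
    """Slack in the averagedness inequality for the linear map v -> t*v on C.

    For multiplication by a complex number it suffices to test a unit vector:
    gap = 1 - |t|^2 - (1 - alpha)/alpha * |1 - t|^2, which is nonnegative
    exactly when the inequality holds for all x, y.
    """
    return 1.0 - abs(t) ** 2 - (1.0 - alpha) / alpha * abs(1.0 - t) ** 2


random.seed(0)
alpha = 0.3
gaps = []
for _ in range(1000):
    r = cmath.exp(1j * random.uniform(0.0, 2.0 * cmath.pi))  # rotation, nonexpansive
    t = (1.0 - alpha) + alpha * r                             # alpha-averaged by construction
    gaps.append(averaged_gap(t, alpha))

min_gap = min(gaps)               # zero up to rounding: rotations make the bound tight
gap_for_minus_id = averaged_gap(-1.0, alpha)  # -Id is nonexpansive but not 0.3-averaged
```

For rotations the inequality holds with equality, which is why the slack is zero up to rounding, while −Id violates it for α = 0.3.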

Preliminaries
In this section, we state and show preliminary results that are needed to prove the linear convergence rate bounds. We state some lemmas that describe how cocoercivity, Lipschitz continuity, and averagedness relate to each other. We also introduce negatively averaged operators T, which are defined by the property that −T is averaged. We show different properties of such operators, including that averaged maps of negatively averaged operators are contractive. This result will be used to show linear convergence in the case where the strong monotonicity and Lipschitz continuity properties are split between the operators.

Useful lemmas
Proofs to the following three lemmas are found in Appendix A.
For easier reference, we also record special cases of some results in [3] that will be used later. Specifically, we record, in order, special cases of [3,Proposition 4.33], [3,Proposition 4.28], and [3,Proposition 23.11].

Negatively averaged operators
In this section we define negatively averaged operators and show various properties for these.
This definition implies that an operator T is θ-negatively averaged if and only if it satisfies
$$T = -\big((1-\theta)\mathrm{Id} + \theta\bar{R}\big) = (\theta - 1)\mathrm{Id} + \theta R,$$
where $\bar{R}$ is nonexpansive and $R := -\bar{R}$ is therefore also nonexpansive. Since −T is averaged, it is also nonexpansive, and so is T.
Since negatively averaged operators are nonexpansive, they can be averaged.
Next, we show that averaged negatively averaged operators are contractive.
Remark 3.11. The optimal contraction factor $\frac{\theta}{2-\theta}$ is strictly increasing in θ on the interval θ ∈ (0, 1). Therefore, the smaller θ is, the smaller the contraction factor becomes.
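The remark can be illustrated with a short computation. The sketch below assumes, from the definition in this section, that a θ-negatively averaged T can be written T = (θ − 1)Id + θR with R nonexpansive, so that (1 − α)Id + αT = (1 − 2α + αθ)Id + αθR and the triangle inequality gives the factor |1 − 2α + αθ| + αθ. This expression is derived here for illustration rather than quoted from the propositions, and the function name is hypothetical.

```python
def contraction_bound(alpha, theta):
    """Triangle-inequality bound for (1 - alpha)*Id + alpha*T,
    where T is assumed theta-negatively averaged."""
    return abs(1.0 - 2.0 * alpha + alpha * theta) + alpha * theta


theta = 0.6
grid = [k / 10000.0 for k in range(1, 10000)]           # alpha in (0, 1)
best_alpha = min(grid, key=lambda a: contraction_bound(a, theta))
best_factor = contraction_bound(best_alpha, theta)

closed_form_alpha = 1.0 / (2.0 - theta)     # kink of the piecewise linear bound
closed_form_factor = theta / (2.0 - theta)  # the factor appearing in Remark 3.11

# Monotonicity in theta, as stated in the remark.
factors = [t / (2.0 - t) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
```

The grid minimizer agrees with α = 1/(2 − θ) up to the grid resolution, and the factor θ/(2 − θ) grows with θ.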
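We conclude this section by showing that the composition of an averaged and a negatively averaged operator is negatively averaged. Before we state the result, we need a characterization of θ-negatively averaged operators T. This follows directly from the definition of averaged operators in (2.1) since −T is θ-averaged:
$$\|Tx - Ty\|^2 + \tfrac{1-\theta}{\theta}\|(\mathrm{Id} + T)x - (\mathrm{Id} + T)y\|^2 \le \|x - y\|^2. \tag{3.1}$$
For the composition $T_\theta T_\alpha$ of a θ-negatively averaged operator $T_\theta$ and an α-averaged operator $T_\alpha$, let $\kappa = \frac{\theta}{1-\theta} + \frac{\alpha}{1-\alpha}$. Then
$$\begin{aligned}
\|(\mathrm{Id} + T_\theta T_\alpha)x - (\mathrm{Id} + T_\theta T_\alpha)y\|^2
&= \|(\mathrm{Id} - T_\alpha)x - (\mathrm{Id} - T_\alpha)y + (\mathrm{Id} + T_\theta)T_\alpha x - (\mathrm{Id} + T_\theta)T_\alpha y\|^2 \\
&\le \kappa\tfrac{1-\alpha}{\alpha}\|(\mathrm{Id} - T_\alpha)x - (\mathrm{Id} - T_\alpha)y\|^2 + \kappa\tfrac{1-\theta}{\theta}\|(\mathrm{Id} + T_\theta)T_\alpha x - (\mathrm{Id} + T_\theta)T_\alpha y\|^2 \\
&\le \kappa\big(\|x - y\|^2 - \|T_\theta T_\alpha x - T_\theta T_\alpha y\|^2\big)
\end{aligned} \tag{3.2}$$
where the first inequality follows from convexity of ‖·‖². More precisely, let t ∈ (0, 1), then, by convexity of ‖·‖², we conclude that
$$\|u + v\|^2 = \big\|t\,\tfrac{u}{t} + (1-t)\tfrac{v}{1-t}\big\|^2 \le \tfrac{1}{t}\|u\|^2 + \tfrac{1}{1-t}\|v\|^2.$$
Letting $t = \frac{\alpha}{(1-\alpha)\kappa}$, which implies $1 - t = \frac{\theta}{(1-\theta)\kappa}$, gives the first inequality in (3.2). The second inequality in (3.2) follows from (2.1) and (3.1). The relation in (3.2) coincides with the definition of negative averagedness in (3.1). Thus $T_\theta T_\alpha$ is φ-negatively averaged with φ satisfying $\frac{1-\varphi}{\varphi} = \frac{1}{\kappa}$. This gives $\varphi = \frac{\kappa}{\kappa+1}$ and the proof is complete.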
Remark 3.13. This result can readily be extended to show averagedness of $T = T_1 T_2 \cdots T_N$ where the $T_i$ are $\alpha_i$-(negatively) averaged for i = 1, . . . , N. We get that T is $\frac{\kappa}{1+\kappa}$-negatively averaged with $\kappa = \sum_{i=1}^{N} \frac{\alpha_i}{1-\alpha_i}$ if an odd number of the $T_i$ are negatively averaged, and that T is $\frac{\kappa}{1+\kappa}$-averaged if an even number of the $T_i$ are negatively averaged. Similar results have been presented, e.g., in [3,Proposition 4.32], which is improved in [7]. Our result extends these results in that it also allows for negatively averaged operators, and it reduces to the result in [7] for averaged operators.

Douglas-Rachford splitting
Douglas-Rachford splitting can be applied to solve monotone inclusion problems of the form
$$0 \in Ax + Bx \tag{4.1}$$
where A, B : H → 2^H are maximally monotone operators. The algorithm accesses A and B only through their resolvents, where the resolvent of A is defined as $J_A := (\mathrm{Id} + A)^{-1}$. The resolvent has full domain since A is assumed maximally monotone, see [26] and [3,Proposition 23.7]. If A = ∂f where f is a proper, closed, and convex function, then $J_A = \mathrm{prox}_f$ where the prox operator $\mathrm{prox}_f$ is defined as
$$\mathrm{prox}_f(z) := \operatorname*{argmin}_x \big\{ f(x) + \tfrac{1}{2}\|x - z\|^2 \big\}.$$
That this holds follows directly from Fermat's rule [3,Theorem 16.2] applied to the proximal operator definition. The Douglas-Rachford algorithm is defined by the iteration
$$z^{k+1} = \big((1-\alpha)\mathrm{Id} + \alpha R_A R_B\big) z^k \tag{4.3}$$
where α ∈ (0, 1) (we will see that also α ≥ 1 can sometimes be used) and $R_A$ : H → H is the reflected resolvent, which is defined as $R_A := 2J_A - \mathrm{Id}$. (Note that what is traditionally called Douglas-Rachford splitting is when α = 1/2 in (4.3). The case with α = 1 in (4.3) is often referred to as the Peaceman-Rachford algorithm, see [29]. We will use the term Douglas-Rachford splitting for all feasible choices of α.) Since the reflected resolvent is nonexpansive in the general case [3, Corollary 23.10], and since compositions of nonexpansive operators are nonexpansive, the Douglas-Rachford algorithm is an averaged iteration of a nonexpansive mapping when α ∈ (0, 1). Therefore, Douglas-Rachford splitting is a special case of the Krasnosel'skiȋ-Mann iteration [25,23], which is known to converge to a fixed-point of the nonexpansive operator, in this case $R_A R_B$, see [3,Theorem 5.14]. Since an x ∈ H solves (4.1) if and only if $x = J_B z$ where $z = R_A R_B z$, see [3,Proposition 25.1], this algorithm can be used to solve monotone inclusion problems of the form (4.1). Note that solving (4.1) is equivalent to solving 0 ∈ γAx + γBx for any γ ∈ (0, ∞). Then we can define $A_\gamma = \gamma A$ and $B_\gamma = \gamma B$, and (4.1) can also be solved by the iteration
$$z^{k+1} = \big((1-\alpha)\mathrm{Id} + \alpha R_{\gamma A} R_{\gamma B}\big) z^k.$$
Therefore, γ is an algorithm parameter that affects the progress of the iterations. The objective of this paper is to provide tight linear convergence rate bounds for the Douglas-Rachford algorithm under various assumptions.
Using these bounds, we will show how to select the algorithm parameters γ and α that optimize these bounds. The first setting we consider is when A is strongly monotone and B is cocoercive.
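To make the iteration concrete, here is a minimal sketch of the relaxed Douglas-Rachford iteration z ← (1 − α)z + αR_{γA}R_{γB}z in plain Python. The test problem (minimize ½‖x − a‖² + λ‖x‖₁, with A the gradient of the quadratic and B the subdifferential of λ‖·‖₁), the data, and all function names are illustrative assumptions and do not come from the paper. The solution is read off through the resolvent of the operator that is applied first in the composition, here γB.

```python
def soft_threshold(v, tau):
    """Proximal operator of tau*|.| applied to a scalar."""
    if v > tau:
        return v - tau
    if v < -tau:
        return v + tau
    return 0.0


def douglas_rachford(a, lam, gamma=0.5, alpha=0.5, iters=300):
    """Relaxed Douglas-Rachford for min 0.5*||x - a||^2 + lam*||x||_1."""
    z = [0.0] * len(a)
    for _ in range(iters):
        # x = J_{gamma B} z, the prox of gamma*lam*||.||_1
        x = [soft_threshold(zi, gamma * lam) for zi in z]
        # R_{gamma B} z = 2 J_{gamma B} z - z
        rb = [2.0 * xi - zi for xi, zi in zip(x, z)]
        # J_{gamma A} applied to R_{gamma B} z: prox of gamma*0.5*||. - a||^2
        ja = [(ri + gamma * ai) / (1.0 + gamma) for ri, ai in zip(rb, a)]
        # R_{gamma A} R_{gamma B} z
        ra = [2.0 * ji - ri for ji, ri in zip(ja, rb)]
        # averaged step with relaxation parameter alpha
        z = [(1.0 - alpha) * zi + alpha * ri for zi, ri in zip(z, ra)]
    return [soft_threshold(zi, gamma * lam) for zi in z]


a = [3.0, -0.2, 1.5, -4.0]
lam = 1.0
x_dr = douglas_rachford(a, lam)
x_exact = [soft_threshold(ai, lam) for ai in a]  # closed-form minimizer
```

In this toy problem A is both strongly monotone and Lipschitz continuous, so the iterates converge linearly and 300 iterations are far more than needed.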

A strongly monotone and B cocoercive
In this section, we show linear convergence for Douglas-Rachford splitting in the case where A and B are maximally monotone, A is strongly monotone, and B is cocoercive. That is, we make the following assumptions. Before we can state the main linear convergence result, we need to characterize the properties of the resolvent, the reflected resolvent, and the composition of reflected resolvents. This is done in the following series of propositions, the first of which is proven in Appendix B.
This implies that also the reflected resolvent is averaged.
Proof. This follows directly from Proposition 5.2 and Lemma 3.5.
If the operator instead is strongly monotone, the reflected resolvent is negatively averaged.
Proof. From Lemma 3.6, we have that the resolvent $J_A$ is (1 + σ)-cocoercive. Using Lemma 3.4, this implies that $\mathrm{Id} - J_A$ is $\frac{1}{2(1+\sigma)}$-averaged. Then using Lemma 3.5, this implies that $2(\mathrm{Id} - J_A) - \mathrm{Id} = -R_A$ is $\frac{1}{1+\sigma}$-averaged, that is, $R_A$ is $\frac{1}{1+\sigma}$-negatively averaged. This completes the proof.
The composition of the reflected resolvents of a strongly monotone operator and a cocoercive operator is negatively averaged.
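Proposition 5.5. Suppose that Assumption 5.1 holds. Then, the composition $R_A R_B$ is $\frac{\frac{1}{\sigma}+\beta}{1+\frac{1}{\sigma}+\beta}$-negatively averaged.
Proof. Since $R_A$ is $\frac{1}{1+\sigma}$-negatively averaged and $R_B$ is $\frac{\beta}{1+\beta}$-averaged, see Propositions 5.3 and 5.4, we can apply Proposition 3.12. We get that $\kappa = \frac{1}{\sigma} + \beta$ and that the averagedness parameter of the negatively averaged operator $R_A R_B$ is $\frac{\kappa}{\kappa+1} = \frac{\frac{1}{\sigma}+\beta}{1+\frac{1}{\sigma}+\beta}$. This concludes the proof.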
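The negative averagedness claims can be sanity-checked numerically on linear operators in R², represented as multiplication by a complex number. The sketch below assumes that A is σ-strongly monotone (Re a ≥ σ), that B is 1/β-cocoercive (Re b ≥ |b|²/β, i.e. b lies in the disk with center β/2 and radius β/2), and that the parameters are 1/(1 + σ) for R_A and κ/(κ + 1) with κ = 1/σ + β for R_A R_B. These parameter values are read from the surrounding text, and all helper names are hypothetical.

```python
import cmath
import random


def reflected_resolvent(a):
    """R_A = 2(1 + a)^{-1} - 1 for the linear operator v -> a*v on C."""
    return 2.0 / (1.0 + a) - 1.0


def neg_avg_radius(t, phi):
    """|R| in the decomposition -t = (1 - phi) + phi*R.

    Multiplication by t is phi-negatively averaged iff this value is <= 1.
    """
    return abs((-t - (1.0 - phi)) / phi)


random.seed(2)
sigma, beta = 0.8, 2.5
theta = 1.0 / (1.0 + sigma)
kappa = 1.0 / sigma + beta
phi = kappa / (kappa + 1.0)

worst_ra = 0.0
worst_comp = 0.0
for _ in range(20000):
    a = complex(sigma + random.uniform(0.0, 5.0), random.uniform(-5.0, 5.0))
    b = beta / 2.0 + (beta / 2.0) * random.random() * cmath.exp(
        1j * random.uniform(0.0, 2.0 * cmath.pi))
    ra = reflected_resolvent(a)
    rb = reflected_resolvent(b)
    worst_ra = max(worst_ra, neg_avg_radius(ra, theta))
    worst_comp = max(worst_comp, neg_avg_radius(ra * rb, phi))
```

Both `worst_ra` and `worst_comp` stay at or below 1 up to rounding, which is what the two negative averagedness statements predict for this class of linear examples.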
With these results, we can now show the following linear convergence rate bounds for Douglas-Rachford splitting under Assumption 5.1. The theorem is proven in Appendix B.

Tightness
In this section, we present an example that shows tightness of the linear convergence rate bounds in Theorem 5.6 for many algorithm parameters. We consider a two dimensional Euclidean example, which is given by the following convex optimization problem:
$$\text{minimize}\quad f(x) + f^*(x) \tag{5.2}$$
where
$$f(x) = \tfrac{\beta}{2}x_1^2, \tag{5.3}$$
x = (x_1, x_2), and β > 0. The gradient is ∇f(x) = (βx_1, 0), so it is cocoercive with factor 1/β. According to [3,Theorem 18.15], this is equivalent to f* being 1/β-strongly convex, and therefore ∂f* is σ := 1/β-strongly monotone.
The following proposition shows that when solving (5.2) with f defined in (5.3) using Douglas-Rachford splitting, the upper linear convergence rate bound is exactly attained. The result is proven in Appendix B.
So, for all γ parameters and some α parameters, the provided bound is tight. Especially, the optimal parameter choices γ = 1 give a tight bound.
It is interesting to note that although we have considered a more general class of problems than convex optimization problems, a convex optimization problem is used to attain the worst case rate.
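The tightness claim can be checked numerically. The sketch below assumes that the example is f(x) = (β/2)x₁² paired with its conjugate f*, with A = ∂f* and B = ∇f, as the surrounding text suggests, and that the rate in Theorem 5.6 has the form |1 − 2α + αθ| + αθ with θ = κ/(κ + 1) and κ = 1/(γσ) + γβ. Both are reconstructions from the propositions in this section rather than quotations, and the function names are hypothetical.

```python
def dr_factors(beta, gamma, alpha):
    """Diagonal of (1 - alpha)*Id + alpha*R_{gamma A} R_{gamma B}.

    Assumes f(x) = beta/2 * x1^2, B = grad f and A = subdifferential of f^*,
    where f^*(y) = y1^2/(2*beta) plus the indicator of {y2 = 0}.
    """
    r_b = ((1.0 - gamma * beta) / (1.0 + gamma * beta), 1.0)   # reflected prox of gamma*f
    r_a = ((1.0 - gamma / beta) / (1.0 + gamma / beta), -1.0)  # reflected prox of gamma*f^*
    return tuple((1.0 - alpha) + alpha * ra * rb for ra, rb in zip(r_a, r_b))


def rate_bound(sigma, beta, gamma, alpha):
    """Assumed form of the bound: |1 - 2*alpha + alpha*theta| + alpha*theta."""
    kappa = 1.0 / (gamma * sigma) + gamma * beta
    theta = kappa / (kappa + 1.0)
    return abs(1.0 - 2.0 * alpha + alpha * theta) + alpha * theta


beta, gamma, alpha = 2.0, 1.0, 0.9
sigma = 1.0 / beta
observed = max(abs(d) for d in dr_factors(beta, gamma, alpha))
bound = rate_bound(sigma, beta, gamma, alpha)
```

With these values the second coordinate is scaled by 1 − 2α = −0.8 in every iteration, which matches the bound; the first coordinate contracts faster.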

Comparison to other bounds
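In [17], it was shown that Douglas-Rachford splitting converges as $\frac{\sqrt{\beta/\sigma}-1}{\sqrt{\beta/\sigma}+1}$ when solving composite optimization problems of the form 0 ∈ γ∇f + γ∂g, where ∇f is σ-strongly monotone and 1/β-cocoercive and the algorithm parameters are chosen as α = 1 and $\gamma = \frac{1}{\sqrt{\beta\sigma}}$. In our setting, with ∂f being σ-strongly monotone and ∂g being 1/β-cocoercive, we can instead pose the equivalent problem $0 \in \gamma\partial\hat{f}(x) + \gamma\partial\hat{g}(x)$ where $\hat{f} = f - \frac{\sigma}{2}\|\cdot\|^2$ and $\hat{g} = g + \frac{\sigma}{2}\|\cdot\|^2$. Then $\partial\hat{f}$ is merely monotone and $\partial\hat{g}$ is σ-strongly monotone and $\frac{1}{\beta+\sigma}$-cocoercive. For that problem, [17] shows a linear convergence rate of at least $\frac{\sqrt{(\beta+\sigma)/\sigma}-1}{\sqrt{(\beta+\sigma)/\sigma}+1}$ (when optimal parameters are used). This rate turns out to be better than the rate provided in Theorem 5.6, i.e., $\frac{\sqrt{\beta/\sigma}}{\sqrt{\beta/\sigma}+1}$, which assumes that the strong convexity and smoothness properties are split between the two operators. This is shown by the following chain, which departs from the fact that the square root is sub-additive, i.e., that $\sqrt{\beta/\sigma+1} \le \sqrt{\beta/\sigma}+1$:
$$\sqrt{\tfrac{\beta}{\sigma}+1} \le 2\sqrt{\tfrac{\beta}{\sigma}}+1 \iff \Big(\sqrt{\tfrac{\beta}{\sigma}+1}-1\Big)\Big(\sqrt{\tfrac{\beta}{\sigma}}+1\Big) \le \sqrt{\tfrac{\beta}{\sigma}}\Big(\sqrt{\tfrac{\beta}{\sigma}+1}+1\Big) \iff \frac{\sqrt{\beta/\sigma+1}-1}{\sqrt{\beta/\sigma+1}+1} \le \frac{\sqrt{\beta/\sigma}}{\sqrt{\beta/\sigma}+1}.$$
This implies that, from a worst case perspective, it is better to shift both properties into one operator. This is also always possible, without increasing the computational cost in the algorithm, since the prox-operator is just shifted slightly (for γσ < 1):
$$\mathrm{prox}_{\gamma\hat{f}}(z) = \mathrm{prox}_{\frac{\gamma}{1-\gamma\sigma}f}\Big(\frac{z}{1-\gamma\sigma}\Big).$$
A similar relation holds for $\mathrm{prox}_{\gamma\hat{g}}$ with the sign in front of γσ flipped.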
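The comparison can be illustrated numerically. The sketch below assumes the two rate expressions discussed in this section: (√(L/σ) − 1)/(√(L/σ) + 1) with L = β + σ for the shifted problem, and √(β/σ)/(√(β/σ) + 1) for the split setting. The function names and the sampled ratios β/σ are illustrative choices.

```python
import math


def rate_split(beta, sigma):
    """Assumed optimal bound when strong monotonicity and cocoercivity
    sit in different operators."""
    r = math.sqrt(beta / sigma)
    return r / (r + 1.0)


def rate_shifted(beta, sigma):
    """Assumed rate after shifting sigma/2*||.||^2 between the functions,
    so that one function is sigma-strongly convex and (beta+sigma)-smooth."""
    r = math.sqrt((beta + sigma) / sigma)
    return (r - 1.0) / (r + 1.0)


ratios = [10.0 ** k for k in range(-3, 4)]  # values of beta/sigma
pairs = [(rate_shifted(q, 1.0), rate_split(q, 1.0)) for q in ratios]
```

For every sampled ratio the shifted rate is no larger than the split rate, in line with the sub-additivity argument.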

A strongly monotone and Lipschitz continuous
In this section, we consider the case where one of the operators is σ-strongly monotone and β-Lipschitz continuous. This assumption is stated next.
In the following section, we will see that there exists a problem from the considered class that converges exactly with the provided rate.

Tightness
We consider a problem where A is a scaled rotation operator, i.e., it is given by
$$A = d\begin{bmatrix}\cos\psi & -\sin\psi\\ \sin\psi & \cos\psi\end{bmatrix} \tag{6.3}$$
where 0 ≤ ψ < π/2 and d ∈ (0, ∞). First, we show that A is strongly monotone and Lipschitz continuous.
Proof. We first show that A is d cos ψ-strongly monotone. Since A is linear, we have
$$\langle Ax - Ay, x - y\rangle = d\cos\psi\,\|x - y\|^2.$$
That is, A is d cos ψ-strongly monotone. Since A is a scaled (with d) rotation operator, its largest singular value is d, and hence A is d-Lipschitz. This concludes the proof.
We need an explicit form of the reflected resolvent of A to show that the rate is tight. To state it, we define the following alternative arctan definition that is valid when tan ξ = x/y and x ≥ 0:
$$\xi = \begin{cases}\arctan(x/y) & \text{if } x \ge 0,\ y > 0\\ \arctan(x/y) + \pi & \text{if } x \ge 0,\ y < 0\\ \pi/2 & \text{if } x \ge 0,\ y = 0\end{cases}$$
This arctan is defined for nonnegative numerators x only, and outputs an angle in the interval [0, π]. Next, we provide the expression for the reflected resolvent. To simplify its notation, we let σ denote the strong monotonicity modulus and β the Lipschitz constant of A, i.e., σ = d cos ψ and β = d. The following result is proven in Appendix C.
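Proposition 6.7. The reflected resolvent of γA, with A in (6.3) and γ ∈ (0, ∞), is
$$R_{\gamma A} = \sqrt{1 - \frac{4\gamma\sigma}{1+2\gamma\sigma+(\gamma\beta)^2}}\begin{bmatrix}\cos\xi & \sin\xi\\ -\sin\xi & \cos\xi\end{bmatrix}$$
where ξ ∈ [0, π] satisfies $\tan\xi = \frac{2\gamma\sqrt{\beta^2-\sigma^2}}{1-(\gamma\beta)^2}$, with the arctan defined above.
That is, the reflected resolvent is first a rotation then a contraction. The contraction factor is exactly the upper bound on the contraction factor in Theorem 6.5. Therefore, the A in (6.3) can be used to show tightness of the results in Theorem 6.5. To do so, we need another operator B that cancels the rotation introduced by A. For α ∈ (0, 1], we will need $R_{\gamma A}R_{\gamma B} = \sqrt{1 - \frac{4\gamma\sigma}{1+2\gamma\sigma+(\gamma\beta)^2}}\,I$ and for α > 1, we will need $R_{\gamma A}R_{\gamma B} = -\sqrt{1 - \frac{4\gamma\sigma}{1+2\gamma\sigma+(\gamma\beta)^2}}\,I$. This is clearly achieved if $R_{\gamma B}$ is another rotation operator. Using the following straightforward consequence of Minty's theorem (see [26]) we conclude that any rotation operator (since they are nonexpansive) is the reflected resolvent of a maximally monotone operator. With this in mind, we can state the tightness claim.
Proposition 6.9. Let γ ∈ (0, ∞), $\delta = \sqrt{1 - \frac{4\gamma\sigma}{1+2\gamma\sigma+(\gamma\beta)^2}}$, and ξ be defined as in Proposition 6.7. Suppose that A is as in (6.3) and B is maximally monotone and satisfies either of the following:
(i) if α ∈ (0, 1]: $R_{\gamma B} = \begin{bmatrix}\cos\xi & -\sin\xi\\ \sin\xi & \cos\xi\end{bmatrix}$,
(ii) if α > 1: $R_{\gamma B} = \begin{bmatrix}\cos(\pi-\xi) & \sin(\pi-\xi)\\ -\sin(\pi-\xi) & \cos(\pi-\xi)\end{bmatrix}$.
Then the $z^k$ sequence for solving 0 ∈ γAx + γBx using (4.3) converges exactly with the rate |1 − α| + αδ.
Proof. Case (i): Using the reflected resolvent $R_{\gamma A}$ in Proposition 6.7 and that α ∈ (0, 1], we conclude that
$$z^{k+1} = (1-\alpha)z^k + \alpha R_{\gamma A}R_{\gamma B}z^k = (1 - \alpha + \alpha\delta)z^k = (|1-\alpha| + \alpha\delta)z^k.$$
Case (ii): Using the reflected resolvent $R_{\gamma A}$ in Proposition 6.7 and that α ≥ 1, we conclude that
$$z^{k+1} = (1-\alpha)z^k + \alpha R_{\gamma A}R_{\gamma B}z^k = (1 - \alpha - \alpha\delta)z^k = -(|1-\alpha| + \alpha\delta)z^k.$$
In both cases, the convergence rate is exactly $|1-\alpha| + \alpha\sqrt{1 - \frac{4\gamma\sigma}{1+2\gamma\sigma+(\gamma\beta)^2}}$. This completes the proof. We have shown that the rate provided in Theorem 6.5 is tight for all feasible α and γ.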
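The construction can be reproduced in a few lines by identifying R² with the complex plane, so that a scaled rotation is multiplication by d·e^{iψ}. The sketch below assumes that the rotation example has σ = d cos ψ and β = d, and that the contraction factor is δ = √(1 − 4γσ/(1 + 2γσ + (γβ)²)). The parameter values and names are illustrative.

```python
import cmath
import math


def reflected_resolvent(a):
    """R_A = 2(1 + a)^{-1} - 1 for the linear operator v -> a*v on C."""
    return 2.0 / (1.0 + a) - 1.0


d, psi, gamma = 2.0, 0.6, 0.7
a = d * cmath.exp(1j * psi)            # scaled rotation as complex multiplication
sigma, beta = d * math.cos(psi), d     # assumed strong monotonicity / Lipschitz constants

r_a = reflected_resolvent(gamma * a)
delta = math.sqrt(1.0 - 4.0 * gamma * sigma
                  / (1.0 + 2.0 * gamma * sigma + (gamma * beta) ** 2))

# Case alpha in (0, 1]: R_{gamma B} is the pure rotation that undoes the rotation in r_a.
alpha_1 = 0.5
r_b1 = (r_a / abs(r_a)).conjugate()
dr_1 = (1.0 - alpha_1) + alpha_1 * r_a * r_b1

# Case alpha > 1: R_{gamma B} additionally flips the sign.
alpha_2 = 1.2
r_b2 = -r_b1
dr_2 = (1.0 - alpha_2) + alpha_2 * r_a * r_b2
```

In both cases the Douglas-Rachford operator is a multiple of the identity whose modulus equals |1 − α| + αδ, so every nonzero starting point contracts by exactly that factor.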

Comparison to other bounds
In Figure 1, we have compared the linear convergence rate result in Theorem 6.5 to the convergence rate result in [24]. The comparison is made with optimal γ-parameters for both bounds. The result in [24] is provided in the standard Douglas-Rachford setting, i.e., with α = 1/2. By instead letting α = 1, this rate can be improved, see [8] (which shows an improved rate in the composite convex optimization case, but the same rate can be shown to hold also for monotone inclusion problems). Also this improved rate is added to the comparison in Figure 1. We see that both rates that follow from [24] are suboptimal and worse than the rate bound in Theorem 6.5.

A strongly monotone and cocoercive
In this section, we consider the case where A is strongly monotone and cocoercive. That is, we assume the following.

Figure 1. Convergence rate comparison for general monotone inclusion problems where one operator is strongly monotone and Lipschitz continuous. We compare Theorem 6.5 to [24], and an improvement to [24] which holds when α = 1. (Vertical axis: convergence factor.)

The linear convergence result for Douglas-Rachford splitting will follow from the contraction factor of the reflected resolvent of A. The contraction factor is provided in the following theorem, which is proven in Appendix D.

When considering the reflected resolvent of γA where γ ∈ (0, ∞), the γ-parameter can be chosen to optimize the contraction factor of $R_{\gamma A}$. The operator γA is γσ-strongly monotone and $\frac{1}{\gamma\beta}$-cocoercive, so the optimal γ > 0 minimizes $h(\gamma) := 1 - \frac{4\gamma\sigma}{1+2\gamma\sigma+\gamma^2\sigma\beta}$, which is the square of the contraction factor. The derivative of h satisfies $h'(\gamma) = \frac{4\sigma(\beta\sigma\gamma^2-1)}{(\beta\sigma\gamma^2+2\sigma\gamma+1)^2}$, so the stationary points of h are given by $\gamma = \pm\frac{1}{\sqrt{\beta\sigma}}$. Since γ > 0 and the derivative is negative for $\gamma \in (0, \frac{1}{\sqrt{\beta\sigma}})$ and positive for $\gamma > \frac{1}{\sqrt{\beta\sigma}}$, the parameter $\gamma = \frac{1}{\sqrt{\beta\sigma}}$ minimizes the contraction factor. The corresponding contraction factor is $\sqrt{\frac{\sqrt{\beta/\sigma}-1}{\sqrt{\beta/\sigma}+1}}$. This is summarized in the following proposition.

Proof. It follows immediately from Theorem 7.2, Lemma 3.3, and Proposition 7.3 by noting that α = 1 minimizes (7.1).
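The optimization over γ can be double-checked with a grid search. The sketch below assumes h(γ) = 1 − 4γσ/(1 + 2γσ + γ²σβ) as the squared contraction factor of R_{γA}; the values of σ and β and the grid are illustrative choices.

```python
import math


def h(gamma, sigma, beta):
    """Squared contraction factor of R_{gamma A} (assumed form)."""
    return 1.0 - 4.0 * gamma * sigma / (
        1.0 + 2.0 * gamma * sigma + gamma ** 2 * sigma * beta)


sigma, beta = 0.5, 8.0
gamma_star = 1.0 / math.sqrt(beta * sigma)           # stationary point of h

grid = [k / 1000.0 for k in range(1, 5001)]          # gamma in (0, 5]
gamma_grid = min(grid, key=lambda g: h(g, sigma, beta))

factor_star = math.sqrt(h(gamma_star, sigma, beta))  # contraction factor at gamma_star
r = math.sqrt(beta / sigma)
closed_form = math.sqrt((r - 1.0) / (r + 1.0))
```

The grid minimizer coincides with 1/√(βσ) up to the grid resolution, and the factor at that point agrees with the closed-form expression.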

Tightness
In this section, we provide a two-dimensional example that shows that the provided bounds are tight. We let A be the resolvent of a scaled rotation operator to achieve this. Let C be that scaled rotation operator, i.e.,
$$C = c\begin{bmatrix}\cos\psi & -\sin\psi\\ \sin\psi & \cos\psi\end{bmatrix} \tag{7.2}$$
with c ∈ (1, ∞) and ψ ∈ [0, π/2). We will let A satisfy $A = dJ_C$ for some d ∈ (0, ∞). That is,
$$A = d(I + C)^{-1}.$$
In the following proposition, we state the strong monotonicity and cocoercivity properties of A.
Proof. The matrix C in (7.2) is c cos ψ-strongly monotone (see Proposition 6.6), so $J_C$ is (1 + c cos ψ)-cocoercive (see [3,Definition 4.4]) and the operator $A = d(I + C)^{-1}$ is $\frac{1+c\cos\psi}{d}$-cocoercive. Further, since C is monotone and c-Lipschitz continuous (see Proposition 6.6), the following holds (see Proposition 6.2):
$$2\langle J_Cx - J_Cy, x - y\rangle \ge \|x - y\|^2 + (1 - c^2)\|J_Cx - J_Cy\|^2. \tag{7.4}$$
Since $J_C$ is (1 + c cos ψ)-cocoercive, we have
$$\langle J_Cx - J_Cy, x - y\rangle \ge (1 + c\cos\psi)\|J_Cx - J_Cy\|^2. \tag{7.5}$$
We add (7.5) multiplied by $-\frac{1-c^2}{1+c\cos\psi}$ (which is positive since c ∈ (1, ∞)) to (7.4) to get
$$\langle J_Cx - J_Cy, x - y\rangle \ge \frac{1+c\cos\psi}{1+2c\cos\psi+c^2}\|x - y\|^2,$$
so A is strongly monotone with parameter $d\frac{1+c\cos\psi}{1+2c\cos\psi+c^2}$. This concludes the proof.
This shows that the assumptions needed for the linear convergence rate result in Theorem 7.4 hold. To prove the tightness claim, we need an expression for the reflected resolvent of A. This is more easily expressed in terms of the strong monotonicity modulus, which we define as σ, and the inverse cocoercivity constant, which we define as β, i.e., $\sigma = d\frac{1+c\cos\psi}{1+2c\cos\psi+c^2}$ and $\beta = \frac{d}{1+c\cos\psi}$. The following result is proven in Appendix D.
Based on this reflected resolvent, we can show that the rate bound in Theorem 7.4 is indeed tight. The proof of the following result is the same as the proof to Proposition 6.9.
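The properties of this example are easy to verify numerically when R² is identified with the complex plane, so that A = d(I + C)⁻¹ becomes multiplication by d/(1 + c·e^{iψ}). The sketch below assumes σ = d(1 + c cos ψ)/(1 + 2c cos ψ + c²), β = d/(1 + c cos ψ), and the contraction factor √(1 − 4γσ/(1 + 2γσ + γ²σβ)) for R_{γA}. The parameter values and names are illustrative.

```python
import cmath
import math

c, psi, d, gamma = 3.0, 0.4, 1.5, 0.8
a = d / (1.0 + c * cmath.exp(1j * psi))   # A = d (I + C)^{-1} as a complex multiplier

# Assumed closed-form constants.
sigma = d * (1.0 + c * math.cos(psi)) / (1.0 + 2.0 * c * math.cos(psi) + c ** 2)
beta = d / (1.0 + c * math.cos(psi))

# For multiplication by a: <a v, v> = Re(a)|v|^2 and |a v|^2 = |a|^2 |v|^2.
sigma_numeric = a.real                     # strong monotonicity modulus
cocoercivity_numeric = a.real / abs(a) ** 2  # largest mu with Re(a) >= mu*|a|^2

r_a = 2.0 / (1.0 + gamma * a) - 1.0        # reflected resolvent of gamma*A
factor = math.sqrt(1.0 - 4.0 * gamma * sigma
                   / (1.0 + 2.0 * gamma * sigma + gamma ** 2 * sigma * beta))
```

The modulus of `r_a` equals `factor`, so for this example the reflected resolvent is a rotation followed by a contraction with exactly the assumed worst-case factor.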
So, we have shown that the rate in Theorem 7.4 is tight for all feasible algorithm parameters α and γ.

Figure 2. Convergence rate comparison between Theorem 6.5, Theorem 7.4, and [17]. In all, one operator has both regularity properties. It is strongly monotone in all examples and Lipschitz in Theorem 6.5, cocoercive in Theorem 7.4 (which is stronger than Lipschitz), and a cocoercive subdifferential operator in [17] (which is the strongest property). The worst-case rate improves when the class of problems becomes more restricted. (Vertical axis: convergence factor.)

Comparison to other bounds
We have shown tight convergence rate estimates for Douglas-Rachford splitting when the monotone operator A is cocoercive and strongly monotone (Theorem 7.4). In Section 6, we showed tight estimates when A is Lipschitz and strongly monotone (Theorem 6.5). In [17], tight convergence rate estimates are proven for the case when A and B are subdifferential operators of proper closed and convex functions and A is strongly monotone and Lipschitz continuous (which in this case is equivalent to cocoercive). The class of problems considered in [17] is a subclass of the problems considered in this section, which in turn is a subclass of the problems considered in Section 6. The optimal rates for these classes of problems are shown in Figure 2. By restricting the problem classes, the rate bounds get tighter. This is in contrast to the case in Section 5, where a convex optimization problem achieved the worst case estimate.

Conclusions
We have shown linear convergence rate bounds for Douglas-Rachford splitting for monotone inclusion problems with three different sets of assumptions. One setting was the one used by Lions and Mercier [24], for which we provided a tighter bound. We also stated linear convergence rate bounds under two other assumptions, for which no other linear rate bounds were previously available. In addition, we have shown that all our rate bounds are tight: in two cases for all feasible algorithm parameters, and in the remaining case for many algorithm parameters.
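
Appendix B. Proofs to results in Section 5

B.1. Proof to Proposition 5.2
Adding ‖u − v‖² to both sides of the 1/β-cocoercivity inequality $\langle Bu - Bv, u - v\rangle \ge \frac{1}{\beta}\|Bu - Bv\|^2$ gives
$$\langle (B + \mathrm{Id})u - (B + \mathrm{Id})v, u - v\rangle \ge \|u - v\|^2 + \tfrac{1}{\beta}\|Bu - Bv\|^2.$$
Letting x = (B + Id)u and y = (B + Id)v implies that $u = J_Bx$ and $v = J_By$. Therefore, we get the equivalent expression
$$\langle x - y, J_Bx - J_By\rangle \ge \|J_Bx - J_By\|^2 + \tfrac{1}{\beta}\|(\mathrm{Id} - J_B)x - (\mathrm{Id} - J_B)y\|^2.$$
This is, by [3,Proposition 4.25], equivalent to that $J_B$ is $\frac{\beta}{2(\beta+1)}$-averaged. This concludes the proof.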

B.2. Proof to Theorem 5.6
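Since $R_{\gamma A}R_{\gamma B}$ is $\frac{\frac{1}{\gamma\sigma}+\gamma\beta}{\frac{1}{\gamma\sigma}+\gamma\beta+1}$-negatively averaged, see Proposition 5.5, the Douglas-Rachford iteration is defined by an α-averaged $\frac{\frac{1}{\gamma\sigma}+\gamma\beta}{\frac{1}{\gamma\sigma}+\gamma\beta+1}$-negatively averaged operator. The rate in (5.1) follows directly from Proposition 3.9. The optimal parameters follow from Proposition 3.10. It shows that the rate factor is increasing in the negative averagedness factor $\frac{\frac{1}{\gamma\sigma}+\gamma\beta}{\frac{1}{\gamma\sigma}+\gamma\beta+1}$, which in turn is increasing in $\frac{1}{\gamma\sigma}+\gamma\beta$. Therefore this should be minimized to optimize the rate. The optimal $\gamma = \frac{1}{\sqrt{\beta\sigma}}$ gives negative averagedness factor $\frac{2\sqrt{\beta/\sigma}}{2\sqrt{\beta/\sigma}+1}$. Proposition 3.10 further gives that the optimal averagedness factor is $\alpha = \frac{2\sqrt{\beta/\sigma}+1}{2(\sqrt{\beta/\sigma}+1)}$ and that the optimal bound on the contraction factor is $\frac{\sqrt{\beta/\sigma}}{\sqrt{\beta/\sigma}+1}$. This concludes the proof.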
When α ∈ [c, 1), the expression inside the absolute value in (5.1) is nonpositive. Therefore, for such α, the rate in (5.1) is |1 − 2α|. This coincides with the rate for the provided example for any γ > 0, and the proof is completed.
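
Appendix C. Proofs to results in Section 6

C.1. Proof to Proposition 6.2
β-Lipschitz continuity of A implies that βId + A is $\frac{1}{2\beta}$-cocoercive, see Lemma 3.1. That is,
$$\langle (\beta\mathrm{Id} + A)u - (\beta\mathrm{Id} + A)v, u - v\rangle \ge \tfrac{1}{2\beta}\|(\beta\mathrm{Id} + A)u - (\beta\mathrm{Id} + A)v\|^2.$$
Using βId = Id + (β − 1)Id, this is equivalent to that
$$\langle (\mathrm{Id} + A)u - (\mathrm{Id} + A)v + (\beta - 1)(u - v), u - v\rangle \ge \tfrac{1}{2\beta}\|(\mathrm{Id} + A)u - (\mathrm{Id} + A)v + (\beta - 1)(u - v)\|^2.$$
Using that x = (Id + A)u if and only if u = (Id + A)^{-1}x and y = (Id + A)v if and only if v = (Id + A)^{-1}y (that hold by definition of the inverse and single-valuedness), this is equivalent to
$$\langle x - y + (\beta - 1)((\mathrm{Id} + A)^{-1}x - (\mathrm{Id} + A)^{-1}y), (\mathrm{Id} + A)^{-1}x - (\mathrm{Id} + A)^{-1}y\rangle \ge \tfrac{1}{2\beta}\|x - y + (\beta - 1)((\mathrm{Id} + A)^{-1}x - (\mathrm{Id} + A)^{-1}y)\|^2.$$
Identifying the resolvent $J_A = (\mathrm{Id} + A)^{-1}$ and expanding the square give:
$$\langle x - y, J_Ax - J_Ay\rangle + (\beta - 1)\|J_Ax - J_Ay\|^2 \ge \tfrac{1}{2\beta}\big(\|x - y\|^2 + 2(\beta - 1)\langle x - y, J_Ax - J_Ay\rangle + (\beta - 1)^2\|J_Ax - J_Ay\|^2\big).$$
By rearranging the terms, we conclude that
$$\big(1 + \tfrac{1-\beta}{\beta}\big)\langle x - y, J_Ax - J_Ay\rangle \ge \tfrac{1}{2\beta}\|x - y\|^2 + \tfrac{1-\beta^2}{2\beta}\|J_Ax - J_Ay\|^2.$$
The result, $2\langle J_Ax - J_Ay, x - y\rangle \ge \|x - y\|^2 + (1 - \beta^2)\|J_Ax - J_Ay\|^2$, follows by multiplying by 2β, since $1 + \frac{1-\beta}{\beta} = \frac{1}{\beta}$. This concludes the proof.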
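The inequality obtained in this proof can be spot-checked on linear operators in R² written as multiplication by a complex number a. The sketch below assumes the resulting inequality has the form 2⟨J_Ax − J_Ay, x − y⟩ ≥ ‖x − y‖² + (1 − β²)‖J_Ax − J_Ay‖², and samples a with Re a ≥ 0 (monotone) and |a| ≤ β (β-Lipschitz). The sampling scheme and names are illustrative.

```python
import cmath
import random

random.seed(3)
beta = 1.7
min_slack = float("inf")
for _ in range(20000):
    # Monotone (Re a >= 0) and beta-Lipschitz (|a| <= beta) multiplier.
    a = beta * random.random() * cmath.exp(
        1j * random.uniform(-cmath.pi / 2.0, cmath.pi / 2.0))
    j = 1.0 / (1.0 + a)  # resolvent J_A as a complex multiplier
    # Inequality evaluated on a unit vector: 2 Re(j) >= 1 + (1 - beta^2)|j|^2.
    slack = 2.0 * j.real - 1.0 - (1.0 - beta ** 2) * abs(j) ** 2
    min_slack = min(min_slack, slack)

# On the boundary |a| = beta the inequality holds with equality.
a_boundary = beta * cmath.exp(1j * 0.9)
j_boundary = 1.0 / (1.0 + a_boundary)
boundary_slack = (2.0 * j_boundary.real - 1.0
                  - (1.0 - beta ** 2) * abs(j_boundary) ** 2)
```

The slack is nonnegative throughout and vanishes when |a| = β, which is consistent with the Lipschitz constant being the only quantity used in the derivation.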
Case β ≤ 1
To prove the result for β ≤ 1, we define the set R of pairs of points (x, y) ∈ H × H as follows: We also define R^c as the closure of the set of remaining pairs of points (H × H)\R, i.e., Obviously, H × H ⊆ R ∪ R^c, which implies that the contraction factor of the reflected resolvent is the worst-case contraction factor for R and R^c. We first show the contraction factor for R. Since (C.2) is the definition of the set R in (C.4), the contraction factor for (x, y) ∈ R is shown exactly as in (C.3). For (x, y) ∈ R^c, we have where (6.1) is used in the first inequality and the definition of R^c in (C.5) in the second. That is, the worst case contraction factor is $\sqrt{1 - \frac{4\sigma}{1+2\sigma+\beta^2}}$ also for β ≤ 1.

Appendix D. Proofs to results in Section 7

D.1. Proof to Theorem 7.2
We know from Lemma 3.6 and Definition 2.6 that $J_A$ is (1 + σ)-cocoercive, i.e., that it satisfies
$$\langle J_Ax - J_Ay, x - y\rangle \ge (1 + \sigma)\|J_Ax - J_Ay\|^2 \tag{D.1}$$
for all x, y ∈ H. From Proposition 5.2, we know that $J_A$ is $\frac{\beta}{2(1+\beta)}$-averaged, i.e., that it satisfies (see [3,Proposition 4.25])
$$\|J_Ax - J_Ay\|^2 + \tfrac{2+\beta}{\beta}\|(\mathrm{Id} - J_A)x - (\mathrm{Id} - J_A)y\|^2 \le \|x - y\|^2 \tag{D.2}$$
for all x, y ∈ H. Let $\alpha = \frac{\beta}{2(1+\beta)}$ and $\delta = \frac{1}{1+\sigma}$ and define the set R of pairs of points (x, y) ∈ H × H as: We also define R^c as the closure of the set of remaining pairs of points (H × H)\R, i.e., Obviously, the contraction factor for $R_A$ is the worst-case contraction factor for pairs of points in R and R^c.
Contraction factor on R First, we provide a contraction factor for pairs of points in R.
Contraction factor on R c Next, we provide a contraction factor for pairs of points in R c . Since R A = 2J A − Id, we conclude that where we have used that α ∈ (0, 1 2 ), (D.2), and the definition of R c in (D.4).

Contraction factor of R A
Here, we show that the contraction factors on R and R^c are identical, and we simplify the expression to get a final contraction factor for the reflected resolvent $R_A$. That the contraction factors are identical is shown by verifying that the difference between them is zero. Next, we simplify this contraction factor by inserting $\delta = \frac{1}{1+\sigma}$ and $\alpha = \frac{\beta}{2(1+\beta)}$, which gives the squared contraction factor $1 - \frac{4\sigma}{1+2\sigma+\sigma\beta}$. Taking the square root concludes the proof.
This completes the proof.