Distributed forward-backward methods for ring networks

In this work, we propose and analyse forward-backward-type algorithms for finding a zero of the sum of finitely many monotone operators, which are not based on reduction to a two-operator inclusion in the product space. Each iteration of the studied algorithms requires one resolvent evaluation per set-valued operator, one forward evaluation per cocoercive operator, and two forward evaluations per monotone operator. Unlike existing methods, the structure of the proposed algorithms is suitable for distributed, decentralised implementation in ring networks without needing global summation to enforce consensus between nodes.


Introduction
In this work, we propose algorithms of forward-backward-type for solving structured monotone inclusions in a real Hilbert space H. Specifically, we consider the problem

find x ∈ H such that 0 ∈ Σ_{i=1}^n A_i(x) + Σ_{i=1}^m B_i(x),      (1)

where A_1, ..., A_n : H ⇒ H are maximally monotone operators and B_1, ..., B_m : H → H are cocoercive.

Example 1.1 (Structured minimisation). Consider the minimisation problem

min_{x ∈ H} Σ_{i=1}^n g_i(x) + Σ_{i=1}^m f_i(x),      (2)

where g_1, ..., g_n : H → (−∞, +∞] are proper, lsc and convex, and f_1, ..., f_m : H → (−∞, +∞) are convex and differentiable with L-Lipschitz continuous gradients. Through its first-order optimality condition, (2) can be posed as (1) with A_i = ∂g_i and B_i = ∇f_i, where ∂g_i denotes the subdifferential of g_i. Note that the operators B_1, ..., B_m are both L-Lipschitz and 1/L-cocoercive, due to the Baillon-Haddad theorem [4, Corolaire 10].
Example 1.3 (Structured variational inequalities). Consider the variational inequality problem given by

find x* ∈ H such that Σ_{i=1}^m ⟨B_i(x*), x − x*⟩ + Σ_{i=1}^n g_i(x) − Σ_{i=1}^n g_i(x*) ≥ 0 for all x ∈ H,      (4)

where g_1, ..., g_n : H → (−∞, +∞] are proper, lsc and convex, and B_1, ..., B_m : H → H are monotone and L-Lipschitz. Then (4) is of the form of (1) with A_i = ∂g_i. An important special case of (4) is the constrained variational inequality problem given by

find x* ∈ C := ∩_{i=1}^n C_i such that Σ_{i=1}^m ⟨B_i(x*), x − x*⟩ ≥ 0 for all x ∈ C,

where C_1, ..., C_n ⊆ H are nonempty, closed and convex sets. This formulation allows one to exploit a representation of the set C in terms of the simpler sets C_1, ..., C_n.

Splitting algorithms
We focus on splitting algorithms for solving (1) of forward-backward-type, by which we mean those whose iteration can be expressed in terms of the resolvents of the set-valued operators A_1, ..., A_n and direct evaluations of the single-valued operators B_1, ..., B_m. It is always possible to reduce this problem to the m = 1 case by combining the single-valued operators into a single operator F := Σ_{i=1}^m B_i whilst preserving the above features. However, since the resolvent of a sum is generally not related to the individual resolvents, the same cannot be said for the set-valued operators, and so it makes sense to distinguish algorithms for (1) based on the value of n.
In the case n = 1, there are many methods satisfying the above criteria. Among them, the best known are arguably the forward-backward method given by

x^{k+1} = J_{λA_1}(x^k − λF(x^k)),

which can be used when F is cocoercive, and the forward-backward-forward method [23] given by

y^k = J_{λA_1}(x^k − λF(x^k)),    x^{k+1} = y^k + λ(F(x^k) − F(y^k)),

which can be used when F is monotone and Lipschitz. When n = 2, there are also many methods. For instance, if F is cocoercive, Davis-Yin splitting [12,11,1], which takes the form

x^k = J_{λA_1}(z^k),    y^k = J_{λA_2}(2x^k − z^k − λF(x^k)),    z^{k+1} = z^k + y^k − x^k,

can be applied, and if F is monotone and Lipschitz, then the backward-forward-reflected-backward methods [19] can be used.
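To make these one- and two-operator schemes concrete, the following sketch runs the forward-backward and Davis-Yin iterations on a one-dimensional problem of our own choosing: f(x) = (1/2)(x − 3)², whose gradient is 1-cocoercive, together with interval indicator functions whose resolvents are clamps. It is an illustration only, not an example from the text.

```python
lam = 1.0  # stepsize; valid since F is 1-cocoercive (L = 1, lam in (0, 2))

def F(x):
    # gradient of f(x) = 0.5*(x - 3)^2
    return x - 3.0

def J_A1(z):
    # resolvent of A1 = normal cone of [0, +inf), i.e. projection
    return max(z, 0.0)

def J_A2(z):
    # resolvent of A2 = normal cone of (-inf, 1], i.e. projection
    return min(z, 1.0)

# Forward-backward for 0 in A2(x) + F(x): minimise f over (-inf, 1], so x* = 1
x_fb = -5.0
for _ in range(100):
    x_fb = J_A2(x_fb - lam * F(x_fb))

# Davis-Yin for 0 in A1(x) + A2(x) + F(x): minimise f over [0, 1], so x* = 1
z = -5.0
for _ in range(500):
    x_dy = J_A1(z)
    y = J_A2(2 * x_dy - z - lam * F(x_dy))
    z = z + y - x_dy
x_dy = J_A1(z)
```

Both iterations settle at the constrained minimiser x* = 1, matching the zero of the corresponding inclusion.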
However, for n > 2, the situation is drastically different. Most existing methods rely on a product space reformulation, either directly or implicitly. For instance, the iteration given by

x^k = (1/n) Σ_{i=1}^n z_i^k,    z_i^{k+1} = z_i^k + J_{λA_i}(2x^k − z_i^k − λB_i(x^k)) − x^k for i ∈ {1, ..., n},      (5)

for cocoercive B_1, ..., B_n, amounts to Davis-Yin splitting applied to the three-operator inclusion

find x = (x_1, ..., x_n) ∈ H^n such that 0 ∈ (N_D + A + B)(x),      (6)

where A := (A_1, ..., A_n), B := (B_1, ..., B_n) and N_D denotes the normal cone to the diagonal subspace D := {(x, ..., x) ∈ H^n : x ∈ H}. Other methods for (1) with n > 2 include the generalised forward-backward method [18] and those from the projective splitting family [13,14]. Indisputably, product space reformulations such as (6) provide a convenient tool that makes the derivation of algorithms for n > 2 operators an almost mechanical procedure. It is therefore natural to consider whether this tool is the only one at our disposal. In addition to academic importance in its own right, the discovery of new algorithms that do not fall within standard categories can provide new possibilities, both in terms of mathematical techniques and potential applications. Sometimes these applications can be quite unexpected, as we demonstrate next.
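The role of the global averaging step in a product-space scheme such as (5) can be seen in the following sketch. The affine operators are our own choice for illustration: A_i(x) = x − a_i has resolvent (v + λa_i)/(1 + λ), and B_i(x) = x − b_i is 1-cocoercive; note that the x-update aggregates every local variable z_i.

```python
import numpy as np

a = np.array([0.0, 3.0, 6.0])   # A_i(x) = x - a_i  (maximally monotone)
b = np.array([1.0, 2.0, 3.0])   # B_i(x) = x - b_i  (1-cocoercive, L = 1)
n, lam = len(a), 1.0            # lam in (0, 2/L)

def J_A(v, i):
    # resolvent of A_i with parameter lam
    return (v + lam * a[i]) / (1 + lam)

z = np.zeros(n)
for _ in range(500):
    x = z.mean()                # global summation across all nodes
    for i in range(n):
        z[i] += J_A(2 * x - z[i] - lam * (x - b[i]), i) - x
x = z.mean()
# zero of the sum: sum_i (x - a_i) + sum_i (x - b_i) = 0, so x* = (sum a + sum b)/(2n) = 2.5
```

Every iteration requires the mean of all z_i, which is exactly the global aggregation-and-broadcast step discussed below.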

Distributed algorithms
Advances in hardware (parallel computation) and increasing dataset sizes (decentralised storage) have made distributed algorithms one of the most prevalent trends in algorithm development. Such algorithms rely on a network of devices that perform subtasks and are able to communicate with each other. For details on the topic, the reader is referred to the book of Bertsekas & Tsitsiklis [7] as well as [9] for recent advances.
From the perspective of distributed computing, the product space reformulation generally requires the computation of a global sum across all nodes in every iteration. To be more concrete, consider a distributed implementation of (5) in which node i performs the z_i-update using its operators A_i and B_i. To perform the x-update, the local variables z_1, ..., z_n must be aggregated and the result then broadcast to the entire network. There may be many reasons why this is not desirable, including constraints of the network setting, privacy or cost issues.
Another important aspect of distributed communication is parallelism and synchronisation. Returning to our example involving (5) from the previous paragraph, the product space reformulation provides a fully parallel algorithm in the sense that all nodes performing the z-update can compute their updates in parallel before sending them to the central coordinator. This parallelisation comes at the cost of requiring global synchronisation between nodes. Specifically, the algorithm (5) cannot move from the k-th to the (k + 1)-th iteration until all nodes 1, ..., n have completed their computation. This can be overcome with asynchronous algorithms, that is, those which require little or no global synchronisation. However, their development and mathematical analysis are significantly more delicate.

Our contribution
We propose and analyse algorithms of forward-backward-type for solving (1) which exploit problem structure. Note that, by using the zero operator in (1) if necessary, we can always assume that m = n − 1. Applied to this problem with cocoercive operators B_1, ..., B_{n−1}, our algorithm can be expressed as the fixed point iteration z^{k+1} = T(z^k) based on the operator T : H^{n−1} → H^{n−1} given by

T(z) := z + γ(x_2 − x_1, x_3 − x_2, ..., x_n − x_{n−1}),

where x = (x_1, ..., x_n) ∈ H^n depends on z = (z_1, ..., z_{n−1}) ∈ H^{n−1} and is given by

x_1 = J_{λA_1}(z_1),
x_i = J_{λA_i}(z_i + x_{i−1} − z_{i−1} − λB_{i−1}(x_{i−1})) for i ∈ {2, ..., n−1},
x_n = J_{λA_n}(x_1 + x_{n−1} − z_{n−1} − λB_{n−1}(x_{n−1})).
For the case where the operators B_i are monotone and Lipschitz, the underlying operator is slightly more complicated and relies on an update similar to the one proposed in the forward-reflected-backward method [15]. Overall, the notable characteristics of the algorithms we propose are:
• They do not rely on existing product space reformulations. Instead, we extend the framework for backward operators proposed in [16], which is in turn a generalisation of [21] to n > 3.
• They are decentralised and can be naturally implemented on a ring network for communication.
• The order in which variables are updated can vary significantly between executions: z_i^{k+1} can be computed before the evaluation of z_{i+2}^k, z_{i+3}^{k−1}, ....
Importantly, we believe that our work is a natural starting point towards a more general template that will allow for different network topologies.
The remainder of this work is structured as follows: In Section 2, we recall notation and preliminaries for later use. In Section 3, we introduce and analyse a forward-backward-type algorithm for solving (1) with cocoercive operators. In Section 4, we introduce and analyse a modification of the algorithm from Section 3 which can be used when B_1, ..., B_m are not necessarily cocoercive.

Preliminaries
Throughout this paper, H denotes a real Hilbert space equipped with inner product ⟨·,·⟩ and induced norm ‖·‖. A set-valued operator is a mapping A : H ⇒ H that assigns to each point in H a subset of H, i.e., A(x) ⊆ H for all x ∈ H. In the case when A always maps to singletons, i.e., A(x) = {u} for all x ∈ H, A is said to be a single-valued mapping and is denoted by A : H → H. In an abuse of notation, we may write A(x) = u when A(x) = {u}. The domain, the graph, the set of fixed points and the set of zeros of A are denoted, respectively, by dom A, gra A, Fix A and zer A; i.e.,

dom A := {x ∈ H : A(x) ≠ ∅},    gra A := {(x, u) ∈ H × H : u ∈ A(x)},
Fix A := {x ∈ H : x ∈ A(x)},    zer A := {x ∈ H : 0 ∈ A(x)}.

The inverse operator of A, denoted by A^{−1}, is defined through x ∈ A^{−1}(u) ⟺ u ∈ A(x). The identity operator is denoted by Id.
Definition 2.1. An operator B : H → H is said to be
(i) β-cocoercive, for β > 0, if ⟨B(x) − B(y), x − y⟩ ≥ β‖B(x) − B(y)‖² for all x, y ∈ H;
(ii) L-Lipschitz continuous, for L > 0, if ‖B(x) − B(y)‖ ≤ L‖x − y‖ for all x, y ∈ H.
Note that, by the Cauchy-Schwarz inequality, a 1/L-cocoercive operator is always L-Lipschitz continuous.

Definition 2.2. An operator T : H → H is said to be
(i) nonexpansive if ‖T(x) − T(y)‖ ≤ ‖x − y‖ for all x, y ∈ H;
(ii) quasi-nonexpansive if ‖T(x) − y‖ ≤ ‖x − y‖ for all x ∈ H and y ∈ Fix T;
(iii) strongly quasi-nonexpansive if there exists σ > 0 such that ‖T(x) − y‖² + σ‖x − T(x)‖² ≤ ‖x − y‖² for all x ∈ H and y ∈ Fix T;
(iv) averaged nonexpansive if there exist α ∈ (0, 1) and a nonexpansive operator R : H → H such that T = (1 − α)Id + αR.
When we wish to explicitly specify the constants involved, we refer to the operators in Definition 2.2(iii) and (iv), respectively, as σ-strongly quasi-nonexpansive and α-averaged nonexpansive. Since the mapping α ↦ (1 − α)/α is a bijection from (0, 1) to (0, +∞), there is a one-to-one relationship between the values of σ in (iii) and α in (iv), with inverse relation given by σ ↦ 1/(1 + σ).

Definition 2.3. A set-valued operator A : H ⇒ H is said to be monotone if

⟨x − y, u − v⟩ ≥ 0 for all (x, u), (y, v) ∈ gra A.

Furthermore, A is said to be maximally monotone if there exists no monotone operator B : H ⇒ H such that gra B properly contains gra A.

Proposition 2.4 ([5, Corollary 20.28]). Every continuous monotone operator with full domain is maximally monotone. In particular, every cocoercive operator is maximally monotone.
The resolvent operator, whose definition is given next, is one of the main building blocks of splitting algorithms.

Definition 2.5. Given γ > 0, the resolvent of a set-valued operator A : H ⇒ H with parameter γ is the operator J_{γA} := (Id + γA)^{−1}. When A is monotone, the following properties hold:
(i) J_{γA} is single-valued and firmly nonexpansive;
(ii) dom J_{γA} = H if and only if A is maximally monotone.
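As a concrete instance of the resolvent, take A = ∂|·|, the subdifferential of the absolute value; its resolvent J_{γA} is the classical soft-thresholding map. The following sketch (our own illustration, not from the text) verifies the defining inclusion z − x ∈ γA(x) numerically on a grid.

```python
import numpy as np

def soft_threshold(z, gamma):
    # resolvent J_{gamma A} of A = subdifferential of |.|
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0

# check the resolvent characterisation: x = J_{gamma A}(z)  iff  z - x in gamma * A(x)
gamma = 1.0
for z in np.linspace(-3.0, 3.0, 61):
    x = soft_threshold(z, gamma)
    if x != 0.0:
        # A(x) = {sign(x)} away from the origin
        assert abs((x + gamma * np.sign(x)) - z) < 1e-12
    else:
        # A(0) = [-1, 1], so we need |z| <= gamma
        assert abs(z) <= gamma + 1e-12
```

The map is also firmly nonexpansive, as Definition 2.5(i) predicts, since it is the proximity operator of a convex function.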

A Distributed Forward-Backward Method
Let n ≥ 2 and consider the problem

find x ∈ H such that 0 ∈ Σ_{i=1}^n A_i(x) + Σ_{i=1}^{n−1} B_i(x),      (7)

where A_1, ..., A_n : H ⇒ H are maximally monotone and B_1, ..., B_{n−1} : H → H are 1/L-cocoercive. The work [16] proposed a splitting algorithm with (n − 1)-fold lifting for finding a zero of the sum of n ≥ 2 maximally monotone operators; see also [2] for recent extensions. In this section, we adapt the methodology developed in [1] to obtain a splitting method of forward-backward-type for the inclusion (7) by modifying the splitting method in [16] without increasing the dimension of the ambient space.
Given λ ∈ (0, 2/L) and γ ∈ (0, 1 − λL/2) and an initial point z^0 = (z_1^0, ..., z_{n−1}^0) ∈ H^{n−1}, our proposed algorithm for (7) generates two sequences, (z^k) ⊆ H^{n−1} and (x^k) ⊆ H^n, according to

z_i^{k+1} = z_i^k + γ(x_{i+1}^k − x_i^k) for i ∈ {1, ..., n−1},      (8a)

where x^k = (x_1^k, ..., x_n^k) ∈ H^n is given by

x_1^k = J_{λA_1}(z_1^k),
x_i^k = J_{λA_i}(z_i^k + x_{i−1}^k − z_{i−1}^k − λB_{i−1}(x_{i−1}^k)) for i ∈ {2, ..., n−1},      (8b)
x_n^k = J_{λA_n}(x_1^k + x_{n−1}^k − z_{n−1}^k − λB_{n−1}(x_{n−1}^k)).

The structure of (8) lends itself to a distributed decentralised implementation, similar to the one in [16, Algorithm 2]. More precisely, consider a cycle graph with n nodes labelled 1 through n. Each node in the graph represents an agent, and two agents can communicate only if their nodes are adjacent. In our setting, this means that Agent i can only communicate with Agents i − 1 and i + 1 (mod n), for i ∈ {1, ..., n}. We assume that each agent only knows its operators in (1). Specifically, we assume that only Agent 1 knows the operator A_1 and that, for each i ∈ {2, ..., n}, only Agent i knows the operators A_i and B_{i−1}. The responsibility of updating x_i is assigned to Agent i for all i ∈ {1, ..., n}, and the responsibility of updating z_{i−1} is assigned to Agent i for i ∈ {2, ..., n}. Altogether, this gives rise to the protocol for distributed decentralised implementation of (8) described in Algorithm 1.
Algorithm 1 Protocol for distributed decentralised implementation of (8).
1: for k = 0, 1, 2, ... do
2:   Agent 1 computes x_1^k = J_{λA_1}(z_1^k)
3:   and sends it to Agents 2 and n;
4:   for i = 2, ..., n − 1 do
5:     Agent i computes x_i^k = J_{λA_i}(z_i^k + x_{i−1}^k − z_{i−1}^k − λB_{i−1}(x_{i−1}^k)) and z_{i−1}^{k+1} = z_{i−1}^k + γ(x_i^k − x_{i−1}^k); sends x_i^k to Agent i + 1 and z_{i−1}^{k+1} to Agent i − 1;
6:   end for
7:   Agent n computes x_n^k = J_{λA_n}(x_1^k + x_{n−1}^k − z_{n−1}^k − λB_{n−1}(x_{n−1}^k)) and z_{n−1}^{k+1} = z_{n−1}^k + γ(x_n^k − x_{n−1}^k); sends z_{n−1}^{k+1} to Agent n − 1;
8: end for

Remark 3.1 (Termination criterion for Algorithm 1). Let (z^k) be the sequence generated by Algorithm 1. In order to detect termination, one could compute (possibly periodically) the residual given by

‖z^{k+1} − z^k‖² = Σ_{i=1}^{n−1} ‖z_i^{k+1} − z_i^k‖².

The structure of this residual is suitable for distributed implementation within the protocol in the algorithm. Indeed, the ith term in the sum, given by ‖z_i^{k+1} − z_i^k‖², can already be computed by Agent i + 1, and therefore the full residual ‖z^{k+1} − z^k‖² can be computed by a global summation and broadcast operation. The same stopping criterion can also be applied to the algorithm presented in Section 4 generated by the iteration given in (32a) and (32b).
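The protocol above can be simulated serially. The following sketch runs iteration (8) for n = 3 with affine operators of our own choosing (an illustration only): A_i(x) = x − a_i, whose resolvent is (v + λa_i)/(1 + λ), and the 1-cocoercive operators B_i(x) = x − b_i, so that L = 1 and the parameter ranges are λ ∈ (0, 2) and γ ∈ (0, 1 − λ/2).

```python
a = [1.0, 2.0, 3.0]      # A_i(x) = x - a_i
b = [4.0, 5.0]           # B_i(x) = x - b_i, 1-cocoercive (L = 1)
lam, gam = 1.0, 0.45     # lam in (0, 2/L), gam in (0, 1 - lam*L/2)

def J(v, ai):
    # resolvent of A_i(x) = x - a_i with parameter lam
    return (v + lam * ai) / (1 + lam)

def B(x, bi):
    return x - bi

z1 = z2 = 0.0
for _ in range(5000):
    x1 = J(z1, a[0])
    x2 = J(z2 + x1 - z1 - lam * B(x1, b[0]), a[1])
    x3 = J(x1 + x2 - z2 - lam * B(x2, b[1]), a[2])
    z1 += gam * (x2 - x1)
    z2 += gam * (x3 - x2)
# zero of the sum: 5x - (1+2+3+4+5) = 0, so all x_i converge to x* = 3
```

In the decentralised protocol, the three x-updates and two z-updates above would be carried out by the three agents of the ring, with only neighbour-to-neighbour messages.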
In order to analyse the convergence of (8), we introduce the underlying fixed point operator T : H^{n−1} → H^{n−1} given by

T(z) := z + γ(x_2 − x_1, x_3 − x_2, ..., x_n − x_{n−1}),      (9)

where x = (x_1, ..., x_n) ∈ H^n depends on z = (z_1, ..., z_{n−1}) ∈ H^{n−1} and is given by

x_1 = J_{λA_1}(z_1),
x_i = J_{λA_i}(z_i + x_{i−1} − z_{i−1} − λB_{i−1}(x_{i−1})) for i ∈ {2, ..., n−1},      (10)
x_n = J_{λA_n}(x_1 + x_{n−1} − z_{n−1} − λB_{n−1}(x_{n−1})).

In this way, the sequence (z^k) given by (8a) satisfies z^{k+1} = T(z^k) for all k ∈ N.

Remark 3.2. Note that, although the sum of cocoercive operators is cocoercive (see, e.g., [5, Proposition 4.12]), considering the sum of n − 1 operators in (1) gives the freedom of either applying each operator as a forward step before the corresponding backward step, or of applying the sum of all of them before a particular backward step (by setting all the operators to be equal to zero except for one of them, which would be equal to the sum).

Remark 3.3 (Special cases). If n = 2, then x_1 = x_{n−1} and T in (9) recovers the operator corresponding to Davis-Yin splitting [12,11,1] for finding a zero of A_1 + A_2 + B_1. In turn, this includes the forward-backward algorithm and Douglas-Rachford splitting as special cases by further taking A_1 = 0 or B_1 = 0, respectively. If the operators B_i are all zero, then (9) reduces to the resolvent splitting algorithms proposed by the authors in [16]. This has been further studied in [6] for the particular case in which the operators A_i are normal cones of closed linear subspaces.
Although the numbers of set-valued and single-valued monotone operators in (7) differ by one, it is straightforward to derive a scheme where this is not the case by setting A_1 = 0. In this case, x_1 = J_{λA_1}(z_1) = z_1 can be used to eliminate x_1 from (9) and (10). While at first it may seem unusual that the numbers of set-valued and single-valued monotone operators in (7) are not the same, we note that this same situation arises in Davis-Yin splitting as described above.

Remark 3.4. The algorithm given by (8) appears to be new even in the special case with A_i = 0 and B_i = ∇f_i for convex smooth functions f_i. In this case, one of the most popular algorithms for solving min_x Σ_i f_i(x) in a decentralised way is EXTRA, proposed in [22]. The two methods are similar in spirit, but have quite different properties. In particular, the main update of EXTRA is

x^{k+2} = (Id + W)x^{k+1} − W̃x^k − λ(∇f(x^{k+1}) − ∇f(x^k)),

where W and W̃ are certain mixing matrices and x^1 = Wx^0 − λ∇f(x^0). Undoubtedly, an advantage of EXTRA is the ability to use a wider range of mixing matrices which, in terms of communication, generalises better to other network topologies.
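For comparison, the following is a minimal sketch of the EXTRA update on a 3-node ring for min_x Σ_i (1/2)(x − a_i)². The mixing matrix W (self-weight 1/2, neighbour weights 1/4) and the choice W̃ = (Id + W)/2 are standard options that we assume here for illustration; they are not prescribed by the text.

```python
import numpy as np

# doubly stochastic mixing matrix of a 3-node ring
W = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
W_tilde = (np.eye(3) + W) / 2

a = np.array([0.0, 1.0, 5.0])   # f_i(x) = 0.5*(x - a_i)^2; global minimiser mean(a) = 2
lam = 0.2

def grad(x):
    # stacked local gradients: node i holds x[i] and evaluates f_i'
    return x - a

x_prev = np.zeros(3)
x_curr = W @ x_prev - lam * grad(x_prev)          # initial EXTRA step
for _ in range(300):
    x_next = (np.eye(3) + W) @ x_curr - W_tilde @ x_prev \
             - lam * (grad(x_curr) - grad(x_prev))
    x_prev, x_curr = x_curr, x_next
# local copies reach consensus at the exact minimiser 2
```

Unlike (8), each EXTRA step mixes with all neighbours simultaneously through W, which is what makes the method adaptable to general topologies.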
In what follows, we first describe the relationship between the solutions of the monotone inclusion (7) and the fixed point set of the operator T in (9).

Lemma 3.5. Let n ≥ 2 and γ, λ > 0. The following assertions hold.
(a) If x ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i), then there exists z ∈ Fix T with x = J_{λA_1}(z_1).
(b) If z ∈ Fix T, then x := J_{λA_1}(z_1) ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i).
Consequently, zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i) ≠ ∅ if and only if Fix T ≠ ∅.

Proof. (a): Let x ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i) and choose a_i ∈ A_i(x) such that Σ_{i=1}^n a_i + Σ_{i=1}^{n−1} B_i(x) = 0. Define z ∈ H^{n−1} by z_1 := x + λa_1 and z_i := z_{i−1} + λa_i + λB_{i−1}(x) for i ∈ {2, ..., n−1}. Then x = J_{λA_1}(z_1) and x = J_{λA_i}(z_i + x − z_{i−1} − λB_{i−1}(x)) for i ∈ {2, ..., n−1}. Furthermore, we have x − z_{n−1} − λB_{n−1}(x) = λa_n ∈ λA_n(x), which implies that x = J_{λA_n}(2x − z_{n−1} − λB_{n−1}(x)). Altogether, it follows that z ∈ Fix T.

(b): Let z ∈ Fix T and set x := J_{λA_1}(z_1). Then (11) holds thanks to the definition of T; that is, all coordinates in (10) coincide with x. The definition of the resolvent therefore implies

z_1 − x ∈ λA_1(x),
z_i − z_{i−1} − λB_{i−1}(x) ∈ λA_i(x) for i ∈ {2, ..., n−1},
x − z_{n−1} − λB_{n−1}(x) ∈ λA_n(x).

Summing together the above inclusions gives x ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i), which completes the proof.
Next, we study the nonexpansivity properties of the operator T in (9).
Lemma 3.6. Let n ≥ 2, let A_1, ..., A_n : H ⇒ H be maximally monotone, let B_1, ..., B_{n−1} : H → H be 1/L-cocoercive, and let λ ∈ (0, 2/L) and γ ∈ (0, 1 − λL/2). Then the operator T given by (9) is averaged nonexpansive.

The following theorem is our main result regarding convergence of the algorithm (8).

Theorem 3.7. Let n ≥ 2, let A_1, ..., A_n : H ⇒ H be maximally monotone and let B_1, ..., B_{n−1} : H → H be 1/L-cocoercive with zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i) ≠ ∅. Given λ ∈ (0, 2/L) and γ ∈ (0, 1 − λL/2), let (z^k) ⊆ H^{n−1} and (x^k) ⊆ H^n be the sequences given by (8). Then the following assertions hold.
(a) The sequence (z^k) converges weakly to a point z ∈ Fix T.
(b) The sequence (x^k) converges weakly to a point (x, ..., x) ∈ H^n with x ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i).
(c) For each i ∈ {1, ..., n−1}, the sequence (B_i(x_i^k)) converges strongly to B_i(x).

Proof. (a): By Lemma 3.5(a), Fix T ≠ ∅, and by Lemma 3.6, T is averaged nonexpansive. The claim therefore follows from [5, Theorem 5.15], which also gives lim_{k→∞} ‖z^{k+1} − z^k‖ = 0.
(b): By nonexpansivity of resolvents, L-Lipschitz continuity of B_1, ..., B_{n−1}, and boundedness of (z^k), it follows that (x^k) is also bounded. Further, (9) and the fact that lim_{k→∞} ‖z^{k+1} − z^k‖ = 0 imply that

lim_{k→∞} ‖x_{i+1}^k − x_i^k‖ = 0 for all i ∈ {1, ..., n−1}.      (20)

Next, using the definition of the resolvent together with (8b), we obtain an inclusion of the form (21) relating (z^k, x^k) to the operator S : H^n ⇒ H^n given by (22). As the sum of two maximally monotone operators is again maximally monotone provided that one of the operators has full domain [5, Corollary 24.4(i)], it follows that S is maximally monotone. Consequently, it is demiclosed [5, Proposition 20.38]. That is, its graph is sequentially closed in the weak-strong topology.
Let w ∈ H^n be an arbitrary weak cluster point of the sequence (x^k). As a consequence of (20), w = (x, ..., x) for some x ∈ H. Taking the limit along a subsequence of (x^k) which converges weakly to w in (21), using demiclosedness of S together with L-Lipschitz continuity of B_1, ..., B_{n−1}, and unravelling the resulting expression gives x ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−1} B_i). In other words, w = (x, ..., x) ∈ H^n with x := J_{λA_1}(z_1) is the unique weak sequential cluster point of the bounded sequence (x^k). We therefore deduce that (x^k) converges weakly to w, which completes this part of the proof.
(c): For convenience, denote

y_1^k := z_1^k,    y_i^k := z_i^k + x_{i−1}^k − z_{i−1}^k − λB_{i−1}(x_{i−1}^k) for i ∈ {2, ..., n−1},    y_n^k := x_1^k + x_{n−1}^k − z_{n−1}^k − λB_{n−1}(x_{n−1}^k),

so that x_i^k = J_{λA_i}(y_i^k) for all i ∈ {1, ..., n}. Define y = (y_1, ..., y_n) in an analogous way with z in place of z^k and (x, ..., x) in place of x^k, so that x = J_{λA_i}(y_i) for all i ∈ {1, ..., n}. Using firm nonexpansivity of resolvents yields (23). Rearranging (23), followed by applying 1/L-cocoercivity of B_1, ..., B_{n−1}, gives (24). Since the left-hand side of (24) converges to zero due to (20) and the boundedness of the sequences (z^k) and (x^k), the claimed strong convergence follows.

Remark 3.8 (Attouch-Théra duality). Let I ⊆ {1, ..., n−1} be a non-empty index set with cardinality denoted by |I|. Express the monotone inclusion (1) as

0 ∈ (Σ_{i∈I} B_i)(x) + (Σ_{i=1}^n A_i + Σ_{i∉I} B_i)(x),      (25)

and note that the first operator Σ_{i∈I} B_i is 1/(|I|L)-cocoercive (see, e.g., [5, Proposition 4.12]). The Attouch-Théra dual [3] associated with (25) takes the form (26), where we note that the first operator (Σ_{i∈I} B_i)^{−1} is 1/(|I|L)-strongly monotone. Hence, as a strongly monotone inclusion, (26) has a unique solution ū ∈ H. Moreover, for any solution x ∈ H of (25), [3, Theorem 3.1] implies ū = Σ_{i∈I} B_i(x). In the context of the previous result, Theorem 3.7(c) implies that Σ_{i∈I} B_i(x_i^k) converges strongly to ū. In other words, the algorithm in (8) also produces a sequence which converges strongly to the unique solution of the dual inclusion (26).

In the special case when n = 2, (12) from Lemma 3.6 simplifies to give the stronger inequality (27). This ensures averagedness of T provided that γ ∈ (0, 2 − λL/2), which is larger than the range of permissible values for γ in the statement of Theorem 3.7. By using (27), a proof similar to that of Theorem 3.7 guarantees convergence for a larger range of parameter values, namely, when λ ∈ (0, 4/L) and γ ∈ (0, 2 − λL/2). For details, see [11,1].

A Distributed Forward-Reflected-Backward Method
Let n ≥ 3 and consider the problem

find x ∈ H such that 0 ∈ Σ_{i=1}^n A_i(x) + Σ_{i=1}^{n−2} B_i(x),      (28)

where A_1, ..., A_n : H ⇒ H are maximally monotone and B_1, ..., B_{n−2} : H → H are monotone and L-Lipschitz continuous. Developing splitting algorithms which use forward evaluations of Lipschitz continuous monotone operators is generally more intricate than those exploiting cocoercivity, such as the one in the previous section. For concreteness, consider the special case of (28) with two operators given by

find x ∈ H such that 0 ∈ A_1(x) + B_1(x).      (29)

It is well known that the forward-backward method for (29), given by

x^{k+1} = J_{λA_1}(x^k − λB_1(x^k)),      (30)

can fail to converge for any λ > 0. Indeed, consider the particular instance of (29) given by H = R², A_1 := 0 and B_1 := [0, −1; 1, 0], whose unique solution is (0, 0)^T. Then B_1 is skew-symmetric and thus monotone (but not cocoercive), and the sequence generated by (30) diverges for any non-zero starting point, since the eigenvalues of Id − λB_1 are 1 ± λi. However, a small modification of (30) gives rise to

x^{k+1} = J_{λA_1}(x^k − 2λB_1(x^k) + λB_1(x^{k−1})),      (31)

which is known as the forward-reflected-backward method [15]. Unlike (30), it converges whenever λ < 1/(2L). While (31) is not the only constant stepsize scheme for solving (29), as there are a few which are fundamentally different [23,10], it is arguably one of the simplest. In this section, we develop a modification of the method from the previous section which converges for Lipschitz continuous operators by drawing inspiration from the differences between (31) and (30).
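The divergence of (30) and the convergence of (31) on the skew-symmetric example above can be checked numerically. In this instance A_1 = 0, so the resolvent is the identity; we take λ = 0.2, which satisfies λ < 1/(2L) since ‖B_1‖ = 1.

```python
import numpy as np

B = np.array([[0.0, -1.0],
              [1.0, 0.0]])   # skew-symmetric: monotone, 1-Lipschitz, not cocoercive
lam = 0.2                    # satisfies lam < 1/(2L) with L = 1

# forward-backward (30): x <- x - lam*B x; every step multiplies the norm
# by |1 + lam*i| = sqrt(1 + lam^2) > 1, so the iterates spiral outwards
x = np.array([1.0, 0.0])
for _ in range(200):
    x = x - lam * (B @ x)
norm_fb = np.linalg.norm(x)

# forward-reflected-backward (31): x <- x - lam*(2*B x - B x_prev)
x_prev = x_curr = np.array([1.0, 0.0])
for _ in range(2000):
    x_next = x_curr - lam * (2 * (B @ x_curr) - B @ x_prev)
    x_prev, x_curr = x_curr, x_next
norm_frb = np.linalg.norm(x_curr)
```

The forward-backward norm grows without bound, while the forward-reflected-backward iterates converge to the unique solution (0, 0)^T.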
Given λ ∈ (0, 1/(2L)) and γ ∈ (0, 1 − 2λL) and an initial point z^0 = (z_1^0, ..., z_{n−1}^0) ∈ H^{n−1}, our proposed algorithm for (28) generates two sequences, (z^k) ⊆ H^{n−1} and (x^k) ⊆ H^n, according to

z_i^{k+1} = z_i^k + γ(x_{i+1}^k − x_i^k) for i ∈ {1, ..., n−1},      (32a)

where x^k = (x_1^k, ..., x_n^k) ∈ H^n is given by

x_1^k = J_{λA_1}(z_1^k),
x_2^k = J_{λA_2}(z_2^k + x_1^k − z_1^k − λB_1(x_1^k)),
x_i^k = J_{λA_i}(z_i^k + x_{i−1}^k − z_{i−1}^k − λB_{i−1}(x_{i−1}^k) − λ(B_{i−2}(x_{i−1}^k) − B_{i−2}(x_{i−2}^k))) for i ∈ {3, ..., n−1},      (32b)
x_n^k = J_{λA_n}(x_1^k + x_{n−1}^k − z_{n−1}^k − λ(B_{n−2}(x_{n−1}^k) − B_{n−2}(x_{n−2}^k))).

Compared to the algorithm proposed in the previous section, the only major change here is that some expressions for x_i^k in (32b) incorporate a "reflection-type" term involving the operator B_{i−2}. This precise form seems important for our subsequent convergence analysis, and it does not seem easy to instead incorporate "reflection-type" terms involving the operator B_{i−1}. The structure of (32) allows for a similar protocol to the one described in Algorithm 1 to be used for a distributed decentralised implementation. The only change to the protocol (in terms of communication) is that Agent i must now also send λ(B_{i−1}(x_i^k) − B_{i−1}(x_{i−1}^k)) to Agent i + 1 for all i ∈ {2, ..., n−1}.

Remark 4.1. To the best of our knowledge, the scheme given by (32) does not directly recover any existing forward-backward-type scheme as a special case (although it is clearly related to (31)). For example, take n = 3 and A_1 = A_3 = 0. Then x_1^k and x_3^k can be eliminated from (32) to give a two-step recursion in (x_2^k, z^k). To better understand the relationship between this and (31), it is instructive to consider the limiting case with γ = 1. Indeed, when γ = 1, x_2^k and z_2^k can be eliminated to give

z_1^{k+2} = J_{λA_2}(z_1^k − 2λB_1(z_1^{k+1}) + λB_1(z_1^k)).
Although this closely resembles (31) for finding a zero of A_2 + B_1, it is not exactly the same due to the index of the first term inside the resolvent.
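Under our reading of Remark 4.1, the reduced two-step recursion can be tested on the skew-symmetric operator from earlier in this section. The setup is our own illustration, not an example from the text: we add A_2(x) = x, whose resolvent is J_{λA_2}(v) = v/(1 + λ), so that 0 ∈ A_2(x) + B_1(x) has the unique solution x* = 0.

```python
import numpy as np

B1 = np.array([[0.0, -1.0],
               [1.0, 0.0]])   # monotone and 1-Lipschitz, not cocoercive
lam = 0.3                     # satisfies lam < 1/(2L) with L = 1

def J_A2(v):
    # resolvent of A_2 = Id with parameter lam
    return v / (1 + lam)

# two-step recursion resembling (31), but with index k-1 in the first term
x_prev = x_curr = np.array([1.0, 2.0])
for _ in range(500):
    x_next = J_A2(x_prev - 2 * lam * (B1 @ x_curr) + lam * (B1 @ x_prev))
    x_prev, x_curr = x_curr, x_next
```

Despite the shifted index inside the resolvent, the iterates still converge to the solution of the inclusion.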
In order to analyse (32), we introduce the underlying fixed point operator T : H^{n−1} → H^{n−1} given by

T(z) := z + γ(x_2 − x_1, x_3 − x_2, ..., x_n − x_{n−1}),      (33)

where x = (x_1, ..., x_n) ∈ H^n depends on z = (z_1, ..., z_{n−1}) ∈ H^{n−1} and is given by

x_1 = J_{λA_1}(z_1),
x_2 = J_{λA_2}(z_2 + x_1 − z_1 − λB_1(x_1)),
x_i = J_{λA_i}(z_i + x_{i−1} − z_{i−1} − λB_{i−1}(x_{i−1}) − λ(B_{i−2}(x_{i−1}) − B_{i−2}(x_{i−2}))) for i ∈ {3, ..., n−1},      (34)
x_n = J_{λA_n}(x_1 + x_{n−1} − z_{n−1} − λ(B_{n−2}(x_{n−1}) − B_{n−2}(x_{n−2}))).

In this way, the sequence (z^k) given by (32) satisfies z^{k+1} = T(z^k) for all k ∈ N.
Remark 4.3. Compared to Lemma 3.6 from the previous section, the conclusions of Lemma 4.2 are weaker in two ways. Firstly, the permissible stepsize range of λ ∈ (0, 1/(2L)) is smaller than in Lemma 3.6, which allowed λ ∈ (0, 2/L). Secondly, the operator T in (33) is only shown to be strongly quasi-nonexpansive in Lemma 4.2, whereas the operator T in (9) is averaged nonexpansive.
The following theorem is our main result regarding convergence of (32).
Proof of Theorem 4.4. Let (z^k) ⊆ H^{n−1} and (x^k) ⊆ H^n be the sequences given by (32), with λ ∈ (0, 1/(2L)) and γ ∈ (0, 1 − 2λL). Lemma 4.2 implies that (z^k) is Fejér monotone with respect to Fix T and that lim_{k→+∞} ‖z^{k+1} − z^k‖ = 0. By nonexpansivity of resolvents, L-Lipschitz continuity of B_1, ..., B_{n−2}, and boundedness of (z^k), it follows that (x^k) is also bounded. Further, (33) and the fact that lim_{k→∞} ‖z^{k+1} − z^k‖ = 0 imply that

lim_{k→∞} ‖x_{i+1}^k − x_i^k‖ = 0 for all i ∈ {1, ..., n−1}.      (44)

Let u = (u_1, ..., u_{n−1}) ∈ H^{n−1} be an arbitrary weak cluster point of (z^k). Then, due to (44), there exists a point x ∈ H such that (u, w) is a weak cluster point of (z^k, x^k), where w = (x, ..., x) ∈ H^n. Let S denote the maximally monotone operator defined by (22) when B_{n−1} = 0. Then (32b) implies an inclusion of the form (45), where b_i^k := B_{i−1}(x_i^k) − B_{i−1}(x_{i−1}^k). Taking the limit along a subsequence of (z^k, x^k) which converges weakly to (u, w) in (45), using demiclosedness of S together with L-Lipschitz continuity of B_1, ..., B_{n−2}, and unravelling the resulting expression gives that u ∈ Fix T and x = J_{λA_1}(u_1) ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−2} B_i).

The modification described in Remark 4.5, in which x_2 = J_{λA_2}(z_2 + x_1 − z_1 − λB_1(x_1)) and the expression for x_n incorporates a forward evaluation of the cocoercive operator B_{n−1}, can be shown to converge using a proof similar to that of Theorem 4.4 for λ ∈ (0, 1/(2L)). However, it is not straightforward to recover Theorem 3.7 as a special case of such a result, because the stepsize range λ ∈ (0, 2/L) in the cocoercive-only case (i.e., Theorem 3.7) is larger than the range in the mixed case. Moreover, Theorem 3.7(c) (strong convergence to dual solutions) does not have an analogue in the statement of Theorem 4.4. In addition, keeping the two cases separate allows the analysis to be as transparent as possible.

Example 1.2 (Structured saddle-point problems). Consider the saddle-point problem given by min

Theorem 4.4. Let n ≥ 3, let A_1, ..., A_n : H ⇒ H be maximally monotone and let B_1, ..., B_{n−2} : H → H be monotone and L-Lipschitz continuous with zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−2} B_i) ≠ ∅. Then the following assertions hold.
(a) The sequence (z^k) converges weakly to a point z ∈ Fix T.
(b) The sequence (x^k) converges weakly to a point (x, ..., x) ∈ H^n with x ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−2} B_i).

Proof. (a): Since zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−2} B_i) ≠ ∅, Lemma 3.5(a) implies that the set of fixed points of the operator T in (9)-(10) (with B_{n−1} = 0) is nonempty. The latter set coincides with the set of fixed points of the operator T in (33)-(34), so Fix T ≠ ∅. Since λ ∈ (0, 1/(2L)) and γ ∈ (0, 1 − 2λL), the argument above shows that (z^k) is Fejér monotone with respect to Fix T and that every weak cluster point of (z^k) belongs to Fix T, with J_{λA_1}(u_1) ∈ zer(Σ_{i=1}^n A_i + Σ_{i=1}^{n−2} B_i) for any such cluster point u. Thus, by [5, Theorem 5.5], it follows that (z^k) converges weakly to a point z ∈ Fix T.
(b): Follows by using an argument analogous to the one in Theorem 3.7(b).

Remark 4.5 (Exploiting cocoercivity). If a Lipschitz continuous operator B_i in (28) is actually cocoercive, then it is possible to reduce the number of evaluations of B_i per iteration by combining the ideas in Sections 3 and 4. In fact, we can consider the problem

find x ∈ H such that 0 ∈ Σ_{i=1}^n A_i(x) + Σ_{i=1}^{n−1} B_i(x),

where B_1, ..., B_{n−2} are each either monotone and Lipschitz continuous or cocoercive, and B_{n−1} is cocoercive. For this problem, we can replace (34) in the definition of T with x_1 = J_{λA_1}(z_1), x_2 = J_{λA_2}(z_2 + x_1 − z_1 − λB_1(x_1)), and analogous expressions for x_3, ..., x_n in which the reflection-type correction term is included only for those operators B_{i−2} which are not cocoercive, and in which the expression for x_n incorporates the forward evaluation −λB_{n−1}(x_{n−1}).