ADMM for monotone operators: convergence analysis and rates

We propose in this paper a unifying scheme for several algorithms from the literature dedicated to solving monotone inclusion problems involving compositions with linear continuous operators in infinite dimensional Hilbert spaces. We show that a number of primal-dual algorithms for monotone inclusions, and also the classical ADMM numerical scheme for convex optimization problems along with some of its variants, can be embedded in this unifying scheme. While in the first part of the paper convergence results for the iterates are reported, the second part is devoted to the derivation of convergence rates obtained by combining variable metric techniques with strategies based on a suitable choice of dynamical step sizes.


Introduction and preliminaries
Consider the convex optimization problem

inf_{x ∈ H} { f(x) + g(Lx) + h(x) },    (1)

where H and G are real Hilbert spaces, f : H → R := R ∪ {±∞} and g : G → R are proper, convex and lower semicontinuous functions, h : H → R is a convex and Fréchet differentiable function with Lipschitz continuous gradient, and L : H → G is a linear continuous operator. Due to numerous applications in fields like signal and image processing, portfolio optimization, cluster analysis, location theory, network communication and machine learning, the design and investigation of numerical algorithms for solving convex optimization problems of type (1) has attracted in recent years considerable interest from the applied mathematics community. The most prominent methods one can find in the literature for solving (1) are the primal-dual proximal splitting algorithms and the ADMM algorithms. We briefly describe the two classes of algorithms.
First proximal splitting algorithms for solving convex optimization problems involving compositions with linear continuous operators have been reported by Combettes and Wajs [16], Esser, Zhang and Chan [22] and Chambolle and Pock [13]. Further investigations have been made in the more general framework of finding zeros of sums of linearly composed maximally monotone operators and monotone and Lipschitz, respectively, cocoercive operators. The resulting numerical schemes have been employed in the solving of the inclusion problem

find x ∈ H such that 0 ∈ ∂f(x) + (L* • ∂g • L)(x) + ∇h(x),    (2)

which represents the system of optimality conditions of problem (1). Briceño-Arias and Combettes pioneered this approach in [12], by reformulating the general monotone inclusion in an appropriate product space as the sum of a maximally monotone operator and a linear and skew one, and by solving the resulting inclusion problem via a forward-backward-forward type algorithm (see also [14]). Afterwards, by using the same product space approach, this time in a suitably renormed space, Vũ succeeded in [30] in formulating a primal-dual splitting algorithm of forward-backward type, in other words, saving a forward step. Condat presented in [17], in the variational case, algorithms of the same nature as the one in [30]. Under strong monotonicity/convexity assumptions and the use of dynamic step size strategies, convergence rates have been provided in [8] for the primal-dual algorithm in [30] (see also [13]), and in [9] for the primal-dual algorithm in [14].
We describe the ADMM algorithm for solving (1) in the case h = 0, which corresponds to the standard setting in the literature. By introducing an auxiliary variable one can rewrite (1) as

inf_{(x,z) ∈ H × G, Lx = z} { f(x) + g(z) }.    (3)

For a fixed real number c > 0 we consider the augmented Lagrangian associated with problem (3), which is defined as

L_c(x, z, y) = f(x) + g(z) + ⟨y, Lx − z⟩ + (c/2)‖Lx − z‖².

The ADMM algorithm relies on the alternating minimization of the augmented Lagrangian with respect to the variables x and z (see [11, 20, 23-25] and Remark 4 for the exact formulation of the iterative scheme). Generally, the minimization with respect to the variable x does not lead to a proximal step. This drawback has been overcome by Shefi and Teboulle in [28] by introducing additional suitably chosen metrics, and also in [2] for an extension of the ADMM algorithm designed for problems whose objective also involves smooth parts. The aim of this paper is to provide a unifying algorithmic scheme for solving monotone inclusion problems which encompasses several primal-dual iterative methods [7, 13, 17, 30] and, in the particular case of convex optimization problems, the ADMM algorithm and its variants from [28]. A closer look at the structure of the new algorithmic scheme shows that it translates the paradigm behind ADMM methods for optimization problems to the solving of monotone inclusions. We carry out a convergence analysis for the proposed iterative scheme by making use of techniques relying on the Opial Lemma applied in a variable metric setting. Furthermore, we derive convergence rates for the iterates under supplementary strong monotonicity assumptions. To this aim we use a dynamic step size strategy, based on which we can provide a unifying scheme for the algorithms in [8, 13]. Not least, we also provide accelerated versions of the classical ADMM algorithm and of its variable metric variants.
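In finite dimensions the alternating minimization just described takes only a few lines. The following Python sketch (a minimal illustration under our own choice of test problem, parameter values and function names, not the exact scheme analyzed later in the paper) applies classical ADMM to min_x (1/2)‖x − a‖² + ‖Lx‖₁: the x-update reduces to a linear solve and the z-update to a proximal (soft-thresholding) step.

```python
import numpy as np

def soft_threshold(v, t):
    # proximal map of t * ||.||_1, applied componentwise
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def admm(L, a, c=1.0, iters=500):
    """Classical ADMM for min_x 0.5*||x - a||^2 + ||L x||_1,
    split as inf over (x, z) with L x = z of 0.5*||x - a||^2 + ||z||_1."""
    m, n = L.shape
    x, z, y = np.zeros(n), np.zeros(m), np.zeros(m)
    # x-update: minimize the augmented Lagrangian in x, i.e. solve
    # (Id + c L^T L) x = a + L^T (c z - y)
    M = np.eye(n) + c * L.T @ L
    for _ in range(iters):
        x = np.linalg.solve(M, a + L.T @ (c * z - y))
        # z-update: a proximal step on g = ||.||_1
        z = soft_threshold(L @ x + y / c, 1.0 / c)
        # multiplier update for the linear constraint L x = z
        y = y + c * (L @ x - z)
    return x, z, y
```

For L = Id the problem decouples and the exact solution is the componentwise soft-thresholding of a with threshold 1, which the iterates approach geometrically.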
In what follows we recall some elements of the theory of monotone operators in Hilbert spaces and refer for more details to [3,4,29].
Let H be a real Hilbert space with inner product ⟨·, ·⟩ and associated norm ‖·‖ = √⟨·, ·⟩. For an arbitrary set-valued operator A : H ⇒ H we denote by Gr A = {(x, u) ∈ H × H : u ∈ Ax} its graph, by dom A = {x ∈ H : Ax ≠ ∅} its domain and by A⁻¹ : H ⇒ H, defined through (u, x) ∈ Gr A⁻¹ if and only if (x, u) ∈ Gr A, its inverse operator. When G is another Hilbert space and L : H → G is a linear continuous operator, then L* : G → H, defined by ⟨L*y, x⟩ = ⟨y, Lx⟩ for all (x, y) ∈ H × G, denotes the adjoint operator of L, while the norm of L is defined as ‖L‖ = sup{‖Lx‖ : x ∈ H, ‖x‖ ≤ 1}. Since the variational case will also be in the focus of our investigations, we recall next some elements of convex analysis.
For a function f : H → R we denote by dom f = {x ∈ H : f(x) < +∞} its effective domain and say that f is proper if dom f ≠ ∅ and f(x) ≠ −∞ for all x ∈ H. We denote by Γ(H) the family of proper, convex and lower semicontinuous extended real-valued functions defined on H. The conjugate function of f is f* : H → R, f*(u) = sup_{x ∈ H} {⟨u, x⟩ − f(x)}, and the (convex) subdifferential of f is the operator ∂f : H ⇒ H, ∂f(x) = {u ∈ H : f(y) ≥ f(x) + ⟨u, y − x⟩ for all y ∈ H}. When f ∈ Γ(H), ∂f is a maximally monotone operator (cf. [27]) and it holds (∂f)⁻¹ = ∂f*. For f, g : H → R two proper functions, we consider also their infimal convolution, which is the function f □ g : H → R, (f □ g)(x) = inf_{y ∈ H} { f(y) + g(x − y) }. When f ∈ Γ(H) and γ > 0, for every x ∈ H we denote by prox_{γf}(x) the proximal point of parameter γ of f at x, which is the unique optimal solution of the optimization problem

inf_{y ∈ H} { f(y) + (1/(2γ))‖y − x‖² }.

Notice that J_{γ∂f} = (Id + γ∂f)⁻¹ = prox_{γf}, thus prox_{γf} : H → H is a single-valued operator fulfilling the extended Moreau decomposition formula

prox_{γf}(x) + γ prox_{(1/γ)f*}(x/γ) = x for all x ∈ H.

Finally, we say that the function f : H → R is γ-strongly convex for γ > 0, if f − (γ/2)‖·‖² is a convex function. This property implies that ∂f is γ-strongly monotone (see [3, Example 22.3]).
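As a concrete instance of these formulas (our own illustration; the choice f = ‖·‖₁ and the sample data are assumptions made for the example), recall that for f = ‖·‖₁ the proximal map is componentwise soft-thresholding, f* is the indicator of the unit ℓ∞-ball, and the prox of f*, with any parameter, is the projection onto that ball; the extended Moreau decomposition can then be checked numerically.

```python
import numpy as np

def prox_l1(x, gamma):
    # prox_{gamma * ||.||_1}(x): componentwise soft thresholding
    return np.sign(x) * np.maximum(np.abs(x) - gamma, 0.0)

def prox_l1_conj(x):
    # f* is the indicator of the unit l_inf ball, so its prox
    # (for any positive parameter) is the projection onto that ball
    return np.clip(x, -1.0, 1.0)

gamma = 0.7
x = np.array([2.0, -0.3, 0.5, -1.4])
# extended Moreau decomposition:
# prox_{gamma f}(x) + gamma * prox_{f*/gamma}(x / gamma) = x
lhs = prox_l1(x, gamma) + gamma * prox_l1_conj(x / gamma)
```

The identity holds componentwise: coordinates with |x_i| > γ are shrunk by γ while the conjugate prox contributes exactly γ, and small coordinates are sent to 0 while the conjugate prox reproduces x_i/γ.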

The ADMM paradigm employed to monotone inclusions
In this section we propose an algorithm for solving monotone inclusion problems involving compositions with linear continuous operators in infinite dimensional Hilbert spaces which is designed in the spirit of the ADMM paradigm.

Problem formulation, algorithm and particular cases
The following problem represents the central point of our investigations.
Problem 1 Let A : H ⇒ H and B : G ⇒ G be maximally monotone operators, C : H → H an η-cocoercive operator for η > 0, and L : H → G a linear continuous operator. Consider the primal monotone inclusion

find x ∈ H such that 0 ∈ Ax + (L* • B • L)(x) + Cx,    (7)

together with its dual monotone inclusion

find v ∈ G such that 0 ∈ −L((A + C)⁻¹(−L*v)) + B⁻¹v.    (8)

Simple algebraic manipulations yield that (8) is equivalent to the problem

find v ∈ G such that ∃x ∈ H with −L*v ∈ Ax + Cx and Lx ∈ B⁻¹v,    (9)

which can be equivalently written as

find v ∈ G such that ∃x ∈ H with −L*v ∈ Ax + Cx and v ∈ B(Lx).    (10)

We say that (x, v) ∈ H × G is a primal-dual solution to the primal-dual pair of monotone inclusions (7)-(8), if −L*v ∈ Ax + Cx and v ∈ B(Lx).
If x ∈ H is a solution to (7), then there exists v ∈ G such that (x, v) is a primal-dual solution to (7)-(8). On the other hand, if v ∈ G is a solution to (8), then there exists x ∈ H such that (x, v) is a primal-dual solution to (7)-(8). Furthermore, if (x, v) ∈ H × G is a primal-dual solution to (7)-(8), then x is a solution to (7) and v is a solution to (8).
Next we relate this general setting to the solving of a primal-dual pair of convex optimization problems.
Problem 2 Let H and G be real Hilbert spaces, f ∈ Γ(H), g ∈ Γ(G), h : H → R a convex and Fréchet differentiable function with η⁻¹-Lipschitz continuous gradient for η > 0, and L : H → G a linear continuous operator. Consider the primal convex optimization problem

inf_{x ∈ H} { f(x) + g(Lx) + h(x) }    (11)

and its Fenchel dual problem

sup_{v ∈ G} { −(f* □ h*)(−L*v) − g*(v) }.    (12)

The system of optimality conditions for the primal-dual pair of optimization problems (11)-(12) reads:

−L*v − ∇h(x) ∈ ∂f(x) and v ∈ ∂g(Lx),    (13)

which is actually a particular formulation of (10) when

A := ∂f, C := ∇h, B := ∂g.    (14)
Notice that, due to the Baillon-Haddad Theorem (see [3,Corollary 18.16]), ∇h is η-cocoercive. If (11) has an optimal solution x ∈ H and a suitable qualification condition is fulfilled, then there exists v ∈ G, an optimal solution to (12), such that (13) holds. If (12) has an optimal solution v ∈ G and a suitable qualification condition is fulfilled, then there exists x ∈ H, an optimal solution to (11), such that (13) holds. Furthermore, if the pair (x, v) ∈ H × G satisfies relation (13), then x is an optimal solution to (11), v is an optimal solution to (12) and the optimal objective values of (11) and (12) coincide.
One of the most popular and useful qualification conditions guaranteeing the existence of a dual optimal solution is the one known under the name of Attouch-Brézis, which requires that

0 ∈ sqri(dom g − L(dom f))    (15)

holds. Here, for S ⊆ G a convex set, we denote by sqri S := {x ∈ S : ∪_{λ>0} λ(S − x) is a closed linear subspace of G} its strong quasi-relative interior. The topological interior is contained in the strong quasi-relative interior, int S ⊆ sqri S; however, in general this inclusion may be strict. If G is finite-dimensional, then for a nonempty and convex set S ⊆ G one has sqri S = ri S, the relative interior of S, which is the interior of S relative to its affine hull. Considering again the infinite dimensional setting, we remark that condition (15) is fulfilled if there exists x′ ∈ dom f such that Lx′ ∈ dom g and g is continuous at Lx′. For further considerations on convex duality we refer to [3-5, 21, 31].

Throughout the paper the following additional notations and facts will be used. We denote by S_+(H) the family of operators U : H → H which are linear, continuous, self-adjoint and positive semidefinite. For U ∈ S_+(H) we consider the semi-norm defined by

‖x‖_U := √⟨x, Ux⟩ for all x ∈ H.

The Loewner partial ordering is defined for U_1, U_2 ∈ S_+(H) by

U_1 ≽ U_2 :⇔ ‖x‖²_{U_1} ≥ ‖x‖²_{U_2} for all x ∈ H.

Finally, for α > 0, we set

P_α(H) := {U ∈ S_+(H) : U ≽ α Id}.

For U ∈ P_α(H) and A : H ⇒ H maximally monotone, the resolvent J_{U⁻¹A} = (Id + U⁻¹A)⁻¹ is single-valued and everywhere defined. Indeed, this is a consequence of the relation

J_{U⁻¹A}(x) = (U + A)⁻¹(Ux) for all x ∈ H,

and of the maximal monotonicity of the operator U⁻¹A in the renormed Hilbert space (H, ⟨·, ·⟩_U) (see for example [15, Lemma 3.7]), where ⟨x, y⟩_U := ⟨x, Uy⟩ for all x, y ∈ H.
We are now in a position to formulate the algorithm relying on the ADMM paradigm for solving the primal-dual pair of monotone inclusions (7)-(8).
Algorithm 3 For all k ≥ 0, let M_1^k ∈ S_+(H) and M_2^k ∈ S_+(G). Choose (x^0, z^0, y^0) ∈ H × G × G. For all k ≥ 0 generate the sequence (x^k, z^k, y^k)_{k≥0} as follows:

x^{k+1} such that 0 ∈ Ax^{k+1} + Cx^k + L*y^k + cL*(Lx^{k+1} − z^k) + M_1^k(x^{k+1} − x^k),    (16)
z^{k+1} such that 0 ∈ Bz^{k+1} − y^k − c(Lx^{k+1} − z^{k+1}) + M_2^k(z^{k+1} − z^k),    (17)
y^{k+1} = y^k + c(Lx^{k+1} − z^{k+1}).    (18)

As shown below, several algorithms from the literature can be embedded in this numerical scheme.
Remark 4 For all k ≥ 0, the equations (16) and (17) are equivalent to

−Cx^k − L*y^k − cL*(Lx^{k+1} − z^k) − M_1^k(x^{k+1} − x^k) ∈ Ax^{k+1}    (19)

and, respectively,

y^k + c(Lx^{k+1} − z^{k+1}) − M_2^k(z^{k+1} − z^k) ∈ Bz^{k+1}.    (20)

In the variational setting as described in Problem 2, namely, by choosing the operators as in (14), the inclusions (19) and (20) become the optimality conditions of minimization problems for x^{k+1} and z^{k+1}, respectively. Consequently, the iterative scheme (16)-(18) reads

x^{k+1} = argmin_{x ∈ H} { f(x) + ⟨x, ∇h(x^k)⟩ + (c/2)‖Lx − z^k + c⁻¹y^k‖² + (1/2)‖x − x^k‖²_{M_1^k} },    (24)
z^{k+1} = argmin_{z ∈ G} { g(z) + (c/2)‖Lx^{k+1} − z + c⁻¹y^k‖² + (1/2)‖z − z^k‖²_{M_2^k} },    (25)
y^{k+1} = y^k + c(Lx^{k+1} − z^{k+1}),    (26)

which is the algorithm formulated and investigated by Banert, Boţ and Csetnek in [2]. The case when h = 0 and M_1^k, M_2^k are constant for every k ≥ 0 has been considered in the setting of finite dimensional Hilbert spaces by Shefi and Teboulle [28]. We want to emphasize that when h = 0 and M_1^k = M_2^k = 0 for all k ≥ 0 the iterative scheme (24)-(26) collapses into the classical version of the ADMM algorithm.
On the other hand, by using (4), relation (17) can be rewritten as (27). By using again (18), this can be reformulated as (28). The iterative scheme (27)-(28) generates, for a given starting point (x^1, y^0) ∈ H × G and c > 0, a sequence (x^k, y^k)_{k≥1} according to (29)-(30). When τ_k = τ for all k ≥ 1, the algorithm (29)-(30) recovers a numerical scheme for solving monotone inclusion problems proposed by Vũ in [30, Theorem 3.1]. More precisely, the error-free variant of the algorithm in [30, Theorem 3.1], formulated for a constant sequence (λ_n)_{n∈N} equal to 1 and employed in the solving of the primal-dual pair (9)-(7) (by reversing the order in Problem 1, that is, by treating (9) as the primal monotone inclusion and (7) as its dual monotone inclusion), is nothing else than the iterative scheme (29)-(30).
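For illustration, a constant-step primal-dual method of this forward-backward type in the variational case with h = 0 can be sketched as follows; this follows the scheme of [13, 30] rather than reproducing (29)-(30) verbatim, and the test problem f(x) = (1/2)‖x − a‖², g = ‖·‖₁ as well as the step sizes (subject to the standard condition τσ‖L‖² < 1) are our own choices.

```python
import numpy as np

def primal_dual(L, a, tau=0.5, sigma=0.5, iters=2000):
    """Constant-step primal-dual splitting for
    min_x 0.5*||x - a||^2 + ||L x||_1 (variational case, h = 0):
      x_{k+1} = prox_{tau f}(x_k - tau * L^T y_k)
      y_{k+1} = prox_{sigma g*}(y_k + sigma * L (2 x_{k+1} - x_k)).
    Requires tau * sigma * ||L||^2 < 1."""
    m, n = L.shape
    x, y = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        # prox of tau*f for f = 0.5*||. - a||^2 is (v + tau*a)/(1 + tau)
        x_new = (x - tau * (L.T @ y) + tau * a) / (1.0 + tau)
        # prox of sigma*g* for g = ||.||_1 is projection onto [-1, 1]^m
        y = np.clip(y + sigma * (L @ (2 * x_new - x)), -1.0, 1.0)
        x = x_new
    return x, y
```

The dual update uses the Moreau decomposition implicitly: instead of the prox of g one applies the prox of its conjugate, which here is a simple projection.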

Convergence analysis
In this subsection we will address the convergence of the sequence of iterates generated by Algorithm 3. One of the tools we will use in the proof of the convergence statement is the following version of the Opial Lemma formulated in the setting of variable metrics (see [15,Theorem 3.3]).
Lemma 6 Let S be a nonempty subset of H and (x^k)_{k≥0} a sequence in H. Let α > 0 and W^k ∈ P_α(H) be such that W^{k+1} ≼ W^k for all k ≥ 0. Assume that:
(i) for all z ∈ S and for all k ≥ 0: ‖x^{k+1} − z‖_{W^{k+1}} ≤ ‖x^k − z‖_{W^k};
(ii) every weak sequential cluster point of (x^k)_{k≥0} belongs to S.
Then (x^k)_{k≥0} converges weakly to an element in S.
We present the first main theorem of this manuscript.
Theorem 7 Consider the setting of Problem 1 and assume that the set of primal-dual solutions to the primal-dual pair of monotone inclusions (7)-(8) is nonempty. Let (x^k, z^k, y^k)_{k≥0} be the sequence generated by Algorithm 3 and assume that M_1^k − (1/(2η)) Id ∈ S_+(H), M_1^{k+1} ≼ M_1^k and M_2^{k+1} ≼ M_2^k for all k ≥ 0. If one of the following assumptions:
(I) there exists α_1 > 0 such that M_1^k − (1/(2η)) Id ∈ P_{α_1}(H) for all k ≥ 0;
(II) there exist α, α_2 > 0 such that L*L ∈ P_α(H) and M_2^k ∈ P_{α_2}(G) for all k ≥ 0;
is fulfilled, then there exists (x, v), a primal-dual solution to (7)-(8), such that (x^k, z^k, y^k)_{k≥0} converges weakly to (x, Lx, v).
Let k ≥ 0 be fixed. From (19) and the monotonicity of A we obtain a first inequality, while from (20) and the monotonicity of B we obtain a second one. Since C is η-cocoercive, we have a third inequality. Summing up the three inequalities obtained above, taking (18) into account, expressing the inner products through norms, and expressing Lx^{k+1} − z^{k+1} by means of relation (18), we derive, using the monotonicity assumptions on (M_1^k)_{k≥0} and (M_2^k)_{k≥0}, the inequality (40). Discarding the negative terms on the right-hand side of this inequality (notice that M_1^k − (1/(2η)) Id ∈ S_+(H) for all k ≥ 0), it follows that statement (i) in the Opial Lemma (Lemma 6) holds, when applied in the product space H × G × G, for the sequence (x^k, z^k, y^k)_{k≥0}, for W^k := (M_1^k, M_2^k + c Id, c⁻¹ Id) for k ≥ 0, and for S defined as in (37). Furthermore, summing up the inequalities in (40), we get (41). Consider first the hypotheses in assumption (I). Since M_1^k − (1/(2η)) Id ∈ P_{α_1}(H) for all k ≥ 0, relations (42) and (43) follow from (41). A direct consequence of (42) and (43) is (44). From (18), (43) and (44) we derive (45). Next we show that the relations (42)-(45) are fulfilled also under assumption (II). Indeed, in this situation we derive from (41) that (43) and (44) hold. From (18), (43) and (44) we obtain (45). Finally, the inequality (46) yields (42). The relations (42)-(45) will play an essential role when verifying assumption (ii) in the Opial Lemma for S taken as in (37). Let (x, z, y) ∈ H × G × G be such that there exists (k_n)_{n≥0}, k_n → +∞ (as n → +∞), and (x^{k_n}, z^{k_n}, y^{k_n}) converges weakly to (x, z, y) (as n → +∞).
From (42) and the linearity and continuity of L we obtain that (Lx^{k_n+1})_{n≥0} converges weakly to Lx (as n → +∞), which combined with (43) yields z = Lx. We use now the following notations for n ≥ 0: From (19) we have for all n ≥ 0 that a*_n ∈ (A + C)(a_n).
Further, from (20) and (18) we have for all n ≥ 0 that b*_n ∈ B b_n.
Furthermore, from (42) we have that a_n converges weakly to x (as n → +∞).
Finally, by using the fact that C is η⁻¹-Lipschitz continuous, from (42)-(45) we get that a*_n + L*b*_n converges strongly to 0 (as n → +∞).
Let us define T : H × G ⇒ H × G by T(x, y) = (Ax + Cx) × B⁻¹y and K : H × G → H × G by K(x, y) = (L*y, −Lx) for all (x, y) ∈ H × G. Since C is maximally monotone with full domain (see [3]), A + C is maximally monotone, too (see [3]), thus T is maximally monotone. Since K is a skew operator, it is maximally monotone as well (see [3]). Due to the fact that K has full domain, we conclude that T + K is a maximally monotone operator.
Moreover, from (47) and (48) we have (53). Since the graph of a maximally monotone operator is sequentially closed with respect to the weak×strong topology (see [3, Proposition 20.33]), from (53) it follows that 0 ∈ (T + K)(x, y). The latter is nothing else than saying that (x, y) is a primal-dual solution to (7)-(8), which combined with z = Lx implies that the second assumption of the Opial Lemma is verified, too.
Remark 8 (i) Choosing as in Remark 5 M_1^k := (1/τ_k) Id − cL*L, with τ_k > 0 and τ := sup_{k≥0} τ_k ∈ R, and M_2^k := 0 for all k ≥ 0, we have that, under the assumption 1/τ − c‖L‖² > 1/(2η) (which recovers the one in Algorithm 3.2 and Theorem 3.1 in [17]), the operators M_1^k − (1/(2η)) Id belong for all k ≥ 0 to the class P_{α_1}(H) with α_1 := 1/τ − c‖L‖² − 1/(2η) > 0.
(ii) Let us briefly discuss the condition considered in (II):

∃α > 0 such that L*L ∈ P_α(H).    (55)

By taking into account [3, Fact 2.19], one can see that (55) holds if and only if L is injective and ran L* is closed. This means that if ran L* is closed, then (55) is equivalent to the injectivity of L. Hence, in finite dimensional spaces, namely, if H = R^n and G = R^m with m ≥ n ≥ 1, (55) is nothing else than saying that L has full column rank, which is a widely used assumption in the proofs of convergence of the classical ADMM algorithm.
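In the matrix case the constant α in (55) can be computed directly: it is the squared smallest singular value of L, which is positive exactly when L has full column rank. A small sketch (the function name and the example matrices are our own):

```python
import numpy as np

def coercivity_constant(L):
    """Largest alpha >= 0 with L^T L >= alpha * Id, i.e. the squared
    smallest singular value of L; positive iff L has full column rank."""
    m, n = L.shape
    if m < n:  # rank <= m < n, so L^T L is necessarily singular
        return 0.0
    return float(np.linalg.svd(L, compute_uv=False).min() ** 2)
```

A tall matrix with orthogonal columns of norms 1 and 2 gives α = 1, while any rank-deficient square matrix gives α = 0, in agreement with the full-column-rank characterization.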
In the second convergence result of this section we consider the case when C is identically 0. We notice that this case cannot be encompassed in the above theorem, due to the assumptions which involve the cocoercivity constant η in the denominator of some fractions and which do not allow us to take it equal to zero. If one of the following assumptions:
(I) there exists α_1 > 0 such that M_1^k ∈ P_{α_1}(H) for all k ≥ 0;
(II) there exist α, α_2 > 0 such that L*L ∈ P_α(H) and M_2^k ∈ P_{α_2}(G) for all k ≥ 0;
(III) there exists α > 0 such that L*L ∈ P_α(H) and 2M^{k+1};
is fulfilled, then there exists (x, v), a primal-dual solution to (7)-(8), such that (x^k, z^k, y^k)_{k≥0} converges weakly to (x, Lx, v).
Take an arbitrary k ≥ 0. As in the proof of Theorem 7, we derive the inequality (56). Under assumption (I) the conclusion follows as in the proof of Theorem 7 by making use of the Opial Lemma. Consider now the situation when the hypotheses in assumption (II) are fulfilled. By using telescoping sum techniques, it follows that (43) and (44) hold. From (18), (43) and (44) we obtain (45). Finally, by using again the inequality (46), relation (42) holds, too.
On the other hand, (56) yields that (y^k)_{k≥0} and (z^k)_{k≥0} are bounded. Combining this with the condition imposed on L, we derive that (x^k)_{k≥0} is bounded, too. Hence there exists a weakly convergent subsequence of (x^k, z^k, y^k)_{k≥0}. By using the same arguments as in the second part of the proof of Theorem 7, one can see that every sequential weak cluster point of (x^k, z^k, y^k)_{k≥0} belongs to the set S defined in (37). In the remainder of the proof we show that the set of sequential weak cluster points of (x^k, z^k, y^k)_{k≥0} is a singleton. Let (x_1, z_1, y_1), (x_2, z_2, y_2) be two such sequential weak cluster points. Then there exist (k_p)_{p≥0}, (k_q)_{q≥0}, k_p → +∞ (as p → +∞), k_q → +∞ (as q → +∞), a subsequence (x^{k_p}, z^{k_p}, y^{k_p})_{p≥0} which converges weakly to (x_1, z_1, y_1) (as p → +∞), and a subsequence (x^{k_q}, z^{k_q}, y^{k_q})_{q≥0} which converges weakly to (x_2, z_2, y_2) (as q → +∞). As shown above, (x_1, z_1, y_1) and (x_2, z_2, y_2) belong to the set S (see (37)), thus z_i = Lx_i, i ∈ {1, 2}. From (57), which is true for every primal-dual solution to (7)-(8), we derive a limit relation involving the expression E(x^k, z^k, y^k; x*, Lx*, y*), defined for a primal-dual solution (x*, Lx*, y*). Further, we have for all k ≥ 0 an inequality which, added to (56), allows us to conclude.

Remark 12 If C is an η-cocoercive operator for η > 0, then C is monotone and η⁻¹-Lipschitz continuous. However, the converse statement may fail. The skew operator (x, y) ↦ (L*y, −Lx) is, for instance, monotone and Lipschitz continuous, but not cocoercive. This operator appears in a natural way when formulating the system of optimality conditions for convex optimization problems involving compositions with linear continuous operators (see [12]). Notice that, due to the celebrated Baillon-Haddad Theorem (see, for instance, [3, Corollary 18.16]), the gradient of a convex and Fréchet differentiable function is η-cocoercive if and only if it is η⁻¹-Lipschitz continuous.
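The skew operator in Remark 12 can be probed numerically: ⟨K(x, y), (x, y)⟩ = ⟨L*y, x⟩ − ⟨Lx, y⟩ = 0 for every (x, y), so K is monotone, while K(x, y) ≠ 0 in general, which rules out cocoercivity, since the latter would force ⟨Ku, u⟩ ≥ β‖Ku‖² > 0 for some β > 0. A small sketch with arbitrary data (matrix and vectors are our own random choices):

```python
import numpy as np

rng = np.random.default_rng(0)
L = rng.standard_normal((3, 2))

def K(u):
    # the skew operator (x, y) -> (L^T y, -L x) on R^2 x R^3
    x, y = u[:2], u[2:]
    return np.concatenate([L.T @ y, -L @ x])

u = rng.standard_normal(5)
inner = float(u @ K(u))                 # always 0 by skewness, so K is monotone
norm_Ku = float(np.linalg.norm(K(u)))   # generically nonzero, so not cocoercive
```
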

Remark 13
In the setting of Problem 11 the operator A + L* • B • L + C is strongly monotone, thus the monotone inclusion problem (7) has at most one solution. Hence, if (x, v) is a primal-dual solution to the primal-dual pair (7)-(8), then x is the unique solution to (7). Notice that the problem (8) may not have a unique solution.
We propose the following algorithm for the formulation of which we use dynamic step sizes.
Algorithm 14 For all k ≥ 0, let M_2^k : G → G be a linear, continuous and self-adjoint operator such that τ_k LL* + M_2^k ∈ P_{α_k}(G) for some α_k > 0. Choose (x^0, z^0, y^0) ∈ H × G × G. For all k ≥ 0 generate the sequence (x^k, z^k, y^k)_{k≥0} via (62)-(64), where λ, τ_k, θ_k > 0 for all k ≥ 0.

Remark 15
We would like to emphasize that when C = 0 Algorithm 14 has a similar structure to Algorithm 3. Indeed, in this setting, the monotone inclusion problems (7) and (9) become (65) and, respectively, (66). The two problems (65) and (66) are dual to each other in the sense of the Attouch-Théra duality (see [1]). By taking in (62)-(64) λ = 1, θ_k = 1 (which corresponds to the limit case µ = 0 and γ = 0 in equation (73) below) and τ_k = c > 0 for all k ≥ 0, the resulting iterative scheme is nothing else than Algorithm 3 employed in the solving of the primal-dual system of monotone inclusions (66)-(65), that is, obtained by treating (66) as the primal monotone inclusion and (65) as its dual monotone inclusion (notice that in this case we take in relation (17) of Algorithm 3 M_2^k = 0 for all k ≥ 0). Concerning the parameters involved in Algorithm 14, we assume that there exists σ_0 > 0 such that the relations (70)-(77) below are fulfilled for all k ≥ 0.

Remark 16 Fix an arbitrary k ≥ 1. From (62) we have (78), where (79) holds. Due to (64) we have −τ_k z^k = τ_k L*y^k + θ_{k−1}(x^k − x^{k−1}), which combined with (78) delivers (80). Fix now an arbitrary k ≥ 0. From (4) and (63) we obtain an inclusion which, by using (64), can be reformulated; finally, the definition of the resolvent yields relation (81).

Remark 17 The choice (83) leads to so-called accelerated versions of primal-dual algorithms that have been intensively studied in the literature. Indeed, under these auspices (62) becomes (by taking into account also (64)) an explicit update, and this together with (81) gives rise for all k ≥ 0 to a numerical scheme which has been investigated by Boţ, Csetnek, Heinrich and Hendrich in [8, Algorithm 5]. Not least, assuming that C = 0 and λ = 1, the variational case A = ∂f and B = ∂g leads for all k ≥ 0 to the numerical scheme which has been considered by Chambolle and Pock in [13, Algorithm 2]. We also notice that condition (83) guarantees the fulfillment of both (76) and (77), due to the fact that the sequence (τ_{k+1}σ_k)_{k≥0} is constant (see (74) and (75)).
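The acceleration mentioned in Remark 17 can be sketched as follows in the variational case with C = 0 and f γ-strongly convex; this follows [13, Algorithm 2] with the step-size updates θ_k = 1/√(1 + 2γτ_k), τ_{k+1} = θ_k τ_k, σ_{k+1} = σ_k/θ_k, while the test problem and the initial step sizes are our own choices.

```python
import numpy as np

def accelerated_pd(L, a, gamma=1.0, iters=2000):
    """Accelerated primal-dual method for
    min_x 0.5*||x - a||^2 + ||L x||_1, where f(x) = 0.5*||x - a||^2
    is gamma-strongly convex with gamma = 1."""
    m, n = L.shape
    normL = np.linalg.norm(L, 2)
    tau = sigma = 1.0 / normL  # ensures tau * sigma * ||L||^2 <= 1
    x = np.zeros(n)
    x_bar = x.copy()
    y = np.zeros(m)
    for _ in range(iters):
        # dual step: prox of sigma * g* for g = ||.||_1 is a projection
        y = np.clip(y + sigma * (L @ x_bar), -1.0, 1.0)
        # primal step: prox of tau * f
        x_new = (x - tau * (L.T @ y) + tau * a) / (1.0 + tau)
        # dynamic step sizes: theta_k -> 1, tau_k ~ 1/(gamma * k)
        theta = 1.0 / np.sqrt(1.0 + 2.0 * gamma * tau)
        tau, sigma = theta * tau, sigma / theta
        # extrapolation with the dynamic parameter theta
        x_bar = x_new + theta * (x_new - x)
        x = x_new
    return x, y
```

Note that the product of the two step sizes stays constant along the iterations, which mirrors the observation above that (τ_{k+1}σ_k)_{k≥0} is constant under (83).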

Remark 18
Assume again that C = 0 and consider the variational case as described in Problem 2. From (78) and (79) we derive for all k ≥ 1 a relation which, in case M_2^k ∈ S_+(G), is equivalent to a proximal-step formulation. In case λ = 1, Algorithm 14 thus becomes a scheme which can be regarded as an accelerated version of the algorithm (24)-(26) in Remark 4.
We present the main theorem of this section.

Theorem 19
Consider the setting of Problem 11 and let (x, v) be a primal-dual solution to the primal-dual system of monotone inclusions (7)-(8). Let (x^k, z^k, y^k)_{k≥0} be the sequence generated by Algorithm 14 and assume that the relations (70)-(77) are fulfilled. Then we have for all n ≥ 2 a corresponding estimate for ‖x^n − x‖. Moreover, lim_{n→+∞} nτ_n = λ/γ, hence one obtains for (x^n)_{n≥0} an order of convergence of O(1/n).

Remark 20
In Remark 17 we provided an example of a family of linear, continuous and self-adjoint operators (M_2^k)_{k≥0} for which the relations (76) and (77) are fulfilled. In the following we will furnish more examples in this sense.
To begin, we notice that simple algebraic manipulations easily lead to the conclusion that if condition (94) below holds, then (θ_k)_{k≥0} is monotonically increasing.
In the examples below we replace (70) with the stronger assumption (94).
In view of the above theorem, the iterative scheme obtained in this particular instance (see Remark 18) can be regarded as an accelerated version of the classical ADMM algorithm (see Remark 4 and Remark 8(ii)).