1 Introduction

In recent decades, monotone inclusion problems have become an attractive research topic in operator theory and numerical optimization. One reason for their increasing popularity is the wide variety of situations in applied mathematics that can be modeled as finding a zero of a sum of mixtures of maximally monotone operators. Among the methods usually employed for tackling these problems, splitting algorithms (see, e.g., [2, Chapter 26]) are the ones that have received the most attention. Using simple operations, these methods define an iterative sequence which handles the operators defining the problem separately and converges to a solution of the inclusion problem. Furthermore, as these methods only use first-order information, they are well suited for large-scale optimization problems.

In this work, we focus on the study of primal-dual splitting algorithms for composite monotone inclusion problems in real Hilbert spaces of the following form.

Problem 1

Let \({\mathscr{H}}\) and \((\mathcal {G}_{j})_{1\leq j\leq m}\) be real Hilbert spaces. Let \(A_{1}, \ldots , A_{n}:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) be maximally monotone operators, let \(B_{j}:\mathcal {G}_{j}\rightrightarrows \mathcal {G}_{j}\) be maximally monotone and \(L_{j}:{\mathscr{H}} \to \mathcal {G}_{j}\) be a bounded linear operator whose adjoint is denoted by \(L_{j}^{*}\), for all j ∈{1,…,m}. The problem consists in solving the primal inclusion

$$ \text{find } x \in \mathcal{H} \text{ such that } 0\in\sum\limits_{i=1}^{n} A_{i}(x) + \sum\limits_{j=1}^{m}L_{j}^{*}B_{j} (L_{j}x), $$
(1)

together with its associated dual inclusion

$$ \text{find } (u_{1},\ldots,u_{m})\in\mathcal{G}_{1}\times\cdots\times\mathcal{G}_{m} \text{ such that } \left( \exists x \in\mathcal{H}\right) \left\{ \begin{array}{l} \displaystyle -\sum\limits_{j=1}^{m} L_{j}^{*}u_{j} \in \sum\limits_{i=1}^{n} A_{i}(x), \\ u_{j} \in B_{j}(L_{j}x), \quad j=1,\ldots,m. \end{array} \right. $$
(2)

Problem 1 encompasses numerous important problems in mathematical optimization and real-world applications (see, e.g., [10, 11, 20]). In these settings, it is highly desirable to devise algorithms that simultaneously obtain solutions to both problems (1) and (2) (namely, a primal-dual solution) and which only make use of resolvents of the maximally monotone operators, forward evaluations of the linear operators and their adjoints, scalar multiplication, and vector addition. Many splitting methods satisfying these conditions can be found in the literature (see, e.g., [3,4,5, 12, 22]). One of the best-known primal-dual algorithms is the one proposed by Briceño-Arias and Combettes in [7], which was further studied in [6]. To derive this scheme, let us first consider the particular instance of Problem 1 in which n = m = 1 and let us define the pair of operators M and N given by

$$ \left\{ \begin{array}{ll} M:\mathcal{H}\times\mathcal{G}\rightrightarrows\mathcal{H}\times\mathcal{G} : (x,u) \to A(x) \times B^{-1}(u), \\ N:\mathcal{H}\times\mathcal{G}\to\mathcal{H}\times\mathcal{G} : (x,u) \to (L^{*}u,-Lx). \end{array} \right. $$

The operator M is maximally monotone and N is a skew symmetric bounded linear operator. Furthermore, the set of zeros of the sum M + N consists of primal-dual solutions to Problem 1. Applying the forward-backward-forward algorithm to the problem of finding the zeros of M + N results in the fixed point iteration given by

$$ x^{k+1} = \big(J_{\gamma M} \left( \text{Id} - \gamma N\right) + \gamma N \left( \text{Id} - J_{\gamma M}\left( \text{Id} - \gamma N\right)\right)\big)(x^{k}) \quad \forall k \geq 0, $$
(3)

where γ > 0, Id denotes the identity operator and JγA stands for the resolvent of A with parameter γ (see Definition 3). Thus, since the resolvent of a Cartesian product is the Cartesian product of the resolvents, it can be seen that (3) is a full splitting algorithm, as it only requires evaluations of the resolvents JγA and \(J_{\gamma B^{-1}}\), and of the linear operator and its adjoint.
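
To make the scheme concrete, the following is a minimal sketch of one step of iteration (3) in Python; the helpers res_A and res_Binv (evaluating the resolvents \(J_{\gamma A}\) and \(J_{\gamma B^{-1}}\)) and the matrix representation of L are our own assumptions, not part of [7].

```python
import numpy as np

def fbf_primal_dual_step(x, u, res_A, res_Binv, L, gamma):
    """One step of (3) for M(x, u) = A(x) x B^{-1}(u) and
    N(x, u) = (L* u, -L x), with res_A and res_Binv user-supplied
    evaluations of J_{gamma A} and J_{gamma B^{-1}}."""
    # Forward step: (x, u) - gamma * N(x, u)
    wx = x - gamma * (L.T @ u)
    wu = u + gamma * (L @ x)
    # Backward step: the resolvent of the Cartesian product M splits
    px = res_A(wx)
    pu = res_Binv(wu)
    # Correction step: add gamma * N evaluated at the residual
    x_new = px + gamma * (L.T @ (u - pu))
    u_new = pu - gamma * (L @ (x - px))
    return x_new, u_new
```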

The general problem involving more than two operators can be addressed by setting

$$ \begin{array}{ll} A :=A_{1}, \quad B := A_{2}\times\cdots\times A_{n} \times B_{1}\times\cdots\times B_{m}\\ \text{and } \quad \qquad L := \text{Id} \times \stackrel{(n-1)}{\cdots} \times \text{Id} \times L_{1} \times {\cdots} \times L_{m}. \end{array} $$

In this case, according to (3), the resulting algorithm is generated by a fixed point iteration of an operator defined on the ambient space \({\mathscr{H}}^{n} \times \mathcal {G}_{1} \times {\cdots } \times \mathcal {G}_{m}\). The dimension of the underlying space is directly related to the memory requirements of the resulting algorithm. In general, a smaller dimension of the space translates into lower consumption of computational resources. For this reason, the development of algorithms with reduced dimension for solving monotone inclusion problems has recently become an active topic of research [8, 14, 16, 19].

Lifted splitting algorithms

The notion of lifted splitting, first introduced in [19], relates a fixed point algorithm to the dimension of its underlying ambient space. Consider the simplest case of the classical monotone inclusion problem, obtained by setting m = 0 in (1):

Problem 2

Let \(A_{1},\ldots ,A_{n}:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) be maximally monotone operators and consider the problem

$$ \text{ find } x\in\mathcal{H} \text{ such that } 0\in\sum\limits_{i=1}^{n} A_{i}(x). $$

A fixed point algorithm for finding a solution to Problem 2 employs a d-fold lifting if its underlying fixed point operator can be defined on the d-fold Cartesian product \({\mathscr{H}}^{d}\). For example, if n = 2, the famous Douglas–Rachford algorithm [15] makes use of a 1-fold lifting, since it can be written as the fixed point iteration

$$ x^{k+1} = x^{k} + \lambda \left( J_{A_{2}}\left( 2J_{A_{1}} -\text{Id}\right) -J_{A_{1}} \right)(x^{k}) \quad \forall k \geq 0, $$

with λ ∈ ]0,2[. Until very recently, the only way to tackle the problem when n > 2 was to use Pierra’s product space reformulation [18], which entails an n-fold lifting. Nowadays, various algorithms have been proposed which solve the problem by only resorting to an (n − 1)-fold lifting (see, e.g., [8, 13, 14]). This reduction from n to n − 1 has been proven to be minimal [16] when the algorithms are required to be frugal resolvent splittings [19], which means that each of the resolvents \(J_{A_{1}}, \ldots , J_{A_{n}}\) is evaluated exactly once per iteration.
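
For illustration, a minimal Python sketch of this iteration, assuming user-supplied functions res_A1 and res_A2 that evaluate \(J_{A_{1}}\) and \(J_{A_{2}}\):

```python
def douglas_rachford(z, res_A1, res_A2, lam=1.0, max_iter=100):
    """Douglas-Rachford with 1-fold lifting: the governing variable z
    lives in a single copy of the underlying space."""
    for _ in range(max_iter):
        x1 = res_A1(z)
        x2 = res_A2(2 * x1 - z)
        z = z + lam * (x2 - x1)  # z^{k+1} = z^k + lam*(x_2^k - x_1^k)
    return res_A1(z)  # the shadow sequence approaches a zero of A1 + A2
```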

To the best of the authors’ knowledge, the notion of lifting has not been developed in the setting of primal-dual inclusions given by Problem 1. We will say that a primal-dual splitting has (d,f)-lifting if the underlying fixed point operator can be written in the product space

$$ \mathcal{H}^{d} \times \mathcal{G}_{1}^{f_{1}} \times {\cdots} \times \mathcal{G}_{m}^{f_{m}}, $$

with \(f = {\sum }_{j=1}^{m} f_{j}\). Thus, the Briceño–Arias–Combettes primal-dual splitting algorithm makes use of an (n,m)-fold lifting. This is also the case for the other primal-dual algorithms existing in the literature. In this work, we propose the first (n − 1,m)-lifted splitting method for solving primal-dual inclusions and demonstrate the minimality of the algorithm. In order to do this, it is important to note that the definition of frugal resolvent splitting does not allow the use of parametrized resolvents. The inclusion of these resolvent parameters is of crucial importance for controlling the Lipschitz constants of the linear operators in Problem 1, as can be seen in all the existing primal-dual schemes. This motivates the introduction of the concept of frugal parametrized resolvent splitting, whose definition coincides with that of frugal resolvent splitting except that it permits the inclusion of resolvent parameters. Our contribution to the theory of minimal lifting splitting methods is twofold: (i) we extend the results of Malitsky–Tam in [16, Section 3] to frugal parametrized resolvent splitting algorithms, (ii) we prove that a frugal primal-dual parametrized resolvent splitting (see Section 3 for a precise definition) with (d,m)-fold lifting for solving Problem 1 necessarily has d ≥ n − 1. Our proposed algorithm is the first in the literature that is minimal according to this relation.

The rest of this work is structured as follows. In Section 2, we recall some preliminary notions and results. In particular, in Section 2.1, we present the extension of the results by Malitsky–Tam [16] to parametrized resolvent splitting algorithms. In Section 3, we introduce the first primal-dual algorithm with reduced lifting for composite monotone inclusion problems and prove its convergence. The concept of parametrized resolvent splitting is also adapted to primal-dual schemes in Section 3, where we prove a minimality theorem under the hypothesis of frugality and show that our proposed algorithm verifies it. In Section 4, we include a numerical experiment on image deblurring and compare the performance of the new algorithm with the best-performing primal-dual algorithm for this problem. The paper ends with some conclusions and possible future work directions in Section 5. Finally, in Appendix A, a detailed proof of the results in Section 2.1 is presented.

2 Preliminaries

Throughout this paper, \({\mathscr{H}}\), \(\mathcal {G}\), and \((\mathcal {G}_{j})_{1 \leq j \leq m}\) are real Hilbert spaces. Unless otherwise stated, to simplify the notation we employ 〈⋅,⋅〉 and ∥⋅∥ to denote the inner product and the induced norm, respectively, of any space. We use → to denote norm convergence of a sequence. We denote by \({\mathscr{H}}^{n}\) the product Hilbert space \({\mathscr{H}}^{n}={\mathscr{H}} \times \stackrel {(n)}{\cdots } \times {\mathscr{H}}\) with inner product defined as

$$ \langle (x_{1},\ldots,x_{n}), (\bar{x}_{1},\ldots,\bar{x}_{n})\rangle := \sum\limits_{i=1}^{n} \langle x_{i},\bar{x}_{i}\rangle\quad\forall (x_{1},\ldots,x_{n}), (\bar{x}_{1},\ldots,\bar{x}_{n})\in\mathcal{H}^{n}. $$

Sequences and sets in product spaces are denoted in bold, e.g., \(\mathbf {x}=(x_{1},\ldots ,x_{n})\in {\mathscr{H}}^{n}\).

For a set-valued operator we write \(A:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\), as opposed to \(A:{\mathscr{H}}\to {\mathscr{H}}\), which denotes a single-valued operator. The notation dom, Fix, zer and gra is used for the domain, the set of fixed points, the set of zeros and the graph of A, respectively, i.e.,

$$ \begin{array}{@{}rcl@{}} \text{dom} A:&=&\left\{x\in\mathcal{H} : A(x)\neq\emptyset\right\},\quad \text{Fix} A:=\left\{x\in\mathcal{H} : x\in A(x)\right\}, \\ \text{zer} A&:=&\left\{x\in\mathcal{H} : 0\in A(x)\right\},\quad \text{gra} A:=\left\{(x,u)\in\mathcal{H}\times\mathcal{H} : u\in A(x)\right\} . \end{array} $$

The inverse operator of A, denoted by A− 1, is the operator whose graph is given by \(\text {gra} A^{-1} = \left \{ (u,x)\in {\mathscr{H}}\times {\mathscr{H}} : u\in A(x)\right \}\). The identity operator is denoted by Id. When \(L:{\mathscr{H}}\to \mathcal {G}\) is a bounded linear operator, we use \(L^{*}:\mathcal {G}\to {\mathscr{H}}\) to denote its adjoint, which is the unique bounded linear operator such that 〈Lx,y〉 = 〈x,Ly〉, for all \(x\in {\mathscr{H}}\) and \(y\in \mathcal {G}\).

To simplify the notation, we will use ⟦k,l⟧ to denote the set of integers between \(k,l\in \mathbb {N}\), i.e.,

$$ \llbracket k, l\rrbracket := \left\{\begin{array}{ll} \{k, k+1, {\ldots} ,l\} & \text{if } k\leq l, \\ \emptyset & \text{otherwise}. \end{array}\right. $$

Definition 1

An operator \(T:{\mathscr{H}} \to {\mathscr{H}}\) is said to be

  1. (i)

    κ-Lipschitz continuous for κ > 0 if

    $$ \|T(x)-T(y)\| \leq \kappa \|x-y\| \quad \forall x,y \in \mathcal{H}; $$
  2. (ii)

    nonexpansive if it is 1-Lipschitz continuous, i.e.,

    $$ \|T(x)-T(y)\| \leq \|x-y\| \quad \forall x,y \in \mathcal{H}; $$
  3. (iii)

    α-averaged nonexpansive for α ∈ ]0,1[ if

    $$ \|T(x)-T(y)\|^{2}+\frac{1-\alpha}{\alpha} \|(\text{Id}-T)(x)-(\text{Id}-T)(y)\|^{2} \leq \|x-y\|^{2} \quad \forall x,y\in\mathcal{H}. $$

Definition 2

A set-valued operator \(A:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) is monotone if

$$ \langle x-y,u-v\rangle \geq 0 \quad \forall (x,u),(y,v)\in \text{gra} {A}. $$

Furthermore, A is said to be maximally monotone if there exists no monotone operator \(B:{\mathscr{H}} \rightrightarrows {\mathscr{H}}\) such that graB properly contains graA.

Definition 3

Given an operator \(A\colon {\mathscr{H}}\rightrightarrows {\mathscr{H}}\), the resolvent of A with parameter γ > 0 is the operator \(J_{\gamma A}\colon {\mathscr{H}}\rightrightarrows {\mathscr{H}}\) defined by JγA := (Id + γA)− 1.

The next result contains Minty’s theorem [17].

Proposition 1 ([2, Proposition 23.10])

Let \(A:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) be monotone and let γ > 0. Then,

  1. (i)

    JγA is single-valued,

  2. (ii)

    \(\text {dom} J_{\gamma A}={\mathscr{H}}\) if and only if A is maximally monotone.

2.1 Parametrized resolvent splitting

Besides developing lifted splitting algorithms with reduced dimension, different works have been devoted to determining the minimal dimension reduction that can be achieved under some conditions. This is the case of [16, 19], where a minimality result is obtained for the classical monotone inclusion Problem 2. In what follows, we use T to denote a fixed point operator and S to denote a solution operator, both depending on the maximally monotone operators appearing in the problem.

Definition 4 (Fixed point encoding [19])

A pair of operators (T,S) is a fixed point encoding for Problem 2 if, for every particular instance of the problem,

$$ \text{Fix} {T}\neq \emptyset \Longleftrightarrow \text{zer}~{\left( \sum\limits_{i=1}^{n} A_{i}\right)}\neq\emptyset \text{ and } \mathbf{z}\in \text{Fix} {T} \Longrightarrow S(\mathbf{z}) \in \text{zer } {\left( \sum\limits_{i=1}^{n} A_{i}\right)}. $$

Previous works on minimality are based on the concept of resolvent splitting, which does not allow employing parametrized resolvents (i.e., it only permits computation of the resolvents \(J_{A_{1}},\ldots ,J_{A_{n}}\)). In this work, we introduce the notion of parametrized resolvent splitting and adapt the minimality result in [16, Section 3] to the more general parametrized setting. Since the reasoning is very similar to the one in the mentioned reference, we only present the results here and refer the interested reader to Appendix A for a detailed proof.

Definition 5 (Parametrized resolvent splitting)

A fixed point encoding (T,S) for Problem 2 is a parametrized resolvent splitting if, for all particular instances of the problem, there is a finite procedure that evaluates T and S at a given point which only uses vector addition, scalar multiplication, and the parametrized resolvents of A1,…,An.

Definition 6 (Frugality)

A parametrized resolvent splitting (T,S) for Problem 2 is frugal if, in addition, each of the parametrized resolvents of A1,…,An is used exactly once.

Definition 7 (Lifting [19])

Let \(d\in \mathbb {N}\). A fixed point encoding (T,S) is a d-fold lifting for Problem 2 if \(T:{\mathscr{H}}^{d}\to {\mathscr{H}}^{d}\) and \(S:{\mathscr{H}}^{d}\to {\mathscr{H}}\).

Example 1

In [8], a product space reformulation with reduced dimension is proposed, which applied to Problem 2 yields the following lifted splitting. Given any γ > 0 and λ ∈ ]0,2], the algorithm in [8, Theorem 5.1] can be defined by the operator \(R:{\mathscr{H}}^{n-1}\to {\mathscr{H}}^{n-1}\) given by

$$ R(\mathbf{z}) := \mathbf{z} + \lambda \left( \begin{array}{c} x_{1} - x_{0} \\ x_{2} - x_{0} \\ {\vdots} \\ x_{n-1} - x_{0} \end{array}\right), $$

where \(\mathbf{z} = (z_{1},\ldots,z_{n-1})\in{\mathscr{H}}^{n-1}\) and \(\mathbf {x} = (x_{0},x_{1},\ldots ,x_{n-1})\in {\mathscr{H}}^{n}\) is the vector defined as

$$ \left\{ \begin{array}{l} x_{0} = J_{\frac{\gamma}{n-1} A_{n}} \left( \frac{1}{n-1} \sum\limits_{i=1}^{n-1} z_{i}\right), \\ x_{i} = J_{\gamma A_{i}} (2 x_{0} - z_{i}) \quad \forall i\in{\llbracket 1,n-1\rrbracket}. \end{array} \right. $$

Moreover, if we let \(S:{\mathscr{H}}^{n-1} \to {\mathscr{H}}\) be the operator given by

$$ S(\mathbf{z}) := J_{\frac{\gamma}{n-1} A_{n}} \left( \frac{1}{n-1} \sum\limits_{i=1}^{n-1} z_{i}\right), $$

then the pair (R,S) is a frugal parametrized resolvent splitting with (n − 1)-fold lifting which is not a resolvent splitting, since it makes use of resolvent parameters.
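
A minimal Python sketch of one application of R and of the solution operator S; the convention res[i](w, t), evaluating \(J_{t A_{i+1}}(w)\), is ours:

```python
def R_step(z, res, gamma, lam):
    """One application of R from Example 1 (0-based indexing: z[i]
    stands for z_{i+1} and res[i](w, t) evaluates J_{t A_{i+1}}(w))."""
    n = len(z) + 1
    x0 = res[n - 1](sum(z) / (n - 1), gamma / (n - 1))        # J_{gamma/(n-1) A_n}
    x = [res[i](2 * x0 - z[i], gamma) for i in range(n - 1)]  # x_i = J_{gamma A_i}(2x_0 - z_i)
    return [z[i] + lam * (x[i] - x0) for i in range(n - 1)]

def S_apply(z, res, gamma):
    """The solution operator S from Example 1."""
    n = len(z) + 1
    return res[n - 1](sum(z) / (n - 1), gamma / (n - 1))
```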

Malitsky and Tam prove in [16, Theorem 3.3] that the minimal lifting that one can achieve for Problem 2 with frugal resolvent splittings is n − 1. From their proof, it cannot be directly determined whether the same result holds when the resolvents are allowed to have different parameters. The next theorem provides an affirmative answer to this question.

Theorem 2 (Minimal lifting for frugal parametrized splittings)

Let n ≥ 2 and let (T,S) be a frugal parametrized resolvent splitting with d-fold lifting for Problem 2. Then, d ≥ n − 1.

Proof

See Theorem 6 in Appendix A. □

3 A primal-dual splitting with minimal lifting

In this section, we devise a primal-dual splitting algorithm for Problem 1 with minimal lifting. We base our analysis on the case in which the primal problem involves only one linear composition, i.e., m = 1, and later extend it to an arbitrary finite number of linearly composed maximally monotone operators by appealing to a product space reformulation. Lastly, we prove minimality of the algorithm by adapting the concept of lifted splitting to primal-dual algorithms.

The case with one linear composition

Let n ≥ 2. We start by considering the primal-dual problem given by

$$ \text{find } x \in \mathcal{H} \text{ such that } 0\in\sum\limits_{i=1}^{n} A_{i}(x) + L^{*} B (Lx), $$
(4)

and

$$ \text{find } u\in\mathcal{G} \text{ such that } 0\in -L \left( \sum\limits_{i=1}^{n} A_{i} \right)^{-1} \bigl(- L^{*} u\bigr) + B^{-1} (u), $$
(5)

where \(A_{1},\ldots ,A_{n}:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) and \(B:\mathcal {G} \rightrightarrows \mathcal {G}\) are maximally monotone operators and \(L:{\mathscr{H}}\to \mathcal {G}\) is a bounded linear operator. Note that in this case, (5) corresponds to the Attouch–Théra dual problem of (4) (see [1]). In the following, we denote the set of solutions of (4) and (5) by \(\mathcal {P}\) and \(\mathcal {D}\), respectively, and consider the set Z defined as

$$ \mathbf{Z} := \left\{ (x,u)\in \mathcal{H} \times \mathcal{G} : -L^{*}u \in\sum\limits_{i=1}^{n} A_{i}(x) \text{ and } u \in B(Lx)\right\}, $$

which is useful for tackling primal-dual inclusion problems. It is well-known that Z is a subset of \(\mathcal {P} \times \mathcal {D}\) and that

$$ \mathcal{P} \neq \emptyset \Longleftrightarrow \mathbf{Z} \neq \emptyset \Longleftrightarrow \mathcal{D} \neq \emptyset. $$

Indeed, we have

$$ \begin{array}{@{}rcl@{}} \exists x\in\mathcal{P} & \Longleftrightarrow & (\exists x\in\mathcal{H})\quad 0\in\sum\limits_{i=1}^{n} A_{i}(x)+L^{*}B(Lx) \\ & \Longleftrightarrow & (\exists (x,u) \in\mathcal{H} \times \mathcal{G}) \left\{ \begin{array}{ll} -L^{*}(u) & \in \sum\limits_{i=1}^{n} A_{i}(x), \\ & u \in B(Lx), \end{array} \right.\\ & \Longleftrightarrow & (\exists (x,u) \in\mathcal{H} \times \mathcal{G}) \left\{ \begin{array}{ll} x \in \left( \displaystyle\sum\limits_{i=1}^{n} A_{i}\right)^{-1}\bigl(-L^{*}u\bigr), \\ Lx\in B^{-1} (u), \end{array} \right. \\ &\Longleftrightarrow & (\exists u\in\mathcal{G}) \quad 0\in -L \left( \sum\limits_{i=1}^{n} A_{i} \right)^{-1} \bigl(- L^{*} u\bigr) + B^{-1} (u) \Longleftrightarrow \exists u \in\mathcal{D}. \end{array} $$

We refer to an element of Z as a primal-dual solution of (4) and (5).

Now, we introduce a fixed point algorithm for solving the primal-dual problem given by (4) and (5). Let λ,γ > 0 and let \(T:{\mathscr{H}}^{n-1} \times \mathcal {G} \to {\mathscr{H}}^{n-1} \times \mathcal {G}\) be the operator given by

$$ T \left( \begin{array}{c} \mathbf{z}\\ v \end{array}\right) : = \left( \begin{array}{c} \mathbf{z}\\v \end{array}\right) + \lambda \left( \begin{array}{c} x_{2} - x_{1}\\ x_{3}- x_{2}\\ \vdots\\ x_{n}-x_{n-1}\\ \gamma(y-Lx_{n}) \end{array}\right), $$
(6)

where \((\mathbf {x},y)=(x_{1},\ldots ,x_{n},y) \in {\mathscr{H}}^{n}\times \mathcal {G}\) depends on \((\mathbf {z},v) = (z_{1},\ldots ,z_{n-1},v) \in {\mathscr{H}}^{n-1} \times \mathcal {G}\) in the following way

$$ \left\{ \begin{array}{lll} x_{1} &=& J_{A_{1}}(z_{1}), \\ x_{i} &=& J_{A_{i}}(z_{i}+x_{i-1}-z_{i-1}), \quad \forall i\in{\llbracket 2,n-1\rrbracket}, \\ x_{n} &=& J_{A_{n}}(x_{1}+x_{n-1} -z_{n-1}- L^{*} (\gamma Lx_{1} - v)), \\ y &=& J_{B/\gamma} \left( L(x_{1} + x_{n})- \frac{v}{\gamma}\right). \end{array} \right. $$
(7)
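
For concreteness, a minimal Python sketch of one evaluation of T; the conventions res_A[i] (evaluating \(J_{A_{i+1}}\)), res_B (evaluating \(J_{B/\gamma}\)) and the callables L, Lt for L and L∗ are ours:

```python
def T_step(z, v, res_A, res_B, L, Lt, gamma, lam):
    """One evaluation of the operator T in (6)-(7), where
    z = [z_1, ..., z_{n-1}] (0-based) and v is the dual variable."""
    n = len(z) + 1
    x = [res_A[0](z[0])]                                # x_1 = J_{A_1}(z_1)
    for i in range(1, n - 1):                           # x_i, i = 2, ..., n-1
        x.append(res_A[i](z[i] + x[i - 1] - z[i - 1]))
    x.append(res_A[n - 1](x[0] + x[-1] - z[-1]          # x_n
                          - Lt(gamma * L(x[0]) - v)))
    y = res_B(L(x[0] + x[n - 1]) - v / gamma)           # y = J_{B/gamma}(...)
    z_new = [z[i] + lam * (x[i + 1] - x[i]) for i in range(n - 1)]
    v_new = v + lam * gamma * (y - L(x[n - 1]))
    return z_new, v_new
```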

In the next lemma, we characterize the set of fixed points of the operator T by means of the set of primal-dual solutions to (4) and (5).

Lemma 1

Let n ≥ 2 and λ,γ > 0. The following assertions hold.

  1. (i)

    If \((\bar {x},\bar {u})\in \mathbf {Z}\), then there exists \(\bar {\textbf {z}}\in {\mathscr{H}}^{n-1}\) such that \((\bar {\textbf {z}},\gamma L\bar {x}-\bar {u})\in \text {Fix} {T}\).

  2. (ii)

If \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in \text {Fix} {T}\), then \((\bar{x},\gamma L\bar {x}-\bar {v})\in \mathbf {Z}\), where \(\bar{x}:=J_{A_{1}}(\bar {z}_{1})\).

As a result,

$$ \text{Fix} {T} \neq \emptyset \Longleftrightarrow \mathbf{Z} \neq \emptyset. $$

Proof

(i) Let \((\bar {x},\bar {u})\in \mathbf {Z}\). Then, \(\bar {u} \in B(L\bar {x})\) and there exists \((a_{1},\ldots ,a_{n})\in {\mathscr{H}}^{n}\) such that \(a_{i}\in A_{i}(\bar {x})\) and \(-L^{*}\bar {u} = {\sum }_{i=1}^{n} a_{i}\). Consider the vectors \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in {\mathscr{H}}^{n-1}\times \mathcal {G}\) defined as

$$ \left\{ \begin{array}{lll} \bar{z}_{1} &:=& \bar{x} + a_{1} \in (\text{Id} + A_{1})(\bar{x}), \\ \bar{z}_{i} &:=& a_{i} + \bar{z}_{i-1} \in (\text{Id} + A_{i})(\bar{x}) - \bar{x} + \bar{z}_{i-1}, \quad \forall i\in{\llbracket2,n-1\rrbracket}, \\ \bar{v} &:=& \gamma L\bar{x}-\bar{u} \in \left( \gamma \text{Id} - B\right)(L\bar{x}). \end{array} \right. $$

Then, we deduce that \(\bar {x} = J_{A_{1}}(\bar {z}_{1})\) and \(\bar {x} = J_{A_{i}} (\bar {z}_{i} + \bar {x}-\bar {z}_{i-1})\) for all i ∈ ⟦2,n − 1⟧. Moreover, we have

$$ \begin{array}{@{}rcl@{}} 2\bar{x} - \bar{z}_{n-1}- L^{*}(\gamma L\bar{x}-\bar{v})& = &2\bar{x} -\bar{z}_{n-1} - L^{*}(\bar{u}) \\ & =& \bar{x} + a_{n} + \bar{x}-\bar{z}_{n-1} + \sum\limits_{i=1}^{n-1} a_{i} \\ & =& \bar{x}+a_{n} + \bar{x}-\bar{z}_{n-1} + \sum\limits_{i=2}^{n-1} (\bar{z}_{i} - \bar{z}_{i-1}) + \bar{z}_{1}-\bar{x} \\ & =& \bar{x} + a_{n} \in (\text{Id} + A_{n})(\bar{x}). \end{array} $$

Altogether, we obtain

$$ \left\{ \begin{array}{lll} \bar{x} &=& J_{A_{1}}(\bar{z}_{1}), \\ \bar{x} &=& J_{A_{i}}(\bar{z}_{i}+\bar{x}-\bar{z}_{i-1}), \quad \forall i\in{\llbracket 2,n-1\rrbracket}, \\ \bar{x} &=& J_{A_{n}}(2\bar{x} -\bar{z}_{n-1}- L^{*} (\gamma L\bar{x} - \bar{v})), \\ L\bar{x} & =& J_{B/\gamma}\left (2L\bar{x}- \frac{\bar{v}}{\gamma}\right), \end{array} \right. $$

which implies that \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in \text {Fix} {T}\).

(ii) Let \((\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in \text {Fix} {T}\) and set \(\bar {x}:= J_{A_{1}}(\bar {z}_{1})\). By (6), the fixed point property implies that the vector \((x_{1},\ldots,x_{n},y)\) generated by (7) satisfies \(y = L\bar {x}\) and \(x_{i} = \bar {x}\) for all i = 1,…,n. Consequently, from (7), we derive

$$ \left\{ \begin{array}{l} \bar{z}_{1}-\bar{x} \in A_{1}(\bar{x}), \\ \bar{z}_{i}-\bar{z}_{i-1} \in A_{i}(\bar{x}), \quad \forall i\in{\llbracket 2,n-1\rrbracket}, \\ \bar{x}-\bar{z}_{n-1}-L^{*}(\gamma L\bar{x}-\bar{v})\in A_{n}(\bar{x}), \\ \gamma L\bar{x}-\bar{v} \in B (L\bar{x}). \end{array} \right. $$

Summing together the first n inclusions above and setting \(\bar {u}:=\gamma L\bar {x}-\bar {v}\), we deduce

$$ \left\{ \begin{array}{ll} -L^{*}\bar{u} & \in \sum\limits_{i=1}^{n} A_{i}(\bar{x}), \\ & \bar{u} \in B(L\bar{x}), \end{array} \right. $$

which implies \((\bar {x},\bar {u})\in \mathbf {Z}\), as claimed. □

The following technical lemma establishes nonexpansiveness properties of the operator T in the Hilbert space \({\mathscr{H}}^{n-1} \times \mathcal {G} \) endowed with the scalar product given by

$$ \langle (z_{1},\ldots,z_{n-1},v),(\bar{z}_{1},\ldots,\bar{z}_{n-1},\bar{v}) \rangle_{\gamma} : = \sum\limits_{i=1}^{n-1} \langle z_{i}, \bar{z}_{i}\rangle_{\mathcal{H}} + \frac{1}{\gamma} \langle v, \bar{v} \rangle_{\mathcal{G}}, $$
(8)

for \((z_{1},\ldots ,z_{n-1},v),(\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v})\in {\mathscr{H}}^{n-1}\times \mathcal {G}\) and γ > 0.

Lemma 2

For all \((\mathbf {z},v) = (z_{1},\ldots ,z_{n-1},v) \in {\mathscr{H}}^{n-1} \times \mathcal {G}\) and \((\bar {\mathbf {z}},\bar {v}) = (\bar {z}_{1},\ldots ,\bar {z}_{n-1},\bar {v}) \in {\mathscr{H}}^{n-1}\times \mathcal {G}\),

$$ \begin{array}{@{}rcl@{}} \| T(\mathbf{z},v) &-& T({\bar{\mathbf{z}}},\bar{v})\|^{2}_{\gamma} + \frac{1-\lambda}{\lambda} \| \left( \text{Id}-T\right)(\mathbf{z},v)-\left( \text{Id}-T\right)({\bar{\mathbf{z}}},\bar{v})\|^{2}_{\gamma}\\ &+& \frac{1-\gamma \|L\|^{2}}{\lambda} \left\| \sum\limits_{i=1}^{n-1} \left( \text{Id}-T\right)(\mathbf{z},v)_{i} - \sum\limits_{i=1}^{n-1} \left( \text{Id}-T\right)({\bar{\mathbf{z}}},\bar{v})_{i}\right\|^{2}\\ &\leq& \|(\mathbf{z},v)-({\bar{\mathbf{z}}},\bar{v})\|^{2}_{\gamma}, \end{array} $$
(9)

where ∥⋅∥γ denotes the norm induced by the scalar product (8). In particular, if λ ∈ ]0,1[ and \(\gamma \in {\left ]0,\frac {1}{\|L\|^{2}}\right ]}\), the operator T is λ-averaged nonexpansive.

Proof

Let \((x_{1},\ldots ,x_{n},y)\in {\mathscr{H}}^{n}\times \mathcal {G}\) and \((\bar {x}_{1},\ldots ,\bar {x}_{n},\bar {y})\in {\mathscr{H}}^{n} \times \mathcal {G}\) be given by (7) from (z,v) and \(({\bar {\mathbf {z}}},\bar {v})\), respectively. For simplicity, we denote (z+,v+) = T(z,v) and \(({\bar {\mathbf {z}}}^{+},\bar {v}^{+})=T({\bar {\mathbf {z}}},\bar {v})\). Since \(z_{1}-x_{1}\in A_{1}(x_{1})\) and \(\bar {z}_{1}-\bar {x}_{1}\in A_{1}(\bar {x}_{1})\), by monotonicity of A1,

$$ \begin{array}{@{}rcl@{}} 0 & \leq& \langle (z_{1}-x_{1}) - (\bar{z}_{1}-\bar{x}_{1}), x_{1}-\bar{x}_{1} \rangle \\ & =& \langle (z_{1}-x_{1}) -(\bar{z}_{1}-\bar{x}_{1}), x_{1}-x_{2}\rangle + \langle (z_{1}-x_{1}) -(\bar{z}_{1}-\bar{x}_{1}), x_{2} - \bar{x}_{1}\rangle. \end{array} $$
(10)

For every i ∈ ⟦2,n − 1⟧, we have \(z_{i}+x_{i-1}-z_{i-1}-x_{i}\in A_{i}(x_{i})\) and \(\bar {z}_{i} + \bar {x}_{i-1}-\bar {z}_{i-1}-\bar {x}_{i}\in A_{i}(\bar {x}_{i})\) and thus, by monotonicity of Ai,

$$ \begin{array}{@{}rcl@{}} 0 & \leq& \langle (z_{i}+x_{i-1}-z_{i-1}-x_{i})- (\bar{z}_{i} + \bar{x}_{i-1}-\bar{z}_{i-1}-\bar{x}_{i}), x_{i} - \bar{x}_{i} \rangle \\ & =& \langle (z_{i}-x_{i})-(\bar{z}_{i}-\bar{x}_{i}), x_{i}-\bar{x}_{i}\rangle - \langle (z_{i-1}-x_{i-1})-(\bar{z}_{i-1}-\bar{x}_{i-1}), x_{i}-\bar{x}_{i}\rangle \\ & =& \langle (z_{i}-x_{i})-(\bar{z}_{i}-\bar{x}_{i}),x_{i}-x_{i+1}\rangle + \langle (z_{i}-x_{i})-(\bar{z}_{i}-\bar{x}_{i}), x_{i+1}-\bar{x}_{i}\rangle \\ && \quad - \langle (z_{i-1}-x_{i-1})-(\bar{z}_{i-1}-\bar{x}_{i-1}), x_{i}-\bar{x}_{i-1} \rangle \\ & &\quad- \langle (z_{i-1}-x_{i-1})-(\bar{z}_{i-1}-\bar{x}_{i-1}),\bar{x}_{i-1}-\bar{x}_{i}\rangle. \end{array} $$
(11)

Now, since \(x_{1}+x_{n-1}-z_{n-1}-x_{n} - L^{*}\left (\gamma Lx_{1}-v\right ) \in A_{n}(x_{n}) \) and \(\bar {x}_{1}+\bar {x}_{n-1}-\bar {z}_{n-1}-\bar {x}_{n} - L^{*}\left (\gamma L\bar {x}_{1}-\bar {v}\right )\in A_{n}(\bar {x}_{n})\), again monotonicity of An results in the inequality

$$ \begin{array}{@{}rcl@{}} 0 &\leq& \left\langle x_{1}+x_{n-1}-z_{n-1}-x_{n} - L^{*}\left( \gamma Lx_{1}-v\right), x_{n}-\bar{x}_{n}\right\rangle\\ &&-\left\langle \bar{x}_{1}+\bar{x}_{n-1}-\bar{z}_{n-1}-\bar{x}_{n} - L^{*}\left( \gamma L\bar{x}_{1}-\bar{v}\right), x_{n}-\bar{x}_{n}\right\rangle\\ & = &\langle (x_{n-1}-z_{n-1}) -(\bar{x}_{n-1}-\bar{z}_{n-1}), x_{n}-\bar{x}_{n} \rangle + \langle (x_{1} - \bar{x}_{1})\!-(x_{n}\!-\bar{x}_{n}),x_{n} - \bar{x}_{n} \rangle \\ && - \langle \gamma \left( Lx_{1}-L\bar{x}_{1}\right)-(v-\bar{v}), Lx_{n}-L\bar{x}_{n}\rangle \\ & =& \langle (x_{n-1}-z_{n-1}) -(\bar{x}_{n-1}-\bar{z}_{n-1}), x_{n}- \bar{x}_{n-1} \rangle \\ &&+ \langle (x_{1}-\bar{x}_{1})-(x_{n}-\bar{x}_{n}),x_{n}-\bar{x}_{n} \rangle\\ & & + \langle (x_{n-1}-z_{n-1}) -(\bar{x}_{n-1}-\bar{z}_{n-1}), \bar{x}_{n-1}- \bar{x}_{n} \rangle \\ && - \langle \gamma \left( Lx_{1}-L\bar{x}_{1}\right)-(v-\bar{v}), Lx_{n}-L\bar{x}_{n}\rangle. \\ \end{array} $$
(12)

Finally, we have \(\gamma L(x_{1}+x_{n})-v-\gamma y\in B(y)\) and \(\gamma L(\bar {x}_{1}+\bar {x}_{n})-\bar {v}-\gamma \bar {y}\in B(\bar {y})\), so by monotonicity of B we get

$$ 0 \leq \langle (\gamma L(x_{1}+x_{n}) -v-\gamma y)-(\gamma L(\bar{x}_{1}+\bar{x}_{n})-\bar{v}-\gamma \bar{y}), y-\bar{y} \rangle. \\ $$
(13)

Summing together (10)–(13) and rearranging yields

$$ \begin{array}{@{}rcl@{}} 0 &\leq& \sum\limits_{i=1}^{n-1} \langle (x_{i}-x_{i+1})- (\bar{x}_{i}-\bar{x}_{i+1}), z_{i}-\bar{z}_{i}\rangle \\ & & + \sum\limits_{i=1}^{n-1} \langle (x_{i}-\bar{x}_{i})-(x_{i+1}-\bar{x}_{i+1}), \bar{x}_{i}-x_{i}\rangle \\ & & + \langle (x_{1}-\bar{x}_{1}) - (x_{n}-\bar{x}_{n}), x_{n}-\bar{x}_{n}\rangle +\langle \left( Lx_{n}-L\bar{x}_{n}\right)-(y-\bar{y}), v-\bar{v}\rangle \\ &&+ \gamma \langle \bigl(L(x_{1}+x_{n})-L(\bar{x}_{1}+\bar{x}_{n})\bigr)- (y-\bar{y}), y-\bar{y}\rangle \\ && - \gamma \langle Lx_{1}-L\bar{x}_{1},Lx_{n}-L\bar{x}_{n}\rangle . \end{array} $$
(14)

The sums in (14) can be written, respectively, as

$$ \begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{n-1}\langle (x_{i}-x_{i+1})-(\bar{x}_{i}-\bar{x}_{i+1}), z_{i}-\bar{z}_{i}\rangle &=& \frac{1}{\lambda} \sum\limits_{i=1}^{n-1}\langle (z_{i}-z_{i}^{+})-(\bar{z}_{i}-\bar{z}_{i}^{+}), z_{i}-\bar{z}_{i}\rangle \\ & =& \frac{1}{\lambda} \langle (\mathbf{z}-\mathbf{z}^{+})-({\bar{\mathbf{z}}}-{\bar{\mathbf{z}}}^{+}),\mathbf{z}-{\bar{\mathbf{z}}}\rangle \\ & =& \frac{1}{2\lambda} \left( \|(\mathbf{z}-\mathbf{z}^{+})-({\bar{\mathbf{z}}}-{\bar{\mathbf{z}}}^{+})\|^{2} - \| \mathbf{z}^{+}-{\bar{\mathbf{z}}}^{+}\|^{2} + \|\mathbf{z}- {\bar{\mathbf{z}}}\|^{2} \right), \end{array} $$
(15)

and

$$ \begin{array}{@{}rcl@{}} \sum\limits_{i=1}^{n-1}\langle (x_{i}-\bar{x}_{i})-(x_{i+1}-\bar{x}_{i+1}), \bar{x}_{i}-x_{i}\rangle & =& \frac{1}{2} \sum\limits_{i=1}^{n-1} \left( \|x_{i+1}-\bar{x}_{i+1}\|^{2} - \|x_{i}-\bar{x}_{i}\|^{2} - \| (x_{i}-x_{i+1})-(\bar{x}_{i}-\bar{x}_{i+1})\|^{2} \right) \\ & = &\frac{1}{2} \left( \|x_{n}-\bar{x}_{n}\|^{2} - \|x_{1} -\bar{x}_{1}\|^{2} - \frac{1}{\lambda^{2}} \sum\limits_{i=1}^{n-1} \| (z_{i}-z_{i}^{+})-(\bar{z}_{i}- \bar{z}_{i}^{+})\|^{2} \right) \\ & =& \frac{1}{2} \left( \|x_{n}-\bar{x}_{n}\|^{2} - \|x_{1} -\bar{x}_{1}\|^{2} - \frac{1}{\lambda^{2}} \| (\mathbf{z}-\mathbf{z}^{+}) - ({\bar{\mathbf{z}}}-{\bar{\mathbf{z}}}^{+})\|^{2} \right). \end{array} $$
(16)

The third term in (14) becomes

$$ \begin{array}{@{}rcl@{}} \langle (x_{1}-\bar{x}_{1}) &-& (x_{n}-\bar{x}_{n}), x_{n}-\bar{x}_{n}\rangle \\ &=& \frac{1}{2} \left( \|x_{1}-\bar{x}_{1}\|^{2} - \| x_{n}-\bar{x}_{n}\|^{2} - \|(x_{1}-\bar{x}_{1})-(x_{n}-\bar{x}_{n})\|^{2} \right),\qquad \end{array} $$
(17)

while the fourth term yields

$$ \begin{aligned} \langle (Lx_{n}-&L\bar{x}_{n}) - (y-\bar{y}), v-\bar{v} \rangle \\ & = \frac{1}{\gamma \lambda } \langle (v-v^{+})-(\bar{v}-\bar{v}^{+}), v-\bar{v}\rangle \\ & = \frac{1}{2\gamma \lambda} \left( \|(v-v^{+}) -(\bar{v}-\bar{v}^{+})\|^{2} - \|v^{+}-\bar{v}^{+}\|^{2} + \|v-\bar{v}\|^{2} \right). \end{aligned} $$
(18)

Lastly, making use of the Cauchy–Schwarz and Young inequalities, the second-to-last term of (14) gives

$$ \begin{aligned} \gamma \langle &\left( L(x_{1}+x_{n})-L(\bar{x}_{1}+\bar{x}_{n})\right)- (y-\bar{y}), y-\bar{y} \rangle \\ & = \gamma \left( \langle Lx_{1}-L\bar{x}_{1}, y-\bar{y}\rangle + \langle \left( Lx_{n}-L\bar{x}_{n}\right)-(y-\bar{y}), y-\bar{y}\rangle \right) \\ & = \frac{\gamma}{2} \left( \|Lx_{n}-L\bar{x}_{n}\|^{2} - \|\left( Lx_{n}-L\bar{x}_{n}\right)-(y-\bar{y})\|^{2} -\|y-\bar{y}\|^{2} \right) \\ &\quad + \gamma \langle Lx_{1}-L\bar{x}_{1}, y-\bar{y}\rangle \\ & \leq \frac{\gamma}{2} \left( \|Lx_{n}-L\bar{x}_{n}\|^{2} - \frac{1}{\gamma^{2} \lambda^{2}} \|(v-v^{+})-(\bar{v}- \bar{v}^{+})\|^{2} -\|y-\bar{y}\|^{2} \right)\\ & \quad + \frac{\gamma}{2} \|Lx_{1}-L\bar{x}_{1}\|^{2} + \frac{\gamma}{2} \|y-\bar{y}\|^{2} \\ & = \frac{\gamma}{2} \|Lx_{1}-L\bar{x}_{1}\|^{2} + \frac{\gamma}{2} \|Lx_{n}-L\bar{x}_{n}\|^{2} - \frac{1}{2 \gamma \lambda^{2}} \|(v-v^{+})-(\bar{v}- \bar{v}^{+})\|^{2}, \end{aligned} $$
(19)

while the last term can be rearranged as follows

$$ \begin{aligned} - \gamma \langle &Lx_{1}-L\bar{x}_{1}, Lx_{n}-L\bar{x}_{n}\rangle \\ & = \frac{\gamma }{2} \left( \| L(x_{1}-x_{n}) - L(\bar{x}_{1}-\bar{x}_{n})\|^{2} - \|Lx_{1}-L\bar{x}_{1}\|^{2} - \|Lx_{n}-L\bar{x}_{n}\|^{2} \right). \end{aligned} $$
(20)

Summing together (19) and (20) and using the Lipschitz continuity of L, we get

$$ \begin{aligned} \gamma \langle& \left( L(x_{1}+x_{n})-L(\bar{x}_{1}+\bar{x}_{n})\right)- (y-\bar{y}), y-\bar{y}\rangle - \gamma \langle Lx_{1}-L\bar{x}_{1},Lx_{n}-L\bar{x}_{n}\rangle \\ & \leq \frac{\gamma }{2} \| L(x_{1}-x_{n}) - L(\bar{x}_{1}-\bar{x}_{n})\|^{2} - \frac{1}{2 \gamma \lambda^{2}} \|(v-v^{+})-(\bar{v}- \bar{v}^{+})\|^{2} \\ & \leq \frac{\gamma \|L\|^{2} }{2} \| (x_{1}-x_{n}) - (\bar{x}_{1}-\bar{x}_{n})\|^{2} - \frac{1}{2 \gamma \lambda^{2}} \|(v-v^{+})-(\bar{v}- \bar{v}^{+})\|^{2}. \end{aligned} $$
(21)

Multiplying (14) by 2λ and substituting (15)–(21), we obtain the final inequality

$$ \begin{array}{@{}rcl@{}} \|{\mathbf{z}}^{+}-\bar{\mathbf{z}}^{+}\|^{2} &+& \left( \frac{1}{\lambda}-1\right)\left( \|(\mathbf{z}-\mathbf{z}^{+})-(\bar{\mathbf{z}}-\bar{\mathbf{z}}^{+})\|^{2} + \frac{1}{\gamma}\|(v-v^{+})-(\bar{v}-\bar{v}^{+})\|^{2} \right) \\ &+& \frac{1}{\gamma}\|v^{+}-\bar{v}^{+}\|^{2}+\lambda \left( 1-\gamma \|L\|^{2}\right) \|(x_{1}-x_{n}) -(\bar{x}_{1}-\bar{x}_{n})\|^{2} \\ &\leq& \|{\mathbf{z}}-\bar{\mathbf{z}}\|^{2} +\frac{1}{\gamma} \|v -\bar{v}\|^{2}. \end{array} $$

To complete the proof, just note that

$$ \begin{array}{@{}rcl@{}} \lambda (x_{1}-x_{n})-\lambda(\bar{x}_{1}-\bar{x}_{n})&= & \lambda \sum\limits_{i=1}^{n-1} (x_{i}-x_{i+1}) - \lambda \sum\limits_{i=1}^{n-1} (\bar{x}_{i}-\bar{x}_{i+1}) \\ &= & \sum\limits_{i=1}^{n-1} (z_{i}-z_{i}^{+}) - \sum\limits_{i=1}^{n-1} (\bar{z}_{i}-\bar{z}_{i}^{+}), \end{array} $$

from which (9) finally follows. □

Next, we state our main result, which establishes the convergence of the iterative algorithm defined by the operator T in (6) and (7).

Theorem 3

Let n ≥ 2, let \(L:{\mathscr{H}} \to \mathcal {G}\) be a bounded linear operator and let \(A_{1},\ldots ,A_{n}:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) and \(B:\mathcal {G}\rightrightarrows \mathcal {G}\) be maximally monotone operators with \(\text {zer}~{\left ({\sum }_{i=1}^{n} A_{i} + L^{*}BL\right )\neq \emptyset }\). Furthermore, let λ ∈ ]0,1[ and \(\gamma \in {\left ]0,\frac {1}{\|L\|^{2}}\right ]}\). Given an initial point \((\mathbf {z}^{0},v^{0}) = ({z_{1}^{0}},\ldots ,z_{n-1}^{0},v^{0})\in {\mathscr{H}}^{n-1}\times \mathcal {G}\), consider the sequences given by

$$ \begin{pmatrix}{\mathbf{z}}^{k+1}\\v^{k+1}\end{pmatrix} = \begin{pmatrix}\mathbf{z}^{k}\\v^{k}\end{pmatrix} + \lambda \begin{pmatrix}{x_{2}^{k}} - {x_{1}^{k}}\\ {x_{3}^{k}}- {x_{2}^{k}}\\ \vdots\\ {x_{n}^{k}}-x_{n-1}^{k}\\ \gamma(y^{k}-L{x_{n}^{k}})\end{pmatrix} \quad \forall k\geq 0, $$
(22)

with

$$ \left\{ \begin{aligned} {x_{1}^{k}} &= J_{A_{1}}({z_{1}^{k}}), \\ {x_{i}^{k}} &= J_{A_{i}}({z_{i}^{k}}+x_{i-1}^{k}-z_{i-1}^{k}), \quad \forall i\in{\llbracket 2,n-1\rrbracket}, \\ {x_{n}^{k}} &= J_{A_{n}}({x_{1}^{k}}+x_{n-1}^{k} -z_{n-1}^{k}- L^{*} (\gamma L{x_{1}^{k}} - v^{k})), \\ y^{k} & = J_{B/\gamma} \left( L({x_{1}^{k}} + {x_{n}^{k}})- \frac{v^{k}}{\gamma}\right). \end{aligned} \right. $$
(23)

Then, the following statements hold.

  1. (i)

    The sequence \((\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\) converges weakly to a point \((\bar {\mathbf {z}},\bar {v})\in \text {Fix} {T}\).

  2. (ii)

    The sequence \(({x^{k}_{1}},\ldots ,{x^{k}_{n}},y^{k})_{k\in \mathbb {N}}\) converges weakly to \((\bar {x},\ldots ,\bar {x},L\bar {x})\) with \(\bar {x} \in \mathcal {P}\).

  3. (iii)

    The sequence \(\left (\gamma L{x_{i}^{k}}-v^{k} \right )_{k\in \mathbb {N}}\) converges weakly to \(\gamma L\bar {x}-\bar {v}\in \mathcal {D}\), for all i ∈ ⟦1,n⟧.

Proof

(i) The sequence in (22) is the fixed point iteration generated as

$$ \begin{pmatrix}\mathbf{z}^{k+1}\\v^{k+1}\end{pmatrix} = T \begin{pmatrix}\mathbf{z}^{k}\\v^{k}\end{pmatrix} \quad \forall k\geq 0. $$

Since λ ∈ ]0,1[ and \(\gamma \in {\left ]0,\|L\|^{-2}\right ]}\), T is averaged nonexpansive by Lemma 2 and, moreover, \(\text{Fix}\,T \neq \emptyset\): the assumption \(\text {zer}~{\left ({\sum }_{i=1}^{n} A_{i} + L^{*}BL\right )\neq \emptyset }\) implies \(\mathbf{Z}\neq\emptyset\), so Lemma 1(i) applies. Then, by [2, Theorem 5.15] the sequence \((\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\) converges weakly to a point \((\bar {\mathbf {z}},\bar {v})\in \text {Fix} {T}\) and \(\lim _{k\to \infty } \|(\mathbf {z}^{k+1},v^{k+1})-(\mathbf {z}^{k},v^{k})\|_{\gamma }= 0\).

(ii) From (i), the sequence \((\mathbf {z}^{k},v^{k})_{k\in \mathbb {N}}\) is bounded. Then, nonexpansiveness of the resolvents and boundedness of the linear operator L imply that the sequence \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}=({x_{1}^{k}},\ldots ,{x_{n}^{k}},y^{k})_{k\in \mathbb {N}}\) is also bounded. Furthermore, the fact that \((\mathbf {z}^{k+1},v^{k+1})-(\mathbf {z}^{k},v^{k})\to 0\), as \(k\to \infty \), implies by (22) that

$$ y^{k} - L{x_{n}^{k}} \to 0 \text{ and } x_{i+1}^{k} -{x_{i}^{k}} \to 0, \text{ for all } i\in{\llbracket 1,n-1\rrbracket}. $$
(24)

Next, by making use of the definition of resolvents and (23), we can write

$$ C\begin{pmatrix} {z_{1}^{k}}-{x_{1}^{k}} \\ ({z_{2}^{k}}-{x_{2}^{k}})-({z_{1}^{k}}-{x_{1}^{k}}) \\ {\vdots} \\ (z_{n-1}^{k}-x_{n-1}^{k})-(z_{n-2}^{k}-x_{n-2}^{k}) \\ {x_{n}^{k}} \\ \gamma \left( L({x_{1}^{k}}+{x_{n}^{k}})-y^{k}\right)-v^{k} \end{pmatrix} \ni \begin{pmatrix} {x_{1}^{k}}-{x_{n}^{k}} \\ {x_{2}^{k}}-{x_{n}^{k}} \\ {\vdots} \\ x_{n-1}^{k}-{x_{n}^{k}}\\ {x_{1}^{k}}-{x_{n}^{k}} +\gamma L^{*}\left( L{x_{n}^{k}}-y^{k}\right)\\ y^{k}- L{x_{n}^{k}} \end{pmatrix}, $$
(25)

where the operator \(C\colon {\mathscr{H}}^{n}\times \mathcal {G}\rightrightarrows {\mathscr{H}}^{n}\times \mathcal {G}\) is given by

$$ C:= \begin{pmatrix} A_{1}^{-1}\\A_{2}^{-1} \\ {\vdots} \\ A_{n-1}^{-1} \\ A_{n}\\ B^{-1} \end{pmatrix} + \begin{pmatrix} 0 & 0 & {\dots} & 0 & -\text{Id} &0 \\ 0 & 0 & {\dots} & 0 & -\text{Id} &0\\ {\vdots} & {\vdots} & {\ddots} & {\vdots} & {\vdots} &\vdots\\ 0 & 0 & {\dots} & 0 & -\text{Id} &0\\ \text{Id} & \text{Id} & {\dots} & \text{Id} & 0 & L^{*}\\ 0 & 0 &{\dots} &0 &- L &0 \end{pmatrix}. $$
(26)

The operator C is maximally monotone as the sum of a maximally monotone operator and a skew symmetric linear operator (see, e.g., [2, Corollary 25.5 (i) & Example 20.35]). Thus, the graph of C is sequentially closed in the weak-strong topology, by demiclosedness of maximally monotone operators [2, Corollary 20.38].

Now, let \((\bar {\mathbf {x}},\bar {y})\) be a weak sequential cluster point of \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}\). Due to (24), \(\bar {\mathbf {x}}\) is of the form \(\bar {\mathbf {x}} = (\bar {x},\ldots ,\bar {x})\in {\mathscr{H}}^{n}\) and \(\bar {y}=L\bar {x}\). Taking the limit along a subsequence of \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}\) which converges weakly to \((\bar {\mathbf {x}},\bar {y})\) and using demiclosedness of C, (25) and (26) yield the expression

$$ \left\{\begin{array}{l} \bar{z}_{1}-\bar{x} \in A_{1}(\bar{x}), \\ \bar{z}_{i}-\bar{z}_{i-1} \in A_{i}(\bar{x}), \quad \forall i\in{\llbracket 2,n-1\rrbracket}, \\ \bar{x}-\bar{z}_{n-1}-L^{*}(\gamma L \bar{x}-\bar{v})\in A_{n}(\bar{x}) , \\ \gamma L\bar{x}-\bar{v} \in B (L\bar{x}), \end{array}\right. $$

which, by summing the first n inclusions, implies that \((\bar {x},\gamma L\bar {x}-\bar {v})\in \mathbf {Z}\) with \(\bar {x} = J_{A_{1}}(\bar {z}_{1})\). In particular, we have shown that \((\bar {\mathbf {x}},\bar {y})\) is directly obtained from \(\bar {\mathbf {z}}\), implying that it is the unique weak sequential cluster point of the bounded sequence \((\mathbf {x}^{k},y^{k})_{k\in \mathbb {N}}\). Thus, the full sequence converges weakly to this point.

(iii) From (i)-(ii), for all i ∈ ⟦1,n⟧, we deduce that the sequence \((\gamma L{x_{i}^{k}}-v^{k})_{k\in \mathbb {N}}\) weakly converges to \(\gamma L\bar {x}-\bar {v}\), which belongs to \(\mathcal {D}\) since \((\bar {x},\gamma L\bar {x}-\bar {v})\in \mathbf {Z}\). □
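
A sketch of the resulting fixed point iteration in Python, reusing the T_step evaluation sketched after (7); the residual-based stopping rule and its tolerance are our own choices:

```python
import numpy as np

def primal_dual_splitting(z0, v0, res_A, res_B, L, Lt, gamma, lam,
                          tol=1e-8, max_iter=10000):
    """Iterate (z^{k+1}, v^{k+1}) = T(z^k, v^k) as in Theorem 3 and
    return the primal estimate x_1^k = J_{A_1}(z_1^k)."""
    z, v = list(z0), v0
    for _ in range(max_iter):
        z_new, v_new = T_step(z, v, res_A, res_B, L, Lt, gamma, lam)
        # Residual measured in the ||.||_gamma norm induced by (8)
        res = sum(np.linalg.norm(a - b) ** 2 for a, b in zip(z_new, z))
        res += np.linalg.norm(v_new - v) ** 2 / gamma
        z, v = z_new, v_new
        if np.sqrt(res) < tol:
            break
    return res_A[0](z[0])
```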

Remark 1 (Malitsky–Tam resolvent splitting [16] as a special case)

Consider Problems (4) and (5) in the particular case in which L = Id. Then, \(B:{\mathscr{H}}\rightrightarrows {\mathscr{H}}\) and (4) becomes the classical monotone inclusion problem with n + 1 operators. Furthermore, by setting γ = 1 in Theorem 3, it is straightforward to see that the sequences in (22) and (23) yield the Malitsky–Tam resolvent splitting with minimal lifting for n + 1 operators.
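
Indeed, with L = Id and γ = 1, the substeps in (23) reduce to

$$ \left\{ \begin{array}{lll} {x_{1}^{k}} &=& J_{A_{1}}({z_{1}^{k}}), \\ {x_{i}^{k}} &=& J_{A_{i}}({z_{i}^{k}}+x_{i-1}^{k}-z_{i-1}^{k}), \quad \forall i\in{\llbracket 2,n-1\rrbracket}, \\ {x_{n}^{k}} &=& J_{A_{n}}(x_{n-1}^{k}-z_{n-1}^{k}+v^{k}), \\ y^{k} &=& J_{B}({x_{1}^{k}}+{x_{n}^{k}}-v^{k}), \end{array} \right. $$

so the dual variable vk plays the role of the last governing variable of the Malitsky–Tam scheme applied to the n + 1 operators A1,…,An,B.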

Remark 2 (On the parameter γ in the definition of the norm ∥⋅∥γ)

In Lemma 2, we proved that the operator T is λ-averaged nonexpansive with respect to the norm ∥⋅∥γ induced by the scalar product defined in (8). Although the use of this norm did not require detours from the usual procedure to prove convergence of the fixed point algorithm in Theorem 3, it may numerically affect the performance of the algorithm. To give an intuition about this, consider the norm of the sequence of residuals \(\left (\|(\mathbf {z}^{k+1},v^{k+1})-(\mathbf {z}^{k},v^{k})\|_{\gamma }\right )_{k\in \mathbb {N}}\), which converges to 0 as the algorithm reaches a fixed point, and note that we have

$$ \left\|(\mathbf{z}^{k+1},v^{k+1})-(\mathbf{z}^{k},v^{k})\right\|^{2}_{\gamma} = \|\mathbf{z}^{k+1}-\mathbf{z}^{k}\|^{2} + \frac{1}{\gamma} \|v^{k+1}-v^{k}\|^{2} \quad \forall k\geq 0. $$

Lemma 2 implies that this sequence is monotonically decreasing, but if γ is very small, the weight of the dual residuals \((v^{k+1}- v^{k})_{k\in \mathbb {N}}\) in the norm is much larger than that of the primal residuals \((\mathbf {z}^{k+1}-\mathbf {z}^{k})_{k\in \mathbb {N}}\). A small decrease in the value of ∥vk+ 1 − vk∥ then readily implies a decrease of the norm of the sequence of residuals even if ∥zk+ 1 − zk∥ does not diminish much. Because of that, a larger number of iterations might be needed to achieve convergence of the primal sequence, which can slow down the overall convergence of the algorithm. Nonetheless, it is possible to perform some sort of preconditioning to avoid having a large constant in the definition of the norm. We will further comment on this in the numerical experiments in Section 4.

The case with multiple linear compositions

A standard product space reformulation makes it possible to extend our method to the more general inclusion Problem 1, which involves finitely many linearly composed maximally monotone operators. We detail this in the following corollary, while the resulting scheme is displayed in Algorithm 1.

Algorithm 1

Primal-dual splitting for Problem 1 with (n − 1,m)-lifting, with n ≥ 2.
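
A minimal Python sketch of one iteration of Algorithm 1, as it follows from Theorem 3 applied to the product-space operators in the proof of Corollary 1 below; naming conventions as in the earlier sketches, with res_B[j] evaluating \(J_{B_{j+1}/\gamma}\) and L[j], Lt[j] applying \(L_{j+1}\) and \(L_{j+1}^{*}\):

```python
def algorithm1_step(z, v, res_A, res_B, L, Lt, gamma, lam):
    """One iteration of Algorithm 1 (sketch): z = [z_1, ..., z_{n-1}]
    and v = [v_1, ..., v_m], both 0-based."""
    n, m = len(z) + 1, len(v)
    x = [res_A[0](z[0])]                                  # x_1
    for i in range(1, n - 1):                             # x_2, ..., x_{n-1}
        x.append(res_A[i](z[i] + x[i - 1] - z[i - 1]))
    w = sum(Lt[j](gamma * L[j](x[0]) - v[j]) for j in range(m))
    x.append(res_A[n - 1](x[0] + x[-1] - z[-1] - w))      # x_n
    y = [res_B[j](L[j](x[0] + x[n - 1]) - v[j] / gamma) for j in range(m)]
    z_new = [z[i] + lam * (x[i + 1] - x[i]) for i in range(n - 1)]
    v_new = [v[j] + lam * gamma * (y[j] - L[j](x[n - 1])) for j in range(m)]
    return z_new, v_new
```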

Corollary 1

Let n ≥ 2 and assume that Problem 1 has a solution. Let λ ∈ ]0,1[ and \(\gamma \in {\left ]0,1/ {\sum }_{j=1}^{m} \|L_{j}\|^{2} \right ]}\). Given some initial points \(\mathbf {z}^{0}= ({z_{1}^{0}},\ldots ,z_{n-1}^{0})\in {\mathscr{H}}^{n-1}\) and \(\mathbf {v}^{0} = ({v_{1}^{0}},\ldots ,{v_{m}^{0}})\in \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\), consider the sequences \((\mathbf {z}^{k},\mathbf {v}^{k})_{k\in \mathbb {N}}\) and \((\mathbf {x}^{k},\mathbf {y}^{k})_{k\in \mathbb {N}}\) generated by Algorithm 1. Then, the following assertions hold:

  1. (i)

    The sequence \((\mathbf {z}^{k},\mathbf {v}^{k})_{k\in \mathbb {N}}\) converges weakly to a point \((\bar {\mathbf {z}},\bar {\mathbf {v}})\in {\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\).

  2. (ii)

    The sequence \(({x_{1}^{k}}, \ldots , {x_{n}^{k}},{y_{1}^{k}},\ldots ,{y_{m}^{k}})_{k\in \mathbb {N}}\) converges weakly to \((\bar {x},\ldots ,\bar {x},\) \(L_{1}\bar {x},\ldots ,L_{m}\bar {x})\) with \(\bar {x}\in {\mathscr{H}}\) solving the primal inclusion (1).

  3. (iii)

    For all i ∈ ⟦1,n⟧, the sequence \((\gamma L_{1} {x_{i}^{k}}-{v_{1}^{k}},\ldots ,\gamma L_{m} {x_{i}^{k}}-{v_{m}^{k}})_{k\in \mathbb {N}}\) converges weakly to \((\gamma L_{1}\bar {x}-\bar {v}_{1}, \ldots , \gamma L_{m}\bar {x}-\bar {v}_{m})\), which solves the dual inclusion (2).

Proof

Just note that Problem 1 can be reformulated as an instance of Problems (4) and (5) by replacing B by the operator \(\mathbf {B}: \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\rightrightarrows \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\) defined as the Cartesian product \(\mathbf{B} := B_{1}\times\cdots\times B_{m}\), and L by the linear operator \(\mathbf{L}:{\mathscr{H}}\to\mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}: x \to (L_{1}x,\ldots,L_{m}x)\). In particular, \(\|\mathbf {L}\|^{2} \leq {\sum }_{j=1}^{m}\|L_{j}\|^{2}\) and its adjoint operator is \(\mathbf {L}^{*}:\mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\to {\mathscr{H}}:(v_{1},\ldots ,v_{m})\to {\sum }_{j=1}^{m} L_{j}^{*} v_{j}\). Hence, the result follows by considering the averaged nonexpansive operator T in (6) for this choice of operators and applying Theorem 3. □

Minimality for primal-dual parametrized resolvent splitting

We begin by extending the definition of fixed point encoding to encompass primal-dual problems. As in Section 2.1, we denote by T a fixed point operator and by S a solution operator, both parametrized by the maximally monotone operators as well as the linear operators and their adjoints appearing in Problem 1.

Definition 8 (Fixed point encoding)

A pair of operators (T,S) is a fixed point encoding for Problem 1 if, for every particular instance of the problem,

$$ \text{Fix} {T}\neq \emptyset \Longleftrightarrow \text{zer}~{\left( \sum\limits_{i=1}^{n} A_{i}+ \sum\limits_{j=1}^{m} L_{j}^{*}B_{j}L_{j}\right)}\neq\emptyset \text{ and } \mathbf{w}\in\text{Fix} {T} \Longrightarrow S(\mathbf{w}) \in\mathbf{Z}, $$

where we recall that Z denotes the set of primal-dual solutions of the problem.

When dealing with lifting for primal-dual problems, the need arises to distinguish between variables in the space of primal solutions and variables in the spaces of dual solutions. This motivates the following definition.

Definition 9 (Primal-dual lifting)

Let \(d,f\in \mathbb {Z}_{+}\). A fixed point encoding (T,S) is a (d,f)-fold lifting for Problem 1 if

$$ T:\mathcal{H}^{d}\times\mathcal{G}_{1}^{f_{1}}\times\cdots\times\mathcal{G}_{m}^{f_{m}} \to\mathcal{H}^{d}\times\mathcal{G}_{1}^{f_{1}}\times\cdots\times\mathcal{G}_{m}^{f_{m}} $$

and

$$ S:\mathcal{H}^{d}\times\mathcal{G}_{1}^{f_{1}}\times\cdots\times\mathcal{G}_{m}^{f_{m}}\to\mathcal{H}\times\mathcal{G}_{1}\times\cdots\times\mathcal{G}_{m}, $$

where fj ≥ 0 for all j ∈ ⟦1,m⟧ and \(f = {\sum }_{j=1}^{m} f_{j}\). We adopt the convention that the space \(\mathcal {G}_{j}\) vanishes from the equation when fj = 0.

Controlling the Lipschitz constants of the linear operators requires the introduction of parameters in the resolvents of the maximally monotone operators. This motivates the definition of parametrized resolvent splitting introduced in Section 2.1, which we now adapt to primal-dual splitting algorithms.

Definition 10 (Primal-dual parametrized resolvent splitting)

A fixed point encoding (T,S) for Problem 1 is a primal-dual parametrized resolvent splitting if, for every particular instance of the problem, there is a finite procedure that evaluates T and S at a given point which only uses vector addition, scalar multiplication, the parametrized resolvents of A1,…,An and B1,…,Bm, and forward evaluations of L1,…,Lm and their adjoints.

Definition 11 (Frugality)

A primal-dual parametrized resolvent splitting (T,S) for Problem 1 is frugal if, in addition, each of the parametrized resolvents of A1,…,An and B1,…,Bm is used exactly once.

Remark 3 (On the absence of restrictions on the evaluation of the linear operators)

In the finite-dimensional case, a forward evaluation of a linear operator is computationally equivalent to performing vector additions and scalar multiplications. This suggests that, for practical applications, there is no computational need to control the number of evaluations of the linear operators in the definition of frugality.

Example 4

Let n ≥ 2 and consider Problem 1. Let \(T:{\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\to {\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\) be the operator defined in (6) by setting \(B := B_{1}\times\cdots\times B_{m}\) and \(L:{\mathscr{H}}\to\mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}: x\to (L_{1}x,\ldots,L_{m}x)\). Let \(S:{\mathscr{H}}^{n-1}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\to {\mathscr{H}}\times \mathcal {G}_{1}\times \cdots \times \mathcal {G}_{m}\) be defined as

$$ S\begin{pmatrix}\mathbf{z}\\ \mathbf{v}\end{pmatrix} := \begin{pmatrix} J_{A_{1}}(z_{1}) \\ \gamma L_{1} J_{A_{1}}(z_{1}) - v_{1} \\{\vdots} \\ \gamma L_{m} J_{A_{1}}(z_{1}) - v_{m} \end{pmatrix}. $$

Then, by Lemma 1 and Corollary 1, the pair (T,S) is a frugal primal-dual parametrized resolvent splitting with (n − 1,m)-fold lifting.

The following result shows that the lifting of Algorithm 1 is minimal among frugal primal-dual parametrized resolvent splitting algorithms with m dual variables.

Theorem 5 (Minimality theorem for frugal parametrized splitting)

Let (T,S) be a frugal primal-dual parametrized resolvent splitting for Problem 1 with (d,m)-fold lifting. Then, if n ≥ 2, necessarily d ≥ n − 1.

Proof

By way of contradiction, let (T,S) be a frugal primal-dual parametrized resolvent splitting for Problem 1 with (d,m)-fold lifting and d < n − 1. Consider the instance of the problem in which \(L_{j} =\text {Id} :{\mathscr{H}} \to {\mathscr{H}}\) for all j ∈ ⟦1,m⟧. Then, Problem 1 becomes the classical monotone inclusion problem with n + m operators and (T,S) is a frugal parametrized resolvent splitting with (d + m)-fold lifting for this problem with d + m < n + m − 1, which contradicts Theorem 2. □

Finally, we conclude this section by highlighting that Algorithm 1 can also be applied when n ≤ 1, by setting Ai = 0 if required. However, no reduction in the lifting is obtained in this case.

Remark 4 (Algorithm 1 when n ≤ 1)

Consider Algorithm 1 applied to Problem 1 with n ≤ 1. We distinguish the two cases:

  1. (i)

    If n = 1, then Algorithm 1 has (1,m)-lifting. Indeed, (27) and (28) become

    $$ \left( \begin{array}{c} z^{k+1} \\ \mathbf{v}^{k+1} \end{array}\right) = \left( \begin{array}{c} z^{k} \\ \mathbf{v}^{k} \end{array}\right) + \lambda \left( \begin{array}{c} x^{k}-z^{k} \\ \gamma ({y_{1}^{k}} - L_{1}x^{k}) \\ {\vdots} \\ \gamma({y_{m}^{k}} - L_{m} x^{k}) \end{array}\right) \quad \forall k\geq 0, $$
    (29)

    and

    $$ \left\{\begin{array}{ll} x^{k} &= J_{A_{1}}\left( z^{k} -\sum\limits_{j=1}^{m} L_{j}^{*}(\gamma L_{j} z^{k} - {v_{j}^{k}})\right), \\ {y_{j}^{k}} &= J_{B_{j}/\gamma} \left( L_{j}(z^{k}+x^{k}) - \frac{{v^{k}_{j}}}{\gamma}\right), \quad \forall j\in{\llbracket 1,m\rrbracket}, \end{array}\right. $$
    (30)

    respectively. This means that, in contrast with what happens when n ≥ 2, there is no reduction in the lifting with respect to the number of operators involved.

  2. (ii)

    If n = 0, the scheme also has (1,m)-lifting. In fact, the scheme is the same as in the previous case but substituting \(J_{A_{1}}\) by Id in (30). Note that this is also the lifting obtained by the already known algorithms in the literature applied to this case.

4 Numerical experiments

In this section, we test our algorithm for solving an ill-conditioned linear inverse problem which arises in image deblurring and denoising. Let \(b\in \mathbb {R}^{n}\) be an observed blurred and noisy image of size M × N, with n = MN for grayscale and n = 3MN for color images, and denote by \(A\in \mathbb {R}^{n\times n}\) the blur operator. The problem can be tackled by means of the regularized convex non-differentiable problem

$$ \inf\limits_{s \in \mathbb{R}^{n}} \left\{ \|A s- b\|_{1} + \alpha_{1} \|Ws\|_{1} + \alpha_{2} TV(s) + \delta_{[0,1]^{n}}(s) \right\}, $$
(31)

where α1,α2 > 0 are regularization parameters, \(\delta _{[0,1]^{n}}\) denotes the indicator function of the set [0,1]n, \(TV:\mathbb {R}^{n}\to \mathbb {R}\) is the discrete isotropic total variation function and W is the linear operator given by the normalized nonstandard Haar transform [21].

Recalling Remark 2, it is of interest to consider a mechanism which allows tuning the parameter γ appearing in the definition of the norm given by the inner product in (8) to an appropriate value. To this end, we perform in (31) the change of variables s = μx, with μ > 0, and instead handle the problem

$$ \inf_{x \in \mathbb{R}^{n}} \left\{ \mu \left\|A x- \frac{b}{\mu}\right\|_{1} + \alpha_{1}\mu \|Wx\|_{1} + \alpha_{2} TV(\mu x) + \delta_{[0,1/\mu]^{n}}(x) \right\}. $$
(32)

Below, we will see how the choice of μ can help in setting a suitable parameter γ.

The minimization problem in (32) can be modeled as a composite monotone inclusion problem. For this, define the operator \(L:\mathbb {R}^{n}\to \mathbb {R}^{n}\times \mathbb {R}^{n} : x \to (L_{1} x, L_{2} x)\) where L1 and L2 are defined component-wise as

$$ (L_{1} x)_{i,j} = \left\{\begin{array}{ll} \mu(x_{i+1,j} - x_{i,j}), & \text{if } i<M, \\ 0, & \text{otherwise}, \end{array}\right. \quad \text{ and } \quad (L_{2} x)_{i,j} = \left\{ \begin{array}{ll} \mu(x_{i,j+1} - x_{i,j}), & \text{if } j<N,\\ 0, & \text{otherwise}. \end{array}\right. $$
(33)

Then, the parametrized total variation function can be written as TV (μ⋅) = ∥L(⋅)∥×, with \(\|(p,q)\|_{\times } := {\sum }_{i=1}^{M} {\sum }_{j=1}^{N} \sqrt {p_{i,j}^{2} + q_{i,j}^{2}}\). Furthermore, an upper bound for the Lipschitz constant of L is given by ∥L∥2 ≤ 8μ2 (see [9] for details).
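
For concreteness, a minimal NumPy sketch of the operator L in (33) and of its adjoint, implemented as a scaled negative divergence; the function names are ours:

```python
import numpy as np

def L_op(x, mu):
    """Apply (33) to an M x N image x: scaled forward differences."""
    p = np.zeros_like(x)
    q = np.zeros_like(x)
    p[:-1, :] = mu * (x[1:, :] - x[:-1, :])   # (L_1 x)_{i,j} for i < M
    q[:, :-1] = mu * (x[:, 1:] - x[:, :-1])   # (L_2 x)_{i,j} for j < N
    return p, q

def L_adj(p, q, mu):
    """Adjoint of L_op, so that <L x, (p, q)> = <x, L_adj(p, q)>."""
    out = np.zeros_like(p)
    out[:-1, :] -= mu * p[:-1, :]
    out[1:, :] += mu * p[:-1, :]
    out[:, :-1] -= mu * q[:, :-1]
    out[:, 1:] += mu * q[:, :-1]
    return out
```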

By [2, Proposition 27.5], obtaining a solution to the following problem is equivalent to solving (32)

$$ \text{find } x\in\text{zer}~{\left( N_{[0,1/\mu]^{n}}+ W^{*} \circ \partial g_{1} \circ W + A^{*} \circ \partial g_{2} \circ A + L^{*} \circ \partial g_{3} \circ L \right)}, $$
(34)

with \(g_{1}: \mathbb {R}^{n} \to \mathbb {R}\), \(g_{1}(y) = \alpha_{1}\mu\|y\|_{1}\), \(g_{2}:\mathbb {R}^{n} \to \mathbb {R}\), \(g_{2}(y) = \mu\|y-b/\mu\|_{1}\), \(g_{3}:\mathbb {R}^{n} \times \mathbb {R}^{n} \to \mathbb {R}\), g3(p,q) = α2∥(p,q)∥×, and \(N_{[0,1/\mu ]^{n}}\) the normal cone operator to the set [0,1/μ]n. In order to implement Algorithm 1 for solving (34), we need the expressions of the following resolvents and proximity operators. By [2, Proposition 23.25 (iii)], the second term in (34) is a maximally monotone operator and its resolvent can be expressed as

$$ J_{W^{*} \circ \partial g_{1} \circ W} = \text{Id} - W^{*} \circ \left( \text{Id} - \text{prox}_{g_{1}}\right)\circ W = \text{Id} - W^{*}\circ \text{prox}_{g_{1}^{*}} \circ W, $$

where \(\text{prox}_{g} = J_{\partial g}\) denotes the proximity operator of a function g, and \(g_{1}^{*}\) is the conjugate function of g1, which is equal to the indicator function \(\delta _{[-\alpha _{1}\mu ,\alpha _{1}\mu ]^{n}}\), and thus \(\text {prox}_{g_{1}^{*}} = P_{[-\alpha _{1}\mu ,\alpha _{1}\mu ]^{n}}\), the projection onto the box \([-\alpha_{1}\mu,\alpha_{1}\mu]^{n}\). Given σ > 0, the proximity operators of g2 and g3 are, respectively,

$$ \text{prox}_{\sigma g_{2}}(x) = \frac{b}{\mu}+\text{prox}_{\sigma \mu \|\cdot\|_{1}} \left( x-\frac{b}{\mu}\right) = \frac{b}{\mu} + \text{sign}{\left( x-\frac{b}{\mu}\right)} \odot \left[\biggl\vert x-\frac{b}{\mu}\biggr\vert- \sigma \mu \right]_{+}, $$

where ⊙ denotes element-wise product and [⋅]+ and |⋅| are applied element-wise, and

$$ \text{prox}_{\sigma g_{3}} = \text{Id} - \sigma \text{prox}_{\frac{1}{\sigma} g_{3}^{*}} \circ \frac{1}{\sigma} \text{Id} = \text{Id} - \sigma P_{S} \circ \frac{1}{\sigma} \text{Id}, $$

since the conjugate function of g3 is \(g_{3}^{*}:\mathbb {R}^{n}\times \mathbb {R}^{n} \to {]-\infty,+\infty]}\), \(g_{3}^{*} = \delta _{S}\), with the set S defined as

$$ S:= \left\{ (p,q)\in\mathbb{R}^{n}\times\mathbb{R}^{n} : \max\limits_{1\leq i\leq M, 1\leq j\leq N} \sqrt{p_{i,j}^{2} + q_{i,j}^{2}} \leq \alpha_{2} \right\}, $$

and the projection operator \(P_{S}:\mathbb {R}^{n}\times \mathbb {R}^{n}\to S\) is given component-wise by

$$ (p_{i,j},q_{i,j}) \mapsto \alpha_{2} \frac{(p_{i,j},q_{i,j})}{\max{\{\alpha_{2}, \sqrt{p_{i,j}^{2}+q_{i,j}^{2}}\}}}, \quad 1\leq i \leq M, 1\leq j\leq N. $$
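
The proximity operators derived above admit short NumPy implementations; a minimal sketch under the same notation (function names ours):

```python
import numpy as np

def prox_g2(x, sigma, b, mu):
    """prox_{sigma g_2}: soft-thresholding shifted by b/mu."""
    d = x - b / mu
    return b / mu + np.sign(d) * np.maximum(np.abs(d) - sigma * mu, 0.0)

def proj_S(p, q, alpha2):
    """Component-wise projection P_S onto the set S."""
    scale = np.maximum(np.sqrt(p ** 2 + q ** 2) / alpha2, 1.0)
    return p / scale, q / scale

def prox_g3(p, q, sigma, alpha2):
    """prox_{sigma g_3} = Id - sigma * P_S((.)/sigma), by Moreau."""
    ps, qs = proj_S(p / sigma, q / sigma, alpha2)
    return p - sigma * ps, q - sigma * qs
```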

Hence, when choosing \(z^{0}\in \mathbb {R}^{n}\), \({v_{1}^{0}}\in \mathbb {R}^{n}\) and \({v_{2}^{0}}\in \mathbb {R}^{n}\times \mathbb {R}^{n}\) as starting values, and letting λ ∈ ]0,1[ and \(\gamma \in {\left ]0,1/(\|A\|^{2}+\|L\|^{2})\right ]}\), the iterative scheme in Algorithm 1 becomes

$$ \left\lfloor \begin{array}{rl} {x_{1}^{k}} &= P_{[0,1/\mu]^{n}} (z^{k}), \\ {x_{2}^{k}} &= \left( \text{Id} - W^{*} \circ P_{[-\alpha_{1} \mu,\alpha_{1} \mu]^{n}} \circ W \right) \left( 2{x_{1}^{k}}-z^{k} - A^{*}(\gamma A{x_{1}^{k}} - {v_{1}^{k}}) - L^{*}(\gamma L{x_{1}^{k}} -{v_{2}^{k}}) \right), \\ {y_{1}^{k}} &= \frac{b}{\mu}+\text{prox}_{\frac{\mu}{\gamma} \|\cdot\|_{1}}\left( A({x_{1}^{k}}+{x_{2}^{k}}) - \frac{{v_{1}^{k}}}{\gamma} -\frac{b}{\mu}\right), \\ {y_{2}^{k}} &= \frac{1}{\gamma}\left( \text{Id} - P_{S}\right) \left( \gamma L({x_{1}^{k}}+{x_{2}^{k}})- {v_{2}^{k}}\right), \\ z^{k+1} &= z^{k} + \lambda ({x_{2}^{k}}-{x_{1}^{k}}), \\ v_{1}^{k+1} &= {v_{1}^{k}} + \lambda \gamma ({y_{1}^{k}}-A {x_{2}^{k}}), \\ v_{2}^{k+1} &= {v_{2}^{k}} + \lambda \gamma ({y_{2}^{k}} - L {x_{2}^{k}}). \end{array}\right. $$
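For illustration, one iteration of this scheme can be coded along the following lines; this is a sketch under the same assumptions and hypothetical handle names as above, with Af applying the blur operator A (defined in the next sketch; recall that A* = A in our experiment) and sigma = 1/gamma in prox_g2.

% One iteration of the scheme above (a sketch, not the authors' code).
% Iterates: z, v1 of size M-by-N, and v2 of size M-by-N-by-2.
Lop = @(x) cat(3, L1(x), L2(x));                  % x -> (L1 x, L2 x)
Lt  = @(v) L1t(v(:,:,1)) + L2t(v(:,:,2));         % adjoint of Lop
x1  = proj_box(z, 0, 1/mu);
x2  = res_g1(2*x1 - z - Af(gamma*Af(x1) - v1) - Lt(gamma*Lop(x1) - v2));
y1  = prox_g2(Af(x1 + x2) - v1/gamma);
w   = gamma*Lop(x1 + x2) - v2;
[wp, wq] = proj_S(w(:,:,1), w(:,:,2));
y2  = (w - cat(3, wp, wq))/gamma;                 % (1/gamma)(Id - P_S)(w)
z   = z  + lambda*(x2 - x1);
v1  = v1 + lambda*gamma*(y1 - Af(x2));
v2  = v2 + lambda*gamma*(y2 - Lop(x2));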

In our experiment, we replicate the problem in [5, Section 4.2], where an extensive comparison between different primal-dual algorithms is presented. Since the best-performing algorithm there is the Douglas–Rachford type primal-dual method in [5, Algorithm 3.1], we limit our comparison to this algorithm, whose detailed implementation is given in the cited work. We ran our experiments in MATLAB, making use of the built-in functions fspecial and imfilter to define an operator A which is a Gaussian blur operator of size 9 × 9 with standard deviation 4 and reflexive boundary conditions. In particular, A satisfies ∥A∥ = 1 and A* = A. We employed as observed image b a picture taken at the Schönbrunn Palace Gardens (Vienna) subjected to the already specified blur followed by the addition of a zero-mean Gaussian noise with standard deviation 10−3 (see Fig. 2). To test the influence of the picture size on the performance, we resized the original picture to different pixel resolutions (see Table 1).
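The construction of A and b can be reproduced along the following lines (a sketch; the file name is hypothetical, and the 'symmetric' option of imfilter implements the reflexive boundary conditions, which together with the symmetric kernel makes the blur self-adjoint):

x0 = im2double(imread('schoenbrunn.png'));  % original image (hypothetical file)
h  = fspecial('gaussian', 9, 4);            % 9x9 Gaussian kernel, std dev 4
Af = @(x) imfilter(x, h, 'symmetric');      % blur with reflexive boundaries
b  = Af(x0) + 1e-3*randn(size(x0));         % add zero-mean Gaussian noise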

Table 1 Results from running 400 iterations of Algorithm 1 with μ = 1 and \(\mu =1/\sqrt {8}\), and of DR1, on the picture displayed in Fig. 2 for various pixel resolutions

When measuring the quality of the restored images, we use the improvement in signal-to-noise ratio (ISNR), which is given by

$$ \text{ISNR}_{k} = 10\log_{10}\left( \frac{\|x-b\|^{2}}{\|x-x^{k}\|^{2}}\right), $$

where x and \(x^{k}\) denote the original image and the image reconstructed at iteration k, respectively. We tuned the regularization parameters in order to guarantee an adequate ISNR value for the restored images, setting α1 := 0.005 and α2 := 0.009.
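In code, the ISNR can be evaluated as follows (a sketch in our notation):

% ISNR at iteration k; larger values indicate a better reconstruction.
isnr = @(x, b, xk) 10*log10(norm(x(:) - b(:))^2/norm(x(:) - xk(:))^2);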

We recall that the stepsize parameter γ of Algorithm 1 must be taken in the interval \({\left]0,1/(\|A\|^{2}+\|L\|^{2})\right]} = {\left]0,1/(1+8\mu^{2})\right]}\). When μ = 1 (i.e., when we solve (31)), this interval is ]0,0.111]. In our numerical experiments, we empirically observed that a very small stepsize negatively affects the performance of the algorithm, as mentioned in Remark 2. After testing different options, the most convenient choice seems to be \(\mu = 1/\sqrt{8}\), which makes the Lipschitz constant of both linear operators in the problem equal to 1.
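The two resulting stepsize bounds are easily checked:

% Upper bound 1/(||A||^2 + ||L||^2) = 1/(1 + 8*mu^2) for the stepsize gamma:
mu = 1;         gamma = 1/(1 + 8*mu^2);   % = 1/9 ~ 0.111, used for (31)
mu = 1/sqrt(8); gamma = 1/(1 + 8*mu^2);   % = 1/2, the choice adopted below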

The initialization of each of the methods was the following:

  • DR1 ([5, Algorithm 3.1]): starting points x0 = b and (v1,0,v2,0,v3,0) = (0,0,0), σ1 = 1, σ2 = 0.05, σ3 = 0.05, \(\tau = (\sigma_{1} + \sigma_{2} + 8\sigma_{3})^{-1} - 0.01\), and λn = 1.5 for all \(n\in \mathbb {N}\);

  • Algorithm 1 with μ = 1: starting points z0 = b and \(({v_{1}^{0}},{v_{2}^{0}}) = (0,0)\), λ = 0.99 and γ = 1/9;

  • Algorithm 1 with \(\mu =1/\sqrt {8}\): starting points z0 = b/μ and \(({v_{1}^{0}},{v_{2}^{0}}) = (0,0)\), λ = 0.99 and γ = 1/2.

We performed 400 iterations of each of the algorithms and compared the values of the objective function in (32) and the ISNR with respect to the CPU time, which provides a more realistic comparison than the iteration count, since DR1 has a higher computational cost per iteration than Algorithm 1. The tests were run on a desktop computer with an Intel Core i7-4770 CPU at 3.40 GHz and 32 GB RAM, under Windows 10 (64-bit). The algorithms were run three times, once for each of the RGB components of the picture. The evolution with respect to CPU time of the sum of these three objective function values, together with that of the ISNR, for the 640 × 768-sized picture is represented in Fig. 1, where we observe that Algorithm 1 with \(\mu =1/\sqrt {8}\) obtains slightly better values than those returned by DR1, but in significantly less time.

Fig. 1 The evolution with respect to CPU time of the values of the objective function and of the ISNR for 400 iterations of Algorithm 1 with μ = 1 and \(\mu =1/\sqrt {8}\), and of DR1, using the 640 × 768 pixels image displayed in Fig. 2

The restored images are presented in Fig. 2. There is not much difference between the ones corresponding to Algorithm 1 with \(\mu =1/\sqrt {8}\) (bottom-middle) and DR1 (bottom-right), but a close look at the image obtained with Algorithm 1 with μ = 1 reveals its lower quality. To show that this trend in the performance of the algorithms is not affected by the image size, we present in Table 1 the results from running the algorithms on the same picture at five different pixel resolutions. Overall, we notice that the CPU time required for computing the 400 iterations is significantly lower for Algorithm 1, as expected. On average, DR1 required 45% more time than Algorithm 1 to compute the 400 iterations, independently of the size of the image. Regarding the parameter μ, Algorithm 1 with μ = 1 is notably outperformed by the other two methods, which makes clear the influence that this parameter has on the method. The function values obtained were slightly lower for DR1, while the ISNR was slightly lower for Algorithm 1 with \(\mu =1/\sqrt {8}\); that is, both algorithms performed similarly with respect to the quality of the restored images.

Interpretation of the results of the experiments

The experimental results show that, after performing the same number of iterations, Algorithm 1 with \(\mu =1/\sqrt {8}\) obtains function values and image quality measurements similar to those obtained by DR1, but in considerably less time. This decrease in the running time can be attributed to the reduction in the lifting achieved by the algorithm. Although in the first iterations DR1 achieves a larger reduction of the objective function, the quality of the restored image at that stage is not sufficient, as evidenced by the low ISNR values. On the other hand, Algorithm 1 with μ = 1 can be discarded, as it obtains higher objective values and lower ISNR values. Consequently, Algorithm 1 with \(\mu =1/\sqrt {8}\) is the preferable choice for addressing problem (31).

Fig. 2 On the top, the original 640 × 768 pixels image and the blurred and noisy image. On the bottom, the images restored after computing 400 iterations of Algorithm 1 with μ = 1 (left) and \(\mu =1/\sqrt {8}\) (middle), and of DR1 (right)

5 Conclusions and open questions

In this work, we have considered the composite monotone inclusion problem together with its dual counterpart given by Problem 1. We have extended the definition of resolvent splitting given in [19] to encompass primal-dual algorithms and the inclusion of parameters in the resolvents, and we have presented a definition of minimal lifting for frugal schemes of this form. We have proposed the first primal-dual algorithm which presents minimal lifting in this sense, and have shown its good performance with a numerical example.

To conclude, we outline possible directions for further research.

Establishing an optimal criterion for tuning the stepsize γ

We pointed out in Remark 2 the influence that the parameter γ can have on the performance of the algorithm. In Section 4 we presented a possibility for controlling this parameter by making use of a change of variables which modifies the Lipschitz constants of the linear operators, and we empirically showed that it significantly affects the speed of the algorithm. However, there is no guarantee that this strategy is optimal. It would be interesting to further investigate how to best tune the value of γ.

Achieving lifting reduction in the dual variables

The reduction in the lifting with respect to the number of operators achieved by the algorithm presented here only affects the primal variables. Whether it is possible to also reduce the dimension of the underlying space associated with the linearly composed operators remains an open question. More precisely, if we consider the problem given by

$$ \text{find } x \in \mathcal{H} \text{ such that }0 \in\sum\limits_{j=1}^{m} L^{*}_{j} {B_{j}} (L_{j} x), $$

is it possible to obtain an algorithm for solving this problem with (0,m − 1)-fold lifting (according to Definition 9)? Or, at least, with (1,m − 1)- or (0,m)-fold lifting? All these questions remain open.