1 Introduction

We consider the problem of finding \(x\in \mathcal {H}\) such that

$$\begin{aligned} 0 \in Ax+Cx, \end{aligned}$$
(1)

where \(A:\mathcal {H}\rightarrow 2^\mathcal {H}\) is a maximally monotone operator, \(C:\mathcal {H}\rightarrow \mathcal {H}\) is a cocoercive operator, \(\mathcal {H}\) is a real Hilbert space and \(2^\mathcal {H}\) denotes its power set. This monotone inclusion has optimization problems [18, 29], convex-concave saddle-point problems [13], and variational inequalities [5, 14, 38] as special cases.

The forward–backward (FB) splitting method [11, 24, 27] has been widely used to solve the monotone inclusion problem (1). The gradient method, the proximal point algorithm [30], the proximal-gradient method [16], the Chambolle–Pock method [13], the Douglas–Rachford method [18, 24], and the Krasnosel’skiĭ–Mann iteration [9, Section 5.2] can all be considered special instances of the FB method. Various attempts have been made to improve the convergence of the FB splitting algorithm by incorporating information from previous iterations. Notable examples include the heavy-ball method [28], the inertial proximal point algorithm [1, 2], and inertial FB algorithms [3, 4, 6, 7, 10, 12, 15, 25, 26], which integrate prior information into the current iteration through a momentum term.

In this paper, we propose an extension of the conventional FB algorithm that includes momentum-like terms and two deviation vectors. These deviations have the same dimension as the underlying space of the problem and serve as adjustable parameters that provide the algorithm with great flexibility. This flexibility can be exploited to control the trajectory of the iterations with the aim of enhancing convergence. To guarantee convergence, we require the deviations to satisfy a safeguarding condition that restricts the norm, but not the direction, of the deviation vectors. Our safeguarding approach is similar to those in [8, 32, 33]—which indeed are special instances of our algorithm—while it distinctly contrasts with the safeguarding conditions presented in [20, 34, 37, 40], which choose between a globally convergent and a locally fast method depending on the fulfillment of their respective safeguarding conditions.

We also introduce two special cases where the deviation vectors are predetermined linear combinations of prior iteration data. This construction ensures that the safeguarding condition is met in all iterations, implying that it does not require online evaluation. The two special cases incorporate different scalar parameters controlling the behaviour of their respective algorithms. In one case, the scalar parameter \(\kappa \in (-1,1)\) regulates the momentum used in the algorithm, with \(\kappa =0\) yielding the standard FB method. This algorithm converges weakly towards a solution of the inclusion problem for all \(\kappa \) within the permitted range. In the other case, the scalar parameter \(e\in [0,1]\) acts as an interpolator between the standard FB method (\(e=0\)) and an accelerated FB method (\(e=1\)) featuring the accelerated proximal point method from [21] and the Halpern iteration analyzed in [23] as special cases. The scalar parameter \(e\) regulates the convergence rate of the squared norm of the fixed-point residual, converging as \(\mathcal {O}\left( \min \left( 1/n^{2e},1/n\right) \right) \), with \(e=1\) offering the accelerated FB method with an \(\mathcal {O}\left( 1/n^2\right) \) convergence rate, consistent with the rates in [21, 23].

We perform numerical evaluation of these two special cases on a simple skew-symmetric monotone inclusion problem arising from optimality conditions for the minimax problem \(\max _{y\in \mathbb {R}}\min _{x\in \mathbb {R}}xy\). Our findings suggest that with \(\kappa \in [0.8,0.9]\), our first special case performs an order of magnitude better than the FB method (\(\kappa =0\)) on this problem. Furthermore, by allowing \(e\in [0.4,0.5]\), we observe that our second special case outperforms the FB method (\(e=0\)) by an order of magnitude and performs several orders of magnitude better than the accelerated FB method (\(e=1\)), despite the latter’s stronger theoretical convergence guarantee.

The analysis of our base algorithm relies on a Lyapunov inequality. We derive this inequality by applying the monotonicity inequality of operator A and the cocoercivity inequality of operator C (referred to as interpolation conditions in the terminology of performance estimation (PEP), see for instance [31, 35, 36]), both between the last iterate and a solution of the problem and between the last two points generated by the algorithm. This is in contrast to the analysis conducted in [33], which restricts the use of these inequalities to only the last iteration and a solution. The inclusion of additional inequalities allows for deriving special cases such as the one that interpolates between FB splitting and accelerated FB splitting, along with the associated convergence rate of \(\mathcal {O}(1/n^{2e})\) where \(e\in [0,1]\). This result is not achievable via the algorithm proposed in [33].

The paper is organized as follows. In Section 2, we establish the basic definitions and notations used throughout the paper. Section 3 contains the formal presentation of the problem and introduces our proposed algorithm, which is analyzed in Section 4. In Section 5, we present several special instances of our algorithm, two of which we examine numerically in Section 6. Proofs omitted for the sake of brevity are given in Section 7, and Section 8 concludes the paper.

2 Preliminaries

The set of real numbers is denoted by \(\mathbb {R}\). \(\mathcal {H}\) denotes a real Hilbert space that is equipped with an inner product and an induced norm, respectively denoted by \( {\left\langle \cdot , \cdot \right\rangle }\) and \( {\left\| \cdot \right\| }:=\sqrt{ {\left\langle \cdot , \cdot \right\rangle }}\). \(\mathcal {M} {\left( \mathcal {H} \right) }\) denotes the set of bounded linear, self-adjoint, strongly positive operators on \(\mathcal {H}\). For \(M\in \mathcal {M} {\left( \mathcal {H} \right) }\) and all \(x,y\in \mathcal {H}\), the M-induced inner product and norm are denoted and defined by \( {\left\langle x, y \right\rangle }_M:= {\left\langle x, My \right\rangle }\) and \( {\left\| x \right\| }_M = \sqrt{ {\left\langle x, Mx \right\rangle }}\), respectively.

The power set of \(\mathcal {H}\) is denoted by \(2^\mathcal {H}\). A map \(A:\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is characterized by its graph \({\text {gra}}(A) = {\left\{ {\left( x, u \right) }\in \mathcal {H}\times \mathcal {H} : u\in Ax \right\} }\). An operator \(A:\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is monotone if \( {\left\langle u-v, x-y \right\rangle }\ge 0\) for all \( {\left( x, u \right) }, {\left( y, v \right) }\in {\text {gra}}(A)\). A monotone operator \(A:\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is maximally monotone if there exists no monotone operator \(B:\mathcal {H}\rightarrow 2^\mathcal {H}\) such that \({\text {gra}}(B)\) properly contains \({\text {gra}}(A)\).

Let \(M\in \mathcal {M} {\left( \mathcal {H} \right) }\). An operator \(T:\mathcal {H}\rightarrow \mathcal {H}\) is said to be

  1. (i)

    L-Lipschitz continuous (\(L \ge 0\)) w.r.t. \( {\left\| \cdot \right\| }_M\) if

    $$\begin{aligned} {\left\| Tx-Ty \right\| }_{M^{-1}}\le L {\left\| x-y \right\| }_M \qquad \text {for all } x,y\in \mathcal {H}; \end{aligned}$$
  2. (ii)

    \(\frac{1}{\beta }\)-cocoercive (\(\beta \ge 0\)) w.r.t. \( {\left\| \cdot \right\| }_M\) if

    $$\begin{aligned} \beta {\left\langle Tx -Ty, x-y \right\rangle }\ge {\left\| Tx-Ty \right\| }_{M^{-1}}^2\qquad \text {for all } x,y\in \mathcal {H}; \end{aligned}$$
  3. (iii)

    nonexpansive if it is 1-Lipschitz continuous w.r.t. \( {\left\| \cdot \right\| }\).

Note that a \(\frac{1}{\beta }\)-cocoercive operator is \(\beta \)-Lipschitz continuous. This holds trivially for \(\beta =0\), and for \(\beta >0\) it follows from the Cauchy–Schwarz inequality.
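Spelling out the \(\beta >0\) case: combining the cocoercivity inequality with the Cauchy–Schwarz inequality in the M-induced inner product gives, for all \(x,y\in \mathcal {H}\),

$$\begin{aligned} {\left\| Tx-Ty \right\| }_{M^{-1}}^2\le \beta {\left\langle Tx-Ty, x-y \right\rangle }=\beta {\left\langle M^{-1}(Tx-Ty), x-y \right\rangle }_M\le \beta {\left\| Tx-Ty \right\| }_{M^{-1}} {\left\| x-y \right\| }_M, \end{aligned}$$

and dividing by \( {\left\| Tx-Ty \right\| }_{M^{-1}}\) (the case \(Tx=Ty\) being trivial) yields \( {\left\| Tx-Ty \right\| }_{M^{-1}}\le \beta {\left\| x-y \right\| }_M\), i.e., \(\beta \)-Lipschitz continuity w.r.t. \( {\left\| \cdot \right\| }_M\).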

3 Problem statement and proposed algorithm

We consider structured monotone inclusion problems of the form

$$\begin{aligned} 0 \in Ax + Cx, \end{aligned}$$
(2)

that satisfy the following assumption.

Assumption 1

Let \(\beta \ge 0\) and \(M\in \mathcal {M} {\left( \mathcal {H} \right) }\) and assume that

  1. (i)

    \(A:\mathcal {H}\rightarrow 2^{\mathcal {H}}\) is maximally monotone,

  2. (ii)

    \(C:\mathcal {H}\rightarrow \mathcal {H}\) is \(\frac{1}{\beta }\)-cocoercive with respect to \( {\left\| \cdot \right\| }_M\),

  3. (iii)

    the solution set \(\text{zer}(A+C) := {\left\{ x\in \mathcal {H} : 0\in Ax+Cx \right\} }\) is nonempty.

Since C is cocoercive, it has full domain and is maximally monotone [9, Corollary 20.28]; hence, the operator \(A+C\) is also maximally monotone [9, Corollary 25.5].

We propose the following variant of FB splitting which incorporates momentum terms and deviations in order to solve the inclusion problem in (2). The algorithm has many degrees of freedom that we will specify later in this section and in the special cases found in Section 5.

Algorithm 1
[Algorithm 1 is presented as a pseudocode figure in the original; it is not reproduced here.]

At the core of the method is a forward–backward type step, found in Step 7 of Algorithm 1, which reduces to a nominal forward–backward step when \(z_n=y_n\). The update equations for the algorithm sequences \(y_n\), \(z_n\), and \(x_n\) involve linear combinations of momentum-like terms and the so-called deviations, \(u_n\) and \(v_n\). These deviations can be chosen arbitrarily provided they satisfy the safeguarding condition in (3), where \(\ell _{n}\) is defined in (4). When selecting the deviations, all other quantities involved in (3) are computable. These deviations offer a degree of flexibility that can be used to control the algorithm trajectory with the aim of improving convergence. In Section 5, we present examples of nontrivial deviations that a priori satisfy this condition, thus removing the need for online evaluation.

For the algorithm to be implementable, let alone convergent, the algorithm parameters must be constrained. For the FB step in Step 7 to be implementable, and for the safeguarding condition to be satisfiable for some \(u_{n+1}\) and \(v_{n+1}\), we require for all \(n\in \mathbb {N}\) that \(\gamma _n\), \(\lambda _n\), \(\theta _n\), \(\hat{\theta }_n\), and \(\tilde{\theta }_n\) are strictly positive and that the parameters \(\zeta _n\), \(\mu _n\), \(\alpha _n\), and \(\bar{\alpha }_n\) are non-negative. Fulfillment of these requirements allows for a trivial choice that satisfies the safeguarding condition (3), namely \(u_{n+1}=v_{n+1}=0\), which results in a novel momentum-type forward–backward scheme. Additional requirements on some of these parameters, needed for the convergence analysis, are discussed in Section 4.

Algorithm 1 can be viewed as an extension of the algorithm in [33]. The key difference arises from the inclusion of additional monotonicity and cocoercivity inequalities (interpolation conditions) in our analysis compared to the analysis of [33]. In contrast to the analysis in [33], we utilize inequalities not only between the last iteration points and a solution but also between points generated during the last two iterations of our algorithm. This approach provides our algorithm with an additional degree of freedom, embodied by the parameter \(\mu _n\), which stems from the degree to which these extra interpolation conditions are incorporated into the analysis. This addition yields momentum-like terms in the updates, a less restrictive safeguarding condition, and the potential to derive convergence rate estimates for several involved quantities up to \(\mathcal {O}(\frac{1}{n^2})\). Such rates are not achievable in [33], as setting \(\mu _n\) to zero reverts our algorithm to that of [33].

When \(\gamma _n=\gamma >0\) for every \(n\in \mathbb {N}\), the deviation vectors \(u_n\) and \(v_n\) can be chosen so that \(y_{n}=z_n\). In this case, Step 7 of Algorithm 1 simplifies to a FB step of the form

$$\begin{aligned} p_n = \left( (M+\gamma A)^{-1}\circ (M-\gamma C)\right) y_n. \end{aligned}$$

It is widely recognized that, given appropriate selections of \(\gamma >0\), \(M\in \mathcal {M} {\left( \mathcal {H} \right) }\), A, and C, this FB step can reduce to iterations of well-known algorithms. These include the Chambolle–Pock algorithm [13], the Condat–Vũ method [17, 39], the Douglas–Rachford method [24], the Krasnosel’skiĭ–Mann iteration [9, Section 5.2], and the proximal gradient method. Consequently, Algorithm 1 can be applied to all these special cases.
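As an illustration, the following is a minimal numerical sketch of the proximal-gradient instance of this FB step, assuming \(\mathcal {H}=\mathbb {R}^d\), \(M=I\), \(A=\partial g\) with \(g= {\left\| \cdot \right\| }_1\) (so the resolvent is soft-thresholding), and \(C=\nabla f\) for a convex quadratic f; all problem data and parameter values below are illustrative.

```python
import numpy as np

# FB step p = (M + gamma*A)^{-1}((M - gamma*C) y) with M = I,
# A = subdifferential of ||.||_1, and C = gradient of a convex quadratic.
def prox_l1(v, t):
    """Resolvent of t*A for A = subdifferential of ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

rng = np.random.default_rng(0)
Q = rng.standard_normal((20, 10))
H = Q.T @ Q                          # Hessian of f(x) = 0.5*x'Hx - b'x
b = rng.standard_normal(10)
grad_f = lambda x: H @ x - b         # C = grad f is (1/beta)-cocoercive

beta = np.linalg.eigvalsh(H).max()   # Lipschitz constant of grad f
gamma = 1.0 / beta                   # step size with gamma*beta < 2

y = np.zeros(10)
for _ in range(500):
    y = prox_l1(y - gamma * grad_f(y), gamma)   # forward-backward step
print("fixed-point residual:",
      np.linalg.norm(y - prox_l1(y - gamma * grad_f(y), gamma)))
```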

3.1 Preview of special cases

This section previews some special cases of Algorithm 1, which we will explore in depth in Sections 5 and 6. Specifically, we consider cases where the sequence \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is non-decreasing and, for all \(n\in \mathbb {N}\), \(\gamma _n=\gamma >0\),

$$\begin{aligned} v_n = \frac{ {\left( 2-\gamma \bar{\beta } \right) }(\lambda _n+\mu _n)}{\hat{\theta }_n}u_n, \end{aligned}$$

and \(u_n\) is parallel to the expression in the first norm in the \(\ell _{n}\) expression in (4) that contributes to the upper bound in the safeguarding condition in (3). This yields \(z_n=y_n\) and, as demonstrated in Section 5, with a particular choice of \(\mu _n\), Algorithm 1 becomes

$$\begin{aligned} y_n&=x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(y_{n-1}-x_n)+u_n\\ p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M y_n - \gamma C y_n \right) }\\ x_{n+1}&= x_n + \lambda _n(p_n - y_n) + (\lambda _n-\lambda _0)(y_{n-1}-p_{n-1})\\ u_{n+1}&=\kappa _n\frac{4-\gamma \bar{\beta }-2\lambda _0}{2}(p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0} u_n), \end{aligned} $$

that is initialized with \(y_{-1}=p_{-1}=x_0\) and \(u_0=0\). This algorithm, under the condition \(\kappa _n^2\le \zeta _{n+1}\), satisfies the safeguarding condition. As we will see later, \(\zeta _{n+1}\in [0,1]\) can be set arbitrarily (though stronger convergence conclusions can be drawn if \(\zeta _n<1\) for all \(n\in \mathbb {N}\)), indicating that as long as \(\kappa _n\in [-1,1]\), this new algorithm satisfies the safeguarding condition by design.

In Section 5, we present two special cases of this iteration that we numerically evaluate in Section 6. The first special case involves setting \(\lambda _0=1\) and, for all \(n\in \mathbb {N}\), \(\lambda _n=\lambda _0\) and \(\kappa _n=\kappa \in (-1,1)\). As shown in Section 5, the resulting algorithm can be written as

$$\begin{aligned} p_n&= \left( {\left( M+\gamma A \right) }^{-1}\circ {\left( M - \gamma C \right) }\right) \left( p_{n-1}+u_n-u_{n-1}\right) \\ u_{n+1}&=\kappa \left( \frac{2-\gamma \bar{\beta }}{2}(p_n-p_{n-1}+u_{n-1}) + \frac{\gamma \bar{\beta }}{2} u_n\right) \end{aligned}$$
(5)

that is initialized with \(u_{-1}=u_0=0\). This algorithm converges weakly to a solution of the inclusion problem; setting \(\kappa =0\) gives \(u_n=0\) for all \(n\in \mathbb {N}\), and the algorithm reduces to the standard FB method.
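For concreteness, the following is a minimal numerical sketch of (5) on the skew-symmetric toy problem of Section 6, taking \(M=I\), \(C=0\) (so \(\beta =0\)), and A the maximally monotone skew operator \(A(x,y)=(y,-x)\) arising from \(\max _{y}\min _{x}xy\); the values \(\gamma =\bar{\beta }=1\), \(\lambda _0=1\), and \(\kappa =0.85\) are illustrative and consistent with the parameter restrictions of Section 4.

```python
import numpy as np

# Iteration (5): p_n = (I + gamma*A)^{-1}(p_{n-1} + u_n - u_{n-1}),
# u_{n+1} = kappa*((2 - gamma*bb)/2*(p_n - p_{n-1} + u_{n-1}) + gamma*bb/2*u_n),
# with A the skew matrix [[0, 1], [-1, 0]] and C = 0.
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
gamma, bb, kappa = 1.0, 1.0, 0.85              # gamma*bb <= 2 - eps, |kappa| < 1
resolvent = np.linalg.inv(np.eye(2) + gamma * S)

p_prev = np.array([1.0, 1.0])                  # p_{-1} = x_0
u_prev = np.zeros(2)                           # u_{-1} = 0
u = np.zeros(2)                                # u_0 = 0
for n in range(1000):
    p = resolvent @ (p_prev + u - u_prev)      # FB step, here a pure resolvent
    u_next = kappa * ((2 - gamma * bb) / 2 * (p - p_prev + u_prev)
                      + gamma * bb / 2 * u)
    p_prev, u_prev, u = p, u, u_next
print("distance to the solution (0, 0):", np.linalg.norm(p_prev))
```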

The second special case is obtained by letting \(\kappa _{n}=\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\in [0,1)\) resulting, as shown in Section 5, in the algorithm:

$$\begin{aligned} p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M y_n - \gamma C y_n \right) }\\ y_{n+1}&=y_{n} + \left( \frac{\lambda _0\lambda _n}{\lambda _{n+1}}+\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{4-\gamma \bar{\beta }-2\lambda _0}{2}\right) (p_n - y_n) \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( (y_n- y_{n-1})+\frac{4-\gamma \bar{\beta }}{2}(y_{n-1} - p_{n-1})\right) , \end{aligned}$$
(6)

that is initialized with \(y_{-1}=p_{-1}\). We will pay particular attention to the choice \(\lambda _n=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) ^e(1+n)^e\) with \(e\in [0,1]\). The choice \(e=0\) gives the standard FB method and, as shown in Section 5.2.1, the choice \(e=1\) has the Halpern iteration analyzed in [23] and the accelerated proximal point method in [21] as special cases. By choosing an \(e\) value between the allowed extremes, we can interpolate between these methods. We show that \(\Vert p_n-y_n\Vert _M^2\) with this choice of \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) converges as \(\mathcal {O}\left( \frac{1}{n^{2e}}\right) \) (and, if \(\bar{\beta }>\beta \), as \(\mathcal {O}\left( \frac{1}{n}\right) \) for all \(e\in [0,1]\)), implying that the convergence rate can be tuned by selecting \(e\). The requirements we place on the parameters in Section 4 to guarantee convergence imply that \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) can grow at most linearly, so values \(e>1\) are not viable and the best possible rate is \(\mathcal {O}\left( \frac{1}{n^2}\right) \), obtained by letting \(e=1\). As shown in Section 5, this case recovers the exact \(\mathcal {O}\left( \frac{1}{n^2}\right) \) rate results of the Halpern iteration in [23] and the accelerated proximal point method in [21].
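Similarly, the following is a minimal numerical sketch of (6) with \(\lambda _n=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) ^e(1+n)^e\) on the same toy problem; \(\gamma =\bar{\beta }=1\) and \(e=0.5\) are illustrative choices.

```python
import numpy as np

# Iteration (6) with lambda_n = (1 - gamma*bb/4)^e * (1+n)^e on the
# skew-symmetric toy problem (M = I, C = 0, A = [[0, 1], [-1, 0]]).
S = np.array([[0.0, 1.0], [-1.0, 0.0]])
gamma, bb, e = 1.0, 1.0, 0.5
resolvent = np.linalg.inv(np.eye(2) + gamma * S)
lam = lambda n: (1 - gamma * bb / 4) ** e * (1 + n) ** e
lam0 = lam(0)

y = np.array([1.0, 1.0])
y_prev = y.copy()                              # y_{-1} = p_{-1}
p_prev = y.copy()
for n in range(1000):
    p = resolvent @ y                          # FB step at y_n
    y_next = (y
              + (lam0 * lam(n) / lam(n + 1)
                 + (lam(n + 1) - lam0) / lam(n + 1)
                 * (4 - gamma * bb - 2 * lam0) / 2) * (p - y)
              + (lam(n) - lam0) / lam(n + 1)
              * ((y - y_prev) + (4 - gamma * bb) / 2 * (y_prev - p_prev)))
    y_prev, p_prev, y = y, p, y_next
print("fixed-point residual ||p_n - y_n||:", np.linalg.norm(p_prev - y_prev))
```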

In Section 6, we present numerical experiments on a simple skew-symmetric monotone inclusion problem, originating from the problem \(\max _{y\in \mathbb {R}}\min _{x\in \mathbb {R}}xy\). We find that both the algorithms in (5) and (6) can significantly outperform the standard FB method and the Halpern iteration when \(\kappa \) and \(e\) are appropriately chosen.

4 Convergence analysis

In this section, we conduct a Lyapunov-based convergence analysis for Algorithm 1. In Theorem 1, we define a Lyapunov function, \(V_n\), based on the iterates generated by Algorithm 1, and present an identity that establishes a relation between \(V_{n+1}\) and \(V_n\). In Theorem 2, we introduce additional assumptions and derive a Lyapunov inequality that serves as the main tool for the convergence and convergence rate analysis in Theorem 3.

The proof of our first theorem is lengthy, based only on algebraic manipulations, and is therefore deferred to Section 7. The equality in the proof is validated with symbolic calculations at

https://github.com/sbanert/incorporating-history-and-deviations.

Theorem 1

Suppose that Assumption 1 holds. Let \(x^\star \) be an arbitrary point in \(\text{zer}(A+C)\) and let \(V_0= {\left\| x_0-x^\star \right\| }_M^2\). Based on the iterates generated by Algorithm 1, for all \(n\in \mathbb {N}\), let

$$\begin{aligned} V_{n+1} := {\left\| x_{n+1}-x^\star \right\| }_M^2 + 2\lambda _{n+1}\gamma _{n+1}\alpha _{n+1}\phi _n + \ell _{n}, \end{aligned}$$
(7)

where

$$\begin{aligned} \phi _n := {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M + \frac{\bar{\beta }}{4} {\left\| y_n-p_n \right\| }_M^2, \end{aligned}$$
(8)

and \(\ell _{n}\) given by (4). Then,

$$\begin{aligned} V_{n+1} + 2\gamma _n(\lambda _n&- \bar{\alpha }_{n+1}\lambda _{n+1})\phi _n + \ell _{n-1} \\&= V_n + (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2 + \frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \end{aligned}$$

holds for all \(n\in \mathbb {N}\).

The identity presented in Theorem 1 can provide meaningful insights on convergence (as we will see in Theorem 3) when all its constituent terms are non-negative. The non-negativity of these terms is contingent on the selection of the parameter sequences \( {\left( \zeta _n \right) }_{n\in \mathbb {N}}\), \( {\left( \gamma _n \right) }_{n\in \mathbb {N}}\), \( {\left( \mu _n \right) }_{n\in \mathbb {N}}\), and \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\). To reduce the degrees of freedom and facilitate a clearer exposition, we constrain ourselves to a non-decreasing \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\), and

$$\begin{aligned} \mu _n=\frac{1}{\lambda _0}\lambda _n^2-\lambda _n \end{aligned}$$
(9)

for all \(n\in \mathbb {N}\). This implies \(\mu _n\ge 0\) and offers a slightly less general algorithm, yet it encompasses all special cases in Section 5. We next state our restrictions on the parameter sequences that will give rise to a meaningful Lyapunov inequality in Theorems 2 and 3.

Assumption 2

Assume that \(\varepsilon >0\), \(\epsilon _0,\epsilon _1\ge 0\), \(\lambda _0>0\), and that, for all \(n\in \mathbb {N}\), \(\mu _n=\frac{1}{\lambda _0}\lambda _n^2-\lambda _n\) and the following hold:

  1. (i)

    \(0\le \zeta _n\le 1-\epsilon _0\);

  2. (ii)

    \(\varepsilon \le \gamma _n\bar{\beta }\le 4-2\lambda _0-\varepsilon \);

  3. (iii)

    \(\lambda _{n+1}\ge \lambda _{n}\) and \(\gamma _n\lambda _n-\gamma _{n-1}\lambda _{n-1}\le \gamma _{n}\lambda _0-\epsilon _1\);

  4. (iv)

    \(\bar{\beta }\ge \beta \).

Remark 1

Assumption 2 (i) bounds \(\zeta _n\) from above by a quantity that is at most 1. The variable \(\zeta _n\) multiplies \(\ell _{n-1}\) in the right-hand side of the safeguarding condition (3), effectively contributing to limit the size of the ball from which the deviations \(u_n\) and \(v_n\) are selected. A consistent choice is \(\zeta _n=1-\epsilon _0\). Assumption 2 (ii) sets requirements on the relation between the initial relaxation parameter \(\lambda _0\) and the step size parameter \(\gamma _n\). An alternative expression of the upper bound is given by

$$\begin{aligned} 4-\gamma _n\bar{\beta }-2\lambda _0\ge \varepsilon , \end{aligned}$$
(10)

implying that

$$\begin{aligned} \gamma _n\bar{\beta }\lambda _0\le (4-\varepsilon )\lambda _0-2\lambda _0^2=-(\sqrt{2}\lambda _0-\sqrt{2})^2+2-\varepsilon \lambda _0\le 2-\varepsilon \lambda _0. \end{aligned}$$
(11)

This inequality will be used to bound certain algorithm parameters. Note that, similarly to [19, 22], we can allow for \(\gamma _n>\frac{2}{\beta }\) with the trade-off of using a relaxation parameter \(\lambda _0<1\); for example, with \(\lambda _0=\frac{1}{2}\) and \(\bar{\beta }=\beta \), Assumption 2 (ii) allows step sizes up to \(\gamma _n\le \frac{3-\varepsilon }{\beta }\). Assumption 2 (iii) states that \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is non-decreasing, resulting in \(\mu _n\ge 0\), and enforces a linear growth upper bound since \(\gamma _n\) is both positive and upper bounded. We will later see that our algorithm can converge as \(\mathcal {O}(\frac{1}{\lambda _n^2})\), with this upper bound resulting in a best possible convergence rate of \(\mathcal {O}(\frac{1}{n^2})\). Finally, Assumption 2 (iv) sets requirements on \(\bar{\beta }\). While the choice \(\bar{\beta }=\beta \) always works, selecting \(\bar{\beta }>\beta \) guarantees convergence of certain parameter sequences. Note that when \(\lambda _0=1\) and \(\bar{\beta }=\beta \), it follows from Assumption 2 (ii) that \(\gamma _n\le \frac{2-\varepsilon }{\beta }\), aligning with the conventional step size upper bound for forward–backward splitting.

Our convergence analysis requires that specific parameter sequences are non-negative or, in certain cases, are lower bounded by a positive number. Before showing that Assumption 2 ensures this, we provide expressions for the following sequences defined in Algorithm 1 in terms of \(\lambda _n\), \(\gamma _n\), and \(\bar{\beta }\):

$$\begin{aligned} \alpha _n&=\frac{\mu _n}{\lambda _n+\mu _n}=\frac{\lambda _n-\lambda _0}{\lambda _n},\\ \bar{\alpha }_n&=\frac{\gamma _n\mu _n}{\gamma _{n-1}(\lambda _n+\mu _n)}=\frac{\gamma _n}{\gamma _{n-1}}\frac{\lambda _n-\lambda _0}{\lambda _n},\\ \theta _n&=(4-\gamma _n\bar{\beta })(\lambda _n+\mu _n)-2\lambda _n^2=\frac{4-\gamma _n\bar{\beta }-2\lambda _0}{\lambda _0}\lambda _n^2,\\ \hat{\theta }_n&=2\lambda _n+2\mu _n-\gamma _n\bar{\beta }\lambda _n^2=\frac{2-\lambda _0\gamma _n\bar{\beta }}{\lambda _0}\lambda _n^2,\\ \bar{\theta }_n&=\lambda _n+\mu _n-\lambda _n^2=\frac{1-\lambda _0}{\lambda _0}\lambda _n^2,\\ \tilde{\theta }_n&=(\lambda _n+\mu _n)\gamma _n\bar{\beta }= \frac{\gamma _n\bar{\beta }}{\lambda _0}\lambda _n^2. \end{aligned}$$
(12)

Note that the final four quantities are quadratic in \(\lambda _n\).
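In the spirit of the symbolic validation linked above, the closed forms in (12) are easily checked with a few lines of sympy; the sketch below verifies the four quadratic quantities after substituting \(\mu _n\) from (9).

```python
import sympy as sp

# Verify the closed forms in (12) with mu_n = lambda_n**2/lambda_0 - lambda_n.
lam, lam0, g, bb = sp.symbols('lambda_n lambda_0 gamma_n beta_bar', positive=True)
mu = lam**2 / lam0 - lam

theta     = (4 - g * bb) * (lam + mu) - 2 * lam**2
theta_hat = 2 * lam + 2 * mu - g * bb * lam**2
theta_bar = lam + mu - lam**2
theta_tld = (lam + mu) * g * bb

assert sp.simplify(theta     - (4 - g * bb - 2 * lam0) / lam0 * lam**2) == 0
assert sp.simplify(theta_hat - (2 - lam0 * g * bb) / lam0 * lam**2) == 0
assert sp.simplify(theta_bar - (1 - lam0) / lam0 * lam**2) == 0
assert sp.simplify(theta_tld - g * bb / lam0 * lam**2) == 0
print("all four identities in (12) verified")
```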

Proposition 1

Consider the quantities defined in Algorithm 1 and suppose that Assumption 2 holds. The parameter sequences \( {\left( \theta _n \right) }_{n\in \mathbb {N}}\), \( {\left( \hat{\theta }_n \right) }_{n\in \mathbb {N}}\), \( {\left( \tilde{\theta }_n \right) }_{n\in \mathbb {N}}\), \( {\left( \frac{\tilde{\theta }_n}{\hat{\theta }_n} \right) }_{n\in \mathbb {N}}\), \( {\left( \frac{\hat{\theta }_n}{\theta _n} \right) }_{n\in \mathbb {N}}\), and \( {\left( \frac{\theta _n}{\lambda _n^2} \right) }_{n\in \mathbb {N}}\) are lower bounded by a positive constant and \( {\left( \alpha _n \right) }_{n\in \mathbb {N}}\) and \( {\left( \gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})-\epsilon _1 \right) }_{n\in \mathbb {N}}\) are non-negative.

Proof

Let us first consider \(\theta _n\), \(\hat{\theta }_n\), and \(\tilde{\theta }_n\): Assumption 2, (12), (10), and (11) immediately imply that all three are lower bounded by a positive constant. Moreover, since \(2-\lambda _0\gamma _n\bar{\beta }\ge \varepsilon \lambda _0>0\) by (11), we have

$$\begin{aligned} \frac{\tilde{\theta }_n}{\hat{\theta }_n}=\frac{\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}\ge \frac{\gamma _n\bar{\beta }}{2}\ge \frac{\varepsilon }{2}> 0 \end{aligned}$$

and since from (10), \(4-\gamma _n\bar{\beta }-2\lambda _0\ge \varepsilon >0\), we have

$$\begin{aligned} \frac{\hat{\theta }_n}{\theta _n}=\frac{2-\lambda _0\gamma _n\bar{\beta }}{4-\gamma _n\bar{\beta }-2\lambda _0}\ge \frac{2-\lambda _0\gamma _n\bar{\beta }}{4}\ge \frac{\lambda _0\varepsilon }{4}>0 \end{aligned}$$

and

$$\begin{aligned} \frac{\theta _n}{\lambda _n^2}=\frac{4-\gamma _n\bar{\beta }-2\lambda _0}{\lambda _0}\ge \frac{\varepsilon }{\lambda _0}>0. \end{aligned}$$

That \(\alpha _n\ge 0\) follows trivially from the non-negativity of \(\mu _n\) and the positivity of \(\lambda _n\). Finally,

$$\begin{aligned} \gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})&= \gamma _n\lambda _n-\gamma _{n+1}\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\lambda _{n+1}\\&=\gamma _n\lambda _n-\gamma _{n+1}\lambda _{n+1}+\gamma _{n+1}\lambda _0\ge \epsilon _1 \end{aligned}$$

by Assumption 2 (iii).\(\square \)

In the following result, we introduce a so-called Lyapunov inequality that serves as the foundation of our main convergence results.

Theorem 2

Suppose that Assumptions 1 and 2 hold. Let \(x^\star \) be an arbitrary point in \(\text{zer}(A+C)\), and let the sequences \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\), \( {\left( V_n \right) }_{n\in \mathbb {N}}\), and \( {\left( \phi _n \right) }_{n\in \mathbb {N}}\) be constructed in terms of the iterates obtained from Algorithm 1, as per (4), (7), and (8), respectively. Then, for all \(n\in \mathbb {N}\),

  1. (i)

    the safeguarding upper bound

    $$\begin{aligned} \zeta _{n+1}\ell _{n} \ge \frac{\zeta _{n+1}\theta _n}{2} {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\ge 0; \end{aligned}$$
  2. (ii)

    the term safeguarded by \(\zeta _{n+1}\ell _{n}\),

    $$\begin{aligned} \left( \lambda _{n+1}+\mu _{n+1}\right) \left( \frac{\tilde{\theta }_{n+1}}{\hat{\theta }_{n+1}} {\left\| u_{n+1} \right\| }_{M}^2 + \frac{\hat{\theta }_{n+1}}{\theta _{n+1}} {\left\| v_{n+1} \right\| }_{M}^2\right) \ge 0; \end{aligned}$$
  3. (iii)

    \(\phi _n\ge 0\), more specifically, if \(\beta >0\):

    $$\begin{aligned} \phi _n\ge \frac{\bar{\beta }}{4} {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cx^\star )+M(p_n-y_n) \right\| }_{M^{-1}}^2+\frac{\bar{\beta }-\beta }{\bar{\beta }\beta } {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2\ge 0, \end{aligned}$$

    and if \(\beta =0\):

    $$\begin{aligned} \phi _n\ge \frac{\bar{\beta }}{4} {\left\| p_n-y_n \right\| }_M^2\ge 0; \end{aligned}$$
  4. (iv)

    the Lyapunov function \(V_n\ge 0\);

  5. (v)

    the following Lyapunov inequality holds

    $$\begin{aligned} V_{n+1}+2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n + (1-\zeta _n)\ell _{n-1} \le V_n. \end{aligned}$$

Proof

Theorem 2 (i). In view of (4) and since, by Assumption 2 and Proposition 1, \(2\mu _{n}\gamma _{n}\ge 0\), \(\theta _n>0\), and \(\zeta _{n+1}\ge 0\), the statement reduces to showing that

$$\begin{aligned} \widehat{\varphi }_n := {\left\langle \frac{z_{n}-p_{n}}{\gamma _{n}}-\frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_{n}-p_{n-1} \right\rangle }_M+\frac{\bar{\beta }}{4} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2\ge 0. \end{aligned}$$

It follows from Step 7 of Algorithm 1 that

$$\begin{aligned} \frac{Mz_n-Mp_n}{\gamma _n}-Cy_n\in {Ap_n}. \end{aligned}$$
(13)

Combined with the monotonicity of A, this gives

$$\begin{aligned} 0&\le {\left\langle \frac{Mz_n-Mp_n}{\gamma _n}-Cy_n-\frac{Mz_{n-1}-Mp_{n-1}}{\gamma _{n-1}}+Cy_{n-1}, p_n-p_{n-1} \right\rangle }. \end{aligned}$$
(14)

If \(\beta =0\), C is constant, implying that the right-hand side reduces to the first term defining \(\widehat{\varphi }_n\). Therefore, \(\widehat{\varphi }_n\ge 0\) since it is constructed by adding two non-negative terms. It remains to show \(\widehat{\varphi }_n\ge 0\) when \(\beta >0\). From \(\frac{1}{\beta }\)-cocoercivity of C w.r.t. \( {\left\| \cdot \right\| }_M\), we have

$$\begin{aligned} 0&\le {\left\langle Cy_n-Cy_{n-1}, y_n-y_{n-1} \right\rangle }-\frac{1}{\beta } {\left\| Cy_n-Cy_{n-1} \right\| }_{M^{-1}}^2. \end{aligned}$$
(15)

Adding (14) and (15) to form \(\varphi _n\) gives

$$\begin{aligned} \varphi _n&:= {\left\langle \frac{Mz_n-Mp_n}{\gamma _n}-Cy_n-\frac{Mz_{n-1}-Mp_{n-1}}{\gamma _{n-1}}+Cy_{n-1}, p_n-p_{n-1} \right\rangle }\end{aligned}$$
(16)
$$\begin{aligned}&\quad + {\left\langle Cy_n-Cy_{n-1}, y_n-y_{n-1} \right\rangle }-\frac{1}{\beta } {\left\| Cy_n-Cy_{n-1} \right\| }_{M^{-1}}^2\end{aligned}$$
(17)
$$\begin{aligned}&= {\left\langle \frac{z_n-p_n}{\gamma _n}-\frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_n-p_{n-1} \right\rangle }_M-\frac{1}{\bar{\beta }} {\left\| Cy_n-Cy_{n-1} \right\| }_{M^{-1}}^2 \nonumber \\&\quad + {\left\langle Cy_n-Cy_{n-1}, y_n-y_{n-1}-(p_n-p_{n-1}) \right\rangle }\nonumber \\&= {\left\langle \frac{z_n-p_n}{\gamma _n}-\frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_n-p_{n-1} \right\rangle }_M + \frac{\bar{\beta }}{4} {\left\| p_n-p_{n-1}-y_n+y_{n-1} \right\| }_M^2\nonumber \\&\quad - \frac{\bar{\beta }}{4} {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cy_{n-1})+M(p_n-p_{n-1}-y_n+y_{n-1}) \right\| }_{M^{-1}}^2\nonumber \\&=\widehat{\varphi }_n - \frac{\bar{\beta }}{4} {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cy_{n-1})+M(p_n-p_{n-1}-y_n+y_{n-1}) \right\| }_{M^{-1}}^2, \end{aligned}$$
(18)

where we have used that

$$\begin{aligned} {\left\langle s, t \right\rangle }-\frac{1}{\delta } {\left\| s \right\| }_{M^{-1}}^2=\frac{\delta }{4} {\left\| t \right\| }_M^2-\frac{\delta }{4} {\left\| \frac{2}{\delta }s-Mt \right\| }_{M^{-1}}^2 \end{aligned}$$
(19)

holds for all \(t,s\in \mathcal {H}\). Since \(\varphi _n\ge 0\) by construction and \(\bar{\beta }\ge \beta > 0\), this implies that \(\widehat{\varphi }_n\ge 0\).
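For completeness, (19) is a completion of squares: expanding the last term using \( {\left\langle \frac{2}{\delta }s, Mt \right\rangle }_{M^{-1}}=\frac{2}{\delta } {\left\langle s, t \right\rangle }\) and \( {\left\| Mt \right\| }_{M^{-1}}^2= {\left\| t \right\| }_M^2\) gives

$$\begin{aligned} \frac{\delta }{4} {\left\| \frac{2}{\delta }s-Mt \right\| }_{M^{-1}}^2=\frac{1}{\delta } {\left\| s \right\| }_{M^{-1}}^2- {\left\langle s, t \right\rangle }+\frac{\delta }{4} {\left\| t \right\| }_M^2, \end{aligned}$$

and subtracting this from \(\frac{\delta }{4} {\left\| t \right\| }_M^2\) yields the left-hand side of (19).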

Theorem 2 (ii). This follows from Assumption 2 and Proposition 1, which imply strict positivity of \(\lambda _{n+1}+\mu _{n+1}\), \(\frac{\tilde{\theta }_{n+1}}{\hat{\theta }_{n+1}}\), and \(\frac{\hat{\theta }_{n+1}}{\theta _{n+1}}\).

Theorem 2 (iii). Recall that

$$\begin{aligned} \phi _n = {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M + \frac{\bar{\beta }}{4} {\left\| y_n-p_n \right\| }_M^2, \end{aligned}$$
(20)

as defined in (8). Since \(x^\star \in \text {zer}(A+C)\), we have \(-Cx^\star \in {Ax^\star }\), which combined with (13) and the monotonicity of A gives

$$\begin{aligned} 0&\le {\left\langle \frac{Mz_n-Mp_n}{\gamma _n}-Cy_n+Cx^\star , p_n-x^\star \right\rangle }. \end{aligned}$$
(21)

If \(\beta =0\), C is constant and the right-hand side reduces to \(\phi _n-\frac{\bar{\beta }}{4}\Vert y_n-p_n\Vert _M^2\), which is non-negative for all \(n\in \mathbb {N}\) by (21). Let us now consider \(\beta >0\). From the \(\frac{1}{\beta }\)-cocoercivity of C w.r.t. \( {\left\| \cdot \right\| }_M\), we have

$$\begin{aligned} 0&\le {\left\langle Cy_n-Cx^\star , y_n-x^\star \right\rangle }-\frac{1}{\beta } {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2. \end{aligned}$$
(22)

Construct \(\widehat{\phi }_n\) by adding (21) and (22) to get

$$\begin{aligned} \widehat{\phi }_n&:= {\left\langle \frac{Mz_n-Mp_n}{\gamma _n}-Cy_n+Cx^\star , p_n-x^\star \right\rangle }\nonumber \\&\quad + {\left\langle Cy_n-Cx^\star , y_n-x^\star \right\rangle }-\frac{1}{\beta } {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2\nonumber \\&= {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M + {\left\langle Cy_n-Cx^\star , y_n-p_n \right\rangle }-\frac{1}{\beta } {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2, \\&= {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M + {\left\langle Cy_n-Cx^\star , y_n-p_n \right\rangle }-\frac{1}{\bar{\beta }} {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2\nonumber \\&\qquad +\left( \frac{1}{\bar{\beta }}-\frac{1}{\beta }\right) {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2\nonumber \\&= {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M + \frac{\bar{\beta }}{4} {\left\| y_n-p_n \right\| }_M^2 \nonumber \\&\qquad - \frac{\bar{\beta }}{4} {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cx^\star )+M(p_n-y_n) \right\| }_{M^{-1}}^2-\frac{\bar{\beta }-\beta }{\bar{\beta }\beta } {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2\nonumber \\&= \phi _n - \frac{\bar{\beta }}{4} {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cx^\star )+M(p_n-y_n) \right\| }_{M^{-1}}^2-\frac{\bar{\beta }-\beta }{\bar{\beta }\beta } {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2, \end{aligned}$$

where (19) is used in the next-to-last equality. The result therefore follows since \(\bar{\beta }\ge \beta >0\) and \(\widehat{\phi }_n\ge 0\) by construction.

Theorem 2 (iv). Since \(\ell _{n}\ge 0\) by Theorem 2 (i), \(\phi _n\ge 0\) by Theorem 2 (iii), and the coefficients in front of \(\phi _n\) in the definition of \(V_n\) in (7) are non-negative by Assumption 2 and Proposition 1, we conclude that \(V_n\ge 0\).

Theorem 2 (v). By Theorem 1, we have

$$\begin{aligned} V_{n+1} + \ell _{n-1}&+ 2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n\\&= V_n + (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2 + \frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) . \end{aligned}$$

Using this equality and the safeguarding condition (3) gives

$$\begin{aligned} V_{n+1}&+ 2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n +\ell _{n-1}\le V_n + \zeta _n\ell _{n-1}. \end{aligned}$$

Moving \(\zeta _n\ell _{n-1}\) to the other side gives the desired result. This concludes the proof.\(\square \)

This result demonstrates the feasibility of selecting \(u_{n+1}\) and \(v_{n+1}\) that meet the safeguarding condition. The obvious selection of \(u_{n+1}=v_{n+1}=0\) is always viable, but we will provide in Section 5 a nontrivial choice that consistently satisfies the condition and can enhance convergence. Furthermore, Theorem 2 introduces a valuable Lyapunov inequality that will underpin our conclusions on convergence. Before stating these convergence results, we show boundedness of certain coefficient sequences.

Lemma 1

Consider the quantities defined in Algorithm 1 and suppose that Assumption 2 holds. The sequences \( {\left( \frac{\tilde{\theta }_n}{\hat{\theta }_n} \right) }_{n\in \mathbb {N}}\), \( {\left( \frac{\lambda _n^2}{\hat{\theta }_n} \right) }_{n\in \mathbb {N}}\), \( {\left( \frac{\bar{\theta }_n}{\theta _n} \right) }_{n\in \mathbb {N}}\), as well as \( {\left( \frac{(2-\gamma _n\bar{\beta })(\lambda _n+\mu _n)}{\theta _n} \right) }_{n\in \mathbb {N}}\) are bounded.

Proof

We have \(\frac{\tilde{\theta }_n}{\hat{\theta }_n}\ge 0\) by Proposition 1 and

$$\begin{aligned} \frac{\tilde{\theta }_n}{\hat{\theta }_n}&=\frac{\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}\le \frac{\gamma _n\bar{\beta }}{\lambda _0\varepsilon }\le \frac{4-2\lambda _0-\varepsilon }{\lambda _0\varepsilon }\le \frac{4}{\lambda _0\varepsilon } \end{aligned}$$

by (11). Further, \(\frac{\lambda _n^2}{\hat{\theta }_n}\ge 0\) by Proposition 1 and

$$\begin{aligned} \frac{\lambda _n^2}{\hat{\theta }_n}=\frac{\lambda _0}{(2-\lambda _0\gamma _n\bar{\beta })}\le \frac{1}{\varepsilon } \end{aligned}$$

by (11). Further, by (10),

$$\begin{aligned} \left| \frac{\bar{\theta }_n}{\theta _n}\right| =\frac{|1-\lambda _0|}{4-\gamma _n\bar{\beta }-2\lambda _0}\le \frac{|1-\lambda _0|}{\varepsilon } \end{aligned}$$

and, since \(\gamma _n\bar{\beta }\in (0,4)\) by Assumption 2 (ii),

$$\begin{aligned} \left| \frac{(2-\gamma _n\bar{\beta })(\lambda _n+\mu _n)}{\theta _n}\right|&\le \frac{2(\lambda _n+\mu _n)}{\theta _n}=\frac{2}{4-\gamma _n\bar{\beta }-2\lambda _0}\le \frac{2}{\varepsilon }. \end{aligned}$$

This completes the proof.\(\square \)

Theorem 3

Suppose that Assumptions 1 and 2 hold. Let \(x^\star \) be an arbitrary point in \(\text{zer}(A+C)\), and let the sequences \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\), \( {\left( V_n \right) }_{n\in \mathbb {N}}\), and \( {\left( \phi _n \right) }_{n\in \mathbb {N}}\) be constructed in terms of the iterates obtained from Algorithm 1, as per (4), (7), and (8), respectively. Then the following hold:

  1. (i)

    the sequence \( {\left( V_n \right) }_{n\in \mathbb {N}}\) is convergent and \(\ell _{n}\le V_{n+1}\le \Vert x_0-x^\star \Vert _M^2\);

  2. (ii)

if \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is increasing and \(\lambda _n\rightarrow \infty \) as \(n\rightarrow \infty \), then

    $$\begin{aligned} {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\le \frac{2\lambda _0\Vert x_0-x^\star \Vert _M^2}{(4-\gamma _n\bar{\beta }-2\lambda _0)\lambda _n^2}; \end{aligned}$$
  3. (iii)

    if \(\epsilon _0>0\), then \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\) is summable;

  4. (iv)

    if \(\epsilon _0>0\), then \(\lambda _n u_n\rightarrow 0\), \(\lambda _nv_n\rightarrow 0\), and \(x_{n+1}-x_n\rightarrow 0\) as \(n\rightarrow \infty \);

  5. (v)

    if \(\epsilon _1>0\), then \( {\left( \phi _n \right) }_{n\in \mathbb {N}}\) is summable;

  6. (vi)

    if \(\epsilon _1>0\) and \(\bar{\beta }>\beta \), then \(\Vert y_n-p_n\Vert _M^2\) is summable;

  7. (vii)

    if \(\epsilon _0,\epsilon _1>0\), \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is bounded, and \(p_n-x_n\rightarrow 0\), \(y_n-p_n\rightarrow 0\), and \(z_n-p_n\rightarrow 0\) as \(n\rightarrow \infty \), then \(p_n\rightharpoonup x^\star \);

  8. (viii)

    if \(\epsilon _0,\epsilon _1>0\) and \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is constant, then \(p_n\rightharpoonup x^\star \).

Proof

We base our convergence results on

$$\begin{aligned} V_{n+1}&+ 2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n + {\left( 1-\zeta _n \right) }\ell _{n-1}\le V_n, \end{aligned}$$
(23)

from Theorem 2.

Theorem 3 (i). Recall that the sequences \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\), \( {\left( V_n \right) }_{n\in \mathbb {N}}\), and \( {\left( \phi _n \right) }_{n\in \mathbb {N}}\) are non-negative by Theorem 2. Additionally, by Assumption 2 (i) and Proposition 1, respectively, the quantities \(1-\zeta _n\) and \(\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\) are non-negative for all \(n\in \mathbb {N}\), and thus the quantity \(2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n + {\left( 1-\zeta _n \right) }\ell _{n-1}\) is non-negative for all \(n\in \mathbb {N}\). Therefore, by [9, Lemma 5.31], the sequence \( {\left( V_n \right) }_{n\in \mathbb {N}}\) converges and \(V_{n+1}\le V_n\) for all \(n\in \mathbb {N}\). Since \(\lambda _{n+1}\gamma _{n+1}\alpha _{n+1}\ge 0\) by Assumption 2 and Proposition 1,

$$\begin{aligned} \ell _{n}\le V_{n+1} \le V_n\le \ldots \le V_0=\Vert x_0-x^\star \Vert _M^2. \end{aligned}$$
(24)

Theorem 3 (ii). Theorem 2 (i) states that

$$\begin{aligned} \frac{\theta _n}{2} {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\le \ell _{n}, \end{aligned}$$

where \(\ell _{n}\le \Vert x_0-x^\star \Vert _M^2\) by Theorem 3 (i). Inserting the definition of \(\theta _n\) and rearranging gives the result.

Theorem 3 (iii). That \(\epsilon _0>0\) implies that \(1-\zeta _n\ge \epsilon _0>0\) by Assumption 2 (i), and a telescope summation of (23) gives summability of \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\).

Theorem 3 (iv). To show that \( {\left( \lambda _nu_n \right) }_{n\in \mathbb {N}}\) and \( {\left( \lambda _nv_n \right) }_{n\in \mathbb {N}}\) converge to 0, we note that, due to (3), the summability of \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\) implies summability of

$$\begin{aligned} {\left( \frac{\lambda _{n+1}+\mu _{n+1}}{\lambda _{n+1}^2}\left( \frac{\tilde{\theta }_{n+1}}{\hat{\theta }_{n+1}} {\left\| \lambda _{n+1}u_{n+1} \right\| }_{M}^2+\frac{\hat{\theta }_{n+1}}{\theta _{n+1}} {\left\| \lambda _{n+1}v_{n+1} \right\| }_{M}^2\right) \right) }_{n\in \mathbb {N}}. \end{aligned}$$
(25)

Hence, since the coefficients in the expression above are, by Proposition 1 and Assumption 2, lower bounded by positive constants for all \(n\in \mathbb {N}\), the sequences \( {\left( \lambda _nu_n \right) }_{n\in \mathbb {N}}\) and \( {\left( \lambda _nv_n \right) }_{n\in \mathbb {N}}\) must converge to zero.

Next, we show convergence to zero of \( {\left( x_{n+1}-x_n \right) }_{n\in \mathbb {N}}\). Since \( {\left( \ell _{n} \right) }_{n\in \mathbb {N}}\) is summable, Theorem 2 (i) implies that

$$\begin{aligned} {\left( \frac{\theta _n}{2} {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2 \right) }_{n\in \mathbb {N}} \end{aligned}$$

is summable. Using Lemma 2 (iii) to rewrite the expression inside the norm above and taking the factor \(\frac{1}{\lambda _n}\) out of the norm, we get

$$\begin{aligned} {\left( \frac{\theta _n}{2\lambda _n^2} {\left\| x_{n+1}-x_n+\frac{\tilde{\theta }_n}{\hat{\theta }_n}\lambda _nu_n+\frac{(2-\gamma _n\bar{\beta })(\lambda _n+\mu _n)}{\theta _n}\lambda _nv_n \right\| }_M^2 \right) }_{n\in \mathbb {N}}, \end{aligned}$$

which is a summable sequence too. Since, by Proposition 1, \(\frac{\theta _n}{2\lambda _n^2}\) is lower bounded by a positive constant and the coefficients multiplying \(\lambda _nu_n\) and \(\lambda _nv_n\) are bounded by Lemma 1, we conclude, since \(\lambda _nu_n\rightarrow 0\) and \(\lambda _nv_n\rightarrow 0\) as \(n\rightarrow \infty \), that \(x_{n+1}-x_n\rightarrow 0\) as \(n\rightarrow \infty \).

Theorem 3 (v). Proposition 1 implies that \(2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\ge 2\epsilon _1>0\) and a telescope summation of (23) gives summability of \( {\left( \phi _n \right) }_{n\in \mathbb {N}}\).

Theorem 3 (vi). If \(\beta =0\), then Theorem 2 (iii) immediately gives the result due to the summability of \( {\left( \phi _n \right) }_{n\in \mathbb {N}}\). Now let \(\beta >0\). Then Theorem 2 (iii) and \(\bar{\beta }>\beta \) imply that

$$\begin{aligned} {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cx^\star )+M(p_n-y_n) \right\| }_{M^{-1}}^2\qquad {\hbox {and}}\qquad {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2 \end{aligned}$$

are summable. Since

$$\begin{aligned} \Vert p_n-y_n\Vert _M^2&=\Vert \frac{2}{\bar{\beta }}(Cy_n-Cx^\star )-\frac{2}{\bar{\beta }}(Cy_n-Cx^\star )+M(p_n-y_n)\Vert _{M^{-1}}^2\\&\le 2 {\left\| \frac{2}{\bar{\beta }}(Cy_n-Cx^\star )+M(p_n-y_n) \right\| }_{M^{-1}}^2+\frac{8}{\bar{\beta }^2} {\left\| Cy_n-Cx^\star \right\| }_{M^{-1}}^2, \end{aligned}$$

we conclude that \( {\left( \Vert p_n-y_n\Vert _M^2 \right) }_{n\in \mathbb {N}}\) is summable.

Theorem 3 (vii). We first show that \(\Vert x_n-x^\star \Vert _M^2\) converges. From Theorem 3 (i), we know that

$$\begin{aligned} V_{n+1} := {\left\| x_{n+1}-x^\star \right\| }_M^2 + 2\lambda _{n+1}\gamma _{n+1}\alpha _{n+1}\phi _n + \ell _{n}, \end{aligned}$$

converges. Since \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is bounded, so is \(\lambda _{n+1}\gamma _{n+1}\alpha _{n+1}\), and by Theorem 3 (iii) and Theorem 3 (v) we conclude that \(2\lambda _{n+1}\gamma _{n+1}\alpha _{n+1}\phi _n + \ell _{n}\rightarrow 0\) as \(n\rightarrow \infty \). This implies that \( {\left\| x_{n+1}-x^\star \right\| }_M^2\) converges.

Now, since \(\Vert p_n-x_n\Vert _M^2\rightarrow 0\) as \(n\rightarrow \infty \) and \(\Vert x_n-x^\star \Vert _M\le D\) for all \(n\in \mathbb {N}\) and some \(D\in (0,\infty )\), we conclude that

$$\begin{aligned} \left| \Vert p_n-x^\star \Vert _M^2-\Vert x_n-x^\star \Vert _M^2\right|&=\left| \Vert p_n-x_n\Vert _M^2+2\langle p_n-x_n,x_n-x^\star \rangle _M\right| \\&\le \Vert p_n-x_n\Vert _M^2+2\left| \langle p_n-x_n,x_n-x^\star \rangle _M\right| \\&\le \Vert p_n-x_n\Vert _M^2+2\Vert p_n-x_n\Vert _M\Vert x_n-x^\star \Vert _M\\&\le \Vert p_n-x_n\Vert _M^2+2\Vert p_n-x_n\Vert _MD\rightarrow 0 \end{aligned}$$

as \(n\rightarrow \infty \). Therefore, \(\Vert p_n-x^\star \Vert _M^2\) also converges and \(p_n\) has weakly convergent subsequences. Let \((p_{n_k})_{k\in \mathbb {N}}\) be one such subsequence with weak limit point \(\bar{x}\) and construct corresponding subsequences \((y_{n_k})_{k\in \mathbb {N}}\), \((z_{n_k})_{k\in \mathbb {N}}\), and \((\gamma _{n_k})_{k\in \mathbb {N}}\). Now, Step 7 in Algorithm 1 can equivalently be written as

$$\begin{aligned} Mp_{n_k}+\gamma _{n_k}Ap_{n_k}\ni Mz_{n_k}-\gamma _{n_k}Cy_{n_k}, \end{aligned}$$

which is equivalent to

$$\begin{aligned} Cp_{n_k}+Ap_{n_k}\ni \frac{1}{\gamma _{n_k}}M(z_{n_k}-p_{n_k})+(Cp_{n_k}-Cy_{n_k}). \end{aligned}$$

The right-hand side converges to 0 as \(k\rightarrow \infty \) since \(z_{n_k}-p_{n_k}\rightarrow 0\) and \(p_{n_k}-y_{n_k}\rightarrow 0\) as \(k\rightarrow \infty \), and due to the Lipschitz continuity of C, the uniform positive lower bound on \(\gamma _n\) in Assumption 2 (ii), and the boundedness of \(M\in \mathcal {M} {\left( \mathcal {H} \right) }\). By the weak-strong closedness of the graph of the maximally monotone operator \(A+C\) (which is maximally monotone since C has full domain), the limit point satisfies \(0\in (A+C)\bar{x}\) by [9, Proposition 20.38]. The weak convergence result now follows from [9, Lemma 2.47].

Theorem 3 (viii). In view of Theorem 3 (vii), it is enough to show that \(p_n-x_n\rightarrow 0\), \(y_n-p_n\rightarrow 0\), and \(z_n-p_n\rightarrow 0\) as \(n\rightarrow \infty \). Since \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is constant, \(\mu _n=\frac{1}{\lambda _0}\lambda _n^2-\lambda _n=0\) and \(\alpha _n=0\). Summability of \(\ell _{n}\) therefore implies through Theorem 2 (i) that

$$\begin{aligned} {\left( \frac{\theta _n}{2} {\left\| p_n-x_n +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2 \right) }_{n\in \mathbb {N}} \end{aligned}$$

is summable. Since \(\theta _n\) is lower bounded by a positive constant due to Proposition 1 and the coefficients in front of \(u_n\) and \(v_n\) are bounded due to Lemma 1, we conclude, since \(u_n\rightarrow 0\) and \(v_n\rightarrow 0\) by Theorem 3 (iv) and Assumption 2, that \(p_n-x_n\rightarrow 0\) as \(n\rightarrow \infty \). From the \(x_n\) update,

$$\begin{aligned} x_{n+1}=x_n+\lambda _n(p_n-z_n), \end{aligned}$$

Theorem 3 (iv), and since \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is constant, we conclude that \(p_n-z_n\rightarrow 0\) as \(n\rightarrow \infty \). Finally, from the \(y_n\) update,

$$\begin{aligned} y_n=x_n+u_n, \end{aligned}$$

and since \(u_n\rightarrow 0\), we conclude that \(y_n-x_n\rightarrow 0\), which implies that \(y_n-p_n\rightarrow 0\) as \(n\rightarrow \infty \). This concludes the proof.\(\square \)

We could derive convergence properties for other quantities involved, yet we limit our discussion to these results as they suffice for the special cases that follow. Notably, the conclusion in Theorem 3 (viii) aligns with a similar result presented in the authors’ previous work [33]. This is due to \(\mu _n=0\), causing our algorithm to reduce to the one presented in that work.

5 A special case

The safeguarding condition in (3) typically requires the evaluation of four norms and a scalar product. However, if the vectors inside the norms are parallel, the number of norm evaluations is reduced. This section introduces an algorithm wherein we choose \(u_{n}\) and \(v_{n}\) to ensure \(y_n=z_n\) for all \(n\in \mathbb {N}\) and such that the safeguarding condition reduces to a scalar condition that is readily verified offline. The algorithm we propose is as follows:

$$\begin{aligned} y_n&=x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(y_{n-1}-x_n)+u_n\\ p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M y_n - \gamma C y_n \right) }\\ x_{n+1}&= x_n + \lambda _n(p_n - y_n) + (\lambda _n-\lambda _0)(y_{n-1}-p_{n-1})\\ u_{n+1}&=\kappa _n\frac{4-\gamma \bar{\beta }-2\lambda _0}{2}\left( p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0} u_n\right) , \end{aligned}$$
(26)

where \(y_{-1}=p_{-1}=x_0\), \(u_0=0\), and a constant step size \(\gamma >0\) is used. With a constant step size, Assumption 2 (iii) reduces to

$$\begin{aligned} \lambda _n\le \lambda _{n+1}\le \lambda _n+\lambda _0-\frac{\epsilon _1}{\gamma }\le (2+n)\lambda _0-(n+1)\frac{\epsilon _1}{\gamma }, \end{aligned}$$
(27)

where the last inequality follows by induction. This gives a non-decreasing \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) sequence that grows at most linearly in n.

Prior to presenting the convergence results for this algorithm, we specify a particular form for the sequence \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\). This form separates the growth in n from the selection of \(\lambda _0\).

Assumption 3

Let \(\lambda _0>0\). Assume that \(f:{\textrm{dom}} f\rightarrow \mathbb {R}\), with \(\mathrm {int\,dom} f\supseteq \{x\in \mathbb {R}:x\ge 0\}\), is differentiable (on the interior of its domain), concave, and non-decreasing, and satisfies \(f(0)=1\) and \(f^\prime (0)\in [0,1]\). Let, for all \(n\in \mathbb {N}\),

$$\begin{aligned} \lambda _n = f(n)\lambda _0. \end{aligned}$$

Proposition 2

Suppose that Assumption 3 holds. Then (27) and Assumption 2 (iii) hold with \(\epsilon _1=0\). If, in addition, \(f^\prime (0)<1\), then there exists an \(\epsilon _1>0\) such that (27) and Assumption 2 (iii) hold.

Proof

That f is non-decreasing trivially implies \(\lambda _{n}\le \lambda _{n+1}\). Concavity implies \(f^{\prime }(x)\le f^{\prime }(0)\) for all \(x\ge 0\). Therefore

$$\begin{aligned} f(n) =1+\int _{0}^{n}f^\prime (x) dx\le 1+\int _{0}^{n}f^\prime (0) dx\le 1+n \end{aligned}$$

and \(\lambda _{n+1}\le (2+n)\lambda _0\). Let \(a:=f^{\prime }(0)\in [0,1)\), then

$$\begin{aligned} f(n) =1+\int _{0}^{n}f^\prime (x) dx\le 1+\int _{0}^{n}f^\prime (0) dx=1+an=1+n-(1-a)n \end{aligned}$$

and with \(\epsilon _1=(1-a)\gamma \lambda _0>0\), we get \(\lambda _{n+1}\le (2+n)\lambda _0-(n+1)\frac{\epsilon _1}{\gamma }\), as desired.\(\square \)

Example 1

Examples of functions f that satisfy Assumption 3 for which an \(\epsilon _1>0\) exists include functions that, for all \(n\in \mathbb {N}\), satisfy \(f(n)=(1+n)^e\) with \(e\in [0,1)\), \(f(n)=\frac{\log (n+2)}{\log (2)}\), and \(f(n)=1\). The choice \(f(n)=(1+n)\) requires that \(\epsilon _1=0\).
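A quick symbolic sketch checking \(f(0)=1\) and \(f^\prime (0)\) for these examples (using \(e=\frac{1}{2}\) as an illustrative exponent):

```python
import sympy as sp

# Check f(0) = 1 and f'(0) for the examples of Assumption 3; f'(0) < 1
# allows epsilon_1 > 0, while f(n) = 1 + n gives f'(0) = 1 (epsilon_1 = 0).
n = sp.symbols('n', nonnegative=True)
examples = [(1 + n) ** sp.Rational(1, 2),   # f(n) = (1+n)^e with e = 1/2
            sp.log(n + 2) / sp.log(2),      # f(n) = log(n+2)/log(2)
            sp.Integer(1),                  # f(n) = 1
            1 + n]                          # f(n) = 1+n
for f in examples:
    print(f, " f(0) =", f.subs(n, 0), " f'(0) =", sp.diff(f, n).subs(n, 0))
```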

We will use this construction of \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) throughout this section and specialize Assumption 2 as follows.

Assumption 4

Assume that \(\varepsilon >0\), \(\epsilon _0\ge 0\), \(\lambda _0>0\), and that, for all \(n\in \mathbb {N}\), \(\mu _n=\frac{1}{\lambda _0}\lambda _n^2-\lambda _n\) and the following hold:

  1. (i)

    \(0\le \kappa _n^2\le 1-\epsilon _0\);

  2. (ii)

    \(\varepsilon \le \gamma \bar{\beta }\le 4-2\lambda _0-\varepsilon \);

  3. (iii)

    \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is given by Assumption 3;

  4. (iv)

    \(\bar{\beta }\ge \beta \).

The differences from Assumption 2 are that \(\zeta _{n}\) in Assumption 2 (i) has been replaced with \(\kappa _n^2\) in Assumption 4 (i) and that Assumption 2 (iii) has been replaced with Assumption 4 (iii). If, e.g., \(\kappa _n^2\le \zeta _{n+1}\), Proposition 2 implies that Assumption 2 holds if Assumption 4 does.

We are ready to state our convergence results for the algorithm in (26).

Proposition 3

Suppose that Assumptions 1 and 4 hold. Then the following hold for (26):

  1. (i)

    if \(f(n)\rightarrow \infty \) as \(n\rightarrow \infty \), then

    $$\begin{aligned} {\left\| p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0}u_n \right\| }_M^2\le \frac{2\Vert y_0-x^\star \Vert _M^2}{(4-\gamma \bar{\beta }-2\lambda _0)\lambda _0f(n)^2}; \end{aligned}$$
  2. (ii)

if \(\bar{\beta }>\beta \) and \(f^{\prime }(0)<1\), then \( {\left( \Vert p_n-y_n\Vert _M^2 \right) }_{n\in \mathbb {N}}\) is summable;

  3. (iii)

    if \( {\left( \kappa _n^2 \right) }_{n\in \mathbb {N}}\) is upper bounded by a constant less than 1 and \(f^{\prime }(0)=0\) (i.e., \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is constant), then \(p_n\rightharpoonup x^\star \in \text{zer}(A+C)\).

Proof

We first show that the algorithm is a special case of Algorithm 1. To begin, note that \(\gamma _n=\gamma \) implies that \(\alpha _n=\bar{\alpha }_n\) for all \(n\in \mathbb {N}\). Let

$$\begin{aligned} v_n&=\frac{2-\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}u_n \end{aligned}$$

for all \(n\in \mathbb {N}\), which implies that

$$\begin{aligned} \frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}u_n+v_n=\frac{(1-\lambda _0)\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}u_n+\frac{2-\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}u_n=u_n \end{aligned}$$
(28)

since \(2-\lambda _0\gamma _n\bar{\beta }>0\) by (11). Let us show by induction that this implies \(y_n=z_n\) for all \(n\in \mathbb {N}\) in Algorithm 1. Since \(y_{-1}=z_{-1}=p_{-1}=x_0\) and \(u_0=0\), we get \(y_0=z_0\). Now, assume that \(y_k=z_k\) for all \(k\in \{-1,\ldots ,n\}\). Then, since \(\alpha _n=\bar{\alpha }_n\) and due to (28),

$$\begin{aligned} z_{n+1}&=y_n-x_n+\alpha _n(p_{n-1}-x_n)+\bar{\alpha }_n(y_{n-1}-p_{n-1})+\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}u_n+v_n\\&=y_n-x_n+\alpha _n(p_{n-1}-x_n)+u_n =y_{n+1}. \end{aligned}$$

Therefore, the \(z_{n+1}\) update of Algorithm 1 can be removed and all \(z_n\) instances replaced by \(y_n\) in (26). Moreover, the \(y_n\) and \(x_n\) updates of (26) are obtained from the corresponding sequences in Algorithm 1 by inserting \(\alpha _n=\frac{\lambda _n-\lambda _0}{\lambda _n}\).

It remains to show that the \(u_{n+1}\) update satisfies the safeguarding condition. We use Theorem 2 (i), \(\alpha _n=\frac{\lambda _n-\lambda _0}{\lambda _n}\), and the equality

$$\begin{aligned} \frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n&=\left( \frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n} - \frac{2\bar{\theta }_n}{\theta _n}\frac{2-\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}\right) u_n\\&=\left( \frac{\lambda _0\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }} - \frac{2(1-\lambda _0)}{4-\gamma _n\bar{\beta }-2\lambda _0}\frac{2-\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }}\right) u_n\\&=\frac{\gamma _n\bar{\beta }\lambda _0(4-\gamma _n\bar{\beta }-2\lambda _0)-2(1-\lambda _0)(2-\gamma _n\bar{\beta })}{(2-\lambda _0\gamma _n\bar{\beta })(4-\gamma _n\bar{\beta }-2\lambda _0)}u_n \\&=\frac{\gamma _n\bar{\beta }\lambda _0(2-\gamma _n\bar{\beta }-2\lambda _0)-2(2-\gamma _n\bar{\beta }-2\lambda _0)}{(2-\lambda _0\gamma _n\bar{\beta })(4-\gamma _n\bar{\beta }-2\lambda _0)}u_n \\&=\frac{(\gamma _n\bar{\beta }\lambda _0-2)(2-\gamma _n\bar{\beta }-2\lambda _0)}{(2-\lambda _0\gamma _n\bar{\beta })(4-\gamma _n\bar{\beta }-2\lambda _0)}u_n \\&=-\frac{(2-\gamma _n\bar{\beta }-2\lambda _0)}{(4-\gamma _n\bar{\beta }-2\lambda _0)}u_n, \end{aligned}$$
(29)

to conclude that

$$\begin{aligned} \ell _{n}&\ge \frac{\theta _n}{2} {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\nonumber \\&=\frac{\theta _n}{2} {\left\| p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) -\frac{(2-\gamma _n\bar{\beta }-2\lambda _0)}{(4-\gamma _n\bar{\beta }-2\lambda _0)}u_n \right\| }_M^2. \end{aligned}$$

Now, since \(\lambda _{n+1}+\mu _{n+1}=\frac{\lambda _n^2}{\lambda _0}\), we conclude that if

$$\begin{aligned}&\frac{\lambda _n^2}{\lambda _0}\left( \frac{\tilde{\theta }_{n+1}}{\hat{\theta }_{n+1}} {\left\| u_{n+1} \right\| }_{M}^2 + \frac{\hat{\theta }_{n+1}}{\theta _{n+1}} {\left\| v_{n+1} \right\| }_{M}^2\right) \\&\qquad \qquad \qquad \le \zeta _{n+1}\frac{\theta _n}{2} {\left\| p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) -\frac{(2-\gamma _n\bar{\beta }-2\lambda _0)}{(4-\gamma _n\bar{\beta }-2\lambda _0)}u_n \right\| }_M^2, \end{aligned}$$

the safeguarding condition in Algorithm 1 is satisfied. The vectors \(u_{n+1}\) and \(v_{n+1}\) are scalar multiples of the quantity inside the norm on the right-hand side. Therefore, the safeguarding condition reduces to the scalar condition

$$\begin{aligned} \frac{\lambda _n^2}{\lambda _0}\kappa _n^2\frac{(4-\gamma \bar{\beta }-2\lambda _0)^2}{4}\left( \frac{\tilde{\theta }_{n+1}}{\hat{\theta }_{n+1}} + \frac{\hat{\theta }_{n+1}}{\theta _{n+1}}\frac{(2-\gamma _n\bar{\beta })^2}{(2-\lambda _0\gamma _n\bar{\beta })^2}\right) \le \zeta _{n+1}\frac{\theta _n}{2}. \end{aligned}$$

Inserting the quantities in (12) and \(\mu _n=\frac{1}{\lambda _0}\lambda _n^2-\lambda _n\) and multiplying by \(\frac{2}{\theta _n}>0\) gives

$$\begin{aligned} \kappa _n^2\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( \frac{\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }} + \frac{2-\lambda _0\gamma _n\bar{\beta }}{(4-\gamma _n\bar{\beta }-2\lambda _0)}\frac{(2-\gamma _n\bar{\beta })^2}{(2-\lambda _0\gamma _n\bar{\beta })^2}\right) \le \zeta _{n+1}. \end{aligned}$$

The left-hand side satisfies

$$\begin{aligned}&\kappa _n^2\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( \frac{\gamma _n\bar{\beta }}{2-\lambda _0\gamma _n\bar{\beta }} + \frac{2-\lambda _0\gamma _n\bar{\beta }}{(4-\gamma _n\bar{\beta }-2\lambda _0)}\frac{(2-\gamma _n\bar{\beta })^2}{(2-\lambda _0\gamma _n\bar{\beta })^2}\right) \\&=\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\frac{\kappa _n^2}{2-\lambda _0\gamma _n\bar{\beta }}\frac{\left( \gamma _n\bar{\beta }(4-\gamma _n\bar{\beta }-2\lambda _0)+(2-\gamma _n\bar{\beta })^2\right) }{(4-\gamma _n\bar{\beta }-2\lambda _0)}\\&=\frac{\kappa _n^2}{2(2-\lambda _0\gamma _n\bar{\beta })}\left( \gamma _n\bar{\beta }(4-\gamma _n\bar{\beta }-2\lambda _0)+4-4\gamma _n\bar{\beta }+(\gamma _n\bar{\beta })^2\right) \\&=\frac{\kappa _n^2}{2(2-\lambda _0\gamma _n\bar{\beta })}\left( 4-2\gamma _n\bar{\beta }\lambda _0\right) \\&=\kappa _n^2, \end{aligned}$$

leading to the safeguarding condition

$$\begin{aligned} \kappa _n^2\le \zeta _{n+1} \end{aligned}$$

which is satisfied by letting \(\zeta _{n+1}=\kappa _n^2\).

Using Proposition 2 and the choice \(\zeta _{n+1}=\kappa _n^2\), we conclude that Assumption 2 holds since Assumption 4 does, and we can therefore use Theorem 3 to prove convergence.

Proposition 3 (i) follows from Theorem 3 (ii) since \(y_0=x_0\) and after rewriting the norm expression using (29).

Proposition 3 (ii) follows from Theorem 3 (vi) due to Proposition 2, which ensures \(\epsilon _1>0\).

Proposition 3 (iii) follows from Theorem 3 (viii) due to Proposition 2 and the fact that \(\kappa _n^2=\zeta _{n+1}\) is upper bounded by a constant less than 1, which implies \(\epsilon _0>0\).\(\square \)

Remark 2

The algorithm produces points that satisfy

$$\begin{aligned} (A+C)p_n\ni \frac{1}{\gamma }M(y_n-p_n)-C(y_n-p_n) \end{aligned}$$

and Proposition 3 (ii) implies, if \(f^\prime (0)<1\) and \(\bar{\beta }>\beta \), that \(M(y_n-p_n)-C(y_n-p_n)\rightarrow 0\) as \(n\rightarrow \infty \) due to boundedness of M and Lipschitz continuity of C. Although \( {\left( p_n \right) }_{n\in \mathbb {N}}\) may not converge, it satisfies the monotone inclusion (2) in the limit. If in addition \(\Vert p_n-x^\star \Vert _M\) converges, we can conclude that \(p_n\rightharpoonup x^\star \in \text{zer}(A+C)\).

A special case of (26) that we will evaluate numerically in Section 6 is obtained by letting \(\lambda _0=1\) and, for all \(n\in \mathbb {N}\), \(\lambda _n=\lambda _0\) and \(\kappa _n=\kappa \in (-1,1)\). Then (26) reduces to

$$\begin{aligned} y_n&=x_n+u_n\\ p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M y_n - \gamma C y_n \right) }\\ x_{n+1}&= x_n + p_n - y_n\\ u_{n+1}&=\kappa \left( \frac{2-\gamma \bar{\beta }}{2}\left( p_n-x_n\right) + \frac{\gamma \bar{\beta }}{2} u_n\right) , \end{aligned}$$

which, since \(x_{n+1}=p_n-u_n\), can be written as

$$\begin{aligned} p_n&= \left( {\left( M+\gamma A \right) }^{-1}\circ {\left( M - \gamma C \right) }\right) \left( p_{n-1}+u_n-u_{n-1}\right) \\ u_{n+1}&=\kappa \left( \frac{2-\gamma \bar{\beta }}{2}(p_n-p_{n-1}+u_{n-1}) + \frac{\gamma \bar{\beta }}{2} u_n\right) . \end{aligned}$$
(30)

This algorithm is previewed in Section 3.1. Since \(\kappa _n=\kappa \in (-1,1)\) for all \(n\in \mathbb {N}\) and since \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is constant, Proposition 3 (iii) ensures that this algorithm produces a \(p_n\)-sequence that converges weakly to a solution.
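For concreteness, the iteration can be implemented directly from the four-line form above. The following is a minimal Python sketch; the function name momentum_fb and the callables resolvent (evaluating \( {\left( M+\gamma A \right) }^{-1}\)) and forward (evaluating \(M-\gamma C\)) are illustrative choices of ours, not notation from the paper.

```python
import numpy as np

def momentum_fb(resolvent, forward, x0, gamma, beta_bar, kappa, n_iters=100):
    """Sketch of algorithm (30) in its four-line form:
    y_n = x_n + u_n, p_n = resolvent(forward(y_n)), x_{n+1} = x_n + p_n - y_n,
    u_{n+1} = kappa*((2 - gamma*beta_bar)/2*(p_n - x_n) + gamma*beta_bar/2*u_n)."""
    x = np.asarray(x0, dtype=float)
    u = np.zeros_like(x)  # u_0 = 0
    for _ in range(n_iters):
        y = x + u
        p = resolvent(forward(y))
        # u_{n+1} is formed with the not-yet-updated x_n.
        u_next = kappa * ((2 - gamma * beta_bar) / 2 * (p - x)
                          + gamma * beta_bar / 2 * u)
        x = x + p - y  # equals p - u since y = x + u
        u = u_next
    return p
```

Setting kappa=0 collapses the loop to the standard forward–backward iteration \(x_{n+1}=p_n\).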

5.1 Alternative formulation

We can eliminate the \(x_n\) sequence in (26) and express the algorithm solely in terms of \(y_n\), \(p_n\), and \(u_n\). The algorithm becomes

$$\begin{aligned} p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M y_n - \gamma C y_n \right) }\\ y_{n+1}&=y_{n} + \frac{\lambda _0\lambda _n}{\lambda _{n+1}}(p_n - y_n) +u_{n+1}-\frac{\lambda _n}{\lambda _{n+1}}u_n \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( (y_n- y_{n-1})+\lambda _0 (y_{n-1} - p_{n-1})\right) \\ u_{n+1}&=\kappa _n\left( \frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n-\frac{\lambda _n-\lambda _0}{\lambda _n}(p_{n-1}-y_{n-1})\right) + u_n\right) \end{aligned}$$
(31)

with \(y_{-1}=p_{-1}=x_0\) and \(u_0=0\).
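As a sanity check of this reformulation, (31) can be implemented with \((y_n,y_{n-1},p_{n-1},u_n)\) as the only state, without any \(x_n\) sequence. Below is a minimal Python sketch, where lam and kap are illustrative callables of ours returning \(\lambda _n\) and \(\kappa _n\).

```python
import numpy as np

def fb_deviations(resolvent, forward, x0, lam, kap, gamma, beta_bar, n_iters=100):
    """Sketch of algorithm (31); lam(n) = lambda_n with lam(0) = lambda_0,
    and kap(n) = kappa_n."""
    y = np.asarray(x0, dtype=float)
    y_prev, p_prev = y.copy(), y.copy()  # y_{-1} = p_{-1} = x_0
    u = np.zeros_like(y)                 # u_0 = 0
    lam0 = lam(0)
    c = (4 - gamma * beta_bar - 2 * lam0) / 2
    for n in range(n_iters):
        p = resolvent(forward(y))
        # u_{n+1} must be formed first, since the y_{n+1} update uses it.
        u_next = kap(n) * (c * (p - y - (lam(n) - lam0) / lam(n)
                                * (p_prev - y_prev)) + u)
        y_next = (y + lam0 * lam(n) / lam(n + 1) * (p - y)
                  + u_next - lam(n) / lam(n + 1) * u
                  + (lam(n) - lam0) / lam(n + 1)
                  * ((y - y_prev) + lam0 * (y_prev - p_prev)))
        y_prev, p_prev, y, u = y, p, y_next, u_next
    return p
```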

Proposition 4

The algorithms in (26) and (31) produce the same \( {\left( y_n \right) }_{n\in \mathbb {N}}\) and \( {\left( p_n \right) }_{n\in \mathbb {N}}\) sequences, provided \(p_{-1}=y_{-1}=y_0=x_0\).

Proof

We remove the \(x_n\) sequence from (26) by inserting

$$\begin{aligned} x_n=\frac{\lambda _n}{\lambda _0}\left( y_n-\frac{\lambda _n-\lambda _0}{\lambda _n} y_{n-1}-u_n\right) , \end{aligned}$$

which comes from the \(y_n\) update, into the \(x_{n+1}\) update. This gives the \(x_{n+1}\) update

$$\begin{aligned} \frac{\lambda _{n+1}}{\lambda _0}\left( y_{n+1}-\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}} y_{n}-u_{n+1}\right)&= \frac{\lambda _n}{\lambda _0}\left( y_n-\frac{\lambda _n-\lambda _0}{\lambda _n} y_{n-1}-u_n\right) \\&\quad + \lambda _n(p_n - y_n) + (\lambda _n-\lambda _0)(y_{n-1}-p_{n-1}). \end{aligned}$$

Multiplying by \(\frac{\lambda _0}{\lambda _{n+1}}\) gives

$$\begin{aligned} y_{n+1}&= \frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}y_{n}+u_{n+1}+\frac{\lambda _n}{\lambda _{n+1}}\left( y_n-\frac{\lambda _n-\lambda _0}{\lambda _n} y_{n-1}-u_n\right) \\&\quad + \frac{\lambda _0}{\lambda _{n+1}}\left( \lambda _n(p_n - y_n) + (\lambda _n-\lambda _0)(y_{n-1}-p_{n-1})\right) \\&= y_n+u_{n+1}+\frac{\lambda _n}{\lambda _{n+1}}\left( -\frac{\lambda _0}{\lambda _{n}}y_{n}+y_n-\frac{\lambda _n-\lambda _0}{\lambda _n} y_{n-1}-u_n\right) \\&\quad + \frac{\lambda _0}{\lambda _{n+1}}\left( \lambda _n(p_n - y_n) + (\lambda _n-\lambda _0)(y_{n-1}-p_{n-1})\right) \\&= y_n+\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( y_n- y_{n-1}\right) +u_{n+1}-\frac{\lambda _n}{\lambda _{n+1}}u_n\\&\quad + \frac{\lambda _0}{\lambda _{n+1}}\left( \lambda _n(p_n - y_n) + (\lambda _n-\lambda _0)(y_{n-1}-p_{n-1})\right) \\&= y_n+ \frac{\lambda _0\lambda _n}{\lambda _{n+1}}\left( p_n - y_n\right) +u_{n+1}-\frac{\lambda _n}{\lambda _{n+1}}u_n\\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( \left( y_n- y_{n-1}\right) + \lambda _0(y_{n-1}-p_{n-1})\right) . \end{aligned}$$

The \(u_{n+1}\) update becomes

$$\begin{aligned} u_{n+1}&=\kappa _n\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0} u_n\right) \\&=\kappa _n\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-\frac{\lambda _0}{\lambda _n}x_n-\frac{\lambda _n-\lambda _0}{\lambda _n}p_{n-1} - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0} u_n\right) \\&\!=\!\kappa _n\frac{(4\!-\!\gamma \bar{\beta }\!-\!2\lambda _0)}{2}\left( p_n-\left( y_n\!-\!\frac{\lambda _n-\lambda _0}{\lambda _n} y_{n-1}-u_n\right) -\frac{\lambda _n\!-\!\lambda _0}{\lambda _n}p_{n-1} \!-\! \frac{2-\gamma \bar{\beta }-2\lambda _0}{4\!-\!\gamma \bar{\beta }\!-\!2\lambda _0} u_n\right) \\&=\kappa _n\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n-\frac{\lambda _n-\lambda _0}{\lambda _n}(p_{n-1}-y_{n-1}) + \frac{2}{4-\gamma \bar{\beta }-2\lambda _0} u_n\right) \\&=\kappa _n\left( \frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n-\frac{\lambda _n-\lambda _0}{\lambda _n}(p_{n-1}-y_{n-1})\right) + u_n\right) . \end{aligned}$$

This concludes the proof.\(\square \)

5.2 Fixed-point residual convergence rate

The convergent quantity in Proposition 3 (i) may be hard to interpret. In this section, we propose a special case of (26) and (31) such that this quantity is the fixed-point residual, \(p_n-y_n\), for the forward–backward mapping. This is achieved by letting \(\kappa _n=\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\), which implies that

$$\begin{aligned} u_{n+1} = \frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n\right) \end{aligned}$$

and that the algorithm becomes

$$\begin{aligned} p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M y_n - \gamma C y_n \right) }\\ y_{n+1}&=y_{n} + \left( \frac{\lambda _0\lambda _n}{\lambda _{n+1}}+\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\right) (p_n - y_n) \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( (y_n- y_{n-1})+\frac{(4-\gamma \bar{\beta })}{2}(y_{n-1} - p_{n-1})\right) . \end{aligned}$$
(32)

This algorithm converges as per the following result.

Proposition 5

Suppose that Assumptions 1 and 4 hold. Then the following hold for (32):

  1. (i)

    if \(f(n)\rightarrow \infty \) as \(n\rightarrow \infty \), then

    $$\begin{aligned} {\left\| \frac{1}{\gamma }(p_n-y_n) \right\| }_M^2\le \frac{2\Vert y_0-x^\star \Vert _M^2}{\gamma ^2(4-\gamma \bar{\beta }-2\lambda _0)\lambda _0f(n)^2}; \end{aligned}$$
  2. (ii)

    if \(\bar{\beta }>\beta \) and \(f^{\prime }(0)<1\), then \( {\left( \Vert p_n-y_n\Vert ^2 \right) }_{n\in \mathbb {N}}\) is summable;

  3. (iii)

    if \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) is constant, the algorithm reduces to relaxed forward–backward splitting and \(p_n\rightharpoonup x^\star \in \text{zer}(A+C)\).

Proof

The claims follow from Propositions 3 and 4 by showing that, for all \(n\in \mathbb {N}\): the choice \(\kappa _n=\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\) in (31) gives (32); that this choice satisfies Assumption 4 (i), with \(\epsilon _0>0\) if f (and consequently \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\)) is bounded and with \(\epsilon _0=0\) if f (and consequently \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\)) is unbounded; and that the expression inside the norm in Proposition 3 (i) satisfies

$$\begin{aligned} p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0}u_n = p_n-y_n. \end{aligned}$$
(33)

We will first show that the \(u_{n+1}\) update in (31) with \(\kappa _n=\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\), i.e.,

$$\begin{aligned} u_{n+1}&=\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\left( \frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n-\frac{\lambda _n-\lambda _0}{\lambda _n}(p_{n-1}-y_{n-1})\right) + u_n\right) \end{aligned}$$
(34)

implies that \(u_n=\frac{\lambda _n-\lambda _0}{\lambda _n}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}(p_{n-1}-y_{n-1})\) for all \(n\in \mathbb {N}\). For \(n=0\), since \(p_{-1}=y_{-1}\) and \(u_0=0\), we get

$$\begin{aligned} u_{1} = \frac{\lambda _1-\lambda _0}{\lambda _1}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}(p_{0}-y_{0}). \end{aligned}$$

For \(n\ge 1\), we use induction. Assume that \(u_n=\frac{\lambda _n-\lambda _0}{\lambda _n}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}(p_{n-1}-y_{n-1})\), then

$$\begin{aligned} u_{n+1}&=\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n\right) \end{aligned}$$
(35)

since the last two terms in (34) cancel, which is what we wanted to show.

The \(y_{n+1}\) update in (31) with \(u_{n+1}\) defined in (35) inserted satisfies

$$\begin{aligned} y_{n+1}&=y_{n} + \frac{\lambda _0\lambda _n}{\lambda _{n+1}}(p_n - y_n) +u_{n+1}-\frac{\lambda _n}{\lambda _{n+1}}u_n \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( (y_n- y_{n-1})+\lambda _0 (y_{n-1} - p_{n-1})\right) \\&=y_{n} + \frac{\lambda _0\lambda _n}{\lambda _{n+1}}(p_n - y_n) +\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_n-y_n\right) \\&\quad -\frac{\lambda _{n}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\left( p_{n-1}-y_{n-1}\right) \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( (y_n- y_{n-1})+\lambda _0 (y_{n-1} - p_{n-1})\right) \\&=y_{n} + \left( \frac{\lambda _0\lambda _n}{\lambda _{n+1}}+\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\right) (p_n - y_n) \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( y_n- y_{n-1}\right) +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( \lambda _0 +\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\right) (y_{n-1} - p_{n-1})\\&=y_{n} + \left( \frac{\lambda _0\lambda _n}{\lambda _{n+1}}+\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\right) (p_n - y_n) \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( \left( y_n- y_{n-1}\right) +\frac{(4-\gamma \bar{\beta })}{2}(y_{n-1} - p_{n-1})\right) , \end{aligned}$$

which equals the \(y_{n+1}\) update in (32).

Finally, using the \(y_n\) update equation in (26), i.e., \(\frac{\lambda _0}{\lambda _n}x_n=y_n-\frac{\lambda _n-\lambda _0}{\lambda _n}y_{n-1}-u_n\), and the expression for \(u_n\) from (35), we conclude that

$$\begin{aligned}&p_n-x_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(x_n-p_{n-1}) - \frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0}u_n\\&\qquad \qquad =p_n-y_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(y_{n-1}-p_{n-1}) + \left( 1-\frac{2-\gamma \bar{\beta }-2\lambda _0}{4-\gamma \bar{\beta }-2\lambda _0}\right) u_n\\&\qquad \qquad =p_n-y_n+\frac{\lambda _n-\lambda _0}{\lambda _n}(y_{n-1}-p_{n-1}) + \frac{2}{4-\gamma \bar{\beta }-2\lambda _0} u_n \\&\qquad \qquad =p_n-y_n. \end{aligned}$$

This completes the proof.\(\square \)

One of the special cases previewed in Section 3.1 is obtained from (32) by letting \(\lambda _n=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) ^e(1+n)^{e}\) for all \(n\in \mathbb {N}\). The resulting algorithm is numerically evaluated in Section 6 and enjoys the following convergence properties.

Corollary 1

Suppose that Assumptions 1 and 4 hold with \(\lambda _0=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) ^e\) and let \(f(n)=(1+n)^{e}\) with \(e\in [0,1]\). Then the following hold for (32):

  1. (i)

    if \(e>0\), then

    $$\begin{aligned} {\left\| \frac{1}{\gamma }(p_n-y_n) \right\| }_M^2\le \frac{2\Vert y_0-x^\star \Vert _M^2}{\gamma ^2(4-\gamma \bar{\beta }-2\lambda _0)\lambda _0(1+n)^{2e}}; \end{aligned}$$
  2. (ii)

    if \(\bar{\beta }>\beta \) and \(e<1\), then \( {\left( \Vert p_n-y_n\Vert ^2 \right) }_{n\in \mathbb {N}}\) is summable;

  3. (iii)

    if \(e=0\), the algorithm reduces to forward–backward splitting and \(p_n\rightharpoonup x^\star \in \text {zer}(A+C)\).

Corollary 1 (i) states that \( {\left( \Vert p_n-y_n\Vert ^2 \right) }_{n\in \mathbb {N}}\) converges as \(\mathcal {O}\left( \frac{1}{n^{2e}}\right) \). When \(\bar{\beta }>\beta \) and \(e<1\), Corollary 1 (ii) gives that \( {\left( \Vert p_n-y_n\Vert ^2 \right) }_{n\in \mathbb {N}}\) converges as \(\mathcal {O}\left( \frac{1}{n}\right) \) due to its summability. This gives a combined \(\mathcal {O}\left( \min \left( \frac{1}{n},\frac{1}{n^{2e}}\right) \right) \) convergence rate and implies that the convergence rate can be tuned by selecting \(e\in [0,1]\). Letting \(e=1\) implies that our algorithm, as we will see in Section 5.2.1, reduces to the accelerated proximal point method and the Halpern iteration, which converge as \(O\left( \frac{1}{n^2}\right) \).
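A minimal Python sketch of (32) with \(\lambda _n=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) ^e(1+n)^e\) as above reads as follows; the function name accelerated_fb and the callables resolvent and forward are illustrative assumptions of ours.

```python
import numpy as np

def accelerated_fb(resolvent, forward, x0, gamma, beta_bar, e, n_iters=100):
    """Sketch of algorithm (32) with lambda_n from (37):
    lambda_n = (1 - gamma*beta_bar/4)**e * (1 + n)**e.
    e = 0 gives standard FB splitting, e = 1 the accelerated method."""
    lam = lambda n: (1 - gamma * beta_bar / 4) ** e * (1 + n) ** e
    lam0 = lam(0)
    c = (4 - gamma * beta_bar - 2 * lam0) / 2
    y = np.asarray(x0, dtype=float)
    y_prev, p_prev = y.copy(), y.copy()  # y_{-1} = p_{-1} = x_0
    for n in range(n_iters):
        p = resolvent(forward(y))
        y_next = (y
                  + (lam0 * lam(n) / lam(n + 1)
                     + (lam(n + 1) - lam0) / lam(n + 1) * c) * (p - y)
                  + (lam(n) - lam0) / lam(n + 1)
                  * ((y - y_prev)
                     + (4 - gamma * beta_bar) / 2 * (y_prev - p_prev)))
        y_prev, p_prev, y = y, p, y_next
    return p, y
```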

5.2.1 Accelerated proximal point method and Halpern iteration

Letting \(f(n)=1+n\) and \(\lambda _0=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) \), so that \(\lambda _n=\left( 1-\frac{\gamma \bar{\beta }}{4}\right) (1+n)\), the \(y_{n+1}\) update of (32) satisfies

$$\begin{aligned} y_{n+1}&=y_{n} + \left( \frac{\lambda _0\lambda _n}{\lambda _{n+1}}+\frac{\lambda _{n+1}-\lambda _0}{\lambda _{n+1}}\frac{(4-\gamma \bar{\beta }-2\lambda _0)}{2}\right) (p_n - y_n) \\&\quad +\frac{\lambda _n-\lambda _0}{\lambda _{n+1}}\left( \left( y_n- y_{n-1}\right) +\frac{(4-\gamma \bar{\beta })}{2}(y_{n-1} - p_{n-1})\right) \\&=y_{n} + \left( 1-\frac{\gamma \bar{\beta }}{4}\right) \left( \frac{n+1}{n+2}+\frac{n+1}{n+2}\right) (p_n - y_n) \\&\quad +\frac{n}{n+2}\left( \left( y_n- y_{n-1}\right) +\frac{(4-\gamma \bar{\beta })}{2}(y_{n-1} - p_{n-1})\right) \\&=y_{n} + \frac{4-\gamma \bar{\beta }}{2}\frac{n+1}{n+2}(p_n - y_n) \\&\quad +\frac{n}{n+2}\left( \left( y_n- y_{n-1}\right) +\frac{(4-\gamma \bar{\beta })}{2}(y_{n-1} - p_{n-1})\right) \\&= \frac{\gamma \bar{\beta }(1+n)}{4+2n}y_n + \frac{n(2-\gamma \bar{\beta })}{4+2n}y_{n-1} + \frac{(1+n)(4-\gamma \bar{\beta })}{4+2n}p_n - \frac{n(4-\gamma \bar{\beta })}{4+2n}p_{n-1} \end{aligned}$$

and algorithm (32) becomes

$$\begin{aligned} p_n&= {\left( M+\gamma A \right) }^{-1} {\left( M - \gamma C \right) }y_n,\\ y_{n+1}&= \frac{\gamma \bar{\beta }(1+n)}{4+2n}y_n + \frac{n(2-\gamma \bar{\beta })}{4+2n}y_{n-1} + \frac{(1+n)(4-\gamma \bar{\beta })}{4+2n}p_n - \frac{n(4-\gamma \bar{\beta })}{4+2n}p_{n-1}. \end{aligned}$$
(36)

From Corollary 1, since \(e=1\) and by letting \(\beta =\bar{\beta }\), we conclude that this algorithm converges as

$$\begin{aligned} {\left\| \frac{1}{\gamma }(p_n-y_n) \right\| }_M^2\le \frac{16\Vert y_0-x^\star \Vert _M^2}{\gamma ^2\left( 4-\gamma \beta \right) ^2(1+n)^{2}}. \end{aligned}$$

By letting \(C=0\) and consequently \(\beta =0\), we arrive at the accelerated proximal point method in [21], and the \(O(\frac{1}{n^2})\) convergence rate result found in [21] is recovered by Corollary 1.
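To make the \(C=0\) specialization concrete, here is a minimal Python sketch of (36) in that case (\(\bar{\beta }=0\)), i.e., the accelerated proximal point method; the name resolvent, evaluating \( {\left( M+\gamma A \right) }^{-1}M\), is an illustrative assumption of ours.

```python
import numpy as np

def accelerated_ppm(resolvent, y0, n_iters=100):
    """Sketch of (36) with C = 0 (gamma*beta_bar = 0), where the update reduces to
    y_{n+1} = n/(n+2)*y_{n-1} + 2*(1+n)/(n+2)*p_n - 2*n/(n+2)*p_{n-1}."""
    y = np.asarray(y0, dtype=float)
    y_prev, p_prev = y.copy(), y.copy()
    for n in range(n_iters):
        p = resolvent(y)  # p_n = (M + gamma*A)^{-1} M y_n
        y_next = (n * y_prev + 2 * (1 + n) * p - 2 * n * p_prev) / (n + 2)
        y_prev, p_prev, y = y, p, y_next
    return p, y
```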

Fig. 1

We evaluate the algorithm in (32) with \(\lambda _n\) in (37) and the different choices of e specified in the legend to the right. The upper figure shows the first 100 \(p_n\)-iterates for the different e and the lower figure shows the distance to the unique solution. The algorithm with \(e=0\) is standard forward–backward splitting (here, only the backward part is used) and the algorithm with \(e=1\) is the accelerated proximal point method in [21]. For small e, we get a strictly decreasing distance to the solution, while large e gives an oscillatory behaviour. Our new algorithms with e chosen around the middle of the allowed range strike a good balance and achieve superior performance compared to the previously known methods

Table 1 We report the number of iterations to reach accuracy \(\Vert p_n-x^\star \Vert \le 10^{-6}\) for the algorithm in (32) with \(\lambda _n\) in (37) and different choices of e

If we let \( A = 0 \), \(\bar{\beta }= \beta \), and \(\gamma \beta = 2\), the forward–backward mapping in (36) satisfies

$$\begin{aligned} p_n = (M-\frac{2}{\beta } C)y_n \end{aligned}$$

where \(N:=M-\frac{2}{\beta } C\) is nonexpansive in the \(\Vert \cdot \Vert _M\) norm. This implies that the algorithm aims at solving the fixed-point equation \(y = Ny\) with nonexpansive N by iterating

$$\begin{aligned} p_n&= Ny_n\\ y_{n+1}&= \frac{1+n}{2+n}y_n + \frac{1+n}{2+n}p_n - \frac{n}{2+n}p_{n-1}, \end{aligned}$$

which is the Halpern iteration studied in [23]. This is seen by recursively inserting \(y_n\) into the \(y_{n+1}\) update to get

$$\begin{aligned} y_{n+1} = \frac{1}{n+2}y_{0} + \frac{n+1}{n+2}Ny_n, \end{aligned}$$

which is the formulation used in [23]. From Corollary 1, we conclude that this iteration converges as

$$\begin{aligned} {\left\| p_n-y_n \right\| }_M^2\le \frac{4\Vert y_0-x^\star \Vert _M^2}{(1+n)^{2}}, \end{aligned}$$

which recovers the convergence result in [23]. Interestingly, although the convergence rate bound is optimized by this choice of \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\), the resulting algorithm does not perform very well in practice. Other choices of \( {\left( \lambda _n \right) }_{n\in \mathbb {N}}\) with slower rate guarantees can give significantly better practical performance, as demonstrated in Section 6.
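A sketch of this anchored form in Python, assuming the nonexpansive map is supplied as a callable N:

```python
import numpy as np

def halpern(N, y0, n_iters=100):
    """Sketch of the Halpern iteration in the anchored form
    y_{n+1} = y_0/(n+2) + (n+1)/(n+2) * N(y_n)."""
    anchor = np.asarray(y0, dtype=float)
    y = anchor.copy()
    for n in range(n_iters):
        y = anchor / (n + 2) + (n + 1) / (n + 2) * N(y)
    return y
```

The weight \(\frac{1}{n+2}\) placed on the anchor \(y_0\) decays as the iteration proceeds, consistent with the \(O(1/n^2)\) bound on the squared residual stated above.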

Fig. 2

We evaluate the algorithm in (30) with \(\kappa \in \{-0.9,-0.8,\ldots ,0.9\}\) as specified in the legend to the right. The upper figure shows the first 100 \(p_n\)-iterates for the different \(\kappa \) and the lower figure shows the distance to the unique solution. The performance is best for \(\kappa \in \{0.8,0.9\}\) and many choices of \(\kappa \in (0,1)\) outperform the standard forward–backward splitting method that is obtained by letting \(\kappa = 0\)

Table 2 We report the number of iterations to reach accuracy \(\Vert p_n-x^\star \Vert \le 10^{-6}\) for the algorithm in (30) with \(\kappa \in \{-0.9,-0.8,\ldots ,0.9\}\)

6 Numerical examples

In this section, we apply our proposed algorithms to the problem \(0\in Az\), where \(z = (x,y)\) and

$$\begin{aligned} Az = \begin{bmatrix}0 & -1\\ 1 & 0\end{bmatrix}z = (-y,x) \end{aligned}$$

for all \(z\in \mathbb {R}^2\). The operator \(A:\mathbb {R}^2\rightarrow \mathbb {R}^2\) is skew-symmetric and the monotone inclusion problem \(0\in Az\) can be interpreted as an optimality condition for the minimax problem

$$\begin{aligned} \max _{y\in \mathbb {R}}\min _{x\in \mathbb {R}} xy \end{aligned}$$

with unique solution \(x = y = 0\). We will in particular evaluate the algorithm in (30) with different choices of \(\kappa \in (-1,1)\) and the algorithm in (32) with

$$\begin{aligned} \lambda _n = \left( 1-\frac{\gamma \bar{\beta }}{4}\right) ^e(1+n)^e \end{aligned}$$
(37)

for all \(n\in \mathbb {N}\) and \(e\in [0,1]\). According to Propositions 3 and 5, (30) with \(\kappa \in (-1,1)\) and (32) with \(\lambda _n\) in (37) and \(e = 0\) (corresponding to the standard FB method) converge weakly to a solution of the inclusion problem. As per Corollary 1, (32) with \(\lambda _n\) in (37) and \(e\in (0,1]\) converges in squared norm of the fixed-point residual as \(O\left( \frac{1}{n^{2e}}\right) \) and, when \(e<1\) and \(\bar{\beta }>\beta \), additionally as \(O\left( \frac{1}{n}\right) \).

For all our experiments, the parameters \(\gamma = 0.1\) and \(\bar{\beta }= 0.001\) (which is feasible since \(C = 0\) and therefore \(\beta = 0\)) are used, with starting points \(y_{-1} = p_{-1} = y_0 = (3,3)\) for (32) and \(x_{-1} = p_{-1} = x_0 = (3,3)\) for (30).
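The following Python sketch reproduces this setup; since A is linear and \(M = {\textrm{Id}}\) in the experiments below, the resolvent is a precomputed \(2\times 2\) linear solve, and the loop shows the standard forward–backward baseline (\(e=0\), equivalently \(\kappa =0\)). The earlier sketches momentum_fb and accelerated_fb (our illustrative names) plug into the same resolvent/forward pair.

```python
import numpy as np

gamma, beta_bar = 0.1, 0.001              # parameters used in this section
A = np.array([[0.0, -1.0],
              [1.0, 0.0]])                # skew-symmetric operator
J = np.linalg.inv(np.eye(2) + gamma * A)  # (Id + gamma*A)^{-1} with M = Id
resolvent = lambda w: J @ w
forward = lambda y: y                     # M - gamma*C = Id since C = 0

# Baseline: standard forward-backward splitting, i.e., e = 0 / kappa = 0.
p = np.array([3.0, 3.0])
for n in range(1, 100_000):
    p = resolvent(forward(p))
    if np.linalg.norm(p) <= 1e-6:         # x_star = (0, 0) is the unique solution
        print(f"reached tolerance after {n} iterations")
        break
```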

In Fig. 1 and Table 1, we report numerical results for the algorithm in (32) with \(\lambda _n\) in (37) and \(e\in \{0,0.1,\ldots ,1\}\) and \(M = {\textrm{Id}}\). The choice \(e = 0\) gives standard forward–backward splitting and \(e = 1\) gives the accelerated proximal point method in [21]. The other choices of e give rise to new algorithms. The figure shows that the distance to the unique solution is over-damped for small values of e and under-damped for large values of e. There is a sweet spot in the middle that has the right level of damping and performs significantly better than the previously known special cases with \(e = 0\) and \(e = 1\), at least for this problem.

Fig. 3

We evaluate the algorithm in (30) with \(\kappa \in \{0.8,0.82,\ldots ,0.9\}\) as specified in the legend to the right. The upper figure shows the first 100 \(p_n\)-iterates for the different \(\kappa \) and the lower figure shows the distance to the unique solution. All these choices perform well and we go from a non-oscillatory behaviour to an oscillatory behaviour within this range of \(\kappa \)

Table 3 We report the number of iterations to reach accuracy \(\Vert p_n-x^\star \Vert \le 10^{-6}\) for the algorithm in (30) with \(\kappa \in \{0.8,0.82,\ldots ,0.9\}\)

In Fig. 2 and Table 2, we report numerical results for the algorithm in (30) with \(\kappa \in \{-0.9,-0.8,\ldots ,0.9\}\) and \(M = \textrm{Id}\). The theory predicts sequence convergence towards a solution of the problem for all \(\kappa \in (-1,1)\). The choice \(\kappa = 0\) gives rise to standard forward–backward splitting and all other values of \(\kappa \) define new algorithms. The figure reveals that the performance is best for \(\kappa \in [0.8,0.9]\), significantly better than standard FB splitting with \(\kappa = 0\).

In Fig. 3 and Table 3, we provide numerical results over a finer grid of the best-performing \(\kappa \). We set the range to \(\kappa \in [0.8,0.9]\) and use a spacing of 0.02. We see that for \(\kappa = 0.8\) and \(\kappa = 0.82\), the distance to the solution is non-oscillatory, while it oscillates for greater values of \(\kappa \). All these choices of \(\kappa \) perform very well.

7 Deferred results and proofs

In what follows, we present some results that have been used in the previous sections, along with the proof of Theorem 1, which was deferred to this section. Prior to that, we define the auxiliary parameter

$$\begin{aligned} \theta ^{\prime }_n:= {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \end{aligned}$$
(38)

which frequently appears throughout this section.

We begin by establishing some identities between the parameters defined in Algorithm 1. These identities are used several times in the proof of Theorem 1.

Proposition 6

Consider the auxiliary parameters defined in Step 2 of Algorithm 1. Then, for all \(n\in \mathbb {N}\), the following identities hold

  1. (i)

    \(\theta _n = {\left( 2-\gamma _n\bar{\beta } \right) }\bar{\theta }_n+\hat{\theta }_n\);

  2. (ii)

    \(\theta _n = 2\bar{\theta }_n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)\);

  3. (iii)

    \(\lambda _n^2\theta _n = \hat{\theta }_{n}(\lambda _{n}+\mu _{n})-2\bar{\theta }_n^2\).

Proof

For Proposition 6 (i), from the definitions of \(\bar{\theta }_n\) and \(\hat{\theta }_n\), we have

$$\begin{aligned} {\left( 2-\gamma _n\bar{\beta } \right) }\bar{\theta }_n+\hat{\theta }_n&= {\left( 2-\gamma _n\bar{\beta } \right) } {\left( \lambda _n+\mu _n-\lambda _n^2 \right) }+ {\left( 2\lambda _n+2\mu _n-\gamma _n\bar{\beta }\lambda _n^2 \right) }\\&= {\left( 2-\gamma _n\bar{\beta } \right) } {\left( \lambda _n+\mu _n \right) }-2\lambda _n^2+\gamma _n\bar{\beta }\lambda _n^2+ {\left( 2\lambda _n+2\mu _n-\gamma _n\bar{\beta }\lambda _n^2 \right) }\\&= {\left( 2-\gamma _n\bar{\beta } \right) } {\left( \lambda _n+\mu _n \right) }-2\lambda _n^2+2 {\left( \lambda _n+\mu _n \right) }\\&= {\left( 4-\gamma _n\bar{\beta } \right) } {\left( \lambda _n+\mu _n \right) }-2\lambda _n^2\\&= \theta _n, \end{aligned}$$

which holds by definition of \(\theta _n\) in Algorithm 1. For Proposition 6 (ii) we have

$$\begin{aligned} 2\bar{\theta }_n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)&= 2 {\left( \lambda _n+\mu _n-\lambda _n^2 \right) }+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)\\&= -2\lambda _n^2+ {\left( 4-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)\\&= \theta _n. \end{aligned}$$

For Proposition 6 (iii), after moving all terms to the left-hand side of the equality, we get

$$\begin{aligned} \lambda _n^2\theta _n+2\bar{\theta }_n^2-\hat{\theta }_{n}(\lambda _{n}+\mu _{n})&= \lambda _n^2 {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\bar{\theta }_n+\hat{\theta }_n \right) }+2\bar{\theta }_n^2-\hat{\theta }_{n}(\lambda _{n}+\mu _{n})\\&= \bar{\theta }_n {\left( \lambda _n^2 {\left( 2-\gamma _n\bar{\beta } \right) }+2\bar{\theta }_n \right) }-\hat{\theta }_n(\lambda _n+\mu _n-\lambda _n^2)\\&= \bar{\theta }_n {\left( 2\lambda _n+2\mu _n-\gamma _n\bar{\beta }\lambda _n^2 \right) }-\hat{\theta }_n\bar{\theta }_n = \bar{\theta }_n\hat{\theta }_n-\hat{\theta }_n\bar{\theta }_n = 0, \end{aligned}$$

where in the first equality \(\theta _n\) is substituted using Proposition 6 (i), and in the second and third equalities the definitions of \(\bar{\theta }_n\) and \(\hat{\theta }_n\) are used, respectively.\(\square \)

We note from Proposition 6 (iii) that the assumption \(\theta _n>0\) in Algorithm 1 implies \(\hat{\theta }_{n}>0\).
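These identities are elementary but easy to get wrong; the following SymPy sketch, with the definitions of \(\theta _n\), \(\bar{\theta }_n\), and \(\hat{\theta }_n\) read off from the proof above, checks all three symbolically.

```python
import sympy as sp

lam, mu, g, b = sp.symbols('lambda_n mu_n gamma_n betabar', positive=True)

theta = (4 - g * b) * (lam + mu) - 2 * lam**2   # theta_n
theta_bar = lam + mu - lam**2                    # bar(theta)_n
theta_hat = 2 * lam + 2 * mu - g * b * lam**2    # hat(theta)_n

# Proposition 6 (i)-(iii) each reduce to the zero polynomial:
assert sp.expand((2 - g * b) * theta_bar + theta_hat - theta) == 0
assert sp.expand(2 * theta_bar + (2 - g * b) * (lam + mu) - theta) == 0
assert sp.expand(lam**2 * theta - theta_hat * (lam + mu) + 2 * theta_bar**2) == 0
```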

The next lemma provides alternative expressions for the term inside the first norm in ().

Lemma 2

Suppose that Assumption 1 holds and consider the sequences generated by Algorithm 1. Then, for all \(n\in \mathbb {N}\), the following expressions represent the same vector:

  1. (i)

    \(p_n - (1-\alpha _n)x_n-\alpha _np_{n-1} +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n\);

  2. (ii)

    \(p_n-\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1}\);

  3. (iii)

    \(\frac{1}{\lambda _n} {\left( x_{n+1}-x_n \right) }+\frac{\tilde{\theta }_n}{\hat{\theta }_n}u_n+\frac{(2-\gamma _n\bar{\beta })(\lambda _n+\mu _n)}{\theta _n}v_n\).

Proof

We first show that Lemma 2 (ii) represents the same vector as Lemma 2 (i):

$$\begin{aligned} p_n&-\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1}\\&= p_n - \frac{2\bar{\theta }_n}{\theta _n}(z_n-\bar{\alpha }_nz_{n-1}) + \frac{\tilde{\theta }_n}{\theta _n}(y_n-\alpha _ny_{n-1})-\frac{2\lambda _n}{\theta _n}x_n -\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= p_n - \frac{2\bar{\theta }_n}{\theta _n} {\left( (1-\alpha _n)x_n + (\alpha _n-\bar{\alpha }_n)p_{n-1} + \frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}u_n + v_n \right) }\\&\qquad + \frac{\tilde{\theta }_n}{\theta _n}((1-\alpha _n)x_n + u_n)-\frac{2\lambda _n}{\theta _n}x_n -\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= p_n - \frac{2(1-\alpha _n)\bar{\theta }_n-(1-\alpha _n)\tilde{\theta }_n+2\lambda _n}{\theta _n}x_n - \frac{\theta ^{\prime }_n+2\bar{\theta }_n(\alpha _n-\bar{\alpha }_n)}{\theta _n}p_{n-1}\\&\qquad + \frac{\hat{\theta }_n\tilde{\theta }_n-2\bar{\theta }_n^2\gamma _n\bar{\beta }}{\theta _n\hat{\theta }_n}u_n-\frac{2\bar{\theta }_n}{\theta _n}v_n\\&= p_n - (1-\alpha _n)x_n-\alpha _np_{n-1} +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \end{aligned}$$

where the coefficients of the last equality are found as follows. The numerator of the coefficient of \(x_n\) reads

$$\begin{aligned} 2(1&-\alpha _n)\bar{\theta }_n-(1-\alpha _n)\tilde{\theta }_n+2\lambda _n\\&= (1-\alpha _n) {\left( \theta _n- {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n) \right) }\\&\qquad -(1-\alpha _n)\gamma _n\bar{\beta }(\lambda _n+\mu _n)+2\lambda _n \\&= (1-\alpha _n)\theta _n-2(1-\alpha _n)(\lambda _n+\mu _n)+2\lambda _n\\&= (1-\alpha _n)\theta _n-2\frac{\lambda _n}{\lambda _n+\mu _n}(\lambda _n+\mu _n)+2\lambda _n = (1-\alpha _n)\theta _n \end{aligned}$$
(39)

where in the first equality, \(\bar{\theta }_n\) is substituted from Proposition 6 (ii), and \(\tilde{\theta }_n\) and \(\alpha _n\) are substituted by their definitions in Algorithm 1. The numerator of the coefficient of \(p_{n-1}\) is

$$\begin{aligned} \theta ^{\prime }_n+2\bar{\theta }_n(\alpha _n-\bar{\alpha }_n)&= {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n+2\bar{\theta }_n(\alpha _n-\bar{\alpha }_n)\\&= {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\theta }_n\alpha _n\\&= {\left( 2-\gamma _n\bar{\beta } \right) }\alpha _n(\lambda _n+\mu _n)+2\bar{\theta }_n\alpha _n = \alpha _n\theta _n \end{aligned}$$
(40)

where in the first equality (38) is used, the third equality is obtained using the definition of \(\alpha _n\), and Proposition 6 (ii) is utilized in the last equality. For the numerator of the coefficient of \(u_n\) we get

$$\begin{aligned} \hat{\theta }_n\tilde{\theta }_n-2\bar{\theta }_n^2\gamma _n\bar{\beta }&= \hat{\theta }_n\gamma _n\bar{\beta }(\lambda _n+\mu _n)-2\bar{\theta }_n^2\gamma _n\bar{\beta }\\&= \gamma _n\bar{\beta } {\left( \hat{\theta }_n(\lambda _n+\mu _n)-2\bar{\theta }_n^2 \right) } = \gamma _n\bar{\beta }\lambda _n^2\theta _n \end{aligned}$$
(41)

where the first equality is obtained by substitution of the definition of \(\tilde{\theta }_n\) from Algorithm 1, and in the last equality Proposition 6 (iii) is used.

Now, we show that Lemma 2 (ii) and (iii) represent the same vector. Starting from Lemma 2 (ii), we have

$$\begin{aligned} p_n&-\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1}\\&= p_n - \frac{2\bar{\theta }_n}{\theta _n}(z_n-\bar{\alpha }_nz_{n-1}) + \frac{\tilde{\theta }_n}{\theta _n}(y_n-\alpha _ny_{n-1})-\frac{2\lambda _n}{\theta _n}x_n -\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= \frac{1}{\lambda _n}(x_{n+1}-x_{n}) + z_n + \bar{\alpha }_n(p_{n-1}-z_{n-1}) - \frac{2\bar{\theta }_n}{\theta _n}(z_n-\bar{\alpha }_nz_{n-1}) \\&\qquad + \frac{\tilde{\theta }_n}{\theta _n}(y_n-\alpha _ny_{n-1})-\frac{2\lambda _n}{\theta _n}x_n -\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= \frac{1}{\lambda _n}(x_{n+1}-x_{n})+\frac{\theta _n-2\bar{\theta }_n}{\theta _n} {\left( z_n-\bar{\alpha }_nz_{n-1} \right) } + \frac{\tilde{\theta }_n}{\theta _n}(y_n-\alpha _ny_{n-1})-\frac{2\lambda _n}{\theta _n}x_n\\&\qquad + \frac{\bar{\alpha }_n\theta _n-\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= \frac{1}{\lambda _n}(x_{n+1}-x_{n})+\frac{\theta _n-2\bar{\theta }_n}{\theta _n} {\left( (1-\alpha _n)x_n + (\alpha _n-\bar{\alpha }_n)p_{n-1} + \frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}u_n + v_n \right) }\\&\qquad + \frac{\tilde{\theta }_n}{\theta _n} {\left( (1-\alpha _n)x_n + u_n \right) }-\frac{2\lambda _n}{\theta _n}x_n + \frac{\bar{\alpha }_n\theta _n-\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= \frac{1}{\lambda _n}(x_{n+1}-x_{n}) + \frac{ {\left( \theta _n-2\bar{\theta }_n \right) }\bar{\theta }_n\gamma _n\bar{\beta }+\tilde{\theta }_n\hat{\theta }_n}{\theta _n\hat{\theta }_n}u_n + \frac{\theta _n-2\bar{\theta }_n}{\theta _n}v_n\\&\qquad + \frac{ {\left( \theta _n-2\bar{\theta }_n \right) }(1-\alpha _n)+\tilde{\theta }_n(1-\alpha _n)-2\lambda _n}{\theta _n}x_n + \frac{(\theta _n-2\bar{\theta }_n)(\alpha _n-\bar{\alpha }_n)+\bar{\alpha }_n\theta _n-\theta ^{\prime }_n}{\theta _n}p_{n-1}\\&= \frac{1}{\lambda _n} {\left( x_{n+1}-x_n \right) }+\frac{\tilde{\theta }_n}{\hat{\theta }_n}u_n+\frac{(2-\gamma _n\bar{\beta })(\lambda _n+\mu _n)}{\theta _n}v_n. \end{aligned}$$

In the second equality, the definition of \(x_{n+1}\) in Step 8 of Algorithm 1 is used. In the fourth equality, the definition of \(y_n\) in Step 5 and the definition of \(z_n\) in Step 6 of Algorithm 1 are used. In the last equality, the coefficient of \(x_n\) is found to be \(-\frac{1}{\lambda _n}\) by (39), the coefficient of \(p_{n-1}\) is zero by (40), the coefficient of \(v_n\) is found by Proposition 6 (ii), and for the coefficient of \(u_n\) we have

$$\begin{aligned} {\left( \theta _n-2\bar{\theta }_n \right) }\bar{\theta }_n\gamma _n\bar{\beta }+\tilde{\theta }_n\hat{\theta }_n&= \theta _n\bar{\theta }_n\gamma _n\bar{\beta }-2\bar{\theta }_n^2\gamma _n\bar{\beta }+\tilde{\theta }_n\hat{\theta }_n\\&= \theta _n\bar{\theta }_n\gamma _n\bar{\beta }+ \gamma _n\bar{\beta }\lambda _n^2\theta _n\\&= \theta _n\gamma _n\bar{\beta } {\left( \bar{\theta }_n+\lambda _n^2 \right) } = \theta _n\gamma _n\bar{\beta } {\left( \lambda _n+\mu _n \right) } \end{aligned}$$

where the second equality is obtained by (41), and in the last equality the definition of \(\bar{\theta }_n\) is used. This concludes the proof.\(\square \)

7.1 Proof of Theorem 1

Proof

The only (non-trivial) divisors that appear in the proof (as well as in Algorithm 1) are \(\theta _n\) and \(\hat{\theta }_{n}\). In Algorithm 1, we assume \(\theta _n>0\) for all \(n \in \mathbb {N}\), which by Proposition 6 (iii) (and since \(\lambda _n > 0\) and \(\mu _n \ge 0\)) implies that \(\hat{\theta }_{n}>0\). Therefore, there are no divisions by zero.

Let us define the following quantity

$$\begin{aligned} \Delta _n :=V_{n+1} - V_n + 2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n + \ell _{n-1} - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2 + \frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \end{aligned}$$
(42)

and prove the result by showing that, for all \(n\in \mathbb {N}\), it is identically zero. By substituting \(V_{n+1}\) and \(V_n\) in (42), we get

$$\begin{aligned} \Delta _n&= {\left\| x_{n+1}-x^\star \right\| }_M^2 + \ell _{n} + 2\lambda _{n+1}\gamma _{n+1}\alpha _{n+1}\phi _n\\&\quad - {\left\| x_{n}-x^\star \right\| }_M^2 - \ell _{n-1} - 2\lambda _{n}\gamma _{n}\alpha _{n}\phi _{n-1}\\&\quad + 2\gamma _n(\lambda _n-\bar{\alpha }_{n+1}\lambda _{n+1})\phi _n + \ell _{n-1} - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \\&= {\left\| x_{n+1}-x^\star \right\| }_M^2 - {\left\| x_{n}-x^\star \right\| }_M^2 + \ell _{n} - 2\lambda _{n}\gamma _{n}\alpha _{n}\phi _{n-1} + 2\gamma _n\lambda _n\phi _n \nonumber \\&\quad - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) , \end{aligned}$$

where in the last equality we used \(\gamma _n\bar{\alpha }_{n+1} = \gamma _{n+1}\alpha _{n+1}\). Next, substituting \(\ell _{n}\) from (), and \(\phi _{n-1}\) and \(\phi _n\) from (8) on the right-hand side of the last equality above, yields

$$\begin{aligned} \Delta _n&= {\left\| x_{n+1}-x^\star \right\| }_M^2 - {\left\| x_{n}-x^\star \right\| }_M^2\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2 \nonumber \\&\quad + 2\mu _{n}\gamma _{n} {\left\langle \frac{z_{n}-p_{n}}{\gamma _{n}}-\frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2\\&\quad - 2\lambda _{n}\gamma _{n}\alpha _{n}\left( {\left\langle \frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_{n-1}-x^\star \right\rangle }_M + \frac{\bar{\beta }}{4} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\right) \\&\quad + 2\lambda _n\gamma _n\left( {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M + \frac{\bar{\beta }}{4} {\left\| y_n-p_n \right\| }_M^2\right) \\&\quad - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \\&= {\left\| x_{n+1}-x^\star \right\| }_M^2 - {\left\| x_{n}-x^\star \right\| }_M^2 + 2\lambda _n\gamma _n {\left\langle \frac{z_n-p_n}{\gamma _n}, p_n-x^\star \right\rangle }_M\\&\quad - 2\lambda _{n}\gamma _{n}\alpha _{n} {\left\langle \frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_{n-1}-p_n+p_n-x^\star \right\rangle }_M \\&\quad + 2\mu _{n}\gamma _{n} {\left\langle \frac{z_{n}-p_{n}}{\gamma _{n}}-\frac{z_{n-1}-p_{n-1}}{\gamma _{n-1}}, p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2 \\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \\&= {\left\| x_{n+1}-x^\star \right\| }_M^2 - {\left\| x_{n}-x^\star \right\| }_M^2\\&\quad + 2 {\left\langle \lambda _n(z_n-p_n)-\bar{\alpha }_n\lambda _n(z_{n-1}-p_{n-1}), p_n-x^\star \right\rangle }_M \\&\quad + 2 {\left\langle \mu _{n}(z_{n}-p_{n})+(\bar{\alpha }_n\lambda _n-\frac{\gamma _n}{\gamma _{n-1}}\mu _n)(z_{n-1}-p_{n-1}), p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) . \end{aligned}$$

We define

$$\begin{aligned} \omega _n :=\bar{\alpha }_n\lambda _n-\frac{\gamma _n}{\gamma _{n-1}}\mu _n \end{aligned}$$
(43)

and substitute it into the last equality above; also, from Step 8 of Algorithm 1, we replace \(\lambda _n(z_n-p_n)-\bar{\alpha }_n\lambda _n(z_{n-1}-p_{n-1})\) by \(x_n-x_{n+1}\). Then, we get

$$\begin{aligned} \Delta _n&= {\left\| x_{n+1}-x^\star \right\| }_M^2 - {\left\| x_{n}-x^\star \right\| }_M^2 + 2 {\left\langle x_n-x_{n+1}, p_n-x^\star \right\rangle }_M\\&\quad + 2 {\left\langle \mu _{n}(z_{n}-p_{n})+\omega _n(z_{n-1}-p_{n-1}), p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \\&= {\left\| x_{n+1}-p_n \right\| }_M^2 - {\left\| x_{n}-p_n \right\| }_M^2\\&\quad + 2 {\left\langle \mu _{n}(z_{n}-p_{n})+\omega _n(z_{n-1}-p_{n-1}), p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \end{aligned}$$

where in the last equality we used the identity \(2 {\left\langle a-b, c-d \right\rangle }_M+ {\left\| b-d \right\| }_M^2- {\left\| a-d \right\| }_M^2 = {\left\| b-c \right\| }_M^2- {\left\| a-c \right\| }_M^2\) for all \(a,b,c,d\in \mathcal {H}\). Now, inserting \(x_{n+1}\) from Step 8 of Algorithm 1 yields

$$\begin{aligned} \Delta _n&= {\left\| x_n-p_n+\lambda _n(p_n - z_n) + \lambda _n\bar{\alpha }_n(z_{n-1}-p_{n-1}) \right\| }_M^2 - {\left\| x_{n}-p_n \right\| }_M^2 \\&\quad + 2 {\left\langle \mu _{n}(z_{n}-p_{n})+\omega _n(z_{n-1}-p_{n-1}), p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) \\&= {\left\| \lambda _n(p_n-z_n)+\lambda _n\bar{\alpha }_n(z_{n-1}-p_{n-1}) \right\| }_M^2\\&\quad + 2 {\left\langle x_n-p_n, \lambda _n(p_n-z_n)+\lambda _n\bar{\alpha }_n(z_{n-1}-p_{n-1}) \right\rangle }_M \\&\quad + 2 {\left\langle \mu _{n}(z_{n}-p_{n})+\omega _n(z_{n-1}-p_{n-1}), p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| p_n-x_n+\alpha _n(x_n-p_{n-1}) +\frac{\gamma _n\bar{\beta }\lambda _n^2}{\hat{\theta }_n}u_n - \frac{2\bar{\theta }_n}{\theta _n}v_n \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - (\lambda _{n}+\mu _{n})\left( \frac{\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| u_{n} \right\| }_{M}^2+\frac{\hat{\theta }_{n}}{\theta _{n}} {\left\| v_{n} \right\| }_{M}^2\right) . \end{aligned}$$

Next, using Lemma 2 and Steps 5–6 of Algorithm 1, we express the terms involving \(u_n\) and \(v_n\) in terms of the iterates

$$\begin{aligned} \Delta _n&= {\left\| \lambda _n(p_n-z_n)+\lambda _n\bar{\alpha }_n(z_{n-1}-p_{n-1}) \right\| }_M^2\\&\quad + 2 {\left\langle x_n-p_n, \lambda _n(p_n-z_n)+\lambda _n\bar{\alpha }_n(z_{n-1}-p_{n-1}) \right\rangle }_M \\&\quad + 2 {\left\langle \mu _{n}(z_{n}-p_{n})+\omega _n(z_{n-1}-p_{n-1}), p_{n}-p_{n-1} \right\rangle }_M\\&\quad + \frac{\theta _n}{2} {\left\| p_n-\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n}-y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n-p_n \right\| }_M^2 - \frac{(\lambda _{n}+\mu _{n})\tilde{\theta }_{n}}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{(\lambda _{n}+\mu _{n})\hat{\theta }_{n}}{\theta _{n}}\Big \Vert z_n-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{40mm}+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2 \end{aligned}$$

where we used

$$\begin{aligned} v_n&= z_n-(1-\alpha _n)x_n+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}-\bar{\beta }\frac{\bar{\theta }_n\gamma _n}{\hat{\theta }_n}u_n\\&= z_n-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \end{aligned}$$

which is obtained by substituting \(u_n\) from Step 5 into Step 6 of Algorithm 1. Next, we expand the terms on the right-hand side of the last equality above that include \(p_n\). This yields

$$\begin{aligned} \Delta _n&= \lambda _n^2 {\left\| p_n \right\| }_M^2+2 {\left\langle p_n, \lambda _n^2 {\left( -z_n+\bar{\alpha }_nz_{n-1}-\bar{\alpha }_np_{n-1} \right) } \right\rangle }_M\\&\quad + \lambda _n^2 {\left\| z_n-\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2\\&\quad - 2\lambda _n {\left\| p_n \right\| }_M^2+2 {\left\langle p_n, \lambda _n {\left( z_n+x_n+\bar{\alpha }_np_{n-1}-\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle x_n, \lambda _n {\left( -z_n-\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad - 2\mu _n {\left\| p_n \right\| }_M^2+2 {\left\langle p_n, \mu _{n}z_{n}+(\mu _n-\omega _n)p_{n-1}+\omega _nz_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle \mu _{n}z_{n}+\omega _n(z_{n-1}-p_{n-1}), -p_{n-1} \right\rangle }_M + \frac{1}{2}\theta _n {\left\| p_n \right\| }_M^2\\&\quad + 2 {\left\langle p_n, \frac{\theta _n}{2} {\left( -\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n \right) } \right\rangle }_M\\&\quad + 2 {\left\langle p_n, \frac{\theta _n}{2} {\left( -\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| -\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n} \right\| }_M^2 + 2 {\left\langle p_n, \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left( -y_{n}-(p_{n-1}-y_{n-1}) \right) } \right\rangle }_M\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| -y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| p_n \right\| }_M^2 - 2 {\left\langle p_n, \frac{\lambda _n\gamma _n\bar{\beta }}{2}y_n \right\rangle }_M + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2\\&\quad - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}}\Big \Vert z_n-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{30mm} +(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2\\&= \left( \lambda _n^2-\frac{ {\left( 4-\gamma _n\bar{\beta } \right) } {\left( \lambda _n+\mu _n \right) }}{2}+\frac{1}{2}\theta _n\right) {\left\| p_n \right\| }_M^2 \\&\quad + 2 {\left\langle p_n, {\left( \lambda _n+\mu _n-\lambda _n^2-\bar{\theta }_n \right) }z_n + {\left( \frac{\tilde{\theta }_n}{2}-\frac{ {\left( \lambda _n+\mu _n \right) }\gamma _n\bar{\beta }}{2} \right) }y_n \right\rangle }_M\\&\quad + 2 {\left\langle p_n, {\left( \lambda _n-\lambda _n \right) }x_n + {\left( (1-\lambda _n)\lambda _n\bar{\alpha }_n+\mu _n-\omega _n-\frac{1}{2}\theta ^{\prime }_n-\frac{\mu _n\gamma _n\bar{\beta }}{2} \right) }p_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle p_n, {\left( \lambda _n^2\bar{\alpha }_n-\lambda _n\bar{\alpha }_n+\omega _n+\bar{\theta }_n\bar{\alpha }_n \right) }z_{n-1} + {\left( -\frac{1}{2}\tilde{\theta }_n\alpha _n+\frac{\mu _n\gamma _n\bar{\beta }}{2} \right) }y_{n-1} \right\rangle }_M\\&\quad + \lambda _n^2 {\left\| z_n-\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -z_n-\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle \mu _{n}z_{n}+\omega _n(z_{n-1}-p_{n-1}), -p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| -\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| -y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2 - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}}\Big \Vert {}z_n-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{30mm}+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2. \end{aligned}$$

All terms involving \(p_n\) in this expression are identically zero since their coefficients become zero. For most terms, this is straightforward to show by substituting \(\theta _n\), \(\bar{\theta }_n\), \(\tilde{\theta }_n\), \(\theta ^{\prime }_n\), \(\alpha _n\), and \(\bar{\alpha }_n\) defined in Algorithm 1 into the corresponding coefficients. We show this for two coefficients for which it is less obvious. For the coefficient of \( {\left\langle p_n, p_{n-1} \right\rangle }_M\) we have

$$\begin{aligned} -\lambda _n^2\bar{\alpha }_n&+\lambda _n\bar{\alpha }_n+\mu _n-\omega _n-\frac{1}{2}\theta ^{\prime }_n-\frac{\mu _n\gamma _n\bar{\beta }}{2}\\&= -\lambda _n^2\bar{\alpha }_n+\lambda _n\bar{\alpha }_n+\mu _n- {\left( \lambda _n\bar{\alpha }_n-\frac{\gamma _n}{\gamma _{n-1}}\mu _n \right) }-\frac{1}{2}\theta ^{\prime }_n-\frac{\mu _n\gamma _n\bar{\beta }}{2}\\&= -\lambda _n^2\bar{\alpha }_n+\frac{\gamma _n}{\gamma _{n-1}}\mu _n+\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n}{2}-\frac{1}{2}\theta ^{\prime }_n\\&= -\lambda _n^2\bar{\alpha }_n+(\lambda _n+\mu _n)\bar{\alpha }_n+\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n}{2}-\frac{1}{2}\theta ^{\prime }_n\\&= \bar{\theta }_n\bar{\alpha }_n+\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n}{2}-\frac{1}{2}\theta ^{\prime }_n = \frac{1}{2}\theta ^{\prime }_n-\frac{1}{2}\theta ^{\prime }_n = 0, \end{aligned}$$

where in the first equality \(\omega _n\) is substituted from (43) and in the third equality the definition of \(\bar{\alpha }_n\) is used. For the coefficient of \( {\left\langle p_n, z_{n-1} \right\rangle }_M\) we have

$$\begin{aligned} \lambda _n^2\bar{\alpha }_n&-\lambda _n\bar{\alpha }_n+\omega _n+\bar{\theta }_n\bar{\alpha }_n\\&= \lambda _n^2\bar{\alpha }_n-\lambda _n\bar{\alpha }_n+\lambda _n\bar{\alpha }_n-\frac{\gamma _n}{\gamma _{n-1}}\mu _n+\bar{\theta }_n\bar{\alpha }_n\\&= \lambda _n^2\bar{\alpha }_n- {\left( \lambda _n+\mu _n \right) }\bar{\alpha }_n+\bar{\theta }_n\bar{\alpha }_n = (\bar{\theta }_n-\bar{\theta }_n)\bar{\alpha }_n = 0. \end{aligned}$$

Next, for the terms containing \(z_n\), we follow a similar procedure of expanding, reordering, and recollecting terms as we did for \(p_n\). This gives

$$\begin{aligned} \Delta _n&= \lambda _n^2 {\left\| z_n-\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -z_n-\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle \mu _{n}z_{n}+\omega _n(z_{n-1}-p_{n-1}), -p_{n-1} \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| -\frac{2\bar{\theta }_n}{\theta _n}z_n+\frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| -y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2 - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}}\Big \Vert {}z_n-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{30mm}+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2\\&= \lambda _n^2 {\left\| z_n \right\| }_M^2 + 2 {\left\langle z_n, \lambda _n^2 {\left( -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right) } \right\rangle }_M + \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2\\&\quad + 2 {\left\langle z_n, -\lambda _nx_n \right\rangle }_M + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle z_{n}, -\mu _{n}p_{n-1} \right\rangle }_M + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M + \frac{2\bar{\theta }_n^2}{\theta _n} {\left\| z_n \right\| }_M^2\\&\quad + 2 {\left\langle z_n, -\bar{\theta }_n {\left( \frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| \frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| -y_{n}-(p_{n-1}-y_{n-1}) \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2 - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| z_n \right\| }_{M}^2 + 2 {\left\langle z_n, -\frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left( -\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n \right) } \right\rangle }_M\\&\quad + 2 {\left\langle z_n, -\frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left( (\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right) } \right\rangle }_M\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}}\Big \Vert {}-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{30mm}+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2\\&= \frac{\lambda _n^2\theta _n+2\bar{\theta }_n^2-\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _n} {\left\| z_n \right\| }_M^2 + 2 {\left\langle z_n, \frac{\bar{\theta }_n {\left( \gamma _n\bar{\beta }(\lambda _n+\mu _n)-\tilde{\theta }_n \right) }}{\theta _n}y_n \right\rangle }_M\\&\quad + 2 {\left\langle z_n, \frac{2\bar{\theta }_n\lambda _n-\lambda _n\theta _n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)\lambda _n}{\theta _n}x_n \right\rangle }_M\\&\quad + 2 {\left\langle z_n, \frac{\lambda _n^2\bar{\alpha }_n\theta _n-\mu _n\theta _n+\bar{\theta }_n\theta ^{\prime }_n-\hat{\theta }_n(\lambda _n+\mu _n)(\bar{\alpha }_n-\alpha _n)}{\theta _n}p_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle z_n, \frac{\bar{\alpha }_n {\left( \hat{\theta }_n(\lambda _n+\mu _n)-2\bar{\theta }_n^2-\lambda _n^2\theta _n \right) }}{\theta _n}z_{n-1} + \frac{\alpha _n\bar{\theta }_n {\left( \tilde{\theta }_n-\gamma _n\bar{\beta }(\lambda _n+\mu _n) \right) }}{\theta _n}y_{n-1} \right\rangle }_M\\&\quad + \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| \frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| -y_{n}-p_{n-1}+y_{n-1} \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2 - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}}\Big \Vert {}-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{30mm}+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2 \end{aligned}$$

Now, we show that all the coefficients of the terms containing \(z_n\) are identically zero. The coefficients of \( {\left\| z_n \right\| }_M^2\) and \( {\left\langle z_n, z_{n-1} \right\rangle }_M\) are zero by Proposition 6 (iii). For the coefficient of \( {\left\langle z_n, x_n \right\rangle }_M\) we have

$$\begin{aligned} 2\bar{\theta }_n\lambda _n-\lambda _n\theta _n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)\lambda _n = \lambda _n {\left( 2\bar{\theta }_n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)-\theta _n \right) } \end{aligned}$$

which is identically zero by Proposition 6 (ii). For the coefficient of \( {\left\langle z_n, p_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&\lambda _n^2\bar{\alpha }_n\theta _n-\mu _n\theta _n+\bar{\theta }_n\theta ^{\prime }_n-\hat{\theta }_n(\lambda _n+\mu _n)(\bar{\alpha }_n-\alpha _n)\\&\quad = \lambda _n^2\bar{\alpha }_n\theta _n-\mu _n\theta _n+\bar{\theta }_n {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \right) }-\hat{\theta }_n(\lambda _n+\mu _n)(\bar{\alpha }_n-\alpha _n)\\&\quad = \bar{\alpha }_n {\left( \lambda _n^2\theta _n+2\bar{\theta }_n^2-(\lambda _n+\mu _n)\hat{\theta }_n \right) }-\mu _n\theta _n\\&\qquad + {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n\bar{\theta }_n+(\lambda _n+\mu _n)\alpha _n\hat{\theta }_n\\&\quad = {\left( -\theta _n+ {\left( 2-\gamma _n\bar{\beta } \right) }\bar{\theta }_n+\hat{\theta }_n \right) }\mu _n = 0 \end{aligned}$$

where the first equality substitutes the definition of \(\theta ^{\prime }_n\), the third equality uses Proposition 6 (iii), and the last equality uses Proposition 6 (i). Therefore, all terms containing \(z_n\) can be eliminated from \(\Delta _n\) and we are left with

$$\begin{aligned}&\Delta _n = \\&\quad \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| \frac{\tilde{\theta }_n}{\theta _n}y_n-\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| -y_{n}-p_{n-1}+y_{n-1} \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2 - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n-(1-\alpha _n)x_n-\alpha _ny_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}}\Big \Vert {}-\frac{\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_n-\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n\\&\hspace{30mm}+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1}\Big \Vert _{M}^2\\&= \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M + \frac{\tilde{\theta }_n^2}{2\theta _n} {\left\| y_n \right\| }_M^2\\&\quad + 2 {\left\langle y_n, \frac{\tilde{\theta }_n}{2} {\left( -\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| -\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| y_{n} \right\| }_M^2 + 2 {\left\langle y_n, \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left( p_{n-1}-y_{n-1} \right) } \right\rangle }_M + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1}-y_{n-1} \right\| }_M^2 \\&\quad + \frac{\lambda _n\gamma _n\bar{\beta }}{2} {\left\| y_n \right\| }_M^2 - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_n \right\| }_{M}^2\\&\quad - 2 {\left\langle y_n, \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left( (\alpha _n-1)x_n-\alpha _ny_{n-1} \right) } \right\rangle }_M \\&\quad - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| (\alpha _n-1)x_n-\alpha _ny_{n-1} \right\| }_{M}^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2 \\&\quad - \frac{\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}\theta _n} {\left\| y_n \right\| }_{M}^2 + 2 {\left\langle y_n, \frac{\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})}{\theta _n} {\left( -\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n \right) } \right\rangle }_M\\&\quad + 2 {\left\langle 
y_n, \frac{\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})}{\theta _n} {\left( (\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right) } \right\rangle }_M\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| -\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right\| }_{M}^2\\&= {\left( \frac{\tilde{\theta }_n^2}{2\theta _n}+\frac{\mu _n\gamma _n\bar{\beta }}{2}+\frac{\lambda _n\gamma _n\bar{\beta }}{2}-\frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}}-\frac{\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}\theta _n} \right) } {\left\| y_n \right\| }_M^2\\&\quad + 2 {\left\langle y_n, {\left( -\frac{\lambda _n\tilde{\theta }_n}{\theta _n}+\frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})(1-\alpha _n)}{\hat{\theta }_{n}}-\frac{\bar{\theta }_n\lambda _n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }}{\theta _n\hat{\theta }_{n}} \right) }x_n \right\rangle }_M\\&\quad + 2 {\left\langle y_n, {\left( -\frac{\tilde{\theta }_n\theta ^{\prime }_n}{2\theta _n}+\frac{\mu _{n}\gamma _{n}\bar{\beta }}{2}+\frac{\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)}{\theta _{n}} \right) }p_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle y_n, {\left( \frac{\tilde{\theta }_n\bar{\theta }_n\bar{\alpha }_n}{\theta _n}-\frac{\bar{\theta }_n\bar{\alpha }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})}{\theta _{n}} \right) }z_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle y_n, {\left( -\frac{\tilde{\theta }_n^2\alpha _n}{2\theta _n}-\frac{\mu _{n}\gamma _{n}\bar{\beta }}{2}+\frac{\tilde{\theta }_{n}\alpha _n(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}}+\frac{\alpha _n\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})}{\theta _n\hat{\theta }_{n}} \right) }y_{n-1} \right\rangle }_M\\&\quad + \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1}-y_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| -\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| (\alpha _n-1)x_n-\alpha _ny_{n-1} \right\| }_{M}^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| -\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right\| }_{M}^2 \end{aligned}$$
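As an illustrative sanity check, lying outside the formal proof, the \(z_n\)-coefficient cancellations used above can be confirmed symbolically. The minimal sympy sketch below assumes the parameter definitions read off from the substitutions in this proof and in Algorithm 1, namely \(\bar{\theta }_n=\lambda _n+\mu _n-\lambda _n^2\), \(\theta _n=(4-\gamma _n\bar{\beta })(\lambda _n+\mu _n)-2\lambda _n^2\), \(\hat{\theta }_n=2(\lambda _n+\mu _n)-\gamma _n\bar{\beta }\lambda _n^2\), \(\tilde{\theta }_n=\gamma _n\bar{\beta }(\lambda _n+\mu _n)\), and \(\alpha _n=\mu _n/(\lambda _n+\mu _n)\); the product \(\gamma _n\bar{\beta }\) is treated as a single symbol g, \(\bar{\alpha }_n\) is left free, and all variable names are ours.

```python
# Sanity check (outside the proof): Proposition 6 (i)-(iii) and the
# z_n-coefficients above vanish identically under the assumed definitions.
import sympy as sp

lam, mu, g, bar_alpha = sp.symbols('lam mu g bar_alpha')

bar_theta   = lam + mu - lam**2                   # \bar{\theta}_n
theta       = (4 - g)*(lam + mu) - 2*lam**2       # \theta_n
hat_theta   = 2*(lam + mu) - g*lam**2             # \hat{\theta}_n
tilde_theta = g*(lam + mu)                        # \tilde{\theta}_n
alpha       = mu/(lam + mu)                       # \alpha_n
theta_p     = (2 - g)*mu + 2*bar_alpha*bar_theta  # \theta'_n, cf. (38)

# Proposition 6 (i)-(iii)
assert sp.simplify(theta - hat_theta - (2 - g)*bar_theta) == 0
assert sp.simplify(theta - 2*bar_theta - (2 - g)*(lam + mu)) == 0
assert sp.simplify(lam**2*theta + 2*bar_theta**2 - hat_theta*(lam + mu)) == 0

# coefficient of <z_n, x_n>_M (up to a factor of lam/theta); those of
# ||z_n||_M^2 and <z_n, z_{n-1}>_M coincide with Proposition 6 (iii) above
assert sp.simplify(2*bar_theta + (2 - g)*(lam + mu) - theta) == 0
# coefficient of <z_n, p_{n-1}>_M (up to a factor of 2/theta)
assert sp.simplify(lam**2*bar_alpha*theta - mu*theta + bar_theta*theta_p
                   - hat_theta*(lam + mu)*(bar_alpha - alpha)) == 0
```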

Now, we show that all the coefficients of the terms containing \(y_n\) are identically zero. For the coefficient of \( {\left\| y_n \right\| }_M^2\) we have

$$\begin{aligned}& {\left( \theta _n\hat{\theta }_n\gamma _n\bar{\beta }-2\theta _n\tilde{\theta }_{n}-2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2 \right) }(\lambda _n+\mu _n)+\tilde{\theta }_n^2\hat{\theta }_n\\&\quad = {\left( \theta _n\gamma _n\bar{\beta }(2\lambda _n+2\mu _n-\lambda _n^2\gamma _n\bar{\beta })-2\theta _n\gamma _n\bar{\beta }(\lambda _n+\mu _n)-2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2 \right) }(\lambda _n+\mu _n)\\&\qquad +\tilde{\theta }_n^2\hat{\theta }_n\\&\quad = - {\left( \theta _n\lambda _n^2+2\bar{\theta }_n^2 \right) }(\lambda _n+\mu _n)\gamma _n^2\bar{\beta }^2+(\lambda _n+\mu _n)^2\gamma _n^2\bar{\beta }^2\hat{\theta }_n\\&\quad = {\left( (\lambda _n+\mu _n)\hat{\theta }_n-\theta _n\lambda _n^2-2\bar{\theta }_n^2 \right) }(\lambda _n+\mu _n)\gamma _n^2\bar{\beta }^2, \end{aligned}$$

which, by Proposition 6 (iii), is identically zero. Now, for the coefficient of \( {\left\langle y_n, x_n \right\rangle }_M\) we have

$$\begin{aligned}&\tilde{\theta }_{n}\theta _{n}(\lambda _{n}+\mu _{n})(1-\alpha _n)-\lambda _n\tilde{\theta }_n\hat{\theta }_{n}-\bar{\theta }_n\lambda _n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }\\&\qquad = {\left( \theta _{n}(\lambda _{n}+\mu _{n})(1-\alpha _n)-\lambda _n\hat{\theta }_{n}-\bar{\theta }_n\lambda _n {\left( 2-\gamma _n\bar{\beta } \right) } \right) }\tilde{\theta }_n\\&\qquad = {\left( \theta _{n}-\hat{\theta }_{n}-\bar{\theta }_n {\left( 2-\gamma _n\bar{\beta } \right) } \right) }\lambda _n\tilde{\theta }_n \end{aligned}$$

which is identically zero by Proposition 6 (i). For the coefficient of \( {\left\langle y_n, p_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&\mu _{n}\gamma _{n}\bar{\beta }\theta _n+2\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)-\tilde{\theta }_n\theta ^{\prime }_n\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta }\theta _n+2\bar{\theta }_n\tilde{\theta }_n(\bar{\alpha }_n-\alpha _n)-\tilde{\theta }_n {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \right) }\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta }\theta _n+2\bar{\theta }_n\tilde{\theta }_n\bar{\alpha }_n-2\bar{\theta }_n\tilde{\theta }_n\alpha _n- {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n\tilde{\theta }_n-2\bar{\alpha }_n\bar{\theta }_n\tilde{\theta }_n\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta }\theta _n-2\bar{\theta }_n\tilde{\theta }_n\alpha _n- {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n\tilde{\theta }_n\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta }\theta _n-2\bar{\theta }_n\mu _{n}\gamma _{n}\bar{\beta }- {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)\mu _n\gamma _{n}\bar{\beta }\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta } {\left( \theta _n-2\bar{\theta }_n- {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n) \right) }\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta } {\left( \theta _n-2 {\left( \lambda _n+\mu _n-\lambda _n^2 \right) }- {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n) \right) }\\&\qquad = \mu _{n}\gamma _{n}\bar{\beta } {\left( \theta _n+2\lambda _n^2-(4-\gamma _n\bar{\beta })(\lambda _n+\mu _n) \right) } \end{aligned}$$

which is equal to zero by the definition of \(\theta _n\). That the coefficient of \( {\left\langle y_n, z_{n-1} \right\rangle }_M\) vanishes follows directly from the definition of \(\tilde{\theta }_n\). For the coefficient of \( {\left\langle y_n, y_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&2\alpha _n\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})-\tilde{\theta }_n^2\hat{\theta }_{n}\alpha _n+2\theta _n\tilde{\theta }_{n}\alpha _n(\lambda _{n}+\mu _{n})-\mu _{n}\gamma _{n}\bar{\beta }\theta _n\hat{\theta }_{n}\\&\qquad = 2\alpha _n\bar{\theta }_n^2\gamma _n\bar{\beta }\tilde{\theta }_n-\tilde{\theta }_n^2\hat{\theta }_{n}\alpha _n+2\theta _n\tilde{\theta }_{n}\alpha _n(\lambda _{n}+\mu _{n})-\alpha _n\tilde{\theta }_n\theta _n\hat{\theta }_{n}\\&\qquad = \alpha _n\tilde{\theta }_n {\left( 2\bar{\theta }_n^2\gamma _n\bar{\beta }-\tilde{\theta }_n\hat{\theta }_{n}+2\theta _n(\lambda _{n}+\mu _{n})-\theta _n\hat{\theta }_{n} \right) }\\&\qquad = \alpha _n\tilde{\theta }_n {\left( 2\bar{\theta }_n^2\gamma _n\bar{\beta }-\tilde{\theta }_n\hat{\theta }_{n}+\theta _n {\left( 2(\lambda _{n}+\mu _{n})-2(\lambda _{n}+\mu _{n})+\lambda _n^2\gamma _n\bar{\beta } \right) } \right) }\\&\qquad = \alpha _n\tilde{\theta }_n {\left( 2\bar{\theta }_n^2\gamma _n\bar{\beta }-\tilde{\theta }_n\hat{\theta }_{n}+\theta _n\lambda _n^2\gamma _n\bar{\beta } \right) }\\&\qquad = \alpha _n\tilde{\theta }_n {\left( 2\bar{\theta }_n^2\gamma _n\bar{\beta }-\hat{\theta }_{n}\gamma _n\bar{\beta }(\lambda _n+\mu _n)+\theta _n\lambda _n^2\gamma _n\bar{\beta } \right) }\\&\qquad = \alpha _n\tilde{\theta }_n\gamma _n\bar{\beta } {\left( 2\bar{\theta }_n^2-\hat{\theta }_{n}(\lambda _n+\mu _n)+\theta _n\lambda _n^2 \right) } \end{aligned}$$

which by Proposition 6 (iii) is identically zero. Therefore, all the coefficients of the terms containing \(y_n\) are zero and those terms can be eliminated. The remaining terms are

$$\begin{aligned}&\Delta _n = \\&\quad \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1}-y_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| -\frac{2\lambda _n}{\theta _n}x_n-\frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}+\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}-\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2\\&\quad - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| (\alpha _n-1)x_n-\alpha _ny_{n-1} \right\| }_{M}^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| -\frac{ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n}{\hat{\theta }_n}x_n+(\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right\| }_{M}^2\\&= \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( -\bar{\alpha }_np_{n-1}+\bar{\alpha }_nz_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1}-y_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{2\lambda _n^2}{\theta _n} {\left\| x_n \right\| }_M^2 + 2 {\left\langle x_n, \lambda _n {\left( \frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}-\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}+\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| \frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}-\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}+\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad - \frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})(1-\alpha _n)^2}{\hat{\theta }_{n}} {\left\| x_n \right\| }_{M}^2 + 2 {\left\langle x_n, \frac{\tilde{\theta }_{n}\alpha _n(\lambda _{n}+\mu _{n})(\alpha _n-1)}{\hat{\theta }_{n}}y_{n-1} \right\rangle }_M\\&\quad - \frac{(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }^2\lambda _n^2}{\theta _n\hat{\theta }_n} {\left\| x_n \right\| }_{M}^2 - \frac{\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_{n-1} \right\| }_{M}^2 \\&\quad + 2 {\left\langle x_{n}, \frac{\lambda _n {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left( (\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right) } \right\rangle }_M\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| (\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right\| }_{M}^2\\&= {\left( \frac{2\lambda _n^2}{\theta _n}-\frac{\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})(1-\alpha _n)^2}{\hat{\theta }_{n}}-\frac{(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }^2\lambda _n^2}{\theta _n\hat{\theta }_n} \right) } {\left\| x_n \right\| }_M^2\\&\quad + 2 {\left\langle x_n, {\left( 
\frac{\lambda _n\theta ^{\prime }_n}{\theta _n}+\frac{\lambda _n {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)}{\theta _{n}}-\lambda _n\bar{\alpha }_n \right) }p_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle x_n, {\left( -\frac{2\lambda _n\bar{\theta }_n\bar{\alpha }_n}{\theta _n}-\frac{\lambda _n\bar{\alpha }_n {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})}{\theta _{n}}+\lambda _n\bar{\alpha }_n \right) }z_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle x_n, {\left( \frac{\lambda _n\tilde{\theta }_n\alpha _n}{\theta _n}+\frac{\tilde{\theta }_{n}\alpha _n(\lambda _{n}+\mu _{n})(\alpha _n-1)}{\hat{\theta }_{n}}+\frac{\bar{\theta }_n\alpha _n\lambda _n\gamma _n\bar{\beta } {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})}{\theta _n\hat{\theta }_{n}} \right) }y_{n-1} \right\rangle }_M\\&\quad + \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 - \frac{\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_{n-1} \right\| }_{M}^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1}-y_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| \frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}-\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}+\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| (\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right\| }_{M}^2 \end{aligned}$$
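The \(y_n\)-coefficients eliminated above admit the same kind of symbolic check. The sketch below repeats the assumed setup from the previous listing so that it runs on its own; the expressions are the scaled coefficients from the displays above.

```python
# Sanity check (outside the proof): the y_n-coefficients vanish identically.
import sympy as sp

lam, mu, g, bar_alpha = sp.symbols('lam mu g bar_alpha')
bar_theta   = lam + mu - lam**2                   # \bar{\theta}_n
theta       = (4 - g)*(lam + mu) - 2*lam**2       # \theta_n
hat_theta   = 2*(lam + mu) - g*lam**2             # \hat{\theta}_n
tilde_theta = g*(lam + mu)                        # \tilde{\theta}_n
alpha       = mu/(lam + mu)                       # \alpha_n
theta_p     = (2 - g)*mu + 2*bar_alpha*bar_theta  # \theta'_n, cf. (38)

# coefficient of ||y_n||_M^2, scaled by 2*theta*hat_theta
assert sp.simplify((theta*hat_theta*g - 2*theta*tilde_theta
                    - 2*bar_theta**2*g**2)*(lam + mu)
                   + tilde_theta**2*hat_theta) == 0
# coefficient of <y_n, x_n>_M, scaled by theta*hat_theta
assert sp.simplify(tilde_theta*theta*(lam + mu)*(1 - alpha)
                   - lam*tilde_theta*hat_theta
                   - bar_theta*lam*g*(lam + mu)*(2 - g)) == 0
# coefficient of <y_n, p_{n-1}>_M, scaled by 2*theta
assert sp.simplify(mu*g*theta + 2*bar_theta*g*(lam + mu)*(bar_alpha - alpha)
                   - tilde_theta*theta_p) == 0
# coefficient of <y_n, y_{n-1}>_M, scaled by 2*theta*hat_theta
assert sp.simplify(2*alpha*bar_theta**2*g**2*(lam + mu)
                   - tilde_theta**2*hat_theta*alpha
                   + 2*theta*tilde_theta*alpha*(lam + mu)
                   - mu*g*theta*hat_theta) == 0
```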

Next, we show that all the coefficients of the terms containing \(x_n\) are zero. For the coefficient of \( {\left\| x_n \right\| }_M^2\) we have

$$\begin{aligned}&2\lambda _n^2\hat{\theta }_n-(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }^2\lambda _n^2-\theta _n\tilde{\theta }_{n}(\lambda _{n}+\mu _{n})(1-\alpha _n)^2\\&\qquad = 2\lambda _n^2\hat{\theta }_n-(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }^2\lambda _n^2-\theta _n\lambda _n^2\gamma _n\bar{\beta }\\&\qquad = \lambda _n^2 {\left( 2\hat{\theta }_n-(\lambda _{n}+\mu _{n}) {\left( 2-\gamma _n\bar{\beta } \right) }^2-\theta _n\gamma _n\bar{\beta } \right) }\\&\qquad = \lambda _n^2 {\left( 2 {\left( 2\lambda _n+2\mu _n-\gamma _n\bar{\beta }\lambda _n^2 \right) }-(\lambda _{n}+\mu _{n}) {\left( 4-4\gamma _n\bar{\beta }+\gamma _n^2\bar{\beta }^2 \right) }-\theta _n\gamma _n\bar{\beta } \right) }\\&\qquad = \lambda _n^2\Bigg (-2\gamma _n\bar{\beta }\lambda _n^2-(\lambda _{n}+\mu _{n}) {\left( -4\gamma _n\bar{\beta }+\gamma _n^2\bar{\beta }^2 \right) }\\&\hspace{30mm}- {\left( (4-\gamma _n\bar{\beta })(\lambda _n+\mu _n)-2\lambda _n^2 \right) }\gamma _n\bar{\beta }\Bigg )\\&\qquad = \lambda _n^2 {\left( -2\gamma _n\bar{\beta }\lambda _n^2+2\lambda _n^2\gamma _n\bar{\beta } \right) } = 0 \end{aligned}$$
(44)

where in the first equality \(\tilde{\theta }_n\) and \(\alpha _n\), and in the third equality \(\hat{\theta }_n\), are substituted by their definitions from Algorithm 1. For the coefficient of \( {\left\langle x_n, p_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&\lambda _n\theta ^{\prime }_n+\lambda _n {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)-\lambda _n\bar{\alpha }_n\theta _n\\&\quad = \lambda _n {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \right) }+\lambda _n {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)-\lambda _n\bar{\alpha }_n\theta _n\\&\quad = 2\lambda _n\bar{\alpha }_n\bar{\theta }_n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})\lambda _n\bar{\alpha }_n-\lambda _n\bar{\alpha }_n\theta _n\\&\quad \quad - {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})\lambda _n\alpha _n+ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n\mu _n\\&\quad = \lambda _n\bar{\alpha }_n {\left( 2\bar{\theta }_n+ {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})-\theta _n \right) }\\&\quad \quad - {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n\mu _n+ {\left( 2-\gamma _n\bar{\beta } \right) }\lambda _n\mu _n \end{aligned}$$

which by Proposition 6 (ii) is zero. For the coefficient of \( {\left\langle x_n, z_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&\lambda _n\bar{\alpha }_n\theta _n-2\lambda _n\bar{\theta }_n\bar{\alpha }_n-\lambda _n\bar{\alpha }_n {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})\\&\qquad = \lambda _n\bar{\alpha }_n {\left( \theta _n-2\bar{\theta }_n- {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n}) \right) } \end{aligned}$$

which is identically zero by Proposition 6 (ii). For the coefficient of \( {\left\langle x_n, y_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&\lambda _n\tilde{\theta }_n\alpha _n\hat{\theta }_{n}+\bar{\theta }_n\alpha _n\lambda _n\gamma _n\bar{\beta } {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _{n}+\mu _{n})-\theta _{n}\tilde{\theta }_n\alpha _n(\lambda _{n}+\mu _{n})(1-\alpha _n)\\&\qquad = \lambda _n\tilde{\theta }_n\alpha _n\hat{\theta }_{n}+\bar{\theta }_n\alpha _n\lambda _n\tilde{\theta }_n {\left( 2-\gamma _n\bar{\beta } \right) }-\theta _{n}\tilde{\theta }_n\alpha _n\lambda _n\\&\qquad = \lambda _n\tilde{\theta }_n\alpha _n {\left( \hat{\theta }_{n}+\bar{\theta }_n {\left( 2-\gamma _n\bar{\beta } \right) }-\theta _n \right) } \end{aligned}$$

which by Proposition 6 (i) is identically zero. Now, expanding all the remaining terms, reordering, and recollecting them gives

$$\begin{aligned}&\Delta _n = \\&\quad \lambda _n^2 {\left\| -\bar{\alpha }_nz_{n-1}+\bar{\alpha }_np_{n-1} \right\| }_M^2 - \frac{\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_{n-1} \right\| }_{M}^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1}-y_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\omega _n(z_{n-1}-p_{n-1}) \right\rangle }_M\\&\quad + \frac{1}{2}\theta _n {\left\| \frac{\theta ^{\prime }_n}{\theta _n}p_{n-1}-\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}+\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\| }_M^2 - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1}-p_{n-1} \right\| }_M^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| (\bar{\alpha }_n-\alpha _n)p_{n-1}-\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right\| }_{M}^2\\&= \lambda _n^2\bar{\alpha }_n^2 {\left\| p_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\lambda _n^2\bar{\alpha }_n^2z_{n-1} \right\rangle }_M + \lambda _n^2\bar{\alpha }_n^2 {\left\| z_{n-1} \right\| }_M^2\\&\quad + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| p_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\frac{\mu _{n}\gamma _{n}\bar{\beta }}{2}y_{n-1} \right\rangle }_M + \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2} {\left\| y_{n-1} \right\| }_M^2 \\&\quad - \frac{\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}} {\left\| y_{n-1} \right\| }_{M}^2 + 2\omega _n {\left\| p_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, -\omega _nz_{n-1} \right\rangle }_M\\&\quad + \frac{{\theta ^{\prime }_n}^2}{2\theta _n} {\left\| p_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, \frac{1}{2}\theta ^{\prime }_n {\left( -\frac{2\bar{\theta }_n\bar{\alpha }_n}{\theta _n}z_{n-1}+\frac{\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right) } \right\rangle }_M\\&\quad + \frac{2\bar{\theta }_n^2\bar{\alpha }_n^2}{\theta _n} {\left\| z_{n-1} \right\| }_M^2 + 2 {\left\langle z_{n-1}, -\frac{\bar{\theta }_n\bar{\alpha }_n\tilde{\theta }_n\alpha _n}{\theta _n}y_{n-1} \right\rangle }_M + \frac{\tilde{\theta }_n^2\alpha _n^2}{2\theta _n} {\left\| y_{n-1} \right\| }_M^2\\&\quad - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| p_{n-1} \right\| }_M^2 + 2 {\left\langle p_{n-1}, \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2}y_{n-1} \right\rangle }_M - \frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2} {\left\| y_{n-1} \right\| }_M^2\\&\quad - \frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)^2}{\theta _{n}} {\left\| p_{n-1} \right\| }_{M}^2 - \frac{\alpha _n^2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_n\theta _n} {\left\| y_{n-1} \right\| }_{M}^2\\&\quad - \frac{\hat{\theta }_{n}\bar{\alpha }_n^2(\lambda _{n}+\mu _{n})}{\theta _{n}} {\left\| z_{n-1} \right\| }_{M}^2 + 2 {\left\langle z_{n-1}, \frac{\bar{\alpha }_n\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})}{\theta _{n}}y_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle p_{n-1}, -\frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)}{\theta _{n}} {\left( -\bar{\alpha }_nz_{n-1}+\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }}{\hat{\theta }_n}y_{n-1} \right) } \right\rangle }_M\\&= {\left( \lambda _n^2\bar{\alpha }_n^2+\frac{\mu _{n}\gamma _{n}\bar{\beta }}{2}+2\omega _n+\frac{{\theta ^{\prime }_n}^2}{2\theta _n}-\frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta 
}}{2}-\frac{\hat{\theta }_{n}(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)^2}{\theta _{n}} \right) } {\left\| p_{n-1} \right\| }_M^2\\&\quad +2 {\left\langle p_{n-1}, {\left( -\lambda _n^2\bar{\alpha }_n^2-\omega _n-\frac{\bar{\theta }_n\theta ^{\prime }_n\bar{\alpha }_n}{\theta _n}+\frac{\hat{\theta }_{n}\bar{\alpha }_n(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)}{\theta _{n}} \right) }z_{n-1} \right\rangle }_M\\&\quad + 2 {\left\langle p_{n-1}, {\left( -\frac{\mu _{n}\gamma _{n}\bar{\beta }}{2}+\frac{\theta ^{\prime }_n\tilde{\theta }_n\alpha _n}{2\theta _n}+\frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2}-\frac{\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)}{\theta _{n}} \right) }y_{n-1} \right\rangle }_M\\&\quad + {\left( \lambda _n^2\bar{\alpha }_n^2+\frac{2\bar{\theta }_n^2\bar{\alpha }_n^2}{\theta _n}-\frac{\hat{\theta }_{n}\bar{\alpha }_n^2(\lambda _{n}+\mu _{n})}{\theta _{n}} \right) } {\left\| z_{n-1} \right\| }_M^2\\&\quad + 2 {\left\langle z_{n-1}, {\left( -\frac{\bar{\theta }_n\bar{\alpha }_n\tilde{\theta }_n\alpha _n}{\theta _n}+\frac{\bar{\alpha }_n\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})}{\theta _{n}} \right) }y_{n-1} \right\rangle }_M\\&\quad + {\left( \frac{\mu _{n}\gamma _{n}\bar{\beta }}{2}-\frac{\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_{n}}+\frac{\tilde{\theta }_n^2\alpha _n^2}{2\theta _n}-\frac{\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }}{2}-\frac{\alpha _n^2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})}{\hat{\theta }_n\theta _n} \right) } {\left\| y_{n-1} \right\| }_M^2 \end{aligned}$$
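Under the same assumed definitions as in the earlier listings, the cancellations for the \(x_n\)-coefficients, including the computation (44), can also be checked symbolically.

```python
# Sanity check (outside the proof): the x_n-coefficients vanish identically.
import sympy as sp

lam, mu, g, bar_alpha = sp.symbols('lam mu g bar_alpha')
bar_theta   = lam + mu - lam**2                   # \bar{\theta}_n
theta       = (4 - g)*(lam + mu) - 2*lam**2       # \theta_n
hat_theta   = 2*(lam + mu) - g*lam**2             # \hat{\theta}_n
tilde_theta = g*(lam + mu)                        # \tilde{\theta}_n
alpha       = mu/(lam + mu)                       # \alpha_n
theta_p     = (2 - g)*mu + 2*bar_alpha*bar_theta  # \theta'_n, cf. (38)

# coefficient of ||x_n||_M^2, scaled by theta*hat_theta, i.e. (44)
assert sp.simplify(2*lam**2*hat_theta
                   - (lam + mu)*(2 - g)**2*lam**2
                   - theta*tilde_theta*(lam + mu)*(1 - alpha)**2) == 0
# coefficient of <x_n, p_{n-1}>_M, scaled by theta
assert sp.simplify(lam*theta_p + lam*(2 - g)*(lam + mu)*(bar_alpha - alpha)
                   - lam*bar_alpha*theta) == 0
# coefficient of <x_n, z_{n-1}>_M, scaled by theta
assert sp.simplify(lam*bar_alpha*theta - 2*lam*bar_theta*bar_alpha
                   - lam*bar_alpha*(2 - g)*(lam + mu)) == 0
# coefficient of <x_n, y_{n-1}>_M, scaled by theta*hat_theta
assert sp.simplify(lam*tilde_theta*alpha*hat_theta
                   + bar_theta*alpha*lam*g*(2 - g)*(lam + mu)
                   - theta*tilde_theta*alpha*(lam + mu)*(1 - alpha)) == 0
```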

We show that all the coefficients in the expression above are identically zero. Starting with the coefficient of \( {\left\| p_{n-1} \right\| }_M^2\), we have

$$\begin{aligned}&2\theta _{n}\lambda _n^2\bar{\alpha }_n^2+\theta _{n}\mu _{n}\gamma _{n}\bar{\beta }+4\theta _{n}\omega _n+{\theta ^{\prime }_n}^2-\theta _{n}\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }-2\hat{\theta }_{n}(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)^2\\&\quad = 2\theta _{n}\lambda _n^2\bar{\alpha }_n^2+\theta _{n}\mu _{n}\gamma _{n}\bar{\beta }-4\theta _{n}\bar{\alpha }_n\mu _n+ {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \right) }^2\\&\qquad -\theta _{n}\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }-2\hat{\theta }_{n}(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)^2\\&\quad = 2\bar{\alpha }_n^2 {\left( \theta _{n}\lambda _n^2+2\bar{\theta }_n^2-\hat{\theta }_n(\lambda _n+\mu _n) \right) }+ 4\mu _n\bar{\alpha }_n {\left( -\theta _n+\bar{\theta }_n {\left( 2-\gamma _n\bar{\beta } \right) }+\hat{\theta }_n \right) }\\&\qquad + \theta _{n}\mu _{n}\gamma _{n}\bar{\beta }+ {\left( 2-\gamma _n\bar{\beta } \right) }^2\mu _n^2-\theta _{n}\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }-2\hat{\theta }_{n}(\lambda _{n}+\mu _{n})\alpha _n^2\\&\quad = \theta _{n}\mu _{n}\gamma _{n}\bar{\beta }+ {\left( 2-\gamma _n\bar{\beta } \right) }^2\mu _n^2-\theta _{n}\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }-2\hat{\theta }_{n}\mu _{n}\alpha _n\\&\quad = \theta _{n}\gamma _{n}\bar{\beta } {\left( \mu _{n}-\lambda _n\alpha _n \right) }+ {\left( 2-\gamma _n\bar{\beta } \right) }^2\mu _n^2-2\hat{\theta }_{n}\mu _{n}\alpha _n\\&\quad = \theta _{n}\gamma _{n}\bar{\beta }\mu _{n}\alpha _n+ {\left( 2-\gamma _n\bar{\beta } \right) }^2(\lambda _n+\mu _n)\alpha _n\mu _n-2\hat{\theta }_{n}\mu _{n}\alpha _n\\&\quad = \mu _{n}\alpha _n {\left( \theta _{n}\gamma _{n}\bar{\beta }+ {\left( 2-\gamma _n\bar{\beta } \right) }^2(\lambda _n+\mu _n)-2\hat{\theta }_{n} \right) } \end{aligned}$$

where in the first equality \(\omega _n = -\bar{\alpha }_n\mu _n\) is used and \(\theta ^{\prime }_n\) is substituted from (38), the third equality follows from Proposition 6 (i) and (iii), and the expression on the right-hand side of the last equality is identically zero by (44). For the coefficient of \( {\left\langle p_{n-1}, z_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&-\lambda _n^2\bar{\alpha }_n^2\theta _{n}-\omega _n\theta _{n}-\bar{\theta }_n\theta ^{\prime }_n\bar{\alpha }_n+\hat{\theta }_{n}\bar{\alpha }_n(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)\\&\quad = -\lambda _n^2\bar{\alpha }_n^2\theta _{n}-\omega _n\theta _{n}-\bar{\theta }_n\bar{\alpha }_n {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \right) }\\&\qquad +\hat{\theta }_{n}\bar{\alpha }_n(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)\\&\quad = \bar{\alpha }_n^2 {\left( -\lambda _n^2\theta _{n}-2\bar{\theta }_n^2+\hat{\theta }_n(\lambda _n+\mu _n) \right) }-\omega _n\theta _{n}-\bar{\theta }_n\bar{\alpha }_n\mu _n {\left( 2-\gamma _n\bar{\beta } \right) }\\&\qquad -\hat{\theta }_{n}\bar{\alpha }_n(\lambda _{n}+\mu _{n})\alpha _n\\&\quad = \bar{\alpha }_n\mu _n\theta _{n}-\bar{\theta }_n\bar{\alpha }_n\mu _n {\left( 2-\gamma _n\bar{\beta } \right) }-\hat{\theta }_{n}\bar{\alpha }_n\mu _{n}\\&\quad = \bar{\alpha }_n\mu _n {\left( \theta _{n}-\bar{\theta }_n {\left( 2-\gamma _n\bar{\beta } \right) }-\hat{\theta }_{n} \right) } \end{aligned}$$

which by Proposition 6 (i) is equal to zero; the third equality above is obtained using Proposition 6 (iii). For the coefficient of \( {\left\langle p_{n-1}, y_{n-1} \right\rangle }_M\) we have

$$\begin{aligned}&\theta ^{\prime }_n\tilde{\theta }_n\alpha _n-\mu _{n}\gamma _{n}\bar{\beta }\theta _{n}+\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }\theta _{n}-2\alpha _n\bar{\theta }_n\gamma _n\bar{\beta }(\lambda _{n}+\mu _{n})(\bar{\alpha }_n-\alpha _n)\\&\quad = {\left( {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n+2\bar{\alpha }_n\bar{\theta }_n \right) }\tilde{\theta }_n\alpha _n-\mu _{n}\gamma _{n}\bar{\beta }\theta _{n}+\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }\theta _{n}\\&\qquad -2\alpha _n\bar{\theta }_n\tilde{\theta }_n(\bar{\alpha }_n-\alpha _n)\\&\quad = {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n\tilde{\theta }_n\alpha _n-\mu _{n}\gamma _{n}\bar{\beta }\theta _{n}+\lambda _{n}\gamma _{n}\bar{\beta }\theta _{n}\alpha _{n}+2\alpha _n\bar{\theta }_n\tilde{\theta }_n\alpha _n\\&\quad = {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n(\lambda _n+\mu _n)\gamma _n\bar{\beta }\alpha _n-(\lambda _n+\mu _n)\alpha _n\gamma _{n}\bar{\beta }\theta _{n}\\&\qquad +\lambda _{n}\gamma _{n}\bar{\beta }\theta _{n}\alpha _{n}+2\alpha _n\bar{\theta }_n\mu _n\gamma _n\bar{\beta }\\&\quad = {\left( 2-\gamma _n\bar{\beta } \right) }\mu _n(\lambda _n+\mu _n)\gamma _n\bar{\beta }\alpha _n-\mu _n\alpha _n\gamma _{n}\bar{\beta }\theta _{n}+2\alpha _n\bar{\theta }_n\mu _n\gamma _n\bar{\beta }\\&\quad = \mu _n\gamma _n\bar{\beta }\alpha _n {\left( {\left( 2-\gamma _n\bar{\beta } \right) }(\lambda _n+\mu _n)-\theta _{n}+2\bar{\theta }_n \right) } \end{aligned}$$

which by Proposition 6 (ii) is identically zero. The coefficient of \( {\left\| z_{n-1} \right\| }_M^2\) vanishes directly by Proposition 6 (iii). Likewise, the coefficient of \( {\left\langle z_{n-1}, y_{n-1} \right\rangle }_M\) is identically zero by the definition of \(\tilde{\theta }_n\). The coefficient of \( {\left\| y_{n-1} \right\| }_M^2\) is

$$\begin{aligned}&\mu _{n}\gamma _{n}\bar{\beta }\hat{\theta }_n\theta _n-2\theta _n\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})+\hat{\theta }_n\tilde{\theta }_n^2\alpha _n^2-\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }\hat{\theta }_n\theta _n\\&\quad \qquad -2\alpha _n^2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})\\&\quad = (\lambda _n+\mu _n)\alpha _n\gamma _{n}\bar{\beta }\hat{\theta }_n\theta _n-2\theta _n\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})+\hat{\theta }_n\tilde{\theta }_n^2\alpha _n^2\\&\quad \qquad -\lambda _{n}\gamma _{n}\alpha _{n}\bar{\beta }\hat{\theta }_n\theta _n-2\alpha _n^2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})\\&\quad = \mu _n\alpha _n\gamma _{n}\bar{\beta }\hat{\theta }_n\theta _n-2\theta _n\tilde{\theta }_{n}\alpha _n^2(\lambda _{n}+\mu _{n})+\hat{\theta }_n\tilde{\theta }_n^2\alpha _n^2-2\alpha _n^2\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2(\lambda _{n}+\mu _{n})\\&\quad = \mu _n\alpha _n\gamma _{n}\bar{\beta }\hat{\theta }_n\theta _n-2\theta _n\gamma _n\bar{\beta }\mu _n\alpha _n(\lambda _{n}+\mu _{n})+\hat{\theta }_n\tilde{\theta }_n\mu _n\gamma _n\bar{\beta }\alpha _n-2\alpha _n\mu _n\bar{\theta }_n^2\gamma _n^2\bar{\beta }^2\\&\quad = \mu _n\alpha _n\gamma _{n}\bar{\beta } {\left( \hat{\theta }_n\theta _n-2\theta _n(\lambda _{n}+\mu _{n})+\hat{\theta }_n\tilde{\theta }_n-2\bar{\theta }_n^2\gamma _n\bar{\beta } \right) }\\&\quad = \mu _n\alpha _n\gamma _{n}\bar{\beta } {\left( \theta _n {\left( \hat{\theta }_n-2\lambda _{n}-2\mu _{n} \right) }+\hat{\theta }_n\gamma _n\bar{\beta }(\lambda _n+\mu _n)-2\bar{\theta }_n^2\gamma _n\bar{\beta } \right) }\\&\quad = \mu _n\alpha _n\gamma _{n}\bar{\beta } {\left( -\theta _n\lambda _n^2\gamma _n\bar{\beta }+\hat{\theta }_n\gamma _n\bar{\beta }(\lambda _n+\mu _n)-2\bar{\theta }_n^2\gamma _n\bar{\beta } \right) }\\&\quad = \mu _n\alpha _n\gamma _{n}^2\bar{\beta }^2 {\left( -\theta _n\lambda _n^2+\hat{\theta }_n(\lambda _n+\mu _n)-2\bar{\theta }_n^2 \right) } \end{aligned}$$

which by Proposition 6 (iii) is identically zero. This concludes the proof. \(\square \)
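Finally, as a remark outside the proof, the remaining coefficients of \(p_{n-1}\), \(z_{n-1}\), and \(y_{n-1}\) admit the same symbolic verification as in the earlier listings, now also using \(\omega _n=-\bar{\alpha }_n\mu _n\); the \( {\left\| z_{n-1} \right\| }_M^2\) and \( {\left\langle z_{n-1}, y_{n-1} \right\rangle }_M\) coefficients reduce to Proposition 6 (iii) and the definition of \(\tilde{\theta }_n\), which the first listing already checks.

```python
# Sanity check (outside the proof): the remaining coefficients vanish.
import sympy as sp

lam, mu, g, bar_alpha = sp.symbols('lam mu g bar_alpha')
bar_theta   = lam + mu - lam**2                   # \bar{\theta}_n
theta       = (4 - g)*(lam + mu) - 2*lam**2       # \theta_n
hat_theta   = 2*(lam + mu) - g*lam**2             # \hat{\theta}_n
tilde_theta = g*(lam + mu)                        # \tilde{\theta}_n
alpha       = mu/(lam + mu)                       # \alpha_n
theta_p     = (2 - g)*mu + 2*bar_alpha*bar_theta  # \theta'_n, cf. (38)
omega       = -bar_alpha*mu                       # \omega_n

# coefficient of ||p_{n-1}||_M^2, scaled by 2*theta
assert sp.simplify(2*theta*lam**2*bar_alpha**2 + theta*mu*g + 4*theta*omega
                   + theta_p**2 - theta*lam*g*alpha
                   - 2*hat_theta*(lam + mu)*(bar_alpha - alpha)**2) == 0
# coefficient of <p_{n-1}, z_{n-1}>_M, scaled by theta
assert sp.simplify(-lam**2*bar_alpha**2*theta - omega*theta
                   - bar_theta*theta_p*bar_alpha
                   + hat_theta*bar_alpha*(lam + mu)*(bar_alpha - alpha)) == 0
# coefficient of <p_{n-1}, y_{n-1}>_M, scaled by 2*theta
assert sp.simplify(theta_p*tilde_theta*alpha - mu*g*theta + lam*g*alpha*theta
                   - 2*alpha*bar_theta*g*(lam + mu)*(bar_alpha - alpha)) == 0
# coefficient of ||y_{n-1}||_M^2, scaled by 2*theta*hat_theta
assert sp.simplify(mu*g*hat_theta*theta
                   - 2*theta*tilde_theta*alpha**2*(lam + mu)
                   + hat_theta*tilde_theta**2*alpha**2
                   - lam*g*alpha*hat_theta*theta
                   - 2*alpha**2*bar_theta**2*g**2*(lam + mu)) == 0
print("all coefficient cancellations verified symbolically")
```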

8 Conclusions

We have presented a variant of the well-known forward–backward algorithm. Our method incorporates momentum-like terms as well as deviation vectors into the algorithm updates. The deviation vectors can be chosen arbitrarily as long as a safeguarding condition that limits their norm is satisfied. We propose special instances of our method that fulfill the safeguarding condition by design. Numerical evaluations reveal that these methods can significantly outperform the traditional forward–backward method as well as the accelerated proximal point method and the Halpern iteration, all of which are encompassed within our framework. This demonstrates the potential of our proposed methods for efficiently solving structured monotone inclusions.