1 Introduction

Inverse problems occur whenever unknown quantities are measured indirectly, and in many cases the measurement process introduces noise. Such problems appear in many practical applications and are often approached by solving a minimization problem of the form

$$\begin{aligned} \min _{x \in X} F(Ax) + G(x) \end{aligned}$$
(1)

on a Hilbert space X with a linear and bounded operator \(A: X \rightarrow Y\). Sometimes these kinds of problems are hard to solve and it can be beneficial to examine the equivalent dual problem

$$\begin{aligned} \min _{y \in Y} G^*(-A^*y) + F^*(y) \end{aligned}$$
(2)

on the Hilbert space Y or the saddle point problem

$$\begin{aligned} \min _{x \in X} \max _{y \in Y} G(x) + \langle Ax,y \rangle - F^*(y). \end{aligned}$$

with the Fenchel conjugates \(G^*, F^*\) of G and F instead of the primal problem above. If both \(G: X \rightarrow \overline{\mathbb {R}}\) and \(F^*: Y \rightarrow \overline{\mathbb {R}}\) are proper, convex, lower semicontinuous functionals defined on Hilbert spaces X and Y, the primal-dual algorithm of Chambolle and Pock [1], defined as

$$\begin{aligned} \begin{aligned} x^{i+1}&= {\text {prox}}_{\tau _i G}(x^i - \tau _i A^* y^i), \\ \bar{x}^{i+1}&= x^{i+1} + \omega _i (x^{i+1} - x^i), \\ y^{i+1}&= {\text {prox}}_{\sigma _{i+1} F^*}(y^i + \sigma _{i+1} A \bar{x}^{i+1}), \end{aligned} \end{aligned}$$
(3)

for all \(i \in \mathbb {N}\), with positive stepsizes \(\tau _i\) and \(\sigma _i\) and extrapolation parameter \(\omega _i\), has proven to be a simple and effective solution method. With constant stepsizes, this method converges weakly if \(F^*\) and G are convex and lower semicontinuous functionals. Furthermore, linear convergence is proven in [1] if both functionals are strongly convex. If only one of the functionals is strongly convex, while the other is merely convex, an accelerated version of the algorithm with varying stepsizes is proven in [1] to converge with a rate of \(\Vert {x^{N}-{\hat{x}}} \Vert ^{2} = \mathcal {O}{(N^{-2})}\), where \({\hat{x}}\) is a solution of (1).

In practical applications, however, it can happen that the operator and its adjoint are given as two separate implementations of discretizations of a continuous operator and its adjoint. If the implementations follow the “first dualize, then discretize” approach, it may happen that the discretizations are not adjoint to each other. Sometimes this even happens on purpose, for example to save computational time or to impose a certain structure on the image of the adjoint operator [2,3,4]. The influence of such a mismatch has been studied for various algorithms [4,5,6,7,8,9].

In this paper we examine the convergence of the Chambolle–Pock method in the case of a mismatched adjoint, i.e., we examine the algorithm

$$\begin{aligned} \begin{aligned} x^{i+1}&= {\text {prox}}_{\tau _i G}(x^i - \tau _i V^* y^i), \\ \bar{x}^{i+1}&= x^{i+1} + \omega _i (x^{i+1} - x^i), \\ y^{i+1}&= {\text {prox}}_{\sigma _{i+1} F^*}(y^i + \sigma _{i+1} A \bar{x}^{i+1}). \end{aligned} \end{aligned}$$
(4)

with a linear operator \(V: X \rightarrow Y\) whose adjoint \(V^*\) replaces \(A^*\) in the primal update. We study convergence to a fixed point of (4) in the case where both G and \(F^{*}\) are strongly convex.
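As a concrete illustration, the following is a minimal sketch of iteration (4) in Python (the language used for the experiments in Sect. 4). The function name, the calling convention of the proximal operators and the fixed iteration count are our own choices and not part of the method; setting \(V = A\) recovers the matched iteration (3).

```python
import numpy as np

def mismatched_chambolle_pock(prox_G, prox_Fstar, A, V, x0, y0,
                              tau, sigma, omega, n_iter=500):
    """Sketch of iteration (4) with constant stepsizes tau, sigma and
    extrapolation parameter omega; V.T takes the role of the
    (possibly mismatched) adjoint A.T in the primal update."""
    x, y = x0.copy(), y0.copy()
    for _ in range(n_iter):
        x_new = prox_G(x - tau * (V.T @ y), tau)        # primal step uses V^T
        x_bar = x_new + omega * (x_new - x)             # extrapolation
        y = prox_Fstar(y + sigma * (A @ x_bar), sigma)  # dual step uses A
        x = x_new
    return x, y
```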

Example 1.1

(Counterexample for convergence) Here is a simple example that shows that the mismatched iteration does not necessarily converge. We consider the problem \(\min _{x} \Vert {x} \Vert _{1}\) on \(\mathbb {R}^{n}\), which is of the form (1), and model this with \(A = I\), \(F(y) = \Vert {y} \Vert _1\), and \(G\equiv 0\). We consider the most basic form of Chambolle–Pock’s method with constant \(\tau ,\sigma > 0\) and \(\omega =1\), i.e. the mismatched iteration is

$$\begin{aligned} x^{i+1}&= x^{i} - \tau V^{T}y^{i}\\ y^{i+1}&= {{\,\textrm{proj}\,}}_{[-1,1]}(y^{i} + \sigma A(2x^{i+1}-x^{i})). \end{aligned}$$

If we consider the mismatch \(V = -\alpha I\) with \(\alpha >0\) (instead of I), the iteration becomes

$$\begin{aligned} x^{i+1}&= x^{i} + \alpha \tau y^{i}\\ y^{i+1}&= {{\,\textrm{proj}\,}}_{[-1,1]}(y^{i} + \sigma (2x^{i+1}-x^{i})). \end{aligned}$$

If we initialize with \(x^{0} > 0\) and \(y^{0} > 0\) (component-wise), we get that the entries in \(x^{i}\) are strictly increasing and hence, will not converge to the unique solution \(x=0\).

Note that \((x,y) = (0,0)\) is both a saddle point and a fixed point of the mismatched iterations in this case.
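A few lines of Python confirm this behaviour numerically; the following is a minimal sketch in which the values \(\alpha = 1\), \(\tau = \sigma = 0.5\) and the initializations are chosen arbitrarily (but positive):

```python
import numpy as np

alpha, tau, sigma = 1.0, 0.5, 0.5     # arbitrary positive parameters
x = np.ones(3)                        # x^0 > 0 componentwise
y = 0.5 * np.ones(3)                  # y^0 > 0 componentwise

for _ in range(100):
    x_old = x.copy()
    x = x + alpha * tau * y                              # primal step with V = -alpha*I
    y = np.clip(y + sigma * (2 * x - x_old), -1.0, 1.0)  # prox of F^*: projection onto [-1,1]^n

print(x)  # the entries grow monotonically instead of approaching the solution x = 0
```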

Before we analyze the convergence of the mismatched iteration (4), we provide a result which shows that fixed points of (4) are close to the fixed point of the original iteration (3) if the norm \(\Vert {A-V} \Vert \) is small.

Theorem 1.2

If G is a \(\gamma _G\)-strongly convex function, \((x^*, y^*)\) is a fixed point of the original Chambolle–Pock method (3) and \((\hat{x},\hat{y})\) is a fixed point of the Chambolle–Pock method with mismatched adjoint (4), it holds that

$$\begin{aligned} \Vert {x^* - \hat{x}} \Vert \le \frac{1}{\gamma _G} \Vert {(V-A)^* \hat{y}} \Vert . \end{aligned}$$

Proof

Since \(\partial G\) is \(\gamma _{G}\)-strongly monotone and \(\partial F^*\) is monotone, we can conclude for \(-A^* y^* \in \partial G(x^*), -V^*\hat{y} \in \partial G(\hat{x}), A x^* \in \partial F^*(y^*), A \hat{x} \in \partial F^*(\hat{y})\) that

$$\begin{aligned} \langle {x^*-\hat{x}},{-A^* y^* + V^* \hat{y}}\rangle \ge \gamma _G \Vert {x^* - \hat{x}} \Vert ^2 \end{aligned}$$

and

$$\begin{aligned} \langle {x^*-\hat{x}},{A^* (y^* - \hat{y})}\rangle = \langle {A(x^*-\hat{x})},{y^* - \hat{y}}\rangle \ge 0. \end{aligned}$$

These sum up to

$$\begin{aligned} \langle {x^*-\hat{x}},{(V-A)^* \hat{y}}\rangle \ge \gamma _G \Vert {x^* - \hat{x}} \Vert ^2 . \end{aligned}$$

Furthermore, the Cauchy–Schwarz inequality gives

$$\begin{aligned} \gamma _G \Vert {x^* - \hat{x}} \Vert ^2 \le \langle {x^*-\hat{x}},{(V-A)^* \hat{y}}\rangle \le \Vert {x^* - \hat{x}} \Vert \Vert {(V-A)^* \hat{y}} \Vert , \end{aligned}$$

which shows

$$\begin{aligned} \gamma _G \Vert x^* - \hat{x}\Vert \le \Vert (V-A)^* \hat{y} \Vert . \end{aligned}$$

\(\square \)

Notably, we cannot show that the mismatched algorithm converges to the original solution (a situation which is possible for other mismatched iterations [7]). However, since we can bound the difference of the fixed points, up to a multiplicative constant, by the norm of the difference between the correct and the mismatched adjoint, the analysis of the Chambolle–Pock method with mismatched adjoint can be of interest in practical applications. Moreover, in applications in computerized tomography, mismatched adjoints are also used on purpose [4,5,6,7].

The rest of the paper is structured as follows. In Sect. 2 we reformulate (4), introduce the concept of test operators from [10] and provide some technical lemmas that we need to prove convergence of the mismatched iteration in Sect. 3. In Sect. 4, we present numerical examples and Sect. 5 concludes the paper.

Throughout this paper, we will use \(\langle {x},{x'}\rangle _T := \langle {Tx},{x'}\rangle \) and the seminorm \(\Vert {x} \Vert _T ^{2}:= \langle {x},{x}\rangle _T\) for a (not necessarily symmetric) positive semidefinite operator T. In the case \(T = I\), we will denote the Hilbert space norm as \(\Vert {x} \Vert \) without the subscript. Additionally, we will use the notation \(\mathcal {L}(\mathcal {U},\mathcal {V})\) for the space of bounded linear operators \(L: \mathcal {U} \rightarrow \mathcal {V}\) between Hilbert spaces \(\mathcal {U}\) and \(\mathcal {V}\) with the corresponding operator norm \(\Vert {L} \Vert = \inf \{c \ge 0: \Vert {Lx} \Vert \le c \Vert {x} \Vert \text { for all } x \in \mathcal {U}\}\). Furthermore, we will write \(A \ge B\) for operators \(A,B \in \mathcal {L}(\mathcal {U},\mathcal {U})\) if \(A-B\) is positive semidefinite.

2 Preliminaries

In this section we present the reformulation of the mismatched iteration as a preconditioned proximal point method, recall the results from [10] on which our analysis relies and provide the necessary technical estimates needed for the convergence proof. Note that we do not prove the existence of a fixed point of the mismatched iteration, but take its existence for granted.

2.1 Subdifferential Reformulation

Since the original proof of Chambolle and Pock in [1] relies on having the exact adjoint, we use a different approach, namely the reformulation of the method as a preconditioned proximal point method from [11] (see also [12]). Recall that the proximal operator of a proper, convex and lower semicontinuous functional \(f: X \rightarrow \overline{\mathbb {R}}\) is defined as \({{\,\textrm{prox}\,}}_{f}(x) = \arg \min _{y \in X} f(y) + \tfrac{1}{2}\Vert {y-x} \Vert ^{2}\) and that it holds

$$\begin{aligned} v = {{\,\textrm{prox}\,}}_f(x) \Leftrightarrow x-v \in \partial f(v). \end{aligned}$$

One sees that each stationary point \({\hat{u}} = \begin{pmatrix} {\hat{x}}\\ {\hat{y}} \end{pmatrix}\) of iteration (4) fulfills \(0 \in H(\hat{u})\) with

$$\begin{aligned} H(u) = H(x,y) = \begin{pmatrix} \partial G(x) + V^* y \\ \partial F^*(y) - Ax \end{pmatrix}. \end{aligned}$$
(5)

With \(\bar{x}^{i+1} = (1+\omega _i) x^{i+1} - \omega _i x^i\), we rewrite the update steps of algorithm (4) as

$$\begin{aligned} 0 \in \left( \begin{array}{c} x^{i+1} + \tau _i \partial G(x^{i+1}) - x^i + \tau _i V^* y^i \\ y^{i+1} + \sigma _{i+1} \partial F^*(y^{i+1})- y^i - \sigma _{i+1} A \bar{x}^{i+1} \end{array} \right) . \end{aligned}$$
(6)

We rewrite this inclusion with the help of the operator \({\tilde{H}}_{i+1}\) defined by

$$\begin{aligned} \tilde{H}_{i+1}(u) = \begin{pmatrix} \partial G(x) + V^*y \\ \partial F^*(y) - A [(1+\omega _i) x - \omega _i x^i ] - \omega _i V (x^i -x) \end{pmatrix}, \end{aligned}$$
(7)

the preconditioner

$$\begin{aligned} M_{i+1} = \begin{pmatrix} I &{} - \tau _i V^* \\ - \omega _i \sigma _{i+1} V &{} I\end{pmatrix}, \end{aligned}$$
(8)

and the step length operator

$$\begin{aligned} W_{i+1} = \begin{pmatrix} \tau _i I &{} 0 \\ 0 &{} \sigma _{i+1} I \end{pmatrix}. \end{aligned}$$
(9)

as the following preconditioned proximal point iteration

$$\begin{aligned} 0 \in W_{i+1} \tilde{H}_{i+1}(u^{i+1}) + M_{i+1} (u^{i+1} - u^i) \end{aligned}$$
(10)

which is exactly the mismatched iteration (6).
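To verify this, note that the first row of (10) reads

$$\begin{aligned} 0 \in \tau _i \big (\partial G(x^{i+1}) + V^* y^{i+1}\big ) + x^{i+1} - x^i - \tau _i V^*(y^{i+1} - y^i) = x^{i+1} + \tau _i \partial G(x^{i+1}) - x^i + \tau _i V^* y^i, \end{aligned}$$

while in the second row the two terms involving \(\omega _i V\) cancel, so that

$$\begin{aligned} 0 \in \sigma _{i+1} \big (\partial F^*(y^{i+1}) - A \bar{x}^{i+1} - \omega _i V (x^i - x^{i+1})\big ) + y^{i+1} - y^i - \omega _i \sigma _{i+1} V (x^{i+1} - x^i) = y^{i+1} + \sigma _{i+1} \partial F^*(y^{i+1}) - y^i - \sigma _{i+1} A \bar{x}^{i+1}, \end{aligned}$$

which are exactly the two rows of (6).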

For this formulation of the iteration we can apply results from [10] and [12]. We quote the following (slightly modified) general theorem from [12, Theorem 2.1]:

Theorem 2.1

Let \(\mathcal {U}\) be a Hilbert space, \(\tilde{H}_{i+1}: \mathcal {U} \rightrightarrows \mathcal {U}\) and let \(M_{i+1}, W_{i+1}, Z_{i+1} \in \mathcal {L}(\mathcal {U},\mathcal {U})\) for \(i \in \mathbb {N}\). Suppose that

$$\begin{aligned} 0 \in W_{i+1} \tilde{H}_{i+1}(u^{i+1}) + M_{i+1}(u^{i+1}-u^i) \end{aligned}$$
(11)

is solvable for \(\{u^{i+1}\}_{i \in \mathbb {N}} \subset \mathcal {U}\) and let \(\hat{u} \in \mathcal {U}\) be a stationary point of the iteration.

If \(Z_{i+1}M_{i+1}\) is self-adjoint and for some \(\Delta _{i+1} \in \mathbb {R}\) the condition

$$\begin{aligned}&\langle {\tilde{H}_{i+1}(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} \nonumber \\&\quad \ge \frac{1}{2} \Vert {u^{i+1} - \hat{u}} \Vert _{Z_{i+2}M_{i+2} - Z_{i+1}M_{i+1}}^2 - \frac{1}{2} \Vert {u^{i+1}-u^i} \Vert ^2_{Z_{i+1}M_{i+1}} - \Delta _{i+1} \end{aligned}$$
(12)

holds for all \(i \in \mathbb {N}\), then so does the descent inequality

$$\begin{aligned} \frac{1}{2} \Vert {u^N - \hat{u}} \Vert ^2_{Z_{N+1}M_{N+1}} \le \frac{1}{2} \Vert u^0-\hat{u}\Vert ^2_{Z_1M_1} + \sum _{i=0}^{N-1} \Delta _{i+1}. \end{aligned}$$

The maps \(\tilde{H}_{i}\), \(M_{i}\), and \(W_{i}\) from (7), (8), and (9) are defined on \(\mathcal {U} = X \times Y\), and with these choices the iteration (11) is exactly the iteration (4). The operator \(Z_{i}\) and the number \(\Delta _{i}\) are yet to be defined and are used to establish inequality (12) and the final descent inequality. The operator \(Z_{i}\) is called the test operator and the \(\Delta _{i}\) can be used to further quantify the descent.

We will introduce the test operator \(Z_{i}\) and the quantities \(\Delta _{i}\) in the next subsection. We aim for non-positive \(\left( \Delta _{i}\right) _{i \in \mathbb {N}}\), but also want the operators \(Z_{i+1}M_{i+1}\) to grow as fast as possible in order to obtain fast convergence.

Consequently, our next aim is to show that

  • with the right step length choices an operator \(Z_{i+1}\) with \(Z_{i+1} M_{i+1}\) being self-adjoint exists, and

  • for some non-positive \(\Delta _{i+1}\) the inequality (12) can be obtained.

2.2 Test Operator and Step Length Bounds

We choose the test operator as

$$\begin{aligned} Z_{i+1} := \begin{pmatrix} \varphi _i I &{} 0 \\ 0 &{} \psi _{i+1}I \end{pmatrix} \end{aligned}$$
(13)

for \(i \in \mathbb {N}\) and show that with this choice we can fulfill the assumptions of Theorem 2.1 for appropriate \(\varphi _{i}\) and \(\psi _{i+1}\). First, we need \(Z_{i+1}M_{i+1}\) to be self-adjoint. Since this operator is

$$\begin{aligned} Z_{i+1}M_{i+1} = \begin{pmatrix} \varphi _{i}I &{} -\tau _{i}\varphi _{i}V^{*}\\ -\omega _{i}\sigma _{i+1}\psi _{i+1}V &{} \psi _{i+1}I \end{pmatrix}, \end{aligned}$$
(14)

we assume that the values \(\varphi _{i},\psi _{i+1}\) of the test operator fulfill

$$\begin{aligned} \omega _{i}\sigma _{i+1}\psi _{i+1} = \tau _{i}\varphi _{i}. \end{aligned}$$
(15)

Next we introduce the “tested dual stepsize” \(\psi _{i}\sigma _{i}\)

$$\begin{aligned} \eta _{i} = \psi _{i}\sigma _{i} \end{aligned}$$
(16)

and define the extrapolation constant \(\omega _{i}\) as

$$\begin{aligned} \omega _{i} = \tfrac{\eta _{i}}{\eta _{i+1}}. \end{aligned}$$
(17)

Consequently, the “tested primal stepsize” \(\tau _{i}\varphi _{i}\) also fulfills

$$\begin{aligned} \eta _{i} = \eta _{i+1}\omega _{i} = \omega _{i}\sigma _{i+1}\psi _{i+1} = \tau _{i}\varphi _{i}. \end{aligned}$$
(18)

Furthermore, Theorem 2.1 needs \(Z_{i+1} M_{i+1}\) to be positive semidefinite, which we show now.

Lemma 2.2

([12, Lemma 3.4]) Let \(i \in \mathbb {N}\) and suppose that conditions (15), (16) and (17) hold. Then we have that \(Z_{i+1} M_{i+1}\) is self-adjoint and satisfies

$$\begin{aligned} Z_{i+1} M_{i+1} = \begin{pmatrix} \varphi _i I &{} - \eta _i V^* \\ - \eta _i V &{} \psi _{i+1} I \end{pmatrix} \ge \begin{pmatrix} \delta \varphi _i I &{} 0 \\ 0 &{} \psi _{i+1} I - \frac{\eta _i^2 }{\varphi _i (1-\kappa )} VV^* \end{pmatrix} \end{aligned}$$
(19)

for all \(\delta \in [0,\kappa ]\) with \(\kappa \in (0,1)\).

Proof

First note that from (14), (15), (16), (17), and (18) we get

$$\begin{aligned} Z_{i+1} M_{i+1}&=\begin{pmatrix} \varphi _i I &{} - \eta _i V^* \\ - \eta _i V &{} \psi _{i+1} I \end{pmatrix}. \end{aligned}$$

For the second claim we observe

$$\begin{aligned} M&: = Z_{i+1} M_{i+1} - \begin{pmatrix} \delta \varphi _i I &{} 0 \\ 0 &{} \psi _{i+1} I - \frac{\eta _i^2}{\varphi _i (1-\kappa )} VV^* \end{pmatrix} \\&= \begin{pmatrix} (1-\delta ) \varphi _i I &{} -\eta _i V^* \\ - \eta _i V &{}(-\eta _i V ) \left( (1-\kappa ) \varphi _i \right) ^{-1} (-\eta _i V)^* \end{pmatrix}. \end{aligned}$$

Since \(1> \kappa \ge \delta \), we have \((1-\delta ) \varphi _i \ge (1-\kappa ) \varphi _i > 0\), and for every \(u = (x,y)\) it holds that \(\langle {Mu},{u}\rangle \ge \big \Vert \sqrt{(1-\kappa )\varphi _i}\, x - \tfrac{\eta _i}{\sqrt{(1-\kappa )\varphi _i}}\, V^* y \big \Vert ^2 \ge 0\), which shows the positive semidefiniteness of M. \(\square \)

Hence, we can ensure that \(Z_{i+1}M_{i+1}\) is positive semidefinite if we assume

$$\begin{aligned} \psi _{i+1}&\ge \frac{\eta _i^2}{\varphi _i (1-\kappa )} \Vert {V} \Vert ^2. \end{aligned}$$
(20)

Note that by \(\eta _i = \varphi _i \tau _i = \psi _i \sigma _i\) and \(\eta _{i+1} = \eta _{i} \omega _{i}^{-1}\) we get that condition (20) enforces

$$\begin{aligned} \sigma _{i+1} \tau _{i} \le \frac{(1-\kappa )}{\omega _i} \frac{1}{\Vert {V} \Vert ^2}, \end{aligned}$$

similar to the widely known stepsize bounds in [1].
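For completeness, this can be seen by inserting \(\varphi _i = \eta _i/\tau _i\) and \(\psi _{i+1} = \eta _{i+1}/\sigma _{i+1} = \eta _i/(\omega _i \sigma _{i+1})\) into (20), which gives

$$\begin{aligned} \frac{\eta _i}{\omega _i \sigma _{i+1}} \ge \frac{\eta _i^2 \tau _i}{\eta _i (1-\kappa )} \Vert {V} \Vert ^2 = \frac{\eta _i \tau _i}{1-\kappa } \Vert {V} \Vert ^2 \quad \Leftrightarrow \quad \sigma _{i+1} \tau _{i} \le \frac{1-\kappa }{\omega _i \Vert {V} \Vert ^2}. \end{aligned}$$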

Next we investigate the operator \(Z_{i+2} M_{i+2} - Z_{i+1} M_{i+1}\) and show that it is easy to evaluate \(\Vert {u} \Vert _{Z_{i+2} M_{i+2} - Z_{i+1} M_{i+1}}^{2}\).

Lemma 2.3

([12, Lemma 3.5]) Let \(i \in \mathbb {N}\) and assume that conditions (15), (16), (17) and (20) are fulfilled and define

$$\begin{aligned} \Xi _{i+1} := 2 \begin{pmatrix} \tau _i \mu _G I &{} \tau _i V^* \\ -\sigma _{i+1} V &{} \sigma _{i+1} \mu _{F^*} I \end{pmatrix} \end{aligned}$$
(21)

If the constants \(\mu _{G}\) and \(\mu _{F^{*}}\) are chosen such that

$$\begin{aligned} \varphi _{i+1 }&= \varphi _i (1 + 2 \tau _i \mu _G), \end{aligned}$$
(22)
$$\begin{aligned} \psi _{i+1}&= \psi _i (1 + 2 \sigma _i \mu _{F^*}), \end{aligned}$$
(23)

then it holds that

$$\begin{aligned} \Vert {u} \Vert ^2_{Z_{i+2}M_{i+2} - Z_{i+1} M_{i+1} } = \Vert {u} \Vert ^2_{Z_{i+1}\Xi _{i+1}}. \end{aligned}$$

Proof

We use the expression for \(Z_{i+1}M_{i+1}\) from (19) and the conditions (15), (16),  (22), (23) and (17) to get

$$\begin{aligned}&Z_{i+1} M_{i+1} + Z_{i+1} \Xi _{i+1} - Z_{i+2} M_{i+2} \\&\quad = \begin{pmatrix} \varphi _i I &{} - \eta _i V^* \\ - \eta _i V &{} \psi _{i+1} I \end{pmatrix} + \begin{pmatrix} \varphi _i I &{} 0 \\ 0 &{} \psi _{i+1}I \end{pmatrix} \begin{pmatrix} 2 \tau _i \mu _G I &{} 2\tau _i V^* \\ -2\sigma _{i+1} V &{} 2 \sigma _{i+1} \mu _{F^*} I \end{pmatrix} - \begin{pmatrix} \varphi _{i+1} I &{} - \eta _{i+1} V^* \\ - \eta _{i+1} V &{} \psi _{i+2} I \end{pmatrix} \\&\quad = \begin{pmatrix} \varphi _i I &{} - \eta _i V^* \\ - \eta _i V &{} \psi _{i+1} I \end{pmatrix} + \begin{pmatrix} 2 \varphi _i \tau _i \mu _G I &{} 2 \varphi _i \tau _i V^* \\ -2\sigma _{i+1} \psi _{i+1} V &{} 2 \sigma _{i+1} \psi _{i+1} \mu _{F^*} I \end{pmatrix} - \begin{pmatrix} \varphi _{i+1} I &{} - \eta _{i+1} V^* \\ - \eta _{i+1} V &{} \psi _{i+2} I \end{pmatrix} \\&\quad = \begin{pmatrix} \varphi _{i+1} I &{} \eta _{i} V^* \\ (- \eta _i - 2 \eta _{i+1}) V &{} \psi _{i+2} I \end{pmatrix} - \begin{pmatrix} \varphi _{i+1} I &{} - \eta _{i+1} V^* \\ - \eta _{i+1} V &{} \psi _{i+2} I \end{pmatrix} \\&\quad = \begin{pmatrix} 0 &{} (\eta _i + \eta _{i+1}) V^* \\ - (\eta _i + \eta _{i+1}) V &{} 0 \end{pmatrix}. \end{aligned}$$

This shows that \(Z_{i+1} M_{i+1} + Z_{i+1} \Xi _{i+1} - Z_{i+2} M_{i+2}\) is skew-symmetric, and hence, it holds for all u that \(\Vert {u} \Vert ^2_{Z_{i+1} ( M_{i+1} + \Xi _{i+1} ) - Z_{i+2} M_{i+2} } = 0\) from which the statement follows. \(\square \)

2.3 Technical Estimates

With the preparations from the previous subsection we are in position to estimate the term

$$\begin{aligned} D&:= \langle {\tilde{H}_{i+1}(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} - \frac{1}{2} \Vert {u^{i+1} - \hat{u}} \Vert _{Z_{i+2}M_{i+2} - Z_{i+1}M_{i+1}}^2\\&= \langle {\tilde{H}_{i+1}(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} - \frac{1}{2} \Vert {u^{i+1} - \hat{u}} \Vert _{Z_{i+1}\Xi _{i+1}}^2, \end{aligned}$$

which appears in Theorem 2.1. Recall that the operator \({\tilde{H}}_{i}\) is given by (7), the preconditioner \(M_{i}\) by (8), the step length operator \(W_{i}\) by (9), the test operator \(Z_{i}\) is defined in (13), and the operator \(\Xi _{i}\) in (21). In order to estimate D, we assume that both \(F^*\) and G are strongly convex functionals and choose the step lengths appropriately. The following theorem gives the resulting estimate for D.

Theorem 2.4

Let \(i \in \mathbb {N}\). Suppose that the conditions (15), (16), (17), (22), and (23) hold and that \(G, F^*\) are \(\gamma _G\)- and \(\gamma _{F^{*}}\)-strongly convex, respectively, with

$$\begin{aligned} \gamma _G \ge \frac{\epsilon }{2 \omega _i} \Vert {A-V} \Vert + \mu _G \end{aligned}$$
(24)

for some \(\epsilon >0\). Then it holds that

$$\begin{aligned} D \ge \eta _{i+1} \Big ( \gamma _{F^*} - \mu _{F^*} - \frac{1+\omega _i}{2 \epsilon } \Vert {A-V} \Vert \Big ) \Vert {y^{i+1} - \hat{y}} \Vert ^2 - \frac{\eta _i \epsilon \Vert {A-V} \Vert }{2} \Vert {x^{i+1} - x^i} \Vert ^2. \end{aligned}$$

Proof

We observe

$$\begin{aligned} - \frac{1}{2} \Vert {u^{i+1} - \hat{u} } \Vert ^2_{Z_{i+1} \Xi _{i+1}}&= (\eta _{i+1} - \eta _i) \langle {V (x^{i+1} - \hat{x})},{y^{i+1} - \hat{y} }\rangle \\&\quad - \eta _i \mu _G \Vert {x^{i+1} - \hat{x}} \Vert ^2 - \eta _{i+1} \mu _{F^*} \Vert {y^{i+1} - \hat{y}} \Vert ^2, \end{aligned}$$

which gives, by definition of \({\tilde{H}}_{i+1}\) in (7) and H in (5),

$$\begin{aligned} \begin{aligned} D&= \langle {\tilde{H}_{i+1}(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} - \frac{1}{2} \Vert {u^{i+1} - \hat{u} } \Vert ^2_{Z_{i+1} \Xi _{i+1}} \\&= \langle {H(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} \\&\quad + \eta _{i+1} \langle { (A-V)(x^{i+1} - \bar{x}^{i+1})},{y^{i+1} - \hat{y}}\rangle \\&\quad + (\eta _{i+1} - \eta _i) \langle {V (x^{i+1} - \hat{x})},{y^{i+1} - \hat{y} }\rangle \\&\quad - \eta _i \mu _G \Vert {x^{i+1} - \hat{x}} \Vert ^2 - \eta _{i+1} \mu _{F^*} \Vert {y^{i+1} - \hat{y}} \Vert ^2. \end{aligned} \end{aligned}$$
(25)

Now we estimate the first term on the right hand side: Since \(\hat{u} \in H^{-1}(0)\) with \(\hat{u} = \begin{pmatrix} \hat{x} \\ \hat{y} \end{pmatrix}\), we have

$$\begin{aligned} -V^* \hat{y} \in \partial G(\hat{x}) \qquad \text {and}\qquad A \hat{x} \in \partial F^*(\hat{y}) \end{aligned}$$

Hence we get

$$\begin{aligned} \langle {H(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}}&= \langle {H(u^{i+1})- H(\hat{u})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} \\&= \eta _i \langle {\partial G(x^{i+1}) - \partial G({\hat{x}})},{x^{i+1}-\hat{x}}\rangle \\&\quad + \eta _{i+1} \langle { \partial F^*(y^{i+1}) - \partial F^{*}({\hat{y}})},{y^{i+1}-\hat{y}}\rangle \\&\quad + \eta _i \langle {V^*(y^{i+1} - \hat{y})},{x^{i+1} - \hat{x}}\rangle \\&\quad + \eta _{i+1} \langle {A(\hat{x} - x^{i+1})},{y^{i+1}-\hat{y}}\rangle . \end{aligned}$$

Now the strong convexity of G and \(F^*\) with constants \(\gamma _{G}\) and \(\gamma _{F^{*}}\), respectively, results in the inequality

$$\begin{aligned} \langle {H(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}}&\ge \eta _i \gamma _G \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} \gamma _{F^*} \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad + \eta _i \langle { V(x^{i+1} - \hat{x}) },{ y^{i+1} - \hat{y} }\rangle \\&\quad + \eta _{i+1} \langle { A(\hat{x} - x^{i+1}) },{ y^{i+1} - \hat{y} }\rangle . \end{aligned}$$

Plugging this into the definition of D in (25) and collecting terms gives

$$\begin{aligned} D&\ge \eta _i (\gamma _G-\mu _G) \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*}) \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad + \eta _i \langle { V(x^{i+1} - \hat{x}) },{ y^{i+1} - \hat{y} }\rangle + \eta _{i+1} \langle { A(\hat{x} - x^{i+1}) },{ y^{i+1} - \hat{y} }\rangle \\&\quad + \eta _{i+1} \langle { V(\bar{x}^{i+1} - x^{i+1}) },{ y^{i+1} - \hat{y} }\rangle + (\eta _{i+1} - \eta _i) \langle { V(x^{i+1} - \hat{x} ) },{ y^{i+1} - \hat{y} }\rangle \\&= \eta _i (\gamma _G-\mu _G) \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*}) \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad + \eta _{i+1} \langle { (A-V)(\hat{x} - x^{i+1}) },{ y^{i+1} - \hat{y} }\rangle + \eta _{i+1} \langle { (A-V)( x^{i+1} - \bar{x}^{i+1}) },{ y^{i+1} - \hat{y} }\rangle . \end{aligned}$$

We use the extrapolation \(\bar{x}^{i+1} - x^{i+1} = \omega _i (x^{i+1} - x^i)\) to get for every \(\epsilon >0\)

$$\begin{aligned}&\langle { (A-V)(x^{i+1} - \bar{x}^{i+1}) },{ y^{i+1} - \hat{y} }\rangle \\&\quad =\omega _i \langle { (A-V)(x^i - x^{i+1}) },{ y^{i+1} - \hat{y} }\rangle \\&\quad \ge - \omega _i \Vert {A-V} \Vert \Vert {x^i -x^{i+1} } \Vert \Vert { y^{i+1} - \hat{y} } \Vert \\&\quad \ge - \frac{\omega _i \Vert {A-V} \Vert }{2} \left( \epsilon \Vert {x^i -x^{i+1} } \Vert ^2 + \frac{1}{\epsilon } \Vert { y^{i+1} - \hat{y} } \Vert ^2 \right) \end{aligned}$$

by Young’s inequality. Similarly we derive

$$\begin{aligned} \langle {(A-V)(\hat{x} - x^{i+1}) },{ y^{i+1} - \hat{y} }\rangle&\ge - \Vert {A-V} \Vert \Vert {\hat{x} - x^{i+1} } \Vert \Vert { y^{i+1} - \hat{y} } \Vert \\&\ge - \frac{\Vert {A-V} \Vert }{2} \left( \epsilon \Vert {\hat{x} - x^{i+1} } \Vert ^2 + \frac{1}{\epsilon } \Vert { y^{i+1} - \hat{y} } \Vert ^2 \right) . \end{aligned}$$

With these two estimates we continue to lower bound D and arrive at

$$\begin{aligned} D&\ge \eta _i (\gamma _G-\mu _G) \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} \left( \gamma _{F^*} - \mu _{F^*}\right) \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad - \eta _{i+1} \epsilon \frac{\Vert {A-V} \Vert }{2} \Vert { x^{i+1} - \hat{x} } \Vert ^2 - \eta _{i+1} \frac{\Vert {A-V} \Vert }{2 \epsilon } \Vert { y^{i+1} - \hat{y} } \Vert ^2 \\&\quad - \eta _{i+1} \epsilon \frac{\omega _i \Vert {A-V} \Vert }{2} \Vert {x^i -x^{i+1} } \Vert ^2 - \eta _{i+1} \frac{\omega _i \Vert {A-V} \Vert }{2 \epsilon } \Vert { y^{i+1} - \hat{y} } \Vert ^2 \\&= \eta _{i+1} (\gamma _{F^*} - \mu _{F^*} - \frac{(1+\omega _i) \Vert {A-V} \Vert }{2 \epsilon } ) \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad + [ \eta _i (\gamma _G-\mu _G) - \eta _{i+1} \epsilon \frac{\Vert {A-V} \Vert }{2}] \Vert {x^{i+1} - \hat{x}} \Vert ^2 \\&\quad - \eta _{i+1} \epsilon \frac{\omega _i \Vert {A-V} \Vert }{2} \Vert {x^{i+1} - x^i } \Vert ^2 \\&\ge \left[ \eta _{i+1} ( \gamma _{F^*} - \mu _{F^*} - \frac{1+\omega _i}{2 \epsilon } \Vert {A-V} \Vert ) \right] \Vert {y^{i+1} - \hat{y}} \Vert ^2 - \frac{\eta _i \epsilon \Vert {A-V} \Vert }{2} \Vert {x^{i+1} - x^i} \Vert ^2. \end{aligned}$$

\(\square \)

The next lemma provides an estimate for \(-\Delta _{i+1}\).

Lemma 2.5

Let \(i \in \mathbb {N}\), the assumptions of Theorem 2.4 be fulfilled and assume furthermore that (20), (24) and

$$\begin{aligned} \varphi _i&\ge \frac{\epsilon \eta _i}{\delta } \Vert {A-V} \Vert \end{aligned}$$
(26)

hold with \(0<\delta \le \kappa <1\). Define

$$\begin{aligned} S_{i+1} = \begin{pmatrix} (\delta \varphi _i - \epsilon \eta _i \Vert {A-V} \Vert ) I &{} 0 \\ 0 &{} \psi _{i+1} I - \frac{\eta _i^2}{\varphi _i (1-\kappa )} VV^*\end{pmatrix}, \end{aligned}$$
(27)

and assume that

$$\begin{aligned} \frac{1}{2} \Vert {u^{i+1} - u^i } \Vert _{S_{i+1}}^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*} - \frac{1+\omega _i}{2 \epsilon } \Vert {A-V} \Vert ) \Vert { y^{i+1} - \hat{y} } \Vert ^2 \ge - \Delta _{i+1}, \end{aligned}$$

is fulfilled. Then it holds that

$$\begin{aligned} -\Delta _{i+1} \le \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{Z_{i+1} M_{i+1}}^2 + D. \end{aligned}$$

Proof

In the first step, we rewrite

$$\begin{aligned} S_{i+1}&= \begin{pmatrix} (\delta \varphi _i - \epsilon \eta _i \Vert {A-V} \Vert ) I &{} 0 \\ 0 &{} \psi _{i+1} I - \frac{\eta _i^2}{\varphi _i (1-\kappa )} VV^*\end{pmatrix} \\&= \underbrace{\begin{pmatrix} \delta \varphi _i I &{} 0 \\ 0 &{} \psi _{i+1} I - \frac{\eta _i^2}{\varphi _i (1-\kappa )} VV^*\end{pmatrix}}_{\le Z_{i+1}M_{i+1}} - \begin{pmatrix} \epsilon \eta _i \Vert {A-V} \Vert I &{} 0 \\ 0 &{} 0 \end{pmatrix}, \end{aligned}$$

so Lemma 2.2 gives

$$\begin{aligned} \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{S_{i+1}}^2 \le \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{Z_{i+1} M_{i+1}}^2 - \frac{\eta _i \epsilon \Vert {A-V} \Vert }{2} \Vert { x^{i+1} - x^i } \Vert ^2. \end{aligned}$$

Hence

$$\begin{aligned} -\Delta _{i+1}&\le \frac{1}{2} \Vert {u^{i+1} - u^i } \Vert _{S_{i+1}}^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*} - \frac{1+\omega _i}{2 \epsilon } \Vert {A-V} \Vert ) \Vert { y^{i+1} - \hat{y} } \Vert ^2 \\&\le \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{Z_{i+1} M_{i+1}}^2 - \frac{\eta _i \epsilon \Vert {A-V} \Vert }{2} \Vert { x^{i+1} - x^i } \Vert ^2 \\&\qquad + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*} - \frac{1+\omega _i}{2 \epsilon } \Vert {A-V} \Vert ) \Vert { y^{i+1} - \hat{y} } \Vert ^2 \end{aligned}$$

which shows the claim since Theorem 2.4 shows that the last two terms are a lower bound for D. \(\square \)

Now we state the following abstract convergence result which concludes this section:

Theorem 2.6

Suppose that the step length conditions (15), (16), (17), (20)–(23) and (26) hold with \(\epsilon >0\), \(0< \delta \le \kappa < 1\) and for all \(i \in \mathbb {N}\). Additionally suppose that \(G, F^*\) are \(\gamma _G\)- and \(\gamma _{F^{*}}\)-strongly convex, respectively, and that (24) holds. Furthermore, let \(S_{i+1}\) be defined by (27) and let \({\hat{u}} = \begin{pmatrix} {\hat{x}}\\ {\hat{y}} \end{pmatrix}\) fulfill \(0\in H({\hat{u}})\) with H from (5). Then it holds

$$\begin{aligned} \frac{1}{2} \Vert {u^N - \hat{u}} \Vert ^2_{Z_{N+1}M_{N+1}} \le \frac{1}{2} \Vert u^0-\hat{u}\Vert ^2_{Z_1M_1} + \sum _{i=0}^{N-1} \Delta _{i+1} \end{aligned}$$

for every \(\Delta _{i}\) which fulfills

$$\begin{aligned} \frac{1}{2} \Vert {u^{i+1} - u^i } \Vert _{S_{i+1}}^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*} - \frac{1+\omega _i}{2 \epsilon } \Vert {A-V} \Vert ) \Vert { y^{i+1} - \hat{y} } \Vert ^2 \ge - \Delta _{i+1}. \end{aligned}$$

Proof

We recognize from Lemma 2.5 that

$$\begin{aligned} - \Delta _{i+1}&\le \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{Z_{i+1} M_{i+1}}^2 + \langle { \tilde{H}_{i+1}(u^{i+1}) },{ u^{i+1} - \hat{u} }\rangle _{Z_{i+1} W_{i+1} } \\&- \frac{1}{2} \Vert { u^{i+1} - \hat{u} } \Vert _{Z_{i+1} \Xi _{i+1}}^2. \end{aligned}$$

Using the equality

$$\begin{aligned} \Vert {u^{i+1} - \hat{u} } \Vert _{Z_{i+1} \Xi _{i+1}}^2 = \Vert {u^{i+1} - \hat{u} } \Vert _{Z_{i+2} M_{i+2} - Z_{i+1} M_{i+1}}^2 \end{aligned}$$

we get

$$\begin{aligned} - \Delta _{i+1}&\le \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{Z_{i+1} M_{i+1}}^2 + \langle { \tilde{H}_{i+1}(u^{i+1}) },{ u^{i+1} - \hat{u} }\rangle _{Z_{i+1} W_{i+1} } \\&- \frac{1}{2} \Vert { u^{i+1} - \hat{u} } \Vert _{Z_{i+2} M_{i+2} - Z_{i+1} M_{i+1}}^2. \end{aligned}$$

Rearranging these terms leads to

$$\begin{aligned} \langle { \tilde{H}_{i+1}(u^{i+1}) },{ u^{i+1} - \hat{u} }\rangle _{Z_{i+1} W_{i+1} }&\ge \frac{1}{2} \Vert { u^{i+1} - \hat{u} } \Vert _{Z_{i+2} M_{i+2} - Z_{i+1} M_{i+1}}^2 \\&\quad \quad - \frac{1}{2} \Vert { u^{i+1} - u^i } \Vert _{Z_{i+1} M_{i+1}}^2 - \Delta _{i+1} \end{aligned}$$

and thus all conditions of Theorem 2.1 are satisfied, and the result follows. \(\square \)

3 Convergence Rates

With the results from the previous section, we are in a position to prove convergence of (4) under easily verifiable conditions. We assume that both G and \(F^*\) are proper, strongly convex, and lower semicontinuous functions and show that the stepsizes can be chosen such that we obtain linear convergence. We proceed as follows: First we derive conditions under which we can guarantee linear convergence, and then we show how to select the parameters in order to obtain a choice of the stepsizes that is simple to apply in practice.

Theorem 3.1

Choose \(\mu _G> 0, \mu _{F^*} > 0\) and suppose that G is \(\gamma _G\)-strongly convex and \(F^*\) is \(\gamma _{F^*}\)-strongly convex with

$$\begin{aligned} \gamma _G \ge \frac{\epsilon }{2 \omega _i} \Vert {A-V} \Vert + \mu _G, \;\; \gamma _{F^*} \ge \frac{1+\omega _i}{2 \epsilon } \Vert A-V\Vert + \mu _{F^*} \end{aligned}$$
(28)

for some \(\epsilon > 0\). Furthermore let \(u^{N} = (x^{N},y^{N})^{T}\) be generated by the Chambolle–Pock method with mismatched adjoint (4) and \({\hat{u}} = ({\hat{x}},{\hat{y}})^{T}\) be a fixed point of this iteration. Then with constant step lengths

$$\begin{aligned} \tau _i = \tau&:= \min \left\{ \frac{\epsilon ^{-1} \delta }{\Vert {A-V} \Vert }, \sqrt{ \frac{(1-\kappa )\mu _{F^*}}{\Vert {V} \Vert ^2 \mu _G}} \right\} , \\ \sigma _i = \sigma&:= \frac{\mu _{G} }{\mu _{F^*} } \tau , \\ \omega _i = \omega&:= (1+2 \tau \mu _G)^{-1} \end{aligned}$$

for some \(0 < \delta \le \kappa < 1\) it holds that \(\Vert {u^N - \hat{u}} \Vert ^2 =\mathcal {O}(\omega ^N)\).

Proof

We set \(\varphi _0 := \frac{1}{\tau }\) and \(\psi _0 := \frac{1}{\sigma }\) and by (22) we get

$$\begin{aligned} \varphi _i \tau = \underbrace{\varphi _0 \tau }_{=1} (1+2\tau \mu _G)^i = (1+2\tau \mu _G)^i \end{aligned}$$

By (15), (16) and (23) we get

$$\begin{aligned} \psi _{i+1} \sigma&= \psi _ {i} \sigma (1+2\sigma \mu _{F^*})= \psi _ {i} \sigma (1+2\tau \mu _{G}) \\&= \underbrace{\psi _0 \sigma }_{=1} (1+2\tau \mu _{G})^{i+1} = (1+2\tau \mu _{G})^{i+1}. \end{aligned}$$

Hence, again by (16), we get

$$\begin{aligned} \eta _i = (1+2\tau \mu _{G})^i \end{aligned}$$

and, by (17)

$$\begin{aligned} \omega = \frac{\eta _i}{\eta _{i+1}} = \frac{1}{1+2\tau \mu _{G}}. \end{aligned}$$

Now we claim that the matrix \(S_{i+1}\) defined in (27) is positive semidefinite, which is equivalent to the conditions \(\delta \varphi _{i} - \eta _{i}\epsilon \Vert {A-V} \Vert \ge 0\) and \(\psi _{i+1}\ge \eta _{i}^{2}\Vert {V} \Vert ^{2}/(\varphi _{i}(1-\kappa ))\). The first condition is fulfilled since \(\tau \le \delta /(\epsilon \Vert {A-V} \Vert )\) holds by the assumption in the theorem. The second condition follows from the assumption \(\tau ^2 \le (1-\kappa )\mu _{F^{*}}/(\Vert {V} \Vert ^{2}\mu _{G})\), since

$$\begin{aligned}&\tau ^2 \le \frac{(1-\kappa ) \mu _{F^*}}{\Vert {V} \Vert ^2 \mu _G} \overset{\sigma = \frac{\mu _G}{\mu _{F^*}} \tau }{\Leftrightarrow } \tau \sigma \le \frac{1-\kappa }{\Vert {V} \Vert ^2} \\&\quad \overset{\eta _i = \sigma \psi _{i} = \tau \varphi _i}{\Leftrightarrow } \frac{\eta _{i+1} \eta _i}{\psi _{i+1} \varphi _i} \le \frac{1-\kappa }{\Vert {V} \Vert ^2} \\&\quad \overset{\omega = \eta _i / \eta _{i+1}}{\Leftrightarrow } \frac{\eta _{i}^2}{\omega \psi _{i+1} \varphi _i} \le \frac{1-\kappa }{\Vert {V} \Vert ^2} \Leftrightarrow \psi _{i+1} \ge \frac{\eta _i^2 \Vert {V} \Vert ^2}{\varphi _i (1- \kappa ) \omega } \ge \frac{\eta _i^2 \Vert {V} \Vert ^2}{\varphi _i (1- \kappa ) }. \end{aligned}$$

Consequently, \(S_{i+1}\) is positive semidefinite, so we can choose \(\Delta _{i+1} = 0\) for all \(i \in \mathbb {N}\). As a result, Theorem 2.6 and Lemma 2.5 result in

$$\begin{aligned} \Vert { u^0 - \hat{u}} \Vert _{Z_1M_1}^2&\ge \Vert u^N - \hat{u} \Vert ^2_{Z_{N+1} M_{N+1}} \\&\ge \delta \varphi _N \Vert x^N - \hat{x} \Vert ^2 + \underbrace{\left( \psi _{N+1} - \frac{\eta _N^2 \Vert V\Vert ^2}{\varphi _N (1-\kappa )} \right) }_{\ge 0} \Vert y^N - \hat{y}\Vert ^2 \\&= \varphi _{N}\Big (\delta \Vert {x^{N}-{\hat{x}}} \Vert ^{2} + \big (\tfrac{\psi _{N+1}}{\varphi _{N}} - \tfrac{\eta _{N}^{2}\Vert {V} \Vert ^2}{\varphi _{N}^{2}(1-\kappa )}\big )\Vert {y^{N}-{\hat{y}}} \Vert ^{2}\Big ). \end{aligned}$$

Using the properties \(\eta _N = \tau \varphi _N = \sigma \psi _N\) and \(\eta _{N+1} = \omega \eta _N\) we arrive at

$$\begin{aligned} \Vert {u^0 - \hat{u}} \Vert _{Z_1M_1}^2&\ge \varphi _{N}\Big (\delta \Vert {x^{N}-{\hat{x}}} \Vert ^{2} + \big (\tfrac{\psi _{N+1}}{\varphi _{N}} - \tfrac{\eta _{N}^{2}\Vert {V} \Vert ^2}{\varphi _{N}^{2}(1-\kappa )}\big )\Vert {y^{N}-{\hat{y}}} \Vert ^{2}\Big ) \\&= (1+2\tau \mu _{G})^{N}\left( \tfrac{\delta }{\tau }\Vert {x^{N}-{\hat{x}}} \Vert ^{2} + \big (\tfrac{\tau }{\sigma }(1+2\tau \mu _{G}) - \frac{ \tau ^2 \Vert {V} \Vert ^2}{(1-\kappa )} \big )\Vert {y^{N}-{\hat{y}}} \Vert ^{2}\right) \end{aligned}$$

which proves the claim. \(\square \)

The above theorem is not immediately practical, since it is unclear whether all parameters can be chosen such that all conditions are fulfilled. Hence, we now derive a method that allows for a feasible choice of the parameters.

If we plug the definition of \(\omega \) into conditions (28), we get

$$\begin{aligned} \gamma _{G}&\ge \tfrac{\epsilon }{2}(1+2\tau \mu _{G})\Vert {A-V} \Vert + \mu _{G} \end{aligned}$$
(29)
$$\begin{aligned} \gamma _{F^{*}}&\ge \tfrac{1+\tau \mu _{G}}{\epsilon (1+2\tau \mu _{G})}\Vert {A-V} \Vert + \mu _{F^{*}}. \end{aligned}$$
(30)

Since \(\tfrac{1}{2}\le \tfrac{1+t}{1+2t}\le 1\) for \(t>0\), (30) is fulfilled if

$$\begin{aligned} \gamma _{F^{*}} = \tfrac{\Vert {A-V} \Vert }{\epsilon } + \mu _{F^{*}}. \end{aligned}$$
(31)

We express the quantities \(\mu _{G}\) and \(\mu _{F^{*}}\) as \(\mu _{F^{*}} = a\gamma _{F^{*}}\) and \(\mu _{G} = b\gamma _{G}\) with \(0< a,b< 1\). Then, we can use (31) to get a valid value for \(\epsilon \), namely

$$\begin{aligned} \epsilon = \tfrac{\Vert {A-V} \Vert }{(1-a)\gamma _{F^{*}}}. \end{aligned}$$

Furthermore, we observe that it is always beneficial to choose \(\delta \) as large as possible, i.e. we set \(\delta =\kappa \). Additionally, using this \(\epsilon \) and \(\mu _{G}\) in (29), we get

$$\begin{aligned} \gamma _{G}&\ge \tfrac{1+2\tau b\gamma _{G}}{2(1-a)\gamma _{F^{*}}}\Vert {A-V} \Vert ^{2} + b\gamma _{G}, \end{aligned}$$

which gives the inequality

$$\begin{aligned} \tau \le \tfrac{(1-b)(1-a)\gamma _{F^{*}}}{\Vert {A-V} \Vert ^{2}b} - \tfrac{1}{2b\gamma _{G}}. \end{aligned}$$
(32)

On the other hand, we plug in \(\delta =\kappa \), and the values for \(\epsilon \), \(\mu _{F^{*}}\) and \(\mu _{G}\) into the definition of \(\tau \) in Theorem 3.1 and get

$$\begin{aligned} \tau = \min \left\{ \frac{\kappa (1-a)\gamma _{F^{*}}}{\Vert {A-V} \Vert ^{2}}, \sqrt{ \frac{(1-\kappa )a\gamma _{F^*}}{\Vert {V} \Vert ^2 b\gamma _G}} \right\} . \end{aligned}$$

For all choices of \(\kappa \) and a, there exists a small \(b \in (0,1)\) such that the minimum is attained at the left expression. Hence, we choose our parameters in such a way that

$$\begin{aligned} \tau =\frac{\kappa (1-a)\gamma _{F^{*}}}{\Vert {A-V} \Vert ^{2}}, \end{aligned}$$

so we have that

$$\begin{aligned} \tau ^2 = \tfrac{\kappa ^2 (1-a)^2 \gamma _{F^*}^2}{\Vert {A-V} \Vert ^4} \le \tfrac{(1-\kappa ) a\gamma _{F^{*}}}{\Vert {V} \Vert ^2 b \gamma _G} \end{aligned}$$
(33)

must hold in the definition. Hence, we have to find a, b and \(\kappa \) such that

$$\begin{aligned} \tfrac{\kappa (1-a) \gamma _{F^*}}{\Vert {A-V} \Vert ^2} = \tau \overset{(32)}{\le }\tfrac{(1-b)(1-a)\gamma _{F^{*}}}{\Vert {A-V} \Vert ^{2}b} - \tfrac{1}{2b\gamma _{G}}, \end{aligned}$$

which is equivalent to

$$\begin{aligned} \kappa \le \tfrac{1-b}{b} - \tfrac{\Vert {A-V} \Vert ^2}{2 b (1-a) \gamma _{F^*} \gamma _G} = \tfrac{1}{b} \left( 1-b - \tfrac{\Vert {A-V} \Vert ^2}{2 (1-a) \gamma _{F^*} \gamma _G} \right) . \end{aligned}$$
(34)

Clearly, the upper bound increases as b decreases. Hence, we restrict ourselves to \(b \le \tfrac{1}{2}\) and use our remaining degree of freedom to set \(a = \tfrac{1}{2}\). Now (33) turns into

$$\begin{aligned} \tfrac{\kappa ^2 \gamma _{F^*}}{4 \Vert {A-V} \Vert ^4} \le \tfrac{1}{b} \tfrac{1-\kappa }{2 \Vert {V} \Vert ^2 \gamma _G}, \end{aligned}$$
(35)

which is equivalent to the condition

$$\begin{aligned} b \le \tfrac{1- \kappa }{\kappa ^2} \tfrac{\Vert {A-V} \Vert ^4}{\Vert {V} \Vert ^2} \tfrac{2}{\gamma _{F^*} \gamma _G} \end{aligned}$$
(36)

on b. To satisfy (34), we require \(b \le \frac{1}{2}\) and

$$\begin{aligned} \kappa \le \tfrac{1}{b} \left( \tfrac{1}{2} - \tfrac{\Vert {A-V} \Vert ^2}{\gamma _{F^*} \gamma _G} \right) . \end{aligned}$$
(37)

The latter is positive whenever

$$\begin{aligned} 2 \Vert {A-V} \Vert ^2 < \gamma _{F^*} \gamma _G \end{aligned}$$
(38)

is fulfilled. So by (37) we are able to find, for every \(\kappa \in (0,1)\), a small enough b such that the corresponding inequality holds. In conclusion, we have proven:

Theorem 3.2

Suppose that \(G, F^*\) are \(\gamma _G/\gamma _{F^*}\)-strongly convex functions fulfilling

$$\begin{aligned} \gamma _{F^*} \gamma _G > 2 \Vert {A-V} \Vert ^2. \end{aligned}$$

Define

$$\begin{aligned} b = \min \left\{ \tfrac{1}{2}, \tfrac{1}{\kappa } \left( \tfrac{1}{2} - \tfrac{\Vert {A-V} \Vert ^2}{\gamma _{F^*} \gamma _G} \right) , \tfrac{1-\kappa }{\kappa ^2} \tfrac{\Vert {A-V} \Vert ^4}{\Vert {V} \Vert ^2} \tfrac{2}{\gamma _{F^*} \gamma _G}\right\} \end{aligned}$$

for some arbitrary \(\kappa \in (0,1)\). Furthermore let \(\hat{u} = ({\hat{x}},{\hat{y}})^{T}\) be a fixed point of the Chambolle–Pock method with mismatched adjoint and constant step lengths

$$\begin{aligned} \tau _i = \tau&:= \sqrt{ \frac{(1-\kappa )\gamma _{F^*}}{2 b \Vert {V} \Vert ^2 \gamma _G}}, \\ \sigma _i = \sigma&:= 2 b \frac{\gamma _{G} }{\gamma _{F^*} } \tau , \\ \omega _i = \omega&:= (1+2 b \tau \gamma _G)^{-1}. \end{aligned}$$

Then the iterates \((u^{N})_{N \in \mathbb {N}}\) converge with \(\Vert {u^N - \hat{u}} \Vert ^2\) decaying to zero at the rate \(\mathcal {O}(\omega ^N)\).

As a consequence of the freedom of choice for the value of \(\kappa \in (0,1)\), we can choose \(\kappa \le \frac{1}{2}\) small enough such that b in Theorem 3.2 equals \(\frac{1}{2}\). This leads to the following corollary.

Corollary 3.3

Suppose that \(G, F^*\) are \(\gamma _G/\gamma _{F^*}\)-strongly convex functions fulfilling

$$\begin{aligned} \gamma _{F^*} \gamma _G > 2 \Vert {A-V} \Vert ^2. \end{aligned}$$

Let

$$\begin{aligned} 0< \kappa \le \min \left\{ \tfrac{1}{2}, 1- \tfrac{2 \Vert {A-V} \Vert ^2}{\gamma _{F^*} \gamma _G}, \tfrac{\Vert {A-V} \Vert ^2}{\Vert {V} \Vert } \sqrt{\tfrac{2}{\gamma _{F^*} \gamma _G}}\right\} . \end{aligned}$$

Furthermore let \(\hat{u} = ({\hat{x}},{\hat{y}})^{T}\) be a fixed point of the Chambolle–Pock method with mismatched adjoint and constant step lengths

$$\begin{aligned} \tau _i = \tau&:= \sqrt{ \frac{(1-\kappa )\gamma _{F^*}}{\Vert {V} \Vert ^2 \gamma _G}}, \\ \sigma _i = \sigma&:= \frac{\gamma _{G} }{\gamma _{F^*} } \tau , \\ \omega _i = \omega&:= (1+\tau \gamma _G)^{-1}. \end{aligned}$$

Then the iterates converge with \(\Vert {u^N - \hat{u}} \Vert ^2\) decaying to zero at the rate \(\mathcal {O}(\omega ^N)\).

Proof

By choosing \(\kappa \) as stated, one shows by a routine calculation that one gets \(b=1/2\) in Theorem 3.2. \(\square \)

As a consequence of Corollary 3.3 we get a simple parameter choice method. For fast convergence one needs a small \(\omega \), i.e. one wants a large \(\tau \). Hence, a smaller \(\kappa \) is better and thus one chooses \(\kappa \) positive but small (e.g. \(\kappa =10^{-5}\)). Note that \(\kappa =0\) is not covered by our theory.
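A minimal sketch of this parameter choice in Python (the helper name, the default value of \(\kappa \) and the handling of the admissibility check are our own choices; the formulas are those of Corollary 3.3 and assume a nonzero mismatch \(\Vert {A-V} \Vert > 0\)):

```python
import numpy as np

def stepsizes_corollary_3_3(gamma_G, gamma_Fstar, norm_V, norm_AmV, kappa=1e-5):
    """Constant stepsizes tau, sigma and extrapolation parameter omega
    according to Corollary 3.3; norm_AmV = ||A - V||, norm_V = ||V||."""
    # admissible range for kappa (assumes norm_AmV > 0)
    kappa_max = min(0.5,
                    1.0 - 2.0 * norm_AmV**2 / (gamma_Fstar * gamma_G),
                    norm_AmV**2 / norm_V * np.sqrt(2.0 / (gamma_Fstar * gamma_G)))
    if not 0.0 < kappa <= kappa_max:
        raise ValueError("kappa violates the bound of Corollary 3.3")
    tau = np.sqrt((1.0 - kappa) * gamma_Fstar / (norm_V**2 * gamma_G))
    sigma = gamma_G / gamma_Fstar * tau
    omega = 1.0 / (1.0 + tau * gamma_G)
    return tau, sigma, omega
```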

Since many problems involve only one strongly convex function while the other function is merely convex, we investigate whether we can prove convergence of the Chambolle–Pock method with mismatch with the approach of this paper. To that end, we start again at inequality (12) and investigate under which conditions we have non-positivity of \(\Delta _{i+1}\). Using the definition of \({\tilde{H}}\) from (7), H from (5), the definition of \(Z_{i+1}\) from (13), the one of \(W_{i+1}\) from (9) and the relations for \(\eta ,\tau ,\sigma ,\omega ,\varphi ,\psi \) from Sect. 2, we get from the monotonicity of the subgradient

$$\begin{aligned}&\langle {\tilde{H}_{i+1}(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} - \frac{1}{2} \Vert {u^{i+1} - \hat{u}} \Vert _{Z_{i+2}M_{i+2} - Z_{i+1}M_{i+1}}^2 \\&\quad + \frac{1}{2} \Vert {u^{i+1}-u^i} \Vert ^2_{Z_{i+1}M_{i+1}} \\&= \langle {H(u^{i+1}) - H(\hat{u})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} \\&\quad + \eta _{i+1} \langle { (A-V)(x^{i+1} - \bar{x}^{i+1})},{y^{i+1} - \hat{y}}\rangle \\&\quad + (\eta _{i+1} - \eta _i) \langle {V (x^{i+1} - \hat{x})},{y^{i+1} - \hat{y} }\rangle \\&\quad - \eta _i \mu _G \Vert {x^{i+1} - \hat{x}} \Vert ^2 - \eta _{i+1} \mu _{F^*} \Vert {y^{i+1} - \hat{y}} \Vert ^2 + \frac{1}{2} \Vert {u^{i+1}-u^i} \Vert ^2_{Z_{i+1}M_{i+1}} \\&\ge \eta _i \gamma _G \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} \gamma _{F^*} \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad + \eta _i \langle {V^*(y^{i+1} - \hat{y})},{x^{i+1}-\hat{x}}\rangle - \eta _{i+1} \langle {y^{i+1} - \hat{y}},{A(x^{i+1} - \hat{x})}\rangle \\&\quad - \eta _i \langle {y^{i+1} - \hat{y}},{(A-V)(x^{i+1} - x^{i})}\rangle + (\eta _{i+1} - \eta _{i}) \langle {V^* (y^{i+1} - \hat{y})},{x^{i+1} - \hat{x}}\rangle \\&\quad - \eta _i \mu _G \Vert {x^{i+1} - \hat{x}} \Vert ^2 - \eta _{i+1} \mu _{F^*} \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad +\varphi _{i} \Vert {x^{i+1} - x^{i}} \Vert ^2 + \psi _{i+1} \Vert {y^{i+1} - y^{i}} \Vert ^2 - 2 \eta _{i} \langle {V^{*} (y^{i+1} - y^{i})},{x^{i+1}-x^{i}}\rangle \\&= \eta _{i} (\gamma _G - \mu _G) \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*}) \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad - \eta _{i+1} \langle {y^{i+1} - \hat{y}},{(A-V)(x^{i+1} - \hat{x})}\rangle - \eta _i \langle {y^{i+1} - \hat{y}},{(A-V)(x^{i+1} - x^{i})}\rangle \\&\quad +\varphi _{i} \Vert {x^{i+1} - x^{i}} \Vert ^2 + \psi _{i+1} \Vert {y^{i+1} - y^{i}} \Vert ^2 - 2 \eta _{i} \langle {V^{*} (y^{i+1} - y^{i})},{x^{i+1}-x^{i}}\rangle \\&= \eta _{i} (\gamma _G - \mu _G) \Vert {x^{i+1} - \hat{x}} \Vert ^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*}) \Vert {y^{i+1} - \hat{y}} \Vert ^2 \\&\quad - \eta _{i+1} \langle {y^{i+1} - \hat{y}},{(A-V)[x^{i+1} - \hat{x} + \omega _i (x^{i+1} - x^{i})]}\rangle \\&\quad +\varphi _{i} \Vert {x^{i+1} - x^{i}} \Vert ^2 + \psi _{i+1} \Vert {y^{i+1} - y^{i}} \Vert ^2 - 2 \eta _{i} \langle {V^{*} (y^{i+1} - y^{i})},{x^{i+1}-x^{i}}\rangle \end{aligned}$$

Using the abbreviations

$$\begin{aligned} \begin{array}{llllll} a = \Vert {x^{i+1} - \hat{x}} \Vert ,&b = \Vert {y^{i+1} - \hat{y}} \Vert ,&c = \Vert {x^{i+1} - x^{i}} \Vert&\text {and}&d = \Vert {y^{i+1} - y^{i}} \Vert \end{array} \end{aligned}$$

as well as the Cauchy-Schwarz inequality, we can finally bound

$$\begin{aligned}&\langle {\tilde{H}_{i+1}(u^{i+1})},{u^{i+1} - \hat{u}}\rangle _{Z_{i+1} W_{i+1}} - \frac{1}{2} \Vert {u^{i+1} - \hat{u}} \Vert _{Z_{i+2}M_{i+2} - Z_{i+1}M_{i+1}}^2 \\&\quad + \frac{1}{2} \Vert {u^{i+1}-u^i} \Vert ^2_{Z_{i+1}M_{i+1}} \\&\ge \eta _{i} (\gamma _G - \mu _G) a^2 + \eta _{i+1} (\gamma _{F^*} - \mu _{F^*}) b^2 - \eta _{i+1} \Vert {A-V} \Vert ab - \eta _i \Vert {A-V} \Vert bc \\&\quad +\varphi _{i} c^2 + \psi _{i+1} d^2 - 2 \eta _{i} \Vert {V} \Vert cd. \end{aligned}$$

This is a quadratic polynomial in the four variables a, b, c, d and hence, non-negativity of this expression is implied by positive semidefiniteness of the quadratic form \((a,b,c,d) Q (a,b,c,d)^T\) with

$$\begin{aligned}Q = \left( \begin{array}{cccc} \eta _{i} (\gamma _G - \mu _G) &{} - \frac{1}{2} \eta _{i+1} \Vert {A-V} \Vert &{} 0 &{} 0 \\ - \frac{1}{2} \eta _{i+1} \Vert {A-V} \Vert &{} \eta _{i+1} (\gamma _{F^*} - \mu _{F^*}) &{} - \frac{1}{2} \eta _i \Vert {A-V} \Vert &{} 0 \\ 0 &{} - \frac{1}{2} \eta _i \Vert {A-V} \Vert &{} \varphi _i &{} - \eta _i \Vert {V} \Vert \\ 0 &{} 0 &{} - \eta _{i} \Vert {V} \Vert &{} \psi _{i+1} \end{array} \right) \end{aligned}$$

However, the conditions for positive semidefiniteness of Q involve the inequality

$$\begin{aligned} \omega _{i} (\gamma _G - \mu _G) (\gamma _{F^*} - \mu _{F^*}) - \frac{1}{4} \Vert {A-V} \Vert ^2 \ge 0 \end{aligned}$$

and if we have \(\Vert {A-V} \Vert \ne 0\), i.e. there is a mismatch, then this implies that \(\gamma _{G},\gamma _{F^{*}}>0\) is necessary for suitable \(\mu _{G},\mu _{F^{*}}\) to exist. Hence, we cannot prove convergence of the Chambolle–Pock method with mismatch with the techniques of this paper if G or \(F^{*}\) is not strongly convex.

Here is a counterexample showing that the Chambolle–Pock method may actually diverge if there is a mismatch and one of the functions is strongly convex while the other is not.

Example 3.4

Let \(A = \begin{pmatrix} 1&1 \end{pmatrix}\in \mathbb {R}^{1\times 2}\), \(F(y) = (y-z)^{2}/2\) for some \(z\in \mathbb {R}\) and \(G\equiv 0\) in problem (1), i.e., we consider the minimization problem

$$\begin{aligned} \min _{x\in \mathbb {R}^{2}}\tfrac{1}{2}(Ax-z)^{2}. \end{aligned}$$

We consider the accelerated Chambolle–Pock method (Algorithm 2 in [1]) which is (with mismatch)

$$\begin{aligned} x^{i+1}&= x^{i} - \tau _{i}V^{*}y^{i}\\ \theta _{i}&= \tfrac{1}{\sqrt{1+2\tau _{i}}},\ \tau _{i+1} = \theta _{i}\tau _{i},\ \sigma _{i+1} = \sigma _{i}/\theta _{i}\\ y^{i+1}&= \frac{y^{i} + \sigma _{i+1}A(x^{i+1} + \theta _{i}(x^{i+1} - x^{i})) - \sigma _{i+1}z}{1+\sigma _{i+1}}. \end{aligned}$$

We initialize the stepsizes with \(\tau _{0}\sigma _{0}<1/\Vert {A} \Vert ^{2}\) and the iterates with

$$\begin{aligned} x^{0} = \begin{pmatrix} 0\\ 0 \end{pmatrix},\quad y^{0} = -z. \end{aligned}$$

For the mismatch we take \(V = \begin{pmatrix} 1&-1 \end{pmatrix} \). A standard calculation shows that one gets

$$\begin{aligned} x^{n}&= \sum _{i=0}^{n-1}\tau _{i}\, V^{*}z,\\ \text {and}\ y^{n}&= y^{0} = -z. \end{aligned}$$

The sequence \(\tau _{i}\) fulfills

$$\begin{aligned} \tau _{i+1}&= \tfrac{\tau _{i}}{\sqrt{1+2\tau _{i}}} \end{aligned}$$

from which we deduce

$$\begin{aligned} \tfrac{1}{\tau _{i+1}} = \tfrac{\sqrt{1+2\tau _{i}}}{\tau _{i}}\le \tfrac{\sqrt{1+2\tau _{i} + \tau _{i}^{2}}}{\tau _{i}} = 1 + \tfrac{1}{\tau _{i}}. \end{aligned}$$

Hence, it holds that \(1/\tau _{i+1}\le 1/\tau _{0} + i+1\), i.e. \(\tau _{i}\ge (1/\tau _{0}+i)^{-1}\). Consequently, \(\sum _{i}\tau _{i}\) diverges and thus the iterates \(x^{i}\) do diverge if \(z\ne 0\).

If we use the non-accelerated variant (Algorithm 1 from [1]) we can show divergence of the \(x^{i}\) as well.
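The divergence is also easy to observe numerically. The following minimal sketch uses the arbitrary choices \(z=1\) and \(\tau _{0}=\sigma _{0}=0.7\) (so that \(\tau _{0}\sigma _{0} = 0.49 < 1/\Vert {A} \Vert ^{2} = 1/2\)) and prints the slowly but unboundedly growing norm of the iterates \(x^{i}\):

```python
import numpy as np

A = np.array([[1.0, 1.0]])     # forward operator
V = np.array([[1.0, -1.0]])    # mismatched operator
z = 1.0
tau, sigma = 0.7, 0.7          # tau * sigma = 0.49 < 1/||A||^2 = 0.5

x = np.zeros(2)
y = np.array([-z])

for i in range(1, 100001):
    x_old = x.copy()
    x = x - tau * (V.T @ y)                  # primal step with the mismatched V^T
    theta = 1.0 / np.sqrt(1.0 + 2.0 * tau)   # acceleration as in Example 3.4
    tau, sigma = theta * tau, sigma / theta
    x_bar = x + theta * (x - x_old)
    y = (y + sigma * (A @ x_bar) - sigma * z) / (1.0 + sigma)
    if i in (10, 100, 1000, 10000, 100000):
        print(i, np.linalg.norm(x))          # grows like log(i), no convergence
```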

4 Numerical Examples

In this section we report some numerical experiments to illustrate the results.

4.1 Convex Quadratic Problems

As examples where all quantities and solutions can be computed exactly, we consider convex quadratic problems of the form

$$\begin{aligned} \min _{x\in \mathbb {R}^{n}}\tfrac{\alpha }{2}\Vert {x} \Vert _{2}^{2} + \tfrac{1}{2\beta }\Vert {Ax-z} \Vert _{2}^{2} \end{aligned}$$
(39)

with \(\alpha ,\beta >0\), \(A\in \mathbb {R}^{m\times n}\) and \(z\in \mathbb {R}^{m}\). With \(G(x) = \tfrac{\alpha }{2}\Vert {x} \Vert _{2}^{2}\) and \(F(\zeta ) = \tfrac{1}{2\beta }\Vert {\zeta -z} \Vert _{2}^{2}\) this is of the form (1). The conjugate functions are \(G^{*}(\xi ) = \tfrac{1}{2\alpha }\Vert {\xi } \Vert _{2}^{2}\) and \(F^{*}(y) = \tfrac{\beta }{2}\Vert {y} \Vert _{2}^{2} + \langle {y},{z}\rangle \), and the respective proximal operators are readily computed as

$$\begin{aligned} {{\,\textrm{prox}\,}}_{\tau G}(x)&= \tfrac{x}{1+\tau \alpha }\\ {{\,\textrm{prox}\,}}_{\sigma F^{*}}(y)&= \tfrac{y-\sigma z}{1+\sigma \beta } \end{aligned}$$

and the optimal primal solution is

$$\begin{aligned} x^{*} = \Big (\alpha I + \tfrac{1}{\beta }A^{T}A\Big )^{-1}(\tfrac{1}{\beta }A^{T}z). \end{aligned}$$

Note that G is strongly convex with constant \(\gamma _{G} = \alpha \) and \(F^{*}\) is strongly convex with constant \(\gamma _{F^{*}} = \beta \) and hence, for \(\alpha ,\beta >0\) we can use Theorem 3.2 to obtain valid stepsizes. For a numerical experiment we choose \(n=400\), \(m=200\), a random matrix \(A\in \mathbb {R}^{m\times n}\) and a perturbation \(V\in \mathbb {R}^{m\times n}\) by adding a small random matrix to A, i.e.

$$\begin{aligned} V = A + E\ \text { with }\ \Vert {E} \Vert \le \eta . \end{aligned}$$

The resulting algorithm is

$$\begin{aligned} \begin{aligned} x^{i+1}&= \tfrac{1}{1+\tau \alpha }(x^{i}-\tau V^{T}y^{i})\\ y^{i+1}&= \tfrac{1}{1+\sigma \beta }\Big (y^{i} + \sigma A(x^{i+1} + \omega (x^{i+1}-x^{i})) - \sigma z\Big ). \end{aligned} \end{aligned}$$
(40)

We check the condition \(\gamma _{G}\gamma _{F^{*}}>2\Vert {A-V} \Vert ^{2}\) numerically and use Theorem 3.2 to obtain feasible stepsizes. For constant stepsizes, we get as limit the unique fixed point

$$\begin{aligned} {\hat{x}}&= \Big (\alpha I + \tfrac{1}{\beta }V^{T}A\Big )^{-1}(\tfrac{1}{\beta }V^{T}z) = V^T(\alpha \beta I + AV^{T})^{-1}z\\ {\hat{y}}&= -(\beta I + \tfrac{1}{\alpha }AV^{T})^{-1}z = -\alpha (\alpha \beta I + AV^{T})^{-1}z\nonumber \end{aligned}$$
(41)

while the true primal solution is

$$\begin{aligned} x^{*} = \Big (\alpha I + \tfrac{1}{\beta }A^{T}A\Big )^{-1}(\tfrac{1}{\beta }A^{T}z) = A^T(\alpha \beta I + AA^{T})^{-1}z \end{aligned}$$
(42)

For our experiment we used \(\alpha = \gamma _G=0.15\) and \(\beta = \gamma _{F^{*}}=1\) and \(\kappa =0.01\) in Theorem 3.2.
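The experiment can be reproduced with a short script. The following is a minimal sketch; the random seed, the normalization of A, the size of the perturbation E and the number of iterations are our own choices, while the stepsizes are computed exactly as in Theorem 3.2:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 400, 200
alpha, beta, kappa = 0.15, 1.0, 0.01            # gamma_G = alpha, gamma_F* = beta

A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, 2)                       # normalize the spectral norm of A
E = rng.standard_normal((m, n))
E *= 0.05 / np.linalg.norm(E, 2)                # perturbation with ||E|| = 0.05
V = A + E
z = rng.standard_normal(m)

norm_V = np.linalg.norm(V, 2)
norm_AmV = np.linalg.norm(A - V, 2)
assert alpha * beta > 2 * norm_AmV**2           # condition of Theorem 3.2

# stepsizes according to Theorem 3.2
b = min(0.5,
        (0.5 - norm_AmV**2 / (alpha * beta)) / kappa,
        (1 - kappa) / kappa**2 * norm_AmV**4 / norm_V**2 * 2 / (alpha * beta))
tau = np.sqrt((1 - kappa) * beta / (2 * b * norm_V**2 * alpha))
sigma = 2 * b * alpha / beta * tau
omega = 1.0 / (1.0 + 2 * b * tau * alpha)

# fixed point (41) of the mismatched iteration and true solution (42)
K = alpha * beta * np.eye(m) + A @ V.T
x_hat = V.T @ np.linalg.solve(K, z)
y_hat = -alpha * np.linalg.solve(K, z)
x_star = A.T @ np.linalg.solve(alpha * beta * np.eye(m) + A @ A.T, z)

# iteration (40)
x, y = np.zeros(n), np.zeros(m)
for _ in range(2000):
    x_new = (x - tau * (V.T @ y)) / (1.0 + tau * alpha)
    y = (y + sigma * (A @ (x_new + omega * (x_new - x))) - sigma * z) / (1.0 + sigma * beta)
    x = x_new

print(np.linalg.norm(x - x_hat))                  # linear convergence to x_hat
print(np.linalg.norm(x - x_star),                 # distance to the true solution ...
      np.linalg.norm((V - A).T @ y_hat) / alpha)  # ... vs. the bound of Theorem 1.2
```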

Figure 1 illustrates that the method with mismatched adjoint behaves as expected: We observe linear convergence towards the fixed point \({\hat{x}}\), and the distance of the iterates to the true minimizer \(x^{*}\) settles below the bound predicted by Theorem 1.2.

Fig. 1

Convergence of iteration (40). Here \({\hat{x}}\) is the fixed point (41) of the iteration with mismatch and \(x^{*}\) is the original primal solution (42). The solid orange plot is the distance of the primal iterates \(x^{k}\) of (40) to the fixed point of the iteration, and the dashed purple line is the distance of the iterates \(x^{k}\) to the original primal solution. As predicted, the latter distance falls below the value given in Theorem 1.2

4.2 Computerized Tomography

To illustrate a real-world application of our results, we consider the problem of computerized tomography (CT) [13]. In computerized tomography one aims to reconstruct a slice of an object from x-ray measurements taken in different directions. The x-ray measurements are stored as the so-called sinogram, and the map from the image of the slice to the sinogram is modeled by a linear map which is referred to as Radon transform or forward projection. The adjoint of this map is called backprojection. There exist various inversion formulas which express the inverse of the Radon transform explicitly, but since the Radon transform is compact (when modeled as a map between the right function spaces [14]), any inversion formula has to be unstable. One popular stable, approximate inversion method is the so-called filtered backprojection (FBP) [13]. The method gives good approximate reconstructions when the number of projections is high and the data is not too noisy. However, the quality of the reconstruction quickly gets worse when the number of projections decreases. There are numerous efforts to increase reconstruction quality from only a few projections, as this lowers the x-ray dose for a CT scan. One successful approach uses total variation (TV) regularization as a reconstruction method [15] and solves the respective minimization problem with the Chambolle–Pock method. Usually, the method takes a large number of iterations. Moreover, there are many ways to implement the forward and the backward projection. In applications it sometimes happens that a pair of forward and backward projections is chosen that are not adjoint to each other, either because the importance of adjointness is not noted, or on purpose (speed of computation, special choice of backprojector to achieve a certain reconstruction quality, see also [4, 6, 7, 16]).

We describe a discrete image with \(m\times n\) pixels as \(x\in \mathbb {R}^{m\times n}\). Its discrete gradient \(\nabla x = u\in \mathbb {R}^{m\times n\times 2}\) is a tensor and for such a tensor we define the pixel-wise absolute value in the pixel \((i_{1},i_{2})\) as \(|{u} |_{i_{1},i_{2}}^2 = \sum _{k=1}^2 u_{i_{1},i_{2},k}^2\), cf. [17, p. 416]. For images \(x\in \mathbb {R}^{m\times n}\) we denote by \(\Vert {x} \Vert _{p} = \left( \sum _{i_{1},i_{2}}|{x} |_{i_{1},i_{2}}^{p} \right) ^{1/p}\) the usual pixel-wise p-norm. With R we denote the discretized Radon transform taking an \(m\times n\)-pixel image to a sinogram of size \(s\times t\). We aim to solve the problem

$$\begin{aligned} \min _{x \in \mathbb {R}^{m \times n}} \frac{\lambda _0}{2}\Vert R x-z\Vert _{2}^{2}+ \frac{\lambda _1}{2} \Vert {|{\nabla x} |} \Vert _{1}+ \frac{\lambda _2}{2} \Vert {x} \Vert _2^2 \end{aligned}$$

for a given sinogram z and constants \(\lambda _{0},\lambda _1,\lambda _{2}>0\). This can be expressed as the saddle point problem

$$\begin{aligned} \min _{x \in \mathbb {R}^{m \times n}} \max _{\begin{array}{c} p \in \mathbb {R}^{m \times n \times 2}\\ q \in \mathbb {R}^{s \times t} \end{array}}- \langle x, {\text {div}} p\rangle + \langle R x-z, q\rangle - \frac{1}{2 \lambda _0} \Vert {q} \Vert ^2 - I_{\Vert {\cdot } \Vert _{\infty } \le \lambda _1}(p) + \frac{\lambda _2}{2} \Vert {x} \Vert ^2. \end{aligned}$$

With \(F^*(q,p) = \frac{1}{2 \lambda _0}\Vert q\Vert _{2}^{2} + \langle q, z \rangle + I_{\Vert {\cdot } \Vert _{\infty } \le \lambda _1}(p)\), \(G(x) = \frac{\lambda _2}{2} \Vert {x} \Vert _2^2\) and \(A = \left( \begin{array}{l} R \\ \nabla \end{array}\right) \) this is exactly a problem of the form (1). The function G is strongly convex; however, \(F^{*}\) is not. Hence, we regularize further by adding \(\epsilon \Vert p\Vert _{2}^{2}/2\) with \(\epsilon >0\) to \(F^{*}\), which amounts to a Huber smoothing of the total variation term in the primal problem.
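For reference, the proximal operators needed in the iteration are cheap to evaluate. The following sketch assumes that the constraint \(\Vert {\cdot } \Vert _{\infty } \le \lambda _1\) is meant pixel-wise with respect to \(|{p} |_{i_1,i_2}\) (so the projection rescales each pixel of p separately); the function names and the small safeguard against division by zero are our own:

```python
import numpy as np

def prox_G(x, tau, lam2):
    # prox of tau * (lam2/2) * ||x||_2^2
    return x / (1.0 + tau * lam2)

def prox_Fstar(q, p, sigma, lam0, lam1, z, eps):
    # q-part: prox of sigma * (1/(2*lam0) * ||q||^2 + <q, z>)
    q_new = (q - sigma * z) / (1.0 + sigma / lam0)
    # p-part: prox of sigma * (eps/2 * ||p||^2 + indicator of the pixel-wise
    # constraint |p|_{i1,i2} <= lam1): scale, then project pixel-wise
    p_scaled = p / (1.0 + sigma * eps)
    norms = np.sqrt(np.sum(p_scaled**2, axis=-1, keepdims=True))
    p_new = p_scaled * np.minimum(1.0, lam1 / np.maximum(norms, 1e-12))
    return q_new, p_new
```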

In our experiment we want to recover the Shepp–Logan phantom \(\hat{x}\) with \(400 \times 400\) pixels from measurements with just 40 equispaced angles and 400 bins for each angle in a parallel beam geometry, with 15% relative Gaussian noise added. Hence, the resulting sinogram z is of the shape \(40 \times 400\). To implement the mismatch we used non-adjoint implementations of the forward and back-projection (we used the Astra toolbox [18] and took as forward operator the parallel strip beam projector and as backward projection the adjoint of the parallel line beam projector, see https://www.astra-toolbox.com/docs/proj2d.html for the documentation). All experiments are done in Python 3.7.

We use the algorithm from Theorem 3.1, where we used \(V^{*} = \begin{pmatrix} S^{*}&-{\text {div}} \end{pmatrix} \) instead of \(A^{*}\), i.e., the correct adjoint of the forward projector is replaced by the computationally more efficient adjoint \(S^{*}\) of the parallel line projector. To achieve a fair comparison, we vary the regularization parameter \(\lambda _1\) of the total variation penalty from 0.6 to 2.4. The remaining parameters are set to \(\lambda _0 = 1\) and \(\lambda _2 = \epsilon = 0.01\). The initial stepsizes are set according to Theorem 3.1 for the mismatched adjoint and according to [1, Algorithm 3] in the non-mismatched case, respectively. The operator norm of the operator \(S^{*}\) is computed numerically and the operator norm of the gradient is estimated as in [17, Lemma 6.142].

In Fig. 2 we show the original image (the famous Shepp-Logan phantom), the reconstruction by filtered backprojection and the best results of the Chambolle–Pock iteration for TV regularized reconstruction with the exact and with the mismatched adjoint. One notes that the use of a mismatched adjoint leads to a good reconstruction, comparable to the result with [1, Algorithm 3].

Fig. 2

Reconstruction (Rec.) of the Shepp–Logan phantom. From left to right: Reconstruction with adjoint mismatch, the absolute reconstruction error towards the original image, the reconstruction with the exact adjoint and the corresponding absolute error. All images have a fixed grayscale with values ranging from 0.0 to 1.0

Figure 3 shows the distance of the iterates to the exact reconstruction (i.e. the true, noise-free Shepp–Logan phantom). Naturally, the Chambolle–Pock iteration does not drive the error to the noise-free solution to zero (for both the exact and the mismatched adjoint). There are at least three different reasons: there is some error in the sinogram, we only use few projections, and there is TV regularization involved. However, the non-mismatched Chambolle–Pock method admittedly gets closer to the original image than the mismatched method. Figure 4 shows the primal objective. We note that in this example the iteration with mismatch yields results comparable to the non-mismatched Chambolle–Pock method. Moreover, it can be seen that, as expected, the use of a mismatched adjoint prevents the true minimization of the objective. With the computationally more efficient parallel line projector as adjoint, we are able to decrease the computation time significantly, with approximately \(15\%\) average time saving per iteration on a 2020 M1 MacBook Air running macOS Big Sur. However, the non-mismatched method takes fewer iterations to retrieve a good result, so no overall computational advantage can be shown by this experiment.

Fig. 3

Relative distance to the exact reconstruction over iterations

Fig. 4

Decay of the primal objective function value

5 Conclusion

We have established stepsizes for which the Chambolle–Pock method converges even if the adjoint is mismatched. Additionally, we presented results showing that not only is strong convergence to a fixed point preserved under strong convexity assumptions, but also that the convergence rate remains linear and of a similar order. Furthermore, since a broad class of problems is within the scope of this paper, we established an upper bound on the distance between the original solution and the fixed point of the iteration with mismatch. Thus, approximating the adjoint with a computationally more efficient operator can be done as long as the assumptions are respected. One of these assumptions is that the iteration with mismatch still possesses a fixed point, and more work is needed to understand when this is guaranteed. Furthermore, we discussed advantages and limitations of our analysis and illustrated our results on an example from computerized tomography, in which the mismatched adjoint was obtained using different discretization schemes for the forward operator and the mismatched adjoint.