1 Introduction

Let \(\Omega \subset \mathbb {R}^d\) be a bounded Lipschitz domain, \(d \ge 2\). For given \(f \in L^2(\Omega )\) and \(\varvec{f}\in [L^2(\Omega )]^d\), we consider a linear symmetric and elliptic partial differential equation

$$\begin{aligned} \begin{array}{lll} -\textrm{div}\,\varvec{A}\nabla u^\star + c&{}u^\star = f + \textrm{div}\,\varvec{f}&{}\quad \text {in } \Omega ,\\ &{}u^\star = 0 &{}\quad \text {on } \Gamma := \partial \Omega , \end{array} \end{aligned}$$
(1)

where \(\varvec{A}(x) \in \mathbb {R}^{d \times d}_{\textrm{sym}}\) is symmetric and \(c(x) \in \mathbb {R}\). As usual, we assume that \(\varvec{A}, c \in L^\infty (\Omega )\), that \(\varvec{A}\) is uniformly positive definite and that the weak form (see (5) below) fits into the setting of the Lax–Milgram lemma. Standard adaptivity aims to approximate the unknown solution \(u^\star \in H^1_0(\Omega )\) of (1) in the energy norm at optimal rate; see [1, 6, 7, 9, 11, 19, 23] for adaptive finite element methods (AFEMs) and [5] for an overview of available results. Instead, the quantity of interest for goal-oriented adaptivity is only some functional value of the unknown solution \(u^\star \in H^1_0(\Omega )\) of (1), and the present paper aims to compute the linear goal functional

$$\begin{aligned} G(u^\star ) := \int _\Omega \big ( gu^\star - \varvec{g}\cdot \nabla u^\star \big ) {\text {d}}{x}, \end{aligned}$$
(2)

for given \(g \in L^2(\Omega )\) and \(\varvec{g}\in [L^2(\Omega )]^d\). To approximate \(G(u^\star )\) accurately, it is not necessary (and might even waste computational time) to accurately approximate the solution \(u^\star \) on the whole computational domain. Due to this potential decrease of computational cost, goal-oriented adaptivity is of high relevance in practice as well as in mathematical research; see, e.g., [3, 4, 10, 17] for some prominent contributions.

The present work formulates a goal-oriented adaptive finite element method (GOAFEM), where the sought goal \(G(u^\star )\) is approximated by some computable \(G_\ell \) such that

$$\begin{aligned} | G(u^\star ) - G_\ell | \xrightarrow {\ell \rightarrow \infty } 0 \quad \text {even at optimal algebraic rate}. \end{aligned}$$
(3)

The earlier works [2, 12, 14, 20] are essentially concerned with optimal convergence rates for GOAFEM, where all arising linear FEM systems are solved exactly. While [12, 14] particularly aim to transfer ideas from the AFEM analysis of [5, 6] to GOAFEM for general elliptic PDEs, the seminal work [20] considers the Poisson model problem and additionally addresses the total computational cost by formulating realistic assumptions on a generic inexact solver (called GALSOLVE in [20, 23]).

The focus of the present work is also on the iterative (and hence inexact) solution of the arising FEM systems. However, we avoid any realistic assumptions on the solver, but rather rely on energy contraction per solver step, which is proved to hold for the preconditioned CG method with optimal multilevel additive Schwarz preconditioner [8] or the geometric multigrid method [25]. In the proposed GOAFEM algorithm, the termination of such a contractive iterative solver is then based on appropriate computable a posteriori error estimates by a similar criterion as in [20, 23]. We discuss several implementations of such termination criteria and prove that these allow to control the total computational cost of computing the approximate goal value \(G_\ell \), where we already stress now that \(G(u^\star ) \approx G_\ell = G(u_\ell ) + R_\ell \), where \(u_\ell \approx u^\star \) is a FEM approximation of \(u^\star \) and \(R_\ell \) is a residual correction related to inexact solution of the FEM formulation. While [20] shows algebraic convergence with optimal rates (in the present setting of FEM on quasi-uniform meshes) with respect to the overall computational cost for the final iterates on every level for sufficiently small adaptivity parameters (for mesh-refinement and solver termination), our main contribution is full linear convergence, i.e., linear convergence of the estimator product independently of the algorithmic decision for either mesh-refinement or solver step and even for arbitrary adaptivity parameters. An immediate consequence is that the convergence rate of the computed solutions with respect to the number of elements will be the same as with respect to the overall computational cost (i.e., the cumulative computational time). Moreover, for sufficiently small adaptivity parameters, we show convergence with optimal rates with respect to the number of elements and, hence, with respect to the overall computational cost. This extends the results of [20] to the present setting of symmetric second-order linear elliptic PDEs. Finally, we stress that, unlike [20], our GOAFEM algorithm does not require any inner loop for data approximation and therefore does not require different (but still nested) meshes for the primal and dual problem. Overall, the present paper thus provides further mathematical understanding for bridging the gap between applied GOAFEM and theoretical optimality results.

Outline In Sect. 2, we present our GOAFEM algorithm (Algorithm 3) and the details of its individual steps. This includes the details of our finite element discretization as well as the precise assumptions for the iterative solver, the marking strategy, and the error estimators. We then state in Sect. 3 that Algorithm 3 leads to linear convergence for arbitrary stopping parameters (Theorem 6) and even achieves optimal rates with respect to the total computational cost if the adaptivity parameters are sufficiently small (Theorem 8). We emphasize that linear convergence applies to all steps of the adaptive strategy, independently of whether the algorithm decides for one solver step or one step of local mesh-refinement. This turns out to be the key argument for optimal rates with respect to the total computational cost (see Corollary 7). Section 3.2 comments on alternative termination criteria for the iterative solver. Section 4 then illustrates our theoretical findings with numerical experiments. Finally, we give a proof of our main Theorems 6 and 8 in Sects. 5 and 6, respectively.

Notation In the following text, we write \(a \lesssim b\) for \(a, b \in \mathbb {R}\) if there exists a constant \(C > 0\) (which is independent of the mesh width h) such that \(a \le C \, b\). If there holds \(a \lesssim b \lesssim a\), we abbreviate this by \(a \simeq b\). Furthermore, we denote by \(\#A\) the cardinality of a finite set A and by \(|\omega |\) the d-dimensional Lebesgue measure of a subset \(\omega \subset \mathbb {R}^d\).

2 Goal-oriented adaptive finite element method

2.1 Variational formulation

Defining the symmetric bilinear form

$$\begin{aligned} a(u, v) := \int _\Omega \varvec{A}\nabla u \cdot \nabla v {\text {d}}{x} + \int _\Omega cuv {\text {d}}{x}, \end{aligned}$$
(4)

we suppose that \(a(\cdot ,\cdot )\) is continuous and elliptic on \(H^1_0(\Omega )\) and thus fits into the setting of the Lax–Milgram lemma, i.e., there exist constants \(0< C_\textrm{ell}\le C_\textrm{cnt}< \infty \) such that

$$\begin{aligned} C_\textrm{ell}\Vert u \Vert _{H^1_0(\Omega )}^2 \le a(u, u) ~ ~ \text {and}~ ~ a(u,v) \le C_\textrm{cnt}\Vert u \Vert _{H^1_0(\Omega )} \Vert v \Vert _{H^1_0(\Omega )} \quad \text {for all } u,v \in H^1_0(\Omega ). \end{aligned}$$

In particular, \(a(\cdot , \cdot )\) is a scalar product that yields an equivalent norm \(|\!|\!| v |\!|\!|^2 := a(v,v)\) on \(H^1_0(\Omega )\). The weak formulation of (1) reads

$$\begin{aligned} a(u^\star , v) = F(v) := \int _\Omega \big ( fv {\text {d}}{x} - \varvec{f}\cdot \nabla v \big ) {\text {d}}{x} \quad \text {for all } v \in H^1_0(\Omega ). \end{aligned}$$
(5)

The Lax–Milgram lemma proves existence and uniqueness of the solution \(u^\star \in H^1_0(\Omega )\) of (5). The same argument applies and proves that the dual problem

$$\begin{aligned} a(v, z^\star ) = G(v) \quad \text {for all } v \in H^1_0(\Omega ) \end{aligned}$$
(6)

admits a unique solution \(z^\star \in H^1_0(\Omega )\), where the linear goal functional \(G \in H^{-1}(\Omega ) := H^1_0(\Omega )'\) is defined by (2).

Remark 1

For ease of presentation, we restrict our model problem (1) to homogeneous Dirichlet boundary conditions. We note, however, that for mixed homogeneous Dirichlet and inhomogeneous Neumann boundary conditions our main results hold true with the obvious modifications. In particular, with the partition \(\partial \Omega = \overline{\Gamma }_D \cup \overline{\Gamma }_N\) into Dirichlet boundary \(\Gamma _D\) with \(|\Gamma _D| > 0\) and Neumann boundary \(\Gamma _N\), the space \(H^1_0(\Omega )\) (and its discretization) has to be replaced by \(H^1_D(\Omega ) := \big \{v \in H^1(\Omega ) \,:\, v|_{\Gamma _D} = 0 \text { in the sense of traces} \big \}\) and the Neumann data has to be given in \(L^2(\Gamma _N)\). Furthermore, the coefficient \(\varvec{f}\) must vanish in a neighborhood of \(\Gamma _N\) to go from the strong form (1) to the weak form (5) via integration by parts.

2.2 Finite element discretization and solution

For a conforming triangulation \(\mathcal {T}_H\) of \(\Omega \) into compact simplices and a polynomial degree \(p \ge 1\), let

$$\begin{aligned} \mathcal {X}_H:= \big \{v_H\in H^1_0(\Omega ) \,:\, \forall T \in \mathcal {T}_H, \quad v_H|_T \text { is a polynomial of degree } \le p \big \}. \end{aligned}$$
(7)

To obtain conforming finite element approximations \(u^\star \approx u_H\in \mathcal {X}_H\) and \(z^\star \approx z_H\in \mathcal {X}_H\), we consider the Galerkin discretizations of (5)–(6). First, we note that the Lax–Milgram lemma yields the existence and uniqueness of exact discrete solutions \(u_H^\star , z_H^\star \in \mathcal {X}_H\), i.e., there holds that

$$\begin{aligned} a(u_H^\star , v_H) = F(v_H) \quad \text {and} \quad a(v_H, z_H^\star ) = G(v_H) \,\, \text { for all } v_H\in \mathcal {X}_H. \end{aligned}$$
(8)

In practice, the discrete systems (8) are rarely solved exactly (or up to machine precision). Instead, a suitable iterative solver is employed, which yields approximate discrete solutions \(u_{H}^{m}, z_{H}^{n} \in \mathcal {X}_H\). We suppose that this iterative solver is contractive, i.e., for all \(m, n \in \mathbb {N}\), it holds that

$$\begin{aligned} |\!|\!| u_H^\star - u_{H}^{m} |\!|\!| \le q_\textrm{ctr}\, |\!|\!| u_H^\star - u_{H}^{m-1} |\!|\!| \,\, \text { and } \,\, |\!|\!| z_H^\star - z_{H}^{n} |\!|\!| \le q_\textrm{ctr}\, |\!|\!| z_H^\star - z_{H}^{n-1} |\!|\!|, \end{aligned}$$
(9)

where \(0< q_\textrm{ctr}< 1\) is a generic constant and, in particular, independent of \(\mathcal {X}_H\). Assumption (9) is satisfied, e.g., for an optimally preconditioned conjugate gradient (PCG) method (see [8]) or geometric multigrid solvers (see [25]); see also the discussion in [16]. We note that these solvers are also guaranteed to satisfy the realistic assumptions from [20, 23] (which require that any initial energy error can be improved by a factor \(0<\tau <1\) at \(\mathcal {O}(|\log (\tau )|\#\mathcal {T}_H)\) cost). However, while (9) is slightly less general, it allows to prove full linear convergence; see Theorem 6 below.

2.3 Discrete goal quantity

To approximate \(G(u^\star )\), we proceed as in [17]: For any \(u_H, z_H\in \mathcal {X}_H\), it holds that

Defining the discrete quantity of interest

$$\begin{aligned} G_H(u_H, z_H) := G(u_H) + \big [ F(z_H) - a(u_H, z_H) \big ], \end{aligned}$$
(10)

the goal error can be controlled by means of the Cauchy–Schwarz inequality

$$\begin{aligned} \big | G(u^\star ) - G_H(u_H, z_H) \big | \le \big | a(u^\star - u_H, z^\star - z_H) \big | \le |\!|\!| u^\star - u_H |\!|\!| \, |\!|\!| z^\star - z_H |\!|\!|. \end{aligned}$$
(11)

We note that the additional term in (10) is the residual of the discrete primal problem (8) evaluated at an arbitrary function \(z_H\in \mathcal {X}_H\) and hence \(G(u_H^\star ) = G_H(u_H^\star , z_H)\).

In the following, we design an adaptive algorithm that provides a computable upper bound to (11) which tends to zero at optimal algebraic rate with respect to the number of elements \(\#\mathcal {T}_H\) as well as with respect to the total computational cost.

2.4 Mesh refinement

Let \(\mathcal {T}_0\) be a given conforming triangulation of \(\Omega \). We suppose that the mesh-refinement is a deterministic and fixed strategy, e.g., newest vertex bisection [24]. For each conforming triangulation \(\mathcal {T}_H\) and marked elements \(\mathcal {M}_H\subseteq \mathcal {T}_H\), let \(\mathcal {T}_h:= \texttt{refine}(\mathcal {T}_H,\mathcal {M}_H)\) be the coarsest conforming triangulation, where all \(T \in \mathcal {M}_H\) have been refined, i.e., \(\mathcal {M}_H\subseteq \mathcal {T}_H\backslash \mathcal {T}_h\). We write \(\mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H)\), if \(\mathcal {T}_h\) results from \(\mathcal {T}_H\) by finitely many steps of refinement. To abbreviate notation, let \(\mathbb {T}:=\mathbb {T}(\mathcal {T}_0)\). We note that the order on \(\mathbb {T}\) is respected by the finite element spaces, i.e., \(\mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H)\) implies that \(\mathcal {X}_H\subseteq \mathcal {X}_h\).

We further suppose that each refined element has at least two sons, i.e.,

$$\begin{aligned} \#(\mathcal {T}_H\backslash \mathcal {T}_h) + \#\mathcal {T}_H\le \#\mathcal {T}_h\quad \text {for all } \mathcal {T}_H\in \mathbb {T}\text { and all } \mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H), \end{aligned}$$
(12)

and that the refinement rule satisfies the mesh-closure estimate

$$\begin{aligned} \#\mathcal {T}_\ell - \#\mathcal {T}_0 \le C_\textrm{cls}\,\sum _{j=0}^{\ell -1}\#\mathcal {M}_j \quad \text {for all } \ell \in \mathbb {N}, \end{aligned}$$
(13)

where \(C_\textrm{cls}>0\) depends only on \(\mathcal {T}_0\). For newest vertex bisection, this has been proved under an additional admissibility assumption on \(\mathcal {T}_0\) in [1, 24] and for 2D even without any additional assumption in [18]. Finally, we suppose that the overlay estimate holds, i.e., for all triangulations \(\mathcal {T}_H,\mathcal {T}_h\in \mathbb {T}\), there exists a common refinement \(\mathcal {T}_H\oplus \mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H) \cap \mathbb {T}(\mathcal {T}_h)\) which satisfies that

$$\begin{aligned} \#(\mathcal {T}_H\oplus \mathcal {T}_h) \le \#\mathcal {T}_H+ \#\mathcal {T}_h- \#\mathcal {T}_0, \end{aligned}$$
(14)

which has been proved in [6, 23] for newest vertex bisection.

2.5 Estimator properties

For \(\mathcal {T}_H\in \mathbb {T}\) and \(v_H\in \mathcal {X}_H\), let

$$\begin{aligned} \eta _H(T, v_H) \ge 0 \quad \text {and}\quad \zeta _H(T, v_H) \ge 0 \quad \text {for all } T \in \mathcal {T}_H\end{aligned}$$

be given refinement indicators. For \(\mu _H\in \{\eta _H, \zeta _H\}\), we use the usual convention that

$$\begin{aligned} \mu _H(v_H) := \mu _H(\mathcal {T}_H, v_H), \quad \text {where} \quad \mu _H(\mathcal {U}_H, v_H) = \bigg ( \sum _{T \in \mathcal {U}_H} \mu _H(T, v_H)^2 \bigg )^{1/2} \end{aligned}$$
(15)

for all \(v_H\in \mathcal {X}_H\) and all \(\mathcal {U}_H\subseteq \mathcal {T}_H\).

We suppose that the estimators \(\eta _H\) and \(\zeta _H\) satisfy the so-called axioms of adaptivity (which are designed for, but not restricted to, weighted-residual error estimators) from [5]: There exist constants \(C_\textrm{stab}, C_\textrm{rel}, C_\textrm{drel}> 0\) and \(0< q_\textrm{red}< 1\) such that for all \(\mathcal {T}_H\in \mathbb {T}(\mathcal {T}_0)\) and all \(\mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H)\), the following assumptions are satisfied:

  1. (A1)

    Stability: For all \(v_h\in \mathcal {X}_h\), \(v_H\in \mathcal {X}_H\), and \(\mathcal {U}_H\subseteq \mathcal {T}_h\cap \mathcal {T}_H\), it holds that

    $$\begin{aligned} \big | \eta _h(\mathcal {U}_H, v_h) - \eta _H(\mathcal {U}_H, v_H) \big | + \big | \zeta _h(\mathcal {U}_H, v_h) - \zeta _H(\mathcal {U}_H, v_H) \big |&\le C_\textrm{stab}\, |\!|\!| v_h- v_H |\!|\!|. \end{aligned}$$
  2. (A2)

    Reduction: For all \(v_H\in \mathcal {X}_H\), it holds that

    $$\begin{aligned}&\eta _h(\mathcal {T}_h\backslash \mathcal {T}_H, v_H) \le q_\textrm{red}\, \eta _H(\mathcal {T}_H\backslash \mathcal {T}_h, v_H) \quad \text { and } \\&\zeta _h(\mathcal {T}_h\backslash \mathcal {T}_H, v_H) \le q_\textrm{red}\, \zeta _H(\mathcal {T}_H\backslash \mathcal {T}_h, v_H). \end{aligned}$$
  3. (A3)

    Reliability: The Galerkin solutions \(u_H^\star , z_H^\star \in \mathcal {X}_H\) to (8) satisfy that

    $$\begin{aligned} |\!|\!| u^\star - u_H^\star |\!|\!|&\le C_\textrm{rel}\, \eta _H(u_H^\star ) \quad \text { and } \quad |\!|\!| z^\star - z_H^\star |\!|\!| \le C_\textrm{rel}\, \zeta _H(z_H^\star ). \end{aligned}$$
  4. (A4)

    Discrete reliability: The Galerkin solutions \(u_H^\star , z_H^\star \in \mathcal {X}_H\) and \(u_h^\star , z_h^\star \in \mathcal {X}_h\) to (8) satisfy that

    $$\begin{aligned} |\!|\!| u_h^\star - u_H^\star |\!|\!|&\le C_\textrm{drel}\, \eta _H(\mathcal {T}_H\backslash \mathcal {T}_h, u_H^\star ) \quad \text { and } \quad |\!|\!| z_h^\star - z_H^\star |\!|\!| \le C_\textrm{drel}\, \zeta _H(\mathcal {T}_H\backslash \mathcal {T}_h, z_H^\star ). \end{aligned}$$

By assumptions (A1) and (A3), we can estimate for every discrete function \(w_H\in \mathcal {X}_H\) the errors in the energy norm of the primal and the dual problem by

$$\begin{aligned}&|\!|\!| u^\star - w_H |\!|\!| \le C \, \big [ \eta _H(w_H) + |\!|\!| u_H^\star - w_H |\!|\!| \big ] \quad \text {and} \\&|\!|\!| z^\star - w_H |\!|\!| \le C \, \big [ \zeta _H(w_H) + |\!|\!| z_H^\star - w_H |\!|\!| \big ], \end{aligned}$$

respectively, where \(C = \max \{ C_\textrm{rel}, C_\textrm{rel}C_\textrm{stab}+ 1 \} > 0\). Together with (11), we then obtain that the goal error for approximations \(u_H^m \approx u_H^\star \) and \(z_H^n \approx z_H^\star \) in \(\mathcal {X}_H\) is bounded by

$$\begin{aligned} \big | G(u^\star ) - G_H(u_{H}^{m}, z_{H}^{n}) \big | \le C^2 \, \big [ \, \eta _H(u_{H}^{m}) + |\!|\!| u_H^\star - u_{H}^{m} |\!|\!| \, \big ] \, \big [ \, \zeta _H(z_{H}^{n}) + |\!|\!| z_H^\star - z_{H}^{n} |\!|\!| \, \big ]. \end{aligned}$$
(16)

In the following sections, we provide building blocks for our adaptive algorithm that allow to control the arising estimators (by a suitable marking strategy) as well as the arising norms in the upper bound of (16) (by an appropriate stopping criterion for the iterative solver).

2.6 Marking strategy

We suppose that the refinement indicators \(\eta _H(T, u_H^m)\) and \(\zeta _H(T, z_H^n)\) for some \(m,n \in \mathbb {N}\) are used to mark a subset \(\mathcal {M}_H\subseteq \mathcal {T}_H\) of elements for refinement, which, for fixed marking parameter \(0 < \theta \le 1\), satisfies that

$$\begin{aligned} 2 \theta \eta _H(u_H^m)^2 \zeta _H(z_H^n)^2 \le \eta _H(\mathcal {M}_H, u_H^m)^2 \zeta _H(z_H^n)^2 + \zeta _H(\mathcal {M}_H, z_H^n)^2 \eta _H(u_H^m)^2. \end{aligned}$$
(17)

Remark 2

Given \(0 < \vartheta \le 1\), possible choices of marking strategies satisfying assumption (17) are the following:

  1. (a)

    The strategy proposed in [2] defines the weighted estimator

    $$\begin{aligned} \rho _H(T, u_H^m, z_H^n)^2 := \eta _H(T, u_H^m)^2 \zeta _H(z_H^n)^2 + \eta _H(u_H^m)^2 \zeta _H(T, z_H^n)^2 \end{aligned}$$

    and then determines a set \(\mathcal {M}_H\subseteq \mathcal {T}_H\) such that

    $$\begin{aligned} \vartheta \, \rho _H(u_H^m, z_H^n) \le \rho _H(\mathcal {M}_H, u_{H}^{m}, z_H^n) \end{aligned}$$
    (18)

    which is the Dörfler marking criterion introduced in [9] and well-known in the context of AFEM analysis; see, e.g., [5]. This strategy satisfies (17) with \(\theta = \vartheta ^2\).

  2. (b)

    The strategy proposed in [20] determines sets \(\overline{\mathcal {M}}_H^{u}, \overline{\mathcal {M}}_H^{z} \subseteq \mathcal {T}_H\) such that

    $$\begin{aligned} \vartheta \, \eta _H(u_{H}^{m}) \le \eta _\ell (\overline{\mathcal {M}}_H^{u},u_{H}^{m}) \quad \text {and} \quad \vartheta \, \zeta _H(z_{H}^{n}) \le \zeta _H(\overline{\mathcal {M}}_H^{z}, z_{H}^{n}) \end{aligned}$$
    (19)

    and then chooses \(\mathcal {M}_H:= \arg \min \{ \, \#\overline{\mathcal {M}}_H^{u} \,,\, \#\overline{\mathcal {M}}_H^{z} \, \}\). This strategy satisfies (17) with \(\theta = \vartheta ^2/2\).

  3. (c)

    A more aggressive variant of (b) was proposed in [14]: Let \(\overline{\mathcal {M}}_H^{u}\) and \(\overline{\mathcal {M}}_H^{z}\) as above. Then, choose \(\mathcal {M}_H^{u} \subseteq \overline{\mathcal {M}}_H^{u}\) and \(\mathcal {M}_H^{z} \subseteq \overline{\mathcal {M}}_H^{z}\) with \(\#\mathcal {M}_H^{u} = \#\mathcal {M}_H^{z} = \min \{ \, \#\overline{\mathcal {M}}_H^{u} \,,\, \#\overline{\mathcal {M}}_H^{z} \, \}\). Finally, define \(\mathcal {M}_H:= \mathcal {M}_H^{u} \cup \mathcal {M}_H^{z}\). Again, this strategy satisfies (17) with \(\theta = \vartheta ^2/2\).

Note that our main results of Theorem 6 and 8 below hold true for all presented marking criteria (a)–(c). For our numerical experiments, we focus on criterion (a), which empirically tends to achieve slightly better performance in practice.

2.7 Adaptive algorithm

Any adaptive algorithm strives to drive down the bound in (16). However, the errors of the iterative solver, \(|\!|\!| u_H^\star - u_{H}^{m} |\!|\!|\) and \(|\!|\!| z_H^\star - z_{H}^{n} |\!|\!|\), cannot be computed in general since the exact discrete solutions \(u_H^\star , z_H^\star \in \mathcal {X}_H\) to (8) are unknown and will not be computed. Thus, we note that (9) and the triangle inequality prove that

$$\begin{aligned} (1 - q_\textrm{ctr}) \, |\!|\!| u_H^\star - u_{H}^{m-1} |\!|\!| \le |\!|\!| u_{H}^{m} - u_{H}^{m-1} |\!|\!| \le (1 + q_\textrm{ctr}) \, |\!|\!| u_H^\star - u_{H}^{m-1} |\!|\!| \end{aligned}$$
(20a)

as well as

$$\begin{aligned} (1 - q_\textrm{ctr}) \, |\!|\!| z_H^\star - z_{H}^{n-1} |\!|\!| \le |\!|\!| z_{H}^{n} - z_{H}^{n-1} |\!|\!| \le (1 + q_\textrm{ctr}) \, |\!|\!| z_H^\star - z_{H}^{n-1} |\!|\!|. \end{aligned}$$
(20b)

With \(C_\textrm{goal}= \max \{ C_\textrm{rel}, C_\textrm{rel}C_\textrm{stab}+ 1 \} \, \big (1 + q_\textrm{ctr}/ (1-q_\textrm{ctr})\big )\), (16) leads to

$$\begin{aligned} \big | G(u^\star ) - G_H(u_{H}^{m}, z_{H}^{n}) \big | \le C_\textrm{goal}^2 \, \big [ \eta _H(u_{H}^{m}) + |\!|\!| u^m_H- u^{m-1}_H |\!|\!| \big ] \big [ \zeta _H(z_{H}^{n}) + |\!|\!| z^n_H- z^{n-1}_H |\!|\!| \big ], \end{aligned}$$
(21)

which is a computable upper bound to the goal error if \(m, n \ge 1\). Moreover, given some \(\lambda _\textrm{ctr}> 0\), this motivates to stop the iterative solvers as soon as

$$\begin{aligned} |\!|\!| u_{H}^{m} - u_{H}^{m-1} |\!|\!| \le \lambda _\textrm{ctr}\, \eta _H(u_{H}^{m}) \quad \text { and } \quad |\!|\!| z_{H}^{n} - z_{H}^{n-1} |\!|\!| \le \lambda _\textrm{ctr}\, \zeta _H(z_{H}^{n}) \end{aligned}$$

to equibalance the contributions of the upper bound in (21); alternative stopping criteria are introduced and analyzed below. Overall, we thus consider the following adaptive algorithm.

Algorithm 3

Let \(u_{0}^{0}, z_{0}^{0} \in \mathcal {X}_0\) be initial guesses. Let \(0 < \theta \le 1\) as well as \(\lambda _\textrm{ctr}> 0\) be arbitrary but fixed marking parameters. For all \(\ell = 0,1,2,\dots \), perform the following steps (i)–(vi):

  1. (i)

    Employ (at least one step of) the iterative solver to compute iterates \(u_{\ell }^{1}, \dots , u_{\ell }^{m}\) and \(z_{\ell }^{1}, \dots , z_{\ell }^{n}\) together with the corresponding refinement indicators \(\eta _\ell (T,u_{\ell }^{k})\) and \(\zeta _\ell (T, z_{\ell }^{k})\) for all \(T \in \mathcal {T}_\ell \), until

    $$\begin{aligned} |\!|\!| u_{\ell }^{m} - u_{\ell }^{ m-1} |\!|\!| \le \lambda _\textrm{ctr}\, \eta _\ell (u_{\ell }^{m}) \quad \text {and} \quad |\!|\!| z_{\ell }^{n} - z_{\ell }^{ n-1} |\!|\!| \le \lambda _\textrm{ctr}\, \zeta _\ell (z_{\ell }^{n}). \end{aligned}$$
    (22)
  2. (ii)

    Define \({\underline{m}}(\ell ) := m\) and \({\underline{n}}(\ell ) := n\).

  3. (iii)

    If \(\eta _\ell (u_\ell ^m) = 0\) or \(\zeta _\ell (z_\ell ^m) = 0\), then define \(\underline{\ell } := \ell \) and terminate.

  4. (iv)

    Otherwise, find a set \(\mathcal {M}_\ell \subseteq \mathcal {T}_\ell \) such that the marking criterion (17) is satisfied.

  5. (v)

    Generate \(\mathcal {T}_{\ell +1} := \texttt{refine}(\mathcal {T}_\ell , \mathcal {M}_\ell )\).

  6. (vi)

    Define the initial guesses \(u_{\ell +1}^{ 0} := u_{\ell }^{m}\) and \(z_{\ell +1}^{ 0} := z_{\ell }^{n}\) for the iterative solver.

Remark 4

Theorem 6 below proves (linear) convergence for any choice of the marking parameters \(0 < \theta \le 1\) and \(\lambda _\textrm{ctr}> 0\), and for any of the marking strategies from Remark 2. Theorem 8 below proves optimal convergence rates (with respect to the number of elements and the total computational cost) if both parameters are sufficiently small (see (32) for the precise condition) and if the set \(\mathcal {M}_\ell \) is constructed by one of the strategies from Remark 2, where the respective sets have quasi-minimal cardinality.

Remark 5

Note that Algorithm 3(i) requires to evaluate the error estimator after each solver step. Clearly, it would be favorable to replace \(\eta _\ell (u_\ell ^m)\) (resp. \(\zeta _\ell (z_\ell ^n)\)) by \(\eta _\ell (u_\ell ^0)\) (resp. \(\zeta _\ell (z_\ell ^0)\)) in (22). Arguing as in [13, Lemma 8], this allows to prove convergence of the adaptive strategy, but full linear convergence (Theorem 6 below) and optimal convergence rates (Theorem 8 below) are exptected to fail.

For each adaptive level \(\ell \), Algorithm 3 performs at least one solver step to compute \(u_{\ell }^{m}\) as well as one solver step to compute \(z_{\ell }^{n}\). By definition, \({\underline{m}}(\ell ) \ge 1\) is the solver step, for which the discrete solution \(u_{\ell }^{ {\underline{m}}(\ell )}\) is accepted (to contribute to the set of marked elements \(\mathcal {M}_\ell \)). Analogously, \({\underline{n}}(\ell ) \ge 1\) is the solver step, for which the discrete solution \(z_{\ell }^{ {\underline{n}}(\ell )}\) is accepted (to contribute to \(\mathcal {M}_\ell \)). If the iterative solver for either the primal or the dual problem fails to terminate for some level \(\ell \in \mathbb {N}_0\), i.e., (22) cannot be achieved for finite m, or n, we define \({\underline{m}}(\ell ) := \infty \), or \({\underline{n}}(\ell ) := \infty \), respectively, and \(\underline{\ell } := \ell \). With \({\underline{k}}(\ell ) := \max \{{\underline{m}}(\ell ), {\underline{n}}(\ell )\}\), we define

$$\begin{aligned} {\begin{matrix} u_{\ell }^{k} &{}:= u_{\ell }^{{\underline{m}}(\ell )} \text {for all } k \in \mathbb {N}\text { with } {\underline{m}}(\ell )< k \le {\underline{k}}(\ell ), \\ z_{\ell }^{k} &{}:= z_{\ell }^{{\underline{n}}(\ell )} \text {for all } k \in \mathbb {N}\text { with } {\underline{n}}(\ell ) < k \le {\underline{k}}(\ell ). \end{matrix}} \end{aligned}$$
(23)

For ease of presentation, we omit the \(\ell \)-dependence of the indices for final iterates \({\underline{m}}(\ell )\), \({\underline{n}}(\ell )\), and \({\underline{k}}(\ell )\) in the following, if they appear as upper indices and write, e.g., \(u_{\ell }^{{\underline{m}}} := u_{\ell }^{{\underline{m}}(\ell )}\) and \(u_{\ell }^{{\underline{m}}-1} := u_{\ell }^{{\underline{m}}(\ell )-1}\). If Algorithm 3 does not terminate in step (iii) for some \(\ell \in \mathbb {N}\), then we define \(\underline{\ell } := \infty \). To formulate the convergence of Algorithm 3, we define the ordered set

$$\begin{aligned} \mathcal {Q}:= \big \{(\ell ,k) \in \mathbb {N}_0^2 \,:\, \ell \le \underline{\ell } \text { and } 1 \le k \le {\underline{k}}(\ell ) \big \}, \quad \text {where} \quad |(\ell ,k)| := k + \sum _{j=0}^{\ell -1} {\underline{k}}(j). \end{aligned}$$
(24)

Note that \(|(\ell ,k)|\) is proportional to the overall number of solver steps to compute the estimator product \(\eta _\ell (u_\ell ^k)\zeta _\ell (z_\ell ^k)\). Additionally, we sometimes require the notation

$$\begin{aligned} \mathcal {Q}_0 := \big \{(\ell ,k)\in \mathbb {N}_0^2 \,:\, \ell \le \underline{\ell } \text { and } 0\le k \le {\underline{k}}(\ell ) \big \} = \mathcal {Q}\cup \big \{(\ell ,0)\in \mathbb {N}_0^2 \,:\, \ell \le \underline{\ell } \big \}. \end{aligned}$$
(25)

To estimate the work necessary to compute a pair \((u_{\ell }^{k}, z_{\ell }^{k}) \in \mathcal {X}_\ell \times \mathcal {X}_\ell \), we make the following assumptions which are usually satisfied in practice:

  • The iterates \(u_{\ell }^{k}\) and \(z_{\ell }^{k}\) are computed in parallel and each step of the solver in Algorithm 3(i) can be done in linear complexity \(\mathcal {O}(\#\mathcal {T}_\ell )\);

  • Computation of all indicators \(\eta _\ell (T,u_{\ell }^{k})\) and \(\zeta _\ell (T, z_{\ell }^{k})\) for \(T \in \mathcal {T}_\ell \) requires \(\mathcal {O}(\#\mathcal {T}_\ell )\) steps;

  • The marking in Algorithm 3(iv) can be performed at linear cost \(\mathcal {O}(\#\mathcal {T}_\ell )\) (according to [23] this can be done for the strategies outlined in Remark 2 with \(\mathcal {M}_\ell \) having almost minimal cardinality; moreover, we refer to a recent own algorithm in [21] with linear cost even for \(\mathcal {M}_\ell \) having minimal cardinality);

  • We have linear cost \(\mathcal {O}(\#\mathcal {T}_\ell )\) to generate the new mesh \(\mathcal {T}_{\ell +1}\).

Since a step \((\ell ,k) \in \mathcal {Q}\) of Algorithm 3 depends on the full history of preceding steps, the total work spent to compute \((u_{\ell }^{k}, z_{\ell }^{k}) \in \mathcal {X}_\ell \times \mathcal {X}_\ell \) is then of order

$$\begin{aligned} \texttt{work}(\ell ,k) := \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \le |(\ell ,k)| \end{array}} \#\mathcal {T}_{\ell '} \quad \text {for all } (\ell ,k) \in \mathcal {Q}. \end{aligned}$$
(26)

Finally, we note that Algorithm 3(vi) employs nested iteration to obtain the initial guesses \(u^0_{\ell +1}, z^0_{\ell +1}\) of the solver from the final iterates \(u^{{\underline{m}}}_{\ell }, z^{{\underline{n}}}_{\ell }\) for the mesh \(\mathcal {T}_\ell \). According to (21), this allows for a posteriori error control for all indices \((\ell , k) \in \mathcal {Q}_0 \setminus \{(0,0)\}\) beyond the initial step.

3 Main results

3.1 Linear convergence with optimal rates

Our first main result states linear convergence of the quasi-error product

$$\begin{aligned} \Lambda _\ell ^k := \big [ \, |\!|\!| u_{\ell }^{\star } - u_{\ell }^{k} |\!|\!| + \eta _\ell (u_{\ell }^{k}) \, \big ] \big [ \, |\!|\!| z_{\ell }^{\star } - z_{\ell }^{k} |\!|\!| + \zeta _\ell (z_{\ell }^{k}) \, \big ] \quad \text {for all } (\ell ,k) \in \mathcal {Q}_0 \end{aligned}$$
(27)

for every choice of the stopping parameter \(\lambda _\textrm{ctr}> 0\). Recall from (16) that the quasi-error product is an upper bound for the error \(|G(u^\star ) - G_\ell (u_\ell ^k,z_\ell ^k)|\). Moreover, if \(k={\underline{k}}(\ell )\), then () and (22) give that \(\Lambda _\ell ^{\underline{k}}\simeq \eta _\ell (u_\ell ^{\underline{k}})\zeta _\ell (z_\ell ^{\underline{k}})\).

Theorem 6

Suppose (A1)–(A3). Suppose that \(0 < \theta \le 1\) and \(\lambda _\textrm{ctr}> 0\). Then, Algorithm 3 satisfies linear convergence in the sense of

$$\begin{aligned} \Lambda _{\ell '}^{k'} \le C_\textrm{lin}q_\textrm{lin}^{|(\ell '\!,k')| - |(\ell ,k)|} \, \Lambda _{\ell }^{k} \quad \text {for all } (\ell ,k),(\ell '\!,k') \in \mathcal {Q}\cup \{(0,0)\} \text { with } |(\ell '\!,k')| \ge |(\ell ,k)|. \end{aligned}$$
(28)

The constants \(C_\textrm{lin}> 0 \) and \(0< q_\textrm{lin}< 1\) depend only on \(C_\textrm{stab}\), \(q_\textrm{red}\), \(C_\textrm{rel}\), \(q_\textrm{ctr}\), and the (arbitrary) adaptivity parameters \(0 < \theta \le 1\) and \(\lambda _\textrm{ctr}> 0\).

Full linear convergence implies that convergence rates with respect to degrees of freedom and with respect to total computational cost are equivalent. From this point of view, full linear convergence indeed turns out to be the core argument for optimal complexity.

Corollary 7

Recall the definition of the total computational cost \(\texttt{work}(\ell ,k)\) from (26). Let \(r > 0\) and \(C_r := \sup _{(\ell ,k) \in \mathcal {Q}} (\#\mathcal {T}_\ell - \#\mathcal {T}_0 + 1)^r \Lambda _\ell ^k\in [0,\infty ]\). Then, under the assumptions of Theorem 6, it holds that

$$\begin{aligned} C_r \le \sup _{(\ell ,k) \in \mathcal {Q}} (\#\mathcal {T}_\ell )^r \, \Lambda _\ell ^k \le \sup _{(\ell ,k) \in \mathcal {Q}} \texttt{work}(\ell ,k)^r \, \Lambda _\ell ^k \le C_\textrm{rate}\, C_r, \end{aligned}$$
(29)

where the constant \(C_\textrm{rate}> 0\) depends only on r, \(\#\mathcal {T}_0\), and on the constants \(q_\textrm{lin}, C_\textrm{lin}\) from Theorem 6.

Proof

The first two estimates in (29) are obvious. It remains to prove the last estimate in (29). To this end, note that it follows from the definition of \(C_r\) that

$$\begin{aligned} \#\mathcal {T}_\ell - \#\mathcal {T}_0 + 1 \le \big ( \Lambda _\ell ^k \big )^{-1/r} \, C_r^{1/r} \quad \text {for all } (\ell , k) \in \mathcal {Q}. \end{aligned}$$

Moreover, elementary algebra yields that

$$\begin{aligned} \#\mathcal {T}_{\ell '}\le \#\mathcal {T}_0 (\#\mathcal {T}_{\ell '}-\#\mathcal {T}_0+1) \quad \text {for all }(\ell ',0)\in \mathcal {Q}_0. \end{aligned}$$

For \((\ell , k) \in \mathcal {Q}\), Theorem 6 and the geometric series thus show that

$$\begin{aligned} \texttt{work}(\ell , k)&{\mathop {=}\limits ^{26}} \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \le |(\ell ,k)| \end{array}} \#\mathcal {T}_{\ell '} \le \#\mathcal {T}_0 \, \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \le |(\ell ,k)| \end{array}} (\#\mathcal {T}_{\ell '} - \#\mathcal {T}_0 + 1)\\&\le \#\mathcal {T}_0 C_r^{1/r} \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \le |(\ell ,k)| \end{array}} \big ( \Lambda _{\ell '}^{k'} \big )^{-1/r} \le \#\mathcal {T}_0 C_r^{1/r} C_\textrm{lin}^{1/r} \frac{1}{1-q_\textrm{lin}^{1/r}} \, \big ( \Lambda _{\ell }^{k} \big )^{-1/r}. \end{aligned}$$

With \(C_\textrm{rate}:= (\#\mathcal {T}_0)^r C_\textrm{lin}\, 1/(1-q_\textrm{lin}^{1/r})^r\), this gives that

$$\begin{aligned} \texttt{work}(\ell , k)^r \Lambda _{\ell }^{k} \le C_\textrm{rate}C_r \quad \text {for all } (\ell , k) \in \mathcal {Q}. \end{aligned}$$

This shows the final inequality in (29) and thus concludes the proof.

If \(\theta \) and \(\lambda _\textrm{ctr}\) are small enough, we are able to show that linear convergence from Theorem 6 even guarantees optimal rates with respect to both the number of unknowns \(\#\mathcal {T}_\ell \) and the total cost \(\texttt{work}(\ell ,k)\). Given \(N\in \mathbb {N}_0\), let \(\mathbb {T}(N)\) be the set of all \(\mathcal {T}_H\in \mathbb {T}\) with \(\#\mathcal {T}_H-\#\mathcal {T}_0\le N\). With

$$\begin{aligned} \Vert u^\star \Vert _{\mathbb {A}_r} := \sup _{N\in \mathbb {N}_0} (N+1)^r \min _{\mathcal {T}_\textrm{opt}\in \mathbb {T}(N)} \eta _\textrm{opt}(u_\textrm{opt}^\star ) \in [0,\infty ] \end{aligned}$$
(30a)

and

$$\begin{aligned} \Vert z^\star \Vert _{\mathbb {A}_r} := \sup _{N\in \mathbb {N}_0} (N+1)^r \min _{\mathcal {T}_\textrm{opt}\in \mathbb {T}(N)}\zeta _\textrm{opt}(z_\textrm{opt}^\star ) \in [0,\infty ] \end{aligned}$$
(30b)

for all \(r>0\), there holds the following result.

Theorem 8

Recall the definition of the total computational cost \(\texttt{work}(\ell ,k)\) from (26). Suppose the mesh properties (12)–(14) as well as the axioms (A1)–(A4). Define

$$\begin{aligned} \theta _\star := \frac{1}{1+C_\textrm{stab}^2C_\textrm{drel}^2} \quad \text {and} \quad \lambda _\star := \frac{1-q_\textrm{ctr}}{q_\textrm{ctr}C_\textrm{stab}}. \end{aligned}$$
(31)

Let both adaptivity parameters \(0<\theta \le 1\) and \(0<\lambda _\textrm{ctr}<\lambda _\star \) be sufficiently small such that

$$\begin{aligned} 0< \Big (\frac{\sqrt{2\theta } + \lambda _\textrm{ctr}/\lambda _\star }{1-\lambda _\textrm{ctr}/\lambda _\star }\Big )^2 < \theta _\star . \end{aligned}$$
(32)

Let \(1 \le C_\textrm{mark}< \infty \). Suppose that the set of marked elements \(\mathcal {M}_\ell \) in Algorithm 3(iv) is constructed by one of the strategies from Remark 2(a)–(c), where the sets in (18) and (19) have up to the factor \(C_\textrm{mark}\) minimal cardinality. Let \(s,t>0\) with \(\Vert u^\star \Vert _{\mathbb {A}_s}+\Vert z^\star \Vert _{\mathbb {A}_t}<\infty \). Then, there exists a constant \(C_\textrm{opt}>0\) such that

$$\begin{aligned} \sup _{(\ell ,k)\in \mathcal {Q}} \texttt{work}(\ell ,k)^{s+t} \Lambda _\ell ^k \le C_\textrm{opt}\max \{\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t},\Lambda _0^0\}. \end{aligned}$$
(33)

The constant \(C_\textrm{opt}\) depends only on \(C_\textrm{cls}\), \(C_\textrm{stab}\), \(q_\textrm{red}\), \(C_\textrm{rel}\), \(C_\textrm{drel}\), \(q_\textrm{ctr}\), \(C_\textrm{mark}\), \(\theta \), \(\lambda _\textrm{ctr}\), \(\#\mathcal {T}_0\), s, and t.

Remark 9

The constraint (32) is enforced by our analysis of the marking strategy from Remark 2(a), while the marking strategies from Remark 2(b)–(c) allow to relax the condition to

$$\begin{aligned} 0< \Big (\frac{\sqrt{\theta } + \lambda _\textrm{ctr}/\lambda _\star }{1-\lambda _\textrm{ctr}/\lambda _\star }\Big )^2 < \theta _\star . \end{aligned}$$
(34)

3.2 Alternative termination criteria for iterative solver

The above formulations of Algorithm 3 stops the iterative solver for \(u_\ell ^m\) and the iterative solver for \(z_\ell ^n\) independently of each other as soon as the respective termination criteria in (22) are satisfied. In this section, we briefly discuss two alternative termination criteria:

Stronger termination: The current proof of linear convergence (and of the subsequent proof of optimal convergence) does only exploit that \(u_{\ell }^{{\underline{k}}}\) and \(z_{\ell }^{{\underline{k}}}\) satisfy the stopping criterion and the previous iterates do not (cf. Lemma 10(iii)). This can also be ensured by the following modification of Algorithm 3(i):

  1. (i)

    Employ the iterative solver to compute iterates \(u_{\ell }^{1}, \dots , u_{\ell }^{k}\) and \(z_{\ell }^{1}, \dots , z_{\ell }^{k}\) together with the corresponding refinement indicators \(\eta _\ell (T,u_{\ell }^{k})\) and \(\zeta _\ell (T, z_{\ell }^{k})\) for all \(T \in \mathcal {T}_\ell \), until

    $$\begin{aligned} |\!|\!| u_{\ell }^{k} - u_{\ell }^{ k-1} |\!|\!| \le \lambda _\textrm{ctr}\, \eta _\ell (u_{\ell }^{k}) \quad \text {and} \quad |\!|\!| z_{\ell }^{k} - z_{\ell }^{ k-1} |\!|\!| \le \lambda _\textrm{ctr}\, \zeta _\ell (z_{\ell }^{k}). \end{aligned}$$
    (35)

Note that this will lead to more solver steps, since now \(k = {\underline{k}}(\ell )\) (if it exists) is the smallest index for which the stopping criterion holds simultaneously for both \(u_{\ell }^{{\underline{k}}}\) and \(z_{\ell }^{{\underline{k}}}\).

Inspecting the proof of Lemma 10 below, we see that all results hold verbatim also for this stopping criterion. Thus, we conclude linear and optimal convergence (in the sense of Theorem 6 and Theorem 8) also in this case.

Natural termination: The following stopping criterion (which is somehow the most natural candidate) also leads to linear convergence: Let \({\underline{m}}(\ell ), {\underline{n}}(\ell ) \in \mathbb {N}\) be minimal with (22). If either of them do not exist, we set again \({\underline{m}}(\ell ) = \infty \), or \({\underline{n}}(\ell ) = \infty \), respectively. Define \({\underline{k}}(\ell ) := \max \{{\underline{m}}(\ell ), {\underline{n}}(\ell )\}\). Then, employ the iterative solver \({\underline{k}}(\ell )\) times for both the primal and the dual problem, i.e., the solver provides iterates \(u_\ell ^k\) and \(z_\ell ^k\) until both stopping criteria in (22) have been satisfied once (which avoids the artificial definition (23)). For instance, if \({\underline{m}}(\ell )< {\underline{n}}(\ell )={\underline{k}}(\ell )<\infty \), we continue to iterate for the primal problem until \(u_{\ell }^{{\underline{k}}}\) is obtained (or never stop the iteration if \({\underline{n}}(\ell )={\underline{k}}(\ell ) = \infty \)). If \(\lambda _\textrm{ctr}> 0\) is sufficiently small such that \(1 - \frac{q_\textrm{ctr}}{1-q_\textrm{ctr}} \, C_\textrm{stab}\, (1+q_\textrm{ctr}) \lambda _\textrm{ctr}> 0\), then we can define

$$\begin{aligned} \lambda _\textrm{ctr}\le \lambda _\textrm{ctr}^\prime := \max \Big \{ 1, \frac{(1+q_\textrm{ctr})q_\textrm{ctr}}{(1-q_\textrm{ctr}) \big ( 1 - \frac{q_\textrm{ctr}}{1-q_\textrm{ctr}} \, C_\textrm{stab}\, (1+q_\textrm{ctr}) \lambda _\textrm{ctr}\big )} \Big \} \, \lambda _\textrm{ctr}< \infty , \end{aligned}$$

and we can guarantee the stopping condition (22) with the larger constant \(\lambda _\textrm{ctr}'\), i.e.,

$$\begin{aligned} |\!|\!| u_{\ell }^{ {\underline{k}}} - u_{\ell }^{ {\underline{k}}-1} |\!|\!| \le \lambda _\textrm{ctr}^\prime \, \eta _\ell (u_{\ell }^{{\underline{k}}}) \quad \text {and}\quad |\!|\!| z_{\ell }^{ {\underline{k}}} - z_{\ell }^{ {\underline{k}}-1} |\!|\!| \le \lambda _\textrm{ctr}^\prime \, \zeta _\ell (z_{\ell }^{{\underline{k}}}); \end{aligned}$$
(36)

see the proof below. Again, we notice that then the assumptions of Lemma 10 below are met. Hence, we conclude linear convergence (in the sense of Theorem 6) also for this stopping criterion. Moreover, optimal rates in the sense of Theorem 8 hold if \(\lambda _\textrm{ctr}\) in (32) is replaced by \(\lambda _\textrm{ctr}'\).

Proof of (36)

Without loss of generality, let us assume that \({\underline{m}}(\ell )< {\underline{k}}(\ell ) = {\underline{n}}(\ell )<\infty \). First, we have that

$$\begin{aligned} |\!|\!| u_\ell ^{\underline{k}}- u_\ell ^{\underline{m}} |\!|\!| \le |\!|\!| u_\ell ^\star - u_\ell ^{\underline{k}} |\!|\!| + |\!|\!| u_\ell ^\star - u_\ell ^{\underline{m}} |\!|\!| \le (1 + q_\textrm{ctr}^{{\underline{k}}(\ell )-{\underline{m}}(\ell )}) |\!|\!| u_\ell ^\star - u_\ell ^{\underline{m}} |\!|\!|. \end{aligned}$$

Then, using the fact that \(u_\ell ^{\underline{m}}\) satisfies the stopping criterion in (22) and stability (A1), we get that

For \(\lambda _\textrm{ctr}< (1-q_\textrm{ctr}) / [C_\textrm{stab}q_\textrm{ctr}(1 + q_\textrm{ctr}^{{\underline{k}}(\ell )-{\underline{m}}(\ell )})]\) we can absorb the last term to obtain

$$\begin{aligned} |\!|\!| u_\ell ^\star - u_\ell ^{\underline{m}} |\!|\!| \le \frac{q_\textrm{ctr}}{1-q_\textrm{ctr}} \Big ( 1 - \frac{C_\textrm{stab}q_\textrm{ctr}}{1-q_\textrm{ctr}}(1 + q_\textrm{ctr}^{{\underline{k}}(\ell )-{\underline{m}}(\ell )}) \lambda _\textrm{ctr}\Big )^{-1} \lambda _\textrm{ctr}\eta _\ell (u_\ell ^{\underline{k}}). \end{aligned}$$

Finally, we observe that

$$\begin{aligned} |\!|\!| u_{\ell }^{ {\underline{k}}} - u_{\ell }^{ {\underline{k}}-1} |\!|\!| \le (1+q_\textrm{ctr}) |\!|\!| u_{\ell }^{ \star } - u_{\ell }^{ {\underline{k}}-1} |\!|\!| \le (1+q_\textrm{ctr}) q_\textrm{ctr}^{{\underline{k}}-{\underline{m}}-1} |\!|\!| u_{\ell }^{ \star } - u_{\ell }^{ {\underline{m}}} |\!|\!|. \end{aligned}$$

Combining the last two estimates we obtain that

$$\begin{aligned} |\!|\!| u_{\ell }^{ {\underline{k}}} - u_{\ell }^{ {\underline{k}}-1} |\!|\!| \le \frac{(1+q_\textrm{ctr})q_\textrm{ctr}^{\,{\underline{k}}(\ell )-{\underline{m}}(\ell )}}{(1-q_\textrm{ctr}) \big ( 1 - \frac{q_\textrm{ctr}}{1-q_\textrm{ctr}} \, C_\textrm{stab}\, (1+q_\textrm{ctr}^{\,{\underline{k}}(\ell )-{\underline{m}}(\ell )}) \lambda _\textrm{ctr}\big )} \, \lambda _\textrm{ctr}\, \eta _\ell (u_{\ell }^{{\underline{k}}}). \end{aligned}$$

Hence, (36) follows with \(q_\textrm{ctr}^{\,{\underline{k}}(\ell )-{\underline{m}}(\ell )} \le q_\textrm{ctr}\) and \(|\!|\!| z_{\ell }^{ {\underline{k}}} - z_{\ell }^{ {\underline{k}}-1} |\!|\!| \le \lambda _\textrm{ctr}\, \zeta _\ell (z_{\ell }^{{\underline{k}}}) \le \lambda _\textrm{ctr}^\prime \, \zeta _\ell (z_{\ell }^{{\underline{k}}})\).\(\square \)

4 Numerical examples

In this section, we consider two numerical examples which solve the equation

$$\begin{aligned} {\begin{matrix} - \Delta u^\star &{}= f \quad \text {in } \Omega ,\\ u^\star &{}= 0 \quad \text {on } \Gamma _D,\\ \nabla u^\star \cdot \varvec{n} &{}= \phi \quad \text {on } \Gamma _N, \end{matrix}} \end{aligned}$$
(37)

where \(\phi \in L^2(\Gamma _N)\) and \(\varvec{n}\) is the element-wise outwards facing unit normal vector. We refer the reader to Remark 1 for a comment on the applicability of our results to this model problem. We further suppose that the goal functional is a slight variant of the one proposed in [20], i.e.,

$$\begin{aligned} G(v) = - \int _{\omega } \nabla v \cdot \varvec{g}{\text {d}}{x} \qquad \text {for } v \in H^1_D(\Omega ), \end{aligned}$$
(38)

with a subset \(\omega \subseteq \Omega \) and a fixed direction \(\varvec{g}(x) = \varvec{g}_0 \in \mathbb {R}^2\). Moreover, for error estimation, we employ standard residual error estimators, which in our case, for all \((\ell , k) \in \mathcal {Q}\) and all \(T \in \mathcal {T}_\ell \), read

$$\begin{aligned} \eta _\ell (T, u_\ell ^k)^2&:= h_T^2 \Vert \Delta u_\ell ^k + f \Vert _{L^2(T)}^2 + h_T \Vert [\![\nabla u_\ell ^k \cdot \varvec{n}]\!] \Vert _{L^2(\partial T \cap \Omega )}^2\\&\quad + h_T \Vert \nabla u_\ell ^k \cdot \varvec{n}- \phi \Vert _{L^2(\partial T \cap \Gamma _N)}^2,\\ \zeta _\ell (T, z_\ell ^k)^2&:= h_T^2 \Vert \textrm{div}\,(\nabla z_\ell ^k + \varvec{g}) \Vert _{L^2(T)}^2 + h_T \Vert [\![(\nabla z_\ell ^k + \varvec{g}) \cdot \varvec{n}]\!] \Vert _{L^2(\partial T \cap \Omega )}^2, \end{aligned}$$

where \(h_T = |T|^{1/2}\) is the local mesh-width and \([\![\cdot ]\!]\) denotes the jump across interior edges. It is well-known [5, 14] that \(\eta _\ell \) and \(\zeta _\ell \) satisfy the assumptions (A1)–(A4). The examples are chosen to showcase the performance of the proposed GOAFEM algorithm for different types of singularities.

Throughout this section, we solve (37) as well as the corresponding dual problem numerically using Algorithm 3, where we make the following choices:

  • We solve the problems on the lowest order finite element space, i.e., with polynomial degree \(p=1\).

  • As initial values, we use \(u_0^0 = z_0^0 = 0\).

  • To solve the arising linear systems, we use a preconditioned conjugate gradient (PCG) method with an optimal additive Schwarz preconditioner. We refer to [8, 22] for details and, in particular, the proof that this iterative solver satisfies (9).

  • We use the marking criterion from Remark 2(a) and choose \(\mathcal {M}_\ell \) such that it has minimal cardinality.

  • Unless mentioned otherwise, we use \(\vartheta = 0.5\) and \(\lambda _\textrm{ctr}= 10^{-5}\).

4.1 Singularity in goal functional only

In our first example, the primal problem is (37) with \(f = 2x_1(1-x_1) + 2x_2(1-x_2)\) on the unit square \(\Omega = (0,1)^2\), and \(\Gamma _D = \partial \Omega \) (and thus, \(\Gamma _N = \emptyset \)). For this problem, the exact solution reads

$$\begin{aligned} u^\star (x) =x_1 x_2 (1-x_1) (1-x_2). \end{aligned}$$

The goal functional is (38) with \(\omega = T_1 := \big \{x \in \Omega \,:\, x_1 + x_2 \ge 3/2 \big \}\) and \(\varvec{g}_0 = (-1, 0)\). The exact goal value can be computed analytically to be

$$\begin{aligned} G(u^\star ) = \int _{T_1} \frac{\partial u^\star }{\partial x_1} {\text {d}}{x} = 11/960. \end{aligned}$$

The initial mesh \(\mathcal {T}_0\) as well as a visualization of the set \(T_1\) can be seen in Fig. 1.

Fig. 1
figure 1

Left: Initial mesh \(\mathcal {T}_0\). The shaded area is the set \(T_1\) from Section (). Right: Mesh after 14 iterations of Algorithm 3 with \(\#\mathcal {T}_{14} = 4157\)

Fig. 2
figure 2

Comparison between iterative solvers for the problem from Sect. 4.1. A conjugate gradient method without preconditioner (CG) leads to optimal rates with respect to \(\#\mathcal {T}_\ell \) for the final iterates where \(k = {\underline{k}}(\ell )\), but not with respect to \(\texttt{work}(\ell ,k)\) for every \((\ell , k) \in \mathcal {Q}\). Our choice of the iterative solver (ML) achieves optimal rates with respect to both measures

For this setting, we compare our iterative solver to a conjugate gradient method without preconditioner in Fig. 2, where we plot the computable upper bound from (21),

$$\begin{aligned} \Xi _\ell ^k := \big [ \eta _\ell (u_\ell ^k) + |\!|\!| u_\ell ^k - u_\ell ^{k-1} |\!|\!| \big ] \big [ \zeta _\ell (z_\ell ^k) + |\!|\!| z_\ell ^k - z_\ell ^{k-1} |\!|\!| \big ] \quad \text {for all } (\ell ,k) \in \mathcal {Q}, \end{aligned}$$

over \(\texttt{work}(\ell , k)\) for all iterates \((\ell ,k) \in \mathcal {Q}\) and the estimator product for the final iterates \(\eta _\ell (u_\ell ^{\underline{k}}) \zeta _\ell (z_\ell ^{\underline{k}})\) over \(\# \mathcal {T}_\ell \). We stress that, for \((\ell ,k) \in \mathcal {Q}\), the computable upper bound \(\Xi _\ell ^k\) and the quasi-error product \(\Lambda _\ell ^k\) from (27) are related by \(\Lambda _\ell ^k \lesssim \Xi _\ell ^k \lesssim \Lambda _\ell ^{k-1}\) so that linear convergence (28) with optimal rates (33) of \(\Lambda _\ell ^k\) also yields linear convergence with optimal rates of \(\Xi _\ell ^k\). Since in our experiments \(\lambda _\textrm{ctr}= 10^{-5}\) is small, it is plausible to assume that the final estimates on every level approximate the exact solutions sufficiently well in the sense of estimator products, i.e., \(\eta _\ell (u_\ell ^{\underline{k}})\zeta _\ell (z_\ell ^{\underline{k}}) \approx \eta _\ell (u_\ell ^\star )\zeta _\ell (z_\ell ^\star )\) (cf. Lemma 13 below) for which [14] proves optimal convergence rates with respect to \(\#\mathcal {T}_\ell \). Indeed, we see optimal rates for \(\eta _\ell (u_\ell ^{\underline{k}}) \zeta _\ell (z_\ell ^{\underline{k}})\) with respect to \(\#\mathcal {T}_\ell \) for both solvers in Fig. 2. However, the non-preconditioned CG method fails to satisfy uniform contraction (9) and thus Theorem 8 cannot be applied. In fact, Fig. 2 shows that this method fails to drive down \(\Xi _\ell ^k\) with optimal rates with respect to \(\texttt{work}(\ell ,k)\) (cf. (26)), as opposed to the optimally preconditioned CG method.

Fig. 3
figure 3

Comparison between \(\Xi _\ell ^k\), discrete goal \(G_\ell (u_\ell ^k, z_\ell ^k)\), primal residual evaluated at the dual solution \(z_\ell ^k\), and direct evaluation of goal functional \(G(u_\ell ^k)\) for every iterate \((\ell ,k) \in \mathcal {Q}\) and different values of \(\lambda _\textrm{ctr}\in \{ 1, 10^{-2}, 10^{-4}, 10^{-6} \}\). The primal residual evaluated at the dual solution \(z_\ell ^k\) is the difference between goal and discrete goal; see (10)

Furthermore, we plot in Fig. 3 different error measures over \(\texttt{work}(\ell ,k)\) for every iterate \((\ell ,k) \in \mathcal {Q}\). This shows that the corrector term

$$\begin{aligned} a(u_\ell ^k , z_\ell ^k) - F(z_\ell ^k) \end{aligned}$$
(39)

(which is the residual of \(u_\ell ^k\) evaluated at the dual solution \(z_\ell ^k\)) in the definition of the discrete goal functional (10) is indeed necessary. We see that throughout the iteration, the goal value \(G(u_\ell ^k)\) highly oscillates and, for large values of \(\lambda _\textrm{ctr}\), even shows a different rate than the \(\Xi _\ell ^k\) over \(\texttt{work}(\ell ,k)\). In general, we thus cannot expect the quantity \(\Xi _\ell ^k\) to bound the uncorrected goal-error \(|G(u^\star ) - G(u_\ell ^k)|\).

For the discrete goal, the corrector term compensates the oscillations of the goal functional, such that their sum decreases with the same rate as \(\Xi _\ell ^k\), as predicted by (21). Smaller values of \(\lambda _\textrm{ctr}\) imply that on every level \(\ell \) the approximate solutions \(u_\ell ^k, z_\ell ^k\) are computed more accurately, such that the corrector term becomes smaller and the effect on the rate of the goal value becomes negligible.

4.2 Geometrical singularity

Our second example is the classical example of a geometric singularity on the so-called Z-shape \(\Omega = (-1,1)^2 \setminus \textrm{conv}\{(-1,-1), (0,0), (-1, 0)\}\), where \(\Gamma _D\) is only the re-entrant corner (cf. Fig. 4). The primal problem is (37) with \(f=0\) and \(\phi = \nabla u^\star \cdot \varvec{n}\), where the exact solution in polar coordinates r(x) and \(\varphi (x)\) of \(x \in \mathbb {R}^2\) is prescribed as

$$\begin{aligned} u^\star (x) = r(x)^{4/7} \sin (\tfrac{4}{7} \varphi (x) + \tfrac{3\pi }{7}). \end{aligned}$$

The goal functional is (38) with \(\omega = T_2 := (0.5, 0.5)^2 \cap \Omega \) and \(\varvec{g}_0 = (-1,-1)\) and can be computed directly via numerical integration to be

$$\begin{aligned} G(u^\star ) = \int _{T_2} \Big ( \frac{\partial u^\star }{\partial x_1} + \frac{\partial u^\star }{\partial x_2} \Big ) {\text {d}}{x} \approx 0.82962247157810. \end{aligned}$$

In Fig. 4, the initial triangulation \(\mathcal {T}_0\) as well as the mesh after several iterations of Algorithm 3 can be seen. The adaptive algorithm resolves the singularity at the re-entrant corner, as well as critical points of the goal functional, which are at the corners of \(T_2\).

Fig. 4
figure 4

Left: Initial mesh \(\mathcal {T}_0\). The shaded area is the set \(T_2\) from Section () and the Dirichlet boundary at the re-entrant corner is marked in red. Right: Mesh after 13 iterations of Algorithm 3 with \(\#\mathcal {T}_{13} = 4534\)

Figure 5 shows the rate of the estimator product \(\eta _\ell (u_\ell ^{\underline{k}}) \zeta _\ell (z_\ell ^{\underline{k}})\) of the final iterates over \(\#\mathcal {T}_\ell \) as well as the rate of \(\Xi _\ell ^k\) over \(\texttt{work}(\ell , k)\) for all \((\ell ,k) \in \mathcal {Q}\).

Fig. 5
figure 5

Rates of the estimator product for final iterates over \(\#\mathcal {T}_\ell \) and \(\Xi _\ell ^k\) as well as goal error over \(\texttt{work}(\ell ,k)\) for all \((\ell ,k) \in \mathcal {Q}\)

5 Proof of Theorem 6

The following core lemma extends one of the key observations of [16] to the present setting, where we stress that the nonlinear product structure of \(\Delta _\ell ^k\) leads to technical challenges which go much beyond [16].

Lemma 10

Suppose (A1)–(A3). Then, there exist constants \(\mu , C_\textrm{aux}> 0\), and \(0< q_\textrm{aux}< 1\), and some scalar sequence \((R_\ell )_{\ell \in \mathbb {N}_0} \subset \mathbb {R}\) such that the quasi-error product

$$\begin{aligned} \Delta _{\ell }^k := \big [ \, |\!|\!| u_{\ell }^{\star } - u_{\ell }^{k} |\!|\!| + \mu \, \eta _\ell (u_{\ell }^{k}) \, \big ] \big [ \, |\!|\!| z_ {\ell }^{\star } - z_{\ell }^{k} |\!|\!| + \mu \, \zeta _\ell (z_{\ell }^{k}) \, \big ] \quad \text {for all } (\ell ,k) \in \mathcal {Q}_0 \end{aligned}$$

satisfies the following statements (i)–(v):

  1. (i)

    \(\Delta _\ell ^k \le \Delta _\ell ^j\)    for all \(0 \le j \le k \le {\underline{k}}(\ell )\).

  2. (ii)

    \(\Delta _\ell ^{{\underline{k}}-1} \le C_\textrm{aux}\, \Delta _\ell ^{\underline{k}}\)    if \({\underline{k}}(\ell ) < \infty \).

  3. (iii)

    \(\Delta _{\ell }^{k} \le q_\textrm{aux}\, \Delta _{\ell }^{k-1}\)    for all \(0< k < {\underline{k}}(\ell )\).

  4. (iv)

    \(\Delta _{\ell +1}^{0} \le q_\textrm{aux}\, \Delta _{\ell }^{{\underline{k}}-1} + R_\ell \)    for all \(0< \ell < \underline{\ell }\).

  5. (v)

    \(\sum _{\ell = \ell '}^{\underline{\ell }-1} R_{\ell }^2 \le C_\textrm{aux}(\Delta _{\ell }^{{\underline{k}}-1})^2\)    for all \(0 \le \ell ' < \underline{\ell }-1\).

The constants \(\mu \), \(C_\textrm{aux}\), and \(q_\textrm{aux}\) depend only on \(C_\textrm{stab}\), \(q_\textrm{red}\), \(C_\textrm{rel}\), and \(q_\textrm{ctr}\) as well as on the (arbitrary) adaptivity parameters \(0 < \theta \le 1\) and \(\lambda _\textrm{ctr}> 0\).

For the following proofs, we define

$$\begin{aligned} \alpha _\ell ^k&:= |\!|\!| u_\ell ^\star - u_{\ell }^{k} |\!|\!|,&x_\ell ^\star&:= |\!|\!| u_{\ell +1}^\star - u_{\ell }^{\star } |\!|\!|,\\ \beta _\ell ^k&:= |\!|\!| z_\ell ^\star - z_{\ell }^{k} |\!|\!|,&y_\ell ^\star&:= |\!|\!| z_{\ell +1}^\star -z_{\ell }^{\star } |\!|\!|, \end{aligned}$$

such that the quasi-error product reads \(\Delta _{\ell }^{k} = \big [ \alpha _{\ell }^{k} + \mu \, \eta _\ell (u_{\ell }^{k}) \big ] \big [ \beta _{\ell }^{k} + \mu \, \zeta _\ell (z_{\ell }^{k}) \big ]\) with a free parameter \(\mu > 0\) which will be fixed below.

Proof of Lemma 10(i)

Recall from (23) that \(u_\ell ^k = u_\ell ^{\underline{m}}\) for all \({\underline{m}}(\ell ) < k \le {\underline{k}}(\ell )\). Thus, we have that

$$\begin{aligned} \alpha _\ell ^k + \mu \, \eta _\ell (u_{\ell }^{k}) = \alpha _\ell ^{\underline{m}}+ \mu \, \eta _\ell (u_{\ell }^{{\underline{m}}}) \quad \text {for all } {\underline{m}}(\ell ) < k \le {\underline{k}}(\ell ). \end{aligned}$$

For \(0< k < {\underline{m}}(\ell )\), on the other hand, the solution \(u_\ell ^k\) is obtained by one step of the iterative solver. From stability (A1) and solver contraction (9), we have for all \(0 \le j < k \le {\underline{m}}(\ell )\) that

If \(\mu \) is chosen small enough such that \(q_\textrm{ctr}+ 2 \mu C_\textrm{stab}\le 1\), together with the trivial case \(j = k\), the last two equations show that

$$\begin{aligned} \alpha _\ell ^k + \mu \, \eta _\ell (u_{\ell }^{k}) \le \alpha _\ell ^j + \mu \, \eta _\ell (u_{\ell }^{j}) \quad \text {for all } 0 \le j \le k \le {\underline{k}}(\ell ). \end{aligned}$$

The same argument shows that

$$\begin{aligned} \beta _\ell ^k + \mu \, \zeta _\ell (z_{\ell }^{k}) \le \beta _\ell ^j + \mu \, \zeta _\ell (z_{\ell }^{j}). \quad \text {for all } 0 \le j \le k \le {\underline{k}}(\ell ). \end{aligned}$$
(40)

Multiplication of the last two estimates shows the assertion. \(\square \)

Proof of Lemma 10(ii)

Recall that for the index \({\underline{k}}(\ell )\) there holds (22). From the triangle inequality, we thus get for the primal estimator that

Furthermore, stability (A1) leads to

Combining the last two estimates, we see that

$$\begin{aligned} \alpha _\ell ^{{\underline{k}}-1} + \mu \, \eta _\ell (u_\ell ^{{\underline{k}}-1}) \le \big ( 1 + \lambda _\textrm{ctr}(C_\textrm{stab}+ \mu ^{-1}) \big ) \, \big [ \, \alpha _\ell ^{{\underline{k}}} + \mu \, \eta _\ell (u_\ell ^{{\underline{k}}}) \, \big ]. \end{aligned}$$

Together with the analogous estimate for \(\beta _\ell ^{{\underline{k}}-1} + \mu \, \zeta _\ell (z_\ell ^{{\underline{k}}-1})\), we conclude the proof with \(C_\textrm{aux}= \big ( 1 + \lambda _\textrm{ctr}(C_\textrm{stab}+ \mu ^{-1}) \big )^2\). \(\square \)

Proof of Lemma 10(iii)

Without loss of generality, suppose that \({\underline{k}}(\ell ) = {\underline{m}}(\ell )\) and thus \(|\!|\!| u_{\ell }^{k} - u_{\ell }^{k-1} |\!|\!| > \lambda _\textrm{ctr}\, \eta _\ell (u_{\ell }^{k})\). Then, this yields that

With contraction of the solver (9), this leads to

$$\begin{aligned} \alpha _{\ell }^k + \mu \, \eta _\ell (u_{\ell }^{k}) \le q_\textrm{ctr}\alpha _{\ell }^{k-1} + \mu \lambda _\textrm{ctr}^{-1} (1+q_\textrm{ctr}) \, \alpha _{\ell }^{k-1} \quad \text {for all } 0< k < {\underline{k}}(\ell ). \end{aligned}$$

From (40) for \(\mu \) small enough, we see that \(\beta _\ell ^k + \mu \, \zeta _\ell (z_{\ell }^{k}) \le \beta _\ell ^{k-1} + \mu \, \zeta _\ell (z_{\ell }^{k-1})\). Together with the previous estimate, this shows that

$$\begin{aligned} \Delta _\ell ^k \le \big ( q_\textrm{ctr}+ \mu \lambda _\textrm{ctr}^{-1} (1+q_\textrm{ctr}) \big ) \Delta _\ell ^{k-1}. \end{aligned}$$
(41)

Up to the choice of \(\mu \), this concludes the proof. \(\square \)

Proof of Lemma 10(iv)

First, we note that \(\eta _\ell (u_\ell ^{\underline{k}}) \zeta _\ell (z_\ell ^{\underline{k}}) \ne 0\), according to Algorithm 3(iii) and the assumption that \(\ell < \underline{\ell }\). From reduction of the solver (9) and nested iteration, we get that

$$\begin{aligned} {\begin{matrix} \alpha _{\ell +1}^0 &{}= |\!|\!| u_{\ell +1}^{\star } - u_\ell ^{\underline{k}} |\!|\!| \le |\!|\!| u_{\ell +1}^{\star } - u_{\ell }^{\star } |\!|\!| + q_\textrm{ctr}\, |\!|\!| u_\ell ^\star - u_{\ell }^{{\underline{k}}-1} |\!|\!| = x_\ell ^\star + q_\textrm{ctr}\, \alpha _\ell ^{{\underline{k}}-1},\\ \beta _{\ell +1}^0 &{}= |\!|\!| z_{\ell +1}^{\star } - z_\ell ^{\underline{k}} |\!|\!| \le |\!|\!| z_{\ell +1}^{\star } - z_{\ell }^{\star } |\!|\!| + q_\textrm{ctr}\, |\!|\!| z_\ell ^\star - z_{\ell }^{{\underline{k}}-1} |\!|\!| = y_\ell ^\star + q_\textrm{ctr}\, \beta _\ell ^{{\underline{k}}-1} \end{matrix}} \end{aligned}$$
(42)

and thus

$$\begin{aligned} \alpha _{\ell +1}^0 \beta _{\ell +1}^0 \le q_\textrm{ctr}^2 \, \alpha _{\ell }^{{\underline{k}}-1} \beta _{\ell }^{{\underline{k}}-1} + q_\textrm{ctr}( \alpha _{\ell }^{{\underline{k}}-1} y_{\ell }^\star + \beta _{\ell }^{{\underline{k}}-1} x_{\ell }^\star ) + x_{\ell }^\star y_{\ell }^\star . \end{aligned}$$
(43)

For the estimator terms, we have with stability (A1) and reduction (A2) that

$$\begin{aligned} \eta _{\ell +1}(u_{\ell +1}^0)^2 = \eta _{\ell +1}(u_{\ell }^{\underline{k}})^2&= \eta _{\ell +1}(\mathcal {T}_{\ell +1} \cap \mathcal {T}_\ell , u_{\ell }^{\underline{k}})^2 + \eta _{\ell +1}(\mathcal {T}_{\ell +1} \setminus \mathcal {T}_\ell , u_{\ell }^{\underline{k}})^2\\&\le \eta _{\ell }(\mathcal {T}_{\ell +1} \cap \mathcal {T}_\ell , u_{\ell }^{\underline{k}})^2 + q_\textrm{red}^2 \, \eta _{\ell }(\mathcal {T}_{\ell } \setminus \mathcal {T}_{\ell +1}, u_{\ell }^{\underline{k}})^2\\&= \eta _{\ell }(u_{\ell }^{\underline{k}})^2 - (1-q_\textrm{red}^2) \, \eta _{\ell }(\mathcal {T}_{\ell } \setminus \mathcal {T}_{\ell +1}, u_{\ell }^{\underline{k}})^2. \end{aligned}$$

On the one hand, with \(C_1 := C_\textrm{stab}(1+q_\textrm{red})\), this implies that

(44)

On the other hand, with \(0< q_\theta := 1-(1-q_\textrm{red}^2)\theta < 1\), we get that

$$\begin{aligned} \frac{\eta _{\ell +1}(u_{\ell +1}^0)^2}{\eta _{\ell }(u_{\ell }^{\underline{k}})^2} \le q_\theta + (1-q_\textrm{red}^2) \Big [ \theta - \frac{\eta _{\ell }(\mathcal {T}_{\ell } \setminus \mathcal {T}_{\ell +1}, u_{\ell }^{\underline{k}})^2}{\eta _{\ell }(u_{\ell }^{\underline{k}})^2} \Big ]. \end{aligned}$$
(45)

Using (45), the corresponding estimate for the dual estimator, and the Young inequality, we obtain that

$$\begin{aligned} \frac{\eta _{\ell +1}(u_{\ell +1}^0)}{\eta _{\ell }(u_{\ell }^{\underline{k}})} \frac{\zeta _{\ell +1}(z_{\ell +1}^0)}{\zeta _{\ell }(z_{\ell }^{\underline{k}})} \le q_\theta + \frac{(1-q_\textrm{red}^2)}{2} \Big [ 2\theta - \frac{\eta _{\ell }(\mathcal {T}_{\ell } \setminus \mathcal {T}_{\ell +1}, u_{\ell }^{\underline{k}})^2}{\eta _{\ell }(u_{\ell }^{\underline{k}})^2} - \frac{\zeta _{\ell }(\mathcal {T}_{\ell } \setminus \mathcal {T}_{\ell +1}, z_{\ell }^{\underline{k}})^2}{\zeta _{\ell }(z_{\ell }^{\underline{k}})^2} \Big ]. \end{aligned}$$

The marking criterion (17), which is applicable due to \(\ell < \underline{\ell }\), estimates the term in brackets by zero. Thus stability (A1) leads to

(46)

For the mixed terms in \(\Delta _{\ell +1}^0\), we have with (42) and (44) that

$$\begin{aligned} {\begin{matrix} \eta _{\ell +1}(u_{\ell +1}^0) \beta _{\ell +1}^0 &{}\le \big [ \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) + C_1 \, \alpha _{\ell }^{{\underline{k}}-1} \big ] \big [ y_\ell ^\star + q_\textrm{ctr}\, \beta _\ell ^{{\underline{k}}-1} \big ]\\ &{}= q_\textrm{ctr}\, \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \beta _\ell ^{{\underline{k}}-1} + \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) y_\ell ^\star + C_1 \, \alpha _\ell ^{{\underline{k}}-1} y_\ell ^\star + C_1 q_\textrm{ctr}\, \alpha _\ell ^{{\underline{k}}-1} \beta _\ell ^{{\underline{k}}-1}. \end{matrix}} \end{aligned}$$
(47)

Analogously, we see that

$$\begin{aligned} \zeta _{\ell +1}(z_{\ell +1}^0) \alpha _{\ell +1}^0 \le q_\textrm{ctr}\, \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) \alpha _\ell ^{{\underline{k}}-1} + \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) x_\ell ^\star + C_1 \, \beta _\ell ^{{\underline{k}}-1} x_\ell ^\star + C_1 q_\textrm{ctr}\, \alpha _\ell ^{{\underline{k}}-1} \beta _\ell ^{{\underline{k}}-1}. \end{aligned}$$
(48)

Combining (43) and (46)–(48), we get that

$$\begin{aligned} \Delta _{\ell +1}^0&= \alpha _{\ell +1}^0 \beta _{\ell +1}^0 + \mu \, \big [ \eta _{\ell +1}(u_{\ell +1}^0) \beta _{\ell +1}^0 + \zeta _{\ell +1}(z_{\ell +1}^0) \alpha _{\ell +1}^0 \big ]\\&\quad + \mu ^2 \, \eta _{\ell +1}(u_{\ell +1}^0) \zeta _{\ell +1}(z_{\ell +1}^0)\\&\le q_\textrm{ctr}^2 \, \alpha _{\ell }^{{\underline{k}}-1} \beta _{\ell }^{{\underline{k}}-1} + q_\textrm{ctr}( \alpha _{\ell }^{{\underline{k}}-1} y_{\ell }^\star + \beta _{\ell }^{{\underline{k}}-1} x_{\ell }^\star ) + x_{\ell }^\star y_{\ell }^\star \\&\quad + \mu \, \big [ q_\textrm{ctr}\, \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \beta _\ell ^{{\underline{k}}-1} + \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) y_\ell ^\star + C_1 \, \alpha _\ell ^{{\underline{k}}-1} y_\ell ^\star + C_1 q_\textrm{ctr}\, \alpha _\ell ^{{\underline{k}}-1} \beta _\ell ^{{\underline{k}}-1} \big ]\\&\quad + \mu \, \big [ q_\textrm{ctr}\, \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) \alpha _\ell ^{{\underline{k}}-1} + \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) x_\ell ^\star + C_1 \, \beta _\ell ^{{\underline{k}}-1} x_\ell ^\star + C_1 q_\textrm{ctr}\, \alpha _\ell ^{{\underline{k}}-1} \beta _\ell ^{{\underline{k}}-1} \big ]\\&\quad + \mu ^2 \, \big [ q_\theta \, \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) + q_\theta C_1 \big ( \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \beta _{\ell }^{{\underline{k}}-1} + \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) \alpha _{\ell }^{{\underline{k}}-1} \big ) \\&\quad + C_1^2 \, \alpha _{\ell }^{{\underline{k}}-1} \beta _{\ell }^{{\underline{k}}-1} \big ]. \end{aligned}$$

Rearranging the terms, we obtain that

$$\begin{aligned} {\begin{matrix} \Delta _{\ell +1}^0 &{}\le \big ( q_\textrm{ctr}^2 + 2 \mu q_\textrm{ctr}C_1 + \mu ^2 C_1^2 \big ) \, \alpha _{\ell }^{{\underline{k}}-1} \beta _{\ell }^{{\underline{k}}-1}\\ &{}\qquad + \mu \, \big ( q_\textrm{ctr}+ \mu q_\theta C_1 \big ) \, \big [ \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \beta _\ell ^{{\underline{k}}-1} + \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) \alpha _{\ell }^{{\underline{k}}-1} \big ]\\ &{}\qquad + \mu ^2 \, q_\theta \, \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) + R_\ell , \end{matrix}} \end{aligned}$$
(49)

where the remainder term is defined as

$$\begin{aligned} R_\ell := \mu \, \big [ \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) y_\ell ^\star + \zeta _{\ell }(z_{\ell }^{{\underline{k}}-1}) x_\ell ^\star \big ] + (q_\textrm{ctr}+ \mu C_1) \big [ \alpha _{\ell }^{{\underline{k}}-1} y_{\ell }^\star + \beta _{\ell }^{{\underline{k}}-1} x_{\ell }^\star \big ] + x_\ell ^\star y_\ell ^\star . \end{aligned}$$
(50)

Up to the choice of \(\mu \), this concludes the proof. \(\square \)

Proof of Lemma 10 (choosing \(\mu \)) For Lemma 10(i), we choose \(\mu \) small enough such that \(q_\textrm{ctr}+ 2 \mu C_\textrm{stab}\le 1\). From (41) and (49) in the proofs of Lemma 10(iii)–(iv), we see that we additionally require

$$\begin{aligned} q_\textrm{ctr}+ \mu \lambda _\textrm{ctr}^{-1} (1+q_\textrm{ctr})< 1, \quad \quad q_\textrm{ctr}^2 + 2 \mu q_\textrm{ctr}C_1 + \mu ^2 C_1^2< 1, \quad \text {and}\quad q_\textrm{ctr}+ \mu q_\theta C_1 < 1. \end{aligned}$$
(51)

Choosing \(\mu \) small enough, we satisfy all estimates. We define \(q_\textrm{aux}< 1\) as the maximum of all terms in (51) and \(q_\theta \)\(\square \)

Proof of Lemma 10(v)

First, we note that from stability (A1) it follows that

$$\begin{aligned} \eta _{\ell }(u_{\ell }^{{\underline{k}}-1}) \lesssim \eta _{\ell }(u_{\ell }^{\star }) + \alpha _\ell ^{{\underline{k}}-1} \quad \text {and}\quad \eta _{\ell }(u_{\ell }^{\star }) \zeta _{\ell }(z_{\ell }^{\star }) \lesssim \Delta _{\ell }^{j} \text { for all } 0 \le j \le {\underline{k}}. \end{aligned}$$
(52)

Furthermore, Galerkin orthogonality and reliability (A3) imply that, for all \(n \in \mathbb {N}\) with \(\ell '+n < \underline{\ell }\),

(53)

With (52) and (53) for \(n=1\), we can bound the remainder term from (50) by

$$\begin{aligned} R_\ell \lesssim \eta _{\ell }(u_{\ell }^{\star }) y_\ell ^\star + \zeta _{\ell }(z_{\ell }^{\star }) x_\ell ^\star + \alpha _\ell ^{{\underline{k}}-1} y_\ell ^\star + \beta _\ell ^{{\underline{k}}-1} x_\ell ^\star . \end{aligned}$$

Next, let us recall from [5, Lemma 3.6] the quasi-monotonicity of the estimator, which follows from (A1)–(A3) and the Céa lemma, i.e., for all \(\ell ' \le \ell < \underline{\ell }\),

$$\begin{aligned} \eta _\ell (u_\ell ^\star ) \le \eta _{\ell '}(u_{\ell '}^\star ) + C_\textrm{stab}\, |\!|\!| u_\ell ^\star - u_{\ell '}^\star |\!|\!| \le \eta _{\ell '}(u_{\ell '}^\star ) + C_\textrm{stab}\, |\!|\!| u^\star - u_{\ell '}^\star |\!|\!| \lesssim \eta _{\ell '}(u_{\ell '}^\star ). \end{aligned}$$
(54)

For \(\eta _{\ell }(u_{\ell }^{\star }) y_\ell \), we get by summation for all \(0 \le j \le {\underline{k}}(\ell ')\) and all \(n \in \mathbb {N}\) with \(\ell '+n < \underline{\ell }\) that

Analogously, we see that

$$\begin{aligned} \sum _{\ell ={\ell '}}^{{\ell '}+n} (x_\ell ^\star )^2 \lesssim \eta _{{\ell '}}(u_{{\ell '}}^{\star })^2 \quad \text {as well as}\quad \sum _{\ell ={\ell '}}^{{\ell '}+n} \zeta _{\ell }(z_{\ell }^{\star })^2 (x_\ell ^\star )^2 \lesssim (\Delta _{{\ell '}}^j)^2. \end{aligned}$$
(55)

We proceed with \(\alpha _\ell ^{{\underline{k}}-1} y_\ell ^\star \). From (42) and the Young inequality with \(\delta > 0\), we see for \(0< \ell ' \le \ell < \underline{\ell }\) that

$$\begin{aligned} (\alpha _\ell ^{{\underline{k}}-1})^2 \le (\alpha _\ell ^{0})^2 {\mathop {\le }\limits ^{}} (1+\delta ^{-1}) \, (x_{\ell -1}^\star )^2 + q_\textrm{ctr}(1+\delta ) \, (\alpha _{\ell -1}^{{\underline{k}}-1})^2. \end{aligned}$$

For \(\delta \) small enough such that \(q_2 := q_\textrm{ctr}(1+\delta ) < 1\) and all for \(0 \le \ell \le \ell ' < \underline{\ell }\), the geometric series proves that

$$\begin{aligned} (\alpha _{\ell }^{{\underline{k}}-1})^2 \le (1+\delta ^{-1}) \sum _{j=\ell '}^{\ell -1} (x_{j}^\star )^2 + (\alpha _{\ell }^{{\underline{k}}-1})^2 \sum _{j=0}^{\infty } q_2^j {\mathop {\lesssim }\limits ^{}} \eta _{{\ell '}}(u_{{\ell '}}^{\star })^2 + (\alpha _{{\ell '}}^{{\underline{k}}-1})^2 \end{aligned}$$

and thus

Analogously, we see that \(\sum _{\ell ={\ell '}}^{{\ell '}+n} (\beta _\ell ^{{\underline{k}}-1})^2 (x_\ell ^\star )^2 \lesssim (\Delta _{{\ell '}}^{{\underline{k}}-1})^2\). Combining all estimates with

$$\begin{aligned} R_\ell ^2 \lesssim \eta _{\ell }(u_{\ell }^{\star })^2 (y_\ell ^\star )^2 + \zeta _{\ell }(z_{\ell }^{\star })^2 (x_\ell ^\star )^2 + (\alpha _\ell ^{{\underline{k}}-1})^2 (y_\ell ^\star )^2 + (\beta _\ell ^{{\underline{k}}-1})^2 (x_\ell ^\star )^2, \end{aligned}$$

we conclude the proof. \(\square \)

With the foregoing auxiliary result, we are in the position to prove linear convergence.

Proof of Theorem 6

Let \((\ell , k) \in \mathcal {Q}\). We recall the quasi-error products

$$\begin{aligned} \Lambda _\ell ^k&= \big [ \, |\!|\!| u_{\ell }^{\star } - u_{\ell }^{k} |\!|\!| + \eta _\ell (u_{\ell }^{k}) \, \big ] \big [ \, |\!|\!| z_{\ell }^{\star } - z_{\ell }^{k} |\!|\!| + \zeta _\ell (z_{\ell }^{k}) \, \big ],\\ \Delta _\ell ^k&= \big [ \, |\!|\!| u_{\ell }^{\star } - u_{\ell }^{k} |\!|\!| + \mu \, \eta _\ell (u_{\ell }^{k}) \, \big ] \big [ \, |\!|\!| z_{\ell }^{\star } - z_{\ell }^{k} |\!|\!| + \mu \, \zeta _\ell (z_{\ell }^{k}) \, \big ] \end{aligned}$$

from Theorem 6 and Lemma 10, respectively. Note that

$$\begin{aligned} \Lambda _\ell ^k \le \Delta _\ell ^k \le \mu ^2 \, \Lambda _\ell ^k \quad \text {if } \mu \ge 1, \qquad \Delta _\ell ^k \le \Lambda _\ell ^k \le \mu ^{-2} \, \Delta _\ell ^k \quad \text {if } \mu < 1, \end{aligned}$$

which yields the equivalence

$$\begin{aligned} \min \{1, \mu ^2 \} \, \Lambda _\ell ^k \le \Delta _\ell ^k \le \max \{1, \mu ^2 \} \, \Lambda _\ell ^k. \end{aligned}$$
(56)

We first show linear convergence of \(\Delta _\ell ^k\). By Lemma 10(i), we can absorb the term \(\Delta _{\ell '}^{{\underline{k}}}\le \Delta _{\ell '}^{{\underline{k}}-1}\) for all \(\ell '\). Paying attention to the possible case \(k={\underline{k}}(\ell )\), this allows us to estimate

$$\begin{aligned} \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \ge |(\ell ,k)| \end{array}} (\Delta _{\ell '}^{k'})^2 \lesssim (\Delta _\ell ^k)^2 + \sum _{k' = k}^{{\underline{k}}(\ell )-1} (\Delta _{\ell }^{k'})^2 + \sum _{\ell ' = \ell +1}^{\underline{\ell }} \sum _{k' = 0}^{{\underline{k}}(\ell ')-1} (\Delta _{\ell '}^{k'})^2. \end{aligned}$$

Lemma 10(iii) shows uniform reduction of the quasi-error on every level. This yields that

$$\begin{aligned} \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \ge |(\ell ,k)| \end{array}} (\Delta _{\ell '}^{k'})^2{} & {} \lesssim (\Delta _{\ell }^{k})^2 \sum _{k' = k}^{{\underline{k}}(\ell )} q_\textrm{aux}^{2(k'-k)} + \sum _{\ell ' = \ell +1}^{\underline{\ell }} (\Delta _{\ell '}^{0})^2 \sum _{k' = 0}^{{\underline{k}}(\ell ')-1} q_\textrm{aux}^{2k'}\\{} & {} \lesssim (\Delta _{\ell }^{k})^2 + \sum _{\ell ' = \ell +1}^{\underline{\ell }} (\Delta _{\ell '}^{0})^2. \end{aligned}$$

To estimate the sum over all levels, we use that, for the refinement step, Lemma 10(iv) shows contraction up to a remainder term. The Young inequality with \(\delta > 0\) and Lemma 10(i) then prove that

$$\begin{aligned} (\Delta _{\ell '}^{0})^2&\le q_\textrm{aux}^2 (1+\delta ) \, (\Delta _{\ell '-1}^{{\underline{k}}-1})^2 + (1+\delta ^{-1}) \, R_{\ell '-1}^2\\&\le q_\textrm{aux}^2 (1+\delta ) \, (\Delta _{\ell '-1}^{0})^2 + (1+\delta ^{-1}) \, R_{\ell '-1}^2 \quad \text {for all } 0 < \ell ' \le \underline{\ell }. \end{aligned}$$

Choosing \(\delta \) small enough such that \(q := q_\textrm{aux}^2 (1+\delta ) < 1\), we obtain from repeatedly applying the previous estimates that

$$\begin{aligned} (\Delta _{\ell '}^{0})^2 \le q^{\ell '-\ell } \, (\Delta _{\ell }^{{\underline{k}}-1})^2 + (1+\delta ^{-1}) \sum _{n = \ell }^{\ell '-1} q^{(\ell '-1)-n} \, R_{n}^2 \quad \text {for all } 0 \le \ell < \ell ' \le \underline{\ell }. \end{aligned}$$

Using this estimate and a change of summation indices, the geometric series and Lemma 10(v) uniformly bound the sum over all levels by

$$\begin{aligned} \sum _{\ell ' = \ell +1}^{\underline{\ell }} (\Delta _{\ell '}^{0})^2&\lesssim \sum _{\ell ' = \ell +1}^{\underline{\ell }} \Big [ q^{\ell '-\ell } \, (\Delta _{\ell }^{{\underline{k}}-1})^2 + \sum _{n = \ell }^{\ell '-1} q^{(\ell '-1)-n} \, R_{n}^2\Big ]\\&\lesssim (\Delta _{\ell }^{{\underline{k}}-1})^2 +\sum _{n=\ell }^{\underline{\ell }-1} R_{n}^2 \sum _{i = 0}^{\infty } q^{i} \lesssim (\Delta _{\ell }^{{\underline{k}}-1})^2 +\sum _{n=\ell }^{\underline{\ell }-1} R_{n}^2 {\mathop {\lesssim }\limits ^\mathrm{(v)}} (\Delta _{\ell }^{{\underline{k}}-1})^2. \end{aligned}$$

Combining the estimates above, we obtain that

$$\begin{aligned} \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \ge |(\ell ,k)| \end{array}} (\Delta _{\ell '}^{k'})^2 \lesssim (\Delta _{\ell }^{k})^2 + \sum _{\ell ' = \ell +1}^{\underline{\ell }} (\Delta _{\ell '}^{0})^2 \lesssim (\Delta _{\ell }^{k})^2 + (\Delta _{\ell }^{{\underline{k}}-1})^2. \end{aligned}$$

In the case \(k < {\underline{k}}(\ell )\), Lemma 10(i) proves that

$$\begin{aligned} \sum _{\begin{array}{c} (\ell ',k') \in \mathcal {Q}\\ |(\ell ',k')| \ge |(\ell ,k)| \end{array}} (\Delta _{\ell '}^{k'})^2 \le C \, (\Delta _{\ell }^{k})^2. \end{aligned}$$

In the case \(k = {\underline{k}}(\ell )\), this follows with Lemma 10(ii). In either case, the constant \(C > 0\) depends only on \(C_\textrm{aux}\) and \(q_\textrm{aux}\) from Lemma 10. Basic calculus then provides the existence of \(C_\textrm{lin}^\prime := (1+C)^{1/2} > 1\) and \(0< q_\textrm{lin}:= (1-C^{-1})^{-1/2} < 1\) such that

$$\begin{aligned} \Delta _{\ell '}^{k'} \le C_\textrm{lin}^\prime q_\textrm{lin}^{|(\ell ',k')| - |(\ell ,k)|} \, \Delta _{\ell }^{k} \quad \text {for all } (\ell ,k), (\ell ',k') \in \mathcal {Q}\text { with } (\ell ',k') \ge (\ell ,k); \end{aligned}$$

see [5, Lemma 4.9]. Finally, the claim of Theorem 6 follows from (56) with \(C_\textrm{lin}= \max \{ \mu ^{-2}, \mu ^2 \} \, C_\textrm{lin}^\prime \). \(\square \)

6 Proof of Theorem 8 (optimal rates)

We recall the following comparison lemma from [12]. While [12] is concerned with point errors in boundary element computations, we stress that the proof of [12, Lemma 14] works on a completely abstract level and thus is applicable here as well.

Lemma 11

([12, Lemma 14]) The overlay estimate (14) and the axioms (A1)–(A2) and (A4) yield the existence of a constant \(C_1>0\) such that, given \(0<\kappa <1\), each mesh \(\mathcal {T}_H\in \mathbb {T}\) admits some refinement \(\mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H)\) such that for all \(s,t>0\), it holds that

$$\begin{aligned} \eta _h(u_h^\star )^2 \zeta _h(z_h^\star )^2&\le \kappa ^2 \eta _H(u_H^\star )^2 \zeta _H(z_H^\star )^2, \end{aligned}$$
(57a)
$$\begin{aligned} \#\mathcal {T}_h- \#\mathcal {T}_H&\le 2 \big (C_1 \kappa ^{-1} \Vert u^\star \Vert _{\mathbb {A}_s} \Vert z^\star \Vert _{\mathbb {A}_t}\big )^{1/(s+t)} \big (\eta _H(u_H^\star )\zeta (z_H^\star ) \big )^{1/(s+t)}. \end{aligned}$$
(57b)

The constant \(C_1\) depends only on \(C_\textrm{stab}\), \(q_\textrm{red}\), and \(C_\textrm{drel}\). \(\square \)

Note that (57a) immediately implies that

$$\begin{aligned} \eta _h(u_h)^2 \le \kappa \eta _H(u_H^\star )^2 \quad \text {or}\quad \zeta _h(z_h^\star )^2 \le \kappa \zeta _H(z_H^\star )^2. \end{aligned}$$
(58)

We will employ this lemma in combination with the so-called optimality of Dörfler marking from [5].

Lemma 12

([5, Proposition 4.12]) Under (A1) and (A4), for all \(0<\Theta '<1/(1+C_\textrm{stab}^2C_\textrm{drel}^2)\), there exists \(0<\kappa _{\Theta '}<1\) such that for all \(\mathcal {T}_H\in \mathbb {T}\) and all \(\mathcal {T}_h\in \mathbb {T}(\mathcal {T}_H)\), (58) with \(\kappa =\kappa _{\Theta '}\) implies that

$$\begin{aligned} \Theta ' \eta _H(u_H^\star )^2 \le \eta _H(\mathcal {T}_H\setminus \mathcal {T}_h,u_H^\star )^2 \quad \text {or}\quad \Theta ' \zeta _H(z_H^\star )^2 \le \zeta _H(\mathcal {T}_H\setminus \mathcal {T}_h,z_H^\star )^2. \end{aligned}$$
(59)

The constant \(\kappa _{\Theta '}\) depends only on \(C_\textrm{stab}\), \(C_\textrm{drel}\), and \(\Theta '\). \(\square \)

The next lemma is already implicitly found in [15]. It shows that, if \(\lambda _\textrm{ctr}> 0\) is sufficiently small, then Dörfler marking for the exact discrete solution implicitly implies Dörfler marking for the approximate discrete solution. This will turn out to be the key observation to prove optimal convergence rates. We include the proof for the convenience of the reader.

Lemma 13

Suppose (A1)–(A3). Let \(0< \Theta \le 1\) and \(0< \lambda _\textrm{ctr}< \lambda _\star :=(1-q_\textrm{ctr}) / ( q_\textrm{ctr}C_\textrm{stab})\). Define \(\Theta ' := \big (\frac{\sqrt{\Theta } + \lambda _\textrm{ctr}/\lambda _\star }{1-\lambda _\textrm{ctr}/\lambda _\star }\big )^2\). Then, as soon as the iterative solver terminates (22), there hold the following statements (i)–(iv) for all \(0 \le \ell < \underline{\ell }\) and all \(\mathcal {U}_\ell \subseteq \mathcal {T}_\ell \):

  1. (i)

    \((1 - \lambda _\textrm{ctr}/\lambda _\star ) \, \eta _\ell (u_{\ell }^{{\underline{m}}}) \le \eta _\ell (u_\ell ^\star ) \le (1 + \lambda _\textrm{ctr}/\lambda _\star ) \, \eta _\ell (u_{\ell }^{{\underline{m}}})\).

  2. (ii)

    \(\Theta \, \eta _\ell (u_\ell ^{\underline{m}})^2 \le \eta _\ell (\mathcal {U}_\ell , u_\ell ^{\underline{m}})^2\)    provided that \(\Theta ' \, \eta _\ell (u_{\ell }^\star )^2 \le \eta _\ell (\mathcal {U}_\ell , u_{\ell }^\star )^2\).

  3. (iii)

    \((1 - \lambda _\textrm{ctr}/\lambda _\star ) \, \zeta _\ell (z_{\ell }^{{\underline{n}}}) \le \zeta _\ell (z_\ell ^\star ) \le (1 + \lambda _\textrm{ctr}/\lambda _\star ) \, \zeta _\ell (z_{\ell }^{{\underline{n}}})\).

  4. (iv)

    \(\Theta \, \zeta _\ell (z_\ell ^{\underline{n}}) \le \zeta _\ell (\mathcal {U}_\ell , z_\ell ^{\underline{n}})\)    provided that \(\Theta ' \, \zeta _\ell (z_{\ell }^\star )^2 \le \zeta _\ell (\mathcal {U}_\ell , z_{\ell }^\star )^2\).

Proof

It holds that

$$\begin{aligned}&\eta _\ell (\mathcal {U}_\ell , u_\ell ^\star ) {\mathop {(A1)}\limits {\le }} \eta _\ell (\mathcal {U}_\ell , u_{\ell }^{{\underline{m}}}) + C_\text {stab}\, |\!|\!| u_\ell ^\star - u_{\ell }^{{\underline{m}}} |\!|\!| {\mathop {(20)}\limits {\le }} \eta _\ell (\mathcal {U}_\ell , u_{\ell }^{{\underline{m}}}) \\ {}&\quad + C_\text {stab}\, \frac{q_\text {ctr}}{1-q_\text {ctr}} \, |\!|\!| u_{\ell }^{{\underline{m}}} - u_{\ell }^{ {\underline{m}}-1} |\!|\!|\\ {}&\quad {\mathop {(22)}\limits {\le }} \eta _\ell (\mathcal {U}_\ell , u_{\ell }^{{\underline{m}}}) + C_\text {stab}\, \frac{q_\text {ctr}}{1-q_\text {ctr}} \, \lambda _\text {ctr}\, \eta _\ell (u_{\ell }^{{\underline{m}}}) = \eta _\ell (\mathcal {U}_\ell , u_{\ell }^{{\underline{m}}}) + \frac{\lambda _\text {ctr}}{\lambda _\star } \, \eta _\ell (u_{\ell }^{{\underline{m}}}). \end{aligned}$$

The same argument proves that

$$\begin{aligned} \eta _\ell (\mathcal {U}_\ell , u_{\ell }^{{\underline{m}}}) \le \eta _\ell (\mathcal {U}_\ell , u_\ell ^\star ) + \frac{\lambda _\textrm{ctr}}{\lambda _\star } \, \eta _\ell (u_{\ell }^{{\underline{m}}}). \end{aligned}$$

For \(\mathcal {U}_\ell = \mathcal {T}_\ell \), the latter two estimates lead to

$$\begin{aligned} (1 - \lambda _\textrm{ctr}/\lambda _\star ) \, \eta _\ell (u_{\ell }^{{\underline{m}}}) \le \eta _\ell (u_\ell ^\star ) \le (1 + \lambda _\textrm{ctr}/\lambda _\star ) \, \eta _\ell (u_{\ell }^{{\underline{m}}}). \end{aligned}$$

This concludes the proof of (i). To see (ii), we use the assumption

$$\begin{aligned} (1-\lambda _\textrm{ctr}/\lambda _\star ) \, \sqrt{\Theta '} \, \eta _\ell (u_\ell ^{{\underline{m}}}) {\mathop {\le }\limits ^\mathrm{(i)}}\sqrt{\Theta '} \, \eta _\ell (u_{\ell }^\star ) \le \eta _\ell (\mathcal {U}_\ell , u_{\ell }^\star ) \le \eta _\ell (\mathcal {U}_\ell , u_\ell ^{\underline{m}}) + \frac{\lambda _\textrm{ctr}}{\lambda _\star } \, \eta _\ell (u_{\ell }^{{\underline{m}}}). \end{aligned}$$

Noting that \(\sqrt{\Theta } = (1-\lambda _\textrm{ctr}/\lambda _\star )\,\sqrt{\Theta '} - \lambda _\textrm{ctr}/\lambda _\star \), this concludes the proof of (ii). The remaining claims (iii)–(iv) follow verbatim. \(\square \)

Proof of Theorem 8

By Corollary 7, it is sufficient to prove that

$$\begin{aligned} C_{s+t} = \sup _{(\ell ,k) \in \mathcal {Q}} \big ( \#\mathcal {T}_\ell - \#\mathcal {T}_0 + 1 \big )^{s+t} \Lambda _\ell ^k \lesssim \max \{\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t},\Lambda _0^0\}. \end{aligned}$$

We prove this inequality in two steps.

Step 1: In this step, we bound the number of marked elements \(\#\mathcal {M}_{\ell '}\) for arbitrary \(0\le \ell '<\underline{\ell }\). Let \(\Theta > 0\) and corresponding \(\Theta '\) from Lemma 13 such that

$$\begin{aligned} \Theta ' = \Big ( \frac{\sqrt{\Theta }+\lambda _\textrm{ctr}/\lambda _\star }{1-\lambda _\textrm{ctr}/\lambda _\star } \Big )^2 < \frac{1}{1+C_\textrm{stab}^2C_\textrm{drel}^2}. \end{aligned}$$
(60)

Let \(\mathcal {T}_{h(\ell ')}\in \mathbb {T}(\mathcal {T}_{\ell '})\) be the corresponding mesh as in Lemma 11. With Lemma 12, this yields that

$$\begin{aligned} \Theta '\eta _{\ell '}(u_{\ell '}^\star )^2 \le \eta _{\ell '}(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')},u_{\ell '}^\star )^2 \quad \text {or}\quad \Theta '\zeta _{\ell '}(z_{\ell '}^\star )^2 \le \zeta _{\ell '}(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')},z_{\ell '}^\star )^2. \end{aligned}$$

Lemma 13 with \(\mathcal {U}_{\ell '} = \mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')}\) shows that

$$\begin{aligned} \Theta \eta _{\ell '}(u_{\ell '}^{\underline{m}})^2 \le \eta _{\ell '}(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')},u_{\ell '}^\star )^2 \quad \text {or}\quad \Theta \zeta _{\ell '}(z_{\ell '}^{\underline{n}})^2 \le \zeta _{\ell '}(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')},z_{\ell '}^\star )^2. \end{aligned}$$
(61)

We consider the marking strategies from Remark 2 separately.

For strategy (a), we have with \(\Theta := 2\theta \) and assumption (32) that (60) is satisfied. Hence, (61) implies that there holds (17), i.e.,

$$\begin{aligned} 2 \theta \eta _{\ell '}(u_{\ell '}^{\underline{m}})^2 \zeta _{\ell '}(z_{\ell '}^{\underline{n}})^2 \le \eta _{\ell '}(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')},u_{\ell '}^{\underline{m}})^2 \zeta _{\ell '}(z_{\ell '}^{\underline{n}})^2 + \eta _{\ell '}(u_{\ell '}^{\underline{m}})^2 \zeta _{\ell '}(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')},z_{\ell '}^{\underline{n}})^2. \end{aligned}$$

By assumption of Theorem 8, \(\mathcal {M}_{\ell '}\) is essentially minimal with (17). We infer that

$$\begin{aligned} \#\mathcal {M}_{\ell '} \le C_\textrm{mark}\#(\mathcal {T}_{\ell '}\setminus \mathcal {T}_{h(\ell ')}) {\mathop {\lesssim }\limits ^{}} \#\mathcal {T}_{h(\ell ')} - \#\mathcal {T}_{\ell '}. \end{aligned}$$
(62)

For the strategies (b)–(c), we set \(\Theta = \theta \) and note that assumption (32) (as well as the weaker assumption (34)) imply (60), and hence (61). Again, by assumption of Theorem 8, \(\mathcal {M}_\ell \) is chosen essentially minimal (with an additional factor two for the strategy (c)) such that (61) holds. For all three strategies, we therefore conclude that

Recall that () and (22) give that \(\eta _{\ell '}(u_{\ell '}^{\underline{k}})\zeta _{\ell '}(z_{\ell '}^{\underline{k}})\simeq \Lambda _{\ell '}^{\underline{k}}\). This finally shows that

$$\begin{aligned} \#\mathcal {M}_{\ell '} \lesssim \big (\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t}\big )^{1/(s+t)} (\Lambda _{\ell '}^{\underline{k}})^{-1/(s+t)}. \end{aligned}$$

Step 2: Let \((\ell ,k)\in \mathcal {Q}\). First, we consider \(\ell >0\) and thus \(\#\mathcal {T}_\ell >\#\mathcal {T}_0\). The closure estimate and Step 1 prove that

$$\begin{aligned} \#\mathcal {T}_\ell - \#\mathcal {T}_0 + 1 \simeq \#\mathcal {T}_\ell -\#\mathcal {T}_0&{\mathop {\lesssim }\limits ^{}}\sum _{\ell '=0}^{\ell -1} \#\mathcal {M}_{\ell '} \lesssim \big (\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t}\big )^{1/(s+t)} \sum _{\ell '=0}^{\ell -1}(\Lambda _{\ell '}^{\underline{k}})^{-1/(s+t)} \\&\le \big (\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t}\big )^{1/(s+t)} \sum _{\begin{array}{c} (\ell ',k')\in \mathcal {Q}\\ |(\ell ',k')|\ge |(\ell ,k)| \end{array}}(\Lambda _{\ell '}^{k'})^{-1/(s+t)}. \end{aligned}$$

Linear convergence of Theorem 6, further shows that

$$\begin{aligned} \#\mathcal {T}_\ell -\#\mathcal {T}_0+1&\lesssim \big (\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t}\big )^{1/(s+t)} C_\textrm{lin}^{1/(s+t)} (\Lambda _{\ell }^k)^{-1/(s+t)}\\&\quad \sum _{\begin{array}{c} (\ell ',k')\in \mathcal {Q}\\ |(\ell ',k')|\ge |(\ell ,k)| \end{array}} (q_\textrm{lin}^{1/(s+t)})^{|(\ell ,k)|-|(\ell ',k')|} \\&\le \big (\Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t}\big )^{1/(s+t)} \frac{C_\textrm{lin}^{1/(s+t)}}{1-q_\textrm{lin}^{1/(s+t)}}\, C_\textrm{lin}^{1/(s+t)} (\Lambda _{\ell }^k)^{-1/(s+t)}. \end{aligned}$$

Rearranging this estimate, we see that

$$\begin{aligned} (\#\mathcal {T}_\ell -\#\mathcal {T}_0+1)^{s+t}\Lambda _\ell ^k \lesssim \Vert u^\star \Vert _{\mathbb {A}_s}\Vert z^\star \Vert _{\mathbb {A}_t} \quad \text {for all }(\ell ,k)\in \mathcal {Q}\text { with }\ell >0. \end{aligned}$$

It remains to consider \(\ell =0\). By Theorem 6, we have that

$$\begin{aligned} (\#\mathcal {T}_\ell - \#\mathcal {T}_0 + 1)^{s+t} \Lambda _\ell ^k = \Lambda _0^k \lesssim \Lambda _0^0 \quad \text {for all }(\ell ,k)\in \mathcal {Q}\text { with }\ell =0. \end{aligned}$$

This concludes the proof. \(\square \)