Goal-oriented adaptive finite element methods with optimal computational complexity

We consider a linear symmetric and elliptic PDE and a linear goal functional. We design and analyze a goal-oriented adaptive finite element method, which steers the adaptive mesh-refinement as well as the approximate solution of the arising linear systems by means of a contractive iterative solver like the optimally preconditioned conjugate gradient method or geometric multigrid. We prove linear convergence of the proposed adaptive algorithm with optimal algebraic rates. Unlike prior work, we not only prove rates with respect to the number of degrees of freedom but even prove optimal complexity, i.e., optimal convergence rates with respect to the total computational cost.


Introduction
Let Ω ⊂ ℝ^d be a bounded Lipschitz domain, d ≥ 2. For given f ∈ L²(Ω) and f ∈ [L²(Ω)]^d, we consider a linear symmetric and elliptic partial differential equation (1), where A(x) ∈ ℝ^{d×d} is symmetric and c(x) ∈ ℝ. As usual, we assume that A, c ∈ L^∞(Ω), that A is uniformly positive definite, and that the weak form (see (5) below) fits into the setting of the Lax–Milgram lemma. Standard adaptivity aims to approximate the unknown solution u⋆ ∈ H¹₀(Ω) of (1) in the energy norm at optimal rate; see [Dör96, MNS00, BDD04, Ste07, CKNS08, CN12, FFP14] for adaptive finite element methods (AFEMs) and [CFPP14] for an overview of available results. Instead, the quantity of interest for goal-oriented adaptivity is only some functional value of the unknown solution u⋆ ∈ H¹₀(Ω) of (1), and the present paper aims to compute the linear goal functional (2) for given g ∈ L²(Ω) and g ∈ [L²(Ω)]^d. To approximate G(u⋆) accurately, it is not necessary (and might even waste computational time) to accurately approximate the solution u⋆ on the whole computational domain. Due to this potential decrease of computational cost, goal-oriented adaptivity is of high relevance in practice as well as in mathematical research; see, e.g., [BR01, BR03, EEHJ95, GS02] for some prominent contributions. The present work formulates a goal-oriented adaptive finite element method (GOAFEM), where the sought goal G(u⋆) is approximated by some computable value G_ℓ such that |G(u⋆) − G_ℓ| → 0 as ℓ → ∞, even at optimal algebraic rate; see (3). The earlier works [MS09, BET11, FGH+16, FPZ16] are essentially concerned with optimal convergence rates for GOAFEM, where all arising linear FEM systems are solved exactly.
While [FGH+16, FPZ16] particularly aim to transfer ideas from the AFEM analysis of [CKNS08, CFPP14] to GOAFEM for general elliptic PDEs, the seminal work [MS09] considers the Poisson model problem and additionally addresses the total computational cost by formulating realistic assumptions on a generic inexact solver (called GALSOLVE in [Ste07, MS09]).
The focus of the present work is also on the iterative (and hence inexact) solution of the arising FEM systems. However, we avoid any realistic assumptions on the solver, but rather rely on energy contraction per solver step, which is proved to hold for the preconditioned CG method with optimal multilevel additive Schwarz preconditioner [CNX12] or the geometric multigrid method [WZ17]. In the proposed GOAFEM algorithm, the termination of such a contractive iterative solver is then based on appropriate computable a posteriori error estimates by a similar criterion as in [Ste07, MS09]. We discuss several implementations of such termination criteria and prove that these allow us to control the total computational cost of computing the approximate goal value G_ℓ, where we already stress now that G(u⋆) ≈ G_ℓ = G(u_ℓ) + R_ℓ, where u_ℓ ≈ u⋆ is a FEM approximation of u⋆ and R_ℓ is a residual correction related to the inexact solution of the FEM formulation. While [MS09] shows algebraic convergence with optimal rates with respect to the overall computational cost for the final iterates on every level for sufficiently small adaptivity parameters (for mesh-refinement and solver termination), our main contribution is full linear convergence, i.e., linear convergence of the estimator product independently of the algorithmic decision for either mesh-refinement or solver step, even for arbitrary adaptivity parameters. An immediate consequence is that the convergence rate of the computed solutions with respect to the number of elements is the same as with respect to the overall computational cost (i.e., the cumulative computational time). Moreover, for sufficiently small adaptivity parameters, we show convergence with optimal rates with respect to the number of elements and, hence, with respect to the overall computational cost. This extends the results of [MS09] to the present setting of symmetric second-order linear elliptic PDEs. Finally, we stress that, unlike [MS09], our GOAFEM algorithm does not require any inner loop for data approximation and therefore does not require different (but still nested) meshes for the primal and dual problem. Overall, the present paper thus provides further mathematical understanding for bridging the gap between applied GOAFEM and theoretical optimality results.
Outline. In Section 2, we present our GOAFEM algorithm (Algorithm 3) and the details of its individual steps. This includes the details of our finite element discretization as well as the precise assumptions for the iterative solver, the marking strategy, and the error estimators. We then state in Section 3 that Algorithm 3 leads to linear convergence for arbitrary stopping parameters (Theorem 6) and even achieves optimal rates with respect to the total computational cost if the adaptivity parameters are sufficiently small (Theorem 8). We emphasize that linear convergence applies to all steps of the adaptive strategy, independently of whether the algorithm decides for one solver step or one step of local mesh-refinement. This turns out to be the key argument for optimal rates with respect to the total computational cost (see Corollary 7). Section 3.2 comments on alternative termination criteria for the iterative solver. Section 4 then illustrates our theoretical findings with numerical experiments. Finally, we give the proofs of our main Theorems 6 and 8 in Section 5 and Section 6, respectively.
Notation. In the following text, we write a ≲ b for a, b ∈ ℝ if there exists a constant C > 0 (which is independent of the mesh width h) such that a ≤ C b. If there holds a ≲ b ≲ a, we abbreviate this by a ≃ b. Furthermore, we denote by #A the cardinality of a finite set A and by |ω| the d-dimensional Lebesgue measure of a subset ω ⊂ ℝ^d.

Goal-oriented adaptive finite element method
2.1. Variational formulation. Defining the symmetric bilinear form a(·,·) associated with (1), we suppose that a(·,·) is continuous and elliptic on H¹₀(Ω) and thus fits into the setting of the Lax–Milgram lemma, i.e., there exist corresponding continuity and ellipticity constants. In particular, a(·,·) is a scalar product that yields an equivalent norm |||v|||² := a(v, v) on H¹₀(Ω). The weak formulation of (1) is given by (5). The Lax–Milgram lemma proves existence and uniqueness of the solution u⋆ ∈ H¹₀(Ω) of (5). The same argument applies and proves that the dual problem (6) admits a unique solution z⋆ ∈ H¹₀(Ω), where the right-hand side of (6) is the linear goal functional G.

Remark 1. For ease of presentation, we restrict our model problem (1) to homogeneous Dirichlet boundary conditions. We note, however, that for mixed homogeneous Dirichlet and inhomogeneous Neumann boundary conditions our main results hold true with the obvious modifications. In particular, with the partition ∂Ω = Γ_D ∪ Γ_N into the Dirichlet boundary Γ_D with |Γ_D| > 0 and the Neumann boundary Γ_N, the space H¹₀(Ω) (and its discretization) has to be replaced by H¹_D(Ω) := {v ∈ H¹(Ω) : v|_{Γ_D} = 0 in the sense of traces}, and the Neumann data has to be given in L²(Γ_N). Furthermore, the coefficient f must vanish in a neighborhood of Γ_N in order to go from the strong form (1) to the weak form (5) via integration by parts.

2.2. Finite element discretization and solution.
For a conforming triangulation T_H of Ω into compact simplices and a polynomial degree p ≥ 1, let X_H := {v ∈ H¹₀(Ω) : v|_T is a polynomial of degree at most p for all T ∈ T_H} be the associated conforming finite element space. To obtain conforming finite element approximations u⋆ ≈ u_H ∈ X_H and z⋆ ≈ z_H ∈ X_H, we consider the Galerkin discretizations of (5)-(6). First, we note that the Lax–Milgram lemma yields existence and uniqueness of the exact discrete solutions u⋆_H, z⋆_H ∈ X_H of (8). In practice, the discrete systems (8) are rarely solved exactly (or up to machine precision). Instead, a suitable iterative solver is employed, which yields approximate discrete solutions u_H^m, z_H^n ∈ X_H. We suppose that this iterative solver is contractive, i.e., that the energy contraction (9) holds for all m, n ∈ ℕ, where 0 < q_ctr < 1 is a generic constant and, in particular, independent of X_H. Assumption (9) is satisfied, e.g., for an optimally preconditioned conjugate gradient (PCG) method (see [CNX12]) or geometric multigrid solvers (see [WZ17]); see also the discussion in [GHPS21]. We note that these solvers are also guaranteed to satisfy the realistic assumptions from [Ste07, MS09] (which require that any initial energy error can be improved by a factor 0 < τ < 1 at cost O(|log(τ)| #T_H)). However, while (9) is slightly less general, it allows us to prove full linear convergence; see Theorem 6 below.
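Since the analysis only uses the per-step energy contraction (9), any linear iteration whose symmetric error-propagation operator is a uniform contraction in the energy norm qualifies. The following minimal Python sketch (a damped Richardson iteration on a small SPD system, not the PCG or multigrid solver used in the paper) illustrates the property numerically; all matrices and values are illustrative.

```python
import numpy as np

# Illustration of the energy contraction (9),
#   |||u - u^{m}||| <= q_ctr * |||u - u^{m-1}|||  with  |||v|||^2 = v^T A v,
# for a damped Richardson iteration on a small SPD system (illustrative data).
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
u_exact = np.linalg.solve(A, b)

def energy_norm(v):
    return np.sqrt(v @ A @ v)

omega = 1.0 / np.linalg.eigvalsh(A).max()  # damping guarantees contraction
u = np.zeros(3)
ratios = []
for _ in range(20):
    err_old = energy_norm(u_exact - u)
    u = u + omega * (b - A @ u)            # one solver step
    ratios.append(energy_norm(u_exact - u) / err_old)

q_ctr = max(ratios)
assert q_ctr < 1.0                         # uniform per-step contraction
```

Here the observed worst-case ratio plays the role of q_ctr; for the optimally preconditioned solvers cited above, such a contraction holds with q_ctr independent of the dimension of X_H.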

2.3. Discrete goal quantity.
To approximate G(u⋆), we proceed as in [GS02]: for any u_H, z_H ∈ X_H, the identity (10) holds.

Defining the discrete quantity of interest G_H(u_H, z_H) according to (10), the goal error can be controlled by means of the Cauchy–Schwarz inequality; see (11) below. We note that the additional term in (10) is the residual of the discrete primal problem (8) evaluated at an arbitrary function z_H ∈ X_H, and hence G(u⋆_H) = G_H(u⋆_H, z_H).
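On the algebraic level, the identity (10) and the bound (11) can be illustrated on a generic SPD system A u = b (primal) and A z = g (dual): the corrected goal value g·u + z·(b − A u) is accurate to the product of the two energy errors, while the plain value g·u is only first-order accurate. The following Python sketch uses illustrative numbers, not the paper's FEM matrices.

```python
import numpy as np

# Sketch of the goal correction (10)-(11) on an illustrative SPD system.
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])   # SPD "stiffness" matrix
b = np.array([1.0, 2.0])     # primal load
g = np.array([1.0, 0.0])     # goal functional G(v) = g @ v
u_star = np.linalg.solve(A, b)   # exact discrete primal solution
z_star = np.linalg.solve(A, g)   # exact discrete dual solution

# inexact iterates, e.g. from a terminated iterative solver
u = u_star + np.array([1e-3, -1e-3])
z = z_star + np.array([1e-3, 1e-3])

goal_exact = g @ u_star
goal_plain = g @ u                    # uncorrected goal value
goal_corr  = g @ u + z @ (b - A @ u)  # corrected goal, cf. (10)

enorm = lambda v: float(np.sqrt(v @ A @ v))
bound = enorm(u_star - u) * enorm(z_star - z)   # Cauchy-Schwarz bound (11)
assert abs(goal_exact - goal_corr) <= bound
assert abs(goal_exact - goal_corr) < abs(goal_exact - goal_plain)
```

The first assertion mirrors (11); the second shows the quadratic accuracy gain of the corrected goal over the direct evaluation.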
In the following, we design an adaptive algorithm that provides a computable upper bound to (11) which tends to zero at optimal algebraic rate with respect to the number of elements #T_H as well as with respect to the total computational cost.
2.4. Mesh refinement. Let T_0 be a given conforming triangulation of Ω. We suppose that the mesh-refinement is a deterministic and fixed strategy, e.g., newest vertex bisection [Ste08]. For each conforming triangulation T_H and set of marked elements M_H ⊆ T_H, let T_h := refine(T_H, M_H) be the coarsest conforming triangulation such that all T ∈ M_H have been refined, i.e., M_H ⊆ T_H \ T_h. We write T_h ∈ T(T_H) if T_h results from T_H by finitely many steps of refinement. To abbreviate notation, let T := T(T_0). We note that the order on T is respected by the finite element spaces, i.e., T_h ∈ T(T_H) implies that X_H ⊆ X_h.
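As an illustration of the abstract properties of refine, the following Python sketch implements a one-dimensional analogue, where bisection of marked intervals is automatically conforming; the paper itself uses newest vertex bisection with mesh closure for d ≥ 2, and the function below is purely illustrative.

```python
# Illustrative 1D analogue of refine(T_H, M_H): marked intervals are
# bisected. In 1D the result is automatically conforming, whereas newest
# vertex bisection for d >= 2 additionally requires mesh closure.
def refine(mesh, marked):
    """mesh: sorted list of nodes; marked: set of element indices."""
    new_nodes = [0.5 * (mesh[i] + mesh[i + 1]) for i in marked]
    return sorted(mesh + new_nodes)

T0 = [0.0, 0.25, 0.5, 0.75, 1.0]          # 4 elements
T1 = refine(T0, {0, 3})                   # refine first and last element
assert len(T1) - 1 == (len(T0) - 1) + 2   # each refined element has 2 sons
```

Note that the nodes of T0 survive in T1, which reflects the nestedness X_H ⊆ X_h of the associated finite element spaces.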
We further suppose that each refined element has at least two sons, i.e.,

#(T_H \ T_h) + #T_H ≤ #T_h for all T_h ∈ T(T_H),
and that the refinement rule satisfies the mesh-closure estimate (13), where C_cls > 0 depends only on T_0. For newest vertex bisection, this has been proved under an additional admissibility assumption on T_0 in [BDD04, Ste08] and, for 2D, even without any additional assumption in [KPP13]. Finally, we suppose that the overlay estimate (14) holds, i.e., for all triangulations T_H, T_h ∈ T, there exists a common refinement T_H ⊕ T_h ∈ T(T_H) ∩ T(T_h) with #(T_H ⊕ T_h) ≤ #T_H + #T_h − #T_0, which has been proved in [Ste07, CKNS08] for newest vertex bisection.

With the convention from Section 2.5, the refinement indicators give rise to estimators η_H(U_H, v_H) and ζ_H(U_H, v_H) for all v_H ∈ X_H and all U_H ⊆ T_H. We suppose that the estimators η_H and ζ_H satisfy the so-called axioms of adaptivity (which are designed for, but not restricted to, weighted-residual error estimators) from [CFPP14]: there exist constants C_stab, C_rel, C_drel > 0 and 0 < q_red < 1 such that, for all T_H ∈ T and all T_h ∈ T(T_H), the assumptions (A1)-(A4) (stability, reduction, reliability, and discrete reliability) are satisfied. By the assumptions (A1) and (A3), we can estimate, for every discrete function w_H ∈ X_H, the errors in the energy norm of the primal and the dual problem, respectively, where C = max{C_rel, C_rel C_stab + 1} > 0. Together with (11), we then obtain the upper bound (16) on the goal error for approximations u_H, z_H ∈ X_H. In the following sections, we provide building blocks for our adaptive algorithm that allow us to control the arising estimators (by a suitable marking strategy) as well as the arising norms in the upper bound of (16) (by an appropriate stopping criterion for the iterative solver).
2.6. Marking strategy. We suppose that the refinement indicators η_H(T, u_H^m) and ζ_H(T, z_H^n) for some m, n ∈ ℕ are used to mark a subset M_H ⊆ T_H of elements for refinement which, for a fixed marking parameter 0 < θ ≤ 1, satisfies the criterion (17).

Remark 2. Given 0 < ϑ ≤ 1, possible choices of marking strategies satisfying assumption (17) are the following: (a) The strategy proposed in [BET11] defines a weighted combined estimator and applies to it the Dörfler marking criterion introduced in [Dör96] and well-known in the context of AFEM analysis; see, e.g., [CFPP14]. This strategy satisfies (17). The strategies (b) and (c) instead perform Dörfler marking for the two estimators separately and combine the resulting sets; again, these strategies satisfy (17) with θ = ϑ²/2. Note that our main results, Theorems 6 and 8 below, hold true for all presented marking criteria (a)-(c). For our numerical experiments, we focus on criterion (a), which empirically tends to achieve slightly better performance in practice.
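A possible realization of the Dörfler criterion (17) with (quasi-)minimal cardinality is to sort the indicators and greedily collect the largest ones. Note that the sorting-based sketch below has O(N log N) cost, while the binning strategy of [Ste07] and the algorithm of [PP20] achieve the linear cost required by the complexity analysis; the function and the numbers are illustrative, not the paper's implementation.

```python
# Sketch of Doerfler marking (17): choose a (quasi-)minimal set M of
# elements whose squared indicators sum to at least theta times the total.
def doerfler_mark(eta_sq, theta):
    """eta_sq: squared refinement indicators per element; 0 < theta <= 1."""
    total = sum(eta_sq)
    order = sorted(range(len(eta_sq)), key=lambda i: -eta_sq[i])
    marked, acc = set(), 0.0
    for i in order:
        marked.add(i)
        acc += eta_sq[i]
        if acc >= theta * total:
            break
    return marked

M = doerfler_mark([4.0, 1.0, 0.25, 0.25, 0.5], theta=0.5)
assert M == {0}   # the largest indicator alone satisfies the criterion
```

Sorting descending guarantees minimal cardinality of the returned set; a linear-cost variant replaces the exact sort by binning into dyadic classes at the price of quasi-minimality.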

2.7. Adaptive algorithm. Any adaptive algorithm strives to drive down the upper bound in (16). However, the errors of the iterative solver, |||u⋆_H − u_H^m||| and |||z⋆_H − z_H^n|||, cannot be computed in general, since the exact discrete solutions u⋆_H, z⋆_H ∈ X_H of (8) are unknown and will not be computed. Thus, we note that (9) and the triangle inequality prove the computable bound (20) for the primal solver error as well as its analogue for the dual solver error. With C_goal = max{C_rel, C_rel C_stab + 1} (1 + q_ctr/(1 − q_ctr)), (16) then leads to the estimate (21), which is a computable upper bound to the goal error if m, n ≥ 1. Moreover, given some λ_ctr > 0, this motivates to stop the iterative solvers as soon as the criterion (22) is satisfied, in order to equibalance the contributions of the upper bound in (21). This is very similar to the stopping criteria considered in the seminal works [Ste07, MS09]; further alternative stopping criteria are introduced and analyzed below. Overall, we thus consider the following adaptive algorithm.
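The interplay of solver steps and the stopping criterion (22) can be sketched as follows: the solver iterates until the computable increment falls below λ_ctr times the current estimator value. In the Python sketch below, a constant eta_H stands in for the estimator η_H(u_H^m), which stagnates at the discretization-error level; solver, matrix, and all numbers are illustrative, not the paper's code.

```python
import numpy as np

# Sketch of the solver termination in Algorithm 3(i)/(22): perform
# contractive solver steps until the computable increment |||u^m - u^{m-1}|||
# drops below lambda_ctr times the (here frozen) error estimator eta_H.
def solve_until(A, b, u0, eta_H, lam_ctr, omega):
    enorm = lambda v: float(np.sqrt(v @ A @ v))
    u, m = u0.copy(), 0
    while True:
        m += 1
        u_new = u + omega * (b - A @ u)           # one contractive step
        if enorm(u_new - u) <= lam_ctr * eta_H or m >= 1000:
            return u_new, m
        u = u_new

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
u, m = solve_until(A, b, np.zeros(2), eta_H=0.05, lam_ctr=0.1, omega=0.2)
assert 1 <= m < 100   # geometric decay of increments forces termination
```

Since the increments contract geometrically while the discretization estimator stagnates, only finitely many algebraic steps are performed per level, which is the mechanism behind the cost control of Section 3.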
Remark 4. Theorem 6 below proves (linear) convergence for any choice of the marking parameter 0 < θ ≤ 1 and the stopping parameter λ_ctr > 0, and for any of the marking strategies from Remark 2. Theorem 8 below proves optimal convergence rates (with respect to the number of elements and the total computational cost) if both parameters are sufficiently small (see (32) for the precise condition) and if the set M_ℓ is constructed by one of the strategies from Remark 2, where the respective sets have quasi-minimal cardinality.
Remark 5. Note that Algorithm 3(i) requires to evaluate the error estimator after each solver step. Clearly, it would be favorable to replace this by a less expensive termination criterion. Arguing as in [FP18, Lemma 8], this allows to prove convergence of the adaptive strategy, but full linear convergence (Theorem 6 below) and optimal convergence rates (Theorem 8 below) are expected to fail.
For each adaptive level ℓ, Algorithm 3 performs at least one solver step to compute u_ℓ^m as well as one solver step to compute z_ℓ^n. By definition, m(ℓ) ≥ 1 is the solver step for which the discrete solution u_ℓ^{m(ℓ)} is accepted (to contribute to the set of marked elements M_ℓ). Analogously, n(ℓ) ≥ 1 is the solver step for which the discrete solution z_ℓ^{n(ℓ)} is accepted (to contribute to M_ℓ). If the iterative solver for either the primal or the dual problem fails to terminate on some level ℓ ∈ ℕ_0, i.e., (22) cannot be achieved for finite m or n, we define m(ℓ) := ∞ or n(ℓ) := ∞, respectively, and ℓ is then the final level. With k(ℓ) := max{m(ℓ), n(ℓ)}, we define the final iterates on level ℓ according to (23). For ease of presentation, we omit the ℓ-dependence of the indices for final iterates m(ℓ), n(ℓ), and k(ℓ) in the following if they appear as upper indices, and write, e.g., u_ℓ^m := u_ℓ^{m(ℓ)} and u_ℓ^{m−1} := u_ℓ^{m(ℓ)−1}. If Algorithm 3 does not terminate in step (iii) for some ℓ ∈ ℕ, then the final level is defined to be ∞. To formulate the convergence of Algorithm 3, we define the ordered index set Q of all steps (ℓ, k). Note that |(ℓ, k)| is proportional to the overall number of solver steps needed to compute the estimator product η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k). Additionally, we sometimes require the extended set Q_0, which additionally contains the initial index (0, 0). To estimate the work necessary to compute a pair (u_ℓ^k, z_ℓ^k) ∈ X_ℓ × X_ℓ, we make the following assumptions, which are usually satisfied in practice:
• The iterates u_ℓ^k and z_ℓ^k are computed in parallel, and each step of the solver in Algorithm 3(i) can be done in linear complexity O(#T_ℓ);
• The marking in Algorithm 3(iv) can be performed at linear cost O(#T_ℓ) (according to [Ste07], this can be done for the strategies outlined in Remark 2 with M_ℓ having almost minimal cardinality; moreover, we refer to a recent own algorithm in [PP20] with linear cost even for M_ℓ having minimal cardinality);
• We have linear cost O(#T_ℓ) to generate the new mesh T_{ℓ+1}.
Since a step (ℓ, k) ∈ Q of Algorithm 3 depends on the full history of preceding steps, the total work spent to compute (u_ℓ^k, z_ℓ^k) is the cumulative cost (26). Finally, we note that Algorithm 3(vi) employs nested iteration to obtain the initial guesses u_{ℓ+1}^0, z_{ℓ+1}^0 of the solver from the final iterates u_ℓ^m, z_ℓ^n on the mesh T_ℓ. According to (21), this allows for a posteriori error control for all indices (ℓ, k) ∈ Q_0 \ {(0, 0)}, i.e., beyond the initial step.
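Under the above cost assumptions, work(ℓ, k) from (26) is simply the cumulative sum of #T_{ℓ'} over all solver steps performed up to the current step. A minimal Python sketch with a hypothetical step history reads:

```python
# Sketch of the total cost work(l, k) from (26): every solver step on
# level l' costs O(#T_{l'}), so the cumulative work up to step (l, k) is
# the sum of #T_{l'} over all preceding steps (illustrative history).
def work(num_elements, num_solver_steps, l, k):
    """num_elements[l']: #T_{l'}; num_solver_steps[l']: steps on level l'."""
    total = 0
    for lp in range(l + 1):
        steps = k if lp == l else num_solver_steps[lp]
        total += steps * num_elements[lp]
    return total

elems = [10, 25, 60]   # mesh sizes #T_0, #T_1, #T_2
steps = [4, 3, 5]      # solver steps performed per level
assert work(elems, steps, 2, 5) == 4 * 10 + 3 * 25 + 5 * 60
```

This makes explicit why full linear convergence of the quasi-error along all steps (and not only along levels) is needed: the cost of every single solver step enters the sum.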

Main results
3.1. Linear convergence with optimal rates. Our first main result states linear convergence of the quasi-error product for every choice of the stopping parameter λ_ctr > 0. Recall from (16) that the quasi-error product is an upper bound for the goal error.

Theorem 6. Suppose (A1)-(A3), and let 0 < θ ≤ 1 and λ_ctr > 0. Then, Algorithm 3 satisfies linear convergence of the quasi-error product in the sense of (28). The constants C_lin > 0 and 0 < q_lin < 1 depend only on C_stab, q_red, C_rel, q_ctr, and the (arbitrary) adaptivity parameters 0 < θ ≤ 1 and λ_ctr > 0.
Full linear convergence implies that convergence rates with respect to the number of degrees of freedom and with respect to the total computational cost are equivalent. From this point of view, full linear convergence indeed turns out to be the core argument for optimal complexity.

Corollary 7. Recall the definition of the total computational cost work(ℓ, k) from (26). Let r > 0 and C_r := sup_{(ℓ,k)∈Q} (#T_ℓ − #T_0 + 1)^r Λ_ℓ^k ∈ [0, ∞]. Then, under the assumptions of Theorem 6, the equivalence (29) of rates holds, where the constant C_rate > 0 depends only on r, #T_0, and on the constants q_lin, C_lin from Theorem 6.
Proof. The first two estimates in (29) are obvious. It remains to prove the last estimate in (29). To this end, note that the definition of C_r yields Λ_{ℓ'}^{k'} ≤ C_r (#T_{ℓ'} − #T_0 + 1)^{−r} for all (ℓ', k') ∈ Q. Moreover, elementary algebra yields that #T_{ℓ'} ≤ #T_0 (#T_{ℓ'} − #T_0 + 1). For (ℓ, k) ∈ Q, Theorem 6 and the geometric series thus show that work(ℓ, k) ≲ C_r^{1/r} (Λ_ℓ^k)^{−1/r}. With C_rate := (#T_0)^r C_lin (1 − q_lin^{1/r})^{−r}, this gives that work(ℓ, k)^r Λ_ℓ^k ≤ C_rate C_r. This shows the final inequality in (29) and thus concludes the proof.
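The elided chain of estimates can be reconstructed from the surrounding definitions as follows (a LaTeX sketch; full linear convergence (28) is used in the form Λ_ℓ^k ≤ C_lin q_lin^{|(ℓ,k)|−|(ℓ',k')|} Λ_{ℓ'}^{k'} for (ℓ', k') ≤ (ℓ, k)):

```latex
% Sketch of the proof of Corollary 7, reconstructed from the surrounding
% definitions of work, C_r, and full linear convergence (28).
\begin{align*}
\operatorname{work}(\ell,k)
  &= \sum_{(\ell',k') \le (\ell,k)} \#\mathcal{T}_{\ell'}
   \le \#\mathcal{T}_0 \sum_{(\ell',k') \le (\ell,k)}
       \bigl(\#\mathcal{T}_{\ell'} - \#\mathcal{T}_0 + 1\bigr)
   \le \#\mathcal{T}_0\, C_r^{1/r} \sum_{(\ell',k') \le (\ell,k)}
       \bigl(\Lambda_{\ell'}^{k'}\bigr)^{-1/r} \\
  &\le \#\mathcal{T}_0\, (C_r C_{\mathrm{lin}})^{1/r}
       \bigl(\Lambda_{\ell}^{k}\bigr)^{-1/r}
       \sum_{j=0}^{\infty} \bigl(q_{\mathrm{lin}}^{1/r}\bigr)^{j}
   = \frac{\#\mathcal{T}_0\, (C_r C_{\mathrm{lin}})^{1/r}}
          {1 - q_{\mathrm{lin}}^{1/r}}\,
     \bigl(\Lambda_{\ell}^{k}\bigr)^{-1/r}.
\end{align*}
```

Raising this to the r-th power yields work(ℓ, k)^r Λ_ℓ^k ≤ C_rate C_r with C_rate as in the proof.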
If θ and λ_ctr are small enough, we are able to show that linear convergence from Theorem 6 even guarantees optimal rates with respect to both the number of unknowns #T_ℓ and the total cost work(ℓ, k). Given N ∈ ℕ_0, let T(N) be the set of all refinements T ∈ T with #T − #T_0 ≤ N, and, for all r > 0, define the approximation seminorms ‖u⋆‖_{A_r} and ‖z⋆‖_{A_r} via the best possible estimator values on meshes in T(N). With this notation, there holds the following result.
Theorem 8. Recall the definition of the total computational cost work(ℓ, k) from (26). Suppose the mesh properties (12)-(14) as well as the axioms (A1)-(A4). Let both adaptivity parameters 0 < θ ≤ 1 and 0 < λ_ctr < λ⋆ be sufficiently small such that the condition (32) holds. Let 1 ≤ C_mark < ∞. Suppose that the set of marked elements M_ℓ in Algorithm 3(iv) is constructed by one of the strategies from Remark 2(a)-(c), where the sets in (18) and (19) have, up to the factor C_mark, minimal cardinality. Let s, t > 0 with ‖u⋆‖_{A_s} + ‖z⋆‖_{A_t} < ∞.
Then, there exists a constant C_opt > 0 such that the optimal-rate estimate (33) holds, i.e., the quasi-error product decays with rate s + t with respect to the number of elements and, by Corollary 7, also with respect to the total computational cost. The constant C_opt depends only on C_cls, C_stab, q_red, C_rel, C_drel, q_ctr, C_mark, θ, λ_ctr, #T_0, s, and t.
Remark 9. The constraint (32) is enforced by our analysis of the marking strategy from Remark 2(a), while the marking strategies from Remark 2(b)-(c) allow to relax the condition to the weaker assumption (34).

3.2. Alternative termination criteria for the iterative solver. The above formulation of Algorithm 3 stops the iterative solver for u_ℓ^m and the iterative solver for z_ℓ^n independently of each other as soon as the respective termination criteria in (22) are satisfied. In this section, we briefly discuss two alternative termination criteria:

Stronger termination: The current proof of linear convergence (and of the subsequent proof of optimal convergence) only exploits that u_ℓ^k and z_ℓ^k satisfy the stopping criterion while the previous iterates do not (cf. Lemma 10(iii)). This can also be ensured by the following modification of Algorithm 3(i):

(i) Employ the iterative solver to compute iterates u_ℓ^1, ..., u_ℓ^k and z_ℓ^1, ..., z_ℓ^k together with the corresponding refinement indicators η_ℓ(T, u_ℓ^k) and ζ_ℓ(T, z_ℓ^k) for all T ∈ T_ℓ, until both stopping criteria in (22) are satisfied simultaneously.

Note that this will lead to more solver steps, since now k = k(ℓ) (if it exists) is the smallest index for which the stopping criterion holds simultaneously for both u_ℓ^k and z_ℓ^k. Inspecting the proof of Lemma 10 below, we see that all results hold verbatim also for this stopping criterion. Thus, we conclude linear and optimal convergence (in the sense of Theorem 6 and Theorem 8) also in this case.
Natural termination: The following stopping criterion (which is perhaps the most natural candidate) also leads to linear convergence: Let m(ℓ), n(ℓ) ∈ ℕ be minimal with (22). If either of them does not exist, we set again m(ℓ) = ∞ or n(ℓ) = ∞, respectively. Define k(ℓ) := max{m(ℓ), n(ℓ)}. Then, employ the iterative solver k(ℓ) times for both the primal and the dual problem, i.e., the solver provides iterates u_ℓ^k and z_ℓ^k until both stopping criteria in (22) have been satisfied once (which avoids the artificial definition (23)). For instance, if m(ℓ) < n(ℓ) = k(ℓ) < ∞, we continue to iterate for the primal problem until u_ℓ^k is obtained (or never stop the iteration if n(ℓ) = k(ℓ) = ∞). If λ_ctr > 0 is sufficiently small such that 1 − (q_ctr/(1 − q_ctr)) C_stab (1 + q_ctr) λ_ctr > 0, then we can define an enlarged stopping parameter λ'_ctr and guarantee the stopping condition (22) with this larger constant, i.e., (36) holds; see the proof below. Again, we notice that then the assumptions of Lemma 10 below are met. Hence, we conclude linear convergence (in the sense of Theorem 6) also for this stopping criterion. Moreover, optimal rates in the sense of Theorem 8 hold if λ_ctr in (32) is replaced by λ'_ctr.
Proof of (36). Without loss of generality, we may assume that m(ℓ) < n(ℓ) = k(ℓ) < ∞. First, we bound the increment |||u_ℓ^k − u_ℓ^{k−1}||| by means of the solver contraction (9). Then, using the fact that u_ℓ^m satisfies the stopping criterion in (22) together with stability (A1), we can absorb the last term of the resulting estimate. Combining the last two estimates, we conclude that (36) follows with the additional contraction factor q_ctr^{k(ℓ)−m(ℓ)}.

Numerical examples
In this section, we consider two numerical examples which solve the equation (37), where φ ∈ L²(Γ_N) and n is the element-wise outward-facing unit normal vector. We refer the reader to Remark 1 for a comment on the applicability of our results to this model problem. We further suppose that the goal functional is a slight variant of the one proposed in [MS09], i.e., (38) with a subset ω ⊆ Ω and a fixed direction g(x) = g_0 ∈ ℝ². Moreover, for error estimation, we employ standard residual error estimators, which in our case, for all (ℓ, k) ∈ Q and all T ∈ T_ℓ, read as in (39). Throughout this section, we solve (37) as well as the corresponding dual problem numerically using Algorithm 3, where we make the following choices:
• We solve the problems in the lowest-order finite element space, i.e., with polynomial degree p = 1.
• As initial values, we use u_0^0 = z_0^0 = 0.
• To solve the arising linear systems, we use a preconditioned conjugate gradient (PCG) method with an optimal additive Schwarz preconditioner. We refer to [CNX12, Sch21] for details and, in particular, for the proof that this iterative solver satisfies (9).
• We use the marking criterion from Remark 2(a) and choose M_ℓ such that it has minimal cardinality.
• Unless mentioned otherwise, we use ϑ = 0.5 and λ_ctr = 10⁻⁵.

4.1. Singularity in the goal functional only. In our first example, the primal problem is (37) with f = 2x_1(1 − x_1) + 2x_2(1 − x_2) on the unit square Ω = (0, 1)², and Γ_D = ∂Ω (and thus Γ_N = ∅). For this problem, the exact solution reads u⋆(x) = x_1(1 − x_1) x_2(1 − x_2). The goal functional is (38) with ω = T_1 := {x ∈ Ω : x_1 + x_2 ≥ 3/2} and g_0 = (−1, 0). The exact goal value can be computed analytically. The initial mesh T_0 as well as a visualization of the set T_1 can be seen in Figure 1.
For this setting, we compare our iterative solver to a conjugate gradient method without preconditioner in Figure 2, where we plot the computable upper bound Ξ_ℓ^k from (21) over work(ℓ, k) for all iterates (ℓ, k) ∈ Q, and the estimator product for the final iterates η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k) over #T_ℓ. We stress that, for (ℓ, k) ∈ Q, the computable upper bound Ξ_ℓ^k and the quasi-error product Λ_ℓ^k from (27) are related by Λ_ℓ^k ≲ Ξ_ℓ^k ≲ Λ_ℓ^{k−1}, so that linear convergence (28) with optimal rates (33) of Λ_ℓ^k also yields linear convergence with optimal rates of Ξ_ℓ^k. Since in our experiments λ_ctr = 10⁻⁵ is small, it is plausible to assume that the final iterates on every level approximate the exact solutions sufficiently well in the sense of estimator products, i.e., η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k) ≈ η_ℓ(u⋆_ℓ) ζ_ℓ(z⋆_ℓ) (cf. Lemma 13 below), for which [FPZ16] proves optimal convergence rates with respect to #T_ℓ. Indeed, we see optimal rates for η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k) with respect to #T_ℓ for both solvers in Figure 2. However, the non-preconditioned CG method fails to satisfy uniform contraction (9), and thus Theorem 8 cannot be applied. In fact, Figure 2 shows that this method fails to drive down Ξ_ℓ^k with optimal rates with respect to work(ℓ, k) (cf. (26)), as opposed to the optimally preconditioned PCG method.

Figure 2. Comparison between iterative solvers for the problem from Section 4.1. A conjugate gradient method without preconditioner (CG) leads to optimal rates with respect to #T_ℓ for the final iterates where k = k(ℓ), but not with respect to work(ℓ, k) for every (ℓ, k) ∈ Q. Our choice of the iterative solver (ML) achieves optimal rates with respect to both measures.

Furthermore, we plot in Figure 3 different error measures over work(ℓ, k) for every iterate (ℓ, k) ∈ Q. This shows that the corrector term (which is the residual of u_ℓ^k evaluated at the dual solution z_ℓ^k) in the definition of the discrete goal functional (10) is indeed necessary. We see that, throughout the iteration, the goal value G(u_ℓ^k) oscillates strongly and, for large values of λ_ctr, even shows a different rate than Ξ_ℓ^k over work(ℓ, k). In general, we thus cannot expect the quantity Ξ_ℓ^k to bound the uncorrected goal error. For the discrete goal, the corrector term compensates the oscillations of the goal functional, such that their sum decreases with the same rate as Ξ_ℓ^k, as predicted by (21). Smaller values of λ_ctr imply that, on every level, the approximate solutions u_ℓ^k, z_ℓ^k are computed more accurately, such that the corrector term becomes smaller and the effect on the rate of the goal value becomes negligible.

4.2. Geometric singularity. Our second example is the classical example of a geometric singularity on the so-called Z-shape Ω = (−1, 1)² \ conv{(−1, −1), (0, 0), (−1, 0)}, where Γ_D is only the re-entrant corner (cf. Figure 4). The primal problem is (37) with f = 0 and φ = ∇u⋆ · n, where the exact solution in polar coordinates r(x) and ϕ(x) of x ∈ ℝ² is prescribed as u⋆(x) = r(x)^{4/7} sin(4ϕ(x)/7 + 3π/7). The goal functional is (38) with ω = T_2 := (0.5, 0.5)² ∩ Ω and g_0 = (−1, −1), and the exact goal value can be computed directly via numerical integration. In Figure 4, the initial triangulation T_0 as well as the mesh after several iterations of Algorithm 3 can be seen. The adaptive algorithm resolves the singularity at the re-entrant corner as well as the critical points of the goal functional, which are at the corners of T_2.
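For reference, the prescribed singular solution can be evaluated directly; the following Python sketch assumes the standard polar angle convention, which may differ from the paper's implementation by a rotation.

```python
import math

# Evaluation of the prescribed exact solution on the Z-shape in polar
# coordinates, u*(x) = r^(4/7) * sin(4*phi/7 + 3*pi/7); the angle
# convention (math.atan2) is an assumption of this sketch.
def u_star(x1, x2):
    r = math.hypot(x1, x2)
    phi = math.atan2(x2, x1)
    return r ** (4.0 / 7.0) * math.sin(4.0 * phi / 7.0 + 3.0 * math.pi / 7.0)

# the gradient degenerates like r^(4/7 - 1) = r^(-3/7) towards the
# re-entrant corner, which is the singularity the adaptive mesh resolves
assert u_star(0.0, 0.0) == 0.0
assert 0.9 < u_star(1.0, 0.0) < 1.0
```

The exponent 4/7 reflects the interior angle at the re-entrant corner of the Z-shape; the reduced regularity limits uniform refinement to a suboptimal rate, which is why adaptivity pays off in this example.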
Figure 5 shows the rate of the estimator product η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k) of the final iterates over #T_ℓ as well as the rate of Ξ_ℓ^k over work(ℓ, k) for all (ℓ, k) ∈ Q.

Proof of Theorem 6
The following core lemma extends one of the key observations of [GHPS21] to the present setting, where we stress that the nonlinear product structure of ∆_ℓ^k leads to technical challenges which go far beyond [GHPS21].
For the following proofs, we define α_ℓ^k := |||u⋆_ℓ − u_ℓ^k||| and β_ℓ^k := |||z⋆_ℓ − z_ℓ^k|||, such that the quasi-error product reads ∆_ℓ^k = (α_ℓ^k + µ η_ℓ(u_ℓ^k)) (β_ℓ^k + µ ζ_ℓ(z_ℓ^k)) with a free parameter µ > 0, which will be fixed below.
Proof of Lemma 10(i). Recall from (23) that u_ℓ^k = u_ℓ^m for all m(ℓ) < k ≤ k(ℓ), so that nothing is to be shown for these indices. For 0 < k ≤ m(ℓ), on the other hand, the iterate u_ℓ^k is obtained by one step of the iterative solver. From stability (A1) and solver contraction (9), we have for all 0 ≤ j < k ≤ m(ℓ) that

α_ℓ^k + µ η_ℓ(u_ℓ^k) ≤ [q_ctr^{k−j} + µ C_stab (1 + q_ctr^{k−j})] α_ℓ^j + µ η_ℓ(u_ℓ^j) ≤ [q_ctr + 2µ C_stab] α_ℓ^j + µ η_ℓ(u_ℓ^j).

If µ is chosen small enough such that q_ctr + 2µ C_stab ≤ 1, together with the trivial case j = k, the last two estimates show that α_ℓ^k + µ η_ℓ(u_ℓ^k) ≤ α_ℓ^j + µ η_ℓ(u_ℓ^j). The same argument shows the analogous estimate for β_ℓ^k + µ ζ_ℓ(z_ℓ^k). Multiplication of the last two estimates proves the assertion.
Proof of Lemma 10(ii). Recall that the index k(ℓ) satisfies (22). From the triangle inequality, we thus get an upper bound for the primal estimator η_ℓ(u_ℓ^{k−1}) in terms of η_ℓ(u_ℓ^k) and the solver increment. Furthermore, stability (A1) leads to a corresponding bound for α_ℓ^{k−1}. Combining the last two estimates, we control α_ℓ^{k−1} + µ η_ℓ(u_ℓ^{k−1}) by α_ℓ^k + µ η_ℓ(u_ℓ^k). Together with the analogous estimate for β_ℓ^{k−1} + µ ζ_ℓ(z_ℓ^{k−1}), we conclude the proof.

Proof of Lemma 10(iii). Without loss of generality, suppose that k(ℓ) = m(ℓ), so that the stopping criterion in (22) fails for the iterate u_ℓ^{k−1}. With contraction of the solver (9), this leads to a lower bound for α_ℓ^{k−1} in terms of the estimator. Together with the previous estimate, this shows the assertion. Up to the choice of µ, this concludes the proof.
Proof of Lemma 10(iv). First, we note that η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k) ≠ 0, according to Algorithm 3(iii) and the assumption that ℓ is not the final level. From contraction of the solver (9) and nested iteration, we obtain a bound for α_{ℓ+1}^0 in terms of α_ℓ^k. For the estimator terms, stability (A1) and reduction (A2) yield the estimator reduction estimate (45). Using (45), the corresponding estimate for the dual estimator, and the Young inequality, we obtain an intermediate bound. The marking criterion (17), which is applicable since ℓ is not the final level, allows to estimate the term in brackets by zero, and stability (A1) then leads to (46). For the mixed terms in ∆_{ℓ+1}^0, we use (42) and (44); analogously, we obtain (48). Combining (43) and (46)-(48) and rearranging the terms, we arrive at a contraction estimate with a remainder term. Up to the choice of µ, this concludes the proof.
The next lemma is already implicitly contained in [GHPS18]. It shows that, if λ_ctr > 0 is sufficiently small, then Dörfler marking for the exact discrete solution implies Dörfler marking for the approximate discrete solution. This turns out to be the key observation for the proof of optimal convergence rates. We include the proof for the convenience of the reader.
Proof of Theorem 8. By Corollary 7, it suffices to prove the optimal rate with respect to the number of elements. We prove this in two steps.
Step 1: In this step, we bound the number of marked elements #M_ℓ for an arbitrary level ℓ below the final one. Let Θ > 0 and the corresponding parameter from Lemma 13 be such that (60) is satisfied. Let T_{h(ℓ)} ∈ T(T_ℓ) be the corresponding mesh as in Lemma 11. With Lemma 12, this yields a bound on #T_{h(ℓ)} − #T_ℓ. Lemma 13 with U_ℓ = T_ℓ \ T_{h(ℓ)} shows the Dörfler-type estimate (61). We consider the marking strategies from Remark 2 separately. For strategy (a), we have with Θ := 2θ and assumption (32) that (60) is satisfied. Hence, (61) implies that (17) holds. By assumption of Theorem 8, M_ℓ is essentially minimal with (17). We infer that #M_ℓ ≲ #T_{h(ℓ)} − #T_ℓ. For the strategies (b)-(c), we set Θ := θ and note that assumption (32) (as well as the weaker assumption (34)) implies (60), and hence (61). Again, by assumption of Theorem 8, M_ℓ is chosen essentially minimal (with an additional factor two for strategy (c)) such that (61) holds. For all three strategies, we therefore conclude that #M_ℓ ≲ #T_{h(ℓ)} − #T_ℓ. Recall that (20) and (22) give η_ℓ(u_ℓ^k) ζ_ℓ(z_ℓ^k) ≳ Λ_ℓ^k. This finally shows that #M_ℓ ≲ (‖u⋆‖_{A_s} ‖z⋆‖_{A_t})^{1/(s+t)} (Λ_ℓ^k)^{−1/(s+t)}.
where h_T = |T|^{1/2} is the local mesh-width and [[·]] denotes the jump across interior edges. It is well known [CFPP14, FPZ16] that η_ℓ and ζ_ℓ satisfy the assumptions (A1)-(A4). The examples are chosen to showcase the performance of the proposed GOAFEM algorithm for different types of singularities.

Figure 3. Comparison between Ξ_ℓ^k, the discrete goal G_ℓ(u_ℓ^k, z_ℓ^k), the primal residual evaluated at the dual solution z_ℓ^k, and the direct evaluation of the goal functional G(u_ℓ^k) for every iterate (ℓ, k) ∈ Q and different values of λ_ctr ∈ {1, 10⁻², 10⁻⁴, 10⁻⁶}. The primal residual evaluated at the dual solution z_ℓ^k is the difference between goal and discrete goal; see (10).

Figure 4. Left: Initial mesh T_0. The shaded area is the set T_2 from Section 4.2, and the Dirichlet boundary at the re-entrant corner is marked in red. Right: Mesh after 13 iterations of Algorithm 3 with #T_13 = 4534.
2.5. Estimator properties. For T_H ∈ T and v_H ∈ X_H, let η_H(T, v_H) ≥ 0 and ζ_H(T, v_H) ≥ 0 for all T ∈ T_H be given refinement indicators. For µ_H ∈ {η_H, ζ_H}, we use the usual convention that µ_H(U_H, v_H) := (Σ_{T ∈ U_H} µ_H(T, v_H)²)^{1/2} for all v_H ∈ X_H and all U_H ⊆ T_H, and we abbreviate µ_H(v_H) := µ_H(T_H, v_H).