Error estimates for the finite element approximation of bilinear boundary control problems

In this article a special class of nonlinear optimal control problems involving a bilinear term in the boundary condition is studied. Problems of this kind arise, for instance, in the identification of an unknown space-dependent Robin coefficient from a given measurement of the state, or when the Robin coefficient can be controlled in order to reach a desired state. Necessary and sufficient optimality conditions are derived and several discretization approaches for the numerical solution of the optimal control problem are investigated. Both a full discretization and the postprocessing approach are considered; in the latter, an improved control is computed by a pointwise evaluation of the first-order optimality condition. For both approaches finite element error estimates are shown and the validity of these results is confirmed by numerical experiments.


Introduction
This paper is concerned with bilinear boundary control problems of the form where Ω ⊂ R^n, n ∈ {2, 3}, is a bounded domain, α > 0 is the regularization parameter, y_d ∈ L^2(Ω) is a desired state and 0 ≤ u_a < u_b are the control bounds.
As an application of bilinear boundary control problems we mention the identification of an unknown Robin coefficient from a given measurement y_d of the state quantity. This is for instance of interest in the modeling of stem cell division processes [15], where u is the unknown parameter describing the chemical reactions between proteins from the cell interior and the cell cortex. In further applications, u can be interpreted as a heat-exchange coefficient in thermodynamics or as a quantity for corrosion damage in electrostatics. There are many publications dealing with the identification of the Robin coefficient, see for instance [11,21,27,30]. Only a few papers use an optimal control approach similar to the one considered in the present article. We mention [23,20], where the parabolic version of our model problem is considered. The authors prove convergence of a finite element approximation but no convergence rate is established. A similar problem is discussed in [19], dealing with the recovery of the Robin parameter in a variational inequality.
The aim of the present paper is to derive necessary and sufficient optimality conditions for the optimal control problem and to investigate several numerical approximations regarding convergence towards a local solution. This complements a previous contribution of Kröner and Vexler [25] where the distributed control case, meaning that the bilinear term u y appears in the differential equation, is discussed. In that article error estimates for the approximate controls in the L^2(Ω)-norm are derived for several finite element approximations. To this end, the authors approximate the state and adjoint state by linear finite elements and derive the convergence rate 1 for piecewise constant and 3/2 for piecewise linear approximations of the control. Moreover, advanced discretization concepts like the postprocessing approach [28] and the variational discretization [22] are investigated, which allow an improvement up to a convergence rate of 2. It is the purpose of the present article to extend the results to the case of boundary control.
The numerical analysis of boundary control problems is usually more difficult than for distributed control problems as the adjoint control-to-state operator maps onto some Sobolev/Lebesgue space defined on the boundary. As a consequence, error estimates for the traces of finite element solutions have to be proved, more precisely, in the L^2(Γ)-norm. Here, we consider two different discretization approaches. The first one is a full discretization using piecewise linear finite elements for the states and piecewise constant functions on the boundary for the control approximation. Under the assumption that the domain has a Lipschitz boundary we show that the discrete optimal control converges with the optimal rate 1. To show this result we exploit the local coercivity of the objective, best-approximation properties of the control space and suboptimal error estimates for the state and adjoint equation. In order to obtain a more accurate solution we also investigate the postprocessing approach where an improved control is computed by a pointwise application of the first-order optimality condition to the discrete state variables. For this approach we have to assume more regularity for the exact solution and thus, we restrict our considerations to two-dimensional domains with sufficiently smooth boundary. Under this assumption we show the optimal convergence rate of 2 − ε with arbitrary ε > 0, which is the rate one would also expect in the case of linear quadratic boundary control problems and smooth solutions [2,3,29] (even with h^{−ε} replaced by |ln h|, where h is the maximal element diameter of the finite element mesh). The proof relies on the non-expansivity of the projection onto the feasible set as well as sharp error estimates for the state and adjoint state in L^2(Γ). To obtain estimates in these norms, superconvergence properties of the midpoint interpolant, finite element error estimates for the Ritz projection in L^2(Γ) and a supercloseness result between the midpoint interpolant of the exact and the discrete solution are exploited. To show the L^2(Γ)-norm error estimate we will, as we consider smooth solutions, derive a maximum norm estimate. To the best of the author's knowledge these results are not available in the literature for problems with Robin boundary conditions. Based on the ideas from [16] we formulate the missing proof.
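Convergence rates such as the rate 1 and the rate close to 2 quoted above are usually checked numerically via the experimental order of convergence. The following is a minimal sketch and not taken from the article: the function name `experimental_order` and the synthetic error values are hypothetical and only illustrate the formula EOC = log(e_1/e_2) / log(h_1/h_2) for two consecutive mesh sizes.

```python
import math

def experimental_order(hs, errors):
    """Return log(e_i / e_{i+1}) / log(h_i / h_{i+1}) for consecutive meshes."""
    if len(hs) != len(errors) or len(hs) < 2:
        raise ValueError("need at least two mesh sizes with matching errors")
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(hs[i] / hs[i + 1])
        for i in range(len(hs) - 1)
    ]

# Synthetic errors behaving exactly like C*h and C*h^2 (illustrative data only,
# not results from the article).
hs = [2.0 ** (-k) for k in range(2, 7)]
rates_first_order = experimental_order(hs, [0.5 * h for h in hs])
rates_second_order = experimental_order(hs, [0.5 * h ** 2 for h in hs])
```

Applied to measured L^2(Γ)-errors of the discrete controls on a sequence of refined meshes, the returned values would be compared with the predicted rates.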
We moreover note that the setting discussed here does not fit into the well-known framework of the semilinear optimal control problems discussed e. g. in [4,8,10,26], as these contributions deal with nonlinearities depending solely on the state variable. However, many techniques can be reused for the problem considered here. The only publication where more general nonlinearities depending both on the state and the control variable are considered is, to the best of the author's knowledge, [31]. Therein optimality conditions are discussed but there is no theory on the numerical analysis of approximation methods for this problem class available yet. However, we think that the consideration of bilinear control problems may serve as a starting point for the investigation of a more general class of nonlinear optimal control problems.
The article is structured as follows. In Section 2 we discuss the solvability of the state equation and regularity results for its solution. In Section 3 we analyze the optimal control problem. In particular, necessary and sufficient optimality conditions are investigated. Section 4 is devoted to the finite element discretization of the state equation, where we show finite element error estimates required for the numerical analysis of the optimal control problem later. The discretization of the optimal control problem is considered in Section 5. In particular, we discuss convergence rates for the numerical solution obtained by a full discretization of the optimal control problem as well as for an improved control obtained by a postprocessing step. The latter result requires some auxiliary results that we discuss in the appendix. To be more precise, a maximum norm error estimate for the finite element solution of an elliptic equation with Robin boundary conditions is needed. A proof is given in Appendix A. Moreover, a proof of local error estimates for the midpoint interpolant and the L^2(Γ)-projection onto piecewise constant functions on the boundary is needed. To the best of the author's knowledge these results are not available in the literature in case of domains with curved boundaries. Thus, we discuss these auxiliary results in Appendix B. Finally, we will compare the theoretical results with numerical experiments in Section 6.

Analysis of the state equation
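We consider the boundary value problem −Δy + y = f in Ω, ∂_n y + u y = g on Γ, (1) understood in the weak sense, i. e., a_u(y, v) = (f, v)_{L^2(Ω)} + (g, v)_{L^2(Γ)} for all v ∈ H^1(Ω), with a_u(y, v) := (∇y, ∇v)_{L^2(Ω)} + (y, v)_{L^2(Ω)} + (u y, v)_{L^2(Γ)}. First, we show an existence and uniqueness result for (1). To this end, we introduce a decomposition of the control into a positive and negative part u^+, u^− ∈ L^2_+(Γ) := {v ∈ L^2(Γ) : v ≥ 0 a. e. on Γ} such that u = u^+ − u^−. The following result then relies on the Lax-Milgram Lemma. However, an assumption on the coefficient u is required.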
with the constant c * which is due to the embedding . Then, the solution y of (1) belongs to H 1 (Ω) and satisfies the a priori estimate .
Proof. The boundedness of a_u follows directly from the Cauchy-Schwarz inequality and the embedding H^1(Ω) → L^4(Γ). This implies.
To show the coercivity we take into account the decomposition u = u^+ − u^− and the embedding stated above. Here, the assumption (2) ensures the coercivity. An application of the Lax-Milgram Lemma leads to the desired result.
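The coercivity step can be sketched as a short computation. This is a reconstruction under assumptions rather than the article's own display: a_u is taken to be the bilinear form a_u(y,v) = (∇y,∇v)_{L^2(Ω)} + (y,v)_{L^2(Ω)} + (u y, v)_{L^2(Γ)} of the weak formulation of (1), c_* denotes the constant of the embedding H^1(Ω) → L^4(Γ), and assumption (2) is read as a smallness condition on the negative part u^-.

```latex
\begin{align*}
a_u(y,y)
  &= \|y\|_{H^1(\Omega)}^2 + \int_\Gamma u^+\, y^2 - \int_\Gamma u^-\, y^2 \\
  &\ge \|y\|_{H^1(\Omega)}^2 - \|u^-\|_{L^2(\Gamma)}\, \|y\|_{L^4(\Gamma)}^2 \\
  &\ge \bigl(1 - c_*^2\, \|u^-\|_{L^2(\Gamma)}\bigr)\, \|y\|_{H^1(\Omega)}^2 .
\end{align*}
```

Under this reading the form is coercive as soon as c_*^2 ‖u^-‖_{L^2(Γ)} < 1. The first inequality drops the non-negative term with u^+ and uses the Cauchy-Schwarz inequality on Γ; whether the article normalizes c_* with or without the square cannot be recovered from the text.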
This is the key idea which allows us to avoid the two-norm discrepancy for the optimal control problem, as we will see that the reduced objective functional is differentiable with respect to the L^2(Γ)-topology. In the following we will hide the dependency of the estimates on ‖u^−‖_{L^2(Γ)} and thus γ_u in the generic constant as we impose positive control bounds in the considered optimal control problem.
Later, we will frequently make use of the following Lipschitz estimate.
In the following theorem we collect some regularity results for the solution of (1).
, be a bounded Lipschitz domain.By y ∈ H 1 (Ω) we denote the solution of (1).The following a priori estimates are valid, under the assumption that the input data possess the regularity demanded by the right-hand side: a) If r > 2n/(1 + n) and p > 2 for n = 2 and p ≥ 4 for n = 3, then b) If r > n/2, s > n − 1, and p ≥ 2 for n = 2 and p > 8/3 for n = 3, then c) Furthermore, if Ω is a convex polygonal/polyhedral domain, or possesses a boundary which is of class C 1,1 , there holds Proof.a) In [14, Theorem 1.12] it is shown that the problem as well as Ω F + Γ G = 0.The solubility condition is satisfied in our situation with F = f − y and G = g − u y and becomes clear when testing (1) with v ≡ 1.The regularity required for F follows from the embedding f ∈ L r (Ω) → H −1/2+ε (Ω) for sufficiently small ε > 0.Moreover, the Hölder inequality and the embeddings , from which we conclude G ∈ L 2 (Γ).From [14, Theorem 1.12] and Lemma 2.1 we then obtain It remains to show the H 1 (Γ)-norm estimate.We split the solution into the parts y f and y g solving Using [17,Theorem 5.4] we directly deduce and Lemma 2.1 leads to the desired estimate for y g .For the function y f , we get the desired estimate by an application of a trace theorem and the a priori estimate (3) which can in case of g ≡ 0 be improved to provided that ε > 0 is sufficiently small.The validity of the second step can be confirmed by means of [14, Theorem 1.12] and [13,Theorem 23.3].The decomposition y = y f + y g and the estimates shown above imply the desired estimate in the H 1 (Γ)-norm.b) We prove the result for the case n = 3.The two-dimensional case follows from the same arguments.From [7,Theorem 3.1] it is known that the solution of (1) belongs to C(Ω) if f ∈ L r (Ω), r > n/2, and g − uy ∈ L s (Γ), s > n − 1.The latter assumption can be concluded from the Hölder inequality, a Sobolev embedding and a trace theorem, which implies for 1/p + 1/8 = 1/(2 + ε).A simple computation shows that p > 8/3 and s = 2 + ε 
with ε > 0 sufficiently small guarantee the validity of the previous steps.It remains to show y ∈ H 5/4+ε (Ω).This can be deduced from [13,Theorem 23.3] where the a priori estimate is stated.The regularity demanded by the right-hand side of ( 4) is confirmed with the embeddings , see [18,Theorem 1.4.4.2].Collecting up the arguments above leads to and the assertion follows after insertion of the a priori estimate from Lemma 2.1.c) With an embedding we deduce from the assumption that u ∈ L 4 (Γ).Hence, (4) is applicable which implies y ∈ H 3/4 (Γ) and thus, u y ∈ H 1/2 (Γ), see [18,Theorem 1.4.4.2].The H 2 (Ω)regularity of y then follows from a shift theorem applied to the equation with boundary conditions ∂ n y = g − uy ∈ H 1/2 (Γ) on Γ, see [18,Theorem 2.4.2.7] (for domains with smooth boundary) or [18,Theorem 4.4.3.8](for convex polygonal domains).
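The "simple computation" behind the exponent condition in part b) of the proof above can be written out as follows. This is only a worked check of the relation 1/p + 1/8 = 1/(2 + ε) quoted in the proof; no further information from the article is used.

```latex
\begin{align*}
\frac1p &= \frac{1}{2+\varepsilon} - \frac18 = \frac{6-\varepsilon}{8\,(2+\varepsilon)},
\qquad\text{hence}\qquad
p = \frac{8\,(2+\varepsilon)}{6-\varepsilon}
  = \frac83 + \frac{32\,\varepsilon}{3\,(6-\varepsilon)} .
\end{align*}
```

For every ε ∈ (0, 6) this gives p > 8/3, and p tends to 8/3 as ε → 0. Since Γ is bounded, any given exponent p > 8/3 is therefore covered by choosing ε > 0 small enough, with s = 2 + ε > n − 1 = 2 in the case n = 3.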

The optimal control problem
We introduce the control-to-state operator S : U_ad → H^1(Ω) defined by S(u) := y, with y the solution of (1). In this section we discuss the bilinear optimal control problem subject to u ∈ U_ad := {v ∈ L^2(Γ) : u_a ≤ v ≤ u_b a. e. on Γ}. Here, α > 0 is the regularization parameter, y_d ∈ L^2(Ω) the desired state and 0 < u_a < u_b the control bounds. Our aim is to derive necessary and sufficient optimality conditions as well as regularity results for local solutions. Note that the operator S is non-affine and consequently, j is non-convex.
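For orientation, the problem described above can be written in reduced form as follows. The sketch assumes the standard tracking-type objective suggested by the roles of y_d and α; the state equation is the one implied by the boundary condition ∂_n y = g − u y used in Section 2. The exact display of the article is not available in the text, so this should be read as an assumed formulation.

```latex
\begin{align*}
 &\min_{u \in U_{ad}} \; j(u) := J(S(u),u), \qquad
   J(y,u) := \tfrac12\, \|y - y_d\|_{L^2(\Omega)}^2
           + \tfrac{\alpha}{2}\, \|u\|_{L^2(\Gamma)}^2, \\
 &\text{where } y = S(u) \text{ solves}\quad
   -\Delta y + y = f \ \text{ in } \Omega, \qquad
   \partial_n y + u\,y = g \ \text{ on } \Gamma .
\end{align*}
```

The product u y in the boundary condition is what makes S non-affine, even though the state equation is linear in y for fixed u.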

Optimality conditions
To derive optimality conditions, differentiability properties of the (implicitly defined) operator S are of interest.
Lemma 3.1.The operator S : U ad → H 1 (Ω) is infinitely many times Fréchet differentiable with respect to the L 2 (Γ)-topology.The first derivative δy := S (u)δu is the weak solution of Proof.The result follows from an application of the implicit function theorem to the operator e : whose roots are solutions of (1).We choose δy ∈ H 1 (Ω), δu ∈ U such that u + δu ∈ U (note that U is an open subset of L 2 (Γ)).First, we confirm that the linear operator e (y, u) : is the Fréchet-derivative of e.This is a consequence of e(y + δy, u + δu) − e(y, u) = e (y, u)(δy, δu) and the fact that the remainder term satisfies where we applied the generalized Hölder inequality and the embedding H 1 (Ω) → L 4 (Γ).The second Fréchet derivative e : given by e (y, u)(δy, δu)(τ y, τ u) := (τ u δy and the mapping (y, u) → e (y, u) is continuous.The derivatives of order n ≥ 3 vanish.Hence, e : Finally, due to Lemma 2.1 we conclude that the linear mapping δy → e y (y, u)δy = (∇δy, ∇•) is bijective.The implicit function theorem implies the assertion and the derivative δy := S (u)δu is given by e (y, u)(δy, δu) = 0.This corresponds to the weak formulation of (6).
From the chain rule and Lemma 3.1 we directly conclude the following differentiability result.
Lemma 3.2. The functional j : U_ad → R is infinitely many times Fréchet differentiable with respect to the L^2(Γ)-topology and the first derivative is given by The optimality condition can be simplified using the adjoint of the linearized control-to-state operator, that is, with p ∈ H^1(Ω) solving the adjoint equation In the following we denote the control-to-adjoint mapping u → p = S'(u)^*(S(u) − y_d) by Z : L^2(Γ) → H^1(Ω). Consequently, we can rewrite the optimality condition (9) as The variational inequality is equivalent to the projection formula with Π_ad the L^2(Γ)-projection onto U_ad.
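The pointwise character of the projection formula, which is also what the postprocessing step exploits, can be illustrated by a small sketch. It assumes that the gradient of j is represented by αu − y p on Γ, so that the formula reads u = Π_ad((1/α) y p); this sign convention follows from the tangent equation with boundary data −y δu and an adjoint equation with right-hand side y − y_d, and is an assumption here. The function names and sample values are hypothetical.

```python
def project_onto_bounds(value, u_a, u_b):
    """Pointwise projection of a real number onto the interval [u_a, u_b]."""
    return min(max(value, u_a), u_b)

def control_from_projection(y_trace, p_trace, alpha, u_a, u_b):
    """Evaluate Pi_ad((1/alpha) * y * p) at boundary sample points.

    Assumes the gradient representation alpha*u - y*p; with a different sign
    convention for the adjoint state the product changes its sign.
    """
    return [
        project_onto_bounds(y * p / alpha, u_a, u_b)
        for y, p in zip(y_trace, p_trace)
    ]

# Hypothetical boundary traces of state and adjoint state at four points.
y_vals = [1.0, 2.0, 0.5, 4.0]
p_vals = [0.1, 0.3, 0.2, 0.5]
u_vals = control_from_projection(y_vals, p_vals, alpha=0.5, u_a=0.25, u_b=1.0)
```

Because the projection acts point by point, it never increases the distance between two arguments, which is the non-expansivity used in the convergence proof for the postprocessing approach.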
To compute the second derivative of j we need the solution δy := S'(u)δu ∈ H^1(Ω) of the tangent equation −Δδy + δy = 0 in Ω, ∂_n δy + u δy = −y δu on Γ, and the solution δp := Z'(u)δu ∈ H^1(Ω) of the dual for Hessian equation Then, the reduced Hessian in the directions δu, τu ∈ L^2(Γ) reads Next, we derive some stability and Lipschitz properties of S, Z, S' and Z'. As the following results require different assumptions on f, y_d and g we simply assume the most restrictive ones, that is, Moreover, we will hide the dependency on these quantities in the generic constant to simplify the notation.
Lemma 3.3.Let u ∈ L 2 (Γ) satisfy the assumption (2).The control-to-state operator S satisfies the following inequalities: with p 1 > 2 and p 2 ≥ 2 for n = 2, and p 1 ≥ 4 and p 2 > 8/3 for n = 3.The estimates remain valid when replacing the operator S by the control-to-adjoint operator Z.
Proof. The inequalities for S are a direct consequence of Lemmata 2.1 and 2.3. The inequalities for Z can be derived with similar arguments, but the right-hand side of the adjoint equation involves the corresponding state S(u). However, in all cases the norms of S(u) − y_d can be bounded by c (1 + ‖S(u)‖_{H^1(Ω)}) ≤ c.
Lemma 3.4. Let u, δu ∈ L^2(Γ) be given and assume that u satisfies (2). Then, the following stability estimates hold true: with p > 2 for n = 2 and p ≥ 4 for n = 3. The estimates remain valid when replacing S' by Z'.
Proof.In the following we write y := S(u) and δy = S (u)δu.The stability in H 1 (Ω) follows directly from Lemma 2.1 and the estimate which follows from the same arguments used already in (8).The boundedness of y := S(u) in H 1 (Ω) can be found in the previous Lemma.The estimate in the H 3/2 (Ω)-norm follows analogously with Lemma 2.3a) and and the stability in L ∞ (Ω) proved in Lemma 3.3.
The estimates for Z are deduced with similar techniques.With the a priori estimate from Lemma 2.3a) and the embedding H 1 (Ω) → L r (Ω) which holds for r < ∞ (n = 2) or r ≤ 6 (n = 3) we get satisfy assumption (2).Then, the following Lipschitz-estimates hold: The estimates are also valid when replacing S by Z and Z by Z .
Proof. The estimates for S and S' follow directly from Lemma 2.2 and the stability estimates for S and S' in H^1(Ω) proved in the Lemmata 3.3 and 3.4. The Lipschitz estimate for Z is proved in a similar way. In this case one has to apply the Lipschitz estimate shown for S to the term ‖S(u) − S(v)‖_{H^1(Ω)} appearing due to the differences in the right-hand sides. With the same idea we show the Lipschitz estimate for Z'. Using again Lemma 2.2 we get It remains to bound the three terms on the right-hand side. To this end, we apply Lemma 3.4 to the first term, the Lipschitz estimate for S'(·)δu to the second term, and the multiplication rule (14) with y = Z(u) − Z(v) as well as the Lipschitz estimate for Z to the third term.
As the optimal control problem is non-convex we have to deal with local solutions. For some local solution ū ∈ U_ad we require the following second-order sufficient condition:
Assumption 3.6 (SSC). The objective functional is locally convex near the local solution ū, i. e., a constant δ > 0 exists such that With standard arguments one can show that each function ū ∈ U_ad fulfilling the first-order necessary condition (11) and the second-order sufficient condition (15) is indeed a local solution and satisfies the quadratic growth condition with certain constants γ, τ > 0. The author is aware that there are weaker assumptions which are sufficient for local minima, for instance one could formulate (15) for all directions v from a critical cone. However, with this assumption the convergence proof for the postprocessing approach presented in Section 5.3 requires some more careful investigations, in particular the construction of a modified interpolant onto U_ad. One possible solution for this issue can be found in [26].
Later, we will require the following Lipschitz estimate for the Hessian of j.
From the representation (13) we obtain We estimate the right-hand side using the Cauchy-Schwarz inequality, the embedding H^1(Ω) → L^4(Γ) and the Lipschitz estimates from Lemma 3.5 as well as the a priori estimates from Lemmata 3.3 and 3.4. This implies With similar arguments we deduce and conclude the assertion.
Corollary 3.8. Let ū ∈ U_ad be a local solution of (5) satisfying Assumption 3.6. Then, some ε > 0 exists such that the inequality
Proof. The assertion follows immediately from the previous Lemma. For further details we refer to [25, Lemma 2.23].
In the next Lemma we will collect some basic regularity results for the solution of (5).
Under additional assumptions on the geometry of Ω we can show even higher regularity. This is needed for the postprocessing approach studied in Section 5.3 where we will show almost quadratic convergence of the control approximations.
Lemma 3.10. Let Ω ⊂ R^2 be a bounded domain with a C^{1,1}-boundary Γ. Then, there holds for all Γ ⊂⊂ A or Γ ⊂⊂ I, where A := {x ∈ Γ : u(x) ∈ {u_a, u_b}} and I := Γ \ A denote the active and inactive set, respectively.
We chose the assumptions of the previous Lemma in such a way that the regularity is only restricted due to the projection formula.Of course, when the control bounds are never active we could further improve the regularity results.

Finite element approximation of the state equation
This section is devoted to the finite element approximation of the variational problem (1). While the results from the previous sections are valid for arbitrary Lipschitz domains (unless otherwise explicitly assumed), we have to assume more smoothness of the boundary Γ in order to establish our discretization results: This definition includes arbitrary (possibly non-convex) polygonal or polyhedral domains. Indeed, the regularity of solutions is in this case also restricted by corner and edge singularities. However, for the first convergence result we require only H^{3/2}(Ω) ∩ H^1(Γ)-regularity of the solution. Later, we want to investigate improved discretization techniques for which more regularity is needed. Then, we will use a stronger assumption on the domain.
First, we introduce shape-regular triangulations {T_h}_{h>0} of Ω consisting of triangles (n = 2) or tetrahedra (n = 3). The elements T may have curved edges/faces such that the property Ω = ∪_{T ∈ T_h} T is valid for an arbitrary domain Ω. Moreover, we assume that the triangulations are feasible in the sense of Ciarlet [12].
The mesh parameter h > 0 is the maximal element diameter The family of meshes {T h } h>0 is assumed to be quasi-uniform, this means some κ > 0 independent of h exists such that each element T ∈ T h contains a ball with radius ρ T satisfying the estimate Each triangulation T h of Ω induces also a triangulation E h of the boundary Γ By F T : T → T we denote the transformations from the reference triangle/tetrahedron T to the world element T ∈ T h .The transformations F T may be non-affine for elements with curved faces.Here, we consider transformations of the form with some affine function FT (x) = BT x + bT , BT ∈ R n×n , b ∈ R n , chosen in such a way that if T is a curved boundary element, T = FT ( T ) is an n-simplex whose vertices coincide with the vertices of T .The assumed shape-regularity implies BT ≤ c h T and B−1 To guarantee the validity of interpolation error estimates we assume: (A2) The triangulations T h are regular of order 2 in the sense of [5], this is, for all sufficiently small h > 0 there holds for all T ∈ T h .
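The mesh quantities introduced above can be computed directly for a straight-sided triangular mesh. The following is a minimal sketch under assumptions: it ignores curved elements, takes the quasi-uniformity condition to be of the form ρ_T ≥ κ h, and uses hypothetical function names and a two-triangle example mesh.

```python
import math

def triangle_diameter(a, b, c):
    """Diameter of a straight triangle, i.e. the length of its longest edge."""
    return max(math.dist(a, b), math.dist(b, c), math.dist(c, a))

def triangle_inradius(a, b, c):
    """Radius of the largest inscribed ball: area divided by half the perimeter."""
    area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1]))
    half_perimeter = 0.5 * (math.dist(a, b) + math.dist(b, c) + math.dist(c, a))
    return area / half_perimeter

def mesh_parameters(triangles):
    """Return (h, kappa): maximal element diameter and min over T of rho_T / h."""
    h = max(triangle_diameter(*t) for t in triangles)
    kappa = min(triangle_inradius(*t) for t in triangles) / h
    return h, kappa

# Unit square split along its diagonal (illustrative mesh only).
unit_square = [((0, 0), (1, 0), (1, 1)), ((0, 0), (1, 1), (0, 1))]
h, kappa = mesh_parameters(unit_square)
```

For a quasi-uniform family, the value of `kappa` computed this way stays bounded away from zero as the mesh is refined.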
There are multiple strategies to construct the mappings F_T satisfying these assumptions and we refer the reader for instance to [5,33,36]. Therein, it is assumed that Γ is piecewise C^3; only in the second reference C^4 is required.
The trial and test space is defined by Next, we introduce an interpolation operator which maps functions from W 1,1 (Ω) onto V h .Therefore, we partly use the quasi-interpolant proposed by Bernardi [5], but use a modification for boundary nodes as in [32], see also [1].To each interior node x i , i = 1, . . ., N in , of T h , we associate the patch of elements σ i := ∪{ T : Instead of using nodal values as for the Lagrange interpolant, we use the nodal values of some regularized function computed by an L 2 -projection over σ i .Therefore, denote by F i : σi → σ i a continuous transformation from a reference patch σi having diameter O(1) to σ i .The interpolation operator Π h : W 1,1 (Ω) → V h is defined as follows.
To each node x i , i = 1, . . ., N , we associate a first-order polynomial where û is chosen such that u = û • F −1 i .The interpolation operator is defined by where {ϕ i } i=1,...,N is the nodal basis of V h .Note, that due to the modification for boundary nodes, this operator is only applicable to W 1,1 (Ω)-functions.The desired interpolation properties remain valid.In particular, there holds The proof follows from the same arguments as in [32,Theorem 4.1].
The finite element solutions of (1) are characterized by the variational formulations As in the continuous case one can show that (19) possesses a unique solution for each h > 0.
With the usual arguments we can derive an error estimate for the approximation error in the energy norm.
Lemma 4.1. Assume that (A1) and (A2) are satisfied and that the solution y of (1) belongs to H^s(Ω) with some s ∈ [1,2]. Then, there holds the error estimate
Proof. The proof follows from the Céa-Lemma and the interpolation error estimates (17).
Of particular interest are error estimates on the boundary. These are required in order to derive error estimates for boundary control problems. To this end, we first prove a suboptimal result which is valid for arbitrary Lipschitz domains Ω.
Lemma 4.2. Let the assumptions (A1) and (A2) be satisfied. It is assumed that the solution y of (1) belongs to H^{3/2}(Ω). Moreover, the parameter u fulfills (2) and belongs to L^p(Γ) with p > 2 for n = 2 and p ≥ 4 for n = 3. Then, the error estimate holds, for all h > 0.
Proof. We introduce the dual problem and obtain with the typical arguments of the Aubin-Nitsche trick The last step is an application of Lemma 4.1 and the interpolation error estimate (17). The regularity required for the dual solution w can be deduced from Lemma 2.3 with f ≡ 0 and g = y − y_h. Taking into account the a priori estimate we conclude the assertion.
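The duality argument of the proof above can be sketched as follows. This is a reconstruction under assumptions, since the article's displays are not available here: the dual problem is taken to be a_u(v, w) = (y − y_h, v)_{L^2(Γ)} for all v ∈ H^1(Ω), and Π_h denotes the interpolation operator of this section.

```latex
\begin{align*}
\|y - y_h\|_{L^2(\Gamma)}^2
  &= a_u(y - y_h, w)
   = a_u(y - y_h, w - \Pi_h w) \\
  &\le c\, \|y - y_h\|_{H^1(\Omega)}\, \|w - \Pi_h w\|_{H^1(\Omega)} \\
  &\le c\, h^{1/2}\, \|y\|_{H^{3/2}(\Omega)} \; h^{1/2}\, \|w\|_{H^{3/2}(\Omega)} \\
  &\le c\, h\, \|y\|_{H^{3/2}(\Omega)}\, \|y - y_h\|_{L^2(\Gamma)} .
\end{align*}
```

The second equality uses the Galerkin orthogonality, and the last step uses an a priori bound of the form ‖w‖_{H^{3/2}(Ω)} ≤ c ‖y − y_h‖_{L^2(Γ)} for the dual solution. Dividing by ‖y − y_h‖_{L^2(Γ)} then suggests a first-order estimate in L^2(Γ) under H^{3/2}(Ω)-regularity.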
If the solution is more regular, we can also show a higher convergence rate. In this case we will use the Hölder inequality and a trace theorem to obtain ‖y − y_h‖_{L^2(Γ)} ≤ c ‖y − y_h‖_{L^∞(Ω)}, and insert the following result.
Theorem 4.3. Consider a planar domain Ω ⊂ R^2. Let u ∈ H^{1/2}(Γ) with u ≥ 0 a. e., and assume that (A1) and (A2) are satisfied. Assume that the solution y of (1) belongs to W^{2,q}(Ω) with q ∈ [2, ∞). Then, the error estimate
The proof requires rather technical arguments and is postponed to the appendix.

The discrete optimal control problem
In the following we investigate the discretized optimal control problem: The reduced objective functional is denoted by j_h(u_h) := J_h(S_h(u_h), u_h). We use piecewise linear finite elements to approximate the state y, i. e., the space V_h is defined as in the previous section.
The controls are sought in the space of piecewise constant functions, where E_h is the triangulation of the boundary induced by T_h. As in the continuous case we can derive a first-order necessary optimality condition which reads The discrete control-to-state operator is denoted by S_h : L^2(Γ) → V_h and the control-to-adjoint operator by Z_h : L^2(Γ) → V_h. Analogous to the continuous case we compute the first and second derivatives of j_h and obtain and The first-order optimality condition reads in the short form

Properties of the discrete control-to-state/adjoint operator
In Section 3 we have derived several stability and Lipschitz properties for the operators S, Z, S' and Z'. Here, we will derive the discrete analogues that are needed in the following. Throughout this section we assume that (A1) and (A2) are fulfilled.
Lemma 5.1. There hold the following properties: for p_1, p_2 > 2 for n = 2 and p_1 ≥ 4, p_2 > 4 for n = 3. These estimates remain valid when replacing S_h by Z_h.
Proof. We start with the estimate in the H^1(Γ)-norm. With the triangle inequality and an inverse estimate we obtain The first two terms are bounded by the last one due to (18) and it remains to apply the stability estimate from Lemma 3.3. For the third term we apply the error estimate from Lemma 4.2. This implies the first estimate.
We prove the maximum norm estimate only for the case n = 3. In the following, we write y_h := S_h(u). We introduce the function ỹ ∈ H^1(Ω) solving the problem Obviously, y_h is the Neumann Ritz-projection of ỹ, i. e., Let x^* ∈ T^* with T^* ∈ T_h be the point where |y_h| attains its maximum. With an inverse inequality and the Hölder inequality we get where δ_h is a regularized delta function defined by δ_h(x) = |T^*|^{−1} sgn(ỹ(x) − y_h(x)) if x ∈ T^* and δ_h(x) = 0 otherwise. The second term on the right-hand side can be treated with the arguments used already in the proof of Lemma 2.3b), namely with r > 3/2 and s = 2 + ε with ε > 0 sufficiently small such that the following arguments remain valid. Furthermore, we estimate the last term with the Hölder inequality with p_2 = 4(2+ε)/(2−ε) and p = 4 (note that 1/p_2 + 1/p = 1/s) and the embedding H^1(Ω) → L^4(Γ). This yields It remains to exploit stability of S_h in the The estimate for the first term on the right-hand side of (25) is based on the ideas from [35, Section 3.6]. First, we introduce a regularized Green's function g^h ∈ H^1(Ω) solving the variational problem a_N(z, g^h) = (δ_h, z)_{L^2(Ω)} for all z ∈ H^1(Ω). The Neumann Ritz-projection of g^h is denoted by g^h_h. Using the Galerkin orthogonality we obtain where the last step follows from the stability of the Ritz projection and the interpolation error estimate (17). To bound the H^1(Ω)-norm of g^h we apply the ellipticity of a_N, the definition of g^h, the Hölder inequality and an embedding to arrive at The last step follows from the property ‖δ_h‖_{L^{6/5}(Ω)} ≤ c |T^*|^{−1/6} ≤ c h^{−1/2} that can be confirmed with a simple computation. Insertion into (27) and taking into account (25) and (26) yields the desired stability estimate.
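The two small computations referred to in the proof above can be written out explicitly. The sketch uses only the definition of δ_h given in the text; the lower bound |T^*| ≥ c h^3 is an assumption that holds for quasi-uniform meshes in three dimensions.

```latex
\begin{align*}
\|\delta_h\|_{L^{6/5}(\Omega)}
  &\le \Bigl( \int_{T^*} |T^*|^{-6/5} \,\mathrm{d}x \Bigr)^{5/6}
   = |T^*|^{-1}\, |T^*|^{5/6}
   = |T^*|^{-1/6}
   \le c\, \bigl(h^3\bigr)^{-1/6}
   = c\, h^{-1/2}, \\[4pt]
\frac{1}{p_2} + \frac1p
  &= \frac{2-\varepsilon}{4\,(2+\varepsilon)} + \frac14
   = \frac{(2-\varepsilon) + (2+\varepsilon)}{4\,(2+\varepsilon)}
   = \frac{1}{2+\varepsilon}
   = \frac1s .
\end{align*}
```

The first line explains the factor h^{−1/2} entering the bound for the regularized Green's function, and the second confirms that the exponents p_2 = 4(2+ε)/(2−ε) and p = 4 are admissible in the Hölder inequality with s = 2 + ε.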
The estimates for Z_h follow in a similar way. One just has to replace f by S_h(u) − y_d and the result follows from the estimates proved already for S_h(u).
Lemma 5.2. Assume that u, v ∈ L^2(Γ) satisfy the assumption (2). Then, the Lipschitz estimate
Proof. The proof follows with the same arguments as in the continuous case, see Lemmata 2.2 and 3.5.
Next, we discuss some error estimates for the approximation of the control-to-state and control-to-adjoint operator. While estimates for S_h and Z_h are a direct consequence of Lemma 4.2, the results for the linearized operators S'_h and Z'_h require some more effort, as for instance S'(u)δu − S'_h(u)δu does not fulfill the Galerkin orthogonality.
Lemma 5.3. For each u ∈ U_ad and δu ∈ L^2(Γ) the error estimates are valid for p > 2 for n = 2 and p ≥ 4 for n = 3. The results are also valid when replacing S and S_h by Z and Z_h, as well as S' and S'_h by Z' and Z'_h, respectively.
Proof. The first estimate is just a combination of the Lemmata 4.1 and 3.3. To show the estimate for the linearized operators we introduce again the abbreviations y := S(u), y_h := S_h(u), δy := S'(u)δu and δy_h := S'_h(u)δu. Moreover, define the auxiliary function δỹ_h ∈ V_h as the solution of This function fulfills the Galerkin orthogonality, i. e., a_u(δy − δỹ_h, v_h) = 0 for all v_h ∈ V_h. Hence, we obtain with Lemma 4.1 and the Lipschitz property from Lemma 2.2 (note that this Lemma is also valid for the discrete solutions) For the first term we simply insert the second estimate from Lemma 3.4. The second term on the right-hand side is further estimated by means of [18, Theorem 1.4.4.2] and a trace theorem, which yield a bound for the term δu (y − y_h), and the assertion follows after an application of the estimate shown already for S(u) − S_h(u).
The estimates for Z and Z' follow with similar arguments.

Convergence of the fully discrete solutions
Throughout this subsection we assume that the properties (A1) and (A2) are fulfilled.These assumptions are needed to guarantee the required regularity of the solution and the validity of interpolation error estimates.
As the solutions of both the continuous and the discrete optimal control problem, (5) and (21), respectively, are not unique, we have to construct a sequence of discrete local solutions converging towards a continuous one. The first question which arises is whether such a sequence exists. To this end, we introduce a localized problem where ū ∈ U_ad is a fixed local solution of (5) and ε > 0 is some small parameter. First, we show that this problem possesses a unique local solution, which would immediately follow if we could show that the coercivity discussed in Corollary 3.8 is transferred to the discrete case. The following arguments are similar to the investigations in [10], in particular Theorems 4.4 and 4.5 therein.
Proof. With the explicit representations of j'' and j''_h from (13) and (23), respectively, and Corollary 3.8, we obtain with y = S(u), p = Z(u), δy = S'(u)δu and δp = Z'(u)δu, and the discrete analogues y_h = S_h(u), p_h = Z_h(u), δy_h = S'_h(u)δu and δp_h = Z'_h(u)δu. It remains to bound the two norms in parentheses appropriately. To this end, we apply the triangle inequality, the stability properties for S', S'_h, Z and Z_h from Lemmata 3.3, 3.4 and 5.1 as well as the error estimates from Lemma 5.3. Note that the control bounds provide the regularity for u that is required for these estimates. As a consequence we obtain With similar arguments we can show The previous two estimates together with (29) imply Choosing h sufficiently small such that c h^{1/2} ≤ δ/4 leads to the assertion.
Theorem 5.5. Let $\bar u \in U_{ad}$ be a local solution of (5) satisfying Assumption 3.6. Assume that $\varepsilon > 0$ and $h_0 > 0$ are sufficiently small. Then, the auxiliary problem (28) possesses a unique solution for each $h \le h_0$, denoted by $\bar u_h^\varepsilon$, and there holds $\bar u_h^\varepsilon \to \bar u$ in $L^2(\Gamma)$ as $h \searrow 0$.
Proof. The existence of at least one solution of (28) follows immediately from the compactness and non-emptiness of $U_{ad}^h \cap B_\varepsilon(\bar u)$. Note that $Q_h \bar u \in U_{ad}^h \cap B_\varepsilon(\bar u)$ for sufficiently small $h > 0$, which confirms that the feasible set is non-empty. Due to Lemma 5.4 this solution is unique.
Moreover, the family $\{\bar u_h^\varepsilon\}_{h \le h_0}$ is bounded and hence, a weakly convergent sequence $\{\bar u_{h_k}^\varepsilon\}_{k\in\mathbb N}$ with $h_k \searrow 0$ exists. The weak limit is denoted by $\tilde u \in L^2(\Gamma)$ and from the convexity of the feasible set we deduce $\tilde u \in U_{ad}^h \cap B_\varepsilon(\bar u)$. Without loss of generality it is assumed that $\bar u_h^\varepsilon \rightharpoonup \tilde u$ in $L^2(\Gamma)$ as $h \searrow 0$. Next, we show that $\tilde u$ is a local minimum of the continuous problem. First, we show the convergence of the corresponding states, which follows with the arguments from [9]. To this end, we employ the triangle inequality to get For the first term on the right-hand side we exploit convergence of the finite element method proved in Lemma 5.3 which yields With similar arguments as in the proof of Lemma 2.2 we moreover deduce The integral term on the right-hand side is non-negative due to the lower control bounds $\bar u_h^\varepsilon \ge u_a \ge 0$. We can bound the first term on the right-hand side with the Cauchy-Schwarz inequality and the multiplication rule from [18, Theorem 1.4.4.2] which provides for arbitrary $s \in (0, 1/2)$. Note that there holds $\|\tilde u - \bar u_h^\varepsilon\|_{H^{-s}(\Gamma)} \to 0$ for $h \searrow 0$ due to the compact embedding $L^2(\Gamma) \hookrightarrow H^{-s}(\Gamma)$, $s > 0$. It remains to bound the second factor on the right-hand side by an application of Lemma 5.1 and to divide the whole estimate by the third factor. After insertion of this estimate into (30) we obtain the strong convergence of the states, that is, Next, we show that $\tilde u$ is a local solution of the continuous problem (5). To this end we exploit (31) and the lower semi-continuity of the norm map to arrive at The second to last step follows from the optimality of $\bar u_h^\varepsilon$ for (28) and the admissibility of the $L^2(\Gamma)$-projection $Q_h \bar u$ for sufficiently small $h > 0$. The last step follows from the strong convergence of the $L^2(\Gamma)$-projection $Q_h$ in $L^2(\Gamma)$. Note that this implies $\lim_{h \searrow 0} \|S_h(Q_h \bar u) - S(\bar u)\|_{L^2(\Omega)} = 0$. Due to Assumption 3.6 the solution $\bar u$ is unique within $B_\varepsilon(\bar u)$ when $\varepsilon > 0$ is sufficiently small. This implies $\tilde u = \bar u$. Note that all "≤" signs in (32) then turn to "=" signs.
To conclude the strong convergence of the sequence $\{\bar u_h^\varepsilon\}_{h>0}$ we show additionally the convergence of the norms. This follows from (32) and the strong convergence of the states from which we infer
The previous theorem guarantees that each local solution $\bar u \in U_{ad}$ can be approximated by a sequence of local solutions of the discretized problems (28). Due to $\bar u_h^\varepsilon \in B_\varepsilon(\bar u)$ and $\bar u_h^\varepsilon \to \bar u$ for $h \searrow 0$ (i.e., the constraint $\bar u_h^\varepsilon \in B_\varepsilon(\bar u)$ is never active), the functions $\bar u_h^\varepsilon$ are local solutions of the discrete problems (21) provided that $h > 0$ is small enough. Hence, we neglect the superscript $\varepsilon$ in the following and denote by $\bar u_h$ the sequence of discrete local solutions converging to the local solution $\bar u$.
Next, we show linear convergence of the sequence $\bar u_h$.
Theorem 5.6. Let $\bar u \in U_{ad}$ be a local solution of (5) which fulfills Assumption 3.6, and let $\{\bar u_h\}_{h>0}$ be local solutions of (21) with $\bar u_h \to \bar u$ for $h \searrow 0$. Then, the error estimate $\|\bar u - \bar u_h\|_{L^2(\Gamma)} \le c\,h$ holds.
Proof. Let $\xi = \bar u + t(\bar u_h - \bar u)$ with $t \in (0, 1)$. From Corollary 3.8 we obtain for sufficiently small $h$ the estimate where the last step follows from the mean value theorem for some $t \in (0, 1)$. Next, we confirm with the first-order optimality conditions that with the $L^2(\Gamma)$-projection $Q_h$ onto $U_h$. Note that the property $Q_h \bar u \in U_{ad}$ is trivially satisfied. Insertion into the inequality above leads to An estimate for the second part follows from orthogonality of the $L^2(\Gamma)$-projection, that is, Furthermore, we exploit the Leibniz rule and the stability properties for $S$ and $Z$ from Lemma 3.3 to obtain Next, we discuss the first term on the right-hand side of (33). Insertion of the definition of $j_h$ and $j$ and the stability of In the last step we inserted the finite element error estimates from Lemma 4.2. Exploiting also the stability estimates from Lemmata 3.3 and 5.1 we obtain Together with (33), (34) and (35) we arrive at the assertion.

Postprocessing approach
In this section we consider the so-called postprocessing approach introduced in [29]. The basic idea is to compute an "improved" control $\tilde u_h$ by a pointwise evaluation of the projection formula, i.e., $\tilde u_h :=$ where $\bar y_h$ and $\bar p_h$ are the discrete state and adjoint state, respectively, obtained by the full discretization approach discussed in Section 5.2. As we require higher regularity of the exact solution in order to observe a higher convergence rate than for the full discretization approach, we replace (A1) by the stronger assumption
(A1') The domain $\Omega$ is planar and its boundary is globally $C^3$.
The most technical part of convergence proofs for this approach is the proof of $L^2$-norm estimates for the state variables. This is usually done by considering the following three terms separately: In [29] $R_h : C(\Gamma) \to U_h$ is chosen as the midpoint interpolant. We will construct and investigate such an operator in Appendix B. Note that a definition of a midpoint interpolant on curved elements is not straightforward. The first term on the right-hand side of (37) is a finite element error in the $L^2(\Gamma)$-norm. We collect the required estimates in the following lemma.
Lemma 5.7. For all $q < \infty$ there hold the estimates
Proof. The first estimate follows from the Hölder inequality and the maximum norm estimate derived in Theorem 4.3. The second estimate requires an intermediate step. We denote by $p_h(\bar u) \in V_h$ the solution of the equation As $p_h(\bar u)$ is the Ritz projection of $p$ we can apply Theorem 4.3 again and obtain To show an estimate for the error between $p_h(\bar u)$ and $Z_h(\bar u)$ we test the equations defining both functions by $v_h = p_h(\bar u) - Z_h(\bar u)$, compare the proof of Lemma 2.2. Together with the non-negativity of $\bar u$ we obtain The last step follows from the estimate involving $S(\bar u)$, which is a consequence of the Aubin-Nitsche trick. With the triangle inequality we conclude the desired estimate for the discrete control-to-adjoint operator.
To obtain an optimal error estimate for the second term we need an additional assumption which is used in all contributions studying the postprocessing approach. To this end, define the subsets $K_2 := \cup\{\bar E : E \in \mathcal E_h,\ E \subset \mathcal A \text{ or } E \subset \mathcal I\}$ and $K_1 := \Gamma \setminus K_2$. In the following we will assume that $K_1$ satisfies $|K_1| \le c\,h$. (38) The idea of this assumption is that the control can only switch between the active and the inactive set on $K_1$. The regularity of the control is reduced only due to these switching points, see also Lemma 3.10. One can in general expect that this happens at finitely many points and thus, the assumption (38) is not very restrictive.
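The splitting of the boundary elements into $K_1$ and $K_2$ can be illustrated with a small Python sketch. It is only an illustration under several assumptions that are not part of the paper: the boundary is replaced by the unit interval, the control `sample_control` is a hypothetical profile with two switching points, and membership of an element in the active or inactive set is decided from a few sample points per element instead of exactly. The helper names `split_boundary_elements` and `measure` are likewise made up for this example.

```python
import numpy as np


def split_boundary_elements(control, nodes, u_a, u_b, samples=5, tol=1e-12):
    """Split 1D boundary elements into K_1 (mixed) and K_2 (purely active/inactive).

    control : callable evaluating the control at an array of boundary points
    nodes   : sorted 1D array of element end points
    The decision is based on `samples` points per element (an approximation).
    """
    k1, k2 = [], []
    for i, (a, b) in enumerate(zip(nodes[:-1], nodes[1:])):
        s = np.linspace(a, b, samples)
        u = control(s)
        # a point is "active" if the control sits on one of the bounds
        active = (np.abs(u - u_a) <= tol) | (np.abs(u - u_b) <= tol)
        if active.all() or (~active).all():
            k2.append(i)   # element lies in the active or in the inactive set
        else:
            k1.append(i)   # element contains a switching point
    return k1, k2


def measure(nodes, indices):
    """Total length of the elements with the given indices."""
    return float(sum(nodes[i + 1] - nodes[i] for i in indices))


def sample_control(s):
    """Hypothetical control on [0, 1] with lower bound 0 and two switching points."""
    return np.maximum(0.0, 1.0 - 30.0 * (s - 0.5) ** 2)


if __name__ == "__main__":
    for n in (4, 8, 16, 32, 64):
        nodes = np.linspace(0.0, 1.0, n + 1)
        k1, _ = split_boundary_elements(sample_control, nodes, 0.0, np.inf)
        print(n, measure(nodes, k1), 2.0 / n)
```

For this profile at most two elements contain a switching point, so the measure of $K_1$ stays below $2h$ on every uniform mesh, which is the kind of behaviour the bound $\sum_{E \subset K_1} |E| \le c\,h$ expresses.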

As an intermediate result required to prove estimates for
Lemma 5.8. For all $q < \infty$ there holds the estimate
Proof. To shorten the notation we write $e_h := S_h(\bar u) - S_h(R_h \bar u)$. Moreover, we introduce the function $w \in H^1(\Omega)$ solving the equation This implies Next, we discuss both terms on the right-hand side separately. The first one is treated with the Cauchy-Schwarz inequality and the interpolation error estimate (17). These arguments lead to The $H^1(\Omega)$-norm of $e_h$ is further estimated by the Lipschitz property from Lemma 5.2 and an interpolation error estimate for the midpoint interpolant. This yields for all $q \ge 2$. Insertion into the estimate above, taking into account the stability estimates from Lemma 5.1, yields Next, we consider the second term on the right-hand side of (40). After a reformulation by means of the definition of $S_h$ we get We can further estimate this term with the interpolation error estimate from Lemma B.3 The last step follows from the embedding $H^1(\Gamma) \hookrightarrow L^\infty(\Gamma)$ and the multiplication rule , see [18, Theorem 1.4.4.2]. Both properties are only fulfilled in case of $n = 2$.
Let us discuss the terms on the right-hand side separately. For elements $E \subset K_1$ we can exploit the assumption (38), which provides the estimate $\sum_{E \subset K_1} |E| \le c\,h$, and the second interpolation error estimate from Lemma B.1 to arrive at Moreover, with the first estimate from Lemma B.1 and the discrete Cauchy-Schwarz inequality we obtain for elements in $K_2$ the estimate The remaining terms on the right-hand side of (43) can be treated with stability estimates for $S_h$ (see Lemma 5.1) and $R_h$, the estimate $\|\Pi_h w\|_{H^1(\Gamma)} \le c\,\|w\|_{H^1(\Gamma)}$ stated in (18) and the a priori estimate $\|w\|_{H^1(\Gamma)} \le c\,\|e_h\|_{L^2(\Omega)}$ from Lemma 2.3a). Insertion of the previous estimates into (43) yields Note that we hide the lower-order norms of $\bar u$ in the generic constant as these quantities may be estimated by means of the control bounds $u_a$ and $u_b$. Insertion of (41) and (44) into (40) and dividing by $\|e_h\|_{L^2(\Omega)}$ implies the assertion.
Proof. We will only prove the second estimate as the first one follows from the same technique and is even easier, as the right-hand sides of the equations defining $S_h(\bar u)$ and $S_h(R_h \bar u)$ coincide. This is not the case for the control-to-adjoint operator.
To shorten the notation we write $e_h := Z_h(\bar u) - Z_h(R_h \bar u)$. As in the previous lemma we rewrite the error by a duality argument using a dual problem similar to (39) with solution $w \in H^1(\Omega)$, more precisely, This yields We rewrite the second expression in (45) and get analogous to (42) Note that the first term would not appear when deriving estimates for $S_h$ instead of $Z_h$ as the equations defining $S_h(\bar u)$ and $S_h(R_h \bar u)$ have the same right-hand side.
The first term can be treated with the Cauchy-Schwarz inequality, Lemma 5.8 and the estimate which can be deduced from (17) and Lemma 2.1 with $g = e_h$. These ideas lead to For the second term on the right-hand side of (46) we apply the same steps as for (44) with the only modification that the a priori estimate $\|w\|_{H^1(\Gamma)} \le c\,\|e_h\|_{L^2(\Gamma)}$ from Lemma 2.3a) has to be employed. From this we infer In the last step we used the boundedness of $Z_h(R_h \bar u)$, see Lemma 5.1. Insertion of (47) and (48) into (46) leads to It remains to discuss the first term on the right-hand side of (45). We obtain with the boundedness of $a_{\bar u}$, the interpolation error estimate (17) and Lemma 2.3a) An estimate for the expression $\|e_h\|_{H^1(\Omega)}$ follows from the equality which can be deduced by subtracting the equations for $Z_h(\bar u)$ and $Z_h(R_h \bar u)$ from each other. Rearranging the terms yields The second term on the right-hand side can be bounded by zero as $\bar u \ge 0$. An estimate for the last term is proved in Lemma 5.8. For the first term we apply the estimate (48) with $\Pi_h w$ replaced by $e_h$. All together, we obtain Moreover, with an inverse inequality and a trace theorem we get Consequently, we deduce from (51) Insertion into (50) leads to Together with (49) and (45) we conclude the desired estimate for $Z_h$.
Lemma 5.10. Under the assumption (38) there holds the estimate
Proof. We observe that each function $\xi$ for arbitrary $\varepsilon > 0$ provided that $h$ is sufficiently small. This follows from the convergence of the midpoint interpolant, see Lemma B.1, and convergence of $\bar u_h$ towards $\bar u$, see Theorem 5.5. Hence, with the coercivity of $j_h$ proved in Lemma 5.4 and the mean value theorem we conclude For the latter term we exploit the discrete optimality condition and the fact that the continuous optimality condition holds even pointwise. This implies the inequality Insertion into the estimate above implies The right-hand side can be decomposed into two parts. With appropriate intermediate functions we obtain for the latter one Moreover, we apply the triangle inequality and the estimates from Lemmata 5.7 and 5.9 to deduce $\le c\,h^{2-2/q}\,|\ln h|$.
Analogously, one can derive an estimate for the term $\|\bar p - Z_h(R_h \bar u)\|_{L^2(\Gamma)}$. Moreover, we apply Lemmata 3.3 and 5.1 to bound the norms of $\bar p = Z(\bar u)$ and $S_h(R_h \bar u)$, respectively. All together we obtain the estimate Next we discuss that part of (52) which involves the term $R_h(\bar y\,\bar p) - \bar y\,\bar p$ in the first argument.
With an application of the local estimate from Lemma B.3 we obtain With [18, Theorem 1.4.4.2] and a trace theorem we conclude Insertion of the estimates (53) and (54) into (52), and dividing the resulting estimate by $\|R_h \bar u - \bar u_h\|_{L^2(\Gamma)}$, leads to the desired result.
Now we are in the position to state the main result of this section.
Theorem 5.11. Let $(\bar y, \bar u, \bar p)$ be a local solution of (11) satisfying assumption (38). Moreover, let $\{\bar u_h\}_{h>0}$ be a sequence of local solutions of (24) such that for sufficiently small $\varepsilon, h_0 > 0$ the property $\|\bar u - \bar u_h\|_{L^2(\Gamma)} < \varepsilon$ for all $h < h_0$ holds. Then, the error estimate
Proof. With the projection formulas (12) and (36), respectively, the non-expansivity of the operator $\Pi_{ad}$ and the triangle inequality we obtain The assertion follows after insertion of (37) together with the estimates obtained in Lemmata 5.7, 5.9 and 5.10, as well as the stability estimates of $Z$ and $S_h$ from Lemmata 2.3 and 5.1, respectively.
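The non-expansivity of $\Pi_{ad}$ used in the proof can be checked numerically with a few lines of Python. The sketch assumes that $\Pi_{ad}$ acts as the pointwise projection onto the interval $[u_a, u_b]$, which is the usual meaning for box constraints; the argument to which the projection is applied in (12) and (36) is not reproduced here. The function name `project_admissible` and the random test vectors are hypothetical and serve only as an illustration.

```python
import numpy as np


def project_admissible(v, u_a=0.0, u_b=np.inf):
    """Pointwise projection of the values v onto [u_a, u_b] (assumed form of Pi_ad)."""
    return np.clip(v, u_a, u_b)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    v = rng.normal(size=1000)
    w = rng.normal(size=1000)
    # distance after projection versus distance before projection
    projected = np.linalg.norm(project_admissible(v) - project_admissible(w))
    original = np.linalg.norm(v - w)
    print(projected <= original)
```

Since clipping is non-expansive in every component, the discrete Euclidean distance of two projected vectors never exceeds the distance of the original vectors; this is the discrete counterpart of the property that allows the error of the postprocessed control to be bounded by errors of the state and adjoint state.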

Numerical experiments
It is the purpose of this last section to confirm the theoretical results by numerical experiments. To this end, we reformulate the discrete optimality condition (22) and use the equivalent projection formula Here, $R_h^{Simp} : C(\Gamma) \to U_h$ is a projection operator based on the Simpson rule, that is, where $x_{E,1}$ and $x_{E,2}$ are the endpoints of the boundary edge $E \in \mathcal E_h$ and $x_E$ its midpoint. The numerical solution of (55) is computed by a semismooth Newton method.
The input data of the considered benchmark problem is chosen as follows. The computational domain is the unit square $\Omega := (0, 1)^2$. We define the exact Robin parameter $\tilde u$ by $\tilde u(x_1, x_2) := \max(-0.01,\ 1 - 30(x_1 - 0.5)^2)$ if $x_1 = 0$, and $\tilde u(x_1, x_2) := -0.01$ otherwise, and use the desired state $y_d = S_h(\tilde u)$ and the right-hand side $f \equiv 0$. Moreover, the regularization parameter $\alpha = 10^{-2}$ and the control bounds $u_a = 0$, $u_b = \infty$ are used. We compute the numerical solution of our benchmark problem on a sequence of meshes starting with $\mathcal T_{h_0}$, $h_0 = \sqrt 2$, consisting of two right-angled triangles only. The remaining grids $\mathcal T_{h_i}$, $i = 1, 2, \ldots$, are obtained by a double bisection through the longest edge of each element applied to the previous mesh. This guarantees $h_i = \tfrac12 h_{i-1}$. In order to compute the discretization error we use the solution on the mesh $\mathcal T_{h_{11}}$ as an approximation of the exact solution, this means, $\|\bar u - \bar u_{h_i}\|_{L^2(\Gamma)} \approx \|\bar u_{h_{11}} - \bar u_{h_i}\|_{L^2(\Gamma)}$, $i = 0, 1, \ldots, 10$.
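Because the mesh size is halved from one level to the next, experimental convergence rates follow from the base-2 logarithm of the quotient of consecutive errors. The following Python sketch shows this computation; the function names `mesh_sizes` and `experimental_rates` are hypothetical, and the error values in the usage part are synthetic numbers that behave like $c\,h$ and $c\,h^2$, not the measured errors of the benchmark problem.

```python
import math


def mesh_sizes(h0=math.sqrt(2.0), levels=11):
    """Mesh sizes h_0, ..., h_{levels-1} with h_i = h_{i-1} / 2."""
    return [h0 * 0.5 ** i for i in range(levels)]


def experimental_rates(errors):
    """Experimental convergence rates between consecutive meshes.

    Assumes that the mesh size is halved from one entry to the next, so the
    rate between two levels is log2(e_coarse / e_fine).
    """
    return [math.log2(coarse / fine) for coarse, fine in zip(errors, errors[1:])]


if __name__ == "__main__":
    hs = mesh_sizes()
    first_order = [3.0 * h for h in hs]         # synthetic errors of order h
    second_order = [0.5 * h ** 2 for h in hs]   # synthetic errors of order h^2
    print(experimental_rates(first_order))
    print(experimental_rates(second_order))
```

In the setting described above the list `errors` would contain the approximated errors for $i = 0, \ldots, 10$, with the solution on the finest mesh serving as the reference.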
Analogously, we compute the error for the approximation obtained by the postprocessing strategy. However, in this case the exact solution is approximated by $\bar u \approx \Pi_{ad}(\ldots)$.
The optimal control and corresponding state of our benchmark problem are illustrated in Figure 1, and the measured discretization errors as well as the experimentally computed convergence rates are summarized in Table 1. As we have proven in Theorem 5.6, the numerical solutions obtained by a full discretization using a piecewise constant control approximation converge with the optimal convergence rate 1. Moreover, it is confirmed that the solution obtained with a postprocessing step, see Theorem 5.11, converges with order 2. Note that we actually proved the results for the case that the boundary is smooth, which is not the case in our example. However, the corner singularities contained in the solution are comparatively mild for a 90° corner, so that the regularity results from Lemma 3.10 remain valid.

A. Proof of Theorem 4.3
The proof of the maximum norm estimate presented in Theorem 4.3 follows basically from the arguments of [16,34]. For the convenience of the reader we repeat the proof, as the result of Theorem 4.3 is, for our specific situation, not directly available in the literature. The novelty of the present proof is that it includes curved elements as well as Robin boundary conditions. In the aforementioned articles, a representation of the error term based on a regularized Dirac function is used. This function forms the right-hand side of a dual problem whose solution is an approximation of Green's function. The main difficulty is to bound this solution in appropriate norms.
To this end, we denote by $T^* \in \mathcal T_h$ the element where $|y - y_h|$ attains its maximum. The regularized Dirac function is defined by $\delta_h(x) := |T^*|^{-1}\,\mathrm{sgn}(y(x) - y_h(x))$ if $x \in T^*$, and $\delta_h(x) := 0$ if $x \notin T^*$. The corresponding Green's function denoted by $g_h$ solves the problem The Dirac function satisfies the properties We start our considerations with some a priori estimates for the solution $g_h$.
Lemma A.1. The following a priori estimates hold:
Proof. (ii) To show the estimate in the $H^2(\Omega)$-norm we apply the a priori estimate from Lemma 2.3c) and $\|\delta_h\|_{L^2(\Omega)} \le c\,h^{-1}$. (i) The weak form of (56) and the property (57) imply where the discrete Sobolev inequality was applied in the last step. The function $g_h^h \in V_h$ is the Ritz projection of $g_h$ and satisfies the usual stability estimate Next, we derive a suboptimal error estimate for the finite element error in the $L^\infty(\Omega)$-norm. Using an inverse inequality, estimates for the interpolant $\Pi_h$ from (17), the Aubin-Nitsche trick and the a priori estimate shown already in the $H^2(\Omega)$-norm we deduce Note that we hide the dependency on $u$, or more precisely on $\|u\|_{H^{1/2}(\Gamma)}$ and lower-order norms, in the generic constant to simplify the notation. Insertion of (59) and (60) into (58) yields with Young's inequality The desired estimate follows from a kick-back argument.
Next, we show an a priori estimate for $g_h$ in a weighted norm. This is the key idea which allows us to bound second derivatives by a logarithmic factor only. The weight function we will use is defined by The first property follows from a simple computation. The second estimate is a consequence of the multiplicative trace theorem, see [6, Theorem 1.6.6], and the chain rule $\nabla \sigma^{-1} = -\sigma^{-2} \nabla \sigma$, and it is simple to show $\|\nabla \sigma\|_{L^\infty(\Omega)} \le 1$.
Lemma A.2. Assume that $u \in H^{1/2}(\Gamma)$. There holds the estimate
Proof. We introduce the functions $\xi_i$ With the reverse product rule we obtain Moreover, we easily confirm that $\xi_i g_h$ is the solution of the problem where $n_i$ is the $i$-th component of the outer unit normal vector on $\Gamma$. Lemma 2.3c) using the property $\|\xi_i \delta_h\|_{L^2(\Omega)} \le c$, which follows from a simple computation, leads to Insertion into (63) and using Lemma A.1(i) leads to An estimate for the second term on the right-hand side of (62) is derived in Lemma A.1(ii).
Next, we derive some error estimates for the approximation g h h in several norms.
Lemma A.3. Assume that $u \in H^{1/2}(\Gamma)$. Then, there hold the error estimates
Proof. The first estimate follows directly from the $H^1(\Omega)$-error estimate stated in Lemma 4.1 and the Aubin-Nitsche trick. Moreover, the a priori estimate for the $H^2(\Omega)$-norm of $g_h$ from Lemma A.1 has to be exploited.
In the second estimate one observes that the second derivatives of the discrete function $g_h^h$ would vanish except on curved elements (note that $g_h^h$ is affine on the reference element only, but not on $T$). With the transformation result [5, Lemma 2.3] we obtain where , which holds due to the assumed shape-regularity, and $|\sigma(x)| \le c$ for all $x \in \Omega$, we obtain The first term has been discussed in the previous lemma and the last term has been considered in the present lemma already.
Now we are in the position to prove Theorem 4.3.
Proof. With an inverse inequality and the Hölder inequality, the definition of $\delta_h$ and a maximum norm estimate for the interpolant $\Pi_h$, see e.g. [5, Theorem 4.1], we obtain where $g_h^h \in V_h$ denotes the Ritz projection of $g_h$. For the latter part on the right-hand side of (65) we get with the Galerkin orthogonality, the Hölder inequality, a trace theorem for the boundary integral term as well as $u$ An estimate for the interpolation error is deduced in [5]. The $L^1(\Omega)$-norms can be replaced by weighted $L^2(\Omega)$-norms involving the weighting function $\sigma$. Taking into account the properties (61) we obtain In the following we will show that the expressions on the right-hand side of (67) are bounded by $c\,h\,|\ln h|^{1/2}$. Therefore, we apply the reverse product rule and get From this we conclude Here, we exploited that $(u\,\sigma^2 (g_h - g_h^h),\, g_h - g_h^h)_{L^2(\Gamma)} \ge 0$ due to $u \ge u_a \ge 0$. Next, we introduce the abbreviation $z := \sigma^2 (g_h - g_h^h)$. The Galerkin orthogonality of $g_h - g_h^h$, Young's inequality and the trace theorem taking into account $|\sigma|$ and thus,
Next, we derive local interpolation error estimates. In the following we use the notation $\underline\sigma_T := \inf_{x \in T} \sigma(x)$ and $\overline\sigma_T := \sup_{x \in T} \sigma(x)$. Due to the assumed shape-regularity there holds $\underline\sigma_T \sim \overline\sigma_T$ for all $T \in \mathcal T_h$, and hence, where the patch of all elements adjacent to $T$ is involved (note that $\Pi_h$ is a quasi-interpolant). The Leibniz rule and the properties Next, we combine the two estimates above and take into account the properties $h\,\underline\sigma_T^{-1} \le c$ and $\underline\sigma_T \sim \overline\sigma_T$ which follow from the assumed quasi-uniformity. Summation over all $T \in \mathcal T_h$ and an application of Lemma A.3 yields It remains to discuss the third term on the right-hand side of (69). With interpolation error estimates for $\Pi_h$ on the boundary, compare also (18), and $u \in L^\infty(\Gamma)$ we obtain where we exploited the product rule and the property $|\nabla \sigma^2| \le 2\sigma$ in the last step. With a trace theorem and Lemma A.3 we conclude and with a multiplicative trace theorem, Young's inequality, the product rule and the estimates from Lemma A.3 we obtain $\|\sigma \nabla(g_h - g_h^h)\|_{L^2(\Gamma)} \le c\,\|\sigma \nabla(g_h - g_h^h)\|_{L^2(\Omega)} + \|\nabla(\sigma \nabla(g_h - g_h^h))\|_{L^2_{pw}(\Omega)}$ The estimate (71) then simplifies to Insertion of (70) and (72) into (69) leads to the estimate $a(\sigma^2 (g_h - g_h^h),\, g_h - g_h^h) \le$ It remains to show an estimate for the second term on the right-hand side of (68). Due to $|\nabla \sigma^2| \le 2\sigma$, Young's inequality and the $L^2(\Omega)$-error estimate from Lemma A.3 we get Insertion of (73) and (74) into (68) yields and with a kick-back argument we conclude $\Theta^2 \le c\,h^2\,|\ln h|$. Finally, we collect the previous estimates. To this end, we insert (75) into (67), the resulting estimate into (66) and this into (65).

B. Local estimates for the midpoint interpolant and the L 2 (Γ)-projection
To the best of the author's knowledge there are no error estimates for the midpoint interpolant defined on a curved boundary available in the literature. Thus, we prove the following lemmata which are needed in the proof of Lemma 5.9. Consider a single boundary element $E \subset T$ with corresponding element $T \in \mathcal T_h$. A parametrization of the boundary element is given by $E := \{\gamma_E(\xi) := F_T(\xi, 0),\ \xi \in (0, 1)\}$ when assuming that the edge of the reference element $\hat T$ with endpoints $(0, 0)$, $(1, 0)$ is mapped onto $E$. In the following we denote the length of a boundary element $E \in \mathcal E_h$ by $L_E = \int_0^1 |\dot\gamma_E(\xi)|\,d\xi$.
Lemma B.1. For each function $u : \Gamma \to \mathbb R$ there exists some piecewise constant function $R_h u \in U_h$ satisfying the local estimates for all $E \in \mathcal E_h$, provided that $u$ possesses the regularity demanded by the right-hand side.
Proof. Let us first construct a suitable interpolation operator. To obtain the desired second-order accuracy we have to guarantee that the property $\int_E p = \int_E R_h p$ holds for all functions $p(\gamma_E(\xi)) = \hat p(\xi)$ with some first-order polynomial $\hat p(\xi) := a + b\,\xi$. The transformation to $\hat E := (0, 1) \times \{0\}$ yields (76) Note that the last step is valid for the spectral norm only. An application of Lemma 2.2 from [5], which provides $\sup_{\hat x \in \hat T} \|DF_T(\hat x)^{-1}\| \le c\,h^{-1}$, leads to the first estimate.
The second estimate follows with similar arguments. For an arbitrary constant $p$ we then obtain
A further operator that is needed in Section 3 is the $L^2(\Gamma)$-projection onto $U_h$. In case of curved boundaries, this operator reads for each $E \in \mathcal E_h$. Note that this definition implies the orthogonality property With similar arguments as in the previous lemma we obtain the following local estimate, which is standard in case of a boundary consisting of straight edges.
Lemma B.2. Assume that $u \in H^1(\Gamma)$. Then the estimate is fulfilled for all $E \in \mathcal E_h$.
We conclude this section with an estimate for an expression which is needed in Lemma 5.9.
Lemma B.3. Assume that the functions $u$ and $v$ belong to $H^1(\Gamma)$. Then the inequality is valid.
Proof. First, we split the term under consideration using the $L^2(\Gamma)$-projection onto $U_h$ and obtain The first term on the right-hand side can be treated with the local estimate from Lemma B.2 which yields For the second term we exploit the definition of $Q_h$ and $R_h$ on the reference element. For each $E \in \mathcal E_h$ we then obtain and conclude the assertion.

see [5, Theorem 4.1], [32, Theorem 4.1]. Due to the special choice of the patches $\sigma_i$ for the boundary nodes we get similar interpolation error estimates on the boundary, that is,

Figure 1: Optimal state (surface) and the optimal control (boundary curve) for the benchmark problem.

$\int_0^1 |\dot\gamma_E(\xi)|\,d\xi$. Lemma B.1. For each function $u : \Gamma \to \mathbb R$ there exists some piecewise constant function $R_h u \in U_h$ satisfying the local estimates

Table 1: Experimentally computed errors for the full discretization and the postprocessing approach with the corresponding experimental convergence rates (in parentheses).