1 Introduction

This paper is concerned with bilinear boundary control problems of the form

$$J(y,u):=\frac{1}{2} \Vert y-y_d\Vert _{L^2(\varOmega )}^2 + \frac{\alpha }{2}\Vert u\Vert _{L^2(\varGamma )}^2 \rightarrow \min !$$

subject to

$$\begin{aligned}-\varDelta y + y&= f\quad {\text{ in }}\quad \varOmega ,\\ \partial _n y + u\,y&= g\quad{\text{ on }}\quad \varGamma,\\ u\in U_{ad}&:=\{v\in L^2(\varGamma ):u_a \le v \le u_b\quad {\text{a.e.\,on}}\quad \varGamma \}, \end{aligned}$$

where \(\varOmega \subset {{\mathbb {R}}}^n\), \(n\in \{2,3\}\), is a bounded domain, \(\alpha >0\) is the regularization parameter, \(y_d\in L^2(\varOmega )\) is a desired state and \(0 \le u_a < u_b\) are the control bounds.

As an application of bilinear boundary control problems we mention the identification of an unknown Robin coefficient from a given measurement \(y_d\) of the state quantity. This is for instance of interest in the modeling of stem cell division processes [16, 17], where u is the unknown parameter describing the chemical reactions between proteins from the cell interior and the cell cortex. For further applications, u can be interpreted as a heat-exchange coefficient in thermodynamics or as a quantity for corrosion damage in electrostatics. There are many publications dealing with the identification of the Robin coefficient, see for instance [12, 23, 31, 34]. Only a few papers use an optimal control approach similar to the one considered in the present article. We mention [22, 25], where the parabolic version of our model problem is considered. The authors prove convergence of a finite element approximation but no convergence rate is established. A similar problem is discussed in [21], dealing with the recovery of the Robin parameter in a variational inequality.

The aim of the present paper is to derive necessary and sufficient optimality conditions for the optimal control problem and to investigate several numerical approximations regarding convergence towards a local solution. This complements a previous contribution of Kröner and Vexler [27] where the distributed control case, meaning that the bilinear term \(u\,y\) appears in the differential equation, is discussed. The main results in their article are error estimates for the approximate controls in the \(L^2(\varOmega )\)-norm for several finite element approximations. To be more precise, the convergence rate 1 is shown for piecewise constant and 3/2 for piecewise linear approximations for the control. Moreover, advanced discretization concepts like the postprocessing approach [32] and the variational discretization [24] are investigated which allow an improvement up to a convergence rate of 2. It is the purpose of the present article to extend the results to the case of bilinear boundary control.

The numerical analysis of boundary control problems is usually more difficult than for distributed control problems as the adjoint control-to-state operator maps onto some Sobolev/Lebesgue space defined on the boundary. As a consequence, error estimates for the traces of finite element solutions have to be proved, more precisely, in the \(L^2(\varGamma )\)-norm. Here, we consider two different discretization approaches. The first one is a full discretization using piecewise linear finite elements for the states and piecewise constant functions on the boundary for the control approximation. Under the assumption that the domain has a Lipschitz boundary we show that the discrete optimal control converges with the optimal rate 1. To show this result we exploit the local coercivity of the objective, best-approximation properties of the control space and suboptimal error estimates for the state and adjoint equation. In order to obtain a more accurate solution we also investigate the postprocessing approach where an improved control is computed by a pointwise application of the first-order optimality condition to the discrete state variables. For this approach we have to assume more regularity for the exact solution and thus, we restrict our considerations to two-dimensional domains with sufficiently smooth boundary. Under this assumption we show the optimal convergence rate of \(2-\varepsilon\) with arbitrary \(\varepsilon >0\) which is the rate one would also expect in the case of linear quadratic boundary control problems and smooth solutions [3, 4, 33] (even with \(h^{-\varepsilon }\) replaced by \(|\ln h|\), where h is the maximal element diameter of the finite element mesh). The proof relies on the non-expansivity of the projection onto the feasible set as well as sharp error estimates for the state and adjoint state in \(L^2(\varGamma )\). To obtain estimates in these norms superconvergence properties of the midpoint interpolant, finite element error estimates for the Ritz projection in \(L^2(\varGamma )\) and a supercloseness result between the midpoint interpolant of the exact and the discrete solution are exploited. To show the \(L^2(\varGamma )\)-norm error estimate we will, as we consider smooth solutions, derive a maximum norm estimate. To the best of the author’s knowledge these results are not available in the literature for problems with Robin boundary conditions. Based on the ideas from [18] we formulate the missing proof.

We moreover note that the setting discussed here does not fit into the well-known framework of semilinear optimal control problems discussed e. g. in [5, 9, 11, 29], as these contributions deal with nonlinearities depending solely on the state variable. However, many techniques can be reused for the problem considered here. To the best of the author’s knowledge, the only publication treating more general nonlinearities that depend on both the state and the control variable is [35]. Therein, optimality conditions are discussed, but no numerical analysis of approximation methods is available for this problem class yet. However, we think that the consideration of bilinear control problems may serve as a starting point for the investigation of a more general class of nonlinear optimal control problems.

The article is structured as follows. In Sect. 2 we discuss the solubility of the state equation and regularity results for its solution. In Sect. 3 we analyze the optimal control problem. In particular, necessary and sufficient optimality conditions are investigated. Section 4 is devoted to the finite element discretization of the state equation, where we show finite element error estimates required for the numerical analysis of the optimal control problem later. The discretization of the optimal control problem is considered in Sect. 5. In particular, we discuss convergence rates for the numerical solution obtained by a full discretization of the optimal control problem as well as for an improved control obtained by a postprocessing step. The latter result requires some auxiliary results that we discuss in the appendix. To be more precise, a maximum norm error estimate for the finite element solution of an elliptic equation with Robin boundary conditions is needed. A proof is given in Appendix 1. Moreover, a proof of local error estimates for the midpoint interpolant and the \(L^2(\varGamma )\) projection onto piecewise constant functions on the boundary is needed. To the best of the author’s knowledge these results are not available in the literature in case of domains with curved boundaries. Thus, we discuss these auxiliary results in Appendix 2. Finally, we will compare the theoretical results with numerical experiments in Sect. 6.

2 Analysis of the state equation

We consider the boundary value problem

$$-\varDelta y + y = f\quad {\text{ in }}\quad \varOmega ,\qquad \partial _n y + u\,y=g\quad \text{ on }\quad \varGamma ,$$

on a bounded Lipschitz domain \(\varOmega \subset {{\mathbb {R}}}^n\), \(n\in \{2,3\}\), with data \(f\in L^2(\varOmega )\) and \(g\in L^2(\varGamma )\). The corresponding weak formulation reads

$${\text {Find}}\ y\in H^1(\varOmega ):\qquad a_u(y,v) = F(v)\qquad \forall v\in H^1(\varOmega ),$$
(1)

with

$$\begin{aligned} a_u(y,v)&:=(\nabla y,\nabla v)_{L^2(\varOmega )} + (y,v)_{L^2(\varOmega )} + (u\,y,v)_{L^2(\varGamma )},\\ F(v)&:= (f,v)_{L^2(\varOmega )} + (g,v)_{L^2(\varGamma )}. \end{aligned}$$

First, we show an existence and uniqueness result for (1). To this end, we introduce a decomposition of the control into positive and negative parts \(u^+,u^-\in L_+^2(\varGamma ):=\{v\in L^2(\varGamma ):v\ge 0 \text{ a. } \text{ e. } \text{ on }\ \varGamma \}\) such that \(u=u^+ - u^-\), e. g. \(u^+:=\max (u,0)\) and \(u^-:=\max (-u,0)\). The following result then relies on the Lax–Milgram lemma. However, an assumption on the coefficient u is required.

Lemma 1

Assume that \(u\in L^2(\varGamma )\) satisfies

$$\Vert u^-\Vert _{L^2(\varGamma )} < \frac{1}{c_*^2}$$
(2)

with the constant \(c_*\) from the trace estimate \(\Vert v\Vert _{L^4(\varGamma )} \le c_*\,\Vert v\Vert _{H^1(\varOmega )}\). Then, (1) possesses a unique solution \(y\in H^1(\varOmega )\) which satisfies the a priori estimate

$$\Vert y\Vert _{H^1(\varOmega )}\le \frac{1}{\gamma _u}\,\left( \Vert f\Vert _{H^1(\varOmega )^*} + \Vert g\Vert _{H^{-1/2}(\varGamma )}\right)$$

with \(\gamma _u := 1-c_*^2\,\Vert u^-\Vert _{L^2(\varGamma )}>0\).

Proof

The boundedness of \(a_u\) follows directly from the Cauchy–Schwarz inequality and the continuity of the trace operator \(\tau :H^1(\varOmega )\rightarrow L^4(\varGamma )\). This implies

$$\begin{aligned} a_u(y,z)&\le \Vert y\Vert _{H^1(\varOmega )}\,\Vert z\Vert _{H^1(\varOmega )} + \Vert u\Vert _{L^2(\varGamma )}\,\Vert y\Vert _{L^4(\varGamma )}\, \Vert z\Vert _{L^4(\varGamma )} \\&\le \left( 1+c_*^2\,\Vert u\Vert _{L^2(\varGamma )}\right) \,\Vert y\Vert _{H^1(\varOmega )}\, \Vert z\Vert _{H^1(\varOmega )}. \end{aligned}$$

To show the coercivity we take into account the decomposition \(u=u^+ - u^-\) to get

$$a_u(y,y)\ge \Vert y\Vert _{H^1(\varOmega )}^2 - \int _\varGamma u^-\,y^2 \ge \left( 1-c_*^2\,\Vert u^-\Vert _{L^2(\varGamma )}\right) \,\Vert y\Vert _{H^1(\varOmega )}^2.$$

Here, assumption (2) ensures the coercivity. An application of the Lax–Milgram lemma leads to the desired result. \(\square\)

Note that \(\{v\in L^2(\varGamma ):\Vert v^-\Vert _{L^2(\varGamma )} < c_*^{-2}\}\) is an open subset of \(L^2(\varGamma )\). This is the key idea which allows us to avoid the two-norm discrepancy for the optimal control problem, as we will see that the reduced objective functional is differentiable with respect to the \(L^2(\varGamma )\)-topology. In the following we will hide the dependency of the estimates on \(\Vert u^-\Vert _{L^2(\varGamma )}\) and thus \(\gamma _u\) in the generic constant, as we impose positive control bounds in the considered optimal control problem.

Later, we will frequently make use of the following Lipschitz estimate.

Lemma 2

If \(u_1,u_2\in L^2(\varGamma )\) satisfy the assumption (2), the corresponding states \(y_1,y_2\in H^1(\varOmega )\) solving

$$a_{u_i}(y_i,v) = (f_i,v)_{L^2(\varOmega)} + (g_i,v)_{L^2(\varGamma)} \quad \forall v\in H^1(\varOmega ),\ i=1,2,$$

fulfill the estimate

$$\begin{aligned} \Vert y_1-y_2\Vert _{H^1(\varOmega )}&\le c\,\big (\Vert u_1-u_2\Vert _{L^2(\varGamma )}\,\Vert y_2\Vert _{H^1(\varOmega )} \\&\quad + \Vert f_1-f_2\Vert _{H^1(\varOmega )^*} + \Vert g_1-g_2\Vert _{H^{-1/2}(\varGamma )}\big ). \end{aligned}$$

Proof

Subtracting the variational formulations for \(y_1\) and \(y_2\) from each other leads to

$$\begin{aligned}&(\nabla (y_1-y_2),\nabla v)_{L^2(\varOmega )} + (y_1-y_2,v)_{L^2(\varOmega )} + (u_1\,(y_1-y_2),v)_{L^2(\varGamma )} \\&\quad = (f_1-f_2,v)_{L^2(\varOmega )} + (g_1-g_2,v)_{L^2(\varGamma )} + ((u_2-u_1)\,y_2,v)_{L^2(\varGamma )}. \end{aligned}$$

The result follows from Lemma 1 and the continuity of the product mapping from \(L^2(\varGamma )\times H^{1/2}(\varGamma )\) to \(H^{-1/2}(\varGamma )\), see [20, Theorem 1.4.4.2]. \(\square\)

In the following lemma we collect some regularity results for the solution of (1).

Lemma 3

Let \(\varOmega \subset {{\mathbb {R}}}^n\), \(n\in \{2,3\}\), be a bounded Lipschitz domain. By \(y\in H^1(\varOmega )\) we denote the solution of (1). The following a priori estimates are valid, under the assumption that the input data possess the regularity demanded by the right-hand side:

(a):

If \(r>2n/(1+n)\) and \(p > 2\) for \(n=2\) and \(p\ge 4\) for \(n=3\), then

$$\Vert y\Vert _{H^{3/2}(\varOmega )} + \Vert y\Vert _{H^1(\varGamma )}\le c\left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) \left( \Vert f\Vert _{L^{r}(\varOmega )} + \Vert g\Vert _{L^2(\varGamma )} \right) .$$
(b):

If \(r> n/2\), \(s > n-1\), and \(p\ge 2\) for \(n=2\) and \(p > 8/3\) for \(n=3\), then

$$\Vert y\Vert _{C({\overline{\varOmega }})}\le c \left( 1+\Vert u\Vert _{L^{p}(\varGamma )}\right) ^2\left( \Vert f\Vert _{L^{r}(\varOmega )} + \Vert g\Vert _{L^{s}(\varGamma )}\right) .$$
(c):

Furthermore, if \(\varOmega\) is a convex polygonal/polyhedral domain, or possesses a boundary which is of class \(C^{1,1}\), there holds

$$\begin{aligned} \Vert y\Vert _{H^2(\varOmega )}&\le c\left( 1+\Vert u\Vert _{H^{1/2}(\varGamma )}\right) ^2\,\left( \Vert f\Vert _{L^2(\varOmega )} + \Vert g\Vert _{H^{1/2}(\varGamma )}\right) . \end{aligned}$$

Proof

(a)

In [15, Theorem 1.12] it is shown that the problem

$$\begin{aligned} -\varDelta y = F\ \text{ in }\ \varOmega ,\qquad \partial _n y = G \ \text{ on }\ \varGamma \end{aligned}$$

possesses a solution in \(H^{3/2}(\varOmega )\) provided that \(F\in H^{s-2}(\varOmega )\) for some \(s\in (3/2,2]\) and \(G\in L^2(\varGamma )\), as well as \(\int _\varOmega F + \int _\varGamma G = 0\). The solubility condition is satisfied in our situation with \(F=f-y\) and \(G=g-u\,y\) and becomes clear when testing (1) with \(v\equiv 1\). The regularity required for F follows from the embedding \(f\in L^r(\varOmega )\hookrightarrow H^{-1/2+\varepsilon }(\varOmega )\) for sufficiently small \(\varepsilon >0\). Moreover, the Hölder inequality and the continuity of the trace operator \(\tau :H^1(\varOmega )\rightarrow L^q(\varGamma )\) for \(q<\infty\) (\(n=2\)) or \(q\le 4\) (\(n=3\)) imply \(\Vert u\,y\Vert _{L^2(\varGamma )} \le c\,\Vert u\Vert _{L^p(\varGamma )}\,\Vert y\Vert _{H^1(\varOmega )}\), from which we conclude \(G\in L^2(\varGamma )\). From [15, Theorem 1.12] and Lemma 1 we then obtain

$$\begin{aligned} \Vert y\Vert _{H^{3/2}(\varOmega )}&\le c \left( \Vert F\Vert _{L^r(\varOmega )} + \Vert G\Vert _{L^2(\varGamma )} + \left| \int _\varOmega y(x)\mathrm {d}x\right| \right) \nonumber \\&\le c \left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) \left( \Vert f\Vert _{L^r(\varOmega )} + \Vert g\Vert _{L^2(\varGamma )}\right) . \end{aligned}$$
(3)

It remains to show the \(H^1(\varGamma )\)-norm estimate. We split the solution into the parts \(y_f\) and \(y_g\) solving

$$\begin{aligned} -\varDelta y_f + y_f&= f&\qquad -\varDelta y_g + y_g&= 0&\text{ in }\ \varOmega ,\\ \partial _n y_f&= 0&\partial _n y_g&= g - u y&\qquad \text{ on }\ \varGamma . \end{aligned}$$

Using [19, Theorem 5.4] we directly deduce

$$\begin{aligned} \Vert y_g\Vert _{H^1(\varGamma )} \le c\,\Vert g-u\,y\Vert _{L^2(\varGamma )} \le c \left( \Vert g\Vert _{L^2(\varGamma )} + \Vert u\Vert _{L^p(\varGamma )}\,\Vert y\Vert _{H^1(\varOmega )}\right) \end{aligned}$$

and Lemma 1 leads to the desired estimate for \(y_g\). For the function \(y_f\), we get the desired estimate by an application of a trace theorem and the a priori estimate (3) which can in case of \(g\equiv 0\) be improved to

$$\begin{aligned} \Vert y_f\Vert _{H^1(\varGamma )} \le c\,\Vert y_f\Vert _{H^{3/2+\varepsilon }(\varOmega )} \le c\,\Vert f\Vert _{L^r(\varOmega )}, \end{aligned}$$

provided that \(\varepsilon >0\) is sufficiently small. The validity of the second step can be confirmed by means of [15, Theorem 1.12] and [14, Theorem 23.3]. The decomposition \(y=y_f+y_g\) and the estimates shown above imply the desired estimate in the \(H^1(\varGamma )\)-norm.

(b) We prove the result for the case \(n=3\). The two-dimensional case follows from the same arguments. From [8, Theorem 3.1] it is known that the solution of (1) belongs to \(C({\overline{\varOmega }})\) if \(f\in L^r(\varOmega )\), \(r>n/2\), and \(g-uy\in L^s(\varGamma )\), \(s>n-1\). The latter assumption can be concluded from the Hölder inequality, a Sobolev embedding and a trace theorem, which implies

$$\begin{aligned} \Vert u\,y\Vert _{L^s(\varGamma )} \le c\,\Vert u\Vert _{L^p(\varGamma )}\,\Vert y\Vert _{L^8(\varGamma )}\le c\,\Vert u\Vert _{L^p(\varGamma )}\,\Vert y\Vert _{H^{5/4+\varepsilon }(\varOmega )} \end{aligned}$$

for \(1/p+1/8=1/(2+\varepsilon )\). A simple computation shows that \(p>8/3\) and \(s=2+\varepsilon\) with \(\varepsilon >0\) sufficiently small guarantee the validity of the previous steps. It remains to show \(y\in H^{5/4+\varepsilon }(\varOmega )\). This can be deduced from [14, Theorem 23.3] where the a priori estimate

$$\begin{aligned} \Vert y\Vert _{H^{5/4+\varepsilon }(\varOmega )} \le c \left( \Vert f\Vert _{H^{3/4-\varepsilon }(\varOmega )^*} + \Vert g - u\,y\Vert _{H^{-1/4+\varepsilon }(\varGamma )}\right) \end{aligned}$$
(4)

is stated. The regularity demanded by the right-hand side of (4) is confirmed with the embeddings \(f\in L^r(\varOmega )\hookrightarrow H^{3/4-\varepsilon }(\varOmega )^*\) and \(g\in L^s(\varGamma )\hookrightarrow H^{-1/4+\varepsilon }(\varGamma )\). Moreover, there holds \(\Vert u\,y\Vert _{H^{-1/4+\varepsilon }(\varGamma )} \le c\,\Vert u\Vert _{L^p(\varGamma )}\,\Vert y\Vert _{L^4(\varGamma )}\), see [20, Theorem 1.4.4.2]. Collecting up the arguments above leads to

$$\begin{aligned} \Vert y\Vert _{C({\overline{\varOmega }})}&\le c \left( \Vert f\Vert _{L^r(\varOmega )} + \Vert g\Vert _{L^s(\varGamma )} + \Vert u\Vert _{L^p(\varGamma )} \Vert y\Vert _{H^{5/4+\varepsilon }(\varOmega )}\right) \\&\le c \left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) ^2\left( \Vert f\Vert _{L^r(\varOmega )} + \Vert g\Vert _{L^s(\varGamma )} + \Vert y\Vert _{H^1(\varOmega )}\right) \end{aligned}$$

and the assertion follows after insertion of the a priori estimate from Lemma 1.

(c) With an embedding we deduce from the assumption that \(u\in L^4(\varGamma )\). Hence, (4) is applicable which implies \(y\in H^{3/4}(\varGamma )\) and thus, \(u\,y\in H^{1/2}(\varGamma )\), see [20, Theorem 1.4.4.2]. The \(H^2(\varOmega )\)-regularity of y then follows from a shift theorem applied to the equation with boundary conditions \(\partial _n y = g - u y\in H^{1/2}(\varGamma )\) on \(\varGamma\), see [20, Theorem 2.4.2.7] (for domains with smooth boundary) or [20, Theorem 4.4.3.8] (for convex polygonal domains). \(\square\)

3 The optimal control problem

Due to the well-posedness of the state equation we may introduce the control-to-state operator \(S:U_{ad}\rightarrow H^1(\varOmega )\) defined by \(S(u):=y\), with y solving (1). This allows us to reformulate the optimal control problem introduced in Sect. 1 and we arrive at

$$\begin{aligned} j(u):=\frac{1}{2}\Vert S(u)-y_d\Vert _{L^2(\varOmega )}^2 + \frac{\alpha }{2}\Vert u\Vert _{L^2(\varGamma )}^2 \rightarrow \min ! \end{aligned}$$
(5)

subject to \(u\in U_{ad}:=\{v\in L^2(\varGamma ):u_a\le v\le u_b\ {\text {a. e. on}}\ \varGamma \}\). Here, \(\alpha >0\) is the regularization parameter, \(y_d\in L^2(\varOmega )\) the desired state and \(0< u_a < u_b\) the control bounds. Our aim is to derive necessary and sufficient optimality conditions as well as regularity results for local solutions. Note that the operator S is non-affine and consequently, j is non-convex. The existence of at least one local solution can be concluded from standard arguments [39], taking into account that for a minimizing sequence \(\{u_n\}\subset L^q(\varGamma )\), \(q\in (2,\infty )\), the corresponding states converge strongly in \(L^p(\varGamma )\) for each \(p<4\), which is due to the compact embedding \(H^1(\varOmega )\hookrightarrow L^p(\varGamma )\).

3.1 Optimality conditions

To derive optimality conditions, differentiability properties of the (implicitly defined) operator S are of interest.

Lemma 4

The operator \(S:U_{ad}\rightarrow H^1(\varOmega )\) is infinitely many times Fréchet differentiable with respect to the \(L^2(\varGamma )\)-topology. The first derivative \(\delta y := S'(u)\delta u\) is the weak solution of the tangent equation

$$\begin{aligned} \left\{ \begin{array}{ll} -\varDelta \delta y + \delta y = 0 &{}\quad \text{ in }\ \varOmega ,\\ \partial _n\delta y + u\,\delta y = -\delta u\,y &{}\quad \text{ on }\ \varGamma . \end{array} \right. \end{aligned}$$
(6)

Proof

The result follows from an application of the implicit function theorem to the operator \(e:H^1(\varOmega )\times U \rightarrow H^1(\varOmega )^*\) with \(U:=\{v\in L^2(\varGamma ):v \text{ fulfills } (2)\}\) defined by

$$\begin{aligned} e(y,u)v := (\nabla y,\nabla v)_{L^2(\varOmega )} + (y,v)_{L^2(\varOmega )} + (u\,y,v)_{L^2(\varGamma )} - (f,v)_{L^2(\varOmega )} - (g,v)_{L^2(\varGamma )}, \end{aligned}$$

whose roots are solutions of (1). We choose \(\delta y\in H^1(\varOmega )\) and \(\delta u\in L^2(\varGamma )\) such that \(u+\delta u\in U\) (note that U is an open subset of \(L^2(\varGamma )\)). First, we confirm that the linear operator \(e'(y,u):H^1(\varOmega )\times L^2(\varGamma )\rightarrow H^1(\varOmega )^*\) defined by

$$\begin{aligned} e'(y,u)(\delta y, \delta u):=(\nabla \delta y,\nabla \cdot )_{L^2(\varOmega )} + (\delta y,\cdot )_{L^2(\varOmega )} + (u\,\delta y + y\,\delta u,\cdot )_{L^2(\varGamma )} \end{aligned}$$
(7)

is the Fréchet-derivative of e. This is a consequence of

$$\begin{aligned} e(y+\delta y,u+\delta u) - e(y,u) = e'(y,u)(\delta y,\delta u) + (\delta u\,\delta y,\cdot )_{L^2(\varGamma )} \end{aligned}$$

and the fact that the remainder term satisfies

$$\begin{aligned} \Vert \delta u\,\delta y\Vert _{H^1(\varOmega )^*}&= \sup _{\varphi \in H^1(\varOmega )} \frac{(\delta u\,\delta y,\varphi )_{L^2(\varGamma )}}{\Vert \varphi \Vert _{H^1(\varOmega )}} \nonumber \\&\le c \sup _{\varphi \in H^1(\varOmega )} \frac{\Vert \varphi \Vert _{L^4(\varGamma )}}{\Vert \varphi \Vert _{H^1(\varOmega )}}\, \Vert \delta u\Vert _{L^2(\varGamma )}\,\Vert \delta y\Vert _{L^4(\varGamma )} \nonumber \\&\le c\,\Vert \delta u\Vert _{L^2(\varGamma )}\,\Vert \delta y\Vert _{H^1(\varOmega )} \le c\left( \Vert \delta u\Vert _{L^2(\varGamma )}^2+\Vert \delta y\Vert _{H^1(\varOmega )}^2\right) \nonumber \\&= o(\Vert (\delta y,\delta u)\Vert _{H^1(\varOmega )\times L^2(\varGamma )}), \end{aligned}$$
(8)

where we applied the generalized Hölder inequality and \(H^1(\varOmega )\hookrightarrow L^4(\varGamma )\). The second Fréchet derivative

$$\begin{aligned} e'':H^1(\varOmega )\times U\rightarrow {\mathcal {L}}((H^1(\varOmega )\times L^2(\varGamma ))^2,H^1(\varOmega )^*) \end{aligned}$$

is given by

$$\begin{aligned} e''(y,u)(\delta y,\delta u)(\tau y,\tau u) := (\tau u\,\delta y + \delta u\,\tau y,\cdot )_{L^2(\varGamma )} \end{aligned}$$

and the mapping \((y,u)\mapsto e''(y,u)\) is continuous. The derivatives of order \(n\ge 3\) vanish. Hence, \(e:H^1(\varOmega )\times U\rightarrow H^1(\varOmega )^*\) is of class \(C^\infty\).

Finally, due to Lemma 1 we conclude that the linear mapping

$$\begin{aligned} \delta y\mapsto e_y(y,u)\delta y = (\nabla \delta y,\nabla \cdot )_{L^2(\varOmega )} + (\delta y,\cdot )_{L^2(\varOmega )}+(u\,\delta y,\cdot )_{L^2(\varGamma )}\in H^1(\varOmega )^* \end{aligned}$$

is bijective. The implicit function theorem implies the assertion and the derivative \(\delta y:=S'(u)\delta u\) is given by \(e'(y,u)(\delta y,\delta u) = 0\). This corresponds to the weak formulation of (6). \(\square\)

From the chain rule and Lemma 4 we directly conclude the following differentiability result:

Lemma 5

The functional \(j:U_{ad}\rightarrow {{\mathbb {R}}}\) is infinitely many times Fréchet differentiable with respect to the \(L^2(\varGamma )\)-topology and the first derivative is given by

$$\begin{aligned} \left<j'(u),v\right> = (S(u)-y_d, S'(u)v)_{L^2(\varOmega )} + \alpha \,(u,v)_{L^2(\varGamma )}, \qquad v\in L^2(\varGamma ). \end{aligned}$$
(9)

The derivative of j can be simplified by exploiting a precise representation of the adjoint \(S'(u)^*:H^1(\varOmega )^*\rightarrow L^2(\varGamma )\) of the linearized control-to-state operator \(S'(u)\). In order to compute this, we introduce the adjoint state \(p\in H^1(\varOmega )\) as the weak solution of the adjoint equation

$$\begin{aligned} \left\{ \begin{array}{ll} -\varDelta p + p = y-y_d &{}\qquad \text{ in }\ \varOmega ,\\ \partial _n p + u\,p = 0 &{}\qquad \text{ on }\ \varGamma . \end{array} \right. \end{aligned}$$
(10)

Testing the variational problems for (10) and (6) with \(\delta y:=S'(u)\delta u\) and p, respectively, leads to the relation

$$\begin{aligned} (y-y_d,\delta y)_{L^2(\varOmega )} = - (y\,p,\delta u)_{L^2(\varGamma )}, \end{aligned}$$

which implies

$$\begin{aligned} S'(u)^*(y-y_d):= -[y\,p]_\varGamma . \end{aligned}$$
(11)

In the following we denote by \(Z:L^2(\varGamma )\rightarrow H^1(\varOmega )\) the control-to-adjoint mapping defined by \(u\mapsto Z(u):=p\) via (10) with \(y=S(u)\). Finally, we are able to formulate the necessary optimality condition

$$\begin{aligned} \left<j'(u),v-u\right> = (S(u)-y_d,S'(u)(v-u))_{L^2(\varOmega )} + \alpha \,(u,v-u)_{L^2(\varGamma )}\ge 0 \quad \forall v\in U_{ad} \end{aligned}$$

and with (11) we get the equivalent representation

$$\begin{aligned} \left<j'(u),v-u\right> = (\alpha \,u-S(u)\,Z(u),v-u)_{L^2(\varGamma )} \ge 0 \quad \forall v\in U_{ad}. \end{aligned}$$

Taking into account the definitions of S and Z we can write this variational inequality in the form

$$\begin{array}{ll} -\varDelta y + y= f\qquad-\varDelta p + p= y-y_d&\quad\text{ in }\ \varOmega ,\\ \partial _n y + u\,y= g\qquad\partial _n p + u\,p= 0&\quad\text{ on }\ \varGamma , \\ \quad\left( \alpha \,u - y\,p,v-u\right) _{L^2(\varGamma )} \ge 0&\quad\text{ for } \text{ all }\ v\in U_{ad}. \end{array}$$
(12)

The latter inequality is equivalent to the projection formula

$$\begin{aligned} u = \varPi _{ad}\left( \frac{1}{\alpha }[y\,p]_\varGamma \right) \end{aligned}$$
(13)

with \(\varPi _{ad}\) the \(L^2(\varGamma )\)-projection onto \(U_{ad}\).
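The projection \(\varPi _{ad}\) acts pointwise a. e. on \(\varGamma\), i. e., it truncates its argument at the bounds \(u_a\) and \(u_b\). The following minimal Python sketch evaluates the projection formula (13) for given boundary traces of y and p; the array values and parameters are purely illustrative placeholders and not taken from the paper.

```python
import numpy as np

def Pi_ad(v, ua, ub):
    """Pointwise projection onto U_ad = {ua <= u <= ub a.e.}: truncation at the bounds."""
    return np.clip(v, ua, ub)

# illustrative boundary values of the state y and the adjoint state p at some
# boundary points (placeholder data, not computed from the model problem)
alpha, ua, ub = 1e-2, 0.1, 1.0
y_gamma = np.array([0.30, 0.80, 1.50])
p_gamma = np.array([0.10, -0.20, 0.40])

# projection formula (13): u = Pi_ad( (1/alpha) * y*p on Gamma )
u = Pi_ad(y_gamma * p_gamma / alpha, ua, ub)
print(u)   # components hitting a bound belong to the active set
```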

As the problem (5) is not convex, we have to investigate second-order sufficient conditions. To obtain the Hessian of j we apply the product rule and get

$$\begin{aligned} j''(u)(\delta u,\tau u) = \left( \alpha \,\tau u - S'(u)\tau u\,Z(u) - S(u)\,Z'(u)\tau u,\delta u\right) _{L^2(\varGamma )}. \end{aligned}$$
(14)

The function \(\tau p = Z'(u)\tau u\in H^1(\varOmega )\) is the weak solution of the “dual for Hessian”-equation

$$\begin{aligned} \left\{ \begin{aligned} -\varDelta \tau p + \tau p&= \tau y&{\text {in}}\ \varOmega ,\\ \partial _n \tau p + u\,\tau p&= -\tau u\,p&{\text {on}}\ \varGamma , \end{aligned} \right. \end{aligned}$$
(15)

where \(\tau y = S'(u)\tau u\). As in the proof of Lemma 4 this follows from the implicit function theorem. Note that also further representations of the Hessian are possible. For instance, a direct application of the product rule to (9) yields

$$\begin{aligned} j''(u)(\delta u,\tau u)&= \alpha (\delta u,\tau u)_{L^2(\varGamma )} + (S'(u)\tau u,S'(u)\delta u)_{L^2(\varOmega )} \\&\quad + (S(u)-y_d,S''(u)(\delta u,\tau u))_{L^2(\varOmega )}\\&= \alpha (\delta u,\tau u)_{L^2(\varGamma )} + (\delta y,\tau y)_{L^2(\varOmega )} + (y-y_d,\delta \tau y)_{L^2(\varOmega )}, \end{aligned}$$

with \(y=S(u)\), \(\delta y = S'(u)\delta u\), \(\tau y = S'(u)\tau u\) and \(\delta \tau y = S''(u)(\delta u,\tau u)\). The latter relation means that \(\delta \tau y\in H^1(\varOmega )\) is the weak solution of

$$\begin{aligned} \left\{ \begin{aligned} -\varDelta \delta \tau y + \delta \tau y&= 0&{\text {in}}\ \varOmega ,\\ \partial _n\delta \tau y + u\,\delta \tau y&= - \delta u\,\tau y - \tau u\,\delta y&{\text {on}}\ \varGamma . \end{aligned} \right. \end{aligned}$$

Moreover, due to the definition of p and \(\delta \tau y\) there holds the relation \((y-y_d,\delta \tau y)_{L^2(\varOmega )} = -(p,\delta u\tau y + \tau u\,\delta y)_{L^2(\varGamma )}\) and as a consequence, we can further simplify the representation of the Hessian and obtain

$$\begin{aligned} j''(u)(\delta u,\tau u) = \alpha (\delta u,\tau u)_{L^2(\varGamma )} + (\delta y,\tau y)_{L^2(\varOmega )} - (p,\delta u\,\tau y + \tau u\,\delta y)_{L^2(\varGamma )}. \end{aligned}$$

Next, we derive some stability and Lipschitz properties of S, Z, \(S'\) and \(Z'\). As the following results require different assumptions on f, \(y_d\) and g, we simply assume the most restrictive ones, that is,

$$\begin{aligned} f, y_d\in L^\infty (\varOmega ),\qquad g\in H^{1/2}(\varGamma ). \end{aligned}$$

Moreover, we will hide the dependency on these quantities in the generic constant to simplify the notation.

Lemma 6

Let \(u\in L^2(\varGamma )\) satisfy the assumption (2). The control-to-state operator S satisfies the following inequalities:

$$\begin{aligned} \Vert S(u)\Vert _{H^1(\varOmega )}&\le c,&\\ \Vert S(u)\Vert _{H^{3/2}(\varOmega )} + \Vert S(u)\Vert _{H^1(\varGamma )}&\le c\,(1+\Vert u\Vert _{L^{p_1}(\varGamma )}), \\ \Vert S(u)\Vert _{L^\infty (\varOmega )}&\le c\,(1+\Vert u\Vert _{L^{p_2}(\varGamma )})^2, \end{aligned}$$

with \(p_1 > 2\) and \(p_2\ge 2\) for \(n=2\), and \(p_1\ge 4\) and \(p_2 > 8/3\) for \(n=3\). The estimates remain valid when replacing the operator S by the control-to-adjoint operator Z.

Proof

The inequalities for S are a direct consequence of Lemmata  1 and 3. The inequalities for Z can be derived with similar arguments, but the right-hand side of the adjoint equation involves the corresponding state S(u). However, in all cases the norms of \(S(u)-y_d\) can be bounded by \(c\,(1+\Vert S(u)\Vert _{H^1(\varOmega )})\le c\). \(\square\)

Lemma 7

Let \(u,\delta u\in L^2(\varGamma )\) be given and assume that u satisfies (2). Then, the following stability estimates hold true:

$$\begin{aligned} \Vert S'(u)\delta u\Vert _{H^1(\varOmega )}&\le c\,\Vert \delta u\Vert _{L^2(\varGamma )},\\ \Vert S'(u)\delta u\Vert _{H^{3/2}(\varOmega )}&\le c\left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) ^3 \Vert \delta u\Vert _{L^2(\varGamma )}, \end{aligned}$$

with \(p>2\) for \(n=2\) and \(p\ge 4\) for \(n=3\). The estimates remain valid when replacing \(S'\) by \(Z'\).

Proof

In the following we write \(y:=S(u)\) and \(\delta y = S'(u)\delta u\). The stability in \(H^1(\varOmega )\) follows directly from Lemma 1 and the estimate

$$\begin{aligned} \Vert \delta u\,y\Vert _{H^{-1/2}(\varGamma )} = \sup _{\genfrac{}{}{0.0pt}{}{\varphi \in H^{1/2}(\varGamma )}{\varphi \not \equiv 0}} \frac{\left( \delta u \,y,\varphi \right) _{L^2(\varGamma )}}{\Vert \varphi \Vert _{H^{1/2}(\varGamma )}} \le c\,\Vert \delta u\Vert _{L^2(\varGamma )}\,\Vert y\Vert _{H^1(\varOmega )}, \end{aligned}$$
(16)

which follows from the same arguments used already in (8). The boundedness of \(y:=S(u)\) in \(H^1(\varOmega )\) can be found in the previous lemma. The estimate in the \(H^{3/2}(\varOmega )\)-norm follows analogously with Lemma 3a) and

$$\begin{aligned} \Vert y\,\delta u\Vert _{L^2(\varGamma )}\le c\,\Vert y\Vert _{L^\infty (\varOmega )}\,\Vert \delta u\Vert _{L^2(\varGamma )} \end{aligned}$$

and the stability in \(L^\infty (\varOmega )\) proved in Lemma 6.

The estimates for \(Z'\) are deduced with similar techniques. With the a priori estimate from Lemma 3a) and the embedding \(H^1(\varOmega )\hookrightarrow L^r(\varOmega )\) which holds for \(r <\infty\) (\(n=2\)) or \(r\le 6\) (\(n=3\)) we get

$$\begin{aligned} \Vert Z'(u)\delta u\Vert _{H^{3/2}(\varOmega )}&\le c \left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) \left( \Vert p\,\delta u\Vert _{L^2(\varGamma )} + \Vert \delta y\Vert _{H^1(\varOmega )}\right) \\&\le c\left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) \left( 1+\Vert p\Vert _{L^\infty (\varGamma )}\right) \Vert \delta u\Vert _{L^2(\varGamma )} \end{aligned}$$

with \(p=Z(u)\). The stability of Z in \(L^\infty (\varOmega )\) is discussed in the previous lemma. \(\square\)

Lemma 8

Let \(u,v\in L^2(\varGamma )\) satisfy assumption (2). Then, the following Lipschitz estimates hold:

$$\begin{aligned} \Vert S(u) - S(v)\Vert _{H^1(\varOmega )}&\le c\,\Vert u-v\Vert _{L^2(\varGamma )}, \\ \Vert S'(u)\delta u - S'(v)\delta u\Vert _{H^1(\varOmega )}&\le c\,\Vert u-v\Vert _{L^2(\varGamma )}\Vert \delta u\Vert _{L^2(\varGamma )}. \end{aligned}$$

The estimates are also valid when replacing S by Z and \(S'\) by \(Z'\).

Proof

The estimates for S and \(S'\) follow directly from Lemma  2 and the stability estimates for S and \(S'\) in \(H^1(\varOmega )\) proved in the Lemmata 6 and 7. The Lipschitz estimate for Z is proved in a similar way. In this case one has to apply the Lipschitz estimate shown for S to the term \(\Vert S(u)-S(v)\Vert _{H^1(\varOmega )}\) appearing due to the differences in the right-hand sides. With the same idea we show the Lipschitz estimate for \(Z'\). Using again Lemma 2 we get

$$\begin{aligned}&\Vert Z'(u)\delta u - Z'(v)\delta u\Vert _{H^1(\varOmega )} \le c\,\Big (\Vert u-v\Vert _{L^2(\varGamma )}\,\Vert Z'(u)\delta u\Vert _{H^1(\varOmega )} \\&\quad + \Vert S'(u)\delta u - S'(v)\delta u\Vert _{H^1(\varOmega )} + \Vert \delta u\,(Z(u) - Z(v))\Vert _{H^{-1/2}(\varGamma )}\Big ). \end{aligned}$$

It remains to bound the three terms on the right-hand side. To this end, we apply Lemma 7 to the first term, the Lipschitz estimate for \(S'(\cdot )\delta u\) to the second term, and the multiplication rule (16) with \(y=Z(u) - Z(v)\) as well as the Lipschitz estimate for Z to the third term. \(\square\)

As the optimal control problem is non-convex we have to deal with local solutions. For some local solution \({\bar{u}}\in U_{ad}\) we require the following second-order sufficient condition:

Assumption 1

(SSC) The objective functional is locally convex near the local solution \({\bar{u}}\), i. e., a constant \(\delta > 0\) exists such that

$$\begin{aligned} j''({\bar{u}})(v,v) \ge \delta \,\Vert v\Vert _{L^2(\varGamma )}^2 \qquad \forall v\in L^2(\varGamma ). \end{aligned}$$
(17)

With standard arguments and the estimate we will prove below in Corollary 1 one can show that each function \({\bar{u}}\in U_{ad}\) fulfilling the first-order necessary condition (12) and the second-order sufficient condition (17) is indeed a local solution and satisfies the quadratic growth condition

$$\begin{aligned} j({\bar{u}}) \le j(u) - \gamma \,\Vert u-{\bar{u}}\Vert _{L^2(\varGamma )}^2\qquad \forall u\in B_\tau ({\bar{u}}), \end{aligned}$$

with certain constants \(\gamma ,\tau > 0\). Note that there are weaker assumptions which are sufficient for local minima; for instance, one could formulate (17) for all directions v from a critical cone only. However, under such a weaker assumption the convergence proof for the postprocessing approach presented in Sect. 5.3 requires some more careful investigations, in particular the construction of a modified interpolant onto \(U_{ad}\). One possible solution for this issue can be found in [29].

Later, we will require the following Lipschitz estimate for the Hessian of j.

Lemma 9

Let \(u,v \in L^2(\varGamma )\) fulfilling (2) be given. Then, the Lipschitz estimate

$$\begin{aligned} \left| j''(u)(\delta u,\delta u) - j''(v)(\delta u,\delta u)\right| \le c\,\Vert \delta u\Vert _{L^2(\varGamma )}^2\,\Vert u-v\Vert _{L^2(\varGamma )}. \end{aligned}$$

is valid for all \(\delta u\in L^2(\varGamma )\).

Proof

To shorten the notation we write \(y_u=S(u)\), \(p_u = Z(u)\), \(\delta y_u = S'(u)\delta u\) and \(\delta p_u = Z'(u)\delta u\). From the representation (14) we obtain

$$\begin{aligned}&\left| j''(v)(\delta u,\delta u) - j''(u)(\delta u,\delta u)\right| \\&\quad \le \left| (p_u\, \delta y_u - p_v\, \delta y_v + y_u\,\delta p_u - y_v\,\delta p_v,\delta u)_{L^2(\varGamma )}\right| . \end{aligned}$$

We estimate the right-hand side using the Cauchy–Schwarz inequality, the embedding \(H^1(\varOmega )\hookrightarrow L^4(\varGamma )\) and the Lipschitz estimates from Lemma 8 as well as the a priori estimates from Lemmata 6 and  7. This implies

$$\begin{aligned}&\left| (p_u\,\delta y_u - p_v\,\delta y_v,\delta u)_{L^2(\varGamma )}\right| \\&\quad \le c\left( \Vert p_u - p_v\Vert _{H^1(\varOmega )}\,\Vert \delta y_u\Vert _{H^1(\varOmega )} + \Vert \delta y_u - \delta y_v\Vert _{H^1(\varOmega )}\,\Vert p_v\Vert _{H^1(\varOmega )} \right) \Vert \delta u\Vert _{L^2(\varGamma )}\\&\quad \le c\,\Vert u-v\Vert _{L^2(\varGamma )}\,\Vert \delta u\Vert _{L^2(\varGamma )}^2. \end{aligned}$$

With similar arguments we deduce

$$\begin{aligned}&\left| \left( y_u\,\delta p_u - y_v\,\delta p_v,\delta u\right) _{L^2(\varGamma )}\right| \\&\quad \le c\left( \Vert y_u - y_v\Vert _{H^1(\varOmega )}\,\Vert \delta p_u\Vert _{H^1(\varOmega )} + \Vert \delta p_u - \delta p_v\Vert _{H^1(\varOmega )}\, \Vert y_v\Vert _{H^1(\varOmega )}\right) \Vert \delta u\Vert _{L^2(\varGamma )} \\&\quad \le c\,\Vert u-v\Vert _{L^2(\varGamma )}\,\Vert \delta u\Vert _{L^2(\varGamma )}^2, \end{aligned}$$

and conclude the assertion. \(\square\)

Corollary 1

Let \({\bar{u}}\in U_{ad}\) be a local solution of (5) satisfying Assumption 1. Then, some \(\varepsilon >0\) exists such that the inequality

$$\begin{aligned} j''(u)(\delta u,\delta u) \ge \frac{\delta }{2} \Vert \delta u\Vert ^2_{L^2(\varGamma )} \end{aligned}$$

is valid for all \(\delta u\in L^2(\varGamma )\) and \(u\in L^2(\varGamma )\) with \(\Vert u-{\bar{u}}\Vert _{L^2(\varGamma )}\le \varepsilon\).

Proof

The assertion follows immediately from the previous lemma. For further details we refer to [27, Lemma 2.23]. \(\square\)

In the next lemma we collect some basic regularity results for local solutions of (5).

Lemma 10

Let \(\varOmega \subset {{\mathbb {R}}}^n\), \(n\in \{2,3\}\), be a Lipschitz domain. Each local solution \({\bar{u}}\in U_{ad}\) of (5) and the corresponding states \({\bar{y}}=S({\bar{u}})\), \({\bar{p}}=Z({\bar{u}})\) satisfy

$$\begin{aligned} {\bar{u}}\in H^1(\varGamma )\cap L^\infty (\varGamma ),\qquad {\bar{y}},{\bar{p}}\in H^{3/2}(\varOmega ) \cap H^1(\varGamma )\cap C({\overline{\varOmega }}). \end{aligned}$$

Proof

All regularity results, except \({\bar{u}}\in H^1(\varGamma )\), follow directly from Lemma 3. To show \({\bar{u}}\in H^1(\varGamma )\) we apply the product rule

$$\begin{aligned} \Vert {\bar{y}}\,{\bar{p}}\Vert _{H^1(\varGamma )} \le c\left( \Vert {\bar{y}}\Vert _{H^1(\varGamma )} \,\Vert {\bar{p}}\Vert _{L^\infty (\varOmega )} + \Vert {\bar{y}}\Vert _{L^\infty (\varOmega )} \,\Vert {\bar{p}}\Vert _{H^1(\varGamma )}\right) \le c \end{aligned}$$

and confirm \({\bar{y}}\,{\bar{p}}\in H^1(\varGamma )\). The desired result then follows after an application of the Stampacchia-Lemma, [26, p. 50], to the projection formula (13). The fact that the Stampacchia-Lemma is also valid on the boundary \(\varGamma\) is discussed in [28, Lemma 2.8] and [30, Lemma 3.3]. \(\square\)

Under additional assumptions on the geometry of \(\varOmega\) we can show even higher regularity. This is needed for the postprocessing approach studied in Sect. 5.3 where we will show almost quadratic convergence of the control approximations.

Lemma 11

Let \(\varOmega \subset {{\mathbb {R}}}^2\) be a bounded domain with a \(C^{1,1}\)-boundary \(\varGamma\). Then, there holds

$$\begin{aligned} {\bar{u}}\in W^{1,q}(\varGamma )\cap H^{2-1/q}({\tilde{\varGamma }}),\qquad {\bar{y}}, {\bar{p}} \in W^{2,q}(\varOmega ),\qquad q<\infty, \end{aligned}$$

for all \({\tilde{\varGamma }}\subset \subset {\mathcal {A}}\) or \({\tilde{\varGamma }}\subset \subset {\mathcal {I}}\), where \({\mathcal {A}}:=\{x\in \varGamma :{\bar{u}}(x)\in \{u_a,u_b\}\}\) and \({\mathcal {I}}:=\varGamma \setminus {\mathcal {A}}\) denote the active and inactive set, respectively.

Proof

With the regularity results obtained already in Lemma 10, in particular \({\bar{u}}\in H^{1/2}(\varGamma )\), and Lemma 3c) we conclude \({\bar{y}}, {\bar{p}}\in H^2(\varOmega )\hookrightarrow W^{1,q}(\varGamma )\) for all \(q<\infty\), and a further application of the multiplication rule yields \({\bar{y}}\,{\bar{p}}\in W^{1,q}(\varGamma )\). From (13) we conclude the property \({\bar{u}}\in W^{1,q}(\varGamma )\). Furthermore, we confirm that \({\bar{u}}\,{\bar{y}}, {\bar{u}}\,{\bar{p}}\in W^{1-1/q,q}(\varGamma )\) and a standard shift theorem for the Neumann problem, compare also the technique used in the proof of Lemma 3a), results in \({\bar{y}},{\bar{p}}\in W^{2,q}(\varOmega )\). Repeating the arguments above, i. e., using the multiplication rule and the projection formula, we obtain \({\bar{u}}\in W^{2-1/q,q}({\tilde{\varGamma }})\hookrightarrow H^{2-1/q}({\tilde{\varGamma }})\). \(\square\)

We chose the assumptions of the previous lemma in such a way that the regularity is only restricted due to the projection formula. Of course, when the control bounds are never active we could further improve the regularity results.

4 Finite element approximation of the state equation

This section is devoted to the finite element approximation of the variational problem (1). While the results from the previous sections are valid for arbitrary Lipschitz domains (unless otherwise explicitly assumed), we have to assume more smoothness of the boundary \(\varGamma\) in order to establish our discretization results:

(A1):

The domain \(\varOmega \subset {{\mathbb {R}}}^n\), \(n\in \{2,3\}\), possesses a Lipschitz continuous boundary \(\varGamma\) which is piecewise \(C^1\).

This definition includes arbitrary (possibly non-convex) polygonal or polyhedral domains. Indeed, the regularity of solutions is in this case also restricted by corner and edge singularities. However, for the first convergence result we require only \(H^{3/2}(\varOmega )\cap H^1(\varGamma )\)-regularity of the solution. Later, we want to investigate improved discretization techniques for which more regularity is needed. Then, we will use a stronger assumption on the domain.

First, we introduce shape-regular triangulations \(\{{\mathcal {T}}_h\}_{h>0}\) of \(\varOmega\) consisting of triangles (\(n=2\)) or tetrahedra (\(n=3\)). The elements T may have curved edges/faces such that the property

$$\begin{aligned} {\overline{\varOmega }} = \bigcup _{T\in {\mathcal {T}}_h} {\overline{T}} \end{aligned}$$

is valid for an arbitrary domain \(\varOmega\). Moreover, we assume that the triangulations are feasible in the sense of Ciarlet [13].

The mesh parameter \(h>0\) is the maximal element diameter

$$\begin{aligned} h = \max _{T\in {\mathcal {T}}_h} h_T,\quad h_T:={\text {diam}}(T). \end{aligned}$$

The family of meshes \(\{{\mathcal {T}}_h\}_{h>0}\) is assumed to be quasi-uniform, this means that some \(\kappa > 0\) independent of h exists such that each element \(T\in {\mathcal {T}}_h\) contains a ball with radius \(\rho _T\) satisfying the estimate \(\frac{\rho _T}{h} \ge \kappa\). Each triangulation \({\mathcal {T}}_h\) of \(\varOmega\) induces also a triangulation \({\mathcal {E}}_h\) of the boundary \(\varGamma\).

By \(F_T:{\hat{T}}\rightarrow T\) we denote the transformations from the reference triangle or tetrahedron \({\hat{T}}\) to the world element \(T\in {\mathcal {T}}_h\). The transformations \(F_T\) may be non-affine for elements with curved faces. Here, we consider transformations of the form

$$\begin{aligned} F_T = {\tilde{F}}_T + \varPhi _T, \end{aligned}$$

with some affine function \({\tilde{F}}_T({\hat{x}}) = {\tilde{B}}_T{\hat{x}} + {\tilde{b}}_T\), \({\tilde{B}}_T\in {{\mathbb {R}}}^{n\times n}\), \({\tilde{b}}_T\in {{\mathbb {R}}}^n\), chosen in such a way that if T is a curved boundary element, \({\tilde{T}}={\tilde{F}}_T({\hat{T}})\) is an n-simplex whose vertices coincide with the vertices of T. The assumed shape-regularity implies \(\Vert {\tilde{B}}_T\Vert \le c\,h_T\) and \(\Vert {\tilde{B}}_T^{-1}\Vert \le c\,h_T^{-1}\), see [13, Theorem 15.2].
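As an illustration, the affine part \({\tilde{F}}_T\) is determined by the vertices of the straight simplex \({\tilde{T}}\) alone; the correction \(\varPhi _T\) depends on the chosen parametrization of the curved boundary and typically vanishes for interior elements. A minimal sketch in two dimensions, with hypothetical vertex coordinates, reads:

```python
import numpy as np

def affine_part(v0, v1, v2):
    """Affine map F~_T(x^) = B~_T x^ + b~_T from the reference triangle
    T^ = conv{(0,0), (1,0), (0,1)} onto the straight triangle with vertices v0, v1, v2."""
    B = np.column_stack((v1 - v0, v2 - v0))
    return B, v0

# hypothetical triangle of diameter roughly 0.1
v0, v1, v2 = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([0.02, 0.08])
B, b = affine_part(v0, v1, v2)

# shape regularity: ||B~_T|| scales like h_T and ||B~_T^{-1}|| like 1/h_T
print(np.linalg.norm(B, 2), np.linalg.norm(np.linalg.inv(B), 2))
```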

To guarantee the validity of interpolation error estimates we assume:

(A2):

The triangulations \({\mathcal {T}}_h\) are regular of order 2 in the sense of [6], that is, for all sufficiently small \(h>0\) there holds

$$\begin{aligned} \sup _{{\hat{x}}\in {\hat{T}}} \Vert D \varPhi _T({\hat{x}})\cdot {\tilde{B}}_T^{-1}\Vert \le c < 1,\qquad \sup _{{\hat{x}}\in {\hat{T}}}\Vert D^2 \varPhi _T({\hat{x}})\Vert \le c h^2, \end{aligned}$$
(18)

for all \(T\in {\mathcal {T}}_h\).

There are multiple strategies to construct mappings \(F_T\) satisfying these assumptions and we refer the reader for instance to [6, 37, 41]. Therein, it is assumed that \(\varGamma\) is piecewise \(C^3\); only in the second reference \(C^4\) is required.

The trial and test space is defined by

$$\begin{aligned} V_h:= \{ v_h\in C({\overline{\varOmega }}) :v_h|_T = {\hat{v}}_h\circ F_T^{-1},\ {\hat{v}}_h \in {\mathcal {P}}_1({\hat{T}})\ \text{ for } \text{ all }\ T\in {\mathcal {T}}_h\}. \end{aligned}$$

Next, we introduce an interpolation operator onto \(V_h\). We partly use the quasi-interpolant proposed by Bernardi [6], but use a modification for boundary nodes as in [36], see also [1, 2]. To each interior node \(x_i\), \(i=1,\ldots ,N^{{\text {in}}}\), of \({\mathcal {T}}_h\), we associate an element \(\sigma _i:=T\in {\mathcal {T}}_h\) with \(x_i\in T\). For the boundary nodes \(x_i\), \(i=N^{{\text {in}}}+1,\ldots ,N\), we define instead \(\sigma _i:= E\in {\mathcal {E}}_h\) with \(x_i\in E\). Instead of using nodal values as for the Lagrange interpolant, we use the nodal values of some regularized function computed by an \(L^2\)-projection over \(\sigma _i\). The interpolation operator \(\varPi _h:W^{1,1}(\varOmega )\rightarrow V_h\) is defined as follows. For each node \(i=1,\ldots ,N\) we define a local \(L^2\)-projection \({\hat{\pi }}_i\) onto \({\mathcal {P}}_1(\sigma _i)\) by

$$\begin{aligned} \int _{{\hat{\sigma }}_i} ({\hat{\pi }}_i(u) - u\circ F_i)\,{\hat{q}} = 0\qquad \forall {\hat{q}}\in {\mathcal {P}}_1, \end{aligned}$$

with \(F_i\) the transformation from the reference element \({\hat{T}}\) (\(i=1,\ldots ,N^{{\text {in}}}\)) or \({\hat{E}}\) (\(i=N^{{\text {in}}}+1,\ldots ,N\)) onto \(\sigma _i\). The interpolation operator is defined by

$$\begin{aligned} \varPi _h v(x) = \sum _{i=1}^N ({\hat{\pi }}_i(v)\circ F_i^{-1})(x_i)\,\varphi _i(x), \end{aligned}$$

where \(\{\varphi _i\}_{i=1,\ldots ,N}\) is the nodal basis of \(V_h\). Note that, due to the modification for boundary nodes, this operator is only applicable to \(W^{1,1}(\varOmega )\)-functions. The desired interpolation properties remain valid. In particular, for each \(T\in {\mathcal {T}}_h\), there holds

$$\begin{aligned} \Vert u-\varPi _h u\Vert _{H^m(T)} \le c h^{\ell -m} \Vert u\Vert _{H^{\ell }(S_T)},\quad m\le \ell \le 2,\ \ell \ge 1, \end{aligned}$$
(19)

where \(S_T\) is the patch of elements adjacent to T, see [6, Theorem 4.1], [36, Theorem 4.1]. Due to the special choice of the patches \(\sigma _i\) for the boundary nodes we get similar interpolation error estimates on the boundary elements \(E\in {\mathcal {E}}_h\), that is,

$$\begin{aligned} \Vert u-\varPi _h u\Vert _{H^m(E)} \le c h^{\ell -m}\Vert u\Vert _{H^{\ell }(S_E)}, \quad m\le \ell \le 2, \end{aligned}$$
(20)

with the patch \(S_E\) consisting of the boundary elements \(E'\in {\mathcal {E}}_h\) that touch E. The proof follows from the same arguments as in [36, Theorem 4.1].
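To illustrate the regularized nodal values used in the definition of \(\varPi _h\), the following sketch computes, in a simplified one-dimensional setting with a hypothetical patch and sample function, the local \(L^2\)-projection onto affine polynomials and evaluates it at a node; the moments are approximated by a Gauss rule. It is only meant to convey the construction, not the two- or three-dimensional implementation.

```python
import numpy as np

def regularized_nodal_value(u, a, b, x_node):
    """L2-projection of u onto P1 on the patch [a, b], evaluated at x_node.

    The projection pi(u)(x) = c0 + c1*x is characterized by
    (pi(u) - u, q)_{L2(a,b)} = 0 for q in {1, x}.
    """
    xg, wg = np.polynomial.legendre.leggauss(5)       # 5-point Gauss rule
    x = 0.5 * (b - a) * xg + 0.5 * (a + b)            # quadrature points on [a, b]
    w = 0.5 * (b - a) * wg
    # mass matrix of the basis {1, x} and moments of u on the patch
    M = np.array([[np.sum(w),     np.sum(w * x)],
                  [np.sum(w * x), np.sum(w * x**2)]])
    rhs = np.array([np.sum(w * u(x)), np.sum(w * x * u(x))])
    c0, c1 = np.linalg.solve(M, rhs)
    return c0 + c1 * x_node

# example: nodal value at x_i = 0.5 with patch sigma_i = [0.4, 0.5]
u = lambda t: np.sin(np.pi * t)
print(regularized_nodal_value(u, 0.4, 0.5, 0.5))      # close to u(0.5), but regularized
```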

The finite element solutions of (1) are characterized by the variational formulations

$$\begin{aligned} {\text {Find }} y_h\in V_h:\quad a_u(y_h,v_h) = F(v_h)\qquad \forall v_h\in V_h. \end{aligned}$$
(21)

As in the continuous case one can show that (21) possesses a unique solution for each \(h>0\).
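For orientation, the following self-contained sketch assembles and solves a one-dimensional analogue of (21) with piecewise linear elements: \(-y''+y=f\) on (0, 1) with Robin conditions \(\partial _n y + u\,y = g\) at the two end points, where the boundary integrals reduce to point evaluations. All data are illustrative choices and not taken from the numerical examples of the paper.

```python
import numpy as np

n = 64                                   # number of elements
x = np.linspace(0.0, 1.0, n + 1)
f = lambda t: 1.0                        # volume datum
u0, u1 = 1.0, 2.0                        # Robin coefficient u at x=0 and x=1
g0, g1 = 0.0, 0.5                        # boundary datum g at x=0 and x=1

A = np.zeros((n + 1, n + 1))             # Galerkin matrix of a_u(., .)
b = np.zeros(n + 1)                      # load vector F(.)
for k in range(n):
    hk = x[k + 1] - x[k]
    K = np.array([[1.0, -1.0], [-1.0, 1.0]]) / hk       # (y', v') on the element
    M = hk / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])   # (y, v) on the element
    idx = [k, k + 1]
    A[np.ix_(idx, idx)] += K + M
    b[idx] += 0.5 * hk * f(0.5 * (x[k] + x[k + 1]))     # midpoint rule for (f, v)

# boundary contributions (u y, v)_Gamma and (g, v)_Gamma at the two end points
A[0, 0] += u0
A[-1, -1] += u1
b[0] += g0
b[-1] += g1

y_h = np.linalg.solve(A, b)              # coefficients of y_h in the nodal basis
print(y_h[0], y_h[n // 2], y_h[-1])
```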

With the usual arguments we can derive an error estimate for the approximation error in the energy-norm.

Lemma 12

Assume that (A1) and (A2) are satisfied and that the solution y of (1) belongs to \(H^{s}(\varOmega )\) with some \(s\in [1,2]\). Then, there holds the error estimate

$$\begin{aligned} \Vert y-y_h\Vert _{H^1(\varOmega )}&\le c\,h^{s-1}\,\Vert y\Vert _{H^{s}(\varOmega )}. \end{aligned}$$
(22)

Proof

The proof follows from the Céa-Lemma and the interpolation error estimates (19). \(\square\)

Of particular interest are error estimates on the boundary. These are required in order to derive error estimates for boundary control problems. To this end, we first prove a suboptimal result which is valid for arbitrary Lipschitz domains \(\varOmega\).

Lemma 13

Let the assumptions (A1) and (A2) be satisfied. It is assumed that the solution y of (1) belongs to \(H^{3/2}(\varOmega )\). Moreover, the parameter u fulfills (2) and belongs to \(L^p(\varGamma )\) with \(p>2\) for \(n=2\) and \(p\ge 4\) for \(n=3\). Then, the error estimate

$$\begin{aligned} \Vert y - y_h\Vert _{L^2(\varGamma )} \le c\,h\left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) \Vert y\Vert _{H^{3/2}(\varOmega )} \le c\,h\left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) ^2 \end{aligned}$$

holds for all \(h>0\).

Proof

We introduce the dual problem

$$\begin{aligned} \text{ Find } w\in H^1(\varOmega ):\quad a_u(v,w) = (y - y_h,v)_{L^2(\varGamma )}\qquad \forall v\in H^1(\varOmega ) \end{aligned}$$

and obtain with the typical arguments of the Aubin-Nitsche trick

$$\begin{aligned} \Vert y - y_h\Vert _{L^2(\varGamma )}^2&\le c\,\Vert y - y_h\Vert _{H^1(\varOmega )}\,\Vert w-\varPi _h w\Vert _{H^1(\varOmega )}\\&\le c\,h\,\Vert w\Vert _{H^{3/2}(\varOmega )}\,\Vert y\Vert _{H^{3/2}(\varOmega )}. \end{aligned}$$

The last step is an application of Lemma 12 and the interpolation error estimate (19). The regularity required for the dual solution w can be deduced from Lemma 3 with \(f\equiv 0\) and \(g=y-y_h\). Taking into account the a priori estimate

$$\begin{aligned} \Vert w\Vert _{H^{3/2}(\varOmega )} \le c\left( 1+\Vert u\Vert _{L^p(\varGamma )}\right) \Vert y-y_h\Vert _{L^2(\varGamma )} \end{aligned}$$

we conclude the assertion. \(\square\)

If the solution is more regular, we can also show a higher convergence rate. In this case we will use the Hölder inequality and a trace theorem to obtain \(\Vert y-y_h\Vert _{L^2(\varGamma )} \le c\,\Vert y-y_h\Vert _{L^\infty (\varOmega )}\), and insert the following result.

Theorem 2

Consider a planar domain \(\varOmega \subset {{\mathbb {R}}}^2\). Let \(u\in H^{1/2}(\varGamma )\) with \(u\ge 0\) a. e., and assume that (A1) and (A2) are satisfied. Assume that the solution y of (1) belongs to \(W^{2,q}(\varOmega )\) with \(q\in [2,\infty )\). Then, the error estimate

$$\begin{aligned} \Vert y-y_h\Vert _{L^\infty (\varOmega )} \le c\,h^{2-2/q}\,|\ln h|\,\Vert y\Vert _{W^{2,q}(\varOmega )} \end{aligned}$$

is valid.

The proof requires rather technical arguments and is postponed to the appendix.

5 The discrete optimal control problem

In the following we investigate the discretized optimal control problem:

$$\begin{aligned} {\text {Find}}\ u_h\in U_h^{ad}:\quad J_h(y_h,u_h) := \frac{1}{2} \Vert y_h-y_d\Vert _{L^2(\varOmega )}^2 + \frac{\alpha }{2} \Vert u_h\Vert _{L^2(\varGamma )}^2 \rightarrow \min ! \end{aligned}$$
(23)

subject to

$$\begin{aligned} y_h\in V_h,\quad a_{u_h}(y_h,v_h) = F(v_h)\qquad \forall v_h\in V_h. \end{aligned}$$

The reduced objective functional is denoted by \(j_h(u_h):=J_h(S_h(u_h),u_h)\). We use piecewise linear finite elements to approximate the state y, i. e., the space \(V_h\) is defined as in the previous section. The controls are sought in the space of piecewise constant functions,

$$\begin{aligned} U_h^{ad}:=\{w_h\in L^\infty (\varGamma ):w_h|_E \in {\mathcal {P}}_0\quad \forall E\in {\mathcal {E}}_h\}\cap U_{ad}, \end{aligned}$$

where \({\mathcal {E}}_h\) is the triangulation of the boundary induced by \({\mathcal {T}}_h\).

As in the continuous case we can derive a first-order necessary optimality condition which reads

$$\begin{aligned} \left\{ \begin{array}{ll} a_{u_h}(y_h,v_h)= F(v_h)&~~\text{ for all }\ v_h\in V_h,\\ a_{u_h}(v_h,p_h)= (y_h-y_d,v_h)_{L^2(\varOmega )}&~~\text{ for all }\ v_h\in V_h,\\ (\alpha \,u_h - y_h\, p_h,w_h-u_h)_{L^2(\varGamma )}\ge 0&~~\text{ for all }\ w_h\in U_h^{ad}. \end{array} \right. \end{aligned}$$
(24)

The discrete control-to-state operator \(S_h:L^2(\varGamma )\rightarrow V_h\) and the discrete control-to-adjoint operator \(Z_h:L^2(\varGamma )\rightarrow V_h\) are defined by \(y_h= S_h(u)\) and \(p_h = Z_h(u)\) with

$$\begin{aligned} a_u(y_h,v_h)&= F(v_h)&\forall v_h\in V_h,\\ a_u(v_h,p_h)&= (y_h-y_d,v_h)_{L^2(\varOmega )}&\forall v_h\in V_h. \end{aligned}$$

Analogously to the continuous case we compute the first and second derivatives of \(j_h\) and obtain

$$\begin{aligned} j_h'(u)\delta u = \left( \alpha \,u - S_h(u)\,Z_h(u),\delta u\right) _{L^2(\varGamma )} \end{aligned}$$
(25)

and

$$\begin{aligned} j_h''(u)(\delta u,\tau u) = \left( \alpha \,\tau u - S_h(u)\, Z_h'(u)\tau u - S_h'(u)\tau u\, Z_h(u),\delta u\right) _{L^2(\varGamma )}, \end{aligned}$$
(26)

where \(\tau y_h = S_h'(u)\tau u\in V_h\) and \(\tau p_h = Z_h'(u)\tau u\in V_h\) are the solutions of

$$\begin{aligned} a_u(\tau y_h,v_h)&= -(\tau u\,y_h,v_h)_{L^2(\varGamma )}&{\text {for all}}\ v_h\in V_h,\\ a_u(v_h,\tau p_h)&= (\tau y_h, v_h)_{L^2(\varOmega )} -(\tau u\,p_h,v_h)_{L^2(\varGamma )}&{\text {for all}}\ v_h\in V_h, \end{aligned}$$

with \(y_h=S_h(u)\) and \(p_h=Z_h(u)\). These are the discretized versions of the equations (6) and (15). The first-order optimality condition reads in the short form

$$\begin{aligned} (\alpha \,u_h - S_h(u_h)\,Z_h(u_h), w_h-u_h)_{L^2(\varGamma )} \ge 0 \qquad \text{ for } \text{ all }\ w_h\in U_h^{ad}. \end{aligned}$$
(27)
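Since the test functions in (27) may be chosen independently on each boundary element, the inequality decouples and yields \(u_h|_E = \varPi _{[u_a,u_b]}\big (\tfrac{1}{\alpha |E|}\int _E y_h\,p_h\big )\) for every \(E\in {\mathcal {E}}_h\). One simple (though not necessarily the most efficient) way to solve the coupled system (24) is therefore a damped fixed-point iteration on this formula. The sketch below assumes hypothetical helper routines solve_state, solve_adjoint and elem_average provided by a finite element backend; it illustrates the structure of (24) and is not the solver used in the numerical experiments.

```python
import numpy as np

def solve_discrete_problem(solve_state, solve_adjoint, elem_average, u0,
                           alpha, ua, ub, max_iter=200, tol=1e-10, damping=0.5):
    """Damped fixed-point iteration on the elementwise projection formula for (27).

    solve_state(u)      -> boundary trace of y_h = S_h(u)
    solve_adjoint(u, y) -> boundary trace of p_h = Z_h(u)
    elem_average(v)     -> elementwise means, i.e. the L2-projection onto the
                           piecewise constant control space
    (all three are placeholders for a concrete finite element implementation).
    """
    u = u0.copy()                                    # piecewise constant control
    for k in range(max_iter):
        y_tr = solve_state(u)
        p_tr = solve_adjoint(u, y_tr)
        u_new = np.clip(elem_average(y_tr * p_tr) / alpha, ua, ub)
        u_next = (1.0 - damping) * u + damping * u_new
        if np.max(np.abs(u_next - u)) < tol:         # stationary point of (27)
            return u_next, k
        u = u_next
    return u, max_iter
```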

5.1 Properties of the discrete control-to-state/adjoint operator

In Sect. 3 we have derived several stability and Lipschitz properties for the operators S, Z, \(S'\) and \(Z'\). Here, we will derive the discrete analogues that are needed in the following. Throughout this section we assume that (A1) and (A2) are fulfilled.

Lemma 14

The following estimates hold:

$$\begin{aligned} \Vert S_h(u)\Vert _{H^1(\varGamma )}&\le c\left( 1+\Vert u\Vert _{L^{p_1}(\varGamma )}\right) ^2,\\ \Vert S_h(u)\Vert _{L^\infty (\varOmega )}&\le c\left( 1+\Vert u\Vert _{L^{p_2}(\varGamma )}\right) ^2, \end{aligned}$$

for \(p_1, p_2 >2\) for \(n=2\) and \(p_1\ge 4\), \(p_2> 4\) for \(n=3\). These estimates remain valid when replacing \(S_h\) by \(Z_h\).

Proof

We start with the estimate in the \(H^1(\varGamma )\)-norm. With the triangle inequality and an inverse estimate we obtain

$$\begin{aligned} \Vert S_h(u)\Vert _{H^1(\varGamma )}&\le c \left( \Vert S(u) - S_h (u)\Vert _{H^1(\varGamma )} + \Vert S(u)\Vert _{H^1(\varGamma )} \right) \\&\le c\,\big ( \Vert S(u) - \varPi _h S(u)\Vert _{H^1(\varGamma )} + h^{-1} \Vert S(u) - \varPi _h S(u)\Vert _{L^2(\varGamma )} \\&\quad + h^{-1}\,\Vert S(u) - S_h(u)\Vert _{L^2(\varGamma )} + \Vert S(u)\Vert _{H^1(\varGamma )}\big ). \end{aligned}$$

The first two terms are bounded by the last one due to (20) and it remains to apply the stability estimate from Lemma 6. For the third term we apply the error estimate from Lemma  13. This implies the first estimate.

We prove the maximum norm estimate only for the case \(n=3\). In the following, we write \(y_h := S_h(u)\). We introduce the function \({\tilde{y}}\in H^1(\varOmega )\) solving the problem

$$\begin{aligned} -\varDelta {\tilde{y}} + {\tilde{y}} = f\ \text{ in }\ \varOmega ,\qquad \partial _n \tilde{y} = g-u\,y_h\ \text{ on }\ \varGamma . \end{aligned}$$

Obviously, \(y_h\) is the Neumann Ritz-projection of \({\tilde{y}}\), i. e.,

$$\begin{aligned} a^{\text{ N }}(y_h - {\tilde{y}},v_h)=\int _\varOmega \left( \nabla (y_h-{\tilde{y}})\cdot \nabla v_h + (y_h-{\tilde{y}})\,v_h \right) = 0\quad \text{ for } \text{ all }\ v_h\in V_h. \end{aligned}$$

Let \(x^*\in {\bar{T}}^*\) with \(T^*\in \mathcal T_h\) be the point where \(|y_h|\) attains its maximum. With an inverse inequality and the Hölder inequality we get

$$\begin{aligned} \Vert y_h\Vert _{L^\infty (\varOmega )}&= |y_h(x^*)| \le c\,|T^*|^{-1}\,\Vert y_h\Vert _{L^1(T^*)} \nonumber \\&\le c \left( |T^*|^{-1}\,\Vert {\tilde{y}}-y_h\Vert _{L^1(T^*)} + \Vert {\tilde{y}}\Vert _{L^\infty (T^*)}\right) \nonumber \\&= c\,(\delta ^h,{\tilde{y}}-y_h)_{L^2(\varOmega )} + c\,\Vert {\tilde{y}}\Vert _{L^\infty (\varOmega )}, \end{aligned}$$
(28)

where \(\delta ^h\) is a regularized delta function defined by \(\delta ^h(x) = |T^*|^{-1}{{\,\mathrm{sgn}\,}}({\tilde{y}}(x)-y_h(x))\) if \(x\in T^*\) and \(\delta ^h(x)=0\) otherwise. The second term on the right-hand side can be treated with the arguments used already in the proof of Lemma 3b), namely

$$\begin{aligned} \Vert {\tilde{y}}\Vert _{L^\infty (\varOmega )}&\le c\left( \Vert f\Vert _{L^r(\varOmega )} + \Vert g\Vert _{L^s(\varGamma )} + \Vert u\,y_h\Vert _{L^{s}(\varGamma )} \right) \end{aligned}$$

with \(r>3/2\) and \(s=2+\varepsilon\) with \(\varepsilon >0\) sufficiently small such that the following arguments remain valid. We estimate the last term with the Hölder inequality for \(p_2=4\,(2+\varepsilon )/(2-\varepsilon )\) and \(p'=4\) (note that \(1/{p_2}+1/p'=1/s\)) and the embedding \(H^1(\varOmega )\hookrightarrow L^4(\varGamma )\). This yields

$$\begin{aligned} \Vert u\,y_h\Vert _{L^{s}(\varGamma )} \le c\,\Vert u\Vert _{L^{p_2}(\varGamma )}\,\Vert y_h\Vert _{L^{4}(\varGamma )} \le c\,\Vert u\Vert _{L^{p_2}(\varGamma )}\,\Vert y_h\Vert _{H^1(\varOmega )}. \end{aligned}$$
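For the reader's convenience, the exponent relation \(1/{p_2}+1/p'=1/s\) used in this Hölder inequality can be verified directly:

$$\begin{aligned} \frac{1}{p_2} + \frac{1}{p'} = \frac{2-\varepsilon }{4\,(2+\varepsilon )} + \frac{2+\varepsilon }{4\,(2+\varepsilon )} = \frac{4}{4\,(2+\varepsilon )} = \frac{1}{2+\varepsilon } = \frac{1}{s}. \end{aligned}$$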

It remains to exploit stability of \(S_h\) in the \(H^1(\varOmega )\)-norm to conclude

$$\begin{aligned} \Vert {\tilde{y}}\Vert _{L^\infty (\varOmega )} \le c\,(1+\Vert u\Vert _{L^{p_2}(\varGamma )}). \end{aligned}$$
(29)

The estimate for the first term on the right-hand side of (28) is based on the ideas from [40, Section 3.6]. First, we introduce a regularized Green’s function \(g^h\in H^1(\varOmega )\) solving the variational problem \(a^{\text{ N }}(z,g^h) = (\delta ^h,z)_{L^2(\varOmega )}\) for all \(z\in H^1(\varOmega )\). The Neumann Ritz-projection of \(g^h\) is denoted by \(g_h^h\). Using the Galerkin orthogonality we obtain

$$\begin{aligned} (\delta ^h,{\tilde{y}}-y_h)&= a^{\text{ N }}({\tilde{y}}-y_h,g^h) = a^{\text{ N }}({\tilde{y}}-\varPi _h {\tilde{y}},g^h-g_h^h)\nonumber \\&\le c\,h^{1/2}\,\Vert {\tilde{y}}\Vert _{H^{3/2}(\varOmega )}\,\Vert g^h\Vert _{H^1(\varOmega )}, \end{aligned}$$
(30)

where the last step follows from the stability of the Ritz projection and the interpolation error estimate (19). To bound the \(H^1(\varOmega )\)-norm of \(g^h\) we apply the ellipticity of \(a^{\text{ N }}\), the definition of \(g^h\), the Hölder inequality and an embedding to arrive at

$$\begin{aligned} c\,\Vert g^h\Vert _{H^1(\varOmega )}^2&\le a^{\text{ N }}(g^h,g^h) =(\delta ^h,g^h)_{L^2(\varOmega )} \\&\le c\,\Vert \delta ^h\Vert _{L^{6/5}(\varOmega )}\,\Vert g^h\Vert _{L^6(\varOmega )} \le c\,h^{-1/2}\,\Vert g^h\Vert _{H^1(\varOmega )}. \end{aligned}$$

The last step follows from the property \(\Vert \delta ^h\Vert _{L^{6/5}(\varOmega )}\le c\,|T^*|^{-1/6}\le c\,h^{-1/2}\) that can be confirmed with a simple computation. Insertion into (30) and taking into account (28) and (29) yields the desired stability estimate.
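The simple computation referred to above reads as follows, assuming (as is common for the inverse estimates employed here) a quasi-uniform mesh so that \(|T^*|\ge c\,h^3\) for \(n=3\):

$$\begin{aligned} \Vert \delta ^h\Vert _{L^{6/5}(\varOmega )} = \Big (\int _{T^*}|T^*|^{-6/5}\Big )^{5/6} = \left( |T^*|^{-1/5}\right) ^{5/6} = |T^*|^{-1/6} \le c\,h^{-1/2}. \end{aligned}$$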

The estimates for \(Z_h\) follow in a similar way. One just has to replace f by \(S_h(u)-y_d\) and the result follows from the estimates proved already for \(S_h(u)\). \(\square\)

Lemma 15

Assume that \(u,v\in L^2(\varGamma )\) satisfy the assumption (2). Then, the Lipschitz estimate

$$\begin{aligned} \Vert S_h(u) - S_h(v)\Vert _{H^1(\varOmega )} \le c\,\Vert u-v\Vert _{L^2(\varGamma )} \end{aligned}$$

holds.

Proof

The proof follows with the same arguments as in the continuous case, see Lemmata  2 and 8. \(\square\)

Next, we discuss some error estimates for the approximation of the control-to-state and control-to-adjoint operator. While error estimates for \(S_h\) and \(Z_h\) are a direct consequence of Lemma 12, the results for the linearized operators \(S_h'\) and \(Z_h'\) require some more effort as for instance \(S'(u)\delta u - S_h'(u)\delta u\) does not fulfill the Galerkin orthogonality.

Lemma 16

For each \(u\in U_{ad}\) and \(\delta u\in L^2(\varGamma )\) the error estimates

$$\begin{aligned} \Vert S(u) - S_h(u)\Vert _{H^1(\varOmega )}&\le c\,h^{1/2}\,(1+\Vert u\Vert _{L^p(\varGamma )}),\\ \Vert S'(u)\delta u - S_h'(u)\delta u \Vert _{H^1(\varOmega )}&\le c\,h^{1/2}\,(1+\Vert u\Vert _{L^p(\varGamma )})^3\,\Vert \delta u\Vert _{L^2(\varGamma )} \end{aligned}$$

are valid for \(p>2\) for \(n=2\) and \(p\ge 4\) for \(n=3\). The results are also valid when replacing S and \(S_h\) by Z and \(Z_h\), as well as \(S'\) and \(S_h'\) by \(Z'\) and \(Z_h'\), respectively.

Proof

The first estimate is just a combination of the Lemmata 6 and 12. To show the estimate for the linearized operators we introduce again the abbreviations \(y:=S(u)\), \(y_h:=S_h(u)\), \(\delta y := S'(u)\delta u\) and \(\delta y_h:= S_h'(u)\delta u\). Moreover, define the auxiliary function \(\delta {\tilde{y}}_h\in V_h\) as the solution of

$$\begin{aligned} a_u(\delta {\tilde{y}}_h,v_h) = (y\,\delta u,v_h)_{L^2(\varGamma )}\qquad \forall v_h\in V_h. \end{aligned}$$

This function fulfills the Galerkin orthogonality, i. e., \(a_u(\delta y-\delta {\tilde{y}}_h,v_h) = 0\) for all \(v_h\in V_h\). Hence, we obtain with Lemma 12 and the Lipschitz property from Lemma 2 (note that this lemma is also valid for the discrete solutions)

$$\begin{aligned} \Vert \delta y - \delta y_h\Vert _{H^1(\varOmega )}&\le c\left( \Vert \delta y - \delta {\tilde{y}}_h\Vert _{H^1(\varOmega )} + \Vert \delta {\tilde{y}}_h - \delta y_h\Vert _{H^1(\varOmega )} \right) \\&\le c\left( h^{1/2}\,\Vert \delta y\Vert _{H^{3/2}(\varOmega )} + \Vert \delta u\,(y-y_h)\Vert _{H^{-1/2}(\varGamma )}\right) . \end{aligned}$$

For the first term we simply insert the second estimate from Lemma 7. The second term on the right-hand side is further estimated by means of [20, Theorem 1.4.4.2] and a trace theorem which yield

$$\begin{aligned} \Vert \delta u\,(y-y_h)\Vert _{H^{-1/2}(\varGamma )} \le c\,\Vert \delta u\Vert _{L^2(\varGamma )} \,\Vert y-y_h\Vert _{H^1(\varOmega )}, \end{aligned}$$

and the assertion follows after an application of the estimate shown already for \(S(u)-S_h(u)\). The estimates for Z and \(Z'\) follow with similar arguments. \(\square\)

5.2 Convergence of the fully discrete solutions

Throughout this subsection we assume that the properties (A1) and (A2) are fulfilled. These assumptions are needed to guarantee the required regularity of the solution and the validity of interpolation error estimates.

As the solutions of both the continuous and the discrete optimal control problems (5) and (23), respectively, are not unique, we have to construct a sequence of discrete local solutions converging towards a continuous one. The first question which arises is whether such a sequence exists. To this end, we introduce the localized problem

$$\begin{aligned} j_h(u_h)\rightarrow \min ! \quad \text{ s. } \text{ t. }\ u_h\in U_h^{ad}\cap B_\varepsilon ({\bar{u}}), \end{aligned}$$
(31)

where \({\bar{u}}\in U_{ad}\) is a fixed local solution of (5) fulfilling Assumption 1 and \(B_\varepsilon (\bar{u})\) is the \(L^2(\varGamma )\)-ball with radius \(\varepsilon\) around \({\bar{u}}\). The parameter \(\varepsilon >0\) is arbitrary but sufficiently small. First, we show that this problem possesses a unique local solution, which would immediately follow if we could show that the coercivity discussed in Corollary 1 is transferred to the discrete case. The following arguments are similar to the investigations in [11], in particular Theorems 4.4 and 4.5 therein.

Lemma 17

Let \({\bar{u}}\in U_{ad}\) be a local solution of (5). Assume that \(\varepsilon >0\) and \(h>0\) are sufficiently small. Then, the inequality

$$\begin{aligned} j_h''(u)\delta u^2\ge \frac{\delta }{4}\Vert \delta u\Vert _{L^2(\varGamma )}^2 \end{aligned}$$

is valid for all \(u\) satisfying \(\Vert u-{\bar{u}}\Vert _{L^2(\varGamma )} \le \varepsilon\).

Proof

With the explicit representations of \(j''\) and \(j_h''\) from (14) and (26), respectively, and Corollary 1, we obtain

$$\begin{aligned} \frac{\delta }{2} \Vert \delta u\Vert _{L^2(\varGamma )}^2&\le \left( j''(u)\delta u^2 - j_h''(u)\delta u^2\right) + j_h''(u)\delta u^2 \nonumber \\&\le \Big ( \Vert y_h\, \delta p_h - y\, \delta p\Vert _{L^2(\varGamma )} + \Vert \delta y_h\, p_h - \delta y\,p\Vert _{L^2(\varGamma )} \Big )\,\Vert \delta u\Vert _{L^2(\varGamma )} + j_h''(u)\delta u^2, \end{aligned}$$
(32)

with \(y=S(u)\), \(p = Z(u)\), \(\delta y= S'(u)\delta u\) and \(\delta p=Z'(u)\delta u\), and the discrete analogues \(y_h=S_h(u)\), \(p_h = Z_h(u)\), \(\delta y_h= S_h'(u)\delta u\) and \(\delta p_h=Z_h'(u)\delta u\). It remains to bound the two norms in parentheses appropriately. Therefore, we apply the triangle inequality, the stability properties for \(S'\), \(S_h\), \(Z'\) and \(Z_h\) from Lemmata 6, 7 and 14 as well as the error estimates from Lemma 16. Note that the control bounds provide the regularity for \(u\) that is required for these estimates. As a consequence we obtain

$$\begin{aligned}&\Vert y_h\, \delta p_h - y\, \delta p\Vert _{L^2(\varGamma )} \\&\quad \le c\left( \Vert y-y_h\Vert _{H^1(\varOmega )}\,\Vert \delta p\Vert _{H^1(\varOmega )} + \Vert \delta p-\delta p_h\Vert _{H^1(\varOmega )}\,\Vert y_h\Vert _{H^1(\varOmega )}\right) \\&\quad \le c\,h^{1/2}\,\Vert \delta u\Vert _{L^2(\varGamma )}. \end{aligned}$$
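Here the first inequality results from the splitting

$$\begin{aligned} y_h\, \delta p_h - y\, \delta p = (y_h-y)\,\delta p + y_h\,(\delta p_h - \delta p), \end{aligned}$$

combined with the Hölder inequality, the trace theorem and the embedding \(H^1(\varOmega )\hookrightarrow L^4(\varGamma )\).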

With similar arguments we can show

$$\begin{aligned}&\Vert \delta y_h\, p_h - \delta y \,p\Vert _{L^2(\varGamma )} \\&\quad \le c \left( \Vert \delta y - \delta y_h\Vert _{H^1(\varOmega )}\,\Vert p_h\Vert _{H^1(\varOmega )} + \Vert p - p_h\Vert _{H^1(\varOmega )}\,\Vert \delta y\Vert _{H^1(\varOmega )}\right) \\&\quad \le c\, h^{1/2}\,\Vert \delta u\Vert _{L^2(\varGamma )}. \end{aligned}$$

The previous two estimates together with (32) imply

$$\begin{aligned} \frac{\delta }{2}\, \Vert \delta u\Vert _{L^2(\varGamma )}^2 \le c\,h^{1/2}\,\Vert \delta u\Vert _{L^2(\varGamma )}^2 + j_h''(u)\delta u^2. \end{aligned}$$

Choosing h sufficiently small such that \(c\, h^{1/2} \le \frac{\delta }{4}\) leads to the assertion. \(\square\)

Theorem 3

Let \({\bar{u}}\in U_{ad}\) be a local solution of (5) satisfying Assumption 1. Assume that \(\varepsilon >0\) and \(h_0>0\) are sufficiently small. Then, the auxiliary problem (31) possesses a unique solution for each \(h\le h_0\), denoted by \({\bar{u}}_h^\varepsilon\), and there holds

$$\begin{aligned} \lim _{h\rightarrow 0} \Vert {\bar{u}}-{\bar{u}}_h^\varepsilon \Vert _{L^2(\varGamma )} = 0. \end{aligned}$$

Proof

The existence of at least one solution of (31) follows immediately from the compactness and non-emptiness of \(U_h^{ad}\cap B_\varepsilon ({\bar{u}})\). Note that the \(L^2(\varGamma )\)-projection \(Q_h {\bar{u}}\) of \(\bar{u}\) onto \(U_h\), defined in (81) in Appendix 2, belongs to \(U_h^{ad}\cap B_{\varepsilon }({\bar{u}})\) provided that \(h>0\) is sufficiently small. This means that the feasible set is not empty. Due to Lemma 17 this solution is unique.

Moreover, the family \(\{{\bar{u}}_h^\varepsilon \}_{h\le h_0}\) is bounded and hence, a weakly convergent sequence \(\{{\bar{u}}_{h_k}^\varepsilon \}_{k\in {{\mathbb {N}}}}\) with \(h_k \searrow 0\) exists. The weak limit is denoted by \({\tilde{u}}\in L^2(\varGamma )\) and from the convexity and closedness of the feasible set we deduce \({\tilde{u}}\in U_{ad}\cap B_\varepsilon ({\bar{u}})\). Without loss of generality it is assumed that \({\bar{u}}_h^\varepsilon \rightharpoonup {\tilde{u}}\) in \(L^2(\varGamma )\) as \(h\searrow 0\).

Next, we show that \({\tilde{u}}\) is a local minimum of the continuous problem. To this end, we first show the convergence of the corresponding states, which follows with the arguments from [10]. We employ the triangle inequality to get

$$\begin{aligned} \Vert S({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon )\Vert _{H^1(\varOmega )} \le c\left( \Vert S({\tilde{u}}) - S_h({\tilde{u}})\Vert _{H^1(\varOmega )} + \Vert S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon )\Vert _{H^1(\varOmega )} \right) . \end{aligned}$$
(33)

For the first term on the right-hand side we exploit convergence of the finite element method proved in Lemma 16 which yields

$$\begin{aligned} \Vert S({\tilde{u}}) - S_h({\tilde{u}})\Vert _{H^1(\varOmega )}\rightarrow 0,\quad h\searrow 0. \end{aligned}$$

With similar arguments as in the proof of Lemma 2 we moreover deduce

$$\begin{aligned} \Vert S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon )\Vert _{H^1(\varOmega )}^2&= -({\tilde{u}}\,S_h({\tilde{u}}) - {\bar{u}}_h^\varepsilon \,S_h({\bar{u}}_h^\varepsilon ), S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon ))_{L^2(\varGamma )} \\&= -(({\tilde{u}}-{\bar{u}}_h^\varepsilon )\,S_h({\tilde{u}}), S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon ))_{L^2(\varGamma )} \\&\quad - \int _\varGamma {\bar{u}}_h^\varepsilon \,(S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon ))^2. \end{aligned}$$

The integral term on the right-hand side is non-negative due to the lower control bounds \({\bar{u}}_h^\varepsilon \ge u_a\ge 0\). We can bound the first term on the right-hand side with the Cauchy–Schwarz inequality and the multiplication rule from [20, Theorem 1.4.4.2] which provides

$$\begin{aligned}&\Vert S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon )\Vert _{H^1(\varOmega )}^2 \\&\quad \le \Vert {\tilde{u}} - {\bar{u}}_h^\varepsilon \Vert _{H^{-s}(\varGamma )} \, \Vert S_h({\tilde{u}})\Vert _{H^1(\varGamma )} \, \Vert S_h({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon )\Vert _{H^1(\varOmega )} \end{aligned}$$

for arbitrary \(s\in (0,1/2)\). Note that there holds \(\Vert {\tilde{u}} - {\bar{u}}_h^\varepsilon \Vert _{H^{-s}(\varGamma )}\rightarrow 0\) for \(h\searrow 0\) due to the compact embedding \(L^2(\varGamma )\hookrightarrow H^{-s}(\varGamma )\), \(s>0\). It remains to bound the second factor on the right-hand side by an application of Lemma 14 and to divide the whole estimate by the third factor. After insertion of this estimate into (33) we obtain the strong convergence of the states, that is,

$$\begin{aligned} \Vert S({\tilde{u}}) - S_h({\bar{u}}_h^\varepsilon )\Vert _{H^1(\varOmega )} \rightarrow 0\quad \text{ for }\quad h\searrow 0. \end{aligned}$$
(34)

Next, we show that \({\tilde{u}}\) is a local solution of the continuous problem (5). To this end we exploit (34) and the weak lower semi-continuity of the norm to arrive at

$$\begin{aligned} j({\tilde{u}}) \le \liminf _{h\searrow 0} j_h({\bar{u}}_h^\varepsilon ) \le \limsup _{h\searrow 0} j_h({\bar{u}}_h^\varepsilon ) \le \limsup _{h\searrow 0} j_h(Q_h {\bar{u}}) \le j({\bar{u}}). \end{aligned}$$
(35)

The second to last step follows from the optimality of \({\bar{u}}_h^\varepsilon\) for (31) and the admissibility of the \(L^2(\varGamma )\)-projection \(Q_h {\bar{u}}\) for sufficiently small \(h>0\). The last step follows from the strong convergence of the \(L^2(\varGamma )\)-projection \(Q_h\) in \(L^2(\varGamma )\). Note that this implies \(\lim _{h\searrow 0} \Vert S_h(Q_h {\bar{u}}) - S({\bar{u}})\Vert _{L^2(\varOmega )} = 0\). Due to Assumption 1 the solution \({\bar{u}}\) is unique within \(B_\varepsilon ({\bar{u}})\) when \(\varepsilon > 0\) is sufficiently small. This implies \({\tilde{u}} = {\bar{u}}\). Note that all “\(\le\)” signs in (35) then turn to “\(=\)” signs.

To conclude the strong convergence of the sequence \(\{{\bar{u}}_h^\varepsilon \}_{h>0}\) we show additionally the convergence of the norms. This follows from (35) and the strong convergence of the states from which we infer

$$\begin{aligned} \frac{\alpha }{2}\lim _{h\searrow 0} \Vert {\bar{u}}_h^\varepsilon \Vert _{L^2(\varGamma )}^2&= \lim _{h\searrow 0}\left( j_h({\bar{u}}_h^\varepsilon ) - \frac{1}{2}\Vert S_h({\bar{u}}_h^\varepsilon ) - y_d\Vert _{L^2(\varOmega )}^2\right) \\&= j({\bar{u}}) - \frac{1}{2} \Vert S({\bar{u}}) - y_d\Vert _{L^2(\varOmega )}^2 = \frac{\alpha }{2} \Vert {\bar{u}}\Vert _{L^2(\varGamma )}^2. \end{aligned}$$

\(\square\)

The previous theorem guarantees that every local solution \({\bar{u}}\in U_{ad}\) satisfying the second-order sufficient condition in Assumption 1 can be approximated by a sequence of local solutions of the discretized problems (31). Due to \({\bar{u}}_h^\varepsilon \in B_\varepsilon ({\bar{u}})\) and \({\bar{u}}_h^\varepsilon \rightarrow {\bar{u}}\) for \(h\searrow 0\) (i. e., the constraint \({\bar{u}}_h^\varepsilon \in B_\varepsilon ({\bar{u}})\) is not active for sufficiently small h), the functions \({\bar{u}}_h^\varepsilon\) are local solutions of the discrete problems (23) provided that \(h>0\) is small enough. Hence, we neglect the superscript \(\varepsilon\) in the following and denote by \({\bar{u}}_h\) the sequence of discrete local solutions converging to the local solution \({\bar{u}}\).

Next, we show linear convergence of the sequence \({\bar{u}}_h\).

Theorem 4

Let \({\bar{u}}\in U_{ad}\) be a local solution of (5) which fulfills Assumption 1, and let \(\{{\bar{u}}_h\}_{h>0}\) be local solutions of (23) with \({\bar{u}}_h\rightarrow {\bar{u}}\) for \(h\searrow 0\). Then, the error estimate

$$\begin{aligned} \Vert {\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )} \le \frac{c}{\sqrt{\delta }} h \end{aligned}$$

holds.

Proof

Let \(\xi = {\bar{u}}+t({\bar{u}}_h-{\bar{u}})\) with \(t\in (0,1)\). From Corollary 1 we obtain for sufficiently small h the estimate

$$\begin{aligned} \frac{\delta }{2} \Vert {\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )}^2&\le j''(\xi )({\bar{u}}-{\bar{u}}_h)^2\\&= j'({\bar{u}})({\bar{u}}-{\bar{u}}_h) - j'({\bar{u}}_h)({\bar{u}}-{\bar{u}}_h), \end{aligned}$$

where the last step follows from the mean value theorem applied to \(j'\), for an appropriate \(t\in (0,1)\) in the definition of \(\xi\). Next, we confirm with the first-order optimality conditions that

$$\begin{aligned} j'({\bar{u}})({\bar{u}}-{\bar{u}}_h) \le 0 \le j_h'({\bar{u}}_h)(Q_h {\bar{u}}-{\bar{u}}_h) \end{aligned}$$

with the \(L^2(\varGamma )\)-projection \(Q_h\) onto \(U_h\). Note that the property \(Q_h{\bar{u}}\in U_h^{ad}\) is trivially satisfied since \(Q_h\) is an elementwise averaging and thus preserves the control bounds. Insertion into the inequality above, together with the splitting \(j'({\bar{u}}_h)({\bar{u}}-{\bar{u}}_h)=j'({\bar{u}}_h)({\bar{u}}-Q_h{\bar{u}})+j'({\bar{u}}_h)(Q_h{\bar{u}}-{\bar{u}}_h)\), leads to

$$\begin{aligned} \frac{\delta }{2} \Vert {\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )}^2 \le (j_h'({\bar{u}}_h) - j'({\bar{u}}_h))(Q_h {\bar{u}}-{\bar{u}}_h) - j'({\bar{u}}_h)({\bar{u}}-Q_h {\bar{u}}). \end{aligned}$$
(36)

An estimate for the second part follows from the orthogonality of the \(L^2(\varGamma )\)-projection, that is,

$$\begin{aligned} j'({\bar{u}}_h)({\bar{u}}-Q_h {\bar{u}})&= (\alpha \,{\bar{u}}_h - S({\bar{u}}_h)\,Z({\bar{u}}_h), {\bar{u}}-Q_h {\bar{u}})_{L^2(\varGamma )} \nonumber \\&= (Q_h(S({\bar{u}}_h)\,Z({\bar{u}}_h)) - S({\bar{u}}_h)\,Z({\bar{u}}_h), {\bar{u}}-Q_h {\bar{u}})_{L^2(\varGamma )} \nonumber \\&\le c\,h^2\,\Vert S({\bar{u}}_h)\,Z({\bar{u}}_h)\Vert _{H^1(\varGamma )}\,\Vert {\bar{u}}\Vert _{H^1(\varGamma )}. \end{aligned}$$
(37)

Furthermore, we exploit the Leibniz rule and the stability properties for S and Z from Lemma 6 to obtain

$$\begin{aligned} \Vert S({\bar{u}}_h)\,Z({\bar{u}}_h)\Vert _{H^1(\varGamma )}&\le c\Big (\Vert S({\bar{u}}_h)\Vert _{H^1(\varGamma )} \,\Vert Z({\bar{u}}_h)\Vert _{L^\infty (\varOmega )}\nonumber \\&\quad + \Vert S({\bar{u}}_h)\Vert _{L^\infty (\varOmega )}\, \Vert Z({\bar{u}}_h)\Vert _{H^1(\varGamma )}\Big ) \le c. \end{aligned}$$
(38)

Next, we discuss the first term on the right-hand side of (36). Insertion of the definition of \(j_h'\) and \(j'\) and the stability of \(Q_h\) yield

$$\begin{aligned}&(j_h'({\bar{u}}_h) - j'({\bar{u}}_h))(Q_h {\bar{u}}-{\bar{u}}_h)\\&\quad =(S({\bar{u}}_h)\,Z({\bar{u}}_h) - S_h({\bar{u}}_h)\,Z_h({\bar{u}}_h), Q_h({\bar{u}}-{\bar{u}}_h))_{L^2(\varGamma )} \\&\quad \le c\,\Big (\Vert Z({\bar{u}}_h)\Vert _{L^\infty (\varGamma )}\, \Vert S({\bar{u}}_h) - S_h({\bar{u}}_h)\Vert _{L^2(\varGamma )} \nonumber \\&\qquad + \Vert S_h({\bar{u}}_h)\Vert _{L^\infty (\varGamma )}\, \Vert Z({\bar{u}}_h) - Z_h({\bar{u}}_h)\Vert _{L^2(\varGamma )}\Big )\, \Vert {\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )} \\&\quad \le c\,h\left( \Vert Z({\bar{u}}_h)\Vert _{L^\infty (\varGamma )}\,\Vert S({\bar{u}}_h)\Vert _{H^{3/2}(\varOmega )} + \Vert S_h({\bar{u}}_h)\Vert _{L^\infty (\varGamma )}\,\Vert Z({\bar{u}}_h)\Vert _{H^{3/2}(\varOmega )}\right) \\&\qquad \times \Vert {\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )}. \end{aligned}$$

In the last step we inserted the finite element error estimates from Lemma 13. Exploiting also the stability estimates from Lemmata 6 and  14 we obtain

$$\begin{aligned} (j_h'({\bar{u}}_h) - j'({\bar{u}}_h))(Q_h {\bar{u}}-{\bar{u}}_h) \le c\,h\,\Vert {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )}. \end{aligned}$$

Together with (36), (37) and (38) we arrive at the assertion. \(\square\)

5.3 Postprocessing approach

In this section we consider the so-called postprocessing approach introduced in [33]. The basic idea is to compute an “improved” control \({\tilde{u}}_h\) by a pointwise evaluation of the projection formula, i. e.,

$$\begin{aligned} {\tilde{u}}_h := \varPi _{{\text {ad}}}\left( \frac{1}{\alpha }[{\bar{y}}_h\,{\bar{p}}_h]_\varGamma \right) , \end{aligned}$$
(39)

where \({\bar{y}}_h\) and \({\bar{p}}_h\) are the discrete state and adjoint state, respectively, obtained by the full discretization approach discussed in Sect. 5.2. As we require higher regularity of the exact solution in order to observe a higher convergence rate than for the full discretization approach, we replace (A1) by the stronger assumption

(A1’):

The domain \(\varOmega\) is planar and its boundary is globally \(C^3\).

The most technical part of convergence proofs for this approach is the proof of \(L^2\)-norm estimates for the state variables. This is usually done by considering the following three terms separately:

$$\begin{aligned} \Vert {\bar{y}}-{\bar{y}}_h\Vert _{L^2(\varGamma )}&\le c \Big (\Vert {\bar{y}}-S_h({\bar{u}})\Vert _{L^2(\varGamma )} + \Vert S_h({\bar{u}}) - S_h(R_h {\bar{u}})\Vert _{L^2(\varGamma )} \nonumber \\&\quad + \Vert S_h(R_h {\bar{u}}) - {\bar{y}}_h\Vert _{L^2(\varGamma )}\Big ). \end{aligned}$$
(40)

In [33] \(R_h:C(\varGamma )\rightarrow U_h\) is chosen as the midpoint interpolant. We will construct and investigate such an operator in Appendix 1. Note that the definition of a midpoint interpolant on curved elements is not straightforward. The first term on the right-hand side of (40) is a finite element error in the \(L^2(\varGamma )\)-norm. We collect the required estimates in the following lemma.

Lemma 18

For all \(q<\infty\) there hold the estimates

$$\begin{aligned} \Vert {\bar{y}} - S_h({\bar{u}})\Vert _{L^2(\varGamma )}&\le c\,h^{2-2/q}\,|\ln h|\, \Vert {\bar{y}}\Vert _{W^{2,q}(\varOmega )} \\ \Vert {\bar{p}} - Z_h({\bar{u}})\Vert _{L^2(\varGamma )}&\le c\,h^{2-2/q}\,|\ln h|\, \left( \Vert {\bar{p}}\Vert _{W^{2,q}(\varOmega )} + \Vert {\bar{y}}\Vert _{H^2(\varOmega )}\right) . \end{aligned}$$

Proof

The first estimate follows from the Hölder inequality and the maximum norm estimate derived in Theorem 2. The second estimate requires an intermediate step. We denote by \(p^h({\bar{u}})\in V_h\) the solution of the equation

$$\begin{aligned} a_{{\bar{u}}}(p^h({\bar{u}}),v_h) = (S({\bar{u}})-y_d,v_h)_{L^2(\varOmega )}\qquad \forall v_h\in V_h. \end{aligned}$$

As \(p^h({\bar{u}})\) is the Ritz-projection of \({\bar{p}}\) we can apply Theorem 2 again and obtain

$$\begin{aligned} \Vert {\bar{p}} - p^h({\bar{u}})\Vert _{L^2(\varGamma )} \le c\,h^{2-2/q}\,|\ln h|\,\Vert {\bar{p}}\Vert _{W^{2,q}(\varOmega )}. \end{aligned}$$

To show an estimate for the error between \(p^h({\bar{u}})\) and \(Z_h({\bar{u}})\) we test the equations defining both functions by \(v_h = p^h({\bar{u}}) - Z_h(\bar{u})\), compare the proof of Lemma 2. Together with the non-negativity of \({\bar{u}}\) we obtain

$$\begin{aligned}&\Vert p^h({\bar{u}})-Z_h({\bar{u}})\Vert _{H^1(\varOmega )}^2\\&\quad = -\int _\varGamma {\bar{u}}\,(p^h({\bar{u}}) - Z_h({\bar{u}}))^2 + (S({\bar{u}}) - S_h({\bar{u}}),p^h({\bar{u}})-Z_h({\bar{u}}))_{L^2(\varOmega )} \\&\quad \le c\,h^2\,\Vert {\bar{y}}\Vert _{H^2(\varOmega )}\,\Vert p^h({\bar{u}})-Z_h({\bar{u}})\Vert _{H^1(\varOmega )}. \end{aligned}$$

The last step follows from the estimate \(\Vert S({\bar{u}}) - S_h({\bar{u}})\Vert _{L^2(\varOmega )} \le c\,h^2\,\Vert S({\bar{u}})\Vert _{H^2(\varOmega )}\) which is a consequence of the Aubin-Nitsche trick. With the triangle inequality we conclude the desired estimate for the discrete control-to-adjoint operator. \(\square\)

To obtain an optimal error estimate for the second term we need an additional assumption which is used in all contributions studying the postprocessing approach. To this end, define the subsets \({\mathcal {K}}_2:=\cup \{{\bar{E}}:E\in {\mathcal {E}}_h,\ E\subset {\mathcal {A}},\ \text{ or }\ E\subset {\mathcal {I}}\}\) and \({\mathcal {K}}_1:=\varGamma \setminus {\mathcal {K}}_2\). In the following we will assume that \({\mathcal {K}}_1\) satisfies

$$\begin{aligned} |{\mathcal {K}}_1|\le c\,h. \end{aligned}$$
(41)

The idea behind this assumption is that the control can only switch between the active and the inactive set on \({\mathcal {K}}_1\). The regularity of the control is reduced only at these switching points, see also Lemma 11. In general one can expect that the control switches at finitely many points only; if there are \(m\) such points, then \({\mathcal {K}}_1\) consists of at most \(2m\) boundary elements and hence \(|{\mathcal {K}}_1|\le c\,h\). Thus, the assumption (41) is not very restrictive.

As an intermediate result required to prove estimates for \(Z_h({\bar{u}})-Z_h(R_h{\bar{u}})\) in \(L^2(\varGamma )\), we need an estimate for \(S_h({\bar{u}}) - S_h(R_h{\bar{u}})\) in \(L^2(\varOmega )\).

Lemma 19

For all \(q<\infty\) there holds the estimate

$$\begin{aligned} \Vert S_h({\bar{u}}) - S_h(R_h {\bar{u}})\Vert _{L^2(\varOmega )} \le c\,h^{2-2/q}\left( 1 + \Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) . \end{aligned}$$

Proof

To shorten the notation we write \(e_h:= S_h({\bar{u}}) - S_h(R_h {\bar{u}})\). Moreover, we introduce the function \(w\in H^1(\varOmega )\) solving the equation

$$\begin{aligned} a_{{\bar{u}}}(v,w) = (e_h,v)_{L^2(\varOmega )}\qquad \forall v\in H^1(\varOmega ). \end{aligned}$$
(42)

This implies

$$\begin{aligned} \Vert e_h\Vert _{L^2(\varOmega )}^2 = a_{{\bar{u}}}(e_h,w-\varPi _h w) + a_{{\bar{u}}}(e_h,\varPi _h w). \end{aligned}$$
(43)

Next, we discuss both terms on the right-hand side separately. The first one is treated with the Cauchy–Schwarz inequality and the interpolation error estimate (19). These arguments lead to

$$\begin{aligned} a_{{\bar{u}}}(e_h,w-\varPi _h w) \le c\,h\,\Vert e_h\Vert _{H^1(\varOmega )}\,\Vert e_h\Vert _{L^2(\varOmega )}. \end{aligned}$$

The \(H^1(\varOmega )\)-norm of \(e_h\) is further estimated by the Lipschitz property from Lemma 15 and the interpolation error estimate for the midpoint interpolant from Lemma 25. This yields

$$\begin{aligned} \Vert e_h\Vert _{H^1(\varOmega )} \le c\,\Vert {\bar{u}} - R_h{\bar{u}}\Vert _{L^2(\varGamma )} \le c\,h\,\Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} \end{aligned}$$

for all \(q\ge 2\).

Insertion into the estimate above, taking into account the stability estimates from Lemma 14, yields

$$\begin{aligned} a_{{\bar{u}}}(e_h,w-\varPi _h w) \le c h^{2} \Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} \Vert e_h\Vert _{L^2(\varOmega )}. \end{aligned}$$
(44)

Next, we consider the second term on the right-hand side of (43). After a reformulation by means of the definition of \(S_h\) we get

$$\begin{aligned} a_{{\bar{u}}}(e_h,\varPi _h w)&= a_{{\bar{u}}} (S_h({\bar{u}}),\varPi _h w) - a_{{\bar{u}}}(S_h(R_h{\bar{u}}), \varPi _h w) \nonumber \\&= a_{R_h{\bar{u}}} (S_h(R_h {\bar{u}}),\varPi _h w) - a_{{\bar{u}}}(S_h(R_h{\bar{u}}), \varPi _h w)\nonumber \\&= ((R_h{\bar{u}} - {\bar{u}})\,S_h(R_h {\bar{u}}),\varPi _h w)_{L^2(\varGamma )}. \end{aligned}$$
(45)

We can further estimate this term with the interpolation error estimate from Lemma 27

$$\begin{aligned}&((R_h{\bar{u}} - {\bar{u}})\,S_h(R_h {\bar{u}}),\varPi _h w)_{L^2(\varGamma )} \le c\,h^2\,\Vert {\bar{u}}\Vert _{H^1(\varGamma )}\,\Vert S_h(R_h{\bar{u}})\,\varPi _hw\Vert _{H^1(\varGamma )} \nonumber \\&\qquad + \Vert S_h(R_h{\bar{u}})\Vert _{L^\infty (\varGamma )}\,\Vert \varPi _h w\Vert _{L^\infty (\varGamma )}\,\sum _{E\in {\mathcal {E}}_h} \left| \int _E({\bar{u}}-R_h\bar{u})\right| \nonumber \\&\quad \le c\,\Big (h^2 + \sum _{E\in {\mathcal {E}}_h} \Big |\int _E({\bar{u}}-R_h\bar{u})\Big |\Big )\left( 1+\Vert {\bar{u}}\Vert _{H^1(\varGamma )}\right) \Vert S_h(R_h{\bar{u}})\Vert _{H^1(\varGamma )}\, \Vert \varPi _h w\Vert _{H^1(\varGamma )}. \end{aligned}$$
(46)

The last step follows from the embedding \(H^1(\varGamma )\hookrightarrow L^\infty (\varGamma )\) and the multiplication rule \(\Vert u\,v\Vert _{H^1(\varGamma )} \le c\,\Vert u\Vert _{H^1(\varGamma )}\,\Vert v\Vert _{H^1(\varGamma )}\), see [20, Theorem 1.4.4.2]. Both properties are only fulfilled in case of \(n=2\).

Let us discuss the terms on the right-hand side separately. For elements \(E\subset {\mathcal {K}}_1\) we exploit the assumption (41), which provides the estimate \(\sum _{E\subset {\mathcal {K}}_1} |E| \le c\,h\), and the second interpolation error estimate from Lemma 25 to arrive at

$$\begin{aligned} \sum _{\genfrac{}{}{0.0pt}{}{E\in {\mathcal {E}}_h}{E\subset {\mathcal {K}}_1}} \left| \int _E({\bar{u}}-R_h{\bar{u}})\right| \le c\,h\,\Vert {\bar{u}}-R_h{\bar{u}}\Vert _{L^\infty (\varGamma )} \le c\,h^{2-1/q}\,\Vert \nabla {\bar{u}}\Vert _{L^q(\varGamma )}. \end{aligned}$$

On elements \(E\subset {\mathcal {K}}_2\) the control has higher regularity, namely \({\bar{u}}\in H^{2-1/q}(E)\). To this end, we show by interpolation arguments in Banach spaces, see e. g. [7, Section 14.3], that the two estimates from Lemma 25 (the second one with \(r=1\) and \(q=2\)) also imply

$$\begin{aligned} \int _E(v-R_h v)\le c\,h^{5/2-1/q}\,\Vert v\Vert _{H^{2-1/q}(E)},\quad v\in H^{2-1/q}(\varGamma ),\,q\in [1,\infty ]. \end{aligned}$$
(47)

As a consequence, using the Cauchy–Schwarz inequality for sums and the fact that the number of boundary elements is of order \(h^{-1}\) (so that the factor \((\sum _{E\subset {\mathcal {K}}_2} 1)^{1/2}\) is bounded by \(c\,h^{-1/2}\)), we deduce

$$\begin{aligned} \sum _{\genfrac{}{}{0.0pt}{}{E\in {\mathcal {E}}_h}{E\subset {\mathcal {K}}_2}} \left| \int _E({\bar{u}}-R_h{\bar{u}})\right|&\le c\,h^{5/2-1/q}\,\Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\, \Big (\sum _{\genfrac{}{}{0.0pt}{}{E\in {\mathcal {E}}_h}{E\subset {\mathcal {K}}_2}}1\Big )^{1/2} \\&\le c\,h^{2-1/q}\,\Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}. \end{aligned}$$

The remaining terms on the right-hand side of (46) can be treated with stability estimates for \(S_h\) (see Lemma 14) and \(R_h\), the estimate \(\Vert \varPi _h w\Vert _{H^1(\varGamma )}\le c\,\Vert w\Vert _{H^1(\varGamma )}\) stated in (20) and the a priori estimate \(\Vert w\Vert _{H^1(\varGamma )} \le c\,\Vert e_h\Vert _{L^2(\varOmega )}\) from Lemma 3a). Insertion of the previous estimates into (46) yields

$$\begin{aligned}&((R_h{\bar{u}} - {\bar{u}})\,S_h(R_h {\bar{u}}),\varPi _h w)_{L^2(\varGamma )}\nonumber \\&\quad \le c\,h^{2-1/q}\left( 1+\Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \Vert e_h\Vert _{L^2(\varOmega )}. \end{aligned}$$
(48)

Note that we hide the lower-order norms of \({\bar{u}}\) in the generic constant as these quantities may be estimated by means of the control bounds \(u_a\) and \(u_b\). Insertion of (44), (45) and (48) into (43) and dividing by \(\Vert e_h\Vert _{L^2(\varOmega )}\) implies the assertion. \(\square\)

Lemma 20

Under the assumption (41) the estimates

$$\begin{aligned} \Vert S_h({\bar{u}}) - S_h(R_h {\bar{u}})\Vert _{L^2(\varGamma )}&\le c\,h^{2-1/q}\, \left( 1 + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)} + \Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )}\right) , \\ \Vert Z_h({\bar{u}}) - Z_h(R_h {\bar{u}})\Vert _{L^2(\varGamma )}&\le c\,h^{2-1/q}\, \left( 1 + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)} + \Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )}\right) \end{aligned}$$

are valid for arbitrary \(q\in [2,\infty )\).

Proof

We will only prove the second estimate as the first one follows from the same technique and is even easier since the right-hand sides of the equations defining \(S_h({\bar{u}})\) and \(S_h(R_h{\bar{u}})\) coincide. This is not the case for the control-to-adjoint operator.

To shorten the notation we write \(e_h:=Z_h({\bar{u}}) - Z_h(R_h {\bar{u}})\). As in the previous lemma we rewrite the error by a duality argument using a dual problem similar to (42) with solution \(w\in H^1(\varOmega )\), more precisely,

$$\begin{aligned} a_{{\bar{u}}}(v,w) = (e_h,v)_{L^2(\varGamma )}\qquad \forall v\in H^1(\varOmega ). \end{aligned}$$

This yields

$$\begin{aligned} \Vert e_h\Vert _{L^2(\varGamma )}^2 = a_{{\bar{u}}}(e_h,w-\varPi _h w) + a_{{\bar{u}}}(e_h, \varPi _h w). \end{aligned}$$
(49)

We rewrite the second expression in (49) and obtain, analogously to (45),

$$\begin{aligned}&a_{{\bar{u}}}(e_h,\varPi _h w)\nonumber \\&\quad = a_{{\bar{u}}}(Z_h({\bar{u}}), \varPi _h w) \pm a_{R_h{\bar{u}}}(Z_h(R_h {\bar{u}}), \varPi _h w) - a_{{\bar{u}}}(Z_h(R_h {\bar{u}}), \varPi _h w)\nonumber \\&\quad = (S_h({\bar{u}}) - S_h(R_h{\bar{u}}), \varPi _h w)_{L^2(\varOmega )} + ((R_h {\bar{u}} - {\bar{u}})\, Z_h(R_h {\bar{u}}), \varPi _h w)_{L^2(\varGamma )}. \end{aligned}$$
(50)

Note that the first term would not appear when deriving estimates for \(S_h\) instead of \(Z_h\) as the equations defining \(S_h({\bar{u}})\) and \(S_h(R_h {\bar{u}})\) have the same right-hand side.

The first term can be treated with the Cauchy–Schwarz inequality, Lemma 19 and the estimate \(\Vert \varPi _h w\Vert _{L^2(\varOmega )} \le c\,\Vert w\Vert _{H^1(\varOmega )}\le c\,\Vert e_h\Vert _{L^2(\varGamma )}\) which can be deduced from (19) and Lemma 1 with \(g=e_h\). These ideas lead to

$$\begin{aligned}&(S_h({\bar{u}}) - S_h(R_h{\bar{u}}), \varPi _h w)_{L^2(\varOmega )} \nonumber \\&\qquad \le c\,h^{2-1/q}\, \left( 1 + \Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \,\Vert e_h\Vert _{L^2(\varGamma )}. \end{aligned}$$
(51)

For the second term on the right-hand side of (50) we apply the same steps as for (48) with the only modification that the a priori estimate \(\Vert w\Vert _{H^1(\varGamma )} \le c\,\Vert e_h\Vert _{L^2(\varGamma )}\) from Lemma 3a) has to be employed. From this we infer

$$\begin{aligned}&((R_h {\bar{u}} - {\bar{u}})\,Z_h(R_h {\bar{u}}), \varPi _h w)_{L^2(\varGamma )} \nonumber \\&\quad \ \le c\, h^{2-1/q}\left( 1+\Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \Vert Z_h(R_h {\bar{u}})\Vert _{H^1(\varGamma )}\,\Vert \varPi _h w\Vert _{H^1(\varGamma )}\nonumber \\&\quad \ \le c\, h^{2-1/q}\left( 1+\Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \Vert e_h\Vert _{L^2(\varGamma )}. \end{aligned}$$
(52)

In the last step we used the boundedness of \(Z_h(R_h {\bar{u}})\), see Lemma 14. Insertion of (51) and (52) into (50) leads to

$$\begin{aligned} a_{{\bar{u}}}(e_h,\varPi _h w) \le c\,h^{2-1/q}\left( 1 + \Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )}+ \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \Vert e_h\Vert _{L^2(\varGamma )}. \end{aligned}$$
(53)

It remains to discuss the first term on the right-hand side of (49). We obtain with the boundedness of \(a_{{\bar{u}}}\), the interpolation error estimate (19) and Lemma 3a)

$$\begin{aligned} a_{{\bar{u}}}(e_h,w-\varPi _h w)&\le c\,h^{1/2}\,\Vert e_h\Vert _{H^1(\varOmega )}\, \Vert w\Vert _{H^{3/2}(\varOmega )} \nonumber \\&\le \,c\,h^{1/2}\,\Vert e_h\Vert _{H^1(\varOmega )}\,\Vert e_h\Vert _{L^2(\varGamma )}. \end{aligned}$$
(54)

An estimate for the expression \(\Vert e_h\Vert _{H^1(\varOmega )}\) follows from the equality

$$\begin{aligned} \Vert e_h\Vert _{H^1(\varOmega )}^2 + ({\bar{u}}\,Z_h({\bar{u}}) - R_h {\bar{u}}\, Z_h(R_h {\bar{u}}), e_h)_{L^2(\varGamma )} = (S_h({\bar{u}}) - S_h(R_h{\bar{u}}), e_h)_{L^2(\varOmega )} \end{aligned}$$

which can be deduced by subtracting the equations for \(Z_h({\bar{u}})\) and \(Z_h(R_h {\bar{u}})\) from each other. Rearranging the terms yields

$$\begin{aligned} \Vert e_h\Vert _{H^1(\varOmega )}^2 \le&((R_h {\bar{u}} - {\bar{u}})\,Z_h(R_h {\bar{u}}), e_h)_{L^2(\varGamma )} \\&- ({\bar{u}} \,e_h,e_h)_{L^2(\varGamma )} + \Vert S_h({\bar{u}}) - S_h(R_h{\bar{u}})\Vert _{L^2(\varOmega )} \Vert e_h\Vert _{L^2(\varOmega )}. \end{aligned}$$

The second term on the right-hand side is non-positive due to \({\bar{u}}\ge 0\) and can thus be dropped. An estimate for the last term is proved in Lemma 19. For the first term we apply the estimate (52) with \(\varPi _h w\) replaced by \(e_h\). All together, we obtain

$$\begin{aligned} \Vert e_h\Vert _{H^1(\varOmega )}^2&\le c\, h^{2-1/q}\left( 1 + \Vert \bar{u}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \nonumber \\&\quad \times \left( \Vert e_h\Vert _{H^1(\varGamma )} + \Vert e_h\Vert _{H^1(\varOmega )}\right) . \end{aligned}$$
(55)

Moreover, with an inverse inequality and a trace theorem we get

$$\begin{aligned} \Vert e_h\Vert _{H^1(\varGamma )} \le c\,h^{-1/2}\,\Vert e_h\Vert _{H^{1/2}(\varGamma )} \le c\,h^{-1/2}\,\Vert e_h\Vert _{H^1(\varOmega )}. \end{aligned}$$

Consequently, we deduce from (55)

$$\begin{aligned} \Vert e_h\Vert _{H^1(\varOmega )} \le c\,h^{3/2-1/q}\left( 1 + \Vert \bar{u}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) . \end{aligned}$$

Insertion into (54) leads to

$$\begin{aligned} a_{{\bar{u}}}(e_h,w-\varPi _h w) \le c\,h^{2-1/q}\,\left( 1 + \Vert \bar{u}\Vert _{W^{1,q}(\varGamma )} + \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)}\right) \Vert e_h\Vert _{L^2(\varGamma )}. \end{aligned}$$

Together with (53) and (49) we conclude the desired estimate for \(Z_h\). \(\square\)

Lemma 21

Under the assumption (41) there holds the estimate

$$\begin{aligned} \Vert R_h {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )} \le c\,h^{2-2/q}\,|\ln h|\end{aligned}$$

with

$$\begin{aligned} c = c\left( \Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)},\Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )}, \Vert {\bar{y}}\Vert _{W^{2,q}(\varOmega )},\Vert {\bar{p}}\Vert _{W^{2,q}(\varOmega )}\right) . \end{aligned}$$

Proof

We observe that each function \(\xi :=t\, R_h {\bar{u}} + (1-t)\, {\bar{u}}_h\) for \(t\in [0,1]\) satisfies

$$\begin{aligned} \Vert {\bar{u}} - \xi \Vert _{L^2(\varGamma )} \le t \Vert {\bar{u}} - R_h{\bar{u}}\Vert _{L^2(\varGamma )} + (1-t)\Vert {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )} < \varepsilon , \end{aligned}$$

for arbitrary \(\varepsilon >0\) provided that h is sufficiently small. This follows from the convergence of the midpoint interpolant, see Lemma 25, and convergence of \({\bar{u}}_h\) towards \({\bar{u}}\), see Theorem 3. Hence, with the coercivity of \(j_h''\) proved in Lemma 17 and the mean value theorem we conclude

$$\begin{aligned} \frac{\delta }{4}\,\Vert R_h {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )}^2&\le j_h''(\xi )(R_h {\bar{u}} - {\bar{u}}_h)^2 \\&= j_h'(R_h {\bar{u}})(R_h {\bar{u}} - {\bar{u}}_h) - j_h'({\bar{u}}_h)(R_h {\bar{u}} - {\bar{u}}_h). \end{aligned}$$

For the latter term we exploit the discrete optimality condition and the fact that the continuous optimality condition holds even pointwise. This implies the inequality

$$\begin{aligned} j_h'({\bar{u}}_h)(R_h{\bar{u}} - {\bar{u}}_h) \ge 0 \ge (\alpha \,R_h {\bar{u}} - R_h({\bar{y}}\,{\bar{p}}),R_h {\bar{u}} - {\bar{u}}_h)_{L^2(\varGamma )}. \end{aligned}$$

Insertion into the estimate above implies

$$\begin{aligned}&\frac{\delta }{4}\Vert R_h {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )}^2 \nonumber \\&\quad \le \left( R_h({\bar{y}}\,{\bar{p}})-{\bar{y}}\,{\bar{p}} +{\bar{y}}\,{\bar{p}} -S_h(R_h {\bar{u}})\,Z_h(R_h {\bar{u}}), R_h {\bar{u}} - {\bar{u}}_h\right) _{L^2(\varGamma )}. \end{aligned}$$
(56)

The right-hand side can be decomposed into two parts. With appropriate intermediate functions we obtain for the latter one

$$\begin{aligned}&({\bar{y}}\, {\bar{p}} - S_h(R_h {\bar{u}})\,Z_h(R_h {\bar{u}}), R_h {\bar{u}} - {\bar{u}}_h)_{L^2(\varGamma )} \\&\quad = (({\bar{y}}-S_h(R_h{\bar{u}}))\, {\bar{p}} + S_h(R_h{\bar{u}})\,({\bar{p}}-Z_h(R_h{\bar{u}})), R_h{\bar{u}}-{\bar{u}}_h)_{L^2(\varGamma )} \\&\quad \le c\,\big (\Vert {\bar{y}}-S_h(R_h{\bar{u}})\Vert _{L^2(\varGamma )}\,\Vert {\bar{p}}\Vert _{L^\infty (\varGamma )} \\&\qquad + \Vert {\bar{p}}-Z_h(R_h{\bar{u}})\Vert _{L^2(\varGamma )}\, \Vert S_h(R_h{\bar{u}})\Vert _{L^\infty (\varGamma )} \big )\,\Vert R_h{\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )}. \end{aligned}$$

Moreover, we apply the triangle inequality and the estimates from Lemmata 18 and 20 to deduce

$$\begin{aligned} \Vert {\bar{y}}-S_h(R_h{\bar{u}})\Vert _{L^2(\varGamma )}&\le \Vert {\bar{y}}-S_h({\bar{u}})\Vert _{L^2(\varGamma )} + \Vert S_h({\bar{u}})-S_h(R_h{\bar{u}}) \Vert _{L^2(\varGamma )} \\&\le c\,h^{2-2/q}\,|\ln h|. \end{aligned}$$

Analogously, one can derive an estimate for the term \(\Vert {\bar{p}} - Z_h(R_h{\bar{u}})\Vert _{L^2(\varGamma )}\). Moreover, we apply Lemmata 6 and 14 to bound the norms of \({\bar{p}}=Z({\bar{u}})\) and \(S_h(R_h {\bar{u}})\), respectively. All together we obtain the estimate

$$\begin{aligned}&({\bar{y}}\,{\bar{p}}-S_h(R_h {\bar{u}})\,Z_h(R_h {\bar{u}}), R_h {\bar{u}} - \bar{u}_h)_\varGamma \le c\,h^{2-2/q}\,|\ln h|\,\Vert R_h{\bar{u}}-{\bar{u}}_h\Vert _{L^2(\varGamma )}. \end{aligned}$$
(57)

Next we discuss that part of (56) which involves the term \(R_h({\bar{y}}\,{\bar{p}})-{\bar{y}}\,{\bar{p}}\) in the first argument. Here, we again use the interpolation error estimate (47) exploiting regularity in fractional-order Sobolev spaces and obtain

$$\begin{aligned}&(R_h ({\bar{y}}\,{\bar{p}}) - {\bar{y}}\,{\bar{p}}, R_h {\bar{u}} - {\bar{u}}_h)_{L^2(\varGamma )} = \sum _{E\in {\mathcal {E}}_h} [R_h{\bar{u}} - {\bar{u}}_h]_E\int _E \left( {\bar{y}}\,{\bar{p}}-R_h ({\bar{y}}\,{\bar{p}})\right) \nonumber \\&\quad \le c\sum _{E\in {\mathcal {E}}_h} h^{1/2}\,[R_h{\bar{u}} - {\bar{u}}_h]_E\, h^{2-1/q}\,\Vert {\bar{y}}\,{\bar{p}}\Vert _{H^{2-1/q}(E)}\nonumber \\&\quad \le c\,h^{2-1/q}\,\Vert R_h{\bar{u}}- {\bar{u}}_h\Vert _{L^2(\varGamma )}\,\Vert {\bar{y}}\,{\bar{p}}\Vert _{H^{2-1/q}(\varGamma )}. \end{aligned}$$
(58)

With [20, Theorem 1.4.4.2] and a trace theorem we conclude

$$\begin{aligned} \Vert {\bar{y}}\,{\bar{p}}\Vert _{H^{2-1/q}(\varGamma )} \le c\,\Vert {\bar{y}}\Vert _{W^{2-1/q,q}(\varGamma )}\,\Vert {\bar{p}}\Vert _{W^{2-1/q,q}(\varGamma )} \le c\,\Vert {\bar{y}}\Vert _{W^{2,q}(\varOmega )}\,\Vert {\bar{p}}\Vert _{W^{2,q}(\varOmega )}. \end{aligned}$$

Insertion of the estimates (57) and (58) into (56), and dividing the resulting estimate by \(\Vert R_h {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )}\), leads to the desired result. \(\square\)

Now we are in the position to state the main result of this section.

Theorem 5

Let \(({\bar{y}},{\bar{u}},{\bar{p}})\) be a local solution of (12) satisfying the assumption (41). Moreover, let \(\{{\bar{u}}_h\}_{h>0}\) be a sequence of local solutions of (27) such that for sufficiently small \(\varepsilon ,h_0>0\) the property

$$\begin{aligned} \Vert {\bar{u}} - {\bar{u}}_h\Vert _{L^2(\varGamma )}< \varepsilon \qquad \forall h < h_0 \end{aligned}$$

holds. Then, the error estimate

$$\begin{aligned} \Vert {\bar{u}} - {\tilde{u}}_h\Vert _{L^2(\varGamma )} \le c\,h^{2-2/q}\,|\ln h|\end{aligned}$$

is satisfied with \(c=c(\Vert {\bar{u}}\Vert _{W^{1,q}(\varGamma )},\Vert {\bar{u}}\Vert _{H^{2-1/q}({\mathcal {K}}_2)},\Vert {\bar{y}}\Vert _{W^{2,q}(\varOmega )}, \Vert {\bar{p}}\Vert _{W^{2,q}(\varOmega )})\).

Proof

With the projection formulas (13) and (39), respectively, the non-expansivity of the operator \(\varPi _{ad}\) and the triangle inequality we obtain

$$\begin{aligned} \Vert {\bar{u}} - {\tilde{u}}_h\Vert _{L^2(\varGamma )}&\le c\,\Vert \varPi _{ad} \left( \frac{1}{\alpha }\,{\bar{y}}\, {\bar{p}}\right) - \varPi _{ad} \left( \frac{1}{\alpha }\,{\bar{y}}_h\, {\bar{p}}_h\right) \Vert _{L^2(\varGamma )} \\&\le \frac{c}{\alpha }\,\left( \Vert {\bar{y}} - {\bar{y}}_h\Vert _{L^2(\varGamma )}\,\Vert {\bar{p}}\Vert _{L^\infty (\varOmega )} + \Vert {\bar{y}}_h\Vert _{L^\infty (\varOmega )}\,\Vert {\bar{p}} - {\bar{p}}_h\Vert _{L^2(\varGamma )}\right) . \end{aligned}$$

The assertion follows after insertion of (40) together with the estimates obtained in Lemmata 18, 20 and 21, as well as the stability estimates of Z and \(S_h\) from Lemmata 3 and 14, respectively. \(\square\)

6 Numerical experiments

It is the purpose of this last section to confirm the theoretical results by numerical experiments. To this end, we reformulate the discrete optimality condition (27) and use the equivalent projection formula

$$\begin{aligned} u_h = \varPi _{ad}\left( \frac{1}{\alpha }R_h^{{\text {Simp}}}(S_h(u_h)\,Z_h(u_h))\right) . \end{aligned}$$
(59)

Here, \(R_h^{{\text {Simp}}}:C(\varGamma )\rightarrow U_h\) is a projection operator based on the Simpson rule, that is,

$$\begin{aligned}{}[R_h^{{\text {Simp}}}(v)]_{E} = \frac{1}{6}\left( v(x_{E_1}) + 4v(x_{E}) + v(x_{E_2})\right) , \end{aligned}$$

where \(x_{E_1}\) and \(x_{E_2}\) are the endpoints of the boundary edge \(E\in {\mathcal {E}}_h\) and \(x_E\) is its midpoint. The numerical solution of (59) is computed by a semismooth Newton method.
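To make the two ingredients of (59) concrete, the following minimal Python sketch implements the Simpson-rule projection \(R_h^{{\text {Simp}}}\) and the pointwise projection \(\varPi _{ad}\) for boundary controls stored edge-wise. The data layout, the placeholder for \(y_h\,p_h\) on the boundary and all function names are illustrative assumptions; in the actual experiments these operations are performed inside a finite element code and the resulting fixed-point equation is solved by a semismooth Newton method.

```python
import numpy as np

def proj_ad(v, u_a=0.0, u_b=np.inf):
    """Pointwise projection Pi_ad onto the admissible interval [u_a, u_b]."""
    return np.clip(v, u_a, u_b)

def simpson_projection(w, edges):
    """Simpson-rule projection R_h^Simp onto piecewise constants: on each
    boundary edge E with endpoints x1, x2 and midpoint xm,
    [R_h^Simp w]_E = (w(x1) + 4*w(xm) + w(x2)) / 6.
    `edges` has shape (n_edges, 2, dim) and stores the edge endpoints."""
    x1, x2 = edges[:, 0, :], edges[:, 1, :]
    xm = 0.5 * (x1 + x2)
    return (w(x1) + 4.0 * w(xm) + w(x2)) / 6.0

# Illustrative usage: one evaluation of the right-hand side of (59) on the
# bottom edge of the unit square, with a stand-in function for y_h * p_h.
alpha = 1e-2
edges = np.array([[[i / 4.0, 0.0], [(i + 1) / 4.0, 0.0]] for i in range(4)])
yp_boundary = lambda x: x[:, 0] * (1.0 - x[:, 0])   # placeholder for (y_h * p_h) on Gamma
u_new = proj_ad(simpson_projection(yp_boundary, edges) / alpha, u_a=0.0)
print(u_new)
```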

The input data of the considered benchmark problem is chosen as follows. The computational domain is the unit square \(\varOmega :=(0,1)^2\). We define the exact Robin parameter \({\tilde{u}}\) by

$$\begin{aligned} {\tilde{u}}(x_1,x_2):= {\left\{ \begin{array}{ll} \max (-0.01,\ 1-30(x_1-0.5)^2), &{}\text{ if }\ x_1=0, \\ -0.01, &{}\text{ otherwise }, \end{array}\right. } \end{aligned}$$

and use the desired state \(y_d = S_h({\tilde{u}})\) and the right-hand side \(f\equiv 0\). Moreover, the regularization parameter \(\alpha = 10^{-2}\) and the control bounds \(u_a=0\), \(u_b=\infty\) are used.

We compute the numerical solution of our benchmark problem on a sequence of meshes starting with \({\mathcal {T}}_{h_0}\), \(h_0=\sqrt{2}\), consisting of two right triangles only. The remaining grids \({\mathcal {T}}_{h_i}\), \(i=1,2,\ldots ,\) are obtained by a double bisection through the longest edge of each element applied to the previous mesh. This guarantees \(h_i = \frac{1}{2} h_{i-1}\). In order to compute the discretization error we use the solution on the mesh \({\mathcal {T}}_{h_{11}}\) as an approximation of the exact solution, that is,

$$\begin{aligned} \Vert {\bar{u}}-{\bar{u}}_{h_i}\Vert _{L^2(\varGamma )} \approx \Vert {\bar{u}}_{h_{11}} - {\bar{u}}_{h_i}\Vert _{L^2(\varGamma )},\quad i=0,1,\ldots ,10. \end{aligned}$$

Analogously, we compute the error for the approximation obtained by the postprocessing strategy. However, in this case the exact solution is approximated by \({\bar{u}}\approx \varPi _{ad}(\frac{1}{\alpha }{\bar{y}}_{h_{11}}\,{\bar{p}}_{h_{11}})\). The error norms \(\Vert \varPi _{ad}(\frac{1}{\alpha }{\bar{y}}_{h_{11}}\,{\bar{p}}_{h_{11}}) - \varPi _{ad}(\frac{1}{\alpha }{\bar{y}}_{h_{i}}\,{\bar{p}}_{h_{i}})\Vert _{L^2(\varGamma )}\), \(i=0,\ldots ,10\), are computed element-wise by the Simpson quadrature formula with the modification that elements E are split at those points where \({\bar{y}}_{h_i}\,{\bar{p}}_{h_i}\) or \({\bar{y}}_{h_{11}}\,{\bar{p}}_{h_{11}}\) changes sign.
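The experimental convergence rates reported in Table 1 can be extracted from such error sequences in a few lines; since \(h_i=\frac{1}{2}h_{i-1}\), the rate between two consecutive meshes is \(\log _2(e_{i-1}/e_i)\). A minimal sketch is given below; the error values used there are purely synthetic and serve only to check the rate computation, they are not the measured data.

```python
import numpy as np

def eoc(errors):
    """Experimental orders of convergence for errors e_0, e_1, ... measured on
    meshes with h_i = h_{i-1} / 2: rate_i = log2(e_{i-1} / e_i)."""
    e = np.asarray(errors, dtype=float)
    return np.log2(e[:-1] / e[1:])

# Synthetic sanity check (not the measured data): error sequences behaving like
# c*h and c*h^2 yield rates 1 and 2, respectively.
h = np.sqrt(2.0) * 0.5 ** np.arange(6)
print(eoc(3.0 * h))       # -> [1. 1. 1. 1. 1.]
print(eoc(3.0 * h ** 2))  # -> [2. 2. 2. 2. 2.]
```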

Fig. 1 Optimal state (surface) and the optimal control (boundary curve) for the benchmark problem

The optimal control and the corresponding state of our benchmark problem are illustrated in Fig. 1, and the measured discretization errors as well as the experimentally computed convergence rates are summarized in Table 1. As proven in Theorem 4, the numerical solutions obtained by a full discretization using a piecewise constant control approximation converge with the optimal convergence rate 1. Moreover, it is confirmed that the solution obtained with a postprocessing step, see Theorem 5, converges with order \(2-\varepsilon\), \(\varepsilon >0\). Note that we actually proved the results for the case of a smooth boundary, which is not fulfilled in our example. However, the corner singularities contained in the solution are comparatively mild for a \(90^\circ\) corner, so that the regularity results from Lemma 11 remain valid.

Table 1 Experimentally computed errors for the full discretization and the postprocessing approach with the corresponding experimental convergence rates (in parentheses)