1 Introduction

Consider the model problem:

$$\begin{aligned} \min _{f, \, u} \, \left\{ \frac{1}{2} \left\| u-d \right\| ^2_{{L^2(\partial {\varOmega })}} + \frac{\alpha }{2} \left\| f \right\| _{{L^2({\varOmega })}}^2 \right\} , \end{aligned}$$
(1)

on a Lipschitz domain \({\varOmega }\subset \mathbb {R}^n\), subject to

$$\begin{aligned} - {\varDelta }u + u + f = 0 \quad \hbox {in } {\varOmega }, \end{aligned}$$
(2)
$$\begin{aligned} \frac{\partial u}{\partial \mathbf {n}} = 0 \quad \hbox {on } \partial {\varOmega }. \end{aligned}$$
(3)

This minimization task resembles the standard example considered in PDE-constrained optimization, but instead of assuming that observation data is available everywhere in \({\varOmega }\), we consider the case where observations are only given at the boundary \(\partial {\varOmega }\) of \({\varOmega }\), that is, \(d \in {L^2(\partial {\varOmega })}\); see the first term in (1). For problems of the form (1)–(3), in which the objective functional is replaced by

$$\begin{aligned} \frac{1}{2}\left\| u-d \right\| ^2_{{L^2({\varOmega })}} + \frac{\alpha }{2} \left\| f \right\| _{{L^2({\varOmega })}}^2 \end{aligned}$$
(4)

very efficient preconditioners have been developed for the associated KKT system. In fact, by employing proper \(\alpha \)-dependent scalings of the involved Hilbert spaces [14], or by using a Schur complement approach [13], methods that are robust with respect to the size of the regularization parameter \(\alpha \) have been developed. More specifically, the condition number of the preconditioned optimality system is small and bounded independently of \(0 < \alpha \ll 1\) and the mesh size h. This ensures good performance for suitable Krylov subspace methods, e.g. the minimum residual method (Minres), independently of both parameters. These techniques have been extended to handle time-dependent problems [12] and PDE-constrained optimization with Stokes equations [17], but the rigorous analysis of \(\alpha \)-independent bounds always requires that observations are available throughout all of \({\varOmega }\).

For cases with limited observations, for example with cost-functionals of the form (1), efficient preconditioners are also available for a rather large class of PDE-constrained optimization problems, see [10, 11]. But these techniques do not yield convergence rates for the preconditioned KKT system that are completely robust with respect to the size of the regularization parameter \(\alpha \). Instead, the number of preconditioned Minres iterations grows logarithmically with respect to the size of \(\alpha ^{-1}\), as \(\alpha \rightarrow 0\):

$$\begin{aligned} a + b \log _{10} \left( \alpha ^{-1} \right) \end{aligned}$$
(5)

for constants a, b independent of \(\alpha \). According to the numerical experiments presented in [11], the size of b may become significant. More specifically, \(b \in [5,50]\) for problems with simple elliptic state equations posed on rectangles. Thus, for small values of \(\alpha \), Minres may require a rather large number of iterations to converge—even though the growth in iteration numbers is only logarithmic.
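To make the practical impact of (5) concrete, here is a quick calculation; the values of a and b below are illustrative choices (b is taken from the mid-range of the interval reported in [11], a is a plain assumption):

```python
import math

def predicted_iterations(alpha, a=10.0, b=20.0):
    """Evaluate the iteration bound (5): a + b*log10(1/alpha).
    The constants a and b are illustrative, not taken from [11]."""
    return a + b * math.log10(1.0 / alpha)

# Even with only logarithmic growth, a mid-range b = 20 adds 40 iterations
# for every two orders of magnitude by which alpha shrinks.
for alpha in (1e-2, 1e-4, 1e-8):
    print(f"alpha = {alpha:.0e}: ~{predicted_iterations(alpha):.0f} iterations")
```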

In practice, observations are rarely available throughout the entire domain of the state equation. On the contrary, the purpose of solving an inverse problem is typically to use data recorded at the surface of an object to compute internal properties of that object: impedance tomography, the inverse problem of electrocardiography (ECG), computerized tomography (CT), etc. This fact, combined with the discussion above, motivates the need for further improving numerical methods for solving KKT systems arising in connection with PDE-constrained optimization.

This paper is organized as follows. In the next section we derive the KKT system associated with the model problem (1)–(3). Our \(\alpha \)-robust preconditioner is presented in Sect. 3, along with a number of numerical experiments. Sections 4 and 5 contain our analysis, and the method is generalized in Sects. 6 and 7. In Sect. 8 we discuss the preconditioner when applied to a standard finite element approximation of the problem. Section 9 provides a discussion of our findings, including their limitations.

2 KKT system

Consider the PDE (2) with the boundary condition (3). A solution u to this elliptic PDE, with source term \(f\in {L^2({\varOmega })}\), is known to have improved regularity, i.e. \(u\in H^{s}({\varOmega })\), for some \(s\in [1, 2]\), with s depending on the domain \({\varOmega }\). In the remainder of this paper we assume that the solution u is in \({H^2({\varOmega })}\) for any source term \(f\in {L^2({\varOmega })}\). This assumption is known to hold if \({\varOmega }\) is convex or if \(\partial {\varOmega }\) is \(C^2\), see e.g. [5, 7].

When solutions to (2) exhibit \({H^2({\varOmega })}\)-regularity, we can write the problem in the following non-standard variational form: Find \(u \in {\bar{H}^2({\varOmega })}\) such that

$$\begin{aligned} (- {\varDelta }u + u,w)_{{L^2({\varOmega })}} + (f,w)_{{L^2({\varOmega })}} = 0\quad \forall w \in {L^2({\varOmega })}, \end{aligned}$$
(6)

where

$$\begin{aligned} {\bar{H}^2({\varOmega })}= \bigg \{ \phi \in {H^2({\varOmega })}\, \bigg | \, \frac{\partial \phi }{\partial \mathbf {n}} = 0 \hbox { on } \partial {\varOmega }\bigg \}, \end{aligned}$$

equipped with the inner product

$$\begin{aligned} (u,v)_{{H^2({\varOmega })}}= & {} \int _{\varOmega }\nabla ^2u : \nabla ^2v + \nabla u \cdot \nabla v + uv \, dx\nonumber \\= & {} \int _{\varOmega }{\varDelta }u {\varDelta }v + \nabla u \cdot \nabla v + u v \, dx. \end{aligned}$$
(7)

Here \(\nabla ^2u\) denotes the Hessian of u, and the second identity is due to the boundary condition \(\frac{\partial u}{\partial \mathbf {n}} = 0\) imposed on the space \({\bar{H}^2({\varOmega })}\).

We will see below that, in order to design a regularization robust preconditioner for (1)–(3), it is convenient to express the state equation in the form (6), instead of employing integration by parts/Green’s formula to write it in the standard self-adjoint form.

2.1 Optimality system

We may express (1)–(3) in the form:

$$\begin{aligned} \min _{f \in {L^2({\varOmega })}, \, u \in {\bar{H}^2({\varOmega })}}\, \left\{ \frac{1}{2}\left\| u-d \right\| _{{L^2(\partial {\varOmega })}}^2 + \frac{\alpha }{2} \left\| f \right\| _{{L^2({\varOmega })}}^2 \right\} \end{aligned}$$
(8)

subject to

$$\begin{aligned} (- {\varDelta }u + u,w)_{{L^2({\varOmega })}} + (f,w)_{{L^2({\varOmega })}} =0\quad \forall w \in {L^2({\varOmega })}. \end{aligned}$$
(9)

The associated Lagrangian reads

$$\begin{aligned} \begin{aligned} \mathcal {L}(f,u,w)&= \frac{1}{2} \left\| u-d \right\| _{{L^2(\partial {\varOmega })}}^2 + \frac{\alpha }{2} \left\| f \right\| _{{L^2({\varOmega })}}^2 +(f - {\varDelta }u + u,w)_{{L^2({\varOmega })}}, \end{aligned} \end{aligned}$$

with \(f \in {L^2({\varOmega })}\), \(u \in {\bar{H}^2({\varOmega })}\) and \(w \in {L^2({\varOmega })}\). From the first order optimality conditions

$$\begin{aligned} \frac{\partial \mathcal {L}}{\partial f} = 0,\quad \frac{\partial \mathcal {L}}{\partial u} = 0,\quad \frac{\partial \mathcal {L}}{\partial w} = 0, \end{aligned}$$

we obtain the optimality system: determine \((f,u,w) \in {L^2({\varOmega })}\times {\bar{H}^2({\varOmega })}\times {L^2({\varOmega })}\) such that

$$\begin{aligned} \alpha (f,\psi )_{{L^2({\varOmega })}}+(\psi ,w)_{{L^2({\varOmega })}}= & {} 0 \quad \forall \psi \in {L^2({\varOmega })}, \end{aligned}$$
(10)
$$\begin{aligned} (u-d,\phi )_{{L^2(\partial {\varOmega })}} + (-{\varDelta }\phi + \phi , w)_{{L^2({\varOmega })}}= & {} 0 \quad \forall \phi \in {\bar{H}^2({\varOmega })}, \end{aligned}$$
(11)
$$\begin{aligned} (f,\xi )_{{L^2({\varOmega })}} + (- {\varDelta }u + u,\xi )_{{L^2({\varOmega })}}= & {} 0 \quad \forall \xi \in {L^2({\varOmega })}. \end{aligned}$$
(12)

3 Numerical experiments

Prior to analyzing our model problem, we will consider some numerical experiments. Discretization of (10)–(12) yields an algebraic system of the form

$$\begin{aligned} \underbrace{\left[ \begin{array}{ccc} \alpha M &{} {0}&{} M \\ {0}&{} M_{\partial } &{} A^T \\ M &{} A &{} 0 \end{array} \right] }_{\mathcal {A}_{\alpha }} \left[ \begin{array}{c} f \\ u \\ w \end{array} \right] = \left[ \begin{array}{c} 0 \\ {\tilde{M}}_{\partial }d \\ 0 \end{array} \right] , \end{aligned}$$
(13)

where M is a mass matrix, the discretization of the \({L^2({\varOmega })}\) inner product, \(M_{\partial }\) is a mass matrix associated with the boundary \(\partial {\varOmega }\) of \({\varOmega }\), and A is a matrix that arises upon discretization of the operator \((1-{\varDelta })\). Since we write the state equation in non-self-adjoint form, A will not be the usual sum of the stiffness and mass matrices. Instead, Eq. (6) is discretized with subspaces of \({\bar{H}^2({\varOmega })}\) and \({L^2({\varOmega })}\). Consequently, A will in general not be a square matrix.

In (13), we have implicitly used the same discretization for the control variable and the Lagrange multiplier. In (10)–(12), both variables belong to \({L^2({\varOmega })}\), so it seems natural to preserve this correspondence in the discretization. In fact, we can see from (10) that \(f = -\alpha ^{-1} w\), so the control could be eliminated from the system prior to the discretization. This would result in a \(2\times 2\) block system in place of (13). While solving the smaller system is more practical in terms of computational costs, we find that the analysis is more clearly presented for the \(3\times 3\) system (13).
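The block structure of (13), and the reduced \(2\times 2\) system obtained by eliminating \(f = -\alpha ^{-1} w\), can be sketched with SciPy's sparse tools. The matrices below are small random placeholders, not actual finite element matrices; only the block layout is meaningful:

```python
import numpy as np
import scipy.sparse as sp

# Placeholder matrices: M (L^2 mass), M_b (boundary mass), A (discretized
# 1 - Laplacian; rectangular since trial and test spaces differ).
n, m = 8, 6                 # dim of L^2 subspace, dim of H^2 subspace
alpha = 1e-6
M = sp.identity(n, format="csr")
M_b = sp.identity(m, format="csr")
A = sp.random(n, m, density=0.4, random_state=0, format="csr")

# The 3x3 saddle point matrix of (13); symmetric but indefinite.
K = sp.bmat([[alpha * M, None, M],
             [None, M_b, A.T],
             [M, A, None]], format="csr")

# Eliminating f = -w/alpha via the first block row turns the third block
# row into -M w / alpha + A u = 0, giving the 2x2 system in (u, w).
K2 = sp.bmat([[M_b, A.T],
              [A, -M / alpha]], format="csr")
```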

In the current numerical experiments, we employ the Bogner–Fox–Schmit (BFS) rectangle for discretizing the state variable \(u \in {\bar{H}^2({\varOmega })}\). That is, the finite element space consists of bicubic polynomials with continuous values, first order derivatives and mixed second order derivatives at each vertex of the mesh. BFS elements are \(C^1\) on rectangles and therefore \(H^2\)-conforming. The control f and Lagrange multiplier w are discretized with discontinuous bicubic elements.

We propose to precondition (13) with the block-diagonal matrix

$$\begin{aligned} \mathcal {B}_{\alpha }= \left[ \begin{array}{c@{\quad }c@{\quad }c} \alpha M &{} {0}&{} {0}\\ {0}&{} \alpha R+M_\partial &{} {0}\\ {0}&{} {0}&{} \frac{1}{\alpha } M \end{array} \right] ^{-1}, \end{aligned}$$
(14)

where R results from a discretization of the bilinear form \(b(\cdot ,\cdot )\) on \({\bar{H}^2({\varOmega })}\):

$$\begin{aligned} b( u, v)= (u,v)_{H^2({\varOmega })} + \int _{\varOmega }\nabla u \cdot \nabla v \, dx. \end{aligned}$$
(15)

In the experiments presented below, we used this bilinear form to construct a multigrid approximation of \(\left( \alpha R+M_\partial \right) ^{-1}\).
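A minimal sketch of the action of \(\mathcal {B}_{\alpha }\), with exact sparse factorizations standing in for the multigrid approximation of \(\left( \alpha R+M_\partial \right) ^{-1}\); all inputs are placeholder matrices, and the function name is our own:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def make_preconditioner(M, R, M_b, alpha):
    """Action of the block-diagonal preconditioner (14). Exact solves
    stand in for the multigrid approximation of (alpha*R + M_b)^{-1}."""
    n, m = M.shape[0], M_b.shape[0]
    solve_f = spla.factorized((alpha * M).tocsc())        # (1,1) block
    solve_u = spla.factorized((alpha * R + M_b).tocsc())  # (2,2) block
    solve_w = spla.factorized((M / alpha).tocsc())        # (3,3) block

    def apply(x):
        f, u, w = x[:n], x[n:n + m], x[n + m:]
        return np.concatenate([solve_f(f), solve_u(u), solve_w(w)])

    return spla.LinearOperator((2 * n + m, 2 * n + m), matvec=apply)
```

In practice one would pass such a `LinearOperator` as the preconditioner argument of a Krylov solver, with the two mass-matrix solves replaced by the cheap approximations discussed below.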

Remark

The bilinear form (15) is equivalent to the inner product on \({\bar{H}^2({\varOmega })}\). The additional term stems from our choice of implementing a multigrid algorithm for the bilinear form associated with the operator \(({\varDelta }- 1)^2 = {\varDelta }^2 -2 {\varDelta }+1\). Indeed, the bilinear form \(\alpha b(\,\cdot \,, \,\cdot \,) + (\,\cdot \,,\,\cdot \,)_{{L^2(\partial {\varOmega })}}\) can be seen to coincide with the variational form associated with the fourth order problem

$$\begin{aligned} \alpha ({\varDelta }-1)^2 u= & {} f \quad \hbox {in } {\varOmega }, \end{aligned}$$
(16)
$$\begin{aligned} \frac{\partial u}{\partial \mathbf {n}}= & {} 0 \quad \hbox {on } \partial {\varOmega }, \end{aligned}$$
(17)
$$\begin{aligned} \alpha \frac{\partial {\varDelta }u}{\partial \mathbf {n}}= & {} u \quad \hbox {on } \partial {\varOmega }. \end{aligned}$$
(18)

To limit the technical complexity of the implementation, we considered the problem (1)–(3) on the unit square in two dimensions. The experiments were implemented in Python and SciPy. The meshes were uniform and rectangular, with the coarsest level for the multigrid solver consisting of \(8\times 8\) rectangles. Figure 1 shows an example of a solution of the optimality system (13).

Fig. 1

An example of a solution of (13). The observation data d was generated with the forward model, using the “true” control \(4x(1-x) + y\) shown in panel (d). Solutions to the unregularized problem are non-unique, and the generating control cannot be (exactly) recovered. The figures were generated with mesh parameter \(h = 1/128\) and regularization parameter \(\alpha = 10^{-6}\). a Observation data d. The forward model was solved for the control shown in d, but only the boundary values can be observed. b Computed optimal state u based on the observation data shown in a. c Computed optimal control f based on the observation data in a. d The “true” control function used to generate the observation data in a

3.1 Eigenvalues

Let us first consider the exact preconditioner \(\mathcal {B}_{\alpha }\) defined in (14). If \(\mathcal {B}_{\alpha }\) is a good preconditioner for the discrete optimality system (13), then the spectral condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\) should be small and bounded, independently of the size of both the regularization parameter \(\alpha \) and the discretization parameter h.

The eigenvalues of this preconditioned system were computed by solving the generalized eigenvalue problem

$$\begin{aligned} \mathcal {A}_{\alpha }x = \lambda \mathcal {B}_{\alpha }^{-1} x. \end{aligned}$$

We found that the absolute values of the eigenvalues \(\lambda \) were bounded, with

$$\begin{aligned} 0.445 \le \vert \lambda \vert \le 1.809, \end{aligned}$$

uniformly in \(\alpha \in \{1,10^{-1},\ldots , 10^{-10}\}\) and \(h \in \{2^{-2},\ldots ,2^{-5}\}\). This yields a uniform condition number \(k(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }) \approx 4.05\). The spectra of the preconditioned systems are pictured in Fig. 2 for some choices of \(\alpha \). The spectra are clearly divided into three bounded intervals, and the eigenvalues are more clustered for \(\alpha \approx 1\) and for very small \(\alpha \).
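This type of computation can be mimicked with `scipy.linalg.eigh`, which solves the generalized symmetric eigenproblem when the matrix on the right-hand side is positive definite. The matrices below are random stand-ins for \(\mathcal {A}_{\alpha }\) and \(\mathcal {B}_{\alpha }^{-1}\), so the printed value is not the condition number reported above:

```python
import numpy as np
from scipy.linalg import eigh

# A x = lambda * Binv x, with a symmetric (indefinite) A and an SPD Binv.
rng = np.random.default_rng(seed=1)
n = 20
S = rng.standard_normal((n, n))
A = S + S.T                                   # symmetric stand-in for A_alpha
Binv = np.eye(n) + np.diag(rng.random(n))     # SPD stand-in for B_alpha^{-1}

lam = eigh(A, Binv, eigvals_only=True)        # generalized eigenvalues
cond = np.abs(lam).max() / np.abs(lam).min()  # spectral condition number
print(f"condition number of the preconditioned stand-in: {cond:.2f}")
```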

Fig. 2

Spectrum of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\) for different regularization parameters \(\alpha \). The discretization parameter was \(h=2^{-4}\) for all figures

3.2 Multilevel preconditioning

In practice, the action of \(\mathcal {B}_{\alpha }\) is replaced with a less computationally expensive operation \({\widehat{\mathcal {B}_{\alpha }}}\). Note that \(\mathcal {B}_{\alpha }\) has a block structure, and that computationally efficient approximations can be constructed for the individual blocks. The only challenging block of the preconditioner is the biharmonic operator \(\alpha R+M_\partial \). Order-optimal multilevel algorithms for fourth-order operators discretized with Bogner–Fox–Schmit elements were developed in [16]. Specifically, it was shown that a multigrid V-cycle with a symmetric \(4\times 4\) block Gauss–Seidel smoother, where each block contains the matrix entries corresponding to all degrees of freedom associated with a vertex in the mesh, results in an order-optimal approximation. The remaining blocks of the preconditioner are weighted mass matrices, which are efficiently handled by two symmetric Gauss–Seidel iterations for the (1,1) and (3,3) blocks.
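A minimal sketch of such a symmetric Gauss–Seidel approximation of a mass-matrix solve; the tridiagonal matrix used in testing is a placeholder, not an actual finite element mass matrix:

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def sym_gauss_seidel(M, b, sweeps=2):
    """Approximate M^{-1} b by `sweeps` symmetric Gauss-Seidel iterations
    (a forward sweep followed by a backward sweep), starting from zero."""
    L = sp.tril(M, format="csr")   # lower triangle, incl. diagonal
    U = sp.triu(M, format="csr")   # upper triangle, incl. diagonal
    x = np.zeros_like(b, dtype=float)
    for _ in range(sweeps):
        x = x + spla.spsolve_triangular(L, b - M @ x, lower=True)
        x = x + spla.spsolve_triangular(U, b - M @ x, lower=False)
    return x
```

For a well-conditioned SPD mass matrix, a couple of sweeps already give a spectrally equivalent approximation, which is why a fixed small number of iterations suffices inside the preconditioner.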

We estimated condition numbers of the individual blocks of \(\mathcal {B}_{\alpha }^{-1}\) preconditioned with their respective approximations. The results are reported in Tables 1 and 2. A slight deterioration in the performance of the multigrid cycle can be seen for very small values of \(\alpha > 0\).

3.3 Iteration numbers

To verify that the approximation \({\widehat{\mathcal {B}_{\alpha }}}\) is also an effective preconditioner for \(\mathcal {A}_{\alpha }\), we applied the Minres scheme to the system

$$\begin{aligned} {\widehat{\mathcal {B}_{\alpha }}} \mathcal {A}_{\alpha }x = {\widehat{\mathcal {B}_{\alpha }}} b. \end{aligned}$$

For the results presented in Table 3, the Minres iteration process was stopped as soon as

$$\begin{aligned} \frac{(r_k,{\widehat{\mathcal {B}_{\alpha }}} r_k)}{(r_0,{\widehat{\mathcal {B}_{\alpha }}} r_0)}= \frac{(\mathcal {A}_{\alpha }x_k - b, {\widehat{\mathcal {B}_{\alpha }}} \{\mathcal {A}_{\alpha }x_k - b\})}{(\mathcal {A}_{\alpha }x_0 - b, {\widehat{\mathcal {B}_{\alpha }}} \{\mathcal {A}_{\alpha }x_0 - b\})} \le \varepsilon , \end{aligned}$$
(19)

which is the standard termination criterion for the preconditioned Minres scheme, provided that the preconditioner is SPD. A random initial guess \(x_0\) was used, and the tolerance was set to \(\varepsilon = 10^{-12}\).

Table 1 Condition numbers of M preconditioned with symmetric Gauss–Seidel iterations
Table 2 Estimated condition numbers of \(\alpha R + M_\partial \) preconditioned with one V-cycle multigrid iteration
Table 3 Number of preconditioned Minres iterations needed to solve the optimality system to a relative error tolerance \(\varepsilon = 10^{-12}\)

4 Analysis of the KKT system

Recall that our optimality system reads:

$$\begin{aligned} \alpha (f,\psi )_{{L^2({\varOmega })}}+(\psi ,w)_{{L^2({\varOmega })}}= & {} 0 \quad \forall \psi \in {L^2({\varOmega })}, \\ (u-d,\phi )_{{L^2(\partial {\varOmega })}} + (- {\varDelta }\phi + \phi , w)_{{L^2({\varOmega })}}= & {} 0 \quad \forall \phi \in {\bar{H}^2({\varOmega })}, \\ (f,\xi )_{{L^2({\varOmega })}} + (- {\varDelta }u + u,\xi )_{{L^2({\varOmega })}}= & {} 0 \quad \forall \xi \in {L^2({\varOmega })}, \end{aligned}$$

with unknowns \(f \in {L^2({\varOmega })}\), \(u \in {\bar{H}^2({\varOmega })}\) and \(w \in {L^2({\varOmega })}\). We may write this KKT system in the form:

Determine \((f,u,w) \in {L^2({\varOmega })}\times {\bar{H}^2({\varOmega })}\times {L^2({\varOmega })}\) such that

$$\begin{aligned} \underbrace{\left[ \begin{array}{ccc} \alpha M &{} {0}&{} M' \\ {0}&{} M_{\partial } &{} A' \\ M &{} A &{} 0 \end{array} \right] }_{\mathcal {A}_{\alpha }} \left[ \begin{array}{c} f \\ u \\ w \end{array} \right] = \left[ \begin{array}{c} 0 \\ \tilde{M}_{\partial }d \\ 0 \end{array} \right] , \end{aligned}$$
(20)

where

$$\begin{aligned} M: {L^2({\varOmega })}&\rightarrow {L^2({\varOmega })}', \quad f \mapsto (f,\,\cdot \,)_{{L^2({\varOmega })}}, \end{aligned}$$
(21)
$$\begin{aligned} M_{\partial }: {\bar{H}^2({\varOmega })}&\rightarrow {\bar{H}^2({\varOmega })}',\quad u \mapsto (u,\,\cdot \,)_{{L^2(\partial {\varOmega })}}, \end{aligned}$$
(22)
$$\begin{aligned} \tilde{M}_{\partial }: {L^2(\partial {\varOmega })}&\rightarrow {\bar{H}^2({\varOmega })}', \quad d \mapsto (d,\,\cdot \,)_{{L^2(\partial {\varOmega })}}, \end{aligned}$$
(23)
$$\begin{aligned} A: {\bar{H}^2({\varOmega })}&\rightarrow {L^2({\varOmega })}',\quad u \mapsto (- {\varDelta }u + u,\,\cdot \,)_{{L^2({\varOmega })}}, \end{aligned}$$
(24)

and the notation “\('\)” is used to denote dual operators and dual spaces. In the rest of this paper, the symbols M, \( M_{\partial }\) and A will represent the mappings defined in (21), (22) and (24), respectively, and not (the associated) matrices, as was the case in Sect. 3. We believe that this mild ambiguity improves the readability of the present text.

By using standard techniques for saddle point problems, one can show that the system (20) satisfies the Brezzi conditions [1], provided that \(\alpha >0\). Therefore, for every \(\alpha > 0\), this set of equations has a unique solution. Nevertheless, if the standard norms of \({L^2({\varOmega })}\) and \({H^2({\varOmega })}\) are employed in the analysis, then the constants in the Brezzi conditions will depend on \(\alpha \). More specifically, the constant in the coercivity condition will be of order \(O(\alpha )\), and thus becomes very small for \(0 < \alpha \ll 1\). This property is consistent with the ill-posed nature of (1)–(3) for \(\alpha =0\), and makes it difficult to design \(\alpha \)-robust preconditioners for the algebraic system associated with (20).

Similar to the approach used in [9, 10, 14], we will now introduce weighted Hilbert spaces. The weights are constructed such that the constants appearing in the Brezzi conditions are independent of \(\alpha \). Thereafter, in Sect. 5, we will show how these scaled Hilbert spaces can be combined with simple maps to design \(\alpha \)-robust preconditioners for our model problem.

4.1 Weighted norms

Consider the \(\alpha \)-weighted norms:

$$\begin{aligned} \left\| f \right\| _{L_{\alpha }^2({\varOmega })}^2&= \alpha \left\| f \right\| _{{L^2({\varOmega })}}^2, \end{aligned}$$
(25)
$$\begin{aligned} \left\| u \right\| _{H_{\alpha }^2({\varOmega })}^2&= \alpha \left\| u \right\| _{{H^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2, \end{aligned}$$
(26)
$$\begin{aligned} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}^2&= \frac{1}{\alpha } \left\| w \right\| _{{L^2({\varOmega })}}^2, \end{aligned}$$
(27)

applied to the control f, the state u and the dual/Lagrange-multiplier w, respectively. Note that these norms become “meaningless” for \(\alpha = 0\), but are well defined for positive \(\alpha \).
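At the discrete level, the weighted norms (25)–(27) are scaled quadratic forms with the corresponding Gram matrices; a sketch, where `M`, `H` and `Mb` are placeholder Gram matrices for the \({L^2({\varOmega })}\), \({H^2({\varOmega })}\) and \({L^2(\partial {\varOmega })}\) inner products:

```python
import numpy as np

def weighted_norms(f, u, w, M, H, Mb, alpha):
    """The alpha-weighted norms (25)-(27) for coefficient vectors f, u, w."""
    norm_f = np.sqrt(alpha * (f @ M @ f))               # (25)
    norm_u = np.sqrt(alpha * (u @ H @ u) + u @ Mb @ u)  # (26)
    norm_w = np.sqrt((w @ M @ w) / alpha)               # (27)
    return norm_f, norm_u, norm_w
```

As \(\alpha \rightarrow 0\) the control norm vanishes and the multiplier norm blows up, mirroring the remark above that the norms are well defined only for positive \(\alpha \).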

4.2 Brezzi conditions

We will now analyze the properties of

$$\begin{aligned} \mathcal {A}_{\alpha }: L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\times L_{\alpha ^{-1}}^2({\varOmega })\rightarrow L_{\alpha }^2({\varOmega })' \times H_{\alpha }^2({\varOmega })' \times L_{\alpha ^{-1}}^2({\varOmega })', \end{aligned}$$

defined in (20). More specifically, we will show that the Brezzi conditions are satisfied with constants that do not depend on the size of the regularization parameter \(\alpha > 0\). Note that we use the scaled Hilbert norms (25)–(27).

Lemma 1

For all \(\alpha > 0\), the following “inf-sup” condition holds:

$$\begin{aligned} \inf _{w \in L_{\alpha ^{-1}}^2({\varOmega })} \sup _{(f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \frac{(f,w)_{{L^2({\varOmega })}} + (- {\varDelta }u + u,w)_{{L^2({\varOmega })}}}{\left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}} \ge 1. \end{aligned}$$

Proof

Note that \(L_{\alpha }^2({\varOmega })\) and \(L_{\alpha ^{-1}}^2({\varOmega })\) contain the same functions, provided that \(\alpha > 0\). Let \(w \in L_{\alpha ^{-1}}^2({\varOmega })\) be arbitrary. By choosing \(f=w\) and \(u=0\) we find that

$$\begin{aligned} \begin{aligned}&\sup _{(f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \frac{(f,w)_{{L^2({\varOmega })}} + (- {\varDelta }u + u,w)_{{L^2({\varOmega })}}}{\left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}} \\&\quad \ge \frac{(w,w)_{{L^2({\varOmega })}}}{\left\| (w,0) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}} \\&\quad =\frac{\left\| w \right\| _{{L^2({\varOmega })}}^2}{\sqrt{\alpha }\left\| w \right\| _{{L^2({\varOmega })}} (\sqrt{\alpha })^{-1} \left\| w \right\| _{{L^2({\varOmega })}}} \\&\quad =1. \end{aligned} \end{aligned}$$

Since \(w\in L_{\alpha ^{-1}}^2({\varOmega })\) was arbitrary, this completes the proof. \(\square \)

Expressed in terms of the operators that constitute \(\mathcal {A}_{\alpha }\), Lemma 1 takes the form

$$\begin{aligned} \inf _{w \in L_{\alpha ^{-1}}^2({\varOmega })} \sup _{(f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \frac{\left\langle Mf,w \right\rangle + \left\langle Au,w \right\rangle }{\left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}} \ge 1, \end{aligned}$$

see (21) and (24).

Recall that we decided to write the state equation (2)–(3) in the non-standard variational form (6). Throughout this paper we assume that problem (2)–(3) admits a unique solution \(u \in {\bar{H}^2({\varOmega })}\) for every \(f \in {L^2({\varOmega })}\), and that

$$\begin{aligned} \left\| u \right\| _{{H^2({\varOmega })}} \le c_1 \left\| f \right\| _{{L^2({\varOmega })}}. \end{aligned}$$
(28)

This assumption is valid if \({\varOmega }\) is convex or if \({\varOmega }\) has a \(C^2\) boundary, see e.g. [5, 7]. Inequality (28) is a key ingredient of the proof of our next lemma.

Lemma 2

There exists a constant \(c_2\), which is independent of \(\alpha > 0\), such that

$$\begin{aligned} \begin{aligned} \alpha \left\| f \right\| _{{L^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2&\ge c_2 \left( \alpha \left\| f \right\| _{{L^2({\varOmega })}}^2 + \alpha \left\| u \right\| _{{H^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2 \right) \\&= c_2 \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })}^2 \end{aligned} \end{aligned}$$

for all \((f,u) \in {L^2({\varOmega })}\times {\bar{H}^2({\varOmega })}\) such that

$$\begin{aligned} (f,\phi )_{{L^2({\varOmega })}} + (-{\varDelta }u + u,\phi )_{{L^2({\varOmega })}} = 0 \quad \forall \phi \in {L^2({\varOmega })}. \end{aligned}$$
(29)

Proof

If (fu) satisfies (29), then

$$\begin{aligned} \left\| u \right\| _{{H^2({\varOmega })}} \le c_1 \left\| f \right\| _{{L^2({\varOmega })}}, \end{aligned}$$

see the discussion of (28). Let \(\theta = (1+ c_1^2)^{-1} \in (0,1)\). Since \(\left\| f \right\| _{{L^2({\varOmega })}}^2 \ge c_1^{-2} \left\| u \right\| _{{H^2({\varOmega })}}^2\), it follows that

$$\begin{aligned} \begin{aligned} \alpha \left\| f \right\| _{{L^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2&\ge \alpha \theta \left\| f \right\| _{{L^2({\varOmega })}}^2 +\alpha \frac{1-\theta }{c_1^2} \left\| u \right\| _{{H^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2 \\&\ge \frac{1}{1+ c_1^2} \left( \alpha \left\| f \right\| _{{L^2({\varOmega })}}^2 + \alpha \left\| u \right\| _{{H^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2 \right) . \end{aligned} \end{aligned}$$

\(\square \)

This result may also be written in the form

$$\begin{aligned} \begin{aligned} \left\langle \left[ \begin{matrix} \alpha M &{} {0}\\ {0}&{} M_{\partial } \end{matrix} \right] \left[ \begin{matrix} f \\ u \end{matrix} \right] , \left[ \begin{matrix} f \\ u \end{matrix}\right] \right\rangle&\ge c_2 \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })}^2 \\ \end{aligned} \end{aligned}$$

for all \((f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\) satisfying

$$\begin{aligned} Mf+Au=0, \end{aligned}$$

where M, \(M_{\partial }\) and A are the operators defined in (21), (22) and (24), respectively.

4.3 Boundedness

Having established that the Brezzi conditions hold, with constants that are independent of \(\alpha \), we next explore the boundedness of \(\mathcal {A}_{\alpha }\).

Lemma 3

For all \((f,u),(\psi , \phi ) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\),

$$\begin{aligned} \left| \left\langle \left[ \begin{matrix} \alpha M &{} {0}\\ {0}&{} M_{\partial } \end{matrix} \right] \left[ \begin{matrix} f \\ u \end{matrix} \right] , \left[ \begin{matrix} \psi \\ \phi \end{matrix} \right] \right\rangle \right| \le \sqrt{2} \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| (\psi ,\phi ) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })}. \end{aligned}$$

Proof

Recall the definitions (21) and (22) of M and \(M_{\partial }\), respectively. Since

$$\begin{aligned} \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} = \sqrt{\alpha \left\| f \right\| _{{L^2({\varOmega })}}^2 + \alpha \left\| u \right\| _{{H^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2}, \end{aligned}$$

we find, by employing the Cauchy–Schwarz inequality, that

$$\begin{aligned} \begin{aligned} \left| \left\langle \left[ \begin{matrix} \alpha M &{} {0}\\ {0}&{} M_{\partial } \end{matrix} \right] \left[ \begin{matrix} f \\ u \end{matrix} \right] , \left[ \begin{matrix} \psi \\ \phi \end{matrix} \right] \right\rangle \right|&= \left| \alpha (f,\psi )_{{L^2({\varOmega })}} + (u,\phi )_{{L^2(\partial {\varOmega })}} \right| \\&\le \left\| f \right\| _{L_{\alpha }^2({\varOmega })} \left\| \psi \right\| _{L_{\alpha }^2({\varOmega })} + \left\| u \right\| _{{L^2(\partial {\varOmega })}} \left\| \phi \right\| _{{L^2(\partial {\varOmega })}} \\&\le \sqrt{2}\sqrt{\left\| f \right\| _{L_{\alpha }^2({\varOmega })}^2 \left\| \psi \right\| _{L_{\alpha }^2({\varOmega })}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2 \left\| \phi \right\| _{{L^2(\partial {\varOmega })}}^2} \\&\le \sqrt{2} \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| (\psi ,\phi ) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })}. \end{aligned} \end{aligned}$$

\(\square \)

Lemma 4

For all \((f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\), \(w \in L_{\alpha ^{-1}}^2({\varOmega })\),

$$\begin{aligned} \left| \left\langle \left[ M \, \, A \right] \left[ \begin{matrix} f \\ u \end{matrix} \right] , w \right\rangle \right| \le \sqrt{3} \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}. \end{aligned}$$

Proof

Again, we note that

$$\begin{aligned} \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })}&= \sqrt{\alpha \left\| f \right\| _{{L^2({\varOmega })}}^2 + \alpha \left\| u \right\| _{{H^2({\varOmega })}}^2 + \left\| u \right\| _{{L^2(\partial {\varOmega })}}^2}, \\ \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}&= \frac{1}{\sqrt{\alpha }} \left\| w \right\| _{{L^2({\varOmega })}}. \end{aligned}$$

From the definitions of M and A, see (21) and (24), and the Cauchy-Schwarz inequality, it follows that

$$\begin{aligned} \begin{aligned} \left| \left\langle \left[ M \, \, A \right] \left[ \begin{matrix}f \\ u\end{matrix} \right] , w \right\rangle \right|&= \left| \left\langle Mf,w \right\rangle + \left\langle Au,w \right\rangle \right| \\&= \left|(f, w)_{{L^2({\varOmega })}} + (-{\varDelta }u + u, w)_{{L^2({\varOmega })}} \right| \\&\le \left( \left\| f \right\| _{L_{\alpha }^2({\varOmega })} + \left\| {\varDelta }u \right\| _{L_{\alpha }^2({\varOmega })} + \left\| u \right\| _{L_{\alpha }^2({\varOmega })} \right) \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })} \\&\le \sqrt{3} \left\| (f,u) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })} \left\| w \right\| _{L_{\alpha ^{-1}}^2({\varOmega })}. \end{aligned} \end{aligned}$$

For the last inequality, recall from (7) that \(\left\| {\varDelta }u \right\| _{L^2({\varOmega })}= \left\| \nabla ^2u \right\| _{L^2({\varOmega })}\le \left\| u \right\| _{H^2({\varOmega })}\) for all \(u\in {\bar{H}^2({\varOmega })}\). \(\square \)

4.4 Isomorphism

We have verified that the Brezzi conditions hold, and that \(\mathcal {A}_{\alpha }\) is a bounded operator. Moreover, all constants appearing in the inequalities expressing these properties are independent of the regularization parameter \(\alpha > 0\). Let

$$\begin{aligned} \mathcal {V}&= L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\times L_{\alpha ^{-1}}^2({\varOmega }), \end{aligned}$$
(30)
$$\begin{aligned} \mathcal {V}'&= L_{\alpha }^2({\varOmega })' \times H_{\alpha }^2({\varOmega })' \times L_{\alpha ^{-1}}^2({\varOmega })'. \end{aligned}$$
(31)

Theorem 1

The operator \(\mathcal {A}_{\alpha }\), defined in (20), is bounded and continuously invertible for \(\alpha > 0\) in the sense that for all nonzero \(x\in \mathcal {V}\),

$$\begin{aligned} c \le \sup _{ 0\ne y\in \mathcal {V}} \frac{\left\langle \mathcal {A}_{\alpha }x, y \right\rangle }{\left\| y \right\| _{\mathcal {V}}\left\| x \right\| _{\mathcal {V}}} \le C, \end{aligned}$$
(32)

for some positive constants c and C that are independent of \(\alpha > 0\). In particular,

$$\begin{aligned} \left\| \mathcal {A}_{\alpha }^{-1} \right\| _{\mathcal {L}(\mathcal {V}',\mathcal {V})} \le c^{-1} \quad \hbox {and} \quad \left\| \mathcal {A}_{\alpha } \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V}')} \le C. \end{aligned}$$

Proof

This result follows from Lemmas 1, 2, 3, and 4 and Brezzi theory for saddle point problems, see [1]. \(\square \)

4.5 Estimates for the discretized problem

The stability properties (32) are not necessarily inherited by discretizations. However, the structure used to prove the so-called “inf-sup condition” in Lemma 1 is preserved in the discrete system provided that the same discretization is employed for the control and the Lagrange multiplier. Furthermore, the boundedness properties, Lemmas 3 and 4, certainly also hold for conforming discretizations.

It remains to address the coercivity condition, Lemma 2, for the discretized problem. We consider finite-dimensional subspaces \(U_h\subset U = {\bar{H}^2({\varOmega })}\) and \(W_h\subset W = {L^2({\varOmega })}\). For certain choices of \(U_h\) and \(W_h\), the estimate of Lemma 2 carries over to the finite-dimensional setting.

Lemma 5

Assume \(U_h\subset U\) and \(W_h\subset W\), such that \((1-{\varDelta }) U_h \subset W_h\). Then

$$\begin{aligned} \alpha \left\| f_h \right\| _{L^2({\varOmega })}^2 + \left\| u_h \right\| _{L^2(\partial {\varOmega })}^2 \ge c_2 \left\| (f_h,u_h) \right\| _{L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })}^2 \end{aligned}$$
(33)

for all \((f_h,u_h) \in W_h\times U_h\) such that

$$\begin{aligned} (f_h, \phi _h)_{L^2({\varOmega })}+ (u_h -{\varDelta }u_h, \phi _h)_{L^2({\varOmega })}= 0 \quad \forall \phi _h \in W_h. \end{aligned}$$
(34)

Proof

Assume that \((1-{\varDelta }) U_h \subset W_h\), and that (34) holds for \((f_h,u_h)\in W_h\times U_h\). Then \(f_h +(1-{\varDelta }) u_h \in W_h\), and (34) implies \(f_h +(1-{\varDelta }) u_h = 0\). Therefore, \((f_h, u_h)\) satisfies (29) and the estimate (33) follows from Lemma 2. \(\square \)

If the discretization is chosen such that Lemma 5 is satisfied, then the estimates (32) carry over to the discretized system. More precisely, we have

$$\begin{aligned} \left\| \mathcal {A}_{\alpha ,h} \right\| _{\mathcal {L}(\mathcal {V}_h,\mathcal {V}_h')} \le \left\| \mathcal {A}_{\alpha } \right\| _{\mathcal {L}(\mathcal {V},\mathcal {V}')}, \quad \hbox {and}\quad \left\| \mathcal {A}_{\alpha ,h}^{-1} \right\| _{\mathcal {L}(\mathcal {V}_h',\mathcal {V}_h)} \le \left\| \mathcal {A}_{\alpha }^{-1} \right\| _{\mathcal {L}(\mathcal {V}',\mathcal {V})}, \end{aligned}$$
(35)

where \(\mathcal {V}_h = W_h\times U_h \times W_h \subset \mathcal {V}\), equipped with the inner product of \(\mathcal {V}\), and \(\mathcal {A}_{\alpha ,h}\) is the discrete counterpart of \(\mathcal {A}_{\alpha }\), defined by setting \(\left\langle \mathcal {A}_{\alpha ,h}x_h,y_h \right\rangle = \left\langle \mathcal {A}_{\alpha }x_h, y_h \right\rangle \) for all \(x_h,y_h\in \mathcal {V}_h\).

If the state is discretized with \(C^1\)-conforming bicubic Bogner–Fox–Schmit rectangles, as in Sect. 3, then Lemma 5 is satisfied if the control and Lagrange multiplier are discretized with discontinuous bicubic elements on the same mesh. For triangular meshes, one could choose Argyris triangles for the state variable and piecewise quintic polynomials for the control and Lagrange multiplier variables.

We remark that Lemma 5 provides a sufficient, but not necessary, criterion for stability of the discrete problem, and may require far more degrees of freedom in the discrete space \(W_h\subset W\) than are actually needed. The usefulness of Lemma 5 is that the estimates (35) can, in principle, always be obtained by choosing a sufficiently large space for the control and Lagrange multiplier.

5 Preconditioning

The linear problem (20) is of the form

$$\begin{aligned} \mathcal {A}x = b, \end{aligned}$$
(36)

where x is sought in a Hilbert space \(\mathcal {V}\), the right hand side b is in the dual space \(\mathcal {V}'\), and \(\mathcal {A}\) is a self-adjoint continuous mapping of \(\mathcal {V}\) onto \(\mathcal {V}'\). Iterative methods for linear problems are most often formulated for operators mapping \(\mathcal {V}\) into itself, and cannot be directly applied to the linear system (36), as described in [9]. If we want to apply such methods to (36), then we need to introduce a continuous operator mapping \(\mathcal {V}'\) isomorphically back onto \(\mathcal {V}\). More precisely, if we have a continuous operator

$$\begin{aligned} \mathcal {B}: \mathcal {V}' \rightarrow \mathcal {V}, \end{aligned}$$

then \(\mathcal {M}= \mathcal {B}\mathcal {A}:\mathcal {V}\rightarrow \mathcal {V}\) is continuous and has the desired mapping properties, and if \(\mathcal {B}\) is an isomorphism, the solutions to (36) coincide with the solutions to the problem

$$\begin{aligned} \mathcal {M}x = \mathcal {B}\mathcal {A}x = \mathcal {B}b. \end{aligned}$$
(37)

In this paper we shall call \(\mathcal {B}\in \mathcal {L}(\mathcal {V}',\mathcal {V})\) a preconditioner if \(\mathcal {B}\) is self-adjoint and positive definite. This implies that \(\mathcal {B}^{-1}\) is self-adjoint and positive definite as well, and hence \(\mathcal {B}^{-1}\) defines an inner product on \(\mathcal {V}\) by setting

$$\begin{aligned} \left( x,y \right) = \left\langle \mathcal {B}^{-1} x, y \right\rangle , \quad \quad x,y \in \mathcal {V}. \end{aligned}$$
(38)

This inner product has the crucial property of making \(\mathcal {M}\) self-adjoint, in the sense that

$$\begin{aligned} \left( \mathcal {M}x,y \right) = \left\langle \mathcal {A}x,y \right\rangle = \left\langle \mathcal {A}y,x \right\rangle = \left( \mathcal {M}y,x \right) . \end{aligned}$$
(39)

Conversely, given any inner product \(\left( \,\cdot \,,\,\cdot \, \right) \) on \(\mathcal {V}\), the Riesz–Fréchet theorem provides a self-adjoint positive definite isomorphism \(\mathcal {B}:\mathcal {V}'\rightarrow \mathcal {V}\) such that (38) and (39) hold, and we say that \(\mathcal {B}\) is the Riesz operator induced by \(\left( \,\cdot \,,\,\cdot \, \right) \). This establishes a one-to-one correspondence between preconditioners and inner products on \(\mathcal {V}\). Since the Riesz operator is an isometric isomorphism, the operator norm of \(\mathcal {B}\mathcal {A}\) coincides with the operator norm of \(\mathcal {A}\). We formulate this well-known fact here in a lemma for the sake of self-containedness. We refer to [6, 9] for a more in-depth discussion of preconditioning and its relation to Riesz operators.
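The correspondence (38)–(39) is easy to sanity-check numerically. The following sketch is our illustration, not part of the original development: small random symmetric matrices stand in for \(\mathcal {A}\) and \(\mathcal {B}^{-1}\), and all variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Symmetric (possibly indefinite) matrix standing in for A : V -> V'.
A = rng.standard_normal((n, n))
A = A + A.T

# SPD matrix standing in for B^{-1}, which defines the inner product (38).
Binv = rng.standard_normal((n, n))
Binv = Binv @ Binv.T + n * np.eye(n)
B = np.linalg.inv(Binv)            # the induced Riesz operator V' -> V

def inner(x, y):
    return x @ Binv @ y            # (x, y) = <B^{-1} x, y>, cf. (38)

BA = B @ A                         # preconditioned operator V -> V
x, y = rng.standard_normal(n), rng.standard_normal(n)

# (39): (BA x, y) = <A x, y> = <A y, x> = (BA y, x)
assert np.isclose(inner(BA @ x, y), x @ A @ y)
assert np.isclose(inner(BA @ x, y), inner(BA @ y, x))
```

The point of the sketch is only that symmetry of \(\mathcal {M}=\mathcal {B}\mathcal {A}\) is recovered in the \(\mathcal {B}^{-1}\)-weighted inner product, even though \(BA\) itself is not a symmetric matrix.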

Lemma 6

Let \(\mathcal {V}\) be a Hilbert space, and let \(\mathcal {A}:\mathcal {V}\rightarrow \mathcal {V}'\) be a self-adjoint isomorphism, and assume that \(\mathcal {B}\) is the Riesz operator induced by the inner product on \(\mathcal {V}\), or equivalently, that the inner product on \(\mathcal {V}\) is defined by the self-adjoint positive definite isomorphism \(\mathcal {B}^{-1}:\mathcal {V}\rightarrow \mathcal {V}'\). Then \(\mathcal {B}\mathcal {A}: \mathcal {V}\rightarrow \mathcal {V}\) is an isomorphism, self-adjoint in the inner product on \(\mathcal {V}\), with

$$\begin{aligned} \left\| \mathcal {B}\mathcal {A} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V})} = \left\| \mathcal {A} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V}')} \quad \hbox {and}\quad \left\| (\mathcal {B}\mathcal {A})^{-1} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V})} = \left\| \mathcal {A}^{-1} \right\| _{\mathcal {L}(\mathcal {V}', \mathcal {V})}. \end{aligned}$$

In particular, the condition number of \(\mathcal {B}\mathcal {A}\) is given by

$$\begin{aligned} \kappa (\mathcal {B}\mathcal {A}) = \left\| \mathcal {A}^{-1} \right\| _{\mathcal {L}(\mathcal {V}', \mathcal {V})}\left\| \mathcal {A} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V}')}. \end{aligned}$$

Proof

Since \(\mathcal {A}\) is self-adjoint, \(\mathcal {M}= \mathcal {B}\mathcal {A}\) is self-adjoint with respect to the inner product on \(\mathcal {V}\). From the Riesz–Fréchet theorem we have \(\left\| \mathcal {A}x \right\| _{\mathcal {V}'} = \left\| \mathcal {B}\mathcal {A}x \right\| = \left\| \mathcal {M}x \right\| \), and we obtain the following identity for the operator norm of \(\mathcal {M}\):

$$\begin{aligned} \begin{aligned} \left\| \mathcal {M} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V})}&= \sup _{x\ne 0} \frac{ \left\| \mathcal {M}x \right\| _\mathcal {V}}{\left\| x \right\| _\mathcal {V}} = \sup _{x\ne 0} \frac{ \left\| \mathcal {A}x \right\| _{\mathcal {V}'}}{\left\| x \right\| _\mathcal {V}} \\&= \sup _{x\ne 0}\sup _{y\ne 0} \frac{\left\langle \mathcal {A}x, y \right\rangle }{\left\| x \right\| _\mathcal {V}\left\| y \right\| _\mathcal {V}} = \left\| \mathcal {A} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V}')}. \end{aligned} \end{aligned}$$

A similar identity is obtained for the norm of the inverse operator,

$$\begin{aligned} \begin{aligned} \left\| \mathcal {M}^{-1} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V})}&= \sup _{x\ne 0} \frac{ \left\| \mathcal {M}^{-1} x \right\| _\mathcal {V}}{\left\| x \right\| _\mathcal {V}}\\&= \left( \inf _{x\ne 0} \frac{ \left\| \mathcal {M}x \right\| _\mathcal {V}}{\left\| x \right\| _\mathcal {V}} \right) ^{-1}\\&= \left( \inf _{x\ne 0}\sup _{y\ne 0} \frac{\left\langle \mathcal {A}x, y \right\rangle }{\left\| x \right\| _\mathcal {V}\left\| y \right\| _\mathcal {V}}\right) ^{-1} = \left\| \mathcal {A}^{-1} \right\| _{\mathcal {L}(\mathcal {V}', \mathcal {V})}. \end{aligned} \end{aligned}$$

\(\square \)
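The norm identities of Lemma 6 can also be confirmed numerically. In the finite-dimensional sketch below (ours; random stand-ins, hypothetical names), the weighted operator norms are read off from the spectrum of the symmetric matrix \(S A S\) with \(S = B^{1/2}\), which is similar to \(BA\).

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6

A = rng.standard_normal((n, n))
A = A + A.T                                   # self-adjoint stand-in for A: V -> V'
Binv = rng.standard_normal((n, n))
Binv = Binv @ Binv.T + n * np.eye(n)          # inner product on V
B = np.linalg.inv(Binv)

# B^{1/2} via the spectral decomposition of B
w, Q = np.linalg.eigh(B)
S = Q @ np.diag(np.sqrt(w)) @ Q.T

# In the B^{-1}-inner product, the norms of Lemma 6 reduce to the spectrum
# of the symmetric matrix S A S, which is similar to B A.
lam = np.linalg.eigvalsh(S @ A @ S)
norm_A = np.abs(lam).max()                    # ||B A|| = ||A||
norm_Ainv = 1 / np.abs(lam).min()             # ||(B A)^{-1}|| = ||A^{-1}||

ev = np.linalg.eigvals(B @ A).real            # eigenvalues of B A (real here)
kappa = np.abs(ev).max() / np.abs(ev).min()
assert np.isclose(kappa, norm_A * norm_Ainv)  # κ(BA) = ||A^{-1}|| ||A||
```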

We say that a preconditioner \(\mathcal {B}_{\alpha }\) for \(\mathcal {A}_{\alpha }\) is robust with respect to the parameter \(\alpha \) if \(\kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })\) is bounded uniformly in \(\alpha \). The significance of Lemma 6 is that such a robust preconditioner can be found by identifying (parameter-dependent) norms in which \(\mathcal {A}_{\alpha }\) and \(\mathcal {A}_{\alpha }^{-1}\) are both uniformly bounded.

5.1 Parameter-robust minimum residual method

In Sect. 4 stability of \(\mathcal {A}_{\alpha }\) was shown in the \(\alpha \)-dependent norms defined in (25)–(27). The preconditioner provided by Lemma 6 is the Riesz operator induced by the weighted norms. This operator \(\mathcal {B}_{\alpha }: \mathcal {V}' \rightarrow \mathcal {V}\) takes the form

$$\begin{aligned} \mathcal {B}_{\alpha }= \left[ \begin{matrix} \alpha M &{} {0}&{} {0}\\ {0}&{} \alpha R +M_{\partial } &{} {0}\\ {0}&{} {0}&{} \frac{1}{\alpha } M \end{matrix} \right] ^{-1} \end{aligned}$$
(40)

where \(R:{\bar{H}^2({\varOmega })}\rightarrow {\bar{H}^2({\varOmega })}'\) is the operator induced by the \({H^2({\varOmega })}\) inner product, i.e. \(\left\langle Ru,v \right\rangle = (u,v)_{{H^2({\varOmega })}}\).

Since \(\mathcal {A}_{\alpha }\) is self-adjoint, the preconditioned operator \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }:\mathcal {V}\rightarrow \mathcal {V}\) is self-adjoint in the inner product on \(\mathcal {V}\). Consequently we can apply the minimum residual method (Minres) to the problem

$$\begin{aligned} \mathcal {B}_{\alpha }\mathcal {A}_{\alpha }x = \mathcal {B}_{\alpha }b. \end{aligned}$$

Theorem 2

Let \(\mathcal {A}_{\alpha }\) be the operator defined in (20) and \(\mathcal {B}_{\alpha }\) the operator defined in (40). Then there exists an upper bound, independent of \(\alpha \), for the convergence rate of Minres applied to the preconditioned system

$$\begin{aligned} \mathcal {B}_{\alpha }\mathcal {A}_{\alpha }x = \mathcal {B}_{\alpha }b. \end{aligned}$$

In particular there exists an upper bound, independent of \(\alpha \), for the number of iterations needed to reach the stopping criterion (19).

Proof

A crude upper bound for the convergence rate (more precisely, the two-step convergence rate) of Minres is given by

$$\begin{aligned} \left\| \mathcal {B}_{\alpha }\mathcal {A}_{\alpha }(x-x_{2m}) \right\| _{\mathcal {V}} \le \left( \frac{\kappa -1}{\kappa +1}\right) ^{m} \left\| \mathcal {B}_{\alpha }\mathcal {A}_{\alpha }(x-x_{0}) \right\| _{\mathcal {V}} \end{aligned}$$

where \(\kappa = \kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })\) is the condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\), see e.g. [9]. From Lemma 6 and (32) we determine that \(\kappa \) is bounded independently of \(\alpha \), with

$$\begin{aligned} \begin{aligned} \kappa&= \left\| (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })^{-1} \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V})} \left\| \mathcal {B}_{\alpha }\mathcal {A}_{\alpha } \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V})}\\&= \left\| \mathcal {A}_{\alpha }^{-1} \right\| _{\mathcal {L}(\mathcal {V}', \mathcal {V})} \left\| \mathcal {A}_{\alpha } \right\| _{\mathcal {L}(\mathcal {V}, \mathcal {V}')}\\&\le c^{-1}C. \end{aligned} \end{aligned}$$
(41)

\(\square \)

In practical applications, the operator \(\mathcal {B}_{\alpha }\) will be replaced with a less computationally expensive approximation \({\widehat{\mathcal {B}_{\alpha }}}\). Ideally \({\widehat{\mathcal {B}_{\alpha }}}\) will be spectrally equivalent to \(\mathcal {B}_{\alpha }\), in the sense that the condition number of \({\widehat{\mathcal {B}_{\alpha }}} \mathcal {B}_{\alpha }^{-1}\) is bounded, independently of \(\alpha \). Then the preconditioned system reads

$$\begin{aligned} {\widehat{\mathcal {B}_{\alpha }}} \mathcal {A}_{\alpha }x = {\widehat{\mathcal {B}_{\alpha }}} b, \end{aligned}$$

and the upper bound for the convergence rate is determined by the condition number \(\kappa ({\widehat{\mathcal {B}_{\alpha }}}\mathcal {A}_{\alpha }) \le \kappa ({\widehat{\mathcal {B}_{\alpha }}}\mathcal {B}_{\alpha }^{-1})\kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })\).
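This submultiplicative bound is easy to confirm numerically. The following sketch is ours, with random SPD matrices standing in for \(\mathcal {B}_{\alpha }\) and its approximation \({\widehat{\mathcal {B}_{\alpha }}}\).

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6

def spd(m):
    # random symmetric positive definite matrix
    G = rng.standard_normal((m, m))
    return G @ G.T + m * np.eye(m)

A = rng.standard_normal((n, n))
A = A + A.T                                   # symmetric, possibly indefinite
Binv, Bhat_inv = spd(n), spd(n)               # exact and approximate Riesz maps
B, Bhat = np.linalg.inv(Binv), np.linalg.inv(Bhat_inv)

def cond(X):
    ev = np.abs(np.linalg.eigvals(X))
    return ev.max() / ev.min()

# κ(B̂ A) <= κ(B̂ B^{-1}) κ(B A)
assert cond(Bhat @ A) <= cond(Bhat @ Binv) * cond(B @ A) + 1e-9
```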

Remark

In this paper we only consider the minimum residual method, and we therefore require that the preconditioner is self-adjoint and positive definite. More generally, if other Krylov subspace methods are to be applied to (20), then preconditioners lacking symmetry or definiteness may be considered.

We mention in particular that a preconditioned conjugate gradient method for problems similar to (20) was proposed in [14], based on a clever choice of inner product.

6 Generalization

Is our technique applicable to problems other than (1)–(3)? We will now briefly explore this issue, and show that the preconditioning scheme derived above yields \(\alpha \)-robust methods for a class of problems.

The scaling (25)–(27) was also investigated in [10], but for a family of abstract problems posed in terms of Hilbert spaces: general PDE-constrained optimization problems with linear state equations, subject to Tikhonov regularization. In [10], however, no assumptions were made about the control, state, or observation spaces, except that they were Hilbert spaces. Under these circumstances, it was proved that the coercivity and boundedness of the operator associated with the KKT system hold with \(\alpha \)-independent constants. Nevertheless, in this general setting, the inf-sup condition involved an \(\alpha \)-dependent constant, which eventually yielded theoretical iteration bounds of order \(O([\log \left( \alpha ^{-1} \right) ]^2)\) for Minres.

In the present paper we were able to prove an \(\alpha \)-robust inf-sup condition for the model problem (1)–(3). This is possible because both the control f and the dual/Lagrange-multiplier w belong to \({L^2({\varOmega })}\). From a more general perspective, it turns out that this is the property that must be fulfilled in order for our approach to be successful: The control space and the dual space, associated with the state equation, must coincide. This will usually lead to additional regularity requirements for the state space.

Motivated by this discussion, let us consider an abstract problem of the form:

$$\begin{aligned} \min _{f \in W, \, u \in U} \left\{ \frac{1}{2} \left\| Tu - d \right\| ^2_O + \frac{1}{2} \alpha \left\| f \right\| ^2_W \right\} \end{aligned}$$
(42)

subject to

$$\begin{aligned} \left\langle Au, w \right\rangle + (f,w)_W =0, \quad \forall w \in W. \end{aligned}$$
(43)

Here, W is the dual and control space, U is the state space, and O is the observation space; W, U, and O are Hilbert spaces.

Let us assume that

(A1):

\(A:U \rightarrow W'\) is a continuous linear operator with closed range. In particular, there is a constant \(c_1\) such that for all \(u \in U\),

$$\begin{aligned} \left\| u \right\| _{U/ {\text {Ker}}A} = \inf _{{\tilde{u}} \in {\text {Ker}}A} \left\| u-{\tilde{u}} \right\| _U \le c_1\left\| A u \right\| _{W'}. \end{aligned}$$
(A2):

\(T:U \rightarrow O\) is linear and bounded, and invertible on the kernel of A. That is, there is a constant \(c_2\) such that for all \(u\in {\text {Ker}}A\),

$$\begin{aligned} \left\| u \right\| _U \le c_2 \left\| T u \right\| _O. \end{aligned}$$

It then follows that the KKT system associated with (42)–(43) is well-posed for every \(\alpha > 0\): Determine \((f,u,w) \in W \times U \times W\) such that

$$\begin{aligned} \underbrace{\left[ \begin{matrix} \alpha M &{} {0}&{} M' \\ {0}&{} K &{} A' \\ M &{} A &{} 0 \end{matrix} \right] }_{=\mathcal {A}_{\alpha }} \left[ \begin{matrix} f \\ u \\ w \end{matrix} \right] = \left[ \begin{matrix} 0 \\ {\tilde{K}} d \\ 0 \end{matrix} \right] , \end{aligned}$$
(44)

where

$$\begin{aligned} M&: W \rightarrow W',\quad f \mapsto (f,\,\cdot \,)_W, \end{aligned}$$
(45)
$$\begin{aligned} K&: U \rightarrow U',\quad u \mapsto (Tu,T\,\cdot \,)_O, \end{aligned}$$
(46)
$$\begin{aligned} {\tilde{K}}&: O \rightarrow U', \quad d \mapsto (d,T\,\cdot \,)_O. \end{aligned}$$
(47)

Note that, compared with (13), the boundary observation matrix \(M_{\partial }\) has been replaced with the general observation operator K in (44).

We introduce scaled norms as follows.

$$\begin{aligned} \left\| f \right\| _{W_{\alpha }}^2&= \alpha \left\| f \right\| _W^2, \\ \left\| u \right\| _{U_{\alpha }}^2&= \alpha \left\| Au \right\| _{W'}^2 + \left\| Tu \right\| _O^2, \\ \left\| w \right\| _{W_{\alpha ^{-1}}}^2&= \frac{1}{\alpha } \left\| w \right\| _W^2. \end{aligned}$$

We first show that \(\left\| \,\cdot \, \right\| _{U_\alpha }\) is indeed a norm on U when assumptions (A1) and (A2) hold. It suffices to show that \(\left\| \,\cdot \, \right\| _{U_\alpha }\) is a norm equivalent to \(\left\| \,\cdot \, \right\| _U\) when \(\alpha =1\). We have

$$\begin{aligned} \left\| Tu \right\| _O + \left\| Au \right\| _{W'} \le \big (\left\| T \right\| _{\mathcal {L}( U,O)} + \left\| A \right\| _{\mathcal {L}(U,W')}\big ) \left\| u \right\| _U, \end{aligned}$$
(48)

and letting \(\pi \) denote the orthogonal projection of U onto \({\text {Ker}}A\),

$$\begin{aligned} \begin{aligned} \left\| u \right\| _U&\le \left\| \pi u \right\| _U + \left\| u-\pi u \right\| _U\\&\le c_2 \left\| T\pi u \right\| _O + \left\| u-\pi u \right\| _U\\&\le c_2 \left\| T u \right\| _O + \big (1+c_2 \left\| T \right\| _{\mathcal {L}(U,O)}\big )\left\| u-\pi u \right\| _U\\&\le c_2 \left\| T u \right\| _O +c_1\big (1+c_2\left\| T \right\| _{\mathcal {L}(U,O)}\big ) \left\| Au \right\| _{W'}. \end{aligned} \end{aligned}$$
(49)

Here the last inequality follows from \(\left\| u-\pi u \right\| _{U} = \inf _{{\tilde{u}}\in {\text {Ker}}A}\left\| u-{\tilde{u}} \right\| _U\) and assumption (A1).

We set \(\mathcal {V}= W_\alpha \times U_{\alpha } \times W_{\alpha ^{-1}}\). As in Sect. 4, \(\mathcal {A}_{\alpha }:\mathcal {V}\rightarrow \mathcal {V}'\) can be shown to be an isomorphism, with parameter-independent estimates obtained in the weighted norms.

Theorem 3

There exists positive constants c and C, independent of \(\alpha \), such that for all nonzero \(x \in \mathcal {V}\),

$$\begin{aligned} c \le \sup _{0\ne y \in \mathcal {V}} \frac{\left\langle \mathcal {A}_{\alpha }x, y \right\rangle }{ \left\| x \right\| _{\mathcal {V}} \left\| y \right\| _{\mathcal {V}}} \le C. \end{aligned}$$
(50)

We omit the full proof, which is analogous to that of Theorem 1. The crucial part is the “inf-sup condition” of Lemma 1, which is easily shown to hold in the abstract setting:

$$\begin{aligned} \begin{aligned} \sup _{(f,u) \in W_\alpha \times U_\alpha } \frac{(f,w)_{W} + \left\langle A u,w \right\rangle }{\left\| (f,u) \right\| _{W_\alpha \times U_\alpha } \left\| w \right\| _{W_{\alpha ^{-1}}}}&\ge \frac{(w,w)_{W}}{\left\| (w,0) \right\| _{W_\alpha \times U_\alpha } \left\| w \right\| _{W_{\alpha ^{-1}}}} = 1. \end{aligned} \end{aligned}$$

The coercivity condition of Lemma 2 naturally holds in the prescribed norm on \(U_\alpha \), since for \((f,u)\in W\times U\) such that \(Mf + Au = 0\),

$$\begin{aligned} \alpha \left\| f \right\| _W^2 + \left\| T u \right\| _O^2 = \frac{\alpha }{2} \left\| f \right\| _W^2 + \frac{\alpha }{2} \left\| Au \right\| _{W'}^2 + \left\| T u \right\| _O^2 \ge \frac{1}{2} \left( \left\| f \right\| _{W_\alpha }^2 + \left\| u \right\| _{U_\alpha }^2 \right) . \end{aligned}$$

Note that the weighted norm now depends on A and, as a consequence, the estimates become A-independent. In fact, we obtain bounds for the constants c and C that are independent of \(\alpha \) as well as of the operators appearing in (42)–(43). This is postponed to the next section, where sharp estimates are obtained for (50).

With the estimates (50), Lemma 6 provides a preconditioner for the operator \(\mathcal {A}_{\alpha }\), given as

$$\begin{aligned} \mathcal {B}_{\alpha }= \left[ \begin{matrix} \alpha M &{} {0}&{} {0}\\ {0}&{} \alpha A' M^{-1} A + K &{} {0}\\ {0}&{} {0}&{} \frac{1}{\alpha } M \end{matrix} \right] ^{-1} . \end{aligned}$$
(51)

The condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\) will be bounded independently of \(\alpha \). It is, however, not clear how to find a computationally efficient approximation of \(\mathcal {B}_{\alpha }\) in the abstract setting of (42)–(43).

Example 1

The problem (1)–(3) fits in the abstract framework presented in this section when we assume that the state has \({H^2({\varOmega })}\) regularity. We set \(W= {L^2({\varOmega })}\), \(U={\bar{H}^2({\varOmega })}\), \(A = 1-{\varDelta }\), and \(T:{\bar{H}^2({\varOmega })}\rightarrow {L^2(\partial {\varOmega })}\) is a trace operator, see (46). Since A is a continuous isomorphism, assumptions (A1) and (A2) are both valid. The inner product on \(U_\alpha \) takes the form

$$\begin{aligned} \begin{aligned} (u, v)_{U_\alpha }&= \left\langle K u, v \right\rangle + \alpha \left\langle A M^{-1} A u, v \right\rangle \\&= \int _{\partial {\varOmega }} u v \, ds + \alpha \int _{\varOmega }(u-{\varDelta }u)(v-{\varDelta }v) \, dx\\&= \int _{\partial {\varOmega }} u v \, ds + \alpha \int _{\varOmega }\nabla ^2u : \nabla ^2v + 2 \nabla u \cdot \nabla v \, +uv \,dx, \end{aligned} \end{aligned}$$

where \(\nabla ^2u\) denotes the Hessian of u, and the last equality follows from the boundary condition \(\partial u /\partial \mathbf {n} = 0\) imposed on \({\bar{H}^2({\varOmega })}\). The resulting preconditioner is the one that was used in the numerical experiments, detailed in Sect. 3, and it is spectrally equivalent to the preconditioner defined in (40).

Example 2

Let U, W, and K be as in Example 1, but let us set \(A = -{\varDelta }\). Now A has non-trivial kernel, consisting of the a.e. constant functions, and for constant u we have

$$\begin{aligned} \left\| Tu \right\| _{L^2(\partial {\varOmega })}= \sqrt{\frac{\left|\partial {\varOmega } \right|}{\left|{\varOmega } \right|}} \left\| u \right\| _{\bar{H}^2({\varOmega })}. \end{aligned}$$

Since assumptions (A1) and (A2) are valid, the optimality system is still well-posed. In this case the inner product on \(U_\alpha \) is given by

$$\begin{aligned} \begin{aligned} (u, v)_{U_\alpha } = \int _{\partial {\varOmega }} u v \, ds + \alpha \int _{\varOmega }\nabla ^2u:\nabla ^2v \,dx. \end{aligned} \end{aligned}$$

Example 3

Let us consider the “prototype” problem:

$$\begin{aligned} \min _{f, \, u} \left\{ \frac{1}{2}\left\| u - d \right\| _{{L^2({\varOmega })}}^2 + \frac{\alpha }{2}\left\| f \right\| _{{L^2({\varOmega })}}^2 \right\} \end{aligned}$$

subject to

$$\begin{aligned} - {\varDelta }u + u + f&= 0 \quad \hbox {in } {\varOmega }, \\ \frac{\partial u}{\partial \mathbf {n}}&= 0 \quad \hbox {on } \partial {\varOmega }. \end{aligned}$$

Note that we here consider the case in which observation data is assumed to be available throughout the entire domain \({\varOmega }\) of the state equation.

If the usual variational form of the PDE is used, i.e.,

$$\begin{aligned} (u,w)_{{H^1({\varOmega })}} + (f,w)_{{L^2({\varOmega })}} = 0, \quad \forall w \in {H^1({\varOmega })}, \end{aligned}$$
(52)

then the control space equals \({L^2({\varOmega })}\), whereas the dual space is \({H^1({\varOmega })}\). The preconditioning strategy presented in this section is therefore not applicable.

If instead we can assume \({H^2({\varOmega })}\)-regularity, we can use the variational form

$$\begin{aligned} (- {\varDelta }u +u,w)_{{L^2({\varOmega })}} + (f,w)_{{L^2({\varOmega })}} =0, \quad \forall w \in {L^2({\varOmega })}. \end{aligned}$$
(53)

Now, the control and dual spaces both equal \({L^2({\varOmega })}\). The methodology presented in this section can thus be applied, and a robust preconditioner is obtained. Compared with the preconditioner for the problem with boundary observations only, see Sect. 5, Eq. (40), the only change is the replacement of \(M_{\partial }\) in the (2, 2) block of \(\mathcal {B}_{\alpha }\) with M.

We remark that in [13, 14], parameter-robust preconditioners were proposed for the “prototype” problem, using the standard variational formulation (52) of the PDE. Those methods do not require improved regularity for the state space. Instead, they require that observations are available throughout the computational domain.

7 Eigenvalue analysis

In Sect. 6 it was shown that the condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\), with \(\mathcal {A}_{\alpha }\) defined in (44) and \(\mathcal {B}_{\alpha }\) defined in (51), can be bounded independently of \(\alpha \), as well as independently of the operators appearing in (42)–(43). Moreover, the numerical experiments indicate that the eigenvalues are contained in three intervals, independently of the regularization parameter \(\alpha \), see Fig. 2. In this section we detail the structure of the spectrum of the preconditioned system considered in Sect. 6, and we obtain sharp estimates for the constants appearing in Theorem 3.

We consider self-adjoint linear operators \(\mathcal {A}_{\alpha }\) and \(\mathcal {B}_\alpha \),

$$\begin{aligned} \mathcal {A}_{\alpha } = \left[ \begin{matrix} \alpha M &{} {0}&{} M' \\ {0}&{} K &{} A ' \\ M &{} A &{} 0 \end{matrix} \right] \quad \hbox {and}\quad \mathcal {B}_\alpha ^{-1} = \left[ \begin{matrix} \alpha M &{} {0}&{} {0}\\ {0}&{} K + \alpha R &{} {0}\\ {0}&{} {0}&{} \alpha ^{-1} M \end{matrix} \right] \end{aligned}$$
(54)

where R is defined by

$$\begin{aligned} R = A ' M ^{-1} A . \end{aligned}$$
(55)

We assume that \(A:U\rightarrow W'\) and \(M:W\rightarrow W'\) are continuous operators, for some Hilbert spaces U and W. In addition we will make use of the following assumptions.

(B1):

M is self-adjoint and positive definite,

(B2):

\( K + R\) is positive definite,

(B3):

K is self-adjoint and positive semi-definite.

Assumptions (B1)–(B3) ensure that \(\mathcal {B}_\alpha \) is self-adjoint and positive definite. In particular, assumptions (B1)–(B3) hold for \(\mathcal {A}_{\alpha }\) as in (44), provided that the assumptions of Sect. 6 hold. For simplicity, we also assume that \(\mathcal {A}_{\alpha }\) and \(\mathcal {B}_\alpha \) are finite-dimensional operators.

Theorem 4

Let p, q, and r be the polynomials

$$\begin{aligned} p(\lambda ) = 1-\lambda , \quad q(\lambda ) = 1+\lambda p(\lambda ), \quad r(\lambda ) = p(\lambda ) - \lambda q(\lambda ). \end{aligned}$$

Let \(q_1<q_2\) and \(r_1< r_2 <r_3\) be the roots of q and r, respectively. The spectrum of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\) is contained within three intervals, determined by the roots of p, q, and r, independently of \(\alpha \):

$$\begin{aligned} {\text {sp}}(\mathcal {B}_\alpha \mathcal {A}_{\alpha }) \subset [r_1,q_1] \cup [r_2,1] \cup [q_2,r_3]. \end{aligned}$$
(56)

Consequently, the spectral condition number of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\) is bounded, uniformly in \(\alpha \),

$$\begin{aligned} \kappa (\mathcal {B}_\alpha \mathcal {A}_{\alpha }) \le \frac{r_3}{r_2} \approx 4.05. \end{aligned}$$
(57)

If K has a nontrivial kernel, inequality (57) becomes an equality.

Proof

Consider the equivalent generalized eigenvalue problem

$$\begin{aligned} \left[ \begin{array}{ccc} \alpha M &{} {0}&{} M' \\ {0}&{} K &{} A ' \\ M &{} A &{} 0 \end{array} \right] \left[ \begin{matrix} f \\ u \\ w \end{matrix} \right] \quad = \quad \lambda \, \left[ \begin{array}{ccc} \alpha M &{} {0}&{} {0}\\ {0}&{} K + \alpha R &{} {0}\\ {0}&{} {0}&{} \alpha ^{-1} M \end{array} \right] \left[ \begin{matrix} f \\ u \\ w \end{matrix} \right] \end{aligned}$$
(58)

We show that (58) admits no nontrivial solutions unless \(\lambda \) is as in (56).

Since M is a self-adjoint isomorphism, by assumption (B1), we can rewrite (58) as the three identities

$$\begin{aligned} \alpha p f + w&= 0, \end{aligned}$$
(59)
$$\begin{aligned} p K u + A ' w - \lambda \alpha R u&= 0, \end{aligned}$$
(60)
$$\begin{aligned} f + M ^{-1} A u -\lambda \alpha ^{-1} w&= 0. \end{aligned}$$
(61)

Assume that \(\lambda \) is not contained within the three closed intervals of (56). Then \(p \ne 0\), and we can use (59) to eliminate f from (61):

$$\begin{aligned} \begin{aligned} 0&= \alpha p ( f + M ^{-1} A u - \lambda \alpha ^{-1} w ) = \alpha p M ^{-1} A u - (1 +\lambda p) w \\&= \alpha p M ^{-1} A u - q w . \end{aligned} \end{aligned}$$
(62)

Since q is nonzero, we can use (62) to eliminate w from (60),

$$\begin{aligned} \begin{aligned} 0&= q (p K u + A ' w - \lambda \alpha R u ) = qp K u + \alpha (p - \lambda q)R u \\&= qp K u + \alpha r R u, \end{aligned} \end{aligned}$$
(63)

where the identity (55) was used. By assumption, pq and r are both nonzero. Moreover, it can easily be seen that pq and r have the same sign outside of the bounded intervals of (56). From assumptions (B1)–(B3), and since \(\alpha > 0\), we conclude that \(qp K + \alpha r R\) is a self-adjoint definite operator. Then (63) only admits trivial solutions, and hence \(\lambda \) cannot be an eigenvalue of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\).

The estimate (57) follows from (56), noting that \(\vert {\text {sp}}(\mathcal {B}_\alpha \mathcal {A}_{\alpha }) \vert \subset [r_2, r_3]\). From (63) it can be seen that the roots of r are eigenvalues of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\) if \({\text {Ker}}K \) is nontrivial. \(\square \)
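Theorem 4 can be verified numerically for small random operators satisfying (B1)–(B3). The sketch below is our illustration (all names hypothetical): it assembles the blocks of (54), solves the generalized eigenvalue problem (58), and checks the inclusion (56) and the bound (57).

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(3)
n = 8
Z = np.zeros((n, n))

M = rng.standard_normal((n, n)); M = M @ M.T + n * np.eye(n)  # (B1): SPD
A = rng.standard_normal((n, n))               # invertible a.s., so (B2) holds
C = rng.standard_normal((3, n)); K = C.T @ C  # (B3): PSD, Ker K nontrivial
R = A.T @ np.linalg.solve(M, A)               # R = A' M^{-1} A, cf. (55)

q1, q2 = sorted(np.roots([1, -1, -1]).real)         # roots of q
r1, r2, r3 = sorted(np.roots([1, -1, -2, 1]).real)  # roots of r

tol = 1e-6
for alpha in [1e-4, 1e-2, 1.0]:
    Aop = np.block([[alpha * M, Z, M], [Z, K, A.T], [M, A, Z]])
    Binv = np.block([[alpha * M, Z, Z],
                     [Z, K + alpha * R, Z],
                     [Z, Z, M / alpha]])
    lam = eigh(Aop, Binv, eigvals_only=True)  # the generalized problem (58)
    assert all(r1 - tol <= ev <= q1 + tol or r2 - tol <= ev <= 1 + tol
               or q2 - tol <= ev <= r3 + tol for ev in lam)     # inclusion (56)
    kappa = np.abs(lam).max() / np.abs(lam).min()
    assert kappa <= r3 / r2 + tol             # bound (57), r3/r2 ≈ 4.05
```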

Remark

If \(A = (1 - {\varDelta }):{\bar{H}^2({\varOmega })}\rightarrow {L^2({\varOmega })}'\), then \(R = A' M^{-1} A\) is characterized by a bilinear form \(b(\cdot ,\cdot )\) as in (15):

$$\begin{aligned} \begin{aligned} \langle A' M^{-1} A u, v \rangle&= \int _{\varOmega }{\varDelta }u {\varDelta }v + 2 \nabla u \cdot \nabla v + u v \, dx\\&= (u,v)_{{H^2({\varOmega })}} + \int _{\varOmega }\nabla u \cdot \nabla v \, dx= b(u, v) \end{aligned} \end{aligned}$$

For discretizations \(U_h\subset U\) and \(W_h\subset W\) such that \(A(U_h) \subset M(W_h)\), the discretization of b coincides with \(A_h'M_h^{-1} A_h\). This follows from an argument similar to that in the proof of Lemma 5, and as a consequence, Theorem 4 can be applied to the preconditioned discrete systems considered in Sect. 3.

8 Discretization with \(H^1\) conforming finite elements

The theory outlined in Sect. 4 provides a robust preconditioning technique for the optimality system (20), assuming additional regularity and making use of \(H^2\) conforming elements. However, this additional regularity only appears relevant to the discretization of the (2, 2) block of the ideal preconditioner (51), since the coefficient matrix in (44) only involves second-order operators. It therefore seems reasonable that the use of sophisticated \(H^2\) conforming elements could be avoided in favour of standard \(H^1\) conforming elements, provided that we can implement an approximate inverse of the fourth-order operator appearing in the preconditioner.

To be precise, we can discretize the optimality system (10)–(12) with \(H^1\)-conforming piecewise linear Lagrange elements for all the unknown variables. Note that this requires applying the integration by parts formula to the state equation, resulting in the variational problem

$$\begin{aligned} \alpha (f_h,\psi _h)_{{L^2({\varOmega })}}+(\psi _h,w_h)_{{L^2({\varOmega })}}= & {} 0 \quad \forall \psi _h \in V_h, \\ (u_h-d,\phi _h)_{{L^2(\partial {\varOmega })}} + (\phi _h, w_h)_{{H^1({\varOmega })}}= & {} 0 \quad \forall \phi _h \in V_h, \\ (f_h,\xi _h)_{{L^2({\varOmega })}} + (u_h,\xi _h)_{{H^1({\varOmega })}}= & {} 0 \quad \forall \xi _h \in V_h. \end{aligned}$$

for \((f_h, u_h, w_h) \in V_h \times V_h \times V_h\), where \(V_h\) is the space of continuous piecewise linear functions. Since all three unknowns belong to the same space, the eigenvalue analysis in Sect. 7 can be applied to the discretized coefficient matrix, which reads

$$\begin{aligned} \left[ \begin{array}{ccc} \alpha M_h &{} {0}&{} M_h \\ {0}&{} K_h &{} A_h \\ M_h &{} A_h &{} 0 \end{array} \right] , \end{aligned}$$
(64)

where \(M_h\) is the mass matrix, \(A_h\) is the matrix induced by the \(H^1\) inner product, and \(K_h\) is the boundary mass matrix; all three matrices are symmetric. The analysis in Sect. 7 reveals that an ideal preconditioner is given by

$$\begin{aligned} \left[ \begin{array}{ccc} \alpha M_h &{} {0}&{} {0}\\ {0}&{} K_h + \alpha A_h M_h^{-1} A_h &{} {0}\\ {0}&{} {0}&{} \alpha ^{-1} M_h \end{array} \right] ^{-1}, \end{aligned}$$
(65)

with condition numbers of the preconditioned system bounded independently of \(\alpha \) and the discretization parameter h.
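This robustness claim can be probed numerically. The sketch below assembles a 1D analogue of (64) with piecewise linear elements on the unit interval, where the boundary observation term reduces \(K_h\) to point evaluation at the two endpoints, and compares the spectral condition number of the system preconditioned by (65) with the unpreconditioned one. The assembly, parameter values and tolerance are our own illustrative choices, not the paper's experiment:

```python
# Illustrative 1D analogue of (64)/(65): P1 elements on [0, 1]; in 1D the
# boundary mass matrix K_h acts only on the two endpoint degrees of freedom.
import numpy as np
from scipy.linalg import eigh

def p1_matrices(n):
    """Consistent mass matrix M and H^1 Gram matrix A = S + M, n elements."""
    h = 1.0 / n
    e = np.ones(n + 1)
    M = h / 6.0 * (4.0 * np.diag(e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1))
    M[0, 0] = M[-1, -1] = h / 3.0
    S = (2.0 * np.diag(e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / h
    S[0, 0] = S[-1, -1] = 1.0 / h
    return M, S + M

alpha, n = 1e-4, 32
M, A = p1_matrices(n)
K = np.zeros_like(M)
K[0, 0] = K[-1, -1] = 1.0                      # boundary mass matrix on {0, 1}
Z = np.zeros_like(M)

AA = np.block([[alpha * M, Z, M],
               [Z,         K, A],
               [M,         A, Z]])             # coefficient matrix, cf. (64)
B = np.block([[alpha * M, Z, Z],
              [Z, K + alpha * A @ np.linalg.solve(M, A), Z],
              [Z, Z, M / alpha]])              # block-diagonal matrix in (65)

lam = eigh(AA, B, eigvals_only=True)           # real: AA symmetric, B SPD
cond_pre = np.abs(lam).max() / np.abs(lam).min()

lam_un = np.linalg.eigvalsh(AA)
cond_un = np.abs(lam_un).max() / np.abs(lam_un).min()
```

In this setup the preconditioned condition number should stay far below the unpreconditioned one, which degenerates as \(\alpha \rightarrow 0\) and \(h \rightarrow 0\); the comparison is deliberately loose rather than a sharp bound.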

The operator \(K_h + A_h M_h^{-1} A_h\) in the (2,2) block of (65) coincides with the Schur complement of a Ciarlet–Raviart mixed finite element formulation of the fourth order problem (16)–(18), and can be thought of as a non-local fourth order operator. Multigrid techniques for a similar operator were studied in  [8], where a multigrid W-cycle applied to a local operator approximating the Schur complement was shown to be an efficient preconditioner.
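The non-locality is easy to see in practice: the inverse of the consistent mass matrix is dense, so the Schur complement couples all degrees of freedom even though \(K_h\), \(A_h\) and \(M_h\) are sparse. A small illustrative check with 1D P1 matrices of our own construction (the scalar \(\alpha \)-weighting in (65) does not change the sparsity pattern):

```python
# Why K_h + A_h M_h^{-1} A_h is non-local: M^{-1} is dense, so the Schur
# complement is (essentially) full even though K, A, M are tridiagonal.
# Illustrative 1D P1 assembly on [0, 1], not taken from the paper.
import numpy as np

n, h = 16, 1.0 / 16
e = np.ones(n + 1)
M = h / 6.0 * (4.0 * np.diag(e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1))
M[0, 0] = M[-1, -1] = h / 3.0
S = (2.0 * np.diag(e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / h
S[0, 0] = S[-1, -1] = 1.0 / h
A = S + M                                      # H^1 Gram matrix (tridiagonal)
K = np.zeros_like(M)
K[0, 0] = K[-1, -1] = 1.0                      # boundary mass matrix

Schur = K + A @ np.linalg.solve(M, A)          # dense: M^{-1} couples all dofs
sparse_nnz = np.count_nonzero(np.abs(A) > 1e-12)
dense_nnz = np.count_nonzero(np.abs(Schur) > 1e-12)
```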

Table 4 presents iteration numbers and estimated condition numbers for a simplistic scheme where we replace the (2,2) block in (65) with \(K_h + A_h {\tilde{M}}_h^{-1} A_h\), where \({\tilde{M}}_h\) is a lumped mass matrix. For the approximate inversion of (65), we applied an algebraic multigrid W-cycle to the (2,2) block and two symmetric Gauss–Seidel iterations to the remaining two diagonal blocks. The experiment was carried out on a unit square domain and an L-shaped domain, both triangulated with structured meshes. For the L-shaped domain, the \(H^2\)-regularity discussed at the beginning of Sect. 2 is known not to hold.
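The text does not spell out which lumping is used; a common construction, assumed here, is row-sum lumping, which replaces \(M_h\) by a diagonal matrix with the same row sums. Its inverse is then diagonal, so \(K_h + A_h {\tilde{M}}_h^{-1} A_h\) remains sparse (pentadiagonal in 1D) and is amenable to an AMG cycle:

```python
# Row-sum mass lumping (an assumed, common construction): M -> diag(row sums).
# Illustrative 1D P1 matrices on [0, 1], not taken from the paper.
import numpy as np

n, h = 16, 1.0 / 16
e = np.ones(n + 1)
M = h / 6.0 * (4.0 * np.diag(e) + np.diag(e[:-1], 1) + np.diag(e[:-1], -1))
M[0, 0] = M[-1, -1] = h / 3.0
S = (2.0 * np.diag(e) - np.diag(e[:-1], 1) - np.diag(e[:-1], -1)) / h
S[0, 0] = S[-1, -1] = 1.0 / h
A = S + M                                      # H^1 Gram matrix (tridiagonal)
K = np.zeros_like(M)
K[0, 0] = K[-1, -1] = 1.0                      # boundary mass matrix

M_lumped = np.diag(M.sum(axis=1))              # diagonal, same row sums as M
# tridiagonal * diagonal * tridiagonal has bandwidth 2, i.e. pentadiagonal
Schur_lumped = K + A @ np.linalg.inv(M_lumped) @ A
```

Row-sum lumping preserves the total mass (the integral of the constant function 1), which is one reason it is a popular choice.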

The iteration numbers reported in Table 4 appear bounded, although we observe an increase in the estimated condition number for the L-shaped domain as the mesh is refined. While the condition number obtained with the exact inverse (65) is bounded, in accordance with the analysis in Sect. 7, this appears not to be the case when the exact inverse of the (2,2) block is replaced by an AMG cycle.

We remark that for unstructured meshes we observed iteration counts increasing with mesh refinement, indicating that more complicated geometries require a more sophisticated approach to the multilevel approximation of the (2,2) block, for example as in  [8].

Table 4 Minres iteration counts, with estimated condition numbers in parentheses, for the coefficient matrix (64) with a preconditioner based on (65)

9 Discussion

Previously, parameter robust preconditioners for PDE-constrained optimization problems have been successfully developed, provided that observation data is available throughout the entire domain of the state equation. For many important inverse problems arising in industry and science, this is an unrealistic requirement: observation data will typically only be available in subregions of the domain of the state variable, or at the boundary of this domain. We have therefore explored the possibility of constructing robust preconditioners for PDE-constrained optimization problems with limited observation data.

For an elliptic control problem with boundary observations only, we have developed a regularization robust preconditioner for the associated KKT system. Consequently, the number of Minres iterations required to solve the problem is bounded independently of both the regularization parameter \(\alpha \) and the mesh size h. To achieve this, it was necessary to write the elliptic state equation in a non-standard, non-self-adjoint variational form. With this approach, the control and the Lagrange multiplier belong to the same Hilbert space, which leads to extra regularity requirements for the state. This makes it possible to construct parameter weighted metrics such that the constants appearing in the Brezzi conditions, as well as the constants in the inequalities expressing the boundedness of the KKT system, are independent of \(\alpha \) and h. Consequently, the spectrum of the preconditioned KKT system is uniformly bounded with respect to \(\alpha \) and h, which is ideal for the Minres scheme. These properties were illustrated through a series of numerical experiments, in which the preconditioned Minres scheme handled our model problem excellently.

The use of a non-self-adjoint form of the elliptic state equation leads to additional challenges for conforming discretization schemes and in multigrid implementations. For the numerical experiments, we employed a \(C^1\) finite element discretization that is \(H^2\)-conforming, where the rectangular elements are tensor products of Hermite intervals. This discretization is limited to structured meshes. While there are other, more flexible \(C^1\) finite element discretizations available in two dimensions (e.g. Argyris and Bell triangles), all of these methods suffer from high computational cost due to the smoothness requirements imposed on the nodal basis functions. In three dimensions, the situation is even worse, and \(C^1\) discretizations with tetrahedra become nearly intractable, see e.g.  [15].

Some of the difficulties with traditional \(C^1\) finite element discretizations can be avoided with Galerkin methods making use of basis functions that naturally fulfill the smoothness requirements. Examples of such methods include discretization with spline basis functions, such as isogeometric analysis [3]. Another approach is the virtual element method [2]. However, the development of multilevel methods for the fourth order operator in the preconditioner (51) would remain a challenging problem.

We have also demonstrated that the technique is applicable beyond \(H^2\)-conforming discretizations.

Our findings for the simple elliptic control problem were generalized to a broader class of KKT systems. It turns out that the methodology is applicable whenever the control and the Lagrange multiplier belong to the same space and the state equation fulfills extra regularity properties; these are the key requirements. From a theoretical perspective, this is in many cases not a severe restriction, but it gives rise to new challenges for the discrete problems, as is the case even for the elliptic state equation considered in this text. Moreover, our approach will not yield \(\alpha \) independent bounds if the control is only defined on a subdomain of the domain of the state equation; in such cases, the spaces for the control and the Lagrange multiplier will not coincide. How to design efficient parameter-robust preconditioners for such problems is, as far as the authors know, still an open problem.