Abstract
Regularization-robust preconditioners for PDE-constrained optimization problems have been successfully developed. These methods, however, typically assume that observation data and control are available throughout the entire domain of the state equation. For many inverse problems, this is an unrealistic assumption. In this paper we propose and analyze preconditioners for PDE-constrained optimization problems with limited observation data, e.g. observations that are only available at the boundary of the solution domain. Our methods are robust with respect to both the regularization parameter and the mesh size; that is, the condition number of the preconditioned optimality system is uniformly bounded, independently of these two parameters. The methods do, however, require extra regularity. We first consider a prototypical elliptic control problem and thereafter more general PDE-constrained optimization problems. Our theoretical findings are illuminated by several numerical results.
Introduction
Consider the model problem:
on a Lipschitz domain \({\varOmega }\subset \mathbb {R}^n\), subject to
This minimization task is similar to the standard example considered in PDE-constrained optimization. However, instead of assuming that observation data are available everywhere in \({\varOmega }\), we consider the case where observations are only given at the boundary \(\partial {\varOmega }\) of \({\varOmega }\), that is, \(d \in {L^2(\partial {\varOmega })}\); see the first term in (1). For problems of the form (1)–(3) in which the objective functional is replaced by
very efficient preconditioners have been developed for the associated KKT system. In fact, by employing proper \(\alpha \)-dependent scalings of the involved Hilbert spaces [14], or by using a Schur complement approach [13], methods that are robust with respect to the size of the regularization parameter \(\alpha \) have been developed. More specifically, the condition number of the preconditioned optimality system is small and bounded independently of \(0 < \alpha \ll 1\) and the mesh size h. This ensures good performance for suitable Krylov subspace methods, e.g. the minimum residual method (Minres), independently of both parameters. These techniques have been extended to handle time-dependent problems [12] and PDE-constrained optimization with Stokes equations [17], but the rigorous analysis of \(\alpha \)-independent bounds always requires that observations are available throughout all of \({\varOmega }\).
For cases with limited observations, for example with cost functionals of the form (1), efficient preconditioners are also available for a rather large class of PDE-constrained optimization problems, see [10, 11]. But these techniques do not yield convergence rates, for the preconditioned KKT system, that are completely robust with respect to the size of the regularization parameter \(\alpha \). Instead, the number of preconditioned Minres iterations grows logarithmically with respect to the size of \(\alpha ^{-1}\), as \(\alpha \rightarrow 0\):
for constants a and b independent of \(\alpha \). According to the numerical experiments presented in [11], the size of b may become significant; more specifically, \(b \in [5,50]\) for problems with simple elliptic state equations posed on rectangles. Thus, for small values of \(\alpha \), Minres may require a rather large number of iterations to converge, even though the growth in iteration numbers is only logarithmic.
In practice, observations are rarely available throughout the entire domain of the state equation. On the contrary, the purpose of solving an inverse problem is typically to use data recorded at the surface of an object to compute internal properties of that object: impedance tomography, the inverse problem of electrocardiography (ECG), computerized tomography (CT), etc. This fact, combined with the discussion above, motivates the need for further improving numerical methods for solving KKT systems arising in connection with PDE-constrained optimization.
This paper is organized as follows. In the next section we derive the KKT system associated with the model problem (1)–(3). Our \(\alpha \)-robust preconditioner is presented in Sect. 3, along with a number of numerical experiments. Sections 4 and 5 contain our analysis, and the method is generalized in Sects. 6 and 7. In Sect. 8 we discuss the preconditioner when applied to a standard finite element approximation of the problem. Section 9 provides a discussion of our findings, including their limitations.
KKT system
Consider the PDE (2) with the boundary condition (3). A solution u to this elliptic PDE, with source term \(f\in {L^2({\varOmega })}\), is known to have improved regularity, i.e. \(u\in H^{s}({\varOmega })\), for some \(s\in [1, 2]\), with s depending on the domain \({\varOmega }\). In the remainder of this paper we assume that the solution u is in \({H^2({\varOmega })}\) for any source term \(f\in {L^2({\varOmega })}\). This assumption is known to hold if \({\varOmega }\) is convex or if \(\partial {\varOmega }\) is \(C^2\), see e.g. [5, 7].
When solutions to (2) exhibit \({H^2({\varOmega })}\)-regularity, we can write the problem in the nonstandard variational form: Find \(u \in {\bar{H}^2({\varOmega })}\) such that
where
equipped with the inner product
Here \(\nabla ^2u\) denotes the Hessian of u, and the second identity is due to the boundary condition \(\frac{\partial u}{\partial \mathbf {n}} = 0\) imposed on the space \({\bar{H}^2({\varOmega })}\).
We will see below that, in order to design a regularization-robust preconditioner for (1)–(3), it is convenient to express the state equation in the form (6), instead of employing integration by parts/Green's formula to write it in the standard self-adjoint form.
Optimality system
We may express (1)–(3) in the form:
subject to
The associated Lagrangian reads
with \(f \in {L^2({\varOmega })}\), \(u \in {\bar{H}^2({\varOmega })}\) and \(w \in {L^2({\varOmega })}\). From the first order optimality conditions
we obtain the optimality system: determine \((f,u,w) \in {L^2({\varOmega })}\times {\bar{H}^2({\varOmega })}\times {L^2({\varOmega })}\) such that
Numerical experiments
Prior to analyzing our model problem, we will consider some numerical experiments. Discretization of (10)–(12) yields an algebraic system of the form
where M is a mass matrix, the discretization of the \({L^2({\varOmega })}\) inner product, and \(M_{\partial }\) is a mass matrix associated with the boundary \(\partial {\varOmega }\) of \({\varOmega }\). A is the matrix that arises upon discretization of the operator \((1-{\varDelta })\). Since we write the state equation in a non-self-adjoint form, A will not be the usual sum of the stiffness and mass matrices. Instead, Eq. (6) is discretized with subspaces of \({\bar{H}^2({\varOmega })}\) and \({L^2({\varOmega })}\). Consequently, A will in general not be a square matrix.
In (13), we have implicitly used the same discretization for the control variable and the Lagrange multiplier. In (10)–(12), both variables belong to \({L^2({\varOmega })}\), so it seems natural to preserve this correspondence in the discretization. In fact, we can see from (10) that \(f = \alpha ^{-1} w\), so the control could be eliminated from the system prior to the discretization. This would result in a \(2\times 2\) block system in place of (13). While solving the smaller system is more practical in terms of computational cost, we find that the analysis is more clearly presented for the \(3\times 3\) system (13).
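The elimination can be sketched on toy matrices. The \(3\times 3\) sign pattern below is an assumed, hypothetical block layout, not the exact matrix (13); the point is only that substituting \(f = \alpha ^{-1}w\) reproduces the solution of the reduced \(2\times 2\) system.

```python
import numpy as np

# Schematic elimination of the control from a 3x3 KKT block system,
# assuming (hypothetically) the symmetric block form
#   [ alpha*M   0      -M  ] [f]   [0]
#   [ 0         M_b     A.T] [u] = [d]
#   [-M         A       0  ] [w]   [0],
# whose first row encodes alpha*M f = M w, i.e. f = w/alpha.
rng = np.random.default_rng(0)
n = 5
alpha = 1e-3
M = np.eye(n)                      # stand-in for a mass matrix
Mb = np.diag(rng.random(n))        # stand-in boundary mass matrix
A = rng.random((n, n))             # stand-in (generally non-square) state operator

K3 = np.block([[alpha * M, np.zeros((n, n)), -M],
               [np.zeros((n, n)), Mb, A.T],
               [-M, A, np.zeros((n, n))]])

# Eliminating f = w/alpha yields the reduced 2x2 system
#   [ M_b    A.T      ] [u]   [d]
#   [ A     -M/alpha  ] [w] = [0]
K2 = np.block([[Mb, A.T],
               [A, -M / alpha]])

d = rng.random(n)
f, u, w = np.split(np.linalg.solve(K3, np.concatenate([np.zeros(n), d, np.zeros(n)])), 3)
u2, w2 = np.split(np.linalg.solve(K2, np.concatenate([d, np.zeros(n)])), 2)
assert np.allclose(u, u2) and np.allclose(w, w2)
assert np.allclose(f, w / alpha)   # the eliminated control
```

The same block algebra goes through for any invertible mass matrix M; the identity is used here only to keep the arithmetic transparent.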
In the current numerical experiments, we employ the Bogner–Fox–Schmit (BFS) rectangle for discretizing the state variable \(u \in {\bar{H}^2({\varOmega })}\). That is, the finite element space consists of bicubic polynomials that are continuous and have continuous first-order derivatives and mixed second-order derivatives at each vertex of the mesh. BFS elements are \(C^1\) on rectangular meshes and therefore \(H^2\)-conforming. The control f and Lagrange multiplier w are discretized with discontinuous bicubic elements.
We propose to precondition (13) with the blockdiagonal matrix
where R results from a discretization of the bilinear form \(b(\cdot ,\cdot )\) on \({\bar{H}^2({\varOmega })}\):
In the experiments presented below, we used this bilinear form to construct a multigrid approximation of \(\left( \alpha R+M_\partial \right) ^{-1}\).
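The block structure of the preconditioner can be sketched as follows. The layout \(\mathrm{diag}\bigl((\alpha M)^{-1},\,(\alpha R+M_\partial )^{-1},\,(\alpha ^{-1}M)^{-1}\bigr)\) is a plausible reading of (14), stated here as an assumption, and a sparse LU factorization stands in for the multigrid cycle on the middle block.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def block_diag_preconditioner(M, R, Mb, alpha):
    """Apply diag((alpha*M)^{-1}, (alpha*R + Mb)^{-1}, (M/alpha)^{-1}).

    Sparse LU (spla.factorized) stands in for the multigrid cycle on the
    alpha*R + Mb block; the assumed layout is a sketch, not matrix (14).
    """
    solve1 = spla.factorized(sp.csc_matrix(alpha * M))
    solve2 = spla.factorized(sp.csc_matrix(alpha * R + Mb))
    solve3 = spla.factorized(sp.csc_matrix(M / alpha))
    n1, n2 = M.shape[0], R.shape[0]

    def matvec(x):
        return np.concatenate([solve1(x[:n1]),
                               solve2(x[n1:n1 + n2]),
                               solve3(x[n1 + n2:])])

    n = 2 * n1 + n2
    return spla.LinearOperator((n, n), matvec=matvec)

# Toy usage with diagonal stand-in matrices
n = 4
M = sp.identity(n, format='csc')
R = sp.diags(np.full(n, 2.0), format='csc')
Mb = sp.identity(n, format='csc')
B = block_diag_preconditioner(M, R, Mb, alpha=1e-4)
y = B @ np.ones(3 * n)
```

Wrapping the action in a `LinearOperator` is convenient because SciPy's Krylov solvers accept preconditioners in exactly this form.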
Remark
The bilinear form (15) is equivalent to the inner product on \({\bar{H}^2({\varOmega })}\). The additional term stems from our choice of implementing a multigrid algorithm for the bilinear form associated with the operator \(({\varDelta }-1)^2 = {\varDelta }^2 - 2 {\varDelta }+1\). Indeed, the bilinear form \(\alpha b(\,\cdot \,, \,\cdot \,) + (\,\cdot \,,\,\cdot \,)_{{L^2(\partial {\varOmega })}}\) can be seen to coincide with the variational form associated with the fourth-order problem
To limit the technical complexity of the implementation, we considered the problem (1)–(3) on the unit square in two dimensions. The experiments were implemented in Python and SciPy. The meshes were uniform rectangular, with the coarsest level for the multigrid solver consisting of \(8\times 8\) rectangles. Figure 1 shows an example of a solution of the optimality system (13).
Eigenvalues
Let us first consider the exact preconditioner \(\mathcal {B}_{\alpha }\) defined in (14). If \(\mathcal {B}_{\alpha }\) is a good preconditioner for the discrete optimality system (13), then the spectral condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\) should be small and bounded, independently of the size of both the regularization parameter \(\alpha \) and the discretization parameter h.
The eigenvalues of this preconditioned system were computed by solving the generalized eigenvalue problem
We found that the absolute values of the eigenvalues \(\lambda \) were bounded, with
uniformly in \(\alpha \in \{1,10^{-1},\ldots , 10^{-10}\}\) and \(h \in \{2^{-2},\ldots ,2^{-5}\}\). This yields a uniform condition number \(\kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }) \approx 4.05\). The spectra of the preconditioned systems are pictured in Fig. 2 for some choices of \(\alpha \). The spectra are clearly divided into three bounded intervals, and the eigenvalues are more clustered for \(\alpha \approx 1\) and for very small \(\alpha \).
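The mechanics of this computation can be sketched with SciPy's generalized symmetric eigensolver, which handles \(\mathcal {A}_{\alpha }x = \lambda \mathcal {B}_{\alpha }^{-1}x\) directly. The matrices below are random stand-ins for the assembled blocks, so only the procedure, not the reported spectra, is reproduced.

```python
import numpy as np
from scipy.linalg import eigh

# Generalized eigenvalue problem A_alpha x = lambda * B_alpha^{-1} x:
# scipy.linalg.eigh(A, B) solves A x = lambda B x for symmetric A and
# SPD B.  Random stand-ins replace the assembled KKT and preconditioner.
rng = np.random.default_rng(1)
n = 30
C = rng.random((n, n))
A_alpha = C + C.T                    # symmetric indefinite stand-in for the KKT matrix
B_inv = C.T @ C + n * np.eye(n)      # SPD stand-in for the inverse preconditioner

lam = eigh(A_alpha, B_inv, eigvals_only=True)    # real generalized eigenvalues
cond = np.abs(lam).max() / np.abs(lam).min()     # spectral condition number
```

For the actual experiment one would assemble \(\mathcal {A}_{\alpha }\) and \(\mathcal {B}_{\alpha }^{-1}\) from the finite element matrices of (13) and (14) and repeat the computation over the stated ranges of \(\alpha \) and h.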
Multilevel preconditioning
In practice, the action of \(\mathcal {B}_{\alpha }\) is replaced with a less computationally expensive operation \({\widehat{\mathcal {B}_{\alpha }}}\). Note that \(\mathcal {B}_{\alpha }\) has a block structure, and that computationally efficient approximations can be constructed for the individual blocks. The only challenging block of the preconditioner is the biharmonic operator \(\alpha R+M_\partial \). Order-optimal multilevel algorithms for fourth-order operators discretized with the Bogner–Fox–Schmit element were developed in [16]. Specifically, it was shown that a multigrid V-cycle using a symmetric \(4\times 4\) block Gauss–Seidel smoother, where each block contains the matrix entries corresponding to all degrees of freedom associated with a vertex in the mesh, results in an order-optimal approximation. The remaining blocks of the preconditioner are weighted mass matrices, which are efficiently handled by two symmetric Gauss–Seidel iterations for the (1,1) and (3,3) blocks.
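A generic symmetric block Gauss–Seidel sweep can be sketched as follows; the matrix, the 4-DOF index blocks, and the sweep count are toy assumptions, not the assembled BFS operator from [16].

```python
import numpy as np

# One symmetric block Gauss-Seidel sweep: a forward pass over the blocks
# followed by a backward pass.  In the multigrid smoother of [16], each
# block would gather the four BFS degrees of freedom of one mesh vertex.
def sym_block_gauss_seidel(A, b, x, blocks):
    for order in (blocks, blocks[::-1]):
        for idx in order:
            r = (b - A @ x)[idx]                          # local residual
            x[idx] += np.linalg.solve(A[np.ix_(idx, idx)], r)
    return x

# Toy usage: a diagonally dominant SPD matrix and 4-DOF index blocks.
rng = np.random.default_rng(3)
n = 16
G = rng.random((n, n))
A = G @ G.T + 10 * n * np.eye(n)                          # SPD, well conditioned
b = rng.random(n)
x = np.zeros(n)
blocks = [list(range(i, i + 4)) for i in range(0, n, 4)]
for _ in range(20):                                       # a few smoothing sweeps
    x = sym_block_gauss_seidel(A, b, x, blocks)
```

For an SPD matrix such sweeps are convergent on their own; within the V-cycle they are used only as smoothers, with coarse-grid correction supplying the mesh-independent rate.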
We estimated condition numbers of the individual blocks of \(\mathcal {B}_{\alpha }^{-1}\) preconditioned with their respective approximations. The results are reported in Tables 1 and 2. A slight deterioration in the performance of the multigrid cycle can be seen for very small values of \(\alpha > 0\).
Iteration numbers
To verify that \({\widehat{\mathcal {B}_{\alpha }}}\) is also an effective preconditioner for \(\mathcal {A}_{\alpha }\), we applied the Minres scheme to the system
For the results presented in Table 3, the Minres iteration process was stopped as soon as
which is the standard termination criterion for the preconditioned Minres scheme, provided that the preconditioner is SPD. A random initial guess \(x_0\) was used, and the tolerance was set to \(\varepsilon = 10^{-12}\).
Analysis of the KKT system
Recall that our optimality system reads:
with unknowns \(f \in {L^2({\varOmega })}\), \(u \in {\bar{H}^2({\varOmega })}\) and \(w \in {L^2({\varOmega })}\). We may write this KKT system in the form:
Determine \((f,u,w) \in {L^2({\varOmega })}\times {\bar{H}^2({\varOmega })}\times {L^2({\varOmega })}\) such that
where
and the notation “\('\)” is used to denote dual operators and dual spaces. In the rest of this paper, the symbols M, \( M_{\partial }\) and A will represent the mappings defined in (21), (22) and (24), respectively, and not the associated matrices, as was the case in Sect. 3. We believe that this mild ambiguity improves the readability of the present text.
By using standard techniques for saddle point problems, one can show that the system (20) satisfies the Brezzi conditions [1], provided that \(\alpha >0\). Therefore, for every \(\alpha > 0\), this set of equations has a unique solution. Nevertheless, if the standard norms of \({L^2({\varOmega })}\) and \({H^2({\varOmega })}\) are employed in the analysis, then the constants in the Brezzi conditions will depend on \(\alpha \). More specifically, the constant in the coercivity condition will be of order \(O(\alpha )\), and thus becomes very small for \(0 < \alpha \ll 1\). This property is consistent with the ill-posed nature of (1)–(3) for \(\alpha =0\), and makes it difficult to design \(\alpha \)-robust preconditioners for the algebraic system associated with (20).
Similarly to the approach used in [9, 10, 14], we will now introduce weighted Hilbert spaces. The weights are constructed such that the constants appearing in the Brezzi conditions are independent of \(\alpha \). Thereafter, in Sect. 5, we will show how these scaled Hilbert spaces can be combined with simple maps to design \(\alpha \)-robust preconditioners for our model problem.
Weighted norms
Consider the \(\alpha \)weighted norms:
applied to the control f, the state u and the dual/Lagrange multiplier w, respectively. Note that these norms become “meaningless” for \(\alpha = 0\), but are well defined for positive \(\alpha \).
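Discrete analogues of these weighted norms can be sketched as follows. The concrete scalings are read off from the preconditioner blocks \(\alpha M\), \(\alpha R+M_\partial \) and \(\alpha ^{-1}M\) of Sect. 3 and should be taken as a plausible sketch, not the paper's exact definitions (25)–(27).

```python
import numpy as np

def weighted_norms(f, u, w, M, R, Mb, alpha):
    """Discrete alpha-weighted norms; scalings assumed from the
    preconditioner blocks alpha*M, alpha*R + M_b and M/alpha."""
    nf = np.sqrt(alpha * f @ (M @ f))          # ||f||_{L^2_alpha}
    nu = np.sqrt(u @ ((alpha * R + Mb) @ u))   # weighted H^2-type norm of u
    nw = np.sqrt((w @ (M @ w)) / alpha)        # ||w||_{L^2_{alpha^{-1}}}
    return nf, nu, nw

# Toy check with identity matrices and a unit vector
I2 = np.eye(2)
e = np.array([1.0, 0.0])
nf, nu, nw = weighted_norms(e, e, e, I2, I2, I2, alpha=0.25)
```

The reciprocal weights on f and w make the point of the scaling visible: as \(\alpha \rightarrow 0\) the control norm collapses while the multiplier norm blows up, exactly the behavior the remark about \(\alpha = 0\) describes.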
Brezzi conditions
We will now analyze the properties of
defined in (20). More specifically, we will show that the Brezzi conditions are satisfied with constants that do not depend on the size of the regularization parameter \(\alpha > 0\). Note that we use the scaled Hilbert norms (25)–(27).
Lemma 1
For all \(\alpha > 0\), the following “inf-sup” condition holds:
Proof
Note that \(L_{\alpha }^2({\varOmega })\) and \(L_{\alpha ^{-1}}^2({\varOmega })\) contain the same functions, provided that \(\alpha > 0\). Let \(w \in L_{\alpha ^{-1}}^2({\varOmega })\) be arbitrary. By choosing \(f=w\) and \(u=0\) we find that
Since \(w\in L_{\alpha ^{-1}}^2({\varOmega })\) was arbitrary, this completes the proof. \(\square \)
Expressed in terms of the operators that constitute \(\mathcal {A}_{\alpha }\), Lemma 1 takes the form
Recall that we decided to write our state equations (2)–(3) in the nonstandard variational form (6). Throughout this paper we assume that problem (2)–(3) admits a unique solution \(u \in {\bar{H}^2({\varOmega })}\) for every \(f \in {L^2({\varOmega })}\), and that
This assumption is valid if \({\varOmega }\) is convex or if \({\varOmega }\) has a \(C^2\) boundary, see e.g. [5, 7]. Inequality (28) is a key ingredient of the proof of our next lemma.
Lemma 2
There exists a constant \(c_2\), which is independent of \(\alpha > 0\), such that
for all \((f,u) \in {L^2({\varOmega })}\times {\bar{H}^2({\varOmega })}\) such that
Proof
If (f, u) satisfies (29), then
see the discussion of (28). With \(\theta = (1+ c_1^2)^{-1} \in (0,1)\), it follows that
\(\square \)
This result may also be written in the form
for all \((f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\) satisfying
where M, \(M_{\partial }\) and A are the operators defined in (21), (22) and (24), respectively.
Boundedness
Having established that the Brezzi conditions hold, with constants that are independent of \(\alpha \), we next explore the boundedness of \(\mathcal {A}_{\alpha }\).
Lemma 3
For all \((f,u),(\psi , \phi ) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\),
Proof
Recall the definitions (21) and (22) of M and \(M_{\partial }\), respectively. Since
we find, by employing the Cauchy–Schwarz inequality, that
Lemma 4
For all \((f,u) \in L_{\alpha }^2({\varOmega })\times H_{\alpha }^2({\varOmega })\) and \(w \in L_{\alpha ^{-1}}^2({\varOmega })\),
Proof
Again, we note that
From the definitions of M and A, see (21) and (24), and the Cauchy–Schwarz inequality, it follows that
For the last equality, recall from (7) that \(\left\| {\varDelta }u \right\| _{L^2({\varOmega })}= \left\| \nabla ^2u \right\| _{L^2({\varOmega })}\le \left\| u \right\| _{H^2({\varOmega })}\) for all \(u\in {\bar{H}^2({\varOmega })}\). \(\square \)
Isomorphism
We have verified that the Brezzi conditions hold, and that \(\mathcal {A}_{\alpha }\) is a bounded operator. Moreover, all constants appearing in the inequalities expressing these properties are independent of the regularization parameter \(\alpha > 0\). Let
Theorem 1
The operator \(\mathcal {A}_{\alpha }\), defined in (20), is bounded and continuously invertible for \(\alpha > 0\) in the sense that for all nonzero \(x\in \mathcal {V}\),
for some positive constants c and C that are independent of \(\alpha > 0\). In particular,
Proof
This result follows from Lemmas 1, 2, 3, and 4 and Brezzi theory for saddle point problems, see [1]. \(\square \)
Estimates for the discretized problem
The stability properties (32) are not necessarily inherited by discretizations. However, the structure used to prove the so-called “inf-sup condition” in Lemma 1 is preserved in the discrete system, provided that the same discretization is employed for the control and the Lagrange multiplier. Furthermore, the boundedness properties, Lemmas 3 and 4, certainly also hold for conforming discretizations.
It remains to address the coercivity condition, Lemma 2, for the discretized problem. We consider finite-dimensional subspaces \(U_h\subset U = {\bar{H}^2({\varOmega })}\) and \(W_h\subset W = {L^2({\varOmega })}\). For certain choices of \(U_h\) and \(W_h\), the estimate of Lemma 2 carries over to the finite-dimensional setting.
Lemma 5
Assume \(U_h\subset U\) and \(W_h\subset W\) are such that \((1-{\varDelta }) U_h \subset W_h\). Then
for all \((f_h,u_h) \in W_h\times U_h\) such that
Proof
Assume that \((1-{\varDelta }) U_h \subset W_h\), and that (34) holds for \((f_h,u_h)\in W_h\times U_h\). Then \(-f_h +(1-{\varDelta }) u_h \in W_h\), and (34) implies \(-f_h +(1-{\varDelta }) u_h = 0\). Therefore, \((f_h, u_h)\) satisfies (29), and the estimate (33) follows from Lemma 2. \(\square \)
If the discretization is chosen such that Lemma 5 is satisfied, then the estimates (32) carry over to the discretized system. More precisely, we have
where \(\mathcal {V}_h = W_h\times U_h \times W_h \subset \mathcal {V}\), equipped with the inner product of \(\mathcal {V}\), and \(\mathcal {A}_{\alpha ,h}\) is the discrete counterpart of \(\mathcal {A}_{\alpha }\), defined by setting \(\left\langle \mathcal {A}_{\alpha ,h}x_h,y_h \right\rangle = \left\langle \mathcal {A}_{\alpha }x_h, y_h \right\rangle \) for all \(x_h,y_h\in \mathcal {V}_h\).
If the state is discretized with \(C^1\)-conforming bicubic Bogner–Fox–Schmit rectangles, as in Sect. 3, then Lemma 5 is satisfied if the control and Lagrange multiplier are discretized with discontinuous bicubic elements on the same mesh. For triangular meshes, one could choose Argyris triangles for the state variable and piecewise quintic polynomials for the control and Lagrange multiplier variables.
We remark that Lemma 5 provides a sufficient, but not necessary, criterion for stability of the discrete problem, and may imply far more degrees of freedom in the discrete space \(W_h\subset W\) than are actually needed. The usefulness of Lemma 5 is that the estimates (35) can, in principle, always be obtained by choosing a sufficiently large space for the control and Lagrange multiplier.
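For the BFS case, the inclusion required by Lemma 5 can be verified symbolically: applying \(1-{\varDelta }\) to a generic bicubic polynomial again yields a bicubic polynomial, so the discontinuous bicubic space for the control and multiplier is large enough. The sketch below assumes a single reference cell with hypothetical generic coefficients \(c_{ij}\).

```python
import sympy as sp

# Symbolic check of (1 - Delta) U_h ⊂ W_h on one cell: (1 - Delta)
# applied to a generic bicubic polynomial stays bicubic.
x, y = sp.symbols('x y')
u = sum(sp.Symbol(f'c{i}{j}') * x**i * y**j
        for i in range(4) for j in range(4))      # generic bicubic
v = u - sp.diff(u, x, 2) - sp.diff(u, y, 2)       # (1 - Delta) u
p = sp.Poly(sp.expand(v), x, y)
assert p.degree(x) <= 3 and p.degree(y) <= 3      # still bicubic in x and y
```

The same check on an affine image of the cell goes through unchanged, since second derivatives only lower the polynomial degree in each variable.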
Preconditioning
The linear problem (20) is of the form
where x is sought in a Hilbert space \(\mathcal {V}\), the right-hand side b is in the dual space \(\mathcal {V}'\), and \(\mathcal {A}\) is a self-adjoint continuous mapping of \(\mathcal {V}\) onto \(\mathcal {V}'\). Iterative methods for linear problems are most often formulated for operators mapping \(\mathcal {V}\) into itself, and cannot be directly applied to the linear system (36), as described in [9]. If we want to apply such methods to (36), then we need to introduce a continuous operator mapping \(\mathcal {V}'\) isomorphically back onto \(\mathcal {V}\). More precisely, if we have a continuous operator
then \(\mathcal {M}= \mathcal {B}\mathcal {A}:\mathcal {V}\rightarrow \mathcal {V}\) is continuous and has the desired mapping properties, and if \(\mathcal {B}\) is an isomorphism, the solutions to (36) coincide with the solutions to the problem
In this paper we shall call \(\mathcal {B}\in \mathcal {L}(\mathcal {V}',\mathcal {V})\) a preconditioner if \(\mathcal {B}\) is self-adjoint and positive definite. This implies that \(\mathcal {B}^{-1}\) is self-adjoint and positive definite as well, and hence \(\mathcal {B}^{-1}\) defines an inner product on \(\mathcal {V}\) by setting
This inner product has the crucial property of making \(\mathcal {M}\) self-adjoint, in the sense that
Conversely, given any inner product \(\left( \,\cdot \,,\,\cdot \, \right) \) on \(\mathcal {V}\), the Riesz–Fréchet theorem provides a self-adjoint positive definite isomorphism \(\mathcal {B}:\mathcal {V}'\rightarrow \mathcal {V}\) such that (38) and (39) hold, and we say that \(\mathcal {B}\) is the Riesz operator induced by \(\left( \,\cdot \,,\,\cdot \, \right) \). This establishes a one-to-one correspondence between preconditioners and inner products on \(\mathcal {V}\). Since the Riesz operator is an isometric isomorphism, the operator norm of \(\mathcal {B}\mathcal {A}\) coincides with the operator norm of \(\mathcal {A}\). We formulate this well-known fact in a lemma for the sake of self-containedness. We refer to [6, 9] for a more in-depth discussion of preconditioning and its relation to Riesz operators.
Lemma 6
Let \(\mathcal {V}\) be a Hilbert space, let \(\mathcal {A}:\mathcal {V}\rightarrow \mathcal {V}'\) be a self-adjoint isomorphism, and assume that \(\mathcal {B}\) is the Riesz operator induced by the inner product on \(\mathcal {V}\), or equivalently, that the inner product on \(\mathcal {V}\) is defined by the self-adjoint positive definite isomorphism \(\mathcal {B}^{-1}:\mathcal {V}\rightarrow \mathcal {V}'\). Then \(\mathcal {B}\mathcal {A}: \mathcal {V}\rightarrow \mathcal {V}\) is an isomorphism, self-adjoint in the inner product on \(\mathcal {V}\), with
In particular, the condition number of \(\mathcal {B}\mathcal {A}\) is given by
Proof
Since \(\mathcal {A}\) is self-adjoint, \(\mathcal {M}= \mathcal {B}\mathcal {A}\) is self-adjoint with respect to the inner product on \(\mathcal {V}\). From the Riesz–Fréchet theorem we have \(\left\| \mathcal {A}x \right\| _{\mathcal {V}'} = \left\| \mathcal {B}\mathcal {A}x \right\| = \left\| \mathcal {M}x \right\| \), and we obtain the following identity for the operator norm of \(\mathcal {M}\).
A similar identity is obtained for the norm of the inverse operator,
We say that a preconditioner \(\mathcal {B}_{\alpha }\) for \(\mathcal {A}_{\alpha }\) is robust with respect to the parameter \(\alpha \) if \(\kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })\) is bounded uniformly in \(\alpha \). The significance of Lemma 6 is that such a robust preconditioner can be found by identifying (parameter-dependent) norms in which \(\mathcal {A}_{\alpha }\) and \(\mathcal {A}_{\alpha }^{-1}\) are both uniformly bounded.
Parameter-robust minimum residual method
In Sect. 4, stability of \(\mathcal {A}_{\alpha }\) was shown in the \(\alpha \)-dependent norms defined in (25)–(27). The preconditioner provided by Lemma 6 is the Riesz operator induced by the weighted norms. This operator \(\mathcal {B}_{\alpha }: \mathcal {V}' \rightarrow \mathcal {V}\) takes the form
where \(R:{\bar{H}^2({\varOmega })}\rightarrow {\bar{H}^2({\varOmega })}'\) is the operator induced by the \({H^2({\varOmega })}\) inner product, i.e. \(\left\langle Ru,v \right\rangle = (u,v)_{{H^2({\varOmega })}}\).
Since \(\mathcal {A}_{\alpha }\) is self-adjoint, the preconditioned operator \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }:\mathcal {V}\rightarrow \mathcal {V}\) is self-adjoint in the inner product on \(\mathcal {V}\). Consequently we can apply the minimum residual method (Minres) to the problem
Theorem 2
Let \(\mathcal {A}_{\alpha }\) be the operator defined in (20) and \(\mathcal {B}_{\alpha }\) the operator defined in (40). Then there exists an upper bound, independent of \(\alpha \), for the convergence rate of Minres applied to the preconditioned system
In particular there exists an upper bound, independent of \(\alpha \), for the number of iterations needed to reach the stopping criterion (19).
Proof
A crude upper bound for the convergence rate (more precisely, the two-step convergence rate) of Minres is given by
where \(\kappa = \kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })\) is the condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\), see e.g. [9]. From Lemma 6 and (32) we determine that \(\kappa \) is bounded independently of \(\alpha \), with
\(\square \)
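Combining this rate with the uniform condition number reported in Sect. 3 gives a concrete, \(\alpha \)-independent iteration bound. The helper below is an illustration only; the value \(\kappa \approx 4.05\) and the tolerance \(10^{-12}\) are taken from the experiments in Sect. 3.

```python
import math

# Two-step Minres bound: after 2k iterations the preconditioned residual
# is reduced by at least ((kappa - 1)/(kappa + 1))**k, so a uniform bound
# on kappa gives an alpha-independent iteration count.
def minres_iteration_bound(kappa, eps):
    """Smallest even iteration count 2k with ((kappa-1)/(kappa+1))**k <= eps."""
    rho = (kappa - 1.0) / (kappa + 1.0)
    return 2 * math.ceil(math.log(eps) / math.log(rho))

# With the uniform condition number ~4.05 observed in Sect. 3 and the
# stopping tolerance eps = 1e-12 used in the experiments:
its = minres_iteration_bound(4.05, 1e-12)
```

This crude bound is pessimistic: the clustering of the spectrum into three intervals, visible in Fig. 2, typically makes the observed iteration counts considerably smaller.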
In practical applications, the operator \(\mathcal {B}_{\alpha }\) will be replaced with a less computationally expensive approximation \({\widehat{\mathcal {B}_{\alpha }}}\). Ideally \({\widehat{\mathcal {B}_{\alpha }}}\) will be spectrally equivalent to \(\mathcal {B}_{\alpha }\), in the sense that the condition number of \({\widehat{\mathcal {B}_{\alpha }}} \mathcal {B}_{\alpha }^{1}\) is bounded, independently of \(\alpha \). Then the preconditioned system reads
and the upper bound for the convergence rate is determined by the condition number \(\kappa ({\widehat{\mathcal {B}_{\alpha }}}\mathcal {A}_{\alpha }) \le \kappa ({\widehat{\mathcal {B}_{\alpha }}}\mathcal {B}_{\alpha }^{-1})\,\kappa (\mathcal {B}_{\alpha }\mathcal {A}_{\alpha })\).
Remark
In this paper we only consider the minimum residual method, and we therefore require that the preconditioner is selfadjoint and positive definite. More generally, if other Krylov subspace methods are to be applied to (20), then preconditioners lacking symmetry or definiteness may be considered.
We mention in particular that a preconditioned conjugate gradient method for problems similar to (20) was proposed in [14], based on a clever choice of inner product.
Generalization
Is our technique applicable to problems other than (1)–(3)? We will now briefly explore this issue, and show that the preconditioning scheme derived above yields \(\alpha \)-robust methods for a class of problems.
The scaling (25)–(27) was also investigated in [10], but for a family of abstract problems posed in terms of Hilbert spaces; more specifically, for general PDE-constrained optimization problems, subject to Tikhonov regularization and with linear state equations. In [10], however, no assumptions about the control, state or observation spaces were made, except that they were Hilbert spaces. Under these circumstances, it was proved that the coercivity and the boundedness of the operator associated with the KKT system hold with \(\alpha \)-independent constants. Nevertheless, in this general setting, the inf-sup condition involved an \(\alpha \)-dependent constant, which eventually yielded theoretical iteration bounds of order \(O([\log ( \alpha ^{-1} )]^2)\) for Minres.
In the present paper we were able to prove an \(\alpha \)-robust inf-sup condition for the model problem (1)–(3). This is possible because both the control f and the dual/Lagrange multiplier w belong to \({L^2({\varOmega })}\). From a more general perspective, it turns out that this is the property that must be fulfilled in order for our approach to be successful: the control space and the dual space associated with the state equation must coincide. This will usually lead to additional regularity requirements for the state space.
Motivated by this discussion, let us consider an abstract problem of the form:
subject to
Here, W is the control space, which coincides with the dual space of the state equation, U is the state space, and O is the observation space; W, U and O are all Hilbert spaces.
Let us assume that
(A1): \(A:U \rightarrow W'\) is a continuous linear operator with closed range. In particular, there is a constant \(c_1\) such that for all \(u \in U\),
$$\begin{aligned} \left\| u \right\| _{U/ {\text {Ker}}A} = \inf _{{\tilde{u}} \in {\text {Ker}}A} \left\| u-{\tilde{u}} \right\| _U \le c_1\left\| A u \right\| _{W'}. \end{aligned}$$
(A2): \(T:U \rightarrow O\) is linear and bounded, and invertible on the kernel of A. That is, there is a constant \(c_2\) such that for all \(u\in {\text {Ker}}A\),
$$\begin{aligned} \left\| u \right\| _U \le c_2 \left\| T u \right\| _O. \end{aligned}$$
where
Note that, compared with (13), the boundary observation matrix \(M_{\partial }\) has been replaced with the general observation operator K in (44).
We introduce scaled norms as follows.
We first show that \(\left\| \,\cdot \, \right\| _{U_\alpha }\) is indeed a norm on U when assumptions (A1) and (A2) hold. It suffices to show that \(\left\| \,\cdot \, \right\| _{U_\alpha }\) is a norm equivalent to \(\left\| \,\cdot \, \right\| _U\) when \(\alpha =1\). We have
and letting \(\pi \) denote the orthogonal projection of U onto \({\text {Ker}}A\),
Here the last inequality follows from \(\left\| u-\pi u \right\| _{U} = \inf _{{\tilde{u}}\in {\text {Ker}}A}\left\| u-{\tilde{u}} \right\| _U\) and assumption (A1).
We set \(\mathcal {V}= W_\alpha \times U_{\alpha } \times W_{\alpha ^{1}}\). As in Sect. 4, \(\mathcal {A}_{\alpha }:\mathcal {V}\rightarrow \mathcal {V}'\) can be shown to be an isomorphism, with parameterindependent estimates obtained in the weighted norms.
Theorem 3
There exist positive constants c and C, independent of \(\alpha \), such that for all nonzero \(x \in \mathcal {V}\),
We omit the full proof, which is analogous to that of Theorem 1. The crucial part is the “inf-sup condition” of Lemma 1, which is easily shown to hold in the abstract setting:
The coercivity condition of Lemma 2 naturally holds in the prescribed norm on \(U_\alpha \), since for \((f,u)\in W\times U\) such that \(Au = Mf\),
Note that the weighted norm now depends on A and, as a consequence, the estimates become A-independent. In fact, we obtain bounds for the constants c and C which are independent of \(\alpha \) as well as of the operators appearing in (42)–(43). This is postponed to the next section, where sharp estimates are obtained for (50).
With the estimates (50), Lemma 6 provides a preconditioner for the operator \(\mathcal {A}_{\alpha }\), given as
The condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\) will be bounded independently of \(\alpha \). It is, however, not clear how to find a computationally efficient approximation of \(\mathcal {B}_{\alpha }\) in the abstract setting of (42)–(43).
Example 1
The problem (1)–(3) fits in the abstract framework presented in this section when we assume that the state has \({H^2({\varOmega })}\) regularity. We set \(W= {L^2({\varOmega })}\), \(U={\bar{H}^2({\varOmega })}\), \(A = 1-{\varDelta }\), and \(T:{\bar{H}^2({\varOmega })}\rightarrow {L^2(\partial {\varOmega })}\) is a trace operator, see (46). Since A is a continuous isomorphism, assumptions (A1) and (A2) are both valid. The inner product on \(U_\alpha \) takes the form
where \(\nabla ^2u\) denotes the Hessian of u, and the last equality follows from the boundary condition \(\partial u /\partial \mathbf {n} = 0\) imposed on \({\bar{H}^2({\varOmega })}\). The resulting preconditioner is the one that was used in the numerical experiments, detailed in Sect. 3, and it is spectrally equivalent to the preconditioner defined in (40).
Example 2
Let U, W, and K be as in Example 1, but let us set \(A = -{\varDelta }\). Now A has a nontrivial kernel, consisting of the a.e. constant functions, and for constant u we have
Since assumptions (A1) and (A2) are valid, the optimality system is still well-posed. In this case the inner product on \(U_\alpha \) is given by
Example 3
Let us consider the “prototype” problem:
subject to
Note that we here consider the case in which observation data is assumed to be available throughout the entire domain \({\varOmega }\) of the state equation.
If the usual variational form of the PDE is used, i.e.,
then the control space equals \({L^2({\varOmega })}\), whereas the dual space is \({H^1({\varOmega })}\). The preconditioning strategy presented in this section is therefore not applicable.
If instead we can assume \({H^2({\varOmega })}\)-regularity, we can use the variational form
Now, the control and dual spaces both equal \({L^2({\varOmega })}\). The methodology presented in this section can thus be applied, and a robust preconditioner is obtained. Compared with the preconditioner for the problem with boundary observations only, see Sect. 5, Eq. (40), the only change is the replacement of \(M_{\partial }\) in the (2, 2) block of \(\mathcal {B}_{\alpha }\) with M.
We remark that in [13, 14], parameter-robust preconditioners were proposed for the “prototype” problem, using the standard variational formulation (52) of the PDE. Those methods do not require improved regularity for the state space. Instead, they require that observations are available throughout the computational domain.
Eigenvalue analysis
In Sect. 6 it was shown that the condition number of \(\mathcal {B}_{\alpha }\mathcal {A}_{\alpha }\), with \(\mathcal {A}_{\alpha }\) defined in (44) and \(\mathcal {B}_{\alpha }\) defined in (51), can be bounded independently of \(\alpha \), as well as independently of the operators appearing in (42)–(43). Moreover, the numerical experiments indicate that the eigenvalues are contained in three intervals, independently of the regularization parameter \(\alpha \), see Fig. 2. In this section we detail the structure of the spectrum of the preconditioned system considered in Sect. 6, and we obtain sharp estimates for the constants appearing in Theorem 3.
We consider selfadjoint linear operators \(\mathcal {A}_{\alpha }\) and \(\mathcal {B}_\alpha \),
where R is defined by
We assume that \(A:U\rightarrow W'\) and \(M:W\rightarrow W'\) are continuous operators, for some Hilbert spaces U and W. In addition we will make use of the following assumptions.
 (B1): M is self-adjoint and positive definite,
 (B2): \( K + R\) is positive definite,
 (B3): K is self-adjoint and positive semi-definite.
Theorem 4
Let p, q, and r be the polynomials
Let \(q_1<q_2\) and \(r_1< r_2 <r_3\) be the roots of q and r, respectively. The spectrum of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\) is contained within three intervals, determined by the roots of p, q, and r, independently of \(\alpha \):
Consequently, the spectral condition number of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\) is bounded, uniformly in \(\alpha \),
If K has a nontrivial kernel, inequality (57) becomes an equality.
Proof
Consider the equivalent generalized eigenvalue problem
We show that (58) admits no nontrivial solutions unless \(\lambda \) is as in (56).
Since M is a selfadjoint isomorphism, by assumption (B1), we can rewrite (58) as the three identities
Assume that \(\lambda \) is not contained within the three closed intervals of (56). Then \(p \ne 0\), and we can use (59) to eliminate f from (61).
Since q is nonzero, we can use (62) to eliminate w from (60),
where the identity (55) was used. By assumption, pq and r are both nonzero. Moreover, it is easily seen that pq and r have the same sign outside of the bounded intervals of (56). From assumptions (B1)–(B3), we conclude that \(qp K + r R\) is a self-adjoint definite operator. Then (63) only admits trivial solutions, hence \(\lambda \) cannot be an eigenvalue of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\).
The estimate (57) follows from (56), noting that \(\vert {\text {sp}}(\mathcal {B}_\alpha \mathcal {A}_{\alpha }) \vert \subset [r_2, r_3]\). From (63) it can be seen that the roots of r are eigenvalues of \(\mathcal {B}_\alpha \mathcal {A}_{\alpha }\) if \({\text {Ker}}K \) is nontrivial. \(\square \)
Remark
If \(A = (1 - {\varDelta }):{\bar{H}^2({\varOmega })}\rightarrow {L^2({\varOmega })}'\), then \(R = A' M^{-1} A\) is characterized by a bilinear form \(b(\cdot ,\cdot )\) as in (15):
For discretizations \(U_h\subset U\) and \(W_h\subset W\) of A such that \(A(U_h) \subset M(W_h)\), the discretization of b coincides with \(A_h' M_h^{-1} A_h\). This follows from an argument similar to that in the proof of Lemma 5, and as a consequence, Theorem 4 can be applied to the preconditioned discrete systems considered in Sect. 3.
Discretization with \(H^1\) conforming finite elements
The theory outlined in Sect. 4 provides a robust preconditioning technique for the optimality system (20), assuming additional regularity and making use of \(H^2\) conforming elements. However, this additional regularity only appears relevant to the discretization of the (2, 2) block of the ideal preconditioner (51), since the coefficient matrix in (44) only involves second order operators. It therefore seems reasonable that the use of sophisticated \(H^2\) conforming elements could be avoided in favour of standard \(H^1\) conforming elements, provided that we can implement an approximate inverse of the fourth order operator appearing in the preconditioner.
To be precise, we can discretize the optimality system (10)–(12) with \(H^1\)-conforming piecewise linear Lagrange elements for all the unknown variables. Note that this requires that the integration by parts formula is applied to the state equation, resulting in the variational problem
for \((f_h, u_h, \xi _h) \in V_h \times V_h \times V_h\), where \(V_h\) is the space of continuous piecewise linear functions. Since all three unknowns belong to the same space, the eigenvalue analysis in Sect. 7 can be applied to the discretized coefficient matrix, which reads
where \(A_h\) and \(M_h\) are symmetric matrices. The analysis in Sect. 7 reveals that an ideal preconditioner is given by
with condition numbers of the preconditioned system bounded independently of \(\alpha \) and the discretization parameter h.
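To make the robustness claim tangible, the following is a minimal numerical sketch, not the paper's exact operators: we use a 1D finite-difference analogue with identity mass matrix, model "boundary observation" by a rank-two matrix acting on the two end nodes, and assume a KKT block layout and a block-diagonal preconditioner \(\mathrm{diag}(\alpha M,\ K_{\mathrm{obs}} + \alpha A M^{-1} A,\ \alpha ^{-1} M)\) chosen to mimic (64)–(65); the precise \(\alpha \)-weighting of the fourth order block is our assumption.

```python
import numpy as np
from scipy.linalg import eigh

def laplacian_neumann(n):
    """1D finite-difference Neumann Laplacian (singular: constants in kernel)."""
    K = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    K[0, 0] = K[-1, -1] = 1.0
    return K

n = 30
M = np.eye(n)                       # mass matrix (identity in this FD sketch)
A = M + laplacian_neumann(n)        # discrete analogue of 1 - Laplacian
Kobs = np.zeros((n, n))             # observation only at the two boundary nodes
Kobs[0, 0] = Kobs[-1, -1] = 1.0
R = A.T @ np.linalg.solve(M, A)     # R = A' M^{-1} A

def kkt_and_precond(alpha):
    """Assumed KKT block structure and block-diagonal preconditioner."""
    Z = np.zeros((n, n))
    Akkt = np.block([[alpha * M, Z,    -M ],
                     [Z,         Kobs, A.T],
                     [-M,        A,    Z  ]])
    B = np.block([[alpha * M, Z,                Z        ],
                  [Z,         Kobs + alpha * R, Z        ],
                  [Z,         Z,                M / alpha]])
    return Akkt, B

for alpha in (1e-2, 1e-8):
    Akkt, B = kkt_and_precond(alpha)
    lam = eigh(Akkt, B, eigvals_only=True)      # spectrum of B^{-1} Akkt
    cond = np.abs(lam).max() / np.abs(lam).min()
    print(f"alpha = {alpha:.0e}: condition number = {cond:.2f}")
```

In this toy construction the generalized eigenvalues remain confined to fixed intervals as \(\alpha \) is decreased by six orders of magnitude, so both runs should report essentially the same modest condition number.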
The operator \(K_h + A_h M_h^{-1} A_h\) in the (2, 2) block of (65) coincides with the Schur complement of a Ciarlet–Raviart mixed finite element formulation of the fourth order problem (16)–(18), and can be thought of as a nonlocal fourth order operator. Multigrid techniques for a similar operator were studied in [8], where a multigrid W-cycle applied to a local operator approximating the Schur complement was shown to be an efficient preconditioner.
Table 4 presents iteration numbers and estimated condition numbers for a simplistic scheme where we replace the (2, 2) block in (65) with \(K_h + A_h {\tilde{M}}_h^{-1} A_h\), where \({\tilde{M}}_h\) is a lumped mass matrix. For the approximate inversion of (65), we applied an algebraic multigrid W-cycle to the (2, 2) block and two symmetric Gauss–Seidel iterations to the remaining two diagonal blocks. The experiment was carried out on a unit square domain and an L-shaped domain, both triangulated with structured meshes. For the L-shaped domain, the \(H^2\)-regularity discussed in the beginning of Sect. 2 is known not to hold.
The iteration numbers reported in Table 4 appear bounded, although we observe an increase in the estimated condition number for the L-shaped domain as the mesh is refined. Although the condition number with an exact inverse of (65) is bounded in accordance with the analysis in Sect. 7, this appears not to be the case when the exact inverse of the (2, 2) block is replaced with an AMG cycle.
We remark that for unstructured meshes we observed iteration counts increasing with mesh refinement, indicating the need for a more sophisticated approach, for example as in [8], to the multilevel approximation of the (2, 2) block for more complicated geometries.
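The mass-lumping device used above can be illustrated in a small 1D P1 setting (our own sketch, not the paper's 2D experiments): with the consistent mass matrix \(M_h\), the block \(A_h M_h^{-1} A_h\) is a dense matrix, whereas replacing \(M_h\) by the diagonal matrix \({\tilde{M}}_h\) of its row sums keeps the block sparse and banded, and hence amenable to AMG.

```python
import numpy as np
import scipy.sparse as sp

def p1_matrices(n):
    """Assemble 1D P1 mass (M) and stiffness (K) matrices on [0, 1]
    with n elements and natural boundary conditions."""
    h = 1.0 / n
    M = sp.lil_matrix((n + 1, n + 1))
    K = sp.lil_matrix((n + 1, n + 1))
    Me = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])    # element mass matrix
    Ke = 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])  # element stiffness matrix
    for e in range(n):
        dofs = np.array([e, e + 1])
        M[np.ix_(dofs, dofs)] += Me
        K[np.ix_(dofs, dofs)] += Ke
    return M.tocsr(), K.tocsr()

n = 64
M, K = p1_matrices(n)
A = (M + K).tocsr()          # discretization of u - Laplacian(u), natural BCs

# Mass lumping: replace M by the diagonal matrix of its row sums.  The
# lumped surrogate for K + A M^{-1} A is then pentadiagonal in 1D,
# whereas the consistent version would be a dense matrix.
lumped = np.asarray(M.sum(axis=1)).ravel()
S = (K + A @ sp.diags(1.0 / lumped) @ A).tocsr()
```

The same idea carries over to 2D triangulations: lumping keeps the (2, 2) block assembled as an explicit sparse matrix with local stencil.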
Discussion
Previously, parameter-robust preconditioners for PDE-constrained optimization problems have been successfully developed, provided that observation data is available throughout the entire domain of the state equation. For many important inverse problems arising in industry and science, this is an unrealistic requirement. On the contrary, observation data will typically only be available in subregions of the domain of the state variable, or at the boundary of this domain. We have therefore explored the possibility of constructing robust preconditioners for PDE-constrained optimization problems with limited observation data.
For an elliptic control problem with boundary observations only, we have developed a regularization-robust preconditioner for the associated KKT system. Consequently, the number of Minres iterations required to solve the problem is bounded independently of both the regularization parameter \(\alpha \) and the mesh size h. In order to achieve this, it was necessary to write the elliptic state equation in a non-standard, non-self-adjoint variational form. If this approach is employed, then the control and the Lagrange multiplier will belong to the same Hilbert space, which leads to extra regularity requirements for the state. This fact makes it possible to construct parameter-weighted metrics such that the constants appearing in the Brezzi conditions, as well as the constants in the inequalities expressing the boundedness of the KKT system, are independent of \(\alpha \) and h. Consequently, the spectrum of the preconditioned KKT system is uniformly bounded with respect to \(\alpha \) and h, which is ideal for the Minres scheme. These properties were illuminated through a series of numerical experiments, and the preconditioned Minres scheme handled our model problem excellently.
The use of a non-self-adjoint form of the elliptic state equation leads to additional challenges for conforming discretization schemes and in multigrid implementations. For the numerical experiments, we employed a \(C^1\) finite element discretization that is \(H^2\)-conforming, where the rectangular elements are tensor products of Hermite intervals. This discretization is limited to structured meshes. While there are other, more flexible \(C^1\) finite element discretizations available in two dimensions (e.g. Argyris and Bell triangles), all of these methods suffer from high computational cost due to the smoothness requirements imposed on the nodal basis functions. In three dimensions, the situation is even worse, and \(C^1\) discretizations with tetrahedra become nearly intractable, see e.g. [15].
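The 1D building block of the tensor-product construction can be made concrete. The following sketch, our own illustration of the standard cubic Hermite shape functions (whose tensor products yield rectangular \(C^1\) elements of Bogner–Fox–Schmit type), checks the endpoint interpolation properties that give \(C^1\) continuity across element boundaries.

```python
import numpy as np

def hermite_basis(t):
    """The four cubic Hermite shape functions on the reference interval [0, 1]:
    one value and one first-derivative degree of freedom per endpoint."""
    t = np.asarray(t, dtype=float)
    return np.stack([
        2 * t**3 - 3 * t**2 + 1,   # matches the value at t = 0
        t**3 - 2 * t**2 + t,       # matches the derivative at t = 0
        -2 * t**3 + 3 * t**2,      # matches the value at t = 1
        t**3 - t**2,               # matches the derivative at t = 1
    ])

def hermite_basis_deriv(t):
    """First derivatives of the shape functions above."""
    t = np.asarray(t, dtype=float)
    return np.stack([
        6 * t**2 - 6 * t,
        3 * t**2 - 4 * t + 1,
        -6 * t**2 + 6 * t,
        3 * t**2 - 2 * t,
    ])
```

Each basis function matches exactly one of the four endpoint degrees of freedom and vanishes in the other three, so gluing neighbouring elements through shared value and derivative unknowns produces globally \(C^1\) functions.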
Some of the difficulties with traditional \(C^1\) finite element discretizations can be avoided with Galerkin methods making use of basis functions that naturally fulfill the smoothness requirements. Examples of such methods include discretization with spline basis functions, such as isogeometric analysis [3]. Another approach is the virtual element method [2]. However, the development of multilevel methods for the fourth order operator in the preconditioner (51) would remain a challenging problem.
We have also demonstrated that the technique is applicable outside of \(H^2\)-conforming discretizations.
Our findings for the simple elliptic control problem were generalized to a broader class of KKT systems. It turns out that the methodology is applicable whenever the control and the Lagrange multiplier belong to the same space and extra regularity properties are fulfilled by the state equation; these are the key issues. From a theoretical perspective, this is in many cases not a severe restriction, but it gives rise to new challenges for the discrete problems. This is even the case for the elliptic state equation considered in this text. Also, our approach will not yield \(\alpha \)-independent bounds if the control is only defined on a subdomain of the domain of the state equation. In such cases, the spaces for the control and the Lagrange multiplier will not coincide. How to design efficient parameter-robust preconditioners for such problems is, as far as the authors know, still an open problem.
Notes
 1.
In [10, 11] it is proved that the number of preconditioned Minres iterations needed cannot grow faster than
$$\begin{aligned} a + b \left[ \log _{10} \left( \alpha ^{-1} \right) \right] ^2. \end{aligned}$$
Furthermore, in [11] it is explained why iteration counts of the kind (5) often will occur in practice.
References
 1.
Brezzi, F.: On the existence, uniqueness and approximation of saddlepoint problems arising from Lagrangian multipliers. RAIRO Numer. Anal. 8, 129–151 (1974)
 2.
Brezzi, F., Marini, L.D.: Virtual element methods for plate bending problems. Comput. Methods Appl. Mech. Eng. 253, 455–462 (2013)
 3.
Cottrell, J.A., Hughes, T.J.R., Bazilevs, Y.: Isogeometric analysis: toward integration of CAD and FEA. Wiley, USA (2009)
 4.
Falgout, R., Yang, U.: Hypre: a library of high performance preconditioners. Comput. Sci. ICCS 2002, 632–641 (2002)
 5.
Grisvard, P.: Elliptic problems in nonsmooth domains. Pitman, Boston (1985)
 6.
Günnel, A., Herzog, R., Sachs, E.: A note on preconditioners and scalar products in Krylov subspace methods for selfadjoint problems in Hilbert space. Electron. Trans. Numer. Anal. 41, 13–20 (2014)
 7.
Hackbusch, W.: Elliptic differential equations. Theory and numerical treatment. SpringerVerlag, Berlin (1992)
 8.
Hanisch, M.R.: Multigrid preconditioning for the biharmonic Dirichlet problem. SIAM J. Numer. Anal. 30(1), 184–214 (1993)
 9.
Mardal, K.A., Winther, R.: Preconditioning discretizations of systems of partial differential equations. Numer. Linear Algebra Appl. 18(1), 1–40 (2011)
 10.
Nielsen, B.F., Mardal, K.A.: Efficient preconditioners for optimality systems arising in connection with inverse problems. SIAM J. Control Optim. 48(8) (2010)
 11.
Nielsen, B.F., Mardal, K.A.: Analysis of the minimal residual method applied to illposed optimality systems. SIAM J. Sci. Comput. 35(2), A785–A814 (2013)
 12.
Pearson, J.W., Stoll, M., Wathen, A.J.: Regularizationrobust preconditioners for timedependent PDEconstrained optimization problems. SIAM J. Matrix Anal. Appl. 33, 1126–1152 (2012)
 13.
Pearson, J.W., Wathen, A.J.: A new approximation of the Schur complement in preconditioners for PDEconstrained optimization. Numer. Linear Algebra Appl. 19, 816–829 (2012)
 14.
Schöberl, J., Zulehner, W.: Symmetric indefinite preconditioners for saddle point problems with applications to PDEconstrained optimization problems. SIAM J. Matrix Anal. Appl. 29(3), 752–773 (2007)
 15.
Ženíšek, A.: Polynomial approximation on tetrahedrons in the finite element method. J. Approx. Theory 7(4), 334–351 (1973)
 16.
Zhang, X.: Multilevel Schwarz methods for the biharmonic Dirichlet problem. SIAM J. Sci. Comput. 15(3), 621–644 (1994)
 17.
Zulehner, W.: Nonstandard norms and robust estimates for saddle point problems. SIAM J. Matrix Anal. Appl. 32, 536–560 (2011)
Acknowledgements
The first author is grateful to Adrian Hope for conducting numerical experiments concerning Hermite elements.
Additional information
Communicated by Rolf Stenberg.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Cite this article
Mardal, K., Nielsen, B.F. & Nordaas, M. Robust preconditioners for PDE-constrained optimization with limited observations. BIT Numer. Math. 57, 405–431 (2017). https://doi.org/10.1007/s10543-016-0635-8
Keywords
 PDE-constrained optimization
 Preconditioning
 Minimum residual method
Mathematics Subject Classification
 65F08
 65N21
 65K10