Robust preconditioners for PDE-constrained optimization with limited observations

Regularization robust preconditioners for PDE-constrained optimization problems have been successfully developed. These methods, however, typically assume observation data and control throughout the entire domain of the state equation. For many inverse problems, this is an unrealistic assumption. In this paper we propose and analyze preconditioners for PDE-constrained optimization problems with limited observation data, e.g. observations are only available at the boundary of the solution domain. Our methods are robust with respect to both the regularization parameter and the mesh size. That is, the condition number of the preconditioned optimality system is uniformly bounded, independently of the size of these two parameters. The method does, however, require extra regularity. We first consider a prototypical elliptic control problem and thereafter more general PDE-constrained optimization problems. Our theoretical findings are illuminated by several numerical results.


Introduction
Consider the model problem: on a Lipschitz domain Ω ⊂ R n , subject to This minimization task is similar to the standard example considered in PDE-constrained optimization.But instead of assuming that observation data is available everywhere in Ω, we consider the case where observations are only given at the boundary ∂Ω of Ω, that is d ∈ L 2 (∂Ω), see the first term in (1).For problems of the form (1)-( 3), in which (5) very efficient preconditioners have been developed for the associated KKT system.In fact, by employing proper α-dependent scalings of the involved Hilbert spaces [10], or by using a Schur complement approach [9], methods that are robust with respect to the size of the regularization parameter α have been developed.More specifically, the condition number of the preconditioned optimality system is small and bounded independently of 0 < α 1 and the mesh size h.This ensures good performance for suitable Krylov subspace methods, e.g. the minimum residual method (Minres), independently of both parameters.These techniques have been extended to handle time dependent problems [8] and PDE-constrained optimization with Stokes equations [12], but the rigorous analysis of α-independent bounds always requires that observations are available throughout all of Ω.
For cases with limited observations, for example with cost-functionals of the form (4), efficient preconditioners are also available for a rather large class of PDE-constrained optimization problems, see [6,7].But these techniques do not yield convergence rates, for the preconditioned KKT-system, that are completely robust with respect to the size of the regularization parameter α.Instead, the number of preconditioned Minres iterations grows logarithmically 1 with respect to the size of α −1 , as α → 0: a + b log 10 α −1 . (6)
Furthermore, in [7] it is explained why iterations counts of the kind (6) often will occur in practice.
equipped with the inner product Here ∇ 2 u denotes the Hessian of u, and the second identity is due to the boundary condition ∂u ∂n = 0 imposed on the space H2 (Ω).We will see below that, in order to design a regularization robust preconditioner for (1)-( 3), it is convenient to express the state equation in the form (7), instead of employing integration by parts/Green's formula to write it on the standard self-adjoint form.

Numerical experiments
Prior to analyzing our model problem, we will consider some numerical experiments.Discretization of ( 11)-( 13) yields an algebraic system of the form where • M ∂ is a mass matrix associated with the boundary ∂Ω of Ω.
• A is a matrix that arise upon discretization of the operator (1 − ∆).
Since we write the state equation on a non self-adjoint form, A will not be the usual sum of the stiffness and mass matrices.Instead, equation ( 7) is discretized with subspaces of H2 (Ω) and L 2 (Ω).
In the current numerical experiments, we employ the Bogner-Fox-Schmit (BFS) rectangle for discretizing the state variable u ∈ H2 (Ω).That is, the finite element field consists of bicubic polynomials that are continuous, have continuous first order derivatives and mixed second order derivatives at each vertex of the mesh.BFS elements are C 1 on rectangles and therefore H 2conforming.The control f and Lagrange multiplier w are discretized with discontinuous bicubic elements.
We propose to precondition (14) with the block-diagonal matrix where R results from a discretization of the bilinear form b(•, •) on H2 (Ω): In the experiments presented below, we used this bilinear form to construct a multigrid approximation of (αR

Remark
The bilinear form ( 16) is equivalent to the inner product on H2 (Ω).The additional term stems from our choice of implementing a multigrid algorithm for the bilinear form associated with the operator (∆ − 1) 2 = ∆ 2 − 2∆ + 1.Indeed, the bilinear form αb( can be seen to coincide with the variational form associated with the fourth order problem To limit the technical complexity of the implementation, we considered the problem ( 1)-( 3) on the unit square in two dimensions.The experiments were implemented in Python and SciPy.The meshes were uniform rectangular, with the coarsest level for the multigrid solver consisting of 8 × 8 rectangles.Figure 1 shows an example of a solution of the optimality system (14).

Eigenvalues
Let us first consider the exact preconditioner B α defined in (15).If B α is a good preconditioner for the discrete optimality system (14), then the spectral condition number of B α A α should be small and bounded, independently of the size of both the regularization parameter α and the discretization parameter h.The eigenvalues of this preconditioned system were computed by solving the generalized eigenvalue problem We found that the absolute value of the eigenvalues λ were bounded, with 0.445 ≤ |λ| ≤ 1.809, uniformly in α ∈ {1, 10 −1 , . . ., 10 −10 } and h ∈ {2 −2 , . . ., 2 −5 }.This yields a uniform condition number k(B α A α ) ≈ 4.05.The spectra of the preconditioned systems are pictured in Figure 2 for some choices of α.The spectra are clearly divided into three bounded intervals, and the eigenvalues are more clustered for α ≈ 1 and for very small α.

Efficient preconditioning
In    • 1 multigrid V-cycle for the (2,2) block of B α , containing a symmetric 4×4 block Gauss-Seidel smoother where the blocks contain the matrix entries corresponding to all degrees of freedom associate with a vertex in the mesh (see [11] for a theoretical analysis of the method).
We estimated condition numbers of the individual blocks of B −1 α preconditioned with their respective approximations.The results are reported in Tables 1 and 2. A slight deterioration in the performance of the multigrid cycle can be seen for very small values of α > 0.

Iteration numbers
To verify that also B α is an effective preconditioner for A α , we applied the Minres scheme to the system For the results presented in Table 3, the Minres iteration process was stopped as soon as which is the standard termination criterion for the preconditioned Minres scheme, provided that the preconditioner is SPD.A random initial guess x 0 was used, and the tolerance was set to ε = 10 −12 .

Analysis of the KKT system
Recall that our optimality system reads:  3: Number of preconditioned Minres iterations needed to solve the optimality system to a relative error tolerance ε = 10 −12 .Estimated condition numbers in parentheses, computed from conjugate gradient iterations on the normal equations for the preconditioned optimality system.with unknowns f ∈ L 2 (Ω), u ∈ H2 (Ω) and w ∈ L 2 (Ω).We may write this KKT system in the form: where and the notation " " is used to denote dual operators and dual spaces.In the rest of this paper, the symbols M , M ∂ and A will represent the mappings defined in ( 19), ( 20) and ( 22), respectively, and not (the associated) matrices, as was the case in Section 3. (We believe that this mild ambiguity improves the readability of the present text).By using standard techniques for saddle point problems, one can show that the system (18) satisfies the Brezzi conditions [1], provided that α > 0. Therefore, for every α > 0, this set of equations has a unique solution.Nevertheless, if the standard norms of L 2 (Ω) and H 2 (Ω) are employed in the analysis, then the constants in the Brezzi conditions will depend on α.More specifically, the constant in the coercivity condition will be of order O(α), and thus becomes very small for 0 < α 1.This property is consistent with the ill posed nature of (1)-( 3) for α = 0, and makes it difficult to design α robust preconditioners for the algebraic system associated with (18).
Similar to the approach used in [5,6,10], we will now introduce weighted Hilbert spaces.The weights are constructed such that the constants appearing in the Brezzi conditions are independent of α.Thereafter, in Section 5, we will show how these scaled Hilbert spaces can be combined with simple maps to design α robust preconditioners for our model problem.

Weighted norms
Consider the α-weighted norms: applied to the control f , the state u and the dual/Lagrange-multiplier w, respectively.Note that these norms become "meaningless" for α = 0, but are well defined for positive α.

Brezzi conditions
We will now analyze the properties of defined in (18).More specifically, we will show that the Brezzi conditions are satisfied with constants that do not depend on the size of the regularization parameter α > 0. Note that we use the scaled Hilbert norms (23)-(25).
Expressed in terms of the operators that constitute A α , Lemma 4.1 takes the form inf

see (19) and (22).
Recall that we decided to write our state equation ( 2)-( 3) on the nonstandard variational form (7). Throughout this paper we assume that problem ( 2)-( 3) admits a unique solution u ∈ H2 (Ω) for every f ∈ L 2 (Ω), and that This assumption is valid if Ω is convex or if Ω has a C 2 boundary, see e.g.[2,4].Inequality (26) is a key ingredient of the proof of our next lemma.
Lemma 4.2 There exists a constant c 2 , which is independent of α > 0, such that see the discussion of (26).Let θ = (1 + c 2 1 ) −1 ∈ (0, 1), and it follows that This result may also be written in the form where M , M ∂ and A are the operators defined in ( 19), ( 20) and ( 22), respectively.

Boundedness
Having established that the Brezzi conditions hold, with constants that are independent of α, we next explore the boundedness of A α .

Proof
Recall the definitions (19) and (20) of M and M ∂ , respectively.Since we find, by employing the Cauchy-Schwarz inequality, that Lemma 4.4

Proof
Again, we note that From the definitions of M and A, see ( 19) and ( 22), and the Cauchy-Schwarz inequality, it follows that

Isomorphism
We have verified that the Brezzi conditions hold, and that A α is a bounded operator.Moreover, all constants appearing in the inequalities expressing these properties are independent of the regularization parameter α > 0. Let Theorem 4.5 The operator A α , defined in (18), is bounded and continuously invertible for α > 0 in the sense that for all nonzero x ∈ V, c ≤ sup for some positive constants c and C that are independent of α > 0. In particular,

Estimates for the discretized problem
The stability properties (30) is not necessarily inherited by discretizations.However, the structure used to prove the so-called "inf-sup condition" in Lemma 4.1 is preserved in the discrete system provided that the same discretization is employed for the control and the Lagrange multiplier.Furthermore, the boundedness properties, Lemma 4.3 and Lemma 4.4, certainly also hold for conforming discretizations.It remains to adress the coercivity condition, Lemma 4.2, for the discretized problem.We consider finite dimensional subspaces U h ⊂ U = H2 (Ω) and W h ⊂ W = L 2 (Ω).For certain choices of U h and W h , the estimate of Lemma 4.2 carries over to the finite-dimensional setting.

Proof
Assume that (1 − ∆)U h ⊂ W h , and that (32 If the discretization is chosen such that Lemma 4.6 is satisfied, then the estimates (30) carries over to discretized system.More precisely, we have where , equipped with the inner prdocut of V, and A α,h is discrete counterpart to A α , defined by setting A α,h x h , y h = A α x h , y h for all x h , y h ∈ V h .If the state is discretized with C 1 -conforming bicubic Bogner-Fox-Schmit rectangles, as in Section 3, then Lemma 4.6 is satisfied if the control and Lagrange multiplier is discretized with discontinuous bicubic elements on the same mesh.For triangular meshes, one could choose Argyris triangles for the state variable and piecewise quintic polynomials for the control and Lagrange multiplier variables.
We remark that Lemma 4.6 provides a sufficient, but not necessary criterion for stability of the discrete problem, and usually may imply far more degrees of freedom in the discrete space W h ⊂ W than is actually needed.The usefulness of Lemma 4.6 is that the estimates (33) can, in principle, always be obtained by choosing a sufficiently large space for the control and Lagrange multiplier.

Preconditioning
The linear problem (18) is of the form where x is sought in a Hilbert space V, the right hand side b is in the dual space V , and A is a self-adjoint continuous mapping of V onto V .Iterative methods for linear problems are most often formulated for operators mapping V into itself, and can not be directly applied to the linear system (34), as described in [5].If we want to apply such methods to (34), then we need to introduce a continuous operator mapping V isomorphically back onto V.
More precisely, if we have a continuous operator then M = BA : V → V is continuous and has the desired mapping properties, and if B is an isomorphism, the solutions to (34) coincides with the solutions to the problem In this paper we shall consider B ∈ L(V, V ) a preconditioner if B is self-adjoint and positive definite.This implies that B −1 is self-adjoint and positive definite as well, and hence B −1 defines an inner product on V by setting (x, y) = B −1 x, y , x, y ∈ V. (36) This inner product has the crucial property of making M self-adjoint, in the sense that (Mx, y) = Ax, y = Ay, x = (My, x).
Conversely, given any inner product on ( • , • ) on V, the Riesz-Fréchet theorem provides a self-adjoint positive definite isomorphism B : V → V such that (36) and (37) hold, and we say that B is the Riesz operator induced by ( • , • ).This establishes a one-to-one correspondence between preconditioners and Riesz operators on V .Since the Riesz operator is an isometric isomorphism, the operator norm of BA coincides with the operator norm of A. We formulate this well-known fact here in a lemma for the sake of self-containedness.We refer to [3,5] for a more in-depth discussion of preconditioning and its relation to Riesz operators.Lemma 5.1 Let V be a Hilbert space, and let A : V → V be a self-adjoint isomorphism, and assume that B is the Riesz operator induced by the inner product on V, or equivalently, that the inner product on V is defined by the self-adjoint positive definite isomorphism B −1 : V → V .Then BA : V → V is an isomorphism, self-adjoint in the inner product on V, with In particular, the condition number of BA is given by

Proof
Since A is self-adjoint, M = BA is self-adjoint with respect to the inner product on V. From the Riesz-Fréchet theorem we have Ax V = BAx = Mx , and we obtain following identity for the operator norm of M.
A similar identity is obtained for the norm of the inverse operator, We say that a preconditioner B α for A α is robust with respect to the parameter α if κ(B α A α ) is bounded uniformly in α.The significance of Lemma 5.1 is that such a robust preconditioner can be found by identifying (parameterdependent) norms in which A α and A −1 α are both uniformly bounded.

Parameter-robust minimum residual method
In Section 4 stability of A α was shown in the α-dependent norms defined in (23)-( 25).The preconditioner provided by Lemma 5.1 is the Riesz operator induced by the weighted norms.This operator B α : V → V takes the form where R : H2 (Ω) → H2 (Ω) is the operator induced by the H 2 (Ω) inner product, i.e.Ru, v = (u, v) H 2 (Ω) .Since A α is self-adjoint, the preconditioned operator B α A α : V → V is self-adjoint in the inner product on V. Consequently we can apply the minimum residual method (Minres) to the problem Theorem 5.2 Let A α be the operator defined in (18) and B α the operator defined in (38).Then there exists an upper bound, independent of α, for the convergence rate of Minres applied to the preconditioned system In particular there exists an upper bound, independent of α, for the number of iterations needed to reach the stopping criterion (17).

Proof
A crude upper bound for the convergence rate (more precisely, the two-step convergence rate) of Minres is given by is the condition number of B α A α , see e.g.[5].From Lemma 5.1 and (30) we determine that κ is bounded independently of α, with In practical applications, the operator B α will be replaced with a less computationally expensive approximation B α .Ideally B α will be spectrally equivalent to B α , in the sense that the condition number of B α B −1 α is bounded, independently of α.Then the preconditioned system reads and the upper bound for the convergence rate is determined by the conditioned number κ(

Remark
In this paper we only consider the minimum residual method, and we therefore require that the preconditioner is self-adjoint and positive definite.More generally, if other Krylov subspace methods are to be applied to (18), then preconditioners lacking symmetry or definiteness may be considered.We mention in particular that a preconditioned conjugate gradient method for problems similar to (18) was proposed in [10], based on a clever choice of inner product.

Generalization
Is our technique applicable to other problems than (1)-( 3)?We will now briefly explore this issue, and show that the preconditioning scheme derived above yields α robust methods for a class of problems.
The scaling ( 23)-( 25) was also investigated in [6], but for a family of abstract problems posed in terms of Hilbert spaces.More specifically, for general PDE-constrained optimization problems, subject to Tikhonov regularization, and with linear state equations.But in [6] no assumptions about the control, state or observation spaces were made, except that they were Hilbert spaces.Under these circumstances, it was proved that the coercivity and the boundedness, of the operator associated with the KKT system, hold with α-independent constants.Nevertheless, in this general setting, the inf-sup condition involved an α-dependent constant, which, eventually, yielded theoretical iteration bounds of order O([log α −1 ] 2 ) for Minres.
In the present paper we were able to prove an α-robust inf-sup condition for the model problem (1)-(3).This is possible because both the control f and the dual/Lagrange-multiplier w belong to L 2 (Ω).From a more general perspective, it turns out that this is the property that must be fulfilled in order for our approach to be successful: The control space and the dual space, associated with the state equation, must coincide.This will usually lead to additional regularity requirements for the state space.
Motivated by this discussion, let us consider an abstract problem of the form: Here, • W is the dual and control space, • U is the state space, • O is the observation space, • W , U and O are Hilbert spaces.
Let us assume that (A1) A : U → W is a continuous linear operator with closed range.In particular, there is a constant c 1 such that for all u ∈ U , (A2) T : U → O is linear and bounded, and invertible on the kernel of A.
That is, there is a constant c 2 such that for all u ∈ Ker A, It then follows that the KKT system associated with (40)-( 41) is well-posed for every α > 0: Determine (f, u, w) where Note that, compared with (14), the boundary observation matrix M ∂ has been replaced with the general observation operator K in (42).We introduce scaled norms as follows.
We first show that • Uα is indeed a norm on U when assumptions (A1) and (A2) hold.It suffices to show that • Uα is a norm equivalent to • U when α = 1.We have and letting π denote the orthogonal projection of U onto Ker A, Here the last inequality follows from u − πu U = inf ũ∈Ker A u − ũ U and assumption (A1).
We set V = W α ×U α ×W α −1 .As in Section 4, A α : V → V can be shown to be an isomorphism, with parameter-independent estimates obtained in the weighted norms.Theorem 6.1 There exists positive constants c and C, independent of α, such that for all nonzero x ∈ V, c ≤ sup We omit the full proof, which is analogous to that of Theorem 4.5.The crucial part is the "inf-sup condition" of Lemma 4.1, which is easily shown to hold in the abstract setting: The coercivity condition of Lemma 4.2 naturally holds in the prescribed norm on U α , since for (f, u) Note that the weighted norm now depends on A, and as consequence, the estimates become A-independent.In fact, we obtain bounds for the constants c and C which are independent of α as well as the operators appearing in (40)-(41).This is postponed to the next section, where sharp estimates are obtained for (48).
With the estimates (48), Lemma 5.1 provides a preconditioner for the operator A α , given as The condition number of B α A α will be bounded independently of α.It is, however, not clear how to find a computationally efficient approximation of B α in the abstract setting of (40)-(41).

Example 1
The problem (1)-( 3) fits in the abstract framework presented in this section when we assume that the state has H 2 (Ω) regularity.We set W = L 2 (Ω), U = H2 (Ω), A = 1 − ∆, and T : H2 (Ω) → L 2 (∂Ω) is a trace operator, see (44).Since A is a continuous isomorphism, assumptions (A1) and (A2) are both valid.The inner product on U α takes the form where D 2 u denotes the Hessian of u, and the last equality follows from the boundary condition ∂u/∂n = 0 imposed on H2 (Ω).The resulting preconditioner is the one that was used in the numerical experiments, detailed in Section 3, and it is spectrally equivalent to the preconditioner defined in (38).

Example 2
Let U , W , and K be as in Example 1, but let us set A = −∆.Now A has non-trivial kernel, consisting of the a.e.constant functions, and for constant u we have Since assumptions (A1) and (A2) are valid, the optimality system is still well-posed.In this case the inner product on U α is given by

Example 3
Let us consider the "prototype" problem: Note that we here consider the case in which observation data is assumed to be available throughout the entire domain Ω of the state equation.
If the usual variational form of the PDE is used, i.e., (u, w) Assumptions (B1)-(B3) ensure that B α is a self-adjoint and positive definite.
In particular, assumptions (B1)-(B3) hold for A α as in (42), provided that the assumptions of Section 6 hold.For simplicity, we also assume that that A α and B α are finite-dimensional operators.
Theorem 7.1 Let p, q, and r be the polynomials Let q 1 < q 2 and r 1 < r 2 < r 3 be the roots of q and r, respectively.The spectrum of B α A α is contained within three intervals, determined by the roots of p and r, independently of α: Consequently, the spectral condition number of If K has a nontrivial kernel, inequality (55) becomes an equality.

Proof
Consider the equivalent generalized eigenvalue problem We show that (56) admits no nontrivial solutions unless λ is as in (54).
Since M is a self-adjoint isomorphism, by assumption (B1), we can rewrite (56) as the three identities αpf + w = 0, (57) Assume that λ is not contained within the three closed intervals of (54).Then p = 0, and we can use (57) to eliminate f from (59).
Since q is nonzero, we can use (60) to eliminate w from (58), where the identity (53) was used.By assumption, pq and r are both nonzero.Moreover, it can be easily seen that pq and r have the same sign outside of the bounded intervals of (54).From assumptions (B1)-(B3), we conclude that qpK + rR is a self-adjoint definite operator.Then (61) only admits trivial solutions, hence λ can not be an eigenvalue of B α A α .The estimate (55) follows from (54), noting that | sp(B α A α )| ⊂ [r 2 , r 3 ].From (61) it can be seen that the roots of r are eigenvalues of B α A α if Ker K is nontrivial.For discretizations U h ⊂ U and W h ⊂ W of A such that A(U h ) ⊂ M (W h ), the discretization of b coincides with A h M −1 h A h .This follows from an argument similar to that in the proof of Lemma 4.6, and as a consequence, Theorem 7.1 can be applied to the preconditioned discrete systems considered in Section 3.

Discussion
Previously, parameter robust preconditioners for PDE-constrained optimization problems have been successfully developed, provided that observation data is available throughout the entire domain of the state equation.For many important inverse problems, arising in industry and science, this is an unrealistic requirement.On the contrary, observation data will typically only be available in subregions, of the domain of the state variable, or at the boundary of this domain.We have therefore explored the possibility for also constructing robust preconditioners for PDE-constrained optimization problems with limited observation data.
For an elliptic control problem, with boundary observations only, we have developed a regularization robust preconditioner for the associated KKT system.Consequently, the number of Minres iterations required to solve the problem is bounded independently of both regularization parameter α and the mesh size h.In order to achieve this, it was necessary to write the elliptic state equation on a non-standard, and non-self-adjoint, variational form.If this approach is employed, then the control and the Lagrange multiplier will belong to the same Hilbert space, which leads to extra regularity requirements for the state.This fact makes it possible to construct parameter weighted metrics such that the constants appearing in the Brezzi conditions, as well as the constants in the inequalities expressing the boundedness of the KKT system, are independent of α and h.Consequently, the spectrum of the preconditioned KKT system is uniformly bounded with respect to α and h, which is ideal for the Minres scheme.These properties were illuminated through a series of numerical experiments, and the preconditioned Minres scheme handled our model problem excellently.
The use of a non-self-adjoint form of the elliptic state equation leads to additional challenges for constructing discretization schemes and suitable multigrid methods.More specifically, it becomes necessary to implement a FE space approximating H 2 .We accomplished this by a C 1 discretization that is conforming in H 2 .The method employed does, however, have strong restrictions on the mesh, which seemingly must be composed of rectangles.
Computed optimal control f based on the observation data in (a) The "true" control function used to generate the observation data in (a).

Figure 1 :
Figure 1: An example of a solution of (14).The observation data d was generated with the forward model, using the "true" control 4x(1 − x) + y shown in panel (d).Solutions to the unregularized problem are non-unique, and the generating control cannot be (exactly) recovered.The figures were generated with mesh parameter h = 1/128 and regularization parameter α = 10 −6 .

Table 1 :
Condition numbers of M preconditioned with symmetric Gauss-Seidel iterations.

Table 2 :
Estimated condition numbers of αR + M ∂ preconditioned with one V-cycle multigrid iteration.