Optimal control of the stationary Kirchhoff equation

We consider an optimal control problem for the steady-state Kirchhoff equation, a prototype for nonlocal partial differential equations that differs from fractional powers of closed operators. Existence and uniqueness of solutions of the state equation, existence of globally optimal solutions, differentiability of the control-to-state map, and first-order necessary optimality conditions are established. These results require the controls to be functions in H¹ and subject to pointwise lower and upper bounds. In order to obtain the Newton differentiability of the optimality conditions, we employ a Moreau-Yosida-type penalty approach to treat the control constraint and study its convergence. The first-order optimality conditions of the regularized problems are shown to be Newton differentiable, and a generalized Newton method is detailed. A discretization of the optimal control problem by piecewise linear finite elements is proposed and numerical results are presented.


Introduction
In this paper we study an optimal control problem governed by a nonlinear, nonlocal partial differential equation (PDE) of Kirchhoff type,

−M(x, ‖∇y‖²_{L²(Ω)}; u) Δy = f in Ω, y = 0 on ∂Ω. (1.1)

Here Ω ⊂ R^N is an open and bounded set and the right-hand side f belongs to L²(Ω). We focus on the particular case M(x, s; u) = u(x) + b(x) s, which has been considered previously, e.g., in [8,10]. Here u and b are strictly positive functions and u serves as the control. The full set of assumptions is given in Sect. 2. We mention that in case u and b are positive constants, (1.1) has a variational structure; see [10]. Equation (1.1) is the steady-state problem associated with its time-dependent variant (1.2). In one space dimension, problem (1.2) models small vertical vibrations of an elastic string with fixed ends when the density of the material is not constant. Specifically, the control u is proportional to the inverse of the string's cross section; see [10,18]. A physical interpretation of the multi-dimensional problems (1.1) and (1.2) appears to be missing in the literature.
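The display for the time-dependent variant (1.2) is lost in this extraction. Based on the classical Kirchhoff string model cited in the text, it presumably takes the following form (reconstructed, not verified against the original):

```latex
y_{tt} - M\bigl(x, \|\nabla y\|^2_{L^2(\Omega)}; u\bigr)\, \Delta y = f
\quad \text{in } \Omega \times (0,T), \qquad
y = 0 \ \text{on } \partial\Omega \times (0,T). \tag{1.2}
```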
PDEs with nonlocal terms play an important role in physics and technology and they can be mathematically challenging. Although in some cases variational reformulations are available, the models (1.1), (1.2) do not allow this in general. Thus, despite the deceptively simple structure, (1.1) requires a set of analytical tools not often employed in PDE-constrained optimization. Existence and uniqueness of solutions for (1.1) have been investigated in [10] and [8]; see also the references therein. For further applications of nonlocal PDEs, we refer the reader to [2,9,17].
The authors in [8] studied an optimal control problem for (1.1) with a tracking-type cost functional and the admissible set U_ad = {u ∈ L²(Ω) | u ≥ u_a > 0 a.e. in Ω}. However, we believe that the proof of existence of an optimal solution in that work has a flaw. We give further details in the "Appendix". Moreover, the proof in [8] is explicitly tailored to such tracking-type functionals. In the present work we find it necessary to modify the control cost term to contain the stronger H¹-norm. We also allow for a more general state-dependent term, which leads to the objective

J(y, u) = ∫_Ω ϕ(x, y(x)) dx + (λ/2) ‖u‖²_{H¹(Ω)} (1.4)

and a set of admissible controls in H¹(Ω). In this setting, we prove the weak-strong continuity of the control-to-state operator into H¹₀(Ω) ∩ W^{2,q}(Ω) for any q ∈ [1, ∞) and the existence of a globally optimal solution. Moreover, we work with a pointwise lower bound on admissible controls. This bound has an immediate technological interpretation, representing an upper bound on the string's cross section. On the other hand, we also impose an upper bound on the controls. This allows us to use the topology of L^∞(Ω) in the proof of the Fréchet differentiability of the control-to-state map, so that we can derive optimality conditions in a more straightforward way than by the Dubovitskii-Milyutin formalism utilized in [8].
The first-order optimality conditions obtained when minimizing (1.4) subject to (1.1) involve a variational inequality of nonlinear obstacle type in H¹. We choose to relax and penalize the bound constraints via a Moreau-Yosida regularization, which amounts to a quadratic penalty of the bound constraints for the control. In this setting, we can prove the generalized (Newton) differentiability of the optimality system. A similar philosophy, albeit for a different problem, has been pursued in [1]. We also mention [23, Chapter 9.2] for an approach via a regularized dual obstacle problem. A recent and promising alternative is offered by [5], where the Newton differentiability of the solution map for unilateral obstacle problems is shown without the need to penalize the constraint. Indeed, relaxing the lower and upper bounds adds new difficulties, since the existence of a solution of the Kirchhoff equation (1.1) can only be guaranteed for positive controls. Therefore, we compose the control-to-state map with a smooth cut-off function. We then study the convergence of global minimizers as the penalty parameter goes to zero; see Theorem 3.4 for details. We expect a corresponding result to hold also for locally optimal solutions under an assumption of second-order sufficient optimality conditions, but this is not investigated here.
To summarize our contributions in comparison to [8], we consider a more general objective, present a simpler proof for the existence of a globally optimal control, prove the differentiability of the control-to-state map and generalized differentiability of the optimality system for a regularized version of the problem as well as the applicability of a generalized Newton scheme. We also describe a structure preserving finite element discretization of the problem and the discrete counterpart of the generalized Newton method.
The paper is organized as follows. In Sect. 2, we review existence and uniqueness results for solutions of the Kirchhoff equation (1.1) and prove the existence of a globally optimal control. Subsequently, we prove the Fréchet differentiability of the control-to-state operator and derive a system of necessary optimality conditions for a regularized problem in Sect. 3. In Sect. 4, we prove the Newton differentiability of the optimality system and devise a locally superlinearly convergent scheme in appropriate function spaces. Section 5 addresses the discretization of the optimal control problem, its optimality system and the generalized Newton method by a finite element scheme. The paper concludes with numerical results in Sect. 6.

Optimal control problem: existence of a solution
In this work we are interested in the study of the following optimal control problem for a stationary nonlinear, nonlocal Kirchhoff equation: minimize the objective J(y, u) given by (1.4), subject to the Kirchhoff state equation and u ∈ U_ad. (2.1) The set of admissible controls is given by U_ad := {u ∈ H¹(Ω) | u_a ≤ u ≤ u_b a.e. in Ω}. The following are our standing assumptions.
where f_0 is a non-negative real number. The bounds u_a and u_b are functions in C(Ω) such that u_b ≥ u_a ≥ u_0 holds for some positive real number u_0. Finally, we assume b ∈ L^∞(Ω) with b ≥ b_0 a.e. for some positive real number b_0.
The integrand ϕ in the objective is assumed to satisfy the following standard assumptions; see for instance [22, Chapter 4.3]: (1) ϕ : Ω × R → R is Carathéodory and of class C² with respect to the second argument; (2) ϕ satisfies the boundedness and local Lipschitz conditions of order 2, i.e., there exists a constant K > 0 such that |D_y^ℓ ϕ(x, 0)| ≤ K for all 0 ≤ ℓ ≤ 2 and for a.e. x ∈ Ω, and for every M > 0 there exists a Lipschitz constant L(M) > 0 for D_y² ϕ(x, ·) on [−M, M]. We now proceed to define the notion of weak solution of the Kirchhoff problem. For any pair (u, y), the state equation reads in strong form

−(u + b ‖∇y‖²_{L²(Ω)}) Δy = f in Ω, y = 0 on ∂Ω. (2.3)

Here and in the following, we occasionally write ‖·‖ instead of ‖·‖_{L²(Ω)}. The L²(Ω)-inner product is denoted by (·, ·). Moreover, we denote by L(U, V) the space of bounded linear operators from U to V. Multiplication of (2.3) with a test function v ∈ H¹₀(Ω) and integration by parts yields the following definition.

Definition 2.4 A function y ∈ H¹₀(Ω) is called a weak solution of the Kirchhoff problem (2.3) if

∫_Ω ∇y · ∇v dx = ∫_Ω f v / (u + b ‖∇y‖²_{L²(Ω)}) dx for all v ∈ H¹₀(Ω). (2.4)
The existence of a unique weak solution as well as its W 2,q (Ω)-regularity has been shown in [8,Theorem 2.2]. Nevertheless, we briefly sketch the proof since its main idea is utilized again later on. For a complete proof we refer the reader to [8].

Theorem 2.5
For any u ∈ U ad , there exists a unique weak solution y ∈ H 1 0 (Ω) of the Kirchhoff problem (2.3). Moreover, y ∈ W 2,q (Ω) holds for all q ∈ [1, ∞), so it is also a strong solution.
Proof Suppose that u ∈ U_ad and let g : [0, ∞) → R be the function defined by g(s) := ‖∇y_s‖²_{L²(Ω)} − s, where y_s is the unique weak solution of the Poisson problem −Δy_s = f/(u + b s) in Ω, y_s = 0 on ∂Ω. A monotonicity argument can be used to show that g has a unique root. Since y_s solves (2.3) if and only if g(s) = 0 holds, the uniqueness of the solution of the Kirchhoff equation is guaranteed. Furthermore, due to the boundedness of u from below, the right-hand side f/(u + b s) of the Poisson problem above belongs to L^∞(Ω). Hence, by virtue of regularity results for the Poisson problem, y ∈ W^{2,q}(Ω) holds for any q ∈ [1, ∞); see, e.g., [11, Theorem 9.15].
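The reduction in this proof, from a nonlocal PDE to a scalar root-finding problem for g, translates directly into a numerical method. The following is a minimal 1-D sketch under simplifying assumptions (constant u, b, f on (0, 1), uniform finite differences); the names `kirchhoff_1d` and `solve_poisson` are ours, not the paper's:

```python
import numpy as np
from scipy.optimize import brentq

# Illustrative 1-D sketch of the proof's reduction (our construction, not the
# paper's code): solve -(u + b*||y'||^2) y'' = f on (0,1) with y(0) = y(1) = 0
# and constant u > 0, b >= 0, f > 0, by finding the unique root of
#   g(s) = ||y_s'||^2 - s,
# where y_s solves the Poisson problem -y_s'' = f / (u + b*s).

def solve_poisson(rhs, h):
    """Solve -y'' = rhs with homogeneous Dirichlet conditions (uniform grid)."""
    n = rhs.size
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return np.linalg.solve(A, rhs)

def grad_norm_sq(y, h):
    """Discrete ||y'||_{L^2}^2, extending y by its zero boundary values."""
    y_ext = np.concatenate(([0.0], y, [0.0]))
    return float(np.sum(np.diff(y_ext) ** 2) / h)

def kirchhoff_1d(u, b, f, n=100):
    h = 1.0 / (n + 1)
    rhs = np.full(n, float(f))
    def g(s):
        return grad_norm_sq(solve_poisson(rhs / (u + b * s), h), h) - s
    s_hi = grad_norm_sq(solve_poisson(rhs / u, h), h)   # equals g(0) >= 0
    s_star = brentq(g, 0.0, s_hi + 1.0)                 # g decreasing => bracket
    return solve_poisson(rhs / (u + b * s_star), h), s_star
```

The bracket [0, g(0) + 1] is valid because the coefficient u + b s grows with s, so ‖∇y_s‖² ≤ ‖∇y_0‖² and g is negative at the right endpoint.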
For the proof of existence of a globally optimal control of (2.1), we show next that the control-to-state operator S : U_ad → H¹₀(Ω) ∩ W^{2,q}(Ω), u ↦ y, enjoys a continuity property.

Theorem 2.6
The control-to-state map S is continuous from U_ad (with the L²(Ω)-topology) into H¹₀(Ω) ∩ W^{2,q}(Ω) for any q ∈ [1, ∞).

Proof The control-to-state map S is well-defined as a consequence of Theorem 2.5. To show its continuity, let {u_n} ⊂ U_ad be a sequence with u_n → u in L²(Ω). Set y_n := S(u_n); then we have an a priori estimate bounding {y_n} in W^{2,q}(Ω). From now on, suppose without loss of generality that q ∈ [2, ∞) holds. Since W^{2,q}(Ω) is a reflexive Banach space and every bounded subset of a reflexive Banach space is weakly relatively compact, there exists a subsequence {y_n}, denoted by the same indices, satisfying y_n ⇀ ŷ in W^{2,q}(Ω). The compactness of the embedding W^{2,q}(Ω) ↪ W^{1,q}(Ω) implies the strong convergence y_n → ŷ in W^{1,q}(Ω) and thus ∇y_n → ∇ŷ. Moreover, u_n → u in L²(Ω) implies the existence of a further subsequence {u_n}, still denoted by the same indices, with u_n(x) → u(x) for a.e. x ∈ Ω. Consequently, f/(u_n + b ‖∇y_n‖²) → f/(u + b ‖∇ŷ‖²) a.e. in Ω. Since f/(u_n + b ‖∇y_n‖²) is dominated by f/u_a, the dominated convergence theorem yields convergence of the right-hand sides in L^q(Ω). On the other hand, from y_n ⇀ ŷ in W^{2,q}(Ω) it follows that −Δy_n ⇀ −Δŷ holds in L^q(Ω). The uniqueness of the weak limit shows that ŷ solves (2.3) with control u, and from the uniqueness of the solution of (2.3) we obtain ŷ = S(u). Therefore, −Δy_n → −Δŷ holds in L^q(Ω) and thereby y_n → ŷ in W^{2,q}(Ω).
We note that we have proved that for any sequence {u_n} ⊂ U_ad with u_n → u in L²(Ω) there exists a subsequence, denoted by the same indices, so that S(u_n) → S(u) in W^{2,q}(Ω). From this we can easily conclude convergence of the entire sequence: suppose S(u_n) does not converge to S(u); then there exist δ > 0 and a subsequence with indices n_k such that ‖S(u_{n_k}) − S(u)‖_{W^{2,q}(Ω)} ≥ δ. Since u_{n_k} → u in L²(Ω), there exists a further subsequence of {u_{n_k}} along which S(u_{n_k}) → S(u), which is a contradiction. Consequently, we obtain S(u_n) → S(u) as claimed.
We can now address the existence of a global minimizer of (2.1).

Theorem 2.8 Problem (2.1) possesses a globally optimal control ū ∈ U_ad with associated optimal state ȳ = S(ū) ∈ H¹₀(Ω) ∩ W^{2,q}(Ω) for all q ∈ [1, ∞).

Proof The proof follows the standard route of the direct method, so we can be brief.
Step (1): We show that the reduced cost functional j is bounded from below on the set U_ad. To this end, recall from the proof of Theorem 2.6 that S(U_ad) is bounded in W^{2,q}(Ω). Due to the embedding W^{2,q}(Ω) ↪ C(Ω) for q > N/2, there exists M > 0 such that ‖S(u)‖_{L^∞(Ω)} ≤ M holds for all u ∈ U_ad. From Assumption 2.2 we can obtain the estimate

This implies
for all u ∈ U ad . The assertion follows.
Step (2): We construct the tentative minimizer ū. Since j is bounded from below on U_ad, there exists a minimizing sequence {u_n} ⊂ U_ad so that j(u_n) → β := inf_{u ∈ U_ad} j(u). The boundedness of {u_n} in H¹(Ω) follows from the radial unboundedness of j. Consequently, there exists a subsequence, denoted by the same indices, such that u_n ⇀ ū in H¹(Ω). U_ad is convex and closed in H¹(Ω) and therefore weakly closed, so ū ∈ U_ad. Step (3): It remains to show the global optimality of ū. Set F(y) := ∫_Ω ϕ(x, y(x)) dx; thus F is composed of a Nemytskii operator and a continuous linear integral operator. In summary, exploiting the weak sequential lower semicontinuity of ‖·‖_{H¹}, we have lim inf_{n→∞} j(u_n) ≥ j(ū). By definition of β and since ū ∈ U_ad ∩ H¹(Ω), we therefore must have β = j(ū).

Remark 2.9
An inspection of the existence theory shows that these results remain valid in the absence of an upper bound u b on the control. However, the upper bound is of essential importance in the following section, where we prove the Fréchet differentiability of the control-to-state map.

Optimality system
In this section we address first-order necessary optimality conditions for local minimizers. We need to overcome several obstacles. First of all, the control-to-state operator is not Fréchet differentiable in the H¹(Ω)-topology. The reason is that U_ad has empty interior w.r.t. this topology except in dimension N = 1. More precisely, every H¹(Ω)-neighborhood of any control u ∈ U_ad contains functions which are arbitrarily negative on sets of small but positive measure. However, the proof of Theorem 2.5, which establishes the well-definedness of the control-to-state map, is contingent upon the controls remaining positive. In order to overcome this issue, we work with the topology of L^∞(Ω). With regard to an efficient numerical solution method in function spaces, we aim to arrive at an optimality system which is Newton differentiable. To this end, we propose to relax and penalize the control constraint. Notice that this is not straightforward, since we need to ensure positivity of the relaxed control in the state equation. We achieve the latter by a smooth cut-off function. The optimality system of the penalized problem then turns out to be Newton differentiable, as we shall show in Sect. 4.
The material in this section is structured as follows. In Sect. 3.1, we prove the Fréchet differentiability of the control-to-state map. We establish the system of first-order necessary optimality conditions for the original problem (2.1) in Sect. 3.2. In Sect. 3.3 we introduce the penalty approximation and show that for any null sequence of penalty parameters, there exists a subsequence of global solutions to the corresponding penalized problems which converges weakly to a global solution of the original problem; see Theorem 3.4. Section 3.4 addresses the system of first-order necessary optimality conditions for the penalized problem.

Differentiability of the control-to-state map
In this subsection we show the Fréchet differentiability of the control-to-state map S by means of the implicit function theorem. To verify the assumption of the implicit function theorem, we need the following result about the linearization of the Kirchhoff equation (2.3).

Theorem 3.2 Suppose that û ∈ U_ad. Then the control-to-state operator S is continuously Fréchet differentiable in an L^∞(Ω)-neighborhood of û.
Proof Suppose that û ∈ U_ad is arbitrary and that ŷ ∈ H¹₀(Ω) ∩ W^{2,q}(Ω) is the associated state. Consider the map E defined by

E(y, u) := −Δy − f/(u + b ‖∇y‖²_{L²(Ω)}), (3.2)

acting from (H¹₀(Ω) ∩ W^{2,q}(Ω)) × L^∞(Ω) into L^q(Ω). The existence and uniqueness of solutions of the linearized Kirchhoff equation show that E_y(ŷ, û) is boundedly invertible. Invoking the implicit function theorem, we obtain that S is continuously differentiable in some L^∞(Ω)-neighborhood of û. Since û ∈ U_ad was arbitrary, S actually extends into an L^∞(Ω)-neighborhood of U_ad and it is continuously differentiable there. Moreover, we obtain that δy = S'(û) δu satisfies E_y(ŷ, û) δy = −E_u(ŷ, û) δu.

First-order optimality conditions
The optimality system can be derived by using the Lagrangian L(y, u, p) and taking derivatives with respect to the state and the control. Taking the derivative with respect to the state and integrating by parts, we find that L_y(y, u, p) δy = 0 for all δy ∈ H¹₀(Ω) ∩ W^{2,q}(Ω) represents the strong form of the adjoint equation. The existence and uniqueness of the adjoint state can be shown either by direct arguments as in Theorem 2.5, or by exploiting that the bounded invertibility of E_y implies that of its adjoint; see the proof of Theorem 3.2.
Evaluating the derivative of the Lagrangian with respect to the control, it is now standard to derive the following system of necessary optimality conditions.
such that the following system holds: Notice that (3.5b) is a nonlinear obstacle problem for the control variable u, originating from the bound constraints in U_ad and the presence of the H¹-control cost term in the objective. Until recently, the Newton differentiability of the associated solution map was not known. In order to apply a generalized Newton method, we therefore choose to relax and penalize the bound constraints via a quadratic penalty in the following section. This is also known as a Moreau-Yosida regularization of the indicator function pertaining to U_ad.
Recently, the authors in [5] proved a Newton differentiability result for the solution map of unilateral obstacle problems. This approach offers a promising alternative route to solving (3.5) numerically. It would amount to introducing a fourth unknown satisfying z = −f p/(u + b ‖∇y‖²)² and replacing (3.5b) by u = G(z), where G stands for the solution map of the obstacle problem. We leave the details for future work.

Moreau-Yosida penalty approximation
The Moreau-Yosida penalty approximation of problem (2.1) consists of the following modifications.
(1) We remove the constraints u_a ≤ u ≤ u_b from U_ad and work with controls in H¹(Ω) which do not necessarily belong to L^∞(Ω). (2) We add the penalty term (1/(2ε)) ‖(u_a − u)⁺‖²_{L²(Ω)} + (1/(2ε)) ‖(u − u_b)⁺‖²_{L²(Ω)} to the objective. Here v⁺ = max{0, v} is the positive part function and ε > 0 is the penalty parameter. (3) We replace the control u in the state equation by the effective control u_a/2 + η_ε(u − u_a/2), where η_ε is a smooth approximation of the positive part function.
Notice that modification (3) is required since the control-to-state map S is guaranteed to be defined only for positive controls; compare Theorem 2.5. Therefore, we use u_a/2 + η_ε(u − u_a/2) ≥ u_a/2 as an effective control. In addition, u_a/2 + η_ε(u − u_a/2) = u holds for all u ∈ U_ad, provided that ε is small enough. We now consider the following relaxed problem, posed over u ∈ H¹(Ω):

(P ε )
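The cut-off η_ε is only required to satisfy the properties used later in the convergence analysis (C¹, nondecreasing, global Lipschitz constant 1, η_ε(t) = t for t ≥ ε, and η_ε → max{0, ·} pointwise). One admissible choice, assumed here for illustration since the paper's concrete formula is not reproduced in this excerpt, is a quadratic blend on [−ε, ε]:

```python
import numpy as np

# Hypothetical concrete choice of the cut-off eta_eps (not the paper's formula):
# C^1, nondecreasing, globally Lipschitz with constant 1, eta_eps(t) = t for
# t >= eps, eta_eps(t) = 0 for t <= -eps, and eta_eps -> max{0, .} pointwise
# as eps -> 0 (the error is at most eps/4, attained at t = 0).
def eta(t, eps):
    t = np.asarray(t, dtype=float)
    mid = (t + eps) ** 2 / (4.0 * eps)          # quadratic blend on [-eps, eps]
    return np.where(t >= eps, t, np.where(t <= -eps, 0.0, mid))

def effective_control(u, u_a, eps):
    """Effective control u_a/2 + eta_eps(u - u_a/2) fed into the state equation.
    It is bounded below by u_a/2 and coincides with u whenever u >= u_a/2 + eps."""
    return 0.5 * u_a + eta(u - 0.5 * u_a, eps)
```

With this choice, the effective control equals u for every u ∈ U_ad as soon as ε ≤ u_0/2, matching the statement above.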
The relation between (P ε ) and the original problem (2.1) is clarified in the following theorem.
Proof Statement (i) can be proved in a straightforward manner using a similar procedure as in Theorem 2.8. The proof of statement (ii) is divided into several steps. As in the proof of Theorem 2.8, we define β to be the globally optimal value of the objective in (2.1). Similarly, we let β_ε denote the globally optimal value of the objective in (P_ε). Suppose that ε_n ↘ 0 is an arbitrary sequence.
We have already shown that β_{ε_n} ≤ β holds. Taking the lim sup in this inequality as n → ∞, we find lim sup_{n→∞} β_{ε_n} ≤ β, as in the proof of Theorem 2.8. Passing to the limit n → ∞ in (**) shows that the bound constraints hold in the limit, and consequently u* ∈ U_ad follows.
Step (3): To obtain the convergence η_{ε_n}(ū_{ε_n} − u_a/2) → u* − u_a/2 in L²(Ω), it suffices to note that the assumptions on η_ε imply that η_{ε_n}(t) → max{0, t} holds for all t ∈ R as n → ∞ and that η_{ε_n} has a Lipschitz constant of 1 for all n. In combination with u* ≥ u_a, the triangle inequality, and the dominated convergence theorem, this gives the desired convergence. The continuity of S on U_ad w.r.t. the L²(Ω)-topology now implies ȳ_{ε_n} = S(u_a/2 + η_{ε_n}(ū_{ε_n} − u_a/2)) → S(u*).

From Step (2) we have the weak convergence of ȳ_{ε_n} to y*. The uniqueness of the weak limit shows y* = S(u*).

First-order optimality conditions for the penalized problem
The derivation of optimality conditions for (P_ε) proceeds along the same lines as in Sect. 3.2, and the details are omitted. Notice that the use of the cut-off function in the control-to-state map resolves the difficulty with the differentiability of this map with respect to the H¹(Ω)-topology in appropriate function spaces. For simplicity, we drop the index ε from now on and denote states, controls, and associated adjoint states by (y, u, p). We obtain the following regularized system of necessary optimality conditions.

Theorem 3.5 Suppose that (ȳ, ū) is a locally optimal solution of problem (P_ε) with ȳ ∈ H¹₀(Ω) ∩ W^{2,q}(Ω) for any q ∈ [1, ∞). Then there exists a unique adjoint state p ∈ H¹₀(Ω) ∩ W^{2,q}(Ω) for all q ∈ [1, ∞) such that the following system holds:

Remark 3.6 We note that under a second-order sufficient condition, which is not investigated in this paper, every solution of (3.6) is a strict local minimizer of (P_ε). According to Theorem 3.4, applied to a modified problem with a suitable localization term, the local minimizer of the penalized problem under consideration converges to a local minimizer of the original optimal control problem as ε → 0. This technique is well known; see for instance [4, Section 4]. Therefore, under second-order sufficient optimality conditions, the solutions of the optimality system of (P_ε) converge to solutions of the optimality system of (2.1).

Generalized Newton method
In this section we show that the optimality system (3.6) of the penalized problem is differentiable in a generalized sense, referred to as Newton differentiability. This allows us to formulate a generalized Newton method. Due to its similarity with the concept of semismoothness, see [23], such methods are sometimes referred to as semismooth Newton methods.
Definition 4.1 Let Z and W be Banach spaces and let V ⊆ Z be an open set. A mapping F : V → W is called Newton differentiable on V if there exists G : V → L(Z, W) such that

lim_{‖h‖_Z → 0} ‖F(v + h) − F(v) − G(v + h) h‖_W / ‖h‖_Z = 0

holds for every v ∈ V. In this case G is said to be a Newton derivative of F on V.
We formulate the optimality system (3.6) in terms of an operator equation F(y, u, p) = 0, where the components of F are specified below and q ∈ [max{1, N/2}, ∞) is arbitrary but fixed.
Here the space W^{2,q}(Ω) for the control component is understood to incorporate homogeneous Neumann boundary conditions. The component F_1 represents the adjoint equation (3.6a) in strong form. The continuous Fréchet differentiability of F_1 is a standard result, which uses Lemma 2.3 and the embedding W^{2,q}(Ω) ↪ L^∞(Ω). Similarly, F_3 represents the state equation (3.6c), i.e.,

F_3(y, u, p) = −Δy − f/(u_a/2 + η_ε(u − u_a/2) + b ‖∇y‖²_{L²(Ω)}),

and its Fréchet derivative is continuous as well. Finally, in order to define F_2, we integrate (3.6b) by parts, which is feasible due to Corollary 3.7. This results in the equivalent formulation F_2 = 0, together with the boundary conditions ∂u/∂n = 0, which are included in the definition of W^{2,q}(Ω).
In order to establish the Newton differentiability of F 2 , we invoke the following classical result.
Using Theorem 4.2 and the embedding W^{2,q}(Ω) ↪ L^∞(Ω), it follows that F_2 is Newton differentiable on the entire space X with generalized derivative G_2(y, u, p)(δy, δu, δp). Here χ_A stands for the indicator function of the active set. We are now in a position to state a basic generalized Newton method; see Algorithm 4.3. Following well-known arguments, we can show its local well-posedness and superlinear convergence to local minimizers satisfying second-order sufficient conditions. We refrain from repeating the details and refer the interested reader to, e.g., [16, Chapter 7], [14], and [23, Chapter 10]. It is also possible to globalize the method using a line search approach; see, e.g., [15].

Algorithm 4.3 (Basic semismooth Newton method for the solution of problem (P ε )).
Input: initial guess (y_0, u_0, p_0) ∈ X
Output: approximate stationary point of (P_ε)
1: Set k := 0
2: while not converged do
3: Determine the active set A(u_k)
4: Solve the Newton system for (δy, δu, δp)
5: Update the iterates by setting y_{k+1} := y_k + δy, u_{k+1} := u_k + δu, p_{k+1} := p_k + δp
6: Set k := k + 1
7: end while
An appropriate criterion for the convergence of Algorithm 4.3 is the smallness of the residual, either in absolute terms or relative to the initial values.
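To illustrate the mechanics of such an iteration in a self-contained way, the following toy sketch (our construction, not the paper's full optimality system) applies a semismooth Newton step to the gradient of a one-sided Moreau-Yosida penalty in finite dimensions:

```python
import numpy as np

# Toy illustration of a semismooth Newton iteration (not the paper's system):
# minimize  1/2 ||u - u_d||^2 + 1/(2*eps) ||(u_a - u)^+||^2,  whose gradient
#   F(u) = u - u_d - (1/eps) * max(0, u_a - u)
# is Newton differentiable with generalized derivative
#   G(u) = I + (1/eps) * diag(chi_A),  A = {i : u_i < u_a_i}  (active set).
def semismooth_newton(u0, u_d, u_a, eps, tol=1e-12, max_iter=50):
    u = u0.astype(float).copy()
    for k in range(max_iter):
        F = u - u_d - np.maximum(0.0, u_a - u) / eps
        if np.linalg.norm(F) <= tol:
            return u, k
        chi = (u_a - u > 0).astype(float)     # active set indicator
        G = 1.0 + chi / eps                   # diagonal generalized Jacobian
        u -= F / G                            # Newton step
    return u, max_iter
```

Because F is piecewise affine here, the iteration terminates exactly once the active set has been identified, which mirrors the fast local behavior of Algorithm 4.3.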

Discretization and implementation
In this section we address the discretization of the relaxed optimal control problem (P_ε). We follow a discretize-then-optimize approach and derive the associated discrete optimality system, as well as a discrete version of the generalized Newton method. In order to simplify the implementation, we employ the original control-to-state map y = S(u). In other words, we choose η_ε = id in (P_ε), which no longer approximates the positive part function. Consequently, the controls appearing in the control-to-state map are no longer guaranteed to be bounded below by u_a. This simplification is justified a posteriori, provided that the control iterates happen to remain positive and bounded away from zero and thus still permit the state equation, or rather its linearized counterpart appearing in the generalized Newton method, to be uniquely solvable. We numerically observed this to be the case for all examples. In addition, our implementation allows an upper bound on the control, which is treated via the same penalty approach as the lower bound.
Our discretization method of choice is the finite element method. We employ piecewise linear, globally continuous finite elements on geometrically conforming triangulations of the domain Ω. More precisely, we use the same finite element space V_h to discretize the control, the state, and the adjoint state variables. We use the usual Lagrangian basis and refer to the basis functions as {ϕ_j}, where j = 1, …, N_V and N_V denotes the number of vertices in the mesh. The coefficient vector, e.g., for the discrete control variable u ∈ V_h, will be denoted by u. In order to formulate the discrete optimal control problem, we introduce the mass matrix M and the stiffness matrix K with entries M_ij = ∫_Ω ϕ_i ϕ_j dx and K_ij = ∫_Ω ∇ϕ_i · ∇ϕ_j dx. We also make use of the diagonally lumped mass matrix M_lumped with entries (M_lumped)_ii = Σ_{j=1}^{N_V} M_ij. Suppose that the right-hand side f and the coefficient b have been discretized and represented by their coefficient vectors f and b in V_h. Using the lumped mass matrix, the weak formulation (2.4) of the state equation can be written in a preliminary discrete form. In order to incorporate the Dirichlet boundary conditions, we introduce the boundary projector P_Γ. This is a diagonal N_V × N_V matrix which has ones along the diagonal in entries pertaining to boundary vertices, and zeros otherwise. We also introduce the interior projector P_Ω := id − P_Γ. We can thus state the discrete form of the state equation (2.4) as (5.1). In order to simplify the notation, we introduce further diagonal matrices, with which we can write (5.1) more compactly as

e(y, u) := P_Ω K y − P_Ω M_lumped F D(y, u)^{-1} 1 + P_Γ y = 0,

where 1 and 0 denote column vectors of all ones and all zeros, respectively. To be specific, we focus on a tracking-type objective and choose ϕ(x, y) = ½ (y − y_d)² in (1.4) and thus also in (P_ε).
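The matrices entering the discrete state equation can be assembled in a few lines. The following sketch (1-D for brevity, whereas the paper works on 2-D triangulations; the lumping and projector constructions carry over verbatim) builds the P1 mass and stiffness matrices, the row-sum lumped mass matrix, and the projectors P_Γ and P_Ω:

```python
import numpy as np

# 1-D sketch of the P1 finite element matrices of Sect. 5 on a uniform mesh
# of [0, 1] with n_cells cells and n_cells + 1 vertices.
def p1_matrices(n_cells):
    h = 1.0 / n_cells
    n = n_cells + 1                                          # number of vertices N_V
    M = np.zeros((n, n))
    K = np.zeros((n, n))
    M_loc = h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])     # element mass matrix
    K_loc = 1.0 / h * np.array([[1.0, -1.0], [-1.0, 1.0]])   # element stiffness matrix
    for e in range(n_cells):                                 # assemble element-wise
        idx = [e, e + 1]
        M[np.ix_(idx, idx)] += M_loc
        K[np.ix_(idx, idx)] += K_loc
    M_lumped = np.diag(M.sum(axis=1))                        # (M_lumped)_ii = sum_j M_ij
    return M, K, M_lumped

def projectors(n_vertices):
    """Boundary projector P_Gamma (ones at boundary vertices) and interior
    projector P_Omega = id - P_Gamma."""
    P_Gamma = np.zeros((n_vertices, n_vertices))
    P_Gamma[0, 0] = P_Gamma[-1, -1] = 1.0
    return P_Gamma, np.eye(n_vertices) - P_Gamma
```

Row-sum lumping preserves the total mass (the entries of M_lumped sum to the measure of the domain), which is the property used in the discrete state equation above.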
In addition, we distinguish two positive control cost parameters λ_1 and λ_2, weighting the H¹-seminorm and the L²-norm parts of the control cost, respectively. This leads to discrete problems whose Lagrangian we denote by L. Before we state the first- and second-order derivatives of the Lagrangian, we first address the nonlinear term D(y, u)^{-1}. The first-order derivatives of L (written as column vectors) are then given by (5.5). Here D_{A+}(u) and D_{A−}(u) are diagonal (active-set) matrices whose entries indicate the pointwise violation of the upper and lower bounds, respectively. In order to solve the discrete optimality system consisting of (5.1) and (5.5), we employ a finite-dimensional semismooth Newton method (Algorithm 5.1). This requires the evaluation of first-order derivatives of the state equation (5.1) as well as second-order derivatives of the Lagrangian (5.4). The following expressions are obtained.
Notice that the expression for L uu is the generalized derivative of L u in the sense of Definition 4.1.
The discrete generalized Newton system has the following form: The well-posedness of the system (5.7) can be shown in a neighborhood of a locally optimal solution satisfying second-order sufficient optimality conditions, under the additional assumption that u remains positive. This is a well-established technique and it applies both to the continuous as well as to the discrete setting; see for instance [3,20,21]. In contrast to standard optimal control problems which do not feature a nonlocal PDE, some of the blocks in (5.7) are no longer sparse. This comment applies to e_y due to the second summand in (5.6a), to L_yy due to the second summand in (5.6c), as well as to L_yu given by (5.6d). For a high-performance implementation, it is therefore important not to assemble the blocks in (5.7) as matrices, but rather to provide matrix-vector products and use a preconditioned iterative solver such as MINRES [19] to solve (5.7). This aspect, however, is beyond the scope of this paper and we defer the design and analysis of a suitable preconditioner to future work. For the time being, we resort to the direct solution of (5.7) using MATLAB's direct solver, which is still feasible on moderately fine discretizations of two-dimensional domains. Our implementation of the semismooth Newton method is described in Algorithm 5.1. In contrast to Algorithm 4.3, we added an additional step in which we solve the discrete nonlinear state equation (5.2) for y_{k+1} once per iteration for increased robustness; see Line 6 in Algorithm 5.1. Notice that the preliminary linear update to y_{k+1} in Line 5 is still useful, since it provides an initial guess for the subsequent solution of e(y_{k+1}, u_{k+1}) = 0. We mention that nonlinear state updates have been analyzed in the closely related context of SQP methods, e.g., in [7,24]. We also added a rudimentary damping strategy which improves the convergence behavior.
In our examples, it suffices to choose γ = 1/2 when ‖L_u(y_k, u_k, p_k)‖_{(K+M)^{-1}} > 1/10 and γ = 1 otherwise.
The stopping criterion we employ in Line 2 measures the three components of the residual, i.e., the right-hand side in (5.7). Since each component represents an element of the dual space of H¹(Ω), we evaluate the (squared) H¹(Ω)*-norm of all residual components, which amounts to R²(y, u, p) := ‖L_y(y, u, p)‖²_{(K+M)^{-1}} + ‖L_u(y, u, p)‖²_{(K+M)^{-1}} + ‖e(y, u)‖²_{(K+M)^{-1}}. Algorithm 5.1 is stopped when R(y, u, p) ≤ 10^{-6} (5.9) is reached. Moreover, we impose a tolerance of ‖e(y, u)‖_{(K+M)^{-1}} ≤ 10^{-10} for the solution of the forward problem in Line 6.
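The H¹(Ω)*-norms in this criterion can each be evaluated with one linear solve against the H¹ Gram matrix K + M, since ‖r‖²_{(K+M)^{-1}} = rᵀ(K + M)^{-1} r. A minimal sketch:

```python
import numpy as np

# Sketch of the dual-norm evaluation in the stopping criterion: for a residual
# coefficient vector r, ||r||_{(K+M)^{-1}}^2 = r^T (K+M)^{-1} r, i.e. one
# linear solve with the symmetric positive definite matrix K + M.
def dual_norm(r, K, M):
    z = np.linalg.solve(K + M, r)   # discrete Riesz representative in H^1
    return float(np.sqrt(r @ z))
```

In a sparse implementation one would replace `np.linalg.solve` by a sparse Cholesky factorization of K + M, reused across all residual components and iterations.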
Algorithm 5.1 (Discrete semismooth Newton method with nonlinear state update for the solution of a discretized instance of problem (P ε )).

Numerical experiments
In this section we describe a number of numerical experiments. The first experiment serves the purpose of demonstrating the influence of the non-locality parameter b. In the second experiment, we numerically confirm the mesh independence of our algorithm. The third experiment is dedicated to studying the impact of the penalty parameter ε. As mentioned in Sect. 5, our implementation of Algorithm 5.1 employs a direct solver for the linear systems arising in Line 4 and is therefore only suitable for relatively coarse discretization of two-dimensional domains. Unless otherwise mentioned, the following experiments are obtained on a mesh discretizing a square domain with N V = 665 vertices and N T = 1248 triangles. Notice that convex domains are covered by our theory due to Remark 4.4. The typical run-time for Algorithm 5.1 is around 3 s.

Influence of the non-locality parameter
Our initial example builds on the two-dimensional problem presented in [8]. The problem domain is Ω = (−0.5, 0.5)²; notice that this is stated slightly incorrectly in [8]. Moreover, we have the right-hand side f(x, y) ≡ 100 and the desired state y_d(x, y) ≡ 0. The lower bound for the control is given as u_a(x, y) = −3x − 3y + 10 and the upper bound is u_b ≡ ∞. Moreover, the control cost parameters are λ_1 = 0 and λ_2 = 4·10^{-5}. We choose ε = 10^{-2} as our penalty parameter. The coefficient function determining the degree of non-locality is set to b(x, y) = α (x² + y²), where α varies in {0, 10⁰, 10¹, 10², 10³}. We point out that these settings violate Assumption 2.1, due to λ_1 = 0, i.e., the cost term is only of L²-type, and since b is not uniformly positive inside Ω. The lack of an upper bound in this example is of no concern, because we could assign a posteriori a sufficiently large upper bound which does not become active. Nonetheless, we present this experiment in order to reproduce the results in [8], which correspond to the case α = 1.
For each value of α, we start from an initial guess constructed as follows. We initialize u_0 to the lower bound u_a and set y_0 to the numerical solution of the forward problem with control u_0. The adjoint state is initialized to p_0 = 0. Figure 1 shows some of the optimal state and control functions obtained. We notice that the solution in the case of a local problem (α = 0) is visually indistinguishable from the setting α = 1 considered in [8]. We therefore compare it to the case α = 10^3, in which the non-local effects are significantly more pronounced. Clearly, an increase in the non-locality parameter aids the control in this example, allowing the control effort to decrease, as reflected in Fig. 1. Also, we observe that the number of iterations of the discrete semismooth Newton method (Algorithm 5.1) decreases slightly as α increases; see Table 1.

Dependence on the discretization
In this experiment we study the dependence of the number of semismooth Newton steps in Algorithm 5.1 on the refinement level of the underlying discretization. To this end, we consider a coarse mesh and two uniform refinements; see Table 2.
The problem is similar to the one in Sect. 6.1. The domain is Ω = (−0.5, 0.5)^2. We use f(x, y) ≡ 100 as right-hand side and the desired state is y_d(x, y) ≡ 0. The lower bound for the control is now given as u_a(x, y) = −10x − 10y + 20 and the upper bound is u_b = u_a + 5. Moreover, the control cost parameters are λ_1 = 10^-7 and λ_2 = 4·10^-5. We choose ε = 10^-2 as our penalty parameter. The coefficient function determining the degree of non-locality is set to b(x, y) ≡ 10. Notice that Assumption 2.1 is satisfied for this experiment.
For each mesh, we start from an initial guess constructed as follows. We initialize u_0 to the lower bound u_a and set y_0 to the numerical solution of the forward problem with control u_0. The adjoint state is initialized to p_0 = 0. In this example, both the lower and upper bounds are relevant on all mesh levels. Nonetheless, we observe a mesh-independent convergence behavior; see Table 2.

Influence of the penalty parameter
In this final experiment, we study the behavior of Algorithm 5.1 and the solutions to the penalized problem (P_ε) in dependence on the penalty parameter ε. We solve similar problems as before, with domain Ω = (−0.5, 0.5)^2, right-hand side f(x, y) ≡ 100 and desired state y_d(x, y) ≡ 0. The lower bound for the control is u_a(x, y) = −10x − 10y + 20 and the upper bound is u_b = u_a + 8. Moreover, the control cost parameters are λ_1 = 10^-7 and λ_2 = 4·10^-5. The penalty parameter varies in {10^0, 10^-1, 10^-2, 10^-3, 10^-4}. The coefficient function determining the degree of non-locality is set to b(x, y) ≡ 10.

Fig. 3 The convergence plots show the total residual norm R(y, u, p) as in (5.8) for all values of the penalty parameter ε. In the left plot, the same initial guess was used for all penalty parameters. With warmstarting (right plot), convergence can be achieved in one semismooth Newton step.

The construction of an initial guess is the same as in Sect. 6.2. The experiment is split into two parts. First, we consider Algorithm 5.1 without warmstarts. The corresponding results are shown in Table 3. As expected, the number of Newton steps increases as ε → 0 while the norm of the bound violation decreases. Second, we repeat the same experiment with warmstarts. That is, we use the initialization described above only for the initial value of ε. Subsequent runs of Algorithm 5.1 are initialized with the final iterates obtained for the previous value of ε. This strategy is very effective, as shown in Fig. 3 (right column).
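The continuation-with-warmstarts strategy can be sketched on a toy problem. The following is a minimal illustration, not the paper's PDE solver: we replace (P_ε) by the hypothetical finite-dimensional model problem min_u ½‖u − d‖² subject to u ≤ u_b, with Moreau–Yosida penalization (1/(2ε))‖max(0, u − u_b)‖², whose optimality condition F_ε(u) = u − d + (1/ε) max(0, u − u_b) = 0 is solved by a semismooth Newton method; the names `ssn_penalized`, `d`, `ub` are our own and do not appear in the paper.

```python
import numpy as np

def ssn_penalized(d, ub, eps, u0, tol=1e-10, max_iter=50):
    """Semismooth Newton method for F(u) = u - d + (1/eps) max(0, u - ub) = 0,
    the optimality condition of min_u 0.5||u - d||^2 + (1/(2 eps))||max(0, u - ub)||^2."""
    u = u0.copy()
    for k in range(1, max_iter + 1):
        F = u - d + np.maximum(0.0, u - ub) / eps
        if np.linalg.norm(F) < tol:
            return u, k - 1          # converged; report number of Newton steps taken
        dF = 1.0 + (u > ub) / eps    # elementwise generalized (Newton) derivative of F
        u = u - F / dF
    return u, max_iter

# Continuation over eps with warmstarts: the final iterate for one value of eps
# serves as the initial guess for the next (smaller) value.
d = np.array([3.0, 1.0, 2.5])        # hypothetical target; components 1 and 3 violate ub
ub = np.array([2.0, 2.0, 2.0])
u = d.copy()                         # initial guess for the first penalty parameter only
history = []
for eps in [1.0, 1e-1, 1e-2, 1e-3, 1e-4]:
    u, iters = ssn_penalized(d, ub, eps, u)
    violation = np.linalg.norm(np.maximum(0.0, u - ub))
    history.append((eps, iters, violation))
```

As in Table 3, the norm of the bound violation decreases as ε → 0, and the warm-started runs converge in a single Newton step because the active set is inherited from the previous value of ε.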

A Comment on the proof of existence of an optimal solution in [8]
We believe that the proof concerning the existence of an optimal solution in Theorem 2.5 of [8] contains a flaw. That proof uses the direct method of the calculus of variations and begins by constructing two sequences {u_n} and {y_n} satisfying the state equation (2.3) and converging weakly in L^2(Ω). The proof then proceeds to show that the weak limit satisfies the state equation as well. That claim, however, is incorrect. Indeed, we construct below a counterexample showing that the control-to-state map is not continuous in any meaningful sense w.r.t. the weak L^2-convergence of the controls. We acknowledge that this argument was suggested by one of the reviewers.
It suffices to consider (2.3) in the setting Ω = (0, 1) ⊂ R with data b ≡ 1 and f ≡ 1. We consider the sequence of controls {u_n} ⊂ L^2(Ω) defined by u_n(x) := 1 + 2χ(nx), where χ is the 1-periodic extension of the characteristic function of [0, 1/2). This sequence clearly satisfies u_n ⇀ ū := 2 in L^2(Ω); see, for instance, [6, Theorem 2.6]. We now show that y_n := S(u_n) does not converge to S(ū) =: ȳ. To this end, we note that {y_n} is bounded in H^2(Ω) and thus a subsequence (which we denote the same) converges weakly in H^2(Ω) and strongly in H^1_0(Ω) to some y* ∈ H^2(Ω) ∩ H^1_0(Ω). Since the state equation yields Δy_n = −1/(u_n + ‖∇y_n‖^2_{L^2(Ω)}) and u_n attains each of the values 1 and 3 on half of the domain, passing to the limit implies that −Δy* = (1/2) [ 1/(1 + ‖∇y*‖^2_{L^2(Ω)}) + 1/(3 + ‖∇y*‖^2_{L^2(Ω)}) ]. Now if S(ū) = ȳ = y* held, then −Δy* = 1/(2 + ‖∇y*‖^2_{L^2(Ω)}) would follow as well. This, however, is impossible due to the strict convexity of the function (0, ∞) ∋ t ↦ 1/(t + ‖∇y*‖^2_{L^2(Ω)}), which shows that the two right-hand sides differ. Consequently, ȳ ≠ y*, and we obtain that u_n ⇀ ū in L^2(Ω) does not imply S(u_n) → S(ū) in any meaningful sense. Therefore, the proof of Theorem 2.5 of [8] cannot be correct, since it implies the weak L^2-continuity of the control-to-state map. The issue appears to be in step four of the proof on page 779, where the authors conclude that ∑_{i∈I_n} λ_i a_i(x_m) − â(x_m) ≥ δ holds for all n ∈ N. This, however, is not the case, and therefore the desired contradiction is not obtained.
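The counterexample can also be observed numerically. The following sketch (a hypothetical finite-difference stand-in, not the finite element implementation used in the paper) solves the one-dimensional state equation −(u(x) + ‖y′‖²_{L²}) y″ = 1 on (0, 1) with homogeneous Dirichlet boundary conditions by a fixed-point iteration on the scalar c = ‖y′‖²_{L²}; the function name `solve_kirchhoff_1d` and all discretization parameters are our own choices.

```python
import numpy as np

def solve_kirchhoff_1d(u, N=800, tol=1e-12, max_iter=200):
    """Solve -(u(x) + ||y'||^2_{L2}) y'' = 1 on (0,1), y(0) = y(1) = 0,
    by a fixed-point iteration on c = ||y'||^2_{L2} and a central
    finite-difference discretization of y'' = -1/(u + c)."""
    x = np.linspace(0.0, 1.0, N + 1)
    h = x[1] - x[0]
    ux = u(x[1:-1])                      # control evaluated at interior nodes
    main = -2.0 * np.ones(N - 1)
    off = np.ones(N - 2)
    A = (np.diag(main) + np.diag(off, 1) + np.diag(off, -1)) / h**2
    c = 0.0
    y = np.zeros(N + 1)
    for _ in range(max_iter):
        rhs = -1.0 / (ux + c)            # y'' = -1/(u + c) for the current c
        y = np.concatenate(([0.0], np.linalg.solve(A, rhs), [0.0]))
        c_new = np.sum(np.diff(y) ** 2) / h   # ||y'||^2 via forward differences
        if abs(c_new - c) < tol:
            c = c_new
            break
        c = c_new
    return x, y, c

# Oscillating control u_n(x) = 1 + 2 chi(n x) versus its weak limit u = 2.
n = 40
_, yn, cn = solve_kirchhoff_1d(lambda x: 1.0 + 2.0 * ((n * x) % 1.0 < 0.5))
_, yb, cb = solve_kirchhoff_1d(lambda x: np.full_like(x, 2.0))
```

For n = 40 the maximum of y_n is roughly 0.081 while the maximum of ȳ = S(ū) is roughly 0.062, so the states remain well separated as the oscillation frequency grows, consistent with ȳ ≠ y*.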
Given the lack of weak L^2-continuity of the control-to-state operator, the direct method of the calculus of variations cannot be applied in the setting of [8], where only an L^2-cost term is present. We overcome this issue by choosing a stronger norm for the control cost term, so that we can use the strong L^2-continuity of the control-to-state map proved in Theorem 2.6.