A Lagrange Multiplier Method for Semilinear Elliptic State Constrained Optimal Control Problems

In this paper we apply an augmented Lagrange method to a class of semilinear elliptic optimal control problems with pointwise state constraints. We show strong convergence of subsequences of the primal variables to a local solution of the original problem, as well as weak convergence of the adjoint states and weak* convergence of the multipliers associated with the state constraint. Moreover, we show the existence of stationary points in arbitrarily small neighborhoods of local solutions of the original problem. Additionally, various numerical results are presented.


Introduction
In this paper, the solution of an optimal control problem subject to a semilinear elliptic state equation and pointwise control and state constraints will be studied. The control problem is non-convex due to the nonlinearity of the state equation. In the problem under consideration, A denotes a second-order elliptic operator, while d(y) is a term that is nonlinear in y.
The setting of the optimal control problem will be made precise in Section 2.
Optimal control problems with pointwise state constraints suffer from low regularity of the respective Lagrange multipliers, see [1,3] for Dirichlet problems and [2] for Neumann problems. The multiplier μ associated with the state constraint is a Borel measure. Under additional assumptions it has been proven in [5] that the multiplier possesses H^{-1}(Ω)-regularity; these assumptions are satisfied, e.g., for constant ψ. For linear-quadratic optimal control problems the literature is quite rich, and a number of different regularization approaches have been investigated to overcome the difficulties that occur when solving problems of this type. We mention here penalization-based approaches [9-12,15] and interior point methods [22,28]. A common way to reach higher regularity of the Lagrange multiplier is to replace the pure state constraints by mixed control-state constraints, as is done in Lavrentiev regularization [13,26] or in the virtual control approach. The latter has been introduced by Krumbiegel and Rösch in [18] for boundary control problems. In [7] the approach has been adapted to linear elliptic distributed control problems and extended in [20] to distributed elliptic optimal control problems governed by a semilinear state equation.
In many of these approaches, the state constraints are relaxed in a suitable way, but not removed completely from the set of explicit constraints. In contrast, augmented Lagrange methods replace the state constraints by a penalization term augmenting the inequality constraint in the cost functional [16,17]. In our recent work [17] an adapted augmented Lagrange method has been analyzed in the general setting of linear elliptic optimal control problems with state constraints. There, the presented algorithm solves sub-problems that are control constrained only; compared to the unregularized problem, these sub-problems can be solved by efficient optimization algorithms. Establishing a special update rule, which performs the classical augmented Lagrange update only if a sufficient decrease of the maximal constraint violation and of the violation of the complementarity condition is achieved, allowed us to guarantee the L¹-boundedness of the generated multiplier approximations. The goal of the present paper is to extend this work to a larger class of optimal control problems in order to solve non-convex elliptic problems. The non-convexity arises from the semilinear state equation, which yields a nonlinear solution operator.
In every iteration of the augmented Lagrange algorithm one has to solve the following sub-problem:

min_{y,u} J(y, u) + 1/(2ρ) ‖(μ + ρ(y − ψ))_+‖²_{L²(Ω)} s.t. y = S(u) and u ∈ U_ad,

where μ is a given function in L²(Ω) and S denotes the solution operator of the semilinear partial differential equation given in (1). The convergence analysis of solution algorithms for non-convex optimal control problems suffers from the non-uniqueness of local and global solutions. Due to the nonlinearity of the state equation, uniqueness of the optimal solution cannot be expected, neither for the unregularized problem nor for the augmented Lagrange sub-problem. In addition, it is generally possible that critical points are computed that are not local solutions. The sub-problem may have stationary points arbitrarily far from a given local solution ū, and there is no rule to determine which of these points has to be chosen in the solution process of the sub-problem in order to guarantee convergence. That is why our first main result (Theorem 4.11) states (global) subsequential convergence towards KKT points.
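As an illustration, the shifted penalty term of the sub-problem can be evaluated pointwise. The following sketch uses a uniform grid on (0, 1) as a stand-in discretization; the grid, data, and function names are illustrative and not part of the paper's scheme.

```python
import numpy as np

def augmented_penalty(y, psi, mu, rho):
    """Evaluate 1/(2*rho) * ||(mu + rho*(y - psi))_+||_{L^2}^2 on a
    uniform grid; (.)_+ denotes the pointwise positive part max(0, .)."""
    v = np.maximum(mu + rho * (y - psi), 0.0)
    h = 1.0 / len(y)                 # cell size of the uniform grid
    return 0.5 / rho * np.sum(v * v) * h

# the term only penalizes points where mu + rho*(y - psi) > 0
y   = np.array([0.0, 0.5, 1.5, 2.0])
psi = np.ones_like(y)                # state constraint y <= 1
mu  = np.zeros_like(y)
print(augmented_penalty(y, psi, mu, rho=10.0))   # -> 1.5625
```

Note that where y ≤ ψ and μ = 0, the integrand vanishes, so feasible regions contribute nothing to the penalty.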
We are able to extend this result under certain second-order conditions in Section 6.
We will prove that the sequence generated by our algorithm converges to a local solution of the original problem. Furthermore, we will derive second-order optimality conditions for the sub-problem that allow us to establish local uniqueness of stationary points of the arising sub-problem. In computations, one often uses the previous iterate ū_k as initial guess for the computation of the next iterate ū_{k+1}. Hence, it is reasonable to expect that, if ū_k is near a local solution ū, the remaining iterates will stay near ū, too. However, in this case one has to guarantee the existence of a KKT point of the sub-problem in exactly this neighborhood. Under a quadratic growth condition we are able to prove that for every fixed μ there exists a KKT point of the augmented Lagrange sub-problem near a local solution ū, provided that the penalty parameter ρ is large enough. To this end, we investigate in Section 5 an auxiliary problem whose admissible controls are restricted to a neighborhood of a local solution ū of (P). We will prove that for ρ large enough global solutions of this auxiliary problem are local solutions of the augmented Lagrange sub-problem, and that these solutions converge to a local solution of the unregularized problem as the penalty parameter ρ tends to infinity. The outline of this paper is as follows: In Section 2 we collect results about the unregularized optimal control problem. Next, in Section 3 we present the augmented Lagrange method. Section 4 is dedicated to showing that every weak limit point of the sequence generated by our algorithm is a KKT point of the original problem. Further, in Section 5 we construct the auxiliary problem with solutions near a local solution of the original problem. Exploiting appropriate properties of this auxiliary problem we prove that for ρ sufficiently large solutions of the auxiliary problem are local solutions of the augmented Lagrange sub-problem.
Further, we show convergence rates for the arising sub-problems. In Section 6 we consider second-order sufficient conditions. To illustrate our theoretical findings we present numerical examples in Section 7.
Notation. Throughout the article we will use the following notation. The inner product in L²(Ω) is denoted by (·, ·). Duality pairings will be denoted by ⟨·, ·⟩. The dual of C(Ω̄) is denoted by M(Ω̄), which is the space of regular Borel measures on Ω̄. Further, (·)_+ := max(0, ·) is understood in the pointwise almost-everywhere sense. We refer to u* as a (weak) limit point of a sequence (u_k)_k if there exists a subsequence, denoted again by (u_k)_k, such that u_k ⇀ u*. If u* is the unique (weak) limit point of (u_k)_k, then the whole sequence converges weakly.

The Optimal Control Problem
Let Y denote the space Y := H¹(Ω) ∩ C(Ω̄), and set U := L²(Ω). We want to solve the following state constrained optimal control problem: minimize J(y, u) over all (y, u) ∈ Y × U_ad subject to the semilinear elliptic state equation and to the pointwise constraints. In the sequel, we will work with the following set of standing assumptions.
Assumption 1. Let Ω ⊂ R^N be a bounded domain with C^{1,1}-boundary Γ or a bounded, convex domain with polygonal boundary Γ.

The differential operator A is given by

(Ay)(x) := −∑_{i,j=1}^{N} ∂_{x_j}(a_{ij}(x) ∂_{x_i} y(x)) + a_0(x) y(x)

with a_{ij} ∈ C^{0,1}(Ω̄) and a_0 ∈ L^∞(Ω). Further, a_0 ≥ 0 a.e. in Ω, where the case a_0 ≡ 0 is admitted. The operator A is assumed to satisfy the following ellipticity condition: there is δ > 0 such that

∑_{i,j=1}^{N} a_{ij}(x) ξ_i ξ_j ≥ δ |ξ|² for all ξ ∈ R^N and almost all x ∈ Ω.

4. The co-normal derivative ∂_{ν_A} y is given by

∂_{ν_A} y := ∑_{i,j=1}^{N} a_{ij} ∂_{x_i} y ν_j,

where ν denotes the outward unit normal vector on Γ.

The function d = d(x, y) : Ω × R → R is measurable with respect to x ∈ Ω for all fixed y ∈ R and twice continuously differentiable with respect to y for almost all x ∈ Ω. Moreover, for y = 0 the function d and its derivatives with respect to y up to order two are bounded, i.e., there exists C > 0 such that

|d(x, 0)| + |d_y(x, 0)| + |d_{yy}(x, 0)| ≤ C for almost all x ∈ Ω.

The derivatives of d with respect to y are uniformly Lipschitz up to order two on bounded sets, i.e., for every M > 0 there exists a constant L(M), depending on M, such that

|d_{yy}(x, y_1) − d_{yy}(x, y_2)| ≤ L(M) |y_1 − y_2|

for almost every x ∈ Ω and all y_1, y_2 ∈ [−M, M]. Finally, there is a subset E_Ω ⊂ Ω of positive measure with d_y(x, y) > 0 on E_Ω × R.
Theorem 2.1 (Existence of solutions of the state equation). Let Assumption 1 be satisfied. Then, for every u ∈ L²(Ω), the elliptic partial differential equation admits a unique weak solution y ∈ H¹(Ω) ∩ C(Ω̄), which satisfies an a priori estimate with c > 0 independent of u. If in addition (u_n)_n is such that u_n ⇀ u in L²(Ω), then the corresponding solutions (y_n)_n of (2) converge strongly in H¹(Ω) ∩ C(Ω̄) to the solution y of (2) corresponding to the data u.
Proof. The proof of existence of a solution, of its uniqueness, and of the norm estimates can be found in [2, Theorem 3.1]. The compact embedding L²(Ω) ⊂ H^{-1}(Ω) and the fact that right-hand sides u ∈ H^{-1}(Ω) yield solutions in H¹(Ω) ∩ C(Ω̄) imply the additional statement.
We introduce the control-to-state operator S: L²(Ω) → H¹(Ω) ∩ C(Ω̄), S(u) := y, where y denotes the weak solution of (2). It is well known [29, Theorem 4.16] that S is locally Lipschitz continuous from L²(Ω) to H¹(Ω) ∩ C(Ω̄), i.e., there exists a constant L such that

‖y_1 − y_2‖_{H¹(Ω)} + ‖y_1 − y_2‖_{C(Ω̄)} ≤ L ‖u_1 − u_2‖_{L²(Ω)}

is satisfied for all u_i ∈ L²(Ω), i = 1, 2, with corresponding states y_i = S(u_i). We define the set of admissible controls U_ad, and the feasible set of the optimal control problem is denoted by

F_ad := {(y, u) ∈ Y × U_ad : y = S(u), y ≤ ψ}.

Using this notation the reduced formulation of problem (P) is given by

min_{u ∈ U_ad} f(u) := J(S(u), u) s.t. S(u) ≤ ψ.

For further use we recall a result concerning the differentiability of the nonlinear control-to-state mapping S.

Theorem 2.2 (Differentiability of the solution mapping). Let Assumption 1 be satisfied. Then, the mapping S: L²(Ω) → H¹(Ω) ∩ C(Ω̄) defined by S(u) = y is twice continuously Fréchet differentiable. Furthermore, for all u, h ∈ L²(Ω), y_h = S′(u)h is given as the solution of

A y_h + d_y(y) y_h = h in Ω, ∂_{ν_A} y_h = 0 on Γ.

Proof. The proof for the first derivative of S: L^r(Ω) → H¹(Ω) ∩ C(Ω̄), r > N/2, can be found in [29, Theorem 4.17]. We refer to [29, Theorem 4.24] for the proof of second-order differentiability of S: L^∞(Ω) → H¹(Ω) ∩ C(Ω̄), which is also valid for S: L²(Ω) → H¹(Ω) ∩ C(Ω̄).
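The differentiability result can be checked numerically on a scalar toy analogue of the state equation. In the sketch below, a·y + d(y) = u with d(y) = y³ plays the role of the PDE, Newton's method replaces the PDE solver, and the linearized equation (a + d′(y)) y_h = h mirrors A y_h + d_y(y) y_h = h. All names and the choice of d are illustrative, not the paper's setting.

```python
a = 2.0                                   # scalar stand-in for the operator A

def S(u, tol=1e-12):
    """Toy solution operator: solve a*y + y**3 = u by Newton's method."""
    y = 0.0
    for _ in range(100):
        r = a * y + y**3 - u
        if abs(r) < tol:
            break
        y -= r / (a + 3 * y**2)
    return y

def dS(u, h):
    """Derivative via the linearized equation (a + d'(y)) * y_h = h."""
    y = S(u)
    return h / (a + 3 * y**2)

u, h, t = 1.0, 1.0, 1e-6
fd = (S(u + t * h) - S(u)) / t            # finite-difference check
print(abs(fd - dS(u, h)) < 1e-4)          # -> True
```

The finite-difference quotient agrees with the solution of the linearized equation up to O(t), which is exactly the content of the differentiability theorem in this one-dimensional caricature.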

Existence of Solutions of the Optimal Control Problem
Under the standing assumptions we can show existence of solutions of the reduced control problem (4).
By standard arguments we get the following theorem.
Theorem 2.3 (Existence of solution of the optimal control problem). Let Assumption 1 be satisfied. Assume that the feasible set F ad is nonempty. Then, there exists at least one global solution (ȳ,ū) of (P ).
Proof. The proof can be found in [14,Theorem 1.45].
Due to non-convexity, global solutions of problem (P) are not unique in general; in addition, there may exist local solutions that are not global.

First-Order Optimality Conditions
The existence of Lagrange multipliers for state constrained optimal control problems is not guaranteed without some regularity assumption. In order to formulate first-order necessary optimality conditions we will work with the following linearized Slater condition.
Assumption 2 (Linearized Slater condition). We assume that a local solution ū satisfies the linearized Slater condition, i.e., there exist û ∈ U_ad and σ > 0 such that

S(ū) + S′(ū)(û − ū) ≤ ψ − σ on Ω̄.

Next, we state a regularity result concerning linear partial differential equations with measures on the right-hand side, see [2, Theorem 4.3].
Theorem 2.4 (Existence of solutions of the adjoint equation). Let μ be a regular Borel measure with μ = μ_Ω + μ_Γ ∈ M(Ω̄). Then the elliptic partial differential equation admits a unique weak solution p ∈ W^{1,s}(Ω), s ∈ [1, N/(N − 1)), and the corresponding estimate holds with c > 0 independent of the right-hand side of the partial differential equation.
Based on the linearized Slater condition, first-order necessary optimality conditions for problem (P) can be established (Theorem 2.5, optimality system (5)).
Proof. The proof can be done by adapting the theory from [2, Theorem 5.3] to Neumann boundary conditions.
Let us emphasize that, due to the presence of control as well as state constraints, the adjoint state p̄ and the Lagrange multiplier μ̄ need not be unique.

The Augmented Lagrange Method
As in [17] we eliminate the explicit state constraint S(u) ≤ ψ from the set of constraints by adding an augmented Lagrange term to the cost functional. Let ρ > 0 denote a penalization parameter and μ a fixed function in L²(Ω). Then in every step k of the augmented Lagrange method one has to solve the sub-problem (P_AL^{ρ,μ}), where (·)_+ := max(0, ·) is taken in the pointwise sense, subject to the control constraint u_ρ ∈ U_ad.
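Written out (restating the functional from the introduction), the sub-problem reads:

```latex
\min_{y,u}\; J(y,u) \;+\; \frac{1}{2\rho}\,
  \bigl\| \bigl(\mu + \rho\,(y-\psi)\bigr)_+ \bigr\|_{L^2(\Omega)}^2
\qquad \text{s.t.}\quad y = S(u),\quad u \in U_{\mathrm{ad}}.
```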

Analysis of the Augmented Lagrange Sub-Problem
In the following, existence of an optimal control and existence of a corresponding adjoint state will be proven. Local solutions of the augmented Lagrange sub-problem (P_AL^{ρ,μ}) are defined analogously to (P).
Since the problem (P_AL^{ρ,μ}) has no state constraints, the first-order optimality system is fulfilled without any further regularity assumptions.
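Although the full system (6) is not reproduced in this excerpt, its structure can be sketched as follows. For illustration we assume a tracking-type functional J(y, u) = ½‖y − y_d‖²_{L²(Ω)} + (α/2)‖u‖²_{L²(Ω)}; the term ‖y − y_d‖_{L²(Ω)} appears in Section 4, but the precise form of J and the parameter α are not restated here and are assumptions of this sketch:

```latex
\begin{aligned}
  &\bar y = S(\bar u), \qquad \bar\mu = \bigl(\mu + \rho\,(\bar y - \psi)\bigr)_+ ,\\
  &A^*\bar p + d_y(x,\bar y)\,\bar p = \bar y - y_d + \bar\mu \ \text{in } \Omega,
   \qquad \partial_{\nu_{A^*}}\bar p = 0 \ \text{on } \Gamma,\\
  &(\bar p + \alpha\,\bar u,\; u - \bar u) \ge 0 \qquad \text{for all } u \in U_{\mathrm{ad}}.
\end{aligned}
```

The second line is the adjoint equation, in which the augmented term contributes the L²-multiplier μ̄; the last line is the standard variational inequality for the control constraints.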
Finally, in Algorithm 1 we present the augmented Lagrange algorithm, which is based on the algorithm that has been developed in [17].
In the following we will call the step k successful if the quantity R_k shows sufficient decrease (see step 4 of the algorithm); otherwise we will call the step unsuccessful. The first part of R_k measures the maximal constraint violation, while the second term quantifies the violation of the complementarity condition. Since μ̄_k(x)(ψ(x) − ȳ_k(x)) is nonnegative for every feasible ȳ_k, it is enough to check the smallness of (μ̄_k, (ψ − ȳ_k)_+) in the second term in order to quantify whether the complementarity condition is satisfied.
From now on let (P_AL^k) denote the augmented Lagrange sub-problem (P_AL^{ρ,μ}) for given penalty parameter ρ := ρ_k and multiplier μ := μ_k. We will denote its solution by (ȳ_k, ū_k), with adjoint state p̄_k and updated multiplier μ̄_k.
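The interplay of the update rule and the penalty increase can be illustrated on a one-dimensional toy problem with S(u) = u, J(u) = ½(u − 2)², and state constraint y ≤ 1, whose penalized sub-problem has a closed-form minimizer. All names, the sufficient-decrease test, and the parameter values below are illustrative stand-ins for Algorithm 1, not the paper's exact rules.

```python
def solve_sub(mu, rho):
    """Closed-form minimizer of 0.5*(u-2)**2 + (mu + rho*(u-1))_+**2 / (2*rho)."""
    u = (2.0 + rho - mu) / (1.0 + rho)     # candidate with active penalty
    return u if mu + rho * (u - 1.0) > 0.0 else 2.0

def augmented_lagrange(mu=0.0, rho=1.0, theta=10.0, tau=0.5, tol=1e-8):
    R_old = float("inf")
    for _ in range(200):
        u = solve_sub(mu, rho)
        y = u                              # S(u) = u in this toy problem
        mu_new = max(mu + rho * (y - 1.0), 0.0)          # multiplier update
        R = max(y - 1.0, 0.0) + mu * max(1.0 - y, 0.0)   # decrease measure
        if R < tol:
            return u, mu_new
        if R <= tau * R_old:               # successful step: update multiplier
            mu, R_old = mu_new, R
        else:                              # unsuccessful step: increase penalty
            rho *= theta
    return u, mu_new

u, mu = augmented_lagrange()
print(round(u, 6), round(mu, 6))           # converges to u* = 1, mu* = 1
```

For this toy problem the KKT point is u* = 1 with multiplier μ* = 1 (from (u − 2) + μ = 0 at u = 1), and the iteration reproduces both to the prescribed tolerance.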

Infinitely Many Successful Steps and Convergence Towards Feasible Points
The most crucial part of the convergence analysis is to prove that the algorithm makes infinitely many successful steps. Otherwise the algorithm might be caught in an infinite loop between the steps 1, 2, 3 and 5.
The following assumption plays the key role for proving that Algorithm 1 is well-defined.
Assumption 3. In step 1 of Algorithm 1, the solutions (ȳ_k, ū_k, p̄_k) of (6) are chosen such that the corresponding quantity is uniformly bounded.
Working with Assumption 3 we consider two different approaches. In the first approach we consider the sequence (ū_k)_k generated by Algorithm 1, which is, due to the control constraints, bounded in L²(Ω). Hence, we can extract a weakly converging subsequence ū_k ⇀ u* in L²(Ω). Note that here u* denotes only a weak limit point of (ū_k)_k and not necessarily a local solution of the optimal control problem (P). Further, by Theorem 2.1 we get a strongly converging subsequence ȳ_k → y* in H¹(Ω) ∩ C(Ω̄). Exploiting Assumption 3, our aim is to show that y* := S(u*) is feasible, which in turn will yield that the term R_k tends to zero (Theorem 4.3). In the second approach we choose (ȳ_k, ū_k) to be global minimizers of the augmented Lagrange sub-problem and show via a contradiction argument that infinitely many successful steps are made, which in turn yields feasibility of any accumulation point of (S(ū_k))_k. We start with the first approach by proving an auxiliary result that does not require Assumption 3.
Proof. By assumption, there is an index m such that all steps k with k > m are not successful. According to Algorithm 1 it holds μ_k = μ_m for all k > m. The desired estimate then follows easily by pointwise evaluation of the contributing quantities, where we apply Young's inequality.
We will now use Assumption 3 to prove feasibility of y * = S(u * ).
Lemma 4.2. Let Assumption 3 be satisfied. Further, let (μ_k)_k ⊂ L²(Ω), and let (ρ_k)_k be a sequence of positive numbers with ρ_k → ∞. Let (ȳ_k, ū_k, p̄_k)_k be a sequence of solutions of (6). Let u* denote a weak limit point of (ū_k)_k. Then the associated state y* = S(u*) is feasible, i.e., y* ≤ ψ.
Exploiting Lemma 4.2 it can be shown that the augmented Lagrange algorithm makes infinitely many successful steps.
yielding a contradiction.
Let us recall that Assumption 3, which is the basis for proving that the algorithm makes infinitely many successful steps, is a rather strong assumption. We therefore want to argue that it is satisfied if we take (ȳ_k, ū_k) to be global minimizers of the augmented Lagrange sub-problem.
Hence, Assumption 3 is clearly satisfied.
Lemma 4.5. Assume that in step 1 of Algorithm 1 the pair (ȳ_k, ū_k) is chosen to be a global minimizer of the augmented Lagrange sub-problem. Then the augmented Lagrange algorithm makes infinitely many successful steps, and any limit point y* of (ȳ_k)_k corresponding to (ū_k)_k is feasible for (P).
Proof. Assuming that only finitely many steps are successful, we know from Lemma 4.4 that Assumption 3 is satisfied. However, then Theorem 4.3 yields a contradiction. Hence, Algorithm 1 makes infinitely many successful steps. Since R_k tends to zero, so does ‖(ȳ_k − ψ)_+‖_{C(Ω̄)}, which yields feasibility of any limit point of (ȳ_k)_k.
Without any further assumptions our algorithm yields the following convergence properties.
In Theorem 4.6 we have proven that a weak limit point u* of (u_n^+)_n with corresponding state y* is feasible for (P). However, we do not know yet whether u* is a stationary point, i.e., whether (p_n^+, μ_n^+)_n converges in some sense to (p*, μ*) such that (y*, u*, p*, μ*) satisfies the optimality system (5). To achieve this, we have to impose additional properties on the weak limit point u*. In the next subsection we will investigate the impact on our convergence result if our algorithm generates a sequence with a weak limit point u* that satisfies a linearized Slater condition.

Convergence towards KKT Points
We have shown in the previous section that the augmented Lagrange algorithm converges on a subsequence to a feasible point. Now we want to extend our results by proving convergence to a KKT point. We start with several auxiliary results.
Proof. From Theorem 2.1 we know that y_k := S(u_k) is the unique weak solution of the state equation. Further, for u_k ⇀ u* in L²(Ω) we get y_k → y* in H¹(Ω) ∩ C(Ω̄). Let now z_k denote the linearized state z_k := S′(u_k)h_k. Then by Theorem 2.2 we know that z_k is the unique solution of the corresponding linearized equation. Further, let z* := S′(u*)h* solve the analogous equation. We subtract both PDEs and set e_k := z_k − z*. Inserting the identity d_y(y_k)z_k − d_y(y*)z* = (d_y(y_k) − d_y(y*)) z_k + d_y(y*)(z_k − z*) we obtain an equation for e_k. From Assumption 1 we know that d_y(y) is locally Lipschitz continuous. Consequently, for y_k → y* in L^∞(Ω) we have d_y(y_k) → d_y(y*) in L^∞(Ω). Due to h_k ⇀ h* in L²(Ω) and the boundedness of z_k in L²(Ω) we obtain e_k → 0 in H¹(Ω) ∩ C(Ω̄). Hence z_k → z* in H¹(Ω) ∩ C(Ω̄), and the proof is done.
Let us recall that (y_n^+, u_n^+, p_n^+, μ_n^+) denotes the solution of the n-th successful iteration of Algorithm 1. We want to investigate the convergence properties of the algorithm for a weak limit point u* of (u_n^+)_n. A point u* ∈ U_ad satisfies the linearized Slater condition if there exist û ∈ U_ad and σ > 0 such that (7) holds.

Lemma 4.8. Let u* denote a weak limit point of (u_n^+)_n that satisfies the linearized Slater condition (7). Then there exists an N ∈ ℕ such that for all n > N the control u_n^+ satisfies (8).

Proof. By Theorem 4.6 we have strong convergence S(u_n^+) → S(u*) in H¹(Ω) ∩ C(Ω̄). By Theorem 4.7 we get S′(u_n^+)(û − u_n^+) → S′(u*)(û − u*) in H¹(Ω) ∩ C(Ω̄). Exploiting these convergence results, we conclude the existence of an N ∈ ℕ with the desired property.

We recall an estimate for the second term of the update rule, see [17, Lemma 3.9], that is necessary to establish L¹-boundedness of the Lagrange multiplier. This estimate does not require any additional assumption; it results solely from the structure of the update rule.
Lemma 4.9. Let y_n^+, μ_n^+ be given as defined in Algorithm 1. Then for all n > 1 the estimate from [17, Lemma 3.9] holds.

Lemma 4.10 (Boundedness of the Lagrange multiplier). Assume that Algorithm 1 generates the sequence (y_n^+, u_n^+, p_n^+, μ_n^+)_n. Let (u_n^+)_n denote a subsequence of (u_n^+)_n that converges weakly to u*. If u* satisfies the linearized Slater condition (7), then the corresponding sequence of multipliers (μ_n^+)_n is bounded in L¹(Ω), i.e., there is a constant C > 0 independent of n such that ‖μ_n^+‖_{L¹(Ω)} ≤ C for all n.

Proof. We write (6c) in variational form and rearrange terms. Testing the resulting inequality with the test function u := û ∈ U_ad, and using Lemma 4.8, by which there exists an N such that for all n > N the control u_n^+ satisfies (8), we obtain for all n > N an estimate of ‖μ_n^+‖_{L¹(Ω)}. From Theorem 2.2 we know that y_h := S′(u_n^+)(û − u_n^+) is the weak solution of a uniquely solvable partial differential equation with right-hand side û − u_n^+. Hence, it is norm bounded by c‖û − u_n^+‖_{L²(Ω)} with c > 0 independent of n. With Young's inequality, and exploiting the boundedness of ‖y_n^+ − y_d‖_{L²(Ω)} and of û ∈ U_ad together with Lemma 4.9, this yields the assertion.
Let us conclude this section with the following result on convergence.
Theorem 4.11 (Convergence towards KKT points). Assume that Algorithm 1 generates the sequence (y_n^+, u_n^+, p_n^+, μ_n^+)_n. Let u* denote a weak limit point of (u_n^+)_n. If u* satisfies the linearized Slater condition (7), then there exists a subsequence of (y_n^+, u_n^+, p_n^+, μ_n^+)_n, denoted in the same way, such that y_n^+ → y* strongly, u_n^+ ⇀ u* weakly, p_n^+ ⇀ p* weakly, and μ_n^+ ⇀* μ* in the weak* sense, and (y*, u*, p*, μ*) is a KKT point of the original problem (P).

Convergence towards Local Solutions
So far, we have shown that a weak limit point generated by Algorithm 1 is a stationary point of the original problem (P) if it satisfies the linearized Slater condition. If a weak limit point satisfies a second-order condition, we obtain convergence to a local solution. However, the convergence result from Theorem 4.11 yields convergence of a subsequence of (u_n^+)_n only. Accordingly, during all other steps the algorithm might choose solutions of the KKT system (6) that are far away from a desired local minimum ū. Here the following questions arise: 1. For every fixed μ, does there exist a KKT point of the arising sub-problem that satisfies ū_k ∈ B_r(ū)? 2. Is an infinite number of steps successful if the algorithm chooses these KKT points in step 1?
Indeed, these questions can be answered positively. We will show in this section that for every fixed μ there exists a KKT point of the augmented Lagrange sub-problem such that, for ρ sufficiently large, ū_k ∈ B_r(ū). One should keep in mind that also in this case there is no guarantee that forces the algorithm to choose exactly these solutions. However, if in numerical computations the previous iterate is used as a starting point for the computation of the next iterate, the remaining iterates are likely to be located in B_r(ū). In order to reach this result we need the following assumption, which is rather standard.
Assumption 4 (Quadratic growth condition (QGC)). Let ū ∈ U_ad be a control satisfying the first-order necessary optimality conditions (5). We assume that there exist β > 0 and ε > 0 such that the quadratic growth condition is satisfied for all feasible u ∈ U_ad, S(u) ≤ ψ, with ‖u − ū‖_{L²(Ω)} ≤ ε. Hence, ū is a local solution in the sense of L²(Ω) of problem (P).
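In the notation of the assumption, the quadratic growth condition reads (β and ε are the constants above):

```latex
J\bigl(S(u),u\bigr) \;\ge\; J(\bar y,\bar u) \;+\; \beta\,\|u-\bar u\|_{L^2(\Omega)}^2
\qquad \text{for all } u \in U_{\mathrm{ad}},\ S(u)\le\psi,\
\|u-\bar u\|_{L^2(\Omega)} \le \varepsilon .
```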
Let us mention that the quadratic growth condition is implied by a well-known second-order sufficient condition (SSC). We refer the reader to Section 6 for more details.
Our idea now is the following: in order to show that in every iteration of the algorithm there exists ū_k ∈ B_r(ū), we want to estimate the error norm ‖ū_k − ū‖²_{L²(Ω)}. Here we want to exploit the quadratic growth condition from Assumption 4. However, this condition requires a control u ∈ U_ad that is feasible for the original problem (P), which has explicit state constraints. Since the solutions of the augmented Lagrange sub-problems cannot be expected to be feasible for the original problem in general, we consider an auxiliary problem. Due to its special structure one can construct an auxiliary control that is feasible for the original problem (P). This idea has been presented in [6] for a finite-element approximation as well as in [20] for regularizing a semilinear elliptic optimal control problem with state constraints by applying a virtual control approach.

Analysis of the Auxiliary Problem
Let ū be a local solution of (P) that satisfies the first-order necessary optimality conditions (5) of Theorem 2.5 and the quadratic growth condition from Assumption 4. Following the idea from [6,20] we consider the following auxiliary problem:

min_{y_ρ^r, u_ρ^r} J_AL^r(y_ρ^r, u_ρ^r, μ, ρ) := J(y_ρ^r, u_ρ^r) + 1/(2ρ) ‖(μ + ρ(y_ρ^r − ψ))_+‖²_{L²(Ω)}

such that y_ρ^r = S(u_ρ^r), u_ρ^r ∈ U_ad, and ‖u_ρ^r − ū‖_{L²(Ω)} ≤ r. We choose r small enough such that the quadratic growth condition from Assumption 4 is satisfied. In the following we define the set of admissible controls of (P_AL^r) by

U_ad^r := {u ∈ U_ad : ‖u − ū‖_{L²(Ω)} ≤ r}.

The auxiliary problem admits at least one (global) solution. Moreover, first-order necessary optimality conditions can be derived by standard arguments without any regularity assumption.

Theorem 5.1 (Existence of solutions of the auxiliary problem). The auxiliary problem (P_AL^r) admits a global solution ū_ρ^r ∈ U_ad^r.

Proof. The proof can be found in [8, Theorem 5.1].
Theorem 5.2 (Necessary optimality conditions of the auxiliary problem). Let ū_ρ^r be a local optimal solution of (P_AL^r) and ȳ_ρ^r its associated state. Then there exist a unique adjoint state p̄_ρ^r ∈ H¹(Ω) and a unique Lagrange multiplier μ̄_ρ^r ∈ M(Ω̄) satisfying the following optimality system.

Construction of a Feasible Control
In this section we want to construct a control u^{r,δ} ∈ U_ad^r that is feasible for the original problem (P), i.e., u^{r,δ} ∈ U_ad and S(u^{r,δ}) ≤ ψ. Based on a Slater point assumption, controls of this type have already been constructed in [25] for obtaining error estimates for finite element approximations of linear elliptic state constrained optimal control problems. In [20] these techniques were combined with the idea of the auxiliary problem presented for nonlinear optimal control problems in [6].
We follow the strategy from [20], where the virtual control approach was applied in order to solve (P), i.e., the state constraints were relaxed in a suitable way. To obtain optimality conditions for the corresponding auxiliary problem, the authors showed that the linearized Slater condition of the original problem can be carried over to feasible controls of the auxiliary problem. This transferred linearized Slater condition is also the main ingredient for the construction of feasible controls of the original problem. In our case, the state constraints have been removed from the set of explicit constraints by augmentation. Thus it is not necessary to establish a linearized Slater condition for the auxiliary problem in order to derive optimality conditions. However, the Slater-type inequality deduced in the following lemma is still needed for our analysis, see Lemma 5.4.

Lemma 5.3. For the control û^r constructed from the Slater point û it holds ‖û^r − ū‖_{L²(Ω)} ≤ r. Moreover, let ū_ρ^r ∈ U_ad^r be an admissible control of (P_AL^r). Then, for r > 0 sufficiently small, ū_ρ^r satisfies the inequality (11).

Proof. By definition of û^r and t it holds ‖û^r − ū‖_{L²(Ω)} ≤ r. Inserting the definition of û^r we see that û^r is a linearized Slater point of the original problem (P) in the neighborhood of ū. We have ‖û^r − ū‖ ≤ r and ‖ū − ū_ρ^r‖ ≤ r, and hence ‖û^r − ū_ρ^r‖ ≤ 2r. Since S and S′ are Lipschitz we obtain, for r sufficiently small, that û^r satisfies (11), and the proof is done.
In the following lemma we will construct feasible controls for (P) to be used subsequently in our convergence analysis. The construction of an admissible control u^{r,δ} ∈ U_ad^r that is also feasible for (P) is based on the fact that ū_ρ^r satisfies the inequality from Lemma 5.3. We define the maximal violation of ū_ρ^r with respect to the state constraint ȳ_ρ^r ≤ ψ, where ȳ_ρ^r = S(ū_ρ^r).
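In formulas, the maximal violation is the supremum norm of the positive part of the constraint residual (a standard definition, consistent with the norms used above):

```latex
d\bigl[\bar u^r_\rho,(P)\bigr] \;:=\;
\bigl\| \bigl( \bar y^r_\rho - \psi \bigr)_+ \bigr\|_{C(\bar\Omega)},
\qquad \bar y^r_\rho = S(\bar u^r_\rho).
```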
Lemma 5.5. The error between the auxiliary control u^{r,δ} and the global solution ū_ρ^r of (P_AL^r) is bounded by the maximal constraint violation.

Proof. We estimate δ_ρ from Lemma 5.4. Together with ‖û^r − ū_ρ^r‖_{L²(Ω)} ≤ 2r, the definition of σ_r from Lemma 5.3, and δ ∈ [0, δ_ρ], we arrive at the asserted bound, and the proof is done.
Finally we are able to apply the quadratic growth condition from Assumption 4.
Lemma 5.6. Let ū be a local solution of (P) that satisfies the quadratic growth condition from Assumption 4 and the linearized Slater condition from Assumption 2. Consider a fixed μ ∈ L²(Ω) and r > 0 sufficiently small such that the quadratic growth condition is satisfied. If ū_ρ^r is a global solution of the auxiliary problem (P_AL^r), then the error ‖ū_ρ^r − ū‖²_{L²(Ω)} is bounded in terms of the maximal constraint violation.

Proof. As has been shown in Lemma 5.4, u^{r,δ} is feasible for (P). We insert the special choice u = u^{r,δ} in the quadratic growth condition (9), where we exploit that ‖ū_ρ^r − ū‖²_{L²(Ω)} ≤ r² and that ‖ū_ρ^r − u^{r,δ}‖_{L²(Ω)} is bounded by the maximal constraint violation (Lemma 5.5). Rearranging the terms of (14) and applying Lemma 5.5, we obtain an intermediate estimate. We recall the definition of the reduced cost functional of the auxiliary problem (P_AL^r). Exploiting the Lipschitz continuity of the solution operator S, see [29, Lemma 4.11], the optimality of ū_ρ^r for (P_AL^r), the definition of the reduced cost functional, and the feasibility of ū for the auxiliary problem, we can combine these estimates with (12), which yields the claim.

An Estimate of the Maximal Constraint Violation
In this section we will derive an estimate on the maximal constraint violation. We recall an estimate from [21, Lemma 4].
Lemma 5.7. Let f ∈ C^{0,1}(Ω̄) be given. Then there exists a constant c > 0 such that f satisfies the interpolation estimate from [21, Lemma 4].

Theorem 5.8. Let μ ∈ L²(Ω) be fixed. Further, let ū_ρ^r be the optimal control of the auxiliary problem (P_AL^r). Then the maximal violation d[ū_ρ^r, (P)] of ū_ρ^r with respect to (P) can be estimated.
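The estimate of Lemma 5.7 is an interpolation inequality of Gagliardo–Nirenberg type. In the form consistent with the exponent γ = 1/(2(2 + N)) appearing in Theorem 5.9, it reads as follows; this is a plausible reconstruction of the elided formula, not a quotation of [21]:

```latex
\|f\|_{C(\bar\Omega)} \;\le\; c\,
\|f\|_{L^2(\Omega)}^{\frac{2}{N+2}}\,
\|f\|_{C^{0,1}(\bar\Omega)}^{\frac{N}{N+2}} .
```

Applied to f = (ȳ_ρ^r − ψ)_+, it converts an L²-bound on the constraint violation into a bound in the maximum norm.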
Hence, we get the desired estimate.

Main Results
We can now formulate our main results of this section.
Theorem 5.9. Let ū be a local solution of (P) with corresponding state ȳ satisfying the QGC from Assumption 4 and the linearized Slater condition from Assumption 2. Let μ ∈ L²(Ω) be fixed and let (ȳ_ρ^r, ū_ρ^r) denote the global solution of the auxiliary problem (P_AL^r). Then, we have: a) For every r > 0 there is a ρ̄ such that for all ρ > ρ̄ it holds ‖ū_ρ^r − ū‖_{L²(Ω)} < r.
b) The solutions (ȳ_ρ^r, ū_ρ^r)_ρ converge in (H¹(Ω) ∩ C(Ω̄)) × L²(Ω) to (ȳ, ū) as ρ → ∞, with convergence rates governed by γ := 1/(2(2 + N)). c) The points ū_ρ^r are local solutions of the augmented Lagrange sub-problem (P_AL^k), provided that ρ is sufficiently large.

Proof. a), b) From the preceding estimates we can conclude the existence of ρ̄, r > 0 such that for all ρ > ρ̄ we have ‖ū_ρ^r − ū‖_{L²(Ω)} < r. c) We have to show local optimality for a certain r > 0. Since ū_ρ^r is the global solution of the auxiliary problem (P_AL^r), we already know that f_AL(u) ≥ f_AL(ū_ρ^r) holds for all u ∈ U_ad^r. Let now u ∈ U_ad be such that ‖u − ū_ρ^r‖_{L²(Ω)} ≤ r/2. The triangle inequality yields ‖u − ū‖_{L²(Ω)} ≤ r for ρ sufficiently large, where we exploited statement b). Hence u ∈ U_ad^r, so that f_AL(u) ≥ f_AL(ū_ρ^r) is satisfied. By definition we can conclude that ū_ρ^r is a local solution of (P_AL^k).
We can further prove that the algorithm makes infinitely many successful steps if $(\bar y^k, \bar u^k)$ in step 1 of Algorithm 1 is chosen as the global minimizer of the corresponding auxiliary problem.
Theorem 5.10. Assume that in step 1 of Algorithm 1, $(\bar y^k, \bar u^k, p^k)$ is chosen as the global solution of the auxiliary problem $(P^r_{AL})$ if it solves the optimality system of the augmented Lagrange sub-problem (6). Assume that only finitely many steps of Algorithm 1 are successful. Then Assumption 3 is satisfied.
Proof. Let $m$ denote the largest index of a successful step. Hence, $\mu_k = \mu_m$ for all $k > m$. The sequence $(\rho_k)_k$ is monotonically increasing. Exploiting Theorem 5.9 c), we can find an index $K > m$ such that for all $k > K$ the global solution $(\bar y^k, \bar u^k)$ of the auxiliary problem is a KKT point of (6). Further, due to Lemma 5.6 and Theorem 5.8, the following inequality is satisfied

Hence, Assumption 3 is satisfied.
We can conclude that the algorithm makes infinitely many successful steps. We omit the proof since it uses the same arguments as in Lemma 4.5.
Corollary 5.10.1. Let all assumptions from Theorem 5.10 be satisfied. Then Algorithm 1 makes infinitely many successful steps.
One has to keep in mind that the quadratic growth condition is only a local condition. Hence, the result of Theorem 5.9 is actually the best we can expect. In particular, the sub-problems $(P^k_{AL})$ may have solutions arbitrarily far from $\bar u$, and we cannot exclude the possibility that these solutions are chosen in the sub-problem solution process of Algorithm 1. However, one can prevent this kind of scenario by using the previous iterate $\bar u^k$ as a starting point for the computation of $\bar u^{k+1}$. In this way it is reasonable to expect that, as soon as one of the iterates $\bar u^k$ lies in $B_r(\bar u)$ (with $r$ as above) and the penalty parameter is sufficiently large, the remaining iterates will stay in $B_r(\bar u)$ and converge to $\bar u$.

Second-Order Sufficient Conditions
We take up the quadratic growth condition from Assumption 4. This condition is implied by a second-order sufficient condition, see [3]. We define the Lagrangian function

where $y = S(u)$, and assume that for all $(\bar y, \bar p, \bar\mu)$ satisfying the first-order necessary optimality conditions (5) associated with $\bar u$ it holds

where $C_{\bar u}$ denotes the cone of critical directions as defined in [3]. Since the solution operator $S$ (Theorem 2.2) and the cost functional $J : L^2(\Omega) \to \mathbb{R}$ are of class $C^2$ (see [3,4]), inequality (15) together with the first-order necessary conditions implies the quadratic growth condition from Assumption 4, see [3, Theorem 4.1, Remark 4.2] and [29]. Note that the multiplier $\bar\mu$ need not be unique; that is why (15) is imposed for every multiplier.
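In this notation, a second-order sufficient condition of the kind invoked here is commonly written as coercivity of the Lagrangian's second derivative on the critical cone; the following display is a sketch of the standard form (the exact statement of (15) is as in [3]):

```latex
\frac{\partial^2 \mathcal{L}}{\partial u^2}(\bar u, \bar\mu)[h, h] \;\ge\; \delta \, \|h\|_{L^2(\Omega)}^2
\qquad \text{for all } h \in C_{\bar u}, \ \text{for some } \delta > 0.
```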
Let us return to the convergence analysis of Algorithm 1. If, in addition to the assumptions of Theorem 4.11, $u^*$ satisfies the QGC from Assumption 4, then $u^*$ is obviously a local solution.
Second-order sufficient conditions not only allow us to prove convergence to a local solution but also to show local uniqueness of stationary points of the augmented Lagrange sub-problem. This is an important issue for numerical methods. In [19] the authors proved that the Moreau-Yosida regularization without an additional shift parameter is equivalent to the virtual control problem for a specific choice of the parameters appearing therein. This equivalence can be transferred to the augmented Lagrange sub-problem $(P^{\rho,\mu}_{AL})$.

Remark 1.
Let $\bar u \in U_{ad}$ be a control that satisfies the first-order necessary optimality conditions (5) and let $\bar\mu$ be the unique Lagrange multiplier with respect to the state constraints. We assume that there exists a constant $\delta > 0$ such that

One can prove that the SSC (16) carries over to the augmented Lagrange sub-problems. Let $\mu \in L^2(\Omega)$ and $\rho > 0$ be fixed. Let $\bar u_\rho \in U_{ad}$ be a control that satisfies $\bar u_\rho \in B_r(\bar u)$ and the first-order necessary optimality conditions (6), and let the SSC (16) be satisfied. Then there exists a constant $\delta > 0$, independent of $\mu$, such that the following condition is fulfilled for all $(h, y_h) \in L^2(\Omega) \times H^1(\Omega)$, provided that $\rho$ is sufficiently large:

Here, $y_h = S'(\bar u_\rho)h$ and $\bar p_\rho$ is the solution of the adjoint equation of the augmented Lagrange sub-problem. Moreover, there then exist constants $\beta > 0$ and $\gamma > 0$ such that the quadratic growth condition

holds for all $u \in U_{ad}$ with $\|u - \bar u_\rho\|_{L^2(\Omega)} \le \gamma$, and $\bar u_\rho$ is a local solution, with corresponding state $\bar y_\rho$, of the augmented Lagrange sub-problem. Theorem 13 from [20] yields the carried-over version of the second-order condition for a virtual control problem. In [19, Proposition 3] it is proved that this condition implies a quadratic growth condition for the virtual control problem. Further, following the arguments of [19, Theorem 5], this result can be adapted to the augmented Lagrange sub-problem.

Numerical Tests
In this section we report on numerical results for the solution of semilinear elliptic pointwise state constrained optimal control problems in two dimensions. All optimal control problems have been solved using the augmented Lagrange algorithm stated above, implemented with FEniCS [23] using the DOLFIN [24] Python interface.
In every outer iteration of the augmented Lagrange algorithm the KKT system (6) has to be solved for given $\mu$ and $\rho$. This is done by applying a semi-smooth Newton method. We define the sets

Then system (6) can be stated as

The semi-smooth Newton method for solving (6) is given in Algorithm 2.
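To illustrate the semi-smooth Newton idea on the max-type nonsmoothness $(\mu + \rho(y - \psi))_+$ that enters the augmented Lagrange KKT system, the following minimal NumPy sketch solves a penalized toy problem $Ay + (\mu + \rho(y-\psi))_+ = f$ with a 1D finite-difference Laplacian. All data here are illustrative stand-ins, not the discretization used in the paper:

```python
import numpy as np

def semismooth_newton(A, f, psi, mu, rho, tol=1e-8, max_iter=50):
    """Semi-smooth Newton for A y + (mu + rho*(y - psi))_+ = f.

    The max-term mimics the augmented Lagrange multiplier update that
    creates the nonsmoothness in the KKT system; A, f, psi, mu are
    illustrative stand-ins for the discretized PDE data.
    """
    y = np.zeros_like(f)
    for _ in range(max_iter):
        F = A @ y + np.maximum(0.0, mu + rho * (y - psi)) - f
        if np.linalg.norm(F, np.inf) <= tol:
            break
        # Generalized Jacobian: adds rho on the active set
        active = (mu + rho * (y - psi)) > 0.0
        J = A + rho * np.diag(active.astype(float))
        y = y - np.linalg.solve(J, F)
    return y

# Toy data: 1D Laplacian (homogeneous Dirichlet), constant bound psi
n = 50
h = 1.0 / (n + 1)
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
f = 40.0 * np.ones(n)
psi = 0.05 * np.ones(n)     # pointwise upper bound on the state
mu = np.zeros(n)
y = semismooth_newton(A, f, psi, mu, rho=1e4)
```

Since the system is piecewise linear, the iteration coincides with a primal-dual active-set strategy and stabilizes after finitely many active-set changes.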
Since the linear parts of the system can be solved exactly, we choose the error that arises during the linearization of the discretized system (18) as a stopping criterion. We terminate the semi-smooth Newton method as soon as $\max(r_1, r_2, r_3) \le 10^{-6}$ is satisfied, where

$r_1 := \|d(y_k) - (d_y(y_{k-1})(y_k - y_{k-1}) + d(y_{k-1}))\|,$

In the following, $(y_h, u_h, p_h, \mu_h)$ denotes the computed solution after the stopping criterion is reached. We consider optimal control problems of the form

min $J(y, u)$ :

where $\Omega = [0, 1] \times [0, 1]$. Unless mentioned otherwise, we initialize $(\bar y_0, \bar u_0, p_0, \mu_1)$ with zero, set the penalty parameter to $\rho_0 := 0.5$, and choose the parameter in the decision concerning successful steps as $\tau := 0.1$. If a step has not been successful, the penalization parameter is increased by the factor $\theta := 10$. We stop the algorithm as soon as

$R^+_n := \|(y^+_n - \psi)^+\|_{C(\Omega)} + (\mu^+_n, \psi - y^+_n)^+ \le 10^{-6}$

is satisfied. Since the stopping criterion from Algorithm 2 yields $(y_h, u_h, p_h)$ satisfying (5a)-(5c) with the desired accuracy, this is a suitable stopping criterion.
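The outer-loop bookkeeping just described (multiplier update, feasibility measure $R^+_n$, and the $\tau$/$\theta$ decision about successful steps) can be sketched as follows; the decrease test $R^+ \le \tau R^+_{\mathrm{old}}$ is an assumed stand-in for the precise success rule of Algorithm 1, which is not reproduced here:

```python
import numpy as np

def multiplier_update(mu, y, psi, rho):
    # Augmented Lagrange update for the state constraint y <= psi
    return np.maximum(0.0, mu + rho * (y - psi))

def feasibility_measure(mu, y, psi):
    # R+ = ||(y - psi)_+||_inf + (mu, psi - y)_+, cf. the stopping rule
    violation = np.linalg.norm(np.maximum(0.0, y - psi), np.inf)
    complementarity = max(0.0, float(np.dot(mu, psi - y)))
    return violation + complementarity

def outer_step(mu, rho, y, psi, R_old, tau=0.1, theta=10.0):
    """One outer iteration's bookkeeping (hedged sketch)."""
    mu_new = multiplier_update(mu, y, psi, rho)
    R_new = feasibility_measure(mu_new, y, psi)
    if R_new <= tau * R_old:
        return mu_new, rho, R_new, True          # successful step
    return mu, theta * rho, R_old, False         # increase penalty
```

A successful step accepts the updated multiplier and keeps $\rho$; an unsuccessful step discards it and multiplies $\rho$ by $\theta$.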

Example 1
Let us first consider an optimal control problem that is governed by the following partial differential equation
We start the algorithm with $\rho_0 := 1$ and $\tau := 0.5$. Figures 3 and 4 depict the computed results for $10^4$ degrees of freedom. Moreover, Figure 5 depicts the $L^2$-error of the computed solution $(y_h, u_h, p_h)$ with respect to the constructed solution $(\bar y, \bar u, \bar p)$ as a function of the number of degrees of freedom.

Example 3
We adapt an example from [17], which can also be found in [27], for state constraints given by $y \ge \psi$. In this case $\Omega := [-1, 2] \times [-1, 2]$. This example does not include constraints on the control. The optimal control problem is governed by the semilinear partial differential equation $-\Delta y + y^5 = u + f$ in $\Omega$,
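For the Newton linearization of the nonlinearity $d(y) = y^5$ one uses $d'(y) = 5y^4$. As a hedged illustration (a 1D finite-difference analogue on $(0,1)$, not the 2D FEniCS setup of this example), the semilinear equation can be solved as follows:

```python
import numpy as np

def solve_semilinear_1d(g, n=100, tol=1e-9, max_iter=30):
    """Newton's method for -y'' + y^5 = g on (0, 1), y(0) = y(1) = 0,
    a toy 1D analogue of the semilinear state equation of Example 3."""
    h = 1.0 / (n + 1)
    A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    y = np.zeros(n)
    for _ in range(max_iter):
        F = A @ y + y**5 - g
        if np.linalg.norm(F, np.inf) <= tol:
            break
        J = A + np.diag(5.0 * y**4)   # linearization of d(y) = y^5
        y = y - np.linalg.solve(J, F)
    return y
```

Since $d(y) = y^5$ is monotone, the linearized systems are symmetric positive definite and Newton's method converges rapidly from the zero initial guess.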