A Sequential Homotopy Method for Mathematical Programming Problems

We propose a sequential homotopy method for the solution of mathematical programming problems formulated in abstract Hilbert spaces under the Guignard constraint qualification. The method is equivalent to performing projected backward Euler timestepping on a projected gradient/antigradient flow of the augmented Lagrangian. The projected backward Euler equations can be interpreted as the necessary optimality conditions of a primal-dual proximal regularization of the original problem. The regularized problems are always feasible, satisfy a strong constraint qualification guaranteeing uniqueness of Lagrange multipliers, yield unique primal solutions provided that the stepsize is sufficiently small, and can be solved by a continuation in the stepsize. We show that equilibria of the projected gradient/antigradient flow and critical points of the optimization problem are identical, provide sufficient conditions for the existence of global flow solutions, and show that critical points with emanating descent curves cannot be asymptotically stable equilibria of the projected gradient/antigradient flow, practically eradicating convergence to saddle points and maxima. The sequential homotopy method can be used to globalize any locally convergent optimization method that can be used in a homotopy framework. We demonstrate its efficiency for a class of highly nonlinear and badly conditioned control constrained elliptic optimal control problems with a semismooth Newton approach for the regularized subproblems. In contrast to the published article, this version contains a correction that the associate editor considers too insignificant to justify publication in the journal.


Mathematics Subject Classification (2010) 49M05
Let X and Y be real Hilbert spaces and C ⊆ X a nonempty closed convex set. Let the nonlinear objective function f : X → R and the nonlinear constraint function c : X → Y be twice continuously Fréchet differentiable. We consider the mathematical programming problem

min f(x) over x ∈ C subject to c(x) = 0. (1)

This formulation is equivalent to a more prevalent formulation that allows c(x) ∈ C_Y for some nonempty closed convex set C_Y (by the use of slack variables s ∈ Y via c(x) − s = 0 and (x, s) ∈ C × C_Y). Further restrictions on the overall setting are stated in section 1.5 after we settle the notation in section 1.4. This setting naturally comprises finite-dimensional problems (also known as Nonlinear Programming Problems, NLPs) of the form min f(x) over x ∈ R^n subject to x_l ≤ x ≤ x_u and c(x) = 0, with X = R^n, Y = R^m, and C = {x ∈ R^n | x_l ≤ x ≤ x_u}, where some components of x_u and x_l may take on values of ±∞.
Another popular example is partial differential equation (PDE) constrained optimization, where X is a product of the state space and the control space, C encodes pointwise constraints on the controls, and the equality constraint c(x) = 0 is the PDE constraint, where we often assume that the state is locally uniquely determined by the control as an implicit function via the PDE.
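To fix ideas in finite dimensions, the following minimal Python sketch (a hypothetical toy instance of ours, not taken from the article) encodes the data of problem (1) for X = R², Y = R, and a box C, together with the slack lifting mentioned above:

```python
import numpy as np

# Hypothetical toy instance of (1): minimize f over the box C = [x_l, x_u]
# subject to c(x) = 0, with X = R^2 and Y = R^1.
f = lambda x: x[0]**2 + 2.0 * x[1]**2
c = lambda x: np.array([x[0] + x[1] - 1.0])
x_l = np.array([0.0, -np.inf])
x_u = np.array([np.inf, np.inf])

def proj_C(x):
    """Projection onto the box C, the basic operation of all projected flows."""
    return np.clip(x, x_l, x_u)

# Lifting a set constraint c(x) in C_Y to an equality via a slack s:
# c(x) - s = 0 with (x, s) in C x C_Y, as described in the text.
x = np.array([-1.0, 0.5])
print(proj_C(x))        # the infeasible first component is clipped to 0
s = c(proj_C(x))        # choosing s = c(x) satisfies the lifted equality
print(c(proj_C(x)) - s)
```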

Structure of the article
We give a concise overview of the results of this article in section 1.2. We outline our contributions and connections to existing methods in section 1.3. In the remainder of section 1, we settle our notation, state the general assumptions, and provide the statements of important classical results. We give a short proof of the necessary optimality conditions we use and discuss two central constraint qualifications in section 2. Our main results on projected gradient/antigradient flows for (1) follow in section 3. The application of a projected backward Euler method on the projected gradient/antigradient flow results in a sequential homotopy method, which we describe in section 4. We present numerical results for a local semismooth Newton method globalized by the sequential homotopy approach for a class of highly nonlinear and badly conditioned elliptic PDE-constrained optimal control problems with control constraints in section 5.

Overview: A novel solution approach based on a sequence of homotopies
We propose the following general solution approach in this paper: We construct and analyze existence and uniqueness of a primal-dual projected gradient/antigradient flow for an augmented Lagrangian. The equilibria of the flow are critical points of (1) and vice versa. Under reasonable assumptions, we prove that critical points that are not local minima cannot be asymptotically stable. Small perturbations will make the flow escape these unwanted critical points. We then apply a projected version of backward Euler timestepping. We provide an interpretation of the backward Euler equations as the optimality conditions of a primal-dual proximally regularized counterpart to (1), which satisfies a strong constraint qualification, even though (1) might only satisfy the Guignard constraint qualification [26], the weakest of all constraint qualifications. This gives rise to a sequential homotopy method, in which a sequence of proximally regularized subproblems needs to be solved by (possibly inexact) fast numerical methods that are only required to converge locally.
We invite the reader to consult the supplementary material, in which we sketch, without proofs, the salient features of our approach on an illustrative finite-dimensional example without inequalities.

Related work and contributions
We advance and bridge several fields of optimization with this paper.
The field of globalized Newton methods based on differential equation methods applied to the Newton flow started in the early 1950s with Davidenko [20] and has continued to raise scientific interest over the decades [21,9,34,24,15,51,42,52,23], predominantly due to the affine invariance properties of the Newton flow [22]. By trading the affine invariance of the Newton flow for the stability properties of the gradient flow, we obtain, from a dynamical systems point of view, the advantage of being repelled from maxima and saddle points when solving nonlinear optimization problems.
The Newton method, which is equivalent to forward Euler timestepping on the Newton flow with stepsize Δt = 1, has the prominent property of quadratic local convergence. Backward Euler timestepping on the gradient/antigradient flow can attain superlinear local convergence if the solution is sufficiently regular so that we can take the stepsize Δt to infinity or, equivalently, drive the proximal coefficient λ = 1/Δt to zero, provided that we use a local solver in the numerical homotopy method with at least superlinear local convergence. Driving λ to zero is usually possible if the solution satisfies certain second order sufficient optimality conditions.
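For intuition, consider the scalar quadratic f(x) = a·x²/2 (our illustration, not an example from the article): one backward Euler step on the gradient flow has a closed form, coincides with a proximal point step with coefficient λ = 1/Δt, and remains stable for arbitrarily large Δt, unlike forward Euler.

```python
# Illustration (not the paper's algorithm): for f(x) = 0.5*a*x^2, a > 0, the
# backward Euler step x_plus = x - dt*f'(x_plus) on the gradient flow
# x' = -f'(x) has the closed form below and equals the proximal point step
#   x_plus = argmin_z f(z) + (lam/2)*(z - x)^2   with lam = 1/dt.
a = 4.0

def backward_euler_step(x, dt):
    return x / (1.0 + dt * a)   # solve the implicit equation exactly

def forward_euler_step(x, dt):
    return x - dt * a * x       # explicit step, stable only for dt < 2/a

for dt in (0.1, 1.0, 10.0, 1e6):
    # one implicit step approaches the minimizer 0 as dt -> infinity (lam -> 0)
    print(dt, backward_euler_step(1.0, dt))
```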
Three methods in the field of convex optimization are closely related to our approach. The first method is the proximal point algorithm for closed proper convex functions, which can be interpreted as backward Euler timestepping on the gradient flow of the objective function, while the gradient descent method amounts to forward Euler timestepping on the gradient flow (see, e.g., [48, sec. 4.1] and references therein). We extend this approach to nonconvex optimization problems with explicit handling of nonlinear equality constraints, as they appear for instance in optimal control. To this end, we extend a second method, the primal-dual projected gradient/antigradient flow of [8, chap. 6, 7], from the finite-dimensional convex to the infinite-dimensional nonconvex setting with the help of an augmented Lagrangian technique in the framework of projected differential equations in Hilbert space [19]. The third method we extend is the closely related Arrow-Hurwicz gradient method [8, chap. 10], which amounts to projected forward Euler timestepping on the projected gradient/antigradient flow of the Lagrangian without augmentation (ρ = 0). Our sequential homotopy method is equivalent to projected backward Euler timestepping. Hence, it bears the same connection with the Arrow-Hurwicz gradient method as the proximal point algorithm does with gradient descent.
From a Sequential Quadratic Programming (SQP) perspective (see, e.g., [46]), our approach resolves all the numerical difficulties, such as subproblem infeasibility, degeneracy, and nonconvexity due to indefinite subproblem Hessians, on the nonlinear level. Existing approaches often pass these difficulties on to the level of the quadratic subproblem solvers, which may fail to resolve them in a way that guarantees convergence of the overall nonlinear iteration. Our method can thus be used as a black-box globalization framework for any locally convergent optimization method that can be used within a continuation framework, e.g., methods of structure-exploiting inexact SQP [33,53,49,50,28] or semismooth Newton methods [43,54,56,32,35,33,57,31]. The local methods are even allowed to converge to maxima and saddle points; these issues are taken care of by our sequential homotopy method. For the application of local SQP methods, we can guarantee that the quadratic subproblems are always feasible and that they satisfy a strong constraint qualification that implies unique subproblem Lagrange multipliers. In addition, the subproblems are convex if the augmentation parameter ρ is sufficiently large and the stepsize Δt is sufficiently small while we are still far away from a solution.
Our approach uses the theory of projected differential equations due to Cojocaru and Jonker [19], which has a tight connection to differential inclusions [10] and evolutionary/differential variational inequalities [18,47]. We are mainly interested in their equilibrium points, which satisfy a variational inequality (VI). Other methods to compute solutions to VIs have been described in the literature (see, e.g., [45,13]); they are based on semismooth iterations on reformulations using special Nonlinear Complementarity Problem (NCP) functions.
Projected gradient flows for constrained optimization problems in finite dimensions have also been considered with techniques from Riemannian geometry (see, e.g., [37,55,29,30] and references therein), but the resulting methods produce only feasible iterates. It is often computationally wasteful to satisfy all constraints for iterates far away from an optimum and to force the iterates to follow a feasible manifold with possibly high curvature.
For an introduction to augmented Lagrangian approaches in Hilbert spaces we refer to [36] and references therein. We point out that our approach relies on the augmented Lagrangian mainly to remove negative curvature of the Lagrangian in the kernel of the constraints. In contrast to classical augmented Lagrangian methods, we do not alternate between updates of the primal and dual variables but rather update primal and dual variables simultaneously, as in augmented Lagrangian-SQP methods [36, chap. 6].

General assumptions
A central role in this article is played by the augmented objective and the augmented Lagrangian, defined for some fixed ρ ∈ R≥0 and arbitrary x ∈ X and y ∈ Y. Throughout this article, we make the following assumptions: Assumption 2 For some fixed ρ ∈ R≥0 we have the coercivity condition

Well-known results
Let us recall the following well-known definitions. For properties of projection operators, we refer the reader to [59].
Definition 3 (Polar cone) For a cone K ⊆ X, we call K° = {v ∈ X | ⟨v, k⟩ ≤ 0 for all k ∈ K} the polar cone of K.
Remark 1 If K ⊆ X is a linear subspace, then k ∈ K implies −k ∈ K, and thus equality holds in the definition of the polar cone. We shall make use of the following classical results from convex analysis.

Necessary optimality conditions
The basis for the sequential homotopy method we propose in section 4 is a necessary optimality condition due to Guignard [26]. Because the separation of nonlinearities c(x) = 0 and inequalities x ∈ C in (1) allows for a much shorter proof, we state it here for the sake of convenience.
Proof Let d ∈ T(F, x̄) with corresponding sequences (x_k) ⊂ F and (t_k) ⊂ R≥0. Using the shorthand d_k = t_k(x_k − x̄), we obtain the assertion from letting k → ∞ in

Theorem 1 (Necessary optimality conditions) If x̄ ∈ F is a local optimum of (1) that satisfies GCQ, then there exists a multiplier ȳ ∈ Y such that

Proof The proof is based on the Closed Range Theorem (see, e.g., [58, sec. VII.5], with premultiplication by the Riesz isomorphism to obtain the Hilbert space version), which states that Assumption 1 is equivalent to

Together with Lemma 3 and GCQ we obtain

Thus, there exists a ȳ ∈ Y such that −∇f(x̄)

The method we propose below enjoys the benefit that its subproblems lift the original problem into a larger space with additional structural properties in X, Y, and C, which result in the satisfaction of a constraint qualification that is much stronger than GCQ, even though problem (1) only satisfies GCQ.

Lemma 4
Let X = W × Q, equipped with the canonical inner product derived from the Hilbert spaces W and Q, and let C = W × C_Q for some nonempty closed convex set C_Q ⊆ Q. Furthermore, assume there exists a continuously Fréchet-differentiable mapping S :

Then, F is nonempty, every x̄ ∈ F satisfies GCQ, and the Lagrange multiplier ȳ in (3) is uniquely determined.
We choose  = ∇( q) + q in order to obtain

For the other polar cone in the definition of GCQ, we get

Taking the derivative of ((), ) = 0 with respect to  in direction  ∈  yields

As a consequence of the Closed Range Theorem [58, sec. VII.5, Corollary 1], (c) is equivalent to the existence of a continuous inverse of  ′ ( x̄), from which we see that ker

Hence, it follows from (6), (7), and (5) that

which shows that GCQ holds at x̄. Regarding multiplier uniqueness, we take the -components of (3) and (6) to deduce

from which the uniqueness of ȳ follows from the existence of a continuous inverse of ∇ ( x̄) by virtue of (b) and [58, sec. VII.5, Corollary 1].

Projected gradient/antigradient flow
We study a primal-dual gradient/antigradient flow (from now on simply called gradient flow) of the augmented Lagrangian defined in (2), projected on the closed convex set C in the framework of projected differential equations in Hilbert space [19] according to (8), where the gradients with respect to x and y evaluate to

The following existence theorem uses the augmented Lagrangian and ½∥c(·)∥² as Lyapunov-type functions. Due to Lemma 1, the time derivative of the augmented Lagrangian along the flow is given by (9). The positive sign in front of the last term in (9) reflects the saddle point nature of the Lagrangian approach and complicates the use of Lyapunov arguments in comparison to the unconstrained case. We pursue the basic idea that by increasing ρ, we can make the negative term overpower the ρ-independent positive term. That this is not always possible will be discussed after the following theorem.
Theorem 2 (Unique existence of solutions) Let Assumptions 2 and 3 be satisfied.
Then, there exists an interval [0, t_final] and a uniquely determined pair of absolutely continuous functions (x, y) : [0, t_final] → X × Y that satisfy the projected gradient flow equation (8) and (x(0), y(0)) = (x_0, y_0). The final time t_final can be extended as long as the condition (10) holds almost everywhere on [0, t_final]. In addition, if for some ε_1, ε_2 ∈ (0, 1) the conditions (10) and (11) hold almost everywhere in R≥0, we have (12).

Proof By Assumption 3, the gradient of the augmented Lagrangian is Lipschitz continuous in a neighborhood of (x_0, y_0) with some Lipschitz constant L < ∞. By virtue of [19, Theorem 3.1], there exists an ε > 0 and a uniquely determined pair of absolutely continuous functions (x, y) : [0, ε] → X × Y that satisfy (8) for almost all t ∈ [0, ε] and x(0) = x_0, y(0) = y_0. Without loss of generality, (10) is satisfied on [0, ε] and we can repeatedly extend the local solution by the above arguments until (10) or (11) is violated for some t_final > 0. As long as (10) is satisfied, no blowup is possible in finite time. To see this, we first observe that

which implies in combination with (10) and Assumption 2 that

This establishes that there can be no blowup of x in finite time. In addition, y cannot blow up in finite time because then the augmented Lagrangian at (x(t), y(t)) would tend to infinity by virtue of Assumption 2. Hence, we can extend the local solutions to global solutions on the whole interval R≥0 if the condition (10) holds almost everywhere. In this case, equations (13), (2), and Assumption 2 imply for t > 0 that

Using the monotonicity of the augmented Lagrangian along the flow for 0 < s ≤ t, implied by (10), we obtain for s ≤ t that

We concatenate (14) and (15) and let t → ∞, which yields

Hence, we obtain

If condition (11) holds additionally, the boundedness of the integral in (16) implies with integration of assumption (11) that

Hence, the integral is finite and we can establish (12) by way of (16) and the representation (9).
If now there is a set S ⊆ X × Y such that the gradient of the augmented Lagrangian is Lipschitz continuous on S and (x(t), y(t)) ∈ S for all t ∈ [0, ∞), then the integrand in (17) is absolutely continuous (as a concatenation of an absolutely continuous function with Lipschitz continuous functions). This implies uniform continuity of the integrand and we can deduce that ∥c(x(t))∥² → 0 for t → ∞. In combination with (16) and the representation (9), this implies that

Discussion of Theorem 2 If we do not obtain a solution up to t_final = ∞, it must be due to violation of (10) or (11). In this case, we may try to increase ρ in order for the negative term in (9) to overpower the positive one. In the limit of large ρ, these flow equations reduce (after rescaling) to the projected gradient flow for minimizing the constraint violation ∥c(x)∥² over x ∈ C according to

Hence, violation of (10) or (11) for large ρ can only occur if we get stuck in a locally infeasible point x̄ of problem (1), which means

This case must arise for instance if F = ∅, and it is reassuring that the theory provides room for this pathological case and that we at least obtain a point of (locally) minimal constraint violation.
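To build intuition for the flow, the following sketch (our hypothetical finite-dimensional example with f(x) = x₁² + x₂², c(x) = x₁ + x₂ − 1, and C the nonnegative orthant; the explicit Arrow-Hurwicz-type discretization stands in for the continuous flow) integrates the projected gradient/antigradient dynamics of the augmented Lagrangian and converges to the unique critical point:

```python
import numpy as np

# Explicit (forward Euler / Arrow-Hurwicz-type) integration of the projected
# gradient/antigradient flow of L_rho(x, y) = f(x) + <y, c(x)> + rho/2*||c(x)||^2
# for the toy problem min x1^2 + x2^2 s.t. x1 + x2 = 1, x >= 0.
f_grad = lambda x: 2.0 * x
c = lambda x: np.array([x[0] + x[1] - 1.0])
A = np.array([[1.0, 1.0]])          # constraint Jacobian c'(x)

rho, h = 1.0, 0.05                  # augmentation parameter, pseudo-timestep
x, y = np.array([2.0, -1.0]), np.array([0.0])
for _ in range(2000):
    gx = f_grad(x) + A.T @ (y + rho * c(x))  # grad_x of the augmented Lagrangian
    x = np.maximum(x + h * (-gx), 0.0)       # projected descent step in x
    y = y + h * c(x)                         # ascent (antigradient) step in y

print(x, y)   # equilibrium = critical point of the constrained problem
```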
We next characterize equilibrium points of (8) assuming they exist.
⊓ ⊔ Among the critical points, we are naturally only interested in those that are minima of (1). For the finite-dimensional unconstrained case, we recall that asymptotically stable equilibria of the gradient flow are strict local minima of the objective function and that the converse is true if the objective is analytic in a neighborhood of the minimum [1]. This is of high practical relevance, because the gradient flow will be attracted to strict local minima and, conversely, small perturbations (for instance due to numerical round-off) will usually make the flow escape unwanted critical points such as saddle points or maxima.
For the constrained case, the situation is more complicated because the intrinsic saddle point structure of the Lagrangian requires a gradient/antigradient flow, for which to our knowledge no results on asymptotic stability exist so far. We show that critical points that admit an emanating feasible curve of descent are not asymptotically stable (under reasonable conditions). This implies that the projected gradient/antigradient flow will not be attracted to these undesired critical points. To prove this result, we need the following three definitions.
We can think of a flow ribbon as the trajectory of a curve under the gradient/antigradient flow (8), just as if the curve at t = 0 were the first thread and we weave together the threads into a fabric while moving along the flow. This somewhat unusual definition is required to keep the set of points (x, y) small on which assumption (19) in the following theorem must hold (compare also Example 1 below).

⊓ ⊔
In order to validate that assumption (19) does not reduce the assertion of Theorem 3 to one about the empty set, we provide a simple example.

Projected backward Euler: A sequential homotopy method
It is well known that the projection in (8) is actually the derivative of the projection of the primal variable onto C in direction of the negative primal gradient:

Lemma 6 For a nonempty closed convex set C ⊆ X, the Gâteaux derivative of the projection of x ∈ X onto C in the direction v ∈ X is the projection of v onto the tangent cone T(C, x), i.e.,

lim_{s↓0} (P_C(x + s v) − P_C(x)) / s = P_{T(C,x)}(v)

[59, Lemma 4.5].

⊓ ⊔
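A finite-dimensional sanity check of this statement (our illustration, for C the nonnegative orthant in R³) compares a one-sided difference quotient of P_C with the projection onto the tangent cone:

```python
import numpy as np

# Numerical check of the directional-derivative statement of Lemma 6 for
# C = {x : x >= 0}: for x in C, the tangent cone T(C, x) leaves components
# with x_i > 0 unconstrained and requires nonnegative directions where x_i = 0.
P_C = lambda x: np.maximum(x, 0.0)

def P_tangent(x, v):
    w = v.copy()
    active = (x == 0.0)
    w[active] = np.maximum(v[active], 0.0)  # keep only feasible directions
    return w

x = np.array([1.0, 0.0, 0.0])
v = np.array([-0.3, 0.7, -0.5])
s = 1e-8
fd = (P_C(x + s * v) - P_C(x)) / s          # one-sided difference quotient
print(fd, P_tangent(x, v))
```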
This motivates following the flow defined by (8) from (x̂, ŷ) ∈ C × Y to (x, y) ∈ C × Y with a projected backward Euler step of stepsize Δt > 0 by solving (21), because Lemma 6 ensures consistency in the limit Δt ↓ 0. From a computational point of view, the projected backward Euler system (21) is an ideal candidate for the application of local (possibly inexact) semismooth Newton methods (see, e.g., [43,54,57]), which we will investigate in more detail in section 5.
In addition, the projected backward Euler system (21) can be interpreted as necessary optimality conditions of a primal-dual proximally regularized version of the augmented form of (1). With λ = 1/Δt, it reads

Uniqueness of solutions to (22) can be guaranteed for sufficiently large λ.

Theorem 4
The regularized problem (22) has the following properties for λ > 0: 1. It satisfies the strong constraint qualification of Lemma 4.
Proof With W = Y and Q = X, we can apply Lemma 4 with the solution mapping S(q) = −Δt c(q) and ran I = ran I* = Y. This shows assertion 1. We call the Lagrangian of (22) the homotopy Lagrangian or proximal Lagrangian. By Lemma 4, GCQ holds at all feasible points and Theorem 1 yields that

from which we can deduce that w̄ = ŷ − ȳ because of

Hence, the feasibility of (w̄, x̄) implies that

Multiplication with Δt yields the second equation of (21). For the x-part of (23), we observe that

and by Lemma 2 that

therefore P_C(x̄ − Δt ∇_x L(x̄, ȳ)) = x̄, which coincides with the first equation of (21). This shows assertion 2.
We can now use (21) to define a fixed point iteration. Let L denote the Lipschitz constant of the gradient of the augmented Lagrangian. For Δt < 1/L, the mapping Φ is a contraction because P_C is Lipschitz continuous with modulus 1:

The Banach fixed point theorem yields uniqueness and existence of a fixed point (x̄, ȳ), which together with w̄ = ŷ − ȳ is the unique solution of (22). For Δt = 0, the fixed point and thus the solution is obviously (0, x̂, ŷ). In order to prove convergence of (w̄, x̄, ȳ) to (0, x̂, ŷ) for Δt → 0, we observe that

which implies (recall that Φ(ẑ) depends continuously on Δt)

This finally proves assertion 3.

⊓ ⊔
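The contraction argument above can be sketched numerically (a hypothetical toy problem of ours, with a fixed-point map of the same structure as in the proof: a projected proximal update in x coupled with a multiplier update in y; all names are our own):

```python
import numpy as np

# For the toy problem min x1^2 + x2^2 s.t. x1 + x2 = 1, x >= 0, iterate
#   Phi(x, y) = ( P_C(x_hat - dt*grad_x L_rho(x, y)),  y_hat + dt*c(x) ),
# which is a contraction for small dt, so the projected backward Euler
# system has a unique solution near the reference point (x_hat, y_hat).
f_grad = lambda x: 2.0 * x
c = lambda x: np.array([x[0] + x[1] - 1.0])
A = np.array([[1.0, 1.0]])
rho, dt = 1.0, 0.1
x_hat, y_hat = np.array([2.0, 0.0]), np.array([0.0])

def Phi(x, y):
    gx = f_grad(x) + A.T @ (y + rho * c(x))
    return np.maximum(x_hat - dt * gx, 0.0), y_hat + dt * c(x)

x, y = x_hat.copy(), y_hat.copy()
for _ in range(200):          # Banach iteration converges geometrically
    x, y = Phi(x, y)
print(x, y)
```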
The artificial introduction of the variable w in (22) allows a lifting of the dual regularization term y − ŷ in the backward Euler system (21) onto primal variables. From a linear algebra perspective, this can be understood as a Schur complement approach, as we see in the following example.
If we eliminate w with a Schur complement approach, we obtain the backward Euler system (21) as a primal-dual regularization of the original saddle point system for (1) according to

We can derive two interesting equivalent reformulations of (22). The first reformulation substitutes w = √λ v, from which we obtain

The advantage of (25) over (22) is that the optimal v is also uniquely determined for λ = 0. The second reformulation completely eliminates w = −Δt c(x). This leads to the problem

which has no equality constraint and might allow for the application of projected Newton/gradient methods similar to, e.g., [14,16,38].
The homotopy problem (22) and Theorem 4 provide a complementary interpretation of using projected backward Euler steps (21) for the gradient flow equations (8): We trace the solutions of (22) from some primal-dual starting point (0, x̂, ŷ) as a continuation in λ until the homotopy breaks down. The result yields an update for (x̂, ŷ) and we can repeat the procedure. If, at some point, we are able to drive λ to zero, we can solve the original problem (1) with a superlinear local convergence rate by means of a locally superlinearly convergent method for the homotopy problem (22), e.g., a semismooth Newton method. If it is never possible to drive λ to zero, we at least follow the gradient flow (8) with a projected backward Euler method with stepsize 1/λ. If we fix λ at some positive value, we obtain a locally linear convergence rate provided that the gradient flow converges exponentially.

Numerical case study in PDE constrained optimization
We apply the proposed method to the following benchmark problem adapted from [42]: Let Ω ⊂ R² be a bounded domain with Lipschitz boundary and let positive constants, control bounds in L^p(Ω) with p ∈ (2, ∞], and a target function in L²(Ω) be given. We solve the control-constrained quasilinear elliptic optimal control problem (26). In addition to [42], we include pointwise control bounds. For smaller values of the conditioning parameters, problem (26) becomes more and more ill-conditioned, while the effects of nonlinearity become more challenging for larger values of the nonlinearity parameter.
To transform problem (26) into the form (1), we use state and control variables, where the equality constraint is the weak form of the PDE in (26). The problem has a continuously Fréchet-differentiable solution operator in the sense of Lemma 4 [17].

Implementation aspects
From an implementation point of view, the projected backward Euler system (21) with all its required derivatives can be conveniently generated by the use of the Unified Form Language [4,2] in combination with Algorithmic Differentiation [25], as implemented in the DOLFIN/FEniCS project [40,41,3,39]. When evaluating the augmented objective f(x) + (ρ/2)∥c(x)∥², the inner product (y, c(x)), or the dual proximal term in (22), we face the problem of computing norms and inner products in Y = H^{-1}(Ω), which we can facilitate computationally with the use of the Riesz isomorphism. If we choose the norm ∥v∥ = ∥∇v∥_{L²(Ω)²} on H¹₀(Ω), the evaluation of the Riesz representation boils down to one solution of a Poisson problem with homogeneous Dirichlet boundary conditions. The difficulty from a computational vantage point is that the matrix of the Riesz map is large and dense, in contrast to its inverse, which is a sparse finite element stiffness matrix. For practical purposes, we always work with the Riesz representation of the dual variable directly, eliminating the need for evaluating the Riesz isomorphism for the dual variables.
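The Riesz trick can be illustrated in one dimension with finite differences (our sketch, not the paper's FEniCS implementation): the dual norm of y ∈ H^{-1}(0, 1) is obtained from one sparse Poisson solve, even though the Riesz matrix itself would be dense.

```python
import numpy as np

# For ||v||_{H_0^1} = ||v'||_{L^2}, the Riesz representative r of y solves
# -r'' = y with homogeneous Dirichlet conditions, and ||y||_{H^{-1}}^2 = <y, r>.
# The stiffness matrix K below is sparse (tridiagonal; stored densely here only
# for brevity); it is the *inverse* of the dense matrix of the Riesz map.
n = 200
hmesh = 1.0 / (n + 1)
main = 2.0 * np.ones(n) / hmesh
off = -1.0 * np.ones(n - 1) / hmesh
K = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

xgrid = np.linspace(hmesh, 1.0 - hmesh, n)
y = np.sin(np.pi * xgrid)            # a dual element, identified via L^2
M = hmesh * np.eye(n)                # lumped mass matrix
r = np.linalg.solve(K, M @ y)        # one Poisson solve = apply the Riesz map
dual_norm_sq = y @ (M @ r)
print(dual_norm_sq)                  # approximates 1/(2*pi^2) for this y
```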
From a linear algebra point of view, it is important to exploit the special structure of the augmentation term (ρ/2)∥c(x)∥². We extend a well-known argument for the special case λ = 0 (see, e.g., [36, p. 158f]) to the case λ ≥ 0: For fixed (x, y), let us denote the gradients and the second derivative of the augmented Lagrangian by

Disregarding inequalities for a moment, each Newton step for the (appropriately scaled) backward Euler equations (21) requires us to solve the linear system (27). The problem here is that the augmented Hessian block becomes a dense matrix after discretization by finite elements, due to the dense matrix of the Riesz map. Hence, we must avoid its formation. Instead of (27) we solve the equivalent system (28) with a subsequent reconstruction of y from ỹ; the equivalence can easily be checked. Because we work with the Riesz representation of the dual variable directly, we need to compute the Riesz representation of c(x) first, evaluate the Lagrangian derivatives at the correspondingly shifted dual variable, solve the unaugmented Newton system (28) (reformulated for the Riesz representation instead of y), and finally reconstruct the dual iterate.
Algorithm 1 computes, for the current value of λ, a semismooth Newton step from ẑ to z⁺, followed by a simplified semismooth Newton step at z⁺. Here, simplified means that the system matrix of the previous semismooth Newton system is reused, subject to modifications concerning the current active set guess derived from the residual evaluated at z⁺. We accept an iterate for the current value of λ if the following natural monotonicity test is satisfied in line 7: We require that the simplified semismooth Newton increment is smaller in norm than a contraction factor Θ ∈ (0, 1) times the semismooth Newton increment.
If the monotonicity test fails, we enlarge λ by a constant factor to drive the solution of (21) closer to ẑ, in order to eventually enter the region of local superlinear convergence of the semismooth Newton method.
If the monotonicity test is satisfied, we accept z⁺⁺ as the new iterate. If λ and the norm of the outer loop increment z⁺⁺ − ẑ are small enough, then we terminate with the solution; otherwise we predict a new stepsize, which should eventually drive λ close to zero. We then commence the next outer iteration.
There are many possibilities to predict the next λ after acceptance of the current iterate. For the numerical results below, we use a heuristic motivated by a discrete proportional-integral (PI) controller: We try to choose λ such that the contraction factor θ = ∥z⁺⁺ − z⁺∥ / ∥z⁺ − ẑ∥ is close to a given reference θ_ref ∈ (0, 1). We choose to predict λ ← λ/θ_mod, where log θ_mod is the manipulated variable. To this end, let e = log θ_ref − log θ and let E denote the sum of all previous values of e over the last successful outer loops. We then set, with some constants k_P and k_I,

log θ_mod ← k_P e + k_I E.
In each accepted iteration, we have the simple update E ← E + e. In case the monotonicity test fails, we possibly reset the integral term E ← min(E, 0). We can also clip λ at a lower bound λ_min. For a related concept in the stepsize control of one-step methods for ordinary differential equations, we refer to [27, p. 28ff].
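The PI-controller heuristic above can be sketched in a few lines (the names theta_ref, kP, kI, e, and E are our labels for the reference contraction factor, the proportional and integral gains, the control error, and its running sum):

```python
import math

# Minimal sketch of the PI stepsize heuristic: steer the observed contraction
# factor theta toward theta_ref by predicting lam <- lam / theta_mod with
# log(theta_mod) = kP*e + kI*E, as described in the text.
theta_ref, kP, kI = 0.5, 0.2, 0.1

def predict_next_lam(lam, E, theta):
    e = math.log(theta_ref) - math.log(theta)
    E = E + e                              # integral update on acceptance
    theta_mod = math.exp(kP * e + kI * E)  # manipulated variable
    return lam / theta_mod, E

# Fast observed contraction (theta < theta_ref) decreases lam, i.e., it
# enlarges the stepsize dt = 1/lam; slow contraction increases lam.
lam_fast, _ = predict_next_lam(1.0, 0.0, theta=0.1)
lam_slow, _ = predict_next_lam(1.0, 0.0, theta=0.9)
print(lam_fast, lam_slow)
```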
It is also possible to keep all iterates inside C with an additional projection in the local semismooth Newton step (see, e.g., [57]). We found the method to require fewer iterations on (26) without projection steps, even though we are aware that if an iterate lies outside C, we might run into problems with the monotonicity test in line 7 of Algorithm 1, because ∥z⁺ − ẑ∥ might not tend to 0 for λ → ∞.
Alternatively to Algorithm 1, it is conceivable to update the reference point ẑ less frequently and to trace each homotopy leg until it nearly breaks down in a singularity.In our experience, this approach of long homotopy legs leads to a more complicated algorithm and requires the solution of more and worse conditioned linear systems.We prefer the sequential homotopy method with short homotopy legs in the form of Algorithm 1.

Numerical results
We apply Algorithm 1 to problem (26) on Ω = (0, 1)² with the target state from [42]. We compare the sequential homotopy method of Algorithm 1 with a nonlinear VI solver described in [45,13] and implemented in the production quality software package PETSc [11,12]. For better comparison, we use the direct solver MUMPS [6,7] for the solution of the linear systems in both approaches. The use of inexact linear algebra solvers poses no conceptual problem, as long as they yield a locally convergent nonlinear iteration. The efficiency of iterative linear algebra methods, however, depends crucially on the use of suitable structure-exploiting preconditioners. This topic exceeds the scope of this paper and is the subject of future research.
For the VI solver, we consider two implemented globalization strategies, a backtracking line search (bt) and an error-oriented monotonicity test (nleqerr). As it turns out, the VI solver did not solve any of the problem instances when started at the initial guess of zero, failing either by raising an error or by reaching the limit of 5,000 residual evaluations, even for a reduced termination tolerance of 10⁻⁵ on the ℓ∞-norm of the residuals. Some problem instances could be solved successfully after dropping the lower control bound, which is only active for the instance with parameters 10⁻² and 10². In some of these instances the residual norm stalled between 10⁻⁵ and 10⁻⁸.

Table 1 Comparison of the sequential homotopy method of Algorithm 1 with a nonlinear VI solver with backtracking (bt) and error-oriented monotonicity test (nleqerr) for different instances of problem (26) and varying discretizations. The cardinality of the discrete optimal active set is given in the #act column. The column #disc shows the number of discarded steps, which are reasonably low, hinting at the efficiency of the PI control stepsize prediction. The columns #mat and #res show the number of required matrix and residual evaluations. The sequential homotopy method solves all instances and exhibits mesh-independent convergence (subject to some fluctuations for the worse conditioned problems). The symbol ↘̸ denotes an error in the line search, the symbol ∞ an error after exceeding 5,000 residual evaluations, and the symbol ↗ an error after not more than 10⁻⁸ relative reduction of a criticality measure over 100 system matrix evaluations.

We compare in Table 1 the sequential homotopy method of Algorithm 1 (with a sharper termination tolerance of 10^-8 on the norm of the homotopy increment and with both upper and lower bounds) to the VI approach with the reduced termination tolerance described above and only upper bounds. We observe that the sequential homotopy method solves all problem instances with mesh-independent convergence (subject to some fluctuation for the worse conditioned problems). The VI approach with backtracking is faster for the less demanding instances but fails for the more demanding ones. The VI approach with error-oriented monotonicity test solves at least two of the more demanding instances successfully, although only one with an efficiency comparable to that of the sequential homotopy method.
In Figure 3, we see that even though slightly different numbers of iterations (depicted with markers) are performed on different meshes for the parameter combination 10^-2, 10^2, roughly the same flow time of 10^11 has to be traversed to reach the required tolerance of TOL = 10^-8. We also see that the stepsizes Δt eventually become very large and lead to superlinear convergence. This is the typical numerical behavior of the sequential homotopy method on all considered instances. For the discretization with 128, some extra steps are carried out around flow times 10^5 and 10^7.
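The stepsize adaptation referred to above is a proportional-integral (PI) controller. As a hedged illustration only, the following sketch shows a generic PI stepsize rule of the kind common in timestepping codes; the gains k_p, k_i, the clamping bounds, and the use of an error estimate relative to a tolerance are our own assumptions, not the exact prediction rule of Algorithm 1.

```python
def pi_stepsize(dt, err, err_prev, tol=1e-8, k_p=0.4, k_i=0.3, growth=10.0):
    """Propose the next stepsize from the current and previous error
    estimates (e.g. norms of the homotopy increment).  When err is far
    below tol, the stepsize grows (capped by `growth`); when err exceeds
    tol, it shrinks (floored at a factor 0.1)."""
    factor = (tol / err) ** k_p * (err_prev / err) ** k_i
    return dt * min(growth, max(0.1, factor))
```

A rule of this shape reproduces the qualitative behavior seen in Figure 3: once the iterates enter the region of fast local convergence, the error estimates collapse and the proposed Δt grows geometrically, yielding very large stepsizes and superlinear convergence.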

Summary
We provided sufficient conditions for the existence of global solutions to the projected gradient/antigradient flow (8) and showed that critical points with emanating descent curves cannot be asymptotically stable and are thus not attracting for the flow. We applied projected backward Euler timestepping to derive the necessary optimality conditions of a primal-dual proximally regularized counterpart (22) of (1). The regularized problem can be solved by a homotopy method, giving rise to a sequence of homotopy problems. The sequential homotopy method can be used to globalize any locally convergent optimization method that can be employed efficiently in a homotopy framework. The sequential homotopy method with a local semismooth Newton solver outperforms state-of-the-art VI solvers for a challenging class of PDE-constrained optimization problems with control constraints.
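The behavior of the gradient/antigradient flow can be illustrated on a toy finite-dimensional instance of (1). The sketch below (our own illustration; it uses a projected forward Euler discretization for brevity, not the backward Euler discretization of the paper) treats min x1^2 + x2^2 subject to x1 + x2 - 1 = 0 on the box [0,1]^2: the primal variable follows a projected gradient step of the augmented Lagrangian while the multiplier follows an antigradient (ascent) step, and the iterates approach the unique KKT point (0.5, 0.5) with multiplier -1.

```python
import numpy as np

# Augmented Lagrangian L_rho(x, lam) = f(x) + lam*c(x) + (rho/2)*c(x)^2
# for f(x) = x1^2 + x2^2, c(x) = x1 + x2 - 1, box C = [0, 1]^2.
def grad_x(x, lam, rho):
    c = x[0] + x[1] - 1.0
    return 2.0 * x + (lam + rho * c) * np.ones(2)

x, lam, rho, h = np.zeros(2), 0.0, 1.0, 0.05
for _ in range(2000):
    c = x[0] + x[1] - 1.0
    # projected gradient step in x, antigradient step in lam
    x = np.clip(x - h * grad_x(x, lam, rho), 0.0, 1.0)
    lam = lam + h * c
```

For this convex toy problem the equilibrium of the flow is exactly the KKT point, consistent with the identification of flow equilibria and critical points shown in the paper.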
Correction to: Mathematical Programming (2021) 187:459-486, https://doi.org/10.1007/s10107-020-01488-z

Correction 1 For the example (26) considered in Sec. 5 of "A sequential homotopy method for mathematical programming problems" by Potschka, A. and Bock, H.G., the argument presented in the last paragraph of Sec. 5.1 is insufficient to prove semismoothness of the equations in (21). The required smoothing property for the argument of the projection in (21) cannot be established with the usual trick (see, e.g., [1, Theorem 2.14]) of choosing the stepsize Δt as the reciprocal of the regularization parameter, because this trick only works if the first term in the argument also has the smoothing property (and is not merely the L^2-function q̂). In fact, semismoothness does not hold due to [1, Lemma 2.7]. Hence, semismoothness of (21) holds only for each fixed discretization and there is no theoretical justification for the good numerical results for refined discretizations reported in Sec. 5.3 using a semismooth Newton method on the projected backward Euler equations (21) for the gradient/antigradient flow of the augmented Lagrangian directly.
This gap can be closed with an algorithmic modification that solves the homotopy subproblem (22) differently (i.e., not via (21) directly): by Lemma 5, critical points of (22) can also be equivalently characterized as equilibria of another gradient/antigradient flow for (22), which are in turn equivalent to the fixed points of the projected (forward or backward) Euler equations for any fixed stepsize > 0. Using the Lagrangian of (22), which is defined at the beginning of the proof of Theorem 4, one obtains projected Euler fixed-point equations (C1a)-(C1c) for (22). Choosing the scaling factor as one plus the proximal parameter cancels the q̂-term, and if the admissible set is contained in L^p(Ω) for some p > 2, the required smoothing property holds because the remaining term lies in L^p and the Riesz operator maps into H^1_0(Ω). A semismooth Newton method can then be applied to (C1a)-(C1c) instead of (21), where (C1a) can be used to immediately eliminate one unknown in terms of ŷ just as in (24). The resulting linear subproblems of a semismooth Newton method for (C2a)-(C2b) (and, equivalently, (C1a)-(C1c)) differ from the ones given in Sec. 5.1 only in the determination of the active set, while the remaining entries of the matrices and right-hand sides coincide. The original formulation (21) can be recovered by a different choice of the scaling factor, which justifies the good numerical behavior when the stepsize is large relative to the regularization. We provide an update of Table 2 with the results for the modified active-set determination. In terms of the number of matrix evaluations, the results are worse for 10^0, similar for 10^1, slightly better for 10^2, slightly worse for 10^3 (with an outlier on the 512 mesh, which vanishes for a small perturbation of the initial value to 1.1 instead of 1), slightly better for 10^4, and considerably better for 10^5.
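The role of the active set in such a semismooth Newton method can be made concrete in finite dimensions. The following sketch is our own toy illustration (not the equations (C1)/(C2) themselves): for a strictly convex box-constrained quadratic, it applies a semismooth Newton method to the projection-based fixed-point equation F(q) = q - P_[lo,hi](q - (Aq - b)) = 0, where the Newton matrix changes only through the active set, mirroring the observation above that the linear subproblems differ only in the active-set determination.

```python
import numpy as np

def semismooth_newton_box(A, b, lo, hi, tol=1e-10, max_iter=50):
    """Solve min 1/2 q^T A q - b^T q  s.t.  lo <= q <= hi  (A SPD) via
    semismooth Newton on F(q) = q - P(q - (A q - b))."""
    n = len(b)
    q = np.clip(np.zeros(n), lo, hi)
    for _ in range(max_iter):
        g = A @ q - b
        w = q - g                       # trial point before projection
        F = q - np.clip(w, lo, hi)
        if np.linalg.norm(F) <= tol:
            break
        inactive = (w > lo) & (w < hi)  # active set read off the projection
        # Generalized Jacobian: identity rows where the bound is active,
        # rows of A on the inactive set.
        J = np.eye(n)
        J[inactive] = A[inactive]
        q = q - np.linalg.solve(J, F)
    return q
```

On such problems the iteration typically identifies the correct active set after finitely many steps and then terminates with the exact solution of the reduced linear system.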
We believe the reason for the outlier to be the following: it may happen that the monotonicity test accepts a step even though the inertia of the saddle-point matrix of the linearized subproblem for (21) changes between the current point and the two subsequent trial points, which implies the existence of a singularity on one of the lines connecting consecutive points. As a result, the regularization needs to be increased considerably, resulting in a high number of discarded steps. It would generally be possible to check the inertia using appropriate sparse matrix decomposition methods, but this is technically challenging in our current implementation. It is reassuring that the sequential homotopy method eventually recovers and converges to a solution (in this case the same one as in the original version).

Correction 2 Substitute "directional" for "Gâteaux" in Lemma 6, because the limit is only taken over positive h.

Definition 1 (Tangent cone) For x̄ ∈ X and a nonempty set S ⊆ X, we call

T(S, x̄) = { d ∈ X | there exist sequences (x_k) ⊂ S, (t_k) ⊂ R≥0 with x_k → x̄ and t_k(x_k − x̄) → d as k → ∞ }

the tangent cone to S at x̄. For a nonempty closed convex set C ⊆ X, we denote by P_C : X → C the projection operator of X onto C, which is uniquely defined by ‖x − P_C(x)‖ = min_{c ∈ C} ‖x − c‖ for all x ∈ X.
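For a box C = [0, 1]^n in R^n, the projection is a componentwise clip, and the minimal-distance property above is equivalent to the variational inequality ⟨x − P_C(x), c − P_C(x)⟩ ≤ 0 for all c ∈ C. A quick numerical sanity check of both characterizations (our own illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=5) * 3.0
p = np.clip(x, 0.0, 1.0)          # P_C(x) for the box C = [0, 1]^5

for _ in range(100):
    c = rng.uniform(0.0, 1.0, size=5)
    # minimal-distance property defining the projection
    assert np.linalg.norm(x - p) <= np.linalg.norm(x - c) + 1e-12
    # equivalent variational characterization <x - p, c - p> <= 0
    assert np.dot(x - p, c - p) <= 1e-12
```

The variational form is the one used in practice to characterize projections onto closed convex sets in Hilbert spaces, since it requires no minimization.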