Newton and interior-point methods for (constrained) nonconvex-nonconcave minmax optimization with stability and instability guarantees

We address the problem of finding a local solution to a nonconvex-nonconcave minmax optimization problem using Newton-type methods, including interior-point ones. We modify the Hessian matrix of these methods such that, at each step, the modified Newton update direction can be seen as the solution to a quadratic program that locally approximates the minmax problem. Moreover, we show that by selecting the modification in an appropriate way, the only stable equilibrium points of the algorithm's iterations are local minmax points. As a consequence, the algorithm can converge to an equilibrium point only if that point is a local minmax, and it will escape from any equilibrium point that is not. Using numerical examples, we show that the computation time of our algorithm scales roughly linearly with the number of nonzero elements in the Hessian.


Introduction
In minmax optimization, one minimizes a cost function which is itself obtained from the maximization of an objective function. Minmax optimization is a powerful modeling framework, generally used to guarantee robustness to an adversarial parameter, with applications such as accounting for disturbances in model predictive control [1,2], security-related problems [3,4], and training neural networks to be robust to adversarial attacks [5]. It can also be used as a framework to model more general problems, such as sampling from unknown distributions using generative adversarial networks [6], reformulating stochastic programming as minmax optimization [7][8][9], or producing robustness of a stochastic program with respect to the probability distribution [10]. Minmax optimization is also known as minimax or robust optimization. Minmax optimization is related to bilevel optimization [11][12][13], as minmax optimization can sometimes be used to find solutions to bilevel optimization when the inner and outer optimizations have antisymmetric criteria.
Finding a global minmax point for nonconvex-nonconcave problems is generally difficult, and one has to settle for finding a local minmax point. Surprisingly, only recently was a first definition of unconstrained local minmax proposed in [14], with the definition of constrained local minmax following in [15].
In optimization, Newton's method consists of applying Newton's root-finding algorithm to obtain a point for which the gradient is equal to zero. In convex minimization, the only such points are (global) minima [16, Theorem 2.5]. Likewise, in convex-concave minmax optimization (meaning that the function is convex in the minimizing variable and concave in the maximizing variable), Von Neumann's minimax theorem [17] states that the min and the max commute, which implies that the only points for which the gradient is zero are solutions to the optimization. This means that both in convex minimization and in convex-concave minmax optimization, using Newton's root-finding method to obtain a point for which the gradient is zero is a good strategy to solve the optimization problem.
In contrast, for nonconvex minimization or nonconvex-nonconcave minmax optimization, the gradient can be zero at a point even if that point is not a solution to the optimization. So using Newton's root-finding method to obtain a point for which the gradient is equal to zero is not a good strategy to find a (local) solution to the optimization. The foundation of our work involves examining Newton's method iterations through the lens of dynamical systems. By analyzing the linearization of the dynamics, we deduce that every equilibrium point (i.e., a point with a zero gradient) is locally asymptotically stable, which is why the iterations of Newton's method are attracted to them. The key contribution of this article is to study how to modify Newton's method such that it is only attracted to (local) solutions of the optimization, and repelled by any equilibrium points that are not (local) solutions.
Our paper's initial contribution is an examination of the local convergence properties of a modified Newton's method for minimization in which a multiple of the identity matrix is added to the Hessian such that the resulting matrix is positive definite [16, Chapter 3.4 "Newton's method with Hessian modification"]. This modified Newton's method has two crucial properties. First, it can be shown to be equivalent to a sequence of local quadratic approximations to the minimization problem. Second, we demonstrate that incorporating this additive matrix renders every non-local-minimum equilibrium point unstable while maintaining stability for local minima. This simple modification ensures that the modified Newton's method has the property we refer to in the previous paragraph: the iterations are only attracted to equilibrium points that are local minima, and repelled by other equilibrium points. Utilizing analogous techniques, we establish similar results for primal-dual interior-point methods in constrained minimization. These findings (outlined in Section 2) directly inspire the development of new Newton-type algorithms for minmax optimization.
Drawing inspiration from Newton's method for minimization, we develop Newton-type algorithms for minmax optimization, conceptualized as a series of local quadratic approximations of the minmax problem. For convex-concave functions, this quadratic approximation is just the second-order Taylor expansion, which leads to the (unmodified) Newton's method, accompanied by its well-established local convergence properties. However, for nonconvex-nonconcave functions, it is necessary to add scaled identity matrices to ensure that the local approximations possess finite minmax solutions (without mandating convex-concavity). Additive terms meeting this criterion are said to satisfy the Local Quadratic Approximation Condition (LQAC). Employing a sequence of local quadratic approximations acts as a surrogate for guiding the modified Newton's method towards a solution at each step. Nevertheless, we demonstrate that, unlike minimization, local quadratic approximation-based modifications are not enough to ensure that the algorithm can only converge towards local minmax points. Our minmax findings reveal that additional conditions are required on the modification to ensure that the algorithm can converge to an equilibrium point only if that point is a local minmax. To streamline the presentation, we first introduce this result in Section 3.1 for unconstrained minmax, and then expand it to primal-dual interior-point methods for constrained minmax in Section 3.2.
The conditions described above to establish the equivalence between local minmax and local asymptotic stability of the equilibria of a Newton-type iteration are directly used to construct a numerical algorithm to find local minmax. By construction, when this algorithm converges to an equilibrium point, that point is guaranteed to be a local minmax. One could be tempted to think that the issue of getting instability for the equilibria that are not local minima (in Theorems 1 and 2) or that are not local minmax (in Theorems 3 and 4) is just a mathematical curiosity, which in practice makes little difference. However, our numerical examples in Sections 4.1 and 4.2 show otherwise. This is most evident in the pursuit-evasion MPC problem in Section 4.2, where finding a local minmax (rather than an equilibrium that is not a local minmax) leads to a completely different control. Specifically, if the instability property is not guaranteed, the evader is not able to escape from the pursuer. It is important to emphasize that our results fall shy of guaranteeing global asymptotic convergence to a local minmax, as the algorithm could simply never converge. However, our numerical examples also show that our algorithm seems to enjoy good global convergence properties in practice. Using the results of this paper, we have created a solver for minmax optimization and included it in the solvers of TensCalc [18]; this solver was used to generate the numerical results we present.

Notation:
The set of real numbers is denoted by R. Given a vector v ∈ R^n, its transpose is denoted by v′. The operation diag(v) creates a matrix with diagonal elements v and off-diagonal elements 0. The matrix I is the identity, 1 is the matrix of ones, and 0 is the matrix of zeros; their sizes will be provided as subscripts whenever they are not clear from context. If a matrix A only has real eigenvalues, we denote by λmin(A) and λmax(A) its smallest and largest eigenvalues. The inertia of A is denoted by inertia(A), and is a 3-tuple with the number of positive, negative, and zero eigenvalues of A.
Consider a differentiable function f : R^n × R^m → R^p. The Jacobian (or gradient if p = 1) at a point (x̄, ȳ) with respect to the x variable is a matrix of size n × p and is denoted by ∇x f(x̄, ȳ), and analogously for the variable y. When p = 1 and f(•) is twice differentiable, we use the notation ∇yx f(x̄, ȳ) := ∇y ∇x f(x̄, ȳ), which has size m × n. We use analogous definitions for ∇xy f(x̄, ȳ), ∇xx f(x̄, ȳ), and ∇yy f(x̄, ȳ).
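Since the inertia is used repeatedly below, it may help to see how it can be computed numerically from the eigenvalues of a symmetric matrix. A minimal sketch (the function name and zero-tolerance are our own choices):

```python
import numpy as np

def inertia(A, tol=1e-9):
    """Return the 3-tuple (#positive, #negative, #zero eigenvalues) of a symmetric matrix A."""
    eigs = np.linalg.eigvalsh(A)  # eigvalsh: real eigenvalues of a symmetric matrix
    return (int(np.sum(eigs > tol)),
            int(np.sum(eigs < -tol)),
            int(np.sum(np.abs(eigs) <= tol)))

# For example, diag(2, -3, 0) has one positive, one negative, and one zero eigenvalue:
print(inertia(np.diag([2.0, -3.0, 0.0])))  # → (1, 1, 1)
```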

Literature Review
Traditionally, robust optimization focused on the convex-concave case, with three main types of methods. The first type of method is based on Von Neumann's minmax theorem [17], which states that the min and the max commute when the problem is convex-concave and the optimization sets are convex and compact. Solving the minmax then simplifies to finding a point that satisfies the first order condition. While there are many different methods to achieve this, many of them can be summarized as the problem of finding the zeros of a monotone operator [19]. The second type of method consists of reformulating the minmax as a minimization problem which has the same solution as the original problem. This is generally done using either robust reformulation through duality theory or tractable variational inequalities [12,[20][21][22]. The third, cutting-set methods, solves a sequence of minimizations where the constraints of each minimization are based on subdividing the inner maximization [23].
Motivated by some of the shortcomings of these methods and the necessities of machine learning, research on minmax optimization started to study first-order methods based on variations of gradient descent-ascent. The results tend to focus on providing convergence complexity under different convexity/concavity assumptions on the target function. We can divide these first-order methods into three families. The first family solves the minmax by (approximately) solving the maximization each time the value of the minimizer is updated. When this is done using first-order methods, it is generally referred to as multi-step gradient descent-ascent, unrolled gradient descent-ascent, or GDmax, and the minimizer is updated by a single gradient descent step whereas the maximizer is updated by several gradient ascent steps. A second family uses single-step methods, where the minimizer and maximizer are updated at each iteration. For both of these first two families, the gradient iterations can include variations such as using different step sizes for the minimization and maximization, or using momentum. A third family, quite different from the other two, includes the gradient from different time steps in the computation, such as the past one (as in optimistic gradient descent-ascent), the midpoint between the current and future points (as in extra-gradient descent-ascent), or a future point (as in proximal point). The literature on first-order methods is very extensive, and we refer to [14,[24][25][26][27][28][29][30][31] and the references within for the exposition of some of these methods and their convergence properties.
In recent years, researchers have also started to work on algorithms that use second-order derivatives to determine the update directions. These algorithms have, for the most part, not attracted as much attention as first-order methods. In Learning with Opponent Learning Awareness (LOLA), the minimizer anticipates the play of the maximizer using the Jacobian of the maximizer's gradient [32,33]. In competitive gradient descent, both minimizer and maximizer use the cross derivative of the Hessian to compute their directions [34]. In follow-the-ridge, the gradient ascent step is corrected by a term that avoids a drift away from local maxima [35]. In total gradient descent-ascent, similarly to LOLA, the descent direction is computed by taking the total derivative of a function which anticipates the maximizer's response to the minimizer [36]. Finally, the complete Newton borrows ideas from follow-the-ridge and total gradient descent-ascent to obtain a Newton method which prioritizes steps towards local minmax [37]. These last three algorithms are shown to only converge towards local minmax under some conditions, but none of them addresses the issue of how to adjust the Hessian far away from a local minmax point.
Recently, some second-order methods have been proposed for the nonconvex-strongly-concave case, where the minimizer update is a descent direction of the objective function at its maximum. They either use cubic regularization [38,39] or randomly perturb the Hessian [40]. Because of some of the assumptions these works make, most importantly the strong concavity of the objective function with respect to the maximizer, they are able to establish complexity analyses and guarantees. It is also worth mentioning that these algorithms are all multi-step based, meaning they (approximately) solve the maximization between each update of the minimizer, whereas our algorithm updates both the minimizer and the maximizer simultaneously.

Minimization
Let f : X → R be a twice continuously differentiable cost function defined on a set X ⊂ R^nx, where nx is a positive integer, and consider the minimization problem

min_{x∈X} f(x). (1)

We recall that a point x* is called a local minimum of f(•) if there exists δ > 0 such that f(x*) ≤ f(x) for all x ∈ {x ∈ X : ∥x − x*∥ < δ}. We will study the properties of Newton-type algorithms to solve (1) in two distinct cases: when X = R^nx and when X is defined by equality and inequality constraints.

Unconstrained minimization
Let X = R^nx, which is referred to as unconstrained minimization in the literature, in which case (1) simplifies to

min_{x∈R^nx} f(x). (2)

If f(•) is twice continuously differentiable in a neighborhood of a point x and ∇x f(x) = 0 with ∇xx f(x) ≻ 0, then x is a (strict) local minimum of (2). An extremely popular method to solve a minimization problem is to use Newton's root-finding method to obtain a point x such that ∇x f(x) = 0. In its most basic form, the algorithm's iterations are given by

x+ = x − ∇xx f(x)^{−1} ∇x f(x), (3)

where we use the notation x+ to designate the value of x at the next iteration. Newton's method's biggest advantage is that it converges very fast near any point that satisfies the first order condition ∇x f(x) = 0: at least linearly, but possibly superlinearly when the function is Lipschitz [16, Theorem 3.6]. However, this is also precisely Newton's method's biggest limitation for nonconvex minimization, because it does not distinguish a local minimum from any other point satisfying the first order condition. Let us further illustrate this limitation with an example.
Example 1 Consider the optimization

min_{x∈R} f(x), with f(x) := x³ − 3x, (4)

for which, ∀x ∈ R, ∇x f(x) = 3x² − 3 and ∇xx f(x) = 6x. The corresponding Newton iteration (3) is of the form

x+ = x − (3x² − 3)/(6x) = (x² + 1)/(2x),

for which both the local minimum x_min := 1 and the local maximum x_max := −1 are locally asymptotically stable equilibria with superlinear convergence. Specifically, x0 > 0 ⇒ x converges to x_min, x0 < 0 ⇒ x converges to x_max, and x0 = 0 ⇒ the iteration fails since ∇xx f(0) = 0 is not invertible.
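This behavior is easy to reproduce numerically by iterating the Newton update for f(x) = x³ − 3x from either side of the origin (a minimal sketch; the starting points and iteration count are arbitrary choices of ours):

```python
def newton(x, steps=50):
    """Pure Newton iteration x+ = x - f'(x)/f''(x) for f(x) = x^3 - 3x."""
    for _ in range(steps):
        x = x - (3 * x**2 - 3) / (6 * x)  # equivalently x+ = (x^2 + 1) / (2x)
    return x

print(newton(0.5))    # converges to the local minimum  x_min = +1
print(newton(-0.5))   # converges to the local maximum  x_max = -1
```

The iteration is attracted to the local maximum just as strongly as to the local minimum, which is exactly the limitation discussed above.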
In order to address this limitation, a widely used modification of Newton's method for unconstrained nonconvex optimization [16, Chapter 3.4] is obtained by modifying the basic Newton method such that dx is obtained from solving the following local quadratic approximation to (1):

dx = arg min_{d} f(x) + ∇x f(x)′ d + ½ d′ (∇xx f(x) + εx(x) I) d, (5)

with εx(x) ≥ 0 chosen such that ∇xx f(x) + εx(x)I is positive definite. For twice differentiable strongly-convex functions we can choose εx(x) = 0, and this corresponds to the classical Newton's method. However, when f(•) is not strongly-convex, the minimization in (5) is only well-defined if ∇xx f(x) + εx(x)I is positive definite, which requires selecting a strictly positive value for εx(x), leading to a modified Newton's method. Regardless of whether f(•) is convex, the positive definiteness of ∇xx f(x) + εx(x)I guarantees that ∇x f(x)′ dx < 0, and therefore dx is a descent direction at x [16]. The corresponding Newton iteration to obtain a local minimum is then given by

x+ = x + dx = x − (∇xx f(x) + εx(x) I)^{−1} ∇x f(x). (6)

Let us analyze how this modification impacts the convergence in our previous example.
Example 1 (Continuation) For the optimization in (4), the modified Newton step in (6) becomes

x+ = x − (3x² − 3)/(6x + εx(x)).

In this case, the positive definiteness of ∇xx f(x) + εx(x)I amounts to

6x + εx(x) > 0. (7)

Selecting the function εx(•) with εx(•) = 0 around x_min results in superlinear convergence to x_min, but if εx(•) > 0, the convergence is only linear. For example, picking εx(x) = −6x + η with η > 0, (7) holds for all x, but the modified Newton step in (6) becomes

x+ = x − (3x² − 3)/η,

which is just a gradient descent.
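The effect of the modification can also be checked numerically. In this sketch we pick εx(x) = max(0, η − 6x) with η = 1, which satisfies (7) everywhere and vanishes near x_min; this particular choice of εx(•) is our own, made only for illustration:

```python
def modified_newton(x, eta=1.0, steps=100):
    """Modified Newton for f(x) = x^3 - 3x, with eps chosen so that f''(x) + eps >= eta > 0."""
    for _ in range(steps):
        eps = max(0.0, eta - 6 * x)            # keeps the modified Hessian positive definite
        x = x - (3 * x**2 - 3) / (6 * x + eps)  # denominator is always >= eta > 0
    return x

# Even when started next to the local maximum x_max = -1, the iteration escapes it
# and converges to the local minimum x_min = +1:
print(modified_newton(-0.9))   # ≈ 1.0
```

In contrast with the unmodified iteration, the local maximum is now a repeller rather than an attractor, as Theorem 1 below formalizes.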
The following result generalizes the conclusion from the previous example by establishing that the positive definiteness of ∇ xx f (x) + ϵ x (x)I not only guarantees that d x is a descent direction, but also that every locally asymptotically stable (LAS) equilibrium point of the Newton iteration (6) is a local minimum.
Theorem 1 (Stability and instability of modified Newton method for unconstrained minimization) Let x be an equilibrium point in the sense that ∇x f(x) = 0. Assume that ∇xx f(x) is invertible and that ∇xx f(•) is differentiable in a neighborhood around x. Then for any function εx(•) that is constant in a neighborhood around x and satisfies ∇xx f(x) + εx(x)I ≻ 0, one has that if:
i) x is a local minimum of (2), then it is a LAS equilibrium of (6);
ii) x is not a local minimum of (2), then it is an unstable equilibrium of (6).
The theorem's first implication is that if the modified Newton iteration starts sufficiently close to a strict local minimum, it will converge at least linearly fast to it. One could think that it would always be preferable to have εx(x) = 0 if ∇xx f(x) ≻ 0, in which case not only is stability trivially obtained, but the Newton method also has superlinear convergence if f(•) is Lipschitz [16, Theorem 3.6]. However, in practice, there are situations in which one might want to take εx(x) > 0. A typical case happens when the smallest eigenvalue of ∇xx f(x) is positive but very small, which might bring numerical issues when computing the Newton step ∇xx f(x)^{−1} ∇x f(x). This issue can be fixed by taking εx(x) > 0, and Theorem 1 guarantees that doing so will not impair (at least locally) the algorithm's capacity to converge towards a local minimum.
The theorem's second implication is, in a way, even more relevant than the first one. As we mentioned earlier, the regular Newton's method (meaning, with εx(x) = 0) is infamously known to be attracted to any point that satisfies ∇x f(x) = 0, regardless of whether it is a local minimum, a saddle point, or a local maximum. What Theorem 1 is essentially saying is that the modified Newton method is only attracted to local minima, and that any other equilibrium point repels the iteration. In essence, this means that the modified Newton's method cannot converge towards a point that is not a local minimum, thus fixing one of the biggest drawbacks of the regular Newton's method.
While it goes beyond the point of this article, notice that for large values of εx(x), (∇xx f(x) + εx(x)I)^{−1} ∇x f(x) ≈ εx(x)^{−1} ∇x f(x), which shows that the modified Newton's step (6) essentially becomes a gradient descent step with a small step size εx(x)^{−1}. This also shows that, by keeping εx(x) sufficiently large, the iteration (6) could be made descent with respect to the cost. However, this would be achieved at the cost of losing superlinear convergence.
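In more than one dimension, a common way to pick εx(x) satisfying the positive definiteness requirement is from the smallest eigenvalue of the Hessian. The eigenvalue-based rule and the threshold η below are our own illustrative choices; [16, Chapter 3.4] discusses alternatives, such as rules based on modified Cholesky factorizations:

```python
import numpy as np

def modified_newton_step(grad, hess, eta=1e-3):
    """Direction d = -(H + eps*I)^{-1} g with eps = max(0, eta - lambda_min(H))."""
    lam_min = np.linalg.eigvalsh(hess)[0]  # smallest eigenvalue of the symmetric Hessian
    eps = max(0.0, eta - lam_min)          # zero whenever H is already sufficiently PD
    return -np.linalg.solve(hess + eps * np.eye(hess.shape[0]), grad)

# At a point where the Hessian is indefinite, the modified step is still well defined
# and is a descent direction:
H = np.array([[1.0, 0.0], [0.0, -2.0]])
g = np.array([0.5, 0.5])
d = modified_newton_step(g, H)
print(g @ d)   # negative: d is a descent direction
```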
Proof of Theorem 1 From our assumption that ∇xx f(x) is invertible, x is a local minimum if and only if ∇xx f(x) ≻ 0. This comes from the second order conditions for minimization [16, Chapter 2].
Let us now prove the stability and instability properties. The first step in our analysis is to calculate the Jacobian of (∇xx f(x) + εx(x)I)^{−1} ∇x f(x), which appears in (6), at an equilibrium point x. Using the differentiability of ∇xx f(•) and the fact that εx(•) is constant in a neighborhood of x, we obtain

∇x[(∇xx f(x) + εx(x)I)^{−1} ∇x f(x)] = (∇xx f(x) + εx(x)I)^{−1} ∇xx f(x) + Σ_i ∇x f(x)_(i) ∇x[(∇xx f(x) + εx(x)I)^{−1}]_i,

where ∇x f(x)_(i) is the i-th element of ∇x f(x) and [(∇xx f(x) + εx(x)I)^{−1}]_i is the i-th column of (∇xx f(x) + εx(x)I)^{−1}. Since (∇xx f(x) + εx(x)I) is positive definite, ∇x[(∇xx f(x) + εx(x)I)^{−1}]_i is well defined, and since x is an equilibrium point, ∇x f(x)_(i) = 0 for i ∈ {1, ..., nx}; therefore the Jacobian of the right-hand side of (6) is given by

I − (∇xx f(x) + εx(x)I)^{−1} ∇xx f(x). (8)

The main argument of the proof is based on the following result. Let v be an eigenvector associated with an eigenvalue ρ of (8). Then

(I − (∇xx f(x) + εx(x)I)^{−1} ∇xx f(x)) v = ρv ⟺ (ρ ∇xx f(x) + (ρ − 1) εx(x)I) v = 0. (9)

Therefore, ρ is an eigenvalue of (8) if and only if ρ∇xx f(x) + (ρ−1)εx(x)I is singular.
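Solving the singularity condition for ρ makes the stability dichotomy explicit. Writing λ_i for an eigenvalue of ∇xx f(x) at the equilibrium (a direct consequence of (9), under the theorem's assumptions):

```latex
% From (9), with \epsilon := \epsilon_x(\bar x) and \lambda_i an eigenvalue of
% \nabla_{xx} f(\bar x): \rho_i \lambda_i + (\rho_i - 1)\epsilon = 0, hence
\rho_i \;=\; \frac{\epsilon}{\lambda_i + \epsilon}.
% Since \lambda_i + \epsilon > 0 by the positive definiteness of the modified Hessian:
%   \lambda_i > 0 \;\Longrightarrow\; 0 < \rho_i < 1   (contracting direction),
%   \lambda_i < 0 \;\Longrightarrow\; \rho_i > 1       (expanding direction: instability).
```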
We remind the reader that, given a discrete-time dynamical system whose dynamics are continuously differentiable, a point is a LAS equilibrium point if all the eigenvalues of the linearized system are inside the unit circle. Conversely, if at least one of the eigenvalues of the linearized system is outside the unit circle, then the equilibrium is unstable [41, Chapter 8].
From (9), ρ = 0 is an eigenvalue if and only if εx(x) = 0, which, by construction, can only happen if x is a local minimum, in which case x is a LAS equilibrium point of (6), as expected.

Constrained minimization
Our results from the previous section can also be extended to the case of more general constraints, with the minimization set X involving equality and inequality constraints of the form

X := {x ∈ R^nx : Gx(x) = 0, Fx(x) ≤ 0},

where the functions Gx : R^nx → R^lx and Fx : R^nx → R^mx are all twice continuously differentiable. It will be convenient for the development of the primal-dual interior-point method to use slack variables sx ∈ R^mx and rewrite (1) as

min_{x, sx} f(x) subject to Gx(x) = 0, Fx(x) + sx = 0, sx ≥ 0. (10)

Similar to what we have in unconstrained minimization, we want second order conditions to determine whether a point is a local minimum. Consider the function

L(z) := f(x) + νx′ Gx(x) + λx′ (Fx(x) + sx),

where we use the shorthand notation z := (x, sx, νx, λx); L(z) is essentially the Lagrangian of (10). In order to present the second order conditions, we need to define two concepts, the linear independence constraint qualification and strict complementarity [16, Definitions 12.4 and 12.5].
Definition 1 (LICQ and strict complementarity) Let the set of active inequality constraints for the minimization be defined by

A(x) := {i : Fx^(i)(x) = 0},

where Fx^(i)(x) denotes the i-th element of Fx(x). Then:
• The linear independence constraint qualification (LICQ) is said to hold at z if the vectors in the set {∇x Gx^(i)(x) : i ∈ {1, ..., lx}} ∪ {∇x Fx^(i)(x) : i ∈ A(x)} are linearly independent.
• Strict complementarity is said to hold at z if λx^(i) > 0 for every i ∈ A(x).
We now have almost all the ingredients to present the second order condition for constrained minimization. For unconstrained minimization, a sufficient condition for a point x to be a local minimum is that ∇x f(x) = 0 and ∇xx f(x) ≻ 0. If it were not for the inequality constraints in (10), we would be able to state the second order conditions using gradients and Hessians of L(z). The inequality constraints make the statement a bit more complicated. The role of the gradient will be played by

g(z, b) := (∇x f(x) + ∇x Gx(x) νx + ∇x Fx(x) λx, λx ⊙ sx − b1, Gx(x), Fx(x) + sx), (11)

with ⊙ denoting the element-wise Hadamard product of two vectors and b ≥ 0 the barrier parameter (its role will be explained shortly). The role of ∇xx f(x) in the unconstrained minimization will be played by the matrix

Hzz f(z) := [ ∇xx L(z), 0, ∇x Gx(x), ∇x Fx(x) ;
              0, diag(λx), 0, diag(sx)^{1/2} ;
              ∇x Gx(x)′, 0, 0, 0 ;
              ∇x Fx(x)′, diag(sx)^{1/2}, 0, 0 ]. (12)

We also remind the reader that the inertia inertia(A) of a symmetric matrix A is a 3-tuple with the number of positive, negative and zero eigenvalues of A.
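As an illustration of the perturbed condition g(z, b) = 0, consider the toy problem min_x x² subject to x ≥ 1, i.e., Fx(x) = 1 − x with no equality constraints. For b > 0 the perturbed system can be solved in closed form, and the residual vanishes there. The example problem and the ordering of the residual blocks below are our own (a sketch only):

```python
import math

def g_residual(x, s, lam, b):
    """Residual of the perturbed KKT system for min x^2 s.t. 1 - x <= 0 (slack s, multiplier lam)."""
    stationarity = 2 * x - lam        # grad f + grad F * lam = 2x + (-1)*lam
    complementarity = lam * s - b     # lam ⊙ s - b 1
    primal = (1 - x) + s              # F(x) + s = 0
    return (stationarity, complementarity, primal)

b = 0.01
x = (1 + math.sqrt(1 + 2 * b)) / 2    # solves 2x(x - 1) = b, making the residual zero
s, lam = x - 1, 2 * x
print(g_residual(x, s, lam, b))       # ≈ (0, 0, 0); as b -> 0, x -> 1 (the constrained minimum)
```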
Proposition 1 (Second order sufficient conditions for constrained minimization) Let z be an equilibrium point in the sense that g(z, 0) = 0 with λx, sx ≥ 0. If the LICQ and strict complementarity hold at z and

inertia(Hzz f(z)) = (nx + mx, lx + mx, 0), (13)

then x is a local minimum of (10).
While this result is relatively well known, we present its proof in Appendix A. The proof also makes it easier to understand the proof of the second order sufficient conditions for constrained minmax optimization.

Primal-dual interior-point method
Let dz := (dx, ds, dν, dλ) be the update direction for z, which will play an equivalent role to dx in the unconstrained case. A basic primal-dual interior-point method finds a candidate solution to (10) using the iterations

z+ = z − α ∇z g(z, b)^{−1} g(z, b), (14)

where the barrier parameter b is slowly decreased to 0, so that z converges to a root of g(z, 0) = 0, while α ∈ (0, 1] is chosen at each step such that the feasibility conditions λx, sx > 0 hold [16, Chapter 19]. This basic primal-dual interior-point method has similar limitations to a (non-modified) Newton method for unconstrained minimization: it might converge towards an equilibrium point that is not a local minimum, and ∇z g(z, b) might not be invertible. Similar to what we have done in the unconstrained case, we can modify this basic primal-dual interior-point method such that the update direction dz is obtained from a quadratic program that locally approximates (10). The rest of this section will be spent mostly constructing such a quadratic program. Let us start with X described only by equality constraints (i.e., no Fx(x) and no sx), in which case the update direction is obtained from

min_{dx} ∇x L(z)′ dx + ½ dx′ (∇xx L(z) + εx(z) I) dx subject to Gx(x) + ∇x Gx(x)′ dx = 0, (15)

which locally approximates (10) around (x, νx). If ∇x Gx(x) is full column rank, we can choose εx(z) large enough such that the solution of (15) is well defined and unique. To show that, let us look at (15) as an optimization in its own right. Let d̄ν be the Lagrange multiplier and define the function ḡ(d̄x, d̄ν), which is the function g(z, b) defined in (11) but now for problem (15):

ḡ(d̄x, d̄ν) := (∇x L(z) + (∇xx L(z) + εx(z)I) d̄x + ∇x Gx(x) d̄ν, Gx(x) + ∇x Gx(x)′ d̄x). (16)

So if one takes any εx(z) ≥ 0 large enough such that

inertia([∇xx L(z) + εx(z)I, ∇x Gx(x); ∇x Gx(x)′, 0]) = (nx, lx, 0), (17)

then we guarantee that any point (d̄x, d̄ν) that satisfies ḡ(d̄x, d̄ν) = 0 will be a strict local minimum of (15) (see Proposition 1). Moreover, this choice of εx(z) also guarantees that (15) is a strongly convex quadratic optimization, which, together with the fact that ∇x Gx(x) is full column rank, means that the solution (d̄x, d̄ν) is unique. Therefore, we will take the update directions (dx, dν) to be the solution (d̄x, d̄ν). Moreover, with some
algebra, one can show that the solution to (15) is given by

(d̄x, d̄ν) = −[∇xx L(z) + εx(z)I, ∇x Gx(x); ∇x Gx(x)′, 0]^{−1} (∇x L(z), Gx(x)).

Let us now address the case in which there are inequality constraints. The challenge is to take into account the constraint sx ≥ 0. To address this, let us start by relaxing the inequality constraint from (10) and including it in the cost as the barrier function

min_{x, sx} f(x) − b 1′ log(sx) subject to Gx(x) = 0, Fx(x) + sx = 0. (18)

This is a relaxation because −b1′ log(sx) only accepts sx ≥ 0 and goes to +∞ if sx → 0. The optimization (18) only has equality constraints, so, similar to what we did in (15), let us construct a local second order approximation of (18) around z (recall that if x* is a local minimum of (10) with associated Lagrange multiplier νx*, then x* is also a local minimum of the problem of minimizing f(x) + νx*′ Gx(x) subject to the same constraints; evidently, νx* is not known in advance, so instead one uses the value of νx at the current iteration, which is what leads to the local approximation (15)):

min_{dx, ds} ∇x L(z)′ dx + (λx − b(1 ⊘ sx))′ ds + ½ dx′ (∇xx L(z) + εx(z)I) dx + ½ ds′ diag(λx ⊘ sx) ds
subject to Gx(x) + ∇x Gx(x)′ dx = 0, Fx(x) + sx + ∇x Fx(x)′ dx + ds = 0, (19)
where ⊘ designates the element-wise division of two vectors. Equation (19) is not exactly a second order approximation because, instead of using as quadratic term for ds the matrix b diag(sx)^{−2} (which is the actual matrix given by the second order approximation of −b1′ log(sx + ds) around sx), we used the matrix diag(λx ⊘ sx). This is a relatively well known substitution for interior-point methods, and is what makes this a primal-dual interior-point method instead of a barrier interior-point method. The technical justification is that, if we were at a point such that g(z, b) = 0, the two would be equivalent, as λx ⊙ sx − b1 = 0. In practice, it has been observed that this modified linearization tends to perform better because it provides directions ds that also take into account the current value of λx in the quadratic form, which helps to get a direction dz that does not violate the constraints λx, sx > 0 [16, Chapter 19.3]. Because (19) is a quadratic program with linear equality constraints, just as was the case for (15), we can use the exact same reasoning to choose εx(z). Let us define the matrices

Jzz := [ ∇xx L(z), 0, ∇x Gx(x), ∇x Fx(x) ;
         0, diag(λx ⊘ sx), 0, I ;
         ∇x Gx(x)′, 0, 0, 0 ;
         ∇x Fx(x)′, I, 0, 0 ] (20)

and E(z) := diag(εx(z)1nx, 0mx+lx+mx). If εx(z) is chosen large enough such that inertia(Jzz + E(z)) = (nx + mx, lx + mx, 0), then the solution (d̄x, d̄s) of (19) and the associated Lagrange multipliers (d̄ν, d̄λ) are unique. With some algebra, one could show that the solution of (19) is

dz = −(Jzz + E(z))^{−1} S^{−1} g(z, b),

where S := diag(1nx, sx, 1lx+mx). Putting it all together, the modified primal-dual interior-point method is governed by the equation

z+ = z − α (Jzz + E(z))^{−1} S^{−1} g(z, b), (21)

where α ∈ (0, 1] is chosen such that λx, sx > 0.
Conveniently, because we used diag(λx ⊘ sx) for the second order linearization of the barrier, when εx(x) = 0, we recover the basic primal-dual interior-point method from (14). We refer to [16, Chapter 19] for a complete description of an algorithm using (21), including a strategy to decrease the barrier parameter b. Alternatively, we describe such a strategy in Section 4 for the minmax optimization case.
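To make the primal-dual update concrete, here is a sketch of a few iterations of (21) on the toy problem min x² subject to x ≥ 1 (so Fx(x) = 1 − x, no equality constraints, n = m = 1). The block layout of the linear system follows the standard primal-dual arrangement, and the fraction-to-boundary constant 0.995 is our own choice:

```python
import numpy as np

# z = (x, s, lam); each iteration solves a Newton-like system on the scaled residual
# and damps the step so that s and lam stay strictly positive (fraction-to-boundary).
b, eps = 1e-6, 0.0          # fixed small barrier parameter; no Hessian modification needed here
x, s, lam = 2.0, 1.0, 1.0   # strictly feasible starting point (s, lam > 0)
for _ in range(100):
    r = np.array([2 * x - lam,        # stationarity: grad f + grad F * lam
                  lam - b / s,        # scaled complementarity: (lam*s - b)/s
                  (1 - x) + s])       # primal feasibility: F(x) + s
    J = np.array([[2.0 + eps, 0.0, -1.0],  # [hess L + eps, 0,           grad F]
                  [0.0, lam / s, 1.0],     # [0,            diag(lam/s), I     ]
                  [-1.0, 1.0, 0.0]])       # [grad F',      I,           0     ]
    d = np.linalg.solve(J, -r)
    alpha = 1.0                            # fraction-to-boundary rule for s and lam
    for v, dv in ((s, d[1]), (lam, d[2])):
        if dv < 0:
            alpha = min(alpha, 0.995 * v / (-dv))
    x, s, lam = x + alpha * d[0], s + alpha * d[1], lam + alpha * d[2]

print(round(x, 4), round(lam, 4))   # ≈ 1.0 2.0: the constrained minimum and its multiplier
```

A full solver would additionally decrease b between iterations; here b is held at a small constant only to keep the sketch short.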
We can now state a result connecting the stability/instability of any equilibrium point of the modified primal-dual interior-point method to whether such a point is a local minimum. The theorem says essentially the same thing as Theorem 1: On the one hand, even if inertia(Jzz f(z)) = (nx + mx, lx + mx, 0), taking εx(z) > 0 will not impair the algorithm's capacity to converge towards a local minimum; this can be useful, for instance, if Jzz f(z) has an eigenvalue close to 0. On the other hand, using the modified primal-dual interior-point method essentially guarantees that the algorithm can only converge towards an equilibrium point if such a point is a local minimum, thus fixing the issue of primal-dual interior-point methods being attracted to any equilibrium point, regardless of whether such a point is a local minimum.
Theorem 2 (Stability and instability of modified primal-dual interior-point method for constrained minimization) Let α = 1 and let (z, b) with b > 0 be an equilibrium point in the sense that g(z, b) = 0. Assume that the LICQ and strict complementarity hold at z, that Jzz f(z) is invertible, and that Jzz f(•) is differentiable in a neighborhood around z. Then for any function εx(•) that is constant in a neighborhood around z and satisfies inertia(Jzz + E(z)) = (nx + mx, lx + mx, 0), one has that if:
i) z is a local minimum of (10), then it is a LAS equilibrium of (21);
ii) z is not a local minimum of (10), then it is an unstable equilibrium of (21).
Proof sketch First, using the same arguments as in the proof of Theorem 1, we conclude that the Jacobian of the dynamic system (21) around a point z for which g(z, b) = 0 is

I − (Jzz + E(z))^{−1} Jzz f(z).

Second, it is straightforward to check that Hzz f(z) = S^{1/2} Jzz f(z) S^{1/2}, which, using Sylvester's law of inertia [42, Theorem 1.5], means that inertia(Hzz f(z)) = inertia(Jzz f(z)). This means that one can check the second order conditions in (13) using Jzz f(z).
Let us define the matrix

R(µ) := Zx(z)′ ([∇xx L(z), 0; 0, diag(λx ⊘ sx)] + µ diag(εx(z)1nx, 0mx)) Zx(z),

where Zx(z) ∈ R^{(nx+mx)×(nx−lx)} is a matrix with full column rank such that [∇x Gx(x)′, 0; ∇x Fx(x)′, I] Zx(z) = 0. Using the same arguments as in the proof of Proposition 1, we conclude that inertia(Jzz + E(z)) = (nx + mx, lx + mx, 0) is equivalent to R(1) ≻ 0, and that the second order sufficient condition is equivalent to R(0) ≻ 0. This means that the rest of the theorem's proof is analogous to that of Theorem 1, but instead of looking at the sign of the smallest eigenvalue of ∇xx f(x) + µεx(z)I, one looks at the sign of the smallest eigenvalue of the matrix R(µ).

Minmax optimization
Consider the minmax optimization problem

min_{x∈X} max_{y∈Y(x)} f(x, y), (24)

where f : R^nx × R^ny → R is a twice continuously differentiable objective function, X ⊂ R^nx is the feasible set for x, and Y : X ⇒ R^ny is a set-valued map that defines an x-dependent feasible set for y; we do not make any convexity or concavity assumptions on f(•), X, or Y(•). We chose Y(•) to be dependent on x because this describes the most general application. Moreover, having the constraints of the inner maximization depend on the value of the outer minimization variable is often necessary in problems such as robust Model Predictive Control or in bilevel optimization. Furthermore, notice that we do not make any assumption on whether the min and the max commute (and this would not even be well defined, as Y(•) depends on x). A solution (x*, y*) to (24) is called a global minmax and satisfies

f(x*, y) ≤ f(x*, y*) ≤ max_{ỹ∈Y(x)} f(x, ỹ), ∀x ∈ X, ∀y ∈ Y(x*).

We will look at two representations of X and Y(•): first, when X = R^nx and Y = R^ny, which is known in the literature as the unconstrained case; second, a more general representation in which X and Y(•) are defined using equality and inequality constraints.
A point (x*, y*) is said to be a local minmax of (24) if there exist a constant δ0 > 0 and a positive function h(•) satisfying h(δ) → 0 as δ → 0, such that for every δ ∈ (0, δ0] and every (x, y) ∈ {x ∈ X : ∥x − x*∥ ≤ δ} × {y ∈ Y(x*) : ∥y − y*∥ ≤ h(δ)} we have f(x*, y) ≤ f(x*, y*) ≤ max_{ỹ∈Y(x*): ∥ỹ−y*∥≤h(δ)} f(x, ỹ) [14,15]. Inspired by the properties of the modified Newton and primal-dual interior-point methods for minimization in Section 2, we want to develop a Newton-type iterative algorithm of the form (25), where dx and dy satisfy the following properties: P1: At each time step, (dx, dy) is obtained from the solution of a quadratic program that locally approximates (24), and therefore (x+, y+) can be seen as an improvement over (x, y). This acts as a surrogate for guiding the modified Newton method towards a solution at each step.
P2: The iterations of (25) can converge towards an equilibrium point only if such a point is a local minmax. As was the case in minimization (see Example 1), a pure Newton method is attracted to any equilibrium point; this property makes sure that the iterations will not be attracted to equilibrium points that are not local minmax. P3: The iterations of (25) can converge to any local minmax. This property means that any modification of Newton's method needs to keep local minmax points as attractors.

Unconstrained minmax
We start by considering the case where X = R^nx and Y(•) = R^ny, such that (24) simplifies to (26). For this case, [14] establishes second order sufficient conditions for a point (x, y) to be a local minmax, which can be stated in terms of the inertia of the matrix ∇zz f(x, y). We recall that the inertia inertia(A) of a symmetric matrix A is the 3-tuple containing the numbers of positive, negative, and zero eigenvalues of A.
The second order conditions in [14] are: inertia(∇yy f(x, y)) = (0, ny, 0) and ∇xx f(x, y) − ∇xy f(x, y)∇yy f(x, y)^{−1}∇yx f(x, y) ≻ 0, which turn out to be equivalent to the inertia conditions in Proposition 2 in view of the Haynsworth inertia additivity formula [42, Theorem 1.6]. Notice that the second order sufficient conditions are not symmetric. A point might be a local minmax even if ∇xx f(x, y) ⊁ 0, as long as −∇xy f(x, y)∇yy f(x, y)^{−1}∇yx f(x, y) (which is positive semidefinite) is large enough. The second order conditions are thus what allow one to distinguish an equilibrium point that is a local minmax from one that is a minmin, maxmax, or maxmin. One can interpret the second order sufficient conditions as saying that y ↦ f(x, y) is strongly concave in a neighborhood around (x, y) and that x ↦ max_{ỹ:∥y−ỹ∥<δ} f(x, ỹ) is strongly convex in a neighborhood around (x, y), for some δ > 0. Notice that these are only local properties around local minmax points, as f(•) may be nonconvex-nonconcave away from them.
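These inertia conditions are straightforward to test numerically. The sketch below (the helper names are ours, not the paper's) checks the two conditions of Proposition 2 on a small quadratic example in which ∇xx f ⊁ 0 but the Schur complement is large enough for the point to be a local minmax:

```python
import numpy as np

def inertia(A, tol=1e-10):
    """Return (n_pos, n_neg, n_zero): the numbers of positive, negative,
    and zero eigenvalues of the symmetric matrix A."""
    w = np.linalg.eigvalsh(A)
    return (int((w > tol).sum()), int((w < -tol).sum()), int((np.abs(w) <= tol).sum()))

def is_local_minmax_candidate(Hxx, Hxy, Hyy):
    """Second order sufficient conditions of Proposition 2:
    inertia(Hyy) = (0, ny, 0) and inertia([[Hxx, Hxy], [Hxy', Hyy]]) = (nx, ny, 0)."""
    nx, ny = Hxx.shape[0], Hyy.shape[0]
    Hzz = np.block([[Hxx, Hxy], [Hxy.T, Hyy]])
    return inertia(Hyy) == (0, ny, 0) and inertia(Hzz) == (nx, ny, 0)

# f(x, y) = -x^2/2 + 2xy - y^2/2 at (0, 0): Hxx = -1 is NOT positive
# definite, but the Schur complement -Hxy Hyy^{-1} Hyx = 4 dominates,
# so the conditions still certify a local minmax.
print(is_local_minmax_candidate(np.array([[-1.0]]), np.array([[2.0]]), np.array([[-1.0]])))  # True
```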
In order to obtain property P1, we propose to obtain the Newton direction (dx, dy) for (25) by solving the local quadratic approximation (28) to (26), with ϵx(•) and ϵy(•) chosen so that the minmax problem in (28) has a unique solution. This means that the inner (quadratic) maximization must be strictly concave and that the outer (quadratic) minimization of the maximized function must be strictly convex, which turns out to be precisely the second order sufficient conditions of Proposition 2 applied to the approximation in (28). Explicitly: inertia(∇yy f(x, y) − ϵy(x, y)I) = (0, ny, 0) and inertia(∇zz f(x, y) + E(x, y)) = (nx, ny, 0), where E(x, y) = diag(ϵx(x, y)1nx, −ϵy(x, y)1ny). We call these conditions the Local Quadratic Approximation Conditions (LQAC). It is straightforward to show that the Newton iteration (25), with (dx, dy) obtained from the solution to (28), is given by (29). To obtain properties P2 and P3, we need all locally asymptotically stable equilibrium points of (29) to be local minmax of (26) and all other equilibrium points of (29) to be unstable. For the unconstrained minimization in Section 2.1, to obtain the equivalent of properties P2 and P3 it was sufficient to select ϵx(•) such that the local quadratic approximation (5) has a well-defined minimum (Theorem 1). For minmax optimization, however, the (LQAC) do not suffice to guarantee that P2 and P3 hold. Our first counterexample below shows that the (LQAC) are not enough to ensure P2; our second counterexample shows that they are not enough to guarantee P3.
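Assuming the update has the closed form (x+, y+) = (x, y) − (∇zz f + E)^{−1} ∇z f (our reading of the iteration in (29); the paper's exact display is not reproduced here), one modified Newton step can be sketched as:

```python
import numpy as np

def modified_newton_step(grad, hess, nx, eps_x, eps_y):
    """Solve (H + E) d = -g with E = diag(eps_x*I_nx, -eps_y*I_ny);
    eps_x and eps_y are assumed to satisfy the (LQAC), so that the
    local quadratic model has a unique minmax solution."""
    ny = hess.shape[0] - nx
    E = np.diag(np.r_[eps_x * np.ones(nx), -eps_y * np.ones(ny)])
    return np.linalg.solve(hess + E, -grad)

# Quadratic toy problem f(x, y) = x^2 + 2xy - y^2, local minmax at (0, 0).
def grad_f(z):
    x, y = z
    return np.array([2 * x + 2 * y, 2 * x - 2 * y])

H = np.array([[2.0, 2.0], [2.0, -2.0]])
z = np.array([1.0, -0.5])
z = z + modified_newton_step(grad_f(z), H, nx=1, eps_x=0.0, eps_y=0.0)
print(z)  # lands exactly on the local minmax (0, 0), since f is quadratic
```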
The main contribution of this section is a set of sufficient conditions that, in addition to (LQAC), guarantee P2 and P3 hold.
Theorem 3 (Stability and instability of the modified Newton method for unconstrained minmax) Let (x, y) be an equilibrium point in the sense that ∇x f(x, y) = 0 and ∇y f(x, y) = 0. Assume that ∇zz f(x, y) and ∇yy f(x, y) are invertible and that ∇zz f(•) is differentiable in a neighborhood of (x, y). Then there exist functions ϵx(•) and ϵy(•) that are constant in a neighborhood of (x, y), satisfy the (LQAC) at (x, y), and guarantee that: i) if (x, y) is a local minmax of (26), then it is a LAS equilibrium of (29); ii) if (x, y) is not a local minmax of (26), then it is an unstable equilibrium of (29).
The theorem's implications are similar to those of Theorem 1. On the one hand, if (x, y) is a local minmax, then it is possible to construct functions ϵx(•) and ϵy(•) that guarantee that the modified Newton method can converge towards it. A natural choice for such functions near a local minmax is ϵy(•) = ϵx(•) = 0, which not only provides the stability result, but can also achieve superlinear convergence if the Hessian of f(•) is Lipschitz. On the other hand, if (x, y) is an equilibrium point but not a local minmax, it is possible to construct functions ϵx(•) and ϵy(•) such that the algorithm's iterations cannot converge towards it. This means that the modified Newton method for minmax can only converge towards an equilibrium point if such a point is a local minmax.
While the statement of Theorem 3 is about existence, the proof is actually constructive. The functions ϵx(•) and ϵy(•) are not unique, and have to satisfy the following conditions: i) For the stability result: if ϵy(x, y) = 0, then the stability property is guaranteed by any ϵx(x, y) ≥ 0; if ϵy(x, y) > 0, then ϵx(x, y) needs to be taken large enough to satisfy the condition in equation (32) of the proof. ii) For the instability result: • unless inertia(∇yy f(x, y)) ̸= (0, ny, 0) and inertia(∇zz f(x, y)) = (nx, ny, 0), it is sufficient for ϵx(x, y) and ϵy(x, y) to satisfy the (LQAC) to guarantee instability; • if inertia(∇yy f(x, y)) ̸= (0, ny, 0) and inertia(∇zz f(x, y)) = (nx, ny, 0), then for a given ϵy(x, y), ϵx(x, y) needs to be large enough that inertia(∇zz f(x, y) + µE(x, y)) ̸= (nx, ny, 0) for some µ ∈ (0, 1).
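These selection rules suggest a simple numerical search. The sketch below is our own illustration for the unconstrained case, satisfying the (LQAC) by construction; it uses eigenvalues for clarity, whereas Algorithm 1 (Section 4) relies on factorization-based inertia tests, and it does not implement the extra instability branch involving µ:

```python
import numpy as np

def select_epsilons(Hzz, nx, delta=1e-6):
    """Pick eps_y so that Hyy - eps_y*I is negative definite, then grow
    eps_x until inertia(Hzz + E) = (nx, ny, 0), i.e. until the (LQAC) hold."""
    ny = Hzz.shape[0] - nx
    lam_max = np.max(np.linalg.eigvalsh(Hzz[nx:, nx:]))
    eps_y = 0.0 if lam_max < -delta else lam_max + 1.0

    def lqac_hold(ex):
        E = np.diag(np.r_[ex * np.ones(nx), -eps_y * np.ones(ny)])
        w = np.linalg.eigvalsh(Hzz + E)
        return (w > delta).sum() == nx and (w < -delta).sum() == ny

    eps_x = 0.0
    while not lqac_hold(eps_x):
        eps_x = 2 * eps_x if eps_x > 0 else 1.0  # doubling search
    return eps_x, eps_y

# Equilibrium with the "wrong" curvature in y (a maxmin-type point):
eps_x, eps_y = select_epsilons(np.diag([-2.0, 2.0]), nx=1)
print(eps_x, eps_y)  # 4.0 3.0
```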
We use these results in Section 4 to present an efficient way to numerically construct these functions.
Proof of Theorem 3 The fact that the (LQAC) can always be satisfied is straightforward: as ∇zz f(x, y) is differentiable, its eigenvalues are bounded and it can be made to have the desired inertia by taking sufficiently large (but finite) values of ϵx(x, y) and ϵy(x, y). Moreover, from our assumption that ∇zz f(x, y) and ∇yy f(x, y) are invertible, (x, y) is a local minmax point if and only if it satisfies the second order sufficient conditions in (27); this is implied by the second order necessary conditions for local minmax in [14].

Constrained minmax
We now consider the case of more general constraint sets involving equality and inequality constraints of the form (35), where the functions Gx : R^nx → R^lx, Fx : R^nx → R^mx, Gy : R^nx × R^ny → R^ly and Fy : R^nx × R^ny → R^my are all twice continuously differentiable. Similar to what we did in Section 2.2, it will be convenient for the development of the primal-dual interior-point method to use slack variables and rewrite the constrained minmax (24) as min_{x,sx: Gx(x)=0, Fx(x)+sx=0, sx≥0} max_{y,sy: Gy(x,y)=0, Fy(x,y)+sy=0, sy≥0} f(x, y), (36) where sx ∈ R^mx and sy ∈ R^my.
Similar to what we have done in the unconstrained case, we want to present second order conditions to determine whether a point is a constrained local minmax. In order to do so, we need to extend some fundamental concepts of constrained minimization to constrained minmax optimization. The function L(z) will play a role equivalent to the Lagrangian, with (νx, νy, λx, λy) the equivalent of Lagrange multipliers; we use the shorthand notation z = (x, sx, y, sy, νy, λy, νx, λx). Furthermore, we use the following definitions of linear independence constraint qualification (LICQ) and strict complementarity for minmax optimization: Definition 2 (LICQ and strict complementarity for minmax) Let the sets of active inequality constraints for the minimization and for the maximization be defined, respectively, by Ax(x) = {i = 1, . . ., mx : Fx^i(x) = 0} and Ay(x, y) = {i = 1, . . ., my : Fy^i(x, y) = 0}, where Fx^i(x) and Fy^i(x, y) denote the i-th elements of Fx(x) and Fy(x, y). Then: • The linear independence constraint qualification (LICQ) is said to hold at z if the vectors in the corresponding sets of gradients of equality and active inequality constraints are linearly independent. We now have almost all the ingredients to present the second order conditions for constrained minmax. For the unconstrained minmax optimization, the second order conditions in Proposition 2 required that the gradients (∇x f(x, y) and ∇y f(x, y)) be equal to zero and that the Hessians (∇zz f(x, y) and ∇yy f(x, y)) have a particular inertia. Analogously to the constrained minimization in Section 2.2, if it were not for the inequality constraints in (35), we would be able to state the second order conditions using gradients and Hessians of L(z). The inequality constraints make the statement a bit more complicated. The role of the gradient will be played by the function g(z, b), where ⊙ denotes the element-wise Hadamard product of two vectors and b ≥ 0 is the barrier parameter; g(•) is the extension to minmax of the function g(•) defined in (11) for the minimization. The role of ∇yy f(x, y) will be played by Hyy f(z), and the role of ∇zz f(x, y) by Hzz f(z), with blocks defined in (37).
Proposition 3 (Second order sufficient conditions for constrained minmax) Let z be an equilibrium point in the sense that g(z, 0) = 0 with λy, λx, sy, sx ≥ 0. If the LICQ and strict complementarity hold at z, and inertia(Hyy f(z)) = (ly + my, ny + my, 0) and inertia(Hzz f(z)) = (nx + mx + ly + my, lx + mx + ny + my, 0), (38) then (x, y) is a local minmax of (24).
Similar to the second order sufficient conditions for unconstrained minmax in Proposition 2, the conditions in (38) are not symmetric, highlighting the distinction between the minimizer and the maximizer. Moreover, also as in Proposition 2, one can interpret the second order sufficient conditions for constrained minmax as saying that the optimization max_{y∈Y(x)} f(x, y) is strongly concave in a neighborhood around (x, y) and that the optimization min_{x∈X} ϕ(x), with ϕ(x) := max_{ỹ∈Y(x): ∥y−ỹ∥<δ} f(x, ỹ), is strongly convex in a neighborhood around (x, y), for some δ > 0.
The conditions of Proposition 3 are slightly stricter than the ones in [15], as we require strict complementarity and the LICQ both for the max and for the min. In exchange, our conditions allow one to verify whether a point is a local minmax using the inertia, instead of having to compute solution cones. We prove in Appendix A that, under these stricter assumptions, our conditions are equivalent to those in [15].

Primal-dual interior-point method
Let dz = (dx, dsx, dy, dsy, dνy, dλy, dνx, dλx) be shorthand notation for the update direction of the variables z = (x, sx, y, sy, νy, λy, νx, λx). Similar to the basic primal-dual interior-point method introduced in Section 2.2, a basic primal-dual interior-point method for minmax finds a candidate solution to (36) using the iterations (39), where the barrier parameter b is slowly decreased to 0 so that z converges to a root of g(z, 0) = 0, while α ∈ (0, 1] is chosen at each step such that the feasibility conditions λy, λx, sy, sx > 0 hold. We want to modify this basic primal-dual interior-point method so that it satisfies the properties P1, P2 and P3. In order to obtain property P1, we propose to obtain dz from the solution of a quadratic program that locally approximates (36). Using arguments equivalent to those in the development of the quadratic program (19) for the constrained minimization in Section 2.2, we obtain the objective function (40), where ϵx(z) ≥ 0 and ϵy(z) ≥ 0 are scalars and ⊘ designates the element-wise division of two vectors. The feasible set dX for (dx, dsx) and the set-valued map that defines a feasible set dY(dx) for (dy, dsy) are obtained from the first order linearization of the functions defining X and Y(•), and ϵx(z) and ϵy(z) are chosen such that the solution to (40) is unique. We can apply to (40) the second order conditions from Proposition 3 and obtain that ϵx(z) and ϵy(z) need to satisfy the corresponding inertia conditions, where Jzz f(z) is the equivalent of the matrix defined in (37b) for the problem (40), and can be shown to satisfy Hzz f(z) = S^{1/2} Jzz f(z) S^{1/2} with S = diag(1nx, sx, 1ny, sy, 1ly+my+lx+mx); Jyy f(z) is the partition of Jzz f(z) equivalent to the partition Hyy f(z) of Hzz f(z). We will call these conditions the Constrained Local Quadratic Approximation Conditions (ConsLQAC). In this case, it is straightforward to show that modifying the basic primal-dual interior-point iterations in (39) by taking dz
from the solution of (40) leads to the iterations (42). Analogously to the unconstrained minmax case, choosing ϵx(z) and ϵy(z) such that the (ConsLQAC) hold is not sufficient to guarantee that P2 and P3 hold for the modified primal-dual interior-point method (a counterexample can be found in Section 4.2). Our next theorem is the extension of Theorem 3 to the modified primal-dual interior-point method, and it has the equivalent consequences. For property P3 to hold, as long as ϵx(z) is large enough, taking ϵy(z) > 0 will not impair the algorithm's capacity to converge towards a local minmax; this can be useful, for instance, if Jzz f(z) has an eigenvalue close to 0. For property P2 to hold, i.e., to guarantee that the modified primal-dual interior-point method cannot converge towards an equilibrium point that is not a local minmax, the (ConsLQAC) are sufficient only when inertia(Jzz f(z)) ̸= (nx + mx + ly + my, lx + mx + ny + my, 0). Otherwise, ϵx(z) needs to be taken large enough that inertia(Jzz f(z) + µE(z)) ̸= (nx + mx + ly + my, lx + mx + ny + my, 0) for some µ ∈ (0, 1).
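The selection of α is only described implicitly above (it must keep λy, λx, sy, sx strictly positive). A common way to implement this, which we assume here rather than take from the paper, is the standard fraction-to-the-boundary rule from interior-point methods:

```python
import numpy as np

def fraction_to_boundary(v, dv, tau=0.995):
    """Largest alpha in (0, 1] such that v + alpha*dv >= (1 - tau)*v,
    keeping v strictly positive. Applied to the stacked slacks and
    multipliers (s_x, s_y, lambda_x, lambda_y) before each update."""
    shrinking = dv < 0
    if not np.any(shrinking):
        return 1.0
    return min(1.0, float(np.min(-tau * v[shrinking] / dv[shrinking])))

s = np.array([1.0, 0.5, 2.0])
ds = np.array([-2.0, 0.1, -0.5])
alpha = fraction_to_boundary(s, ds)
print(alpha)  # 0.4975
assert np.all(s + alpha * ds > 0)  # feasibility preserved
```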
Theorem 4 (Stability and instability of the modified primal-dual interior-point method for constrained minmax) Let α = 1 and let (z, b) with b > 0 be an equilibrium point in the sense that g(z, b) = 0. Assume that the LICQ holds at z, that Jzz f(z) and Jyy f(z) are invertible, and that Jzz f(•) is differentiable in a neighborhood of z. Then there exist functions ϵx(•) and ϵy(•) that are constant in a neighborhood of z, satisfy the (ConsLQAC) at z, and guarantee that: i) if z is a local minmax of (36), then it is a LAS equilibrium of (42); ii) if z is not a local minmax of (36), then it is an unstable equilibrium of (42).
The proof is analogous to that of Theorem 3, with an analogous statement holding with Ry(µ) and Rx(µ), respectively. For the sake of completeness, we highlight the main points of the analogy.
First, when z is such that (38) holds, the sufficient condition for z to be a LAS equilibrium point of (42) is (46). The only extra argument needed is to show that condition (46) is always feasible for some ϵx(z) large enough. This is not evident, as the matrix M has size (nx + mx) × (nx + mx) while Ex(z) only has nx nonzero elements on the diagonal. However, because of the structural zeros in Jxy f(z) and Ey(z), one can verify with some algebraic manipulation that rank(M) := r ≤ min(nx, ny). Let Λ be the diagonal matrix with the eigenvalues of M in decreasing order and V its associated eigenvectors, such that M = V ΛV′. We can partition V into V1 of size (r, r), associated with the nonzero eigenvalues of M, and V2 = I_{nx+mx−r}. This partition means that Ex(z) = V′Ex(z)V, from which one can conclude that one can always take ϵx large enough that, for each negative diagonal entry of Λ, the corresponding diagonal element of (Ex(z) + Λ) is positive. For the second part, let us prove the statement when z is such that the second order conditions in (38) do not hold. We need to prove that Jzz f(z) + µE(z) is singular for some µ ∈ (0, 1). On the one hand, using the same analysis as in the proof of Theorem 3, we conclude that the (ConsLQAC) are sufficient to guarantee that z is an unstable equilibrium point of (42) if inertia(Jzz f(z)) ̸= (nx + mx + ly + my, lx + mx + ny + my, 0). On the other hand, if inertia(Jzz f(z)) = (nx + mx + ly + my, lx + mx + ny + my, 0), then we can guarantee that, by taking ϵx sufficiently large, there is a µ ∈ (0, 1) such that inertia(Jzz f(z) + µE) ̸= (nx + mx + ly + my, lx + mx + ny + my, 0), which means that z is an unstable equilibrium point of (42). This concludes the proof. □

Algorithmic development and numerical examples
The following algorithm combines the results of the previous sections to propose a method for selecting ϵx(z) and ϵy(z) that satisfies the (ConsLQAC) and guarantees the stability properties of Theorem 4. We only state the algorithm for the constrained case; its specialization to the unconstrained case is straightforward. In order to keep the algorithm simple and to highlight the instability property, we chose to use the functions ϵy(•) = ϵx(•) = 0 whenever the algorithm is near a local minmax.
Algorithm 1 Primal-dual interior-point method for minmax Require: an initial point z = (x, sx, y, sy, νy, λy, νx, λx), an initial barrier parameter value b, a barrier reduction factor σ ∈ (0, 1), a stopping accuracy δs ≥ 0, and a δϵ > 0 that defines a neighborhood in which ϵx and ϵy stop being adjusted.
if the (ConsLQAC) cannot be satisfied with ϵy = ϵx = 0, then: increase ϵy until inertia(Jyy f(z) − Ey) = (ly + my, ny + my, 0); increase ϵx until, for some value of µ ∈ (0, 1), inertia(Jzz f(z) + µE(z)) ̸= (nx + mx + ly + my, lx + mx + ny + my, 0). Compute a new z using the update equation, where α ∈ (0, 1] is selected such that the feasibility conditions λy, λx, sy, sx > 0 hold.
Proof For each z, Algorithm 1 produces values of ϵx and ϵy that depend only on z; it therefore implicitly constructs functions ϵx(•) and ϵy(•). Moreover, ϵx(•) and ϵy(•) are such that either the stability condition (46) or the instability condition (47) is satisfied for each z, and therefore in the neighborhood of any equilibrium point z*. Finally, ϵx(•) and ϵy(•) are constant in a neighborhood of each equilibrium point, as the values of ϵx and ϵy are not adjusted when ∥g(z, b)∥∞ ≤ δϵ. □
In Algorithm 1, for each z, (ϵx, ϵy) is chosen to satisfy the conditions of Theorem 4, and therefore to generate the desired stability and instability. This means that the algorithm essentially guarantees that the modified primal-dual interior-point method can only converge to an equilibrium point if such a point is a local minmax. A key feature of the algorithm is that it only uses the inertia of matrices, which can be efficiently computed using either the LBLt or the LDLt decomposition, as we detail in the following remark.
Remark 1 (Computing the inertia) It is not necessary to actually compute the eigenvalues of Jzz f(z) in order to determine its inertia. A first option is to use the lower-triangular-block-lower-triangular-transpose (LBLt) decomposition [16, Appendix A], which decomposes Jzz f(z) into the product LBL′, where L is a lower triangular matrix and B a block diagonal one; the inertia of B is the same as the inertia of Jzz f(z).
Let Γ = diag(γ1nx+mx, −γ1ny+my, γ1ly+my, −γ1lx+mx), with γ a small positive number. A second approach is to use the lower-triangular-diagonal-lower-triangular-transpose (LDLt) decomposition, which decomposes Jzz f(z) + Γ into the product LDL′, where L is a lower triangular matrix and D is a diagonal matrix; the inertia of D, given by the numbers of positive, negative, and zero elements on the diagonal of D, equals the inertia of Jzz f(z) + Γ. The matrix Γ introduces a distortion in the inertia, but it helps to stabilize the computation of the LDLt decomposition, which tends to be faster than the LBLt decomposition. This is the approach we use in our implementation; it has been studied in primal-dual interior-point algorithms for minimization, where the distortion introduced by Γ tends to be compensated by the better numerical behavior [43,44]. □
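Remark 1 can be illustrated with scipy, whose `scipy.linalg.ldl` computes such a factorization with a block diagonal D (1×1 and 2×2 blocks, so strictly an LBLt in the remark's terminology); each 2×2 block contributes one positive and one negative eigenvalue. This is an illustrative sketch, not the paper's TensCalc-based implementation:

```python
import numpy as np
from scipy.linalg import ldl

def inertia_via_ldl(A, tol=1e-12):
    """Inertia of symmetric A read off the (block) diagonal factor of
    A = L D L', without computing eigenvalues of A itself."""
    _, d, _ = ldl(A)
    n, i = A.shape[0], 0
    pos = neg = zero = 0
    while i < n:
        if i + 1 < n and abs(d[i + 1, i]) > tol:
            # 2x2 pivot block: indefinite, so one eigenvalue of each sign
            pos, neg, i = pos + 1, neg + 1, i + 2
        else:
            v = d[i, i]
            pos += int(v > tol); neg += int(v < -tol); zero += int(abs(v) <= tol)
            i += 1
    return pos, neg, zero

print(inertia_via_ldl(np.array([[2.0, 1.0], [1.0, -3.0]])))  # (1, 1, 0)
```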

Benchmark example for unconstrained minmax
Consider the following four functions. The first three have been used as examples in [31], [35], and [45], respectively, whereas the fourth is a well-known simple but challenging case for finding a local minmax. These problems all satisfy the assumptions of Theorem 3 and have local minmax points. We have chosen these functions because, as we will show, they illustrate some interesting behaviors.
Table 1: Comparison of the performance of the pure Newton method, Gradient Descent Ascent, and Algorithm 1. We randomly generated 1000 initializations and used them to start the algorithms from different locations. "eq" refers to the number of times the algorithm converged to an equilibrium point, i.e., a point that satisfies the first order conditions (but not necessarily the second order ones); "minmax" refers to the number of times the algorithm converged to a local minmax; "iter" refers to the average number of iterations the algorithm took to converge to a minmax point.
Our goal is to compare the performance of Algorithm 1 with that of two well-established baselines. On the one hand, we look at the performance of a "pure" Newton algorithm, i.e., one with ϵx(•) = ϵy(•) = 0. On the other hand, we look at the convergence of Gradient Descent Ascent (GDA), where the step sizes αx and αy are constant and different for each problem; we did our best to select the best values of αx and αy for each problem.
Each algorithm is initialized 1000 times, using the same initializations for all three. We compare their convergence properties according to three criteria: the number of times the algorithm converged to an equilibrium point (eq.), the number of times it converged to a local minmax point (minmax), and the average number of iterations to converge to a local minmax point (iter). An algorithm is terminated when the infinity norm of the gradient is smaller than δs = 10−5, and we declare that it did not converge if it has not terminated within 500 iterations for the pure Newton method and Algorithm 1, or within 50 000 iterations for GDA. The results of the comparison are displayed in Table 1. The key takeaway from these examples is that Algorithm 1 never converges towards an equilibrium point that is not a local minmax, in contrast with the pure Newton method, which is attracted to any equilibrium point. Detailed observations follow.
• The pure Newton algorithm has good overall convergence for all the problems, but it also tends to converge towards equilibrium points that are not local minmax. On the other hand, when the pure Newton method converges to a local minmax, it does so in fewer iterations than the other two methods. This is expected when comparing with GDA, as GDA is a first order method. The pure Newton algorithm converges in (slightly) fewer iterations than Algorithm 1 because taking ϵx and ϵy different from 0 hinders the superlinear convergence property of Newton's method.
• The GDA algorithm seems to enjoy the property of always converging towards a local minmax and, except for f3(•) and f4(•), it has a good rate of convergence. However, GDA takes an exceptionally large number of iterations to converge. This is somewhat expected from the fact that it is a first order method, and it is partially compensated by each iteration being simpler to compute. However, one must keep in mind that none of this accounts for the time that needs to be spent adjusting the step sizes until a good convergence rate is obtained.
• Algorithm 1 is, across the board, the algorithm with the best convergence towards local minmax, and it converges in the smallest number of iterations. As expected from the theory, Algorithm 1 never converges towards an equilibrium point that is not a local minmax. From a numerical perspective, the biggest takeaway is that, while our results are only about local convergence, the algorithm still enjoys good global convergence properties; only on f3(•) does it fail to converge essentially 100% of the time.
• Function f4(•) is a particularly interesting example. First, notice that the pure Newton method converges in one iteration, which is expected from the form of its iterations on this problem. This is in stark contrast with GDA which, as is well known, diverges away from the local minmax. As for Algorithm 1, it converges even though f4(•) does not satisfy the assumptions of Theorem 3, further emphasizing that these are sufficient but not necessary conditions. Notice that Algorithm 1 does not coincide with the pure Newton method here, as the Hessian is modified with an ϵy(x, y) > 0 to guarantee that the portion of the Hessian associated with the maximization is negative definite.
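The qualitative behavior described for f4(•) can be reproduced on the classic hard case f(x, y) = xy (used here purely for illustration; it is not necessarily the paper's f4). Its only equilibrium is (0, 0), where ∇yy f = 0 is singular; pure Newton converges in one step, while GDA spirals outward:

```python
import numpy as np

grad = lambda z: np.array([z[1], z[0]])   # f(x, y) = x*y: (df/dx, df/dy) = (y, x)
H = np.array([[0.0, 1.0], [1.0, 0.0]])    # Hessian: indefinite, Hyy = 0

# Pure Newton: one step lands exactly on the equilibrium (f is quadratic).
z0 = np.array([0.3, -0.7])
z_newton = z0 - np.linalg.solve(H, grad(z0))

# GDA: x descends, y ascends; each step multiplies the distance to the
# origin by sqrt(1 + a^2) > 1, so the iterates spiral away.
a, z_gda = 0.1, z0.copy()
for _ in range(200):
    gx, gy = grad(z_gda)
    z_gda = z_gda + a * np.array([-gx, gy])

print(np.linalg.norm(z_newton), np.linalg.norm(z_gda))  # 0.0 vs. > ||z0||
```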

The homicidal chauffeur example for constrained minmax
In the homicidal chauffeur problem, a pursuer driving a car tries to hit a pedestrian, who (understandably) tries to evade it. The pursuer is modeled as a discrete-time Dubins vehicle, leading to the minmax problem (48), where xp^(1,2) designates the first and second elements of the vector xp; γu and γd are positive weights; and the sets U, U, D and D are defined for i = 0, . . ., T − 1. Instead of explicitly computing the trajectories of the pursuer and the evader, we compute them implicitly by setting the dynamics as equality constraints; we will shortly show that this has an important impact on the scalability of the algorithm.
Each player is controlled using Model Predictive Control (MPC), meaning that at each time step t we solve (48), obtaining controls u(t) and d(t), which are then used to control the system for the next time step. The problem satisfies the assumptions of Theorem 4, as it is differentiable and has local minmax points for which the LICQ and strict complementarity hold.
Fig. 2: Scaling of the homicidal chauffeur with horizon length, and sparsity pattern of the Hessian when using the sequential approach.

The importance of guaranteeing instability
It is natural to ask whether it is important to enforce the instability guarantee, especially in the cases where the (ConsLQAC) are not enough, meaning one needs to use line 7 of Algorithm 1. In Figure 1 we show what can happen when it is not enforced. We take the homicidal chauffeur problem with a horizon of T = 20 and run the MPC control for t = 1, . . ., 50. In the first case we enforce the instability guarantee, meaning that we use line 7 of Algorithm 1; in the second case we only enforce the (ConsLQAC); and in the third case we only enforce the instability guarantee after t = 25. In all cases, we start the system with exactly the same initial conditions. In the first case, the evader (the maximizer) is able to find a control that allows it to get further from the pursuer; the average cost over all time steps (t = 1, . . ., 50) ends up being around 0.2. In the second case, the solver keeps being attracted towards a point that is not a local minmax (more precisely, not a local maximum), which means that the evader is not capable of escaping the pursuer; as a consequence, the average cost over all time steps ends up being around 0.05, which is lower, as expected. Finally, in the third case, from t = 25 onward the solver is able to converge towards a local minmax, and the evader escapes from the pursuer. This example illustrates how crucial it is to enforce instability: by doing so, we guarantee that the algorithm can only converge towards an equilibrium point that is a local minmax, and this can completely change the numerical solution.

Exploiting sparsity
Instead of setting the dynamics as equality constraints in (48), one could simply solve the trajectory equation at each time step, explicitly calculating xp(t + i + 1) by recursively composing the dynamics ϕp(•) with the inputs u(t), . . ., u(t + i). In the MPC literature, this is known as the sequential approach, as opposed to the simultaneous approach we used in (48) [46, Chapter 8.1.3]. We want to study the scalability of the algorithm as the horizon T grows, both for the sequential and for the simultaneous approach.
The sequential approach solves an optimization problem in a smaller state space, because it only needs to solve the optimization for u(t), . . ., u(t + T) and d(t), . . ., d(t + T), and it does not have to handle equality constraints. However, as we can see from the sparsity pattern in Figure 2b, the Hessian is rather dense, with large parts of it containing nonzero entries. As can be seen in Figure 2a, the algorithm scales rather poorly as the horizon length (and hence the number of variables) increases; it no longer converges reliably beyond T = 80.
The simultaneous approach, on the other hand, solves the optimization problem in a much larger state space, because it not only needs to solve for u(t), . . ., u(t + T) and d(t), . . ., d(t + T), but also for xp(t), . . ., xp(t + T) and xe(t), . . ., xe(t + T), and it also needs to handle equality constraints. Fortunately, as we can see from the sparsity pattern in Figure 3b, most of the entries in the Hessian are structurally zero (meaning they are always zero). TensCalc's implementation of the LDLt factorization exploits sparsity patterns and scales roughly as O(T), which makes it substantially more efficient than a standard LDLt decomposition, which scales as O(T³) [16, Appendix A]. At each step of Algorithm 1, most of the time is spent computing the LDLt decomposition, either to adjust ϵx and ϵy or to invert Hzz f(z). As a consequence, we can see in Figure 3a that both the number of iterations necessary to solve the optimization and the time per iteration scale roughly linearly: as the horizon length T is multiplied by roughly 30, the former grows by a factor of about 1.7 and the latter by about 3.5.
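The density difference between the two formulations can be seen even on a toy problem. The sketch below uses a scalar linear system x_{k+1} = a x_k + b u_k with stage cost x_k² (our own stand-in, not the homicidal chauffeur model): eliminating the states gives a dense T×T Hessian, while keeping states and multipliers as variables gives a banded KKT matrix with O(T) nonzeros:

```python
import numpy as np

a, b, T = 0.9, 1.0, 30

# Sequential: with x_0 = 0, x_k = sum_{j<k} a^(k-1-j) b u_j, so the
# Hessian of J(u) = sum_k x_k^2 is 2 G'G with G dense lower triangular.
G = np.array([[b * a ** (k - 1 - j) if j < k else 0.0 for j in range(T)]
              for k in range(1, T + 1)])
H_seq = 2 * G.T @ G                      # fully dense: T^2 nonzeros

# Simultaneous: variables (x_1..x_T, u_0..u_{T-1}) plus T multipliers,
# dynamics x_{k+1} - a x_k - b u_k = 0 kept as equality constraints.
K = np.zeros((3 * T, 3 * T))
for k in range(T):
    K[k, k] = 2.0                                  # cost curvature in x_{k+1}
    K[2 * T + k, k] = K[k, 2 * T + k] = 1.0        # d(constraint k)/dx_{k+1}
    if k + 1 < T:
        K[2 * T + k + 1, k] = K[k, 2 * T + k + 1] = -a
    K[2 * T + k, T + k] = K[T + k, 2 * T + k] = -b

print(np.count_nonzero(H_seq), np.count_nonzero(K))  # 900 vs. 208: O(T^2) vs. O(T)
```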
Remark 2 (Minmax problems with shared dynamics) In the homicidal chauffeur problem, the control of the pursuer does not impact the dynamics of the evader, and vice versa. This is why in (48) the dynamics can be set as equality constraints independently for the min and for the max. Now consider the problem x+ = f(x, u, d), where u is the control and d is the disturbance, and one wants to minimize a cost function V(x(1), . . ., x(T), u(0), . . ., u(T−1)) under the worst-case disturbance d(1), . . ., d(T). Because both the control and the disturbance influence the dynamics, we need to include the dynamics as equality constraints for the maximization, leading to the optimization problem min_{u(i)∈U, i=0,...,T−1} max_{d(i)∈D, x(i+1), i=0,...,T−1: x(i+1)=f(x(i),u(i),d(i))} V(x(1), . . ., x(T), u(0), . . ., u(T − 1)), where U, D are the feasible sets for the control and the disturbances. It is important to notice that x just acts as a latent/dummy variable that allows us to avoid solving the trajectory equation. Setting it as a maximization variable does not change the result, as x is always exactly determined by the values of u and d. It does, however, improve the numerical efficiency of the algorithm, as the Hessian matrices are now sparse and their LDLt decomposition can be computed efficiently. □

Conclusion
The main contribution of this article is the construction of Newton and primal-dual interior-point algorithms for nonconvex-nonconcave minmax optimization that can only converge towards an equilibrium point if such a point is a local minmax. We established these results by modifying the Hessian matrices such that the update steps can be seen as the solutions of quadratic programs that locally approximate the minmax problem. While our results are only local, numerical simulations show that the algorithm is able to make progress towards a solution even when it does not start close to one. We also illustrated, using numerical examples, how important it is to formulate the minmax problem so that the Hessian matrix is sparse.
The main future direction would be to develop non-local convergence results. We believe that the best approach to obtain such results would be to develop a minmax analogue of the Armijo rule, which could be used to obtain results similar to those known for minimization. Developing filters and merit functions could also play an important role in improving the algorithm's convergence.
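For reference, in the minimization setting the Armijo rule mentioned above is a simple backtracking line search that shrinks the step until a sufficient-decrease condition holds; a minimal sketch (standard textbook form, with illustrative parameters):

```python
import numpy as np

def armijo_step(f, grad_f, x, d, alpha0=1.0, beta=0.5, sigma=1e-4):
    """Classical Armijo backtracking for minimization: halve the step until
    f(x + alpha*d) <= f(x) + sigma*alpha*grad_f(x)'d.  Extending such a rule
    to minmax problems is the open direction discussed above."""
    alpha = alpha0
    fx = f(x)
    slope = sigma * (grad_f(x) @ d)   # d must be a descent direction
    while f(x + alpha * d) > fx + alpha * slope:
        alpha *= beta
    return alpha

f = lambda x: float(x @ x)            # toy objective ||x||^2
grad = lambda x: 2 * x
x = np.array([2.0, -1.0])
alpha = armijo_step(f, grad, x, -grad(x))
```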

Acknowledgment
The authors would like to thank the reviewers and editors for their comments, which greatly improved the quality of the article and helped to clarify the main results.

then (x, w_x, ν_x, λ_x) is a local minimum of (A2). The proof can be found in [16, Theorem 12.5].
We now need to prove that (A3) is equivalent to condition (13) from the proposition. Because the LICQ and strict complementarity hold, the set C_x(z) is given by the null space (a.k.a. the kernel) of the matrix
\[
\bar H_{x\lambda} f(z) = \begin{bmatrix} \nabla_x G_x(x) & \nabla_x F_x(x) \\ 0 & 2\operatorname{diag}(w_x) \end{bmatrix}. \tag{A4}
\]
This result can be found in [16, Chapter 12.5], in the subsection "Second-order conditions and projected Hessian". Let \(Z_x \in \mathbb{R}^{(n_x+m_x)\times(n_x+m_x-m_x-l_x)}\) be a matrix with full column rank such that \(\bar H_{x\lambda} f(z)' Z_x = 0\). Then, the condition (A3) can be rewritten as which is equivalent to saying that inertia Z′
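Numerically, the projected Hessian and its inertia can be examined by building an orthonormal null-space basis from an SVD of the constraint matrix; the sketch below uses generic matrices (not the paper's \(\bar H_{x\lambda} f(z)\)) to illustrate the computation:

```python
import numpy as np

def projected_inertia(H, A, tol=1e-10):
    """Inertia (n+, n0, n-) of H projected onto null(A): build an orthonormal
    null-space basis Z with A Z = 0 from the SVD, then count the signs of the
    eigenvalues of the projected Hessian Z' H Z."""
    _, s, Vt = np.linalg.svd(A)
    rank = int(np.sum(s > tol))
    Z = Vt[rank:].T                    # columns span the null space of A
    eig = np.linalg.eigvalsh(Z.T @ H @ Z)
    return (int(np.sum(eig > tol)),
            int(np.sum(np.abs(eig) <= tol)),
            int(np.sum(eig < -tol)))
```

For instance, H = diag(1, −1) restricted to the null space of A = [0 1] is just the scalar 1, so the projected inertia is (1, 0, 0) even though H itself is indefinite.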

Proposition 4 (Construction of the modified primal-dual interior-point method) Algorithm 1 generates functions ε_x(·) and ε_y(·) that satisfy the conditions of Theorem 4 in the neighborhood of any equilibrium point z* that satisfies the assumptions of Theorem 4.

Fig. 1: Trajectory for the homicidal chauffeur problem with and without guaranteeing instability at equilibrium points that are not a local minmax.
Fig. 3: Scaling of the homicidal chauffeur problem with horizon length and sparsity pattern of the Hessian. (a) Computational scaling (time per iteration in ms) for solving the homicidal chauffeur problem, per horizon length. (b) Structural sparsity pattern of J_zz f(z).

We now want to define a new partition of H_zz f(z), which we will use to finish the proof. Consider the matrices
\[
\bar H_{zz} f(z) = \begin{bmatrix} H_{xx} f(z) & H_{xy} f(z) \\ H_{xy} f(z)' & H_{yy} f(z) \end{bmatrix}, \qquad
\bar H_{x\lambda} f(z) = \begin{bmatrix} H_{x\lambda} f(z) \\ 0_{n_y+m_y+l_y+m_y,\; l_x+m_x} \end{bmatrix},
\]
such that
\[
H_{zz} f(z) = \begin{bmatrix} \bar H_{zz} f(z) & \bar H_{x\lambda} f(z) \\ \bar H_{x\lambda} f(z)' & 0_{l_x+m_x} \end{bmatrix}.
\]
Let the matrix
\[
\bar Z_x := \begin{bmatrix} Z_x & 0_{n_x+m_x,\; n_y+m_y+l_y+m_y} \\ 0_{n_y+m_y+l_y+m_y,\; n_x-l_x} & I_{n_y+m_y+l_y+m_y} \end{bmatrix}.
\]
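As a sanity check on the partition above, the following sketch assembles the blocks with NumPy and verifies that the resulting matrix is symmetric; all dimensions and block contents here are illustrative placeholders, not the paper's actual problem data.

```python
import numpy as np

# Placeholder block sizes: n_x+m_x, n_y+m_y+l_y+m_y, l_x+m_x respectively.
p, q, r = 3, 2, 2
Hxx = np.eye(p)                  # stand-in for H_xx f(z)
Hyy = -np.eye(q)                 # stand-in for H_yy f(z)
Hxy = np.ones((p, q))            # stand-in for H_xy f(z)
Hxl = np.ones((p, r))            # stand-in for H_xlambda f(z)

# The partition: Hzz_bar collects the (x, y) blocks, Hxl_bar pads the
# constraint block with zeros so the full matrix can be assembled.
Hzz_bar = np.block([[Hxx, Hxy], [Hxy.T, Hyy]])
Hxl_bar = np.vstack([Hxl, np.zeros((q, r))])
Hzz = np.block([[Hzz_bar, Hxl_bar],
                [Hxl_bar.T, np.zeros((r, r))]])
```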
and ∇_y G_y(x, y) have linearly independent columns, we propose to obtain (d_x, d_sx, d_y, d_sy) as the optimizers, and (d_νy, d_λy, d_νx, d_λx) as the associated multipliers, K(d̄_x, d̄_sx, d̄_y, d̄_sy)