On the SQH Method for Solving Differential Nash Games

A sequential quadratic Hamiltonian scheme for solving open-loop differential Nash games is proposed and investigated. This method is formulated in the framework of the Pontryagin maximum principle and represents an efficient and robust extension of the successive approximations strategy for solving optimal control problems. Theoretical results are presented that prove the well-posedness of the proposed scheme, and results of numerical experiments are reported that successfully validate its computational performance.


Introduction
The sequential quadratic Hamiltonian (SQH) scheme has been recently proposed in [4][5][6][7] for solving nonsmooth optimal control problems governed by differential models. The SQH scheme belongs to the class of iterative methods known as successive approximations (SA) schemes that are based on the characterisation of optimality in control problems by the Pontryagin maximum principle (PMP); see [2,27] and [12] for a recent detailed discussion. The initial development of SA schemes was inspired by the work of L. I. Rozonoèr [32], and originally proposed in different variants by H.J. Kelley, R.E. Kopp and H.G. Moyer [19] and by I.A. Krylov and F.L. Chernous'ko [20,21]; see [11] for an early review.
The working principle of most SA schemes is the point-wise minimisation of the Hamilton-Pontryagin function introduced in the PMP theory. However, in their original formulation, the SA schemes appeared efficient but not robust with respect to the numerical and optimisation parameters. Twenty years later, a great improvement in robustness was achieved by Y. Sakawa and Y. Shindo [33,34] by introducing a quadratic penalty on the control updates that resulted in an augmented Hamiltonian. In this latter formulation, the need for coupled updates of the state and control variables of the controlled system limited the application of the resulting method to small-size control problems. This limitation was resolved in the SQH approach, where a sequential point-wise optimisation of an augmented Hamiltonian function is considered that defines a suitable update step for the control variable, while the state function is updated after the completion of this step. Since the SQH iterative procedure has proved efficient and robust in solving (non-smooth) optimal control problems governed by ordinary- and partial-differential models, it is reasonable to investigate whether this procedure can be successfully extended to differential games.
The first component of a differential game is a differential model that governs the state of the considered system and is subject to different mechanisms of action representing the strategies of the players in the game. Furthermore, an objective functional and an admissible set of actions are associated with each player, and the purpose of each player in the game is to minimise its own objective subject to the constraints given by the differential model and the admissible sets. Since we consider non-cooperative games, a suitable solution concept is the one proposed by J. F. Nash in [22,23], where a so-called Nash equilibrium (NE) for static games with complete information is defined, that is, a configuration where no player can benefit from unilaterally changing its own strategy. In this framework, differential Nash games were pioneered by R. P. Isaacs [16]. However, contemporary to Isaacs' book, there are the works [25,26] where differential games are discussed in the framework of the maximum principle. Furthermore, in [1,21], we find early attempts and comments towards the development of a SA scheme for differential games. Unfortunately, this research direction received less attention, and these schemes were not further developed for differential games.
It is the purpose of this work to contribute to this development by investigating a SQH scheme for solving open-loop non-zero-sum two-player differential Nash games. In particular, we consider linear-quadratic (LQ) Nash games that appear, e.g., in the fields of economics and marketing [13,18], and are very well investigated from the theoretical point of view; see, e.g., [8,14,15,35]. Moreover, since the solution of unconstrained LQ Nash games can be readily obtained by solving coupled Riccati equations [3,14], they provide a convenient benchmark for our method. However, we also consider extensions of LQ Nash games to problems with tracking objectives, box constraints on the players' actions, and action costs that include L^1 terms.
In the next section, we formulate a class of differential Nash games and their characterisation in the PMP framework. In particular, we notice that the point-wise PMP characterisation of a Nash equilibrium corresponds to the requirement that, at each time instant, the conditions of equilibrium of a finite-dimensional Nash game with two Hamilton-Pontryagin functions must be satisfied. In Section 3, we present our SQH method for solving differential Nash games and discuss its well-posedness. Specifically, we show that the adaptive choice of the weight of a Sakawa-Shindo-type penalisation can be successfully performed in a finite number of steps such that the proposed update criteria based on the Nikaido-Isoda function are satisfied.
Section 4 is devoted to numerical experiments that successfully validate our computational framework. In the first experiment, we consider a differential LQ Nash game and show that the SQH scheme provides a solution that is identical to that obtained by solving an appropriate Riccati system. In the second experiment, the same problem with the additional requirement that the players' actions belong to given bounded, closed, and convex sets is solved by the SQH scheme. In the third experiment, we extend the setting of the second experiment by adding weighted L^1 costs of the players' actions and verify that these costs promote sparsity. In the fourth experiment, we consider the case where each player's functional corresponds to a tracking problem in which the players aim at following different paths. Also in this case, constraints on the players' actions and L^1 costs are considered. We remark that all NE solutions are successfully computed with the same setting of the parameters entering the SQH algorithm, that is, independently of the problem and of the chosen weights in the players' cost functionals. A section with conclusions completes this work.

PMP Characterization of Nash Games
This section is devoted to the formulation of differential Nash games and the characterisation of their NE solutions in the PMP framework. We discuss the case of two players, represented by their strategies u_1 and u_2, which can be readily extended to the case of N players, and assume the following dynamics:

y'(t) = f(t, y(t), u_1(t), u_2(t)),  y(0) = y_0,  (1)

where t ∈ [0, T], y(t) ∈ R^n, and u_1(t) ∈ R^m and u_2(t) ∈ R^m, m ≤ n. We assume that f is such that, for any choice of the initial condition y_0 ∈ R^n and any u_1, u_2 ∈ L^2(0, T; R^m), the Cauchy problem (1) admits a unique solution in the sense of Carathéodory; see, e.g., [3]. Furthermore, we assume that the map (u_1, u_2) → y = y(u_1, u_2), where y(u_1, u_2) represents the unique solution to Eq. 1 with fixed initial condition, is continuous in (u_1, u_2). We refer to u_1 and u_2 as the game strategies of the players P_1 and P_2, respectively. The goal of P_1 is to minimise the following cost (or objective) functional:

J_1(u_1, u_2) = ∫_0^T ℓ_1(t, y(t), u_1(t)) dt + φ_1(y(T)),  (2)

whereas P_2 aims at minimising its own objective given by

J_2(u_1, u_2) = ∫_0^T ℓ_2(t, y(t), u_2(t)) dt + φ_2(y(T)).  (3)

We consider the cases of unconstrained and constrained strategies. In the former case, we assume u_1, u_2 ∈ L^2(0, T; R^m), whereas in the latter case we assume that u_1 and u_2 belong, respectively, to the following admissible sets:

U_ad^(i) = { u ∈ L^2(0, T; R^m) : u(t) ∈ K_ad^(i) a.e. in [0, T] },  i = 1, 2,  (4)

where K_ad^(i) are compact and convex subsets of R^m. We denote U_ad = U_ad^(1) × U_ad^(2) and U = L^2(0, T; R^m) × L^2(0, T; R^m). Notice that we have a uniform bound on |y(u_1, u_2)(t)|, t ∈ [0, T], that holds for any u ∈ U_ad; see [9].

Definition 1
The functions (u_1^*, u_2^*) ∈ U_ad are said to form a Nash equilibrium (NE) for the game (J_1, J_2; U_ad^(1), U_ad^(2)) if it holds that

J_1(u_1^*, u_2^*) ≤ J_1(v_1, u_2^*) for all v_1 ∈ U_ad^(1),
J_2(u_1^*, u_2^*) ≤ J_2(u_1^*, v_2) for all v_2 ∈ U_ad^(2).

(A similar Nash game is defined replacing U_ad with U.) We remark that existence of a NE point can be proved subject to appropriate conditions on the structure of the differential game, including the choice of T. For our purpose, we assume existence of a Nash equilibrium (u_1^*, u_2^*) ∈ U_ad, and refer to [9] for a review and recent results in this field.
We remark that, if u^* = (u_1^*, u_2^*) is a NE for the game, then it satisfies the following:

u_1^* = argmin_{v_1 ∈ U_ad^(1)} J_1(v_1, u_2^*),  u_2^* = argmin_{v_2 ∈ U_ad^(2)} J_2(u_1^*, v_2),  (5)

and similarly with U_ad replaced by U in the unconstrained case (6). This fact implies that the NE point u^* = (u_1^*, u_2^*) must fulfil the necessary optimality conditions given by the Pontryagin maximum principle applied to both optimisation problems stated in Eq. 5, alternatively (6).
In order to discuss these conditions, we introduce the following Hamilton-Pontryagin (HP) functions:

H_i(t, y, u_1, u_2, p_i) = p_i · f(t, y, u_1, u_2) − ℓ_i(t, y, u_i),  i = 1, 2.  (7)

In terms of these functions, the PMP condition for the NE point u^* = (u_1^*, u_2^*) states the existence of multiplier (adjoint) functions p_1, p_2 : [0, T] → R^n such that the following holds:

H_1(t, y^*(t), u_1^*(t), u_2^*(t), p_1^*(t)) ≥ H_1(t, y^*(t), v_1, u_2^*(t), p_1^*(t)) for all v_1 ∈ K_ad^(1),
H_2(t, y^*(t), u_1^*(t), u_2^*(t), p_2^*(t)) ≥ H_2(t, y^*(t), u_1^*(t), v_2, p_2^*(t)) for all v_2 ∈ K_ad^(2),  (8)

for almost all t ∈ [0, T]. Notice that, at each fixed t, problem (8) corresponds to a finite-dimensional Nash game.
In Eq. 8, we have y^* = y(u_1^*, u_2^*), and the adjoint variables p_1^*, p_2^* are the solutions to the following differential problems:

p_i'(t) = − (∂_y f(t, y^*(t), u_1^*(t), u_2^*(t)))^⊤ p_i(t) + (∂_y ℓ_i(t, y^*(t), u_i^*(t)))^⊤,  (9)
p_i(T) = − (∂_y φ_i(y^*(T)))^⊤,  (10)

where i = 1, 2, ∂_y φ(y) represents the Jacobian of φ with respect to the vector of variables y, and ⊤ means transpose. Similarly to Eq. 1, one can prove that Eqs. 9 and 10 are uniquely solvable, and the solution can be uniformly bounded independently of u ∈ U_ad. We conclude this section by introducing the Nikaido-Isoda [24] function ψ : U_ad × U_ad → R, which we use for the realisation of the SQH algorithm. We have

ψ(u, w) = [ J_1(u_1, u_2) − J_1(w_1, u_2) ] + [ J_2(u_1, u_2) − J_2(u_1, w_2) ],

so that, at a NE point u^*, ψ(u^*, v) ≤ 0 for any v ∈ U_ad and ψ(u^*, u^*) = 0.
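The Nikaido-Isoda function is straightforward to evaluate once the two cost functionals can be computed. The following minimal Python sketch illustrates the definition and its sign property at an equilibrium; the toy static game used here is a hypothetical example chosen only so that the NE is obvious, not one of the games considered in this work.

```python
def nikaido_isoda(J1, J2, u, w):
    """Nikaido-Isoda function psi(u, w) for a two-player game.

    J1, J2 : callables J_i(u1, u2) returning the players' cost values;
    u, w   : strategy pairs (u1, u2) and (w1, w2).
    """
    u1, u2 = u
    w1, w2 = w
    return (J1(u1, u2) - J1(w1, u2)) + (J2(u1, u2) - J2(u1, w2))

# Hypothetical static game with obvious NE (1, -1): each player's cost
# depends only on its own action.
J1 = lambda u1, u2: (u1 - 1.0) ** 2
J2 = lambda u1, u2: (u2 + 1.0) ** 2
u_star = (1.0, -1.0)
psi_self = nikaido_isoda(J1, J2, u_star, u_star)     # vanishes at the NE
psi_dev = nikaido_isoda(J1, J2, u_star, (0.0, 0.0))  # nonpositive at the NE
```

At the equilibrium, every deviation pair w yields a nonpositive value, which is exactly the property used by the update criterion of the SQH scheme below.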

The SQH Scheme for Solving Nash Games
In the spirit of the SA scheme proposed by Krylov and Chernous'ko [20], a SA methodology for solving our Nash game (J_1, J_2; U_ad^(1), U_ad^(2)) consists of an iterative process that starts with an initial guess (u_1^0, u_2^0) ∈ U_ad, followed by the solution of our governing model (1) and of the adjoint problems Eqs. 9 and 10 for i = 1, 2. Thereafter, a new approximation to the strategies u_1 and u_2 is obtained by solving, at each fixed t, the Nash game (8) and setting (u_1(t), u_2(t)) equal to the solution of this game.
We remark that this update step is well posed if this solution exists for t ∈ [0, T] and the resulting functions u_1 and u_2 are measurable. Clearly, this issue requires identifying classes of problems for which we can guarantee existence and uniqueness (or the possibility of selection) of a NE point. In this respect, a large class can be identified based on the following result given in [8], which is proved by an application of Kakutani's fixed-point theorem; see, e.g., [3] for references. We have

Theorem 3.1 Assume the following structure:

f(t, y, u_1, u_2) = f_0(t, y) + M_1(t, y) u_1 + M_2(t, y) u_2,  (11)
ℓ_i(t, y, u_i) = ℓ_i^0(t, y) + ℓ_i^i(t, u_i),  i = 1, 2.  (12)

Furthermore, suppose that K_ad^(1) and K_ad^(2) are compact and convex, the functions f_0, ℓ_i^0 and the matrix functions M_1 and M_2 are continuous in t and y, and the functions u_1 ↦ ℓ_1^1(t, u_1) and u_2 ↦ ℓ_2^2(t, u_2) are strictly convex for any choice of t ∈ [0, T] and y ∈ R^n. Then, for any t ∈ [0, T] and any y, p_1, p_2 ∈ R^n, there exists a unique pair (ũ_1, ũ_2) ∈ K_ad^(1) × K_ad^(2) such that

H_1(t, y, ũ_1, ũ_2, p_1) ≥ H_1(t, y, v_1, ũ_2, p_1) for all v_1 ∈ K_ad^(1),
H_2(t, y, ũ_1, ũ_2, p_2) ≥ H_2(t, y, ũ_1, v_2, p_2) for all v_2 ∈ K_ad^(2).

With the setting of this theorem, the map (t, y, p_1, p_2) ↦ (ũ_1, ũ_2) is continuous [8]. Moreover, based on results given in [30], one can prove that the functions (u_1(t), u_2(t)) resulting from the SA update, starting from measurable (u_1^0(t), u_2^0(t)), are measurable. Therefore, the proposed SA update is well posed, and it can be repeated in order to construct a sequence of functions (u_1^k, u_2^k), k = 0, 1, 2, .... However, as already pointed out in [20] in the case of optimal control problems, it is difficult to find conditions that guarantee convergence of SA iterations to the solution sought. Furthermore, results of numerical experiments show a lack of robustness of the SA scheme with respect to the choice of the initial guess and of the numerical and optimisation parameters.
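Under the structure of Theorem 3.1, each HP function depends on a player's own action only through ℓ_i^i(t, u_i) and p_i · (M_i(t, y) u_i), so the pointwise game decouples into two independent concave maximisations. The following Python sketch computes the NE for the common special case of quadratic action costs ℓ_i^i(u_i) = (ν_i/2)|u_i|² and box constraints; this is an assumed instance for illustration, not the theorem's general setting.

```python
import numpy as np

def pointwise_nash(M1, M2, p1, p2, nu1, nu2, lo, hi):
    """Unique NE of the pointwise game under the structure of Theorem 3.1
    with quadratic action costs (nu_i/2)|u_i|^2 and the box [lo, hi]^m
    (an illustrative special case). Player i maximises
    p_i . (M_i u_i) - (nu_i/2)|u_i|^2 over its own action; the objective is
    separable and concave per coordinate, so the NE is the coordinate-wise
    projection of the unconstrained maximiser onto the box."""
    u1 = np.clip(M1.T @ p1 / nu1, lo, hi)
    u2 = np.clip(M2.T @ p2 / nu2, lo, hi)
    return u1, u2
```

The decoupling is what makes the SA (and later SQH) update cheap: the game at each time instant reduces to two independent projections.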
For this reason, further research effort was put in the development of the SA strategy, and an advancement was achieved by Sakawa and Shindo considering a quadratic penalty on the Hamiltonian [33,34]. We remark that these authors related their penalisation strategy to that proposed by B. Järmark in [17], which is similar to the proximal scheme of R. T. Rockafellar discussed in [29].
For our purpose, we follow the same path of [33] and extend it to the case of Nash games as follows. Consider the following augmented HP functions:

H_1^ε(t, y, u_1, u_2, v_1, p_1) = H_1(t, y, u_1, u_2, p_1) − ε |u_1 − v_1|²,
H_2^ε(t, y, u_1, u_2, v_2, p_2) = H_2(t, y, u_1, u_2, p_2) − ε |u_2 − v_2|²,  (13)

where, in the iteration process, u = (u_1, u_2) is subject to the update step, and v = (v_1, v_2) corresponds to the previous strategy approximation; | · | denotes the Euclidean norm. The parameter ε > 0 represents the augmentation weight, which is chosen adaptively along the iteration as discussed below.
Now, similar to the SA update illustrated above, suppose that the kth function approximation (u_1^k, u_2^k) and the corresponding y^k and p_1^k, p_2^k have been computed. For any fixed t ∈ [0, T] and ε > 0, consider the following finite-dimensional Nash game:

H_1^ε(t, y^k, ũ_1, ũ_2, u_1^k, p_1^k) ≥ H_1^ε(t, y^k, v_1, ũ_2, u_1^k, p_1^k) for all v_1 ∈ K_ad^(1),
H_2^ε(t, y^k, ũ_1, ũ_2, u_2^k, p_2^k) ≥ H_2^ε(t, y^k, ũ_1, v_2, u_2^k, p_2^k) for all v_2 ∈ K_ad^(2),  (14)

where y^k = y^k(t), p_1^k = p_1^k(t), p_2^k = p_2^k(t), and (u_1^k, u_2^k) = (u_1^k(t), u_2^k(t)). It is clear that, assuming the structure specified in Theorem 3.1, the Nash game (14) admits a unique NE point (ũ_1, ũ_2) ∈ K_ad^(1) × K_ad^(2), and the sequence constructed recursively by the procedure

(u_1^{k+1}(t), u_2^{k+1}(t)) := (ũ_1(t), ũ_2(t)),  t ∈ [0, T],  (15)

is well defined. Notice that, in this procedure, the solution to Eq. 14 depends on the value of ε. Therefore, the issue arises whether, corresponding to the step k → k+1, we can choose the value of this parameter such that the strategy function u^{k+1} = (u_1^{k+1}, u_2^{k+1}) represents an improvement on u^k = (u_1^k, u_2^k), in the sense that some convergence criteria towards the solution to our differential Nash problem are fulfilled.
For this purpose, we define a criterion that is based on the Nikaido-Isoda function. We require that

ψ(u^{k+1}, u^k) ≤ − ξ ‖u^{k+1} − u^k‖²_{L²(0,T)},

for some chosen ξ > 0. This is a consistency criterion in the sense that ψ must be nonpositive, and if (u^{k+1}, u^k) → (u^*, u^*), then we must have lim_{k→∞} ψ(u^{k+1}, u^k) = 0. Furthermore, we require that the absolute value |ψ(u^{k+1}, u^k)| decreases monotonically in the SQH iteration process. In our SQH scheme, if the strategy update meets the two requirements above, then the update is accepted and the value of ε is decreased by a factor ζ ∈ (0, 1). If not, the update is discarded, the value of ε is increased by a factor σ > 1, and the procedure is repeated.
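The adaptation of ε can be sketched as a simple accept/reject rule. The following Python fragment is a schematic version that tests only the ξ-based descent condition (the monotone decrease of |ψ| is omitted for brevity); the parameter names ξ, ζ, σ follow the discussion above.

```python
def adapt_epsilon(psi_val, delta_u_sq, eps, xi=1e-8, zeta=0.95, sigma=1.05):
    """One acceptance test of the SQH weight adaptation (a sketch that
    checks only the xi-based descent condition).

    psi_val    : Nikaido-Isoda value psi(u_trial, u_k);
    delta_u_sq : ||u_trial - u_k||^2 in the discrete L2 norm.
    Returns (accepted, new_eps)."""
    if psi_val <= -xi * delta_u_sq:
        return True, zeta * eps    # successful update: relax the penalty
    return False, sigma * eps      # rejected update: strengthen the penalty
```

A rejected trial does not change the current strategy; it only increases ε, which pulls the next trial closer to u^k and, as shown below, eventually forces acceptance.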
Below, we show that a value of ε can be found such that the update is successful and the SQH iteration proceeds until an appropriate stopping criterion is satisfied.
Our SQH scheme for differential Nash games is implemented as follows:

Algorithm 1 (SQH scheme)
1. Choose ε > 0, κ > 0, σ > 1, ζ ∈ (0, 1), ξ > 0, and (u_1^0, u_2^0) ∈ U_ad; compute y^0 = y(u_1^0, u_2^0) and the adjoints p_1^0, p_2^0; set k = 0.
2. At each t ∈ [0, T], compute (ũ_1(t), ũ_2(t)) as the NE solution to the augmented Nash game (14) with the current ε.
3. Compute ỹ = y(ũ_1, ũ_2), the solution to Eq. 1 corresponding to (ũ_1, ũ_2).
4. If ‖ũ − u^k‖²_{L²(0,T)} < κ: stop.
5. If the update criterion based on ψ(ũ, u^k) is satisfied: set u^{k+1} = ũ, y^{k+1} = ỹ, and ε = ζ ε; otherwise set ε = σ ε and go to Step 2.
6. Compute the adjoints p_1^{k+1}, p_2^{k+1}; set k = k + 1 and go to Step 2.

In the following proposition, we prove that Steps 1-6 of the SQH scheme are well posed. For the proof, we consider the assumptions of Theorem 3.1 with further simplifying hypotheses, which can be relaxed at the cost of more involved calculations.
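To make the scheme concrete, the following self-contained Python sketch applies an SQH-type iteration to a hypothetical scalar two-player LQ game. All model data, the explicit Euler discretisation, and the simplified ψ-based acceptance test are illustrative choices, not the setting of this paper's experiments.

```python
import numpy as np

# Hypothetical scalar two-player LQ Nash game:
#   y'(t) = a y + b1 u1 + b2 u2,  y(0) = y0,
#   J_i(u1, u2) = int_0^T (q_i/2) y^2 + (nu_i/2) u_i^2 dt.
a, b1, b2 = -1.0, 1.0, 1.0
q1, q2, nu1, nu2 = 1.0, 2.0, 1.0, 1.0
T, N, y0 = 1.0, 200, 1.0
dt = T / N

def solve_state(u1, u2):
    y = np.empty(N + 1); y[0] = y0
    for j in range(N):                      # explicit Euler
        y[j + 1] = y[j] + dt * (a * y[j] + b1 * u1[j] + b2 * u2[j])
    return y

def solve_adjoint(y, q):
    # p' = -a p + q y, p(T) = 0, for the HP function H_i = p_i f - l_i
    p = np.zeros(N + 1)
    for j in range(N, 0, -1):               # integrate backwards in time
        p[j - 1] = p[j] + dt * (a * p[j] - q * y[j])
    return p

def J1(u1, u2):
    y = solve_state(u1, u2)
    return dt * np.sum(0.5 * q1 * y[:-1] ** 2 + 0.5 * nu1 * u1 ** 2)

def J2(u1, u2):
    y = solve_state(u1, u2)
    return dt * np.sum(0.5 * q2 * y[:-1] ** 2 + 0.5 * nu2 * u2 ** 2)

def psi(u, w):                              # Nikaido-Isoda function
    (u1, u2), (w1, w2) = u, w
    return (J1(u1, u2) - J1(w1, u2)) + (J2(u1, u2) - J2(u1, w2))

eps, zeta, sigma, xi, kappa = 1.0, 0.9, 1.1, 1e-8, 1e-12
u1, u2 = np.zeros(N), np.zeros(N)
for k in range(500):
    y = solve_state(u1, u2)
    p1, p2 = solve_adjoint(y, q1), solve_adjoint(y, q2)
    accepted = False
    for _ in range(100):
        # pointwise maximiser of the augmented HP functions (closed form,
        # since each one is a concave quadratic in the player's own action)
        v1 = (2 * eps * u1 + b1 * p1[:-1]) / (nu1 + 2 * eps)
        v2 = (2 * eps * u2 + b2 * p2[:-1]) / (nu2 + 2 * eps)
        du2 = dt * (np.sum((v1 - u1) ** 2) + np.sum((v2 - u2) ** 2))
        if psi((v1, v2), (u1, u2)) <= -xi * du2:
            u1, u2, eps, accepted = v1, v2, zeta * eps, True
            break
        eps *= sigma                        # reject: strengthen the penalty
    if not accepted or du2 < kappa:
        break
```

At a fixed point of the update, ν_i u_i = b_i p_i holds pointwise, which is the PMP condition for this toy model; the penalty weight is relaxed after each accepted step and increased after each rejection, as in Steps 5-6 above.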
Our purpose is to show that it is possible to find an ε in Algorithm 1 such that ũ generated in Step 2 satisfies the criterion required in Step 5 for a successful update. We have

Proposition 3.2 Let (ỹ, ũ_1, ũ_2), (y^k, u_1^k, u_2^k) be generated by Algorithm 1, Steps 2-3, and denote δu = ũ − u^k. Then, there exists a θ > 0 independent of ε such that, for the ε > 0 currently chosen in Step 2, the following inequality holds:

ψ(ũ, u^k) ≤ (θ − ε) ‖δu‖²_{L²(0,T)}.

In particular, if ε > θ, then ψ(ũ, u^k) ≤ 0.
Proof Recall the definition of the Nikaido-Isoda function:

ψ(ũ, u^k) = [ J_1(ũ_1, ũ_2) − J_1(u_1^k, ũ_2) ] + [ J_2(ũ_1, ũ_2) − J_2(ũ_1, u_2^k) ].

We focus on the first two terms, involving J_1; the same calculation applies to the last two terms with J_2.
By adding and subtracting suitable HP terms and using the optimality of ũ in the augmented game (14), one obtains an estimate of J_1(ũ_1, ũ_2) − J_1(u_1^k, ũ_2) in terms of −ε ‖δu_1‖²_{L²(0,T)} and of products of the differences of the state and adjoint variables. With a similar computation, we also obtain an analogous estimate for the terms with J_2, where ỹ_2^k = y(ũ_1, u_2^k), δy_2 := ỹ − ỹ_2^k and δỹ_2 := ỹ_2^k − y^k. Adding the two estimates, we arrive at an inequality for ψ(ũ, u^k), Eq. 19. Next, we notice that the solutions to the state and adjoint problems are uniformly bounded in [0, T] for any choice of u ∈ U_ad, and estimates of the form ‖δy‖_{L²(0,T)} ≤ c ‖δu‖_{L²(0,T)} hold for the state differences, and similarly for δy_2 and δỹ_2; see [9] for a proof. By using these estimates in Eq. 19, we obtain

ψ(ũ, u^k) ≤ (θ − ε) ‖δu‖²_{L²(0,T)},

where θ depends on the functions computed at the kth iteration but not on ε. Thus, the claim is proved.

We remark that the NE solution ũ obtained in Step 2 of the SQH algorithm depends on ε, so that ‖ũ − u^k‖²_{L²(0,T)} decreases as O(1/ε²). In order to illustrate this fact, consider the following optimisation problem:

max_{u ∈ R} f(u),  f(u) = −(ν/2) u² − ε (u − v)²,

where ν, ε > 0. Clearly, the function f is concave and its maximum is attained at ũ = 2εv/(ν + 2ε), so that ũ − v = −νv/(ν + 2ε) = O(1/ε). Now, subject to the assumptions of Proposition 3.2 and using the estimates in its proof, we can state that there exists a constant C > 0 such that

|ψ(ũ, u^k)| ≤ C ‖ũ − u^k‖²_{L²(0,T)},

where C increases linearly with ε. On the other hand, since the HP functions are concave, ‖ũ − u^k‖²_{L²(0,T)} decreases as O(1/ε²). Therefore, given the acceptance threshold in Step 5 of the SQH algorithm, it is always possible to choose ε sufficiently large such that |ψ(ũ, u^k)| satisfies the required bound.
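This penalisation effect is easy to check numerically on a hypothetical scalar model: for f(u) = −(ν/2)u² − ε(u − v)², the maximiser ũ = 2εv/(ν + 2ε) satisfies ũ − v = −νv/(ν + 2ε) = O(1/ε), so each tenfold increase of ε shrinks the distance |ũ − v| roughly tenfold.

```python
def penalised_argmax(nu, eps, v):
    # argmax of the concave function f(u) = -(nu/2) u^2 - eps (u - v)^2
    return 2.0 * eps * v / (nu + 2.0 * eps)

nu, v = 0.5, 1.0
gaps = [abs(penalised_argmax(nu, e, v) - v) for e in (10.0, 100.0, 1000.0)]
# the ratios gaps[0]/gaps[1] and gaps[1]/gaps[2] approach 10 as eps grows
```

This is the O(1/ε) decay used above: the squared distance then decays as O(1/ε²).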
We remark that, subject to the assumptions of Proposition 3.2, if (u_1^k, u_2^k) generated by Algorithm 1 satisfies the PMP conditions, then Algorithm 1 stops, returning (u_1^k, u_2^k).

Numerical Experiments
In this section, we present results of four numerical experiments to validate the computational performance of the proposed SQH scheme. We remark that in all these experiments the structure of the corresponding problems is such that in Step 2 of the SQH algorithm the update ũ(t) at any fixed t can be determined analytically. The first experiment exploits the possibility of computing open-loop NE solutions to linear-quadratic Nash games by solving a coupled system of Riccati equations [14]. Thus, we use this solution for comparison with the solution of the same Nash game obtained with the SQH method.
The initial guess u^0 for the SQH iteration consists of zero functions, and we choose ζ = 0.95, σ = 1.05, ξ = 10^{-8}, ε_0 = 10, and κ = 10^{-14}. With this setting, we obtain the Nash strategies (u_1, u_2) depicted in Fig. 1 (left), together with the strategies obtained by solving the Riccati system, shown in Fig. 1 (right). We can see that the two sets of solutions overlap. Next, we consider the same setting but require that the players' strategies belong to given bounded, closed, and convex sets K_ad^(i). With this setting, we obtain the strategies depicted in Fig. 2, which shows u_1 (left) and u_2 (right) for the LQ Nash game with constraints on u as obtained by the SQH scheme.
In our third experiment, we consider a setting similar to the second experiment, but we add to the cost functionals a weighted L^1 cost of the strategies. In compact form, the functionals of the second experiment are augmented with the terms β_i ∫_0^T ‖u_i(t)‖_1 dt, i = 1, 2; the terminal terms with the matrices D_i, i = 1, 2, are omitted. We choose ν_1 = 0.1, ν_2 = 0.01, and β_1 = 0.01, β_2 = 0.1; the other parameters are set as in the first experiment. Furthermore, we require that the players' strategies are constrained by choosing K_ad^(i) = [−10, 10] × [−10, 10], i = 1, 2. The strategies obtained with this setting are depicted in Fig. 3. Notice that the addition of L^1 costs of the players' actions promotes their sparsity.

In the next experiment, we consider a tracking problem where the cost functionals have the following structure:

J_i(u_1, u_2) = (α_i/2) ∫_0^T |y(t) − y_d^(i)(t)|² dt + (ν_i/2) ∫_0^T |u_i(t)|² dt + β_i ∫_0^T ‖u_i(t)‖_1 dt + (γ_i/2) |y(T) − y_d^(i)(T)|²,

where y_d^(1) and y_d^(2) denote the players' desired trajectories. Notice that these trajectories are orthogonal to each other, that is, the two players have very different purposes. For the initial state, we take y_0 = (1/2, 1/2). In this fourth experiment, the values of the game parameters are given by α_1 = 1, α_2 = 10, ν_1 = 10^{-8}, ν_2 = 10^{-6}, β_1 = 10^{-8}, β_2 = 10^{-6}, and γ_1 = 1 and γ_2 = 1. Furthermore, we require that the players' strategies are constrained by choosing box sets K_ad^(i); the results are depicted in Fig. 4.
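The sparsity-promoting effect of the L^1 cost can be seen in the pointwise update itself. The following Python sketch solves a scalar prototype of the augmented HP maximisation with an L^1 action cost; the scalar model g(u) = c u − (ν/2 + ε)u² − β|u|, where c collects the linear terms, is a hypothetical stand-in for the actual pointwise problems of the experiments.

```python
import numpy as np

def l1_update(c, beta, nu, eps, lo, hi):
    """Maximiser over u in [lo, hi] of the concave scalar model
        g(u) = c u - (nu/2 + eps) u^2 - beta |u|,
    a prototype of the pointwise augmented-HP problem with an L1 action
    cost. The L1 term yields a soft-thresholding step: the update is
    exactly zero whenever |c| <= beta, which is the mechanism behind the
    sparsity of the players' actions observed in the third experiment."""
    u = np.sign(c) * np.maximum(np.abs(c) - beta, 0.0) / (nu + 2.0 * eps)
    return float(np.clip(u, lo, hi))
```

Since g is concave in one variable, clipping the unconstrained maximiser to the box gives the exact constrained maximiser; larger β widens the dead zone where the action is set to zero.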
For this concluding experiment, we report that the convergence criterion is achieved after 3593 SQH iterations, whereas the number of successful updates is 1686. Correspondingly, we see that ψ is always negative and its absolute value decreases monotonically, with ψ = −1.70 × 10^{−10} at convergence. On the other hand, we can see that the value of ε changes along the iteration, while the values of the players' functionals reach the Nash equilibrium. The CPU time for this experiment is 1151.4 s on a laptop computer.

Conclusion
A sequential quadratic Hamiltonian (SQH) scheme for solving open-loop differential Nash games was discussed. Theoretical and numerical results were presented that successfully demonstrated the well-posedness and computational performance of the SQH method applied to different Nash games governed by ordinary differential equations.
However, the applicability of the proposed method does not seem restricted to these models; it appears to be a promising technique for solving infinite-dimensional differential Nash games that have recently been considered in different fields of applied mathematics; see, e.g., [10,28,31].