Approximating the Value of Zero-Sum Differential Games with Linear Payoffs and Dynamics

We consider two-player zero-sum differential games of fixed duration, where the running payoff and the dynamics are both linear in the controls of the players. Such games have a value, which is determined by the unique viscosity solution of a Hamilton–Jacobi-type partial differential equation. Approximation schemes for computing the viscosity solution of Hamilton–Jacobi-type partial differential equations have been proposed that are valid in a more general setting, and such schemes can of course be applied to the problem at hand. However, such approximation schemes have a heavy computational burden. We introduce a discretized and probabilistic version of the differential game, which is straightforward to solve by backward induction, and prove that the solution of the discrete game converges to the viscosity solution of the partial differential equation, as the discretization becomes finer. The method removes part of the computational burden of existing approximation schemes.


Introduction
We consider two-player zero-sum deterministic differential games defined on the time interval [0, T] (with T > 0). The state of the game is given by a vector s ∈ ℝ^N. The state is driven by the controls of the players, who can choose and adapt their controls continuously during play. Player 1 chooses controls from a convex and compact set U, while player 2 chooses controls from a convex and compact set V. Both players know the state of the game at any time, and they can choose their controls depending on both time and state. A control strategy for player 1 is therefore defined as a function u ∈ 𝒰, where 𝒰 is the set of Borel measurable functions with domain [0, T] × ℝ^N and codomain U. Similarly, a control strategy for player 2 is defined as a function v ∈ 𝒱, where 𝒱 is the set of Borel measurable functions with domain [0, T] × ℝ^N and codomain V.
Given initial time t ∈ [0, T] and initial state x ∈ ℝ^N of the game, the control strategies u and v of the players determine the state by the differential equation

ṡ(τ) = f(τ, s(τ), u(τ, s(τ)), v(τ, s(τ))) for τ ∈ [t, T], s(t) = x. (1)

Here f : [0, T] × ℝ^N × U × V → ℝ^N satisfies the following Lipschitz conditions: (a) There exists K_f > 0 such that |f(t, x, u, v) − f(t, y, u, v)| ≤ K_f |x − y| for all t ∈ [0, T], x, y ∈ ℝ^N, u ∈ U, and v ∈ V. (b) There exists L_f > 0 such that |f(t, x, u, v) − f(t′, x, u, v)| ≤ L_f |t − t′| for all t, t′ ∈ [0, T], x ∈ ℝ^N, u ∈ U, and v ∈ V. Moreover, f is linear in the control variables u and v.

The payoff of the players consists of a running payoff and an additional payoff at termination:

J(u, v; t, x) = ∫_t^T ℓ(τ, s(τ), u(τ, s(τ)), v(τ, s(τ))) dτ + g(s(T)). (2)

Here ℓ : [0, T] × ℝ^N × U × V → ℝ satisfies the following Lipschitz conditions: (a) There exists K_ℓ > 0 such that |ℓ(t, x, u, v) − ℓ(t, y, u, v)| ≤ K_ℓ |x − y| for all t ∈ [0, T], x, y ∈ ℝ^N, u ∈ U, and v ∈ V. (b) There exists L_ℓ > 0 such that |ℓ(t, x, u, v) − ℓ(t′, x, u, v)| ≤ L_ℓ |t − t′| for all t, t′ ∈ [0, T], x ∈ ℝ^N, u ∈ U, and v ∈ V. Moreover, ℓ is linear in the control variables u and v. Also, the function g : ℝ^N → ℝ is Lipschitz continuous.

Let us denote the class of games that fall under the given description by G. A distinguishing feature of the games in G is the linearity in the control variables of both the dynamics and the running payoff, determined by the functions f and ℓ, respectively. All games in G have a value. In this paper, we propose an approximation scheme for finding the value of such games, which specifically exploits the linearity property. We compare the proposed scheme with approximation schemes that work in a more general setting, for games that may not have a value. These general approximation schemes converge to the lower or the upper value of a game, so either version of such a general scheme can be applied to find the value of a game in G. In their implementation, the general approximation schemes repeatedly solve a certain minimax or maximin problem, for each point in a given grid.
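To fix ideas, the linearity requirement is satisfied, for example, by control-affine dynamics and payoffs. The following display is an illustrative sketch of such a member of G; the coefficient functions A, B, C, a, b, c are our own notation, not from the original formulation:

```latex
% Illustrative (hypothetical) member of the class G:
% state s(\tau) \in \mathbb{R}^N, controls u \in U, v \in V.
\dot{s}(\tau) = A(\tau)\, s(\tau) + B(\tau)\, u + C(\tau)\, v,
\qquad
\ell(\tau, s, u, v) = \langle a(\tau), s \rangle
                    + \langle b(\tau), u \rangle
                    + \langle c(\tau), v \rangle,
% with A(\cdot), B(\cdot), C(\cdot), a(\cdot), b(\cdot), c(\cdot) bounded and
% Lipschitz in \tau, and U, V compact polytopes; then f and \ell are linear
% in the controls and satisfy the Lipschitz conditions above.
```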
For the games in G, we propose to replace the minimax or maximin problem by the computationally much easier problem of finding the value of a matrix game. A proof that the proposed scheme indeed converges to the value of a game in G is given in Appendix B.
We define

W⁺(t, x) = inf_{v∈𝒱} sup_{u∈𝒰} J(u, v; t, x) and W⁻(t, x) = sup_{u∈𝒰} inf_{v∈𝒱} J(u, v; t, x).

The quantities W⁺(t, x) and W⁻(t, x) are called the upper and lower value of the game (with initial time t and initial state x), respectively.
If W⁺(t, x) = W⁻(t, x), then the common quantity W(t, x) = W⁺(t, x) = W⁻(t, x) is called the value of the game. In the following, it will be convenient to treat the (arbitrarily chosen) initial time t and the initial state x as variables, and W, W⁺, and W⁻ as functions of t and x.
In order to find out whether the value of the game exists, one looks at the Hamiltonians

H⁺(t, x, λ) = min_{v∈V} max_{u∈U} (⟨f(t, x, u, v), λ⟩ + ℓ(t, x, u, v)) and H⁻(t, x, λ) = max_{u∈U} min_{v∈V} (⟨f(t, x, u, v), λ⟩ + ℓ(t, x, u, v)), (3)

where the notation ⟨a, b⟩ denotes the inner product of two vectors a, b ∈ ℝ^N. If

H⁺(t, x, λ) = H⁻(t, x, λ) (4)

for all (t, x, λ) ∈ [0, T] × ℝ^N × ℝ^N, then the value of the differential game exists, for any given initial time t ∈ [0, T] and initial state x ∈ ℝ^N. Condition (4) is known as the Isaacs condition (see e.g., Bardi and Capuzzo-Dolcetta [1]). In our context, the expression ⟨f(t, x, u, v), λ⟩ + ℓ(t, x, u, v) is linear in the variables u and v, since we assumed that the functions ℓ and f are linear in u and v. Moreover, we assumed that the sets U and V are convex and compact. These facts imply that Von Neumann's minimax theorem (Von Neumann [8]) applies to equation (4) and that equality indeed holds. Thus, the differential game defined by (1) and (2) has a value for any given initial time t ∈ [0, T] and initial state x ∈ ℝ^N. We will be concerned with the numerical approximation of this value.
Let us define H : [0, T] × ℝ^N × ℝ^N → ℝ as

H(t, x, λ) = min_{v∈V} max_{u∈U} (⟨f(t, x, u, v), λ⟩ + ℓ(t, x, u, v)). (5)

(We may take the minimum and the maximum instead of the infimum and supremum, because U and V are compact.) Now, the value W(t, x) of the differential game defined by (1) and (2) can be found as the solution of the following Hamilton–Jacobi-type partial differential equation (PDE):

∂_t W(t, x) + H(t, x, DW(t, x)) = 0 for (t, x) ∈ [0, T] × ℝ^N, W(T, x) = g(x). (6)

Here ∂_t W denotes the partial derivative of W with respect to the time variable and DW is the vector of partial derivatives with respect to the N state variables. The PDE given by (6) often does not have a solution in the usual sense, where the solution is smooth everywhere. In such a situation, the notion of a viscosity solution, developed during the 1980s, is needed. Crandall et al. [3] introduced this notion for solutions of nonlinear first-order partial differential equations of the following Hamilton–Jacobi type:

∂_t w(t, x) + H(t, x, w(t, x), Dw(t, x)) = 0 for (t, x) ∈ [0, T] × ℝ^N, w(0, x) = w₀(x), (7)

where H : [0, T] × ℝ^N × ℝ × ℝ^N → ℝ is a continuous function. Crandall, Evans and Lions [3] proved uniqueness and stability results for equations of type (7). Existence was established by Crandall and Lions [4]. Finally, the convergence of general approximation schemes to the viscosity solution of (7) was proved by Souganidis [6].

Approximation Schemes
In this section, we discuss approximation schemes that converge to the viscosity solution of PDEs of the type given by (6). Note that (6) is a simplified version of (7), except for the fact that the boundary condition is at time t = T instead of t = 0 (which can be 'repaired' by a substitution of the time variable). Thus, the results in Souganidis [6] apply to the situation here. In a subsequent paper, Souganidis [7] applied the results in [6] to prove that (under certain conditions) zero-sum differential games have an upper and a lower value and that the value exists if the Isaacs condition is met. We will use concepts and results from Souganidis [7] to derive our result.

Consider a mapping F : [0, T] × [0, T] × C(ℝ^N) → C(ℝ^N), where C(ℝ^N) denotes the set of all continuous functions on the domain ℝ^N. To guide the intuition: the mapping F takes the time t, a time-step ρ, and an approximation φ of the viscosity solution of (6) at time t + ρ as its arguments, and gives F(t, ρ, φ) as an approximation of the viscosity solution at time t. The mapping F is applied as follows: given a partition P = {0 = t_0 < t_1 < . . . < t_K = T} of [0, T], define

φ^P_K = g and φ^P_k = F(t_k, t_{k+1} − t_k, φ^P_{k+1}) for k = K − 1, . . . , 0. (8)

We call the mapping F an approximation scheme for (6) if φ^P converges to the viscosity solution of (6) as |P| = max_{1≤k≤K}(t_k − t_{k−1}) → 0. In Appendix A, we provide conditions under which the mapping F is indeed an approximation scheme for (6).

Souganidis [7] applied the theory of general approximation schemes to a class of differential games that is larger than the class defined by (1) and (2) and that contains games without a value (see [7], Section 3). Souganidis provided two different approximation schemes, one converging to the upper value of the game and one converging to the lower value. When applied to the games we consider here, these yield two different approximation schemes that both converge to the value.
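The substitution of the time variable mentioned above can be made explicit. As a brief sketch in our own notation: if W solves (6), then the time-reversed function w(τ, x) := W(T − τ, x) solves an equation of the form (7) with an initial rather than a terminal condition:

```latex
% Time reversal: w(\tau, x) := W(T - \tau, x).
% From \partial_t W + H(t, x, DW) = 0 and W(T, x) = g(x) we obtain
\partial_\tau w(\tau, x) - H(T - \tau, x, Dw(\tau, x)) = 0,
\qquad w(0, x) = g(x),
% i.e., an equation of type (7) with Hamiltonian
% \tilde H(\tau, x, r, \lambda) := -H(T - \tau, x, \lambda).
```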
The approximation schemes can be formulated as follows:

F⁺(t, ρ, φ)(x) = min_{v∈V} max_{u∈U} (ρ ℓ(t, x, u, v) + φ(x + ρ f(t, x, u, v))) (9)

and

F⁻(t, ρ, φ)(x) = max_{u∈U} min_{v∈V} (ρ ℓ(t, x, u, v) + φ(x + ρ f(t, x, u, v))). (10)

In order to implement an approximation scheme, one must not only discretize time (with a partition P); it is also necessary to restrict calculations to a bounded and discretized subset of ℝ^N (the grid). Since the number of points in the grid increases rapidly with the dimension N, all approximation schemes suffer from the fact that the number of calculations increases rapidly as N increases. For more information on discretization schemes in specific game-theoretic examples, we refer the interested reader to Appendix A of [1], to Cardaliaguet et al. [2], and to Falcone and Stefani [5].

The schemes defined by (9) and (10) have an additional computational obstacle: for every point in the grid, one must solve a subproblem of either the type min_{v∈V} max_{u∈U} ζ(u, v) (scheme (9)) or the type max_{u∈U} min_{v∈V} ζ(u, v) (scheme (10)). The function ζ lacks any special structure that might make this an easy task. Therefore, a fine discretization of the sets U and V seems necessary to obtain good approximations for each of these subproblems. To address this second computational issue, we propose an alternative approximation scheme that exploits the linearity of the functions ℓ and f in the control variables u and v.
For this purpose, we choose a finite set of controls on the boundaries of U and V, say {u_1, . . . , u_m} ⊂ ∂U and {v_1, . . . , v_n} ⊂ ∂V, such that the convex hull of {u_1, . . . , u_m} and the convex hull of {v_1, . . . , v_n} are good approximations of U and V, respectively. Let us denote the convex hull of a set X in a vector space by conv(X). We will assume here that U and V are polytopes, to assure that U = conv({u_1, . . . , u_m}) and V = conv({v_1, . . . , v_n}) for suitable choices of these finite sets. We define the sets of mixed strategies

Δ_m = {p ∈ ℝ^m : Σ_{i=1}^m p_i = 1 and p_i ≥ 0 for all i ∈ {1, . . . , m}} and Δ_n = {q ∈ ℝ^n : Σ_{j=1}^n q_j = 1 and q_j ≥ 0 for all j ∈ {1, . . . , n}}.
Additionally, let us define, for t, ρ ∈ [0, T], x ∈ ℝ^N and φ ∈ C(ℝ^N), Ψ(t, ρ, x, φ) as the m × n matrix for which entry (i, j) equals

Ψ_{ij}(t, ρ, x, φ) = ρ ℓ(t, x, u_i, v_j) + φ(x + ρ f(t, x, u_i, v_j)). (11)

For any matrix A, let us denote the value of the matrix game associated with A by ν(A). We now define the scheme G by

G(t, ρ, φ)(x) = ν(Ψ(t, ρ, x, φ)). (12)

The clear computational advantage of (12) is that the nonlinear 'min max' and 'max min' optimization problems in schemes (9) and (10), respectively, are replaced by the standard problem of finding the value of a matrix game. This can be done efficiently with linear programming techniques.
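To illustrate the last point, the value ν(A) of a matrix game can be computed by a single linear program. The following is a minimal sketch; the function name and the use of SciPy's `linprog` are our own choices, not part of the paper:

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value nu(A) of the zero-sum matrix game with payoff matrix A
    (row player maximizes), computed by a standard linear program."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    # Variables: mixed strategy p (m entries) and the game value v.
    # Maximize v subject to (A^T p)_j >= v for every column j, sum(p) = 1, p >= 0.
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # linprog minimizes, so minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v - (A^T p)_j <= 0 for each j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                           # sum of p equals 1; v unconstrained
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)], method="highs")
    return res.x[-1]
```

For example, for A = [[2, 0], [0, 1]] the row player's optimal mixture is (1/3, 2/3) and the value is 2/3.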
In what follows we will explain how the application of G is equivalent to computing the value of a certain discrete and probabilistic game, related to the differential game defined by (1) and (2).
For a partition P = {0 = t_0 < t_1 < . . . < t_K = T} of [0, T], we define a two-player zero-sum game that proceeds in stages numbered 0, 1, . . . , K, at times 0 = t_0, t_1, . . . , t_K = T, as follows: At each stage k < K, player 1 must choose an element of {u_1, . . . , u_m} and player 2 must choose an element of {v_1, . . . , v_n}. If player 1 chooses u_i and player 2 chooses v_j (at stage k in state s_k), then the stage payoff is given by

(t_{k+1} − t_k) ℓ(t_k, s_k, u_i, v_j),

and the next state is given by

s_{k+1} = s_k + (t_{k+1} − t_k) f(t_k, s_k, u_i, v_j).

When stage K is reached, there is a terminal payoff g(s_K). Moreover, the game starts in state x.
The game described above consists of playing a sequence of classical matrix games. Such a game has a value, which can be determined by backward induction as follows: For x ∈ ℝ^N and k ∈ {0, . . . , K}, let us define the number W(k, x) as the value of the subgame starting at stage k in state x ∈ ℝ^N. We then trivially have, for all x ∈ ℝ^N,

W(K, x) = g(x).

To determine W(k, x) for x ∈ ℝ^N and k < K, we first determine the expected payoff if player 1 chooses control u_i and player 2 chooses control v_j. The game then advances to stage k + 1 and position x + (t_{k+1} − t_k) f(t_k, x, u_i, v_j), where the players can expect a payoff equal to W(k + 1, x + (t_{k+1} − t_k) f(t_k, x, u_i, v_j)) (assuming they play optimally from stage k + 1 to K). Thus, the total expected payoff at stage k and state x, associated with the control pair (u_i, v_j), equals

(t_{k+1} − t_k) ℓ(t_k, x, u_i, v_j) + W(k + 1, x + (t_{k+1} − t_k) f(t_k, x, u_i, v_j)),

which is the sum of the stage payoff (t_{k+1} − t_k) ℓ(t_k, x, u_i, v_j) and the subsequent payoff for the remaining stages. This is precisely the number Ψ_{ij}(t, ρ, x, φ) defined by (11), where t = t_k, ρ = t_{k+1} − t_k, and φ = W(k + 1, ·). Thus, in order to play optimally at stage k and state x, the players should play optimal mixed strategies for the matrix game associated with the matrix Ψ(t_k, t_{k+1} − t_k, x, W(k + 1, ·)). It follows that

W(k, x) = ν(Ψ(t_k, t_{k+1} − t_k, x, W(k + 1, ·))) = G(t_k, t_{k+1} − t_k, W(k + 1, ·))(x).

We see that application of the mapping G at moments t_0, t_1, . . . , t_K, as indicated by (8), yields exactly the value of the discrete and probabilistic game we described in this section. The main result of this paper states that G is indeed an approximation scheme for the PDE defined by (6).
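The backward induction just described can be sketched in code. The following is a minimal one-dimensional illustration, not the paper's implementation: all names, the state grid, and the use of linear interpolation for evaluating W(k + 1, ·) off the grid are our own assumptions, and the matrix-game solver from the previous sketch is repeated so the block is self-contained.

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value nu(A) of the zero-sum matrix game with payoff matrix A
    (row player maximizes), via a standard linear program."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    c = np.zeros(m + 1)
    c[-1] = -1.0                                # maximize v  <=>  minimize -v
    A_ub = np.hstack([-A.T, np.ones((n, 1))])   # v <= (A^T p)_j for every column j
    b_ub = np.zeros(n)
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0                           # sum(p) = 1
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=[1.0],
                  bounds=[(0, None)] * m + [(None, None)], method="highs")
    return res.x[-1]

def discrete_game_value(ell, f, g, U_pts, V_pts, t_grid, x_grid):
    """Backward induction for the discretized game with N = 1.
    Returns W(0, x) for every x in x_grid.  W(k + 1, .) is evaluated at the
    shifted states by linear interpolation (np.interp clamps at the grid ends,
    so the grid should contain all reachable states)."""
    W = np.array([g(x) for x in x_grid], dtype=float)     # W(K, x) = g(x)
    for k in range(len(t_grid) - 2, -1, -1):              # stages K-1, ..., 0
        t, rho = t_grid[k], t_grid[k + 1] - t_grid[k]
        W_next = W.copy()
        W = np.array([
            matrix_game_value(
                [[rho * ell(t, x, u, v)
                  + np.interp(x + rho * f(t, x, u, v), x_grid, W_next)
                  for v in V_pts] for u in U_pts])        # the matrix Psi
            for x in x_grid])
    return W
```

As a sanity check on this toy setup: with f ≡ 0, g ≡ 0, and the bilinear running payoff ℓ(t, x, u, v) = u − v, each stage contributes ρ times the value of the static matrix game, so over [0, 1] with U_pts = [0, 1] and V_pts = [0, 0.5] one obtains W(0, x) = 1 − 0.5 = 0.5 at every grid point.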

Theorem 2.1 The mapping G is an approximation scheme for the PDE defined by (6).
A proof of Theorem 2.1 is given in Appendix B. The necessary background from Souganidis [7] is given in Appendix A.

Conclusions
Finding the value of a two-player zero-sum differential game with a fixed duration typically involves approximation schemes for calculating the viscosity solution of a corresponding Hamilton–Jacobi partial differential equation. Such schemes are computationally very expensive, partly due to the rather complex subproblems that need to be solved at each iteration.
Here, we considered two-player zero-sum differential games with a fixed duration, whose payoffs and dynamics are both linear in the players' controls. For this special class of games, we proposed an alternative approximation scheme that replaces the difficult subproblem by the problem of solving a matrix game. This gives the alternative scheme a clear computational advantage over more generic schemes. We proved that the alternative scheme indeed converges to the value of the associated differential game as the discretization becomes finer.
We then introduced a discretized and probabilistic game, as an approximate version of the differential game, for which the value can be determined in a straightforward manner, by backward induction. We observed that the backward induction scheme for the discrete game does in fact coincide with the earlier proposed alternative approximation scheme for calculating the viscosity solution of the differential game. This gives the alternative approximation scheme a clear interpretation.

Appendix A
In this appendix, we state a theorem (Theorem A.1) about the convergence of approximation schemes to the viscosity solution of the PDE given by (6). The theorem is a simplified version of Theorem 1.3(a) in Souganidis [7], adapted to equation (6), and we state it without proof. Its hypotheses concern the function H in the formulation of (6) and the mapping F: there are constants K, L, M > 0 bounding H and its moduli of continuity in its arguments, and there exist constants C_3 > 0 and C_4 > 0, as well as a constant C_5 that may depend on ‖φ‖ and ‖Dφ‖, appearing in the conditions imposed on the approximation scheme.
Appendix B

Recall the definition of the Hamiltonian H.

Proof of (H2): The required bound holds because the function ℓ is bounded.

Proof of (H3): Let t ∈ [0, T], x, y ∈ ℝ^N, and λ ∈ ℝ^N. Choose v* ∈ V attaining the outer minimum in the definition of H(t, y, λ), and then choose u* ∈ U attaining the inner maximum at (t, x, λ) against v*. The Lipschitz continuity of f and ℓ in the state variable then yields the required bound on H(t, x, λ) − H(t, y, λ); interchanging the roles of x and y gives the bound in the other direction.

Proof of (H4): Here we use that f and ℓ are Lipschitz continuous in the time variable, with constants L_f and L_ℓ, respectively. Similarly to the proof of (H3), we obtain the required bound.

Proof of (H5): Let t ∈ [0, T], x ∈ ℝ^N, and λ, μ ∈ ℝ^N. Choose v* ∈ V attaining the outer minimum in the definition of H(t, x, μ), and then choose u* ∈ U attaining the inner maximum at (t, x, λ) against v*. This yields the required bound on H(t, x, λ) − H(t, x, μ); similarly, one shows the bound on H(t, x, μ) − H(t, x, λ).

Proof of (H1): Let t, t′ ∈ [0, T], x, y ∈ ℝ^N and λ, μ ∈ ℝ^N. The bound follows by combining the estimates in the proofs of (H3), (H4), and (H5).

Proof of (F4): Let t, ρ ∈ [0, T] and φ ∈ C_b^{0,1}(ℝ^N). The estimate follows from the definition of G, where E denotes the m × n matrix with all entries equal to 1.
Furthermore, for all t, ρ ∈ [0, T] and x ∈ ℝ^N, corresponding entries of the two matrices Ψ(t, ρ, x, φ) and Ψ(t, ρ, x, φ′) differ by at most ‖φ − φ′‖. This implies that the values of the corresponding matrix games differ by at most ‖φ − φ′‖, from which the required estimate follows.

Proof of (F6): Let t, ρ ∈ [0, T] and φ ∈ C_b^{0,1}(ℝ^N). We see that (F6) holds with C_2 = B (and the exponential term e^{ρC_2} is not necessary here).

Proof of (F8): Observe that H(t, x, λ) = ν(Ĥ(t, x, λ)), where Ĥ(t, x, λ) denotes the m × n matrix with entries ⟨f(t, x, u_i, v_j), λ⟩ + ℓ(t, x, u_i, v_j). We see that (F8) holds with C_5 = (1/2) K_f².

Proof of Theorem 2.1: If ℓ, f and g are bounded, we can directly apply Theorem A.1 to conclude the proof. If ℓ, f and/or g is not bounded, then for any R > 0 we define bounded truncations ℓ_R, f_R and g_R. It is easy to see that g_R is Lipschitz continuous and that f_R and ℓ_R satisfy |f_R(t, x, u, v) − f_R(t, y, u, v)| ≤ K_f |x − y| and |ℓ_R(t, x, u, v) − ℓ_R(t, y, u, v)| ≤ K_ℓ |x − y| for all t ∈ [0, T], x, y ∈ ℝ^N, u ∈ U and v ∈ V. Also, |f_R(t, x, u, v) − f_R(t′, x, u, v)| ≤ L_f |t − t′| and |ℓ_R(t, x, u, v) − ℓ_R(t′, x, u, v)| ≤ L_ℓ |t − t′| for all t, t′ ∈ [0, T], x ∈ ℝ^N, u ∈ U and v ∈ V. Therefore, we can apply Theorem A.1 with respect to the functions ℓ_R, f_R and g_R.

Let (t, x) ∈ [0, T] × ℝ^N. We wish to choose R sufficiently large that W_R(t, x) = W(t, x) and that, for any partition P, we have W_R^P(t, x) = W^P(t, x). Here W_R refers to the value of the differential game defined by (1) and (2), where ℓ, f and g are replaced by the truncated functions ℓ_R, f_R and g_R. Similarly, W_R^P refers to the approximation of W_R that is obtained by applying G to the game with truncated functions. The choice R = e^{K_f T}(|x| + M_f) will do, with M_f = sup_{(t,u,v)∈[0,T]×U×V} |f(t, 0, u, v)|.