1 Introduction

We consider two-player zero-sum deterministic differential games defined on the time interval [0, T] (with \(T>0\)). The state of the game is given by a vector \(s\in \mathbb {R}^N\). The state is driven by the controls of the players, who can choose and adapt their controls continuously during play. Player 1 chooses controls from a convex and compact set U, while player 2 chooses controls from a convex and compact set V. Both players know the state of the game at any time, and they can choose their controls depending on both time and state. A control strategy for player 1 is therefore defined as a function \({\textbf{u}} \in {\mathcal {U}}\), where \({\mathcal {U}}\) is the set of Borel measurable functions with domain \([0,T]\times \mathbb {R}^N\) and codomain U. Similarly, a control strategy for player 2 is defined as a function \({\textbf{v}}\in {\mathcal {V}}\), where \({\mathcal {V}}\) is the set of Borel measurable functions with domain \([0,T]\times \mathbb {R}^N\) and codomain V.

Given initial time \(t\in [0,T]\) and initial state \(x\in \mathbb {R}^N\) of the game, the control strategies \({\textbf{u}}\) and \({\textbf{v}}\) of the players determine the state by the differential equation

$$\begin{aligned} \dot{s}(\tau )&= f(\tau ,s(\tau ),{\textbf{u}}(\tau ,s(\tau )),{\textbf{v}}(\tau ,s(\tau ))) \quad \text{ for } \tau \in [t,T], \\ s(t)&= x. \end{aligned}$$
(1)

Here \(f: [0,T]\times \mathbb {R}^N\times U\times V \rightarrow \mathbb {R}^N\) satisfies the following Lipschitz conditions: (a) There exists \(K_f > 0\), such that \(|f(t,x,u,v) - f(t,y,u,v)| \le K_f |x-y|\) for all \(t\in [0,T]\), \(x,y\in \mathbb {R}^N\), \(u\in U\), and \(v\in V\); (b) There exists \(L_f > 0\), such that \(|f(t,x,u,v) - f(t^\prime ,x,u,v)| \le L_f |t-t^\prime |\) for all \(t,t^\prime \in [0,T]\), \(x\in \mathbb {R}^N\), \(u\in U\), and \(v\in V\). Moreover, f is linear in the control variables u and v.

The payoff of the players consists of a running payoff and an additional payoff at termination:

$$\begin{aligned} J({\textbf{u}},{\textbf{v}};t,x) = \int _{t}^{T} \ell (\tau ,s(\tau ),{\textbf{u}}(\tau ,s(\tau )),{\textbf{v}}(\tau ,s(\tau )))\,\textrm{d}\tau + g(s(T)). \end{aligned}$$
(2)

Here \(\ell : [0,T]\times \mathbb {R}^N\times U\times V \rightarrow \mathbb {R}\) satisfies the following Lipschitz conditions: (a) There exists \(K_\ell > 0\), such that \(|\ell (t,x,u,v) - \ell (t,y,u,v)| \le K_\ell |x-y|\) for all \(t\in [0,T]\), \(x,y\in \mathbb {R}^N\), \(u\in U\), and \(v\in V\). (b) There exists \(L_\ell > 0\), such that \(|\ell (t,x,u,v) - \ell (t^\prime ,x,u,v)| \le L_\ell |t-t^\prime |\) for all \(t,t^\prime \in [0,T]\), \(x\in \mathbb {R}^N\), \(u\in U\), and \(v\in V\). Moreover, \(\ell \) is linear in the control variables u and v. Also, the function \(g: \mathbb {R}^N \rightarrow \mathbb {R}\) is Lipschitz continuous.
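To make the setting concrete, the following minimal Python sketch integrates the dynamics (1) with a forward Euler step and accumulates the payoff (2) by a left-endpoint Riemann sum. All names are illustrative; the functions `f`, `ell`, `g` and the feedback strategies `u`, `v` are assumed to be supplied by the user.

```python
import numpy as np

def simulate_payoff(f, ell, g, u, v, t0, x0, T, n_steps=1000):
    """Forward-Euler integration of the state equation (1), accumulating
    the running payoff of (2) with a left-endpoint Riemann sum."""
    dt = (T - t0) / n_steps
    s = np.asarray(x0, dtype=float)
    t = t0
    payoff = 0.0
    for _ in range(n_steps):
        uc, vc = u(t, s), v(t, s)          # feedback controls at (t, s)
        payoff += ell(t, s, uc, vc) * dt   # running payoff
        s = s + f(t, s, uc, vc) * dt       # Euler step for the dynamics
        t += dt
    return payoff + g(s)                   # add the terminal payoff g(s(T))
```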

Let us denote the class of games that fall under the given description by \({\mathcal {G}}\). A distinguishing feature of the games in \({\mathcal {G}}\) is the linearity in the control variables of both the dynamics and the running payoff, determined by the functions f and \(\ell \), respectively. All games in \({\mathcal {G}}\) have a value. In this paper, we propose an approximation scheme for finding the value of such games, which specifically exploits the linearity property. We compare the proposed scheme with approximation schemes that work in a more general setting, for games that may not have a value. These general approximation schemes converge to either the lower or upper value of a game, so either version of such a general scheme can be applied to find the value of a game in \({\mathcal {G}}\). In their implementation, the general approximation schemes solve a certain minimax or maximin problem repeatedly, for each point in a given grid. For the games in \({\mathcal {G}}\), we propose to replace the minimax or maximin problem by the computationally much easier problem of finding the value of a matrix game. Proof that the proposed scheme indeed converges to the value of a game in \({\mathcal {G}}\) is given in Appendix B.

We define

$$\begin{aligned} W^+(t,x) = \inf _{{\textbf{v}}\in {\mathcal {V}}} \sup _{{\textbf{u}}\in {\mathcal {U}}} J({\textbf{u}},{\textbf{v}}; t,x) \end{aligned}$$

and

$$\begin{aligned} W^-(t,x) = \sup _{{\textbf{u}}\in {\mathcal {U}}} \inf _{{\textbf{v}}\in {\mathcal {V}}} J({\textbf{u}},{\textbf{v}}; t,x). \end{aligned}$$

The quantities \(W^+(t,x)\) and \(W^-(t,x)\) are called the upper and lower value of the game (with initial time t and initial state x), respectively. If \(W^+(t,x) = W^-(t,x)\), then \(W(t,x) = W^+(t,x) = W^-(t,x)\) is called the value of the game. In the following, it will be convenient to treat the (arbitrarily chosen) initial time t and the initial state x as variables, and \(W, W^+\), and \(W^-\) as functions of t and x.

In order to find out whether the value of the game exists, one looks at its Hamiltonian \({\widetilde{H}}:[0,T]\times \mathbb {R}^N\times \mathbb {R}^N\times U\times V\rightarrow \mathbb {R}\), defined by

$$\begin{aligned} {\widetilde{H}}(t,x,\lambda ,u,v) = \ell (t,x,u,v) + \left<\lambda , f(t,x,u,v)\right>, \end{aligned}$$
(3)

where the notation \(\left<a,b\right>\) denotes the inner product of two vectors \(a,b\in \mathbb {R}^N\).

If we have

$$\begin{aligned} \inf _{v\in V} \sup _{u\in U} \widetilde{H}(t,x,\lambda ,u,v) = \sup _{u\in U} \inf _{v \in V} \widetilde{H}(t,x,\lambda ,u,v) \end{aligned}$$
(4)

for all \((t,x,\lambda ) \in [0,T]\times \mathbb {R}^N\times \mathbb {R}^N\), then the value of the differential game exists, for any given initial time \(t\in [0,T]\) and initial state \(x\in \mathbb {R}^N\). Condition (4) is known as the Isaacs condition (see, e.g., Bardi and Capuzzo-Dolcetta [1]).

In our context, the Hamiltonian \({\widetilde{H}}\) is linear in the variables u and v, since we assumed that the functions \(\ell \) and f are linear in u and v. Moreover, we assumed that the sets U and V are convex and compact. These facts imply that Von Neumann’s minimax theorem (Von Neumann [8]) applies to equation (4) and that equality indeed holds. Thus, the differential game defined by (1) and (2) has a value for any given initial time \(t\in [0,T]\) and initial state \(x\in \mathbb {R}^N\). We will be concerned with the numerical approximation of the value.

Let us define \(H: [0,T]\times \mathbb {R}^N\times \mathbb {R}^N \rightarrow \mathbb {R}\) as

$$\begin{aligned} H(t,x,\lambda ) = \displaystyle \min _{v\in V} \max _{u\in U} \widetilde{H}(t,x,\lambda ,u,v) = \max _{u\in U} \min _{v \in V} \widetilde{H}(t,x,\lambda ,u,v). \end{aligned}$$
(5)

(We may take the minimum and maximum instead of the infimum and supremum, because U and V are compact.) Now, the value \(W(t,x)\) of the differential game defined by (1) and (2) can be found as the solution of the following Hamilton–Jacobi-type partial differential equation (PDE):

$$\begin{aligned} \partial _t W(t,x) + H\big (t,x,DW(t,x)\big )&= 0 \quad \text{ for } (t,x)\in (0,T)\times \mathbb {R}^N, \\ W(T,x)&= g(x) \quad \text{ for } x\in \mathbb {R}^N. \end{aligned}$$
(6)

Here \(\partial _t W\) denotes the partial derivative of W with respect to the time variable and DW is the vector of partial derivatives with respect to the N state variables.

The PDE given by (6) often does not have a solution in the usual sense, where the solution is smooth everywhere. In such a situation, the notion of a viscosity solution, developed during the 1980s, is needed. Crandall et al. [3] introduced this notion for solutions of nonlinear first-order partial differential equations of the following Hamilton–Jacobi type:

$$\begin{aligned} \partial _t \phi (t,x) + H(t,x,\phi (t,x),D\phi (t,x))&= 0 \quad \text{ for } (t,x) \in (0,T)\times \mathbb {R}^N, \\ \phi (0,x)&= \phi _0(x) \quad \text{ for } x\in \mathbb {R}^N, \end{aligned}$$
(7)

where \(H: [0, T]\times \mathbb {R}^N\times \mathbb {R}\times \mathbb {R}^N \rightarrow \mathbb {R}\) is a continuous function. Crandall, Evans and Lions [3] proved uniqueness and stability results for equations of type (7). Existence was established by Crandall and Lions [4]. Finally, the convergence of general approximation schemes to the viscosity solution of (7) was proved by Souganidis [6].

2 Approximation Schemes

In this section, we discuss approximation schemes that converge to the viscosity solution of PDEs of the type given by (6). Note that (6) is a special case of (7) (with a Hamiltonian that does not depend on the solution itself), except that the boundary condition is imposed at time \(t=T\) instead of \(t=0\) (which can be ‘repaired’ by a substitution of the time variable). Thus, the results in Souganidis [6] apply to the situation here. In a subsequent paper, Souganidis [7] applied the results in [6] to prove that (under certain conditions) zero-sum differential games have an upper and a lower value, and that the value exists if the Isaacs condition is met. We will use concepts and results from Souganidis [7] to derive our result.

Consider a mapping \(F:[0,T]\times [0,T]\times C(\mathbb {R}^N) \rightarrow C(\mathbb {R}^N)\), where \(C(\mathbb {R}^N)\) denotes the set of all continuous functions on \(\mathbb {R}^N\). To guide the intuition: the mapping F takes the time t, a time step \(\rho \), and an approximation \(\phi \) of the viscosity solution of (6) at time \(t+\rho \) as its arguments, and returns \(F(t,\rho ,\phi )\) as an approximation of the viscosity solution at time t. The mapping F is applied as follows:

For a partition \(P = \{0=t_0< t_1< \ldots < t_K = T\}\) of [0, T], define \(\phi ^P: [0,T]\times \mathbb {R}^N \rightarrow \mathbb {R}\) by

$$\begin{aligned} \phi ^P(T,x)&= g(x), \\ \phi ^P(t,x)&= F\big (t,t_{k}-t,\phi ^P(t_{k},\cdot )\big )(x) \quad \text{ if } t\in [t_{k-1},t_k) \text{ for } k\in \{1,\ldots ,K\}. \end{aligned}$$
(8)

We call the mapping F an approximation scheme for (6) if \(\phi ^P\) converges to the viscosity solution of (6) as \(|P| = \max \nolimits _{1\le k \le K} (t_k - t_{k-1}) \rightarrow 0\). In Appendix A, we provide conditions under which the mapping F is indeed an approximation scheme for (6).
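As an illustration of (8), the sketch below performs the backward recursion over the grid times of a partition. It is a schematic rendering under stated assumptions: `F` is any scheme mapping \((t,\rho ,\phi )\) to a new function, `g` is the terminal datum, and `partition` is the increasing list \(t_0< \ldots < t_K\).

```python
def apply_scheme(F, partition, g):
    """Backward recursion (8): returns the list of approximations
    [phi^P(t_0, .), ..., phi^P(t_K, .)] at the grid times."""
    phis = [None] * len(partition)
    phis[-1] = g                                   # phi^P(T, .) = g
    for k in range(len(partition) - 1, 0, -1):
        t_prev, rho = partition[k - 1], partition[k] - partition[k - 1]
        phis[k - 1] = F(t_prev, rho, phis[k])      # step from t_k back to t_{k-1}
    return phis
```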

Souganidis [7] applied the theory of general approximation schemes to a class of differential games that is larger than the class defined by (1) and (2) and that contains games without a value (see [7], Section 3). Souganidis provided two different approximation schemes, one converging to the upper value and one converging to the lower value of the game. When applied to the games we consider here, these yield two approximation schemes that both converge to the value. The approximation schemes can be formulated as follows:

$$\begin{aligned} F^+(t,\rho ,\phi )(x) = \displaystyle \min _{v\in V} \max _{u\in U} \left( \rho \, \ell (t,x,u,v) + \phi \big (x + \rho f(t,x,u,v)\big )\right) \end{aligned}$$
(9)

and

$$\begin{aligned} F^-(t,\rho ,\phi )(x) = \displaystyle \max _{u\in U} \min _{v\in V} \left( \rho \, \ell (t,x,u,v) + \phi \big (x + \rho f(t,x,u,v)\big )\right) . \end{aligned}$$
(10)

In order to implement an approximation scheme, one must not only discretize time (with a partition P), but also restrict the calculations to a bounded and discretized subset of \(\mathbb {R}^N\) (the grid). Since the number of grid points grows rapidly with the dimension N, all approximation schemes become computationally expensive as N increases. For more information on discretization schemes in specific game-theoretic examples, we refer the interested reader to Appendix A of [1], to Cardaliaguet et al. [2], and to Falcone and Stefani [5].

The schemes defined by (9) and (10) have an additional computational obstacle: For every point in the grid one must solve a subproblem of either the type \(\min \nolimits _{v\in V} \max \nolimits _{u\in U} \zeta (u,v)\) (scheme (9)) or of the type \(\max \nolimits _{u\in U} \min \nolimits _{v\in V} \zeta (u,v)\) (scheme (10)). The function \(\zeta \) lacks any special structure that might make this an easy task. Therefore, a fine discretization of the sets U and V seems necessary to obtain good approximations for each of these subproblems. To address this second computational issue, we will propose an alternative approximation scheme that exploits the linearity of the functions \(\ell \) and f in the control variables u and v.
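To see where the cost arises, consider a naive implementation of the pointwise subproblem in (9) that simply scans discretizations of U and V; the work per grid point grows with the product of the two discretization sizes. The sketch is purely illustrative (all names are assumptions, not part of the schemes themselves).

```python
import numpy as np

def F_plus_pointwise(t, rho, phi, x, f, ell, U_disc, V_disc):
    """Brute-force evaluation of the subproblem of scheme (9) at one grid
    point x: min over v in V_disc of max over u in U_disc of
    rho * ell(t,x,u,v) + phi(x + rho * f(t,x,u,v))."""
    best_over_v = np.inf
    for v in V_disc:
        best_over_u = -np.inf
        for u in U_disc:
            val = rho * ell(t, x, u, v) + phi(x + rho * f(t, x, u, v))
            best_over_u = max(best_over_u, val)
        best_over_v = min(best_over_v, best_over_u)
    return best_over_v
```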

For the alternative scheme, we choose a finite set of controls on the boundaries of U and V, say \(\{u_1, \ldots , u_m\} \subset \partial U\) and \(\{v_1, \ldots , v_n\} \subset \partial V\), such that the convex hull of \(\{u_1,\ldots ,u_m\}\) and the convex hull of \(\{v_1,\ldots ,v_n\}\) are good approximations of U and V, respectively. Let us denote the convex hull of a set X in a vector space by \(\text{ conv }(X)\). We will assume here that U and V are polytopes, to ensure that \(U = \text{ conv }(\{u_1, \ldots , u_m\})\) and \(V = \text{ conv }(\{v_1, \ldots , v_n\})\).

We define

$$\begin{aligned} \varDelta _m = \{p = (p_1,\ldots , p_m)\mid \sum _{i=1}^m p_i = 1 \text{ and } p_i\ge 0 \text{ for } \text{ all } i\in \{1,\ldots ,m\}\} \end{aligned}$$

and

$$\begin{aligned} \varDelta _n = \{q = (q_1,\ldots , q_n)^\top \mid \sum _{j=1}^n q_j = 1 \text{ and } q_j\ge 0 \text{ for } \text{ all } j\in \{1,\ldots ,n\}\}. \end{aligned}$$

Additionally, let us define, for \(t,\rho \in [0,T]\), \(x\in \mathbb {R}^N\) and \(\phi \in C(\mathbb {R}^N)\), \(\varPsi (t,\rho ,x,\phi )\) as the \(m\times n\)-matrix whose entry \((i,j)\) equals

$$\begin{aligned} \varPsi _{ij}(t,\rho ,x,\phi ) := \rho \,\ell (t,x, u_i,v_j) + \phi \big (x + \rho f(t,x,u_i,v_j)\big ). \end{aligned}$$
(11)

For any matrix A, let us denote the value of the matrix game associated with A by \(\nu (A)\). We now define the scheme G by

$$\begin{aligned} G(t,\rho ,\phi )(x)&= \max _{p\in \varDelta _m} \min _{q\in \varDelta _n} p\, \varPsi (t,\rho ,x,\phi )\, q \\&= \min _{q\in \varDelta _n} \max _{p\in \varDelta _m} p\, \varPsi (t,\rho ,x,\phi )\, q \\&= \nu (\varPsi (t,\rho ,x,\phi )). \end{aligned}$$
(12)

The clear computational advantage of (12) is that the nonlinear ‘\(\min \max \)’ and ‘\(\max \min \)’ optimization problems in schemes (9) and (10), respectively, are replaced by the standard problem of finding the value of a matrix game. This can be done efficiently with linear programming techniques.
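For instance, \(\nu (A)\) can be obtained from a single linear program in the row player's mixed strategy: maximize w subject to \((A^\top p)_j \ge w\) for every column j, \(\sum _i p_i = 1\), and \(p \ge 0\). Below is a minimal sketch using `scipy.optimize.linprog` (assuming SciPy is available; this is the textbook LP formulation, not code from the paper).

```python
import numpy as np
from scipy.optimize import linprog

def matrix_game_value(A):
    """Value nu(A) of the zero-sum matrix game with payoff matrix A
    (row player maximizes), via the standard LP formulation."""
    m, n = A.shape
    # Decision variables: (p_1, ..., p_m, w); maximize w <=> minimize -w.
    c = np.zeros(m + 1)
    c[-1] = -1.0
    # Constraints w - (A^T p)_j <= 0 for every column j.
    A_ub = np.hstack([-A.T, np.ones((n, 1))])
    b_ub = np.zeros(n)
    # The probabilities p_i sum to one.
    A_eq = np.ones((1, m + 1))
    A_eq[0, -1] = 0.0
    b_eq = np.array([1.0])
    bounds = [(0, None)] * m + [(None, None)]      # p >= 0, w free
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    return res.x[-1]
```

For example, `matrix_game_value(np.array([[1., -1.], [-1., 1.]]))` returns 0, the value of matching pennies.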

In what follows we will explain how the application of G is equivalent to computing the value of a certain discrete and probabilistic game, related to the differential game defined by (1) and (2).

For a partition \(P = \{0=t_0< t_1< \ldots < t_K = T\}\) of [0, T] we define a 2-player zero-sum game that proceeds in stages numbered \(0,1,\ldots , K\), at times \(0=t_0, t_1, \ldots , t_K=T\), as follows: At each stage \(k< K\), player 1 must choose an element of \(\{u_1, \ldots , u_m\}\) and player 2 must choose an element in \(\{v_1, \ldots , v_n\}\). If player 1 chooses \(u_i\) and player 2 chooses \(v_j\) (at stage k in state \(s_k\)), then the stage payoff is given by

$$\begin{aligned} (t_{k+1} - t_k) \ell (t_k,s_k,u_i,v_j) \end{aligned}$$

and the next state is given by

$$\begin{aligned} s_{k+1} = s_k + (t_{k+1} - t_k) f(t_k,s_k,u_i,v_j). \end{aligned}$$

When stage K is reached, there is a terminal payoff \(g(s_K)\). Moreover, the game starts at stage 0 in state \(s_0 = x\).

The game described above consists of playing a sequence of classical matrix games. Such a game has a value, which can be determined by backward induction as follows: For \(x\in \mathbb {R}^N\) and \(k\in \{0,\ldots ,K\}\), let us define the number \({\widetilde{W}}(k,x)\) as the value of the subgame starting at stage k in state \(x\in \mathbb {R}^N\). We then trivially have, for all \(x\in \mathbb {R}^N\),

$$\begin{aligned} {\widetilde{W}}(K,x) = g(x). \end{aligned}$$
(13)

To determine \({\widetilde{W}}(k,x)\) for \(x\in \mathbb {R}^N\) and \(k < K\), we first determine the expected payoff if player 1 chooses control \(u_i\) and player 2 chooses control \(v_j\). The game then advances to stage \(k+1\) and state \(x + (t_{k+1} - t_k) f(t_k,x,u_i,v_j)\), where the players can expect a payoff equal to \(\widetilde{W}\left( k+1, x + (t_{k+1} - t_k) f(t_k,x,u_i,v_j)\right) \) (assuming they play optimally from stage \(k+1\) to K). Thus, the total expected payoff at stage k and state x, associated with the control pair \((u_i,v_j)\), equals

$$\begin{aligned} (t_{k+1} - t_k)\,\ell (t_k,x,u_i,v_j) + {\widetilde{W}}\left( k+1, x+ (t_{k+1} - t_k) f(t_k,x,u_i,v_j)\right) , \end{aligned}$$

which is the sum of the stage payoff \((t_{k+1} - t_k)\ell (t_k,x,u_i,v_j)\) and the subsequent payoff for the remaining stages. This is precisely the number \(\varPsi _{ij}(t,\rho ,x,\phi )\) defined by (11), where \(t = t_k\), \(\rho = t_{k+1} - t_k\), and \(\phi = {\widetilde{W}}(k+1,\cdot )\). Thus, in order to play optimally at stage k and state x, the players should play optimal mixed strategies for the matrix game associated with the matrix \(\varPsi \big (t_k,t_{k+1}-t_k,x,\widetilde{W}(k+1,\cdot )\big )\). It follows that

$$\begin{aligned} {\widetilde{W}}(k,x) = \nu \big (\varPsi \big (t_k,t_{k+1}-t_k,x,\widetilde{W}(k+1,\cdot )\big )\big ) = G\big (t_k,t_{k+1}-t_k,\widetilde{W}(k+1,\cdot )\big )(x). \end{aligned}$$
(14)

We see that application of the mapping G at moments \(t_0, t_1, \ldots , t_K\), as indicated by (8), yields exactly the value of the discrete and probabilistic game we described in this section. The main result of this paper states that G is indeed an approximation scheme for the PDE defined by (6).
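To summarize the construction, the sketch below performs the backward induction (13)-(14) for a one-dimensional state (N = 1, chosen only so that off-grid evaluations of \(\widetilde{W}(k+1,\cdot )\) can be done by linear interpolation). It reuses `matrix_game_value` from above; all other names are illustrative.

```python
import numpy as np

def value_by_backward_induction(partition, xs, f, ell, g, U_vertices, V_vertices):
    """Backward induction (13)-(14) on a one-dimensional spatial grid xs
    (sorted increasingly).  Returns W_tilde(0, .) evaluated on xs."""
    K = len(partition) - 1
    W = np.array([g(x) for x in xs], dtype=float)  # (13): W_tilde(K, x) = g(x)
    for k in range(K - 1, -1, -1):
        t, rho = partition[k], partition[k + 1] - partition[k]
        W_new = np.empty_like(W)
        for a, x in enumerate(xs):
            # Matrix (11): Psi_ij = rho * ell + W_tilde(k+1, next state),
            # with W_tilde(k+1, .) interpolated linearly between grid points.
            Psi = np.array([[rho * ell(t, x, u, v)
                             + np.interp(x + rho * f(t, x, u, v), xs, W)
                             for v in V_vertices] for u in U_vertices])
            W_new[a] = matrix_game_value(Psi)      # (14): value of the matrix game
        W = W_new
    return W
```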

Theorem 2.1

The mapping G is an approximation scheme for the PDE defined by (6).

A proof of Theorem 2.1 is given in Appendix B. The necessary background from Souganidis [7] is given in Appendix A.

3 Conclusions

Finding the value of a two-player zero-sum differential game with a fixed duration typically involves approximation schemes for calculating the viscosity solution of a corresponding Hamilton–Jacobi partial differential equation. Such schemes are computationally expensive, partly due to the rather complex subproblems that must be solved at every grid point in each iteration.

Here, we considered two-player zero-sum differential games with a fixed duration, whose payoffs and dynamics are both linear in the players' controls. For this special class of games, we proposed an alternative approximation scheme that replaces the difficult subproblem by the much easier problem of solving a matrix game. This gives the alternative scheme a clear computational advantage over the more generic schemes. We proved that the alternative scheme indeed converges to the value of the associated differential game as the discretization becomes finer.

We then introduced a discretized and probabilistic game, as an approximate version of the differential game, whose value can be determined in a straightforward manner by backward induction. We observed that the backward induction scheme for the discrete game in fact coincides with the alternative approximation scheme proposed earlier for calculating the viscosity solution of the differential game. This gives the alternative approximation scheme a clear interpretation.