An Update on Continuous-Time Stochastic Games of Fixed Duration

This paper shows that continuous-time stochastic games of fixed duration need not possess equilibria in Markov strategies. The example requires payoffs and transitions to depend on time in a continuous but irregular (nowhere approximately differentiable) way. This example offers a correction to the erroneous construction presented previously in Levy (Dyn Games Appl 3(2):279–312, 2013. 10.1007/s13235-012-0067-2).


Introduction
Following [5], a framework of continuous-time stochastic games of fixed duration is studied. In such games, the game is played on a fixed time interval, there are finitely many possible states and actions, and the players control the rate of the payoffs and the rate of transition between states. The staple model, due to [9], assumes that payoffs and transition rates are stationary, that is, time-independent, functions of the actions and state. However, as discussed in Levy [5, Sec. 9], many results concerning the model, including all the results in that paper, extend fairly automatically if the payoffs and transition rates depend in any (bounded and Borel) way on time.
The purpose of this corrigendum is to show that such games need not possess equilibria in Markov strategies, a natural class of strategies for these games which depend only on time and state, not on histories. Indeed, Levy [5] establishes a number of results concerning these strategies and their variations, in particular, the existence of extensive-form public-signal correlated Markov equilibria, various optimality equations for Markov equilibria, and a study of how approximate Markov equilibria can be constructed. Zachrisson [9] had previously established the existence of optimal Markov strategies in zero-sum games.
Levy [5] also claims to show that Markov equilibria, equilibria which depend only on the current state and on time, need not exist, even when the payoffs and transitions are stationary. That example is based on a construction carried out in Levy [4], in the framework of discounted stochastic games. As there, our example features an absorbing state with payoff 0, but while continuation payoffs enter the optimality conditions in discrete time with a positive sign, reflecting the fact that an agent is guaranteed the payoff of the current indivisible time period even if facing absorption immediately thereafter, the continuation payoffs enter our framework with a negative sign, reflecting the fact that a quick absorption will result in the loss of 'the payoff that could have been'. In addition to the required sign changes at various points in the analysis, one combats the intricacies resulting from the need to account for both present payoff and future payoff by forcing the payoffs to shrink as a function of time, hence guaranteeing that future payoff concerns are small (but, crucially, not negligible).

Recalling Model and Result
Following [5], the general framework for continuous-time stochastic games (also called Markov games; see [9]) with finite duration, allowing for time-dependent payoffs and transitions, consists of the following:⁵

• A finite set of states Z.
• A finite set of players P.
• A finite set of actions I^p for each p ∈ P. Denote I := ∏_{p∈P} I^p and Δ(I) := ∏_{p∈P} Δ(I^p), the set of mixed action profiles.
Given an initial state z₀ ∈ Z, the game is played in continuous time on the interval [0, T]. The states are governed by a stochastic process, in which the probability of a transition from state z to a state z′ ≠ z in the time interval [t, t + h], during which the players play the action profile a ∈ Δ(I), is given by μ(z′ | t, z, a) · h + o(h); all this is formalized in Levy [5, Sec. 2].
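The o(h) characterization above pins down the survival probability in a non-absorbing state: compounding the one-step non-transition probabilities over a fine grid yields an exponential of the integrated rate. A minimal numerical sketch, with a hypothetical rate function not taken from the paper:

```python
import math

# Sketch (hypothetical rate, not from the paper): the characterization
# P(transition from z0 in [t, t+h]) = mu(t) * h + o(h) compounds, over a grid
# of mesh h, to the survival probability exp(-int_0^t mu(s) ds).

def mu(t):
    """A hypothetical, time-dependent transition intensity out of z0."""
    return 0.5 + 0.25 * t

def survival_product(t, n):
    """Probability of no transition by time t, approximated as a product of
    one-step non-transition probabilities (1 - mu*h) over n steps."""
    h = t / n
    p = 1.0
    for k in range(n):
        p *= 1.0 - mu(k * h) * h
    return p

def survival_exact(t):
    """exp(-int_0^t (0.5 + 0.25 s) ds) = exp(-(0.5 t + 0.125 t^2))."""
    return math.exp(-(0.5 * t + 0.125 * t * t))

# As the mesh h -> 0, the product converges to the exponential:
print(survival_product(1.0, 100_000), survival_exact(1.0))
```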
A Markov strategy for player p ∈ P is a Lebesgue-measurable mapping u^p : Z × [0, T] → Δ(I^p). Given a Markov strategy profile u = (u^p)_{p∈P}, for each t ∈ [0, T], let u_t : Z × [0, T − t] → Δ(I) be defined by u_t(z, s) = u(z, s + t). Also denote, for each t ∈ [0, T], z ∈ Z, and p ∈ P,⁶

$$\gamma^p_u(t, z) := E^{u_t}_z\left[\int_0^{T-t} r^p\big(t + s,\, z(s),\, u_t(z(s), s)\big)\, ds\right],$$

where the expectation is taken w.r.t. the measure induced by the initial state z and the profile u_t, and z(s) denotes the state at time s. Denote further γ_u := (γ^p_u)_{p∈P}. The quantity γ^p_u(t, z) can be viewed as the payoff to a player p who evaluates his future payoffs starting at time t, assuming he is in state z at that time, under the profile u. A profile u of Markov strategies is a Markov equilibrium if for every z ∈ Z, every p ∈ P, and every Markov strategy τ^p of player p, we have γ^p_u(0, z) ≥ γ^p_{(τ^p, u^{−p})}(0, z). We note that the time-dependence of at least the payoffs is crucial for the example presented here; in fact, we impose very irregular time dependence, typical of, e.g., the path of a Brownian motion.⁷

⁵ In Levy [5, Sec. 2], we first present a model in which the payoffs and transitions do not depend on time, but in Sec. 9 there, it is remarked that the model and all results generalize immediately. Zachrisson [9] similarly works with only stationary payoffs and transitions.

⁶ We note an additional typo in the middle term of Equation (3.1) of Levy [5, Sec. 3]: E^u_z there should be E^{u_t}_z, as it is written here. The evaluation is written correctly on the right side of that equation in terms of the transition matrix.

⁷ As such, it is still an open question whether stationarity of payoffs and transitions guarantees existence of Markov equilibria. If the answer to this question is affirmative, it still remains open whether one can allow time-dependence with sufficient regularity properties and still obtain equilibrium existence.
We remark that we strongly conjecture that one could make do with time dependence of only the payoffs (i.e., use stationary transition rates) to construct the counterexample, but in the name of simplicity we have not attempted to do so, inasmuch as it does not seem to add value.

Notations
Recall that ⟨·, ·⟩ denotes the inner product of vectors. In addition, the following notational conventions will be used:

• Throughout, ‖·‖ denotes the L^∞ norm. That is, for a vector or bounded real-valued function f, ‖f‖ = sup |f|, where the supremum is taken over the set of indices or the domain of f.
• If p is a mixed action over an action space I and i ∈ I, then p[i] denotes the probability that p chooses i.
• In connection with a tuple c indexed by the elements of some set T ⊆ P of players, if ℓ₁, …, ℓ_k ∈ T, then c^{ℓ₁,…,ℓ_k} will denote (c^{ℓ₁}, …, c^{ℓ_k}).

Optimality Criteria and Payoff Evolution
Fix a continuous-time stochastic game as per Sect. 2, and a Markov strategy profile u = (u^p)_{p∈P}. In Levy [5], Theorem 1 (p. 285) describes the evolution of γ_u over time, and Theorem 2 (p. 286) gives a criterion for a profile u to be a Markov equilibrium:⁸ γ_u is the unique absolutely continuous function with γ_u(T, ·) ≡ 0 satisfying the following differential equation for a.e. t ∈ [0, T]:

$$\frac{d}{dt}\gamma^p_u(t, z) = -r^p\big(t, z, u(z, t)\big) - \sum_{z' \neq z} \mu\big(z' \mid t, z, u(z, t)\big)\,\big(\gamma^p_u(t, z') - \gamma^p_u(t, z)\big),$$

and u is a Markov equilibrium if and only if, for a.e. t ∈ [0, T] and every z ∈ Z,

$$u(z, t) \in NE\Big( r(t, z, \cdot) + \sum_{z' \neq z} \mu(z' \mid t, z, \cdot)\,\big(\gamma_u(t, z') - \gamma_u(t, z)\big) \Big),$$

with the corresponding equilibrium payoff given by −(d/dt)γ_u(t, z). In particular, γ_u is absolutely continuous, and in fact Lipschitz, and hence a.e. differentiable,
where N E (resp. N E P) denotes the Nash equilibria (resp. Nash equilibria payoff) correspondence, which assigns to each normal-form game its set of Nash equilibria (resp. Nash equilibria payoffs).
In this paper, we will discuss games with a particular structure: there are only two states, one denoted z₀ and the other denoted 0, the latter of which is an absorbing state with payoff 0, i.e., r(·, 0, ·) ≡ μ(z₀ | ·, 0, ·) ≡ 0. Clearly, as only the non-absorbing state z₀ is of interest, we may drop reference to it and write γ_u(t), r(t, ·), and μ(t, ·); then, for a.e. t ∈ [0, 1],

$$\frac{d}{dt}\gamma^p_u(t) = -r^p\big(t, u(t)\big) + \mu\big(t, u(t)\big)\,\gamma^p_u(t), \tag{3.2}$$

and u is a Markov equilibrium if and only if, for a.e. t ∈ [0, 1],

$$u(t) \in NE\big( r(t, \cdot) - \mu(t, \cdot)\,\gamma_u(t) \big). \tag{3.3}$$

In (3.2) and (3.3), we see how, for both payoff and strategic purposes, we can clearly separate, in classic dynamic-programming fashion, the components resulting from the present/running payoff r(t, ·) and the components resulting from the expected continuation/future payoffs, μ(t, ·)γ_u(t). This separation will prove most useful along the way for the intuitions driving the constructions, in particular as we will force the continuation payoff vectors to be small (in norm) when compared to the running payoff vectors.
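The evolution equation for the two-state structure, integrated backward from the terminal condition γ_u(1) = 0, can be checked numerically. The sketch below uses constant, hypothetical values of r and μ; the paper's payoffs and rates are time- and strategy-dependent:

```python
import math

# Sketch with hypothetical constants (the paper's r and mu vary with time and
# with the strategies played): the two-state evolution equation reads
#   d/dt gamma(t) = -r + mu * gamma(t),  with gamma(1) = 0,
# whose solution for constant r, mu is gamma(t) = (r/mu)(1 - exp(-mu(1-t))).

r, mu = 1.0, 2.0

def gamma_closed_form(t):
    return (r / mu) * (1.0 - math.exp(-mu * (1.0 - t)))

def gamma_euler(t, n=200_000):
    """Integrate the ODE backward in time from the terminal condition at 1."""
    h = (1.0 - t) / n
    g = 0.0
    s = 1.0
    for _ in range(n):
        g -= h * (-r + mu * g)  # stepping from s down to s - h
        s -= h
    return g

print(gamma_euler(0.0), gamma_closed_form(0.0))  # both approx 0.4323
```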

Erratic Functions
Spurred by the error in the previous work, as discussed in the introduction, the construction at hand requires a perturbation by a sufficiently 'erratic' function, in a sense we make precise here. Let λ denote the Lebesgue measure on R. The following definition can be found, e.g., in Saks [8, Sec. VII.3]:

as δ → 0.

Table 2: The payoffs to C and D (G^{C,D})

The following is included in Theorem 3.3 of Saks [8, Sec. VII.3]:⁹

Berman [1] shows that, with probability one, the path of a Brownian motion is nowhere approximately differentiable, and in particular erratic; the path of a Brownian motion is well-known to be continuous with probability one. The existence of erratic continuous functions is also shown more directly in Jarník [3]; see also [7] and the references therein.
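For orientation, the standard density formulation of approximate differentiability, as it appears in Saks, can be stated as follows; this is a sketch of the background notion only, on which the paper's notion of erraticity (Definition 3.2) builds:

```latex
\text{$f$ is approximately differentiable at $t$, with approximate derivative $L$, if for every } \varepsilon > 0,
\qquad
\frac{\lambda\big(\{\, s \in [t-\delta,\, t+\delta] \;:\; |f(s) - f(t) - L\,(s-t)| > \varepsilon\,|s-t| \,\}\big)}{2\delta}
\;\longrightarrow\; 0
\quad \text{as } \delta \to 0.
```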

The Example's Stage Game
Our construction has three phases: (a) selecting four perturbations of a "base" game (Sect. 4). The payoffs to C and D (G^{C,D}) are given in Table 2. We state the properties of G that figure in the subsequent analysis. For a mixed strategy profile x, let G(x) be the vector of expected payoffs. In view of (b) and the bounds on the payoffs for C and D, the upper semicontinuity of the Nash equilibrium correspondence implies that there is an η₀ > 0 such that (4.1) holds whenever x is an equilibrium of a game G′ such that ‖G′ − G‖ ≤ η₀. (Note that the game in (4.1) is the original game G, but the profile x is the equilibrium of a perturbed game.) We fix such an η₀ > 0, and for each (j, k) ∈ {−1, 1}² we fix such a perturbation G^{j,k} of G, with ‖G^{j,k} − G‖ ≤ η₀, such that the unique Nash equilibrium x of G^{j,k} satisfies G^{C,D}(x) = (j, k). (The payoffs of A and B in G^{j,k} play no role in our analysis after Lemma 4.1 has been established.)
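The robustness-to-perturbation reasoning here can be illustrated in miniature: in a game solvable by iterated strict dominance, a payoff perturbation smaller than the dominance margins leaves the unique equilibrium in place. The toy 2×2 game below uses hypothetical payoffs and is not the paper's G (in which A has two and B has three actions):

```python
import itertools

# Toy illustration with hypothetical 2x2 payoffs: in a game solvable by
# iterated strict dominance, perturbing payoffs by less than the dominance
# margins leaves the unique equilibrium unchanged.

def pure_equilibria(u1, u2):
    """All pure Nash equilibria of a 2x2 bimatrix game (u1 row, u2 column)."""
    eqs = []
    for i, j in itertools.product((0, 1), repeat=2):
        if u1[i][j] >= u1[1 - i][j] and u2[i][j] >= u2[i][1 - j]:
            eqs.append((i, j))
    return eqs

# Row 0 and column 0 are strictly dominant, so (0, 0) is the unique
# equilibrium (pure or mixed) of the unperturbed game.
u1 = [[3.0, 2.0], [1.0, 0.0]]
u2 = [[2.0, 1.0], [3.0, 0.0]]
print(pure_equilibria(u1, u2))  # [(0, 0)]

eta = 0.1  # smaller than every strict-dominance margin
v1 = [[3.0 - eta, 2.0], [1.0 + eta, 0.0]]
v2 = [[2.0, 1.0 + eta], [3.0 - eta, 0.0]]
print(pure_equilibria(v1, v2))  # still [(0, 0)]
```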

The Stage Game
Next we describe a second strategic-form game; in our stochastic game there will be two states, one of which is absorbing with payoff 0 to all players, and the other of which has a running payoff that is a rescaling of this strategic-form game. The set of players is P = {A, B, C, C′, D, D′, E, F}. As above, player A has the pure strategies U and D, and player B has the pure strategies L, M, and R, but in this game players C and D have pure strategies 0 and 1. Players C′ and D′ also have pure strategies 0 and 1, and players E and F have pure strategies −1 and 1. Pure and mixed strategy profiles will be denoted by a = (a^A, a^B, a^C, a^{C′}, a^D, a^{D′}, a^E, a^F). The payoffs of this strategic-form game depend on a parameter ε ∈ (−1/2, 1/2). Denote

$$\psi(x) := \big(x^C - \tfrac{1}{2},\; x^D - \tfrac{1}{2}\big), \tag{4.2}$$

where x^C, x^D denote the probability that these players play 1. Among the payoffs in the game g₁(ε, ·): players A and B receive their payoffs in the perturbed game G^{a^E, a^F}; player E receives a^E · ⟨(1, ε), ψ(a)⟩, and player F receives a^F · ⟨(−ε, 1), ψ(a)⟩; the payoffs to C, C′, D, and D′ are described in the next paragraph. In the stochastic game given in Sect. 5, ε(·) will depend on time, and the transition rates are controlled by C, C′, D, and D′, so at each instant the other players will only be concerned with maximizing their running payoffs, which are a rescaling of g₁(ε(t), ·). Players A and B are playing a perturbation of the game G, as described above.
The running payoff to C′ is the negation of the running payoff to C, so C and C′ will have opposite views concerning the desirability of the game continuing (as opposed to transitioning to the absorbing state with zero payoffs). Leaving aside the components of the stage-game payoffs for C and C′ that depend only on the behavior of A and B, the conflict between C and C′ at time t is a zero-sum game that consists of matching pennies perturbed by these concerns about absorption to the state with payoff 0. These perturbations, i.e., these concerns, will be small enough that there is always a unique equilibrium, which is mixed. The conflict between D and D′ is similar to the conflict between C and C′, albeit with different payoffs, as they are affected by A, B in a different way.
The best responses of players E and F depend on the signs of the expectations of the inner products ⟨(1, ε), ψ(a)⟩ and ⟨(−ε, 1), ψ(a)⟩, respectively. For ε ∈ (−1/2, 1/2) and j, k = ±1, define

$$D_{j,k} := \big\{ v \in \mathbb{R}^2 : j \cdot \langle (1, \varepsilon), v \rangle > 0 \text{ and } k \cdot \langle (-\varepsilon, 1), v \rangle > 0 \big\}. \tag{4.3}$$

Observe that (1, ε) and (−ε, 1) are orthogonal, so the D_{j,k} are just the open quadrants of the plane under a certain rotation (see Fig. 1).
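A quick check of the orthogonality claim, and of how the sign pattern (j, k) classifies vectors into the rotated quadrants; the helper `quadrant` below is an illustrative device, not notation from the paper:

```python
# Quick check: (1, eps) and (-eps, 1) are orthogonal, and the sign pattern
# (j, k) of the two inner products sorts vectors into rotated open quadrants.

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

def quadrant(v, eps):
    """Return the pair (j, k) with v in D_{j,k}, or None on a boundary."""
    a = dot((1.0, eps), v)
    b = dot((-eps, 1.0), v)
    if a == 0.0 or b == 0.0:
        return None
    return (1 if a > 0 else -1, 1 if b > 0 else -1)

eps = 0.25
print(dot((1.0, eps), (-eps, 1.0)))   # 0.0: the two directions are orthogonal
print(quadrant((1.0, 0.0), eps))      # (1, -1): just below the rotated axis
print(quadrant((0.0, 1.0), eps))      # (1, 1)
```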
As mentioned, in the stochastic game defined in Sect. 5, ε(·) will be a function of the time t ∈ [0, 1], and we will see that in any Markov equilibrium, for a.e. t, behavior at time t is characterized by a mixed strategy profile x such that ψ(x) defined in (4.2) lies in one of the sets D_{j,k} (taken with ε = ε(t)), so that E and F play pure strategies, and consequently A and B are playing one of the perturbations G^{j,k} of G. In this sense, the behavior of A and B is well controlled.
The following lemma summarizes the properties of g₁ needed going forward concerning A, B, E, F; thereafter, we will only reference the payoffs of C, D, C′, D′.
For Part (b), observe that if ψ(x) ∈ D_{j,k}, then j · ⟨(1, ε), ψ(x)⟩ > 0 and k · ⟨(−ε, 1), ψ(x)⟩ > 0; from the payoffs of g₁, we see that players E, F play pure strategies with (a^E, a^F) = (j, k), so x^{A,B} is an equilibrium of G^{j,k}, which in turn implies G^{C,D}(x) = (j, k).

Equilibrium in a Stage
To complement the function g₁ already defined, define a payoff function g₂, which depends on a parameter ω = (ω_C, ω_D) ∈ R², in the following way:

We also denote by g(ε, ω, ·) the payoff function which is the sum of the two:

$$g(\varepsilon, \omega, \cdot) := g_1(\varepsilon, \cdot) + g_2(\omega, \cdot). \tag{4.6}$$

In the analysis conducted in Sect. 5.2 of the stochastic game we will present, the stage payoffs at time t will be a rescaling of g₁(ε, ·), where ε will be time-dependent as well, and g₂(ω, ·) will be a rescaling of the continuation payoffs, where ω_{C,D} = γ^{C,D}_u(t) for a candidate Markov equilibrium u; hence g(ε, ω, ·) will encompass all the strategic considerations of the agents at each time.¹¹ Recall the notation D_{j,k} given in (4.3). Equilibrium analysis for g(ε, ω, ·) will yield Proposition 4.4, which will summarize the properties of the equilibria of g(ε, ω, ·) needed later. En route to that proposition, we need the following lemma, which will play no role after Proposition 4.4 is established: suppose that |ω_C|, |ω_D| < 2 and that x is an equilibrium of g(ε, ω, ·).
Proof Since g^{A,B,E,F} = g₁^{A,B,E,F}, Lemma 4.2 applies to x, so (a) follows from Lemma 4.2(a). For part (b), we claim that the payoff to each of C and C′ decomposes into a part which is unaffected by x^{C,C′}, and 1/16 times the payoffs resulting from applying x^{C,C′} to the bimatrix game below.
(For example, the part of C's payoff affected by C and C′'s behavior that accrues in the future is (a^C + a^{C′}) · (1/64)ω_C.) Since |ω_C| < 2, this bimatrix game has a unique equilibrium, which must be x^{C,C′}. To see that x^C[1] = 1/2 + (1/16)ω_C, one can simply compare the payoff differences for C′. The result for x^D[1] follows by symmetry.
Recalling the definition of ψ given in (4.2), it follows that ψ(x) = (1/16)(ω_C, ω_D) = ω/16.
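The indifference computation invoked in the proof above can be illustrated on a generic "matching pennies plus bonus" game. The payoffs and scaling below are hypothetical, chosen only to show how a perturbation shifts the unique mixed equilibrium linearly, in the spirit of x^C[1] = 1/2 + (1/16)ω_C:

```python
from fractions import Fraction

# Illustrative only: the paper's bimatrix between C and C' is not reproduced
# here, so we use a generic "matching pennies plus bonus" game to show the
# indifference computation. The matcher gets +1 on a match, -1 on a mismatch,
# plus a bonus c whenever she plays 1; the opponent's equilibrium mixture is
# pinned down by making the matcher indifferent between her two actions.

def opponent_mix(c):
    """p = P(opponent plays 1), from the matcher's indifference:
    (2p - 1) + c = (1 - 2p)  =>  p = 1/2 - c/4."""
    return Fraction(1, 2) - Fraction(c) / 4

# No perturbation: the usual half-half mix of matching pennies.
print(opponent_mix(0))               # 1/2
# A perturbation shifts the mixture linearly (with a hypothetical scaling).
print(opponent_mix(Fraction(1, 4)))  # 7/16
```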
Proof From the definition of g₁,
• The transition rate μ(t, ·) := μ(0 | t, z₀, ·) ≥ 0 (the intensity of the flow out of z₀) is determined by the actions of players C, D, C′, D′, and is given by:

where recall that the actions a^C, a^{C′}, a^D, a^{D′} are in the action space {0, 1}.
Recall the notion of an erratic function introduced in Definition 3.2. We now state the main step in the argument:

To see the key intuition underlying the construction, suppose that u = (u^p)_{p∈P} is a Markov equilibrium of Γ̃. The optimality criteria recalled in Sect. 3.2, together with the fairly low transition rates, will show that at a.e. time t agents are playing an equilibrium of a game which is close to the game g₁(ε(t), ·). Since G is by far the largest component of this payoff for A, B, C, D, at a.e. time t these agents are getting a payoff close to an equilibrium payoff of G. Hence, γ^{C,D}_u is absolutely continuous, with derivative not far from an equilibrium payoff G^{C,D}, which is non-zero.¹² Because ε is erratic, for a.e. t such that γ^{C,D}_u(t) ≠ (0, 0), the best responses of E and F are pure, leading the perturbation of the base game to be one of the G^{j,k} whose equilibrium pushes the vector of future payoffs of C and D away from the origin in R² as we go forward in time; which is to say, the derivative of s ↦ ‖γ^{C,D}_u(s)‖² is positive at a.e. such t.

¹² Indeed, an essential feature of the construction is that G does not have any equilibria that give expected utility zero to both C and D, but nonetheless the origin is in the convex hull of the set of pairs of expected payoffs for C and D induced by the equilibria of G. The reader is referred to the discussion on this point on [6, p. 1244].
(d) By symmetry, the proof of (c) also establishes (d).
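In the proof below, J denotes the inner product of γ^{C,D}_u with its time derivative δ, which is half the derivative of the squared Euclidean norm; positivity of J is thus exactly the statement that the continuation-payoff vector of C and D moves away from the origin as time moves forward:

```latex
J \;=\; \big\langle \gamma^{C,D}_u(t),\, \delta(t) \big\rangle
\;=\; \gamma^C_u(t)\,\delta^C(t) + \gamma^D_u(t)\,\delta^D(t)
\;=\; \tfrac{1}{2}\,\frac{d}{dt}\Big( \big(\gamma^C_u(t)\big)^2 + \big(\gamma^D_u(t)\big)^2 \Big).
```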
Proof Let t ∈ (t₀, 1) be such that all the properties of Lemma 5.4 hold. To simplify notation, we drop the argument t. Denote δ = dγ^{C,D}/dt. We have J = γ^C · δ^C + γ^D · δ^D. Either |γ^C| ≥ (1/2)|γ^D|, and hence δ^C · γ^C ≥ (13/16)(1 − t)|γ^C|, or |γ^D| ≥ (1/2)|γ^C|, and hence δ^D · γ^D ≥ (13/16)(1 − t)|γ^D|. If both hold, then J ≥ (13/16)(1 − t) · (|γ^C| + |γ^D|) > 0. (The strict inequality is from Lemma 5.4(a).) Therefore, we may suppose that one of these holds, say the first without loss of generality, and the other does not, so 2|γ^D| < |γ^C|. By

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.