A Myopic Adjustment Process for Mean Field Games with Finite State and Action Space

In this paper, we introduce a natural learning rule for mean field games with finite state and action space, the so-called myopic adjustment process. The main motivation for these considerations is the complexity of the computations necessary to determine dynamic mean field equilibria, which makes it seem questionable whether agents are indeed able to play these equilibria. We prove that the myopic adjustment process converges locally towards stationary equilibria with deterministic equilibrium strategies under rather broad conditions. Moreover, for a two-strategy setting, we also obtain a global convergence result under stronger, yet intuitive conditions.


Introduction
Mean field games have been introduced by Lasry and Lions (2007) and Huang et al. (2006) in order to make dynamic games with a large number of players tractable. The central idea is to approximate these games with many players by a game with a continuum of anonymous players. Thereafter, a vibrant field of research emerged. In particular, the crucial assumption for most methods (a unique optimizer of the Hamiltonian) is typically not satisfied in the finite state and action setting (see Neumann (2020a, Remark 2.5)). This is the first paper that discusses learning in mean field games with finite state and action space. We introduce a myopic adjustment process, where agents choose to play a best response for the scenario that the current population distribution persists for all future times. This definition leads to a formulation of the process as a differential inclusion, and we prove existence of its trajectories under a continuity assumption.
Then we discuss under which conditions we can expect local convergence towards stationary equilibria. First, we explain that a general convergence result is only possible for deterministic stationary equilibria. Thereafter, we obtain a first convergence result for dynamics that are constant in m: under a classical irreducibility condition we establish local convergence towards stationary equilibria with a deterministic equilibrium strategy. For general dynamics, under some technical conditions as well as a condition similar to the irreducibility condition in the constant case, we are again able to prove convergence towards stationary equilibria with a deterministic equilibrium strategy. We conclude the analysis of local convergence with a discussion of examples.
At the end of the paper, we turn to the question of global convergence for games with only two sensible deterministic stationary strategies. Under certain technical conditions, we obtain that, whenever the instantaneous change of the population is non-orthogonal to the set of points where both strategies are simultaneously optimal, either global convergence towards deterministic stationary strategies happens or the trajectories stay in the set where both strategies are simultaneously optimal. This is often enough to prove global convergence towards stationary equilibria in mixed strategies, which we illustrate in an example.
The rest of the paper is structured as follows: Section 2 describes the mean field game model considered in this paper. Section 3 then introduces the myopic adjustment process and justifies its definition as a sensible partially rational learning rule. Moreover, it presents the myopic adjustment process for a simple example. In Section 4 we study the local convergence of the myopic adjustment process, and in Section 5 we study the global convergence for the special case of two strategies.
The Model

We remark that the model has first been introduced, in an analytic formulation and without the notion of stationary equilibria, in Doncel et al. (2019).
Let S = {1, . . ., S} (S > 1) be the set of possible states of each player and let A = {1, . . ., A} be the set of possible actions. By P(S) we denote the probability simplex over S and by P(A) the probability simplex over A. A (mixed) strategy is a measurable function π : S × [0, ∞) → P(A), (i, t) → (π ia (t)) a∈A with the interpretation that π ia (t) is the probability that at time t and in state i the player chooses action a. A strategy π = d : S × [0, ∞) → P(A) is deterministic if it satisfies for all t ≥ 0 and all i ∈ S that there is an a ∈ A such that d ia (t) = 1 and d ia' (t) = 0 for all a' ∈ A \ {a}. Sometimes the following equivalent representation is helpful: we represent a deterministic strategy as a function d : S × [0, ∞) → A, (i, t) → d i (t), where d i (t) = a states that at time t in state i action a is chosen. A stationary strategy is a map π : S × [0, ∞) → P(A) such that π ia (t) = π ia for all t ≥ 0. By Π we denote the set of all (mixed) strategies and by Π s the set of all stationary strategies. Similarly, we denote by D the set of all deterministic strategies and by D s the set of all deterministic stationary strategies.
Let for all a ∈ A and m ∈ P(S) the matrices (Q ••a (m)) a∈A be conservative generators, that is, Q ija (m) ≥ 0 for all i, j ∈ S with i ≠ j and Σ_{j∈S} Q ija (m) = 0 for all i ∈ S. The individual dynamics of each player given a Lipschitz continuous flow of population distributions m : [0, ∞) → P(S) and a strategy π : S × [0, ∞) → P(A) are given as a Markov process X π (m) with given initial distribution x 0 ∈ P(S) and infinitesimal generator given by the Q(t)-matrix with entries

Q_{ij}(t) = Σ_{a∈A} π_{ia}(t) Q_{ija}(m(t)).

Given the initial condition x 0 ∈ P(S), the goal of each player is to maximize his expected discounted reward

E [ ∫_0^∞ e^{−βt} Σ_{a∈A} π_{X_t a}(t) r_{X_t a}(m(t)) dt ],

where r : S × A × P(S) → R is a real-valued function and β ∈ (0, 1) is the discount factor. That is, for a fixed flow of population distributions m : [0, ∞) → P(S) the individual agent's decision problem is a Markov decision process with expected discounted reward criterion and time-inhomogeneous reward functions and transition rates.
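As a small illustration of the controlled generator, the rate matrix under a mixed strategy simply mixes the action-wise generators with the strategy weights. The following sketch uses invented rate matrices and an invented strategy (not taken from the paper) and checks that the mixed matrix is again a conservative generator:

```python
import numpy as np

# Invented data: S = 2 states, A = 2 actions. Q[a] is the conservative
# generator used when action a is played (0-based indices).
Q = np.array([
    [[-1.0, 1.0], [0.5, -0.5]],   # rates under action 0
    [[-2.0, 2.0], [1.0, -1.0]],   # rates under action 1
])

def mixed_generator(Q, pi):
    """Entries Q^pi_ij = sum_a pi_ia * Q_ija: row i mixes the action
    generators with the probabilities the strategy puts on each action."""
    A, S, _ = Q.shape
    return np.array([sum(pi[i, a] * Q[a, i] for a in range(A))
                     for i in range(S)])

pi = np.array([[0.3, 0.7],        # mixed in state 0
               [1.0, 0.0]])       # deterministic in state 1
Qpi = mixed_generator(Q, pi)
assert np.allclose(Qpi.sum(axis=1), 0.0)   # still a conservative generator
```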
In this paper we work under the following standing assumption, which ensures the well-definedness of the model as well as the existence of dynamic and stationary equilibria (Neumann, 2020a):

Assumption A1. For all i, j ∈ S and all a ∈ A the function m → Q ija (m) mapping from P(S) to R is Lipschitz continuous in m. For all i ∈ S and all a ∈ A the function m → r ia (m) mapping from P(S) to R is continuous in m.
Definition 2.1. Given an initial distribution m 0 ∈ P(S), a mean field equilibrium is a pair (m, π) consisting of a flow of population distributions m : [0, ∞) → P(S) with m(0) = m 0 and a strategy π : S × [0, ∞) → P(A) such that

• the distribution of the process X π (m) at time t is given by m(t), and
• the strategy π is optimal for the individual agent's decision problem given the flow m.

Definition 2.2. A stationary mean field equilibrium is given by a stationary strategy π and a vector m ∈ P(S) such that

• the law of X π (m) at any point in time t is given by m, and
• for any initial distribution x 0 ∈ P(S) the strategy π maximizes the expected discounted reward given the constant flow m.

The Myopic Adjustment Process
In general, it is not possible to compute dynamic mean field equilibria for the considered game; it is not even possible to explicitly characterize solutions of the individual control problem for a given non-constant flow of population distributions. Moreover, even in the case of a finite time horizon, the search for equilibria can only be reduced to a forward-backward system of ODEs, which can, most of the time, only be solved numerically (see Belak et al. (2019)). The aim of this section is to motivate and define a reasonable alternative decision mechanism for the agents.
In contrast to Cardaliaguet and Hadikhanloo (2017), we cannot assume that the game is played repeatedly; instead, we have to assume that the agent changes his strategy during the game. For this we note that the game at time t with current distribution m is, due to the time-homogeneous formulation and the infinite time horizon, equivalent to the game started at time 0 with initial distribution m. Moreover, we remind ourselves that the influence of the individual agent on the game characteristics, and thus on the payoff of the other players, is negligible. Therefore, it is reasonable to assume that the agents do not try to influence the other players' choices, but only maximize their own payoff. Because of time-homogeneity and the negligible influence on other players, we assume that the agents choose Markovian strategies that only depend on the current state and the current population distribution.
We assume that the agent, when choosing an optimal strategy given the current population distribution m, assumes that the population distribution stays constant. This myopia is not only a classical simplification, but the only sensible prediction an agent can compute in this setting. Indeed, in general one cannot compute optimal strategies for the Markov decision processes with non-stationary transition rates and rewards that arise here. Given such a constant prediction of the behaviour of the population, the optimization problem becomes a tractable Markov decision process with stationary transition rates and rewards. It is well known that there is always an optimal stationary strategy for the considered optimization problem (Guo and Hernández-Lerma, 2009), and it is again natural to assume that agents choose such a stationary strategy.
We remark that this assumption that agents choose a stationary strategy is classical and that there are several conceptual reasons for the use of these strategies (see Maskin and Tirole (2001)). Namely, stationary Markov strategies are the simplest (rational) form of decision-making in this context. Moreover, this type of strategy is related to subgame perfection, which is, as discussed earlier, a reasonable requirement in our setting. Indeed, these strategies ensure that a game with the same relevant characteristics (i.e. current state and population distribution) is played in the same way. Finally, the restriction to this type of strategy reduces the number of possible best responses and thus increases predictive power.
For our purpose the following result, explicitly characterizing the set of all optimal stationary strategies given a constant flow of population distributions, proves to be useful (see Neumann (2020a, Section 3)): Let us denote by V*(m) = (V*_j(m))_{j∈S} the unique solution of the optimality equation of the Markov decision process,

β V*_i(m) = max_{a∈A} { r_{ia}(m) + Σ_{j∈S} Q_{ija}(m) V*_j(m) } for all i ∈ S,

and define for each i ∈ S the set of maximizing actions

A_i(m) = argmax_{a∈A} { r_{ia}(m) + Σ_{j∈S} Q_{ija}(m) V*_j(m) }.

Furthermore, we set

D(m) = { d ∈ D s : d i ∈ A_i(m) for all i ∈ S }.

Then the set of all optimal stationary strategies is given by conv(D(m)).
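Numerically, the optimality equation above can be solved by uniformization, which turns the continuous-time discounted problem into a discrete-time contraction. The following sketch is a minimal illustration with invented data; the function name, the tolerance and the tie-breaking threshold are ours, not the paper's:

```python
import numpy as np

def optimal_stationary(Q, r, beta, tol=1e-10):
    """Solve beta * V_i = max_a { r_ia + sum_j Q_ija V_j } by uniformization:
    for c >= max_{i,a} |Q_iia| the matrices P_a = I + Q_a / c are stochastic
    and V = max_a (r_a + c P_a V) / (beta + c) is a contraction with
    modulus c / (beta + c) < 1."""
    A, S, _ = Q.shape
    idx = np.arange(S)
    c = max(1.0, -Q[:, idx, idx].min())
    P = np.eye(S) + Q / c
    V = np.zeros(S)
    while True:
        Vnew = (r.T + c * (P @ V)).max(axis=0) / (beta + c)
        if np.abs(Vnew - V).max() < tol:
            break
        V = Vnew
    rhs = r.T + Q @ V          # rhs[a, i] = r_ia + sum_j Q_ija V_j
    best = [np.flatnonzero(rhs[:, i] >= rhs[:, i].max() - 1e-7)
            for i in range(S)]
    return V, best             # best[i] plays the role of A_i(m)

# Invented instance: action 0 pays reward 1 in both states, action 1 pays 0,
# and both actions induce the same transition rates, so action 0 is optimal.
Q = np.array([[[-1.0, 1.0], [1.0, -1.0]],
              [[-1.0, 1.0], [1.0, -1.0]]])
r = np.array([[1.0, 0.0], [1.0, 0.0]])
V, best = optimal_stationary(Q, r, beta=0.5)
```

Here the fixed point is V = (2, 2) (constant reward 1 discounted at rate 0.5), and `best` singles out action 0 in both states.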
We argued that every agent will choose a strategy from the set conv(D(m)). However, we can neither assume that all agents choose one particular strategy nor that the agents or groups of them agree on a common strategy. Moreover, we also cannot describe which agents will choose which strategy. The only sensible assumption is that the population in aggregate chooses a strategy from the set conv(D(m)). The next lemma describes how the population's distribution evolves if all agents adopt this decision mechanism:

Lemma 3.1. Let m : [0, ∞) → P(S) be the distribution of the population, where at time t ≥ 0 any agent chooses a strategy from conv(D(m(t))). Then

ṁ(t) ∈ { (Q π (m(t))) T m(t) : π ∈ conv(D(m(t))) }    (2)

for almost all t ≥ 0.
Proof. Using the Kolmogorov forward equation, the individual agent's dynamics given any strategy π ∈ Π s can be equivalently described as the solution of the ordinary differential equation (in the sense of Carathéodory) ẋ(t) = (Q π (m(t))) T x(t) with initial condition x(0) = x 0 . Since the aggregated strategy π of the population satisfies π ∈ conv(D(m)), the desired claim follows.
With these preparations we define the myopic adjustment process as a solution of the differential inclusion, in the sense of Deimling (1992), given by (2). Namely, a trajectory of the myopic adjustment process is an absolutely continuous function m : [0, ∞) → P(S) satisfying

ṁ(t) ∈ F(m(t)) := { (Q π (m(t))) T m(t) : π ∈ conv(D(m(t))) } for almost all t ≥ 0.    (3)

We remark that the use of differential inclusions as a modelling tool for situations where uncertainty, the absence of control or a variety of available dynamics occurs is classical (Aubin and Cellina, 1984).
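A trajectory of the process can be approximated by a forward-Euler scheme that, in each step, selects one best response from D(m) and follows the corresponding forward equation. The game data (`Qmat`, `rmat`) below are invented for illustration and are not the paper's example; note also that always picking a single maximizer is only one admissible selection of the inclusion:

```python
import numpy as np

# Invented 2-state, 2-action game whose rates and rewards depend on m.
def Qmat(m):
    return np.array([
        [[-m[1], m[1]], [m[0], -m[0]]],    # action 0
        [[-0.5, 0.5], [0.5, -0.5]],        # action 1
    ])

def rmat(m):  # rewards r[i, a]
    return np.array([[m[0], 0.2],
                     [0.1, m[1]]])

def best_response(m, beta=0.5, tol=1e-9):
    """One deterministic element of D(m): value iteration (via
    uniformization) with the population frozen at m."""
    Q, r = Qmat(m), rmat(m)
    idx = np.arange(2)
    c = max(1.0, -Q[:, idx, idx].min())
    P = np.eye(2) + Q / c
    V = np.zeros(2)
    while True:
        Vn = (r.T + c * (P @ V)).max(axis=0) / (beta + c)
        if np.abs(Vn - V).max() < tol:
            break
        V = Vn
    return (r.T + Q @ V).argmax(axis=0)    # one maximizing action per state

def myopic_trajectory(m0, T=20.0, dt=1e-2):
    """Forward-Euler selection of the inclusion: play a best response d(m)
    and follow the forward equation m' = (Q^d(m))^T m."""
    m = np.array(m0, float)
    for _ in range(int(T / dt)):
        d = best_response(m)
        Q = Qmat(m)
        Qd = np.array([Q[d[i], i] for i in range(2)])  # row i uses action d[i]
        m = m + dt * (Qd.T @ m)
        m = np.clip(m, 0.0, None)
        m /= m.sum()                                   # guard rounding drift
    return m

m_end = myopic_trajectory([0.9, 0.1])
```

The renormalization step only corrects floating-point drift; the exact flow keeps P(S) invariant.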
Before we start the analysis of the long-term behaviour, we note, relying on a classical existence result for differential inclusions, that under Assumption A1 a trajectory of the differential inclusion always exists:

Theorem 3.2. Under Assumption A1, for every initial condition m 0 ∈ P(S) there exists a trajectory of the myopic adjustment process (3) with m(0) = m 0 .

The proof can be found in Appendix A.
For our purpose, it is central to understand how stationary equilibria and the trajectories of the myopic adjustment process interact. The following observation, which is classical for many learning procedures, is a first step:

Remark 3.3. By definition, a point is a stationary point of (3) if and only if it is a stationary mean field equilibrium.
In Section 4 we analyse whether the myopic adjustment process started close to a stationary equilibrium converges towards it. Thereafter, in Section 5, we analyse in the setting of two sensible deterministic stationary strategies under which conditions convergence towards some stationary equilibrium can be expected irrespective of the starting point.
We conclude this section by discussing the shape of the myopic adjustment process for an example linked to consumer choice in the mobile phone sector, for which the stationary equilibria have previously been computed in Neumann (2020a):

Example 3.4. The agents can choose between two providers, and their utility is increasing in the share of customers using the same provider. The agents can switch the provider facing a time-unit cost c. However, the decision is not implemented immediately, but according to a Poisson process with rate b. The formal description of the model is as follows: Let S = {1, 2} and A = {stay, change}. Let δ > 0 be small. The transition rates and rewards are then given in terms of the model parameters, where ε, b, s 1 , s 2 and c are positive constants with ε < b.
Let us write (a 1 , a 2 ) for the deterministic strategy d with d 1 = a 1 and d 2 = a 2 . The myopic adjustment process is then defined as a solution of the differential inclusion ṁ(t) ∈ F(m(t)), where F is piecewise given by the dynamics of the respective optimal strategies. In Figure 1 we illustrate the behaviour of this process for a parameter choice for which any initial condition yields a unique solution of the differential inclusion.

Local Convergence
This section discusses the question of local convergence, that is, we analyse under which conditions we can expect that for an initial condition close to a stationary equilibrium the trajectories of (3) converge towards that equilibrium.
Local convergence of the myopic adjustment process will in general not happen if the equilibrium distribution m̂ satisfies |D(m̂)| > 1. Indeed, in this case there are points arbitrarily close to the equilibrium at which a strategy different from the equilibrium strategy is optimal, and for these strategies we cannot say anything about the behaviour of the trajectories; it might happen that we are pushed away from the equilibrium. Thus, it remains to verify whether we have local convergence towards a deterministic stationary mean field equilibrium (m̂, d) where d is the unique optimal strategy at m̂. As a first step, we observe that there is an ε > 0 such that D(m) = {d} for all m ∈ N ε (m̂), which yields that for all m ∈ N ε (m̂) we have F(m) = {(Q d (m)) T m}. Thus, it suffices to investigate whether there is a δ > 0 such that for all m 0 ∈ P(S) satisfying |m 0 − m̂| < δ the solution of ṁ(t) = (Q d (m(t))) T m(t) lies in N ε (m̂) for all t ≥ 0 and converges towards m̂. This question is closely linked to the notion of asymptotically stable solutions of autonomous ordinary differential equations ẋ = f(x). However, we do not consider arbitrary initial conditions in N δ (m̂), but only those that lie in P(S). Moreover, since Q d (m) is conservative for all m ∈ P(S), we always face a zero eigenvalue of the Jacobian. Thus, general stability results are not applicable.
The first positive result we present covers the case where the dynamics given the equilibrium strategy are constant, that is, ṁ = (Q d ) T m. In this setting it is even possible to explicitly characterize what "close" means:

Theorem 4.2. Let (m̂, d) be a stationary mean field equilibrium such that D(m̂) = {d} (that is, d is the unique optimal strategy at m̂). Furthermore, assume that Q d (m) is constant in m and an irreducible generator. Then there is a δ > 0 such that any solution of the myopic adjustment process (3) with initial condition m 0 ∈ P(S) ∩ N δ (m̂) converges exponentially fast to m̂, i.e. there are constants C 1 , C 2 > 0 such that ||m(t) − m̂|| ≤ C 1 e −C 2 t for all t ≥ 0.
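The spectral structure behind this theorem is easy to check numerically for a concrete generator: an irreducible conservative generator has a simple zero eigenvalue, and all remaining eigenvalues have strictly negative real part. A sketch with an invented generator:

```python
import numpy as np

# Invented irreducible conservative generator (row sums zero).
Qd = np.array([[-2.0, 1.5, 0.5],
               [ 1.0, -1.0, 0.0],
               [ 0.0, 2.0, -2.0]])
assert np.allclose(Qd.sum(axis=1), 0.0)

eig = np.linalg.eigvals(Qd.T)
# Exactly one eigenvalue is (numerically) zero; all others have strictly
# negative real part, which is the spectral picture used in Theorem 4.2.
zero = [l for l in eig if abs(l) < 1e-9]
neg = [l for l in eig if l.real < -1e-9]
assert len(zero) == 1 and len(neg) == len(eig) - 1
```

For this matrix the nonzero eigenvalues are (−5 ± i)/2, so the convergence rate in the theorem is governed by the spectral gap 5/2.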
Remark 4.3. We can determine δ > 0 explicitly: Let λ 1 , . . ., λ n be the distinct eigenvalues of Q d . By Asmussen (2003, Corollary 4.9) there is an eigenvalue λ i = 0 with multiplicity one and eigenvector m̂, and Re(λ j ) < 0 for all j ≠ i. Without loss of generality assume that i = 1, denote by m(λ j ) the multiplicity of λ j and by (v 0 j , . . ., v m(λ j )−1 j ) a basis of the generalized eigenspace Eig(λ j ) := ker(((Q d ) T − λ j I) m(λ j ) ). Choose ε > 0 such that D(m) = {d} for all m ∈ N ε (m̂); the radius δ is then given explicitly in terms of ε and constants c k j defined for all j ∈ {2, . . ., n} and k ∈ {0, . . ., m(λ j ) − 1} via this eigenbasis.

Proof of Theorem 4.2. The proof relies on the classical result that the solution of a linear ODE ẋ(t) = (Q d ) T x(t) with initial condition v ∈ Eig(λ) for some eigenvalue λ is given by

x(t) = e^{λt} Σ_{l=0}^{m(λ)−1} (t^l / l!) ((Q d ) T − λ I)^l v.

Since the generalized eigenvectors form a basis of R S , for any initial condition m 0 ∈ R S we find a unique set of coefficients (α k j ) in this basis (Logemann and Ryan (2014, Theorem 2.11)), and since m̂ is the eigenvector for the eigenvalue 0, we obtain a corresponding decomposition of m(t). Since the continuous-time Markov chain with generator Q d is ergodic, we have m(t) → m̂ as t → ∞, and since Re(λ j ) < 0 for all j ≥ 2 it holds that α 0 1 = 1. Thus, using that the function t → e^{Re(λ j )t} t^l / l! has a unique global maximum on [0, ∞) at t = −l / Re(λ j ), we obtain a bound on the distance of m(t) from m̂. If m 0 ∈ N δ (m̂) ∩ P(S), this bound yields m(t) ∈ N ε (m̂) for all t ≥ 0, and the exponential convergence then follows from (4).

Also in the case of general dynamics we can provide a similar positive statement. However, we have to impose additional conditions, since the necessary eigenvalue structure does not follow immediately in this setting:

Theorem 4.4. Let (m̂, d) be a stationary mean field equilibrium such that D(m̂) = {d} (that is, d is the unique optimal strategy at m̂). Let O ⊇ P(S) be an open set such that Q d : O → R S×S is componentwise Lipschitz continuous, the matrix Q d (m) is a transition rate matrix for all m ∈ P(S) and the function f d (m) := (Q d (m)) T m is continuously differentiable. Assume further that the Jacobian ∂/∂m f d (m̂) has a zero eigenvalue with eigenvector m̂ and that all other eigenvalues have strictly negative real parts. Then there is a δ > 0 such that any solution of the myopic adjustment process (3) with initial condition m 0 ∈ P(S) ∩ N δ (m̂) converges exponentially fast to m̂, i.e. there are constants C 1 , C 2 > 0 such that ||m(t) − m̂|| ≤ C 1 e −C 2 t for all t ≥ 0.
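The eigenvalue hypothesis of Theorem 4.4 can be checked numerically by differentiating f_d(m) = (Q_d(m))^T m with finite differences. The generator below is invented for illustration; note that this sketch only inspects the spectrum (one zero eigenvalue, the rest with negative real part), while the eigenvector condition of the theorem would have to be verified separately:

```python
import numpy as np

def Qd(m):
    """Illustrative state-dependent generator (invented, not the paper's model):
    the rate out of state 0 grows with the mass already in state 1."""
    return np.array([[-(0.5 + m[1]), 0.5 + m[1]],
                     [1.0, -1.0]])

def f(m):
    # f_d(m) = (Q_d(m))^T m, the right-hand side of the equilibrium dynamics
    return Qd(m).T @ m

def jacobian(fun, m, h=1e-6):
    """Central finite differences, one column per coordinate."""
    n = len(m)
    J = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        J[:, j] = (fun(m + e) - fun(m - e)) / (2 * h)
    return J

m_hat = np.array([0.5, 0.5])     # here Qd(m_hat)^T m_hat = 0
J = jacobian(f, m_hat)
eig = np.sort(np.linalg.eigvals(J).real)
# Spectral check: one (numerically) zero eigenvalue, the rest strictly negative.
assert abs(eig[-1]) < 1e-5 and eig[0] < -1e-5
```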
Remark 4.5. This result covers Theorem 4.2; however, the proof is non-constructive. In particular, in contrast to the setting of Theorem 4.2, we cannot explicitly describe δ.
The central idea of the proof has also been used in the analysis of nonlinear sinks in Hirsch and Smale (1974). Also here the central step of the proof is to find a suitable basis B with associated scalar product ⟨•, •⟩ B and norm || • || B as well as a constant cD > 0 such that the distance ||x(t)|| B to the equilibrium decays exponentially. However, it is not possible to directly use the techniques of Hirsch and Smale (1974) because of the zero eigenvalue (which always occurs since Q d (m) is conservative). For this reason we choose the basis B = (b 0 1 , b 0 2 , . . .) such that (i) b 0 1 = m̂ and the remaining vectors form bases of the generalized eigenspaces Eig(λ i ), and (ii) for all i ∈ {2, . . ., n} and x ∈ span(b 0 i , . . ., b m(λ i )−1 i ) a negative-definiteness estimate for the Jacobian holds. We first obtain such an estimate for all vectors x in the span of the generalized eigenvectors associated with λ 2 , . . ., λ n . As a next step we note that the set P(S) − m̂ := {x ∈ R S : ∃ m ∈ P(S) : x = m − m̂} is compact and that the coordinate transformation onto the basis B is a homeomorphism. As a final preparation we note that, by the same reasoning as in the proof of Theorem 3.2, the set P(S) is flow invariant for ṁ(t) = f d (m(t)). Thus, the set P(S) − m̂ is flow invariant for ẋ(t) = f d (x(t) + m̂).

By the definition of the derivative (which in particular yields a first-order expansion of f d around 0 in a neighbourhood of 0) and the Cauchy-Schwarz inequality, we obtain an estimate of the form ⟨f d (x + m̂), x⟩ B ≤ ⟨Ax, x⟩ B + o(||x||²_B) with A = ∂/∂m f d (m̂). Since for all x ∈ P(S) − m̂ we have ⟨Ax, x⟩ B ≤ −cD ||x||²_B, there is a δ > 0 such that the estimate (5) holds for all x ∈ N δ (0); making δ smaller if necessary, we also have D(x + m̂) = {d} for all x ∈ N δ (0) ∩ (P(S) − m̂). Now let x 0 ∈ N δ (0) ∩ (P(S) − m̂). Then by Peano's existence theorem there is a solution on some interval [0, t 0 ], and ||x(t)|| B is strictly decreasing on [0, t 0 ]. Thus, x(t) ∈ N δ (0) for all t ∈ [t 0 , t 0 + ε̃] for some ε̃ > 0. Repeating this argument, we obtain by Hirsch and Smale (1974, Section 8.5) that x(t) ∈ N δ (0) for all t ≥ 0. Furthermore, the estimate (5) yields ||x(t)|| B ≤ e −cDt ||x(0)|| B , which is the desired exponential convergence.
These theorems can be directly applied in examples: Indeed, we obtain for the consumer choice model introduced in Section 3 and analysed in Neumann (2020a) that local convergence happens for any deterministic stationary equilibrium where the equilibrium distribution does not equal the boundary value k 1 or k 2 , respectively. Also in a simplified version of the corruption model of Kolokoltsov and Malafeyev (2017), which has also been analysed in Neumann (2020a), we obtain local convergence for several stationary equilibria having a deterministic equilibrium strategy that is unique at the equilibrium point. More precisely, we obtain for any parameter choice local convergence towards those deterministic stationary equilibria where the equilibrium distribution lies in the interior of P(S), and for some parameter constellations we also obtain local convergence towards the deterministic stationary equilibria where the equilibrium distribution is (1, 0, 0) or (0, 1, 0).

Global Convergence for a Two Strategy Setting
The question of global convergence, namely "Given an arbitrary initial condition m 0 ∈ P(S), does any trajectory converge towards some mean field equilibrium?", is much more complex.
Here we provide a statement for the case that U := {d ∈ D s : ∃ m ∈ P(S) : D(m) = {d}} consists of exactly two strategies, i.e. U = {d 1 , d 2 }. The statement does not directly yield the desired convergence result; instead, we only obtain convergence towards equilibria with a deterministic equilibrium strategy or that the trajectory remains in a set where the two strategies from U are simultaneously optimal. However, relying on example-specific properties, we can then often prove the convergence towards the mixed strategy equilibria by hand.
If U = {d 1 , d 2 }, then the differential inclusion (3) describing the myopic adjustment process simplifies substantially: Define g : O → R as the function whose sign determines which strategy is optimal, in the sense that D(m) = {d 1 } whenever g(m) < 0 and D(m) = {d 2 } whenever g(m) > 0, and assume that g is twice continuously differentiable, i.e. assume that Q ija and r ia are twice continuously differentiable for all i, j ∈ S and a ∈ A on some open superset O of P(S). This means that, away from the set {m : g(m) = 0}, the trajectory is the trajectory of a nonlinear Markov chain, with generator Q d 1 (•) whenever g(m) < 0 and Q d 2 (•) whenever g(m) > 0. These processes are a generalization of classical Markov chains with the new feature that the transition probabilities do not only depend on the current state, but also on the current distribution of the process. For more details consider Kolokoltsov (2010) and (in particular regarding the long-term behaviour) Neumann (2020b). The processes are characterized through the transition probabilities (P ij (t, m)) i,j∈S , which describe the probability to be in state j at time t when at time 0 the state was i and the distribution was m, or (non-uniquely) through the marginal distributions Φ t i (m), which describe the probability to be in state i at time t when the initial distribution was m. One can show that it is indeed sufficient to characterize a nonlinear Markov chain through a nonlinear generator, that is, a Lipschitz continuous function Q : P(S) → R S×S such that Q(m) is a conservative generator for all m ∈ P(S).
In the following theorem, we require that the considered nonlinear Markov chains behave well in the long term. As in the theory of standard Markov chains, the invariant distribution is a central tool for this analysis; it solves the (now nonlinear) equation 0 = (Q(m)) T m. However, a condition weaker than classical ergodicity is enough. Indeed, it suffices to require that the nonlinear Markov chain converges in the limit towards some invariant distribution, that is, that for all m 0 ∈ P(S) there is an invariant distribution m̄(m 0 ) such that lim_{t→∞} Φ t (m 0 ) = m̄(m 0 ). This condition is indeed weaker than ergodicity, see Neumann (2020b, Section 4.2).
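An invariant distribution of a nonlinear Markov chain, i.e. a solution of 0 = (Q(m))^T m, can often be found by the naive fixed-point iteration m ← (invariant distribution of the frozen generator Q(m)). Convergence of this iteration is an assumption of the sketch, not a general fact, and the nonlinear generator below is invented for illustration:

```python
import numpy as np

def stationary(Q):
    """Invariant distribution of a fixed irreducible generator: solve the
    overdetermined but consistent system Q^T m = 0, 1^T m = 1 by least squares."""
    S = Q.shape[0]
    A = np.vstack([Q.T, np.ones(S)])
    b = np.zeros(S + 1)
    b[-1] = 1.0
    m, *_ = np.linalg.lstsq(A, b, rcond=None)
    return m

def nonlinear_invariant(Qfun, m0, iters=200):
    """Fixed-point iteration m <- stationary(Q(m)); any fixed point
    solves 0 = (Q(m))^T m."""
    m = np.array(m0, float)
    for _ in range(iters):
        m = stationary(Qfun(m))
    return m

# Invented nonlinear generator: the up-rate grows with the mass in state 1.
def Qfun(m):
    return np.array([[-(0.2 + m[1]), 0.2 + m[1]],
                     [0.8, -0.8]])

m_star = nonlinear_invariant(Qfun, [0.9, 0.1])
assert np.allclose(Qfun(m_star).T @ m_star, 0.0, atol=1e-8)
```

For this generator the iteration is a contraction, and the fixed point can be computed by hand: the second component solves x² = 0.2.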
With these preparations, we are able to formulate and prove the global convergence theorem:

Theorem 5.1. Assume that for all m ∈ O such that g(m) = 0 it holds that ∇g(m) ≠ 0. Furthermore, assume that the nonlinear Markov chains with transition rate matrix functions Q d 1 (m) and Q d 2 (m) converge in the limit towards some stationary distribution.
(i) If for all m ∈ O such that g(m) = 0 it holds that ⟨∇g(m), (Q d 1 (m)) T m⟩ < 0 and ⟨∇g(m), (Q d 2 (m)) T m⟩ < 0, then the myopic adjustment process converges towards some stationary mean field equilibrium with deterministic equilibrium strategy from U.

(ii) If for all m ∈ O such that g(m) = 0 it holds that ⟨∇g(m), (Q d 1 (m)) T m⟩ > 0 and ⟨∇g(m), (Q d 2 (m)) T m⟩ > 0, then the myopic adjustment process converges towards some stationary mean field equilibrium with deterministic equilibrium strategy from U.

(iii) If for all m ∈ O such that g(m) = 0 it holds that ⟨∇g(m), (Q d 1 (m)) T m⟩ > 0 and ⟨∇g(m), (Q d 2 (m)) T m⟩ < 0, then the myopic adjustment process either converges towards a deterministic stationary mean field equilibrium with equilibrium strategy from U or there is a T > 0 such that the process satisfies g(m(t)) = 0 for all t > T .
The gradient conditions of the theorem have an intuitive interpretation: The conditions in case (i) state that whenever g(m) = 0 the population distribution heads into the set where the strategy d 1 is optimal, and the conditions in case (ii) state that whenever g(m) = 0 the population distribution heads into the set where the strategy d 2 is optimal. In case (iii) the conditions say that whenever g(m) = 0 and the population chooses d 1 , the distribution tends into the set where d 2 is optimal, and whenever g(m) = 0 and the population chooses d 2 , the distribution tends into the set where d 1 is optimal.
Proof of Theorem 5.1. We first note that if there is a T ≥ 0 such that the trajectory satisfies g(m(t)) < 0 or g(m(t)) > 0 for all t ≥ T , then the solution of (3) is also a solution of ṁ(t) = (Q d 1 (m(t))) T m(t) or ṁ(t) = (Q d 2 (m(t))) T m(t), respectively, which means that (m(t)) t≥T are the marginals of a nonlinear Markov chain. By assumption, we thus obtain convergence towards some stationary point, which is, since g(m(t)) < 0 or g(m(t)) > 0 for all t ∈ [T, ∞), a stationary equilibrium.
In case (i), whenever g(m(T )) = 0 for some T ≥ 0, then g(m(t)) < 0 for all t > T . Indeed, assume that g(m(t)) = 0 for all t ∈ [T, T + ε] for some ε > 0. Then for almost all t ∈ [T, T + ε] we would have d/dt g(m(t)) = ⟨∇g(m(t)), ṁ(t)⟩ < 0, which contradicts g(m(t)) = 0 on this interval. Similarly, if we assume that g(m(t)) > 0 for all t ∈ (T, T + ε) with ε > 0, then for almost all t ∈ (T, T + ε) sufficiently close to T it holds that d/dt g(m(t)) < 0, again a contradiction. Thus, it either holds that g(m(t)) > 0 for all t ≥ 0 or that g(m(t)) < 0 for all t ≥ T for some T ≥ 0, which by the first observation yields the desired convergence. Case (ii) follows analogously.
For case (iii), if T̃ = inf{t ≥ T : g(m(t)) < 0} is finite, then, since m is absolutely continuous and g is continuously differentiable, we obtain for some ε 1 > 0 that d/dt g(m(t)) > 0 for almost all t ∈ [T̃ , T̃ + ε 1 ], a contradiction. Similarly, if T̃ = inf{t ≥ T : g(m(t)) > 0} is finite, then we obtain for some ε 2 > 0 that d/dt g(m(t)) < 0 for almost all t ∈ [T̃ , T̃ + ε 2 ], a contradiction. Thus, either g(m(t)) < 0 for all t ≥ 0 or g(m(t)) > 0 for all t ≥ 0, in which cases we obtain convergence towards some stationary equilibrium with a deterministic equilibrium strategy, or there is a T > 0 such that g(m(t)) = 0 for all t ≥ T .
Example 5.2. Let us consider the following example, which consists of two "good" states, where a positive reward is earned, and one "bad" state, where no reward is earned. The agents in the "good" states face congestion effects: there is a risk, increasing in the share of individuals in that state, of moving to the "bad" state. The control options are to switch between the two good states. One can interpret this model as a stylized version of the choice between two mobile phone providers, where the customer faces the risk of a breakdown in connection that increases in the share of customers using the same provider. For simplicity, we assume that agents in the "bad" state have no choice option, but recover into each of the two good states with equal probability. Theorem 5.1 yields that either convergence towards a stationary equilibrium with an equilibrium strategy from U happens or that there is a T ≥ 0 such that g(m(t)) = 0 for all t ≥ T . Since there is no stationary equilibrium with an equilibrium strategy from U, it is clear that there is a T ≥ 0 such that g(m(t)) = 0 for all t ≥ T , which means that m 1 (t) = m 2 (t) for all t ≥ T . Thus, also ṁ 1 (t) = ṁ 2 (t). By (3), this yields

−π 1,change (t) b m 1 (t) − e m 1 (t)² − ε m 1 (t) + π 2,change (t) b m 1 (t) + λ m 3 (t) = π 1,change (t) b m 1 (t) − π 2,change (t) b m 1 (t) − e m 1 (t)² − ε m 1 (t) + λ m 3 (t),

i.e. −π 1,change (t) b m 1 (t) + π 2,change (t) b m 1 (t) = 0. Thus, for almost all t ≥ T the trajectory of the myopic adjustment process has to satisfy

ṁ 1 (t) = −e m 1 (t)² − (ε + 2λ) m 1 (t) + λ,

which is a Riccati equation, for which [0, 1] is flow invariant and for which a unique classical solution exists for any initial condition m 0 ∈ [0, 1]. Numerical simulations indicate that in our setting convergence towards the distribution of the stationary mixed strategy equilibria is likely for initial conditions m 0 ∈ [0, 1].
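The Riccati dynamics at the end of the example are easy to simulate. The following sketch (with invented values for the parameters e, ε and λ) integrates the equation by forward Euler and compares the limit with the stable rest point, the positive root of the quadratic:

```python
import numpy as np

# Invented positive parameters standing in for e, epsilon and lambda.
e, eps, lam = 1.0, 0.2, 0.3

def rhs(m1):
    # Right-hand side of the Riccati equation on the set {m_1 = m_2}.
    return -e * m1 ** 2 - (eps + 2.0 * lam) * m1 + lam

def simulate(m1, T=50.0, dt=1e-3):
    """Forward Euler; clipping mirrors the flow invariance of [0, 1]."""
    for _ in range(int(T / dt)):
        m1 = min(1.0, max(0.0, m1 + dt * rhs(m1)))
    return m1

# Stable rest point: the positive root of e x^2 + (eps + 2 lam) x - lam = 0.
root = (-(eps + 2 * lam) + np.sqrt((eps + 2 * lam) ** 2 + 4 * e * lam)) / (2 * e)
assert abs(simulate(0.9) - root) < 1e-4 and abs(simulate(0.0) - root) < 1e-4
```

Since rhs is positive at 0 and negative at 1, trajectories from both ends of the interval settle at the same interior rest point, matching the simulations mentioned in the example.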

A. Appendix
Proof of Theorem 3.2. We show that the conditions of Lemma 5.1 in Deimling (1992) are satisfied, as this yields the desired existence statement. More precisely, we show in the following that

(i) F is upper semicontinuous,
(ii) F(m) is a closed, convex set for all m ∈ P(S), and
(iii) there is a constant c > 0 such that ||F(m)|| := sup{||y|| : y ∈ F(m)} ≤ c(1 + ||m||) for all m ∈ P(S).

Figure 1: Illustration of Example 3.4: The figure shows m 1 (t) for several trajectories of the myopic adjustment process given different initial conditions. The blue vertical lines represent the thresholds at which the set of optimal strategies changes.