Dynamic Games with Strategic Complements and Large Number of Players

We study dynamic games with strategic complements where each player is modeled by a scalar flow dynamical system with a controlled input and an uncontrolled output. The model originates in inventory control problems with shared set-up costs and a large number of players. An activation cost is shared among active players, namely players who control their dynamics at a given time. As a main contribution, we prove that two-threshold strategies, like the (s, S) strategies used in inventory control, are mean-field equilibrium strategies in dynamic games with a large number of players. Furthermore, we provide conditions for the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit.

In games with strategic complements, the benefit of a player scales with the percentage of players taking an action. This paper studies a dynamic game with strategic complements where the players have to coordinate actions within a finite-horizon window [2,3,19]. The dynamics of each player is a fluid flow dynamical system subject to a controlled input flow and a stochastic uncontrolled output flow. Activating the input flow requires an activation cost. The discrepancy between input and output flow accumulates in a state variable. Coupling derives from the activation cost, which is shared among all players who activate an input flow at a given time, called active players. Sharing the activation cost gives the players an incentive to be active when the number of active players increases. All results can be extended to the vector case by using the robust decomposition approach in [4, Section 3].
We extend the analysis in [19] to a mean-field scenario [1,9,10,13,14,16,17] characterized by microscopic and macroscopic dynamics. The microscopic dynamics is the fluid flow system determining the state of each player; the optimal control is obtained by solving a backward Bellman equation in the value function. The macroscopic dynamics takes the form of a Markov chain whose nodes represent all possible values of the players' states and whose links are weighted by the transition probabilities between states. The Markov chain determines the evolution of the distribution of the players' states over the different values. The resulting game involves both the microscopic and macroscopic dynamics in a unified framework and takes the form of a discrete-state, discrete-time mean-field game. Such a game consists of two coupled difference equations: a backward Bellman equation in the value function and a forward Markov dynamics in the distribution of the players' states. The mean-field equilibrium is obtained as the solution of these two coupled equations. The stationary solution is obtained in the asymptotic limit when the horizon length goes to infinity.

Contribution
This study contributes in different ways to advancing the theory of dynamic coordination games with activation costs and extends, for the first time, the use of two-threshold strategies to mean-field games. An example of a two-threshold strategy is the (s, S) strategy used in inventory control; see [7] and [5, Chapter 4]. In [5], the author derives the thresholds of the (s, S) policy for an individual player under a fixed cost. In this work, we present explicit expressions for these thresholds in the presence of a large number of players and an activation cost that depends on the fraction of active players at each time t. We recall that (s, S) strategies are strategies where replenishments occur anytime the inventory level goes below a lower threshold s; replenishments bring the inventory level back up to a higher threshold S. In particular, we highlight the following results:
• Strategies at a Nash equilibrium have a threshold structure. The lower and upper thresholds have explicit expressions in the deterministic case, namely when the demand is known, or in single-stage games.
• Two-threshold (s, S) strategies are mean-field equilibrium strategies for the stationary solution in dynamic games with a large number of players. Stationary solutions imply that the fixed cost is constant over the horizon, so the game decomposes into a set of uncoupled optimization problems. In each problem, a single player has to find the optimal strategy under a fixed cost. We then use the well-known optimality of (s, S) strategies under a fixed cost to show that such strategies are best responses for the game. Furthermore, we provide conditions for the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit.
• We corroborate our results with a numerical analysis of a stylized inventory model.
This paper is organized as follows. In Sect. 2, we introduce the model. In Sect. 3, we obtain the optimal thresholds. In Sect. 4, we study convergence to stationary solutions. In Sect. 5, we provide a numerical analysis. Finally, in Sect. 6, we draw conclusions and discuss future works.

Mean-Field Inventory Game
We consider a large number of indistinguishable players and a finite number of states (inventory levels). Let us assume that at stage t = 0, 1, ..., N the inventory level of an individual player is x_t ∈ Z, the player faces a stochastic demand ω_t ∈ Z_+ and orders a quantity u_t ∈ U_t ⊆ Z_+, where U_t denotes the set of admissible actions, Z is the set of integers, and Z_+ is the set of nonnegative integers. Hence, the microscopic dynamics of the player evolves according to the linear finite-state, discrete-time model

x_{t+1} = x_t + u_t − ω_t.    (1)

According to [5], in (s, S) strategies replenishments occur anytime the inventory level goes below a lower threshold s, and when a replenishment takes place it brings the inventory level back up to the upper threshold S [7]. In accordance with this strategy, let us define the control u_t as

u_t = S_t − x_t if x_t < s_t, and u_t = 0 otherwise.    (2)

After substituting the (s, S) strategy (2) in the dynamics (1), we obtain

x_{t+1} = S_t − ω_t if x_t < s_t, and x_{t+1} = x_t − ω_t otherwise.    (3)

To define the random parameter ω_t that corresponds to the uncertain demand at time t, let us consider a probability distribution φ_t, where φ^t_ω is the probability of having a demand of ω items at time t, for all ω ∈ Z_+. To derive a macroscopic dynamics for the system, let us denote by π_t the distribution of players over the states at time t. Hence, π_t is a vector that stores in each of its entries the fraction of players in each possible state. In particular, the jth entry π^t_j represents the fraction of players whose state is x_t = j at time t.

[Fig. 1: Markov chain representing the macroscopic dynamics (4) obtained from the microscopic dynamics (1).]

Occasionally, we will view π_t as an infinite-dimensional vector indexed by Z. Also, let π_0 be the initial distribution of players over the states. At every time step t, the players in state l decide the amount u_t to reorder. The order quantity, as well as the demand distribution, determines the transition probability P^t_{lj} from state l to state j.
Given the transition probabilities P^t_{lj} at time 0 ≤ t < N, the distribution of players at time t + 1 is given by the following macroscopic model, which takes the form of a Markov chain:

π^{t+1}_j = Σ_{l∈Z} π^t_l P^t_{lj}, for all j ∈ Z, for all t = 0, 1, ..., N.    (4)
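The (s, S) microscopic rule and the macroscopic update (4) can be sketched numerically. The following Python fragment is a minimal illustration (not the paper's code): it assumes a bounded state support so that the transition matrix is finite, and it borrows the thresholds s = 1, S = 2 and the uniform demand on {0, ..., 3} from the numerical section.

```python
import numpy as np

def transition_matrix(states, s, S, phi):
    """Transition probabilities P[l, j] induced by the (s, S) policy:
    a player with x < s reorders up to S (next state S - w), while a
    player with x >= s does not reorder (next state x - w)."""
    idx = {x: i for i, x in enumerate(states)}
    P = np.zeros((len(states), len(states)))
    for l, x in enumerate(states):
        start = S if x < s else x          # inventory position after ordering
        for w, q in phi.items():
            P[l, idx[start - w]] += q      # assumes start - w stays in the support
    return P

# Demand uniform on {0,1,2,3}; thresholds s = 1, S = 2 as in the numerical section.
states = list(range(-2, 3))
phi = {w: 0.25 for w in range(4)}
P = transition_matrix(states, s=1, S=2, phi=phi)

pi = np.full(5, 0.2)                       # initial distribution pi^0
pi_next = pi @ P                           # macroscopic update (4): pi^{t+1} = pi^t P
```

Each row of P sums to one, and the update preserves the total mass of players, mirroring (4).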
The transition probabilities P^t_{lj} used in the above equation are linked to the probability mass functions used to model the stochastic demand. To see this, let φ^t_0, φ^t_1, φ^t_2, ... be the probability masses at time t associated with ω_t = 0, 1, 2, ..., respectively. For any state l below the threshold s_t, where the players reorder up to level S_t, the transition probabilities are

P^t_{lj} = φ^t_{S_t − j} for j ≤ S_t, and P^t_{lj} = 0 otherwise.

For any state l equal to or greater than the threshold s_t, the transition probabilities are instead given by

P^t_{lj} = φ^t_{l − j} for j ≤ l, and P^t_{lj} = 0 otherwise.

Figure 1 depicts the Markov chain that represents the macroscopic dynamics (4). In the mean-field context, the fraction of active players, namely the players whose inventory level is below the lower threshold s_t, is then given by

a_t = Σ_{l < s_t} π^t_l, for all t = 0, 1, ..., N.
Likewise, we can define a value function v^t_j for any time t, representing the expected optimal cost for a player in the generic state j at time t. Let the transition probability matrix at time t be denoted by P_t = [P^t_{lj}]_{l,j∈Z}. Associated with each probability P^t_{lj}, there is a transition cost for going from state l to state j, which also depends on the distribution of players π_t; let us denote this cost by c^t_{lj}(π_t, P_t).
The average cost for the players in state l, when their dynamics follow the transition probability matrix P_t, for a given distribution π_t and a future cost defined by the value function v^{t+1}_j, for all j ∈ Z, is given by

e_l(π_t, P_t, v^{t+1}) = Σ_{j∈Z} [c^t_{lj}(π_t, P_t) + v^{t+1}_j] P^t_{lj}.

We are now in a position to provide the following definition of Nash equilibrium in the mean-field limit, in discrete time, and in discrete-state space.
A matrix P̄ is a Nash minimizer of e(π, ·, v) if, for every state l ∈ Z and every row q ∈ S_Z, it holds that e_l(π, P̄, v) ≤ e_l(π, P(P̄, q, l), v), where P(P̄, q, l) is obtained from matrix P̄ by replacing the lth row by q ∈ S_Z, the simplex of probability vectors over Z.
We say that the pair of time-varying distribution and value function (π_t, v^t) is a mean-field equilibrium if it solves the following system of coupled equations:

v^t_l = e_l(π_t, P̄_t, v^{t+1}), π^{t+1}_j = Σ_{l∈Z} π^t_l P̄^t_{lj},    (8)

where P̄_t is a Nash minimizer of e(π_t, ·, v^{t+1}).
In the above set of equations, we set the transition cost c^t_{lj} = c_{lj}(π_t, P_t) at time t as

c^t_{lj}(π_t, P_t) = K_t 1_{{l < s_t}} + r u_t + h max(0, j) + p max(0, −j),

with u_t given by (2), where K_t := K(a_t) ≥ 0 is the transportation cost charged to each player that is active at time t, 1_{{l < s_t}} is the indicator of the reordering condition, r ≥ 0 is the fixed purchase cost per stock unit, h ≥ 0 is the fixed penalty on holding, and p > h ≥ 0 is the fixed penalty on shortage.
The above transition cost can be rewritten in compact form. Note that the transportation cost K_t = K(a_t) paid by each player is a monotonically decreasing function of the fraction of active players at time t: as the fraction of active players a_t increases, the transportation cost K_t decreases. If a player makes an order, it incentivizes other players to reorder; this implies that the cost of one player also depends on the actions of the other players. Let us assume a large number of players M and a total transportation cost K̄. As an example, if the total cost is equally divided among the active players, the individual transportation cost charged to each player is given by K(a_t) = K̄/(M a_t) if the player is active, and it is zero otherwise.
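As a small illustration of the cost-sharing rule K(a_t) = K̄/(M a_t), the snippet below (a sketch, with K̄ = 1200 and M = 100 borrowed from the later numerical section) shows that the per-player charge decreases as the fraction of active players grows.

```python
def transportation_cost(a, K_bar=1200.0, M=100):
    """Per-player transportation cost when the total cost K_bar is split
    equally among the M*a active players; if nobody is active, no one pays."""
    n_active = round(M * a)
    return K_bar / n_active if n_active > 0 else 0.0
```

For instance, with half of the players active each one pays 1200/50 = 24, while with all players active each one pays only 12.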

Optimal Thresholds
In this section, we provide explicit expressions to obtain the lower threshold s and the upper threshold S, as a function of the probability distribution function φ t which determines the stochastic demand at each time t.
Let us denote by y_t = x_t + u_t the instantaneous inventory position, i.e., the inventory level just after the order has been issued, and let us define the following stage cost function:

G^t(γ) = r γ + h E{max(0, γ − ω_t)} + p E{max(0, −(γ − ω_t))}.

Then, for the value function we have

v^t(x_t) = min{ −r x_t + K_t + min_{y_t > x_t} G^t(y_t), −r x_t + G^t(x_t) },

where the term −r x_t + K_t + G^t(y_t) indicates the stage cost in case of reordering, and −r x_t + G^t(x_t) indicates the stage cost in case of no reordering. Hence, note that the cost of reordering is given by

−r x_t + K_t + G^t(S_t), with S_t = arg min_γ G^t(γ).

To obtain S_t, for an instantaneous inventory position γ, first let us define the expected holding E{max(0, γ − ω_t)} and the expected shortage E{max(0, −(γ − ω_t))} as

E{max(0, γ − ω_t)} = Σ_{ω ≤ γ} (γ − ω) φ^t_ω, E{max(0, −(γ − ω_t))} = Σ_{ω > γ} (ω − γ) φ^t_ω,

where φ^t_ω is the probability of having a demand of ω items at time t. Hence, the stage cost function G^t(γ) is given by

G^t(γ) = r γ + h Σ_{ω ≤ γ} (γ − ω) φ^t_ω + p Σ_{ω > γ} (ω − γ) φ^t_ω.
By applying the discrete difference operator Δ to the function G^t(γ), we then have

ΔG^t(γ) := G^t(γ + 1) − G^t(γ) = r + (h + p) Φ^t(γ) − p,

where Φ^t is the cumulative distribution function defined as Φ^t(γ) = Σ_{ω=0}^{γ} φ^t_ω. The order-up-to level S_t is the optimal γ, which is obtained from solving min{γ : ΔG^t(γ) ≥ 0}. From the above, we then obtain (Fig. 2):

S_t = min{ γ : Φ^t(γ) ≥ (p − r)/(p + h) }.    (13)

To obtain s_t, let us consider the cost of not reordering, which is given by −r x_t + G^t(x_t). Not reordering is optimal whenever its cost does not exceed the cost of reordering. In particular, we have (Fig. 3):

G^t(x_t) ≤ K_t + G^t(S_t).    (15)

Observe that the right-hand side of the inequality in (15) corresponds to the cost of reordering once we obtain the optimal upper threshold S_t.
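Formula (13) lends itself to a direct computation. The sketch below (illustrative, not the paper's code) scans the demand support for the first γ whose cumulative probability reaches the critical ratio (p − r)/(p + h):

```python
def order_up_to_level(phi, r, h, p):
    """Smallest gamma at which Delta G(gamma) = r + (h + p)*Phi(gamma) - p
    becomes nonnegative, i.e. the first gamma with
    Phi(gamma) >= (p - r) / (p + h)."""
    ratio = (p - r) / (p + h)
    cum = 0.0
    for gamma in sorted(phi):
        cum += phi[gamma]                 # cumulative distribution Phi(gamma)
        if cum >= ratio - 1e-12:
            return gamma
    raise ValueError("critical ratio not reached on the given support")

phi = {w: 0.25 for w in range(4)}         # uniform demand on {0,...,3}
S = order_up_to_level(phi, r=1, h=2, p=10)  # critical ratio 9/12 = 0.75
```

With r = 1, h = 2, p = 10 and uniform demand on {0, ..., 3}, the critical ratio is 0.75 and the routine returns S = 2, matching the numerical section.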
In order to obtain the lower threshold s t , we have to find the minimum inventory level x t that satisfies (15). As the penalty on shortage is greater than the penalty on holding ( p > h), if the inventory level decreases, then the left-hand side of the inequality in (15) increases. If the transportation cost K t decreases, the right-hand side of the inequality decreases and the minimum inventory level x t that satisfies (15) increases. Therefore, the lower the transportation cost the higher the threshold s t .
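The threshold rule (15) can also be tested numerically. The following sketch evaluates the single-stage cost G and searches for the smallest level at which not reordering satisfies (15); the demand and cost parameters are those of the later numerical example, and the search floor x_min is an arbitrary truncation introduced here for illustration.

```python
def stage_cost(gamma, phi, r, h, p):
    """G(gamma): purchase cost plus expected holding and shortage penalties."""
    hold = sum(max(0, gamma - w) * q for w, q in phi.items())
    short = sum(max(0, w - gamma) * q for w, q in phi.items())
    return r * gamma + h * hold + p * short

def lower_threshold(phi, r, h, p, K, S, x_min=-50):
    """Smallest inventory level x satisfying (15), i.e.
    min { x : G(x) <= K + G(S) }; players strictly below it reorder."""
    bound = K + stage_cost(S, phi, r, h, p)
    for x in range(x_min, S + 1):
        if stage_cost(x, phi, r, h, p) <= bound + 1e-9:
            return x
    return S  # x = S always satisfies (15) for K >= 0

phi = {w: 0.25 for w in range(4)}
s_mid = lower_threshold(phi, r=1, h=2, p=10, K=5, S=2)   # mid-range K
s_big = lower_threshold(phi, r=1, h=2, p=10, K=10, S=2)  # larger activation cost
```

Consistent with the discussion above, raising K enlarges the right-hand side of (15) and lowers the threshold: here s drops from 1 (K = 5) to 0 (K = 10).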
Equations (13) and (15) represent explicit expressions to obtain the two thresholds and fully characterize the reordering strategy once the probability distribution of the stochastic demand is given.
Once the thresholds are obtained, we implement the control u t , which is given by (2), and we obtain the resulting dynamics (3).
In the following, we study the time evolution of the first-order moment of the inventories. The expected inventory at time t, when x_t is distributed according to π_t, is given by

E[x_t] = Σ_{j∈Z} j π^t_j.

[Fig. 3: Value of x_t that satisfies inequality (15).]

Then, from (3), the expected inventory at time t + 1, when x_{t+1} is distributed according to π_{t+1} and the demand ω takes values in the support Ω ⊆ Z_+, follows the recursion

E[x_{t+1}] = Σ_{l < s_t} π^t_l (S_t − ω̄_t) + Σ_{l ≥ s_t} π^t_l (l − ω̄_t), where ω̄_t := Σ_{ω∈Ω} ω φ^t_ω.

From Σ_{l ≥ s_t} π^t_l = 1 − a_t, we have

E[x_{t+1}] = a_t S_t + Σ_{l ≥ s_t} l π^t_l − ω̄_t.    (17)

In the numerical example, we make use of (17) to obtain the first moment of the distribution of the inventory at time t + 1.
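A minimal sketch of the first-moment recursion (17), assuming a bounded support (the helper below is ours, introduced only for illustration):

```python
import numpy as np

def next_mean_inventory(pi, states, s, S, phi):
    """First-moment recursion (17): active players (x < s) restart from S,
    the others keep their level, and everyone faces the mean demand."""
    pi, states = np.asarray(pi, float), np.asarray(states)
    mean_demand = sum(w * q for w, q in phi.items())
    a = pi[states < s].sum()                  # fraction of active players a_t
    keep = (pi * states)[states >= s].sum()   # sum over l >= s of l * pi_l
    return a * S + keep - mean_demand

states = list(range(-2, 3))
phi = {w: 0.25 for w in range(4)}             # mean demand 1.5
pi = [0.2] * 5                                # uniform distribution over states
m1 = next_mean_inventory(pi, states, s=1, S=2, phi=phi)
```

For the uniform distribution above, a_t = 0.6, the non-active mass contributes 0.2·1 + 0.2·2 = 0.6, and (17) gives 0.6·2 + 0.6 − 1.5 = 0.3.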

Stationarity
In this section, we are interested in stationary solutions, namely solutions where both the distribution and the value function do not depend on time. In addition, the activation cost is a function of the fraction of active players; therefore, at a stationary solution the cost K(ã) is constant over the horizon and depends on the stationary solution itself. We can then apply the results obtained in Sect. 3 for a fixed activation cost K to obtain the optimal lower threshold s and the optimal upper threshold S.
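For a fixed policy, and hence a fixed transition matrix P, the stationary distribution solving π = πP can be approximated by power iteration. The sketch below is illustrative (it hard-codes the chain of the first numerical example, with s = 1, S = 2 and uniform demand on {0, ..., 3}) and recovers a steady state concentrated on states −1, 0, and 1:

```python
import numpy as np

def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Fixed point of the forward equation pi = pi P, by power iteration
    (assumes the chain under the fixed policy settles to a unique law)."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(max_iter):
        nxt = pi @ P
        if np.abs(nxt - pi).max() < tol:
            return nxt
        pi = nxt
    return pi

# Chain from the first numerical example: states {-2,...,2}, s = 1, S = 2.
states = list(range(-2, 3))
idx = {x: i for i, x in enumerate(states)}
P = np.zeros((5, 5))
for l, x in enumerate(states):
    start = 2 if x < 1 else x                # reorder up to S = 2 if below s = 1
    for w in range(4):                       # demand uniform on {0,...,3}
        P[l, idx[start - w]] += 0.25

pi_bar = stationary_distribution(P)
```

For this chain the fixed point can be checked by hand: π̄ = (1/16, 1/4, 1/4, 1/4, 3/16) on states (−2, −1, 0, 1, 2), so most of the mass indeed sits on −1, 0, and 1.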
Let us denote by (π̄, v̄) the generic stationary solution. The pair (π̄, v̄) is a mean-field equilibrium at steady state if it satisfies the following set of equations:

v̄_l = Σ_j [c_{lj}(π̄, P̄) P̄_{lj} + v̄_j P̄_{lj}] − λ̄,  π̄_j = Σ_l π̄_l P̄_{lj},    (18)

where λ̄ is the optimal average cost per stage. In [9], the authors prove that the optimal average cost can be seen as an average transition cost over the population of players. If P̄ is the optimal transition matrix and (π̄, v̄) is a stationary solution of (18), then

λ̄ = Σ_l Σ_j π̄_l c_{lj}(π̄, P̄) P̄_{lj}.

Assuming a bounded support for the demand ω, and therefore also for the inventory level x, which we denote by [1, η], let us define the matrix Ã = [ã_{ij}]_{i,j∈[1,η]}, where

ã_{ll} = −P_{0l} − Σ_{k≠l} P_{lk},  ã_{lj} = −P_{0j} + P_{lj} for j ≠ l.    (19)

Let us define the new variable ξ^t_{lk} := v^t_l − v^t_k, which can be seen as a potential difference between two generic states (nodes) l and k of the Markov chain, and the vector ξ^t_l := [ξ^t_{lj}]_{j∈Z}. In addition, denote P^t_l = [P^t_{lj}]_{j∈Z} and c_l = [c_{lj}]_{j∈Z} for all l ∈ Z. Before discussing the main contribution of this section, that is, the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit, we present an intermediate result on the structure of Ã introduced in (19).

Lemma 4.1 Let a bounded support [1, η] for the demand ω and for the inventory level x be given. Then the discrete-time dynamics of the potential difference vector ξ^t_0 is linear, with matrix Ã^t = [ã^t_{ij}]_{i,j∈[1,η]} whose entries ã^t_{ij} are of the form (19).
Proof The proof is in the Appendix.
In the following theorem, we present the conditions for the nonstationary mean-field equilibrium, which is a solution of (8), to converge to the stationary solution of problem (18). Note that the stochastic matrix P̄_t presented in equation (8) is a Nash minimizer of the average cost e(π_t, ·, v^{t+1}).
Proof The proof is in the Appendix.

Numerical Analysis
We consider an example where the demand ω_t ∈ Ω := {0, 1, 2, 3} is uniformly distributed, namely, using the notation φ_ω to indicate the probability that ω_t = ω, we have φ_ω = 1/4 for all ω ∈ Ω. Assume that the proportional purchase cost is r = 1, the shortage cost is p = 10, and the holding cost is h = 2. In the case of single-stage optimization, the order-up-to level is given by

S = min{ γ : Φ(γ) ≥ (p − r)/(p + h) } = min{ γ : Φ(γ) ≥ 3/4 }.

From the above, we obtain S = 2. Indeed, for γ = 3 we have Φ(3) = 1 ≥ 3/4. For γ = 2, we obtain Φ(2) = 3/4 ≥ 3/4. Differently, for γ = 1 it holds Φ(1) = 1/2 < 3/4, and therefore S = 2 is the smallest γ satisfying the condition. As for the reorder level s, we have to find the minimum inventory level x_t satisfying G(x_t) ≤ K_t + G(S), where G(2) = 6, G(1) = 9, and G(0) = 15. We show next that we have s = 1. Actually, for x_t = 1 we obtain 9 ≤ K_t + 6, which is satisfied by any K_t ≥ 3. For x_t = 0, we have 15 ≤ K_t + 6, which is satisfied by any K_t ≥ 9. For any K_t < 9, a player at level 0 therefore reorders. We can then conclude that for any K_t such that 3 ≤ K_t < 9, we have the reorder level s = 1 and the order-up-to level S = 2.
As for the value function difference, we have a 4 × 4 system (26) for the states l ∈ {−2, −1, 0, 1, 2}. From (26), we note that det(Ã) = 1 > 0. From (17), we also have that the dynamics of the expected inventory (first moment) is given by

E[x_{t+1}] = 2 a_t + Σ_{l ≥ 1} l π^t_l − 3/2.

The rest of this section involves a numerical analysis for a system of 100 indistinguishable players. All simulations are carried out with MATLAB on an Intel(R) Core(TM)2 Duo CPU P8400 at 2.27 GHz with 3 GB of RAM. The horizon window consists of T = 200 iterations. For each player, we simulate (25) for three cases characterized by a different initial distribution. The initial state is obtained from a random uniform distribution in {1, 2} for case 1, in {−2, 0} for case 2, and in {−2, 2} for case 3, using the commands x0=randi([1,2],n,1), x0=randi([-2,0],n,1), and x0=randi([-2,2],n,1), respectively. The demand is obtained in accordance with φ_ω and is generated using the command w=randi([0,3],n,T).
The step size is dt = 0.1, the proportional purchase cost is r = 1, the shortage cost is p = 10, and the holding cost is h = 2. Figure 5 displays the time plot of the distribution π_t for all t ∈ [0, T] for the three cases. The distribution at steady state is greater in states −1, 0, and 1 (red, yellow, and purple lines, respectively). Note that, in accordance with Theorem 4.1, the three cases with different initial distributions reach the same distribution at steady state. During the simulation, we assume that every 50 iterations the states are reset to their initial values, in order to investigate the time response during the transients. Figure 6 displays the time plot of the microscopic dynamics for a single player; in other words, the plot shows the inventory level (the state) of a player. Observe that, according to (25), the inventory level of the individual player takes its values in the bounded support {−2, −1, 0, 1, 2}, where the lower threshold is s = 1 and the upper threshold is S = 2. The player's inventory is most of the time in states 0 and 1, which is in accordance with the greater values of the distribution in those states obtained from the macroscopic dynamics in the previous figure. Therefore, we can observe a clear connection between the macroscopic dynamics (Fig. 5) and the microscopic dynamics for a single player (Fig. 6).
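The Monte Carlo experiment described above can be sketched in Python instead of MATLAB (an illustrative analogue, without the periodic resets): n = 100 players follow the closed-loop dynamics (3) with s = 1, S = 2 under i.i.d. uniform demand on {0, ..., 3}.

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, s, S = 100, 200, 1, 2                 # players, horizon, thresholds

x = rng.integers(1, 3, size=n)              # analogue of x0 = randi([1,2], n, 1)
for t in range(T):
    w = rng.integers(0, 4, size=n)          # i.i.d. uniform demand on {0,...,3}
    x = np.where(x < s, S, x) - w           # reorder up to S if below s, then serve demand

hist = np.bincount(x + 2, minlength=5) / n  # empirical distribution on {-2,...,2}
```

As in Fig. 6, the simulated inventory levels remain in the bounded support {−2, ..., 2} induced by the thresholds and the demand set.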
In the next example, we analyze the same system with 100 indistinguishable players. The purchase, shortage, and holding costs are as in the previous example, and we consider a total transportation cost K̄ = 1200, which is divided among the active players. However, in this case we enlarge the demand set, so that ω_t ∈ Ω := {0, 1, ..., 10} is uniformly distributed. The macroscopic dynamics is represented by the Markov chain displayed in Fig. 7. Figure 8 shows the time plot of the microscopic dynamics for one player. In accordance with (13) and (15), it is possible to see that the players reorder when their inventory level is below the threshold s, which also depends on the number of active players, and they reorder up to the upper threshold S = 8. Figure 9 illustrates the time plot of the distribution π_t for three different initial conditions. The simulations were carried out for three cases in which the initial states are obtained from a random uniform distribution. The distribution at steady state is greater in states −1 and 1, which is consistent with Fig. 8: there, the inventory is most of the time in states close to state 0. In the same way as in the previous example, we can observe a clear connection between the macroscopic dynamics (Fig. 9) and the microscopic dynamics for a single player (Fig. 8). During this simulation, we assume that every 50 iterations the states are reset to their initial values.
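As a quick consistency check on the reported order-up-to level S = 8, formula (13) can be evaluated for the enlarged demand set (a sketch under the stated uniform-demand assumption):

```python
# Order-up-to level for the second example: demand uniform on {0,...,10},
# r = 1, h = 2, p = 10, hence critical ratio (p - r)/(p + h) = 9/12 = 0.75.
r, h, p = 1, 2, 10
ratio = (p - r) / (p + h)
cdf = [(g + 1) / 11 for g in range(11)]      # Phi(gamma) for uniform {0,...,10}
S = next(g for g, c in enumerate(cdf) if c >= ratio)
```

Since Φ(7) = 8/11 < 0.75 while Φ(8) = 9/11 ≥ 0.75, the first level meeting the critical ratio is indeed S = 8.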

Conclusions
We have developed an abstraction in the form of a dynamic coordination game model where each player's dynamics is a scalar fluid flow dynamical system characterized by a controlled input flow and an uncontrolled output flow. The players have to pay a share of the activation cost to control their dynamics at a given time. We have provided three main contributions. First, we have shown that if the retailers are rational players, then they benefit from using threshold strategies where the threshold is on the fraction of active players. Second, we have obtained explicit expressions for the lower and upper thresholds under specific circumstances. Third, we have extended our study to a scenario with a large number of players and proved that two-threshold strategies, such as the (s, S) strategies used in inventory control, are optimal strategies for the stationary solution. In this context, we have also provided conditions for the nonstationary mean-field equilibrium to converge to the stationary one in the limit.
A key direction for future work is to explore the feasibility of the proposed coordination scheme in multi-vector energy systems (heat, gas, power), with special focus on coalitional bidding in decentralized energy trade. The ultimate goal is to investigate the benefits of aggregating independent wind power producers.
Appendix

We know that ξ^t_{lj} = −ξ^t_{jl} = −ξ^t_{0l} + ξ^t_{0j}. Hence, we obtain:

ξ^t_{0k} = ξ^t_{01}(−P^t_{01} + P^t_{k1}) + ξ^t_{02}(−P^t_{02} + P^t_{k2}) + ... + ξ^t_{0k}(−P^t_{0k} − Σ_{j≠k} P^t_{kj}) + ... + ξ^t_{0η}(−P^t_{0η} + P^t_{kη}) + Σ_{j∈[0,η]} c_{0j}(π_t, P_t) P^t_{0j} − Σ_{j∈[0,η]} c_{kj}(π_t, P_t) P^t_{kj},

where the matrix Ã and the vector b̃ can be derived from A and b. Assuming a bounded support for ω, and therefore also for x, denoted by [1, η], we obtain a generic η × η dynamical system with Ã^t = [ã^t_{ij}]_{i,j∈[1,η]}, from which the equilibrium point ξ̄ = −Ã^{-1} b̃ can be obtained. In Lemma 4.1, we illustrate a constructive way to obtain Ã. Hence, for the bounded support [1, η], system (34) can be represented in matrix form. It is evident that the entries of the matrix follow the law, for generic l, j ∈ {1, ..., η} with j ≠ l:

ã^t_{ll} = −P^t_{0l} − Σ_{k≠l} P^t_{lk},  ã^t_{lj} = −P^t_{0j} + P^t_{lj},

which is in accordance with (19). Now, note that the trace of Ã is negative, namely tr(Ã) = Σ_l ã^t_{ll} < 0. If the determinant of matrix Ã is positive, then the time response of the dynamical system (34) is characterized by eigenvalues with negative real part, and the system is asymptotically stable. Therefore, we can conclude that, irrespective of the initial conditions, the potential differences converge to the equilibrium point.