1 Introduction

Games with strategic complements are characterized by the property that a player has an increasing incentive to take a given action as more neighbors take that same action [15, Chapter 9]. Examples of such games, though sometimes not explicitly mentioned, arise in learning in social networks [11], collective behavior in social networks [12], systemic risk [6], and cascading failures in financial networks [8, 18]. Coordination games represent a subset of games with strategic complements in which the payoff of a player scales with the percentage of players taking an action. This paper studies a dynamic game with strategic complements where the players have to coordinate actions within a finite horizon window [2, 3, 19]. The dynamics of each player is a fluid flow dynamical system subject to a controlled input flow and a stochastic uncontrolled output flow. Activating an input flow requires an activation cost, and the discrepancy between input and output flow accumulates in a state variable. Coupling derives from the activation cost, which is shared among all players who activate an input flow at a given time, called active players. Because the activation cost is shared, each player's incentive to be active grows with the number of active players. All results can be extended to the vector case by using the robust decomposition approach in [4, Section 3].

We extend the analysis in [19] to a mean-field scenario [1, 9, 10, 13, 14, 16, 17] characterized by microscopic and macroscopic dynamics. The microscopic dynamics is the fluid flow system determining the state of each player; the optimal control is obtained by solving a backward Bellman equation in the value function. The macroscopic dynamics is a Markov chain whose nodes represent all possible values of the players' states and whose links are weighted by the transition probabilities between states; it determines the evolution of the distribution of the players' states over these values. The resulting game involves both the microscopic and macroscopic dynamics in a unified framework and takes the form of a discrete-state, discrete-time mean-field game. Such a game consists of two coupled difference equations: a backward Bellman equation in the value function and a forward Markov dynamics in the distribution of the players' states. The mean-field equilibrium is obtained as the solution of these two coupled equations, and the stationary solution is obtained in the asymptotic limit as the horizon length goes to infinity.

Contribution This study advances the theory of dynamic coordination games with activation costs and extends, for the first time, the use of two-threshold strategies to mean-field games. An example of a two-threshold strategy is the (sS) strategy used in inventory control, see [7] and [5, Chapter 4]. In [5], the author derives the thresholds of the (sS) policy for an individual player under a fixed cost. In this work, we present explicit expressions for these thresholds in the presence of a large number of players and an activation cost that depends on the fraction of active players at each time t. We recall that (sS) strategies are strategies where a replenishment occurs whenever the inventory level drops below a lower threshold s; each replenishment brings the inventory level back up to a higher threshold S. In particular, we highlight the following results:

  • Strategies at a Nash equilibrium have a threshold structure. Lower and upper thresholds have an explicit expression in the deterministic case, namely when the demand is known, or in single-stage games.

  • Two-threshold (sS) strategies are mean-field equilibrium strategies for the stationary solution in dynamic games with a large number of players. Stationary solutions imply that the fixed cost is constant over the horizon. The game decomposes into a set of uncoupled optimization problems. In each problem, a single player has to find the optimal strategy under a fixed cost. We then use the well-known optimality of (sS) strategies under fixed cost to show that such strategies are best responses for the game. Furthermore, we provide conditions for the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit.

  • We corroborate our results with a numerical analysis of a stylized inventory model.

This paper is organized as follows. In Sect. 2, we introduce the model. In Sect. 3, we obtain the optimal thresholds. In Sect. 4, we study convergence to stationary solutions. In Sect. 5, we provide numerical analysis. Finally, in Sect. 6, we draw conclusions and discuss future works.

2 Mean-Field Inventory Game

We consider a large number of indistinguishable players and a finite number of states (inventory levels). Let us assume that at stage \(t=0,1,\ldots ,N\) the inventory level of an individual player is \(x^t\in {\mathbb {Z}}\), that the player faces a stochastic demand \(\omega ^t \in {\mathbb {Z}}_+\), and that it orders a quantity \(u^t\in U^t\subseteq {\mathbb {Z}}_+\), where \(U^t\) denotes the set of admissible actions, \({\mathbb {Z}}\) is the set of integers, and \({\mathbb {Z}}_+\) is the set of nonnegative integers. Hence, the microscopic dynamics of the player evolves according to the linear, discrete-time model:

$$\begin{aligned} x^{t+1}=x^t+u^t-\omega ^t, \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$
(1)

According to [5], in (sS) strategies replenishments occur whenever the inventory level drops below a lower threshold s, and when a replenishment takes place it brings the inventory level back up to the upper threshold S [7]. In accordance with this strategy, let us define the control \(u^t\) as follows:

$$\begin{aligned} u^t:=\mu (x^t):=\left\{ \begin{array}{cc} S-x^t, &{} \qquad \text{ if } \quad x^t < s,\\ 0, &{} \qquad \text{ if } \quad x^t \ge s, \end{array}\right. \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$
(2)

After substituting the (sS) strategy as defined in (2) in the dynamics (1), we obtain

$$\begin{aligned} x^{t+1}=\left\{ \begin{array}{lll} S-\omega ^t, &{} \qquad \text{ if } \quad x^t < s,\\ x^t-\omega ^t, &{} \qquad \text{ if } \quad x^t \ge s, \end{array} \right. \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$
(3)

To define the random parameter \(\omega ^t\) that corresponds to the uncertain demand at time t, let us consider a probability distribution \(\phi ^t:{\mathbb {Z}}_+\rightarrow [0,1]\) such that \(\omega \mapsto \phi ^t_\omega \); here, \(\phi ^t_\omega \) is the probability of having a demand of \(\omega \) items at time t for all \(\omega \in {\mathbb {Z}}_+\).

To derive a macroscopic dynamics for the system, let us denote by \(\pi ^t\) the distribution of players over the states at time t. Hence, \(\pi ^t\) is a vector that stores in each of its entries the fraction of players in each possible state. In particular, the jth entry \(\pi _j^t\) represents the fraction of players whose state is \(x^t=j\) at time t and derives from the following distribution function:

$$\begin{aligned} \pi ^t:{\mathbb {Z}} \rightarrow [0,1], \quad j \mapsto \pi _j^t \in [0,1]. \end{aligned}$$

Occasionally, we will view \(\pi ^t\) as an infinite-dimensional vector indexed by \({\mathbb {Z}}\). Also, let \(\pi ^0\) be the initial distribution of players over the states.

At every time step t, the players in state l decide the amount \(u^t\) to reorder. The order quantity, together with the demand distribution \(\phi ^t\), determines the transition probability \(P_{lj}^t\) from state l to state j. Given the transition probabilities \(P_{lj}^t\) at time \(0\le t <N\), the distribution of players at time \(t+1\) is given by the following macroscopic model, which takes the form of a Markov chain:

$$\begin{aligned} \pi _j^{t+1} = \sum _{l \in {\mathbb {Z}}} \pi _l^t P_{lj}^t, \text{ for } \text{ all } j \in {\mathbb {Z}}, \quad \hbox { for all}\ t=0,1,\ldots ,N-1. \end{aligned}$$
(4)

The transition probabilities \(P_{lj}^t\) used in the above equation are linked to the probability mass function used to model the stochastic demand. To see this, let \(\phi ^t_0, \, \phi ^t_1, \, \phi ^t_2, \ldots \) be the probability masses at time t associated with the demand values \(\omega ^t = 0,1,2, \ldots \), respectively. The relation between \(P_{lj}^t\) and \(\phi ^t_0, \, \phi ^t_1, \, \phi ^t_2, \ldots \) is as follows:

$$\begin{aligned} \begin{array}{ll} [\,\ldots \, P_{l,S-2}^t \, P^t_{l,S-1} \, P^t_{l,S}]= [\,\ldots \,\phi _2^t \,\quad \phi ^t_1 \,\quad \phi ^t_0], \quad l < s. \end{array} \end{aligned}$$
(5)

The above equation defines the transition probabilities from any state below the threshold, where the players reorder up to level S. For any state equal to or greater than the threshold s, the transition probabilities are instead given by:

$$\begin{aligned} \begin{array}{ll} [\,\ldots \, P^t_{l,l-2} \, P^t_{l,l-1} \, P^t_{l,l}]= [\, \ldots \, \phi ^t_2 \,\quad \phi ^t_1 \,\quad \phi ^t_0], \quad l \ge s. \end{array} \end{aligned}$$
(6)
Fig. 1 Markov chain representing the macroscopic dynamics (4) obtained from the microscopic dynamics (1)

Figure 1 depicts the Markov chain that represents the macroscopic dynamics (4). In the mean-field context, the fraction of active players, i.e., the players whose inventory level is strictly below the lower threshold \(s^t\), is then given by:

$$\begin{aligned} a^t = \sum _{l,l< s^t} \pi _l^t, \quad \text{ for } \text{ all } t=0,1,\ldots ,N. \end{aligned}$$
(7)
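To make the construction concrete, the following MATLAB sketch assembles the transition matrix rows (5)-(6) on a bounded state space, applies the macroscopic update (4), and evaluates the fraction of active players (7). The state range, thresholds, and demand pmf are illustrative assumptions (borrowed from the example of Sect. 5) rather than part of the general model:

states = -2:2;                      % bounded support for the inventory level
s = 1; S = 2;                       % illustrative (sS) thresholds
phi = [1 1 1 1]/4;                  % pmf of the demand w = 0,1,2,3
n = numel(states);
P = zeros(n);                       % transition matrix P^t
for li = 1:n
    l = states(li);
    if l < s, base = S; else, base = l; end   % post-order inventory level, per (2)
    for w = 0:numel(phi)-1
        ji = find(states == base - w);        % next state x^{t+1} = base - w, per (3)
        P(li,ji) = P(li,ji) + phi(w+1);       % rows of P, per (5)-(6)
    end
end
pi0 = ones(1,n)/n;                  % initial distribution pi^0
pi1 = pi0*P;                        % macroscopic update (4)
a0  = sum(pi0(states < s));         % fraction of active players (7)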

Likewise, we can define a value function for any time t which represents the expected optimal cost for a player in the generic state j at time t:

$$\begin{aligned} v^t: {\mathbb {Z}} \rightarrow {\mathbb {R}}_+, \quad j \mapsto v_j^t \in {\mathbb {R}}_+. \end{aligned}$$

Let the transition probability matrix at time t be denoted by \(P^t=[P^t_{lj}]_{l,j\in {\mathbb {Z}}}\). Associated with each probability \(P^t_{lj}\), there is a transition cost for going from state l to state j, which depends also on the distribution of players \(\pi ^t\); let us denote such cost as \(c_{lj}^t(\pi ^t,P^t)\).

The average cost for the players in state l, when their dynamics follow the transition probability matrix \(P^t\), for a given distribution \(\pi ^t\) and a future cost defined by the value function \(v_j^{t+1}\), for all \(j\in {\mathbb {Z}}\), is given by:

$$\begin{aligned} e_l^t(\pi ^t,P^t,v^{t+1}) = \sum _{j \in {\mathbb {Z}}} \left[ c_{lj}(\pi ^t,P^t) P^t_{lj} + v^{t+1}_j P^t_{lj}\right] , \quad \hbox { for all}\ t=0,1,\ldots ,N. \end{aligned}$$

We are now in a position to provide the following definition of Nash equilibrium in the mean-field limit, in discrete time, and in discrete state space.

Definition 2.1

(Definition 1 in [9]) Let \( {\mathbb {S}}^{{\mathbb {Z}}}\) denote the simplex of probability vectors indexed by \({\mathbb {Z}}\). Fix a probability vector \(\pi \in {\mathbb {S}}^{{\mathbb {Z}}}\) and a cost vector \(v \in {\mathbb {R}}^{{\mathbb {Z}}}\). A stochastic matrix \(P \in [0,1]^{{\mathbb {Z}} \times {\mathbb {Z}}}\) is a Nash minimizer of \(e(\pi ,\cdot ,v)\) if for each \(l \in {\mathbb {Z}}\) and any \(q \in {\mathbb {S}}^{{\mathbb {Z}}}\),

$$\begin{aligned} e_l(\pi ,P,v)\le e_l(\pi ,{\mathcal {P}} (P,q,l),v), \end{aligned}$$

where \({\mathcal {P}} (P,q,l)\) is obtained from matrix P by replacing the lth row by \(q \in {\mathbb {S}}^{{\mathbb {Z}}}\).

We say that the following pair of time-varying distribution and value function

$$\begin{aligned} \{(\pi ^t,v^t); \, 0 \le t \le N\} \end{aligned}$$

is a mean-field equilibrium if it is the solution of the following system of equations for all \(t=0,1,\ldots , N\):

$$\begin{aligned} \left\{ \begin{array}{ll} v_l^t = \sum _{j} \left[ c_{lj}(\pi ^t,P^t) P_{lj}^t + v_j^{t+1} P_{lj}^t\right] , &{}\hbox { }\ \forall l \in {\mathbb {Z}},\\ \pi _j^{t+1} = \sum _{l} \pi _l^t P_{lj}^t, &{} \forall j \in {\mathbb {Z}}, \end{array} \right. \end{aligned}$$
(8)

where \(P^t\) is a Nash minimizer of \(e(\pi ^t,\cdot ,v^{t+1})\).

In the above set of equations, we set the transition cost \(c^t_{lj}=c_{lj}(\pi ^t,P^t)\) at time t as follows:

$$\begin{aligned} \left\{ \begin{array}{lll} K^t + r (S-l) + p \max (0,-j) +h \max (0,j), &{}\hbox { if}\ l<s \\ p \max (0,-j)+h \max (0,j), &{}\text{ otherwise }, \end{array} \right. \end{aligned}$$
(9)

where \(K^t:=K(a^t)\ge 0\) is the transportation cost charged to each player that is active at time t, \(r\ge 0\) is the purchase cost per stock unit, \(h\ge 0\) the per-unit holding penalty, and \(p>h\ge 0\) the per-unit shortage penalty.

The above transition cost can be rewritten in compact form as:

$$\begin{aligned} c^t_{lj} = \Big (K^t + r (S-l) \Big ) \delta (l<s) + p \max (0,-j) +h \max (0,j), \end{aligned}$$

where

$$\begin{aligned} \delta (l<s) = \left\{ \begin{array}{lll} 1, &{} \text{ if } l<s\text{, } \\ 0, &{} \text{ otherwise }. \end{array} \right. \end{aligned}$$
(10)

Note that the transportation cost \(K^t=K(a^t)\) paid by each player is a monotonically decreasing function of the fraction of active players at time t: as the fraction of active players \(a^t\) increases, the transportation cost \(K^t\) decreases. Thus, a player that places an order incentivizes the other players to reorder, which implies that the cost of one player also depends on the actions of the other players. Let us assume a large number of players M and a total transportation cost \({\tilde{K}}\). As an example, if the total cost is equally divided among the active players, the individual transportation cost charged to each player is given by \(K(a^t)=\frac{{\tilde{K}}}{Ma^t}\) if the player is active, and it is zero otherwise.
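As a minimal illustration, the transition cost (9)-(10) together with the shared transportation cost \(K(a^t)={\tilde{K}}/(Ma^t)\) can be encoded as follows in MATLAB; all numerical values, as well as the handle names K and c, are assumptions made for illustration only:

r = 1; h = 2; p = 10;                    % per-unit purchase, holding, shortage costs
s = 1; S = 2;                            % illustrative thresholds
Ktilde = 1200; M = 100;                  % total transportation cost, population size
K = @(a) (a > 0).*Ktilde./(M*max(a,eps));     % cost per active player, K(a) = Ktilde/(M a)
c = @(l,j,a) (K(a) + r*(S-l))*(l < s) ...     % activation and purchase terms, per (9)-(10)
           + p*max(0,-j) + h*max(0,j);        % shortage and holding penalties
c(0,-1,0.3)                              % e.g., cost from l = 0 to j = -1 with 30% active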

3 Optimal Thresholds

In this section, we provide explicit expressions for the lower threshold s and the upper threshold S as functions of the probability distribution \(\phi ^t\) of the stochastic demand at each time t.

Let us denote by \(y^t=x^t+u^t\) the instantaneous inventory position, i.e., the inventory level just after the order has been issued, and let us define the following stage cost function:

$$\begin{aligned} \begin{array}{lll} G^{t}(y^{t})= ry^t+p {\mathbb {E}} \{ \max (0,-(y^t-\omega ^t))\} + h {\mathbb {E}} \{ \max (0,y^t-\omega ^t) \}. \end{array} \end{aligned}$$
(11)

Then, the value function satisfies:

$$\begin{aligned} \begin{array}{lll} v_x^t =-rx^t+\min _{y^t\ge x^t}[K^t+G^t(y^t),G^t(x^t)], \end{array} \end{aligned}$$
(12)

where the term \(-rx^t+K^t+G^t(y^t)\) indicates the stage cost in case of reordering, and \(-rx^t+G^t(x^t)\) indicates the stage cost in case of no reordering. Hence, note that the cost of reordering is given by:

$$\begin{aligned} \begin{array}{ll} K^t -rx^t +&{}G^t(y^t)= K^t + ru^t +p {\mathbb {E}} \{ \max (0,-(y^t-\omega ^t))\} + h {\mathbb {E}} \{ \max (0,y^t -\omega ^t) \} \\ &{}=K^t + r (y^t - x^t )+p {\mathbb {E}} \{ \max (0,-(y^t -\omega ^t))\} + h {\mathbb {E}} \{ \max (0,y^t-\omega ^t) \}. \\ \end{array} \end{aligned}$$

To obtain \(S^t\), for an instantaneous inventory position \(\gamma \), first let us define the expected holding \({\mathbb {E}} \{\max (0,\gamma -\omega ^t)\}\) and expected shortage \(\mathbb E\{\max (0,-(\gamma -\omega ^t))\}\) as follows:

$$\begin{aligned} {\mathbb {E}} \{\max (0,\gamma -\omega ^t)\}=\varPsi ^t_h[\gamma ]:=\sum _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega }, \\ {\mathbb {E}}\{\max (0,-(\gamma -\omega ^t))\}=\varPsi ^t_s[\gamma ]:=\sum _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }, \end{aligned}$$

where \(\phi ^{t}_{\omega }\) is the probability of having a demand of \(\omega \) items at time t.

Hence, the stage cost function \(G^t(\gamma )\) is given by:

$$\begin{aligned} G^t(\gamma ) = r \gamma + h \underbrace{\sum _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega }}_{:=\varPsi ^t_h[\gamma ]} + p \underbrace{\sum _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }}_{:=\varPsi ^t_s[\gamma ]}. \end{aligned}$$

By applying the discrete difference operator \(\varDelta \) to the function \(G^t(\gamma )\), we then have:

$$\begin{aligned} \begin{array}{ll} \varDelta G^t(\gamma ) &{}:= G^t(\gamma +1) - G^t(\gamma ) \\ &{}= r( \gamma +1) + h \sum _{\omega =0}^{\gamma +1}(\gamma +1 -\omega ) \phi ^t_{\omega } + p \sum _{\omega =\gamma +2}^\infty (\omega - \gamma -1) \phi ^t_{\omega } \\ &{}\quad - r \gamma - h \sum _{\omega =0}^\gamma (\gamma -\omega ) \phi ^t_{\omega } - p \sum _{\omega =\gamma +1}^\infty (\omega - \gamma ) \phi ^t_{\omega }\\ &{}= r + h \sum _{\omega =0}^\gamma \phi ^t_\omega - p \sum _{\omega =\gamma +1}^\infty \phi ^t_{\omega }\\ &{}= r + h \varPhi ^t_\omega [\gamma ] - p (1 - \varPhi ^t_\omega [\gamma ]), \end{array} \end{aligned}$$

where \(\varPhi ^t_\omega [\gamma ]\) is the cumulative distribution function defined as:

$$\begin{aligned} \varPhi ^t_\omega [\gamma ]:=\sum _{\omega =0}^\gamma \phi ^t_\omega . \end{aligned}$$

The order-up-to level \(S^t\) is the optimal \(\gamma \), which is obtained from solving:

$$\begin{aligned} \begin{array}{ll} \min _\gamma \, \{\gamma | \, \varDelta G^t(\gamma ) \ge 0\} = \min _\gamma \,\{ \gamma | \, r + h \varPhi ^t_\omega [\gamma ] - p (1 - \varPhi ^t_\omega [\gamma ]) \ge 0\}. \end{array} \end{aligned}$$

From the above, we then obtain (Fig. 2):

$$\begin{aligned} S^t = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-r+ p }{h +p} \Big \}. \end{aligned}$$
(13)
Fig. 2 Value of \(\gamma \) such that the cumulative distribution function \(\varPhi _{\omega }^t[\gamma ]\ge \frac{-r+p}{h+p}\)

To obtain \(s^t\), let us consider the cost of not reordering, which is given by:

$$\begin{aligned} \begin{array}{ll} -rx^t+G^t(x^t) &{}= p {\mathbb {E}} \{ \max (0,-(x^t-\omega ^t))\} + h {\mathbb {E}} \{ \max (0,x^t -\omega ^t) \} \\ &{}=h \sum _{\omega =0}^{x^t} (x^t-\omega ) \phi ^t_{\omega } + p \sum _{\omega =x^t+1}^\infty (\omega - x^t) \phi ^t_{\omega }\\ &{}= h \varPsi ^t_h[x^t] + p \varPsi ^t_s[x^t]. \end{array} \end{aligned}$$
(14)

From the above, we then obtain:

$$\begin{aligned} s^t:= \arg \min _{x^t} \{x^t | \, -rx^t +G^t(x^t) \le K^t-rx^t+G^t(S^t)\}. \end{aligned}$$

In particular, we have (Fig. 3):

$$\begin{aligned} \begin{array}{ll} s^t:= \arg \min _{x^t} \Big \{x^t | \, h \varPsi ^t_h[x^t]+ p \varPsi ^t_s[x^t] \le K^t+r ( S^t - x^t) + h \varPsi ^t_h[S^t] + p \varPsi ^t_s[S^t] \Big \}. \end{array}\nonumber \\ \end{aligned}$$
(15)
Fig. 3 Value of \(x^t\) that satisfies equation (15)

Observe that the right-hand side of the inequality in (15) corresponds to the cost of reordering once we obtain the optimal upper threshold \(S^t\).

In order to obtain the lower threshold \(s^t\), we have to find the minimum inventory level \(x^t\) that satisfies (15). As the penalty on shortage is greater than the penalty on holding (\(p>h\)), if the inventory level decreases, then the left-hand side of the inequality in (15) increases. If the transportation cost \(K^t\) decreases, the right-hand side of the inequality decreases and the minimum inventory level \(x^t\) that satisfies (15) increases. Therefore, the lower the transportation cost the higher the threshold \(s^t\).

Equations (13) and (15) represent explicit expressions to obtain the two thresholds and fully characterize the reordering strategy once the probability distribution of the stochastic demand is given.
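The two formulas translate directly into a search over integers. The following MATLAB sketch computes S from (13) and s from (15) for a given demand pmf and a fixed activation cost; the pmf, the cost values, and the lower search bound -wmax are illustrative assumptions:

r = 1; h = 2; p = 10; Kt = 5;            % illustrative costs and fixed activation cost K^t
phi = [1 1 1 1]/4; wmax = numel(phi)-1;  % uniform demand on {0,...,3}
Phi  = @(g) (g >= 0)*sum(phi(1:min(g,wmax)+1));   % cdf Phi^t[g]
Psih = @(g) sum(max(g-(0:wmax),0).*phi);          % expected holding Psi_h[g]
Psis = @(g) sum(max((0:wmax)-g,0).*phi);          % expected shortage Psi_s[g]
S = 0;
while Phi(S) < (p - r)/(h + p), S = S + 1; end    % smallest gamma satisfying (13)
x = S;                                            % condition (15) holds at x = S
while x > -wmax && h*Psih(x-1)+p*Psis(x-1) <= Kt + r*(S-x+1) + h*Psih(S)+p*Psis(S)
    x = x - 1;                                    % keep decreasing while (15) still holds
end
s = x;                                            % smallest x satisfying (15)

For the data above, which match the example of Sect. 5 with \(K^t=5\), the sketch returns \(S=2\) and \(s=1\).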

Once the thresholds are obtained, we implement the control \(u^t\), which is given by (2), and we obtain the resulting dynamics (3).

In the following, we study the time evolution of the first-order moment of the inventories. The expected inventory at time t when \(x^t\) is distributed according to \(\pi ^t\) is given by:

$$\begin{aligned} {\mathbb {E}} x^{t} = \sum _{l} \pi _l^t l. \end{aligned}$$

Then, from (3) the expected inventory at time \(t+1\) when \(x^{t+1}\) is distributed according to \(\pi ^{t+1}\) and the demand \(\omega \) takes values in the support \(\varOmega \subseteq {\mathbb {Z}}_+\), follows the recursion:

$$\begin{aligned} {\mathbb {E}} x^{t+1}&= \sum _{l} \pi _l^{t+1} l = \sum _{\omega \in \varOmega } [(S^t-\omega ) (\sum _{l,l< s^t} \pi ^t_l) + \sum _{l,l \ge s^t} (l -\omega ) \pi _l^t ] \phi _\omega \nonumber \\&= \sum _{\omega \in \varOmega } [ (S^t-\omega ) a^t + \sum _{l,l \ge s^t} (l -\omega ) \pi _l^t] \phi _\omega . \end{aligned}$$
(16)

From \(\sum _{l,l\ge s^t} \pi ^t_l = 1 - a^t\), we have:

$$\begin{aligned} {\mathbb {E}} x^{t+1}&= \sum _{\omega \in \varOmega } [S^t a^t -\omega + \sum _{l,l \ge s^t} l \pi _l^t] \phi _\omega = \sum _{\omega \in \varOmega } [S^t a^t -\omega + \sum _{l} l \pi _l^t - \sum _{l,l< s^t} l \pi _l^t] \phi _\omega \nonumber \\&= \sum _{\omega \in \varOmega } [S^t a^t -\omega + {\mathbb {E}} x^{t} - \sum _{l,l< s^t} l \pi _l^t ] \phi _\omega = \sum _{\omega \in \varOmega } [S^t (\sum _{l,l< s^t} \pi _l^t ) -\omega + {\mathbb {E}} x^{t} - \sum _{l,l< s^t} l \pi _l^t ] \phi _\omega \nonumber \\&= \sum _{\omega \in \varOmega } [-\omega \phi _\omega ] + \sum _{l,l < s^t} (S^t - l) \pi _l^t + {\mathbb {E}} x^{t}. \end{aligned}$$
(17)

In the numerical example, we make use of (17) to obtain the first moment of the distribution of the inventory at time \(t+1\).
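As a quick sanity check of (17), the following MATLAB lines compare the first moment computed directly from the Markov chain with the right-hand side of the recursion; they reuse the illustrative variables states, P, pi0, phi, s, and S from the sketch after (7):

Ex0 = pi0*states';                       % E x^t
Ex1 = (pi0*P)*states';                   % E x^{t+1} directly from the Markov chain (4)
Ew  = (0:numel(phi)-1)*phi';             % mean demand
Ex1_rhs = -Ew + sum((S - states(states < s)).*pi0(states < s)) + Ex0;   % right-hand side of (17)
% Ex1 and Ex1_rhs coincide (both equal 0.3 for the illustrative data above)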

4 Stationarity

In this section, we are interested in stationary solutions, namely solutions where both the distribution function and the value function do not depend on time.

Remark 4.1

If the distribution function and the value function do not depend on time, we have a stationary fraction of active players, namely

$$\begin{aligned} {{\tilde{a}}} = \sum _{l,l < s} \pi _l. \end{aligned}$$

In addition, the activation cost is a function of the fraction of active players; therefore, the cost \(K({\tilde{a}})\) is fixed over the horizon and depends on the stationary solution. We can then apply the results obtained in Sect. 3 for a fixed activation cost K to obtain the optimal lower threshold s and the optimal upper threshold S.

Let us denote by \((\pi ,v)\) the generic stationary solution. The pair \((\pi ,v)\) is a mean-field equilibrium at steady state if it satisfies the following set of equations:

$$\begin{aligned} \left\{ \begin{array}{ll} v_l = \sum _{j} \left[ c_{lj}(\pi ,P) P_{lj} + v_j P_{lj}\right] - {{\bar{\lambda }}}, \\ \pi _j = \sum _{l} \pi _l P_{lj}, \end{array} \right. \end{aligned}$$
(18)

where \({{\bar{\lambda }}}\) is the optimal average cost per stage. In [9], the authors prove that the optimal average cost can be seen as an average transition cost over the population of players: if \({\bar{P}}\) is the optimal transition matrix and \(({\bar{\pi }},{\bar{v}})\) is a stationary solution of (18), then \({\bar{\lambda }}=\sum _{l,j}{\bar{\pi }}_l c_{lj}({\bar{\pi }},{\bar{P}}){\bar{P}}_{lj}\).
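For a concrete illustration, a stationary distribution of P and the corresponding average cost per stage can be approximated as follows, reusing the illustrative P and states from the sketch after (7) and the cost handle c from Sect. 2; this is a sketch under those assumptions, not the construction used in [9]:

[V,D] = eig(P');                          % left eigenvectors of P
[~,k] = min(abs(diag(D) - 1));            % eigenvalue 1 gives a stationary pibar
pibar = real(V(:,k))'; pibar = pibar/sum(pibar);
abar  = sum(pibar(states < s));           % stationary fraction of active players
lambda = 0;
for li = 1:numel(states)
    for ji = 1:numel(states)              % average transition cost over the population
        lambda = lambda + pibar(li)*c(states(li),states(ji),abar)*P(li,ji);
    end
end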

Assuming a bounded support for the demand \(\omega \) and therefore also for the inventory level x, which we denote by \([1,\eta ]\), let us define matrix \({{\tilde{A}}}=[{\tilde{a}}_{ij}]_{i,j\in [1,\eta ]}\), where:

$$\begin{aligned} {\tilde{a}}_{ij}=\left\{ \begin{array}{ll} -P_{0i} - \sum _{k,k\not = i} P_{ik}, &{} \hbox { if}\ j=i,\\ -P_{0j} + P_{ij}, &{} \hbox { if}\ j\not =i. \end{array} \right. \end{aligned}$$
(19)

Let us define the new variable \(\xi ^t_{lk}=[v^t_l-v^t_k]\), which can be seen as a potential difference between two generic states or nodes of the Markov chain l and k, and the vector \(\xi ^t_l:=[\xi ^t_{lj}]_{j\in {\mathbb {Z}}}=[v^t_l-v^t_j]_{j\in {\mathbb {Z}}}\). In particular, \(\xi ^t_0:=[\xi ^t_{0j}]_{j\in {\mathbb {Z}}}=[v^t_0-v^t_j]_{j\in {\mathbb {Z}}}\). In addition, denote \(P^t_l= [P^t_{lj}]_{j \in {\mathbb {Z}}}\) and \(c_l= [c_{lj}]_{j \in {\mathbb {Z}}}\) for all \(l \in {\mathbb {Z}}\).

Before discussing the main contribution of this section, that is, the convergence of the nonstationary mean-field equilibrium to the stationary one in the limit, we present an intermediate result verifying the structure of \({\tilde{A}}\) introduced in (19).

Lemma 4.1

Let a bounded support for the demand \(\omega \) and for the inventory level x be given and denote it by \([1,\eta ]\). The dynamics of the potential difference \(\xi ^t_0=[v^t_0-v^t_j]_{j\in [1,\eta ]}\) is given by:

$$\begin{aligned} {\dot{\xi }}^t_0={\tilde{A}}\xi ^t_0+{\tilde{b}}, \end{aligned}$$
(20)

where \({\tilde{A}}=[{\tilde{a}}_{ij}]_{i,j\in [1,\eta ]}\), each entry \({\tilde{a}}_{ij}\) is of the form (19), and \({\tilde{b}}=[c_0^TP^t_0-c_j^TP^t_j]_{j\in [1,\eta ]}\).

Proof

The proof is in the Appendix. \(\square \)

In the following theorem, we present the conditions for the nonstationary mean-field equilibrium, which is a solution of (8), to converge to the stationary solution of problem (18). Note that the stochastic matrix \(P^t\) in equation (8) is a Nash minimizer of the average cost \(e(\pi ^t,\cdot ,v^{t+1})\).

Let \(\pi [N](-N)\) be the initial distribution of players at the beginning of the horizon at time \(-N\) and \(v[N](N)_l\) the terminal cost at the end of the horizon at time N.

Theorem 4.1

Given \(N>0\), a vector \(\pi ^0 \in {\mathbb {S}}^{{\mathbb {Z}}}\) and a terminal penalty \(v^N_l\in {\mathbb {R}}_+\), let \((\pi [N], v[N])\) be the solution of (8) with initial-terminal conditions \(\pi [N](-N) = \pi ^0\) and \(v[N](N)_l = v^N_l\). Let \(({\bar{\pi }},{\bar{v}})\) be a solution of the stationary problem (18). When \(N \rightarrow \infty \)

$$\begin{aligned} \pi [N]^0 \rightarrow {\bar{\pi }}, \qquad v[N]^0 \rightarrow {\bar{v}}, \end{aligned}$$
(21)

if \(\det ({{\tilde{A}}})>0\).

Proof

The proof is in the Appendix. \(\square \)

5 Numerical Analysis

We consider an example where the demand \(\omega ^t \in \varOmega := \{0,1,2,3\}\) is uniformly distributed; using the notation \(\phi _\omega \) for the probability that \(\omega ^t = \omega \), we have \(\phi _\omega =\frac{1}{4}\) for all \(\omega \in \varOmega \).

Assume that the proportional purchase cost is \(r=1\), the shortage cost is \(p=10\), and the holding cost is \(h=2\). In the case of single-stage optimization, we have that the order-up-to level is given by:

$$\begin{aligned} S = \arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-r + p }{h +p} \Big \}. \end{aligned}$$

From the above, we obtain \(S=2\). Indeed for \(\gamma =3\), we have:

$$\begin{aligned} \varPhi ^t_\omega [3] =1 \ge \frac{-r + p }{h +p} = \frac{3}{4}. \end{aligned}$$

For \(\gamma =2\), we obtain:

$$\begin{aligned} \varPhi ^t_\omega [2] = \frac{3}{4} \ge \frac{-r + p }{h +p} = \frac{3}{4}. \end{aligned}$$

In contrast, for \(\gamma =1\) it holds that

$$\begin{aligned} \varPhi ^t_\omega [1] = \frac{1}{2} \not \ge \frac{-r + p }{h +p} = \frac{3}{4}, \end{aligned}$$

and therefore

$$\begin{aligned} S=\arg \min _\gamma \Big \{\gamma | \, \varPhi ^t_\omega [\gamma ] \ge \frac{-r + p }{h +p} \Big \} =2. \end{aligned}$$

As for the reorder level s, we have:

$$\begin{aligned} \begin{array}{ll} s:= \arg \min _{x} \Big \{x | \, h \varPsi _h^t[x] + p \varPsi _s^t[x] \le K^t + r ( S - x ) + h \varPsi _h^t[S] + p \varPsi _s^t[S] \Big \}. \end{array}\nonumber \\ \end{aligned}$$
(22)

We show next that \(s=1\). Indeed, for \(x^t=1\) we obtain:

$$\begin{aligned} \begin{array}{ll} h \varPsi _h^t[1] + p \varPsi _s^t[1] = h \frac{1}{4} + p \frac{3}{4} = 8 &{}\le K^t + r + h \varPsi _h^t[2] + p \varPsi _s^t[2] \\ {} &{}= K^t + 1 + h \frac{3}{4} + p\frac{1}{4} =K^t + 5, \end{array}\end{aligned}$$
(23)

which is satisfied by any \(K^t \ge 3\).

For \(x^t=0\), we have:

$$\begin{aligned} \begin{array}{ll} h \varPsi _h^t[0] + p \varPsi _s^t[0] = p\frac{6}{4}= 15 &{}\le K^t + 2 r + h \varPsi _h^t[2]+ p \varPsi _s^t[2] \\ {} &{}= K^t + 2 + h \frac{3}{4} + p\frac{1}{4} =K^t + 6, \end{array}\end{aligned}$$
(24)

which is satisfied by any \(K^t \ge 9\).

For any \(K^t\) with \(3 \le K^t < 9\), we then have:

$$\begin{aligned} \begin{array}{ll} s:= \arg \min _{x} \Big \{x | \, h \varPsi _h^t[x] + p \varPsi _s^t[x] \le K^t + r ( S - x ) + h \varPsi _h^t[S]+ p \varPsi _s^t[S] \Big \}=1.\end{array}\end{aligned}$$

We can then conclude that for any \(K^t\) such that \(3 \le K^t < 9\), we have the reorder level \(s=1\) and the order-up-to level \(S=2\).

Then, from (3), the microscopic dynamics is defined on the bounded support \(\{-2,-1,0,1,2\}\), namely \(x^t \in \{-2,-1,0,1,2\}\) for all \(t\ge 0\), and is given by:

$$\begin{aligned} x^{t+1}=\left\{ \begin{array}{cc} 2-\omega ^t, &{} \qquad \text{ if } \quad x^t =-2,-1,0,\\ x^t-\omega ^t, &{} \qquad \text{ if } \quad x^t = 1,2. \end{array}\right. \end{aligned}$$
(25)

The macroscopic dynamics corresponding to the microscopic dynamics (25) is the Markov chain displayed in Fig. 4.

Fig. 4 Markov chain representing the macroscopic dynamics obtained from the microscopic dynamics (25)

As for the value function differences, we have a \(4 \times 4\) system with states \(l \in \{-2,-1,0,1,2\}\) and reference state \(-2\), which is given by:

$$\begin{aligned} \left[ \begin{array}{ll} {{\dot{\xi }}}_{-2-1} \\ {{\dot{\xi }}}_{-20}\\ {{\dot{\xi }}}_{-21}\\ {{\dot{\xi }}}_{-22} \end{array} \right] = \left[ \begin{array}{cccc} -1 &{} 0 &{} 0 &{} 0 \\ 0 &{} -1 &{} 0 &{} 0 \\ 0 &{} 0 &{} -1 &{} 0 \\ 0 &{} 0 &{} 0 &{} -1 \end{array} \right] \left[ \begin{array}{ll} \xi _{-2-1}\\ \xi _{-20} \\ \xi _{-21} \\ \xi _{-22} \end{array} \right] +\left[ \begin{array}{ll} \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{-1j}) \\ \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{0j}) \\ \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{1j}) \\ \frac{1}{4}(\sum _{j=1}^{2}c_{-2j}-\sum _{j=1}^{2}c_{2j}) \end{array} \right] .\nonumber \\ \end{aligned}$$
(26)
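A minimal Euler-integration sketch of (26) in MATLAB, with the step size dt = 0.1 used in the simulations below; the entries of btilde stand in for the cost differences in (26) and are illustrative placeholders:

dt = 0.1; Tsteps = 500;
Atilde = -eye(4);                 % the 4 x 4 matrix in (26)
btilde = [1; 2; 3; 4];            % placeholder cost-difference vector
xi = zeros(4,1);                  % potential differences xi_{-2,j}
for k = 1:Tsteps
    xi = xi + dt*(Atilde*xi + btilde);   % Euler step; xi converges to -inv(Atilde)*btilde
end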

From (26), we note that \(\det ({{\tilde{A}}})=1>0\). From (17), we also have that the dynamics of the expected inventory (first moment) is given by:

$$\begin{aligned} \begin{array}{ll} {\mathbb {E}} x^{t+1} &{} = -2\pi _{-2}^{t+1} - \pi _{-1}^{t+1} + \pi _{1}^{t+1} + 2 \pi _2^{t+1} \\ &{} = \sum _{\omega \in \varOmega } [(2-\omega )(\pi _{-2}^{t} + \pi _{-1}^{t} + \pi _{0}^{t})+(1-\omega )\pi _1^{t}+(2-\omega )\pi _2^{t}]\phi _\omega \\ &{} = \sum _{\omega \in \varOmega } [2(\pi _{-2}^{t} + \pi _{-1}^{t} + \pi _{0}^{t}) + \pi _1^t + 2\pi _2^t - \omega (\pi _{-2}^{t} + \pi _{-1}^{t} + \pi _{0}^{t} + \pi _1^t + \pi _2^t)]\phi _\omega \\ &{} = \sum _{\omega \in \varOmega }[-2\pi _{-2}^t-\pi _{-1}^t+\pi _1^t+2\pi _2^t+4\pi _{-2}^t+3\pi _{-1}^t+2\pi _0^t-\omega ]\phi _\omega \\ &{} = \sum _{\omega \in \varOmega } (-\omega \phi _{\omega })+\sum _{l,l<1}(2-l)\pi _l^t+{\mathbb {E}}x^t. \end{array}\nonumber \\ \end{aligned}$$
(27)

The rest of this section presents a numerical analysis for a system of 100 indistinguishable players. All simulations are carried out with MATLAB on an Intel(R) Core(TM)2 Duo CPU P8400 at 2.27 GHz with 3 GB of RAM. The horizon window consists of \(T=200\) iterations. For each player, we simulate (25) in three cases characterized by different initial distributions.

The initial state is obtained from a random uniform distribution in \(\{1,2\}\) for case 1, in \(\{-2,0\}\) for case 2, and in \(\{-2,2\}\) for case 3, using the commands x0=randi([1,2],n,1), x0=randi([-2,0],n,1), and x0=randi([-2,2],n,1), respectively. The demand is generated in accordance with \(\phi _\omega \) using the command w=randi([0,3],n,T).

The step size is \(dt=0.1\), the proportional purchase cost is \(r=1\), the shortage cost is \(p=10\), and the holding cost is \(h=2\).
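A self-contained MATLAB sketch of this first simulation (case 1, including the periodic reset discussed below) could read as follows; the variable names are illustrative:

nPlayers = 100; T = 200; s = 1; S = 2;
x0 = randi([1,2], nPlayers, 1);            % case 1 initial states
w  = randi([0,3], nPlayers, T);            % i.i.d. uniform demand
x  = zeros(nPlayers, T+1); x(:,1) = x0;
for t = 1:T
    active = x(:,t) < s;                   % players strictly below s reorder
    x(active, t+1)  = S - w(active, t);    % reorder branch of (25)
    x(~active, t+1) = x(~active, t) - w(~active, t);
    if mod(t,50) == 0, x(:,t+1) = x0; end  % reset every 50 iterations (see below)
end
piHist = zeros(5, T+1);                    % empirical distribution over {-2,...,2}
for t = 1:T+1
    piHist(:,t) = histcounts(x(:,t), -2.5:1:2.5)'/nPlayers;
end
plot(0:T, piHist');                        % time plot of pi^t, cf. Fig. 5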

Figure 5 displays the time plot of the distribution \(\pi ^t\) for all \(t\in [0,T]\) for the three cases. The distribution at steady state is greater in states \(-1\), 0, and 1 (red, yellow, and purple lines, respectively). Note that, in accordance with Theorem 4.1, the three cases with different initial distributions reach the same distribution at steady state. During the simulation, the states are reset to their initial values every 50 iterations, to investigate the time response during the transients.

Fig. 5 Time plot of the distribution \(\pi ^t\) for the three cases over states \(-2\) (blue), \(-1\) (red), 0 (yellow), 1 (purple), and 2 (green)

Figure 6 displays the time plot of the microscopic dynamics for a single player, i.e., the inventory level (the state) of that player. Observe that, according to (25), the inventory level of the individual player takes values in the bounded support \(\{-2,-1,0,1,2\}\), where the lower threshold is \(s=1\) and the upper threshold is \(S=2\). The player's inventory is most of the time in states 0 and 1, in accordance with the greater values of the distribution in those states obtained from the macroscopic dynamics in the previous figure. We can therefore observe a clear connection between the macroscopic dynamics (Fig. 5) and the microscopic dynamics of a single player (Fig. 6).

Fig. 6 Time plot of the microscopic dynamics of a single player

In the next example, we analyze the same system with 100 indistinguishable players. The purchase, shortage, and holding costs are as in the previous example, and we consider a total transportation cost \({\tilde{K}} = 1200\), which is divided among the active players at each time t. The horizon window again consists of \(T = 200\) iterations. In this case, however, we enlarge the demand set so that \(\omega ^t \in \varOmega := \{0,1,\ldots ,10\}\), again uniformly distributed. The macroscopic dynamics is represented by the Markov chain displayed in Fig. 7.

Fig. 7 Markov chain representing the macroscopic dynamics for a demand set \(\varOmega := \{0,1,\ldots ,10\}\)

Figure 8 shows the time plot of the microscopic dynamics for one player. In accordance with (13) and (15), one can see that the players reorder when their inventory level falls below the threshold s, which also depends on the fraction of active players, and they reorder up to the upper threshold \(S = 8\).

Fig. 8 Time plot of the microscopic dynamics of a single player for demand \(\omega ^t \in \varOmega := \{0,1,\ldots ,10\}\)

Figure 9 illustrates the time plot of the distribution \(\pi ^t\) for three different initial conditions. The simulations cover three cases in which the initial states are obtained from a random uniform distribution in \(\{0,1,\ldots ,8\}\) for case 1, in \(\{-10,-9,\ldots ,0\}\) for case 2, and in \(\{-10,-9,\ldots ,8\}\) for case 3. The states i displayed are \(i = -8\) (blue), \(i = -1\) (yellow), \(i = 1\) (purple), and \(i = 8\) (red). Note that, in accordance with Theorem 4.1, the three cases with different initial distributions reach the same distribution at steady state. One can also see that the distribution at steady state is greater in states \(-1\) and 1, which is consistent with Fig. 8: there, the inventory is most of the time in states close to 0. As in the previous example, we observe a clear connection between the macroscopic dynamics (Fig. 9) and the microscopic dynamics of a single player (Fig. 8). During this simulation, the states are reset to their initial values every 50 iterations.

Fig. 9 Time plot of the distribution \(\pi ^t\) for the three cases over states \(-8\) (blue), \(-1\) (yellow), 1 (purple), and 8 (red)

6 Conclusions

We have developed an abstraction in the form of a dynamic coordination game model where each player's dynamics is a scalar fluid flow dynamical system characterized by a controlled input flow and an uncontrolled output flow. The players have to pay a share of the activation cost to control their dynamics at a given time. We have provided three main contributions. First, we have shown that if the retailers are rational players, then they benefit from using threshold strategies, where the threshold is on the fraction of active players. Second, we have obtained explicit expressions for the lower and upper thresholds under specific circumstances. Third, we have extended our study to a scenario with a large number of players and proved that two-threshold strategies, such as the (sS) strategies used in inventory control, are optimal strategies for the stationary solution. In this context, we have also provided conditions for the nonstationary mean-field equilibrium to converge to the stationary one in the limit.

A key direction for future work is to explore the feasibility of the proposed coordination scheme in multi-vector energy systems (heat, gas, power), with a special focus on coalitional bidding in decentralized energy trade. The ultimate goal is to investigate the benefits of aggregating independent wind power producers.