Dynamic Coordination Games with Activation Costs

Motivated by inventory control problems with set-up costs, we consider a coordination game where each player’s dynamics is an inventory model characterized by a controlled input and an uncontrolled output. An activation cost is shared among active players, namely players who control their dynamics at a given time. At each time, each player decides whether to be active or not depending on its inventory level. The main contribution of this paper is to show that strategies at a Nash equilibrium have a threshold structure on the number of active players. Furthermore, we provide explicit expressions for the lower and upper thresholds both in the deterministic case, namely when the exogenous signal is known, and in the single-stage game. The relevance of the above results is discussed in the context of inventory control, where Nash equilibrium reordering strategies imply that a single retailer reorders only jointly with a sufficient number of other retailers, and reorders so as to restore a pre-assigned inventory level.


Introduction
This paper studies a discrete-state discrete-time dynamic game where players have to coordinate actions within a finite horizon window [2,3]. Each player's dynamics is an inventory model characterized by a controlled input and an uncontrolled output. The output flow is an uncontrolled exogenous signal. The input flow is controlled by the player and is subject to an activation cost. The state of the player is the accumulated discrepancy between input flow and output flow. The activation cost is shared among active players, namely those players who control their dynamics at a given time. The possibility of sharing the activation cost determines the need for coordination of control strategies on the part of the players. We study both the deterministic and the stochastic disturbance cases. All results can be extended to the vector case by using the robust decomposition approach in [4, Section 3]. Applications arise in coordinated replenishment [8] and opportunistic maintenance [7].

Contribution
This study contributes in different ways to advancing the theory of dynamic coordination games with activation costs on the control. An example of a two-threshold strategy is the (s, S) strategy used in inventory control; see [6] and [5, Chapter 4]. We recall that (s, S) strategies are strategies where replenishments occur anytime the inventory level goes below a lower threshold s. Replenishments bring the inventory level back up to a higher threshold S. In particular, we highlight the following results.
- Strategies at a Nash equilibrium have a threshold structure. We obtain this result in two steps. First, we prove that Nash equilibria are associated with (s, S) strategies via K-convex analysis. Second, we view the (s, S) strategies as threshold strategies on the number of active retailers.
- Lower and upper thresholds have an explicit expression in the deterministic case, namely when the exogenous signal is known, or in single-stage games.
- We corroborate our results with a numerical analysis of a stylized inventory model.

This paper is organized as follows. In Sect. 2, we introduce the dynamic inventory game. In Sect. 3, we discuss the generality of the model. In Sect. 4, we first show that all Nash equilibrium strategies have a two-threshold structure with a reorder level and an order-up-to level. We then provide a dual interpretation of such strategies as threshold strategies on the number of active players. In Sect. 5, we specialize our results to the case of a single-stage coordination game. In Sect. 6, we provide a numerical analysis. Finally, in Sect. 7, we draw conclusions and discuss future work.

Dynamic Inventory Coordination Game
Consider a set of n retailers Γ = {1, . . . , n}. At stage t = 0, . . . , N − 1, the ith retailer holds inventory x_i^t ∈ Z, faces a stochastic demand ω_i^t ∈ Z_+, and orders a quantity u_i^t ∈ U_i^t ⊆ Z_+, where U_i^t denotes the set of admissible decisions, Z the set of integers, and Z_+ the set of nonnegative integers. Thus, for all retailers i ∈ Γ, the inventory x_i^t, which we refer to as the state of retailer i, evolves according to a linear finite-state, discrete-time model of the form

x_i^{t+1} = x_i^t + u_i^t − ω_i^t.   (1)

Here, we assume that there are no delays between orders and deliveries. For all retailers, we also suppose that the inventory at hand plus the inventory ordered may not exceed the storage capacity denoted by C_store. Hence, we have x_i^t + u_i^t ≤ C_store. We also assume that C_store ≥ x_i^0 so as to exclude an empty set of feasible orders. Now, for each time t, let us introduce the vector of the retailers' decisions u^t = [u_i^t]_{i∈Γ} and the vector u_{−i}^t of the decisions of all retailers other than i, which takes values in the Cartesian product of all sets U_j^t, j ≠ i. At each stage, the ith retailer incurs the cost

E{ (K/a^t) δ(u_i^t) + c u_i^t + h max(0, x_i^t + u_i^t − ω_i^t) + p max(0, ω_i^t − x_i^t − u_i^t) },   (2)

where E{·} indicates expectation, K ≥ 0 represents the transportation cost, c ≥ 0 is the purchase cost per stock unit, h ≥ 0 is the penalty on holding, and p ≥ 0 the penalty on shortage. The term δ(u_i^t) is one if the ith retailer replenishes, i.e., is active, and zero otherwise. We henceforth denote by a^t the number of active retailers at stage t, i.e.,

a^t = Σ_{j∈Γ} δ(u_j^t).

Note that the term (K/a^t) δ(u_i^t), which describes the fixed cost paid by retailer i in (2), is equal to K/a^t if retailer i is active and equal to zero otherwise. After introducing the N-stage decision vectors u_i, and denoting by Φ_i(x_i^N) a penalty term on the final state, the cost over the horizon from 0 to N is of the form

J_i(x_i^0, u_i, u_{−i}) = E{ Σ_{t=0}^{N−1} [ (K/a^t) δ(u_i^t) + c u_i^t + h max(0, x_i^{t+1}) + p max(0, −x_i^{t+1}) ] + Φ_i(x_i^N) }.   (3)

A challenging issue in the definition of the stage cost (2) is its dependence on the number of active retailers through the term (K/a^t) δ(u_i^t). This term establishes that the transportation cost K is equally divided among all active retailers.
This in turn implies that the cost of one retailer also depends on the decisions of all other retailers. Conditions (1)-(3) describe the dynamics and the costs of our game.
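To make the setup concrete, the model (1)-(3) can be sketched in a few lines of Python. The helper names (`stage_cost`, `step`) and the default parameter values are illustrative assumptions, not part of the paper.

```python
def stage_cost(x_next, u, a, K=6.0, c=1, h=2, p=10):
    """Stage cost (2) for one retailer: the transportation cost K is split
    equally among the a active retailers; c, h, p are purchase, holding, and
    shortage penalties. Parameter values here are illustrative assumptions."""
    activation = K / a if u > 0 else 0.0
    return activation + c * u + h * max(0, x_next) + p * max(0, -x_next)

def step(states, orders, demands):
    """One stage of the game: dynamics (1), x_i^{t+1} = x_i^t + u_i^t - w_i^t,
    with per-retailer costs as in (2) under the shared activation cost."""
    a = sum(1 for u in orders if u > 0)  # number of active retailers a^t
    nxt = [x + u - w for x, u, w in zip(states, orders, demands)]
    costs = [stage_cost(xn, u, a) for xn, u in zip(nxt, orders)]
    return nxt, costs
```

For instance, with two retailers, only the first being active, the activation cost K falls entirely on the first retailer; were both active, each would pay K/2.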
Other concepts we will make use of in the rest of the paper are Nash equilibrium strategies and K-convexity, which we briefly recall next.

Definition 1 Decisions u_i are at a Nash equilibrium if, for all i ∈ Γ,

J_i(x_i^0, u_i, u_{−i}) ≤ J_i(x_i^0, ũ_i, u_{−i}), for all feasible ũ_i.

For the inventory problem, once at a Nash equilibrium, no retailer benefits from unilaterally changing its replenishment decisions. The following definition of K-convexity is borrowed from [6].

Definition 2 A function g: Z → R is K-convex, K ≥ 0, if for all y ∈ Z, z ∈ Z_+, and b ∈ Z_+, b > 0,

K + g(y + z) ≥ g(y) + z (g(y) − g(y − b)) / b.
K-convexity is used in [6] and reiterated in [5] to prove optimality of (s, S) strategies. We will make use of K-convexity to prove the main result of this work. In the following section, we consider threshold strategies according to which, given an inventory level x_i^t, there exists a threshold l_i^t ∈ {1, 2, . . . , n} such that retailer i reorders only if the number of active players a^t is greater than or equal to such a threshold. Such strategies are given by

μ_i^t(x_i^t, a^t) = S_i^t − x_i^t if a^t ≥ l_i^t, and μ_i^t(x_i^t, a^t) = 0 otherwise.   (4)

As our main result, we will show that all Nash equilibrium strategies have the threshold structure (4). To emphasize that l_i^t depends on x_i^t, we sometimes write l_i^t(x_i^t). Note that orders depend on the history of the game, as they are functions of the state variable x_i^t, which in turn depends on the past orders of the retailer as in (1). Orders of a single retailer also depend on her competitors' orders through the variable a^t.
An additional important concept is that of subgame perfect equilibrium. We have borrowed and adapted from [9] the following definition.

Definition 3 A subgame perfect equilibrium is an n-tuple of strategies u = [u_i]_{i∈Γ} such that, for every i ∈ Γ and every history x_i^t, the continuation strategy of player i minimizes her cost-to-go from stage t onward, given the strategies of the other players.

Therefore, note that if strategy (4) returns a Nash equilibrium, then such an equilibrium is also subgame perfect, in the sense that the strategy returns the optimal order depending on the current state x_i^t, irrespective of the fact that past orders might not have been optimal. To simplify the proofs and the plots in the following figures, in the rest of the paper we assume that the penalty term on the final state Φ_i(x_i^N) is null. However, the results that we prove still hold if Φ_i(x_i^N) is a generic convex function with a minimum in x_i^N = 0.
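As a sketch, strategy (4) amounts to a one-line decision rule. The threshold map `l` below is a hypothetical stand-in for l_i^t(x_i^t); `make_l` mirrors the two extreme cases discussed later (always reorder when stock is very low, never when it is high).

```python
def mu(x, a, S, l):
    """Threshold strategy (4): order up to level S iff the number of active
    players a reaches the threshold l(x); otherwise do not order."""
    return (S - x) if a >= l(x) else 0

def make_l(s, n):
    """Hypothetical threshold map: l = 1 below the reorder level s (replenish
    regardless of a), l = n + 1 otherwise (never replenish)."""
    return lambda x: 1 if x < s else n + 1
```

For example, with n = 3 retailers, s = 1, and S = 2, a retailer at inventory 0 orders 2 units even if it is the only active player, while a retailer at inventory 2 never orders.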

On the Generality of the Model
Consider an n-dimensional inventory model characterized by discrete states x^t ∈ Z^n, integer controls u^t ∈ Z_+^n, binary controls y^t ∈ {0, 1}^n, and discrete stochastic disturbances w^t ∈ Z_+^n, where t = 0, 1, . . . is the time index. The evolution of the state is described by a linear discrete-time (difference) equation of the general form

x^{t+1} = A x^t + u^t + E w^t, x^0 = ξ_0 ≥ 0,   (5)

where A and E are matrices of compatible dimensions and ξ_0 is a given initial state. Integer and binary controls are linked through the general capacity constraints

0 ≤ u^t ≤ c y^t,   (6)

where the (scalar) parameter c is an upper bound on the control, with the inequalities in (5) and (6) to be interpreted component-wise.
The above dynamics are characterized by two discrete valued control variables per each state. Starting from nonnegative initial states, we wish to control the state to remain confined to the positive orthant, which may describe a safety region in engineering applications or reflect the desire to prevent shortfalls in inventory applications.
A common situation is one where the disturbance seeks to push the state out of the desired region. The disturbance realization is given at the beginning and fixed thereafter. Each column of matrix E establishes how each disturbance component influences the evolution of the state vector. It is then reasonable to assume E w^t < 0, where the inequality is to be interpreted component-wise.
With regard to (5), we can isolate the dependence of one state component on the other ones and rewrite (5) in a way that establishes similarity with standard lot sizing models [10]:

x_i^{t+1} = x_i^t + u_i^t + B_{i•} x^t + E_{i•} w^t, i = 1, . . . , n.   (7)

Equation (7) is a straightforward representation of (5) where B = A − I, with I the identity matrix. To preserve the nature of the problem, which has stabilizing control actions playing against destabilizing disturbances, we assume that the influence of the other states on state i is relatively "weak." In other words, we assume that the influence of B x^t is small compared with the destabilizing effects of the disturbances captured by the term E w^t. This is captured by assuming that the sum B x^t + E w^t has the same (negative) sign as E w^t, namely

B x^t + E w^t < 0,

where the inequality is again component-wise and holds almost everywhere. Essentially, the states' mutual dependence expressed by B x^t only "weakly" emphasizes or reduces the destabilizing effects of the disturbances. In the following, we present a robust decomposition approach that translates dynamics (7) into n scalar dynamics in "lot sizing" form [10].
With the term "robust decomposition" we mean a transformation through which dynamics (7) are replaced by n independent uncertain lot sizing models of the form

x_i^{t+1} = x_i^t + u_i^t − d_i^t, d_i^t ∈ D_i^t,   (8)

where x_i^t is the inventory, d_i^t the demand, u_i^t the reordered quantity, and D_i^t ⊂ R denotes the uncertainty set. Recall that in (7) the disturbance is given at the beginning and fixed thereafter. We use those values of the disturbance to determine the set D_i^t in (8), as explained in the following. Replacing (7) with (8) is possible once we relate the demand d_i^t to the current values of all other state components and disturbances as expressed below:

d_i^t = −(B_{i•} x^t + E_{i•} w^t),   (9)

where we denote by B_{i•} the ith row of the matrix B, with the same convention applying to E_{i•}. In other words, we assume that the influence that all other states have on state i enters into Eq. (8) through the demand d_i^t defined in (9). Following the decomposition, each lot sizing model is controlled by an agent i (whose state is x_i) who plays against a virtual opponent selecting a worst-case demand, which can be viewed as a two-player game.
Our next step is to make the n dynamics in the form (8) mutually independent. Toward that end, we introduce X^t as the set of admissible states x^t and observe that this set is bounded for bounded d_i^t. The set X^t can be defined in two steps. First, we assume that the states never leave a given region and compute the worst-case vector x^t in the region, namely the vector x^t that, once substituted in (9), has the effect of pushing the ith state out of the safe region. Then, we check whether the trajectory still lies within the region.
Boundedness of X^t means that there exists a scalar φ > 0 such that ‖x‖_∞ ≤ φ for all x ∈ X^t. In view of this, it is possible to decompose the system by replacing the current demand d_i^t by the maximal or minimal demand as computed below:

d̄_i^t = −E_{i•} w^t − φ Σ_j [B_{ij}]^−, d̲_i^t = −E_{i•} w^t − φ Σ_j [B_{ij}]^+,   (10)

where [B_{ij}]^+ denotes the positive part of B_{ij}, i.e., max{B_{ij}, 0}, and [B_{ij}]^− = min{B_{ij}, 0} the negative part.
From the above preamble, we derive the uncertainty set as

D_i^t = {d ∈ Z_+ : d̲_i^t ≤ d ≤ d̄_i^t}.

The maximal demand in (10) describes the demand that would push the state out of the positive orthant in the shortest time; likewise, the minimal demand in (10) describes the demand that would push the state out of the positive orthant in the longest time.
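The worst-case bounds can be sketched as follows. The code reflects our reading of (9)-(10), with states confined to [0, φ]^n and the negative part taken as min{B_ij, 0}; the function name and signature are illustrative.

```python
def demand_bounds(B_row, E_row, w, phi):
    """Worst-case demand interval for one scalar lot-sizing model (8), with
    d_i^t = -(B_i. x^t + E_i. w^t) as in (9) and 0 <= x_j <= phi.
    Signs follow our reading of (10); a sketch, not the paper's code."""
    Ew = sum(e * wk for e, wk in zip(E_row, w))
    d_max = -Ew - phi * sum(min(b, 0.0) for b in B_row)  # maximal demand
    d_min = -Ew - phi * sum(max(b, 0.0) for b in B_row)  # minimal demand
    return d_min, d_max
```

For instance, with B_i• = [0.1, −0.2], E_i• = [−1, 0], w = [3, 5], and φ = 2, the interval is [2.8, 3.4]; the ith agent then plays against a virtual opponent picking demands in this interval.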

Nash Equilibrium Strategies
In this section, we show that all Nash equilibrium strategies are threshold strategies of type (4): retailer i reorders only if the number of active retailers is greater than or equal to a given threshold. For the general model of Sect. 3, proving that strategies at a Nash equilibrium have a threshold structure is not straightforward; for that reason, in this section the results are given for a single retailer i. To show this, in the next subsection we prove the optimality of the (s, S)-like strategies via K-convex analysis (see the definition in [5, Chapter 4]). We recall from [5] that (s, S) strategies are strategies where replenishments occur anytime the inventory level goes below a lower threshold s. Replenishments bring the inventory level back up to a higher threshold S [6]. This is formally stated below, where μ(·) is the strategy, x the inventory, and s and S the lower and upper thresholds, respectively:

μ(x) = S − x if x < s, and μ(x) = 0 otherwise.

We refer to (s, S)-like strategies as (s, S) strategies whose thresholds depend on the players and on time, i.e., we will have s := s_i^t and S := S_i^t for fixed i and t. In Theorem 1, we prove the optimality of (s, S)-like strategies. Before doing this, we need some preliminary analysis, which is inspired by [5, Chapter 4].
Now, denote the resulting transportation costs by K^0, . . . , K^{N−1}. Note that K^t is a function of u_{−i}^t, but for ease of notation we sometimes omit the dependence. Then, let us rewrite the stage cost (2) for retailer i as

K^t δ(u_i^t) + c u_i^t + E{ h max(0, x_i^t + u_i^t − ω_i^t) + p max(0, ω_i^t − x_i^t − u_i^t) }.

Now, we can write the cost-to-go from stage t to the final stage recursively using dynamic programming and the Bellman equation. Let us use the superscript t to indicate the iteration:

J_i^t(x_i^t) = min_{u_i^t} { K^t δ(u_i^t) + c u_i^t + E{ h max(0, x_i^t + u_i^t − ω_i^t) + p max(0, ω_i^t − x_i^t − u_i^t) + J_i^{t+1}(x_i^t + u_i^t − ω_i^t) } },   (11)

with J_i^N(x_i^N) = Φ_i(x_i^N) as in (3). Being y_i^t = x_i^t + u_i^t the instantaneous inventory position, i.e., the inventory level just after the order has been issued, let us define the new function

G_i^t(y_i^t) = c y_i^t + E{ h max(0, y_i^t − ω_i^t) + p max(0, ω_i^t − y_i^t) + J_i^{t+1}(y_i^t − ω_i^t) }   (12)

and rewrite the Bellman Eq. (11) as follows:

J_i^t(x_i^t) = min_{y_i^t ≥ x_i^t} { K^t δ(y_i^t − x_i^t) + G_i^t(y_i^t) } − c x_i^t.   (13)

Note that if we can show that J_i^{t+1} is K-convex with K = K^t, then G_i^t is also K-convex for K = K^t, and the Bellman Eq. (13) has a unique minimizer. Indeed, this has been proved in [5, Chapter 4].
This represents a sufficient optimality condition for the (s, S)-like strategies with thresholds depending on time t, that is, s := s_i^t and S := S_i^t, where s_i^t and S_i^t satisfy

S_i^t ∈ argmin_y G_i^t(y), G_i^t(s_i^t) = K^t + G_i^t(S_i^t), s_i^t ≤ S_i^t.

The meaning of s_i^t and S_i^t is exactly the same as in the (s, S) strategies (cf. [5]): s_i^t represents the minimum threshold on the inventory level below which retailers replenish to restore the inventory up to level S_i^t. Now, let us denote by s̲_i^t the threshold corresponding to the assumption that the ith retailer is charged the whole transportation cost, i.e.,

G_i^t(s̲_i^t) = K + G_i^t(S_i^t).

In the above condition, we have set K^t = K. Analogously, let us denote by s̄_i^t the threshold computed as if all retailers shared the transportation cost equally, i.e.,

G_i^t(s̄_i^t) = K/n + G_i^t(S_i^t).

In essence, in the condition above, each retailer is charged a transportation cost K^t = K/n, namely one nth of the full cost K. Hence, we have s̲_i^t ≤ s_i^t ≤ s̄_i^t. The following theorem establishes the optimality of (s, S)-like strategies, where each pair of thresholds is valid on different intervals of inventory levels.
Proof The proof is by induction. Assume J_i^N(x_i^N) = 0, and consider the function G_i^{N−1}(·). Being convex, G_i^{N−1}(·) is also K-convex with K = K^{N−1}, as shown in Fig. 1. The above reasoning on K-convexity implies that the minimizer of (13) at stage N − 1 is attained by an (s, S)-like strategy (see, e.g., Fig. 1). To obtain S_i^{N−1}, let a probability distribution function φ^{N−1}: Z_+ → [0, 1] be given. Then, the cost of reordering is given by

K^{N−1} + G_i^{N−1}(γ) = K^{N−1} + c γ + h E_h^{N−1}(γ) + p E_s^{N−1}(γ),

where E_h^t(γ) and E_s^t(γ) are the expected holding and shortage, respectively, defined as

E_h^t(γ) = Σ_{ω≤γ} (γ − ω) φ_ω^t,   (14)
E_s^t(γ) = Σ_{ω>γ} (ω − γ) φ_ω^t.   (15)

Let the discrete difference operator (d/dγ) f(γ) = f(γ + 1) − f(γ) be given, and let us apply such an operator to the function G_i^{N−1}(γ). The order-up-to level S_i^{N−1} is the optimal γ, obtained as the smallest γ solving (d/dγ) G_i^{N−1}(γ) ≥ 0. To obtain s_i^{N−1}, let us consider the cost of not reordering, which is given by G_i^{N−1}(x_i^{N−1}); the threshold s_i^{N−1} is then the smallest inventory level at which not reordering is no more expensive than reordering. Now, we are going to assume that the statement is true for some t = m, and we are going to prove that it is also valid for t = m − 1.
Consider now the convex function G_i^{m−1}(·) (see Fig. 2, which illustrates the case t = N − 2); it is K^{m−1}-convex, with a global minimum at S_i^{m−1}. It is important to notice that we can ensure the existence of a unique minimum value in (18) thanks to the nondecreasing property of K^{m−1}.
The cost of reordering for t = m − 1 is given by K^{m−1} + G_i^{m−1}(γ). Applying the operator d/dγ to the function G_i^{m−1}(γ), with the expected holding and shortage as in (14) and (15), respectively, the order-up-to level S_i^{m−1} is the optimal γ, obtained from solving (d/dγ) G_i^{m−1}(γ) ≥ 0. To obtain s_i^{m−1}, let us consider the cost of not reordering, which is given by G_i^{m−1}(x_i^{m−1}). Then, s_i^{m−1} is the smallest inventory level at which not reordering is no more expensive than reordering. Thus, by induction backwards in time, we have proved Theorem 1.
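The (s_t, S_t) structure asserted by Theorem 1 can be checked numerically by backward dynamic programming for a single retailer who pays the full fixed cost K when ordering (a^t = 1). This is a sketch under a zero terminal penalty, as assumed in the text; the state grid, the function name, and the tie-breaking tolerance are our assumptions.

```python
def sS_thresholds(N, K, c, h, p, pmf, x_min=-5, x_max=6):
    """Backward DP over the Bellman recursion (11), returning per-stage
    (s_t, S_t): s_t is one above the largest state that still orders, and
    S_t is the order-up-to level reached from that state."""
    states = list(range(x_min, x_max + 1))
    J = {x: 0.0 for x in states}              # zero terminal penalty
    out = []
    for _ in range(N):                        # backward in time
        policy, Jn = {}, {}
        for x in states:
            best = None
            for u in range(0, x_max - x + 1): # storage capacity: x + u <= x_max
                cost = (K if u > 0 else 0.0) + c * u
                for w, pw in pmf.items():
                    y = x + u - w
                    cost += pw * (h * max(0, y) + p * max(0, -y)
                                  + J[min(max(y, x_min), x_max)])
                if best is None or cost < best[0] - 1e-9:
                    best = (cost, u)
            Jn[x], policy[x] = best
        J = Jn
        reorder = [x for x in states if policy[x] > 0]
        s = max(reorder) + 1 if reorder else x_min
        S = max(reorder) + policy[max(reorder)] if reorder else None
        out.append((s, S))
    out.reverse()                             # out[t] = (s_t, S_t)
    return out
```

With the cost data of the numerical example in Sect. 6 (c = 1, h = 2, p = 10, uniform demand on {0, 1, 2}) and K = 3, a single stage yields s = 1, S = 2, matching the analysis there.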
We can reinterpret the (s, S)-like strategies as threshold strategies on the number of active retailers. The result is that all Nash equilibrium strategies have the threshold structure (4).
In the following result on a single-stage inventory game (where we have dropped index t), we reinterpret a threshold on the inventory level as a threshold on the number of "active retailers".
Theorem 2 The threshold strategy (4) on the number of active retailers is a Nash equilibrium for the single-stage formulation of the inventory game. For the sake of simplicity, we have dropped dependence on time.
Proof From Theorem 1, if N = 1, we have a unique pair of thresholds (s_i, S_i). This means that the retailers make decisions according to

μ_i(x_i) = S_i − x_i if x_i < s_i, and μ_i(x_i) = 0 otherwise.   (20)

Note that from G_i(s_i) = G_i(S_i) + K/a, we have that s_i depends on the number of active players a. Now, for given x_i, the idea is to find l_i as the minimum number of active players such that the cost of replenishing does not exceed the cost of not replenishing. This can be expressed by the minimization below (in a single-stage optimization, we can drop the second argument ū_{−i}^{t+1} from G_i(·, ·)):

l_i = min { a ∈ {1, . . . , n} : G_i(S_i) + K/a ≤ G_i(x_i) }.   (21)

Strategy (20) implies (19) once we compute l_i from (21) for fixed x_i. Two cases are possible.
- The inventory level is "low," namely x_i < s̲_i. Then, the optimal decision is "replenish," independently of a. Indeed, the minimization (21) returns l_i = 1, and since it always holds that a ≥ l_i, we have μ_i(x_i, a) = S_i − x_i.
- The inventory level is "high," namely x_i ≥ s̄_i. Then, the optimal decision is "do not replenish." Indeed, the minimization (21) is infeasible. With a slight abuse of notation, we can take l_i = n + 1, so that it always holds that a < l_i and therefore μ_i(x_i, a) = 0.
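The minimization (21) has a closed-form solution: since K/a ≤ G_i(x_i) − G_i(S_i) iff a ≥ K/(G_i(x_i) − G_i(S_i)), the threshold is a ceiling. The sketch below returns n + 1 when (21) is infeasible, with the same abuse of notation as above; the function name is ours.

```python
import math

def active_threshold(x, G, S, K, n):
    """Threshold l_i(x) solving (21): the minimum number of active players a
    for which replenishing, at cost G(S) + K/a, does not exceed the cost G(x)
    of not replenishing; n + 1 encodes 'never replenish'."""
    gap = G(x) - G(S)
    if gap <= 0:
        return n + 1                  # (21) infeasible even with a = n
    a = math.ceil(K / gap)
    return a if a <= n else n + 1
```

For instance, with the single-stage costs G(0) = 10, G(1) = 5, G(2) = 4 computed in Sect. 6, S = 2, K = 12, and n = 5 retailers, a retailer at inventory 0 needs only 2 active players, while a retailer at inventory 1 would need 12 and therefore never reorders.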

Single-Stage Coordination
In this section, we specialize our results to the case of the single-stage game. In particular, we provide explicit expressions for the two thresholds as functions of the probability distribution that determines the stochastic demand. Let us start by noting that in the single-stage game the function G_i^t(y_i^t, ū_{−i}^{t+1∼N−1}) does not depend on ū_{−i}^N, and therefore we simply write G_i^t(y_i^t):

G_i^t(y_i^t) = c y_i^t + h E_h^t(y_i^t) + p E_s^t(y_i^t).

Then, we have for the value function

J_i^t(x_i^t) = min_{y_i^t ≥ x_i^t} { K^t δ(y_i^t − x_i^t) + G_i^t(y_i^t) } − c x_i^t.

To obtain S_i^t, consider the cost of reordering, which is given by K^t + G_i^t(γ). Let the discrete difference operator (d/dγ) f(γ) = f(γ + 1) − f(γ) be given. By applying the difference operator to the function G_i^t(γ), we then have

(d/dγ) G_i^t(γ) = c + h (d/dγ) E_h^t(γ) + p (d/dγ) E_s^t(γ).

Further derivations yield

(d/dγ) G_i^t(γ) = c + (h + p) Φ^t(γ) − p,

where Φ^t(γ) = Σ_{ω≤γ} φ_ω^t is the cumulative distribution of the demand. In the above, we have used the following equalities: (d/dγ) E_h^t(γ) = Φ^t(γ) and (d/dγ) E_s^t(γ) = Φ^t(γ) − 1. The order-up-to level S_i^t is the optimal γ, which is obtained from solving (d/dγ) G_i^t(γ) ≥ 0. From the above, we then obtain

S_i^t = min { γ : Φ^t(γ) ≥ (p − c)/(p + h) }.   (25)

To obtain s_i^t, let us consider the cost of not reordering, which is given by G_i^t(x_i^t). From the above, we then obtain

s_i^t = min { x : G_i^t(x) ≤ K^t + G_i^t(S_i^t) }.   (27)

Equations (25) and (27) represent explicit expressions for the two thresholds and thus fully characterize the reordering strategy once the probability distribution of the stochastic demand is given.
Once the thresholds are obtained, we implement the control

u_i^t = S_i^t − x_i^t if x_i^t < s_i^t, and u_i^t = 0 otherwise.   (28)

The resulting dynamics is then

x_i^{t+1} = S_i^t − ω_i^t if x_i^t < s_i^t, and x_i^{t+1} = x_i^t − ω_i^t otherwise.   (29)
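The explicit single-stage thresholds can be sketched directly from the demand distribution: S is the discrete newsvendor quantile and s follows from comparing the costs of reordering and not reordering. This is our reading of (25) and (27); the function name and the search range for s are assumptions.

```python
def single_stage_thresholds(pmf, c, h, p, K_t):
    """Single-stage thresholds: order-up-to level S from the quantile
    condition Phi(S) >= (p - c)/(p + h), and reorder level s as the smallest
    inventory where not reordering, G(x), is no worse than K^t + G(S)."""
    support = sorted(pmf)
    def G(y):  # expected single-stage cost of inventory position y
        return c * y + sum(pw * (h * max(0, y - w) + p * max(0, w - y))
                           for w, pw in pmf.items())
    ratio = (p - c) / (p + h)
    cdf, S = 0.0, support[-1]
    for g in support:
        cdf += pmf[g]
        if cdf >= ratio:
            S = g
            break
    s = next(x for x in range(min(support) - 1, S + 1)
             if G(x) <= K_t + G(S))
    return s, S
```

With uniform demand on {0, 1, 2}, c = 1, h = 2, p = 10, and K^t = 3, this returns s = 1 and S = 2, the values derived in the next section.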

Numerical Analysis
We consider an example where the demand ω^t ∈ Ω := {0, 1, 2} is uniformly distributed; namely, after introducing the notation φ_ω to indicate the probability that ω^t = ω, we have φ_ω = 1/3 for ω = 0, 1, 2. Assume that the proportional purchase cost is c = 1, the shortage cost is p = 10, and the holding cost is h = 2. In the case of single-stage optimization, the order-up-to level is given by

S = min { γ : Φ(γ) ≥ (p − c)/(p + h) } = min { γ : Φ(γ) ≥ 9/12 }.

From the above, we obtain S = 2. Indeed, for γ = 2 we have Φ(2) = 1 ≥ 9/12, whereas for γ = 1 it holds that Φ(1) = 2/3 < 9/12, and therefore γ = 1 is not feasible. As for the reorder level s, we have

s = min { x : G(x) ≤ K^t + G(S) }.

We show next that s = 1. Actually, for x = 1 we obtain G(1) = 5 ≤ K^t + G(2) = K^t + 4, which is satisfied by any K^t ≥ 1. For x = 0, we have G(0) = 10 > K^t + 4, which is satisfied by any K^t < 6. We can then conclude that for any K^t such that 1 ≤ K^t < 6, we have the reorder level s = 1 and the order-up-to level S = 2. Then, from (29), the microscopic dynamics is defined on the bounded support {−1, 0, 1, 2}, namely x^t ∈ {−1, 0, 1, 2} for all t ≥ 0, and is given by

x^{t+1} = 2 − ω^t if x^t < 1, and x^{t+1} = x^t − ω^t otherwise.

Figure 3 displays the time plot of the microscopic dynamics for a single player. In other words, the plot shows the inventory level (the state) of a player. The player's inventory is most of the time in states 0 and 1, which is in accordance with the greater values of the distribution in those states.
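The claim that the closed-loop state lives on the bounded support {−1, 0, 1, 2} can be checked by simulating the dynamics (29) with the thresholds of this example; the function name and the seed are ours.

```python
import random

def simulate(T=200, s=1, S=2, seed=0):
    """Closed-loop dynamics (29) for the example: reorder up to S = 2 whenever
    x < s = 1, with demand uniform on {0, 1, 2}. Returns the visited states."""
    rng = random.Random(seed)
    x, visited = 1, set()
    for _ in range(T):
        u = (S - x) if x < s else 0   # control (28)
        x = x + u - rng.choice([0, 1, 2])
        visited.add(x)
    return visited
```

Indeed, if x < 1 the next state is 2 − ω ∈ {0, 1, 2}, and otherwise x − ω ∈ {−1, . . . , 2}, so no trajectory ever leaves {−1, 0, 1, 2}.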
In the following example, we consider a larger instance involving five agents, where the demand of each agent w^t is uniformly distributed on Ω := {0, 1, . . . , 20}. Assume the same purchase, shortage, and holding costs as in the previous example, and consider a transportation cost K = 120, which is divided among the active agents at each time t ∈ [0, 50]. Figure 4 shows the relation between the inventory levels and the transportation cost that each player is willing to pay in case of reordering, as well as the minimum number of active agents required for replenishment at any inventory level. It is possible to see that the inventory has an inverse relation with the transportation cost and an increasing relation with the number of active agents. This means that if the inventory level of agent i is higher, the agent is willing to pay less in case of reordering and is hence expected to require a larger number of active agents to coordinate with.
The last two figures (Figs. 5, 6) display the inventory levels of the five players over time. In Fig. 5, it is possible to see the moments in time when it is most convenient for the players to coordinate for replenishment. On the other hand, Fig. 6 exhibits the relation between the inventory level and the number of active agents at each time. It is clear that the agents reorder when their inventory level is lower than or equal to the threshold s, which also depends on the number of active agents, and they reorder up to the upper threshold S = 15.

Conclusions and Future Works
We first developed an abstraction in the form of a dynamic coordination game model where each player's dynamics is a scalar inventory model characterized by a controlled input and an uncontrolled output. The players have to pay a share of the activation cost to control their dynamics at a given time. First, we showed that if the retailers are rational players, then they benefit from using threshold strategies where the threshold is on the number of active players. We then obtained explicit expressions for the lower and upper thresholds under specific circumstances. A key direction for future work is to explore the feasibility of the proposed coordination scheme in multi-vector energy systems (heat, gas, power), with special focus on coalitional bidding in decentralized energy trade. The ultimate goal is to investigate the benefits of aggregating independent wind power producers.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.