1 Introduction

This paper studies a periodic review multi-item two-layer inventory model to determine the can-order threshold of a (Scs) policy in a coordinated replenishment problem. The (Scs) policy was first introduced by Balintfy [3] and is useful in situations where the demand is stochastic and where the setup costs are high.

In the (Scs) policy, an order is placed to take the inventory level to the order-up-to threshold S. A replenishment occurs either when the inventory level is lower than the must-order threshold s or when the inventory level is lower than the can-order threshold c and there is at least another item being ordered. The inventory evolution of a generic item i over time under the (Scs) policy is illustrated in Fig. 1.

Fig. 1
figure 1

Inventory evolution over time; in the grey area coordinated replenishment may occur

Various algorithms [4, 8, 18, 23, 29, 32, 33] and heuristics [6, 22, 34] to determine the thresholds of the (S, c, s) policy are established and compared in the literature [12, 19, 21, 26, 27]. Federgruen et al. [8] proposed an algorithm to determine the optimal (Scs) policy in a multi-item inventory system with compound Poisson demands. In the work of Zheng [35], the author provides a method to determine the thresholds in a single-item continuous review inventory system with Poisson demand. Similarly, in this work we assume that the demand and discount opportunity of each item evolves according to two independent Poisson processes. However, we consider that the rate parameters of these processes are unknown and change over time.

In practice, demand is often assumed to be uncertain [2, 10, 20, 23,24,25,26, 28, 31] and demand forecasting models are widely developed in order to cope with uncertainty [9]. In this research, as the demand follows a Poisson distribution, we apply a Poisson regression model to learn the future demand [5]. This technique is commonly used to forecast a count variable that follows a Poisson distribution and it ensures that the estimated parameter takes values in the set of nonnegative integers \({\mathbb {Z}}_+\). On the other hand, the setup costs can also be uncertain [1]. In this case, we apply an exponential smoothing method to forecast the values of the setup cost at each time t [11, 17]. The uncertainty in the parameters highlights the versatility of the proposed method and its link to data-driven approaches. Indeed, the method is effective even when we do not know a priori the values of the parameters. Furthermore, considering setup costs as unknown parameters allows to accommodate real-life scenarios where it is common that market conditions generate changes in transportation cost.

In this paper, we focus on determining how the optimal can-order level in a periodic review system with multiple items and discount opportunities can be obtained, while learning uncertain setup cost and uncertain demand. We assume that each item is managed by an independent decision maker. In addition, assuming that the decision-maker of each item does not have information about the demand of any other item, we translate this coordinated replenishment problem into a Bayesian game, where each item is considered as a player. The theory about Bayesian games was introduced by Harsanyi [14,15,16]. However, in a Bayesian game all the players need to know the prior probability distribution of the demand. In this work, we assume that the players do not know the exact prior probability distribution, but each player knows the set of possible probability distributions. To be protected against the uncertainty of not knowing the real probability distribution, we apply distributionally robust optimization [30] to obtain the optimal can-order level that minimizes the expected cost.

The main contributions of this paper are:

  • We present a new model to describe a coordinated replenishment game which involves a learning mechanism to forecast uncertain and time-dependent parameters. This model is based on Bayesian games and distributionally robust optimization to accommodate the uncertainty in the probability distributions.

  • We prove that the cost function is convex with respect to the can-order threshold and that it is concave with respect to the set of probability distributions used for the robust optimization problem. Furthermore we prove that, in terms of Bayesian games, finding an optimal can-order level that minimizes the cost function corresponds to finding a strategy at a Bayesian Nash equilibrium.

  • We provide a new method to determine the optimal-can order level that minimizes the cost function.

  • We conclude with a numerical analysis to corroborate our theoretical results.

This paper is organized as follows. In Sect. 2, we introduce the model for a specific item and we extend the model to multi-item joint replenishment. In Sect. 3, we analyse the coordinated replenishment model presented in Sect. 2 as a Bayesian game. In Sect. 4, we present our main results. In Sect. 5, we provide a numerical analysis. Finally, in Sect. 6 we draw conclusions and discuss future works.

2 Problem Statement and Model

Consider consecutive time intervals \([t, t+1)\) for all \(t = 0,1,\ldots \), and let the set of items \(N=\{1,2,...n\}\) be given. Let us assume that each item \(i\in N\) is managed by an independent decision maker. For each specific item \(i\in N\), we also assume that the customers arrive one at a time. The arrival of a customer indicates the occurrence of a demand event. The cumulative demand in a time interval \([t,\tau ]\), where \(\tau \in [t,t+1)\), is given by the number of customers that arrive during that period of time. Let \(d_i(\tau )\in {\mathbb {Z}}_+\) be the cumulative demand of item i from time t to \(\tau \), where \({\mathbb {Z}}_+\) represents the set of nonnegative integers. The value of the cumulative demand \(d_i(\tau )\) is reset at the beginning of each time interval \([t,t+1)\). It means that when \(\tau =t^+, d_i(\tau ) = 0\), where \(t^+\) denotes the first time instant in the interval \([t,t+1)\). We are interested in determining the cumulative demand that is observed at the end of each discrete-time interval. Let us denote by \(D_i(t+1)\in {\mathbb {Z}}_+\) the observed cumulative demand of item i from time t to \(t+1\). In addition, let us denote by \((t+1)^-\in [t,t+1)\) the last time instant in the interval \([t,t+1)\). The cumulative demand at time \(t+1\) is given by \(D_i(t+1)=d_i((t+1)^-).\)

It is assumed that \(d_i(\tau )\) evolves according to a continuous time Poisson process with rate \(\lambda _i(t)\in {\mathbb {R}}_+\), where \(\lambda _i(t)\) represents the average number of demand events that occur during the time interval \([t,t+1)\), and \({\mathbb {R}}_+\) denotes the set of positive real numbers. However, at the beginning of each time interval the true value of the rate parameter \(\lambda _i(t)\) is unknown and has to be estimated. Let us denote by \({\hat{\lambda }}_i(t)\) the estimated parameter of the Poisson process. This parameter is learned from past data by applying a Poisson regression model. In this Poisson regression model, the estimated parameter depends on the historical observed cumulative demand \(D_i(t)\), as well as on the historical data of the minor and major setup costs, denoted by \(a_i(t)\) and A(t), respectively. The minor setup cost \(a_i(t)\in {\mathbb {R}}_+\) is the cost that has to be paid when item i is ordered jointly with at least one other item \(j\in N\), \(j\ne i\). On the other hand, the major setup cost \(A(t)\in {\mathbb {R}}_+\) corresponds to the cost that has to be paid when an order is placed and no other items are ordered at the same time.

Before explaining the Poisson regression model, let us introduce the forecast cumulative demand of item i at time \(t+1\), denoted by \({\hat{D}}_i(t+1)\in {\mathbb {Z}}_+\). We assume that the forecast cumulative demand follows a Poisson distribution with rate \(\hat{\lambda }_i(t)\). Let \({\mathscr {D}}_i(t)\) be the discrete-time random variable of the forecast cumulative demand. The corresponding Poisson distribution has the following probability mass function:

$$\begin{aligned} Pr({\mathscr {D}}_i(t)=\varkappa )=Pr(\varkappa )=\frac{e^{-{\hat{\lambda }}_i(t)}{\hat{\lambda }}_i(t)^{\varkappa }}{\varkappa !}, \quad \varkappa \in {\mathbb {Z}}_+. \end{aligned}$$
(1)

It is important to remember that the expected value and variance of a Poisson distribution correspond to the value of the rate parameter. Furthermore, we consider that the forecast cumulative demand is given by the value of the estimated parameter \({\hat{\lambda }}_i(t)\), namely \(\sigma ^{2}({\mathscr {D}}_i(t)) = {\mathbb {E}}({\mathscr {D}}_i(t)) = {\hat{\lambda }}_{i}(t)={\hat{D}}_i(t+1).\)

For a specific item i, let us assume that we have historical data of the observed demand and the setup costs. The data consist of m independent observations of the cumulative demand \(D_i(t)\), as well as of the major and minor setup costs A(t) and \(a_i(t)\), respectively, for all \(t=0,1 \ldots , m\). In a Poisson regression model, \(D_i(t)\) is the dependent variable and \((A(t),a_i(t))\) are the linear independent regressors that determine \(D_i(t)\) at each time t. The Poisson regression model in this case is defined by the following equations:

$$\begin{aligned} \left\{ \begin{array}{lll} Pr(D_i(t)|(A(t),a_i(t)))=\frac{e^{-\lambda _i(t)}\lambda _i(t)^{D_i(t)}}{D_i(t)!},\\ \lambda _i(t)= \lambda _i(A(t),a_i(t),\beta _1,\beta _2)=e^{A(t)\beta _1+a_i(t)\beta _2}, \end{array}\right. \end{aligned}$$
(2)

where \(Pr(D_i(t)|(A(t),a_i(t)))\) is the conditional probability distribution of the observed cumulative demand given the independent variables A(t) and \(a_i(t)\), and the unknown parameters \(\beta _1,\beta _2\in {\mathbb {R}}\) called regression coefficients. Observe that this conditional probability distribution is also a Poisson distribution, where the parameter \(\lambda _i(t)\) is a continuous function that depends on the variables \((A(t),a_i(t))\) and also on the unknown parameters \(\beta _1\) and \(\beta _2\).

Let us denote by \(\beta =(\beta _1,\beta _2)\) the vector of unknown parameters and by \(\hat{\beta }=({\hat{\beta }}_1,{\hat{\beta }}_2)\) its corresponding vector of estimated parameters. In a Poisson regression model, these parameters are estimated by maximum likelihood estimation (MLE). Given m independent observations, the log-likelihood function is given by

$$\begin{aligned} ln{\mathscr {L}}(\beta )&=ln\prod _{k=1}^m \frac{e^{-\lambda _i}(k)\lambda _i(k)^{D_i(k)}}{D_i(k)!} \nonumber \\&=ln\prod _{k=1}^m \frac{e^{-[e^{A(k)\beta _1+a_i(k)\beta _2}]}\left[ e^{A(k)\beta _1+a_i(k)\beta _2}\right] ^{D_i(k)}}{D_i(k)!} \nonumber \\&=\sum _{k=1}^m\left\{ [A(k)\beta _1+a_i(k)\beta _2]D_i(k)-e^{A(k)\beta _1+a_i(k)\beta _2}-ln(D_i(k)!)\right\} . \end{aligned}$$
(3)

To maximize the log-likelihood function, we differentiate (3) with respect to \(\beta =(\beta _1,\beta _2)\) and we set the derivative equal to zero. The estimated parameters \(\hat{\beta }_1\) and \(\hat{\beta }_2\) are obtained from solving the set of nonlinear equations:

$$\begin{aligned} \left\{ \begin{array}{lll} \frac{\partial [ln{\mathscr {L}}(\beta )]}{\partial \beta _1}=\sum _{k=1}^m A(k)\left[ D_i(k)-e^{A(k)\beta _1+a_i(k)\beta _2}\right] =0,\\ \frac{\partial [ln{\mathscr {L}}(\beta )]}{\partial \beta _2}=\sum _{k=1}^m a_i(k)\left[ D_i(k)-e^{A(k)\beta _1+a_i(k)\beta _2}\right] =0. \end{array}\right. \end{aligned}$$
(4)

Note that finding an analytical solution for \({\hat{\beta }}_1\) and \({\hat{\beta }}_2\) is prohibitive. However, it is common to apply iterative algorithms to obtain the values of the parameters that maximize the log-likelihood function. The most common method that statistical programs use to find the solution for the parameters is the Newton–Raphson iterative method. By applying this method convergence is guaranteed, as the log-likelihood function (3) is globally concave.

On the other hand, at time \(t+1\) the values of the major and minor setup cost are also unknown. The learning mechanism applied to forecast the setup costs is the exponential smoothing. Let us denote by \({\hat{A}}(t+1)\) and \({\hat{a}}_i(t+1)\) the forecast values of the major and minor setup costs at time \(t+1\), respectively. These values are obtained from the following exponential smoothing model:

$$\begin{aligned} \left\{ \begin{array}{lll} {\mathbb {E}}(a_{i}(t))={\hat{a}}_{i}(t+1) = \alpha a_{i}(t) + (1 - \alpha ) {\hat{a}}_{i}(t),\\ {\mathbb {E}}(A(t))= {\hat{A}}(t+1) = \gamma A(t) + (1 - \gamma ) {\hat{A}}(t), \end{array}\right. \end{aligned}$$
(5)

where \(\alpha \) and \(\gamma \) are the smoothing coefficients.

Finally, to forecast the value of the demand rate at time \(t+1\) we substitute in (2) the estimated parameters \((\hat{\beta }_1\), \(\hat{\beta }_2)\), and the forecast setup costs \(({\hat{A}}(t+1),{\hat{a}}_i(t+1))\) as follows:

$$\begin{aligned} {\mathbb {E}}({\mathscr {D}}_i(t+1))=\hat{\lambda }_i(t+1)=e^{{\hat{A}}(t+1)\hat{\beta }_1 + {\hat{a}}_i(t+1)\hat{\beta }_2}. \end{aligned}$$
(6)

Figure 2(left) displays the continuous cumulative demand \(d_i(\tau )\) which evolves according to a Poisson process and the cumulative demand \(D_i(t)\) in the interval \([t-1,t)\). The displayed example repeats over three consecutive time periods. Once a Poisson regression model has been fit for the historical information up to time t, we can use this model to forecast the demand rate at time \(t+1\) as in Fig. 2(right).

Fig. 2
figure 2

Demand evolves according to a Poisson process (left). The observed cumulative demand at discrete time D(t) is used to estimate the Poisson rate in the interval \([t,t+1)\) using a Poisson regression model (right)

In the (Scs) policy, item i is ordered as a result of a discount opportunity when its inventory level is lower than its can-order threshold \(c_i(t)\in {\mathbb {R}}_+\) and there is, at least, an order placed by the decision maker of another item \(j\in N\), \(j\ne i\). We assume that the discount opportunity events for item i are also modelled by a Poisson process with rate \(\mu _i(t)\in {\mathbb {R}}_+\). However, similarly as in the rate parameter \(\lambda _i(t)\), the discount opportunity rate is unknown at the beginning of each time interval \([t,t+1)\) and has to be estimated. Let us denote by \({\hat{\mu }}_i(t)\) the estimated rate parameter of the Poisson process that describes the discount opportunity events for item i. The discount opportunity rate can be calculated as follows:

$$\begin{aligned} {\hat{\mu }}_i(t)=\sum _{j\in N, j\ne i}\frac{{\hat{\lambda }}_j(t)}{S_j(t)-s_j(t)}+\sum _{j\in N, j\ne i}\frac{{\hat{\lambda }}_j(t)}{S_j(t)-c_j(t)}\sum _{k\in N, k\ne j}\frac{{\hat{\lambda }}_k(t)}{S_k(t)-s_k(t)}. \end{aligned}$$
(7)

For the computation of the discount opportunity rate, in (7) we consider the possible events that can generate a discount opportunity. For a specific item i, the first term in (7) corresponds to the possibility that any other item \(j\ne i\) is reordered separately. On the other hand, the second term in (7) corresponds to the possibility that any item \(j\in N\), \(j\ne i\), is reordered jointly with a different item \(k \in N\).

Note that the discount opportunity rate of item i depends on the demand rates of the other items in N. In this work, we assume that the demand Poisson processes for the different items are independent. In Sect. 3, we explain how each player i estimates the values of \({\hat{\lambda }}_j\) for all \(j\ne i\), \(j\in N\). It is important to observe that the demand and the discount opportunity are two independent Poisson processes; therefore, any possible event, which involves either the materializing of a demand or the occurrence of a discount opportunity, is determined by a Poisson process with rate \(\lambda _i(t)+\mu _i(t)\).

At each time t, an optimization problem over an infinite time horizon is formulated to obtain the optimal thresholds of the can-order policy \((S_i,c_i,s_i)\). In doing this the values of the demand parameters are regarded as constant and equal to the last forecast. This is illustrated in Fig. 3, where \({\hat{D}}_i(t)\) is the forecast demand at time t, which is regarded as a constant as indicated by the horizontal line. At each time t, a new \({\hat{D}}_i(t)\) is calculated and used in the new optimization problem.

Fig. 3
figure 3

One iteration of the receding horizon optimization; three samples of observed demand \(D_i\) (diamond), and forecast demand \({\hat{D}}_i\) (circles). Forecast is kept constant throughout the optimization horizon (dashed lines)

The must-order \(s_{i}(t)\in {\mathbb {N}}\) level can be obtained as

$$\begin{aligned} s_{i}(t) := {\mathbb {E}}({\mathscr {D}}_{i}(t)) + k \sqrt{\sigma ^2({\mathscr {D}}_{i}(t)) } = \hat{\lambda }_{i}(t) + k \sqrt{\hat{\lambda }_{i}(t)}. \end{aligned}$$
(8)

The term k is a safety factor that has to be set to ensure that in a specified percentage of the possible cases shortage does not occur [13].

The order-up-to level \(S_{i}(t)\in {\mathbb {N}}\) consists of the safety stock SS and the economic order quantity \(EOQ_{i}\), namely \(S_{i} := SS + EOQ_{i}\). For the economic order quantity, we have \(EOQ_{i} := \sqrt{{2A(t) {\mathbb {E}}({\mathscr {D}}_{i}(t)) }/{h_{i}}}\), where \(h_i\) is the holding cost per unit time. Then, for the order-up-to level we obtain

$$\begin{aligned} S_{i}(t) := k \sqrt{\hat{\lambda }_{i}(t)} + \sqrt{\frac{2A(t)\hat{\lambda }_{i}(t)}{h_{i}}}. \end{aligned}$$
(9)

The dynamics of the inventory level for a specific item i is characterized by a Markov decision-making process. Let us assume that the inventory level of item i at time t is y and when the next event occurs, the inventory goes to level x, the transition probabilities \(p_{yx}(t)\) between two inventory levels are given by

$$\begin{aligned} p_{yx}(t)=\left\{ \begin{array}{ll} \quad 1 &{} \qquad \qquad y\le s_i, \quad x=S_i,\\ \frac{\hat{\lambda }_i(t)}{\hat{\lambda }_i(t)+\hat{\mu }_i(t)} &{} \qquad s_i<y\le c_i, \quad x=y-1,\\ \frac{\hat{\mu }_i(t)}{\hat{\lambda }_i(t)+\hat{\mu }_i(t)} &{} \qquad s_i<y\le c_i, \quad x=S_i,\\ \frac{\hat{\mu }_i(t)}{\hat{\lambda }_i(t)+\hat{\mu }_i(t)} &{} \qquad c_i<y\le S_i, \quad x=y,\\ \frac{\hat{\lambda }_i(t)}{\hat{\lambda }_i(t)+\hat{\mu }_i(t)} &{} \qquad c_i<y\le S_i, \quad x=y-1\\ \quad 0 &{} \qquad \qquad \qquad \hbox {otherwise}. \end{array}\right. \end{aligned}$$
(10)

The transition probabilities in Eq. (10) describe a data-driven time varying Markov chain. The Markov decision process is illustrated in Fig. 4. The plot depicts transitions between inventory levels. When at time t the inventory is \(s_i(t)+1\) the transition can be to \(s_i(t)\) (unit demand) with probability \(\frac{{\hat{\lambda }}_i(t)}{{\hat{\lambda }}_i(t)+{\hat{\mu }}_i(t)}\) or to \(S_i\) (discount opportunity) with probability \(\frac{{\hat{\mu }}_i(t)}{{\hat{\lambda }}_i(t)+{\hat{\mu }}_i(t)}\). When the inventory is \(c_i+1\) the transition can be to \(c_i\) (unit demand) with probability \(\frac{{\hat{\lambda }}_i(t)}{{\hat{\lambda }}_i(t)+{\hat{\mu }}_i(t)}\) or it remains at the same level in case of discount opportunity with probability \(\frac{{\hat{\mu }}_i(t)}{{\hat{\lambda }}_i(t)+{\hat{\mu }}_i(t)}\). The expected time between consecutive events is given by \(1/(\hat{\lambda }_{i}(t) + \hat{\mu }_{i}(t))\).

Fig. 4
figure 4

Markov decision process: inventory levels (nodes) and transitions (arcs)

Given the transition probabilities, the purpose is to determine the optimal policy that minimizes the long-run average cost for each item i. The long-run average cost is defined similarly as in the work of Zheng [35] and is given by

$$\begin{aligned} g_{i}(S_{i},c_{i},s_{i}) := \frac{I_{i}(S_{i}) + {\mathbb {E}}(A) Q_{i}(S_{i}) + {\mathbb {E}} (a_{i})(1-Q_{i}(S_{i}))}{T_{i}(S_{i})}. \end{aligned}$$
(11)

In the above, assuming that the inventory level is \(S_i\) (we are at the beginning of a new replenishment period), we denote the expected time until the next order is placed by \(T_{i}(S_i)\), the expected holding and penalty costs incurred until the next order is placed by \(I_{i}(S_{i})\), and the probability that the next order is triggered by a demand by \(Q_{i}(S_{i})\).

Let us denote by y the inventory level of item i at time t and by \(G_{i}(y)\) its corresponding one-step inventory cost. In addition, let us denote by \( \theta _i(t)= \frac{{\hat{\lambda }}_i(t)}{{\hat{\lambda }}_i(t)+{\hat{\mu }}_i(t)}\) the probability that the next event is the occurrence of a unitary demand. In the following result, we explain how to obtain an explicit expression for each term in (11).

Lemma 1

For the long-run average cost (11), the following equations hold:

$$\begin{aligned}&T_{i}(S_i) := \frac{1}{\hat{\lambda }_{i}(t)+ \hat{\mu }_{i}(t)}\sum _{x=0}^{(c_{i}-s_{i}-1)} \theta (t)^{x} + \frac{1}{\hat{\lambda }_{i}(t)}(S_i-c_{i}), \\&I_i(S_i)=\frac{1}{\hat{\lambda }_{i}(t)+ \hat{\mu }_{i}(t)}\sum _{x=0}^{(c_{i}-s_{i}-1)} \theta (t)^{x}G_{i}(S_i-x) + \frac{1}{\hat{\lambda }_{i}(t)}\sum _{x = c_{i} + 1}^{S_i}G_{i}(x), \\&Q_i(S_i)=\theta (t)^{(c_i-s_i)}. \end{aligned}$$

Proof

The proof is in the Appendix. \(\square \)

Since the parameters involved in the computation of the long-run average cost depend on the thresholds \(S_{i}(t)\), \(c_{i}(t)\), and \(s_{i}(t)\), the minimization of the long-run average cost with fixed \(S_{i}(t)\) and \(s_{i}(t)\) consists in finding the optimal value of \(c_{i}(t)\). The problem we wish to solve can be formulated as follows:

For all \(i=1,\ldots ,n\)

$$\begin{aligned} \underset{c_{i}(t) \in {\mathbb {N}}}{\text {min}} g_{i}(S_{i}(t),c_{i}(t),s_{i}(t)) :=\frac{I_{i}(S_{i}(t)) + {\mathbb {E}}(A(t))Q_{i}(S_{i}(t)) + {\mathbb {E}}(a_{i}(t))(1-Q_{i}(S_{i}(t)))}{T_{i}(S_{i}(t))},\nonumber \\ \end{aligned}$$
(12)

where the expressions for \(T_{i}(S_i)\), \(I_{i}(S_i)\) and \(Q_{i}(S_i)\) are as in Lemma 1.

In the same spirit as in Zheng [35], an algorithm to calculate the optimal can-order level for a single-item inventory system with Poisson demand and Markovian discount opportunities on the setup cost is presented in the rest of this paper. In this research, the long-run average cost formula is extended to a multi-item periodic review system, where the demand rate \(\hat{\lambda }_i(t)\) is forecast by a Poisson regression model.

It is important to observe that the long-run average cost of a specific item i depends on the decisions made for the other items. When item i is reordered as a result of a discount opportunity the minor setup cost \(a_i(t)\) is paid. Otherwise, whenever item i is reordered without a discount opportunity, the major setup cost A(t) is paid. Nevertheless, the decision-maker of item i does not know the demand rate \( \lambda _j (t) \) of any other item, for all \( j \in N \), \( j \ne i \). In the next section, we explain how the decision-maker of each item i can estimate the demand rate \( {\hat{\lambda }}_j (t) \), for all \( j \in N \), \( j \ne i \) and how it affects the computation of the optimal can-order level that minimizes the long-run average cost (12). This estimation is based on the theory of Bayesian games as introduced by Harsanyi [14,15,16].

3 Bayesian Game

In this section, we analyse the coordinated replenishment model presented in Sect. 2 as an incomplete game, where each item \(i\in N\) represents a player. A game is called incomplete when the players do not have complete information about the important parameters of the game. As it is explained in the previous section, at each time t a new observation of the demand is obtained based on which we estimate the future demand and setup costs. These estimated parameters are used in the Bayesian game and in the computation of the optimal can-order level \(c^*_i(t)\) that minimizes the expected long-run average cost. For the sake of simplicity in the rest of this paper, we drop the dependence on time t.

First, let us introduce the control variable \(u_{i}\) for a specific item i. This variable indicates the possible amount to be reordered, which can be triggered either by the occurrence of a demand event or by a discount opportunity, and it is determined by the corresponding \((S_i,c_i,s_i)\) policy. For any inventory level y, this strategy can be defined formally as follows:

$$\begin{aligned} u_i(y)=\left\{ \begin{array}{ccc} S_i-y, &{} \qquad \text{ if } \quad y<s_i,\\ S_i-y, &{} \qquad \text{ if } \quad s_i<y<c_i &{}\quad \text{ and } \text{ discount } \text{ opportunity },\\ 0, &{} \qquad \text{ if } \quad s_i<y<c_i &{}\quad \text{ and } \text{ no } \text{ discount } \text{ opportunity },\\ 0, &{} \qquad \text{ if } \quad y\ge c_i. \end{array}\right. \end{aligned}$$
(13)

Let us denote by \({\mathscr {G}}\) the incomplete game related to the initial coordinated replenishment problem. We consider this as an incomplete game since the only information available to player i is its own estimated parameter \({\hat{\lambda }}_i\). However, player i does not know the estimated parameter \({\hat{\lambda }}_j\) of any other player \(j\in N\), \(j\ne i\). Let \(\hat{\lambda }=({\hat{\lambda }}_1,\ldots ,{\hat{\lambda }}_n)\) be the random vector that contains the demand rates of all the players. Also, let \(\hat{\lambda }_{-i}=({\hat{\lambda }}_1,\ldots ,{\hat{\lambda }}_{i-1},{\hat{\lambda }}_{i+1},\ldots ,{\hat{\lambda }}_n)\) be the random vector that contains the demand rates of all the players, except player i. The incomplete game \({\mathscr {G}}\) can be formally defined in normal form as follows:

$$\begin{aligned} {\mathscr {G}}=\left\langle N,\{U_i\}_{i=1}^n,\{L_i\}_{i=1}^n,\{g_i(S_i,c_i,s_i)\}_{i=1}^n,\{P_i\}_{i=1}^n\right\rangle , \end{aligned}$$
(14)

where \(N=\{1,\ldots ,n\}\) is the set of players, \(U_i\) is set of strategies of player i, \(L_i\) is the set of possible values that the rate parameter \({\hat{\lambda }}_i\) can take, \(g_i(S_i,c_i,s_i)\) is the long-run average cost of player i. \(P_i=P_i(\hat{\lambda }_{-i}|\hat{\lambda }_i)\) is the conditional probability distribution that player i uses to describe the chance of having certain vector \(\hat{\lambda }_{-i}\) given that player i knows the value of its estimated parameter \(\hat{\lambda }_i\).

To determine the best strategy for player i, denoted by \(u_i^*\), we need to define a complete game that is Bayes-equivalent to the incomplete game \({\mathscr {G}}\). This complete game is also called Bayesian game, and it is denoted by \(\bar{{\mathscr {G}}}\). To obtain the Bayesian game, we have to assume that each player i knows not only its own estimated parameter \({\hat{\lambda }}_i\), but also the common prior probability distribution of \({\hat{\lambda }}\), represented by \({\bar{P}}({\hat{\lambda }})={\bar{P}}({\hat{\lambda }}_1,\ldots ,{\hat{\lambda }}_n)\). Given the prior probability distribution \({\bar{P}}({\hat{\lambda }})\), the complete game \(\bar{{\mathscr {G}}}\) can be formally defined by

$$\begin{aligned} \bar{{\mathscr {G}}}=\left\langle N,\{U_i\}_{i=1}^n,\{L_i\}_{i=1}^n,\{g_i(S_i,c_i,s_i,{\hat{\lambda }}_i)_{{\hat{\lambda }}_i\in L_i}\}_{i=1}^n,{\bar{P}}\right\rangle . \end{aligned}$$
(15)

Note that \(\bar{{\mathscr {G}}}\) differs from \({\mathscr {G}}\) from the fact that the set of probability distributions \(\{P_i\}_{i=1}^n\) is replaced by \({\bar{P}}\), which is supposed to be known by all players. The games \({\mathscr {G}}\) and \(\bar{{\mathscr {G}}}\) are called Bayes-equivalent because \(P_i({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)={\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\), where \({\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\) can be obtained from \({\bar{P}}({\hat{\lambda }})\) using the Bayes’ rule:

$$\begin{aligned} {\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)=\frac{{\bar{P}}({\hat{\lambda }})}{\sum _{L_{-i}}{\bar{P}}({\hat{\lambda }})}. \end{aligned}$$
(16)

It is important to mention that in the Bayesian game instead of minimizing the long-run average cost \(g_i(S_i,c_i,s_i)\) as in (12), we aim to minimize the conditional expected long-run average cost. Let us denote the conditional expected long-run average cost by \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\), where

$$\begin{aligned} {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]=\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)g_i(S_i,c_i,s_i). \end{aligned}$$
(17)

Let \(u^*_{-i}\) be the set of optimal strategies of all players except player i. In the rest of this paper, we are going to make use of the following definitions.

Definition 1

A best-response set in a Bayesian game is the set of all strategies that minimize the expected long-run average cost:

$$\begin{aligned} {\mathscr {B}}_i(u_{-i}):=\{u_i^*|\min _{c_i\in {\mathbb {N}}}\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)g_i(S_i,c_i,s_i)\}. \end{aligned}$$
(18)

The can-order level \(c_i\) takes values between the upper threshold \(S_i\) and the lower threshold \(s_i\). Therefore, we have a finite set of options for the values of the can-order level. As a result, there is always an optimal strategy \(u^*\) in the best-response set \({\mathscr {B}}_i(u_{-i})\).

For compactness, let us denote the optimal strategy \(u_i^*=(S_i,c_i^*,s_i)\). This strategy corresponds to the optimal can-order level \(c_i^*\) that minimizes the expected long-run average cost.

Definition 2

The strategy \(u^*_i\) is at Bayesian Nash equilibrium, if it holds for all item \(i\in N\)

$$\begin{aligned} \sum _{{\hat{\lambda }}_{-i}\in L_{-i}}{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)g_i(u_i^*,u_{-i}^*)\le \sum _{{\hat{\lambda }}_{-i}\in L_{-i}}{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)g_i(u_i^*,u_{-i}). \end{aligned}$$
(19)

Note that at a Nash equilibrium all players play a best-response, such that \(u^*_i\in {\mathscr {B}}_i(u^*_{-i})\) for all \(i\in N\), and no player benefits from changing its reorder strategy.

The following result was borrowed from Harsanyi [15], and it states that if \(u^*=(u_1^*,\ldots ,u_n^*)\) is a Nash equilibrium strategy of the complete game \(\bar{{\mathscr {G}}}\), then it is also a Bayesian equilibrium strategy for the incomplete game \({\mathscr {G}}\).

Theorem 1

Let the incomplete game (14) and the Bayesian game (15) be given, such that (15) is Bayes-equivalent to (14). In order to any given n-tuple of strategies \(u = (u_1, ... ,u_n)\) be a Bayesian equilibrium point in the game (14), it is both sufficient and necessary that in the normal form of the game (15) this n-tuple u be an equilibrium point in Nash’s sense.

However, defining a prior probability distribution \({\bar{P}}\) that is known by all players can be very complicated. In this work we assume that the players do not know the exact prior probability distribution \({\bar{P}}\), but each player knows the set of possible probability distributions \({\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\), denoted by \({\mathscr {P}}\). In order to be protected against the uncertainty of not knowing the real probability distribution we reframe our approach within the context of distributionally robust optimization. Distributionally robust optimization emerges from the idea of being protected against the ambiguity in the probability distribution by taking the worst-case scenario, which involves finding the probability distribution that maximizes the long-run average cost. As a result, we can formulate the following robust optimization problem:

$$\begin{aligned} \min _{c_i\in {\mathbb {N}}}\max _{{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\in {\mathscr {P}}}{\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]=\min _{c_i\in {\mathbb {N}}}\max _{{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\in {\mathscr {P}}}\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)g_i(S_i,c_i,s_i).\nonumber \\ \end{aligned}$$
(20)

Each player \(i\in N\) can define its best strategy \(u^*_i\) by finding the optimal solution of (20) denoted by the point \((c_i^*,{\bar{P}}^*({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i))\). Observe that for fixed values of the upper and lower thresholds \(S_i\) and \(s_i\), respectively, determining the optimal strategy that characterizes the control variable \(u_i\) consists first in finding the optimal probability distribution \({\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\) that maximizes the long-run average cost \(g_i(S_i,c_i,s_i)\) for each possible can-order threshold \(c_i\). Then, we can find an optimal can-order threshold \(c_i\) that minimizes the expected long-run average cost \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) for all \({\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\in {\mathscr {P}}\).

To find the optimal solution, it is important to define the correct set of possible probability distributions \({\mathscr {P}}\). We assume that \({\mathscr {P}}\) is a set of multi-variate Poisson probability distributions. Let us denote by \(MPoisson(\varvec{\bar{\eta }}\), \({\hat{\lambda }}_{-i}\)) the probability distribution of a multi-variate Poisson random variable, where \(\varvec{\bar{\eta }}\in {\mathbb {R}}_+^{1\times (n-1)}\) represents the vector of rate parameters of each random variable \({\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}\), \(j\ne i\). Now we can formally define the set \({\mathscr {P}}\) as follows:

$$\begin{aligned} {\mathscr {P}}=\{MPoisson(\varvec{\bar{\eta }},{\hat{\lambda }}_{-i}) | \varvec{\bar{\eta }}=(\eta _1,\eta _2,\ldots ,\eta _{i-1},\eta _{i+1},\ldots ,\eta _n)\in {\mathbb {R}}_+^{1\times (n-1)},j\ne i\}. \end{aligned}$$
(21)

We assume that the probability distributions in the set \({\mathscr {P}}\) satisfy the Bayesian-equivalence \(P_i({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)=\bar{P}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\). Under this assumption, we ensure that Theorem 1 is still satisfied. In addition, we assume that from the perspective of player i the rest of the players are homogeneous, namely the values of the parameters \(\eta _j\in {\mathbb {R}}_+\) are the same for all player \(j\ne i\). Let us denote this general parameter by \(\eta \in {\mathbb {R}}_+\). For the sake of simplicity and without loss of generality, the random variables \({\hat{\lambda }}_j\), \(j\in N\), are independent and identically distributed with parameter \(\eta \). We have to solve the robust optimization problem to obtain a single optimal value \(\eta ^*\in {\mathbb {R}}_+\) for the multi-variate Poisson distribution. Based on these assumptions, the expected long-run average cost is explicitly computed as follows:

$$\begin{aligned} {\mathbb {E}}[g_i({\hat{\lambda }}_{-i})|{\hat{\lambda }}_i]=\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}exp(-(n-1)\eta )\frac{\eta ^{\sum _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j}}{\prod _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}g_i({\hat{\lambda }}_{-i}). \end{aligned}$$
(22)

Note that in (22) we take the thresholds \(S_i\), \(c_i\), and \(s_i\) as fixed and for each term in the sum we compute the long-run average cost as a function of \({\hat{\lambda }}_{-i}\).

4 Main Results

In this section, we show that the expected long-run average cost \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) is convex on the can-order threshold \(c_i\). Furthermore, we prove that finding an optimal threshold \(c_i\) corresponds to finding a Bayesian Nash equilibrium strategy.

To prove the convexity of the expected long-run average cost with respect to the can-order threshold \(c_i\), first we are going to analyse the behaviour of the long-run average cost \(g_i(S_i,c_i,s_i)\) for any possible vector \({\hat{\lambda }}_{-i}\).

Let us define

$$\begin{aligned} \omega _i = \frac{(S_{i} - c_{i} - 1)(1-\theta )}{\sum ^{c_{i} + 1-s_i}_{x=1}\theta ^{x} + S_{i} - c_{i} - 1}. \end{aligned}$$
(23)

It is important to observe that \(\omega _i\) is a positive fraction, namely \(\omega _i\in (0,1)\). Note that \(\omega _i\) increases when the can-order level decreases.

Also let the following function be given

$$\begin{aligned} f_i(c_{i}, S_{i}) := \frac{\hat{\lambda }_i {\mathbb {E}}(a_{i}) + \sum _{x = c_{i} + 1}^{S_{i}} G(x)}{S_{i} - c_{i}}. \end{aligned}$$
(24)

The above is related to the cost of replenishing always with discount opportunity when the inventory reaches the can-order level.

Lemma 2

For \(\omega _i\) as in (23), the following equation holds:

$$\begin{aligned} \begin{array}{ll} g_i(S_{i}, c_{i} +1, s_{i}) = \omega _i f_i(c_{i}+1, S_{i}) + (1-\omega _i) g_i(S_{i}, c_{i}, s_{i}). \end{array} \end{aligned}$$
(25)

Proof

Let \(\theta = \hat{\lambda }_{i}/(\hat{\lambda }_{i} + \hat{\mu }_i)\). Since the inventory level is at \(S_{i}\) and \(c_{i}\) must be less than \(S_{i}\), for \(T_{i}(y)\) we have

$$\begin{aligned} T_{i}(S_{i}) = \frac{1}{\hat{\lambda }_{i}+ \hat{\mu }_i}\sum _{x=0}^{(c_{i}-s_{i}-1)} \theta ^{x} + \frac{1}{\hat{\lambda }_{i}}(S_i-c_{i}). \end{aligned}$$
(26)

\(I_{i}(y)\) at an inventory level of \(S_{i}\) is given by

$$\begin{aligned} \begin{aligned} I_{i}(S_{i}) = \frac{1}{\hat{\lambda }_{i}+ \hat{\mu }_i}\sum _{x=0}^{(c_{i}-s_{i}-1)} \theta ^{x}G_{i}(S_i-x) + \frac{1}{\hat{\lambda }_{i}}\sum _{x = c_{i} + 1}^{S_i}G_{i}(x), \end{aligned} \end{aligned}$$
(27)

where \(S_{i} > c_{i}\).

\(Q_{i}\) at inventory \(S_{i}\) can be written as:

$$\begin{aligned} Q_{i}(S_{i}) = Q_{i}(y-1), \end{aligned}$$
(28)

where \(S_{i} > c_{i}\).

Let \({\mathbf {N}}_{g_i}\) and \({\mathbf {D}}_{g_i}\) be the numerator and denominator of the cost function in (12). From (26) we have that

$$\begin{aligned} \begin{aligned} \hat{\lambda }_{i}{\mathbf {D}}_{g_i}&= \hat{\lambda }_{i} T_{i}(S_{i}) = \frac{\hat{\lambda }_{i}}{\hat{\lambda }_{i}+ \hat{\mu _i}}\sum _{x=0}^{(c_{i}-s_{i}-1)} \theta ^{x} + S_i-c_{i}\\&=\sum _{x=1}^{c_{i}-s_{i}} \theta ^{x} + S_i-c_{i}. \end{aligned} \end{aligned}$$

From (27), we also have

$$\begin{aligned} \begin{aligned} \hat{\lambda }_{i} {\mathbf {N}}_{g_i}&= \hat{\lambda }_{i}{\mathbb {E}}({a_{i}}) + \hat{\lambda }_{i}\left( {\mathbb {E}}(A) - {\mathbb {E}}(a_{i})\right) {\theta }^{c_{i}-s_i}\\&\quad + \sum _{x=1}^{c_{i}-s_i} \theta ^{x} G(c_{i} - x + 1) + \sum _{x = c_{i} + 1}^{S_{i}}G(x). \end{aligned} \end{aligned}$$

The variables, \(T_{i}, I_{i}\) and \(Q_{i}\) can be substituted into the objective function in (12), and this yields

$$\begin{aligned} \begin{aligned} g_{i}(S_{i},c_{i},s_{i}) =&\frac{\hat{\lambda }_{i} {\mathbf {N}}_{g_i}}{\hat{\lambda }_{i} {\mathbf {D}}_{g_i}}= \frac{\hat{\lambda }_{i}{\mathbb {E}}({a_{i}}) + \hat{\lambda }_{i}\left( {\mathbb {E}}({A}) - {\mathbb {E}}({a_{i}})\right) {\theta }^{c_{i}-s_i}}{\sum _{x=1}^{c_{i}-s_i}\theta ^{x} + S_{i} - c_{i}} \\&+ \frac{(\sum _{x=1}^{c_{i}-s_i} \theta ^{x} G(c_{i} - x + 1)) + \sum _{x = c_{i} + 1}^{S_{i}}(G(x))}{\sum _{x=1}^{c_{i}-s_i}\theta ^{x} + S_{i} - c_{i}}. \end{aligned} \end{aligned}$$
(29)

Let us now substitute \(c_{i}+1\) in (29), which yields:

$$\begin{aligned} \begin{aligned} g_{i}(S_{i},c_{i} +1,s_{i}) =&\frac{\hat{\lambda }_{i}{\mathbb {E}}({a_{i}}) + \hat{\lambda }_{i} \left( {\mathbb {E}} (A) - {\mathbb {E}}({a_{i}})\right) {\theta }^{c_{i}+1-s_i}}{\sum _{x=1}^{c_{i}+1-s_i}\theta ^{x} + S_{i} - c_{i}} \\&+ \frac{\sum _{x=1}^{c_{i}+1-s_i} \theta ^{x} G(c_{i} - x + 2) + \sum _{x = c_{i} + 2}^{S_{i}}G(x)}{\sum _{x=1}^{c_{i}+1-s_i}\theta ^{x} + S_{i} - c_{i}}. \end{aligned} \end{aligned}$$
(30)

Note that for the last part of the numerator it holds

$$\begin{aligned} \begin{aligned}&\sum _{x=1}^{c_{i}+1-s_i} \theta ^{x} G(c_{i} - x + 2) + \sum _{x = c_{i} + 2}^{S_{i}}G(x)\\&\quad =\sum _{x=2}^{c_{i}+1-s_i} \theta ^{x} G(c_{i} - x + 2) + \theta \sum _{x = c_{i} + 1}^{S_{i}}G(x) +(1-\theta ) \sum _{x = c_{i} + 2}^{S_{i}}G(x). \end{aligned} \end{aligned}$$

Then, (30) can be rewritten as

$$\begin{aligned} \begin{aligned} g_{i}(S_{i},c_{i}+1,s_{i}) =&\frac{\hat{\lambda }_{i} {\mathbb {E}}({a_{i}}) + \hat{\lambda }_{i}\left( {\mathbb {E}} (A)-{\mathbb {E}}({a_{i}})\right) \theta ^{c_{i}+1-s_i}}{\sum _{x=1}^{c_{i}+1-s_i}\theta ^{x} + S_{i} - c_{i}}\\&+ \frac{\sum ^{c_{i} + 1-s_i}_{x=2}(G(c_{i} - x + 2))+\theta \sum ^{S_{i}}_{x = c_{i} + 1}G(x) + (1-\theta ) \sum ^{S_{i}}_{x = c_{i} + 2}G(x)}{\sum _{x=1}^{c_{i}+1-s_i}\theta ^{x} + S_{i} - c_{i} }. \end{aligned}\nonumber \\ \end{aligned}$$
(31)

We also have

$$\begin{aligned} \begin{aligned}&({\mathbf {D}}_{\omega _i}) \cdot (1-\omega _i) g(S_{i}, c_{i}, s_{i}) \\&\quad = \sum _{x=1}^{c_{i}+1-s_i}\left( \theta ^{x} + S_{i} - c_{i}\right) g_{i}(S_{i},c_{i},s_{i}) \theta \\&\quad = \hat{\lambda }_{i}{\mathbb {E}}({a_{i}})\theta + \hat{\lambda }_{i} \left( {\mathbb {E}} (A) - {\mathbb {E}}({a_{i}}) \right) \theta ^{c_i+1-s_i} \\&\qquad + \theta \sum _{x=1}^{c_{i}-s_i} \theta ^{x} G(c_{i} - x + 1) + \theta \sum _{x = c_{i} + 1}^{S_{i}}G(x)\\&\quad = \hat{\lambda }_{i}{\mathbb {E}}({a_{i}})\theta + \hat{\lambda }_{i} \left( {\mathbb {E}} (A) - {\mathbb {E}}({a_{i}}) \right) \theta ^{c_i+1-s_i} \\&\qquad + \sum _{x=2}^{c_{i}+1-s_i} \theta ^{x} G(c_{i} - x + 2) + \theta \sum _{x = c_{i} + 1}^{S_{i}}G(x). \end{aligned} \end{aligned}$$
(32)

Likewise, from (24) we have:

$$\begin{aligned} \begin{aligned} ({\mathbf {D}}_{\omega _i}) \cdot \omega _i f_i(c_{i} + 1, S_{i}) =&(S_{i} - c_{i} -1) (1- \theta ) f_i(c_{i} + 1, S_{i}) \\ =&(1-\theta ) \hat{\lambda }_{i}{\mathbb {E}}({a_{i}}) + (1-\theta ) \sum ^{S_{i}}_{x = c_{i} + 2} G(x). \end{aligned} \end{aligned}$$
(33)

From (31)–(33), we can conclude that \(g_{i}(S_{i},c_{i}+1,s_{i}) = \omega _i f_i(c_{i}+1, S_{i}) + (1-\omega _i) g_i(S_{i}, c_{i}, s_{i})\) and (25) is proved.\(\square \)

A direct consequence of Lemma 2 is that \(g_i(c_i+1)\) is in the convex hull between \(g_i(c_i)\) and \(f_i(c_i+1)\).

In the next result, we show that the optimal value of \(c_{i}\) is located at a local minimum of \(g_{i}\) for any possible vector \({\hat{\lambda }}_{-i}\), under continuous relaxation for \(c_i\), namely \(c_{i} \in {\mathbb {R}}_{+}\). Let the discrete difference operator be given \(\frac{\partial }{\partial c_i}\), which is defined as

$$\begin{aligned} \frac{\partial }{\partial c_i} g_{i}(S_{i}, c_{i}, s_{i}):=g_{i}(S_{i}, c_{i}+1, s_{i})-g_{i}(S_{i}, c_{i}, s_{i}). \end{aligned}$$

Lemma 3

Let the can-order threshold be \(c_{i} \in {\mathbb {R}}\) and in the interval \([s_{i},S_{i}]\). Then, at a local minimum it must hold

$$\begin{aligned} f_{i}(c_{i} + 1, S_{i}) - g_{i}(S_{i}, c_{i}, s_{i}) \ge 0. \end{aligned}$$
(34)

Proof

First, we obtain the derivative of \(g_{i}(\cdot )\) as

$$\begin{aligned} \frac{\partial }{\partial c_{i}} g_{i}(S_{i}, c_{i}, s_{i}) = \omega _{i}\left( f_{i}(c_{i}+1,S_{i})-g_{i}(S_{i},c_{i},s_{i})\right) . \end{aligned}$$
(35)

At a local minimum it must hold \(\frac{\partial g_{i}(S_{i}, c_{i}, s_{i})}{\partial c_{i}} \ge 0\). As \(\omega _{i}\) is only zero when there is no demand, the above implies \(f_{i}(c_{i} + 1, S_{i}) - g_{i}(S_{i}, c_{i}, s_{i}) \ge 0\) and (34) is proven. \(\square \)

In the next result, we establish the convexity of function \(g_{i}(S_{i},c_{i},s_{i})\) in the decision variable \(c_i\) for any possible values of \({\hat{\lambda }}_{-i}\).

Lemma 4

The long-run average cost function is convex with respect to \(c_{i}\), and the following inequality holds:

$$\begin{aligned} g_i(S_{i}, \nu c_{i} + (1-\nu )(c_{i} + x), s_{i}) \le \nu g_i(S_{i}, c_{i}, s_{i}) + (1-\nu ) g_i(S_{i}, c_{i} + x, s_{i}), \end{aligned}$$
(36)

where \(\nu \in [0,1]\) and \(g_{i}(\cdot )\) is defined on positive real numbers.

Proof

First note that \(g_{i}(\cdot )\) is defined on positive real numbers as the costs are nonnegative. Now, take a value for \(c_{i}\) which is a local minimum for \(g_{i}(\cdot )\) and denote it \({\bar{c}}_{i}\). Then the value of \(g_{i}(S_{i}, c_{i}, s_{i})\) on the interval range \(H({\bar{c}}_{i}, \epsilon ) = [{\bar{c}}_{i} - \epsilon , {\bar{c}}_{i} + \epsilon ]\) should be greater than or equal to \(g_i(S_{i}, {\bar{c}}_{i}, s_{i})\). This is the case, if and only if the second derivative of \(g_i(\cdot )\) is greater than or equal to zero. From the first derivative in (35), we obtain

$$\begin{aligned} \begin{aligned} \frac{\partial ^{2} g_{i}(S_{i}, c_{i}, s_{i})}{\partial c_{i}^{2}} =&\frac{ \partial \omega _{i}}{\partial c_{i}} f_{i}(c_{i}+1,S_{i}) + \omega _{i}\frac{\partial f_{i}(c_{i}+1,S_{i})}{\partial c_{i}} \\&- \frac{\partial \omega _{i}}{\partial c_{i}} g_{i}(S_{i},c_{i},s_{i}) - \omega _{i} \frac{\partial g_{i}(S_{i},c_{i},s_{i})}{\partial c_{i}}, \end{aligned} \end{aligned}$$
(37)

where \(\omega _{i}\) is as in (23).

To simplify the expression of (37), the result of (35) is substituted in (37), which yields

$$\begin{aligned} \begin{aligned} \frac{\partial ^{2}}{\partial c_{i}^{2}} g_{i}(S_{i}, c_{i}, s_{i})&= \omega _{i} \frac{\partial f_{i}(c_{i}+1,S_{i})}{\partial c_{i}} + \left( \frac{\partial \omega _{i}}{\partial c_{i}} - \omega _{i}^{2}\right) (f_{i}(c_{i}+1,S_{i}) - g_{i}(S_{i},c_{i},s_{i}))\\&= \omega _{i} \frac{\partial f_{i}(c_{i}+1,S_{i})}{\partial c_{i}} + \left( \frac{\partial \omega _{i}}{\partial c_{i}} - \omega _{i}^{2} \right) \frac{1}{\omega _i} \underbrace{\frac{\partial }{\partial c_{i}} g_{i}(S_{i}, c_{i}, s_{i})}_{\approx 0}. \end{aligned} \end{aligned}$$
(38)

Note that \(\omega _{i} \ge 0\), \(\omega _{i}^{2} \ge 0\), \(g_{i}(S_{i},c_{i},s_{i}) \ge 0\) and \(f_{i}(c_{i}+1,S_{i})\ge 0\). We wish to show that \(f_{i}(c_{i} +1, S_{i})\) is an increasing function, namely that \((\partial f_{i}(c_{i}+1,S_{i}))/(\partial c_{i})\) is positive. This holds true if \(f_{i}(c_{i} +2, S_{i}) \ge f_{i}(c_{i} +1, S_{i})\) is proven. If the previous inequality is true, then we have

$$\begin{aligned} \begin{aligned} \frac{\hat{\lambda }_i{\mathbb {E}}(a_{i}) + \sum _{x = c_{i} + 3}^{S_{i}} G(x)}{S_{i} - c_{i} - 2} \ge \frac{\hat{\lambda }_i {\mathbb {E}}(a_{i}) + \sum _{x = c_{i} + 2}^{S_{i}} G(x)}{S_{i} - c_{i} - 1}. \end{aligned} \end{aligned}$$

Furthermore, the above can be rewritten as

$$\begin{aligned} \begin{aligned} \left( \hat{\lambda }_i {\mathbb {E}}(a_{i}) + \sum _{x = c_{i} + 3}^{S_{i}} G(x)\right) (S_{i} - c_{i} - 1) \ge \left( \hat{\lambda }_i {\mathbb {E}}(a_{i}) + \sum _{x = c_{i} + 2}^{S_{i}} G(x)\right) (S_{i} - c_{i} - 2). \end{aligned}\nonumber \\ \end{aligned}$$
(39)

The previous condition yields

$$\begin{aligned}&S_{i} \hat{\lambda }_{i}{\mathbb {E}}(a_{i}) - c_{i}\hat{\lambda }_{i}{\mathbb {E}}(a_{i}) - \hat{\lambda }_{i}{\mathbb {E}}(a_{i}) \\&\qquad + S_{i}\sum _{x = c_{i} + 3}^{S_{i}} G(x) - c_{i} \sum _{x = c_{i} + 3}^{S_{i}} G(x)- \sum _{x = c_{i} + 3}^{S_{i}} G(x)\\&\quad \ge S_{i}\hat{\lambda }_{i}{\mathbb {E}}(a_{i}) - c_{i}\hat{\lambda }_{i}{\mathbb {E}}(a_{i}) - 2 \hat{\lambda }_{i}{\mathbb {E}}(a_{i}) \\&\qquad + S_{i}\sum _{x = c_{i} + 2}^{S_{i}} G(x)- c_{i}\sum _{x = c_{i} + 2}^{S_{i}} G(x)- 2\sum _{x = c_{i} + 2}^{S_{i}} G(x), \end{aligned}$$

which can be simplified as

$$\begin{aligned} \begin{aligned}&S_{i}G(c_i+2) - c_{i} G(c_i+2) -2G(c_i+2) - \sum _{x = c_{i} + 3}^{S_{i}} G(x) - \hat{\lambda }_{i}{\mathbb {E}}(a_{i}) \\&\quad = S_{i}(c_i+2)h_i - c_{i} (c_i+2)h_i -2(c_i+2)h_i - \sum _{x = c_{i} + 3}^{S_{i}} G(x) - \hat{\lambda }_{i}{\mathbb {E}}(a_{i}) \le 0. \end{aligned} \end{aligned}$$
(40)

Let us rewrite \(\sum _{x = c_{i} + 3}^{S_{i}} G(x) = \frac{1}{2}h_i(S_{i} - c_{i} -2 )(c_{i}+S_{i} + 3)\), then we have

$$\begin{aligned} \hat{\lambda }_{i} {\mathbb {E}}(a_{i}) \ge h_i(S_{i} - c_{i} - 2) \left( \frac{c_{i}}{2} +\frac{1}{2} - \frac{S_{i}}{2} \right) , \end{aligned}$$
(41)

where \(h_i(S_{i} - c_{i} - 2) \ge 0\) and \((\frac{c_{i}}{2} +\frac{1}{2} - \frac{S_{i}}{2}) \le 0\), if \(S_{i} \ge c_{i} +1\) which holds true. Since \({\mathbb {E}}(a_{i}) > 0\) and \(\hat{\lambda }_{i} > 0\) hold true, then (41) is satisfied and we can conclude that \(f_{i}(c_{i} +2, S_{i}) \ge f_{i}(c_{i} +1, S_{i})\) holds.

A direct implication of the above is that \((\partial f_{i}(c_{i}+1,S_{i}))/(\partial c_{i})\) is positive. This means that the first term in (38) for \(\frac{\partial ^{2} g_i(S_{i},c_{i},s_{i})}{\partial c_i^2}\) is greater than or equal to zero, namely:

$$\begin{aligned} \omega _{i} \frac{\partial f_{i}(c_{i}+1,S_{i})}{\partial c_{i}} \ge 0. \end{aligned}$$
(42)

In order to prove convexity, the rest of \(\frac{\partial ^{2} g_i(S_{i},c_{i},s_{i})}{\partial c_i^2}\) must be positive or smaller than (42). Since from (35), we know that \({\partial g_{i}(\cdot )}/{\partial c_{i}}\) is proportional to \(f_{i}(c_{i}+1,S_{i}) - g_{i}(S_{i},c_{i},s_{i})\) which becomes zero in a local minimum. Then in the neighbourhood of a stationary point \(f_{i}(c_{i}+1,S_{i}) - g_{i}(S_{i},c_{i},s_{i})\) is relatively small. This makes the second term in (38) approximately equal to zero and therefore negligible. We can conclude that for any stationary point the second derivative is nonnegative and therefore function \(g_{i}(\cdot )\) is convex on \(c_{i}\). \(\square \)

Lemma 4 proves that \(g_{i}(S_{i}, c_{i}, s_{i})\) is convex on \(c_i\); therefore, it is known that the optimal can-order level is at the same time a local and global minimum.

In the following result, we prove the convexity of the function \({\mathbb {E}}[g_i (S_i, c_i, s_i) | {\hat{\lambda }}_i]\) with respect to the decision variable \(c_i\), for a given parameter \(\eta \) that characterizes the probability \({\bar{P}}({\hat{\lambda }}_{-i} | {\hat{\lambda }}_i) \).

Theorem 2

The expected long-run average cost function is convex with respect to \(c_i\), and the following inequality holds:

$$\begin{aligned} {\mathbb {E}}[g_i(S_i,\nu c_i+(1-\nu )(c_i+x),s_i)|{\hat{\lambda }}_i]\le \nu {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]+(1-\nu ){\mathbb {E}}[g_i(S_i,(c_i+x),s_i)|{\hat{\lambda }}_i],\nonumber \\ \end{aligned}$$
(43)

where \(\nu \in (0,1)\) and \(g_i(\cdot )\) is defined on positive real numbers.

Proof

Similarly as in Lemma 4 we have to prove that for any \(c_i\) in the interval \([{\bar{c}}_i-\epsilon ,{\bar{c}}_i+\epsilon ]\), where \({\bar{c}}_i\) is the local minimum of \({\mathbb {E}}[g_i(\cdot )|{\hat{\lambda }}_i]\), the value of \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) is greater than or equal to \({\mathbb {E}}[g_i(S_i,{\bar{c}}_i,s_i)|{\hat{\lambda }}_i]\). This is satisfied when the second derivative of \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) is greater than or equal to zero.

The first derivative of \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) with respect to \(c_i\) is

$$\begin{aligned} \frac{\partial {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]}{\partial c_i}=&\frac{\partial }{\partial c_i}\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}exp(-(n-1)\eta )\frac{\eta ^{\sum _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j}}{\prod _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}g_i(S_i,c_i,s_i)\\ =&\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}exp(-(n-1)\eta )\frac{\eta ^{\sum _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j}}{\prod _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}\frac{\partial }{\partial c_i}g_i(S_i,c_i,s_i). \end{aligned}$$

Consequently, the second derivative is given by

$$\begin{aligned} \frac{\partial ^2 {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]}{\partial c_i^2}=\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}exp(-(n-1)\eta )\frac{\eta ^{\sum _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j}}{\prod _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}\frac{\partial ^2}{\partial c_i^2}g_i(S_i,c_i,s_i). \end{aligned}$$
(44)

Based on Lemma 4, we know that \(\frac{\partial ^2}{\partial c_i^2}g_i(S_i,c_i,s_i)\ge 0\), and then each term in (44) is greater than or equal to zero. We can conclude that for any stationary point and for any parameter \(\eta \), the second derivative is nonnegative and the function \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) is convex with respect to \(c_i\). \(\square \)

The set \({\mathscr {P}}\) defined by (21) has to be determined in such a way that the expected long-run average cost function is concave with respect to the parameter \(\eta \). In the following lemma, we prove that such a set \({\mathscr {P}}\) exists.

Lemma 5

Let \(\eta ^*\in {\mathbb {R}}_+\) be an optimal value that maximizes (22). There exists \(\varepsilon \), \(0<\varepsilon <\eta ^*\), such that in the interval \([\eta ^*-\varepsilon ,\eta ^*+\varepsilon ]\), the expected long-run average cost is concave with respect to \(\eta \).

Proof

To prove the concavity of the expected long-run average cost with respect to \(\eta \), we use the second derivative criteria. The first derivative is computed as follows:

$$\begin{aligned} \begin{aligned}&\frac{\partial {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]}{\partial \eta }\\&\quad =\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}g_i(S_i,c_i,s_i)exp(-(n-1)\eta )\eta ^{\sum _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j-1}\left[ \frac{\sum _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j-(n-1)\eta }{\prod _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}\right] . \end{aligned} \end{aligned}$$

By computing the second derivative and evaluating it in \(\eta =\eta ^*\), we obtain:

$$\begin{aligned} \begin{aligned}&\frac{\partial ^2 {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i](\eta ^*)}{\partial \eta ^2}\\&\quad =\frac{1}{\eta ^*}\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}\frac{g_i(S_i,c_i,s_i)exp(-(n-1)\eta ^*){\eta ^*}^{\sum _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j-1}}{\prod _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}\left[ \left( \sum _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j-(n-1)\eta ^*\right) ^2\right. \\&\qquad -\left. \left( \sum _{{\hat{\lambda }}_j\in {\hat{\lambda }}_{-i}} {\hat{\lambda }}_j-(n-1)\eta ^*\right) -\eta ^*(n-1)\right] . \end{aligned} \end{aligned}$$

Observe that as \(\eta ^*\) is the optimal value that maximizes the expected long-run average cost, the first two terms in the previous equation are equal to zero. Then, we have

$$\begin{aligned} \begin{aligned}&\frac{\partial ^2 {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i](\eta ^*)}{\partial \eta ^2}=-(n-1)\sum _{{\hat{\lambda }}_{-i}\in L_{-i}}\frac{g_i(S_i,c_i,s_i)exp(-(n-1)\eta ^*){\eta ^*}^{\sum _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j-1}}{\prod _{{\hat{\lambda }}_j \in {\hat{\lambda }}_{-i}}{\hat{\lambda }}_j!}. \end{aligned} \end{aligned}$$

From the previous equation, we have that \(\frac{\partial ^2 {\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i](\eta ^*)}{\partial \eta ^2}<0\). Hence \(\eta ^*\) is a local maximum. We can conclude that there exists an interval \([\eta ^*-\varepsilon ,\eta ^*+\varepsilon ]\) such that the expected long-run average cost is concave. \(\square \)

Remark 1

Note that as \(c_i\in [S_i,s_i]\) and \(\eta \in [\eta ^*-\varepsilon ,\eta ^*+\varepsilon ]\), both variables are contained in convex and compact sets. In addition, based on Theorem 2 and Lemma 5 we know that the expected long-run average cost is convex with respect to \(c_i\) and concave with respect to \(\eta \), on their respective intervals. Therefore, according to John von Neumann’s minimax theorem:

$$\begin{aligned} \min _{c_i\in {\mathbb {N}}}\max _{{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\in {\mathscr {P}}}{\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]=\max _{{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)\in {\mathscr {P}}}\min _{c_i\in {\mathbb {N}}}{\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]. \end{aligned}$$
(45)

In the following result, we prove that finding an optimal can-order level \(c_i\) that minimizes (17) corresponds to finding a strategy at a Bayesian Nash equilibrium.

Theorem 3

Let for all players \(i\in N\) the upper threshold \(S_i\) and the lower threshold \(s_i\) be given. In addition for a given parameter \(\eta \) that maximizes the conditional expected long-run average cost, let the optimal can-order threshold \(c_i^*\) be obtained by:

$$\begin{aligned} c_i^*:= arg\min _{c_i\in {\mathbb {N}}} \sum _{{\hat{\lambda }}_{-i}\in L_{-i}}{\bar{P}}({\hat{\lambda }}_{-i}|{\hat{\lambda }}_i)g_i(S_i,c_i,s_i). \end{aligned}$$
(46)

Then, the n-tuple \((c_1^*,c_2^*,\ldots ,c_n^*)\) is at a Bayesian Nash equilibrium of the Bayesian game \(\bar{{\mathscr {G}}}\).

Proof

Based on Theorem 2, we know that the expected long-run average cost is convex on the can-order level \(c_i\). Moreover, the optimal value \(c^*_i\) is a global minimum. Therefore, the can-order level \(c^*_i\) that minimizes the long-run average cost is unique.

By definition of best-response set in a Bayesian game \({\mathscr {B}}_i(u^*_{-i})\), if we find the can-order level that minimizes \({\mathbb {E}}[g_i(S_i,c_i,s_i)|{\hat{\lambda }}_i]\) for a player \(i\in N\), given the optimal strategy of the other players, then the optimal strategy of player i, \(u^*_i\) is in the set \({\mathscr {B}}(u^*_{-i})\) defined by (18). Hence, the strategy is at a Bayesian Nash equilibrium.\(\square \)

5 Numerical Analysis

Let us consider a time interval of 39 months, where each month consists of 4 weeks. The simulated results are obtained by optimizing the can-order threshold for the minimum expected long-run average cost at every time interval. Consider a system of \(N = 4\) items which are ordered at a single distribution centre. At every time period, we forecast the demand and setup costs of a single item i to obtain the optimal can-order threshold of the (Scs) policy.

The parameters are selected as follows. The demand is measured over 39 consecutive months. Each month consists of four weeks. The total number of weeks included in the simulations is 156. The lead time is 1 week. The holding costs \(h_i\) are €2 per month. We consider 4 items involved in the coordinated replenishment. The major setup cost \({\hat{A}}(1)\) is initially forecast to be €80. The minor setup cost \({\hat{a}}_i(1)\) is initially forecast to be €15. The safety factor k is 3. The forecast coefficients \(\alpha , \gamma \) are both 0.2. The initial demand is forecast considering observed data for the first 12 months applying a Poisson regression model and every month we add four more new observations (one for each week) to forecast the demand for the next month. Finally, it is assumed that all items in the system face the same demand uncertainty for the sake of simplicity and without the loss of generality. Note that \(S_{i}(t)\) and \(s_{i}(t)\) can therefore be represented by S(t) and s(t), respectively, as the levels are assumed to be the same for all items. The demand is assumed to follow a Poisson distribution with mean equal to \(\lambda _{i}\) and the setup cost are assumed to follow a Gaussian distribution with mean equal to \(a_{i}\) and A, see Table 1.

Table 1 Setup cost and demand parameters

Since it is a forecasting model of the can-order policy, the demands and setup costs for all weeks are forecast. Figure 5 displays the forecast \({\hat{D}}_i(t)\) and actual demand \(D_i(t)\) over a 39-month period. Similar to the demand, the setup costs influence the control variables of the can-order policy and must therefore be redetermined at every time period.

Fig. 5
figure 5

Forecast demand \({\hat{D}}_i(t)\) (dots) and actual demand \(D_i\) (solid) over a 39-month period

In Algorithm 1, we present the pseudo-code used to obtain the optimal can-order level at each month. Each month, the events are forecast again and the control variables of the can-order policy are redetermined. The optimal can-order level is determined by increasing the value of \(c_{i}\) by one and calculating the corresponding expected long-run average cost for each probability distribution in the set \({\mathscr {P}}\) as in equation (21). If the expected long-run average cost calculated at \((c_i+1)\) is higher than it was at the previous can-order level, the iterations stop and the value of the previous can-order level is the optimal value of \(c_{i}\). As it is stated in Theorem 2, the expected long-run average cost is convex with respect to \(c_i\). The optimal can-order level is at the same time a local and a global minimum. As a result of the behaviour of the long-run average cost and the iterative process of the algorithm, it is possible to say that it converges to the optimal can-order level \(c_i^*\) that minimizes the expected cost in the long run.

figure a

It is important to mention that for the Poisson regression model, we use the built-in MATLAB function glmfit. This function uses a method called weighted least squares to estimate the regression parameters \({\hat{\beta }}_1\) and \({\hat{\beta }}_2\). Weighted least squares is a generalization of generalized least squares. It takes into account the weights that determine how much each value influences the final parameter estimates. For the convergence of the weighted least squares MATLAB uses a Newton–Raphson iterative method capturing the first-order optimality conditions at the solution, resulting in strong local convergence rates.

The simulations of the expected long-run average cost are depicted in Fig. 6. The graph corroborates the results in Theorem 2 and Lemma 5, as it shows that the expected long-run average cost is a convex function on \(c_{i}\) and concave on \(\eta \). For all consecutive time periods, the optimal can-order levels are listed in Table 2.

Fig. 6
figure 6

Expected long-run average cost vs. can-order level for all probability distributions in the set \({\mathscr {P}}\)

Table 2 Monthly optimal value of \(c_{i}\)

The optimal can-order levels which are determined in the previous step of the algorithm are used to simulate the inventory reordering policy and the related inventory evolution over time. Figure 7 shows that reordering also takes place at the can-order level. In addition, to analyse the stability of the algorithm, we run 100 simulations to compute the \(95\%\) confidence interval of the can-order level. The evolution over time of the can-order level and its \(95\%\) confidence interval is depicted in Fig. 8. In this plot, we can observe that the optimal values for the can-order level do not have a great variability as its confidence interval is small.

Fig. 7
figure 7

Inventory evolution for the optimal (Scs) policy

Fig. 8
figure 8

Time evolution of \(c_i^*\) (solid line) and its \(95\%\) confidence interval (dotted line)

6 Conclusions

In this research, we have developed a periodic review system with multiple items, with uncertain demand and uncertain setup costs. The problem consists in learning the uncertain demand and setup costs, while determining the values for the threshold of the can-order policy by minimizing the expected long-run average cost.

Motivations arise from two-layer time-dependent logistic systems. The first layer consists of a learning mechanism to obtain the uncertain input parameters. The setup cost and demand are uncertain, and their forecasts are run using exponential smoothing and Poisson regression, respectively. The second layer consists of a time dependent (Scs) policy where the must-order, can-order and order-up-to levels are determined every time new input data are obtained. The proposed methodology is original in that the long-run average cost is optimized under the assumption that the parameters are time-dependent. We translate the initial coordinated replenishment problem into a Bayesian game, where the probabilities involved in this computation are unknown. In order to deal with the uncertainty in the probability distributions, we consider the worst-case scenario by making use of distributionally robust optimization.

The main result is the proof of convexity of the cost function with respect to the can-order level, according to which the can-order value can be optimized by solving a convex problem. In addition, we show that there exists a set of probability distributions where the cost function is concave with respect to the parameter of the corresponding distribution. Finally we show that finding an optimal can-order level that minimizes the expected cost function in the Bayesian game corresponds to finding a strategy at a Bayesian Nash equilibrium. The developed model is validated on a numerical example.

An interesting approach for future works is to analyse the model from a different game theory perspective, by focusing on minimizing the long-run average cost \(g_i\) for all the items at the same time applying, for example, the theory of strategic complements.