Optimal (R, Q) policy and pricing for two-echelon supply chain with lead time and retailer’s service-level incomplete information

Many studies focus on inventory systems to analyze different real-world situations. This paper considers a two-echelon supply chain that includes one warehouse and one retailer with stochastic demand and an up-to-level policy. The retailer’s lead time includes the transportation time from the warehouse to the retailer, which is unknown to the retailer. The warehouse, in turn, is unaware of the retailer’s service level. The relationship between the retailer and the warehouse is modeled as a Stackelberg game with incomplete information. Their relationship is also analyzed for the case in which the warehouse and the retailer reveal their private information under incentive strategies. The optimal inventory and pricing policies are obtained using an algorithm based on bi-level programming. Numerical examples, including a sensitivity analysis of some key parameters, compare the results between the Stackelberg models. The results show that information sharing is more beneficial to the warehouse than to the retailer.


Introduction and literature review
A two-echelon supply chain consists of one or more manufacturers or warehouses that wholesale their products to a lower echelon (retailers), which retails them to end customers (Lau and Lau 2005). To maximize its profit, each echelon should make decisions about pricing and an appropriate replenishment policy, including order quantity (Q) and reorder point (R) (Heydari 2014). The related literature on optimal pricing and (R, Q) policies in the supply chain regarding lead time and service level can be categorized into two groups: the up-to-level (R, Q) policy and game-theoretic pricing and ordering decisions. These models are briefly summarized and compared with the proposed model.
The first group focuses on (R, Q) inventory systems, in which an order for Q is placed when the inventory level falls to R (Li et al. 2011). In optimizing the cost and profit of a supply chain, it is common to assume that retailers, warehouses, and even plants follow the (R, Q) policy for replenishment. The interruption demand of the lower level in a supply chain does not violate the assumption of the (R, Q) replenishment policy (Forsberg 1997; Ganeshan 1999). Stochastic demand is the common assumption in this group (Tan and Weng 2013; Alkhedher and Darwish 2013). For instance, Taleizadeh et al. (2011) proposed a multi-buyer multi-vendor supply chain and determined the ordering policy, including the reorder point, the number of shipments, and the safety stock, to minimize the total cost. Strijbosch and Moors (2006) considered a single-stage supply chain with the (R, Q) policy. By modeling demand with a truncated normal distribution, which admits only non-negative values, they derived the ordering policy. For normally distributed lead time demand, Chung et al. (2009) presented an accurate algorithm to determine the order quantity and the reorder point. Thangam and Uthayakumar (2008) derived optimal reorder points in a two-echelon supply chain in which the number of backorders allowed during the lead time is limited. Isotupa and Samanta (2013) developed a cost function for an (R, Q) lost-sales inventory system with stochastic lead time and two types of customers with different priorities. Al-Rifai and Rossetti (2007) proposed a two-echelon inventory system with non-reparable spare parts and utilized a heuristic algorithm to derive ordering decisions. Although these models considered different real-world situations in (R, Q) inventory systems, the interaction between the echelons of the supply chain, such as coordination, cooperation, or competition, has been ignored. Moreover, some researchers have applied metaheuristic methods for optimizing the supply chain.
Taleizadeh et al. (2010a) applied a genetic algorithm to optimize multiproduct multiconstraint inventory control systems with stochastic replenishment intervals and discount. Taleizadeh et al. (2010b) used a particle swarm optimization approach for the constrained joint single buyer-single vendor inventory problem with changeable lead time and (R, Q) policy in the supply chain.
The second group of literature is dedicated to pricing and ordering decisions. These decisions in the supply chain are also practical and efficient, because the game theory approach places emphasis on the dynamic behavior of the members. Most papers present the noncooperative game in the supply chain as a Stackelberg game. Taleizadeh and Noori-daryan (2016) optimized the total cost of the supply chain network by coordinating the decision-making policy using the Stackelberg-Nash equilibrium. The decision variables of their model were the supplier's price, the producer's price, and the number of shipments received by the supplier and producer. Taleizadeh et al. (2015) used a Stackelberg game for optimizing prices and ordering decisions in a supply chain with imperfect quality items and inspection under the buyback of defective items. Heydari (2014) considered the coordination between a supplier and a retailer in the supply chain with respect to a fixed amount of demand and the variation of lead time under a complete information structure. Nevertheless, the optimal ordering and pricing policies are obtained in a seller-buyer supply chain with partial lost sales and stochastic demand as a seller-Stackelberg game (Cai et al. 2011). Ye and Xu (2010) presented a vendor-buyer model under buyer-Stackelberg and cooperative games.  determined price and ordering decisions for two competing supply chains using a Stackelberg game approach. Heuristic algorithms have been used to obtain the optimal order quantity, the length of the lead time, the safety factor, and the number of products delivered to the buyer to minimize their costs. Moreover, many researchers have studied decisions in the supply chain under incomplete information. A supply chain model is proposed to obtain the optimal order quantity and selling price, while the seller's setup and purchase costs and the demand are unknown (Esmaeili and Zeephongsekul 2010).
Uncertain market demand is also considered in a duopoly market, where two separate firms offer complementary goods under the Stackelberg model (Mukhopadhyay et al. 2011). In addition, uncertain demand is considered to obtain the optimal pricing policy in a supply chain that includes a multichannel manufacturer and a retailer (Yan and Pei 2011). The model is analyzed with and without information sharing. Later, the optimal pricing and collection policies for a manufacturer-retailer pair are obtained in a closed-loop supply chain with complete and incomplete information about the customer's demand (Wei et al. 2015). Hu et al. (2014) obtained the optimal ordering and production policy in a two-echelon supply chain to maximize expected profit. They considered stochastic demand and production yield, while these factors, as well as price, are assumed to be common knowledge.
The features of this paper are classified in terms of decision policies, the main assumptions, and solution methods in comparison with those of the literature (Table 1). To the best of our knowledge, ordering size, safety factor, and pricing policies in a supply chain have not been studied in the literature under incomplete information about the lead time and the retailer's service level. This paper considers a two-echelon supply chain that includes one warehouse and one retailer under incomplete information. Both the warehouse and the retailer apply the (R, Q) replenishment policy with a continuous-review backorder inventory model and truncated normal demand. The warehouse's order quantity is an integer multiple of the retailer's order quantity. The warehouse uses two transportation modes via a logistic service provider (LSP), the traditional and the emergency ones, to deliver batches to the retailer. Whenever the warehouse is out of stock, orders are delivered with delay. The warehouse uses emergency transportation, such as air transportation, to keep a high service level. Practical examples can be observed in the delivery of large-sized transactions in the supply chains of industries such as car, tire, and computer companies. For example, in a Customs Administration, or when there is a distance between the upstream (vendors) and the downstream (sellers), there are silos and warehouses at the Customs Administration. Such silos and warehouses are required for exchanging, sending, and receiving cargo, shipments, and products with, to, and from their retailers or buyers. Given the strategic location of the warehouse or silos, a warehouse can be regarded as a monopolist by different retailers or sellers. Therefore, the relationship between the retailer and the warehouse is investigated based on the warehouse-Stackelberg game with incomplete information.
Moreover, the warehouse offers a fraction of its profit to the retailer, while the retailer offers larger order quantities, so that both reveal their private information. The optimal inventory and pricing policies are obtained using an algorithm based on bi-level programming (BLPP). Numerical examples, including a sensitivity analysis on the probability distribution of the transportation mode (proposed by the retailer) and the retailer's service-level mean and variance (estimated by the warehouse), compare the results between the Stackelberg models. The results show that information sharing is more beneficial to the warehouse than to the retailer.
The rest of this paper is organized as follows. In Sect. 2, the notation and assumptions of the proposed model under complete information are presented. In Sect. 3, a model regarding incomplete information is provided. The warehouse-Stackelberg game with incomplete information and the incentive strategies to reveal the information are presented in Sect. 4. Computational results, including the numerical example and sensitivity analyses of the effect of the parameters on the model, are presented in Sect. 5. Finally, the conclusion and suggestions for future work are presented in Sect. 6.

Notation and problem formulation
This section introduces the notation, decision variables, input parameters, assumptions, and details of the models.

Assumptions
The following assumptions are considered for the warehouse's and the retailer's models (Fig. 1):
1. The planning horizon is infinite.
2. The warehouse and the retailer follow the (R, Q) replenishment policy. An order for Q is placed when the inventory level falls to R. In optimizing the cost and profit of a supply chain, it is common to assume that retailers, warehouses, and even plants follow the (R, Q) policy for replenishment (Forsberg 1997; Ganeshan 1999).
3. The warehouse's lead time $L_0$ is constant.
4. The unsatisfied demand is backordered.
5. The warehouse's lot size is $Q_0 = N Q_r$, an integer multiple $N$ of the retailer's lot size.
6. The retailer's demand from the warehouse, $D_0$, is a function of the final customers' potential demand, $D_r$ [Eq. (1)].
7. The retailer and the warehouse are unaware of $L_r$ and $\theta_r$, respectively.
8. The retailer's holding cost is proportional to the unit price: $h_r = h_c + i p_r$.
9. Owing to the strategic location of the warehouse, it interacts with different retailers or sellers; therefore, the warehouse has greater power than the retailer or the seller.
10. Since negative demands are meaningless, the truncated normal distribution is considered for the customers' and the retailer's demand.
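The replenishment rule in assumptions 2, 4, and 10 can be illustrated with a small simulation. The following is a minimal sketch, not the paper's model: it approximates continuous review by discrete periods, and the function names and all numerical values (`R=120`, `Q=200`, the demand parameters, the lead time) are hypothetical illustration values.

```python
import random


def draw_truncated_normal(rng, mean, sd):
    """Draw a normal variate conditioned on being non-negative (rejection sampling)."""
    while True:
        x = rng.gauss(mean, sd)
        if x >= 0.0:
            return x


def simulate_rq(R, Q, lead_time, mean_demand, sd_demand, periods, seed=0):
    """Simulate an (R, Q) policy with backorders over discrete review periods.

    All arguments are hypothetical illustration values, not the paper's data.
    """
    rng = random.Random(seed)
    net_inventory = R + Q      # on-hand stock minus backorders
    pipeline = []              # outstanding orders as (arrival_period, quantity)
    orders_placed = 0
    total_demand = 0.0
    position = R + Q           # inventory position = net inventory + on order
    min_position = float("inf")

    for t in range(periods):
        # Receive every outstanding order that is due by period t.
        net_inventory += sum(q for due, q in pipeline if due <= t)
        pipeline = [(due, q) for due, q in pipeline if due > t]

        # Unsatisfied demand is backordered, so net inventory may go negative.
        demand = draw_truncated_normal(rng, mean_demand, sd_demand)
        total_demand += demand
        net_inventory -= demand

        # Reorder in batches of Q until the inventory position exceeds R.
        position = net_inventory + sum(q for _, q in pipeline)
        while position <= R:
            pipeline.append((t + lead_time, Q))
            position += Q
            orders_placed += 1
        min_position = min(min_position, position)

    return {
        "orders_placed": orders_placed,
        "total_demand": total_demand,
        "final_position": position,
        "min_position": min_position,
    }


result = simulate_rq(R=120, Q=200, lead_time=2, mean_demand=50, sd_demand=15,
                     periods=365, seed=1)
```

Tracking the inventory position (net inventory plus outstanding orders) rather than the on-hand stock is what keeps the rule from ordering repeatedly while a batch is still in transit.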

Model description
In this section, the warehouse and the retailer's models are introduced regarding complete information.

The warehouse model
The warehouse's profit function is described as follows:
$\pi_0(N, k_0) = \text{Total revenue} - \text{Total purchasing cost} - \text{Total ordering cost} - \text{Total holding cost} - \text{Total stock-out cost}$,
which is expressed as Eq. (2). Equation (3) describes the warehouse's safety stock, and $S_0$ in Eq. (4) represents the warehouse's stock out.
Equation (5) indicates the expected selling price, including the price of goods and the expected transportation cost. The first term of Eq. (5) is the traditional transportation cost (Fig. 2), and the second term is the emergency transportation cost. Equation (6) determines a lower bound for the warehouse's service level. Equation (7) is written with respect to assumption (5).
According to assumption (10), the retailer's total demand from the warehouse has a truncated normal distribution (modified normal distribution). In the corresponding equations, $v = \sigma_0/\mu_0$, and $\varphi(\cdot)$ and $\Phi(\cdot)$ are the probability density and cumulative distribution functions of the standard normal distribution for the safety factor ($k_0$), respectively. Moreover, the retailer's service level proposed by the warehouse is considered to have a probability distribution over its value, $\theta_r(\mu_{\theta_r}, \sigma^2_{\theta_r})$, regarding assumption (8).
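As an illustration of how truncation at zero shifts the demand parameters, the sketch below computes the mean and variance of a normal distribution truncated to non-negative values, using the standard textbook formulas. It is not a reconstruction of the paper's own parameter equations, and the function names are hypothetical.

```python
import math


def std_normal_pdf(z):
    """Density of the standard normal distribution."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_upper_tail(z):
    """P(Z > z) for a standard normal Z, computed with erfc for accuracy."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def truncated_normal_moments(mu, sigma):
    """Mean and variance of N(mu, sigma^2) conditioned on X >= 0.

    With a = -mu/sigma (equivalently -1/v for v = sigma/mu) and
    lam = pdf(a) / P(Z > a):
        mean = mu + sigma * lam
        var  = sigma^2 * (1 + a * lam - lam^2)
    """
    a = -mu / sigma
    lam = std_normal_pdf(a) / std_normal_upper_tail(a)
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + a * lam - lam ** 2)
    return mean, var


# Hypothetical inputs: a high coefficient of variation makes truncation matter.
mean_hi_cv, var_hi_cv = truncated_normal_moments(mu=10.0, sigma=10.0)
# With a low coefficient of variation the correction is negligible.
mean_lo_cv, var_lo_cv = truncated_normal_moments(mu=100.0, sigma=10.0)
```

The correction depends only on the coefficient of variation $v$, which is why the truncated-normal treatment matters mainly when demand variability is large relative to its mean.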

The retailer's model
To satisfy the final customers' demand, the retailer places orders with the warehouse. Due to the competitive conditions, the retailer has no bargaining power and is charged $p_0$ per item. The retailer's profit function is described as follows:
$\pi_r(Q_r, k_r) = \text{Total revenue} - \text{Total purchasing cost} - \text{Total ordering cost} - \text{Total holding cost} - \text{Total stock-out cost}$,
which is expressed as Eq. (15). In Eq. (15), $p_r$ is the retailer's selling price, where $p_r = a p_0$ ($a \ge 1$) and $h_r = h_c + i p_0$. Equation (16) describes the retailer's safety stock. In Eq. (17), $S_r$ indicates the retailer's stock out. Equation (18) determines a lower bound for the retailer's service level.
According to assumption (10), the customers' demand from the retailer has a truncated normal distribution per unit time. In the corresponding equations, $v = \sigma_r/\mu_r$, and $G(-1/v)$ and $H(-1/v)$ are calculated based on Eqs. (13) and (14).
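A service-level lower bound of the kind described for Eqs. (6) and (18) translates into a minimum safety factor. The sketch below assumes, purely for illustration, that the service level is the probability of no stock-out during a replenishment cycle, so that the bound reads $\Phi(k) \ge \theta$; the paper's exact definition may differ, and the function name is hypothetical.

```python
from statistics import NormalDist


def min_safety_factor(service_level):
    """Smallest safety factor k with Phi(k) >= service_level.

    Assumes a cycle service level (probability of no stock-out per cycle),
    which is an illustrative assumption rather than the paper's definition.
    """
    if not 0.0 < service_level < 1.0:
        raise ValueError("service_level must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(service_level)


# Hypothetical target service levels.
k_95 = min_safety_factor(0.95)
k_99 = min_safety_factor(0.99)
```

Because the inverse normal cdf grows steeply near 1, each additional point of service level requires a disproportionately larger safety factor.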

The models regarding incomplete information
Owing to the uncertainty of transportation time and demand in the real world, the transportation time from the warehouse to the retailer and the retailer's service level are considered unknown to the retailer and the warehouse, respectively. Therefore, the warehouse's and the retailer's models are presented under incomplete information.

The warehouse model
The warehouse is unaware of the retailer's service level; therefore, the uncertainty of the service level ($\theta_r$) is expressed by a probability density function (pdf), $g(\theta_r)$. The expected profit function of the warehouse, $E(\pi_0(Q_0, k_0))$, is maximized with respect to the order quantity, $Q_0$, and the safety factor, $k_0$.

The retailer's model
The transportation mode, denoted by n, has two distinct types: traditional and emergency. Since the retailer is unaware of the transportation time (lead time), the uncertainty of $L_r$ is expressed by a probability distribution over its value, $L_r(a, 1 - a)$. Therefore, the expected annual profit function of the retailer, $E(\pi_r(Q_r, k_r))$, is maximized with respect to the order quantity, $Q_r$, and the safety factor, $k_r$, given that $A_r = [\varphi(k_r) - k_r(1 - \Phi(k_r))]\,\sigma_r^{*} d_r$.
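The bracketed factor in $A_r$, $\varphi(k) - k(1 - \Phi(k))$, is the standard normal loss function, and the two-point lead-time distribution turns any lead-time-dependent quantity into a probability-weighted sum. The following sketch combines the two for the expected shortage per cycle at a fixed reorder point. It assumes untruncated normal lead-time demand for simplicity, and the reorder point, demand parameters, lead times, and probabilities are hypothetical.

```python
import math


def std_normal_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def std_normal_upper_tail(z):
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def unit_normal_loss(k):
    """Standard normal loss function: pdf(k) - k * (1 - cdf(k))."""
    return std_normal_pdf(k) - k * std_normal_upper_tail(k)


def expected_shortage(reorder_point, mean_demand, sd_demand, lead_times, probabilities):
    """Expected units short per cycle, averaged over the possible lead times.

    lead_times and probabilities describe a discrete lead-time distribution,
    e.g. (traditional, emergency) with probabilities (a, 1 - a).
    """
    if abs(sum(probabilities) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    total = 0.0
    for lead_time, prob in zip(lead_times, probabilities):
        sd_lead = sd_demand * math.sqrt(lead_time)
        # Safety factor implied by the fixed reorder point under this lead time.
        k = (reorder_point - mean_demand * lead_time) / sd_lead
        total += prob * sd_lead * unit_normal_loss(k)
    return total


# Hypothetical case: traditional mode takes 6 periods, emergency mode takes 2.
shortage_mixed = expected_shortage(300.0, 50.0, 15.0, [6.0, 2.0], [0.7, 0.3])
```

Fixing the reorder point and letting the implied safety factor vary by mode reflects that the retailer must commit to its policy before knowing which transportation mode will be used.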

Warehouse-Stackelberg game
In Sects. 2 and 3, the warehouse's and the retailer's models under complete and incomplete information were considered separately. The warehouse-Stackelberg model with incomplete information and under information sharing with an incentive strategy is presented as follows.

The warehouse-Stackelberg game with incomplete information
According to assumption (9), given the warehouse's selling price, $p_0$, the retailer (follower) obtains its ordering policy, $Q_r$ and $R_r$, to maximize its profit. Then, the warehouse as the leader determines the optimal $p_0$ and $Q_0$ by maximizing its profit. The solutions to the retailer's problem (follower) are denoted by $(Q_r^{*}, R_r^{*})$. Given that $\ldots - k_0[1 - \Phi(k_0)]$ and $A_r = [\varphi(k_r) - k_r(1 - \Phi(k_r))]\,\sigma_r^{*} d_r$, the following model is presented for the warehouse (leader):
$\max\ E(\pi_0(N, k_0, Q_r^{*}, k_r^{*}))$
s.t. $\max\ E(\pi_r(Q_r, k_r))$
s.t. $\ldots$
Due to the complexity of the model, the optimal solution cannot be obtained in closed form. Therefore, the warehouse-Stackelberg game is formulated as a bi-level programming problem (BLPP). The retailer's and the warehouse's models are the inner and outer levels, respectively. The optimal inventory and pricing policies are obtained using the BLPP-based algorithm of Gümüş and Floudas (2001), which guarantees the global optimality of the solution. Their algorithm is as follows:
Step 1 Set the lower and upper bounds of the outer level objective function (leader), $LB = -\infty$ and $UB = \infty$.
Step 2 Use KKT optimality conditions instead of the inner level optimization problem (follower), and consider them as the constraints of the outer level problem.
Step 3 Use the current problem from the second step to calculate the lower bound of the outer level objective function.
Step 4 Substitute the nonlinear equality constraints of the outer problem with two inequality constraints (greater than or smaller than).
Step 5 Determine the lower and upper bounds of the variables.
Step 6 To calculate the upper bound of the outer level objective function, first convert the inner level problem to a convex form and then use the KKT optimality conditions instead of the inner level problem.
Step 7 Substitute the complementarity conditions with their equivalent linear constraints to calculate the upper bound of the outer level objective function.
Step 8 If the upper and the lower bounds of the outer level objective function converge, then stop; otherwise, divide the interval of one of the variables into two or more sub-intervals, and then go back to Step 2.
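The idea behind Step 2, replacing the follower's optimization by its optimality conditions so that the leader solves a single-level problem, can be seen on a toy Stackelberg game. The sketch below is not the paper's model and not the Gümüş and Floudas branch-and-bound algorithm: it assumes a hypothetical follower with linear inverse demand `a - b*q`, for which the stationarity condition gives the best response in closed form, and it searches the leader's price on a simple grid.

```python
def follower_best_response(p0, a=100.0, b=1.0):
    """Order quantity maximizing the follower's profit (a - b*q)*q - p0*q.

    Stationarity gives a - 2*b*q - p0 = 0; the bound q >= 0 is enforced by max().
    """
    return max(0.0, (a - p0) / (2.0 * b))


def leader_profit(p0, c=20.0, a=100.0, b=1.0):
    """Leader's profit when the follower reacts optimally to the price p0."""
    return (p0 - c) * follower_best_response(p0, a, b)


def solve_leader(c=20.0, a=100.0, b=1.0, steps=10001):
    """Grid search over the leader's price on [c, a] (toy stand-in for Steps 3-8)."""
    best_price, best_value = None, float("-inf")
    for i in range(steps):
        p0 = c + (a - c) * i / (steps - 1)
        value = leader_profit(p0, c, a, b)
        if value > best_value:
            best_price, best_value = p0, value
    return best_price, best_value


# Hypothetical parameters: unit cost c, demand intercept a, demand slope b.
best_price, best_value = solve_leader()
best_quantity = follower_best_response(best_price)
```

For this toy problem the leader's optimum is known analytically, $p_0 = (a + c)/2$, which gives a check on the grid search. In the paper's setting no such closed form exists, which is why bounding and interval subdivision are needed.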

Revealing information regarding incentive strategies
In this case, both the warehouse and the retailer reveal their private information according to an incentive strategy. The warehouse shares its transportation information, while the retailer promises to buy more than it would under incomplete information. In turn, the retailer shares its accurate service level and demand, while the warehouse offers the retailer a fraction of its profit, $d_0$ percent. However, the objective function values of both the warehouse and the retailer should be at least as large as their optimal values under incomplete information.
Given that the warehouse's selling price is $d_0 p_0$, the retailer (follower) obtains an optimal ordering policy.
Given that $A_r = [\varphi(k_r) - k_r(1 - \Phi(k_r))]\,\sigma_r^{*} d_r$, the retailer's model is presented as the follower's problem, and the warehouse's model as the leader's problem. Due to the complexity of the model, the optimal solution cannot be obtained in closed form. The optimal inventory and pricing policies are obtained using the BLPP-based algorithm of Gümüş and Floudas (2001), which guarantees the global optimality of the solution.

Computational results
In this section, a numerical example is presented to illustrate the proposed models, and a sensitivity analysis is performed.

Numerical example
Suppose a supply chain consists of one central warehouse and a retailer. The retailer pays $p_0$ per item and sells it to the end customer at $(1 + a)p_0$, $0 < a \le 1$. All parameters are known to both the warehouse and the retailer except $L_r$ and $\theta_r$. The parameters are presented in Table 2. The optimal solution of the warehouse-Stackelberg problem with incomplete information, obtained using the algorithm in Sect. 4, is shown in Table 3. Moreover, the Stackelberg game with information sharing is considered alongside the incomplete-information game to compare the results. In this numerical problem, information sharing enables both the warehouse and the retailer to gain more than under the incomplete-information structure.
Note that the warehouse's reorder point can be obtained from Eq. (45); the retailer's reorder point can be calculated similarly.
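For orientation, a reorder point is commonly computed as the expected lead-time demand plus a safety stock proportional to the safety factor. The sketch below uses this textbook form; it is an assumption that Eq. (45) has the same structure, and the function name and numerical inputs are hypothetical.

```python
import math


def reorder_point(mean_demand, sd_demand, lead_time, safety_factor):
    """Textbook reorder point: expected lead-time demand plus safety stock.

    R = mean_demand * L + k * sd_demand * sqrt(L), with demand per unit time
    assumed independent across time units.
    """
    expected_lead_time_demand = mean_demand * lead_time
    safety_stock = safety_factor * sd_demand * math.sqrt(lead_time)
    return expected_lead_time_demand + safety_stock


# Hypothetical inputs: mean 100 and standard deviation 20 per period,
# a lead time of 4 periods, and a safety factor of 1.5.
example_reorder_point = reorder_point(100.0, 20.0, 4.0, 1.5)
```

The same function applies to either echelon once its own demand parameters, lead time, and optimal safety factor are supplied.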

Sensitivity analysis
Sensitivity analysis is performed on the main parameters, including the probability of the common transportation mode proposed by the retailer ($a$) and the retailer's service-level mean ($\mu_{\theta_r}$) and variance ($\sigma_{\theta_r}$) estimated by the warehouse. As shown in Table 4 and Fig. 3, the warehouse's decisions, unlike the retailer's, are affected by changes in the estimate of the retailer's service level ($\mu_{\theta_r}$). When $\mu_{\theta_r}$ decreases, the warehouse's payoff decreases as well. If the warehouse underestimates the retailer's service level, it places a smaller order quantity ($Q_0$) to satisfy the retailer's demand. Therefore, the warehouse keeps less inventory and faces more stock-outs. As shown in Table 5 and Fig. 4, the warehouse's payoff ($\pi_0$) decreases as $\sigma_{\theta_r}$ increases. However, the change in the retailer's payoff is almost negligible. Therefore, information sharing is more beneficial to the warehouse than to the retailer. Thus, the warehouse should provide motivating options, such as a price discount or service-level improvement, to develop its relationship with the retailer (Table 5).
By increasing the probability of common transportation type (a), the lead time is overestimated by the retailer. Therefore, more order quantities should be placed to prevent stock out. Consequently, the warehouse places more order quantities to satisfy the retailer's demand and prevent stock out. While keeping more inventory increases the warehouse's holding cost, it also reduces the shortage cost. When the shortage cost is rather high, the warehouse's profit increases accordingly. Similarly, keeping more inventories affects the retailer's profit (Table 6; Fig. 5).

Conclusion
Inventory systems that model different real-world situations have received great interest in the literature. This paper considers a two-echelon supply chain that includes a warehouse and a retailer facing stochastic demand. The retailer's service level and the transportation time from the warehouse to the retailer are private information of the retailer and the warehouse, respectively. Under incomplete information, the interaction between the warehouse and the retailer is modeled as a Stackelberg game. The optimal inventory and pricing policies are obtained using an algorithm based on BLPP. While the warehouse's policy and profit are very sensitive to the estimation of the parameters, the changes in the retailer's decisions and profit are almost negligible. In this case study, the warehouse has more motivation than the retailer to share information in order to gain more benefit.
There is scope for extending the present work. For example, other types of inventory control policies could be considered under incomplete information. Moreover, other parameters of the model, such as the order quantity or different costs, could be treated as private information of the warehouse and the retailer.