A CUDA approach to compute perishable inventory control policies using value iteration
- 205 Downloads
Abstract
Dynamic programming (DP) approaches, in particular value iteration, is often seen as a method to derive optimal policies in inventory management. The challenge in this approach is to deal with an increasing state space when handling realistic problems. As a large part of world food production is thrown out due to its perishable character, a motivation exists to have a good look at order policies in retail. Recently, investigation has been introduced to consider substitution of one product by another, when one is out of stock. Taking this tendency into account in a policy requires an increasing state space. Therefore, we investigate the potential of using GPU platforms in order to derive optimal policies when the number of products taken into account simultaneously is increasing. First results show the potential of the GPU approach to accelerate computation in value iteration for DP.
Keywords
GPU Inventory control Value iteration CUDA1 Introduction
However, in many cases, a so-called stock base policy works well, where an order-up-to level S is used, [11]. The retailer inspects the current inventory level I and orders an amount Q up to level S, i.e., the rule \(Q=(S-I)^+\) is followed, where \((x)^+=\max \{x,0\}\). Due to modern scanning techniques, the retailer may be aware of the exact age distribution of the items in stock, let say \((I_1,\ldots I_r,\ldots ,I_M)\), where the products with remaining shelf life in \(I_1\) will perish at the end of the day, but fresher products (\(I_M\) is the number of items with the maximum shelf life M) can still be sold in the future. For some inventory models, it is possible to derive the so-called optimal order policy \(Q(I_1,\ldots ,I_M)\) via Stochastic Dynamic Programming (SDP) using a computational procedure which is called value iteration (VI).
Recently, an interesting question has been raised where two products interact with each other due to substitution of one product by the other when one is out of stock, [2, 4, 5, 16]. It is much harder to obtain an optimal policy, as the order quantity of one product also depends on the inventory of the other. Trying to obtain an optimal policy via DP for such a case requires not only to consider the stock of one product, but also that which serves as a substitute increasing the state space. This is where our research question comes in. Can High Performance Computing (HPC) aid to consider such cases which imply a larger state space?
The question of applying GPUs for value iteration (VI) in Markov Decision Problems (MDP) has been investigated by several researchers. Johannson [10] provides a proof of principle, where CUDA is used to map the VI process on a GPU platform. Chen and Lu [3] consider OpenCL for implementing a VI for a path finding problem. Herrera et al. [9] use HPC to implement a very large-scale MDP for the generation of traffic control tables. Ruiz and Hernández [14] use the developed concepts of [10] and claim a \(90\times \) speedup on an MDP model for crowd navigation. To the knowledge of the authors, no experience has been reported in literature on applying GPU implementations for VI to derive inventory control rules.
The contribution of this paper is to illustrate the MDP process for perishable inventory control first by a simplified retailer inventory control model of one product and then to investigate how a two product case with a larger state space might be handled exploiting the use of GPUs. Therefore, Sect. 2 first describes a simplified one product inventory control problem and illustrates how the optimal solution can be derived by value iteration. Then, Sect. 3 discusses how we can handle a two product case with interaction of products caused by substitution. Section 4 describes the CUDA implementation of the algorithm. Section 5 depicts the evaluation results of the CUDA approach in comparison with a sequential version of VI. Finally, Sect. 6 provides the findings of our investigation.
2 Value iteration for a one product model
We first describe the decision situation and model for one product and then elaborate the optimal solution by value iteration. Section 2.3 provides an illustrative example.
2.1 Decision model for one product
2.2 Value iteration to reach the optimal policy
The question is whether there might be a better rule than (7) when the retailer measures the complete age distribution of its inventory \((I_1,I_r,\ldots ,I_M)\). Actually, the described system behaves as a so-called Markov Decision Process (MDP), [13]. Given that there is a maximum amount to be ordered \(\overline{Q}\), the state space is given by \(I_1=0,\ldots ,\overline{Q}\), \(\mathcal {S}=\{0,\ldots ,\overline{Q}\}^M\) consisting of \(N=(\overline{Q} +1)^M\) elements. For each inventory situation \(I\in \mathcal {S}\), we would like to know how many items Q(I) to order.
2.3 Illustration for one product case
As an example we consider an instance where sales price is \(s=1\), unit cost is \(c=.5\) and the average daily demand is \(\mu =5\) and the maximum shelf life is only \(M=2\). For this instance, the optimal order-up-to value of policy (7) is \(S=13\). This provides a daily profit of \(\varPi =2.195\) and the percentage of waste on the total order quantity is \(Wasteperc=7.33\) based on a simulation of 400,000 periods.
To start the VI algorithm, we have to define the maximum order quantity \(\overline{Q}\). This can be done by considering a newsvendor who orders for two periods following Equation (1), which provides \(\overline{Q}=9\). Using \(\epsilon =10^{-4}\), the algorithm converges in 12 iterations. Interestingly enough, the optimal policy \(Q^*\) never orders more than 7 units for this instance. The fixed value \(\pi =2.215\) corresponds to the simulation result of this policy, which is very close to the optimal base stock policy. However, the waste is reduced to \(Wasteperc=5.78\).
In order to challenge the algorithm, we increase the shelf life to \(M=3\), where the newsvendor would order 15 units corresponding to the optimal base stock \(S=15\) and leading to \(N=4096\) states. The algorithm converges in 15 iterations. Where the base stock policy gives a daily profit of \(\varPi =2.39\) and a \(Wasteperc=2.63\), the optimal policy is very near for this case with \(\varPi =2.40\) and a \(Wasteperc=2.53\).
The value iteration gets more challenge when we increase the shelf life to \(M=4\) where \(\overline{Q}=20\) for this case. Although this provides a computational challenge, the optimal value \(\pi =2.47\) corresponds practically to the average profit \(\varPi \) of the base stock policy with order-up-to level \(S=16\). So, for larger values of the shelf life, the systems starts to behave more like a non-perishable product and the order quantities get larger.
We also studied the smaller case of shelf life \(M=2\) with a LIFO withdrawal behavior, i.e., the clients first pick the freshest product. In that case, the optimal policy starts to behave periodic; order \(Q=10\) and next period order nothing, in order to avoid waste. For such cases, actually the Markovian behavior is different than used for the analysis in (8).
3 Handling two products due to substitution interaction
Recently, an interesting question has been raised where two products interact with each other due to substitution of one product by the other when one is out of stock. Several papers [2, 4, 5, 16] try to deal with this phenomenon and show that for a variety of models, one gains by considering the two products simultaneously instead of individually. For our case, not only the state space becomes much larger, but we also have to find two order quantities simultaneously. We extend our analysis to a case or two products, a and b with the corresponding data \(\mu _a,\mu _b,s_a,s_b, c_a,c_b\) and a probability \(\gamma \) for the binomial distribution Binom(\(n,\gamma \)) that models the amount of substitution demand for product a when product b is out of stock.
3.1 Value iteration for two products simultaneously
3.2 Two product instances
We elaborated first a symmetric case where both products have the same cost \(c=.5\), sales price \(s=1\) and demand parameters values \(\mu =5\) for shelf life \(M=2\). This implies a 4-dimensional state space \((I_{a1},I_{a2},I_{b1},I_{b2})\). The willingness to substitute product b by product a when being out of stock is \(\gamma =.5\). As the maximum order quantity is set on \(\overline{Q}=10\), the state space contains 14,651 states and in each iteration, for each state 121 order quantity combinations are evaluated generating the corresponding transition probabilities. Simulating the system with (best) order-up-to levels \(S_a=13\) and \(S_b=12\) generates a daily profit of \(\varPi =4.479\) and the percentage of waste on the total order quantity is \(Wasteperc_a=6.26\) and \(Wasteperc_b=5.23\) based on a simulation of 400,000 periods. The value iteration process converges in 12 iterations to an accuracy of \(\epsilon =10^{-4}\). Simulation of the generated order quantities \([Q_a,Q_b](I_a,I_b)\) reveals a daily profit of \(\varPi =5.509\) and less waste generation; the percentage of waste on the total order quantity is \(Wasteperc_a=5.97\) and \(Wasteperc_b=4.14\). This illustrates that it is worth the trouble to look into the generation of better order policies by value iteration for highly perishable products when there is a willingness to substitute a product when it is out of stock.
3.3 Complexity of the algorithm
Number of states N and computational time (in seconds or minutes) of a Matlab implementation for several instances with \(s_a=s_b=1\), \(\gamma =.5\) and \(c_a=c_b=.5\) on a Intel i5 with 8GB RAM
Instance | ||||||||
---|---|---|---|---|---|---|---|---|
Nr products | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 |
Shelf life M | 2 | 3 | 4 | 2 | 2 | 3 | 3 | 3 |
\(\overline{Q}_a\) | 10 | 15 | 20 | 10 | 14 | 6 | 6 | 15 |
\(\overline{Q}_b\) | 10 | 10 | 3 | 6 | 15 | |||
\(\mu _a\) | 5 | 5 | 5 | 5 | 7 | 2 | 2 | 5 |
\(\mu _b\) | 5 | 5 | 1 | 2 | 5 | |||
N | 121 | 4096 | 194,481 | 14,461 | 27,225 | 21,952 | 117,649 | \(16.8\times 10^6\) |
Comp time | .03 s | 2.1 s | 9 min | 2 min | 6 min | 6 min | 99 min | xxx |
The computational burden (complexity) of Algorithm 2 for the case of two products with a shelf life of M is determined by the necessary number of iterations up to convergence, the number of states \((\overline{Q}_a+1)^M \times (\overline{Q}_b+1)^M\) and the number of relevant events in each state. We will denote the number of relevant events by \({ Nre}=\sum _{r=1}^ M I_{ar}+I_{br}\). We can observe that for shelf life \(M=3\) the Matlab code becomes intractable when the mean demand values are still realistic.
4 Parallelization of value iteration
In order to investigate a parallelization of VI, Algorithm 3 is constructed, which handles the states with respect to the number of relevant events \({ Nre}\) in line 8 of Algorithm 2.
Algorithm 3 depicts the details of the sequential code that we have considered to run the VI. For the case where no stock is available, indicated by \(j=0\), no stochastic event is relevant. For the states without inventory of product a, i.e., where \(I_a=0\), no substitution behavior is possible. Therefore, our intention has been to compute those states in parallel where substitution takes place (\(\sum _rI_{ar}>0\)) because they require more than 95% of the total runtime. Running Algorithm 3 reveals that the most computationally demanding part are the lines 14–19 which compute the valuation \(Fv(q_a,q_b)\) of an order decision \((q_a,q_b)\). In order to tackle the parallelization, we have considered a GPU approach using Compute Unified Device Architecture (CUDA) interface.
Algorithm 4 shows the host pseudocode of the GPU version that we have implemented. The algorithm implements lines 14–19 of Algorithm 3 to be computed on a GPU. Notice that the value of W is the same for all the states (j) of Algorithm 3, therefore, only one copy of W from CPU memory to GPU memory is required (before line 6 of Algorithm 3) in the GPU version.
In Algorithm 4, the number of threads and blocks are calculated and the call to GPU kernel (GPUSubstitution) is carried out. After that, a communication between the GPU and the CPU memory is necessary to transfer the computed value \(Fv(q_a,q_b)\).
5 Evaluation results
Characteristics of the NVIDIA K80
NVIDIA K80 | |
---|---|
Peak performance (double prec.) (TFlops) | 2.91 |
Peak performance (simple prec.) (TFlops) | 8.74 |
Device memory (GB) | 24 |
Boost clock rate (MHz) | 875 |
Memory bandwidth (GB/s) | 480 |
Multiprocessors | 13 |
CUDA cores | 4992 |
Compute capability | 3.7 |
Instances P1 to P4 with varying mean demand \(\mu _a,\mu _b\)
P1 | P2 | P3 | P4 | |
---|---|---|---|---|
\(\mu _a\) | 5 | 5 | 6 | 7 |
\(\mu _b\) | 5 | 6 | 6 | 7 |
\(\overline{Q}_a\) | 11 | 11 | 13 | 14 |
\(\overline{Q}_b\) | 11 | 13 | 13 | 14 |
N | 14,461 | 20,449 | 28,561 | 38,416 |
Total runtime in seconds of the sequential and the GPU code to compute the VI algorithm
P1 | P2 | P3 | P4 | |
---|---|---|---|---|
Seq. code | ||||
Total runtime of Algorithm 3 | 357.89 | 698.20 | 1358.04 | 4241.66 |
Substitution (lines 14–19) | 351.31 | 688.71 | 1344.82 | 4217.15 |
GPU code | ||||
Total runtime | 91.29 | 136.84 | 214.71 | 361.49 |
Communication host to device | 0.00 | 0.00 | 0.01 | 0.01 |
Communication device to host | 21.55 | 23.35 | 43.63 | 56.69 |
GPUSubstitution | 39.94 | 67.29 | 111.99 | 215.54 |
Total Communic. GPU/CPU | 21.56 | 23.35 | 43.64 | 56.70 |
% Communications | 23.62 | 17.06 | 20.32 | 15.69 |
% GPUSubstitution | 43.75 | 49.18 | 52.16 | 59.62 |
% Other parts of VI | 32.63 | 33.76 | 27.52 | 24.69 |
Table 4 shows the results of the experiments where the described parallel version is compared to the sequential C code for the four instances in Table 3. For each problem, we run the code for 100 iterations in order to have an accurate execution time and all converged before this maximum has been reached. As can be observed from Table 4, the parallel version improves the sequential version in a factor up to \(11.7\times \). An interesting aspect is that due to the use of GPU computing, the Substitution part has considerably reduced its runtime. Specifically for P4 (the largest problem), GPUSubstitution is \(19.6\times \) faster than the sequential substitution. What can also be observed is that when problems become larger, a larger part of the computation can be done by the GPU.
6 Conclusions
Value iteration is an algorithm that can be used to derive optimal inventory control order policies. The computational challenge becomes interesting when a number of products and the age distribution is considered simultaneously. The latter aspect is of interest in the case of perishable products with a low mean demand that sometimes may be substituted by other products when being out of stock due to a revealed customer willingness to do so. This paper describes a simple model and illustrates the underlying algorithm.
The complexity increases with the shelf life of a perishable product, but on the other hand leads to less interesting policies, as simple order-up-to level policies give practically the same performance in profit and waste generation. The challenging case is where several highly perishable products may be substituted by another, because ordering product a also requires looking into inventory levels (and age distribution) of product b, which can be substituted by a. Basically, dimension grows more than quadratically with the shelf life of the products.
Therefore, we developed a parallel version of the corresponding value iteration algorithm based on computing on a GPU architecture the part of the model which has the highest computational cost. This CUDA code tries to efficiently compute the nested-loops required for calculating the substitution of one product by another. Numerical results based on a case of only two products show that the parallel algorithm may improve the performance of the sequential value iteration in a factor that reaches 11.7\(\times \). Our future work will be focused on the parallel implementation of the relevant two product inventory management where shelf life is three.
References
- 1.Bellman R (1957) A Markovian decision process. J Math Mech 6(5):679–684MathSciNetzbMATHGoogle Scholar
- 2.Buisman M, Haijema R, Hendrix EMT (2018) On the delta service level for demand substitution in inventory control. IFAC-PapersOnLine 51:1660–1665CrossRefGoogle Scholar
- 3.Chen P, Lu L (2013) Markov decision process parallel value iteration algorithm on GPU. In: Proceedings of 2013 International Conference on Information Science and Computer Applications. Atlantis Press, pp 299–304Google Scholar
- 4.Chen X, Feng Y, Keblis MF, Xu J (2015) Optimal inventory policy for two substitutable products with customer service objectives. Eur J Oper Res 246(1):76–85MathSciNetCrossRefzbMATHGoogle Scholar
- 5.Gürler U, Yilmaz A (2010) Inventory and coordination issues with two substitutable products. Appl Math Model 34(3):539–551MathSciNetCrossRefzbMATHGoogle Scholar
- 6.Haijema R, Minner S (2016) Stock-level dependent ordering of perishables: a comparison of hybrid base-stock and constant order policies. Int J Prod Econ 181(PA):215–225CrossRefGoogle Scholar
- 7.Hendrix EMT, Haijema R, Rossi R, Pauls-Worm KGJ (2012) On solving a stochastic programming model for perishable inventory control. In: Murgante B et al (eds) Computational science and its applications-ICCSA 2012. Springer, Heidelberg, pp 45–56CrossRefGoogle Scholar
- 8.Hendrix EMT, Ortega G, Haijema R, Buisman M, García I (2018) On computing optimal policies in perishable inventory control using value iteration. In: Proceedings of the 18th CMMSE, pp 1–6Google Scholar
- 9.Herrera JFR, Hendrix EMT, Casado LG, Haijema R (2014) Data parallelism in traffic control tables with arrival information. In: Euro-Par 2014: Parallel Processing Workshops. Springer, Cham, pp 60–70Google Scholar
- 10.Johannson A (2009) GPU-based Markov decision process solver. Master’s thesis, Reykjavik University, IcelandGoogle Scholar
- 11.Minner S (2000) Strategic safety stocks in supply chains. Springer, BerlinCrossRefzbMATHGoogle Scholar
- 12.NVIDIA Corporation: CUDA C PROGRAMMING GUIDE PG-02829-001\(\_\)v9.2 (2018)Google Scholar
- 13.Puterman ML (1994) Markov decision processes: discrete stochastic dynamic programming, 1st edn. Wiley, New YorkCrossRefzbMATHGoogle Scholar
- 14.Ruiz S, Hernández B (2015) A parallel solver for Markov decision process in crowd simulations. In: 2015 Fourteenth Mexican International Conference on Artificial Intelligence (MICAI), pp 107–116Google Scholar
- 15.Silver E, Pyke D, Peterson R (1998) Inventory management and production planning and scheduling. Wiley, New YorkGoogle Scholar
- 16.Yadavalli V, Sundar D, Udayabaskaran S (2015) Two substitutable perishable product disaster inventory systems. Ann Oper Res 233(1):517–534MathSciNetCrossRefzbMATHGoogle Scholar
Copyright information
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.