Abstract
As is well known, average-cost optimality inequalities imply the existence of stationary optimal policies for Markov decision processes with average costs per unit time, and these inequalities hold under broad natural conditions. This paper provides sufficient conditions for the validity of the average-cost optimality equation for an infinite-state problem with weakly continuous transition probabilities and with possibly unbounded one-step costs and noncompact action sets. These conditions also imply the convergence of sequences of discounted relative value functions to average-cost relative value functions and the continuity of average-cost relative value functions. As shown in this paper, the classic periodic-review setup-cost inventory control problem with backorders and convex holding/backlog costs satisfies these conditions. Therefore, for this problem the optimality inequality holds in the form of an equality with a continuous average-cost relative value function. In addition, the K-convexity of discounted relative value functions and their convergence to average-cost relative value functions, as the discount factor increases to 1, imply the K-convexity of average-cost relative value functions. This implies that average-cost optimal (s, S) policies for the inventory control problem can be derived from the average-cost optimality equation.
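In standard MDP notation (state x, available actions A(x), one-step cost c(x, a), transition kernel q; these symbols are illustrative and need not match the paper's own notation), the objects discussed in the abstract can be sketched as follows:

```latex
% Average-cost optimality equation (ACOE): w is the optimal long-run
% average cost per unit time and u is an average-cost relative value function.
w + u(x) \;=\; \min_{a \in A(x)} \Big\{ c(x,a) + \int_{\mathbb{X}} u(y)\, q(dy \mid x,a) \Big\}.

% Vanishing-discount relation: with v_\alpha the \alpha-discounted value
% function and \bar{x} a fixed reference state, the discounted relative
% value functions are
u_\alpha(x) \;=\; v_\alpha(x) - v_\alpha(\bar{x}),

% and the paper's conditions yield their convergence to u as \alpha \uparrow 1.
% K-convexity: f is K-convex (K \ge 0) if for all x \le y and \lambda \in [0,1],
f\big(\lambda x + (1-\lambda) y\big) \;\le\;
  \lambda f(x) + (1-\lambda)\big(f(y) + K\big).

% K-convexity is preserved under pointwise limits, so K-convexity of each
% v_\alpha carries over to u; minimizing a K-convex function with a setup
% cost K then produces an (s,S) policy: order up to level S whenever the
% inventory position x falls below s, and do not order otherwise.
```

The last step is the classical Scarf-style argument: once the average-cost relative value function is known to be K-convex, the minimizer in the ACOE has the two-threshold (s, S) form.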
Acknowledgements
This research was partially supported by NSF Grants CMMI-1335296 and CMMI-1636193.
Feinberg, E.A., Liang, Y. On the optimality equation for average cost Markov decision processes and its validity for inventory control. Ann Oper Res 317, 569–586 (2022). https://doi.org/10.1007/s10479-017-2561-9