Abstract
This paper presents an approach to controlling a centrally managed, distributed inventory system using reinforcement learning (RL). We propose applying policy-based RL algorithms to this problem. We formulate the problem as a Markov decision process (MDP) and build an environment that tracks multiple products across multiple warehouses and returns, at every time step, a reward signal that corresponds directly to the total revenue across all warehouses. In this environment, we apply several policy-based RL algorithms, namely Advantage Actor-Critic, Trust Region Policy Optimization and Proximal Policy Optimization, to decide how much of each product to stock in every warehouse. We evaluate how well these algorithms maximize average revenue over time under various statistical distributions, from which demand is sampled at each time step of each training episode. We also compare these approaches with an existing approach based on a fixed replenishment scheme. We conclude by discussing the results of our evaluation and the scope for future work on the topic.
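The kind of environment described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the class name `InventoryEnv`, the capacity, price and cost values, the uniform demand distribution, and the reward defined as sales revenue minus ordering cost are all assumptions chosen for illustration. The `threshold_policy` function is a hypothetical stand-in for a fixed replenishment baseline.

```python
import random
from dataclasses import dataclass


@dataclass
class InventoryEnv:
    """Toy multi-warehouse, multi-product inventory MDP (illustrative only)."""
    n_warehouses: int = 2
    n_products: int = 3
    capacity: int = 20        # assumed max units of one product per warehouse
    unit_price: float = 5.0   # assumed revenue per unit sold
    unit_cost: float = 2.0    # assumed cost per unit ordered
    max_demand: int = 10      # assumed demand ~ Uniform{0, ..., max_demand}
    horizon: int = 30         # assumed episode length in time steps
    seed: int = 0

    def reset(self):
        """Start a new episode with empty warehouses and a fresh demand stream."""
        self.rng = random.Random(self.seed)
        self.t = 0
        self.stock = [[0] * self.n_products for _ in range(self.n_warehouses)]
        return self._state()

    def _state(self):
        # The state is the stock level of every product in every warehouse.
        return tuple(tuple(row) for row in self.stock)

    def step(self, orders):
        """Apply orders[w][p] (units to add), sample demand, return (state, reward, done)."""
        reward = 0.0
        for w in range(self.n_warehouses):
            for p in range(self.n_products):
                # Clip the order so stock never exceeds capacity.
                room = self.capacity - self.stock[w][p]
                qty = max(0, min(orders[w][p], room))
                self.stock[w][p] += qty
                # Demand is sampled independently per product and warehouse.
                demand = self.rng.randint(0, self.max_demand)
                sold = min(demand, self.stock[w][p])
                self.stock[w][p] -= sold
                reward += sold * self.unit_price - qty * self.unit_cost
        self.t += 1
        return self._state(), reward, self.t >= self.horizon


def threshold_policy(state, threshold=5, target=15):
    """Hypothetical fixed scheme: refill to `target` when stock drops below `threshold`."""
    return [[target - s if s < threshold else 0 for s in row] for row in state]


def run_episode(env, policy):
    """Run one episode and return the average reward per time step."""
    state = env.reset()
    total, done = 0.0, False
    while not done:
        state, reward, done = env.step(policy(state))
        total += reward
    return total / env.horizon
```

Calling `run_episode(InventoryEnv(), threshold_policy)` gives the baseline's average reward per step. A learned policy would replace `threshold_policy` with a function that maps the same state to order quantities.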
Copyright information
© 2021 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Gokhale, A., Trasikar, C., Shah, A., Hegde, A., Naik, S.R. (2021). A Reinforcement Learning Approach to Inventory Management. In: Chiplunkar, N.N., Fukao, T. (eds) Advances in Artificial Intelligence and Data Engineering. AIDE 2019. Advances in Intelligent Systems and Computing, vol 1133. Springer, Singapore. https://doi.org/10.1007/978-981-15-3514-7_23
DOI: https://doi.org/10.1007/978-981-15-3514-7_23
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3513-0
Online ISBN: 978-981-15-3514-7
eBook Packages: Intelligent Technologies and Robotics (R0)