Abstract
Deep Reinforcement Learning (RL) has recently been applied successfully in many areas. A multi-period supply chain operation can be viewed as a sequential decision-making problem, for which Deep RL may be appropriate. Previous applications of this approach to related problems consider only serial or two-echelon supply chains with limited decision possibilities. This research considers a four-echelon supply chain with two nodes per echelon and stochastic customer demands. An MDP formulation and a Non-Linear Programming model of the problem are presented. Proximal Policy Optimization (PPO2) is used to find a good policy for operating the entire supply chain while minimizing total operating costs. An agent based on a linearized model serves as a baseline. Experimental results indicate that PPO2 is a suitable and competitive approach for the proposed problem.
Notes
1. Unmet demand cost \(c^d\) was set to \(3c^q\), where \(c^q\) is the total operating cost of delivering one unit of product. The value of \(c^q\) was calculated as 72 for the presented scenario (considering the highest supply and processing costs, total transportation costs and inventory costs over eight periods).
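The arithmetic in this note can be checked with a short sketch. Only the factor 3 and the value 72 come from the note; the function names are hypothetical, and the assumption that the penalty is charged linearly per unit of unserved demand is ours, not something the note states.

```python
# Values taken from the note: c^q = 72 and c^d = 3 * c^q.
UNIT_OPERATING_COST = 72   # c^q, total operating cost of delivering one unit
UNMET_DEMAND_FACTOR = 3    # multiplier applied to c^q to obtain c^d


def unmet_demand_cost(unit_operating_cost, factor=UNMET_DEMAND_FACTOR):
    """Return c^d = factor * c^q."""
    return factor * unit_operating_cost


def shortage_penalty(demand, delivered, unit_penalty):
    """Hypothetical helper: linear penalty on the unserved part of demand.

    Assumes over-delivery is not penalised here, so the shortfall is
    clipped at zero.
    """
    return unit_penalty * max(demand - delivered, 0)


c_d = unmet_demand_cost(UNIT_OPERATING_COST)
print(c_d)  # 216
```

Under these assumptions, leaving 3 units of a 10-unit demand unserved would cost `shortage_penalty(10, 7, c_d)`, that is, 3 × 216 = 648.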
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Alves, J.C., Mateus, G.R. (2020). Deep Reinforcement Learning and Optimization Approach for Multi-echelon Supply Chain with Uncertain Demands. In: Lalla-Ruiz, E., Mes, M., Voß, S. (eds) Computational Logistics. ICCL 2020. Lecture Notes in Computer Science, vol 12433. Springer, Cham. https://doi.org/10.1007/978-3-030-59747-4_38
DOI: https://doi.org/10.1007/978-3-030-59747-4_38
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-59746-7
Online ISBN: 978-3-030-59747-4
eBook Packages: Computer Science (R0)