Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation

  • Conference paper
  • In: Multi-Agent-Based Simulation XX (MABS 2019)

Abstract

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and autonomous driving, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of applying RL in the real world is to train the agent before deployment by computing the effect of its exploratory actions on the environment. While this effect is easy to compute for online gameplay (where the rules of the game are well known) and autonomous driving (where the dynamics of the vehicle are predictable), it is much harder for complex business systems, which exhibit uncertainty, adaptability and emergent behaviour. In this paper, we describe a framework for effective integration of a reinforcement learning controller with an actor-based multi-agent simulation of a supply chain network comprising a warehouse, a transportation system, and stores, with the objective of maximising product availability while minimising wastage under constraints.
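The closed-loop setup the abstract describes can be illustrated with a minimal sketch: an RL controller picks replenishment quantities while a toy simulation plays the role of the multi-agent supply-chain environment, with reward trading off availability against wastage. All class names, capacities, demand distributions, and reward weights below are illustrative assumptions, not the paper's actual model.

```python
import random

class StoreSim:
    """Toy stand-in for the simulated environment: one store with
    stochastic demand and finite shelf capacity (hypothetical numbers)."""
    CAPACITY = 10

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.stock = 5

    def step(self, order_qty):
        # Replenish; units beyond shelf capacity count as wastage.
        delivered = min(order_qty, self.CAPACITY - self.stock)
        wastage = order_qty - delivered
        self.stock += delivered
        # Stochastic customer demand this period.
        demand = self.rng.randint(0, 4)
        sold = min(demand, self.stock)
        lost_sales = demand - sold  # availability shortfall
        self.stock -= sold
        # Reward trades off availability (sold, lost sales) vs wastage.
        reward = sold - 0.5 * wastage - 1.0 * lost_sales
        return self.stock, reward

def train(episodes=200, steps=50, eps=0.1, alpha=0.2, gamma=0.9, seed=0):
    """Tabular Q-learning over (stock level, order quantity) pairs."""
    rng = random.Random(seed)
    actions = list(range(6))  # order 0..5 units per step
    q = {(s, a): 0.0
         for s in range(StoreSim.CAPACITY + 1) for a in actions}
    for ep in range(episodes):
        env = StoreSim(seed=ep)
        state = env.stock
        for _ in range(steps):
            # Epsilon-greedy exploration, as in standard RL training.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(state, x)])
            next_state, r = env.step(a)
            best_next = max(q[(next_state, x)] for x in actions)
            q[(state, a)] += alpha * (r + gamma * best_next - q[(state, a)])
            state = next_state
    return q
```

In the paper's framework the environment side is an actor-based simulation of warehouse, transport, and stores rather than this single toy store, but the training loop shape is the same: the controller acts, the simulation computes the consequence, and the reward closes the loop.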



Author information

Correspondence to Souvik Barat.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Barat, S. et al. (2020). Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation. In: Paolucci, M., Sichman, J.S., Verhagen, H. (eds) Multi-Agent-Based Simulation XX. MABS 2019. Lecture Notes in Computer Science, vol 12025. Springer, Cham. https://doi.org/10.1007/978-3-030-60843-9_3


  • DOI: https://doi.org/10.1007/978-3-030-60843-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60842-2

  • Online ISBN: 978-3-030-60843-9

  • eBook Packages: Computer Science (R0)
