Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation

  • Conference paper
  • In: Multi-Agent-Based Simulation XX (MABS 2019)

Abstract

Reinforcement Learning (RL) has achieved a degree of success in control applications such as online gameplay and autonomous driving, but has rarely been used to manage operations of business-critical systems such as supply chains. A key aspect of applying RL in the real world is to train the agent before deployment by computing the effect of its exploratory actions on the environment. While this effect is easy to compute for online gameplay (where the rules of the game are well known) and autonomous driving (where the dynamics of the vehicle are predictable), it is much harder for complex business systems, which exhibit uncertainty, adaptability and emergent behaviour. In this paper, we describe a framework for effective integration of a reinforcement learning controller with an actor-based multi-agent simulation of a supply chain network comprising a warehouse, a transportation system, and stores, with the objective of maximising product availability while minimising wastage under constraints.
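The closed-loop setup the abstract describes can be illustrated with a minimal sketch: an RL controller picks replenishment quantities while a toy simulation plays the role of the multi-agent supply-chain environment, with reward trading off availability against wastage. All class names, capacities, demand distributions, and reward weights below are illustrative assumptions, not the paper's actual model.

```python
import random

class StoreSim:
    """Toy stand-in for the simulated environment: one store with
    stochastic demand and finite shelf capacity (hypothetical numbers)."""
    CAPACITY = 10

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.stock = 5

    def step(self, order_qty):
        # Replenish; units beyond shelf capacity count as wastage.
        delivered = min(order_qty, self.CAPACITY - self.stock)
        wastage = order_qty - delivered
        self.stock += delivered
        # Stochastic customer demand this period.
        demand = self.rng.randint(0, 4)
        sold = min(demand, self.stock)
        lost_sales = demand - sold  # availability shortfall
        self.stock -= sold
        # Reward trades off availability (sold, lost sales) vs wastage.
        reward = sold - 0.5 * wastage - 1.0 * lost_sales
        return self.stock, reward

def train(episodes=200, steps=50, eps=0.1, alpha=0.2, gamma=0.9, seed=0):
    """Tabular Q-learning over (stock level, order quantity) pairs."""
    rng = random.Random(seed)
    actions = list(range(6))  # order 0..5 units per step
    q = {(s, a): 0.0
         for s in range(StoreSim.CAPACITY + 1) for a in actions}
    for ep in range(episodes):
        env = StoreSim(seed=ep)
        state = env.stock
        for _ in range(steps):
            # Epsilon-greedy exploration, as in standard RL training.
            if rng.random() < eps:
                a = rng.choice(actions)
            else:
                a = max(actions, key=lambda x: q[(state, x)])
            next_state, r = env.step(a)
            best_next = max(q[(next_state, x)] for x in actions)
            q[(state, a)] += alpha * (r + gamma * best_next - q[(state, a)])
            state = next_state
    return q
```

In the paper's framework the environment side is an actor-based simulation of warehouse, transport, and stores rather than this single toy store, but the training loop shape is the same: the controller acts, the simulation computes the consequence, and the reward closes the loop.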



Author information

Correspondence to Souvik Barat.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Barat, S. et al. (2020). Reinforcement Learning of Supply Chain Control Policy Using Closed Loop Multi-agent Simulation. In: Paolucci, M., Sichman, J.S., Verhagen, H. (eds) Multi-Agent-Based Simulation XX. MABS 2019. Lecture Notes in Computer Science, vol 12025. Springer, Cham. https://doi.org/10.1007/978-3-030-60843-9_3


  • DOI: https://doi.org/10.1007/978-3-030-60843-9_3

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60842-2

  • Online ISBN: 978-3-030-60843-9

  • eBook Packages: Computer Science (R0)
