Reinforcement Learning Methods for Operations Research Applications: The Order Release Problem

  • Manuel Schneckenreither
  • Stefan Haeussler
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11331)


An important goal of Manufacturing Planning and Control systems is to achieve short and predictable flow times, especially where high flexibility in meeting customer demand is required. Besides short flow times, one should also maintain high output and due-date performance. One approach to this problem is an order release mechanism that collects all incoming orders in an order pool and then determines when to release them to the shop floor. A major disadvantage of traditional order release mechanisms is their inability to account for the nonlinear relationship between resource utilization and flow times, which is well known from practice and queuing theory. We therefore propose a novel adaptive order release mechanism that uses deep reinforcement learning to set order release times, and we provide several techniques for tackling challenging operations research problems with reinforcement learning. Using a simulation model of a two-stage flow shop, we show that our approach outperforms well-known order release mechanisms.
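The adaptive-release idea can be illustrated with a deliberately simplified sketch. Instead of the paper's deep reinforcement learning architecture, the toy below uses a tabular Q-learning agent that decides each period whether to release an order from the pool to a single-queue shop; the state encoding (discretized work-in-process level), the stochastic arrival and service probabilities, and the reward (penalizing orders waiting anywhere) are all illustrative assumptions, not the authors' actual design.

```python
import random
from collections import deque

random.seed(42)

# Toy order-release decision via tabular Q-learning (illustrative only):
# state  = work-in-process on the shop floor, capped at MAX_WIP
# action = 0 (hold the next order in the pool) or 1 (release it)
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.1
MAX_WIP = 5
Q = {(s, a): 0.0 for s in range(MAX_WIP + 1) for a in (0, 1)}

def choose(state):
    """Epsilon-greedy action selection over the two release actions."""
    if random.random() < EPS:
        return random.choice((0, 1))
    return max((0, 1), key=lambda a: Q[(state, a)])

def simulate(periods=2000):
    pool = deque()       # order pool: stores arrival period of each order
    shop = deque()       # shop-floor queue: stores release period
    flow_times = []      # realized shop flow times of completed orders
    for t in range(periods):
        if random.random() < 0.5:           # a new order arrives
            pool.append(t)
        state = min(len(shop), MAX_WIP)
        action = choose(state) if pool else 0
        if action == 1 and pool:
            pool.popleft()
            shop.append(t)                  # release order to the shop
        if shop and random.random() < 0.6:  # machine finishes one order
            released = shop.popleft()
            flow_times.append(t - released + 1)
        # Reward penalizes orders waiting anywhere (pool or shop), so the
        # agent cannot game the objective by simply never releasing.
        next_state = min(len(shop), MAX_WIP)
        reward = -(len(shop) + len(pool))
        best_next = max(Q[(next_state, a)] for a in (0, 1))
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next
                                       - Q[(state, action)])
    return flow_times

flow_times = simulate()
```

In the paper's setting the tabular Q-function would be replaced by a deep network and the shop by a two-stage flow-shop simulation; the sketch only shows the decision loop that such a mechanism repeats every period.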


Keywords: Operations research · Production planning · Order release · Machine learning · Reinforcement learning



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  1. Department of Information Systems, Production and Logistics Management, University of Innsbruck, Innsbruck, Austria
