
Reinforcement learning applications to machine scheduling problems: a comprehensive literature review

Abstract

Reinforcement learning (RL) is one of the most remarkable branches of machine learning and attracts the attention of researchers from numerous fields. In recent years in particular, RL methods have been applied to machine scheduling problems and rank among the top five most promising methods in the scheduling literature. This study therefore presents a comprehensive literature review of RL applications to machine scheduling problems. The Scopus and Web of Science databases were searched inclusively using appropriate keywords, yielding 80 papers published between 1995 and 2020. These papers were analyzed with respect to applied algorithms, machine environments, job and machine characteristics, objectives, and benchmark methods, and a detailed classification scheme was constructed. Job shop scheduling, unrelated parallel machine scheduling, and single machine scheduling were found to be the most studied problem types. The main contributions of this study are to examine the essential aspects of reinforcement learning in machine scheduling problems, to identify the most frequently investigated problem types, objectives, and constraints, and to reveal deficiencies and promising areas in the related literature. Through its comprehensive analysis of the literature, this study can help researchers who wish to work in this field.
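To make the surveyed setting concrete, the following is a minimal sketch of the kind of formulation many of the reviewed papers study: a tabular Q-learning agent that learns which dispatching rule (SPT or EDD) to apply at each decision point on a single machine, with total tardiness as the objective. The job data, state encoding, and reward below are invented for illustration and are not taken from any reviewed paper.

```python
import random
from collections import defaultdict

# Illustrative sketch only: a tabular Q-learning agent that chooses a
# dispatching rule at each decision point on a single machine. The job
# set, state encoding, and reward are assumptions made for this example.
random.seed(0)

JOBS = [(4, 4), (1, 9), (3, 5), (2, 11)]   # (processing_time, due_date)
ACTIONS = ("SPT", "EDD")                   # shortest processing time / earliest due date

def run_episode(Q, epsilon, alpha=0.1, gamma=0.9):
    """One pass over the job set; returns total tardiness of the schedule."""
    queue = list(JOBS)
    t, total_tardiness = 0, 0
    while queue:
        state = len(queue)  # crude state: number of jobs still waiting
        if random.random() < epsilon:
            a = random.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: Q[(state, i)])
        key = (lambda j: j[0]) if ACTIONS[a] == "SPT" else (lambda j: j[1])
        job = min(queue, key=key)           # apply the chosen dispatching rule
        queue.remove(job)
        t += job[0]
        lateness = max(0, t - job[1])
        total_tardiness += lateness
        # Q-learning update; reward is the negative tardiness of this job
        next_best = max(Q[(len(queue), i)] for i in range(len(ACTIONS))) if queue else 0.0
        Q[(state, a)] += alpha * (-lateness + gamma * next_best - Q[(state, a)])
    return total_tardiness

Q = defaultdict(float)
for _ in range(200):                        # epsilon-greedy training episodes
    run_episode(Q, epsilon=0.2)
final = run_episode(Q, epsilon=0.0)         # greedy evaluation
print("total tardiness:", final)            # for reference: pure EDD gives 2, pure SPT gives 7 on this data
```

The surveyed approaches replace this toy state with richer shop-floor features, and the deep RL work approximates the Q-function with a neural network, but the agent-environment loop is the same.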




Corresponding author

Correspondence to Behice Meltem Kayhan.



Cite this article

Kayhan, B.M., Yildiz, G. Reinforcement learning applications to machine scheduling problems: a comprehensive literature review. J Intell Manuf (2021). https://doi.org/10.1007/s10845-021-01847-3


Keywords

  • Reinforcement learning
  • Q-learning
  • Machine scheduling
  • Job shop scheduling problem
  • Parallel machine scheduling problems