
Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning

  • ORIGINAL ARTICLE
The International Journal of Advanced Manufacturing Technology

Abstract

In this paper, we discuss a dynamic unrelated parallel machine scheduling problem with sequence-dependent setup times and machine–job qualification constraints. To apply the Q-Learning algorithm, we convert the scheduling problem into a reinforcement learning problem by constructing a semi-Markov decision process (SMDP), including the definition of the state representation, the actions and the reward function. We use five heuristics, WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT, as actions and prove the equivalence of the reward function and the scheduling objective: minimisation of mean weighted tardiness. We carry out computational experiments to examine the performance of the Q-Learning algorithm and the heuristics. The results show that Q-Learning consistently and substantially outperforms all five heuristics. Averaged over all test problems, the Q-Learning algorithm improved on WSPT, WMDD, WCOVERT, RATCS and LFJ-WCOVERT by 61.38%, 60.82%, 56.23%, 57.48% and 66.22%, respectively.
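The approach described in the abstract can be illustrated with a minimal sketch: tabular Q-learning in which each action applies one of the five dispatching heuristics (WSPT, WMDD, WCOVERT, RATCS, LFJ-WCOVERT) rather than selecting an individual job. The environment interface, state encoding, toy reward and all parameter values below are illustrative assumptions, not the authors' exact SMDP formulation; in the paper the reward is constructed so that maximising it is equivalent to minimising mean weighted tardiness.

```python
import random

# The five dispatching heuristics from the paper, used here only as action labels.
HEURISTICS = ["WSPT", "WMDD", "WCOVERT", "RATCS", "LFJ-WCOVERT"]


class ToyShop:
    """Stand-in environment (hypothetical): a 3-decision episode in which
    action 0 yields reward 1.0 and the others 0.0. A real implementation
    would simulate the parallel-machine shop and return a reward tied to
    negative weighted tardiness."""

    def reset(self):
        self.t = 0
        return self.t

    def step(self, action):
        reward = 1.0 if action == 0 else 0.0
        self.t += 1
        return self.t, reward, self.t >= 3  # (next_state, reward, done)


def q_learning(env, episodes=200, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Tabular Q-learning with epsilon-greedy selection over heuristic indices."""
    rng = random.Random(seed)
    Q = {}  # (state, action) -> estimated value
    n_actions = len(HEURISTICS)
    for _ in range(episodes):
        state = env.reset()
        done = False
        while not done:
            # Explore with probability epsilon, otherwise act greedily.
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)
            else:
                action = max(range(n_actions), key=lambda a: Q.get((state, a), 0.0))
            next_state, reward, done = env.step(action)
            best_next = max(Q.get((next_state, a), 0.0) for a in range(n_actions))
            td_target = reward + (0.0 if done else gamma * best_next)
            # Standard Q-learning update toward the TD target.
            Q[(state, action)] = ((1 - alpha) * Q.get((state, action), 0.0)
                                  + alpha * td_target)
            state = next_state
    return Q


Q = q_learning(ToyShop())
best = max(range(len(HEURISTICS)), key=lambda a: Q.get((0, a), 0.0))
print("learned heuristic at initial state:", HEURISTICS[best])
```

In this toy setting the agent learns to prefer action 0, mirroring how the paper's agent learns which heuristic to dispatch in each system state rather than committing to a single rule for the whole horizon.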



Author information

Correspondence to Zhicong Zhang.

Cite this article

Zhang, Z., Zheng, L. & Weng, M.X. Dynamic parallel machine scheduling with mean weighted tardiness objective by Q-Learning. Int J Adv Manuf Technol 34, 968–980 (2007). https://doi.org/10.1007/s00170-006-0662-8
