Abstract
Intelligent optimization refers to the promising technique of integrating learning mechanisms into (meta-)heuristic search. In this paper, we use multi-agent reinforcement learning to build high-quality solutions for the multi-mode resource-constrained project scheduling problem (MRCPSP). We use a network of distributed reinforcement learning agents that cooperate to jointly learn a well-performing constructive heuristic. Each agent is responsible for one activity and uses two simple learning devices, called learning automata: one learns to select an order for the successor activities, the other learns to select a mode. By coupling the reward signals for both learning tasks, we can clearly show the advantage of using reinforcement learning in search. Comparative results show that our method can compete with the best-performing algorithms for the MRCPSP while using only simple learning schemes, without the burden of complex fine-tuning.
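The agent structure described above can be illustrated with a minimal sketch. It assumes a linear reward-inaction update, which is a common learning automaton scheme but is not specified in this abstract, and it simplifies the successor-order decision to a choice among a fixed number of candidate orders. The names `LearningAutomaton`, `ActivityAgent`, `learning_rate`, and the reward range [0, 1] are hypothetical and chosen for illustration only.

```python
import random


class LearningAutomaton:
    """Probability vector over actions, updated by linear reward-inaction (assumed scheme)."""

    def __init__(self, n_actions, learning_rate=0.1, rng=None):
        self.probs = [1.0 / n_actions] * n_actions  # start uniform
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()

    def choose(self):
        # Sample an action index according to the current probabilities.
        return self.rng.choices(range(len(self.probs)), weights=self.probs)[0]

    def update(self, action, reward):
        # reward is assumed to lie in [0, 1]; a reward of 0 changes nothing.
        step = self.learning_rate * reward
        for i in range(len(self.probs)):
            if i == action:
                self.probs[i] += step * (1.0 - self.probs[i])
            else:
                self.probs[i] -= step * self.probs[i]


class ActivityAgent:
    """One agent per activity: one automaton for the successor order, one for the mode."""

    def __init__(self, n_orders, n_modes, rng=None):
        self.order_la = LearningAutomaton(n_orders, rng=rng)
        self.mode_la = LearningAutomaton(n_modes, rng=rng)

    def act(self):
        return self.order_la.choose(), self.mode_la.choose()

    def learn(self, order_action, mode_action, reward):
        # Coupled reward: both automata receive the same signal,
        # e.g. derived from the quality of the constructed schedule.
        self.order_la.update(order_action, reward)
        self.mode_la.update(mode_action, reward)


agent = ActivityAgent(n_orders=2, n_modes=3, rng=random.Random(0))
order_action, mode_action = agent.act()
agent.learn(order_action, mode_action, reward=1.0)
```

Feeding both automata the same reward is one simple way to couple the two learning tasks; how the reward is actually computed from a schedule is left open here.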
Cite this article
Wauters, T., Verbeeck, K., Berghe, G. et al. Learning agents for the multi-mode project scheduling problem. J Oper Res Soc 62, 281–290 (2011). https://doi.org/10.1057/jors.2010.101
DOI: https://doi.org/10.1057/jors.2010.101