Decentralized list scheduling
Classical list scheduling is a popular and efficient technique for scheduling jobs on parallel and distributed platforms. It is inherently centralized. However, as the number of processors increases, the cost of managing a single centralized list becomes prohibitive. A suitable approach to reduce contention is to distribute the list among the computational units: each processor has only a local view of the work to execute. As a consequence, the scheduler is no longer greedy and the standard performance guarantees are lost.
The objective of this work is to study the extra cost that must be paid when the list is distributed among the computational units. We first present a general methodology for computing the expected makespan, based on the analysis of an adequate potential function which represents the load imbalance between the local lists. We obtain an equation giving the evolution of the potential by computing its expected decrease in one step of the schedule. Our main theorem shows how to solve such equations to bound the makespan. We then apply this method to several scheduling problems, namely unit independent tasks, weighted independent tasks, and tasks with precedence constraints. More precisely, we prove that the time for scheduling a global workload W composed of independent unit tasks on m processors is equal to W/m plus an additional term proportional to log_2 W. We provide a lower bound which shows that this is optimal up to a constant. This result is extended to the case of weighted independent tasks. In the last setting, precedence task graphs, our analysis leads to an improvement on the bound of Arora et al. (Theory Comput. Syst. 34(2):115–144, 2001). We end with some experiments using a simulator. The distribution of the makespan is shown to fit existing probability laws. Moreover, the simulations give better insight into the additive term, whose value is shown to be around 3 log_2 W, confirming the precision of our analysis.
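To make the unit independent tasks setting concrete, the following is a minimal simulation sketch, not the simulator used in the experiments. It rests on assumptions that the abstract does not state: all W tasks start on processor 0, time advances in synchronous steps, an idle processor picks a victim uniformly at random and takes half of the victim's remaining tasks, and concurrent steal requests are resolved one after another in random order. The name `simulate_work_stealing` and its parameters are hypothetical.

```python
import random


def simulate_work_stealing(total_work, num_procs, seed=None):
    """Return the makespan (number of time steps) of one simulated run.

    Assumed model (illustrative only): unit tasks, all initially on
    processor 0; in each step a processor with work executes one task,
    and a processor that was idle at the start of the step steals half
    of a randomly chosen victim's remaining tasks.
    """
    rng = random.Random(seed)
    load = [0] * num_procs          # local list sizes
    load[0] = total_work            # assumption: all work starts on processor 0
    remaining = total_work
    steps = 0

    while remaining > 0:
        steps += 1
        # Processors with an empty local list at the start of the step.
        idle = [p for p in range(num_procs) if load[p] == 0]

        # Every processor with local work executes exactly one unit task.
        for p in range(num_procs):
            if load[p] > 0:
                load[p] -= 1
                remaining -= 1

        # Idle processors each spend the step on one steal attempt.
        rng.shuffle(idle)
        for thief in idle:
            # Pick a victim uniformly among the other processors.
            victim = rng.randrange(num_procs - 1)
            if victim >= thief:
                victim += 1
            stolen = load[victim] // 2   # steal-half policy (assumption)
            load[victim] -= stolen
            load[thief] += stolen

    return steps
```

For a given run, the quantity `simulate_work_stealing(W, m) - W / m` is the extra cost of distributing the list under this model; averaging it over many seeds and comparing it with `math.log2(W)` gives a rough feel for the additive term discussed above, although the constants obtained depend on the modelling assumptions listed in the lead-in.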
Keywords: Scheduling, List algorithms, Work stealing
The authors would like to thank Julien Bernard and Jean-Louis Roch for fruitful discussions on the preliminary version of this work.
- Adler, M., Chakrabarti, S., Mitzenmacher, M., & Rasmussen, L. (1995). Parallel randomized load balancing. In Proceedings of STOC (pp. 238–247).
- Arora, N. S., Blumofe, R. D., & Plaxton, C. G. (2001). Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems, 34(2), 115–144.
- Frigo, M., Leiserson, C. E., & Randall, K. H. (1998). The implementation of the Cilk-5 multithreaded language. In Proceedings of PLDI.
- Gast, N., & Gaujal, B. (2010). A mean field model of work stealing in large-scale systems. In Proceedings of SIGMETRICS.
- Gautier, T. (2010). Personal communication.
- Kotz, S., & Nadarajah, S. (2001). Extreme value distributions: theory and applications. Singapore: World Scientific.
- Leung, J. (2004). Handbook of scheduling: algorithms, models, and performance analysis. Boca Raton: CRC Press.
- Lueling, R., & Monien, B. (1993). A dynamic distributed load balancing algorithm with provable good performance. In SPAA: annual ACM symposium on parallel algorithms and architectures.
- Mitzenmacher, M. (1998). Analyses of load stealing models based on differential equations. In Proceedings of SPAA (pp. 212–221).
- Robison, A., Voss, M., & Kukanov, A. (2008). Optimization via reflection on work stealing in TBB. In Proceedings of IPDPS (pp. 1–8).
- Rudolph, L., Slivkin-Allalouf, M., & Upfal, E. (1991). A simple load balancing scheme for task allocation in parallel machines. In SPAA (pp. 237–245).
- Sanders, P. (1999). Asynchronous random polling dynamic load balancing. In A. Aggarwal & C. P. Rangan (Eds.), Lecture notes in computer science: Vol. 1741. ISAAC (pp. 37–48). Berlin: Springer.
- Schwiegelshohn, U., Tchernykh, A., & Yahyapour, R. (2008). Online scheduling in grids. In Proceedings of IPDPS.
- Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J. L., & Bernard, J. (2010). A tighter analysis of work stealing. In The 21st international symposium on algorithms and computation (ISAAC).
- Traoré, D., Roch, J. L., Maillard, N., Gautier, T., & Bernard, J. (2008). Deque-free work-optimal parallel STL algorithms. In Proceedings of Euro-Par (pp. 887–897).