Annals of Operations Research, Volume 207, Issue 1, pp 237–259

Decentralized list scheduling

  • Marc Tchiboukdjian
  • Nicolas Gast
  • Denis Trystram


Classical list scheduling is a popular and efficient technique for scheduling jobs on parallel and distributed platforms. It is inherently centralized. However, as the number of processors increases, the cost of managing a single centralized list becomes prohibitive. A suitable approach to reduce contention is to distribute the list among the computational units, so that each processor has only a local view of the work to execute. As a result, the scheduler is no longer greedy and the standard performance guarantees are lost.

The objective of this work is to study the extra cost that must be paid when the list is distributed among the computational units. We first present a general methodology for computing the expected makespan, based on the analysis of a suitable potential function that represents the load imbalance between the local lists. We obtain an equation giving the evolution of the potential by computing its expected decrease in one step of the schedule. Our main theorem shows how to solve such equations to bound the makespan. We then apply this method to several scheduling problems, namely unit independent tasks, weighted independent tasks, and tasks with precedence constraints. More precisely, we prove that the time for scheduling a global workload W composed of independent unit tasks on m processors is equal to W/m plus an additional term proportional to log_2 W. We provide a lower bound which shows that this is optimal up to a constant. This result is extended to the case of weighted independent tasks. In the last setting, precedence task graphs, our analysis leads to an improvement on the bound of Arora et al. (Theory Comput. Syst. 34(2):115–144, 2001). We end with some experiments using a simulator. The distribution of the makespan is shown to fit existing probability laws. Moreover, the simulations give better insight into the additive term, whose value is shown to be around 3 log_2 W, confirming the precision of our analysis.
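The unit-task setting described above can be explored with a small simulation. The following is a minimal sketch, not the authors' simulator: the function name `simulate_makespan`, the initial placement of all work on one processor, the steal-half rule, and the one-successful-thief-per-victim contention rule are all assumptions made for illustration.

```python
import math
import random


def simulate_makespan(total_work, num_procs, rng):
    """Count the time steps needed to finish `total_work` unit tasks.

    Assumed model (illustrative only): all tasks start on processor 0,
    each busy processor runs one task per step, and each idle processor
    tries to steal half of the remaining tasks of a random victim.
    """
    loads = [0] * num_procs
    loads[0] = total_work
    steps = 0
    while sum(loads) > 0:
        steps += 1
        # Idle processors issue steal requests in random order;
        # at most one request per victim succeeds in a given step.
        thieves = [p for p in range(num_procs) if loads[p] == 0]
        rng.shuffle(thieves)
        granted = {}  # victim -> thief
        for thief in thieves:
            victim = rng.randrange(num_procs)
            if victim != thief and loads[victim] > 0 and victim not in granted:
                granted[victim] = thief
        # Every processor that held work at the start of the step runs one task.
        for p in range(num_procs):
            if loads[p] > 0:
                loads[p] -= 1
        # Each successful thief receives half of its victim's remaining tasks.
        for victim, thief in granted.items():
            stolen = loads[victim] // 2
            loads[victim] -= stolen
            loads[thief] += stolen
    return steps


if __name__ == "__main__":
    rng = random.Random(2012)
    W, m = 100_000, 32
    runs = [simulate_makespan(W, m, rng) for _ in range(20)]
    mean_overhead = sum(runs) / len(runs) - W / m
    # Ratio of the additive term to log_2 W under the assumed model.
    print(mean_overhead / math.log2(W))
```

Under this model the makespan always lies between W/m and W, since each step executes at least one and at most m tasks. The printed ratio gives a rough feel for the additive term relative to log_2 W, but its value depends on the assumed steal rule and should not be read as reproducing the paper's measurements.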


Keywords: Scheduling, List algorithms, Work stealing



The authors would like to thank Julien Bernard and Jean-Louis Roch for fruitful discussions on the preliminary version of this work.


  1. Adler, M., Chakrabarti, S., Mitzenmacher, M., & Rasmussen, L. (1995). Parallel randomized load balancing. In Proceedings of STOC (pp. 238–247).
  2. Arora, N. S., Blumofe, R. D., & Plaxton, C. G. (2001). Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems, 34(2), 115–144.
  3. Azar, Y., Broder, A. Z., Karlin, A. R., & Upfal, E. (1999). Balanced allocations. SIAM Journal on Computing, 29(1), 180–200. doi: 10.1137/S0097539795288490.
  4. Bender, M. A., & Rabin, M. O. (2002). Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. Theory of Computing Systems, 35, 289–304.
  5. Berenbrink, P., Friedetzky, T., & Goldberg, L. A. (2003). The natural work-stealing algorithm is stable. SIAM Journal on Computing, 32(5), 1260–1279.
  6. Berenbrink, P., Friedetzky, T., Goldberg, L. A., Goldberg, P. W., Hu, Z., & Martin, R. (2007). Distributed selfish load balancing. SIAM Journal on Computing, 37(4), 1163–1181. doi: 10.1137/060660345.
  7. Berenbrink, P., Friedetzky, T., Hu, Z., & Martin, R. (2008). On weighted balls-into-bins games. Theoretical Computer Science, 409(3), 511–520.
  8. Berenbrink, P., Friedetzky, T., & Hu, Z. (2009). A new analytical method for parallel, diffusion-type load balancing. Journal of Parallel and Distributed Computing, 69(1), 54–61.
  9. Blumofe, R. D., & Leiserson, C. E. (1999). Scheduling multithreaded computations by work stealing. Journal of the ACM, 46(5), 720–748.
  10. Chekuri, C., & Bender, M. (2001). An efficient approximation algorithm for minimizing makespan on uniformly related machines. Journal of Algorithms, 41(2), 212–224.
  11. Drozdowski, M. (2009). Scheduling for parallel processing. Berlin: Springer.
  12. Frigo, M., Leiserson, C. E., & Randall, K. H. (1998). The implementation of the Cilk-5 multithreaded language. In Proceedings of PLDI.
  13. Gast, N., & Gaujal, B. (2010). A mean field model of work stealing in large-scale systems. In Proceedings of SIGMETRICS.
  14. Gautier, T. (2010). Personal communication.
  15. Gautier, T., Besseron, X., & Pigeon, L. (2007). KAAPI: a thread scheduling runtime system for data flow computations on cluster of multi-processors. In Proceedings of PASCO (pp. 15–23).
  16. Graham, R. L. (1969). Bounds on multiprocessing timing anomalies. SIAM Journal on Applied Mathematics, 17, 416–429.
  17. Hwang, J. J., Chow, Y. C., Anger, F. D., & Lee, C. Y. (1989). Scheduling precedence graphs in systems with interprocessor communication times. SIAM Journal on Computing, 18(2), 244–257.
  18. Kotz, S., & Nadarajah, S. (2001). Extreme value distributions: theory and applications. Singapore: World Scientific.
  19. Leung, J. (2004). Handbook of scheduling: algorithms, models, and performance analysis. Boca Raton: CRC Press.
  20. Lueling, R., & Monien, B. (1993). A dynamic distributed load balancing algorithm with provable good performance. In SPAA: annual ACM symposium on parallel algorithms and architectures.
  21. Mitzenmacher, M. (1998). Analyses of load stealing models based on differential equations. In Proceedings of SPAA (pp. 212–221).
  22. Robert, Y., & Vivien, F. (2009). Introduction to scheduling. London/Boca Raton: Chapman & Hall/CRC Press.
  23. Robison, A., Voss, M., & Kukanov, A. (2008). Optimization via reflection on work stealing in TBB. In Proceedings of IPDPS (pp. 1–8).
  24. Rudolph, L., Slivkin-Allalouf, M., & Upfal, E. (1991). A simple load balancing scheme for task allocation in parallel machines. In SPAA (pp. 237–245).
  25. Sanders, P. (1999). Asynchronous random polling dynamic load balancing. In A. Aggarwal & C. P. Rangan (Eds.), Lecture notes in computer science: Vol. 1741. ISAAC (pp. 37–48). Berlin: Springer.
  26. Schwiegelshohn, U., Tchernykh, A., & Yahyapour, R. (2008). Online scheduling in grids. In Proceedings of IPDPS.
  27. Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J. L., & Bernard, J. (2010). A tighter analysis of work stealing. In The 21st international symposium on algorithms and computation (ISAAC).
  28. Traoré, D., Roch, J. L., Maillard, N., Gautier, T., & Bernard, J. (2008). Deque-free work-optimal parallel STL algorithms. In Proceedings of Euro-Par (pp. 887–897).

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  • Marc Tchiboukdjian (1)
  • Nicolas Gast (2)
  • Denis Trystram (3)

  1. CNRS/CEA, DAM, DIF, Arpajon, France
  2. IC-LCA2, Bâtiment BC, EPFL, Lausanne-EPFL, Switzerland
  3. Grenoble Institute of Technology and Institut Universitaire de France, Montbonnot, France
