## Abstract

Classical list scheduling is a popular and efficient technique for scheduling jobs on parallel and distributed platforms. It is inherently centralized. However, as the number of processors increases, the cost of managing a single centralized list becomes prohibitive. A suitable approach to reduce contention is to distribute the list among the computational units: each processor has only a local view of the work to execute. The scheduler is then no longer greedy, and the standard performance guarantees are lost.

The objective of this work is to study the extra cost that must be paid when the list is distributed among the computational units. We first present a general methodology for computing the expected makespan, based on the analysis of a suitable potential function that represents the load imbalance between the local lists. We obtain an equation giving the evolution of the potential by computing its expected decrease in one step of the schedule. Our main theorem shows how to solve such equations to bound the makespan. We then apply this method to several scheduling problems: unit independent tasks, weighted independent tasks, and tasks with precedence constraints. More precisely, we prove that the time for scheduling a global workload *W* composed of independent unit tasks on *m* processors is equal to *W*/*m* plus an additional term proportional to log₂ *W*. We provide a lower bound which shows that this is optimal up to a constant. This result is extended to the case of weighted independent tasks. In the last setting, precedence task graphs, our analysis leads to an improvement on the bound of Arora et al. (Theory Comput. Syst. 34(2):115–144, 2001). We end with some experiments using a simulator. The distribution of the makespan is shown to fit existing probability laws. Moreover, the simulations give better insight into the additive term, whose value is shown to be around 3 log₂ *W*, confirming the precision of our analysis.
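To make the unit-task setting concrete, the following is a minimal simulation sketch. It is not the authors' simulator, and every modelling choice in it is an assumption made for illustration: all *W* tasks start on processor 0, time is discrete, an idle processor sends one steal request per step to a victim chosen uniformly at random, a victim serves at most one request per step, and a successful thief receives half (rounded down) of the victim's remaining tasks. The function name `simulate_work_stealing` and its parameters are hypothetical.

```python
import math
import random


def simulate_work_stealing(total_work, num_procs, seed=None):
    """Return the makespan (in time steps) of one simulated run.

    Assumed model (illustrative only): unit tasks, all initially on
    processor 0; each step, busy processors execute one task and idle
    processors attempt one steal from a uniformly random other processor.
    """
    rng = random.Random(seed)
    load = [0] * num_procs
    load[0] = total_work
    steps = 0
    while sum(load) > 0:
        steps += 1
        # Idle processors each choose one victim among the other processors.
        requests = {}
        for p in range(num_procs):
            if load[p] == 0 and num_procs > 1:
                victim = rng.randrange(num_procs - 1)
                if victim >= p:
                    victim += 1  # skip self
                requests.setdefault(victim, []).append(p)
        # Busy processors execute one unit task.
        for p in range(num_procs):
            if load[p] > 0:
                load[p] -= 1
        # Snapshot so that work stolen in this step cannot be re-stolen
        # in the same step.
        after_exec = list(load)
        for victim, thieves in requests.items():
            if after_exec[victim] > 0:
                thief = rng.choice(thieves)  # only one request succeeds
                stolen = after_exec[victim] // 2
                load[victim] -= stolen
                load[thief] += stolen
    return steps


if __name__ == "__main__":
    W, m, runs = 100_000, 16, 20
    mean = sum(simulate_work_stealing(W, m, seed=s) for s in range(runs)) / runs
    overhead = mean - W / m
    print(f"mean makespan: {mean:.1f}")
    print(f"overhead over W/m: {overhead:.1f}")
    print(f"log2(W): {math.log2(W):.1f}")
```

Running the script prints the average makespan over a few seeded runs next to *W*/*m* and log₂ *W*, so the additive overhead can be compared against the logarithmic term. Because the model above is an assumption, the measured constant need not match the value reported in the abstract.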

## Keywords

Scheduling, List algorithms, Work stealing

## Notes

### Acknowledgements

The authors would like to thank Julien Bernard and Jean-Louis Roch for fruitful discussions on the preliminary version of this work.

## References

- Adler, M., Chakrabarti, S., Mitzenmacher, M., & Rasmussen, L. (1995). Parallel randomized load balancing. In *Proceedings of STOC* (pp. 238–247).
- Arora, N. S., Blumofe, R. D., & Plaxton, C. G. (2001). Thread scheduling for multiprogrammed multiprocessors. *Theory of Computing Systems*, *34*(2), 115–144.
- Azar, Y., Broder, A. Z., Karlin, A. R., & Upfal, E. (1999). Balanced allocations. *SIAM Journal on Computing*, *29*(1), 180–200. doi: 10.1137/S0097539795288490.
- Bender, M. A., & Rabin, M. O. (2002). Online scheduling of parallel programs on heterogeneous systems with applications to Cilk. *Theory of Computing Systems*, *35*, 289–304.
- Berenbrink, P., Friedetzky, T., & Goldberg, L. A. (2003). The natural work-stealing algorithm is stable. *SIAM Journal on Computing*, *32*(5), 1260–1279.
- Berenbrink, P., Friedetzky, T., Goldberg, L. A., Goldberg, P. W., Hu, Z., & Martin, R. (2007). Distributed selfish load balancing. *SIAM Journal on Computing*, *37*(4), 1163–1181. doi: 10.1137/060660345.
- Berenbrink, P., Friedetzky, T., Hu, Z., & Martin, R. (2008). On weighted balls-into-bins games. *Theoretical Computer Science*, *409*(3), 511–520.
- Berenbrink, P., Friedetzky, T., & Hu, Z. (2009). A new analytical method for parallel, diffusion-type load balancing. *Journal of Parallel and Distributed Computing*, *69*(1), 54–61.
- Blumofe, R. D., & Leiserson, C. E. (1999). Scheduling multithreaded computations by work stealing. *Journal of the ACM*, *46*(5), 720–748.
- Chekuri, C., & Bender, M. (2001). An efficient approximation algorithm for minimizing makespan on uniformly related machines. *Journal of Algorithms*, *41*(2), 212–224.
- Drozdowski, M. (2009). *Scheduling for parallel processing*. Berlin: Springer.
- Frigo, M., Leiserson, C. E., & Randall, K. H. (1998). The implementation of the Cilk-5 multithreaded language. In *Proceedings of PLDI*.
- Gast, N., & Gaujal, B. (2010). A mean field model of work stealing in large-scale systems. In *Proceedings of SIGMETRICS*.
- Gautier, T. (2010). Personal communication.
- Gautier, T., Besseron, X., & Pigeon, L. (2007). KAAPI: a thread scheduling runtime system for data flow computations on cluster of multi-processors. In *Proceedings of PASCO* (pp. 15–23).
- Graham, R. L. (1969). Bounds on multiprocessing timing anomalies. *SIAM Journal on Applied Mathematics*, *17*, 416–429.
- Hwang, J. J., Chow, Y. C., Anger, F. D., & Lee, C. Y. (1989). Scheduling precedence graphs in systems with interprocessor communication times. *SIAM Journal on Computing*, *18*(2), 244–257.
- Kotz, S., & Nadarajah, S. (2001). *Extreme value distributions: theory and applications*. Singapore: World Scientific.
- Leung, J. (2004). *Handbook of scheduling: algorithms, models, and performance analysis*. Boca Raton: CRC Press.
- Lueling, R., & Monien, B. (1993). A dynamic distributed load balancing algorithm with provable good performance. In *SPAA: annual ACM symposium on parallel algorithms and architectures*.
- Mitzenmacher, M. (1998). Analyses of load stealing models based on differential equations. In *Proceedings of SPAA* (pp. 212–221).
- Robert, Y., & Vivien, F. (2009). *Introduction to scheduling*. London/Boca Raton: Chapman & Hall/CRC Press.
- Robison, A., Voss, M., & Kukanov, A. (2008). Optimization via reflection on work stealing in TBB. In *Proceedings of IPDPS* (pp. 1–8).
- Rudolph, L., Slivkin-Allalouf, M., & Upfal, E. (1991). A simple load balancing scheme for task allocation in parallel machines. In *SPAA* (pp. 237–245).
- Sanders, P. (1999). Asynchronous random polling dynamic load balancing. In A. Aggarwal & C. P. Rangan (Eds.), *Lecture notes in computer science: Vol. 1741. ISAAC* (pp. 37–48). Berlin: Springer.
- Schwiegelshohn, U., Tchernykh, A., & Yahyapour, R. (2008). Online scheduling in grids. In *Proceedings of IPDPS*.
- Tchiboukdjian, M., Gast, N., Trystram, D., Roch, J. L., & Bernard, J. (2010). A tighter analysis of work stealing. In *The 21st international symposium on algorithms and computation (ISAAC)*.
- Traoré, D., Roch, J. L., Maillard, N., Gautier, T., & Bernard, J. (2008). Deque-free work-optimal parallel STL algorithms. In *Proceedings of Euro-Par* (pp. 887–897).