Using Load Information in Work-Stealing on Distributed Systems with Non-uniform Communication Latencies

  • Vladimir Janjic
  • Kevin Hammond
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)


We evaluate four state-of-the-art work-stealing algorithms for distributed systems with non-uniform communication latencies (Random Stealing, Hierarchical Stealing, Cluster-aware Random Stealing and Adaptive Cluster-aware Random Stealing) on a set of irregular Divide-and-Conquer (D&C) parallel applications. We also investigate the extent to which these algorithms could be improved if dynamic load information were available, and how accurate this information needs to be. We show that, for highly irregular D&C applications, the use of load information can significantly improve application speedups, whereas there is little improvement for less irregular ones. Furthermore, we show that when load information is used, Cluster-aware Random Stealing gives the best speedups for both regular and irregular D&C applications.
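
To make the contrast between plain random victim selection and load-informed selection concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how load information is represented or consulted, so everything here is an assumption made for illustration: the function names, the `loads` map (a possibly stale snapshot of how many tasks each processing element is believed to hold), and the rule of choosing uniformly among elements believed to be non-empty.

```python
import random
from typing import Dict, Optional


def pick_random_victim(thief: int, loads: Dict[int, int],
                       rng: random.Random) -> Optional[int]:
    """Random Stealing: pick any other processing element (PE) uniformly.

    The load values are ignored; only the set of PE ids is used.
    """
    candidates = [pe for pe in loads if pe != thief]
    return rng.choice(candidates) if candidates else None


def pick_informed_victim(thief: int, loads: Dict[int, int],
                         rng: random.Random) -> Optional[int]:
    """Hypothetical load-informed variant.

    Picks uniformly among the other PEs that the (possibly stale)
    snapshot `loads` reports as holding at least one task.
    Returns None when no PE is believed to have work.
    """
    candidates = [pe for pe, tasks in loads.items()
                  if pe != thief and tasks > 0]
    return rng.choice(candidates) if candidates else None


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed snapshot: only PE 2 is believed to hold work.
    snapshot = {0: 0, 1: 0, 2: 5, 3: 0}
    print("random victim:  ", pick_random_victim(0, snapshot, rng))
    print("informed victim:", pick_informed_victim(0, snapshot, rng))
```

With a snapshot in which only one processing element holds work, the informed selection always targets that element, while the random selection may send steal attempts to idle ones. If the snapshot is out of date, the informed choice can still miss, which is why the accuracy of the load information matters.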


Processing Element; Parallel Application; Runtime System; Load Information; Potential Victim
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Vladimir Janjic (1)
  • Kevin Hammond (1)
  1. School of Computer Science, University of St Andrews, United Kingdom
