The Journal of Supercomputing, Volume 74, Issue 4, pp 1435–1448

Hybrid work stealing of locality-flexible and cancelable tasks for the APGAS library



Since large parallel machines are typically clusters of multicore nodes, parallel programs should be able to deal with both shared memory and distributed memory. This paper proposes a hybrid work stealing scheme, which combines the lifeline-based variant of distributed task pools with the node-internal load balancing of Java’s Fork/Join framework. We implemented our scheme by extending the APGAS library for Java, which is a branch of the X10 project. APGAS programmers can now spawn locality-flexible tasks with a new asyncAny construct. These tasks are transparently mapped to any resource in the overall system, so that the load is balanced over both nodes and cores. Unprocessed asyncAny-tasks can also be cancelled. In performance measurements with up to 144 workers on up to 12 nodes, we observed near-linear speedups for four benchmarks and a low overhead for cancellation-related bookkeeping.
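The node-internal half of such a scheme can be illustrated with the standard Java Fork/Join framework alone. The following is only a minimal sketch under stated assumptions: it does not use the APGAS library or the asyncAny construct, it omits the lifeline-based stealing between nodes entirely, and the class and method names (NodeLocalStealingSketch, countNodes, cancelQueuedTask) are hypothetical names chosen for illustration. It shows forked tasks being balanced over the workers of one pool, and a task being cancelled while it is still unprocessed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinTask;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical illustration: node-internal work stealing and cancellation
// with java.util.concurrent only. This is NOT the APGAS asyncAny API.
public class NodeLocalStealingSketch {

    /** Counts the nodes of a complete binary tree of the given depth. */
    static final class CountTask extends RecursiveTask<Long> {
        private final int depth;

        CountTask(int depth) { this.depth = depth; }

        @Override
        protected Long compute() {
            if (depth == 0) return 1L;
            CountTask left = new CountTask(depth - 1);
            CountTask right = new CountTask(depth - 1);
            left.fork();                      // queued locally; idle workers may steal it
            long rightCount = right.compute();
            return 1L + rightCount + left.join();
        }
    }

    /** Runs the tree count on a pool with the given number of workers. */
    public static long countNodes(int depth, int workers) {
        ForkJoinPool pool = new ForkJoinPool(workers);
        try {
            return pool.invoke(new CountTask(depth));
        } finally {
            pool.shutdown();
        }
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if a task was cancelled while queued and never executed. */
    public static boolean cancelQueuedTask() {
        ForkJoinPool pool = new ForkJoinPool(1);   // a single worker keeps the demo deterministic
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        AtomicBoolean ran = new AtomicBoolean(false);
        try {
            // Occupy the only worker so that the next task stays unprocessed.
            ForkJoinTask<?> blocker = pool.submit(() -> {
                started.countDown();
                awaitQuietly(release);
            });
            awaitQuietly(started);
            ForkJoinTask<?> queued = pool.submit(() -> ran.set(true));
            boolean cancelled = queued.cancel(false);  // task has not started yet
            release.countDown();
            blocker.join();
            return cancelled && queued.isCancelled() && !ran.get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countNodes(10, 4));
        System.out.println(cancelQueuedTask());
    }
}
```

For a tree of depth d, countNodes visits 2^(d+1) - 1 nodes regardless of the worker count; only the distribution of the forked subtasks over the workers changes. The cancellation part mirrors the idea that only unprocessed tasks are cancelled: a task that is already running is not interrupted in this sketch.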


Task pool, Work stealing, Task cancellation, APGAS, Java



This work is supported by the Deutsche Forschungsgemeinschaft, under Grant FO 1035/5-1.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Research Group Programming Languages/Methodologies, University of Kassel, Kassel, Germany
