The Journal of Supercomputing, Volume 74, Issue 4, pp 1435–1448

Hybrid work stealing of locality-flexible and cancelable tasks for the APGAS library

  • Jonas Posner
  • Claudia Fohry


Since large parallel machines are typically clusters of multicore nodes, parallel programs should be able to deal with both shared memory and distributed memory. This paper proposes a hybrid work stealing scheme, which combines the lifeline-based variant of distributed task pools with the node-internal load balancing of Java’s Fork/Join framework. We implemented our scheme by extending the APGAS library for Java, which is a branch of the X10 project. APGAS programmers can now spawn locality-flexible tasks with a new asyncAny construct. These tasks are transparently mapped to any resource in the overall system, so that the load is balanced over both nodes and cores. Unprocessed asyncAny-tasks can also be cancelled. In performance measurements with up to 144 workers on up to 12 nodes, we observed near linear speedups for four benchmarks and a low overhead for cancellation-related bookkeeping.
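The node-internal half of the scheme described above relies on Java's Fork/Join framework, which can be illustrated with a minimal sketch. The code below is not the APGAS API and does not use the asyncAny construct; it uses only java.util.concurrent. The class name LocalStealingSketch, the task CountTask, and the shared cancellation flag are hypothetical names chosen for illustration, and the distributed, lifeline-based part of the scheme is omitted entirely.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: node-internal work stealing with Java's Fork/Join
// framework, plus a simple flag that lets not-yet-processed tasks be skipped.
// All names here are hypothetical; this is not the APGAS library's API.
class LocalStealingSketch {

    /** Counts the nodes of a complete binary tree whose leaves sit at depth 0. */
    static final class CountTask extends RecursiveTask<Long> {
        private final int depth;
        private final AtomicBoolean cancelled; // shared flag (assumption, for illustration)

        CountTask(int depth, AtomicBoolean cancelled) {
            this.depth = depth;
            this.cancelled = cancelled;
        }

        @Override
        protected Long compute() {
            if (cancelled.get()) {
                return 0L; // a task that has not been processed yet is skipped
            }
            if (depth == 0) {
                return 1L; // leaf node
            }
            CountTask left = new CountTask(depth - 1, cancelled);
            CountTask right = new CountTask(depth - 1, cancelled);
            left.fork();                  // queued on this worker's deque; idle workers may steal it
            long rightCount = right.compute(); // processed directly by the current worker
            long leftCount = left.join();      // wait for (or help with) the forked subtask
            return 1L + leftCount + rightCount;
        }
    }

    /** Runs the count on a pool with the given number of workers. */
    public static long countNodes(int depth, int workers, boolean cancelBeforeStart) {
        AtomicBoolean cancelled = new AtomicBoolean(cancelBeforeStart);
        ForkJoinPool pool = new ForkJoinPool(workers);
        try {
            return pool.invoke(new CountTask(depth, cancelled));
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(countNodes(15, 4, false)); // 2^16 - 1 = 65535
        System.out.println(countNodes(15, 4, true));  // 0, since every task is skipped
    }
}
```

In this sketch the pool balances load only among the threads of a single JVM. A hybrid scheme of the kind the abstract describes would additionally move tasks between nodes, which is beyond what this example shows.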


Task pool, Work stealing, Task cancellation, APGAS, Java



This work is supported by the Deutsche Forschungsgemeinschaft, under Grant FO 1035/5-1.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Research Group Programming Languages/Methodologies, University of Kassel, Kassel, Germany
