Advertisement

The Journal of Supercomputing

, Volume 74, Issue 4, pp 1435–1448 | Cite as

Hybrid work stealing of locality-flexible and cancelable tasks for the APGAS library

  • Jonas Posner
  • Claudia Fohry
Article

Abstract

Since large parallel machines are typically clusters of multicore nodes, parallel programs should be able to deal with both shared memory and distributed memory. This paper proposes a hybrid work stealing scheme, which combines the lifeline-based variant of distributed task pools with the node-internal load balancing of Java’s Fork/Join framework. We implemented our scheme by extending the APGAS library for Java, which is a branch of the X10 project. APGAS programmers can now spawn locality-flexible tasks with a new asyncAny construct. These tasks are transparently mapped to any resource in the overall system, so that the load is balanced over both nodes and cores. Unprocessed asyncAny-tasks can also be cancelled. In performance measurements with up to 144 workers on up to 12 nodes, we observed near linear speedups for four benchmarks and a low overhead for cancellation-related bookkeeping.

Keywords

Task pool Work stealing Task cancellation APGAS Java 

Notes

Acknowledgements

This work is supported by the Deutsche Forschungsgemeinschaft, under Grant FO 1035/5-1.

References

  1. 1.
    Applegate DL, Bixby RE, Chvatal V, Cook WJ (2007) The traveling salesman problem. Princeton University Press, PrincetonzbMATHGoogle Scholar
  2. 2.
    Diaz J, Munoz-Caro C, Nino A (2012) A survey of parallel programming models and tools in the multi and many-core era. IEEE Trans Parallel Distrib Syst 23:1369–1386.  https://doi.org/10.1109/tpds.2011.308 CrossRefGoogle Scholar
  3. 3.
    Freeman LC (1977) A set of measures of centrality based on betweenness. Sociometry 40(1):35.  https://doi.org/10.2307/3033543 CrossRefGoogle Scholar
  4. 4.
    Gendron B, Crainic TG (1994) Parallel branch-and-branch algorithms: survey and synthesis. Oper Res 42(6):1042–1066.  https://doi.org/10.1287/opre.42.6.1042 MathSciNetCrossRefzbMATHGoogle Scholar
  5. 5.
    Gik J (1987) Schach und Mathematik. Deutsch Harri GmbH, Frankfurt a. MGoogle Scholar
  6. 6.
    Guo Y, Zhao J, Cave V, Sarkar V (2010) SLAW: a scalable locality-aware adaptive work-stealing scheduler for multi-core systems. In: Proceedings of ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming.  https://doi.org/10.1145/1693453.1693504
  7. 7.
    IBM: Core implementation of X10 programming language including compiler, runtime, class libraries, sample programs and test suite. https://github.com/x10-lang/x10 (2017)
  8. 8.
    IBM: The APGAS library for fault-tolerant distributed programming in Java 8. https://github.com/x10-lang/apgas (2017)
  9. 9.
    Kestor G, Krishnamoorthy S, Ma W (2017) Localized fault recovery for nested fork-join programs. In: Proceedings of IEEE International Symposium on Parallel and Distributed Processing.  https://doi.org/10.1109/ipdps.2017.75
  10. 10.
    Kolesnichenko A, Nanz S, Meyer B (2013) How to cancel a task. Springer, Berlin, pp 61–72.  https://doi.org/10.1007/978-3-642-39955-8_6 Google Scholar
  11. 11.
    Kumar V, Murthy K, Sarkar V, Zheng Y (2016) Optimized distributed work-stealing. In: Proceedings of Workshop on Irregular Applications: Architectures and Algorithms, pp 74–77.  https://doi.org/10.1109/IA3.2016.19
  12. 12.
    Olivier S, Huan J, Liu J, Prins J, Dinan J, Sadayappan P, Tseng CW (2006) UTS: an unbalanced tree search benchmark. In: Languages and Compilers for Parallel Computing, pp 235–250. Springer LNCS 4382.  https://doi.org/10.1007/978-3-540-72521-3_18
  13. 13.
    OpenMP ARB: OpenMP specifications. http://www.openmp.org/specifications/ (2017)
  14. 14.
  15. 15.
    Paudel J, Tardieu O, Amaral JN (2013) Hybrid parallel task placement in X10. In: Proceedings of ACM SIGPLAN Workshop on X10.  https://doi.org/10.1145/2481268.2481277
  16. 16.
    Paudel J, Tardieu O, Amaral JN (2013) On the merits of distributed work-stealing on selective locality-aware tasks. In: Proceedings of International Conference on Parallel Processing.  https://doi.org/10.1109/icpp.2013.19
  17. 17.
    Posner J, Fohry C (2016) Cooperation versus coordination for lifeline-based global load balancing in APGAS. In: Proceedings of ACM SIGPLAN Workshop on X10.  https://doi.org/10.1145/2931028.2931029
  18. 18.
    Rice University: HabaneroUPC++: a Compiler-free PGAS Library. https://github.com/habanero-rice/habanero-upc (2017)
  19. 19.
    STEllAR-GROUP: HPX: The C++ standards library for parallelism and concurrency (2017). https://github.com/STEllAR-GROUP/hpx
  20. 20.
    Tardieu O (2015) The APGAS library: resilient parallel and distributed programming in Java 8. In: Proceedings of ACM SIGPLAN Workshop on X10.  https://doi.org/10.1145/2771774.2771780
  21. 21.
    Thoman P, et al (2018) A taxonomy of task-based technologies for high-performance computing. In: Proceedings of International Conference Parallel Processing and Applied Mathematics (To appear)Google Scholar
  22. 22.
  23. 23.
    Yamashita K, Kamada T (2016) Introducing a multithread and multistage mechanism for the global load balancing library of X10. J Inf Process 24(2):416–424.  https://doi.org/10.2197/ipsjjip.24.416 Google Scholar
  24. 24.
    Zhang W, Tardieu O, Grove D, Herta B, Kamada T, Saraswat V, Takeuchi M (2014) GLB lifeline-based global load balancing library in X10. In: Proceedings of ACM Workshop on Parallel Programming for Analytics Applications.  https://doi.org/10.1145/2567634.2567639

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. 1.Reseach Group Programming Languages/MethodologiesUniversity of KasselKasselGermany

Personalised recommendations