BDDT: Block-Level Dynamic Dependence Analysis for Task-Based Parallelism

  • George Tzenakis
  • Angelos Papatriantafyllou
  • Hans Vandierendonck
  • Polyvios Pratikakis
  • Dimitrios S. Nikolopoulos
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8299)

Abstract

We present BDDT, a task-parallel runtime system that dynamically discovers and resolves dependencies among parallel tasks. BDDT allows the programmer to specify detailed task footprints on any memory address range, multidimensional array tile or dynamic region. BDDT uses a block-based dependence analysis with arbitrary granularity. The analysis is applicable to existing C programs without having to restructure object or array allocation, and provides flexibility in array layouts and tile dimensions.

We evaluate BDDT using a representative set of benchmarks, and we compare it to SMPSs (the equivalent runtime system in StarSs) and OpenMP. BDDT performs comparably to or better than SMPSs and can handle task granularity up to one order of magnitude finer than SMPSs. Compared to OpenMP, BDDT performs up to 3.9× better for benchmarks that benefit from dynamic dependence analysis. BDDT also provides data annotations that bypass dependence analysis. Using these annotations, BDDT also outperforms OpenMP in benchmarks where dependence analysis does not discover additional parallelism, thanks to a more efficient implementation of the runtime system.

Keywords

Compilers and runtime systems; Task-parallel libraries; Middleware for parallel systems; Synchronization and concurrency control

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • George Tzenakis (1)
  • Angelos Papatriantafyllou (3)
  • Hans Vandierendonck (1)
  • Polyvios Pratikakis (2)
  • Dimitrios S. Nikolopoulos (1)

  1. Queen’s University of Belfast, Belfast, United Kingdom
  2. FORTH-ICS, Heraklion, Crete, Greece
  3. TU Wien, Vienna, Austria