Advertisement

Inference and Declaration of Independence in Task-Parallel Programs

  • Foivos S. Zakkak
  • Dimitrios Chasapis
  • Polyvios Pratikakis
  • Angelos Bilas
  • Dimitrios S. Nikolopoulos
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8299)

Abstract

The inherent difficulty of thread-based shared-memory programming has recently motivated research in high-level, task-parallel programming models. Recent advances of Task-Parallel models add implicit synchronization, where the system automatically detects and satisfies data dependencies among spawned tasks. However, dynamic dependence analysis incurs significant runtime overheads, because the runtime must track task resources and use this information to schedule tasks while avoiding conflicts and races.

We present SCOOP, a compiler that effectively integrates static and dynamic analysis in code generation. SCOOP combines context-sensitive points-to, control-flow, escape, and effect analyses to remove redundant dependence checks at runtime. Our static analysis can work in combination with existing dynamic analyses and task-parallel runtimes that use annotations to specify tasks and their memory footprints. We use our static dependence analysis to detect non-conflicting tasks and an existing dynamic analysis to handle the remaining dependencies. We evaluate the resulting hybrid dependence analysis on a set of task-parallel programs.

Keywords

Task-Parallelism Static Analysis Dependence Analysis Deterministic Execution 

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. 1.
    Blumofe, R.D., Joerg, C.F., Kuszmaul, B.C., Leiserson, C.E., Randall, K.H., Zhou, Y.: Cilk: an efficient multithreaded runtime system. In: Proceedings of the Fifth ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, pp. 207–216. ACM, New York (1995)CrossRefGoogle Scholar
  2. 2.
    Dagum, L., Menon, R.: OpenMP: An industry-standard api for shared-memory programming. IEEE Comput. Sci. Eng. 5(1), 46–55 (1998)CrossRefGoogle Scholar
  3. 3.
    Fatahalian, K., Horn, D.R., Knight, T.J., Leem, L., Houston, M., Park, J.Y., Erez, M., Ren, M., Aiken, A., Dally, W.J., Hanrahan, P.: Sequoia: programming the memory hierarchy. In: Proceedings of the 2006 ACM/IEEE Conference on Supercomputing. ACM, New York (2006)Google Scholar
  4. 4.
    Pop, A., Cohen, A.: OpenStream: Expressiveness and data-flow compilation of OpenMP streaming programs. ACM Transactions on Architecture and Code Optimization 9(4), 53:1–53:25 (2013)CrossRefGoogle Scholar
  5. 5.
    Jenista, J.C., Eom, Y.H., Demsky, B.C.: OoOJava: software out-of-order execution. In: Proceedings of the 16th ACM Symposium on Principles and Practice of Parallel Programming, pp. 57–68. ACM, New York (2011)CrossRefGoogle Scholar
  6. 6.
    Bocchino Jr., R.L., Heumann, S., Honarmand, N., Adve, S.V., Adve, V.S., Welc, A., Shpeisman, T.: Safe nondeterminism in a deterministic-by-default parallel language. In: Proceedings of the 38th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 535–548. ACM, New York (2011)Google Scholar
  7. 7.
    Best, M.J., Mottishaw, S., Mustard, C., Roth, M., Fedorova, A., Brownsword, A.: Synchronization via scheduling: techniques for efficiently managing shared state. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 640–652. ACM, New York (2011)CrossRefGoogle Scholar
  8. 8.
    Tzenakis, G., Papatriantafyllou, A., Vandierendonck, H., Pratikakis, P., Nikolopoulos, D.S.: BDDT: Block-level dynamic dependence analysis for task-based parallelism. In: International Conference on Advanced Parallel Processing Technology (2013)Google Scholar
  9. 9.
    Perez, J.M., Badia, R.M., Labarta, J.: Handling task dependencies under strided and aliased references. In: Proceedings of the 24th ACM International Conference on Supercomputing, pp. 263–274. ACM, New York (2010)CrossRefGoogle Scholar
  10. 10.
    Rehof, J., Fähndrich, M.: Type-base flow analysis: from polymorphic subtyping to cfl-reachability. In: Proceedings of the 28th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 54–66. ACM, New York (2001)CrossRefGoogle Scholar
  11. 11.
    Wolf, M.E.: Improving locality and parallelism in nested loops. PhD thesis, Stanford University, Stanford, CA, USA (1992) UMI Order No. GAX93-02340Google Scholar
  12. 12.
    Maydan, D.E., Hennessy, J.L., Lam, M.S.: Efficient and exact data dependence analysis. In: Proceedings of the ACM SIGPLAN 1991 Conference on Programming Language Design and Implementation, pp. 1–14. ACM, New York (1991)CrossRefGoogle Scholar
  13. 13.
    Bienia, C., Kumar, S., Singh, J.P., Li, K.: The parsec benchmark suite: characterization and architectural implications. In: Proceedings of the 17th International Conference on Parallel Architectures and Compilation Techniques, pp. 72–81. ACM, New York (2008)CrossRefGoogle Scholar
  14. 14.
    Woo, S.C., Ohara, M., Torrie, E., Singh, J.P., Gupta, A.: The SPLASH-2 programs: Characterization and methodological considerations. In: International Symposium on Computer Architecture, pp. 24–36. ACM (1995)Google Scholar
  15. 15.
    Minh, C.C., Chung, J., Kozyrakis, C., Olukotun, K.: STAMP: Stanford transactional applications for multi-processing. In: IEEE International Symposium on Workload Characterization, pp. 35–46. IEEE (2008)Google Scholar
  16. 16.
    Reinders, J.: Intel threading building blocks, 1st edn. O’Reilly & Associates, Inc., Sebastopol (2007)Google Scholar
  17. 17.
    Herlihy, M., Moss, J.E.B.: Transactional memory: architectural support for lock-free data structures. In: Proceedings of the 20th Annual International Symposium on Computer Architecture, pp. 289–300. ACM, New York (1993)CrossRefGoogle Scholar
  18. 18.
    Cherem, S., Chilimbi, T., Gulwani, S.: Inferring locks for atomic sections. In: Proceedings of the 2008 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 304–315. ACM, New York (2008)CrossRefGoogle Scholar
  19. 19.
    Rinard, M.C., Lam, M.S.: The design, implementation, and evaluation of Jade. ACM Transactions on Programming Languages and Systems 20(3), 483–545 (1998)CrossRefGoogle Scholar
  20. 20.
    Planas, J., Badia, R.M., Ayguadé, E., Labarta, J.: Hierarchical task-based programming with StarSs. International Journal of High Performance Computing Applications 23(3), 284–299 (2009)CrossRefGoogle Scholar
  21. 21.
    Prabhu, P., Ghosh, S., Zhang, Y., Johnson, N.P., August, D.I.: Commutative set: a language extension for implicit parallel programming. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 1–11. ACM, New York (2011)CrossRefGoogle Scholar
  22. 22.
    Olszewski, M., Ansel, J., Amarasinghe, S.: Kendo: efficient deterministic multithreading in software. In: Proceedings of the 14th International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 97–108. ACM, New York (2009)CrossRefGoogle Scholar
  23. 23.
    Berger, E.D., Yang, T., Liu, T., Novark, G.: Grace: safe multithreaded programming for c/c++. In: Proceedings of the 24th ACM SIGPLAN Conference on Object Oriented Programming Systems Languages and Applications, pp. 81–96. ACM, New York (2009)CrossRefGoogle Scholar
  24. 24.
    Devietti, J., Nelson, J., Bergan, T., Ceze, L., Grossman, D.: RCDC: a relaxed consistency deterministic computer. In: Proceedings of the Sixteenth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 67–78. ACM, New York (2011)CrossRefGoogle Scholar
  25. 25.
    Lee, D., Chen, P.M., Flinn, J., Narayanasamy, S.: Chimera: hybrid program analysis for determinism. In: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 463–474. ACM, New York (2012)CrossRefGoogle Scholar
  26. 26.
    Pugh, W.: The omega test: a fast and practical integer programming algorithm for dependence analysis. Communications of the ACM 8, 4–13 (1992)Google Scholar
  27. 27.
    Rauchwerger, L., Padua, D.: The lrpd test: speculative run-time parallelization of loops with privatization and reduction parallelization. In: Proceedings of the ACM SIGPLAN 1995 Conference on Programming Language Design and Implementation, pp. 218–232. ACM, New York (1995)CrossRefGoogle Scholar
  28. 28.
    Holewinski, J., Ramamurthi, R., Ravishankar, M., Fauzia, N., Pouchet, L.N., Rountev, A., Sadayappan, P.: Dynamic trace-based analysis of vectorization potential of applications. In: Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 371–382. ACM, New York (2012)CrossRefGoogle Scholar
  29. 29.
    Naik, M., Aiken, A.: Conditional must not aliasing for static race detection. In: Proceedings of the 34th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pp. 327–338. ACM, New York (2007)CrossRefGoogle Scholar
  30. 30.
    Pratikakis, P., Foster, J.S., Hicks, M.: Locksmith: Practical static race detection for c. ACM Transactions on Programming Languages and Systems 33(1), 3:1–3:55 (2011)CrossRefGoogle Scholar
  31. 31.
    Pratikakis, P., Foster, J.S., Hicks, M.: Locksmith: context-sensitive correlation analysis for race detection. In: Proceedings of the 2006 ACM SIGPLAN Conference on Programming Language Design and Implementation, pp. 320–331. ACM, New York (2006)CrossRefGoogle Scholar

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Foivos S. Zakkak
    • 1
  • Dimitrios Chasapis
    • 1
  • Polyvios Pratikakis
    • 1
  • Angelos Bilas
    • 1
  • Dimitrios S. Nikolopoulos
    • 2
  1. 1.FORTH-ICSHeraklionCrete, Greece
  2. 2.Queens University of BelfastBelfastUnited Kingdom

Personalised recommendations