International Journal of Parallel Programming, Volume 42, Issue 4, pp 529–545

Dynamic and Speculative Polyhedral Parallelization Using Compiler-Generated Skeletons

  • Alexandra Jimborean
  • Philippe Clauss
  • Jean-François Dollinger
  • Vincent Loechner
  • Juan Manuel Martinez Caamaño

Abstract

We propose a framework for the speculative parallelization of scientific nested-loop kernels, based on a novel generation and use of algorithmic skeletons, which applies polyhedral transformations to the target code at run time in order to expose parallelism and data locality. Parallel code generation is achieved at almost no cost by using binary algorithmic skeletons that are generated at compile time and that embed the original code together with the operations needed to instantiate a polyhedral parallelizing transformation and to verify the speculations on dependences. The skeletons are patched at run time to produce the executable code. The run-time process selects a transformation guided by online profiling phases on short samples, using an instrumented version of the code. During these phases, the accessed memory addresses are used to compute dependence distance vectors on the fly, and are also interpolated to build a predictor of the forthcoming accesses. The interpolating functions and the distance vectors are then used in a dependence analysis to select a parallelizing transformation that, if the prediction is correct, does not induce any rollback during execution. To keep the rollback overhead low, the code is executed in successive slices of the outermost original loop of the nest. Each slice can run either a parallel version that instantiates a skeleton, the original sequential version, or an instrumented version. Moreover, this slicing of the execution makes it possible to transform the code differently across slices, adapting to the observed execution phases by patching one of the pre-built skeletons accordingly. The framework has been implemented as extensions of the LLVM compiler and an x86-64 runtime system. Significant speed-ups are shown on a set of benchmarks that could not have been handled efficiently by a static compiler.
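The abstract condenses several run-time mechanisms: skeleton patching, interpolation of observed addresses, speculation verification, and sliced execution with rollback. The following C sketch illustrates the general idea only, under illustrative assumptions: a 2-deep loop nest that writes an array through an index table, a linear predictor obtained by interpolating profiled addresses, and an OpenMP parallel loop standing in for the parallel skeleton. All names are hypothetical; the actual framework patches compile-time binary skeletons rather than source code, and its verification and rollback machinery is more elaborate than shown here.

    #include <stdbool.h>

    /* Hypothetical linear predictor a*i + b*j + c interpolated from
       the addresses observed during the profiling slice. */
    typedef struct { long a, b, c; } lin_pred_t;

    /* Parameters "patched" into the skeleton before launching a slice. */
    typedef struct {
        long inv_t[2][2];   /* inverse of the selected loop transformation */
        lin_pred_t write;   /* predicted linear index of the single store  */
        long lo, hi;        /* outer-loop bounds of the current slice      */
        long n_inner;       /* inner trip count                            */
    } skeleton_params_t;

    /* One speculative slice over the transformed iteration space.
       Returns false if any actual access deviates from the prediction,
       in which case the caller discards the slice's effects and re-runs
       it with the original sequential version (rollback). */
    static bool skeleton_slice(double *A, const long *idx_table,
                               const skeleton_params_t *p)
    {
        int ok = 1;
        #pragma omp parallel for reduction(&& : ok)
        for (long ii = p->lo; ii < p->hi; ii++) {
            for (long jj = 0; jj < p->n_inner; jj++) {
                /* Recover the original iterators from the transformed ones. */
                long i = p->inv_t[0][0] * ii + p->inv_t[0][1] * jj;
                long j = p->inv_t[1][0] * ii + p->inv_t[1][1] * jj;

                /* Address actually computed by the original code ...        */
                long actual = idx_table[i * p->n_inner + j];
                /* ... versus the address predicted by interpolation.        */
                long predicted = p->write.a * i + p->write.b * j + p->write.c;

                if (actual != predicted) { ok = 0; continue; } /* mis-speculation */
                A[actual] += 1.0;     /* stand-in for the original loop body */
            }
        }
        return ok != 0;
    }

In the system described above, this logic is embedded in pre-built binary skeletons whose transformation coefficients and predictors are filled in per slice, so switching between parallel, sequential, and instrumented versions costs only a patch and a dispatch.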

Keywords

Algorithmic skeletons · Polytope model · Automatic parallelization · Speculative parallelization · Dynamic parallelization · Loop nests · Compilation


Copyright information

© Springer Science+Business Media New York 2013

Authors and Affiliations

  • Alexandra Jimborean (1)
  • Philippe Clauss (2)
  • Jean-François Dollinger (2)
  • Vincent Loechner (2)
  • Juan Manuel Martinez Caamaño (2)

  1. UPMARC, University of Uppsala, Uppsala, Sweden
  2. ICube, INRIA, CNRS, University of Strasbourg, Strasbourg, France
