
BFCA+: automatic synthesis of parallel code with TLS capabilities

Abstract

Parallelizing a sequential application requires extracting information about its loops and how their variables are accessed, and then augmenting the source code accordingly. In this paper we propose a framework that avoids this error-prone, time-consuming task. Our solution leverages compile-time information extracted from the source code to classify all variables used inside each loop according to how they are accessed. Our system, called BFCA+, then automatically instruments the source code with the OpenMP directives and clauses needed for its parallel execution, using the standard shared and private clauses for variable classification. The framework can also instrument loops for speculative parallelization with the help of the ATLaS runtime system, which defines a new speculative clause to mark those variables that may lead to a dependency violation. As a result, the target loop is guaranteed to run correctly in parallel, following sequential semantics even in the presence of dependency violations. Our experimental evaluation shows that the framework not only saves development time, but also produces faster code than manual parallelization.



Notes

  1. Only well-formed for loops, whose number of iterations is known at the beginning of the loop, can be parallelized by the ATLaS framework. See [7] for additional details.

  2. The current version of BFCA+ transforms only a single loop of the application, to avoid transforming two nested loops, a situation not allowed by the ATLaS runtime system. We expect to overcome this limitation in the near future.

  3. Note that the manual transformation process includes determining which loop would be most profitable to parallelize, and then performing an in-depth analysis of the data elements accessed inside the loop. This is an error-prone, time-consuming process that, for the benchmarks considered, took between 10 and 30 hours.

References

  1. Aldea S, Llanos DR, Gonzalez-Escribano A (2012) Using SPEC CPU2006 to evaluate the sequential and parallel code generated by commercial and open-source compilers. J Supercomput 59(1):486–498

  2. Cintra M, Llanos DR (2003) Toward efficient and robust software speculative parallelization on multiprocessors. In: PPoPP'03 proceedings, pp 13–24

  3. Dang FH, Yu H, Rauchwerger L (2002) The R-LRPD test: speculative parallelization of partially parallel loops. In: IPDPS'02 proceedings, pp 20–29

  4. Aldea S, Llanos DR, Gonzalez-Escribano A (2012) Support for thread-level speculation into OpenMP. In: IWOMP'12 proceedings, pp 275–278

  5. Aldea S, Llanos DR, Gonzalez-Escribano A (2014) The BonaFide C analyzer: automatic loop-level characterization and coverage measurement. J Supercomput 68(3):1378–1401

  6. Aldea S, Estebanez A, Llanos DR, Gonzalez-Escribano A (2014) A new GCC plugin-based compiler pass to add support for thread-level speculation into OpenMP. In: Euro-Par'14 proceedings, LNCS 8632. Springer, pp 234–245

  7. Aldea S et al (2015) An OpenMP extension that supports thread-level speculation. IEEE Trans Parallel Distrib Syst (to appear)

  8. Oancea CE, Mycroft A, Harris T (2009) A lightweight in-place implementation for software thread-level speculation. In: SPAA'09 proceedings. ACM, New York, pp 223–232

  9. Yiapanis P et al (2013) Optimizing software runtime systems for speculative parallelization. ACM Trans Archit Code Optim (TACO) 9(4):39

  10. Adhianto L et al (2000) Tools for OpenMP application development: the POST project. Concurr Pract Exp 12:1177–1191

  11. Ierotheou CS et al (2005) Generating OpenMP code using an interactive parallelization environment. Parallel Comput 31(10–12):999–1012

  12. Jin H et al (2003) Automatic multilevel parallelization using OpenMP. Sci Program 11(2):177–190

  13. Johnson S et al (2005) The ParaWise expert assistant—widening accessibility to efficient and scalable tool generated OpenMP code. In: WOMPAT'04 proceedings, pp 67–82

  14. Bondhugula U et al (2008) A practical automatic polyhedral parallelizer and locality optimizer. In: PLDI'08 proceedings, pp 101–113

  15. Trifunovic K et al (2010) Graphite two years after: first lessons learned from real-world polyhedral compilation. In: GROW'10 proceedings, pp 4–19

  16. Grosser T et al (2011) Polly—polyhedral optimization in LLVM. In: IMPACT'11 workshop proceedings, Chamonix, France, pp 1–6

  17. Lattner C, Adve V (2004) LLVM: a compilation framework for lifelong program analysis & transformation. In: CGO'04 proceedings, pp 75–86

  18. Amini M et al (2012) Par4All: from convex array regions to heterogeneous computing. In: IMPACT'12 HiPEAC workshop proceedings, Paris, France, pp 1–2

  19. Guelton S (2011) Building source-to-source compilers for heterogeneous targets. PhD thesis, Université européenne de Bretagne, Rennes

  20. Amini M et al (2011) PIPS is not (just) polyhedral software. In: IMPACT'11 workshop proceedings, Chamonix, France, pp 7–12

  21. Liao C et al (2008) Automatic parallelization using OpenMP based on STL semantics. In: Languages and compilers for parallel computing (LCPC)

  22. Dave C et al (2009) Cetus: a source-to-source compiler infrastructure for multicores. IEEE Comput 42(12):36–42

  23. Taillard J, Guyomarch F, Dekeyser JL (2008) A graphical framework for high performance computing using an MDE approach. In: PDP'08 proceedings, pp 165–173

  24. Nardi L et al (2012) YAO: a generator of parallel code for variational data assimilation applications. In: HPCC'12 proceedings, pp 224–232

  25. Clarkson KL, Mehlhorn K, Seidel R (1993) Four results on randomized incremental constructions. Comput Geom Theory Appl 3(4):185–212

  26. Devroye L, Mücke EP, Zhu B (1998) A note on point location in Delaunay triangulations of random points. Algorithmica 22:477–482

  27. Welzl E (1991) Smallest enclosing disks (balls and ellipsoids). In: New results and new trends in computer science. LNCS, vol 555. Springer, New York, pp 359–370

  28. Barnes JE (1997) TREE. Institute for Astronomy, University of Hawaii. ftp://hubble.ifa.hawaii.edu/pub/barnes/treecode/


Acknowledgments

This research has been partially supported by MICINN (Spain) and ERDF program of the European Union: HomProg-HetSys project (TIN2014-58876-P), CAPAP-H5 network (TIN2014-53522-REDT), and COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS).

Author information

Corresponding author

Correspondence to Diego R. Llanos.


About this article


Cite this article

Aldea, S., Llanos, D.R. & Gonzalez-Escribano, A. BFCA+: automatic synthesis of parallel code with TLS capabilities. J Supercomput 73, 88–99 (2017). https://doi.org/10.1007/s11227-016-1623-0


Keywords

  • Automatic parallelization
  • Compiler framework
  • OpenMP
  • Source synthesis
  • Source transformation
  • Speculative parallelization
  • XML