The Journal of Supercomputing, Volume 74, Issue 11, pp 5674–5689

Assessing and discovering parallelism in C++ code for heterogeneous platforms

  • David del Rio Astorga
  • Rafael Sotomayor
  • Luis Miguel Sanchez
  • Javier Garcia Blas
  • Alejandro Calderon
  • Javier Fernandez

Abstract

Massively parallel architectures are mainly based on a parallel heterogeneous setup. They are composed of different computing devices that speed up specific code regions, called kernels. These kernels are usually offloaded to the corresponding devices. Porting applications to a specific heterogeneous platform is costly in terms of time and human resources. The key steps in the porting process are the manual analysis of the source code and kernel detection. Each device of a heterogeneous platform has its own restrictions, such as its support for memory allocation, so kernels must be mapped to suitable computing devices. We introduce AKI, an automatic kernel identification and annotation tool that identifies potential kernels in sequential C++ applications. AKI identifies the kernels that can be offloaded to heterogeneous computing devices. To annotate these kernels, REPARA C++ attributes have been defined. This annotation mechanism can aid future automatic source-to-source transformation tools and ease the work of targeting parallel heterogeneous platforms. AKI has been evaluated on all benchmarks included in the NAS suite, which comprises a large set of realistic high-performance applications. The evaluation results demonstrate that AKI is a competitive solution for identifying and annotating parallel code fragments (also known as kernels).
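
To make the idea of attribute-based kernel annotation concrete, the following is a minimal sketch and not AKI's actual output. The attribute spelling `[[rpr::kernel]]`, the macro names `KERNEL_CANDIDATE` and `ENABLE_KERNEL_ATTRS`, and the two functions are assumptions introduced for illustration only; the abstract does not specify the attribute syntax or the criteria AKI applies. The sketch contrasts a loop with independent iterations, which is a plausible offloading candidate, with a loop that carries a dependency between iterations.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical attribute spelling, used here only for illustration.
// It expands to nothing unless ENABLE_KERNEL_ATTRS is defined, so the
// file builds without unknown-attribute warnings on any C++11 compiler.
#ifdef ENABLE_KERNEL_ATTRS
#define KERNEL_CANDIDATE [[rpr::kernel]]
#else
#define KERNEL_CANDIDATE
#endif

// Each iteration reads x[i] and y[i] and writes only y[i]: no iteration
// depends on another, and no memory is allocated inside the loop, so
// the loop is a plausible candidate for offloading.
void saxpy(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    KERNEL_CANDIDATE
    for (std::size_t i = 0; i < x.size(); ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Iteration i reads the value written by iteration i - 1 (a loop-carried
// dependency), so this loop is left unannotated.
void prefix_sum(std::vector<float>& v) {
    for (std::size_t i = 1; i < v.size(); ++i) {
        v[i] += v[i - 1];
    }
}
```

Because the annotation is a standard C++11 attribute placed on the loop statement, the annotated source remains valid sequential C++, and a later source-to-source tool could read the attribute to decide which regions to transform.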

Keywords

Heterogeneous programming, Parallelism discovery, Parallel programming

Notes

Acknowledgments

The research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under Grant Agreement No. 609666 and from the Spanish Ministry of Economics and Competitiveness under Grant TIN2013-41350-P.

Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • David del Rio Astorga (1)
  • Rafael Sotomayor (1)
  • Luis Miguel Sanchez (1)
  • Javier Garcia Blas (1)
  • Alejandro Calderon (1)
  • Javier Fernandez (1)

  1. Madrid, Spain
