Automatic Skeleton-Based Compilation through Integration with an Algorithm Classification

  • Cedric Nugteren
  • Pieter Custers
  • Henk Corporaal
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8299)

Abstract

This paper presents a technique to fully automatically generate efficient and readable code for parallel processors. We base our approach on skeleton-based compilation and ‘algorithmic species’, an algorithm classification of program code. We use a tool to automatically annotate C code with species information where possible. The annotated program code is subsequently fed into the skeleton-based source-to-source compiler ‘Bones’, which generates OpenMP, OpenCL or CUDA code and optimises host-accelerator transfers. This results in a unique approach, integrating a skeleton-based compiler for the first time into an automated flow. We demonstrate the benefits of our approach on the PolyBench suite by showing average speed-ups of 1.4x and 1.6x for GPU code compared to ppcg and Par4All, two state-of-the-art compilers.
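As a rough illustration of the flow described above, the sketch below shows a simple C loop of the kind that could be classified and then mapped onto a parallel target. The annotation text in the comment and the function name `scale_elements` are hypothetical and chosen only for this example; they are not the syntax emitted by the annotation tool or consumed by Bones. The OpenMP pragma is standard OpenMP and only suggests what generated multi-core code might resemble.

```c
#include <stddef.h>

/* Illustrative only: each output element B[i] depends on exactly one
   input element A[i], an element-to-element access pattern.
   Hypothetical annotation (not actual tool syntax):
     species: A[0:n-1] element -> B[0:n-1] element */
void scale_elements(const float *A, float *B, size_t n) {
    /* Every iteration is independent, so the loop can be run in parallel.
       Without OpenMP support the pragma is ignored and the loop runs
       sequentially with the same result. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n; i++) {
        B[i] = 2.0f * A[i];
    }
}
```

Calling `scale_elements(A, B, n)` on two arrays of length `n` fills `B` with twice the values of `A`, whether or not the code is compiled with OpenMP enabled.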

Keywords

  • Parallel Programming
  • Algorithm Classification
  • Algorithmic Skeletons
  • Source-to-Source Compilation
  • GPUs

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Cedric Nugteren (1)
  • Pieter Custers (1)
  • Henk Corporaal (1)
  1. Eindhoven University of Technology, The Netherlands
