accULL: An OpenACC Implementation with CUDA and OpenCL Support

  • Ruymán Reyes
  • Iván López-Rodríguez
  • Juan J. Fumero
  • Francisco de Sande
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7484)

Abstract

The arrival of hardware accelerators such as GPUs on the HPC scene has put unprecedented performance within developers' reach. However, even expert developers may not be ready to exploit the new, complex processor hierarchies. The programming effort these devices demand must be reduced at the programming-language level; otherwise, developers will spend most of their time on device-specific code instead of implementing algorithmic enhancements. The recent advent of the OpenACC standard for heterogeneous computing represents an effort in this direction. This initiative, combined with future releases of the OpenMP standard, will converge into a fully heterogeneous framework that will cope with the programming requirements of future computer architectures. In this work we present accULL, a novel implementation of the OpenACC standard based on the combination of a source-to-source compiler and a runtime library. To our knowledge, our approach is the first to provide support for both OpenCL and CUDA platforms under this new standard.
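To make the directive-based approach concrete, the following is a minimal sketch of a loop annotated with OpenACC directives in C. It is a generic illustration of the standard and is not taken from the paper: the function name `saxpy` and its parameters are hypothetical, and whether accULL accepts this exact combination of directive and clauses is an assumption. A compiler without OpenACC support ignores the pragma and runs the loop sequentially on the host, so the code stays portable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: computes y = a * x + y over n elements.
 * The pragma asks an OpenACC compiler to offload the loop to an
 * accelerator; a compiler without OpenACC support ignores it. */
void saxpy(int n, float a, const float *x, float *y)
{
    assert(x != NULL && y != NULL);

    /* copyin: x is only read on the device.
     * copy:   y is sent to the device and copied back afterwards. */
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

In an implementation built from a source-to-source compiler and a runtime library, the compiler would typically rewrite such an annotated loop into a device kernel plus runtime calls that handle the data transfers named in the clauses.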



Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Ruymán Reyes (1)
  • Iván López-Rodríguez (1)
  • Juan J. Fumero (1)
  • Francisco de Sande (1)
  1. Dept. de E.I.O. y Computación, Universidad de La Laguna, La Laguna, Spain
