CoreTSAR: Adaptive Worksharing for Heterogeneous Systems

  • Thomas R. W. Scogland
  • Wu-chun Feng
  • Barry Rountree
  • Bronis R. de Supinski
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8488)

Abstract

The popularity of heterogeneous computing continues to increase rapidly due to the high peak performance, favorable energy efficiency, and comparatively low cost of accelerators. However, heterogeneous programming models still lack the flexibility of their CPU-only counterparts. Accelerated OpenMP models, including OpenMP 4.0 and OpenACC, ease the migration of code from CPUs to GPUs but lack much of OpenMP’s flexibility: OpenMP applications can run on any number of CPUs without extra user effort, whereas the accelerated models offer no comparable adaptive worksharing across the GPUs in a node, nor can they use CPUs and GPUs together. To address these shortcomings, we present CoreTSAR, our library for scheduling cores via a task-size adapting runtime system, which supports worksharing of parallel loop nests across arbitrary heterogeneous resources. Beyond scheduling the computational load across devices, CoreTSAR includes a memory-management system that operates based on task association, enabling the runtime to dynamically manage both memory movement and task granularity. Our evaluation shows that CoreTSAR can provide nearly linear scaling to four GPUs and all the cores in a node without modifying the code within the parallel region. Furthermore, CoreTSAR delivers portable performance across a variety of system configurations.
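
As a rough illustration of the adaptive-worksharing idea the abstract describes, the hypothetical C sketch below splits a loop's iterations between a CPU and a GPU in proportion to each device's measured throughput, rebalancing on every pass. It is a minimal sketch of the general technique under invented assumptions, not CoreTSAR's actual API: the kernels run_cpu_chunk and run_gpu_chunk and all constants are stand-ins.

    /* Hypothetical sketch of throughput-proportional worksharing; this is
     * NOT CoreTSAR's API, only the general rebalancing idea. */
    #include <stdio.h>

    #define N      1000000L   /* iterations in the shared loop */
    #define PASSES 5          /* scheduling rounds */

    /* Invented stand-ins for per-device kernels; each returns a simulated
     * elapsed time (seconds) for executing `iters` loop iterations. */
    static double run_cpu_chunk(long iters) { return iters * 2.0e-8; }
    static double run_gpu_chunk(long iters) { return iters * 0.5e-8; }

    int main(void) {
        double gpu_share = 0.5;  /* fraction of iterations given to the GPU */
        for (int pass = 0; pass < PASSES; pass++) {
            long gpu_iters = (long)(gpu_share * N);
            long cpu_iters = N - gpu_iters;
            /* A real runtime would launch these concurrently. */
            double t_cpu = run_cpu_chunk(cpu_iters);
            double t_gpu = run_gpu_chunk(gpu_iters);
            /* Per-device throughput in iterations per second. */
            double thr_cpu = cpu_iters / t_cpu;
            double thr_gpu = gpu_iters / t_gpu;
            /* Next pass: give each device work proportional to its observed
             * throughput, clamped so neither device starves (and to avoid
             * a zero-iteration chunk dividing by zero). */
            gpu_share = thr_gpu / (thr_cpu + thr_gpu);
            if (gpu_share < 0.01) gpu_share = 0.01;
            if (gpu_share > 0.99) gpu_share = 0.99;
            printf("pass %d: cpu %.4fs  gpu %.4fs  next GPU share %.2f\n",
                   pass, t_cpu, t_gpu, gpu_share);
        }
        return 0;
    }

Under these stubs the split converges to the simulated 4:1 GPU-to-CPU throughput ratio; the schedulers described in the paper itself additionally adapt task granularity and manage memory movement by task association.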

Keywords

Static Schedule, Adaptive Schedule, CUDA Implementation, OpenMP Version, Task Granularity

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Thomas R. W. Scogland (1)
  • Wu-chun Feng (1)
  • Barry Rountree (2)
  • Bronis R. de Supinski (2)

  1. Department of Computer Science, Virginia Tech, Blacksburg, USA
  2. Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, Livermore, USA
