Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming

International Journal of Parallel Programming

Abstract

Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming these heterogeneous systems efficiently has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, as well as hybrid programming that mixes accelerators and CPU cores. However, in many cases, portability compromises efficiency on different devices, and details concerning the coordination of different types of devices must still be handled by the programmer. In this work, we introduce the Multi-Controller, an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions by simplifying data partition, mapping, and the transparent deployment of both simple generic kernels that are portable across different device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA’s GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing data movements and hiding the launch details. The results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
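
The selection behaviour described in the abstract can be illustrated with a small sketch. This is not the library's API: every name below (`Device`, `KernelRegistry`, `partition`, `launch`, the `"generic"` key, and the `weight` field) is hypothetical. It assumes that a specialized implementation is preferred when one is registered for a device type, that a portable generic kernel is the fallback, and that data is split in proportion to a per-device weight. The devices are processed one after another here, whereas a real run-time would drive them concurrently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Device:
    name: str
    kind: str      # hypothetical device-type tag, e.g. "cuda_gpu" or "cpu_cores"
    weight: float  # hypothetical relative capability, used only for partitioning


class KernelRegistry:
    """Maps (kernel name, device kind) to an implementation."""

    def __init__(self):
        self._impls = {}

    def register(self, kernel, kind, fn):
        self._impls[(kernel, kind)] = fn

    def select(self, kernel, device):
        # Prefer an implementation specialized for this device kind,
        # otherwise fall back to the portable generic kernel.
        specialized = self._impls.get((kernel, device.kind))
        if specialized is not None:
            return specialized
        return self._impls[(kernel, "generic")]


def partition(n, devices):
    """Split range(n) into contiguous chunks proportional to device weights."""
    total = sum(d.weight for d in devices)
    bounds, start, acc = [], 0, 0.0
    for i, dev in enumerate(devices):
        acc += dev.weight
        # The last device takes the remainder so that every index is covered.
        end = n if i == len(devices) - 1 else round(n * acc / total)
        bounds.append((start, end))
        start = end
    return bounds


def launch(registry, kernel, devices, data):
    """Run the selected implementation on each device's chunk, in order."""
    out = []
    for dev, (lo, hi) in zip(devices, partition(len(data), devices)):
        impl = registry.select(kernel, dev)
        out.extend(impl(data[lo:hi]))
    return out


def scale_generic(chunk):
    # Portable kernel: doubles every element.
    return [2 * x for x in chunk]


def scale_cuda(chunk):
    # Stand-in for a natively optimized version with the same semantics.
    return [x + x for x in chunk]


registry = KernelRegistry()
registry.register("scale", "generic", scale_generic)
registry.register("scale", "cuda_gpu", scale_cuda)

devices = [Device("gpu0", "cuda_gpu", 3.0), Device("cpu", "cpu_cores", 1.0)]
result = launch(registry, "scale", devices, list(range(8)))
```

In this sketch the GPU entry receives three quarters of the data and runs the specialized function, while the CPU entry has no specialized version registered and therefore uses the generic one. The caller only names the kernel and never chooses an implementation.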

Acknowledgements

This research has been partially supported by MICINN (Spain), the ERDF program of the European Union and Junta de Castilla y Leon: HomProg-HetSys Project (TIN2014-58876-P), CAPAP-H6 (TIN2016-81840-REDT), COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS), and FEDER Grant VA082P17 (PROPHET Project).

Author information

Corresponding author

Correspondence to Ana Moreton-Fernandez.

About this article

Cite this article

Moreton-Fernandez, A., Gonzalez-Escribano, A. & Llanos, D.R. Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming. Int J Parallel Prog 47, 94–113 (2019). https://doi.org/10.1007/s10766-017-0542-x

  • DOI: https://doi.org/10.1007/s10766-017-0542-x
