Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming

Moreton-Fernandez, Ana; Gonzalez-Escribano, Arturo; Llanos, Diego R.

doi:10.1007/s10766-017-0542-x


Published: 09 December 2017

Volume 47, pages 94–113, (2019)

International Journal of Parallel Programming

Ana Moreton-Fernandez¹,
Arturo Gonzalez-Escribano¹ &
Diego R. Llanos¹


Abstract

Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and to simplify hybrid programming that mixes accelerators and CPU cores. However, in many cases, portability compromises efficiency on different devices, and some details concerning the coordination of different types of devices must still be handled by the programmer. In this work, we introduce the Multi-Controller, an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions by simplifying data partition, mapping, and the transparent deployment of both simple generic kernels that are portable across different device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA’s GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing data movements and hiding the launch details. The results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
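The selection behaviour described in the abstract can be illustrated with a minimal sketch. This is not the library's API: the names `Device`, `Kernel`, `partition`, and `launch`, the device kind strings, and the weight-proportional partitioning rule are all hypothetical, and the "devices" are simulated in plain Python. It only shows the general idea of preferring a specialized implementation for a device type and falling back to a generic, portable one.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

GENERIC = "generic"  # hypothetical tag for the portable fallback implementation


@dataclass
class Device:
    name: str
    kind: str    # hypothetical device type tag, e.g. "cuda_gpu" or "cpu_cores"
    weight: int  # assumed relative compute capability, used only for partitioning


@dataclass
class Kernel:
    name: str
    impls: Dict[str, Callable[[List[float]], List[float]]] = field(default_factory=dict)

    def register(self, kind: str, fn: Callable[[List[float]], List[float]]) -> None:
        self.impls[kind] = fn

    def select(self, device: Device) -> Tuple[str, Callable[[List[float]], List[float]]]:
        # Prefer an implementation specialized for this device kind,
        # otherwise fall back to the generic one.
        if device.kind in self.impls:
            return device.kind, self.impls[device.kind]
        return GENERIC, self.impls[GENERIC]


def partition(data: List[float], devices: List[Device]) -> List[List[float]]:
    # Split the data into contiguous chunks proportional to each device weight;
    # the last device takes whatever remains after integer rounding.
    total = sum(d.weight for d in devices)
    chunks, start = [], 0
    for i, dev in enumerate(devices):
        if i == len(devices) - 1:
            end = len(data)
        else:
            end = start + len(data) * dev.weight // total
        chunks.append(data[start:end])
        start = end
    return chunks


def launch(kernel: Kernel, data: List[float], devices: List[Device]):
    # Run the selected implementation on each chunk (sequentially here;
    # a real run-time would launch on the devices and move the data).
    result, chosen = [], []
    for dev, chunk in zip(devices, partition(data, devices)):
        kind, fn = kernel.select(dev)
        chosen.append((dev.name, kind))
        result.extend(fn(chunk))
    return result, chosen


scale = Kernel("scale")
scale.register(GENERIC, lambda xs: [2 * x for x in xs])
# Stand-in for a specialized variant; it computes the same values.
scale.register("cuda_gpu", lambda xs: [x + x for x in xs])

devices = [Device("gpu0", "cuda_gpu", 3), Device("cpu0", "cpu_cores", 1)]
result, chosen = launch(scale, list(range(8)), devices)
```

In this sketch the simulated GPU receives three quarters of the data and uses the registered specialized variant, while the CPU entry has no specialized variant and therefore runs the generic one.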



Acknowledgements

This research has been partially supported by MICINN (Spain), the ERDF program of the European Union and Junta de Castilla y Leon: HomProg-HetSys Project (TIN2014-58876-P), CAPAP-H6 (TIN2016-81840-REDT), COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS), and FEDER Grant VA082P17 (PROPHET Project).

Author information

Authors and Affiliations

Departamento de Informática, Universidad de Valladolid, Valladolid, Spain
Ana Moreton-Fernandez, Arturo Gonzalez-Escribano & Diego R. Llanos

Corresponding author

Correspondence to Ana Moreton-Fernandez.


About this article

Cite this article

Moreton-Fernandez, A., Gonzalez-Escribano, A. & Llanos, D.R. Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming. Int J Parallel Prog 47, 94–113 (2019). https://doi.org/10.1007/s10766-017-0542-x


Received: 19 May 2017
Accepted: 02 December 2017
Published: 09 December 2017
Issue Date: 15 February 2019
DOI: https://doi.org/10.1007/s10766-017-0542-x
