Abstract
Current HPC clusters are composed of several machines with different computation capabilities and different kinds and families of accelerators. Programming efficiently for these heterogeneous systems has become an important challenge. There are many proposals to simplify the programming and management of accelerator devices, and hybrid programming that mixes accelerators and CPU cores. However, in many cases, portability compromises efficiency on different devices, and details concerning the coordination of different types of devices must still be handled by the programmer. In this work, we introduce the Multi-Controller, an abstract entity implemented in a library that coordinates the management of heterogeneous devices, including accelerators with different capabilities and sets of CPU cores. Our proposal improves on state-of-the-art solutions by simplifying data partition, mapping, and the transparent deployment of both simple generic kernels that are portable across device types and specialized implementations defined and optimized using specific native or vendor programming models (such as CUDA for NVIDIA’s GPUs, or OpenMP for CPU cores). The run-time system automatically selects and deploys the most appropriate implementation of each kernel for each device, managing data movements and hiding the launch details. The results of an experimental study with five case studies indicate that our abstraction allows the development of flexible and highly efficient programs that adapt to the heterogeneous environment.
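The selection step described in the abstract (one generic kernel, optional specialized implementations, and a run-time choice per device) can be illustrated with a small sketch. This is not the library's API: the names `Device`, `Kernel`, `partition`, and `launch`, the device-kind labels, and the weight-based partition are all hypothetical, and plain Python functions stand in for CUDA or OpenMP kernels.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

GENERIC = "generic"  # hypothetical tag for the portable fallback implementation


@dataclass
class Device:
    name: str
    kind: str  # illustrative labels such as "cuda_gpu" or "cpu_cores"


@dataclass
class Kernel:
    name: str
    impls: Dict[str, Callable[[List[float]], List[float]]] = field(default_factory=dict)

    def register(self, kind: str, fn: Callable[[List[float]], List[float]]) -> None:
        """Attach an implementation for one device kind (or GENERIC)."""
        self.impls[kind] = fn

    def select(self, device: Device) -> Tuple[str, Callable[[List[float]], List[float]]]:
        """Prefer a specialized implementation; otherwise use the generic one."""
        if device.kind in self.impls:
            return device.kind, self.impls[device.kind]
        return GENERIC, self.impls[GENERIC]


def partition(data: List[float], weights: List[int]) -> List[List[float]]:
    """Split data into contiguous chunks proportional to the given weights."""
    total = sum(weights)
    bounds, acc = [0], 0
    for w in weights:
        acc += w
        bounds.append(round(len(data) * acc / total))
    return [data[bounds[i]:bounds[i + 1]] for i in range(len(weights))]


def launch(kernel: Kernel, devices: List[Device], weights: List[int], data: List[float]):
    """Run the selected implementation on each device's chunk (sequentially here)."""
    result, chosen = [], []
    for dev, chunk in zip(devices, partition(data, weights)):
        tag, fn = kernel.select(dev)
        chosen.append((dev.name, tag))
        result.extend(fn(chunk))
    return result, chosen


# Usage: one generic kernel plus a stand-in for a GPU-specialized version.
scale = Kernel("scale")
scale.register(GENERIC, lambda xs: [2.0 * x for x in xs])
scale.register("cuda_gpu", lambda xs: [x + x for x in xs])

devices = [Device("gpu0", "cuda_gpu"), Device("cpu0", "cpu_cores")]
out, chosen = launch(scale, devices, weights=[3, 1], data=[1.0, 2.0, 3.0, 4.0])
```

In this sketch the GPU receives three quarters of the data and uses its specialized implementation, while the CPU falls back to the generic kernel. A real run-time would also handle data transfers and asynchronous launches, which are omitted here.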
Acknowledgements
This research has been partially supported by MICINN (Spain), the ERDF program of the European Union and Junta de Castilla y Leon: HomProg-HetSys Project (TIN2014-58876-P), CAPAP-H6 (TIN2016-81840-REDT), COST Program Action IC1305: Network for Sustainable Ultrascale Computing (NESUS), and FEDER Grant VA082P17 (PROPHET Project).
Cite this article
Moreton-Fernandez, A., Gonzalez-Escribano, A. & Llanos, D.R. Multi-device Controllers: A Library to Simplify Parallel Heterogeneous Programming. Int J Parallel Prog 47, 94–113 (2019). https://doi.org/10.1007/s10766-017-0542-x
DOI: https://doi.org/10.1007/s10766-017-0542-x