TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities

Ortega-Arranz, Hector; Torres, Yuri; Gonzalez-Escribano, Arturo; Llanos, Diego R.

doi:10.1007/s10766-015-0349-6


Published: 27 February 2015

Volume 43, pages 939–960 (2015)

International Journal of Parallel Programming



Abstract

During the last decade, parallel processing architectures have become a powerful tool for massively parallel problems that require high performance computing (HPC). The latest trend in HPC is the use of heterogeneous environments, which combine different computational processing devices such as CPU cores and graphics processing units (GPUs). Maximizing the performance of a GPU parallel implementation of an algorithm requires in-depth knowledge of the underlying GPU architecture, which makes it a tedious manual effort suited only to experienced programmers. In this paper, we present TuCCompi, a multi-layer abstract model that simplifies programming on heterogeneous systems that include hardware accelerators by hiding the details of synchronization, deployment, and tuning. TuCCompi chooses optimal values for the accelerators' configuration parameters using a kernel characterization provided by the programmer. The model is well suited to problems composed of independent tasks with a high computational load, such as embarrassingly parallel problems. We have evaluated TuCCompi in different real-world heterogeneous environments using the all-pair shortest-path problem as a case study.
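
To make the notion of independent tasks concrete, the following minimal sketch treats the all-pair shortest-path problem as one single-source Dijkstra run per source node, with the runs handed to a pool of workers. It is an illustration only and not TuCCompi's interface: the function names, the dictionary-based graph layout, and the use of a Python thread pool as a stand-in for heterogeneous devices are all assumptions made for this example.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def dijkstra(graph, source):
    """Single-source shortest paths.

    graph maps each node to a dict {neighbor: non-negative weight}.
    (Hypothetical layout chosen for this sketch.)
    """
    dist = {node: float("inf") for node in graph}
    dist[source] = 0
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale queue entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist[v]:
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def all_pairs_shortest_paths(graph, workers=4):
    """Run one independent Dijkstra task per source node.

    The tasks share no mutable state, so they can be assigned to
    workers in any order (the embarrassingly parallel property).
    """
    sources = list(graph)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda s: dijkstra(graph, s), sources)
        return dict(zip(sources, results))


# Small directed example graph (made up for illustration).
graph = {
    "a": {"b": 1, "c": 4},
    "b": {"c": 2},
    "c": {"a": 1},
}
apsp = all_pairs_shortest_paths(graph)
print(apsp["a"]["c"])  # shortest distance from a to c
```

Because each source is processed in isolation, the only coordination needed is collecting the per-source results, which is the kind of detail a model like the one described above would aim to hide from the programmer.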



Acknowledgments

The authors would like to thank Javier Ramos López for his support with technical issues. This research has been partially supported by Ministerio de Economía y Competitividad and ERDF program of the European Union: CAPAP-H5 network (TIN2014-53522-REDT), MOGECOPP project (TIN2011-25639); Junta de Castilla y León (Spain): ATLAS project (VA172A12-2); and the COST Program Action IC1305: NESUS.

Author information

Authors and Affiliations

Departamento de Informática, Universidad de Valladolid, Valladolid, Spain
Hector Ortega-Arranz, Yuri Torres, Arturo Gonzalez-Escribano & Diego R. Llanos


Corresponding author

Correspondence to Hector Ortega-Arranz.


About this article

Cite this article

Ortega-Arranz, H., Torres, Y., Gonzalez-Escribano, A. et al. TuCCompi: A Multi-layer Model for Distributed Heterogeneous Computing with Tuning Capabilities. Int J Parallel Prog 43, 939–960 (2015). https://doi.org/10.1007/s10766-015-0349-6


Received: 04 July 2014
Accepted: 03 February 2015
Published: 27 February 2015
Issue Date: October 2015
DOI: https://doi.org/10.1007/s10766-015-0349-6
