Hybrid static–dynamic selection of implementation alternatives in heterogeneous environments

  • D. del Rio Astorga
  • Manuel F. Dolz (corresponding author)
  • Javier Fernandez
  • Javier Garcia Blas


With the emergence of heterogeneous architectures, developing parallel software has become an increasingly complex task. The ability to use multiple devices in a single application, such as CPUs, accelerators, or coprocessors, has made implementation and optimization challenging. The inherent complexity of the parallel algorithm, its multiple implementations, and the possible mappings onto the available processors are just some examples of how intricate these tasks can become. To alleviate these issues, this paper proposes a hybrid static–dynamic selector to better exploit the resources provided by heterogeneous systems. Specifically, the framework generates a decision tree at compile time, based on historical information, which is then used at run-time to select the implementation that performs best. To evaluate the benefits of this approach, we analyze performance with two use cases: general matrix–matrix multiplication and a medical image processing application. The experimental results demonstrate that the proposed selector enhances performance and reduces the effort needed to tune applications. We show that our solution improves overall application performance by 10 to 24% compared with another similar approach.
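The selection idea described above can be illustrated with a minimal sketch. It is not the authors' framework: the implementation names (`cpu_blas`, `gpu_blas`), the timings, and the single feature (problem size) are all hypothetical, and a sorted threshold table stands in for the decision tree. The offline step derives rules from historical runs; the run-time step only looks up the rule that matches the current input.

```python
from bisect import bisect_left

# Hypothetical historical timings: implementation name -> {problem size: seconds}.
# These numbers are illustrative only and are not measurements from the paper.
HISTORY = {
    "cpu_blas": {64: 0.001, 256: 0.020, 1024: 1.50, 4096: 95.0},
    "gpu_blas": {64: 0.005, 256: 0.015, 1024: 0.30, 4096: 12.0},
}


def build_selector(history):
    """Offline ("static") step: derive per-size rules from past runs.

    Returns a list of (size, fastest_implementation) pairs sorted by size,
    a one-feature stand-in for a decision tree.
    """
    sizes = sorted({s for runs in history.values() for s in runs})
    rules = []
    for size in sizes:
        # Only compare implementations that were actually measured at this size.
        timings = {impl: runs[size] for impl, runs in history.items() if size in runs}
        rules.append((size, min(timings, key=timings.get)))
    return rules


def select(rules, size):
    """Run-time ("dynamic") step: use the rule for the smallest recorded
    size that is >= `size`, falling back to the largest recorded size."""
    keys = [s for s, _ in rules]
    idx = min(bisect_left(keys, size), len(rules) - 1)
    return rules[idx][1]


RULES = build_selector(HISTORY)
```

With these made-up timings, `select(RULES, 32)` picks `cpu_blas` and `select(RULES, 2000)` picks `gpu_blas`. A real selector would likely use more features than problem size, such as device availability or data placement.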


Keywords: Implementation selector; Heterogeneous platforms; Auto-tuning



This work has been partially supported by the EU Project ICT 644235 “RePhrase: REfactoring Parallel Heterogeneous Resource-Aware Applications” and the Project TIN2016-79637-P “Towards Unification of HPC and Big Data Paradigms” from the Spanish “Ministerio de Economía y Competitividad”.



Copyright information

© Springer Science+Business Media, LLC 2017

Authors and Affiliations

  • D. del Rio Astorga (1)
  • Manuel F. Dolz (1, corresponding author)
  • Javier Fernandez (1)
  • Javier Garcia Blas (1)

  1. Department of Computer Science, Universidad Carlos III, Leganés, Spain
