Abstract
Heterogeneity, parallelization and vectorization are key techniques for improving the performance and energy efficiency of modern computing systems. However, programming and maintaining code for these architectures pose a major challenge because architectural complexity keeps increasing. Task-based environments hide most of this complexity, improving scalability and the usage of the available resources. In these environments, a lot of effort has gone into easing parallelization and improving the usage of heterogeneous resources, while vectorization has been treated as a secondary objective. Meanwhile, vector architectures have spread rapidly across all market segments, from embedded to HPC. Vectorization can no longer be ignored, but manual vectorization is tedious, error-prone and impractical for the average programmer. This work evaluates the feasibility of user-directed vectorization in task-based applications. Our evaluation is based on the OmpSs programming model, extended to support user-directed vectorization for different SIMD architectures (i.e., SSE, AVX2, AVX512). Results show that user-directed codes match the performance and energy efficiency of manually optimized code with minimal code modifications, favoring portability across different SIMD architectures.
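As a rough illustration of what user-directed vectorization inside a task can look like, the sketch below uses the standard OpenMP 4.0 `task` and `simd` directives as a stand-in; it is not the OmpSs extension evaluated in this work, whose exact syntax is not given here. The kernel, the function names (`saxpy_block`, `saxpy_tasks`, `saxpy_selfcheck`) and the sizes `N` and `BLOCK` are hypothetical choices made only for this example.

```c
#include <stddef.h>

#define N 1000     /* hypothetical problem size (not a multiple of BLOCK) */
#define BLOCK 256  /* hypothetical number of elements per task */

/* Hypothetical kernel: y = a*x + y over one block of n elements. */
static void saxpy_block(float a, const float *x, float *y, size_t n)
{
    /* User-directed vectorization: the programmer, not the compiler,
       asserts that the iterations are independent and may be executed
       with SIMD instructions. */
    #pragma omp simd
    for (size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

/* Splits the arrays into blocks and creates one task per block. */
static void saxpy_tasks(float a, const float *x, float *y,
                        size_t n, size_t block)
{
    #pragma omp parallel
    #pragma omp single
    {
        for (size_t start = 0; start < n; start += block) {
            size_t len = (n - start < block) ? (n - start) : block;
            /* Each task works on a disjoint block, so no dependences
               between tasks need to be declared in this sketch. */
            #pragma omp task firstprivate(start, len)
            saxpy_block(a, x + start, y + start, len);
        }
        /* Wait for all block tasks before leaving the region. */
        #pragma omp taskwait
    }
}

/* Runs the sketch on small arrays; returns 0 if every element matches
   the value computed by the equivalent scalar expression. */
int saxpy_selfcheck(void)
{
    float x[N], y[N];
    for (size_t i = 0; i < N; ++i) {
        x[i] = (float)i;
        y[i] = 1.0f;
    }
    saxpy_tasks(2.0f, x, y, N, BLOCK);
    for (size_t i = 0; i < N; ++i) {
        if (y[i] != 2.0f * (float)i + 1.0f) {
            return 1;
        }
    }
    return 0;
}
```

If the compiler is invoked without OpenMP support, the pragmas are ignored and the code runs sequentially with the same result. That property is one reason directive-based vectorization requires few code modifications and stays portable across SIMD instruction sets.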
Additional information
This work was done at Barcelona Supercomputing Center (BSC).
Cite this article
Caminal, H., Caballero, D., Cebrián, J.M. et al. Performance and energy effects on task-based parallelized applications. J Supercomput 74, 2627–2637 (2018). https://doi.org/10.1007/s11227-018-2294-9
DOI: https://doi.org/10.1007/s11227-018-2294-9
Keywords
- Data-level parallelism
- Task-level parallelism
- Vectorization
- Energy efficiency