Abstract
Given the increasing diversity of multi- and many-core processors, portability is a desirable feature of applications designed and implemented for such platforms. Portability is unanimously seen as a productivity enabler, but it is also considered a major performance blocker. Thus, performance portability has emerged as the property of an application to preserve similar form and similar performance on a set of platforms; a first metric, based on extensive evaluation, has been proposed to quantify performance portability for a given application on a set of given platforms.
In this work, we explore the challenges and limitations of this performance portability metric (PPM) on two levels. We first use 5 OpenACC applications and 3 platforms, and we demonstrate how to compute and interpret PPM in this context. Our results indicate specific challenges in parameter selection and results interpretation. Second, we use controlled experiments to assess the impact of platform-specific optimizations on both performance and performance portability. Our results illustrate, for our 5 OpenACC applications, a clear tension between performance improvement and performance portability improvement.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
Notes
- 1.
All applications use single precision floating point data types, with Diffusion Operator being the only exception.
- 2.
- 3.
Oak Ridge Leadership Computing Facility, https://github.com/olcf/vector_addition_tutorials.
- 4.
yuhc, https://github.com/yuhc/gpu-rodinia [3].
- 5.
In all notations with a subscript m or peak, m stands for measured, and peak represents a form of peak performance.
References
Bal, H., et al.: A medium-scale distributed system for computer science research: infrastructure for the long term. Computer 49(5), 54–63 (2016)
Bauer, S.: Accelerator Offloading mit GCC (in German) (2016). https://www.heise.de/developer/artikel/Accelerator-Offloading-mit-GCC-3317330.html?seite=3. Accessed Apr 2018
Che, S., et al.: Rodinia: a benchmark suite for heterogeneous computing. In: IEEE International Symposium on Workload Characterization, IISWC 2009, pp. 44–54. IEEE (2009)
Fabeiro, J.F.: Tools for improving performance portability in heterogeneous environments. Ph.D. thesis, Department of Computer Engineering, University of A Coruña, July 2017
Fang, J., Varbanescu, A.L., Sips, H.: A comprehensive performance comparison of CUDA and OpenCL. In: 2011 International Conference on Parallel Processing (ICPP), pp. 216–225. IEEE (2011)
Intel. Intel Math Kernel Library. https://software.intel.com/en-us/mkl. Accessed Apr 2018
Martineau, M., McIntosh-Smith, S., Gaudin, W.: Assessing the performance portability of modern parallel programming models using TeaLeaf. Concurrency Comput.: Pract. Exp. 29(15), e4117 (2017)
McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. In: IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, pp. 19–25, December 1995
McIntosh-Smith, S., Boulton, M., Curran, D., Price, J.: On the performance portability of structured grid codes on many-core computer architectures. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2014. LNCS, vol. 8488, pp. 53–75. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-07518-1_4
NVIDIA. cuBLAS. https://developer.nvidia.com/cublas. Accessed Apr 2018
NVIDIA. CUDA C Programming Guide (2018). https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html. Accessed Apr 2018
Pennycook, S.J., Hammond, S.D., Wright, S.A., Herdman, J.A., Miller, I., Jarvis, S.A.: An investigation of the performance portability of OpenCL. J. Parallel Distrib. Comput. 73(11), 1439–1450 (2013)
Pennycook, S.J., Sewall, J.D., Lee, V.W.: A Metric for Performance Portability. arXiv preprint arXiv:1611.07409 (2016)
Pennycook, S.J., Sewall, J.D., Lee, V.W.: A metric for performance portability. CoRR, abs/1611.07409 (2016)
Pennycook, S.J., Sewall, J.D., Lee, V.W.: Implications of a metric for performance portability. Future Gener. Comput. Syst. (2017)
Rul, S., Vandierendonck, H., D’Haene, J., De Bosschere, K.: An experimental study on performance portability of OpenCL kernels. In: 2010 Symposium on Application Accelerators in High Performance Computing (SAAHPC 2010) (2010)
Shen, J., Fang, J., Sips, H., Varbanescu, A.L.: Performance gaps between OpenMP and OpenCL for multi-core CPUs. In: Proceedings of the 2012 41st International Conference on Parallel Processing Workshops, ICPPW 2012, Washington, DC, USA, pp. 116–125. IEEE Computer Society (2012)
Stratton, J.A., Kim, H., Jablin, T.B., Hwu, W.W.: Performance portability in accelerated parallel kernels. Center for Reliable and High-Performance Computing (2013)
UK-MAC. TeaLeaf (2017). http://uk-mac.github.io/TeaLeaf/
van der Sanden, J.: Evaluating the performance portability of OpenCL. Master’s thesis, Eindhoven University of Technology, The Netherlands (2011)
Williams, S., Waterman, A., Patterson, D.: Roofline: an insightful visual performance model for multicore architectures. Commun. ACM 52(4), 65–76 (2009)
Zhang, Y., Sinclair, M., Chien, A.A.: Improving performance portability in OpenCL programs. In: Kunkel, J.M., Ludwig, T., Meuer, H.W. (eds.) ISC 2013. LNCS, vol. 7905, pp. 136–150. Springer, Heidelberg (2013). https://doi.org/10.1007/978-3-642-38750-0_11
Acknowledgements
We would like to thank Jason Sewall and John Pennycook for their help in designing our experiments and interpreting the results.
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer Nature Switzerland AG
About this paper
Cite this paper
Dreuning, H., Heirman, R., Varbanescu, A.L. (2018). A Beginner’s Guide to Estimating and Improving Performance Portability. In: Yokota, R., Weiland, M., Shalf, J., Alam, S. (eds) High Performance Computing. ISC High Performance 2018. Lecture Notes in Computer Science(), vol 11203. Springer, Cham. https://doi.org/10.1007/978-3-030-02465-9_52
Download citation
DOI: https://doi.org/10.1007/978-3-030-02465-9_52
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-02464-2
Online ISBN: 978-3-030-02465-9
eBook Packages: Computer ScienceComputer Science (R0)