Abstract
In June 2022, Frontier became the first supercomputer to “officially” break the ExaFLOP/s barrier on LINPACK, achieving a peak performance of \(1.1 \times 10^{18}\) floating-point operations per second using AMD Instinct accelerators. Developing high-performance applications for such platforms typically requires the adoption of vendor-specific programming models, which in turn may limit portability. SYCL is a high-level, single-source language based on C++17, developed by the Khronos Group to overcome the shortcomings of those vendor-specific HPC programming models. In this paper we present an initial study of the SYCL parallel programming model and its implementing compilers, to understand its performance and portability, and how these compare to other parallel programming models. We use three major SYCL implementations for our evaluation – Open SYCL (previously hipSYCL), DPC++, and ComputeCpp – on a range of CPU and GPU hardware from Intel, AMD, Fujitsu, Marvell, and NVIDIA. Our results show that for a simple finite-difference mini-application, SYCL can offer performance competitive with native approaches, while for a more complex finite-element mini-application, significant performance degradation is observed. Our findings suggest that development work is required at both the compiler and application level to ensure SYCL is competitive with alternative approaches.
Acknowledgements
Many of the results in this paper were gathered on the Isambard UK National Tier-2 HPC Service (http://gw4.ac.uk/isambard/) operated by GW4 and the UK Met Office, and funded by EPSRC (EP/P020224/1).
Access to the Intel HD Graphics P630 GPU was provided by Intel through the Intel Developer Cloud.
The ExCALIBUR programme (https://excalibur.ac.uk/) is supported by the UKRI Strategic Priorities Fund. The programme is co-delivered by the Met Office and EPSRC in partnership with the Public Sector Research Establishment, the UK Atomic Energy Authority (UKAEA) and UKRI research councils, including NERC, MRC and STFC.
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Shilpage, W.R., Wright, S.A. (2023). An Investigation into the Performance and Portability of SYCL Compiler Implementations. In: Bienz, A., Weiland, M., Baboulin, M., Kruse, C. (eds) High Performance Computing. ISC High Performance 2023. Lecture Notes in Computer Science, vol 13999. Springer, Cham. https://doi.org/10.1007/978-3-031-40843-4_45
DOI: https://doi.org/10.1007/978-3-031-40843-4_45
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-40842-7
Online ISBN: 978-3-031-40843-4
eBook Packages: Computer Science, Computer Science (R0)