Closing the Performance Gap with Modern C++

Heller, Thomas; Kaiser, Hartmut; Diehl, Patrick; Fey, Dietmar; Schweitzer, Marc Alexander

doi:10.1007/978-3-319-46079-6_2

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9945)

Included in the following conference series:

International Conference on High Performance Computing

Abstract

On the way to Exascale, programmers face the increasing challenge of having to support multiple hardware architectures from the same code base. At the same time, code portability and performance portability are increasingly difficult to achieve as hardware architectures become more and more diverse. Today's heterogeneous systems often include two or more completely distinct and incompatible hardware execution models, such as GPGPUs, SIMD vector units, and general-purpose cores, which conventionally have to be programmed using separate tool chains that represent non-overlapping programming models. The recent revival of interest in the C++ language, both in industry and in the wider community, has spurred a remarkable number of standardization proposals and technical specifications in the area of concurrency and parallelism. Recently, this has included a growing discussion of the need for a uniform, higher-level abstraction and programming model for parallelism in the C++ standard that targets heterogeneous and distributed computing. Such an abstraction should blend seamlessly with existing, already standardized language and library features, yet be generic enough to support future hardware developments. In this paper, we present the results of developing such a higher-level programming abstraction for parallelism in C++, which aims to enable code and performance portability over a wide range of architectures and for various types of parallelism. We present and compare performance data obtained from running the well-known STREAM benchmark ported to our higher-level C++ abstraction with the corresponding results from running it natively. We show that our abstractions deliver performance at least as good as the comparable baseline benchmarks while providing a uniform programming API on all compared target architectures.
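
The core idea of the abstract, writing a kernel once against a uniform API and choosing the execution target separately, can be illustrated with the four STREAM kernels (Copy, Scale, Add, Triad). This is a minimal sketch under stated assumptions, not the authors' HPX implementation: the sequential_executor type and its bulk_execute member are hypothetical, loosely in the spirit of the executor proposals for ISO C++, and only a single-threaded executor is provided. A multi-core or GPU executor exposing the same member could, in principle, be substituted without touching the kernels.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical executor type (not an HPX or standard API): runs f(i) for
// every i in [0, n) on the calling thread. A multi-core or GPU executor would
// provide the same bulk_execute member, so the kernels below stay unchanged.
struct sequential_executor {
    template <typename F>
    void bulk_execute(std::size_t n, F&& f) const {
        for (std::size_t i = 0; i < n; ++i) f(i);
    }
};

using vec = std::vector<double>;

// STREAM Copy: c[i] = a[i]
template <typename Executor>
void stream_copy(const Executor& ex, const vec& a, vec& c) {
    assert(a.size() == c.size());
    ex.bulk_execute(a.size(), [&](std::size_t i) { c[i] = a[i]; });
}

// STREAM Scale: b[i] = q * c[i]
template <typename Executor>
void stream_scale(const Executor& ex, double q, const vec& c, vec& b) {
    assert(c.size() == b.size());
    ex.bulk_execute(c.size(), [&](std::size_t i) { b[i] = q * c[i]; });
}

// STREAM Add: c[i] = a[i] + b[i]
template <typename Executor>
void stream_add(const Executor& ex, const vec& a, const vec& b, vec& c) {
    assert(a.size() == b.size() && b.size() == c.size());
    ex.bulk_execute(a.size(), [&](std::size_t i) { c[i] = a[i] + b[i]; });
}

// STREAM Triad: a[i] = b[i] + q * c[i]
template <typename Executor>
void stream_triad(const Executor& ex, double q, const vec& b, const vec& c, vec& a) {
    assert(a.size() == b.size() && b.size() == c.size());
    ex.bulk_execute(a.size(), [&](std::size_t i) { a[i] = b[i] + q * c[i]; });
}

// Bytes STREAM counts per kernel call: Copy and Scale touch two arrays,
// Add and Triad touch three (each element is one double).
constexpr std::size_t stream_bytes(std::size_t arrays_touched, std::size_t n) {
    return arrays_touched * n * sizeof(double);
}
```

Measured bandwidth would be stream_bytes(...) divided by the elapsed time of one kernel call; timing, repetition, and the array sizing that STREAM requires (arrays much larger than the caches) are deliberately left out of this sketch. The design choice is that the kernels depend only on the bulk_execute interface, so the architecture-specific code is confined to the executor.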

Acknowledgement

This work is supported by the NSF awards 1240655 (STAR), 1447831 (PXFS), and 1339782 (STORM), and the DoE award DE-SC0008714 (XPRESS) and by the European Union’s Horizon 2020 research and innovation program under grant agreement No 671603.

Author information

Authors and Affiliations

Computer Science 3, Computer Architectures, Friedrich-Alexander-University, Erlangen, Germany
Thomas Heller & Dietmar Fey
Center for Computation and Technology, Louisiana State University, Baton Rouge, USA
Hartmut Kaiser
Institute for Numerical Simulation, University of Bonn, Bonn, Germany
Patrick Diehl & Marc Alexander Schweitzer
Meshfree Multiscale Methods, Fraunhofer SCAI, Schloss Birlinghoven, Sankt Augustin, Germany
Marc Alexander Schweitzer
The STELLAR Group, Baton Rouge, USA
Thomas Heller, Hartmut Kaiser & Patrick Diehl

Corresponding author

Correspondence to Thomas Heller.

Editor information

Editors and Affiliations

University of Delaware, Newark, Delaware, USA
Michela Taufer
Forschungszentrum Jülich, Jülich, Germany
Bernd Mohr
DKRZ, Hamburg, Germany
Julian M. Kunkel

About this paper

Cite this paper

Heller, T., Kaiser, H., Diehl, P., Fey, D., Schweitzer, M.A. (2016). Closing the Performance Gap with Modern C++. In: Taufer, M., Mohr, B., Kunkel, J. (eds) High Performance Computing. ISC High Performance 2016. Lecture Notes in Computer Science, vol. 9945. Springer, Cham. https://doi.org/10.1007/978-3-319-46079-6_2

DOI: https://doi.org/10.1007/978-3-319-46079-6_2
Published: 06 October 2016
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-46078-9
Online ISBN: 978-3-319-46079-6
eBook Packages: Computer Science, Computer Science (R0)