Skip to main content

Performance and Portability of a Linear Solver Across Emerging Architectures

  • Conference paper
  • First Online:
Accelerator Programming Using Directives (WACCPD 2020)

Part of the book series: Lecture Notes in Computer Science ((LNPSE,volume 12655))

Included in the following conference series:

  • 348 Accesses

Abstract

A linear solver algorithm used by a large-scale unstructured-grid computational fluid dynamics application is examined for a broad range of familiar and emerging architectures. Efficient implementation of a linear solver is challenging on recent CPUs offering vector architectures. Vector loads and stores are essential to effectively utilize available memory bandwidth on CPUs, and maintaining performance across different CPUs can be difficult in the face of varying vector lengths offered by each. A similar challenge occurs on GPU architectures, where it is essential to have coalesced memory accesses to utilize memory bandwidth effectively. In this work, we demonstrate that restructuring a computation, and possibly data layout, with regard to architecture is essential to achieve optimal performance by establishing a performance benchmark for each target architecture in a low level language such as vector intrinsics or CUDA. In doing so, we demonstrate how a linear solver kernel can be mapped to Intel® Xeon™ and Xeon Phi™, Marvell® ThunderX2®, NEC® SX-Aurora™ TSUBASA Vector Engine, and NVIDIA® and AMD® GPUs. We further demonstrate that the required code restructuring can be achieved in higher level programming environments such as OpenACC, OCCA, and Intel® OneAPI™/SYCL, and that each generally results in optimal performance on the target architecture. Relative performance metrics for all implementations are shown, and subjective ratings for ease of implementation and optimization are suggested.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

References

  1. OpenACC. https://www.openacc.org. Accessed 24 Aug 2020

  2. OpenMP. https://www.openmp.org. Accessed 24 Aug 2020

  3. The MPI Forum Website. http://www.mpi-forum.org. Accessed 24 Aug 2020

  4. AMD Incorporated: AMD Radeon Instinct MI50 Accelerator. https://www.amd.com/en/products/professional-graphics/instinct-mi50. Accessed 24 Aug 2020

  5. AMD Incorporated: HIP Porting Guide. https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-porting-guide.html. Accessed 24 Aug 2020

  6. AMD Incorporated: HIP Programming Guide. https://rocm-documentation.readthedocs.io/en/latest/Programming_Guides/HIP-GUIDE.html. Accessed 24 Aug 2020

  7. Biedron, R., et al.: FUN3D Manual 13.6. NASA/TM-2019-220416 (2019)

    Google Scholar 

  8. Codeplay: Codeplay Contribution to DPC++ Brings SYCL Support for NVIDIA GPUs. https://www.codeplay.com/portal/news/2020/02/03/codeplay-contribution-to-dpcpp-brings-sycl-support-for-nvidia-gpus.html. Accessed 24 Aug 2020

  9. Intel Corporation: Intel oneAPI DPC++ Compiler (Beta). https://software.intel.com/content/www/us/en/develop/tools/oneapi/components/dpc-compiler.html. Accessed 24 Aug 2020

  10. Intel Corporation: Intrinsics Guide. https://software.intel.com/sites/landingpage/IntrinsicsGuide/. Accessed 24 Aug 2020

  11. Khronos Group: OpenCL. https://www.khronos.org/opencl/. Accessed 24 Aug 2020

  12. Khronos Group: SYCL. https://www.khronos.org/sycl/. Accessed 24 Aug 2020

  13. Kincaid, D.R., Oppe, T.C., Young, D.M.: ITPACKV 2D User’s Guide, May 1989

    Google Scholar 

  14. Korzun, A., et al.: Effects of Spatial Resolution on Retropropulsion Aerodynamics in an Atmospheric Environment. AIAA SciTech Forum (2020)

    Google Scholar 

  15. Kreutzer, M., Hager, G., Wellein, G., Fehske, H., Bishop, A.R.: A unified sparse matrix data format for efficient general sparse matrix-vector multiplication on modern processors with wide SIMD units. SIAM J. Sci. Comput. 36(5), C401–C423 (2014). https://doi.org/10.1137/130930352

    Article  MathSciNet  MATH  Google Scholar 

  16. Laflin, K.R., et al.: Data summary from second AIAA computational fluid dynamics drag prediction workshop. J. Aircraft 42(5), 1165–1178 (2005)

    Article  Google Scholar 

  17. Medina, D.S., St-Cyr, A., Warburton, T.: OCCA: A Unified Approach to Multi-Threading Languages. arXiv preprint arXiv:1403.0968 (2014)

  18. NEC Corporation: SX-Aurora TSUBASA Fortran Compiler User’s Guide. https://www.hpc.nec/documents/sdk/pdfs/g2af02e-FortranUsersGuide-018.pdf. Accessed 24 Aug 2020

  19. NEC Corporation: SX-Aurora TSUBASA VEOS NUMA Mode Guide for Partitioning Mode. https://www.hpc.nec/documents/guide/pdfs/VEOS_NUMA_Mode4PartitioningMode_E.pdf. Accessed 24 Aug 2020

  20. Nielsen, E.J., Diskin, B.: High-performance aerodynamic computations for aerospace applications. Parallel Comput. 64, 20–32 (2017)

    Article  MathSciNet  Google Scholar 

  21. NVIDIA Corporation: cuBLAS. https://developer.nvidia.com/cublas. Accessed 24 Aug 2020

  22. NVIDIA Corporation: CUDA C Programming Guide. http://docs.nvidia.com/cuda/cuda-c-programming-guide/#axzz4Hicq83a9. Accessed 24 Aug 2020

  23. NVIDIA Corporation: cuSPARSE. https://developer.nvidia.com/cusparse. Accessed 24 Aug 2020

  24. Oak Ridge National Laboratory: Exascale System Expected to be World’s Most Powerful Computer for Science and Innovation. https://www.olcf.ornl.gov/2019/05/07/no-scaling-back-doe-cray-amd-to-bring-exascale-to-ornl/. Accessed 24 Aug 2020

  25. Saad, Y.: Iterative Methods for Sparse Linear Systems, 2nd edn. Society for Industrial and Applied Mathematics, Philadelphia (2003)

    Google Scholar 

  26. ANANDTECH: Assessing Cavium’s ThunderX2: The Arm Server Dream Realized At Last (2018). https://www.anandtech.com/show/12694/assessing-cavium-thunderx2-arm-server-reality

  27. Walden, A., Nielsen, E., Diskin, B., Zubair, M.: A mixed precision multicolor point-implicit solver for unstructured grids on GPUs. In: Proceedings of the Ninth Workshop on Irregular Applications: Architectures and Algorithms, IA3 2019, Los Alamitos, CA, USA, pp. 23–30. IEEE Press (2019)

    Google Scholar 

  28. Zubair, M., Nielsen, E., Luitjens, J., Hammond, D.: An optimized multicolor point-implicit solver for unstructured grid applications on graphics processing units. In: Proceedings of the Sixth Workshop on Irregular Applications: Architectures and Algorithms, IA3 2016, Piscataway, NJ, USA, pp. 18–25. IEEE Press (2016)

    Google Scholar 

Download references

Acknowledgments

The authors would like to express their appreciation to the following people for many helpful conversations pertaining to the current work: Justin Luitjens (NVIDIA Corporation), Erich Focht and Rudolf Fischer (NEC Corporation), John Linford (Arm Limited), Tim Warburton (Department of Mathematics, Virginia Tech); Noel Chalmers (AMD Incorporated), Sameer Shende (Department of Computer and Information Science, University of Oregon), and Jeff Hammond, Varsha Madananth, and Kevin O’Leary (Intel Corporation). The authors also wish to thank the High Performance Computing Incubator at the NASA Langley Research Center and the NASA Headquarters Office of Chief Engineer Research and Analysis program for providing support for this work. The support of Dr. Mujeeb Malik, Technical Lead for the Revolutionary Computational Aerosciences subproject within the NASA Aeronautics Research Mission Directorate Transformational Tools and Technologies Project, is also acknowledged.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Aaron C. Walden .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Walden, A.C., Zubair, M., Nielsen, E.J. (2021). Performance and Portability of a Linear Solver Across Emerging Architectures. In: Bhalachandra, S., Wienke, S., Chandrasekaran, S., Juckeland, G. (eds) Accelerator Programming Using Directives. WACCPD 2020. Lecture Notes in Computer Science(), vol 12655. Springer, Cham. https://doi.org/10.1007/978-3-030-74224-9_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-74224-9_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-74223-2

  • Online ISBN: 978-3-030-74224-9

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics