Skip to main content

A CUDA Implementation of the High Performance Conjugate Gradient Benchmark

  • Conference paper
  • First Online:
High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation (PMBS 2014)

Abstract

The High Performance Conjugate Gradient (HPCG) benchmark has been recently proposed as a complement to the High Performance Linpack (HPL) benchmark currently used to rank supercomputers in the Top500 list. This new benchmark solves a large sparse linear system using a multigrid preconditioned conjugate gradient (PCG) algorithm. The PCG algorithm contains the computational and communication patterns prevalent in the numerical solution of partial differential equations and is designed to better represent modern application workloads which rely more heavily on memory system and network performance than HPL. GPU accelerated supercomputers have proved to be very effective, especially with regard to power efficiency, for accelerating compute intensive applications like HPL. This paper will present the details of a CUDA implementation of HPCG, and the results obtained at full scale on the largest GPU supercomputers available: the Cray XK7 at ORNL and the Cray XC30 at CSCS. The results indicate that GPU accelerated supercomputers are also very effective for this type of workload.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Dongarra, J., Heroux, M.A.: Toward a New Metric for Ranking High Performance Computing Systems. Sandia report SAND2013-4744 (2013)

    Google Scholar 

  2. Dongarra, J., Luszczek, P.: Introduction to the HPC challenge benchmark Suite, ICL Technical report, ICL-UT-05-01, (Also appears as CS Dept. Tech report UT-CS-05-544) (2005)

    Google Scholar 

  3. Heroux, M.A., Dongarra, J., Luszczek, P: HPCG Technical specification, Sandia report SAND2013-8752 (2013)

    Google Scholar 

  4. Graph 500. http://www.graph500.org

  5. Green 500. http://www.green500.org

  6. CUDA Toolkit. http://developer.nvidia.com/cuda-toolkit

  7. CUDA Fortran. http://www.pgroup.com/resources/cudafortran.htm

  8. CUBLAS Library. http://docs.nvidia.com/cuda/cublas

  9. CUSPARSE Library. http://docs.nvidia.com/cuda/cusparse

  10. THRUST Library. http://docs.nvidia.com/cuda/thrust

  11. http://devblogs.nvidia.com/parallelforall/cuda-pro-tip-generate-custom-application-profile-timelines-nvtx/

  12. Barrett, R.F., Heroux, M.A., Lin, P.T., Vaughan, C.T., Williams, A.B.: Poster: mini-applications: vehicles for co-design. In: Proceedings of the 2011 Companion on High Performance Computing Networking, Storage and Analysis Companion (SC 2011 Companion), pp. 1–2. ACM, New York (2011)

    Google Scholar 

  13. Golub, G.H., Van Loan, C.F.: Matrix Computations, 3rd edn. John Hopkins University Press, USA (1996)

    MATH  Google Scholar 

  14. Briggs, W.L., Henson, V.E., McCormick, S.F.: A Multigrid Tutorial. SIAM, USA (2000)

    Book  MATH  Google Scholar 

  15. Green 500: Energy efficient HPC System workloads power measurement methodology (2013)

    Google Scholar 

  16. McCalpin, J.D.: Memory bandwidth and machine balance in current high performance computers. In: IEEE Computer Society Technical Committee on Computer Architecture (TCCA) Newsletter, December 1995

    Google Scholar 

  17. Phillips, E.H., Fatica, M.: Implementing the Himeno benchmark with CUDA on GPU clusters. In: IEEE International Symposium on Parallel & Distributed Processing IPDPS, pp. 1–10 (2010)

    Google Scholar 

  18. Park, J., Smelyanskiy, M.: Optimizing Gauss-Seidel Smoother in HPCG. In: ASCR HPCG Workshop, Bethesda MD, 25 March 2014

    Google Scholar 

  19. Luby, M.: A simple parallel algorithm for the maximal independent set problem. SIAM J. Comput. 15(4), 1036–1053 (1986)

    Article  MATH  MathSciNet  Google Scholar 

  20. Jones, M.T., Plassmann, P.E.: A parallel graph coloring heuristic. SIAM J. Sci. Comput. 14, 654–669 (1992)

    Article  MathSciNet  Google Scholar 

  21. Cohen, J., Castonguay, P.: Efficient graph matching and coloring on the GPU. In: GPU Technology Conference, San Jose CA, 14–17 May 2012. http://ondemand.gputechconf.com/gtc/2012/presentations/S0332-Efficient-Graph-Matching-and-Coloring-on-GPUs.pdf

Download references

Acknowledgments

This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725. We wish to thank Buddy Bland, Jack Wells and Don Maxwell of Oak Ridge National Laboratory for their support. This work was also supported by a grant from the Swiss National Supercomputing Centre (CSCS) under project ID g33. We also want to acknowledged the support from Gilles Fourestey and Thomas Schulthess at CSCS. We wish to thank Lung Scheng Chien and Jonathan Cohen at NVIDIA for relevant discussions.

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Everett Phillips .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Phillips, E., Fatica, M. (2015). A CUDA Implementation of the High Performance Conjugate Gradient Benchmark. In: Jarvis, S., Wright, S., Hammond, S. (eds) High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation. PMBS 2014. Lecture Notes in Computer Science(), vol 8966. Springer, Cham. https://doi.org/10.1007/978-3-319-17248-4_4

Download citation

  • DOI: https://doi.org/10.1007/978-3-319-17248-4_4

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-17247-7

  • Online ISBN: 978-3-319-17248-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics