Using Mixed Precision Algorithm for LINPACK Benchmark on AMD GPU

Part of the book series: Lecture Notes in Earth System Sciences (LNESS)

Abstract

LINPACK is the de facto benchmark for supercomputers. Heterogeneous clusters that combine CPUs and GPUs have become an important trend in supercomputing. Because mixed-precision algorithms offer high performance, we previously developed GHPL, a mixed-precision high-performance LINPACK software package for NVIDIA GPU clusters. In this paper, we describe our recent work on porting GHPL to AMD GPUs and optimizing it there. On the AMD GPU platform, we implemented a hybrid CPU and GPU GEMM function using the ACML-GPU and GotoBLAS libraries. In our results, GHPL achieved a speedup of 3.21 over HPL. In addition, we point out the limitations of the ACML-GPU library.
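
As a rough illustration of the mixed-precision idea mentioned in the abstract, the sketch below factors a matrix once in emulated single precision and then recovers double-precision accuracy by iterative refinement, with residuals computed in double precision. It is a minimal pure-Python sketch of the general technique, not the GHPL implementation: the helper names (f32, lu_factor_f32, lu_solve_f32, solve_mixed) are hypothetical, single precision is emulated by rounding through the struct module, and the sketch assumes a small, well-conditioned dense system.

```python
import struct


def f32(x):
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def lu_factor_f32(a):
    """LU factorization with partial pivoting, each operation rounded to float32.

    Returns the packed factors and a row permutation such that row i of the
    factored matrix corresponds to row perm[i] of the input.
    """
    n = len(a)
    lu = [[f32(v) for v in row] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] = f32(lu[i][k] / lu[k][k])
            for j in range(k + 1, n):
                lu[i][j] = f32(lu[i][j] - f32(lu[i][k] * lu[k][j]))
    return lu, perm


def lu_solve_f32(lu, perm, b):
    """Solve with the float32 factors: forward, then backward substitution."""
    n = len(lu)
    y = [f32(b[perm[i]]) for i in range(n)]
    for i in range(n):  # L has an implicit unit diagonal
        for j in range(i):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
    for i in reversed(range(n)):
        for j in range(i + 1, n):
            y[i] = f32(y[i] - f32(lu[i][j] * y[j]))
        y[i] = f32(y[i] / lu[i][i])
    return y


def solve_mixed(a, b, max_iter=20, tol=1e-12):
    """Single-precision factorization plus double-precision refinement."""
    n = len(b)
    lu, perm = lu_factor_f32(a)      # expensive O(n^3) step, low precision
    x = lu_solve_f32(lu, perm, b)    # initial low-precision solution
    b_norm = max(abs(v) for v in b)
    for _ in range(max_iter):
        # Residual r = b - A x in double precision (plain Python floats).
        r = [b[i] - sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
        if max(abs(v) for v in r) <= tol * b_norm:
            break
        d = lu_solve_f32(lu, perm, r)  # cheap O(n^2) correction solve
        x = [xi + di for xi, di in zip(x, d)]
    return x
```

The design point this is meant to show is that the cubic-cost factorization runs entirely in the faster, lower precision, while only the quadratic-cost residual and update steps use double precision.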

Acknowledgments

This work is partly supported by the National 863 Plan of China (No. 2006AA01A125, No. 2009AA01A129, No. 2009AA01A134), the China HGJ Project (No. 2009ZX01036-001-002), the Knowledge Innovation Program of the Chinese Academy of Sciences (No. KGCX1-YW-13), and the Ministry of Finance (No. ZDYZ2008-2).

Author information

Corresponding author

Correspondence to Xianyi Zhang.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Zhang, X., Zhang, Y., Wang, L. (2013). Using Mixed Precision Algorithm for LINPACK Benchmark on AMD GPU. In: Yuen, D., Wang, L., Chi, X., Johnsson, L., Ge, W., Shi, Y. (eds) GPU Solutions to Multi-scale Problems in Science and Engineering. Lecture Notes in Earth System Sciences. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16405-7_34
