Lattice-CSC: Optimizing and Building an Efficient Supercomputer for Lattice-QCD and to Achieve First Place in Green500

  • Conference paper
High Performance Computing (ISC High Performance 2015)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9137)

Abstract

In recent decades, supercomputers have become a necessity in science and industry. Huge data centers consume enormous amounts of electricity, and we are at a point where newer, faster computers must no longer drain more power than their predecessors. The fact that user demand for compute capabilities has not declined in any way has led to studies of the feasibility of exaflop systems. Heterogeneous clusters with highly efficient accelerators such as GPUs are one approach to higher efficiency. We present the new L-CSC cluster, a commodity hardware compute cluster dedicated to Lattice QCD simulations at the GSI research facility. L-CSC features a multi-GPU design with four FirePro S9150 GPUs per node, each providing 320 GB/s memory bandwidth and 2.6 TFLOPS peak performance. The high bandwidth makes it ideally suited for memory-bound LQCD computations, while the multi-GPU design ensures superior power efficiency. The November 2014 Green500 list ranked L-CSC as the most power-efficient supercomputer in the world, with 5270 MFLOPS/W in the Linpack benchmark. This paper presents optimizations to our Linpack implementation HPL-GPU and other power efficiency improvements which helped L-CSC reach this result. It describes our approach for an accurate Green500 power measurement and reveals some problems with the current measurement methodology. Finally, it gives an overview of the Lattice QCD application on L-CSC.
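
The figures quoted in the abstract can be combined into a few back-of-the-envelope quantities. The following Python sketch is only an illustration and is not taken from the paper: the function names are hypothetical, and the only inputs from the source are the per-GPU bandwidth (320 GB/s), the per-GPU peak (2.6 TFLOPS), the GPU count per node (4), and the Green500 result (5270 MFLOPS/W). It derives the aggregate GPU bandwidth and peak of one node, the peak FLOP available per byte of memory traffic, and the power implied per sustained TFLOPS in Linpack.

```python
# Figures quoted in the abstract (per FirePro S9150 GPU unless noted).
GPUS_PER_NODE = 4
MEM_BANDWIDTH_GBS = 320.0        # GB/s per GPU
PEAK_TFLOPS = 2.6                # TFLOPS per GPU
GREEN500_MFLOPS_PER_W = 5270.0   # Linpack efficiency, November 2014 list


def node_aggregates(gpus=GPUS_PER_NODE):
    """Return (GB/s, TFLOPS) summed over the GPUs of one node."""
    return gpus * MEM_BANDWIDTH_GBS, gpus * PEAK_TFLOPS


def machine_balance_flop_per_byte():
    """Peak FLOP per byte of memory traffic on a single GPU."""
    return (PEAK_TFLOPS * 1e12) / (MEM_BANDWIDTH_GBS * 1e9)


def watts_per_sustained_tflops(efficiency=GREEN500_MFLOPS_PER_W):
    """Power in watts implied per sustained TFLOPS at a given MFLOPS/W."""
    return 1e6 / efficiency


def mflops_per_watt(sustained_tflops, avg_power_kw):
    """Green500-style metric: sustained MFLOPS divided by average watts.

    Both arguments are placeholders supplied by the caller; the sketch does
    not assume any measured value for L-CSC beyond the quoted 5270 MFLOPS/W.
    """
    return (sustained_tflops * 1e6) / (avg_power_kw * 1e3)


if __name__ == "__main__":
    bandwidth, peak = node_aggregates()
    print(f"per-node GPU bandwidth: {bandwidth:.0f} GB/s")
    print(f"per-node GPU peak:      {peak:.1f} TFLOPS")
    print(f"balance:                {machine_balance_flop_per_byte():.3f} FLOP/byte")
    print(f"power per TFLOPS:       {watts_per_sustained_tflops():.1f} W")
```

Under a simple roofline-style reading, a kernel that performs fewer floating-point operations per byte than the balance value is limited by memory bandwidth rather than by peak compute, which is the sense in which the abstract calls LQCD computations memory-bound.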

Notes

  1. See http://www.gsi.de.

  2. See e.g. http://lattice.github.io/quda/.

  3. Compute Abstraction Layer is the assembler language of former AMD GPUs.

Acknowledgments

We would like to thank Advanced Micro Devices, Inc. (AMD) and ASUSTeK Computer Inc. (Asus) for their support.

Author information

Corresponding author

Correspondence to David Rohr.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Rohr, D., Bach, M., Nešković, G., Lindenstruth, V., Pinke, C., Philipsen, O. (2015). Lattice-CSC: Optimizing and Building an Efficient Supercomputer for Lattice-QCD and to Achieve First Place in Green500. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_14

  • DOI: https://doi.org/10.1007/978-3-319-20119-1_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-20118-4

  • Online ISBN: 978-3-319-20119-1

  • eBook Packages: Computer Science, Computer Science (R0)
