Lattice-CSC: Optimizing and Building an Efficient Supercomputer for Lattice-QCD and to Achieve First Place in Green500

Rohr, David; Bach, Matthias; Nešković, Gvozden; Lindenstruth, Volker; Pinke, Christopher; Philipsen, Owe

doi:10.1007/978-3-319-20119-1_14


Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9137)

Included in the following conference series:

International Conference on High Performance Computing


Abstract

Over the last decades, supercomputers have become a necessity in science and industry. Huge data centers consume enormous amounts of electricity, and we have reached a point where newer, faster computers must no longer draw more power than their predecessors. Sustained user demand for compute capability has led to studies of the feasibility of exaflop systems. Heterogeneous clusters with highly efficient accelerators such as GPUs are one approach to higher efficiency. We present the new L-CSC cluster, a compute cluster built from commodity hardware and dedicated to Lattice QCD (LQCD) simulations at the GSI research facility. L-CSC features a multi-GPU design with four FirePro S9150 GPUs per node, each providing 320 GB/s memory bandwidth and 2.6 TFLOPS peak performance. The high bandwidth makes it ideally suited for memory-bound LQCD computations, while the multi-GPU design ensures superior power efficiency. The November 2014 Green500 list ranked L-CSC as the most power-efficient supercomputer in the world, with 5270 MFLOPS/W in the Linpack benchmark. This paper presents optimizations to our Linpack implementation HPL-GPU and other power-efficiency improvements that helped L-CSC reach this result. It describes our approach to an accurate Green500 power measurement and reveals some problems with the current measurement methodology. Finally, it gives an overview of the Lattice QCD application on L-CSC.
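
The figures quoted in the abstract can be combined into a few back-of-the-envelope quantities. The following is only a sketch, not part of the paper: it uses the per-GPU peak performance, per-GPU memory bandwidth, GPU count per node, and Green500 efficiency exactly as stated above, and applies a simple roofline-style bound (attainable performance is the smaller of the peak and bandwidth times arithmetic intensity). The function names and the arithmetic intensity passed to `attainable_tflops` are hypothetical and chosen for illustration; they are not measured LQCD values.

```python
"""Back-of-the-envelope quantities derived from the numbers in the abstract."""

# Figures taken from the abstract.
GPUS_PER_NODE = 4
PEAK_TFLOPS_PER_GPU = 2.6        # peak performance per GPU
MEM_BW_GB_S_PER_GPU = 320.0      # memory bandwidth per GPU
GREEN500_MFLOPS_PER_W = 5270.0   # Linpack efficiency, November 2014 list


def node_peak_tflops() -> float:
    """Aggregate GPU peak performance of one node in TFLOPS."""
    return GPUS_PER_NODE * PEAK_TFLOPS_PER_GPU


def node_mem_bw_gb_s() -> float:
    """Aggregate GPU memory bandwidth of one node in GB/s."""
    return GPUS_PER_NODE * MEM_BW_GB_S_PER_GPU


def machine_balance_flop_per_byte() -> float:
    """FLOP a GPU can execute per byte it can load from memory at peak."""
    return (PEAK_TFLOPS_PER_GPU * 1e12) / (MEM_BW_GB_S_PER_GPU * 1e9)


def attainable_tflops(flop_per_byte: float) -> float:
    """Roofline-style bound for one GPU.

    flop_per_byte is the arithmetic intensity of a kernel; any value
    passed in here is an assumption of the caller, not a paper result.
    """
    bandwidth_bound = flop_per_byte * MEM_BW_GB_S_PER_GPU * 1e9 / 1e12
    return min(PEAK_TFLOPS_PER_GPU, bandwidth_bound)


def joules_per_gflop() -> float:
    """Energy per 10^9 floating-point operations at the Green500 efficiency."""
    return 1000.0 / GREEN500_MFLOPS_PER_W


if __name__ == "__main__":
    print("node peak [TFLOPS]:", node_peak_tflops())
    print("node memory bandwidth [GB/s]:", node_mem_bw_gb_s())
    print("machine balance [FLOP/byte]:", machine_balance_flop_per_byte())
    # Hypothetical intensity of 1 FLOP/byte, for illustration only.
    print("attainable at 1 FLOP/byte [TFLOPS]:", attainable_tflops(1.0))
    print("energy per GFLOP [J]:", joules_per_gflop())
```

Any kernel whose arithmetic intensity lies below the machine balance is limited by memory bandwidth rather than by peak arithmetic throughput in this model, which is one way to read the abstract's statement that high bandwidth suits memory-bound LQCD computations.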


Notes

1. See http://www.gsi.de.
2. See e.g. http://lattice.github.io/quda/.
3. Compute Abstraction Layer is the assembler language of former AMD GPUs.


Acknowledgments

We would like to thank Advanced Micro Devices, Inc. (AMD) and ASUSTeK Computer Inc. (Asus) for their support.

Author information

Authors and Affiliations

Frankfurt Institute for Advanced Studies, Department for High Performance Computing, Goethe University Frankfurt, Ruth-Moufang-Str.1, 60438, Frankfurt, Germany
David Rohr, Matthias Bach, Gvozden Nešković & Volker Lindenstruth
GSI Helmholtz Center for Heavy Ion Research, Planckstraße 1, 64291, Darmstadt, Germany
Volker Lindenstruth
Institute for Theoretical Physics, Goethe University Frankfurt, Max-von-Laue-Str.1, 60438, Frankfurt, Germany
Christopher Pinke & Owe Philipsen


Corresponding author

Correspondence to David Rohr.

Editor information

Editors and Affiliations

Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Julian M. Kunkel
Deutsches Klimarechenzentrum (DKRZ), Hamburg, Germany
Thomas Ludwig


About this paper

Cite this paper

Rohr, D., Bach, M., Nešković, G., Lindenstruth, V., Pinke, C., Philipsen, O. (2015). Lattice-CSC: Optimizing and Building an Efficient Supercomputer for Lattice-QCD and to Achieve First Place in Green500. In: Kunkel, J., Ludwig, T. (eds) High Performance Computing. ISC High Performance 2015. Lecture Notes in Computer Science, vol 9137. Springer, Cham. https://doi.org/10.1007/978-3-319-20119-1_14


DOI: https://doi.org/10.1007/978-3-319-20119-1_14
Published: 20 June 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-20118-4
Online ISBN: 978-3-319-20119-1
eBook Packages: Computer Science, Computer Science (R0)
