
A survey of compute nodes with 100 TFLOPS and beyond for supercomputers

  • Regular Paper
  • Published in: CCF Transactions on High Performance Computing

Abstract

The Frontier supercomputer, ranked first on the Top500 list, marks the beginning of the exascale era for supercomputers; its compute nodes deliver double-precision floating-point performance exceeding 100 TFLOPS. Compute nodes are the basic computing units of a supercomputer, and their efficiency significantly affects the processing efficiency of application workloads in domains such as scientific computing and artificial intelligence. This article systematically analyzes the architectures and key technologies of major supercomputers that have been built or will soon be constructed around the world, and summarizes development trends in 100 TFLOPS compute-node technology. It also provides insights into addressing the challenges posed by future 10-exascale and even zettascale supercomputers, including improving energy efficiency, optimizing memory access, utilizing chiplet interconnects, and employing advanced packaging technologies. These observations can help researchers in related fields consider future research directions and help industry practitioners build more practical and efficient compute nodes for future supercomputers.
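To make the scale concrete, the short Python sketch below (not taken from the paper; the node and system figures are round, illustrative numbers) estimates how many 100 TFLOPS compute nodes a peak exascale or 10-exascale system would require, which is the back-of-the-envelope reasoning behind the node-level efficiency challenges the survey highlights.

```python
import math

# Back-of-the-envelope scaling: how many 100 TFLOPS (FP64) compute nodes
# are needed to reach a given system-level peak performance.
# Illustrative only; real systems also lose efficiency to interconnect,
# memory, and power constraints, which the survey discusses.

TFLOPS = 1e12
EFLOPS = 1e18

def nodes_required(system_peak_flops: float, node_peak_flops: float) -> int:
    """Minimum node count to reach a target peak, ignoring overheads."""
    return math.ceil(system_peak_flops / node_peak_flops)

if __name__ == "__main__":
    node_peak = 100 * TFLOPS                      # ~100 TFLOPS FP64 per node
    for target_eflops in (1, 10):                 # exascale and 10-exascale targets
        n = nodes_required(target_eflops * EFLOPS, node_peak)
        print(f"{target_eflops} EFLOPS peak needs >= {n:,} such nodes")
    # 1 EFLOPS -> 10,000 nodes; 10 EFLOPS -> 100,000 nodes, which is why
    # per-node performance and energy efficiency must keep rising.
```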


Acknowledgements

This work was supported by the National Natural Science Foundation of China-China Academy of General Technology Joint Fund for Basic Research (Grant No. 62272475).

Author information


Corresponding author

Correspondence to Kai Lu.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Chang, J., Lu, K., Guo, Y. et al. A survey of compute nodes with 100 TFLOPS and beyond for supercomputers. CCF Trans. HPC (2024). https://doi.org/10.1007/s42514-024-00188-w



