Moving from exascale to zettascale computing: challenges and techniques

  • Xiang-ke Liao
  • Kai Lu
  • Can-qun Yang
  • Jin-wen Li
  • Yuan Yuan
  • Ming-che Lai
  • Li-bo Huang
  • Ping-jing Lu
  • Jian-bin Fang
  • Jing Ren
  • Jie Shen
Perspective

Abstract

High-performance computing (HPC) is essential to progress in both traditional and emerging scientific fields. Given the current pace of HPC development, exascale computing is expected to enter practical use around 2020. As Moore’s law approaches its limit, however, HPC will face severe challenges in moving from exascale to zettascale, making the decade after 2020 a vital period for developing key HPC techniques. In this study, we discuss the challenges of enabling zettascale computing with respect to both hardware and software. We then present a perspective on future HPC technology evolution and revolution, leading to our main recommendations for supporting zettascale computing in the coming decade.

Key words

High-performance computing · Zettascale · Micro-architectures · Interconnection · Storage system · Manufacturing process · Programming models and environments

CLC number

TP311 

Copyright information

© Editorial Office of Journal of Zhejiang University Science and Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. College of Computer, National University of Defense Technology, Changsha, China