
Software and Hardware Co-design for Low-Power HPC Platforms

  • Manolis Ploumidis (corresponding author)
  • Nikolaos D. Kallimanis
  • Marios Asiminakis
  • Nikos Chrysos
  • Pantelis Xirouchakis
  • Michalis Gianoudis
  • Leandros Tzanakis
  • Nikolaos Dimou
  • Antonis Psistakis
  • Panagiotis Peristerakis
  • Giorgos Kalokairinos
  • Vassilis Papaefstathiou
  • Manolis Katevenis
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 11887)

Abstract

To keep an HPC cluster economically viable, serious cost limitations on hardware and software deployment must be considered, prompting researchers to reconsider the design of modern HPC platforms. In this paper we present a cross-layer communication architecture suitable for emerging HPC platforms based on heterogeneous multiprocessors. We propose simple hardware primitives that enable protected, reliable, and virtualized user-level communication and that can easily be integrated in the same package with the processing unit. Combined with an efficient user-space software stack, the proposed architecture provides low-latency communication mechanisms to HPC applications. Our implementation of the MPI standard, which exploits these capabilities, delivers point-to-point and collective primitives with low overheads, including an eager protocol with an end-to-end latency of 1.4 µs. We port and evaluate our communication stack with real HPC applications on a cluster of 128 ARMv8 processors that are tightly coupled with FPGA logic. The network interface primitives occupy less than 25% of the FPGA logic and only 3 Mbit of SRAM, while they can easily saturate the 16 Gb/s links of our platform.
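For context, small-message eager-protocol latencies such as the 1.4 µs figure quoted above are typically measured with a ping-pong microbenchmark between two MPI ranks. The sketch below is a generic, minimal example of such a measurement in C; it is not the authors' benchmark code, and the message size and iteration count are illustrative assumptions.

/* Hypothetical MPI ping-pong microbenchmark (illustrative, not from the paper):
 * measures one-way latency for small messages, the regime where an eager
 * protocol applies. Run with two ranks, e.g. "mpirun -np 2 ./pingpong". */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 10000;   /* assumed iteration count */
    char buf[8] = {0};         /* small payload: eager-protocol regime */

    MPI_Barrier(MPI_COMM_WORLD);
    double t0 = MPI_Wtime();
    for (int i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(buf, sizeof buf, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    double t1 = MPI_Wtime();

    /* Each iteration is a round trip, so divide by 2*iters for one-way latency. */
    if (rank == 0)
        printf("one-way latency: %.2f us\n", (t1 - t0) / (2.0 * iters) * 1e6);

    MPI_Finalize();
    return 0;
}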

Notes

Acknowledgments

This work is supported by the European Commission under the Horizon 2020 Framework Programme for Research and Innovation [8], through the EuroEXA project [5] (g.a. 754337), the EU H2020 FETHPC project ExaNoDe (g.a. 671578), and the ExaNeSt project (g.a. 671553) [2].

References

  1. LAMMPS Molecular Dynamics Simulator. Sandia National Laboratories. https://lammps.sandia.gov
  2. The ExaNeSt project: European Exascale System Interconnect and Storage. GA-671553. www.exanest.eu
  3. Alverson, B., Froese, E., Kaplan, L., Roweth, D.: Cray XC series network. Cray Inc., White Paper WP-Aries01-1112 (2012)
  4. Ammendola, R., et al.: APEnet: a high speed, low latency 3D interconnect network. In: IEEE International Conference on Cluster Computing (CLUSTER), p. 481 (2004)
  5. EuroEXA: European Exascale System Interconnect and Storage. https://euroexa.eu/
  6. Feldman, M.: Fujitsu switches horses for Post-K supercomputer, will ride ARM into exascale. TOP500 News (2016). https://www.top500.org/news/fujitsu-switches-horses-for-post-k-supercomputer-will-ride-arm-into-exascale
  7. Fu, H., et al.: The Sunway TaihuLight supercomputer: system and applications. Sci. China Inf. Sci. 59(7), 072001 (2016)
  8. HORIZON 2020: The EU Framework Programme for Research and Innovation. https://ec.europa.eu/programmes/horizon2020/
  9. Katevenis, M., Chrysos, N., et al.: The ExaNeSt project: interconnects, storage, and packaging for exascale systems. In: 2016 Euromicro Conference on Digital System Design (DSD), pp. 60–67, August 2016. https://doi.org/10.1109/DSD.2016.106
  10. Katevenis, M.G.: Interprocessor communication seen as load-store instruction generalization. In: Bertels, K., et al. (eds.) The Future of Computing, Essays in Memory of Stamatis Vassiliadis, Delft, The Netherlands (2007)
  11. Katz, R.H., Eggers, S.J., Wood, D.A., Perkins, C., Sheldon, R.G.: Implementing a cache consistency protocol, vol. 13. IEEE Computer Society Press (1985)
  12. Leitao, B.H.: Tuning 10Gb network cards on Linux. In: Proceedings of the 2009 Linux Symposium, pp. 169–185 (2009)
  13. LAMMPS Benchmark Suite. http://lammps.sandia.gov/bench.html
  14.
  15. Pfister, G.F.: An introduction to the InfiniBand architecture. High Perform. Mass Storage Parallel I/O 42, 617–632 (2001)
  16. Thakur, R., Rabenseifner, R., Gropp, W.: Optimization of collective communication operations in MPICH. Int. J. High Perform. Comput. Appl. 19(1), 49–66 (2005). https://doi.org/10.1177/1094342005051521
  17. Yokokawa, M., Shoji, F., Uno, A., Kurokawa, M., Watanabe, T.: The K computer: Japanese next-generation supercomputer development project. In: IEEE/ACM International Symposium on Low Power Electronics and Design, pp. 371–372. IEEE (2011)

Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Manolis Ploumidis (1) (corresponding author)
  • Nikolaos D. Kallimanis (1)
  • Marios Asiminakis (1)
  • Nikos Chrysos (1)
  • Pantelis Xirouchakis (1)
  • Michalis Gianoudis (1)
  • Leandros Tzanakis (1)
  • Nikolaos Dimou (1)
  • Antonis Psistakis (1)
  • Panagiotis Peristerakis (1)
  • Giorgos Kalokairinos (1)
  • Vassilis Papaefstathiou (1)
  • Manolis Katevenis (1)

  1. Foundation for Research and Technology – Hellas (FORTH), Heraklion, Greece
