Energy-Efficient Heterogeneous Computing at exaSCALE—ECOSCALE

  • Konstantinos Georgopoulos
  • Iakovos Mavroidis
  • Luciano Lavagno
  • Ioannis Papaefstathiou
  • Konstantin Bakanov


Current High-Performance Computing (HPC) systems must be improved if they are to reach exascale performance. Simple hardware scaling is not a feasible solution due to rising utility costs and power consumption limits, so, beyond improvements in implementation technology, both the HPC application development flow and the system architecture of future HPC systems need to be refined. ECOSCALE tackles precisely these challenges by proposing a scalable programming environment and architecture that aim to substantially reduce energy consumption as well as data traffic and latency. Furthermore, ECOSCALE introduces a novel heterogeneous, energy-efficient hierarchical architecture together with a hybrid many-core+OpenCL programming environment and runtime system. The ECOSCALE approach is hierarchical and is expected to scale well by partitioning the physical system into multiple independent Workers. Workers are interconnected in a tree-like fashion and define a contiguous global address space that can be viewed either as a set of partitions in a Partitioned Global Address Space (PGAS) or as a set of nodes hierarchically interconnected via an MPI-like protocol. To further increase energy efficiency, as well as to provide resilience, the Workers employ reconfigurable accelerators mapped onto the virtual address space by utilising a dual-stage system memory management unit with coherent memory access. The architecture supports shared partitioned reconfigurable resources, accessible by any Worker in a PGAS partition, as well as automated hardware synthesis of these resources from an OpenCL-based programming model.



This research project is supported by the European Commission under the H2020 Programme through the ECOSCALE project (grant agreement 671632).



Copyright information

© Springer International Publishing AG, part of Springer Nature 2019

Authors and Affiliations

  • Konstantinos Georgopoulos (1)
  • Iakovos Mavroidis (1), Email author
  • Luciano Lavagno (1)
  • Ioannis Papaefstathiou (1)
  • Konstantin Bakanov (1)

  1. Telecommunication Systems Institute (TSI), Technical University of Crete, Chania, Greece
