On the potential of significance-driven execution for energy-aware HPC

  • Philipp Gschwandtner
  • Charalampos Chalios
  • Dimitrios S. Nikolopoulos
  • Hans Vandierendonck
  • Thomas Fahringer
Special Issue Paper


Dynamic voltage and frequency scaling (DVFS) exhibits fundamental limitations as a method to reduce energy consumption in computing systems. In the HPC domain, where performance is the highest priority and codes are heavily optimized to minimize idle time, DVFS has limited opportunity to achieve substantial energy savings. This paper explores whether operating processors near the transistor threshold voltage (NTV) is a better alternative to DVFS for breaking the power wall in HPC. NTV presents challenges, since it compromises both performance and reliability to reduce power consumption. We present a first-of-its-kind study of a significance-driven execution paradigm that selectively uses NTV and algorithmic error tolerance to reduce energy consumption in performance-constrained HPC environments. Using an iterative algorithm as a use case, we present an adaptive execution scheme that switches between near-threshold execution on many cores and above-threshold execution on one core, as the computational significance of iterations in the algorithm evolves over time. Using this scheme on state-of-the-art hardware, we demonstrate energy savings ranging between 35 and 67%, while compromising neither correctness nor performance.
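The adaptive scheme described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the iterative algorithm, the fault model, or the switching criterion, so the Jacobi solver, the `inject_faults` perturbation model, the `switch_tol` threshold, and all parameter values below are assumptions chosen purely for illustration. The sketch models only the reliability aspect (early, low-significance iterations tolerate errors; late, high-significance iterations run fault-free) and does not model core counts, voltage, or energy.

```python
import random


def jacobi_step(A, b, x):
    """One Jacobi sweep for the linear system A x = b."""
    n = len(b)
    return [
        (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
        for i in range(n)
    ]


def residual_norm(A, b, x):
    """Infinity norm of the residual b - A x."""
    n = len(b)
    return max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n))) for i in range(n))


def inject_faults(x, rate, rng):
    """Hypothetical fault model standing in for unreliable near-threshold
    execution: each component is perturbed by up to 10% with probability `rate`."""
    return [
        v * (1.0 + rng.uniform(-0.1, 0.1)) if rng.random() < rate else v
        for v in x
    ]


def solve(A, b, switch_tol=1e-2, final_tol=1e-10, fault_rate=0.05,
          max_iters=10_000, seed=0):
    """Run Jacobi in an error-prone mode first, then in a reliable mode.

    Assumption: iterations are treated as low-significance until the residual
    drops below `switch_tol`; after that, faults are no longer injected.
    Returns (solution, total iterations, iterations run in unreliable mode).
    """
    rng = random.Random(seed)
    x = [0.0] * len(b)
    reliable = False
    unreliable_iters = 0
    for it in range(1, max_iters + 1):
        x = jacobi_step(A, b, x)
        if not reliable:
            x = inject_faults(x, fault_rate, rng)
            unreliable_iters += 1
        r = residual_norm(A, b, x)
        if not reliable and r < switch_tol:
            reliable = True  # later iterations are significant: stop tolerating faults
        if reliable and r < final_tol:
            return x, it, unreliable_iters
    raise RuntimeError("did not converge within max_iters")


if __name__ == "__main__":
    # Small diagonally dominant system whose exact solution is [1, 1, 1].
    A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
    b = [5.0, 6.0, 5.0]
    x, total, unreliable = solve(A, b)
    print(x, total, unreliable)
```

Because Jacobi iteration on a diagonally dominant system contracts the error at every sweep, perturbations introduced in the early phase are damped by later sweeps, which is the kind of algorithmic error tolerance the abstract refers to. The final accuracy is determined only by the reliable phase.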


Significance · Energy · Unreliability · Near-threshold voltage · Fault tolerance



The research leading to these results has received funding from the European Union’s Seventh Framework Programme (FP7/2007–2013), under grant agreements FP7-323872 (SCoRPiO) and 327744 (NovoSoft), the United Kingdom Engineering and Physical Sciences Research Council (EPSRC), under grants EP/L000055/1 (ALEA), EP/L004232/1 (ENPOWER) and EP/K017594/1 (GEMSCLAIM), and the Austrian Science Fund (FWF) under contract W1227-N16 (DK-plus CIM).



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Philipp Gschwandtner (1)
  • Charalampos Chalios (2)
  • Dimitrios S. Nikolopoulos (2)
  • Hans Vandierendonck (2)
  • Thomas Fahringer (1)

  1. Innsbruck, Austria
  2. Belfast, UK
