Improving the Energy Efficiency by Exceeding the Conservative Operating Limits

Hardware Accelerators in Data Centers

Abstract

This chapter presents UniServer, an approach that exploits the increased variability within CPUs and memories manufactured in advanced nanometer nodes. This variability gives rise to another type of heterogeneity: intrinsic hardware heterogeneity, which differs from the functional heterogeneity discussed in the previous chapters. In particular, the aggressive miniaturization of transistors has worsened the static and temporal variations of transistor parameters, eventually resulting in large variations in the performance and energy efficiency of the manufactured chips. Because of this increased variability, otherwise-identical nanoscale circuits exhibit different performance or power-consumption behavior, even though they are designed using the same processes and architectures and manufactured on the same production lines. The UniServer approach discussed in this chapter attempts to quantify the intrinsic variability within the CPUs and memories of commodity servers and to reveal the true capabilities of each core and memory through unique automated online and offline characterization processes. The revealed capabilities and new operating points of cores and memories, which may differ substantially from those currently adopted by manufacturers, are then exploited by an enhanced error-resilient software stack to improve energy efficiency while maintaining high levels of system availability. The UniServer approach introduces innovations across all layers of the hardware and system software stack, from firmware to hypervisor up to the OpenStack resource manager, targeting deployments at the emerging edge or in classical cloud data centers.
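
As a rough illustration of what an offline characterization pass might look like, the Python sketch below steps the supply voltage of each core down from a nominal value until a stress test fails, then records the lowest passing voltage plus a guardband as that core's operating point. It is not the UniServer implementation: the SimulatedCore class, the voltage figures, the step size, and the guardband are all hypothetical, and the stress test is simulated by comparing against a hidden per-core minimum voltage. The sketch also assumes every core passes at the nominal voltage.

```python
from dataclasses import dataclass

# All values below are hypothetical and chosen only for illustration.
NOMINAL_MV = 980    # assumed manufacturer-specified supply voltage (mV)
STEP_MV = 10        # assumed voltage decrement per characterization step (mV)
GUARDBAND_MV = 20   # assumed safety margin added to the lowest passing voltage (mV)


@dataclass(frozen=True)
class SimulatedCore:
    """Stand-in for a real core; true_vmin_mv models its hidden intrinsic limit."""
    core_id: int
    true_vmin_mv: int

    def passes_stress_test(self, voltage_mv: int) -> bool:
        # A real test would run workloads and check for errors or crashes;
        # here the outcome is simulated from the hidden limit.
        return voltage_mv >= self.true_vmin_mv


def characterize(core: SimulatedCore,
                 nominal_mv: int = NOMINAL_MV,
                 step_mv: int = STEP_MV,
                 guardband_mv: int = GUARDBAND_MV) -> int:
    """Return a per-core operating voltage: lowest passing step plus guardband."""
    lowest_passing = nominal_mv  # assumption: the core is stable at nominal
    candidate = nominal_mv - step_mv
    while candidate > 0 and core.passes_stress_test(candidate):
        lowest_passing = candidate
        candidate -= step_mv
    # Never recommend a voltage above the nominal one.
    return min(nominal_mv, lowest_passing + guardband_mv)


# Three nominally identical cores whose hidden limits differ.
cores = [SimulatedCore(0, 905), SimulatedCore(1, 870), SimulatedCore(2, 940)]
operating_points = {core.core_id: characterize(core) for core in cores}

for core_id, voltage_mv in operating_points.items():
    saved = NOMINAL_MV - voltage_mv
    print(f"core {core_id}: {voltage_mv} mV ({saved} mV below nominal)")
```

In this toy setup the three cores end up with different operating points even though they share one nominal specification, which is the kind of per-component margin the abstract describes. A linear step-down search is used here only for simplicity.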

Corresponding author

Correspondence to Georgios Karakonstantis.

Copyright information

© 2019 Springer International Publishing AG, part of Springer Nature

About this chapter

Cite this chapter

Mukhanov, L. et al. (2019). Improving the Energy Efficiency by Exceeding the Conservative Operating Limits. In: Kachris, C., Falsafi, B., Soudris, D. (eds) Hardware Accelerators in Data Centers. Springer, Cham. https://doi.org/10.1007/978-3-319-92792-3_13

  • DOI: https://doi.org/10.1007/978-3-319-92792-3_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-92791-6

  • Online ISBN: 978-3-319-92792-3

  • eBook Packages: Engineering, Engineering (R0)
