
Resilience Proportionality—A Paradigm for Efficient and Reliable System Design

  • Chapter
Dependable Multicore Architectures at Nanoscale

Abstract

Reliability, Availability, and Serviceability (RAS) are key considerations in hardware design, whether for mobile devices or high-end servers. However, provisioning RAS often conflicts with meeting performance and energy targets, and it increases the overall cost of designing the chip. Because of this tension, chip design companies must make difficult decisions about how much RAS to incorporate into each product in their portfolio, and even about which customers and market segments they can realistically target. At the same time, highly scaled silicon technology nodes are susceptible to a variety of reliability problems. Emerging technologies such as die-stacking and non-volatile memory, while critical for meeting future computing demands, pose significant reliability challenges of their own. RAS features can in fact reduce the deployment costs of these technologies (e.g., by increasing effective yield). Determining the tradeoff among design cost, deployment cost, and the RAS needs of a market is therefore the critical issue when evaluating RAS features. In this article, we examine this struggle between driving greater efficiency, lowering costs, and meeting the RAS demands of various market segments from an industry perspective. We argue that ending this struggle requires enough flexibility in the design to adapt to the needs of a wide range of applications and hardware configurations. We call such an approach “resilience proportionality” and believe that it should guide future architectural reliability research. Finally, we discuss how resilience proportionality can be achieved and the challenges that must be addressed.
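The abstract notes that RAS features can lower deployment cost by increasing effective yield. One way to see why is the textbook Poisson yield model. The sketch below is not taken from the chapter. It assumes that defects are independent and Poisson-distributed with mean λ = D·A (defect density times die area), and that a hypothetical repair or correction mechanism (for example, spare rows or ECC used to mask hard faults) lets a die ship with up to k defects. The function name `effective_yield` and the input numbers are illustrative only.

```python
import math


def effective_yield(defect_density: float, die_area: float, tolerable_defects: int = 0) -> float:
    """Probability that a die has at most `tolerable_defects` defects.

    Assumes a simple Poisson defect model with mean lambda = defect_density * die_area.
    `tolerable_defects` models a hypothetical RAS mechanism that can mask that many defects.
    """
    if defect_density < 0 or die_area < 0 or tolerable_defects < 0:
        raise ValueError("inputs must be non-negative")
    lam = defect_density * die_area  # expected number of defects per die
    term = math.exp(-lam)            # P(N = 0): yield with no tolerance at all
    total = term
    for i in range(1, tolerable_defects + 1):
        term *= lam / i              # P(N = i) computed from P(N = i - 1)
        total += term
    return total


if __name__ == "__main__":
    # Hypothetical numbers: 0.1 defects/cm^2 on a 5 cm^2 die, so lambda = 0.5.
    for k in range(3):
        print(f"tolerate {k} defect(s): yield = {effective_yield(0.1, 5.0, k):.3f}")
```

Under these assumptions the loop prints yields of about 0.607, 0.910, and 0.986 for k = 0, 1, and 2, showing how tolerating even a single defect per die can sharply raise the fraction of sellable parts. Real yield models and repair schemes are considerably more involved.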

Author information

Corresponding author

Correspondence to Vilas Sridharan.

Copyright information

© 2018 Springer International Publishing AG

About this chapter

Cite this chapter

Sridharan, V., Gurumurthi, S. (2018). Resilience Proportionality—A Paradigm for Efficient and Reliable System Design. In: Ottavi, M., Gizopoulos, D., Pontarelli, S. (eds) Dependable Multicore Architectures at Nanoscale. Springer, Cham. https://doi.org/10.1007/978-3-319-54422-9_9

  • DOI: https://doi.org/10.1007/978-3-319-54422-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-54421-2

  • Online ISBN: 978-3-319-54422-9

  • eBook Packages: Engineering, Engineering (R0)