Hot sparing for lifetime-chip-performance and cost improvement in application specific SIMT processors

Design Automation for Embedded Systems

Abstract

Redundancy is a well-known technique for replacing components that have manufacturing defects, which improves yield and reduces cost. Previously, most yield improvement strategies enabled a redundant component only when another component had failed (i.e., cold spares). However, hot spares are becoming popular in commercial products (e.g., the NVIDIA Ti GPU series). Hot spares reduce manufacturing cost by standing in for defective components; when no component is defective, they can instead be used to improve performance in the field. In this paper, we investigate the effect of hot spares on lifetime-chip-performance (LCP) in multi-core single-instruction, multiple-thread (SIMT) processors. We observe that hot sparing is especially effective for specific SIMT processor configurations (small and medium systems) and applications (FFT and FILTER), while also improving cost and LCP for other configurations and applications. For example, for applications that experience significant performance improvement when hot spares are added (e.g., FFT and FILTER), hot sparing improves LCP by more than 75% on average compared with conventional methods (i.e., cold sparing). In particular, microarchitectural hot redundant resources (e.g., hot spare lanes) achieve better LCP improvement than conventional architectural redundancies (e.g., hot spare cores).
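
The distinction between cold and hot spares can be illustrated with a toy calculation. The sketch below is not the paper's LCP model; it is a minimal example that assumes every core is independently defect-free with the same probability, that a chip is sellable when at least the nominal number of cores work, and that hot sparing enables every working core. The function names and parameters (`core_count_distribution`, `sparing_summary`, `p_good`) are hypothetical and chosen only for this illustration.

```python
from math import comb


def core_count_distribution(total_cores: int, p_good: float) -> list[float]:
    """Return P(exactly k of total_cores are defect-free) for k = 0..total_cores.

    Assumption: defects are independent and identically likely per core.
    """
    return [
        comb(total_cores, k) * p_good**k * (1 - p_good) ** (total_cores - k)
        for k in range(total_cores + 1)
    ]


def sparing_summary(nominal: int, spares: int, p_good: float) -> dict[str, float]:
    """Compare cold and hot sparing for a chip with `nominal` + `spares` cores."""
    total = nominal + spares
    dist = core_count_distribution(total, p_good)

    # A chip is sellable when at least `nominal` cores are defect-free.
    chip_yield = sum(dist[nominal:])
    if chip_yield == 0.0:
        return {"yield": 0.0, "cold_cores": 0.0, "hot_cores": 0.0}

    # Cold sparing: spares only replace failed cores, so a sellable chip
    # always runs exactly `nominal` cores.
    cold_cores = float(nominal)

    # Hot sparing (toy assumption): every defect-free core is enabled, so a
    # sellable chip runs between `nominal` and `total` cores.
    hot_cores = sum(k * dist[k] for k in range(nominal, total + 1)) / chip_yield

    return {"yield": chip_yield, "cold_cores": cold_cores, "hot_cores": hot_cores}


if __name__ == "__main__":
    # Illustrative numbers only: 8 nominal cores, 2 spares, 95% per-core yield.
    print(sparing_summary(nominal=8, spares=2, p_good=0.95))
```

Under these assumptions both schemes ship the same fraction of chips, but the hot-spared chip runs more cores on average. Whether those extra cores translate into application performance depends on the workload, which is the question the paper studies.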

Acknowledgements

This research was made possible with funding from Fonds de recherche Nature et technologies du Québec (FRQNT), and CAD tools from CMC Microsystems.

Author information

Corresponding author

Correspondence to S. Hasan Mozafari.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Mozafari, S.H., Meyer, B.H. Hot sparing for lifetime-chip-performance and cost improvement in application specific SIMT processors. Des Autom Embed Syst 24, 249–266 (2020). https://doi.org/10.1007/s10617-020-09238-2

  • DOI: https://doi.org/10.1007/s10617-020-09238-2
