Hot sparing for lifetime-chip-performance and cost improvement in application specific SIMT processors

Mozafari, S. Hasan; Meyer, Brett H.

doi:10.1007/s10617-020-09238-2

Published: 03 June 2020

Volume 24, pages 249–266, (2020)

Design Automation for Embedded Systems

Abstract

Redundancy is a well-known technique for replacing components with manufacturing defects, improving yield and reducing cost. Previously, most yield-improvement strategies used redundant components only when another component had failed (i.e., cold spares). However, hot spares are becoming popular in commercial products (e.g., the NVIDIA Ti GPU series). Hot spares address manufacturing cost when components are defective; otherwise, they can be used to improve performance in the field. In this paper, we investigate the effect of hot spares on lifetime-chip-performance (LCP) in multi-core single-instruction, multiple-thread (SIMT) processors. We observe that hot sparing is outstandingly effective for specific types of SIMT processor configurations (small and medium systems) and applications (FFT and FILTER), while improving cost and LCP over other configurations and applications as well. For example, hot sparing can improve LCP by more than 75% on average, compared with conventional methods (i.e., cold sparing), for applications that experience significant performance improvement when hot spares are added (e.g., FFT and FILTER). In particular, microarchitectural hot redundant resources (e.g., hot spare lanes) achieve better LCP improvement than conventional architectural redundancies (e.g., hot spare cores).
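The distinction between cold and hot spares drawn in the abstract can be illustrated with a small sketch. It is not the paper's LCP model: it assumes independent core defects with a single per-core yield, ignores lane-level redundancy and in-field wear-out, and uses hypothetical function names (`working_core_pmf`, `chip_yield`, `expected_active_cores`). It only shows that spares raise the fraction of sellable chips in both schemes, while only hot sparing lets defect-free spares contribute as active cores.

```python
from math import comb


def working_core_pmf(total_cores: int, core_yield: float) -> list[float]:
    """P(exactly k of total_cores are defect-free), assuming independent defects."""
    return [
        comb(total_cores, k) * core_yield**k * (1 - core_yield) ** (total_cores - k)
        for k in range(total_cores + 1)
    ]


def chip_yield(required: int, spares: int, core_yield: float) -> float:
    """Probability that at least `required` of the required+spares cores work."""
    pmf = working_core_pmf(required + spares, core_yield)
    return sum(pmf[required:])


def expected_active_cores(required: int, spares: int, core_yield: float, hot: bool) -> float:
    """Expected active cores per sellable chip at shipping time.

    Cold sparing (hot=False): spares only replace defective cores, so a
    sellable chip always runs exactly `required` cores.
    Hot sparing (hot=True): every defect-free core is enabled.
    """
    pmf = working_core_pmf(required + spares, core_yield)
    sellable = sum(pmf[required:])
    active = sum(
        p * (k if hot else required)
        for k, p in enumerate(pmf)
        if k >= required
    )
    return active / sellable


if __name__ == "__main__":
    # Illustrative numbers only: 8 required cores, 1 spare, 90% per-core yield.
    print("yield without spare:", chip_yield(8, 0, 0.9))
    print("yield with one spare:", chip_yield(8, 1, 0.9))
    print("active cores, cold:", expected_active_cores(8, 1, 0.9, hot=False))
    print("active cores, hot:", expected_active_cores(8, 1, 0.9, hot=True))
```

Under these assumptions, the yield gain from a spare is the same for both schemes; any difference between them comes from the defect-free spares that hot sparing keeps active.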

Acknowledgements

This research was made possible with funding from Fonds de recherche Nature et technologies du Québec (FRQNT), and CAD tools from CMC Microsystems.

Author information

Authors and Affiliations

Department of Electrical and Computer Engineering, McGill University, Montréal, QC, Canada
S. Hasan Mozafari & Brett H. Meyer

Corresponding author

Correspondence to S. Hasan Mozafari.

About this article

Cite this article

Mozafari, S.H., Meyer, B.H. Hot sparing for lifetime-chip-performance and cost improvement in application specific SIMT processors. Des Autom Embed Syst 24, 249–266 (2020). https://doi.org/10.1007/s10617-020-09238-2

Received: 05 August 2019
Accepted: 29 April 2020
Published: 03 June 2020
Issue Date: December 2020
DOI: https://doi.org/10.1007/s10617-020-09238-2
