Abstract
Redundancy is a well-known technique for replacing components with manufacturing defects, improving yield and reducing cost. Previously, most yield improvement strategies utilized redundant components only when another component had failed (i.e., cold spares). However, hot spares are becoming popular in commercial products (e.g., the NVIDIA Ti GPU series). Hot spares reduce manufacturing cost by standing in for defective components; when no component is defective, they can be used to improve performance in the field. In this paper, we investigate the effect of hot spares on lifetime-chip-performance (LCP) in multi-core single-instruction, multiple-thread (SIMT) processors. We observe that hot sparing is particularly effective for specific types of SIMT processor configurations (small and medium systems) and applications (FFT and FILTER), while also improving cost and LCP for other configurations and applications. For example, for applications that experience significant performance improvement when hot spares are added (e.g., FFT and FILTER), hot sparing can improve LCP by more than 75% on average compared with conventional methods (i.e., cold sparing). In particular, microarchitectural hot redundant resources (e.g., hot spare lanes) achieve better LCP improvement than conventional architectural redundancies (e.g., hot spare cores).
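The distinction between cold and hot spares can be illustrated with a minimal sketch. It is not the paper's model: it assumes cores fail independently at manufacturing with a single per-core yield, it ignores wear-out in the field, and it does not compute LCP. The function names and parameters (`chip_yield`, `expected_active_cores`, `core_yield`) are hypothetical and chosen only for this illustration.

```python
from math import comb


def chip_yield(required, spares, core_yield):
    """Probability that at least `required` of `required + spares` cores work.

    Assumption: cores are defective independently, each working with
    probability `core_yield`.
    """
    total = required + spares
    return sum(
        comb(total, k) * core_yield**k * (1 - core_yield) ** (total - k)
        for k in range(required, total + 1)
    )


def expected_active_cores(required, spares, core_yield, hot):
    """Mean number of cores in use on a chip that passes manufacturing test.

    Cold spares: a passing chip always runs exactly `required` cores.
    Hot spares: every working core is used, so the count can exceed `required`.
    """
    if not hot:
        return float(required)
    total = required + spares
    weighted = sum(
        k * comb(total, k) * core_yield**k * (1 - core_yield) ** (total - k)
        for k in range(required, total + 1)
    )
    return weighted / chip_yield(required, spares, core_yield)


if __name__ == "__main__":
    # Illustrative numbers only: 4 required cores, 1 spare, 90% per-core yield.
    print("yield without spare:", chip_yield(4, 0, 0.9))
    print("yield with one spare:", chip_yield(4, 1, 0.9))
    print("active cores, cold:", expected_active_cores(4, 1, 0.9, hot=False))
    print("active cores, hot:", expected_active_cores(4, 1, 0.9, hot=True))
```

Under these assumptions, both schemes obtain the same yield from the spare, but only the hot-spare chip puts a defect-free spare to work, which is the source of the performance benefit the abstract describes.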
Acknowledgements
This research was made possible with funding from Fonds de recherche Nature et technologies du Québec (FRQNT), and CAD tools from CMC Microsystems.
Cite this article
Mozafari, S.H., Meyer, B.H. Hot sparing for lifetime-chip-performance and cost improvement in application specific SIMT processors. Des Autom Embed Syst 24, 249–266 (2020). https://doi.org/10.1007/s10617-020-09238-2
DOI: https://doi.org/10.1007/s10617-020-09238-2