Abstract
The computational capability of the largest high performance computing systems has increased more than 100-fold in the last 10 years and continues to grow substantially every year. This increase is made possible mostly by multi-core technology, in addition to higher CPU clock speeds. Systems with more than 100 thousand cores available for simultaneous processing now exist. Computational simulation tools always need more computational resources than are available, especially for complex, large-scale flow problems. For these large-scale problems, the soft error tolerance of simulation codes should also be taken into account, whereas it is not an issue for relatively small-scale problems because of the low probability of occurrence. In this study, we analyzed the reaction of an incompressible flow solver to randomly generated soft errors at several levels of computation. Soft errors are injected into the final global assembly matrix of the solver through predetermined bit-flip operations. The behaviour of the computational fluid dynamics (CFD) solver is observed after the iterative matrix solver, flow convergence and CFD iterations. Results show that the iterative solvers of CFD matrices are highly sensitive to customized soft errors, while the final solutions appear more robust to bit-flip operations. However, the solutions might still differ from the real physical results depending on the bit-flip location and iteration number. Next-generation computing platforms and codes should therefore be designed to detect bit flips and to be resistant to them.
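To make the notion of a bit-flip soft error concrete, the following minimal sketch inverts a single bit of a double-precision value and applies it to one entry of a small matrix. It assumes IEEE 754 binary64 storage; the helper name `flip_bit` and the 2x2 matrix `A` are hypothetical illustrations and are not taken from the solver used in the study.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its IEEE 754 binary64 encoding inverted.

    Bit 0 is the least significant mantissa bit, bits 52-62 hold the
    exponent, and bit 63 is the sign bit.
    """
    if not 0 <= bit < 64:
        raise ValueError("bit must be in the range 0..63")
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (as_int,) = struct.unpack("<Q", struct.pack("<d", value))
    # XOR with a one-bit mask inverts exactly that bit.
    (flipped,) = struct.unpack("<d", struct.pack("<Q", as_int ^ (1 << bit)))
    return flipped


# Hypothetical 2x2 system matrix, used only to illustrate the injection.
A = [[4.0, -1.0],
     [-1.0, 4.0]]

# Inject a single soft error into one matrix entry (lowest exponent bit).
A[0][0] = flip_bit(A[0][0], 52)
```

The effect depends strongly on the bit position: a flip in the low mantissa bits perturbs the value at the level of rounding error, whereas a flip in the exponent or sign bit changes its magnitude or sign outright, which is consistent with the abstract's remark that the outcome depends on the bit-flip location.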
Funding
This research did not receive any specific grant from any funding agencies in the public, commercial, or not-for-profit sectors.
Ethics declarations
Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.
About this article
Cite this article
Yetkin, E.F., Pişkin, Ş. Sensitivity of computational fluid dynamics simulations against soft errors. Computing 103, 2687–2709 (2021). https://doi.org/10.1007/s00607-021-00976-0
DOI: https://doi.org/10.1007/s00607-021-00976-0
Keywords
- High performance computing
- Navier–Stokes equations
- Fault tolerance
- Silent data corruption
- BiCG
- Exascale/petascale computing
- Bit-flip error
- Simulation platform