Abstract
A parallel data structure that gives optimized memory layout for problems involving iterative solution of sparse linear systems is developed, and its efficient implementation is presented. The proposed method assigns a processor to a problem subdomain, and sorts data based on the shared entries with the adjacent subdomains. Matrix–vector-product communication overhead is reduced and parallel scalability is improved by overlapping inter-processor communications and local computations. The proposed method simplifies the implementation of parallel iterative linear equation solver algorithms and reduces the computational cost of vector inner products and matrix–vector products. Numerical results demonstrate very good performance of the proposed technique.
References
Balay S, Brown J, Buschelman K, Eijkhout V, Gropp W, Kaushik D, Knepley M, Curfman McInnes L, Smith B, Zhang H (2013) PETSc users manual, revision 3.4
Bazilevs Y, Calo VM, Cottrell JA, Hughes TJR, Reali A, Scovazzi G (2007) Variational multiscale residual-based turbulence modeling for large eddy simulation of incompressible flows. Comput Methods Appl Mech Eng 197(1–4):173–201
Bazilevs Y, Takizawa K, Tezduyar TE (2013) Computational fluid-structure interaction: methods and applications. Wiley, New York
Behr M, Johnson A, Kennedy J, Mittal S, Tezduyar T (1993) Computation of incompressible flows with implicit finite element implementations on the Connection Machine. Comput Methods Appl Mech Eng 108:99–118
Behr M, Tezduyar TE (1994) Finite element solution strategies for large-scale flow simulations. Comput Methods Appl Mech Eng 112:3–24
Brooks AN, Hughes TJR (1982) Streamline upwind/Petrov–Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations. Comput Methods Appl Mech Eng 32(1–3):199–259
Elman H, Silvester D, Wathen A (2014) Finite elements and fast iterative solvers with applications in incompressible fluid dynamics. Oxford University Press, New York
Esmaily-Moghadam M, Bazilevs Y, Hsia TY, Vignon-Clementel I, Marsden AL (2011) A comparison of outlet boundary treatments for prevention of backflow divergence with relevance to blood flow simulations. Comput Mech 48(3):277–291
Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2013) Low entropy data mapping for sparse iterative linear solvers. In: Proceedings of the conference on extreme science and engineering discovery environment: gateway to discovery. ACM, p 2
Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2013) A new preconditioning technique for implicitly coupled multidomain simulations with applications to hemodynamics. Comput Mech 52:1141–1152. doi:10.1007/s00466-013-0868-1
Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2014) A bi-partitioned iterative algorithm for solving linear systems arising from incompressible flow problems. Comput Methods Appl Mech Eng, in review
Esmaily-Moghadam M, Hsia T-Y, Marsden AL (2013) A non-discrete method for computation of residence time in fluid mechanics simulations. Phys Fluids. doi:10.1063/1.4819142
Esmaily-Moghadam M, Migliavacca F, Vignon-Clementel IE, Hsia TY, Marsden AL (2012) Optimization of shunt placement for the Norwood surgery using multi-domain modeling. J Biomech Eng 134(5):051002
Esmaily-Moghadam M, Vignon-Clementel IE, Figliola R, Marsden AL (2013) A modular numerical method for implicit 0D/3D coupling in cardiovascular finite element simulations. J Comput Phys 224:63–79
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49:409–436
Jansen KE, Whiting CH, Hulbert GM (2000) A generalized-α method for integrating the filtered Navier-Stokes equations with a stabilized finite element method. Comput Methods Appl Mech Eng 190(3–4):305–319
Johnson AA, Tezduyar TE (1997) 3D simulation of fluid-particle interactions with the number of particles reaching 100. Comput Methods Appl Mech Eng 145:301–321
Karypis G, Kumar V (2009) MeTis: unstructured graph partitioning and sparse matrix ordering system, Version 4.0. http://www.cs.umn.edu/~metis
Kennedy JG, Behr M, Kalro V, Tezduyar TE (1994) Implementation of implicit finite element methods for incompressible flows on the CM-5. Comput Methods Appl Mech Eng 119:95–111
Kuck DJ, Davidson ES, Lawrie DH, Sameh AH (1986) Parallel supercomputing today and the cedar approach. Science 231(4741):967–974
Manguoglu M, Sameh AH, Saied F, Tezduyar TE, Sathe S (2009) Preconditioning techniques for nonsymmetric linear systems in the computation of incompressible flows. J Appl Mech 76(2):021204
Manguoglu M, Sameh AH, Tezduyar TE, Sathe S (2008) A nested iterative scheme for computation of incompressible flows in long domains. Comput Mech 43(1):73–80
Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2010) Solution of linear systems in arterial fluid mechanics computations with boundary layer mesh refinement. Comput Mech 46(1):83–89
Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2011) Nested and parallel sparse algorithms for arterial fluid mechanics computations with boundary layer mesh refinement. Int J Numer Methods Fluids 65(1–3):135–149
Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2011) A parallel sparse algorithm targeting arterial fluid mechanics computations. Comput Mech 48(3):377–384
Nigro N, Storti M, Idelsohn S, Tezduyar T (1998) Physics based GMRES preconditioner for compressible and incompressible Navier-Stokes equations. Comput Methods Appl Mech Eng 154:203–228
Polizzi E, Sameh AH (2006) A parallel hybrid banded system solver: the SPIKE algorithm. Parallel Comput 32(2):177–194
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia
Saad Y, Schultz MH (1983) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. Technical Report YALEU/DCS/RR-254, Department of Computer Science, Yale University, New Haven
Sameh AH, Kuck DJ (1978) On stable parallel linear system solvers. J ACM 25(1):81–91
Sengupta D, Kahn A, Burns J, Sankaran S, Shadden S, Marsden A (2012) Image-based modeling of hemodynamics in coronary artery aneurysms caused by Kawasaki disease. Biomech Model Mechanobiol 11:915–932
Tezduyar T, Aliabadi S, Behr M, Johnson A, Mittal S (1993) Parallel finite-element computation of 3D flows. Computer 26(10):27–36
Tezduyar TE (2001) Finite element methods for flow problems with moving boundaries and interfaces. Arch Comput Methods Eng 8:83–130
Tezduyar TE (2007) Finite elements in fluids: special methods and enhanced solution techniques. Comput Fluids 36:207–223
Tezduyar TE, Behr M, Aliabadi SK, Mittal S, Ray SE (1992) A new mixed preconditioning method for finite element computations. Comput Methods Appl Mech Eng 99:27–42
Tezduyar TE, Liou J (1989) Grouped element-by-element iteration schemes for incompressible flow computations. Comput Phys Commun 53:441–453
Tezduyar TE, Mittal S, Ray SE, Shih R (1992) Incompressible flow computations with stabilized bilinear and linear equal-order-interpolation velocity-pressure elements. Comput Methods Appl Mech Eng 95:221–242
Tezduyar TE, Sathe S (2004) Enhanced-approximation linear solution technique (EALST). Comput Methods Appl Mech Eng 193:2033–2049
Tezduyar TE, Sathe S (2005) Enhanced-discretization successive update method (EDSUM). Int J Numer Methods Fluids 47:633–654
Tezduyar TE, Sathe S (2007) Modeling of fluid-structure interactions with the space-time finite elements: Solution techniques. Int J Numer Methods Fluids 54:855–900
Tezduyar TE (2003) Computation of moving boundaries and interfaces and stabilization parameters. Int J Numer Methods Fluids 43(5):555–575
Tezduyar TE, Sameh AH (2006) Parallel finite element computations in fluid mechanics. Comput Methods Appl Mech Eng 195(13):1872–1884
Washio T, Hisada T, Watanabe H, Tezduyar TE (2005) A robust preconditioner for fluid-structure interaction problems. Comput Methods Appl Mech Eng 194:4027–4047
Acknowledgments
Funding for this work was provided by a Leducq Foundation Network of Excellence Grant, a Burroughs Wellcome Fund Career Award at the Scientific Interface, and NIH grant RHL102596A. The second author was supported by NSF CAREER award OCI-105509. Computational resources were provided by the XSEDE program.
Appendix
The performance of the present method is compared with that of PETSc, a widely adopted library of linear algebra routines [1]. The Cray PETSc 3.2.00 release, equivalent to the 3.2-p5 release from Argonne National Laboratory, is used for the comparison. Since the implementations of iterative linear algebra algorithms may differ significantly, only the matrix–vector product operation is considered in this comparison study. Our choices for the PETSc matrix type, its initialization and assembly, and the matrix–vector product operation are included here for completeness:
In the above pseudo-code, \(\tilde{\varvec{x}}\) and \(\tilde{\varvec{A}}\) are the memory locations of the vector \(\varvec{x}\) and the matrix \(\varvec{A}\), \(\varvec{t}\) is a temporary array, and \(I^i\) and \(J^i\) are C-compatible integer arrays, i.e., indexed with 0 as the first entry.
Computations associated with Fig. 5 are repeated using PETSc. One hundred matrix–vector products are computed and \(t_\mathrm{cpu}\) is measured. To compare the methods in terms of minimal time to completion of a matrix–vector product operation, the cases of \(n_\mathrm{p}=8\), 16, and 64 are considered for the small, medium, and large models, respectively (see Table 1). These correspond to near-peak performance of both techniques (see Fig. 5). The results show that the difference between the peak performance of the present method and that of PETSc becomes more apparent as the problem size increases: the higher the number of partitions, the better the present method performs relative to PETSc. The improvement for the small, medium, and large models is 65, 177, and 516 %, respectively. Repeating the computations on a different machine and with a different version of PETSc had minimal effect on the results. Note that, among the several options available, we used a basic PETSc matrix type in this comparison. PETSc results may depend on the matrix type; however, this was not investigated here.
Cite this article
Esmaily-Moghadam, M., Bazilevs, Y. & Marsden, A.L. Impact of data distribution on the parallel performance of iterative linear solvers with emphasis on CFD of incompressible flows. Comput Mech 55, 93–103 (2015). https://doi.org/10.1007/s00466-014-1084-3