Abstract
A parallel data structure that gives optimized memory layout for problems involving iterative solution of sparse linear systems is developed, and its efficient implementation is presented. The proposed method assigns a processor to a problem subdomain, and sorts data based on the shared entries with the adjacent subdomains. Matrix–vector-product communication overhead is reduced and parallel scalability is improved by overlapping inter-processor communications and local computations. The proposed method simplifies the implementation of parallel iterative linear equation solver algorithms and reduces the computational cost of vector inner products and matrix–vector products. Numerical results demonstrate very good performance of the proposed technique.
References
Balay S, Brown J, Buschelman K, Eijkhout V, Gropp W, Kaushik D, Knepley M, Curfman McInnes L, Smith B, Zhang H (2013) PETSc users manual, revision 3.4
Bazilevs Y, Calo VM, Cottrell JA, Hughes TJR, Reali A, Scovazzi G (2007) Variational multiscale residual-based turbulence modeling for large eddy simulation of incompressible flows. Comput Methods Appl Mech Eng 197(1–4):173–201
Bazilevs Y, Takizawa K, Tezduyar TE (2013) Computational fluid-structure interaction: methods and applications. Wiley, New York
Behr M, Johnson A, Kennedy J, Mittal S, Tezduyar T (1993) Computation of incompressible flows with implicit finite element implementations on the Connection Machine. Comput Methods Appl Mech Eng 108:99–118
Behr M, Tezduyar TE (1994) Finite element solution strategies for large-scale flow simulations. Comput Methods Appl Mech Eng 112:3–24
Brooks AN, Hughes TJR (1982) Streamline upwind/Petrov–Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations. Comput Methods Appl Mech Eng 32(1–3):199–259
Elman H, Silvester D, Wathen A (2014) Finite elements and fast iterative solvers with applications in incompressible fluid dynamics. Oxford University Press, New York
Esmaily-Moghadam M, Bazilevs Y, Hsia TY, Vignon-Clementel I, Marsden AL (2011) A comparison of outlet boundary treatments for prevention of backflow divergence with relevance to blood flow simulations. Comput Mech 48(3):277–291
Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2013) Low entropy data mapping for sparse iterative linear solvers. In: Proceedings of the conference on extreme science and engineering discovery environment: gateway to discovery. ACM, p 2
Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2013) A new preconditioning technique for implicitly coupled multidomain simulations with applications to hemodynamics. Comput Mech 52:1141–1152. doi:10.1007/s00466-013-0868-1
Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2014) A bi-partitioned iterative algorithm for solving linear systems arising from incompressible flow problems. Comput Methods Appl Mech Eng, in review
Esmaily-Moghadam M, Hsia T-Y, Marsden AL (2013) A non-discrete method for computation of residence time in fluid mechanics simulations. Phys Fluids. doi:10.1063/1.4819142
Esmaily-Moghadam M, Migliavacca F, Vignon-Clementel IE, Hsia TY, Marsden AL (2012) Optimization of shunt placement for the Norwood surgery using multi-domain modeling. J Biomech Eng 134(5):051002
Esmaily-Moghadam M, Vignon-Clementel IE, Figliola R, Marsden AL (2013) A modular numerical method for implicit 0D/3D coupling in cardiovascular finite element simulations. J Comput Phys 224:63–79
Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49:409–436
Jansen KE, Whiting CH, Hulbert GM (2000) A generalized-α method for integrating the filtered Navier-Stokes equations with a stabilized finite element method. Comput Methods Appl Mech Eng 190(3–4):305–319
Johnson AA, Tezduyar TE (1997) 3D simulation of fluid-particle interactions with the number of particles reaching 100. Comput Methods Appl Mech Eng 145:301–321
Karypis G, Kumar V (2009) MeTis: unstructured graph partitioning and sparse matrix ordering system, Version 4.0. http://www.cs.umn.edu/~metis
Kennedy JG, Behr M, Kalro V, Tezduyar TE (1994) Implementation of implicit finite element methods for incompressible flows on the CM-5. Comput Methods Appl Mech Eng 119:95–111
Kuck DJ, Davidson ES, Lawrie DH, Sameh AH (1986) Parallel supercomputing today and the cedar approach. Science 231(4741):967–974
Manguoglu M, Sameh AH, Saied F, Tezduyar TE, Sathe S (2009) Preconditioning techniques for nonsymmetric linear systems in the computation of incompressible flows. J Appl Mech 76(2):021204
Manguoglu M, Sameh AH, Tezduyar TE, Sathe S (2008) A nested iterative scheme for computation of incompressible flows in long domains. Comput Mech 43(1):73–80
Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2010) Solution of linear systems in arterial fluid mechanics computations with boundary layer mesh refinement. Comput Mech 46(1):83–89
Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2011) Nested and parallel sparse algorithms for arterial fluid mechanics computations with boundary layer mesh refinement. Int J Numer Methods Fluids 65(1–3):135–149
Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2011) A parallel sparse algorithm targeting arterial fluid mechanics computations. Comput Mech 48(3):377–384
Nigro N, Storti M, Idelsohn S, Tezduyar T (1998) Physics based GMRES preconditioner for compressible and incompressible Navier-Stokes equations. Comput Methods Appl Mech Eng 154:203–228
Polizzi E, Sameh AH (2006) A parallel hybrid banded system solver: the SPIKE algorithm. Parallel Comput 32(2):177–194
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM, Philadelphia
Saad Y, Schultz MH (1983) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. Technical Report YALEU/DCS/RR-254, Department of Computer Science, Yale University, New Haven
Sameh AH, Kuck DJ (1978) On stable parallel linear system solvers. J ACM 25(1):81–91
Sengupta D, Kahn A, Burns J, Sankaran S, Shadden S, Marsden A (2012) Image-based modeling of hemodynamics in coronary artery aneurysms caused by Kawasaki disease. Biomech Model Mechanobiol 11:915–932
Tezduyar T, Aliabadi S, Behr M, Johnson A, Mittal S (1993) Parallel finite-element computation of 3D flows. Computer 26(10):27–36
Tezduyar TE (2001) Finite element methods for flow problems with moving boundaries and interfaces. Arch Comput Methods Eng 8:83–130
Tezduyar TE (2007) Finite elements in fluids: special methods and enhanced solution techniques. Comput Fluids 36:207–223
Tezduyar TE, Behr M, Aliabadi SK, Mittal S, Ray SE (1992) A new mixed preconditioning method for finite element computations. Comput Methods Appl Mech Eng 99:27–42
Tezduyar TE, Liou J (1989) Grouped element-by-element iteration schemes for incompressible flow computations. Comput Phys Commun 53:441–453
Tezduyar TE, Mittal S, Ray SE, Shih R (1992) Incompressible flow computations with stabilized bilinear and linear equal-order-interpolation velocity-pressure elements. Comput Methods Appl Mech Eng 95:221–242
Tezduyar TE, Sathe S (2004) Enhanced-approximation linear solution technique (EALST). Comput Methods Appl Mech Eng 193:2033–2049
Tezduyar TE, Sathe S (2005) Enhanced-discretization successive update method (EDSUM). Int J Numer Methods Fluids 47:633–654
Tezduyar TE, Sathe S (2007) Modeling of fluid-structure interactions with the space-time finite elements: Solution techniques. Int J Numer Methods Fluids 54:855–900
Tezduyar TE (2003) Computation of moving boundaries and interfaces and stabilization parameters. Int J Numer Methods Fluids 43(5):555–575
Tezduyar TE, Sameh AH (2006) Parallel finite element computations in fluid mechanics. Comput Methods Appl Mech Eng 195(13):1872–1884
Washio T, Hisada T, Watanabe H, Tezduyar TE (2005) A robust preconditioner for fluid-structure interaction problems. Comput Methods Appl Mech Eng 194:4027–4047
Acknowledgments
Funding for this work was provided by a Leducq Foundation Network of Excellence Grant, a Burroughs Wellcome Fund Career Award at the Scientific Interface, and NIH grant RHL102596A. The second author was supported by NSF CAREER award OCI-105509. Computational resources were provided by the XSEDE program.
Appendix
The performance of the present method is compared with that of PETSc, a widely adopted library of linear algebra routines [1]. The Cray PETSc 3.2.00 release, equivalent to the 3.2-p5 release from Argonne National Laboratory, is used for the comparison. Since the implementations of iterative linear algebra algorithms may differ significantly, only the matrix–vector product operation is considered in this comparison study. Our choices for the PETSc matrix type, its initialization and assembly, and the matrix–vector product operation are included here for completeness:
In the above pseudo-code, \(\tilde{\varvec{x}}\) and \(\tilde{\varvec{A}}\) are the memory locations of the vector \(\varvec{x}\) and the matrix \(\varvec{A}\), \(\varvec{t}\) is a temporary array, and \(I^i\) and \(J^i\) are C-compatible integer arrays, i.e., indexed with 0 as the first entry.
Computations associated with Fig. 5 are repeated using PETSc. One hundred matrix–vector products are computed and \(t_\mathrm{cpu}\) is measured. To compare the methods in terms of minimal time to completion of a matrix–vector product operation, the cases of \(n_\mathrm{p}=8\), 16, and 64 are considered for the small, medium, and large models, respectively (see Table 1). These correspond to near-peak performance of both techniques (see Fig. 5). The results show that the difference between the peak performance of the present method and that of PETSc becomes more apparent as the problem size increases: the higher the number of partitions, the better the present method performs relative to PETSc. The improvement for the small, medium, and large models is 65, 177, and 516 %, respectively. Repeating the computations on a different machine and with a different version of PETSc had minimal effect on the results. Note that, among the several options available, we used a basic PETSc matrix type in this comparison. PETSc results may depend on the matrix type; however, this was not investigated here.
Cite this article
Esmaily-Moghadam, M., Bazilevs, Y. & Marsden, A.L. Impact of data distribution on the parallel performance of iterative linear solvers with emphasis on CFD of incompressible flows. Comput Mech 55, 93–103 (2015). https://doi.org/10.1007/s00466-014-1084-3