
Impact of data distribution on the parallel performance of iterative linear solvers with emphasis on CFD of incompressible flows

Original Paper · Computational Mechanics

Abstract

A parallel data structure that gives optimized memory layout for problems involving iterative solution of sparse linear systems is developed, and its efficient implementation is presented. The proposed method assigns a processor to a problem subdomain, and sorts data based on the shared entries with the adjacent subdomains. Matrix–vector-product communication overhead is reduced and parallel scalability is improved by overlapping inter-processor communications and local computations. The proposed method simplifies the implementation of parallel iterative linear equation solver algorithms and reduces the computational cost of vector inner products and matrix–vector products. Numerical results demonstrate very good performance of the proposed technique.
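The overlap of communication and computation described above can be illustrated with a minimal serial sketch (all names here are hypothetical, not the paper's implementation): unknowns in each subdomain are ordered so that entries shared with neighboring subdomains come last; the matrix-vector product is then computed on interior rows while boundary values would be exchanged, and completed on boundary rows afterward.

```python
import numpy as np

def split_spmv(A, x, n_interior):
    """Sketch of an overlapped matrix-vector product.

    Unknowns are ordered so the first n_interior rows touch only
    local data; the remaining 'boundary' rows depend on values
    shared with neighboring subdomains.  In a parallel code the
    boundary exchange (a no-op in this serial sketch) would be a
    non-blocking send/receive posted before step 2 and completed
    before step 3.
    """
    y = np.empty_like(x)
    # 1) post the boundary exchange (non-blocking in a real code)
    # 2) compute interior rows while the exchange is in flight
    y[:n_interior] = A[:n_interior] @ x
    # 3) wait for the exchange to finish, then do boundary rows
    y[n_interior:] = A[n_interior:] @ x
    return y

A = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
x = np.array([1.0, 2.0, 3.0])
print(split_spmv(A, x, n_interior=2))  # same result as A @ x
```

The point of the ordering is that step 2 needs no remote data, so its cost hides the latency of the exchange posted in step 1.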



References

  1. Balay S, Brown J, Buschelman K, Eijkhout V, Gropp W, Kaushik D, Knepley M, Curfman McInnes L, Smith B, Zhang H (2013) PETSc users manual, revision 3.4

  2. Bazilevs Y, Calo VM, Cottrell JA, Hughes TJR, Reali A, Scovazzi G (2007) Variational multiscale residual-based turbulence modeling for large eddy simulation of incompressible flows. Comput Methods Appl Mech Eng 197(1–4):173–201


  3. Bazilevs Y, Takizawa K, Tezduyar TE (2013) Computational fluid-structure interaction: methods and applications. Wiley, New York


  4. Behr M, Johnson A, Kennedy J, Mittal S, Tezduyar T (1993) Computation of incompressible flows with implicit finite element implementations on the Connection Machine. Comput Methods Appl Mech Eng 108:99–118


  5. Behr M, Tezduyar TE (1994) Finite element solution strategies for large-scale flow simulations. Comput Methods Appl Mech Eng 112:3–24


  6. Brooks AN, Hughes TJR (1982) Streamline upwind/Petrov–Galerkin formulations for convection dominated flows with particular emphasis on the incompressible Navier-Stokes equations. Comput Methods Appl Mech Eng 32(1–3):199–259


  7. Elman H, Silvester D, Wathen A (2014) Finite elements and fast iterative solvers with applications in incompressible fluid dynamics. Oxford University Press, New York


  8. Esmaily-Moghadam M, Bazilevs Y, Hsia TY, Vignon-Clementel I, Marsden AL (2011) A comparison of outlet boundary treatments for prevention of backflow divergence with relevance to blood flow simulations. Comput Mech 48(3):277–291


  9. Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2013) Low entropy data mapping for sparse iterative linear solvers. In: Proceedings of the conference on extreme science and engineering discovery environment: gateway to discovery. ACM, p 2

  10. Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2013) A new preconditioning technique for implicitly coupled multidomain simulations with applications to hemodynamics. Comput Mech 52:1141–1152. doi:10.1007/s00466-013-0868-1


  11. Esmaily-Moghadam M, Bazilevs Y, Marsden AL (2014) A bi-partitioned iterative algorithm for solving linear systems arising from incompressible flow problems. Comput Methods Appl Mech Eng, in review

  12. Esmaily-Moghadam M, Hsia T-Y, Marsden AL (2013) A non-discrete method for computation of residence time in fluid mechanics simulations. Phys Fluids. doi:10.1063/1.4819142

  13. Esmaily-Moghadam M, Migliavacca F, Vignon-Clementel IE, Hsia TY, Marsden AL (2012) Optimization of shunt placement for the Norwood surgery using multi-domain modeling. J Biomech Eng 134(5):051002


  14. Esmaily-Moghadam M, Vignon-Clementel IE, Figliola R, Marsden AL (2013) A modular numerical method for implicit 0D/3D coupling in cardiovascular finite element simulations. J Comput Phys 224:63–79


  15. Hestenes MR, Stiefel E (1952) Methods of conjugate gradients for solving linear systems. J Res Natl Bur Stand 49:409–436


  16. Jansen KE, Whiting CH, Hulbert GM (2000) A generalized-α method for integrating the filtered Navier-Stokes equations with a stabilized finite element method. Comput Methods Appl Mech Eng 190(3–4):305–319


  17. Johnson AA, Tezduyar TE (1997) 3D simulation of fluid-particle interactions with the number of particles reaching 100. Comput Methods Appl Mech Eng 145:301–321


  18. Karypis G, Kumar V (2009) MeTis: unstructured graph partitioning and sparse matrix ordering system, Version 4.0. http://www.cs.umn.edu/~metis

  19. Kennedy JG, Behr M, Kalro V, Tezduyar TE (1994) Implementation of implicit finite element methods for incompressible flows on the CM-5. Comput Methods Appl Mech Eng 119:95–111


  20. Kuck DJ, Davidson ES, Lawrie DH, Sameh AH (1986) Parallel supercomputing today and the cedar approach. Science 231(4741):967–974


  21. Manguoglu M, Sameh AH, Saied F, Tezduyar TE, Sathe S (2009) Preconditioning techniques for nonsymmetric linear systems in the computation of incompressible flows. J Appl Mech 76(2):021204


  22. Manguoglu M, Sameh AH, Tezduyar TE, Sathe S (2008) A nested iterative scheme for computation of incompressible flows in long domains. Comput Mech 43(1):73–80


  23. Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2010) Solution of linear systems in arterial fluid mechanics computations with boundary layer mesh refinement. Comput Mech 46(1):83–89


  24. Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2011) Nested and parallel sparse algorithms for arterial fluid mechanics computations with boundary layer mesh refinement. Int J Numer Methods Fluids 65(1–3):135–149


  25. Manguoglu M, Takizawa K, Sameh AH, Tezduyar TE (2011) A parallel sparse algorithm targeting arterial fluid mechanics computations. Comput Mech 48(3):377–384


  26. Nigro N, Storti M, Idelsohn S, Tezduyar T (1998) Physics based GMRES preconditioner for compressible and incompressible Navier-Stokes equations. Comput Methods Appl Mech Eng 154:203–228

  27. Polizzi E, Sameh AH (2006) A parallel hybrid banded system solver: the SPIKE algorithm. Parallel Comput 32(2):177–194


  28. Saad Y (2003) Iterative methods for sparse linear systems. SIAM, Philadelphia

  29. Saad Y, Schultz MH (1983) GMRES: a generalized minimal residual algorithm for solving nonsymmetric linear systems. Technical Report YALEU/DCS/RR-254, Department of Computer Science, Yale University

  30. Sameh AH, Kuck DJ (1978) On stable parallel linear system solvers. J ACM 25(1):81–91


  31. Sengupta D, Kahn A, Burns J, Sankaran S, Shadden S, Marsden A (2012) Image-based modeling of hemodynamics in coronary artery aneurysms caused by Kawasaki disease. Biomech Model Mechanobiol 11:915–932


  32. Tezduyar T, Aliabadi S, Behr M, Johnson A, Mittal S (1993) Parallel finite-element computation of 3D flows. Computer 26(10):27–36


  33. Tezduyar TE (2001) Finite element methods for flow problems with moving boundaries and interfaces. Arch Comput Methods Eng 8:83–130


  34. Tezduyar TE (2007) Finite elements in fluids: special methods and enhanced solution techniques. Comput Fluids 36:207–223


  35. Tezduyar TE, Behr M, Aliabadi SK, Mittal S, Ray SE (1992) A new mixed preconditioning method for finite element computations. Comput Methods Appl Mech Eng 99:27–42


  36. Tezduyar TE, Liou J (1989) Grouped element-by-element iteration schemes for incompressible flow computations. Comput Phys Commun 53:441–453


  37. Tezduyar TE, Mittal S, Ray SE, Shih R (1992) Incompressible flow computations with stabilized bilinear and linear equal-order-interpolation velocity-pressure elements. Comput Methods Appl Mech Eng 95:221–242


  38. Tezduyar TE, Sathe S (2004) Enhanced-approximation linear solution technique (EALST). Comput Methods Appl Mech Eng 193:2033–2049


  39. Tezduyar TE, Sathe S (2005) Enhanced-discretization successive update method (EDSUM). Int J Numer Methods Fluids 47:633–654


  40. Tezduyar TE, Sathe S (2007) Modeling of fluid-structure interactions with the space-time finite elements: Solution techniques. Int J Numer Methods Fluids 54:855–900


  41. Tezduyar TE (2003) Computation of moving boundaries and interfaces and stabilization parameters. Int J Numer Methods Fluids 43(5):555–575


  42. Tezduyar TE, Sameh AH (2006) Parallel finite element computations in fluid mechanics. Comput Methods Appl Mech Eng 195(13):1872–1884


  43. Washio T, Hisada T, Watanabe H, Tezduyar TE (2005) A robust preconditioner for fluid-structure interaction problems. Comput Methods Appl Mech Eng 194:4027–4047



Acknowledgments

Funding for this work was provided by a Leducq Foundation Network of Excellence Grant, a Burroughs Wellcome Fund Career Award at the Scientific Interface, and the NIH grant RHL102596A. The second author was supported by the NSF CAREER award OCI-105509. The computational resources were provided by the national XSEDE program.

Author information

Corresponding author

Correspondence to Y. Bazilevs.

Appendix


The performance of the present method is compared with that of PETSc, a widely adopted library of linear algebra routines [1]. The Cray PETSc 3.2.00 release, equivalent to the 3.2-p5 release from Argonne National Laboratory, is used for the comparison. Since the implementations of full iterative linear algebra algorithms may differ significantly, only the matrix-vector product operation is considered in this comparison study. Our choices for the PETSc matrix type, its initialization and assembly, and the matrix-vector product operation are included here for completeness:

[Pseudo-code listing: PETSc matrix creation, assembly, and matrix-vector product calls]

In the above pseudo-code, \(\tilde{\varvec{x}}\) and \(\tilde{\varvec{A}}\) are the memory locations of the vector \(\varvec{x}\) and the matrix \(\varvec{A}\), \(\varvec{t}\) is a temporary array, and \(I^i\) and \(J^i\) are C-compatible integer arrays with 0 as the first entry.
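To illustrate the role of the 0-based row-pointer and column-index arrays described above, here is a minimal sparse matrix-vector product in compressed-sparse-row form (a generic sketch with hypothetical names, not the paper's or PETSc's implementation):

```python
def csr_matvec(I, J, V, x):
    """Compute y = A @ x for A stored in CSR form.

    I: row pointers (length n+1, I[0] == 0, C-style 0-based);
       row `r` owns nonzeros k in the range I[r] <= k < I[r+1]
    J: column indices of the nonzeros, stored row by row
    V: nonzero values, in the same ordering as J
    """
    n = len(I) - 1
    y = [0.0] * n
    for row in range(n):
        for k in range(I[row], I[row + 1]):
            y[row] += V[k] * x[J[k]]
    return y

# The 3x3 tridiagonal matrix [[2,-1,0],[-1,2,-1],[0,-1,2]]:
I = [0, 2, 5, 7]
J = [0, 1, 0, 1, 2, 1, 2]
V = [2.0, -1.0, -1.0, 2.0, -1.0, -1.0, 2.0]
print(csr_matvec(I, J, V, [1.0, 2.0, 3.0]))  # [0.0, 0.0, 4.0]
```

The 0-based convention matters because PETSc expects C-style indices; a code using 1-based (Fortran-style) arrays must shift them before assembly.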

Computations associated with Fig. 5 are repeated using PETSc. One hundred matrix-vector products are computed and \(t_\mathrm{cpu}\) is measured. To compare the methods at their minimal time to completion of a matrix-vector product operation, the cases \(n_\mathrm{p}=8\), 16, and 64 are considered for the small, medium, and large models, respectively (see Table 1); these correspond to near-peak performance of both techniques (see Fig. 5). The results show that as the problem size increases, the difference between the peak performance of the present method and that of PETSc becomes more apparent: the higher the number of partitions, the better the present method performs relative to PETSc. The improvement for the small, medium, and large models is 65, 177, and 516 %, respectively. Repeating the computations on a different machine and with a different version of PETSc had minimal effect on the results. Note that, among the several options available, we used the basic PETSc matrix type in this comparison; PETSc results may depend on the matrix type, but this was not investigated here.


About this article


Cite this article

Esmaily-Moghadam, M., Bazilevs, Y. & Marsden, A.L. Impact of data distribution on the parallel performance of iterative linear solvers with emphasis on CFD of incompressible flows. Comput Mech 55, 93–103 (2015). https://doi.org/10.1007/s00466-014-1084-3

