Abstract
A scalable matrix solver was developed for the moving particle hydrodynamics for incompressible flows (MPH-I) method. Since the MPH-I method can calculate both incompressible and highly viscous flows while ensuring stability through physical consistency, a wide range of industrial applications is expected. However, in its implicit calculation, the pressure and velocity must be solved simultaneously via a linear equation with an indefinite symmetric coefficient matrix. In this study, this indefinite linear system was converted into a symmetric positive definite (SPD) system in which only the velocity is unknown. This conversion enabled the system to be solved with well-known solvers such as the conjugate gradient (CG) and conjugate residual (CR) methods. For scalability, bucket-based multigrid preconditioned CG and CR solvers were developed for the SPD system. To handle multidimensionality during preconditioning, an extended Jacobi smoother that is applicable even to non-diagonally dominant matrix systems was proposed. The numerical efficiency was confirmed via a simple high-viscosity incompressible dam break calculation, and scalability within the presented case was confirmed. In addition, the performance under shared-memory parallel computation was studied.
1 Introduction
Particle methods can easily handle large deformations of free surface flows compared to mesh methods such as the finite volume method (FVM) because the motion of a continuum, i.e., a fluid and a solid, can be directly expressed by particle movement. The representative particle methods are the smoothed particle hydrodynamics (SPH) method proposed by Monaghan [1] and the moving particle semi-implicit (MPS) method proposed by Koshizuka et al. [2]. Although they have been adopted in various applications, they require empirical relaxations to obtain stable results (e.g., artificial viscosity [1], density smoothing [3,4,5,6], background pressure [7], relaxation of the pressure Poisson equation (PPE) [2, 8,9,10,11], and particle regularization [12,13,14,15,16]). On the other hand, the moving particle hydrodynamics (MPH) method [17,18,19,20,21,22], which inherits the concepts of the SPH and MPS methods, can conduct calculations while avoiding unphysical instability, such as particle scattering, even without empirical treatments. This is because stability with respect to particle motion is ensured through physical consistency in the MPH method. When the discrete particle motion equations can be fit into the analytical mechanical framework [23], the system is physically consistent. The MPH method currently has two variants, i.e., MPH for weakly compressible flows (MPH-WC) [18] and MPH for incompressible flows (MPH-I) [17]. In previous studies [17,18,19,20], the MPH method was validated with various calculations, e.g., static pressure [17, 18], dam break [17, 18], Taylor–Couette flow [19], high-viscosity free surface flow [19], droplet oscillation [20], liquid bridge [20], and Plateau–Rayleigh instability [20]. Although it appears straightforward, calculating static pressure with particle methods is not easy because the result is easily affected by unphysical fluctuations.
In fact, unrealistic results were observed with the SPH and MPS methods [17]. However, even in such cases, the physically consistent MPH method could obtain reasonable results [17]. In addition, other physically consistent particle methods have also been proposed, e.g., the elastic body models by Suzuki et al. [24] and Kondo et al. [25], the thin plate model by Kondo et al. [26], the constraint-based incompressible SPH model by Ellero et al. [27] and the Hamiltonian MPS model by Suzuki et al. [28].
To simulate practically incompressible fluids such as water, a very large bulk modulus must be employed. In addition to being physically consistent, the constraint-based incompressible SPH method proposed by Ellero et al. [27] and the Hamiltonian MPS method proposed by Suzuki et al. [28] can treat incompressibility. However, to strictly satisfy the geometric incompressibility constraint, they must solve nonlinear equations within symplectic algorithms [29], i.e., RATTLE and SHAKE. In contrast, the MPH-I method [17] can practically simulate incompressible flows by solving only linear equations because it adopts a very large bulk modulus and bulk viscosity instead of directly imposing incompressibility constraints. Since it can treat not only incompressibility but also high viscosity, it can be applied to various industrial problems involving complex flows [30,31,32].
However, the linear equation arising in the MPH-I method [17] has a nonpositive definite coefficient matrix when the pressure and velocity are both treated as unknowns. Therefore, convergence in solving the linear equation is not assured when well-known solvers such as the conjugate gradient (CG) and conjugate residual (CR) methods [33] are adopted. In fact, the convergence is sometimes unstable in the MPH-I method [17] when the CR solver is adopted. When the coefficient matrix is symmetric positive definite (SPD), convergence is ensured using the CG and CR solvers, which adopt short recurrences for iteration. Therefore, an SPD linear system is favorable in terms of both calculation efficiency and stability. In the MPH-I method, the linear equation can be converted to an SPD equation, whose unknowns are only the velocities, via pressure substitution [21]. Since the equations before and after the conversion are mathematically equivalent, the calculation becomes faster without changing the results. However, scalability is not achieved because the number of iterations increases with the system size even when solving the SPD system.
For large computations, it is important to develop a numerical method whose calculation cost is linear in the problem size. To obtain such a scalable feature, matrix solvers such as the multigrid method [34,35,36] are needed. In particle methods, there are fewer studies adopting multigrid solvers than in the finite element method (FEM) or the FVM. Cummins and Rudman’s work [37] is a pioneering study. They applied a bucket-based geometric multigrid (BMG) solver as a stand-alone solver for the pressure Poisson equation in their incompressible SPH method. In recent studies, multigrid methods have been used as preconditioners in Krylov subspace methods, e.g., the CG, CR and GMRES methods. Algebraic multigrid (AMG) methods were adopted by Trask et al. [38], Chow et al. [39] and Guo et al. [40] in the incompressible SPH method and by Matsunaga et al. [41] in the MPS method. Geometric BMG methods were adopted by Sodersten et al. [42] in the MPS method and by Takahashi and Lin [43] in the incompressible SPH method. In addition, Sodersten et al. [42] reported that the BMG solver was more efficient in particle methods because the AMG solver needs to be set up at every time step due to the dynamic change in connectivity. In these multigrid solvers for particle methods [37,38,39,40,41,42,43], only smoothers demanding diagonally dominant matrix equations, e.g., the Jacobi and Gauss‒Seidel smoothers, were adopted. Therefore, their applications were limited to diagonally dominant systems, which are obtained when difference-based Laplacian models are applied to the Poisson equation or the Helmholtz equation. In the finite point method (FPM), which is a meshless method, Seibold [44] and Metsch et al. [45] adopted the AMG solver. Metsch et al. [45] applied a multigrid solver not only to the simple pressure Poisson equation but also to the pressure–velocity coupled equation, where pressure and velocity are both treated implicitly.
Although they successfully obtained solutions in nonsymmetric, nondiagonally dominant, and nonpositive definite systems with the combination of the GMRES method and AMG preconditioning with a Uzawa smoother [46], they noted that convergence was not assured due to the nonsymmetric matrix. In addition, they reported that the calculation time was dominated by the AMG setup time because the point cloud in FPM dynamically changes. This implies that the AMG setup time is not negligible when the connectivity dynamically changes, as in the particle methods and meshless methods.
Fortunately, the linear equation in the MPH-I method [21] can be converted so that it has the SPD feature even in the pressure–velocity coupled approach. Therefore, the classic multigrid preconditioned CG solver [47] is applicable. However, the coefficient matrix is not diagonally dominant due to the complex connectivity and multidimensionality, has many nonzero elements corresponding to the neighboring particles, and has a large condition number due to incompressibility and large viscosity. To handle multidimensionality and heterogeneity [48, 49], a damped Jacobi smoother [34,35,36] is often applied as a relaxation method. However, it is difficult to choose a damping parameter that ensures asymptotic convergence in problems where the connectivity dynamically changes.
In this study, a scalable MPH-I method was developed. It was shown that the SPD matrix equation can be derived via pressure substitution [21] and that the SPD feature generally appears in physically consistent systems. For the SPD matrix equation, geometric bucket-based multigrid (BMG) preconditioned CG/CR solvers were constructed, and the preconditioner was designed to satisfy the condition for theoretical convergence in a finite number of iterations. To handle multidimensionality, the Jacobi smoother was extended so that it is applicable to non-diagonally dominant matrices. To confirm the validity of the multigrid solver, the CG, CR, multigrid preconditioned CG (MGCG) and multigrid preconditioned CR (MGCR) solvers were compared with respect to the number of solver iterations and the computation time. Specifically, high-viscosity incompressible dam break calculations were conducted at various resolutions. Furthermore, the performance of shared-memory parallel computations on a CPU and a GPU was also investigated.
2 Moving particle hydrodynamics for incompressible flows (MPH-I)
2.1 Governing equations and physical consistency
The governing equations in the MPH methods [17,18,19,20,21,22] are the Navier‒Stokes equation with a Lagrangian description
and the equation for pressure
where ρ, u, Ψ, μ, g, λ and κ are the density, velocity, pressure, shear viscosity, gravity, bulk viscosity and bulk modulus, respectively. Although these governing equations do not directly include the incompressible condition, incompressible flows can practically be expressed by setting λ and μ to sufficiently large values. The expression in Eq. (2) enables the arising matrix equation to be SPD, which is discussed later. In the MPH method, the governing equations are discretized using particle interaction models, which is conceptually the same as that in the SPH and MPS methods. Simultaneously, for physical consistency, the interaction models are to be chosen such that they can be fit into an analytical mechanical framework [23]. In this study, the normalized weight function
was used for discretization, where rij is the relative position between particles i and j, h is the effective radius, i.e., the cutoff radius, and d is the number of dimensions. Here, the effective radius for the pressure term, hp, and that for the shear viscosity, hv, are given separately, and the corresponding weight functions are denoted by \(w_{ij}^{p}\) and \(w_{ij}^{v}\), respectively. The Navier‒Stokes equation (Eq. (1)) is discretized as [20]
where eij is the unit vector in the rij direction, Ψi and ui are the pressure and velocity of particle i, and uij = uj-ui is the relative velocity between particles i and j. The prime in \(w_{ij}^{p^{\prime }}\) and \(w_{ij}^{v^{\prime }}\) denotes the derivative of the weight function
which yields negative values. On the other hand, the equation for pressure (Eq. (2)) is discretized as
where ni is a particle number density given by the summation of the weight function wijp as
and n0 is a base value of the particle number density.
In addition, Eqs. (4) and (6) can be fit into the extended Lagrangian mechanics framework with dissipation [23]
where \( \mathcal{L}\), \( \mathcal{T}\), \( \mathcal{V} \) and \( \mathcal{D} \) are the Lagrangian, kinetic energy, potential energy and Rayleigh’s dissipative function, respectively. Therefore, the system in the MPH-I method is physically consistent [17, 20]. Specifically, when Lagrangian \( \mathcal{L}\) and dissipative function \( \mathcal{D} \) are given as
and
the discretized governing equations (Eqs. (4) and (6)) are derived using Eq. (8). In Eqs. (9) and (10), m and ΔV are the mass and volume of the particles, respectively, which are given as constants
using initial particle spacing l.
Although the particle interaction models appearing in the physically consistent formulations (Eqs. (4) and (6)) are zeroth-order accurate, the calculation model, i.e., the MPH method, was validated in previous studies [17,18,19,20,21], for example, via the static pressure calculation [17, 18], dam break calculation [17, 18] and Taylor–Couette calculation [19]. The static pressure calculation is not always easy for particle methods because it is sensitive to the unphysical fluctuations that often appear in particle methods. Even so, the MPH method could obtain reasonable results without any empirical relaxations in such a problem, where the classical particle methods, i.e., the SPH and MPS methods, fail [17]. In addition, since the MPH method is purely a multibody system defined via analytical mechanics [23], no special treatment is needed for imposing boundary conditions. Specifically, a wall boundary can easily be expressed by placing fixed particles in the calculation domain, and the free surface boundary is naturally given via the vacant space where particles do not exist.
2.2 Linear matrix equation in the implicit calculation
For practically simulating incompressible flows, the bulk viscosity λ and bulk modulus κ are set very large. For setting κ, numerical stability is assured when the condition
is satisfied [17], where Δt is the time step width. This implies that the bulk modulus κ can be set large when the bulk viscosity λ is large. Therefore, the way to handle a large bulk viscosity λ must be considered in the incompressible calculation. In addition, to simulate high-viscosity flows, a large shear viscosity μ must be handled stably. For stability with a large bulk viscosity λ and a large shear viscosity μ, the velocity in Eqs. (4) and (6) is implicitly treated as
where the superscript k attached to the velocity ui indicates the time step. Equations (13) and (14) form a linear matrix equation whose unknowns are the velocity uk+1 and pressure Ψk+1. By solving this matrix equation, the velocity at the next step uk+1 is obtained, and by updating the particle position x as
the particle movement can be calculated. However, since the linear system (Eqs. (13) and (14)) has an indefinite coefficient matrix, convergence is not assured when the well-known CG and CR solvers are applied [17]. In this study, the system is converted to one with the symmetric positive definite (SPD) feature by substituting Eq. (14) into the pressure Ψk+1 in Eq. (13) as
where the unknowns are only the velocities uk+1 [21]. Since the converted system is mathematically equivalent to the original one, it can justifiably be solved with the CG and CR solvers without affecting the calculation results. The partial differential version of Eq. (16) is obtained by substituting Eq. (2) into Eq. (1) as
The partial differential equation to be solved (Eq. (17)) is more complex than the Poisson or Helmholtz equations solved in previous studies [34,35,36,37,38,39,40], but it is a kind of diffusion equation with respect to velocity. Therefore, the discretized version is expected to be close to diagonally dominant.
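The benefit of this conversion can be illustrated on a small generic saddle-point analogue. In the sketch below, all matrices are random placeholders (not the MPH-I discretization), and `lam` plays the role of the large bulk viscosity: eliminating the pressure from a symmetric indefinite coupled system yields an SPD system for the velocity alone with the same solution.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, lam = 8, 5, 1e3  # velocity DoF, pressure DoF, large "bulk viscosity"

# A: SPD velocity block (mass/viscosity terms); B: coupling block (placeholder)
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)
B = rng.standard_normal((n, m))

# Coupled symmetric system [[A, B], [B.T, -(1/lam) I]] [[u], [p]] = [[b], [0]]
K = np.block([[A, B], [B.T, -np.eye(m) / lam]])
eigs_K = np.linalg.eigvalsh(K)
print("coupled system indefinite:", eigs_K.min() < 0 < eigs_K.max())

# The second block row gives p = lam * B.T @ u; substituting it into the first
# block row eliminates the pressure unknowns:
S = A + lam * B @ B.T          # symmetric positive definite
eigs_S = np.linalg.eigvalsh(S)
print("eliminated system SPD:", eigs_S.min() > 0)

# Both formulations give the same velocity
b = rng.standard_normal(n)
up = np.linalg.solve(K, np.concatenate([b, np.zeros(m)]))
u = np.linalg.solve(S, b)
print("velocities agree:", np.allclose(up[:n], u))
```

Since the substitution is exact, the SPD system can be handed to the CG/CR solvers while reproducing the solution of the coupled indefinite system.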
To obtain the SPD feature in Eq. (16), the formulation of Eq. (2) is key, and this approach is analogous to the penalty method in the finite element method [50], which was originally developed for structural calculations. In structural calculations, the Lagrangian is given with the elastic strain energy, and the motion equation is derived by minimizing the potential energy. On the other hand, in fluid calculations with a Lagrangian description, i.e., particle methods, the dissipative function is given, and the corresponding force in the motion equation is derived by minimizing that function. In structural calculations, the incompressibility constraint can practically be imposed using a very large bulk modulus. This is the penalty method, with which the SPD feature can be obtained. Analogously, in particle methods, a practically incompressible calculation can be conducted with a very large bulk viscosity, λ in Eq. (2), and the SPD feature is similarly maintained with this approach.
Moreover, the SPD feature that appeared in Eq. (16) generally arises in physically consistent systems. In the extended Lagrangian mechanics with dissipation (Eq. (8)), Rayleigh’s dissipative function must be positive definite. This allows the dissipative function to be expressed as
using an SPD matrix C, where the vector u without a lower index indicates a large vector unifying the velocity of all particles, and the bracket {,} indicates the dot product of the unified vectors. Using the matrix C, the motion equation of the particles is expressed as
When the velocity is treated implicitly,
is obtained. Since the coefficient matrix appearing on the left-hand side of Eq. (20) is SPD, it is proven that the SPD feature arises in arbitrary physically consistent systems that can be fit into the analytical mechanical framework of Eq. (8). Therefore, it is interpreted that the SPD feature in Eq. (16) emerged owing to the physical consistency of the MPH-I method.
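The argument above can be checked numerically: for any SPD dissipation matrix C, the implicit coefficient matrix of the form (m/Δt)I + C in Eq. (20) is a sum of SPD matrices and hence SPD. A minimal sketch with a random placeholder C (not an MPH-I matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
n, mass, dt = 10, 1.0, 1e-3

# Any SPD dissipation matrix C (physical consistency only requires that
# Rayleigh's dissipative function, and hence C, be positive definite);
# this C is a random placeholder
G = rng.standard_normal((n, n))
C = G @ G.T + 1e-6 * np.eye(n)

# Implicit coefficient matrix (m/dt) I + C: a sum of SPD matrices, hence SPD
A = (mass / dt) * np.eye(n) + C
print("symmetric:", np.allclose(A, A.T))
print("positive definite:", np.linalg.eigvalsh(A).min() > 0)
```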
3 Multigrid preconditioned CG/CR method
3.1 Generalized CG/CR algorithm
The CG and CR solvers, in which convergence is assured within a number of short recursive iterations smaller than the number of degrees of freedom, can be generalized as follows. Let M be the weight matrix and K the preconditioning matrix. Using the initial solution x0, the residual r and search direction p are initially given as
where the lower index indicates the iteration. Then, the solution x and residual r are updated
with a parameter αk
which is determined such that
Here, the bracket {a,b} indicates the dot product of unified vectors a and b. Then, the conjugate vector p is updated as
with a parameter βk
which is determined such that
The iteration given by Eqs. (22)–(27) is repeated until the L2 norm of the residual |rk|2 becomes sufficiently small, i.e., until convergence. When the matrices M and MAK are symmetric, the orthogonalities
are provided (see “Appendix A”). Using Eqs. (28)–(30), αk and βk are rewritten as
and
which are often used in implementations. When the matrices M and MAK are symmetric positive definite (SPD), theoretical convergence is assured (see “Appendix A”). This generalized algorithm (Eqs. (21)–(27)) can generate specific algorithms such as the CG and CR methods. For example, the CG method is derived from M = A−1 and K = I, the CR method from M = I and K = I, the CGNR method [33] from M = I and K = AT, and the CGNE method [33] from M = (AAT)−1 and K = AT. Furthermore, the preconditioned CG method is equivalent to the case with M = A−1 and K = KCG, and the preconditioned CR method to the case with M = K = KCR. Therefore, theoretical convergence is obtained when KCG and KCR are SPD because M and MAK are then SPD.
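As a concrete instance of the generalized algorithm, the sketch below implements the preconditioned CR case (M = K = KCR), with the preconditioner passed as a function. The 1D test matrix and the Jacobi-type preconditioner are illustrative placeholders, not the MPH-I system:

```python
import numpy as np

def pcr(A, b, K, tol=1e-10, maxiter=500):
    """Preconditioned conjugate residual method for SPD A and SPD K.

    This is the case M = K of the generalized scheme: each step minimizes
    {r, K r} along the search direction p. K is passed as a function z = K(r).
    """
    x = np.zeros_like(b)
    r = b.copy()
    z = K(r)
    p = z.copy()
    Az = A @ z
    Ap = Az.copy()
    zAz = z @ Az
    for k in range(maxiter):
        KAp = K(Ap)
        alpha = zAz / (Ap @ KAp)       # minimizes {r - alpha*A*p, K(r - alpha*A*p)}
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            return x, k + 1
        z -= alpha * KAp               # z = K r, updated by the short recurrence
        Az = A @ z
        zAz_new = z @ Az
        beta = zAz_new / zAz           # keeps the directions {A p_k} K-orthogonal
        zAz = zAz_new
        p = z + beta * p
        Ap = Az + beta * Ap
    return x, maxiter

# Small SPD test: 1D diffusion-like matrix with a Jacobi-type preconditioner
n = 50
A = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x, iters = pcr(A, b, K=lambda r: r / np.diag(A))
print("converged in", iters, "iterations")
```

Setting K to the identity recovers the plain CR method; substituting the multigrid preconditioner of Sect. 3.2 for K would give the MGCR solver.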
It is noteworthy that the residual in the CG and preconditioned CG methods is weighted by matrix A−1. In fact, the weight matrix is specified as M = A−1 when deriving the CG method. Consequently, the solution in the CG iteration is updated such that {r, A−1r} is minimized using the search direction p. This unintended weight may affect the convergence. In fact, the convergence degrades, especially when the condition number of A is large. On the other hand, in the CR and preconditioned CR methods, the objective functions to be minimized are straightforwardly expressed as {r, r} and {r, KCRr}, respectively, which results in good convergence properties, such as a monotonic decrease in the residual.
3.2 Bucket-based geometric multigrid preconditioner
In this study, background bucket cells are utilized to construct a geometric multigrid preconditioner for the CG and CR methods. The algorithm of the multigrid preconditioned conjugate residual (MGCR) solver is shown in Fig. 1. To construct the multigrid structure, the bucket size is set equal to the effective radius such that the range of interaction is limited to the adjacent buckets. The linear equation in the MPH-I method (Eq. (16)) has multidimensionality because the unknowns are the velocities of the particles. To restrict the multidimensional particle vectors to the finest grid, i.e., the buckets, the vectors of the particles in each bucket are simply summed as
where the subscripts i and l of u denote the indices of the particles and the buckets, respectively, and the superscript 0 of ul indicates that the variable belongs to the finest grid (level 0 in Fig. 1). Here, the restriction matrix corresponding to Eq. (33) is denoted by R, and the prolongation matrix P from the buckets to the particles is given by its transpose as
Then, the coefficient matrix in the finest grid scale A0 is defined as
where A is the coefficient matrix at the original particle scale, which is expressed in Eq. (16). Furthermore, the coarser grids are recursively created from the finer grids. In this study, the size of the coarser grids is double that of the finer grids. Specifically, the coarser grid consists of 4 finer grids in 2D and 8 finer grids in 3D. The level of the grid is incremented as the grid size is doubled (level 0: grid size = hv, level 1: grid size = 2hv, level 2: grid size = 4hv, and so on). The restriction from the finer grid (level r) to the coarser grid (level r + 1) is simply given by the summation
where the subscripts l and s of u are the indices of the coarser and finer grids, respectively. Using the restriction matrix Rr, which corresponds to Eq. (36), the prolongation matrix from level r + 1 to level r is given by
Then, the coefficient matrix Ar is recursively provided as
Using the geometric multigrid expressed by Eqs. (33)–(38), a preconditioner for the CG and CR methods satisfying the SPD condition is constructed. However, the linear system to be solved in the MPH-I method is not diagonally dominant even at the coarse grid scale because of multidimensionality. Therefore, the widely used smoothers, e.g., the Jacobi and Gauss‒Seidel smoothers, cannot be applied directly because they require diagonal dominance. To address this issue, the Jacobi smoother is extended to be applicable even to non-diagonally dominant systems. In solving the linear equation
the Jacobi iteration is expressed as
where D is the diagonal component of the coefficient matrix A. In the extended Jacobi iteration, the diagonal matrix D is replaced by another diagonal matrix \({\hat{\mathbf{D}}}\), whose elements satisfy
Then, the iteration is given by
With this simple extension, asymptotic convergence will be obtained even in a nondiagonally dominant system (see “Appendix B”). In this study, the right-hand side of Eq. (41) was simply adopted for calculating the elements of \({\hat{\mathbf{D}}}\) because it works when \(2{\hat{\mathbf{D}}} - {\mathbf{A}}\) is not singular, which is satisfied in most cases. Since the matrix equation in the MPH-I method is a discretized version of a diffusion equation, it is close to diagonally dominant. In such cases, this extension is useful.
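A minimal numerical sketch of the extension follows. The matrix is a random placeholder, and the specific choice d_ii = (1/2) Σj |a_ij| is one way to make 2D̂ − A diagonally dominant and hence positive definite; it is an illustrative assumption here, not necessarily the exact right-hand side of Eq. (41):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30

# Symmetric positive definite but NOT diagonally dominant matrix
# (a random placeholder with dense coupling, as with many neighbors)
G = rng.standard_normal((n, n))
A = np.eye(n) + (3.0 / n) * G @ G.T

b = rng.standard_normal(n)
x_ref = np.linalg.solve(A, b)

def jacobi_like(diag, iters=200):
    """Iterate x <- x + diag^{-1} (b - A x) and return the final error."""
    x = np.zeros(n)
    for _ in range(iters):
        x = x + (b - A @ x) / diag
    return np.linalg.norm(x - x_ref)

# Plain Jacobi uses diag = a_ii and need not converge here. The extended
# choice d_ii = (1/2) * sum_j |a_ij| makes 2*D_hat - A diagonally dominant,
# hence positive definite, which guarantees asymptotic convergence for SPD A.
err_plain = jacobi_like(np.diag(A))
err_ext = jacobi_like(0.5 * np.abs(A).sum(axis=1))
print("plain Jacobi error:   ", err_plain)
print("extended Jacobi error:", err_ext)
```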
In this study, the V cycle multigrid calculation was included as a preconditioner of the CG and CR methods (Fig. 1), and the extended Jacobi iteration was adopted as a smoother in each level. It is better for the preconditioner to skip smoothing at the original particle level, where matrix–vector multiplication requires a large computational cost. This is because the CG and CR solvers already have the main iterations in the original level, which are more efficient than the smoothing iterations in the multigrid calculation, i.e., the extended Jacobi iterations. Therefore, in this study, the preconditioning matrix K was designed not to include particle level smoothing as
where \({\hat{\mathbf{D}}}\) is the extended diagonal matrix defined in Eq. (41) corresponding to the coefficient matrix A in Eq. (16), and M0 is the matrix corresponding to the calculation at level 0. When the extended Jacobi iterations are conducted twice in both pre- and postsmoothing in each level (Fig. 1), the recursive relation between the matrices Mr and Mr+1
holds, where the upper right index of M indicates the level (see “Appendix C”). In the maximum level, the extended Jacobi iterations are conducted 4 times, and the matrix Mmax is expressed as
Here, the matrix Mmax is SPD (see “Appendix B”). In the same way, the sum of the first and second terms on the right-hand side of Eq. (44) is SPD. When Mr+1 is SPD, the third term in Eq. (44) is symmetric nonnegative definite. Therefore, Mr is recursively SPD, and K is also SPD. Thus, K satisfies the condition to be a preconditioner for the CG and CR solvers. Note that PM0R in the second term on the right-hand side of Eq. (43) cannot be a preconditioner by itself because it is nonnegative definite but singular. Therefore, it is combined with \({\hat{\mathbf{D}}}^{ - 1}\) to construct a preconditioner.
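The structure of Eq. (43) can be sketched with a two-level version in which an exact coarse solve stands in for the recursive M0 (an illustrative simplification, with a 1D placeholder matrix and summation-based restriction as in Eq. (33)):

```python
import numpy as np

# 1D "particles" aggregated into buckets of 4: restriction by summation over
# each bucket, prolongation P = R^T
n_particles, bucket = 256, 4
n_buckets = n_particles // bucket
R = np.zeros((n_buckets, n_particles))
for l in range(n_buckets):
    R[l, l * bucket:(l + 1) * bucket] = 1.0
P = R.T

# SPD particle-level matrix: mass term plus a stiff 1D diffusion term
# (a placeholder mimicking the character of Eq. (16), not the MPH-I matrix)
L = 2 * np.eye(n_particles) - np.eye(n_particles, k=1) - np.eye(n_particles, k=-1)
A = np.eye(n_particles) + 1e3 * L

# Galerkin coarse operator and a two-level preconditioner without fine-level
# smoothing: K = D_hat^{-1} + P A0^{-1} R, where the exact coarse solve is an
# assumed stand-in for the recursive smoothing matrix M0
A0 = R @ A @ P
d_hat = 0.5 * np.abs(A).sum(axis=1)   # extended diagonal (illustrative choice)
K = np.diag(1.0 / d_hat) + P @ np.linalg.solve(A0, R)

# K is SPD (sum of an SPD and a symmetric nonnegative definite matrix), and it
# clusters the spectrum of the preconditioned operator KA
eigs_A = np.linalg.eigvalsh(A)
eigs_KA = np.sort(np.real(np.linalg.eigvals(K @ A)))
cond_A = eigs_A[-1] / eigs_A[0]
cond_KA = eigs_KA[-1] / eigs_KA[0]
print("cond(A) = %.3e, cond(KA) = %.3e" % (cond_A, cond_KA))
print("conditioning improved:", cond_KA < cond_A)
```

In the actual preconditioner, the coarse solve is approximated recursively through the V-cycle (Eq. (44)) rather than computed exactly.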
4 Benchmark calculations
4.1 Number of iterations
The presented geometric bucket-based multigrid preconditioner was implemented in the MPH-I method using the open-source code [22]. The calculations were conducted using the multigrid preconditioned conjugate residual (MGCR), multigrid preconditioned conjugate gradient (MGCG), nonpreconditioned conjugate residual (CR) and nonpreconditioned conjugate gradient (CG) solvers. Specifically, the simple high-viscosity incompressible dam break shown in Fig. 2 was calculated for a physical time of 0.2 s. The calculation condition for the base case is shown in Table 1, and the scaled cases, whose particle spacing l and time step width Δt are 1/2, 1/4, 1/8, …, 1/64 times those of the base case, are shown in Table 2. In Table 2, only the differences from the base case (Table 1) are shown. In addition, the diffusion numbers dλ and dμ, the degrees of freedom DoFA, the approximated number of nonzero elements nnzA and the approximated condition numbers KA are displayed in Table 2. The DoFA was calculated as the product of the number of fluid particles and the number of dimensions. The nnzA was estimated as
using the approximated number of neighboring particles, which is π(hv/l)2 in 2D and (4/3)π(hv/l)3 in 3D. For a rough estimation of KA, the maximum and minimum eigenvalues Λmax and Λmin were predicted using
which is the 1D version of the left-hand side of Eq. (17). In estimating Λmax, it was assumed that the discretization in the MPH-I method is analogous to a finite difference discretization with a mesh size of hv(= 2hp). Equation (47) was discretized as
and the maximum eigenvalue Λmax was approximated as
On the other hand, for predicting Λmin, the maximum wavelength 4L, which is determined by the calculation geometry (Fig. 2), was considered. By substituting a sine wave with a wavelength of 4L
into Eq. (47),
was obtained. Then, the minimum eigenvalue Λmin was approximated as
and the condition number was predicted as
The number of solver iterations at t = 0.2 s for the scaled cases (Table 2) is presented in Fig. 3. Here, only the main iterations (Fig. 1) are counted, with a convergence threshold of \(|\mathbf{r}|_{2} /|\mathbf{b}|_{2} < 10^{-12}\). When the CR and CG solvers were adopted, the number of iterations drastically increased with the problem size. On the other hand, with the MGCR and MGCG solvers, the increase in the number of iterations was limited. Additionally, the number of iterations was smaller with the MGCR and CR solvers than with the MGCG and CG solvers, respectively. The objective function in the MGCG and CG methods was unintendedly weighted by A−1 as {r, A−1r}, which possibly affected the convergence.
The difficulty in solving the scaled cases shown in Table 2 arises not only from the problem size but also from the large diffusion numbers. Although the Courant numbers in the scaled cases are the same as that in the base case, the diffusion numbers dλ and dμ are larger than those in the base case. For comparison, the small cases shown in Table 3 were studied, in which the problem size was set the same as that of the base case, but the diffusion numbers were varied to follow each scaled case in Table 2. In Table 3, only the conditions that differ from those of the base case (Table 1) are presented. By setting the gravity g and viscosities μ and λ to 2, 4, 8, …, 64 times those of the base case, only the diffusion numbers dλ and dμ were enlarged, while almost the same flow as that of the base case was maintained. The number of solver iterations at t = 0.2 s for the small cases (Table 3) is presented in Fig. 4. The number of iterations is smaller with the MGCR and MGCG solvers than with the CR and CG solvers, respectively. This implies that multigrid preconditioning also suppresses the increase in the number of iterations associated with large diffusion numbers. The scaled cases (Table 2) and the small cases with the same diffusion numbers (Table 3) are compared in Fig. 5. When the problem size is large, the contribution of the multigrid preconditioner is large, and the number of iterations in the scaled cases is kept smaller than that in the corresponding small cases. In addition, it is confirmed that the MGCR solver shows better scalability than the MGCG solver. Compared to previous studies [38, 42, 43], the numbers of iterations in this study (Figs. 3 and 4) are relatively large. This is because the linear matrix equation in this study (Eq. (16)) is difficult to solve due to its lack of diagonal dominance, large number of nonzero elements and large condition number (Table 2).
4.2 Single CPU calculation
The calculation times for a single-thread CPU (Xeon Gold 6252 (24 cores)) computation are shown in Fig. 6, where the results for Cases x1~x1/8 in Table 2 are presented. Hereinafter, the calculation times are shown divided by the number of particles and the number of time steps for comparison, and such calculation times are referred to as “unit calculation times”. The unit calculation times using the CR and CG methods become larger as the problem size becomes larger. On the other hand, when using the MGCR and MGCG methods, the unit calculation time is almost constant even for large problems. This implies that the computational time is proportional to the problem size and that the multigrid methods are scalable. The breakdowns of the unit calculation times with the MGCR and CR solvers are shown in Fig. 7. For each calculation, the whole computational time is labeled “total”. For the MGCR solver, the time spent on preconditioning is labeled “solver preconditioning”, and the other time spent by the solver, which is mostly for the main iteration, is labeled “solver main”. For the CR solver, the time spent by the solver is labeled “solver” because there is no preconditioning. Most of the calculation time was spent by the solvers. In the single CPU calculation, the preconditioning time was not dominant within the total time spent by the MGCR solver. In the preconditioning stage, the most computationally expensive matrix–vector product calculations at the original particle level are avoided, as in Eq. (43), and the V-cycle only includes the product calculations at the coarser levels. Therefore, the amount of computation required for the preconditioning is basically smaller than that for the main iteration.
4.3 Parallel CPU and GPU calculations
The calculation times of the OpenACC parallel computation on the GPU (A100 (80 GB)) are shown in Fig. 8, where the results for Cases x1~x1/64 in Table 2 are presented. In the relatively small cases with 400–6400 particles (Cases x1~x1/4), the CR and CG methods were faster than the MGCR and MGCG methods, but in the larger cases with over 25,600 particles (Cases x1/16~), the multigrid methods were faster. The breakdowns of the calculation times with the MGCR and CR methods are shown in Fig. 9, where the legends are the same as in Fig. 7. With both methods, the time spent by the solvers dominated the total calculation time. With the CR solver, the unit calculation time against the problem size (CR (total)) first decreased in the range of 400–102,400 particles (Cases x1~x1/16) and then increased in the range over 102,400 particles (Cases x1/16~). When the problem size was large, the number of iterations dominated the computation time, and the increasing trend in the large cases reflects the large number of iterations in the CR solver. When the problem size was small, the overhead cost for parallelization dominated the computation time. The straight decreasing trends in the small cases were due to this overhead cost. Since the overhead cost can be assumed to be almost constant, the unit calculation time is close to inversely proportional to the problem size when the overhead cost is dominant. With the MGCR solver, the unit calculation time (MGCR (total)) linearly decreased in the range of 400–102,400 particles (Cases x1~x1/16), and the decrease slowed down in the range over 102,400 particles (Cases x1/16~). This indicates that the overhead cost was dominant with 400–102,400 particles (Cases x1~x1/16). According to the breakdown of the total computational time with the MGCR solver, the preconditioning time was larger with 400–102,400 particles (Cases x1~x1/16), and the main iteration time was larger with more than 102,400 particles (Cases x1/32~).
The unit calculation time of the main iteration (MGCR (solver main)) linearly decreased in the range of 400–6400 particles (Cases x1~x1/4), and after the transition range of 6400–102,400 particles (Cases x1/4~x1/16), it became almost constant in the range over 102,400 particles (Cases x1/16~). In contrast, the unit calculation time of the preconditioning (MGCR (solver preconditioning)) linearly decreased over the whole range, showing that it was dominated by the overhead cost within all the presented cases (Cases x1~x1/64). In addition, the unit calculation time of the MGCR preconditioning was larger than that of the MGCR main iteration in the small cases, where the overhead cost is thought to be dominant. This implies that the overhead cost of the preconditioning was larger than that of the main iteration. This large overhead cost is the main reason why the MGCR method showed lower performance than the CR method in the small cases. However, scalability of the MGCR solver is expected when the trends in Fig. 9 are extrapolated toward larger cases with more than 1,638,400 particles (Cases x1/64~). It was previously confirmed via the single CPU cases that the preconditioning stage requires less computation than the main iteration. Therefore, the unit calculation time of the MGCR preconditioning stage should decrease further in larger cases, where the parallel efficiency is expected to improve. In addition, the unit calculation time of the MGCR main iteration is already almost constant with over 102,400 particles (Cases x1/16~). Therefore, the MGCR solver is expected to be scalable in larger cases, where the MGCR main iteration time will dominate the total computational time.
A larger overhead cost is incurred when the number of parallel threads is larger. Since the parallel computation on the GPU (A100 (80 GB)) is highly parallelized, the overhead cost was thought to be relatively large. To confirm the dependency on the number of parallel threads, the cases in Table 2 were studied with OpenMP parallel computations on a CPU (Xeon Gold 6252 (24 core)). The unit calculation times obtained with the MGCR solver are shown in Fig. 10. The overhead cost was dominant in the range where the straight decreasing trends were found. With a larger number of CPU threads, the overhead cost was larger, and it dominated the total computational time over a wider range. In addition, the unit calculation time was mostly constant in the sufficiently large cases where the overhead cost was not dominant. This implies that the computational time is proportional to the problem size and that the numerical method is scalable when the problem size is sufficiently large. In comparison, the unit calculation time with the CR solver is shown in Fig. 11. A straight decreasing trend was also observed in Fig. 11, but the overhead cost and its dominant range were smaller than those in Fig. 10. This is because the CR solver does not include the preconditioning, which requires the large overhead cost. In contrast to Fig. 10, the unit calculation time in Fig. 11 increased in the sufficiently large cases where the overhead cost was not dominant. This is because the number of iterations in the CR solver increases with the problem size.
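The overhead argument can be illustrated with a toy cost model (the constants are illustrative, not measured values from Figs. 10 and 11): if the time per step is T(N) = o + cN with a constant parallelization overhead o, the unit time T(N)/N = o/N + c decreases inversely with N while the overhead dominates, and flattens to c for large N.

```python
def unit_time(n_particles, overhead=1.0e-3, per_particle=2.0e-8):
    """Toy cost model: T(N) = overhead + per_particle*N per time step,
    so the unit time T(N)/N ~ overhead/N for small N (the straight
    decreasing trend on a log-log plot) and ~ per_particle for large N
    (the constant plateau). Constants are illustrative only."""
    return (overhead + per_particle * n_particles) / n_particles

for n in [400, 6400, 102400, 1638400]:
    print(n, unit_time(n))
```

A larger thread count corresponds to a larger `overhead`, which widens the range of N over which the 1/N trend dominates, matching the behavior described above.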
Overall, in parallel computation, the multigrid solvers (MGCR and MGCG) did not perform well in the small cases due to the large overhead cost of the preconditioning stage, but they were efficient in the large cases, where the number of iterations mainly determines the total computational time.
4.4 Three-dimensional calculations
The trends in the 3D calculations were almost the same as those observed in the above sections. Here, a simple 3D high-viscosity incompressible dam break problem (Fig. 12) was taken as an example. Based on the conditions in Table 1, the calculation conditions in the 3D scaled cases are given in Table 4. The calculations were conducted on a GPU (A100 (80 GB)) with OpenACC, applying the MGCR and CR solvers. The number of solver iterations at t = 0.2 s is shown in Fig. 13. While the CR method suffered from a large number of iterations in the large cases, the MGCR method suppressed the iterations even in the large cases. The breakdowns of the unit calculation times are shown in Fig. 14. Since the parallel efficiency of the preconditioning was not good, the MGCR method was slower than the CR method in the small cases. In contrast, the MGCR method was faster in the large cases, where the number of iterations mainly determines the total computational time. This indicates that the multigrid technique is useful for both 2D and 3D parallel calculations when the problem size is sufficiently large.
5 Conclusion
In this study, a scalable MPH-I method was developed. A derivation of the SPD matrix equation through pressure substitution [21] was presented, and additionally, it was shown that the SPD property generally appears in a physically consistent system by deriving it from the analytical mechanical equation [23]. To solve the SPD matrix equation, a bucket-based multigrid preconditioner was constructed such that it satisfies the condition for application to the CG and CR solvers. Moreover, to handle the complexity due to multidimensionality, the extended Jacobi iteration, which is also applicable to nondiagonally dominant matrix equations, was proposed. In the benchmark calculations, simple high-viscosity incompressible dam break problems were calculated with the MGCR, MGCG, CR and CG solvers in both 2D and 3D. The number of iterations was suppressed by the multigrid solvers, and it was smaller with the CR solvers than with the CG solvers regardless of preconditioning. Consequently, the MGCR solver showed the best performance of the four, and its number of iterations hardly depended on the problem size. In fact, the computational time in the single CPU calculation was almost proportional to the numbers of particles and time steps, and scalability was demonstrated within the tested cases. The performance of the multigrid solvers was also tested in parallel computations on a CPU and a GPU. For the small problems, the MGCR and MGCG solvers were slower than the CR and CG solvers because of the large overhead cost of the preconditioning process. However, they outperformed the conventional solvers for the large problems, where the number of solver iterations mainly determines the calculation time.
Data availability
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Code availability
The code that supports the findings of this study is available from the corresponding author upon reasonable request and licensing.
References
Monaghan JJ (1994) Simulating free surface flows with SPH. J Comput Phys 110:399–406. https://doi.org/10.1006/jcph.1994.1034
Koshizuka S, Oka Y (1996) Moving-Particle Semi-Implicit methods for fragmentation of incompressible fluid. Nucl Sci Eng 123:421–434. https://doi.org/10.13182/NSE96-A24205
Colagrossi A, Landrini M (2003) Numerical simulation of interfacial flows by smoothed particle hydrodynamics. J Comput Phys 191:448–475. https://doi.org/10.1016/S0021-9991(03)00324-3
Molteni D, Colagrossi A (2009) A simple procedure to improve the pressure evaluation in hydrodynamic context using the SPH. Comput Phys Commun 180:861–872. https://doi.org/10.1016/j.cpc.2008.12.004
Antuono M, Colagrossi A, Marrone S, Molteni D (2010) Free-surface flows solved by means of SPH schemes with numerical diffusive terms. Comput Phys Commun 181:532–549. https://doi.org/10.1016/j.cpc.2009.11.002
Marrone S, Antuono M, Colagrossi A, Colicchio G, Le Touze D, Graziani G (2011) δ-SPH model for simulating violent impact flows. Comput Methods Appl Mech Engrg 200:1526–1542. https://doi.org/10.1016/j.cma.2010.12.016
Hu XY, Adams NA (2006) A multi-phase SPH method for macroscopic and mesoscopic flows. J Comput Phys 213:844–861. https://doi.org/10.1016/j.jcp.2005.09.001
Khayyer A, Gotoh H (2011) Enhancement of stability and accuracy of the moving particle semi-implicit method. J Comput Phys 230:3093–3118. https://doi.org/10.1016/j.jcp.2011.01.009
Tanaka M, Masunaga T (2010) Stabilization and smoothing of pressure in MPS method by quasi-compressibility. J Comput Phys 229:4279–4290. https://doi.org/10.1016/j.jcp.2010.02.011
Kondo M, Koshizuka S (2011) Improvement of stability in moving particle semi-implicit method. Int J Numer Meth Fluids 65:638–654. https://doi.org/10.1002/fld.2207
Asai M, Aly AM, Sonoda Y, Sakai Y (2012) A stabilized incompressible SPH method by relaxing the density invariance condition. J Appl Math 2012:139583. https://doi.org/10.1155/2012/139583
Xu R, Stansby P, Laurence D (2009) Accuracy and stability in incompressible SPH (ISPH) based on projection method and a new approach. J Comput Phys 228:6703–6725. https://doi.org/10.1016/j.jcp.2009.05.032
Hosseini SM, Feng JJ (2011) Pressure boundary conditions for computing incompressible flows with SPH. J Comput Phys 230:7473–7487. https://doi.org/10.1016/j.jcp.2011.06.013
Lind SJ, Xu R, Stansby PK, Rogers BD (2012) Incompressible smoothed particle hydrodynamics for free-surface flows: a general diffusion-based algorithm for stability and validations for impulsive flows and propagating waves. J Comput Phys 231:1499–1523. https://doi.org/10.1016/j.jcp.2011.10.027
Shadloo MS, Zainali A, Sadek SH, Yildiz M (2011) Improved incompressible smoothed particle hydrodynamics method for simulating flow around bluff bodies. Comput Methods Appl Mech Eng 200:1008–1020. https://doi.org/10.1016/j.cma.2010.12.002
Tsuruta N, Khayyer A, Gotoh H (2013) A short note on dynamic stabilization of moving particle semi-implicit method. Comput Fluids 82:158–164. https://doi.org/10.1016/j.compfluid.2013.05.001
Kondo M (2021) A physically consistent particle method for incompressible fluid flow calculation. Comput Part Mech 8:69–86. https://doi.org/10.1007/s40571-020-00313-w
Kondo M, Matsumoto J (2021) Weakly compressible particle method with physical consistency for spatially discretized system. Transactions of JSCES, Paper No. 20210006 (in Japanese). https://doi.org/10.11421/jsces.2021.20210006
Kondo M, Fujiwara T, Masaie I, Matsumoto J (2021) A physically consistent particle method for high-viscous free-surface flow calculation. Comput Part Mech. https://doi.org/10.1007/s40571-021-00408-y
Kondo M, Matsumoto J (2021) Surface tension and wettability calculation using density gradient potential in a physically consistent particle method. Comput Methods Appl Mech Eng 385:114072. https://doi.org/10.1016/j.cma.2021.114072
Kondo M, Matsumoto J (2021) Pressure substituting implicit solver to speed-up moving particle hydrodynamics method for high-viscous incompressible flows. Transactions of JSCES, Paper No. 20210016 (in Japanese). https://doi.org/10.11421/jsces.2021.20210016
MphImplicit (GPLv3 license). https://github.com/Masahiro-Kondo-AIST/MphImplicit
Goldstein H, Poole CP, Safko JL (2013) Classical mechanics, Pearson New International Edition. Pearson
Suzuki Y, Koshizuka S (2008) A Hamiltonian particle method for non-linear elastodynamics. Int J Numer Meth Eng 74:1344–1373. https://doi.org/10.1002/nme.2222
Kondo M, Suzuki Y, Koshizuka S (2010) Suppressing local particle oscillations in the Hamiltonian particle method for elasticity. Int J Numer Meth Eng 81:1514–1528. https://doi.org/10.1002/nme.2744
Kondo M, Koshizuka S (2010) Development of thin plate model using Hamiltonian particle method, Transactions of JSCES, Paper No. 20100016 (in Japanese). https://doi.org/10.11421/jsces.2010.20100016
Ellero M, Serrano M, Español P (2007) Incompressible smoothed particle hydrodynamics. J Comput Phys 226:1731–1752. https://doi.org/10.1016/j.jcp.2007.06.019
Suzuki Y, Koshizuka S, Oka Y (2007) Hamiltonian moving-particle semi-implicit (HMPS) method for incompressible fluid flows. Comput Methods Appl Mech Eng 196:2876–2894. https://doi.org/10.1016/j.cma.2006.12.006
Leimkuhler B, Reich S (2004) Simulating Hamiltonian dynamics. Cambridge University Press, Cambridge
Yokoyama R, Kondo M, Suzuki S, Okamoto K (2021) Analysis of molten metal spreading and solidification behaviors utilizing moving particle full-implicit method. Front Energy 15:959–973. https://doi.org/10.1007/s11708-021-0753-0
Yokoyama R, Kondo M, Suzuki S, Okamoto K (2022) Simulating melt spreading into shallow water using moving particle hydrodynamics with turbulence model. Comput Part Mech. https://doi.org/10.1007/s40571-022-00520-7
Negishi H, Kondo M, Amakawa H, Obara S, Kurose R (2023) A fluid lubrication analysis including negative pressure using a physically consistent particle method. Comput Part Mech. https://doi.org/10.1007/s40571-023-00584-z
Saad Y (2003) Iterative methods for sparse linear systems, 2nd edn. SIAM Press, Philadelphia
Trottenberg U, Oosterlee CW, Schuller A (2000) Multigrid. Elsevier, Amsterdam
Briggs WL, Henson VE, McCormick SF (2000) A multigrid tutorial, 2nd edn. SIAM, Philadelphia
Wesseling P, Oosterlee CW (2001) Geometric multigrid with applications to computational fluid dynamics. J Comput Appl Math 128:311–334. https://doi.org/10.1016/S0377-0427(00)00517-3
Cummins SJ, Rudman M (1999) An SPH projection method. J Comput Phys 152:584–607. https://doi.org/10.1006/jcph.1999.6246
Trask N, Maxey M, Kim K, Perego M, Parks ML, Yang K, Xu J (2015) A scalable consistent second-order SPH solver for unsteady low Reynolds number flows. Comput Methods Appl Mech Eng 289:155–178. https://doi.org/10.1016/j.cma.2014.12.027
Chow AD, Rogers BD, Lind SJ, Stansby PK (2018) Incompressible SPH (ISPH) with fast Poisson solver on a GPU. Comput Phys Commun 226:81–103. https://doi.org/10.1016/j.cpc.2018.01.005
Guo X, Rogers BD, Lind S, Stansby PK (2018) New massively parallel scheme for Incompressible Smoothed Particle Hydrodynamics (ISPH) for highly nonlinear and distorted flow. Comput Phys Commun 233:16–28. https://doi.org/10.1016/j.cpc.2018.06.006
Matsunaga T, Shibata K, Murotani K, Koshizuka S (2016) Solution of pressure Poisson equation in particle method using algebraic multigrid method. Transactions of JSCES, Paper No. 20160012 (in Japanese). https://doi.org/10.11421/jsces.2016.20160012
Södersten A, Matsunaga T, Koshizuka S (2019) Bucket-based multigrid preconditioner for solving pressure Poisson equation using a particle method. Comput Fluids. https://doi.org/10.1016/j.compfluid.2019.104242
Takahashi T, Lin MC (2016) A multilevel SPH solver with unified solid boundary handling. Comput Graph Forum 35:517–526. https://doi.org/10.1111/cgf.13048
Seibold B (2010) Performance of algebraic multigrid methods for non-symmetric matrices arising in particle methods. Numer Linear Algebra Appl 17:433–451. https://doi.org/10.48550/arXiv.0905.3005
Metsch B, Nick F, Kuhnert J (2020) Algebraic multigrid for the finite pointset method. Comput Vis Sci 23:3. https://doi.org/10.1007/s00791-020-00324-3
Schöberl J, Zulehner W (2003) On Schwarz-type smoothers for saddle point problems. Numer Math 95:377–399. https://doi.org/10.1007/s00211-002-0448-3
Tatebe O (1993) The multigrid preconditioned conjugate gradient method. In: Proceedings of sixth copper mountain conference on multigrid methods, NASA conference publication, vol 3224, pp 621–634. https://www.hpcs.cs.tsukuba.ac.jp/~tatebe/research/paper/CM93-tatebe.pdf
Fish J, Belsky V (1995) Multigrid method for periodic heterogeneous media Part 1: convergence studies for one-dimensional case. Comput Methods Appl Mech Eng 126:1–16. https://doi.org/10.1016/0045-7825(95)00811-E
Fish J, Belsky V (1995) Multi-grid method for periodic heterogeneous media Part 2: Multiscale modeling and quality control in multidimensional case. Comput Methods Appl Mech Eng 126:17–38. https://doi.org/10.1016/0045-7825(95)00812-F
Zienkiewicz OC, Taylor RL (2002) The finite element method, 5th edn. Butterworth-Heinemann, Oxford
Funding
Not applicable.
Ethics declarations
Conflict of interest
The author reports no conflict of interest in this study.
Appendices
Appendix A: Convergence of the generalized CG/CR solvers
The algorithm expressed in Eqs. (21)–(27) theoretically converges within N iterations when solving a linear matrix equation with N degrees of freedom, provided that the matrices M and MAK are symmetric positive definite (SPD). First, the orthogonalities (Eqs. (28)–(30)) are proven. Assume that Eqs. (28)–(30) hold when i, j < k. From Eq. (30),
is derived with Eq. (22). Using the condition (Eq. (24)) for determining αk,
holds in j < k + 1. Therefore, Eq. (28) is also satisfied when i, j < k + 1. From Eq. (25),
under j < k + 1. Since MAK is symmetric,
Therefore,
under the conditions i ≠ j and i, j < k + 1, and Eq. (29) is also satisfied when i, j < k + 1. Using Eqs. (22) and (25), Eq. (30) is rewritten as
With the condition (Eq. (27)) used for determining βk,
holds in j < k + 1, and with the symmetry of M,
is derived under i ≠ j and i, j < k + 1. Therefore, Eq. (30) is also satisfied when i, j < k + 1. Thus, Eqs. (28)–(30) hold by induction. Moreover, since M is SPD, it can be expressed as
with a nonsingular matrix W, and Eq. (30) will be
Here, {WAp0, WAp1, …, WApk} form an orthogonal basis of a k-dimensional space, and the search vectors {p0, p1, …, pk} are linearly independent of each other. Therefore, k linearly independent search directions are produced by k iterations, but the number of linearly independent vectors cannot exceed N in an N-dimensional space. This implies that the N-dimensional space is fully explored with N search directions, and the solver finds the exact solution after at most N iterations.
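Since Eqs. (21)–(27) are not reproduced here, the following is a minimal unpreconditioned sketch (M = I) of the conjugate residual method in Python, illustrating the finite-termination property proven above: for an SPD system with N unknowns, the residual vanishes (to rounding error) after at most N iterations.

```python
def matvec(A, x):
    # dense matrix-vector product for small illustrative systems
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_residual(A, b, tol=1e-12):
    """Unpreconditioned CR (M = I); for an SPD A it terminates in at
    most N = len(b) iterations in exact arithmetic."""
    n = len(b)
    x = [0.0] * n
    r = list(b)                 # r0 = b - A*x0 with x0 = 0
    p = list(r)
    Ar = matvec(A, r)
    Ap = list(Ar)               # since p0 = r0
    for k in range(n):
        alpha = dot(r, Ar) / dot(Ap, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r_new = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r_new, r_new) ** 0.5 < tol:
            return x, k + 1
        Ar_new = matvec(A, r_new)
        beta = dot(r_new, Ar_new) / dot(r, Ar)
        p = [rn + beta * pi for rn, pi in zip(r_new, p)]
        Ap = [arn + beta * api for arn, api in zip(Ar_new, Ap)]
        r, Ar = r_new, Ar_new
    return x, n

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]  # SPD example
b = [1.0, 2.0, 3.0]
x, iters = conjugate_residual(A, b)
```

In the paper's generalized form, the same algorithm is run on the preconditioned system, which is why the preconditioner must keep M and the product matrix SPD.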
Appendix B: Extended Jacobi iteration
It is shown here that the extended Jacobi iteration (Eq. (42)) achieves asymptotic convergence even for a nondiagonally dominant matrix equation. Assume that the initial solution is x0 = 0 when solving the linear matrix equation (Eq. (39)) with a symmetric coefficient matrix A. With a matrix Q given by
Equation (42) is written as
and the matrix MexJ corresponding to the iterative calculation is expressed as
Since the matrix \({\hat{\mathbf{D}}} + {\mathbf{Q}} = 2{\hat{\mathbf{D}}} - {\mathbf{A}}\) is symmetric and diagonally dominant as
it is symmetric positive definite (SPD). Consequently, MexJ is also SPD. On the other hand, matrix MexJ can also be written as
using A−1 (see Appendix C). For Eq. (B.5) to be SPD, the eigenvalues of \(({\mathbf{I}} - ({\mathbf{I}} - {\mathbf{A}}{\hat{\mathbf{D}}}^{ - 1} )^{\text{k}} )\) need to be positive, and those of \({\mathbf{I}} - {\mathbf{A}}{\hat{\mathbf{D}}}^{ - 1}\) need to be less than 1. Let qi and Λi be the eigenvectors and eigenvalues of \({\mathbf{I}} - {\mathbf{A}}{\hat{\mathbf{D}}}^{ - 1}\), respectively. An arbitrary vector z is expressed as
Because
the eigenvalues satisfy Λi < 1. This implies that the convergence \(({\mathbf{I}} - {\mathbf{A}}{\hat{\mathbf{D}}}^{ - 1} )^{{\text{k}}} \to {\mathbf{O}}\) and MexJ → A−1 is obtained as k → ∞.
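A small numerical illustration of this result follows. This is a sketch only: the diagonal enlargement below is one admissible choice satisfying the diagonal-dominance condition on \(2{\hat{\mathbf{D}}} - {\mathbf{A}}\), not necessarily the exact scaling of Eq. (42).

```python
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def extended_jacobi(A, b, iters):
    """Jacobi-type iteration with an enlarged diagonal d_i chosen so that
    2*diag(d) - A is strictly diagonally dominant (hence SPD for SPD A),
    which keeps the iteration convergent even when A itself is not
    diagonally dominant."""
    n = len(b)
    d = [0.5 * A[i][i] + sum(abs(A[i][j]) for j in range(n) if j != i)
         for i in range(n)]
    x = [0.0] * n
    for _ in range(iters):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        x = [xi + ri / di for xi, ri, di in zip(x, r, d)]
    return x

# SPD but NOT diagonally dominant: each off-diagonal row sum (1.8) exceeds
# the diagonal (1.0). Plain Jacobi (d_i = a_ii) diverges for this matrix,
# since the eigenvalue 1 + 2*0.9 = 2.8 of A gives |1 - 2.8| = 1.8 > 1.
A = [[1.0, 0.9, 0.9], [0.9, 1.0, 0.9], [0.9, 0.9, 1.0]]
b = [1.0, 0.0, -1.0]          # exact solution: [10, 0, -10]
x = extended_jacobi(A, b, 1000)
```

The convergence is slow (the spectral radius is close to 1), which is consistent with using the extended Jacobi iteration as a smoother inside the multigrid cycle rather than as a standalone solver.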
Appendix C: Matrix corresponding to iterative calculation
The Jacobi iteration, the extended Jacobi iteration (Eq. (42)) and the multigrid calculation are generally expressed as
For convergence, Lk must be a good approximation of A−1. Specifically, in the Jacobi iteration, the extended Jacobi iteration and the multigrid calculation, the matrices Lk are given as
respectively, where P, R and M are the prolongation matrix, the restriction matrix and the matrix corresponding to the upper-level calculation. Since the recursive formula (Eq. (C.1)) is rewritten as
the iterative calculation with the initial solution of x0 = 0 is expressed as
and the corresponding matrix is
Moreover, the second term Z in Eq. (C.5) is rewritten as
When A is symmetric and
for every p = 0, 1, 2, …, k − 1,
holds, and Mgen will be symmetric.
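This symmetry can be checked numerically for the simplest case of a constant symmetric Lp (here, a diagonal Jacobi-type Lp; the matrices are illustrative): assembling Mgen column by column from unit right-hand sides yields a symmetric matrix.

```python
def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

def iterate(A, d, b, k):
    """k steps of x <- x + diag(1/d) * (b - A x) from x0 = 0; the map
    b -> x_k is linear, so x_k = M_gen b."""
    x = [0.0] * len(b)
    for _ in range(k):
        r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
        x = [xi + ri / di for xi, ri, di in zip(x, r, d)]
    return x

A = [[2.0, 0.5, 0.0], [0.5, 2.0, 0.5], [0.0, 0.5, 2.0]]   # symmetric A
d = [2.5, 3.0, 2.5]       # constant symmetric L_p = diag(1/d), so L_p = L_{k-1-p}
n = len(A)

# assemble M_gen column by column by applying the iteration to unit vectors
M = [[0.0] * n for _ in range(n)]
for j in range(n):
    e = [1.0 if i == j else 0.0 for i in range(n)]
    col = iterate(A, d, e, 5)
    for i in range(n):
        M[i][j] = col[i]

sym_err = max(abs(M[i][j] - M[j][i]) for i in range(n) for j in range(n))
```

The symmetry of Mgen is what allows the multigrid V-cycle to serve as a preconditioner for the CG and CR solvers, which require an SPD preconditioning matrix.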
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Kondo, M., Matsumoto, J. & Sawada, T. A scalable physically consistent particle method for high-viscous incompressible flows. Comp. Part. Mech. 11, 511–527 (2024). https://doi.org/10.1007/s40571-023-00636-4