1 Introduction

Particle methods can handle large deformations of free surface flows more easily than mesh-based methods such as the finite volume method (FVM) because the motion of a continuum, i.e., a fluid or a solid, can be directly expressed by particle movement. Representative particle methods are the smoothed particle hydrodynamics (SPH) method proposed by Monaghan [1] and the moving particle semi-implicit (MPS) method proposed by Koshizuka et al. [2]. Although they have been adopted in various applications, they require empirical relaxations to obtain stable results (e.g., artificial viscosity [1], density smoothing [3,4,5,6], background pressure [7], relaxation of the pressure Poisson equation (PPE) [2, 8,9,10,11], and particle regularization [12,13,14,15,16]). On the other hand, the moving particle hydrodynamics (MPH) method [17,18,19,20,21,22], which inherits the concepts of the SPH and MPS methods, can conduct calculations while avoiding unphysical instability, such as particle scattering, even without empirical treatments. This is because the stability with respect to particle motion is ensured through physical consistency in the MPH method. When the discrete particle motion equations can be fit into the analytical mechanical framework [23], the system is physically consistent. The MPH method currently has two variants, i.e., MPH for weakly compressible flows (MPH-WC) [18] and MPH for incompressible flows (MPH-I) [17]. In previous studies [17,18,19,20], the MPH method was validated with various calculations, e.g., static pressure [17, 18], dam break [17, 18], Taylor–Couette flow [19], high-viscosity free surface flow [19], droplet oscillation [20], liquid bridge [20], and Plateau–Rayleigh instability [20]. Although it appears straightforward, calculating the static pressure with particle methods is not easy because the result is easily affected by unphysical fluctuations. In fact, unrealistic results were observed with the SPH and MPS methods [17]. However, even in such cases, the physically consistent MPH method could obtain reasonable results [17]. In addition, other physically consistent particle methods have been proposed, e.g., the elastic body models by Suzuki et al. [24] and Kondo et al. [25], the thin plate model by Kondo et al. [26], the constraint-based incompressible SPH model by Ellero et al. [27] and the Hamiltonian MPS model by Suzuki et al. [28].

To simulate practically incompressible fluids such as water, a very large bulk modulus must be employed. In addition to being physically consistent, the constraint-based incompressible SPH method proposed by Ellero et al. [27] and the Hamiltonian MPS method proposed by Suzuki et al. [28] can treat incompressibility. However, to strictly satisfy the geometric incompressibility constraint, they must solve nonlinear equations within symplectic algorithms [29], i.e., RATTLE and SHAKE. In contrast, the MPH-I method [17] can practically simulate incompressible flows by solving only linear equations because it adopts a very large bulk modulus and bulk viscosity instead of directly imposing the incompressibility constraint. Since it can treat not only incompressibility but also high viscosity, it can be applied to various industrial problems involving complex flows [30,31,32].

However, the linear equation arising in the MPH-I method [17] has a nonpositive definite coefficient matrix when the pressure and velocity are both treated as unknowns. Therefore, convergence in solving the linear equation is not assured when well-known solvers such as the conjugate gradient (CG) and conjugate residual (CR) methods [33] are adopted. In fact, convergence is sometimes unstable in the MPH-I method [17] when the CR solver is adopted. When the coefficient matrix is symmetric positive definite (SPD), convergence is ensured with the CG and CR solvers, which use short recurrences for the iteration. Therefore, an SPD linear system is favorable in terms of both calculation efficiency and stability. In the MPH-I method, the linear equation can be converted to an SPD equation, whose unknowns are only the velocity, via pressure substitution [21]. Since the equations before and after the conversion are mathematically the same, the calculation becomes faster without changing the results. However, scalability is not achieved because the number of iterations increases with the system size even when the SPD system is solved.

For large computations, it is important to develop a numerical method whose calculation cost is linear in the problem size. To obtain such a scalable feature, matrix solvers such as the multigrid method [34,35,36] are needed. In particle methods, there are fewer studies adopting multigrid solvers than in the finite element method (FEM) or FVM. Cummins and Rudman [37] conducted a pioneering study, applying a bucket-based geometric multigrid (BMG) solver as a standalone solver for the pressure Poisson equation in their incompressible SPH method. In recent studies, multigrid methods were used as preconditioners in Krylov subspace methods, e.g., the CG, CR and GMRES methods. Algebraic multigrid (AMG) methods were adopted by Trask et al. [38], Chow et al. [39] and Guo et al. [40] in the incompressible SPH method and by Matsunaga et al. [41] in the MPS method. Geometric BMG methods were adopted by Sodersten et al. [42] in the MPS method and by Takahashi and Lin [43] in the incompressible SPH method. In addition, Sodersten et al. [42] reported that the BMG solver was more efficient in particle methods because the AMG solver needs to be set up at every time step due to the dynamic change in connectivity. In these multigrid solvers for particle methods [37,38,39,40,41,42,43], only smoothers that demand diagonally dominant matrices, e.g., the Jacobi and Gauss–Seidel smoothers, were adopted. Therefore, their applications were limited to diagonally dominant systems, which are obtained when difference-based Laplacian models are applied to the Poisson equation or the Helmholtz equation. In the finite point method (FPM), which is a meshless method, Seibold [44] and Metsch et al. [45] adopted the AMG solver. Metsch et al. [45] applied a multigrid solver not only to the simple pressure Poisson equation but also to the pressure–velocity coupled equation, where pressure and velocity are both treated implicitly. Although they successfully obtained solutions of nonsymmetric, nondiagonally dominant, and nonpositive definite systems with the combination of the GMRES method and AMG preconditioning with a Uzawa smoother [46], they noted that convergence was not assured due to the nonsymmetric matrix. In addition, they reported that the calculation time was dominated by the AMG setup time because the point cloud in FPM changes dynamically. This implies that the AMG setup time is not negligible when the connectivity changes dynamically, as in particle and meshless methods.

Fortunately, the linear equation in the MPH-I method [21] can be converted so that it has the SPD feature even in the pressure–velocity coupled approach. Therefore, the classic multigrid preconditioned CG solver [47] is applicable. However, the coefficient matrix is not diagonally dominant due to the complex connectivity and multidimensionality, has many nonzero elements corresponding to the neighboring particles, and has a large condition number due to the incompressibility and large viscosity. To handle multidimensionality and heterogeneity [48, 49], a damped Jacobi smoother [34,35,36] is often applied as the relaxation. However, it is difficult to set the damping parameter so that asymptotic convergence is ensured in problems where the connectivity changes dynamically.

In this study, a scalable MPH-I method was developed. It was shown that the SPD matrix equation can be derived via pressure substitution [21] and that the SPD feature generally appears in physically consistent systems. For the SPD matrix equation, geometric bucket-based multigrid (BMG) preconditioned CG/CR solvers were constructed, and the preconditioner was designed to satisfy the condition for theoretical convergence in a finite number of iterations. To handle multidimensionality, the Jacobi smoother was extended to one applicable to nondiagonally dominant matrices. To confirm the validity of the multigrid solver, the CG, CR, multigrid preconditioned CG (MGCG) and multigrid preconditioned CR (MGCR) solvers were compared with respect to the number of solver iterations and the computation time. Specifically, high-viscosity incompressible dam break calculations were conducted at various resolutions. Furthermore, the performance in shared memory parallel computation on a CPU and a GPU was also investigated.

2 Moving particle hydrodynamics for incompressible flows (MPH-I)

2.1 Governing equations and physical consistency

The governing equations in the MPH methods [17,18,19,20,21,22] are the Navier‒Stokes equation with a Lagrangian description

$$ \rho_{0} \frac{{d{\mathbf{u}}}}{dt} = - \nabla \Psi + \mu \nabla^{2} {\mathbf{u}} + \rho {\mathbf{g}} $$
(1)

and the equation for pressure

$$ \Psi = - \lambda \nabla \cdot {\mathbf{u}} + \kappa \frac{{\rho - \rho_{0} }}{{\rho_{0} }}, $$
(2)

where ρ, u, Ψ, μ, g, λ and κ are the density, velocity, pressure, shear viscosity, gravity, bulk viscosity and bulk modulus, respectively. Although these governing equations do not directly include the incompressibility condition, incompressible flows can practically be expressed by setting λ and κ to sufficiently large values. The expression in Eq. (2) enables the arising matrix equation to be SPD, which is discussed later. In the MPH method, the governing equations are discretized using particle interaction models, conceptually in the same way as in the SPH and MPS methods. At the same time, for physical consistency, the interaction models must be chosen such that they fit into an analytical mechanical framework [23]. In this study, the normalized weight function

$$ \begin{aligned} & w_{ij} = w(|{\mathbf{r}}_{ij} |) \hfill \\ & w(r) = \left\{ {\begin{array}{*{20}l} {\dfrac{1}{S}\dfrac{1}{{h^{d} }}\left( {1 - \dfrac{r}{h}} \right)^{2} } & {(r \le h)} \\ 0 & {(r > h)} \\ \end{array} } \right. \hfill \\ & S = \int_{r < h} {\frac{1}{{h^{d} }}\left( {1 - \frac{r}{h}} \right)^{2} dv} \hfill \\ \end{aligned} $$
(3)

was used for the discretization, where rij is the relative position between particles i and j, h is the effective radius, i.e., the cutoff radius, and d is the number of dimensions. Here, the effective radius hp for the pressure term and the effective radius hv for the shear viscosity term are given separately, and the corresponding weight functions are denoted by \(w_{ij}^{p}\) and \(w_{ij}^{v}\), respectively. The Navier‒Stokes equation (Eq. (1)) is discretized as [20]

$$ \rho_{0} \frac{{d{\mathbf{u}}_{i} }}{dt} = \sum\limits_{j} {(\Psi_{j} + \Psi_{i} ){\mathbf{e}}_{ij} w_{ij}^{p^{\prime } } } - 2\mu (d + 2)\sum\limits_{j} {\left( {{\mathbf{u}}_{ij} \cdot {\mathbf{e}}_{ij}^{{}} } \right){\mathbf{e}}_{ij}^{{}} \frac{{w_{ij}^{v^{\prime } } }}{{|{\mathbf{r}}_{ij} |}}} + \rho_{0} {\mathbf{g}}, $$
(4)

where eij is the unit vector in the rij direction, Ψi and ui are the pressure and velocity of particle i, and uij = uj − ui is the relative velocity between particles i and j, respectively. The prime on \(w_{ij}^{p^{\prime }}\) and \(w_{ij}^{v^{\prime }}\) denotes the derivative of the weight function

$$ w^{\prime}_{ij} = \frac{{\partial w_{{}} (|{\mathbf{r}}_{ij} |)}}{\partial r}, $$
(5)

which yields negative values. On the other hand, the equation for pressure (Eq. (2)) is discretized as

$$ \Psi_{i} = \lambda \sum\limits_{j} {\left( {{\mathbf{u}}_{ij}^{{}} \cdot {\mathbf{e}}_{ij}^{{}} } \right)w_{ij}^{p^{\prime }}} + \kappa (n_{i} - n_{0} ), $$
(6)

where ni is a particle number density given by the summation of the weight function wijp as

$$ n_{i} = \sum\limits_{j} {w_{ij}^{p} } , $$
(7)

and n0 is a base value of the particle number density.
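For illustration, the following Python sketch (assuming NumPy) evaluates the normalized weight function of Eq. (3) and the particle number density of Eq. (7) on a small 2D lattice. The lattice spacing, the cutoff ratio h = 2.1l, and the exclusion of the self term j = i are illustrative assumptions and are not values specified in this paper.

```python
import numpy as np

def normalization(h, d, n_quad=20000):
    """Numerically evaluate S = int_{r<h} h^{-d} (1 - r/h)^2 dv of Eq. (3) by a radial Riemann sum."""
    r = (np.arange(n_quad) + 0.5) * h / n_quad
    shell = 2.0 * np.pi * r if d == 2 else 4.0 * np.pi * r**2   # circumference/surface of the shell
    return np.sum((1.0 / h**d) * (1.0 - r / h)**2 * shell) * (h / n_quad)

def weight(r, h, d, S):
    """Normalized weight function w(r) of Eq. (3); zero beyond the cutoff radius h."""
    return np.where(r <= h, (1.0 - r / h)**2 / (S * h**d), 0.0)

def number_density(x, h, d, S):
    """Particle number density n_i = sum_j w_ij (Eq. (7)); the self term is excluded here (assumption)."""
    r = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=-1)
    w = weight(r, h, d, S)
    np.fill_diagonal(w, 0.0)
    return w.sum(axis=1)

# Illustrative 2D lattice with spacing l and cutoff h = 2.1 l (assumed ratio).
l, d = 1.0, 2
h = 2.1 * l
S = normalization(h, d)
xs, ys = np.meshgrid(np.arange(12) * l, np.arange(12) * l)
x = np.column_stack([xs.ravel(), ys.ravel()])
n = number_density(x, h, d, S)
print("interior particle number density:", n.max())   # one common choice for the base value n_0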

In addition, Eqs. (4) and (6) can be fit into the extended Lagrangian mechanics framework with dissipation [23]

$$ \frac{d}{dt}\left( {\frac{{\partial \mathcal{L}}}{{\partial {\mathbf{u}}_{i} }}} \right) - \left( {\frac{{\partial \mathcal{L}}}{{\partial {\mathbf{x}}_{i} }}} \right) + \left( {\frac{{\partial \mathcal{D}}}{{\partial {\mathbf{u}}_{i} }}} \right) = {\mathbf{0}}, $$
(8)
$$ \mathcal{L} = \mathcal{T} - \mathcal{V} $$

where \( \mathcal{L}\), \( \mathcal{T}\), \( \mathcal{V} \) and \( \mathcal{D} \) are the Lagrangian, kinetic energy, potential energy and Rayleigh’s dissipative function, respectively. Therefore, the system in the MPH-I method is physically consistent [17, 20]. Specifically, when Lagrangian \( \mathcal{L}\) and dissipative function \( \mathcal{D} \) are given as

$$ \mathcal{L} = \sum\limits_{i} {\left( {\frac{1}{2}m|{\mathbf{u}}_{i} |^{2} - \frac{\kappa }{2}\left( {n_{i} - n_{0} } \right)^{2} \Delta V + m{\mathbf{g}} \cdot {\mathbf{x}}_{i} } \right)} $$
(9)

and

$$\begin{aligned} \mathcal{D} &= \sum\limits_{i} {\left( {\frac{\lambda }{2}\left( {\sum\limits_{j} {\left( {{\mathbf{u}}_{ij}^{{}} \cdot {\mathbf{e}}_{ij}^{{}} } \right)w_{ij}^{p^{\prime }} } } \right)^{2} \Delta V}\right.} \\ & \quad {\left.{- \sum\limits_{j} {\frac{\mu }{2}(d + 2)\left( {{\mathbf{u}}_{ij} \cdot {\mathbf{e}}_{ij} } \right)^{2} \frac{{w_{ij}^{v^{\prime }} }}{{|{\mathbf{r}}_{ij} |}}} \Delta V} \right)} ,\end{aligned} $$
(10)

the discretized governing equations (Eqs. (4) and (6)) are derived using Eq. (8). In Eqs. (9) and (10), m and ΔV are the mass and volume of the particles, respectively, which are given as constants

$$ \begin{gathered} \Delta V = l^{d} \hfill \\ m = \rho_{0} \Delta V \hfill \\ \end{gathered} $$
(11)

using the initial particle spacing l.

Although the particle interaction models appearing in the physically consistent formulations (Eqs. (4) and (6)) are zeroth-order accurate, the calculation model, i.e., the MPH method, was validated in previous studies [17,18,19,20,21], for example, via the static pressure calculation [17, 18], the dam break calculation [17, 18] and the Taylor–Couette calculation [19]. The static pressure calculation is not always easy for particle methods because it is sensitive to the unphysical fluctuations that often appear in such methods. Even so, the MPH method could obtain reasonable results without any empirical relaxations in this problem, where the classical particle methods, i.e., the SPH and MPS methods, fail [17]. In addition, since the MPH method is purely a multibody system defined via analytical mechanics [23], no special treatment is needed for imposing boundary conditions. Specifically, the wall boundary can easily be expressed by placing fixed particles in the calculation domain, and the free surface boundary is naturally given by the vacant space where no particles exist.

2.2 Linear matrix equation in the implicit calculation

For practically simulating incompressible flows, the bulk viscosity λ and bulk modulus κ are set very large. For setting κ, numerical stability is assured when the condition

$$ \kappa \le \frac{\lambda }{\Delta t} $$
(12)

is satisfied [17], where Δt is the time step width. This implies that the bulk modulus κ can be set large when the bulk viscosity λ is large. Therefore, the way to handle a large bulk viscosity λ must be considered in the incompressible calculation. In addition, to simulate high-viscosity flows, a large shear viscosity μ must be handled stably. For stability with a large bulk viscosity λ and a large shear viscosity μ, the velocity in Eqs. (4) and (6) is treated implicitly as

$$ \rho_{0} \frac{{{\mathbf{u}}_{i}^{k + 1} - {\mathbf{u}}_{i}^{k} }}{\Delta t} = \sum\limits_{j} {(\Psi_{j}^{k + 1} + \Psi_{i}^{k + 1} ){\mathbf{e}}_{ij}^{{}} w_{ij}^{p^{\prime }} } - 2\mu (d + 2)\sum\limits_{j} {\left( {{\mathbf{u}}_{ij}^{k + 1} \cdot {\mathbf{e}}_{ij}^{{}} } \right){\mathbf{e}}_{ij}^{{}} \frac{{w_{ij}^{v^{\prime }} }}{{|{\mathbf{r}}_{ij} |}}} + \rho_{0} {\mathbf{g}} $$
(13)
$$ \Psi_{i}^{k + 1} = \lambda \sum\limits_{j} {\left( {{\mathbf{u}}_{ij}^{k + 1} \cdot {\mathbf{e}}_{ij}^{{}} } \right)w_{ij}^{p^{\prime }} } + \kappa (n_{i}^{{}} - n_{0} ), $$
(14)

where the superscript k on the velocity ui indicates the time step. Equations (13) and (14) form a linear matrix equation whose unknowns are the velocity uk+1 and the pressure Ψk+1. By solving this matrix equation, the velocity at the next step uk+1 is obtained, and by updating the particle position x as

$$ {\mathbf{x}}_{i}^{k + 1} = {\mathbf{x}}_{i}^{k} + {\mathbf{u}}_{i}^{k + 1} \Delta t, $$
(15)

the particle movement can be calculated. However, since the linear system (Eqs. (13) and (14)) has a nonpositive definite coefficient matrix, convergence is not assured when the well-known CG and CR solvers are applied [17]. In this study, the system is converted to one with a symmetric positive definite (SPD) coefficient matrix by substituting Eq. (14) into the pressure Ψk+1 in Eq. (13) as

$$ \begin{aligned} & \rho_{0} \frac{{{\mathbf{u}}_{i}^{k + 1} }}{\Delta t} + 2\mu (d + 2)\sum\limits_{j} {\left( {({\mathbf{u}}_{j}^{k + 1} - {\mathbf{u}}_{i}^{k + 1} ) \cdot {\mathbf{e}}_{ij} } \right){\mathbf{e}}_{ij} \frac{{w_{ij}^{v^{\prime }} }}{{|{\mathbf{r}}_{ij} |}}} \\ & \quad - \sum\limits_{j} {\left( {\lambda \sum\limits_{n} {\left( {{\mathbf{u}}_{n}^{k + 1} - {\mathbf{u}}_{j}^{k + 1} } \right) \cdot {\mathbf{e}}_{jn} w_{jn}^{p^{\prime }} } + \lambda \sum\limits_{m} {\left( {{\mathbf{u}}_{m}^{k + 1} - {\mathbf{u}}_{i}^{k + 1} } \right) \cdot {\mathbf{e}}_{im} w_{im}^{p^{\prime }} } } \right){\mathbf{e}}_{ij} w_{ij}^{p^{\prime }} } \\ & = \rho_{0} \frac{{{\mathbf{u}}_{i}^{k} }}{\Delta t} + \sum\limits_{j} {\left( {\kappa (n_{j} - n_{0} ) + \kappa (n_{i} - n_{0} )} \right){\mathbf{e}}_{ij} w_{ij}^{p^{\prime }} } + \rho_{0} {\mathbf{g}}, \end{aligned} $$
(16)

where the unknowns are only the velocity uk+1 [21]. Since the conversion leaves the equation mathematically unchanged, the system can justifiably be solved with the CG and CR solvers without affecting the calculation results. The partial differential counterpart of Eq. (16) is obtained by substituting Eq. (2) into Eq. (1) as

$$ \begin{gathered} \rho_{0} \frac{{d{\mathbf{u}}}}{dt} - \nabla \cdot \mu \nabla {\mathbf{u}} - \nabla (\lambda \nabla \cdot {\mathbf{u}}) \hfill \\ = - \nabla \left( {\kappa \frac{{\rho - \rho_{0} }}{{\rho_{0} }}} \right) + \rho_{0} {\mathbf{g}}. \hfill \\ \end{gathered} $$
(17)

The partial differential equation to be solved (Eq. (17)) is more complex than the Poisson or Helmholtz equations solved in previous studies [34,35,36,37,38,39,40], but it is a kind of diffusion equation with respect to the velocity. Therefore, the discretized version is expected to be close to diagonally dominant.
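The structure behind this conversion can be illustrated with small stand-in matrices. In the physically consistent discretization, the discrete pressure-gradient operator acts as the negative transpose of the discrete divergence operator (up to constant factors), which is what makes the substituted velocity-only system of Eq. (16) SPD, whereas the coupled velocity–pressure matrix of Eqs. (13)–(14) is not even symmetric. The NumPy sketch below assumes this structure with random stand-in operators and illustrative parameter values; it is not the actual MPH-I operator.

```python
import numpy as np

rng = np.random.default_rng(0)
n_p, d = 30, 2                                      # illustrative particle count and dimensions
n_dof = n_p * d
rho0, dt, lam, mu = 1.0e3, 1.0e-3, 1.0e3, 1.0e2     # illustrative values, not those of Table 1

# Stand-ins for the discrete operators of Eqs. (13)-(14):
# D maps velocities to the divergence-like term, V is a symmetric nonnegative viscous operator.
D = rng.standard_normal((n_p, n_dof))
B = rng.standard_normal((n_dof, n_dof))
V = 2.0 * mu * (B @ B.T) / n_dof

# Assumed structural fact exploited by Eq. (16): pressure gradient = -divergence^T.
G = -D.T

K_uu = (rho0 / dt) * np.eye(n_dof) + V

# Coupled system with unknowns (u, Psi): not symmetric, so CG/CR convergence is not assured.
M_coupled = np.block([[K_uu,      -G          ],
                      [-lam * D,  np.eye(n_p) ]])
print("coupled matrix symmetric?", np.allclose(M_coupled, M_coupled.T))   # -> False

# Velocity-only system after substituting the pressure equation (structure of Eq. (16)).
A = K_uu + lam * (D.T @ D)
print("substituted matrix symmetric?", np.allclose(A, A.T))               # -> True
print("smallest eigenvalue:", np.linalg.eigvalsh(A).min())                # -> positive, i.e., SPD
```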

To obtain the SPD feature in Eq. (16), the formulation of Eq. (2) is key, and this approach is analogous to the penalty method in the finite element method [50], which was originally developed for structural calculations. In structural calculations, the Lagrangian is given with the elastic strain energy, and the motion equation is derived by minimizing the potential energy. On the other hand, in fluid calculations with a Lagrangian description, i.e., particle methods, the dissipative function is given, and the corresponding force in the motion equation is derived by minimizing this function. In structural calculations, the incompressibility constraint can practically be imposed using a very large bulk modulus. This is the penalty method, with which the SPD feature can be obtained. Analogously, in particle methods, the practically incompressible calculation can be conducted with a very large bulk viscosity, λ in Eq. (2), and the SPD feature can similarly be maintained with this approach.

Moreover, the SPD feature that appears in Eq. (16) generally arises in physically consistent systems. In the extended Lagrangian mechanics with dissipation (Eq. (8)), Rayleigh's dissipative function must be positive definite. This allows the dissipative function to be expressed as

$$ \mathcal{D} = \frac{1}{2}\{ {\mathbf{u}},{\mathbf{Cu}}\} $$
(18)

using an SPD matrix C, where the vector u without a lower index indicates a large vector unifying the velocity of all particles, and the bracket {,} indicates the dot product of the unified vectors. Using the matrix C, the motion equation of the particles is expressed as

$$ m\frac{{d{\mathbf{u}}}}{dt} = \frac{d\mathcal{L}}{{d{\mathbf{x}}}} - {\mathbf{Cu}}. $$
(19)

When the velocity is treated implicitly,

$$ \left( {\frac{m}{\Delta t}{\mathbf{I}} + {\mathbf{C}}} \right){\mathbf{u}}^{k + 1} = m\frac{{{\mathbf{u}}^{k} }}{\Delta t} + \frac{d\mathcal{L}}{{d{\mathbf{x}}}} $$
(20)

is obtained. Since the coefficient matrix appearing on the left-hand side of Eq. (20) is SPD, the SPD feature arises in any physically consistent system that can be fit into the analytical mechanical framework of Eq. (8). Therefore, the SPD feature in Eq. (16) can be interpreted as a consequence of the physical consistency of the MPH-I method.

3 Multigrid preconditioned CG/CR method

3.1 Generalized CG/CR algorithm

The CG and CR solvers, whose convergence is assured through short recurrences within a number of iterations no greater than the number of degrees of freedom, can be generalized as follows. Let M denote the weight matrix and K the preconditioning matrix. Using the initial solution x0, the residual r and the search direction p are initially given as

$$ \begin{aligned} {\mathbf{r}}_{0}&= {\mathbf{b}} - {\mathbf{Ax}}_{0} \hfill \\ {\mathbf{p}}_{0}& = {\mathbf{Kr}}_{0} , \hfill \\ \end{aligned} $$
(21)

where the lower index indicates the iteration. Then, the solution x and residual r are updated

$$ \begin{aligned} {\mathbf{x}}_{k + 1} &= {\mathbf{x}}_{k} + \alpha_{k} {\mathbf{p}}_{k} \hfill \\ {\mathbf{r}}_{k + 1} &= {\mathbf{r}}_{k} - \alpha_{k} {\mathbf{Ap}}_{k} \hfill \\ \end{aligned} $$
(22)

with a parameter αk

$$ \alpha_{k} = \frac{{\{ {\mathbf{r}}_{k} ,{\mathbf{MAp}}_{k} \} }}{{\{ {\mathbf{Ap}}_{k} ,{\mathbf{MAp}}_{k} \} }}, $$
(23)

which is determined such that

$$ \{ {\mathbf{r}}_{k + 1} ,{\mathbf{MAp}}_{k} \} = 0. $$
(24)

Here, the bracket {a,b} indicates the dot product of unified vectors a and b. Then, the conjugate vector p is updated as

$$ {\mathbf{p}}_{k + 1} = {\mathbf{Kr}}_{k + 1} + \beta_{k} {\mathbf{p}}_{k} $$
(25)

with a parameter βk

$$ \beta_{k} = - \frac{{\{ {\mathbf{AKr}}_{k + 1} ,{\mathbf{MAp}}_{k} \} }}{{\{ {\mathbf{Ap}}_{k} ,{\mathbf{MAp}}_{k} \} }}, $$
(26)

which is determined such that

$$ \{ {\mathbf{Ap}}_{k + 1} ,{\mathbf{MAp}}_{k} \} = 0. $$
(27)

The iteration given by Eqs. (22)–(27) is repeated until the L2 norm of the residual |rk|2 becomes small enough, i.e., until convergence. When the matrices M and MAK are symmetric, the orthogonality relations

$$ \{ {\mathbf{r}}_{i} ,{\mathbf{MAp}}_{j} \} = 0 \quad (j < i) $$
(28)
$$ \{ {\mathbf{r}}_{i} ,{\mathbf{MAKr}}_{j} \} = 0\quad (j \ne i) $$
(29)
$$ \{ {\mathbf{Ap}}_{i} ,{\mathbf{MAp}}_{j} \} = 0 \quad (j \ne i) $$
(30)

are provided (see “Appendix A”). Using Eqs. (28)–(30), αk and βk are rewritten as

$$ \alpha_{k} = \frac{{\{ {\mathbf{r}}_{k} ,{\mathbf{MAp}}_{k} \} }}{{\{ {\mathbf{MAp}}_{k} ,{\mathbf{Ap}}_{k} \} }} = \frac{{\{ {\mathbf{r}}_{k} ,{\mathbf{MAKr}}_{k} \} }}{{\{ {\mathbf{Ap}}_{k} ,{\mathbf{MAp}}_{k} \} }} $$
(31)

and

$$ \beta_{k} = - \frac{{\{ {\mathbf{MAKr}}_{k + 1} ,{\mathbf{Ap}}_{k} \} }}{{\{ {\mathbf{MAp}}_{k} ,{\mathbf{Ap}}_{k} \} }} = \frac{{\{ {\mathbf{r}}_{k + 1} ,{\mathbf{MAKr}}_{k + 1} \} }}{{\{ {\mathbf{r}}_{k} ,{\mathbf{MAKr}}_{k} \} }}, $$
(32)

which are often used in implementations. When the matrices M and MAK are symmetric positive definite (SPD), theoretical convergence is assured (see “Appendix A”). This generalized algorithm (Eqs. (21)–(27)) yields specific algorithms such as the CG and CR methods. For example, the CG method is obtained with M = A−1 and K = I, the CR method with M = I and K = I, the CGNR method [33] with M = I and K = AT, and the CGNE method [33] with M = (AAT)−1 and K = AT. Furthermore, the preconditioned CG method corresponds to the case with M = A−1 and K = KCG, and the preconditioned CR method to the case with M = K = KCR. Therefore, theoretical convergence is obtained when KCG and KCR are SPD because M and MAK are then SPD.
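As a minimal illustration of the generalized iteration of Eqs. (21)–(27), the following NumPy sketch parameterizes the solver by the weight matrix M and the preconditioner K and recovers the CR method (M = K = I) and the CG method (M = A−1, K = I) on a small SPD test matrix. The test problem is an illustrative assumption, not the MPH-I system.

```python
import numpy as np

def generalized_cgcr(A, b, apply_M, apply_K, tol=1e-10, max_iter=500):
    """Generalized CG/CR iteration of Eqs. (21)-(27).

    apply_M(v) and apply_K(v) apply the weight matrix M and the preconditioning
    matrix K; convergence is assured when M and MAK are SPD.
    """
    x = np.zeros_like(b)
    r = b - A @ x                               # Eq. (21)
    p = apply_K(r)
    for k in range(max_iter):
        Ap = A @ p
        MAp = apply_M(Ap)
        alpha = (r @ MAp) / (Ap @ MAp)          # Eq. (23)
        x = x + alpha * p                       # Eq. (22)
        r = r - alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k + 1
        Kr = apply_K(r)
        beta = -((A @ Kr) @ MAp) / (Ap @ MAp)   # Eq. (26)
        p = Kr + beta * p                       # Eq. (25)
    return x, max_iter

# Small SPD test problem (illustrative only).
rng = np.random.default_rng(1)
n = 100
Q = rng.standard_normal((n, n))
A = Q @ Q.T + n * np.eye(n)
b = rng.standard_normal(n)

identity = lambda v: v
x_cr, it_cr = generalized_cgcr(A, b, apply_M=identity, apply_K=identity)   # CR: M = I, K = I
x_cg, it_cg = generalized_cgcr(A, b,                                       # CG: M = A^{-1}, K = I
                               apply_M=lambda v: np.linalg.solve(A, v),
                               apply_K=identity)
print(it_cr, it_cg, np.allclose(A @ x_cr, b), np.allclose(A @ x_cg, b))
```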

It is noteworthy that the residual in the CG and preconditioned CG methods is weighted by the matrix A−1. In fact, the weight matrix is specified as M = A−1 when deriving the CG method. Consequently, the solution in the CG iteration is updated such that {r, A−1r} is minimized along the search direction p. This unintended weighting may affect the convergence. In fact, the convergence degrades, especially when the condition number of A is large. On the other hand, in the CR and preconditioned CR methods, the objective functions to be minimized are straightforwardly expressed as {r, r} and {r, KCRr}, respectively, which results in good convergence properties, such as a monotonic decrease in the residual.
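For clarity, this weighting has a standard interpretation: writing the error as ek = x* − xk with the exact solution x*, so that rk = Aek, the A−1-weighted residual is the A-norm of the error,

$$ \{ {\mathbf{r}}_{k} ,{\mathbf{A}}^{ - 1} {\mathbf{r}}_{k} \} = \{ {\mathbf{Ae}}_{k} ,{\mathbf{A}}^{ - 1} {\mathbf{Ae}}_{k} \} = \{ {\mathbf{e}}_{k} ,{\mathbf{Ae}}_{k} \} , $$

which is the quantity minimized by the (preconditioned) CG iteration, whereas the CR iteration minimizes the residual norm itself.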

3.2 Bucket-based geometric multigrid preconditioner

In this study, background bucket cells are utilized for constructing a geometric multigrid preconditioner for the CG and CR methods. The algorithm of the multigrid preconditioned conjugate residual (MGCR) solver is shown in Fig. 1. To construct the multigrid structure, the bucket size is set equal to the effective radius such that the range of the interaction is limited to the neighboring buckets. The linear equation in the MPH-I method (Eq. (16)) is multidimensional because the unknowns are the particle velocities. To restrict the multidimensional vectors of the particles to the finest grid, i.e., the buckets, the vectors of the particles in each bucket are simply summed as

$$ {\mathbf{u}}_{l}^{0} = \sum\limits_{i \in l} {{\mathbf{u}}_{i} } , $$
(33)

where the subscripts i and l of u denote the indices of the particles and the buckets, respectively, and the superscript 0 of ul indicates that the parameter belongs to the finest grid (level 0 in Fig. 1). Here, the restriction matrix corresponding to Eq. (33) is denoted by R, and the prolongation matrix from the buckets to the particles, P, is given by its transpose as

$$ {\mathbf{P}} = {\mathbf{R}}^{T} . $$
(34)
Fig. 1 Multigrid preconditioned conjugate residual algorithm

Then, the coefficient matrix at the finest grid level, A0, is defined as

$$ {\mathbf{A}}^{0} = {\mathbf{RAP}}, $$
(35)

where A is the coefficient matrix at the original particle scale, which is expressed in Eq. (16). Furthermore, the coarser grids are recursively created from the finer grids. In this study, the size of the coarser grid is double that of the finer grid. Specifically, one coarser grid cell consists of 4 finer cells in 2D and 8 finer cells in 3D. The level of the grid is incremented as the grid size is doubled (level 0: grid size = hv, level 1: grid size = 2hv, level 2: grid size = 4hv, and so on). The restriction from the finer grid (level r) to the coarser grid (level r + 1) is simply given by the summation

$$ {\mathbf{u}}_{l}^{r + 1} = \sum\limits_{s \in l} {{\mathbf{u}}_{s}^{r} } , $$
(36)

where the subscripts l and s of u denote the indices of the coarser and finer grid cells, respectively. Using the restriction matrix Rr, which corresponds to Eq. (36), the prolongation matrix from level r + 1 to level r is given by

$$ {\mathbf{P}}^{r} = {\mathbf{R}}^{rT} . $$
(37)

Then, the coefficient matrix Ar is recursively provided as

$$ {\mathbf{A}}^{r + 1} = {\mathbf{R}}^{r} {\mathbf{A}}^{r} {\mathbf{P}}^{r} . $$
(38)
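A minimal sketch of the bucket-based restriction, prolongation and Galerkin coarsening of Eqs. (33)–(38) is given below, using SciPy sparse matrices. The random particle positions, the bucket grid and the stand-in SPD matrix are illustrative assumptions; the sketch only shows how R, P = RT and Ar+1 = RrArPr can be assembled.

```python
import numpy as np
import scipy.sparse as sp

def restriction_from_labels(labels, n_coarse, d):
    """Restriction matrix R that sums the d-component vectors of the fine entities
    sharing the same coarse label (Eqs. (33)/(36)); the prolongation is P = R^T."""
    n_fine = len(labels)
    R_scalar = sp.csr_matrix((np.ones(n_fine), (labels, np.arange(n_fine))),
                             shape=(n_coarse, n_fine))
    return sp.kron(R_scalar, sp.identity(d), format="csr")   # acts on each velocity component

def galerkin(A, R):
    """Coarse-level operator A^{r+1} = R A^r P with P = R^T (Eqs. (35)/(38))."""
    return R @ A @ R.T

# Illustrative 2D setup: particles binned into buckets whose size equals h_v.
rng = np.random.default_rng(2)
d, h_v, n_bx = 2, 0.1, 8                                   # 8 x 8 bucket grid (assumed)
x = rng.uniform(0.0, n_bx * h_v, size=(500, d))            # 500 random particle positions
ij = np.floor(x / h_v).astype(int)
labels0 = ij[:, 0] * n_bx + ij[:, 1]                       # flattened bucket label per particle

A = sp.random(500 * d, 500 * d, density=0.01, format="csr", random_state=3)
A = A @ A.T + sp.identity(500 * d)                         # stand-in SPD particle-level matrix

R = restriction_from_labels(labels0, n_bx * n_bx, d)       # particles -> level 0
A0 = galerkin(A, R)

# Level 0 -> level 1: merge 2 x 2 buckets (the grid size doubles).
bx, by = np.meshgrid(np.arange(n_bx), np.arange(n_bx), indexing="ij")
labels1 = (bx.ravel() // 2) * (n_bx // 2) + (by.ravel() // 2)
R1 = restriction_from_labels(labels1, (n_bx // 2) ** 2, d)
A1 = galerkin(A0, R1)
print(A.shape, A0.shape, A1.shape)   # the DoF shrink level by level; SPD is preserved
```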

Using the geometric multigrid structure expressed by Eqs. (33)–(38), a preconditioner for the CG and CR methods satisfying the SPD condition is constructed. However, the linear system to be solved in the MPH-I method is not diagonally dominant, even at the coarse grid scale, because of the multidimensionality. Therefore, the widely used smoothers, e.g., the Jacobi and Gauss–Seidel smoothers, cannot directly be applied because they demand diagonal dominance. To address this issue, the Jacobi smoother is extended so that it is applicable even to nondiagonally dominant systems. In solving the linear equation

$$ {\mathbf{Ax}} = {\mathbf{b}}, $$
(39)

the Jacobi iteration is expressed as

$$ {\mathbf{x}}_{k + 1} = {\mathbf{x}}_{k} + {\mathbf{D}}^{ - 1} ({\mathbf{b}} - {\mathbf{Ax}}_{k} ), $$
(40)

where D is the diagonal component of the coefficient matrix A. In the extended Jacobi iteration, the diagonal matrix D is replaced by another diagonal matrix \({\hat{\mathbf{D}}}\), whose elements satisfy

$$ \hat{D}_{ii}^{{}} > \frac{1}{2}\sum\limits_{j} {|A_{ij}^{{}} |} . $$
(41)

Then, the iteration is given by

$$ {\mathbf{x}}_{k + 1} = {\mathbf{x}}_{k} + {\hat{\mathbf{D}}}^{ - 1} ({\mathbf{b}} - {\mathbf{Ax}}_{k} ). $$
(42)

With this simple extension, asymptotic convergence is obtained even in a nondiagonally dominant system (see “Appendix B”). In this study, the right-hand side of Eq. (41) was simply adopted for the elements of \({\hat{\mathbf{D}}}\) because this choice works whenever \(2{\hat{\mathbf{D}}} - {\mathbf{A}}\) is not singular, which is satisfied in most cases. Since the matrix equation in the MPH-I method is a discretized version of a diffusion equation, it is close to diagonally dominant. In such cases, this extension is useful.
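The following sketch implements the extended Jacobi iteration of Eq. (42) with the choice D̂ii = ½Σj|Aij| adopted above. The test matrix, the square of a 1D Dirichlet Laplacian, is an illustrative SPD matrix that is not diagonally dominant; it is not the MPH-I operator.

```python
import numpy as np

def extended_jacobi(A, b, x0, n_sweeps):
    """Extended Jacobi smoother of Eq. (42): the Jacobi diagonal is replaced by
    D_hat with D_hat_ii = 0.5 * sum_j |A_ij|, the choice adopted for Eq. (41)."""
    d_hat = 0.5 * np.abs(A).sum(axis=1)
    x = x0.copy()
    for _ in range(n_sweeps):
        x = x + (b - A @ x) / d_hat            # x_{k+1} = x_k + D_hat^{-1} (b - A x_k)
    return x

# Illustrative SPD matrix that is NOT diagonally dominant: the square of a 1D
# Dirichlet Laplacian (interior rows like [1, -4, 6, -4, 1], so sum_j|A_ij| > 2*A_ii).
n = 64
L = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = L @ L
b = np.ones(n)

x_ext = extended_jacobi(A, b, np.zeros(n), n_sweeps=50)
x_std = np.zeros(n)
D = np.diag(A)
for _ in range(50):                            # standard Jacobi with D = diag(A) for comparison
    x_std = x_std + (b - A @ x_std) / D
print("standard Jacobi residual:", np.linalg.norm(b - A @ x_std))
print("extended Jacobi residual:", np.linalg.norm(b - A @ x_ext))
```

On this matrix, the standard Jacobi iteration diverges, whereas the extended iteration remains stable and damps the high-frequency error; the remaining smooth error is exactly what the coarse-grid correction of the multigrid cycle is intended to remove.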

In this study, the V-cycle multigrid calculation was included as a preconditioner of the CG and CR methods (Fig. 1), and the extended Jacobi iteration was adopted as the smoother at each level. It is better for the preconditioner to skip smoothing at the original particle level, where matrix–vector multiplication requires a large computational cost. This is because the CG and CR solvers already have the main iterations at the original level, which are more efficient than the smoothing iterations in the multigrid calculation, i.e., the extended Jacobi iterations. Therefore, in this study, the preconditioning matrix K was designed not to include particle-level smoothing as

$$ {\mathbf{K}} = {\hat{\mathbf{D}}}^{ - 1} + {\mathbf{PM}}^{0} {\mathbf{R}}, $$
(43)

where \({\hat{\mathbf{D}}}\) is the extended diagonal matrix defined in Eq. (41) corresponding to the coefficient matrix A in Eq. (16), and M0 is the matrix corresponding to the calculation at level 0. When the extended Jacobi iterations are conducted twice in both pre- and postsmoothing in each level (Fig. 1), the recursive relation between the matrices Mr and Mr+1

$$ \begin{aligned} {\mathbf{M}}^{r} & = ({\mathbf{A}}^{r} )^{ - 1} - \left( {{\mathbf{I}}^{r} - ({\hat{\mathbf{D}}}^{r} )^{ - 1} {\mathbf{A}}^{r} } \right)^{2} ({\mathbf{A}}^{r} )^{ - 1} \left( {{\mathbf{I}}^{r} - {\mathbf{A}}^{r} ({\hat{\mathbf{D}}}^{r} )^{ - 1} } \right)^{2} \\ & \quad + \left( {{\mathbf{I}}^{r} - ({\hat{\mathbf{D}}}^{r} )^{ - 1} {\mathbf{A}}^{r} } \right)^{2} {\mathbf{P}}^{r} {\mathbf{M}}^{r + 1} {\mathbf{R}}^{r} \left( {{\mathbf{I}}^{r} - {\mathbf{A}}^{r} ({\hat{\mathbf{D}}}^{r} )^{ - 1} } \right)^{2} \\ \end{aligned} $$
(44)

holds, where the upper right index of M indicates the level (see “Appendix C”). At the maximum (coarsest) level, the extended Jacobi iterations are conducted 4 times, and the matrix Mmax is expressed as

$$\begin{aligned} {\mathbf{M}}^{\max } &= ({\mathbf{A}}^{\max } )^{ - 1} - \left( {{\mathbf{I}}^{\max } - ({\hat{\mathbf{D}}}^{\max } )^{ - 1} {\mathbf{A}}^{\max } } \right)^{2} ({\mathbf{A}}^{\max } )^{ - 1} \\ & \left( {{\mathbf{I}}^{\max } - {\mathbf{A}}^{\max } ({\hat{\mathbf{D}}}^{\max } )^{ - 1} } \right)^{2} .\end{aligned} $$
(45)

Here, the matrix Mmax is SPD (see “Appendix B”). In the same way, the sum of the first and second terms on the right-hand side of Eq. (44) is SPD. When Mr+1 is SPD, the third term in Eq. (44) is symmetric nonnegative definite. Therefore, Mr is recursively SPD, and K is also SPD, so K satisfies the condition to be a preconditioner for the CG and CR solvers. Note that PM0R in the second term on the right-hand side of Eq. (43) cannot solely serve as a preconditioner because it is nonnegative definite but singular. Therefore, it is combined with \({\hat{\mathbf{D}}}^{ - 1}\) to construct the preconditioner.
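Putting the pieces together, the sketch below applies the preconditioner K of Eq. (43), where M0 is realized by the V-cycle of Eqs. (44)–(45) with two pre-smoothing and two post-smoothing sweeps per level and four sweeps at the coarsest level. The 1D model matrix and the pairwise-aggregation hierarchy are illustrative stand-ins for the particle/bucket hierarchy, so this is a structural sketch rather than the MPH-I implementation.

```python
import numpy as np

def d_hat(A):
    """Extended diagonal of Eq. (41): D_hat_ii = 0.5 * sum_j |A_ij|."""
    return 0.5 * np.abs(A).sum(axis=1)

def smooth(A, b, x, sweeps):
    """Extended Jacobi sweeps of Eq. (42)."""
    dh = d_hat(A)
    for _ in range(sweeps):
        x = x + (b - A @ x) / dh
    return x

def v_cycle(As, Rs, r, rhs):
    """Apply M^r of Eq. (44): two pre-smooths, coarse-grid correction, two post-smooths,
    with four sweeps at the coarsest level (Eq. (45)); the initial guess is zero."""
    if r == len(As) - 1:
        return smooth(As[r], rhs, np.zeros_like(rhs), 4)
    x = smooth(As[r], rhs, np.zeros_like(rhs), 2)
    x = x + Rs[r].T @ v_cycle(As, Rs, r + 1, Rs[r] @ (rhs - As[r] @ x))
    return smooth(As[r], rhs, x, 2)

def apply_K(A, R0, As, Rs, res):
    """Preconditioner of Eq. (43): K res = D_hat^{-1} res + P M^0 R res
    (the particle level itself is not smoothed)."""
    return res / d_hat(A) + R0.T @ v_cycle(As, Rs, 0, R0 @ res)

# Tiny illustrative hierarchy: a 1D SPD model matrix with pairwise aggregation.
n = 256
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)      # stand-in SPD matrix
def pair_restriction(m):                                     # sums pairs of fine unknowns
    R = np.zeros((m // 2, m))
    R[np.arange(m // 2), 2 * np.arange(m // 2)] = 1.0
    R[np.arange(m // 2), 2 * np.arange(m // 2) + 1] = 1.0
    return R

R0 = pair_restriction(n)                                     # "particles" -> level 0
As, Rs = [R0 @ A @ R0.T], []
for _ in range(3):                                           # three coarser levels
    Rs.append(pair_restriction(As[-1].shape[0]))
    As.append(Rs[-1] @ As[-1] @ Rs[-1].T)

b = np.ones(n)
z = apply_K(A, R0, As, Rs, b)     # one application of K; K is SPD by the argument above
print(z.shape, np.isfinite(z).all())
```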

4 Benchmark calculations

4.1 Number of iterations

The presented geometric bucket-based multigrid preconditioner was implemented in the MPH-I method, using the open-source code [22]. The calculations were conducted using the multigrid preconditioned conjugate residual (MGCR), multigrid preconditioned conjugate gradient (MGCG), nonpreconditioned conjugate residual (CR) and nonpreconditioned conjugate gradient (CG) solvers. Specifically, the simple high-viscosity incompressible dam break shown in Fig. 2 was calculated for a physical time of 0.2 s. The calculation conditions for the base case are shown in Table 1, and the scaled cases, whose particle spacing l and time step width Δt are 1/2, 1/4, 1/8, …, 1/64 times those of the base case, are shown in Table 2. In Table 2, only the differences from the base case (Table 1) are shown. In addition, the diffusion numbers dλ and dμ, the degrees of freedom DoFA, the approximate number of nonzero elements nnzA and the approximate condition number KA are displayed in Table 2. DoFA was calculated as the product of the number of fluid particles and the number of dimensions. nnzA was estimated as

$$ {\text{nnz}}_{{\mathbf{A}}} \approx \left\{ {\begin{array}{*{20}c} {{\text{DoF}}_{{\mathbf{A}}} \times 2\pi (h_{v} /l)^{2} } & {(d = 2)} \\ {{\text{DoF}}_{{\mathbf{A}}} \times 4\pi (h_{v} /l)^{3} } & {(d = 3)} \\ \end{array} } \right. $$
(46)

using the approximate number of neighboring particles, which is π(hv/l)2 in 2D and 4/3π(hv/l)3 in 3D, multiplied by the number of velocity components d per particle. For the rough estimation of KA, the maximum and minimum eigenvalues Λmax and Λmin were predicted using

$$ f(u) = \frac{{\rho_{0} }}{\Delta t}u - (\mu + \lambda )\frac{{d^{2} u}}{{dx^{2} }}, $$
(47)

which is the 1D version of the left-hand side of Eq. (17). In estimating Λmax, it was assumed that the discretization in the MPH-I method is analogous to the finite difference discretization with a mesh size of hv(= 2hp). Equation (47) was discretized as

$$ f(u_{l} ) \approx \frac{{\rho_{0} }}{\Delta t}u_{l} - (\mu + \lambda )\frac{{u_{l + 1} - 2u_{l} + u_{l - 1} }}{{h_{v}^{2} }}, $$
(48)

and the maximum eigenvalue Λmax was approximated as

$$ \Lambda_{\max } \approx \frac{{\rho_{0} }}{\Delta t} + (\mu + \lambda )\frac{4}{{h_{v}^{2} }}. $$
(49)
Fig. 2 Dam-break calculation of a highly viscous incompressible flow

Table 1 Calculation conditions for the base case
Table 2 Calculation conditions for the scaled cases with various resolutions

On the other hand, for predicting Λmin, the maximum wavelength 4L, which was determined from the calculation geometry (Fig. 2), was considered. By substituting a sine wave with a wavelength of 4L

$$ u(x) = \sin \frac{\pi x}{{2L}} $$
(50)

into Eq. (47),

$$ f(u) = \left[ {\frac{{\rho_{0} }}{\Delta t} + (\mu + \lambda )\left( {\frac{\pi }{2L}} \right)^{2} } \right]u(x) $$
(51)

was obtained. Then, the minimum eigenvalue Λmin was approximated as

$$ \Lambda_{\min } \approx \frac{{\rho_{0} }}{\Delta t} + (\mu + \lambda )\left( {\frac{\pi }{2L}} \right)^{2} , $$
(52)

and the condition number was predicted as

$$ {\text{K}}_{{\mathbf{A}}} = \Lambda_{\max } /\Lambda_{\min } . $$
(53)
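For reference, the estimates of Eqs. (46), (49), (52) and (53) can be evaluated directly. The following sketch does so for illustrative parameter values; the actual values of Tables 1 and 2 are not reproduced in the text, so all numbers below are assumptions.

```python
import numpy as np

# Illustrative parameters (assumptions, not the Table 1 values).
d = 2                       # dimensions
l = 1.0e-3                  # particle spacing [m]
h_v = 3.1 * l               # effective radius for viscosity (assumed ratio)
L = 0.05                    # geometry length scale of Eq. (50) [m]
rho0, dt = 1.0e3, 1.0e-4    # density [kg/m^3], time step [s]
mu, lam = 1.0e2, 1.0e4      # shear and bulk viscosity [Pa s]
n_fluid = 10000             # number of fluid particles

dof = n_fluid * d                                                             # DoF_A
nnz = dof * (2 * np.pi * (h_v / l)**2 if d == 2 else 4 * np.pi * (h_v / l)**3)  # Eq. (46)

lam_max = rho0 / dt + (mu + lam) * 4.0 / h_v**2                # Eq. (49)
lam_min = rho0 / dt + (mu + lam) * (np.pi / (2.0 * L))**2      # Eq. (52)
K_A = lam_max / lam_min                                        # Eq. (53)
print(f"DoF = {dof}, nnz ~ {nnz:.2e}, K_A ~ {K_A:.1e}")
```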

The number of solver iterations at t = 0.2 s for the scaled cases (Table 2) is presented in Fig. 3. Here, only the main iterations (Fig. 1) are counted, with a convergence threshold of \(|{\mathbf{r}}|^{2} /|{\mathbf{b}}|^{2} < 10^{-12}\). When the CR and CG solvers were adopted, the number of iterations drastically increased with the problem size. On the other hand, with the MGCR and MGCG solvers, the increase in the number of iterations was limited. Additionally, the number of iterations was smaller with the MGCR and CR solvers than with the MGCG and CG solvers, respectively. The objective function in the MGCG and CG solvers is unintendedly weighted by A−1 as {r, A−1r}, and this possibly affected the convergence.

Fig. 3 Number of iterations in the scaled cases (Table 2)

The difficulty in solving the scaled cases in Table 2 stems not only from the problem size but also from the large diffusion numbers. Although the Courant numbers in the scaled cases are the same as in the base case, the diffusion numbers dλ and dμ are larger than in the base case. For comparison, the small cases shown in Table 3 were studied, whose problem size was the same as that of the base case but whose diffusion numbers were varied to match each scaled case in Table 2. In Table 3, only the conditions that differ from the base case (Table 1) are presented. By setting the gravity g and the viscosities μ and λ to 2, 4, 8, …, 64 times those of the base case, only the diffusion numbers dλ and dμ were enlarged, while almost the same flow as in the base case was maintained. The number of solver iterations at t = 0.2 s for the small cases (Table 3) is presented in Fig. 4. The number of iterations is smaller with the MGCR and MGCG solvers than with the CR and CG solvers, respectively. This implies that the multigrid preconditioning also suppresses the increase in the number of iterations associated with large diffusion numbers. The scaled cases (Table 2) and the small cases with the same diffusion numbers (Table 3) are compared in Fig. 5. When the problem size is large, the contribution of the multigrid preconditioner is large, and the number of iterations in the scaled cases is kept smaller than that in the corresponding small cases. In addition, it is confirmed that the MGCR solver shows better scalability than the MGCG solver. Compared with previous studies [38, 42, 43], the numbers of iterations in this study (Figs. 3 and 4) are relatively large. This is because the linear matrix equation in this study (Eq. (16)) is difficult to solve due to the lack of diagonal dominance, the large number of nonzero elements and the large condition number (Table 2).

Table 3 Calculation conditions for the small cases with various diffusion numbers
Fig. 4 Number of iterations in the small cases with various diffusion numbers (Table 3)

Fig. 5 Comparison between the scaled cases (Table 2) and small cases (Table 3)

4.2 Single CPU calculation

The calculation times for a single-thread CPU (Xeon Gold 6252 (24 cores)) computation are shown in Fig. 6, where the results for Cases x1–x1/8 in Table 2 are presented. Hereinafter, the calculation times are shown divided by the number of particles and the number of time steps for comparison, and such calculation times are referred to as “unit calculation times”. The unit calculation times with the CR and CG methods become larger as the problem size becomes larger. On the other hand, with the MGCR and MGCG methods, the unit calculation time is almost constant even for large problems. This implies that the computational time is proportional to the problem size and that the multigrid methods are scalable. The breakdowns of the unit calculation times with the MGCR and CR solvers are shown in Fig. 7. For each calculation, the whole computational time is labeled “total”. For the MGCR solver, the time spent on preconditioning is labeled “solver preconditioning”, and the remaining time spent by the solver, which is mostly for the main iteration, is labeled “solver main”. For the CR solver, the time spent by the solver is labeled “solver” because there is no preconditioning. Most of the calculation time was spent by the solvers. In the single-CPU calculation, the preconditioning time was not the dominant part of the total time spent by the MGCR solver. In the preconditioning stage, the most computationally expensive matrix–vector products at the original particle level are avoided, as in Eq. (43), and the V-cycle only includes the product calculations at the coarser levels. Therefore, the amount of computation required for the preconditioning is basically smaller than that for the main iteration.

Fig. 6 Calculation time of a single-thread computation with a CPU

Fig. 7 Breakdown of the calculation time of a single-thread computation with a CPU

4.3 Parallel CPU and GPU calculations

The calculation times of the OpenACC parallel computation on the GPU (A100 (80 GB)) are shown in Fig. 8, where the results for Cases x1–x1/64 in Table 2 are presented. In the relatively small cases with 400–6400 particles (Cases x1–x1/4), the CR and CG methods were faster than the MGCR and MGCG methods, but in the larger cases with over 25,600 particles (Cases x1/16 and finer), the multigrid methods were faster. The breakdowns of the calculation times with the MGCR and CR methods are shown in Fig. 9, where the legends are the same as in Fig. 7. With both methods, the time spent by the solvers dominated the total calculation time. With the CR solver, the unit calculation time (CR (total)) first decreased in the range of 400–102,400 particles (Cases x1–x1/16) and increased again in the range over 102,400 particles (Cases x1/16 and finer). When the problem size was large, the number of iterations dominated the computation time, and the increasing trend in the large cases reflects the large number of iterations in the CR solver. When the problem size was small, the overhead cost of parallelization dominated the computation time, and the straight decreasing trends in the small cases were due to this overhead cost; since the overhead cost can be assumed to be almost constant, the unit calculation time is nearly inversely proportional to the problem size when the overhead cost is dominant. With the MGCR solver, the unit calculation time (MGCR (total)) decreased linearly in the range of 400–102,400 particles (Cases x1–x1/16), and the decrease slowed down in the range over 102,400 particles (Cases x1/16 and finer). This indicates that the overhead cost was dominant with 400–102,400 particles (Cases x1–x1/16). According to the breakdown of the total computational time with the MGCR solver, the preconditioning time was larger with 400–102,400 particles (Cases x1–x1/16), and the main iteration time was larger with more than 102,400 particles (Cases x1/32 and finer). The unit calculation time of the main iteration (MGCR (solver main)) decreased linearly in the range of 400–6400 particles (Cases x1–x1/4) and, after a transition range of 6400–102,400 particles (Cases x1/4–x1/16), became almost constant in the range over 102,400 particles (Cases x1/16 and finer). On the other hand, the unit calculation time of the preconditioning (MGCR (solver preconditioning)) decreased linearly over the whole range, showing that it was dominated by the overhead cost in all the presented cases (Cases x1–x1/64). In addition, the unit calculation time of the MGCR preconditioning was larger than that of the MGCR main iteration in the small cases, where the overhead cost is thought to be dominant. This implies that the overhead cost of the preconditioning was larger than that of the main iteration. This large overhead cost is the main reason why the MGCR method showed lower performance than the CR method in the small cases. However, the scalability of the MGCR solver is expected when extrapolating the trends in Fig. 9 toward larger cases with more than 1,638,400 particles (beyond Case x1/64). It was previously confirmed via the single-CPU cases that the preconditioning stage does not require as much computation as the main iteration. Therefore, the unit calculation time of the MGCR preconditioning stage will decrease further in larger cases, where the parallel efficiency is expected to improve. In addition, the unit calculation time of the MGCR main iteration is already almost constant with over 102,400 particles (Cases x1/16 and finer). Therefore, the MGCR solver is expected to be scalable in larger cases, where the MGCR main iteration time will dominate the total computational time.

Fig. 8 Calculation time of a parallel computation with OpenACC and a GPU

Fig. 9 Breakdown of the calculation time of a parallel computation with OpenACC and a GPU

A larger overhead cost is incurred when the number of parallel threads is larger. Since the parallel computation on the GPU (A100 (80 GB)) is highly parallelized, the overhead cost was thought to be relatively large. To confirm the dependency on the number of parallel threads, the cases in Table 2 were studied with OpenMP parallel computations on a CPU (Xeon Gold 6252 (24 cores)). The unit calculation times obtained with the MGCR solver are shown in Fig. 10. The overhead cost was dominant in the range where the straight decreasing trends were found. With a larger number of CPU threads, the overhead cost was larger, and it dominated the total computational time over a wider range. In addition, the unit calculation time was mostly constant in the sufficiently large cases where the overhead cost was not dominant. This implies that the computational time is proportional to the problem size and that the numerical method is scalable when the problem size is sufficiently large. For comparison, the unit calculation time with the CR solver is shown in Fig. 11. A straight decreasing trend was also observed in Fig. 11, but the overhead cost and its dominant range were smaller than those in Fig. 10. This is because the CR solver does not include the preconditioning, for which a large overhead cost is needed. In contrast to Fig. 10, the unit calculation time in Fig. 11 increased in the sufficiently large cases where the overhead cost was not dominant. This is because the number of iterations in the CR solver increases with the problem size.

Fig. 10 Calculation time of a parallel computation with OpenMP and a CPU using the MGCR solver

Fig. 11 Calculation time of a parallel computation with OpenMP and a CPU using the CR solver

Overall, in parallel computation, the multigrid solvers (MGCR and MGCG) did not perform well in the small cases due to the large overhead cost with respect to the preconditioning stage, but they were efficient in the large cases where the number of iterations mainly determines the total computational time.

4.4 Three-dimensional calculations

The trends observed in the above sections were almost the same in the 3D calculations. Here, a simple 3D high-viscosity incompressible dam break problem (Fig. 12) was taken as an example. Based on the conditions in Table 1, the calculation conditions for the 3D scaled cases are given in Table 4. The calculations were conducted on a GPU (A100 (80 GB)) with OpenACC, applying the MGCR and CR solvers. The number of solver iterations at t = 0.2 s is shown in Fig. 13. While the CR method suffered from a large number of iterations in the large cases, the MGCR method could suppress the number of iterations even there. The breakdowns of the unit calculation times are shown in Fig. 14. Since the parallel efficiency of the preconditioning was not good, the MGCR method was slower than the CR method in the small cases. In contrast, the MGCR method was faster in the large cases, where the number of iterations mainly determines the total computational time. This indicates that the multigrid technique is useful for both 2D and 3D parallel calculations when the problem size is sufficiently large.

Fig. 12 Dam-break calculation of a highly viscous incompressible flow in 3D

Table 4 Calculation conditions for the scaled cases in 3D
Fig. 13 Number of iterations in the 3D scaled cases (Table 4)

Fig. 14 Calculation time in the 3D scaled cases (Table 4) computed with OpenACC and a GPU

5 Conclusion

In this study, a scalable MPH-I method was developed. A derivation of the SPD matrix equation through pressure substitution [21] was presented, and it was additionally shown, by deriving it from the analytical mechanical equation [23], that the SPD feature generally appears in physically consistent systems. To solve the SPD matrix equation, a bucket-based multigrid preconditioner was constructed such that it satisfies the condition for application to the CG and CR solvers. Moreover, to handle the complexity due to multidimensionality, the extended Jacobi iteration, which is applicable also to nondiagonally dominant matrix equations, was proposed. In the benchmark calculations, simple high-viscosity incompressible dam break problems were calculated with the MGCR, MGCG, CR and CG solvers in both 2D and 3D. The number of iterations could be suppressed by the multigrid solvers, and it was smaller with the CR-type solvers than with the CG-type solvers regardless of the preconditioning. Consequently, the MGCR solver showed the best performance of the four, and its number of iterations hardly depended on the problem size. In fact, the computational time of the single-CPU calculation was almost proportional to the number of particles and the number of time steps, demonstrating scalability within the tested cases. The performance of the multigrid solvers was also tested in parallel computations on a CPU and a GPU. For small problems, the MGCR and MGCG solvers were slower than the CR and CG solvers because of the large overhead cost of the preconditioning process. However, they showed better performance than the conventional solvers for large problems, where the number of solver iterations mainly determines the calculation time.