Abstract
The use of sequential time integration schemes is increasingly becoming the bottleneck within large-scale computations due to the stagnation of processor clock speeds. In this study, we combine the parallel-in-time Multigrid Reduction in Time method with a p-multigrid method to obtain a scalable solver specifically designed for Isogeometric Analysis. Numerical results obtained for two- and three-dimensional benchmark problems show the overall scalability of the proposed method on modern computer architectures and a significant improvement in terms of CPU timings compared to the use of standard spatial solvers.
Article Highlights

The use of a p-multigrid method significantly reduces the CPU timings for higher values of the spline degree.

The Multigrid Reduction in Time method shows both strong and weak scalability up to 2048 cores.

Iteration numbers are independent of the number of time steps, mesh width and spline degree.
1 Introduction
Since its introduction in [1], Isogeometric Analysis (IgA) has become more and more a viable alternative to the Finite Element Method (FEM). Within IgA, the same building blocks (i.e., B-splines and NURBS) as in Computer Aided Design (CAD) are adopted, which closes the gap between CAD and FEM. In particular, the use of high-order splines results in a highly accurate representation of (curved) geometries and has been shown to be advantageous in many applications, like structural mechanics [2], solid and fluid dynamics [3] and shape optimization [4]. Finally, the accuracy per degree of freedom (DOF) is significantly higher with IgA compared to FEM [5].
For time-dependent partial differential equations (PDEs), IgA is typically combined with a traditional time integration scheme within the method of lines. Here, the spatial variables are discretized by adopting IgA, after which the resulting system of ordinary differential equations (ODEs) is integrated in time. However, as with all traditional time integration schemes, the latter part increasingly becomes the bottleneck in numerical simulations: when the spatial resolution is increased to improve accuracy, a smaller time step size has to be chosen to ensure stability of the overall method. As clock speeds are no longer increasing while core counts continue to grow, the parallelizability of the entire computation becomes ever more important for obtaining an overall efficient method. Since traditional time integration schemes are sequential by nature, new parallel-in-time methods are needed to resolve this problem.
The Multigrid Reduction in Time (MGRIT) method [6] is a parallel-in-time algorithm based on multigrid reduction (MGR) techniques [7]. In contrast to space-time methods, in which time is considered as an extra spatial dimension, sequential time stepping is still necessary within MGRIT. Space-time methods have also been combined with IgA in the literature [8]. Although very successful, a drawback of such methods is that they are more intrusive on existing codes, while MGRIT only requires a routine to integrate the fully discrete problem from one time instance to the next. Over the years, MGRIT has been studied in detail (see [9]) and applied to a variety of problems, including those arising in optimization [10] and power networks [11].
Recently, the authors applied MGRIT in the context of IgA for the first time in the literature [12]. There, MGRIT showed convergence for a variety of two-dimensional benchmark problems independent of the mesh width h, the spline degree p of the B-spline basis functions and the number of time steps \(N_t\). However, as a standard (diagonally preconditioned) Conjugate Gradient method was adopted for the spatial solves within MGRIT, a significant dependency of the CPU timings on the spline degree was visible. Furthermore, the parallel performance of MGRIT was investigated for a very limited number of cores only.
In this paper, we extend the research direction set out in [12] by combining MGRIT with a state-of-the-art p-multigrid method [13] to solve the linear systems arising within MGRIT. CPU timings show that the use of such a solver significantly improves the overall performance of MGRIT, in particular for higher values of p. Furthermore, the parallel performance of the resulting MGRIT method (i.e., strong and weak scalability) is investigated on modern computer architectures, showing significant (and close to ideal) speedups up to 2048 cores.
This paper is structured as follows: In Sect. 2, a two-dimensional model problem and its spatial and temporal discretization are considered. The MGRIT algorithm is then described in Sect. 3. In Sect. 4, the adopted p-multigrid method and its components are presented in more detail. In Sect. 5, numerical results obtained for the considered model problem are analyzed for different values of the mesh width, spline degree and number of time steps and compared to those obtained in [12]. In Sect. 6, weak and strong scaling studies of MGRIT when adopting a p-multigrid method are performed. Finally, conclusions are drawn in Sect. 7.
2 Model problem and discretization
As a model problem, we consider the transient diffusion equation:
Here, \(\Omega \subset {\mathbb {R}}^d\) denotes a simply connected, Lipschitz domain in d dimensions and \(f \in L^2(\Omega )\) a source term. The above equation is complemented by initial conditions and homogeneous Dirichlet boundary conditions:
First, we discretize Eq. (1) by dividing the time interval [0, T] into \(N_t\) subintervals of size \(\Delta t\) and applying the \(\theta\)-scheme to the temporal derivative, which leads to the following equation to be solved at every time step:
for \(\mathbf {x} \in \Omega\) and \(k = 0,\ldots ,N_t\). Depending on the choice of \(\theta\), this scheme leads to the backward Euler (\(\theta =1\)), forward Euler (\(\theta =0\)) or second-order accurate Crank–Nicolson (\(\theta =0.5\)) method. By rearranging the terms, the discretized equation can be written as follows:
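For illustration, one \(\theta\)-scheme step applied to the semi-discrete system \(\mathbf {M}\dot{\mathbf {u}} + \mathbf {K}\mathbf {u} = \mathbf {f}\) can be sketched as below. This is a minimal sketch: the matrices used here are simple 1D finite-difference stand-ins, not the IgA mass and stiffness matrices considered in the paper.

```python
import numpy as np

def theta_step(M, K, f, u_k, dt, theta):
    """One step of the theta-scheme: backward Euler (theta=1),
    forward Euler (theta=0), Crank-Nicolson (theta=0.5).

    Solves (M + theta*dt*K) u^{k+1} = (M - (1-theta)*dt*K) u^k + dt*f.
    """
    lhs = M + theta * dt * K
    rhs = (M - (1.0 - theta) * dt * K) @ u_k + dt * f
    return np.linalg.solve(lhs, rhs)

# Toy 1D finite-difference stand-ins for the mass and stiffness matrices
n = 50
K = ((np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
      - np.diag(np.ones(n - 1), -1)) * (n + 1) ** 2)
M = np.eye(n)
f = np.ones(n)

u = np.zeros(n)                 # zero initial condition
for _ in range(100):
    u = theta_step(M, K, f, u, dt=1e-3, theta=1.0)  # backward Euler
```

Each step requires one solve with the operator \(\mathbf {M} + \theta \Delta t \mathbf {K}\); within MGRIT, these are exactly the spatial solves discussed in Sect. 4.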
To obtain the variational formulation, let \({\mathcal {V}} =H^1_0(\Omega )\) be the space of functions in the Sobolev space \(H^1(\Omega )\) that vanish on the boundary \(\partial \Omega\). Equation (5) is multiplied with a test function \(v \in {\mathcal {V}}\) and the result is then integrated over the domain \(\Omega\):
where we write \(u(\mathbf {x})^{k+1} = u^{k+1}\) throughout the remainder of this section to improve readability. Applying integration by parts on the second term on both sides of the equation results in
where the boundary integral vanishes since \(v = 0\) on \(\partial \Omega\). To parameterize the physical domain \(\Omega\), a geometry function \(\mathbf {F}\) is then defined, describing an invertible mapping to connect the parameter domain \(\Omega _0 =(0,1)^d\) with the physical domain \(\Omega\):
Provided that the physical domain \(\Omega\) is topologically equivalent to the unit square, the geometry can be described by a single geometry function \(\mathbf {F}\). In case of more complex geometries, a family of functions \(\mathbf {F}^{(m)}\) (\(m = 1,\ldots , K\)) is defined and we refer to \(\Omega\) as a multipatch geometry consisting of K patches. For a more detailed description of the spatial discretization in IgA and multipatch constructions, the authors refer to chapter 2 of [1].
At each time step, we express u in Eq. (7) by a linear combination of multivariate B-spline basis functions of order p. Multivariate B-spline basis functions are defined as the tensor product of univariate B-spline basis functions \(\phi _{i,p}\) \((i=1,\ldots ,N)\) which are uniquely defined on the parameter domain (0, 1) by an underlying knot vector \(\Xi = \{ \xi _1, \xi _2, \ldots , \xi _{N+p}, \xi _{N+p+1} \}\). Here, N denotes the number of B-spline basis functions and p the spline degree. Based on this knot vector, the basis functions are defined recursively by the Cox-de Boor formula [14], starting from the constant ones:
Higher-order B-spline basis functions of order \(p>0\) are then defined recursively:
The resulting B-spline basis functions \(\phi _{i,p}\) are nonzero on the interval \([\xi _i,\xi _{i+p+1})\) and possess the partition of unity property. Furthermore, the basis functions are \(C^{p-m_i}\)-continuous, where \(m_i\) denotes the multiplicity of knot \(\xi _i\). Throughout this paper, we consider a uniform knot vector with knot span size h, where the first and last knot are repeated \(p+1\) times. As a consequence, the resulting B-spline basis functions are \(C^{p-1}\) continuous and interpolatory at both end points. Figure 1 illustrates both linear (left) and quadratic (right) B-spline basis functions based on such a knot vector.
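The Cox-de Boor recursion can be sketched directly; the snippet below evaluates univariate B-spline basis functions on an open uniform knot vector of the kind described above (a straightforward, unoptimized illustration, not the paper's implementation, which relies on G+Smo):

```python
import numpy as np

def bspline_basis(i, p, xi, knots):
    """Evaluate the i-th B-spline basis function of degree p at xi
    via the Cox-de Boor recursion (0-based indexing)."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    left = 0.0
    if knots[i + p] > knots[i]:
        left = ((xi - knots[i]) / (knots[i + p] - knots[i])
                * bspline_basis(i, p - 1, xi, knots))
    right = 0.0
    if knots[i + p + 1] > knots[i + 1]:
        right = ((knots[i + p + 1] - xi) / (knots[i + p + 1] - knots[i + 1])
                 * bspline_basis(i + 1, p - 1, xi, knots))
    return left + right

# Open uniform knot vector for quadratic (p=2) B-splines on (0,1):
# first and last knots repeated p+1 times, as in the paper.
p = 2
knots = [0, 0, 0, 0.25, 0.5, 0.75, 1, 1, 1]
N = len(knots) - p - 1          # number of basis functions
vals = [bspline_basis(i, p, 0.3, knots) for i in range(N)]
```

At any point of the parameter domain the non-negative basis function values sum to one, reflecting the partition of unity property mentioned above.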
As mentioned previously, the tensor product of univariate Bspline basis functions is adopted for the multidimensional case. Denoting the total number of multivariate Bspline basis functions \(\Phi _{i,p}\) by \(N_{\mathrm{dof}}\), the solution u is thus approximated at each time step as follows:
Here, the spline space \({\mathcal {V}}_{h,p}\) is defined, using the inverse of the geometry mapping \(\mathbf {F}^{-1}\) as pullback operator, as follows:
By setting \(v = \Phi _{j,p}\), Eq. (7) can be written as follows:
where \(\mathbf {M}\) and \(\mathbf {K}\) denote the mass and stiffness matrix, respectively:
3 Multigrid Reduction in Time
A traditional (i.e., sequential) time integration scheme would solve Eq. (13) for \(k=0,\ldots ,N_t\) to obtain the numerical solution at each time instance. In this paper, however, we apply the Multigrid Reduction in Time (MGRIT) method to solve Eq. (13) parallel-in-time. For ease of notation, we set \(\theta =1\) throughout the remainder of this section. Let \(\Psi = \left( \mathbf {M} + \Delta t \mathbf {K} \right) ^{-1}\) denote the inverse of the left-hand side operator. Then, Eq. (13) can be written as follows:
where \(\mathbf {g}^{k+1} = \Psi \Delta t \mathbf {f}\). Setting \(\mathbf {g}^0\) equal to the initial condition \(u^0(\mathbf {x})\) projected on the spline space \({\mathcal {V}}_{h,p}\), the time integration method can be written as a linear system of equations:
A sequential time integration scheme would correspond to a block forward solve of this linear system of equations. In this paper, however, we adopt MGRIT to iteratively solve Eq. (17). First, we introduce the two-level MGRIT method, which shows similarities with the well-known parareal algorithm [15]. In fact, it can be shown that both methods are equivalent for a specific choice of relaxation [16]. Then, the multilevel variant of MGRIT is presented in more detail.
3.1 Two-level MGRIT method
The two-level MGRIT method combines a cheap coarse-level time integration method with a more accurate but expensive fine-level one that can be performed in parallel. That is, the linear system of equations given by Eq. (17) is solved iteratively by introducing a coarse temporal mesh with time step size \(\Delta t_C = m \Delta t_F\). Here, \(\Delta t_F\) coincides with the \(\Delta t\) from the previous sections and m denotes the coarsening factor. Figure 2 illustrates both the fine and the coarse temporal discretization.
The time instances \(T_0, T_1, \ldots , T_{N_t/m}\) are referred to as coarse points (or C-points), while the remaining points are called fine points (or F-points). The description of MGRIT is based on this division of the time instances into coarse and fine points. By applying a numbering strategy that first numbers the F-points and then the C-points, we can write Eq. (17) as follows:
where the matrix \(\mathbf {A}\) can be decomposed as follows:
where \(\mathbf {I}_C\) and \(\mathbf {I}_F\) are identity matrices. The ‘ideal’ restriction and prolongation operator are then defined as follows:
Within MGRIT, the ‘ideal’ prolongation operator P is typically adopted, while the ‘ideal’ restriction operator is replaced by \(\tilde{R} = \begin{bmatrix} \mathbf {0}&\mathbf {I}_C \end{bmatrix}\). The matrix \(\mathbf {A}_{FF}\) is given by:
Note that each solve with \(\mathbf {A}_{\Psi }\) corresponds to a single time step within a coarse interval, which is a completely independent process for each coarse interval and can therefore be performed in parallel. The Schur complement matrix \(\mathbf {S}\) in Eq. (19) is given by:
Instead of solving for \(\mathbf {S}\) directly, MGRIT solves for a modified matrix \(\tilde{\mathbf {S}}\), obtained by replacing the operator \((\Psi \mathbf {M})^m\) by \(\Phi \mathbf {M}\), which corresponds to applying a single time step of a coarse time integrator. As in a true multigrid method, the building blocks of the MGRIT method consist of relaxation (= fine time stepping) and a coarse grid correction (= coarse time stepping). Relaxation involves solving a linear system of the form
where \(\mathbf {u}_F\) and \(\mathbf {u}_C\) denote the solution at all F-points and C-points, respectively. Within relaxation, the solution is updated at the F-points based on the given values at the C-points. This time stepping from a C-point to all neighbouring F-points is also referred to as F-relaxation [6]. On the other hand, time stepping to a C-point from the previous F-point is referred to as C-relaxation. It should be noted that both types of relaxation are highly parallel and can be combined, leading to so-called CF- or FCF-relaxation. Figure 3 illustrates both C- and F-relaxation.
The coarse grid correction involves solving the linear system of equations
which is a sequential procedure by design, but is much cheaper compared to the fine time integration (which can be performed in parallel). Here, the vector \(\mathbf {u}_{C}\) is obtained by applying \(\tilde{R}\) on \(\mathbf {u}\).
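The two-level idea described above can be illustrated on a scalar model problem via the parareal iteration, which (for F-relaxation) is equivalent to two-level MGRIT. The sketch below is an assumption-laden illustration: the propagators are backward Euler steps for the scalar test equation \(u' = -\lambda u\), not the IgA system of the paper, and the "parallel" fine sweep is written as a plain loop.

```python
import numpy as np

# Two-level MGRIT / parareal for u' = -lam*u on [0, T]:
# Nc coarse intervals, m fine steps per coarse interval.
lam, T, Nc, m = 5.0, 1.0, 10, 8
dt_f = T / (Nc * m)          # fine time step (Delta t_F)
dt_c = T / Nc                # coarse time step (Delta t_C = m * Delta t_F)

def phi_fine(u):             # m fine backward Euler steps (parallelizable)
    return u / (1.0 + lam * dt_f) ** m

def phi_coarse(u):           # one coarse backward Euler step (sequential)
    return u / (1.0 + lam * dt_c)

# Initial coarse (sequential) sweep
U = np.zeros(Nc + 1)
U[0] = 1.0
for k in range(Nc):
    U[k + 1] = phi_coarse(U[k])

# Parareal / two-level MGRIT iterations
for it in range(5):
    F = np.array([phi_fine(U[k]) for k in range(Nc)])   # parallel part
    Unew = U.copy()
    for k in range(Nc):      # sequential coarse grid correction
        Unew[k + 1] = phi_coarse(Unew[k]) + F[k] - phi_coarse(U[k])
    U = Unew

# Sequential fine reference solution at the C-points
exact_fine = np.array([1.0 / (1.0 + lam * dt_f) ** (m * k)
                       for k in range(Nc + 1)])
```

After a few iterations the values at the C-points agree with the sequential fine solution to high accuracy, while the expensive fine propagations are the part that can be distributed over processors.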
3.2 Multilevel MGRIT method
The solution procedure described above can be extended to a true multilevel MGRIT method. First, we define a hierarchy of L temporal meshes, where the time step size for the discretization at level \(l \ (l = 0,1,\ldots ,L)\) is given by \(\Delta t_F m^l\). The total number of levels L is related to the coarsening factor m and the total number of time steps \(N_t\) by \(L=\log _m(N_t)\). Let \(\mathbf {A}^{(l)} \mathbf {u}^{(l)} =\mathbf {g}^{(l)}\) denote the linear system of equations based on the considered time step size at level l. The MGRIT method can then be written as follows:

1.
Apply F-relaxation (\(=\) fine time stepping) on \(\mathbf {A}^{(l)} \mathbf {u}^{(l)} = \mathbf {g}^{(l)}\):
$$\begin{aligned} \mathbf {A}^{(l)}_{FF} \mathbf {u}^{(l)}_F = \mathbf {g}^{(l)}_F - \mathbf {A}^{(l)}_{FC} \mathbf {u}^{(l)}_C. \end{aligned}$$ 
2.
Determine the residual at level l and restrict it to level \(l+1\) using the restriction operator \(\tilde{R}\):
$$\begin{aligned} \mathbf {r}^{(l+1)} = \tilde{R} \left( \mathbf {g}^{(l)} - \mathbf {A}^{(l)} \mathbf {u}^{(l)} \right) . \end{aligned}$$ 
3.
Solve Eq. (24) (\(=\) coarse time stepping) to obtain \(\mathbf {u}^{(l+1)}\):
$$\begin{aligned} \tilde{\mathbf {S}} \mathbf {u}^{(l+1)} = \mathbf {r}^{(l+1)}. \end{aligned}$$ 
4.
Prolongate the correction using the ‘ideal’ interpolation operator P and update the solution at level l:
$$\begin{aligned} \mathbf {u}^{(l)} := \mathbf {u}^{(l)} + P \mathbf {u}^{(l+1)}. \end{aligned}$$
Recursive application of this scheme until the coarsest level is reached leads to a so-called V-cycle. However, as with standard multigrid methods, alternative cycle types (i.e., W-cycles, F-cycles) can be defined. At all levels of the multigrid hierarchy, the operators are obtained by rediscretizing Eq. (1) with a different time step size.
4 p-multigrid method
Within the MGRIT algorithm, fine time stepping is performed in parallel within each time interval. Assuming a backward Euler time integration scheme, the following linear system of equations is solved within each time interval at every iteration:
Throughout this section we will omit the time step index k and write the linear system of equations given by Eq. (25) as follows:
In a recent paper by the authors [12], this linear system of equations was solved within MGRIT by means of a (diagonally preconditioned) Conjugate Gradient method. However, as the condition number of the system matrix in IgA increases exponentially with the spline degree p, the use of standard iterative solvers becomes less efficient for higher values of p. As a consequence, alternative solution techniques have been developed in recent years to overcome this dependency [17].
In this paper, we adopt a p-multigrid method [13] specifically designed for discretizations arising in IgA to solve the linear systems within MGRIT. Within the p-multigrid method, a low-order correction is obtained (at level \(p=1\)) to update the solution at the high-order level. Starting from the high-order problem, the following steps are performed [13]:

1.
Apply one pre-smoothing step to the initial guess \(\mathbf {u}_{h,p}^0\):
$$\begin{aligned} \mathbf {u}_{h,p}^0 = \mathbf {u}_{h,p}^0 + {\mathcal {S}}_{h,p} \left( \mathbf {f}_{h,p} - \mathbf {A}_{h,p} \mathbf {u}_{h,p}^0 \right) , \end{aligned}$$(27) where \({\mathcal {S}}_{h,p}\) is a smoothing operator applied to the high-order problem.

2.
Determine the residual at level p and project it onto the space \({\mathcal {V}}_{h,1}\) using the restriction operator \({\mathcal {I}}_{p}^{1}\):
$$\begin{aligned} \mathbf {r}_{h,1} = {\mathcal {I}}_{p}^{1} \left( \mathbf {f}_{h,p} - \mathbf {A}_{h,p} \mathbf {u}_{h,p}^0 \right) . \end{aligned}$$(28) 
3.
Solve the residual equation to determine the coarse grid error:
$$\begin{aligned} \mathbf {A}_{h,1} \mathbf {e}_{h,1} = \mathbf {r}_{h,1}. \end{aligned}$$(29) 
4.
Project the error \(\mathbf {e}_{h,1}\) onto the space \({\mathcal {V}}_{h,p}\) using the prolongation operator \({\mathcal {I}}_{1}^p\) and update \(\mathbf {u}_{h,p}^0\):
$$\begin{aligned} \mathbf {u}_{h,p}^0 := \mathbf {u}_{h,p}^0 + {\mathcal {I}}_{1}^p \left( \mathbf {e}_{h,1} \right) . \end{aligned}$$(30) 
5.
Apply one post-smoothing step of the form (27) on the updated solution to obtain \(\mathbf {u}_{h,p}^1\).
To approximately solve the residual equation given by Eq. (29), a single W-cycle of a standard h-multigrid method [18], using canonical prolongation and weighted restriction, is applied. As the level \(p=1\) corresponds to a low-order Lagrange discretization, an h-multigrid method (using Gauss–Seidel as a smoother) is known to be both efficient and cheap [19]. The resulting p-multigrid method adopted throughout this paper is shown in Fig. 4.
Note that we directly restrict the residual at the high-order level to level \(p=1\). This aggressive p-coarsening strategy has been shown to significantly improve the computational efficiency of the resulting p-multigrid method [20], while maintaining its excellent convergence behavior.
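The structure of steps 1–5 can be sketched as a generic two-grid cycle. The sketch below is purely illustrative: damped Jacobi replaces the ILUT smoother used in the paper, standard h-coarsening of a 1D Laplacian stands in for the p-coarsening from level p to \(p=1\), and a direct solve replaces the h-multigrid W-cycle on the coarse level.

```python
import numpy as np

def laplacian(n):
    """1D Laplacian stencil [-1, 2, -1] as a toy 'high-order' operator."""
    return (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1))

def interpolation(nc):
    """Linear interpolation from nc coarse to 2*nc+1 fine nodes."""
    P = np.zeros((2 * nc + 1, nc))
    for j in range(nc):
        P[2 * j, j], P[2 * j + 1, j], P[2 * j + 2, j] = 0.5, 1.0, 0.5
    return P

def two_grid_cycle(A, Ac, P, R, b, u, omega=2.0 / 3.0, nu=1):
    D = np.diag(A)
    for _ in range(nu):                  # step 1: pre-smoothing (damped Jacobi)
        u = u + omega * (b - A @ u) / D
    r_c = R @ (b - A @ u)                # step 2: restrict the residual
    e_c = np.linalg.solve(Ac, r_c)       # step 3: coarse solve (direct here)
    u = u + P @ e_c                      # step 4: prolongate and correct
    for _ in range(nu):                  # step 5: post-smoothing
        u = u + omega * (b - A @ u) / D
    return u

nf, nc = 31, 15
A = laplacian(nf)
P = interpolation(nc)
R = 0.5 * P.T                            # weighted restriction
Ac = R @ A @ P                           # Galerkin coarse operator (stand-in)
b = np.ones(nf)
u = np.zeros(nf)
for _ in range(10):
    u = two_grid_cycle(A, Ac, P, R, b, u)
res = np.linalg.norm(b - A @ u) / np.linalg.norm(b)
```

The per-cycle residual reduction is independent of the iteration count, which is the behavior the p-multigrid method exploits for the repeated spatial solves within MGRIT.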
Prolongation and restriction operators based on an \(L_2\) projection are adopted to transfer vectors from the high-order level to the low-order level (and vice versa). These transfer operators have been used extensively in the literature [21,22,23] and are given by:
Here, the mass matrix \(\mathbf {M}_p\) and transfer matrix \(\mathbf {P}_{1}^{p}\) are defined as follows:
To prevent the explicit solution of a linear system of equations for each projection step, the consistent mass matrix in both transfer operators is replaced by its lumped counterpart obtained by row-sum lumping. Note that row-sum lumping can be applied within the variational formulation due to the partition of unity and non-negativity of the B-spline basis functions.
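As a small illustration (with a toy 3×3 matrix, not an actual B-spline mass matrix), row-sum lumping replaces the consistent mass matrix by the diagonal matrix of its row sums, so applying its inverse reduces to an entrywise division instead of a linear solve:

```python
import numpy as np

# Toy symmetric 'consistent mass matrix' with non-negative entries
M = np.array([[4.0, 2.0, 0.0],
              [2.0, 8.0, 2.0],
              [0.0, 2.0, 4.0]]) / 12.0

# Row-sum lumping: diagonal matrix of the row sums. For B-splines the
# row sums are the integrals of the (non-negative) basis functions,
# hence strictly positive, so the lumped matrix is invertible.
row_sums = M.sum(axis=1)
M_lumped = np.diag(row_sums)

# Applying the inverse of the lumped mass matrix is one division per entry
v = np.array([1.0, 2.0, 3.0])
x = v / row_sums
```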
Various choices can be made with respect to the smoother at the high-order level. The use of Gauss–Seidel or (damped) Jacobi as a smoother at level p leads to convergence rates of the resulting multigrid method that depend significantly on the spline degree p [24]. Alternative smoothers have been developed in recent years to overcome this shortcoming [25]. In particular, the use of ILUT factorizations [26] (i.e., as a preconditioner within a preconditioned Richardson iteration) has been shown to be very effective in the context of IgA [24] and will therefore be adopted throughout the remainder of this paper. An efficient implementation of ILUT is available in the Eigen library [27]. Once the factorization \(\mathbf {A}_{h,p} \approx \mathbf {L}_{h,p} \mathbf {U}_{h,p}\) is obtained, a single smoothing step is applied as follows:
The ILUT factorization is determined completely by a dropping tolerance \(\tau\) and a fill factor f. Based on previous studies by the authors, we choose \(\tau =10^{-12}\) and \(f=1\), which implies that only a few (very) small values are dropped during the factorization and \(\mathbf {L}_{h,p} \mathbf {U}_{h,p}\) has a similar number of nonzero elements as \(\mathbf {A}_{h,p}\).
5 Numerical results
To assess the quality of MGRIT when applied in combination with a p-multigrid method within Isogeometric Analysis, we consider the time-dependent heat equation in two dimensions given by Eq. (1). Figure 5 shows the resulting solution u at different time instances for \(\Omega = [0,1]^2\). Here, an inhomogeneous Neumann boundary condition is applied at the left boundary. Furthermore, the right-hand side is chosen equal to one and the initial condition is equal to zero.
Based on a spatial discretization with B-spline basis functions of order p and a mesh width h, MGRIT is applied to iteratively solve the resulting equation. Both the number of iterations and the CPU timings needed to reach convergence are investigated using both a (diagonally preconditioned) Conjugate Gradient method and the described p-multigrid method. Furthermore, we investigate the parallel performance of MGRIT on modern computer architectures. The open-source C++ library G+Smo [28] is used to discretize the model problem in space using IgA, while, for the MGRIT algorithm, the parallel-in-time code XBraid, developed at Lawrence Livermore National Laboratory, is adopted [29]. The MGRIT method is said to have reached convergence if the relative residual (in the \(L_2\) norm) at the end of an iteration is smaller than or equal to \(10^{-10}\), unless stated otherwise.
As a starting point, we briefly summarize the results obtained in a previous paper by the authors (see [12]). There, numerical results were obtained for the same model problem using different hierarchies (i.e., a V-cycle, F-cycle and two-level method), time integration schemes (i.e., backward Euler, forward Euler and Crank–Nicolson) and domains of interest (see Fig. 6).
In general, it was observed that MGRIT converged in a low number (i.e., 5–10) of iterations, although the number of iterations was slightly higher when V-cycles were adopted instead of F-cycles or a two-level method. Furthermore, the number of iterations was independent of the mesh width h, the spline degree p of the B-spline basis functions and the number of time steps \(N_t\) for all considered hierarchies and domains of interest. As expected from sequential time stepping methods, the use of the implicit backward Euler scheme within MGRIT led to the most stable time integration method. Finally, CPU timings were obtained for a limited number of processors, showing a strong dependency on the spline degree p when the Conjugate Gradient method was applied as a spatial solver within MGRIT.
In this section, we investigate the effect of using a p-multigrid method for the spatial solves compared to the use of a Conjugate Gradient method. Furthermore, we present numerical results for a three-dimensional geometry (i.e., the unit cube). Finally, we investigate the weak and strong scaling of MGRIT on modern architectures when applied in the context of IgA. As this research focuses on the spatial solver and scalability, we restrict ourselves to the backward Euler method and the use of V-cycles within MGRIT.
5.1 Iteration numbers
As a first step, we compare the number of MGRIT iterations needed to reach convergence when a p-multigrid method or a (diagonally preconditioned) Conjugate Gradient method is adopted, while keeping all other parameters the same. Table 1 shows the results when the mesh width is kept constant (\(h=2^{-6}\)) for the unit square and a quarter annulus when adopting V-cycles with a p-multigrid (top) and CG method (bottom), respectively. For both benchmarks and all configurations, the number of iterations needed with MGRIT to reach convergence is independent of the number of time steps \(N_t\) and the spline degree p. Furthermore, the number of MGRIT iterations is identical when adopting a p-multigrid method compared to the use of a Conjugate Gradient method.
Table 2 shows the results for different values of the mesh width h when the number of time steps is kept constant (\(N_t=100\)) for both benchmarks when adopting V-cycles. The number of MGRIT iterations is independent of the mesh width h and the spline degree p. Again, the number of MGRIT iterations is identical when adopting a p-multigrid method compared to the use of a Conjugate Gradient method.
Results when adopting the p-multigrid method have been obtained for a three-dimensional benchmark problem as well. Table 3 shows the number of MGRIT iterations for different values of \(N_t\), p and h when the unit cube is considered as geometry. In general, the number of iterations needed to reach convergence is independent of the number of time steps, spline degree and mesh width. Furthermore, the number of iterations is comparable to the ones obtained for the two-dimensional benchmark problems.
Finally, we investigate the influence of the time integration scheme on the number of MGRIT iterations. Table 4 shows the number of MGRIT iterations needed to reach convergence for the forward Euler (\(\theta =0\)) and Crank–Nicolson (\(\theta =0.5\)) method. Results can be compared to the ones obtained with the backward Euler method (see Table 2). For many configurations, MGRIT using forward Euler does not converge (which is related to the CFL condition), while the Crank–Nicolson method converges for all configurations. A small dependency on h and p is, however, visible. Based on these results, the backward Euler method is adopted throughout the remainder of this paper. For a more detailed analysis of different time integration schemes within MGRIT, the authors refer to [12].
Although the number of MGRIT iterations is identical for all configurations when adopting a p-multigrid or a Conjugate Gradient method for solving the linear systems of equations, the CPU timings are expected to differ significantly. Therefore, the focus lies on CPU timings throughout the remainder of this section.
5.2 CPU timings
CPU timings have been obtained when a p-multigrid method or a Conjugate Gradient method is adopted for the spatial solves within MGRIT. As in the previous section, we adopt V-cycles, a mesh width of \(h=2^{-6}\) and the unit square as our domain of interest. Note that the corresponding iteration numbers can be found in Table 1. The computations are performed on three compute nodes, each consisting of an Intel(R) i7-10700 (@ 2.90GHz) Comet Lake processor with 8 hardware cores (hyperthreading turned on) and 128GB DDR4 main memory organized in 4 modules of 32GB each.
Figure 7 shows the CPU time needed to reach convergence for a varying number of cores, a different number of time steps and different values of p. When the Conjugate Gradient method is adopted for the spatial solves, doubling the number of time steps leads to an increase of the CPU time by a factor of two. Furthermore, it can be observed that the CPU timings increase significantly for higher values of p, which is related to the spatial solves required at every time step. As standard iterative solvers (like the Conjugate Gradient method) have a deteriorating performance for increasing values of p, more iterations are required to reach convergence for each spatial solve, resulting in higher computational costs of the MGRIT method. When focusing on the number of cores, it can be seen that doubling the number of cores significantly reduces the CPU time needed to reach convergence. More precisely, a reduction of 45–50% can be observed when doubling the number of cores from 3 to 6, implying that the MGRIT algorithm is highly parallelizable.
As with the Conjugate Gradient method, doubling the number of time steps leads to an increase of the CPU time by a factor of two when a p-multigrid method is adopted. For \(p=2\), the use of a p-multigrid method leads to higher CPU timings compared to the use of the Conjugate Gradient method for all values of \(N_t\). However, the dependency of the CPU timings on the spline degree is significantly mitigated, which leads to a substantial decrease of the CPU timings compared to the use of the Conjugate Gradient method when higher values of p are considered. For example, for \(N_t=2000\) and \(p=5\), a speedup of more than a factor of 10 is achieved.
Again, increasing the number of cores from 3 to 6 reduces the CPU time needed to reach convergence by 45–50%. These results show that MGRIT combined with a p-multigrid method leads to an overall more efficient method. Therefore, a larger computer cluster is considered in the next section to further investigate the scalability of MGRIT (i.e., weak and strong scalability) when combined with a p-multigrid method within IgA.
6 Scalability
In the previous sections, we applied MGRIT adopting a relatively low number of cores. There, it was shown that the use of a p-multigrid method significantly reduces the dependency of the CPU timings on the spline degree. In this section, we investigate the scalability of MGRIT (combined with a p-multigrid method) on a modern architecture. More precisely, we investigate both strong and weak scalability on the Lisa system, one of the national compute clusters of the Netherlands.
6.1 Strong scalability
First, we fix the total problem size and increase the number of cores (i.e., strong scalability). That is, we consider the same benchmark problem as in the previous sections, but with a mesh width of \(h=2^{-6}\) and a number of time steps \(N_t\) of 10,000. As before, backward Euler is applied for the time integration and V-cycles are adopted as MGRIT hierarchy. Figure 8 shows the CPU timings needed to reach convergence for a varying number of Intel Xeon Gold 6130 (@ 2.10GHz) processors, where each processor consists of 16 cores. For all values of p, increasing the number of cores leads to significant speedups, which illustrates the parallelizability of the MGRIT method up to 2048 cores. To compare the results with a sequential time integration method, results with a backward Euler method have been added as well. Here, the CPU timings are independent of the number of processors and shown in the rightmost column for each value of p ('sequential'). Clearly, MGRIT outperforms the sequential algorithm when the number of cores is greater than or equal to 128. This behavior has been observed in the literature as well for a finite difference discretization of a similar model problem, see [6].
Figure 9 shows the obtained speedups as a function of the number of cores for different values of p, based on the results presented in Fig. 8. As a comparison, the ideal speedup has been added, assuming perfect parallelizability of the MGRIT method. Note that, for all values of p, the observed speedup deviates slightly from the ideal one when the number of cores exceeds 256. Nevertheless, the obtained speedups remain high, even when the number of cores is further increased to 2048, and are independent of the spline degree p.
Strong scalability has been investigated for the three-dimensional benchmark problem as well. Figure 10 shows the strong scalability of MGRIT on the unit cube. In general, the obtained results are comparable to the ones obtained in two dimensions, showing significant speedups when increasing the number of cores for all values of p. Results obtained with a sequential time integration method have been added as well, showing timings comparable to MGRIT when adopting 512 cores. It should be noted that, compared to the two-dimensional benchmark problem, the CPU timings grow significantly faster for increasing values of p. This is well known in Isogeometric Analysis [30] and is related to the relatively high number of nonzero entries in three dimensions for higher values of p. The use of the (preconditioned) Conjugate Gradient method would lead to an even higher growth of the CPU timings, as the number of iterations needed to reach convergence for every spatial solve increases excessively in three dimensions when adopting a standard solver.
Figure 11 shows the obtained speedups for different values of p based on the results presented in Fig. 10. The obtained speedups are similar to the ones obtained for the twodimensional problem but vary slightly more for different values of p. In general, the observed speedups remain high, even when the number of cores is increased to 2048.
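The speedups and efficiencies reported above follow the usual strong-scaling definitions. As a minimal sketch, the computation can be expressed as follows; the timing values are hypothetical placeholders, not the measured data from Figs. 8–11:

```python
# Strong scaling: fixed total problem size, increasing core count.
# Speedup is measured relative to the smallest-core-count run; parallel
# efficiency divides the achieved speedup by the ideal (linear) speedup.

def speedup(t_ref, t_p):
    """Speedup of a run with time t_p relative to the reference run t_ref."""
    return t_ref / t_p

def efficiency(t_ref, cores_ref, t_p, cores_p):
    """Parallel efficiency: achieved speedup over ideal speedup."""
    ideal = cores_p / cores_ref
    return speedup(t_ref, t_p) / ideal

# Hypothetical CPU timings (seconds); NOT the values measured in the paper.
cores = [64, 128, 256, 512, 1024, 2048]
timings = [800.0, 420.0, 230.0, 130.0, 80.0, 55.0]

for c, t in zip(cores, timings):
    s = speedup(timings[0], t)
    e = efficiency(timings[0], cores[0], t, c)
    print(f"{c:5d} cores: speedup {s:6.2f}, efficiency {e:4.2f}")
```

With such data, an efficiency close to 1.0 corresponds to the ideal line in Figs. 9 and 11, and the gradual drop beyond 256 cores shows up as efficiencies below one.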
6.2 Weak scalability
As a next step, we consider the unit square as our domain of interest but keep the problem size per processor fixed (i.e., weak scalability). The number of time steps equals 1000 in case of 64 cores and is adjusted proportionally to the number of cores. Figure 12 shows the CPU time needed to reach convergence for different numbers of cores and different values of p. Clearly, the CPU timings remain (more or less) constant when the number of cores is increased, showing the weak scalability of the MGRIT method. Although the CPU timings slightly increase for higher values of p, the strong p-dependency observed with the Conjugate Gradient method is clearly mitigated.
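The weak-scaling setup (fixed work per core, with the number of time steps scaled with the core count) can be sketched as follows; the base values of 64 cores and \(N_t=1000\) are taken from the text, everything else is illustrative:

```python
# Weak scaling: the work per core is kept fixed, so the number of time
# steps N_t grows linearly with the core count. The base configuration
# (64 cores, N_t = 1000) matches the experiment described in the text.

def time_steps(cores, base_cores=64, base_steps=1000):
    """Number of time steps for a given core count at fixed work per core."""
    return base_steps * cores // base_cores

def weak_efficiency(t_base, t_p):
    """Weak-scaling efficiency: ideally CPU time stays constant, so this is 1."""
    return t_base / t_p

for c in [64, 128, 256, 512, 1024, 2048]:
    print(f"{c:5d} cores -> N_t = {time_steps(c)}")
```

Under this scaling rule, a flat CPU-time curve as in Fig. 12 corresponds to a weak-scaling efficiency near one.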
7 Conclusions
In this paper, we combined MGRIT with a p-multigrid method for discretizations arising in Isogeometric Analysis. Numerical results obtained for a variety of benchmark problems show that the use of a p-multigrid method for all spatial solves within MGRIT results in convergence rates independent of the mesh width h, spline degree p and number of time steps \(N_t\). Furthermore, CPU timings depend only mildly on the spline degree p in two dimensions. This is in sharp contrast to standard solvers (e.g., a Conjugate Gradient method), which show a deteriorating performance (in terms of CPU timings) for higher values of p already in two dimensions. Moreover, the obtained CPU timings when adopting a p-multigrid method are significantly lower for almost all considered configurations. On modern computer architectures, both strong and weak scalability of the resulting MGRIT method have been investigated, showing good scalability up to 2048 cores and illustrating the potential of MGRIT (combined with a p-multigrid method) for time-dependent simulations in IgA.
Within this paper, we restricted ourselves to first- and second-order accurate time integration schemes. As the use of high-order B-spline basis functions significantly reduces the spatial discretization error, the use of alternative (and in particular higher-order) time integration schemes is of interest and will be investigated in future work. Furthermore, we will focus on the application of MGRIT to more challenging benchmark problems, in particular those where IgA has proven to be a viable alternative to FEM.
References
Hughes TJR, Cottrell JA, Bazilevs Y (2005) Isogeometric analysis: CAD, finite elements, NURBS, exact geometry and mesh refinement. Comput Methods Appl Mech Eng 194:4135–4195. https://doi.org/10.1016/j.cma.2004.10.008
Cottrell J, Reali A, Bazilevs Y, Hughes T (2006) Isogeometric analysis of structural vibrations. Comput Methods Appl Mech Eng 195(41–43):5257–5296. https://doi.org/10.1016/j.cma.2005.09.027
Bazilevs Y, Calo VM, Zhang Y, Hughes TJR (2006) Isogeometric fluid–structure interaction analysis with applications to arterial blood flow. Comput Mech 38(4–5):310–322. https://doi.org/10.1007/s00466-006-0084-3
Wall WA, Frenzel MA, Cyron C (2008) Isogeometric structural shape optimization. Comput Methods Appl Mech Eng 197(33–40):2976–2988. https://doi.org/10.1016/j.cma.2008.01.025
Hughes TJR, Reali A, Sangalli G (2008) Duality and unified analysis of discrete approximations in structural dynamics and wave propagation: Comparison of \(p\)-method finite elements with \(k\)-method NURBS. Comput Methods Appl Mech Eng 197:4104–4124. https://doi.org/10.1016/j.cma.2008.04.006
Falgout RD, Friedhoff S, Kolev TV, MacLachlan SP, Schroder JB (2014) Parallel time integration with multigrid. SIAM J Sci Comput 36(6):635–661. https://doi.org/10.1137/130944230
Ries M, Trottenberg U, Winter G (1983) A note on MGR methods. Linear Algebra Appl 49:1–26. https://doi.org/10.1016/0024-3795(83)90091-5
Langer U, Moore SE, Neumüller M (2016) Spacetime isogeometric analysis of parabolic evolution problems. Comput Methods Appl Mech Eng 306:342–363. https://doi.org/10.1016/j.cma.2016.03.042
Dobrev VA, Kolev T, Petersson NA, Schroder JB (2017) Two-level convergence theory for multigrid reduction in time (MGRIT). SIAM J Sci Comput 39(5):501–527. https://doi.org/10.1137/16m1074096
Günther S, Gauger NR, Schroder JB (2018) A non-intrusive parallel-in-time approach for simultaneous optimization with unsteady PDEs. Optim Methods Softw 34(6):1306–1321. https://doi.org/10.1080/10556788.2018.1504050
Lecouvez M, Falgout RD, Woodward CS, Top P (2016) A parallel multigrid reduction in time method for power systems. In: 2016 IEEE power and energy society general meeting (PESGM), pp 1–5. https://doi.org/10.1109/PESGM.2016.7741520
Tielen R, Möller M, Vuik C (2021) Multigrid reduced in time for isogeometric analysis. In: VI Eccomas Young Investigators Conference
Tielen R, Möller M, Göddeke D, Vuik C (2020) \(p\)-multigrid methods and their comparison to \(h\)-multigrid methods within isogeometric analysis. Comput Methods Appl Mech Eng. https://doi.org/10.1016/j.cma.2020.113347
De Boor C (1978) A practical guide to splines. Springer, New York
Lions JL, Maday Y, Turinici G (2001) Résolution d'EDP par un schéma en temps «pararéel» [A "parareal" in time discretization of PDEs]. C R Acad Sci Paris Sér I Math 332(7):661–668. https://doi.org/10.1016/S0764-4442(00)01793-6
Gander MJ, Vandewalle S (2007) Analysis of the parareal time-parallel time-integration method. SIAM J Sci Comput 29(2):556–578. https://doi.org/10.1137/05064607x
Donatelli M, Garoni C, Manni C, Serra-Capizzano S, Speleers H (2017) Symbol-based multigrid methods for Galerkin B-spline isogeometric analysis. SIAM J Numer Anal 55:31–62. https://doi.org/10.1137/140988590
Hackbusch W (1985) Multigrid methods and applications. Springer, Berlin. https://doi.org/10.1007/978-3-662-02427-0
Trottenberg U, Oosterlee C, Schüller A (2001) Multigrid. Academic Press, London
Tielen R, Möller M, Vuik K (2021) A direct projection to low-order level for \(p\)-multigrid methods in isogeometric analysis. In: Vermolen F, Vuik C (eds) Numerical mathematics and advanced applications, ENUMATH 2019 - European Conference. Lecture notes in computational science and engineering, pp 1001–1009. Springer, Cham. https://doi.org/10.1007/978-3-030-55874-1_99
Briggs WL, Henson VE, McCormick SF (2000) A multigrid tutorial, 2nd edn. SIAM, Philadelphia. https://doi.org/10.1137/1.9780898719505
Brenner SC, Scott LR (1994) The mathematical theory of finite element methods. Springer, New York
Sampath RS, Biros G (2010) A parallel geometric multigrid method for finite elements on octree meshes. SIAM J Sci Comput 32:1361–1392. https://doi.org/10.1137/090747774
Tielen R, Möller M, Vuik C (2018) Efficient multigrid based solvers for isogeometric analysis. In: van Brummelen H, Vuik C, Möller M, Verhoosel C, Simeon B, Jüttler B (eds) Isogeometric analysis and applications 2018. Lecture notes in computational science and engineering. Springer, Cham. https://doi.org/10.1007/978-3-030-49836-8
Hofreither C, Takacs S, Zulehner W (2017) A robust multigrid method for isogeometric analysis in two dimensions using boundary correction. Comput Methods Appl Mech Eng 316:22–42. https://doi.org/10.1016/j.cma.2016.04.003
Saad Y (1994) ILUT: a dual threshold incomplete LU factorization. Numer Linear Algebra Appl 1:387–402. https://doi.org/10.1002/nla.1680010405
Guennebaud G et al (2010) Eigen v3. http://eigen.tuxfamily.org
Mantzaflaris A et al (2018) G+Smo (Geometry plus Simulation modules) v0.8.1. http://github.com/gismo
XBraid: Parallel multigrid in time. http://llnl.gov/casc/xbraid
Collier N, Dalcin L, Pardo D, Calo V (2013) The costs of continuity: performance of iterative solvers on isogeometric finite elements. SIAM J Sci Comput 35:767–784. https://doi.org/10.1137/120881038
Funding
The authors declare that no funds, grants, or other support were received during the preparation of this manuscript.
Contributions
All authors contributed to the study conception and design. All authors read and approved the final manuscript.
Ethics declarations
Conflict of interest
On behalf of all authors, the corresponding author states that there is no conflict of interest.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/
Cite this article
Tielen, R., Möller, M. & Vuik, C. Combining p-multigrid and Multigrid Reduction in Time methods to obtain a scalable solver for Isogeometric Analysis. SN Appl. Sci. 4, 163 (2022). https://doi.org/10.1007/s42452-022-05043-7
Keywords
 Multigrid Reduction in Time
 Isogeometric Analysis
 p-multigrid