We focus on an initial value problem
$$\begin{aligned} u_t = f(u),\quad u(0) = u_0 \end{aligned}$$
(1)
with \(u(t), u_0, f(u) \in \mathbb {R}\). In order to keep the notation simple, we do not consider systems of initial value problems for now, where \(u(t) \in \mathbb {R}^N\). Necessary modifications will be mentioned where needed. In a first step, we now discretize this problem in time and review the idea of single-step, time-serial spectral deferred corrections (SDC).
Spectral deferred corrections
For one time-step on the interval \([t_l,t_{l+1}]\) the Picard formulation of Eq. (1) is given by
$$\begin{aligned} u(t) = u_{l,0} + \int _{t_l}^t f(u(s))ds,\ t\in [t_l,t_{l+1}]. \end{aligned}$$
(2)
To approximate the integral we use a spectral quadrature rule. We define M quadrature nodes \(\tau _{l,1},\ldots ,\tau _{l,M}\) with \(t_l \le \tau _{l,1}< \cdots < \tau _{l,M} = t_{l+1}\). In the following, we will explicitly exploit that the last node coincides with the right integral boundary; quadrature rules such as Gauß-Radau or Gauß-Lobatto satisfy this property. We can then approximate the integrals from \(t_l\) to the nodes \(\tau _{l,m}\), such that
$$\begin{aligned} u_{l,m} = u_{l,0} + \varDelta t \sum _{j=1}^Mq_{m,j}f(u_{l,j}), \end{aligned}$$
where \(u_{l,m} \approx u(\tau _{l,m})\), \(\varDelta t= t_{l+1}-t_l\) and \(q_{m,j}\) represent the quadrature weights for the interval \([t_l,\tau _{l,m}]\) such that
$$\begin{aligned} \sum _{j=1}^Mq_{m,j}f(u_{l,j})\approx \int _{t_l}^{\tau _{l,m}}f(u(s))ds. \end{aligned}$$
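The weights \(q_{m,j}\) can be obtained by integrating the Lagrange basis polynomials of the nodes. A minimal numpy sketch, assuming nodes normalized to the unit interval and using the 3-node Gauß-Radau nodes as an example:

```python
import numpy as np

def collocation_matrix(nodes):
    """Quadrature matrix Q with Q[m, j] = integral of the j-th Lagrange
    basis polynomial over [0, nodes[m]] (nodes normalized to [0, 1])."""
    M = len(nodes)
    Q = np.zeros((M, M))
    for j in range(M):
        # Lagrange basis polynomial l_j: equals 1 at nodes[j], 0 at the others
        others = np.delete(nodes, j)
        coeffs = np.poly(others) / np.prod(nodes[j] - others)
        antideriv = np.polyint(coeffs)
        for m in range(M):
            Q[m, j] = np.polyval(antideriv, nodes[m]) - np.polyval(antideriv, 0.0)
    return Q

# three Gauss-Radau nodes on [0, 1], right endpoint included
nodes = np.array([(4 - np.sqrt(6)) / 10, (4 + np.sqrt(6)) / 10, 1.0])
Q = collocation_matrix(nodes)
```

Since the interpolation is exact for polynomials of degree less than M, the resulting rule integrates such polynomials exactly on each subinterval \([0, \tau_m]\).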
We combine these M equations into one system
$$\begin{aligned} \left( \mathbf {I} - \varDelta t\mathbf {Q}\varvec{f} \right) (\varvec{u}_l) = \varvec{u}_{l,0}, \end{aligned}$$
(3)
which we call the “collocation problem”. Here, \(\varvec{u}_l = (u_{l,1}, \ldots , u_{l,M})^T \approx (u(\tau _{l,1}), \ldots , u(\tau _{l,M}))^T\in \mathbb {R}^M\), \(\varvec{u}_{l,0} = (u_{l,0}, \ldots , u_{l,0})^T\in \mathbb {R}^M\), \(\mathbf {Q}= (q_{ij})_{i,j}\in \mathbb {R}^{M\times M}\) is the matrix gathering the quadrature weights and the vector function \(\varvec{f}:\mathbb {R}^M \rightarrow \mathbb {R}^M\) is given by
$$\begin{aligned} \varvec{f}(\varvec{u}_l) = (f(u_{l,1}), \ldots , f(u_{l,M}))^T. \end{aligned}$$
To simplify the notation we define
$$\begin{aligned} \mathbf {C}^{{\text {coll}}}_{\varvec{f}}(\varvec{u}_l) := \left( \mathbf {I} - \varDelta t\mathbf {Q}\varvec{f} \right) (\varvec{u}_l) . \end{aligned}$$
We note that for \(u(t) \in \mathbb {R}^N\), we need to replace \(\mathbf {Q}\) by \(\mathbf {Q}\otimes \mathbf {I}_N\), where \(\otimes \) denotes the Kronecker product.
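The Kronecker product simply applies \(\mathbf {Q}\) componentwise in space; a small numpy check (with arbitrary placeholder values) illustrates the equivalence:

```python
import numpy as np

# For u(t) in R^N the quadrature matrix becomes Q ⊗ I_N; applying it to the
# stacked vector equals multiplying Q into the node dimension of the
# (M x N) array of unknowns. Sizes and entries here are placeholders.
rng = np.random.default_rng(0)
M, N = 3, 5
Q = rng.random((M, M))
U = rng.random((M, N))           # rows: quadrature nodes, columns: space

lhs = (np.kron(Q, np.eye(N)) @ U.ravel()).reshape(M, N)
rhs = Q @ U
```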
System (3) is dense and a direct solution is not advisable, in particular if \(\varvec{f}\) is a nonlinear operator. The spectral deferred correction method solves the collocation problem in an iterative way. While it was originally derived from classical deferred or defect correction strategies, we here follow [10, 17, 27] and present SDC as a preconditioned Picard iteration. A standard Picard iteration is given by
$$\begin{aligned} \varvec{u}^{k+1}_l = \varvec{u}^{k}_l + (\varvec{u}_{l,0} - \mathbf {C}^{{\text {coll}}}_{\varvec{f}} (\varvec{u}^k_l)) \end{aligned}$$
for \(k = 0, \ldots , K\), and some initial guess \(\varvec{u}^{0}_l\).
In order to increase the range and speed of convergence, we now precondition this iteration. The standard approach to preconditioning is to define an operator \(\mathbf {P}^{{\text {sdc}}}_{\varvec{f}}\) which is easy to invert but also close to the operator of the system. We define this “SDC preconditioner” as
$$\begin{aligned} \mathbf {P}^{{\text {sdc}}}_{\varvec{f}}(\varvec{u}_l) := \left( \mathbf {I} - \varDelta t\mathbf {Q}_\varDelta \varvec{f} \right) (\varvec{u}_l) \end{aligned}$$
so that the preconditioned Picard iteration reads
$$\begin{aligned} \mathbf {P}^{{\text {sdc}}}_{\varvec{f}}(\varvec{u}_l^{k+1}) = (\mathbf {P}^{{\text {sdc}}}_{\varvec{f}} - \mathbf {C}^{{\text {coll}}}_{\varvec{f}})(\varvec{u}_l^k) + \varvec{u}_{l,0} . \end{aligned}$$
(4)
The key for defining \(\mathbf {P}^{{\text {sdc}}}_{\varvec{f}}\) is the choice of the matrix \(\mathbf {Q}_\varDelta \). The idea is to choose a “simpler” quadrature rule to generate a triangular matrix \(\mathbf {Q}_\varDelta \) such that solving System (4) can be done by forward substitution. Common choices include the implicit Euler method or the so-called “LU-trick”, where the LU decomposition of \(\mathbf {Q}^T\) with
$$\begin{aligned} \mathbf {Q}_\varDelta ^{\mathrm {LU}} = \mathbf {U}^T\quad \text {for}\quad \mathbf {Q}^T = \mathbf {L}\mathbf {U} \end{aligned}$$
(5)
is used [27].
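The LU-trick of Eq. (5) is straightforward to carry out. The sketch below uses a hand-rolled Doolittle decomposition without pivoting (note that `scipy.linalg.lu` would apply partial pivoting) and the 3-node Gauß-Radau weights as illustrative input:

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU decomposition without pivoting: A = L @ U."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            L[i, k] = U[i, k] / U[k, k]
            U[i, :] -= L[i, k] * U[k, :]
    return L, U

# quadrature matrix of the 3-node Gauss-Radau rule (values for illustration)
Q = np.array([[0.19681548, -0.06553543, 0.02377097],
              [0.39442431, 0.29207341, -0.04154875],
              [0.37640306, 0.51248583, 0.11111111]])

# LU-trick: decompose Q^T and take Q_Delta = U^T, which is lower triangular
L, U = lu_nopivot(Q.T)
Q_delta = U.T
```

Because \(\mathbf {Q}_\varDelta^{\mathrm{LU}}\) is lower triangular, the preconditioner built from it can be inverted by forward substitution.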
System (4) establishes the method of spectral deferred corrections, which can be used to approximate the solution of the collocation problem on a single time-step. In the next step, we will couple multiple collocation problems and use SDC to explain the idea of the parallel full approximation scheme in space and time.
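For the linear test equation \(u_t = \lambda u\), the operators in (4) become matrices and the iteration can be written out directly. The following sketch uses a 2-node Gauß-Radau rule and an implicit-Euler \(\mathbf {Q}_\varDelta\); all parameter values are illustrative choices of ours:

```python
import numpy as np

# SDC as preconditioned Picard iteration (Eq. (4)) for u' = lam*u
lam, dt, u0 = -1.0, 0.1, 1.0
Q = np.array([[5 / 12, -1 / 12], [3 / 4, 1 / 4]])   # 2-node Radau quadrature weights
Q_delta = np.array([[1 / 3, 0.0], [1 / 3, 2 / 3]])  # implicit Euler between nodes 1/3, 1
I = np.eye(2)

C = I - dt * lam * Q          # collocation operator (linear case)
P = I - dt * lam * Q_delta    # SDC preconditioner

b = np.full(2, u0)
u = b.copy()                  # initial guess: spread u0 to all nodes
for k in range(10):
    # P u^{k+1} = (P - C) u^k + b; lower-triangular P -> forward substitution
    u = np.linalg.solve(P, (P - C) @ u + b)

u_coll = np.linalg.solve(C, b)   # exact collocation solution, for comparison
```

After a handful of sweeps, the iterate agrees with the direct solution of the collocation problem to machine precision.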
Parallel full approximation scheme in space and time
The idea of PFASST is to solve a “composite collocation problem” for multiple time-steps at once using multigrid techniques and SDC for each step in parallel. This composite collocation problem for L time-steps can be written as
$$\begin{aligned} \begin{pmatrix} \mathbf {C}^{{\text {coll}}}_{\varvec{f}} \\ -\mathbf {H} &{} \mathbf {C}^{{\text {coll}}}_{\varvec{f}} \\ &{} \ddots &{} \ddots \\ &{} &{} -\mathbf {H} &{} \mathbf {C}^{{\text {coll}}}_{\varvec{f}} \end{pmatrix} \begin{pmatrix} \varvec{u}_{1}\\ \varvec{u}_{2}\\ \vdots \\ \varvec{u}_{L} \end{pmatrix} =\begin{pmatrix} \varvec{u}_{0,0} \\ \varvec{0}\\ \vdots \\ \varvec{0} \end{pmatrix}, \end{aligned}$$
where the matrix \(\mathbf {H}\in \mathbb {R}^{M\times M}\) on the lower subdiagonal transfers the information from one time-step to the next one. It takes the value of the last node \(\tau _{l,M}\) of an interval \([t_l, t_{l+1}]\), which is by requirement equal to the left boundary \(t_{l+1}\) of the following interval \([t_{l+1}, t_{l+2}]\), and provides it as a new starting value for this interval. Therefore, the matrix \(\mathbf {H}\) contains the value 1 on every position in the last column and zeros elsewhere. To write the composite collocation problem in a more compact form we define the vector \(\varvec{u} = (\varvec{u}_{1}, \ldots , \varvec{u}_{L})^T\in \mathbb {R}^{LM}\), which contains the solution at all quadrature nodes at all time-steps, and the vector \(\varvec{b} = (\varvec{u}_{0,0}, \varvec{0}, \ldots , \varvec{0})^T\in \mathbb {R}^{LM}\), which contains the initial condition for all nodes at the first interval and zeros elsewhere. We define \({\varvec{F}: \mathbb {R}^{LM} \rightarrow \mathbb {R}^{LM}}\) as an extension of \(\varvec{f}\) so that \({\varvec{F}} ({\varvec{u}}) = \left( {\varvec{f}} ({\varvec{u}}_{1}), \ldots , {\varvec{f}} ({\varvec{u}}_{L}) \right) ^T\). Then, the composite collocation problem can be written as
$$\begin{aligned} \mathbf {C}_{\varvec{F}}(\varvec{u}) = \varvec{b} \end{aligned}$$
(6)
with
$$\begin{aligned} \mathbf {C}_{\varvec{F}}(\varvec{u}) = \left( \mathbf {I} - \varDelta t(\mathbf {I}_L\otimes \mathbf {Q} )\varvec{F} - \mathbf {E}\otimes \mathbf {H}\right) (\varvec{u}), \end{aligned}$$
where the matrix \(\mathbf {E}\in \mathbb {R}^{L\times L}\) just has ones on the first subdiagonal and zeros elsewhere. If \(u \in \mathbb {R}^N\), we need to replace \(\mathbf {H}\) by \(\mathbf {H}\otimes \mathbf {I}_N\).
SDC can be used to solve the composite collocation problem by forward substitution in a sequential way, i.e., by solving one time-step after another, using the previous solution as initial value for the current time-step. The parallel-in-time integrator PFASST, on the other hand, solves the composite collocation problem by computing on all time-steps simultaneously and is therefore an attractive alternative. The first step from SDC towards PFASST is the introduction of multiple levels, which are representations of the problem with different accuracies in space and time. In order to simplify the notation we focus on a two-level scheme consisting of a fine and a coarse level. Coarsening can be achieved, for example, by reducing the resolution in space, by decreasing the number of quadrature nodes on each interval or by solving implicit systems less accurately. Coarsening by reducing the number of quadrature nodes, however, does not seem worthwhile for our idea of parallelizing the corresponding calculations: the coarse-level computations would no longer keep all processors busy, and individual processors would instead have to communicate larger amounts of data. For this work, we only consider coarsening in space, i.e., by applying a restriction operator R to a vector \(u\in \mathbb {R}^{N}\) we obtain a new vector \(\tilde{u}\in \mathbb {R}^{\tilde{N}}\). Vice versa, the interpolation operator T is used to interpolate values from \(\tilde{u}\) to u. Operators, vectors and numbers on the coarse level will be denoted by a tilde to avoid further index cluttering. Thus, the composite collocation operator on the coarse level is given by \( \tilde{\mathbf {C}}_{\varvec{F}}\). While \(\mathbf {C}_{\varvec{F}}\) is defined on \(\mathbb {R}^{LMN}\), \(\tilde{\mathbf {C}}_{\varvec{F}}\) acts on \(\mathbb {R}^{L M \tilde{N}}\) with \(\tilde{N} \le N\), but as before we will neglect the space dimension in the following notation.
The extension of the spatial transfer operators to the full space–time domain is given by \(\mathbf {R} = \mathbf {I}_{LM} \otimes R\) and \(\mathbf {T} = \mathbf {I}_{LM} \otimes T\).
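As a minimal illustration (the grid sizes and the injection/prolongation choices below are our own, not taken from the text), the space-time operators are plain Kronecker extensions of the spatial ones:

```python
import numpy as np

# spatial restriction by injection from N = 5 to N~ = 3 points and a crude
# prolongation (its transpose); both are purely illustrative choices
L_steps, M, N, Nc = 2, 3, 5, 3
R_space = np.zeros((Nc, N))
R_space[np.arange(Nc), [0, 2, 4]] = 1.0   # keep every other grid point
T_space = R_space.T                        # simple transpose as prolongation

# extension to the full space-time domain: R = I_{LM} ⊗ R_space, T likewise
R = np.kron(np.eye(L_steps * M), R_space)
T = np.kron(np.eye(L_steps * M), T_space)
```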
The main goal of introducing a coarse level is to move the serial part of the computation to this hopefully cheaper level, while being able to run the expensive part in parallel. For that, we define two preconditioners: a serial one with a lower subdiagonal for the coarse level and a parallel, block-diagonal one for the fine level. The serial preconditioner for the coarse level is defined by
$$\begin{aligned} \tilde{\mathbf {P}}_{\varvec{F}} = \begin{pmatrix} \tilde{\mathbf {P}}_{\varvec{f}}^{{\text {sdc}}} \\ -\tilde{\mathbf {H}} &{} \tilde{\mathbf {P}}_{\varvec{f}}^{{\text {sdc}}} \\ &{} \ddots &{} \ddots \\ &{} &{} -\tilde{\mathbf {H}} &{} \tilde{\mathbf {P}}_{\varvec{f}}^{{\text {sdc}}} \\ \end{pmatrix}, \end{aligned}$$
or, in a more compact way, by
$$\begin{aligned}&\tilde{\mathbf {P}}_{\varvec{F}}(\tilde{\varvec{u}}) = \left( \tilde{\mathbf {I}} - \varDelta t(\mathbf {I}_L \otimes \tilde{\mathbf {Q}}_\varDelta )\tilde{\varvec{F}} - \mathbf {E}\otimes \tilde{\mathbf {H}} \right) (\tilde{\varvec{u}}). \end{aligned}$$
Inverting this corresponds to a single inner iteration of SDC (a “sweep”) on step 1, then sending the result forward to step 2, performing an SDC sweep there, and so on. The parallel preconditioner on the fine level then simply reads
$$\begin{aligned}&\mathbf {P}_{\varvec{F}}(\varvec{u}) = (\mathbf {I} - \varDelta t(\mathbf {I}_L\otimes \mathbf {Q}_\varDelta ) \varvec{F}) (\varvec{u}). \end{aligned}$$
Applying \(\mathbf {P}_{\varvec{F}}\) on the fine level leads to L decoupled SDC sweeps, which can be run in parallel.
For PFASST, these two preconditioners and the levels they work on are coupled using a full approximation scheme (FAS) known from nonlinear multigrid theory [25]. Following [1] one iteration of PFASST can then be formulated in four steps:
1.
the computation of the FAS correction \({\tau }^k\), including the restriction of the fine value to the coarse level
$$\begin{aligned} {\tau }^k =\tilde{\mathbf {C}}_{\varvec{F}} (\mathbf {R} {\varvec{u}}^k) - \mathbf {R} \mathbf {C}_{\varvec{F}} ( {\varvec{u}}^k) , \end{aligned}$$
2.
the coarse sweep on the modified composite collocation problem on the coarse level
$$\begin{aligned} \tilde{\mathbf {P}}_{\varvec{F}} (\tilde{\varvec{u}}^{k+1} )&= (\tilde{\mathbf {P}}_{\varvec{F}} - \tilde{\mathbf {C}}_{\varvec{F}})({\tilde{\varvec{u}}}^{k}) + \tilde{\varvec{b}} + \tau ^k, \end{aligned}$$
(7)
3.
the coarse grid correction applied to the fine level value
$$\begin{aligned} \varvec{u}^{k+\frac{1}{2}}&= \varvec{u}^{k} + \mathbf {T} ( \tilde{\varvec{u}}^{k+1} -\mathbf {R} \varvec{u}^k ), \end{aligned}$$
(8)
4.
the fine sweep on the composite collocation problem on the fine level
$$\begin{aligned} \mathbf {P}_{\varvec{F}} ( \varvec{u}^{k+1} )&= (\mathbf {P}_{\varvec{F}} - \mathbf {C}_{\varvec{F}})( \varvec{u}^{k+\frac{1}{2}} ) + \varvec{b} . \end{aligned}$$
(9)
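For the linear model problem, the four steps can be written out with matrices only. The sketch below mimics spatial coarsening by a perturbed coarse-level \(\lambda\) (standing in for a coarser spatial discretization) and uses identity transfer operators, so it shows the structure of one PFASST iteration rather than a genuine two-grid method; all values are illustrative:

```python
import numpy as np

# PFASST iterations (steps 1-4) for u' = lam*u; coarsening is mimicked by
# a perturbed coarse lam_c, with R = T = identity (structural sketch only)
L_steps, lam, lam_c, dt = 4, -1.0, -0.95, 0.1
Q = np.array([[5 / 12, -1 / 12], [3 / 4, 1 / 4]])
Q_d = np.array([[1 / 3, 0.0], [1 / 3, 2 / 3]])     # implicit-Euler Q_Delta
M = Q.shape[0]

H = np.zeros((M, M)); H[:, -1] = 1.0
E = np.diag(np.ones(L_steps - 1), -1)
I = np.eye(L_steps * M)

def comp_coll(lmbda):                               # composite collocation operator
    return I - dt * lmbda * np.kron(np.eye(L_steps), Q) - np.kron(E, H)

C_f, C_c = comp_coll(lam), comp_coll(lam_c)
P_f = I - dt * lam * np.kron(np.eye(L_steps), Q_d)                    # parallel (fine)
P_c = I - dt * lam_c * np.kron(np.eye(L_steps), Q_d) - np.kron(E, H)  # serial (coarse)

b = np.zeros(L_steps * M); b[:M] = 1.0
u = b.copy()
for k in range(50):
    tau = C_c @ u - C_f @ u                                  # 1. FAS correction
    u_c = np.linalg.solve(P_c, (P_c - C_c) @ u + b + tau)    # 2. coarse sweep
    u_half = u + (u_c - u)                                   # 3. coarse-grid correction
    u = np.linalg.solve(P_f, (P_f - C_f) @ u_half + b)       # 4. fine sweep

u_exact = np.linalg.solve(C_f, b)
```

Note that the fine preconditioner contains no \(\mathbf {E}\otimes \mathbf {H}\) term, so step 4 decouples across time-steps, while the coarse sweep in step 2 carries the serial information transport.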
In Fig. 1, we see a schematic representation of the described steps. The time-step-parallel procedure described here is the same for all PFASST versions that we will introduce later. It is common to use as many processors as time-steps: in the given illustration, four processors work on four time-steps. The temporal domain is therefore divided into four intervals, which are assigned to four processors \(P_0, \ldots , P_3\). Every processor performs SDC sweeps on its assigned interval on alternating levels. The big red blocks represent fine sweeps, given by Eq. (9), and the small blue blocks coarse sweeps, given by Eq. (7).
The coarse sweep over all intervals is a serial process: after a processor has finished its coarse sweep, it sends its result forward to the next processor, which takes it as the initial value for its own coarse sweep. In the figure, this communication is represented by small arrows connecting the coarse sweeps of neighboring intervals. In (7), the need for communication with a neighboring process is obvious, because \(\tilde{\mathbf {P}}_{\varvec{F}}\) is not a (block-)diagonal matrix, but has entries on its lower block-diagonal. \(\mathbf {P}_{\varvec{F}}\), on the other hand, is block-diagonal, which means that the processors can compute on the fine level in parallel. We see in (9) that there is only a connection to previous time-steps through the right-hand side, where we gather values from the previous time-step and iteration, but not from the current iteration. The figure shows this connection as a fine-level communication, which forwards data from each fine sweep to the following fine sweep of the right neighbor. The fine and coarse calculations on every processor are connected through the FAS corrections, which in our formulation are part of the coarse sweep.
PFASST-Newton
For each coarse and each fine sweep within each PFASST iteration, System (7) and System (9), respectively, need to be solved. If f is a nonlinear function these systems are nonlinear as well. The obvious and traditional way to proceed in this case is to linearize the problem locally (i.e. for each time-step, at each quadrature node) using Newton’s method. This way, PFASST is the outer solver with an inner Newton iteration. For triangular \(\mathbf {Q}_\varDelta \), the mth equation on the lth time-step on the coarse level reads
$$\begin{aligned} (1 - \varDelta t\ \tilde{q}^\varDelta _{m,m} \tilde{f}) (\tilde{u}^{k+1}_{l,m}) =&\ \tilde{u}^{k+1}_{l, 0} + \varDelta t\sum _{n=1}^{m-1} \tilde{q}^\varDelta _{m,n} \tilde{f}(\tilde{u}^{k+1}_{l,n}) \\&+ \tilde{\varvec{c}}( \tilde{\varvec{u}}^k)_{l,m}, \end{aligned}$$
where \(\tilde{u}^{k+1}_{0,0} = \tilde{u}_{0,0}\) and \(\tilde{\varvec{c}}(\tilde{\varvec{u}}^k)_{l,m}\) is the mth entry of the lth block of \(\tilde{\varvec{c}}(\tilde{\varvec{u}}^k) := (\tilde{\mathbf {P}}_{\varvec{F}} - \tilde{\mathbf {C}}_{\varvec{F}})({\tilde{\varvec{u}}}^{k}) + \tau ^k.\) This term gathers all values of the previous iteration. The first summand on the right-hand side of the coarse-level equation corresponds to \(\tilde{\varvec{b}}\) and \(\tilde{\mathbf {H}}\), while the sum comes from the lower triangular structure of \(\tilde{\mathbf {Q}}_\varDelta \).
For time-step l these equations can be solved one by one using Newton iterations and forward substitution. This is inherently serial, because the solution on the mth quadrature node depends on the solution at all previous nodes through the sum. Thus, while running parallel across the steps, each solution of the local collocation problem is found in serial. In the next section, we will present a novel way of applying Newton’s method, which allows one to parallelize this part across the collocation nodes, joining parallelization across the step with parallelization across the method.
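For a single quadrature node, this inner Newton iteration looks as follows; the nonlinearity \(f(u) = -u^2\) and all parameter values are illustrative choices of ours, not the model problem of the text:

```python
# Newton's method for the scalar implicit node equation
#   u - a*f(u) = rhs,   with a = dt * q_delta_mm,
# here with the illustrative nonlinearity f(u) = -u**2
def solve_node(rhs, a, tol=1e-12, max_iter=20):
    """Solve u - a*f(u) = rhs for u, with f(u) = -u^2, via Newton."""
    f = lambda u: -u**2
    df = lambda u: -2.0 * u
    u = rhs                          # initial guess: the known right-hand side
    for _ in range(max_iter):
        g = u - a * f(u) - rhs       # residual of the node equation
        if abs(g) < tol:
            break
        u -= g / (1.0 - a * df(u))   # Newton update
    return u

u = solve_node(rhs=1.0, a=0.05)
```

Each node equation is solved this way before its result enters the sum on the right-hand side of the next node's equation, which is exactly the serial forward substitution described above.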