Wave propagation characteristics of Parareal

The paper derives and analyses the (semi-)discrete dispersion relation of the Parareal parallel-in-time integration method. It investigates Parareal's wave propagation characteristics with the aim to better understand what causes the well documented stability problems for hyperbolic equations. The analysis shows that the instability is caused by convergence of the amplification factor to the exact value from above for medium to high wave numbers. Phase errors in the coarse propagator are identified as the culprit, which suggests that specifically tailored coarse level methods could provide a remedy.


Introduction
Parallel computing has become ubiquitous in science and engineering, but requires suitable numerical algorithms to be efficient. Parallel-in-time integration methods have been identified as a promising direction to increase the level of concurrency in computer simulations that involve the numerical solution of time dependent partial differential equations [5]. A variety of methods has been proposed [8,10,12,22,25], the earliest going back to 1964 [27]. While even complex diffusive problems can be tackled successfully [11,14,24,33] -although parallel efficiencies remain low -hyperbolic or advection-dominated problems have proved to be much harder to parallelise in time. This currently prevents the use of parallel-in-time integration for most problems in computational fluid dynamics, even though many applications struggle with excessive solution times and could benefit greatly from new parallelisation strategies. Daniel Ruprecht School of Mechanical Engineering, University of Leeds, LS2 9JT, UK For the Parareal parallel-in-time algorithm there is some theory available illustrating its limitations in this respect. Bal shows that Parareal with a sufficiently damping coarse method is unconditionally stable for parabolic problems but not for hyperbolic equations [2]. An early investigation of Parareal's stability properties showed instabilities for imaginary eigenvalues [31]. Gander and Vandewalle [17] give a detailed analysis of Parareal's convergence and show that even for the simple advection equation u t + u x = 0, Parareal is either unstable or inefficient. Numerical experiments reveal that the instability emerges in the nonlinear case as a degradation of convergence with increasing Reynolds number [32]. Approaches exist to stabilise Parareal for hyperbolic equations [3,4,7,13,15,23,29], but typically with significant overhead, leading to further degradation of efficiency, or limited applicability.
Since a key characteristic of hyperbolic problems is the existence of waves propagating with finite speeds, understanding Parareal's wave propagation characteristics is important to understand and, hopefully, resolve these problems. However, no such analysis exists that gives insight into how the instability emerges. A better understanding of the instability could show the way to novel methods that allow the efficient and robust parallel-in-time solution of flows governed by advection. Additionally, just like for "classical" time stepping methods, detailed knowledge of Parareal's theoretical properties for test cases will help understanding its performance for complex test problems where mathematical theory is not available.
To this end, the paper derives a discrete dispersion relation for Parareal to study how plane wave solutions u(x,t) = exp(−iωt) exp(iκx) are propagated in time. It studies the discrete phase speed and amplification factor and how they depend on e.g. the number of processors, choice of coarse propagator and other parameters. The analysis reveals that the source of the instability is a convergence from above in arXiv:1701.01359v2 [math.NA] 14 Oct 2017 the amplification factor in higher wave number modes. In diffusive problems, where high wave numbers are naturally strongly damped, this does not cause any problems, but in hyperbolic problems with little or no diffusion it causes the amplification factors to exceed a value of one and thus triggers instability. Furthermore, the paper identifies phase errors in the coarse propagator as the source of these issues. This suggests that controlling coarse level phase errors could be key to devising efficient parallel-in-time methods for hyperbolic equations.
All results presented in this paper have been produced using pyParareal, a simple open-source Python code. It is freely available [28] to maximise the usefulness of the here presented analysis, allowing other researchers to test different equations or to explore sets of parameters which are not analysed in this paper.

Parareal for linear problems
Parareal [25] is a parallel-in-time integration method for an initial value probleṁ For the sake of simplicity we consider only the linear case where the right hand side is given by a matrix A ∈ C n,n . To parallelise integration of (1) in time, Parareal decomposes the time interval [0, T ] into P time slices with P indicating the number of processors. Denote as F δt and G ∆t two "classical" time integration methods with time steps of length δt and ∆t (e.g. Runge-Kutta methods). For the sake of simplicity, assume that all slice [T j−1 , T j ) have the same length ∆ T and that this length is an integer multiple of both δt and ∆t so that ∆ T = N c ∆t and ∆ T = N f δt. Below, δt will always denote the time step size of the fine method and ∆t the time step size of the coarse method, so that we omit the indices and just write G and F to avoid clutter. Standard serial time marching using the method denoted as F would correspond to evaluating where u p ≈ u(T p ). Instead, after an initialisation procedure to provide values u 0 p -typically running the coarse method once -Parareal computes the iteration for k = 1, . . . , K where the computationally expensive evaluation of the fine method can be parallelised over P processors. If the number of iterations K is small enough and the coarse method much cheaper than the fine, iteration (4) can run in less wall clock time than serially computing (3).

The Parareal iteration in matrix form
As a first step toward deriving Parareal's dispersion relation we will need to derive its stability function which will require writing it in matrix form. Consider now the case where both the coarse and the fine integrator are one-step methods with stability functions R f and R c . Then, G and F can be expressed as matrices with F := R f (Aδt) N f and G := (R c (A∆t)) N c . Denote as u k = (u 0 , . . . , u P ) ∈ R (P+1)n a vector that contains the approximate solutions at all time points T j , j = 1, . . . , P and the initial value u 0 . Simple algebra shows that one step of iteration (4) is equivalent to the block matrix formulation with matrices and and a vector b = (u 0 , 0, . . . , 0) ∈ R (P+1)n . Formulation (6) interpretes Parareal as a preconditioned linear iteration [1].

Stability function of Parareal
From the matrix formulation of a single Parareal iteration (6), we can now derive its stability function, that is we can express the update from the initial value u 0 to an approximation u P at time T = T P using Parareal with K iterations as multiplication by a single matrix. The fine propagator solution satisfies and is a fixed point of iteration (6). Therefore, propagation of the error is governed by the matrix in the sense that Using this notation and applying (6) recursively, it is now easy to show that If, as is typically done, the start value u 0 for the iteration is produced by a single run of the coarse method, that is if Equation (13) further simplifies to The right hand side vector can be generated from the initial value u 0 via by defining Finally, denote as the matrix that selects the last n entries out of u k . Now, a full Parareal update from some initial value u 0 to an approximation u P using K iterations can be written compactly as The stability matrix M parareal ∈ R n,n depends on K, T , ∆ T , P, ∆t, δt, F, G and A. Note that Staff and Rønquist derived the stability function for the scalar case using Pascal's tree [31].

Weak scaling vs. longer simulation times
There are two different application scenarios for Parareal that we can study when increasing the number of processors P. If we fix the final time T , increasing P will lead to better resolution since the coarse time step ∆t cannot be larger than the length of a time slice ∆ T -the coarse method has to perform at least one step per time slice. In this scenario, more processors are used to absorb the cost of higher temporal resolution ("weak scaling"). Alternatively, we can use additional processors to compute until a later final time T and this is the scenario investigated here. Consequently, we study here the case where T and P increase together and always assume that T = P, that is each time slice has length one and increasing P means parallelising over more time slices covering a longer time interval. Since dispersion properties of numerical methods are typically analysed for a unit interval, this causes some issues that we resolve by "normalising" the Parareal stability function, see §3.1.

Maximum singular value
The matrix E defined in (11) determines how quickly Parareal converges. Note that E is nil-potent with E P = 0, owing to the fact that after P iterations Parareal will always reproduce the fine solution exactly. Therefore, all eigenvalues of E are zero and the spectral radius is not useful to analyse convergence. Below, to investigate convergence, we will therefore compute the maximum singular value σ of E instead. Since (20) so that if σ < 1 Parareal will converge monotonically without stalling. In particular, this rules out behaviour as found by Gander and Vandewalle for hyperbolic problems, where the error would first increase substantially over the first P/2 iterations before beginning to decrease [16]. However, achieving fast convergence and good efficiency will typically require σ 1. Note that if the coarse method is used to generate u 0 , it follows from (9) and (14) that the initial error is The size of σ depends on the accuracy "gap" between the coarse and fine integrator and the wave number. Figure 1 shows σ for varying values of ∆t when backward Euler is used for both coarse and fine method. Clearly, as the coarse time step approaches to fine time step of δt = 0.05, the maximum singular value approaches zero. However, in this limit the coarse and fine propagator are identical and no speedup is possible. The larger ∆t compared to δt, the cheaper the coarse method becomes but since σ also grows, more iterations are likely required. Note that higher wave numbers lead to higher values of σ while lower wave numbers tend to have values of σ 1 even for large coarse-to-fine time step ratios.
Looking at σ also provides a way to refine performance models for Parareal. Typically, in models projecting speedup, the number of iterations has to be fixed in addition to ∆t, δt  and P. Instead, at least for linear problems, we can fix k such that σ k ≤ tol (22) for some fixed tolerance tol. The resulting projected speedup for P = 16 processors and a tolerance of tol = 1e − 2 is shown in Figure 2. First, as the coarse time step increases, the reduced cost of the coarse propagator improves achievable speedup. Simultaneously, the decreasing accuracy of G manifests itself in an increasing number of iterations required to match the selected tolerance. These two counteracting effects create a "sweet spot" where G is accurate enough to still enable relatively fast convergence but cheap enough to allow for speedup. It is noteworthy, however, that this sweet spot is different for lower and higher wave numbers. Therefore, the potential for speedup from Parareal does not solely depend on the solved equations and discretization parameters but also on the solution -the more prominent high wave number modes are, the more restricted achievable speedup.

Convergence and (in)stability of Parareal
Two different but connected issues with Parareal are discussed throughout the paper, convergence and (in)stability.
Here, convergence refers to how fast Parareal approaches the fine solution within a single instance of Parareal, that is  [26] with P = 16 processors for the same parameters as in Figure 1 and the number of iterations k fixed such that σ k ≤ tol with tol = 0.01.
As discussed above, after k = P iterations we always have M parareal = F, but particularly for hyperbolic problems this can happen only at the final iterations k = P while at k = P − 1 there is still a substantial difference [16]. Clearly, speedup is not obtainable in such a situation. The maximum singular value σ of E gives an upper bound or worst-case scenario of how fast Parareal converges to the fine solution, cf. Equation (20). While σ < 1 does not necessarily guarantee converge quick enough to generate meaningful speedup, it guarantees monotonous convergence and rules out an error that increases first before decreasing only in later iterations.
The other issue investigated in the paper is that of stability of repeated application of Parareal ("restarting"). Below, stability is normally assessed for Parareal with a fixed number of iterations k. A configuration of Parareal is referred to as unstable if it leads to an amplification factor of more than unity. This corresponds to an artificial increase in wave amplitudes and, just as for classical time stepping methods, would result in a diverging numerical approximation if the method is used recursively While for classical methods this recursive application simply means stepping through time steps, for Parareal with P processors it would mean computing one window [0, T P ], then restarting it with the final approximation as initial value for the next window [T P , 2T P ] and so on.

Discrete dispersion relation for the advection-diffusion equation
Starting from (19) we now derive the (semi-)discrete dispersion relation of Parareal for the one dimensional linear advection diffusion problem First, assume a plane wave solution in space with wave number κ so that (25) reduces to the initial value problem where R is the stability function of the employed method. Now assume that the approximation ofû is a discrete plane wave so that the solution at the end of the n th time slice is given bŷ Inserting this in (28) gives For R in polar coordinates, that is R = |R| exp(iθ ) with θ = angle(R), we get The exact integrator, for example, would read Using (30) to compute the resulting frequency yields ω = iδ (k,U, ν) and retrieves the continuous plane wave solution of (25). It also reproduces the dispersion relation of the continuous system However, if we use an approximate stability function R instead, we get some approximate ω = ω r + iω i with ω r , ω i ∈ R. The resulting semi-discrete solution becomes u n j = e −iωt n e iκx = e ω i t n e iκ(x− ωr κ t n) .
Therefore, ω r /κ governs the propagation speed of the solution while ω i governs the growth or decay in amplitude. Consequently, ω r /κ is referred to as phase velocity while exp(ω i ) is called amplification factor. In the continuous case, the phase speed is equal to U and the amplification factor equal to e −νκ 2 . Note that for (25) the exact phase speed should be independent of the wave number κ. However, the discrete phase speed of a numerical method often will change with κ, thus introducing numerical dispersion. Also note that for ν > 0 higher wave numbers decay faster because the amplification factor decreases rapidly as κ increases.

Normalisation
The update function R for Parareal in Equation (19) denotes not an update over [0, 1] but over [0, T P ] where T P = P is the number of processors. A phase speed of ω r /κ = 1.0, for example, indicates a wave that travels across a whole interval [0, 1] during the step. If scaled up to an interval [0, P], the corresponding phase speed would become ω r /κ = P instead. This enlarged range of values causes problems with the complex logarithm in (30). As an example, take the stability function of the exact propagator (32). Analytically, the identity holds, resulting in the correct dispersion relation (34) of the continuous system. However, depending on the values of κ, U, T and ν, this identity is not necessarily satisfied when computing the complex logarithm using np.log. For example, for κ = 2, U = 1, ν = 0 and T > 0, the exact stability function is R = e −2iT . In Python, we obtain for T = 1 and so identity (36) is not fulfilled. The reason is that the logarithm of a complex number is not unique and so np.log returns only a complex logarithm but not necessarily the right one in terms of the dispersion relation.
To circumvent this problem, we "normalise" the update R for Parareal to [0, 1]. To this end, decompose where P √ R corresponds to the propagation over [0, 1] instead of [0, P]. Since there are P many roots P √ R, we have to select the right one. First, we use the zeros function of numPy to find all P complex roots z i of Then, we select as root P √ R the value z i that satisfies where θ is the angle function and θ targ some target angle, which we still need to define. We compute ω and the resulting phase speed and amplification factor for a range of wave numbers 0 ≤ κ 1 ≤ κ 2 ≤ . . . ≤ κ N ≤ π. For κ 1 , θ targ is set to the angle of the frequency ω computed from the analytic dispersion relation. After that, θ targ is set to the angle of the root selected for the previous value of κ. The rationale is that small changes in κ should only result in small changes of frequency and phase so that θ (ω i−1 ) ≈ θ (ω i ) if the increment between wave numbers is small enough. From the selected root P √ R we then compute ω using (30), the resulting discrete phase speed and amplification factor and the target angle θ targ for the next wave number.

Analysis of the dispersion relation
After showing how to derive Parareal's dispersion relation and normalising it to the unit interval, this section now provides a detailed analysis of different factors influencing its discrete phase speed and amplification factor. In both cases, the discrete phase speed of Parareal converges almost monotonically toward the continuous phase speed. Even for ten iterations, Parareal still causes significant slowing of medium to large wave number modes. Parareal requires almost the full number of iterations, k = 15, before it faithfully reproduces the correct phase speed across most of the spectrum. However, for any number of iterations where speed up might still be possible, Parareal will introduce significant numerical dispersion. Slight artificial acceleration is also observed for high wave number modes for k = 15 in the non-diffusive and k = 10 in the diffusive case, but generally phase speeds are quite similar in the diffusive and non-diffusive case.

Influence of diffusion
The amplification factor in the non-diffusive case (upper right figure) illustrates Parareal's instability for hyperbolic equations: for k = 10 and k = 15 it is larger than one for a significant part of the spectrum, indicating that these modes are unstable and will be artificially amplified. For k = 5, the iteration has not yet corrected for the strong diffusivity of the coarse propagator and remains stable for all modes but with significant numerical damping of medium to high wave numbers. The reason for the stability problems is discernible from the amplification factor for the diffusive case (rightmost figure): from k = 0 (blue circles) to k = 5, Parareal reproduces the correct amplification factor for small wave number modes but significantly overestimates the amplitude of medium to large wave numbers. It then continues to converge to the correct value from above. For the diffusive case where the exact values are smaller than one this does not cause instabilities. In the non-diffusive case, however, any overestimation of the analytical amplification factor immediately causes instability. There is, in a sense, "no room" for the amplification factor to converge to the correct value from above. This also means that using a non-diffusive method as coarse propagator, for example trapezoidal rule, leads to disastrous consequences (not shown) where most parts of the spectrum are unstable for almost any value of k. Figure 4 illustrate how the phase speed and amplitude errors discussed above manifest themselves. It shows a single Gauss peak advected with a velocity of U = 1.0 with ν = 0.0 on a spatial domain [0, 4] over a time interval [0, 16] distributed over P = 16 processors and N c = 2 coarse time steps per slice. A spectral discretisation is used in space, allowing to represent the derivative exactly. For k = 5 iterations, most higher wave numbers are damped out and the result looks essentially like a low wave number sine function. The artificially amplified medium to high wave number modes create a "bulge" for k = 10 while dispersion leads to a significant trough at the sides of the domain. After fifteen iterations, the solution approximates the main part of the Gauss peak reasonably well, but dispersion still leads to visible errors along the flanks. The right figure shows a part of the resulting spectrum. For k = 5, only the lowest wave number modes are present, leading to the sine shaped solution. After k = 10 iterations, most of the spectrum is still being truncated but a small range of wave numbers around κ = 0.05 is being artificially amplified which creates the "bulge" seen in the left figure. Finally, for k = 15 iterations, Parareal starts to correctly capture the spectrum but the still significant overestimation of low wave number amplitudes and underestimation of higher modes causes visible errors.

Observation 1
The amplification factor in Parareal for higher wave numbers converges "from above". In diffusive problems these wave numbers are damped, so the exact amplification factor is significantly smaller than one, leaving room for Parareal to overestimate it without crossing the threshold to instability. For non-diffusive problems where the exact amplification factor is one, every overestimation causes the mode to become unstable.

Low order finite difference in coarse method
In a realistic scenario, some approximation of the spatial derivatives would have to be used instead of the exact symbol δ . For simple finite differences, we can study the effect this has on the dispersion relation. Consider the first order upwind finite difference as approximation for u x in (25). Assuming a discrete plane wave u j =û(t)e iκ j∆ x on a uniform spatial mesh x j = j∆ x instead of the continuous plane wave (26), this leads to For ν = 0 this results in the initial value problem with initial valueû(0) = 1 and a discrete symbolδ instead of δ as in (27) so thatδ → δ as ∆ x → 0. Durran gives details for different stencils [6].
The dispersion properties of the implicit Euler method together with the first order upwind finite difference are qualitatively similar to the ones for implicit Euler with the exact symbol (not shown). 1 Using the upwind finite difference instead of the exact symbol gives qualitatively similar wave propagation characteristics for the coarse propagator. Numerical slowdown increases (up to the point where modes at the higher end of the spectrum almost do not propagate at all) and numerical diffusion becomes somewhat stronger. As a result, Parareal's dispersion properties (also not shown) are also relatively similar, except for too small phase speeds even for k = 15.
However, if we use the centred finite difference instead, this leads to an approximate symbol In this case, it turns out that the dispersion properties of implicit Euler with δ andδ are quite different. Figure 5 shows the discrete phase speed (left) and amplification factor (right) for both configurations. For the phase speed, both version agree qualitatively, even though usingδ leads to noticeable stronger slow down, particularly of higher wave numbers. For the amplification factor, however, there is a 1 The script plot ieuler dispersion.py supplied together with the Python code can be used to visualize the dispersion properties of the coarse propagator alone. significant difference between the semi-discrete and fully discrete method. While the former damps high wave numbers strongly, the combination of implicit Euler and centred finite differences strongly damps medium wave numbers while damping of high wave numbers is weak.
In Parareal, this causes a situation similar to what happens when using the trapezoidal rule as coarse propagator, albeit less drastic. Figure 6 shows again the phase speed (left) and amplification factor (right) for the same configuration as before but implicit Euler with centred finite difference for G. The failure of the coarse method to remove high wave number modes again leads to an earlier triggering of the instability. Whereas for Parareal using δ on the coarse level the iteration k = 5 was will stable (see Figure 3), it is now unstable. For iterations k = 10 and k = 15, large parts of the spectrum remain unstable. Also, the stronger numerical slow down of the coarse method makes it harder for Parareal to correct for phase speed errors. Where before Parareal with k = 15 iteration captured the exact phase speed reasonably well, in Figure 6 we still see significant numerical slow down of the higher wave number modes.

Observation 2
The choice of finite difference stencil used in the coarse propagator can have a significant effect on Parareal. It seems that centred stencils that fail to remove high wave number modes cause similar problems as non-diffusive time stepping methods, suggesting that stencils with upwindbias are a much better choice.

Influence of phase and amplitude errors
To investigate whether phase errors or amplitude errors in the coarse method trigger the instability, we construct coarse propagators where either the phase or the amplitude is exact. Denote as R euler the stability function of backward Euler and as R exact the stability function of the exact integrator. Then, a method with the amplification factor of backward Euler and no phase speed error can be constructed as while a method with no amplification error and the phase speed of backward Euler can be constructed as These artificially constructed propagators are now used within Parareal. Figure 7 shows the resulting amplification factor when using R 1 (upper) or R 2 (lower) as coarse propagator. For R 1 , where there is no phase speed error in the coarse propagator, there is no instability. Already for k = 10 it produces a good approximation of the exact amplification factor across  the whole spectrum. In contrast, for R 2 where there is no amplification error produced by G, the instability is clearly present for k = 5, k = 10 and k = 15. Figure 8 shows the solution for the same setup that was used for Figure 4, except using the R 1 artificial coarse propagator without phase errors instead of the backward Euler. For k = 5 iterations, the peak is strongly damped but, because G has no phase errors, in the correct place. After k = 10 iterations, Parareal has corrected for most the numerical damping and already provides a reasonable approximation, even though the amplitude of most wave numbers in the spectrum is still severely underestimated. However, the lack of phase errors and resulting numerical dispersion avoids the "bulge" and distortions that were present in Figure 4. Finally, for k = 15 iterations, the solution provided by Parareal is indistinguishable from the exact solution. Small underestimation of the amplitudes of larger wave numbers can still be seen in the spectrum, but the effect is minimal. Note that this does not mean that Parareal will provide speedup -in a realistic scenario, where F is not exact but a time stepping method, too, it would depend on how many iterations are required for Parareal to be as accurate as the fine method run serially and the actual runtimes of both propagators. All that can be said so far is that avoiding coarse propagator phase errors avoids the instability and leads to faster convergence.
The effect of eliminating phase errors in the coarse method can also be illustrated by analysing the maximum singular value σ of the error propagation matrix. Figure 9 shows σ depending on the wave number κ for three different coarse propagators: the backward Euler, the artificially constructed propagator R 1 with no phase error and the artificially constructed propagator R 2 with no amplitude error. For the backward Euler method, σ is larger than one for significant parts of the spectrum, indicating possible non-monotonous convergence for these modes. The situation is even worse for R 2 , mirroring the problems with a non-diffusive coarse method like the trapezoidal rule mentioned above. For R 1 , however, σ remains below one across the whole spectrum, so that Parareal will converge monotonically for every mode. Since σ approaches one for medium to high wave numbers, convergence there is potentially very slow, in line with the errors seen in the upper part of the spectrum of the Gauss peak. However, in contrast to the other two cases, these wave numbers will not trigger instabilities.
In summary, these results strongly suggest that phase errors in the coarse method are responsible for the instability, which is in line with previous findings that Parareal can quickly correct even for very strong numerical diffusion as long as a wave is placed at roughly the correct position by the coarse predictor [29].
Observation 3 The instability in Parareal seems to be caused by phase errors in the coarse propagator while amplitude errors are quickly corrected by the iteration.

Relation to asymptotic Parareal
It is interesting to point out how the R 1 propagator with exact phase speed is related to the asymptotic Parareal method developed by Haut et al. [18]. The exact propagator for (25) is given by Therefore, we have This leads to the purely diffusive coarse level problem with restriction operator e Uiκt and interpolation e −Uiκt taking care of the propagation part. This is precisely the strategy pursued in the nonlinear case by "asymptotic Parareal" where they factor out a fast term with purely imaginary eigenvalues, related to acoustic waves. In a sense, their approach can be understood as an attempt to construct a coarse method with minimal phase speed error. Of course, evaluation of the transformation is not trivial for more complex problems and requires a sophisticated approach [30], in contrast to the here studied linear advection-diffusion equation where the transformation is simply multiplication by e −Uiκt and e Uiκt .

Phase error or mismatch
So far, we have always assumed that the fine method is exact. This leaves the question whether the instability is triggered by phase errors in the coarse method or simply by a mismatch in phase speeds between fine and coarse level. In order to see if the instability arises if both fine and coarse level have the same large phase error, we replace the fine propagator stability function by Now, the fine propagator is a method with exact amplification factor but a discrete phase speed that is as inaccurate as the coarse method. While such an integrator would not make for a very useful method in practice it is valuable for illustrative purposes. The coarse method is again the standard implicit Euler. Figure 10 shows the phase speed (left) and amplification factor (right) of Parareal for this configuration. Since the fine and coarse method have the same (highly inaccurate) phase  speed, Parareal matches the fine method's phase speed exactly from the first iteration and all lines coincide. The amplification factor converges quickly to the correct value and looks almost identical to the case where a coarse propagator with exact phase speed was used, compare for Figure 7 (left). No instability occurs and amplification factors are below one across the whole spectrum for all iterations. Figure 11 shows how Parareal converges for this configuration in physical and spectral space. Because both fine and coarse method now have substantial phase error, the Gauss peak is at a completely wrong position. However, for k = 10, Parareal already approximates it reasonably well and shows no sign of instability. Convergence looks again very simi-lar to the results shown in Figure 8 except for the wrong position of the Gauss peak. While making the fine method as inaccurate as the coarse method is clearly not a useful strategy to stabilise Parareal, this experiment nevertheless demonstrates that the instability is triggered by different discrete phase speeds in the coarse and fine method.
Observation 4 Analysing further the issue of phase errors shows that the instability seems to arise from mismatches between the phase speed of coarse and fine propagator.
It is interesting to note that a very similar observation was made by Ernst and Gander for multi-grid methods for the Helmholtz equation. There, the "coarse grid correction fails because of the incorrect dispersion relation (phase error) on coarser and coarser grids [...]" [9]. They find that adjusting the wave number of the coarse level problem in relation to the mesh size leads to rapid convergence of their multi-grid solver. Investigating if and how their approach might be applied to Parareal (which can also be considered as a multi-grid in time method [17]) would be a very interesting direction for future research. Furthermore, it seems possible that parallel-in-time methods with more than two levels like MGRIT [10] or PFASST [8] could yield some improvement, because they would allow for less drastic changes in resolution compared to two-level Parareal.

Coarse time step
Using a smaller time step for the coarse method will obviously reduce its phase error and can thus be expected to benefit Parareal convergence. Figure 12 shows that this is indeed true. It shows discrete phase speed (left) and amplification factor (right) for the same configuration as used for Figure 3, except now using two coarse step per time slice instead of one. Since the coarse propagator alone is now already significantly more accurate, Parareal with k = 5 and k = 10 iterations provides more accurate phase speeds and, for k = 15, reproduces the exact value exactly, The reduced phase errors translate into a milder instability. For k = 10, some wave numbers have amplification factors above one, but both the range of unstable wave numbers and the severity of the instability are much smaller than if only a single coarse time step is used. This explains why configurations can be quite successful where both F and G use nearly identical time steps and the difference in runtime is achieved by other means, e.g. an expensive high order spatial discretisation for the fine and a cheap low order discretisation on the coarse level.
Observation 5 Since phase errors of the coarse method obviously depend on its time step size, reducing the coarse time step helps to reduce the range of unstable wave numbers and the severity of the instability.

Number of time slices
All examples so far only considered P = 16 time slices and processors. To illustrate the effect of increasing P, Figure 13 shows the discrete dispersion relation for standard Parareal for P = 64 time slices or processors (same configuration as in Figure 3 except for P). Even for k = 15 iterations, Parareal reproduces the correct phase speed (left figure) very poorly -waves across large parts of the spectrum suffer from substantial numerical slowdown. Also, converge is slow and there is only marginal improvement from k = 5 to k = 15 it-erations. Convergence is somewhat faster for the amplification factor (right figure) with more substantial improvement for k = 15 over the coarse method. However, there also remains significant numerical attenuation of the upper half of the spectrum. If integrating the Gauss peak with this configuration, the result at T = 64 after k = 5 iterations is essentially a straight line (not shown) as almost all modes beyond κ = 0 are strongly damped. A small overshoot at around κ = 1 is noticeable for k = 15 iterations and this will worsen as k increases. In general, as P increases, it takes more iterations to trigger the instability since the slow convergence requires longer to correct for the strong diffusivity of the coarse method.
These results suggest that the high wave numbers are the slowest to converge and that convergence deteriorates as P increases. This is confirmed by Figure 14, showing the maximum singular value for three wave numbers plotted against P. Convergence generally gets worse as P increases, but note that even for P = 64 the low wave number mode (blue circles) still converges monotonically while the high wave number mode (green crosses) might already converge nonmonotonically for only P = 4 processors. There also seems to be a limit for σ as P increases, with higher wave numbers levelling off at higher values of σ .
Therefore, Parareal could provide some speedup for linear hyperbolic problems if the solution consists mainly of very low wave number modes and/or numerical diffusion in the fine propagator is sufficiently strong to remove higher wave number modes. This also explains why divergence damping in the fine propagator can accelerate convergence of Parareal [29], as it removes exactly the high wave number modes that converge the slowest.
Observation 6 While convergence becomes slower as the number of processors P increases, low wave numbers converge monotonically even for large numbers of P but high wave numbers might not do so already for P = 4.

Wave number
The analysis above showed that higher wave numbers converge slower and are more susceptible to instabilities. This is confirmed in Figure 15 showing the difference between the Parareal and fine integrator stability function The smallest wave number, κ = 0.45, converges quickly in the hyperbolic (upper) and diffusive case (lower). For κ = 0.9, both cases show some initial stalling before the mode converges. Finally, κ = 2.69 shows first a significant increase in the defect before convergence sets in as k comes close to P = 16. Interestingly, this is the case for both ν = 0 and ν = 0.1. While the "bulge" is even more pronounced in the diffusive case, since modes decay proportional to e −νκ 2 t in amplitude, the absolute values of the defect are orders of magnitude smaller. Therefore, at least until k = 7 iterations, it is no longer the high wave number mode κ = 2.69 that will restrict performance, but rather the lower wave number κ = 0.9. Then, the instability for the high wave number kicks in and for k ≥ 8 wave number κ = 2.69 is again causing the largest defect. As ν increases, however, the defects for κ = 2.69 will reduce further, the cross-over point will move to later iterations and eventually lower wave numbers will determine convergence for all iterations. In a sense, in line with the analysis above, Parareal propagates high wave number modes very wrongly in both cases, but since high wave number modes are quickly attenuated if ν > 0, it does not matter very much in the diffusive case. Figure 16 confirms this for a wider range of wave numbers κ. It shows the maximum singular value σ of the error propagation matrix E over the whole spectrum for three different values of ν. For ν = 0 (hyperbolic case), σ increases monotonically with κ and the highest wave number converges the slowest. After around κ ≥ 0.8, the singular values are larger than one and convergence becomes potentially non-monotone. For ν = 0.1, σ increases until around κ = 1.8 and then decreases again. Therefore, the slowest converging mode is no longer at the end but in the middle of the spectrum. Also, we now have σ < 1 for all κ so that all modes will converge, even though some potentially very slowly. Increasing diffusion further to ν = 0.5 greatly improves convergence for all modes, the largest σ across the whole spectrum is now below 0.5. The worst converging mode has also moved "further down" the spectrum and is now at around κ = 1.0. This shows how the strong natural damping of high wave numbers from diffusion counteracts Parareal's tendency to amplify them and thus stabilises it.
Observation 7 Since diffusion naturally damps higher wave numbers, it removes the issue of slow or no convergence at the end of the spectrum. Therefore, as the diffusivity parameter ν increases, the wave number that converges the slowest and determines performance of Parareal becomes smaller.

Conclusions
Efficient parallel-in-time integration of hyperbolic and advection-dominated problems has been shown to be problematic. This prevents application of a promising new parallelisation strategy to many problems in computational fluid dynamics, despite the urgent need for better utilisation of massively parallel computers. For the Parareal parallel-in-time method, mathematical theory has shown that the algorithm is either unstable or inefficient when applied to hyperbolic equations, but so far no detailed analysis exists of how exactly the instability emerges.
The paper presents the first detailed analysis of how Parareal propagates waves and the ways in which the instability is triggered. It uses a formulation of Parareal as a preconditioned fixed point iteration for linear problems to derive its stability function. From there, a discrete dispersion relation is obtained that allows to study the phase speed and amplitude errors from Parareal when computing wave-like solutions. To deal with issues arising from increasing the time interval together with the number of processors, a simple procedure is introduced to normalise the stability function to the unit interval.    Analysis of the discrete dispersion relation and the maximum singular value of the error propagation matrix then allows to make a range of observations, illustrating where the issues of Parareal for wave problems originate. A key finding is that the source of the instability are different discrete phase speeds on the coarse and fine level, which cause instability of higher wave number modes. Interestingly, the overestimation of high wave number amplitudes is present in diffusive problems, too, but since there these amplitudes are naturally strongly damped, this does not trigger instabilities. Further analysis addresses the role of the number of processors, the coarse time step size and comments on possible connections to asymptotic Parareal and multi-grid methods for the Helmholtz equation.
The analysis presented here will be useful to interpret and understand performance of Parareal for more complex problems in computational fluid dynamics. A natural line of future research would be to attempt to develop a new, more stable, parallel-in-time method for hyperbolic problems based on the provided observations. For example, the update in Parareal proceeds component wise. That means that if the coarse propagator moves a wave at the wrong speed, the update will not know that a simple shift of entries could provide a good correction. Attempting to somehow modify the Parareal update to take into account this type of information seems promising, even though probably challenging to do in 3D. Extending the analysis presented here to systems with multiple waves, e.g. the shallow water equations, or to nonlinear problem where wave numbers interact would be another interesting line of inquiry. Furthermore, the framework used here to analyse Parareal is straightforward to adopt for other parallel-in-time integration methods as long as a matrix representation for them is available.