1 Introduction

This work is concerned with the parallel-in-time integration of evolution problems for which the solution can be well approximated by a time-dependent low-rank matrix. In particular, we aim to solve approximately the evolution problem

$$\begin{aligned} \dot{X}(t) = F(X(t)), \qquad X(0) = X_0, \qquad t \in [0,T], \end{aligned}$$
(1)

where X(t) is a matrix of size \(m \times m\). When the dimension m is large, the numerical solution of (1) can be very expensive since the matrix X(t) is usually dense. One way to alleviate this curse of dimensionality is to use low-rank approximations where, for every t, we approximate X(t) by \(Y(t) \in {\mathbb {R}}^{m \times m}\) such that \({{\,\mathrm{rank}\,}}(Y(t)) = r \ll m\). The accuracy of this approximation will depend on the choice of the rank r. Here, X(t) is assumed to be square for notational convenience and all results can be easily formulated for rectangular X(t).

A popular paradigm to solve directly for the low-rank approximation Y(t) is the dynamical low-rank approximation (DLRA), first proposed in [25]. As defined later in Def. 2, DLRA leads to an evolution problem that is a projected version of (1). In the last decade, many discrete integrators for this projected problem have been proposed. One class of integrators consists of a clever splitting of the projector so that the resulting splitting method can be implemented efficiently. An influential example is the projector-splitting scheme proposed in [29]. Other methods that require integrating parts of the vector field can be found in [4, 22]. Another approach, proposed in [9, 24, 35], is based on projecting standard Runge–Kutta methods (sometimes including their intermediate stages). Most of these methods are formulated for constant rank r. Rank adaptivity can be incorporated without much difficulty for splitting and for projected schemes; see [3, 6, 9, 35]. Finally, given the importance of DLRA in problems from physics (like the Schrödinger and Vlasov equations), the integrators in [7, 29] also preserve certain invariants, like energy. However, none of these time integrators considers a parallel-in-time scheme for DLRA, which is particularly interesting in the large-scale setting.

Parallel computing can be very effective and is even necessary to solve large-scale problems. While parallelization in space is well established, the time direction can also be parallelized to some extent when solving evolution problems. Over the last decades, various parallel-in-time algorithms have been proposed; see, e.g., the overviews [12, 32]. Among these, the Parareal algorithm from [28] is one of the more popular algorithms for time parallelization. It is based on a Newton-like iteration, with inaccurate but cheap corrections performed sequentially, and accurate but expensive solves performed in parallel. This idea of solving an evolution problem in parallel as a nonlinear (discretized) system also appears in related methods like PFASST [8], MGRIT [11] and Space-Time Multi-Grid [17]. Theoretical results and numerical studies on large numbers of cores show that these parallel-in-time methods can have good parallel performance for parabolic problems; see, e.g., [16, 21, 37]. So far, these methods have not incorporated a low-rank compression of the space dimension, which is the main topic of this work.

2 Preliminaries and contributions

2.1 The Parareal algorithm

The Parareal iteration in Def. 1 below is given for a constant time step h (hence \(T=Nh\)) and for autonomous F. Neither restriction is crucial, but both ease the presentation. The quantity \(X_n^k\) is an approximation of \(X(t_n)\) at time \(t_n = nh\) and iteration k. The accuracy of this approximation is expected to improve with increasing k. Here, and throughout the paper, we denote dependence on the iteration index k by a superscript \({}^{k}\), which should not be confused with the kth power.

Definition 1

(Parareal) The Parareal algorithm is defined by the following double iteration on k and n,

$$\begin{aligned}&\text {(Initial value)} \quad&X_0^k = X_0,\end{aligned}$$
(2)
$$\begin{aligned}&\text {(Initial approximation)} \quad&X_{n+1}^0 = {\mathscr {G}}^h (X_n^0),\end{aligned}$$
(3)
$$\begin{aligned}&\text {(Iteration)} \quad&X_{n+1}^{k+1}= {\mathscr {F}}^h (X_n^k) + {\mathscr {G}}^h (X_n^{k+1}) - {\mathscr {G}}^h (X_n^k). \end{aligned}$$
(4)

Here, \({\mathscr {F}}^h(X)\) represents a fine (accurate) time stepper applied to the initial value X and propagated until time h. Similarly, \({\mathscr {G}}^h(X)\) represents a coarse (inaccurate) time stepper.

Given two time steppers, the Parareal algorithm is easy to implement. A remarkable property of Parareal is that it converges in a finite number of iterations: the iterate \(X_n^k\) equals the fine solution whenever \(k \ge n\). It is well known that Parareal works well on parabolic problems but behaves worse on hyperbolic problems; see [18] for an analysis.
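To make the iteration concrete, the following Python sketch implements (2)–(4) with generic user-supplied steppers; the names `fine_step` and `coarse_step` are our own placeholders for \({\mathscr {F}}^h\) and \({\mathscr {G}}^h\), not code from the paper.

```python
# A minimal sketch of the Parareal iteration (2)-(4); `fine_step` and
# `coarse_step` are placeholders for the steppers F^h and G^h.
def parareal(X0, fine_step, coarse_step, N, K):
    """Return X[k][n], the approximation of X(t_n) at iteration k."""
    X = [[X0] for _ in range(K + 1)]
    for n in range(N):                       # initial coarse sweep (3)
        X[0].append(coarse_step(X[0][n]))
    for k in range(K):
        # The fine solves are independent over n and can run in parallel.
        F = [fine_step(X[k][n]) for n in range(N)]
        for n in range(N):                   # sequential correction (4)
            X[k + 1].append(F[n] + coarse_step(X[k + 1][n])
                            - coarse_step(X[k][n]))
    return X
```

In an actual parallel implementation, the list of fine solves is distributed over the time slices, while the cheap coarse sweep remains sequential.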

2.2 Dynamical low-rank approximation

Let \({\mathscr {M}}_r\) denote the set of \(m \times m\) matrices of rank r, which is a smooth embedded submanifold in \({\mathbb {R}}^{m \times m}\). Instead of solving (1), the DLRA solves the following projected problem:

Definition 2

(Dynamical low-rank approximation) For a rank r, the dynamical low-rank approximation of problem (1) is the solution of

$$\begin{aligned} \dot{Y}(t) = {\mathscr {P}}_{Y(t)} F(Y(t)), \qquad Y(0) = Y_0 \in {\mathscr {M}}_r, \end{aligned}$$
(5)

where \({\mathscr {P}}_{Y}\) is the \(l_2\)-orthogonal projection onto the tangent space \({\mathscr {T}}_{Y} {\mathscr {M}}_r\) of \({\mathscr {M}}_r\) at \(Y \in {\mathscr {M}}_r\). In particular, \(Y(t) \in {\mathscr {M}}_r\) for every \(t \in [0,T]\).
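For later reference, recall the well-known formula for this projection: if \(Y = USV^T\) with orthonormal \(U, V \in {\mathbb {R}}^{m \times r}\), then \({\mathscr {P}}_Y(Z) = UU^TZ + ZVV^T - UU^TZVV^T\). A minimal sketch (our own illustration, not code from the paper):

```python
# Sketch of the tangent-space projection: for Y = U S V^T in M_r with
# orthonormal U, V, project Z as P_Y(Z) = U U^T Z + Z V V^T - U U^T Z V V^T.
import numpy as np

def tangent_projection(U, V, Z):
    UtZ = U.T @ Z
    ZV = Z @ V
    return U @ UtZ + ZV @ V.T - U @ (UtZ @ V) @ V.T
```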

To analyze the approximation error of DLRA, we need the following standard assumptions from [23]. Here, and throughout the paper, \(\Vert \cdot \Vert \) denotes the Frobenius norm.

Assumption 1

(DLRA assumptions) The function F satisfies the following properties for all \(X,Y \in {\mathbb {R}}^{m \times m}\):

  • Lipschitz with constant L: \(\left\Vert F(X)-F(Y)\right\Vert \le L \left\Vert X-Y\right\Vert \).

  • One-sided Lipschitz with constant \(\ell \): \(\langle X-Y,F(X)-F(Y) \rangle \le \ell \left\Vert X-Y\right\Vert ^2\).

  • Maps almost to the tangent bundle of \({\mathscr {M}}_r\): \(\left\Vert F(Y)-{\mathscr {P}}_Y F(Y)\right\Vert \le \varepsilon _r\).

In the analysis in Section 3, it is necessary to have \(\ell < 0\) for convergence. This holds when F is a discretization of certain parabolic PDEs, like the heat equation. In particular, for an affine function of the form \(F(X)=A(X)+B\), one has \(\ell = \tfrac{1}{2}\lambda _{\max }(A+A^T)\), where \(A^T\) denotes the adjoint of the linear operator A; see [20, Ch. I.10]. The quantity \(\varepsilon _r\) is called the modeling error and decreases when the rank r increases. For our problems of interest, this quantity is typically very small. Finally, the Lipschitz constant L is only needed to guarantee uniqueness of the solution of (1); it will not appear in our analysis. We can therefore allow L to be large, as is the case for discretized parabolic PDEs.
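As a quick sanity check of this formula, the sketch below verifies numerically that \(\tfrac{1}{2}\lambda _{\max }(L+L^T)\) is indeed a one-sided Lipschitz constant when the linear operator A is represented by a matrix L acting on \(\mathrm {vec}(X)\); the operator and sizes are arbitrary test data, not taken from the paper.

```python
# Numerical sanity check: for affine F(X) = A(X) + B with A represented by a
# matrix L acting on vec(X), ell = (1/2) * lambda_max(L + L^T) is a one-sided
# Lipschitz constant. The operator and sizes here are arbitrary test data.
import numpy as np

rng = np.random.default_rng(0)
m = 5
L = rng.standard_normal((m * m, m * m))        # representation of A on vec(X)
ell = 0.5 * np.linalg.eigvalsh(L + L.T).max()

for _ in range(100):
    x, y = rng.standard_normal(m * m), rng.standard_normal(m * m)
    lhs = (x - y) @ (L @ (x - y))              # <X-Y, F(X)-F(Y)>; B cancels
    assert lhs <= ell * np.linalg.norm(x - y) ** 2 + 1e-12
```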

Standard theory for perturbations of ODEs allows us to obtain the following error bound from the assumptions above:

Theorem 1

(Error of DLRA [23]) Under Assumption 1, the DLRA verifies

$$\begin{aligned} \left\Vert \psi ^t_r(Y_0) - \phi ^t(X_0)\right\Vert \le e^{\ell t} \left\Vert Y_0 - X_0\right\Vert + \varepsilon _r \int _0^t e^{\ell s} ds, \end{aligned}$$
(6)

where \(\phi ^t\) is the flow of the original problem (1) and \(\psi ^t_r\) is the flow of its DLRA (5) for rank r.

The solution of the DLRA (5) is quasi-optimal compared to the best rank-r approximation of X(t). This can already be seen in Theorem 1 for short time intervals. Similar estimates exist for parabolic problems [5] and for longer times when there is a sufficiently large gap in the singular values and when their derivatives are bounded [25].

2.3 Contributions

In this paper, we propose a new algorithm, called low-rank Parareal. As far as we know, this is the first parallel-in-time integrator for low-rank approximations. We analyze the proposed algorithm when the function F in (1) is affine. To this end, we extend the analysis of the classical Parareal algorithm in [14] to a more general setup where the coarse problem is different from the fine problem. We prove that the method converges on diffusive problems (\(\ell <0\)) when the step size h is sufficiently large. Numerical experiments confirm this behavior; empirically, the method also performs well under a less strict condition on h and on a non-affine problem.

3 Low-rank Parareal

We now present our low-rank Parareal algorithm for solving (1). Since the cost of most discrete integrators for DLRA scales quadratically with the approximation rank, we take the coarse time stepper as DLRA with a small rank q. Likewise, the fine time stepper is DLRA with a large rank r. We can even take \(r=m\), which corresponds to computing the exact solution as the fine time stepper since \(Y \in {\mathbb {R}}^{m \times m}\).

Definition 3

(Low-rank Parareal) Consider two ranks \(q < r\). The low-rank Parareal algorithm iterates

$$\begin{aligned}&\text {(Initial value)} \quad&Y_0^k = Y_0, \end{aligned}$$
(7)
$$\begin{aligned}&\text {(Initial approximation)} \quad&Y_{n+1}^0 = \psi ^h_q\circ {\mathscr {T}}_q(Y_n^0) + {\mathscr {E}}_n,\end{aligned}$$
(8)
$$\begin{aligned}&\text {(Iteration)} \quad&Y_{n+1}^{k+1} = \psi ^h_r\circ {\mathscr {T}}_r(Y_n^k) + \psi ^h_q\circ {\mathscr {T}}_q(Y_n^{k+1}) - \psi ^h_q\circ {\mathscr {T}}_q(Y_n^k), \end{aligned}$$
(9)

where \(\psi ^h_r(Z)\) is the solution of (5) at time h with initial value \(Y_0 = Z\), and \({\mathscr {T}}_r\) is the orthogonal projection onto \({\mathscr {M}}_r\). The notations \(\psi ^h_q\) and \({\mathscr {T}}_q\) are similar but apply to rank q. The matrices \({\mathscr {E}}_n\) are small perturbations such that \({{\,\mathrm{rank}\,}}(Y_{n+1}^0) = r + 2q\) and can be chosen randomly.

Observe that the rank of \(Y_n^k\) is at most \(r+2q\) for all n, k. The low-rank structure is therefore preserved over the iterations. The matrices \({\mathscr {E}}_n\) ensure that each iterate has a rank between r and \(r+2q\). These matrices impact only the initial error and do not play any role in the convergence of the algorithm, as is shown later in the analysis. An efficient implementation should store the low-rank matrices in factored form; in that case, the truncated SVD can be performed efficiently. The DLRA flows \(\psi ^h_r\) and \(\psi ^h_q\) can only be computed exactly for relatively small problems. For larger problems, a suitable DLRA integrator must be used; see Sect. 4 for implementation details.
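A dense sketch of one sweep of (7)–(9) is given below; `dlra_step(Y, rank, h)` is our stand-in for the flow \(\psi ^h\) at the given rank, and, for readability, the truncation acts on full matrices rather than on the factored form an efficient implementation would use.

```python
# A dense sketch of one low-rank Parareal sweep (7)-(9). `dlra_step(Y, rank, h)`
# is a stand-in for the DLRA flow psi^h at the given rank; an efficient
# implementation would keep all iterates in factored low-rank form instead.
import numpy as np

def truncate(Y, rank):
    """Best rank-`rank` approximation T_rank(Y) via the truncated SVD."""
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]

def sweep(Y_prev, dlra_step, q, r, h):
    """Map the iterate (Y_n^k)_n to (Y_n^{k+1})_n according to (9)."""
    N = len(Y_prev) - 1
    Y_next = [Y_prev[0]]                      # initial value (7)
    # The fine solves below are independent over n: parallel in practice.
    fine = [dlra_step(truncate(Y_prev[n], r), r, h) for n in range(N)]
    for n in range(N):                        # sequential coarse correction
        Y_next.append(fine[n]
                      + dlra_step(truncate(Y_next[n], q), q, h)
                      - dlra_step(truncate(Y_prev[n], q), q, h))
    return Y_next
```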

3.1 Convergence analysis

Let \(X_{n} = X(t_{n})\) be the solution of the full problem (1) at time \(t_{n}\). Let \(Y_{n}^{k}\) be the corresponding low-rank Parareal solution at iteration k. We are interested in bounding the error of the algorithm,

$$\begin{aligned} \begin{aligned} E_n^k= X_{n} - Y_{n}^{k}, \end{aligned} \end{aligned}$$
(10)

for all relevant n and k. To this end, we make the following assumption:

Assumption 2

(Affine vector field) The function F is affine linear and autonomous, that is,

$$\begin{aligned} F(X)= A(X) + B \end{aligned}$$

with \(A:{\mathbb {R}}^{m \times m}\rightarrow {\mathbb {R}}^{m \times m}\) a linear operator and \(B \in {\mathbb {R}}^{m \times m}\).

The following lemma gives us a recursion for the Frobenius norm of the error. This recursion will be fundamental in deriving our convergence bounds later on when we generalize the proof for standard Parareal from [14].

Lemma 1

(Iteration of the error) Under the Assumptions 1 and 2, the error of low-rank Parareal verifies

$$\begin{aligned} \left\Vert E_{n+1}^{k+1}\right\Vert\le & {} e^{\ell h} C_{r,q} \left\Vert E_n^k\right\Vert + e^{\ell h} C_q \left\Vert E_n^{k+1}\right\Vert + e^{\ell h} \max _{n \ge 0} \left\Vert X_n - {\mathscr {T}}_r(X_n)\right\Vert \nonumber \\&+ (2\varepsilon _q + \varepsilon _r) \int _0^h e^{\ell (h-s)} ds. \end{aligned}$$
(11)

The constants \(\ell , \varepsilon _q, \varepsilon _r\) are defined in Assumption 1. Moreover, \(C_{r,q}\) and \(C_q\) are the Lipschitz constants of \({\mathscr {T}}_{r} - {\mathscr {T}}_{q}\) and \({\mathscr {T}}_q\).

Proof

Our proof is similar to the one in [24], where the continuous version of the approximation error of DLRA is studied first. Denote by \(\phi ^h(Z)\) the solution of (1) at time h with initial value \(X_0 = Z\). By definition, the discrete error is

$$\begin{aligned} E_{n+1}^{k+1}= \phi ^h(X_{n}) - \psi ^h_r\circ {\mathscr {T}}_r(Y_n^k) - \psi ^h_q\circ {\mathscr {T}}_q(Y_n^{k+1}) + \psi ^h_q\circ {\mathscr {T}}_q(Y_n^k). \end{aligned}$$

We can interpret each term above as a flow from \(t_n\) to \(t_{n+1}=t_n+h\). Denote these flows by X(t), Z(t), W(t), and V(t) with the initial values

$$\begin{aligned} X(t_n) = X_n, \ Z(t_n) = {\mathscr {T}}_r(Y_n^k), \ W(t_n) = {\mathscr {T}}_q(Y_n^{k+1}), \ V(t_n) = {\mathscr {T}}_q(Y_n^k). \end{aligned}$$
(12)

Defining the continuous error as

$$\begin{aligned} E(t) = X(t) - Z(t) - W(t) + V(t), \end{aligned}$$

we then get the identity \(E_{n+1}^{k+1}= E(t_{n}+h)\).

We proceed by bounding \(\left\Vert E(t)\right\Vert \). By definition of the flows above, we have (omitting the dependence on t in the notation)

$$\begin{aligned} \frac{d}{dt} E&= F(X) - {\mathscr {P}}_{Z} F(Z) - {\mathscr {P}}_{W} F(W) + {\mathscr {P}}_{V} F(V) \\&= A(E) + \big ( F(Z) - {\mathscr {P}}_{Z} F(Z) \big ) + \big ( F(W) - {\mathscr {P}}_{W} F(W) \big ) - \big ( F(V) - {\mathscr {P}}_{V} F(V) \big ), \end{aligned}$$

where the last equality holds since the function F is affine: \(F(X) - F(Z) - F(W) + F(V) = A(X - Z - W + V) = A(E)\) because the constant part B cancels. Using Assumption 1 and the Cauchy–Schwarz inequality (the one-sided Lipschitz bound for the first term, and the tangent-space defect bounds \(\varepsilon _r\) for the rank-r flow Z and \(\varepsilon _q\) for the rank-q flows W and V), we compute

$$\begin{aligned} \frac{1}{2} \frac{d}{dt} \left\Vert E\right\Vert ^2 = \Big \langle E, \frac{d}{dt} E \Big \rangle \le \ell \left\Vert E\right\Vert ^2 + (\varepsilon _r + 2 \varepsilon _q) \left\Vert E\right\Vert . \end{aligned}$$
Since \(\frac{d}{dt} \left\Vert E(t)\right\Vert ^2 = 2 \left\Vert E(t)\right\Vert \frac{d}{dt} \left\Vert E(t)\right\Vert \), we therefore obtain the differential inequality

$$\begin{aligned} \frac{d}{dt} \left\Vert E(t)\right\Vert \le \ell \left\Vert E\right\Vert + \varepsilon _r + 2 \varepsilon _q. \end{aligned}$$

Grönwall's lemma allows us to conclude

$$\begin{aligned} \left\Vert E(t_n+h)\right\Vert \le \left\Vert E(t_n)\right\Vert e^{\ell h} + (2 \varepsilon _q + \varepsilon _r) \int _0^h e^{\ell (h-s)} ds. \end{aligned}$$
(13)

From (12), we get

$$\begin{aligned} E(t_n)&= X_n - {\mathscr {T}}_r(Y_n^k) - {\mathscr {T}}_q(Y_n^{k+1}) + {\mathscr {T}}_q(Y_n^k). \end{aligned}$$

Denoting \({\mathscr {T}}_r^{\perp }= I - {\mathscr {T}}_r\) and \({\mathscr {T}}_{r,q} = {\mathscr {T}}_r- {\mathscr {T}}_q\), we get after rearranging terms

$$\begin{aligned} E(t_n)= & {} {\mathscr {T}}_r^{\perp }(X_n) + {\mathscr {T}}_r(X_n) - {\mathscr {T}}_q(X_n) - {\mathscr {T}}_r(Y_n^k) + {\mathscr {T}}_q(Y_n^k) + {\mathscr {T}}_q(X_n) - {\mathscr {T}}_q(Y_n^{k+1}) \\= & {} {\mathscr {T}}_r^{\perp }(X_n) + {\mathscr {T}}_{r,q}(X_n) - {\mathscr {T}}_{r,q}(Y_n^k) + {\mathscr {T}}_q(X_n) - {\mathscr {T}}_q(Y_n^{k+1}). \end{aligned}$$

Taking norms gives

$$\begin{aligned} \left\Vert E(t_n)\right\Vert&\le \left\Vert {\mathscr {T}}_r^{\perp }(X_n)\right\Vert + \left\Vert {\mathscr {T}}_{r,q}(X_n) - {\mathscr {T}}_{r,q}(Y_n^k)\right\Vert + \left\Vert {\mathscr {T}}_q(X_n) - {\mathscr {T}}_q(Y_n^{k+1})\right\Vert \nonumber \\&\le \max _{n \ge 0} \left\Vert {\mathscr {T}}_r^{\perp }(X_n)\right\Vert + C_{r,q} \left\Vert E_n^k\right\Vert + C_q \left\Vert E_n^{k+1}\right\Vert , \end{aligned}$$
(14)

where \(C_{r,q}\) and \(C_q\) are the Lipschitz constants of \({\mathscr {T}}_{r,q}\) and \({\mathscr {T}}_q\) respectively. Combining inequalities (13) and (14) gives the statement of the lemma.\(\square \)

We now study the error recursion (11) in more detail. To this end, let us slightly rewrite it as

$$\begin{aligned} \left\Vert E_{n+1}^{k+1}\right\Vert \le \alpha \left\Vert E_n^k\right\Vert + \beta \left\Vert E_n^{k+1}\right\Vert + \kappa , \quad \left\Vert E_n^0\right\Vert \le \gamma , \end{aligned}$$
(15)

with the non-negative constants

$$\begin{aligned} \begin{aligned} \alpha&= e^{\ell h} C_{r,q}, \quad \beta = e^{\ell h} C_q, \quad \gamma = \max _{n \ge 0} \left\Vert E_n^0\right\Vert ,\\ \kappa&= e^{\ell h} \max _{n \ge 0} \left\Vert X_n - {\mathscr {T}}_r(X_n)\right\Vert + (2 \varepsilon _q + \varepsilon _r) \int _0^h e^{\ell (h-s)} ds. \end{aligned} \end{aligned}$$
(16)

Our first result is a linear convergence bound, up to the DLRA approximation error. It is similar to the linear bound for standard Parareal.

Theorem 2

(Linear convergence) Under the Assumptions 1 and 2, and if \(\alpha + \beta < 1\), low-rank Parareal verifies for all \(k \in {\mathbb {N}}\) the linear bound

$$\begin{aligned} \max _{n \ge 0} \left\Vert E_n^k\right\Vert \le \left( \frac{\alpha }{1 - \beta } \right) ^k \max _{n\ge 0} \left\Vert E_n^0\right\Vert + \frac{\kappa }{1 - \alpha - \beta }, \end{aligned}$$
(17)

where \(\alpha , \beta , \kappa \) are defined in (16).

Proof

Define \(e_{\star }^k = \max _{n \ge 0} \left\Vert E_n^k\right\Vert \). Taking the maximum for \(n \ge 0\) of both sides of (15), we obtain

$$\begin{aligned} e_{\star }^{k+1} \le \alpha e_{\star }^k + \beta e_{\star }^{k+1} + \kappa . \end{aligned}$$

By assumption, \(0 \le \beta < 1\) and we can therefore obtain the recursion

$$\begin{aligned} e_{\star }^{k+1} \le \frac{\alpha }{1-\beta } e_{\star }^k + \frac{\kappa }{1-\beta }, \end{aligned}$$

with solution

$$\begin{aligned} e_{\star }^k \le \left( \frac{\alpha }{1 - \beta }\right) ^k e_{\star }^0 + \frac{\kappa }{1- \alpha -\beta } \left[ 1 - \left( \frac{\alpha }{1 - \beta }\right) ^k\right] . \end{aligned}$$

By assumption, we also have \(0 \le \frac{\alpha }{1-\beta }<1\), which allows us to obtain the statement of the theorem.\(\square \)

Next, we present a more refined superlinear bound. To this end, we require the following technical lemma that solves the equality version of the double iteration (15). A similar result, but without the \(\kappa \) term and only as an upper bound, already appeared in [14, Thm. 1]. Our proof is therefore similar but more elaborate.

Lemma 2

Let \(\alpha , \beta , \gamma , \kappa \in {\mathbb {R}}\) be any non-negative constants such that \(\alpha < 1\) and \(\beta < 1\). Let \(e_n^k\) be a sequence depending on \(n, k \in {\mathbb {N}}\) such that

$$\begin{aligned} e_{n+1}^{k+1} = \alpha e_n^k + \beta e_n^{k+1} + \kappa , \qquad e_{n+1}^0 = \gamma , \qquad e_0^k = 0. \end{aligned}$$
(18)

Then,

$$\begin{aligned} e_n^k = \kappa \sum _{j=0}^{k-1} \sum _{i=0}^{n-j-1} \left( {\begin{array}{c}i+j\\ i\end{array}}\right) \ \alpha ^j \beta ^i + {\left\{ \begin{array}{ll} 0 &{} \text { if } n \le k, \\ \gamma \ \alpha ^k \sum _{i=0}^{n-k-1} \left( {\begin{array}{c}i+k-1\\ i\end{array}}\right) \ \beta ^i&\text { if } n \ge k+1. \end{array}\right. } \end{aligned}$$
(19)

Proof

The idea is to use the generating function \(\rho _k(\xi ) = \sum _{n=1}^{\infty }e_n^k \xi ^n\) for \(k \ge 1\). Multiplying (18) by \(\xi ^{n+1}\) and summing over n, we obtain

$$\begin{aligned} \sum _{n=0}^{\infty }e_{n+1}^{k+1} \xi ^{n+1}= & {} \sum _{n=0}^{\infty }\alpha e_n^k \xi ^{n+1} + \sum _{n=0}^{\infty }\beta e_n^{k+1} \xi ^{n+1} + \sum _{n=0}^{\infty }\kappa \xi ^{n+1}, \\ \sum _{n=0}^{\infty }e_{n+1}^0 \xi ^{n+1}= & {} \sum _{n=0}^{\infty }\gamma \xi ^{n+1}. \end{aligned}$$

Since \(e_0^k = 0\) for all k, this gives the relations

$$\begin{aligned} \rho _{k+1}(\xi ) = \alpha \xi \rho _k (\xi ) + \beta \xi \rho _{k+1} ( \xi ) + \kappa \frac{\xi }{1-\xi }, \qquad \rho _0(\xi ) = \gamma \frac{\xi }{1 - \xi }. \end{aligned}$$

We can therefore obtain the linear recurrence

$$\begin{aligned} \rho _{k+1}(\xi ) = a \rho _k(\xi ) + b, \quad \text {where } a = \frac{\alpha \xi }{1 - \beta \xi }, \ b = \frac{\kappa \xi }{(1-\xi )(1-\beta \xi )}. \end{aligned}$$

Its solution satisfies

$$\begin{aligned} \rho _k(\xi ) = \frac{\alpha ^k \xi ^k}{(1 - \beta \xi )^k} \frac{\gamma \xi }{1 - \xi } + \sum _{j=0}^{k-1} \frac{\alpha ^j \xi ^j}{(1-\beta \xi )^{j+1}} \frac{\kappa \xi }{1 - \xi }. \end{aligned}$$

It remains to compute the coefficients in the power series of the above formula since by definition of \(\rho _k(\xi ) = \sum _{n=1}^{\infty }e_n^k \xi ^n\) they equal the unknowns \(e_n^k\). The binomial series formula for \(|z|<1\),

$$\begin{aligned} \frac{1}{(1-z)^{k+1}} = \sum _{i=0}^{\infty } \left( {\begin{array}{c}i + k\\ i\end{array}}\right) z^i, \end{aligned}$$
(20)

together with the Cauchy product gives

$$\begin{aligned} \frac{1}{(1-\beta \xi )^k} \frac{1}{1-\xi }&= \sum _{i=0}^{\infty } \left( {\begin{array}{c}i+k-1\\ i\end{array}}\right) \beta ^i \xi ^i \cdot \sum _{i=0}^{\infty } \xi ^i \\&= \sum _{n=0}^{\infty }\left( \sum _{\ell =0}^n \left( {\begin{array}{c}\ell +k-1\\ \ell \end{array}}\right) \beta ^{\ell } \right) \xi ^n \\ \frac{1}{(1-\beta \xi )^{j+1}} \frac{1}{1-\xi }&= \sum _{i=0}^{\infty } \left( {\begin{array}{c}i+j\\ i\end{array}}\right) \beta ^i \xi ^i \cdot \sum _{i=0}^{\infty } \xi ^i = \sum _{n=0}^{\infty }\left( \sum _{\ell =0}^n \left( {\begin{array}{c}\ell +j\\ \ell \end{array}}\right) \beta ^{\ell } \right) \xi ^n. \end{aligned}$$

Hence, the first term in \(\rho _k(\xi )\) satisfies

$$\begin{aligned} \frac{\alpha ^k \xi ^k}{(1 - \beta \xi )^k} \frac{\gamma \xi }{1 - \xi }&= \gamma \alpha ^k \sum _{n=0}^{\infty }\left( \sum _{\ell =0}^n \left( {\begin{array}{c}\ell +k-1\\ \ell \end{array}}\right) \beta ^{\ell } \right) \xi ^{n+k+1}\nonumber \\&\quad = \gamma \alpha ^k \sum _{n=k+1}^{\infty } \left( \sum _{\ell =0}^{n-k-1} \left( {\begin{array}{c}\ell +k-1\\ \ell \end{array}}\right) \beta ^{\ell } \right) \xi ^n, \end{aligned}$$

while the second term can be written as

$$\begin{aligned} \sum _{j=0}^{k-1} \frac{\alpha ^j \xi ^j}{(1-\beta \xi )^{j+1}} \frac{\kappa \xi }{1 - \xi }= & {} \kappa \sum _{j=0}^{k-1} \sum _{n=0}^{\infty }\sum _{\ell =0}^{n} \left( {\begin{array}{c}\ell +j\\ \ell \end{array}}\right) \alpha ^j \beta ^{\ell } \xi ^{n+j+1} \nonumber \\= & {} \kappa \sum _{n=1}^{\infty } \sum _{j=0}^{k-1} \left( \sum _{\ell =0}^{n-1} \left( {\begin{array}{c}\ell +j\\ \ell \end{array}}\right) \alpha ^j \beta ^{\ell } \right) \xi ^{n+j}. \end{aligned}$$

Putting everything together, we have

$$\begin{aligned} \sum _{n=0}^{\infty }e_n^k \xi ^n&= \gamma \alpha ^k \sum _{n=k+1}^{\infty } \left( \sum _{\ell =0}^{n-k-1} \left( {\begin{array}{c}\ell +k-1\\ \ell \end{array}}\right) \beta ^\ell \right) \xi ^n \\&\quad + \kappa \sum _{m=1}^{\infty } \sum _{j=0}^{k-1} \left( \sum _{\ell =0}^{m-1} \left( {\begin{array}{c}\ell +j\\ \ell \end{array}}\right) \alpha ^j \beta ^{\ell } \right) \xi ^{m+j}. \end{aligned}$$

Finally, we can identify the coefficient \(e_n^k\) in front of \(\xi ^n\) with those on the right-hand side. The coefficient for \(\xi ^n\) in the first term is clearly nonzero only when \(n \ge k+1\). In the second term, there is only one m for every j such that \(m+j=n\). Substituting \(m=n-j\) allows us to identify the coefficient of \(\xi ^n\).\(\square \)
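The closed form (19) is easy to check numerically against the recursion (18); the constants below are arbitrary test values, not from the paper.

```python
# Numerical check of the closed form (19) against the recursion (18);
# alpha, beta, gamma, kappa are arbitrary test values.
from math import comb

alpha, beta, gamma, kappa = 0.3, 0.4, 1.0, 1e-3
N, K = 12, 8

e = [[0.0] * (K + 1) for _ in range(N + 1)]    # e[n][k], with e[0][k] = 0
for n in range(N):
    e[n + 1][0] = gamma
    for k in range(K):
        e[n + 1][k + 1] = alpha * e[n][k] + beta * e[n][k + 1] + kappa

def closed_form(n, k):
    val = kappa * sum(comb(i + j, i) * alpha**j * beta**i
                      for j in range(k) for i in range(n - j))
    if n >= k + 1:                             # second case of (19)
        val += gamma * alpha**k * sum(comb(i + k - 1, i) * beta**i
                                      for i in range(n - k))
    return val

assert all(abs(e[n][k] - closed_form(n, k)) < 1e-12
           for n in range(N + 1) for k in range(1, K + 1))
```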

Fig. 1: Bounds derived for several values of \(\alpha \) and \(\beta \). In all panels \(n=30\), \(\gamma =1\), and \(\kappa =10^{-15}\)

Using the previous lemma, we can obtain a convergence bound that is superlinear in k.

Theorem 3

(Superlinear convergence) Under the Assumptions 1 and 2, and if \(\alpha + \beta < 1\), the error of low-rank Parareal satisfies for all \(n,k \in {\mathbb {N}}\) the bound

$$\begin{aligned} \left\Vert E_n^k\right\Vert \le \frac{\alpha ^k}{(k-1)!} \frac{\prod _{j=2}^k (n-j)}{1 - \beta } \max _{n \ge 0} \left\Vert E_n^0\right\Vert + \frac{\kappa }{1 - \alpha - \beta }, \end{aligned}$$
(21)

where \(\alpha , \beta , \kappa \) are defined in (16).

Proof

Define \(e_n^k = \left\Vert E_n^k\right\Vert \). By Lemma 1, the terms \(e_n^k\) verify the relation described in Lemma 2 with \(=\) replaced by \(\le \) in (18). Hence, the solution (19) from Lemma 2 will be an upper bound for \(e_n^k\).

Since \(0 \le \alpha + \beta < 1\) and using the binomial series formula (20), we bound the first term in (19) as

$$\begin{aligned} \kappa \sum _{j=0}^{k-1} \sum _{i=0}^{n-j-1} \left( {\begin{array}{c}i+j\\ i\end{array}}\right) \alpha ^j \beta ^i&\le \kappa \sum _{j=0}^{k-1} \sum _{i=0}^{\infty } \left( {\begin{array}{c}i+j\\ i\end{array}}\right) \alpha ^j \beta ^i = \kappa \sum _{j=0}^{k-1} \alpha ^j \frac{1}{(1-\beta )^{j+1}} \\&\le \frac{\kappa }{1-\beta } \sum _{j=0}^{\infty } \left( \frac{\alpha }{1-\beta } \right) ^j = \frac{\kappa }{1-\beta } \frac{1}{1-\frac{\alpha }{1-\beta }} = \frac{\kappa }{1 - \alpha - \beta }. \end{aligned}$$

For \(0 \le i \le n-k-1\) and \(n \ge k+1\), observe that

$$\begin{aligned} \frac{(i+k-1)!}{i!} = \prod _{j=1}^{k-1} (i+j) \le \prod _{j=1}^{k-1} (n-k-1+j) = \prod _{j=2}^{k} (n-j). \end{aligned}$$

Since \(0 \le \beta < 1\), we can therefore bound the second term as

$$\begin{aligned} \gamma \ \alpha ^k \sum _{i=0}^{n-k-1} \left( {\begin{array}{c}i+k-1\\ i\end{array}}\right) \beta ^i&= \gamma \ \alpha ^k \sum _{i=0}^{n-k-1} \frac{(i+k-1)!}{i! (k-1)!} \beta ^i \\&\le \gamma \ \frac{\alpha ^k}{(k-1)!} \prod _{j=2}^{k} (n-j) \sum _{i=0}^{n-k-1} \beta ^i \\&\le \gamma \ \frac{\alpha ^k}{(k-1)!} \frac{\prod _{j=2}^k (n-j)}{1 - \beta }. \end{aligned}$$

The conclusion now follows by the definition of \(\gamma \).\(\square \)

The proof above can be modified to obtain a simple linear bound that is similar to, but different from, the one in Theorem 2:

Theorem 4

(Another linear convergence bound) Under Assumptions 1 and 2, and if \(\alpha + \beta < 1\), the error of low-rank Parareal satisfies for all \(n,k \in {\mathbb {N}}\) the bound

$$\begin{aligned} \left\Vert E_n^k\right\Vert \le \alpha ^k (1+\beta )^{n-1} \max _{n \ge 0} \left\Vert E_n^0\right\Vert + \frac{\kappa }{1 - \alpha - \beta }, \end{aligned}$$
(22)

where \(\alpha , \beta , \kappa \) are defined in (16).

Proof

We repeat the proof for the superlinear bound but this time, the second term is bounded as

$$\begin{aligned} \gamma \ \alpha ^k \sum _{i=0}^{n-k-1} \left( {\begin{array}{c}i+k-1\\ i\end{array}}\right) \beta ^i \le \gamma \ \alpha ^k \sum _{i=0}^{n-1} \left( {\begin{array}{c}n-1\\ i\end{array}}\right) \beta ^i = \gamma \ \alpha ^k \ (1+\beta )^{n-1}. \end{aligned}$$

\(\square \)

Remark 1

In the proof above, yet another bound based on (20) is

$$\begin{aligned} \gamma \ \alpha ^k \sum _{i=0}^{n-k-1} \left( {\begin{array}{c}i+k-1\\ i\end{array}}\right) \beta ^i \le \gamma \ \alpha ^k \sum _{i=0}^{\infty } \left( {\begin{array}{c}i+k-1\\ i\end{array}}\right) \beta ^i = \gamma \ \alpha ^k \ \frac{1}{(1-\beta )^k}. \end{aligned}$$

This time we recover the linear bound from Theorem 2.

3.2 Summary of the convergence bounds

In the previous section, we have proven four upper bounds for the error of low-rank Parareal. The first is directly obtained from Lemma 2. It is the tightest bound but its expression is too unwieldy for practical use. The other three bounds can be summarized as

$$\begin{aligned} \left\Vert E_n^k\right\Vert \le B_{n,k} \max _{n \ge 0} \left\Vert E_n^0\right\Vert + \frac{\kappa }{1 - \alpha - \beta }, \end{aligned}$$
(23)

with

  \(B_{n,k}\)                                                      Rate of (23) in k
  \(\alpha ^k (1 - \beta )^{-k}\)                                  Linear
  \(\alpha ^k (1+\beta )^{n-1}\)                                   Linear
  \(\alpha ^k (1 - \beta )^{-1} \prod _{j=2}^k (n-j) / (k-1)!\)    Superlinear

Each of these practical bounds describes a different phase of the convergence, and none is always better than the others. In Fig. 1, we have plotted all four bounds for realistic values of \(\alpha \) and \(\beta \). We took \(\kappa = 10^{-15} \approx \varepsilon _{\text {mach}}\) since \(\kappa \) only determines the level at which the error stagnates; a larger value would interfere with judging the transient behavior of the convergence plot. Furthermore, the initial errors \(e_n^0=\gamma =1\) at iteration \(k=0\) were chosen arbitrarily since they have little influence on the results.
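For reference, the three practical bounds are straightforward to evaluate; the values of \(\alpha \) and \(\beta \) below are illustrative choices, with \(n=30\) as in Fig. 1.

```python
# Evaluating the three practical bounds B_{n,k} from the summary above;
# alpha and beta are illustrative values; n = 30 as in Fig. 1.
from math import factorial, prod

def practical_bounds(alpha, beta, n, k):
    linear1 = alpha**k * (1 - beta) ** (-k)
    linear2 = alpha**k * (1 + beta) ** (n - 1)
    superlinear = (alpha**k / (1 - beta)
                   * prod(n - j for j in range(2, k + 1)) / factorial(k - 1))
    return linear1, linear2, superlinear

for k in range(1, 8):
    print(k, practical_bounds(0.1, 0.5, 30, k))
```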

The bounds above depend on \(\alpha =e^{\ell h} C_{r,q}\) and \(\beta =e^{\ell h} C_q\), where \(C_q\) and \(C_{r,q}\) are the Lipschitz constants of \({\mathscr {T}}_q\) and \({\mathscr {T}}_{r,q}\), respectively; see (16). While it seems difficult to give a priori results on the size of \(C_q\) and \(C_{r,q}\), we can bound them up to first order in the theorem below. Note also that in the important case \(\ell <0\), the constants \(\alpha \) and \(\beta \) can be made as small as desired by taking h sufficiently large.

Theorem 5

(Lipschitz constants) Let \(A , {\tilde{A}} \in {\mathbb {R}}^{m \times n}\). Then

$$\begin{aligned} \Vert {\mathscr {T}}_q(A) - {\mathscr {T}}_q({\tilde{A}})\Vert \le \frac{\sigma _q}{\sigma _q - \sigma _{q+1}} \Vert A - {\tilde{A}}\Vert + O(\Vert A - {\tilde{A}}\Vert ^2), \end{aligned}$$
(24)

where \(\sigma _q\) is the qth singular value of A. Moreover,

$$\begin{aligned} \Vert {\mathscr {T}}_{r,q} (A) - {\mathscr {T}}_{r,q} ({\tilde{A}})\Vert \le \left( \frac{\sigma _q}{\sigma _q - \sigma _{q+1}} + \frac{\sigma _r}{\sigma _r - \sigma _{r+1}} \right) \Vert A - {\tilde{A}}\Vert + O(\Vert A - {\tilde{A}}\Vert ^2). \end{aligned}$$
(25)

Proof

For the first inequality, we refer to [2, Theorem 2] and [10, Theorem 24]. The second inequality follows from the first by the triangle inequality,

$$\begin{aligned} \Vert {\mathscr {T}}_{r,q} (A) - {\mathscr {T}}_{r,q} ({\tilde{A}})\Vert&\le \Vert {\mathscr {T}}_q(A) - {\mathscr {T}}_q({\tilde{A}})\Vert + \Vert {\mathscr {T}}_r(A) - {\mathscr {T}}_r({\tilde{A}})\Vert . \end{aligned}$$

\(\square \)

In many applications with low-rank matrices, the singular values of the underlying matrix are rapidly decaying. In particular, when the singular values decay exponentially like \(\sigma _k \approx e^{-ck}\) for some \(c > 0\), we have

$$\begin{aligned} \frac{\sigma _q}{\sigma _q - \sigma _{q+1}} = \frac{1}{1- \sigma _{q+1}/\sigma _{q}} \approx \frac{1}{1 - e^{-c}}. \end{aligned}$$
(26)

This last quantity decreases quickly to 1 when c grows. Even for \(c=1\), it is less than 1.6. We therefore see that the constants in Theorem 5 are not too large in this case.
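This first-order behavior is also easy to observe numerically: the sketch below compares the observed Lipschitz ratio of \({\mathscr {T}}_q\) with the factor in (24) for a random matrix with exponentially decaying singular values (all sizes and the decay rate c are illustrative choices).

```python
# Observed Lipschitz ratio of the truncation T_q versus the first-order
# factor sigma_q / (sigma_q - sigma_{q+1}) from (24), for a random matrix
# with singular values sigma_k = e^{-ck}; sizes and c are illustrative.
import numpy as np

rng = np.random.default_rng(0)
m, q, c = 50, 5, 1.0
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
s = np.exp(-c * np.arange(1, m + 1))
A = U @ np.diag(s) @ V.T

def truncate(B, rank):
    u, sv, vt = np.linalg.svd(B, full_matrices=False)
    return (u[:, :rank] * sv[:rank]) @ vt[:rank]

E = 1e-8 * rng.standard_normal((m, m))
ratio = np.linalg.norm(truncate(A, q) - truncate(A + E, q)) / np.linalg.norm(E)
print(ratio, s[q - 1] / (s[q - 1] - s[q]))     # observed vs (24)
```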

Remark 2

In the analysis, a sufficiently large gap in the singular values is required at both the coarse rank and the fine rank. In our experiments, we observed that such a gap is indeed required at the coarse rank, but not at the fine rank. This suggests that the bound (25) can probably be improved.

4 Numerical experiments

We now show numerical experiments for our low-rank Parareal algorithm. We implemented the algorithm in Python 3.10 and all computations were performed on a MacBook Pro with an M1 processor and 16GB of RAM. The complete code is available on GitHub so that all the experiments can be reproduced. The DLRA steps are solved by the second-order projector-splitting integrator from [29]. Since the problems considered are stiff, we used sufficiently many substeps of this integrator so that the coarse and fine solvers within low-rank Parareal can be considered exact.

4.1 Lyapunov equation

Consider the differential Lyapunov equation,

$$\begin{aligned} \dot{X}(t) = A X(t) + X(t) A + CC^T, \qquad X(0) = X_0, \end{aligned}$$
(27)

where \(A \in {\mathbb {R}}^{m \times m}\) is a symmetric matrix, and \(C \in {\mathbb {R}}^{m \times k}\) is a tall matrix for some \(k \le m\). This initial value problem admits a unique solution for \(t \in [0,T]\) for any \(T>0\). The most typical example of (27) is the heat equation on a square with separable source term. Other applications can be found in [31].

Assumption 3

The matrix \(A \in {\mathbb {R}}^{m \times m}\) is symmetric and strictly negative definite.

Under Assumption 3, the one-sided Lipschitz constant \(\ell \) for (27) is strictly negative. Indeed, the linear Lyapunov operator \({\mathscr {A}}(X) = AX + XA\) has the symmetric matrix representation \(A \otimes I + I \otimes A\) with eigenvalues \(\lambda _i(A) + \lambda _j(A)\) for \(1 \le i,j \le m\); see [19, Ch. 12.3]. As in [20, Ch. I.10], we therefore get immediately that \(\ell = 2 \max _i \lambda _i(A) < 0\). Moreover, since \({\mathscr {A}}\) is invertible, we can write the closed-form solution of (27) as

$$\begin{aligned} X(t) = e^{t {\mathscr {A}}} (X_0) + {\mathscr {A}}^{-1} (e^{t {\mathscr {A}} } (CC^T) - CC^T), \end{aligned}$$
(28)

which can be easily verified by differentiation using properties of the matrix exponential \(e^{t {\mathscr {A}}}(Z) = e^{t A} Z e^{t A}\).
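The closed form (28) can also be checked numerically; the sketch below builds a small random symmetric negative definite A, evaluates (28) with a Lyapunov solve for the action of \({\mathscr {A}}^{-1}\), and verifies the ODE residual of (27) by a finite difference (all sizes here are arbitrary test data).

```python
# Sanity check of the closed-form solution (28) on random data: evaluate (28)
# with a Lyapunov solve for the action of A^{-1} and verify the ODE residual
# of (27) by a finite difference. All sizes here are arbitrary.
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

rng = np.random.default_rng(1)
m, k = 20, 3
Q, _ = np.linalg.qr(rng.standard_normal((m, m)))
A = Q @ np.diag(-rng.uniform(1, 10, m)) @ Q.T   # symmetric negative definite
C = rng.standard_normal((m, k))
X0 = rng.standard_normal((m, m)); X0 = X0 @ X0.T

def X_exact(t):
    E = expm(t * A)
    # solve_continuous_lyapunov solves A S + S A^T = M (here A = A^T)
    S = solve_continuous_lyapunov(A, E @ (C @ C.T) @ E - C @ C.T)
    return E @ X0 @ E + S

t, h = 0.7, 1e-6
X = X_exact(t)
residual = (X_exact(t + h) - X) / h - (A @ X + X @ A + C @ C.T)
print(np.linalg.norm(residual))                 # small, up to O(h)
```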

The following result shows that the solution of (27) can be well approximated by a low-rank matrix. It is the analogue of a similar result for the algebraic Lyapunov equation \({\mathscr {A}}(X)=CC^T\). The latter result is well known, but we did not find a proof of the differential version in the literature.

Lemma 3

(Low-rank approximability of Lyapunov ODE) Let \(\sigma _i(X_0)\) be the ith singular value of \(X_0\) and likewise for \(\sigma _i(CC^T)\). Under Assumption 3, the solution X(t) of (27) has an approximation

$$\begin{aligned} Y(t) \text { of rank at most }r_0 + 2 r \rho \end{aligned}$$

for any \(0 \le r_0, r, \rho \le m\) with error

$$\begin{aligned} \left\Vert X(t) - Y(t)\right\Vert _2\le & {} e^{\ell t} \sigma _{r_0+1}(X_0) \\&+ \left( \frac{e^{t \ell } - 1}{\ell } \right) \left( 4 \exp \left( \frac{- \pi ^2 \rho }{\log (4 \kappa _A)} \right) \Vert CC^T \Vert _2 + \sigma _{r+1}(CC^T) \right) , \end{aligned}$$

where \(\kappa _A = \left\Vert A\right\Vert _2 \Vert A^{-1}\Vert _2\) and \(\ell = 2 \max _i \lambda _i(A)\).

Proof

The aim is to approximate the following two terms that make up the closed-form solution X(t) in (28):

$$\begin{aligned} X_1(t) = e^{t {\mathscr {A}}} (X_0), \quad X_2(t) = {\mathscr {A}}^{-1} (e^{t {\mathscr {A}} } (CC^T) - CC^T). \end{aligned}$$

The first term \(X_1(t)\) can be treated directly. By the truncated SVD, the initial value satisfies

$$\begin{aligned} X_0 = Y_0 + E_0 \text { where } {{\,\mathrm{rank}\,}}(Y_0) = r_0 \text { and } \Vert E_0\Vert _2 = \sigma _{r_0+1}(X_0). \end{aligned}$$

By Assumption 3, the operator \({\mathscr {A}}\) is full rank. We therefore obtain

$$\begin{aligned} X_1(t) = e^{t {\mathscr {A}}} (X_0) = e^{At} Y_0 e^{At} + e^{At} E_0 e^{At} = Y_1(t) + E_1(t), \end{aligned}$$
(29)

where \({{\,\mathrm{rank}\,}}(Y_1(t)) = {{\,\mathrm{rank}\,}}(Y_0) = r_0\) and \(\Vert E_1(t)\Vert _2 \le e^{\ell t} \sigma _{r_0+1}(X_0)\) since \(\ell = 2 \max _i \lambda _i(A)\). Next, we focus on the second term \(X_2(t)\). Like above, the source term can be decomposed as

$$\begin{aligned} CC^T = D + F \text { where } {{\,\mathrm{rank}\,}}(D) = r \text { and } \Vert F \Vert _2 = \sigma _{r+1}(CC^T). \end{aligned}$$

By linearity of the Lyapunov operator, we therefore obtain

$$\begin{aligned} X_2(t) = {\mathscr {A}}^{-1} (e^{t {\mathscr {A}}} D - D) + {\mathscr {A}}^{-1} (e^{t {\mathscr {A}}} F - F). \end{aligned}$$
(30)

Denote \(M=e^{t {\mathscr {A}}} D - D\). By definition of the Lyapunov operator \({\mathscr {A}}\), we have

$$\begin{aligned} S = {\mathscr {A}}^{-1} (M) \iff AS + SA = M. \end{aligned}$$

As studied in [34] and then improved in [1], the singular values of the solution S are bounded as

$$\begin{aligned} \frac{ \sigma _{{{\,\mathrm{rank}\,}}(M) \rho + 1}(S) }{\sigma _1(S)} \le 4 \exp \left( \frac{- \pi ^2 \rho }{\log (4 \kappa _A)} \right) , \end{aligned}$$
(31)

where \(\kappa _A = \left\Vert A\right\Vert _2 \Vert A^{-1}\Vert _2\) and \( 0 \le \rho \le m\). Since \({{\,\mathrm{rank}\,}}(M) \le 2 {{\,\mathrm{rank}\,}}(D) = 2r\) by assumption on D, the bound (31) then implies that

$$\begin{aligned} S = Y_2(t) + \delta S(t), \end{aligned}$$

where \({{\,\mathrm{rank}\,}}(Y_2(t)) \le 2 r \rho \) and

$$\begin{aligned} \left\Vert \delta S(t)\right\Vert _2&\le 4 \exp \left( \frac{- \pi ^2 \rho }{\log (4 \kappa _A)} \right) \left\Vert {\mathscr {A}}^{-1}(e^{t {\mathscr {A}}} D -D)\right\Vert _2\nonumber \\&\le 4 \exp \left( \frac{- \pi ^2 \rho }{\log (4 \kappa _A)} \right) \frac{e^{t \ell } - 1}{\ell } \left\Vert D\right\Vert _2, \end{aligned}$$

where the last inequality holds by properties of the logarithmic norm \(\mu \) of \({\mathscr {A}}\) which equals \(\ell \); see [36, Proposition 2.2]. Moreover, we can bound the last term in (30) as

$$\begin{aligned} \Vert E_2(t) \Vert _2 = \Vert {\mathscr {A}}^{-1} (e^{t {\mathscr {A}}} F - F) \Vert _2 \le \frac{e^{t \ell } - 1}{\ell } \left\Vert F\right\Vert _2. \end{aligned}$$

Putting (29) and (30) together, we obtain

$$\begin{aligned} X(t) = Y(t) + E(t), \quad Y(t) = Y_1(t) + Y_2(t), \quad E(t) = E_1(t) + \delta S(t) + E_2(t), \end{aligned}$$

which proves the statement of the lemma.\(\square \)

The lemma shows that if \(X_0\) and \(CC^T\) have good low-rank approximations, then the solution X(t) of the differential Lyapunov equation has comparable low-rank approximations as well on [0, T]. Since \(\ell < 0\), we can even take \(T \rightarrow \infty \) and recover essentially the low-rank approximability of the Lyapunov equation \(X(\infty ) = {\mathscr {A}}^{-1}(CC^T)\). This is clearly visible when \(X_0\) and \(CC^T\) are exactly of low rank, which we state as a simple corollary for convenience.

Corollary 1

Under Assumption 3 and assuming that \({{\,\mathrm{rank}\,}}(X_0) = r_0, \, {{\,\mathrm{rank}\,}}(CC^T) = r,\) the solution X(t) of (27) has an approximation

$$\begin{aligned} Y(t) \text { of rank at most } r_0 + 2 r \rho \end{aligned}$$

for any \(0 \le \rho \le m\) with error

$$\begin{aligned} \left\Vert X(t) - Y(t)\right\Vert _2 \le 4\frac{e^{t \ell } - 1}{\ell } \exp \left( \frac{- \pi ^2 \rho }{\log (4 \kappa _A)} \right) \left\Vert CC^T\right\Vert _2. \end{aligned}$$

The corollary clearly shows that the approximation error decreases exponentially when the approximation rank increases linearly via \(\rho \). Furthermore, we see that the condition number of the matrix A has only a mild influence due to \(\log (\kappa _A)\).

Remark 3

Corollary 1 can be compared to a similar result in [26]. In that work, the authors solve (27) with exact low-rank \(X_0=ZZ^T\) and \(CC^T\) using a Krylov subspace method. More specifically, with \(U_k\) an orthonormal matrix that spans the block Krylov space \(K_k(A, [C\,Z])\), the projected Lyapunov equation

$$\begin{aligned} \dot{Y}_k(t) = (U_k^T A U_k) Y_k(t) + Y_k(t) (U_k^T A U_k) + U_k^T CC^T U_k, \qquad Y_k(0) = U_k^T X_0 U_k, \end{aligned}$$

is used to define the approximation \(X_k(t) = U_k Y_k(t) U_k^T\). The approximation error of \(X_k(t)\) is studied in [26, Theorem 4.2]. Since \({{\,\mathrm{rank}\,}}(X_k(t)) \le k({{\,\mathrm{rank}\,}}(Z) + {{\,\mathrm{rank}\,}}(C))\), we therefore also get a result on the low-rank approximability of (27). This bound is, however, worse than ours since it does not give zero error for \(t=0\) and \(k=1\), for example. On the other hand, it is a bound for a discrete method whereas our Lemma 3 and Corollary 1 are statements about the exact solution.

We now apply the low-rank Parareal algorithm to the differential Lyapunov equation (27). Let \(A = \varDelta _{dx}\) be the \(n \times n\) discrete Laplacian with zero Dirichlet boundary conditions obtained by standard centered differences on \([-1, 1]\). The Lyapunov equation is therefore a model for the 2D heat equation on \(\varOmega = [-1,1]^2\). In the experiments, we used \(n=100\) spatial points and the time interval \([0,T] = [0,2]\). The matrix C for the source is generated randomly with singular values \(\sigma _i = 10^{-5(i-1)}\) for \(i=1,2,\ldots \) so that its numerical rank is 4. In order to have a realistic initial value, \(X_0\) is obtained as the exact solution at time \(t=0.01\) of the same ODE but with a random initial value \({\tilde{X}}_0\) with singular values \(\sigma _i = 10^{-(i-1)}\).
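For reproducibility, a sketch of this setup is given below; the exact construction of the random C is our interpretation of the description above, not code from the paper.

```python
# Sketch of the experimental setup: 1D discrete Laplacian with Dirichlet
# boundary conditions on [-1, 1], and a random C with prescribed singular
# values; the construction of C is our interpretation of the text.
import numpy as np

n = 100
dx = 2.0 / (n + 1)
A = (np.diag(-2.0 * np.ones(n)) + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1)) / dx**2     # Delta_dx

rng = np.random.default_rng(2)
k = 8
U, _ = np.linalg.qr(rng.standard_normal((n, k)))
V, _ = np.linalg.qr(rng.standard_normal((k, k)))
sigma = 10.0 ** (-5.0 * np.arange(k))           # sigma_i = 10^{-5(i-1)}
C = U @ np.diag(sigma) @ V.T                    # numerical rank 4
```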

Figure 2 is a 3D plot of the solution over time on \(\varOmega \) with its corresponding singular values. As we can see, the solution becomes almost stationary at \(t=1.0\). In addition, it stays low-rank over time, in agreement with Lemma 3. Moreover, the singular values suggest taking the fine rank \(r=16\) for an error of the fine solver of order \(10^{-12}\).

Fig. 2: Solution over time of the Lyapunov ODE (27) for the heat equation. Note the change of scale between \(t=0.0\) and \(t=1.0\)

The convergence of the error of the low-rank Parareal algorithm is shown in Fig. 3. The algorithm converges linearly from the coarse rank solution to the fine rank solution. Figure 3a suggests that the coarse rank does not influence the convergence rate; it only reduces the initial error. This is consistent with our analysis. Indeed, since the singular values are exponentially decaying, the singular gap is approximately constant; see (26). Hence, the constants \(\alpha \) and \(\beta \) from (16) that determine the convergence rate do not depend on the coarse rank q, as shown up to first order in Theorem 5. Figure 3b shows that, similarly, the convergence rate does not depend on the fine rank either, although the fine rank limits the final error.

Fig. 3: Convergence of the error of low-rank Parareal for the Lyapunov ODE (27) with \(n=100\) and \(T=2.0\). Influence of the coarse and fine ranks

In Fig. 4a, we investigate the convergence for several sizes n. Even though the problem is stiff, the convergence does not seem to be influenced by the size of the problem. Figure 4b shows the error of the algorithm applied to the problem with several step sizes. In agreement with our analysis, the convergence is faster when the step size h is large; see (16).

Fig. 4: Convergence of the error of low-rank Parareal for the Lyapunov ODE (27) with coarse rank \(q=4\) and fine rank \(r=16\). Influence of size and final time

4.2 Parametric cookie problem

We now solve a simplified version of the parametric cookie problem from [27]. Consider the ODE

$$\begin{aligned} \dot{X}(t) = A_0 X(t) + A_1 X(t) C_1 + {\mathbf {b}} {\mathbf {1}}^T, \qquad X(0) = X_0, \end{aligned}$$
(32)

where the sparse matrices \(A_0, A_1 \in {\mathbb {R}}^{1580 \times 1580}\), \({\mathbf {b}} \in {\mathbb {R}}^{1580}\), and \(C_1 = {{\,\mathrm{diag}\,}}(c_1^1, c_1^2, \ldots ,c_1^p)\) are given in [27], and \({\mathbf {1}} \in {\mathbb {R}}^p\) denotes the vector of all ones. The aim of this problem is to solve a heat problem simultaneously for several heat coefficients \(c_1^1, \ldots , c_1^p\).

In our experiments, we used \(p=101\) parameters with \(c_1^1 = 0, c_1^2 = 1, \ldots , c_1^{101} = 100\). The initial value \(X_0\) is obtained after computing the exact solution of (32) at time \(t=0.01\) with the zero matrix as initial value. The time interval is \([0,T] = [0, 0.1]\).

The singular values of the reference solution are shown in Fig. 5. The stationary solution has good low-rank approximations, as was proved in [27, Thm. 2.4]. The singular value decay suggests that a fine rank \(r=16\) leads to full numerical accuracy.

Fig. 5: Singular values of the solution over time of the parametric cookie problem (32)

In Fig. 6, we applied the low-rank Parareal algorithm with several coarse ranks q and fine ranks r. As for the Lyapunov equation, the convergence rate does not seem to depend on the coarse rank q. In agreement with our analysis (see Fig. 1), the convergence is linear in the first iterations and superlinear in the last iterations. In addition, the convergence is not influenced by the fine rank r.

Fig. 6: Convergence of the error of low-rank Parareal for the parametric cookie problem (32). Influence of the coarse and fine ranks

4.3 Riccati equation

The Riccati differential equation is given by

$$\begin{aligned} \dot{X}(t) = A^T X(t) + X(t) A + C^T C - X(t) S X(t), \qquad X(0) = X_0, \end{aligned}$$
(33)

where \(X \in {\mathbb {R}}^{m \times m}\), \(A \in {\mathbb {R}}^{m \times m}\), \(C \in {\mathbb {R}}^{k \times m}\), and \(S \in {\mathbb {R}}^{m \times m}\). We note that this is no longer an ODE with an affine vector field, and hence our theoretical results do not apply here. As already studied in [33], we take \(S = I\) and let A be the spatial discretization of the diffusion operator

$$\begin{aligned} {\mathscr {D}} = \partial _x (\alpha (x) \partial _x(\cdot )) - \lambda I \end{aligned}$$

on the spatial domain \(\varOmega = [0,1]\). Furthermore, we take \( \alpha (x) = 2 + 2 \cos (2 \pi x) \) and \(\lambda = 1\). The discretization is done by the finite volume method, as described in [15]. The matrix \(C \in {\mathbb {R}}^{k \times m}\) is obtained from the k independent vectors \(\{1, e_1, \ldots ,e_{(k-1)/2}, f_1, \ldots , f_{(k-1)/2} \}\), where

$$\begin{aligned} e_i(x) = \sqrt{2} \cos (2 \pi i x) \quad \text {and} \quad f_i(x) = \sqrt{2} \sin (2 \pi i x), \quad i=1, \ldots , (k-1)/2, \end{aligned}$$
(34)

are evaluated at the grid points \(\{x_j \}_{j=1}^m\) with \(x_j = \frac{j}{m+1}\). The time interval is \([0, T] = [0, 0.1]\).
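A sketch of the construction of C from (34) is given below; the value of k is an illustrative choice.

```python
# Sketch of the construction of C from (34); k = 5 is an illustrative choice.
import numpy as np

m, k = 100, 5
x = np.arange(1, m + 1) / (m + 1)               # grid points x_j = j/(m+1)
rows = [np.ones(m)]                             # constant vector 1
for i in range(1, (k - 1) // 2 + 1):
    rows.append(np.sqrt(2) * np.cos(2 * np.pi * i * x))   # e_i
    rows.append(np.sqrt(2) * np.sin(2 * np.pi * i * x))   # f_i
C = np.stack(rows)                              # C in R^{k x m}
```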

As for the other problems, the singular values of the solution (shown in Fig. 7) indicate that we can expect good low-rank approximations on [0, T]. We choose the fine rank \(r=18\). The convergence of low-rank Parareal is shown in Fig. 8. Unlike for the previous problems, the coarse rank q has a more pronounced influence on the behavior of the convergence. While our theoretical results do not hold for this nonlinear problem, we still see that low-rank Parareal converges linearly when the coarse rank q is sufficiently large (\(q=6\), \(q=8\)). The convergence is slower (but still superlinear) when \(q=4\). This could be due to the non-constant gaps in the singular values. The influence of the fine rank r is similar to that for the linear problems.

Fig. 7: Singular values of the solution over time of the Riccati ODE (33)

Fig. 8: Convergence of the error of low-rank Parareal for the Riccati problem (33). Influence of the coarse and fine ranks

4.4 Rank-adaptive algorithm

Since the approximation rank of the solution is usually not known a priori, it is more convenient for the user to supply an approximation tolerance than an approximation rank. Because the rank can change during the truncation steps to satisfy the tolerance, the algorithm from Def. 3 can easily be reformulated for such a rank-adaptive setting. The key idea is to fix the coarse rank to keep the cost of the coarse solver low, while the fine rank is determined by a fine tolerance.

Definition 4

(Adaptive low-rank Parareal) Consider a small fixed rank q and a fine tolerance \(\tau \). The adaptive low-rank Parareal algorithm iterates

$$\begin{aligned}&\text {(Initial value)} \quad&Y_0^k = Y_0, \end{aligned}$$
(35)
$$\begin{aligned}&\text {(Initial approximation)} \quad&Y_{n+1}^0 = \psi ^h_q\circ {\mathscr {T}}_q(Y_n^0) + {\mathscr {E}}_n, \end{aligned}$$
(36)
$$\begin{aligned}&\text {(Iteration)} \quad&Y_{n+1}^{k+1} = \psi ^h_{\tau }\circ {\mathscr {T}}_{\tau }(Y_n^k) + \psi ^h_q\circ {\mathscr {T}}_q(Y_n^{k+1}) - \psi ^h_q\circ {\mathscr {T}}_q(Y_n^k), \end{aligned}$$
(37)

where the notation is similar to that of the previous Def. 3, except for \({\mathscr {T}}_{\tau }\), which represents the rank-adaptive truncation: \({\mathscr {T}}_{\tau }(Y)\) is the best rank-s approximation of Y, where s is the smallest rank such that the \((s+1)\)st singular value of Y is below the tolerance \(\tau \). The fine flow \(\psi ^h_{\tau }\) is then the DLRA flow at this adaptively determined rank. The matrices \({\mathscr {E}}_n\) are small perturbations, randomly generated such that \({{\,\mathrm{rank}\,}}(Y_{n+1}^0) = {{\,\mathrm{rank}\,}}(Y_0)\) and its smallest singular value is larger than the fine tolerance \(\tau \).
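A sketch of the tolerance-based truncation \({\mathscr {T}}_{\tau }\) (our own illustration, operating on a dense matrix for simplicity):

```python
# Sketch of the tolerance-based truncation T_tau: keep the smallest rank
# whose first discarded singular value is at most tau.
import numpy as np

def truncate_tol(Y, tau):
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    rank = max(1, int(np.sum(s > tau)))
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]
```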

Figure 9 shows the numerical behavior of this rank-adaptive algorithm. As we can see, the algorithm behaves as desired. Figure 9a shows the algorithm applied with several tolerances and is comparable to Fig. 3b with several fine ranks. Figure 9b shows the rank of the solution over time. Already after two iterations, the rank is reduced to almost the numerical rank of the exact solution and the rank does not change much for the rest of the iterations.

Fig. 9: Adaptive low-rank Parareal. On the left, the algorithm is applied with several tolerances. On the right, the rank of the solution over time is shown for several iterations

5 Conclusion and future work

We proposed the first parallel-in-time algorithm for integrating a dynamical low-rank approximation (DLRA) of a matrix evolution equation. The algorithm follows the traditional Parareal scheme, but it uses DLRA with a low rank as coarse integrator, whereas the fine integrator is DLRA with a higher rank. We presented an analysis of the algorithm for affine linear vector fields and, under common assumptions, showed linear as well as superlinear convergence up to the modeling error of DLRA.

In our numerical experiments, the algorithm behaved well on diffusive problems, which is similar to the original Parareal algorithm. Due to the significant difference in computational cost between the fine and coarse integrators, it is reasonable to expect good speed-up in actual parallel implementations. A proper parallel implementation to verify this claim is natural future work. It may, however, be more appropriate to first generalize more efficient parallel-in-time algorithms, like Schwarz waveform relaxation and multigrid methods [13], to DLRA.

Since DLRA can also be used to obtain low-rank tensor approximations [30], another future work is to extend low-rank Parareal to tensor DLRA. Finally, our theoretical analysis assumes that the ODE has an affine vector field. Since this assumption was only needed in one step of the proof of Lemma 1, it might be possible that it can be relaxed to include certain non-linear vector fields.