All-at-once multigrid approaches for one-dimensional space-fractional diffusion equations

We focus on a time-dependent one-dimensional space-fractional diffusion equation with constant diffusion coefficients. An all-at-once rephrasing of the discretized problem, obtained by considering the time as an additional dimension, yields a large block linear system and paves the way for parallelization. In particular, in case of uniform space–time meshes, the coefficient matrix shows a two-level Toeplitz structure, and such structure can be leveraged to build ad-hoc iterative solvers that aim at ensuring an overall computational cost independent of time. In this direction, we study the behavior of certain multigrid strategies with both semi- and full-coarsening that properly take into account the sources of anisotropy of the problem caused by the grid choice and the diffusion coefficients. The performances of the aforementioned multigrid methods reveal sensitive to the choice of the time discretization scheme. Many tests show that Crank–Nicolson prevents the multigrid to yield good convergence results, while second-order backward-difference scheme is shown to be unconditionally stable and that it allows good convergence under certain conditions on the grid and the diffusion coefficients. The effectiveness of our proposal is numerically confirmed in the case of variable coefficients too and a two-dimensional example is given.


Introduction
Fractional diffusion equations (FDEs) generalize classical partial differential equations (PDEs), and their recent success is due to the non-local behavior of fractional operators that translates in an appropriate modeling of anomalous diffusion phenomena appearing in several applicative fields, like imaging or electrophysiology [2,4]. The non-locality of the fractional operators causes absence of sparsity in the discretization matrices. Fortunately, in presence of uniform grids, the discretization matrices show a Toeplitz-like structure and this paves the way for the design of iterative solvers specialized for structured linear systems.
In this regard, for one-dimensional space-FDE problems we mention the circulant preconditioning in [14], the multigrid method in [20], and the structure preserving tridiagonal preconditioners in [6]. The latter preconditioners were motivated by the spectral study of the coefficient matrices through the notion of symbol and by the so-called generalized locally Toeplitz theory [9].
In the two-dimensional setting, classical preconditioners based on multilevel circulant matrices are not well suited, while multigrid methods, possibly used as preconditioners, can be effective and robust solvers. Some multigrid proposals for FDEs discretized with finite differences are given in [16][17][18]. When finite element or finite volume discretizations are adopted, multigrid methods are investigated in [13] and [7], respectively. In particular, in [18] the spectral approach presented in [6] has been extended to two-dimensional FDEs and has been used to define a multigrid preconditioner which is particularly effective when the fractional orders are both close to 2. For fractional derivative orders that differ from each other, i.e., for FDEs that show an intrinsic anisotropy along the axes, we mention the "multigrid as smoother" (MG-S) approach firstly proposed in [19] for integer-order problems and then adapted to fractional ones in [5]. Such method consists in a V-cycle with semicoarsening used as smoother inside an outer full-coarsening, and in [5] it is shown to be a robust preconditioner in presence of strong anisotropies caused by the fractional orders.
Due to the sequentiality of time integration, with none of the aforementioned approaches we can aspire towards complete independence of time of the overall computational cost. By contrary, an all-at-once rephrasing of the discretized problem over a uniform space-time grid, obtained by considering the time as an additional dimension, yields large (multilevel) Toeplitz linear systems and opens to parallelization.
In this regard, we mention the banded Toeplitz preconditioner proposed in [27] for solving non-linear space-FDEs, and the block structured preconditioner given in [3] for dealing with arbitrary dimensional space problems. In [10,21] a Strang-type preconditioner for solving FDEs by boundary value methods has been proposed. In the case of equal left and right diffusion coefficients, we also mention the multigrid reduction in time (MGRIT) discussed in [26]. Therein, the authors consider finite elements in space and Crank-Nicolson in time, since the MGRIT is specifically tailored for one step methods.
The present paper fits within the latter framework. Precisely, our scope is to build a fast and efficient parallel-in-time structure-based multigrid solver. We fix our attention on a weighted and shifted Grünwald difference (WSGD) discretization of a one-dimensional time-dependent space-FDE with constant diffusion coefficients. We stress that this one-dimensional problem turns out to be already a tough one, due to the block structure of the coefficient matrix and to its possibly anisotropic character because of the grid choice and the diffusion coefficients.
As for the time discretization, we opt either for Crank-Nicolson (CN) or secondorder backward-difference formula (BDF2) schemes. The unconditional stability of CN-WSGD has already been proven in [23]. Concerning BDF2, in [15] it was combined with a central finite difference scheme for solving space-FDEs with diffusion coefficients equal to 1. In that same paper, a proof of unconditional stability of the resulting method was given. We extend this result to the case where the space scheme is WSGD and the diffusion coefficients are not necessarily equal to each other.
Aiming at building a parallel-in-time multigrid, we consider block Jacobi as smoother, since it is parallelizable. Moreover, exploiting the Toeplitz structure of the coefficient matrices and the related symbols we define the projectors according to what has been done in the integer-order literature for both isotropic [1], and anisotropic Toeplitz linear systems [8].
The performances of the proposed multigrid strategies reveal sensitive to the choice of the time discretization scheme. Indeed, many tests, including a two-dimensional example, show that Crank-Nicolson prevents the multigrid to yield good convergence results, while BDF2 scheme allows good convergence under certain conditions on the grid and the diffusion coefficients.
The paper is organized as follows. In Sect. 2, we introduce the problem setting and we describe both CN-WGSD and BDF2-WSGD approximation methods, by giving the formal expression and the structure of the resulting matrices. In Sect. 3, we prove that BDF2-WSGD scheme is unconditionally stable. In Sect. 4, we perform an all-at-once rephrasing of the original matrices and give some results on their spectra, which are leveraged in Sect. 5 for the design of proper multigrid strategies. Several numerical experiments, also in the case of variable diffusion coefficients, are reported in Sect. 6 for testing the performances of our proposals. Finally, in Sect. 7 we draw conclusions.

Preliminaries
In this section, we first introduce the FDE problem we are interested in (Sect. 2.1). Second, we recall the definition of multilevel Toeplitz matrix (Sect. 2.2). The latter will be needed in Sect. 4 where we perform on a all-at-once rephrasing of the given problem and we study the related spectral properties. Finally, we briefly review the combination of the chosen finite difference space-discretization with two different time discretization schemes (Sect. 2.3).

Problem setting: a one-dimensional space-FDE
In this work, we focus on the following one-dimensional initial-boundary value space-FDE problem where = (a, b) is the space domain, d ± > 0 are the diffusion coefficients, v(x, t) is the forcing term, and u(x,t) ± x are the left (+) and right (−) fractional derivative operators of order ∈ (1, 2) with respect to variable x defined as By considering the time like an additional dimension, the discretization of equation (1) yields a two-level Toeplitz matrix. In the next subsection we clarify such structure.

Multilevel Toeplitz matrices and their symbol
Here we report the formal definition of a multilevel Toeplitz matrix. (1) To clarify the notation, a 2-level Toeplitz matrix of order N generated by f is given by n i ∈ ℝ n i ×n i are matrices whose entry (s, t)th equals 1 if s − t = j i and 0 elsewhere. In other words, a 2-level Toeplitz matrix is a block Toeplitz whose blocks are Toeplitz. When d = 1 , we simplify the notation using

Space-time discretizations: CN-WSGD and BDF2-WSGD
In the following, we briefly review the finite difference space-discretization of problem (1) obtained using the second order accurate Weighted and Shifted Grünwald Difference (WSGD) scheme [23] combined with either Crank-Nicolson (CN) or second-order backward-difference formula (BDF2) schemes in time.
Let N, M ∈ ℕ and consider the following uniform space-time grid According to [23], the discretization of + x yields a lower Hessenberg Toeplitz matrix A N = T N (f ) , where with and g ( ) k = (−1) k k . A similar reasoning shows that the discretization of − x yields an upper Hessenberg Toeplitz matrix, which coincides with A T N . The discretization of the forcing term returns vector v m = [v(x i , t m )] N i=1 and the application of CN and BDF2 schemes in time gives the following linear systems respectively, where r = t 2 x and Remark 1 In the case of BDF2, the solution u 1 at time t 1 is computed with CN and outside the all-at-once linear system. Note that any other one step method could be used to compute u 1 . For example, although Implicit Euler is only first order accurate, if we only use it once, it will not compromise the global second-order accuracy. Such a statement can be found in [24].
The following proposition, which plays an important role in the definition of the projectors for our multigrid strategy (see Sect. 5), defines the symbol of the spatial discretization.
is non-positive and has a zero of order at x = 0.

Stability of the BDF2-WSGD scheme
In [23] the authors proved the unconditional stability of the CN-WSGD scheme (2), in the constant diffusion coefficients case, as a consequence of the following theorem.
Indeed, in case of a one step scheme like CN, the stability relies on the spectral radius of the iterations matrix of the time stepping algorithm, which is required to be lower than 1. We now aim to prove the stability of the BDF2-WSGD scheme. After the discretization in space we obtain an equation of the form where (t), (t) are, respectively, the semi-discrete in space unknown and forcing term.
The region of absolute stability of a linear multistep method is defined as follows.
Definition 2 (Absolute stability region) The region of absolute stability for the linear multistep method is the set of points z ∈ ℂ for which the roots { j } s j=1 of the polynomial satisfies the following root conditions: Therefore, again as a consequence of Theorem 1, the following theorem holds. By defining z ∶= 1 − 2z 3 x , which is a complex number and can be written as z = a + ib , it follows that the roots are From Theorem 1, we have that Re(z) = a > 1 for ∈ (1, 2) , and the study of the maximum of function g(a, b) shows that sup a>1 g(a, b) The following corollary exploits the density of diagonalizable matrices into the space of square matrices to remove the diagonalizability hypothesis in Theorem 2.

Corollary 1 Let ∈ (1, 2) , then BDF2-WSGD scheme (3) is unconditionally stable.
Proof Let us suppose that A x,N is not diagonalizable, otherwise the thesis follows from Theorem 2. Let us consider the Schur decomposition QTQ H of A x,N , where Q and T are unitary and upper triangular matrices, respectively. Note that due to the structure of T, its diagonal elements are the eigenvalues of T and that, by similarity, they coincide with the eigenvalues of A x,N .
Since we are assuming that A x,N is not diagonalizable, T has at least two diagonal elements that are equal. Let us then consider matrix B N = QTQ H , where T is obtained from T by properly shifting its diagonal entries such that T becomes diagonalizable. More precisely, for > 0 and for Since B N is diagonalizable and its eigenvalues have negative real part, Theorem 2 applies to B N , and since by letting → 0 the thesis is proven. ◻

Remark 2
The unconditional stability of BDF2 combined with a central finite difference scheme for discretizing the fractional derivative operator, was given in [15]. Therein, the diffusion coefficients were both equal to 1. Under the diagonalizability hypothesis, Theorem 2 extends the unconditional stability of BDF2 to the case where the space scheme is WSGD and the diffusion coefficients are not necessarily equal to each other. Corollary 1 generalizes Theorem 2 to the case of a non diagonalizable matrix A x,N .

All-at-once rephrasing of our problem and related spectral study
Starting from Eqs. (2) and (3), and chaining the unknown u m at each time step into a unique vector as in the case of CN and BDF2, respectively, we can rephrase the original discretized problem as the following large block linear systems and Note that both coefficient matrices A CN , A BDF2 are two-level Toeplitz according to Definition 1, and hence we can compute their symbol. From has a zero of order at x = 0 . Then, by assuming d ± = d and N, M → ∞ , we have with h S defined as follows: (i) if r is constant, then and both functions have a unique zero at (x, t) = (0, 0) of order 1 and α in t and x, respectively; (ii) if r → 0 , then from (7) we have and both functions have a zero of order 1 at t = 0, ∀x; (iii) if r → ∞ , by grouping up r in (7), we have where h CN has a zero of order 1 at t = , ∀x and a zero of order at x = 0, ∀t , while h BDF2 vanishes only at x = 0, ∀t with order . The presence of at least a line of zeros in the symbol is called anisotropy. The latter becomes stronger as the number of such lines increases. We expect then case (iii) for CN to be much harder to be numerically treated than all other cases.

Remark 3
If we suppose r to be constant and let d → ∞ or d → 0 then the same results as in case (ii) and (iii), with r in place of d, hold. In practice, since r, d are fixed coefficients, the anisotropy arises when d ⋅ r is very large or very small.

Remark 4
The study of the symbol h S can easily be extended to the case where d + ≠ d − using the results in [6].

Multigrid methods for all-at-once systems
This section is devoted to the design of multigrid strategies based on the spectral study performed in Sect. 4 for the linear systems in (6). With this aim, we first recall the basics of the multigrid in Sect. 5.1, then we discuss our multigrid proposals in Sect. 5.2.

Multigrid idea and convergence results
Multigrid methods, introduced in [22], combine two iterative methods known as smoother and coarse grid correction (CGC). Given the linear system A N x = b , A N ∈ ℝ N×N , the former is typically a stationary iterative method, which we denote with S x, A N , b , where is the number of iterations.
To give a precise account of what the CGC is, let us consider the most basic version of multigrid, often used for proving convergence results, i.e., the two-grid method (TGM). Given a full-rank matrix P N ∈ ℝ N×k , with k < N , a step of TGM is defined by Algorithm 1.
Steps 1) to 5) define the CGC and step 6) is called post-smoother. To strengthen the algorithm, a second smoother, which is called pre-smoother, could be added before the CGC. V-Cycle is obtained from TGM by replacing Step 4) with a recursive call of TGM. The Garlerkin approach is useful for the convergence analysis, but in practice it could be too computationally expensive. Therefore, it is often replaced by a geometric approach, which consists in the rediscretization of the equation over a coarser grid. This leads to a less robust algorithm, but allows to maintain the same structure of the coefficient matrix at each level and is highly parallelizable.
Moreover, if the matrix A N shows a block Toeplitz-like structure, by choosing the geometric approach the matrix-vector products can be performed by means of the fast Fourier transform algorithm at each coarser grid.
Let K N ∈ ℝ N×⌊ N 2 ⌋ be the down-sampling matrix, which keeps an entry every two, then the one-dimensional grid transfer operator P N is defined as In more than one dimension, the projector is defined through the tensor product. For instance, in two dimensions the required projector P NM is defined as where are the projectors in space and time generated by p 1 (x) and p 2 (t) , respectively. Notice, that this is the case we are interested in. Indeed, by considering the time as an additional dimension, we have reinterpreted the original one-dimensional problem as a two-dimensional one.
The convergence of the V-cycle, proved in [1], requires the following condition on p(x, t) = p 1 (x)p 2 (t) . Let f ≥ 0 be the symbol of A N , vanishing only at (x 0 , t 0 ) , then p has to satisfy } is the set of the "mirror points" of (x, t).
In the case where the symbol f has a whole line of zeros along the axes, relation (8) does not hold anymore. In such a case, an efficient alternative to standard V-cycle is given by the so-called semi-coarsening [8]. The projector in the semi-coarsening approach is defined by considering the vanishing variable as a parameter, and then building the projector in the remaining variable according to the one-dimensional version of relation (8). The projector in the other dimension is simply given by an identity matrix.

Multigrid methods for the all-at-once systems
We now see how the results recalled in Sect. 5.1 apply to our case. In particular, we define the projector according to condition (8) and the properties of h S defined in Sect. 4. Let us first introduce the polynomials Note that: • p 1 (x) has a zero of order 2 at x = ; • p + 2 (t) has a zero of order 1 at t = ; • p − 2 (t) has a zero of order 1 at t = 0.
In the case of CN, according to the analysis in Sect. 4, we distinguish the following three cases: (1) If r is constant, then h CN has a unique zero of minimum order 1 at (x, t) = (0, 0) . The mirror points of (x, t) = (0, 0) are (0, ), ( , 0), ( , ) . Since p 1 (x) and p + 2 (t) vanish at x = and t = , respectively, p(x, t) = p 1 (x)p + 2 (t) vanishes at the mirror points with a minimum order of 1 and hence satisfies relation (8).
(2) In the anisotropic case where r → 0 , h CN is zero on the whole x-axis, then we opt for semi-coarsening in time. Precisely, by considering variable x as a parameter, h CN has a unique zero of order 1 at t = 0 . Then, we generate P t,M through p + 2 (t) that has a zero of order 1 at the mirror point t = . The projector P x,N is given by the identity matrix I N .
(3) In the anisotropic case, where r → ∞ , h CN is zero on both axes. The theory does not apply to this scenario. Nevertheless, as a first attempt to dominate (at least partially) this other kind of anisotropy, we use again standard semi-coarsening in both time and space. Precisely, • When x is considered as a parameter, h CN has a zero of order 1 at t = .
Therefore, we perform semi-coarsening in time replacing P x,N with I N and generating P t,M through p − 2 (t) , whose first order zero at t = 0 satisfies relation (8) for x fixed, i.e., considering only the one-dimensional problem in the variable t. • When t is considered as a parameter, h CN has a zero of order at x = 0 . Therefore, the semi-coarsening in space is defined by replacing P t,M with the identity matrix I M and generating P x,N through p 1 (x) , which has a zero of order 2 at the mirror point x = , again according to condition (8) for t fixed.
In the case of BDF2, items 1) and 2) are identical. Regarding item 3), i.e. when r → ∞ , symbol h BDF2 vanishes only over the line x = 0, ∀t with order . Hence we have a standard anisotropy, like in item 2), but this time along the t-axis. Therefore, we consider a semi-coarsening in space by setting P t,M = I M and by generating P x,N through p 1 (x) , whose position and order of the zero satisfies relation (8) for t fixed.
Concerning the smoother, we consider -weighted block Jacobi method ( -BJ), where the diagonal blocks are of the form I − rA x,N with = 1 for CN and = 4∕3 for BDF2. The reason for such a choice is that it is parallelizable and it allows to exploit the structure of the coefficient matrix. Moreover -BJ converges for any ∈ (0, 1] whenever the blocks are of size N × N , hence the study of its relaxation parameter is only related to the smoothing property along the time axis (see Sect. 6.1 for a numerical discussion about this issue). In our tests, the inversion of the blocks in block-Jacobi is performed through the Matlab function backslash.
We stress that, choosing standard Jacobi would decrease the computational cost and would also exploit the structure of the coefficient matrix, but its use would ask for a tougher study of the relaxation parameter in order to ensure the convergence.

Remark 5
Note that in case of BDF2, when d → ∞ or r → ∞ , the coefficient matrix in Eq. (6) tends to a block diagonal Toeplitz matrix. This means that, when d ⋅ r becomes large, using the multigrid is pointless since its smoother is already computing the solution accurately enough.

Numerical results
In this section we investigate the performances of -BJ, two-grid method (TGM) and V-Cycle (V), mainly used as standalone solvers for solving the linear system in (6). Few numerical results concerning the use of both TGM and V as preconditioners (only one iteration) for GMRES are also given.
Both TGM and V-Cycle will have one iteration of -BJ as post-smoother and no pre-smoothing iteration (the reason is given in Sect. 6.1). In V-Cycle we halt the coarsening at the 5th level, when the coefficient matrix has a minimum size of N 5 × M 5 with N 5 ≥ N 2 5 and M 5 ≥ N 2 5 , depending on the coarsening technique, and the solution on the coarsest level is performed through the backslash Matlab function, which is a direct solver.
As already clarified in Remark 1, the solution u 1 at the first time step is computed outside the coefficient matrix. One could of course include the computation of u 1 in the coefficient matrix as done in [11]. In our case, due to the computationally expensive smoother we use, the difference between the two approaches is negligible. A comparison between the two approaches in terms of iterations can be found in Sect. 6.6.
The section is organized as follows. In the first part we aim at explaining how we fix the fractional derivative order , and the relaxation parameter in our numerical examples. Precisely, in Sect. 6.1 we test the performances of ω-BJ for two different values of and we show that it generates jumps along the time axis, independently of . In Sect. 6.2, we check how much TGM is sensitive to , and we numerically prove that its behavior is only slightly -dependent.
Aside from and , we also need to clarify how we choose between the two projector generators p + 2 and p − 2 discussed in Sect. 5.2 when performing semi-coarsening in time for CN. This is the subject of Sect. 6.3. In Sects. 6.4-6.5 we perform few tests with large N, M to numerically check the robustness of TGM and V as N, M increase in both constant and variable diffusion coefficients cases. Finally, in Sect. 6.6 we provide a two-dimensional example.
All our tests have been run on a server with Intel(R) Xeon(R) Silver 4114 at 2.20GHz with Matlab 2019b. For all methods we fix the tolerance to 10 −7 and the initial guess as the null vector. We use the built-in gmres function, whose preconditioner is left-sided. For this reason we force GMRES to reach the required tolerance on the actual residual through a 'by hand' restart. A right preconditioned GMRES could be of course employed and it would basically give the same amount of iterations.
The addition of '(G)' after the solver name stands for 'Galerkin approach'. In case nothing is specified, geometric approach is adopted. Finally, the presence of '(P )' in the name of the solver means that the considered multigrid method is set as GMRES preconditioner.
We point out that due to space limitations, in the key of each figure we omit the name of the method and specify only the projector. For instance, we write simply xt + in place of TGMxt + . The name of the method will be clear from the caption of the figure.
All the results contained in the Sects. 6.1-6.4 refer to the following example.

Example 1
In this example we assume the diffusion coefficients to be constant and equal, that is d ± = d . The space and time domains in problem (1) are fixed as = (0, 2) , and [0, 1] respectively, while the true solution and the solution at t = 0 are given by The numerical approximation of v is computed starting from the discretized exact solution.

Behavior of ω-BJ smoother
Here we test the "smoothing properties" of -BJ. Let us consider = 1.5 , d = 1 , N = 63 and, to better point out the behavior of ω-BJ along the time axis, we fix M = 7 ≪ N. Figure 1 shows the error, reshaped as a space-time surface, after one iteration of -BJ for both linear systems in Eq. (6). We note that, in case of CN, one iteration of 1-BJ generates a jump along the time axis from t 1 to t 2 . In Fig. 1b, where the 2-step method BDF2 is considered, such a jump involves also t 3 due to the longer stencil of BDF2 with respect to CN. On the other hand, both surfaces in Fig. 1 do not show any jump along the x-axis.
We now analyze the jump in time varying the magnitude of ∶= d ⋅ r , where r is the grid dependent scale parameter. We introduce the function which measures the vertical displacement of the discrete error E(x, t) in the first three time steps at the midpoint x = 1 . We note that dist(E) = 0 if and only if E is constant in the first three time steps. In other words, as far as dist(E) ≈ 0 , E is smooth, and this indicates that -BJ is a good smoother. Figures 2a and b show how dist(E) behaves for both CN and BDF2, fixed N = M = 63 , = 1, 0.5 , = 1.1, 1.8 , and varying ∈ [10 −5 , 10 5 ] . As we can see, both discretizations are characterized by a region where the jump is negligible. In detail, the jump generated in CN is negligible only when ≈ 10 . In BDF2, instead, the jump becomes negligible as increases. Moreover, for both CN and BDF2, the jump slightly moves while varying , and it halves its magnitude when switching from = 1 to = 0.5.
In summary, in all the considered cases -BJ generates jumps along the time axis which means that the projection along such axis could be inaccurate. In order to face this drawback, in the following we only consider -BJ as post-smoother avoiding pre-smoothing iteration at the first iteration. This choice is supported by the idea that applying the CGC before the smoother could reduce the jump, preventing then the projection of a non-smooth error.

Behavior of TGM varying Į
n Sect. 6.1, we observed that dist(E) , in Eq. (9), slightly varies with . This could lead to a difference in the behavior of the multigrid depending on , when solving the two linear systems in (6). Here we perform few tests which show that the behavior of the proposed TGM is almost independent of and hence that justify the choice of a fixed value for in the reminder of the numerical tests.
In Figs On the other hand, = 0.5 provides a good convergence, according also to the analysis in Sect. 6.1, for both CN and BDF2. Therefore, in the rest of this section we fix = 0.5 . We stress that such discussion on the relaxation parameter is not intended as a substitute of a rigorous study, and that a theoretical approach to the subject will be investigated in a future work. Figure 3, where we use CN scheme, shows that by increasing the optimal region of convergence (blue) shifts to the left for any of the considered algorithms. Regarding BDF2, instead, Fig. 4 shows that the blue region shifts to the left as increases only in the case of TGMx. In the other two cases, their number of iterations stays almost independent of .
Summarizing, the width of the blue regions does not seem to significantly change while varying . Therefore, in the following we restrict our analysis to the case where = 1.5.

Time projection performances for the CN scheme
In Sect. 4 we have shown that the symbol h CN has a zero in t that moves from 0 to depending on how r or d behave asymptotically, and then on the magnitude of = d ⋅ r . As discussed in Sect. 5.2, this means that the projector in time must change as well from t + to t − according to . Here, we show that the latter does not work satisfactorily in practical applications when is large.
Let us fix = 1.5 and N = M = 2 7 − 1 . Figure 5a shows the iterations to tolerance of TGMt ± , TGMt ± (G), TGMt − (P ) while varying the magnitude of ∈ [10 −5 , 10 5 ] . We note, in line with the discussion in Sect. 5.2, that the Galerkin approach allows TGMt + (G) to converge in a low amount of iterations when ∈ [10 −5 , 1] , that is for small values of . When considering the less robust geometric approach, TGMt + still yields good convergence results in the same range of , even if the iteration number slightly increases.
In the case where ≫ 1 , the only working method is TMGt − (G). Unfortunately, the Galerkin approach is not of practical use since it is too computationally expensive. Regarding the geometric approach, TGMt − results unpractical also when used as GMRES preconditioner (refer to TGMt − (P ) in Fig. 5a).
The reason why geometric and Galerkin methods behave differently is due to the large difference between the matrices at the coarser level obtained with the two approaches. Indeed, the convergence condition given in (8) requires the Galerkin approach, which leads to a coarser matrix having a symbol that vanishes at the origin (see [1] for details). Differently, the geometric approach, which consists in discretizing the same problem over a coarser grid, builds a coarser matrix that vanishes again at t = and that shows then opposite spectral behavior with respect to Galerkin.
In Fig. 5b, TGM with full-coarsening is shown not to work in the anisotropic cases < 10 −1 and > 10 , independently of the time projectors and the approach for computing the matrix at the coarser level.
In conclusion, in the following we only consider the time projector given by t + and we denote it simply with 't', since it is the only projector that allows TGM with both semi-coarsening in time and full-coarsening to yield good convergence results for the geometric approach.

Comparison between CN and BDF2: TGM and V-cycle performances
Now we discuss how the performances of TGM and V-cycle with both semi-and full-coarsening vary depending on the adopted discretization scheme, i.e., CN or BDF2.
Let us discuss first the behavior of TGM. In Fig. 6, we compare the iterations to tolerance of TGMxt, TGMx, TGMt for both CN and BDF2 varying the magnitude We note that, in the case of CN, the iterations to tolerance of TGMt look stable for < 1 as N, M increase. The same holds for TGMxt and TGMx when ≈ 1 . Nothing seems to work when > 10 again independently of N, M, which is what we are expecting due to the strong anisotropy of this specific case discussed in Sect. 5.2.
In the case of BDF2, again according to our theoretical analysis, TGMx and TGMt yield good convergence results when > 1 and < 1 , respectively, and both are stable as N, M increase. In line with what we observed in Sect. 6.1, the high number of iterations of TGMt, even if constant, could be due to the bad smoothing effects along the time axis of 0.5-BJ. Concerning TGMxt, it yields the same iterations to tolerance as TGMt when > 10 −1 . Note that for this example both TGMxt and TGMt work where they are not supposed to, i.e., in the anisotropic case ≫ 1 . This is due to the smoother that, according to Remark 5, is already a robust enough solver. Due to the high computational cost of TGM, in Fig. 7 we switch from TGM to V-cycle and we check its behavior depending on the chosen time discretization scheme.
In the case of CN, conversely to the results obtained for TGM, the only projector which seems to allow V-cycle to converge in a reasonable amount of iterations, is the semi-coarsening in space. However, it works in a really small region, i.e. when ≈ 1 , which is close to the region where one iteration of 0.5-BJ yields a smooth solution (go back to Fig. 2).
Regarding BDF2, Vx converges in almost the same amount of iterations as TGMx when > 1 . In particular, for ∈ 1, 10 3 the block diagonal part of A BDF2 is not dominant and hence 0.5-BJ has a slow convergence, but when it is used as smoother in Vx we obtain a robust and fast convergent method. Moreover, we note that the region where 0.5-BJ used as standalone solver is already enough robust becomes smaller as the mesh-size increases. This is not the case for Vx, which is then faster than 0.5-BJ in a wider range of as N, M become large. Concerning Vt and Vxt, their plots are basically superposed independently of , and they perform well only for large values of again because of the ω-BJ smoother.
In the case of a semicoarsening in space only, since time interpolation is not involved, larger values of could be used. Tests which are not reported here show a reduction in the iteration number of the multigrid with ω-BJ, with ≈ 1 , when applied to both CN and BDF2 for almost the same values of where it performs well with = 0.5.
We note that, for both CN and BDF2, the iterations to tolerance of all the tested V-cycles stay almost stable as N, M increase. Moreover our results are in line with the results reported in Figure 2, Section 4.4 of [12], where multigrid with coloured pointwise Gauss-Seidel as smoother is used to solve a space-time linear system obtained from the discretization of a standard time-dependent diffusion equation.

A variable diffusion coefficients example
We now consider the example taken from [14] in which the diffusion coefficients are not constant.

Example 2
We assume the space and time domains in problem (1) as = (0, 2) , and [0, 1] respectively, and define the diffusion coefficients, the true solution and the solution at t = 0 as follows where is the gamma function and d > 0 . When d = 1 , the forcing term is given by while in the remaining cases, the numerical approximation of v is computed starting from the discretized exact solution.
We note that, as in the case of constant and equal diffusion coefficients, the results are not significantly sensitive to N, M. Moreover, like in Example 1, Vx is the only V-cycle, between the three tested, that yields good convergence results for both CN and BDF2. Finally, also in this variable coefficients example, the optimal convergence region, given by the magnitude of = d ⋅ r , is much bigger in the case of BDF2 than of CN.
We note that, independently of the constant or variable diffusion coefficients character, none of the tested methods is robust enough to deal with the case where < 1 . Further tests, not reported here, show that this holds unchanged even when using the V-cycle as preconditioner for the GMRES. On the other hand, we stress that, since d is fixed, the choice of an opportune grid could lead to > 1 , choosing t and x such that d t > 2 x and making V-cycle a suitable solver again.

Two dimensional case
We end the numerical section by providing numerical results in the two-dimensional case. We consider the following extension of the one-dimensional FDE in Eq. (1): Due to hardware limitations we cannot choose too dense grids, therefore we fix N x = N y = M = 2 6 − 1 , where N x and N y are the amount of points over the grids in the first and second spatial dimensions and M are the points over the time grid. Moreover, when considering the BDF2 scheme, as in the one-dimensional case we use CN to compute the solution at the first time step. As done in [11], we consider the case where the computation of the solution u 1 at the first the step is included in the coefficient matrix (inner CN) and we compare it with the previously considered case (outer CN), where u 1 is computed outside the coefficient matrix.
In Fig. 9 we show the iterations to tolerance of the multigrid used as standalone solver (denoted by 'xy') for solving Eq. (10) discretized with CN, BDF2 with inner CN and BDF2 with outer CN. In the case of BDF2 with outer CN, we compare the results with 1-BJ.
We note that, even in the two-dimensional case, multigrid applied to CN is efficient when ≈ 1 and applied to BDF2 when ≥ 1 . Moreover, we observe that the plots of BDF2 with inner and outer CN overlap almost everywhere, therefore the addition of CN inside the coefficient matrix does not seem to compromise the convergence of multigrid.
When ≥ 10 2 , as in the one-dimensional case, 1-BJ is an efficient solver, since the coefficient matrix becomes block diagonally dominant.

Conclusions and future works
In this work we focused on an all-at-once rephrasing of a time-dependent onedimensional space-FDE with constant diffusion coefficients discretized with WSGD in space and CN or BDF2 in time. The unconditional stability of the BDF2-WSGD scheme has been proven, and the two-level Toeplitz structure of the resulting linear systems has been leveraged to design multigrid strategies that use block Jacobi as smoother and whose projectors definition is driven by the symbol.
We have numerically shown that V-cycle with semi-coarsening in x is the only multigrid, among all the tested ones, that yields good convergence results for both BDF2 and CN schemes. Moreover, it performs satisfactorily under less restrictive assumptions on the magnitude of = d ⋅ r in the case of BDF2 than in the case of CN, and this let us to conclude that BDF2 is a much better alternative to CN for parallel-in-time integration with multigrid, when is large.
As future works, we plan to properly study the relaxation parameter of block Jacobi, as well as to consider alternative smoothers that could allow multigrid convergence even when ≪ 1 . Moreover, it would be interesting to investigate the stability of higher order BDF schemes for smooth solutions in time, which could potentially increase the effectiveness of a multigrid solver reducing, at the same time, the overall computational cost. Furthermore, we aim at providing a parallel implementation of such strategies and to extend our structure-based approach also to other stateof-the-art solvers like parareal [25] or MGRIT [26]. Finally, we aim at extending our results also to the variable coefficients case and to higher-dimensional problems. This opens to other kind of anisotropies for which the V-cycle with semi-coarsening could not work anymore and that could, instead, be treated with the MG-S already applied to two-dimensional FDEs in [5].