Boundary Treatment and Multigrid Preconditioning for Semi-Lagrangian Schemes Applied to Hamilton-Jacobi-Bellman Equations

We analyse two practical aspects that arise in the numerical solution of Hamilton-Jacobi-Bellman (HJB) equations by a particular class of monotone approximation schemes known as semi-Lagrangian schemes. These schemes make use of a wide stencil to achieve convergence and result in discretization matrices that are less sparse and less local than those coming from standard finite difference schemes. This leads to computational difficulties not encountered there. In particular, we consider the overstepping of the domain boundary and analyse the accuracy and stability of stencil truncation. This truncation imposes a stricter CFL condition for explicit schemes in the vicinity of boundaries than in the interior, such that implicit schemes become attractive. We then study the use of geometric, algebraic and aggregation-based multigrid preconditioners to solve the resulting discretised systems from implicit time stepping schemes efficiently. Finally, we illustrate the performance of these techniques numerically for benchmark test cases from the literature.


Introduction
We consider semi-Lagrangian schemes, as described in [5,9], for the numerical approximation of solutions to the Hamilton-Jacobi-Bellman (HJB) equation where Ω is a domain, Q T := (0, T ] ×Ω withΩ := Ω ∪ ∂Ω ⊆ R d , A is a compact set, is a second order differential operator, and ψ and g are the Dirichlet and initial conditions. The coefficients a α = 1 2 σ α σ α,T , b α , c α , f α , the initial data g and the boundary conditions ψ take their values, respectively, in S d , the space of d × d symmetric matrices, R d , R, R, R, and R, and σ α ∈ R d×P such that a α is positive semi-definite. We also assume the usual well-posedness conditions on the PDE coefficients, i.e. Lipschitz continuous in x uniformly in α, Hölder continuous with exponent 1 2 in time and continuous in α for each (t, x) ∈ Q T [18]. The relevant notion of solution for this type of non-linear equations is that of viscosity solutions [7] and the above conditions guarantee existence and uniqueness.
In general, the viscosity solution to (1.1)-(1.3) is unknown, thus it is necessary in practice to compute approximations numerically. Sufficient conditions for a numerical scheme to converge to the unique viscosity solution of (1.1)-(1.3) were proved by Barles and Souganidis [2] in terms of consistency, L ∞ -stability and monotonicity. We restrict our attention to finite difference discretizations of the differential operator (1.4).
The requirement of monotonicity drastically affects the properties and construction of finite difference schemes. Theorem 4 in [27] proves that local monotone discretizations have at most first order for first-order equations and second order for second-order equations. What is more, standard fixed stencil methods are monotone only under restrictions on the diffusion matrix, such as diagonal dominance [9,12]. Results from [6,21] further illustrate the limitations of such methods for the monotone approximation of second order derivatives.
This implies that generally approximations have to be non-local on the discrete level, i.e. the distance between mesh points involved in the scheme at a given point grows in relation to the mesh width as the mesh is refined. Such schemes are referred to as wide stencils. For general diffusion matrices, first order accurate wide stencils of the type considered here have been proposed in [5,9], and a mixed fixed-and wide-stencil scheme in [19].
In this article, we analyse two issues arising in practice when numerically solving (1.1)-(1.3) using the class of schemes described in [20,5,9] to discretize the second order differential operator (1.4). This approximation combines wide stencils in the directions determined by the columns of the diffusion matrix σ α and the drift b α , together with (linear) interpolation. Following the notation in [9], we write the matrix σ α ∈ R d×P as (σ α 1 , σ α 2 , . . . , σ α P ), where σ α p ∈ R d for p ∈ {1, 2, . . . , P } denotes the p-th column of σ α , and observe that for k > 0 and any smooth function φ, where O(k 2 ) is the local truncation error of the finite difference and for compactness we write b α ≡ b α (t, x) and σ α ≡ σ α (t, x). As these approximations will be used for points lying on a discrete spatial grid Ω ∆x with nodes {x j : 1 ≤ j ≤ N }, the displaced points x + k 2 b α , x ± kσ α p do not generally coincide with nodes of Ω ∆x . Therefore, φ is replaced by an interpolant I ∆x φ on that grid. We restrict our attention to linear interpolants, defined by the standard piecewise multilinear non-negative basis functions {w j (·) : 1 ≤ j ≤ N } associated with the mesh nodes, such that for any function φ φ(x j )w j (x), (1.7) for all x ∈ Ω, x j ∈ Ω ∆x , where N (x) is the set of neighbours of x on the mesh Ω ∆x , i.e. the mesh points with non-zero interpolation weight. The resulting scheme is referred to as the Linear Interpolation Semi-Lagrangian (LISL) scheme. It is shown in [9] that the leading order terms of the local truncation error are proportional to k 2 and ∆x 2 k 2 , where the last quantity corresponds to the linear interpolation error in the finite difference formulae (1.5) and (1.6) by replacing φ by its interpolant. Therefore, by choosing k = √ ∆x, the resulting scheme is locally of first order in ∆x. Following the notation in [9], the LISL finite difference approximations for the differential operator in (1.4) can be expressed as (I ∆x φ)(t, x + y α,+ p (t, x)) − 2(I ∆x φ)(t, x) + (I ∆x φ)(t, x + y α,− p (t, x)) 2∆x , (1.8) for x ∈ Ω ∆x , and some M ≥ 1. Different schemes can be obtained depending on the values taken by M and y α,± p (t, x). In particular, [9] discusses the following three schemes: Examples of LISL schemes.
2. Scheme 2: The approximation in [9], corresponding to y α,± p = ± √ ∆xσ α p for p ≤ P , y α,± P +1 = ∆xb α , and M = P + 1. 3. Scheme 3: A more efficient version of the Camilli-Falcone approximation, corresponding to y α,± p = ± √ ∆xσ α p for p < P , y α,± P = ± √ ∆xσ α P + ∆xb α , and M = P . The authors show that this family of discretizations of (1.4) is consistent and monotone. Monotonicity of the scheme is fulfilled as the discrete approximation L α ∆x [I ∆x φ] is the composition of monotone finite differences and a monotone interpolation operation. Once discretized in space, the final scheme arises from discretising in time using the standard θ-time stepping scheme for θ ∈ [0, 1], where θ = 0 corresponds to the explicit Euler time stepping and θ = 1 to the implicit case, on a time grid represented by a strictly increasing sequence of points {t n } Nt+1 n=0 with t 0 = 0, t Nt+1 = T , and ∆t n := t n − t n−1 ≤ ∆t for all n. The scheme being monotone, it can be written as described in the following definition, where for any grid function V : {t n } Nt+1 n=0 × Ω ∆x → R, V n i ≡ V (t n , x i ). Definition 1.1 (Equation (4.1) in [9]). A scheme is said to be of positive type, if it can be written as for j = 1, . . . , N , on the discrete domain {t n } Nt+1 n=0 × Ω ∆x , where U n i is the numerical solution at node (t n , x i ) and all the coefficients B are non-negative.
For the convenience of the reader, we reproduce the expressions for B α,n,· j,· of the LISL schemes as in [9], for all 1 ≤ i = j ≤ N , .
The schemes described above have a wide stencil as the length of the stencil, being proportional to the ratio k/∆x ∼ 1/ √ ∆x, tends to ∞ as ∆x → 0. Hence, when applied on a bounded discrete grid, the stencil will generally exceed the domain for points close to its boundary. As discussed in [9], the overstepping may pose a problem depending on the equation and the type of boundary conditions imposed. We consider Dirichlet boundary conditions here.
Our first goal is to present and analyse a modification of the LISL scheme to deal with overstepping for problems on bounded domains with Dirichlet boundary conditions, and general drift and diffusion coefficients. We describe how to truncate the LISL stencil so that the truncation remains consistent and monotone. We prove that the resulting stencil for Scheme 2 above is of positive type (as per Definition 1.1), and since the coefficients B in (1.9) do not depend on U , it is also monotone. This is not the case for Schemes 1 and 3. We also observe that the truncation has both local and global impacts on the properties of the scheme. Locally, the modification of the scheme leads to a loss of accuracy of half an order in the consistency error, i.e. O( √ ∆x) instead of O(∆x), due to the loss of symmetry. We compare the accuracy of the truncation with extrapolations of the boundary conditions by way of numerical tests for benchmark problems. As the mesh points requiring truncation of the scheme are restricted to an O( √ ∆x) layer at the boundary, convergence rates close to O(∆x) are observed empirically for the new scheme. The truncation has a global effect in the sense that it modifies the CFL condition of explicit schemes by at least half an order, from ∆t = O(∆x) to ∆t = O(∆x 3/2 ). As the empirical error is O(∆t) + O(∆x) for fully implicit schemes, the computationally most efficient choice is ∆t ∼ ∆x, outside the stability region of explicit schemes.
The second goal is therefore the use of implicit schemes and the efficient solution of the discrete system (1.9) using multigrid preconditioning. For θ = 0, the coupling of the optimal control and the coefficients makes (1.9) a non-linear system of algebraic equations, where A α i is the i-th row of a matrix A α with elements A α i,j , i, j = 1, . . . , N , and control α ∈ A. Comparing with (1.9), A α i,j = B α,n,n i,j , F α i = F α,n−1+θ i , and X = (X i ) = (U n i ) is the solution vector for the n-th time step. The maximisation over α in (1.10) is row-wise and usually done by linear search. By construction of the LISL scheme, A α is an M-matrix with non-negative row sum. Therefore, following results in [4], we can use policy iteration to compute U . Then, within each policy iteration, a linear system A α i i X = F α i i , i = 1, . . . , N , with fixed control vector (α i ) 1≤i≤N has to be solved. We find (in contrast to [19]) that this last step is the computationally most costly part of the overall algorithm if direct linear solvers or standard iterative solvers are used. 1 We therefore study multigrid preconditioners. (See Table A .5.) In the literature on multigrid for HJB equations, two main approaches are observed: on the one hand, multigrid is applied directly to the non-linear problem, as in [3,13,14]; and on the other hand, multigrid is applied to a linearised problem, as in [1]. In particular, [3,14] provide the first multigrid algorithms for HJB equations and prove convergence, while [13] presents a novel smoother for HJB equations based on damped value iteration [17]. These articles have in common the use of standard fixed stencil finite difference approximations and the use of a geometric structure when building the hierarchy of multigrid subspaces.
The novelty of this article is to study the application of multigrid preconditioning to a wide stencil discretization. We will demonstrate, both by Fourier analysis of a model problem and by numerical tests in a more complex application, that standard geometric multigrid does not give mesh-size independent convergence.
We then investigate algebraic multigrid methods. The basis for the specific algorithm we use was introduced in [24] for linear elliptic PDEs. It empirically showed that "aggregation based methods could yield robust 2 and convergent schemes if used as preconditioners of a Krylov method, and were part of an enhanced multigrid cycle, not simple V-or W-cycles" as considered in [31]. By enhanced multigrid cycles, the authors refer to recursive schemes in which at each coarse level the solution to the residual equation is computed using a number of Krylov subspace iterations as in [26] or with a semi-iterative method based on Chebyshev polynomials called the AMLI cycle, see Section 5.6 of [34]. The aggregates were formed using heuristic criteria following coupling in the strongest direction.
In [22] the authors introduced an aggregation-based multigrid method with guaranteed convergence rate for symmetric M-matrices with non-negative row sum. A LISL discretization matrix is only symmetric in very specific cases with limited practical interest. For nonsymmetric matrices, in [25] convergence of a simplified two-grid scheme using aggregation is proved for non-singular M-matrices with non-negative row and column sums. This requirement ensures that the symmetric part of the coefficient matrix A given by A + A T meets the assumptions in [22] and allows the use of its theoretically justified algorithms. We will derive conditions on the coefficients of the HJB equation such that this theory applies, and show empirically that aggregation-based multigrid gives roughly mesh-size independent convergence.
The rest of the article is organised as follows. Section 2 discusses the truncation of the LISL scheme for points whose stencil exceeds the domain and compares its performance to naïve extrapolations of the boundary conditions. Section 3 considers the application of three different multigrid methods to linear systems where the coefficient matrix arises from LISL 1 Which of the two steps is more costly depends crucially on the type of control problem. The optimisation step is typically fast if the control is taken from a finite set, if the local control problem is analytically solvable (e.g., quadratic or of 'bang-bang'-type), or if the coefficients are a smooth convex function of the control, such that standard Newton-type methods can be used. It will be more costly, if the optimal control has to be approximated by exhaustive search over a discretised control set, especially if the dimension of the control space is higher than the spatial dimension of the PDE. In the examples considered in this paper, the control is scalar and we optimise by linear search over a one-dimensional control mesh. 2 In this context, a robust method is referred to as one showing good performance for a large range of problems without changing the smoother. discretizations. Section 4 contains the final remarks.

Boundary treatment for the LISL scheme
In this section, we analyse adaptations of Schemes 1-3 for initial-boundary value problems on bounded domains. As described in the introduction, for points x close to the boundaries of the domain, the stencil points x + y α,± p (t, x) in (1.8) generally do not lie in a mesh element. In the following, we therefore discuss the truncation of (1.8) so that the resulting scheme remains monotone, consistent, and L ∞ -stable. The proposed truncation samples the boundary points on the straight lines defined by the point x and x + y α,± p (t, x) and adjusts the corresponding finite difference weights for consistency.

Definition of truncated stencils
We take Ω ⊂ R d for d ≥ 2. We first outline how the method can be defined on a general domain with curved boundary, but later (especially in the numerical tests) focus for simplicity on rectangular domains. We start with a Cartesian mesh on R d with uniform mesh width ∆x and then choose Ω ∆x as all the points which lie inside Ω. See We now fix a mesh node x ∈ Ω ∆x . There are two distinct situations where interpolation at the point x + y α,± p (t, x) as per (1.8) is not possible for given t, α and p: Fig. 2.1); B. x + y α,± p (t, x) ∈Ω, but the element it is contained in has vertices outsideΩ (top right).
We say the stencil "oversteps". In such cases, the objective is to find truncated or extended stencil vectorsŷ α,± p (t, x) and corresponding finite difference weights A α p ≡ A α p (t, x) and B α p ≡ B α p (t, x), such that x +ŷ α,± p (t, x) ∈ ∂Ω and the truncated schemê is a consistent approximation of (1.4) as ∆x → 0. If the stencil does not overstep, we have thatŷ α,± p (t, x) = y α,± p (t, x) and A α p = B α p = 1. If it does, for any t we definê In case A, this means µ < 1, while in case B we have µ > 1.
In the remainder of this section we restrict our attention to the truncation of the scheme on rectangular domains, in which case the elements of the Cartesian mesh cover exactly the domain and case B does not occur. Moreover, this means that interior mesh points cannot be arbitrarily close to the boundary, but are always at least ∆x away 3 . This allows the derivation of CFL conditions for the explicit schemes as given below in Section 2.3.

Consistency conditions
In the truncated scheme (2.1) there are M pairs of weights, which can be chosen freely, subject to positivity, in order to obtain a consistent scheme. As we will see below, this is only possible for Scheme 2.
In the following, we denote [ [1, j]] ≡ [1, j] ∩ Z and for a vector v ∈ R d , (v) i denotes its i-th element. As in the introduction, we have that b α ∈ R d , and σ α = (σ α 1 , . . . , σ α p , . . . , σ α P ) ∈ R d×P where σ α p ∈ R d denotes the p-th column vector. For compactness, we omit the dependence of the coefficients and the stencil related functions with respect to the position, that is b α ≡ b α (t, x), σ α p ≡ σ α p (t, x), y α,± p ≡ y α,± p (t, x) and µ α,± p ≡ µ α,± p (t, x). We add a second subscript taking values 1, 2 or 3 to A α p , B α p and y α,± p to make the discretization scheme explicit.
Proposition 2.1. The truncated version of Schemes 1 and 3 is generally not consistent.
Proof. By Taylor expansion of a smooth test function we find that the consistency conditions for Scheme 1 are where P ⊆ [[1, P ]] denotes the set of stencils overstepping the domain and i, i 1 , In Scheme 1, there are 2|P| ≤ 2d variables, but (d 2 +3d)/2 equations, d from the condition on the Jacobian and (d 2 +d)/2 from the condition on the Hessian. This overdetermined system has a solution only if there is linear dependence between the equations. Except for special cases, e.g. |P| = 0 or σ α p parallel to b α for some p, this is not the case. Hence, in general the truncated Scheme 1 is not consistent.
We conclude that for points whose stencil oversteps the boundary, the approximations of the first and second derivative should be considered separately, as done in Scheme 2.

2)
and, for p ∈ [[1, P ]], Then the scheme defined by (2.1) is consistent unless both µ α, Proof. If the stencil oversteps, then the truncated stencil consists of the point at the intersection between the boundary ∂Ω and one of the segments {x, For each point (t, x) Scheme 2 requires the calculation of at most 2P + 1 different weights, i.e. 2P for the second order term and one for the first order term. For the latter we have thatŷ α,+ 2,P +1 =ŷ α,− 2,P +1 , therefore A α 2,P +1 = B α 2,P +1 . Ignoring the interpolation error for the time being, the coefficients are obtained from the consistency conditions (up to a term o(∆x)), for the first order term, and for the second order term. By construction of the truncated stencil (2.4) and (2.5) are linearly dependent across i, and (2.6) across i 1 and i 2 , resulting in one (linearly independent) equation for the first order term weights and two for A α 2,p , B α 2,p , with solutions given by and which are seen to be equivalent to equations (2.2) and (2.3). The contribution to the consistency error of (2.1) from the bilinear interpolation operator c) The local consistency error for points with truncation and p = P To prove c) we use Taylor expansions for each p and conclude using the limits in b). Let φ :Ω → R be a smooth function and for any p ∈ (P ∩ [[1, P ]]), where P denotes the set of stencils overstepping the domain, then by Taylor expansion and the consistency conditions (2.5)-(2.6) the local consistency error τ for the p-th addend of (2.1) using multi-index notation is given by where, due to the truncation of the stencil, the scheme is not central and therefore the terms for odd |β| do not cancel out. If only one side of the stencil oversteps then for |β| = 3 whereas if both sides overstep then the error from interpolation dominates and is O(1) for points O(∆x) from the boundary, as seen at the end of the proof of Proposition 2.2.
Remark 2.1 (Two-sided overstepping). We note that it is possible for both sides of the stencil to overstep if the diffusion direction σ α p is (almost) parallel to the domain boundary, for points close to a locally convex smooth boundary with high curvature in that direction, as well as close to corners; see Remark 2.4 and Table 2.5 below.
The scheme is consistent at points with two-sided overstepping if the truncated scheme is not interpolated at the boundary but uses the exact boundary values. In that case, the consistency error for those points is O(∆x).

Properties of the truncated stencil
The changes in the finite difference weights of scheme (2.1) introduced by the truncation, modify the positivity conditions given in Lemma 4.1 in [9]. We will show that the scheme remains conditionally L ∞ -stable and monotone, but the CFL conditions are more restrictive in the truncated case for time-stepping schemes with θ < 1. We start by writing the scheme on a discrete time-space grid with mesh parameters ∆t and ∆x aŝ where N is the set of neighbours as in (1.7), and The first equality follows from (2.1), the second from (1.7) and since for all It is straightforward to write down the expressions for the coefficients in (1.9): In writing down (2.9), we assumed that the value at the boundary is interpolated from other mesh points, which is feasible on rectangular cuboids, but not for general domain boundaries. In both cases, the Dirichlet boundary value at x j +ŷ α,± p can be used. This has the advantage that interpolation error is avoided. Moreover, as this value then contributes to the right-hand-side f of equation (2.11) instead of the off-diagonal matrix elements, the system matrix becomes more diagonally dominant. This is advantageous for the iterative solution, see Section 3.4.
The next proposition contains the positivity conditions for the coefficients B defined above.
Proposition 2.4. The scheme (2.11) is of positive type if the following conditions hold, for all α, n, i.
Corollary 2.5. In the case of overstepping and θ < 1, monotonicity requires that ∆t ∼ O(∆x 3/2 ) if only one side of the diffusion stencils oversteps, or ∆t ∼ O(∆x 2 ) if both sides overstep. However, if the stencil is not truncated, the positivity condition remains as in [9], that is ∆t ∼ O(∆x).
Proof. From Corollary 2.3, if the corresponding stencil is truncated on one side A α,n−1 The L ∞ -stability follows from the proof of Lemma 4.1 in [9] and the new CFL conditions in Proposition 2.4.

Numerical experiments
To test the truncation of the stencil, we consider Problems A and B in Section 9.3 from [9]. Both problems follow the formulation in (1.1)-(1.3) with homogeneous Dirichlet boundary conditions and have smooth solutions.
Problem A (see Section 9.3 from [9]). It has exact solution u(t, x 1 , x 2 ) = 3 2 − t sin x 1 sin x 2 , and coefficients and control set are given by Problem B (see Section 9.3 from [9]). It has exact solution u(t, x 1 , , and coefficients and control set Both problems are solved on the domain (t, We discretize the spatial domain using Cartesian grids with N x ×N x equispaced nodes and for the control set A we take N α equally spaced points. Here, I ∆x is the usual bilinear interpolator on rectangles.
For illustration of the stencil and its non-locality, the top row of Figure 2.2 represents the stencil for Problems A and B on a Cartesian grid of 11 × 11 points and 10 points in the control set A. Colour coded lines link the stencil points with the node where the numerical solution is computed, the different colours correspond to the differentŷ α,· · . On top of some of the stencil points we print the value of the finite difference weights, for compactness we set A ≡ A α 2,1 (x), B ≡ B α 2,1 (x) and C ≡ (µ α 2,2 (x)) −1 , following the notation in (2.3) and (2.2). The bottom row of Figure 2.2 represents the non-locality of the diffusion stencil by counting the number of stencil points at a given distance from the central node. The distance is measured as multiples of ∆x and given by ( , where the grid is of size 641 × 641 and 10 points in the control set A.
Problems A and B were obviously chosen in [9] for their periodic solutions, to be able to analyse the convergence of the scheme without the complication of boundary conditions. Here, we do not make use of the periodicity but only use the values at the boundary and not outside the domain.
We note that the problems being linear in t, a single time step with ∆t = T suffices to obtain an exact solution in t. However, in order to check the effect of the truncation on the stability, in addition to ∆t = T , we also investigate ∆t equal to ∆x 4 , ∆x 3/2 , and ∆x 2 . We report the ∞-norm of the errors over two regions: the first one comprising the whole domain, and the second one comprising part of the interior of the domain.
Ω∆x where Ω∆x is a Cartesian grid with ∆x = 2π 640 , 10 points in the control set A, and i ∈ {1, 2} is the dimension index.
Ω∆x where Ω∆x is a Cartesian grid with ∆x = 2π 640 , 10 points in the control set A, and i ∈ {1, 2} is the dimension index.
and C ≡ (µ α 2,2 (x)) −1 , following the notation in (2.3) and (2.2). To illustrate the non-locality of the scheme as the grid is refined, the second row represents the histograms of the shortest displacement from the central node for a grid of size 641 × 641 for both problems. The radius of the stencil in σ α is 14.27 for this grid, given by σ α 2 √ ∆x = 640/π. We consider explicit and implicit time stepping schemes, corresponding to θ = 0 and θ = 1 respectively. For the explicit scheme in the case of overstepping we test the following modifications of the scheme: 1. truncation of the stencil as discussed in Section 2.2 ( The results confirm the impact of the truncation on the stability of the scheme, when θ = 0. However, when θ = 1, we do not observe any instability regardless of the size of the time step. When stable, the truncation of the stencil outperforms the two extrapolations of the boundary conditions considered. Furthermore, as the mesh and time steps are refined, only the truncated scheme, if stable, achieves convergence orders close to O(∆x) when the error at t = T is measured on the entire spatial grid. This can be explained without rigorous proof by the observation that the truncation error of order √ ∆x is restricted to a boundary layer of width √ ∆x. Therefore, as seen from the last two columns in Table 2.4, choosing ∆t of order higher than 1 in ∆x does not improve the accuracy of the numerical results and leads to computational inefficiency. Remark 2.3. Regarding the discretization of the control set, we take N α = 40 equally spaced points. For this choice, the discretization error of the LISL scheme is found to dominate the control discretization error for the problems and the space-time mesh sizes considered.
Remark 2.4. Corollary 2.5 shows two different CFL conditions for the truncated stencil, the first one for diffusion stencils where only one side oversteps and a second one when both sides overstep. The results in Note that the solution itself is periodic with period 2π. This problem differs from the original one in that both sides of the diffusion stencil overstep for mesh points within a distance of O( √ ∆x) to the bottom left corner, located at ( −π 8 , −π 8 ), where σ α = (−1, 1) T . In Table 2.5 we report the results for the explicit method using the truncation of the stencil. As expected, we find that we now need ∆t ∼ ∆x 2 for stability.
Remark 2.5. For the explicit method using the truncation of the stencil, i.e. Tables 2.1, A.1 and 2.5, focusing on the ∆t = T case, we notice that the convergence rate over the whole mesh Ω ∆x is approximately 0.5, whereas it is approximately 1.0 when the error is measured in the interior of the mesh. We also notice that there is a significant difference between the magnitude of the errors if measured over the whole grid or on a region in the interior. The difference in the magnitude of the errors may be due to the fact that ∆t = T does not satisfy   the CFL condition and that the CFL condition is more restrictive for points where the stencil is truncated. It is also at these points that the local consistency error is of order √ ∆x as shown in Corollary 2.3. The situation is different for ∆t = ∆x 2 in the explicit case, or for any ∆t in the implicit case. In these cases, the error convergence rates are approximately 1.0 when measured over the whole mesh and the errors over the whole grid and in the interior are comparable in magnitude.

Multigrid preconditioning
In this section, we study the application of multigrid preconditioners together with policy iteration [4] to solve the non-linear system (1.10).
Geometric multigrid requires us to predefine a grid hierarchy based on the geometry of the problem. The variability of the width of the LISL stencil within a given grid (variable coefficients) and through the grid hierarchy makes it difficult, even for simple problems, to design an appropriate grid hierarchy and a good smoother. Moreover, the varying stencil requires us to build the coarse-grid version of the operator algebraically instead of using its coarse grid version, which further limits our knowledge of the problem as we go deeper into the grid hierarchy.
Another aspect to consider is related to the transfer operators. Standard grid interpolations provide approximations using the grid neighbours of a given node, whereas for the LISL stencil, being non-local, the solution at a given node may not be best approximated by its neighbours on the grid but by those on its stencil. These heuristics suggest that the algebraic approach to multigrid, fixing the smoother and building operator dependent intergrid transfer operators, may result in more efficient multigrid preconditioning for LISL discretizations.
Algebraic multigrid (AMG), introduced in [28], constructs "coarse grids" based on the matrix coefficients. However, as pointed out in Section 6.2 of the recent review on preconditioning [36], AMG coarsening may not reduce the number of variables fast enough from one grid to the next. A slow reduction in the number of unknowns and the use of the Galerkin principle to build the coarse system matrix with intergrid transfer operators using weighted averages increase the complexity of the multigrid scheme. To measure the complexity the following quantities are commonly used: Definition 3.1. The grid complexity c G is the total number of variables N . , on all multigrid levels, divided by the number of variables on the finest level N 1 , Definition 3.2. The algebraic complexity c A is the total number of non-zero entries, in all matrices A , divided by the number of non-zero entries of the finest level operator A 1 , We will find a benefit to the convergence of constructing the "coarse grids" algebraically already for simple examples of LISL matrices (Section 3.3, in particular Table 3.1), and that algebraic construction of the grid hierarchy deals well with the varying LISL stencils (Section 3.5). However, there is an increase in complexity of AMG (see Table 3.1) mainly due to the use of interpolation in LISL discretizations.
Recent and on-going research on algebraic multigrid [24,25] shows how one can construct good multigrid cycles using simplified "intergrid" transfer operators based on aggregation of the unknown variables, thus avoiding the problem of increased complexity on coarser levels. In particular, [25] proves convergence of a simplified two-grid scheme using aggregation for non-singular M-matrices with non-negative row and column sums. We will show that these results apply for LISL discretizations matrices and justify the use of AGMG both theoretically and empirically.

On the spectrum of LISL matrices
To assess the suitability of preconditioning based on geometric multigrid, we start by considering the spectrum of LISL matrices for a simplified model. For illustration, we first calculate the eigenvalues and eigenvectors of the LISL discretization of the diffusion operator with constant coefficients, for any function u : where σ ∈ R d×d is a diagonal matrix with (σ) ii = σ i . We start by considering the one-dimensional case on an equispaced grid Ω ∆x , where ∆x > 0 is the distance between two consecutive nodes. For σ > 0, we define where m ∈ N denotes the stencil length and γ ∈ [0, 1] is the interpolation weight of the onedimensional linear interpolation operator, such that for any real function φ : R → R the linear interpolation operator on Ω ∆x is I ∆x (φ)(x i + √ ∆xσ) = γφ(x i + m∆x) + (1 − γ)φ(x i + (m + 1)∆x). Without loss of generality and for simplicity of the notation we assume that ∆x = 1. Denote by L N the following N × N Laplacian matrix Let now m = 2 and ∆x = 1, then the LISL discretization matrix is given by       The spectrum of higher-dimensional constant coefficient Laplacians can be inferred from the spectrum of the one-dimensional matrices by means of Kronecker products. Next, we consider the properties of common smoothers when applied to LISL discretization matrices of the two-dimensional Laplacian and conclude with an example illustrating the impact of the diffusion coefficient on the convergence of geometric multigrid cycles.

Local Fourier analysis of the smoothers
We seek to analyse how a varying size stencil affects the properties of the standard Gauss-Seidel smoother. We base the analysis on Local Fourier Analysis (LFA) as described in Chapter 4 of [32] and state the smoothing factors µ loc of Gauss-Seidel iterations when applied to wide stencil finite difference discretizations. The key to the analysis is the use of grid functions of the form ϕ(θ, x) = e iθ·x , where i is the imaginary unit, x ∈ R d , θ ∈ [−π, π) d and · is the inner product for vectors in R d . For simplicity we consider equispaced grids Ω ∆x with refinement parameter ∆x > 0. Therefore, any x ∈ Ω ∆x can be written as x ≡ x 0 + κ∆x for some fixed x 0 ∈ Ω ∆x and κ ∈ Z d . It is thus convenient to rescale the exponent of ϕ by ∆x −1 .
The functions ϕ are important since, as shown in Lemma 4.2.1 of [32], "all grid functions ϕ(θ, x) are (formal) eigenfunctions of any discrete operator which can be described by a difference stencil". This property allows us to associate to each discrete finite difference operator L ∆x a so-called symbolL ∆x (θ) defined by where s κ ∈ R is the finite difference coefficient at the location κ with respect to the node x 0 . As in [32], we consider smoothers formed by a splitting L ∆x = L + ∆x + L − ∆x of the discrete operator, i.e. , whereL + ∆x andL − ∆x are defined as for L ∆x in (3.4). With multigrid, the objective of the smoother is to dampen error components not reduced by the coarse grid correction. Therefore, assessing the properties of a given smoother requires fixing the coarse grid correction. We limit the study to the simplest coarsening strategy, that is if Ω ∆x is the fine grid then Ω 2∆x is the coarse grid. This leads to the definition of low and high frequencies below.  [32]). For the coarsening considered, we define the high and low frequencies as follows: Definition 3.4 (Definition 4.3.1 in [32]). The smoothing factor for standard coarsening is We employ these definitions to compare the smoothing factors for the standard twodimensional Laplacian, setting d = 2, discretised using standard local finite differences and the LISL discretization.  [32]). The smoothing factor for the Gauss-Seidel smoother for the standard Laplacian discretisation is given by Similarly, the smoothing factor for the LISL scheme can be derived. In the present case of pure diffusion, Schemes 1-3 coincide.
Example 3.2. Proceeding as in [32] for Example 3.1, the symbols L + ∆x and L − ∆x for the LISL discretizations arẽ where m i and γ i are given by (3.2) replacing σ by σ i . For compactness of notation, define then the smoothing factor for a Gauss-Seidel smoother with standard coarsening and the LISL scheme is given by µ loc (S SL ∆x ) = sup for θ ∈ T high , where g 1 ≡ g(θ 1 , γ 1 , m 1 ), g 2 ≡ g(θ 2 , γ 2 , m 2 ), andc denotes the complex conjugate of the complex number c.
From (3.5) we see that as the non-locality of the discretization grows, i.e. m . → ∞ then the smoothing factor approaches 1 (no smoothing) and so highly oscillatory modes will be transferred to the coarser subspace. Figure 3.2 compares the smoothing factor for the fixed stencil 5 point discretization and a specific semi-Lagrangian stencil. Example 3.3. We can generalise the results in the previous example to the case of diffusion given by a vector (σ 1 , σ 2 ) T not necessarily parallel to any of the axes. If σ 1 and σ 2 have the same sign, theñ If, however, σ 1 and σ 2 have different signs, theñ To account for the fact that σ · can be negative, we re-define m i := |σ i |/ √ ∆x and γ i := (m i + 1) − |σ i |/ √ ∆x. The deterioration of the smoother for large m i is present here too.

Performance of geometric multigrid
We conclude the discussion of geometric multigrid by testing its performance against an iterative solver used in [19], i.e. BICGSTAB [33] with and without ILU(0) 4 as preconditioner [29], and algebraic multigrid algorithms, namely, the classical Ruge-Stüben AMG [28] using our own implementation, and AGMG from [25], using the implementation from [23].
As benchmark examples, we choose a linear system Ax = b whose coefficient matrix is the LISL discretization of (3.1) in the two-dimensional square [0, 1] 2 with Dirichlet boundary conditions, and σ = 2I 2 and σ = √ 5I 2 , respectively, where I 2 is the 2 × 2 identity matrix. These values are chosen to study the effect of interpolation (which is always required in the second case and only for odd levels in the first) on the convergence and complexity of the methods, in particular on the convergence rates of geometric multigrid and the operator complexity of algebraic multigrid.
We use a Cartesian grid with equal number of equispaced nodes in both directions, the smoother is Gauss-Seidel, the prolongation operator bilinear interpolation, the restriction the transpose of the prolongation, and the coarse grid operator is constructed using the Galerkin principle. Figure 3.3 presents the reduction of the residual, r k ≡ b − Ax k 2 , against the number of iterations k with x 0 = 0, for a discrete mesh where the distance between two consecutive nodes is ∆x = 2 −8 . The algorithm is stopped whenever the relative residual, b−Ax k 2 / b 2 , measured in the Euclidean norm, is below the prescribed tolerance, in this case 10 −6 .
Next, we study the residual reduction factor ρ = (r k /r 0 ) 1/k , where k is the number of  Geometric V (ν 1 , ν 2 ) and W (ν 1 , ν 2 ) cycles are considered, where ν 1 and ν 2 denote the number of pre-and post-smoothing steps. Their performance is compared to the iterative method BICGSTAB with and without preconditioner, and to two algebraic algorithms, AMG and AGMG from [28] and [25], respectively (see also Sections 3.4 and 3.5). Notice the almost overlapping of lines for geometric V (ν 1 , ν 2 ) and W (ν 1 , ν 2 ) cycles for equal ν 1 and ν 2 (see also Table 3.1, which shows almost identical rates for = 8).
iterations required for the prescribed tolerance. We solve the problems above for different refinement levels , where the number of nodes per dimension is 2 +1. We observe in Table 3.1 that for even the convergence factor ρ corresponding to geometric multigrid cycles for σ = 2I 2 is significantly worse than that for σ = √ 5I 2 . This is due to the lack of interpolation when 2 2 ∈ N, as the step is a multiple of ∆x, so for any mesh node x ∆x ∈ Ω ∆x x ∆x ± √ ∆xσ i ∈ Ω ∆x . The lack of interpolation, and the equal stencil lengths in both directions, gives µ loc = 1 in (3.5). Moreover, as shown in Figure 3.1, the eigenvectors corresponding to small eigenvalues are highly oscillatory and hence not resolved sufficiently on the coarse mesh.
Regarding the BICGSTAB iterative solver, we observe the benefit of using ILU(0) as preconditioner, however, the significant increase in the convergence rate as the mesh is refined (and hence the condition number of the matrix increases) suggests that convergence is not asymptotically mesh size independent. To further illustrate this, Table 3.2 contains the residual reduction factors for AGMG and BICGSTAB with and without preconditioner when solving (3.1) in the one-dimensional domain [0, 1] with homogeneous Dirichlet boundary conditions and discretized using the LISL scheme. We consider the cases σ = 2 and σ = √ 5. As discussed previously, for σ = 2 and even, no interpolation is required. In this case, ILU(0) exactly factorises the system matrix A and hence ρ = 0. However, when interpolation is needed, ρ approaches 1 for BICGSTAB as the mesh is refined but not for AGMG. Furthermore, for = 21 BICGSTAB and BICGSTAB with ILU(0) require approximately 86 and 21 times the time required by AGMG for the relative residual to be below 10 −6 . Additionally, let t tol be the time required for the relative residual to be below 10 −6 and N the number of unknowns, assuming that t tol ∼ O(N a ), empirically we observe that a equals 1.11 for AGMG, 1.50 for BICGSTAB and 1.36 for BICGSTAB with ILU(0) as preconditioner.
Returning to the two-dimensional case, the grid hierarchies in the geometric (GMG) and algebraic (AMG) multigrid have 5 levels, including the finest one. In the geometric case, the number of unknowns is 4 times smaller from one level to the next. In the AGMG case, the hierarchy is at most 4 levels deep. Table 3.3 reports the complexities for the three categories of algorithms considered. The results confirm the assertion in [36] that AMG coarsening, generally, need not reduce the number of unknowns fast enough. In the present setting, contrasting the case σ = √ 5I 2 with σ = 2I 2 shows that the growth in complexity is due to the interpolation, which creates a denser connectivity graph on the coarser levels.

Properties of the LISL matrix
In this section, we discuss the theoretical foundation of Aggregation-based Multigrid (AGMG) for our specific application of wide stencil discretisations.
The key result is Lemma 3.1 in [25]. The non-negativity of the row sums of a LISL discretization matrix is obtained almost by construction. To see this, let A ∈ R N ×N be the discretization matrix, where N := |Ω ∆x | is the number of mesh points, then the sum for the i-th row is where we have used the fact that for any z ∈Ω, N j=1 w j (z) = 1 and the CFL-type condition 1 − ∆tc α,n i ≥ 0, which is satisfied for sufficiently small ∆t independent of ∆x. The following analysis of the non-negativity of the column sum makes use of the regularity of the coefficients b and σ. In particular, we assume that the coefficients are such that for all p ∈ [[1, P ]], and for any mesh points x i , x l and corresponding controls α i , α l ∈ A and s ∈ [0, T ] we have that where β ∈ (0, 1], η ∈ 1 2 , 1 . Remark 3.1. As stated in the introduction, we are working under the standard assumption of Lipschitz continuity of the coefficients in x and continuity in α. However, what we require in (3.6) is stronger, namely, if the control is inserted in the coefficients as a function of the state x, the resulting functions are Hölder continuous in x. This situation arises in every step of the policy iteration algorithm: a control vector (α i ) is determined by the optimisation step, and then a linear system with this control vector is solved for (x) i . Generally, the optimal control is not a (Hölder) continuous function of the space variables, but there are many important examples where it is at least piecewise Hölder. It can be seen from the proof below that Proposition 3.1 still holds in this situation.
Remark 3.2. Lemma 3.1 in [25] also assumes that the system matrix is irreducible. LISL discretization matrices need not be irreducible, e.g. L 2K,2,1 SL for any K ∈ N as in (3.3), however, this technical requirement could be overcome by adding an irreducible M-matrix, multiplied by a sufficiently small factor, to the LISL discretization matrix.     We also assume the use of multi-linear interpolation requiring 2 d points to approximate function values in R d and the use of Cartesian grids. Proposition 3.1. Let A be the LISL discretization matrix of (1.1) for a given time step, on an equispaced Cartesian grid Ω ∆x of Ω ⊂ R d with ∆x > 0, and a given vector of control values (α i ) i=1,...,N , α i ∈ A, associated with the mesh points x i , 1 ≤ i ≤ N := |Ω ∆x |. Assume that (3.6) holds.
Then the column sum of the matrix is non-negative provided ∆t ≤ ∆x sup α∈A |c α,+ | + (M − 1)(P + 1) , where M depends on the dimension of the domain d and the Lipschitz constants L σ and L b , but not on the mesh parameter ∆x. Indeed, M = 3 d for sufficiently small ∆x.
Proof. We carry out the proof for Scheme 2, but an analogous analysis holds for Schemes 1 and 3 in the introduction. We also note that we can restrict the analysis to steps where no truncation is required, as in the case of truncation the weights only contribute (positively) to the diagonal of the matrix and the right-hand side of the equation (see Remark 2.2). For simplicity of notation, we omit the dependence of the coefficient functions b and σ p on the time variable t and the control. For any i = j the matrix entry (A) ij = 0 if and only if for any 1 ≤ m ≤ P + 1 we require φ(x j ) to approximate -by means of linear interpolation -φ( For any two nodes i and l to contribute to the sum of column j, it is necessary that We consider the different possible values for y ± m (x l ) separately. First, assume y ± m (x l ) = ∆xb(x l ) and let ∆x ≤ 1/ where we have used the triangle inequality and the Hölder regularity of b. Re-arranging, such that M ≤ 2 for sufficiently small ∆x. Proceeding similarly for y ± m (x) = ± √ ∆xσ m (x), we obtain again M ≤ 2 as ∆x → 0.
Denote M to be the maximum of all of the (M + 1) d s above, i.e. for different mesh points, then the column sum gives Therefore, non-negativity of the sum is guaranteed by condition (3.7).

Remark 3.3.
For the LISL scheme to be first order accurate, it is required that ∆t ∼ O(∆x). Therefore, the bound (3.7) does not impose problematic restrictions on the size of the time steps. We recall that ∆t = O(∆x) and ∆t = O(∆x 3/2 ) (or even ∆t = O(∆x 2 )) are the CFL conditions for the explicit schemes without and with truncation, respectively, see Corollary 2.5. Therefore, on bounded domains, fully implicit time stepping with policy iteration and AGMG preconditioning is the computationally most efficient overall algorithm among the ones considered.

Performance of the algebraic approaches
We compare the performance of the classical AMG implementation in the HSL library [15], and AGMG from [23] for the benchmark optimal control problems in Section 2.4. Both of these methods are used as preconditioners for a Krylov subspace method that is assumed to have converged when the relative residual is below 10 −6 . In particular, we use MATLAB's implementation of GMRES [30] for AMG and GCR [10] for AGMG. The AMG preconditioner consists of one iteration of the standard V-cycle with two Gauss-Seidel pre-and postsmoothing steps, whereas AGMG uses one Gauss-Seidel pre-and post-smoothing step and the enhanced multigrid cycles mentioned in the introduction, see [24,25]. For completeness, we also include as benchmark MATLAB's sparse direct solver using UMFPACK [8]. The problems considered have smooth closed form solutions linear in t. As mentioned in the previous section, we employ policy iteration to solve the resulting non-linear discrete problem. The tests were run on a Linux machine under MATLAB 2015a, on a quad-core AMD 4.2GHz with 7.5GB of RAM and 15GB of swap.   In Figure 3.4 we present the elapsed time solving linear systems for a single time step (∆t = T ). Both MG methods provide a solution with the same accuracy as the sparse direct solver but with improved scalability. AGMG outperforms AMG and the sparse direct solver in both problems. Figure 3.5 shows the average time spent solving linear systems per time step when ∆t = ∆x. Reducing ∆t makes the system matrix more diagonally dominant and as a consequence easier to precondition. This effect is noticeable for Problem B using AMG as preconditioner, see Table 3.4.      Table 3.6 report memory consumption and quantities related to the Krylov subspace method and to the coarsening. As commented in the previous sections, AMG results in grid and algebraic complexities higher than AGMG's. The coarsening for both of the methods is stopped when the coarse level system is cheap to solve exactly compared to the starting system, specifically, we stop whenever the number of unknowns at the coarse level is comparable to the cubic root of the initial number of unknowns. The effect of simplifying the intergrid transfer operators can be observed on the coarse to fine stencil ratio (C/F stencil). For AGMG, the stencil on the coarsest level is similar to the initial one, whereas for AMG it is significantly denser. The fact that aggregation-based coarsening strategies yield coarse matrices with similar sparsity as the original one was noted in [16]. Moreover, AGMG yields shallower hierarchies due to higher coarsening factors. The effect of reducing ∆t is also appreciated in this ratio. We observe that the direct method's consumption increases dramatically while for the MG methods, we note the relation between the memory requirement and the algebraic complexity of the method.  Table 3.5: Peak memory consumption statistics in gigabytes (GB) of the MATLAB process sampled using the shell command top. VIRT is the total amount of virtual memory used by MATLAB, whereas RES is the non-swapped physical memory (limited to 7.5).

Conclusions
This article discusses two aspects of practical importance associated with wide stencil discretizations of second order non-linear parabolic differential operators. First, we study the truncation of the stencil for problems on bounded domains, as a result of the method overstepping the boundaries for nodes in a surrounding layer. Our main result details the construction of such truncation and proves that the resulting scheme remains consistent, monotone and conditionally stable. Numerical examples confirm that the truncation improves the accuracy of the approach compared to constant and linear extrapolation of the boundary conditions, and the modification of the CFL condition of the scheme. Second, motivated by the stringent CFL condition of explicit time stepping schemes, we consider implicit schemes and the application of multigrid methods to solve the resulting discrete non-linear system of equations efficiently. Using theoretical and empirical arguments, we show the need to employ multigrid methods based on algebraic ideas. We show that aggregation-based methods are well suited for the discretization schemes considered and justify their use by proving that under mild conditions on the mesh refinement parameters the LISL discretization matrices are M-matrices with non-negative row and column sums. The algorithms are shown to compare favourably against AMG and sparse direct solvers.
Although we only considered linear interpolation, much of the analysis, including in particular the matrix properties in Section 3.4, will also hold if other limited interpolations (see, e.g., [9], [35]), are used, as only the properties in (2.10) are critical.
To conclude, we emphasise that monotone schemes for general diffusions in two and more dimensions are necessarily non-local, so that the question of boundary truncation is not restricted to the class of schemes studied in this paper.   Table 3.6: Quantities related the to the Krylov subspace iteration and multigrid coarsening. Avg Krylov It contains the average number of Krylov iterations over all time steps and all policy iterations; # levels contains the average depth in the grid hierarchy; C/F stencil contains the ratio between the stencil at the coarsest level and that on the finest level (lower is better). On the finest level, the stencil is close to 11 for Problem A and close to 8 for Problem B. The last two columns report the grid and algebraic complexity as per Definitions 3.1 and 3.2. As the full matrix hierarchy was not available from [15] for AMG, but only the coarsest and finest matrices, the starred algebraic complexities are estimates based on the assumption of a geometrically decreasing complexity between the coarsest and finest level, which is likely to be a significant underestimate.