1 Introduction

This paper provides stability and convergence results for a type of implicit finite difference scheme for the approximation of nonlinear parabolic equations using backward differentiation formulae (BDF).

In particular, we consider Hamilton–Jacobi–Bellman (HJB) equations of the following form:

$$\begin{aligned} v_t(t,x) +\sup _{a\in \Lambda }\Big \{{\mathcal {L}}^a[v](t,x) +r(t,x,a) v +\ell (t,x,a)\Big \}=0, \end{aligned}$$
(1)

where \((t,x)\in [0,T]\times {{\mathbb {R}}}^d\), \(\Lambda \subset {{\mathbb {R}}}^m\) is a compact set and

$$\begin{aligned} {\mathcal {L}}^a[v](t,x)=-\frac{1}{2}{{\,\mathrm{tr}\,}}[\Sigma (t,x,a)D^2_x v(t,x)] + b(t,x,a)D_x v(t,x) \end{aligned}$$

is a second order differential operator. Here, \(\Sigma =(\Sigma _{ij})\) is symmetric and non-negative definite for all arguments.

It is well known that for nonlinear, possibly degenerate equations the appropriate notion of solutions to be considered is that of viscosity solutions [9]. We assume throughout the whole paper the well-posedness of the problem, namely the existence and uniqueness of a solution in the viscosity sense. Under such weak assumptions, convergence of numerical schemes can only be guaranteed if they satisfy certain monotonicity properties, in addition to the more standard consistency and stability conditions for linear equations [2]. This in turn reduces the obtainable consistency order to 1 in the general case [12].

We will therefore not treat (1) in this generality. As we detail further below, the main stability analysis is restricted to the uniformly parabolic case, and full convergence results are given under the additional assumption of semi-linearity, \(\Sigma \equiv \Sigma (t,x)\).

It is shown in [16] that a monotone (but inconsistent) \(P_1\)–finite element approximation converges in the maximum norm, and in the \(H^1\)-norm under a mild non-degeneracy assumption; this assumption is further weakened to possibly degenerate coefficients in [15].

On the other hand, in many cases – especially in non-degenerate ones – solutions exhibit higher regularity and are amenable to higher order approximations. The existence of classical solutions and their regularity properties under a strict ellipticity condition have been investigated, for instance, in [11, 17].

The higher order of convergence in both space and time of discontinuous Galerkin approximations is demonstrated theoretically and numerically in [20] for sufficiently regular solutions under a Cordes condition for the diffusion matrix, a measure of the ellipticity.

More recently, it was shown numerically in [6] that some approximation schemes based on a second order backward differentiation formula in both time and space (see, e.g., [22], Section 12.11, for the definition of BDF schemes for ODEs) have good convergence properties. In particular, in an example therein with non-degenerate controlled diffusion where the second order, non-monotone Crank–Nicolson scheme fails to converge, the (also non-monotone) BDF2 scheme shows second order convergence.

The filtered schemes in [6] and \(\epsilon \)-monotone schemes, e.g. in [7], modify a higher-order scheme to stay \(\epsilon \)-close to a monotone scheme. This enforces convergence, but in general only at the rate of the monotone scheme, and practically the rate may vary depending on the data and on the strategy to choose the \(\epsilon \) parameter (see Example 2 in [6, Section 4.2] where a filtered scheme switches back to first order). Here, we directly analyse the stability and the convergence for a non-monotone BDF scheme.

For constant coefficient parabolic PDEs, the \(L^2\)-stability and smoothing properties of the BDF scheme are a direct consequence of the strong A-stability of the scheme. Moreover, [3] shows that for the multi-dimensional heat equation the BDF time stepping solution and its first numerical derivative are stable in the maximum norm. The technique, which is strongly based on estimates for the resolvent of the discrete Laplacian, does not easily extend to variable coefficients or to the nonlinear case.

A more general linear parabolic setting is considered in [4], where second order convergence is shown for variable coefficients using energy techniques. This result is extended to a semi-linear example in [10]; the application to incompressible Navier–Stokes equations has been analyzed in [14]. In [5], a closely related BDF scheme is studied for a diffusion problem with an obstacle term (which includes the American option problem in mathematical finance).

The scheme we propose is constructed by using a second order BDF approximation for the first derivatives in both time and space, combined with a standard three-point central finite difference for the second spatial derivative in one dimension. The scheme is therefore second order consistent by construction.
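Indeed, a standard Taylor expansion (recalled here only for completeness) gives, for any sufficiently smooth function \(\phi \),

$$\begin{aligned} \frac{3 \phi (t) - 4\phi (t-\tau ) + \phi (t-2\tau )}{2\tau }&= \phi '(t) - \frac{\tau ^2}{3}\phi '''(t) + O(\tau ^3), \\ \frac{\phi (x-h) - 2\phi (x) + \phi (x+h)}{h^2}&= \phi ''(x) + \frac{h^2}{12}\phi ''''(x) + O(h^4), \end{aligned}$$

and the one-sided spatial BDF2 differences defined in (5) below satisfy the analogue of the first expansion with \(\tau \) replaced by \(\pm h\).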

For this scheme, under the assumption of uniform parabolicity, we establish new stability results in the \(H^1\)-norm for fully nonlinear HJB and Isaacs equations, and in the \(L^2\)-norm for the semilinear case (see Theorems 4 and 5, respectively). These generalize some results of [4, 5, 10] to more general non-linear situations. From this analysis we deduce error bounds for classical smooth and piecewise smooth solutions in the semilinear uniformly parabolic case (see Theorems 7 and 19).

Our overall approach relies on stability results with respect to perturbations of the right-hand side of the equations. We start by deriving a recursive linear relation satisfied by the approximation error between the original equation and a perturbed one, in the case of HJB and Isaacs equations (Lemma 10); then, we give an inequality between the error norms for three consecutive time steps (Lemma 11) which guarantees an overall stability estimate (Lemma 12). Having proved this generic sufficient condition for stability, we show that this condition is satisfied for different choices of the norm under specific assumptions, which are summarized in Table 1.

Table 1 Main stability results

The outline of the paper is as follows. In Sect. 2, we define some specific BDF schemes and state the main results concerning well-posedness and stability in discrete \(H^1\)- or \(L^2\)-norms and our main convergence result for uniformly parabolic semilinear HJB equations. In Sects. 3 and 4 we prove the main stability results and give an extension from HJB to Isaacs equations. In Sect. 5, we give further stability results in the discrete \(L^2\)-norm, which are weaker in the sense that they hold only for uncontrolled Lipschitz regular diffusion coefficients, but stronger in the sense that they allow for degenerate diffusion in the linear case and can be extended to two dimensions. In Sect. 6, we deduce error estimates from the \(L^2\) stability results and from the truncation error of the scheme for sufficiently regular solutions. Section 7 carefully studies two numerical examples: the Eikonal equation and a second order equation with controlled diffusion. Section 8 concludes. An appendix contains a proof of the existence of solutions for our schemes.

2 Definition of the scheme and main results

We focus in the first instance on the one-dimensional equation

$$\begin{aligned}&v_t + \sup _{a \in \Lambda }\bigg (-\frac{1}{2}\sigma ^2(t,x,a) v_{xx} + b(t,x,a) v_x + r(t,x,a) v + \ell (t,x,a)\bigg ) = 0, \nonumber \\& \quad t\in [0,T],\ x\in {{\mathbb {R}}}, \end{aligned}$$
(2a)
$$\begin{aligned}&v(0,x)=v_0(x) \quad x\in {{\mathbb {R}}}. \end{aligned}$$
(2b)

It is known (see Theorem A.1 in [1]) that with the following assumptions:

  • \(\Lambda \) is a compact set,

  • for some \(C_0>0\) the functions \(\phi \equiv \sigma , b, r, \ell :[0,T]\times {{\mathbb {R}}}\times \Lambda \rightarrow {{\mathbb {R}}}\) and \(v_0:{{\mathbb {R}}}\rightarrow {{\mathbb {R}}}\) satisfy for any \(t,s\in [0,T]\), \(x,y\in {{\mathbb {R}}}\), \(a\in \Lambda \)

    $$\begin{aligned}&|v_0(x)|+ |\phi (t,x,a)|\le C_0, \\&|v_0(x) - v_0(y)|+|\phi (t, x,a) - \phi (s,y,a)|\le C_0(|x-y|+|t-s|^{1/2}), \end{aligned}$$

there exists a unique bounded continuous viscosity solution of (2), which we denote by v.

We will make individual assumptions for each result as we go along.

2.1 The BDF2 scheme

For the approximation in the x variable, we will consider the PDE on a truncated domain \(\Omega :=(x_{\min },x_{\max })\), where \(x_{\min }<x_{\max }\).

Let \(N\in {{\mathbb {N}}}^* \equiv {{\mathbb {N}}}\backslash \{0\}\) be the number of time steps, \(\tau :=T/N\) the time step size, and \(t_n = n \tau \), \(n=0,\ldots , N\). Let \(I\in {{\mathbb {N}}}^*\) be the number of interior mesh points in the spatial direction, and define a uniform mesh \((x_i)_{1\le i\le I}\) with mesh size h by

$$\begin{aligned} x_i:= x_{\min }+ ih, \quad i \in {\mathbb {I}} = \{1,\dots ,I\}, \quad \hbox {where}\quad h:=\frac{x_{\max }-x_{\min }}{I+1}. \end{aligned}$$

Hereafter, we denote by u a numerical approximation of v, the solution of (2), i.e.

$$\begin{aligned} u^k_i\sim v(t_k,x_i). \end{aligned}$$

For each time step \(t_k\), the unknowns are the values \(u^k_i\) for \(i=1,\dots ,I\).

Standard Dirichlet boundary conditions use the knowledge of the values at the boundary, \(v(t,x_{\min })\) and \(v(t,x_{\max })\). Here, as a consequence of the size of the stencil for the spatial BDF2 scheme below, we will assume that values at the two left- and right-most mesh points are given, that is, \(v(t,x_j)\) for \(j\in \{-1,0\}\) as well as \(j\in \{I+1, I+2\}\) are known (corresponding to the values at the points \((x_{-1},x_0,x_{I+1},x_{I+2})\equiv (x_{\min }-h,x_{\min },x_{\max },x_{\max }+h)\)).

We then consider the following scheme, for \(k\ge 2\), \(i\in {\mathbb {I}}\),

$$\begin{aligned}&\mathcal S^{(\tau ,h)}(t_k,x_i,u^k_i,[u]_i^k) := \frac{3 u^{k}_i - 4u^{k-1}_i + u^{k-2}_{i}}{2\tau } \nonumber \\&\qquad +\ \sup _{a\in \Lambda } \Big \{L^a[u^k](t_k,x_i) + r(t_{k},x_i,a) u^{k}_i + \ell (t_{k},x_i,a) \Big \} \ = \ 0, \end{aligned}$$
(3)

where, as usual, \([u]_i^k\) denotes the numerical solution excluding the value at \((t_k,x_i)\), and

$$\begin{aligned} L^a[u](t_k,x_i) := -\frac{1}{2} \sigma ^2(t_{k},x_i,a) D^2 u^{}_i + b^+(t_k,x_i,a) D^{1,-} u^{}_i - \; b ^-(t_{k},x_i,a) D^{1,+} u^{}_i, \end{aligned}$$
$$\begin{aligned} D^2 u_i:=\frac{u_{i-1} - 2 u_i + u_{i+1}}{h^2} \end{aligned}$$
(4)

(the usual second order approximation of \(v_{xx}\)), \(b^+:=\max (b,0)\) and \(b^-:=\max (-b,0)\) denote the positive and negative part of b, respectively, and where a second order left- or right-sided BDF approximation is used for the first derivative in space:

$$\begin{aligned} D^{1,-} u_i:= \frac{3 u_i - 4 u_{i-1} + u_{i-2}}{2 h} \quad \hbox {and} \quad D^{1,+} u_i:= -\bigg (\frac{3 u_i - 4 u_{i+1} + u_{i+2}}{2h}\bigg ). \end{aligned}$$
(5)

Note in particular the implicit form of the scheme (3) for the forward Eq. (2). The existence of a unique solution to this nonlinear equation will be addressed later.
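For illustration only, the difference operators (4)–(5) can be transcribed directly into code; the following minimal NumPy sketch (the array layout and all names are ours, not the paper's) acts on a vector storing the I interior values padded by the two given boundary values on each side.

```python
import numpy as np

# u_ext = (u_{-1}, u_0, u_1, ..., u_I, u_{I+1}, u_{I+2}): the I interior values
# plus the two boundary values required by the BDF2 stencil on each side.

def D2(u_ext, h):
    """Central second difference (4) at i = 1, ..., I."""
    return (u_ext[1:-3] - 2.0 * u_ext[2:-2] + u_ext[3:-1]) / h**2

def D1_minus(u_ext, h):
    """Left-sided BDF2 difference in (5), used where b >= 0."""
    return (3.0 * u_ext[2:-2] - 4.0 * u_ext[1:-3] + u_ext[0:-4]) / (2.0 * h)

def D1_plus(u_ext, h):
    """Right-sided BDF2 difference in (5), used where b < 0."""
    return -(3.0 * u_ext[2:-2] - 4.0 * u_ext[3:-1] + u_ext[4:]) / (2.0 * h)

# quick consistency check on a smooth function: both errors are O(h^2)
x = np.linspace(0.0, 1.0, 103)      # 99 interior points plus 4 boundary points
h = x[1] - x[0]
u_ext = np.sin(x)
print(np.max(np.abs(D2(u_ext, h) + np.sin(x[2:-2]))))
print(np.max(np.abs(D1_minus(u_ext, h) - np.cos(x[2:-2]))))
```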

We will also define the numerical Hamiltonian associated with the scheme:

$$\begin{aligned} H[u](t_k,x_i):= \sup _{a\in \Lambda } \Big \{L^a[u](t_k,x_i) + r(t_{k},x_i,a) u_i + \ell (t_{k},x_i,a) \Big \}. \end{aligned}$$

As discussed above, the scheme is completed by the following boundary conditions:

$$\begin{aligned} u^k_i:= v(t_k,x_i),\quad \hbox {for } i\in \{-1,0\} \cup \{I+1,I+2\} \text { and }2\le k\le N. \end{aligned}$$

Since (3) is a two-step scheme, for the first time step \(k=1\) (and \(i \in {\mathbb {I}}\)), we use a backward Euler approximation scheme:

$$\begin{aligned}&{\mathcal {S}}^{(\tau ,h)}(t_1,x_i,u^1_i,[u]_i^1)\nonumber \\&:= \frac{u^{1}_i - u^{0}_i}{\tau } +\ \sup _{a\in \Lambda } \Big \{L^a[u^1](t_1,x_i) + r(t_{1},x_i,a) u^{1}_i + \ell (t_{1},x_i,a) \Big \} = 0, \end{aligned}$$
(6)

with the initial condition

$$\begin{aligned} u^0_i= v_0(x_i), \quad i \in {\mathbb {I}}. \end{aligned}$$
(7)

Remark 1

As the backward Euler step is only used once, it does not affect the overall second order of the scheme (see also Sect. 6 below).

Remark 2

Most of our results also apply to the scheme obtained by replacing the BDF approximation (5) of the drift term by a centered finite difference approximation:

$$\begin{aligned} {{\widetilde{D}}}^{1,\pm } u_i:= \frac{ u_{i+1}-u_{i-1} }{2h}. \end{aligned}$$
(8)

However, numerical tests (see Sect. 7.1) show that the BDF upwind approximation as in (5) has a better behavior in some extreme cases where the diffusion vanishes. We shall give a rigorous stability estimate for the BDF scheme in the linear case for possibly vanishing diffusion in Sect. 5.2.
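To make the overall time stepping structure concrete, the following minimal sketch implements (6)–(7) followed by (3) in the simplest uncontrolled, constant-coefficient case (\(\sigma \) constant, \(b\ge 0\) constant, \(r=\ell \equiv 0\), zero data at the boundary nodes). It is purely illustrative: the names and parameter values are ours, and in the general controlled case each linear solve would be replaced by the solution of the nonlinear system studied in Theorem 3.

```python
import numpy as np

def bdf2_linear(u0, sigma, b, h, tau, n_steps):
    """Backward Euler start-up (6), then BDF2 steps (3), for constant sigma and b >= 0."""
    I = u0.size
    # interior block of L = -0.5*sigma^2*D^2 + b*D^{1,-}; the boundary values are
    # zero here, so the columns for u_{-1}, u_0, u_{I+1}, u_{I+2} are simply dropped
    L = np.zeros((I, I))
    for i in range(I):
        L[i, i] = sigma**2 / h**2 + 3.0 * b / (2.0 * h)
        if i >= 1:
            L[i, i - 1] = -0.5 * sigma**2 / h**2 - 4.0 * b / (2.0 * h)
        if i + 1 < I:
            L[i, i + 1] = -0.5 * sigma**2 / h**2
        if i >= 2:
            L[i, i - 2] = b / (2.0 * h)
    Id = np.eye(I)
    u_pprev = u0.copy()
    u_prev = np.linalg.solve(Id + tau * L, u_pprev)            # backward Euler step (6)
    sol = [u_pprev, u_prev]
    for _ in range(2, n_steps + 1):                            # BDF2 steps (3)
        rhs = (4.0 * u_prev - u_pprev) / 3.0
        u_new = np.linalg.solve(Id + (2.0 * tau / 3.0) * L, rhs)
        u_pprev, u_prev = u_prev, u_new
        sol.append(u_new)
    return np.array(sol)

# example: I = 99 interior points on (0, 1), N = 50 steps up to T = 0.5
I, N, T = 99, 50, 0.5
h, tau = 1.0 / (I + 1), T / N
x = np.linspace(h, 1.0 - h, I)
u = bdf2_linear(np.sin(np.pi * x), sigma=0.5, b=0.2, h=h, tau=tau, n_steps=N)
print(u.shape)                                                 # (N + 1, I)
```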

2.2 Definitions and main results

In the remainder of this paper, we prove various stability and convergence results for the scheme (3). We state in this section the first main well-posedness and stability results.

Throughout the paper, A will denote the finite difference matrix associated with the second order derivative, i.e.

$$\begin{aligned} A: = \frac{1}{h^2} \begin{pmatrix} 2 &{}\quad -1 &{}\quad &{}\quad &{}\quad \\ -1 &{}\quad 2 &{}\quad -1 &{}\quad &{}\quad \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{}\quad -1 &{}\quad 2 &{}\quad -1 \\ &{}\quad &{}\quad &{}\quad -1 &{}\quad 2 \end{pmatrix}. \end{aligned}$$
(9)

Let \( \langle x,y\rangle _A:=\langle x,Ay\rangle \). Then we consider the A-norm defined as follows:

$$\begin{aligned} |x|_A^2:= \langle x,Ax\rangle = \sum _{1\le i\le I+1} \left( \frac{x_i-x_{i-1}}{h}\right) ^2 \end{aligned}$$
(10)

(with the convention in (10) that \(x_0=x_{I+1}=0\)). Hence, \(\sqrt{h}|x|_A\) approximates the \(H^1\) semi-norm in \(\Omega \). Similarly, we will consider later the standard Euclidean norm defined by \( \Vert x\Vert ^2:= \langle x,x\rangle \), such that \(\sqrt{h}\Vert x\Vert \) approximates the \(L^2\)-norm. We define therefore the following rescaled norms on \({{\mathbb {R}}}^I\):

$$\begin{aligned} |u|_0 := \left( \sum _{i\in {\mathbb {I}}} u_i^2 \, h \right) ^{1/2} \!= \Vert u\Vert \sqrt{h}, \!\!\qquad |u|_1 := \left( \sum _{i\in {\mathbb {I}}} \left( \frac{u_i-u_{i-1}}{h} \right) ^2 h \right) ^{1/2} \!= |u|_A \sqrt{h}. \end{aligned}$$

Both these norms will be used in the numerical section.
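In code, these rescaled norms read as follows (a small illustrative helper of ours, for a vector of interior values, with the zero-extension convention of (10)):

```python
import numpy as np

def norm_0(u, h):
    """|u|_0 = sqrt(h) * ||u||: discrete L^2 norm of the interior values."""
    return np.sqrt(h * np.sum(u**2))

def norm_1(u, h):
    """|u|_1 = sqrt(h) * |u|_A: discrete H^1 semi-norm, with u_0 = u_{I+1} = 0 as in (10)."""
    du = np.diff(np.concatenate(([0.0], u, [0.0])))   # u_i - u_{i-1}, i = 1, ..., I+1
    return np.sqrt(np.sum(du**2) / h)

# example: u_i = sin(pi*x_i) on a mesh of (0, 1) with h = 0.01
h = 0.01
u = np.sin(np.pi * h * np.arange(1, 100))
print(norm_0(u, h), norm_1(u, h))    # approximately sqrt(1/2) and pi*sqrt(1/2)
```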

Our first result concerns the solvability of the numerical scheme \({\mathcal {S}}^{(\tau ,h)}\) defined by (3) with respect to its third argument, i.e. seen as an equation for \(u^k_i\), with \((t_k,x_i)\) and \([u]^k_i\) given.

Assumption (A1). The functions \(\sigma , b\) and r are bounded.

Theorem 3

Let (A1) and the following CFL condition hold:

$$\begin{aligned} \Vert b\Vert _\infty \frac{\tau }{h} < C. \end{aligned}$$
(11)

Then, for \(\tau \) small enough and \(C=3/2\) (resp. \(C=1\)) there exists a unique solution of the scheme (3) for \(k\ge 2\) (resp. \(k=1\), for scheme (6)).

The scheme is hence well-defined even if \(\sigma \) vanishes. A uniform ellipticity condition for \(\sigma \) as per Assumption (A2) below will be needed for proving the \(H^1\) stability of the scheme. We provide a relaxation of the ellipticity condition for stability in the Euclidean norm in Sect. 5.2.

Assumption (A2). There exists \(\eta >0\) such that

$$\begin{aligned} \inf _{t\in [0,T]} \inf _{x\in \Omega } \inf _{a\in \Lambda } \ \sigma ^2(t,x,a)\ \ge \ \eta . \end{aligned}$$

Let u denote the solution of (3), that is,

$$\begin{aligned} {\mathcal {S}}^{(\tau ,h)}(t_k,x_i,u^k_i,[u]^k_i) = {\mathcal {E}}^k_i(u), \quad i \in {\mathbb {I}}, \; 1\le k\le N, \end{aligned}$$

with \({\mathcal {E}}^k_i(u)\equiv 0\), and let w be the solution of a perturbed equation

$$\begin{aligned} {\mathcal {S}}^{(\tau ,h)}(t_k,x_i,w^k_i,[w]^k_i) = {\mathcal {E}}^k_i(w), \quad i \in {\mathbb {I}}, \; 1\le k\le N, \end{aligned}$$
(12)

with the same boundary values as u:

$$\begin{aligned} w_i^k = u_i^k, \quad i \in \{-1,0\} \cup \{I+1,I+2\}, \; 2\le k\le N \end{aligned}$$
(13)

(but potentially different initial values \(w^0\) and \(w^1\)).

Denote further

$$\begin{aligned} E^k:=(E^k_1, \dots , E^k_I)^T \ = \ u^k - w^k, \quad 0\le k\le N. \end{aligned}$$

Our main stability result in this setting (which also holds when \({\mathcal {E}}^k(u)\ne 0\)) is the following.

Theorem 4

Assume (A1), (A2), as well as the CFL condition (11). Then there exists a constant \(C\ge 0\) (independent of \(\tau \) and h) and \(\tau _0>0\) such that, for any \(\tau \le \tau _0\), for any \(u=(u^k_i)\) and \(w=(w^k_i)\) with same boundary values (13), it holds:

$$\begin{aligned} \max _{2\le k\le N}|E^k|_A^2\le & {} C \Big (|E^0|_A^2 + |E^1|_A^2 +\tau \sum _{2\le k\le N} |{{\mathcal {E}}^k(u)-{\mathcal {E}}^k(w)}|_A^2\Big ). \end{aligned}$$
(14)

The proof of Theorem 4 will be the subject of Sect. 4.

As a corollary we can deduce the \(|\cdot |_1\)-seminorm boundedness of the scheme. For instance, let us assume that \(\ell \equiv 0\), and let u be a solution of the scheme (3) (that is, \({\mathcal {E}}^k_i(u)\equiv 0\)), with 0 boundary conditions (\(u^k_i=0\) for all \(k\ge 2\) and \(i\in \{-1,0,I+1,I+2\}\)). Then by taking \(w=0\) in (14), we obtain

$$\begin{aligned} \max _{2\le k\le N}|u^k|_1^2\le & {} C \Big (|u^0|_1^2 + |u^1|_1^2 \Big ). \end{aligned}$$
(15)

A more general bound of \(|u^k|_1\) could also be obtained in the case of non-zero boundary values and non-vanishing \(\ell \), the bound then depending on these data.

In order to obtain stability estimates in other norms, one typically needs some uniform continuity of the coefficients. The analysis of the controlled case, associated with the presence of the supremum operator in (2a), is then made complicated by the fact that even if the solution to (2) is classical and the supremum is attained at some \(a^*(t,x)\) for each t and x [and similarly for each k and i in (3)], the optimal control \(a^*\) in general does not have any regularity as a function of t and x (or k and i, respectively).

However, in certain circumstances, the previous bound holds with the A-norm replaced by the Euclidean norm. In particular, we consider the following assumption:

Assumption (A3). The diffusion coefficient is independent of the control and Lipschitz continuous, i.e. \(\sigma \equiv \sigma (t,x)\) and there exists \(L\ge 0\) such that

$$\begin{aligned} |\sigma ^2(t,x)-\sigma ^2(t,y)| \le L |x-y| \quad \forall x,y\in \Omega , t\in [0,T]. \end{aligned}$$

Theorem 5

Assume (A1), (A2), (A3), as well as the CFL condition (11). Then there exists \(C\ge 0\) (independent of \(\tau \) and h) and \(\tau _0>0\) such that, for any \(\tau \le \tau _0\), for any \(u=(u^k_i)\) and \(w=(w^k_i)\) with the same boundary values (13), it holds:

$$\begin{aligned} \max _{2\le k\le N}\Vert E^k\Vert ^2\le & {} C \Big (\Vert E^0\Vert ^2 + \Vert E^1\Vert ^2 +\tau \sum _{2\le k\le N} \Vert { {\mathcal {E}}^k(u) - {\mathcal {E}}^k(w) }\Vert ^2\Big ). \end{aligned}$$
(16)

As a consequence, we obtain error estimates under the main assumptions (A1), (A2) and (A3), or under some specific regularity assumptions.

We define the following semi-norm on some interval \({\mathcal {I}}=(a,b)\), for \(\alpha \in (0,1]\):

$$\begin{aligned} || \phi ||_{C^{0,\alpha }({\mathcal {I}})}:= \sup \bigg \{ \frac{|\phi (x) - \phi (y)|}{|x-y|^\alpha }, \ x\ne y, \ x,y\in {\mathcal {I}} \bigg \}. \end{aligned}$$

For a given open subset \(\Omega _T^*\) of \((0,T)\times \Omega \), we define \(C^{k,\ell }(\Omega _T^*)\) as the set of functions \(\phi :\Omega _T^*\rightarrow {{\mathbb {R}}}\) which admit continuous derivatives \((\frac{\partial ^{i}\phi }{\partial t^i})_{0\le i\le k}\) and \((\frac{\partial ^{j}\phi }{\partial x^j})_{0\le j\le \ell }\) on \(\Omega _T^*\). We also denote by \(C^{k,\ell }_b(\Omega _T^*)\) the subset of functions with bounded derivatives on \(\Omega _T^*\).

Assumption (A4). \(v \in C^{1,2}((0,T)\times \Omega )\) and for some \(C\ge 0\), \(\delta \in (0,1]\), it holds:

$$\begin{aligned} \sup _{x\in \Omega } \Vert v_t(.,x)\Vert _{C^{0,\delta }([0,T])} \le C, \qquad \sup _{t\in (0,T)} \Vert v_{xx}(t,.)\Vert _{C^{0,\delta }({\bar{\Omega }})} \le C. \end{aligned}$$
(17)

Remark 6

By results in [11] and [17], assumption (A4) is satisfied for sufficiently smooth data and given a uniform ellipticity condition.

We have the following error estimates:

Theorem 7

We assume (A1), (A2), (A3), and the CFL condition (11).

  1. (i)

    If (A4) holds for some \(\delta \in (0,1]\), then the numerical solution u of (3), (6) converges to v in the \(L^2\)-norm with

    $$\begin{aligned} \max _{0\le k\le N}|v^k-u^k|_0 \le C h^{\delta }, \end{aligned}$$

    for some constant C (possibly different from the one in (A4)).

  2. (ii)

    If, moreover, \(v\in C^{3,4}_b((0,T)\times \Omega )\), then

    $$\begin{aligned} \max _{0\le k\le N} |v^k-u^k|_0 \le C h^2, \end{aligned}$$

    where C is a constant which depends on the derivatives of v of order 3 and 4 in t and x, respectively.

The proof of these and further error estimates will be the subject of Sect. 6.
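In practice (see Sect. 7), estimates such as Theorem 7(ii) are checked by computing the errors on a sequence of refined meshes; a generic helper of the following form (ours, not from the paper) can be used, where err[j] would be, e.g., \(\max _{k}|v^k-u^k|_0\) on the mesh with size h[j].

```python
import numpy as np

def observed_orders(h, err):
    """Observed convergence orders between consecutive mesh sizes h[j]."""
    h, err = np.asarray(h, dtype=float), np.asarray(err, dtype=float)
    return np.log(err[1:] / err[:-1]) / np.log(h[1:] / h[:-1])

# synthetic example: errors behaving like 3*h^2 give observed orders close to 2
h = np.array([0.1, 0.05, 0.025, 0.0125])
print(observed_orders(h, 3.0 * h**2))
```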

The extension of the presented results to other types of nonlinear operators (\(\inf \), \(\sup \inf \) or \(\inf \sup \)) and corresponding equations will also be discussed.

Hereafter, for simplicity, we will consider \({\mathcal {E}}^k(u)\equiv 0\) and will denote \({\mathcal {E}}^k_i:={\mathcal {E}}^k_i(w)\).

3 Proof of Theorem 3 (well-posedness of the scheme)

The scheme (3) at time \(t_k\) (for \(k\ge 2\)) can be written in the following form:

$$\begin{aligned} {\sup }_{a\in \Lambda } ( M^k_a X - q^k_a ) = 0, \end{aligned}$$

where \(q^k_a \in {{\mathbb {R}}}^I\) and \(M^k_a \in {{\mathbb {R}}}^{I\times I}\) with the following non-zero entries:

$$\begin{aligned}&(M^k_a)_{i,i} \, {:}{=} \, \frac{3}{2} + \tau \bigg \{2 \frac{\sigma ^2}{h^2} + \frac{3 b^+}{2 h} + \frac{3 b^-}{2h} + r \bigg \} \end{aligned}$$
(18)
$$\begin{aligned}&(M^k_a)_{i,i+1} \, {:}{=}\, \tau \bigg \{- \frac{\sigma ^2}{h^2} - \frac{4 b^-}{2 h} \bigg \}, \quad (M^k_a)_{i,i-1} {:}{=} \tau \bigg \{- \frac{\sigma ^2}{h^2} - \frac{4 b^+}{2 h} \bigg \} \end{aligned}$$
(19)
$$\begin{aligned}&(M^k_a)_{i,i+2} \,{:}{=} \, \tau \frac{b^-}{2h} \quad (M^k_a)_{i,i-2}\, {:}{=} \, \tau \frac{b^+}{2h} \end{aligned}$$
(20)

with \(\sigma \equiv \sigma (t_k,x_i,a)\), \(b^\pm \equiv b^\pm (t_k,x_i,a)\) and \(r\equiv r(t_k,x_i,a)\). For \(k=1\), the terms are different but the form (and analysis) is similar. The fact that the entries \((M_a)_{i,i\pm 2}\) can be positive breaks the monotonicity of the scheme and makes the analysis more difficult than in the case of central differences for non-degenerate equations, where M is a diagonally dominant M-matrix for h small enough.
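As an illustration (ours, not taken from the paper), the entries (18)–(20) and the generalized diagonal dominance ratio used in Lemma 8 below can be assembled as follows for a frozen control value, with \(\sigma \), b, r given as arrays of nodal values.

```python
import numpy as np

def assemble_M(sigma, b, r, tau, h):
    """Matrix M_a^k with the non-zero entries (18)-(20) for a frozen control value."""
    I = sigma.size
    bp, bm = np.maximum(b, 0.0), np.maximum(-b, 0.0)
    M = np.zeros((I, I))
    for i in range(I):
        M[i, i] = 1.5 + tau * (2.0 * sigma[i]**2 / h**2
                               + 3.0 * bp[i] / (2.0 * h) + 3.0 * bm[i] / (2.0 * h) + r[i])
        if i >= 1:
            M[i, i - 1] = tau * (-sigma[i]**2 / h**2 - 4.0 * bp[i] / (2.0 * h))
        if i + 1 < I:
            M[i, i + 1] = tau * (-sigma[i]**2 / h**2 - 4.0 * bm[i] / (2.0 * h))
        if i >= 2:
            M[i, i - 2] = tau * bp[i] / (2.0 * h)
        if i + 2 < I:
            M[i, i + 2] = tau * bm[i] / (2.0 * h)
    return M

def dominance_ratio(M):
    """max over i of sum_{j>i}|M_ij| / (|M_ii| - sum_{j<i}|M_ij|), cf. (21) below."""
    A = np.abs(M)
    upper = np.triu(A, 1).sum(axis=1)
    lower = np.tril(A, -1).sum(axis=1)
    return np.max(upper / (np.diag(A) - lower))

# example: tau/h = 1, ||b||_inf = 0.5, so the CFL condition (11) holds with C = 3/2
I, h, tau = 50, 0.02, 0.02
M = assemble_M(sigma=np.full(I, 0.3), b=np.full(I, 0.5), r=np.zeros(I), tau=tau, h=h)
print(dominance_ratio(M) < 1.0)   # True
```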

We will use the following lemma, whose proof is given in Appendix A:

Lemma 8

Assume that \(\Lambda \) is some set, \((q_a)_{a\in \Lambda }\) is a family of vectors in \({{\mathbb {R}}}^I\), \((M_a)_{a\in \Lambda } \) is a family of matrices in \({{\mathbb {R}}}^{I\times I}\) such that:

  1. (i)

    for all \(a\in \Lambda \),

    $$\begin{aligned} (M_a)_{ii}>0; \end{aligned}$$
  2. (ii)

    (a form of diagonal dominance)

    $$\begin{aligned} \sup _{a\in \Lambda } \max _{i\in {\mathbb {I}}} \frac{\sum _{j>i} |(M_a)_{ij}|}{|(M_a)_{ii}| - \sum _{j<i} |(M_a)_{ij}|} < 1. \end{aligned}$$
    (21)

Then there exists a unique solution X in \({{\mathbb {R}}}^I\) of

$$\begin{aligned} {\sup }_{a\in \Lambda } (M_a X - q_a) = 0. \end{aligned}$$
(22)

Remark 9

For a fixed \(a\in \Lambda \), we have

$$\begin{aligned} \max _{i\in {\mathbb {I}}} \frac{\sum _{j>i} |(M_a)_{ij}|}{|(M_a)_{ii}| - \sum _{j<i} |(M_a)_{ij}|} < 1 \Leftrightarrow \min _{i\in {\mathbb {I}}} \bigg ( |(M_a)_{ii}| - \sum _{j\ne i} |(M_a)_{ij}| \bigg ) >0. \end{aligned}$$

Moreover, if \(\Lambda \) is compact and \(a\rightarrow M_a\) is continuous, then (21) is equivalent to

$$\begin{aligned} \inf _{a\in \Lambda } \min _{i\in {\mathbb {I}}} \bigg ( |(M_a)_{ii}| - \sum _{j\ne i} |(M_a)_{ij}| \bigg ) >0. \end{aligned}$$
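For a finite control set, one standard practical way of solving systems of the form (22) is policy (Howard) iteration. The following sketch is ours and purely illustrative: it plays no role in the existence proof of Appendix A, and its convergence is not discussed here.

```python
import numpy as np

def policy_iteration(Ms, qs, max_iter=100):
    """Approximate the solution X of sup_a (M_a X - q_a) = 0 for a finite control set.

    Ms, qs: lists of I x I matrices and I-vectors, one pair per control value;
    every row-wise "policy" matrix is assumed to be invertible (cf. Lemma 8).
    """
    n_ctrl, I = len(Ms), qs[0].size
    policy = np.zeros(I, dtype=int)            # initial guess: first control in every row
    for _ in range(max_iter):
        # policy evaluation: freeze the control row-wise and solve the linear system
        M = np.array([Ms[policy[i]][i, :] for i in range(I)])
        q = np.array([qs[policy[i]][i] for i in range(I)])
        X = np.linalg.solve(M, q)
        # policy improvement: in each row, pick the control with the largest residual
        residuals = np.array([Ms[a] @ X - qs[a] for a in range(n_ctrl)])
        new_policy = np.argmax(residuals, axis=0)
        if np.array_equal(new_policy, policy): # fixed point: sup_a (M_a X - q_a) = 0 row-wise
            return X
        policy = new_policy
    return X
```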

Proof of Theorem 3

Condition (i) in Lemma 8 is immediately verified, and we turn to proving (ii). We have

$$\begin{aligned} \mu _1 := \sum _{j>i} |(M_a)_{ij}| \le \tau \bigg ( \frac{\sigma _{i}^2}{h^2} + \frac{5 b^-_{i}}{2h}\bigg ) \end{aligned}$$

(omitting the dependency on k and a in \(\sigma ,b^\pm ,r\)) and

$$\begin{aligned} \mu _2 := |(M_a)_{ii}| - \sum _{j<i} |(M_a)_{ij}| \ge \frac{3}{2} + \tau \bigg ( \frac{\sigma _{i}^2}{h^2} - \frac{2 b^+_{i}}{2h} + \frac{3 b^-_{i}}{2h} + r \bigg ). \end{aligned}$$

By the CFL condition (11), there exists \(\epsilon >0\) such that \(\frac{\tau \Vert b\Vert _\infty }{h} \le \frac{3}{2}- \epsilon \). This implies

$$\begin{aligned} \frac{3}{2} -\frac{\epsilon }{2} + \tau \left( -\frac{2 b^+_{i}}{2h} + \frac{3 b^-_{i}}{2h}\right) \ge \frac{\epsilon }{2} + \tau \frac{5b^-_{i}}{2h} \end{aligned}$$

and therefore

$$\begin{aligned} \mu _2 \ge \bigg (\tau \frac{\sigma _{i}^2}{h^2} + \frac{\epsilon }{2} + \, \tau r\bigg ) + \bigg (\tau \frac{5 b^-_{i}}{2h} + \frac{\epsilon }{2}\bigg ). \end{aligned}$$

Then by using \(\displaystyle \frac{a_1+a_2}{c_1+c_2}\le \max \Big (\frac{a_1}{c_1}, \frac{a_2}{c_2}\Big )\) for numbers \(a_i,c_i\ge 0\), we obtain

$$\begin{aligned} \frac{\mu _1}{\mu _2} \le \max \bigg ( \frac{\tau \frac{\sigma ^2_i}{h^2}}{\tau \frac{\sigma ^2_i}{h^2} + \frac{\epsilon }{2}+\ \tau \,r },\ \frac{\tau \frac{5 b^-_{i}}{2h}}{\tau \frac{5 b^-_{i}}{2h} + \frac{\epsilon }{2}}\bigg ). \end{aligned}$$

Taking \(\tau \) small enough such that for instance \(\frac{\epsilon }{2} + \tau r \ge \frac{\epsilon }{4}\), and since b(.) and \(\sigma (.)\) are bounded functions (by (A1)), we obtain the bound

$$\begin{aligned} \sup _{a\in \Lambda } \max _{i\in {\mathbb {I}}} \frac{\sum _{j>i} |(M_a)_{ij}|}{|(M_a)_{ii}| - \sum _{j<i} |(M_a)_{ij}|} \le \max \bigg ( \frac{\tau \frac{\Vert \sigma ^2\Vert _\infty }{h^2}}{\tau \frac{\Vert \sigma ^2\Vert _\infty }{h^2} + \frac{\epsilon }{4}},\ \frac{\tau \frac{5 \Vert b^-\Vert _\infty }{2h}}{\tau \frac{5 \Vert b^-\Vert _\infty }{2h} + \frac{\epsilon }{2}}\bigg ). \end{aligned}$$

Since the last bound is a constant \(<1\), we can apply Lemma 8 to obtain the existence and uniqueness of the solution of the BDF2 scheme.

4 Proof of Theorem 4 (stability in the A-norm)

The proof consists of three main steps: first, we show a “linear” recursion for the error (Lemma 10); second, we pass from such a recursion for the error in vector form to a scalar recursion (Lemma 11); finally, we show the stability estimate from this scalar recursion (Lemma 12).

4.1 Treatment of the nonlinearity

Given a function \(\phi :[0,T]\times {{\mathbb {R}}}\times \Lambda \rightarrow {{\mathbb {R}}}\), for any \((t,x)\in [0,T]\times {{\mathbb {R}}}\) we will make use of the notation \(co(\phi (t,x,\Lambda ))\) to indicate the convex hull of \(\phi \) with respect to its third variable, i.e.

$$\begin{aligned} co(\phi (t,x,\Lambda )) = \left\{ \sum _{n\in {{\mathbb {N}}}} \gamma _n \phi (t,x,a_n) : a_n \in \Lambda , \gamma _n \ge 0, \sum _{n\in {{\mathbb {N}}}}\gamma _n =1\right\} . \end{aligned}$$

First, we have the following:

Lemma 10

Let u be the solution of (3) and w the solution of (12). There exist coefficients \({{\tilde{\sigma }}}^k_i\), \(({{\tilde{b}}}^\pm )^k_i\), \({{\tilde{r}}}^k_i\), such that the error \(E^k=u^k-w^k\) satisfies

$$\begin{aligned} \frac{3 E_i^k -4 E_i^{k-1} + E_i^{k-2} }{2\tau } -\frac{1}{2}({{\tilde{\sigma }}}^2)^k_i D^2 E^{k}_i + ({{\tilde{b}}}^+)^k_i D^{1,-} E^{k}_i -({{\tilde{b}}}^-)^k_i D^{1,+} E^{k}_i + {{\tilde{r}}}^k_i E^k_i = -{\mathcal {E}}_i^k \nonumber \\ \end{aligned}$$
(23)

for any \(k\ge 2\) and \(i\in {\mathbb {I}}\), where \(({{\tilde{\sigma }}}^2)^k_i\), \(({{\tilde{b}}}^\pm )^k_i\), \({{\tilde{r}}}^k_i\) belong, respectively, to the convex hulls \(co(\sigma ^2(t_k,x_i,\Lambda ))\), \(co(b^\pm (t_k,x_i,\Lambda ))\), \(co(r(t_k,x_i,\Lambda ))\).

Proof

By (12) one has (for \(k\ge 2\), \(1\le i\le I\))

$$\begin{aligned} \frac{3 w^k_i -4 w^{k-1}_{i} + w^{k-2}_i}{2\tau } + H[w^k](t_k,x_i) = {\mathcal {E}}^k_i. \end{aligned}$$
(24)

The scheme simply reads

$$\begin{aligned} \frac{3 u^k_i -4 u^{k-1}_{i} + u^{k-2}_i}{2\tau } + H[u^k](t_k,x_i) = 0. \end{aligned}$$
(25)

Subtracting (24) from (25), denoting also \(H[u^k]\equiv (H[u^k](t_k,x_i))_{1\le i\le I}\), the following recursion is obtained for the error in \({{\mathbb {R}}}^I\):

$$\begin{aligned} \frac{3 E^k -4 E^{k-1} + E^{k-2} }{2\tau } + H[u^k] - H[w^k] = -{\mathcal {E}}^k. \end{aligned}$$
(26)

For simplicity of presentation, we first consider the case when b and r vanish, i.e. \(\hbox {} b(.)\equiv 0 \text { and } r(.)\equiv 0\), and defer a sketch of the general case to the end of the proof. In this case,

$$\begin{aligned} H[u^k]_i = \sup _{a\in \Lambda } \Big \{- \frac{1}{2} \sigma ^2(t_k,x_i,a) (D^2 u^k)_i + \ell (t_k,x_i,a)\Big \}. \end{aligned}$$
(27)

Let us assume that \(\sigma \) and \(\ell \) are continuous functions of a so that the supremum is attained. For each given \(k,i\), let then \({\bar{a}}^k_i \in \Lambda \) denote an optimal control in (27).

In the same way, let \({{\bar{b}}}^k_i\) denote an optimal control for \(H[w^k]_i\). By using the optimality of \({\bar{a}}^k_i\), it holds

$$\begin{aligned}&H[u^k]_i - H[w^k]_i \nonumber \\&=\! - \frac{1}{2} \sigma ^2(t_k,x_i, {\bar{a}}^k_i) (D^2 u^k)_i \! + \ell (t_{k},x_i,{\bar{a}}^k_i) - \sup _{a\in \Lambda }\Big \{\!\! -\! \frac{1}{2} \sigma ^2(t_k,x_i, a) (D^2 w^k)_i \! + \ell (t_{k},x_i,a)\!\Big \} \nonumber \\&\le \! - \frac{1}{2} \sigma ^2(t_k,x_i, {\bar{a}}^k_i) (D^2 u^k)_i - \Big (- \frac{1}{2} \sigma ^2(t_k,x_i, {\bar{a}}^k_i) (D^2 w^k)_i \Big ) \nonumber \\&=\! - \frac{1}{2} \sigma ^2(t_k,x_i, {\bar{a}}^k_i) (D^2 E^k)_i \nonumber \\ \end{aligned}$$
(28)

and, in the same way,

$$\begin{aligned} H[u^k]_i - H[w^k]_i \ge - \frac{1}{2} \sigma ^2(t_k,x_i,{\bar{b}}^k_i) (D^2 E^k)_i. \end{aligned}$$
(29)

Therefore, combining (28) and (29), \(H[u^k]_i - H[w^k]_i\) is a convex combination of \(-\frac{1}{2} \sigma ^2(t_k,x_i,{\bar{a}}^k_i) (D^2 E^k)_i\) and \(-\frac{1}{2} \sigma ^2(t_k,x_i,{\bar{b}}^k_i) (D^2 E^k)_i\). In particular, we can write

$$\begin{aligned} H[u^k]_i - H[w^k]_i = -\frac{1}{2} {{{\tilde{\sigma }}}}^2 (t_{k},x_i) (D^2 E^k)_i, \end{aligned}$$
(30)

where \({{\tilde{\sigma }}}^{2}(t_{k},x_i)\) is a convex combination of \(\sigma ^2(t_k,x_i,{\bar{a}}^k_i)\) and \(\sigma ^2(t_k,x_i,{\bar{b}}^k_i)\). In the general case (i.e. \(b,r\not \equiv 0\)) one can argue in the exact same way to get

$$\begin{aligned} H[u^k]_i - H[w^k]_i = -\frac{1}{2}({{\tilde{\sigma }}}^2)^k_i D^2 E^{k}_i + ({{{\tilde{b}}}}^+)^k_i D^{1,-} E^{k}_i - ({{\tilde{b}}}^-)^k_i D^{1,+} E^{k}_i +{{\tilde{r}}}^k_i E^k_i,\qquad \end{aligned}$$
(31)

where, for \(\phi \in \{\sigma ^2,b,r\}\),

$$\begin{aligned} {{\tilde{\phi }}}^k_i := \gamma ^k_i \phi (t_{k},x_i,{\bar{a}}^k_i) +(1-\gamma ^k_i) \phi (t_{k},x_i,{\bar{b}}^k_i) \end{aligned}$$

for some \(\gamma ^k_i\in [0,1]\). \(\square \)

The same technique used above to deal with the nonlinear operator applies also to Isaacs equations, i.e. equations of the following form:

$$\begin{aligned} v_t +\sup _{a\in \Lambda _1}\inf _{b\in \Lambda _2}\Big \{-\mathcal L^{(a,b)}[v](t,x) +r(t,x,a,b) v +\ell (t,x,a,b)\Big \}=0, \end{aligned}$$
(32)

where \((t,x)\in [0,T]\times {{\mathbb {R}}}^d\), \(\Lambda _1,\Lambda _2\subset {{\mathbb {R}}}^m\) are compact sets and

$$\begin{aligned} {\mathcal {L}}^{(a,b)}[v](t,x)=\frac{1}{2}\sigma ^2(t,x,a,b)v_{xx} + b(t,x,a,b)v_x. \end{aligned}$$

To simplify the presentation, let us consider again \(b,r \equiv 0\), and now also \(\ell \equiv 0\) (indeed, one can easily verify that as in (28) the term \(\ell \) would not appear in (34) and (35), and the case of non-zero b and r is treated similarly to the Proof of Lemma 10). By analogous definitions and reasoning to above, we get (26), where, for \(\phi \in \{u,w\}\),

$$\begin{aligned} H[\phi ^k]_i = \sup _{a\in \Lambda _1}\inf _{b\in \Lambda _2}\Big \{-\frac{1}{2}\sigma ^2(t,x,a,b)(D^2_x \phi ^k)_i\Big \}. \end{aligned}$$
(33)

Making use of the general inequality (for any real-valued functions \((a,b)\rightarrow F_{a,b}\) and \((a,b)\rightarrow G_{a,b}\))

$$\begin{aligned} \sup _{a\in \Lambda _1} \inf _{b\in \Lambda _2} F_{a,b} - \sup _{a\in \Lambda _1} \inf _{b\in \Lambda _2} G_{a,b} \ge \inf _{a\in \Lambda _1} \inf _{b\in \Lambda _2} (F_{a,b} - G_{a,b}), \end{aligned}$$

we obtain

$$\begin{aligned} H[u^k]_i - H[w^k]_i \ge \inf _{a\in \Lambda _1} \inf _{b\in \Lambda _2}\Big \{-\frac{1}{2}\sigma ^2(t,x,a,b)(D^2_x E^k)_i\Big \}. \end{aligned}$$
(34)

Analogously, one can prove

$$\begin{aligned} H[u^k]_i - H[w^k]_i \le \sup _{a\in \Lambda _1} \sup _{b\in \Lambda _2}\Big \{-\frac{1}{2}\sigma ^2(t,x,a,b)(D^2_x E^k)_i\Big \}. \end{aligned}$$
(35)

From these inequalities, an equation exactly as in (23) can be derived, with a suitable convex combination \(({{\tilde{\sigma }}}^2)^k_i\) of diffusion coefficients, and similarly for the drift and the other terms.

As a consequence, Lemma 10 – and by extension Theorem 4 – also hold for Isaacs equations of type (32), with the obvious modifications to the definition of the scheme.

4.2 A scalar error recursion

From the recursion (23) on \(E^k\), (or its corresponding formulation for Isaacs equations), we can derive the following:

Lemma 11

Let assumptions (A1) and (A2) in Theorem 4 be satisfied. Then there exists a constant \(C\ge 0\) such that

$$\begin{aligned}&\frac{1}{2} \Big ( (3-C\tau )|E^k|_A^2 -4| E^{k-1}|_A^2 +|E^{k-2}|_A^2\Big ) +|E^k-E^{k-1}|_A^2-|E^{k-1}-E^{k-2}|_A^2 \nonumber \\&\quad \le 2 \tau |E^k|_A\;|{\mathcal {E}}^k|_A. \nonumber \\ \end{aligned}$$
(36)

Proof

For simplicity of presentation we will assume that b has constant positive sign. The case of b with variable sign can be treated in a similar way obtaining estimates analogous to those below separately for the positive and negative part of b and then summing up.

We remark that for \(E\in {{\mathbb {R}}}^I\), \( - D^2 E = A E,\) where A is the finite difference matrix defined in (9) and \(D^2\) as in (4). By (23), we get the following:

$$\begin{aligned} \frac{3 E^k -4 E^{k-1} + E^{k-2} }{2\tau } + \Delta ^k A E^k + F^k B E^k + R^k E^k = -{\mathcal {E}}^k, \end{aligned}$$
(37)

where \(\Delta ^k:=\frac{1}{2} {{\,\mathrm{diag}\,}}( ({{\tilde{\sigma }}}^2)^k_i)\), \(F^k := {{\,\mathrm{diag}\,}}({{\tilde{b}}}^k_i)\), \(R^k := {{\,\mathrm{diag}\,}}({{\tilde{r}}}^k_i)\) and

$$\begin{aligned} B = \frac{1}{2h} \begin{pmatrix} 3 &{}\quad &{}\quad &{}\quad &{}\quad \\ -4 &{}\quad 3 &{}\quad &{}\quad &{}\quad \\ 1 &{}\quad -4 &{}\quad \ddots &{}\quad &{}\quad \\ &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{}\quad 1 &{}\quad -4 &{}\quad 3 \end{pmatrix}. \end{aligned}$$

We form the scalar product of (37) with \(AE^k\). By using the identity \( 2 \langle a-b,a\rangle _A=|a|_A^2+|a-b|_A^2-|b|_A^2, \) one has:

$$\begin{aligned}&\langle 3 E^k -4 E^{k-1} + E^{k-2}, E^k\rangle _A \nonumber \\&\ = 4\langle E^k-E^{k-1},E^k\rangle _A - \langle E^{k}-E^{k-2},E^k\rangle _A \nonumber \\&\ = \frac{1}{2} \left( 4 |E^k|_A^2 + 4 |E^k-E^{k-1}|_A^2 - 4|E^{k-1}|_A^2\right) - \frac{1}{2}\left( |E^{k}|_A^2 + |E^{k}-E^{k-2}|_A^2 -|E^{k-2}|_A^2\right) \nonumber \\&\ \ge \frac{1}{2} \left( 3 |E^k|_A^2 - 4|E^{k-1}|_A^2 + |E^{k-2}|_A^2\right) + |E^k-E^{k-1}|_A^2 - |E^{k-1}-E^{k-2}|_A^2, \nonumber \\ \end{aligned}$$
(38)

where we have also used \(|a+b|^2\le 2|a|^2 + 2|b|^2\). Since \(({{\tilde{\sigma }}}^2)^k_i\ge \eta >0\) for all \(k,i\):

$$\begin{aligned} \langle \Delta ^k A E^k, A E^k\rangle \ge \frac{\eta }{2} \Vert A E^k \Vert ^2, \end{aligned}$$
(39)

where \(\Vert \cdot \Vert \) denotes the canonical Euclidean norm in \({{\mathbb {R}}}^I\).

In order to estimate the drift component, let us introduce the notation

$$\begin{aligned} \delta E:=(E_i-E_{i-1})_{1\le i\le {I}},\quad \delta _2 E:=(E_{i-1}-E_{i-2})_{1\le i\le {I}} \end{aligned}$$
(40)

with the convention that \(E_i=0\) for all indices i which are not in \({{\mathbb {I}}}\). It holds:

$$\begin{aligned} | \langle F^k B E^k, A E^k\rangle |= & {} \bigg | \frac{1}{2h}\langle F^k (3E^k_i - 4 E^k_{i-1} +E^k_{i-2})_{i\in {\mathbb {I}}}, A E^k\rangle \bigg | \\= & {} \bigg | \frac{1}{2h}\langle F^k \big (3 \delta E^k - \delta _2 E^k\big ),\ A E^k\rangle \bigg | \\\le & {} \frac{1}{2h}\Big \{3\Vert F^k \delta E^k \Vert \, \Vert A E^k \Vert + \Vert F^k \delta _2 E^k\Vert \, \Vert A E^k\Vert \Big \}. \end{aligned}$$

By using the boundedness of the drift term, and \(\Vert \delta E^k\Vert ,\Vert \delta _2 E^k\Vert \le h |E^k|_A\),

$$\begin{aligned} | \langle F^k B E^k, A E^k\rangle |\le & {} \frac{\Vert b\Vert _\infty }{2h}\Big \{3\Vert A E^k\Vert \Vert \delta E^k\Vert + \Vert A E^k\Vert \Vert \delta _2 E^k\Vert \Big \}\nonumber \\\le & {} 2 \Vert b\Vert _\infty \ \Vert A E^k\Vert \, |E^k|_A. \end{aligned}$$
(41)

For the last term, using the boundedness of r and the Cauchy-Schwarz inequality,

$$\begin{aligned} |\langle R^k E^k, A E^k\rangle | \le \Vert r\Vert _\infty \Vert E^k\Vert \Vert A E^k\Vert . \end{aligned}$$
(42)

Therefore, putting (39), (41) and (42) together,

$$\begin{aligned}&\langle \Delta ^k A E^k + F^k B E^k + R^k E^k ,AE^k\rangle \nonumber \\&\quad \ge \frac{\eta }{2} \Vert A E^k\Vert ^2 - 2 \Vert b\Vert _\infty \Vert A E^k\Vert \, |E^k|_A - \Vert r\Vert _\infty \Vert A E^k\Vert \Vert E^k\Vert . \end{aligned}$$
(43)

A direct calculation shows that the minimal eigenvalue of A is \(\lambda _{\min }(A)=\frac{4}{h^2} \sin ^2(\frac{\pi h}{2})\ge 4\). Hence \(\langle X,AX\rangle \ge 4 \langle X,X\rangle \) and therefore \(\Vert X\Vert \le \frac{1}{2} |X|_A\). In the same way, we also have \(|X|_A \le \frac{1}{2} \Vert AX\Vert \). Hence it holds

$$\begin{aligned} \langle \Delta ^k A E^k + F^k B E^k + R^k E^k ,AE^k\rangle \ge \frac{\eta }{2} \Vert A E^k\Vert ^2 - C_1 \Vert A E^k\Vert |E^k|_A \end{aligned}$$
(44)

with \(C_1:=2 \Vert b\Vert _\infty + \frac{1}{2} \Vert r\Vert _\infty \). By using \( C_1 \Vert A E^k\Vert |E^k|_A \le \frac{\eta }{2} \Vert A E^k\Vert ^2 + \frac{1}{2\eta } C_1^2 |E^k|_A^2\),

$$\begin{aligned} \langle \Delta ^k A E^k + F^k B E^k + R^k E^k ,AE^k\rangle \ge - \frac{1}{2\eta } C_1^2 |E^k|^2_A. \end{aligned}$$
(45)

Then, combining (38) and (45), we obtain the desired inequality with \(C:=\frac{2}{\eta } C_1^2\). \(\square \)

4.3 A universal stability lemma

In the following, it is assumed that \(|\,{\cdot }\,|\) is any vectorial norm. Combined with Lemmas 10 and 11, the following Lemma 12 with the A-norm \(|\,{\cdot }\,|\,{\equiv }\,|\,{\cdot }|_A\) immediately gives Theorem 4. In Sect. 5, we will use the result for the canonical Euclidean norm \(|\cdot |\equiv \Vert \cdot \Vert \) to prove Theorem 5.

In order to prove the following Lemma 12, we will exploit properties of the matrix

$$\begin{aligned} M_\tau := \begin{pmatrix} (3-C\tau ) &{} -4 &{} 1 &{} 0 &{} \\ 0 &{} (3-C\tau ) &{} -4 &{} \ddots &{} \ddots \\ &{} 0 &{} \ddots &{}\ddots &{} 1 \\ &{} &{} \ddots &{}\ddots &{} -4 \\ &{} &{} &{} 0 &{} (3-C\tau ) \end{pmatrix}, \end{aligned}$$
(46)

in particular the fact that \((M_\tau )^{-1}\ge 0\) for \(\tau \) small enough (which we prove).
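Both properties are easy to observe numerically; the following small check (ours, with arbitrary values of C, \(\tau \) and of the dimension) also verifies the factorisation of \(M_\tau \) used in the proof below.

```python
import numpy as np

def M_tau(n, C, tau):
    """The matrix (46): (3 - C*tau) I - 4 J + J^2, with J the upper shift matrix."""
    J = np.eye(n, k=1)
    return (3.0 - C * tau) * np.eye(n) - 4.0 * J + J @ J

n, C, tau = 8, 1.0, 0.01
M = M_tau(n, C, tau)
print(np.all(np.linalg.inv(M) >= -1e-14))     # componentwise nonnegativity of M_tau^{-1}

# factorisation M_tau = (lambda_1 I - J)(lambda_2 I - J) with the roots lambda_1, lambda_2
lam1, lam2 = 2.0 + np.sqrt(1.0 + C * tau), 2.0 - np.sqrt(1.0 + C * tau)
J = np.eye(n, k=1)
print(np.allclose(M, (lam1 * np.eye(n) - J) @ (lam2 * np.eye(n) - J)))
```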

Lemma 12

Assume that there exists a constant \(C\ge 0\) such that \(\forall k=2,\dots ,N\):

$$\begin{aligned}&\frac{1}{2}\Big ( (3-C\tau )|E^k|^2 -4| E^{k-1}|^2 +|E^{k-2}|^2\Big )+|E^k-E^{k-1}|^2-|E^{k-1}-E^{k-2}|^2 \nonumber \\&\quad \le 2 \tau |E^k|\;|{\mathcal {E}}^k|. \end{aligned}$$
(47)

Then there exists a constant \(C_1\ge 0\) and \(\tau _0>0\) such that \(\forall 0<\tau \le \tau _0\), \(\forall n\le N\):

$$\begin{aligned} \max _{2\le k\le n}|E^k|^2\le & {} C_1 \Big (|E^0|^2 + |E^1|^2 + \tau \sum _{2\le j\le n}|{\mathcal {E}}^j|^2\Big ). \end{aligned}$$
(48)

Proof

Let us denote

$$\begin{aligned} x_k:= |E^k|^2\qquad \text {and}\qquad y_k:=|E^{k}-E^{k-1}|^2, \end{aligned}$$

so that (47) reads

$$\begin{aligned} \Big ( (3-C\tau ) x_k-4x_{k-1} +x_{k-2}\Big )\le 2(y_{k-1} -y_k) + 4 \tau |E^k|\;|{\mathcal {E}}^k|. \end{aligned}$$
(49)

For a given \(\tau >0\) and given k, let \(M_\tau \in {{\mathbb {R}}}^{(k-1)\times (k-1)}\) as defined in (46). Let \(z, q \in {{\mathbb {R}}}^{k-1}\) be defined by

$$\begin{aligned} z:=(x_k, x_{k-1}, \dots , x_2)^T \quad \text {and}\qquad q:=(2(y_{j-1}-y_j) + 4\tau |E^j|\;|{\mathcal {E}}^j|)_{j=k,\dots ,2}. \end{aligned}$$

By (49), we have

$$\begin{aligned} M_\tau z \le q. \end{aligned}$$
(50)

We notice that \(M_\tau =(3-C\tau ) I -4 J + J^2\) with

$$\begin{aligned} J:={{\,\mathrm{tridiag}\,}}(0,0,1). \end{aligned}$$

Hence, with

$$\begin{aligned} \lambda _{1}= 2 + \sqrt{1+C\tau } \quad \hbox {and} \quad \lambda _{2}= 2 - \sqrt{1+C\tau }, \end{aligned}$$

the roots of \(\lambda ^2 - 4\lambda + (3-C\tau ) =0\) for \(3- C\tau \ge 0\), we can write

$$\begin{aligned} M_\tau = (\lambda _1 I-J)(\lambda _2I-J) = \lambda _1\lambda _2\left( I-\frac{J}{\lambda _1}\right) \left( I-\frac{J}{\lambda _2}\right) . \end{aligned}$$

Furthermore, since \(J^{k-1}=0\), it holds

$$\begin{aligned} M_\tau ^{-1}&= \frac{1}{\lambda _1\lambda _2}\left( I-\frac{J}{\lambda _1}\right) ^{-1}\left( I-\frac{J}{\lambda _2}\right) ^{-1}\\&= \frac{1}{\lambda _1\lambda _2} \left( \sum _{0\le \xi \le k-2} \left( \frac{J}{\lambda _1}\right) ^\xi \right) \left( \sum _{0\le \xi \le k-2} \left( \frac{J}{\lambda _2}\right) ^\xi \right) = \sum _{p=0}^{k-2} a_p J^p, \end{aligned}$$

where

$$\begin{aligned} a_p:=\sum ^p_{j=0} \frac{1}{\lambda _1^{j+1}\lambda _2^{p-j+1}} = \frac{1}{\lambda _2^{p+2}}\sum ^p_{j=0} \left( \frac{\lambda _2}{\lambda _1}\right) ^{j+1}. \end{aligned}$$

Therefore \(M_\tau ^{-1}\ge 0 \) componentwise (for \(\tau <3/C\)), and using (50) it holds \(z \le M_\tau ^{-1} q\).

It is possible to prove that there exists \(\tau _0>0\) and a constant \(C_0\ge 0\) (depending only on T) such that \(\forall 0<\tau \le \tau _0\) and \(\forall p\ge 0\):

$$\begin{aligned} 0\le a_p\le C_0\quad \text {and}\quad a_p-a_{p-1}\ge 0. \end{aligned}$$
(51)

We postpone the Proof of (51) to the end. For the first component of z, we deduce

$$\begin{aligned} x_k\le & {} \sum _{j=0}^{k-2} a_j q_{j+1} \nonumber \\\le & {} 2 \sum _{j=0}^{k-2} a_j (y_{k-j-1}-y_{k-j}) + {4C_0\tau } \sum _{j=2}^k |E^j|\; |{\mathcal {E}}^j| \nonumber \\= & {} -2a_0 y_k + 2 \sum _{j=0}^{k-3} (a_j-a_{j+1}) y_{k-j-1} + 2 a_{k-2} y_1 + {4C_0\tau } \sum _{j=2}^k |E^j|\; |{\mathcal {E}}^j|,\nonumber \\ \end{aligned}$$
(52)

for all \(k\ge 2\), where, for (52), we have used the fact that \(a_p\le C_0\). Since \(y_j\ge 0\), \(\forall j\), by definition, \(a_{k-2}\le C_0\), \(a_0= \frac{1}{\lambda _1\lambda _2}\ge 0\) and \(a_j-a_{j-1}\ge 0\), \(\forall j\), we obtain

$$\begin{aligned} x_k\le & {} 2 C_0 y_1 + {4C_0\tau } \sum _{j=2}^k |E^j|\; |{\mathcal {E}}^j|. \end{aligned}$$
(53)

Recalling the definition of \(x_k\) and \(y_k\), for any \(2\le k\le n\) one has:

$$\begin{aligned} |E^k|^2\le & {} 2C_0 |E^1-E^0|^2 + 4C_0\tau \sum _{j=2}^k |E^j|\; |{\mathcal {E}}^j|\\\le & {} 4C_0 (|E^0|^2 + |E^1|^2) + 4C_0\tau \Big (\max _{2\le k\le n}|E^k|\Big )\sum _{j=2}^n |{\mathcal {E}}^j|\\\le & {} 4C_0 (|E^0|^2 + |E^1|^2) + \frac{1}{2}\Big (\max _{2\le k\le n}|E^k|\Big )^2 + {8C_0^2\tau ^2} \Big (\sum _{j=2}^n |{\mathcal {E}}^j|\Big )^2 \end{aligned}$$

(where we made use of \(2ab\le \frac{a^2}{K} +K b^2 \) for any \(a,b\ge 0\) and \(K>0\)). Hence, we obtain

$$\begin{aligned} \Big (\max _{2\le k\le n}|E^k|\Big )^2\le & {} C_1\Big (|E^0|^2 + |E^1|^2 + \tau \sum _{j=2}^n |{\mathcal {E}}^j|^2\Big ) \end{aligned}$$

with \(C_1 := \max (8C_0, 16 C_0^2 T)\) (we used \(\Big (\sum _{j=2}^n |{\mathcal {E}}^j|\Big )^2 \le n \sum _{j=2}^n |{\mathcal {E}}^j|^2\) and \(n\tau \le T\)).

It remains to prove (51). From the definition of \(a_p\) one has

$$\begin{aligned} a_p = \frac{1}{\lambda _2^{p+2}}\sum ^p_{j=0} \left( \frac{\lambda _2}{\lambda _1}\right) ^{j+1} \le \frac{1}{\lambda _2^{p+2}} \left( 1-\frac{\lambda _2}{\lambda _1}\right) ^{-1} \end{aligned}$$

for \(p=0,\ldots , k-2\). Observing that \(\frac{\lambda _2}{\lambda _1}\le \frac{1}{3}\), it follows that

$$\begin{aligned} a_p \le \frac{3}{2 \lambda _2^{p+2}} \le \frac{3}{2 (2-\sqrt{1+C\tau })^{n}}. \end{aligned}$$

Notice that \(\sqrt{1+C\tau }\le 1+C\tau \), and also that \(e^{-x} \le 1- x/2\), \(\forall x\in [0,1]\). Hence \((2-\sqrt{1+C\tau })^n\ge (2-(1+C\tau ))^n = (1-C\tau )^n \ge (e^{-2C\tau })^n = e^{-2 C t_n}\) for \(C\tau \le \frac{1}{2}\), and therefore \(a_p\le \frac{3}{2} e^{2Ct_n}\). The desired result follows with \(C_0:=\frac{3}{2} e^{2CT}\) and \(\tau _0:=\frac{1}{2C}\).

Moreover, one has

$$\begin{aligned} a_p-a_{p-1} = \frac{1}{\lambda _2^{p+1}}\left( \frac{1}{\lambda _2}\sum ^p_{j=0} \left( \frac{\lambda _2}{\lambda _1}\right) ^{j+1}- \sum ^{p-1}_{j=0} \left( \frac{\lambda _2}{\lambda _1}\right) ^{j+1} \right) , \end{aligned}$$

which is nonnegative for \(\tau \) small enough thanks to the fact that \(\lambda _1,\lambda _2\ge 0\) and \(\lambda _2\le 1\). \(\square \)

Remark 13

From the previous proof and the Proof of Lemma 11 one can deduce that the restriction

$$\begin{aligned} \tau \le \frac{\eta }{C^2_1} \end{aligned}$$

(where \(C_1= 2\Vert b\Vert _\infty + \frac{1}{2}\Vert r\Vert _\infty \)) has to be imposed on the time step. From the theoretical point of view this makes the scheme not suitable for nearly-degenerate equations. However, in our numerical tests we did not observe any stability issue even in the case of degenerate problems (see Sect. 7.1).

5 Stability in the Euclidean norm

The fundamental stability result given by Lemma 12 applies to any vectorial norm. In this section, we discuss some special cases where (47) can be obtained for the Euclidean norm \(|\cdot |=\Vert \cdot \Vert \).

We first prove the stability result for this norm under the extra assumption (A3): the control may appear everywhere except in the diffusion term, which in addition has to be Lipschitz continuous in space.

5.1 Proof of Theorem 5 (stability in the Euclidean norm)

We consider the scalar product of (37) directly with \(E^k\) (instead of \(A E^k\) as used previously), again in the situation where \(b\ge 0\) to simplify the argument (the general case follows analogously to the Proof of Lemma 11). We obtain:

$$\begin{aligned} \langle E^k,3E^k - 4 E^{k-1}+E^{k-2}\rangle + 2\tau \langle E^k,\Delta ^k A E^k + F^k B E^k + R^k E^k\rangle = -2\tau \langle E^k,{\mathcal {E}}^k\rangle . \end{aligned}$$
(54)

As in Sect. 4.2, we have

$$\begin{aligned}&\langle E^k,3E^k - 4 E^{k-1}+E^{k-2}\rangle \nonumber \\& \ge \frac{1}{2} \left( 3 \Vert E^k\Vert ^2 - 4\Vert E^{k-1}\Vert ^2 +\Vert E^{k-2}\Vert ^2\right) + \Vert E^k-E^{k-1}\Vert ^2 - \Vert E^{k-1}-E^{k-2}\Vert ^2.\nonumber \\ \end{aligned}$$
(55)

We now focus on bounding the other terms on the left-hand side of (54).

By using the Lipschitz continuity of \(\sigma ^2\) one has

$$\begin{aligned} \langle E^k,\Delta ^k A E^k\rangle&= \sum _{i\in {\mathbb {I}}} \frac{(\sigma ^k_i)^2}{2h^2}(-E^k_{i+1}+2E^k_{i}-E^k_{i-1}) E^k_i\nonumber \\&= \sum _{i\in \mathbb I}\frac{(\sigma ^k_{i-1})^2}{2h^2}(E^k_{i-1}-E^k_{i})^2 +\sum _{i\in {\mathbb {I}}} \left( \frac{(\sigma ^k_{i-1})^2}{2h^2}-\frac{(\sigma ^k_i)^2}{2h^2}\right) (E^k_{i-1}-E^k_i) E^k_i\nonumber \\&\ge \frac{\eta }{2h^2} \sum _{i\in {\mathbb {I}}}(E^k_{i-1}-E^k_{i})^2 - \frac{L}{2h} \sum _{i\in {\mathbb {I}}}|E^k_{i-1}-E^k_{i}| |E^k_i|. \end{aligned}$$
(56)

Therefore, by the Cauchy–Schwarz inequality, one obtains

$$\begin{aligned} \langle E^k,\Delta ^k A E^k\rangle \ge \frac{\eta }{2h^2}\Vert \delta E^k\Vert ^2 - \frac{L}{2h} \Vert \delta E^k\Vert \Vert E^k\Vert , \end{aligned}$$
(57)

where \(\delta E^k\) is defined by (40). Moreover, for the first order term one has

$$\begin{aligned} \langle E^k,F^k B E^k\rangle&= \sum _{i\in {\mathbb {I}}} \frac{b_i}{2h}(3E^k_{i}-4E^k_{i-1}+E^k_{i-2}) E^k_i\nonumber \\&\ge - \frac{3 \Vert b\Vert _\infty }{2h}\sum _{i\in {\mathbb {I}}}|E^k_{i}-E^k_{i-1}| |E^k_i| - \frac{\Vert b\Vert _\infty }{2h} \sum _{i\in {\mathbb {I}}}|E^k_{i-1}-E^k_{i-2}| |E^k_i|\nonumber \\&\ge - \frac{2 \Vert b\Vert _\infty }{h}\Vert \delta E^k\Vert \Vert E^k\Vert , \end{aligned}$$
(58)

where for the last inequality we have used that \(\Vert \delta _2 E^k\Vert \le \Vert \delta E^k\Vert \). Putting together estimates (57) and (58), using the fact that \(\langle E^k,R^k E^k\rangle \ge -\Vert r\Vert _\infty \Vert E^k\Vert ^2\), we get

$$\begin{aligned} \langle E^k,\Delta ^k A E^k + F^k B E^k + R^k E^k\rangle&\ge \frac{\eta }{2h^2} \Vert \delta E^k\Vert ^2 - \frac{C_1}{2h} \Vert \delta E^k\Vert \Vert E^k\Vert -\Vert r\Vert _\infty \Vert E^k\Vert ^2\\&\ge \frac{\eta }{4h^2} \Vert \delta E^k\Vert ^2 - \left( \frac{C_1^2}{4\eta } + \Vert r\Vert _\infty \right) \Vert E^k\Vert ^2, \end{aligned}$$

where we have denoted \(C_1:=L+4\Vert b\Vert _\infty \) and have used again the Cauchy–Schwarz inequality. Hence, together with (55), this gives (47) with \(|\cdot |=\Vert \cdot \Vert \) and the constant \(C:=4 (\frac{C_1^2}{4\eta }+\Vert r\Vert _\infty )\). By using Lemma 12, this concludes the proof of Theorem 5. \(\Box \)

Remark 14

The step (56) highlights the need for Assumption (A3), Lipschitz regularity of the diffusion coefficient, in order to obtain the one-step stability inequality (47). This can be avoided in the A-norm stability analysis, Lemma 11, by using a different inner product, which directly gives (39) and only requires uniform ellipticity.

The adaptation of (A3) to the controlled case would impose some Lipschitz continuity of the feedback control with respect to the state variable. Such regularity of the control cannot usually be expected (see for instance the tests in Sect. 7.2).

5.2 Linear equation with degenerate diffusion term

The next result concerns the case of a possibly degenerate diffusion term. It will require more restrictive assumptions on the drift and diffusion terms, and we shall assume that there is no control here. Indeed, in this case, one cannot count on the positive term coming from the non-degenerate diffusion which, in the proof of Theorem 5, is used to compensate the negative correction terms coming from the drift term. This leads us to consider the following assumption:

Assumption (A5).

  1. (i)

    The function r(.) is bounded.

  2. (ii)

    The drift and diffusion coefficients are independent of the control : \(b \equiv b(t,x)\) and \(\sigma \equiv \sigma (t,x)\).

  3. (iii)

    There exist \(L_1, L_2\ge 0\) such that, for all \(t, x, h\):

    $$\begin{aligned}&|b(t,x)-b(t,y)| \le L_1 |x-y|, \end{aligned}$$
    (59)
    $$\begin{aligned}&\frac{\sigma ^2(t,x-h)- 2\sigma ^2(t,x) + \sigma ^2(t,x+h)}{h^2} \ge -L_2. \end{aligned}$$
    (60)

(The last condition is equivalent to \((\sigma ^2)_{xx}\ge -L_2\) in the differentiable case.)
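Condition (60) is straightforward to test on a grid; a direct check (ours, purely illustrative) reads:

```python
import numpy as np

def min_second_difference(sigma2, h):
    """min over the grid of (sigma^2(x-h) - 2*sigma^2(x) + sigma^2(x+h)) / h^2, cf. (60)."""
    return np.min((sigma2[:-2] - 2.0 * sigma2[1:-1] + sigma2[2:]) / h**2)

x = np.linspace(0.0, 2.0, 201)
h = x[1] - x[0]
print(min_second_difference(x**2, h))                       # = 2: (60) holds with any L_2 >= 0
print(min_second_difference(np.minimum(x, 2.0 - x)**2, h))  # very negative: (60) fails at the kink
```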

Proposition 15

Let assumption (A5) be satisfied. Then (47) holds for \(|\cdot |=\Vert \cdot \Vert \).

Proof

We consider again the scalar recursion (54). For any vector \(E=(E_i)_{1\le i\le I}\) (with \(E_j=0\) for \(j\in \{-1,0,I+1,I+2\}\)), it holds:

$$\begin{aligned} E_i(2E_i - E_{i-1} - E_{i+1})\ge & {} 2 |E_i|^2 - \frac{1}{2}(|E_i|^2+|E_{i-1}|^2) - \frac{1}{2}(|E_i|^2+|E_{i+1}|^2) \\\ge & {} \frac{1}{2}\ (2 |E_i|^2 - |E_{i-1}|^2 - |E_{i+1}|^2). \end{aligned}$$

Hence, by assumption (60) on \(\sigma ^2\),

$$\begin{aligned} \langle E^k,\Delta ^k A E^k\rangle= & {} \sum _{1\le i\le I} \frac{\sigma _i^2}{2h^2} E^k_i (2E^k_i - E^k_{i-1} - E^k_{i+1}) \nonumber \\\ge & {} \sum _{1\le i\le I} \frac{\sigma _i^2}{4h^2} (-|E^k_{i-1}|^2 + 2 |E^k_{i}|^2 - |E^k_{i+1}|^2) \nonumber \\\ge & {} \sum _{1\le i\le I}\bigg (\frac{ - \sigma _{i-1}^2 + 2\sigma _i^2 - \sigma _{i+1}^2}{4h^2}\bigg )\,|E^k_i|^2. \nonumber \\\ge & {} - \frac{L_2}{4} \Vert E^k\Vert ^2. \end{aligned}$$
(61)

Now we focus on a lower bound for \(\langle E^k,F^k B E^k\rangle \). Let \(y^k_i=|E^k_i-E^k_{i-1}|^2\). First,

$$\begin{aligned} (3E^k_{i}-4E^k_{i-1}+E^k_{i-2}) E^k_i= & {} \frac{1}{2}(3 |E^k_{i}|^2- 4 |E^k_{i-1}|^2 + |E^k_{i-2}|^2) \\& + \frac{1}{2}(4 |E^k_i - E^k_{i-1}|^2 - |E^k_i - E^k_{i-2}|^2) \\\ge & {} \frac{1}{2}(3 |E^k_{i}|^2- 4 |E^k_{i-1}|^2 + |E^k_{i-2}|^2) + \frac{1}{2}(2 y^k_i - 2 y^k_{i-1}). \end{aligned}$$

We assume again \(b_i\ge 0\) for all i to simplify the presentation. The case where \(b_i\le 0\) for some i is similar. Then, the following bound holds:

$$\begin{aligned} \langle E^k,\, F^k B E^k \rangle= & {} \sum _{i=1}^I \frac{b_i}{2h}(3E^k_{i}-4 E^k_{i-1}+E^k_{i-2}) E^k_i = \sum _{i=1}^{I+2} \frac{b_i}{2h}(3E^k_{i}-4 E^k_{i-1}+E^k_{i-2}) E^k_i \\\ge & {} \sum _{i=1}^{I+2} \frac{b_i}{4h} (3 |E^k_{i}|^2- 4 |E^k_{i-1}|^2 + |E^k_{i-2}|^2) + \sum _{i=1}^{I+2} \frac{b_i}{2h} (y^k_i-y^k_{i-1}) \\\ge & {} \sum _{i=1}^I \bigg (\frac{3 b_i - 4 b_{i+1} + b_{i+2}}{4h}\bigg ) |E^k_{i}|^2 + \sum _{i=1}^{I+1} \bigg (\frac{b_i-b_{i+1}}{2h}\bigg ) y^k_i \end{aligned}$$

(where we have used \(y^k_{0}=y^k_{I+2}=0\) and \(\sum _{1\le i\le I+2} b_i (E^k_{i-2})^2 = \sum _{1\le i\le I} b_{i+2} (E^k_i)^2 \) as well as \(\sum _{1\le i\le I+2} b_i (E^k_{i-1})^2 = \sum _{0\le i\le I+1} b_{i+1} (E^k_i)^2 = \sum _{1\le i\le I} b_{i+1} (E^k_i)^2\)). Then, by the Lipschitz continuity of b(.) and the bound \(y^k_i\le 2(E^k_i)^2 + 2(E^k_{i-1})^2\), we have

$$\begin{aligned} \langle E^k,\, F^k B E^k \rangle\ge & {} - L_1 \sum _{i=1}^I |E^k_i|^2 - \frac{L_1}{2} \sum _{i=1}^{I+1} y^k_i \ge - 3 L_1 \Vert E^k\Vert ^2. \end{aligned}$$
(62)

By combining the bounds (61) and (62), we obtain

$$\begin{aligned} \langle E^k,\, \Delta ^k A E^k \rangle + \langle E^k,\, F^k B E^k \rangle + \langle E^k,\, R^k E^k\rangle \ge - (\frac{L_2}{4} + 3 L_1 + \Vert r\Vert _\infty ) \Vert E^k\Vert ^2. \end{aligned}$$

Therefore, inequality (47) is obtained with \(C := 4(\frac{L_2}{4} + 3 L_1 + \Vert r\Vert _\infty )\), which leads to the desired stability estimate. \(\square \)

5.3 Extension to a two-dimensional case

Under suitable assumptions, the result of Theorem 5 can be extended to multi-dimensional equations. We only sketch the main extra features and analysis steps as the notation is significantly lengthier. In the nonlinear case of HJB and Isaacs equations, the derivation of a linear error recursion can be carried out exactly as in Sect. 4.1 so that we can restrict ourselves to the following linear case with appropriate assumptions on the coefficients specified below,

$$\begin{aligned} v_t -\frac{1}{2}{{\,\mathrm{tr}\,}}[\Sigma (t,x)D_x^2 v] + b(t,x)D_x v+r(t,x) v +\ell (t,x)=0 \end{aligned}$$

for a positive definite matrix \(\Sigma \) and a drift vector b. We consider the two-dimensional case (\(d=2\)), as the approximation of the diffusion term with suitable properties is better understood here for a diagonally dominant diffusion tensor (see also Remark 17, (iii)). For simplicity, we take \(r,\ell \equiv 0\), but this condition can easily be removed as in earlier sections. Lastly, we omit for brevity the dependence of the coefficients on the time variable, which is inconsequential for the stability analysis.

Then with

$$\begin{aligned} \Sigma (x,y) :=\left( \begin{array}{cc} \sigma ^2_1(x,y) &{} \rho \sigma _1\sigma _2(x,y)\\ \rho \sigma _1\sigma _2(x,y) &{} \sigma _2^2(x,y) \end{array}\right) \quad \text {and}\quad b(x,y): =\left( \begin{array}{c} b_1(x,y)\\ b_2(x,y) \end{array}\right) , \end{aligned}$$

where \(\sigma _1,\sigma _2\ge 0\) and \(\rho \in [-1,1]\) is the correlation parameter, the equation reads (by slight abuse of notation)

$$\begin{aligned} v_t - \frac{1}{2}\sigma _1^2(x,y)v_{xx} -\rho \sigma _1\sigma _2(x,y)v_{xy}-\frac{1}{2}\sigma _2^2(x,y)v_{yy} + b_1(x,y)v_x + b_2(x,y)v_y = 0. \end{aligned}$$

The computational domain is given by \(\Omega :=(x_{\min },x_{\max })\times (y_{\min },y_{\max })\). We introduce the discretization in space defined by the steps \(h_x, h_y>0\) and we denote by \({\mathcal {G}}_{(h_x,h_y)}\) the associated mesh. In what follows, given any function \(\phi \) of \((x,y)\in \Omega \), we will denote \(\phi _{ij}=\phi (x_i,y_j)\) for \((i,j)\in {\mathbb {I}}:={\mathbb {I}}_1\times {\mathbb {I}}_2\), where \(\mathbb I_1=\{1, \ldots , I_1\}\), \({\mathbb {I}}_2=\{1, \ldots , I_2\}\).

Assuming that \(\rho \ge 0\) everywhere (the case when \(\rho \le 0\) is similar), we consider a 7-point stencil for the second order derivatives (see [13, Section 5.1.4]):

$$\begin{aligned} v_{xx}\sim \frac{v_{i-1,j}-2v_{ij}+v_{i+1,j}}{h_x^2}=:D^2_{xx}v_{ij},\quad \quad v_{yy}\sim \frac{v_{i,j-1}-2v_{ij}+v_{i,j+1}}{h_y^2}=:D^2_{yy}v_{ij}\\ v_{xy} \sim \frac{-v_{i,j-1}-v_{i,j+1}-v_{i-1,j}-v_{i+1,j}+v_{i-1,j-1}+v_{i+1,j+1}+2v_{ij}}{2 h_x h_y}=:D^2_{xy}v_{ij} \end{aligned}$$

and the BDF approximation of the first order derivatives

$$\begin{aligned} D^{1,-}_x u_{ij}:= \frac{3 u_{ij} - 4 u_{i-1,j} + u_{i-2,j}}{2 h_x} \quad \hbox {and} \quad D^{1,+}_x u_{ij}:= -\bigg (\frac{3 u_{ij} - 4 u_{i+1,j} + u_{i+2,j}}{2h_x}\bigg ),\\ D^{1,-}_y u_{ij}:= \frac{3 u_{ij} - 4 u_{i,j-1} + u_{i,j-2}}{2 h_y} \quad \hbox {and} \quad D^{1,+}_y u_{ij}:= -\bigg (\frac{3 u_{ij} - 4 u_{i,j+1} + u_{i,j+2}}{2h_y}\bigg ). \end{aligned}$$

The scheme is therefore defined, for \(k\ge 2\), by

$$\begin{aligned}&0 \ =\ \frac{3u^k_{ij}-4u^{k-1}_{ij}+u^{k-2}_{ij}}{2\tau } \nonumber \\&\quad - \frac{1}{2} \sigma _1^2 (x_i,y_j)D^2_{xx}u^k_{ij} -\rho \sigma _1\sigma _2 (x_i,y_j)D^2_{xy}u^k_{ij} - \frac{1}{2} \sigma _2^2 (x_i,y_j)D^2_{yy}u^k_{ij}\nonumber \\&\quad + b^+_1(x_i,y_j) D^{1,-}_x u^{k}_{ij} - b_1^-(x_i,y_j) D^{1,+}_x u^{k}_{ij} + b^+_2(x_i,y_j) D^{1,-}_y u^{k}_{ij} - b_2^-(x_i,y_j) D^{1,+}_y u^{k}_{ij}.\nonumber \\ \end{aligned}$$
(63)

A straightforward calculation shows that

$$\begin{aligned}&\sigma ^2_1(x_i,y_j)D^2_{xx}u_{ij} +2\rho \sigma _1\sigma _2(x_i,y_j) D^2_{xy}u_{ij}+\sigma ^2_2(x_i,y_j)D^2_{yy}u_{ij}\nonumber \\&\quad = \alpha _{ij} \left( u_{i-1,j}-2u_{ij}+u_{i+1,j}\right) + \beta _{ij} \left( u_{i,j-1}-2u_{ij}+u_{i,j+1}\right) + \gamma _{ij}\left( u_{i-1,j-1}-2u_{ij}+u_{i+1,j+1}\right) , \end{aligned}$$
(64)

with

$$\begin{aligned} \alpha _{ij}&:=\frac{\sigma _1(x_i,y_j)}{h_x}\left( \frac{\sigma _1(x_i,y_j)}{h_x}-\frac{\rho \sigma _2(x_i,y_j)}{h_y}\right) , \\ \beta _{ij}&:=\frac{\sigma _2(x_i,y_j)}{h_y}\left( \frac{\sigma _2(x_i,y_j)}{h_y}-\frac{\rho \sigma _1(x_i,y_j)}{h_x}\right) , \qquad \gamma _{ij} :=\frac{\rho (x_i,y_j)\sigma _1(x_i,y_j)\sigma _2(x_i,y_j)}{h_y h_x}. \end{aligned}$$
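As a quick sanity check of (64), the two stencil representations can be compared numerically on arbitrary grid data. The following NumPy sketch (the grid sizes, the coefficient fields and the constant correlation \(\rho =0.5\) are illustrative choices only) confirms that they agree up to rounding error:

```python
import numpy as np

# hypothetical smooth coefficients with sigma_1, sigma_2 >= 0 and rho in [0, 1]
nx, ny, hx, hy = 20, 24, 0.05, 0.04
X, Y = np.meshgrid(np.arange(nx) * hx, np.arange(ny) * hy, indexing="ij")
s1 = 1.0 + 0.3 * np.sin(X + Y)
s2 = 0.8 + 0.2 * np.cos(X - Y)
rho = 0.5
u = np.random.default_rng(0).standard_normal((nx, ny))

# neighbour values at the interior nodes (i, j)
uc       = u[1:-1, 1:-1]
uW,  uE  = u[:-2, 1:-1], u[2:, 1:-1]
uS,  uN  = u[1:-1, :-2], u[1:-1, 2:]
uSW, uNE = u[:-2, :-2],  u[2:, 2:]

dxx = (uW - 2*uc + uE) / hx**2
dyy = (uS - 2*uc + uN) / hy**2
dxy = (2*uc + uSW + uNE - uW - uE - uS - uN) / (2*hx*hy)   # 7-point cross stencil

S1, S2 = s1[1:-1, 1:-1], s2[1:-1, 1:-1]
lhs = S1**2 * dxx + 2*rho*S1*S2 * dxy + S2**2 * dyy        # left-hand side of (64)

alpha = (S1/hx) * (S1/hx - rho*S2/hy)
beta  = (S2/hy) * (S2/hy - rho*S1/hx)
gamma = rho * S1 * S2 / (hx*hy)
rhs = (alpha * (uW - 2*uc + uE) + beta * (uS - 2*uc + uN)
       + gamma * (uSW - 2*uc + uNE))                       # right-hand side of (64)

print(np.max(np.abs(lhs - rhs)))   # zero up to rounding error
```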

The scheme is completed with the following boundary conditions:

$$\begin{aligned} u^k_{i,j} = v(t_k,x_i, y_j) \quad \text {whenever}\quad i\in \{-1,0\}\cup \{I_1+1,I_1+2\} \;\text { or }\; j\in \{-1,0\}\cup \{I_2+1,I_2+2\}, \end{aligned}$$

so that, in particular, the corner ghost values required by the cross-derivative stencil are also prescribed.

For simplicity, assume \(h_x=h_y=: h\). We consider the following assumptions:

Assumptions.

  1. (A1’):

    \(\Vert b_i\Vert _\infty <\infty \) for \(i=1,2\);

  2. (A2’):

    \(\exists \eta >0\), \(\forall (x,y)\in \Omega \), \(\forall i\ne j\): \(\sigma _i^2(x,y)-\rho (x,y)\sigma _i(x,y)\sigma _j(x,y)\ge \eta \);

  3. (A3’):

    \(\forall i,j=1,2\), \(\sigma _i\sigma _j\) is Lipschitz continuous on \(\Omega \).

We then have the following result.

Proposition 16

Let assumptions (A1’),(A2’) and (A3’) be satisfied. Then the stability estimate (16) holds for \(|\cdot |=\Vert \cdot \Vert \).

Proof

The proof follows by similar steps to those of Theorem 5, using (64) with \(\alpha _{ij}, \beta _{ij}\ge \eta /h^2\) and \(\gamma _{ij} \ge 0\) by assumption (A2’). \(\square \)

Remark 17

(i) If \(h_x\ne h_y\) and for instance \(h_y=C h_x\) for some \(C\ge 1\), (A2’) has to hold with \(\sigma _2\) replaced by \(\sigma _2/C\) as a result of the scaling properties of the scheme.

(ii) Observe that assumption (A2’) is equivalent to requiring strong diagonal dominance of the covariance matrix.

(iii) When the strong diagonal dominance of the matrix \(\Sigma \) is not guaranteed, one can consider the generalized finite difference scheme in [8]. However, determining the precise set of assumptions on the coefficients needed to apply the previous arguments does not seem easy from the construction in [8].

6 Error estimates

In this section, we derive detailed error estimates for the implicit BDF2 scheme (3). For brevity, we restrict ourselves to the one-dimensional case.

In the following, we define specific instances of \(w_i^k\), \(E_i^k\) and \({\mathcal {E}}_i^k\), to which we can apply the results from the preceding sections.

Let u denote the solution of the scheme (3), and take for w the exact solution of (1), i.e. the function v. The error associated with the scheme is then defined by

$$\begin{aligned} E^k_i:=u^k_i - v(t_k,x_i), \quad i\in {\mathbb {I}}, 0\le k\le N. \end{aligned}$$

For any function \(\phi \) we will also use the notation \(\phi ^k_i:=\phi (t_k,x_i)\) as well as \(\phi ^k:=(\phi ^k_i)_{1\le i\le I}\) and \([\phi ]_i^k := (\phi ^m_j)_{(j,m)\ne (i,k)}\), and the error vector at time \(t_k\) is defined by

$$\begin{aligned} E^k:=(E^k_1, \dots , E^k_I)^T \ = \ u^k - v^k, \quad 0\le k\le N. \end{aligned}$$

The consistency error will be denoted by \({\mathcal {E}}^k(\phi ):=({\mathcal {E}}^k_i(\phi ))_{1\le i\le I}\in {{\mathbb {R}}}^I\) and for any smooth enough function \(\phi \) is defined, in this section, as follows:

$$\begin{aligned} {\mathcal {E}}^k_i(\phi ):= & {} {\mathcal {S}}^{(\tau ,h)}(t_k,x_i,\phi ^k_i,[\phi ]_i^k)\nonumber \\&- \bigg (\phi _t +\sup _{a\in \Lambda }\Big \{{\mathcal {L}}^a[\phi ](t_k,x_i) +r(t_k,x_i,a) \phi +\ell (t_k,x_i,a)\Big \}\bigg ). \end{aligned}$$
(65)

By extension, for the exact solution v of (1), we will simply define

$$\begin{aligned} {\mathcal {E}}^k_i(v):={\mathcal {S}}^{(\tau ,h)}(t_k,x_i,v^k_i,[v]^k_i). \end{aligned}$$
(66)

Note that (66) is well-defined for any continuous function.

In particular, for the scheme (3) it is clear that we have second order consistency in space and time, that is,

$$\begin{aligned} |{\mathcal {E}}^k_i(\phi )|\le c_1(\phi )\tau ^2 + c_2(\phi ) h^2 \end{aligned}$$
(67)

for any sufficiently regular test function \(\phi \).
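To illustrate (67), one can evaluate the BDF differences in time and space and the centered second difference on a smooth test function and observe the error decay under simultaneous halving of \(\tau \) and h. The test function and step sizes below are arbitrary illustrative choices:

```python
import numpy as np

# smooth test function and its exact derivatives (illustrative choice)
phi    = lambda t, x:  np.sin(2*t) * np.cos(3*x)
phi_t  = lambda t, x:  2*np.cos(2*t) * np.cos(3*x)
phi_x  = lambda t, x: -3*np.sin(2*t) * np.sin(3*x)
phi_xx = lambda t, x: -9*np.sin(2*t) * np.cos(3*x)

t0, x0 = 0.7, 0.3
for k in range(5):
    tau = h = 0.1 / 2**k
    e_t  = abs((3*phi(t0, x0) - 4*phi(t0-tau, x0) + phi(t0-2*tau, x0)) / (2*tau)
               - phi_t(t0, x0))
    e_x  = abs((3*phi(t0, x0) - 4*phi(t0, x0-h) + phi(t0, x0-2*h)) / (2*h)
               - phi_x(t0, x0))
    e_xx = abs((phi(t0, x0+h) - 2*phi(t0, x0) + phi(t0, x0-h)) / h**2
               - phi_xx(t0, x0))
    print(f"tau=h={tau:.4f}: {e_t:.2e} {e_x:.2e} {e_xx:.2e}")
# each error column decreases by a factor of about 4 per halving, i.e. O(tau^2 + h^2)
```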

To prove convergence of a certain order, we can now follow the standard approach of considering the exact solution of the PDE as the solution of a perturbed finite difference scheme, with the truncation error as the right-hand side. The error then satisfies precisely the estimates (14) and (16) under the pertaining assumptions.

When the Euler timestepping scheme (6) is used at the first time step, by the stability of the scheme we expect to have

$$\begin{aligned} |E^1|_A\le C \tau |{\mathcal {E}}^1(v)|_A \end{aligned}$$

and (14) simply reads

$$\begin{aligned} \max _{2\le k\le N}|E^k|_A^2 \le C \Big (|E^0|_A^2 + \tau ^2 |{\mathcal {E}}^1(v)|_A^2 + \tau \sum _{2\le k\le N} |{\mathcal {E}}^k(v)|_A^2\Big ), \end{aligned}$$

and similarly for the \(L^2\) error.
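For orientation, if \(E^0=0\) and the truncation errors behave like \(|{\mathcal {E}}^1(v)|_A\le c(\tau +h^2)\) and \(|{\mathcal {E}}^k(v)|_A\le c(\tau ^2+h^2)\) for \(k\ge 2\) (as will be the case for smooth solutions, see Sect. 6.1), then, using \(\tau N\le T\) and \(\tau ,h\le 1\),

$$\begin{aligned} \max _{2\le k\le N}|E^k|_A^2 \ \le \ C \Big (\tau ^2 c^2(\tau +h^2)^2 + T c^2 (\tau ^2+h^2)^2\Big ) \ \le \ {\widetilde{C}} (\tau ^2+h^2)^2, \end{aligned}$$

so that \(\max _{2\le k\le N}|E^k|_A \le C(\tau ^2+h^2)\).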

6.1 Proof of Theorem 7

We first prove (i). By Taylor expansion, we can write, for instance, for some \(\theta _1,\theta _2 \in [0,1]\),

$$\begin{aligned} \left| v_t(t,x) - \frac{v(t,x) - v(t-\tau ,x)}{\tau } \right| \ = \ \left| v_t(t,x) - v_t(t-\theta _1 \tau ,x) \right|\le & {} C \tau ^\delta \end{aligned}$$

and

$$\begin{aligned}&\left| v_t(t,x) - \frac{3 v(t,x) - 4 v(t-\tau ,x) + v(t-2\tau ,x)}{2 \tau } \right| \\&\quad \le \ \left| v_t(t,x) - \frac{1}{2} \left( 3 v_t(t-\theta _1 \tau ,x) - v_t(t-(1+\theta _2)\tau ,x) \right) \right| \\&\quad \le \ \left| v_t(t,x) - v_t(t-\theta _1 \tau ,x) \right| + \frac{1}{2} \left| v_t(t-\theta _1 \tau ,x) - v_t(t-(1+\theta _2)\tau ,x) \right| \\&\quad \le \ C \tau ^\delta + \frac{1}{2}C (2\tau )^\delta \le 2 C \tau ^\delta . \end{aligned}$$

Similarly, using the higher spatial regularity, there exists a constant \(C_0\ge 0\) such that

$$\begin{aligned} \left| v_x(t,x) - \frac{3 v(t,x) - 4 v(t,x-h) + v(t,x-2h)}{2 h} \right|\le & {} C_0 C h^{\delta +1}, \\ \left| v_{xx}(t,x) \, - \ \frac{v(t,x+h) - 2 v(t,x) + v(t,x-h)}{h^2} \right|\le & {} C_0 C h^{\delta }. \end{aligned}$$

The result (i) now follows directly by inserting the obtained truncation error into the stability estimate of Theorem 5.

For the proof of (ii) (the smooth case), Taylor expansion up to orders 3 and 4 gives a truncation error of order \(\tau ^2+h^2\) for \(k\ge 2\). For the first time step, we use that the backward Euler step satisfies \((E^1-E^0)/\tau + (\Delta ^1A + F^1 B +R^1) E^1 = - {\mathcal {E}}^1\) with \(\Vert {\mathcal {E}}^1\Vert \le C(\tau + h^2)\) and \(E^0=0\), so that \(\Vert E^1\Vert \le C \tau (\tau + h^2)\); the corresponding bound is otherwise similar to, and simpler than, that for \(k\ge 2\).

6.2 Piecewise smooth solutions

The previous arguments can also be used to derive error estimates for piecewise smooth solutions. In this case, we need to limit the number of non-regular points that may appear in the exact solution (assumption (A6)(i) below is similar to the one in [5]).

Assumption (A6). There exists an integer \(p\ge 1\) and functions \(t\rightarrow (x^*_j(t))_{1\le j\le p}\) for \(t\in [0,T]\), such that, with \(\Omega ^*_T:= (\Omega \times (0,T))\backslash \bigcup _{1\le j\le p} \{(t,x^*_j(t)),\ t\in (0,T)\}\), the following holds:

  1. (i)

    \(v \in C^{3,4}_b(\Omega ^*_T)\);

  2. (ii)

    \(\forall j\), \(t\rightarrow x^*_j(t)\) is Lipschitz regular.

We give the following straightforward preliminary result without proof:

Lemma 18

Assume (A6) and the CFL condition (11). Then for all t

$$\begin{aligned} \hbox {Card} \{ j, \ x \rightarrow v(t,x)\ \hbox {not regular in } [x_{j-2},x_{j+2}]\} \le 5p \end{aligned}$$

and

$$\begin{aligned} \hbox {Card} \{ j, \ \theta \rightarrow v(\theta ,x_j)\ \hbox {not regular in } [t-2\tau ,t] \} \le Cp \end{aligned}$$

for some constant \(C\ge 0\) independent of \(\tau ,h\) (“not regular” meaning not \(C^4\) in the first case and not \(C^3\) in the second one).

Such a situation will be illustrated in the numerical example of Sect. 7.2.

Theorem 19

We assume (A1), (A2), (A3) and the CFL condition (11), and let (A4) hold for some \(\delta \in (0,1]\), together with (A6). Then the numerical solution u of (3), (6) converges to v in the \(L^2\)-norm with

$$\begin{aligned} \max _{2\le k\le N}|v^k-u^k|_0 \le C h^{1/2+\delta }, \end{aligned}$$

where C is a constant independent of h.

Proof

Let \({\mathbb {I}}^k\) be the (finite) set of indices i such that v is not regular in \( \{t_k\} \times (x_i-2h,x_i+2 h) \cup (t_k-2\tau ,t_k) \times \{x_i\}\). Then

$$\begin{aligned} |{\mathcal {E}}^k |_0^2= & {} \sum _{i\in {\mathbb {I}}} |{\mathcal {E}}_i^k|^2 h = \sum _{i\in {\mathbb {I}}^k} |{\mathcal {E}}_i^k|^2 h + \sum _{i\in {\mathbb {I}}\backslash {\mathbb {I}}^k} |{\mathcal {E}}_i^k|^2 h \\\le & {} C |{\mathbb {I}}^k| (\tau ^\delta + h^\delta )^2 h + C (\tau ^2 + h^2)^2. \end{aligned}$$

We then use the fact that \(|{\mathbb {I}}^k|\le C\) for some (different) constant C by Lemma 18 and that \((\tau ^2 + h^2)^2 = O(h^4) = O(h^{2+\delta })\), \(\tau ^\delta +h^\delta = O(h^\delta )\) by the CFL condition (11), in order to obtain the desired result. \(\square \)

Remark 20

  1. (i)

    Similar results can be derived for errors in the A-norm; however, derivatives of one order higher are required due to the derivative appearing in the definition of the norm.

  2. (ii)

    The estimates in Theorem 7 are not always sharp, as symmetries and the smoothing behaviour of the scheme can result in higher order convergence. We discuss such special cases for Examples 1 and 2 in Sect. 7, Remarks 22 and 23, respectively.

  3. (iii)

    These error estimates can be compared with [5], where an error bound of order \(h^{1/2}\) was obtained for diffusion problems with an obstacle term, under the main assumption that \(v_{xx}\) is a.e. bounded with a finite number of singularities (instead of (A4)). In the present context it seems natural to assume the Hölder regularity of \(v_t\) and \(v_{xx}\) coming from the ellipticity assumption (see Remark 6).

7 Numerical tests

We now compare the performance of the BDF2 scheme with other second order finite difference schemes on two examples.

7.1 Test 1: Eikonal equation

The first example is based on a deterministic control problem (\(\sigma \equiv 0\)) and motivates the choice of the BDF2 approximation for the drift term in (5), compared to the more classical centered scheme (8). We consider

$$\begin{aligned} \left\{ \begin{array}{ll} v_t + |v_x| = 0 ,&{} x\in (-2,2), \; t\in (0,T), \\ v(0,x)= v_0(x), &{} x\in (-2,2), \end{array} \right. \end{aligned}$$

with \(v_0(x)=\max (0, 1-x^2)^4\) and \(T=0.2\). The initial datum is shown in Fig. 1 (dashed line). The exact solution is

$$\begin{aligned} v(t,x) = \min (v_0(x-t), v_0(x+t)). \end{aligned}$$
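For illustration, the following is a minimal NumPy sketch of the implicit BDF2 scheme for this test. The grid parameters mirror those of Fig. 1; the first step simply uses the exact solution in place of the backward Euler step (6), and the implicit nonlinear problem at each step is solved by a basic policy (Howard) iteration with a dense linear solve. These are implementation choices made for brevity here, not necessarily those used to produce the figures and tables.

```python
import numpy as np

xmin, xmax, T = -2.0, 2.0, 0.2
I, N = 199, 20                        # interior nodes x_1..x_I and time steps
h, tau = (xmax - xmin) / (I + 1), T / N
x = xmin + h * np.arange(-1, I + 3)   # grid indices -1,...,I+2 (two ghost layers)
v0  = lambda z: np.maximum(0.0, 1.0 - z**2) ** 4
vex = lambda t, z: np.minimum(v0(z - t), v0(z + t))          # exact solution

def bdf_minus(u):   # (3 u_i - 4 u_{i-1} + u_{i-2}) / (2h) at the interior nodes
    return (3*u[2:-2] - 4*u[1:-3] + u[:-4]) / (2*h)

def bdf_plus(u):    # (3 u_i - 4 u_{i+1} + u_{i+2}) / (2h) at the interior nodes
    return (3*u[2:-2] - 4*u[3:-1] + u[4:]) / (2*h)

def bdf2_step(u1, u2, tk):
    """One implicit BDF2 step to time tk, given u^{k-1} = u1 and u^{k-2} = u2:
    solve (3u - 4u1 + u2)/(2 tau) + max(bdf_minus(u), bdf_plus(u)) = 0
    by policy iteration, with ghost values taken from the exact solution."""
    rhs0 = (4*u1[2:-2] - u2[2:-2]) / (2*tau)
    u = u1.copy()
    u[[0, 1, -2, -1]] = vex(tk, x[[0, 1, -2, -1]])
    for _ in range(50):
        left = bdf_minus(u) >= bdf_plus(u)            # current optimal upwind direction
        A, b = np.zeros((I, I)), rhs0.copy()
        for j in range(I):                            # row for interior node x_{j+1}
            A[j, j] = 3/(2*tau) + 3/(2*h)
            offsets = (-1, -2) if left[j] else (1, 2)
            for o, c in zip(offsets, (-4/(2*h), 1/(2*h))):
                if 0 <= j + o < I:
                    A[j, j + o] += c
                else:                                 # neighbour is a known ghost value
                    b[j] -= c * u[2 + j + o]
        unew = u.copy()
        unew[2:-2] = np.linalg.solve(A, b)
        if np.max(np.abs(unew - u)) < 1e-12:
            return unew
        u = unew
    return u

u_km2 = vex(0.0, x)     # u^0: exact initial data
u_km1 = vex(tau, x)     # u^1: exact values used instead of the Euler step, for brevity
for k in range(2, N + 1):
    u_km2, u_km1 = u_km1, bdf2_step(u_km1, u_km2, k * tau)

print("L^inf error at T:", np.max(np.abs(u_km1[2:-2] - vex(T, x[2:-2]))))
```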

Remark 21

The Eikonal equation can be written in HJB form as \(v_t + \max _{a\in \{-1,1\}} (a v_x) = 0\). Note, however, that our theoretical analysis does not cover this example, since in the degenerate case assumption (A5) is required, which is not satisfied here.

Fig. 1

Test 1: Initial data (dashed line) and numerical solution at time \(T=0.2\) computed for \(I+1=200\) and \(N=20\) (\(\tau /h=0.5\)) using BDF in time and centered approximation of the drift (left), BDF in time and space (right)

In Fig. 1, we show the results obtained at the terminal time \(T=0.2\) using scheme (3) with (8) (left) and scheme (3) with (5) (right), with \(\tau /h=0.5\). We observe numerically that the centered approximation generates undesirable oscillations, whereas the BDF2 scheme preserves the total variation.

As stated in Theorem 3, in the case of a degenerate diffusion, a CFL condition of the form \(\tau \le Ch\) has to be satisfied for well-posedness of the BDF2 scheme. Table 2 shows numerical convergence of order 2 in both time and space, although the solution is globally only Lipschitz.

Table 2 Test 1. Error and convergence rate to the exact solution for the BDF2 scheme with \(\tau /h=0.1\) and initial data \(v_0(x)=\max (0, 1-x^2)^4\)

Remark 22

The full convergence order here is due to the particular symmetry of the solution. To confirm this, we report in Table 3 the results obtained for the same equation with initial data

$$\begin{aligned} v(0,x)=-\max (0, 1-x^2)^4 \end{aligned}$$

(see also Fig. 2). In this case, there is no such symmetry around the two singular points and as a result the full convergence order is lost numerically: the scheme is globally only of order 1 in the \(H^1\) norm and roughly 1.5 in the \(L^2\) and \(L^\infty \) norms.

Table 3 Test 1. Error and convergence rate to the exact solution for the BDF2 scheme with \(\tau /h=0.1\) and initial data \(v_0(x)=-\max (0, 1-x^2)^4\)

7.2 Test 2: A simple controlled diffusion model equation

The second test we propose is a problem with controlled diffusion. We consider

$$\begin{aligned} \left\{ \begin{array}{ll} v_t + \sup _{\sigma \in \{\sigma _1,\sigma _2\}}\Big (-\frac{1}{2}\sigma ^2v_{xx}\Big ) = 0, &{} x\in (-1,1), t\in (0,T), \\ v(0,x)= \sin (\pi x), &{} x\in (-1,1), \end{array} \right. \end{aligned}$$

with parameters \(\sigma _{1}=0.1\), \(\sigma _{2}=0.5\), \(T=0.5\).

Fig. 2

Test 1: Initial data (dashed line) \( v_0(x)=-\max (0, 1-x^2)^4 \) and numerical solution at time \(T=0.2\) computed for \(I+1=200\) and \(N=20\) (\(\tau /h=0.5\)) using the BDF2 scheme. The convergence rates for this example are reported in Table 3

In spite of the apparent simplicity of the equation under consideration, an example of non-convergence of the Crank–Nicolson scheme for a similar optimal control problem is given in [19]. The BDF2 scheme, in contrast, has shown good performance for that same problem in [6].

Figure 3 (top row) shows the initial data and the value function at terminal time computed using the BDF2 scheme. The error and convergence rate in different norms are reported in Table 4. Here an accurate numerical solution computed by an implicit Euler scheme (which is monotone and hence guaranteed to converge) is used for comparison.

Fig. 3

Test 2: Initial data (top, left), numerical solution at time \(T=0.5\) (top, right) computed by the BDF2 scheme, second order derivative computed with CN scheme (bottom, left) and BDF2 (bottom, right) for \(N=256\) and \(I+1=5120\)

Table 4 Test 2. Error and convergence rate for the BDF2 scheme with high CFL number \(\tau = 5 h\). A reference solution computed by the implicit Euler scheme (6) with \(I+1=20\times 2^{9}, N = 2^{22}\) is used
Table 5 Test 2. Error and convergence rate for the CN scheme with high CFL number \(\tau = 5 h\). A reference solution computed by the implicit Euler scheme (6) with \(I+1=20\times 2^{9}, N = 2^{22}\) is used

Taking \(\tau \sim h\), the BDF2 scheme gives clear second order convergence, as seen in Table 4; this is not the case for the CN scheme, as shown in Table 5. The CN scheme also exhibits some instability in the second order derivative for a large CFL number \(\tau /h\), see Fig. 3 (this is analogous to the findings in [19]). One can verify that for a small CFL number, i.e. \(\tau \sim h^2\), the CN scheme shows second order convergence.
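Observed convergence orders such as those reported in the tables are typically computed as \(\log _2\) of the ratio of errors on consecutive meshes, here assumed to be refined by a factor of two; a small helper for producing such rate columns from a list of measured errors (the function name is our own):

```python
import numpy as np

def observed_orders(errors):
    """Observed convergence orders log2(e_j / e_{j+1}) for errors e_j measured
    on a sequence of meshes with h (and tau, at fixed tau/h) halved each time."""
    e = np.asarray(errors, dtype=float)
    return np.log2(e[:-1] / e[1:])
```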

Remark 23

In this example, due to the strict ellipticity, Assumption (A4) is guaranteed for some \(\delta >0\) (see Remark 6). Theorem 7 then gives convergence of order \(\delta \). Furthermore, Fig. 3, bottom row, suggests Hölder continuity of \(u_{xx}\) in x, which is expected by virtue of the control being piecewise constant. Therefore, we conjecture that Assumption (A6) is satisfied, so that Theorem 19 would give the higher order \(1/2+\delta \). In the test, in fact, the full order 2 is observed (see Table 4).

8 Conclusions

We have proved the well-posedness and stability in the \(L^2\) and \(H^1\) norms of a second order BDF scheme for HJB equations with sufficiently regular coefficients. The significance of the results is that this was achieved for a second order, and hence non-monotone, scheme.

One can use the recursion we derived to bound the error of the numerical solution in terms of the truncation error of the scheme. The latter depends on the regularity of the solution and has to be estimated for individual examples. A full analysis was carried out for the semi-linear, uniformly parabolic case.

The numerical tests demonstrate convergence at least as good as predicted by the theoretical results, and often better, due to symmetries of the solution or smoothing properties of the equation and the scheme. This is in contrast to some alternative second order schemes, such as the centered spatial difference in the case of a first order equation, or the Crank–Nicolson time stepping scheme for a second order equation, which can show poor or no convergence.