1 Introduction

Lévy processes have recently become a widely used tool in several scientific disciplines. A non-exhaustive list includes physics, in the study of turbulence and quantum field theory; economics, for continuous-time series models; insurance mathematics, for the computation of insurance risk; and mathematical finance, for pricing path-dependent options. An early application of Lévy processes to the modelling of financial instruments dates back to Madan and Seneta (1990), where a variance gamma process is used to model market returns.

A typical computational problem in mathematical finance is the computation of the quantity \(\mathbb {E}\left[ f(Y_t)\right] \), where \(Y_t\) is the time t solution of a stochastic differential equation driven by a Lévy process and \(f\in \mathcal {B}_b(\mathbb {R}^d)\), a bounded Borel measurable function on \(\mathbb {R}^d\); for instance, f can be a payoff function. Typically one uses the Black–Scholes model, in which the underlying price process is lognormal. However, the asset price often exhibits large jumps over the time horizon. The inconsistency of the Black–Scholes assumptions with market data has led to the development of more realistic models in the literature. General Lévy processes offer a promising alternative for describing the observed behaviour of financial market data, as compared to models based on standard Brownian motion.

In the application of standard and multilevel particle filter methods to SDEs driven by general Lévy processes, in addition to pricing path-dependent options, we will consider filtering of partially observed Lévy processes with discrete-time observations. In the latter context, we will assume that the partially observed data are regularly spaced observations \(z_1,\dots ,z_n\), where \(z_k\in \mathbb {R}^d\) is a realization of \(Z_k\) and the density of \(Z_k\) given \(Y_{k\tau }=y_{k\tau }\) is known, where \(\tau \) is the time scale. Real S&P 500 stock price data will be used to illustrate our proposed methods as well as the standard particle filter. We will show how both of these problems can be formulated as general Feynman–Kac type problems (Moral 2004), with time-dependent potential functions modifying the Lévy path measure.

The multilevel Monte Carlo (MLMC) methodology was introduced in Heinrich (2001) and first applied to the simulation of SDEs driven by Brownian motion in Giles (2008). Recently, Dereich and Heidenreich (2011) provided a detailed analysis of the application of MLMC to Lévy-driven SDEs. This first work was extended in Dereich (2011) to a method with a Gaussian correction term, which can substantially improve the rate for pure jump processes (Asmussen and Rosiński 2001). The authors in Ferreiro-Castilla et al. (2014) use the MLMC method for general Lévy processes based on the Wiener–Hopf decomposition. We extend the methodology described in Dereich and Heidenreich (2011) to a particle filtering framework. This is challenging for the following reasons. First, one must choose a suitable weighting function to prevent the weights in the particle filter from being zero (or infinite). Second, one must control the jump part of the underlying Lévy process so that the path of the filter does not blow up as the time parameter increases. In pricing path-dependent options, for example knock out barrier options, we adopt the same strategy described in Jasra and Moral (2011) and Jasra and Doucet (2009) for the computation of expectations of functionals of the SDE driven by general Lévy processes.

The rest of the paper is organised as follows. In Sect. 2, we briefly review the construction of general Lévy processes, the numerical approximation of Lévy-driven SDEs, the MLMC method, and finally the construction of a coupled kernel for Lévy-driven SDEs which allows MLMC to be used. Section 3 introduces both the standard and multilevel particle filter methods and their application to Lévy-driven SDEs. Section 4 features numerical examples of pricing barrier options and filtering of partially observed Lévy processes. The computational savings of the multilevel particle filter over the standard particle filter are illustrated in this section.

2 Approximating SDE driven by Lévy processes

In this section, we briefly describe the construction and approximation of a general \(d^{\prime }\)-dimensional Lévy process \(\{X_t\}_{t\in [0,K]}\), and the solution \(Y:=\{Y_t\}_{t\in [0,K]}\) of a d-dimensional SDE driven by X. Consider a stochastic differential equation given by

$$\begin{aligned} \mathrm {d}Y_t&=a(Y_{t^{-}})\mathrm {d}X_t,\quad \mathrm {y}_0\in \mathbb {R}^d, \end{aligned}$$
(1)

where \(a:\mathbb {R}^d\rightarrow \mathbb {R}^{d\times d^{\prime }}\), and the initial value is \(\mathrm {y}_0\) (assumed known). In particular, in the present work we are interested in computing the expectation of bounded and measurable functions \(f:\mathbb {R}^d\rightarrow \mathbb {R}\), that is \(\mathbb {E}[f(Y_t)]\).

2.1 Lévy processes

For a general and detailed description of Lévy processes and the analysis of SDEs driven by them, we refer the reader to the monographs of Bertoin (1996), Sato (1999), Applebaum (2004) and Protter (2004). Lévy processes are stochastic processes with stationary and independent increments, which start almost surely from the origin and are stochastically continuous. Two fundamental tools available to study the richness of the class of Lévy processes are the Lévy–Khintchine formula and the Lévy–Itô decomposition. They characterize, respectively, the distributional properties and the structure of the sample paths of a Lévy process. Important examples of Lévy processes include Poisson processes, compound Poisson processes and Brownian motions.

There is a strong interplay between Lévy processes and infinitely divisible distributions: for any \(t>0\), the distribution of \(X_t\) is infinitely divisible. Conversely, if F is an infinitely divisible distribution then there exists a Lévy process X such that the distribution of \(X_1\) is given by F. This is a consequence of the Lévy–Khintchine formula, which we describe below. Let X be a Lévy process with triplet \(\left( \nu ,\Sigma ,b\right) \), \(b\in \mathbb {R}^{d'}, 0\le \Sigma =\Sigma ^T \in \mathbb {R}^{d'\times d'}\), where \(\nu \) is a measure satisfying \(\nu (\{0\})=0\) and \(\int _{\mathbb {R}^{d'}}(1\wedge |x|^2)\nu (\mathrm {d}x)<\infty \), such that

$$\begin{aligned} \mathbb {E}[e^{i\langle u, X_t\rangle }]&=\int _{\mathbb {R}^{d'}}e^{i\langle u, x\rangle }\pi (\mathrm {d}x)=e^{t\psi (u)} \end{aligned}$$

with \(\pi \) the probability law of \(X_t\), where

$$\begin{aligned} \psi (u)&=i\langle u, b\rangle -\frac{\langle u, \Sigma u \rangle }{2}+\int _{\mathbb {R}^{d'}\backslash \{0\}} \left( e^{i\langle u , x\rangle }-1- i\langle u , x\rangle \mathbb {1}_{\{|x|<1\}} \right) \nu (dx),\quad u\in \mathbb {R}^{d'}. \end{aligned}$$
(2)

The measure \(\nu \) is called the Lévy measure of X, and the triplet of Lévy characteristics \(\left( \nu ,\Sigma ,b\right) \) is simply called the Lévy triplet. Note that in general the Lévy measure \(\nu \) can be finite or infinite. If \(\nu ({\mathbb {R}}^{d'})<\infty \), then almost all paths of the Lévy process have a finite number of jumps on every compact interval, and the jump part can be represented as a compensated compound Poisson process. On the other hand, if \(\nu ({\mathbb {R}}^{d'})=\infty \), then the process has an infinite number of jumps on every compact interval almost surely. Even in this case, the third term in the integrand ensures that the integral is finite, and hence so is the characteristic exponent.

2.2 Simulation of Lévy processes

The law of the increments of many Lévy processes is not known explicitly. This makes it more difficult to simulate a path of a general Lévy process than, for instance, a standard Brownian motion. For the few Lévy processes whose distribution is known explicitly, Cont and Tankov (2004) and Schoutens (2003) provide methods for exact simulation, which are applicable in financial modelling. For our purposes, the simulation of an approximate path of a general Lévy process will be based on the Lévy–Itô decomposition, and we briefly describe the construction below. An alternative construction, based on the Wiener–Hopf decomposition, is used in Ferreiro-Castilla et al. (2014).

The Lévy–Itô decomposition reveals much about the structure of the paths of a Lévy process. We can split the Lévy exponent, or the characteristic exponent of \(X_t\) in (2), into three parts

$$\begin{aligned} \psi&=\psi ^{1}+\psi ^2+\psi ^3 \, . \end{aligned}$$

where

$$\begin{aligned} \psi ^1(u)&=i\langle u , b\rangle ,\quad \psi ^2(u)=-\frac{\langle u, \Sigma u \rangle }{2}, \\ \psi ^3(u)&=\int _{\mathbb {R}^{d'}\backslash \{0\}}\left( e^{i\langle u , x\rangle }-1- { i\langle u , x\rangle \mathbb {1}_{\{|x|<1\}}} \right) \nu (dx),\quad u\in \mathbb {R}^{d'} \end{aligned}$$

The first term corresponds to a deterministic drift process with parameter b, the second term to a Wiener process with covariance \(\Sigma \), and the last part corresponds to a Lévy process which is a square integrable martingale. This term may either be a compensated compound Poisson process or the limit of such processes, and it is the hardest to handle when it arises from such a limit.

Thus, any Lévy process can be decomposed into three independent Lévy processes thanks to the Lévy–Itô decomposition theorem. In particular, let \(\{L_t\}_{t\in [0,K]}\) denote a process such that the characteristic exponent of \(L_t\) is \(t\psi ^3(u)\), and let \(\{W_t\}_{t\in [0,K]}\) denote a Wiener process independent of the process \(\{L_t\}_{t\in [0,K]}\). A Lévy process \(\{X_t\}_{t\in [0,K]}\) can be constructed as follows

$$\begin{aligned} X_t&=\sqrt{\Sigma } W_t+L_t+bt \, , \end{aligned}$$
(3)

where \(\sqrt{\Sigma }\) denotes the symmetric square-root of \(\Sigma \). The Lévy–Itô decomposition guarantees that every square integrable Lévy process has a representation of the form (3). We will assume that one cannot sample from the law of \(X_t\), and hence of \(Y_t\), and so we must numerically approximate the process with finite resolution. Such numerical methods have been studied extensively, for example in Jacod et al. (2005) and Rubenthaler (2003).

Let \(|\cdot |\) denote the standard Euclidean \(\ell _2\) norm for vectors, and the induced operator norm for matrices. It will be assumed that the Lévy process X in (2), and the Lévy-driven process Y in (1), satisfy the following conditions.

Assumption 2.1

There exists a \(C>0\) such that

  1. (i)

    \(|a(y) - a(y')| \le C |y-y'|\), and \(|a(y)| \le C\) for all \(y\in {\mathbb {R}}^d\) ;

  2. (ii)

    \(0 < \int _{{\mathbb {R}}^{d}} |x|^2 \nu (dx) \le C^2\) ;

Item (i) provides continuity of the forward map, while (ii) controls the variance of the jumps. These assumptions are the same as in Dereich and Heidenreich (2011), with the exception of the second part of (i), which was not required there.

2.3 Numerical approximation of a Lévy process and Lévy-driven SDE

Recall (1) and (3). Consider the evolution of the discretized Lévy process and the Lévy-driven SDE over the time interval [0, K].

In order to describe the Euler discretization of the two processes for a given accuracy parameter \(h_l\), we need some definitions. The meaning of the subscript l will become clear in the next section. Let \(\delta _l>0\) denote a jump threshold parameter, in the sense that jumps which are smaller than \(\delta _l\) will be ignored. Let \(B_{\delta _l}=\{x\in \mathbb {R}^{d'}:|x|<\delta _l\}\). Define \(\lambda _l=\nu (B_{\delta _l}^{c})<\infty \), that is, the mass of the Lévy measure outside the ball of radius \(\delta _l\). We assume that the jump component of the process is nontrivial, so that \(\nu (B_1)=\infty \), which ensures that \(\lambda _l > 0\) for \(\delta _l\) sufficiently small. Note that \(\lambda _l\) will increase as \(\delta _l\) decreases and, assuming \(\delta _\infty =0\), then \(\lambda _\infty = \infty \). Therefore, the following procedure is natural: first \(h_l\) is chosen, and then the parameter \(\delta _l\) is chosen such that the step-size of the time-stepping method is \(h_l=1/\lambda _l\). The waiting times between jumps larger than \(\delta _l\) are exponentially distributed with parameter \(\lambda _l\), so that the number of jumps larger than \(\delta _l\) before time t is a Poisson process \(N^l(t)\) with intensity \(\lambda _l\). The jump times will be denoted by \({\tilde{T}}_j^l\). The jump heights \(\varDelta L_{{\tilde{T}}_j}^l\) are distributed according to

$$\begin{aligned} \mu ^l(\mathrm {d}x)&:=\frac{1}{\lambda _l}\mathbb {1}_{B_{\delta _l}^{c}}(x)\nu (\mathrm {d}x). \end{aligned}$$

Define

$$\begin{aligned} F_0^l=\int _{B_{\delta _{l}}^{c}}x\nu (\mathrm {d}x). \end{aligned}$$
(4)

The expected sum of all jumps larger than \(\delta _l\) up to time t is \(F_0^l t\), and the compensated compound Poisson process \(L^{\delta _l}\) defined by

$$\begin{aligned} L_t^{\delta _l}=\sum _{j=1}^{N^l(t)} \varDelta L_{{\tilde{T}}_j}^l - F_0^l t \end{aligned}$$

is an \(L^2\) martingale which converges in \(L^2\) to the Lévy process L as \(\delta _l \rightarrow 0\) (Applebaum 2004; Dereich and Heidenreich 2011).
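To make the construction concrete, the following Python sketch (a minimal, one-dimensional illustration with hypothetical function names, not part of the original construction) simulates the jump times and heights of the jumps larger than \(\delta _l\), from which \(L_t^{\delta _l}\) is obtained by compensating with \(F_0^l\).

```python
import numpy as np

def simulate_truncated_jumps(t, lam_l, sample_jump, rng):
    """Jump times and heights of the jumps larger than delta_l on [0, t].

    lam_l       : nu(B_{delta_l}^c), the intensity of jumps larger than delta_l
    sample_jump : callable rng -> one draw from mu^l (a jump height)
    """
    times, heights = [], []
    s = 0.0
    while True:
        s += rng.exponential(1.0 / lam_l)   # waiting times between jumps are Exp(lambda_l)
        if s > t:
            break
        times.append(s)
        heights.append(sample_jump(rng))
    return np.asarray(times), np.asarray(heights)

# The compensated process L^{delta_l} evaluated just after the j-th jump is then
#   heights[:j + 1].sum() - F0_l * times[j],
# which is the L^2 martingale described above.
```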

The Euler discretization of the Lévy process and the Lévy-driven SDE is given by Algorithm 1. Appropriate refinement of the original jump times \(\{{\tilde{T}}^l_j\}\) to new times \(\{T_{j}^{l}\}\) is necessary to control the discretization error arising from the Brownian motion component, the original drift process, and the drift component of the compound Poisson process. Note that \(\varDelta L_{{T}^l_j}^{l}\) is non-zero only when \(T_{j}^{l}\) coincides with \({\tilde{T}}_{m}^{l}\) for some m, as a consequence of the construction presented above.

Algorithm 1 (Euler discretization of the Lévy process and the Lévy-driven SDE; it defines the increments \((\varDelta X)^l\) in (5))

The numerical approximation of the Lévy process described in Algorithm 1 gives rise to an approximation of the Lévy-driven SDE as follows. Given \(Y^l_{T^l_{0}}\), for \(m=0,\dots , k_l-1\)

$$\begin{aligned} Y^l_{T^l_{m+1}} =Y^l_{T^l_{m}}+a(Y^l_{T^l_{m}})(\varDelta X)^l_{T^l_{m+1}}, \end{aligned}$$
(6)

where \((\varDelta X)^l_{T^l_{m+1}} = X^l_{T^l_{m+1}}-X^l_{T^l_{m}}\) is given by (5). In particular the recursion in (6) gives rise to a transition kernel, denoted by \(Q^l(u,dy)\), between observation times \(t\in \{0,1,\dots ,K\}\). This kernel is the conditional distribution of \(Y^l_{T^l_{k_l}}\) given initial condition \(Y^l_{T^l_0}=u\). Observe that the initial condition for X is irrelevant for simulation of Y, since only the increments \((\varDelta X)^l_{T^l_{m+1}}\) are required, which are simulated independently by adding a realization of \(N\big ((b-F_0^l)(T^l_{m+1}-T^l_{m}), (T^l_{m+1}-T^l_{m})\Sigma \big )\) to \(\varDelta L_{T_{m+1}^l}^l\).
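The recursion (6) can be written as a short routine. The sketch below (again one-dimensional, with hypothetical argument names) refines the jump grid so that no Euler step exceeds \(h_l\), draws the Gaussian/drift part of each increment \((\varDelta X)^l\) as described above, and adds a jump only at the original jump times; combined with the jump simulation above it yields one (approximate) draw from \(Q^l(y_0,\cdot )\) when \(K=1\).

```python
import numpy as np

def euler_levy_sde(a, y0, K, h_l, jump_times, jump_heights,
                   b, Sigma_sqrt, F0_l, rng):
    """Single-level Euler approximation (6) of dY_t = a(Y_{t-}) dX_t on [0, K]."""
    times = list(jump_times) + [K]         # the terminal time is always a grid point
    heights = list(jump_heights) + [0.0]   # no jump at the terminal time
    y, t = y0, 0.0
    for T, dL in zip(times, heights):
        while t < T:
            dt = min(h_l, T - t)           # refine so that every step is at most h_l long
            last = (dt == T - t)           # this sub-step ends exactly at the jump time T
            # Gaussian plus compensated-drift part of the increment (Delta X)^l
            dX = (b - F0_l) * dt + Sigma_sqrt * np.sqrt(dt) * rng.standard_normal()
            if last:
                dX += dL                   # jump contribution, non-zero only at jump times
            y = y + a(y) * dX
            t = T if last else t + dt
    return y
```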

Remark 2.1

The numerical approximation of the Lévy process and hence Lévy-driven SDE (1) in Algorithm 1 is the single-level version of a more general coupled discretization (Dereich and Heidenreich 2011) which will be described shortly in Sect. 2.5. This procedure will be used to obtain samples for the plain particle filter algorithm.

2.4 Multilevel Monte Carlo method

Suppose one aims to approximate the expectation of functions of the solution of the Lévy-driven SDE in (1) at time 1, that is \(\mathbb {E}[f(Y_1)]\), where \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) is a bounded and measurable function. Typically, one is interested in the expectation w.r.t. the law of the exact solution of SDE (1), but this is not available in practice. Suppose that the law associated with (1) with no discretization is \(\pi _1\). Since we cannot sample from \(\pi _1\), we use a biased version \(\pi ^L_1\) associated with a given level of discretization of SDE (1) at time 1. Given \(L\ge 1\), define \(\pi _{1}^{L}(f):=\mathbb {E}[f(Y^L_{1})]\), the expectation with respect to the law associated with the Euler discretization (5) at level L. The standard Monte Carlo (MC) approximation at time 1 consists of obtaining i.i.d. samples \(\Big (Y_{1}^{L,(i)}\Big )_{i=1}^{N_L}\) from the law \(\pi _1^L\) and approximating \(\pi _{1}^{L}(f)\) by the empirical average

$$\begin{aligned} {\pi }_1^{L,N_L}(f)&:=\frac{1}{N_L}\sum _{i=1}^{N_L}f(Y_1^{L,(i)}). \end{aligned}$$

The mean square error of the estimator is

$$\begin{aligned} e\left( {\pi }_1^{L,N_L}(f)\right) ^2&:=\mathbb {E}\left[ \left( {\pi }_1^{L,N_L}(f)-\pi _{1}(f)\right) ^2\right] . \end{aligned}$$

Since the MC estimator \({\pi }_1^{L,N_L}(f)\) is an unbiased estimator for \(\pi _{1}^{L}(f)\), this can further be decomposed into

$$\begin{aligned} e\left( {\pi }_1^{L,N_L}(f)\right) ^2=\underbrace{N_L^{-1}\mathbb {V}[f(Y_1^L)]}_{\hbox {variance}} +(\underbrace{{\pi }_1^L(f)-\pi _{1}(f)}_{\hbox {bias}})^2. \end{aligned}$$
(7)

The first term in the right hand side of the decomposition is the variance of MC simulation and the second term is the bias arising from discretization. If we want (7) to be \(\mathcal {O}(\epsilon ^2)\), then it is clearly necessary to choose \(N_L\propto \epsilon ^{-2}\), and then the total cost is \(N_L\times \mathrm{Cost}(Y_1^{L,(i)}) \propto \epsilon ^{-2 - 1}\), where it is assumed that \(\mathrm{Cost}(Y_1^{L,(i)}) \propto \epsilon ^{-1}\) is the cost to ensure the bias is \(\mathcal {O}(\epsilon )\).

In the multilevel Monte Carlo (MLMC) setting, one observes that the expectation under the finest approximation, \({\pi }_1^L(f)\), can be written as a telescoping sum starting from the coarsest approximation \({\pi }_{1}^{0}(f)\) and passing through the intermediate ones:

$$\begin{aligned} {\pi }_{1}^L(f)&:={\pi }_{1}^{0}(f)+\sum _{l=1}^{L}\left( {\pi }_{1}^{l}(f)-{\pi }_{1}^{l-1}(f)\right) . \end{aligned}$$
(8)

Our hope is that the variance of the increments decays with l, which is reasonable in the present scenario where they are finite-resolution approximations of a limiting process. The idea of the MLMC method is to approximate the multilevel (ML) identity (8) by independently computing each of the expectations in the telescoping sum with a standard MC method. This is possible by obtaining, for each l, i.i.d. pairs of samples \(\Big (Y_{1}^{l,(i)},Y_{1}^{l-1,(i)}\Big )_{i=1}^{N_l}\) from a suitably coupled joint measure \(\bar{\pi }_1^l\) with the appropriate marginals \(\pi _1^l\) and \(\pi _1^{l-1}\), for example generated from a coupled simulation of the Euler discretization of SDE (1) at successive refinements. The construction of such a coupled kernel is detailed in Sect. 2.5. Suppose it is possible to obtain such coupled samples at time 1. Then for \(l=0,\dots ,L\), one has independent MC estimates. Let

$$\begin{aligned}&{\pi }^{{N}_{0:L}}_{1}(f):=\frac{1}{N_0}\sum _{i=1}^{N_0}f\left( Y_1^{0,(i)}\right) \nonumber \\&\quad +\sum _{l=1}^{L}\frac{1}{N_l}\sum _{i=1}^{N_l}\left( f(Y_1^{l,(i)})-f(Y_1^{l-1,(i)})\right) , \end{aligned}$$
(9)

where \({N}_{0:L}:=\left\{ N_l\right\} _{l=0}^{L}\). Analogously to the single level Monte Carlo method, the mean square error for the multilevel estimator (9) can be expanded to obtain

$$\begin{aligned}&e\left( {\pi }_{1}^{{N}_{0:L}}(f)\right) ^2 \nonumber \\&\quad :=\underbrace{\sum _{l=0}^{L}N_l^{-1}\mathbb {V}[f(Y_1^{l})-f(Y_1^{l-1})]}_{\hbox {variance}}+(\underbrace{{\pi }_1^L(f)-\pi _{1}(f)}_{\hbox {bias}})^2, \end{aligned}$$
(10)

with the convention that \(f(Y_1^{-1})\equiv 0\). Observe that the bias term remains the same; that is, we have not introduced any additional bias. However, by an optimal choice of \(N_{0:L}\), one can reduce the computational cost for any pre-selected tolerance on the variance of the estimator, or conversely reduce the variance of the estimator for a given computational effort.
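Assembling the estimator (9) is then straightforward once per-level samplers are available. The sketch below assumes a sampler for the coarsest level and a coupled sampler returning pairs with the correct marginals (such as the one constructed in Sect. 2.5); the function names are placeholders.

```python
import numpy as np

def mlmc_estimate(sample_level0, sample_coupled, N, f, rng):
    """Multilevel Monte Carlo estimate (9) of pi_1^L(f).

    sample_level0  : callable rng -> one draw Y_1^0 from pi_1^0
    sample_coupled : callable (l, rng) -> one coupled pair (Y_1^l, Y_1^{l-1})
    N              : replication numbers [N_0, ..., N_L]
    """
    # level-0 term of the telescoping sum (8)
    est = np.mean([f(sample_level0(rng)) for _ in range(N[0])])
    # increment terms, each estimated independently from N_l coupled pairs
    for l in range(1, len(N)):
        pairs = [sample_coupled(l, rng) for _ in range(N[l])]
        est += np.mean([f(y_fine) - f(y_coarse) for y_fine, y_coarse in pairs])
    return est
```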

In particular, for a given user-specified error tolerance \(\epsilon \), measured in root mean square error, the highest level L and the replication numbers \(N_{0:L}\) are derived as follows. We make the following assumptions about the bias, variance and computational cost, based on the observation that the bias and variance decay exponentially as the level l increases.

Suppose that there exist some constants \(\alpha ,\beta \) and an accuracy parameter \(h_l\) associated with the discretization of SDE (1) at level l such that

\(\left( B_l\right) \) :

\(|\mathbb {E}[f(Y_1^{l})-f(Y_1^{l-1})]|=\mathcal {O}(h_{l}^{\alpha })\),

\(\left( V_l\right) \) :

\(\mathbb {E}[|f(Y_1^{l})-f(Y_1^{l-1})|^2]=\mathcal {O}(h_{l}^{\beta })\),

\(\left( C_l\right) \) :

\(\hbox {cost}\left( Y_1^{l},Y_1^{l-1}\right) \propto h_{l}^{-1}\),

where \(\alpha ,\beta \) are related to the particular choice of discretization method and cost is the computational effort to obtain one sample \(\left( Y^l,Y^{l-1}\right) \). For example, the Euler–Maruyama discretization for the solution of SDEs driven by Brownian motion, under Assumption 2.1 on the coefficient of the SDE and for Lipschitz f, gives rates \(\alpha =\beta =1\) (see Kloeden and Platen 1992, Theorem 10.2.2). The accuracy parameter \(h_l\) typically takes the form \(h_l=S_{0}^{-l}\) for some integer \(S_{0}\in \mathbb {N}\). Such estimates can be obtained for Lévy-driven SDEs, and this point will be revisited in detail below. For the time being we take this as an assumption.

The key observation from the mean-square error of the multilevel estimator (9)–(10) is that the bias is given by the finest level, while the variance is decomposed into a sum of variances of the lth increments. Thus the total variance is of the form \(\mathcal {V}=\sum _{l=0}^{L}V_lN_l^{-1}\) and by condition \(\left( V_l\right) \) above, the variance of the lth increment is of the form \(V_lN_l^{-1}\). The total computational cost takes the form \(\mathcal {C}=\sum _{l=0}^{L}C_lN_l\). In order to minimize the effort to obtain a given mean square error (MSE), one must balance the terms in (10). Based on the condition \(\left( B_l\right) \) above, a bias error proportional to \(\epsilon \) will require the highest level

$$\begin{aligned} L&\propto \frac{-\log (\epsilon )}{\log (S_{0})\alpha }. \end{aligned}$$
(11)

In order to obtain optimal allocation of resources \(N_{0:L}\), one needs to solve a constrained optimization problem: minimize the total cost \(\mathcal {C}=\sum _{l=0}^{L}C_lN_l\) for a given fixed total variance \(\mathcal {V}=\sum _{l=0}^{L}V_lN_l^{-1}\) or vice versa. Based on the conditions \(\left( V_l\right) \) and \(\left( C_l\right) \) above, one obtains via the Lagrange multiplier method the optimal allocation \(N_l\propto V_l^{1/2}C_l^{-1/2} \propto h_{l}^{(\beta +1)/2}\).

Now, targeting an error of size \(\mathcal {O}(\epsilon )\), one sets \(N_l\propto \epsilon ^{-2}h_{l}^{(\beta +1)/2}K(\epsilon )\), where \(K(\epsilon )\) is chosen to control the total error for increasing L. Thus, for the multilevel estimator we obtain:

$$\begin{aligned}&\hbox {variance}:\mathcal {V}=\sum _{l=0}^{L}V_lN_l^{-1}=\epsilon ^{2}K(\epsilon )^{-1}\sum _{l=0}^{L}h_{l}^{(\beta -1)/2}\\&\hbox {cost}:\,\,\mathcal {C}=\sum _{l=0}^{L}C_lN_l=\epsilon ^{-2}K(\epsilon )^{2}. \end{aligned}$$

One then sets \(K(\epsilon )=\sum _{l=0}^{L}h_{l}^{(\beta -1)/2}\) in order to have a variance of \(\mathcal {O}(\epsilon ^2)\). We can identify three distinct cases (a small numerical sketch of the resulting allocation follows the list):

  1. (i)

If \(\beta =1\), which corresponds to the Euler–Maruyama scheme, then \(K(\epsilon )=L\). One can see from the expression in (11) that \(L=\mathcal {O}(|\log (\epsilon )|)\). Then the total cost is \(\mathcal {O}(\epsilon ^{-2}\log (\epsilon )^2)\), compared with \(\mathcal {O}(\epsilon ^{-3})\) for the single-level method.

  2. (ii)

If \(\beta > 1\), which corresponds to the Milstein scheme, then \(K(\epsilon )\equiv 1\), and hence the optimal computational cost is \(\mathcal {O}(\epsilon ^{-2})\).

  3. (iii)

    If \(\beta < 1\), which is the worst case scenario, then it is sufficient to choose \(K(\epsilon )=K_{L}(\epsilon )= h_{L}^{(\beta -1)/2}\). In this scenario, one can easily deduce that the total cost is \(\mathcal {O}(\epsilon ^{-(1/\alpha +\kappa )})\), where \(\kappa =2-\beta /\alpha \), using the fact that \(h_L\propto \epsilon ^{1/\alpha }\).
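As a small numerical illustration of the allocation just derived, the sketch below computes L from (11) and sets \(N_l\propto \epsilon ^{-2}h_{l}^{(\beta +1)/2}K(\epsilon )\) with \(K(\epsilon )=\sum _{l=0}^{L}h_{l}^{(\beta -1)/2}\). The proportionality constant is a tuning parameter which, in practice, would be calibrated from pilot estimates of the level variances.

```python
import numpy as np

def mlmc_allocation(eps, alpha, beta, S0=2, const=1.0):
    """Highest level L and replication numbers N_0, ..., N_L for a target
    RMSE eps under (B_l), (V_l), (C_l), with h_l = S0**(-l)."""
    L = int(np.ceil(-np.log(eps) / (np.log(S0) * alpha)))    # bias ~ eps, cf. (11)
    h = S0 ** (-np.arange(L + 1, dtype=float))
    K_eps = np.sum(h ** ((beta - 1.0) / 2.0))                 # covers all three cases above
    N = np.ceil(const * eps ** (-2) * h ** ((beta + 1.0) / 2.0) * K_eps).astype(int)
    return L, N

# For example, eps = 0.01 with the Euler-Maruyama rates alpha = beta = 1 gives
# L = 7 (with S0 = 2) and N_l decreasing geometrically in l.
```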

One of the defining features of the multilevel method is that the realizations \((Y_1^l,Y_1^{l-1})\) for a given increment must be sufficiently coupled in order to obtain decaying variances \((V_l)\). It is clear how to accomplish this in the context of stochastic differential equations driven by Brownian motion, as introduced in Giles (2008) [see also Jasra et al. (2017)], where coarse increments are obtained by summing the fine increments, but it is non-trivial how to proceed in the context of SDEs driven purely by general Lévy processes. A technique based on Poisson thinning has been suggested by Giles and Xia (2012) for pure-jump diffusions and by Ferreiro-Castilla et al. (2014) for general Lévy processes. In the next section, we explain an alternative construction of a coupled kernel based on the Lévy–Itô decomposition, in the same spirit as Dereich and Heidenreich (2011).

2.5 Coupled sampling for Lévy-driven SDEs

The ML methodology described in Sect. 2.4 works by obtaining samples from a coupled kernel associated with the discretization of (1). We now describe how one can construct such a kernel for the discretization of the Lévy-driven SDE. Let \((y,y')\in {\mathbb {R}}^{2d}\). For \(l\ge 1\), let \({\check{Q}}^{l,l-1}((y,y'),\cdot )\) be a coupling of the kernels \(Q^l(y,\cdot )\) and \(Q^{l-1}(y',\cdot )\). For \(\varphi \in \mathcal {B}_b({\mathbb {R}}^{2d})\) and \((y,y')\in \mathbb {R}^{2d}\) we use the notation:

$$\begin{aligned}&{\check{Q}}^{l,l-1}(\varphi )(y,y') \\&\quad := \int _{\mathbb {R}^{2d}} \varphi (y^l,y^{l-1}){\check{Q}}^{l,l-1}((y,y'),d(y^l,y^{l-1})). \end{aligned}$$

Coupling means that for \(\varphi \in \mathcal {B}_b({\mathbb {R}}^d)\)

$$\begin{aligned}&{\check{Q}}^{l,l-1}(\varphi \otimes 1)(y,y') = Q^l(\varphi )(y), \\&{\check{Q}}^{l,l-1}(1 \otimes \varphi )(y,y') = Q^{l-1}(\varphi )(y') \end{aligned}$$

where \(\otimes \) denotes the tensor product of functions, e.g. \(\varphi \otimes 1\) denotes \(\varphi (y^l)\) in the integrand associated to \({\check{Q}}^{l,l-1}((y,y'),d(y^l,y^{l-1}))\).

The coupled kernel \({\check{Q}}^{l,l-1}\) can be constructed using the following strategy. Using the same definitions as in Sect. 2.3, let \(\delta _l\) and \(\delta _{l-1}\) be user specified jump-thresholds for the fine and coarse approximation, respectively. Define

$$\begin{aligned} F_0^l=\int _{B_{\delta _{l}}^{c}}x\nu (\mathrm {d}x) \quad \mathrm{and} \quad F_0^{l-1}=\int _{B_{\delta _{l-1}}^{c}}x\nu (\mathrm {d}x). \end{aligned}$$
(12)

The objective is to generate a coupled pair \((Y_1^{l,l},Y_1^{l,l-1})\) given \((y_0^{l},y_0^{l-1})\), \(h_l,h_{l-1}\) with \(h_l<h_{l-1}\). The parameter \(\delta _\ell (h_\ell )\) will be chosen such that \(h_\ell ^{-1}=\nu (B_{\delta _\ell }^c)\), and these determine the value of \(F_0^\ell \) in (12), for \(\ell \in \{l,l-1\}\). We now describe the construction of the coupled kernel \({\check{Q}}^{l,l-1}\) and thus obtain the coupled pair in Algorithm 2, which is the same as the one presented in Dereich and Heidenreich (2011).

Algorithm 2 (Coupled Euler discretization of the Lévy-driven SDE at levels l and l − 1; it defines the increments in (13)–(14))
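A sketch of one draw of the coupled pair, in the spirit of Dereich and Heidenreich (2011), is given below: both levels share the jumps larger than \(\delta _l\) and the same Brownian increments, while the coarse level discards the jumps smaller than \(\delta _{l-1}\), takes Euler steps of (roughly) at most \(h_{l-1}\), and uses its own compensator \(F_0^{l-1}\). The routine is one-dimensional and its handling of the time grid is a simplification of Algorithm 2, not a transcription of it.

```python
import numpy as np

def coupled_euler_pair(a, y0, K, h_l, h_lm1, delta_lm1, jump_times, jump_heights,
                       b, Sigma_sqrt, F0_l, F0_lm1, rng):
    """One draw of a coupled pair (Y_K^l, Y_K^{l-1}) from a coupling of Q^l and Q^{l-1}.

    jump_times / jump_heights : jumps larger than delta_l (the fine level's jumps)
    """
    times = list(jump_times) + [K]
    heights = list(jump_heights) + [0.0]
    y_f = y_c = y0
    t = 0.0
    acc_dt, acc_dW = 0.0, 0.0                         # coarse step accumulated so far
    for T, dL in zip(times, heights):
        while t < T:
            dt = min(h_l, T - t)
            at_jump = (dt == T - t)
            dW = np.sqrt(dt) * rng.standard_normal()  # shared Brownian increment
            # fine level: advance at every fine sub-step
            dX_f = (b - F0_l) * dt + Sigma_sqrt * dW
            if at_jump:
                dX_f += dL
            y_f = y_f + a(y_f) * dX_f
            # coarse level: accumulate time and noise, advance only at its own grid points
            acc_dt += dt
            acc_dW += dW
            keep_jump = at_jump and abs(dL) >= delta_lm1
            if keep_jump or acc_dt >= h_lm1 or (at_jump and T == K):
                dX_c = (b - F0_lm1) * acc_dt + Sigma_sqrt * acc_dW
                if keep_jump:
                    dX_c += dL
                y_c = y_c + a(y_c) * dX_c
                acc_dt, acc_dW = 0.0, 0.0
            t = T if at_jump else t + dt
    return y_f, y_c
```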

3 Multilevel particle filter for Lévy-driven SDEs

In this section, the multilevel particle filter will be discussed for sampling from certain types of measures which have a density with respect to a Lévy process. We will begin by briefly reviewing the general framework and standard particle filter, and then we will extend these ideas into the multilevel particle filtering framework.

3.1 Filtering and normalizing constant estimation for Lévy-driven SDEs

Recall the Lévy-driven SDE (1). We will use the following notation here: \(y_{1:n}=[y_1, y_2,\dots ,y_n]\). It will be assumed that the general probability measure of interest is of the form

$$\begin{aligned} {\hat{\eta }}^{\infty }_{n}(dy_{1:n}) \propto \Big [\prod _{i=1}^{n}G_i(y_{i})Q^{\infty }(y_{i-1}, dy_i)\Big ], \end{aligned}$$
(15)

for \(n\ge 1\) and for some given \(y_0\). Here \(Q^{\infty }(y_{i-1},dy)\) is the transition kernel of the process (1), i.e. the law of the solution one unit of time later, given the initial condition \(Y_0=y_{i-1}\). It is assumed that observations \(Z_i\) arrive regularly at times \(1,2,\dots \). The conditional density of an observation \(Z_i\), given \(y_i\), is known and denoted \(g(z_i | y_i)\). Assume the observations are fixed. Since this density will only ever be considered as a function of \(y_i\), following standard practice we introduce the shorthand notation \(G_i(y_{i})=g(z_i | y_i)\), where the subscript i encapsulates the dependence on \(z_i\). Note that the formulation discussed here, that is for \({\hat{\eta }}^{\infty }_{n}\), also allows one to consider general Feynman–Kac models (of the form (15)), rather than just the filters that are focused upon in this section. The following assumptions will be made on the likelihood functions \(\{G_i\}\); they are sufficient for the mathematical results in this paper.

Assumption 3.1

There exist \(c>1\) and \(C>0\) such that, for all \(n>0\) and \(v, v' \in {\mathbb {R}}^{d}\), \(G_n\) satisfies

(i):

\(c^{-1}< G_n(v) < c\);

(ii):

\(|G_n(v) - G_n(v')| \le C |v - v'|\).

In practice, as discussed earlier, \(Q^{\infty }\) is typically analytically intractable, in the sense that we cannot sample from it exactly, nor do we know how to evaluate a non-negative and unbiased estimate of it, or even of an unnormalized version of it. As a result, we will focus upon targets associated with a discretization, i.e. of the type

$$\begin{aligned} {\hat{\eta }}^{l}_{n}(dy_{1:n}) \propto \Big [\prod _{i=1}^{n}G_i(y_{i})Q^{l}(y_{i-1}, dy_i)\Big ], \end{aligned}$$
(16)

for \(l<\infty \), where \(Q^l\) is defined by \(k_l\) iterates of the recursion in (6). Note that we will use \({\hat{\eta }}^{l}_{n}\), \(l = 0,1,\dots \), to denote both the measure and its density, with the usage clear from the context.

The objective is to compute the expectation of functionals with respect to this measure, particularly at the last coordinate. For any bounded and measurable function \(f:\mathbb {R}^d \rightarrow \mathbb {R}\), \(n\ge 1\), we will use the notation

$$\begin{aligned} {\hat{\eta }}_{n}^{l}(f)&:=\int _{{\mathbb {R}}^{dn}}f(y_{n}){\hat{\eta }}^{l}_{n}(\mathrm {d}y_{1:n}). \end{aligned}$$
(17)

Often of interest is the computation of the unnormalized measure. That is, for any bounded and measurable function \(f:\mathbb {R}^d \rightarrow \mathbb {R}\) define, for \(n\ge 1\)

$$\begin{aligned} {\hat{\zeta }}^{l}_{n}(f) := \int _{\mathbb {R}^{dn}}f(y_n) \Big [\prod _{i=1}^{n}G_i(y_{i})Q^{l}(y_{i-1}, dy_i)\Big ]\, . \end{aligned}$$
(18)

In the context of the model under study, \({\hat{\zeta }}^{l}_{n}(1)\) is the marginal likelihood.

Henceforth \(Y^l_{1:n}\) will be used to denote a draw from \({\hat{\eta }}^{l}_{n}\). The vanilla case described earlier can be viewed as the special case in which \(G_i\equiv 1\) for all i. Following standard practice, realizations of random variables will be denoted by lower-case letters. The randomness of the samples will be recalled for the MSE calculations, which are over potential realizations.

3.2 Particle filtering

We will now describe the particle filter, which is capable of consistently estimating terms of the form (17) and (18), for any fixed l. Consistent means that the estimates are asymptotically exact in the limit of infinitely many particles. The particle filter has been studied and used extensively (see for example Moral 2004; Doucet et al. 2001) in many practical applications of interest.

For a given level l, Algorithm 3 gives the standard particle filter. The weights are defined, for \(k\ge 1\), as

$$\begin{aligned} w^{l,(i)}_{k}&= w^{l,(i)}_{k-1} \frac{G_k\left( y^{l,(i)}_{k}\right) }{\sum _{j=1}^{N_l}w_{k-1}^{l,(j)} G_k\left( y^{l,(j)}_{k}\right) } \end{aligned}$$
(19)

with the convention that \(w^{l,(i)}_{0}=1\). Note that the abbreviation ESS stands for effective sample size, which measures the variability of the weights at time k of the algorithm. In the analysis to follow, \(H=1\) in Algorithm 3 (or rather its extension in the next section), but this is not the case in our numerical implementations.

Moral (2004) (along with many other authors) has shown that for upper-bounded and non-negative \(G_i\), suitable assumptions on the dynamics (for example Assumption 2.1 is sufficient), and \(f:\mathbb {R}^d\rightarrow \mathbb {R}\) bounded and measurable (these conditions can be relaxed), at step 3 of Algorithm 3 the estimate

$$\begin{aligned} \sum _{i=1}^{N_l} w^{l,(i)}_{n}f\left( y^{l,(i)}_{n}\right) \end{aligned}$$

will converge almost surely to (17). In addition, if \(H=1\) in Algorithm 3,

$$\begin{aligned} \Big [\prod _{i=1}^{n-1}\frac{1}{N_l}\sum _{j=1}^{N_l}G_i(y^{l,(j)}_{i})\Big ]\frac{1}{N_l}\sum _{j=1}^{N_l}G_n\left( y^{l,(j)}_{n}\right) f\left( y^{l,(j)}_{n}\right) \end{aligned}$$

will converge almost surely to (18).

Algorithm 3 (The particle filter at level l)
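For concreteness, a minimal sketch of such a particle filter is given below, with the weight recursion (19) and resampling triggered when the effective sample size falls below a user-chosen fraction of \(N_l\); the function names, the use of multinomial resampling and the assumption that each \(G_k\) acts on a whole vector of particles are our own illustrative choices.

```python
import numpy as np

def particle_filter(sample_Q_l, G, n, N_l, y0, rng, ess_frac=0.5):
    """Sketch of a standard particle filter at level l.

    sample_Q_l : callable (y, rng) -> one draw from Q^l(y, .)
    G          : list of potentials [G_1, ..., G_n], each acting on an array of particles
    Returns the weighted particle approximation (y, w) at the final time n.
    """
    y = np.full(N_l, float(y0))
    w = np.full(N_l, 1.0 / N_l)
    for k in range(1, n + 1):
        # propagate every particle through the discretized dynamics
        y = np.array([sample_Q_l(y_i, rng) for y_i in y])
        # weight update, cf. (19) (self-normalized)
        w = w * G[k - 1](y)
        w = w / np.sum(w)
        # adaptive resampling when the effective sample size is small
        if 1.0 / np.sum(w ** 2) < ess_frac * N_l:
            idx = rng.choice(N_l, size=N_l, p=w)
            y, w = y[idx], np.full(N_l, 1.0 / N_l)
    return y, w

# The estimate of (17) at time n is then np.sum(w * f(y)).
```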

3.3 Multilevel particle filter

We now describe the multilevel particle filter of Jasra et al. (2017) for the context considered here. The basic idea is to run \(L+1\) independent algorithms: the first is a particle filter as in the previous section, and the remaining L are coupled particle filters. The particle filter will sequentially (in time) approximate \({\hat{\eta }}_k^0\), and the coupled filters will sequentially approximate the couples \(({\hat{\eta }}^0_k,{\hat{\eta }}^1_k),\dots ,({\hat{\eta }}^{L-1}_k,{\hat{\eta }}^{L}_k)\). Each (coupled) particle filter will be run with \(N_l\) particles.

The most important step in the MLPF is the coupled resampling step, which maximizes the probability that the resampled indices are the same at the fine and coarse levels. Denote the fine and coarse particles at level \(l\ge 1\) and step \(k\ge 1\) as \(\Big (Y_{k}^{l,(i)}(l),Y_{k}^{l-1,(i)}(l)\Big )\), for \(i=1,\dots ,N_l\). Equation (19) is replaced by the following, for \(k\ge 1\)

$$\begin{aligned} w^{l,(i)}_{k}(l)&= w^{l,(i)}_{k-1}(l) \frac{G_k(y^{l,(i)}_{k}(l))}{\sum _{j=1}^{N_l}w_{k-1}^{l,(j)}(l) G_k(y^{l,(j)}_{k}(l))} \end{aligned}$$
(20)
$$\begin{aligned} w^{l-1,(i)}_{k}(l)&= w^{l-1,(i)}_{k-1}(l) \frac{G_k(y^{l-1,(i)}_{k}(l))}{\sum _{j=1}^{N_l}w_{k-1}^{l-1,(j)}(l) G_k(y^{l-1,(j)}_{k}(l))} \end{aligned}$$
(21)

with the convention that \(w^{l,(i)}_{0}(l)=w^{l-1,(i)}_{0}(l)=1\).

(Coupled resampling procedure for the multilevel particle filter)
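The idea of this step can be sketched as a maximal coupling of the two resampling distributions: with probability \(\sum _i \min (w^{l,(i)}_{k}(l), w^{l-1,(i)}_{k}(l))\) the same index is used at both levels, and otherwise the two indices are drawn independently from the residual distributions. The routine below is an illustration of this idea and may differ in detail from the procedure displayed above.

```python
import numpy as np

def coupled_resample(w_fine, w_coarse, N, rng):
    """Maximally coupled multinomial resampling: index arrays (idx_fine, idx_coarse)
    of length N, identical with the largest possible probability given the two
    normalized weight vectors."""
    overlap = np.minimum(w_fine, w_coarse)
    p_same = overlap.sum()                            # probability of a common index
    if p_same < 1.0:
        res_f = (w_fine - overlap) / (1.0 - p_same)   # residual distributions
        res_c = (w_coarse - overlap) / (1.0 - p_same)
    idx_f = np.empty(N, dtype=int)
    idx_c = np.empty(N, dtype=int)
    for i in range(N):
        if rng.uniform() < p_same:
            j = rng.choice(len(w_fine), p=overlap / p_same)
            idx_f[i] = idx_c[i] = j                   # fine and coarse particles stay paired
        else:
            idx_f[i] = rng.choice(len(w_fine), p=res_f)
            idx_c[i] = rng.choice(len(w_coarse), p=res_c)
    return idx_f, idx_c
```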

We set \(H=1\) (as in Algorithm 3), but it need not be the case. Recall that the case \(l=0\) is just a particle filter. For each \(1\le l \le L\) the following procedure is run independently.

(Coupled particle filter at levels l and l − 1, run independently for each l)

The samples generated by the particle filter for \(l=0\) at time k are denoted \(Y_{k}^{0,(i)}(0)\), \(i\in \{1,\dots ,N_0\}\) (we are assuming \(H=1\)).

To estimate the quantities (17) and (18) (with \(l=L\)), Jasra et al. (2017, 2018) show that, in the case of discretized diffusion processes,

$$\begin{aligned}&{\hat{\eta }}^\mathrm{ML,L}_n(f) \\&\quad = \sum _{l=1}^L \left( \frac{\sum _{i=1}^{N_l}G_n(y_n^{l,(i)}(l))f(y_n^{l,(i)}(l))}{\sum _{i=1}^{N_l}G_n(y_n^{l,(i)}(l))}\right. \\&\left. \quad - \, \frac{\sum _{i=1}^{N_l}G_n(y_n^{l-1,(i)}(l))f(y_n^{l-1,(i)}(l))}{\sum _{i=1}^{N_l}G_n(y_n^{l-1,(i)}(l))} \right) \\&\quad +\,\frac{\sum _{i=1}^{N_0}G_n(y_n^{0,(i)}(0))f(y_n^{0,(i)}(0))}{\sum _{i=1}^{N_0}G_n(y_n^{0,(i)}(0))} \end{aligned}$$

and

$$\begin{aligned}&{\hat{\zeta }}^\mathrm{ML,L}_n(f) =\sum _{l=1}^L \Big ( \Big [\prod _{i=1}^{n-1}\frac{1}{N_l}\sum _{j=1}^{N_l}G_i(y^{l,(j)}_{i}(l))\Big ]\nonumber \\&\quad \frac{1}{N_l}\sum _{j=1}^{N_l}G_n(y^{l,(j)}_{n}(l))f(y^{l,(j)}_{n}(l))\nonumber \\&\qquad -\Big [\prod _{i=1}^{n-1}\frac{1}{N_l}\sum _{j=1}^{N_l}G_i(y^{l-1,(j)}_{i}(l))\Big ]\nonumber \\&\quad \frac{1}{N_l}\sum _{j=1}^{N_l}G_n(y^{l-1,(j)}_{n}(l))f(y^{l-1,(j)}_{n}(l)) \Big ) \nonumber \\&\qquad +\Big [\prod _{i=1}^{n-1}\frac{1}{N_0}\sum _{j=1}^{N_0}G_i(y^{0,(j)}_{i}(0))\Big ]\nonumber \\&\quad \frac{1}{N_0}\sum _{j=1}^{N_0}G_n(y^{0,(j)}_{n}(0))f(y^{0,(j)}_{n}(0)) \end{aligned}$$
(22)

converge almost surely to \({\hat{\eta }}^L_n(f)\) and \({\hat{\zeta }}^{L}_n(f)\) respectively, as \(\min _l\{N_l\} \rightarrow \infty \). Furthermore, both can significantly improve over the particle filter, for L and \(\{N_l\}_{l=0}^L\) appropriately chosen to depend upon a target mean square error (MSE). By improve, we mean that the work required to achieve a given MSE with respect to the continuous-time limit is smaller than for the particle filter, under appropriate assumptions on the diffusion. We show how \(N_0,\dots ,N_L\) can be chosen in Sect. 3.3.1. Note that for positive f the estimator \({\hat{\zeta }}^\mathrm{ML,L}_n(f)\) above can take negative values with positive probability.

We remark that the coupled resampling method can be improved as in Sen et al. (2018). We also remark that the approaches of Houssineau et al. (2018) and Jacob et al. (2016) could potentially be used here. However, none of these articles has sufficient supporting theory to verify a reduction in cost of the ML procedure.

3.3.1 Theoretical result

We conclude this section with a technical theorem. We consider only \({\hat{\eta }}^\mathrm{ML,L}_n(f)\), but this can be extended to \({\hat{\zeta }}^\mathrm{ML,L}_n(f)\), similarly to Jasra et al. (2018). The proofs are given in the Appendix.

Define \(\mathcal {B}_b(\mathbb {R}^d)\) as the bounded, measurable and real-valued functions on \(\mathbb {R}^d\) and \(\text {Lip}(\mathbb {R}^d)\) as the globally Lipschitz real-valued functions on \(\mathbb {R}^d\). Define the space \(\mathcal {A}=\mathcal {B}_b(\mathbb {R}^d)\cap \text {Lip}(\mathbb {R}^d)\) with the norm \(\Vert \varphi \Vert = \sup _{x\in \mathbb {R}^d} |\varphi (x)| + \sup _{x,y \in \mathbb {R}^d} \frac{|\varphi (x) - \varphi (y)|}{|x-y|}\).

The following assumptions will be required.

Assumption 3.2

There exist \(C,\beta _1>0\) such that for all \(h_l>0\) there exists a \(\delta _l(h_l)>0\) with \(\delta _l(h_l) \le C h_l^{\beta _1}\) and \(h_l = 1/\nu (B_{\delta _l(h_l)}^c)\).

Denote by \({\check{Q}}^{l,l-1}((y,y'),\cdot )\) the coupling of the Markov transitions \(Q^l(y,\cdot )\) and \(Q^{l-1}(y',\cdot )\), \((y,y')\in \mathbb {R}^{2d}\) as in Algorithm 2.

Assumption 3.3

\(\mathbb {E}[\mathrm{COST}({\check{Q}}^{l,l-1})] = \mathcal {O}(h_l^{-1})\), where \(\mathbb {E}[\mathrm{COST}({\check{Q}}^{l,l-1})]\) is the expected cost to simulate one sample from the kernel \({\check{Q}}^{l,l-1}\).

Below \(\mathbb {E}\) denotes expectation w.r.t. the law of the particle system.

Theorem 1

Assume Assumptions 2.1, 3.1, 3.2 and 3.3. Then for any \(n\ge 0\) and \(f \in \mathcal {A}\), there exists a \(C>0\) such that, for \(\varepsilon >0\) given and \(L>0\), \(\{N_l\}_{l=0}^L\) depending only upon \(\varepsilon \), the following estimate holds

$$\begin{aligned} \mathbb {E}\Bigg [ \Bigg ({\hat{\eta }}^\mathrm{ML,L}_n(f) - {\hat{\eta }}_n^\infty (f) \Bigg )^2 \Bigg ] \le C \varepsilon ^2, \end{aligned}$$

for the cost \(\mathcal {C}(\varepsilon ) := \mathbb {E}[\mathrm{COST}(\varepsilon )]\) given in the second column of Table 1.

Table 1 The three cases of the MLPF and the associated cost \(\mathcal {C}(\varepsilon )\); \(\beta \) is as in Lemma A3

Proof

The proof is essentially identical to Jasra et al. (2017, Theorem 4.3). The only difference is to establish analogous results to Jasra et al. (2017, Appendix D); this is done in the appendix of this article. \(\square \)

4 Numerical examples

In this section, we compare our proposed multilevel particle filter with the vanilla particle filter. A target accuracy parameter \(\epsilon \) will be specified and the cost to achieve an error below this target accuracy will be estimated. The performance of the two algorithms will be compared in two applications of SDEs driven by general Lévy processes: filtering of a partially observed Lévy process (S&P 500 stock price data) and pricing of a path-dependent option. In each of these two applications, we let \(X=\{X_t\}_{t\in [0,K]}\) denote a symmetric stable Lévy process, i.e. X is a \(\left( \nu ,\Sigma , b \right) \)-Lévy process, and the Lebesgue density of the Lévy measure is given by

$$\begin{aligned} \nu (\mathrm {d}x)&=c|x|^{-1-\phi }\mathbb {1}_{[-x^*,0)}(x)\mathrm {d}x+ c|x|^{-1-\phi }\mathbb {1}_{(0,x^*]}(x)\mathrm {d}x,\nonumber \\&\quad x\in \mathbb {R}\setminus \{0\}, \end{aligned}$$
(23)

with \(c>0\), \(x^*\ge 1\) (the truncation threshold) and index \(\phi \in (0,2)\). The parameters c and \(x^*\) are both set to 1 in all the examples considered. Notice that Assumption 2.1(ii) is satisfied. The Lévy-driven SDE considered here has the form

$$\begin{aligned} \mathrm {d}Y_t=a(Y_{t^{-}})\mathrm {d}X_t,\quad Y_{0}=y_{0}, \end{aligned}$$
(24)

with \(y_0\) assumed known. In the examples illustrated below, we take \(a(Y_t)=Y_t, y_0=1\), and \(\phi =0.5\). The first condition of Assumption 2.1(i) is satisfied for this choice of a, but the second condition is not, without modifying a outside of a compact set.

Remark 1

(Symmetric stable Lévy process of index \(\phi \in (0,2)\))

In approximating the Lévy-driven SDE (24), Theorem 2 of Dereich and Heidenreich (2011) provides asymptotic error bounds for the strong approximation by the Euler scheme. If the driving Lévy process \(X_t\) has no Brownian component, that is \(\Sigma =0\), then the \(L^2\)-error, denoted \(\sigma _{h_l}^2\), is bounded by

$$\begin{aligned} \sigma _{h_l}^2&\le C(\sigma ^2(\delta _l)+{ |b-F_0^l|^2}h_l^2) \, , \end{aligned}$$

and for \(\Sigma > 0\),

$$\begin{aligned} \sigma _{h_l}^2&\le C(\sigma ^2(\delta _l)+h_l|\log (h_l)|) \, , \end{aligned}$$

for a fixed constant \(C<\infty \) (that is, the Lipschitz constant), where \(\sigma ^2(\delta _l) := \int _{B_{\delta _l}} |x|^2 \nu (dx)\). Recall that \(\delta _l(h_l)\) is chosen such that \(h_l=1/{\nu (B_{\delta _l}^{c})}\). Observe that for \(\delta _l \ge x^*\) one has \(\nu (B_{\delta _l}^{c})=0\) and \(h_l=\infty \). As mentioned earlier, we are concerned with asymptotically small values of \(h_l\) and \(\delta _l\) here, and so we let \(\delta _l<x^*\), without any loss of generality. Then one obtains the analytical expression

$$\begin{aligned} \sigma ^2(\delta _l)&= \frac{2c}{2-\phi }\delta _l(h_l)^{2-\phi }\le C\delta _l^{2-\phi } \, , \end{aligned}$$
(25)

for some constant \(C>0\).

One can also analytically compute

$$\begin{aligned} \nu (B_{\delta _l}^{c})&=\frac{2c(\delta _l^{-\phi }-{x^*}^{-\phi })}{\phi } \, . \end{aligned}$$

Now one can solve \(\nu (B_{\delta _l}^{c})=1/h_l\), and obtain

$$\begin{aligned} \delta _l&=\Big (\frac{2ch_l}{\phi + 2c{x^*}^{-\phi } h_l} \Big )^{1/\phi } \, . \end{aligned}$$
(26)

Then, one finds that

$$\begin{aligned} \delta _l=\mathcal {O}(h_{l}^{1/\phi })\, , \end{aligned}$$

hence verifying Assumption 3.2 for this example. Using (25)–(26) and the error bounds for \(\Sigma =0\), one can straightforwardly obtain strong error rates for the approximation of an SDE driven by a stable Lévy process in terms of the single accuracy parameter \(h_l\). This is given by

$$\begin{aligned} \sigma _{h_l}^2&\le C (h_l^{(2-\phi )/\phi }+|b-F_0^l|^2h_l^{2}) \, . \end{aligned}$$

Thus, if \(b-F_0^l \ne 0\), the strong error rate \(\beta \) of Lemma A3 associated with a particular discretization level \(h_l\), is given by

$$\begin{aligned} \beta&= \min \Big (\frac{2-\phi }{\phi },2\Big ) \, . \end{aligned}$$
(27)

Otherwise it is just given by \((2-\phi )/\phi \).
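For the parameter values used in the examples, these quantities are easy to evaluate numerically. The following sketch, with the illustrative choice \(h_l=2^{-l}\), computes \(\delta _l\) by solving \(\nu (B_{\delta _l}^{c})=1/h_l\) and records the resulting strong rate.

```python
import numpy as np

# Truncated symmetric stable-like measure (23) with the example parameters
c, x_star, phi = 1.0, 1.0, 0.5

def delta_from_h(h_l):
    """Jump threshold delta_l solving nu(B_{delta_l}^c) = 1/h_l, cf. (26)."""
    return (2.0 * c * h_l / (phi + 2.0 * c * x_star ** (-phi) * h_l)) ** (1.0 / phi)

for l in range(1, 6):
    h_l = 2.0 ** (-l)                                  # illustrative accuracy parameter
    d_l = delta_from_h(h_l)
    lam_l = 2.0 * c * (d_l ** (-phi) - x_star ** (-phi)) / phi
    print(l, h_l, d_l, lam_l)                          # lam_l equals 1 / h_l by construction

beta = (2.0 - phi) / phi   # strong rate (27) when b - F0^l = 0; equals 3 for phi = 0.5
```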

In the examples considered below, the original Lévy process has no drift and no Brownian motion component, that is \(\Sigma =b=0\). Due to the linear drift correction \(F_0^l\) in the compensated compound Poisson process, the random jump times are refined so that the time differences between successive jumps are bounded by the accuracy parameter \(h_l\) associated with the Euler discretization methods in (5) and (13)–(14). However, since \(F_0^l=0\) here, due to symmetry, this does not affect the rate, as described in Remark 1.

We start with verification of the weak and strong error convergence rates, \(\alpha \) and \(\beta \), for the forward model. To this end, the quantities \(|\mathbb {E}[Y^{l}_1-Y^{l-1}_1]|\) and \(\mathbb {E}[|Y^{l}_1-Y^{l-1}_1|^2]\) are computed over increasing levels l. Figure 1 shows these computed values plotted against \(h_l\) on base-2 logarithmic scales. A fit of a linear model gives the rate \(\alpha =1.3797\), and a similar simulation experiment gives \(\beta =2.7377\). This is consistent with the rate \(\beta =3\), and \(\alpha =\beta /2\), from Remark 1, Eq. (27).
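Such a rate check is straightforward given a coupled sampler: simulate coupled pairs at several levels and regress the base-2 logarithms of the mean difference and of the mean squared difference on \(\log _2 h_l\). A possible sketch, assuming \(h_l=2^{-l}\) and a user-supplied coupled sampler such as the one sketched in Sect. 2.5, is the following.

```python
import numpy as np

def estimate_rates(sample_coupled, levels, N, rng):
    """Empirical weak and strong rates alpha and beta from regressing
    log2 |E[Y_1^l - Y_1^{l-1}]| and log2 E[|Y_1^l - Y_1^{l-1}|^2] on log2 h_l."""
    log_h, log_weak, log_strong = [], [], []
    for l in levels:
        d = np.array([np.subtract(*sample_coupled(l, rng)) for _ in range(N)])
        log_h.append(-float(l))                        # log2 h_l when h_l = 2**(-l)
        log_weak.append(np.log2(abs(d.mean())))
        log_strong.append(np.log2((d ** 2).mean()))
    alpha = np.polyfit(log_h, log_weak, 1)[0]          # slope with respect to log2 h_l
    beta = np.polyfit(log_h, log_strong, 1)[0]
    return alpha, beta
```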

Fig. 1 Empirical weak and strong error rate estimates

We begin our comparison of the MLPF and PF algorithms starting with the filtering of a partially observed Lévy-driven SDE and then consider the knock out barrier call option pricing problem.

4.1 Partially observed data

In this section we consider filtering a partially observed Lévy process. Recall that the Lévy-driven SDE takes the form (24). In addition, partial observations \(\{z_1,\dots ,z_n\}\) are available, with \(Z_k\) obtained at time k and \(Z_k|(Y_{k}=y_k)\) having density \(G_k(y_{k})\) (the observation is omitted from the notation and appears only through the subscript k). The observation density is Gaussian with mean \(y_k\) and variance 1. We aim to estimate \(\mathbb {E}[f(Y_{k})|z_{1:k}]\) for some test function f(y). In this application, we consider real daily S&P 500 \(\log \) return data (from August 3, 2011 to July 24, 2015, normalized to unit variance). We shall take the test function \(f(y)=e^{y}\) for the example considered below, which we note does not satisfy the assumptions of Theorem 1, and hence challenges the theory. In fact, the results are roughly equivalent to those for \(f(y)=e^{y}\mathbb {I}_{\{|y|<10\}}\), where \(\mathbb {I}_A\) is the indicator function of the set A, which was also considered and does satisfy the required assumptions.

Fig. 2 Mean square error against computational cost for filtering with partially observed data

The error-versus-cost plots on base-10 logarithmic scales for the PF and MLPF are shown in Fig. 2. The fitted linear model of \(\log \) MSE against \(\log \) cost has slopes of − 0.667 and − 0.859 for the PF and MLPF, respectively. These results numerically verify the expected theoretical asymptotic behaviour of the computational cost as a function of MSE, for both the standard and ML methods.

4.2 Barrier option

Here we consider computing the value of a discretely monitored knock out barrier option (see e.g. Glasserman (2004) and the references therein). Let \(Y_0\in [a,b]\) for some known \(0<a<b<+\infty \), and let \(Q^{\infty }(y_{i-1},dy)\) be the transition kernel of the process as in (24). Then the value of the barrier option (up to a known constant) is

$$\begin{aligned} \int _{{\mathbb {R}}^{n}} \max \{y_n-S,0\}\prod _{i=1}^n \mathbb {I}_{[a,b]}(y_i)Q^{\infty }(y_{i-1},dy_i) \, , \end{aligned}$$

for \(S>0\) given. As seen in Jasra and Moral (2011) the calculation of the barrier option is non-trivial, in the sense that even importance sampling may not work well. We consider the (time) discretized version

$$\begin{aligned} \int _{{\mathbb {R}}^{n}} \max \{y_n-S,0\}\prod _{i=1}^n \mathbb {I}_{[a,b]}(y_i)Q^{l}(y_{i-1},dy_i) \, . \end{aligned}$$

Define a sequence of probabilities, \(k\in \{1,\dots ,n\}\),

$$\begin{aligned}&{\hat{\eta }}_k^l(dy_{1:k}) \propto {\tilde{G}}_k(y_k)\prod _{i=1}^k \mathbb {I}_{[a,b]}(y_i)Q^{l}(y_{i-1},dy_i) \nonumber \\&\quad = \prod _{i=1}^k \left( \frac{{\tilde{G}}_i(y_i)}{{\tilde{G}}_{i-1}(y_{i-1})} \right) \mathbb {I}_{[a,b]}(y_i) Q^{l}(y_{i-1},dy_i) \end{aligned}$$
(28)

for some non-negative collection of functions \({\tilde{G}}_k(y_k)\), \(k\in \{1,\dots ,n\}\), to be specified. Recall that \({\hat{\zeta }}_n^l\) denotes the unnormalized measure associated with \({\hat{\eta }}_n^l\). Then the value of the time-discretized barrier option is exactly

$$\begin{aligned} {\hat{\zeta }}_{n}^l\Big (\frac{f}{{\tilde{G}}_n}\Big ) = \int _{{\mathbb {R}}^{n}} \max \{y_n-S,0\}\prod _{i=1}^n \mathbb {I}_{[a,b]}(y_i)Q^{l}(y_{i-1},dy_i) \, ,\nonumber \\ \end{aligned}$$
(29)

where \(f(y_n)=\max \{y_n-S,0\}\). Thus, we can apply the MLPF targeting the sequence \(\{{\hat{\eta }}_k^l\}_{ k\in \{1,\dots ,n\},l \in \{0,\dots ,L\} }\) and use our normalizing constant estimator (22) to estimate (29). If \({\tilde{G}}_n=|f|\), then we have an optimal importance distribution, in the sense that we are estimating the integral of the constant function 1 and the variance is minimal (Rubinstein and Kroese 2016). However, noting the form of the effective potential in (28), this choice can result in infinite weights (with adaptive resampling, as done here), and so some regularization is necessary. We bypass this issue by choosing \({\tilde{G}}_k(y_k) = |y_k-S|^{\kappa _k}\), where \(\kappa _k\) is an annealing parameter with \(\kappa _0 = 0\) and \(\kappa _n = 1\). We make no claim that this is the best option, but it guides us towards something reminiscent of the optimal choice, with well-behaved weights in practice. We also tried \(|\max \{y_k-S,\varepsilon \}|^{\kappa _k}\), with \(\varepsilon =0.001\), and the results are almost identical.
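As an illustration of this weighting, the incremental potential appearing in (28) can be coded as follows; the linear schedule \(\kappa _k=k/n\) is a hypothetical choice, as the text above only requires \(\kappa _0=0\) and \(\kappa _n=1\).

```python
import numpy as np

def barrier_increment(S, a_low, b_up, n):
    """Incremental potential (tilde{G}_k / tilde{G}_{k-1}) 1_{[a,b]}(y_k) from (28),
    with tilde{G}_k(y) = |y - S|^{kappa_k} and the hypothetical schedule kappa_k = k/n."""
    def G_k(k, y_k, y_km1):
        kappa_k, kappa_km1 = k / n, (k - 1) / n
        ratio = np.abs(y_k - S) ** kappa_k / np.abs(y_km1 - S) ** kappa_km1
        inside = (a_low <= y_k) & (y_k <= b_up)
        return np.where(inside, ratio, 0.0)
    return G_k

# At the final time, the estimator (22) is applied to f / tilde{G}_n,
# i.e. to max(y_n - S, 0) / |y_n - S|, recovering (29).
```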

For this example we choose \(S=1.25, a=0, b=5, y_0=1, n=100\). The \(N_l\) are chosen as in the previous example. The error-versus-cost plots for the PF and MLPF are shown in Fig. 3. Note that the bullets in the graph correspond to different choices of L (for both PF and MLPF, \(2\le L\le 8\)). The fitted linear model of \(\log \) MSE against \(\log \) cost has slopes of − 0.6667 and − 0.859 for the PF and MLPF, respectively. These numerical results are consistent with the expected theoretical asymptotic behaviour of MSE \(\propto \) Cost\(^{-1}\) for the multilevel method. The single-level particle filter achieves the asymptotic behaviour of the standard Monte Carlo method, with MSE \(\propto \) Cost\(^{-2/3}\).

Fig. 3 Mean square error against computational cost for the knock out barrier option example