Large and moderate deviations for importance sampling in the Heston model

We provide a detailed importance sampling analysis for variance reduction in stochastic volatility models. The optimal change of measure is obtained using a variety of results from large and moderate deviations: small-time, large-time, small-noise. Specialising the results to the Heston model, we derive many closed-form solutions, making the whole approach easy to implement. We support our theoretical results with a detailed numerical analysis of the variance reduction gains.

1. Introduction and general overview

1.1. Introduction. Monte Carlo simulation is the standard (if not the only) technique for most numerical problems in stochastic modelling. It has a long history and has been successfully applied in many fields, such as biology [20], statistical physics [2] and finance [12], among others. The default order of magnitude for the standard error of the estimator is O(N^{−1/2}), with N the number of sample paths. It has long been recognised, though, that several techniques achieve lower variance with equivalent (ideally zero) bias; among these, antithetic variables and importance sampling have become ubiquitous. We focus on the latter, for which large and moderate deviations (LDP and MDP) provide closed-form formulae, making their application painless and without additional computational cost.
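As a quick numerical illustration of the N^{−1/2} rate (not from the paper; the integrand exp(U) with U uniform is an arbitrary toy choice), quadrupling the sample size roughly halves the standard error:

```python
import numpy as np

# Standard error of a plain Monte Carlo estimator decays like N^{-1/2}:
# 16x the number of samples should shrink the error by a factor ~4.
# Toy integrand: G = exp(U) with U uniform on [0,1], so E[G] = e - 1.
rng = np.random.default_rng(0)

def mc_standard_error(n_samples, n_repeats=200):
    """Empirical std dev of the MC mean over independent repetitions."""
    estimates = [np.exp(rng.uniform(size=n_samples)).mean()
                 for _ in range(n_repeats)]
    return np.std(estimates)

se_small = mc_standard_error(1_000)
se_large = mc_standard_error(16_000)   # 16x the samples
print(se_small / se_large)             # roughly sqrt(16) = 4
```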
The first attempt to reduce the variance of a Monte Carlo estimator based on asymptotics probably originated, rather heuristically, in [24]. This was made rigorous later by Glasserman and Wang [13], who also highlighted pitfalls of the method, and later by Dupuis and Wang [10], who provided clear explanations of the trade-off between asymptotic approximations and the restrictions they entail on the induced change of measure. Guasoni and Robertson [15] put this into practice for out-of-the-money path-dependent options in the Black-Scholes model, and Robertson [22] developed a thorough analysis for the Heston model using sample path large deviations. This is our starting point, and the goal of our current enterprise is to analyse different asymptotic regimes (small-time, large-time, small-noise), in both the large deviations and the moderate deviations settings, in the Heston model, and to show how these yield closed-form formulae for an optimal change of measure for importance sampling purposes.
We propose, in particular, a specific form of adaptive drift, allowing for fast computation and increased variance reduction. For geometric Asian Call options in the Heston model, MDP-based estimators with deterministic changes of drift turn out to be no better than those computed with the deterministic volatility approximation in the LDP approach. However, MDP-based estimators with adaptive changes of drift perform much better than their LDP counterparts with deterministic volatility approximation, and in fact show a performance very close to the LDP-based estimators in Heston. These adaptive MDP-based estimators therefore provide an efficient alternative in models where the LDP is difficult to compute.
Setting and notations. Throughout this paper we work on a filtered probability space (Ω, F, P, F) with a finite time horizon T > 0, where Ω = C([0, T] → R²) is the space of all continuous functions, F is the Borel σ-algebra on Ω and F := (F_t)_{t∈[0,T]} is the natural filtration of a given two-dimensional standard Brownian motion W := (W, W^⊥). For a pair of (possibly deterministic) processes (X, Y), where X is predictable and Y a semimartingale, we write the stochastic integral X • Y := ∫₀^• X_s dY_s and X •_t W := (X • W)_t for any t ∈ [0, T]. We denote any d-dimensional path by h := (h¹, …, h^d) for d ∈ N, and for such a path we write ‖h‖²_T := Σ_{i=1}^d ∫₀^T |ḣ^i_t|² dt. We denote by H⁰_T the Cameron-Martin space of Brownian motion, isomorphic to the space of absolutely continuous functions AC([0, T]). We define a similar space H^x_T := {φ ∈ C([0, T] → R^d) : φ_t = x + ∫₀^t ψ_s ds, ψ ∈ L²([0, T]; R^d)} for processes starting at x ∈ R^d, and a subspace H^{x,+}_T ⊂ H^x_T where functions map to R^{+,d} instead of R^d. Whenever a variable has an obvious time dependence, we drop the explicit reference in the notation. We shall also write C_T := C([0, T] → R) to simplify statements. We write {X^ε} ∼ LDP(I_X, C_T) to mean that the sequence {X^ε} satisfies a large deviations principle as ε tends to zero on C_T with good rate function I_X. For a given function f, we denote by D(f) its effective domain. We shall finally denote R⁺ := (0, ∞).
1.2. Overview of the importance sampling methodology. We consider a given risk-neutral probability measure P, so that the fundamental theorem of asset pricing implies that the price of an option with payoff G ∈ L²([0, T]; R) is equal to E_P[G]. While, strictly speaking, we do not need L²([0, T]; R) for pricing purposes, we require it to estimate the variance of payoff estimators. Monte Carlo estimators rely on the (strong) law of large numbers, whereby for iid samples {G_i}_{1≤i≤n} from P ∘ G^{−1}, the empirical mean Ḡ_n := (1/n) Σ_{i=1}^n G_i converges to the true expectation P-almost surely.
Importance sampling is a method to reduce the variance of the estimator Ḡ_n by sampling under a new law (and of course both the pricing equality and the variance comparison remain true with G replaced by Ḡ_n). Let for example Z := dQ/dP denote the Radon-Nikodym derivative of the change of measure, so that E_P[G] = E_Q[G Z^{−1}]. The variance of the Monte Carlo estimator based on iid samples of G Z^{−1} under Q is then E_P[G² Z^{−1}] − E_P[G]²; whenever this is smaller than the variance under P, the variance is thus reduced. Finding such a Z, however, is usually hard, and we shall instead consider the approximation (1.1) for small ε > 0, for two random variables G^ε and Z^ε whose choices will be discussed later. The computation of this expression is then further simplified by the use of Varadhan's lemma (Theorem B), which casts the problem into a deterministic optimisation over the appropriate Cameron-Martin space.
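The variance-reduction mechanism can be seen on a toy rare-event example (an illustration, not the paper's setting): estimating P(X > 3) for a standard Gaussian X, with Q an assumed mean-shifted Gaussian, so that Z^{−1} = dP/dQ is explicit.

```python
import numpy as np

# Importance sampling for E_P[G] with G = 1{X > 3}, X ~ N(0,1) under P.
# Sampling instead from Q = N(3,1) and reweighting by dP/dQ concentrates
# samples where G is non-zero, reducing the estimator's variance.
rng = np.random.default_rng(1)
n, mu = 100_000, 3.0

# Plain Monte Carlo under P
x_p = rng.standard_normal(n)
g_p = (x_p > mu).astype(float)

# Importance sampling under Q: X ~ N(mu, 1), weight dP/dQ = exp(-mu*x + mu^2/2)
x_q = rng.standard_normal(n) + mu
weight = np.exp(-mu * x_q + 0.5 * mu**2)
g_q = (x_q > mu) * weight

print(g_q.mean())              # both estimate P(X > 3) = 1.3499e-3
print(g_p.var(), g_q.var())    # variance under Q is far lower
```

The weight exp(−μx + μ²/2) is the Gaussian likelihood ratio φ(x)/φ(x − μ); choosing the shift equal to the threshold is a standard (here ad hoc) tilting choice.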
1.3. Choosing an approximated random variable G^ε. Consider a payoff of the form G = G(X), where X is the unique strong solution to the stochastic differential equation (1.2), where b, σ : R → R are sufficiently well-behaved deterministic functions and W is a standard Brownian motion. The approximation of G is then defined as G^ε := G(X^ε), where (1.3)-(1.5) are possible approximations of X. The terminology here is straightforward, as (1.4) follows from (1.2) via the mapping t → εt and (1.5) follows from (1.2) via the mapping X_t → X^ε_t := X_{t/ε}. The small-noise approximation (1.3) comes directly from the early works on random perturbations of deterministic systems by Varadhan [25] and Freidlin-Wentzell [11].
1.4. General approach. We consider an asset price S := {S_t}_{t∈[0,T]} and the corresponding log-price process X := log(S) := {X_t}_{t∈[0,T]}, with dynamics (1.6), where W = (W, W^⊥) is a standard two-dimensional Brownian motion and B := ρW + ρ̄W^⊥ with correlation coefficient ρ ∈ (−1, 1) and ρ̄ := √(1 − ρ²). The drift and diffusion coefficients of the volatility process satisfy f : R⁺ → R and g : R⁺ → R⁺ and Assumption 1.4.1 if not stated otherwise (e.g. in the case of the large-time approximation in Section 3.3, additional assumptions are required for ergodicity purposes).
Assumption 1.4.1.
i) The function f : R⁺ → R is globally Lipschitz continuous;
ii) The function g : R⁺ → (0, ∞) is strictly increasing and satisfies a p-polynomial growth condition for p ≥ 1/2, that is, |g(x)| ≤ 1 + |x|^p for all x ∈ R⁺.
Under this assumption, the Yamada-Watanabe theorem [26, Theorem 1] ensures the existence of a unique strong solution to (1.6). Consider now the continuous map G ∈ C_T, yielding the option price E[G(X)]. Finding the optimal Radon-Nikodym derivative encoding the change of measure from P to Q is hard in general, and we instead consider the particular class of changes of measure (1.7), which is well defined and satisfies E_P[Z_T] = 1 whenever h ∈ H⁰_T. Now let F := log|G| and H := log(Z), so that the quantity to compute becomes (1.8), for some proxies X^ε and H^ε_T. In light of (1.7), the approximation H^ε(h) reads as in (1.9).

Definition 1.4.1. A minimiser h* of the resulting limiting problem is called an asymptotically optimal change of drift.
From this point onward, several approaches exist in the literature. In [9, 16, 8], fully adaptive schemes are considered, where h is a function of (t, X_t, V_t). These schemes effectively reduce variance, but are expensive to compute. For that reason, we look at the case where h is an absolutely continuous function with derivative in L²([0, T]; R). The main advantage is fast computation in comparison with the fully adaptive schemes. Preserving this advantage, we also look at paths h of the form ∫₀^• ḣ(t)√V_t dt (yielding a stochastic change of measure), for which computations are usually as fast and the variance reduction higher.
In the case where h is a deterministic function (the approach is similar in other settings), the main methodology we shall develop below goes through the following steps:
i) Choose appropriate approximations X^ε and H^ε as in Section 1.3;
ii) Prove an LDP (MDP) with good rate function I_{X,H} for {X^ε, H^ε}_{ε>0};
iii) Show that Varadhan's Lemma applies, so that the function L(h) in (1.8) reads as in (1.10);
iv) Consider the dual problem of (1.10) in the sense of Definition 1.4.2; for more details see the remark below.
Definition 1.4.2. The primal problem is defined as the direct minimisation of L(h) over h ∈ H⁰_T, while the dual is given by (1.13).

Remark. In many cases this optimisation problem may be difficult to solve analytically, so we deal with the dual problem, which turns out to be much simpler. With further assumptions it may be possible to prove strong duality; this is, however, outside the scope of this paper.
Remark. Small-time approximations may induce an important loss of information. The reason is that the drift term in H^ε (the quadratic part in ḣ) can be negligible and can thus lead to a trivial dual problem. In the Black-Scholes setting (Appendix C), a small-time approximation for H leads to the following problem: let F : R⁺ → R⁺ be a smooth enough function so that Varadhan's Lemma (Theorem B) holds; the small-noise approximation problem is then well posed, whereas in the small-time case the quadratic penalty in ḣ vanishes from the limit. In this small-time setting, the dual problem (1.13) then reduces to a pure supremum over h: the path h can be multiplied by an arbitrarily large positive constant to increase the inner supremum, and therefore the optimisation does not admit a maximiser. In these cases, we thus do not consider the small-time approximation for H^ε_T.

The paper is structured as follows: in Section 2 we look at stochastic volatility models satisfying Assumption 1.4.1 and derive explicit solutions for large deviations approximations for path-dependent payoffs of the form F(∫₀^T α_t dX_t) for general deterministic paths α ∈ C_T. This includes state-dependent payoffs of European type, i.e. functions of X_T (for the choice α ≡ 1), and of Asian type (1/T)∫₀^T X_t dt (for α_t = (T − t)/T). In Section 3 we study moderate deviations, where we derive small-noise, small-time and large-time MDPs, whose advantage, compared to LDPs, is the simpler form of the rate functions. Finally, in Section 4 we present results for the Heston model and compare variance reduction results for the different approximation types. Some of the technical proofs are relegated to Appendix A.
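The correspondence between the Asian average and the weight α_t = (T − t)/T noted above follows from integration by parts, ∫₀^T ((T − t)/T) dX_t = (1/T)∫₀^T X_t dt − X₀, and can be checked numerically on a discretised path (an illustration, not from the paper):

```python
import numpy as np

# Check: with alpha_t = (T - t)/T,
#     int_0^T alpha_t dX_t = (1/T) int_0^T X_t dt - X_0,
# so the path integral recovers the (de-meaned) Asian time average.
rng = np.random.default_rng(2)
T, n = 1.0, 100_000
dt = T / n
t = np.linspace(0.0, T, n + 1)
dX = 0.3 * np.sqrt(dt) * rng.standard_normal(n)    # Brownian increments of X
X = 1.0 + np.concatenate([[0.0], np.cumsum(dX)])   # a path with X_0 = 1

alpha = (T - t[:-1]) / T                           # left-point evaluation
lhs = np.sum(alpha * dX)                           # int_0^T alpha_t dX_t
time_avg = dt * (0.5 * X[0] + X[1:-1].sum() + 0.5 * X[-1]) / T  # trapezoid rule
rhs = time_avg - X[0]                              # (1/T) int X dt - X_0
print(abs(lhs - rhs))                              # ~0 up to discretisation
```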

2. Importance sampling via large deviations
2.1. Small-noise LDP. We start with the small-noise approximation (2.1) of (1.6).

2.1.1. Large deviations. In the spirit of [11], we provide in this section an LDP in C([0, T] → R²) for {X^ε, V^ε}; the usual assumptions involve non-degenerate and locally Lipschitz diffusions though, which clearly fails for square-root type stochastic volatility models. We follow [22], who lifted this constraint and instead showed an LDP under the following assumption:

Assumption 2.1.1. The sequence {V^ε} satisfies an LDP for some good rate function I_V with D(I_V) ⊂ H^{v₀}_T.

This assumption allows an extension of the Freidlin-Wentzell theory in the following theorem, which is a log-price analogue of [22, Theorem 2.2] and also gives two variational conditions sufficient for the tuple to satisfy an LDP.

Theorem 2.1.1 (Theorem 2.2 and Corollary 2.3 in [22]). Under Assumption 2.1.1, if there exists β > 0 such that the stated variational conditions hold, then the pair satisfies an LDP with the corresponding good rate function. Here, I_W is nothing else than the usual energy function for Brownian motion from Schilder's theorem [23].

To apply Theorem 2.1.1, we first need to show that Assumption 2.1.1 holds in our setting and check whether any further assumptions on the coefficients are necessary. Many processes arising from volatility models, where the classical Freidlin-Wentzell theory does not apply, have been studied in the literature. For example, Donati-Martin, Rouault, Yor and Zani [7] show that V^ε satisfies an LDP in the case of the Heston model; Chiarini and Fischer [3] show existence of an LDP for a class of models with coefficients uniformly continuous on compacts; Conforti et al. [4] show an LDP for non-Lipschitz diffusion coefficients of CEV type. Most notably, Baldi and Caramellino [1] cover the case of Assumption 1.4.1 with f(0) > 0 and sub-linear growth of f and g at infinity. We now state their main result.
Theorem 2.1.2 (Theorem 1.2 in [1]). Let V^ε be the solution of (2.1) on [0, T] with v₀ > 0, f(0) > 0 and sublinear growth of f and g at infinity. Then, under Assumption 1.4.1, the process V^ε satisfies an LDP with a good rate function. The rescaled Brownian motion εW satisfies an LDP with good rate function given by Schilder's theorem, and [6, Exercise 4.2.7] in turn implies that the pair (V^ε, W^ε) satisfies an LDP with a good rate function, finite on the Cameron-Martin space and infinite otherwise. Applying Theorem 2.1.1, we finally obtain the corresponding LDP for the log-price and variance pair.
Proof. The proposition is a direct application of Theorem 2.1.1. By [1, Proposition 3.11], the unique solution to the ODE φ̇_t = f(φ_t) + g(φ_t)ẇ_t with φ₀ = v₀ > 0 is strictly positive under Assumption 1.4.1, and since g maps to R⁺ and is strictly increasing, sup_{t∈[0,T]} 2φ_t/g²(φ_t) is finite. Again, since φ is strictly positive, inf_{t∈[0,T]} 2φ_t/g²(φ_t) > 0, and the variational conditions of Theorem 2.1.1 therefore hold.

2.1.2. LDP-based importance sampling. We consider two changes of measure, with a deterministic and a stochastic change of drift, and start with the former.
Deterministic change of drift. The change of drift is given by a deterministic path h ∈ H⁰_T. The limit (1.8), together with (1.9), then takes an explicit form, and we follow the same approach as in the case of deterministic volatility in Appendix C. Let F ∈ C(R⁺ → R⁺) be bounded from above and ḣ be of finite variation; the tail condition of Varadhan's Lemma is then easy to verify, and the functional L in (1.10) is expressed in terms of ϕ(x), the unique solution on [0, T] of the associated ODE system with initial conditions ϕ₀(x) = 0 and ψ₀ = v₀, where ρ := (ρ, ρ̄). To solve the dual problem (1.13), the inner optimisation can be computed explicitly; as shown in [22], the dual problem (1.13) can then be solved uniquely, yielding an asymptotically optimal change of drift in the sense of Definition 1.4.1.
Stochastic change of drift. We now consider the stochastic change of measure with ḣ a deterministic function of finite variation such that E[dQ/dP] = 1; this holds, for example, under the Novikov condition. The limiting functional is expressed in terms of x = (x₁, x₂) and the pair {ϕ(x), ψ}, unique solutions on [0, T] of the associated system with initial conditions (ϕ₀(x), ψ₀) = (0, v₀). For the dual problem, we search for a change of measure with h satisfying (2.4). The maximisation problem is very similar to the one with deterministic change of drift (2.3); however, as we will see in Section 4, the stochastic version usually gives better results. We focus on (2.3), since the solutions to (2.4) can easily be deduced from it. The following lemma helps transform the optimisation problem.
Lemma 2.1.1. The map K(x) := (ϕ, ψ), defined through the system above with initial conditions ϕ₀ = 0 and ψ₀ = v₀, is well defined and is a bijection.
Proof. Clearly ψ ∈ L²([0, T]; R), and the unique solution ψ of the ODE ψ̇_t = f(ψ_t) + g(ψ_t)ẇ_t with ψ₀ = v₀ > 0 is strictly positive under Assumption 1.4.1 by [1, Proposition 3.11]. Therefore ψ ∈ H^{v₀,+}_T and K(x) = (ϕ, ψ) is well defined. Finally, K is clearly a bijection and its inverse can be computed explicitly.

Using Lemma 2.1.1, we can substantially simplify the optimisation problem by writing it in terms of K(x). More precisely, we make use of the transformation (2.5), which stems from the two components of K(x). This allows us to apply the Euler-Lagrange equations to the problem, seen as an optimisation over (∫₀^• α_t φ̇_t dt, ψ). The resulting system of equations is still hard to solve for general f and g, so we instead consider the case of the Heston model with f(ψ) = κ(θ − ψ) and g(ψ) = ξ√ψ. Solving the first ODE and plugging it into the second one gives, for all t, an equation involving an arbitrary constant β > 0, where A_t = U_t√ψ_t is the solution to the Riccati equation. The option payoff at the terminal time determines the boundary conditions through the Euler-Lagrange equations. Since both conditions involve the same optimising variable, the resulting problem in fact becomes an optimisation in R² over β and A_T (or equivalently A₀), and is therefore much simpler than the original optimisation problem. The procedure is the following: after solving for A_t for all t ∈ [0, T], we solve for the couple (ϕ, ψ) by writing U and Z in terms of this couple using (2.5). We note that in the small-noise setting the results are of a similar form.
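The two-step procedure just described (solve the Riccati equation for A_t, then the nonhomogeneous linear equation ψ̇_t + (κ − A_t)ψ_t = κθ) can be sketched numerically. The Riccati coefficients c0, c1, c2 and the terminal condition below are placeholders, since in the paper they are determined by the payoff through the boundary conditions:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Sketch of the two-step solution procedure (hypothetical coefficients):
# Step 1: solve a scalar Riccati ODE for A_t backwards from t = T.
# Step 2: plug A_t into  psi'_t + (kappa - A_t) psi_t = kappa*theta, psi_0 = v0.
kappa, theta, v0 = 2.0, 0.09, 0.04
c0, c1, c2 = 0.1, -1.0, 0.05          # assumed Riccati coefficients
A_T = 0.0                              # assumed terminal condition

riccati = lambda t, a: c0 + c1 * a + c2 * a**2
sol_A = solve_ivp(riccati, (1.0, 0.0), [A_T], dense_output=True,
                  rtol=1e-10, atol=1e-12)
A = lambda t: sol_A.sol(t)[0]

sol_psi = solve_ivp(lambda t, p: kappa * theta - (kappa - A(t)) * p,
                    (0.0, 1.0), [v0], dense_output=True,
                    rtol=1e-10, atol=1e-12)
print(sol_psi.sol(1.0)[0])   # psi_T stays strictly positive, as guaranteed by [1]
```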
Example. In the case α_t ≡ α > 0, the Riccati equation can be reduced to a separable differential equation, whose solution involves a constant D ∈ R determined from the initial condition on A. We are then left to solve ψ̇_t + (κ − A_t)ψ_t = κθ for t ∈ [0, T], which is just a nonhomogeneous linear ODE with an explicit solution. The optimisation problem for h then reduces to a finite-dimensional one.

2.2. Small-time LDP. Applying the mapping t → εt to (1.6) yields (2.6). Robertson [22] showed that ε∫₀^• V^ε_t dt is in fact exponentially equivalent to zero, so that the drift of V^ε can be ignored at the large deviations level. In the case of a general drift f, the following lemma provides a similar statement.

Proof. By Markov's inequality, it suffices to show, for any δ > 0, that lim sup_{ε↓0} ε log E[exp(·)] is finite. To that end, we apply the integral Jensen inequality and the linear growth bound implied by the global Lipschitz condition in Assumption 1.4.1. Next, using the properties of the logarithm and the supremum, we can apply Gronwall's Lemma to the last term, which yields, for some C > 0, a finite lim sup.
Following this lemma, the results from the previous section could simply be adapted so that {V^ε_t, W^ε_t} satisfy the same LDP by setting f = 0 (or equivalently κ = 0 in the case of Heston). However, this violates the condition f(0) > 0 in Baldi and Caramellino [1]. Fortunately, Conforti, Deuschel and De Marco [4] removed the need for strict positivity of the drift at the initial time by imposing more stringent conditions on the diffusion.
Theorem 2.2.1 (Theorem 1.1 in [4]). Under Assumption 2.2.1, the solution satisfies the stated LDP. Therefore, by setting κ = 0, we can use the methodology from the previous section, since the LDPs are the same. As before, we only consider the deterministic change of drift, since the stochastic case is very similar. We therefore search for h solving the corresponding dual problem, where ϕ_t(x) is the unique solution on [0, T] of the associated ODE.

2.2.1. Example: option with path-dependent payoff. We consider a payoff G(α •_T X) as in Section 2.1.3. In the deterministic case, the optimisation problem takes the same form as before; as previously, we only treat the Heston model, for which the solution is explicit for all t ∈ [0, T].

Remark. Variance reduction for affine stochastic volatility processes via importance sampling through the large-time approximation is extensively covered in [14], so we do not repeat the study and refer the reader to that work.

3. Importance sampling via moderate deviations
In the previous sections, large deviations provided us with a way of computing the asymptotic change of measure for importance sampling via an ε-approximation of the log-price X^ε. While the large deviations rate function was a convenient quadratic in the deterministic volatility setting, it is in general rather cumbersome to compute numerically, unfortunately offsetting any importance sampling gain. Moderate deviations act on a cruder scale, but provide quadratic rate functions, which are easier to compute. Suppose that the sequence {X^ε}_{ε>0} converges in probability to X̄. Moderate deviations for {X^ε}_{ε>0} are defined as large deviations for the rescaled sequence h(ε)(X^ε − X̄), where h(ε) tends to infinity and √ε h(ε) to zero as ε tends to zero. A typical choice is h(ε) = ε^{−α} for α ∈ (0, 1/2), and we shall stick to this choice of h in our analysis in order to highlight clear rates of convergence. We now introduce the approximation (3.1). This process is centered around X̄ and is a simple candidate. Furthermore, in stochastic volatility models, and particularly in the large-time setting, the moderate deviations rate function is simply the second-order Taylor expansion of the large deviations rate function around its minimum X̄ [17, Remark 3.5]. We again consider the dynamics (1.6) with Assumption 1.4.1 on the coefficients, and further assume the following conditions:
ii) The small-noise approximation (2.1) of V satisfies an LDP with good rate function I_V and speed ε, such that I_V admits a unique minimum and is null there.
As will be shown in Lemma 3.1.3, the sequence {V^ε}_{ε>0} converges in probability to the function ψ as a consequence of Assumption 3.0.1. This provides a natural choice for the centered process X̄_t = −(1/2)∫₀^t ψ_s ds, so that the approximation (3.1) reads accordingly for any t ∈ [0, T].

3.1. Small-noise moderate deviations. Plugging in the small-noise approximation of X^ε introduced in (2.1), the process starts from X^ε₀ = 0, together with the small-noise approximation (2.1) for the variance. This transformation creates a lag between the decreasing speeds of X^ε and of V^ε (the speed of convergence to zero of the diffusion part of the volatility process). Since X^ε is our reference, we adjust the speed of the LDP accordingly. Similarly, in the price small-noise setting we have an analogous rescaling for γ > 0. In the following, we provide an LDP for {η^ε} (equivalently an MDP for {V^ε}), relegating the more technical proofs to Appendix A.
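In the Heston case f(ψ) = κ(θ − ψ), so the limiting path ψ solves ψ̇_t = κ(θ − ψ_t), ψ₀ = v₀, with closed form ψ_t = θ + (v₀ − θ)e^{−κt}. A quick numerical check (an illustration, not from the paper) that a small-noise CIR path collapses onto ψ:

```python
import numpy as np

# Small-noise CIR path vs. its deterministic limit psi_t = theta + (v0-theta)e^{-kappa t}.
# The diffusion is scaled by sqrt(eps); the terminal gap should be of order sqrt(eps).
rng = np.random.default_rng(3)
kappa, theta, xi, v0, T, n = 2.0, 0.09, 0.2, 0.04, 1.0, 10_000
dt = T / n
eps = 1e-4                                  # small-noise parameter

v = v0
for _ in range(n):                          # Euler scheme for the small-noise CIR
    dw = np.sqrt(dt) * rng.standard_normal()
    v += kappa * (theta - v) * dt + np.sqrt(eps) * xi * np.sqrt(max(v, 0.0)) * dw

psi_T = theta + (v0 - theta) * np.exp(-kappa * T)
print(abs(v - psi_T))                       # small, of order sqrt(eps)
```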
3.1.1. Theoretical results. The main moderate deviations result of this section is Theorem 3.1.1, but we first start with three technical lemmata, useful for the theorem but also of independent interest, proved in Appendices A.1, A.2 and A.3.

Lemma 3.1.1. Let {Z^ε}_{ε>0} be a family of random variables mapping to a metrisable space X and satisfying an LDP with good rate function I. If there exists a unique x₀ such that I(x₀) = 0, then for all β > 1, Z^{ε^β} satisfies an LDP with the corresponding good rate function. Equivalently, if for β > 1 the family Z^ε satisfies an LDP with speed ε^β and a good rate function with a unique minimum at zero, then Z^ε is exponentially equivalent to x₀ with speed ε.
As a consequence of this lemma, the sequence {Z^ε} converges in probability to x₀.

Lemma 3.1.2. Let {Z^ε}_{ε>0} be a sequence of random variables mapping to a metrisable space X and satisfying an LDP with good rate function I such that I(x) = 0 if and only if x = x₀.

3.1.2. Importance sampling using MDP. Consider the system (3.3), and let h ∈ H⁰_T with ḣ of finite variation. In the spirit of moderate deviations, we use the corresponding approximation of dQ/dP and thus aim at minimising the induced functional. Under the conditions of Varadhan's lemma (Theorem B.0.2), the functional takes an explicit form with ρ := (ρ, ρ̄). Minimising L(h) is far from trivial, and hence, as before, we define the optimal change of drift h* as a solution to the dual problem inf_{h∈H⁰_T} L(x, h), with η as in (3.5).

Remark. This moderate deviations approach is equivalent to approximating √V_t by √ψ_t and V_t by a Gaussian process centered at ψ.
We now consider a stochastic change of drift, through the Radon-Nikodym derivative for h ∈ H⁰_T with ḣ of finite variation and such that E[dQ/dP] = 1. Again we use the corresponding approximation and aim at minimising L(h), with η defined as in (3.5). As minimising L(h) is a priori complicated, we define our optimal change of drift h* as a solution to the dual problem.

3.1.4. Price small-noise MDP. We now consider the stock price dynamics given in (3.4), starting from S^ε₀ = 1, with γ > 0. With the deterministic change of drift dQ/dP for h ∈ H⁰_T with ḣ of finite variation, the optimal change of drift h* is the solution to the corresponding dual problem. Regarding the stochastic change of drift, the Radon-Nikodym derivative takes an analogous form, with ḣ of finite variation and such that E[dQ/dP] = 1, and we define our optimal change of drift h* as a solution to the dual problem, with η and Y as in (3.5). Again the objective simplifies to the case of the deterministic drift change, the only difference being the way h* is calculated.

3.1.6. Log-price small-noise MDP with path-dependent payoff. We consider a deterministic change of drift and proceed similarly to Section 2, with the transformations φ̇ = √ψ(ρẋ₁ + ρ̄ẋ₂) − η/2 and η̇ = f′(ψ)η + g(ψ)ẋ₁, so that the optimisation problem (3.6) for a path-dependent payoff can be rewritten accordingly. Applying the Euler-Lagrange equations to the problem, seen as an optimisation over (∫₀^• α_t φ̇_t dt, η), we obtain a system of ODEs with boundary conditions β = −2F_T and U_T = −ρα_T √ψ_T F_T, where F_T := F(∫₀^T α_t φ̇_t dt).
This simplifies the problem to the linear ODE Ȧ − f′(ψ)A = (1/2)βα with A_T = 0, with explicit solution in terms of B := ∫₀^• f′(ψ_t) dt and γ := ∫₀^• e^{B_t} α_t dt. We can then solve for U and Z. Our optimisation problem was posed over (∫₀^• α_t φ̇_t dt, η), so we require the solution in terms of this couple; the result involves parameters β, η₀ ∈ R over which we perform the optimisation. Thus the original optimisation objective (3.8) becomes finite-dimensional.

Remark. The stochastic change of drift objective is equivalent to the one with deterministic drift, the difference being how h is calculated.

Seen as an optimisation over (∫₀^• α_t φ̇_t dt, η), we obtain by Euler-Lagrange the system of ODEs with boundary condition β = −2F_T, with F_T as above, which can be solved explicitly in terms of a constant depending only on ρ and ρ̄. The optimisation problem (3.7) thus simplifies accordingly.

Remark. The stochastic change of drift objective is again equivalent to the one with deterministic drift, the difference being how h is calculated.

3.2. Small-time moderate deviations. We now mimic the results of the previous section, but for small-time moderate deviations. Consider the log-price dynamics (1.6) under Assumptions 1.4.1 and 3.0.1, let α ∈ (0, 1/2) and define the rescaled process X^ε accordingly. As we will see, as a consequence of Theorem 3.2.1, the results remain the same as in the price small-noise moderate deviations case of the previous section with f = 0 and ψ = v₀; this being the case, we do not repeat them here. Let us nevertheless note that, in the case of a change of measure with deterministic drift, the problem is similar to the one where V is constant, equal to v₀.

3.3. Large-time moderate deviations. We now consider a rescaling of (1.6) defined as V^ε_t = V_{t/ε} and X^ε_t = εX_{t/ε}, so that, under Assumptions 1.4.1 and 3.0.1, this leads to Regime 2 in the slow-fast setting of [21, Theorem 2.1], by choosing the timescale separation parameter equal to ε. The following assumption is needed in order to conform to the conditions in [21].
Assumption 3.3.1.
(i) f is locally bounded and of the form f(y) = −κy + τ(y), with τ globally Lipschitz with Lipschitz constant L_τ < κ. In addition, the tail condition lim_{|y|↑∞} τ(y)y/|y|² = κ holds.
(ii) The function g is either uniformly continuous, bounded from above and away from zero, or takes the form g(y) = ξ|y|^{q_g} for q_g ∈ [1/2, 1) with ξ ≠ 0.

Remark. Together with Assumption 1.4.1, Condition (ii), necessary to ensure ergodicity of the volatility process, collapses g to the form g(y) = ξ|y|^{1/2} (for details we refer to [17, 21]).
3.3.1. Theoretical results. In order to apply the methodology of the previous sections to derive the desired changes of measure, we need a large-time MDP; more precisely, an MDP for {X^ε, √εW, √εW^⊥} in the case of deterministic drift. We do not consider the stochastic drift change here, since a rigorous treatment is beyond the scope of this paper. A similar problem has been studied in [21, Theorem 2.1] and [17, Theorem 3.3], where the authors require fewer conditions, albeit in a simpler setting (which happens to include the Heston model as well). We now state a theorem, a direct application of [17, Theorem 3.3] and [21, Theorem 2.1], that provides the desired MDP.

Theorem 3.3.1. Let L_V denote the infinitesimal generator of V before rescaling.
Under Assumptions 1.4.1 and 3.3.1, the following hold:
i) There exists a unique invariant measure µ corresponding to L_V;
ii) The process X^ε converges in probability to −(1/2)v̄t, where v̄ = ∫₀^∞ y µ(dy);
iii) There exists a unique solution ϕ with at most polynomial growth to the Poisson equation, and the MDP holds with a rate function that is finite for φ ∈ AC and infinite otherwise.

Proof. Let Y be a random variable with distribution µ, the invariant measure. By Cauchy-Schwarz, the relevant inequality is strict, since equality would imply that V is constant µ-almost surely; this would force f(v̄) = g(v̄) = 0, which is not possible under Assumption 1.4.1 on f and g.

3.3.2. Example: options with path-dependent payoff. Consider again an option with a continuous payoff G(X) and let F = log|G|. Following the approach of the previous sections and using similar notation, we search for solutions to the dual problem for a deterministic change of drift. Let x := (x₁, x₂, x₃) and x̃ := (x₂, x₃). Considering a deterministic change of drift as before, and applying the modified Varadhan's lemma, we obtain the target functional, which we then rewrite in terms of ẋ; this is equivalent to (3.11). This substantially simplifies the main optimisation problem, which can now be solved explicitly by first introducing the two constants in (3.12), both obtained from the definition of Q.

Remark. This problem is similar to the problem of Appendix C, with deterministic volatility.
Example (Heston model). We again consider the Heston model, i.e. the setting of (1.6) with f(ψ) = κ(θ − ψ) and g(ψ) = ξ√ψ. Notice that the conditions of Assumption 3.0.1 are automatically satisfied. In this case, the invariant measure µ is a Gamma distribution, as shown in [5], so that Theorem 3.3.1 applies.
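For CIR dynamics dV = κ(θ − V)dt + ξ√V dW, the invariant Gamma distribution has shape 2κθ/ξ² and scale ξ²/(2κ), so its mean is v̄ = θ. A quick ergodicity check (an illustration, not from the paper) compares the long-run time average of one simulated path against θ:

```python
import numpy as np

# CIR invariant measure: Gamma(shape = 2*kappa*theta/xi^2, scale = xi^2/(2*kappa)),
# whose mean is theta.  The long-run time average of a single path should be
# close to this mean.
rng = np.random.default_rng(4)
kappa, theta, xi, v0 = 2.0, 0.09, 0.2, 0.04
T, dt = 200.0, 0.01
n = int(T / dt)

v, total = v0, 0.0
for _ in range(n):
    dw = np.sqrt(dt) * rng.standard_normal()
    v = abs(v + kappa * (theta - v) * dt + xi * np.sqrt(max(v, 0.0)) * dw)
    total += v * dt

shape, scale = 2 * kappa * theta / xi**2, xi**2 / (2 * kappa)
print(shape * scale)        # Gamma mean = theta = 0.09
print(total / T)            # ergodic time average, close to theta
```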

4. Numerical results
In all the different settings we studied, the final form of the optimisation problem is of the same type, where the function F is linked to the payoff and to the rate function of the (log-)price process, and ϕ and φ are absolutely continuous paths arising from Varadhan's lemma for the (log-)price and volatility processes respectively. In the tables below, we summarise all the problems considered so far. As one can see, in the deterministic drift setting the MDP problem is usually as simple as solving the problem under the Black-Scholes (BS) model, or at least as approximating the model with a Black-Scholes model. Furthermore, the variance reductions for geometric Asian options are also similar under the MDP with deterministic drift change, meaning the advantage over a simple BS model is not significant. However, when it comes to the stochastic change of drift,¹

¹ Since Q is positive-definite, both A and A₂₂ are positive-definite and thus invertible.
the MDP problems are slightly harder than in the BS approximation, and the variance reduction results are in fact significantly better.

Table 2. Summary of the optimisation problems with the stochastic change of drift. ϕ and φ are absolutely continuous paths arising from Varadhan's lemma for the (log-)price and volatility processes respectively.

4.1. Pricing Asian options. In order to compare variance reduction results in the Heston model, we look at geometric Asian Call options, with payoffs of the form (exp((1/T)∫₀^T X_t dt) − K)⁺, where x⁺ = max{x, 0}. For convenience, we restate the dynamics of the Heston model, with parameters realistic on equity markets:
S₀ = 50; r = 0.05; v₀ = 0.04; ρ = −0.5; κ = 2; θ = 0.09; ξ = 0.2.
To simulate the paths (X, V) on [0, T], we use a standard Euler-Maruyama scheme for X, but use the scheme of [19] for the CIR process in the volatility, which is upward biased² but
² There are many discretisation schemes for the Heston model. Since the objective of this paper is not to study the effects of different schemes, we content ourselves with [19].
nevertheless converges strongly in $L^1$ to the true process $V$. For $n \in \mathbb{N}$, $\Delta = T/n$ and the increments of the Brownian motion $\{\Delta W^n_i\}_{i=0}^{n-1}$, the scheme reads on $[0, T]$: In what follows, we compare different LDP and MDP results, with $n = 252$ trading days per year. All the results are computed for maturity $T = 1$, using $N_{\mathrm{MC}} = 500{,}000$ Monte Carlo samples. We also consider an antithetic estimator and an LDP estimator derived under the assumption of deterministic volatility (denoted by BS). Furthermore, since LDP-based deterministic changes of drift in the BS setting (or in cases where the final form of the optimisation problem is similar) are easy to compute, we also propose a fully adaptive scheme based on the BS estimator, where $h^i_t$ is the best deterministic change of drift up to the $i$-th discretisation step.³ We shall refer to deterministic schemes to mean changes of law with deterministic changes of drift, and to adaptive changes of drift for changes of law with drift of the form

4.2. LDP results in different settings. We now look at the results of LDP-based estimators in the small-noise, small-time and large-time settings. Figure 1 indicates that the estimators derived in the small-noise log-price and small-noise price settings have very similar variance reduction.
On the other hand, the small-time estimator provides good results, but is significantly outperformed by the other two. Although not apparent in the figure, Table 3 shows that the adaptive estimators provide slightly better results; as a matter of fact, they are notably better for small strikes. However, the computation time is also higher for adaptive estimators, which balances out the slight increase in variance reduction for higher strikes. In the deterministic case, all considered estimators have similar variance reduction (see Figure 2). To be more precise, the BS estimator has very similar variance reduction to, or even slightly outperforms, the MDP-based estimators (note that the blue and green lines in the left plot of Figure 2 are indistinguishable for high strikes). Therefore, in that respect, MDP-based estimators do not justify their higher computational cost compared to the simple LDP-BS estimator. In the adaptive case, the BS estimator performs slightly better than before, whereas the MDP-based estimators significantly outperform their results from the deterministic case and those of the BS estimator. Moreover, as discussed in the next section, their variance reduction is in fact close to that of the LDP-based estimators.
and hence by Fatou's lemma we have, but also by Bonferroni's inequality. Because $\lim_{\varepsilon \downarrow 0} \mathbb{P}[Z^\varepsilon \in \Gamma_\delta] = 0$ from the proof of the Lemma, using the obvious inequality $|a + b|^\alpha \leq 2^{\alpha-1}(|a|^\alpha + |b|^\alpha)$ for $\alpha \geq 1$, we have the following. Consider for now only the drift term, where we used Jensen's inequality for the first inequality; the second follows from the linear growth condition in Assumption 1.4.1. For the diffusion term, the first line follows from the Burkholder–Davis–Gundy inequality (with $\alpha \geq 2$) and the last one from the $H$-polynomial growth ($H \geq \tfrac{1}{2}$) condition on the diffusion. Adding both terms together and applying the Grönwall lemma to the integrands yields the claimed bound, with the stated constants. Choosing any $\alpha \geq 2$ and $\beta = \tfrac{\alpha}{2} - 1$ completes the proof. Similarly as in [3], define the bounded map $\Phi_t \in C_T$ for each $t \in [0, T]$. It is also continuous. Indeed, let $\psi_n \to \psi$ in $C_T$; then by the triangle inequality, for all $t \in [0, T]$ and $\varphi, \phi \in \mathcal{A}$, we have the following bound, so $\Phi_t$ is continuous. Now, by Lemma A.3.1, $\{V^\varepsilon\}_{\varepsilon>0}$ is tight as a family of random variables. Therefore, along a subsequence, $\{V^\varepsilon\}_{\varepsilon>0}$ converges in distribution to some random variable on the same probability space. Since $\Phi_t$ is continuous and bounded, we have by the continuous mapping theorem that $\lim_{\varepsilon \downarrow 0} \mathbb{E}[\Phi_t(V^\varepsilon)] = \mathbb{E}[\Phi(\psi)]$. We now show this limit is zero for $\psi = v_0 + \int_0^\cdot f(\psi_s)\,ds$, by Hölder's inequality and the Itô isometry, since the integral is bounded by $M > 0$ by Lemma A.
is well defined and continuous, and for every $\gamma > 0$, lim sup. Proof. If $\dot h$ is of finite variation and $x$ is continuous with $x(0) = 0$, then, since $F$ is continuous, the first statement, about existence and continuity, holds. The second one is a direct consequence of the computation of the exponential Gaussian moment, for every $\gamma > 0$.

C.2. Small-time. The small-time approximation (1.4) reads $dX^\varepsilon_t = -\frac{1}{2}\varepsilon\sigma^2(\varepsilon t)\,dt + \sigma(\varepsilon t)\,dW^\varepsilon_t$, with $X^\varepsilon_0 = 0$ and $W^\varepsilon = \sqrt{\varepsilon}\,W$. The couple $(\sigma(\varepsilon t), W^\varepsilon_t)$ is exponentially equivalent to $(\sigma(0), W^\varepsilon_t)$. Since $I_W$ is the good rate function of the LDP satisfied by $(W^\varepsilon)$ by Schilder's theorem [6, Theorem 5.2.3], $(\sigma(\varepsilon t), W^\varepsilon_t)$ satisfies an LDP with good rate function $I(s, w) = I_W(w) + \infty\,\mathbf{1}_{\{s \neq \sigma(0)\}}$. By Theorem 2.1.1, $(\int_0^\cdot \sigma(\varepsilon t)\,dW^\varepsilon_t, W^\varepsilon)$ satisfies an LDP with the same good rate function as $(\int_0^\cdot \sigma(0)\,dW^\varepsilon_t, W^\varepsilon)$. Noticing that $\int_0^\cdot \varepsilon\sigma(\varepsilon t)^2\,dt$ is exponentially equivalent to zero, our method then leads to the same solution as the problem for $dX^\varepsilon_t = -\frac{1}{2}\varepsilon\sigma^2(0)\,dt + \sigma(0)\,dW^\varepsilon_t$, which was treated above. In this small-time setting, we lose all information on the path of $\sigma$, except for its initial value.

C.3. Numerical results. We provide numerical evidence in the Black-Scholes model with $S_0 = 50$ in the log-price small-noise approximation. In order to compare estimators, we look at arithmetic Asian Call options, with payoff $\big(\frac{1}{T}\int_0^T e^{X_t}\,dt - K\big)^+$. The form of the solution of the optimisation problem studied previously can be found in [15]. We compare the naive Monte Carlo estimator, the antithetic Monte Carlo estimator, the control-variate estimator based on the price of geometric Asian options (with payoff $\{\exp[\frac{1}{T}\int_0^T X_t\,dt] - K\}^+$), which can be computed explicitly, and the LDP-based importance sampling estimator above. Instead of simulating $\int_0^T e^{X_t}\,dt$, we consider a discretised payoff on $n = 252$ dates and draw $10^5$ paths. For the LDP-based estimator, the law of $W$ after the change of measure is given by the Girsanov theorem. In what follows, when we
refer to variance reduction, we mean the ratio of the variance of the classical Monte Carlo estimator over the variance of the estimator in question. As we can see in Table 7 and Figure 6, even for non-rare events the LDP estimator provides good variance reduction. However, it is mainly in the context of rare events that it performs best and outperforms the other estimators (Figure 5 and Table 7), revealing the true power of LDP-based importance sampling estimators.

2.1.3. Example: options with path-dependent payoff. Consider a payoff $G(\alpha \bullet_T X)$ with $G : \mathbb{R}_+ \to \mathbb{R}_+$ differentiable, $\alpha$ a positive function of class $C^1$, and $F := \log|G|$. We only look at the deterministic case, namely the optimisation problem (2.

3. Log-price small-noise MDP. Consider a continuous payoff function $G \in C([0, T] \to \mathbb{R}_+)$ and let $F := \log|G|$. As a reminder, we are interested in finding a measure change minimising $\mathbb{E}[e^{2F}\,\frac{dP}{dQ}]$. We first consider a deterministic change of drift, via the change of measure $\frac{dQ}{dP}$

3.1.5. Example: options with path-dependent payoffs. We apply our methodology to options with payoffs of the form $G(\alpha \bullet_T X)$, where $G : \mathbb{R} \to \mathbb{R}_+$ is a differentiable function and $\alpha$ a positive (almost everywhere) function of class $C^1([0, T] \to \mathbb{R}_+)$. The payoff is then a continuous function of the path. Let $F = \log|G|$ and $\bar F(x) = F\big(x - \int \psi_s \alpha_s\,ds\big)$, and suppose that Assumptions 1.4.1 and 3.0.1 hold.
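The minimisation of $\mathbb{E}[e^{2F}\,\frac{dP}{dQ}]$ behind these measure changes can be illustrated numerically in a toy Black-Scholes setting: over a grid of constant drifts $h$, estimate the second moment $\mathbb{E}_Q[(G\,\frac{dP}{dQ})^2] = \mathbb{E}[e^{2F}\,\frac{dP}{dQ}]$ of the importance sampling estimator and keep the minimiser. The grid search below is our own stand-in for the variational problem solved in the text; all parameter values are illustrative.

```python
import numpy as np

def second_moment(h, K=65.0, S0=50.0, sigma=0.2, T=1.0,
                  n_mc=20_000, seed=1):
    """Estimate E_Q[(G dP/dQ)^2] for a call payoff G under a constant
    drift change h.  The seed is fixed so the same underlying normals are
    reused for every h (common random numbers), making the grid search
    deterministic and smooth in h."""
    rng = np.random.default_rng(seed)
    W = np.sqrt(T) * rng.standard_normal(n_mc) + h * T   # W_T under Q
    X = -0.5 * sigma**2 * T + sigma * W                  # log(S_T / S0)
    G = np.maximum(S0 * np.exp(X) - K, 0.0)              # call payoff
    dPdQ = np.exp(-h * W + 0.5 * h**2 * T)               # Girsanov weight
    return np.mean((G * dPdQ) ** 2)

# Pick the drift minimising the estimated second moment over a grid.
grid = np.linspace(0.0, 4.0, 41)
h_star = grid[np.argmin([second_moment(h) for h in grid])]
```

For an out-of-the-money strike the minimiser is a strictly positive drift, pushing sampled paths towards the payoff region, which is exactly the behaviour the asymptotic analysis delivers in closed form.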

Figure 1. Variance reduction for LDP-based estimators, in log-scale. Left: deterministic change of drift. Right: adaptive changes of drift.

Figure 2. Variance reduction for MDP-based estimators, in log-scale. Left: deterministic change of drift. Right: adaptive changes of drift. Note that, because of computational problems, the adaptive small-time MDP estimator was not computed correctly for strikes greater than 75. Nevertheless, it appears to be significantly outperformed by the other MDP estimators.

4.4. Overall comparison. Looking at Figure 3, as expected the LDP small-noise adaptive estimators perform best, even though the MDP small-noise and large-time adaptive estimators are not far behind. Regarding computation time, Table 5 indicates that MDP estimators are on average about 10% and LDP estimators approximately 15% slower than the corresponding standard BS estimators. The fully adaptive BS estimator provides interesting results, especially for near-the-money strikes, where it performs much better than the MDP and LDP estimators. Although this estimator is time consuming, it can still provide a good balance between variance reduction and computation time for certain strikes; see Tables 3, 4, 5 and Figure 4.

Figure 3. Variance reduction for different estimators, in log-scale. The antithetic estimator offers almost no variance reduction for OTM options, because with higher strikes very few paths end up in the money, reducing the effect of antithetic samples. In the following three tables we use these notations:
- Proba: probability of having a positive payoff.
- LDPsn: deterministic estimator based on LDP in the small-noise log-price setting.
- LDPsn A: adaptive estimator based on LDP in the small-noise log-price setting.
- BS: deterministic BS estimator.
- BS A: adaptive BS estimator.
- MDPsn-log A: adaptive estimator based on MDP in the small-noise log-price setting.
- MDPsn A: adaptive estimator based on MDP in the small-noise price setting.
- BS A2: fully adaptive BS estimator.
- Ant: antithetic estimator.
- Classic: classic Monte Carlo estimator.

Figure 4. Ratio of variance reduction over computation time for different estimators, in log-scale.

Lemma C.1.1. If $\dot h$ is of finite variation, the function

Lemma C.1.2. If $\dot h$ is of finite variation, then $L(h) = \sup_{x \in H_0^T} L(x; h)$.

Proof. Let $\gamma > 1$ be such that (C.1) holds. Then, by Cauchy–Schwarz and Lemma C.1.1, the conditions of Theorem B.0.1 are verified. The continuity has already been shown. Since the map $G : W \to X$ is continuous by the continuity of the Itô integral, we can freely introduce $F = F \circ G$, and the existence of a minimum to the dual version of $\inf(\,\cdot\,; h)$ can be proved as in [15] under Assumption C.1.1, by choosing $M = 0$ in [15, Lemma 7.1]. The minimum is then attained for $h = x$ and is equal to the stated value. Furthermore, it immediately follows from [15, Theorem 3.6] that if $h^* \in H_0^T$ is of finite variation and is a solution to (C.2), then it is asymptotically optimal if $L(h^*) = 2F - \int_0^T |\dot h^*_t|^2\,dt$. Therefore, in order to derive a change of measure, we search for $h^* \in H_0^T$ such that $h^* = \arg\max$

Simplified deterministic change of drift. We consider a simplified version of the problem. Since $\varepsilon \int_0^\cdot \sigma^2(t)\,dt$ is exponentially equivalent to 0, the results of the previous section remain valid when replacing $F\big(-\tfrac{1}{2}\int_0^\cdot \sigma^2(t)\,dt + \int_0^\cdot \sigma(t)\dot x_t\,dt\big)$ with $F\big(\int_0^\cdot \sigma(t)\dot x_t\,dt\big)$. The problem then becomes $h^* = \arg\max_{x \in H_0^T}$

Figure 5. Left: comparison of the best estimator (in terms of variance) between the control-variate and the LDP one, for different values of $\sigma$, $K$, $T$. Right: same, but plotted against the probability of a positive payoff (computed using the estimated prices of the options with the values of $\sigma$, $K$, $T$ from the left).

Following Theorem 3.1.1 and Lemma 2.1.1, the Contraction principle and [6, Exercise 4.2.7] imply that the triple $\{\eta^\varepsilon, W^\varepsilon, Y^\varepsilon\}$ satisfies an LDP with speed $\varepsilon$ and good rate function

Table 1. Summary of the optimisation problems with the deterministic change of drift (optimisation with general payoff, and optimisation with payoff $\alpha \bullet_T X$). $\varphi$ and $\phi$ are absolutely continuous paths that arise from Varadhan's lemma applied to the (log-)price and volatility processes respectively.

Table 3. Variance reduction for different estimators, and probability of positive payoff.

Table 4. Ratio of variance reduction over computation time for different estimators. Columns: Strike, Classic, LDPsn, LDPsn A, BS, BS A, MDPsn-log A, MDPsn A, BS A2, Ant.

Table 5. Computation time (in seconds) for different estimators.

4.5. Variance swaps. The methodology can also be applied to options with payoffs depending on the volatility, for example options with payoffs of the form

A.2. Proof of Lemma 3.1.2. Let $Z^\varepsilon : \Omega \to \mathcal{X}$, with $(\mathcal{X}, d)$ a metric space, and $\delta, M > 0$. Define further the sets $\Gamma_\delta$. By Lemma 3.1.1, and since $Z^\varepsilon$ is uniformly integrable, the result follows by taking $M$ to infinity.