Importance sampling for stochastic reaction–diffusion equations in the moderate deviation regime

We develop a provably efficient importance sampling scheme that estimates exit probabilities of solutions to small-noise stochastic reaction–diffusion equations from scaled neighborhoods of a stable equilibrium. The moderate deviation scaling allows for a local approximation of the nonlinear dynamics by their linearized version. In addition, we identify a finite-dimensional subspace where exits take place with high probability. Using stochastic control and variational methods we show that our scheme performs well both in the zero noise limit and pre-asymptotically. Simulation studies for stochastically perturbed bistable dynamics illustrate the theoretical results.


INTRODUCTION
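In this paper we are concerned with the problem of rare event simulation for the stochastic reaction-diffusion equation (SRDE)
$$
\begin{cases}
\partial_t X^\epsilon(t,\xi) = \mathcal{A}X^\epsilon(t,\xi) + f\big(X^\epsilon(t,\xi)\big) + \sqrt{\epsilon}\,\dot{W}(t,\xi), & (t,\xi) \in (0,\infty)\times(0,\ell),\\
X^\epsilon(0,\xi) = x(\xi), & \xi \in (0,\ell),\\
\mathcal{N}X^\epsilon(t,\xi) = 0, & (t,\xi) \in [0,\infty)\times\partial(0,\ell),
\end{cases} \tag{1}
$$
where $\epsilon \ll 1$, $\mathcal{A}$ is a uniformly elliptic second-order differential operator, $f : \mathbb{R} \to \mathbb{R}$ is a dissipative nonlinearity with polynomial growth and $\dot{W}$ is a stochastic forcing term of intensity $\sqrt{\epsilon}$ modeled by space-time white noise. The mixed boundary conditions are given by the linear operator $\mathcal{N}$ which acts on functions defined on the boundary $\partial(0,\ell)$ (see Section 2 for more details), and the initial datum $x : (0,\ell) \to \mathbb{R}$ is a continuous function in the kernel of $\mathcal{N}$.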
Systems like (1) are of interest because they exhibit metastable behavior. Assuming that the associated noiseless dynamics are non-trivial and $\epsilon > 0$, the stochastic forcing can induce transitions between neighborhoods of metastable states. As $\epsilon \to 0$, transitions and exits from domains of attraction occur with very small probabilities, and rigorous asymptotic analysis of exit times and places is possible within the framework of large deviations or potential theory (see e.g. [23,33,36,37] and [4,21,32,41,54], as well as references therein, for results in metastability theory in finite and infinite dimensions respectively).
In practice, efficient simulation of such events is challenging. On the one hand, Large Deviation Principles (LDPs) characterize the exponential decay rates of probabilities in the limit as $\epsilon \to 0$ but ignore the effect of prefactors, which can be significant (see [29]). On the other hand, as $\epsilon$ decreases, standard Monte Carlo schemes require an increasingly large sample size in order to maintain a small relative error per sample. For this reason, accelerated and adaptive methods such as importance sampling or multi-level splitting become essential when it comes to rare events. For more details on the general theory and applications of such methods in a number of different models, the interested reader is referred to the book [12].
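To make the sample-size growth concrete, here is a minimal sketch of the standard Bernoulli calculation for a plain Monte Carlo indicator estimator of a probability p. The function name `required_samples` and the example values are illustrative assumptions and are not part of the scheme developed in this paper.

```python
import math


def required_samples(p, rel_err):
    """Samples needed so that a plain Monte Carlo estimate of p has the
    given relative error (standard deviation divided by p).

    The indicator of the event is Bernoulli(p) with variance p * (1 - p),
    so the relative error of an N-sample mean is sqrt((1 - p) / (N * p)).
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return math.ceil((1.0 - p) / (p * rel_err ** 2))


# Hypothetical event probabilities, chosen only for illustration.
for p in (1e-2, 1e-4, 1e-6):
    print(p, required_samples(p, rel_err=0.1))
```

The required sample size grows like 1/p, which is why plain sampling becomes impractical as the event of interest gets rarer.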
In the present work, we aim to develop a provably efficient importance sampling scheme that computes exit probabilities of $X^\epsilon$ from scaled neighborhoods of a stable equilibrium point $x^*$. In particular, let $X^\epsilon_x$ denote the unique (mild) solution of (1) with initial condition $x$, $D \subset L^2(0,\ell)$ and
$$\tau^\epsilon_x := \inf\{t > 0 : X^\epsilon_x(t) \notin D\}.$$
For $T, L > 0$, we focus on the estimation of probabilities $P[\tau^\epsilon_{x^*} \le T]$ in the case where $D = D^\epsilon$ with
$$D^\epsilon := \big\{x \in L^2(0,\ell) : \|x - x^*\|_{L^2(0,\ell)} < L\sqrt{\epsilon}\,h(\epsilon)\big\}. \tag{2}$$
The scaling $h(\epsilon)$ is chosen so that $h(\epsilon) \to \infty$ and $\sqrt{\epsilon}\,h(\epsilon) \to 0$ as $\epsilon \to 0$. As $\epsilon \to 0$, exit probabilities from such domains lie in an asymptotic regime that interpolates between the Central Limit Theorem (CLT) and the LDP. To be precise, let $X^0_x$ denote the (deterministic) solution of (1) with $\epsilon = 0$ and define a family of centered and re-scaled processes
$$\eta^\epsilon_x(t) := \frac{X^\epsilon_x(t) - X^0_x(t)}{\sqrt{\epsilon}\,h(\epsilon)}, \quad t \ge 0. \tag{3}$$
The choices $h(\epsilon) = 1/\sqrt{\epsilon}$ and $h(\epsilon) = 1$ correspond to large and Gaussian deviations of $X^\epsilon$ respectively. Exits of $X^\epsilon$ from $D$ are then equivalent to exits of $\eta^\epsilon_x$ from an $L^2$-ball of radius $L$ around 0, and large deviations of the family $\{\eta^\epsilon_x\}_{\epsilon\in(0,1)}$ are called moderate deviations of $\{X^\epsilon_x\}_{\epsilon\in(0,1)}$. Moderate Deviation Principles (MDPs) have been studied in many different contexts such as multiscale and interacting particle systems, Markov processes with jumps, small-noise stochastic dynamics, statistical estimation, option pricing and stochastic recursive algorithms; see e.g. [35,61] for SRDEs as well as [7,13,26,34,39,40,42,44,51].
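As a quick illustration of the two scaling requirements, h(ϵ) → ∞ and √ϵ h(ϵ) → 0, the sketch below uses a power-law family h(ϵ) = ϵ^(-γ) with γ in (0, 1/2). This family, the function names `h_power` and `domain_radius`, and the default values are assumptions made for illustration; the text only requires the two limits.

```python
import math


def h_power(eps, gamma=0.25):
    """Hypothetical scaling h(eps) = eps**(-gamma), with gamma in (0, 1/2)."""
    if not 0.0 < gamma < 0.5:
        raise ValueError("gamma must lie strictly between 0 and 1/2")
    return eps ** (-gamma)


def domain_radius(eps, L=1.0, gamma=0.25):
    """Radius L * sqrt(eps) * h(eps) of the scaled exit neighborhood."""
    return L * math.sqrt(eps) * h_power(eps, gamma)


eps_values = [1e-2, 1e-4, 1e-6]
scalings = [h_power(e) for e in eps_values]     # should increase without bound
radii = [domain_radius(e) for e in eps_values]  # should shrink to zero

for e, h, r in zip(eps_values, scalings, radii):
    print(f"eps={e:g}  h={h:.3f}  radius={r:.4f}")
```

Taking γ close to 1/2 moves the regime toward large deviations, while γ close to 0 moves it toward the Gaussian (CLT) regime.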
Importance sampling is a variance-reduction accelerated Monte Carlo method and its objective is to minimize the variance of the estimator by carefully chosen changes of measure. Such changes of measure "push" the dynamics towards trajectories that realize the rare event of interest. This procedure transforms tail events to more typical events, thus allowing for more efficient sampling. The simulation outcomes are then weighted by likelihood ratios so that the importance sampling estimators remain unbiased under the new probability measures. Importance sampling schemes for events in the large and moderate deviation regimes have been developed for finite-dimensional systems in [27,29,56,57,59]. In [27,57], the authors observed that moderate-deviation based schemes provide a viable and simpler alternative to their large-deviation based counterparts, in cases where both are applicable. This is due to the fact that the MDP action functional, which characterizes exponential decay rates of probabilities, takes a much simpler form. In turn, this allows for more tractable and straightforward design of optimal changes of measure.
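The mechanism can be seen in a one-dimensional toy problem that is much simpler than the SRDE setting: estimating the Gaussian tail probability P[Z > a] by sampling from a mean-shifted Gaussian and reweighting by the likelihood ratio. The sketch below is only an illustration of the general idea; the function name `tail_prob_is`, the shift by a, and the sample size are assumptions, not the change of measure constructed in this paper.

```python
import math
import random


def tail_prob_is(a, n, rng):
    """Importance sampling estimate of P[Z > a] for Z ~ N(0, 1).

    Samples are drawn from the shifted law N(a, 1), under which the event
    {Y > a} is typical, and weighted by the likelihood ratio
    dP/dQ(y) = exp(-a * y + a**2 / 2) so that the estimator stays unbiased.
    """
    total = 0.0
    for _ in range(n):
        y = rng.gauss(a, 1.0)
        if y > a:
            total += math.exp(-a * y + 0.5 * a * a)
    return total / n


a = 4.0
exact = 0.5 * math.erfc(a / math.sqrt(2.0))  # closed-form Gaussian tail
estimate = tail_prob_is(a, 20000, random.Random(1))
print(exact, estimate)
```

A plain Monte Carlo run of the same size would typically see the event {Z > 4} at most once or twice, whereas roughly half of the shifted samples land in it.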
Importance sampling for SRDEs presents new challenges due to infinite dimensionality combined with the nonlinearity of the dynamics. Our work is close to [53], where a large deviation based scheme was developed for linear equations (i.e. when $f = 0$). There, the authors show that efficient changes of measure need to accomplish both variance and dimension reduction. For example, changes of measure that force infinitely many modes of the dynamics lead to estimators with very large variance when $\epsilon$ is small. A possible workaround is to show that exits from $D$ take place in a finite-dimensional submanifold of $\partial D$ with high probability. This was achieved in the linear case of [53], where it was proved that, under a sufficiently large spectral gap, exit from $D$ happens in the direction of the eigenvector $e_1$ of $-A$ corresponding to the smallest non-zero eigenvalue. Similar results regarding the exit direction for (finite-dimensional) SDEs with a linear drift have been proved in [63] (see also Remark 5 below).
To the best of our knowledge, importance sampling for nonlinear SRDEs is rigorously studied here for the first time. The main difficulty in designing large deviation-based schemes for such equations lies in the task of identifying a finite-dimensional exit submanifold (if any). We are able to overcome this obstacle by working in the moderate deviation regime. As we show in the sequel, the latter is equivalent to linearizing the dynamics in a neighborhood of the equilibrium $x^*$. Consequently, the results of [53] can be applied locally at the cost of a linearization error which is, however, negligible as $\epsilon \to 0$. In cases where both LDP and MDP-based schemes are available, one may think of the tradeoff between the two as follows: moderate deviations cover the regime between the central limit theorem and large deviations, so they are appropriate for characterizing rare events that are not so rare as to fall in the large deviations regime. On the other hand, moderate deviation schemes are in general more tractable due to the asymptotic linearization of the dynamics that takes place. In our setting, this tradeoff is reflected in the fact that we only consider exit domains (2) in which the radius shrinks to zero as $\epsilon \to 0$. Furthermore, the probability of exiting from a ball of radius $\sqrt{\epsilon}\,h(\epsilon)$ is larger than the probability of exiting a ball of radius 1, since the smaller ball is contained in the larger one. The MDP importance sampling schemes described in this paper can therefore provide a quantitative upper bound for the LDP exit probabilities, which are much more difficult to characterize.
The design of an importance sampling scheme and the proof of its good asymptotic and pre-asymptotic performance is the main contribution of this paper. In the course of our analysis, we prove an MDP for additive-noise SRDEs with a non-Lipschitz nonlinearity which cannot be found in the literature (see Theorem 3.1 and Remark 11). Furthermore, our theory is applied to the stochastic Allen-Cahn (also known as real Ginzburg-Landau or Chafee-Infante) equation and supplemented by simulation studies. In contrast to the linear case, there are a number of interesting cases in which the aforementioned spectral gap condition is not satisfied. Another novel feature of this work is the construction of changes of measure that perform well asymptotically (i.e. as $\epsilon \to 0$) in the absence of this condition (see Hypothesis 3(c') below).
The rest of this paper is organized as follows: In Section 2 we fix the notation and state our assumptions. In the first part of Section 3 we introduce moderate deviations and subsolution-based importance sampling and then state and prove our results on the asymptotic theory of the scheme. Section 4 is devoted to the implementation and pre-asymptotic performance analysis of our scheme. In Section 5 we apply the developed theory to the case where $f$ is, up to a sign, the derivative of a double-well potential. Our examples include the stochastic Allen-Cahn equation (which features a cubic nonlinearity) with different boundary conditions as well as SRDEs with higher order polynomial nonlinearities. The results of simulation studies are then presented in Section 6. Finally, Appendix A collects the proofs of some useful lemmas.

NOTATION AND ASSUMPTIONS
Let $\ell > 0$. The Hilbert space $L^2(0,\ell)$, endowed with its usual inner product, will be denoted by $(H, \langle\cdot,\cdot\rangle_H)$. The Banach space $C[0,\ell]$, endowed with the supremum norm, is denoted by $E$. The norm of a Banach space $\mathcal{X}$ will be denoted by $\|\cdot\|_{\mathcal{X}}$ and the closed ball of radius $R > 0$ and center $x_0 \in \mathcal{X}$, i.e. the set $\{x \in \mathcal{X} : \|x - x_0\|_{\mathcal{X}} \le R\}$, by $B_{\mathcal{X}}(x_0, R)$. We use $\mathring{D}$, $\bar{D}$, $\partial D$ to denote the interior, closure and boundary of a set $D \subset \mathcal{X}$ respectively. The lattice notation $\wedge$, $\vee$ is used to indicate minimum and maximum respectively.
For θ > 0, p ∈ [1, ∞), we denote by W p,θ (0, ℓ) the fractional Sobolev space of x ∈ L p (0, ℓ) such that W p,θ (0, ℓ), endowed with the norm ∥ • ∥ p,θ := ∥ • ∥ L p (0,ℓ) + [•] p,θ , is a Banach space.W 2,θ (0, ℓ) is a Hilbert space and is denoted by H θ (0, ℓ).Moreover, for T > 0 and β ∈ [0, 1), we denote by C β ([0, T ]; X ) the space of β-Hölder continuous X -valued functions defined on the interval [0, T ].C β ([0, T ]; X ), endowed with the norm is a Banach space.For any two Banach spaces X , Y we denote the space of linear bounded operators B : X → Y by L (X ; Y).The latter is a Banach space when endowed with the norm ∥B∥ L (X ;Y) := sup x∈B X (0,1) ∥Bx∥ Y .When the domain coincides with the co-domain, we use the simpler notation L (X ).The spaces of traceclass and Hilbert-Schmidt linear operators B : H → H are denoted by L 1 (H) and L 2 (H) respectively.The former is a Banach space when endowed with the norm ∥B∥ L1(H) := tr( √ B * B) while the latter is a Hilbert space when endowed with the inner product ⟨B 1 , B 2 ⟩ L2(H) := tr(B * 2 B 1 ).The operator A in (1) is a uniformly elliptic second-order differential operator in divergence form.In particular: with a ∈ C 1 (0, ℓ) and inf ξ∈(0,ℓ) a(ξ) > 0. 
The operator N acts on the boundary {0, ℓ} and can be either the identity operator (corresponding to Dirichlet boundary conditions), first-order differential operators of the type for periodic boundary conditions.We denote by A the realization of the differential operator A in H, endowed with the boundary condition N .It is defined on a dense subspace Dom(A) ⊂ H that contains and it generates a C 0 semigroup of operators S = {S(t)} t≥0 ⊂ L (H).Moreover, the part of A in Dom(A) ⊂ E, where the closure is taken in the topology of E, generates either a C 0 or an analytic semigroup for which we use the same notation (see e.g.A.27 in [22] for a definition).Regarding the spectral properties of A, we make the following assumptions: Hypothesis 1(a).In view of (4), the operator −A is self-adjoint.As a result, there exists a countable complete orthonormal basis {e n } n∈N ⊂ H of eigenvectors of −A.The corresponding sequence of nonnegative eigenvalues is denoted by {a n } n∈N .

Hypothesis 1(b). The eigenvectors satisfy sup
Remark 1. Without loss of generality, we can replace the operator $A$ by $\tilde{A} = A - cI$ for some $c > 0$ and the reaction term $f$ in (1) by $\tilde{f}(x(\xi)) := f(x(\xi)) + c\,x(\xi)$. The model is invariant under this transformation and, in light of Hypothesis 1(a), it follows that $\|\tilde{S}(t)\|_{\mathcal{L}(H)} \le e^{-ct}$. Throughout the rest of this work we will be using $\tilde{A}$, $\tilde{S}$ and $\tilde{f}$ with no further distinction in notation.
Let θ ≥ 0. In view of Hypotheses 1(a) along with the previous remark, −A, restricted to its image, has a densely defined bounded inverse (−A) −1 which can then be uniquely extended to all of H.The fractional power (−A) −θ is defined via interpolation and is also injective.Letting (−A) Remark 2. For θ ∈ (0, 1  2 ) the spaces H θ (0, ℓ) and H θ coincide via the identification which holds with equivalence of norms.The latter implies that for each t ≥ 0, the linear operator S(t) − I ∈ L (H θ ; H) and there exists a constant C > 0 such that The analytic semigroup S possesses the following regularizing properties (see e.g.section 4.1.1 in [18]) : (i) For 0 ≤ s ≤ r ≤ 1 2 and t > 0, S maps H s (0, ℓ) to H r (0, ℓ) and for some positive constants c r,s , C r,s .
(ii) S is ultracontractive, i.e. for t > 0, S(t) maps H to L ∞ (0, ℓ) and furthermore, for any The next set of assumptions concerns the nonlinear reaction term in (1).
Hypothesis 2(a).f : R → R is twice continuously differentiable and where f 1 : R → R is globally Lipschitz continuous and f 2 : R → R is a non-increasing function.

Hypothesis 2(b).
There exists C f > 0 and p 0 ≥ 3 such that for all x ∈ R and i ∈ {0, 1, 2} For p ≥ 1, f induces a superposition (or Nemytskii) operator F : E → L p (0, ℓ) defined by F (x)(ξ) := f (x(ξ)), ξ ∈ (0, ℓ).In view of Hypotheses 2(a) and 2(b), F is twice Gâteaux differentiable along any direction in E and (with some abuse of notation) its Gâteaux differentials are given by The last set of assumptions concerns the stability properties of the deterministic and linearized dynamics governed by (1), after setting ϵ = 0.

Hypothesis 3(b).
The linear self-adjoint operator −A − DF (x * ) has a countable, non-decreasing sequence of nonnegative eigenvalues {a f n } n∈N corresponding to a complete orthonormal set of eigenvectors {e f n } n∈N ⊂ E. Therefore, the equilibrium x * is asymptotically stable.

Hypothesis 3(c). The first two eigenvalues of the self-adjoint operator $-A - DF(x^*)$ satisfy $3a^f_1 < a^f_2$.
This spectral gap provides a sufficient condition that allows us to identify a one-dimensional exit direction for limiting trajectories (see Lemma 3.4 below). A weaker condition under which our results continue to hold is $2a^f_1 < a^f_2$ (see Remark 7). In fact, our asymptotic results continue to hold under the following relaxed spectral gap:

Hypothesis 3(c').
There exists $k_0 \ge 1$ such that $3a^f_1 < a^f_{k_0+1}$ and $a^f_1 < a^f_2$. Note that Hypothesis 3(c) trivially implies Hypothesis 3(c') with $k_0 = 1$. The latter will be used throughout Section 3 to prove asymptotic results. In Section 4 we restrict the pre-asymptotic analysis to schemes that work under Hypothesis 3(c).
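Given a numerically computed, non-decreasing list of eigenvalues, the relaxed condition can be checked mechanically. The helper below is a hypothetical sketch (the name `smallest_k0` and the sample eigenvalue lists are not from the paper); it returns the smallest index k0 with 3a_1 < a_{k0+1}, provided a_1 < a_2.

```python
def smallest_k0(eigs):
    """Smallest k0 >= 1 with 3*a_1 < a_{k0+1} and a_1 < a_2, else None.

    `eigs` is assumed to be a non-decreasing list [a_1, a_2, ...] of
    positive eigenvalues (only finitely many are inspected).
    """
    if len(eigs) < 2 or not eigs[0] < eigs[1]:
        return None
    a1 = eigs[0]
    for k0 in range(1, len(eigs)):
        if 3.0 * a1 < eigs[k0]:  # eigs[k0] is a_{k0+1} (0-based list)
            return k0
    return None


# Illustrative spectra, chosen by hand.
print(smallest_k0([1.0, 4.0, 9.0, 16.0]))  # gap already holds between a_1 and a_2
print(smallest_k0([1.0, 2.0, 2.5, 3.5]))   # needs a higher-dimensional projection
print(smallest_k0([1.0, 1.0, 5.0]))        # a_1 < a_2 fails
```

A return value of 1 corresponds to the one-dimensional setting of Hypothesis 3(c); larger values indicate how many modes the projected change of measure would need to include.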
Turning to the stochastic forcing, let (Ω, F , F t≥0 , P) be a complete filtered probability space.The spacetime white noise Ẇ is understood as the time-derivative of a cylindrical Wiener process W : [0, ∞) × H → L 2 (Ω) in the sense of distributions.The latter is a Gaussian family of random variables with covariance given by Given a separable Hilbert space (H 1 , ⟨. , .⟩H1 ) such that H is a linear subspace of H 1 and the inclusion map This identification is assumed throughout the rest of this paper without further distinction in notation.
Having introduced the necessary notation, we can recast (1) as a stochastic evolution equation on E given by A mild solution to the latter is defined as a process X ϵ satisfying for each ϵ and all t ∈ [0, T ], with probability 1.The last term is known as a stochastic convolution and will be frequently denoted by W A .Our assumptions guarantee that the E-valued paths of W A are continuous with probability 1 and This can be proved by the stochastic factorization method of Da Prato-Zabczyk [22] (see also Theorem B.6 in [52]).Moreover, for each ϵ > 0, (8) has a unique mild solution taking values in C([0, T ]; E) with probability 1 (see e.g.Theorem 2.2 in [19]).
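For intuition about the stochastic convolution, note that when the noise is expanded in the eigenbasis of Hypothesis 1(a), the projection of W_A onto a single eigenvector with eigenvalue a_n > 0 is a one-dimensional Ornstein-Uhlenbeck process started at zero, with variance (1 - e^{-2 a_n t}) / (2 a_n) at time t. The sketch below simulates one such mode with the exact Gaussian transition and compares the sample variance with this formula. The function names, the value a_n = 2 and the sample sizes are illustrative assumptions; this is not the discretization used in the paper's simulation studies.

```python
import math
import random


def mode_variance(a_n, t):
    """Variance at time t of the OU mode dX = -a_n X dt + dB, X(0) = 0."""
    return (1.0 - math.exp(-2.0 * a_n * t)) / (2.0 * a_n)


def simulate_mode(a_n, t, n_steps, rng):
    """Exact-in-law simulation of the mode on a uniform time grid."""
    dt = t / n_steps
    decay = math.exp(-a_n * dt)
    step_std = math.sqrt(mode_variance(a_n, dt))
    x = 0.0
    for _ in range(n_steps):
        x = decay * x + step_std * rng.gauss(0.0, 1.0)
    return x


rng = random.Random(0)
samples = [simulate_mode(2.0, 1.0, 50, rng) for _ in range(4000)]
sample_var = sum(s * s for s in samples) / len(samples)
print(sample_var, mode_variance(2.0, 1.0))
```

Multiplying each mode by √ϵ gives the corresponding component of the noise term in the mild formulation.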

MODERATE DEVIATIONS, IMPORTANCE SAMPLING AND ASYMPTOTIC THEORY
3.1. General theory and main results. In this section we present some theoretical aspects of subsolution-based importance sampling in the moderate deviation regime, applied to our problem of interest. First, we recall the notion of a Moderate Deviation Principle (MDP).
with compact sub-level sets.
(i) We say that the collection of C([0, T ]; X )-valued random elements {X ϵ } ϵ≪1 satisfies an MDP with action functional S x,T if, for all continuous and bounded g : C([0, T ]; X ) → R and all scalings h(ϵ) such that h(ϵ) → ∞ and where η ϵ x is defined as in (3).(ii) A Borel set E ⊂ C([0, T ]; X ) will be called an S x,T −continuity set if As mentioned in Section 1 we aim to compute probabilities of the form for ϵ ≪ 1, T > 0, where τ ϵ x * = inf{t > 0 : X ϵ x * (t) / ∈ D} and for some L > 0. Passing to the moderate deviation process η ϵ x and recalling that x * is a (stable) equilibrium of X 0 x we see that As will be shown in Section 3.4, η ϵ x * converges, as ϵ → 0, to the solution of a linear deterministic PDE with zero initial condition.Since 0 is the unique fixed point of this PDE, the limit process is bound to stay at 0 and lim ϵ→0 P (ϵ) = 0.This is why accelerated methods that estimate P (ϵ) when ϵ is small are useful.
In this paper, we will only work with unbiased estimators.Hence, minimizing the variance of the estimator is equivalent to minimizing the second moment.As we show below, an upper bound for the exponential decay rate of the second moment of any unbiased estimator can be determined in terms of the action functional S x,T .Lemma 3.1.Let P (ϵ) as in (12) and P (ϵ) be an unbiased estimator of P (ϵ) with respect to a probability measure P If {X ϵ } satisfies an MDP with action functional S x,T and where Ē denotes expectation with respect to the measure P.
Proof.We have Now, for any unbiased estimator P (ϵ), where we used Jensen's inequality.Thus where we used the continuity property of E in the last equality.□ As in finite dimensions (see e.g. the discussion in Section 2.2 in [29]) , the previous lemma shows that 2G T (0, 0) is the best possible exponential decay rate for any unbiased estimator.In turn, this motivates the following criterion for asymptotic optimality.
Definition 3.2.An unbiased estimator P (ϵ) of P (ϵ) defined on a probability space (Ω, F , P) will be called asymptotically optimal if In other words, an estimator is asymptotically optimal if its second moment achieves the best possible exponential decay rate in the limit as ϵ → 0.
Importance sampling involves changes of measure chosen to guarantee that the corresponding estimators achieve optimal (or nearly optimal) asymptotic behavior.Given a measurable feedback control (or change of measure) u : [0, T ] × H → H that is bounded on bounded subsets of H, we define a family of probability measures {P ϵ } ϵ>0 on (Ω, F ) such that, for all ϵ, P ϵ << P on F T and Using these new measures, it is straightforward to verify that As we show in the next lemma Q ϵ (u) admits a variational stochastic control representation which will be useful for studying its asymptotic behavior.A similar variational formula can be found in (2.5) of [53].
Lemma 3.2.Let u : [0, T ] × H → H be a measurable feedback control that is bounded on bounded subsets of H, where A is the collection of all H-valued, F t≥0 -adapted processes v defined on [0, T ] such that Proof.Let ϵ > 0. From the Cameron-Martin-Girsanov theorem, where is a cylindrical Wiener process under P ϵ .Using yet another change of measure with we can write where ηϵ x * solves and τ ϵ x * denotes the corresponding exit time for ηϵ x * .This follows, once again, from the Cameron-Martin-Girsanov theorem, as is a cylindrical Wiener process under the measure Pϵ .From (19) we see that the second moment of the estimator can be written as an exponential functional of the driving noise and, as such, it admits the variational representation (17) (see (2.5) in [53] as well as (14) in [57] for the finite-dimensional case).□ The form of the MDP action functional provides essential information for choosing changes of measure u that perform well asymptotically.In particular, if for all ϕ with S x,T (ϕ) < ∞ there exists a (local) Lagrangian L x defined on a subset of X × H, such that then "good" changes of measure are connected to subsolutions of the PDE with Here, H x denotes the Hamiltonian corresponding to L x via Legendre transform (up to a sign).In the problems we consider, the latter are not well-defined on the whole space but rather on a subset K × H ⊂ H × H, see e.g. ( 23) below.The notion of subsolution is meant in the sense of the following definition.
in the sense of Fréchet differentiation and satisfies The interested reader is referred to [31] for the original development of subsolution-based importance sampling.As we will show below (Theorem 3.1 and Remark 11), when x = x * , the MDP action functional takes the form (20) with and the corresponding Hamiltonian is given by A direct consequence of ( 20) is that we can construct an explicit stationary subsolution in terms of the corresponding quasipotential.The latter is given by and V x * (η) = ∞ otherwise.A physical interpretation of V x * (η) is that of the minimal "energy" required to push a path from 0 to the state η and its explicit form is a consequence of the fact that (8) is, in our setting, a gradient system (see e.g.[21], [22] Section 12.2.3 for SRDEs).In view of Hypotheses 1(a), 1(b), 3(a), 3(b) it follows that is a subsolution of ( 21) on K = Dom(A).The final condition is satisfied since a f 1 = inf n∈N a f n .Remark 3. In finite-dimensional systems, feedback controls (or changes of measure) defined by u(t, η) = −D η U (t, η) lead to nearly optimal asymptotic behavior (see [28] Section 2.3, [29] Theorem 2.4 for largedeviation and [57] Theorem 3.1 for moderate deviation-based schemes).A first issue that appears in infinite dimensions is that u(t, ηϵ,v x * (t)) is not well-defined since with probability 1 and for all t, ηϵ,v x * (t) / ∈ Dom(A).The latter is a consequence of the spatial irregularity of the noise.
Throughout the rest of this paper, P f n : H → H denotes an orthogonal projection to the n−dimensional eigenspace span{e f j } n j=1 and we consider the "projected" quasipotential , the subsolution U (t, P f n η) of ( 21) (with K = P f n H).The changes of measure we will use are given by with k 0 as in Hypothesis 3(c').For implementation purposes, u k0 is replaced by a sequence u ϵ k0 that converges to u k0 as ϵ → 0. For more details on the choice of u ϵ 1 see ( 63) and the discussion in Section 4 below.We can now present our main results on the asymptotic behavior of the scheme.Theorem 3.1.(Moderate Deviations) Let T > 0, L > 0 as in (13), k 0 as in Hypothesis 3(c'), u k0 as in (25), Q ϵ as in (16) and B H (0, L) ⊂ H denote the closed ball of radius L centered at the origin.Moreover let u ϵ k0 : [0, T ] × H → H be a sequence that converges pointwise and uniformly over bounded subsets of H to u k0 , and Under Hypotheses 1(a)-(c), 2(a),(b), 3(a),(b),(c') we have with the convention that the infimum over the empty set is ∞.
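The orthogonal projection onto the span of the first n eigenvectors amounts to computing n inner products in H. The sketch below illustrates this on a grid, assuming purely for illustration that the eigenvectors are the Dirichlet sine functions sqrt(2/ℓ) sin(jπξ/ℓ) on (0, ℓ); in general the eigenvectors of the linearized operator depend on the coefficient a, the boundary conditions and DF(x*). The names `eigenfunction` and `project`, the midpoint quadrature and the test function are all hypothetical.

```python
import math


def eigenfunction(j, xi, ell):
    """Assumed eigenvector: normalized Dirichlet sine mode on (0, ell)."""
    return math.sqrt(2.0 / ell) * math.sin(j * math.pi * xi / ell)


def project(values, grid, ell, n):
    """Coefficients <eta, e_j>_H, j = 1..n, by the midpoint rule.

    `values[i]` is eta evaluated at the cell midpoint `grid[i]`.
    """
    d_xi = ell / len(grid)
    return [
        sum(v * eigenfunction(j, xi, ell) for v, xi in zip(values, grid)) * d_xi
        for j in range(1, n + 1)
    ]


ell, m = 1.0, 400
grid = [(i + 0.5) * ell / m for i in range(m)]
# Test state with components only along the first and fourth modes.
eta = [3.0 * eigenfunction(1, xi, ell) + 0.5 * eigenfunction(4, xi, ell) for xi in grid]
coeffs = project(eta, grid, ell, 2)
print(coeffs)
```

Only the retained coefficients enter a projected feedback control, so the fourth-mode component of the test state is ignored when n = 2.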
The first statement above asserts that the limiting controlled trajectories exit the domain $D$ through the boundary near the direction of the eigenvector $e^f_1$ (see Hypotheses 3(c), (c')). Finally, (30) shows that, for any finite time horizon $T$, our scheme is close to asymptotic optimality, according to Definition 3.2, and achieves optimal behavior in the limit $\epsilon \to 0$, $T \to \infty$. Near asymptotic optimality is a common feature of importance sampling schemes for continuous-time dynamics even in finite dimensions. This is mainly a consequence of using subsolutions of (21) instead of exact solutions, which are rarely available in explicit form. Our numerical studies indicate that near optimality leads to superior performance in comparison to standard Monte Carlo.
Remark 5. The moderate deviation regime allows us to work with the exit problem of a linear equation instead of that of the original nonlinear SRDE (1). The "drift" of this linear equation is given by $A + DF(x^*)$ and thus the dominant eigenpairs of this operator govern the exit time and exit place asymptotics. As mentioned in the introduction, similar statements have been proved for finite-dimensional linear equations in [63] (see e.g. Theorem 6).
3.2.On the asymptotic exit direction.In this section we study the limiting variational problem appearing on the right-hand side of (27).In particular, we will show that, under Hypothesis 3(c'), changes of measure that force the dynamics in the e f 1 direction lead to minimal paths that exit from the ball BH (0, L) through the same direction.From this point on we will only use the notation S x,T to denote the explicit action functional Moving on to the variational problem in ( 27), we let and seek to characterize arg min y∈T I k0 (y).For the first part of this section we consider the case k 0 = 1 covered by Hypothesis 3(c).The more general setting of Hypothesis 3(c') will be studied in Proposition 3.1 below.For the sake of simplicity we will drop the superscript k 0 and write I ≡ I 1 and u ≡ u 1 unless otherwise stated.A first observation is that and for all such y the infimum above is uniquely attained by (see also Remark 4 above).Therefore, in view of (25), we can re-express I as follows: The last two terms in the last display are equal due to (25).Thus, It is straightforward to verify that arg min y∈T I(y) ̸ = ∅, i.e. the minimum value of I over the set is lower-semicontinuous and the second summand in (34) defines a continuous functional on the same set.Thus, I is itself lowersemicontinuous and furthermore T is closed in the topology of C([0, T ]; H) (recall that B H (0, L) in ( 26) is a closed ball in H).
Remark 6.We shall proceed to the characterization of minimizers in three steps.First we minimize over paths y with y(0) = 0 and y(τ ) = z ∈ ∂B H (0, L).Then we minimize over the exit place z and finally over the time τ in which the path y hits the boundary ∂B H (0, L) of the closed ball B H (0, L).At this point, we emphasize that, in contrast to τ ϵ x * (14), τ ϕ (Lemma 3.1),τ ϵ,v x * (Lemma 3.2) and τ ϵ,v ϵ x * (28), it is not known a priori whether the time τ is the first exit time of y from the open ball BH (0, L).We will show that the latter is true for minimizing paths in Lemma 3.4 and Proposition 3.1 below.
Proof.The fact that we minimize over Next notice that y * z,τ ∈ C([0, τ ]; H) since sinh is increasing and continuous.In particular, Proceeding to the proof we have, in view of (34), with L x * as in (22).Minimizers are then governed by the Euler-Lagrange equation which boils down to

Projecting to the eigenbasis {e
Letting and taking into account the initial and terminal conditions we obtain Thus, The next lemma is concerned with the exit direction when Hypothesis 3(c) holds.
Proof.Let ϕ * = ϕ * z,τ be a minimizer provided by Lemma 3.3.Notice that, since the Euler-Lagrange equations provide necessary conditions for minimality, any ϕ * ∈ arg min{I(y) : y ∈ C([0, τ ]; H), y(0) = 0, y(τ ) = z} will be of this form.After straightforward algebra we obtain . Now for each fixed τ , Hypothesis 3(c) guarantees that this quadratic form is minimized for z * ∈ ∂B H (0, L) such that z * k = 0 for all k ≥ 2 and z * 1 = ±L (see e.g.Theorem 3.4 in [53]).Then, is minimized for the largest possible τ i.e. for τ = T. Hence, since the order with which the variables are being minimized does not change the value of the minimum, we have min y∈T I(y) = I(ϕ * z * ,T ) and the minimizers y * = ϕ * z * ,T enjoy the desired properties.Finally, note that any element y * ∈ arg min{I(y); y ∈ T } is of the form ϕ * z * ,T .Indeed, fix the initial and terminal values y * (0) = 0, y * (τ ) = z ∈ ∂B H (0, L) and assume that y * does not satisfy the Euler-Lagrange equations (35).Since the latter provide necessary conditions for minimality, it follows that y * is not a minimizer.Moreover, it follows from the previous calculations that if τ < T or if z k ̸ = 0 for some k ≥ 2 then y * cannot be a minimizer of I.The proof is complete.□ As mentioned above, the previous lemma implies that, for any minimizing path y * , τ is in fact the first exit time from the open ball BH (0, L), i.e. τ = inf{t ∈ [0, T ] : y * / ∈ BH (0, L)} and furthermore τ = T .

Remark 7.
If the sampling time $T$ is large enough, the results of Lemma 3.4 as well as Theorems 3.1, 3.2 remain true under the weaker spectral gap assumption that $2a^f_1 < a^f_k$ for all $k \ge 2$. Since we are interested in schemes that perform well for large values of $T$, this generalization comes at no cost. For more details on this relaxed condition see [53], Theorem 3.9.
Up to this point we have worked under Hypothesis 3(c) to show that minimizers of the functional I lie on the one-dimensional subspace where the change of measure u acts.In the absence of a sufficiently large spectral gap the situation is more complicated.In particular, if the sampling time T is large enough, the minimizers can be orthogonal to u.In other words, forcing the system towards its physical exit direction e f 1 might actually lead to controlled trajectories that exit from a subspace that is orthogonal to e f 1 under the change of measure.This is proved in the following lemma.Lemma 3.5.Assume that the eigenvalues {a f k } k∈N are strictly increasing, a f 2 ≤ 2a f 1 and let If T > T * then any minimizer y * ∈ arg min{I(y); y ∈ T } satisfies ∥y * (t)∥ H < L for all t < T and ∥y * (T )∥ H = L. Moreover y * first exits BH (0, L) at τ = T in the direction of the eigenvector e f 2 (recall Remark 6) i.e. for all k ̸ = 2, ⟨y * (τ ), e f k ⟩ H = ⟨y * (T ), e f k ⟩ H = 0. Proof.As in the proof of Lemma 3.4 we have We claim that, without loss of generality, we can consider τ ∈ (T * , T ].Assuming the latter for now, we can compare the weights λ f k to conclude that and since x → x/(1 − e −2τ x ) is (strictly) increasing for all τ, it follows that and which follows from (37) by setting τ = T > T * .Since the infimum is achieved at t = T , the combination of ( 38) and ( 39) concludes the proof.□ Remark 8. Lemma 3.5 highlights the importance of sufficient spectral gaps for the design of efficient changes of measure.If Hypothesis 3(c) fails, a scheme that forces the e f 1 direction will be far from optimal and is expected to produce large errors for small values of ϵ.Under the assumptions of that lemma, one can repeat the arguments of the proof above to show

If the ratio $2a^f_1/a^f_2$ is large, this bound translates to sub-optimal performance as $\epsilon \to 0$ which does not improve as $T \to \infty$. Moreover, as we will see in Section 5, this ratio depends non-trivially on the interval length $\ell$ and is indeed large when $\ell$ is moderately small. This behavior is caused by the linearization of the dynamics and is completely absent when $f = 0$. For an example that satisfies the assumptions of Lemma 3.5 see Section 5.1.
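The proof of Lemma 3.5 relies on the map x ↦ x / (1 - e^{-2τx}) being strictly increasing for every τ > 0, and the proof of Lemma 3.4 selects the largest admissible exit time τ = T. Both monotonicity properties can be sanity-checked numerically; the grid ranges below are arbitrary illustrative choices, and the check is of course no substitute for the elementary calculus argument.

```python
import math


def weight(x, tau):
    """The map x -> x / (1 - exp(-2 * tau * x)) for x > 0 and tau > 0."""
    return x / (1.0 - math.exp(-2.0 * tau * x))


xs = [0.5 + 0.25 * i for i in range(11)]   # eigenvalue-like values in [0.5, 3.0]
taus = [0.1 * (j + 1) for j in range(20)]  # exit times in (0, 2.0]

# Strictly increasing in x for each fixed tau.
increasing_in_x = all(
    weight(xs[i], tau) < weight(xs[i + 1], tau)
    for tau in taus
    for i in range(len(xs) - 1)
)

# Strictly decreasing in tau for each fixed x, so larger exit times are cheaper.
decreasing_in_tau = all(
    weight(x, taus[j]) > weight(x, taus[j + 1])
    for x in xs
    for j in range(len(taus) - 1)
)

print(increasing_in_x, decreasing_in_tau)
```

The grids are kept moderate on purpose: for very large values of τx the exponential underflows relative to 1 in floating point and the strict comparisons in τ would no longer be meaningful.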
Before we conclude this section we consider once again the situation where the eigenvalues {a f k } k∈N do not satisfy Hypothesis 3(c) but instead Hypothesis 3(c') holds.We show that the conclusions of Lemma 3.4 can be recovered by projecting to a higher dimensional eigenspace of A + DF (x * ) consisting of the first k 0 eigenvalues.Proposition 3.1.Let k 0 as in Hypothesis 3(c'), U as in (24) and u k0 as in (25).Under Hypothesis 3(c') any minimizer y * ∈ arg min{I k0 (y); y ∈ T } satisfies the same properties as in Lemma 3.4.
Proof.Following the computations in (33), which carry over verbatim, we see that Since the second term is constant for each fixed value of the exit point y(τ ), the Euler-Lagrange equations and minimizers for this functional are then identical to those derived in Lemma 3.3 for I. Thus, for any minimizing path ϕ * z,τ that hits the point z = (z k ) k∈N ∈ ∂B H (0, L) at time τ ∈ [0, T ] we have Comparing the weights λ f k0,j we see that for all 1 < j ≤ k 0 ) is (strictly) increasing for all τ and a f 1 < a f 2 ≤ a f j for any j ≥ 2. In order to show that minimizers point towards z 1 it remains to compare λ f k0,1 with λ f k0,j for j ≥ k 0 + 1.Since λ k0,k0+1 ≤ λ k0,k0+2 ≤ . . . it suffices to consider λ f k0,k0+1 .In view of Hypothesis 3(c') and Theorem 3.4 of [53] we conclude that Let v ϵ be a sequence in A satisfying the assumptions of Theorem 3.2, u k0 as in ( 25) and u ϵ k0 : [0, T ] × H → H be a sequence that converges pointwise and uniformly over bounded subsets of H to u k0 .The goal of this section is to prove tightness estimates for the collection {η ϵ,v ϵ x * : ϵ < ϵ 0 } of C([0, T ]; X )−valued random elements.Throughout the rest of this section we drop the index k 0 and write u ≡ u k0 , u ϵ ≡ u ϵ k0 .Recall that for each ϵ, ηϵ,v ϵ x * is the unique mild solution of the controlled equation ( 18) with v = v ϵ , u = u ϵ .Existence and uniqueness is once again provided by Theorem 2.2 of [19] (see also Theorem 7.1 of [52]).The following lemma guarantees that, for ϵ small, the sequence v ϵ is bounded in L 2 .Lemma 3.6.There exists ϵ 0 > 0 and a constant C > 0 such that Proof.In view of the variational representation (17) Now from the MDP for bounded functionals (see Definition 3.1 as well as Remark 11 below), along with Lemma 3.1, there exists a constant C > 0 such that, for ϵ sufficiently small, Hence, from the uniform convergence of u ϵ to u and the uniform boundedness of u in bounded subsets of H and the fact that τ ϵ,v ϵ x * ≤ T with probability 1 the 
estimate follows.□ Remark 9. Without loss of generality, we can trivially extend the controls . This convention will be in use for the rest of this section.We shall now proceed to the proof of tightness estimates.Lemma 3.7.Let p ≥ 1.For all ϵ, T > 0, there exist ϵ 0 > 0, α, β > 0 such that Proof.Using the mild formulation we have We now fix a version of the process Ψ ϵ,v ϵ (t, ξ) and work path-by-path.The paths of Ψ ϵ,v ϵ are weakly differentiable with probability 1 and with A as in (4).Next, let t ∈ [0, T ] and choose ξ t ∈ [0, L] to be such that In view of Proposition A.1 in [52] (see also Proposition D.4 of [22]) we can estimate the left derivative of the supremum norm From the uniform ellipticity of A we have for all t ∈ [0, T ], AΨ ϵ,v ϵ (t, ξ t )sign Ψ ϵ,v ϵ (t, ξ t ) ≤ 0. Thus, in view of Hypothesis (2(a)) where M f1 is the Lipschitz constant of f 1 .To proceed, we distinguish the following two cases: Case 1: Since f 2 is non-increasing, Hence, Case 2: In this case it is straightforward to verify that The reader is referred to the proof of Theorem 6.1 of [52] for a similar argument.The latter, along with the optimality of ξ t , yields 43), (44) and the mean value inequality to obtain By Grönwall's inequality, where C T,ϕ = e 2M f 1 T .Since the latter holds for all t ∈ [0, T ] we obtain Turning to the control term, This is a consequence of the embedding W θ,p (O) → E which holds for smooth domains O ⊂ R d and all θ > d/p.From the smoothing property (6), the Cauchy-Schwarz inequality, the uniform convergence of u ϵ to u and (25) we have which holds w.p. 1 for θ < 1, ϵ sufficiently small and ρ > 0. As for the stochastic convolution term we have h(ϵ) → ∞ and ( 10) yields for ϵ small and some C > 0 independent of ϵ.The estimate is a consequence of the Sobolev embedding theorem along with heat kernel estimates and the stochastic factorization formula.Combining ( 42), ( 45), (46), Lemma 3.6 and Remark 9 we obtain and h(ϵ) → ∞ as ϵ → 0. 
Another application of Grönwall's inequality leads to which is the first estimate in (41). Note here that C does not depend on x * . Turning to the spatial Hölder regularity, an application of Taylor's theorem for Gâteaux derivatives yields for some θ 0 ∈ (0, 1). Let θ > 1/2 and α = (2θ − 1)/2. By virtue of the Sobolev embedding theorem (see e.g. Theorem 8.2 in [25]) and Hypothesis 2(b) we have In view of (47), Repeating similar arguments to the ones used in (46) we see that Moreover, we have the following well-known spatial equicontinuity estimate for the stochastic convolution The reader is referred to [22], Theorems 5.16, 5.22 for the proof and a detailed discussion of regularity properties of stochastic convolutions. Combining the latter with (49) and (50) we deduce that for each for some sufficiently small ϵ 0 . It remains to study the temporal equicontinuity of ηϵ,v ϵ Hence, From the estimates preceding (49) and the arguments in (46) we obtain and respectively. As for the stochastic convolution, there exists β ∈ (0, 1) such that (see e.g. [22], Theorem 5.22). Finally, let θ > 0, β ∈ (0, 1/2) such that β + θ/2 < 1. From the Sobolev embedding theorem and (5) Following the derivation of the estimates (49), (50), (51) (see also Lemma A.3 in [35]) we deduce that From the latter and (52)-(55), there exists a sufficiently small ϵ 0 and β > 0 such that This proves the last estimate in (41) and completes the proof. □ From Lemma 3.7, along with an infinite-dimensional version of the Arzelà-Ascoli theorem, it follows that the family of laws of the controlled processes {η ϵ,v ϵ x * } ϵ is concentrated on compact subsets of C([0, T ]; E), uniformly over sufficiently small values of ϵ. Thus, in view of Prokhorov's theorem (Theorem 3.3 below), it forms a relatively compact set in the topology of weak convergence of measures in C([0, T ]; E). In the next section we aim to characterize the limit points as ϵ → 0.

Limiting behavior of ηϵ,v ϵ
x * .Before we proceed to the main body of this section let us recall the notion of a tight family of probability measures and the classical theorem of Prokhorov.Definition 3.4.Let Z be a Polish space and Π ⊂ P(Z) be a set of Borel probability measures on Z and {P n } n∈N ⊂ Π.We say that (i) P n converges weakly to a measure (ii) Π is tight if for each ϵ > 0 there exists a compact set K ϵ ⊂ Z such that for all P ∈ Π, Prokhorov's theorem asserts that the notions of tightness and relative weak sequential compactness are equivalent for Borel measures on Polish spaces.Theorem 3.3.(Prokhorov) Let Z be a Polish space and Π ⊂ P(Z) be a tight family of Borel probability measures.Then every sequence in Π contains a weakly convergent subsequence.Lemma 3.8.Let ϵ 0 be sufficiently small, v ϵ be a sequence in A satisfying the assumptions of Theorem 3.2, u as in (25) and u ϵ : [0, T ] × H → H be a sequence that converges pointwise and uniformly over bounded subsets of H to u.Any sequence in {(η ϵ,v ϵ x * , v ϵ )} ϵ<ϵ0 has a further subsequence that converges in distribution in C([0, T ]; E) × L 2 ([0, T ]; H) to a pair (η v 0 x * , v 0 ) in the product of uniform and weak topologies.Moreover: (i) ηv 0 x * is equal in law to the (unique) solution of (ii) Any sequence in {τ ϵ,v ϵ x * ; ϵ < ϵ 0 } converges in distribution to a [0, T ]-valued random variable τ v 0 such that and for all t < τ v 0 , ηv 0 x * (t) ∈ B H (0, L) with probability 1 (recall that B H (0, L) denotes a closed ball on H).Proof.Starting from the controls v ϵ , Lemma 3.6 along with Remark 9 yield Since any bounded subset of L 2 ([0, T ]; H) is relatively compact in the weak topology, we deduce from the discussion after Lemma 3.7 that the family of laws of the pairs {(η v ϵ x * , v ϵ )} ϵ<ϵ0 is tight.By virtue of Prokhorov's theorem any sequence of such elements contains a subsequence (denoted with the same notation) that converge in distribution to a pair (η We remark here that L 2 ([0, T ]; H) with the weak 
topology is not globally metrizable, hence not a Polish space, and Prokhorov's theorem is not directly applicable.However the same conclusions can be drawn by a more general version of the theorem (e.g.Theorem 8.6.7 in [8]).Invoking Skorokhod's theorem we can now assume that this convergence happens almost surely.This theorem involves the introduction of a new probability space with respect to which the convergence takes place.This will not be reflected in our notation for the sake of convenience.We will now characterize the law of ηx * .
(i) Recall that for all t with probability 1. Starting from the last term, the estimate (10) yields 1 h(ϵ) W A −→ 0 in L p (Ω; C([0, T ]; E)) for any p ≥ 1. Next, from Lemma 4.7 in [53] we have The first term on the right hand side converges to 0 by our assumptions along with (47).The almost sure convergence of ηϵ,v ϵ x * and the continuity of u (see (25)) along with the dominated convergence theorem imply the convergence of the second term to 0. Next, in view of (48), Hypothesis 2(b) and the dominated convergence theorem we have as ϵ → 0. Uniqueness of (56) along with a subsequence argument complete the proof.
3.5.Proof of Theorem 3.1.Before we move on to the proof we remind the reader that the index k 0 has been dropped.Let ϵ > 0. Returning to (17), choose a sequence {v ϵ } ⊂ A of approximate minimizers such that (28) holds.Since u ϵ converges uniformly to u over bounded subsets, there exists ϵ 0 sufficiently small such that for any δ > 0 and ϵ < ϵ 0 From the variational representation (17), Lemma 3.6 and the assumptions on u ϵ and u there exists ϵ 0 sufficiently small such that Thus, there exists a sequence in ϵ over which the left hand side in (57 is lower semi-continuous in the product of uniform, weak and standard topologies, we can pass to a further subsequence and apply the Portmanteau lemma along with Lemma 3.8 to obtain with T as in (26).Since δ is arbitrary, the upper bound is complete.To obtain a lower bound we will use the conclusions of Proposition 3.1 for the limiting variational problem.To this end let y * satisfy As we mentioned in Section 3.2, the optimization problem on the left-hand side has an explicit solution attained by and from Proposition 3.1, T = inf{t > 0 : ∥y * (t)∥ H = L} = τ y * .Now consider the processes ηϵ,v x * controlled by v. From Lemmas 3.7, 3.8, {η ϵ,v x * ; ϵ > 0} is tight and converges in distribution to a process ηv x * .From the choice of v and uniqueness of solutions it follows that ηv x * = y * with probability 1.Moreover, the exit times τ ϵ,v x * converge in distribution to a random time τ v which is no less than the first exit time of y * from BH (0, L).Since the latter is equal to T it follows that τ v = T with probability 1.Thus where the second inequality follows from lower semi-continuity.Combining ( 58) and ( 60) allows us to conclude.
Remark 11. Theorem 3.1 is essentially equivalent to an MDP for the family {X ϵ } ϵ of solutions of (8), in the space C([0, T ]; E). The latter is an asymptotic statement for exponential functionals of g(X ϵ ), where g : C([0, T ]; E) → R is continuous and bounded (see Definition 3.1), while the former covers exit probabilities and corresponds to the choice g = g with The case of bounded continuous test functions is in fact simpler, does not require analysis of the limiting variational problem and can be proved using very similar arguments to the ones used above. To be precise, for any continuous, bounded g : C([0, T ]; E) → R the variational representation (17) takes the form according to the classical results of [14]. The controlled process η ϵ,v solves (18) with u = 0 and A is a collection of square-integrable adapted controls. The tightness and limiting statements of Lemmas 3.7, 3.8 carry over verbatim after setting u = 0, and (11) then follows with the same action functional (31) by proving an upper and a lower bound as above. In particular, the upper bound is a consequence of lower semicontinuity and the lower bound follows by considering the minimizing control v in (59). In fact, this simpler MDP is used to obtain Lemma 3.6 above, which is important for the case of unbounded functionals that we consider here.
3.6.Proof of Theorem 3.2.Let {v ϵ } ⊂ A satisfy (28).From Lemma 3.8, Theorem 3.1 and the lower semicontinuity argument in (58) we know that the triples Invoking Lemma 3.8 once again we have ηv 0 x * ∈ T and v 0 ∈ C ηv 0 x * ,x * with probability 1.Since the left-hand side is the infimum over all such paths and controls it follows that 1 2 with probability 1.Thus, from Proposition 3.1 we can conclude that τ ϵ,v ϵ x * → T in probability as ϵ → 0, ηv 0 x * (T ), e f 1 2 H = L 2 with probability 1 and (29) follows.It remains to prove (30).We start from the upper bound which is a consequence of Lemma 3.1, provided that E = {ϕ ∈ C([0, T ]; H) : τ ϕ ≤ T } is a S x * ,T −continuity set.This property can be verified from the analysis of Section 3.2.In particular, Lemmas 3.3, 3.4 and Proposition 3.1 remain true after setting the second summand in (34) or (40) equal to 0. Hence the infima of the action functional over {τ ϕ ≤ T }, {τ ϕ < T } and {τ ϕ = T } coincide and the estimate follows.As for the lower bound, we combine Theorem 3.1, ( 36) and ( 24) to obtain The latter shows that the lower bound actually holds with equality, hence the proof is complete.

IMPLEMENTATION AND PRE-ASYMPTOTIC ANALYSIS OF THE SCHEME
4.1. Implementation issues and exponential mollification. In Section 3, we demonstrated that, under fairly general spectral gap conditions, an importance sampling scheme using the change of measure u k0 (25) achieves nearly optimal asymptotic behavior as the noise intensity ϵ → 0. However, changes of measure based only on the quasipotential subsolution U (24) can lead to poor pre-asymptotic performance. This issue is present even in finite dimensions and is related to the behavior of the controlled dynamics near the origin. In [29], the authors demonstrated that, for certain choices of controls v, the second moment of the estimator degrades over time. In these situations, the system tends to spend a large amount of time near the attractor, thus accumulating a large running cost which affects the variance. As a result, for fixed ϵ > 0 the pre-exponential terms, which are ignored by the asymptotic bounds (30), dominate and can even lead to errors that increase exponentially as T grows. For more details the reader is referred to the discussion in [29], pp. 2919-2921.
In infinite dimensions, an additional challenge appears when the changes of measure act on the full space H.As we will see in Lemma 4.1 below, in order to prove that the second moment of a scheme behaves well for any fixed ϵ > 0, one needs to have good control over the quantity where Z x * denotes a subsolution used for the analysis of the scheme.However, any radial function Z : Thus, apart from dealing with the difficulties related to unbounded operators (see Remark 3), changes of measure for SRDEs that effectively accomplish dimension reduction are necessary for provably efficient performance.
In this section we construct a scheme under Hypothesis 3(c), i.e. our changes of measure only force the e f 1 direction. From this point on it is understood that u ≡ u 1 and u ϵ 1 ≡ u ϵ . In order to deal with the aforementioned issues, our changes of measure u ϵ will meet the following criteria: 1) The projected quasipotential subsolution (denoted below by F 1 ) will be used for regions of space that are sufficiently far from the origin. 2) A constant subsolution F ϵ 2 will dominate near zero. F ϵ 2 does not influence the dynamics until they enter the domain where F 1 dominates. 3) To avoid issues from lack of smoothness, the combination of F 1 , F ϵ 2 should be appropriately mollified. 4) As ϵ → 0 the changes of measure u ϵ converge to the asymptotically nearly optimal u. A suitable choice is provided by the exponential mollification of F 1 , F ϵ 2 . To be precise, we define for a f 1 , e f 1 as in Hypothesis 3(c), κ ∈ (0, 1) and δ = δ(ϵ) > 0 ), η ∈ H and consider the exponential mollification We implement our scheme using the change of measure where δ is the mollification parameter and κ is a parameter that controls the size of the neighborhood outside of which F 1 dominates.
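To illustrate how an exponential mollification combines two subsolutions, the following minimal Python sketch assumes the standard form U^δ = −δ log(e^{−F_1/δ} + e^{−F_2/δ}) used in the finite-dimensional literature (e.g. [29]); the displayed definitions of F 1 , F ϵ 2 are not reproduced here, so the inputs are plain numbers and the function names `exp_mollify` and `mollified_slope` are hypothetical. The weights returned below play the role of the mixing weights ρ_i = e^{−F_i/δ} / Σ_j e^{−F_j/δ}, and the derivative of U^δ in a fixed direction is the ρ-weighted average of the derivatives of the F_i.

```python
import math


def exp_mollify(values, delta):
    """Exponential mollification U = -delta * log(sum_i exp(-F_i / delta)).

    `values` holds the numbers F_i evaluated at one point (assumed inputs).
    Returns (U, weights) with weights rho_i = exp(-F_i/delta) / sum_j exp(-F_j/delta).
    """
    smallest = min(values)
    # Shift by the minimum so that the exponentials cannot overflow.
    terms = [math.exp(-(v - smallest) / delta) for v in values]
    total = sum(terms)
    mollified = smallest - delta * math.log(total)
    weights = [t / total for t in terms]
    return mollified, weights


def mollified_slope(slopes, weights):
    """Directional derivative of U: the weighted average of the slopes of the F_i."""
    return sum(w * s for w, s in zip(weights, slopes))
```

With two inputs, the mollified value always lies between min(F_1, F_2) − δ log 2 and min(F_1, F_2), so for small δ it tracks the smaller of the two functions while remaining smooth.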
In order to derive non-asymptotic bounds for the second moment of the estimator, we will use the following min/max representation for the Hamiltonian (see e.g.[29,30]) and for any smooth functions A consequence of this expression is the following pre-asymptotic bound for the second moment: Lemma 4.1.For any smooth functions U x * , Z x * : [0, T ] × H → R, D x * as in (61) and some θ 0 ∈ (0, 1) let and For all ϵ > 0 we have The proof makes use of Itô's formula and is deferred to Appendix A.

Remark 12.
The term H ϵ x * accounts for the error coming from the local approximation of the nonlinear dynamics by their linearized version around the stable equilibrium x * .A significant part of this section is devoted to the pre-asymptotic control of this term.The rest of this section is devoted to the pre-asymptotic analysis of Q ϵ (u ϵ ) based on the lower bound (67) with (68) 4.2.Performance analysis of the scheme.At this point we shall recall the definition of the random times where ηϵ,v x * solves (18).Before we state the main result of this section, we provide the definition of exponential negligibility; a concept which will be frequently used in the sequel.Definition 4.1.A term will be called exponentially negligible (a) in the moderate deviations range if it can be bounded from above in absolute value by C 1 e −c2h 2 (ϵ) /h 2 (ϵ) where C 1 < ∞, c 2 > 0 (b) in the large deviations range if (a) holds with 1/h 2 (ϵ) replaced by ϵ.
The analysis of this section is summarized in the following theorem. Its proof is postponed to the end of this section and is preceded by several auxiliary estimates.
Then, up to exponentially negligible terms in the moderate deviations range, Moreover, if h(ϵ) is such that $\sqrt{\epsilon}\,h^3(\epsilon) \to 0$ as ϵ → 0 then for ϵ sufficiently small we have Remark 14. Note that for a small fixed ϵ, (69) shows that, in theory, the second moment degrades as the sampling time T grows. This degradation is caused by the linearization error (65) and suggests that, in practice, good performance lies in the balance between ϵ and T. Fortunately, (70) shows that this theoretical degradation is no longer present if the scaling h(ϵ) does not grow too fast. Moreover, the simulation studies of Section 6 show that our scheme performs well for large T even when this growth assumption is not satisfied.
The following lemma collects a few straightforward computations that will be used below.Its proof can be found in Appendix A.

Returning to (67) we have the following decomposition
Remark 15. This decomposition allows us to deal with the cubic power of η that appears in (65). Since we are only controlling the spatial $L^2$-norm of the moderate deviation process, this term is problematic. In particular, estimates based on the a priori bound (47) will introduce T-dependent constants which are not desirable for the pre-asymptotic analysis.
The last term in (72) concerns the behavior of the controlled process ηϵ,v in the event that it exits an $L^\infty$-ball of radius $1/(\sqrt{\epsilon}\,h(\epsilon))$ before it exits $B_H(0, L)$. Since the latter is a very rare event in the moderate deviations range, we expect that this term is exponentially negligible. This claim is proved in the following proposition.
Proposition 4.1.The term is exponentially negligible in the moderate deviations range for ϵ sufficiently small.
In view of ( 7), Moreover, from (71) we have where we used that ζ, ρ ∈ (0, 1), and h(ϵ) > 1. Combining the last two estimates we deduce that for any v ∈ A, An application of Hölder's inequality along with ( 41) yields and, as ϵ → 0, { Xϵ,v x * } ϵ>0 satisfies a large deviation principle in C([0, T ]; L ∞ (0, ℓ)) with action functional where the convention inf ∅ = +∞ is in use (see e.g.[19], Theorems 6.2, 6.3).Passing to a convergent subsequence if necessary, we deduce that Hence, for ϵ sufficiently small 2 ≤ e − inf ϕ∈B∞(x * ,L) c S x * ,T (ϕ)/4ϵ .Finally, we claim that inf ϕ∈B∞(x * ,L) c S x * ,T (ϕ) > 0. Indeed, since the action functional is lower semicontinuous (see Lemma 5.1, [19]) and The last inequality fails if and only if u * = 0 almost everywhere in [0, T ] × [0, ℓ].Since x * is an equilibrium of the uncontrolled system, the latter implies that ϕ * (t) = x * for all t ∈ [0, T ], hence ϕ * / ∈ B ∞ (x * , L) c .This contradicts the initial choice of ϕ * and concludes the argument.Therefore, the term of interest is exponentially negligible in the large deviation range hence also in the moderate deviation range.□ Next, we turn our attention to the third term in (72).The linearization error in this term is easier to control, since the process √ ϵh(ϵ)η ϵ,v is uniformly bounded by 1 in L ∞ −norm.This fact is used in the following lemma whose proof can be found in Appendix A.3.
As for H ϵ x * (U δ , Z)(t, η) − H ϵ x * (Z x * )(t, η), straightforward algebra along with the arguments of Lemma 4.2 of [29] yield where the quantity β 0 (η Combining the latter with (73) and substituting δ = 2/h 2 (ϵ) and At this point we partition and the constants α, ζ, κ ∈ (0, 1), K < 0 will be chosen later.The remaining part of this section is devoted to the study of the right-hand side of (74) on each component separately.
We address the region B f 3 in the following lemma.Lemma 4.5.Let κ ∈ (0, 1), or, if h(ϵ) is such that √ ϵh 3 (ϵ) → 0, then for sufficiently small ϵ we have It is the most problematic region as there is no guarantee that the weight ρ ϵ is exponentially negligible or of order one.The analysis is deferred to Appendix A.6.
where C does not depend on ϵ or, if √ ϵh 3 (ϵ) −→ 0, there exists ϵ sufficiently small such that Combining the three previous lemmas we arrive at the following regarding the third term in (72) Lemma 4.7.There exists a constant C independent of T > 0 such that for ϵ sufficiently small, up to exponentially negligible terms in the moderate deviations range.Moreover, if h(ϵ) is such that lim ϵ→0 √ ϵh 3 (ϵ) = 0 then, for ϵ sufficiently small, up to exponentially negligible terms in the moderate deviations range.
Proof. (i) From Lemmas 4.4, 4.6(i), 4.5(i) we have with probability 1, up to exponentially negligible terms. Since τ ϵ,v x * ∧ τ ϵ ∞ ≤ T with probability 1 and the constant is deterministic, the estimate follows by taking expectation.
(ii) The estimate follows from Lemmas 4.4, 4.6(ii), 4.5(ii). □ We conclude this section with the proof of Theorem 4.1.
Proof of Theorem 4.1.In view of Lemmas (4.1) and (4.7)(i), (72) yields up to exponentially negligible terms.In view of (68), we have x * ) ] = 0. Thus for ϵ sufficiently small we may write As for the first term, since U δ is the exponential mollification of two functions, Lemma 4.1 of [29] gives that Finally, the improved bound (70) follows by invoking Lemma (4.7)(ii).□

THE CASE OF A DOUBLE-WELL POTENTIAL
In this section we specialize our results to SRDEs in which the differential operator is A = ∆ (i.e. the second derivative operator in one spatial dimension) and the reaction term takes the form f = −V ′ f , where V f is a double-well potential such as the one depicted below. This choice is possible in view of Hypotheses 2(a), 2(b), which allow arbitrary polynomial growth. Thus, we assume that V f has two global minima and a local maximum which, for simplicity, is assumed to lie at the origin. Without loss of generality, we take Such SRDEs arise as scaling limits of particle systems with nearest-neighbor coupling that evolve in the inverted potential −V f (see e.g. [4], Chapter 1) and provide one of the simplest examples of non-trivial dynamical behavior.
The deterministic reaction-diffusion equation posed on the interval (0, ℓ) has two stable equilibria x * − , x * + , corresponding to the global minima, and a saddle point x * 0 , corresponding to the local maximum, that is identically equal to 0. The equilibria x * ± exist only if ℓ > π under Dirichlet boundary conditions, and for all ℓ > 0 under Neumann and periodic conditions. Moreover, every time the interval length ℓ crosses the value kπ (for Neumann or Dirichlet b.c.) or 2kπ (for periodic b.c.), for some k ∈ N, two (resp. one) non-constant saddle points ±x * k,ℓ (resp. x * p,k,ℓ ) bifurcate from x * 0 . The k-th non-constant saddle points feature k kink-antikink pairs in the periodic and Dirichlet cases and k kinks in the Neumann case. The interested reader is referred to [4], Section 2.1 and [32] for the bifurcation analysis of the problem with different boundary conditions.
For most of the sequel, we specialize the discussion to the potential The corresponding bistable stochastic dynamics are governed by the (stochastic) Allen-Cahn equation The noiseless (ϵ = 0) equation was proposed in [2] as a simple model of phase separation of two-component alloy systems. It is also known in the literature as the real Ginzburg-Landau equation [46] (due to its connections with the physical superconductivity theory bearing the same name) or the Chafee-Infante problem [20]. Transitions between the stable states x * ± that correspond to the absolute minima ±1 are enabled by the stochastic forcing and have been studied as models of quantum tunneling phenomena [32] and thermally induced magnetization reversal of micromagnets [48]. For studies of transition times the interested reader is referred to [5,6] and [48,50] in the mathematical and physical literature respectively.
For ξ ∈ (0, ℓ), n = 1, 2, . . ., the corresponding eigenfunctions are (78) For both cases, the stable equilibria are the constant functions x * ± (ξ) = ±1, ξ ∈ (0, ℓ).The reaction term is f (x) = x − x 3 and the linearized operators ∆ + DF (x * ± ) acting on a function y are given by Both cases can be treated simultaneously after indexing the eigenpairs by the natural numbers.In particular, in the Neumann case, the eigenvalues {a f n } from Hypothesis 3(b) are shifted eigenvalues of the (negative) Laplacian, i.e.
and the sequence of eigenvectors {e f n } coincides with {e N e n−1 }.In the periodic case we set a f 1 = 2, a f 2n = 2 + a per n , a f 2n+1 = 2 + a per −n and e f 1 = e per 0 , e f 2n = e per n , e f 2n+1 = e per −n for n = 1, 2, . . . .Turning to the spectral gap conditions, Theorems 3.1 and 3.2 hold for any value of ℓ provided that the change of measure u k0 acts on a finite-dimensional eigenspace of sufficiently high dimension.For example, consider the Neumann problem with ℓ = 4π/3.For this value of ℓ, (76) has 3 saddle points (see e.g.[55], Chapter 5.3.4) and it is easy to check that Hypothesis 3(c) is violated.However, the weak spectral gap of Hypothesis 3(c') is satisfied for k 0 = 3.Indeed, we have Thus, the asymptotic results hold with the change of measure As for the pre-asymptotic analysis of Section 4 and the numerical studies of the following section we work under the stronger spectral gap of Hypothesis 3(c).For the Neumann problem, this places the restriction For the periodic problem, Hypothesis 3(c) gives Finally, an example where the assumptions of Lemma 3.5 are satisfied is given by the Neumann problem with ℓ ≥ π/ √ 2. In this case it is straightforward to verify that
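As a sketch of this verification, assume that the Neumann eigenvalues of −∆ on (0, ℓ) are ((n − 1)π/ℓ)², n ≥ 1, and that DF (x * ± ) acts as multiplication by f ′(±1) = −2 for f (x) = x − x 3 (both are assumptions made here, since the corresponding displays are not reproduced above). Then the condition a f 2 ≤ 2a f 1 of Lemma 3.5 reads

```latex
a^f_n = 2 + \frac{(n-1)^2\pi^2}{\ell^2}, \qquad
a^f_2 \le 2a^f_1
\iff 2 + \frac{\pi^2}{\ell^2} \le 4
\iff \ell \ge \frac{\pi}{\sqrt{2}} .
```

Under the same assumptions, ℓ = 4π/3 gives π²/ℓ² = 9/16, hence a f 1 = 2, a f 2 = 2.5625 and a f 3 = 4.25, so that a f 2 ≤ 2a f 1 in that example as well.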

Stochastic Allen-Cahn with
However, exact spectral analysis and numerical simulation of the linearized operators is more involved than the periodic and Neumann cases.This is due to the fact that the stable equilibria x * ± are non-constant functions with absolute value less than or equal to 1 that vanish at the endpoints 0, ℓ.They can be determined by solving the Sturm-Liouville problem Following [60] (see also [16]), we can parametrize x * ± with respect to their minimum pointwise distance from the constant solutions ±1.The latter is in one-to-one correspondence with the bifurcation parameter ℓ (see (80) below).
First, note that the scaling y(ξ) = x(ℓξ) leads to the equivalent problem For any a ∈ (0, 1), the stable equilibria y * ± of the latter are then given by ±y * , where for any m ∈ (0, 1), is the complete elliptic integral of the first kind and sn(•, m) is the Jacobi elliptic sine function defined by The function sn(•, m) can be periodically extended to all of R so that K(m) is its quarter-period.We remark that there are several different parameterizations of K in the literature (e.g. in [60] K(ξ) corresponds to K( √ ξ) in our notation).The definition above was chosen in agreement with [1] and the corresponding built-in Matlab function.
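The relation between the amplitude parameter a and the interval length ℓ can be cross-checked numerically. The sketch below is not taken from [60]: it assumes the unscaled stationary equation x'' + x − x³ = 0 on (0, ℓ) with Dirichlet conditions and uses its first integral, which gives ℓ = 2 √(2/(2 − a²)) K(k) with modulus k = a/√(2 − a²). The helper `ellipk_modulus` takes the modulus k (the parameter used by Matlab's `ellipke` would be m = k²) and evaluates K through the arithmetic-geometric mean; both function names are hypothetical.

```python
import math


def ellipk_modulus(k):
    """Complete elliptic integral of the first kind K(k) for modulus 0 <= k < 1.

    Uses the identity K(k) = pi / (2 * AGM(1, sqrt(1 - k^2))).
    """
    a, b = 1.0, math.sqrt(1.0 - k * k)
    # The AGM iteration converges quadratically; 40 sweeps are far more than needed.
    for _ in range(40):
        a, b = 0.5 * (a + b), math.sqrt(a * b)
    return math.pi / (2.0 * a)


def interval_length(amplitude):
    """Length ell whose positive Dirichlet equilibrium of x'' + x - x^3 = 0
    has maximum value `amplitude` (assumed to lie strictly between 0 and 1)."""
    if not 0.0 < amplitude < 1.0:
        raise ValueError("amplitude must lie in (0, 1)")
    s = 2.0 - amplitude * amplitude
    modulus = amplitude / math.sqrt(s)
    return 2.0 * math.sqrt(2.0 / s) * ellipk_modulus(modulus)
```

Under these assumptions `interval_length(a)` tends to π as a → 0, in line with the requirement ℓ > π for the Dirichlet problem, and `interval_length(math.sqrt(2) / 2)` evaluates to approximately 4.0043, the value of 2M( √ 2/2) quoted in this section.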
The parameter a is the maximum value of y * a i.e.
In order to convert to a parameterization in terms of ℓ, we first define a scaled quarter-period map M : (0, 1) → R with The correspondence of the interval length ℓ and a is then given by As seen in the Figure 1, M is continuous, strictly increasing and lim a→1 M(a) = ∞.Thus M is continuously invertible.Furthermore it is straightforward to verify that lim a→0 M(a) = π/2.Putting the previous facts together we deduce that Turning to the spectral properties of the linearized operators ∆ + DF (x * ± ), they have a countable sequence of eigenvalues-eigenvectors {(a f n , e f n )} n∈N , hence they satisfy Hypothesis 3(b).The first two pairs have been computed explicitly in [60] and are given by and where dn, cn denote the Jacobi delta amplitude and elliptic cosine functions The spectral gap of Hypothesis 3(c) is then satisfied if where we used the monotonicity of M and 2M( √ 2/2) ≈ 4.0043.Plots of the equilibria y * a and eigenfunctions e f 1,a for a = 0.65, 0.95 are given in Figures 2, 3   5.3.A higher-order Ginzburg-Landau SRDE.We conclude this section with an example of an SRDE with a higher-order polynomial nonlinearity.This time we consider a potential given by If µ > −1 then V f is a double-well potential with steeper walls than the fourth-order case.Such potentials have been considered in the physical literature as higher order quantum mechanical models, see e.g.[45].
The nonlinear reaction term is given by f As in the Allen-Cahn case, Theorems 3.1 and 3.2 hold for any value of ℓ provided that the change of measure u k0 acts on a finite-dimensional eigenspace of sufficiently high dimension.The pre-asymptotic analysis of Section 4 holds under the spectral gap of Hypothesis 3(c).In the Neumann case the spectral gap holds if µ + 2 and in the periodic case

NUMERICAL SIMULATIONS
In this section we demonstrate the theoretical results of this paper by a series of simulation studies for (8). As explained in Section 3, we start the process X ϵ x at a stable equilibrium x = x * and develop a scheme that computes exit probabilities of the form For the simulations that follow we fix L = 1 and set R = R(ϵ) = √ ϵh(ϵ). In view of Remark 10 we have and the process η ϵ x * (3) converges in distribution to 0 as ϵ → 0. Hence, for ϵ small, we are dealing with rare events. We will apply the scheme of Section 4 to the examples of Section 5 and compare its performance to standard Monte Carlo, which corresponds to no change of measure at all. It is clear that in order to simulate the mild solutions in (9), (42) we need to discretize the equation in time and space. In the simulations below we used the exponential Euler scheme with a finite-dimensional Galerkin projection, as described in [43]. In particular, with ηϵ = ( Xϵ − x * )/ √ ϵh(ϵ), u ϵ as in (25) and Xϵ solving we simulate the mild solution Xϵ on the sampling window t ∈ [0, T ] until it hits ∂D. Its N -th Galerkin projection is given in mild formulation by where P N denotes a projection to the N -dimensional subspace of H spanned by the eigenvectors e 1 , . . ., e N of A (not to be confused with the linearization eigenvectors e f n of Hypothesis 3(b)). Turning to the time discretization, we consider a time-step h = T /∆t for some ∆t ∈ N, discretization times t k = kh, k = 0, . . ., ∆t and set Θ N 0 := P N x * . The exponential Euler scheme is then given by k , e j ⟩ H the numerical scheme for the approximation of (84) is then given by where for k = 0, . . ., ∆t − 1, j = 1, . . ., N, ξ k,j are independent standard normal random variables. For Neumann and periodic boundary conditions, the pairs (a j , e j ) are given by (77), (78). Since the changes of measure u ϵ act only in the direction of e f 1 and the latter coincides with e 0 (i.e. a constant function) we have that u N k,j = 0 when j ≠ 0.
However, the eigenvalue $a_0$ is in both cases equal to 0. Hence the exponential Euler scheme is not well-defined for $j = 0$. For this reason we simulate $\Theta^N_{k+1,0}$ via an explicit Euler scheme, i.e.
where k , e 0 ⟩ H and $w_{k,0}$ are once again independent standard normal random variables. The computation of the coefficients $\{f^N_{k,j}\}$ in the Neumann (respectively periodic) case can be efficiently performed by applying a forward-backward odd (resp. periodic or Hartley-type) Fast Fourier Transform (FFT) in an iterative fashion. For more details on the discrete Fourier transform and the FFT algorithm the reader is referred to [11] (Chapters 6 and 8 respectively).
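The two update rules described above (exponential Euler for modes with a nonzero eigenvalue, explicit Euler for the zero mode) can be illustrated with a minimal sketch. It is not the authors' MPI C code. It assumes that each Galerkin coefficient solves $d\Theta_j = (-a_j \Theta_j + g_j)\,dt + \sqrt{\epsilon}\,d\beta_j$ with $a_j \ge 0$, where the hypothetical array `g` collects the projected nonlinearity and any control term. The function name `galerkin_step` and the sign convention for `a` are assumptions made for illustration only.

```python
import math

def galerkin_step(theta, a, g, eps, h, xi):
    """Advance the Galerkin coefficients by one time step of size h.

    Assumed model for mode j (illustrative, not taken from the paper):
        d(theta_j) = (-a[j] * theta_j + g[j]) dt + sqrt(eps) d(beta_j)
    xi holds one standard normal draw per mode.
    """
    new_theta = []
    for th, aj, gj, z in zip(theta, a, g, xi):
        if aj == 0.0:
            # Zero eigenvalue: use the explicit Euler (Euler-Maruyama) update.
            new_theta.append(th + h * gj + math.sqrt(eps * h) * z)
        else:
            # Exponential Euler: the linear part is integrated exactly.
            decay = math.exp(-aj * h)
            drift = (1.0 - decay) / aj * gj
            noise_sd = math.sqrt(eps * (1.0 - decay * decay) / (2.0 * aj))
            new_theta.append(decay * th + drift + noise_sd * z)
    return new_theta

# One noise-free step for two modes: a zero mode and a mode with a = 2.
print(galerkin_step([1.0, 1.0], [0.0, 2.0], [0.5, 0.0], 0.01, 0.1, [0.0, 0.0]))
```

Treating the zero mode separately avoids the division by $a_j$ in the exponential Euler drift and noise factors, which is the reason given above for switching schemes at $j = 0$.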
Turning to the stochastic Allen-Cahn equation with Dirichlet boundary conditions, the simulations require an additional step. As discussed in Section 5, the stable equilibrium $x^*$ (81) is no longer a constant function and the changes of measure $u^\epsilon$ push towards $e^f_1$ (82), which no longer coincides with a single eigenvector $e^{\mathrm{Dir}}_k$ (79). Thus, one needs to express $x^*$ and $e^f_1$ in terms of the eigenbasis $\{e^{\mathrm{Dir}}_k\}_{k \in \mathbb{N}}$ of the Laplacian and then perform the exponential Euler scheme (85). If the changes of measure acted on a higher-dimensional eigenspace, this step essentially reduces to a change of basis which can be computed with numerical linear-algebraic methods. Regarding the coefficients $\{f^N_{k,j}\}$, these can be computed by applying a forward-backward even Fast Fourier Transform iteratively.
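As a rough illustration of the expansion step, the sketch below approximates the coefficients of a given function in the sine basis $e_k(\xi) = \sqrt{2/\ell}\,\sin(k\pi\xi/\ell)$, the standard $L^2$-normalized Dirichlet eigenfunctions on $(0, \ell)$, which we assume match (79). It uses a plain midpoint rule instead of an FFT, and the test function $\sin(\pi\xi)$ is only a placeholder, since the explicit forms of $x^*$ (81) and $e^f_1$ (82) are not reproduced here. The name `dirichlet_coefficients` is hypothetical.

```python
import math

def dirichlet_coefficients(g, n_modes, ell=1.0, n_grid=2000):
    """Approximate <g, e_k> for e_k(x) = sqrt(2/ell) * sin(k*pi*x/ell),
    k = 1..n_modes, with the midpoint rule on n_grid cells of (0, ell)."""
    dx = ell / n_grid
    xs = [(i + 0.5) * dx for i in range(n_grid)]
    gx = [g(x) for x in xs]
    coeffs = []
    for k in range(1, n_modes + 1):
        s = sum(v * math.sin(k * math.pi * x / ell) for v, x in zip(gx, xs))
        coeffs.append(math.sqrt(2.0 / ell) * s * dx)
    return coeffs

# Placeholder input: sin(pi*x) loads only the first Dirichlet mode.
print(dirichlet_coefficients(lambda x: math.sin(math.pi * x), 3))
```

For production use a sine-transform routine would be faster, but the quadrature version makes the change of basis explicit.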

Remark 16. In the examples of Section 5 the stable equilibrium $x^*$ and the spectra of the linearized operators could be found explicitly. We remark here that our scheme does not depend on explicit formulas for eigenvalues and eigenvectors, as long as those can be approximated numerically and the approximated eigenvalues satisfy Hypothesis 3(c).
All the simulations below were done using a parallel MPI C code with $M = 5 \times 10^4$ Monte Carlo trajectories. The FFTs were performed with the aid of the C library FFTW. As is standard in the related literature (see e.g. [3], Chapter VI,1), the measure of performance is the relative error per sample, defined as

$$\text{relative error per sample} = \sqrt{M}\,\frac{\text{st. deviation}(\hat{P}^\epsilon)}{\mathbb{E}[\hat{P}^\epsilon]}.$$

The smaller the relative error per sample, the more efficient the algorithm and the more accurate the estimator. In practice, however, both the standard deviation and the expected value of an estimator are typically unknown, so the empirical relative error is used instead: the expected value of the estimator is replaced by the empirical sample mean and the standard deviation of the estimator by the empirical sample standard error. A dash in the simulation tables indicates that no trajectory exited $D$ before time $T$. Before presenting the simulation tables, let us make a few comments on the parameter values and the conclusions of the numerical studies.
1) (Simulations for the Neumann stochastic Allen-Cahn) We estimate exit probabilities $P(\epsilon)$ for the solution $X^\epsilon$ of (76) driven by additive space-time white noise on the interval $(0, \ell)$ and Neumann boundary conditions. For the simulations we set $\ell = 1$, $x^* = x^*_+ = 1$, $h(\epsilon) = \epsilon^{-0.1}$ and Galerkin projection level $N = 50$. The numerical results can be found in Tables 1-4.
2) (Simulations for the periodic stochastic Allen-Cahn) We estimate $P(\epsilon)$ for the solution $X^\epsilon$ of (76) driven by additive space-time white noise on the interval $(0, \ell)$ and periodic boundary conditions. For the simulations we set , Galerkin projection level $N = 50$. The numerical results can be found in Tables 5-8.
4) (Simulations for the quintic SRDE (83)) We estimate $P(\epsilon)$ for the solution $X^\epsilon$ of (83) driven by additive space-time white noise on the interval $(0, \ell)$ and Neumann boundary conditions. For the simulations we set $\mu = -0.5$, , Galerkin projection level $N = 50$. The numerical results can be found in Tables 13-16.

5) Standard Monte Carlo (sMC) estimation, i.e. with no change of measure, does not perform well for small values of $\epsilon$, as indicated in Tables 4, 12, 8. A dash indicates that there was no successful trajectory in the simulations and thus no estimate could be provided. The relative errors per sample become increasingly large, which renders most of the probability values reported in Tables 3, 11, 7 unreliable.

6) The importance sampling scheme for the Allen-Cahn equation outperforms sMC and performs well for all boundary conditions and probabilities ranging from $10^{-1}$ to $10^{-11}$ (see Tables 1, 9, 5). In particular, the relative errors for the former are much lower than those of sMC. As expected, the estimated probabilities resulting from sMC and the importance sampling scheme agree when the relative errors are below 10.0. The relative errors per sample for the importance sampling scheme lie mostly below 1.2. The relative-error trends indicate that the accuracy improves as the sampling time grows from $T = 1$ to $T = 8$. The relative errors per sample reported in Tables 2, 10, 6 are consistent with the theoretical prediction that the scheme performs optimally.

7) The performance of the importance sampling scheme for the quintic SRDE (83) degrades slightly after $T = 3$ (see Table 14). Nevertheless, it remains superior to that of sMC (compare to Table 16), while the relative errors remain mostly below 2.5 and decrease with $\epsilon$.

9) In Table 18, we work with the Neumann Allen-Cahn equation and compare relative errors for different levels $N$ of the Galerkin approximation, with $N = 50, 100, 150$. The sampling time is fixed to $T = 3$. We notice that the relative errors are practically of the same order. This indicates that the first mode dominates the rare event. Another observation is that the total simulation time increased significantly as we increased $N$. These considerations led us to conclude that $N = 50$ is an efficient and sufficiently good lower-dimensional approximation to the corresponding SPDE.

TABLE 18. Comparison of the relative errors produced by the importance sampling scheme for the Neumann Allen-Cahn equation with different Galerkin projection levels $N$. The rest of the parameters are $x^* = 1$, $\ell = 1$, $\kappa = 0.9$, $h(\epsilon) = \epsilon^{-0.1}$, $T = 3$.

CONCLUSIONS AND FUTURE WORK
In this paper we studied the problem of rare event simulation for small-noise SRDEs via moderate deviation-based importance sampling. Taking advantage of the linearized limiting dynamics of the process $\eta^{\epsilon,v}$ (42), we constructed changes of measure that behave optimally in the limit as $\epsilon \to 0$ under the fairly general spectral gap condition of Hypothesis 3(c'). Working under the more restrictive Hypothesis 3(c) we designed an importance sampling scheme with changes of measure that act on a one-dimensional eigenspace of the operator $A + DF(x^*)$. We were then able to show that this scheme performs well pre-asymptotically and supplemented the theoretical results with numerical simulations for gradient-type SRDEs corresponding to a double-well potential. Such systems have wide applicability and provided good examples to illustrate our theory. Nevertheless, there are other types of nonlinearities which satisfy our assumptions, e.g. $f = -V_f'$, where the potential $V_f(x) = \sin x$ has more than two global minima.

The design and pre-asymptotic analysis of a scheme under the weaker spectral gap of Hypothesis 3(c') provides an interesting direction for future work. This would allow for the simulation of rare events for SRDEs under bifurcation (e.g. when $\ell > \pi$ in the Neumann Allen-Cahn case). The asymptotic optimality of such a scheme is guaranteed by Theorem 3.2. Even though the presence of non-constant saddle points with one unstable direction facilitates exits from $D$ (13), the pre-asymptotic analysis of Section 4 is expected to be more complicated in this setting. This is due to the fact that the changes of measure $u_{k_0}$ (25) act on $k_0$-dimensional subspaces of $H$. One then has to show that the linearization error is negligible by considering the behavior of the system on carefully chosen partitions of a $k_0$-dimensional section of $D$.
Throughout this work we have considered SRDEs in one spatial dimension. In higher dimensions, equations like (76) are singular and a priori ill-posed. Thus, one has to consider SRDEs with a spatially colored stochastic forcing or employ renormalization techniques. Metastability results for the renormalized two-dimensional Allen-Cahn equation can be found e.g. in [5], [58] and references therein. Importance sampling for linear equations (i.e. $f = 0$) with colored noise has been considered in [53]. In the latter, the spatial covariance operator $Q$ is assumed to be trace-class and diagonalizable with respect to the eigenbasis of the differential operator $A$. Carrying the analysis of this paper over to higher spatial dimensions is challenging since $A$ and the linearized operator $A + DF(x^*)$ do not necessarily have the same eigenbasis (e.g. in the case of the Allen-Cahn equation with Dirichlet boundary conditions in spatial dimension 2). In particular, the analysis of the exit direction in Section 4 would have to be generalized and take into account the non-commutativity of $Q$ and $A + DF(x^*)$.
Finally, we expect that the results of this paper can be used to design importance sampling schemes for simulating rare events in slow-fast systems of SRDEs. Similar work for multiscale diffusions in finite dimensions has been done in [57] and an MDP for multiscale SRDEs was recently proved in [35]. Moreover,
Dirichlet boundary conditions. The eigenpairs of the Dirichlet Laplacian on the interval $(0, \ell)$ are explicitly given by

FIGURE 3. Instances of the Dirichlet eigenfunctions $e^f_{1,a}$.

TABLE 2. Estimated relative errors per sample for the stochastic Allen-Cahn equation with Neumann boundary conditions using the developed importance sampling scheme with $\kappa = 0.9$ and mollification parameter $\delta := 2/h^2(\epsilon)$.

TABLE 3. Estimated probability values $P(\epsilon, T)$ for the stochastic Allen-Cahn equation with Neumann boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure.

TABLE 4. Estimated relative errors per sample for the stochastic Allen-Cahn equation with Neumann boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure. A probability of $2 \times 10^{-5}$ means that only one out of the $5 \times 10^4$ trajectories exited the domain. The relative error in that case is 223.6.
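The value 223.6 quoted in this caption can be checked directly. Since the estimator is an average of $M$ per-trajectory values, $\sqrt{M}$ times its standard deviation equals the standard deviation of a single per-trajectory value, so the empirical relative error per sample is the sample standard deviation divided by the sample mean. The following minimal sketch computes this quantity. The function name `relative_error_per_sample` is hypothetical. For standard Monte Carlo the per-trajectory values are exit indicators, and for importance sampling we assume they would be indicators multiplied by the likelihood ratio.

```python
import math

def relative_error_per_sample(samples):
    """Empirical relative error per sample: sample standard deviation of the
    per-trajectory values divided by their sample mean."""
    m = len(samples)
    mean = sum(samples) / m
    if mean == 0.0:
        # No trajectory exited: no estimate is available (a dash in the tables).
        return float("nan")
    var = sum((s - mean) ** 2 for s in samples) / (m - 1)
    return math.sqrt(var) / mean

# Standard Monte Carlo with exactly one exit among 5 * 10**4 trajectories.
indicators = [1.0] + [0.0] * 49999
print(round(relative_error_per_sample(indicators), 1))  # prints 223.6
```

More generally, for an indicator with success probability $p$ the relative error per sample is $\sqrt{(1-p)/p}$, which grows without bound as $p \to 0$ and explains why standard Monte Carlo deteriorates in the rare-event regime.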

Numerical results for Stochastic Allen-Cahn with periodic boundary conditions.
In this section, we provide numerical simulation results validating our theory for the stochastic Allen-Cahn equation with periodic boundary conditions studied in Subsection 5.1.

TABLE 5. Estimated probabilities for the stochastic Allen-Cahn equation with periodic boundary conditions using the developed importance sampling scheme with $\kappa = 0.9$ and mollification parameter $\delta := 2/h^2(\epsilon)$.

TABLE 6. Estimated relative errors per sample for the stochastic Allen-Cahn equation with periodic boundary conditions using the developed importance sampling scheme with $\kappa = 0.9$ and mollification parameter $\delta := 2/h^2(\epsilon)$.

TABLE 7. Estimated probabilities for the stochastic Allen-Cahn equation with periodic boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure.

TABLE 8. Estimated relative errors per sample for the stochastic Allen-Cahn equation with periodic boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure.
6.3. Numerical results for Stochastic Allen-Cahn with Dirichlet boundary conditions. In this section, we provide numerical simulation results validating our theory for the stochastic Allen-Cahn equation with Dirichlet boundary conditions studied in Subsection 5.2.

TABLE 10. Estimated relative errors per sample for the stochastic Allen-Cahn equation with Dirichlet boundary conditions using the developed importance sampling scheme with $\kappa = 0.9$ and mollification parameter $\delta$

TABLE 11. Estimated probability values $P(\epsilon, T)$ for the stochastic Allen-Cahn equation with Dirichlet boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure.

TABLE 13. Estimated probabilities for the quintic SRDE

TABLE 15. Estimated probability values for the quintic SRDE (83) with Neumann boundary conditions and $\mu = -0.5$. The values reported are based on standard Monte Carlo simulation without employing any change of measure.

TABLE 16. Estimated relative errors per sample for the quintic SRDE (83) with Neumann boundary conditions and $\mu = -0.5$. The values reported are based on standard Monte Carlo simulation without employing any change of measure.

6.5. Numerical comparisons of relative errors and probabilities for different parameter values. In this section, we provide numerical simulation results validating our theory for the stochastic Allen-Cahn equation with Neumann boundary conditions studied in Subsection 5.1. In particular, we now explore the effect of different moderate deviation scalings $h(\epsilon)$ and of different Galerkin projection levels $N$.

[Table 17 column headers: $\epsilon$; $T = 2$, $h(\epsilon) = \epsilon^{-0.1}$ (Prob., Rel. error/sample); $T = 2$, $h(\epsilon) = \epsilon^{-0.2}$]

TABLE 17. Comparison of relative errors and probabilities produced by the importance sampling scheme for the Neumann Allen-Cahn equation with different moderate deviation scalings $h(\epsilon)$. The rest of the parameters are $x^* = 1$, $\ell = 1$, $\kappa = 0.9$, $T = 2$.
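To make the effect of the scaling concrete, recall that the simulations use $R(\epsilon) = \sqrt{\epsilon}\,h(\epsilon)$ with $L = 1$. With $h(\epsilon) = \epsilon^{-\alpha}$ this gives $R(\epsilon) = \epsilon^{1/2 - \alpha}$, so for fixed $\epsilon < 1$ the scaling $\alpha = 0.2$ yields a larger radius than $\alpha = 0.1$. The short sketch below evaluates both. The function name `exit_radius` and the sample values of $\epsilon$ are illustrative assumptions, not values taken from the tables.

```python
def exit_radius(eps, alpha):
    """R(eps) = sqrt(eps) * h(eps) with h(eps) = eps**(-alpha) and L = 1."""
    return eps ** 0.5 * eps ** (-alpha)

# Illustrative noise levels (hypothetical, not the ones used in Table 17).
for eps in (1e-2, 1e-3, 1e-4):
    print(eps, exit_radius(eps, 0.1), exit_radius(eps, 0.2))
```

Both radii shrink to zero as $\epsilon \to 0$ while $h(\epsilon) \to \infty$, which is the moderate deviation regime considered in this paper.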