1 Introduction

In this paper we are concerned with the problem of rare event simulation for the stochastic reaction–diffusion equation (SRDE)

$$\begin{aligned} \left\{ \begin{aligned}&\partial _tX^\epsilon (t,\xi )={\mathcal {A}}X^{\epsilon }(t,\xi )+f\big ( X^\epsilon (t,\xi )\big )+\sqrt{\epsilon }{\dot{W}}(t,\xi )\;,\;\;(t,\xi )\in [0,\infty )\times (0,\ell ) \\&X^{\epsilon }(0,\xi )=x(\xi ), \xi \in (0,\ell ),\;{\mathcal {N}}X^{\epsilon }(t,\xi )=0,\; (t,\xi )\in [0,\infty )\times \{0,\ell \}, \end{aligned}\right. \nonumber \\ \end{aligned}$$
(1)

where \(\epsilon \ll 1,\) \({\mathcal {A}}\) is a uniformly elliptic second-order differential operator, \(f:\mathbb {R}\rightarrow \mathbb {R}\) is a dissipative nonlinearity with polynomial growth and \({\dot{W}}\) is a stochastic forcing term of intensity \(\sqrt{\epsilon }\) modeled by space-time white noise. The mixed boundary conditions are given by the linear operator \({\mathcal {N}}\) which acts on functions defined on the boundary \(\partial (0,\ell )\) (see Sect. 2 for more details), and the initial datum \(x:(0,\ell )\rightarrow \mathbb {R}\) is a continuous function in the kernel of \({\mathcal {N}}.\)

Systems like (1) are of interest because they exhibit metastable behavior. Assuming that the associated noiseless dynamics are non-trivial and \(\epsilon >0\), the stochastic forcing can induce transitions between neighborhoods of metastable states. As \(\epsilon \rightarrow 0\), transitions and exits from domains of attraction occur with very small probabilities and rigorous asymptotic analysis of exit times and places is possible within the framework of large deviations or potential theory (see e.g. [18, 28, 31, 32] and [4, 16, 27, 35, 47], as well as references within, for results in metastability theory in finite and infinite dimensions respectively).

In practice, efficient simulation of such events is challenging. On the one hand, Large Deviation Principles (LDPs) characterize the exponential decay rates of probabilities in the limit as \(\epsilon \rightarrow 0\) but ignore the effect of prefactors which can be significant (see [23]). On the other hand, as \(\epsilon \) decreases, standard Monte-Carlo schemes require an increasingly large sample size in order to maintain a small relative error per sample. For this reason, accelerated and adaptive methods such as importance sampling or multi-level splitting become essential when it comes to rare events. For more details on the general theory and applications of such methods in a number of different models, the interested reader is referred to the book [10].

In the present work, we aim to develop a provably efficient importance sampling scheme that computes exit probabilities of \(X^\epsilon \) from scaled neighborhoods of a stable equilibrium point \(x^*\). In particular, let \(X^\epsilon _x\) denote the unique (mild) solution of (1) with initial condition x, let \(D\subset L^2(0,\ell )\) be an open set containing \(x^*\) and

$$\begin{aligned} \tau _{x^*}^\epsilon =\inf \{ t>0: X_{x^*}^{\epsilon }(t)\notin D\}. \end{aligned}$$

For \(T, L>0,\) we focus on the estimation of probabilities \(\mathbb {P}[\tau _{x^*}^\epsilon \le T ]\) in the case where \(D=D_\epsilon \) with

$$\begin{aligned} D_\epsilon =\big \{ x\in L^2(0,\ell ): \Vert x-x^*\Vert _{L^2}< L\sqrt{\epsilon }h(\epsilon ) \big \}. \end{aligned}$$
(2)

The scaling \(h(\epsilon )\) is chosen so that \(h(\epsilon )\rightarrow \infty \) and \(\sqrt{\epsilon }h(\epsilon )\rightarrow 0,\) as \(\epsilon \rightarrow 0.\) As \(\epsilon \rightarrow 0\), exit probabilities from such domains lie in an asymptotic regime that interpolates between the Central Limit Theorem (CLT) and LDP. To be precise, let \(X^0_x\) denote the (deterministic) solution of (1) with \(\epsilon =0\) and define a family of centered and re-scaled processes

$$\begin{aligned} \eta _x^\epsilon :=\frac{X_x^\epsilon -X^0_x}{\sqrt{\epsilon }h(\epsilon )}\;,\;\; \epsilon >0. \end{aligned}$$
(3)

As \(\epsilon \rightarrow 0,\) the choices \(h(\epsilon )=1/\sqrt{\epsilon }\) and \(h(\epsilon )=1\) correspond to large and Gaussian deviations of \(X^{\epsilon }\) respectively.
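For instance (our illustration, not part of the paper's analysis), any \(h(\epsilon )=\epsilon ^{-\gamma }\) with \(\gamma \in (0,1/2)\) is an admissible moderate-deviation scaling, with \(\gamma \rightarrow 0\) and \(\gamma \rightarrow 1/2\) recovering the two endpoint regimes. A minimal numerical check:

```python
import numpy as np

def h(eps, gamma):
    """Candidate moderate-deviation scaling h(eps) = eps**(-gamma)."""
    return eps ** (-gamma)

eps = np.logspace(-1, -8, 8)       # a sequence with eps -> 0
gamma = 0.25                       # any value strictly between 0 (CLT) and 1/2 (LDP)

h_vals = h(eps, gamma)             # should diverge: h(eps) -> infinity
radius = np.sqrt(eps) * h_vals     # exit-ball radius sqrt(eps)*h(eps) -> 0
```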

Exits of \(X^\epsilon \) from D are then equivalent to exits of \(\eta _x^\epsilon \) from an \(L^2-\)ball of radius L around 0, and large deviations of the family \(\{\eta _x^\epsilon \}_{\epsilon \in (0,1)}\) are called moderate deviations of \(\{X_x^\epsilon \}_{\epsilon \in (0,1)}.\) Moderate Deviation Principles (MDPs) have been studied in many different contexts, such as multiscale and interacting particle systems, Markov processes with jumps, small-noise stochastic dynamics, statistical estimation, option pricing and stochastic recursive algorithms; see e.g. [30, 55] for SRDEs, as well as [7, 11, 20, 29, 33, 34, 36, 38, 44].

Importance sampling is a variance-reduction accelerated Monte-Carlo method and its objective is to minimize the variance of the estimator by carefully chosen changes of measure. Such changes of measure "push" the dynamics towards trajectories that realize the rare event of interest. This procedure transforms tail events to more typical events, thus allowing for more efficient sampling. The simulation outcomes are then weighted by likelihood ratios so that the importance sampling estimators remain unbiased under the new probability measures. Importance sampling schemes for events in the large and moderate deviation regimes have been developed for finite-dimensional systems in [21, 23, 49, 50, 53]. In [21, 50], the authors observed that moderate-deviation based schemes provide a viable and simpler alternative to their large-deviation based counterparts, in cases where both are applicable. This is due to the fact that the MDP action functional, which characterizes exponential decay rates of probabilities, takes a much simpler form. In turn, this allows for more tractable and straightforward design of optimal changes of measure.

Importance sampling for SRDEs presents new challenges due to infinite dimensionality combined with the nonlinearity of the dynamics. Our work is close to [46], where a large deviation-based scheme was developed for linear equations (i.e. when \(f=0\)). There, the authors show that efficient changes of measure need to accomplish both variance and dimension reduction. For example, changes of measure that force infinitely many modes of the dynamics lead to estimators with very large variance when \(\epsilon \) is small. A possible workaround is to show that exits from D take place in a finite-dimensional submanifold of \(\partial D\) with high probability. This was achieved in the linear case of [46], where it was proved that, under a sufficiently large spectral gap, exit from D happens in the direction of the eigenvector \(e_1\) of \(-{\mathcal {A}}\) corresponding to the smallest non-zero eigenvalue. Similar results regarding the exit direction for (finite-dimensional) SDEs with a linear drift have been proved in [51] (see also Remark 5 below).

To the best of our knowledge, importance sampling for nonlinear SRDEs is rigorously studied here for the first time. The main difficulty in designing large deviation-based schemes for such equations lies in the task of identifying a finite-dimensional exit submanifold (if any). We are able to overcome this obstacle by working in the moderate deviation regime. As we show in the sequel, the latter is equivalent to linearizing the dynamics in a neighborhood of the equilibrium \(x^*\). Consequently, the results of [46] can be applied locally at the cost of a linearization error which is, however, negligible as \(\epsilon \rightarrow 0\). In cases where both LDP and MDP-based schemes are available, one may think of the tradeoff between the two as follows: moderate deviations cover the regime between the central limit theorem and large deviations, so they describe events that are rare, but not so rare as to fall in the large deviations regime. On the other hand, moderate deviation schemes are in general more tractable due to the asymptotic linearization of the dynamics that takes place. In our setting, this tradeoff is reflected in the fact that we only consider exit domains (2) whose radius shrinks to zero as \(\epsilon \rightarrow 0.\) Furthermore, the probability of exiting a ball of radius \(\sqrt{\epsilon }h(\epsilon )\) is larger than the probability of exiting a ball of radius 1, so the MDP importance sampling schemes described in this paper can provide a quantitative upper bound for the much harder-to-characterize LDP exit probabilities.

The design of an importance sampling scheme and the proof of its good asymptotic and pre-asymptotic performance are the main contributions of this paper. In the course of our analysis, we prove an MDP for additive-noise SRDEs with a non-Lipschitz nonlinearity, which cannot be found in the literature (see Theorem 3.1 and Remark 11). Furthermore, our theory is applied to the stochastic Allen–Cahn (also known as real Ginzburg-Landau or Chafee-Infante) equation and supplemented by simulation studies. In contrast to the linear case, there are a number of interesting cases where the aforementioned spectral gap is not satisfied. Another novel feature of this work is the construction of changes of measure that perform well asymptotically (i.e. as \(\epsilon \rightarrow 0\)) in the absence of this condition (see Hypothesis 3(c') below).

The rest of this paper is organized as follows: In Sect. 2 we fix the notation and state our assumptions. In the first part of Sect. 3 we introduce moderate deviations and subsolution-based importance sampling and then state and prove our results on the asymptotic theory of the scheme. Section 4 is devoted to the implementation and pre-asymptotic performance analysis of our scheme. In Sect. 5 we apply the developed theory to the case where f is, up to a sign, the derivative of a double-well potential. Our examples include the stochastic Allen–Cahn equation (which features a cubic nonlinearity) with different boundary conditions as well as SRDEs with higher order polynomial nonlinearities. The results of simulation studies are then presented in Sect. 6. Finally, Appendix A collects the proofs of some useful lemmas.

2 Notation and assumptions

Let \(\ell >0.\) The Hilbert space \(L^2(0,\ell )\) endowed with its usual inner product will be denoted by \((\mathcal {H},\langle \cdot ,\cdot \rangle _\mathcal {H})\). The Banach space \(C[0,\ell ],\) endowed with the supremum norm, is denoted by \({\mathcal {E}}.\) The norm of a Banach space \({\mathcal {X}}\) will be denoted by \( \Vert \cdot \Vert _{{\mathcal {X}}}\) and the closed ball of radius \(R>0\) and center \(x_0\in {\mathcal {X}}\), i.e. the set \(\{x\in {\mathcal {X}} : \Vert x-x_0\Vert _{{\mathcal {X}}}\le R\}\), by \(B_{\mathcal {X}}(x_0,R)\). We use \(\mathring{D},{\bar{D}},\partial D \) to denote interior, closure and boundary of a set \(D\subset {\mathcal {X}}\) respectively. The lattice notation \(\wedge , \vee \) is used to indicate minimum and maximum respectively.

For \(\theta > 0,\) \(p\in [1,\infty ),\) we denote by \(W^{p,\theta }(0,\ell )\) the fractional Sobolev space of \(x\in L^p(0,\ell )\) such that

$$\begin{aligned}{}[x]^p_{p,\theta }:=\iint _{[0,\ell ]^2}\frac{|x(\xi _2)-x(\xi _1)|^p}{|\xi _2-\xi _1|^{p\theta +1}}d\xi _1 d\xi _2<\infty . \end{aligned}$$

\(W^{p,\theta }(0,\ell ),\) endowed with the norm \(\Vert \cdot \Vert _{p,\theta }:=\Vert \cdot \Vert _{L^p(0,\ell )}+[\cdot ]_{p,\theta },\) is a Banach space. \(W^{2,\theta }(0,\ell )\) is a Hilbert space and is denoted by \(H^{\theta }(0,\ell ).\) Moreover, for \(T>0\) and \(\beta \in [0,1)\), we denote by \(C^\beta ([0,T];{\mathcal {X}})\) the space of \(\beta \)-Hölder continuous \({\mathcal {X}}\)-valued functions defined on the interval [0, T]. \(C^\beta ([0,T];{\mathcal {X}}),\) endowed with the norm

$$\begin{aligned} \Vert X\Vert _{C^\beta ([0,T];{\mathcal {X}})}:= & {} \Vert X\Vert _{C([0,T];{\mathcal {X}})}+[X]_{C^\beta ([0,T];{\mathcal {X}})}\\:= & {} \sup _{t\in [0,T]}\Vert X(t)\Vert _{{\mathcal {X}}}+\sup _{\overset{s,t\in [0, T]}{ t\ne s}}\frac{\Vert X(t)-X(s)\Vert _{\mathcal {X}}}{|t-s|^\beta }\;, \end{aligned}$$

is a Banach space.
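Both seminorms discretize readily, which is convenient for checking the regularity of simulated paths. A sketch (our illustration; the test functions \(x(\xi )=\xi \) and \(X(t)=\sqrt{t}\) are chosen because their seminorms are known in closed form):

```python
import numpy as np

def gagliardo_seminorm(x, xi, p, theta):
    """Quadrature approximation of the Gagliardo seminorm [x]_{p,theta}
    over the uniform midpoint grid xi (the diagonal is excluded)."""
    d = xi[1] - xi[0]
    X1, X2 = np.meshgrid(x, x, indexing="ij")
    S1, S2 = np.meshgrid(xi, xi, indexing="ij")
    off = np.abs(S2 - S1) > 1e-14
    val = np.abs(X2[off] - X1[off]) ** p / np.abs(S2[off] - S1[off]) ** (p * theta + 1)
    return (val.sum() * d * d) ** (1.0 / p)

def holder_seminorm(X, t, beta):
    """Discrete beta-Hoelder seminorm sup_{s != t} |X(t)-X(s)| / |t-s|**beta."""
    X1, X2 = np.meshgrid(X, X, indexing="ij")
    T1, T2 = np.meshgrid(t, t, indexing="ij")
    off = np.abs(T2 - T1) > 1e-14
    return np.max(np.abs(X2[off] - X1[off]) / np.abs(T2[off] - T1[off]) ** beta)

# x(xi) = xi on (0,1), p = 2, theta = 1/4: the double integral equals
# int int |u - v|^{1/2} du dv = 8/15, so [x]_{2,1/4}^2 = 8/15 exactly.
n = 400
xi = (np.arange(n) + 0.5) / n
gag = gagliardo_seminorm(xi, xi, p=2, theta=0.25)

# X(t) = sqrt(t) on [0,1] is exactly 1/2-Hoelder with seminorm 1.
t = np.linspace(0.0, 1.0, 401)
hoel = holder_seminorm(np.sqrt(t), t, beta=0.5)
```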

For any two Banach spaces \({\mathcal {X}}, {\mathcal {Y}}\) we denote the space of linear bounded operators \(B: {\mathcal {X}}\rightarrow {\mathcal {Y}}\) by \({\mathscr {L}}({\mathcal {X}}; {\mathcal {Y}})\). The latter is a Banach space when endowed with the norm \(\Vert B\Vert _{{\mathscr {L}}({\mathcal {X}}; {\mathcal {Y}})}:=\sup _{x\in B_{\mathcal {X}}(0,1)}\Vert Bx\Vert _{{\mathcal {Y}}}\). When the domain coincides with the co-domain, we use the simpler notation \({\mathscr {L}}({\mathcal {X}}).\) The spaces of trace-class and Hilbert-Schmidt linear operators \(B:\mathcal {H}\rightarrow \mathcal {H}\) are denoted by \({\mathscr {L}}_1(\mathcal {H})\) and \({\mathscr {L}}_2(\mathcal {H})\) respectively. The former is a Banach space when endowed with the norm \(\Vert B\Vert _{{\mathscr {L}}_1(\mathcal {H})}:=\text {tr}(\sqrt{B^*B})\) while the latter is a Hilbert space when endowed with the inner product \(\langle B_1, B_2\rangle _{{\mathscr {L}}_2(\mathcal {H})}:= \text {tr}( B_2^* B_1)\).

The operator \({\mathcal {A}}\) in (1) is a uniformly elliptic second-order differential operator in divergence form. In particular:

$$\begin{aligned} {\mathcal {A}}\phi (\xi )=\frac{d}{d\xi }\bigg (a(\xi )\frac{d\phi (\xi )}{d\xi }\bigg )\;, \xi \in (0,\ell ) \end{aligned}$$
(4)

with \(a\in C^1(0,\ell )\) and \(\inf _{\xi \in (0,\ell )}a(\xi )>0\). The operator \({\mathcal {N}}\) acts on the boundary \(\{0,\ell \}\) and can be the identity operator (corresponding to Dirichlet boundary conditions), a first-order differential operator of the type

$$\begin{aligned} {\mathcal {N}}u(\xi )=b(\xi )u'(\xi )+c(\xi )u(\xi )\;,\;\xi \in \{0, \ell \} \end{aligned}$$

for some \(b,c\in C^1[0,\ell ]\) such that \(b\ne 0\) on \(\{0,\ell \}\) (corresponding to Neumann or Robin boundary conditions) or

$$\begin{aligned} {\mathcal {N}}u=\big (u(\ell )-u(0), u'(\ell )-u'(0)\big ) \end{aligned}$$

for periodic boundary conditions. We denote by A the realization of the differential operator \({\mathcal {A}}\) in \(\mathcal {H}\), endowed with the boundary condition \({\mathcal {N}}\). It is defined on a dense subspace \(Dom(A)\subset \mathcal {H}\) that contains

$$\begin{aligned} \{ u\in H^2(0,\ell ): {\mathcal {N}}u=0 \} \end{aligned}$$

and it generates a \(C_0\) semigroup of operators \(S=\{S(t)\}_{t\ge 0}\subset {\mathscr {L}}(\mathcal {H})\). Moreover, the part of A in \(\overline{Dom(A)}\subset \mathcal {E},\) where the closure is taken in the topology of \(\mathcal {E},\) generates either a \(C_0\) or an analytic semigroup for which we use the same notation (see e.g. A.27 in [17] for a definition). Regarding the spectral properties of A, we make the following assumptions:

Hypothesis 1(a)

In view of (4), the operator \(-A\) is self-adjoint. As a result, there exists a countable complete orthonormal basis \(\{e_{n}\}_{n\in \mathbb {N}}\subset \mathcal {H}\) of eigenvectors of \(-A\). The corresponding sequence of nonnegative eigenvalues is denoted by \(\{a_{n}\}_{n\in \mathbb {N}}\).

Hypothesis 1(b)

The eigenvectors satisfy

$$\begin{aligned} \sup _{n\in \mathbb {N}}\Vert e_{n}\Vert _{{\mathcal {E}}}<\infty . \end{aligned}$$
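In the model case \(a\equiv 1\) with Dirichlet boundary conditions, \(-A\) has eigenpairs \(a_n=(n\pi /\ell )^2\), \(e_n(\xi )=\sqrt{2/\ell }\sin (n\pi \xi /\ell )\), so Hypothesis 1(b) holds with \(\sup _n\Vert e_n\Vert _{{\mathcal {E}}}=\sqrt{2/\ell }\). A finite-difference sketch (ours, for illustration) confirming this numerically:

```python
import numpy as np

ell, n = 1.0, 400                    # interval length, number of interior grid points
d = ell / (n + 1)                    # mesh size
xi = np.linspace(d, ell - d, n)      # interior grid (Dirichlet: u = 0 at 0 and ell)

# symmetric tridiagonal discretization of -A = -d^2/dxi^2
minus_A = (np.diag(2.0 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
           - np.diag(np.ones(n - 1), -1)) / d**2

a_num, v = np.linalg.eigh(minus_A)
e = v / np.sqrt(d)                   # rescale so the discrete L^2(0,ell) norm is 1

a_exact = np.array([(k * np.pi / ell) ** 2 for k in (1, 2, 3)])
sup_norms = np.max(np.abs(e[:, :3]), axis=0)   # each should be close to sqrt(2/ell)
```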

Remark 1

Without loss of generality, we can replace the operator A by \({\tilde{A}}=A-cI\) for some \(c>0\) and the reaction term f in (1), by \({\tilde{f}}(x(\xi )):=f(x(\xi ))+cx(\xi )\). The model is invariant under this transformation and, in light of Hypothesis 1(a), it follows that \(\Vert {\tilde{S}}(t)\Vert _{{\mathscr {L}}(\mathcal {H})}\le e^{-c t}\). Throughout the rest of this work we will be using \({\tilde{A}},{\tilde{S}}\) and \( {\tilde{f}}\) with no further distinction in notation.

Let \(\theta \ge 0\). In view of Hypothesis 1(a) along with the previous remark, \(-A\), restricted to its image, has a densely defined bounded inverse \((-A)^{-1}\) which can then be uniquely extended to all of \(\mathcal {H}\). The fractional power \((-A)^{-\theta }\) is defined via interpolation and is also injective. Letting \((-A)^{\frac{\theta }{2}}:= ((-A)^{-\frac{\theta }{2}})^{-1}\) we define \(\mathcal {H}^\theta := Dom((-A)^\frac{\theta }{2})= Range((-A)^{-\frac{\theta }{2}})\subset \mathcal {H}\). The norm \(\Vert x\Vert _{\mathcal {H}^\theta }:=\big \Vert (-A)^\frac{\theta }{2}x\big \Vert _\mathcal {H}\) turns \(\mathcal {H}^\theta \) into a Banach space and is equivalent to the graph norm (see [41], Chapter 2.2).

Remark 2

For \(\theta \in (0,\frac{1}{2})\) the spaces \(H^\theta (0,\ell )\) and \(\mathcal {H}^\theta \) coincide via the identification

$$\begin{aligned} H^\theta (0,\ell )=\mathcal {H}^\theta =\big \{x\in \mathcal {H}: \sup _{t\in (0,1] }t^{-\theta /2} \Vert S(t)x-x\Vert _\mathcal {H}<\infty \big \} \end{aligned}$$

which holds with equivalence of norms. The latter implies that for each \(t\ge 0\), the linear operator \(S(t)-I\in {\mathscr {L}}(H^\theta ;\mathcal {H})\) and there exists a constant \(C>0\) such that

$$\begin{aligned} \big \Vert S(t)-I\big \Vert _{{\mathscr {L}}(H^\theta ;\mathcal {H})}\le Ct^{\theta /2}. \end{aligned}$$
(5)
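In the self-adjoint case, (5) follows from the elementary scalar bound \(1-e^{-\lambda t}\le (\lambda t)^{\theta /2}\) for \(\theta \in (0,2]\), applied to each eigenvalue \(\lambda \) of \(-A\) after diagonalization. A quick numerical sanity check of this bound (ours):

```python
import numpy as np

def symbol_ratio(t, theta, lam):
    """(1 - exp(-lam*t)) / (lam**(theta/2) * t**(theta/2)): the spectral symbol
    of (S(t) - I)(-A)^(-theta/2), normalized by the claimed rate t**(theta/2)."""
    return (1.0 - np.exp(-lam * t)) / (lam ** (theta / 2) * t ** (theta / 2))

lam = np.logspace(-6, 8, 2000)                 # sweep over eigenvalues of -A
worst = max(
    symbol_ratio(t, theta, lam).max()
    for t in (1e-4, 1e-2, 1.0)
    for theta in (0.1, 0.5, 1.0)
)
# worst should never exceed C = 1, consistent with (5)
```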

The analytic semigroup S possesses the following regularizing properties (see e.g. Section 4.1.1 in [13]):

(i) For \(0\le s\le r\le \frac{1}{2}\) and \(t>0\), S maps \(H^{s}(0,\ell )\) to \(H^{r}(0,\ell )\) and

$$\begin{aligned} \Vert S(t)x\Vert _{H^{r}}\le C_{r,s}(t\wedge 1)^{-\frac{r-s}{2}}e^{c_{r,s}t}\Vert x\Vert _{H^{s}}\;\;,\;x\in H^{s}(0,\ell ), \end{aligned}$$
(6)

for some positive constants \(c_{r,s}, C_{r,s}\).

(ii) S is ultracontractive, i.e. for \(t>0,\) S(t) maps \(\mathcal {H}\) to \(L^{\infty }(0,\ell )\) and furthermore, for any \(1\le p\le r\le \infty \),

$$\begin{aligned} \Vert S(t)x\Vert _{L^r(0,\ell )}\le C(t\wedge 1)^{-\frac{r-p}{2pr}}\Vert x\Vert _{L^p(0,\ell )}\;\;,\;x\in L^p(0,\ell ). \end{aligned}$$

The next set of assumptions concerns the nonlinear reaction term in (1).

Hypothesis 2(a)

\(f:\mathbb {R}\rightarrow \mathbb {R}\) is twice continuously differentiable and

$$\begin{aligned} f=f_1+f_2 \end{aligned}$$

where \(f_1:\mathbb {R}\rightarrow \mathbb {R}\) is globally Lipschitz continuous and \(f_2:\mathbb {R}\rightarrow \mathbb {R}\) is a non-increasing function.

Hypothesis 2(b)

There exist \(C_f>0\) and \(p_0\ge 3\) such that for all \( x\in \mathbb {R}\) and \(i\in \{0,1,2 \}\)

$$\begin{aligned} |\partial ^{(i)}_{x}f(x)|\le C_f\big (1+|x|^{p_0-i}\big ). \end{aligned}$$
(7)

For \(p\ge 1\), f induces a superposition (or Nemytskii) operator \(F:{\mathcal {E}}\rightarrow L^p(0,\ell )\) defined by \(F(x)(\xi ):=f(x(\xi )),\) \(\xi \in (0,\ell ).\) In view of Hypotheses 2(a) and 2(b), F is twice Gâteaux differentiable along any direction in \({\mathcal {E}}\) and (with some abuse of notation) its Gâteaux differentials are given by \(D^{i}F=\partial ^{i}_xf\), \(i=1,2\).
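The prototypical example satisfying Hypotheses 2(a)-(b) is the stochastic Allen–Cahn nonlinearity \(f(x)=x-x^3\), with \(f_1(x)=x\) globally Lipschitz, \(f_2(x)=-x^3\) non-increasing, and (7) holding with \(p_0=3\); the constant \(C_f=6\) below is one admissible (not optimal) choice. A numerical sketch of these checks (ours):

```python
import numpy as np

# Allen-Cahn reaction term and its dissipative decomposition f = f1 + f2
f   = lambda x: x - x**3
f1  = lambda x: x            # globally Lipschitz part
f2  = lambda x: -x**3        # non-increasing part
df  = lambda x: 1.0 - 3.0 * x**2
d2f = lambda x: -6.0 * x

x = np.linspace(-50.0, 50.0, 100001)
p0, Cf = 3, 6.0              # growth exponent and one admissible constant

# |d^i f(x)| <= Cf * (1 + |x|**(p0 - i)) for i = 0, 1, 2   (Hypothesis 2(b))
growth_ok = all(
    np.all(np.abs(g(x)) <= Cf * (1.0 + np.abs(x) ** (p0 - i)))
    for i, g in enumerate((f, df, d2f))
)
decomp_ok = np.allclose(f1(x) + f2(x), f(x))    # Hypothesis 2(a): f = f1 + f2
monotone_ok = bool(np.all(np.diff(f2(x)) <= 0)) # f2 is non-increasing
```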

The last set of assumptions concerns the stability properties of the deterministic and linearized dynamics governed by (1), after setting \(\epsilon =0.\)

Hypothesis 3(a)

There exists at least one asymptotically stable equilibrium \(x^*\in Dom(A)\) of (1) solving the elliptic Sturm-Liouville problem \( Ax+F(x)=0.\)

Hypothesis 3(b)

The linear self-adjoint operator \(-A-DF(x^*)\) has a countable, non-decreasing sequence of nonnegative eigenvalues \(\{a_n^f\}_{n\in \mathbb {N}}\) corresponding to a complete orthonormal set of eigenvectors \(\{e^f_n\}_{n\in \mathbb {N}}\subset {\mathcal {E}}.\) Therefore, the equilibrium \(x^*\) is asymptotically stable.

Hypothesis 3(c)

The first two eigenvalues of the self-adjoint operator \(-A-DF(x^*)\) satisfy \(3a_1^f<a_2^f.\)

This spectral gap provides a sufficient condition that allows us to identify a one-dimensional exit direction for limiting trajectories (see Lemma 3.4 below). A weaker condition under which our results continue to hold is \(2a_1^f<a_2^f\) (see Remark 7). In fact, our asymptotic results remain valid under the following relaxed spectral gap condition:

Hypothesis 3(c\({}^\prime \))

There exists \(k_0\ge 1\) such that \(3a_1^f<a_{k_0+1}^f\) and \(a_1^f<a_2^f.\)

Note that Hypothesis 3(c) trivially implies Hypothesis 3(c') with \(k_0=1\). The latter will be used throughout Sect. 3 to prove asymptotic results. In Sect. 4 we restrict the pre-asymptotic analysis to schemes that work under Hypothesis 3(c).
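Checking these conditions for a concrete model reduces to inspecting the first few eigenvalues of the (discretized) linearization \(-A-DF(x^*)\). A small utility (ours; the second eigenvalue sequence is a made-up example, not taken from the paper):

```python
def hyp_3c(a):
    """Hypothesis 3(c): 3 a_1^f < a_2^f, for a sorted eigenvalue sequence a."""
    return 3 * a[0] < a[1]

def hyp_3c_prime(a):
    """Hypothesis 3(c'): a_1^f < a_2^f and 3 a_1^f < a_{k0+1}^f for some k0 >= 1.
    Returns the smallest admissible k0, or None if the hypothesis fails."""
    if not a[0] < a[1]:
        return None
    for k0 in range(1, len(a)):
        if 3 * a[0] < a[k0]:     # a[k0] is a_{k0+1}^f in 0-based indexing
            return k0
    return None

gap = [1.0, 4.0, 9.0, 16.0]      # e.g. a_n^f = n^2 (Dirichlet Laplacian on (0, pi), f = 0)
cluster = [1.0, 2.0, 2.5, 10.0]  # 3(c) fails, but 3(c') holds with k0 = 3
```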

Turning to the stochastic forcing, let \((\Omega ,{\mathscr {F}}, {\mathscr {F}}_{t\ge 0}, \mathbb {P})\) be a complete filtered probability space. The space-time white noise \({\dot{W}}\) is understood as the time-derivative of a cylindrical Wiener process \(W:[0,\infty )\times \mathcal {H}\rightarrow L^2(\Omega )\) in the sense of distributions. The latter is a Gaussian family of random variables with covariance given by

$$\begin{aligned} \mathbb {E}[W(t_1,\chi _1)W(t_2,\chi _2) ]=t_1\wedge t_2\langle \chi _1, \chi _2\rangle _\mathcal {H}, \end{aligned}$$

for \((t_i,\chi _i)\in [0,\infty )\times \mathcal {H}, i=1,2.\) Given a separable Hilbert space \((\mathcal {H}_1, \langle \cdot ,\cdot \rangle _{\mathcal {H}_1})\) such that \(\mathcal {H}\) is a linear subspace of \(\mathcal {H}_1\) and the inclusion map \(\mathcal {H}\overset{i}{\rightarrow }\mathcal {H}_1\) is Hilbert-Schmidt, W can be identified with the \(\mathcal {H}_1-\)valued Wiener process

$$\begin{aligned} W(t)=\sum _{n=1}^{\infty } W(t,e_{n})i(e_{n})\;\;,t\ge 0 \end{aligned}$$

with covariance operator \(Q=ii^*\in {\mathscr {L}}_1(\mathcal {H})\). This identification is assumed throughout the rest of this paper without further distinction in notation.
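In simulations one works with the truncated expansion \(W_N(t)=\sum _{n=1}^{N}\beta _n(t)e_n\), where \(\beta _n(t)=W(t,e_n)\) are i.i.d. scalar Brownian motions. A Monte Carlo sketch (ours) verifying the covariance identity above in eigenbasis coordinates:

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 20, 200_000         # truncation level, Monte Carlo sample size
t1, t2 = 0.3, 0.7          # two time points, t1 < t2

# beta_n(t1) and beta_n(t2) for N i.i.d. Brownian motions (M samples each)
b1 = np.sqrt(t1) * rng.standard_normal((M, N))
b2 = b1 + np.sqrt(t2 - t1) * rng.standard_normal((M, N))

# two test elements chi of H, written in the coordinates <chi, e_n>
chi1 = rng.standard_normal(N); chi1 /= np.linalg.norm(chi1)
chi2 = rng.standard_normal(N); chi2 /= np.linalg.norm(chi2)

W1 = b1 @ chi1             # W(t1, chi1) = sum_n beta_n(t1) <chi1, e_n>
W2 = b2 @ chi2
empirical = float(np.mean(W1 * W2))
exact = min(t1, t2) * float(chi1 @ chi2)   # (t1 ^ t2) <chi1, chi2>_H
```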

Having introduced the necessary notation, we can recast (1) as a stochastic evolution equation on \({\mathcal {E}}\) given by

$$\begin{aligned} \left\{ \begin{aligned}&dX^\epsilon (t)=[AX^{\epsilon }(t)+F(X^\epsilon (t))]dt+\sqrt{\epsilon }dW(t)\\ {}&X^{\epsilon }(0)=x. \end{aligned}\right. \end{aligned}$$
(8)

A mild solution to the latter is defined as a process \(X^\epsilon \) satisfying for each \(\epsilon \) and all \(t\in [0,T],\)

$$\begin{aligned} X^\epsilon (t)=S(t)x+\int _{0}^{t}S(t-s)F(X^\epsilon (s))ds+\sqrt{\epsilon }\int _{0}^{t}S(t-s)dW(s) \end{aligned}$$
(9)

with probability 1. The last term is known as a stochastic convolution and will be frequently denoted by \(W_A.\) Our assumptions guarantee that the \({\mathcal {E}}\)-valued paths of \(W_A\) are continuous with probability 1 and

$$\begin{aligned} \mathbb {E}\sup _{t\in [0,T]}\big \Vert W_A(t)\big \Vert ^p_{\mathcal {E}}<\infty . \end{aligned}$$
(10)

This can be proved by the stochastic factorization method of Da Prato-Zabczyk [17] (see also Theorem B.6 in [45]). Moreover, for each \(\epsilon >0,\) (8) has a unique mild solution taking values in \(C([0,T];{\mathcal {E}})\) with probability 1 (see e.g. Theorem 2.2 in [14]).
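In the self-adjoint setting the stochastic convolution diagonalizes: each mode \(\langle W_A(t),e_n\rangle \) is a scalar Ornstein-Uhlenbeck process with variance \((1-e^{-2a_nt})/(2a_n)\), so \(W_A\) can be sampled mode-by-mode with an update that is exact in distribution. A simulation sketch (ours, not the discretization analyzed in the paper):

```python
import numpy as np

def sample_stochastic_convolution(a, T, n_steps, n_paths, rng):
    """Sample the modes <W_A(T), e_n> via the exact OU transition
    x <- exp(-a dt) x + sqrt((1 - exp(-2 a dt)) / (2 a)) * xi,  xi ~ N(0,1)."""
    dt = T / n_steps
    decay = np.exp(-a * dt)
    std = np.sqrt((1.0 - np.exp(-2.0 * a * dt)) / (2.0 * a))
    x = np.zeros((n_paths, a.size))
    for _ in range(n_steps):
        x = decay * x + std * rng.standard_normal(x.shape)
    return x

rng = np.random.default_rng(1)
a = np.array([(k * np.pi) ** 2 for k in range(1, 6)])   # Dirichlet eigenvalues, ell = 1
modes = sample_stochastic_convolution(a, T=0.5, n_steps=200, n_paths=50_000, rng=rng)

var_exact = (1.0 - np.exp(-2.0 * a * 0.5)) / (2.0 * a)  # Ito isometry, mode by mode
var_emp = modes.var(axis=0)
```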

3 Moderate deviations, importance sampling and asymptotic theory

3.1 General theory and main results

In this section we present some theoretical aspects of subsolution-based importance sampling in the moderate deviation regime, applied to our problem of interest. First, we recall the notion of a Moderate Deviation Principle (MDP).

Definition 3.1

Let \(T>0,\) let \({\mathcal {X}}=\mathcal {H}\) or \({\mathcal {E}},\) let \(x\in {\mathcal {X}}\) and let \({\mathcal {S}}_{x,T}:C([0,T];{\mathcal {X}})\rightarrow [0,\infty ]\) be a functional with compact sub-level sets.

(i) We say that the collection of \(C([0,T];{\mathcal {X}})\)-valued random elements \(\{X^\epsilon \}_{\epsilon \ll 1}\) satisfies an MDP with action functional \({\mathcal {S}}_{x,T}\) if, for all continuous and bounded \(g: C([0,T];{\mathcal {X}})\rightarrow \mathbb {R}\) and all scalings \(h(\epsilon )\) such that \(h(\epsilon )\rightarrow \infty \) and \(\sqrt{\epsilon }h(\epsilon )\rightarrow 0\) as \(\epsilon \rightarrow 0\)

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\frac{1}{h^2(\epsilon )}\log \mathbb {E}e^{-h^2(\epsilon )g(\eta _x^{\epsilon })}=-\inf _{\{\phi \in C([0,T];{\mathcal {X}}): \phi (0)=0\}} \big [ {\mathcal {S}}_{x,T}(\phi )+g(\phi ) \big ], \end{aligned}$$
(11)

where \(\eta _x^{\epsilon }\) is defined as in (3).

(ii) A Borel set \(E\subset C([0,T];{\mathcal {X}})\) will be called an \({\mathcal {S}}_{x,T}-\)continuity set if

$$\begin{aligned} \inf _{\phi \in {\bar{E}}}{\mathcal {S}}_{x,T}(\phi )=\inf _{\phi \in \mathring{E}}{\mathcal {S}}_{x,T}(\phi ). \end{aligned}$$

As mentioned in Sect. 1 we aim to compute probabilities of the form

$$\begin{aligned} P(\epsilon )=\mathbb {P}[ \tau _{x^*}^\epsilon \le T] \end{aligned}$$
(12)

for \(\epsilon \ll 1, T>0,\) where \( \tau _{x^*}^\epsilon =\inf \{ t>0: X_{x^*}^{\epsilon }(t)\notin D\}\) and

$$\begin{aligned} D=D_\epsilon =\mathring{B}_\mathcal {H}(x^*, L\sqrt{\epsilon }h(\epsilon )) \end{aligned}$$
(13)

for some \(L>0.\) Passing to the moderate deviation process \(\eta ^\epsilon _x\) and recalling that \(x^*\) is a (stable) equilibrium of \(X^0_x\) we see that

$$\begin{aligned} \eta _{x^*}^{\epsilon }=\frac{ X_{x^*}^{\epsilon }-x^*}{\sqrt{\epsilon }h(\epsilon )} \end{aligned}$$

and

$$\begin{aligned} \tau _{x^*}^\epsilon =\inf \{ t>0: \eta _{x^*}^{\epsilon }(t)\notin \mathring{B}_\mathcal {H}(0, L)\}. \end{aligned}$$
(14)

As will be shown in Sect. 3.4, \(\eta ^\epsilon _{x^*}\) converges, as \(\epsilon \rightarrow 0\), to the solution of a linear deterministic PDE with zero initial condition. Since 0 is the unique fixed point of this PDE, the limit process remains at 0 for all times and \(\lim _{\epsilon \rightarrow 0} P(\epsilon )=0.\) This is why accelerated methods are needed to estimate \(P(\epsilon )\) when \(\epsilon \) is small.

In this paper, we will only work with unbiased estimators. Hence, minimizing the variance of the estimator is equivalent to minimizing the second moment. As we show below, an upper bound for the exponential decay rate of the second moment of any unbiased estimator can be determined in terms of the action functional \({\mathcal {S}}_{x,T}.\)
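To make the comparison with standard Monte Carlo concrete: averaging \(N\) i.i.d. Bernoulli\((P(\epsilon ))\) samples gives relative standard error \(\sqrt{(1-P(\epsilon ))/(NP(\epsilon ))}\), so the cost of a fixed relative accuracy grows like \(1/P(\epsilon )\). A back-of-the-envelope sketch (ours):

```python
import math

def required_samples(p, rel_err):
    """Naive Monte Carlo sample size so that a Bernoulli(p) mean estimator
    has relative standard error at most rel_err: N >= (1 - p) / (p * rel_err^2)."""
    return math.ceil((1.0 - p) / (p * rel_err ** 2))

# cost blow-up for 10% relative error as the target probability shrinks
costs = {p: required_samples(p, 0.1) for p in (1e-2, 1e-4, 1e-6)}
```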

Lemma 3.1

Let \(P(\epsilon )\) be as in (12) and let \({\hat{P}}(\epsilon )\) be an unbiased estimator of \(P(\epsilon )\) with respect to a probability measure \(\bar{\mathbb {P}}\) defined on \((\Omega , {\mathscr {F}}).\) For any \(\phi \in C([0,T];{\mathcal {X}}),\) let \(\tau _\phi =\inf \{t>0: \phi (t)\notin \mathring{B}_{\mathcal {H}}(0,L) \}\) and

$$\begin{aligned} G_{T}(0,0):=\inf _{\{\phi \in C([0,T];{\mathcal {X}}): \phi (0)=0, \tau _\phi = T\}} {\mathcal {S}}_{x^*,T}(\phi ). \end{aligned}$$
(15)

If \(\{X^\epsilon \}\) satisfies an MDP with action functional \({\mathcal {S}}_{x,T}\) and \(E=\{ \phi \in C([0,T];\mathcal {H}): \tau _{\phi }\le T\}\) is an \({\mathcal {S}}_{x,T}-\)continuity set then

$$\begin{aligned} \limsup _{\epsilon \rightarrow 0} -\frac{1}{h^2(\epsilon )}\log {\bar{\mathbb {E}}}[ ({\hat{P}}(\epsilon ))^2]\le 2G_T(0,0), \end{aligned}$$

where \({\bar{\mathbb {E}}}\) denotes expectation with respect to the measure \(\bar{\mathbb {P}}.\)

Proof

We have

$$\begin{aligned} P(\epsilon )=\mathbb {P}[ \tau _{x^*}^\epsilon \le T]=\mathbb {P}[ \sup _{t\in [0,T]}\Vert \eta _{x^*}^{\epsilon }(t) \Vert _{\mathcal {H}}\ge L ]= \mathbb {P}[ \eta ^\epsilon _{x^*}\in E ]. \end{aligned}$$

Now, for any unbiased estimator \({\hat{P}}(\epsilon ),\)

$$\begin{aligned} {\bar{\mathbb {E}}}[ {\hat{P}}(\epsilon )^2 ]\ge {\bar{\mathbb {E}}}[ {\hat{P}}(\epsilon ) ]^2=P(\epsilon )^2 , \end{aligned}$$

where we used Jensen’s inequality. Thus

$$\begin{aligned} \begin{aligned} \limsup _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log {\bar{\mathbb {E}}}[ {\hat{P}}(\epsilon )^2 ]&\le 2\limsup _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log P(\epsilon )\\ {}&=-2\liminf _{\epsilon \rightarrow 0} \frac{1}{h^2(\epsilon )}\log P(\epsilon )\\ {}&=2\inf _{\{\phi \in C([0,T];{\mathcal {X}}): \phi (0)=0,\phi \in E\}}{\mathcal {S}}_{x^*,T}(\phi )\le 2G_T(0,0) \end{aligned} \end{aligned}$$

where we used the continuity property of E in the last equality. \(\square \)

As in finite dimensions (see e.g. the discussion in Section 2.2 of [23]), the previous lemma shows that \(2G_T(0,0)\) is the best possible exponential decay rate for the second moment of any unbiased estimator. In turn, this motivates the following criterion for asymptotic optimality.

Definition 3.2

An unbiased estimator \({\hat{P}}(\epsilon )\) of \(P(\epsilon )\) defined on a probability space \((\Omega , {\mathscr {F}}, {\bar{\mathbb {P}}})\) will be called asymptotically optimal if

$$\begin{aligned} \liminf _{\epsilon \rightarrow 0} -\frac{1}{h^2(\epsilon )}\log {\bar{\mathbb {E}}}[ ({\hat{P}}(\epsilon ))^2]\ge 2G_T(0,0). \end{aligned}$$

In other words, an estimator is asymptotically optimal if its second moment achieves the best possible exponential decay rate in the limit as \(\epsilon \rightarrow 0\).

Importance sampling involves changes of measure chosen to guarantee that the corresponding estimators achieve optimal (or nearly optimal) asymptotic behavior. Given a measurable feedback control (or change of measure) \(u:[0,T]\times \mathcal {H}\rightarrow \mathcal {H}\) that is bounded on bounded subsets of \(\mathcal {H},\) we define a family of probability measures \(\{\mathbb {P}^\epsilon \}_{\epsilon >0}\) on \((\Omega , {\mathscr {F}})\) such that, for all \(\epsilon \), \(\mathbb {P}^\epsilon \ll \mathbb {P}\) on \({\mathscr {F}}_T\) and

$$\begin{aligned} \frac{d\mathbb {P}^\epsilon }{d\mathbb {P}}\bigg |_{{\mathscr {F}}_T}=\exp \bigg ( h(\epsilon )\int _{0}^{T}\big \langle u\big (s,\eta ^{\epsilon }_{x^*}(s)\big ), dW(s)\big \rangle _\mathcal {H}-\frac{h^2(\epsilon )}{2}\int _{0}^{T } \Vert u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \bigg ). \end{aligned}$$

Using these new measures, it is straightforward to verify that

$$\begin{aligned} {\hat{P}}(\epsilon ,u):=\frac{d\mathbb {P}}{d\mathbb {P}^\epsilon }\mathbb {1}_{\{\tau ^{\epsilon }_{x^*}\le T\}}, \end{aligned}$$

defined on \(( \Omega , {\mathscr {F}}_T, \mathbb {P}^{\epsilon } )\), is an unbiased estimator of \(\mathbb {P}[\tau ^{\epsilon }_{x^*}\le T]\). Its second moment is given by

$$\begin{aligned} \begin{aligned} Q^{\epsilon }(u):=&\mathbb {E}^\epsilon \big [{\hat{P}}(\epsilon ,u)^2\big ]= \mathbb {E}^\epsilon \bigg [\exp \bigg ( -2h(\epsilon )\int _{0}^{\tau ^{\epsilon }_{x^*}}\big \langle u\big (s,\eta ^{\epsilon }_{x^*}(s)\big ), dW(s)\big \rangle _\mathcal {H}\\&+h^2(\epsilon )\int _{0}^{\tau ^{\epsilon }_{x^*} } \Vert u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \bigg )\mathbb {1}_{\{\tau ^{\epsilon }_{x^*}\le T\}}\bigg ]. \end{aligned} \end{aligned}$$
(16)
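The structure of the estimator and of (16) can be illustrated on a scalar caricature of the moderate-deviation dynamics, \(d\eta =-a\eta \,dt+h^{-1}dW,\) with exit from \((-L,L)\) before time T. Under \(\mathbb {P}^\epsilon \) the drift gains \(+u\) and each path carries the likelihood weight \(\exp \big (-h\int u\,d{\widetilde{W}}-\frac{h^2}{2}\int u^2dt\big )\), where \({\widetilde{W}}\) is the Brownian motion actually simulated. An Euler-Maruyama sketch (ours; a constant control u, not the optimized feedback controls constructed in this paper):

```python
import numpy as np

def ou_exit_prob(a, h, L, T, u, n_steps, n_paths, rng):
    """Unbiased importance-sampling estimate of P(sup_{t<=T} |eta(t)| >= L) for
    d eta = -a eta dt + (1/h) dW, eta(0) = 0, by simulating the tilted dynamics
    d eta = (-a eta + u) dt + (1/h) dW~ and reweighting with the Girsanov density."""
    dt = T / n_steps
    eta = np.zeros(n_paths)
    log_w = np.zeros(n_paths)            # log dP/dP^eps accumulated along each path
    exited = np.zeros(n_paths, dtype=bool)
    for _ in range(n_steps):
        dW = np.sqrt(dt) * rng.standard_normal(n_paths)
        eta += (-a * eta + u) * dt + dW / h
        log_w -= h * u * dW + 0.5 * (h * u) ** 2 * dt
        exited |= np.abs(eta) >= L
    return float(np.mean(np.exp(log_w) * exited))

rng = np.random.default_rng(2)
params = dict(a=1.0, h=2.0, L=1.0, T=1.0, n_steps=400, n_paths=50_000)
p_is = ou_exit_prob(u=1.0, rng=rng, **params)   # tilted dynamics + reweighting
p_mc = ou_exit_prob(u=0.0, rng=rng, **params)   # u = 0: plain Monte Carlo
```

Setting u=0 recovers the naive Monte Carlo estimator, so the two runs should agree up to sampling error, while the tilted run allocates far more paths to the exit event.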

As we show in the next lemma, \(Q^{\epsilon }(u)\) admits a variational stochastic control representation which will be useful for studying its asymptotic behavior. A similar variational formula can be found in (2.5) of [46].

Lemma 3.2

Let \(u:[0,T]\times \mathcal {H}\rightarrow \mathcal {H}\) be a measurable feedback control that is bounded on bounded subsets of \(\mathcal {H}\), uniformly in \(t\in [0,T]\). Then for all \(\epsilon >0\)

$$\begin{aligned} \begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u)=\inf _{v\in {\mathcal {A}}}\mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}}\Vert v(s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}}\Vert u\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ], \end{aligned}\nonumber \\ \end{aligned}$$
(17)

where \({\mathcal {A}}\) is the collection of all \(\mathcal {H}\)-valued, \({\mathscr {F}}_{t\ge 0}\)-adapted processes v defined on [0, T] such that

$$\begin{aligned} \hat{\tau }^{\epsilon ,v}_{x^*}=\inf \{ t>0 : \hat{\eta }^{\epsilon ,v}_{x^*}(t)\notin \mathring{B}_\mathcal {H}(0, L) \}\le T \end{aligned}$$

with probability 1,  \(\hat{\eta }^{\epsilon ,v}_{x^*}\) solves

$$\begin{aligned} \left\{ \begin{aligned}&d\hat{\eta }^{\epsilon ,v}_{x^*}(t)=\Big [A\hat{\eta }^{\epsilon ,v}_{x^*}(t)+\frac{1}{\sqrt{\epsilon }h(\epsilon )}\big ( F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v}_{x^*}(t) \big )-F\big (x^* \big ) \big )\Big ]dt\\&\qquad \qquad \qquad +\big [v(t) -u\big (t,\hat{\eta }^{\epsilon ,v}_{x^*}(t)\big )\big ]dt+ \frac{1}{h(\epsilon )}dW(t)\\ {}&\hat{\eta }^{\epsilon ,v}_{x^*}(0)=0_\mathcal {H}\end{aligned}\right. \end{aligned}$$
(18)

and

$$\begin{aligned} \mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}}\Vert v(s)\Vert ^2_\mathcal {H}ds<\infty . \end{aligned}$$

Proof

Let \(\epsilon >0.\) From the Cameron-Martin-Girsanov theorem,

$$\begin{aligned} \begin{aligned} Q^{\epsilon }(u)&=\mathbb {E}^\epsilon \bigg [\exp \bigg ( -2h(\epsilon )\int _{0}^{\tau ^{\epsilon }_{x^*}}\big \langle u\big (s,\eta ^{\epsilon }_{x^*}(s)\big ), dW^\epsilon (s)\big \rangle _\mathcal {H}\\&\quad -h^2(\epsilon )\int _{0}^{\tau ^{\epsilon }_{x^*} } \Vert u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \bigg )\mathbb {1}_{\{\tau ^{\epsilon }_{x^*}\le T\}}\bigg ], \end{aligned} \end{aligned}$$

where

$$\begin{aligned} W^\epsilon (t):=W(t)-h(\epsilon )\int _{0}^{t}u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )ds\;, t\in [0,T] \end{aligned}$$

is a cylindrical Wiener process under \(\mathbb {P}^\epsilon \). Using yet another change of measure with

$$\begin{aligned} \frac{d{\tilde{\mathbb {P}}}^\epsilon }{d\mathbb {P}^\epsilon }\bigg |_{{\mathscr {F}}_T}=&\exp \bigg (-2 h(\epsilon )\int _{0}^{T}\big \langle u\big (s,\eta ^{\epsilon }_{x^*}(s)\big ), dW^{\epsilon }(s)\big \rangle _\mathcal {H}\\&-2h^2(\epsilon )\int _{0}^{T } \Vert u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \bigg ), \end{aligned}$$

we can write

$$\begin{aligned} \begin{aligned} Q^{\epsilon }(u)&= {\tilde{\mathbb {E}}}^\epsilon \bigg [\exp \bigg ( h^2(\epsilon )\int _{0}^{\tau ^{\epsilon }_{x^*} } \Vert u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \bigg )\mathbb {1}_{\{\tau ^{\epsilon }_{x^*}\le T\}}\bigg ] \\&=\mathbb {E}\bigg [\exp \bigg ( h^2(\epsilon )\int _{0}^{\hat{\tau }^{\epsilon }_{x^*} } \Vert u\big (s,\hat{\eta }^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \bigg )\mathbb {1}_{\{\hat{\tau }^{\epsilon }_{x^*}\le T\}}\bigg ], \end{aligned} \end{aligned}$$
(19)

where \(\hat{\eta }^{\epsilon }_{x^*}\) solves

$$\begin{aligned} \begin{aligned} \big \{d\hat{\eta }^{\epsilon }_{x^*}(t)&=A\hat{\eta }^{\epsilon }_{x^*}(t)+\frac{1}{\sqrt{\epsilon }h(\epsilon )}\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon }_{x^*}(t) \big )-F\big (x^* \big ) \big ]dt\\&\quad -u\big (t,\hat{\eta }^{\epsilon }_{x^*}(t)\big )dt+ \frac{1}{h(\epsilon )}dW(t)\;,\;\; \hat{\eta }^{\epsilon }_{x^*}(0)=0_\mathcal {H}\big \} \end{aligned} \end{aligned}$$

and \(\hat{\tau }^{\epsilon }_{x^*}\) denotes the corresponding exit time for \(\hat{\eta }^{\epsilon }_{x^*}\). This follows, once again, from the Cameron-Martin-Girsanov theorem, as

$$\begin{aligned} {\tilde{W}}^{\epsilon }(t):=W(t)+h(\epsilon )\int _{0}^{t}u\big (s,\eta ^{\epsilon }_{x^*}(s)\big )ds\;, t\in [0,T] \end{aligned}$$

is a cylindrical Wiener process under the measure \({\tilde{\mathbb {P}}}^\epsilon \). From (19) we see that the second moment of the estimator can be written as an exponential functional of the driving noise and, as such, it admits the variational representation (17) (see (2.5) in [46] as well as (14) in [50] for the finite-dimensional case). \(\square \)

The form of the MDP action functional provides essential information for choosing changes of measure u that perform well asymptotically. In particular, if for all \(\phi \) with \({\mathcal {S}}_{x,T}(\phi )<\infty \) there exists a (local) Lagrangian \({\mathcal {L}}_x\) defined on a subset of \({\mathcal {X}}\times \mathcal {H},\) such that

$$\begin{aligned} {\mathcal {S}}_{x,T}(\phi )=\int _{0}^{T}{\mathcal {L}}_x(\phi (t),{\dot{\phi }}(t)) dt\;, \end{aligned}$$
(20)

then "good" changes of measure are connected to subsolutions of the PDE

$$\begin{aligned} \left\{ \begin{aligned}&\partial _tU(t,\eta )+\mathbb {H}_x\big (\eta ,D_\eta U(t,\eta )\big )=0\;,\;(t,\eta )\in [0,T)\times {\mathcal {K}}\\ {}&U(T,\eta )={\bar{g}}(\eta )\;,\;\eta \in {\mathcal {K}}\subset \mathcal {H}, \end{aligned}\right. \end{aligned}$$
(21)

with

$$\begin{aligned} {\bar{g}}(\eta )=\left\{ \begin{array}{ll} 0,&{}\quad \eta : \Vert \eta \Vert _\mathcal {H}\ge L\\ \infty ,&{}\quad \eta : \Vert \eta \Vert _\mathcal {H}< L. \end{array}\right. \end{aligned}$$

Here, \(\mathbb {H}_x\) denotes the Hamiltonian corresponding to \({\mathcal {L}}_x\) via the Legendre transform (up to a sign). In the problems we consider, these Lagrangians and Hamiltonians are not well-defined on the whole space but rather on a subset \({\mathcal {K}}\times \mathcal {H}\subset \mathcal {H}\times \mathcal {H},\) see e.g. (23) below. The notion of subsolution is meant in the sense of the following definition.

Definition 3.3

A subsolution of (21) is any \(U:[0,T]\times {\mathcal {K}}\rightarrow \mathbb {R}\) such that \(U(\cdot ,\eta )\in C^1(0,T)\) for all \(\eta \in {\mathcal {K}}\), \(U(t,\cdot )\in C^1({\mathcal {K}})\) for all \(t\in [0,T]\) in the sense of Fréchet differentiation, and U satisfies

$$\begin{aligned} \left\{ \begin{aligned}&\partial _tU(t,\eta )+\mathbb {H}_x\big (\eta ,D_\eta U(t,\eta )\big )\ge 0\;,\;(t,\eta )\in [0,T)\times {\mathcal {K}}\\ {}&U(T,\eta )\le {\bar{g}}(\eta )\;,\;\eta \in {\mathcal {K}}\subset \mathcal {H}. \end{aligned}\right. \end{aligned}$$

The interested reader is referred to [25] for the original development of subsolution-based importance sampling. As we will show below (Theorem 3.1 and Remark 11), when \(x=x^*\), the MDP action functional takes the form (20) with

$$\begin{aligned} {\mathcal {L}}_{x^*}(\eta ,v)=\frac{1}{2}\Vert v-[A+DF(x^*)]\eta \Vert _\mathcal {H}^2,\;\; (\eta ,v)\in \big (Dom(A)\cap {\mathcal {E}}\big )\times \mathcal {H}\end{aligned}$$
(22)

and the corresponding Hamiltonian is given by

$$\begin{aligned} \mathbb {H}_{x^*}(\eta ,p)=\big \langle [A+DF(x^*)]\eta , p\big \rangle _\mathcal {H}-\frac{1}{2}\Vert p\Vert _\mathcal {H}^2\;, (\eta ,p)\in \big (Dom(A)\cap {\mathcal {E}}\big )\times \mathcal {H}.\nonumber \\ \end{aligned}$$
(23)

A direct consequence of (20) is that we can construct an explicit stationary subsolution in terms of the corresponding quasipotential. The latter is given by

$$\begin{aligned} \begin{aligned} V_{x^*}(\eta )&=\inf \{ {\mathcal {S}}_{x^*,T}(\phi ) :\phi \in C([0,T];{\mathcal {X}}),\ \phi (0)=0,\ \phi (T)=\eta ,\ T\in (0,\infty )\}\\ {}&=\Vert (-A)^\frac{1}{2}\eta \Vert _\mathcal {H}^2-\big \langle DF(x^*)\eta , \eta \big \rangle _\mathcal {H}\\ {}&=-\big \langle [A+DF(x^*)]\eta , \eta \big \rangle _\mathcal {H}\;,\; \eta \in Dom(A) \end{aligned} \end{aligned}$$

and \(V_{x^*}(\eta )=\infty \) otherwise. A physical interpretation of \(V_{x^*}(\eta )\) is that of the minimal "energy" required to push a path from 0 to the state \(\eta \) and its explicit form is a consequence of the fact that (8) is, in our setting, a gradient system (see e.g. [16, 17] Section 12.2.3 for SRDEs). In view of Hypotheses 1(a), 1(b), 3(a), 3(b) it follows that

$$\begin{aligned} U(t,\eta )=a_1^fL^2-V_{x^*}(\eta ) \end{aligned}$$
(24)

is a subsolution of (21) on \({\mathcal {K}}=Dom(A)\). The final condition is satisfied since \(a_1^f=\inf _{n\in \mathbb {N}}a_n^f.\)
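To illustrate, in a finite-dimensional spectral truncation the subsolution property of (24) can be checked directly: \(\partial _tU=0\) and, with \(p=D_\eta U(t,\eta )=2[A+DF(x^*)]\eta \), the Hamiltonian (23) vanishes identically, so U is in fact a stationary solution of the Hamilton–Jacobi inequality. The following sketch verifies this numerically; the eigenvalues below are hypothetical illustrative values, not taken from the paper.

```python
import numpy as np

# Hypothetical eigenvalues a_k^f > 0, so that [A + DF(x*)] e_k^f = -a_k^f e_k^f
a = np.array([1.0, 2.5, 3.7, 5.2])
L = 2.0

rng = np.random.default_rng(0)
eta = rng.normal(size=a.size)        # truncated state, coordinates in {e_k^f}

V = np.sum(a * eta**2)               # quasipotential V(eta) = -<[A+DF(x*)]eta, eta>
U0 = a[0] * L**2 - V                 # subsolution (24); time-independent
p = -2.0 * a * eta                   # D_eta U = 2[A+DF(x*)]eta in coordinates
drift = -a * eta                     # [A+DF(x*)]eta in coordinates

# Hamiltonian (23): H(eta, p) = <[A+DF(x*)]eta, p> - 0.5 ||p||^2
H = np.dot(drift, p) - 0.5 * np.dot(p, p)
assert abs(H) < 1e-12 * max(1.0, np.dot(p, p))   # dU/dt + H = 0 exactly

# terminal condition: U(T, .) <= g-bar on the exit set, since a_1^f = inf a_k^f
if np.sum(eta**2) >= L**2:
    assert U0 <= 0.0
```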

Remark 3

In finite-dimensional systems, feedback controls (or changes of measure) defined by \(u(t,\eta )=-D_\eta U(t,\eta )\) lead to nearly optimal asymptotic behavior (see [22] Section 2.3, [23] Theorem 2.4 for large-deviation and [50] Theorem 3.1 for moderate deviation-based schemes). A first issue that appears in infinite dimensions is that \(u(t, \hat{\eta }^{\epsilon ,v}_{x^*}(t))\) is not well-defined since, with probability 1 and for all t, \(\hat{\eta }^{\epsilon ,v}_{x^*}(t)\notin Dom(A)\). The latter is a consequence of the spatial irregularity of the noise.

Throughout the rest of this paper, \(P^f_n:\mathcal {H}\rightarrow \mathcal {H}\) denotes the orthogonal projection onto the \(n\)-dimensional eigenspace \(\text {span}\{ e^f_j\}_{j=1}^{n}\) and we consider the "projected" quasipotential \(V_{x^*}(P^f_n\eta )\) (which for \(n=1\) equals \(V_{x^*}(\langle \eta , e_1^f\rangle _\mathcal {H}e_1^f)\)) and the subsolution \(U(t,P^f_n\eta )\) of (21) (with \({\mathcal {K}}=P_n^f\mathcal {H}\)). The changes of measure we will use are given by

$$\begin{aligned} u_{k_0}(t,\eta ):=-D_\eta U(t,P^f_{k_0}\eta )=2\sum _{i=1}^{k_0}a_i^f\langle \eta , e_i^f\rangle _\mathcal {H}e_i^f, \end{aligned}$$
(25)

with \(k_0\) as in Hypothesis 3c’. For implementation purposes, \(u_{k_0}\) is replaced by a sequence \(u_{k_0}^\epsilon \) that converges to \(u_{k_0}\) as \(\epsilon \rightarrow 0\). For more details on the choice of \(u_{1}^\epsilon \) see (63) and the discussion in Sect. 4 below.

We can now present our main results on the asymptotic behavior of the scheme.

Theorem 3.1

(Moderate Deviations) Let \(T>0, L>0\) as in (13), \(k_0\) as in Hypothesis 3c’, \(u_{k_0}\) as in (25), \(Q^\epsilon \) as in (16) and \(B_\mathcal {H}(0,L)\subset \mathcal {H}\) denote the closed ball of radius L centered at the origin. Moreover let \(u_{k_0}^\epsilon :[0,T]\times \mathcal {H}\rightarrow \mathcal {H}\) be a sequence that converges pointwise and uniformly over bounded subsets of \(\mathcal {H}\) to \(u_{k_0}\),

$$\begin{aligned} {\mathcal {T}}=\big \{y\in C([0,T];\mathcal {H}): y(0)=0,\ \exists \tau \in (0,T] : y(\tau )\in \partial B_\mathcal {H}(0,L),\ y(t)\in B_\mathcal {H}(0,L)\ \forall t\in [0,\tau )\big \} \end{aligned}$$
(26)

and

$$\begin{aligned} {\mathcal {C}}_{y,x^*}=\big \{v\in L^2([0,T];\mathcal {H}): {\dot{y}}(t) = Ay(t) +DF(x^*)y(t)-u_{k_0}(t,y(t))+v(t)\big \}. \end{aligned}$$

Under Hypotheses 1(a)-(c), 2(a),(b), 3(a),(b),(c’) we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u_{k_0}^\epsilon )= \inf _{y\in {\mathcal {T}}}\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(t)\Vert ^2_\mathcal {H}-\Vert u_{k_0}(t,y(t))\Vert _\mathcal {H}^2\bigg ) dt,\nonumber \\ \end{aligned}$$
(27)

with the convention that the infimum over the empty set is \(\infty .\)

Remark 4

A few comments on (27) are in order: (1) If \(y\in H^1((0,T);\mathcal {H})\cap L^2([0,T];Dom(A)),\) the set \({\mathcal {C}}_{y,x^*}\) reduces to the singleton \(\{{\bar{v}}(t):= {\dot{y}}(t)-Ay(t)-DF(x^*)y(t)+u_{k_0}(t,y(t))\}\) and for any \(y\notin H^1((0,T);\mathcal {H})\cap L^2([0,T];Dom(A)),\) \({\mathcal {C}}_{y,x^*}\) is empty. (2) Using the same notation, it follows that the right-hand side of (27) can be expressed as

$$\begin{aligned} \inf _{y\in {\mathcal {T}}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert {\bar{v}}(t)\Vert ^2_\mathcal {H}-\Vert u_{k_0}(t,y(t))\Vert _\mathcal {H}^2\bigg ) dt. \end{aligned}$$

(3) Since the functional on the right-hand side involves only the values of y on \([0,\tau ]\) it is straightforward to see that the infimum can in fact be taken over paths \(y\in C([0,\tau ];\mathcal {H})\) that satisfy the constraints in (26).

Using the moderate deviation asymptotics of Theorem 3.1 we can then prove the following:

Theorem 3.2

(Near asymptotic optimality) Let \(L,T>0\), \(k_0,u_{k_0},u_{k_0}^\epsilon :[0,T]\times \mathcal {H}\rightarrow \mathcal {H}\) as in Theorem 3.1, \({\mathcal {A}}\) as in Lemma 3.2, U as in (24) and \(G_T\) as in (15). For any sequence \(\{v^\epsilon \}\subset {\mathcal {A}}\) such that

$$\begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u_{k_0}^\epsilon )\ge \mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert v^\epsilon (s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert u_{k_0}^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ]-\epsilon ^2 \end{aligned}$$
(28)

we have

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\mathbb {E}\big \langle \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}\big (\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\big ), e_1^f \big \rangle ^2_\mathcal {H}= L^2. \end{aligned}$$
(29)

Moreover, we have the second moment bounds

$$\begin{aligned} G_T(0,0)+U(0,0) \le \lim _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u_{k_0}^\epsilon )\le 2G_T(0,0), \end{aligned}$$
(30)

where \(U(0,0)\le G_T(0,0)\) and \(G_T(0,0)\longrightarrow U(0,0)\) as \(T\rightarrow \infty .\)

The first statement above asserts that the limiting controlled trajectories exit the domain D through the boundary near the direction of the eigenvector \(e_1^f\) (see Hypotheses 3(c), (c’)). Finally, (30) shows that, for any finite time horizon T, our scheme is close to asymptotic optimality, according to Definition 3.2, and achieves optimal behavior in the limit \(\epsilon \rightarrow 0, T\rightarrow \infty \). Near asymptotic optimality is a common feature of importance sampling schemes for continuous-time dynamics even in finite dimensions. This is mainly a consequence of using subsolutions of (21) instead of exact solutions, which are rarely available in explicit form. Our numerical studies indicate that near optimality translates to substantially superior performance in comparison to standard Monte Carlo.

Remark 5

The moderate deviation regime allows us to work with the exit problem of a linear equation instead of that of the initial nonlinear SRDE (1). The "drift" of this linear equation is given by \(A+DF(x^*)\) and thus the dominant eigenpairs of this operator govern the exit time and exit place asymptotics. As mentioned in the introduction, similar statements have been proved for finite-dimensional linear equations in [51] (see e.g. Theorem 6).

3.2 On the asymptotic exit direction

In this section we study the limiting variational problem appearing on the right-hand side of (27). In particular, we will show that, under Hypothesis 3c’, changes of measure that force the dynamics in the \(e_1^f\) direction lead to minimal paths that exit from the ball \(\mathring{B}_\mathcal {H}(0,L)\) through the same direction. From this point on we will only use the notation \({\mathcal {S}}_{x,T}\) to denote the explicit action functional

$$\begin{aligned} {\mathcal {S}}_{x,T}(\phi )=\frac{1}{2}\int _{0}^{T}\big \Vert {\dot{\phi }}(t)-[A+DF\big (X^0_x(t)\big )]\phi (t)\big \Vert ^2_{\mathcal {H}}dt. \end{aligned}$$
(31)

Moving on to the variational problem in (27), we let \(I^{k_0}:{\mathcal {T}}\subset C([0,T];\mathcal {H})\rightarrow \mathbb {R},\)

$$\begin{aligned} I^{k_0}(y):=\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(t)\Vert ^2_\mathcal {H}-\Vert u_{k_0}(t,y(t))\Vert _\mathcal {H}^2\bigg ) dt \end{aligned}$$
(32)

and seek to characterize \(\arg \min _{y\in {\mathcal {T}}}I^{k_0}(y).\) For the first part of this section we consider the case \(k_0=1\) covered by Hypothesis 3(c). The more general setting of Hypothesis 3c’ will be studied in Proposition 3.1 below. For the sake of simplicity we will drop the superscript \(k_0\) and write \(I\equiv I^{1}\) and \(u\equiv u_{1}\) unless otherwise stated.

A first observation is that \(I(y)<\infty \) if and only if \(y\in H^1((0,T);\mathcal {H})\cap L^2([0,T];Dom(A))\) and for all such y the infimum above is uniquely attained by

$$\begin{aligned} {\bar{v}}(t)= {\dot{y}}(t)-Ay(t)-DF(x^*)y(t)+u(t,y(t)),\; t\in [0,T] \end{aligned}$$

(see also Remark 4 above). Therefore, in view of (25), we can re-express I as follows:

$$\begin{aligned} I(y)&=\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert {\bar{v}}(t)\Vert ^2_\mathcal {H}-\Vert u(t,y(t))\Vert _\mathcal {H}^2\bigg ) dt\nonumber \\ {}&=\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert {\dot{y}}(t)-Ay(t) -DF(x^*)y(t)+u(t,y(t))\Vert ^2_\mathcal {H}-\Vert u(t,y(t))\Vert _\mathcal {H}^2\bigg )\ dt\nonumber \\ {}&=\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert {\dot{y}}(t)-Ay(t) -DF(x^*)y(t)\Vert ^2_\mathcal {H}-\frac{1}{2}\Vert u(t,y(t))\Vert _\mathcal {H}^2 \bigg )dt\nonumber \\ {}&\quad +\int _{0}^{\tau }\big \langle {\dot{y}}(t), u(t,y(t))\big \rangle _\mathcal {H}dt- \int _{0}^{\tau }\big \langle Ay(t) +DF(x^*)y(t), u(t,y(t))\big \rangle _\mathcal {H}dt\nonumber \\ {}&={\mathcal {S}}_{x^*,\tau }(y)-\int _{0}^{\tau }\frac{1}{2}\Vert u(t,y(t))\Vert _\mathcal {H}^2 dt+2a_1^f\int _{0}^{\tau }\langle y(t), e_1^f\rangle _\mathcal {H}\langle {\dot{y}}(t), e_1^f\rangle _\mathcal {H}dt\nonumber \\ {}&\quad - 2a_1^f\int _{0}^{\tau }\big \langle Ay(t) +DF(x^*)y(t), e_1^f\big \rangle _\mathcal {H}\langle y(t), e_1^f\rangle _\mathcal {H}dt\nonumber \\ {}&={\mathcal {S}}_{x^*,\tau }(y)+2a_1^f\int _{0}^{\tau }\langle y(t), e_1^f\rangle _\mathcal {H}\langle {\dot{y}}(t), e_1^f\rangle _\mathcal {H}dt +2(a_1^{f})^2\int _{0}^{\tau } \langle y(t), e_1^f\rangle ^2_\mathcal {H}dt\nonumber \\&\quad -\int _{0}^{\tau }\frac{1}{2}\Vert u(t,y(t))\Vert _\mathcal {H}^2dt. \end{aligned}$$
(33)

The last two terms in the last display cancel each other, since \(\frac{1}{2}\Vert u(t,y(t))\Vert _\mathcal {H}^2=2(a_1^{f})^2\langle y(t), e_1^f\rangle ^2_\mathcal {H}\) by (25). Thus,

$$\begin{aligned} \begin{aligned} I(y)&={\mathcal {S}}_{x^*,\tau }(y)+a_1^f\int _{0}^{\tau }\frac{d}{dt}\bigg (\langle y(t), e_1^f\rangle ^2_\mathcal {H}\bigg )dt ={\mathcal {S}}_{x^*,\tau }(y)\\&\quad +a_1^f\big (\langle y(\tau ), e_1^f\rangle ^2_\mathcal {H}-\langle y(0), e_1^f\rangle ^2_\mathcal {H}\big ). \end{aligned} \end{aligned}$$
(34)
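The identity (34) can be sanity-checked numerically on a finite spectral truncation. In the sketch below the eigenvalues, exit time and test path are hypothetical choices; the control acts on the first mode only, as in (25) with \(k_0=1\).

```python
import numpy as np

def trap(f, t):
    """composite trapezoid rule along the last axis"""
    return np.sum(0.5 * (f[..., 1:] + f[..., :-1]) * np.diff(t), axis=-1)

a = np.array([1.0, 2.3, 4.1])             # hypothetical eigenvalues a_k^f
tau, z = 1.5, np.array([0.8, -0.5, 0.3])  # hypothetical exit time and endpoint

t = np.linspace(0.0, tau, 20001)
y  = np.outer(z, (t / tau) ** 2)          # smooth test path, y(0)=0, y(tau)=z
dy = np.outer(z, 2.0 * t / tau ** 2)      # its time derivative

w = dy + a[:, None] * y                   # ydot - [A+DF(x*)]y, coordinate-wise
u = 2.0 * a[0] * y[0]                     # u_1(t, y) acts on e_1^f only, cf. (25)

# left-hand side: integral of 0.5||v-bar||^2 - ||u||^2, v-bar = w + u e_1^f
lhs = trap(0.5 * ((w[0] + u) ** 2 + np.sum(w[1:] ** 2, axis=0)) - u ** 2, t)

# right-hand side of (34): action (31) plus the boundary term (y(0) = 0)
rhs = 0.5 * trap(np.sum(w ** 2, axis=0), t) + a[0] * z[0] ** 2

assert abs(lhs - rhs) < 1e-6 * abs(rhs)
```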

It is straightforward to verify that \(\arg \min _{y\in {\mathcal {T}}}I(y)\ne \varnothing ,\) i.e. the minimum value of I over the set \({\mathcal {T}}\subset C([0,T];\mathcal {H})\) is attained in \({\mathcal {T}}\). Indeed, \({\mathcal {S}}_{x^*,\cdot }:[0,T]\times C([0,T];\mathcal {H})\rightarrow [0, \infty ]\) is lower-semicontinuous and the second summand in (34) defines a continuous functional on the same set. Thus, I is itself lower-semicontinuous and furthermore \({\mathcal {T}}\) is closed in the topology of \(C([0,T];\mathcal {H})\) (recall that \(B_\mathcal {H}(0,L)\) in (26) is a closed ball in \(\mathcal {H})\).

Remark 6

We shall proceed to the characterization of minimizers in three steps. First we minimize over paths y with \(y(0)=0\) and \(y(\tau )=z\in \partial B_{\mathcal {H}}(0,L)\). Then we minimize over the exit place z and finally over the time \(\tau \) in which the path y hits the boundary \(\partial B_{\mathcal {H}}(0,L)\) of the closed ball \(B_{\mathcal {H}}(0,L)\). At this point, we emphasize that, in contrast to \(\tau ^\epsilon _{x^*}\) (14), \(\tau _\phi \) (Lemma 3.1),\(\hat{\tau }^{\epsilon , v}_{x^*}\) (Lemma 3.2) and \(\hat{\tau }^{\epsilon , v^\epsilon }_{x^*}\) (28), it is not known a priori whether the time \(\tau \) is the first exit time of y from the open ball \(\mathring{B}_\mathcal {H}(0,L)\). We will show that the latter is true for minimizing paths in Lemma 3.4 and Proposition 3.1 below.

Lemma 3.3

Let \(y^*\in \arg \min \{ I(y): y\in C([0,\tau ];\mathcal {H}), y(0)=0, y(\tau )=z\}.\) Then

$$\begin{aligned} y^*(t)= y_{z,\tau }^*(t) = \sum _{k=1}^{\infty }\frac{\sinh (a_k^{f}t) }{\sinh (a_k^{f}\tau )}\langle z,e_k^f\rangle _\mathcal {H}e_k^f, \;\;t\in [0,\tau ]. \end{aligned}$$

Proof

The fact that we minimize over \(y\in C([0,\tau ];\mathcal {H})\) instead of \(C([0,T];\mathcal {H})\) is justified by Remark 4(3). Next notice that \(y^*_{z,\tau }\in C([0,\tau ];\mathcal {H})\) since \(\sinh \) is increasing and continuous. In particular,

$$\begin{aligned} \sup _{t\in [0,\tau ]}\Vert y^*_{z,\tau }(t)\Vert ^2_{\mathcal {H}}\le \sum _{k=1}^{\infty }\langle z,e_k^f\rangle ^2_\mathcal {H}=\Vert z\Vert ^2_{\mathcal {H}}. \end{aligned}$$

Proceeding to the proof we have, in view of (34),

$$\begin{aligned} I(y)=\int _{0}^{\tau }{\mathcal {L}}_{x^*}(y(t),{\dot{y}}(t))dt+a_1^f\langle z, e_1^f\rangle ^2_\mathcal {H}\end{aligned}$$

with \({\mathcal {L}}_{x^*}\) as in (22). Minimizers are then governed by the Euler-Lagrange equation

$$\begin{aligned} \partial _tD_v{\mathcal {L}}_{x^*}(y(t),{\dot{y}}(t))=D_\eta {\mathcal {L}}_{x^*}(y(t),{\dot{y}}(t)) \end{aligned}$$
(35)

which boils down to

$$\begin{aligned} \big \{y''(t)=[A+DF(x^*)]^2y(t)\;, y(0)=0\;,\;y(\tau )=z\big \}. \end{aligned}$$

Projecting to the eigenbasis \(\{e_k^f\}_{k\in \mathbb {N}}\) of \(A+DF(x^*)\) we obtain

$$\begin{aligned} \frac{d^2}{dt^2}\langle y(t), e_k^f\rangle _\mathcal {H}=(a_k^{f})^2\langle y(t), e_k^f \rangle _\mathcal {H}\; ,\ k\in \mathbb {N},\ t\in [0,\tau ]. \end{aligned}$$

Letting \(y_k=\langle y, e_k^f\rangle _\mathcal {H},z_k=\langle z, e_k^f\rangle _\mathcal {H},\) the general solution of the latter has the form \( y_k(t)=c_1e^{a_k^ft}+c_2e^{-a_k^{f}t} \) and taking into account the initial and terminal conditions we obtain

$$\begin{aligned} c_1+c_2=0,\quad z_k=c_1e^{a_k^{f}\tau }+c_2e^{-a_k^{f}\tau }\implies c_1=\frac{z_k}{e^{a_k^{f}\tau }-e^{-a_k^{f}\tau }}. \end{aligned}$$

Thus,

$$\begin{aligned} y_k(t)=\frac{z_k(e^{a_k^{f}t}-e^{-a_k^{f}t})}{e^{a_k^{f}\tau }-e^{-a_k^{f}\tau }}=z_k\frac{\sinh (a_k^{f}t) }{\sinh (a_k^{f}\tau )}. \end{aligned}$$

\(\square \)
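As a numerical sanity check (with hypothetical values of \(a_k^f\), \(\tau \) and \(z_k\)), one can verify per mode that the \(\sinh \) profile above satisfies the boundary conditions, that its cost for the action (31) agrees with the closed form \(a_k^fz_k^2/(1-e^{-2a_k^f\tau })\), and that perturbations with the same endpoints are strictly more expensive:

```python
import numpy as np

def mode_action(phi, t, a):
    """0.5 * integral of (phi' + a phi)^2 dt, the k-th mode of the action (31)"""
    dphi = np.gradient(phi, t)                  # second-order finite differences
    f = 0.5 * (dphi + a * phi) ** 2
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

a_k, tau, z_k = 2.0, 1.5, 0.7                   # hypothetical a_k^f, tau, z_k
t = np.linspace(0.0, tau, 100001)
star = z_k * np.sinh(a_k * t) / np.sinh(a_k * tau)   # Euler-Lagrange solution

assert abs(star[0]) < 1e-12 and abs(star[-1] - z_k) < 1e-12   # endpoints

closed = a_k * z_k ** 2 / (1.0 - np.exp(-2.0 * a_k * tau))    # closed-form cost
assert abs(mode_action(star, t, a_k) - closed) < 1e-3 * closed

# perturbations vanishing at both endpoints can only increase the cost
bump = 0.3 * np.sin(np.pi * t / tau)
assert mode_action(star + bump, t, a_k) > mode_action(star, t, a_k)
```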

The next lemma is concerned with the exit direction when Hypothesis 3(c) holds.

Lemma 3.4

Let \(T>0,\) I as in (32) and \(u,{\mathcal {T}}, C_{y,x^*}\) as in Theorem 3.1. Under Hypothesis 3(c), any \(y^*\in \arg \min \{I(y); y\in {\mathcal {T}} \}\) first exits \(\mathring{B}_\mathcal {H}(0,L)\) at \(\tau =T\) in the direction of the eigenvector \(e_1^f\) (recall Remark 6), i.e. for all \(k\ge 2\),

$$\begin{aligned} \langle y^*(\tau ), e_k^f\rangle _\mathcal {H}=\langle y^*(T), e_k^f\rangle _\mathcal {H}=0, \end{aligned}$$

\(\Vert y^*(t)\Vert _{\mathcal {H}}<L\) for all \(t<T\) and \(\Vert y^*(T)\Vert _{\mathcal {H}}=L.\)

Proof

Let \(\phi ^*=\phi ^*_{z,\tau }\) be a minimizer provided by Lemma 3.3. Notice that, since the Euler-Lagrange equations provide necessary conditions for minimality, any \(\phi ^*\in \arg \min \{ I(y): y\in C([0,\tau ];\mathcal {H}), y(0)=0, y(\tau )=z\}\) will be of this form. After straightforward algebra we obtain

$$\begin{aligned} \begin{aligned} I(\phi ^*_{z,\tau })&= {\mathcal {S}}_{x^*,\tau }(\phi ^*)+a_1^f\langle z, e_1^f\rangle ^2_\mathcal {H}\\ {}&= \frac{1}{2}\sum _{k=1}^{\infty }\int _{0}^{\tau }\langle {\dot{\phi }}^*(t)-[A+DF(x^*)]\phi ^*(t),e^f_k\rangle _\mathcal {H}^2 dt+a_1^f\langle z, e^f_1\rangle ^2_\mathcal {H}\\ {}&=\frac{1}{2}\sum _{k=1}^{\infty }\int _{0}^{\tau }\big ({\dot{\phi }}_k^*(t)+a_k^{f}\phi _k^*(t)\big )^2dt+a_1^f\langle z, e^f_1\rangle ^2_\mathcal {H}\\ {}&=a_1^f z^2_1 +\sum _{k=1}^{\infty }\frac{a^{f}_kz_k^2}{1-e^{-2a^{f}_k\tau }}. \end{aligned} \end{aligned}$$

Now for each fixed \(\tau \), Hypothesis 3(c) guarantees that this quadratic form is minimized for \(z^*\in \partial B_{\mathcal {H}}(0,L)\) such that \(z^*_k=0\) for all \(k\ge 2\) and \(z_1^*=\pm L\) (see e.g. Theorem 3.4 in [46]). Then,

$$\begin{aligned} \begin{aligned} I(\phi ^*_{z^*,\tau })=a_1^fL^2\bigg (1+\frac{1}{1-e^{-2a^{f}_1\tau }}\bigg ) \end{aligned} \end{aligned}$$
(36)

is minimized for the largest possible \(\tau \) i.e. for \(\tau =T.\) Hence, since the order with which the variables are being minimized does not change the value of the minimum, we have \(\min _{y\in {\mathcal {T}}}I(y)= I(\phi ^*_{z^*,T})\) and the minimizers \(y^*=\phi ^*_{z^*,T}\) enjoy the desired properties. Finally, note that any element \(y^*\in \arg \min \{I(y); y\in {\mathcal {T}} \} \) is of the form \(\phi ^*_{z^*, T}.\) Indeed, fix the initial and terminal values \(y^*(0)=0, y^*(\tau )=z\in \partial B_\mathcal {H}(0, L)\) and assume that \(y^*\) does not satisfy the Euler-Lagrange equations (35). Since the latter provide necessary conditions for minimality, it follows that \(y^*\) is not a minimizer. Moreover, it follows from the previous calculations that if \(\tau <T\) or if \(z_k \ne 0\) for some \(k\ge 2\) then \(y^*\) cannot be a minimizer of I. The proof is complete.\(\square \)
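The monotonicity used in the last step is elementary: \(\tau \mapsto (1-e^{-2a^{f}_1\tau })^{-1}\) is strictly decreasing, so the cost (36) decreases towards its horizon value \(2a_1^fL^2\) as \(\tau \rightarrow \infty \), and \(\tau =T\) is optimal. A quick check with hypothetical values of \(a_1^f\) and L:

```python
import numpy as np

a1, L = 1.3, 2.0                       # hypothetical a_1^f and radius L
tau = np.linspace(0.05, 10.0, 2000)
cost = a1 * L**2 * (1.0 + 1.0 / (1.0 - np.exp(-2.0 * a1 * tau)))  # cost (36)

assert np.all(np.diff(cost) < 0)       # strictly decreasing in tau
assert abs(cost[-1] - 2.0 * a1 * L**2) < 1e-6   # tends to 2 a_1^f L^2
```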

As mentioned above, the previous lemma implies that, for any minimizing path \(y^*\), \(\tau \) is in fact the first exit time from the open ball \(\mathring{B}_\mathcal {H}(0,L),\) i.e. \(\tau =\inf \{t\in [0,T]: y^*\notin \mathring{B}_\mathcal {H}(0,L) \}\) and furthermore \(\tau =T\).

Remark 7

If the sampling time T is large enough, the results of Lemma 3.4 as well as Theorems 3.1, 3.2 remain true under the weaker spectral gap assumption that \(2a_1^f<a_k^f\) for all \(k\ge 2.\) Since we are interested in schemes that perform well for large values of T,  this generalization comes at no cost. For more details on this relaxed condition see [46, Theorem 3.9].

Up to this point we have worked under Hypothesis 3(c) to show that minimizers of the functional I lie on the one-dimensional subspace where the change of measure u acts. In the absence of a sufficiently large spectral gap the situation is more complicated. In particular, if the sampling time T is large enough, the minimizers can be orthogonal to u. In other words, forcing the system towards its physical exit direction \(e_1^f\) might actually lead to controlled trajectories that exit from a subspace that is orthogonal to \(e_1^f\) under the change of measure. This is proved in the following lemma.

Lemma 3.5

Assume that the eigenvalues \(\{a^f_k\}_{k\in \mathbb {N}}\) are strictly increasing, \(a_2^f< 2a_1^f\) and let

$$\begin{aligned} T^*:=-\frac{1}{2a^f_2}\ln \bigg (1-\frac{a^f_2}{2a_1^f}\bigg ). \end{aligned}$$

If \(T> T^*\) then any minimizer \(y^*\in \arg \min \{I(y); y\in {\mathcal {T}} \} \) satisfies \(\Vert y^*(t)\Vert _{\mathcal {H}}<L\) for all \(t<T\) and \(\Vert y^*(T)\Vert _{\mathcal {H}}=L.\) Moreover \(y^*\) first exits \(\mathring{B}_\mathcal {H}(0,L)\) at \(\tau =T\) in the direction of the eigenvector \(e_2^f\) (recall Remark 6) i.e. for all \(k\ne 2,\)

$$\begin{aligned} \langle y^*(\tau ), e_k^f\rangle _\mathcal {H}=\langle y^*(T), e_k^f\rangle _\mathcal {H}=0. \end{aligned}$$

Proof

As in the proof of Lemma 3.4 we have

$$\begin{aligned} \begin{aligned} I(\phi ^*_{z,\tau })=a_1^f z^2_1 +\sum _{k=1}^{\infty }\frac{a^{f}_kz_k^2}{1-e^{-2a^{f}_k\tau }}=: \sum _{k=1}^{\infty }\lambda _k^fz_k^2. \end{aligned} \end{aligned}$$

We claim that, without loss of generality, we can consider \(\tau \in (T^*,T].\) Assuming the latter for now, we can compare the weights \(\lambda _k^f\) to conclude that

$$\begin{aligned} \begin{aligned} \lambda _2^f&=\frac{a^{f}_2}{1-e^{-2a^{f}_2\tau }}< \frac{a^{f}_2}{1-e^{-2a^{f}_2T^*}}=\frac{a_2^f}{1-e^{\ln (1-a_2^f/2a_1^f)}}\\&=2a_1^f\le a^{f}_1\bigg (1+\frac{1}{1-e^{-2a^{f}_1\tau }}\bigg )=\lambda _1^f \end{aligned} \end{aligned}$$
(37)

and since \(x\mapsto x/(1-e^{-2\tau x})\) is (strictly) increasing for all \(\tau ,\) it follows that

$$\begin{aligned} \lambda _2^f< \lambda _k^f\;,\;\;\forall k> 2. \end{aligned}$$

Therefore, the quadratic form is minimized for \(z^*\in \partial B_\mathcal {H}(0,L)\) such that \(z^*_k=0\) for all \(k\ne 2\) and \(z^*_2=\pm L\). Consequently

$$\begin{aligned} \begin{aligned} \inf _{(z,\tau )\in \partial B_\mathcal {H}(0,L)\times [ T^*,T ] } I(\phi ^*_{z,\tau })=\frac{a^{f}_2L^2}{1-e^{-2a^{f}_2T}} \ge \inf _{(z,\tau )\in \partial B_\mathcal {H}(0,L)\times [0,T]}I(\phi ^*_{z,\tau }) \end{aligned} \end{aligned}$$
(38)

and

$$\begin{aligned} \begin{aligned} \inf _{(z,\tau )\in \partial B_\mathcal {H}(0,L)\times [0,T]}I(\phi ^*_{z,\tau })&\ge \inf _{(z,\tau )\in \partial B_\mathcal {H}(0,L)\times [ 0,T ] } \bigg (\inf _{k\in \mathbb {N}}\lambda _{k}^{f}(\tau )\bigg )\Vert z\Vert ^2_{\mathcal {H}}\\&= L^2\inf _{\tau \in [ 0,T ] } \bigg (\inf _{k\in \mathbb {N}}\lambda _{k}^{f}(\tau )\bigg ). \end{aligned} \end{aligned}$$

Since \(\lambda _2^f\le \lambda _k^f\) for all \(k\ge 2\) it follows that

$$\begin{aligned} \begin{aligned} \inf _{(z,\tau )\in \partial B_\mathcal {H}(0,L)\times [0,T]}I(\phi ^*_{z,\tau })\ge L^2 \bigg (\lambda _{1}^{f}(T)\wedge \lambda _{2}^{f}(T)\bigg )=L^2\lambda _{2}^{f}(T)=\frac{a^{f}_2L^2}{1-e^{-2a^{f}_2T}}, \end{aligned}\nonumber \\ \end{aligned}$$
(39)

which follows from (37) by setting \(\tau =T>T^*.\) Since the infimum is achieved at \(\tau =T\), the combination of (38) and (39) concludes the proof. \(\square \)
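The weight comparison in the proof is easy to reproduce numerically. The sketch below uses a hypothetical spectrum with an insufficient gap, \(a_2^f< 2a_1^f\), and checks (37) together with \(\lambda _2^f<\lambda _k^f\) for \(k>2\) on a grid of \(\tau >T^*\).

```python
import numpy as np

# hypothetical strictly increasing eigenvalues with a_2^f < 2 a_1^f
a = np.array([1.0, 1.6, 2.9, 4.4])
T_star = -np.log(1.0 - a[1] / (2.0 * a[0])) / (2.0 * a[1])

def lam(k, tau):
    """weight of z_{k+1}^2 in I(phi*_{z,tau}); the first mode also carries
    the boundary term a_1^f coming from (34)"""
    base = a[k] / (1.0 - np.exp(-2.0 * a[k] * tau))
    return base + (a[0] if k == 0 else 0.0)

for tau in np.linspace(1.001 * T_star, 5.0, 500):
    assert lam(1, tau) < lam(0, tau)              # (37): lambda_2 < lambda_1
    assert all(lam(1, tau) < lam(k, tau) for k in range(2, a.size))
```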

Remark 8

Lemma 3.5 highlights the importance of sufficient spectral gaps for the design of efficient changes of measure. If Hypothesis 3(c) fails, a scheme that forces the \(e_1^f\) direction will be far from optimal and is expected to produce large errors for small values of \(\epsilon .\) Under the assumptions of that lemma, one can repeat the arguments of the proof above to show

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u_{1}^\epsilon )=\frac{a_2^fL^2}{1-e^{-2a^{f}_2T}} <2G_T(0,0). \end{aligned}$$

If the ratio \(2a_1^f/a_2^f\) is large, this bound translates to sub-optimal performance as \(\epsilon \rightarrow 0\) which does not improve as \(T\rightarrow \infty .\) Moreover, as we will see in Sect. 5, this ratio depends non-trivially on the interval length \(\ell \) and is indeed large when \(\ell \) is moderately small. This behavior is caused by the linearization of the dynamics and is completely absent when \({f=0}.\) For an example that satisfies the assumptions of Lemma 3.5 see Sect. 5.1.

Before we conclude this section we consider once again the situation where the eigenvalues \(\{a_k^f\}_{k\in \mathbb {N}}\) do not satisfy Hypothesis 3(c) but instead Hypothesis 3c’ holds. We show that the conclusions of Lemma 3.4 can be recovered by projecting to a higher dimensional eigenspace of \(A+DF(x^*)\) consisting of the first \(k_0\) eigenvalues.

Proposition 3.1

Let \(k_0\) as in Hypothesis 3c’, U as in (24) and \(u_{k_0}\) as in (25). Under Hypothesis 3c’ any minimizer \(y^*\in \arg \min \{I^{k_0}(y); y\in {\mathcal {T}} \}\) satisfies the same properties as in Lemma 3.4.

Proof

Following the computations in (33), which carry over verbatim, we see that

$$\begin{aligned} I^{k_0}(y)={\mathcal {S}}_{x^*,\tau }(y)+\sum _{j=1}^{k_0}\bigg [a_j^f\bigg (\langle y(\tau ), e_j^f\rangle ^2_\mathcal {H}-\langle y(0), e_j^f\rangle ^2_\mathcal {H}\bigg )\bigg ]. \end{aligned}$$
(40)

Since the second term is constant for each fixed value of the exit point \(y(\tau ),\) the Euler-Lagrange equations and minimizers for this functional are then identical to those derived in Lemma 3.3 for I. Thus, for any minimizing path \(\phi ^*_{z,\tau }\) that hits the point \(z=(z_k)_{k\in \mathbb {N}}\in \partial B_{\mathcal {H}}(0,L)\) at time \(\tau \in [0,T]\) we have

$$\begin{aligned} \begin{aligned} I^{k_0}(\phi ^*_{z,\tau })&=\sum _{j=1}^{k_0}a^f_j z^2_j+\sum _{j=1}^{\infty }\frac{a_j^fz_j^2}{1-e^{-2a_j^f\tau }} =:\sum _{j=1}^{\infty }\lambda ^f_{k_0,j}z_j^2. \end{aligned} \end{aligned}$$

Comparing the weights \(\lambda ^f_{k_0,j}\) we see that for all \(1<j\le k_0\)

$$\begin{aligned} \lambda ^f_{k_0,1}=a_1^f\bigg (1+\frac{1}{1-e^{-2a_1^f\tau }}\bigg )< a_j^f\bigg (1+\frac{1}{1-e^{-2a_j^f\tau }}\bigg )=\lambda ^f_{k_0,j}, \end{aligned}$$

which holds since \(x\mapsto x/(1-e^{-2\tau x})\) is (strictly) increasing for all \(\tau \) and \(a_1^f<a_2^f\le a_j^f\) for any \(j\ge 2\). In order to show that minimizers point towards \(z_1\) it remains to compare \(\lambda ^f_{k_0,1}\) with \(\lambda ^f_{k_0,j}\) for \(j\ge k_0+1.\) Since \(\lambda ^f_{k_0,k_0+1}\le \lambda ^f_{k_0,k_0+2}\le \dots \) it suffices to consider \(\lambda ^f_{k_0,k_0+1}.\) In view of Hypothesis 3c’ and Theorem 3.4 of [46] we conclude that

$$\begin{aligned} \lambda ^f_{k_0,1}=a_1^f\bigg (1+\frac{1}{1-e^{-2a_1^f\tau }}\bigg )<\frac{a^f_{k_0+1}}{1-e^{-2\tau a^f_{k_0+1}}}=\lambda ^f_{k_0,k_0+1} \end{aligned}$$

for all \(\tau \in [0,T]\). The proof is complete.\(\square \)
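For a concrete spectrum in the spirit of Hypothesis 3c', the weight comparison can again be checked numerically; the eigenvalues, the value of \(k_0\) and the grid of \(\tau \) below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# illustrative eigenvalues: a cluster a_1 < a_2 < a_3 below the gap and a
# much larger a_{k0+1}, with k0 = 3 (hypothetical values)
a, k0 = np.array([1.0, 1.2, 1.5, 4.0, 6.0]), 3

def lam(j, tau):
    """weight lambda^f_{k0, j+1}: the quasipotential term is present
    only for the first k0 modes, cf. (40)"""
    base = a[j] / (1.0 - np.exp(-2.0 * a[j] * tau))
    return base + (a[j] if j < k0 else 0.0)

# the first weight is minimal, so minimizing paths exit along e_1^f
for tau in np.linspace(0.05, 20.0, 800):
    assert all(lam(0, tau) < lam(j, tau) for j in range(1, a.size))
```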

3.3 Tightness of \(\hat{\eta }_{x^*}^{\epsilon , v^\epsilon }\)

Let \(v^\epsilon \) be a sequence in \({\mathcal {A}}\) satisfying the assumptions of Theorem 3.2, \(u_{k_0}\) as in (25) and \(u_{k_0}^\epsilon :[0,T]\times \mathcal {H}\rightarrow \mathcal {H}\) be a sequence that converges pointwise and uniformly over bounded subsets of \(\mathcal {H}\) to \(u_{k_0}.\) The goal of this section is to prove tightness estimates for the collection \(\{ \hat{\eta }_{x^*}^{\epsilon , v^\epsilon }:\epsilon <\epsilon _0 \}\) of \(C([0,T];{\mathcal {X}})-\)valued random elements. Throughout the rest of this section we drop the index \(k_0\) and write \(u\equiv u_{k_0}, u^\epsilon \equiv u^{\epsilon }_{k_0} .\)

Recall that for each \(\epsilon ,\) \(\hat{\eta }_{x^*}^{\epsilon , v^\epsilon }\) is the unique mild solution of the controlled equation (18) with \(v=v^\epsilon , u=u^\epsilon .\) Existence and uniqueness is once again provided by Theorem 2.2 of [14] (see also Theorem 7.1 of [45]). The following lemma guarantees that, for \(\epsilon \) small, the sequence \(v^{\epsilon }\) is bounded in \(L^2\).

Lemma 3.6

There exists \(\epsilon _0>0\) and a constant \(C>0\) such that

$$\begin{aligned} \sup _{\epsilon <\epsilon _0}\mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\big \Vert v^\epsilon (s)\big \Vert ^2_{\mathcal {H}}ds\le C. \end{aligned}$$

Proof

In view of the variational representation (17) any approximate minimizer \(v^\epsilon \in {\mathcal {A}}\) satisfies

$$\begin{aligned} \begin{aligned} \mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert v^\epsilon (s)\Vert ^2_\mathcal {H}ds\bigg ]&\!\le \! -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u)\!+\!\mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\!\!\Vert u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\!+\!\epsilon ^2 \end{aligned} \end{aligned}$$

Now from the MDP for bounded functionals (see Definition 3.1 as well as Remark 11 below), along with Lemma 3.1, there exists a constant \(C>0\) such that, for \(\epsilon \) sufficiently small,

$$\begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u)\le C. \end{aligned}$$

Hence, the estimate follows from the uniform convergence of \(u^\epsilon \) to u, the uniform boundedness of u on bounded subsets of \(\mathcal {H}\), and the fact that \(\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\le T\) with probability 1. \(\square \)
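For completeness, the final step can be written out. Up to the exit time the controlled process remains in \(B_{\mathcal {H}}(0,L)\), so setting \(M:=\sup \{\Vert u(s,z)\Vert _{\mathcal {H}}: s\in [0,T],\,\Vert z\Vert _{\mathcal {H}}\le L\}\) (a constant we name only for this sketch, finite by the boundedness of u on bounded sets), the uniform convergence \(u^\epsilon \rightarrow u\) gives, for \(\epsilon \) small,

```latex
\mathbb{E}\bigg[\frac{1}{2}\int_{0}^{\hat{\tau}^{\epsilon,v^\epsilon}_{x^*}}
\Vert v^\epsilon(s)\Vert^2_{\mathcal{H}}\,ds\bigg]
\le C+\mathbb{E}\int_{0}^{\hat{\tau}^{\epsilon,v^\epsilon}_{x^*}}
\Vert u^\epsilon\big(s,\hat{\eta}^{\epsilon,v^\epsilon}_{x^*}(s)\big)\Vert^2_{\mathcal{H}}\,ds
+\epsilon^2
\le C+T\,(M+1)^2+\epsilon_0^2,
```

where the middle integrand is bounded by \((M+1)^2\) for \(\epsilon <\epsilon _0\) and we used \(\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\le T\).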

Remark 9

Without loss of generality, we can trivially extend the controls \(v^{\epsilon }\) to [0, T] by letting \(v^{\epsilon }(t)=0 \) for \(t\in [ \hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}, T ].\) This convention will be in use for the rest of this section.

We now proceed to the tightness estimates.

Lemma 3.7

Let \(p\ge 1\) and \(T>0\). There exist \(\epsilon _0>0\) and \(\alpha ,\beta >0\) such that

$$\begin{aligned} \begin{aligned}&\sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(t)\big \Vert ^p_\mathcal {E}+\sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t)\big \Vert _{C^\alpha }\\&\quad + \sup _{\epsilon<\epsilon _0}\mathbb {E}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}\big \Vert _{C^{\beta }([0,T];\mathcal {E})}<C_{T,\ell ,f}. \end{aligned} \end{aligned}$$
(41)

Proof

Using the mild formulation we have

$$\begin{aligned} \hat{\eta }_{x^*}^{\epsilon , v^\epsilon }(t)&=\frac{1}{\sqrt{\epsilon }h(\epsilon )}\int _{0}^{t}S(t-s)\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )-F\big (x^* \big ) \big ]ds\nonumber \\ {}&\quad +\int _{0}^{t}S(t-s)\big [v^{\epsilon }(s) -u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\big ]ds+ \frac{1}{h(\epsilon )}\int _{0}^{t}S(t-s)dW(s) \nonumber \\&=: \Psi ^{\epsilon ,v^\epsilon }(t)+U^{\epsilon }(t)+\frac{1}{h(\epsilon )}W_A(t). \end{aligned}$$
(42)

We now fix a version of the process \( \Psi ^{\epsilon ,v^\epsilon }(t,\xi )\) and work path-by-path. The paths of \(\Psi ^{\epsilon ,v^\epsilon }\) are weakly differentiable with probability 1 and

$$\begin{aligned} \partial _t\Psi ^{\epsilon ,v^\epsilon }(t,\xi )={\mathcal {A}}\Psi ^{\epsilon ,v^\epsilon }(t,\xi )+\frac{1}{\sqrt{\epsilon }h(\epsilon )}\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \big )- F\big (x^* \big ) \big ](\xi ), \end{aligned}$$

with \({\mathcal {A}}\) as in (4). Next, let \(t\in [0,T]\) and choose \(\xi _t\in [0,\ell ]\) to be such that

$$\begin{aligned} \Vert \Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{\mathcal {E}}=\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\text {sign}\big (\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big ) \end{aligned}$$

In view of Proposition A.1 in [45] (see also Proposition D.4 of [17]) we can estimate the left derivative of the supremum norm \(\Vert \Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{\mathcal {E}}\) by

$$\begin{aligned} \begin{aligned}&\frac{d^-}{dt}\Vert \Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{\mathcal {E}}\\&\quad \le {\mathcal {A}}\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t) \text {sign}\big (\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )+\frac{1}{\sqrt{\epsilon }h(\epsilon )}\big [ f\big ( x^*(\xi _t)+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t,\xi _t) \big )\\ {}&\qquad -f\big (x^*(\xi _t) \big ) \big ]\text {sign}\big (\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big ). \end{aligned} \end{aligned}$$

From the uniform ellipticity of \({\mathcal {A}}\) we have \({\mathcal {A}}\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t) \text {sign}\big (\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )\le 0\) for all \(t\in [0,T]\). Thus, in view of Hypothesis 2(a),

$$\begin{aligned} \begin{aligned}&\frac{d^-}{dt}\Vert \Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{\mathcal {E}}\\&\quad \le \frac{1}{\sqrt{\epsilon }h(\epsilon )}\big [ f_1\big ( x^*(\xi _t)+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t,\xi _t) \big )-f_1\big (x^*(\xi _t) \big ) \big ]\text {sign}\big (\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )\\ {}&\qquad +\frac{1}{\sqrt{\epsilon }h(\epsilon )}\big [ f_2\big ( x^*(\xi _t)+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t,\xi _t) \big )\\&\qquad -f_2\big (x^*(\xi _t) \big ) \big ]\text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )\\ {}&\quad \le M_{f_1}\big |\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t,\xi _t)\big |+ \frac{1}{\sqrt{\epsilon }h(\epsilon )}\big [ f_2\big ( x^*(\xi _t)+\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\\&\qquad +\sqrt{\epsilon }h(\epsilon )U^{\epsilon }(t,\xi _t)+\sqrt{\epsilon }W_A(t,\xi _t) \big )-f_2\big (x^*(\xi _t) \big ) \big ]\\&\qquad \cdot \text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big ), \end{aligned} \end{aligned}$$

where \(M_{f_1}\) is the Lipschitz constant of \(f_1\). To proceed, we distinguish the following two cases:

Case 1:

$$\begin{aligned} \text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )= & {} \text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)+\sqrt{\epsilon }h(\epsilon )U^{\epsilon }(t, \xi _t)\\{} & {} +\sqrt{\epsilon }W_A(t,\xi _t) \big ). \end{aligned}$$

Since \(f_2\) is non-increasing,

$$\begin{aligned}{} & {} \big [ f_2\big ( x^*(\xi _t)+\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)+\sqrt{\epsilon }h(\epsilon )U^{\epsilon }(t,\xi _t)+\sqrt{\epsilon }W_A(t,\xi _t) \big )\\{} & {} \quad -f_2\big (x^*(\xi _t) \big )\big ] \text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )\le 0. \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned}&\frac{d^-}{dt}\Vert \Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{\mathcal {E}}\le M_{f_1}\big |\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t,\xi _t)\big |\le M_{f_1}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t)\big \Vert _{\mathcal {E}} . \end{aligned} \end{aligned}$$
(43)

Case 2:

$$\begin{aligned} \text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big )\ne & {} \text {sign}\big (\sqrt{\epsilon }h(\epsilon )\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)+\sqrt{\epsilon }h(\epsilon )U^{\epsilon }(t,\xi _t)\\{} & {} +\sqrt{\epsilon }W_A(t,\xi _t) \big ). \end{aligned}$$

In this case it is straightforward to verify that

$$\begin{aligned} \sqrt{\epsilon }h(\epsilon )\big |\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big |\le \big |\sqrt{\epsilon }h(\epsilon )U^{\epsilon }(t,\xi _t)+\sqrt{\epsilon }W_A(t,\xi _t)\big |. \end{aligned}$$

The reader is referred to the proof of Theorem 6.1 of [45] for a similar argument. The latter, along with the optimality of \(\xi _t\), yields

$$\begin{aligned} \begin{aligned} \big \Vert \Psi ^{\epsilon ,v^\epsilon }(t)\big \Vert _{\mathcal {E}}&=\big |\Psi ^{\epsilon ,v^\epsilon }(t,\xi _t)\big |\le \bigg |U^{\epsilon }(t,\xi _t)+\frac{1}{h(\epsilon )}W_A(t,\xi _t)\bigg |\\&\le \bigg \Vert U^{\epsilon }(t)+\frac{1}{h(\epsilon )}W_A(t)\bigg \Vert _{\mathcal {E}}. \end{aligned} \end{aligned}$$
(44)

Setting \(\Xi ^{\epsilon ,v^\epsilon }(t):=\max \{ \big \Vert U^{\epsilon }+W_A/h(\epsilon )\big \Vert _{C([0,T];\mathcal {E})}, \big \Vert \Psi ^{\epsilon ,v^\epsilon }(t)\big \Vert _{\mathcal {E}} \}\), we can combine (43), (44) and the mean value inequality to obtain

$$\begin{aligned} \begin{aligned}&\Xi ^{\epsilon ,v^\epsilon }(t)- \big \Vert U^{\epsilon }+W_A/h(\epsilon )\big \Vert _{C([0,T];\mathcal {E})}\\&= \Xi ^{\epsilon ,v^\epsilon }(t)- \Xi ^{\epsilon ,v^\epsilon }(0) \le \int _{0}^{t}\frac{d^-}{ds} \Xi ^{\epsilon ,v^\epsilon }(s)ds \le M_{f_1}\int _{0}^{t}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _{\mathcal {E}}ds\\&\le M_{f_1}\int _{0}^{t}\big [\big \Vert \Psi ^{\epsilon ,v^\epsilon }(s)\big \Vert _{\mathcal {E}}+\big \Vert U^{\epsilon }+W_A\big /h(\epsilon )\Vert _{C([0,T];\mathcal {E})} \big ]ds\\ {}&\le 2M_{f_1}\int _{0}^{t}\Xi ^{\epsilon ,v^\epsilon }(s)ds. \end{aligned} \end{aligned}$$

By Grönwall’s inequality,

$$\begin{aligned} \begin{aligned} \big \Vert \Psi ^{\epsilon ,v^\epsilon }(t)\big \Vert _{\mathcal {E}}\le \Xi ^{\epsilon ,v^\epsilon }(t)\le C_{T,\phi }\big \Vert U^{\epsilon }+W_A/h(\epsilon )\big \Vert _{C([0,T];\mathcal {E})}, \end{aligned} \end{aligned}$$

where \(C_{T,\phi }=e^{2M_{f_1} T}\). Since the latter holds for all \(t\in [0,T]\) we obtain

$$\begin{aligned} \big \Vert \Psi ^{\epsilon ,v^\epsilon }\big \Vert _{C([0,T];\mathcal {E})}\le C_{T,\phi }\big \Vert U^{\epsilon }+W_A/h(\epsilon )\big \Vert _{C([0,T];\mathcal {E})}. \end{aligned}$$
(45)
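The Grönwall step leading to (45) can be checked numerically on the saturated worst case \(\Xi (t)=\Xi (0)+2M_{f_1}\int _0^t\Xi (s)ds\). The following toy sketch is our own illustration; the values of \(M\), \(\Xi (0)\) and the step size are arbitrary.

```python
import math

def euler_gronwall(xi0, M, T=1.0, n=10000):
    """Euler scheme for the saturated integral inequality xi'(t) = 2*M*xi(t),
    xi(0) = xi0, i.e. the worst case of the Gronwall estimate."""
    dt = T / n
    xi, traj = xi0, [xi0]
    for _ in range(n):
        xi += 2.0 * M * xi * dt
        traj.append(xi)
    return traj

def check_bound(xi0=1.0, M=0.7, T=1.0, n=10000):
    """Verify the Gronwall bound xi(t) <= xi0 * exp(2*M*t) along the trajectory."""
    dt = T / n
    traj = euler_gronwall(xi0, M, T, n)
    return all(x <= xi0 * math.exp(2.0 * M * k * dt) + 1e-12
               for k, x in enumerate(traj))
```

Since \((1+2M\,dt)^k\le e^{2Mk\,dt}\), the discrete trajectory always sits below the exponential bound, mirroring the estimate for \(\Xi ^{\epsilon ,v^\epsilon }\).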

Turning to the control term,

$$\begin{aligned} \begin{aligned} \big \Vert U^{\epsilon }(t)\big \Vert _{\mathcal {E}}\le C\bigg \Vert \int _{0}^{t}S(t-s)v^{\epsilon }(s) ds\bigg \Vert _{H^\theta (0,\ell )} +\bigg \Vert \int _{0}^{t}S(t-s)u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big ) ds\bigg \Vert _\mathcal {E}\end{aligned} \end{aligned}$$

for any \(\theta >1/2\). This is a consequence of the embedding \(W^{\theta ,p}(O)\hookrightarrow {\mathcal {E}}\) which holds for smooth domains \(O\subset \mathbb {R}^d\) and all \(\theta >d/p\). From the smoothing property (6), the Cauchy-Schwarz inequality, the uniform convergence of \(u^\epsilon \) to u and (25) we have

$$\begin{aligned} \begin{aligned} \big \Vert U^{\epsilon }(t)\big \Vert _{\mathcal {E}}&\le C_{T,\theta }\int _{0}^{t}(t-s)^{-\frac{\theta }{2}}\big \Vert v^{\epsilon }(s) \big \Vert _\mathcal {H}ds+C_{T}\int _{0}^{t}\Vert u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big ) \big \Vert _\mathcal {E}ds\\ {}&\le C\bigg (\int _{0}^{t}(t-s)^{-\theta }ds\bigg )^{\frac{1}{2}}\bigg (\int _{0}^{t}\big \Vert v^{\epsilon }(s)\big \Vert ^2_{\mathcal {H}} ds \bigg )^{\frac{1}{2}} \\&\quad + C_{T}\int _{0}^{t}\big (\Vert u\big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big ) \big \Vert _\mathcal {E}+\rho \big )ds \\ {}&\le C_{\theta } T^{(1-\theta )/2}\Vert v^\epsilon \Vert _{L^2([0,T];\mathcal {H})}+\rho T C_{T}+2\lambda _1^f\Vert e_1^f\Vert ^2_\mathcal {E}\int _{0}^{T} \Vert \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\Vert _\mathcal {E}ds \end{aligned} \end{aligned}$$
(46)

which holds w.p. 1 for \(\theta \in (1/2,1)\), \(\epsilon \) sufficiently small and \(\rho >0\). As for the stochastic convolution term, since \(h(\epsilon )\rightarrow \infty \), (10) yields

$$\begin{aligned} \mathbb {E}\sup _{t\in [0,T]}\big \Vert W_A(t)/h(\epsilon )\big \Vert ^p_{\mathcal {E}}\le C \end{aligned}$$

for \(\epsilon \) small and some \(C>0\) independent of \(\epsilon \). The estimate is a consequence of the Sobolev embedding theorem along with heat kernel estimates and the stochastic factorization formula. Combining (42), (45), (46), Lemma 3.6 and Remark 9 we obtain

$$\begin{aligned} \begin{aligned} \mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(t)\big \Vert ^p_\mathcal {E}&\le C\mathbb {E}\sup _{t\in [0,T]}\big (\big \Vert \Psi ^{\epsilon ,v^\epsilon }(t)\big \Vert ^p_{\mathcal {E}}+ \big \Vert U^{\epsilon }(t)\big \Vert ^p_{\mathcal {E}}+\big \Vert W_A(t)/h(\epsilon )\big \Vert ^p_{\mathcal {E}}\big )\\ {}&\le C_{p,T}\bigg (1+\frac{1}{h^p(\epsilon )}+\int _{0}^{T} \mathbb {E}\sup _{s\in [0,t]}\Vert \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\Vert ^p_\mathcal {E}dt\bigg ) \end{aligned} \end{aligned}$$

where we used that \(h(\epsilon )\rightarrow \infty \) as \(\epsilon \rightarrow 0\). Another application of Grönwall’s inequality leads to

$$\begin{aligned} \sup _{\epsilon <\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(t)\Vert ^p_\mathcal {E}\le C \end{aligned}$$
(47)

which is the first estimate in (41). Note here that C does not depend on \(x^*\). Turning to the spatial Hölder regularity, an application of Taylor’s theorem for Gâteaux derivatives yields

$$\begin{aligned} \begin{aligned} \Psi ^{\epsilon ,v^\epsilon }(t)&=\frac{1}{\sqrt{\epsilon }h(\epsilon )}\int _{0}^{t}S(t-s)\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )-F\big (x^* \big ) \big ]ds\\ {}&=\int _{0}^{t}S(t-s) \bigg [DF\big (x^*\big )\big (\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big )+ \frac{\sqrt{\epsilon }h(\epsilon )}{2} D^2\\&\quad \times F\big (x^*+\theta _0\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )\big (\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s), \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big ) \bigg ]ds \end{aligned} \end{aligned}$$
(48)

for some \(\theta _0\in (0,1)\). Let \(\theta >1/2\) and \(\alpha =(2\theta -1)/2\). By virtue of the Sobolev embedding theorem (see e.g. Theorem 8.2 in [19]) and Hypothesis 2(b) we have

$$\begin{aligned} \begin{aligned}&\big \Vert \Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{C^\alpha }\\&\quad \le C \bigg \Vert \int _{0}^{t}S(t-s) \bigg [DF\big (x^*\big )\big (\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big )+ \sqrt{\epsilon }h(\epsilon ) D^2F\\&\qquad \times \big (x^*+\theta _0\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )\big (\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s), \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big ) \bigg ]ds\bigg \Vert _{H^\theta }\\&\quad \le C\int _{0}^{t}(t-s)^{-\theta /2} \big \Vert DF\big (x^*\big )\big (\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big )+ \sqrt{\epsilon }h(\epsilon ) D^2F\big (x^*+\theta _0\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )\\&\qquad \times \big (\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s), \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big ) \big \Vert _\mathcal {H}ds\\&\quad \le C_f\int _{0}^{t}(t-s)^{-\theta /2}\bigg [\big (1+\Vert x^*\Vert ^{p_0-1}_\mathcal {E}\big )\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big \Vert _\mathcal {E}+\sqrt{\epsilon }h(\epsilon )\\&\qquad \times \big (1+\big \Vert x^*+\theta _0\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big \Vert ^{p_0-2}_\mathcal {E}\big )\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big \Vert ^2_\mathcal {E}\bigg ]ds\\&\quad \le C_{f,\theta ,p_0,x^*}\bigg [1+\sup _{s\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big \Vert ^{p_0}_\mathcal {E}\bigg ]t^{1-\theta /2}. \end{aligned} \end{aligned}$$

In view of (47),

$$\begin{aligned} \begin{aligned} \sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert&\Psi ^{\epsilon ,v^\epsilon }(t)\Vert _{C^\alpha }\!\le \! C_{T,f,\theta ,p_0,x^*} \bigg [1\!+\!\sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \big \Vert ^{p_0}_\mathcal {E}\bigg ]<\infty . \end{aligned}\nonumber \\ \end{aligned}$$
(49)

Repeating arguments similar to those used in (46) we see that

$$\begin{aligned} \begin{aligned} \sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert U^{\epsilon }(t)\big \Vert _{C^\alpha }\le C_{N,T,\theta ,f}\bigg [1+\sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \big \Vert _\mathcal {E}\bigg ]<\infty . \end{aligned}\nonumber \\ \end{aligned}$$
(50)

Moreover, we have the following well-known spatial equicontinuity estimate for the stochastic convolution

$$\begin{aligned} \begin{aligned} \mathbb {E}\sup _{t\in [0,T]}\big \Vert W_A(t)\big \Vert _{C^\alpha }\le C. \end{aligned} \end{aligned}$$
(51)

The reader is referred to [17], Theorems 5.16 and 5.22, for the proof and a detailed discussion of regularity properties of stochastic convolutions. Combining the latter with (49) and (50), we deduce that for each \(\epsilon >0\) and \(t\in [0,T]\), \(\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t)\in C^\alpha \) w.p. 1 and furthermore

$$\begin{aligned} \sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t)\big \Vert _{C^\alpha }<\infty , \end{aligned}$$

for some sufficiently small \(\epsilon _0\). It remains to study the temporal equicontinuity of \(\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}\). Letting \(s<t\in [0,T]\) we have

$$\begin{aligned} \begin{aligned}&\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t)-\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)- [S(t-s)-I]\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\\&=\frac{1}{\sqrt{\epsilon }h(\epsilon )}\int _{s}^{t}S(t-r)\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(r) \big )-F\big (x^* \big ) \big ]dr\\&\quad +\int _{s}^{t}S(t-r)\big [v^{\epsilon }(r) -u^\epsilon \big (r,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(r)\big )\big ]dr+ \frac{1}{h(\epsilon )}\int _{s}^{t}S(t-r)dW(r) \\ {}&=: \Psi ^{\epsilon ,v^\epsilon }(s,t)+U^{\epsilon }(s,t)+\frac{1}{h(\epsilon )}W_A(s,t). \end{aligned} \end{aligned}$$

Hence,

$$\begin{aligned} \begin{aligned}&\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t)-\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _\mathcal {E}\le \big \Vert \Psi ^{\epsilon ,v^\epsilon }(s,t)\big \Vert _\mathcal {E}+\big \Vert U^{\epsilon }(s,t)\big \Vert _\mathcal {E}\\&\quad +\big \Vert W_A(s,t)\big \Vert _\mathcal {E}+\big \Vert [S(t-s)-I]\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _\mathcal {E}. \end{aligned} \end{aligned}$$
(52)

From the estimates preceding (49) and the arguments in (46) we obtain

$$\begin{aligned} \sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{s\ne t\in [0,T]}\frac{\big \Vert \Psi ^{\epsilon ,v^\epsilon }(s,t)\big \Vert _\mathcal {E}}{|t-s|^{1-\theta /2}}\le C_{f,\theta ,p_0,x^*}\bigg [1+\sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \big \Vert ^{p_0}_\mathcal {E}\bigg ]<\infty \nonumber \\ \end{aligned}$$
(53)

and

$$\begin{aligned} \sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{s\ne t\in [0,T]}\frac{\big \Vert U^{\epsilon }(s,t)\big \Vert _\mathcal {E}}{|t-s|^{1-\theta /2}}\le C_{\theta ,N,T,f}\bigg [ 1+ \sup _{\epsilon<\epsilon _0}\mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \big \Vert _\mathcal {E}\bigg ]<\infty \nonumber \\ \end{aligned}$$
(54)

respectively. As for the stochastic convolution, there exists \(\beta \in (0,1)\) such that

$$\begin{aligned} \mathbb {E}\big [ W_{A}\big ]_{C^{\beta }([0,T];\mathcal {E})}\le C \end{aligned}$$
(55)

(see e.g. [17], Theorem 5.22). Finally, let \(\theta >0, \beta \in (0,1/2)\) such that \(\beta +\theta /2<1\). From the Sobolev embedding theorem and (5)

$$\begin{aligned} \begin{aligned} \big \Vert [S(t-s)-I]\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _\mathcal {E}&\le C \big \Vert [S(t-s)-I]\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _{H^\theta }\\ {}&\le C \big \Vert [S(t-s)-I](-A)^{\frac{\theta }{2}}\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _{\mathcal {H}}\\ {}&\le C \big \Vert S(t-s)-I\big \Vert _{{\mathscr {L}}(H^{2\beta };\mathcal {H})}\big \Vert (-A)^{\beta +\frac{\theta }{2}}\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _{\mathcal {H}}\\ {}&\le C(t-s)^\beta \big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _{H^{2\beta +\theta }}. \end{aligned} \end{aligned}$$

Following the derivation of the estimates (49), (50), (51) (see also Lemma A.3 in [30]) we deduce that

$$\begin{aligned} \begin{aligned}&\mathbb {E}\sup _{s\ne t\in [0,T]}\frac{\big \Vert [S(t-s)-I]\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _\mathcal {E}}{|t-s|^{\beta }}\\ {}&\quad \le C \mathbb {E}\sup _{s\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s)\big \Vert _{H^{2\beta +\theta }} \le C\bigg [ 1+ \frac{1}{h(\epsilon )}+ \mathbb {E}\sup _{t\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \big \Vert ^{p_0}_\mathcal {E}\bigg ]<\infty . \end{aligned} \end{aligned}$$

From the latter and (52)–(55), there exist a sufficiently small \(\epsilon _0\) and \(\beta >0\) such that

$$\begin{aligned} \sup _{\epsilon<\epsilon _0}\mathbb {E}\big \Vert \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}\big \Vert _{C^{\beta }([0,T];\mathcal {E})}<\infty . \end{aligned}$$

This proves the last estimate in (41) and completes the proof. \(\square \)
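The role of the vanishing prefactor \(1/h(\epsilon )\) in the stochastic convolution bounds above can be illustrated with a toy spectral simulation. The sketch below is entirely our own construction and not part of the proof: for the Dirichlet Laplacian on \((0,\pi )\), the stochastic convolution \(W_A\) is a superposition of independent Ornstein–Uhlenbeck modes, and the sup norm of \(W_A/h\) scales exactly like \(1/h\), hence vanishes as \(h(\epsilon )\rightarrow \infty \).

```python
import math
import random

def stochastic_convolution_sup(h, n_modes=20, n_t=50, T=1.0, seed=0):
    """Toy spectral sketch of sup_t ||W_A(t)/h|| for the Dirichlet Laplacian
    on (0, pi): W_A(t) = sum_k Y_k(t) e_k, with independent OU modes
    dY_k = -a_k Y_k dt + d beta_k, a_k = k^2, e_k(x) = sqrt(2/pi) sin(k x).
    Each mode is advanced with its exact OU transition, so the scheme is
    stable for arbitrarily large a_k."""
    rng = random.Random(seed)
    dt = T / n_t
    a = [k * k for k in range(1, n_modes + 1)]
    y = [0.0] * n_modes
    xs = [math.pi * j / 40 for j in range(1, 40)]
    sup_norm = 0.0
    for _ in range(n_t):
        for k in range(n_modes):
            decay = math.exp(-a[k] * dt)
            std = math.sqrt((1.0 - decay * decay) / (2.0 * a[k]))
            y[k] = y[k] * decay + std * rng.gauss(0.0, 1.0)
        # evaluate the random field on a spatial grid; update the running sup
        field = (sum(y[k] * math.sqrt(2.0 / math.pi) * math.sin((k + 1) * x)
                     for k in range(n_modes)) for x in xs)
        sup_norm = max(sup_norm, max(abs(v) for v in field))
    return sup_norm / h
```

With a fixed seed the underlying path is the same for every h, so doubling h exactly halves the computed sup norm, mirroring the \(1/h(\epsilon )\) decay used in the proof.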

From Lemma 3.7, along with an infinite-dimensional version of the Arzelà–Ascoli theorem, it follows that the family of laws of the controlled processes \(\{\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}\}_{\epsilon }\) is concentrated on compact subsets of \(C([0,T];{\mathcal {E}}),\) uniformly over sufficiently small values of \(\epsilon \). Thus, in view of Prokhorov’s theorem (Theorem 3.3 below), this family is relatively compact in the topology of weak convergence of measures on \(C([0,T];{\mathcal {E}})\). In the next section we characterize the limit points as \(\epsilon \rightarrow 0\).

3.4 Limiting behavior of \(\hat{\eta }_{x^*}^{\epsilon , v^\epsilon }\)

Before we proceed to the main body of this section let us recall the notion of a tight family of probability measures and the classical theorem of Prokhorov.

Definition 3.4

Let \({\mathcal {Z}}\) be a Polish space and \(\Pi \subset {\mathscr {P}}({\mathcal {Z}})\) be a set of Borel probability measures on \({\mathcal {Z}}\) and \(\{P_n\}_{n\in \mathbb {N}}\subset \Pi .\) We say that (i) \(P_n\) converges weakly to a measure \(P\in {\mathscr {P}}({\mathcal {Z}})\) if for every \(f\in C_b({\mathcal {Z}})\)

$$\begin{aligned} \lim _{n\rightarrow \infty } \int _{{\mathcal {Z}}} f dP_n= \int _{{\mathcal {Z}}} f dP. \end{aligned}$$

(ii) \(\Pi \) is tight if for each \(\epsilon >0\) there exists a compact set \(K_\epsilon \subset {\mathcal {Z}}\) such that for all \(P\in \Pi \),

$$\begin{aligned} P({\mathcal {Z}}\setminus K_\epsilon )<\epsilon . \end{aligned}$$

Prokhorov’s theorem asserts that the notions of tightness and relative weak sequential compactness are equivalent for Borel measures on Polish spaces.

Theorem 3.3

(Prokhorov) Let \({\mathcal {Z}}\) be a Polish space and \(\Pi \subset {\mathscr {P}}({\mathcal {Z}})\) be a tight family of Borel probability measures. Then every sequence in \(\Pi \) contains a weakly convergent subsequence.
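To fix ideas, here is a toy illustration on the Polish space \({\mathcal {Z}}=\mathbb {R}\) (our own example, not from the text): the Gaussian family \(\{N(0,\sigma ^2):\sigma \le 1\}\) is tight, since a single interval \([-r,r]\) carries all but mass \(\epsilon \) uniformly in \(\sigma \), and \(N(0,1/n^2)\) converges weakly to \(\delta _0\).

```python
import math

def normal_cdf(x, sigma):
    """CDF of N(0, sigma^2), written with the error function."""
    return 0.5 * (1.0 + math.erf(x / (sigma * math.sqrt(2.0))))

def mass_outside(sigma, r):
    """P(|X| > r) for X ~ N(0, sigma^2): mass outside the compact set [-r, r]."""
    return 2.0 * (1.0 - normal_cdf(r, sigma))

def is_uniformly_tight(sigmas, eps, radii=(1.0, 2.0, 3.0, 4.0, 5.0)):
    """Tightness check: look for one radius r with sup_sigma P(|X| > r) < eps."""
    return any(all(mass_outside(s, r) < eps for s in sigmas) for r in radii)

if __name__ == "__main__":
    # The whole family {N(0, sigma^2) : sigma <= 1} escapes [-3, 3] with mass < 1%.
    print(is_uniformly_tight([0.1, 0.5, 1.0], 0.01))
    # Weak convergence N(0, 1/n^2) -> delta_0: the CDF tends to 1 at any x > 0.
    print(abs(normal_cdf(0.1, 1e-3) - 1.0) < 1e-9)
```

In the proofs below the same mechanism operates on \(C([0,T];{\mathcal {E}})\times L^2([0,T];\mathcal {H})\), with the compact sets supplied by Lemma 3.7 instead of intervals.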

Lemma 3.8

Let \(\epsilon _0\) be sufficiently small, \(v^\epsilon \) be a sequence in \({\mathcal {A}}\) satisfying the assumptions of Theorem 3.2, u be as in (25) and \(u^\epsilon :[0,T]\times \mathcal {H}\rightarrow \mathcal {H}\) be a sequence that converges to u pointwise and uniformly over bounded subsets of \(\mathcal {H}\). Any sequence in \( \{(\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}, v^{\epsilon })\}_{\epsilon <\epsilon _0}\) has a further subsequence that converges in distribution in \(C([0,T];{\mathcal {E}})\times L^2([0,T];\mathcal {H})\), equipped with the product of the uniform and weak topologies, to a pair \((\hat{\eta }^{v^0}_{x^*}, v^0)\). Moreover:

(i) \(\hat{\eta }^{v^0}_{x^*}\) is equal in law to the (unique) solution of

$$\begin{aligned} \big \{ {\dot{\phi }}(t) = [A +DF(x^*)]\phi (t)-u(t,\phi (t))+v^0(t),\;\; \phi (0)=0\big \}, \end{aligned}$$
(56)

(ii) Any sequence in \(\{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\;;\epsilon <\epsilon _0\}\) converges in distribution to a [0, T]-valued random variable \(\hat{\tau }^{v^0}\) such that

$$\begin{aligned} \hat{\eta }^{v^0}_{x^*}(\hat{\tau }^{v^0})\in \partial B_\mathcal {H}(0,L) \end{aligned}$$

and for all \(t<\hat{\tau }^{v^0},\) \(\hat{\eta }^{v^0}_{x^*}(t)\in B_\mathcal {H}(0,L)\) with probability 1 (recall that \(B_\mathcal {H}(0,L)\) denotes a closed ball in \(\mathcal {H}\)).

Proof

Starting from the controls \(v^\epsilon ,\) Lemma 3.6 along with Remark 9 yield

$$\begin{aligned} \sup _{\epsilon <\epsilon _0}\mathbb {E}\int _{0}^{T}\big \Vert v^\epsilon (t)\big \Vert ^2_{\mathcal {H}}dt<\infty . \end{aligned}$$

Since any bounded subset of \(L^2([0,T];\mathcal {H})\) is relatively compact in the weak topology, we deduce from the discussion after Lemma 3.7 that the family of laws of the pairs \(\{(\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}, v^{\epsilon })\}_{\epsilon <\epsilon _0}\) is tight. By virtue of Prokhorov’s theorem, any sequence of such elements contains a subsequence (denoted by the same notation) that converges in distribution to a pair \((\hat{\eta }_{x^*}, v^{0})\) of \(C([0,T];{\mathcal {E}})\times L^2([0,T];\mathcal {H})\)-valued random elements. We remark here that \(L^2([0,T];\mathcal {H})\) with the weak topology is not globally metrizable, hence not a Polish space, and Prokhorov’s theorem is not directly applicable. However, the same conclusions can be drawn from a more general version of the theorem (e.g. Theorem 8.6.7 in [8]). Invoking Skorokhod’s representation theorem we can now assume that this convergence holds almost surely; the theorem involves the introduction of a new probability space on which the convergence takes place, but for convenience this will not be reflected in our notation. We will now characterize the law of \(\hat{\eta }_{x^*}\).

(i) Recall that for all \(t\in [0,T]\)

$$\begin{aligned} \begin{aligned} \hat{\eta }_{x^*}^{\epsilon , v^\epsilon }(t)=&\frac{1}{\sqrt{\epsilon }h(\epsilon )}\int _{0}^{t}S(t-s)\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )-F\big (x^* \big ) \big ]ds\\ {}&+\int _{0}^{t}S(t-s)v^{\epsilon }(s)ds -\int _{0}^{t}S(t-s)u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )ds+ \frac{1}{h(\epsilon )}W_A(t) \end{aligned} \end{aligned}$$

with probability 1. Starting from the last term, the estimate (10) yields \(\frac{1}{h(\epsilon )}W_A\longrightarrow 0\) in \(L^p(\Omega ; C([0,T];{\mathcal {E}}))\) for any \(p\ge 1\). Next, from Lemma 4.7 in [46] we have

$$\begin{aligned} \int _{0}^{\cdot }S(\cdot -s)v^\epsilon (s)ds\longrightarrow \int _{0}^{\cdot }S(\cdot -s)v^0(s)ds \end{aligned}$$

almost surely in \(C([0,T];{\mathcal {E}}).\) As for the term involving the change of measure \(u^\epsilon \), we estimate

$$\begin{aligned} \begin{aligned}&\mathbb {E}\sup _{t\in [0,T]} \bigg \Vert \int _{0}^{t}S(t-s)\big [u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )-u\big (s,\hat{\eta }_{x^*}(s)\big )\big ] ds\bigg \Vert _\mathcal {E}\\&\le c\mathbb {E}\int _{0}^{T}\big \Vert u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )-u\big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\big \Vert _\mathcal {E}ds\\ {}&+c\mathbb {E}\int _{0}^{T}\big \Vert u\big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )-u\big (s,\hat{\eta }_{x^*}(s)\big )\big \Vert _\mathcal {E}ds. \end{aligned} \end{aligned}$$

The first term on the right hand side converges to 0 by our assumptions along with (47). The almost sure convergence of \(\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}\) and the continuity of u (see (25)) along with the dominated convergence theorem imply the convergence of the second term to 0. Next, in view of (48), Hypothesis 2(b) and the dominated convergence theorem we have

$$\begin{aligned} \begin{aligned}&\mathbb {E}\sup _{t\in [0,T]}\bigg \Vert \frac{1}{\sqrt{\epsilon }h(\epsilon )}\int _{0}^{t}S(t-s)\big [ F\big (x^*+\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(s) \big )\\&\quad -F\big (x^* \big )- DF\big (x^*\big )\big (\hat{\eta }_{x^*}(s)\big ) \big ]ds\bigg \Vert _\mathcal {E}\longrightarrow 0 \end{aligned} \end{aligned}$$

as \(\epsilon \rightarrow 0\). Uniqueness of solutions to (56), along with a subsequence argument, completes the proof of (i).

(ii) Since [0, T] is compact in the standard topology, the family of [0, T]-valued random variables \(\{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\}_{\epsilon <\epsilon _0}\) is tight. Invoking Prokhorov’s and Skorokhod’s theorems once again, any sequence in this family has a subsequence that converges almost surely to a [0, T]-valued random variable \(\hat{\tau }^{v^0}_{x^*}.\) From the almost sure convergence of \(\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}\) and the definition of \(\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\) (see Lemma 3.2), \(\hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}\big (\hat{\tau }^{\epsilon ,v^{\epsilon }}_{x^*}\big )\longrightarrow \hat{\eta }_{x^*}^{ v^0}\big (\hat{\tau }^{v^0}_{x^*}\big )\in \partial B_{\mathcal {H}}(0,L)\) almost surely (the latter being a closed set). Moreover, for any \(t<\hat{\tau }^{v^0}_{x^*},\) there exist \(\delta >0\) and \(\epsilon _0>0\) sufficiently small such that \(t\le \hat{\tau }^{v^0}_{x^*}-\delta <\hat{\tau }^{\epsilon ,v^{\epsilon }}_{x^*}\) for all \(\epsilon \le \epsilon _0\) on a set of probability 1. Thus, for \(\epsilon \) sufficiently small, \(\{ \hat{\eta }^{\epsilon ,v^{\epsilon }}_{x^*}(t) \}_{\epsilon }\subset {B_{\mathcal {H}}(0,L)}\) and the pointwise limit \(\hat{\eta }^{v^{0}}_{x^*}(t)\in B_\mathcal {H}(0,L)\) with probability 1. \(\square \)

Remark 10

A simple consequence of Lemma 3.8 is that the moderate deviation process \(\eta ^{\epsilon }_{x}\) (see (3)), which results by setting \(u=v^\epsilon =0\) in (18), converges as \(\epsilon \rightarrow 0\) to the solution of the linear deterministic PDE \({\dot{\phi }}(t) = [A +DF(X^0_x(t))]\phi (t)\) with zero initial condition, i.e. \(\eta ^{\epsilon }_{x}\rightarrow 0.\)
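The last claim admits a one-line justification (a sketch; here c denotes any constant dominating the quadratic form of \(A+DF(X^0_x(\cdot ))\) on \(\mathcal {H}\), which exists since A is dissipative and \(DF(X^0_x(\cdot ))\) is bounded along the noiseless flow). Testing the equation against \(\phi \) and applying Grönwall’s inequality gives

```latex
\frac{1}{2}\frac{d}{dt}\Vert\phi(t)\Vert^2_{\mathcal{H}}
=\big\langle [A+DF(X^0_x(t))]\phi(t),\phi(t)\big\rangle_{\mathcal{H}}
\le c\,\Vert\phi(t)\Vert^2_{\mathcal{H}}
\quad\Longrightarrow\quad
\Vert\phi(t)\Vert^2_{\mathcal{H}}\le e^{2ct}\,\Vert\phi(0)\Vert^2_{\mathcal{H}}=0,
```

so \(\phi \equiv 0\) and hence \(\eta ^{\epsilon }_{x}\rightarrow 0\).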

3.5 Proof of Theorem 3.1

Before we move on to the proof we remind the reader that the index \(k_0\) has been dropped.

Let \(\epsilon >0\). Returning to (17), choose a sequence \(\{v^\epsilon \}\subset {\mathcal {A}}\) of approximate minimizers such that (28) holds. Since \(u^\epsilon \) converges uniformly to u over bounded subsets, for any \(\delta >0\) there exists \(\epsilon _0\) sufficiently small such that for all \(\epsilon <\epsilon _0\)

$$\begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\!\ge \! \mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\!\Vert v^\epsilon (s)\Vert ^2_\mathcal {H}ds\!-\!\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\!\Vert u\big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ]\!-\!\epsilon \!-\!\delta . \end{aligned}$$
(57)

From the variational representation (17), Lemma 3.6 and the assumptions on \(u^\epsilon \) and u there exists \(\epsilon _0\) sufficiently small such that

$$\begin{aligned} \sup _{\epsilon<\epsilon _0}\bigg |\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\bigg |\le & {} \sup _{\epsilon<\epsilon _0}\mathbb {E}\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert v^\epsilon (s)\Vert ^2_\mathcal {H}ds\\{} & {} +\sup _{\epsilon<\epsilon _0}\mathbb {E}\int _{0}^{T}\Vert u^\epsilon \big (s,\hat{\eta }^{\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds<\infty . \end{aligned}$$

Thus, there exists a sequence in \(\epsilon \) over which the left hand side in (57) converges to \(\liminf _{\epsilon \rightarrow 0} -\log Q^{\epsilon }(u^\epsilon )/h^2(\epsilon )\). Since the functional \({\mathcal {J}}: C([0,T];\mathcal {E})\times L^2([0,T];\mathcal {H})\times [0, T]\rightarrow \mathbb {R}\),

$$\begin{aligned} {\mathcal {J}}(\eta , v,\tau ):=\frac{1}{2}\int _{0}^{\tau }\Vert v(s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\tau }\Vert u\big (\eta (s)\big )\Vert ^2_\mathcal {H}ds \end{aligned}$$

is lower semi-continuous in the product of uniform, weak and standard topologies, we can pass to a further subsequence and apply the Portmanteau lemma along with Lemma 3.8 to obtain

$$\begin{aligned} \liminf _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )&\ge \liminf _{\epsilon \rightarrow 0}\mathbb {E}\big [{\mathcal {J}}\big ( \hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}, v^\epsilon ,\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\big )\big ]-\delta \\&\ge \mathbb {E}\big [{\mathcal {J}}\big (\hat{\eta }^{v^0}_{x^*}, v^0,\hat{\tau }^{v^0}_{x^*}\big )\big ]-\delta \\&=\mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{v^0}_{x^*}}\Vert v^0(s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{v^0}_{x^*}}\Vert u\big (\hat{\eta }^{v^0}_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ]-\delta \\&\ge \inf _{y\in {\mathcal {T}}}\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y(s))\Vert _\mathcal {H}^2\bigg ) ds-\delta , \end{aligned}$$
(58)

with \({\mathcal {T}}\) as in (26). Since \(\delta \) is arbitrary, the upper bound is complete. To obtain a lower bound we will use the conclusions of Proposition 3.1 for the limiting variational problem. To this end let \(y^*\) satisfy

$$\begin{aligned} \begin{aligned}&\inf _{v\in {\mathcal {C}}_{y^*,x^*}}\int _{0}^{\tau _{y^*}}\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y^*(s))\Vert _\mathcal {H}^2\bigg ) ds \\&\quad =\inf _{y\in {\mathcal {T}}}\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y(s))\Vert _\mathcal {H}^2\bigg ) ds. \end{aligned} \end{aligned}$$

As we mentioned in Sect. 3.2, the optimization problem on the left-hand side has an explicit solution attained by

$$\begin{aligned} {\bar{v}}(t)={\dot{y}}^*(t)-Ay^*(t) -DF(x^*)y^*(t)+u\big (y^*(t)\big ), t\in [0,T] \end{aligned}$$
(59)

and from Proposition 3.1, \(T=\inf \{ t>0: \Vert y^*(t)\Vert _{\mathcal {H}}=L\}=\tau _{y^*}\). Now consider the processes \(\hat{\eta }^{\epsilon , {\bar{v}}}_{x^*}\) controlled by \({\bar{v}}\). From Lemmas 3.7, 3.8, \(\{\hat{\eta }^{\epsilon , {\bar{v}}}_{x^*};\epsilon >0 \}\) is tight and converges in distribution to a process \(\hat{\eta }^{{\bar{v}}}_{x^*}.\) From the choice of \({\bar{v}}\) and uniqueness of solutions it follows that \(\hat{\eta }^{{\bar{v}}}_{x^*}=y^*\) with probability 1. Moreover, the exit times \(\hat{\tau }^{\epsilon , {\bar{v}}}_{x^*}\) converge in distribution to a random time \(\hat{\tau }^{{\bar{v}}}\) which is no less than the first exit time of \(y^*\) from \(\mathring{B}_\mathcal {H}(0,L)\). Since the latter is equal to T it follows that \(\hat{\tau }^{{\bar{v}}}=T\) with probability 1. Thus

$$\begin{aligned} \limsup _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )&\le \frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon , {\bar{v}}}_{x^*}}\Vert {\bar{v}}(s)\Vert ^2_\mathcal {H}ds+\limsup _{\epsilon \rightarrow 0} -\mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon , {\bar{v}}}_{x^*}}\Vert u^\epsilon \big (\hat{\eta }^{\epsilon , {\bar{v}}}_{x^*}\big )\Vert ^2_\mathcal {H}ds\\&\le \frac{1}{2}\int _{0}^{T}\Vert {\bar{v}}(s)\Vert ^2_\mathcal {H}ds-\liminf _{\epsilon \rightarrow 0} \mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon , {\bar{v}}}_{x^*}}\Vert u^\epsilon \big (\hat{\eta }^{\epsilon , {\bar{v}}}_{x^*}\big )\Vert ^2_\mathcal {H}ds\\&\le \frac{1}{2}\int _{0}^{T}\Vert {\bar{v}}(s)\Vert ^2_\mathcal {H}ds-\int _{0}^{T}\Vert u\big (\hat{\eta }^{ {\bar{v}}}_{x^*}\big )\Vert ^2_\mathcal {H}ds \\&= \inf _{v\in {\mathcal {C}}_{y^*,x^*}}\int _{0}^{\tau _{y^*}}\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y^*(s))\Vert _\mathcal {H}^2\bigg ) ds \\&= \inf _{y\in {\mathcal {T}}}\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y(s))\Vert _\mathcal {H}^2\bigg ) ds, \end{aligned}$$
(60)

where the second inequality follows from lower semi-continuity. Combining (58) and (60) allows us to conclude.

Remark 11

Theorem 3.1 is essentially equivalent to an MDP for the family \(\{X^\epsilon \}_{\epsilon }\) of solutions of (8), in the space \(C([0,T];{\mathcal {E}}).\) The latter is an asymptotic statement for exponential functionals of \(g(X^\epsilon ),\) where \(g: C([0,T];{\mathcal {E}})\rightarrow \mathbb {R}\) is continuous and bounded (see Definition 3.1), while the former covers exit probabilities and corresponds to the choice \(g={\tilde{g}}\) with

$$\begin{aligned} {\tilde{g}}(\eta )={\left\{ \begin{array}{ll} 0, &{}\quad \sup _{t\in [0,T]}\Vert \eta (t)\Vert _\mathcal {H}\ge L,\\ \infty , &{}\quad \sup _{t\in [0,T]}\Vert \eta (t)\Vert _\mathcal {H}< L. \end{array}\right. } \end{aligned}$$

The case for bounded continuous test functions is in fact simpler, does not require analysis of the limiting variational problem and can be proved using very similar arguments to the ones used above. To be precise, for any continuous, bounded \(g: C([0,T];{\mathcal {E}})\rightarrow \mathbb {R}\) the variational representation (17) takes the form

$$\begin{aligned} -\frac{1}{h^2(\epsilon )}\log \;\mathbb {E}\big [ e^{-h^2(\epsilon )g(\eta ^\epsilon )}\big ]=\inf _{v\in {\mathcal {A}}}\mathbb {E}\bigg [ \frac{1}{2} \int _{0}^{T} \Vert v(t) \Vert ^2_\mathcal {H}dt+ g\big (\eta ^{\epsilon ,v} \big ) \bigg ], \end{aligned}$$

according to the classical results of [12]. The controlled process \(\eta ^{\epsilon ,v}\) solves (18) with \(u=0\) and \({\mathcal {A}}\) is a collection of square-integrable adapted controls. The tightness and limiting statements of Lemmas 3.7, 3.8 carry over verbatim after setting \(u=0\) and (11) then follows with the same action functional (31) by proving an upper and a lower bound as above. In particular, the upper bound is a consequence of lower-semicontinuity and the lower bound follows by considering the minimizing control \({\bar{v}}\) in (59). In fact, this simpler MDP is used to obtain Lemma 3.6 above, which is important for the case of unbounded functionals that we consider here.
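To see why the choice \(g={\tilde{g}}\) encodes exit probabilities, note that \(e^{-h^2(\epsilon ){\tilde{g}}(\eta )}\) is exactly the indicator of the event \(\{\sup _{t\in [0,T]}\Vert \eta (t)\Vert _\mathcal {H}\ge L\},\) so the exponential functional reduces to the exit probability for every \(\epsilon \). A minimal discrete sketch, where random-walk paths and the absolute value stand in for the process and the \(\mathcal {H}\)-norm and all constants are illustrative:

```python
import numpy as np

def g_tilde(path, L):
    """0 on paths whose running maximum reaches level L, +infinity otherwise."""
    return 0.0 if np.max(np.abs(path)) >= L else np.inf

rng = np.random.default_rng(1)
h2 = 25.0   # plays the role of h^2(eps); the value is irrelevant below
L = 1.0
# 5000 discrete paths standing in for eta^eps (illustrative toy dynamics)
paths = np.cumsum(0.1 * rng.standard_normal((5000, 200)), axis=1)

# exp(-h^2 * g~) is 1 on exit paths and 0 otherwise ...
weights = np.array([np.exp(-h2 * g_tilde(p, L)) for p in paths])
# ... so its mean is exactly the empirical exit frequency, independent of h^2
exit_freq = np.mean([np.max(np.abs(p)) >= L for p in paths])
assert np.isclose(weights.mean(), exit_freq)
```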

3.6 Proof of Theorem 3.2

Let \(\{v^\epsilon \}\subset {\mathcal {A}}\) satisfy (28). From Lemma 3.8, Theorem 3.1 and the lower semi-continuity argument in (58) we know that the triples \((\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}, v^{\epsilon }, \hat{\tau }^{\epsilon ,v^\epsilon }_{x^*})\) converge in distribution to a triple \((\hat{\eta }^{v^0}_{x^*}, v^{0}, \hat{\tau }^{v^0}_{x^*} )\) and

$$\begin{aligned}&\inf _{y\in {\mathcal {T}}}\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y(s))\Vert _\mathcal {H}^2\bigg ) ds\\&\quad =\limsup _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\\&\quad \ge \limsup _{\epsilon \rightarrow 0}\mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert v^\epsilon (s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ]\\&\quad \ge \liminf _{\epsilon \rightarrow 0}\mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert v^\epsilon (s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}}\Vert u^\epsilon \big (s,\hat{\eta }^{\epsilon ,v^\epsilon }_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ]\\&\quad =\mathbb {E}\bigg [\frac{1}{2}\int _{0}^{\hat{\tau }^{v^0}_{x^*}}\Vert v^0(s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{v^0}_{x^*}}\Vert u\big (\hat{\eta }^{v^0}_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds\bigg ]. \end{aligned}$$

Invoking Lemma 3.8 once again we have \(\hat{\eta }^{v^0}_{x^*}\in {\mathcal {T}}\) and \(v^0\in {\mathcal {C}}_{\hat{\eta }^{v^0}_{x^*},x^*}\) with probability 1. Since the left-hand side is the infimum over all such paths and controls it follows that

$$\begin{aligned} \begin{aligned}&\frac{1}{2}\int _{0}^{\hat{\tau }^{v^0}_{x^*}}\Vert v^0(s)\Vert ^2_\mathcal {H}ds-\int _{0}^{\hat{\tau }^{v^0}_{x^*}}\Vert u\big (\hat{\eta }^{v^0}_{x^*}(s)\big )\Vert ^2_\mathcal {H}ds \\&= \inf _{y\in {\mathcal {T}}}\inf _{v\in {\mathcal {C}}_{y,x^*}}\int _{0}^{\tau }\bigg (\frac{1}{2}\Vert v(s)\Vert ^2_\mathcal {H}-\Vert u(y(s))\Vert _\mathcal {H}^2\bigg ) ds \end{aligned} \end{aligned}$$

with probability 1. Thus, from Proposition 3.1 we can conclude that \(\hat{\tau }^{\epsilon ,v^\epsilon }_{x^*}\rightarrow T\) in probability as \(\epsilon \rightarrow 0\), \(\big \langle \hat{\eta }^{v^0}_{x^*}(T), e_1^f \big \rangle ^2_\mathcal {H}=L^2\) with probability 1 and (29) follows.

It remains to prove (30). We start with the upper bound, which is a consequence of Lemma 3.1, provided that \(E=\{ \phi \in C([0,T];\mathcal {H}): \tau _{\phi }\le T\}\) is an \({\mathcal {S}}_{x^*,T}-\)continuity set. This property can be verified from the analysis of Sect. 3.2. In particular, Lemmas 3.3, 3.4 and Proposition 3.1 remain true after setting the second summand in (34) or (40) equal to 0. Hence the infima of the action functional over \(\{\tau _{\phi }\le T\}, \{\tau _{\phi }<T\} \) and \(\{\tau _{\phi }=T\}\) coincide and the estimate follows. As for the lower bound, we combine Theorem 3.1, (36) and (24) to obtain

$$\begin{aligned} G_T(0,0)=\frac{a_1^fL^2}{1-e^{-2a^{f}_1T}}\ge a_1^fL^2=U(0,0)\;,\;\;\lim _{T\rightarrow \infty }G_T(0,0)=U(0,0) \end{aligned}$$

and

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )= a_1^fL^2\bigg (1+\frac{1}{1-e^{-2a^{f}_1T}}\bigg )=U(0,0)+G_T(0,0). \end{aligned}$$

The latter shows that the lower bound in fact holds with equality, and the proof is complete.
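As a quick numerical sanity check of the two identities used above, namely \(G_T(0,0)\ge U(0,0)\) for all \(T\) and \(G_T(0,0)\rightarrow U(0,0)\) as \(T\rightarrow \infty \), with illustrative values of \(a_1^f\) and \(L\):

```python
import numpy as np

def G_T(T, a1, L):
    """G_T(0,0) = a1 * L^2 / (1 - exp(-2*a1*T)), as in the display above."""
    return a1 * L**2 / (1.0 - np.exp(-2.0 * a1 * T))

a1, L = 1.5, 2.0        # illustrative constants, not calibrated values
U = a1 * L**2           # U(0,0) = a1 * L^2

Ts = np.array([1.0, 5.0, 25.0, 100.0])
vals = G_T(Ts, a1, L)

assert np.all(vals >= U)                      # G_T(0,0) >= U(0,0) for every T
assert np.all(np.diff(vals) <= 0.0)           # G_T decreases toward the limit
assert abs(G_T(100.0, a1, L) - U) < 1e-10     # G_T(0,0) -> U(0,0) as T -> inf
```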

4 Implementation and pre-asymptotic analysis of the scheme

4.1 Implementation issues and exponential mollification

In Sect. 3, we demonstrated that, under fairly general spectral gap conditions, an importance sampling scheme using the change of measure \(u_{k_0}\) (25) achieves nearly optimal asymptotic behavior as the noise intensity \(\epsilon \rightarrow 0\). However, changes of measure based only on the quasipotential subsolution U (24) can lead to poor pre-asymptotic performance. This issue is present even in finite dimensions and is related to the behavior of the controlled dynamics near the origin. In [23], the authors demonstrated that, for certain choices of controls v, the second moment of the estimator degrades over time. In these situations, the system tends to spend a long time near the attractor, accumulating a large running cost that inflates the variance. As a result, for fixed \(\epsilon >0\) the pre-exponential terms ignored by the asymptotic bounds (30) dominate and can even lead to errors that grow exponentially in T. For more details the reader is referred to the discussion in [23], pp. 2919-2921.

In infinite dimensions, an additional challenge appears when the changes of measure act on the full space \(\mathcal {H}.\) As we will see in Lemma 4.1 below, in order to prove that the second moment of a scheme behaves well for any fixed \(\epsilon >0,\) one needs to have good control over the quantity

$$\begin{aligned} \begin{aligned} {\mathscr {D}}_{x^*}^{\epsilon }(Z_{x^*})(t,\eta )&:=\partial _t Z_{x^*}(t,\eta )+\mathbb {H}_{x^*}\big (\eta ,D_\eta Z_{x^*}(t,\eta )\big )+\frac{1}{2h^2(\epsilon )}\text {tr}\big [ D^2_\eta Z_{x^*}(t,\eta )\big ], \end{aligned}\nonumber \\ \end{aligned}$$
(61)

where \(Z_{x^*}\) denotes a subsolution used for the analysis of the scheme. However, any radial function \(Z:\mathcal {H}\rightarrow \mathbb {R}\) such that \(Z(\eta )={\bar{Z}}(\Vert \eta \Vert _\mathcal {H})\), with \({\bar{Z}}''<0,\) satisfies \(\text {tr}\big [ D^2_\eta Z(\eta )\big ]=-\infty \). Thus, apart from dealing with the difficulties related to unbounded operators (see Remark 3), changes of measure for SRDEs that effectively accomplish dimension reduction are necessary for provably efficient performance.

In this section we construct a scheme under Hypothesis 3(c), i.e. our changes of measure only force the \(e_1^f\) direction. From this point on it is understood that \(u\equiv u_1\) and \(u^\epsilon _{1}\equiv u^\epsilon .\) In order to deal with the aforementioned issues, our changes of measure \(u^\epsilon \) will meet the following criteria: 1) The projected-quasipotential subsolution (denoted below by \(F_1\)) will be used in regions of space sufficiently far from the origin. 2) A constant subsolution \(F^{\epsilon }_2\) will dominate near the origin; being constant, it does not influence the dynamics, so the change of measure remains inactive until the dynamics enter the region where \(F_1\) dominates. 3) To avoid issues stemming from lack of smoothness, the combination of \(F_1,F^\epsilon _2\) should be appropriately mollified. 4) As \(\epsilon \rightarrow 0\), the changes of measure \(u^\epsilon \) should converge to the asymptotically nearly optimal u. A suitable choice is provided by the exponential mollification of \(F_1\) and \(F_2^\epsilon \).

To be precise, we define for \(a_1^f,e_1^f\) as in Hypothesis 3(c), \(\kappa \in (0,1)\) and \(\delta =\delta (\epsilon )>0\)

$$\begin{aligned} F_1(\eta ):=a_1^f(L^2-\langle \eta , e^f_1\rangle ^2_\mathcal {H}),\;\eta \in \mathcal {H},\qquad F^\epsilon _2:=a_1^f(L^2-h(\epsilon )^{-2\kappa }) \end{aligned}$$

and consider the exponential mollification

$$\begin{aligned} U^\delta (t,\eta ):=-\delta \log \bigg ( e^{-\frac{F_1(\eta )}{\delta }} + e^{-\frac{F^\epsilon _2}{\delta }} \bigg ). \end{aligned}$$
(62)
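The function \(U^\delta \) is the exponential ("soft-min") mollification of \(\min (F_1,F_2^\epsilon )\): it is smooth and satisfies \(\min (F_1,F_2^\epsilon )-\delta \log 2\le U^\delta \le \min (F_1,F_2^\epsilon )\); with the choice \(\delta =2/h^2(\epsilon )\) made below, the gap is \(2\log 2/h^2(\epsilon )\), which is the loss appearing in the bound (69). A minimal numerically stable sketch, with illustrative scalar inputs:

```python
import numpy as np

def softmin(f1, f2, delta):
    """Exponential mollification U^delta = -delta*log(exp(-f1/delta)+exp(-f2/delta)).

    Implemented via logaddexp so that large values of f/delta do not underflow.
    """
    return -delta * np.logaddexp(-np.asarray(f1) / delta, -np.asarray(f2) / delta)

# Illustrative values (placeholders, not the paper's calibrated constants)
delta = 0.05
f1, f2 = 1.3, 0.7

u = softmin(f1, f2, delta)

# Soft-min sandwich: min - delta*log(2) <= U^delta <= min
m = min(f1, f2)
assert m - delta * np.log(2.0) <= u <= m
```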

We implement our scheme using the change of measure

$$\begin{aligned} u^\epsilon (t,\eta ):=-D_\eta U^\delta (t,\eta )=-2a_1^f\rho ^\epsilon (\eta )\langle e_1^f, \eta \rangle _\mathcal {H}e_1^f, \end{aligned}$$
(63)

where

$$\begin{aligned} \rho ^\epsilon (\eta ):=\frac{ e^{-\frac{F_1(\eta )}{\delta }}}{ e^{-\frac{F_1(\eta )}{\delta }} + e^{-\frac{F^\epsilon _2}{\delta }}}, \end{aligned}$$

\(\delta =2/h^2(\epsilon )\) is the mollification parameter and \(\kappa \) is a parameter that controls the size of the neighborhood outside of which \(F_1\) dominates.
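Concretely, \(\rho ^\epsilon \) is a smooth switch: writing \(\rho ^\epsilon (\eta )=\big (1+e^{(F_1(\eta )-F^\epsilon _2)/\delta }\big )^{-1}\), it vanishes where \(F^\epsilon _2\) dominates (near the origin) and tends to 1 once \(\langle \eta ,e_1^f\rangle ^2_\mathcal {H}\) exceeds the crossover value \(h(\epsilon )^{-2\kappa }\). A short sketch of this weight as a function of the squared mode coefficient, with illustrative parameter values:

```python
import numpy as np

def rho(c1_sq, a1, L, h, kappa):
    """Mollifier weight rho^eps as a function of c1_sq = <eta, e1>_H^2.

    Here F1 = a1*(L^2 - c1_sq), F2 = a1*(L^2 - h**(-2*kappa)), delta = 2/h**2,
    and rho = exp(-F1/delta) / (exp(-F1/delta) + exp(-F2/delta)), written as a
    single stable sigmoid 1 / (1 + exp((F1 - F2)/delta)).
    """
    delta = 2.0 / h**2
    f1 = a1 * (L**2 - np.asarray(c1_sq))
    f2 = a1 * (L**2 - h ** (-2 * kappa))
    return 1.0 / (1.0 + np.exp((f1 - f2) / delta))

# Illustrative parameters (placeholders); crossover at c1_sq = h**(-2*kappa) = 0.1
a1, L, h, kappa = 1.0, 1.0, 10.0, 0.5

assert rho(0.0, a1, L, h, kappa) < 0.01   # near the origin the control is off
assert rho(0.2, a1, L, h, kappa) > 0.99   # past the crossover F1 dominates
```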

In order to derive non-asymptotic bounds for the second moment of the estimator, we will use the following min/max representation for the Hamiltonian

$$\begin{aligned} \mathbb {H}_{x^*}(\eta , p)=\inf _{v}\sup _{u}\bigg [ \langle p, A\eta +DF(x^*)\eta -u+v\rangle _\mathcal {H}-\frac{1}{2}\Vert u\Vert ^2_\mathcal {H}+\frac{1}{4}\Vert v\Vert _\mathcal {H}^2 \bigg ] \end{aligned}$$

(see e.g. [23, 24]). For any smooth functions \(U_{x^*},Z_{x^*}:[0,T]\times \mathcal {H}\rightarrow \mathbb {R}\) we let \(u(t,\eta )=-D_\eta U_{x^*}(t,\eta )\) and \(p=D_\eta Z_{x^*}(t,\eta )\), and obtain

$$\begin{aligned}&\inf _{v}\bigg [\big \langle D_\eta Z_{x^*}(t,\eta ), A\eta +DF(x^*)\eta -u(t,\eta )+v\big \rangle _\mathcal {H}-\frac{1}{2}\Vert u(t,\eta )\Vert ^2_{\mathcal {H}}+\frac{1}{4}\Vert v\Vert ^2_\mathcal {H}\bigg ]\\&\quad =\mathbb {H}_{x^*}\big (\eta , D_\eta Z_{x^*}(t,\eta ) \big )-\frac{1}{2}\big \Vert D_\eta Z_{x^*}(t,\eta )- D_\eta U_{x^*}(t,\eta ) \big \Vert ^2_\mathcal {H}. \end{aligned}$$
(64)

A consequence of this expression is the following pre-asymptotic bound for the second moment:

Lemma 4.1

For any smooth functions \(U_{x^*},Z_{x^*}:[0,T]\times \mathcal {H}\rightarrow \mathbb {R},\) \({\mathscr {D}}^{\epsilon }_{x^*}\) as in (61) and some \(\theta _0\in (0,1)\) let

$$\begin{aligned} \begin{aligned} {\mathscr {H}}_{x^*}^{\epsilon }(Z_{x^*})(t,\eta )&:=\frac{\sqrt{\epsilon }h(\epsilon )}{2}\big \langle D_\eta Z_{x^*}\big (t,\eta \big ), D^2F\big (x^*+\theta _0\sqrt{\epsilon }h(\epsilon )\eta \big )\big (\eta ,\eta \big ) \big \rangle _\mathcal {H}\end{aligned} \end{aligned}$$
(65)

and

$$\begin{aligned} \begin{aligned}&{\mathfrak {H}}_{x^*}^{\epsilon }( U_{x^*}, Z_{x^*})(t,\eta ):={\mathscr {H}}_{x^*}^{\epsilon }( Z_{x^*})(t,\eta )+{\mathscr {D}}_{x^*}^{\epsilon }(Z_{x^*})(t,\eta )\\&\quad -\frac{1}{2}\Vert D_\eta Z_{x^*}(t,\eta )- D_\eta U_{x^*}(t,\eta )\Vert ^2_\mathcal {H}. \end{aligned} \end{aligned}$$
(66)

For all \(\epsilon >0\) we have

$$\begin{aligned}&-\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\ge \inf _{v\in {\mathcal {A}}}\bigg [ 2Z_{x^*}\big (0,0\big )- 2\mathbb {E}Z_{x^*}\big (\hat{\tau }^{\epsilon ,v}_{x^*},\hat{\eta }_{x^*}^{\epsilon , v}(\hat{\tau }^{\epsilon ,v}_{x^*})\big )\nonumber \\&\quad +2\mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}}{\mathfrak {H}}_{x^*}^{\epsilon }( U_{x^*}, Z_{x^*})\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds \bigg ]. \end{aligned}$$
(67)

The proof makes use of Itô’s formula and is deferred to Appendix A.

Remark 12

The term \({\mathscr {H}}_{x^*}^{\epsilon }\) accounts for the error coming from the local approximation of the nonlinear dynamics by their linearized version around the stable equilibrium \(x^*.\) A significant part of this section is devoted to the pre-asymptotic control of this term.

The rest of this section is devoted to the pre-asymptotic analysis of \(Q^{\epsilon }(u^\epsilon )\) based on the lower bound (67) with \(U_{x^*}=U^\delta (t,\eta ),\)

$$\begin{aligned} Z(t,\eta )=Z_{x^*}(t,\eta )=(1-\zeta )U^\delta (t,\eta ), \zeta \in (0,1). \end{aligned}$$
(68)
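Putting the pieces together, the scheme simulates the controlled dynamics under the drift tilt induced by \(U^\delta \) and reweights each sample by the Girsanov likelihood ratio. The following sketch is a one-dimensional caricature, with an Ornstein-Uhlenbeck mode standing in for the \(e_1^f\) projection; the constants, the Euler-Maruyama discretization and the helper names are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-d analogue: d(eta) = -a*eta dt + (1/h) dW, rare event {sup |eta_t| >= L}.
# All constants below are illustrative placeholders.
a, L, T, h, kappa = 1.0, 1.0, 1.0, 2.0, 0.5
sigma = 1.0 / h
dt = 2.5e-3
n_steps = int(T / dt)
delta = 2.0 / h**2
F2 = a * (L**2 - h ** (-2 * kappa))

def drift_tilt(eta):
    """Tilt induced by the mollified subsolution (1-d caricature of (62)-(63));
    it vanishes near the origin and pushes the path toward the exit level."""
    F1 = a * (L**2 - eta**2)
    rho = 1.0 / (1.0 + np.exp((F1 - F2) / delta))
    return 2.0 * a * rho * eta

def exit_prob_is(n_samples=200):
    """Importance-sampling estimate of the exit probability, with a discrete
    Girsanov weight accumulated along each tilted path."""
    ests = np.zeros(n_samples)
    for i in range(n_samples):
        eta, logw = 0.0, 0.0
        for _ in range(n_steps):
            u = drift_tilt(eta)
            xi = rng.standard_normal()
            eta += (-a * eta + u) * dt + sigma * np.sqrt(dt) * xi
            # log-likelihood ratio dP/dQ for the added drift u
            logw -= u * np.sqrt(dt) * xi / sigma + u**2 * dt / (2 * sigma**2)
            if abs(eta) >= L:
                ests[i] = np.exp(logw)
                break
    return ests.mean()

p_hat = exit_prob_is()
assert 0.0 <= p_hat < 1.0
```

Samples that never exit contribute zero; exiting samples contribute their likelihood ratio, so the estimator is unbiased for the exit probability under the original measure.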

4.2 Performance analysis of the scheme

At this point we shall recall the definition of the random times

$$\begin{aligned} \hat{\tau }^{\epsilon ,v}_{x^*}=\inf \{ t>0 : \hat{\eta }^{\epsilon ,v}_{x^*}(t)\notin \mathring{B}_\mathcal {H}(0, L) \} \end{aligned}$$

where \(\hat{\eta }^{\epsilon ,v}_{x^*}\) solves (18). Before we state the main result of this section, we provide the definition of exponential negligibility, a concept that will be used frequently in the sequel.

Definition 4.1

A term will be called exponentially negligible (a) in the moderate deviations range if it can be bounded from above in absolute value by \(C_1e^{- c_2h^2(\epsilon )}/h^2(\epsilon )\), where \(C_1<\infty \) and \(c_2>0\); (b) in the large deviations range if (a) holds with \(1/h^2(\epsilon )\) replaced by \(\epsilon \).

Remark 13

Since \(\sqrt{\epsilon }h(\epsilon )\rightarrow 0\) as \(\epsilon \rightarrow 0,\) exponential negligibility in the large deviations range implies exponential negligibility in the moderate deviations range.
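This implication can be sanity-checked numerically: for a scaling such as \(h(\epsilon )=\epsilon ^{-1/3}\) (so that \(\sqrt{\epsilon }h(\epsilon )=\epsilon ^{1/6}\rightarrow 0\)), the large deviations bound of Definition 4.1 is dominated by the moderate deviations bound. The constants \(C_1,c_2\) below are illustrative:

```python
import numpy as np

C1, c2 = 1.0, 0.5
eps = np.logspace(-1, -6, 50)       # epsilon -> 0
h = eps ** (-1.0 / 3.0)             # then sqrt(eps)*h(eps) = eps**(1/6) -> 0

ld_bound = C1 * np.exp(-c2 / eps) * eps       # large deviations range, Def. 4.1(b)
md_bound = C1 * np.exp(-c2 * h**2) / h**2     # moderate deviations range, Def. 4.1(a)

# since h^2(eps) << 1/eps, the LD bound is smaller than the MD bound
assert np.all(ld_bound <= md_bound)
```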

The analysis of this section is summarized in the following theorem. Its proof is postponed to the end of this section and is preceded by several auxiliary estimates.

Theorem 4.1

Let \(T,\alpha ,\zeta _0,\epsilon >0\) and \(u^\epsilon (t,\eta )=-D_\eta U^{\delta (\epsilon )}(t,\eta )\) with \(U^\delta \) defined in (62). Assume that \(\delta =2/h^2(\epsilon ), \kappa \in (0,1-\alpha ), \zeta \in (\zeta _0,1/2)\) and \(\epsilon \) is sufficiently small to have \(h^{2(\kappa +\alpha -1)}(\epsilon )\le \frac{9a_1^f}{2}(\zeta _0-2\zeta _0^2)\wedge \frac{a^f_1}{2}.\) Then, up to exponentially negligible terms in the moderate deviations range,

$$\begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\ge \bigg [ (1-\zeta )a_1^f\bigg (L^2-\frac{1}{h(\epsilon )^{2\kappa }}\bigg )-\frac{2\log 2}{h^2(\epsilon )}\bigg ]-CT\sqrt{\epsilon }h(\epsilon ).\nonumber \\ \end{aligned}$$
(69)

Moreover, if \(h(\epsilon )\) is such that \(\sqrt{\epsilon }h^3(\epsilon )\longrightarrow 0\) as \(\epsilon \rightarrow 0\) then for \(\epsilon \) sufficiently small we have

$$\begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\ge \bigg [ (1-\zeta )a_1^f\bigg (L^2-\frac{1}{h(\epsilon )^{2\kappa }}\bigg )-\frac{2\log 2}{h^2(\epsilon )}\bigg ]. \end{aligned}$$
(70)

Remark 14

Note that for a small fixed \(\epsilon ,\) (69) shows that, in theory, the second moment degrades as the sampling time T grows. This degradation is caused by the linearization error (65) and suggests that, in practice, good performance hinges on a balance between \(\epsilon \) and T. Fortunately, (70) shows that this theoretical degradation disappears if the scaling \(h(\epsilon )\) does not grow too fast. Moreover, the simulation studies of Sect. 6 show that our scheme performs well for large T even when this growth assumption is not satisfied.

The following lemma collects a few straightforward computations that will be used below. Its proof can be found in Appendix A.

Lemma 4.2

For all \((t,\eta )\in [0,T]\times \mathcal {H},\zeta \in (0,1),\) \(U^\delta , Z\) as in (62), (68) and some \(\theta _0\in (0,1)\) we have

$$\begin{aligned} {\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)(t,\eta )&=2(1-\zeta )(a_1^f)^2\rho ^\epsilon (\eta )\langle e_1^f, \eta \rangle ^2_\mathcal {H}\big [1-(1-\zeta )\rho ^\epsilon (\eta ) \big ]\nonumber \\&\quad -2\zeta ^2(a_1^f)^2\big (\rho ^\epsilon (\eta )\big )^2\langle e_1^f, \eta \rangle ^2_\mathcal {H}\nonumber \\&\quad -\frac{(1-\zeta )a_1^f\rho ^\epsilon (\eta )}{h^2(\epsilon )}\bigg [1+\frac{2}{\delta }\rho ^\epsilon (\eta )\big (1-\rho ^\epsilon (\eta )\big )\langle e_1^f, \eta \rangle _\mathcal {H}\bigg ]\nonumber \\ {}&\quad -\sqrt{\epsilon }h(\epsilon )2(1-\zeta )a_1^f\rho ^\epsilon (\eta )\langle e_1^f, \eta \rangle _\mathcal {H}\nonumber \\&\quad \times \langle e_1^f, D^2F\big (x^*+\theta _0\sqrt{\epsilon }h(\epsilon )\eta \big )\big (\eta ,\eta \big )\rangle _\mathcal {H}. \end{aligned}$$
(71)

Moving on to the main body of the analysis, let \(B_{\infty }(x^*,1)\) denote an open \(L^\infty -\)ball of radius 1 centered at \(x^*\) and

$$\begin{aligned} \tau _{\infty }^{\epsilon }=\inf \big \{t>0: \Vert \hat{\eta }_{x^*}^{\epsilon ,v}(t) \Vert _{L^\infty }\ge \tfrac{1}{\sqrt{\epsilon }h(\epsilon )} \big \}=\inf \big \{t>0: {\hat{X}}_{x^*}^{\epsilon ,v}(t)\in B_{\infty }(x^*,1)^c \big \}. \end{aligned}$$

Returning to (67) we have the following decomposition

$$\begin{aligned} \begin{aligned} -\frac{1}{h^2(\epsilon )}\log&Q^{\epsilon }(u^\epsilon )\ge \inf _{v\in {\mathcal {A}}}\bigg [ 2Z_{x^*}\big (0,0\big )- 2\mathbb {E}Z_{x^*}\big (\hat{\tau }^{\epsilon ,v}_{x^*},\hat{\eta }_{x^*}^{\epsilon , v}(\hat{\tau }^{\epsilon ,v}_{x^*})\big )\\&\quad +2\mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}}{\mathfrak {H}}_{x^*}^{\epsilon }( U_{x^*}, Z_{x^*})\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds \bigg ]\\ {}&=\inf _{v\in {\mathcal {A}}}\bigg [ 2Z_{x^*}\big (0,0\big )- 2\mathbb {E}Z_{x^*}\big (\hat{\tau }^{\epsilon ,v}_{x^*},\hat{\eta }_{x^*}^{\epsilon , v}(\hat{\tau }^{\epsilon ,v}_{x^*})\big ) \\&\quad +2\mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}{\mathfrak {H}}_{x^*}^{\epsilon }( U_{x^*}, Z_{x^*})\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds\\ {}&\quad \quad +2\mathbb {E}\int _{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}^{\hat{\tau }^{\epsilon ,v}_{x^*}}{\mathfrak {H}}_{x^*}^{\epsilon }( U_{x^*}, Z_{x^*})\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds \bigg ]. \end{aligned} \end{aligned}$$
(72)

Remark 15

This decomposition allows us to deal with the cubic power of \(\eta \) that appears in (65). Since we only control the spatial \(L^2-\)norm of the moderate deviation process, this term is problematic. In particular, estimates based on the a priori bound (47) would introduce T-dependent constants, which are undesirable for the pre-asymptotic analysis.

The last term in (72) concerns the behavior of the controlled process \(\hat{\eta }^{\epsilon ,v}\) in the event that it exits an \(L^\infty -\)ball of radius \(1/\sqrt{\epsilon }h(\epsilon )\) before it exits \(\mathring{B}_{\mathcal {H}}(0,L)\). Since the latter is a very rare event in the moderate deviations range, we expect that this term is exponentially negligible. This claim is proved in the following proposition.

Proposition 4.1

The term

$$\begin{aligned} 2\mathbb {E}\int _{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}^{\hat{\tau }^{\epsilon ,v}_{x^*}}{\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds \end{aligned}$$

is exponentially negligible in the moderate deviations range for \(\epsilon \) sufficiently small.

Proof

Let \(t\in [0,T]\), \(\eta \in L^\infty \cap B_\mathcal {H}(0,L)\) and let \(\epsilon \) be small enough that \(\sqrt{\epsilon }h(\epsilon )<1.\) In view of (7),

$$\begin{aligned} \frac{1}{\sqrt{\epsilon }h(\epsilon )}\big |{\mathscr {H}}_{x^*}^{\epsilon }( Z)\big (t,\eta \big )\big |&\le 2(1-\zeta )a_1^f\rho ^\epsilon (\eta )\big |\langle e_1^f, \eta \rangle _\mathcal {H}\big |\big | \langle D^2F(x^*+\theta \sqrt{\epsilon }h(\epsilon )\eta )(\eta ,\eta ),e_1^f \rangle _\mathcal {H}\big |\\&\le 2a_1^f\Vert \eta \Vert ^3_\mathcal {H}\Vert e_1^f\Vert _{\mathcal {H}}\big \Vert \partial ^2_xf\big ( x^*+\theta \sqrt{\epsilon }h(\epsilon )\eta \big )e_1^f\big \Vert _{L^\infty }\\&\le 2C_{f,p}a_1^fL^3\Vert e_1^f\Vert _{L^\infty }\big (1+\Vert x^* \Vert ^{p_0-2}_{L^\infty }+\Vert \eta \Vert ^{p_0-2}_{L^\infty }\big ). \end{aligned}$$

Moreover, from (71) we have

$$\begin{aligned} \begin{aligned}&\big |{\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)\big (t,\eta \big )-{\mathscr {H}}_{x^*}^{\epsilon }(Z)\big (t,\eta \big )\big |\\&\le 2(1-\zeta )(a_1^f)^2\rho ^\epsilon (\eta )\langle e_1^f, \eta \rangle ^2_\mathcal {H}\big [1-(1-\zeta )\rho ^\epsilon (\eta ) \big ]+2\zeta ^2(a_1^f)^2\big (\rho ^\epsilon (\eta )\big )^2\langle e_1^f, \eta \rangle ^2_\mathcal {H}\\&\quad + \frac{(1-\zeta )a_1^f\rho ^\epsilon (\eta )}{h^2(\epsilon )}\bigg [1+\frac{2}{\delta }\rho ^\epsilon (\eta )\big (1-\rho ^\epsilon (\eta )\big )\langle e_1^f, \eta \rangle _\mathcal {H}\bigg ]\\ {}&\le 4(a_1^f)^2\Vert e_1^f\Vert ^2_{\mathcal {H}}\Vert \eta \Vert ^2_{\mathcal {H}}+ \frac{a_1^f}{h^2(\epsilon )}\bigg [1+h^2(\epsilon )\Vert e_1^f\Vert _{\mathcal {H}}\Vert \eta \Vert _{\mathcal {H}} \bigg ]\\ {}&\le C_{\ell ,f,L}, \end{aligned} \end{aligned}$$

where we used that \(\zeta ,\rho \in (0,1),\) and \(h(\epsilon )>1\). Combining the last two estimates we deduce that for any \(v\in {\mathcal {A}},\)

$$\begin{aligned}&\bigg |\int _{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}^{\hat{\tau }^{\epsilon ,v}_{x^*}}{\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)ds\bigg |\\&\quad \le \mathbb {1}_{\{\tau _{\infty }^{\epsilon }\le \hat{\tau }^{\epsilon ,v}_{x^*} \}}\int _{\tau _{\infty }^{\epsilon }}^{\hat{\tau }^{\epsilon ,v}_{x^*}}\big |{\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big )\big |ds\\ {}&\quad \le C_{f,p_0,L,\ell }\mathbb {1}_{\{\tau _{\infty }^{\epsilon }\le \hat{\tau }^{\epsilon ,v}_{x^*} \}}\int _{0}^{T}\big (1+\Vert x^* \Vert ^{p_0-2}_{L^\infty }+\big \Vert \hat{\eta }^{\epsilon ,v}_{x^*}(s) \big \Vert ^{p_0-2}_{L^\infty }\big )ds\\ {}&\quad \le C_{f,p_0,L,T}\mathbb {1}_{\{\tau _{\infty }^{\epsilon }\le T \}}\bigg ( 1+\Vert x^* \Vert ^{p_0-2}_{L^\infty }+\sup _{s\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v}_{x^*}(s) \big \Vert ^{p_0-2}_{L^\infty } \bigg ). \end{aligned}$$

An application of Hölder’s inequality along with (41) yields

$$\begin{aligned} \bigg |\mathbb {E}\int _{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}^{\hat{\tau }^{\epsilon ,v}_{x^*}}{\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds\bigg |&\le C_{f,p_0,L,\ell ,T}\,\mathbb {P}[\tau _{\infty }^{\epsilon }\le T ]^{\frac{1}{2}}\bigg ( 1+\Vert x^* \Vert ^{p_0-2}_{L^\infty }+\mathbb {E}\bigg [\sup _{s\in [0,T]}\big \Vert \hat{\eta }^{\epsilon ,v}_{x^*}(s) \big \Vert ^{2p_0-4}_{L^\infty }\bigg ]^{\frac{1}{2}} \bigg )\\&\le C_{f,p,L,T,x^*}\,\mathbb {P}[\tau _{\infty }^{\epsilon }\le T]^{\frac{1}{2}}. \end{aligned}$$

Recall now that \({\hat{X}}_{x^*}^{\epsilon , v}\) solves

$$\begin{aligned} \big \{\, d{\hat{X}}^\epsilon (t)=\big [A {\hat{X}}^\epsilon (t)+F\big ( {\hat{X}}^\epsilon (t)\big )\big ]dt+\sqrt{\epsilon }h(\epsilon )\big [v(t)- u^\epsilon \big (\hat{\eta }^{\epsilon ,v}_{x^*}(t)\big )\big ]dt+\sqrt{\epsilon }dW(t)\;,\;\;{\hat{X}}^\epsilon (0)= x^*\,\big \} \end{aligned}$$

and, as \(\epsilon \rightarrow 0\), \(\{{\hat{X}}_{x^*}^{\epsilon , v}\}_{\epsilon >0}\) satisfies a large deviation principle in \(C([0,T];L^\infty (0,\ell ))\) with action functional \(\widetilde{{\mathcal {S}}}_{x^*,T}: C([0,T];L^\infty (0,\ell ))\rightarrow [0,\infty ] \) given by

$$\begin{aligned} \widetilde{{\mathcal {S}}}_{x^*,T}(\phi )&=\inf _{u\in {\mathcal {P}}_\phi }\frac{1}{2}\int _{0}^{T}\big \Vert u(t)\big \Vert ^2_{\mathcal {H}}dt,\\ {\mathcal {P}}_\phi&=\bigg \{ u\in L^2([0,T];\mathcal {H}):\; \phi (t)=S(t)x^*+\int _{0}^{t}S(t-s)\big [F\big ( \phi (s)\big )+ u(s) \big ] ds\;\;\forall t\in [0,T] \bigg \}, \end{aligned}$$

where the convention \(\inf \varnothing =+\infty \) is in use (see e.g. [14], Theorems 6.2, 6.3). Passing to a convergent subsequence if necessary, we deduce that

$$\begin{aligned} \lim _{\epsilon \rightarrow 0}\epsilon \log \mathbb {P}[\tau _{\infty }^{\epsilon }\le T ]\le -\inf _{\phi \in {\mathcal {B}}_{\infty }(x^*,L)^c}\widetilde{{\mathcal {S}}}_{x^*,T}(\phi ), \end{aligned}$$

where \({\mathcal {B}}_{\infty }(x^*,L):=\{\phi \in C([0,T];\mathcal {H}): \sup _{t\in [0,T]}\Vert \phi (t)-x^*\Vert _{L^\infty }<1\}.\) Hence, for \(\epsilon \) sufficiently small

$$\begin{aligned} \epsilon \log \mathbb {P}[\tau _{\infty }^{\epsilon }\le T ]\le -\inf _{\phi \in {\mathcal {B}}_{\infty }(x^*,L)^c}\widetilde{{\mathcal {S}}}_{x^*,T}(\phi )/2 \end{aligned}$$

or equivalently

$$\begin{aligned}\mathbb {P}[\tau _{\infty }^{\epsilon }\le T ]^{\frac{1}{2}}\le e^{-\inf _{\phi \in {\mathcal {B}}_{\infty }(x^*,L)^c}\widetilde{{\mathcal {S}}}_{x^*,T}(\phi )/4\epsilon }. \end{aligned}$$

Finally, we claim that \(\inf _{\phi \in {\mathcal {B}}_{\infty }(x^*,L)^c}\widetilde{{\mathcal {S}}}_{x^*,T}(\phi )\!>\!0\). Indeed, since the action functional is lower semi-continuous (see Lemma 5.1, [14]) and \({\mathcal {B}}_{\infty }(x^*,L)^c\subset C([0,T];L^\infty (0,\ell ))\) is closed, there exists a minimizer \(\phi ^*\in {\mathcal {B}}_{\infty }(x^*,L)^c.\) Furthermore, there exists \(u^*\in {\mathcal {P}}_{\phi ^*}\) such that

$$\begin{aligned} 2\inf _{\phi \in {\mathcal {B}}_{\infty }(x^*,L)^c}\widetilde{{\mathcal {S}}}_{x^*,T}(\phi )=2\widetilde{{\mathcal {S}}}_{x^*,T}(\phi ^*)>\frac{1}{2}\int _{0}^{T}\big \Vert u^*(t)\big \Vert ^2_{\mathcal {H}}dt=\frac{1}{2}\big \Vert u^*\big \Vert ^2_{L^2([0,T];\mathcal {H})}>0. \end{aligned}$$

The last inequality fails if and only if \(u^*=0\) almost everywhere in \([0,T]\times [0,\ell ].\) Since \(x^*\) is an equilibrium of the uncontrolled system, the latter would imply that \(\phi ^*(t)=x^*\) for all \(t\in [0,T],\) hence \(\phi ^*\notin {\mathcal {B}}_{\infty }(x^*,L)^c.\) This contradicts the choice of \(\phi ^*\) and concludes the argument. Therefore, the term of interest is exponentially negligible in the large deviations range, hence also in the moderate deviations range. \(\square \)

Next, we turn our attention to the third term in (72). The linearization error in this term is easier to control, since the process \(\sqrt{\epsilon }h(\epsilon )\hat{\eta }^{\epsilon ,v}\) is uniformly bounded by 1 in the \(L^\infty -\)norm. This fact is used in the following lemma, whose proof can be found in Appendix A.

Lemma 4.3

For all \(\eta \in B_{\infty }(0,1/\sqrt{\epsilon }h(\epsilon ))\) there exists a constant \(C=C_{x^*,\ell ,f}>0\) such that for \(\epsilon \) sufficiently small we have

$$\begin{aligned} \begin{aligned} {\mathscr {H}}_{x^*}^{\epsilon }(Z)(t,\eta ) \ge -C\sqrt{\epsilon }h(\epsilon )2(1-\zeta )a_1^f\rho ^\epsilon (\eta )\big |\langle e_1^f, \eta \rangle _\mathcal {H}\big |. \end{aligned} \end{aligned}$$
(73)

As for \( {\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)(t,\eta )- {\mathscr {H}}_{x^*}^{\epsilon }(Z)(t,\eta ),\) straightforward algebra along with the arguments of Lemma 4.2 of [23] yields

$$\begin{aligned} \begin{aligned} {\mathfrak {H}}_{x^*}^{\epsilon }( U^\delta , Z)(t,\eta )- {\mathscr {H}}_{x^*}^{\epsilon }(Z)(t,\eta )&\ge {\mathscr {D}}_{x^*}^{\epsilon }(Z)(t,\eta )-\frac{\zeta ^2}{2}\big \Vert D_\eta U^\delta (t,\eta )\big \Vert ^2_{\mathcal {H}}\\ {}&\ge (1\!-\!\zeta ){\mathscr {D}}_{x^*}^{\epsilon }(Z)(t,\eta )\!-\!\frac{\zeta -2\zeta ^2}{2}\big \Vert D_\eta U^\delta (t,\eta )\big \Vert ^2_{\mathcal {H}}\\ {}&\ge \frac{1-\zeta }{2}\bigg (1-\frac{1}{h^2(\epsilon )\delta }\bigg )\beta ^\epsilon _0(\eta )+(1-\zeta )\rho ^\epsilon (\eta )\gamma _1\\ {}&\quad +\frac{\zeta -2\zeta ^2}{2}\rho ^\epsilon (\eta )^2\big \Vert D_\eta F_1(\eta )\big \Vert ^2_{\mathcal {H}}, \end{aligned} \end{aligned}$$

where the quantity \(\beta ^\epsilon _0(\eta ):=\rho ^\epsilon (\eta )\big (1-\rho ^\epsilon (\eta )\big )\big \Vert D_\eta F_1(\eta )\big \Vert ^2_{\mathcal {H}}\) is nonnegative since \(\rho ^\epsilon \in [0,1],\) and \(\gamma _1:={\mathscr {D}}_{x^*}(F_1)(\eta )=-a_1^f/h^2(\epsilon ).\) Combining the latter with (73) and substituting \(\delta =2/h^2(\epsilon )\) and

$$\begin{aligned} \big \Vert D_\eta F_1(\eta )\big \Vert ^2_{\mathcal {H}}=4(a_1^f)^2\langle \eta , e_1^f\rangle ^2\big \Vert e_1^f\Vert ^2_\mathcal {H}=4(a_1^f)^2\langle \eta , e_1^f\rangle ^2, \end{aligned}$$

we obtain the lower bound

$$\begin{aligned} \begin{aligned}&{\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)(t,\eta )\\&\quad \ge (1-\zeta )\rho ^\epsilon (\eta )\big (1-\rho ^\epsilon (\eta )\big )(a_1^f)^2\langle \eta , e_1^f\rangle ^2-\frac{1}{h^2(\epsilon )}(1-\zeta )\rho ^\epsilon (\eta )a_1^f\\ {}&\quad +2(\zeta -2\zeta ^2)\rho ^\epsilon (\eta )^2(a_1^f)^2\langle \eta , e_1^f\rangle ^2-2C\sqrt{\epsilon }h(\epsilon )(1-\zeta )a_1^f\rho ^\epsilon (\eta )\big |\langle e_1^f, \eta \rangle _\mathcal {H}\big |. \end{aligned} \end{aligned}$$
(74)

At this point we partition \(B_\mathcal {H}(0,L)=B_1^f\cup B_2^f\cup B_3^f,\) where

$$\begin{aligned} \begin{aligned}&B^f_1:=\big \{\eta \in \mathcal {H}:\; \langle \eta , e_1^f\rangle _\mathcal {H}^2\le h(\epsilon )^{-2(\kappa +\alpha )} \big \},\\ {}&B^f_2:=\big \{\eta \in \mathcal {H}:\; h(\epsilon )^{-2(\kappa +\alpha )}<\langle \eta , e_1^f\rangle _\mathcal {H}^2\le 2h(\epsilon )^{-2\kappa }-h(\epsilon )^{-2}K \big \},\\ {}&B^f_3:=\big \{\eta \in \mathcal {H}:\; 2h(\epsilon )^{-2\kappa }-h(\epsilon )^{-2}K <\langle \eta , e_1^f\rangle _\mathcal {H}^2\le L^2 \big \}. \end{aligned} \end{aligned}$$
(75)

The constants \(\alpha ,\zeta , \kappa \in (0,1)\) and \(K<0\) will be chosen later. The remaining part of this section is devoted to the study of the right-hand side of (74) on each component separately.

Lemma 4.4

Let \(\epsilon >0\) be small enough that \(\sqrt{\epsilon }h(\epsilon )<1\) and let \( \zeta \in (0,1/2)\). For all \(\eta \in B_1^f\) defined in (75) and all \( t\in [0,T]\) we have

$$\begin{aligned} {\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)(t,\eta )\ge 0, \end{aligned}$$

up to terms that are exponentially negligible in the moderate deviations range.

We address the region \(B_3^f\) in the following lemma.

Lemma 4.5

Let \(\kappa \in (0,1),\) \(K=-\ln 3,\) \(\zeta _0>0,\) \(\zeta \in [\zeta _0,1/2)\) and let \(\epsilon >0\) be small enough that \(\sqrt{\epsilon }h(\epsilon )<1\) and \(h^{2(\kappa -1)}(\epsilon )\le \frac{9a_1^f}{2}(\zeta _0-2\zeta _0^2).\) For all \(\eta \in B_3^f\) defined in (75) and all \(t\in [0,T]\) we have either

$$\begin{aligned} (i)\quad {\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)(t,\eta )\ge -C\sqrt{\epsilon }h(\epsilon ), \end{aligned}$$

or, if \(h(\epsilon )\) is such that \(\sqrt{\epsilon }h^3(\epsilon )\rightarrow 0,\) then for sufficiently small \(\epsilon \) we have

$$\begin{aligned} (ii)\quad {\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)(t,\eta )\ge 0. \end{aligned}$$

It remains to study the region \(B_2^f.\) This is the most problematic region, as there is no guarantee that the weight \(\rho ^\epsilon \) is either exponentially negligible or of order one. The analysis is deferred to Appendix 1.

Lemma 4.6

Let \(\alpha \in (0,1),\) \(\kappa <1-\alpha ,\) \(K=-\ln 3,\) \(\zeta \in (\zeta _0,1/2)\) and let \(\epsilon >0\) be small enough that \(\sqrt{\epsilon }h(\epsilon )<1\) and \(h^{2(\kappa +\alpha -1)}(\epsilon )\le \frac{a^f_1}{2}.\) For all \(\eta \in B_2^f\) defined in (75) and all \( t\in [0,T]\) we have either

$$\begin{aligned} (i)\quad {\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)(t,\eta )\ge -C\sqrt{\epsilon }h(\epsilon ), \end{aligned}$$

where C does not depend on \(\epsilon ,\) or, if \(\sqrt{\epsilon }h^3(\epsilon )\rightarrow 0\), then for \(\epsilon \) sufficiently small we have

$$\begin{aligned} (ii)\quad {\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)(t,\eta )\ge 0. \end{aligned}$$

Combining the three previous lemmas, we arrive at the following result regarding the third term in (72).

Lemma 4.7

There exists a constant C independent of \(T>0\) such that for \(\epsilon \) sufficiently small,

$$\begin{aligned} (i)\quad \quad \quad \mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}{\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds\ge -CT\sqrt{\epsilon }h(\epsilon ), \end{aligned}$$

up to exponentially negligible terms in the moderate deviations range. Moreover, if \(h(\epsilon )\) is such that \(\lim _{\epsilon \rightarrow 0}\sqrt{\epsilon }h^3(\epsilon )=0\) then, for \(\epsilon \) sufficiently small,

$$\begin{aligned} (ii)\quad \quad \quad \quad \quad \quad \quad \mathbb {E}\int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}{\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds\ge 0, \end{aligned}$$

up to exponentially negligible terms in the moderate deviations range.

Proof

(i) From Lemmas 4.4, 4.5(i) and 4.6(i) we have

$$\begin{aligned} \int _{0}^{\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }}{\mathfrak {H}}_{x^*}^{\epsilon }(U^\delta ,Z)\big (s,\hat{\eta }^{\epsilon ,v}_{x^*}(s)\big ) ds\ge -C\big (\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }\big )\sqrt{\epsilon }h(\epsilon ), \end{aligned}$$

with probability 1, up to exponentially negligible terms. Since \(\hat{\tau }^{\epsilon ,v}_{x^*}\wedge \tau _{\infty }^{\epsilon }\le T\) with probability 1 and the constant is deterministic, the estimate follows by taking expectation.

(ii) The estimate follows from Lemmas 4.4, 4.5(ii) and 4.6(ii). \(\square \)

We conclude this section with the proof of Theorem 4.1.

Proof of Theorem 4.1

In view of Lemmas 4.1 and 4.7(i), (72) yields

$$\begin{aligned} \begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\ge \inf _{v\in {\mathcal {A}}}\bigg [&2Z\big (0,0\big )- 2\mathbb {E}Z\big (\hat{\tau }^{\epsilon ,v}_{x^*},\hat{\eta }_{x^*}^{\epsilon , v}(\hat{\tau }^{\epsilon ,v}_{x^*})\big )-CT\sqrt{\epsilon }h(\epsilon ) \bigg ], \end{aligned} \end{aligned}$$

up to exponentially negligible terms. In view of (68), we have \(Z\big (t,\eta \big )=(1-\zeta )U^\delta (t,\eta ).\) From Theorem 3.2 we have \(\lim _{\epsilon \rightarrow 0}\mathbb {E}[Z\big (\hat{\tau }^{\epsilon ,v}_{x^*},\hat{\eta }_{x^*}^{\epsilon , v}(\hat{\tau }^{\epsilon ,v}_{x^*})\big )]=0\). Thus for \(\epsilon \) sufficiently small we may write

$$\begin{aligned} \begin{aligned} -\frac{1}{h^2(\epsilon )}\log Q^{\epsilon }(u^\epsilon )\ge \inf _{v\in {\mathcal {A}}}\bigg [ Z\big (0,0\big )-CT\sqrt{\epsilon }h(\epsilon )\bigg ]. \end{aligned} \end{aligned}$$

As for the first term, since \(U^\delta \) is the exponential mollification of two functions, Lemma 4.1 of [23] gives that

$$\begin{aligned} U^\delta (0,0)\ge F_1(0)\wedge F_2^{\epsilon }-\delta \log 2=F_2^{\epsilon }-\delta \log 2. \end{aligned}$$

Finally, the improved bound (70) follows by invoking Lemma 4.7(ii). \(\square \)
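For intuition on the mollification bound used in the last step, the soft-minimum inequality \(\min -\delta \log 2\le U^\delta \le \min \) can be checked numerically. The sketch below (Python, hypothetical scalar inputs) assumes the standard exponential mollification \(-\delta \log \big (e^{-F_1/\delta }+e^{-F_2/\delta }\big )\) of two functions, in the spirit of Lemma 4.1 of [23]; it is an illustration, not part of the proof.

```python
import math

def softmin(f1, f2, delta):
    """Exponential mollification -delta*log(exp(-f1/delta) + exp(-f2/delta)),
    written with the minimum factored out for numerical stability."""
    m = min(f1, f2)
    return m - delta * math.log(math.exp(-(f1 - m) / delta) + math.exp(-(f2 - m) / delta))

# Sandwich bound: min(f1, f2) - delta*log(2) <= softmin <= min(f1, f2),
# with equality on the left when f1 = f2 (hypothetical test values below).
for f1, f2, delta in [(0.3, 1.0, 0.05), (2.0, 2.0, 0.1), (-1.0, 4.0, 0.2)]:
    u = softmin(f1, f2, delta)
    assert min(f1, f2) - delta * math.log(2) <= u <= min(f1, f2)
```

The equal-arguments case, where the loss is exactly \(\delta \log 2,\) is the one relevant for the bound on \(U^\delta (0,0)\) above.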

5 The case of a double-well potential

In this section we specialize our results to SRDEs in which the differential operator \({\mathcal {A}}=\Delta \) (i.e. the second derivative operator in one spatial dimension) and the reaction term takes the form \(f=-V'_f,\) where \(V_f\) is a double-well potential such as the one depicted below. This choice is possible in view of Hypotheses 2(a), 2(b), which allow arbitrary polynomial growth. Thus, we assume that \(V_f\) has two global minima and a local maximum which, for simplicity, is assumed to lie at the origin. Without loss of generality, we take \(f'(0)=-V''_f(0)=1\). Such SRDEs arise as scaling limits of particle systems with nearest-neighbor coupling that evolve in the inverted potential \(-V_f\) (see e.g. [4], Chapter 1) and provide one of the simplest examples of non-trivial dynamical behavior.

figure a

The deterministic reaction–diffusion equation posed on the interval \((0, \ell )\) has two stable equilibria \(x^*_{-}, x^*_{+}\), corresponding to the global minima, and a saddle point \(x^*_{0}\), corresponding to the global maximum, that is identically equal to 0. The equilibria \(x^*_{\pm }\) exist only if \(\ell >\pi \) in the case of Dirichlet boundary conditions, while they exist for all \(\ell >0\) under Neumann and periodic conditions. Moreover, every time the interval length \(\ell \) crosses the value \(k\pi \) (for Neumann or Dirichlet b.c.) or \(2k\pi \) (for periodic b.c.), for some \(k\in \mathbb {N},\) two (resp. one) non-constant saddle points \(\pm x^*_{k,\ell }\) (resp. \(x^*_{p,k,\ell }\)) bifurcate from \(x^*_{0}.\) The \(k\)-th non-constant saddle points feature k kink-antikink pairs in the periodic and Dirichlet cases and k kinks in the Neumann case. The interested reader is referred to [4], Section 2.1 and [27] for the bifurcation analysis of the problem with different boundary conditions.

For most of the sequel, we specialize the discussion to the potential

$$\begin{aligned} V_f(x)=\frac{x^4}{4}-\frac{x^2}{2}\;,\;\; x\in \mathbb {R}. \end{aligned}$$

The corresponding bistable stochastic dynamics are governed by the (stochastic) Allen–Cahn equation

$$\begin{aligned} \partial _tX^\epsilon =\partial ^2_{\xi }X^{\epsilon }+ X^\epsilon -\big (X^\epsilon \big )^3+\sqrt{\epsilon }{\dot{W}}. \end{aligned}$$
(76)

The noiseless (\(\epsilon =0\)) equation was proposed in [2] as a simple model of phase separation of two-component alloy systems. It is also known in the literature as the real Ginzburg–Landau equation [40] (due to its connections with the physical superconductivity theory bearing the same name) or the Chafee–Infante problem [15]. Transitions between the stable states \(x^*_{\pm }\) that correspond to the absolute minima \(\pm 1\) are enabled by the stochastic forcing and have been studied as models of quantum tunneling phenomena [27] and thermally induced magnetization reversal of micromagnets [42]. For studies of transition times the interested reader is referred to [5, 6] and [42, 43] in the mathematical and physical literature respectively.

5.1 Stochastic Allen–Cahn with Neumann and periodic boundary conditions

The eigenvalues of the Neumann and periodic (negative) Laplacian on the interval \((0,\ell )\) are respectively given by

$$\begin{aligned} a^{Ne}_n=\bigg (\frac{n\pi }{\ell }\bigg )^2\;, a^{per}_{\pm n}=\bigg (\frac{2 n\pi }{\ell }\bigg )^2\;,\;\;n=0,1,2,\dots . \end{aligned}$$
(77)

For \(\xi \in (0,\ell ), n=1,2,\dots ,\) the corresponding eigenfunctions are

$$\begin{aligned} e^{Ne}_0(\xi )= & {} \ell ^{-1/2},\; e^{Ne}_n(\xi )=\sqrt{\frac{2}{\ell }}\cos \bigg (\frac{n\pi \xi }{\ell }\bigg ) \;,e^{per}_0(\xi )=\ell ^{-1/2}\;, e^{per}_{\pm n}(\xi )\nonumber \\= & {} \sqrt{\frac{2}{\ell }}\bigg [\pm \sin \bigg (\frac{ 2n\pi \xi }{\ell }\bigg )+ \cos \bigg (\frac{ 2n\pi \xi }{\ell }\bigg )\bigg ]. \end{aligned}$$
(78)

For both cases, the stable equilibria are the constant functions \(x^{*}_{\pm }(\xi )=\pm 1, \xi \in (0,\ell ).\) The reaction term is \(f(x)=x-x^3\) and the linearized operators \(\Delta +DF(x^*_{\pm })\) acting on a function y are given by

$$\begin{aligned}{}[\Delta +DF(x^*_{\pm })]y(\xi )=y''(\xi )+\big [(1-3x^2)|_{x=x^*_{\pm }}\big ]y(\xi )=y''(\xi )-2y(\xi )\;,\;\;\xi \in (0,\ell ). \end{aligned}$$

Both cases can be treated simultaneously after indexing the eigenpairs by the natural numbers. In particular, in the Neumann case, the eigenvalues \(\{a^f_n\}\) from Hypothesis 3(b) are shifted eigenvalues of the (negative) Laplacian, i.e.

$$\begin{aligned} a^f_n=2+\bigg (\frac{(n-1)\pi }{\ell }\bigg )^2,\; n=1,2,\dots \end{aligned}$$

and the sequence of eigenvectors \(\{e_n^f\}\) coincides with \(\{e^{Ne}_{n-1}\}.\) In the periodic case we set \(a_1^f=2,\) \(a_{2n}^f=2+a^{per}_{n}, a^f_{2n+1}=2+a^{per}_{-n}\) and \(e_1^f=e^{per}_{0},\) \(e_{2n}^f=e^{per}_{n}, e^f_{2n+1}=e^{per}_{-n}\) for \(n=1,2,\dots \).

Turning to the spectral gap conditions, Theorems 3.1 and 3.2 hold for any value of \(\ell \) provided that the change of measure \(u_{k_0}\) acts on a finite-dimensional eigenspace of sufficiently high dimension. For example, consider the Neumann problem with \(\ell =4\pi /3.\) For this value of \(\ell \), (76) has 3 saddle points (see e.g. [48], Chapter 5.3.4) and it is easy to check that Hypothesis 3(c) is violated. However, the weak spectral gap of Hypothesis 3(c') is satisfied for \(k_0=3.\) Indeed, we have

$$\begin{aligned} 3a^f_1=6<2+\frac{9\pi ^2}{(4\pi /3)^2}=2+\frac{81}{16}=7.0625=a^f_4. \end{aligned}$$

Thus, the asymptotic results hold with the change of measure

$$\begin{aligned} u_{3}(t,\eta )=2\big (a_1^f\langle \eta , e_1^f\rangle _\mathcal {H}e_1^f +a_2^f\langle \eta , e_2^f\rangle _\mathcal {H}e_2^f+a_3^f\langle \eta , e_3^f\rangle _\mathcal {H}e_3^f \big ). \end{aligned}$$
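The eigenvalue comparison above is easy to reproduce numerically; the following Python sketch (an illustration, with \(a_n^f\) as in the displayed Neumann formula) checks that the strong gap fails at \(\ell =4\pi /3\) while the weak gap with \(k_0=3\) holds:

```python
import math

def a_f(n, ell):
    """Shifted Neumann eigenvalues a_n^f = 2 + ((n-1)*pi/ell)^2, n = 1, 2, ..."""
    return 2.0 + ((n - 1) * math.pi / ell) ** 2

ell = 4 * math.pi / 3
a1, a2, a4 = a_f(1, ell), a_f(2, ell), a_f(4, ell)

assert a1 == 2.0
assert 3 * a1 >= a2      # strong gap 3*a_1^f < a_2^f fails at ell = 4*pi/3 ...
assert 3 * a1 < a4       # ... but the weak gap holds with k_0 = 3
assert abs(a4 - (2 + 81 / 16)) < 1e-9   # a_4^f = 2 + 81/16 = 7.0625
```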

As for the pre-asymptotic analysis of Sect. 4 and the numerical studies of the following section, we work under the stronger spectral gap condition of Hypothesis 3(c). For the Neumann problem, this places the restriction

$$\begin{aligned} 3a_1^f=6<2+\frac{\pi ^2}{\ell ^2}= a_2^f\iff \ell <\frac{\pi }{2} \end{aligned}$$

which can be weakened to \(\ell <\pi /\sqrt{2}\) in view of Remark 7. For the periodic problem, Hypothesis 3(c) gives

$$\begin{aligned} 3a_1^f=6<2+\frac{4\pi ^2}{\ell ^2}= a_2^f\iff \ell <\pi . \end{aligned}$$

Finally, an example where the assumptions of Lemma 3.5 are satisfied is given by the Neumann problem with \(\ell \ge \pi /\sqrt{2}.\) In this case it is straightforward to verify that \(a_2^f=2+\pi ^2/\ell ^2\le 4=2a_1^f.\)

5.2 Stochastic Allen–Cahn with Dirichlet boundary conditions

The eigenpairs of the Dirichlet Laplacian on the interval \((0,\ell )\) are explicitly given by

$$\begin{aligned} a^{Dir}_n=\bigg (\frac{n\pi }{\ell }\bigg )^2\;,\; e^{Dir}_n(\xi )=\sqrt{\frac{2}{\ell }}\sin \bigg (\frac{n\pi \xi }{\ell }\bigg )\;,\;\; n=1,2,\dots \end{aligned}$$
(79)

However, exact spectral analysis and numerical simulation of the linearized operators are more involved than in the periodic and Neumann cases. This is due to the fact that the stable equilibria \(x^*_{\pm }\) are non-constant functions with absolute value less than 1 that vanish at the endpoints \(0,\ell .\) They can be determined by solving the Sturm-Liouville problem

$$\begin{aligned} \left\{ \begin{aligned}&x''(\xi )=V'_f(x(\xi ))=x^3(\xi )-x(\xi ) ,\; \xi \in (0,\ell )\\ {}&x(0)=x(\ell )=0. \end{aligned}\right. \end{aligned}$$

Following [54] (see also [26]), we can parametrize \(x^*_{\pm }\) with respect to their minimum pointwise distance from the constant solutions \(\pm 1\). The latter is in one-to-one correspondence with the bifurcation parameter \(\ell \) (see (80) below).

First, note that the scaling \(y(\xi )=x(\ell \xi )\) leads to the equivalent problem

$$\begin{aligned} \left\{ \begin{aligned}&y''(\xi )=\ell ^2\big (y^3(\xi )-y(\xi )\big ),\; \xi \in (0,1)\\ {}&y(0)=y(1)=0. \end{aligned}\right. \end{aligned}$$

For any \(a\in (0,1),\) the stable equilibria \(y^*_{\pm }\) of the latter are then given by \(\pm y^*,\)

$$\begin{aligned} y^*(\xi )\equiv y^*_a(\xi )=a\;sn\bigg ( 2 K\bigg (\frac{a^2}{2-a^2}\bigg )\xi , \frac{a^2}{2-a^2} \bigg ), \xi \in (0,1), \end{aligned}$$

where for any \(m\in (0,1),\)

$$\begin{aligned} K(m):=\int _{0}^{1}\frac{dx}{\sqrt{(1-x^2)(1-m x^2)}} \end{aligned}$$

is the complete elliptic integral of the first kind and \( sn(\cdot ,m)\) is the Jacobi elliptic sine function defined by

$$\begin{aligned} x=\int _{0}^{sn(x,m)}\frac{dy}{\sqrt{(1-y^2)(1-m y^2)}}\;, x\in [0, K(m)]. \end{aligned}$$

The function \(sn(\cdot ,m)\) can be periodically extended to all of \(\mathbb {R}\) so that K(m) is its quarter-period. We remark that there are several different parameterizations of K in the literature (e.g. in [54] \(K(\xi )\) corresponds to \(K(\sqrt{\xi })\) in our notation). The definition above was chosen in agreement with [1] and the corresponding built-in Matlab function.

The parameter a is the maximum value of \(y_a^*,\) i.e.

$$\begin{aligned} y^*_a\big (\inf \{\xi \in (0,1): y'(\xi )=0\}\big )=a. \end{aligned}$$

In order to convert to a parameterization in terms of \(\ell ,\) we first define a scaled quarter-period map \({\mathcal {M}}:(0,1)\rightarrow \mathbb {R}\) with

$$\begin{aligned} {\mathcal {M}}(a):= & {} \frac{1}{\sqrt{2}}\int _{0}^{a}\frac{dx}{\sqrt{V_f(x)-V_f(a)}}=\sqrt{2}\int _{0}^{1}\frac{dx}{\sqrt{1-x^2}\sqrt{2-a^2(1+x^2)}}\\= & {} \frac{\sqrt{2}}{\sqrt{2-a^2}}K\bigg (\frac{a^2}{2-a^2}\bigg ). \end{aligned}$$

The correspondence of the interval length \(\ell \) and a is then given by

$$\begin{aligned} \ell =2{\mathcal {M}}(a). \end{aligned}$$
(80)

As seen in Fig. 1, \({\mathcal {M}}\) is continuous, strictly increasing and \(\lim _{a\rightarrow 1}{\mathcal {M}}(a)=\infty \). Thus \({\mathcal {M}}\) is continuously invertible. Furthermore, it is straightforward to verify that \(\lim _{a\rightarrow 0}{\mathcal {M}}(a)=\pi /2\). Putting the previous facts together, we deduce that

$$\begin{aligned} \begin{aligned} x^*_{+}(\xi )=-x^*_{-}(\xi )&=y^*_{+}(\xi /\ell )\\ {}&= a\;sn\bigg ( 2 K\bigg (\frac{a^2}{2-a^2}\bigg )\frac{\xi }{\ell }, \frac{a^2}{2-a^2} \bigg )\\ {}&=a\;sn\bigg ( 2 K\bigg (\frac{a^2}{2-a^2}\bigg )\frac{\xi }{2{\mathcal {M}}(a)}, \frac{a^2}{2-a^2} \bigg )\\ {}&=a\; sn\bigg ( \xi \sqrt{1-\frac{a^2}{2}}, \frac{a^2}{2-a^2}\bigg )\;,\xi \in (0,\ell )\;\;,\; a={\mathcal {M}}^{-1}(\ell /2). \end{aligned} \end{aligned}$$
(81)
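As a sanity check of (81), recall that \(sn(0,m)=sn(2K(m),m)=0\) and \(sn(K(m),m)=1.\) The Python sketch below (an illustration; the value \(a=0.65\) is just an example, and K(m) is computed via the standard arithmetic-geometric mean identity, which is not used in the paper) verifies that the argument of \(sn\) in (81) equals exactly \(2K(m)\) at \(\xi =\ell ,\) so \(x^*_{\pm }\) vanish at both endpoints and attain the extrema \(\pm a\) at \(\xi =\ell /2\):

```python
import math

def K(m):
    """Complete elliptic integral of the first kind (parameter m convention),
    via the AGM identity K(m) = pi / (2*AGM(1, sqrt(1-m)))."""
    a, b = 1.0, math.sqrt(1.0 - m)
    while abs(a - b) > 1e-15:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return math.pi / (2 * a)

a = 0.65                                   # example value; a = M^{-1}(ell/2)
m = a * a / (2 - a * a)
ell = 2 * math.sqrt(2.0 / (2.0 - a * a)) * K(m)    # ell = 2*M(a), i.e. (80)

# the sn argument in (81) at xi = ell is exactly a half-period 2*K(m),
# hence x*(0) = x*(ell) = 0 and x*(ell/2) = a * sn(K(m), m) = a
assert abs(ell * math.sqrt(1 - a * a / 2) - 2 * K(m)) < 1e-12
```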
Fig. 1
figure 1

The map \({\mathcal {M}}\)
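The map \({\mathcal {M}}\) of Fig. 1 is easy to evaluate with elementary code. The sketch below (Python, for illustration) computes K(m) through the arithmetic-geometric mean identity \(K(m)=\pi /\big (2\,\mathrm {AGM}(1,\sqrt{1-m})\big )\), a standard identity not used in the paper itself, and checks the stated limits:

```python
import math

def K(m):
    """Complete elliptic integral of the first kind (parameter m convention),
    computed via the arithmetic-geometric mean."""
    a, b = 1.0, math.sqrt(1.0 - m)
    while abs(a - b) > 1e-15:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return math.pi / (2 * a)

def M(a):
    """Scaled quarter-period map M(a) = sqrt(2/(2-a^2)) * K(a^2/(2-a^2))."""
    return math.sqrt(2.0 / (2.0 - a * a)) * K(a * a / (2.0 - a * a))

assert abs(K(0.0) - math.pi / 2) < 1e-12          # K(0) = pi/2
assert abs(M(1e-8) - math.pi / 2) < 1e-6          # lim_{a -> 0} M(a) = pi/2
assert M(0.3) < M(0.65) < M(0.95)                 # M is strictly increasing
assert abs(2 * M(math.sqrt(2) / 2) - 4.0043) < 1e-3
```

The last assertion recovers the threshold \(2{\mathcal {M}}(\sqrt{2}/2)\approx 4.0043\) appearing in the spectral gap discussion below (81).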

Turning to the spectral properties of the linearized operators \(\Delta +DF(x^*_{\pm }),\) these have a countable sequence of eigenvalue–eigenvector pairs \(\{(a_n^f, e_n^f)\}_{n\in \mathbb {N}},\) hence they satisfy Hypothesis 3(b). The first two pairs have been computed explicitly in [54] and are given by

$$\begin{aligned} a_1^f= & {} \frac{3}{2}a^2=\frac{3}{2}\big ({\mathcal {M}}^{-1}(\ell /2)\big )^2\;, e_1^f(\xi )=e_{1,a}^f(\xi )\nonumber \\= & {} sn\bigg ( \xi \sqrt{1-\frac{a^2}{2}}, \frac{a^2}{2-a^2}\bigg )dn\bigg ( \xi \sqrt{1-\frac{a^2}{2}}, \frac{a^2}{2-a^2}\bigg ) \end{aligned}$$
(82)

and

$$\begin{aligned} a_2^f= & {} \frac{3}{2}(2-a^2)=\frac{3}{2}\Big (2-\big ({\mathcal {M}}^{-1}(\ell /2)\big )^2\Big )\;, e_2^f(\xi )= e^f_{2,a}(\xi )\\= & {} sn\bigg ( \xi \sqrt{1-\frac{a^2}{2}}, \frac{a^2}{2-a^2}\bigg )cn\bigg ( \xi \sqrt{1-\frac{a^2}{2}}, \frac{a^2}{2-a^2}\bigg ) \end{aligned}$$

where \(dn,\, cn\) denote the Jacobi delta amplitude and elliptic cosine functions

$$\begin{aligned} dn(x,m):=\sqrt{1-m\, sn^2(x,m)}\;,\;\;cn^2(x,m):=1-sn^2(x,m), cn(0,m)=1. \end{aligned}$$

The spectral gap of Hypothesis 3(c) is then satisfied if

$$\begin{aligned} \frac{3a_1^f}{a_2^f}=\frac{3a^2}{2-a^2}<1\iff a<\frac{\sqrt{2}}{2}\iff \ell < 2{\mathcal {M}}(\sqrt{2}/2) \end{aligned}$$

where we used the monotonicity of \({\mathcal {M}}\) and \(2{\mathcal {M}}(\sqrt{2}/2)\approx 4.0043.\) Plots of the equilibria \(y^*_{a}\) and eigenfunctions \(e^f_{1,a}\) for \(a=0.65,0.95\) are given in Figs. 2 and 3.

Fig. 2
figure 2

Instances of the Dirichlet stable equilibrium \(y_a^*\)

Fig. 3
figure 3

Instances of the Dirichlet eigenfunctions \(e^f_{1,a}\)
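The spectral gap condition derived above can also be checked numerically from the explicit formulas \(a_1^f=\frac{3}{2}a^2\) and \(a_2^f=\frac{3}{2}(2-a^2)\). A Python sketch (for illustration; K(m) again via the arithmetic-geometric mean, which the paper does not use):

```python
import math

def K(m):
    # complete elliptic integral of the first kind (parameter m), via the AGM
    a, b = 1.0, math.sqrt(1.0 - m)
    while abs(a - b) > 1e-15:
        a, b = (a + b) / 2, math.sqrt(a * b)
    return math.pi / (2 * a)

def M(a):
    return math.sqrt(2.0 / (2.0 - a * a)) * K(a * a / (2.0 - a * a))

def gap_ratio(a):
    """3*a_1^f / a_2^f = 3a^2/(2-a^2) for the Dirichlet linearization."""
    a1f = 1.5 * a * a
    a2f = 1.5 * (2 - a * a)
    return 3 * a1f / a2f

a_c = math.sqrt(2) / 2
assert abs(gap_ratio(a_c) - 1.0) < 1e-12        # exactly critical at a = sqrt(2)/2
assert gap_ratio(0.65) < 1.0 < gap_ratio(0.95)  # gap holds iff a < sqrt(2)/2
assert abs(2 * M(a_c) - 4.0043) < 1e-3          # i.e. iff ell < 2*M(sqrt(2)/2)
```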

5.3 A higher-order Ginzburg–Landau SRDE

We conclude this section with an example of an SRDE with a higher-order polynomial nonlinearity. This time we consider a potential given by

$$\begin{aligned} V_f(x)=\frac{4+\mu }{12}-\frac{1}{2}x^2-\frac{\mu }{4}x^4+\frac{\mu +1}{6}x^6\;,\;\;x\in \mathbb {R}. \end{aligned}$$

If \(\mu >-1\) then \(V_f\) is a double-well potential with steeper walls than the fourth-order case. Such potentials have been considered in the physical literature as higher order quantum mechanical models, see e.g. [39]. The nonlinear reaction term is given by \(f(x)=-V'_f(x)= x+\mu x^3-(\mu +1)x^5\) and \(f'(x) = 1+3\mu x^2-5(\mu +1)x^4\). For \(\mu \in (-1,0],\) Hypothesis 2(a) is satisfied with \(f_1(x)=x\) and \(f_2(x)=\mu x^3-(\mu +1)x^5\). The corresponding SRDE is given by

$$\begin{aligned} \partial _tX^\epsilon =\partial ^2_{\xi }X^{\epsilon }+ X^\epsilon +\mu \big (X^\epsilon \big )^3-(\mu +1)\big (X^\epsilon \big )^5+\sqrt{\epsilon }{\dot{W}}. \end{aligned}$$
(83)

The noiseless dynamics with Neumann or periodic boundary conditions are bistable for any \(\ell >0\) with stable equilibria \(x^*_{\pm }=\pm 1\) and a saddle point \(x_0^*=0\). The linearized operator

$$\begin{aligned} \Delta +DF(\pm 1)=\Delta +(1+3\mu x^2-5(\mu +1)x^4)|_{x=\pm 1}=\Delta -2\mu -4 \end{aligned}$$

has the same eigenfunctions as the Laplacian and eigenvalues shifted by \(-2\mu -4\).

As in the Allen–Cahn case, Theorems 3.1 and 3.2 hold for any value of \(\ell \) provided that the change of measure \(u_{k_0}\) acts on a finite-dimensional eigenspace of sufficiently high dimension. The pre-asymptotic analysis of Sect. 4 holds under the spectral gap of Hypothesis 3(c). In the Neumann case the spectral gap holds if

$$\begin{aligned} 3a_1^f=6\mu +12<2\mu +4+\frac{\pi ^2}{\ell ^2}=a_2^f\iff \ell <\frac{\pi }{2\sqrt{\mu +2}} \end{aligned}$$

and in the periodic case

$$\begin{aligned} 3a_1^f=6\mu +12<2\mu +4+\frac{4\pi ^2}{\ell ^2}=a_2^f\iff \ell <\frac{\pi }{\sqrt{\mu +2}}. \end{aligned}$$

6 Numerical simulations

In this section we demonstrate the theoretical results of this paper by a series of simulation studies for (8). As explained in Sect. 3, we start the process \(X_x^\epsilon \) at a stable equilibrium \(x=x^*\) and develop a scheme that computes exit probabilities of the form

$$\begin{aligned} P(\epsilon )= P(\epsilon ,T)=\mathbb {P}[ \tau _{x^*}^\epsilon \le T] \end{aligned}$$

for \(\epsilon \ll 1, T>0,\) where \( \tau _{x^*}^\epsilon =\inf \{ t>0: X_{x^*}^{\epsilon }\notin D\}\) and \( D=\mathring{B}_\mathcal {H}(x^*, L\sqrt{\epsilon }h(\epsilon )).\) For the simulations that follow we fix \(L=1\) and set \(R=R(\epsilon )=\sqrt{\epsilon }h(\epsilon )\). In view of Remark 10 we have

$$\begin{aligned} P(\epsilon )=\mathbb {P}\bigg [\sup _{t\in [0,T]}\big \Vert \eta _{x^*}^{\epsilon }(t)\big \Vert _{\mathcal {H}}\ge 1\bigg ] \end{aligned}$$

and the process \(\eta _{x^*}^{\epsilon }\) (3) converges in distribution to 0 as \(\epsilon \rightarrow 0\). Hence, for \(\epsilon \) small, we are dealing with rare events. We will apply the scheme of Sect. 4 to the examples of Sect. 5 and compare its performance to standard Monte Carlo, which corresponds to no change of measure at all. It is clear that, in order to simulate the mild solutions in (9), (42), we need to discretize the equation in time and space. In the simulations below we used the exponential Euler scheme on a finite-dimensional Galerkin projection, as described in [37]. In particular, with \(\hat{\eta }^{\epsilon }=({\hat{X}}^\epsilon -x^*)/\sqrt{\epsilon }h(\epsilon ),\) \(u^\epsilon \) as in (25) and \({\hat{X}}^\epsilon \) solving

$$\begin{aligned} \left\{ \begin{aligned}&d{\hat{X}}^\epsilon (t)=[A{\hat{X}}^{\epsilon }(t)+F({\hat{X}}^\epsilon (t))]dt+\sqrt{\epsilon }h(\epsilon )u^\epsilon (\hat{\eta }^{\epsilon }(t))dt+\sqrt{\epsilon }dW(t)\\ {}&{\hat{X}}^{\epsilon }(0)=x^* \end{aligned}\right. \end{aligned}$$
(84)

we simulate the mild solution \({\hat{X}}^\epsilon \) on the sampling window \(t\in [0,T]\) until it hits \(\partial D\). Its N-th Galerkin projection is given in mild formulation by

$$\begin{aligned} {\hat{X}}^\epsilon _N(t)= & {} e^{A_Nt}P_Nx^*+\int _{0}^{t}e^{A_N(t-s)}\big [P_NF({\hat{X}}^\epsilon _N(s))+\sqrt{\epsilon }h(\epsilon )P_Nu^\epsilon (\hat{\eta }_N^{\epsilon }(s))\big ]ds\\{} & {} +\sqrt{\epsilon }W_{A_N}(t) , \end{aligned}$$

where \(P_N\) denotes the projection onto the N-dimensional subspace of \(\mathcal {H}\) spanned by the eigenvectors \(e_1,\dots ,e_N\) of A (not to be confused with the linearization eigenvectors \(e^f_n\) of Hypothesis 3(b)). Turning to the time discretization, we consider a time step \(h=T/\Delta t,\) where \(\Delta t\in \mathbb {N}\) is the number of steps, discretization times \(t_k=kh\), \(k=0,\dots , \Delta t,\) and set \(\Theta _0^{N}:=P_Nx^*.\) The exponential Euler scheme is then given by

$$\begin{aligned} \begin{aligned}&\Theta _{k+1}^{N}=e^{A_Nh}\Theta _k^{N}+A_N^{-1}\big [ e^{A_Nh}-I\big ]\big [P_NF\big ( \Theta _{k}^{N}\big )+\sqrt{\epsilon }h(\epsilon )P_Nu^\epsilon \big ({\tilde{\Theta }}_{k}^{N}\big ) \big ]\\&\quad + \sqrt{\epsilon }\int _{t_{k}}^{t_{k+1}}e^{A_N(t_{k+1}-s) }P_NdW(s), \end{aligned} \end{aligned}$$

\(k=0,\dots ,\Delta t-1\) where \({\tilde{\Theta }}_{k}^{N}=(\Theta _{k}^{N}-\Theta _{0}^{N})/\sqrt{\epsilon }h(\epsilon ).\) Letting \(\Theta _{k,j}^N=\langle \Theta _{k}^N, e_j \rangle _\mathcal {H}, f_{k,j}^N= \langle P_NF\big ( \Theta _{k}^{N}\big ), e_j\rangle _\mathcal {H}, u_{k,j}^N=\langle P_Nu^\epsilon \big ( {\tilde{\Theta }}_{k}^{N}\big ), e_j\rangle _\mathcal {H}\) the numerical scheme for the approximation of (84) is then given by

$$\begin{aligned} \begin{aligned} \Theta _{k+1,j}^{N}=e^{-a_j h}\Theta _{k,j}^{N}+\frac{1-e^{-a_j h}}{a_j}\big ( f_{k,j}^N+\sqrt{\epsilon }h(\epsilon ) u_{k,j}^N \big )+ \sqrt{\epsilon }\sqrt{\frac{1-e^{-2a_j h}}{2a_j}}w_{k,j} \end{aligned}\nonumber \\ \end{aligned}$$
(85)

where, for \(k=0,\dots ,\Delta t-1\) and \(j=1,\dots , N,\) the \(w_{k,j}\) are independent standard normal random variables.

For Neumann and periodic boundary conditions, the pairs \((a_j, e_j)\) are given by (77), (78). Since the changes of measure \(u^\epsilon \) act only in the direction of \(e_1^f\) and the latter coincides with \(e_0\) (i.e. a constant function), we have that \(u_{k,j}^N=0\) when \(j\ne 0.\) However, the eigenvalue \(a_0\) is equal to 0 in both cases, hence the exponential Euler scheme is not well-defined for \(j=0\). For this reason we simulate \(\Theta _{k+1,0}^{N}\) via an explicit Euler scheme, i.e.

$$\begin{aligned} \Theta _{k+1,0}^N= \Theta _{k,0}^{N}+ h\big [f_{k,0}^N+\sqrt{\epsilon }h(\epsilon )u_{k,0}^N \big ]+\sqrt{\epsilon }\sqrt{h}w_{k,0}\;,\;\; k=0,\dots ,\Delta t-1, \end{aligned}$$

where \(\Theta _{0,0}^{N}:=\langle x^*, e_0\rangle _\mathcal {H}\), \(f_{k,0}^N= \langle P_NF\big ( \Theta _{k}^{N}\big ), e_0\rangle _\mathcal {H}, u_{k,0}^N=\langle P_Nu^\epsilon \big ( {\tilde{\Theta }}_{k}^{N}\big ), e_0\rangle _\mathcal {H}\) and \(w_{k,0}\) are once again independent standard normal random variables. The computation of the coefficients \(\{ f^{N}_{k,j}\}\) in the Neumann (respectively periodic) case can be efficiently performed by applying a forward-backward odd (resp. periodic or Hartley-type) Fast Fourier Transform (FFT) in an iterative fashion. For more details on the discrete Fourier transform and the FFT algorithm the reader is referred to [9] (Chapters 6 and 8 respectively).
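To make the scheme concrete, here is a minimal Python sketch of (85) for the Neumann stochastic Allen–Cahn equation. It is an illustration only, not the parallel MPI C code used for the reported simulations: the change of measure is switched off (\(u^\epsilon \equiv 0\), i.e. plain Monte Carlo), the Galerkin level is small, and the nonlinear coefficients are computed by direct quadrature instead of the FFT.

```python
import math, random

# Exponential Euler / Galerkin sketch of (85) for the Neumann stochastic
# Allen-Cahn equation on (0, ell), started at the stable equilibrium x* = +1.
# The importance-sampling drift u^eps is set to zero here; it would enter the
# bracketed reaction coefficients in exactly the same way as f below.
ell, N, Ngrid = 1.0, 16, 64
xi = [(i + 0.5) * ell / Ngrid for i in range(Ngrid)]      # midpoint grid
w = ell / Ngrid                                           # quadrature weight

def e(j, x):
    """Neumann eigenfunctions (78)."""
    return ell ** -0.5 if j == 0 else math.sqrt(2 / ell) * math.cos(j * math.pi * x / ell)

a = [(j * math.pi / ell) ** 2 for j in range(N)]          # eigenvalues (77)

def field(theta):
    """Evaluate the Galerkin field sum_j theta_j e_j on the grid."""
    return [sum(theta[j] * e(j, x) for j in range(N)) for x in xi]

def step(theta, eps, h, rng):
    """One step of size h (here h is the time step, not the scaling h(eps))."""
    X = field(theta)
    fX = [x - x ** 3 for x in X]                          # f(x) = x - x^3
    fj = [w * sum(fX[i] * e(j, xi[i]) for i in range(Ngrid)) for j in range(N)]
    new = [0.0] * N
    # j = 0: eigenvalue a_0 = 0, so use the explicit Euler step instead
    new[0] = theta[0] + h * fj[0] + math.sqrt(eps) * math.sqrt(h) * rng.gauss(0, 1)
    for j in range(1, N):                                 # exponential Euler (85)
        d = math.exp(-a[j] * h)
        noise = math.sqrt(eps) * math.sqrt((1 - d * d) / (2 * a[j])) * rng.gauss(0, 1)
        new[j] = d * theta[j] + (1 - d) / a[j] * fj[j] + noise
    return new

theta = [math.sqrt(ell)] + [0.0] * (N - 1)                # Theta_0 = P_N x*, x* = +1
rng = random.Random(0)
for _ in range(100):
    theta = step(theta, eps=1e-4, h=1e-3, rng=rng)
print(max(abs(x - 1) for x in field(theta)))              # deviation from x* = +1
```

With eps = 0 the scheme leaves the equilibrium exactly invariant, which is a convenient consistency check before adding noise or a control.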

Turning to the stochastic Allen–Cahn equation with Dirichlet boundary conditions, the simulations require an additional step. As discussed in Sect. 5, the stable equilibrium \(x^*\) (81) is no longer a constant function and the changes of measure \(u^\epsilon \) push towards \(e_1^f\) (82), which no longer coincides with a single eigenvector \(e^{Dir}_{k}\) (79). Thus, one needs to express \(x^*\) and \(e_1^f\) in terms of the eigenbasis \(\{e^{Dir}_{k}\}_{k\in \mathbb {N}}\) of the Laplacian and then perform the exponential Euler scheme (85). If the changes of measure acted on a higher-dimensional eigenspace, this step would essentially reduce to a change of basis, which can be computed with numerical linear-algebraic methods. Regarding the coefficients \(\{ f^{N}_{k,j}\},\) these can be computed by applying a forward-backward even Fast Fourier Transform iteratively.

Remark 16

In the examples of Sect. 5 the stable equilibrium \(x^*\) and the spectra of the linearized operators could be found explicitly. We remark here that our scheme does not depend on explicit formulas for eigenvalues and eigenvectors as long as those can be approximated numerically and the approximated eigenvalues satisfy Hypothesis 3(c).

All the simulations below were done using a parallel MPI C code with \(M=5\times 10^4\) Monte Carlo trajectories. The FFTs were performed with the aid of the C library FFTW. As is standard in the related literature (see e.g. [3], Chapter VI.1), the measure of performance is the relative error per sample, defined as

$$\begin{aligned} \text {relative error per sample} =\sqrt{M}\frac{\text {st. deviation}\big ({\hat{P}}(\epsilon )\big ) }{\text {expectation} \big [{\hat{P}}(\epsilon )\big ]}. \end{aligned}$$

The smaller the relative error per sample, the more efficient the algorithm and the more accurate the estimator. In practice, however, both the standard deviation and the expected value of an estimator are typically unknown, so the empirical relative error is used instead: the expected value is replaced by the empirical sample mean and the standard deviation by the empirical sample standard deviation. A dash in the simulation tables indicates that no trajectory exited D before time T. Before presenting the simulation tables, let us make a few comments on the parameter values and the end conclusions of the numerical studies.
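Since the estimator averages M i.i.d. per-trajectory outputs, the relative error per sample reduces to the ratio of the standard deviation of a single output to its mean. The following Python sketch computes the empirical version on synthetic Bernoulli data (an illustration, not the paper's simulations):

```python
import math, random

def relative_error_per_sample(samples):
    """sqrt(M)*std(estimator)/mean(estimator): since the estimator averages
    M i.i.d. outputs Y_i, this reduces to std(Y)/mean(Y), here estimated
    empirically from the sample."""
    M = len(samples)
    mean = sum(samples) / M
    var = sum((y - mean) ** 2 for y in samples) / (M - 1)
    return math.sqrt(var) / mean

# For plain Monte Carlo on an event of probability p the outputs are
# Bernoulli(p), so the relative error grows like sqrt((1-p)/p) as p -> 0.
rng = random.Random(0)
p, M = 0.01, 50000
samples = [1.0 if rng.random() < p else 0.0 for _ in range(M)]
print(sum(samples) / M, relative_error_per_sample(samples))
```

The blow-up like \(\sqrt{(1-p)/p}\) is exactly why the sMC columns in the tables below degrade as \(\epsilon \) decreases, while importance sampling keeps the ratio moderate.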

1) (Simulations for the Neumann stochastic Allen–Cahn) We estimate exit probabilities \(P(\epsilon )\) for the solution \(X^\epsilon \) of (76) driven by additive space-time white noise on the interval \((0,\ell )\) and Neumann boundary conditions. For the simulations we set \(\ell =1, x^*=x^*_{+}=1,\) \(h(\epsilon )=\epsilon ^{-0.1}\) and Galerkin projection level \(N=50\). The numerical results can be found in Tables 1–4.

Table 1 Estimated probability values \(P(\epsilon ,T)\) for the stochastic Allen–Cahn equation with Neumann boundary conditions using the developed importance sampling scheme with \(\kappa =0.9\) and mollification parameter \(\delta :=2/h^2(\epsilon )\)
Table 2 Estimated relative errors per sample for the stochastic Allen–Cahn equation with Neumann boundary conditions using the developed importance sampling scheme with \(\kappa =0.9\) and mollification parameter \(\delta :=2/h^2(\epsilon )\)
Table 3 Estimated probability values \(P(\epsilon ,T)\) for the stochastic Allen–Cahn equation with Neumann boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure
Table 4 Estimated relative errors per sample for the stochastic Allen–Cahn equation with Neumann boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure. A probability of \(2\times 10^{-5}\) means that only one out of the \(5\times 10^4\) trajectories exited the domain. The relative error in that case is 223.6
Table 5 Estimated probabilities for the stochastic Allen–Cahn equation with periodic boundary conditions using the developed importance sampling scheme with \(\kappa =0.9\) and mollification parameter \(\delta :=2/h^2(\epsilon )\)

2) (Simulations for the periodic stochastic Allen–Cahn) We estimate \(P(\epsilon )\) for the solution \(X^\epsilon \) of (76) driven by additive space-time white noise on the interval \((0,\ell )\) and periodic boundary conditions. For the simulations we set \(x^*_{+}=1, \ell =1, R=R(\epsilon ):=\sqrt{\epsilon }h(\epsilon )\in (0,1), \;h(\epsilon )=\epsilon ^{-0.1}, \eta ^{\epsilon }=(X^{\epsilon }-x^*_{+})/R(\epsilon )\) and Galerkin projection level \(N=50\). The numerical results can be found in Tables 5–8.

3) (Simulations for the Dirichlet stochastic Allen–Cahn) We estimate \(P(\epsilon )\) for the solution \(X^\epsilon \) of (76) driven by additive space-time white noise on the interval \((0,\ell )\) with Dirichlet boundary conditions. For the simulations we set \(\ell =3.81828\), \(x^*=x^*_{+}=a\,\mathrm{sn}\bigg ( \xi \sqrt{1-\frac{a^2}{2}}, \frac{a^2}{2-a^2}\bigg )\) with \(a={\mathcal {M}}^{-1}(\ell /2)=0.65,\) \(h(\epsilon )=\epsilon ^{-0.1}\) and Galerkin projection level \(N=50\). Note that \(\Vert x^*_+\Vert _{L^2}\approx 0.33\). The numerical results can be found in Tables 9–12.
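The non-constant stable state above can be evaluated directly with the Jacobi elliptic function \(\mathrm{sn}\); the following short check (ours, not from the paper's code) uses the rounded values \(a=0.65\), \(\ell =3.81828\) and SciPy's parameter-\(m\) convention for the modulus argument \(m=a^2/(2-a^2)\).

```python
import numpy as np
from scipy.special import ellipj

# Evaluate the Dirichlet stationary state x*_+(xi) = a sn(xi sqrt(1 - a^2/2), m),
# m = a^2/(2 - a^2), with the rounded values a = 0.65 and ell = 3.81828 above.
a, ell = 0.65, 3.81828
m = a ** 2 / (2.0 - a ** 2)
xi = np.linspace(0.0, ell, 2001)
sn, _, _, _ = ellipj(xi * np.sqrt(1.0 - a ** 2 / 2.0), m)
x_star = a * sn  # vanishes (up to the rounding of a) at xi = 0, ell; peaks near a
```

One can also verify by second differences that `x_star` satisfies the stationary equation \(x''+x-x^3=0\), which confirms the elliptic-function formula term by term.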

4) (Simulations for the quintic SRDE (83)) We estimate \(P(\epsilon )\) for the solution \(X^\epsilon \) of (83) driven by additive space-time white noise on the interval \((0,\ell )\) with Neumann boundary conditions. For the simulations we set \(\mu =-0.5,\) \(x^*_{+}=1,\) \(\ell =1,\) \(R=R(\epsilon ):=\sqrt{\epsilon }h(\epsilon )\in (0,1),\) \(h(\epsilon )=\epsilon ^{-0.1},\) \(\eta ^{\epsilon }=(X^{\epsilon }-x^*_{+})/R(\epsilon )\) and Galerkin projection level \(N=50\). The numerical results can be found in Tables 13–16.

Table 6 Estimated relative errors per sample for the stochastic Allen–Cahn equation with periodic boundary conditions using the developed importance sampling scheme with \(\kappa =0.9\) and mollification parameter \(\delta :=2/h^2(\epsilon )\)
Table 7 Estimated probabilities for the stochastic Allen–Cahn equation with periodic boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure
Table 8 Estimated relative error per sample for the stochastic Allen–Cahn equation with periodic boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure

5) Standard Monte Carlo (sMC) estimation, i.e. estimation with no change of measure, does not perform well for small values of \(\epsilon ,\) as indicated in Tables 4, 8 and 12. A dash indicates that no trajectory in the simulation exited the domain, so no estimate could be provided. The relative errors per sample grow so large that most of the probability values reported in Tables 3, 7 and 11 carry no statistical value.
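The blow-up of the sMC relative error can be made explicit. For a Bernoulli indicator with success probability \(p\), the relative error per sample is the per-trajectory coefficient of variation \(\sqrt{p(1-p)}/p=\sqrt{(1-p)/p}\), which diverges as \(p\rightarrow 0\); the following sketch (ours) reproduces the value 223.6 quoted in the caption of Table 4.

```python
import math

# Relative error per sample of standard Monte Carlo for a rare event of
# probability p: the coefficient of variation of one Bernoulli sample,
# sqrt(p (1 - p)) / p = sqrt((1 - p) / p), which blows up as p -> 0.
def rel_error_per_sample(p: float) -> float:
    return math.sqrt((1.0 - p) / p)

# The figure from the caption of Table 4: one success among 5e4 trajectories
# estimates p = 2e-5, with relative error per sample ~ 223.6.
print(round(rel_error_per_sample(2e-5), 1))  # -> 223.6
```

This is why sMC requires an increasingly large sample size as \(\epsilon \) decreases, whereas a well-designed change of measure keeps the per-sample relative error bounded.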

Table 9 Estimated probability values \(P(\epsilon ,T)\) for the stochastic Allen–Cahn equation with Dirichlet boundary conditions using the developed importance sampling scheme with \(\kappa =0.9\) and mollification parameter \(\delta :=2/h^2(\epsilon )\)
Table 10 Estimated relative errors per sample for the stochastic Allen–Cahn equation with Dirichlet boundary conditions using the developed importance sampling scheme with \(\kappa =0.9\) and mollification parameter \(\delta :=2/h^2(\epsilon )\)
Table 11 Estimated probability values \(P(\epsilon ,T)\) for the stochastic Allen–Cahn equation with Dirichlet boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure
Table 12 Estimated relative errors per sample for the stochastic Allen–Cahn equation with Dirichlet boundary conditions. The values reported are based on standard Monte Carlo simulation without employing any change of measure

6) The importance sampling scheme for the Allen–Cahn equation outperforms sMC and performs well for all boundary conditions and for probabilities ranging from \(10^{-1}\) to \(10^{-11}\) (see Tables 1, 5 and 9). In particular, its relative errors are substantially lower than those of sMC. As expected, the probabilities estimated by sMC and by the importance sampling scheme agree whenever the relative errors are below 10.0. The relative errors per sample for the importance sampling scheme lie mostly below 1.2, and their trends indicate that accuracy improves as the sampling time grows from \(T=1\) to \(T=8.\) The relative errors per sample reported in Tables 2, 6 and 10 thus support the theoretical finding that the scheme performs optimally.

7) The performance of the importance sampling scheme for the quintic SRDE (83) experiences a slight degradation after \(T=3\) (see Table 14). Nevertheless, it remains superior to that of the sMC (compare to Table 16) while the relative errors remain mostly below 2.5 and decrease with \(\epsilon \).

8) Table 17 provides a comparison of the importance sampling relative errors for the Neumann Allen–Cahn with \(h(\epsilon )=\epsilon ^{-0.1}\) and \(h(\epsilon )=\epsilon ^{-0.2}.\) The sampling time is fixed to \(T=2.\) We observe that the scaling \(h(\epsilon )=\epsilon ^{-0.1}\) leads to significantly lower relative errors than \(h(\epsilon )=\epsilon ^{-0.2}\). This behavior is correctly predicted by (70), since the former satisfies \(\sqrt{\epsilon }h^3(\epsilon )\rightarrow 0\) while the latter does not. We remark that, despite the higher relative errors, the importance sampling scheme with \(h(\epsilon )=\epsilon ^{-0.2}\) still outperforms sMC. Complete simulation tables for \(h(\epsilon )=\epsilon ^{-0.2}\) are available upon request.
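The condition from (70) discussed above can be read off numerically: for \(h(\epsilon )=\epsilon ^{-0.1}\) the product \(\sqrt{\epsilon }\,h^3(\epsilon )\) equals \(\epsilon ^{0.2}\) and vanishes, while for \(h(\epsilon )=\epsilon ^{-0.2}\) it equals \(\epsilon ^{-0.1}\) and diverges. A one-line check (the sample \(\epsilon \) values are ours):

```python
# Evaluate sqrt(eps) * h(eps)^3 for the two moderate-deviation scalings:
# h = eps^{-0.1} gives eps^{0.2} (vanishes), h = eps^{-0.2} gives eps^{-0.1} (diverges).
eps_vals = (1e-2, 1e-4, 1e-6)
good = [e ** 0.5 * (e ** -0.1) ** 3 for e in eps_vals]  # eps^{0.2}: decreasing to 0
bad = [e ** 0.5 * (e ** -0.2) ** 3 for e in eps_vals]   # eps^{-0.1}: increasing
```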

Table 13 Estimated probabilities for the quintic SRDE (83) with Neumann boundary conditions and \(\mu =-0.5\) using the developed importance sampling scheme with \(\kappa =0.999\) and mollification parameter \(\delta :=2/h^2(\epsilon ).\) The other parameters are \(h(\epsilon )=\epsilon ^{-0.1}, \ell =1, x^*=1\)
Table 14 Estimated relative errors per sample for the quintic SRDE (83) with Neumann boundary conditions and \(\mu =-0.5\) using the developed importance sampling scheme with \(\kappa =0.999\) and mollification parameter \(\delta :=2/h^2(\epsilon ).\) The rest of the parameters are \(h(\epsilon )=\epsilon ^{-0.1}, \ell =1, x^*=1\)
Table 15 Estimated probability values for the quintic SRDE (83) with Neumann boundary conditions and \(\mu =-0.5\). The values reported are based on standard Monte Carlo simulation without employing any change of measure
Table 16 Estimated relative errors per sample for the quintic SRDE (83) with Neumann boundary conditions and \(\mu =-0.5\). The values reported are based on standard Monte Carlo simulation without employing any change of measure
Table 17 Comparison of relative errors and probabilities produced by the importance sampling scheme for the Neumann Allen–Cahn equation with different moderate deviation scalings \(h(\epsilon )\). The rest of the parameters are \(x^*=1,\ell =1,\kappa =0.9, T=2\)
Table 18 Comparison of the relative errors produced by the importance sampling scheme for the Neumann Allen–Cahn equation with different Galerkin projection levels N. The rest of the parameters are \(x^*=1,\ell =1,\kappa =0.9,h(\epsilon )=\epsilon ^{-0.1}, T=3\)

9) In Table 18, we work with the Neumann Allen–Cahn and compare relative errors for different Galerkin approximation levels \(N = 50,100,150\). The sampling time is fixed to \(T=3\). We observe that the relative errors are practically of the same order, which indicates that the first mode indeed dominates the rare event. Moreover, the total simulation time increases significantly with N. These considerations lead us to conclude that \(N =50\) is an efficient and sufficiently accurate lower-dimensional approximation of the corresponding SPDE.

6.1 Numerical results for stochastic Allen–Cahn with Neumann boundary conditions

In this section, we provide numerical simulation results validating our theory for the stochastic Allen–Cahn equation with Neumann boundary conditions studied in Subsection 5.1.

6.2 Numerical results for stochastic Allen–Cahn with periodic boundary conditions

In this section, we provide numerical simulation results validating our theory for the stochastic Allen–Cahn equation with periodic boundary conditions studied in Subsection 5.1.

6.3 Numerical results for stochastic Allen–Cahn with Dirichlet boundary conditions

In this section, we provide numerical simulation results validating our theory for the stochastic Allen–Cahn equation with Dirichlet boundary conditions studied in Subsection 5.2.

6.4 Numerical results for the quintic SRDE (83) with Neumann boundary conditions

In this section, we provide numerical simulation results validating our theory for the quintic SRDE (83) with Neumann boundary conditions studied in Subsection 5.3.

6.5 Numerical comparisons of relative errors and probabilities for different parameter values

In this section, we provide numerical simulation results validating our theory for the stochastic Allen–Cahn equation with Neumann boundary conditions studied in Subsection 5.1. In particular, we now explore the effect of different moderate deviation scalings \(h(\epsilon )\) and of different Galerkin projection levels N.

7 Conclusions and future work

In this paper we studied the problem of rare event simulation for small-noise SRDEs via moderate deviation-based importance sampling. Taking advantage of the linearized limiting dynamics of the process \(\hat{\eta }^{\epsilon ,v}\) (42), we constructed changes of measure that behave optimally in the limit as \(\epsilon \rightarrow 0\) under the fairly general spectral gap condition of Hypothesis 3c’. Working under the more restrictive Hypothesis 3(c) we designed an importance sampling scheme with changes of measure that act on a one-dimensional eigenspace of the operator \(A+DF(x^*).\) We were then able to show that this scheme performs well pre-asymptotically and supplemented the theoretical results with numerical simulations for gradient-type SRDEs corresponding to a double-well potential. Such systems have wide applicability and provided good examples to illustrate our theory. Nevertheless, there are other types of nonlinearities which satisfy our assumptions, e.g. \(f=-V'_f,\) where the potential \(V_f(x)=\sin x\) has more than two global minima.

The design and pre-asymptotic analysis of a scheme under the weaker spectral gap condition of Hypothesis 3c' provides an interesting direction for future work. This would allow for the simulation of rare events for SRDEs under bifurcation (e.g. when \(\ell >\pi \) in the Neumann Allen–Cahn case). The asymptotic optimality of such a scheme is guaranteed by Theorem 3.2. Even though the presence of non-constant saddle points with one unstable direction facilitates exits from D (13), the pre-asymptotic analysis of Sect. 4 is expected to be more complicated in this setting. This is due to the fact that the changes of measure \(u_{k_0}\) (25) act on \(k_0\)-dimensional subspaces of \(\mathcal {H}.\) One then has to show that the linearization error is negligible by considering the behavior of the system on carefully chosen partitions of a \(k_0\)-dimensional section of D.

Throughout this work we have considered SRDEs in one spatial dimension. In higher dimensions, equations like (76) are singular and a priori ill-posed. Thus, one has to consider SRDEs with a spatially colored stochastic forcing or employ renormalization techniques. Metastability results for the renormalized two-dimensional Allen–Cahn can be found e.g. in [5, 52] and references therein. Importance sampling for linear equations (i.e. \(f=0\)) with colored noise has been considered in [46]. In the latter, the spatial covariance operator Q is assumed to be trace-class and diagonalizable with respect to the eigenbasis of the differential operator A. Carrying the analysis of this paper over to higher spatial dimensions is challenging since A and the linearized operator \(A+DF(x^*)\) do not necessarily have the same eigenbasis (e.g. in the case of the Allen–Cahn with Dirichlet boundary conditions in spatial dimension 2). In particular, the analysis of the exit direction in Sect. 4 would have to be generalized and take into account the non-commutativity of Q and \(A+DF(x^*)\).

Finally, we expect that the results of this paper can be used to design importance sampling schemes for simulating rare events in slow-fast systems of SRDEs. Similar work for multiscale diffusions in finite dimensions has been done in [50] and an MDP for multiscale SRDEs was recently proved in [30].