1 Introduction

The estimation of extreme event probabilities in complex stochastic systems is an important problem in applied sciences and engineering; it becomes difficult as soon as these events are too rare to be easily observable, yet too impactful to be ignored. Examples of such events studied in the recent literature include rogue waves (Dematteis et al. 2018) and wave impacts on an offshore platform (Mohamad and Sapsis 2018), heat waves and cold spells (Ragone et al. 2018; Gálfi et al. 2019), intermittent fluctuations in turbulent flows (Fuchs et al. 2022) and derivative pricing fluctuations in mathematical finance (Friz et al. 2015). A broad perspective on extreme event prediction can be found in Farazmand and Sapsis (2019). Methods to estimate extreme event probabilities typically rely on Monte Carlo simulations, including importance sampling (Bucklew 2013), subset simulation (Au and Beck 2001) or multilevel splitting methods (Budhiraja and Dupuis 2019).

A possible theoretical framework to assess extreme event probabilities, which we will follow in this work, is given by large deviation theory (LDT) (Varadhan 1984; Dembo and Zeitouni 1998). This approach makes it possible to estimate the dominant, exponential scaling of the probabilities in question through the solution of a deterministic optimization problem, namely finding the most relevant realization of the stochastic process for a given outcome. This realization is sometimes called the instanton, a term borrowed from theoretical physics. For stochastic processes described by stochastic differential equations (SDEs), the relevant theory has been formulated by Freidlin and Wentzell (2012), and can be extended to many stochastic partial differential equations (SPDEs). The computational potential of this formulation has been reviewed by Grafke and Vanden-Eijnden (2019).

In addition to the exponential scaling provided by LDT, it is often desirable to obtain asymptotically sharp, i.e. asymptotically exact probability estimates. This requires the evaluation of a pre-exponential factor in addition to the usual leading-order large deviation result, when interpreting LDT as a Laplace approximation. On the theoretical side, there exist multiple results for such precise Laplace asymptotics for general SDEs (Ellis and Rosen 1982; Azencott 1982; Ben Arous 1988; Piterbarg and Fatalov 1995; Deuschel et al. 2014) and certain SPDEs requiring renormalization (Berglund et al. 2017; Friz and Klose 2022), which, however, typically do not include an actual evaluation of the abstract objects in terms of which they are formulated. We concentrate on the case of SDEs or well-posed SPDEs with additive noise here, where computing the leading-order prefactor amounts to evaluating a Fredholm determinant of an integral operator.

Approach. In this paper, we present a sharp and computable estimate for tail probabilities \(\mathbb {P}\left[ f\left( X_T\right) \ge z \right] \), i.e. the probability that a real-valued function f of a diffusion process \((X_t)_{t \in [0,T]}\) with state space \(\mathbb {R}^n\) and

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{d}X_t = b(X_t) \textrm{d}t + \sigma \textrm{d}B_t\,,\\ X_0 = x \in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$
(1)

exceeds a given threshold z at the final time T (see Fig. 1 for an example of this setup). We demonstrate that

$$\begin{aligned} \mathbb {P}\left[ f\left( X_T\right) \ge z \right] \approx (2 \pi )^{-1/2} C(z) \exp \left\{ -I(z) \right\} \,, \end{aligned}$$
(2)

in a way to be made precise later on, with real-valued functions C, called the (leading-order) prefactor, and I, called the rate function. The latter is determined through the solution of a constrained optimization problem:

$$\begin{aligned} I(z) = \min _{{\begin{array}{c}\eta \in L^2([0,T],\mathbb {R}^n)\\ \text {s.t. }f\left( X_T[\eta ] \right) = z \end{array}}} \; \frac{1}{2} \left\Vert \eta \right\Vert _{L^2}^2, \end{aligned}$$
(3)

where formally \(\eta = \textrm{d}B_t / \textrm{d}t\) is the time derivative of the Brownian motion \((B_t)_{t \in [0,T]}\), and \(X_T\) depends on \(\eta \) through (1). The prefactor C is then expressed as a Fredholm determinant of a linear operator which contains the solution of the minimization problem (3), the instanton \(\eta _z\), as a background field and acts on paths \(\delta \eta :[0,T] \rightarrow \mathbb {R}^n\). We show how to evaluate this operator determinant numerically for general SDEs and SPDEs, and demonstrate through multiple examples that it is possible to do so even for very high-dimensional systems with \(n \gg 1\) arising, for instance, after spatial discretization of an SPDE. Our approach is based on computing the dominant eigenvalues of the trace-class integral operator entering the Fredholm determinant.

Fig. 1

Visualization of the extreme event set (red), a sample path that, from a given initial condition, ends in the extreme event set at final time T (orange), and two typical sample paths that do not end in the event set (blue and green). The gray lines are field lines of the drift vector field b; paths that end in the event set are rare because the noise must act against the flow of this deterministic vector field to push the system into the extreme event set. Details of this example problem, which is used as an illustration throughout the paper, are given in Sect. 1.2, and more rare event paths as well as the instanton are shown in Fig. 2. The implementation of this example is available from a public GitHub repository (Schorlepp et al. 2023)

Related literature. In the physics literature, the leading-order prefactor computation corresponds to the evaluation of Gaussian path integrals, which is a classical topic in quantum and statistical field theory (Zinn-Justin 2021). There are multiple references dealing with the evaluation of such integrals for the class of differential operators that is relevant for SDEs, such as Papadopoulos (1975), Nickelsen and Engel (2011), Corazza and Fadel (2020). In accordance with these approaches, numerical leading-order prefactor computation methods for general SDEs and SPDEs via the solution of Riccati matrix differential equations have been established in recent years (Schorlepp et al. 2021; Ferré and Grafke 2021; Grafke et al. 2021; Bouchet and Reygner 2022; Schorlepp et al. 2023). An early example using a similar method is given by Maier and Stein (1996). All of these papers have in common that the leading-order prefactor can be evaluated in closed form by solving a single matrix-valued initial or final value problem, thereby bypassing the need to compute large operator determinants directly. We briefly introduce this method in this paper, relate it to the—in some sense complementary—Fredholm determinant prefactor evaluation based on dominant eigenvalues, and discuss possible advantages and disadvantages. We note that for SDEs with low-dimensional state space, it can also be feasible to compute the differential operator determinants that are otherwise evaluated through the Riccati matrices directly, by discretizing the operator into a large matrix and numerically calculating its determinant, as has been carried out e.g. by Psaros and Kougioumtzoglou (2020) and Zhao et al. (2022).

Another perspective on the precise Laplace approximation used in this paper is provided by the so-called second-order reliability method (SORM), which is used in the engineering literature to estimate failure probabilities, as reviewed e.g. by Rackwitz (2001) and Breitung (2006). For example, the asymptotic form of the extreme event probabilities in this paper corresponds to the standard form stated by Breitung (1984). In this sense, the method proposed in this paper can be regarded as a path space SORM, carried over to infinite dimensions for the case of additive noise SDEs. The connection of precise LDT estimates to SORM for finite-dimensional parameter spaces has also been pointed out by Tong et al. (2021).

In studies of rare and extreme event estimation, Monte Carlo simulations are commonly used, and various sampling schemes have been designed, some of which have been modified and adapted to systems involving SDEs. These include importance sampling estimators that are associated e.g. with the solution of deterministic optimal control problems along random trajectories (Vanden-Eijnden and Weare 2012) or with the instanton in LDT (Ebener et al. 2019), or that build on stochastic Koopman operator eigenfunctions (Zhang et al. 2022). The method we propose takes a different perspective from these sampling methods—it does not involve sampling, and is only asymptotically exact.

Contributions and limitations. The main contributions of this paper are as follows: (i) Generalizing SORM to infinite dimensions, we introduce a sampling-free method to approximate extreme event probabilities for SDEs (and SPDEs) with additive noise. The method is based on the Laplace approximation in path space and uses second-order information to compute the probability prefactor. (ii) While such precise Laplace asymptotics for SDEs are known on a theoretical level, we show how to evaluate them numerically in a manner that is straightforward to implement and is scalable, i.e. it does not degrade with increasing discretization dimension. We illustrate the method on a high-dimensional nonlinear example, namely estimating the probability of high strain rate events in a three-dimensional stochastic Navier–Stokes flow. (iii) On the theoretical level, we explore the relationship between the proposed eigenvalue-based approach for calculating the prefactor and Riccati methods from stochastic analysis and stochastic field theory. We examine the advantages of each method and provide an interpretation of the involved Gaussian process using transition tubes towards the extreme event, i.e. the expected magnitude and direction of fluctuations on the way to an extreme outcome.

The approach taken in this paper also has some limitations: (i) While we find the probability estimates including the leading-order prefactor to be quite accurate when compared to direct Monte Carlo simulations whenever such simulations are feasible, these estimates are approximations and only asymptotically exact in the limit as \(z\rightarrow \infty \). To obtain unbiased estimates, one can e.g. use importance sampling. The instanton and the second variation eigenvalues and eigenvectors can be used as input for such extreme event importance sampling algorithms (Ebener et al. 2019; Tong et al. 2021; Tong and Stadler 2022). (ii) We limit ourselves to SDEs with additive Gaussian noise. For SDEs with multiplicative noise (or singular SPDEs), the leading-order prefactor is more complicated, as the direct analogy to the finite-dimensional case is lost (Ben Arous 1988). Nevertheless, extensions of the eigenvalue-based prefactor computation proposed here can likely be made, but are beyond the scope of this paper. (iii) The proposed approach assumes that the differential equation-constrained optimization problem (3) has a unique solution that can be computed. For non-convex constraints, uniqueness may be difficult to prove or may not hold. However, in the examples we consider, we seem to be able to identify the global minimizer reliably by using several different initializations in the minimization algorithm and, if we find different minimizers, by choosing the one corresponding to the smallest objective value. The proposed approach can also be generalized to multiple isolated minimizers and to continuous families of minimizers (Ellis and Rosen 1981; Schorlepp et al. 2023).

Notation. We use the following notations throughout the paper: The state space dimension is always written as n, a possible time discretization dimension of the interval [0, T] as \(n_t\), and N is exclusively used in section 1.1 for the motivation of our results via random variables in \(\mathbb {R}^N\). We denote the Euclidean norm and inner product in \(\mathbb {R}^N\) by \(\left\Vert \cdot \right\Vert _N\) and \(\langle \cdot , \cdot \rangle _N\), respectively, and the \(L^2\) norm and scalar product for \(\mathbb {R}^n\)-valued functions defined on [0, T] by \(\left\Vert \cdot \right\Vert _{L^2([0,T], \mathbb {R}^n)}\) and \(\langle \cdot , \cdot \rangle _{L^2([0,T], \mathbb {R}^n)}\), respectively. The outer product is denoted by \(\otimes \), with \(v \otimes w = v w^\top \) and \(v^{\otimes 2} {:}{=}v \otimes v\) for \(v,w \in \mathbb {R}^N\) and \((f \otimes g)(t,t') = f(t) g(t')^\top \) for \(f,g \in L^2([0,T], \mathbb {R}^n)\) and \(t,t' \in [0,T]\). Convolutions are written as \(*\). The subscript or argument \(z \in \mathbb {R}\) always represents the dependency on the observable value e.g. of the minimizer \(\eta _z\), Lagrange multiplier \(\lambda _z\) and projected second variation operator \(A_z\), as well as the observable rate function \(I_F(z)\) and prefactor \(C_F(z)\). The identity map is in general denoted by \({{\,\textrm{Id}\,}}\), and the identity matrix and zero matrix in \(\mathbb {R}^N\) are written as \(1_{N \times N}\) and \(0_{N \times N}\). The superscript \(\perp \) always denotes the orthogonal complement, with \(v^\perp {:}{=}(\text {span}(\{v\}))^\perp \). Functional derivatives with respect to \(\eta \in L^2([0,T], \mathbb {R}^n)\) are denoted by \(\delta / \delta \eta \). Determinants in \(\mathbb {R}^N\), as well as Fredholm determinants, are written as \(\det \), whereas regularized differential operator determinants are written as \({{\,\textrm{Det}\,}}\) with the boundary conditions of the operator as a subscript. For two real functions g and h, we write

$$\begin{aligned} g(\varepsilon ) \overset{\varepsilon \downarrow 0}{ \sim }\ h(\varepsilon ) \quad \iff \quad \lim _{\varepsilon \downarrow 0} \; \frac{g(\varepsilon )}{h(\varepsilon )} = 1\,, \end{aligned}$$
(4)

to indicate that the functions g and h are asymptotically equivalent as \(\varepsilon \downarrow 0\). By an abuse of terminology, we use the term “instanton” in this paper to refer to the large deviation minimizer \(\eta _z\) for finite-dimensional parameter spaces, as well as to both the instanton noise trajectory \(\left( \eta _z(t)\right) _{t \in [0,T]}\) and the instanton state variable trajectory \(\left( \phi _z(t)\right) _{t \in [0,T]}\) in the infinite-dimensional setup.

We start with a more precise explanation of the concepts described in this introduction in Sects. 1.1 and 1.2, before summarizing the structure of the rest of the paper at the end of Sect. 1.2.

1.1 Laplace method for normal random variables in \(\mathbb {R}^N\)

We start with the finite-dimensional setting, following Dematteis et al. (2019) and Tong et al. (2021): We consider a collection of N random parameters \(\eta \in \mathbb {R}^N\) that are standard normally distributed, and are interested in a physical observable, given by a function \(F:\mathbb {R}^N\rightarrow \mathbb {R}\), that describes the outcome of an experiment under these random parameters. Note that restricting ourselves to independent standard normal variables is not a major limitation, as F may include a map that transforms standard normal variables to another distribution. To give an example that fits into this setting, \(\eta \) could be all parameters entering a weather prediction model, and F then constitutes the mapping of the parameters to some final prediction, such as the temperature at a given location in the future. Note that the map F may be complicated and expensive to evaluate, e.g. requiring the solution of a PDE.

We are interested in the probability that the outcome of the experiment exceeds some threshold z, i.e. \(P(z) = \mathbb {P}[F(\eta ) \ge z]\). Since here z is assumed large compared to typically expected values of \(F(\eta )\), we call P(z) the extreme event probability. To be able to control the rareness of the event, we introduce a formal scaling parameter \(\varepsilon >0\) and consider \(\varepsilon \ll 1\) to make the event extreme by defining \(P_F^\varepsilon (z) = \mathbb {P}[F(\sqrt{\varepsilon }\eta ) \ge z]\). This allows us to treat terms of different orders in \(\varepsilon \) perturbatively in the rareness of the event and is more amenable to analysis than rareness due to \(z\rightarrow \infty \). In the following, we will thus consider z as a fixed constant, while discussing the limit \(\varepsilon \rightarrow 0\). Since \(\eta \) is normally distributed, the extreme event probability is available as an integral,

$$\begin{aligned} P_F^\varepsilon (z) = (2\pi \varepsilon )^{-N/2}\!\!\int _{\mathbb {R}^N} \mathbbm {1}_{ \{F(\eta )\ge z\}}(\eta ) \exp \left\{ -\frac{1}{2\varepsilon } \left\Vert \eta \right\Vert _N^2\right\} \textrm{d}^N\eta ,\nonumber \\ \end{aligned}$$
(5)

by integrating over all possible \(\eta \) that lead to an exceedance of the observable threshold (as identified by the indicator function \(\mathbbm {1}\)), weighted by their respective probabilities given by the Gaussian density. Directly evaluating the integral in (5) is typically infeasible for complicated sets \(\{\eta \in \mathbb {R}^N \mid F(\eta )\ge z\}\) and large N.

The central notion of this paper is the fact that in the limit \(\varepsilon \downarrow 0\), the integral in (5) can be approximated via the Laplace method, which replaces the integrand with its extremal value, times higher order multiplicative corrections. The corrections at leading order in \(\varepsilon \) amount to a Gaussian integral that can be solved exactly. In effect, the integral (5) is approximated by the probability of the most likely event that exceeds the threshold, multiplied by a factor that takes into account the event’s neighborhood.

To make things concrete, we make the following assumptions on \(F \in C^2(\mathbb {R}^N, \mathbb {R})\) for given \(z > F(0)\):

  1.

    There is a unique \(\eta _z \in \mathbb {R}^N \backslash \{0\}\), called the instanton, that minimizes the function \(\tfrac{1}{2} \left\Vert \cdot \right\Vert ^2_N\) in \(F^{-1}([z,\infty ))\). Necessarily, \(\eta _z \in F^{-1}(\{z\})\) lies on the boundary, \(F(\eta _z) = z\), and there exists a Lagrange multiplier \(\lambda _z\ge 0\) with \(\eta _z = \lambda _z \nabla F(\eta _z)\) as a first-order necessary condition. We define the large deviation rate function of the family of real-valued random variables \(\left( F(\sqrt{\varepsilon } \eta ) \right) _{\varepsilon > 0}\) at z via

    $$\begin{aligned} I_F :\mathbb {R}\rightarrow \mathbb {R}\,, \quad I_F(z) := \tfrac{1}{2} \left\Vert \eta _z\right\Vert ^2_N\,. \end{aligned}$$
    (6)
  2.

    \(1_{N \times N} - \lambda _z \nabla ^2 F(\eta _z)\) is positive definite on the \((N-1)\)-dimensional subspace \(\eta _z^\perp \subset \mathbb {R}^N\) orthogonal to the instanton, i.e. we assume a second-order sufficient condition for \(\eta _z\) holds.

Then, there is a sharp estimate, in the sense of (4), for the extreme event probability (5) via

$$\begin{aligned} P_F^\varepsilon (z) \overset{\varepsilon \downarrow 0}{\sim }\varepsilon ^{1/2} (2 \pi )^{-1/2} \, C_F(z) \,\exp \left\{ -\frac{1}{\varepsilon }I_F(z)\right\} , \end{aligned}$$
(7)

where the rate function \(I_F\) determines the exponential scaling, and \(C_F(z)\) is the z-dependent leading order prefactor contribution that accounts for the local properties around the instanton. Note that the prefactor is essential to get a sharp estimate, which cannot be obtained from mere \(\log \)-asymptotics using only the rate function. The prefactor \(C_F(z)\) can explicitly be computed via

$$\begin{aligned} C_F(z) = \left[ 2 I_F(z) \det \left( 1_{N \times N} - \lambda _z {{\,\textrm{pr}\,}}_{\eta _z^\perp } \nabla ^2 F(\eta _z) {{\,\textrm{pr}\,}}_{ \eta _z^\perp } \right) \right] ^{-1/2}, \nonumber \\ \end{aligned}$$
(8)

where \({{\,\textrm{pr}\,}}_{\eta _z^\perp } = 1_{N \times N} - \eta _z \otimes \eta _z / \left\Vert \eta _z\right\Vert ^2_N\) is the orthogonal projection onto \(\eta _z^\perp \). A brief derivation of this result, analogous to the computations of Tong et al. (2021), is included in “Appendix A1” for completeness. It is also directly equivalent to the standard form of the second-order reliability method, as derived e.g. by Breitung (1984). Geometrically, it corresponds to replacing the extreme event set \(\{\eta \in \mathbb {R}^N \mid F(\eta ) \ge z \}\) by a set bounded by a paraboloid with vertex at the instanton \(\eta _z\), axis of symmetry in the direction of \(\nabla F(\eta _z)\), and curvatures given by the eigenvalues of the \(-\Vert \nabla F\Vert ^{-1}\)-weighted Hessian of F at \(\eta _z\).
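To illustrate how (6)–(8) are used in practice, the following minimal Python sketch carries out the finite-dimensional computation for a toy observable \(F(\eta ) = \eta _1 + \tfrac{1}{4}\tanh (\eta _2)\) on \(\mathbb {R}^2\) (an illustrative choice of ours, not an example from the remainder of the paper) and compares the resulting estimate (7) with a direct Monte Carlo evaluation of (5); \(\varepsilon = 1\) is chosen only so that the sampling check is cheap.

```python
import numpy as np
from scipy.optimize import minimize

# Toy illustration of (6)-(8); F, z and eps below are illustrative choices.
N, z, eps = 2, 3.0, 1.0

def F(eta):
    return eta[0] + 0.25 * np.tanh(eta[1])

def gradF(eta):
    return np.array([1.0, 0.25 / np.cosh(eta[1]) ** 2])

def hessF(eta):
    return np.array([[0.0, 0.0],
                     [0.0, -0.5 * np.tanh(eta[1]) / np.cosh(eta[1]) ** 2]])

# instanton: minimize 1/2 ||eta||^2 subject to F(eta) = z
res = minimize(lambda e: 0.5 * e @ e, x0=np.ones(N), jac=lambda e: e, method="SLSQP",
               constraints=[{"type": "eq", "fun": lambda e: F(e) - z, "jac": gradF}])
eta_z = res.x
I_z = 0.5 * eta_z @ eta_z                      # rate function I_F(z), Eq. (6)
g = gradF(eta_z)
lam_z = (eta_z @ g) / (g @ g)                  # from eta_z = lam_z * grad F(eta_z)

# leading-order prefactor C_F(z), Eq. (8)
pr = np.eye(N) - np.outer(eta_z, eta_z) / (eta_z @ eta_z)
C_z = 1.0 / np.sqrt(2.0 * I_z * np.linalg.det(np.eye(N) - lam_z * pr @ hessF(eta_z) @ pr))

# sharp estimate (7) vs. direct Monte Carlo evaluation of (5)
P_laplace = np.sqrt(eps / (2.0 * np.pi)) * C_z * np.exp(-I_z / eps)
samples = np.sqrt(eps) * np.random.default_rng(0).standard_normal((4_000_000, N))
P_mc = np.mean(samples[:, 0] + 0.25 * np.tanh(samples[:, 1]) >= z)
print(f"Laplace estimate: {P_laplace:.3e}, Monte Carlo: {P_mc:.3e}")
```

For this toy observable the two numbers should be close; for rarer events (smaller \(\varepsilon \)), the sampling estimate quickly becomes noisy while the Laplace estimate remains available at fixed cost.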

For the weather prediction example, Eqs. (7) and (8) mean the following: We could estimate (5) by performing a large number of simulations of the weather model with a random choice of parameters to obtain statistics on an extremely high temperature event. Instead, we solve an optimization problem over parameters to compute only the single most likely route to that large temperature. When the desired event is very extreme, such a situation can only be realized when all simulated physical processes conspire in exactly the right way to make the extreme temperature event possible. Consequently, only a narrow choice of model parameters and corresponding sequence of events remains that can contribute to the extreme event probability: precisely the instanton singled out by the optimization procedure. The probability of the extreme event is then well approximated by perturbations around that single most likely extreme outcome.

Next, we generalize the statement (7) to the infinite-dimensional setting encountered in continuous time stochastic systems. Intuitively, for temporally evolving systems with stochastic noise, there is randomness at every single instance in time, which implies an infinite number of random parameters to optimize over. We generalize the above strategy to the important case of SDEs in \(\mathbb {R}^n\) driven by small additive Gaussian noise, and assemble and compare computational methods to compute \(I_F\) and \(C_F\) numerically, even for very large spatial dimensions n stemming from semi-discretizations of multi-dimensional SPDEs.

1.2 Generalization to infinite dimensions for SDEs with additive noise

As a stochastic model problem, we consider the SDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{d}X^\varepsilon _t = b(X^\varepsilon _t) \textrm{d}t + \sqrt{\varepsilon } \sigma \textrm{d}B_t\,,\\ X^\varepsilon _0 = x \in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$
(9)

on the time interval [0, T] with a deterministic initial condition and \(n \in \mathbb {N}\), \(\varepsilon > 0\). The drift vector field \(b :\mathbb {R}^n \rightarrow \mathbb {R}^n\), assumed to be smooth, may be nonlinear and non-gradient. The constant matrix \(\sigma \in \mathbb {R}^{n \times n}\) is not required to be diagonal or invertible. The SDE is driven by a standard n-dimensional Brownian motion \(B = (B_t)_{t \in [0,T]}\). We limit ourselves to the estimation of extreme event probabilities (due to small noise \(\varepsilon \)) of the random variable \(f(X^\varepsilon _T)\), where \(f:\mathbb {R}^n \rightarrow \mathbb {R}\) is a smooth, possibly nonlinear observable of the process \(X^\varepsilon \) at final time \(t = T\).

Fig. 2

Visualization of five different sample paths (light orange) and the mean of 100 such paths (orange with black outline) of the model SDE (10) that satisfy \(f(X(T),Y(T)) \ge z\) with \(z = 3\) (red set) and \(\varepsilon = 0.5\). Using Euler-Maruyama steps with an integrating factor with step size \(\Delta t = 5 \cdot 10^{-4}\), we repeatedly simulated (10) until 100 such rare trajectories were found. The dashed blue line is the state variable instanton trajectory \(\phi _z\), solution of (19) with the optimal \(\eta _z\) as forcing. As in Fig. 1, the gray lines are field lines of the drift vector field b

A concrete example of this type of system, already alluded to in the first section, is shown in Fig. 2. It is given by the SDE

$$\begin{aligned}&{\left\{ \begin{array}{ll} \textrm{d}X = (-X-XY)\,\textrm{d}t + \sqrt{\varepsilon }\,\textrm{d}B_X,\\ \textrm{d}Y = (-4Y+X^2)\,\textrm{d}t + \tfrac{1}{2}\sqrt{\varepsilon }\,\textrm{d}B_Y, \end{array}\right. } \nonumber \\&\text {with }(X(0),Y(0)) = (0,0)\,. \end{aligned}$$
(10)

The streamlines in the figure show the motion taken by deterministic trajectories of the model at \(\varepsilon = 0\). Small magnitude stochasticity in the form of Brownian noise is added, and we ask the question: What is the probability \(P_F^\varepsilon (z)\), as defined below in (13), that the system ends up, at time \(T=1\), in the red shaded area in the top right corner, given by \(f(x,y) = x + 2y \ge z=3\)? After approximately \(1.2 \cdot 10^7\) simulations, 100 such trajectories are found, with some of them shown in light orange in Fig. 2. These can be considered typical realizations for this extreme outcome, and allow us to estimate \(P_F^\varepsilon (z) \in \left[ 6.71 \cdot 10^{-6},9.97 \cdot 10^{-6}\right] \) as a \(95\%\) confidence interval. While in principle the same approach could be applied to much more complicated stochastic models, such as SPDEs arising in atmosphere or ocean dynamics, it quickly becomes infeasible due to the cost of performing such a large number of simulations.
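For reference, the brute-force sampling estimate just described can be reproduced with a few lines of Python. The following sketch is a simplified version of the computation above: it uses plain Euler–Maruyama steps instead of the integrating-factor scheme, and far fewer samples than the \(1.2 \cdot 10^7\) used above, so only a handful of exceedances are to be expected.

```python
import numpy as np

# Crude Monte Carlo estimate of P[f(X(T), Y(T)) >= z] for the model SDE (10).
rng = np.random.default_rng(1)
eps, T, z = 0.5, 1.0, 3.0
dt, n_samples = 1e-3, 1_000_000        # coarser and far fewer samples than in the text
nt = int(round(T / dt))

x = np.zeros(n_samples)
y = np.zeros(n_samples)
for _ in range(nt):
    dBx = np.sqrt(dt) * rng.standard_normal(n_samples)
    dBy = np.sqrt(dt) * rng.standard_normal(n_samples)
    # plain Euler-Maruyama step for (10); both updates use the old (x, y)
    x, y = (x + (-x - x * y) * dt + np.sqrt(eps) * dBx,
            y + (-4.0 * y + x ** 2) * dt + 0.5 * np.sqrt(eps) * dBy)

hits = np.count_nonzero(x + 2.0 * y >= z)
print(f"{hits} exceedances, P ~ {hits / n_samples:.1e}")
```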

Instead, we generalize the strategy outlined in the previous section. For the derivation, we make the following assumptions, which are stronger than those in the finite-dimensional case, for technical reasons. To formulate them, we introduce the solution map

$$\begin{aligned} F :L^2([0,T],\mathbb {R}^n) \rightarrow \mathbb {R}\,, \quad&F[\eta ] = f(\phi (T)), \nonumber \\&\text {for } {\left\{ \begin{array}{ll} \dot{\phi } = b(\phi ) + \sigma \eta \,,\\ \phi (0) = x\,. \end{array}\right. } \end{aligned}$$
(11)

Then, we assume for all \(z \in \mathbb {R}\):

  1.

    There is a unique instanton on the z-levelset of F, \(\eta _z \in F^{-1}(\{z\}) \subset L^2([0,T], \mathbb {R}^n)\), that minimizes the function \(\tfrac{1}{2} \left\Vert \cdot \right\Vert ^2_{L^2([0,T], \mathbb {R}^n)}\). There exists a Lagrange multiplier \(\lambda _z \in \mathbb {R}\) with \(\eta _z = \lambda _z \left. \frac{\delta F}{\delta \eta }\right| _{\eta _z}\) as a first-order necessary condition. We define the large deviation rate function for the observable f as

    $$\begin{aligned} I_F :\mathbb {R}\rightarrow \mathbb {R}\,, \quad I_F(z) := \tfrac{1}{2} \left\Vert \eta _z\right\Vert ^2_{L^2([0,T], \mathbb {R}^n)}\,. \end{aligned}$$
    (12)
  2.

    The map from observable value to minimizer \(z \mapsto \eta _z\) is \(C^1\). In particular \(I_F'(z) = \langle \eta _z, \textrm{d}\eta _z / \textrm{d}z \rangle _{L^2([0,T], \mathbb {R}^n)} = \lambda _z \langle \left. \tfrac{\delta F}{\delta \eta } \right| _{\eta _z}, \tfrac{\textrm{d}\eta _z}{\textrm{d}z} \rangle _{L^2([0,T], \mathbb {R}^n)} = \lambda _z\).

  3.

    \({{\,\textrm{Id}\,}}- \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}}\right| _{ \eta _z}\) is positive definite.

  4.

    The rate function \(I_F\) is twice continuously differentiable and strictly convex, i.e. \(I_F'' > 0\).

Under these assumptions and using existing theoretical results on precise Laplace asymptotics for small-noise SDEs, in “Appendix A2” we sketch a derivation of the following result: For the extreme event probability

$$\begin{aligned} P_F^\varepsilon (z) = \mathbb {P}{\Big [ F[\sqrt{\varepsilon } \eta ] \ge z \Big ]} = \mathbb {P}{\Big [ f(X_T^\varepsilon ) \ge z \Big ]} \end{aligned}$$
(13)

with \(z > F(0)\), the asymptotically sharp estimate (7) holds in the same way as before. The leading order prefactor is now given by

$$\begin{aligned} C_F(z) = \left[ 2 I_F(z) \det \left( {{\,\textrm{Id}\,}}- \lambda _z {{\,\textrm{pr}\,}}_{\eta _z^\perp } \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{ \eta _z} {{\,\textrm{pr}\,}}_{ \eta _z^\perp } \right) \right] ^{-1/2}\,, \end{aligned}$$
(14)

where \(\det \) is now a Fredholm determinant, the second variation \(\delta ^2 F / \delta \eta ^2\) of the solution map F at \(\eta = \eta _z\) is a linear trace-class operator on \(L^2([0,T],\mathbb {R}^n)\), and \({{\,\textrm{pr}\,}}\) denotes orthogonal projection in \(L^2([0,T],\mathbb {R}^n)\).

Applied to the model SDE (10), we must first compute the optimal noise realization \(\eta _z = \left( \eta _z(t) \right) _{t \in [0,T]}\), which has a corresponding optimal system trajectory \(\phi _z = \left( \phi _z(t) \right) _{t \in [0,T]}\). This optimal trajectory, shown as a blue dashed line in Fig. 2, describes the most likely evolution of the SDE (10) from the initial condition (0, 0) into the shaded region in the upper right corner, thus leading to an event \(f(X(T),Y(T))\ge z\). Second, through equation (14), we can compute the prefactor correction for this optimal noise realization. Inserted into Eq. (7), we obtain \(P_F^{\varepsilon = 0.5}(z = 3) \approx 8.94 \cdot 10^{-6}\) as an asymptotic, sampling-free estimate, which falls into the confidence interval obtained with direct sampling. The source code to reproduce all results for this example is available in a public GitHub repository (Schorlepp et al. 2023).

We add some remarks on the setting:

  1.

    We focus on SDEs with additive noise (9) for simplicity. For the more general case of ordinary Itô SDEs with multiplicative noise \(\sigma = \sigma (X_t^\varepsilon )\), the leading order prefactor can still be computed explicitly, but involves a regularized Carleman-Fredholm determinant \(\det _2\) (see Simon (1977) for a definition) instead of a Fredholm determinant \(\det \), because the second variation of F is no longer guaranteed to be trace-class (Ben Arous 1988). The direct analogy to the finite-dimensional case is only possible for additive noise.

  2.

    We state the theoretical result and computational strategy for ordinary stochastic differential equations, but will also apply them numerically to SPDEs with additive, spatially smooth Gaussian forcing. In this case, we expect a direct generalization of the results for SDEs to hold.

  3.

    Without any additional work, we also obtain a sharp estimate, in the sense of (4), for the probability density function \(\rho _F^\varepsilon \) of \(f(X_T^\varepsilon )\) at z via

    $$\begin{aligned} \rho _F^\varepsilon (z) \overset{\varepsilon \downarrow 0}{\sim }\ (2 \pi \varepsilon )^{-1/2} \lambda _z C_F(z)\,\exp \left\{ -\frac{1}{\varepsilon }I_F(z)\right\} \,. \end{aligned}$$
    (15)

From a practical point of view, the remaining question is how to evaluate (12) and (14), given a general and possibly high-dimensional SDE (9).

Main questions and paper outline. In the remainder of this paper, we will specifically answer the following questions:

  • How to find the minimizer \(\eta _z\) of the differential equation-constrained optimization problem defining (12) numerically? This question has been treated in detail in the literature for the setup at hand, and we give a brief summary of relevant references in Sect. 2.1.

  • How to evaluate the Fredholm determinant in (14) numerically? We show in Sect. 2.2 how to use second-order adjoints to compute the application of the projected second variation operator

    $$\begin{aligned} A_z := \lambda _z {{\,\textrm{pr}\,}}_{ \eta ^\perp _z} \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} {{\,\textrm{pr}\,}}_{ \eta ^\perp _z} \end{aligned}$$
    (16)

    to functions (or, upon discretization, to vectors), which is the basis for iterative eigenvalue solvers. In Sect. 2.4, we discuss how this allows us to treat very large system dimensions n as long as the rank of \(\sigma \) remains small.

  • How does this prefactor computation based on the dominant eigenvalues of the projected second variation operator theoretically relate to the alternative approach using symmetric matrix Riccati differential equations mentioned in the introduction? What are the advantages and disadvantages of the different approaches? We comment on these points in Sects. 2.3 and 2.4.

  • What is the probabilistic interpretation of the quantities encountered when evaluating (12) and (14)? To what extent can they be observed in direct Monte Carlo simulations of the SDE (9)? This is the content of Sect. 3.

After these theoretical sections, illustrated throughout via the model SDE (10), we present two challenging examples in Sect. 4: The probability of high waves in the stochastic Korteweg–De Vries equation in Sect. 4.1, and the probability of high strain events in the stochastic three-dimensional incompressible Navier–Stokes equations in Sect. 4.2. All technical derivations can be found in “Appendix A”.

2 Numerical rate function and prefactor evaluation

In this section, we show how the instanton and prefactor for the evaluation of the asymptotic tail probability estimate (7) can be computed in practice for a general, possibly high-dimensional SDE (9), and illustrate the procedure for the model SDE (10). Both finding the instanton (Sect. 2.1) and the prefactor (Sect. 2.2) require the solutions of differential equations of a complexity comparable to the original SDE. They therefore become realistic to evaluate numerically even for fairly large problems, provided tailored methods are used, as summarized in Sect. 2.4. Additionally, we compare the adjoint-based Fredholm determinant computation to the approach based on matrix Riccati differential equations in Sects. 2.3 and 2.4.

2.1 First variations and finding the instanton

Here, we discuss the differential equation-constrained optimization problem

$$\begin{aligned} \eta _z = \mathop {\mathrm {arg\,min}}\limits _{{\begin{array}{c}\eta \in L^2([0,T],\mathbb {R}^n)\\ \text {s.t. }F[\eta ] = z \end{array}}} \; \frac{1}{2} \left\Vert \eta \right\Vert _{L^2([0,T],\mathbb {R}^n)}^2, \end{aligned}$$
(17)

that determines the instanton noise \(\eta _z\), and briefly review how it can be solved numerically. We reformulate the first-order optimality condition

$$\begin{aligned} \eta _z = \lambda _z \left. \frac{\delta F}{\delta \eta } \right| _{\eta _z} \end{aligned}$$
(18)

by evaluating the first variation using an adjoint variable as reviewed by Plessix (2006), Hinze et al. (2009). For any \(\eta \in L^2([0,T],\mathbb {R}^n)\), we find \(\frac{\delta (\lambda F)}{\delta \eta } = \sigma ^\top \theta \), where the adjoint variable \(\theta \) (also called conjugate momentum) is found via solving

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\phi }} = b(\phi ) + \sigma \eta \,, \quad &{}\phi (0) = x\,,\\ {\dot{\theta }} = - \nabla b^\top (\phi ) \theta \,, \quad &{}\theta (T) = \lambda \nabla f(\phi (T))\,. \end{array}\right. } \end{aligned}$$
(19)

With \(a = \sigma \sigma ^\top \), we recover from (18) the well-known instanton equations, formulated only in terms of the state variable \(\phi _z\) and its adjoint variable \(\theta _z\), with optimal noise \(\eta _z = \sigma ^\top \theta _z\):

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\phi }}_z = b(\phi _z) + a \theta _z\,, \quad &{}\phi _z(0) = x\,, \quad f(\phi _z(T)) = z,\\ {\dot{\theta }}_z = - \nabla b^\top (\phi _z) \theta _z\,, \quad &{}\theta _z(T) = \lambda _z \nabla f(\phi _z(T))\,. \end{array}\right. } \end{aligned}$$
(20)

The rate function is given by \(I_F(z) = \tfrac{1}{2} \left\langle \theta _z, a \theta _z \right\rangle _{L^2([0,T],\mathbb {R}^n)}\). When formulating the optimization problem in the state variable \(\phi \) instead of the noise \(\eta \), the instanton equations (20) are directly obtained as the first-order necessary condition for a minimizer of the Freidlin-Wentzell (Freidlin and Wentzell 2012) action functional S with

$$\begin{aligned} S[\phi ] = \frac{1}{2} \int _0^T \left\langle \dot{\phi } - b(\phi ), a^{-1} \left[ \dot{\phi } - b(\phi ) \right] \right\rangle _n \textrm{d}t\,. \end{aligned}$$
(21)

The numerical minimization of this functional for both ordinary and partial stochastic differential equations is discussed e.g. by E et al. (2004), Grafke et al. (2015), Grafke et al. (2015), Grafke and Vanden-Eijnden (2019), Schorlepp et al. (2022). Conceptually, the minimization problem (17) is a deterministic distributed optimal control problem on a finite time horizon with a final time constraint on the state variable (Lewis et al. 2012; Herzog and Kunisch 2010). The final-time constraint can be eliminated e.g. using penalty methods. Alternatively, for a convex rate function, a primal-dual strategy (Boyd and Vandenberghe 2004) with minimization of \(\tfrac{1}{2} \left\Vert \cdot \right\Vert ^2 - \lambda F\) at fixed \(\lambda \) can be used. If estimates for a range of z are desired, one can solve the dual problem for various \(\lambda \), which effectively computes the Legendre-Fenchel transform \(I_F^*(\lambda )\), and invert afterwards. If the rate function is not convex, the observable f can be reparameterized to make this possible (Alqahtani and Grafke 2021). To solve the unconstrained problems of the general form \( \min \tfrac{1}{2} \left\Vert \cdot \right\Vert ^2 - \lambda (F - z) + \tfrac{\mu }{2}(F - z)^2\), gradient-based methods with an adjoint evaluation (19) can be used, e.g. Schorlepp et al. (2022) use an L-BFGS solver. Simonnet (2022) used a deep learning approach instead. For high-dimensional problems such as multi-dimensional fluids, it may be necessary to use checkpointing for the gradient evaluation, and to use \({{\,\textrm{rank}\,}}\sigma \ll n\) if applicable to reduce memory costs (Grafke et al. 2015). We comment on this point in more detail in Sect. 2.4. Using second order adjoints as in the next section would also make it possible to implement a Newton solver, cf. Hinze and Kunisch (2001), Hinze et al. (2006), Sternberg and Hinze (2010), Cioaca et al. (2012).

For the model SDE (10), the instanton equations (20) read

$$\begin{aligned}&{\left\{ \begin{array}{ll} \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \phi _1\\ \phi _2 \end{array} \right) = \left( \begin{array}{c} -\phi _1\\ -4\phi _2 \end{array} \right) + \left( \begin{array}{c} -\phi _1 \phi _2\\ \phi _1^2 \end{array} \right) + \left( \begin{array}{c} \theta _1\\ \tfrac{1}{4}\theta _2 \end{array} \right) \,,\\ \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \theta _1\\ \theta _2 \end{array} \right) = \left( \begin{array}{c} +\theta _1\\ +4\theta _2 \end{array} \right) + \left( \begin{array}{c} \phi _2 \theta _1 - 2 \phi _1 \theta _2\\ \phi _1\theta _1 \end{array} \right) \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \left( \begin{array}{c} \phi _1(0)\\ \phi _2(0) \end{array} \right) = \left( \begin{array}{c} 0\\ 0 \end{array} \right) \,, \quad \phi _1(T) + 2 \phi _2(T) = z,\\ \left( \begin{array}{c} \theta _1(T)\\ \theta _2(T) \end{array} \right) = \lambda _z \left( \begin{array}{c} 1\\ 2 \end{array} \right) \,. \end{array}\right. } \end{aligned}$$
(22)

We implemented a simple gradient descent (preconditioned with \(a^{-1}\)) using adjoint evaluations of the gradient and an Armijo line search (available in the GitHub repository (Schorlepp et al. 2023)) to find the instanton for the model SDE (10). The state equation is discretized using explicit Euler steps with an integrating factor, and the gradient is computed exactly on a discrete level, i.e. “discretize, then optimize”. To find the instanton for a given z, we use the augmented Lagrangian method. For each subproblem at fixed Lagrange multiplier \(\lambda \) and penalty parameter \(\mu \), gradient descent is performed until the gradient norm has been reduced by a given factor compared to its initial value. All of these aspects are summarized in more detail by Schorlepp et al. (2022). The resulting optimal state variable trajectory \(\phi _z\) for \(z = 3\) for the model SDE (10) is shown in Fig. 2.
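As a minimal illustration of the adjoint-based gradient (19) for the model SDE (10), the following sketch uses the dual strategy mentioned above rather than the repository's augmented Lagrangian solver: it performs plain gradient descent on \(\tfrac{1}{2} \left\Vert \eta \right\Vert ^2_{L^2} - \lambda F[\eta ]\) at a fixed, illustrative value of \(\lambda \), and returns the pair \((z, I_F(z))\) reached for that \(\lambda \); time step, step size and iteration count are ad hoc choices.

```python
import numpy as np

# Dual (fixed-lambda) instanton sketch for the model SDE (10); lam is illustrative.
T, nt = 1.0, 2000
dt = T / nt
sigma = np.diag([1.0, 0.5])
lam = 3.0
grad_f = np.array([1.0, 2.0])                  # f(x, y) = x + 2y

def b(u):
    return np.array([-u[0] - u[0] * u[1], -4.0 * u[1] + u[0] ** 2])

def grad_b(u):                                 # Jacobian of the drift
    return np.array([[-1.0 - u[1], -u[0]],
                     [2.0 * u[0], -4.0]])

def forward(eta):
    """Explicit Euler for the controlled state equation in (19)."""
    phi = np.zeros((nt + 1, 2))
    for k in range(nt):
        phi[k + 1] = phi[k] + dt * (b(phi[k]) + sigma @ eta[k])
    return phi

def backward(phi):
    """First-order adjoint equation in (19), integrated backward in time."""
    theta = np.zeros((nt + 1, 2))
    theta[nt] = lam * grad_f
    for k in range(nt, 0, -1):
        theta[k - 1] = theta[k] + dt * grad_b(phi[k]).T @ theta[k]
    return theta

eta = np.zeros((nt, 2))
for _ in range(1000):
    phi = forward(eta)
    theta = backward(phi)
    grad = eta - theta[:nt] @ sigma            # L^2 gradient; rows are sigma^T theta(t_k)
    if np.sqrt(dt * np.sum(grad ** 2)) < 1e-8:
        break
    eta -= 0.5 * grad                          # fixed step size for simplicity

phi = forward(eta)
theta = backward(phi)
z_lam = phi[nt] @ grad_f                       # observable value reached for this lambda
I_lam = 0.5 * dt * np.sum(eta ** 2)            # corresponding rate function value
print(f"lambda = {lam}: z = {z_lam:.3f}, I_F(z) = {I_lam:.4f}")
```

Sweeping \(\lambda \) and recording the resulting pairs \((z, I_F(z))\) traces out the rate function via the Legendre–Fenchel picture described above, whereas the augmented Lagrangian route of the repository fixes z directly.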

2.2 Second variations and prefactor computation via dominant eigenvalues

Similarly to the previous section, the second variation is also readily evaluated in the adjoint formalism. With this prerequisite, we are able to use iterative eigenvalue solvers to approximate the Fredholm determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\). For a comprehensive introduction to the numerical computation of Fredholm determinants, as well as theoretical results on approximate evaluations using integral quadratures, see Bornemann (2010). However, in contrast to Bornemann (2010), we deal with possibly spatially high-dimensional problems, such as the example in Sect. 4.2. Hence, we use iterative algorithms to compute the dominant eigenvalues to keep the number of operator evaluations manageable.

Another application of the adjoint state method shows that applying the second functional derivative of the solution map F at \(\eta :[0,T] \rightarrow \mathbb {R}^n\) to a fluctuation \(\delta \eta :[0,T] \rightarrow \mathbb {R}^n\) results in \(\frac{\delta ^{2}(\lambda F)}{\delta \eta ^{2}} \delta \eta = \sigma ^\top \zeta \), where \(\zeta \) is found via solving

$$\begin{aligned}&{\left\{ \begin{array}{ll} {\dot{\gamma }} =\nabla b(\phi ) \gamma + \sigma \delta \eta \,,\\ {\dot{\zeta }} = -\left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n \gamma - \nabla b^\top (\phi ) \zeta \,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \gamma (0) = 0\,,\\ \zeta (T) = \lambda \nabla ^2 f(\phi (T)) \gamma (T)\,. \end{array}\right. } \end{aligned}$$
(23)

Here, we use the short-hand notation \(\left[ \left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n\right] _{ij} = \sum _{k=1}^n \partial _i \partial _j b_k(\phi ) \theta _k\). The trajectories \(\phi \) and \(\theta \) in (23) are determined via (19) from \(\eta \). Note that the second order equations (23) are simply the linearization of (19). Together with the projection operator \({{\,\textrm{pr}\,}}_{\eta ^\perp _z}\) acting as

$$\begin{aligned} ({{\,\textrm{pr}\,}}_{\eta ^\perp _z} \delta \eta )(t) = \delta \eta (t) - \frac{\left\langle \eta _z, \delta \eta \right\rangle _{L^2([0,T], \mathbb {R}^n)}}{\left\Vert \eta _z\right\Vert ^2_{L^2([0,T],\mathbb {R}^n)}} \eta _z(t) \end{aligned}$$
(24)

for \(t \in [0,T]\), we are now in a position to evaluate the application of the operator \(A_z\), as defined in (16), to any function \(\delta \eta :[0,T] \rightarrow \mathbb {R}^n\). Denoting the eigenvalues of the trace-class operator \(A_z\) by \(\mu _z^{(i)} \in (-\infty , 1)\), the Fredholm determinant in the prefactor (14) is given by \(\det ({{\,\textrm{Id}\,}}- A_z) = \prod _{i=1}^\infty (1 - \mu _z^{(i)})\), with \(\left|\mu _z^{(i)}\right| \xrightarrow {i \rightarrow \infty } 0\) in such a way that the product converges. An iterative eigenvalue solver relying solely on matrix–vector multiplication, thus avoiding the explicit storage of the possibly large discretized operator \(A_z\) as an \((n_t \cdot n)\times (n_t \cdot n)\) matrix, can now be used to find a finite number of dominant eigenvalues of \(A_z\) with absolute value larger than some threshold, and to approximate \(\det ({{\,\textrm{Id}\,}}- A_z)\) using these.

Fig. 3

Result of numerically computing the 200 eigenvalues \(\mu _z^{(i)}\) with largest absolute value of \(A_z\) for the example SDE (10) with \(z = 3\). Discretization of (25) was done with step size \(\Delta t = 5 \cdot 10^{-4}\), hence the dimension of the discretized path space variables is 4000 here. Main figure: absolute value of the eigenvalues \(\mu _z^{(i)}\). Inset: finite product \(\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) \) for different m as an approximation of the Fredholm determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\). We see that the eigenvalues rapidly decay to zero in this example, that the cumulative product in the inset accordingly converges quickly, and that the final estimate \(\det ({{\,\textrm{Id}\,}}- A_z) \approx \prod _{i = 1}^{200} \left( 1 - \mu _z^{(i)} \right) \approx 1.0397\) is in fact close to 1

For the model SDE (10), linearizing the state and first-order adjoint equations (19), the second-order adjoint equations become

$$\begin{aligned}&{\left\{ \begin{array}{ll} \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \gamma _1\\ \gamma _2 \end{array} \right) = -\left( \begin{array}{c}\gamma _1\\ 4 \gamma _2\end{array}\right) + \left( \begin{array}{c} - \gamma _1 \phi _2 - \phi _1 \gamma _2\\ 2\phi _1 \gamma _1 \end{array} \right) + \left( \begin{array}{c} \delta \eta _1\\ \tfrac{1}{2}\delta \eta _2 \end{array} \right) \,,\\ \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \zeta _1\\ \zeta _2 \end{array} \right) = \left( \begin{array}{c} \zeta _1\\ 4 \zeta _2 \end{array}\right) + \left( \begin{array}{c} \gamma _2 \theta _1+\phi _2 \zeta _1 -2 \gamma _1 \theta _2 - 2 \phi _1 \zeta _2 \\ \gamma _1 \theta _1 + \phi _1 \zeta _1 \end{array} \right) \,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \left( \begin{array}{c} \gamma _1(0)\\ \gamma _2(0) \end{array} \right) = \left( \begin{array}{c} 0\\ 0 \end{array} \right) \,,\\ \left( \begin{array}{c} \zeta _1(T)\\ \zeta _2(T) \end{array} \right) = \left( \begin{array}{c} 0\\ 0 \end{array} \right) \,. \end{array}\right. } \end{aligned}$$
(25)

We implemented a simple Euler solver for these equations for a given discretized input vector \(\delta \eta \in \mathbb {R}^{2 (n_t + 1)}\) in the Python code (Schorlepp et al. 2023) as a subclass of scipy.sparse.linalg.LinearOperator. To set up this operator, we supply the instanton data \((\phi _z, \theta _z, \lambda _z) \in \mathbb {R}^{2 (n_t + 1)} \times \mathbb {R}^{2 (n_t + 1)} \times \mathbb {R}\) as found using the methods of the previous Sect. 2.1. The LinearOperator class, for which we only need to supply a matrix–vector multiplication method instead of having to store the full matrix in \(\mathbb {R}^{2 (n_t + 1) \times 2 (n_t + 1)}\), can then be used with any iterative eigenvalue solver. Here, we use the implicitly restarted Arnoldi method of ARPACK (Lehoucq et al. 1998), wrapped as scipy.sparse.linalg.eigs in Python. Note that in this example, storing the full matrix would be feasible, and the Riccati method discussed in the next section is faster for computing the prefactor. However, we are interested in a scalable approach for large n, where, as discussed in Sect. 2.4 and shown in Sect. 4, the Riccati approach becomes infeasible. We show the results of computing the 200 eigenvalues of the projected second variation operator \(A_z\) with largest absolute value for \(z=3\) in Fig. 3.
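The following sketch mirrors this construction in simplified form for the model SDE (10). It assumes that the discretized instanton data phi, theta, eta, the multiplier lam and the quantities dt, nt, sigma and grad_b from the sketch at the end of Sect. 2.1 are still in scope; the operator application (projection (24), second-order adjoint solve (25), projection) is wrapped as a scipy LinearOperator and passed to the ARPACK solver, with the number of requested eigenvalues k being an illustrative choice.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, eigs

def hess_b_theta(u, th):
    """<grad^2 b(u), theta>, cf. the short-hand notation introduced after (23)."""
    return np.array([[2.0 * th[1], -th[0]],
                     [-th[0], 0.0]])

def project(v):
    """Orthogonal projection (24) onto the complement of the instanton noise eta."""
    return v - (np.sum(eta * v) / np.sum(eta * eta)) * eta

def apply_Az(x):
    """Action of A_z from (16) on a discretized noise fluctuation, via (25)."""
    deta = project(x.reshape(nt, 2))
    gamma = np.zeros((nt + 1, 2))
    for k in range(nt):                        # linearized state equation, forward
        gamma[k + 1] = gamma[k] + dt * (grad_b(phi[k]) @ gamma[k] + sigma @ deta[k])
    zeta = np.zeros((nt + 1, 2))               # zeta(T) = 0 since f is linear here
    for k in range(nt, 0, -1):                 # second-order adjoint, backward
        zeta[k - 1] = zeta[k] + dt * (hess_b_theta(phi[k], theta[k]) @ gamma[k]
                                      + grad_b(phi[k]).T @ zeta[k])
    return project(zeta[:nt] @ sigma).ravel()

# each matvec runs the time loops above in pure Python, so this takes a little while
A = LinearOperator((2 * nt, 2 * nt), matvec=apply_Az, dtype=float)
mu = eigs(A, k=100, which="LM", return_eigenvectors=False).real    # dominant eigenvalues
I_F = 0.5 * dt * np.sum(eta ** 2)
C_F = 1.0 / np.sqrt(2.0 * I_F * np.prod(1.0 - mu))                 # prefactor (14)
eps = 0.5
# estimate (7) at eps = 0.5, for the observable value z reached at the chosen lambda above
print(f"det ~ {np.prod(1.0 - mu):.4f}, C_F ~ {C_F:.4f}, "
      f"P ~ {np.sqrt(eps / (2 * np.pi)) * C_F * np.exp(-I_F / eps):.2e}")
```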

2.3 Alternative: prefactor computation via matrix Riccati differential equations

In “Appendix A3”, we motivate via formal manipulations that the prefactor (14) can also be expressed via the following ratio of zeta-regularized functional determinants (Ray and Singer 1971) of second order differential operators, instead of a Fredholm determinant of an integral operator. This prefactor expression is more natural from the statistical physics point of view, where path integrals in the field variable \(\phi \) instead of the noise \(\eta \) are typically considered, cf. Zinn-Justin (2021). We obtain

$$\begin{aligned} C_F(z) =&\sqrt{I_F''(z)}\lambda _z^{-1} \left( \frac{{{\,\textrm{Det}\,}}_{{{\mathcal {A}}}_{\lambda _z}} \left( \Omega [\phi _z] \right) }{{{\,\textrm{Det}\,}}_{{{\mathcal {A}}}_0} \left( \Omega [\phi _0] \right) } \right) ^{-1/2} \times \nonumber \\ {}&\quad \times \exp \left\{ -\tfrac{1}{2} \int _0^T \left( \nabla \cdot b(\phi _z) - \nabla \cdot b(\phi _0)\right) \,\textrm{d}t\right\} \,, \end{aligned}$$
(26)

in accordance with Schorlepp et al. (2023), where it was derived directly through path integral computations. Here, \(\Omega \) is the Jacobi operator of the Freidlin-Wentzell action functional as defined in “Appendix A3”, and the subscript of the zeta-regularized determinants \({{\,\textrm{Det}\,}}\) denotes the boundary conditions under which the determinants of the differential operators are computed. Naively evaluating the determinant ratio in (26) by numerically finding the eigenvalues of the appearing differential operators is typically not feasible. This is due to the fact that both operators possess unbounded spectra with the same asymptotic behavior of the eigenvalues, which requires computing the smallest eigenvalues of both operators. A threshold for this computation is difficult to set, and while the eigenvalues of both operators should converge to each other as they increase, numerical inaccuracies tend to increase for the larger eigenvalues. Fortunately, there exist theoretical results regarding the computation of such determinant ratios exactly and in closed form by solving initial value problems (Gel’fand and Yaglom 1960; Levit and Smilansky 1977; Forman 1987; Kirsten and McKane 2003). Using the results of Forman (1987), the prefactor (26) can be computed by solving the symmetric matrix Riccati differential equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{Q}_z = a + Q_z \nabla b\left( \phi _z \right) ^\top + \nabla b\left( \phi _z \right) Q_z + Q_z \left\langle \nabla ^2 b(\phi _z), \theta _z\right\rangle _n Q_z \,,\\ Q_z(0) = 0_{n \times n} \in \mathbb {R}^{n \times n}\,. \end{array}\right. } \end{aligned}$$
(27)

for \(Q_z :[0,T] \rightarrow \mathbb {R}^{n \times n}\) and then evaluating

$$\begin{aligned} C_F(z)&= \lambda _z^{-1} \exp \left\{ \frac{1}{2} \int _0^T {{\,\textrm{tr}\,}}\left[ \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n Q_z \right] \textrm{d}t\right\} \times \nonumber \\&\times \left[ {\det } \left( U_z \right) \left\langle \nabla f(\phi _z(T)), Q_z(T) U_z^{-1} \nabla f(\phi _z(T)) \right\rangle _n \right] ^{-1/2} \end{aligned}$$
(28)

with

$$\begin{aligned} U_z := 1_{n \times n} - \lambda _z \nabla ^2 f \left( \phi _z(T) \right) Q_z(T) \in \mathbb {R}^{n \times n}\,. \end{aligned}$$
(29)

This result in terms of a Riccati matrix differential equation is also natural from a stochastic analysis perspective (WKB analysis of the Kolmogorov backward equation (Grafke et al. 2021)), or a time-discretization of the path integral perspective (recursive evaluation method (Schorlepp et al. 2021)). To give intuition for the Riccati differential equation (27), note that by letting \(Q_z = \gamma \zeta ^{-1}\) with \(\gamma (0) = 0_{n \times n}\) and \(\zeta (0) = 1_{n \times n}\), the approach amounts to solving

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\gamma }} =\nabla b(\phi ) \gamma + a \zeta \,, \quad &{}\gamma (0) = 0_{n \times n}\,,\\ {\dot{\zeta }} = -\left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n \gamma - \nabla b^\top (\phi ) \zeta \,, \quad &{}\zeta (0) = 1_{n \times n}, \end{array}\right. } \end{aligned}$$
(30)

as an initial value problem, whereas the eigenvalue problem \(\frac{\delta ^{2}(\lambda F)}{\delta \eta ^{2}} \delta \eta = \mu \delta \eta \) corresponds to the boundary value problem

$$\begin{aligned}&{\left\{ \begin{array}{ll} {\dot{\gamma }} =\nabla b(\phi ) \gamma + \mu ^{-1} a \zeta \,,\\ {\dot{\zeta }} = -\left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n \gamma - \nabla b^\top (\phi ) \zeta \,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \gamma (0) = 0\,,\\ \zeta (T) = \lambda \nabla ^2 f(\phi (T)) \gamma (T)\,. \end{array}\right. } \end{aligned}$$
(31)

This means that to evaluate the functional determinant prefactor via the Riccati approach, we consider functions in the kernel of the operator \({{\,\textrm{Id}\,}}- \lambda \delta ^2 F / \delta \eta ^2\), i.e. eigenfunctions belonging to the eigenvalue 0, but under modified boundary conditions of the operator. In practice, instead of finding the dominant eigenvalues of the integral operator \(A_z\) of Sect. 2.2 that acts on functions \(\delta \eta :[0,T] \rightarrow \mathbb {R}^n\), we can integrate a single matrix-valued initial value problem for \(Q_z :[0,T] \rightarrow \mathbb {R}^{n \times n}\) as presented in this section. Even though the Riccati equation (27), in contrast to the linear system (30), is a nonlinear differential equation, it is nevertheless advisable to solve (27) instead of (30) numerically because the equation for \(\zeta \) in (30) has to be integrated in the unstable time direction for the right-hand side term \(-\nabla b(\phi )^\top \zeta \). Note also that, depending on the system and observable at hand, the solution of the Riccati equation (27) may pass through removable singularities in (0, T) whenever \(\zeta (t)\) in (30) becomes non-invertible, hence direct numerical integration of (27) may require some care (see Schiff and Shnider (1999) and references therein).

For the two-dimensional model SDE (10), the forward Riccati equation for the symmetric matrix \(Q = Q_z :[0,T] \rightarrow \mathbb {R}^{2 \times 2}\) along \((\phi , \theta ) = (\phi _z, \theta _z)\) becomes

$$\begin{aligned}&\frac{\textrm{d}}{\textrm{d}t}\left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) = \left( \begin{array}{cc} 1 &{} 0\\ 0 &{} \tfrac{1}{4} \end{array} \right) - \left( \begin{array}{cc} 2 Q_{11} &{} 5 Q_{12}\\ 5 Q_{12} &{} 8 Q_{22} \end{array} \right) \nonumber \\&+ \left[ \left( \begin{array}{cc} -\phi _2 &{} -\phi _1\\ 2 \phi _1 &{} 0 \end{array} \right) \left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) \right] + [\dots ]^\top \nonumber \\&+\left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) \left( \begin{array}{cc} 2 \theta _2 &{} - \theta _1\\ - \theta _1 &{} 0 \end{array} \right) \left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) \,, \end{aligned}$$
(32)

where \([\dots ]\) stands for a repetition of the preceding term. In the GitHub repository (Schorlepp et al. 2023), we solve this Riccati equation with Euler steps with an integrating factor and use the result to evaluate (28). We do not encounter any numerical problems or singularities in this example. The result for \(C_F(z = 3)\) agrees with the Fredholm determinant computation using dominant eigenvalues in the previous Sect. 2.2.
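For comparison, the following sketch integrates the Riccati equation (32) with plain Euler steps (rather than the integrating-factor scheme used in the repository) and evaluates (28)–(29), again reusing the instanton data phi, theta, lam and the helpers grad_b, hess_b_theta, grad_f, sigma, dt, nt from the sketches above; since the observable is linear, \(U_z = 1_{2 \times 2}\).

```python
import numpy as np

# Riccati route (27)-(29) for the model SDE (10), plain Euler time stepping.
a = sigma @ sigma.T
Q = np.zeros((2, 2))
trace_int = 0.0
for k in range(nt):
    H = hess_b_theta(phi[k], theta[k])       # <grad^2 b(phi), theta> along the instanton
    trace_int += dt * np.trace(H @ Q)        # accumulates the exponent in (28)
    Q = Q + dt * (a + Q @ grad_b(phi[k]).T + grad_b(phi[k]) @ Q + Q @ H @ Q)

U = np.eye(2)                                # (29) with hess f = 0 for the linear observable
C_F_riccati = (np.exp(0.5 * trace_int) / lam
               / np.sqrt(np.linalg.det(U) * (grad_f @ Q @ np.linalg.solve(U, grad_f))))
print(f"C_F via Riccati: {C_F_riccati:.4f}")  # should match C_F from the eigenvalue route
```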

2.4 Computational efficiency considerations

In this section, we compare the two prefactor computation methods of Sects. 2.2 and 2.3 using either dominant eigenvalues of the trace-class operator \(A_z\) evaluated via (23), or the Riccati matrix differential equation (27), in terms of their practical applicability as well as computational and memory cost for large system dimensions \(n \gg 1\).

For the eigenvalue-based approach, we know that \(\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) \xrightarrow {m \rightarrow \infty } \det ({{\,\textrm{Id}\,}}- A_z)\) converges in theory, but it is difficult to give bounds on the number of eigenvalues required to approximate the Fredholm determinant to a given accuracy. In all examples considered in this paper, at most a few hundred eigenvalues turned out to be necessary for accurate results, even for the three-dimensional Navier–Stokes equations in Sect. 4.2 as a high-dimensional (\(n = 3 \cdot 128^3 \approx 6.3 \cdot 10^6\)) and strongly nonlinear example. The number of dominant eigenvalues of \(A_z\) to be computed to achieve a desired accuracy is robust with respect to the temporal resolution and only depends on the (effective, see below) dimension of the system and its level of nonlinearity. In any case, to obtain the m eigenvalues of \(A_z\) with largest absolute value, iterative eigenvalue solvers, either using Krylov subspace methods or randomized algorithms, typically require a number of operator evaluations proportional to m (Halko et al. 2011). Each evaluation of \(A_z\) consists of solving two ODEs or PDEs (23) with computational complexity comparable to the original SDE. We comment on memory requirements below.

Compared to this, the Riccati approach requires the numerical solution of a single \(n \times n\) symmetric matrix differential equation as an initial value problem. If n is small, then this is clearly more efficient than computing \(m > n / 2\) eigenvalues. However, there may also be problems with the Riccati approach: First, it requires a strictly convex rate function with \(I_F''(z)> 0\) at z, as can be seen from (26). If this is not satisfied, then a suitable convexification via reparameterization needs to be carried out on a case-by-case basis (Alqahtani and Grafke 2021). While we assumed that the rate function is convex to derive the prefactor (14) in terms of the Fredholm determinant, this assumption is actually not necessary, and the eigenvalue-based approach remains feasible regardless of the convexity of the observable rate function \(I_F\). Second, the eigenvalue approach is easier to interpret, whereas it is not always immediately clear why the Riccati solution may diverge (removable singularities that can be remedied by a suitable choice of integration scheme versus true singularities due to unstable or flat directions of the second variation at the instanton).

We turn to the memory requirements of the prefactor computation strategies, and in particular to their scaling with the system dimension n. Informally, one can think of the Riccati matrix as defined on the (squared) state space of the SDE, in contrast to the eigenvectors of \(A_z\), which are defined on the potentially lower-dimensional noise space. The Riccati equation then integrates a dense \(n \times n\) array in time by performing \(n_t\) consecutive time steps of (27) and evaluating (28) along the way. This is difficult to achieve directly as soon as (semi-discretizations of) multi-dimensional SPDEs are considered, which are relevant e.g. for realistic fluid or climate models. Usually, large Riccati matrix differential equations, which also arise e.g. in linear-quadratic regulator problems, are solved within some problem-specific low-rank format, see e.g. Stillfjord (2018). In contrast to this, the vectors on which iterative eigenvalue solvers for the Fredholm-determinant-based approach need to operate are in general vectors of size \(n_t \times n\).

As an important class of examples, we now consider systems with large spatial dimension \(n \gg 1\) in which, however, only a few degrees of freedom are forced, such that the diffusion matrix \(a = \sigma \sigma ^\top \) is singular and \({{\,\textrm{rank}\,}}\sigma \ll n\). Examples of this include fluid and turbulence models with energy injection only on a compactly supported set of either high or low spatial Fourier modes, or climate models with a limited number of random parameters (Margazoglou et al. 2021). In this case, it is straightforward to exploit the small rank of \(\sigma \) within the eigenvalue-based approach to decrease the memory requirements and apply the method even to very high-dimensional models, which we demonstrate for the randomly forced three-dimensional Navier–Stokes equations in Sect. 4.2. The idea is that for the eigenvectors \(\delta \eta \) of \(A_z\), only \({{\,\textrm{rank}\,}}\sigma \) entries are relevant due to the composition with \(\sigma \) and \(\sigma ^\top \). Eigenvalue solvers hence act on \(n_t \times {{\,\textrm{rank}\,}}\sigma \) vectors, which should fit into memory. This is similar to the computation of the instanton itself, where only the instanton noise \(\eta _z\) as an \(n_t \times {{\,\textrm{rank}\,}}\sigma \) vector is computed and stored explicitly, as discussed by Grafke et al. (2015), Schorlepp et al. (2022). The remaining challenge is then to evaluate \(A_z \delta \eta \) for given \(\delta \eta \in \mathbb {R}^{n_t \times {{\,\textrm{rank}\,}}\sigma }\) by solving the second order adjoint equations (23) without storing the full, prohibitively large \(n_t \times n\) arrays needed for \(\phi _z\), \(\gamma \) and \(\theta _z\). Similar to the gradient itself, evaluated via the first order adjoint approach (19), this is possible through (static) checkpointing (Griewank and Walther 2000), as illustrated in Fig. 4. At the cost of integrating the first order adjoint equations repeatedly for each noise vector \(\delta \eta \) to which \(A_z\) is applied, and of recursively re-solving the forward equations for \(\phi _z\) and \(\gamma \), the memory requirements for the spatially dense fields are only \({{\mathcal {O}}}\left( \log n_t \cdot n \right) \) this way. The same problem is encountered and solved similarly in implementations of Newton solvers for high-dimensional PDE-constrained optimal control problems (Hinze and Kunisch 2001; Hinze et al. 2006; Sternberg and Hinze 2010; Cioaca et al. 2012). All in all, in contrast to the Riccati formalism, this yields a simple and controlled strategy to treat very large spatial dimensions within the Fredholm-based prefactor approach, as long as the diffusion matrix possesses a comparably small rank. Note, however, that the number of eigenvalues needed to approximate \(\det ({{\,\textrm{Id}\,}}- A_z)\) must still remain small for this approach to be applicable in practice. We show numerically in Sect. 4.2 that this is indeed the case for the three-dimensional Navier–Stokes equations as an example. The discussion of this paragraph, with all relevant scalings of computational and memory costs for the different approaches, is briefly summarized in Table 1.

Table 1 Overview of computational and memory costs for finding the prefactor \(C_F(z)\) either through solving the Riccati equation (27), or through determining m dominant eigenvalues of \(A_z\).
Fig. 4

Sketch of the checkpointing procedure used to evaluate the second variation operator \(\delta ^2(\lambda F) / \delta \eta ^2\) at \(\eta _z\), applied to \(\delta \eta \), for large system dimensions \(n \gg 1\) in a memory-efficient way. The instanton noise \(\eta _z\), the input noise fluctuation \(\delta \eta \), and the return vector \(\sigma ^\top \zeta \) are all stored as dense \((n_t + 1, {{\,\textrm{rank}\,}}\sigma )\)-arrays for \({{\,\textrm{rank}\,}}\sigma \ll n\). Given the instanton noise \(\eta _z\) and a noise fluctuation \(\delta \eta \), the first step consists of solving the state equation and linearized state equation for \(\phi _z\) and \(\gamma \) simultaneously forward in time from \(t = 0\) to \(t = T\), and storing the fields \(\phi _z(t_i) \in \mathbb {R}^n\) and \(\gamma (t_i) \in \mathbb {R}^n\) only at the logarithmically spaced instances \(t_i = \bullet \). Afterwards, the first and second order adjoint equations for \(\theta _z\) and \(\zeta \) are solved simultaneously backwards in time from \(t = T\) to \(t = 0\), and \(\sigma ^\top \zeta (t_i)\) is stored for each \(t_i\). Whenever \(\phi _z(t_j)\) and \(\gamma (t_j)\) are needed for the time integration but not yet available in storage, the two forward equations are solved again from the nearest preceding point in time at which they are available, and recursively stored at intermediate steps \(\square \), \(\bigtriangleup \), \(\bigtriangledown \), etc. All fields \(\phi _z(t_i) \in \mathbb {R}^n\) and \(\gamma (t_i) \in \mathbb {R}^n\) that are no longer needed during the backwards integration are deleted from memory
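The following Python sketch illustrates the idea behind this checkpointing strategy in a deliberately simplified, non-recursive form: only a fixed number of forward checkpoints is kept, and missing forward states are recomputed from the nearest earlier checkpoint during the backward sweep. The step functions, the zero adjoint final condition and the uniform checkpoint spacing are assumptions of this sketch; the static binomial checkpointing of Griewank and Walther (2000) referenced above achieves the \({{\mathcal {O}}}(\log n_t \cdot n)\) memory bound with less recomputation.

```python
import numpy as np

def backward_with_checkpoints(step_fwd, step_bwd, x0, n_t, n_ckpt=16):
    """Memory-lean backward sweep: keep only `n_ckpt` forward checkpoints and
    recompute intermediate forward states from the nearest earlier checkpoint
    when the adjoint integration needs them. `step_fwd(x, i)` advances the
    forward state by one step, `step_bwd(p, x, i)` advances the adjoint one
    step backwards given the forward state at step i; both are user-supplied."""
    ckpt_idx = np.unique(np.linspace(0, n_t - 1, n_ckpt).astype(int))
    ckpts = {}
    x = x0
    for i in range(n_t):                      # forward pass: store sparse checkpoints
        if i in ckpt_idx:
            ckpts[i] = x.copy()
        x = step_fwd(x, i)
    p = np.zeros_like(x0)                     # adjoint final condition (here: zero)
    for i in reversed(range(n_t)):            # backward pass
        j = max(k for k in ckpts if k <= i)   # nearest stored checkpoint
        xi = ckpts[j].copy()
        for k in range(j, i):                 # recompute forward state at step i
            xi = step_fwd(xi, k)
        p = step_bwd(p, xi, i)
    return p
```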

In conclusion, we recommend using the Riccati equation only in sufficiently “nice” situations for small to moderate system dimensions n. For such systems and diffusion matrices without low-rank properties, and as long as no additional complications such as non-convex rate functions or removable singularities of the Riccati solution are encountered, it is faster than the eigenvalue-based approach, and better suited to analytical computations or approximations since it only involves the solution of initial value problems, in contrast to the boundary value problems that need to be solved to find eigenfunctions of the projected second variation operator \(A_z\). On the other hand, the Fredholm determinant computation through dominant eigenvalues is easier to use and implement, requiring only solvers for the original SDE, its adjoint, as well as their linearizations. At the cost of introducing numerical errors and a step size parameter \(h>0\) that needs to be adjusted, one can also approximate the second variation evaluations via

$$\begin{aligned} \left. \frac{\delta ^{2}\left( \lambda _z F \right) }{\delta \eta ^{2}}\right| _{\eta _z} \delta \eta \approx \frac{1}{h} \left( \left. \frac{\delta \left( \lambda _z F \right) }{\delta \eta } \right| _{\eta _z + h \delta \eta } - \left. \frac{\delta \left( \lambda _z F \right) }{\delta \eta }\right| _{\eta _z} \right) \end{aligned}$$
(33)

or other finite difference approximations, which do not require implementing any second order variations. In this sense, both the numerical instanton and the leading-order prefactor computation can quickly be achieved in a black-box-like, non-intrusive way when solvers for the state equation and its adjoint are available. Alternatively, the adjoint solver, as well as solvers for the second order tangent and adjoint equations, can be obtained through automatic differentiation (Naumann 2011). We also note that in the context of the second order reliability method, there exist further approximation methods that could be used here for the Fredholm determinant prefactor, e.g. by extracting information from the gradient-based optimization method that has been used to find the instanton or design point (Der Kiureghian and De Stefano 1991), or by constructing a non-infinitesimal parabolic approximation to the extreme event set (Der Kiureghian et al. 1987).
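A minimal sketch of this finite-difference shortcut, assuming a routine grad(eta) that returns the first-order gradient \(\delta (\lambda _z F)/\delta \eta \) from one adjoint solve (19) (the routine name and the fixed step size are illustrative):

```python
def second_variation_fd(grad, eta_z, deta, h=1e-4):
    """Forward-difference approximation (33) of the second variation at eta_z
    applied to deta, using only first-order adjoint gradients."""
    return (grad(eta_z + h * deta) - grad(eta_z)) / h
```

In practice, grad(eta_z) can be computed once and reused for every application of the approximate \(A_z\), so each matrix–vector product then costs a single additional gradient evaluation; the resulting matvec can be passed directly to an iterative eigenvalue solver as sketched in Sect. 2.4.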

In any case, for the scenario of possibly multi-dimensional SPDEs with low-rank forcing, we argue that the eigenvalue approach is to be preferred as it leads to natural approximations and a simpler implementation. However, we remark that in the case of SDEs with multiplicative noise, or SPDEs with spatially white noise that need to be renormalized such as the Kardar–Parisi–Zhang (KPZ) equation, the Riccati approach remains structurally unchanged (Schorlepp et al. 2023), whereas the Fredholm determinant expression turns into a Carleman-Fredholm determinant and an additional operator trace (Ben Arous 1988), which could potentially be more costly to evaluate.

3 Probabilistic interpretation via fluctuation covariances and transition tubes

In this section, we give an intuitive interpretation for some of the quantities encountered in the previous sections. The second variation quantifies the linearized dynamics of the SDE (9) around the most likely realization. This implies that the dominant eigenfunctions of the second variation correspond to the fluctuation modes that are most easily observable. Below, we confirm this with a simple numerical experiment that relates the eigenfunction information to the transition tube along a rare trajectory. The basic object that we consider in this section is the process \((X_t^\varepsilon )_{t \in [0,T]}\) as \(\varepsilon \downarrow 0\), conditioned on the rare outcome \(f(X_T^\varepsilon ) = z\) at final time. In other words, we consider only transition paths between the fixed initial state \(x \in \mathbb {R}^n\) and any final state in the target set \(f^{-1}(\{z\}) \subset \mathbb {R}^n\). The path on which the transition path ensemble concentrates as \(\varepsilon \downarrow 0\) is given by the state variable instanton trajectory \(\phi _z\), i.e. the most likely way for the system to achieve \(f(X_T^\varepsilon ) = z\), since deviations from it are suppressed exponentially (Freidlin and Wentzell 2012). One thus has

$$\begin{aligned} \lim _{\varepsilon \downarrow 0} \mathbb {E}\left[ X_t^\varepsilon \mid f(X_T^\varepsilon )= z \right] = \phi _z(t) \end{aligned}$$
(34)

for the mean of the conditioned process. In this sense, by taking conditional averages of direct Monte Carlo simulations of (9) as \(\varepsilon \) tends to 0, the instanton trajectory \(\phi _z\) is directly observable, and the mean realization agrees with the most likely one for \(\varepsilon \downarrow 0\). This procedure is sometimes called filtering, and has been carried out e.g. for the one-dimensional Burgers equation (Grafke et al. 2013), the three-dimensional Navier–Stokes equations (Grafke et al. 2015; Schorlepp et al. 2022) and the one-dimensional KPZ equation (Hartmann et al. 2021). Using the results of the previous sections, we can, however, make this statement more precise and state a central limit-type theorem for the conditioned fluctuations at order \(\sqrt{\varepsilon }\) around the instanton: As \(\varepsilon \downarrow 0\), the process \((X_t^\varepsilon - \phi _z(t)) / \sqrt{\varepsilon }\), conditioned on \(f(X_T^\varepsilon ) = z\), becomes centered Gaussian. It is hence fully characterized by its covariance function \({{\mathcal {C}}}_z :[0,T] \times [0,T] \rightarrow \mathbb {R}^{n \times n}\), given by

$$\begin{aligned} {{\mathcal {C}}}_z(t,t') = \lim _{\varepsilon \downarrow 0} \mathbb {E}\left[ \frac{(X_t^\varepsilon - \phi _z(t))\otimes (X_{t'}^\varepsilon - \phi _z(t'))}{\varepsilon } \bigg \vert f(X_T^\varepsilon ) = z\right] . \end{aligned}$$
(35)

We show in “Appendix A4” that \({{\mathcal {C}}}_z\) is fully determined through the orthonormal eigenfunctions \(\delta \eta ^{(i)}_z\) of the projected second variation operator \(A_z\) with corresponding eigenvalues \(\mu _z^{(i)}\) and associated state variable fluctuations \(\gamma ^{(i)}_z\), the solution of the linearized state equation

$$\begin{aligned} \dot{\gamma }^{(i)}_z = \nabla b (\phi _z) \gamma ^{(i)}_z + \sigma \delta \eta ^{(i)}_z\,, \quad \gamma ^{(i)}_z(0) = 0\,, \end{aligned}$$
(36)

via

$$\begin{aligned} {{\mathcal {C}}}_z(t,t') = \sum _{i = 1}^\infty \frac{\gamma ^{(i)}_z(t) \otimes \gamma ^{(i)}_z(t')}{1 - \mu _z^{(i)}}\,. \end{aligned}$$
(37)

In particular, computing the eigenvalues and eigenfunctions of \(A_z\) yields a complete characterization of the conditioned Gaussian fluctuations around the instanton. As detailed in the example below, at small but finite \(\varepsilon \), \({{\mathcal {C}}}_z\) can be used to approximate the distribution of transition paths at any time \(t \in [0,T]\) as multivariate normal \({{\mathcal {N}}}(\phi _z(t), \varepsilon {{\mathcal {C}}}_z(t,t))\). Effectively, in addition to the mean transition path at small noise, the instanton \(\phi _z\), we can also estimate the width and shape of the transition tube around it at any \(t \in [0,T]\), without sampling, within a Gaussian process approximation of the conditioned SDE; see Vanden-Eijnden (2006) for a general introduction to transition path theory, and Archambeau et al. (2007), Lu et al. (2017) for Gaussian process approximations of SDEs based on minimizing the path space Kullback–Leibler divergence, which, in the small-noise limit and for transition paths, reduce to the Gaussian process considered here. Furthermore, one can show that the forward Riccati approach of Sect. 2.3 recovers the final-time state variable fluctuation covariance via

$$\begin{aligned} {{\mathcal {C}}}_z(T,T)&= Q_z(T) U_z^{-1} \nonumber \\ {}&\quad - \frac{\left( Q_z(T) U_z^{-1} \nabla f(\phi _z(T) )\right) ^{\otimes 2}}{\left\langle \nabla f(\phi _z(T)), Q_z(T) U_z^{-1} \nabla f(\phi _z(T)) \right\rangle _n}\,. \end{aligned}$$
(38)

This follows directly by adapting the forward Feynman–Kac computation used in Remark 4 of Schorlepp et al. (2021) to the present calculation of the covariance function (35) at final time \(t = t' = T\). Note that, both directly from (38) and from (37) after a short calculation carried out in “Appendix A5”, one can see that these results are consistent with the additional final-time boundary condition for the state variable fluctuations

$$\begin{aligned} \lim _{\varepsilon \downarrow 0} \left\langle \nabla f(\phi _z(T)), \frac{X_T^\varepsilon - \phi _z(T)}{\sqrt{\varepsilon }} \right\rangle _n = 0, \end{aligned}$$
(39)

almost surely, when conditioning on \(f(X_T^\varepsilon ) = z\). In words, the conditioned Gaussian fluctuations at final time are constrained to the tangent plane of the equi-observable hypersurface \(f^{-1}(\{z\})\) at the point \(\phi _z(T)\).
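Given the computed eigenpairs, the covariance (37) can be assembled directly. A minimal Python sketch, assuming the state-variable responses \(\gamma ^{(i)}_z\) of (36) are stored as an array gammas of shape (m, n_t + 1, n) and truncating the series after the m computed modes (the array names and shapes are assumptions of this sketch):

```python
import numpy as np

def conditioned_covariance(gammas, mus, i_t, j_t):
    """Truncated evaluation of the fluctuation covariance (37) at time indices
    (i_t, j_t); `mus` are the eigenvalues mu_z^(i) of A_z, assumed < 1."""
    weights = 1.0 / (1.0 - np.asarray(mus))
    return np.einsum('i,ia,ib->ab', weights, gammas[:, i_t, :], gammas[:, j_t, :])
```

At small but finite \(\varepsilon \), the transition tube at time \(t\) is then approximated by the multivariate normal \({{\mathcal {N}}}(\phi _z(t), \varepsilon {{\mathcal {C}}}_z(t,t))\) with the matrix returned above, as in Fig. 5 below.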

Fig. 5

Results of numerically computing \(10^5\) transition paths from \(x = 0\) to the target set \(f^{-1}(\{z\})\) for the model SDE (10) with \(z = 3\) and \(\varepsilon = 0.5\) using instanton-based importance sampling (Ebener et al. 2019). We visualize the transition tube information obtained from the eigenvalues and eigenfunctions of the projected second variation operator. The upper left subfigure shows the histogram of the full data set for all times. The remaining subfigures show histograms of the transition paths at specific times t. The black lines, as a comparison, are the level sets of the normal PDF with covariance \(\varepsilon {{\mathcal {C}}}_z(t,t)\), found by evaluating (37) numerically, and mean \(\phi _z(t)\). Note that the deformation of the distribution of \(X_t^\varepsilon \), conditioned on \(f(X_T^\varepsilon ) = z\), is captured quite well using the quadratic, sampling-free approximation

As in the previous sections, we use the model SDE (10) with \(z = 3\) and \(\varepsilon = 0.5\) to illustrate these findings. To do this, we compare the PDF of \(X_t^\varepsilon \) at different times t, when conditioning on \(f(X_T^\varepsilon ) = z\), as obtained via sampling, to the Gaussian approximation \(\mathcal{N}(\phi _z(t), \varepsilon {{\mathcal {C}}}_z(t,t))\) that we evaluate using the instanton as well as eigenvalues and eigenfunctions of \(A_z\) that were computed previously. We use instanton-based importance sampling (Ebener et al. 2019) to generate \(10^5\) trajectories of (10) that satisfy \(f(X_T^\varepsilon ) = z\) up to a given precision \(f((X_T^\varepsilon - \phi _z(T))/\sqrt{\varepsilon }) < 0.05\); the corresponding code, which again uses Euler steps with an integrating factor and a step size of \(\Delta t = 5 \cdot 10^{-4}\), can be found in the GitHub repository (Schorlepp et al. 2023). Essentially, instead of using (9) directly, we shift the system by the instanton (cf. Tong et al. (2021) for a visualization and further analysis), solve

$$\begin{aligned} \textrm{d}Y_t^\varepsilon \!=\! \frac{b\left( \phi _z(t) + \sqrt{\varepsilon } Y_t^\varepsilon \right) - b(\phi _z(t))}{\sqrt{\varepsilon }} \textrm{d}t + \sigma \textrm{d}B_t\,, \quad Y_0^\varepsilon \!=\! 0\,, \end{aligned}$$
(40)

and reweight the samples by

$$\begin{aligned}&\exp \bigg \{ \varepsilon ^{-1} \int _0^T \big \langle b\left( \phi _z(t) + \sqrt{\varepsilon } Y_t^\varepsilon \right) - b(\phi _z(t)) \nonumber \\ {}&- \sqrt{\varepsilon } \nabla b(\phi _z(t)) Y_t^\varepsilon ,\theta _z \big \rangle _n \textrm{d}t + \varepsilon ^{-1} \lambda \big (f\left( \phi _z(T) + \sqrt{\varepsilon } Y_T^\varepsilon \right) \nonumber \\ {}&\qquad - f(\phi _z(T)) - \sqrt{\varepsilon } \nabla f(\phi _z(T)) Y_T^\varepsilon \big ) \bigg \}. \end{aligned}$$
(41)

The results are shown in Fig. 5, and we observe good agreement between the sampled conditioned distributions at times \(t \in \{0.05, 0.25, 0.5, 0.75, 0.95 \}\) and the corresponding theoretical small-noise Gaussian approximations. In particular, the deformation of the fluctuation PDF along the instanton trajectory \(\left( \phi _z(t)\right) _{ t \in [0,T]}\) is captured by the Gaussian approximation. It is not surprising that the Gaussian approximation works well for the parameters \(\varepsilon , z\) and T used here, since the probability \(P_F^\varepsilon (z)\) in Sect. 1.1 as approximated by the Laplace method also matched the direct sampling estimate.
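To make the sampling procedure of (40)–(41) concrete, the following Python sketch generates one shifted trajectory with plain Euler–Maruyama steps and accumulates the logarithm of the reweighting factor along the way. The function names (b, grad_b, f, grad_f), the use of plain Euler–Maruyama instead of the integrating-factor scheme of the reference code, and the left-point quadrature of the time integral in (41) are assumptions of this sketch.

```python
import numpy as np

def shifted_sample_and_weight(b, grad_b, f, grad_f, phi, theta, lam,
                              sigma, eps, dt, rng):
    """Simulate one trajectory of the shifted system (40) with Euler-Maruyama
    steps and accumulate the log of the reweighting factor (41). `phi`, `theta`
    hold the instanton state and adjoint on the time grid (shape (n_t + 1, n)),
    `lam` is the Lagrange multiplier, `sigma` the n x rank(sigma) noise matrix."""
    sqeps = np.sqrt(eps)
    Y = np.zeros(phi.shape[1])
    log_w = 0.0
    for i in range(phi.shape[0] - 1):
        db = b(phi[i] + sqeps * Y) - b(phi[i])            # drift difference in (40)
        log_w += np.dot(db - sqeps * grad_b(phi[i]) @ Y, theta[i]) * dt / eps
        Y = Y + (db / sqeps) * dt \
            + sigma @ rng.normal(size=sigma.shape[1]) * np.sqrt(dt)
    log_w += lam * (f(phi[-1] + sqeps * Y) - f(phi[-1])
                    - sqeps * grad_f(phi[-1]) @ Y) / eps
    return Y, np.exp(log_w)
```

Trajectories whose final state misses the target set within the chosen tolerance are discarded, and the accepted, reweighted samples yield the conditioned histograms shown in Fig. 5.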

4 Computational examples

We now apply the numerical methods introduced in the previous section to two high-dimensional examples involving SPDEs: In Sect. 4.1, we consider the Korteweg–De Vries equation in one spatial dimension, subject to spatially smooth Gaussian noise, and compute precise estimates for the probability to observe large wave heights at one instance in space and time. We compare our asymptotically sharp estimates to direct sampling, and also explicitly compare the two different prefactor computation strategies. Then, we focus on the stochastically forced three-dimensional incompressible Navier–Stokes equations in Sect. 4.2. This is a much higher-dimensional problem, and we demonstrate that the eigenvalue-based prefactor computation indeed remains applicable in practice for this example. Note that both SPDE examples in this section have periodic boundary conditions in space, but this is not a restriction of the method and has merely been chosen for convenience.

4.1 Stochastic Korteweg–De Vries equation

Fig. 6

Left column: Rate function \(I_F\) (top) and leading order prefactor \(C_F\) (bottom) for the KdV equation (42) with height observable (44), as obtained from numerical instanton and prefactor computations. Note that the prefactor depends strongly, almost exponentially, on the observable value z in this example. Right: Comparison of the LDT estimate (7) for different noise levels \(\varepsilon \in \{0.1, 1, 10\}\) to direct sampling for the SPDE (42). For each \(\varepsilon \), we computed \(4 \cdot 10^4\) samples of \(f\left( u^\varepsilon (\cdot , T) \right) \) to estimate the tail probabilities for various z. The shaded regions are \(95\%\) Wilson score intervals (Brown et al. 2001) for the sampling estimate of the tail probabilities. The solid lines show the asymptotically sharp estimate (7) without adjustable parameters. In comparison to this, the dashed lines show just the leading order LDT term \(\exp \left\{ -I_F(z) / \varepsilon \right\} \) with a constant prefactor (chosen such that the curve matches (7) for large z), which shows that the prefactor \(C_F\) is absolutely necessary to obtain useful results in this example for \(\varepsilon > 0.1\), as can be understood from the left column of the figure. Results use \(n_x = 1024\), \(n_t = 4000\) for the instanton computations; a pseudo-spectral code with integrating factor; L-BFGS optimization with a penalty term for the observable; 80 eigenvalues with largest absolute value for the Fredholm determinant; and stochastic Heun steps of size \(\Delta t = 10^{-3}\) for direct sampling

To illustrate the instanton and prefactor computation, we study the Korteweg–De Vries (KdV) equation subject to large-scale smooth Gaussian noise. The KdV equation can be considered as a model for shallow water waves, so the problem we are interested in is to estimate the probability of observing large wave amplitudes. Since this is the first PDE example we study and the general theory in the previous sections has only been developed for ODEs, we explicitly state the instanton equations, second order adjoint equations and Riccati equation. We consider a field \(u^\varepsilon :[0,l = 2 \pi ] \times [0,T = 1] \rightarrow \mathbb {R}\) with periodic boundary conditions in space satisfying the SPDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u^\varepsilon + u^\varepsilon \partial _x u^\varepsilon - \nu \partial _{xx} u^\varepsilon + \kappa \partial _{xxx} u^\varepsilon = \sqrt{\varepsilon }\eta \,,\\ u^\varepsilon (\cdot , 0) = 0\,, \end{array}\right. } \end{aligned}$$
(42)

with constants \(\nu = \kappa = 4 \cdot 10^{-2}\) and white-in-time, centered and stationary Gaussian forcing

$$\begin{aligned} \mathbb {E}\left[ \eta (x,t)\eta (x',t') \right] = \chi (x-x') \delta (t-t') \,. \end{aligned}$$
(43)

We choose \(\hat{\chi }_k = \delta _{\left|k\right|,1} / (2 \pi )\) as the spatial correlation function of the noise \(\eta \) in Fourier space, with \(\,\hat{}\,\) denoting the spatial Fourier transform. Concretely, \(\eta (x,t)\) is then given by \(\eta (x,t) = \pi ^{-1/2} (\dot{B}_1(t) \sin (x) + \dot{B}_2(t) \cos (x))\), where \(B_1,B_2\) are independent standard one-dimensional Brownian motions. Hence, the forcing only acts on a single large scale Fourier mode, and excitations of all other modes are due to the nonlinearity of the SPDE. As our observable, we choose the wave height at the origin

$$\begin{aligned} f(u(\cdot , T)) = u(0,T)\,, \end{aligned}$$
(44)

and we want to quantify the tail probability \(P_F^\varepsilon (z) = \mathbb {P}\left[ f(u^\varepsilon (\cdot , T)) \ge z \right] \) for different \(z > 0\). Note that the effective dimension of the system when formulated in terms of the noise for our choice of noise correlation is small, and we have \({{\,\textrm{rank}\,}}\sigma = 2 \ll n = n_x\) for typical spatial resolutions. Unless otherwise specified, we use \(n_x = 1024\) for all numerical results in this section, as well as \(n_t = 4000\) equidistant points in time, and we expect the prefactor computation in terms of eigenvalues of \(A_z\) to be more efficient in this example, even though the Riccati approach still remains feasible.

We use a pseudo-spectral code and explicit second order Runge-Kutta steps in time with an integrating factor for the linear terms. The final-time constraint is treated with the augmented Lagrangian method. Denoting the state space instanton by \(u_z\) with adjoint variable \(p_z\) and Lagrange multiplier \(\lambda _z\), the first-order necessary conditions at the minimizers read

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t u_z =-u_z \partial _x u_z + \nu \partial _{xx} u_z - \kappa \partial _{xxx} u_z + \chi * p_z\,,\\ \partial _t p_z = -u_z \partial _x p_z - \nu \partial _{xx} p_z - \kappa \partial _{xxx} p_z \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} u_z(\cdot , 0) = 0\,, \quad f(u_z(\cdot , 1)) = z\,,\\ p_z(x, 1) = \lambda _z \delta (x)\,. \end{array}\right. } \end{aligned}$$
(45)

Here, \(*\) denotes spatial convolution, which appears due to the stationarity of the forcing.
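For reference, a single integrating-factor time step for the forward equation in (45) could look as follows in a pseudo-spectral Python sketch; the first-order Euler variant, the array conventions and the omission of dealiasing are simplifications of this sketch, whereas the reference code uses explicit second order Runge–Kutta steps.

```python
import numpy as np

def kdv_if_euler_step(u_hat, forcing_hat, k, dt, nu=4e-2, kappa=4e-2):
    """One integrating-factor Euler step for the forward equation in (45) in
    Fourier space: the linear terms nu*u_xx - kappa*u_xxx are integrated
    exactly via the exponential factor, while the advection term -u*u_x and
    the forcing chi * p_z (passed as `forcing_hat`) are treated explicitly."""
    u = np.fft.irfft(u_hat)
    nonlin_hat = -0.5j * k * np.fft.rfft(u * u)   # -u u_x = -(1/2) d_x (u^2)
    L = -nu * k**2 + 1j * kappa * k**3            # Fourier symbol of nu d_xx - kappa d_xxx
    return np.exp(L * dt) * (u_hat + dt * (nonlin_hat + forcing_hat))
```

Here k denotes the array of integer wavenumbers of the real FFT, e.g. np.fft.rfftfreq(n_x, d=2 * np.pi / n_x) * 2 * np.pi for the domain \([0, 2\pi ]\).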

As a starting point, we compute instantons for a range of equidistantly spaced observable values \(z \in [0, 30]\). Knowledge of the instanton for different z gives us access to the rate function \(I_F\) of the observable, which is shown on the left in Fig. 6.

In the table in Fig. 8, we show for fixed z how the value of \(I_F(z)\) converges when increasing the spatio-temporal resolution, and in particular that the number of optimization steps needed to find the instanton is robust under changes of the numerical resolution, indicating scalability of the instanton computation. The numerical details for these instanton computations are as follows (cf. Schorlepp et al. (2022)): initial control \(p \equiv 0\) and initial Lagrange multiplier \(\lambda = 0\); precise target observable value \(z = 8.39125\); 6 logarithmically spaced penalty parameters from 1 to 300 for the augmented Lagrangian method; optimization terminated upon reduction of the gradient norm by a factor of \(10^6\); the same (presumably) global minimizer found for each resolution; discretize-then-optimize; L-BFGS solver with 4 updates stored; Armijo line search.
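A minimal outer loop realizing this augmented Lagrangian setup could be sketched as follows in Python, assuming a routine aug_lagrangian(eta, lam, mu) that returns the value and adjoint-based gradient of the augmented Lagrangian in the noise variables, and a routine observable(eta) returning \(f(u[\eta ](\cdot , T))\). The multiplier update sign convention, the use of scipy's L-BFGS-B in place of the authors' solver, and the omitted gradient-norm termination test are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import minimize

def instanton_augmented_lagrangian(aug_lagrangian, observable, z, eta0,
                                   n_penalty=6, mu_max=300.0):
    """Outer augmented Lagrangian loop with logarithmically spaced penalty
    parameters from 1 to mu_max, L-BFGS inner solves with a small history,
    and a first-order Lagrange multiplier update after each inner solve."""
    eta, lam = np.asarray(eta0, dtype=float).copy(), 0.0
    for mu in np.logspace(0.0, np.log10(mu_max), n_penalty):
        res = minimize(aug_lagrangian, eta, args=(lam, mu), jac=True,
                       method='L-BFGS-B', options={'maxcor': 4})
        eta = res.x
        lam = lam - mu * (observable(eta) - z)  # multiplier update (sign is convention-dependent)
    return eta, lam
```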

Two comments on the instanton computations for this example are in order: Firstly, the observable rate function is non-convex for some z in the interval [1.5, 5] (not visible in the figure). This poses a problem for the dual problem solved at fixed \(\lambda \) without penalty, but is not an issue for the penalty or augmented Lagrangian strategy that we used. Furthermore, this means that the Riccati prefactor computation is not directly applicable in this region, but the Fredholm expression remains valid. Secondly, since it is a priori unclear whether the minimization problem for the instanton has a unique solution (the target functional is quadratic, but the constraint is nonlinear), we started multiple optimization runs for the same z at different random initial conditions. In the KdV system, we found multiple subdominant minima consisting of several large wave crests (as opposed to a single crest for the dominant minimizer, which is shown in the top left of Fig. 9 for one z), but only took the (presumably) global minimizer for subsequent estimates.

To complete the asymptotic estimate of the wave height probability via (7), we further need the prefactor \(C_F(z)\) for all z, which we compute by finding the dominant eigenvalues of \(A_z\) as before. We specify the input and output of the linear operator \(A_z\) only in terms of the two real Fourier modes of the noise that are relevant here, to minimize the memory cost of the eigenvalue solver. The second order adjoint equations (23) for noise fluctuations \(\delta \eta :[0,2 \pi ] \times [0,1] \rightarrow \mathbb {R}\) for the KdV equation read

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t \delta u =-\partial _x (u_z \delta u) + \nu \partial _{xx} \delta u - \kappa \partial _{xxx} \delta u + \chi ^{1/2} * \delta \eta \,,\\ \partial _t \delta p = -\delta u \partial _x p_z -u_z \partial _x \delta p - \nu \partial _{xx} \delta p - \kappa \partial _{xxx} \delta p\,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \delta u (\cdot , 0) = 0\,,\\ \delta p(\cdot , 1) = 0\,, \end{array}\right. } \end{aligned}$$
(46)

with \(A_z \delta \eta = \chi ^{1/2} * \delta p\). In our implementation, we supply the second variation operator with the two real Fourier coefficients \(\left( \text {Re} \, \widehat{\delta \eta }_1(t_i)\right) _{i = 0, \dots , n_t}\) and \(\left( \text {Im} \, \widehat{\delta \eta }_1(t_i) \right) _{i = 0, \dots , n_t}\), assemble the full fluctuation vector \(\delta \eta \) from them, and return \(\chi ^{1/2} * \delta p\) in the same format after solving (46). As the KdV solutions fit into memory, checkpointing, as discussed in Sect. 2.4, is not necessary. In Fig. 7, we show the convergence of the determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\) for several values of z based on the computed eigenvalues, thereby demonstrating that a handful of eigenvalues suffices for an accurate approximation of the prefactor. The number of necessary eigenvalues increases only weakly with the observable value z in this example. In addition, Fig. 8 shows the effect of varying the spatio-temporal resolution \((n_x, n_t)\) on the determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\) for one particular observable value of \(z = 8.4\) at a fixed number of computed eigenvalues. We see that as long as the physical problem is resolved, the eigenvalue spectrum does not change much with the resolution, and the determinant converges when increasing the spatio-temporal resolution. This indicates that our methods are scalable, i.e., their cost does not increase with the temporal (and also spatial) discretization beyond the increased cost of the PDE solution. This is a crucial property of the eigenvalue-based prefactor computation and is in contrast with the Riccati approach.

The result for the prefactor \(C_F\) as a function of z is shown on the bottom left of Fig. 6. Note that the vertical axis is scaled logarithmically, i.e. the prefactor depends strongly on the observable value. The importance of the prefactor is further confirmed by the comparison of the complete asymptotic estimate (7) to the results of direct Monte Carlo simulations on the right in Fig. 6. For three values of \(\varepsilon \in \{0.1, 1, 10\}\), we performed \(4 \cdot 10^4\) simulations each of the stochastic KdV equation (42) to estimate the tail probability \(P_F^\varepsilon (z)\) without approximations. Using both the rate function and prefactor, excellent agreement with the Monte Carlo simulations is obtained. In contrast to this, only using the leading order LDT term \(\exp \left\{ -I_F(z)/\varepsilon \right\} \) with a constant prefactor leads to a much worse agreement with simulations, and in fact only works reasonably for \(\varepsilon = 0.1\). Note also that one can see from these comparisons that the actual effective smallness parameter for the asymptotic expression (7) to be valid is \(\varepsilon / h(z)\) for some monotonically increasing function h, meaning that the estimate is also valid for large \(\varepsilon \) as long as suitably large values of z are considered. In this sense, the estimate is truly an extreme event probability estimate, but we chose to work in terms of the formal parameter \(\varepsilon \) to have an explicit and general scaling parameter, in contrast to the example-specific function h(z). For works on large deviation principles directly in the limit \(z \rightarrow \infty \), see e.g. Dematteis et al. (2019), Tong et al. (2021).

Fig. 7

Result of numerically computing 80 eigenvalues \(\mu _z^{(i)}\) with largest absolute value of \(A_z\) for the KdV equation (42) with \(z \in \{1, 8.4, 19.9\}\). Main figure: absolute value of the eigenvalues \(\mu _z^{(i)}\) (dots: positive eigenvalues, crosses: negative eigenvalues). Inset: Finite product \(\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) \) for different m as an approximation for the Fredholm determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\). We see that the eigenvalues rapidly decay to zero for all z. Similarly, the cumulative product in the inset converges quickly, and the determinant is in fact well-approximated by less than 10 eigenvalues for all z

Fig. 8

Performance of instanton and prefactor computations for KdV problem with \(z=8.4\) for different spatio-temporal resolutions \((n_x, n_t) \in \{(32, 125), \dots , (1024, 4000)\}\). The table shows the number of optimization iterations required to compute the instanton, and the value of the objective \(I_F(z)\). The number of iterations does not increase with the resolution \((n_x, n_t)\). The bottom figure shows 80 eigenvalues \(\mu _z^{(i)}\) with largest absolute value of \(A_z\). The main figure shows the absolute value of the eigenvalues \(\mu _z^{(i)}\) (dots: positive eigenvalues, crosses: negative eigenvalues). The inset shows \(\prod _{i = 1}^{80} \left( 1 - \mu _z^{(i)} \right) \) for the different resolutions \((n_x, n_t)\) as an approximation for the Fredholm determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\), which is seen to converge with increasing resolution. Note that only for the lowest resolution, the eigenvalue spectrum shows noticeable deviations from the results at \((n_x, n_t) = (1024, 4000)\). The latter resolution has been used for all other numerical results on the KdV equation in this paper

In addition to the probability estimate itself, the instanton as well as the eigenvalues and eigenfunctions of \(A_z\) also carry physical information about the system, as discussed in general in Sect. 3. Figure 9 shows the instanton \(u_z\), i.e. the most likely field realization to reach a large wave height of \(z = 8.4\), and the dominant space-time fluctuations \(\delta u_z^{(i)}\) around it.

We further computed the Gaussian fluctuations around the instanton for \(z = 8.4\) at the final time \(t = T\), shown in Fig. 10. To this end, we also solved the forward Riccati equation (27), which here is a PDE for \({{\mathcal {Q}}}_z :[0,2 \pi ]^2 \times [0,1] \rightarrow \mathbb {R}\) and reads

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t {{\mathcal {Q}}}_z(x, y, t) = \chi (x - y)\\ \hspace{2cm}- \left[ \partial _x \left( u_z(x) \cdot \right) + \partial _y \left( u_z(y) \cdot \right) \right] {{\mathcal {Q}}}_z(x, y, t)\\ \hspace{2cm}+ \nu \left[ \partial _{xx} + \partial _{yy} \right] {{\mathcal {Q}}}_z(x, y, t) \\ \hspace{2cm}- \kappa \left[ \partial _{xxx} + \partial _{yyy} \right] {{\mathcal {Q}}}_z(x, y, t)\\ \hspace{2cm}+ \int _0^{2 \pi } {{\mathcal {Q}}}_z(x, x', t) \partial _{x'} p_z(x', t){{\mathcal {Q}}}_z(x', y, t) \textrm{d}x'\,,\\ {{\mathcal {Q}}}_z(\cdot , \cdot , t = 0) = 0\,, \end{array}\right. } \end{aligned}$$
(47)

using the same pseudo-spectral code and explicit second order Runge–Kutta steps with an integrating factor. The result for the prefactor agrees with the one obtained using the Fredholm determinant expression, with \(C_F(z = 8.4) \approx 1.0793 \cdot 10^{-2}\) using the eigenvalues and \(C_F(z = 8.4) \approx 1.0794 \cdot 10^{-2}\) from the Riccati approach with

$$\begin{aligned} C_F(z) = \frac{\exp \left\{ \tfrac{1}{2} \int _0^1 \textrm{d}t \int _0^{2 \pi } \textrm{d}x \; \partial _x p_z(x, t) {{\mathcal {Q}}}_z(x,x,t) \right\} }{\lambda _z \sqrt{{{\mathcal {Q}}}_z(0,0,1)}}\,. \end{aligned}$$
(48)

For this particular observable value, the Riccati equation could be integrated without numerical problems, but we encountered a removable singularity for larger observable values. The final-time covariance of the conditioned Gaussian fluctuations around the instanton, as predicted using either the Riccati solution (38) or the eigenfunctions and eigenvalues (37), indeed coincides for both approaches and is highly oscillatory (top row, center and right in Fig. 10). Denoting the eigenvalues and normalized eigenfunctions of the final-time covariance operator \({{\mathcal {C}}}_z(T,T)\) by \(\nu _z^{(i)}(T)\) and \(\delta v_z^{(i)}\), we see that only a handful of fluctuation modes \(\delta v_z^{(i)}\) are actually observable since the eigenvalues \(\nu _z^{(i)}(T)\) in the bottom left of Fig. 10 quickly decay. Using the eigenvalues and eigenfunctions, realizations of \(u^\varepsilon (\cdot , T)\) when conditioning on \(u^\varepsilon (0, T) = z = 8.4\) can now easily be sampled within the Gaussian approximation as

$$\begin{aligned} u^\varepsilon (x, T) \approx u_z(x, T) + \sqrt{\varepsilon } \sum _{i = 1}^\infty Z_i \sqrt{\nu _z^{(i)}(T)} \delta v_z^{(i)}(x) \end{aligned}$$
(49)

with \(Z_i\) independent and identically standard normally distributed. All in all, this example demonstrates the practical relevance and ease of applicability of the asymptotically sharp LDT estimate including the prefactor in a nonlinear, one-dimensional SPDE.
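Sampling from the Gaussian approximation (49) is then straightforward; a sketch assuming the eigenpairs of \({{\mathcal {C}}}_z(T,T)\) are stored as arrays nus (shape (m,)) and dvs (shape (m, n_x)), with the series truncated to the computed modes:

```python
import numpy as np

def sample_final_state(u_inst_T, nus, dvs, eps, rng, n_samples=10):
    """Approximate samples of u^eps(., T) conditioned on the extreme outcome,
    following (49): instanton plus Gaussian fluctuations along the
    eigenfunctions of the final-time covariance operator."""
    Z = rng.normal(size=(n_samples, len(nus)))   # i.i.d. standard normal coefficients
    return u_inst_T[None, :] + (Z * np.sqrt(eps * np.asarray(nus))) @ np.asarray(dvs)
```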

Fig. 9

Example instanton field \(u_z\) in space and time for \(z = 8.4\) (top left) for the KdV equation (42) and height observable (44), and the 5 dominant normalized state variable eigenfunctions \(\delta u_z^{(i)}\) of the projected second variation operator \(A_z\). Due to the KdV nonlinearity and linear wave dispersion, the large-scale forcing input is transformed into a large wave with a dominant peak at \(t = T\), \(x = 0\) for the instanton \(u_z\), i.e. the most likely field realization to obtain a large wave height \(z = 8.4\) at \(t = T\), \(x = 0\). The strongest fluctuations around the instanton resemble the instanton itself, but are necessarily centered around 0 with final-time height \(\delta u(0, T) = 0\) at the origin. Note that only two eigenvalues are larger than 0.1 in modulus, reflecting the small effective dimension of the system in the noise variable, and that \(\delta u_z^{(4)}\) already contains higher temporal modes

Fig. 10

Information on the conditioned final time Gaussian fluctuations around the KdV instanton for \(z = 8.4\), calculated from the quantities used to evaluate the prefactor \(C_F(z)\). Top, left: Riccati solution \(\mathcal{Q}_z( \cdot , \cdot , T = 1)\) at final time. Top, center: Projection of the Riccati solution, such that the constraint \(\delta u(0, T) = 0\) is satisfied. This way, the final time covariance \(\mathcal{C}_z(T,T)\) as given in (38) is obtained. Top, right: The same final time covariance \({{\mathcal {C}}}_z(T,T)\) constructed from the eigenvalues and eigenfunctions of \(A_z\) instead as in (37). The result is visually indistinguishable from the Riccati computations. Bottom, left: Eigenvalues \(\nu _z^{(i)}(T)\) of the covariance \({{\mathcal {C}}}_z(T,T)\). We see that the eigenvalues quickly decay to zero, and less than 10 fluctuation modes are in fact relevant. Bottom, center: Eigenfunctions \(\delta v_z^{(i)}\) for the 4 dominant eigenvalues \(\nu _z^{(i)}(T)\), \(i \in \{1,2,3,4\}\), which all necessarily satisfy \(\delta v_z^{(i)}(x = 0) = 0\). Bottom, right: Instanton \(u_z(\cdot , T)\) at final time (dashed line), and variance of conditioned Gaussian fluctuations around it for \(\varepsilon = 0.1\) (shaded area)

4.2 Stochastically forced incompressible three-dimensional Navier–Stokes equations

As a challenging, high-dimensional example, we consider the estimation of the probability of a high strain event in the stochastically forced incompressible three-dimensional Navier–Stokes equations. Our main goal here is to demonstrate that in addition to instantons for this problem, which were computed by Schorlepp et al. (2022), it is also numerically feasible to compute the leading order prefactor using the Fredholm determinant approach (14). Our setup hence follows the one treated by Schorlepp et al. (2022). A comprehensive analysis of the problem, including the behavior of the prefactor in the vicinity of the critical points of the dynamical phase transitions observed in this example, is beyond the scope of this paper. For other works on instantons and large deviations for the three-dimensional stochastic Navier–Stokes equations, see Falkovich et al. (1996), Moriconi (2004), Grafke et al. (2015), Apolinário et al. (2022). We consider a velocity field \(u^\varepsilon :[0,l = 2 \pi ]^3 \times [0,T = 1] \rightarrow \mathbb {R}^3\) with periodic boundary conditions in space that satisfies

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u^\varepsilon + \left( u^\varepsilon \cdot \nabla \right) u^\varepsilon - \Delta u^\varepsilon + \nabla P = \sqrt{\varepsilon }\eta \,,\\ \nabla \cdot u^\varepsilon = 0\,,\\ u^\varepsilon (\cdot , 0) = 0\,. \end{array}\right. } \end{aligned}$$
(50)

Here, P denotes the pressure which is determined through the divergence constraint. The forcing \(\eta \) is centered Gaussian, large-scale in space, white in time, and solenoidal with covariance

$$\begin{aligned} \mathbb {E}\left[ \eta (x,t)\eta (x',t')^\top \right] = \chi (x-x') \delta (t-t') \,, \end{aligned}$$
(51)

where a Mexican hat correlation function with correlation length 1

$$\begin{aligned} \chi (x) = \left[ 1_{3 \times 3} - \frac{1}{2} \left( \left\Vert x\right\Vert ^2 1_{3 \times 3} - x \otimes x\right) \right] \exp \left\{ -\frac{\left\Vert x\right\Vert ^2}{2} \right\} \,, \end{aligned}$$
(52)

is used. Note that this corresponds to the situation \({{\,\textrm{rank}\,}}\sigma \ll 3 n_x^3\) of Sect. 2.4, where only a small number of degrees of freedom is forced due to the Fourier transform \(\hat{\chi }\) decaying exponentially. As our observable, we consider the strain \(f(u) = \partial _3 u_3(x=0)\) at the origin. Denoting the Leray projection onto the divergence-free part of a vector field by \({{\mathcal {P}}}\), the instanton equations for \((u_z, p_z, \lambda _z)\) are given by

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t u_z =- {{\mathcal {P}}} \left[ \left( u_z \cdot \nabla \right) u_z \right] + \Delta u_z + \chi * p_z\,,\\ \partial _t p_z = - {{\mathcal {P}}} \left[ \left( u_z \cdot \nabla \right) p_z + \left( \nabla p_z \right) ^\top u_z\right] - \Delta p_z \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} u_z(\cdot , 0) = 0\,, \quad f(u_z(\cdot , 1)) = z\,,\\ p_z(\cdot , 1) = \lambda _z {{\mathcal {P}}} \left[ \left. \frac{\delta f}{\delta u} \right| _{u_z(\cdot , 1)} \right] \,. \end{array}\right. } \end{aligned}$$
(53)

With the instantons computed, we are able to evaluate the application of the second variation operator \(A_z\) to noise fluctuation vectors \(\delta \eta :[0,2\pi ]^3 \times [0,1] \rightarrow \mathbb {R}^3\) by solving the second order adjoint equations

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t \left( \delta u \right) = -{{\mathcal {P}}} \left[ (u_z \cdot \nabla ) \delta u + (\delta u \cdot \nabla ) u_z\right] \\ \hspace{1.4cm}+ \Delta \left( \delta u\right) + \chi ^{1/2} * \delta \eta \,,\\ \partial _t \left( \delta p\right) = - {{\mathcal {P}}} \big [ \left( \nabla p_z + \left( \nabla p_z \right) ^\top \right) \delta u + \left( u_z \cdot \nabla \right) \delta p \\ \hspace{1.4cm}+ \left( \nabla (\delta p) \right) ^\top u_z \big ] - \Delta \left( \delta p\right) \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \delta u(\cdot , 0) = 0\,,\\ \delta p(\cdot , 1) = 0\,. \end{array}\right. } \end{aligned}$$
(54)

We focus on \(z = -25\) here, where the unique instanton solution does not break rotational symmetry (Schorlepp et al. 2022). Numerically, we use a pseudo-spectral GPU code with a spatial resolution \(n_x = n_y = n_z = 128\), a temporal resolution of \(n_t = 512\), a nonuniform grid in time with smaller time steps close to \(T = 1\), and second order explicit Runge–Kutta steps with an integrating factor for the diffusion term. We truncated \(\chi \) in Fourier space by setting it to 0 for all k where \(\left|\hat{\chi }_k\right| < 10^{-14}\), leading to \(\left\Vert k\right\Vert \le 9\) and an effective real spatial dimension, independently of \(n_x\), of approximately \({{\,\textrm{rank}\,}}\sigma \approx 2 \cdot (2 \cdot 9)^3 = 11664\) for the noise (by taking a cube instead of sphere for the Fourier coefficients of the noise vectors that are stored, and noting that \(\hat{\chi }_k\) projects onto \(k^\perp \)). The evaluation of the second order adjoint equations is then possible with only a few GB of VRAM for this resolution when exploiting double checkpointing and low rank storage as described in Sect. 2.4. We computed the 600 largest eigenvalues of operator \(A_z\), again realized as a scipy.sparse.linalg.LinearOperator, by using scipy.sparse.linalg.eigs as before. We transfer the data to the GPU to evaluate the second variation applied to \(\delta \eta \) by solving (54) with PyCUDA (Klöckner et al. 2012), and transfer back \(\chi ^{1/2} * \delta p\) to the CPU afterwards. Computing 600 eigenvalues this way needs about 1200 operator evaluations, or about 30 hours on a modern workstation with Intel Xeon Gold 6342 CPUs at \(2.80\;\text {GHz}\) and an NVIDIA A100 80GB GPU. The main limitation for computing more eigenvalues is that the eigenvalue solver used stores all matrix vector products in RAM. This could be overcome by storing some of them on a hard disk, or using different algorithms that can be parallelized over multiple nodes such as randomized SVD (Maulik et al. 2021).

The results for the eigenvalues of \(A_z\) are shown in Fig. 11. We see that the absolute value of the eigenvalues decays such that the product \(\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) \) converges as m increases, but that even more than 600 eigenvalues would be needed for a more accurate result. For smaller observable values z, faster convergence is expected. Also, the spectrum of \(A_z\) shows a large number of doubly degenerate eigenvalues, which appear whenever the eigenfunctions break the axial symmetry of the instanton. This feature of the spectrum clearly depends on the domain and spatial boundary conditions that were chosen here. From the instanton computation, we obtain \(I_F(z) \approx 1900.7\) for the rate function, and from the 600 eigenvalues of \(A_z\) we estimate \(C_F(z) \approx 4.9 \cdot 10^{-3}\). With this, we can estimate that e.g. for \(\varepsilon = 250\), the probability to observe a strain event with \(\partial _3 u_3(x = 0,T = 1) \le -25\) is approximately \(1.5 \cdot 10^{-5}\), which matches the sampling estimate of \(P_F^{250}(-25) \in [1.3 \cdot 10^{-5}, 1.7 \cdot 10^{-5}]\) at \(95\%\) asymptotic confidence, as obtained from \(10^4\) direct numerical simulations of (50) (data set from Schorlepp et al. (2022)). For smaller \(\varepsilon \), the event becomes more rare, and it quickly becomes unfeasible to estimate its probability via direct sampling, whereas the quadratic estimate using the rate function and prefactor can be computed for any \(\varepsilon \) and is known to become more precise as the event becomes more difficult to observe in direct simulations. In addition to these probability estimates, we can also analyze the dominant Gaussian fluctuations around the instanton now and easily sample high strain events within the Gaussian approximation. Figure 12 shows the instanton \(u_z\) at final time, i.e. an axially symmetric pair of counter-rotating vortex rings, as well as the dominant eigenfunctions of \({{\mathcal {C}}}_z(T,T)\), corresponding to the fluctuation modes that are most easily observed at final time in conditioned direct numerical simulations. Note that the Riccati equation (27) would be a PDE for a six-dimensional matrix-valued field \(Q_z(x_1,x_2,x_3,y_1,y_2,y_3,t)\) here without obvious sparsity properties. Solvers for such a problem are quite expensive, if feasible at all, and also not easy to scale to higher spatial resolutions, whereas this is possible for the dominant eigenvalue approach.
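As a quick plausibility check of the quoted numbers, and assuming that the sharp estimate (7) takes the form \(P_F^\varepsilon (z) \approx \sqrt{\varepsilon / (2\pi )}\, C_F(z) \exp \left\{ -I_F(z)/\varepsilon \right\} \) (an assumption of this sketch, chosen because it reproduces the value quoted above), the estimate for \(\varepsilon = 250\) can be recomputed in a few lines:

```python
import numpy as np

# Values quoted in the text for the 3D Navier-Stokes strain event z = -25
I_F, C_F, eps = 1900.7, 4.9e-3, 250.0
P_est = np.sqrt(eps / (2.0 * np.pi)) * C_F * np.exp(-I_F / eps)
print(f"P_F^eps(z) ~ {P_est:.1e}")  # ~1.5e-05, inside the sampled 95% interval [1.3e-5, 1.7e-5]
```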

Fig. 11

Result of numerically computing 600 eigenvalues \(\mu _z^{(i)}\) with largest absolute value of \(A_z\) for the three-dimensional Navier–Stokes equations (50) with strain \(z = \partial _3 u_3(x = 0,T) = -25\), where the instanton is a rotationally symmetric pair of vortex rings. Main figure: absolute value of the eigenvalues \(\mu _z^{(i)}\). Inset: Finite product \(\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) \) for different m as an approximation for the Fredholm determinant \(\det ({{\,\textrm{Id}\,}}- A_z)\). We see in the main figure that the eigenvalues often appear in pairs, which happens whenever the eigenfunctions break the rotational symmetry of the problem, such that, due to the periodic box, there are two linearly independent eigenfunctions for the same eigenvalue. The inset shows that \(\det ({{\,\textrm{Id}\,}}- A_z)\) is approximately 11 in this example, but that even more eigenvalues would be needed to get an accurate result

Fig. 12

Visualization of the instanton and the dominant Gaussian fluctuations around it at final time \(T = 1\), for a strain event with \(z = \partial _3 u_3(x = 0,T) = -25\) for the three-dimensional Navier–Stokes equations (50). All three-dimensional images show isosurfaces of the vorticity or curl of the respective field. Top, left: The unique instanton for this observable value is a rotationally symmetric pair of vortex rings. Top, center: Eigenvalues \(\nu _z^{(i)}(T)\) of the final-time covariance operator \({{\mathcal {C}}}_z(T,T)\) of the conditioned Gaussian fluctuations around the instanton, approximated as \(\mathcal{C}_z(T,T) \approx \sum _{i = 1}^{600} [1 - \mu _z^{(i)}]^{-1} \delta u_z^{(i)}(\cdot , T) \otimes \delta u_z^{(i)}(\cdot , T)\) using the 600 eigenvalues \(\mu _z^{(i)}\) of the projected second variation operator \(A_z\) with largest absolute value. Top right, and second/third row: Normalized eigenfunctions \(\delta v_z^{(i)}\) of \({{\mathcal {C}}}_z(T,T)\) for its largest eigenvalues, indicating the strongest fluctuation directions around the strain instanton at final time \(t = T\)

5 Summary and outlook

In this paper, we have presented an asymptotically sharp, sampling-free probability estimation method for extreme events of stochastic processes described by additive-noise SDEs and SPDEs. The method can be regarded as a path-space SORM approximation. We have introduced and compared two different conceptual and numerical strategies to evaluate the pre-exponential factor appearing in these estimates, either through dominant eigenvalues of the second variation, corresponding to the standard formulation of precise Laplace asymptotics and SORM, or through the solution of matrix Riccati differential equations, which is possible for precise large deviations of continuous-time Markov processes. Highlighting the scalability of the first approach, we have shown that leading-order prefactors can be computed in practice even for very high-dimensional SDEs, and explicitly tested our methods in two SPDE examples. In all examples, the approximations showed good agreement with direct Monte Carlo simulations or importance sampling. We hope that the methods assembled in this paper are useful whenever sample path large deviation theory is used to obtain probability estimates in real-world examples.

There are multiple possible extensions of the methods presented in this paper. More general classes of SDEs and SPDEs could possibly be treated numerically within the eigenvalue-based approach, most notably SDEs with multiplicative Gaussian noise, but also SDEs driven by Lévy noise or singular SPDEs. Furthermore, one could try to generalize the approach to include any additive Gaussian noise that is colored in time instead of white. This would potentially lead to further dimensional reduction for the instanton and prefactor computation in examples with a slowly decaying temporal noise correlation. It would also be interesting to apply the eigenvalue-based prefactor computation strategy to metastable non-gradient SDEs. Regarding the numerical applicability of the Riccati method in the case of high-dimensional systems with low-rank forcing, there is an alternative formulation of the prefactor in terms of a backward-in-time Riccati equation (Grafke et al. 2021), which could be better suited for controlled low-rank approximations. In general, improvements of the quadratic approximation used throughout this paper via loop expansions, resummation techniques or non-perturbative methods from theoretical physics could be investigated. In this regard, it would be desirable to obtain simple criteria that indicate whether the SORM approximation considered in this paper can be expected to be accurate for given \(\varepsilon \) and z. Finally, one could use the instanton and additional prefactor information for efficient importance sampling of extreme events for S(P)DEs.