Scalable methods for computing sharp extreme event probabilities in infinite-dimensional stochastic systems

Schorlepp, Timo; Tong, Shanyin; Grafke, Tobias; Stadler, Georg

doi:10.1007/s11222-023-10307-2

Scalable methods for computing sharp extreme event probabilities in infinite-dimensional stochastic systems

Original Paper
Open access
Published: 13 October 2023

Volume 33, article number 137, (2023)
Cite this article

Download PDF

You have full access to this open access article

Statistics and Computing Aims and scope Submit manuscript

Scalable methods for computing sharp extreme event probabilities in infinite-dimensional stochastic systems

Download PDF

1883 Accesses
3 Citations
Explore all metrics

Abstract

We introduce and compare computational techniques for sharp extreme event probability estimates in stochastic differential equations with small additive Gaussian noise. In particular, we focus on strategies that are scalable, i.e. their efficiency does not degrade upon temporal and possibly spatial refinement. For that purpose, we extend algorithms based on the Laplace method for estimating the probability of an extreme event to infinite dimensional path space. The method estimates the limiting exponential scaling using a single realization of the random variable, the large deviation minimizer. Finding this minimizer amounts to solving an optimization problem governed by a differential equation. The probability estimate becomes sharp when it additionally includes prefactor information, which necessitates computing the determinant of a second derivative operator to evaluate a Gaussian integral around the minimizer. We present an approach in infinite dimensions based on Fredholm determinants, and develop numerical algorithms to compute these determinants efficiently for the high-dimensional systems that arise upon discretization. We also give an interpretation of this approach using Gaussian process covariances and transition tubes. An example model problem, for which we provide an open-source python implementation, is used throughout the paper to illustrate all methods discussed. To study the performance of the methods, we consider examples of stochastic differential and stochastic partial differential equations, including the randomly forced incompressible three-dimensional Navier–Stokes equations.

The Bayesian Approach to Inverse Problems

Stochastic Methods for Solving High-Dimensional Partial Differential Equations

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The estimation of extreme event probabilities in complex stochastic systems is an important problem in applied sciences and engineering, and is difficult as soon as these events are too rare to be easily observable, but at the same time too impactful to be ignored. Examples of such events studied in the recent literature include rogue waves (Dematteis et al. 2018) and wave impacts on an offshore platform (Mohamad and Sapsis 2018), heat waves and cold spells (Ragone et al. 2018; Gálfi et al. 2019), intermittent fluctuations in turbulent flows (Fuchs et al. 2022) and derivative pricing fluctuations in mathematical finance (Friz et al. 2015). A broad perspective on extreme event prediction can be found in Farazmand and Sapsis (2019). Methods to estimate extreme events typically rely on Monte Carlo simulations, including importance sampling (Bucklew 2013), subset simulation (Au and Beck 2001) or multilevel splitting methods (Budhiraja and Dupuis 2019).

A possible theoretical framework to assess extreme event probabilities, which we will follow in this work, is given by large deviation theory (LDT) (Varadhan 1984; Dembo and Zeitouni 1998). This approach allows to estimate the dominant, exponential scaling of the probabilities in question through the solution of a deterministic optimization problem, namely finding the most relevant realization of the stochastic process for a given outcome. This realization is sometimes called instanton, inspired by theoretical physics. For stochastic processes described by stochastic differential equations (SDEs), the relevant theory has been formulated by Freidlin and Wentzell (2012), and can be extended to many stochastic partial differential equations (SPDEs). The computational potential of this formulation has been reviewed by Grafke and Vanden-Eijnden (2019).

In addition to the exponential scaling provided by LDT, it is often desirable to obtain asymptotically sharp, i.e. asymptotically exact probability estimates. This requires the evaluation of a pre-exponential factor in addition to the usual leading-order large deviation result, when interpreting LDT as a Laplace approximation. On the theoretical side, there exist multiple results for such precise Laplace asymptotics for general SDEs (Ellis and Rosen 1982; Azencott 1982; Ben Arous 1988; Piterbarg and Fatalov 1995; Deuschel et al. 2014) and certain SPDEs requiring renormalization (Berglund et al. 2017; Friz and Klose 2022), which, however, typically do not include an actual evaluation of the abstract objects in terms of which they are formulated. We concentrate on the case of SDEs or well-posed SPDEs with additive noise here, where computing the leading-order prefactor amounts to evaluating a Fredholm determinant of an integral operator.

Approach. In this paper, we present a sharp and computable probability estimate for tail probabilities $\mathbb {P}\left[ f\left( X_T\right) \ge z \right] $, i.e. a real-valued function f of a diffusion process $(X_t)_{t \in [0,T]}$ with state space $\mathbb {R}^n$ and

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{d}X_t = b(X_t) \textrm{d}t + \sigma \textrm{d}B_t\,,\\ X_0 = x \in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$

(1)

exceeding a given threshold z at final time T (see Fig. 1 for an example of this setup). We demonstrate that

$$\begin{aligned} \mathbb {P}\left[ f\left( X_T\right) \ge z \right] \approx (2 \pi )^{-1/2} C(z) \exp \left\{ -I(z) \right\} \,, \end{aligned}$$

(2)

in a way to be made precise later on, with real-valued functions C, called the (leading-order) prefactor, and I, called the rate function. The latter is determined through the solution of a constrained optimization problem:

$$\begin{aligned} I(z) = \min _{{\begin{array}{c}\eta \in L^2([0,T],\mathbb {R}^n)\\ \text {s.t. }f\left( X_T[\eta ] \right) = z \end{array}}} \; \frac{1}{2} \left\Vert \eta \right\Vert _{L^2}^2, \end{aligned}$$

(3)

where formally $\eta = \textrm{d}B_t / \textrm{d}t$ is the time derivative of the Brownian motion $(B_t)_{t \in [0,T]}$, and $X_T$ depends on $\eta $ through (1). The prefactor C is then expressed as a Fredholm determinant of a linear operator which contains the solution of the minimization problem (3), the instanton $\eta _z$, as a background field and acts on paths $\delta \eta :[0,T] \rightarrow \mathbb {R}^n$. We show how to evaluate this operator determinant numerically for general SDEs and SPDEs, and demonstrate through multiple examples that it is possible to do so even for very high-dimensional systems with $n \gg 1$ arising, for instance, after spatial discretization of an SPDE. Our approach is based on computing the dominant eigenvalues of the trace-class integral operator entering the Fredholm determinant.

Related literature. In the physics literature, the leading-order prefactor computation corresponds to the evaluation of Gaussian path integrals, which is a classical topic in quantum and statistical field theory (Zinn-Justin 2021). There are multiple references dealing with the evaluation of such integrals for the class of differential operators that is necessary for SDEs, such as Papadopoulos (1975), Nickelsen and Engel (2011), Corazza and Fadel (2020). In accordance with these approaches, in the last years, numerical leading order prefactor computation methods for general SDEs and SPDEs via the solution of Riccati matrix differential equations have been established (Schorlepp et al. 2021; Ferré and Grafke 2021; Grafke et al. 2021; Bouchet and Reygner 2022; Schorlepp et al. 2023). An early example using a similar method is given by Maier and Stein (1996). All of these papers have in common that the leading order prefactor can be evaluated in a closed form by solving a single matrix valued initial or final value problem, thereby bypassing the need to compute large operator determinants directly. We briefly introduce this method in this paper, relate it to the—in some sense complementary—Fredholm determinant prefactor evaluation based on dominant eigenvalues, and discuss possible advantages and disadvantages. We note that for SDEs with low-dimensional state space, it can also be feasible to compute the differential operator determinants, that are otherwise evaluated through the Riccati matrices, directly by discretizing the operator into a large matrix and numerically calculating its determinant, which has been carried out e.g. by Psaros and Kougioumtzoglou (2020), Zhao et al. (2022).

Another perspective on the precise Laplace approximation used in this paper is provided by the so-called second-order reliability method (SORM), which is used in the engineering literature to estimate failure probabilities, as reviewed e.g. by Rackwitz (2001); Breitung (2006). For example, the asymptotic form of the extreme event probabilities in this paper corresponds to the standard form stated by Breitung (1984). In this sense, the method proposed in this paper can be regarded as a path space SORM, carried over to infinite dimensions for the case of additive noise SDEs. The connection of precise LDT estimates to SORM for finite-dimensional parameter spaces has also been pointed out by Tong et al. (2021).

In studies of rare and extreme event estimation, Monte Carlo simulations are commonly used, and various sampling schemes have been designed, some of which have been modified and adapted to systems involving SDEs. These include various importance sampling estimators which can be associated e.g. with the solution to deterministic optimal control problems along random trajectories (Vanden-Eijnden and Weare 2012), with the instanton in LDT (Ebener et al. 2019), or build on stochastic Koopman operator eigenfunctions (Zhang et al. 2022). The method we propose takes a different perspective from these sampling methods—it does not involve sampling, and is only asymptotically exact.

Contributions and limitations. The main contributions of this paper are as follows: (i) Generalizing SORM to infinite dimensions, we introduce a sampling-free method to approximate extreme event probabilities for SDEs (and SPDEs) with additive noise. The method is based on the Laplace approximation in path space and uses second-order information to compute the probability prefactor. (ii) While such precise Laplace asymptotics for SDEs are known on a theoretical level, we show how to evaluate them numerically in a manner that is straightforward to implement and is scalable, i.e. it does not degrade with increasing discretization dimension. We illustrate the method on a high-dimensional nonlinear example, namely estimating the probability of high strain rate events in a three-dimensional stochastic Navier–Stokes flow. (iii) On the theoretical level, we explore the relationship between the proposed eigenvalue-based approach for calculating the prefactor and Riccati methods from stochastic analysis and stochastic field theory. We examine the advantages of each method and provide an interpretation of the involved Gaussian process using transition tubes towards the extreme event, i.e. the expected magnitude and direction of fluctuations on the way to an extreme outcome.

The approach taken in this paper also has some limitations: (i) While we find the probability estimates including the leading-order prefactor to be quite accurate when compared to direct Monte Carlo simulations when these are feasible, these estimates are approximations and only asymptotically exact in the limit as $z\rightarrow \infty $. To obtain unbiased estimates, one can e.g. use importance sampling. The instanton and the second variation eigenvalues and eigenvectors can be used as input for such extreme event importance sampling algorithms (Ebener et al. 2019; Tong et al. 2021; Tong and Stadler 2022). (ii) We limit ourselves to SDEs with additive Gaussian noise. For SDEs with multiplicative noise (or singular SPDEs), the leading-order prefactor is more complicated, as the direct analogy to the finite-dimensional case gets lost (Ben Arous 1988). Nevertheless, extensions of the eigenvalue-based prefactor computation proposed here can likely be made, but are beyond the scope of this paper. (iii) The proposed approach assumes that the differential equation-optimization (3) has a unique solution that can be computed. For non-convex constraints, uniqueness may be difficult to prove or may not hold. However, in the examples we consider, we seem to be able to identify the global minimizer reliably by using several different initializations in the minimization algorithm and, if we find different minimizers, by choosing the one corresponding to the smallest objective value. The proposed approach can also be generalized to multiple isolated and continuous families of minimizers (Ellis and Rosen 1981; Schorlepp et al. 2023).

Notation. We use the following notations throughout the paper: The state space dimension is always written as n, a possible time discretization dimension of the interval [0, T] as $n_t$, and N is exclusively used in section 1.1 for the motivation of our results via random variables in $\mathbb {R}^N$. We denote the Euclidean norm and inner product in $\mathbb {R}^N$ by $\left\Vert \cdot \right\Vert _N$ and $\langle \cdot , \cdot \rangle _N$, respectively, and the $L^2$ norm and scalar product for $\mathbb {R}^n$-valued functions defined on [0, T] by $\left\Vert \cdot \right\Vert _{L^2([0,T], \mathbb {R}^n)}$ and $\langle \cdot , \cdot \rangle _{L^2([0,T], \mathbb {R}^n)}$, respectively. The outer product is denoted by $\otimes $, with $v \otimes w = v w^\top $ and $v^{\otimes 2} {:}{=}v \otimes v$ for $v,w \in \mathbb {R}^N$ and $(f \otimes g)(t,t') = f(t) g(t')^\top $ for $f,g \in L^2([0,T], \mathbb {R}^n)$ and $t,t' \in [0,T]$. Convolutions are written as $*$. The subscript or argument $z \in \mathbb {R}$ always represents the dependency on the observable value e.g. of the minimizer $\eta _z$, Lagrange multiplier $\lambda _z$ and projected second variation operator $A_z$, as well as the observable rate function $I_F(z)$ and prefactor $C_F(z)$. The identity map is in general denoted by ${{\,\textrm{Id}\,}}$, and the identity matrix and zero matrix in $\mathbb {R}^N$ are written as $1_{N \times N}$ and $0_{N \times N}$. The superscript $\perp $ always denotes the orthogonal complement, with $v^\perp {:}{=}(\text {span}(\{v\}))^\perp $. Functional derivatives with respect to $\eta \in L^2([0,T], \mathbb {R}^n)$ are denoted by $\delta / \delta \eta $. Determinants in $\mathbb {R}^N$, as well as Fredholm determinants, are written as $\det $, whereas regularized differential operator determinants are written as ${{\,\textrm{Det}\,}}$ with the boundary conditions of the operator as a subscript. For two real functions g and h, we write

$$\begin{aligned} g(\varepsilon ) \overset{\varepsilon \downarrow 0}{ \sim }\ h(\varepsilon ) \quad \iff \quad \lim _{\varepsilon \downarrow 0} \; \frac{g(\varepsilon )}{h(\varepsilon )} = 1\,, \end{aligned}$$

(4)

if the functions g and h are asymptotically equivalent as $\varepsilon \downarrow 0$. By an abuse of terminology, we use the term “instanton” in this paper to refer to the large deviation minimizer $\eta _z$ for finite-dimensional parameter spaces, and also to both the instanton noise trajectory $\left( \eta _z(t)\right) _{t \in [0,T]}$ and the instanton state variable trajectory $\left( \phi _z(t)\right) _{t \in [0,T]}$ in the infinite-dimensional setup.

We start with a more precise explanation of the concepts described in this introduction in Sects. 1.1 and 1.2, before summarizing the structure of the rest of the paper at the end of Sect. 1.2.

1.1 Laplace method for normal random variables in $\mathbb {R}^N$

We start with the finite dimensional setting, following Dematteis et al. (2019), Tong et al. (2021): We consider a collection of N random parameters $\eta \in \mathbb {R}^N$ that are standard normally distributed, and are interested in a physical observable, described by a function $F:\mathbb {R}^N\rightarrow \mathbb {R}$, that describes the outcome of an experiment under these random parameters. Note that restricting ourselves to independent standard normal variables is not a major limitation as F may include a map that transforms a standard normal to another distribution. To give an example that fits into this setting, $\eta $ could be all parameters entering a weather prediction model, and F then constitutes the mapping of the parameters to some final prediction, such as the temperature at a given location in the future. Note that the map F may be complicated and expensive to evaluate, e.g. requiring the solution of a PDE.

We are interested in the probability that the outcome of the experiment exceeds some threshold z, i.e. $P(z) = \mathbb {P}[F(\eta ) \ge z]$. Since here z is assumed large compared to typically expected values of $F(\eta )$, we call P(z) the extreme event probability. To be able to control the rareness of the event, we introduce a formal scaling parameter $\varepsilon >0$ and consider $\varepsilon \ll 1$ to make the event extreme by defining $P_F^\varepsilon (z) = \mathbb {P}[F(\sqrt{\varepsilon }\eta ) \ge z]$. This allows us to treat terms of different orders in $\varepsilon $ perturbatively in the rareness of the event and is more amenable to analysis than rareness due to $z\rightarrow \infty $. In the following, we will thus consider z as a fixed constant, while discussing the limit $\varepsilon \rightarrow 0$. Since $\eta $ is normally distributed, the extreme event probability is available as an integral,

$$\begin{aligned} P_F^\varepsilon (z) = (2\pi \varepsilon )^{-N/2}\!\!\int _{\mathbb {R}^N} \mathbbm {1}_{ \{F(\eta )\ge z\}}(\eta ) \exp \left\{ -\frac{1}{2\varepsilon } \left\Vert \eta \right\Vert _N^2\right\} \textrm{d}^N\eta ,\nonumber \\ \end{aligned}$$

(5)

by integrating all possible $\eta $ that lead to an exceedance of the observable threshold (as identified by the indicator function $\mathbbm {1}$), weighed by their respective probabilities given by the Gaussian densities. Directly evaluating the integral in (5) is typically infeasible for complicated sets $\{\eta \in \mathbb {R}^N \mid F(\eta )\ge z\}$ and large N.

The central notion of this paper is the fact that in the limit $\varepsilon \downarrow 0$, the integral in (5) can be approximated via the Laplace method, which replaces the integrand with its extremal value, times higher order multiplicative corrections. The corrections at leading order in $\varepsilon $ amount to a Gaussian integral that can be solved exactly. In effect, the integral (5) is approximated by the probability of the most likely event that exceeds the threshold, multiplied by a factor that takes into account the event’s neighborhood.

To make things concrete, we make the following assumptions on $F \in C^2(\mathbb {R}^N, \mathbb {R})$ for given $z > F(0)$:

1.
There is a unique $\eta _z \in \mathbb {R}^N \backslash \{0\}$, called the instanton, that minimizes the function $\tfrac{1}{2} \left\Vert \cdot \right\Vert ^2_N$ in $F^{-1}([z,\infty ))$. Necessarily, $\eta _z \in F^{-1}(\{z\})$ lies on the boundary, $F(\eta _z) = z$, and there exists a Lagrange multiplier $\lambda _z\ge 0$ with $\eta _z = \lambda _z \nabla F(\eta _z)$ as a first-order necessary condition. We define the large deviation rate function of the family of real-valued random variables $\left( F(\sqrt{\varepsilon } \eta ) \right) _{\varepsilon > 0}$ at z via
$$\begin{aligned} I_F :\mathbb {R}\rightarrow \mathbb {R}\,, \quad I_F(z) := \tfrac{1}{2} \left\Vert \eta _z\right\Vert ^2_N\,. \end{aligned}$$
(6)
2.
$1_{N \times N} - \lambda _z \nabla ^2 F(\eta _z)$ is positive definite on the $(N-1)$-dimensional subspace $\eta _z^\perp \subset \mathbb {R}^N$ orthogonal to the instanton, i.e. we assume a second-order sufficient condition for $\eta _z$ holds.

Then, there is a sharp estimate, in the sense of (4), for the extreme event probability (5) via

$$\begin{aligned} P_F^\varepsilon (z) \overset{\varepsilon \downarrow 0}{\sim }\varepsilon ^{1/2} (2 \pi )^{-1/2} \, C_F(z) \,\exp \left\{ -\frac{1}{\varepsilon }I_F(z)\right\} , \end{aligned}$$

(7)

where the rate function $I_F$ determines the exponential scaling, and $C_F(z)$ is the z-dependent leading order prefactor contribution that accounts for the local properties around the instanton. Note that the prefactor is essential to get a sharp estimate, which cannot be obtained from mere $\log $-asymptotics using only the rate function. The prefactor $C_F(z)$ can explicitly be computed via

$$\begin{aligned} C_F(z) = \left[ 2 I_F(z) \det \left( 1_{N \times N} - \lambda _z {{\,\textrm{pr}\,}}_{\eta _z^\perp } \nabla ^2 F(\eta _z) {{\,\textrm{pr}\,}}_{ \eta _z^\perp } \right) \right] ^{-1/2}, \nonumber \\ \end{aligned}$$

(8)

where ${{\,\textrm{pr}\,}}_{\eta _z^\perp } = 1_{N \times N} - \eta _z \otimes \eta _z / \left\Vert \eta _z\right\Vert ^2_N$ is the orthogonal projection onto $\eta _z^\perp $. A brief derivation of this result, analogous to the computations of Tong et al. (2021), is included in “Appendix A1” for completeness. It is also directly equivalent to the standard form of the second order reliability method, as derived e.g. by Breitung (1984). Geometrically, it corresponds to replacing the extreme event set $\{\eta \in \mathbb {R}^N \mid F(\eta ) \ge z \}$ by a set bounded by the paraboloid with vertex at the instanton $\eta _z$, the axis of symmetry in the direction of $\nabla F(\eta _z)$ and curvatures adjusted to be the eigenvalues of the $-\Vert \nabla F\Vert ^{-1}$-weighted Hessian of F at $\eta _z$.

For the weather prediction example, Eqs. (7) and (8) mean the following: We could estimate (5) by performing a large number of simulations of the weather model with a random choice of parameters to obtain statistics on an extremely high temperature event. Instead, we solve an optimization problem over parameters to compute only the single most likely route to that large temperature. When the desired event is very extreme, such a situation can only be realized when all simulated physical processes conspire in exactly the right way to make the extreme temperature event possible. Consequently, only a narrow choice of model parameters and corresponding sequence of events remains that can contribute to the extreme event probability: precisely the instanton singled out by the optimization procedure. The probability of the extreme event is then well approximated by perturbations around that single most likely extreme outcome.

Next, we generalize the statement (7) to the infinite-dimensional setting encountered in continuous time stochastic systems. Intuitively, for temporally evolving systems with stochastic noise, there is randomness at every single instance in time, which implies an infinite number of random parameters to optimize over. We generalize the above strategy to the important case of SDEs in $\mathbb {R}^n$ driven by small additive Gaussian noise, and assemble and compare computational methods to compute $I_F$ and $C_F$ numerically, even for very large spatial dimensions n stemming from semi-discretizations of multi-dimensional SPDEs.

1.2 Generalization to infinite dimensions for SDEs with additive noise

As a stochastic model problem, we consider the SDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \textrm{d}X^\varepsilon _t = b(X^\varepsilon _t) \textrm{d}t + \sqrt{\varepsilon } \sigma \textrm{d}B_t\,,\\ X^\varepsilon _0 = x \in \mathbb {R}^n, \end{array}\right. } \end{aligned}$$

(9)

on the time interval [0, T] with a deterministic initial condition and $n \in \mathbb {N}$, $\varepsilon > 0$. The drift vector field $b :\mathbb {R}^n \rightarrow \mathbb {R}^n$, assumed to be smooth, may be nonlinear and non-gradient. The constant matrix $\sigma \in \mathbb {R}^{n \times n}$ is not required to be diagonal or invertible. The SDE is driven by a standard n-dimensional Brownian motion $B = (B_t)_{t \in [0,T]}$. We limit ourselves to the estimation of extreme event probabilities (due to small noise $\epsilon $) of the random variable $f(X^\varepsilon _T)$, where $f:\mathbb {R}^n \rightarrow \mathbb {R}$ is a smooth, possibly nonlinear observable of the process $X^\varepsilon $ at final time $t = T$.

A concrete example of this type of system, already alluded to in the first section, is shown in Fig. 2. It is given by the SDE

$$\begin{aligned}&{\left\{ \begin{array}{ll} \textrm{d}X = (-X-XY)\,\textrm{d}t + \sqrt{\varepsilon }\,\textrm{d}B_X,\\ \textrm{d}Y = (-4Y+X^2)\,\textrm{d}t + \tfrac{1}{2}\sqrt{\varepsilon }\,\textrm{d}B_Y, \end{array}\right. } \nonumber \\&\text {with }(X(0),Y(0)) = (0,0)\,. \end{aligned}$$

(10)

The streamlines in the figure show the motion taken by deterministic trajectories of the model at $\varepsilon = 0$. Small magnitude stochasticity in the form of Brownian noise is added, and we ask the question: What is the probability $P_F^\varepsilon (z)$, as defined below in (13), that the system ends up, at time $T=1$, in the red shaded area in the top right corner, given by $f(x,y) = x + 2y \ge z=3$? After approximately $1.2 \cdot 10^7$ simulations, 100 such trajectories are found, with some of them shown in light orange in Fig. 2. These can be considered typical realizations for this extreme outcome, and allow us to estimate $P_F^\varepsilon (z) \in \left[ 6.71 \cdot 10^{-6},9.97 \cdot 10^{-6}\right] $ as a $95\%$ confidence interval. While in principle the same approach could be applied to much more complicated stochastic models, such as SPDEs arising in atmosphere or ocean dynamics, it quickly becomes infeasible due to the cost of performing such a large number of simulations.

Instead, we generalize the strategy outlined in the previous section. For the derivation, we make the following, compared to the finite-dimensional case stronger assumptions for technical reasons. To formulate them, we introduce the solution map

$$\begin{aligned} F :L^2([0,T],\mathbb {R}^n) \rightarrow \mathbb {R}\,, \quad&F[\eta ] = f(\phi (T)), \nonumber \\&\text {for } {\left\{ \begin{array}{ll} \dot{\phi } = b(\phi ) + \sigma \eta \,,\\ \phi (0) = x\,. \end{array}\right. } \end{aligned}$$

(11)

Then, we assume for all $z \in \mathbb {R}$:

1.
There is a unique instanton on the z-levelset of F, $\eta _z \in F^{-1}(\{z\}) \subset L^2([0,T], \mathbb {R}^n)$, that minimizes the function $\tfrac{1}{2} \left\Vert \cdot \right\Vert ^2_{L^2([0,T], \mathbb {R}^n)}$. There exists a Lagrange multiplier $\lambda _z \in \mathbb {R}$ with $\eta _z = \lambda _z \left. \frac{\delta F}{\delta \eta }\right| _{\eta _z}$ as a first-order necessary condition. We define the large deviation rate function for the observable f as
$$\begin{aligned} I_F :\mathbb {R}\rightarrow \mathbb {R}\,, \quad I_F(z) := \tfrac{1}{2} \left\Vert \eta _z\right\Vert ^2_{L^2([0,T], \mathbb {R}^n)}\,. \end{aligned}$$
(12)
2.
The map from observable value to minimizer $z \mapsto \eta _z$ is $C^1$. In particular $I_F'(z) = \langle \eta _z, \textrm{d}\eta _z / \textrm{d}z \rangle _{L^2([0,T], \mathbb {R}^n)} = \lambda _z \langle \left. \tfrac{\delta F}{\delta \eta } \right| _{\eta _z}, \tfrac{\textrm{d}\eta _z}{\textrm{d}z} \rangle _{L^2([0,T], \mathbb {R}^n)} = \lambda _z$.
3.
${{\,\textrm{Id}\,}}- \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}}\right| _{ \eta _z}$ is positive definite.
4.
The rate function $I_F$ is twice continuously differentiable and strictly convex, i.e. $I_F'' > 0$.

Under these assumptions and using existing theoretical results on precise Laplace asymptotics for small-noise SDEs, in “Appendix A2” we sketch a derivation of the following result: For the extreme event probability

$$\begin{aligned} P_F^\varepsilon (z) = \mathbb {P}{\Big [ F[\sqrt{\varepsilon } \eta ] \ge z \Big ]} = \mathbb {P}{\Big [ f(X_T^\varepsilon ) \ge z \Big ]} \end{aligned}$$

(13)

with $z > F(0)$, the asymptotically sharp estimate (7) holds in the same way as before. The leading order prefactor is now given by

$$\begin{aligned} C_F(z) = \left[ 2 I_F(z) \det \left( {{\,\textrm{Id}\,}}- \lambda _z {{\,\textrm{pr}\,}}_{\eta _z^\perp } \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{ \eta _z} {{\,\textrm{pr}\,}}_{ \eta _z^\perp } \right) \right] ^{-1/2}\,, \end{aligned}$$

(14)

where $\det $ is now a Fredholm determinant, the second variation $\delta ^2 F / \delta \eta ^2$ of the solution map F at $\eta = \eta _z$ is a linear trace-class operator on $L^2([0,T],\mathbb {R}^n)$, and ${{\,\textrm{pr}\,}}$ denotes orthogonal projection in $L^2([0,T],\mathbb {R}^n)$.

Applied to the model SDE (10), we must first compute the optimal noise realization $\eta _z = \left( \eta _z(t) \right) _{t \in [0,T]}$, which has a corresponding optimal system trajectory $\phi _z = \left( \phi _z(t) \right) _{t \in [0,T]}$. This optimal trajectory, shown blue dashed in Fig. 2, describes the most likely evolution of the SDE (10) from the initial condition (0, 0) into the shaded region in the upper right corner, thus leading to an event $f(X(T),Y(T))\ge z$. Second, through equation (14), we can compute the prefactor correction for this optimal noise realization. Inserted into Eq. (7), we obtain $P_F^{\varepsilon = 0.5}(z = 3) \approx 8.94 \cdot 10^{-6}$ as an asymptotic, sampling-free estimate, which falls into the estimated interval obtained with direct sampling. The source code to reproduce all results for this example is available in a public GitHub repository (Schorlepp et al. 2023).

We add some remarks on the setting:

1.
We focus on SDEs with additive noise (9) for simplicity. For the more general case of ordinary Itô SDEs with multiplicative noise $\sigma = \sigma (X_t^\varepsilon )$, the leading order prefactor can still be computed explicitly, but involves a regularized Carleman-Fredholm determinant $\det _2$ (see Simon (1977) for a definition) instead of a Fredholm determinant $\det $, because the second variation of F is no longer guaranteed to be trace-class (Ben Arous 1988). The direct analogy to the finite-dimensional case is only possible for additive noise.
2.
We state the theoretical result and computational strategy for ordinary stochastic differential equations, but will also apply them numerically to SPDEs with additive, spatially smooth Gaussian forcing. In this case, we expect a direct generalization of the results for SDEs to hold.
3.
Without any additional work, we also obtain a sharp estimate, in the sense of (4), for the probability density function $\rho _F^\varepsilon $ of $f(X_T^\varepsilon )$ at z via
$$\begin{aligned} \rho _F^\varepsilon (z) \overset{\varepsilon \downarrow 0}{\sim }\ (2 \pi \varepsilon )^{-1/2} \lambda _z C_F(z)\,\exp \left\{ -\frac{1}{\varepsilon }I_F(z)\right\} \,. \end{aligned}$$
(15)

From a practical point of view, the remaining question is how to evaluate (12) and (14), given a general and possibly high-dimensional SDE (9).

Main questions and paper outline. In the remainder of this paper, we will specifically answer the following questions:

How to find the minimizer $\eta _z$ to the differential equation constrained optimization problem (12) numerically? This question has been treated in detail in the literature for the setup at hand, and we give a brief summary of relevant references in Sect. 2.1.
How to evaluate the Fredholm determinant in (14) numerically? We show in Sect. 2.2 how to use second-order adjoints to compute the application of the projected second variation operator
$$\begin{aligned} A_z := \lambda _z {{\,\textrm{pr}\,}}_{ \eta ^\perp _z} \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} {{\,\textrm{pr}\,}}_{ \eta ^\perp _z} \end{aligned}$$
(16)
to functions (or, upon discretization, to vectors), which is the basis for iterative eigenvalue solvers. In Sect. 2.4, we discuss how this allows us to treat very large system dimensions n as long as the rank of $\sigma $ remains small.
How does this prefactor computation based on the dominant eigenvalues of the projected second variation operator theoretically relate to the alternative approach using symmetric matrix Riccati differential equations mentioned in the introduction? What are the advantages and disadvantages of the different approaches? We comment on these points in Sects. 2.3 and 2.4.
What is the probabilistic interpretation of the quantities encountered when evaluating (12) and (14)? In how far can they be observed in direct Monte Carlo simulations of the SDE (9)? This is the content of Sect. 3.

After these theoretical sections, illustrated throughout via the model SDE (10), we present two challenging examples in Sect. 4: The probability of high waves in the stochastic Korteweg–De Vries equation in Sect. 4.1, and the probability of high strain events in the stochastic three-dimensional incompressible Navier–Stokes equations in Sect. 4.2. All technical derivations can be found in “Appendix A”.

2 Numerical rate function and prefactor evaluation

In this section, we show how the instanton and prefactor for the evaluation of the asymptotic tail probability estimate (7) can be computed in practice for a general, possibly high-dimensional SDE (9), and illustrate the procedure for the model SDE (10). Both finding the instanton (Sect. 2.1) and the prefactor (Sect. 2.2) require the solutions of differential equations of a complexity comparable to the original SDE. They therefore become realistic to evaluate numerically even for fairly large problems, provided tailored methods are used, as summarized in Sect. 2.4. Additionally, we compare the adjoint-based Fredholm determinant computation to the approach based on matrix Riccati differential equations in Sects. 2.3 and 2.4.

2.1 First variations and finding the instanton

Here, we discuss the differential equation-constrained optimization problem

$$\begin{aligned} \eta _z = \mathop {\mathrm {arg\,min}}\limits _{{\begin{array}{c}\eta \in L^2([0,T],\mathbb {R}^n)\\ \text {s.t. }F[\eta ] = z \end{array}}} \; \frac{1}{2} \left\Vert \eta \right\Vert _{L^2([0,T],\mathbb {R}^n)}^2, \end{aligned}$$

(17)

that determines the instanton noise $\eta _z$, and briefly review how it can be solved numerically. We reformulate the first-order optimality condition

$$\begin{aligned} \eta _z = \lambda _z \left. \frac{\delta F}{\delta \eta } \right| _{\eta _z} \end{aligned}$$

(18)

by evaluating the first variation using an adjoint variable as reviewed by Plessix (2006), Hinze et al. (2009). For any $\eta \in L^2([0,T],\mathbb {R}^n)$, we find $\frac{\delta (\lambda F)}{\delta \eta } = \sigma ^\top \theta $, where the adjoint variable $\theta $ (also called conjugate momentum) is found via solving

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\phi }} = b(\phi ) + \sigma \eta \,, \quad &{}\phi (0) = x\,,\\ {\dot{\theta }} = - \nabla b^\top (\phi ) \theta \,, \quad &{}\theta (T) = \lambda \nabla f(\phi (T))\,. \end{array}\right. } \end{aligned}$$

(19)

With $a = \sigma \sigma ^\top $, we recover from (18) the well-known instanton equations, formulated only in term of the state variable $\phi _z$ and its adjoint variable $\theta _z$ with optimal noise $\eta _z = \sigma ^\top \theta _z$:

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\phi }}_z = b(\phi _z) + a \theta _z\,, \quad &{}\phi _z(0) = x\,, \quad f(\phi _z(T)) = z,\\ {\dot{\theta }}_z = - \nabla b^\top (\phi _z) \theta _z\,, \quad &{}\theta _z(T) = \lambda _z \nabla f(\phi _z(T))\,. \end{array}\right. } \end{aligned}$$

(20)

The rate function is given by $I_F(z) = \tfrac{1}{2} \left\langle \theta _z, a \theta _z \right\rangle _{L^2([0,T],\mathbb {R}^n)}$. When formulating the optimization problem in the state variable $\phi $ instead of the noise $\eta $, the instanton equations (20) are directly obtained as the first-order necessary condition for a minimizer of the Freidlin-Wentzell (Freidlin and Wentzell 2012) action functional S with

$$\begin{aligned} S[\phi ] = \frac{1}{2} \int _0^T \left\langle \dot{\phi } - b(\phi ), a^{-1} \left[ \dot{\phi } - b(\phi ) \right] \right\rangle _n \textrm{d}t\,. \end{aligned}$$

(21)

The numerical minimization of this functional for both ordinary and partial stochastic differential equations is discussed e.g. by E et al. (2004), Grafke et al. (2015), Grafke et al. (2015), Grafke and Vanden-Eijnden (2019), Schorlepp et al. (2022). Conceptually, the minimization problem (17) is a deterministic distributed optimal control problem on a finite time horizon with a final time constraint on the state variable (Lewis et al. 2012; Herzog and Kunisch 2010). The final-time constraint can be eliminated e.g. using penalty methods. Alternatively, for a convex rate function, a primal-dual strategy (Boyd and Vandenberghe 2004) with minimization of $\tfrac{1}{2} \left\Vert \cdot \right\Vert ^2 - \lambda F$ at fixed $\lambda $ can be used. If estimates for a range of z are desired, one can solve the dual problem for various $\lambda $, which effectively computes the Legendre-Fenchel transform $I_F^*(\lambda )$, and invert afterwards. If the rate function is not convex, the observable f can be reparameterized to make this possible (Alqahtani and Grafke 2021). To solve the unconstrained problems of the general form $ \min \tfrac{1}{2} \left\Vert \cdot \right\Vert ^2 - \lambda (F - z) + \tfrac{\mu }{2}(F - z)^2$, gradient-based methods with an adjoint evaluation (19) can be used, e.g. Schorlepp et al. (2022) use an L-BFGS solver. Simonnet (2022) used a deep learning approach instead. For high-dimensional problems such as multi-dimensional fluids, it may be necessary to use checkpointing for the gradient evaluation, and to use ${{\,\textrm{rank}\,}}\sigma \ll n$ if applicable to reduce memory costs (Grafke et al. 2015). We comment on this point in more detail in Sect. 2.4. Using second order adjoints as in the next section would also make it possible to implement a Newton solver, cf. Hinze and Kunisch (2001), Hinze et al. (2006), Sternberg and Hinze (2010), Cioaca et al. (2012).

For the model SDE (10), the instanton equations (20) read

$$\begin{aligned}&{\left\{ \begin{array}{ll} \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \phi _1\\ \phi _2 \end{array} \right) = \left( \begin{array}{c} -\phi _1\\ -4\phi _2 \end{array} \right) + \left( \begin{array}{c} -\phi _1 \phi _2\\ \phi _1^2 \end{array} \right) + \left( \begin{array}{c} \theta _1\\ \tfrac{1}{4}\theta _2 \end{array} \right) \,,\\ \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \theta _1\\ \theta _2 \end{array} \right) = \left( \begin{array}{c} +\theta _1\\ +4\theta _2 \end{array} \right) + \left( \begin{array}{c} \phi _2 \theta _1 - 2 \phi _1 \theta _2\\ \phi _1\theta _1 \end{array} \right) \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \left( \begin{array}{c} \phi _1(0)\\ \phi _2(0) \end{array} \right) = \left( \begin{array}{c} 0\\ 0 \end{array} \right) \,, \quad \phi _1(T) + 2 \phi _2(T) = z,\\ \left( \begin{array}{c} \theta _1(T)\\ \theta _2(T) \end{array} \right) = \lambda _z \left( \begin{array}{c} 1\\ 2 \end{array} \right) \,. \end{array}\right. } \end{aligned}$$

(22)

We implemented a simple gradient descent (preconditioned with $a^{-1}$) using adjoint evaluations of the gradient and an Armijo line search (available in the GitHub repository (Schorlepp et al. 2023)) to find the instanton for the model SDE (10). The state equation is discretized using explicit Euler steps with an integrating factor, and the gradient is computed exactly on a discrete level, i.e. “discretize, then optimize”. To find the instanton for a given z, we use the augmented Lagrangian method. For each subproblem at fixed Lagrange multiplier $\lambda $ and penalty parameter $\mu $, gradient descent is performed until the gradient norm has been reduced by a given factor compared to its initial value. All of these aspects are summarized in more detail by Schorlepp et al. (2022). The resulting optimal state variable trajectory $\phi _z$ for $z = 3$ for the model SDE (10) is shown in Fig. 2.

2.2 Second variations and prefactor computation via dominant eigenvalues

Similarly to the previous section, the second variation is also readily evaluated in the adjoint formalism. With this prerequisite, we are able to use iterative eigenvalue solvers to approximate the Fredholm determinant $\det ({{\,\textrm{Id}\,}}- A_z)$. For a comprehensive introduction to the numerical computation of Fredholm determinants, as well as theoretical results on approximate evaluations using integral quadratures, see Bornemann (2010). However, in contrast to Bornemann (2010), we deal with possibly spatially high-dimensional problems, such as the example in Sect. 4.2. Hence, we use iterative algorithms to compute the dominant eigenvalues to keep the number of operator evaluations manageable.

Another application of the adjoint state method shows that applying the second functional derivative of the solution map F at $\eta :[0,T] \rightarrow \mathbb {R}^n$ to a fluctuation $\delta \eta :[0,T] \rightarrow \mathbb {R}^n$ results in $\frac{\delta ^{2}(\lambda F)}{\delta \eta ^{2}} \delta \eta = \sigma ^\top \zeta $, where $\zeta $ is found via solving

$$\begin{aligned}&{\left\{ \begin{array}{ll} {\dot{\gamma }} =\nabla b(\phi ) \gamma + \sigma \delta \eta \,,\\ {\dot{\zeta }} = -\left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n \gamma - \nabla b^\top (\phi ) \zeta \,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \gamma (0) = 0\,,\\ \zeta (T) = \lambda \nabla ^2 f(\phi (T)) \gamma (T)\,. \end{array}\right. } \end{aligned}$$

(23)

Here, we use the short-hand notation $\left[ \left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n\right] _{ij} = \sum _{k=1}^n \partial _i \partial _j b_k(\phi ) \theta _k$. The trajectories $\phi $ and $\theta $ in (23) are determined via (19) from $\eta $. Note that the second order equations (23) are simply the linearization of (19). Together with the projection operator ${{\,\textrm{pr}\,}}_{\eta ^\perp _z}$ acting as

$$\begin{aligned} ({{\,\textrm{pr}\,}}_{\eta ^\perp _z} \delta \eta )(t) = \delta \eta (t) - \frac{\left\langle \eta _z, \delta \eta \right\rangle _{L^2([0,T], \mathbb {R}^n)}}{\left\Vert \eta _z\right\Vert ^2_{L^2([0,T],\mathbb {R}^n)}} \eta _z(t) \end{aligned}$$

(24)

for $t \in [0,T]$, we are now in a position to evaluate the application of the operator $A_z$, as defined in (16), to any function $\delta \eta :[0,T] \rightarrow \mathbb {R}^n$. Denoting the eigenvalues of the trace-class operator $A_z$ by $\mu _z^{(i)} \in (-\infty , 1)$, the Fredholm determinant in the prefactor (14) is given by $\det ({{\,\textrm{Id}\,}}- A_z) = \prod _{i=1}^\infty (1 - \mu _z^{(i)})$, with $\left|\mu _z^{(i)}\right| \xrightarrow {i \rightarrow \infty } 0$ in such a way that the product converges. An iterative eigenvalue solver relying solely on matrix–vector multiplication, thus avoiding the explicit storage of the possibly large discretized operator $A_z$ as an $(n_t \cdot n)\times (n_t \cdot n)$ matrix, can now be used numerically to find a finite number of dominant eigenvalues of $A_z$ with absolute value larger than some thresholds, and approximate $\det ({{\,\textrm{Id}\,}}- A_z)$ using these.

For the model example SDE (10), linearizing the state and first order adjoint equations (19), the second order adjoint equations for (10) become

$$\begin{aligned}&{\left\{ \begin{array}{ll} \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \gamma _1\\ \gamma _2 \end{array} \right) = -\left( \begin{array}{c}\gamma _1\\ 4 \gamma _2\end{array}\right) + \left( \begin{array}{c} - \gamma _1 \phi _2 - \phi _1 \gamma _2\\ 2\phi _1 \gamma _1 \end{array} \right) + \left( \begin{array}{c} \delta \eta _1\\ \tfrac{1}{2}\delta \eta _2 \end{array} \right) \,,\\ \frac{\textrm{d}}{\textrm{d}t} \left( \begin{array}{c} \zeta _1\\ \zeta _2 \end{array} \right) = \left( \begin{array}{c} \zeta _1\\ 4 \zeta _2 \end{array}\right) + \left( \begin{array}{c} \gamma _2 \theta _1+\phi _2 \zeta _1 -2 \gamma _1 \theta _2 - 2 \phi _1 \zeta _2 \\ \gamma _1 \theta _1 + \phi _1 \zeta _1 \end{array} \right) \,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \left( \begin{array}{c} \gamma _1(0)\\ \gamma _2(0) \end{array} \right) = \left( \begin{array}{c} 0\\ 0 \end{array} \right) \,,\\ \left( \begin{array}{c} \zeta _1(T)\\ \zeta _2(T) \end{array} \right) = \left( \begin{array}{c} 0\\ 0 \end{array} \right) \,. \end{array}\right. } \end{aligned}$$

(25)

We implemented a simple Euler solver for these equations for a given discretized input vector $\delta \eta \in \mathbb {R}^{2 (n_t + 1)}$ in the python code (Schorlepp et al. 2023) as a subclass of scipy.sparse.linalg.LinearOperator. To set up this operator, we supply the instanton data $(\phi _z, \theta _z, \lambda _z) \in \mathbb {R}^{2 (n_t + 1)} \times \mathbb {R}^{2 (n_t + 1)} \times \mathbb {R}$ as found using the methods of the previous Sect. 2.1. The LinearOperator class, for which we only need to supply a matrix vector multiplication method instead of having to store the full matrix $\in \mathbb {R}^{2 (n_t + 1) \times 2 (n_t + 1)}$, can then be used with any iterative eigenvalue solver. Here, we use the implicitly restarted Arnoldi method of ARPACK (Lehoucq et al. 1998), wrapped as scipy.sparse.linalg.eigs in python. Note that in this example, storing the full matrix would be feasible, and the Riccati method discussed in the next section is faster to compute the prefactor. However, we are interested in a scalable approach for large n, where, as discussed in Sect. 2.4 and shown in Sect. 4, the Riccati approach becomes infeasible. We show the results of computing 200 eigenvalues with largest absolute value of the projected second variation operator $A_z$ for $z=3$ in Fig. 3.

2.3 Alternative: prefactor computation via matrix Riccati differential equations

In “Appendix A3”, we motivate via formal manipulations that the prefactor (14) can also be expressed via the following ratio of zeta-regularized functional determinants (Ray and Singer 1971) of second order differential operators, instead of a Fredholm determinant of an integral operator. This prefactor expression is more natural from the statistical physics point of view, where path integrals in the field variable $\phi $ instead of the noise $\eta $ are typically considered, cf. Zinn-Justin (2021). We obtain

$$\begin{aligned} C_F(z) =&\sqrt{I_F''(z)}\lambda _z^{-1} \left( \frac{{{\,\textrm{Det}\,}}_{{{\mathcal {A}}}_{\lambda _z}} \left( \Omega [\phi _z] \right) }{{{\,\textrm{Det}\,}}_{{{\mathcal {A}}}_0} \left( \Omega [\phi _0] \right) } \right) ^{-1/2} \times \nonumber \\ {}&\quad \times \exp \left\{ -\tfrac{1}{2} \int _0^T \left( \nabla \cdot b(\phi _z) - \nabla \cdot b(\phi _0)\right) \,\textrm{d}t\right\} \,, \end{aligned}$$

(26)

in accordance with Schorlepp et al. (2023), where it was derived directly through path integral computations. Here, $\Omega $ is the Jacobi operator of the Freidlin-Wentzell action functional as defined in the “Appendix A3”, and the subscript of the zeta-regularized determinants ${{\,\textrm{Det}\,}}$ denotes the boundary conditions under which the determinants of the differential operators are computed. Naively evaluating the determinant ratio in (26) by numerically finding the eigenvalues of the appearing differential operators is typically not feasible. This is due to the fact that both operators posses unbounded spectra with the same asymptotic behavior of the eigenvalues, which requires computing the smallest eigenvalues of both operators. A threshold for this computation is difficult to set, and while the eigenvalues of both operators should converge to each other as they increase, numerical inaccuracies tend to increase for the larger eigenvalues. Fortunately, there exists theoretical results regarding the computation of such determinant ratios exactly and in a closed form by solving initial value problems (Gel’fand and Yaglom 1960; Levit and Smilansky 1977; Forman 1987; Kirsten and McKane 2003). Using the results of Forman (1987), the prefactor (26) can be computed by solving the symmetric matrix Riccati differential equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{Q}_z = a + Q_z \nabla b\left( \phi _z \right) ^\top + \nabla b\left( \phi _z \right) Q_z + Q_z \left\langle \nabla ^2 b(\phi _z), \theta _z\right\rangle _n Q_z \,,\\ Q_z(0) = 0_{n \times n} \in \mathbb {R}^{n \times n}\,. \end{array}\right. } \end{aligned}$$

(27)

for $Q_z :[0,T] \rightarrow \mathbb {R}^{n \times n}$ and then evaluating

$$\begin{aligned} C_F(z)&= \lambda _z^{-1} \exp \left\{ \frac{1}{2} \int _0^T {{\,\textrm{tr}\,}}\left[ \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n Q_z \right] \textrm{d}t\right\} \times \nonumber \\&\times \left[ {\det } \left( U_z \right) \left\langle \nabla f(\phi _z(T)), Q_z(T) U_z^{-1} \nabla f(\phi _z(T)) \right\rangle _n \right] ^{-1/2} \end{aligned}$$

(28)

with

$$\begin{aligned} U_z := 1_{n \times n} - \lambda _z \nabla ^2 f \left( \phi _z(T) \right) Q_z(T) \in \mathbb {R}^{n \times n}\,. \end{aligned}$$

(29)

This result in terms of a Riccati matrix differential equation is also natural from a stochastic analysis perspective (WKB analysis of the Kolmogorov backward equation (Grafke et al. 2021)), or a time-discretization of the path integral perspective (recursive evaluation method (Schorlepp et al. 2021)). To give intuition for the Riccati differential equation (27), note that by letting $Q_z = \gamma \zeta ^{-1}$ with $\gamma (0) = 0_{n \times n}$ and $\zeta (0) = 1_{n \times n}$, the approach amounts to solving

$$\begin{aligned} {\left\{ \begin{array}{ll} {\dot{\gamma }} =\nabla b(\phi ) \gamma + a \zeta \,, \quad &{}\gamma (0) = 0_{n \times n}\,,\\ {\dot{\zeta }} = -\left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n \gamma - \nabla b^\top (\phi ) \zeta \,, \quad &{}\zeta (0) = 1_{n \times n}, \end{array}\right. } \end{aligned}$$

(30)

as an initial value problem, whereas the eigenvalue problem $\frac{\delta ^{2}(\lambda F)}{\delta \eta ^{2}} \delta \eta = \mu \delta \eta $ corresponds to the boundary value problem

$$\begin{aligned}&{\left\{ \begin{array}{ll} {\dot{\gamma }} =\nabla b(\phi ) \gamma + \mu ^{-1} a \zeta \,,\\ {\dot{\zeta }} = -\left\langle \nabla ^2 b(\phi ), \theta \right\rangle _n \gamma - \nabla b^\top (\phi ) \zeta \,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \gamma (0) = 0\,,\\ \zeta (T) = \lambda \nabla ^2 f(\phi (T)) \gamma (T)\,. \end{array}\right. } \end{aligned}$$

(31)

This means that to evaluate the functional determinant prefactor via the Riccati approach, we consider functions in the kernel of the operator ${{\,\textrm{Id}\,}}- \lambda \delta ^2 F / \delta \eta ^2$, i.e. eigenfunctions belonging to the eigenvalue 0, but under modified boundary conditions of the operator. In practice, instead of finding the dominant eigenvalues of the integral operator $A_z$ of Sect. 2.2 that acts on functions $\delta \eta :[0,T] \rightarrow \mathbb {R}^n$, we can integrate a single matrix-valued initial value problem for $Q_z :[0,T] \rightarrow \mathbb {R}^{n \times n}$ as presented in this section. Even though the Riccati equation (27), in contrast to the linear system (30), is a nonlinear differential equation, it is nevertheless advisable to solve (27) instead of (30) numerically because the equation for $\zeta $ in (30) has to be integrated in the unstable time direction for the right-hand side term $-\nabla b(\phi )^\top \zeta $. Note also that, depending on the system and observable at hand, the solution of the Riccati equation (27) may pass through removable singularities in (0, T) whenever $\zeta (t)$ in (30) becomes non-invertible, hence direct numerical integration of (27) may require some care (see Schiff and Shnider (1999) and references therein).

For the two-dimensional model SDE (10), the forward Riccati equation for the symmetric matrix $Q = Q_z :[0,T] \rightarrow \mathbb {R}^{2 \times 2}$ along $(\phi , \theta ) = (\phi _z, \theta _z)$ becomes

$$\begin{aligned}&\frac{\textrm{d}}{\textrm{d}t}\left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) = \left( \begin{array}{cc} 1 &{} 0\\ 0 &{} \tfrac{1}{4} \end{array} \right) - \left( \begin{array}{cc} 2 Q_{11} &{} 5 Q_{12}\\ 5 Q_{12} &{} 8 Q_{22} \end{array} \right) \nonumber \\&+ \left[ \left( \begin{array}{cc} -\phi _2 &{} -\phi _1\\ 2 \phi _1 &{} 0 \end{array} \right) \left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) \right] + [\dots ]^\top \nonumber \\&+\left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) \left( \begin{array}{cc} 2 \theta _2 &{} - \theta _1\\ - \theta _1 &{} 0 \end{array} \right) \left( \begin{array}{cc} Q_{11} &{} Q_{12}\\ Q_{12} &{} Q_{22} \end{array} \right) \,, \end{aligned}$$

(32)

where $[\dots ]$ stands for a repetition of the preceding term. We solve the Riccati equation with Euler steps with integrating factor in Schorlepp et al. (2023), and use it to evaluate (28). We do not encounter any numerical problems or singularities in this example. The result for $C_F(z = 3)$ agrees with the Fredholm determinant computation using dominant eigenvalues in the previous Sect. 2.2.

2.4 Computational efficiency considerations

In this section, we compare the two prefactor computation methods of Sects. 2.2 and 2.3 using either dominant eigenvalues of the trace-class operator $A_z$ evaluated via (23), or the Riccati matrix differential equation (27), in terms of their practical applicability as well as computational and memory cost for large system dimensions $n \gg 1$.

For the eigenvalue-based approach, we know that $\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) \xrightarrow {m \rightarrow \infty } \det ({{\,\textrm{Id}\,}}- A_z)$ converges in theory, but it is difficult to give bounds on the required number of eigenvalues for an approximation of the Fredholm determinant to a given accuracy. In all examples considered in this paper, at most a few 100 eigenvalues turned out to be necessary for accurate results, even for the three-dimensional Navier–Stokes equations in Sect. 4.2 as a high-dimensional ($n = 3 \cdot 128^3 \approx 6.3 \cdot 10^6$) and strongly nonlinear example. The number of dominant eigenvalues of $A_z$ to be computed to achieve a desired accuracy is robust with respect to the temporal resolution and only depends on the (effective, see below) dimension of the system and the level of nonlinearity in the system. In any case, to obtain m eigenvalues of $A_z$ with largest absolute value, iterative eigenvalue solvers, either using Krylov subspace methods or randomized algorithms, typically require a number of evaluations of the operator that is equal to a constant times m (Halko et al. 2011). Each evaluation of $A_z$ consists of solving two ODEs or PDEs (23) with comparable computational complexity to the original SDE. We comment on memory requirements below.

Compared to this, the Riccati approach requires the numerical solution of a single $n \times n$ symmetric matrix differential equation as an initial value problem. If n is small, then this is clearly more efficient than computing $m > n / 2$ eigenvalues. However, there may also be problems with the Riccati approach: On the one hand, this approach requires a strictly convex rate function with $I_F''(z)> 0$ at z, as can be seen from (26). If this is not satisfied, then a suitable convexification via reparameterization needs to be carried out on a case-by-case basis (Alqahtani and Grafke 2021). While we assumed that the rate function is convex to derive the prefactor (14) in terms of the Fredholm determinant, this assumption is actually not necessary and the eigenvalue-based approach remains feasible regardless of the convexity of the observable rate function $I_F$. Finally, the eigenvalue approach is easier to interpret, while it is not always immediately clear why the Riccati solution may diverge (removable singularities that can be remedied by a suitable choice of integration scheme versus true singularities due to unstable or flat directions of the second variation at the instanton).

We turn to the memory requirements of the prefactor computations strategies, and in particular to their scaling with the system dimension n. Informally, one can think of the Riccati matrix as defined in the (squared) state space of the SDE, in contrast to the eigenvectors of $A_z$ that are defined in the noise space that is potentially lower-dimensional. The Riccati equation then integrates a dense $n \times n$ array in time by performing $n_t$ consecutive times steps of (27) and evaluating (28) along the way. This is difficult to achieve directly as soon as (semi-discretizations of) multi-dimensional SPDEs are considered, which are relevant e.g. for realistic fluid or climate models. Usually, large Riccati matrix differential equations, which also arise e.g. in linear-quadratic regulator problems, are solved within some problem-specific low-rank format, see e.g. Stillfjord (2018). In contrast to this, the vectors on which iterative eigenvalue solvers for the Fredholm-determinant based approach need to operate are in general vectors of size $n_t \times n$.

As an important class of examples, we now consider systems with large spatial dimension $n \gg 1$, for which, however, only a few degrees of freedom are forced, such that the diffusion matrix $a = \sigma \sigma ^\top $ is singular and ${{\,\textrm{rank}\,}}\sigma \ll n$. Examples for this include fluid and turbulence models with energy injection only on a compactly supported set of either high or low spatial Fourier modes, or climate models with a limited number of random parameters in the model (Margazoglou et al. 2021). In this case, it is straightforward to exploit the small rank of $\sigma $ within the eigenvalue-based approach to decrease the memory requirements and apply the method even to very high-dimensional models, which we demonstrate for the randomly forced three-dimensional Navier–Stokes equations in Sect. 4.2 in this paper. The idea is that for the eigenvectors $\delta \eta $ of $A_z$, clearly only ${{\,\textrm{rank}\,}}\sigma $ many entries are relevant due to the composition with $\sigma $ and $\sigma ^\top $. Eigenvalue solvers hence act on $n_t \times {{\,\textrm{rank}\,}}\sigma $ vectors, which should fit into memory. This is similar to the computation of the instanton itself, where only the instanton noise $\eta _z$ as a $n_t \times {{\,\textrm{rank}\,}}\sigma $ vector is computed and stored explicitly, as discussed by Grafke et al. (2015), Schorlepp et al. (2022). The remaining challenge is then to evaluate $A_z \delta \eta $ for given $\delta \eta \in \mathbb {R}^{n_t \times {{\,\textrm{rank}\,}}\sigma }$ by solving the second order adjoint equations (23), without storing the full, prohibitively large $n_t \times n$ arrays needed for $\phi _z$, $\gamma $ and $\theta _z$. Similar to the gradient itself, evaluated via the first order adjoint approach (19), this is possible through (static) checkpointing (Griewank and Walther 2000), as illustrated in Fig. 4. At the cost of having to integrate the first order adjoint equations repeatedly for each noise vector $\delta \eta $ to which $A_z$ is applied, and to recursively solve the forward equations for $\phi _z$ and $\gamma $ again and again, the memory requirements for the spatially dense fields are only ${{\mathcal {O}}}\left( \log n_t \cdot n \right) $ this way. The same problem is encountered and solved similarly in implementations of Newton solvers for high-dimensional PDE-constrained optimal control problems (Hinze and Kunisch 2001; Hinze et al. 2006; Sternberg and Hinze 2010; Cioaca et al. 2012). All in all, in contrast to the Riccati formalism, this permits an easy and controlled strategy that enables to treat very large spatial dimensions within the Fredholm-based prefactor approach, as long as the diffusion matrix possesses a comparably small rank. Note, however, that it is still necessary that the number of eigenvalues needed to approximate $\det ({{\,\textrm{Id}\,}}- A_z)$ remains small for this approach to be applicable in practice. We show numerically in Sect. 4.2 that this is indeed the case for the three-dimensional Navier–Stokes equations as an example. The discussion of this paragraph, with all relevant scalings of computational and memory costs for the different approaches, is briefly summarized in Table 1.

Table 1 Overview of computational and memory costs for finding the prefactor $C_F(z)$ either through solving the Riccati equation (27), or through determining m dominant eigenvalues of $A_z$.

Full size table

In conclusion, we recommend using the Riccati equation only in sufficiently “nice” situations for small to moderate system dimensions n. For such systems and diffusion matrices without low-rank properties, and as long as no additional complications such as non-convex rate functions or removable singularities of the Riccati solution are encountered, it is faster than the eigenvalue-based approach, and better suited to analytical computations or approximations since it only involves the solution of initial value problems, in contrast to the boundary value problems that need to be solved to find eigenfunctions of the projected second variation operator $A_z$. On the other hand, the Fredholm determinant computation through dominant eigenvalues is easier to use and implement, requiring only solvers for the original SDE, its adjoint, as well as their linearizations. At the cost of introducing numerical errors and a step size parameter $h>0$ that needs to be adjusted, one can also approximate the second variation evaluations via

$$\begin{aligned} \left. \frac{\delta ^{2}\left( \lambda _z F \right) }{\delta \eta ^{2}}\right| _{\eta _z} \delta \eta \approx \frac{1}{h} \left( \left. \frac{\delta \left( \lambda _z F \right) }{\delta \eta } \right| _{\eta _z + h \delta \eta } - \left. \frac{\delta \left( \lambda _z F \right) }{\delta \eta }\right| _{\eta _z} \right) \end{aligned}$$

(33)

or other finite difference approximations, which does not require implementing any second order variations. In this sense, both the numerical instanton and leading-order prefactor computation can quickly be achieved in a black-box like, non-intrusive way when solvers for the state equation and its adjoint are available. Alternatively, the adjoint solver, as well as solvers for the second order tangent and adjoint equation can be obtained through automatic differentiation (Naumann 2011). We also note that in the context of the second order reliability method, there exist further approximation methods that could be used here for the Fredholm determinant prefactor, e.g. by extracting information from the gradient based optimization method that has been used to find the instanton or design point (Der Kiureghian and De Stefano 1991), or through constructing a non-infinitesimal parabolic approximation to the extreme event set (Der Kiureghian et al. 1987).

In any case, for the scenario of possibly multi-dimensional SPDEs with low-rank forcing, we argue that the eigenvalue approach is to be preferred as it leads to natural approximations and a simpler implementation. However, we remark that in the case of SDEs with multiplicative noise, or SPDEs with spatially white noise that need to be renormalized such as the Kardar–Parisi–Zhang (KPZ) equation, the Riccati approach remains structurally unchanged (Schorlepp et al. 2023), whereas the Fredholm determinant expression turns into a Carleman-Fredholm determinant and an additional operator trace (Ben Arous 1988), which could potentially be more costly to evaluate.

3 Probabilistic interpretation via fluctuation covariances and transition tubes

In this section, we give an intuitive interpretation for some of the quantities encountered in the previous sections. The second variation quantifies the linearized dynamics of the SDE (9) around the most likely realization. This implies that dominating eigenfunctions of the second variation correspond to fluctuation modes that are most easily observable. Below, we confirm this with a simple numerical experiment that relates the eigenfunction information with the transition tube along a rare trajectory. The basic object that we consider in this section is the process $(X_t^\varepsilon )_{t \in [0,T]}$ as $\varepsilon \downarrow 0$, conditioned on the rare outcome $f(X_T^\varepsilon ) = z$ at final time. In other words, we consider only transition paths between the fixed initial state $x \in \mathbb {R}^n$ and any final state in the target set $f^{-1}(\{z\}) \subset \mathbb {R}^n$. The path on which the transition path ensemble concentrates as $\varepsilon \downarrow 0$ is given by the state variable instanton trajectory $\phi _z$, i.e. the most likely way for the system to achieve $f(X_T^\varepsilon ) = z$, since deviations from it are suppressed exponentially (Freidlin and Wentzell 2012). One thus has

$$\begin{aligned} \lim _{\varepsilon \downarrow 0} \mathbb {E}\left[ X_t^\varepsilon \mid f(X_T^\varepsilon )= z \right] = \phi _z(t) \end{aligned}$$

(34)

for the mean of the conditioned process. In this sense, by taking conditional averages of direct Monte Carlo simulations of (9) as $\varepsilon $ tends to 0, the instanton trajectory $\phi _z$ is directly observable, and the mean realization agrees with the most likely one for $\varepsilon \downarrow 0$. This procedure is sometimes called filtering, and has been carried out e.g. for the one-dimensional Burgers equation (Grafke et al. 2013), the three-dimensional Navier–Stokes equations (Grafke et al. 2015; Schorlepp et al. 2022) and the one-dimensional KPZ equation (Hartmann et al. 2021). Using the results of the previous sections, we can, however, make this statement more precise and state a central limit-type theorem for the conditioned fluctuations at order $\sqrt{\varepsilon }$ around the instanton: As $\varepsilon \downarrow 0$, the process $(X_t^\varepsilon - \phi _z(t)) / \sqrt{\varepsilon }$, conditioned on $f(X_T^\varepsilon ) = z$, becomes centered Gaussian. It is hence fully characterized by its covariance function ${{\mathcal {C}}}_z :[0,T] \times [0,T] \rightarrow \mathbb {R}^{n \times n}$, given by

$$\begin{aligned} {{\mathcal {C}}}_z(t,t') = \lim _{\varepsilon \downarrow 0} \mathbb {E}\left[ \frac{(X_t^\varepsilon - \phi _z(t))\otimes (X_{t'}^\varepsilon - \phi _z(t'))}{\varepsilon } \bigg \vert f(X_T^\varepsilon ) = z\right] . \end{aligned}$$

(35)

We show in “Appendix A4” that ${{\mathcal {C}}}_z$ is fully determined through the orthonormal eigenfunctions $\delta \eta ^{(i)}_z$ of the projected second variation operator $A_z$ with corresponding eigenvalues $\mu _z^{(i)}$ and associated state variable fluctuations $\gamma ^{(i)}_z$, the solution of the linearized state equation

$$\begin{aligned} \dot{\gamma }^{(i)}_z = \nabla b (\phi _z) \gamma ^{(i)}_z + \sigma \delta \eta ^{(i)}_z\,, \quad \gamma ^{(i)}_z = 0\,, \end{aligned}$$

(36)

via

$$\begin{aligned} {{\mathcal {C}}}_z(t,t') = \sum _{i = 1}^\infty \frac{\gamma ^{(i)}_z(t) \otimes \gamma ^{(i)}_z(t')}{1 - \mu _z^{(i)}}\,. \end{aligned}$$

(37)

In particular, computing the eigenvalues and eigenfunctions of $A_z$ yields a complete characterization of the conditioned Gaussian fluctuations around the instanton. As detailed in the example below, at small but finite $\varepsilon $, ${{\mathcal {C}}}_z$ can be used to approximate the distribution of transition paths at any time $t \in [0,T]$ as multivariate normal ${{\mathcal {N}}}(\phi _z(t), \varepsilon {{\mathcal {C}}}_z(t,t))$. Effectively, in addition to the mean transition path at small noise, the instanton $\phi _z$, we can also estimate the width and shape of the transition tube around it at any $t \in [0,T]$ without sampling within a Gaussian process approximation of the conditioned SDE; see Vanden-Eijnden (2006) for a general introduction to transition path theory, and Archambeau et al. (2007), Lu et al. (2017) for Gaussian process approximations of SDEs based on minimizing the path space Kullback–Leibler divergence, which, in the small-noise limit and for transition paths, reduce to the Gaussian process considered here. Furthermore, one can show that the forward Riccati approach of Sect. 2.3 recovers the final-time state variable fluctuation covariance via

$$\begin{aligned} {{\mathcal {C}}}_z(T,T)&= Q_z(T) U_z^{-1} \nonumber \\ {}&\quad - \frac{\left( Q_z(T) U_z^{-1} \nabla f(\phi _z(T) )\right) ^{\otimes 2}}{\left\langle \nabla f(\phi _z(T)), Q_z(T) U_z^{-1} \nabla f(\phi _z(T)) \right\rangle _n}\,. \end{aligned}$$

(38)

This directly follows by adapting the forward Feynman-Kac computation used in remark 4 of Schorlepp et al. (2021) to the present calculation of the covariance function (35) at final time $t = t' = T$. Note that both, directly from (38), as well as from (37) after a short calculation, carried out in “Appendix A5”, one can see that these results are consistent with the additional final time boundary condition for the state variable fluctuations

$$\begin{aligned} \lim _{\varepsilon \downarrow 0} \left\langle \nabla f(\phi _z(T)), \frac{X_T^\varepsilon - \phi _z(T)}{\sqrt{\varepsilon }} \right\rangle _n = 0, \end{aligned}$$

(39)

almost surely, when conditioning on $f(X_T^\varepsilon ) = z$. In words, the conditioned Gaussian fluctuations at final time are constrained to the tangent plane of the equi-observable hypersurface $f^{-1}(\{z\})$ at the point $\phi _z(T)$.

As in the previous sections, we use the model SDE (10) with $z = 3$ and $\varepsilon = 0.5$ to illustrate these findings. To do this, we compare the PDF of $X_t^\varepsilon $ at different times t, when conditioning on $f(X_T^\varepsilon ) = z$, as obtained via sampling, to the Gaussian approximation $\mathcal{N}(\phi _z(t), \varepsilon {{\mathcal {C}}}_z(t,t))$ that we evaluate using the instanton as well as eigenvalues and eigenfunctions of $A_z$ that were computed previously. We use instanton-based importance sampling (Ebener et al. 2019) to generate $10^5$ trajectories of (10) that satisfy $f(X_T^\varepsilon ) = z$ up to a given precision $f((X_T^\varepsilon - \phi _z(T))/\sqrt{\varepsilon }) < 0.05$; the corresponding code, which again uses Euler steps with an integrating factor and a step size of $\Delta t = 5 \cdot 10^{-4}$, can be found in the GitHub repository (Schorlepp et al. 2023). Essentially, instead of using (9) directly, we shift the system by the instanton (cf. Tong et al. (2021) for a visualization and further analysis), solve

$$\begin{aligned} \textrm{d}Y_t^\varepsilon \!=\! \frac{b\left( \phi _z(t) + \sqrt{\varepsilon } Y_t^\varepsilon \right) - b(\phi _z(t))}{\sqrt{\varepsilon }} \textrm{d}t + \sigma \textrm{d}B_t\,, \quad Y_0^\varepsilon \!=\! 0\,, \end{aligned}$$

(40)

and reweight the samples by

$$\begin{aligned}&\exp \bigg \{ \varepsilon ^{-1} \int _0^T \big \langle b\left( \phi _z(t) + \sqrt{\varepsilon } Y_t^\varepsilon \right) - b(\phi _z(t)) \nonumber \\ {}&- \sqrt{\varepsilon } \nabla b(\phi _z(t)) Y_t^\varepsilon ,\theta _z \big \rangle _n \textrm{d}t + \varepsilon ^{-1} \lambda \big (f\left( \phi _z(T) + \sqrt{\varepsilon } Y_T^\varepsilon \right) \nonumber \\ {}&\qquad - f(\phi _z(T)) - \sqrt{\varepsilon } \nabla f(\phi _z(T)) Y_T^\varepsilon \big ) \bigg \}. \end{aligned}$$

(41)

The results are shown in Fig. 5, and we observe good agreement between the sampled conditioned distributions at times $t \in \{0.05, 0.25, 0.5, 0.75, 0.95 \}$ and the corresponding theoretical small-noise Gaussian approximations. In particular, the deformation of the fluctuation PDF along the instanton trajectory $\left( \phi _z(t)\right) _{ t \in [0,T]}$ is captured by the Gaussian approximation. It is not surprising that the Gaussian approximation works well for the parameters $\varepsilon , z$ and T used here, since the probability $P_F^\varepsilon (z)$ in Sect. 1.1 as approximated by the Laplace method also matched the direct sampling estimate.

4 Computational examples

We now apply the numerical methods introduced in the previous section to two high-dimensional examples involving SPDEs: In Sect. 4.1, we consider the Korteweg–De Vries equation in one spatial dimension, subject to spatially smooth Gaussian noise, and compute precise estimates for the probability to observe large wave heights at one instance in space and time. We compare our asymptotically sharp estimates to direct sampling, and also explicitly compare the two different prefactor computation strategies. Then, we focus on the stochastically forced three-dimensional incompressible Navier–Stokes equations in Sect. 4.2. This is a much higher-dimensional problem, and we demonstrate that the eigenvalue-based prefactor computation indeed remains applicable in practice for this example. Note that both SPDE examples in this section have periodic boundary conditions in space, but this is not a restriction of the method and has merely been chosen for convenience.

4.1 Stochastic Korteweg–De Vries equation

To illustrate the instanton and prefactor computation, we study the Korteweg–De Vries (KdV) equation subject to large-scale smooth Gaussian noise. The KdV equation can be considered as a model for shallow water waves, so the problem we are interested in is to estimate the probability of observing large wave amplitudes. Since this is the first PDE example we study and the general theory in the previous sections has only been developed for ODEs, we explicitly state the instanton equations, second order adjoint equations and Riccati equation. We consider a field $u^\varepsilon :[0,l = 2 \pi ] \times [0,T = 1] \rightarrow \mathbb {R}$ with periodic boundary conditions in space satisfying the SPDE

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u^\varepsilon + u^\varepsilon \partial _x u^\epsilon - \nu \partial _{xx} u^\varepsilon + \kappa \partial _{xxx} u^\varepsilon = \sqrt{\varepsilon }\eta \,,\\ u^\varepsilon (\cdot , 0) = 0\,, \end{array}\right. } \end{aligned}$$

(42)

with constants $\nu = \kappa = 4 \cdot 10^{-2}$ and white-in-time, centered and stationary Gaussian forcing

$$\begin{aligned} \mathbb {E}\left[ \eta (x,t)\eta (x',t') \right] = \chi (x-x') \delta (t-t') \,. \end{aligned}$$

(43)

We choose $\hat{\chi }_k = \delta _{\left|k\right|,1} / (2 \pi )$ as the spatial correlation function of the noise $\eta $ in Fourier space, with $\,\hat{}\,$ denoting the spatial Fourier transform. Concretely, $\eta (x,t)$ is then given by $\eta (x,t) = \pi ^{-1/2} (\dot{B}_1(t) \sin (x) + \dot{B}_2(t) \cos (x))$, where $B_1,B_2$ are independent standard one-dimensional Brownian motions. Hence, the forcing only acts on a single large scale Fourier mode, and excitations of all other modes are due to the nonlinearity of the SPDE. As our observable, we choose the wave height at the origin

$$\begin{aligned} f(u(\cdot , T)) = u(0,T)\,, \end{aligned}$$

(44)

and we want to quantify the tail probability $P_F^\varepsilon (z) = \mathbb {P}\left[ f(u^\varepsilon (\cdot , T)) \ge z \right] $ for different $z > 0$. Note that the effective dimension of the system when formulated in terms of the noise for our choice of noise correlation is small, and we have ${{\,\textrm{rank}\,}}\sigma = 2 \ll n = n_x$ for typical spatial resolutions. Unless otherwise specified, we use $n_x = 1024$ for all numerical results in this section, as well as $n_t = 4000$ equidistant points in time, and we expect the prefactor computation in terms of eigenvalues of $A_z$ to be more efficient in this example, even though the Riccati approach still remains feasible.

We use a pseudo-spectral code and explicit second order Runge-Kutta steps in time with an integrating factor for the linear terms. The final-time constraint is treated with the augmented Lagrangian method. Denoting the state space instanton by $u_z$ with adjoint variable $p_z$ and Lagrange multiplier $\lambda _z$, the first-order necessary conditions at the minimizers read

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t u_z =-u_z \partial _x u_z + \nu \partial _{xx} u_z - \kappa \partial _{xxx} u_z + \chi * p_z\,,\\ \partial _t p_z = -u_z \partial _x p_z - \nu \partial _{xx} p_z - \kappa \partial _{xxx} p_z \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} u_z(\cdot , 0) = 0\,, \quad f(u_z(\cdot , 1)) = z\,,\\ p_z(x, 1) = \lambda _z \delta (x)\,. \end{array}\right. } \end{aligned}$$

(45)

Here, $*$ denotes spatial convolution, which appears due to the stationarity of the forcing.

As a starting point, we compute instantons for a range of equidistantly spaced observable values $z \in [0, 30]$. Knowledge of the instanton for different z gives us access to the rate function $I_F$ of the observable, which is shown on the left in Fig. 6.

In the table in Fig. 8, we show for fixed z how the value of $I_F(z)$ converges when increasing the spatio-temporal resolution, and in particular that the number of optimization steps needed to find the instanton is robust under changes of the numerical resolution, indicating scalability of the instanton computation. The numerical details for these instanton computations are as follows (cf. Schorlepp et al. (2022)): Initial control $p \equiv 0$ and initial Lagrange multiplier $\lambda = 0$; precise target observable value $z = 8.39125$; 6 logarithmically spaced penalty steps from 1 to 300 for augmented Lagrangian method; optimization is terminated upon reduction of gradient norm by $10^6$; same (presumably) global minimizer was found for each resolution; discretize-then-optimize; L-BFGS solver with 4 updates stored; Armijo line search.

Two comments on the instanton computations for this example are in order: Firstly, the observable rate function is non-convex for some z in the interval [1.5, 5] (not visible in the figure). This poses a problem for the dual problem solved at fixed $\lambda $ without penalty, but is not an issue for the penalty or augmented Lagrangian strategy that we used. Furthermore, this means that the Riccati prefactor computation is not directly applicable in this region, but the Fredholm expression remains valid. Secondly, since it is a priori unclear whether the minimization problem for the instanton has a unique solution (the target functional is quadratic, but the constraint is nonlinear), we started multiple optimization runs for the same z at different random initial conditions. In the KdV system, we found multiple subdominant minima that consist of multiple large wave crests (as opposed to just one for the dominant one, as shown in the top left of Fig. 9 for one z), but only took the (presumably) global minimizer for subsequent estimates.

To complete the asymptotic estimate of the wave height probability via (7), we further need the prefactor $C_F(z)$ for all z, which we compute by finding the dominant eigenvalues of $A_z$ as before. We specify the input and output of the linear operator $A_z$ only in terms of the two real Fourier modes of the noise that are relevant for this, to remove the memory cost of the eigenvalue solver. The second order adjoint equations (23) for noise fluctuations $\delta \eta :[0,2 \pi ] \times [0,1] \rightarrow \mathbb {R}$ for the KdV equation read

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t \delta u =-\partial _x (u_z \delta u) + \nu \partial _{xx} \delta u - \kappa \partial _{xxx} \delta u + \chi ^{1/2} * \delta \eta \,,\\ \partial _t \delta p = -\delta u \partial _x p_z -u_z \partial _x \delta p - \nu \partial _{xx} \delta p - \kappa \partial _{xxx} \delta p\,, \end{array}\right. } \nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \delta u (\cdot , 0) = 0\,,\\ \delta p(\cdot , 1) = 0\,, \end{array}\right. } \end{aligned}$$

(46)

with $A_z \delta \eta = \chi ^{1/2} * \delta p$. In our implementation, we supply the second variation operator with the two real Fourier coefficients $\left( \text {Re} \, \widehat{\delta \eta }_1(t_i)\right) _{i = 0, \dots , n_t}$ and $\left( \text {Im} \, \widehat{\delta \eta }_1(t_i) \right) _{i = 0, \dots , n_t}$, assemble the full fluctuation vector $\delta \eta $ from it, and return $\chi ^{1/2} * \delta p$ in the same format after solving (46). As the KdV solutions fit into memory, checkpointing, as discussed in Sect. 2.4, is not necessary. In Fig. 7, we show the convergence of the determinant $\det ({{\,\textrm{Id}\,}}- A_z)$ for some z’s based on the found eigenvalues, thereby demonstrating that a handful of eigenvalues suffices for an accurate approximation of the prefactor. The number of necessary eigenvalues increases only weakly with the observable value z in this example. In addition, Fig. 8 shows the effect of varying the spatio-temporal resolution $(n_x, n_t)$ on the determinant $\det ({{\,\textrm{Id}\,}}- A_z)$ for one particular observable value of $z = 8.4$ at a fixed number of computed eigenvalues. We see that as long as the physical problem is resolved, the eigenvalue spectrum does not change much with the resolution, and the determinant converges when increasing the spatio-temporal resolution. This indicates that our methods are scalable, i.e., their cost does not increase with the temporal (and also spatial) discretization beyond the increased cost of the PDE solution. This is a crucial property of the eigenvalue-based prefactor computation and is in contrast with the Riccati approach.

The result for the prefactor $C_F$ as a function of z is shown on the bottom left of Fig. 6. Note that the vertical axis is scaled logarithmically, i.e. the prefactor strongly depends on the observable value. The importance of the prefactor is further confirmed by the comparison of the complete asymptotic estimate (7) to the results of direct Monte Carlo simulations on the right in Fig. 6. For three values of $\varepsilon \in \{0.1, 1, 10\}$, we performed $4 \cdot 10^4$ respective simulations of the stochastic KdV equation (42) to estimate the tail probability $P_F^\varepsilon (z)$ without approximations. Using both the rate function and prefactor, excellent agreement with the Monte Carlo simulations is obtained. In contrast to this, only using the leading order LDT term $\exp \left\{ -I_F(z)/\varepsilon \right\} $ with a constant prefactor leads to a much worse agreement with simulations, and in fact only works reasonably for $\varepsilon = 0.1$. Note also that one can see from these comparisons that the actual effective smallness parameter for the asymptotic expression (7) to be valid is $\varepsilon / h(z)$ for some monotonically increasing function h, meaning that the estimate is also valid for large $\varepsilon $ as long as suitably large $z \rightarrow \infty $ are considered. In this sense, the estimate is truly an extreme event probability estimate, but we chose to work in terms of the formal parameter $\varepsilon $ to have an explicit and general scaling parameter, in contrast to the example-specific function h(z). For works on large deviation principles directly in $z \rightarrow \infty $, see e.g. Dematteis et al. (2019), Tong et al. (2021)

In addition to the probability estimate itself, the instanton, eigenvalues and eigenfunctions of $\eta _z$ also carry physical information about the system, as discussed in general in Sect. 3. Figure 9 shows the instanton $u_z$, i.e. the most likely field realization to reach a large wave height of $z = 8.4$, and the dominant space-time fluctuations $\delta u_z^{(i)}$ around it.

We further computed the Gaussian fluctuations around the instanton for $z = 8.4$ at the final instance $t = T$ in Fig. 10. First of all, we also solved the forward Riccati equation (27), which is a PDE for ${{\mathcal {Q}}}_z :[0,2 \pi ]^2 \times [0,1] \rightarrow \mathbb {R}$ here and reads

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t {{\mathcal {Q}}}_z(x, y, t) = \chi (x - y)\\ \hspace{2cm}- \left[ \partial _x \left( u_z(x) \cdot \right) + \partial _y \left( u_z(y) \cdot \right) \right] {{\mathcal {Q}}}_z(x, y, t)\\ \hspace{2cm}+ \nu \left[ \partial _{xx} + \partial _{yy} \right] {{\mathcal {Q}}}_z(x, y, t) \\ \hspace{2cm}- \kappa \left[ \partial _{xxx} + \partial _{yyy} \right] {{\mathcal {Q}}}_z(x, y, t)\\ \hspace{2cm}+ \int _0^{2 \pi } {{\mathcal {Q}}}_z(x, x', t) \partial _{x'} p_z(x', t){{\mathcal {Q}}}_z(x', y, t) \textrm{d}x'\,,\\ {{\mathcal {Q}}}_z(\cdot , \cdot , t = 0) = 0\,, \end{array}\right. } \end{aligned}$$

(47)

using the same pseudospectral code and explicit second order Runge-Kutta steps with integrating factor. The result for the prefactor agrees with the one obtained using the Fredholm determinant expression, with $C_F(z = 8.4) \approx 1.0793 \cdot 10^{-2}$ using the eigenvalues and $C_F(z = 8.4) \approx 1.0794 \cdot 10^{-2}$ from the Riccati approach with

$$\begin{aligned} C_F(z) = \frac{\exp \left\{ \tfrac{1}{2} \int _0^1 \textrm{d}t \int _0^{2 \pi } \textrm{d}x \; \partial _x p_z(x, t) {{\mathcal {Q}}}_z(x,x,t) \right\} }{\lambda _z \sqrt{{{\mathcal {Q}}}_z(0,0,1)}}\,. \end{aligned}$$

(48)

For this particular observable value, the Riccati equation could be integrated without numerical problems, but we encountered a removable singularity for larger observable values. The final-time covariance of the conditioned Gaussian fluctuations around the instanton, as predicted using either the Riccati solution (38) or the eigenfunctions and eigenvalues (37), indeed coincides for both approaches and is highly oscillatory (top row, center and right in Fig. 10). Denoting the eigenvalues and normalized eigenfunctions of the final-time covariance operator ${{\mathcal {C}}}_z(T,T)$ by $\nu _z^{(i)}(T)$ and $\delta v_z^{(i)}$, we see that only a handful of fluctuation modes $\delta v_z^{(i)}$ are actually observable since the eigenvalues $\nu _z^{(i)}(T)$ in the bottom left of Fig. 10 quickly decay. Using the eigenvalues and eigenfunctions, realizations of $u^\varepsilon (\cdot , T)$ when conditioning on $u^\varepsilon (0, T) = z = 8.4$ can now easily be sampled within the Gaussian approximation as

$$\begin{aligned} u^\varepsilon (x, T) \approx u_z(x, T) + \sqrt{\varepsilon } \sum _{i = 1}^\infty Z_i \sqrt{\nu _z^{(i)}(T)} \delta v_z^{(i)}(x) \end{aligned}$$

(49)

with $Z_i$ independent and identically standard normally distributed. All in all, this example demonstrates the practical relevance and ease of applicability of the asymptotically sharp LDT estimate including the prefactor in a nonlinear, one-dimensional SPDE.

4.2 Stochastically forced incompressible three-dimensional Navier–Stokes equations

As a challenging, high-dimensional example, we consider the estimation of the probability of a high strain event in the stochastically forced incompressible three-dimensional Navier–Stokes equations. Our main goal here is to demonstrate that in addition to instantons for this problem, which were computed by Schorlepp et al. (2022), it is also numerically feasible to compute the leading order prefactor using the Fredholm determinant approach (14). Our setup hence follows the one treated by Schorlepp et al. (2022). A comprehensive analysis of the problem, including the behavior of the prefactor in the vicinity of the critical points of the dynamical phase transitions observed in this example, is beyond the scope of this paper. For other works on instantons and large deviations for the three-dimensional stochastic Navier–Stokes equations, see Falkovich et al. (1996), Moriconi (2004), Grafke et al. (2015), Apolinário et al. (2022). We consider a velocity field $u^\varepsilon :[0,l = 2 \pi ]^3 \times [0,T = 1] \rightarrow \mathbb {R}^3$ with periodic boundary conditions in space that satisfies

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t u^\varepsilon + \left( u^\varepsilon \cdot \nabla \right) u^\epsilon - \Delta u^\varepsilon + \nabla P = \sqrt{\varepsilon }\eta \,,\\ \nabla \cdot u^\varepsilon = 0\,,\\ u^\varepsilon (\cdot , 0) = 0\,. \end{array}\right. } \end{aligned}$$

(50)

Here, P denotes the pressure which is determined through the divergence constraint. The forcing $\eta $ is centered Gaussian, large-scale in space, white in time, and solenoidal with covariance

$$\begin{aligned} \mathbb {E}\left[ \eta (x,t)\eta (x',t')^\top \right] = \chi (x-x') \delta (t-t') \,, \end{aligned}$$

(51)

where a Mexican hat correlation function with correlation length 1

$$\begin{aligned} \chi (x) = \left[ 1_{3 \times 3} - \frac{1}{2} \left( \left\Vert x\right\Vert ^2 1_{3 \times 3} - x \otimes x\right) \right] \exp \left\{ -\frac{\left\Vert x\right\Vert ^2}{2} \right\} \,, \end{aligned}$$

(52)

is used. Note that this corresponds to the situation ${{\,\textrm{rank}\,}}\sigma \ll 3 n_x^3$ of Sect. 2.4, where only a small number of degrees of freedom is forced due to the Fourier transform $\hat{\chi }$ decaying exponentially. As our observable, we consider the strain $f(u) = \partial _3 u_3(x=0)$ at the origin. Denoting the Leray projection onto the divergence-free part of a vector field by ${{\mathcal {P}}}$, the instanton equations for $(u_z, p_z, \lambda _z)$ are given by

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t u_z =- {{\mathcal {P}}} \left[ \left( u_z \cdot \nabla \right) u_z \right] + \Delta u_z + \chi * p_z\,,\\ \partial _t p_z = - {{\mathcal {P}}} \left[ \left( u_z \cdot \nabla \right) p_z + \left( \nabla p_z \right) ^\top u_z\right] - \Delta p_z \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} u_z(\cdot , 0) = 0\,, \quad f(u_z(\cdot , 1)) = z\,,\\ p_z(\cdot , 1) = \lambda _z {{\mathcal {P}}} \left[ \left. \frac{\delta f}{\delta u} \right| _{u_z(\cdot , 1)} \right] \,. \end{array}\right. } \end{aligned}$$

(53)

With the instantons computed, we are able to evaluate the application of the second variation operator $A_z$ to noise fluctuation vectors $\delta \eta :[0,2\pi ]^3 \times [0,1] \rightarrow \mathbb {R}^3$ by solving the second order adjoint equations

$$\begin{aligned}&{\left\{ \begin{array}{ll} \partial _t \left( \delta u \right) = -{{\mathcal {P}}} \left[ (u_z \cdot \nabla ) \delta u + (\delta u \cdot \nabla ) u_z\right] \\ \hspace{1.4cm}+ \Delta \left( \delta u\right) + \chi ^{1/2} * \delta \eta \,,\\ \partial _t \left( \delta p\right) = - {{\mathcal {P}}} \big [ \left( \nabla p_z + \left( \nabla p_z \right) ^\top \right) \delta u + \left( u_z \cdot \nabla \right) \delta p \\ \hspace{1.4cm}+ \left( \nabla (\delta p) \right) ^\top u_z \big ] - \Delta \left( \delta p\right) \,, \end{array}\right. }\nonumber \\ \text {with }&{\left\{ \begin{array}{ll} \delta u(\cdot , 0) = 0\,,\\ \delta p(\cdot , 1) = 0\,. \end{array}\right. } \end{aligned}$$

(54)

We focus on $z = -25$ here, where the unique instanton solution does not break rotational symmetry (Schorlepp et al. 2022). Numerically, we use a pseudo-spectral GPU code with a spatial resolution $n_x = n_y = n_z = 128$, a temporal resolution of $n_t = 512$, a nonuniform grid in time with smaller time steps close to $T = 1$, and second order explicit Runge–Kutta steps with an integrating factor for the diffusion term. We truncated $\chi $ in Fourier space by setting it to 0 for all k where $\left|\hat{\chi }_k\right| < 10^{-14}$, leading to $\left\Vert k\right\Vert \le 9$ and an effective real spatial dimension, independently of $n_x$, of approximately ${{\,\textrm{rank}\,}}\sigma \approx 2 \cdot (2 \cdot 9)^3 = 11664$ for the noise (by taking a cube instead of sphere for the Fourier coefficients of the noise vectors that are stored, and noting that $\hat{\chi }_k$ projects onto $k^\perp $). The evaluation of the second order adjoint equations is then possible with only a few GB of VRAM for this resolution when exploiting double checkpointing and low rank storage as described in Sect. 2.4. We computed the 600 largest eigenvalues of operator $A_z$, again realized as a scipy.sparse.linalg.LinearOperator, by using scipy.sparse.linalg.eigs as before. We transfer the data to the GPU to evaluate the second variation applied to $\delta \eta $ by solving (54) with PyCUDA (Klöckner et al. 2012), and transfer back $\chi ^{1/2} * \delta p$ to the CPU afterwards. Computing 600 eigenvalues this way needs about 1200 operator evaluations, or about 30 hours on a modern workstation with Intel Xeon Gold 6342 CPUs at $2.80\;\text {GHz}$ and an NVIDIA A100 80GB GPU. The main limitation for computing more eigenvalues is that the eigenvalue solver used stores all matrix vector products in RAM. This could be overcome by storing some of them on a hard disk, or using different algorithms that can be parallelized over multiple nodes such as randomized SVD (Maulik et al. 2021).

The results for the eigenvalues of $A_z$ are shown in Fig. 11. We see that the absolute value of the eigenvalues decays such that the product $\prod _{i = 1}^m \left( 1 - \mu _z^{(i)} \right) $ converges as m increases, but that even more than 600 eigenvalues would be needed for a more accurate result. For smaller observable values z, faster convergence is expected. Also, the spectrum of $A_z$ shows a large number of doubly degenerate eigenvalues, which appear whenever the eigenfunctions break the axial symmetry of the instanton. This feature of the spectrum clearly depends on the domain and spatial boundary conditions that were chosen here. From the instanton computation, we obtain $I_F(z) \approx 1900.7$ for the rate function, and from the 600 eigenvalues of $A_z$ we estimate $C_F(z) \approx 4.9 \cdot 10^{-3}$. With this, we can estimate that e.g. for $\varepsilon = 250$, the probability to observe a strain event with $\partial _3 u_3(x = 0,T = 1) \le -25$ is approximately $1.5 \cdot 10^{-5}$, which matches the sampling estimate of $P_F^{250}(-25) \in [1.3 \cdot 10^{-5}, 1.7 \cdot 10^{-5}]$ at $95\%$ asymptotic confidence, as obtained from $10^4$ direct numerical simulations of (50) (data set from Schorlepp et al. (2022)). For smaller $\varepsilon $, the event becomes more rare, and it quickly becomes unfeasible to estimate its probability via direct sampling, whereas the quadratic estimate using the rate function and prefactor can be computed for any $\varepsilon $ and is known to become more precise as the event becomes more difficult to observe in direct simulations. In addition to these probability estimates, we can also analyze the dominant Gaussian fluctuations around the instanton now and easily sample high strain events within the Gaussian approximation. Figure 12 shows the instanton $u_z$ at final time, i.e. an axially symmetric pair of counter-rotating vortex rings, as well as the dominant eigenfunctions of ${{\mathcal {C}}}_z(T,T)$, corresponding to the fluctuation modes that are most easily observed at final time in conditioned direct numerical simulations. Note that the Riccati equation (27) would be a PDE for a six-dimensional matrix-valued field $Q_z(x_1,x_2,x_3,y_1,y_2,y_3,t)$ here without obvious sparsity properties. Solvers for such a problem are quite expensive, if feasible at all, and also not easy to scale to higher spatial resolutions, whereas this is possible for the dominant eigenvalue approach.

5 Summary and outlook

In this paper, we have presented an asymptotically sharp, sampling-free probability estimation method for extreme events of stochastic processes described by additive-noise SDEs and SPDEs. The method can be regarded as a path-space SORM approximation. We have introduced and compared two different conceptual and numerical strategies to evaluate the pre-exponential factor appearing in these estimates, either through dominant eigenvalues of the second variation, corresponding to the standard formulation of precise Laplace asymptotics and SORM, or through the solution of matrix Riccati differential equations, which is possible for precise large deviations of continuous-time Markov processes. Highlighting the scalability of the first approach, we have shown that leading-order prefactors can be computed in practice even for very high-dimensional SDEs, and explicitly tested our methods in two SPDE examples. In all examples, the approximations showed good agreement with direct Monte Carlo simulations or importance sampling. We hope that the methods assembled in this paper are useful whenever sample path large deviation theory is used to obtain probability estimates in real-world examples.

There are multiple possible extensions of the methods presented in this paper. More general classes of SDEs and SPDEs could possibly be treated numerically within the eigenvalue-based approach, most notably SDEs with multiplicative Gaussian noise, but also SDEs driven by Levy noise or singular SPDEs. Furthermore, one could try to generalize the approach to include any additive Gaussian noise that is colored in time instead of white. This would potentially lead to further dimensional reduction for the instanton and prefactor computation for examples with a slowly decaying temporal noise correlation. It would also be interesting to apply the eigenvalue-based prefactor computation strategy to metastable non-gradient SDEs. Regarding the numerical applicability of the Riccati method in case of high-dimensional systems with low-rank forcing, there is an alternative formulation of the prefactor in terms of a backward-in-time Riccati equation (Grafke et al. 2021), which could be better suited for controlled low-rank approximations. In general, improvements of the quadratic approximation used throughout this paper via loop expansions, resummation techniques or non-perturbative methods from theoretical physics could be investigated. In this regard, it would be desirable to obtain simple criteria that indicate whether the SORM approximation considered in this paper can be expected to be accurate for given $\varepsilon $ and z. Finally, one could use the instanton and additional prefactor information for efficient importance sampling of extreme events for S(P)DEs.

References

Alqahtani, M., Grafke, T.: Instantons for rare events in heavy-tailed distributions. J. Phys. A: Math. Theor. 54(17), 175001 (2021)
MathSciNet MATH Google Scholar
Apolinário, G., Moriconi, L., Pereira, R., Valadão, V.: Eddy-viscous modeling and the topology of extreme circulation events in three-dimensional turbulence. Phys. Lett. A 449, 128360 (2022)
MathSciNet MATH Google Scholar
Archambeau, C., Cornford, D., Opper, M., Shawe-Taylor, J.: Gaussian process approximations of stochastic differential equations, Gaussian Processes in Practice, PMLR (2007)
Au, S.-K., Beck, J.L.: Estimation of small failure probabilities in high dimensions by subset simulation. Probab. Eng. Mech. 16(4), 263–277 (2001)
Google Scholar
Azencott, R.: Formule de Taylor stochastique et développement asymptotique d’intégrales de Feynmann, Séminaire de Probabilités XVI, 1980/81 Supplément: Géométrie Différentielle Stochastique, Springer, pp. 237–285 (1982)
Ben Arous, G.: Methods de Laplace et de la phase stationnaire sur l’espace de Wiener. Stochastics 25(3), 125–153 (1988)
MathSciNet MATH Google Scholar
Berglund, N., Gesù, G.D., Weber, H.: An Eyring–Kramers law for the stochastic Allen–Cahn equation in dimension two. Electron. J. Probab. 22, 1–27 (2017)
MathSciNet MATH Google Scholar
Bleistein, N., Handelsman, R.A.: Asymptotic Expansions of Integrals. Ardent Media, Wilkes-Barre (1975)
MATH Google Scholar
Bornemann, F.: On the numerical evaluation of Fredholm determinants. Math. Comput. 79(270), 871–915 (2010)
MathSciNet MATH Google Scholar
Bouchet, F., Reygner, J.: Path integral derivation and numerical computation of large deviation prefactors for non-equilibrium dynamics through matrix Riccati equations. J. Stat. Phys. 189(2), 1–32 (2022)
MathSciNet MATH Google Scholar
Boyd, S., Vandenberghe, L.: Convex Optimization. Cambridge University Press, Cambridge (2004)
MATH Google Scholar
Breitung, K.: Asymptotic approximations for multinormal integrals. J. Eng. Mech. 110(3), 357–366 (1984)
MATH Google Scholar
Breitung, K.: Asymptotic Approximations for Probability Integrals. Springer, Berlin (2006)
MATH Google Scholar
Brown, L.D., Cai, T.T., DasGupta, A.: Interval estimation for a binomial proportion. Stat. Sci. 16(2), 101–133 (2001)
MathSciNet MATH Google Scholar
Bucklew, J.: Introduction to Rare Event Simulation. Springer Science & Business Media, Berlin (2013)
MATH Google Scholar
Budhiraja, A., Dupuis, P.: Analysis and approximation of rare events, Representations and Weak Convergence Methods. Series Prob. Theory and Stoch. Modelling 94 (2019)
Cioaca, A., Alexe, M., Sandu, A.: Second-order adjoints for solving PDE-constrained optimization problems. Optim. Methods Softw. 27(4–5), 625–653 (2012)
MathSciNet MATH Google Scholar
Corazza, G., Fadel, M.: Normalized Gaussian path integrals. Phys. Rev. E 102(2), 022135 (2020)
MathSciNet Google Scholar
Dematteis, G., Grafke, T., Vanden-Eijnden, E.: Rogue waves and large deviations in deep sea. Proc. Natl. Acad. Sci. 115(5), 855–860 (2018)
MathSciNet MATH Google Scholar
Dematteis, G., Grafke, T., Vanden-Eijnden, E.: Extreme event quantification in dynamical systems with random components. SIAM/ASA J. Uncertain. Quantif. 7(3), 1029–1059 (2019)
MathSciNet MATH Google Scholar
Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Applications of mathematics, Springer, Berlin (1998)
MATH Google Scholar
Der Kiureghian, A., De Stefano, M.: Efficient algorithm for second-order reliability analysis. J. Eng. Mech. 117(12), 2904–2923 (1991)
Google Scholar
Der Kiureghian, A., Lin, H.-Z., Hwang, S.-J.: Second-order reliability approximations. J. Eng. Mech. 113(8), 1208–1225 (1987)
Google Scholar
Deuschel, J.-D., Friz, P.K., Jacquier, A., Violante, S.: Marginal density expansions for diffusions and stochastic volatility I: Theoretical foundations. Commun. Pure Appl. Math. 67(1), 40–82 (2014)
MathSciNet MATH Google Scholar
Weinan, E., Ren, W., Vanden-Eijnden, E.: Minimum action method for the study of rare events. Commun. Pure Appl. Math. 57(5), 637–656 (2004)
MathSciNet MATH Google Scholar
Ebener, L., Margazoglou, G., Friedrich, J., Biferale, L., Grauer, R.: Instanton based importance sampling for rare events in stochastic PDEs. Chaos Interdiscip. J. Nonlinear Sci. 29(6), 063102 (2019)
MathSciNet MATH Google Scholar
Ellis, R.S., Rosen, J.S.: Asymptotic analysis of Gaussian integrals, II: Manifold of minimum points. Commun. Math. Phys. 82(2), 153–181 (1981)
MathSciNet MATH Google Scholar
Ellis, R.S., Rosen, J.S.: Asymptotic analysis of Gaussian integrals. I. Isolated Minimum Points. Trans. Am. Math. Soc. 273(2), 447–481 (1982)
MathSciNet MATH Google Scholar
Falkovich, G., Kolokolov, I., Lebedev, V., Migdal, A.: Instantons and intermittency. Phys. Rev. E 54(5), 4896 (1996)
Google Scholar
Farazmand, M., Sapsis, T.P.: Extreme events: mechanisms and prediction. Appl. Mech. Rev. 71(5), 050801 (2019)
Google Scholar
Ferré, G., Grafke, T.: Approximate optimal controls via instanton expansion for low temperature free energy computation. Multiscale Model Simul. 19(3), 1310–1332 (2021)
MathSciNet MATH Google Scholar
Forman, R.: Functional determinants and geometry. Invent. Math. 88(3), 447–493 (1987)
MathSciNet MATH Google Scholar
Fredholm, I.: Sur une classe d’équations fonctionnelles. Acta Math. 27, 365–390 (1903)
MathSciNet MATH Google Scholar
Freidlin, M.I., Wentzell, A.D.: Random Perturbations of Dynamical Systems, vol. 260. Springer, Berlin (2012)
MATH Google Scholar
Friz, P.K., Gatheral, J., Gulisashvili, A., Jacquier, A., Teichmann, J., et al.: Large Deviations and Asymptotic Methods in Finance, vol. 110. Springer, Berlin (2015)
MATH Google Scholar
Friz, P.K., Klose, T.: Precise Laplace asymptotics for singular stochastic PDEs: the case of 2D gPAM. J. Funct. Anal. 283(1), 109446 (2022)
MathSciNet MATH Google Scholar
Fuchs, A., Herbert, C., Rolland, J., Wächter, M., Bouchet, F., Peinke, J.: Instantons and the path to intermittency in turbulent flows. Phys. Rev. Lett. 129(3), 034502 (2022)
MathSciNet Google Scholar
Gálfi, V.M., Lucarini, V., Wouters, J.: A large deviation theory-based analysis of heat waves and cold spells in a simplified model of the general circulation of the atmosphere. J. Stat. Mech. Theory Exp 2019(3), 033404 (2019)
MathSciNet MATH Google Scholar
Gel’fand, I.M., Yaglom, A.M.: Integration in Functional Spaces and its Applications in Quantum Physics. J. Math. Phys. 1(1), 48–69 (1960)
MathSciNet MATH Google Scholar
Grafke, T., Grauer, R., Schäfer, T.: Instanton filtering for the stochastic Burgers equation. J. Phys. A: Math. Theor. 46(6), 062002 (2013)
MathSciNet MATH Google Scholar
Grafke, T., Grauer, R., Schäfer, T.: The instanton method and its numerical implementation in fluid mechanics. J. Phys. A: Math. Theor. 48(33), 333001 (2015)
MathSciNet MATH Google Scholar
Grafke, T., Grauer, R., Schindel, S.: Efficient computation of instantons for multi-dimensional turbulent flows with large scale forcing. Commun. Comput. Phys. 18(3), 577–592 (2015)
MathSciNet MATH Google Scholar
Grafke, T., Schäfer, T., Vanden-Eijnden, E.: Sharp Asymptotic Estimates for Expectations, Probabilities, and Mean First Passage Times in Stochastic Systems with Small Noise Commun. Pure Appl. Math. (2023). https://doi.org/10.1002/cpa.22177
Grafke, T., Vanden-Eijnden, E.: Numerical computation of rare events via large deviation theory. Chaos Interdiscip. J. Nonlinear Sci. 29(6), 063118 (2019)
MathSciNet MATH Google Scholar
Griewank, A., Walther, A.: Algorithm 799: revolve: An implementation of checkpointing for the reverse or adjoint mode of computational differentiation. ACM Trans. Math. Softw. 26(1), 19–45 (2000)
MATH Google Scholar
Halko, N., Martinsson, P.-G., Tropp, J.A.: Finding structure with randomness: Probabilistic algorithms for constructing approximate matrix decompositions. SIAM Rev. 53(2), 217–288 (2011)
MathSciNet MATH Google Scholar
Hartmann, A.K., Meerson, B., Sasorov, P.: Observing symmetry-broken optimal paths of the stationary Kardar–Parisi–Zhang interface via a large-deviation sampling of directed polymers in random media. Phys. Rev. E 104(5), 054125 (2021)
Google Scholar
Hartmann, L., Lesch, M.: Zeta and Fredholm determinants of self-adjoint operators. J. Funct. Anal. 283(1), 109491 (2022)
MathSciNet MATH Google Scholar
Herzog, R., Kunisch, K.: Algorithms for PDE-constrained optimization. GAMM-Mitteilungen 33(2), 163–176 (2010)
MathSciNet MATH Google Scholar
Hinze, M., Kunisch, K.: Second order methods for optimal control of time-dependent fluid flow. SIAM J. Control. Optim. 40(3), 925–946 (2001)
MathSciNet MATH Google Scholar
Hinze, M., Pinnau, R., Ulbrich, M., Ulbrich, S.: Optimization with PDE Constraints. Springer, Berlin (2009)
MATH Google Scholar
Hinze, M., Walther, A., Sternberg, J.: An optimal memory-reduced procedure for calculating adjoints of the instationary Navier–Stokes equations. Optim. Control Appl. Methods 27(1), 19–40 (2006)
MathSciNet Google Scholar
Kirsten, K., McKane, A.J.: Functional determinants by contour integration methods. Ann. Phys. 308(2), 502–527 (2003)
MathSciNet MATH Google Scholar
Klöckner, A., Pinto, N., Lee, Y., Catanzaro, B., Ivanov, P., Fasih, A.: PyCUDA and PyOpenCL: a scripting-based approach to GPU run-time code generation. Parallel Comput. 38(3), 157–174 (2012)
Google Scholar
Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK Users’ Guide: Solution of Large-scale Eigenvalue Problems with Implicitly Restarted Arnoldi Methods. SIAM, New Delhi (1998)
MATH Google Scholar
Levit, S., Smilansky, U.: A theorem on infinite products of eigenvalues of Sturm-Liouville type operators. In: Proceedings of the American Mathematical Society, pp. 299–302 (1977)
Lewis, F.L., Vrabie, D., Syrmos, V.L.: Optimal Control. John Wiley & Sons, New York (2012)
MATH Google Scholar
Lu, Y., Stuart, A., Weber, H.: Gaussian approximations for transition paths in Brownian dynamics. SIAM J. Math. Anal. 49(4), 3005–3047 (2017)
MathSciNet MATH Google Scholar
Maier, R.S., Stein, D.L.: A scaling theory of bifurcations in the symmetric weak-noise escape problem. J. Stat. Phys. 83(3), 291–357 (1996)
MathSciNet MATH Google Scholar
Margazoglou, G., Grafke, T., Laio, A., Lucarini, V.: Dynamical landscape and multistability of a climate model. Proc. R. Soc. A 477(2250), 20210019 (2021)
MathSciNet Google Scholar
Maulik, R., Mengaldo, G.: PyParSVD: a streaming, distributed and randomized singular-value-decomposition library. In: 2021 7th International Workshop on Data Analysis and Reduction for Big Scientific Data (DRBSD-7), pp. 19–25 (2021)
McKean, H.: Fredholm determinants. Open Math. 9(2), 205–243 (2011)
MathSciNet MATH Google Scholar
Mohamad, M.A., Sapsis, T.P.: Sequential sampling strategy for extreme event statistics in nonlinear dynamical systems. Proc. Natl. Acad. Sci. 115(44), 11138–11143 (2018)
MathSciNet MATH Google Scholar
Moriconi, L.: Statistics of intense turbulent vorticity events. Phys. Rev. E 70, 025302 (2004)
Google Scholar
Naumann, U.: The art of differentiating computer programs: an introduction to algorithmic differentiation. SIAM, New Delhi (2011)
MATH Google Scholar
Nickelsen, D., Engel, A.: Asymptotics of work distributions: the pre-exponential factor. Eur. Phys. J. B 82(3), 207–218 (2011)
Google Scholar
Papadopoulos, G.J.: Gaussian path integrals. Phys. Rev. D 11(10), 2870–2875 (1975)
Google Scholar
Piterbarg, V.I., Fatalov, V.R.: The Laplace method for probability measures in Banach spaces. Russ. Math. Surv. 50(6), 1151 (1995)
MathSciNet MATH Google Scholar
Plessix, R.-E.: A review of the adjoint-state method for computing the gradient of a functional with geophysical applications. Geophys. J. Int. 167(2), 495–503 (2006)
Google Scholar
Psaros, A.F., Kougioumtzoglou, I.A.: Functional series expansions and quadratic approximations for enhancing the accuracy of the Wiener path integral technique. J. Eng. Mech. 146(7), 04020065 (2020)
Google Scholar
Rackwitz, R.: Reliability analysis-a review and some perspectives. Struct. Saf. 23(4), 365–395 (2001)
Google Scholar
Ragone, F., Wouters, J., Bouchet, F.: Computation of extreme heat waves in climate models using a large deviation algorithm. Proc. Natl. Acad. Sci. 115(1), 24–29 (2018)
MathSciNet MATH Google Scholar
Ray, D.B., Singer, I.M.: R-torsion and the Laplacian on Riemannian manifolds. Adv. Math. 7(2), 145–210 (1971)
MathSciNet MATH Google Scholar
Schiff, J., Shnider, S.: A natural approach to the numerical integration of Riccati differential equations. SIAM J. Numer. Anal. 36(5), 1392–1413 (1999)
MathSciNet MATH Google Scholar
Schorlepp, T., Grafke, T., Grauer, R.: Gel’fand–Yaglom type equations for calculating fluctuations around Instantons in stochastic systems. J. Phys. A: Math. Theor. 54(23), 235003 (2021)
MathSciNet MATH Google Scholar
Schorlepp, T., Grafke, T., Grauer, R.: Symmetries and zero modes in sample path large deviations. J. Stat. Phys. 190(3), 1–62 (2023)
MathSciNet MATH Google Scholar
Schorlepp, T., Grafke, T., May, S., Grauer, R.: Spontaneous symmetry breaking for extreme vorticity and strain in the three-dimensional Navier–Stokes equations. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 380(2226), 20210051 (2022)
MathSciNet Google Scholar
Schorlepp, T., Tong, S., Grafke, T., Stadler, G.: Source Code for “Scalable Methods for Computing Sharp Extreme Event Probabilities in Infinite-Dimensional Stochastic Systems”, GitHub Repository (2023) https://github.com/TimoSchorlepp/sharp-extreme-event
Simon, B.: Notes on infinite determinants of Hilbert space operators. Adv. Math. 24(3), 244–273 (1977)
MathSciNet MATH Google Scholar
Simonnet, E.: Computing non-equilibrium trajectories by a deep learning approach, J. Comp. Phys. 491, (2023). https://doi.org/10.1016/j.jcp.2023.112349
Sternberg, J., Hinze, M.: A memory-reduced implementation of the Newton-CG method in optimal control of nonlinear time-dependent PDEs. Optim. Methods Softw. 25(4), 553–571 (2010)
MathSciNet MATH Google Scholar
Stillfjord, T.: Adaptive high-order splitting schemes for large-scale differential Riccati equations. Numer. Algorithms 78(4), 1129–1151 (2018)
MathSciNet MATH Google Scholar
Tong, S., and Stadler, G.: Large deviation theory-based adaptive importance sampling for rare events in high dimensions, SIAM/ASA J. Uncertain. Quantif. 11(3) 788–813, (2023) https://doi.org/10.1137/22M1524758
Tong, S., Vanden-Eijnden, E., Stadler, G.: Extreme event probability estimation using PDE-constrained optimization and large deviation theory, with application to tsunamis. Commun. Appl. Math. Comput. Sci. 16(2), 181–225 (2021)
MathSciNet MATH Google Scholar
Vanden-Eijnden, E.: Transition path theory, computer simulations in condensed matter systems: from materials to chemical biology, vol. 1, pp. 453–493. Springer, Berlin (2006)
Google Scholar
Vanden-Eijnden, E., Weare, J.: Rare event simulation of small noise diffusions. Commun. Pure Appl. Math. 65(12), 1770–1803 (2012)
MathSciNet MATH Google Scholar
Varadhan, S.S.: Large deviations and applications, vol. 46. SIAM, New Delhi (1984)
MATH Google Scholar
Zhang, B.J., Sahai, T., Marzouk, Y.M.: A Koopman framework for rare event simulation in stochastic differential equations. J. Comput. Phys. 456, 111025 (2022)
MathSciNet MATH Google Scholar
Zhao, Y., Psaros, A.F., Petromichelakis, I., Kougioumtzoglou, I.A.: A quadratic Wiener path integral approximation for stochastic response determination of multi-degree-of-freedom nonlinear systems. Probab. Eng. Mech. 69, 103319 (2022)
Google Scholar
Zinn-Justin, J.: Quantum field theory and critical phenomena, vol. 171. Oxford University Press, Oxford (2021)

Download references

Acknowledgements

The authors would like to thank Sandra May, Rainer Grauer, and Eric Vanden-Eijnden for helpful discussions. T.S. acknowledges the support received from the Ruhr University Research School, funded by Germany’s Excellence Initiative [DFG GSC 98/3], that enabled a research visit at the Courant Institute of Mathematical Sciences. T.G. acknowledges the support received from the EPSRC projects EP/T011866/1 and EP/V013319/1.

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

Authors and Affiliations

Institute for Theoretical Physics I, Ruhr University Bochum, Bochum, Germany
Timo Schorlepp
Department of Applied Physics and Applied Mathematics, Columbia University, New York, NY, USA
Shanyin Tong
Mathematics Institute, University of Warwick, Coventry, UK
Tobias Grafke
Courant Institute of Mathematical Sciences, New York University, New York, NY, USA
Georg Stadler

Authors

Timo Schorlepp
View author publications
You can also search for this author in PubMed Google Scholar
Shanyin Tong
View author publications
You can also search for this author in PubMed Google Scholar
Tobias Grafke
View author publications
You can also search for this author in PubMed Google Scholar
Georg Stadler
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

T.S. performed the numerical calculations for the paper. All authors contributed to the development of the theoretical formalism and the writing of the manuscript

Corresponding author

Correspondence to Timo Schorlepp.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A: Derivations

1.1 1. Laplace method in finite dimensions

In this section, we give a more detailed explanation on how the finite dimensional Laplace Method is used to estimate extreme event probabilities in complex systems. It follows arguments similar to Dematteis et al. (2019), Tong et al. (2021).

In

$$\begin{aligned} P_F^\varepsilon (z) =&\left( 2 \pi \varepsilon \right) ^{-N/2}\nonumber \\&\times \int _{\mathbb {R}^N} \mathbbm {1}_{\{F \ge z\}}(\eta ) \exp \left\{ -\frac{1}{2 \varepsilon } \left\Vert \eta \right\Vert _N^2 \right\} \textrm{d}^N \eta \,, \end{aligned}$$

(A1)

we expand

$$\begin{aligned} \eta = \eta _z + \varepsilon \eta _1 + \sqrt{\varepsilon } \eta _2 \end{aligned}$$

(A2)

with $\eta _1, \eta _2 \in \mathbb {R}^N$ satisfying $\eta _1 \parallel \eta _z$ and $\eta _2 \in \eta _z^\perp $, such that

$$\begin{aligned} \frac{1}{2 \varepsilon }\left\Vert \eta \right\Vert _N^2 = \frac{\varepsilon }{2} \left\Vert \eta _1 \right\Vert ^2_N + \left\langle \eta _1, \eta _z \right\rangle _N + \frac{1}{\varepsilon } I_F(z) + \frac{1}{2} \left\Vert \eta _2\right\Vert ^2_N \end{aligned}$$

(A3)

and

$$\begin{aligned} \frac{F(\eta ) - z}{\varepsilon } =&\frac{1}{\lambda _z} \left\langle \eta _1, \eta _z \right\rangle _N \nonumber \\&+ \frac{1}{2} \left\langle \eta _2, \nabla ^2 F(\eta _z) \eta _2 \right\rangle _N + {{\mathcal {O}}}\left( \varepsilon ^{1/2} \right) \,. \end{aligned}$$

(A4)

To motivate the decomposition (A2), note that the natural scaling for random fluctuations around the fixed state $\eta _z$ is clearly $\propto \sqrt{\varepsilon }$, and we use this ansatz for all directions except for the one parallel to the instanton. In this direction, due to the restriction $F \ge z$ of the event set, we can expect a different behavior, and the subsequent computations in this section confirm that a decay with $\varepsilon $ faster than $\sqrt{\varepsilon }$ is indeed observed. We obtain, with $\eta _1 = s e_z$ for $s \in \mathbb {R}$ and $e_z:= {\eta _z}/{\Vert \eta _z\Vert _N}$,

$$\begin{aligned}&P_F^\varepsilon (z) \overset{\varepsilon \downarrow 0}{\sim }\ (2 \pi )^{-N/2} \varepsilon ^{1/2} \exp \left\{ -\varepsilon ^{-1} I_F(z) \right\} \nonumber \\&\quad \times \int _{\eta ^\perp _z} \textrm{d}^{N-1} \eta _2 \; \exp \left\{ -\frac{1}{2} \left\Vert \eta _2\right\Vert ^2_{N} \right\} \nonumber \\&\quad \times \int _{- \frac{\lambda _z}{2 \left\Vert \eta _z\right\Vert _N} \left\langle \eta _2, \nabla ^2 F(\eta _z) \eta _2 \right\rangle _N}^\infty \textrm{d}s \; \exp \left\{ - s \left\Vert \eta _z\right\Vert _N \right\} \nonumber \\&= (2 \pi )^{-N/2} \varepsilon ^{1/2} \left\Vert \eta _z\right\Vert _N^{-1} \exp \left\{ -\varepsilon ^{-1} I_F(z) \right\} \nonumber \\&\quad \times \int _{\eta ^\perp _z} \textrm{d}^{N-1} \eta _2 \; \exp \left\{ -\frac{1}{2} \left\langle \eta _2, \left( 1_{N \times N} - \lambda _z \nabla ^2 F(\eta _z) \right) \eta _2 \right\rangle _{N} \right\} \nonumber \\&= (2 \pi )^{-1/2} \varepsilon ^{1/2} \exp \left\{ -\varepsilon ^{-1} I_F(z) \right\} \nonumber \\&\quad \times \left[ 2 I_F(z)\det \left( 1_{N \times N} - \lambda _z \, {{\,\textrm{pr}\,}}_{\eta _z^\perp } \nabla ^2 F(\eta _z) {{\,\textrm{pr}\,}}_{\eta _z^\perp } \right) \right] ^{-1/2}\,. \end{aligned}$$

(A5)

With this computation, we have motivated (7) and (8). A rigorous proof would consist of a more careful error analysis for the Laplace method, as detailed e.g. by Bleistein and Handelsman (1975).

1.2 2. Laplace method in infinite dimensions

It is a common strategy in large deviation theory to first study expectations of the type $\mathbb {E}\left[ \exp \left\{ \frac{1}{\varepsilon } F(\phi ^\varepsilon ) \right\} \right] $ for a family of random variables $\phi ^\varepsilon $ satisfying a large deviation principle, and a real-valued function F. Only later will these results be transformed onto probabilities or other probabilistic quantities. We directly use the results of Ben Arous (1988) to conclude that the asymptotic behavior of the moment-generating function (MGF) $A_F^\varepsilon :\mathbb {R}\rightarrow [0, \infty ]$, $A_F^\varepsilon (\lambda ) = \mathbb {E}\left[ \exp \left\{ \tfrac{\lambda }{\varepsilon } f(X_T^\varepsilon ) \right\} \right] $ of the observable $f(X^\varepsilon _T)$ for the additive-noise SDE (9) as $\varepsilon \downarrow 0$ is given by

$$\begin{aligned} A_F^\varepsilon (\lambda )\overset{\varepsilon \downarrow 0}{\sim }\ R_\lambda \exp \left\{ \varepsilon ^{-1} I_F^*(\lambda ) \right\} \end{aligned}$$

(A6)

with prefactor

$$\begin{aligned} R_\lambda = \left[ \det \left( {{\,\textrm{Id}\,}}- \left. \lambda \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _\lambda } \right) \right] ^{-1/2}\,. \end{aligned}$$

(A7)

Here, $I_F^*$ denotes the Legendre transform of the rate function $I_F$, $\det $ is a Fredholm determinant, the second variation operator $\left. \frac{\delta ^{2}F_\lambda }{\delta \eta ^{2}} \right| _{\eta _\lambda }$ is trace class, and $\eta _\lambda $ is short for $\eta _{z_\lambda }$ at the Legendre dual $z_\lambda $ of $\lambda $ via $I_F'(z_\lambda ) = \lambda $. Note that for multiplicative noise, the result would be different, which can already be seen in the simple example of a one-dimensional geometric Brownian motion and $f(x) = \tfrac{1}{2} \log ^2 x$. Furthermore, Ben Arous (1988) also assumes that the vector field b, the observable f, and their respective derivatives are bounded. A remark by Deuschel et al. (2014) shows how one could relax this assumption via localization.

Evaluating the inverse Laplace transform from the MGF (A6) to the probability density function

$$\begin{aligned} \rho _F^\varepsilon (z)&= \frac{1}{2\pi i \varepsilon } \int _C A_F^\varepsilon (\lambda ) \exp \left\{ -\frac{\lambda z}{\varepsilon } \right\} \;\textrm{d}\lambda \nonumber \\&\overset{\varepsilon \downarrow 0}{\sim }\ \left( 2 \pi \varepsilon \right) ^{-1/2} R_{\lambda _z} \sqrt{I_F''(z)} \exp \left\{ - \varepsilon ^{-1} I_F(z) \right\} \end{aligned}$$

(A8)

via a saddlepoint approximation, as well as a further integration to get the tail probability via a Laplace approximation yields the desired estimate with leading-order prefactor

$$\begin{aligned} C_F(z) = R_{\lambda _z}\sqrt{I_F''(z)}\lambda _z^{-1}\,. \end{aligned}$$

(A9)

From the first-order necessary condition

$$\begin{aligned} \eta _z = \lambda _z \left. \frac{\delta F}{\delta \eta } \right| _{\eta _z} \end{aligned}$$

(A10)

and $\lambda _z = I_F'(z)$, we get via differentiation

$$\begin{aligned} \frac{\lambda _z}{I_F''(z)} \frac{\textrm{d}\eta _z}{\textrm{d}z} = \left[ {{\,\textrm{Id}\,}}- \lambda _z\left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right] ^{-1} \eta _z\,, \end{aligned}$$

(A11)

so

$$\begin{aligned} C_F(z)&= \left[ \left\langle \eta _z, \frac{\lambda _z}{I_F''(z)} \frac{\textrm{d}\eta _z}{\textrm{d}z} \right\rangle _{L^2} \det \left( {{\,\textrm{Id}\,}}- \left. \lambda _z \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right) \right] ^{-1/2} \nonumber \\&= \left[ 2 I_F(z) \left\langle \frac{\eta _z}{\left\Vert \eta _z\right\Vert }, \left[ {{\,\textrm{Id}\,}}- \lambda _z\left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right] ^{-1} \frac{\eta _z }{\left\Vert \eta _z\right\Vert }\right\rangle _{L^2} \right. \times \nonumber \\&\quad \times \left. \det \left( {{\,\textrm{Id}\,}}- \left. \lambda _z \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right) \right] ^{-1/2} \nonumber \\&= \left[ 2 I_F(z) \det \left( {{\,\textrm{Id}\,}}- \lambda _z {{\,\textrm{pr}\,}}_{\eta _z^\perp } \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{ \eta _z} {{\,\textrm{pr}\,}}_{ \eta _z^\perp } \right) \right] ^{-1/2} \end{aligned}$$

(A12)

as claimed. The last equality is easy to see for finite-dimensional matrices: For $A \in \mathbb {R}^{N \times N}$ invertible and a unit vector $e \in \mathbb {R}^N$, the adjugate is $\text {adj}(A) = \det A \cdot A^{-1}$, and applying e from the left and right yields $\det A \left\langle e, A^{-1} e \right\rangle _N = \left\langle e, \text {adj}(A) e \right\rangle _N$. The right-hand side is the (e, e) cofactor of A, which is equal to the determinant of the $(N-1) \times (N-1)$ matrix $ {{\,\textrm{pr}\,}}_{e^\perp } A {{\,\textrm{pr}\,}}_{e^\perp }$ with ${{\,\textrm{pr}\,}}_{e^\perp } :\mathbb {R}^N \rightarrow e^\perp $ denoting the orthogonal projection. For the present infinite-dimensional case, an analogue of this relation can be verified using the series definition of the Fredholm determinant and adjugate as originally introduced by Fredholm himself (Fredholm 1903; McKean 2011).

1.3 3. From Fredholm determinants to zeta-regularized functional determinants

In this section, we motivate (26) using purely formal manipulations, and only consider linear observables f for simplicity. For rigorous results on the relation between Fredholm determinants and zeta-regularized determinants for related classes of operators, see e.g. Forman (1987), Hartmann and Lesch (2022). We start with the expression (A9) for the prefactor $C_F(z)$ in terms of the full second variation determinant without projection operators. According to the adjoint formulation of Sect. 2.2, we write the second variation $\delta ^2 (\lambda F) / \delta \eta ^2$ as the composition of three linear operators

$$\begin{aligned} \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} = \left[ L^\top _{ z,(T,0)} \right] ^{-1} \circ \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n \circ \left[ L_{z,(0,0)} \right] ^{-1}\,. \end{aligned}$$

(A13)

Here, the operator in the middle simply denotes pointwise multiplication with $\left\langle \nabla ^2 b(\phi _z(t)), \theta _z(t) \right\rangle _n$ for each $t \in [0,T]$. The rightmost operator, for a given argument $\delta \eta $, integrates

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{\gamma } = \nabla b(\phi _z) \gamma + \sigma \delta \eta \,, \\ \gamma (0) = 0 \end{array}\right. } \end{aligned}$$

(A14)

and sets $\left[ L_{z,(0,0)} \right] ^{-1} \delta \eta = \gamma $. Symbolically, we have

$$\begin{aligned} \left[ L_{z,(0,0)} \right] ^{-1} = \left[ \frac{\textrm{d}}{\textrm{d}t} - \nabla b(\phi _z) \right] ^{-1}_{(0,0)} \sigma , \end{aligned}$$

(A15)

where the subscript denotes inversion under the boundary condition $\gamma (0)=0$. Similarly, we put $\left[ L_{z,(T,0)}^\top \right] ^{-1} \gamma = \zeta $ with

$$\begin{aligned} \left[ L^\top _{z,(T,0)} \right] ^{-1} = \sigma ^\top \left[ -\frac{\textrm{d}}{\textrm{d}t} - \nabla b^\top (\phi _z) \right] ^{-1}_{(T,0)}, \end{aligned}$$

(A16)

under the boundary condition $\zeta (T) = 0$. Symbolically, we then get

$$\begin{aligned}&\quad \left[ \det \left( {{\,\textrm{Id}\,}}- \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right) \right] ^{-1/2} \nonumber \\&= \left[ \det \left( {{\,\textrm{Id}\,}}- \left[ L^\top _{ z,(T,0)} \right] ^{-1} \circ \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n \circ \left[ L_{z,(0,0)} \right] ^{-1} \right) \right] ^{-1/2} \nonumber \\&= \left[ \frac{{{\,\textrm{Det}\,}}\left( L^\top _{ z,(T,0)} L_{z,(0,0)} - \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n \right) }{{{\,\textrm{Det}\,}}\left( L^\top _{ z,(T,0)} L_{z,(0,0)} \right) } \right] ^{-1/2} \nonumber \\&= \left[ \frac{{{\,\textrm{Det}\,}}\left( L^\top _{ z,(T,0)} L_{z,(0,0)} - \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n \right) }{{{\,\textrm{Det}\,}}\left( L^\top _{ 0,(T,0)} L_{0,(0,0)} \right) } \right] ^{-1/2} \times \nonumber \\&\quad \times \left[ \frac{{{\,\textrm{Det}\,}}\left( L^\top _{ 0,(T,0)} L_{0,(0,0)} \right) }{{{\,\textrm{Det}\,}}\left( L^\top _{ z,(T,0)} L_{z,(0,0)} \right) } \right] ^{-1/2}\,. \end{aligned}$$

(A17)

Here, the critical step is in the second line where the operators are moved out of the Fredholm determinant to get a fraction of two zeta-regularized determinants, which is true for finite-dimensional matrices but non-trivial for general operators. We see that the boundary conditions of all appearing operators are

$$\begin{aligned} {{\mathcal {A}}}_0:{\left\{ \begin{array}{ll} \gamma (0) = 0\,,\\ \zeta (T)=0\,, \end{array}\right. } \end{aligned}$$

(A18)

which is the correct special case of the general boundary conditions

$$\begin{aligned} {{\mathcal {A}}}_{\lambda _z}:{\left\{ \begin{array}{ll} \gamma (0) = 0\,,\\ \zeta (T)= \lambda _z \nabla ^2 f(\phi _z(T)) \gamma (T) \end{array}\right. } \end{aligned}$$

(A19)

from Schorlepp et al. (2023) for a linear observable f. Moreover, we have

$$\begin{aligned}&L^\top _{ z,(T,0)} L_{z,(0,0)} - \left\langle \nabla ^2 b(\phi _z), \theta _z \right\rangle _n \nonumber \\&= \left[ -\frac{\textrm{d}}{\textrm{d}t} - \nabla b^\top (\phi _z) \right] \ a^{-1} \left[ \frac{\textrm{d}}{\textrm{d}t} - \nabla b(\phi _z) \right] - \langle \nabla ^2 b(\phi _z),\theta _z \rangle _n \nonumber \\&=: \Omega [\phi _z] \,, \end{aligned}$$

(A20)

which is the Jacobi operator, defined via $\delta ^2\,S[\phi _z][\gamma ] = \frac{1}{2} \int _0^T \left\langle \gamma , \Omega [\phi _z] \gamma \right\rangle _n \textrm{d}t$, for the Freidlin-Wentzell action functional $S[\phi ] = \tfrac{1}{2} \int _0^T \langle \dot{\phi } - b(\phi ), a^{-1}(\dot{\phi } - b(\phi ) ) \rangle _n \, \textrm{d}t$. We then use Forman’s theorem (Forman 1987) to evaluate the second ratio of determinants in (A17)

$$\begin{aligned}&\left[ \frac{{{\,\textrm{Det}\,}}\left( L^\top _{ 0,(T,0)} L_{0,(0,0)} \right) }{{{\,\textrm{Det}\,}}\left( L^\top _{ z,(T,0)} L_{z,(0,0)} \right) } \right] ^{-1/2} \nonumber \\&\quad = \exp \left\{ -\tfrac{1}{2} \int _0^T \left( \nabla \cdot b(\phi _z) - \nabla \cdot b(\phi _0)\right) \,\textrm{d}t\right\} \,, \end{aligned}$$

(A21)

thereby finishing the motivation of the result (26).

1.4 4. Full covariance function via eigenvalues and eigenfunctions

In this section, we formally derive (37). First, we introduce the evaluation maps $\Phi _t$ for $t \in [0,T]$ as $(\eta (s))_{s \in [0,T]} \xrightarrow {\Phi _t} \phi (t)$ with

$$\begin{aligned} {\left\{ \begin{array}{ll} \dot{\phi } = b(\phi ) + \sigma \eta \,,\\ \phi (0) = x\,. \end{array}\right. } \end{aligned}$$

(A22)

Then

$$\begin{aligned}&{{\mathcal {C}}}_z(t,t') \nonumber \\&= \lim _{\varepsilon \downarrow 0} \mathbb {E}\left[ \frac{(X_t^\varepsilon - \phi _z(t))\otimes (X_{t'}^\varepsilon - \phi _z(t'))}{\varepsilon } \bigg \vert f(X_T^\varepsilon ) = z\right] \nonumber \\&= \lim _{\varepsilon \downarrow 0} \left( \varepsilon \mathbb {E}\left[ \delta (f(\Phi _T[\sqrt{\varepsilon }\eta ]) - z) \right] \right) ^{-1} \times \nonumber \\&\quad \times \left( \mathbb {E}\left[ (\Phi _t[\sqrt{\varepsilon }\eta ] - \phi _z(t))\otimes (\Phi _{t'}[\sqrt{\varepsilon }\eta ] - \phi _z(t')) \right. \right. \times \nonumber \\&\quad \times \left. \left. \delta (f(\Phi _T[\sqrt{\varepsilon }\eta ]) - z) \right] \right) , \end{aligned}$$

(A23)

where $\delta $ denotes the Dirac delta function. The denominator of (A23) is just the PDF $\rho _F^\varepsilon (z)$; we already know its asymptotic behavior from (15). In short, its asymptotics are obtained as

$$\begin{aligned}&\mathbb {E}\left[ \delta (f(\Phi _T[\sqrt{\varepsilon }\eta ]) - z) \right] \nonumber \\&=\frac{1}{2 \pi i \varepsilon } \int _{-i \infty }^{i \infty } \textrm{d}\lambda \int D \eta \, \exp \left\{ -\frac{1}{\varepsilon } \left( \frac{1}{2}\left\Vert \eta \right\Vert ^2_{L^2} - \lambda (F[\eta ] - z) \right) \right\} \nonumber \\&\overset{\varepsilon \downarrow 0}{\sim }\ \frac{1}{2 \pi i \varepsilon ^{1/2}} \exp \left\{ -I_F(z) / \varepsilon \right\} \int _{-i \infty }^{i \infty } \textrm{d}\lambda \int D \eta \times \nonumber \\&\times \exp \left\{ -\frac{1}{2} \left\langle \eta , \left[ {{\,\textrm{Id}\,}}- \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right] \eta \right\rangle _{L^2} \right\} \exp \bigg \{ \lambda \bigg \langle \underbrace{\left. \frac{\delta F}{\delta \eta } \right| _{\eta _z}}_{= \eta _z / \lambda _z}, \eta \bigg \rangle _{L^2} \bigg \} \nonumber \\&= \frac{1}{\varepsilon ^{1/2}} \exp \left\{ -I_F(z) / \varepsilon \right\} \frac{\left|\lambda _z\right|}{\left\Vert \eta _z\right\Vert _{L^2}} \times \nonumber \\&\times \underbrace{\int D \eta \, \exp \left\{ -\frac{1}{2} \left\langle \eta , \left[ {{\,\textrm{Id}\,}}- \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right] \eta \right\rangle _{L^2} \right\} \delta \left( \left\langle e_z, \eta \right\rangle _{L^2} \right) }_{=(2 \pi )^{-1/2} \det \left( {{\,\textrm{Id}\,}}- A_z \right) ^{-1/2}}\,. \end{aligned}$$

(A24)

Here, in the first step, the PDF was written as the inverse Laplace transform of the moment-generating function, and the expectation over $\eta $ was expressed as a functional integral. Then, in the second step, all integration variables were expanded up to second order around the stationary point $(\eta _z, \lambda _z)$. Finally, in the last step, the $\lambda $ integral was interpreted as a delta function again, restricting the functional integration to the subspace orthogonal to $e_z = \eta _z / \left\Vert \eta _z\right\Vert _{L^2}$. Hence, the Gaussian integral yields the determinant in the subspace $\eta _z^\perp $, and the factor of $(2 \pi )^{-1/2}$ appears due to the normalization of the functional integral. For the numerator of (A23), we proceed similarly:

$$\begin{aligned}&\varepsilon ^{-1} \mathbb {E}\left[ (\Phi _t[\sqrt{\varepsilon }\eta ] - \phi _z(t))\otimes (\Phi _{t'}[\sqrt{\varepsilon }\eta ] - \phi _z(t')) \right. \nonumber \\&\qquad \quad \times \left. \delta (f(\Phi _T[\sqrt{\varepsilon }\eta ]) - z) \right] \nonumber \\&\overset{\varepsilon \downarrow 0}{\sim }\ \frac{1}{\varepsilon ^{1/2}} \exp \left\{ -I_F(z) / \varepsilon \right\} \frac{\left|\lambda _z\right|}{\left\Vert \eta _z\right\Vert _{L^2}} \nonumber \\&\quad \times \int D \eta \, \exp \left\{ -\frac{1}{2} \left\langle \eta , \left[ {{\,\textrm{Id}\,}}- \lambda _z \left. \frac{\delta ^{2}F}{\delta \eta ^{2}} \right| _{\eta _z} \right] \eta \right\rangle _{L^2} \right\} \nonumber \\&\quad \times \delta \left( \left\langle e_z, \eta \right\rangle _{L^2} \right) \left. \delta \Phi _t \right| _{\eta _z}[\eta ] \otimes \left. \delta \Phi _{t'} \right| _{\eta _z}[\eta ]\,, \end{aligned}$$

(A25)

where $\left. \delta \Phi _t \right| _{\eta _z}$ denotes the first variation of $\Phi _t$. One can show, by first using an adjoint variable and then proceeding similar to the boundary condition computation in section A5, that

$$\begin{aligned} \left. \delta \Phi _t \right| _{\eta _z}[\eta ] = \gamma (t) \end{aligned}$$

(A26)

is the state space fluctuation from (23) around $\phi _z$ at time t associated with $\eta $. Since this is a linear function of $\eta $, expanding

$$\begin{aligned} \eta = \sum _{i = 1}^\infty \alpha _i \delta \eta _z^{(i)} \end{aligned}$$

(A27)

in terms of the orthonormal eigenfunctions of $A_z$ and performing the Gaussian integration in the $\alpha $ variables then leads to (37).

1.5 5. Final time conditioned fluctuations boundary condition

Here, we show that for the state variable fluctuations $\gamma $ associated with any $\delta \eta \in \eta _z^\perp \subset L^2([0,T],\mathbb {R}^n)$, the final time boundary condition

$$\begin{aligned} \left\langle \lambda _z \nabla f(\phi _z(T)), \gamma (T) \right\rangle _n = 0 \end{aligned}$$

(A28)

holds, and hence the result (37) for the fluctuation covariance in terms of the $\gamma ^{(i)}_z$’s is consistent with (39). Note that the linearized state equation for $\gamma $ in (23) can be formally integrated to get

$$\begin{aligned} \gamma (T) = \int _0^T {{\mathcal {T}}} \left[ \exp \left\{ \int _t^T \nabla b(\phi _z(\tau )) \, \textrm{d}\tau \right\} \right] \sigma \delta \eta (t) \, \textrm{d}t\,, \end{aligned}$$

(A29)

where ${{\mathcal {T}}}$ is the time-ordering operator. Similarly, from the first order adjoint equation in (19), we get

$$\begin{aligned} \theta _z(t) = {{\mathcal {T}}} \left[ \exp \left\{ \int _t^T \nabla b(\phi _z(\tau ))^\top \textrm{d}\tau \right\} \right] \lambda _z \nabla f(\phi _z(T))\,, \end{aligned}$$

(A30)

and hence

$$\begin{aligned} \left\langle \lambda _z \nabla f(\phi _z(T)), \gamma (T) \right\rangle _n = \left\langle \eta _z, \delta \eta \right\rangle _{L^2} = 0\ \end{aligned}$$

(A31)

by transposing.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Schorlepp, T., Tong, S., Grafke, T. et al. Scalable methods for computing sharp extreme event probabilities in infinite-dimensional stochastic systems. Stat Comput 33, 137 (2023). https://doi.org/10.1007/s11222-023-10307-2

Download citation

Received: 21 March 2023
Accepted: 26 September 2023
Published: 13 October 2023
DOI: https://doi.org/10.1007/s11222-023-10307-2

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Scalable methods for computing sharp extreme event probabilities in infinite-dimensional stochastic systems

Abstract

Similar content being viewed by others

The Bayesian Approach to Inverse Problems

The Bayesian Approach to Inverse Problems

Stochastic Methods for Solving High-Dimensional Partial Differential Equations

Explore related subjects

1 Introduction

1.1 Laplace method for normal random variables in \(\mathbb {R}^N\)

1.2 Generalization to infinite dimensions for SDEs with additive noise

2 Numerical rate function and prefactor evaluation

2.1 First variations and finding the instanton

2.2 Second variations and prefactor computation via dominant eigenvalues

2.3 Alternative: prefactor computation via matrix Riccati differential equations

2.4 Computational efficiency considerations

3 Probabilistic interpretation via fluctuation covariances and transition tubes

4 Computational examples

4.1 Stochastic Korteweg–De Vries equation

4.2 Stochastically forced incompressible three-dimensional Navier–Stokes equations

5 Summary and outlook

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflict of interest

Additional information

Publisher's Note

Appendix A: Derivations

Appendix A: Derivations

1.1 1. Laplace method in finite dimensions

1.2 2. Laplace method in infinite dimensions

1.3 3. From Fredholm determinants to zeta-regularized functional determinants

1.4 4. Full covariance function via eigenvalues and eigenfunctions

1.5 5. Final time conditioned fluctuations boundary condition

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation