First we recap the main results of the large deviation theory of noise-perturbed dynamical systems, including the exponential scaling law of mean exit times, Eq. (4), which is a key ingredient of our algorithm. A precursor to this is Kramers’ seminal work [14] on the ‘escape rate’ of a particle from a “deep” (1D) potential well over a barrier, a concise presentation of which can be found in Sect. 5.10.1 of [9]. We consider the rather generic situation in which the dynamics is governed by the following Langevin stochastic differential equation (SDE):
$$\begin{aligned} \dot{\mathbf {x}} = \mathbf {F}(\mathbf {x}) + \sigma \mathbf {D}\varvec{\xi }(t), \end{aligned}$$
(1)
\(\mathbf {x},\mathbf {F},\varvec{\xi }\in \mathbb {R}^n\), and the diffusion matrix \(\mathbf {D}\in \mathbb {R}^{n\times n}\) is independent of \(\mathbf {x}\), i.e., the white noise \(\varvec{\xi }\) is additive. In the context of stochastic parametrization, a lack of time-scale separation between the resolved (\(\mathbf {x}\)) and unresolved variables leads to a memory effect, which is not included in Eq. (1) [16]; however, this form is recovered already under a moderate time-scale separation [17]. The vector field \(\mathbf {F}(\mathbf {x})\) is such that it gives rise, under the deterministic dynamics, \(\sigma =0\), to the coexistence of multiple attractors (including the possibility of an attractor at infinity) and at least one nonattracting invariant set, often called a saddle set. The saddle set is embedded in the boundary of some basins of attraction. Based on a well-established theory due to Freidlin and Wentzell [18], and its extensions [19, 20], the steady-state probability distribution in the weak-noise limit, \(\sigma \ll 1\), can be written as
$$\begin{aligned} W(\mathbf {x}) \sim Z(\mathbf {x})\exp (-2\Phi (\mathbf {x})/\sigma ^2), \end{aligned}$$
(2)
in which \(\Phi (\mathbf {x})\) is called the nonequilibrium potential or quasipotential. In gradient systems, where \(\mathbf {F}(\mathbf {x})=-\nabla V(\mathbf {x})\), we have that \(\Phi (\mathbf {x})=V(\mathbf {x})\), provided that \(\mathbf {D}=\mathbf {I}\). If \(\mathbf {D}\) does depend on \(\mathbf {x}\), then \(W(\mathbf {x})\) might not satisfy a large deviation law \(\lim _{\sigma \rightarrow 0}\sigma ^2\ln W(\mathbf {x})=-2\Phi (\mathbf {x})\). See e.g. [21] for an example with multiplicative noise where \(\lim _{\sigma \rightarrow 0}\sigma ^2\ln W(\mathbf {x}=x)\) does not exist for some parameter settings and \(W(x)\) has a fat tail.
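For a concrete gradient example (our illustrative sketch; the process and all numerical values are assumptions, not taken from the text), consider the Ornstein–Uhlenbeck process \(\dot{x}=-x+\sigma \xi \), for which \(V(x)=x^2/2\), hence \(\Phi (x)=x^2/2\) and, by Eq. (2), \(W(x)\propto \exp (-x^2/\sigma ^2)\): a Gaussian of variance \(\sigma ^2/2\). A short ensemble simulation confirms this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative gradient example (an assumption, not from the text):
# dx = -x dt + sigma dW, i.e., F(x) = -V'(x) with V(x) = x^2/2, so that
# Phi(x) = V(x) and Eq. (2) predicts W(x) ∝ exp(-x^2/sigma^2),
# a Gaussian with stationary variance sigma^2/2.
sigma, dt, n_steps, n_part = 0.4, 0.01, 1500, 20000
x = np.zeros(n_part)
for _ in range(n_steps):
    # Euler-Maruyama step for the whole ensemble at once
    x += -x * dt + sigma * np.sqrt(dt) * rng.standard_normal(n_part)

print(x.var(), sigma**2 / 2)  # empirical vs predicted stationary variance
```

The ensemble is run long enough (\(t=15\) time units) that the transient from the initial condition has decayed.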
The probability density of the time \(t_t\) at which a noise-perturbed trajectory escapes the basin of attraction decays exponentially:
$$\begin{aligned} P(t_t) \sim \frac{1}{\tau }\exp (-t_t/\tau ). \end{aligned}$$
(3)
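The exponential law (3) can be probed numerically with a minimal sketch of ours (the double-well potential and all parameters below are assumptions, not part of the original study): escape times of an ensemble integrated by the Euler–Maruyama scheme have a coefficient of variation (std/mean) close to 1, as an exponential distribution requires.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical double-well example (an assumption): V(x) = x^4/4 - x^2/2,
# F(x) = -V'(x) = x - x^3; attractors at x = ±1, saddle at x = 0,
# barrier height ΔΦ = V(0) - V(-1) = 1/4.
def escape_times(sigma, n_traj=400, dt=0.01, t_max=300.0):
    """First times an ensemble started at x = -1 crosses the boundary x = 0."""
    x = np.full(n_traj, -1.0)
    t_esc = np.full(n_traj, np.nan)
    alive = np.ones(n_traj, dtype=bool)
    for k in range(1, int(t_max / dt) + 1):
        n = alive.sum()
        if n == 0:
            break
        # Euler-Maruyama step for the trajectories still in the basin
        x[alive] += (x[alive] - x[alive]**3) * dt \
                    + sigma * np.sqrt(dt) * rng.standard_normal(n)
        crossed = alive & (x >= 0.0)
        t_esc[crossed] = k * dt
        alive &= ~crossed
    return t_esc[np.isfinite(t_esc)]

t = escape_times(sigma=0.7)
# For an exponential law the coefficient of variation std/mean is 1.
print(t.mean(), t.std() / t.mean())
```

The noise strength is chosen moderate so that transitions are observable in a short run; the agreement with exponentiality improves as \(\sigma \) decreases.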
The approximation is in fact quite good already for times \(t_t\approx {{\,\mathrm{\text {E}}\,}}[t_t]=\tau \) or even smaller. The reciprocal of the expectation value \(\tau \) can be written as an integral of the probability current through the basin boundary, whose leading component as \(\sigma \rightarrow 0\) comes from a point \(\mathbf {x}_e\) where \(\Phi (\mathbf {x})\) is minimal on the boundary. It is, of course, the global minimum when multiple local minima, corresponding to multiple invariant saddle sets embedded in the basin boundary, are present. The proportionality of the probability current to \(W(\mathbf {x})\) leads [8, 11, 15, 22] to:
$$\begin{aligned} \tau \propto \exp (2\Delta \Phi /\sigma ^2), \end{aligned}$$
(4)
where
$$\begin{aligned} \Delta \Phi =\Phi (\mathbf {x}_e)-\Phi (A) \end{aligned}$$
(5)
is what we call the potential barrier height. Both the saddle and the attractor can be chaotic, in which case \(\Phi (\mathbf {x}_e)\) and \(\Phi (A)\) have been shown to be constant over the saddle [19] and the attractor [20], respectively.
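The scaling law (4) underpins the regression-type estimation of \(\Delta \Phi \) discussed next; the following synthetic sketch (ours; all parameter values are assumptions) draws exit times from the exponential law with means obeying Eq. (4), and recovers the barrier height as the slope of \(\ln \bar{\tau }\) against \(2\sigma ^{-2}\):

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic illustration (assumed values, not from the text):
delta_phi = 0.5   # "true" barrier height to be recovered
tau_a = 10.0      # mean exit time at the anchor noise strength
sigma_a = 1.0
sigmas = np.array([0.9, 0.8, 0.7, 0.6, 0.5])
N = 200           # transitions registered per noise strength

log_tau_bar = []
for s in sigmas:
    tau = tau_a * np.exp(2 * delta_phi * (s**-2 - sigma_a**-2))      # Eq. (6)
    log_tau_bar.append(np.log(rng.exponential(tau, size=N).mean()))  # Eq. (3)

# Regressing ln(tau_bar) on 2/sigma^2 estimates ΔΦ as the slope, by Eq. (4).
slope, intercept = np.polyfit(2 * sigmas**-2.0, log_tau_bar, 1)
print(slope)   # close to delta_phi = 0.5
```

In a real application each \(\bar{t}_t\) would of course come from simulation or experiment rather than from sampling the exponential law directly.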
Considering (4), the expected transition times increase “explosively” as the noise strength \(\sigma \) decreases. From the point of view of estimating \(\Delta \Phi \), say, in a setting of linear regression with \(\ln \tau \) as the dependent and \(2\sigma ^{-2}\) as the independent variable, there seems to be a trade-off between the improving accuracy of the estimation and the increasing demand on resources as the range of the independent variable is extended towards smaller values of \(\sigma \). Accordingly, if we fix the amount of resources that we are willing to commit, an improvement in accuracy is not guaranteed, because on average we can register fewer transitions with smaller values of \(\sigma \). On the other hand, aiming to boost the sample number by restricting \(\sigma \) to larger values might not improve accuracy either, for the reason that Eq. (7) will illuminate below. We assume that for some \(\sigma _a\) we can estimate \(\tau =\tau _a\) arbitrarily accurately, because a large number of transitions can be achieved relatively inexpensively. We also assume that at this “anchor point” (4) applies accurately:
$$\begin{aligned} \tau \approx \tau _a\exp \big (2\Delta \Phi (\sigma ^{-2} - \sigma _a^{-2})\big ), \quad \sigma <\sigma _a. \end{aligned}$$
(6)
We note in passing that in gradient systems the prefactor \(\tau _a\exp (-2\Delta \Phi \sigma _a^{-2})\) in the above is given by the Eyring–Kramers formula [23]; see a generalisation of that for irreversible diffusion processes in [24], where it is assumed, however, that the saddle is a fixed point. Subsequently, as a departure from the regression-type estimation framework, we will quantify the accuracy of the estimation by
$$\begin{aligned} \delta \Delta \Phi = \frac{\sqrt{{{\,\mathrm{\text {Var}}\,}}[\ln \bar{t}_t]}}{y}, \end{aligned}$$
(7)
where we introduced \(y=\sigma ^{-2} - \sigma _a^{-2}\), and \(\bar{t}_t=\frac{1}{N}\sum _{i=1}^Nt_{t,i}\) is our finite-\(N\) estimate of \(\tau \) for a fixed \(\sigma \). Now we can see that as \(\sigma \rightarrow \sigma _a\), the inaccuracy explodes. That is, in the present specific setting of estimation there should exist an optimal value \(\sigma ^*\) of \(\sigma \). This is what we determine next.
Being a sum of \(N\) independent exponentially distributed random variables, \(N\bar{t}_t\) does in fact follow an Erlang distribution [25], and so:
$$\begin{aligned} P(\bar{t}_t) \sim \frac{N}{\tau ^N}\frac{(N\bar{t}_t)^{N-1}}{(N-1)!}\exp (-N\bar{t}_t/\tau ). \end{aligned}$$
(8)
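A quick Monte Carlo check of ours illustrates the distributional facts stated around Eq. (8): the variance of \(\ln \bar{t}_t\) for \(N\) pooled exponential samples matches the trigamma value \(\Psi ^{(1)}(N)\) and is independent of \(\tau \) (the sample sizes and \(\tau \) values below are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(2)

def trigamma(n, terms=100_000):
    """psi^(1)(n) = sum_{k>=0} 1/(n+k)^2, truncated series with integral tail."""
    k = np.arange(terms)
    return np.sum(1.0 / (n + k) ** 2) + 1.0 / (n + terms)

N = 20
results = []
for tau in (0.5, 3.0, 100.0):   # Var[ln t_bar] should not depend on tau
    # 50,000 realisations of the mean of N exponential escape times
    t_bar = rng.exponential(tau, size=(50_000, N)).mean(axis=1)
    results.append(np.log(t_bar).var())

print(results, trigamma(N))   # all three close to psi^(1)(20)
```

The independence from \(\tau \) follows because \(\ln \bar{t}_t = \ln (\bar{t}_t/\tau ) + \ln \tau \), and the scale \(\tau \) only shifts the logarithm.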
Note that since \({{\,\mathrm{\text {E}}\,}}[\bar{t}_t]={{\,\mathrm{\text {E}}\,}}[t_t]=\tau \), our estimator \(\bar{t}_t\) is unbiased. Furthermore, \({{\,\mathrm{\text {Var}}\,}}[\bar{t}_t]={{\,\mathrm{\text {Var}}\,}}[t_t]/N=\tau ^2/N\) in accordance with the Central Limit Theorem. It can be shown that (8) implies that
$$\begin{aligned} {{\,\mathrm{\text {Var}}\,}}[\ln \bar{t}_t] = \Psi ^{(1)}(N), \end{aligned}$$
(9)
where \(\Psi ^{(1)}(N)\), the first derivative of the digamma function, is the so-called trigamma function [26]. We can make the interesting observation that \({{\,\mathrm{\text {Var}}\,}}[\ln \bar{t}_t]\) does not depend on \(\tau \), only on \(N\). Next, we make use of the approximation [26]
$$\begin{aligned} \Psi ^{(1)}(N) \sim 1/N \end{aligned}$$
(10)
and, upon substitution in (7), write that
$$\begin{aligned} \delta \Delta \Phi \sim \sqrt{\frac{\tau _a}{T}}\frac{\exp (\Delta \Phi y)}{y}, \end{aligned}$$
(11)
where, furthermore, we assumed a certain fixed commitment of resources, which can be expressed simply by \(T=N\tau \), and also made use of (6). We look for the \(\sigma =\sigma ^*\), or \(y=y^*\), that minimizes \(\delta \Delta \Phi \), for which we need to solve \({{\,\mathrm{\text {d}}\,}}\delta \Delta \Phi /{{\,\mathrm{\text {d}}\,}}y\propto \exp (\Delta \Phi y)(\Delta \Phi y-1)/y^{2}=0\). This yields our main result:
$$\begin{aligned} y^*=\Delta \Phi ^{-1}. \end{aligned}$$
(12)
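This optimum is easily checked numerically; the sketch below (ours, with an arbitrarily assumed \(\Delta \Phi \)) minimizes the \(y\)-dependent factor of Eq. (11) on a grid:

```python
import numpy as np

delta_phi = 0.7   # assumed barrier height, for illustration only
y = np.linspace(0.05, 10.0, 20_000)
err = np.exp(delta_phi * y) / y   # y-dependence of Eq. (11)
y_star = y[np.argmin(err)]
print(y_star, 1.0 / delta_phi)    # numerical minimum vs. Eq. (12)
```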
We can make the interesting observation that \(y^*\) is independent of \(\tau _a\) and \(T\), which we comment on shortly. Rather, it depends only on \(\Delta \Phi \) (in a very simple way), the unknown that we wanted to determine in the first place, and so the result may seem irrelevant to practice at first sight. However, one can simply start out with an initial guess \(\hat{\Delta \Phi }_0\) and iteratively update the estimate \(\hat{\Delta \Phi }_{i}\) by performing a maximum likelihood estimation (MLE) [27] each time a new value \(t_{t,i}\) is acquired. This way, for the acquisition of \(t_{t,i+1}\), one continues the experiment/simulation with an updated noise strength given by \(y_{i+1}^*=\hat{\Delta \Phi }^{-1}_i\), \(i=1,\dots ,N\), according to (12). The MLE of \(\Delta \Phi \) is based on the probability distribution (3) jointly with (6). This procedure is analogous to the well-established nonstationary extreme value statistics in which one or more parameters of the Generalised Extreme Value (GEV) distribution are functions of a covariate that could depend on time (see Chapter 6 of [27]). In our case \(\tau \) and \(\sigma \) correspond to the parameter of the GEV distribution and the covariate, respectively. We recall that since \(\sigma ^*\) does not depend on \(T\), at any time into the experiment/run (for large enough \(N\), though, such that (10) is a good approximation) our estimate of \(\Delta \Phi \) is obtained most efficiently, and so we can revise our commitment: either stopping the experiment/run early or extending it.
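The iterative procedure can be sketched on synthetic data as follows (our illustration: \(\tau _a\), the ground-truth \(\Delta \Phi \), the initial guess, and the grid-search MLE are all assumptions; escape times are drawn from the exponential law (3) with means given by Eq. (6) instead of being simulated):

```python
import numpy as np

rng = np.random.default_rng(3)

tau_a = 5.0       # accurately known mean exit time at the anchor point (assumed)
true_dphi = 0.8   # ground truth, used only to generate synthetic escape times
N = 50

def mle_dphi(ts, ys, grid=np.linspace(0.05, 3.0, 3000)):
    """Grid-search MLE of ΔΦ for exponential samples ts at covariates ys,
    with means tau_i = tau_a * exp(2 * ΔΦ * y_i) as in Eq. (6)."""
    ts, ys = np.asarray(ts), np.asarray(ys)
    # log-likelihood up to a constant:
    # -sum_i [2*ΔΦ*y_i + (t_i/tau_a)*exp(-2*ΔΦ*y_i)]
    ll = -(2.0 * grid[:, None] * ys
           + (ts / tau_a) * np.exp(-2.0 * grid[:, None] * ys)).sum(axis=1)
    return grid[np.argmax(ll)]

dphi_hat, ts, ys = 0.3, [], []   # 0.3 is an arbitrary initial guess
for i in range(N):
    y = 1.0 / dphi_hat           # Eq. (12): run at the currently optimal covariate
    ts.append(rng.exponential(tau_a * np.exp(2.0 * true_dphi * y)))  # Eqs. (3), (6)
    ys.append(y)
    dphi_hat = mle_dphi(ts, ys)  # update the estimate after each new transition

print(dphi_hat)   # converges towards true_dphi = 0.8
```

In practice each `ts.append(...)` would be replaced by running the experiment/simulation at noise strength \(\sigma \) corresponding to \(y\) until one transition is registered.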