An Efficient Algorithm to Estimate the Potential Barrier Height from Noise-Induced Escape Time Data

An algorithm is developed for determining the potential barrier height experimentally, provided that we have control over the noise strength $\sigma$. We are concerned with the situation when the laboratory or numerical experiment requires large resources of time or computational power, respectively, and we wish to find a protocol that provides the best estimate in a given amount of time. The optimal noise strength $\sigma^*$ to use is found to be very simply related to the potential barrier height $\Delta\Phi$ as $y^* = \Delta\Phi^{-1}$, where $y = \sigma^{-2} - \sigma_a^{-2}$, with some "anchor point" $\sigma_a^{-2}$; and, as a second ingredient, an iterative method is proposed for the estimation. For a numerical verification of the optimality, we apply the algorithm to a simple system of an overdamped particle confined to a double-well potential, for which it is feasible to evaluate statistics of the estimator. Subsequently, we also apply it to a high-dimensional case of a diffusive energy balance model, in which the potential barrier height (concerning, e.g., the warm-to-snowball-climate transition) cannot be determined analytically, and one has to resort to more sophisticated numerical methods.


Introduction
Communicated by Valerio Lucarini.
It is a common phenomenon in nature and technology that a system under noise perturbations exits a regime of its usual dynamics [1][2][3][4][5][6]. Often it is possible to define a potential function, or nonequilibrium potential [7,8], whereby a potential well can be associated with a usual or persistent dynamics, and a saddle of the potential adjacent to the potential well is a feature through which the noise-driven exit takes place [9]. One of the two most important conditions for the possibility of defining a potential appears to be a time-scale separation in the deterministic dissipative dynamics, so that the fast processes can be modeled as noise. This is a very common modeling approach in the climate sciences, known as stochastic parametrization; see [10] for a technical reference which also details many caveats. In another scenario, fast perturbations to a system may be truly exogenous. The other condition is that we are in the weak-noise limit [11], in which the nonequilibrium potential turns out to be analogous to the classical mechanical action (for which reason it is also called the Freidlin-Wentzell action). The action is a solution of a Hamilton-Jacobi equation, which arises from a series expansion of the Fokker-Planck equation with respect to the noise strength and is associated with the leading-order terms. The potential difference between the bottom of the potential well and the saddle is often termed a potential barrier. The expected exit time then depends on the height of this potential barrier and the (small) noise strength (as expressed by Eq. (4) in Sect. 2). Therefore, knowing the potential barrier height is often of strong interest, because then one can predict, for a given or applied noise strength, the expected escape time.
We develop an algorithm to determine the potential barrier height experimentally, provided that we have control over the noise strength. We are concerned with the situation when the experiment or numerical simulation requires large resources of time or computational power, respectively, and we wish to find a protocol that provides the best estimate in a given amount of time. We encountered such a situation when we wanted to determine expected transition times to the cold climate for the noisy version of a climate model; see Fig. 2 of [12].
We can envisage another application scenario as follows. Consider that there is a bistability in a deterministically parametrized version of an Earth System Model (ESM), say, a desert-forest bistability, and that in a stochastically parametrized version rare transitions can occur, i.e., the system becomes transitive. The key is that one can control the noise strength associated with the stochastic parametrization of the model and, so, one can apply stronger than realistic noise, although it is crucial that we remain in the said weak-noise limit. Thereby one can generate transitions more frequently, such that it can become feasible even in an expensive-to-simulate ESM to determine the potential barrier height using the proposed algorithm. Subsequently, it will be just a back-of-the-envelope calculation to estimate the mean transition/exit time for realistic, much smaller, noise strengths.
When the noise strength cannot be controlled, one can apply the adaptive multilevel splitting algorithm [13] in order to determine the mean exit time. In this case there is, of course, no limitation imposed on the noise strength.
Next, in Sect. 2, we detail the derivation of the optimal noise strength for the algorithm, as well as the iterative procedure to estimate the potential difference. Then, in Sect. 3, we provide a numerical proof of concept by applying the algorithm to two example systems. Finally, in Sect. 4, expanding on the above, we provide some remarks on the application and scope of the algorithm.

The Efficient Algorithm
First we recap the main results of the large deviation theory of noise-perturbed dynamical systems, including the exponential scaling law of mean exit times, Eq. (4), which is a key ingredient of our algorithm. A precursor to this is Kramers' seminal work [14] on the 'escape rate' of a particle from a "deep" (1D) potential well over a barrier, a concise presentation of which can be found in Sect. 5.10.1 of [9]. We consider the rather generic situation when the dynamics is governed by the following Langevin stochastic differential equation (SDE):
$$\dot{x} = F(x) + \sigma D\,\xi, \tag{1}$$
where $x, F, \xi \in \mathbb{R}^n$, and the diffusion matrix $D \in \mathbb{R}^{n\times n}$ is independent of $x$, i.e., the white noise $\xi$ is additive. A lack of time-scale separation between resolved ($x$) and unresolved variables in the context of stochastic parametrization leads to a memory effect, which is not included in Eq. (1) [16]; however, this form can be recovered already with a moderate time-scale separation [17]. The vector field $F(x)$ is such that, under the deterministic dynamics ($\sigma = 0$), it gives rise to the coexistence of multiple attractors (including the possibility of an attractor at infinity) and at least one nonattracting invariant set, often called a saddle set. The saddle set is embedded in the boundary of some basins of attraction. Based on a well-established theory due to Freidlin and Wentzell [18], and its extensions [19,20], the steady-state probability distribution in the weak-noise limit, $\sigma \ll 1$, can be written as
$$W(x) \sim \exp\left(-\frac{2\,\Phi(x)}{\sigma^2}\right), \tag{2}$$
in which $\Phi(x)$ is called the nonequilibrium or quasi-potential. In gradient systems, where $F = -\nabla V$ and $D$ is the identity matrix, $\Phi(x) = V(x)$. See e.g. [21] for an example of multiplicative noise where $\lim_{\sigma\to 0}\sigma^2 \ln W(x)$ does not exist for some parameter setting and $W(x)$ has a fat tail. The probability that a noise-perturbed trajectory does not escape the basin of attraction over a time span $t$ decays exponentially:
$$P(t_t > t) \approx \exp(-t/\tau), \qquad \tau = E[t_t], \tag{3}$$
where $t_t$ denotes the random escape (transition) time. The approximation is in fact quite good already for times $t \approx E[t_t] = \tau$, or even smaller.
The reciprocal of the expectation value $\tau$ can be written as an integral of the probability current through the basin boundary, whose leading component as $\sigma \to 0$ comes from a point $x_e$ where $\Phi(x)$ is minimal on the boundary. It is, of course, the global minimum when multiple local minima, corresponding to multiple invariant saddle sets embedded in the basin boundary, are present. The proportionality of the probability current to $W(x)$ leads [8,11,15,22] to
$$\tau \sim \exp\left(\frac{2\,\Delta\Phi}{\sigma^2}\right), \tag{4}$$
where
$$\Delta\Phi = \Phi(x_e) - \Phi(A) \tag{5}$$
is what we call the potential barrier height, $A$ denoting the attractor. Both the saddle and the attractor can be chaotic, in which cases $\Phi(x_e)$ and $\Phi(A)$ have been shown to be constant over the saddle [19] and the attractor [20], respectively. Considering (4), the expected transition times increase "explosively" as the noise strength $\sigma$ decreases. From the point of view of estimating $\Delta\Phi$, say, in a linear regression with $\ln\tau$ as the dependent and $2\sigma^{-2}$ as the independent variable, there seems to be a trade-off between an improving accuracy of the estimation and an increasing demand of resources as the range of the independent variable is expanded towards smaller values of $\sigma$. Accordingly, if we fix the amount of resources that we are willing to commit, an improvement of accuracy is not guaranteed, because on average we can register fewer transitions with smaller values of $\sigma$. On the other hand, boosting the sample size by restricting $\sigma$ to larger values might not improve accuracy either, for a reason that Eq. (7) will illuminate below. We assume that for some $\sigma_a$ we can estimate $\tau = \tau_a$ arbitrarily accurately, because a large number of transitions can be achieved relatively inexpensively.
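We also assume that at this "anchor point" (4) applies accurately:
$$\tau = \tau_a \exp\left(-2\,\Delta\Phi\,\sigma_a^{-2}\right)\exp\left(2\,\Delta\Phi\,\sigma^{-2}\right). \tag{6}$$
We note in passing that in gradient systems the prefactor $\tau_a \exp(-2\,\Delta\Phi\,\sigma_a^{-2})$ in the above is given by the Eyring-Kramers formula [23]; see a generalisation of it for irreversible diffusion processes in [24], where it is assumed, however, that the saddle is a fixed point. Subsequently, as a departure from the regression-type estimation framework, we quantify the accuracy of estimation by the standard deviation of the estimator $\widehat{\Delta\Phi} = \ln(\bar{t}_t/\tau_a)/(2y)$ that follows from (6):
$$\delta = \frac{\sqrt{\mathrm{Var}[\ln \bar{t}_t]}}{2y}, \tag{7}$$
where we introduced $y = \sigma^{-2} - \sigma_a^{-2}$, and $\bar{t}_t$, the sample mean of $N$ transition times, is our finite-$N$ estimate of $\tau$ for a fixed $\sigma$. Now we can see that as $\sigma \to \sigma_a$, i.e., $y \to 0$, the inaccuracy explodes. That is, in the present specific setting of estimation there should exist an optimal value $\sigma^*$ of $\sigma$. This is what we determine next.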
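As a small illustration of the anchor relation and of the "back-of-the-envelope" extrapolation mentioned in the Introduction, the sketch below maps an anchor $(\sigma_a, \tau_a)$ and a barrier $\Delta\Phi$ to a mean exit time at another noise strength, and inverts the same relation to obtain the single-noise-level barrier estimate $\ln(\bar{t}_t/\tau_a)/(2y)$. It assumes that the exponential scaling law holds exactly between the anchor and $\sigma$; the function names and the numbers in the usage lines are hypothetical.

```python
import math

def mean_exit_time(sigma, sigma_a, tau_a, delta_phi):
    """Anchor relation: tau = tau_a * exp(2 * delta_phi * (sigma^-2 - sigma_a^-2))."""
    y = sigma**-2 - sigma_a**-2
    return tau_a * math.exp(2.0 * delta_phi * y)

def barrier_estimate(mean_time, sigma, sigma_a, tau_a):
    """Invert the anchor relation for the barrier, given a sample-mean exit time at sigma.

    As sigma -> sigma_a, y -> 0 and any error in ln(mean_time) is amplified by 1/(2y).
    """
    y = sigma**-2 - sigma_a**-2
    if y == 0.0:
        raise ValueError("sigma must differ from the anchor sigma_a")
    return math.log(mean_time / tau_a) / (2.0 * y)

# Hypothetical numbers: anchor tau_a = 10 at sigma_a = 1, barrier 1.0, target sigma = 0.5
tau = mean_exit_time(0.5, 1.0, 10.0, 1.0)
print(tau, barrier_estimate(tau, 0.5, 1.0, 10.0))
```

With a barrier estimated at a comfortable noise level, the first function gives the extrapolated mean exit time for a much weaker, realistic noise strength.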
The sum of the exponentially distributed random variables, $N\bar{t}_t$, does in fact follow an Erlang distribution [25] with shape $N$ and rate $1/\tau$,
$$N\bar{t}_t \sim \mathrm{Erlang}(N, \tau^{-1}), \tag{8}$$
and so $E[\bar{t}_t] = \tau$ and $\mathrm{Var}[\bar{t}_t] = \mathrm{Var}[t_t]/N = \tau^2/N$, in accordance with the Central Limit Theorem. It can be shown that (8) implies that
$$\mathrm{Var}[\ln \bar{t}_t] = \psi^{(1)}(N), \tag{9}$$
where $\psi^{(1)}(N)$ is the first derivative of the digamma function (the trigamma function) [26]. Interestingly, $\mathrm{Var}[\ln\bar{t}_t]$ does not depend on $\tau$, only on $N$. Next, we make use of the approximation [26]
$$\psi^{(1)}(N) \sim 1/N \tag{10}$$
and, upon substitution in (7), write
$$\delta \approx \frac{1}{2y\sqrt{N}} = \frac{1}{2y}\sqrt{\frac{\tau_a}{T}}\,e^{\Delta\Phi\,y}, \tag{11}$$
where, furthermore, we assumed a certain fixed commitment of resources, which can be expressed simply by $T = N\tau$, and also made use of (6). We look for $\sigma = \sigma^*$, or $y = y^*$, that minimizes $\delta$, for which we need to solve $d\delta/dy = 0$. This yields our main result:
$$y^* = \Delta\Phi^{-1}. \tag{12}$$
Interestingly, $y^*$ is independent of $\tau_a$ and $T$, which we comment on shortly. Rather, $y^*$ depends only on $\Delta\Phi$ (in a very simple way), the unknown that we wanted to determine in the first place, so the result may seem irrelevant to practice at first sight. However, one can simply start out with an initial guess, $\widehat{\Delta\Phi}_0$, and iteratively update the estimate $\widehat{\Delta\Phi}_i$ by performing a maximum likelihood estimation (MLE) [27] each time a new value $t_{t,i}$ is acquired. This way, for the acquisition of $t_{t,i+1}$, one continues the experiment/simulation with an updated noise strength $y^*_{i+1} = \widehat{\Delta\Phi}_i^{-1}$, $i = 0, \ldots, N-1$, according to (12). The MLE of $\Delta\Phi$ is based on the probability distribution (3) jointly with (6). This procedure is analogous to the well-established nonstationary extreme value statistics, in which one or more parameters of the Generalised Extreme Value (GEV) distribution are functions of a covariate that could depend on time (see Chapter 6 of [27]). In our case $\tau$ and $\sigma$ correspond to the parameter of the GEV distribution and the covariate, respectively.
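The two statistical ingredients of the derivation can be checked numerically. The sketch below (NumPy/SciPy; names and parameter values are illustrative assumptions) estimates $\mathrm{Var}[\ln\bar{t}_t]$ by Monte Carlo for two very different values of $\tau$ and compares it with the trigamma function, `scipy.special.polygamma(1, N)`, and with $1/N$. It then minimizes the large-$N$ spread at a fixed budget $T = N\tau$, which with $\tau = \tau_a e^{2\Delta\Phi y}$ and $\psi^{(1)}(N) \approx 1/N$ reads $\delta(y) = \sqrt{\tau_a/T}\,e^{\Delta\Phi y}/(2y)$, and compares the numerical minimizer with $1/\Delta\Phi$. Exponentially distributed exit times and the exact scaling law are assumed.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.special import polygamma

def var_log_mean(N, tau, n_rep=200_000, seed=0):
    """Monte Carlo Var[ln(mean of N exponential exit times with mean tau)]."""
    rng = np.random.default_rng(seed)
    means = rng.exponential(tau, size=(n_rep, N)).mean(axis=1)
    return np.log(means).var()

def spread(y, delta_phi, tau_a, T):
    """Large-N spread of the barrier estimate at fixed budget T = N * tau, using psi1(N) ~ 1/N."""
    return np.sqrt(tau_a / T) * np.exp(delta_phi * y) / (2.0 * y)

N = 10
print("MC, tau=1   :", var_log_mean(N, 1.0, seed=1))
print("MC, tau=50  :", var_log_mean(N, 50.0, seed=2))
print("trigamma(N) :", polygamma(1, N))
print("1/N         :", 1.0 / N)

# Minimize the spread for a hypothetical barrier 0.8, anchor tau_a = 5 and budget T = 1e4
res = minimize_scalar(spread, bounds=(1e-3, 10.0), args=(0.8, 5.0, 1e4), method="bounded")
print("numerical y*:", res.x, " 1/barrier:", 1.0 / 0.8)
```

Since $\sqrt{\tau_a/T}$ only multiplies $\delta$, changing `tau_a` or `T` in `args` leaves the minimizer unchanged, which is the independence of $y^*$ from $\tau_a$ and $T$ noted above.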
We recall that, since $\sigma^*$ does not depend on $T$, at any time during the experiment/run (for large enough $N$, though, such that (10) is a good approximation) our estimate of $\Delta\Phi$ is obtained most efficiently, and so we can revise our commitment: either stop the experiment/run early or extend it.
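The primary iteration described above can be sketched as follows, under stated assumptions: each exit time $t_{t,i}$ at $y_i$ is drawn from an exponential distribution with mean $\tau_a e^{2\Delta\Phi y_i}$ (the distribution (3) combined with the anchor relation), the MLE is found as the root of the score $\sum_i y_i\,(t_{t,i}/\tau_i - 1)$, which decreases monotonically in $\Delta\Phi$ when all $y_i > 0$, and the next run uses $y = 1/\widehat{\Delta\Phi}$. The synthetic exponential draw stands in for the expensive experiment; the clipping interval for $y$ and all names are hypothetical safeguards and choices, not part of the method as stated.

```python
import numpy as np
from scipy.optimize import brentq

def mle_barrier(ys, times, tau_a):
    """MLE of the barrier from exit times t_i observed at y_i = sigma_i^-2 - sigma_a^-2.

    Model: t_i ~ Exponential(mean tau_a * exp(2 * dphi * y_i)). The score (derivative of
    the log-likelihood, divided by 2) is sum_i y_i * (t_i / tau_i - 1).
    """
    ys = np.asarray(ys, dtype=float)
    times = np.asarray(times, dtype=float)

    def score(dphi):
        return np.sum(ys * (times / tau_a * np.exp(-2.0 * dphi * ys) - 1.0))

    lo, hi = -1.0, 1.0
    while score(lo) < 0.0:   # widen the bracket until the score changes sign
        lo *= 2.0
    while score(hi) > 0.0:
        hi *= 2.0
    return brentq(score, lo, hi)

def run_iterative_estimation(true_dphi, tau_a, dphi0, n_steps, rng, y_bounds=(0.05, 20.0)):
    """Choose y_{i+1} = 1 / dphi_hat_i, observe one exit time, refit; return all estimates."""
    ys, times, history = [], [], []
    dphi_hat = dphi0
    for _ in range(n_steps):
        y = 1.0 / dphi_hat if dphi_hat > 0.0 else y_bounds[1]
        y = float(np.clip(y, *y_bounds))             # hypothetical safeguard
        tau = tau_a * np.exp(2.0 * true_dphi * y)    # mean exit time at this noise strength
        ys.append(y)
        times.append(rng.exponential(tau))           # stands in for one expensive run
        dphi_hat = mle_barrier(ys, times, tau_a)
        history.append(dphi_hat)
    return np.array(history)

history = run_iterative_estimation(true_dphi=1.0, tau_a=1.0, dphi0=0.5, n_steps=300,
                                   rng=np.random.default_rng(1))
print(history[[0, 9, 99, -1]])
```

Because the estimate is refitted after every exit, the loop can be stopped or extended at any step, in line with the remark above.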

Numerical Verification
Next, we demonstrate the use of our algorithm on two examples: a one-dimensional and a multi-dimensional system.

Example 1: Overdamped Particle in a Symmetrical 1D Double-Well Potential
The governing equation of this system is the following SDE:
$$\dot{x} = -\frac{dV}{dx} + \sigma\xi. \tag{13}$$
We specify our example as
$$V(x) = \frac{x^4}{4} - x^2.$$
The two minima are at $x_\pm = \pm\sqrt{2}$, and the local maximum in between $x_\pm$ is at $x_0 = 0$. These are fixed points of the deterministic case ($\sigma = 0$). A numerical solution of the SDE (13) is obtained by using an Euler-Maruyama integrator [28] with a time step size of $h = 0.02$. Examples of time series realisations are shown in Fig. 1, indicating the regime behaviour with transitions between the two regimes. The time series clearly evidence bimodal marginal distributions, corresponding to the two regimes, whose maxima, and the local minimum in between (not shown), are exactly at $x_\pm = \pm\sqrt{2}$ and $x_0 = 0$, respectively. Substituting these into (5), we obtain $\Delta V = \Delta\Phi = 1$. This shows up as the slope of the curve in Fig. 2. The green coloring indicates that (4) is satisfied well even with noise so strong that the time spent in a regime is no longer clear-cut, as seen in Fig. 1(b). The result of applying our algorithm is shown in Fig. 3, indicating that it serves its purpose, i.e., $\Delta V = \Delta\Phi$ is correctly estimated to be about 1, and that the convergence is rather fast. Finally, Fig. 4 verifies the cornerstone of the algorithm, given by Eq. (12), showing the sample standard deviation of a number of estimates. For the purpose of comparison, results with the new algorithm (horizontal red line) and, as a reference, results with different fixed sample values of $\sigma$ (blue circle markers) are shown in one diagram, indicating that the accuracy of the estimate by our algorithm is just about the best accuracy achievable by the same amount of computation using the optimal fixed $\sigma^*$ (vertical gold line). Note that we chose $N = 30$ for our algorithm, resulting in some computational time $T$, and then we realised $N = T/\tau(\sigma)$ transitions using the different fixed values of $\sigma$.
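A minimal Euler-Maruyama sketch of the exit-time experiment for this example follows. It is not the authors' code: it assumes the quartic $V(x) = x^4/4 - x^2$ (whose minima at $\pm\sqrt{2}$, maximum at $0$ and barrier height 1 match the values quoted above), unit diffusion and the step $h = 0.02$ from the text, and it records, for an ensemble of independent particles started at $x_- = -\sqrt{2}$, the first hitting time of the basin boundary $x_0 = 0$. The ensemble size and the noise strengths in the usage lines are arbitrary.

```python
import numpy as np

def drift(x):
    """F(x) = -V'(x) for the assumed quartic V(x) = x**4 / 4 - x**2."""
    return 2.0 * x - x**3

def exit_times(sigma, n_particles, h=0.02, max_steps=10**6, rng=None):
    """Euler-Maruyama for dx = F(x) dt + sigma dW, started at x_- = -sqrt(2).

    Returns each particle's first hitting time of the basin boundary x_0 = 0
    (NaN if it has not exited within max_steps).
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.full(n_particles, -np.sqrt(2.0))
    t_exit = np.full(n_particles, np.nan)
    active = np.arange(n_particles)          # indices of particles still in the left basin
    noise_scale = sigma * np.sqrt(h)
    for step in range(1, max_steps + 1):
        xa = x[active]
        xa = xa + drift(xa) * h + noise_scale * rng.standard_normal(xa.size)
        x[active] = xa
        escaped = xa >= 0.0
        t_exit[active[escaped]] = step * h
        active = active[~escaped]
        if active.size == 0:
            break
    return t_exit

rng = np.random.default_rng(2)
for sigma in (1.2, 1.0, 0.8):
    t = exit_times(sigma, 400, rng=rng)
    print(sigma, np.nanmean(t))
```

The sample means obtained at several values of $\sigma$ can be compared with the scaling law (4), and single exit times drawn at a prescribed $\sigma$ are what the iterative algorithm consumes.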

Example 2: The Ghil-Sellers Energy Balance Climate Model (GSEBM)
One of the most striking facts about Earth's climate is its global-scale bistability: besides the relatively warm climate that we live in, under the present astronomical conditions a very cold climate featuring a fully glaciated, so-called snowball Earth is also possible, and this state might have been experienced a number of times by Earth in the past few hundred million years [29]. Different hypotheses of transitioning from the warm to the cold climate and the other way round involve external forcings, but in principle it is possible that the climate system in itself, in its autonomous form without external effects, is transitive, at least in the warm-to-cold direction. This would be a somewhat counterintuitive scenario of no bistability in a strict sense, but the coexistence of an attractor corresponding to the cold climate and a nonattracting set [15] corresponding to the warm climate. Escape from the nonattracting set is then modeled as a noise-induced transition (when the concepts of escape rate and exit rate coalesce), where the noise models some unresolved internal, say, atmospheric and/or oceanic dynamics. Without a requirement for physical realism, we consider additive noise perturbations of the Ghil-Sellers model [30] written for the long-time-average surface air temperature $T(\phi, t)$ as a function of latitude $\phi \in [-\pi/2, \pi/2]$, or $x = 2\phi/\pi \in [-1, 1]$. See [30,31] for the concrete form of the equation, and [32] for a numerical implementation. The model expresses an energy budget, namely, that the tendency of internal energy in latitudinal bands on the left-hand side is equal to the sum of (following the order of terms on the right) the incoming short-wave solar radiation modulated by the albedo $\alpha$, the outgoing long-wave thermal radiation $O$, and the rate of diffusive heat transport from neighbouring latitudinal bands. $M(x)$ accounts for the spherical geometry, and $D_1$ and $D_2 g(T)$ are the sensible and latent heat diffusivities, respectively.
A global mean albedo can be derived from the dynamics, whose temperature dependence features a crossover between approximately constant values for very cold and warm conditions, corresponding to a completely snow-covered 'snowball' and completely snow/ice-free conditions, respectively. It is this temperature dependence, giving rise also to the so-called positive ice-albedo feedback, that is responsible for the global-scale bistability [31], which is a robust feature of the climate model hierarchy. Unlike [32], which integrates the deterministic model using Matlab's pdepe, here we have a stochastic system at hand and, so, we take advantage of Matlab's SDE simulation suite simulate. For this we discretize the equation in $x$ by the 'method of lines', converting the PDE into a system of ODEs which, with the additive noise, takes the form of Eq. (1). We apply particular difference schemes on a regular grid, Eq. (15) (see [33] regarding the $x$-dependent diffusivity). The boundary conditions are eliminated by the 'method of reflection', setting $T_0 = T_J$ and $T_{J+1} = T_1$. Such a grid deals effectively with the singularity of $M(x)$ at the poles, but the resulting ODE can be somewhat stiff. Figure 5 suggests that our new algorithm also works in a multi-dimensional setting: the estimates $\widehat{\Delta\Phi}_i$ (blue markers) do converge to the reference value (horizontal red line). It is key that the reference value is obtained by a completely independent method. In the high-dimensional setting, even in gradient systems, $\Delta\Phi$ in general cannot be calculated analytically. To provide a reference, we calculated $\Delta\Phi$ for the discretized system, the SDE, by an action-minimizing procedure [3], using the computer code publicly available as supplementary material of that paper. We realised the time discretization of the instanton by breaking it down into 100 segments over a span of 2000 [Ms].
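Since scheme (15) is not reproduced here, the following is only a generic method-of-lines sketch of a flux-form diffusion term $\frac{1}{m(x)}\partial_x\left[m(x)\,k(x)\,\partial_x T\right]$ on a regular cell-centred grid in $x \in [-1, 1]$, with the ghost values set as stated in the text, $T_0 = T_J$ and $T_{J+1} = T_1$. The geometric weight $m(x) = \cos(\pi x/2)$, which vanishes at the poles, and the unit diffusivity are hypothetical placeholders, not the model's $M(x)$, $D_1$ or $D_2 g(T)$. Because cell centres avoid $x = \pm 1$, there is no division by the vanishing weight, and because the polar face fluxes carry that weight, the weighted integral of $T$ is conserved up to rounding.

```python
import numpy as np

def diffusion_tendency(T, m, k):
    """Method-of-lines tendency of (1/m) d/dx [m k dT/dx] on J cells covering [-1, 1].

    m and k are callables of x (hypothetical placeholders for geometry and diffusivity).
    Ghost values follow T_0 = T_J and T_{J+1} = T_1, as in the text.
    """
    J = T.size
    dx = 2.0 / J
    xc = -1.0 + (np.arange(J) + 0.5) * dx       # cell centres, never exactly at the poles
    xf = -1.0 + np.arange(J + 1) * dx           # cell faces, including x = -1 and x = +1
    Tg = np.concatenate(([T[-1]], T, [T[0]]))   # [T_0, T_1, ..., T_J, T_{J+1}]
    flux = m(xf) * k(xf) * np.diff(Tg) / dx     # J + 1 face fluxes
    return np.diff(flux) / (dx * m(xc))         # J tendencies dT_j/dt

def m(x):
    return np.cos(0.5 * np.pi * x)              # hypothetical weight, zero at the poles

def k(x):
    return np.ones_like(x)                      # hypothetical constant diffusivity

J = 10
T = np.random.default_rng(0).normal(size=J)
dTdt = diffusion_tendency(T, m, k)
print(dTdt)
```

Adding the radiative terms and the additive noise to these tendencies would give a system of the form of Eq. (1) that an SDE integrator can advance.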
We note that for the feasibility of the minimization, already with $J = 10$, it is crucial to provide the gradient of the action with respect to the displacement of the discrete sample points of the instanton symbolically. Without this symbolic expression the minimization takes orders of magnitude more time (as we estimated by examining toy examples). The symbolic expression, an extremely long expression that is practically impossible to keep track of manually, is generated by the code of [3], which makes use of Matlab's symbolic algebra toolbox. For the code to succeed in this, we need to be able to obtain an explicit expression for the tendencies $\dot{T}_j$. This could not be achieved with the sophisticated method implemented in Matlab's pdepe, which is why we developed the discretization scheme (15) using the method of lines.
We note that an expression for the potential functional $\Phi(T)$ of the PDE was given in [30]. However, it does not seem to be possible to evaluate this expression for the particular model at hand, not even numerically, because Eq. (7d) of [30] would first need to be integrated analytically, which does not seem feasible using symbolic algebra software. The reason for this may be that the compact expression, featuring multiple applications of the absolute value function to represent the piecewise conditional expression (arising from the cutoff values for the albedo), is too complicated.
does not affect the proposed procedure. The condition for the existence of a quasi-potential is that the noise is weak and that, when the noise is a representation of unresolved small-scale dynamics, there is a time-scale separation between the resolved and unresolved dynamics. When the noise is weak, the probability of any escape path is exponentially small compared with that of the most probable exit path through the saddle where the potential over the deterministic basin boundary has its minimum. This includes the possibility of multiple saddle sets embedded in the basin boundary. An example of this may be the reduced stochastic Duffing oscillator-like system underlying the turbulent swirling flow studied in [35]. However, since in this experiment one can actually see multiple, well-separated clusters of transition paths, which point to multiple local minima of the potential over the basin boundary, we can say that the noise is not weak in the above sense. In this case the potential barrier height should not be expected to predict the mean transition times. Nevertheless, it is not clear what errors to expect, as, e.g., Fig. 2 indicates that the theoretical scaling applies even when the noise is rather strong.
Our algorithm relies on an anchor point at which a large number of transitions are generated so that the mean transition time can be determined highly accurately. For this to be affordable, we would like to set the noise as strong as possible. However, it should be weak enough that, as the noise strength is varied, the mean transition times scale according to our assumption (as in Eq. (4)), based on the existence of a quasi-potential. There seems to be no a priori indicator of which noise strength is small enough. To check that it is, perhaps the most efficient procedure is an iterative one: (1) one uses a noise strength for the anchor that is comfortably affordable; (2) one uses our iterative algorithm to determine the optimal noise strength belonging to this anchor, alongside the estimate of the potential difference; (3) in a new secondary iterate one associates this optimal noise strength with a new anchor, and (4) determines a new estimate of the potential difference; and so on. If, as a result of the secondary iteration, two subsequent estimates of the potential difference are close, it should mean that the noise is weak enough.
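The secondary iteration can be sketched as an outer loop around the primary algorithm. In this sketch, `run_primary(sigma_a)` is a hypothetical callable that runs the primary iteration with anchor $\sigma_a$ and returns its barrier estimate; the next anchor is the optimal noise strength from $y^* = \Delta\Phi^{-1}$, i.e. $\sigma^{*-2} = \sigma_a^{-2} + 1/\widehat{\Delta\Phi}$; and "close" is taken, as an assumption, to mean a relative change below `rel_tol`. The toy stand-in for the primary algorithm, whose estimate is biased upward at strong anchor noise, is invented purely to exercise the loop.

```python
def optimal_sigma(sigma_a, delta_phi_hat):
    """Optimal noise strength for a given anchor: sigma*^-2 = sigma_a^-2 + 1/delta_phi_hat."""
    if delta_phi_hat <= 0.0:
        raise ValueError("the barrier estimate must be positive")
    return (sigma_a**-2 + 1.0 / delta_phi_hat) ** -0.5

def secondary_iteration(run_primary, sigma_a0, rel_tol=0.05, max_outer=10):
    """Outer loop over anchors, steps (1)-(4) above.

    Returns (last estimate, all estimates, converged flag).
    """
    sigma_a = sigma_a0
    estimates = []
    for _ in range(max_outer):
        estimates.append(run_primary(sigma_a))
        if len(estimates) >= 2 and abs(estimates[-1] - estimates[-2]) <= rel_tol * abs(estimates[-2]):
            return estimates[-1], estimates, True
        # step (3): the optimal noise strength for this anchor becomes the next, weaker anchor
        sigma_a = optimal_sigma(sigma_a, estimates[-1])
    return estimates[-1], estimates, False

# Hypothetical toy stand-in for the primary algorithm: estimate biased upward at strong noise
def toy_primary(sigma_a):
    return 1.0 + 0.5 * sigma_a**2

est, history, converged = secondary_iteration(toy_primary, sigma_a0=1.0)
print(est, history, converged)
```

Each outer step moves the anchor to a weaker noise strength, so the cost per step grows; the tolerance therefore trades confidence in the weak-noise assumption against resources.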
The primary iteration of our algorithm, as described in Sect. 2, dictates that simulation (or experimental) runs that end up in a transition take place sequentially. If simulations are run in parallel, and a new one is started with an updated noise strength right away, then a bias arises towards shorter transition times, which implies a biased estimate of the potential difference. However, it would be a worthwhile future investigation to determine the dependence of the bias on the number of parallel simulations, as perhaps a moderate number of parallel simulations would result in a reasonably small bias, while already accelerating the procedure considerably.
A limitation of our algorithm is that it is applicable only to systems whose deterministic part is autonomous. In climate research applications, which motivated the present work, at least one periodic external forcing is always present: the seasonal cycle. There are two possible ways to remove the issue with this external forcing. First, it might be possible to achieve a dimension reduction such that the deterministic part represents the evolution of a long-term average. An example of this is the GSEBM. Second, a periodic forcing is special in that a discrete-time sampling of the dynamics with the period of the forcing yields an autonomous discrete-time dynamical system, the so-called stroboscopic Poincaré return map [36,37]. The quasi-potential can be defined also for such discrete-time maps [15,20], and our algorithm is applicable to maps too. Then, what remains to ensure is that the noise is weak enough, as proposed above via a secondary iteration. With an aperiodic external forcing, perhaps it is possible to generalise the Freidlin-Wentzell large-deviation theory of stochastic dynamical systems and define an instantaneous, so-called snapshot or pullback [37] quasi-potential, as well as snapshot instantonic paths analogous to manifolds of snapshot saddles [15,38]. However, our algorithm is based on temporal statistics and is therefore not applicable, even in the case of quasistatically slow forcing, when snapshot objects closely trace their stationary counterparts in autonomous variants of the system. We remark that the latter setting can provide a generalised theory of stochastic resonance [39], which is relevant to approaches dealing with long paleoclimatic time series [34].