The goal is now to find a reaction coordinate \(\xi \) that is as low-dimensional as possible and results in a good projected transfer operator in the sense of (12). As we saw in the previous section, the condition \(\Vert P_\xi ^\perp \varphi _i\Vert _{L^2_{\mu }} \approx 0\) is sufficient. Thus, the idea to numerically seek \(\xi \) that parametrizes the dominant eigenfunctions of \(\mathcal {T}^t\) in the \( \Vert \cdot \Vert _{L^2_{\mu }}\)-norm seems natural since this would lead to small projection error \( \Vert P_\xi ^\perp \varphi _i\Vert _{L^2_{\mu }}\).
In fact, eigenfunctions of transfer operators have been used before to compute reduced dynamics and reaction coordinates: In Froyland et al. (2014), methods to decompose multiscale systems into fast and slow processes and to project the dynamics onto these subprocesses based on eigenfunctions of the Koopman operator \( \mathcal {K}^t \) are proposed. In McGibbon et al. (2017), the dominant eigenfunctions of the transfer operator \( \mathcal {T}^t \), which due to the assumed reversibility of the system is identical to \( \mathcal {K}^t \), are shown to be good reaction coordinates. Also, committor functions (introduced in Appendix B), which are closely related to the dominant eigenfunctions, have been used as reaction coordinates in Du et al. (1998) and Lu and Vanden-Eijnden (2014).
However, we propose a fundamentally different path in defining and finding reaction coordinates, as working with dominant eigenfunctions has two major disadvantages:
1. The eigenproblem is global. Thus, if we wish to learn the value of an eigenfunction \(\varphi _i\) at only one location \(x\in \mathbb {X}\), we need an approximation of the transfer operator \(\mathcal {T}^t\) that is accurate on all of \(\mathbb {X}\). The computational effort to construct such an approximation grows exponentially with \(\dim (\mathbb {X})\); this is the curse of dimensionality. There have been attempts to mitigate this (Weber 2006; Junge and Koltai 2009; Weber 2012), but we aim to circumvent this problem entirely. Given two points \(x,y \in \mathbb {X}\), we will decide whether \(\xi (x)\) is close to \(\xi (y)\) or not by using only local computations around \(x\) and \(y\) (i.e., samples from the transition densities \(p^t(x,\cdot )\) and \(p^t(y,\cdot )\) for moderate \(t\)).
2. The number of dominant eigenfunctions \( (d + 1) \) equals the number of metastable states, and this number can be much larger than the dimension of the transition manifold. This fact is illustrated in Example 4.1.
Example 4.1
Let us consider a diffusion process of the form (2) with the circular multi-well potential shown in Fig. 2. Choosing a temperature that is not high enough for the central potential barrier to be overcome easily, transitions between the wells typically happen in the vicinity of a one-dimensional reaction pathway, the unit circle. The number of dominant eigenfunctions, however, corresponds to the number of wells. Nevertheless, projecting the system onto the unit circle would retain the dominant time scales of the system, cf. Sect. . \(\triangle \)
Parametrization of Dominant Eigenfunctions
If the \( (d+1) \) dominant eigenfunctions do not depend fully on the phase space \( \mathbb {X}\), a lower-dimensional and ultimately easier to find reaction coordinate suffices for keeping the eigenvalue approximation error (12) small. It is easy to see that if there exists a function \( \xi :\mathbb {X}\rightarrow \mathbb {R}^k \) for some k so that the eigenfunctions \( \varphi _i \) are constant on the level sets of \( \xi \), i.e., there exist functions \( \tilde{\varphi }_i :\mathbb {R}^k \rightarrow \mathbb {R}\), \( i = 1, \ldots , d \), such that \( \varphi _i = \tilde{\varphi }_i \circ \xi \), then the projection error \( \Vert P_\xi ^\perp \varphi _i \Vert _{L^2_{\mu }} \) is zero. A quantitative generalization of this is the statement that if the eigenfunctions \( \varphi _i \) are almost constant on the level sets of some \( \xi \), then the projection error is small.
Lemma 4.2
Assume that there exists a function \( \xi :\mathbb {X}\rightarrow \mathbb {R}^k \) for some k and functions \( \tilde{\varphi }_i :\mathbb {R}^k \rightarrow \mathbb {R}\), \( i = 1, \ldots , d \), with
$$\begin{aligned} |\varphi _i(x) - \tilde{\varphi }_i(\xi (x))| \le \varepsilon \quad \forall ~x\in \mathbb {X}. \end{aligned}$$
(13)
Then \( \Vert P_\xi ^\perp \varphi _i \Vert _{L^2_{\mu }} \le 2\varepsilon \).
Proof
Assuming (13) holds, there exists a function \( c_i :\mathbb {X}\rightarrow \mathbb {R}\) with \( |c_i(x)|\le 1~\forall x\in \mathbb {X}\) so that
$$\begin{aligned} \varphi _i(x) = \tilde{\varphi }_i(\xi (x)) + c_i(x)\varepsilon . \end{aligned}$$
Thus, we have
$$\begin{aligned} P_\xi \varphi _i(x)&= \int _{\mathbb {L}_{\xi (x)}} \Big (\tilde{\varphi }_i\big (\xi (x')\big )+c_i(x')\varepsilon \Big ) \mathrm{d}\mu _{\xi (x)}(x') \\&=\tilde{\varphi }_i\big (\xi (x)\big ) + \varepsilon \int _{\mathbb {L}_{\xi (x)}} c_i(x')\mathrm{d}\mu _{\xi (x)}(x'). \end{aligned}$$
For the projection error, we then obtain
$$\begin{aligned} \Vert P_\xi \varphi _i - \varphi _i\Vert _{L^2_{\mu }}&\le \Vert P_\xi \varphi _i - \tilde{\varphi }_i\circ \xi \Vert _{L^2_{\mu }} + \Vert \tilde{\varphi }_i\circ \xi - \varphi _i\Vert _{L^2_{\mu }} \\&\le 2\varepsilon . \end{aligned}$$
\(\square \)
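The last estimate in the proof rests on two pointwise bounds that are left implicit. A short sketch, assuming \(|c_i(x)|\le 1\) on \(\mathbb {X}\) and that each \(\mu _z\) is a probability measure on the level set \(\mathbb {L}_z\) (as is implicit in \(P_\xi \) being an averaging operator):

$$\begin{aligned} \big |P_\xi \varphi _i(x) - \tilde{\varphi }_i\big (\xi (x)\big )\big |&= \varepsilon \,\Big |\int _{\mathbb {L}_{\xi (x)}} c_i(x')\,\mathrm{d}\mu _{\xi (x)}(x')\Big | \le \varepsilon , \\ \big |\tilde{\varphi }_i\big (\xi (x)\big ) - \varphi _i(x)\big |&= \varepsilon \,|c_i(x)| \le \varepsilon . \end{aligned}$$

Since \(\mu \) is a probability measure, a function bounded pointwise by \(\varepsilon \) also has \(L^2_{\mu }\)-norm at most \(\varepsilon \), so each of the two terms in the triangle inequality contributes at most \(\varepsilon \).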
Remark 4.3
From the proof we see that the pointwise condition (13) can be replaced by the much weaker condition
$$\begin{aligned} \int _{\mathbb {L}_{z}} \left| \varphi _i(x') - \tilde{\varphi }_i(\xi (x'))\right| \mathrm{d}\mu _{z}(x') \le \varepsilon , \end{aligned}$$
for all level sets \(\mathbb {L}_z\) of \(\xi \).
From here on, we address the following two central questions:
(Q1) In which dynamical situations can we expect to find low-dimensional reaction coordinates?
(Q2) How can we computationally exploit the properties of the dynamics to obtain reaction coordinates?
Let us start with the first question. We will address the second question in Sects. 4.2 and 4.3. Experience shows (E et al. 2002; Ren et al. 2005; E and Vanden-Eijnden 2006; Schütte and Sarich 2014) that transitions between metastable states tend to happen along so-called reaction pathways, which form the low-dimensional dynamical backbone in the high-dimensional state space, connecting the metastable states via saddle points of the potential V (Freidlin and Wentzell 1998).
From now on, we observe the system at an intermediate time scale \(t_\text {slow} \gg t \gg t_\text {fast}\) (where \(t_\text {slow}\) and \(t_\text {fast}\) are the implied time scales \(t_d,~t_{d+1}\) from Sect. 2.3) and thus assume that the process \(\mathbf {X}_t\) has already left the transition region (if it started there), equilibrated to a quasi-stationary distribution inside some metastable wells, but has not had enough time to equilibrate globally. At this time scale, starting in some \(x\in \mathbb {X}\), the transition density \(p^t(x,\cdot )\) is observed to approximately depend only on progress along these reaction paths; see Fig. 3 for an illustration. This means that the density \(p^t(x,\cdot )\) on the fiber perpendicular to the transition pathway is approximately the same as \(p^t(x^*,\cdot )\) for some \(x^*\) on the transition pathway. As this pathway is low-dimensional, this means that the image \(\overline{\mathcal {Q}}(\mathbb {X})\) of the map
$$\begin{aligned} \overline{\mathcal {Q}}(x):= p^t(x,\cdot ) \end{aligned}$$
is almost a low-dimensional manifold in \(L^1(\mathbb {X})\).
The existence of this low-dimensional structure in the space of probability densities is exactly the assumption we need to ensure that the dominant eigenfunctions are low-dimensionally parametrizable, and thus that a low-dimensional reaction coordinate \(\xi \) exists. This assumption is made precise in Definition 4.4. To summarize, we will see that \(\xi \) is a good reaction coordinate if \(p^t(x,\cdot ) \approx p^t(y,\cdot )\) for \(\xi (x) = \xi (y)\).
Definition 4.4
(\((\varepsilon ,r)\)-reducibility and transition manifold) We call the process \(\mathbf {X}_t\) \((\varepsilon ,r)\)-reducible, if there exists a smooth closed r-dimensional manifold \( \mathbb {M} \subset L^2_{1/\mu } \subset L^1(\mathbb {X})\) such that for \(t_\text {fast}\ll t \ll t_\text {slow}\) and all \(x\in \mathbb {X}\)
$$\begin{aligned} \min _{f\in \mathbb {M}} \Vert f - p^t(x,\cdot )\Vert _{L^2_{1/\mu }} \le \varepsilon \end{aligned}$$
(14)
holds. We call \(\mathbb {M}\) the transition manifold and the map \(\mathcal {Q} :\mathbb {X}\rightarrow \mathbb {M}\),
$$\begin{aligned} \mathcal {Q}(x) := \mathrm {arg}\min _{f\in \mathbb {M}}\Vert p^t(x,\cdot ) - f\Vert _{L^2_{1/\mu }} \end{aligned}$$
(15)
the mapping onto the transition manifold. We can set \(\mathbb {M} = \mathrm {cl}(\mathcal {Q}(\mathbb {X}))\), where \( \mathrm {cl}(\mathbb {Y})\) denotes the closure of the set \(\mathbb {Y}\).
Remark 4.5
While it is natural to motivate \((\varepsilon ,r)\)-reducibility by the existence of reaction pathways in phase space, it is not strictly necessary. There exist stochastic systems without low-dimensional reaction pathways whose densities still quickly converge to a transition manifold in \(L^1\). Future work includes the identification of necessary and sufficient conditions for the existence of transition manifolds (see the first point in the Conclusion). We also further elaborate on the connection between reaction pathways and transition manifolds in Appendix B.
Remark 4.6
We recall from Sect. that the Perron–Frobenius operator \(\mathcal {P}^t\) is also naturally defined on the space \(L^2_{1/\mu }\) (Schervish and Carlin 1992). Further, with the Dirac distribution centered in \(x\in \mathbb {X}\), denoted by \(\delta _x\), we formally have \(p^t(x,\cdot ) = \mathcal {P}^t\delta _x\). Hence, the choice of norm in Definition 4.4 is natural. It should also be noted that since \(\mu \) is a probability measure, the Hölder inequality yields \(\Vert f\Vert _{L^1_{\mu }} \le \Vert f\Vert _{L^2_{\mu }}\). Using this we have
$$\begin{aligned} \Vert f\Vert _{L^1} = \Vert f/\varrho \Vert _{L^1_{\mu }} \le \Vert f/\varrho \Vert _{L^2_{\mu }} = \Vert f\Vert _{L^2_{1/\mu }}, \end{aligned}$$
which shows that if \(p^t(x,\cdot )\) and \(p^t(y,\cdot )\) are close in the \(L^2_{1/\mu }\) norm, they are also close in the \(L^1\) norm. We require the closeness of the respective \(p^t(x,\cdot )\) in the \(L^2_{1/\mu }\) norm for our theoretical considerations below, but otherwise we will think of them as functions in \(L^1\).
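For completeness, the two steps behind this norm comparison can be written out. This is a sketch assuming that \(\varrho \) denotes the (strictly positive) Lebesgue density of \(\mu \), i.e., \(\mathrm{d}\mu = \varrho \,\mathrm{d}x\). The Hölder step is the Cauchy–Schwarz inequality applied to \(|g|\) and the constant function 1, and the remaining identities are changes of measure:

$$\begin{aligned} \Vert g\Vert _{L^1_{\mu }}&= \int _\mathbb {X}|g|\cdot 1\,\mathrm{d}\mu \le \Vert g\Vert _{L^2_{\mu }}\,\Vert 1\Vert _{L^2_{\mu }} = \Vert g\Vert _{L^2_{\mu }}\,\mu (\mathbb {X})^{1/2} = \Vert g\Vert _{L^2_{\mu }}, \\ \Vert f\Vert _{L^1}&= \int _\mathbb {X}\frac{|f|}{\varrho }\,\varrho \,\mathrm{d}x = \Vert f/\varrho \Vert _{L^1_{\mu }}, \qquad \Vert f/\varrho \Vert _{L^2_{\mu }}^2 = \int _\mathbb {X}\frac{f^2}{\varrho ^2}\,\varrho \,\mathrm{d}x = \int _\mathbb {X}\frac{f^2}{\varrho }\,\mathrm{d}x = \Vert f\Vert _{L^2_{1/\mu }}^2. \end{aligned}$$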
Note that we only need to evolve the system at hand for a moderate time \(t\ll t_\text {slow}\), which has to be merely sufficiently large to damp out the fast fluctuations in the metastable states. This will be an important point later, allowing for numerical tractability.
Next, we show that \((\varepsilon ,r)\)-reducibility implies that dominant eigenfunctions are almost constant on the level sets of \(\mathcal {Q}\).
Lemma 4.7
If \(\mathbf {X}_t\) is \((\varepsilon ,r)\)-reducible, then for an eigenfunction \(\varphi _i\) of \(\mathcal {T}^t\) with \(\Vert \varphi _i\Vert _{L^2_{\mu }}=1\) and points \(x,y\in \mathbb {X}\) with \(\mathcal {Q}(x) = \mathcal {Q}(y)\) we have
$$\begin{aligned} \left| \varphi _i(x) - \varphi _i(y)\right| \le \frac{2\varepsilon }{|\lambda _i|}. \end{aligned}$$
Proof
First note that for the transition densities \(p^t(x,\cdot ),~p^t(y,\cdot )\) it holds that
$$\begin{aligned} \begin{aligned} \Vert p^t(x,\cdot ) - p^t(y,\cdot )\Vert _{L^2_{1/\mu }}&\le \Vert p^t(x,\cdot ) - \mathcal {Q}(x)\Vert _{L^2_{1/\mu }} + \Vert \mathcal {Q}(x) - p^t(y,\cdot )\Vert _{L^2_{1/\mu }}\\&= \Vert p^t(x,\cdot ) - \mathcal {Q}(x)\Vert _{L^2_{1/\mu }} + \Vert \mathcal {Q}(y) - p^t(y,\cdot )\Vert _{L^2_{1/\mu }} \le 2\varepsilon . \end{aligned} \end{aligned}$$
(16)
With this we can show the assertion:
$$\begin{aligned} \lambda _i\varphi _i(x)&= \mathcal {T}^t \varphi _i(x) = \mathcal {K}^t \varphi _i(x) = \int _\mathbb {X}\varphi _i(x')p^t(x,x') \, dx'. \end{aligned}$$
Applying (16), for some function \(e\in L^2_{1/\mu }(\mathbb {X})\) with \(\Vert e\Vert _{L^2_{1/\mu }}\le 2\varepsilon \), we get
$$\begin{aligned} \lambda _i\varphi _i(x)&=\int _\mathbb {X}\varphi _i(x')\big (p^t(y,x')+e(x')\big )~dx'\\&=\int _\mathbb {X}\varphi _i(x')p^t(y,x') dx' + \int _\mathbb {X}\varphi _i(x')\frac{e(x')}{\varrho (x')}~\mathrm{d}\mu (x')\\&=\lambda _i\varphi _i(y) + \int _\mathbb {X}\varphi _i(x')\frac{e(x')}{\varrho (x')} \, \mathrm{d}\mu (x'), \end{aligned}$$
where in the last equation, we again used that due to reversibility \( \mathcal {K}^t = \mathcal {T}^t \) and that \(\varphi _i\) is an eigenfunction. Thus, for the difference, we have
$$\begin{aligned} |\varphi _i(x) - \varphi _i(y)|&=\frac{1}{|\lambda _i|}\Big |\int _\mathbb {X}\varphi _i(x')\frac{e(x')}{\varrho (x')}~\mathrm{d}\mu (x')\Big | \\&\le \frac{1}{|\lambda _i|}\underbrace{\Vert \varphi _i\Vert _{L^2_{\mu }}}_{=1} \underbrace{\Vert e/\varrho \Vert _{L^2_{\mu }}}_{ = \Vert e\Vert _{L^2_{1/\mu }}} \le \frac{2\varepsilon }{|\lambda _i|}. \end{aligned}$$
\(\square \)
Assuming that the eigenfunctions are normalized (which we do from now on), i.e., \(\Vert \varphi _i\Vert _{L^2_{\mu }}=1\), and that \(\varepsilon \) is sufficiently small, Lemma 4.7 implies that the dominant eigenfunctions (i.e., \(|\lambda _i|\approx 1\)) are almost constant on the level sets of \(\mathcal {Q}\). This can now be used to show that the \(\varphi _i\) are not fully dependent on \(\mathbb {X}\), but only on the level sets of \(\mathcal {Q}\) (up to a small error), in a sense similar to Lemma 4.2.
Corollary 4.8
Let \(\mathbf {X}_t\) be \((\varepsilon ,r)\)-reducible. Then there exists a function \(\tilde{\varphi }_i :\mathbb {M} \rightarrow \mathbb {R}\) such that
$$\begin{aligned} \left| \varphi _i(x) - \tilde{\varphi }_i\big (\mathcal {Q}(x)\big )\right| \le \frac{\varepsilon }{|\lambda _i|}. \end{aligned}$$
Proof
Fix \(x\in \mathbb {X}\), and let \(z = \mathcal {Q}(x)\). Define the function \(\tilde{\varphi }_i\) by
$$\begin{aligned} \tilde{\varphi }_i(\mathcal {Q}(x)) := \frac{1}{2} \left( \inf _{\mathcal {Q}(y)=z} \varphi _i(y) + \sup _{\mathcal {Q}(y)=z} \varphi _i(y) \right) . \end{aligned}$$
Since by Lemma 4.7 it holds that \(|\varphi _i(x) - \varphi _i(y)| \le \tfrac{2\varepsilon }{|\lambda _i|}\) if \(\mathcal {Q}(x) = \mathcal {Q}(y)\), we have that
$$\begin{aligned} \left| \sup _{\mathcal {Q}(y)=z} \varphi _i(y) - \inf _{\mathcal {Q}(y)=z} \varphi _i(y) \right| \le \frac{2\varepsilon }{|\lambda _i|}; \end{aligned}$$
thus, our choice of \(\tilde{\varphi }_i\) gives
$$\begin{aligned} \left| \varphi _i(x) - \tilde{\varphi }_i(\mathcal {Q}(x)) \right| \le \frac{\varepsilon }{|\lambda _i|}. \end{aligned}$$
\(\square \)
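The final step of this proof is the elementary midpoint estimate. As a sketch, write \(a := \inf _{\mathcal {Q}(y)=z} \varphi _i(y)\) and \(b := \sup _{\mathcal {Q}(y)=z} \varphi _i(y)\) (these abbreviations are introduced here only for illustration). Since \(\mathcal {Q}(x) = z\), we have \(a \le \varphi _i(x) \le b\), and therefore

$$\begin{aligned} \left| \varphi _i(x) - \frac{a+b}{2}\right| \le \frac{b-a}{2} \le \frac{1}{2}\cdot \frac{2\varepsilon }{|\lambda _i|} = \frac{\varepsilon }{|\lambda _i|}. \end{aligned}$$

Choosing the midpoint rather than, say, the infimum is what halves the constant from \(2\varepsilon /|\lambda _i|\) in Lemma 4.7 to \(\varepsilon /|\lambda _i|\) here.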
Embedding the Transition Manifold
In light of Corollary 4.8, one could say that \(\mathcal {Q}\) is an “\(\mathbb {M}\)-valued reaction coordinate.” However, as we have no access to \(\mathbb {M}\) so far, and an \(\mathbb {R}^k\)-valued reaction coordinate is more intuitive, we aim to obtain a more useful representation of the transition manifold by embedding it into a finite-dimensional, possibly low-dimensional, Euclidean space.
We will see that we are very free in the choice of the embedding mapping, even though the manifold \(\mathbb {M}\) is not known explicitly. (We only assumed that it exists.) To achieve this, we will use an infinite-dimensional variant of the weak Whitney embedding theorem (Sauer et al. 1991; Whitney 1936), which, roughly speaking, states that “almost every bounded linear map from \(L^1(\mathbb {X})\) to \(\mathbb {R}^{2r+1}\) will be one-to-one on \(\mathbb {M}\) and its image.” We first specify what we mean by “almost every” in the context of bounded linear maps, following the notions of Sauer et al. (1991).
Definition 4.9
(Prevalence) A Borel subset \(\mathbb {S}\) of a normed linear space \(\mathbb {V}\) is called prevalent if there is a finite-dimensional subspace \(\mathbb {E}\) of \(\mathbb {V}\) such that for each \(v\in \mathbb {V}\), \(v+e\) belongs to \(\mathbb {S}\) for (Lebesgue) almost every e in \(\mathbb {E}\).
As the infinite-dimensional embedding theorem from Hunt and Kaloshin (1999) is applicable not only to smooth manifolds, but to arbitrary subsets \(\mathbb {A}\subset \mathbb {V}\) of fractal dimension, it uses the concepts of box-counting dimension \(\dim _B(\mathbb {A})\) and thickness exponent \(\tau (\mathbb {A})\) from fractal geometry. Intuitively, \(\dim _B(\mathbb {A})\) describes the exponent of the growth rate in the number of boxes of decreasing side length that are needed to cover \(\mathbb {A}\), and \(\tau (\mathbb {A})\) describes how well \(\mathbb {A}\) can be approximated using only finite-dimensional linear subspaces of \(\mathbb {V}\). As these concepts coincide with the traditional measure of dimensionality in our setting, we will not go into detail here and point to Hunt and Kaloshin (1999) for a precise definition.
The general infinite-dimensional embedding theorem reads:
Theorem 4.10
(Hunt and Kaloshin 1999, Theorem 3.9) Let \(\mathbb {V}\) be a Banach space and \(\mathbb {A}\subset \mathbb {V}\) be a compact set with box-counting dimension d and thickness exponent \(\tau \). Let \(k>2d\) be an integer, and let \(\alpha \) be a real number with
$$\begin{aligned} 0<\alpha <\frac{k-2d}{k(1+\tau )}. \end{aligned}$$
Then for almost every (in the sense of prevalence) bounded linear function \(\mathcal {E}:\mathbb {V}\rightarrow \mathbb {R}^k\) there exists \(C>0\) such that for all \(x,y\in \mathbb {A}\),
$$\begin{aligned} C\Vert \mathcal {E}(x)-\mathcal {E}(y)\Vert _2^\alpha \ge \Vert x-y\Vert _2, \end{aligned}$$
(17)
where \(\Vert \cdot \Vert _2\) denotes the Euclidean 2-norm.
Note that (17) implies Hölder continuity of \(\mathcal {E}^{-1}\) on \(\mathcal {E}(\mathbb {A})\) and in particular that \(\mathcal {E}\) is one-to-one on \(\mathbb {A}\) and its image. Using that the box-counting dimension of a smooth r-dimensional manifold \(\mathbb {K}\) is simply r and that the thickness exponent is bounded from above by the box-counting dimension, thus \(0\le \tau (\mathbb {K}) \le r\), see Hunt and Kaloshin (1999), we get the following infinite-dimensional embedding theorem for smooth manifolds.
Corollary 4.11
Let \(\mathbb {V}\) be a Banach space, let \(\mathbb {K}\subset \mathbb {V}\) be a smooth manifold of dimension r, and let \(k>2r\). Then almost every (in the sense of prevalence) bounded linear function \(\mathcal {E}:\mathbb {V}\rightarrow \mathbb {R}^k\) is one-to-one on \(\mathbb {K}\) and its image in \(\mathbb {R}^k\).
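As a worked instance of the exponent condition in Theorem 4.10 (a sketch using only the bounds quoted above): for a smooth r-dimensional manifold we have \(d = r\) and \(0\le \tau \le r\), so with the smallest admissible \(k = 2r+1\),

$$\begin{aligned} \frac{k-2d}{k(1+\tau )} = \frac{1}{(2r+1)(1+\tau )} \ge \frac{1}{(2r+1)(1+r)}. \end{aligned}$$

Hence every \(\alpha \in \big (0, \tfrac{1}{(2r+1)(1+r)}\big )\) is admissible regardless of the precise value of \(\tau \); for \(r=1\) this reads \(\alpha < 1/6\). Injectivity then follows directly from (17): if \(\mathcal {E}(x) = \mathcal {E}(y)\), the left-hand side of (17) vanishes, which forces \(\Vert x-y\Vert = 0\).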
Thus, since the transition manifold \(\mathbb {M}\) is assumed to be a smooth r-dimensional manifold in \(L^1(\mathbb {X})\), an arbitrarily chosen bounded linear map \(\mathcal {E} :L^1(\mathbb {X})\rightarrow \mathbb {R}^{2r+1}\) can be assumed to be one-to-one on \(\mathbb {M}\) and its image. In particular, \(\mathcal {E}(\mathbb {M})\) is again an r-dimensional manifold (although not necessarily smooth). With this insight, we can now construct a reaction coordinate in Euclidean space:
Corollary 4.12
Let \(\mathbf {X}_t\) be \((\varepsilon ,r)\)-reducible, and let \(\mathcal {E} :L^1(\mathbb {X}) \rightarrow \mathbb {R}^{2r+1}\) be one-to-one on \(\mathbb {M}\) and its image. Define \(\xi :\mathbb {R}^n \rightarrow \mathbb {R}^{2r+1}\) by
$$\begin{aligned} \xi (x) := \mathcal {E}\big (\mathcal {Q}(x)\big ). \end{aligned}$$
(18)
Then there exists a function \(\hat{\varphi }_i :\mathbb {R}^{2r+1} \rightarrow \mathbb {R}\) so that
$$\begin{aligned} |\varphi _i(x) - \hat{\varphi }_i(\xi (x))| \le \frac{\varepsilon }{|\lambda _i|}. \end{aligned}$$
(19)
Proof
As \(\mathcal {E}\) is one-to-one on \(\mathbb {M}\) and its image, it is invertible on \(\mathcal {E}(\mathbb {M})\). With \(\tilde{\varphi }_i\) chosen as in the proof of Corollary 4.8, define \(\hat{\varphi }_i :\mathcal {E}(\mathbb {M}) \rightarrow \mathbb {R}\) by
$$\begin{aligned} \hat{\varphi }_i(\hat{z}) := \tilde{\varphi }_i\big (\mathcal {E}^{-1}(\hat{z})\big ). \end{aligned}$$
(20)
Then
$$\begin{aligned} |\varphi _i(x) - \hat{\varphi }_i(\xi (x))| = |\varphi _i(x) - \tilde{\varphi }_i(\mathcal {Q}(x))| \overset{\text {Cor. 4.8}}{\le } \frac{\varepsilon }{|\lambda _i|} . \end{aligned}$$
\(\square \)
Since \(\hat{\mathbb {M}} := \mathcal {E}(\mathbb {M})\) is an r-dimensional manifold, \(\xi \) is effectively an r-dimensional reaction coordinate. Thus, if the right-hand side of (19) is small, the \(\varphi _i\) are “almost parametrizable” by the r-dimensional reaction coordinate \(\xi \). Using Lemma 4.2, we immediately see that this results in a small projection error \(\Vert P_\xi ^\perp \varphi _i\Vert \), and due to Corollary 3.6 in a good transfer operator approximation; hence, \(\xi \) is a good reaction coordinate.
The reaction coordinate \(\xi \) remains an “ideal” case, because we have no access to the map \(\mathcal {Q}\) and hence to \(\mathbb {M}\), but only to \(\overline{\mathcal {Q}}(x) = p^t(x,\cdot ) \approx \mathcal {Q}(x)\). We show the construction of the ideal reaction coordinate \(\xi \) in Fig. 4.
Remark 4.13
The recent work of Dellnitz et al. (2016) uses similar embedding techniques to identify finite-dimensional objects in the state space of infinite-dimensional dynamical systems. They utilize the infinite-dimensional delay-embedding theorem of Robinson (2005), a generalization of the well-known Takens embedding theorem (Takens 1981), to compute finite-dimensional attractors of delay differential equations by established subdivision techniques (Dellnitz and Hohmann 1997).
Numerical Approximation of the Reaction Coordinate
Approximate Embedding of the Transition Manifold
We now elaborate how to construct a good reaction coordinate \(\overline{\xi }\) numerically. To use the central definition (18) in practice, two points have to be addressed:
1. How to choose the embedding \(\mathcal {E}\)?
2. How to deal with the fact that we do not know \(\mathcal {Q}\)?
For the choice of \(\mathcal {E}\), we restrict ourselves to linear maps of the form
$$\begin{aligned} \mathcal {E}(f) := \begin{pmatrix} \left\langle f,\, \eta _1 \right\rangle \\ \vdots \\ \left\langle f,\, \eta _{2r+1} \right\rangle \end{pmatrix}, \end{aligned}$$
(21)
with arbitrarily chosen linearly independent functions \( \eta _i\in L^{\infty }(\mathbb {X}) \), where \(\langle f,\eta _i\rangle = \int f\eta _i\). In practice, we will choose the \(\eta _i:\mathbb {X}\rightarrow \mathbb {R}\) as linear functions themselves, i.e., \(\eta _i(x) = a_i^{\intercal }x\) for some, usually randomly drawn, \(a_i\in \mathbb {R}^n\). Note that then \(\eta _i\notin L^{\infty }\), but this is not a problem because we will embed the functions \(f = p^t(x,\cdot )\), and \(p^t(x,y)\) can be shown to decay exponentially as \(\Vert y\Vert _2\rightarrow \infty \), cf. Bittracher et al. (2015), Theorem C.1. Thus, \(\left\langle f,\, \eta _i \right\rangle \) will exist. For linearly independent \(\eta _i\), these maps are still generic in the sense of the Whitney embedding theorem and thus still embed the transition manifold \(\mathbb {M}\).
A natural choice for the approximation of the unknown map \(\mathcal {Q}\) is the mapping to the transition probability density,
$$\begin{aligned} \overline{\mathcal {Q}}: x \mapsto p^t(x,\cdot ), \end{aligned}$$
(22)
as \(\Vert \mathcal {Q}(x) - p^t(x,\cdot )\Vert _{L^2_{1/\mu }} \le \varepsilon \). With this, we consider
$$\begin{aligned} \mathcal {E}\big (\overline{\mathcal {Q}}(x)\big ) = \mathcal {E}\big (p^t(x,\cdot )\big ) = \begin{pmatrix} \left\langle p^t(x, \cdot ),\, \eta _1 \right\rangle \\ \vdots \\ \left\langle p^t(x, \cdot ),\, \eta _{2r+1} \right\rangle \end{pmatrix} {\mathop {=}\limits ^{(1)}} \begin{pmatrix} \mathcal {K}^t \eta _1(x) \\ \vdots \\ \mathcal {K}^t \eta _{2r+1}(x) \end{pmatrix}. \end{aligned}$$
(23)
The values on the right-hand side can in turn be approximated by a Monte Carlo quadrature, using only short-time trajectories of the original dynamics:
$$\begin{aligned} \mathcal {K}^t \eta _i(x) = \mathsf {E}\big [\eta _i(\mathbf {X}_t) \mid \mathbf {X}_0 = x \big ] \approx \frac{1}{M} \sum _{m=1}^M\eta _i\big (\varvec{\Phi }_t^{(m)}(x)\big ), \end{aligned}$$
(24)
where the \(\varvec{\Phi }_t^{(m)}(x)\) are independent realizations of \(\mathbf {X}_t\) with starting point \(\mathbf {X}_0=x\), in practice realized by a stochastic integrator (e.g., Euler–Maruyama).
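The estimator (24), combined with linear observables \(\eta _i(x) = a_i^{\intercal }x\), can be sketched in a few lines. The following is a minimal illustration and not the authors' implementation: the two-dimensional double-well potential, the overdamped Langevin form \(\mathrm{d}\mathbf {X}_t = -\nabla V(\mathbf {X}_t)\,\mathrm{d}t + \sqrt{2/\beta }\,\mathrm{d}W_t\), the parameter values, and all function names (`grad_V`, `simulate_endpoints`, `embed_point`) are assumptions made for this example.

```python
import numpy as np


def grad_V(X):
    """Row-wise gradient of the hypothetical potential V(x) = (x_1^2 - 1)^2 + x_2^2."""
    g = np.empty_like(X)
    g[:, 0] = 4.0 * X[:, 0] * (X[:, 0] ** 2 - 1.0)
    g[:, 1] = 2.0 * X[:, 1]
    return g


def simulate_endpoints(x0, t, dt, beta, M, rng):
    """Euler-Maruyama for dX = -grad V(X) dt + sqrt(2/beta) dW (assumed dynamics).

    Returns the M endpoints X_t of independent trajectories started in x0.
    """
    X = np.tile(np.asarray(x0, dtype=float), (M, 1))  # M copies of x0, shape (M, n)
    for _ in range(int(round(t / dt))):
        noise = rng.standard_normal(X.shape)
        X = X - grad_V(X) * dt + np.sqrt(2.0 * dt / beta) * noise
    return X


def embed_point(x0, A, t=0.5, dt=1e-3, beta=3.0, M=200, rng=None):
    """Monte Carlo estimate of (K^t eta_1(x0), ..., K^t eta_k(x0)), eta_i(y) = a_i^T y.

    A has shape (k, n); its rows are the directions a_i.
    """
    rng = np.random.default_rng() if rng is None else rng
    endpoints = simulate_endpoints(x0, t, dt, beta, M, rng)  # shape (M, n)
    return (endpoints @ A.T).mean(axis=0)                    # shape (k,)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((3, 2))  # k = 2r + 1 = 3 random directions for r = 1
    print(embed_point([-1.0, 0.0], A, rng=rng))
```

Because the \(\eta _i\) are linear, the sample mean of \(\eta _i\) over the endpoints equals \(a_i^{\intercal }\) applied to the mean endpoint, so only the M trajectory endpoints need to be stored. The lag time `t`, step size `dt`, and sample size `M` are tuning parameters that would have to be chosen for the system at hand.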
The Computationally Infeasible Reaction Coordinate \(\varvec{\xi }\)
Note that \(\mathcal {E}\circ \overline{\mathcal {Q}}\) is not yet an r-dimensional reaction coordinate, since \(\overline{\mathcal {Q}}(\mathbb {X})\) is only approximately an r-dimensional manifold; more precisely, it lies in the \(\varepsilon \)-neighborhood of an r-dimensional submanifold \(\mathbb {M}\) of \(L^1\). Hence, also \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {X}))\) is only approximately an r-dimensional manifold; see the magenta regions in Fig. 5.
The question now is how we can reduce \(\mathcal {E}\circ \overline{\mathcal {Q}}\) to an r-dimensional good reaction coordinate. Since we know from above that \(\xi = \mathcal {E}\circ \mathcal {Q}\) is a good reaction coordinate, let us see what would be needed to construct it.
The property of \(\xi \) that we want is that it is constant along level sets \(\mathbb {L}_z\) of \(\mathcal {Q}\), i.e., \(\xi \vert _{\mathbb {L}_z} = \mathrm {const}\) (because this implies that it is a good reaction coordinate, cf. Corollary 4.12). Hence, if we could identify \(\hat{\mathbb {M}}\) as an r-dimensional manifold in \(\mathbb {R}^{2r+1}\), we would project \(\mathcal {E}(\overline{\mathcal {Q}}(x))\) along \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {L}_z))\) onto \(\hat{\mathbb {M}}\)—assuming that \(\hat{\mathbb {M}}\) and \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {L}_z))\) intersect in \(\mathbb {R}^{2r+1}\)—to obtain \(\xi (x)\) as the resulting point (see Fig. 5, where we would project along the red line on the right). Unfortunately, we have no access to \(\mathcal {Q}\) (not to mention that \(\hat{\mathbb {M}}\) and \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {L}_z))\) need not intersect in \(\mathbb {R}^{2r+1}\)) and hence to its level sets \(\mathbb {L}_z\). Thus, this strategy seems infeasible.
A Computationally Feasible Reaction Coordinate
What helps us at this point is that there is a certain amount of arbitrariness in the definition of \(\mathcal {Q}\). Recalling Definition 4.4, what we are given is \(\overline{\mathcal {Q}}\), and we construct \(\mathcal {Q}(x)\) as a projection of \(\overline{\mathcal {Q}}(x)\) onto the r-dimensional manifold \(\mathbb {M}\) by the closest-point projection \(\mathcal {Q}'\); i.e., \(\mathcal {Q} = \mathcal {Q}'\circ \overline{\mathcal {Q}}\). This choice of \(\mathcal {Q}'\) is convenient, because we can show
$$\begin{aligned} \Vert \overline{\mathcal {Q}}(x) - \overline{\mathcal {Q}}(y)\Vert _{L^2_{1/\mu }}\le 2\varepsilon \quad \text {for every }\mathcal {Q}(x) = \mathcal {Q}(y)~(\text {i.e., on level sets of } \mathcal {Q}'), \end{aligned}$$
(25)
which is used in Lemma 4.7. Other choices of \(\mathcal {Q}'\) could, however, yield a similarly practicable \(\mathcal {O}(\varepsilon )\)-bound in (25). Our strategy will be to choose a specific r-dimensional reaction coordinate \(\overline{\xi }\) and to show that in general it can be expected to be a good reaction coordinate.
Let us recall that, by assumption, the set \(\overline{\mathcal {Q}}(\mathbb {X})\) is contained in the \(\varepsilon \)-neighborhood of an unknown smooth r-dimensional manifold \(\mathbb {M}\subset L^1(\mathbb {X})\). Thus, a generic smooth map \(\mathcal {E} :L^1(\mathbb {X}) \rightarrow \mathbb {R}^{2r+1}\) will embed \(\mathbb {M}\) into \(\mathbb {R}^{2r+1}\), forming a diffeomorphism from \(\mathbb {M}\) to \(\hat{\mathbb {M}}\). Thus, \(\mathcal {E}\) is going to map \(\overline{\mathcal {Q}}(\mathbb {X})\) to an \(\mathcal {O}(\varepsilon )\)-neighborhood of \(\hat{\mathbb {M}}\). This means the r-dimensional manifold structure of \(\hat{\mathbb {M}}\) should still be detectable and can be identified with standard manifold learning tools. We use the diffusion maps algorithm (see Sect. 4.4), which gives us a map \(\Psi : \mathbb {R}^{2r+1} \rightarrow \mathbb {R}^r\) (the diffusion map). Then we define \({\overline{\xi }}\) as
$$\begin{aligned} \overline{\xi } := \Psi \circ \mathcal {E}\circ \overline{\mathcal {Q}}. \end{aligned}$$
(26)
This is depicted on the right-hand side of Fig. 6, where the dashed red line shows the level set \(\hat{\mathbb {L}}_{\hat{z}} = \{ z\in \mathbb {R}^{2r+1}: \Psi (z) = \Psi ({\hat{z}})\}\).
Next, we consider the set \(\tilde{\mathbb {L}}_{\hat{z}} := \mathcal {E}^{-1}(\hat{\mathbb {L}}_{\hat{z}}) \cap \overline{\mathcal {Q}}(\mathbb {X})\). It holds that \(\tilde{\mathbb {L}}_{\hat{z}} = \left\{ \overline{\mathcal {Q}}(x)\,\big \vert \,\overline{\xi }(x) = \Psi (\hat{z})\right\} \). Recall that \(\mathcal {E} :\mathbb {M} \rightarrow \hat{\mathbb {M}}\) is one-to-one; thus, \(\tilde{\mathbb {L}}_{\hat{z}}\) intersects \(\mathbb {M}\) in exactly one point. We define this one point as \(\mathcal {Q}(x)\), and thus, \(\mathcal {Q}'\) is the projection onto \(\mathbb {M}\) along \(\tilde{\mathbb {L}}_{\hat{z}}\). We see that \(\mathcal {Q}\) is well defined and that \(\mathcal {Q}(x)=\mathcal {Q}(y) \Leftrightarrow \overline{\xi }(x) = \overline{\xi }(y)\).
At this point we assume that \(\mathcal {E}^{-1}\) is sufficiently well behaved in a neighborhood of \(\hat{\mathbb {M}}\) and that it does not “distort transversality” of intersections, such that the diameter of \(\tilde{\mathbb {L}}_{\hat{z}}\) is \(\mathcal {O}(\varepsilon )\) with a moderate constant in \(\mathcal {O}(\cdot )\). We will investigate a formal justification of this assumption in future work; here we assume that it holds, and we will see in the numerical experiments that the assumption is justified. This assumption implies that \(\Vert \overline{\mathcal {Q}}(x) - \overline{\mathcal {Q}}(y)\Vert _{L^2_{1/\mu }} = \mathcal {O}(\varepsilon )\) for \(\mathcal {Q}(x) = \mathcal {Q}(y)\), i.e., for \(\overline{\xi }(x) = \overline{\xi }(y)\). The argument of Lemma 4.7 then implies that \(\varphi _i\) is almost constant (up to an error \(\mathcal {O}(\varepsilon )\)) on level sets of \(\overline{\xi }\), which, in turn, by Lemma 4.2 and Corollary 3.6 shows that \(\overline{\xi }\) is a good reaction coordinate.
Identification of \(\hat{\mathbb {M}}\) Through Manifold Learning
In this section, we describe how to identify \(\hat{\mathbb {M}}\) numerically. The task is as follows: Given that we have computed \(\mathcal {E}(\overline{\mathcal {Q}}(x_i)) = {\hat{z}}_i \in \mathbb {R}^{2r+1}\) for a number of sample points \(\{x_i\}_{i=1}^{\ell } \subset \mathbb {X}\), we would like to identify the r-dimensional manifold \(\hat{\mathbb {M}}\), noting that the points \(\mathcal {E}(\overline{\mathcal {Q}}(x_i))\) lie in an \(\mathcal {O}(\varepsilon )\)-neighborhood of \(\hat{\mathbb {M}}\) (see Sect. 4.3). Additionally, we would like an r-dimensional coordinate function \(\Psi :\mathbb {R}^{2r+1} \rightarrow \mathbb {R}^r\) that parametrizes \(\hat{\mathbb {M}}\) (so that the level sets of \(\Psi \) are transversal to \(\hat{\mathbb {M}}\)).
This is a default setting for which manifold learning algorithms can be applied. Many standard methods exist; we name multidimensional scaling (Kruskal 1964b, a), isomap (Tenenbaum et al. 2000), and diffusion maps (Coifman and Lafon 2006) as a few of the most prominent examples. Because of its favorable properties, we choose the diffusion maps algorithm here and summarize it briefly for our setting in what follows. For details, the reader is referred to Coifman and Lafon (2006), Nadler et al. (2006), Coifman et al. (2008) and Singer et al. (2009).
Given sample points \(\{{\hat{z}}_i\}_{i=1}^{\ell } \subset \mathbb {R}^{2r+1}\), diffusion maps proceed by constructing a similarity matrix \(W\in \mathbb {R}^{\ell \times \ell }\) with
$$\begin{aligned} W_{ij} = h\left( \frac{\Vert {\hat{z}}_i - {\hat{z}}_j\Vert _2^2}{\sigma }\right) , \end{aligned}$$
where \(\Vert \cdot \Vert _2\) is the Euclidean norm in \(\mathbb {R}^{2r+1}\), \(\sigma > 0\) is a scale factor, and \(h : \mathbb {R}\rightarrow \mathbb {R}_+\) is a kernel function, most commonly chosen as \(h(x) = \exp (-x) 1_{x\le R}\) with a suitably chosen cutoff R that sparsifies W and ensures that only local distances enter the construction. With D being the diagonal matrix containing the row sums of W, the similarity matrix is then normalized to give \({\tilde{W}} = D^{-1}WD^{-1}\). Finally, the stochastic matrix \(P = {\tilde{D}}^{-1}{\tilde{W}}\) is constructed, where \({\tilde{D}}\) is the diagonal matrix containing the row sums of \({\tilde{W}}\). P is similar to the symmetric matrix \({\tilde{D}}^{-1/2}{\tilde{W}}{\tilde{D}}^{-1/2}\); thus, it has a basis of eigenvectors \(\{\psi _i\}_{i=0}^{\ell -1}\), orthonormal with respect to the inner product weighted by \({\tilde{D}}\), with real eigenvalues \(\gamma _i\). Since P is also stochastic, \(|\gamma _i| \le 1\). The diffusion map is then given by
$$\begin{aligned} \Psi : \mathbb {R}^{2r+1} \rightarrow \mathbb {R}^r, \quad \Psi ({\hat{z}}) = \left( \gamma _1 \psi _1({\hat{z}}),\ldots , \gamma _r \psi _r({\hat{z}})\right) ^{\intercal }. \end{aligned}$$
(27)
Using properties of the Laplacian eigenproblem on \(\hat{\mathbb {M}}\), one can show that \(\Psi \) indeed parametrizes the r-dimensional manifold \(\hat{\mathbb {M}}\) for suitably chosen \(\sigma \) (Coifman and Lafon 2006).
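The construction above can be sketched in a few lines of NumPy. The following is a minimal dense-matrix illustration, not the implementation used for the experiments: the function name `diffusion_map`, its arguments, and the test curve are hypothetical, the eigenvalues are assumed to be sorted in decreasing order so that \(\gamma _0 = 1\) belongs to the constant eigenvector, and \(\Psi \) is evaluated only at the sample points \({\hat{z}}_i\).

```python
import numpy as np

def diffusion_map(Z, sigma, r, cutoff=np.inf):
    """Illustrative diffusion map (hypothetical helper).

    Z      : (ell, d) array, rows are the embedded points z_hat_i
    sigma  : scale factor in the kernel
    r      : number of diffusion coordinates to return
    cutoff : kernel cutoff R; the default keeps all pairs
    Returns the (ell, r) array of coordinates and all eigenvalues (descending).
    """
    # Scaled squared Euclidean distances ||z_i - z_j||^2 / sigma
    sq = np.sum((Z[:, None, :] - Z[None, :, :]) ** 2, axis=-1) / sigma
    # Kernel h(x) = exp(-x) * 1_{x <= R}
    W = np.where(sq <= cutoff, np.exp(-sq), 0.0)
    # First normalization: W_tilde = D^{-1} W D^{-1}
    d = W.sum(axis=1)
    W_tilde = W / np.outer(d, d)
    # Symmetric matrix S = D_tilde^{-1/2} W_tilde D_tilde^{-1/2}, similar to P
    s = 1.0 / np.sqrt(W_tilde.sum(axis=1))
    S = W_tilde * np.outer(s, s)
    # Real eigenpairs of S, reordered so the largest eigenvalue comes first
    gamma, U = np.linalg.eigh(S)
    order = np.argsort(gamma)[::-1]
    gamma, U = gamma[order], U[:, order]
    # Eigenvectors of P = D_tilde^{-1} W_tilde are psi = D_tilde^{-1/2} u
    psi = U * s[:, None]
    # Skip the trivial pair (gamma_0, psi_0) and scale by the eigenvalues
    return gamma[1:r + 1] * psi[:, 1:r + 1], gamma

# Usage on an assumed test curve (r = 1) embedded in R^3 = R^{2r+1}
t = np.linspace(0.0, np.pi, 200)
Z = np.column_stack([np.cos(t), np.sin(t), 0.1 * t])
coords, gamma = diffusion_map(Z, sigma=0.01, r=1)
```

Working with the symmetric matrix instead of P itself allows the use of a symmetric eigensolver, which returns real eigenvalues by construction. For large \(\ell \), a finite cutoff R together with a sparse eigensolver would be the natural replacement for the dense computation shown here.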
Remark 4.14
The diffusion maps algorithm will only reliably identify \(\hat{\mathbb {M}}\) based on the neighborhood relations between the embedded sample points \({\hat{z}}_i\) if the points cover all parts of \(\hat{\mathbb {M}}\) sufficiently well. In particular, as \(p^t(x,\cdot )\) and thus \(\big (\mathcal {E}\circ \overline{\mathcal {Q}}\big )(x)\) vary strongly with x traversing the transition regions, a good coverage of those regions is required.
For the various low-dimensional academic examples in Sect. 5, this is ensured by choosing the \(x_i\) to be a dense grid of points in \(\mathbb {X}\). For the high-dimensional example in Sect. 5.2, the evaluation points are generated as a subsample from a long equilibrated trajectory, essentially sampling \(\mu \). Both of these ad hoc methods are likely to be inapplicable in realistic high-dimensional systems with very long equilibration times. However, as we mentioned in the Introduction, there exist multiple statistical and dynamical approaches to this common problem of quickly sampling the relevant parts of phase space, including the transition regions. Each of these sampling methods can be easily integrated into our proposed algorithm as a preprocessing step.
Fundamentally, though, the central idea of our method does not depend crucially on the applicability of diffusion maps. Rather, the latter can be considered an optional post-processing step. Using the \((2r+1)\)-dimensional reaction coordinate
$$\begin{aligned} \overline{\overline{{\xi }}} := \mathcal {E}\circ \overline{\mathcal {Q}}, \end{aligned}$$
i.e., (26) without the manifold learning step, may in practice already represent a sufficient dimensionality reduction.
In addition, situations may occur where the a priori generation of evaluation points is not possible or desired. One of our long-term goals, and work currently in progress, is the construction of an accelerated integration scheme that generates significant evaluation points and their reaction coordinate values “on the fly.” This is related to the effective dynamics mentioned in the fifth point of the Conclusion. However, this also requires us to be able to evaluate the reaction coordinate at isolated points, independent of each other, and thus also necessitates the use of the above \(\overline{\overline{{\xi }}}\) instead of \(\overline{\xi }\).