Transition Manifolds of Complex Metastable Systems
Abstract
We consider complex dynamical systems showing metastable behavior, but no local separation of fast and slow time scales. The article raises the question of whether such systems exhibit a low-dimensional manifold supporting their effective dynamics. To answer this question, we aim at finding nonlinear coordinates, called reaction coordinates, such that the projection of the dynamics onto these coordinates preserves the dominant time scales of the dynamics. We show that, based on a specific reducibility property, the existence of good low-dimensional reaction coordinates preserving the dominant time scales is guaranteed. Based on this theoretical framework, we develop and test a novel numerical approach for computing good reaction coordinates. The proposed algorithmic approach is fully local and thus not prone to the curse of dimension with respect to the state space of the dynamics. Hence, it is a promising method for data-based model reduction of complex dynamical systems such as molecular dynamics.
Keywords
Metastability · Reaction coordinate · Coarse graining · Effective dynamics · Whitney embedding theorem · Transfer operator
Mathematics Subject Classification
47B38 · 82C31 · 60H35
1 Introduction
With the advancement of computing power, we are able to simulate and analyze more and more complicated and high-dimensional models of dynamical systems, ranging from astronomical scales for the simulation of galaxies, over planetary and continental scales for climate and weather prediction, down to molecular and sub-atomistic scales via, e.g., molecular dynamics (MD) simulations aimed at gaining insight into complex biological processes. Particular aspects of such processes, however, can often be described by much simpler means than the full process, thus reducing the full dynamics to some essential behavior or effective dynamics in terms of some essential observables of the system. Extracting these observables and the related effective dynamics from a dynamical system, though, is one of the most challenging problems in computational modeling (Froyland et al. 2014).
One prominent example of dynamical reduction is given by a variety of multiscale systems with explicit fast–slow time scale separation, mostly singularly perturbed systems, where either the fast component is considered in a quasi-stationary regime (i.e., the slow components are fixed and assumed not to change for the observation period), or the effective behavior of the fast components is injected into the slow processes, e.g., by averaging or homogenization (Pavliotis and Stuart 2008). Much of the recent attention has been directed to the case where the deduction of the slow (or fast) effective dynamics is not possible by purely analytic means, due to the lack of an analytic description of the system, or because the complexity of the system renders this task infeasible (Froyland et al. 2014, 2016; Coifman et al. 2008; Dsilva et al. 2016; Nadler et al. 2006; Singer et al. 2009; Crosskey and Maggioni 2017; Vanden-Eijnden 2007; Kevrekidis and Samaey 2009). However, all of these approaches still depend on some local form of time scale separation between the “fast” and the “slow” components of the dynamics.
The focus of this work is on specific multiscale systems without local dynamical slow–fast time scale separation, but for which a reduction to an effective dynamical behavior supported on some low-dimensional manifold is still possible. The dynamical property lying at the heart of our approach is that there is a time scale separation in the global kinetic behavior of the process, as opposed to the aforementioned slow–fast behavior encoded in the local dynamics. Here, global kinetic behavior means that the multiple scales show up if we consider the Fokker–Planck equation associated with the dynamics, say \(\dot{u} = \mathcal {L}u\), where the Fokker–Planck operator \(\mathcal {L}\) will have several small eigenvalues, while the rest of its spectrum is significantly larger. Such dynamical systems exhibit metastable behavior, and the slow time scales are the time scales of statistical relaxation between the main metastable sets, while there is no time scale gap for the local dynamics within each of the metastable regions (Bovier et al. 2002; Schütte and Sarich 2014).
The tool to describe the global kinetic behavior of a metastable system is the so-called transfer operator (the evolution operator of the Fokker–Planck equation), which acts on functions on the state space. The time scale separation we rely on here implies a spectral gap for this operator. This fact has been exploited to find low-dimensional representations of the global kinetics in the form of Markov chains whose (discrete) states represent the metastable sets, while the transition probabilities between the states approximate the jump statistics between the sets on long time scales. Under the name “Markov state models” (MSMs), this approach has led to a variety of methods (Bowman et al. 2014; Schütte and Sarich 2014) with broad application, e.g., in molecular dynamics, cf. Schütte et al. (1999), Pande et al. (2010), Schütte et al. (2011) and Chodera and Noé (2014). This reduction comes at a price: Since the relaxation kinetics is described just by jumps between the metastable sets in a (finite) discrete state space, any information about the transition process and its dynamical features is lost. A variety of approaches have been developed for complementing the MSM approach appropriately (Metzner et al. 2009), but a low-dimensional effective description based on MSMs that is continuous in time and space, and thus allows one to understand the transition mechanism, is infeasible.
In another branch of the literature, again heavily influenced by molecular dynamics applications, model reduction techniques have been developed that assume the existence of a low-dimensional reaction coordinate or order parameter in order to construct an effective dynamics or kinetics: Examples are free energy-based techniques (Torrie and Valleau 1977; Laio and Gervasio 2008), trajectory-based sampling techniques (Faradjian and Elber 2004; Becker et al. 2012; Moroni et al. 2004; Pérez-Hernández et al. 2013), methods based on diffusive processes (Best and Hummer 2010; Zhang et al. 2016; Pavliotis and Stuart 2008), and many more that rely on the assumption that the reaction coordinates are known. The problem of actually constructing good reaction coordinates remains an area of ongoing research (Li and Ma 2014), to which this paper contributes. Typically, reaction coordinates are either postulated using system-specific expert knowledge (Camacho and Thirumalai 1993; Socci et al. 1996), an approximation to the dominant eigenfunctions of the transfer operator is sought (Schütte and Sarich 2014; Chodera and Noé 2014; Pérez-Hernández et al. 2013), or machine learning techniques are proposed (Ma and Dinner 2005). Froyland et al. (2014) show that these eigenfunctions are indeed optimal, in the sense of optimally representing the slow dynamics, but for high-dimensional systems the computational identification of reaction coordinates is still often infeasible. In the context of transition path theory (Vanden-Eijnden 2006), the committor function is known to be an ideal reaction coordinate (Lu and Vanden-Eijnden 2014). In Pozun et al. (2012), the authors construct a level set of the committor using support vector machines, but the computation of reaction coordinates is infeasible for high-dimensional systems.
The main problem in computing reaction coordinates for high-dimensional metastable systems results from the fact that all of these algorithms try to solve a global problem in the entire state space that cannot be decomposed easily into purely local computations.
In this article, we elaborate on the definition, existence and algorithmic identification of reaction coordinates for metastable systems: We define reaction coordinates as a small set of nonlinear coordinates on which a reduced system (Legoll and Lelièvre 2010; Zhang et al. 2016) can be defined having the same dominant time scales (in terms of transfer operator eigenvalues) as the original system. We then consider a low-dimensional state space on which the reduced dynamics is a Markov process. Thus, our approach utilizes concepts and transfer operator theory developed previously, but in our case the projected transfer operator is still infinite-dimensional, in stark contrast to its reduction to a stochastic matrix in the MSM approach.
The contribution of this paper is twofold: First, we develop a conceptual framework that identifies good reaction coordinates as the ones that parametrize a low-dimensional transition manifold \(\mathbb {M}\) in the function space \(L^1\), which is the natural state space of the Fokker–Planck equation \(\dot{u} = \mathcal {L}u\) associated with the dynamics. The property which defines \(\mathbb {M}\) is that, on moderate time scales \(t_\text {fast} < t \ll t_\text {slow}\), the transition density functions of the dynamics concentrate around \(\mathbb {M}\). We provide evidence that such an \(\mathbb {M}\) indeed exists due to metastability and the existence of transition pathways. Crucially, the dimension of \(\mathbb {M}\) is often lower than the number of dominant eigenfunctions.
1. The simulation time scale t can be chosen a lot smaller than the dominant time scales \(t_\text {slow}\) of the system, such that it is feasible to simulate many short trajectories of length t.
2. We utilize embedding techniques inspired by the seminal work of Whitney (1936) and the recent work of Dellnitz et al. (2016), which allow one to take almost any mapping into a Euclidean space of more than twice the dimension of the manifold \(\mathbb {M}\) and to obtain a one-to-one image of it.
The locality of the algorithm also implies that reaction coordinates are only computed in the region of state space where sampled points are available. This is a common issue with manifold learning algorithms; here, it manifests as the transition manifold being reliably learned only in regions with good sampling coverage. However, several methods have recently appeared in the literature that allow a fast exploration of the state space. These methods do not provide equilibrium sampling, but instead try to rapidly cover the essential part of the state space with sampling points. This can be achieved with enhanced sampling methods such as umbrella sampling (Kumar et al. 1992; Torrie and Valleau 1977), metadynamics (Laio and Gervasio 2008; Laio and Parrinello 2002), blue-moon sampling (Ciccotti et al. 2005), the adaptive biasing force method (Darve et al. 2008) or temperature-accelerated molecular dynamics (Maragliano and Vanden-Eijnden 2006), as well as trajectory-based techniques such as milestoning (Faradjian and Elber 2004), transition interface sampling (Moroni et al. 2004) or forward flux sampling (Becker et al. 2012). Alternatively, several techniques such as the equation-free approach (Kevrekidis and Samaey 2009), the heterogeneous multiscale method (HMM) (E and Engquist 2003) and methods based on diffusion maps (Chiavazzo et al. 2016) have been developed to utilize short unbiased MD trajectories for extracting information that allows much larger time steps. This can be combined with reaction coordinate-based effective dynamics (Zhang et al. 2016; Zhang and Schuette 2017).
In principle, the method we present in this article may be combined with any enhanced sampling technique in order to generate sampling points that cover a large part of the state space. For simplicity, we will use long MD trajectories to generate our sampling points, but we do not require that the points are distributed according to an equilibrium distribution.
The paper is organized as follows: Sect. 2 introduces transfer operators, which describe the global kinetics of the stochastic process. Based on these transfer operators, we define metastability, i.e., the existence of dominant time scales. In Sect. 3, we describe the model reduction techniques Markov state modeling and coordinate projection that are designed to capture the dominant time scales of metastable systems. Furthermore, we characterize good reaction coordinates. In the first part of Sect. 4, we show that our dynamical assumption ensures the existence of good reaction coordinates, and then in the second part we describe our approach to compute them. Several numerical examples are given in Sect. 5. Concluding remarks and an outlook are provided in Sect. 6.
2 Transfer Operators and Their Properties
As mentioned in the Introduction, global properties of dynamical systems such as metastable sets or a partitioning into fast and slow subprocesses can be obtained using transfer operators associated with the system and their eigenfunctions. In this section, we will introduce different transfer operators needed for our considerations.
2.1 Transfer Operators
In what follows, \( \mathsf {P}[\,\cdot \mid \mathfrak {E}] \) denotes probabilities conditioned on the event \( \mathfrak {E} \) and \( \mathsf {E}[\,\cdot \mid \mathfrak {E}] \) the corresponding conditional expectation. Furthermore, \( \{\mathbf {X}_t\}_{t \ge 0} \) is a stochastic process defined on a state space \( \mathbb {X}\subset \mathbb {R}^n \).
Definition 2.1
Definition 2.2
 (a) The Perron–Frobenius operator \( \mathcal {P}^t :L^1(\mathbb {X}) \rightarrow L^1(\mathbb {X}) \) is defined by the unique linear extension of
$$\begin{aligned} \mathcal {P}^t p(x) = \intop _{\mathbb {X}} p^t(y,x) \, p(y) \, \mathrm {d}y \end{aligned}$$
to \(L^1(\mathbb {X})\).
 (b) The Perron–Frobenius operator \(\mathcal {T}^t :L_\mu ^1(\mathbb {X}) \rightarrow L_\mu ^1(\mathbb {X})\) with respect to the equilibrium density is defined by the unique linear extension of
$$\begin{aligned} \mathcal {T}^t u(x) = \intop _{\mathbb {X}} \frac{\varrho (y)}{\varrho (x)} \, p^t(y, x) \, u(y) \,\mathrm {d}y \end{aligned}$$
to \(L^1_{\mu }(\mathbb {X})\).
 (c) The Koopman operator \( \mathcal {K}^t :L^{\infty }(\mathbb {X}) \rightarrow L^{\infty }(\mathbb {X}) \) is defined by
$$\begin{aligned} \mathcal {K}^t f(x) = \intop _{\mathbb {X}} p^t(x,y) \, f(y) \,\mathrm {d}y = \mathsf {E}[f(\mathbf {X}_t) \mid \mathbf {X}_0 = x]. \end{aligned} \tag{1}$$
The equilibrium density \( \varrho \) satisfies \( \mathcal {P}^t \varrho =\varrho \), that is, \( \varrho \) is an eigenfunction of \( \mathcal {P}^t \) with associated eigenvalue \( \lambda _0 = 1 \). The definition of \( \mathcal {T}^t \) relies on \(\varrho \), and we have \(\varrho \, \mathcal {T}^tu = \mathcal {P}^t (u \varrho )\).
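The expectation form of the Koopman operator in (1) suggests a simple Monte Carlo estimator: simulate many independent trajectories started at x and average f over their endpoints. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes overdamped Langevin dynamics \(\mathrm {d}\mathbf {X}_t = -\nabla V(\mathbf {X}_t)\,\mathrm {d}t + \sqrt{2/\beta }\,\mathrm {d}W_t\) discretized by the Euler–Maruyama scheme; the function name `koopman_estimate` and all parameter values are hypothetical. An Ornstein–Uhlenbeck process is used as a test case because its conditional mean is known in closed form.

```python
import numpy as np

def koopman_estimate(f, x0, t, grad_V, beta, dt=1e-3, n_samples=20000, seed=0):
    """Monte Carlo estimate of (K^t f)(x0) = E[f(X_t) | X_0 = x0].

    Assumes overdamped Langevin dynamics dX = -grad_V(X) dt + sqrt(2/beta) dW,
    discretized with the Euler-Maruyama scheme (an illustrative choice).
    """
    rng = np.random.default_rng(seed)
    x = np.full(n_samples, float(x0))          # all samples start at x0
    n_steps = int(round(t / dt))
    noise_scale = np.sqrt(2.0 * dt / beta)
    for _ in range(n_steps):
        x = x - grad_V(x) * dt + noise_scale * rng.standard_normal(n_samples)
    return f(x).mean()                         # average the observable over endpoints

# Ornstein-Uhlenbeck test case: V(x) = x^2 / 2, so E[X_t | X_0 = x0] = x0 * exp(-t).
x0, t = 1.5, 0.5
estimate = koopman_estimate(lambda x: x, x0, t, grad_V=lambda x: x, beta=1.0)
exact = x0 * np.exp(-t)
print(f"Monte Carlo: {estimate:.4f}, exact: {exact:.4f}")
```

The estimator only needs trajectories started at the single point x, which is the sense in which evaluating \(\mathcal {K}^t f\) at one location is a local computation.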
Instead of their natural domains from Definition 2.2, all our transfer operators are considered on the following Hilbert spaces: \(\mathcal {P}^t:L^2_{1/\mu }(\mathbb {X}) \rightarrow L^2_{1/\mu }(\mathbb {X})\), \(\mathcal {T}^t:L^2_{\mu }(\mathbb {X}) \rightarrow L^2_{\mu }(\mathbb {X})\), and \(\mathcal {K}^t:L^2_{\mu }(\mathbb {X}) \rightarrow L^2_{\mu }(\mathbb {X})\). They are still well-defined non-expansive operators on these spaces (Baxter and Rosenthal 1995; Schervish and Carlin 1992; Klus et al. 2017).
Furthermore, we will need the notion of reversibility for our considerations. Reversibility means that the process is statistically indistinguishable from its time-reversed counterpart.
Definition 2.3
In what follows, we will assume that the system is reversible.
2.2 Spectral Decomposition
Due to the self-adjointness, the eigenvalues \( \lambda _i^t \) of \( \mathcal {P}^t \) and \( \mathcal {T}^t \) are real-valued and the eigenfunctions form an orthogonal basis with respect to \( \left\langle \cdot ,\, \cdot \right\rangle _{1/\mu } \) and \( \left\langle \cdot ,\, \cdot \right\rangle _\mu \), respectively. In what follows, we assume that the spectrum of \( \mathcal {T}^t \) is purely discrete, given by (infinitely many) isolated eigenvalues. This assumption is made for the sake of simplicity. It is actually not required for the rest of our considerations; it would be sufficient to assume that the spectral radius R of the essential spectrum of \( \mathcal {T}^t \) is strictly smaller than 1, and that some isolated eigenvalues of modulus larger than R exist. It has been shown that this condition is satisfied for a large class of metastable dynamical systems, see Schütte and Sarich (2014), Sec. 5.3 for details. For example, the process generated by (2) has purely discrete spectrum under mild growth and regularity assumptions on the potential V.
2.3 Implied Time Scales
3 Projected Transfer Operators and Reaction Coordinates
The purpose of dimension reduction in molecular dynamics is to find a reduced dynamical model that captures the dominant time scales of the system correctly while keeping the model as simple as possible. In this section, we will introduce two different projections and the corresponding projected transfer operators. The goal is to find suitable projections onto the slow processes.
3.1 Galerkin Projections and Markov State Models
Equation (6) readily suggests that \( T^t_\text {core} \) is a projection of the transfer operator \( \mathcal {T}^t \), namely its Galerkin projection onto the space spanned by the characteristic functions \( \mathbb {1}_{\mathbb {C}_1},\ldots , \mathbb {1}_{\mathbb {C}_{d+1}} \) (Schütte et al. 1999).
Definition 3.1
Equivalently, \( T^t = \Pi _{\psi } \mathcal {T}^t \). We also denote the extension of \( T^t \) to the whole \( L^2_{\mu }(\mathbb {X}) \), given by \( \Pi _{\psi }\mathcal {T}^t\Pi _{\psi } \), by \( T^t \). Furthermore, we denote the matrix representation of \( T^t \) with respect to the basis \( (\psi _0,\ldots ,\psi _d) \) by \( T^t \) as well. Either it will be clear from the context which of the objects \(T^t\) is meant or it will not matter; e.g., the dominant spectrum is the same for all of them.
We see that \(T_\text {core} \) is the matrix representation of the Galerkin projection with respect to the basis functions \( \left\langle \mathbb {1}_{\mathbb {C}_i},\, \mathbb {1}_{\mathbb {C}_i} \right\rangle _{\mu }^{-1} \mathbb {1}_{\mathbb {C}_i}\), \(i=1,\ldots ,d+1\). More general MSMs can be built by Galerkin projections of the transfer operator to spaces spanned by other (not necessarily piecewise constant) basis functions (Weber 2006; Schütte et al. 2011; Weber et al. 2017; Klus et al. 2016, 2017; Pérez-Hernández et al. 2013; Noé and Nüske 2013). However, in some of these methods, one often loses the interpretation of the entries of the matrix \( T^t \) as probabilities.
Ultimately, the best MSM in terms of approximation quality in (4) is given by the Galerkin projection of \( \mathcal {T}^t \) onto the space spanned by its dominant eigenfunctions \( \varphi _0 , \ldots , \varphi _d \). This space is invariant under \( \mathcal {T}^t \) since \(\mathcal {T}^t \varphi _i = \lambda _i^t \varphi _i \) and the dominant eigenvalues (and hence the time scales) are the same for the MSM and for \( \mathcal {T}^t \). Due to the curse of dimensionality, however, the computation of the eigenfunctions \( \varphi _i \) is in general infeasible for high-dimensional problems.
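To make the Galerkin projection onto characteristic functions concrete, the following sketch replaces the transfer operator by a finite-state stand-in and projects it onto the indicator functions of two sets. Everything specific here is an assumption for illustration: the double-well potential, the nearest-neighbour Metropolis chain (chosen only because it is reversible with respect to \(\pi \propto e^{-\beta V}\)), the grid size, the lag, and the split of the grid at zero. The sketch is not the construction of \(T^t_\text {core}\) used in the paper, only a matrix analogue of it.

```python
import numpy as np

# Finite-state stand-in for the transfer operator: a nearest-neighbour
# Metropolis chain on a grid, reversible w.r.t. pi ~ exp(-beta * V).
n, beta, lag = 100, 3.0, 50
grid = np.linspace(-2.0, 2.0, n)
V = (grid**2 - 1.0) ** 2                       # double-well potential (assumed)

P = np.zeros((n, n))
for i in range(n):
    for j in (i - 1, i + 1):
        if 0 <= j < n:
            P[i, j] = 0.5 * min(1.0, np.exp(-beta * (V[j] - V[i])))
    P[i, i] = 1.0 - P[i].sum()                 # rejected moves stay put

pi = np.exp(-beta * V)
pi /= pi.sum()
T = np.linalg.matrix_power(P, lag)             # operator at lag time `lag`

# Galerkin projection onto indicator functions of the two wells.
chi = np.stack([grid < 0, grid >= 0]).astype(float).T      # shape (n, 2)
gram = chi.T @ (pi[:, None] * chi)             # <1_Ci, 1_Cj>_pi (diagonal)
stiff = chi.T @ (pi[:, None] * (T @ chi))      # <1_Ci, T 1_Cj>_pi
T_core = np.linalg.solve(gram, stiff)          # 2x2 row-stochastic matrix

# Compare the second eigenvalue with that of the full operator.
sym = np.sqrt(pi)[:, None] * T / np.sqrt(pi)[None, :]      # symmetric by reversibility
lam_full = np.sort(np.linalg.eigvalsh(0.5 * (sym + sym.T)))[-2]
lam_msm = np.sort(np.linalg.eigvals(T_core).real)[-2]
print("T_core =\n", T_core)
print(f"lambda_1: full = {lam_full:.6f}, MSM = {lam_msm:.6f}")
```

Because the projected operator is a Rayleigh–Ritz approximation of a self-adjoint operator, its second eigenvalue cannot exceed that of the full operator; the gap between the two reflects how well the indicator functions approximate the dominant eigenfunction.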
Remark 3.2
There are quantitative results assessing the error in (4) of the MSM in terms of the projection errors \(\Vert \Pi _{\psi }^{\perp }\varphi _i\Vert _{L^2_{\mu }}\), \(i=0,\ldots ,d\), cf. Schütte and Sarich (2014), Section 5.3. One can obtain a weaker, but similar result from our Lemma 3.5 in the next section.
3.2 Coordinate Projections and Effective Transfer Operators
While the MSMs from above successfully reproduce the dominant time scales of the original system, they often discard all other information about the system, such as the transition paths between metastable sets. Minimal coordinates that describe these transitions are called reaction coordinates, and reducing the dynamics onto these coordinates yields effective dynamics (Legoll and Lelièvre 2010; Zhang et al. 2016). The goal of the previous section, namely to retain the dominant time scales of the original dynamics in a reduced model, can now be reformulated for this lower-dimensional effective dynamics or, equivalently, for its (effective) transfer operator.
Definition 3.3
Proposition 3.4
 (a) \(P_{\xi }\) is a linear projection, i.e., \(P_{\xi }^2 = P_{\xi }\).
 (b) \(P_{\xi }\) is self-adjoint with respect to \( \left\langle \cdot ,\, \cdot \right\rangle _\mu \).
 (c) \(P_{\xi } :L^2_{\mu }(\mathbb {X}) \rightarrow L^2_{\mu }(\mathbb {X})\) is orthogonal, hence non-expansive, i.e., \(\Vert P_{\xi }f\Vert _{L^2_{\mu }} \le \Vert f\Vert _{L^2_{\mu }}\).
Proof
See Appendix A. \(\square \)
As mentioned above, a Galerkin projection of the transfer operator onto its dominant eigenfunctions is a perfect MSM. In the same vein, we ask here how we can characterize a good reaction coordinate. We can make use of the following general result.
Lemma 3.5
Let \(\mathbb {H}\) be a Hilbert space with scalar product \(\langle \cdot ,\cdot \rangle \) and associated norm \({\Vert \cdot \Vert }\); let \(Q:\mathbb {H}\rightarrow \mathbb {H}\) be some orthogonal projection on a linear subspace of \(\mathbb {H}\), with \(Q^{\perp } = \mathrm {Id} - Q\). Let \(T:\mathbb {H}\rightarrow \mathbb {H}\) be a self-adjoint non-expansive linear operator, and u with \(\Vert u\Vert =1\) its eigenvector, i.e., \(Tu = \lambda u\) for some \(\lambda \in \mathbb {R}\). If \(\Vert Q^{\perp }u\Vert <\varepsilon \), then \(T_Q:=QTQ\) has an eigenvalue \(\lambda _Q\in \mathbb {R}\) with \(|\lambda - \lambda _Q|<\varepsilon /\sqrt{1-\varepsilon ^2}\).
Proof
With \(\mathbb {H}=L^2_{\mu }\), \(Q=P_{\xi }\), and \(T = \mathcal {T}^t\) we immediately obtain the following result.
Corollary 3.6
As before, let \( \lambda _i^t \) and \( \varphi _i \), \( i = 0, \ldots , d \), denote the dominant eigenvalues and eigenfunctions of \( \mathcal {T}^t \), respectively. For any given i, if \( \Vert P_{\xi }^{\perp }\varphi _i\Vert _{L^2_\mu } < \varepsilon \), then there is an eigenvalue \(\tilde{\lambda }_i^t\) of \( \mathcal {T}^t_\xi \) with \( |\lambda _i^t - \tilde{\lambda }_i^t| < \varepsilon /\sqrt{1-\varepsilon ^2} \).
Corollary 3.6 implies that if the projection error of all dominant eigenfunctions is small, then \( \xi \) is a good reaction coordinate in the sense of (12). Very similar results are available for approximation of the eigenvalues of the infinitesimal generator of the Fokker–Planck equation associated with the transfer operator if the dynamical system under consideration is continuous in time (Zhang and Schuette 2017).
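The eigenvalue bound of Lemma 3.5 can be checked numerically in finite dimensions. The following sketch is only an illustration under assumed choices: \(\mathbb {H} = \mathbb {R}^{60}\), T a random symmetric matrix scaled to spectral norm 1, and Q the orthogonal projection onto a six-dimensional subspace that is constructed to almost contain the leading eigenvector. The check uses the measured projection error \(\Vert Q^{\perp }u\Vert \) itself as \(\varepsilon \), so the inequality is tested in its non-strict form.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 6

# Self-adjoint, non-expansive T: a random symmetric matrix scaled to norm 1.
A = rng.standard_normal((n, n))
T = 0.5 * (A + A.T)
T /= np.linalg.norm(T, 2)

# Eigenpair (lam, u) belonging to the largest eigenvalue.
evals, evecs = np.linalg.eigh(T)
lam, u = evals[-1], evecs[:, -1]

# Orthogonal projection Q onto a k-dimensional subspace that almost contains u.
noise = rng.standard_normal(n)
w = u + 0.05 * noise / np.linalg.norm(noise)
basis, _ = np.linalg.qr(np.column_stack([w, rng.standard_normal((n, k - 1))]))
Q = basis @ basis.T

eps = np.linalg.norm(u - Q @ u)                # projection error ||Q^perp u||
bound = eps / np.sqrt(1.0 - eps**2)
gap = np.min(np.abs(np.linalg.eigvalsh(Q @ T @ Q) - lam))
print(f"eps = {eps:.4f}, |lam - lam_Q| = {gap:.2e}, bound = {bound:.4f}")
```

In the setting of Corollary 3.6, Q plays the role of \(P_{\xi }\) and u that of a dominant eigenfunction \(\varphi _i\); the sketch shows how a small projection error translates into a small eigenvalue error.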
Under which conditions is the projection error small? Let us consider the case where there are \( \tilde{\varphi }_i:\mathbb {R}^k \rightarrow \mathbb {R}\), \( i = 1, \ldots , d \), such that \( \varphi _i(x) = \tilde{\varphi }_i(\xi (x)) \). We then say that \(\varphi _i\) is a function of \(\xi \) or that \(\xi \) parametrizes \(\varphi _i\). If \(\xi \) parametrizes \(\varphi _i\) perfectly, the projection error obviously vanishes. Thus, trivially, by choosing \( \xi = \varphi = (\varphi _1, \ldots , \varphi _d)^{\intercal } \), we obtain a perfect reaction coordinate since with \(\tilde{\varphi }_i(z):= z_i \) we have \( \varphi _i = \tilde{\varphi }_i \circ \xi \). However, the eigenfunctions are global objects, i.e., their computation is prohibitive in high dimensions. Since we are aiming at computing a reaction coordinate, we have to answer the question of whether there is a reaction coordinate \(\xi \) that can be evaluated based on local computations only, while it parametrizes the dominant eigenfunctions of \(\mathcal {T}^t\) well enough such that it leads to a small projection error. We will see next that this question can be answered by utilizing a common property of most metastable systems: The transitions between the metastable sets happen along the so-called reaction pathways, which imply the existence of transition manifolds in the space of transition densities. A suitable parametrization of this manifold results in a parametrization of the dominant eigenfunctions with a small error.
4 Identifying Good Reaction Coordinates
The goal is now to find a reaction coordinate \(\xi \) that is as low-dimensional as possible and results in a good projected transfer operator in the sense of (12). As we saw in the previous section, the condition \(\Vert P_\xi ^\perp \varphi _i\Vert _{L^2_{\mu }} \approx 0\) is sufficient. Thus, it seems natural to numerically seek a \(\xi \) that parametrizes the dominant eigenfunctions of \(\mathcal {T}^t\) in the \( \Vert \cdot \Vert _{L^2_{\mu }}\)-norm, since this would lead to a small projection error \( \Vert P_\xi ^\perp \varphi _i\Vert _{L^2_{\mu }}\).
In fact, eigenfunctions of transfer operators have been used before to compute reduced dynamics and reaction coordinates: In Froyland et al. (2014), methods to decompose multiscale systems into fast and slow processes and to project the dynamics onto these subprocesses based on eigenfunctions of the Koopman operator \( \mathcal {K}^t \) are proposed. In McGibbon et al. (2017), the dominant eigenfunctions of the transfer operator \( \mathcal {T}^t \), which due to the assumed reversibility of the system is identical to \( \mathcal {K}^t \), are shown to be good reaction coordinates. Also, committor functions (introduced in Appendix B), which are closely related to the dominant eigenfunctions, have been used as reaction coordinates in Du et al. (1998) and Lu and Vanden-Eijnden (2014).
1. The eigenproblem is global. Thus, if we wish to learn the value of an eigenfunction \(\varphi _i\) at only one location \(x\in \mathbb {X}\), we need an approximation of the transfer operator \(\mathcal {T}^t\) that has to be accurate on all of \(\mathbb {X}\). The computational effort to construct such an approximation grows exponentially with \(\dim (\mathbb {X})\); this is the curse of dimensionality. There have been attempts to mitigate this (Weber 2006; Junge and Koltai 2009; Weber 2012), but we aim to circumvent this problem entirely. Given two points \(x,y \in \mathbb {X}\), we will decide whether \(\xi (x)\) is close to \(\xi (y)\) or not by using only local computations around x and y (i.e., samples from the transition densities \(p^t(x,\cdot )\) and \(p^t(y,\cdot )\) for moderate t).
2. The number of dominant eigenfunctions \( (d + 1) \) equals the number of metastable states, and this number can be much larger than the dimension of the transition manifold. This fact is illustrated in Example 4.1.
Example 4.1
Let us consider a diffusion process of the form (2) with the circular multiwell potential shown in Fig. 2. Choosing a temperature that is not high enough for the central potential barrier to be overcome easily, transitions between the wells typically happen in the vicinity of a one-dimensional reaction pathway, the unit circle. The number of dominant eigenfunctions, however, corresponds to the number of wells. Nevertheless, projecting the system onto the unit circle would retain the dominant time scales of the system, cf. Sect. . \(\triangle \)
4.1 Parametrization of Dominant Eigenfunctions
If the \( (d+1) \) dominant eigenfunctions do not depend fully on the phase space \( \mathbb {X}\), a lower-dimensional and ultimately easier to find reaction coordinate suffices for keeping the eigenvalue approximation error (12) small. It is easy to see that if there exists a function \( \xi :\mathbb {X}\rightarrow \mathbb {R}^k \) for some k so that the eigenfunctions \( \varphi \) are constant on the level sets of \( \xi \), i.e., there exist functions \( \tilde{\varphi }_i :\mathbb {R}^k \rightarrow \mathbb {R}\), \( i = 1, \ldots , d \), such that \( \varphi _i = \tilde{\varphi }_i \circ \xi \), then the projection error \( \Vert P_\xi ^\perp \varphi _i \Vert _{L^2_{\mu }} \) is zero. A quantitative generalization of this is the statement that if the eigenfunctions \( \varphi _i \) are almost constant on level sets of a \( \xi \), then the projection error is small.
Lemma 4.2
Proof
Remark 4.3
 (Q1) In which dynamical situations can we expect to find low-dimensional reaction coordinates?
 (Q2) How can we computationally exploit the properties of the dynamics to obtain reaction coordinates?
Definition 4.4
Remark 4.5
While it is natural to motivate \((\varepsilon ,r)\)-reducibility by the existence of reaction pathways in phase space, it is not strictly necessary. There exist stochastic systems without low-dimensional reaction pathways whose densities still quickly converge to a transition manifold in \(L^1\). Future work includes the identification of necessary and sufficient conditions for the existence of transition manifolds (see the first point in the Conclusion). We also further elaborate on the connection between reaction pathways and transition manifolds in Appendix B.
Remark 4.6
Note that we only need to evolve the system at hand for a moderate time \(t\ll t_\text {slow}\), which has to be merely sufficiently large to damp out the fast fluctuations in the metastable states. This will be an important point later, allowing for numerical tractability.
Next, we show that \((\varepsilon ,r)\)-reducibility implies that dominant eigenfunctions are almost constant on the level sets of \(\mathcal {Q}\).
Lemma 4.7
Proof
Assuming that the eigenfunctions are normalized (which we do from now on), i.e., \(\Vert \varphi _i\Vert _{L^2_{\mu }}=1\), and that \(\varepsilon \) is sufficiently small, Lemma 4.7 implies that the dominant eigenfunctions (i.e., \(\lambda _i\approx 1\)) are almost constant on the level sets of \(\mathcal {Q}\). This can now be used to show that the \(\varphi _i\) are not fully dependent on \(\mathbb {X}\), but only on the level sets of \(\mathcal {Q}\) (up to a small error), in a sense similar to Lemma 4.2.
Corollary 4.8
Proof
4.2 Embedding the Transition Manifold
In light of Corollary 4.8, one could say that \(\mathcal {Q}\) is an “\(\mathbb {M}\)-valued reaction coordinate.” However, as we have no access to \(\mathbb {M}\) so far, and an \(\mathbb {R}^k\)-valued reaction coordinate is more intuitive, we aim to obtain a more useful representation of the transition manifold through embedding it into a finite, possibly low-dimensional Euclidean space.
We will see that we have considerable freedom in the choice of the embedding mapping, even though the manifold \(\mathbb {M}\) is not known explicitly. (We only assumed that it exists.) To achieve this, we will use an infinite-dimensional variant of the weak Whitney embedding theorem (Sauer et al. 1991; Whitney 1936), which, roughly speaking, states that “almost every bounded linear map from \(L^1(\mathbb {X})\) to \(\mathbb {R}^{2r+1}\) will be one-to-one on \(\mathbb {M}\) and its image.” We first specify what we mean by “almost every” in the context of bounded linear maps, following the notions of Sauer et al. (1991).
Definition 4.9
(Prevalence) A Borel subset \(\mathbb {S}\) of a normed linear space \(\mathbb {V}\) is called prevalent if there is a finite-dimensional subspace \(\mathbb {E}\) of \(\mathbb {V}\) such that for each \(v\in \mathbb {V}\), \(v+e\) belongs to \(\mathbb {S}\) for (Lebesgue) almost every e in \(\mathbb {E}\).
As the infinitedimensional embedding theorem from Hunt and Kaloshin (1999) is applicable not only to smooth manifolds, but to arbitrary subsets \(\mathbb {A}\subset \mathbb {V}\) of fractal dimension, it uses the concepts of boxcovering dimension \(\dim _B(\mathbb {A})\) and thickness exponent \(\tau (\mathbb {A})\) from fractal geometry. Intuitively, \(\dim _B(\mathbb {A})\) describes the exponent of the growth rate in the number of boxes of decreasing side length that are needed to cover \(\mathbb {A}\), and \(\tau (\mathbb {A})\) describes how well \(\mathbb {A}\) can be approximated using only finitedimensional linear subspaces of \(\mathbb {V}\). As these concepts coincide with the traditional measure of dimensionality in our setting, we will not go into detail here and point to Hunt and Kaloshin (1999) for a precise definition.
The general infinite-dimensional embedding theorem reads:
Theorem 4.10
Note that (17) implies Hölder continuity of \(\mathcal {E}^{-1}\) on \(\mathcal {E}(\mathbb {A})\) and in particular that \(\mathcal {E}\) is one-to-one on \(\mathbb {A}\) and its image. Using that the box-counting dimension of a smooth r-dimensional manifold \(\mathbb {K}\) is simply r and that the thickness exponent is bounded from above by the box-counting dimension, thus \(0\le \tau (\mathbb {K}) \le r\), see Hunt and Kaloshin (1999), we get the following infinite-dimensional embedding theorem for smooth manifolds.
Corollary 4.11
Let \(\mathbb {V}\) be a Banach space, let \(\mathbb {K}\subset \mathbb {V}\) be a smooth manifold of dimension r, and let \(k>2r\). Then almost every (in the sense of prevalence) bounded linear function \(\mathcal {E}:\mathbb {V}\rightarrow \mathbb {R}^k\) is one-to-one on \(\mathbb {K}\) and its image in \(\mathbb {R}^k\).
Thus, since the transition manifold \(\mathbb {M}\) is assumed to be a smooth r-dimensional manifold in \(L^1(\mathbb {X})\), an arbitrarily chosen bounded linear map \(\mathcal {E} :L^1(\mathbb {X})\rightarrow \mathbb {R}^{2r+1}\) can be assumed to be one-to-one on \(\mathbb {M}\) and its image. In particular, \(\mathcal {E}(\mathbb {M})\) is again an r-dimensional manifold (although not necessarily smooth). With this insight, we can now construct a reaction coordinate in Euclidean space:
Corollary 4.12
Proof
Since \(\hat{\mathbb {M}} := \mathcal {E}(\mathbb {M})\) is an r-dimensional manifold, \(\xi \) is effectively an r-dimensional reaction coordinate. Thus, if the right-hand side of (19) is small, the \(\varphi _i\) are “almost parametrizable” by the r-dimensional reaction coordinate \(\xi \). Using Lemma 4.2, we immediately see that this results in a small projection error \(\Vert P_\xi ^\perp \varphi _i\Vert \), and due to Corollary 3.6 in a good transfer operator approximation; hence, \(\xi \) is a good reaction coordinate.
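As a finite-dimensional illustration of Corollary 4.11, the following sketch draws a random linear map into \(\mathbb {R}^{2r+1}\) and checks numerically that it separates the points of a sampled curve. The ambient space \(\mathbb {R}^{50}\), the curve, and all function names are assumptions made for this example only; a Gaussian random matrix stands in for an "arbitrarily chosen" bounded linear map, and a positive separation ratio on a finite sample is a plausibility check, not a proof of injectivity.

```python
import numpy as np


def random_linear_embedding(dim_in, r, rng):
    """Draw a random linear map R^dim_in -> R^(2r+1) with Gaussian entries."""
    return rng.standard_normal((2 * r + 1, dim_in))


def min_separation_ratio(points, embedded):
    """Smallest ratio |E(p) - E(q)| / |p - q| over all distinct sample pairs."""
    i, j = np.triu_indices(len(points), k=1)
    d_in = np.linalg.norm(points[i] - points[j], axis=1)
    d_out = np.linalg.norm(embedded[i] - embedded[j], axis=1)
    return float(np.min(d_out / d_in))


rng = np.random.default_rng(0)

# Hypothetical smooth curve (r = 1) in R^50, standing in for the manifold M.
s = np.linspace(0.0, 1.0, 200)
freqs = np.arange(1, 51)
curve = np.sin(np.outer(s, freqs)) / freqs

E = random_linear_embedding(dim_in=50, r=1, rng=rng)
embedded = curve @ E.T  # image of the curve in R^3

# A strictly positive ratio means no two sample points were mapped together.
ratio = min_separation_ratio(curve, embedded)
```

The ratio also gives a rough impression of how strongly the chosen map compresses distances on the sample, which is related to the conditioning of the embedding.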
Remark 4.13
The recent work of Dellnitz et al. (2016) uses similar embedding techniques to identify finite-dimensional objects in the state space of infinite-dimensional dynamical systems. They utilize the infinite-dimensional delay-embedding theorem of Robinson (2005), a generalization of the well-known Takens embedding theorem (Takens 1981), to compute finite-dimensional attractors of delay differential equations by established subdivision techniques (Dellnitz and Hohmann 1997).
4.3 Numerical Approximation of the Reaction Coordinate
1. How to choose the embedding \(\mathcal {E}\)?
2. How to deal with the fact that we do not know \(\mathcal {Q}\)?
The question now is how we can reduce \(\mathcal {E}\circ \overline{\mathcal {Q}}\) to an r-dimensional good reaction coordinate. Since we know from above that \(\xi = \mathcal {E}\circ \mathcal {Q}\) is a good reaction coordinate, let us see what would be needed to construct it.
The property of \(\xi \) that we want is that it is constant along level sets \(\mathbb {L}_z\) of \(\mathcal {Q}\), i.e., \(\xi \vert _{\mathbb {L}_z} = \mathrm {const}\) (because this implies that it is a good reaction coordinate, cf. Corollary 4.12). Hence, if we could identify \(\hat{\mathbb {M}}\) as an r-dimensional manifold in \(\mathbb {R}^{2r+1}\), we would project \(\mathcal {E}(\overline{\mathcal {Q}}(x))\) along \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {L}_z))\) onto \(\hat{\mathbb {M}}\) (assuming that \(\hat{\mathbb {M}}\) and \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {L}_z))\) intersect in \(\mathbb {R}^{2r+1}\)) to obtain \(\xi (x)\) as the resulting point (see Fig. 5, where we would project along the red line on the right). Unfortunately, we have no access to \(\mathcal {Q}\) and hence to its level sets \(\mathbb {L}_z\), not to mention that \(\hat{\mathbb {M}}\) and \(\mathcal {E}(\overline{\mathcal {Q}}(\mathbb {L}_z))\) need not intersect in \(\mathbb {R}^{2r+1}\). Thus, this strategy seems infeasible.
Next, we consider the set \(\tilde{\mathbb {L}}_{\hat{z}} := \mathcal {E}^{-1}(\hat{\mathbb {L}}_{\hat{z}}) \cap \overline{\mathcal {Q}}(\mathbb {X})\). It holds that \(\tilde{\mathbb {L}}_{\hat{z}} = \left\{ \overline{\mathcal {Q}}(x)\,\big \vert \,\overline{\xi }(x) = \Psi (\hat{z})\right\} \). Recall that \(\mathcal {E} :\mathbb {M} \rightarrow \hat{\mathbb {M}}\) is one-to-one; thus, \(\tilde{\mathbb {L}}_{\hat{z}}\) intersects \(\mathbb {M}\) in exactly one point. We define this one point as \(\mathcal {Q}(x)\), and thus, \(\mathcal {Q}\) is the projection onto \(\mathbb {M}\) along \(\tilde{\mathbb {L}}_{\hat{z}}\). We see that \(\mathcal {Q}\) is well defined and that \(\mathcal {Q}(x)=\mathcal {Q}(y) \Leftrightarrow \overline{\xi }(x) = \overline{\xi }(y)\).
At this point we assume that \(\mathcal {E}^{-1}\) is sufficiently well behaved in a neighborhood of \(\hat{\mathbb {M}}\) and that it does not “distort transversality” of intersections, such that the diameter of \(\tilde{\mathbb {L}}_{\hat{z}}\) is \(\mathcal {O}(\varepsilon )\) with a moderate constant in \(\mathcal {O}(\cdot )\). We will investigate a formal justification of this assumption in future work; here we assume it holds true, and we will see in the numerical experiments that the assumption is justified. This assumption implies that \(\Vert \overline{\mathcal {Q}}(x) - \overline{\mathcal {Q}}(y)\Vert _{L^2_{1/\mu }} = \mathcal {O}(\varepsilon )\) for \(\mathcal {Q}(x) = \mathcal {Q}(y)\), i.e., for \(\overline{\xi }(x) = \overline{\xi }(y)\). Now, however, Lemma 4.7 implies that \(\varphi _i\) is almost constant (up to an error \(\mathcal {O}(\varepsilon )\)) on level sets of \(\overline{\xi }\), which, in turn, by Lemma 4.2 and Corollary 3.6 shows that \(\overline{\xi }\) is a good reaction coordinate.
4.4 Identification of \(\hat{\mathbb {M}}\) Through Manifold Learning
In this section, we describe how to identify \(\hat{\mathbb {M}}\) numerically. The task is as follows: Given that we have computed \(\mathcal {E}(\overline{\mathcal {Q}}(x_i)) = {\hat{z}}_i \in \mathbb {R}^{2r+1}\) for a number of sample points \(\{x_i\}_{i=1}^{\ell } \subset \mathbb {X}\), we would like to identify the r-dimensional manifold \(\hat{\mathbb {M}}\), noting that the points \(\mathcal {E}(\overline{\mathcal {Q}}(x_i))\) lie in an \(\mathcal {O}(\varepsilon )\)-neighborhood of \(\hat{\mathbb {M}}\) (see Sect. 4.3). Additionally, we would like an r-dimensional coordinate function \(\Psi :\mathbb {R}^{2r+1} \rightarrow \mathbb {R}^r\) that parametrizes \(\hat{\mathbb {M}}\) (so that the level sets of \(\Psi \) are transversal to \(\hat{\mathbb {M}}\)).
This is a standard setting in which manifold learning algorithms can be applied. Many standard methods exist; we name multidimensional scaling (Kruskal 1964a, b), isomap (Tenenbaum et al. 2000), and diffusion maps (Coifman and Lafon 2006) as a few of the most prominent examples. Because of its favorable properties, we choose the diffusion maps algorithm here and summarize it briefly for our setting in what follows. For details, the reader is referred to Coifman and Lafon (2006), Nadler et al. (2006), Coifman et al. (2008) and Singer et al. (2009).
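For orientation, a minimal diffusion maps sketch in the spirit of Coifman and Lafon (2006) is given below. It is an illustrative implementation under assumptions, not the code used for the experiments: the Gaussian kernel, the density normalization with exponent 1, the bandwidth value, and the helix test data are all choices made for this example.

```python
import numpy as np


def diffusion_map(points, n_coords, bandwidth):
    """Return the leading nontrivial eigenvalues and diffusion coordinates."""
    # Pairwise squared Euclidean distances and Gaussian kernel.
    sq = np.sum((points[:, None, :] - points[None, :, :]) ** 2, axis=2)
    K = np.exp(-sq / bandwidth)

    # Density normalization (alpha = 1) to reduce the influence of sampling density.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)

    # Symmetric matrix conjugate to the row-stochastic matrix D^{-1} K.
    d = K.sum(axis=1)
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]

    # Right eigenvectors of D^{-1} K; the first one is constant and is skipped.
    psi = vecs / np.sqrt(d)[:, None]
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]


# Hypothetical point cloud: a one-dimensional helix segment in R^3 (r = 1).
theta = np.linspace(0.0, np.pi, 100)
helix = np.column_stack([np.cos(theta), np.sin(theta), 0.5 * theta])
eigvals, coords = diffusion_map(helix, n_coords=1, bandwidth=0.05)
```

In this toy setting the first diffusion coordinate varies essentially monotonically along the curve, which is the behavior one would want from \(\Psi \) on \(\hat{\mathbb {M}}\); the bandwidth typically has to be tuned to the sampling density.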
Remark 4.14
The diffusion maps algorithm will only reliably identify \(\hat{\mathbb {M}}\) based on the neighborhood relations between the embedded sample points \(\hat{z}_i\) if the points cover all parts of \(\hat{\mathbb {M}}\) sufficiently well. In particular, as \(p^t(x,\cdot )\) and thus \(\big (\mathcal {E}\circ \overline{\mathcal {Q}}\big )(x)\) vary strongly with x traversing the transition regions, a good coverage of those regions is required.
For the various low-dimensional academic examples in Sect. 5, this is ensured by choosing the \(x_i\) to be a dense grid of points in \(\mathbb {X}\). For the high-dimensional example in Sect. 5.2, the evaluation points are generated as a subsample from a long equilibrated trajectory, essentially sampling \(\mu \). Both of these ad hoc methods are likely to be inapplicable in realistic high-dimensional systems with very long equilibration times. However, as we mentioned in the Introduction, there exist multiple statistical and dynamical approaches to this common problem of quickly sampling the relevant parts of phase space, including the transition regions. Each of these sampling methods can be easily integrated into our proposed algorithm as a preprocessing step.
In addition, situations may occur where the a priori generation of evaluation points is not possible or desired. One of the final goals, and work currently in progress, is the construction of an accelerated integration scheme that generates significant evaluation points and their reaction coordinate value “on the fly.” This is related to the effective dynamics mentioned in the fifth point of the Conclusion. However, this also requires us to be able to evaluate the reaction coordinate at isolated points, independent of each other, and thus also necessitates the use of the above \(\overline{\overline{{\xi }}}\) instead of \(\overline{\xi }\).
5 Numerical Examples
1. Let \( x_i \), \( i = 1, \ldots , \ell \), be the points for which we would like to evaluate \( \overline{\xi } \). Here, we assume the points satisfy the requirements addressed in Remark 4.14.
2. Choose linearly independent functions \( \eta _j \in L^\infty (\mathbb {X}) \), \( j = 1, \ldots , 2r+1 \). The essential boundedness of the \( \eta _j \) is not necessary, but \( \eta _j(x) \) should not grow faster than a polynomial as \( \Vert x\Vert _2\rightarrow \infty \).
3. In each point \( x_i \), start M simulations of length t and estimate \(\mathcal {E}_j\big (\overline{\mathcal {Q}}(x_i)\big )\) using (23) and (24), to obtain the point \(\hat{z}_i\in \mathbb {R}^{2r+1}\). We discuss the appropriate choice of M and t in Sect. 5.1.
4. Apply the diffusion maps technique from Sect. 4.4 to the point cloud \( \{\hat{z}_i\}_{i=1}^{\ell } \), and obtain \(\Psi :\mathbb {R}^{2r+1}\rightarrow \mathbb {R}^r\), a parametrization of the point cloud in its r essential directions of variation.
5. By (27), we define the reaction coordinate as
$$\begin{aligned} \overline{\xi }:\,x_i \mapsto \Psi (\hat{z}_i). \end{aligned}$$ (28)
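Steps 1 to 3 above can be sketched as follows. This is a minimal example under several assumptions that are not taken from the text: the dynamics is taken to be an overdamped Langevin equation \(\mathrm {d}X_t = -\nabla V(X_t)\,\mathrm {d}t + \sqrt{2\beta ^{-1}}\,\mathrm {d}W_t\) discretized by the Euler–Maruyama scheme, the estimator referred to as (24) is assumed to be a plain sample mean of \(\eta _j\) over the end points of the M trajectories, and the potential, the observables `etas`, and all function names are hypothetical.

```python
import numpy as np


def grad_V(x):
    """Gradient of a hypothetical double-well potential V(x) = (x1^2 - 1)^2 + x2^2."""
    return np.stack([4.0 * x[..., 0] * (x[..., 0] ** 2 - 1.0), 2.0 * x[..., 1]], axis=-1)


def simulate_bursts(x0, M, t, dt, beta, rng):
    """End points of M independent Euler-Maruyama paths of length t started in x0."""
    x = np.tile(x0, (M, 1))
    for _ in range(int(round(t / dt))):
        noise = rng.standard_normal(x.shape)
        x = x - grad_V(x) * dt + np.sqrt(2.0 * dt / beta) * noise
    return x


def embed_point(x0, etas, M, t, dt, beta, rng):
    """Monte Carlo estimate of the embedded point z_hat for the start point x0."""
    end_points = simulate_bursts(x0, M, t, dt, beta, rng)
    return np.array([eta(end_points).mean() for eta in etas])


# Hypothetical observables for r = 1, i.e. 2r + 1 = 3 functions (not those of (29)).
etas = [lambda x: x[:, 0], lambda x: x[:, 1], lambda x: x[:, 0] * x[:, 1]]

rng = np.random.default_rng(1)
z_right = embed_point(np.array([1.0, 0.0]), etas, M=500, t=0.2, dt=0.01, beta=4.0, rng=rng)
z_left = embed_point(np.array([-1.0, 0.0]), etas, M=500, t=0.2, dt=0.01, beta=4.0, rng=rng)
```

Because every start point is processed independently, the loop over the \(x_i\) is trivially parallel; the resulting points \(\hat{z}_i\) would then be passed to the manifold learning step.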
In order to demonstrate the efficacy of our method, we compute the reaction coordinates for three representative problems, namely a simple curved double-well potential, a multi-well potential defined on a circle, both in low and high dimensions, and two slightly different quadruple-well potentials stressing the difference between a one- and a two-dimensional reaction coordinate.
5.1 Curved Double-Well Potential
1. Compute points \(\overline{\mathbb {X}} := \{\varvec{\Phi }_{(k\tau )}x_0\,\big \vert \,k=1,\ldots ,N\}\), a discrete trajectory with step size \(\tau \) of the full phase space dynamics that adequately samples the invariant density \(\varrho \).
2. Compute the reaction coordinate \(\overline{\xi }\) on the points \(\overline{\mathbb {X}}\).
3. Divide the neighborhood of \(\overline{\xi }(\overline{\mathbb {X}})\) into boxes or other suitable discretization elements \(\{\mathbb {A}_1,\ldots ,\mathbb {A}_N\}\) and sample the boxes from the trajectory, i.e., compute
$$\begin{aligned} \overline{\mathbb {X}}_i := \left\{ x\in \overline{\mathbb {X}}\,\big \vert \,\overline{\xi }(x) \in \mathbb {A}_i\right\} . \end{aligned}$$
4. Count the time-t transitions within \(\overline{\mathbb {X}}\) between the boxes (where t is a multiple of \(\tau \)), i.e., compute the matrix
$$\begin{aligned} \left( T^t_{\overline{\xi }}\right) _{ij} := \#\left\{ x\in \overline{\mathbb {X}}_i\,\big \vert \,\varvec{\Phi }_tx \in \overline{\mathbb {X}}_j\right\} . \end{aligned}$$
5. After row-normalization, the eigenvalues of \(T^t_{\overline{\xi }}\) approximate the point spectrum of \(\mathcal {T}^t_{\overline{\xi }}\).
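Steps 3 to 5 amount to a box discretization of the reduced transfer operator along a trajectory. The sketch below shows one way to do this for a one-dimensional reaction coordinate, assuming equally sized boxes between the smallest and largest observed value; the function name and the toy input sequence are illustrative assumptions, and the lag must be at least one trajectory step.

```python
import numpy as np


def reduced_transition_matrix(xi_traj, lag_steps, n_boxes):
    """Row-normalized transition counts between boxes in reaction coordinate space."""
    # Assign every trajectory point to one of n_boxes equal subintervals.
    edges = np.linspace(xi_traj.min(), xi_traj.max(), n_boxes + 1)
    box = np.clip(np.digitize(xi_traj, edges) - 1, 0, n_boxes - 1)

    # Count transitions box(x_k) -> box(x_{k + lag_steps}) along the trajectory.
    counts = np.zeros((n_boxes, n_boxes))
    np.add.at(counts, (box[:-lag_steps], box[lag_steps:]), 1.0)

    # Row-normalize; rows of boxes that were never left remain zero.
    rows = counts.sum(axis=1, keepdims=True)
    return np.divide(counts, rows, out=np.zeros_like(counts), where=rows > 0)


# Toy reaction coordinate values along a trajectory (illustrative data only).
xi_values = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0])
T = reduced_transition_matrix(xi_values, lag_steps=1, n_boxes=2)
eigenvalues = np.sort(np.linalg.eigvals(T).real)[::-1]
```

For a trajectory of realistic length, the leading entries of `eigenvalues` would be the quantities compared with the dominant spectrum of the full transfer operator.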
Remark 5.1
Note that the equilibrated trajectory \(\overline{\mathbb {X}}\) is typically unavailable for more complex systems. In practice, one would replace steps 1 and 2 by directly computing a reduced trajectory \(\overline{\mathbb {Z}}=\{z_1,\ldots ,z_N\}\subset \mathbb {R}^r\) whose statistics approximate those of \(\xi \big (\overline{\mathbb {X}}\big )\). The formulation of a reduced numerical integration scheme to realize this is work currently in progress (see the fifth point in the Conclusion).
For our example system, we compute \(\overline{\mathbb {X}}\) as a trajectory of \(N=10^6\) steps with step size \(\tau =10^{-2}\) using the Euler–Maruyama scheme. However, to reduce the numerical effort, \(\overline{\xi }\) is computed only on a subsample of \(\overline{\mathbb {X}}\) (\(10^4\) points) and extended to \(\overline{\mathbb {X}}\) by nearest-neighbor interpolation. On \(\overline{\mathbb {X}}\), the image of \(\overline{\xi }\) is contained in the interval \([-0.04,0.04]\), which we discretize into \(M=40\) subintervals of equal length. The spectrum of the full transfer operator \(\mathcal {T}^t\) was computed using the standard Ulam method over a \(40\times 30\) uniform box discretization of the domain \([-2,2]\times [-1,2]\). With the choice \(t=1\) for the lag time, the spectral gap is clearly visible.
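The nearest-neighbor extension of \(\overline{\xi }\) from the subsample to the full trajectory can be sketched as below. This brute-force version is only an illustration with hypothetical names; it processes the trajectory in chunks to bound memory use, and for large data sets a spatial search tree would usually be preferable.

```python
import numpy as np


def nearest_neighbor_extend(sub_points, sub_values, all_points, chunk=1000):
    """Assign to each point the value stored at its nearest subsample point."""
    out = np.empty(len(all_points), dtype=sub_values.dtype)
    for start in range(0, len(all_points), chunk):
        block = all_points[start:start + chunk]
        # Squared distances between the current block and all subsample points.
        d2 = ((block[:, None, :] - sub_points[None, :, :]) ** 2).sum(axis=2)
        out[start:start + chunk] = sub_values[np.argmin(d2, axis=1)]
    return out


# Tiny illustrative data: two subsample points with known coordinate values.
sub_points = np.array([[0.0, 0.0], [1.0, 1.0]])
sub_values = np.array([10.0, 20.0])
all_points = np.array([[0.1, 0.0], [0.9, 1.2], [0.4, 0.4]])
extended = nearest_neighbor_extend(sub_points, sub_values, all_points)
```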
5.2 Circular Potential
We again choose the inverse temperature \(\beta =0.5\) and perform the same analysis as in the previous subsection. For this system, a time scale gap between \(t_6 \approx 1.53\) and \(t_7 \approx 0.05\) can be found. We thus choose the intermediate time scale \( t = 0.1 \). Since we again expect a one-dimensional transition path, the three observables (29) are used for the embedding of \(\mathbb {M}\). We use the grid points of a \(40 \times 40\) grid, denoted again by \(\overline{\mathbb {X}}\), over the region \([-2,2]\times [-2,2]\) as our test points.
In ten dimensions, the computation of the reaction coordinate on all points of a regular grid is no longer possible due to the curse of dimensionality, and neither is the visualization of this grid. Instead, we compute \(\overline{\xi }\) on \(10^5\) points sampled from the invariant measure and plot only the first three coordinates. Let this point cloud be called \(\overline{\mathbb {X}}\).
5.3 Two Quadruple-Well Potentials
Our theory is based on the existence of an r-dimensional transition manifold \(\mathbb {M}\) in \(L^1(\mathbb {X})\) around which the transition probability functions concentrate. In Appendix B, we argue that the existence of an r-dimensional transition path suffices to ensure the existence of \(\mathbb {M}\). Here we illustrate how the existence of the transition path is reflected in the embedding procedure.
Both systems possess metastable sets around the four minima \((\pm 1,\pm 1)\), but \(V_1\) confines its dynamics outside of the metastable sets onto a one-dimensional transition path, whereas \(V_2\) does not impose such restrictions on the dynamics (see Fig. 17). For both potentials the time \(t=1\) lies inside the slow–fast time scale gap. Assuming a one-dimensional transition manifold (wrongfully for \(V_2\)), we use the three linear observables (29). A \(40\times 40\) grid on \([-2,2]\times [-2,2]\) is used as evaluation points for \(\overline{\xi }\). The embedding of these points by \(\mathcal {E}\circ \overline{\mathcal {Q}}\) is shown in Fig. 18. We observe a one-dimensional structure in the case of the “hilly” potential \(V_1\), whereas the embedded points of the “flat” potential \(V_2\) lie on a seemingly two-dimensional manifold. As these embeddings are approximately one-to-one with the respective transition manifolds \(\mathbb {M}\), we conclude that in the case of \(V_1\) the manifold \(\mathbb {M}\) must be one-dimensional, whereas for \(V_2\) it is two-dimensional.
6 Conclusion
(a) We developed a mathematical framework to characterize good reaction coordinates for stochastic dynamical systems showing metastable behavior, but no local separation of fast and slow time scales.
(b) We showed the existence of good low-dimensional reaction coordinates under certain dynamical assumptions on the system.
(c) We proposed an algorithmic approach to numerically identify good reaction coordinates and the associated low-dimensional transition manifold based on local evaluation of short trajectories of the system only.

A rigorous mathematical justification for the dynamical assumption in Definition 4.4 in terms of the potential V and the noise intensity \(\beta ^{-1}\) in (2) would be desirable. This seems to be a demanding task, as the interplay between the potential landscape and the thermal forcing is nontrivial. For \(\beta ^{-1}\rightarrow 0\) the problem can be handled by large deviation approaches; however, understanding increasing \(\beta ^{-1}\) is challenging: The strength of the noise increases, and additional transitions between metastable sets become more probable, as the barriers in the potential landscape become less significant, and thus the reaction coordinate may increase in dimension.

Also related to the previous point, the choice of the correct lag time t is crucial. If the lag time is chosen too small, the concentration of the transition densities near a low-dimensional manifold in \(L^1\) may not have happened yet, whereas a too large lag time severely increases the numerical expense. If no expert knowledge of a proper lag time t is available, it has to be identified in a preprocessing step, for example using Markov state model techniques (Bowman et al. 2014).

As discussed in the last part of Sect. 4.3 and in Fig. 6, we need the embedding \(\mathcal {E}\) not to distort transversality close to the transition manifold \(\mathbb {M}\) too much, such that the realized reaction coordinate \(\overline{\xi }\) is indeed a good one. Theoretical bounds shall be developed. This problem seems to be coupled with the problem of how to control the condition number of the embedding and its numerical realization.

The dimension r of the reaction coordinate may not be known in advance; hence, we need an algorithmic strategy to identify this on the fly. Fortunately, once the sampling has been made, the evaluation of the embedding mapping \(\mathcal {E}\), and finding intrinsic coordinates on the set of data points embedded in \(\mathbb {R}^k\), has a negligible numerical effort; hence, different embedding dimensions k can be probed via (21). Theorem 4.10 suggests that if the identified dimension of the reaction coordinate is smaller than k / 2, then a reaction coordinate of sufficient dimension is found.

To benefit from the dimensionality reduction of the reaction coordinate \(\xi \), the dynamics that generates the reduced transfer operator \(\mathcal {T}^t_\xi \) has to be described in closed form. We are planning to employ techniques based on the Kramers–Moyal expansion (Zhang et al. 2016) to again obtain an SDE for a stochastic process on \(\mathbb {R}^r\).

The embedding mapping \(\mathcal {E}\) is evaluated by Monte Carlo quadrature (24). Although Monte Carlo quadrature is known to have a convergence rate independent of the underlying dimension n of \(\mathbb {X}\), there is still an impact of the dimension on the practical accuracy. This we shall investigate as well.
Footnotes
1. We denote by \(L^q\) the space (equivalence class) of q-integrable functions with respect to the Lebesgue measure. \(L^q_{\nu }\) denotes the same space of functions, now integrable with respect to the measure \(\nu \).
2. The coarea formula holds for \(L^1\) functions, but \(L^2_{\mu }\subset L^1_{\mu }\), since \(\mu \) is a probability measure. (That is, it is finite.)
3.
4. In our examples, we used linear functions with great success.
5. In realistic, high-dimensional systems, the computation of the dominant eigenvalues using grid-based methods is likely infeasible. In these situations, the implied time scales have to be estimated, for example using standard Markov state model techniques (Bowman et al. 2014).
Notes
Acknowledgements
This research has been partially funded by Deutsche Forschungsgemeinschaft (DFG) through grant CRC 1114 “Scaling Cascades in Complex Systems,” Project B03 “Multilevel coarse graining of multiscale problems” and the Einstein Foundation Berlin (Einstein Center ECMath).
References
Baxter, J.R., Rosenthal, J.S.: Rates of convergence for everywhere-positive Markov chains. Stat. Probab. Lett. 22(4), 333–338 (1995)
Becker, N.B., Allen, R.J., ten Wolde, P.R.: Non-stationary forward flux sampling. J. Chem. Phys. 136(17), 174118 (2012)
Best, R.B., Hummer, G.: Coordinate-dependent diffusion in protein folding. Proc. Natl. Acad. Sci. 107(3), 1088–1093 (2010)
Bittracher, A., Koltai, P., Junge, O.: Pseudogenerators of spatial transfer operators. SIAM J. Appl. Dyn. Syst. 14(3), 1478–1517 (2015)
Bovier, A., Gayrard, V., Klein, M.: Metastability in reversible diffusion processes II. Precise asymptotics for small eigenvalues. J. Eur. Math. Soc. 7, 69–99 (2002)
Bowman, G.R., Pande, V.S., Noé, F. (eds): An Introduction to Markov State Models and Their Application to Long Timescale Molecular Simulation, volume 797 of Advances in Experimental Medicine and Biology. Springer, Berlin (2014)
Camacho, C.J., Thirumalai, D.: Kinetics and thermodynamics of folding in model proteins. Proc. Natl. Acad. Sci. 90(13), 6369–6372 (1993)
Chiavazzo, E., Coifman, R.R., Covino, R., Gear, C.W., Georgiou, A.S., Hummer, G., Kevrekidis, I.G.: iMapD: intrinsic map dynamics exploration for uncharted effective free energy landscapes. arXiv preprint arXiv:1701.01513 (2016)
Chodera, J.D., Noé, F.: Markov state models of biomolecular conformational dynamics. Curr. Opin. Struct. Biol. 25, 135–144 (2014)
Ciccotti, G., Kapral, R., Vanden-Eijnden, E.: Blue moon sampling, vectorial reaction coordinates, and unbiased constrained dynamics. ChemPhysChem 6(9), 1809–1814 (2005)
Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmon. Anal. 21(1), 5–30 (2006)
Coifman, R.R., Kevrekidis, I.G., Lafon, S., Maggioni, M., Nadler, B.: Diffusion maps, reduction coordinates, and low dimensional representation of stochastic systems. Multiscale Model. Simul. 7(2), 842–864 (2008)
Crosskey, M., Maggioni, M.: ATLAS: a geometric approach to learning high-dimensional stochastic systems near manifolds. Multiscale Model. Simul. 15(1), 110–156 (2017)
Darve, E., Rodríguez-Gómez, D., Pohorille, A.: Adaptive biasing force method for scalar and vector free energy calculations. J. Chem. Phys. 128(14), 144120 (2008)
Dellago, C., Bolhuis, P.G.: Transition path sampling and other advanced simulation techniques for rare events. In: Holm, C., Kremer, K. (eds.) Advanced Computer Simulation Approaches for Soft Matter Sciences III, pp. 167–233. Springer, Berlin (2009)
Dellnitz, M., Hohmann, A.: A subdivision algorithm for the computation of unstable manifolds and global attractors. Numer. Math. 75(3), 293–317 (1997)
Dellnitz, M., Junge, O.: On the approximation of complicated dynamical behavior. SIAM J. Numer. Anal. 36(2), 491–515 (1999)
Dellnitz, M., von Molo, M.H., Ziessler, A.: On the computation of attractors for delay differential equations. J. Comput. Dyn. 3(1), 93–112 (2016)
Djurdjevac, N., Sarich, M., Schütte, C.: Estimating the eigenvalue error of Markov state models. Multiscale Model. Simul. 10(1), 61–81 (2012)
Dsilva, C.J., Talmon, R., Gear, C.W., Coifman, R.R., Kevrekidis, I.G.: Data-driven reduction for a class of multiscale fast–slow stochastic dynamical systems. SIAM J. Appl. Dyn. Syst. 15(3), 1327–1351 (2016)
Du, R., Pande, V.S., Grosberg, A.Y., Tanaka, T., Shakhnovich, E.S.: On the transition coordinate for protein folding. J. Chem. Phys. 108(1), 334–350 (1998)
E, W., Engquist, B.: The heterogenous multiscale method. Commun. Math. Sci. 1(1), 87–132 (2003)
E, W., Vanden-Eijnden, E.: Towards a theory of transition paths. J. Stat. Phys. 123(3), 503–523 (2006)
E, W., Vanden-Eijnden, E.: Transition-path theory and path-finding algorithms for the study of rare events. Annu. Rev. Phys. Chem. 61(1), 391–420 (2010)
E, W., Ren, W., Vanden-Eijnden, E.: String method for the study of rare events. Phys. Rev. B 66(5), 052301 (2002)
Faradjian, A.K., Elber, R.: Computing time scales from reaction coordinates by milestoning. J. Chem. Phys. 120, 10880–10889 (2004)
Federer, H.: Geometric Measure Theory, vol. 1996. Springer, New York (1969)
Freidlin, M., Wentzell, A.D.: Random Perturbations of Dynamical Systems. Springer, New York (1998)
Froyland, G., Gottwald, G., Hammerlindl, A.: A computational method to extract macroscopic variables and their dynamics in multiscale systems. SIAM J. Appl. Dyn. Syst. 13(4), 1816–1846 (2014)
Froyland, G., Gottwald, G.A., Hammerlindl, A.: A trajectory-free framework for analysing multiscale systems. Phys. D Nonlinear Phenom. 328, 34–43 (2016)
Huisinga, W., Meyn, S., Schütte, C.: Phase transitions and metastability in Markovian and molecular systems. Ann. Appl. Probab. 14(1), 419–458 (2004)
Hunt, B., Kaloshin, V.: Regularity of embeddings of infinite-dimensional fractal sets into finite-dimensional spaces. Nonlinearity 12(5), 1263–1275 (1999)
Junge, O., Koltai, P.: Discretization of the Frobenius–Perron operator using a sparse Haar tensor basis: the sparse Ulam method. SIAM J. Numer. Anal. 47(5), 3464–3485 (2009)
Kevrekidis, I.G., Samaey, G.: Equation-free multiscale computation: algorithms and applications. Annu. Rev. Phys. Chem. 60(1), 321–344 (2009)
Klus, S., Koltai, P., Schütte, C.: On the numerical approximation of the Perron–Frobenius and Koopman operator. J. Comput. Dyn. 3(1), 51–79 (2016)
Klus, S., Nüske, F., Koltai, P., Wu, H., Kevrekidis, I., Schütte, C., Noé, F.: Data-driven model reduction and transfer operator approximation. ArXiv e-prints (2017)
Kruskal, J.B.: Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis. Psychometrika 29(1), 1–27 (1964a)
Kruskal, J.B.: Nonmetric multidimensional scaling: a numerical method. Psychometrika 29(2), 115–129 (1964b)
Kumar, S., Rosenberg, J.M., Bouzida, D., Swendsen, R.H., Kollman, P.A.: The weighted histogram analysis method for free-energy calculations on biomolecules. I. The method. J. Comput. Chem. 13(8), 1011–1021 (1992)
Laio, A., Gervasio, F.L.: Metadynamics: a method to simulate rare events and reconstruct the free energy in biophysics, chemistry and material science. Rep. Prog. Phys. 71(12), 126601 (2008)
Laio, A., Parrinello, M.: Escaping free-energy minima. Proc. Natl. Acad. Sci. 99(20), 12562–12566 (2002)
Legoll, F., Lelièvre, T.: Effective dynamics using conditional expectations. Nonlinearity 23(9), 2131 (2010)
Li, W., Ma, A.: Recent developments in methods for identifying reaction coordinates. Mol. Simul. 40(10–11), 784–793 (2014)
Lu, J., Vanden-Eijnden, E.: Exact dynamical coarse-graining without timescale separation. J. Chem. Phys. 141(4), 07B619_1 (2014)
Ma, A., Dinner, A.R.: Automatic method for identifying reaction coordinates in complex systems. J. Phys. Chem. B 109, 6769–6779 (2005)
Maragliano, L., Vanden-Eijnden, E.: A temperature accelerated method for sampling free energy and determining reaction pathways in rare events simulations. Chem. Phys. Lett. 426(1), 168–175 (2006)
Mattingly, J.C., Stuart, A.M.: Geometric ergodicity of some hypoelliptic diffusions for particle motions. Markov Process. Relat. Fields 8(2), 199–214 (2002)
Mattingly, J.C., Stuart, A.M., Higham, D.J.: Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stoch. Process. Appl. 101(2), 185–232 (2002)
McGibbon, R.T., Husic, B.E., Pande, V.S.: Identification of simple reaction coordinates from complex dynamics. J. Chem. Phys. 146(4), 044109 (2017)
Metzner, P., Schütte, C., Vanden-Eijnden, E.: Transition path theory for Markov jump processes. Multiscale Model. Simul. 7(3), 1192–1219 (2009)
Moroni, D., van Erp, T.S., Bolhuis, P.G.: Investigating rare events by transition interface sampling. Phys. A Stat. Mech. Appl. 340(1), 395–401 (2004)
Nadler, B., Lafon, S., Coifman, R.R., Kevrekidis, I.G.: Diffusion maps, spectral clustering and reaction coordinates of dynamical systems. Appl. Comput. Harmon. Anal. 21(1), 113–127 (2006)
Noé, F., Nüske, F.: A variational approach to modeling slow processes in stochastic dynamical systems. Multiscale Model. Simul. 11(2), 635–655 (2013)
Noé, F., Schütte, C., Vanden-Eijnden, E., Reich, L., Weikl, T.R.: Constructing the full ensemble of folding pathways from short off-equilibrium simulations. Proc. Natl. Acad. Sci. 106, 19011–19016 (2009)
Pande, V.S., Beauchamp, K., Bowman, G.R.: Everything you wanted to know about Markov state models but were afraid to ask. Methods 52(1), 99–105 (2010)
Pavliotis, G., Stuart, A.: Multiscale Methods: Averaging and Homogenization. Springer, Berlin (2008)
Pérez-Hernández, G., Paul, F., Giorgino, T., De Fabritiis, G., Noé, F.: Identification of slow molecular order parameters for Markov model construction. J. Chem. Phys. 139(1), 015102 (2013)
Pozun, Z.D., Hansen, K., Sheppard, D., Rupp, M., Müller, K.R., Henkelman, G.: Optimizing transition states via kernel-based machine learning. J. Chem. Phys. 136(17), 174101 (2012)
Ren, W., Vanden-Eijnden, E., Maragakis, P., E, W.: Transition pathways in complex systems: application of the finite-temperature string method to the alanine dipeptide. J. Chem. Phys. 123(13), 134109 (2005)
Robinson, J.C.: A topological delay embedding theorem for infinite-dimensional dynamical systems. Nonlinearity 18(5), 2135–2143 (2005)
 Sarich, M., Noé, F., Schütte, C.: On the approximation quality of Markov state models. Multiscale Model. Simul. 8(4), 1154–1177 (2010)MathSciNetCrossRefMATHGoogle Scholar
 Sauer, T., Yorke, J.A., Casdagli, M.: Embedology. J. Stat. Phys. 65(3), 579–616 (1991)MathSciNetCrossRefMATHGoogle Scholar
 Schervish, M.J., Carlin, B.P.: On the convergence of successive substitution sampling. J. Comput. Graph. Stat. 1(2), 111–127 (1992)MathSciNetGoogle Scholar
 Schütte, C., Sarich, M.: Metastability and Markov State Models in Molecular Dynamics: Modeling, Analysis. Algorithmic Approaches. Courant Lecture Notes in Mathematics, vol. 24. American Mathematical Society (2013)Google Scholar
 Schütte, C., Fischer, A., Huisinga, W., Deuflhard, P.: A direct approach to conformational dynamics based on hybrid Monte Carlo. J. Comput. Phys. 151(1), 146–168 (1999)MathSciNetCrossRefMATHGoogle Scholar
 Schütte, C., Noé, F., Lu, J., Sarich, M., VandenEijnden, E.: Markov state models based on milestoning. J. Chem. Phys. 134(20), 204105 (2011). doi: 10.1063/1.3590108
 Singer, A., Erban, R., Kevrekidis, I.G., Coifman, R.R.: Detecting intrinsic slow variables in stochastic dynamical systems by anisotropic diffusion maps. Proc. Natl. Acad. Sci. 106(38), 16090–160955 (2009)CrossRefGoogle Scholar
 Socci, N., Onuchic, J.N., Wolynes, P.G.: Diffusive dynamics of the reaction coordinate for protein folding funnels. J. Chem. Phys. 104(15), 5860–5868 (1996)
 Takens, F.: Detecting strange attractors in turbulence. In: Rand, D., Young, L.S. (eds.) Springer Lecture Notes in Mathematics, vol. 898, pp. 366–381. Springer, Berlin (1981)
 Tenenbaum, J.B., Silva, V.D., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000)
 Torrie, G.M., Valleau, J.P.: Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. J. Comput. Phys. 23(2), 187–199 (1977)
 Trefethen, L.N., Embree, M.: Spectra and Pseudospectra: The Behavior of Nonnormal Matrices and Operators. Princeton University Press, Princeton (2005)
 Vanden-Eijnden, E.: Transition path theory. In: Ferrario, M., Ciccotti, G., Binder, K. (eds.) Computer Simulations in Condensed Matter Systems: From Materials to Chemical Biology, vol. 1, pp. 453–493 (2006)
 Vanden-Eijnden, E.: On HMM-like integrators and projective integration methods for systems with multiple time scales. Commun. Math. Sci. 5(2), 495–505 (2007)
 Weber, M.: Meshless methods in conformation dynamics. Ph.D. thesis, FU Berlin (2006)
 Weber, M.: A subspace approach to molecular Markov state models via a new infinitesimal generator. Habilitation thesis (2012)
 Weber, M., Fackeldey, K., Schütte, C.: Set-free Markov state model building. J. Chem. Phys. 146, 124133 (2017)
 Whitney, H.: Differentiable manifolds. Ann. Math. 37(3), 645–680 (1936)
 Zhang, W., Schütte, C.: Reliable approximation of long relaxation timescales in molecular dynamics. Entropy 19, 367 (2017). doi: 10.3390/e19070367
 Zhang, W., Hartmann, C., Schütte, C.: Effective dynamics along given reaction coordinates, and reaction rate theory. Faraday Discuss. 195, 365–394 (2016)
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.