Sample Paths Estimates for Stochastic Fast-Slow Systems driven by Fractional Brownian Motion

We analyze the effect of additive fractional noise with Hurst parameter $H>\frac{1}{2}$ on fast-slow systems. Our strategy is based on sample paths estimates, similar to the approach by Berglund and Gentz in the Brownian motion case. Yet, the setting of fractional Brownian motion does not allow us to use the martingale methods from fast-slow systems with Brownian motion. We thoroughly investigate the case where the deterministic system permits a uniformly hyperbolic stable slow manifold. In this setting, we provide a neighborhood, tailored to the fast-slow structure of the system, that contains the process with high probability. We prove this assertion by providing exponential error estimates on the probability that the system leaves this neighborhood. We also illustrate our results in an example arising in climate modeling, where time-correlated noise processes have become of greater relevance recently.


Introduction
Fast-slow systems naturally arise in the modeling of several phenomena in natural sciences, when processes have widely differing rates [23,25,31]. The standard form of a fast-slow system of ordinary differential equations (ODEs) is given by dx ds = x = f (x, y, ε), dy ds = y = εg(x, y, ε), where x are the fast variables, y are the slow variables, ε > 0 is a small parameter, and f, g are sufficiently smooth vector fields; for a more detailed technical introduction regarding the analysis of (1) we refer to Section 2.1. Here we just point out the basic aspects from the modeling perspective. First, note that if ε = 0, then (1) becomes a parametrized set of ODEs, where the y-variables are parameters. Taking this viewpoint, all bifurcation problems [21,33] involving parameters naturally relate to fast-slow dynamics if the parameters vary slowly, which is often a natural assumption in applications. Second, in practice, we also want to couple many dynamical systems. The resulting large/complex system is often multiscale in time and space. For example, in the context of climate modeling [12,26] coupled processes can evolve on temporal scales of seconds up to millennial scales. Third, fast-slow systems are the core class of dynamical problems to understand singular perturbations [51], i.e., roughly speaking singular perturbations problems with small parameters are those, which degenerate in the limit of the small parameter into a different class of equations. Combining all these observations, it is not surprising that fast-slow systems have become an important tool in more theoretical as well as application-oriented parts of nonlinear dynamics [31].
However, when dealing with real life phenomena certain random influences have to be taken into account and quantified in a suitable way [16]. The most common stochastic process used to describe uncertainty is Brownian motion W = W t . One of its key features is the memoryless or Markov property, which means that the behavior of this process after a certain time T > 0 only depends on the situation at the current time T . In certain applications it may be desirable to model long-range dependencies and to take into account the evolution of the process up to time T . One of the most famous example is constituted by fractional Brownian motion (fBm) W H = W H t ; see [28] for its first use. A fBm is a centered stationary Gaussian processes parameterized by the so-called Hurst index/parameter H ∈ (0, 1). For H = 1/2 one recovers classical Brownian motion. However, for H ∈ (1/2, 1) and H ∈ (0, 1/2), fBm exhibits a totally different behavior compared to Brownian motion. Its increments are no longer independent, but positively correlated for H > 1/2 and negative correlated for H < 1/2. The Hurst index does not only influence the structure of the covariance but also the regularity of the trajectories. Fractional Brownian motion has been used to model a wide range of phenomena such as network traffic [49], stock prices and financial markets [35,47], activity of neurons [15,42], dynamics of the nerve growth [39], fluid dynamics [52], as well as various phenomena in geoscience [29,36,41]. However, the mathematical analysis of stochastic systems involving fBm is a very challenging task. Several well-known results for classical Brownian motion are not available. For instance, the distribution of the hitting time τ a of a level a is explicitly known for a Brownian motion, whereas for fBm, one has only an asymptotic statement, according to which P(τ a > t) = t −(1−H)+o (1) , as t goes to infinity, see [37]. Furthermore, since fBm is not a semi-martingale, Itô-calculus breaks down. Therefore, it is highly non-trivial to define an appropriate integral with respect to the fBm. This issue has been intensively investigated in the literature. There are numerous approaches that exploit the regularity of the trajectories of the fBm in order to develop a completely path-wise integration theory and to analyze differential equations. For more details, see [17,19,20,24,34] and the references specified therein. Furthermore, another ansatz employed to define stochastic integrals with respect to fBm relies on the stochastic calculus of variations (Malliavin calculus) developed in [10]. In summary, fBm is a natural candidate process to aim to improve our understanding of correlated stochastic dynamics.
Our objective here is to combine the study of fast-slow systems and fBm by starting to study stochastic differential equations of the form where we start with the case of additive noise for the fast variable(s) and assume there is a single regularly slowly-drifting variable y. For H = 1/2, i.e., for Brownian motion, there is a very detailed theory, how to analyze stochastic fast-slow systems [31]. One particular building block -initially developed by Berglund and Gentz -uses a sample paths viewpoint [4]. This approach has recently been extended to broader classes of spatial stochastic fast-slow systems [18] and it has found many successful applications; see e.g. [3,30,44,48]. Therefore, it is evident that one should also consider the case of correlated noise in the fast-slow setup [22,53].
Our key goal is to derive sample paths estimates for fast-slow systems driven by fBm with Hurst index H ∈ (1/2, 1). We restrict ourselves to the case of additive noise and establish the theory for the normally hyperbolic stable case. Due to the technical challenges mentioned above, we need to derive sharp estimates for the exit times for processes solving certain equations driven by fBm. Exploring various properties of general Gaussian processes, we propose two variants to obtain optimal sample paths estimates.
Then we are going to apply our theory to a climate model describing the North-Atlantic thermohaline circulation forced by fractional Brownian motion. In fact, it is well-established that just using white noise modelling in climate models can be insufficient. The simple reason is that neglecting spatial and temporal correlations does not represent the statistics of large classes of underlying climate measurement data including temperature time series [13,27], historical climate data [1,2,9], as well as large-scale simulation data [6]. In all these cases, an elegant way to model temporal correlations in climate science is fractional Brownian motion [1,9,27,46,54]. The reasoning to use a time correlated process can also be understood in climate dynamics in various intuitive ways. For example, in a larger-scale climate model, stochastic terms often represent unresolved degrees of freedom or small-scale fluctuations. If we consider the weather as a short-lived smaller scale effect in terms of the global long-term climate, then models for the latter must include noise with (positive) time correlations as weather patterns are positively correlated in time on short scales. Similarly, if the noise terms represent external forcing, such as input from another climate subsystem on a macro-scale, then also this input is likely to be correlated in time as there are internal correlations of the long-term behaviour of each larger-scale climate subsystem. In summary, this has motivated us to consider a model from climate dynamics as one possible key application for fast-slow dynamical systems with fractional Brownian motion. As mentioned above, in many other applications, fractional Brownian motion also naturally appears, so our modelling approach via fast-slow systems with with fBm is even more broadly applicable.
This work is structured as follows. In Section 2 we introduce basic notions from the theory of fast-slow systems and fractional Brownian motion. Furthermore, we state important estimates for the exit times of Gaussian processes which will be required later on. In Section 3, we generalize the theory of [4] by first deriving an attracting invariant manifold of the variance using the fast-slow structure of the system. Based on this manifold we define a region, where the linearization of the process is contained with high probability. In order to prove such statements, we first derive a suitable nonlocal Lyapunov-type equation for the covariance of the solution of a linear equation driven by fBm, the so-called fractional Ornstein-Uhlenbeck process. Thereafter we analyze two variants which entail sharp estimates for the exit times of this process. Furthermore, we consider more complicated dynamics and provide extensions of our results to the non-linear case, more complicated slow dynamics and finally discuss the case of fully coupled dynamics. We apply our theory to a model for the North-Atlantic thermohaline circulation and provide some simulations. Section 4 generalizes the sample paths estimates to higher dimensions in the autonomous linear case. Our strategy is based on diagonalization techniques, which allow us to go back to the one-dimensional case and apply the results developed in Section 3. For completeness, we provide an appendix which contains a detailed proof regarding the limit superior of a non-autonomous fractional Ornstein-Uhlenbeck processes. We conclude in Section 5 with an outlook of possible continuations of our results.

Deterministic Fast-Slow Systems
In this section, we will briefly introduce the terminology of fast-slow systems. We restrict ourselves to the most important results tailored to our problem in the upcoming sections. For further details, see [31]. For the definition of the setting, all of the equations are to be understood formally. We will later add regularity assumptions sufficient to deduce important results. These also imply that the formal computation we will have performed before are valid.
where x = x s , y = y s are the unknown functions of the fast time variable s, the vector fields are f : R m × R n × R → R m , g : R m × R n × R → R n , and ε > 0 is a small parameter. The x variables are called the fast variables, while y variables are called the slow variables. Transforming into another time scale by defining theslow time t = εs yields the equivalent system Depending on the situation both formulations in fast and slow time may be of use. In particular, under certain assumptions, considering them for ε → 0 indicates a lot of information for the underlying dynamics for the case 0 < ε 1. The process for ε → 0 is called the singular limit. The singular limit of (3) for ε → 0 is called the fast subsystem. The resulting system of the slow time formulation of the fast-slow system (4) for ε → 0 is called the slow subsystem. The set is called the critical set. If C 0 is a manifold, it is also called the critical manifold. From now on, we assume that C 0 is a manifold given by a graph of the slow variables, i.e., where D ⊂ R n is an open subset.
Theorem 2.2 [14,25,31,50]). Let f, g ∈ C r (R m × R n × R), 1 ≤ r < ∞, and their derivatives up to order r be uniformly bounded. Assume that C 0 is uniformly hyperbolic. Then for an ε 0 > 0 there exists a locally invariant C r -smooth manifold for all ε ∈ (0, ε 0 ], wherex(y, ε) = x * (y) + O(ε) with respect to the fast variables. Furthermore, the local stability properties of C ε are the same as the ones for C 0 .

Fractional Brownian Motion
In this section we state important properties of fBm, which will be required later on. For further details see [5,38] and the references specified therein. We fix a complete probability space (Ω, F, P) and use the abbreviation a.s. for almost surely.
for all t, s ≥ 0.
Note that for H > 1/2 the covariance of fBm satisfies We further observe that: where the integral kernel K is given by for a positive constant c H depending exclusively on the Hurst parameter.
We remark that for suitable square integrable kernels, one obtains different stochastic processes, for instance the multi-fractional Brownian motion or the Rosenblatt process, see [8]. We now focus on the most important properties of fBm. For the complete proofs of the following statements, see [ Proposition 2.6. Let (W H t ) t≥0 be a fBm of Hurst index H ∈ (0, 1). Then: i.e. fBm is self-similar with Hurst index H. Since fBm is not a semi-martingale, the standard Itô calculus is not applicable. Due to this reason, the construction of a stochastic integral of a random function with respect to fBm has been a challenging question, see [5,10] and the references specified therein. However, for deterministic integrands and for H > 1/2 the theory essentially simplifies. We deal exclusively with this case and indicate for the sake of completeness the theory of Wiener integrals of deterministic functions with respect to fBm, see [10]. Let T > 0 and

3) [Stationarity of increments] For all
Observe that I(h; T ) defines a Gaussian random variable with where The representation of the variance can be easily verified by noting the following identity Note that H > 1/2 is crucial here. For p > 1 H we can bound the L 2 (Ω)-norm of h → I(h; T ) as follows where we have obtained the estimate by applying Hölder's inequality and Young's inequality for convolutions [7,Theorem 3.9.4]. The boundedness claim now follows as φ 2 L p/(2p−2) (0,T ) < ∞ for p > 1 H . This means that I(·, T ) is a bounded linear operator defined on the dense subspace E ⊂ L p (0, T ), so it can be uniquely extended to a bounded operator This discussion justifies the following definition: The integral process I p (f 1 [0,t] ; T ) t∈[0,T ] is by construction centered Gaussian. Regarding (7), its covariance can be immediately computed as follows.

Stochastic Differential Equations Driven by Fractional Brownian Motion
After establishing a suitable stochastic integral with respect to the fractional Brownian motion, we consider stochastic differential equations (SDEs) given by: The solution satisfies the integral formulation where the stochastic integral was constructed in Section 2. In this work one case we have to consider is a time-dependent linear drift, i.e., b(t, ·) : R → R is linear with b(t, x) := A(t)x for every t ∈ [0, ∞) and x ∈ R. In this case, the solution of (9) is given by the variation of constants formula/Duhamel's formula and is called non-autonomous fractional Ornstein-Uhlenbeck process.
which satisfies the variation of constants formula (R2) Naturally, existence and uniqueness of SDEs in higher dimension carry over from Theorem 2.10 under the same assumptions respectively. In particular, for coefficients A, B : [0, ∞) → R m with m ≥ 1, satisfying the same assumptions as in Theorem 2.11, the solution of (10) is given by where Φ denotes the fundamental solution of x t = A(t)x t and (W H t ) t≥0 is an m-dimensional fractional Brownian motion.

Useful Estimates of Gaussian Processes
The fact that fBm is not a semi-martingale restricts the repository of known inequalities (such as Doob or Burkholder-Davies-Gundy) to establish sample paths estimates. A crucial property of fBm we shall exploit is its Gaussianity. In this section we will describe some useful estimates for exit times of certain Gaussian processes, which will be helpful for our analysis in the upcoming sections.
We first state the next auxiliary result regarding the Laplace transform of a Gaussian process. This was established in [11] by means of Malliavin calculus.
In addition, we require the following form of Chebychev's inequality. Proof. Under these assumptions we have Taking expectation in the above inequality yields the result.
Lemma 2.15. Let c > 0 and (Y t ) t≥0 be a centered Gaussian process with Y 0 = 0 satisfying the assumptions i)-iv) of Lemma 2.13. Then, for its exit time τ c := inf{r > 0 : Y r ≥ c}, the following estimate holds: .
The previous lemma established a Bernstein-type inequality solely relying on certain properties of the covariance function of Gaussian processes. Another useful estimate is given by [40,Theorem D.4], which is based on Slepian's Lemma [45].
be a centered Gaussian process with a.s. continuous trajectories. Assume that (Y t ) t∈[0,T ] is a.s. mean-square Hölder continuous, i.e. there are constants G and γ such that Then there exists a constant K : This estimate can be sharpened if we restrict ourselves to the interval of interest.
Then there exists a constant K :

The One-Dimensional Case
In this section, we investigate the dynamics of a planar stochastic fast-slow system driven by fractional Brownian motion (W H s ) s≥0 with Hurst parameter H > 1 2 : Its equivalent formulation in slow time, i.e. for t = εs is using the self-similarity of fBm (6). We are interested in the normally hyperbolic stable case and therefore make the following assumptions. 2) Critical manifold: There is an Under these assumptions, (12) has a unique global solution according to Theorem 2.10. Furthermore, the deterministic system, i.e., for σ = 0, given by has an asymptotically slow manifoldx(t, ε) = x * (t)+O(ε) for ε > 0 small enough due to Fenichel-Tikhonov (Theorem 2.2). We expect that, given small noise 0 < σ 1, the trajectories of (12) starting sufficiently close tox(0, ε) remain in a properly chosen neighborhood ofx(t, ε) for a long time with high probability. Our goal will be to make this idea rigorous by pursuing the following steps. We first linearize the system around the slow manifold to get an SDE describing the deviations induced by the noise. This helps us obtain a simple description of a suitable neighborhood by using the fast-slow structure inherited by the variance of the system. Then, using this neighborhood, we deduce sample paths estimates for the linear case starting on the slow manifold. To complete the discussion we generalize the result to the non-linear case starting sufficiently close to the slow manifold, that is, such that in the deterministic case solutions are still attracted by the slow manifold. This general strategy inspired by [4], where a similar system driven by Brownian motion (Hurst parameter H = 1 2 ) is analyzed. Yet, the several techniques used in [4] do not generalize to fBm.

The Linearized System
The deterministic system has an asymptotically stable slow manifoldx(t, ε) = x * (t) + O(ε) due to Fenichel-Tikhonov (Theorem 2.2). As already outlined, our first step is to examine the behavior of the linearized system aroundx(t, ε). For a solution (x t ) t∈I of (12) we set ξ t := x t −x(t, ε). Then (ξ) t∈I satisfies the equation where by Taylor's remainder theorem. Due to the uniform boundedness of the derivatives of f one can show that the O(ε)-term is negligible on finite time scales as ε → 0. Therefore, we restrict ourselves without loss of generality to the analysis of the linearization Examining the process starting on the slow manifold now corresponds to investigating the unique explicit solution of (14) for initial value ξ 0 = 0, which is given by the fractional Ornstein-Uhlenbeck process (recall Theorem 2.11) where α(t, u) := t u a(r)dr. In order to define a proper neighborhood, where the fractional Ornstein-Uhlenbeck process (ξ t ) t∈I is going to stay with high probability, we use the variance Var(ξ t ) as an indicator for the deviations at time t. According to Proposition 2.9, the variance is given by As we would like to see dynamics of t → Var(ξ t ), we rescale it by 1 σ 2 to get rid of the small parameter σ 1, which only changes the order of magnitude of the system. It turns out that t → w(t) inherits the fast-slow structure from the SDE, which yields a particularly simple approximation of the variance.
In particular, there is a (globally) asymptotically stable slow manifold of the system of the form In order to be able to take the singular limit ε → 0 and apply Fenichel-Tikhonov (Theorem 2.2) we need to prove sufficient regularity in ε = 0; continuous differentiability will be enough for the approximation of the slow manifold with the critical manifold up to order O(ε). To do this, rewrite the integral by substituting To see that the right hand side of (15) is continuously differentiable in ε = 0 it is sufficient to check it for the integral term which has an existing limit for ε → 0, because the exponential term goes to 0 faster than the polynomial term diverges. Now taking the singular limit ε → 0 gives the slow subsystem The critical manifold is hence given by Using integration by parts we can rewrite (2H − 1)Γ(2H − 1) = Γ(2H), so that the critical manifold can also be written as By Theorem 2.2, the ODE (15) has a solution of the form which is asymptotically stable due to Assumption 3.1 3. This stability property is even global in this case because the ODE (15) is linear.
As expected, the critical manifold depends on the Hurst parameter H. For H ∈ ( 1 2 , 1) we have HΓ(2H) ∈ ( 1 2 , 1). This means that the only possible structural change of the critical manifold under variation of H is induced by the factor which coincides with the slow subsystem we would obtain in the case of Brownian motion noise, which exactly corresponds to H = 1 2 . Remark 3.3. The proof of Proposition 3.2 only shows that ζ is C 1 in ε and C 1 in the time t. Depending on the properties of f, we expect ζ to even have higher regularity. However, this fact is not required in the following considerations. Proposition 3.2 already states that the slow manifold is a good indicator for the size of the set we are looking for as t → 1 σ 2 Var(ξ t ) (as a solution of (15) with initial datum w(0) = 0) is attracted by the slow manifold. In this particular case we can explicitly state the exponentially fast approach due to the structure of the linear equation where α(t) := α(t, 0). Even more is known about the properties of ζ. Due to the uniform boundedness assumption on f and F we get that the difference between ζ and F (t) 2 |a(t)| 2H HΓ(2H) is actually in uniform t. This implies that for ε small enough there are ζ + and ζ − such that The goal is now to prove that the stochastic process (ξ) t∈I is concentrated in sets of the form To get a better understanding of what to expect, note that the probability that ξ leaves B(h) at time t can be bounded by using the inequality P(X > c) ≤ exp − c 2 2Var(X) , which holds for any centered Gaussian random variable X. This further leads to Of course, the probability that (ξ) t∈I has exited . We will present a few approaches to estimate this probability in the following, using the inequalities we have established in Section 2.3. The increase of probability compared to (18) is simply indicated by the prefactor of exp − h 2 2σ 2 . Remark 3.4. It could also have been possible to define the neighborhood B(h) by considering the critical manifold w * , i.e. to define This will yield the same bounds on the exit times, which we will establish in the following because the difference between ζ(t) and w * (t) is only in O(ε), which is of the same order as the order we obtain by approximating with the slow manifold ζ anyways.

Variant 1
The first approach is based on the result on exit times of Gaussian processes with sufficiently regular and increasing covariance function, as stated in Lemma 2.15.
Theorem 3.5. Let t ∈ I. Then under Assumption 3.1 for any h > 0 the following estimate holds true for ε > 0 sufficiently small Proof. In order to apply the estimate given in Lemma 2.15 to our problem observe that ξ may not satisfy all the assumptions. First of all we need well-definedness of the Ornstein-Uhlenbeck process over the whole non-negative real line [0, ∞); this is guaranteed by Assumption 3.1.
In addition, we consider the process given by X t = e −α(t)/ε ξ t . Note that the event that ξ t exceeds a certain level c ≥ 0 corresponds exactly to the event of X t exceeding e −α(t)/ε c, that is Unfortunately this observation does not carry over to sup 0≤r≤t X r and sup 0≤r≤t ξ r . In fact, a priori we only have the relation {sup 0≤r≤t ξ r ≥ c} ⊂ {sup 0≤r≤t X r ≥ c}, which yields a too strong estimate in the end as we are increasing the variance by an exponentially increasing factor, while maintaining the same exit level! A way to overcome this is to partition the interval [0, t] to suitable subintervals [t i , t i+1 ) to obtain the relation This partition will also turn out to be useful to control the variance with the slow manifold in the exponential of the estimate. The covariance of (X t ) t∈I is given by By the theorem for differentiation of parameter dependent integrals we deduce that ∂ ∂t R(t, r) exists and is continuous in [0, ∞) × [0, ∞). Furthermore, ∂ ∂t R(t, r) ≥ 0 for all t, r ≥ 0 is an immediate consequence of the fundamental theorem of calculus as the integrand is always greater equal than 0. This already implies assumption (i) und (ii) of Lemma 2.15. For assumption (iii) it suffices to observe that is a Gaussian process with nonzero variance. Assumption (iv) follows by Corollary A.2. This implies that (e −α(t)/ε ξ t ) t∈I satisfies a Bernstein-type inequality due to Lemma 2.15 After having established this result, we can proceed as in the proof of Proposition 3.1.5 [4]. For γ ∈ (0, 1/2) let 0 = t 0 < t 1 < . . . < t N be a partition containing the interval [0, t] such that (Note that t N ≥ t is possible. But this only increases the estimate on the probability slightly. As we would like to optimize over γ in the end, it does not make sense to fix it to obtain t N = t.) where the last inequality follows byζ(r) = O(1), which is proven in Lemma 3.6. Now by subadditivity of the probability measure where the last inequality is due to e −2γ ≥ 1 − 2γ. Due to monotonicity of · , it suffices to minimize in order to find the minimal value of this estimate. Optimizing over γ hence yields which finishes the proof. Proof. Note that satisfies the corresponding invariance equation for the fast-slow ODE (15) up to error O(ε). This implies that ζ(t) = w * (t, ε) + O(ε). Plugging this representation of ζ into the ODE (15) yields directly εζ(t) = O(ε).

Variant 2
The second approach uses the fact that (ξ t ) t∈I is mean-square Hölder continuous. This is also going to enable to control the deviations based upon Theorem 2.16.
Proof. Let t > 0. In order to apply Theorem 2.16 we have to prove mean-square Hölder conti- Lipschitz continuity of r → e α(r,u)/ε for arbitrary u ≥ 0 (with Lipschitz constant L = a + ε , where a + := sup r∈I |a(r)|) yields for (20) Similarly we can show for (21) 2 Last but not least (22) can be estimated as follows By combining the three estimates we obtain that, for a constant G = G(t, ε, σ, H) > 0, it holds Let 0 = t 0 < t 1 < . . . < t N = t be a partition of the interval [0, t] such that For each i ∈ {0, . . . , N − 1} we have by Corollary 2.17 for K = K(G, 2H) where the last inequality follows byζ(r) = O(1), see Lemma 3.6. This yields by the subadditivity of the probability measure Therefore, the proof is finished.

Comparison of the two Variants
In this section, we will compare the two variants in view of varying the noise intensity given by σ > 0 and the time scale parameter ε > 0. In order to better understand what to expect under these variations we first heuristically describe their effect on the underlying SDE We can directly see that a smaller σ reduces the intensity of the fraction Brownian motion noise. In particular, as σ is decreasing, the probability that (ξ r ) r∈I exits B(h) on some interval [0, t] should become smaller. For a smaller ε the attraction towards the slow manifold becomes stronger, however also the noise intensity increases. We expect small deviations if σ ε H is sufficiently small. Suppose we are in the situation of Theorem 3.5 and Theorem 3.7. To simplify the comparison the results of both variant 1 and variant 2 where K = K(t, ε, σ, H) > 0, are displayed here. Unfortunately, we do not know the dependence of K on the other parameters, so we can only do a qualitative comparison up to some extent. By looking at the proof of Theorem 3.7 we guess that K is increasing in G, so that we assume for the forthcoming analysis that K is increasing in σ, decreasing in ε and increasing in t. In variant 1, we see the same interplay of σ and h as already observed in the analysis of (18) because the exponential dominates the linear term in the prefactor, and the same holds true for variant 2 as long as ε to remain relatively small), which is true in many applications. For ε → 0 the estimate in variant 1 becomes larger, whereas in variant 2 it does not seem to have a huge effect on the bound. However, the increase might be hidden in K. As the time t increases, it obviously becomes more likely that ξ t has already exited B(h) at least once. In variant 1 this increase is displayed linearly in t as r → a(r) is uniformly bounded and thus α(t) = Θ(t). Variant 2 shows an increase which is at least linear in t because K might be increasing in t as well. This means that in variant 1 we have to pick h large enough such that

Back to the Original System
Now that we have convinced ourselves that the most promising estimate is given in Theorem 3.5 it remains to generalize the result to different scenarios which may be of interest.

The Nonlinear Case
Recall that we have rewritten the SDE (13) satisfied by the deviations around the slow manifold with a being the linear drift term and b containing the (possible) nonlinearities of the equation satisfying |b(x, t, ε)| ≤ M |x| 2 . As we expect that the nonlinear term b does not influence the deviations too strongly near the critical manifold, we use the same neighborhood B(h). This in particular implies that we keep the same simple description of it, which we have derived in Proposition 3.2. The bound on b will help us to control it inside of B(h). For the case that (x t ) t∈I is starting on the slow manifold, i.e. ξ 0 = 0 we obtain: Proof. As previously motivated before we treat the nonlinear drift term b as perturbation of the linear system, i.e. split the solution of (13) where ξ 0 t is a solution to the linear system, which we have already studied in detail, and It remains to prove that P 1 (h 1 ) is small. Observe that due to continuity of (ξ t ) t∈I we have for This enables us to control |ξ 1 r | √ Remark 3.9. With this approach we lose some accuracy (κ = 1 − O(h) instead of κ = 1) in the exponential. This has more effect on the increase of probability than in the prefactor. To overcome this difficulty it might be better to adapt the neighborhood depending on the nonlinearities.

Behavior Close to the Slow Manifold
For the deterministic system we get in the case of a uniformly asymptotically stable slow manifold that solutions starting close to it are attracted exponentially fast. Given low enough noise intensity a similar behavior can be observed in the noisy system, i.e., solutions have small deviations around the deterministic solution and after some (small) time t 0 we can again observe small deviations around the slow manifold.
Theorem 3.10. Let t > 0, d > 0. There is δ > 0 and some time t 0 > 0 such that the solutions (x t ) t∈I of (12) with initial condition x 0 satisfying |x 0 −x(0, ε)| < δ are attracted by the slow manifold. That is, up to time t 0 the solution (x t ) t∈I is close to the deterministic solution x det where the˜denotes the different values due to linearization around x det instead of the slow manifoldx(t, ε). After t 0 we obtain almost the same behavior as in the case where Proof. Exponentially fast attraction means that there are constants δ, C, κ > 0 such that for Consider an initial value x 0 ∈ R with |x 0 − x(0, ε)| < δ and denote by x det the solution to the deterministic system (σ = 0) in (12). Instead of linearizing (12) aroundx(t, ε), like we did in (13), we linearize it around x det . This procedure yields qualitatively the same linearization, with a(t) := ∂ x f (x det t , t, 0) instead of a(t, ε), or respectively a(t), see discussion before, andζ adapted accordingly. In particular, even for the nonlinear case we obtain by Theorem 3.8 whereα(t) = t 0ã (r)dr and κ = 1 − O(h). Choose t 0 such that for distance d that is t 0 ≥ ε κ ln Cδ d . Furthermore, we have by the mean value theorem We want this distance to be of order at most ε, so that in total t 0 ≥ max ε κ ln Cδ d , ε κ ln CM δ ε . Then, up to time t 0 , we can use the estimate above for (x t ) t∈I close to its deterministic solution. And after t 0 the process is already close to the slow manifold and its dynamics, so it makes sense to look at the deviations aroundx(r, ε).
which finishes the proof.

More Complicated Slow Dynamics
So far, we have considered the case where the slow dynamics is completely uniform and regular, i.e., dy t = 1dt.
However, in applications many interesting systems contain more complicated slow variables. In fact, this is particularly relevant if one wants to reduce the dynamics to the slow manifold. The reduced equation usually qualitatively describes the dynamics of the slow variables around the slow manifold quite well. This section will clarify that the theory developed so far can be extended to more complicated slow dynamics in two steps. We first generalize our result to deterministic slow dynamics, which may also influence the diffusion term and then consider a fully coupled system.

Deterministic Slow Dynamics
Hence, we consider systems of the form We have to adapt the assumptions a bit. Under these assumptions the system (23) has an attracting slow manifold wherex(y, ε) = x * (y) + O(ε) due to Theorem 2.2 (Fenichel-Tikhonov). (Again, this O(ε) is uniform in y.) We linearize the fast variable around C ε . For a solution (x, y) of (23) set ξ t = x t −x(y t , ε), then (ξ, y) satisfies the equation where due to Taylor's remainder theorem Now we can proceed as in the case for trivial slow dynamics by first considering solutions starting on the slow manifold, i.e. (ξ 0 , y 0 ) = (0, y 0 ), y 0 ∈ D 0 , and using the terms a(t) := a(y t , 0), This way we obtain the same qualitative bound (also for the nonlinear case) as before, which also coincides with intuition, as more involved dynamics on the slow manifold should not influence the attracting behavior of it.

Fully Coupled Dynamics
Now that we have seen the idea how to generalize to more complicated dynamics we give an exposition of the more general case, where the slow variables may even be random, particularly be perturbed by fBms with different Hurst parameters H 1 , H 2 ∈ ( 1 2 , 1). We consider the following system which will turn out to be interesting in applications, see Section 3.4. The following assumptions will suffice to obtain a qualitatively similar result to the one-dimensional case, analyzed in Section 3. 2. Critical manifold: There is an x * : D 0 → R for D 0 ⊂ π 2 (D) open such that is a critical manifold of the system (25). Here π 2 is the projection onto the second coordinate. Remark 3.13. Note that the theory discussed in Section 2.2.1 does not provide the technical details regarding the existence and uniqueness of solutions for coupled systems driven by fractional Brownian motion. However, this can be extended to systems of the form (25). We refer to [38,Theorem 3.3] for further details.
Similarly as before the system (25) has an attracting slow manifold given by wherex(y, ε) = x * (y) + O(ε) due to Theorem 2.2, where again the O(ε)-term is uniform in y.
The strategy to establish sample paths estimates for (25) is to successively linearize both fast and slow variables around their deterministic counterpart (i.e. the solution for σ 1 = σ 2 = 0) denoted by y det t ,x(y det t , ε). The deviations are then described by They satisfy the following SDE, whose form is obtained by successively applying Taylor's theorem (int. always stands for the appropriate intermediate value) In particular the nonlinearity term satisfies for some M 2 ≥ 0 (again uniform in the variables) |R(x, y, t, ε)| ≤ M 2 (x 2 + y 2 + |xy|).
Then we partition the event of x t in the following way Note that the first probability is of the form By the same technique used to prove Theorem 3.8 we get which is valid as long as hζ . This issue is however tightly linked to investigating the behavior of non-stable or even nonhyperbolic dynamics under fractional noise because we have no additional assumptions on the slow dynamics. In the event {τ η,h ≤ t} we conjecture that a reduction to the slow variables should be possible. The reduced equation is then given by dy t = g(x(t, ε), y t , ε)dt + σ 2 dW H 2 t , which will be illustrated by the simulations presented at the end of Section 3.4.

Example
We consider the climate model analyzed in [4, Section 6.2.1]. It is a simple model describing the difference of temperature ∆T = T 1 − T 2 and salinity ∆S = S 1 − S 2 between low latitude (T 1 , S 1 ) and high latitude (T 2 , S 2 ) by a system of coupled differential equations Here τ r stands for the relaxation time of ∆T to its reference value θ, F is the freshwater flux, H the depth of the ocean, S 0 a reference salinity. Furthermore, τ d is the diffusion timescale, q the Poiseuille transport coefficient and V the volume of the box the system is contained in. The influence of external sources, internal fluctuations, and/or microscopic effects can be incorporated into the model via noise terms. For example, daily weather variations certainly influence the temperature T and salinity S. Yet, a precise/detailed modelling of these terms would be far too expensive computationally and would make the model intractable analytically. We know, as discussed in the introduction to this work, that using white noise generally does not represent temperature fluctuations correctly but these are usually positively correlated. Since we have no further basic knowledge of the stochastic process it is quite natural to start by considering fBm with Hurst index H > 1/2. This allows us to model Gaussianity and positive correlations in time. After transforming our model into dimensionless variables x = ∆T /θ, y = α S ∆S/(α T θ), rescaling time by τ d and taking into consideration fractional noise with Hurst parameter H > 1 2 , this yields the system where ε = τ r /τ d , η 2 = τ d (α T θ) 2 q/V , and µ = F · α S S 0 τ d /(α T θH). Note that the previous system is of the form (25). We consider the solution on a bounded time interval [0, T ], T > 0 to ensure the uniform boundedness of the corresponding functions, as imposed in Assumption 3.12.
The slow subsystem of the deterministic system is given by In particular, it has a normally hyperbolic critical manifold, namely which is even stable, as d dx (−(x − 1)) = −1. By Theorem 2.2 there exists an invariant manifold In order to apply the estimate from Theorem 3.
Hence we have By Proposition 3.2 there is an attracting slow manifold for the variance of the form We conclude that, in the case that y t is deterministic, sample paths starting on the slow manifold x(y, ε) are concentrated in the set or, more precisely, for 0 < t ≤ T and initial data (x 0 , y 0 ) = (x(y 0 , ε), y 0 ) Figure 1 indicates that for small enough noise the dynamics around the slow manifold should be governed by the equation In (28) η 2 is a fixed parameter, while µ is proportional to the freshwater flux. It can be hence treated as a slowly varying parameter compared to the rescaled salinity. By setting X := y and Y := µ we obtain another fast slow system subject to some noise where g ∈ C 2 (R 2 ; R). In particular, we can apply our theory again on the two stable branches of the new slow manifold. We simulate now the reduced equation, similarly to [  Process starting on (1.2, 2) and moving backward in time.
We also give a neighborhood for the second stable branch; σ 2 = 0.1, h = 3

The Multi-Dimensional Case
In this section, we make the first steps towards extending our theory to the multi-dimensional case. Note that we keep the same notation as for the one-dimensional objects. We start again with uniform slow dynamics and consider the fast-slow system system in slow time under the following assumptions. Let m ≥ 1.

Critical manifold:
There is an

Stability:
The critical manifold is asymptotically stable, i.e. the Jacobian matrix only contains eigenvalues with negative real part. In addition, its linearization is independent of time, i.e. A(t) ≡ A.
These assumptions guarantee the existence and uniqueness of (30) due to Remark 2.12(R2). Furthermore, recall that under these assumptions there is a slow manifold due to Fenichel-Tikhonov (Theorem 2.2). We start again by examining the behavior of the linearized system around C ε . For a solution (x t ) t∈I of (30) set ξ t := x t −x(t, ε), then (ξ t ) t∈I satisfies the equation where with · 2 being the operator norm with respect to the Euclidean norm. For simplicity, we analyze the linearization with A being the drift term instead of A + O(ε), i.e. we consider The solution for ξ 0 = 0 ((x t ) t∈I starting on the slow manifold C ε ) is given by Its covariance (matrix) can be computed as In the same way as in the one-dimensional case the rescaled covariance t → Ξ(t) inherits the fast-slow structure.
Proposition 4.2. The so-called renormalized covariance Ξ(t) satisfies the fast-slow ODE where In particular, there is an (even globally) asymptotically stable slow manifold of the system of the formX where Proof. We again differentiate t → Ξ(t) to obtain the ODE where In order to be able to take the singular limit ε → 0 and apply Fenichel-Tikhonov (Theorem 2.2) we need to prove at least one times continuous differentiability in ε = 0. To do this, rewrite This implies continuity in ε = 0. To see that the right hand side of (34) is continuously differentiable in ε = 0, it is sufficient to check it for the integral P (t) where the limit for ε → 0 exists because the exponential term dominates the polynomial term in ε. The slow subsystem hence reads where This is a Lyapunov equation, and according to Lemma 4.4 it has the unique solution By Fenichel-Tikhonov (Theorem 2.2) we conclude that there is an asymptotically stable manifold of the formX Note again that the stability property, which carries over from the critical manifold, is even global due to linearity of the ODE (34).
Remark 4.3. We need that the linearization (32) is autonomous in this section for taking the singular limit in (36). In the non-autonomous case we need to compute the limit of Φ(t, t − εv) In order to investigate a multi-dimensional Lyapunov-Equation, we rely on the following result, see [4,Lemma 5.1.2]. Note again that due to the linearity of the operator LX = AX +XA the rescaled covariance t → 1 σ 2 Cov(ξ t ) as solution of (34) with starting value 1 σ 2 Cov(ξ 0 ) = 0 satisfies the following equation which explicitly depicts the exponentially fast approach of the covariance towards the slow manifold, as it could have been already concluded by Fenichel-Tikhonov (Theorem 2.2). This justifies the choice of our neighborhood this time, depending on the critical manifold As already previously mentioned in Remark 3.4 choosing the neighborhood depending on the critical manifold instead of the slow manifold does not worsen our estimates. So we expect the same to be true in the higher dimensional case. Therefore, we have used the critical manifold X * (which is time-independent in our case) this time because our strategy depends on diagonalizing, and we do not spell out the additional technical details regarding the O(ε)-term.

No Restrictions on the Linearization
The proof of Theorem 3.7 can be immediately extended the multi-dimensional case by proving the mean-square Hölder continuity in each component of the covariance.
Lemma 4.5. Let t ≥ r 1 > r 2 ≥ 0, then there is a constant G = G(t, ε, σ, H) > 0 such that Proof. Let t ≥ r 1 > r 2 ≥ 0, then Since A only has eigenvalues with negative real part we have for r ≥ u ≥ 0 where a := max{|λ| : λ eigenvalue of A}. This enables us to prove the result similarly as in the one-dimensional case, i.e. a straightforward calculation using the last result now shows the required Hölder bounds by estimating (38)- (41).
The mean-square Hölder continuity of (ξ t ) t∈I implies the same for each component. Hence, we can establish the following qualitative result.
Theorem 4.6. Let t ∈ I. Then under Assumption 4.1 there is a constant K = K(t, ε, σ, H) > 0 such that for h > 0 the following estimate holds true for ε > 0 small enough where λ k ≥ 0 with m k=1 λ k = 1 and d * k denote the (time-independent) eigenvalues of X * .
Proof. Note that the critical manifold X * is symmetric and in the autonomous case it is time independent in addition. This implies that it is diagonalizable with respect to an orthogonal matrix O (independent of time). Let O = O 1 , . . . , O m , where O k denotes the k-th row of O and d * k (r) ≡ d * k be the corresponding eigenvalues. This enables us to reduce the problem to the estimate of the one-dimensional problem, using the notation D * (r) = diag(d * k ), We have already proven that (ξ t ) t∈I is mean-square Hölder continuous in Lemma 4.5, which directly implies the same property for the components in the O-coordinate system. This means that we can apply Theorem 2.16 for O k ξ. This leads to Now note that (37) written in the k-th component in the O-coordinate system reads as

This further implies
Summing over the dimensions yields and this finishes the proof.
In the case when A is normal we get a nice description of the d * k . The corresponding k-th component of the diagonalized critical manifold results in which simplifies even further if all eigenvalues are real

Symmetric Linearization
From now on, we consider the case when A is a symmetric matrix (i.e. A = A ). The reason for this restriction is that in the following the proves to bound the probability of (ξ t ) t∈I exiting the neighborhood up to time t is based on linearizing the underlying system and understanding the structure of the eigenvalues of the covariance. We actually require normality of A for this strategy to work as it is sufficient to use the functional equality of the matrix exponential. Furthermore, e A inherits the normality structure, which is a necessary and sufficient criterion to characterize the eigenvalues of e Aλ e A µ , λ, µ ≥ 0. To be able to generalize the result of variant 1 it is crucial that the eigenvalues of A are all real. These two criteria already imply that A is symmetric. In particular, we see that the critical manifold is of the form Thanks to the discussion above the we can consider the diagonalization D * (t) := U X * (t)U . Its k-th (k = 1, . . . , m) diagonal component is given by Similarly we can rewrite the covariance and diagonalize it with respect to the same U and consider its k-th (k = 1, . . . , m) component We obtain the following result Theorem 4.7. Let t ∈ I. Then under Assumption 4.1 and if A is in addition symmetric with real eigenvalues a 1 , . . . , a m there is a constant such that for h > 0 the following estimate holds for ε > 0 small enough where a + = max{|µ| : µ eigenvalue of A}.
Due to the normality of e A(t−u) (inherited by A) the Gaussian process U k ξ has variance In particular, we can show that the process e −a k t U k ξ t t≥0 satisfies the assumptions of Lemma 2.15, so that we can apply the Bernstein-type inequality. (The proof is completely analogous as in the one-dimensional case, see proof of Theorem 3.5.) To get a relation between the k-th value of the diagonalized covariance and the corresponding component of the critical manifold consider (37) in the U -coordinate system as d * (t) ≡ d * is actually independent of time in our case. Now, we can use the same strategy as variant 1 in the one dimensional case for each k. For γ ∈ (0, 1/2) let 0 = t 0 < t 1 < . . . < t N be a partition containing the interval [0, t] such that −a k (t i+1 − t i ) = εγ for 0 ≤ i < N = |a k t| εγ .
We start by estimating the probability of the exit time on [t i , t i+1 ) for i ∈ {0, . . . , N − 1} Applying Lemma 2.15 Taking the union of the events that U k ξ has exited B(h) in [t i , t i+1 ) and using the subadditivity of the probability measure, yields Finding the minimal bound with respect to γ now corresponds to optimizing due to the monotonicity of · . The optimal value is achieved for Plugging this in the estimate gives the bound for the k-th component Summing over the dimensions where a + := max{|µ| : µ eigenvalue of A}. The optimal value is now attained by choosing λ k = 1 m . This yields The proof is complete.
Due to the symmetry of A we could have diagonalized the SDE in the beginning (i.e. look at it in the U -coordinate system) and done the whole theory established in Chapter 2 to get the existence of a slow manifold ζ k (t) for Var(U k ξ t ), which is of the form However, we decided to use the results on the higher dimensional systems as much as possible to clearly indicate which steps of the proof can be generalized to more general classes of matrices beyond symmetric ones.

Outlook
This work provides a first step towards the investigation of fast-slow systems driven by fBm using sample paths estimates. So far we have examined the behavior close to a normally hyperbolic attracting invariant manifold in finite dimensions. Numerous extensions could be considered as next steps.
Having covered the uniformly attracting case, it is then natural to conjecture that there are scaling laws for the fluctuations as fast subsystem bifurcation points are approached, i.e., when hyperbolicity is lost. These results are available in the fast-slow Brownian motion case [30]. However, even when the fast dynamics is dominated by nonlinear terms [43] or one considers fast-slow maps with bounded noise [32] using modified proofs and additional technical tools it is possible to save many results. This robustness of the scaling laws near the loss of normal hyperbolicity leads one to conjecture that it will still be possible to prove such results for the fast-slow fBm case when H ∈ (1/2, 1).
However, the analysis of fast-slow systems for H ∈ (0, 1/2) is expected to be more complicated due to several reasons. First of all, a different integration theory has to be considered, see for instance [5,10]. Furthermore, the kernel (8) we have used to develop an approximation of the variance by means of the slow manifold has a non-integrable singularity for H < 1/2. Last but not least, Bernstein-type inequalities as established in Lemma 2.15 do not hold true anymore, since the covariance function of the fractional Brownian motion is negative. Consequently, one has to develop completely different techniques in this case. Another related extension would be to analyze the dynamics of fast-slow systems driven by multiplicative noise. This issue, however, requires a more general theory than Itô-calculus because the fractional Brownian motion is not a semi-martingale.
Furthermore, one could consider other stochastic processes with memory. More precisely, one could think of other stochastic processes whose covariance functions are represented by min{s,t} 0 K(s, r)K(t, r) dr, for s, t ≥ 0, for suitable square integrable kernels K, recall (5). Beyond fBm, further examples in this sense are the multi-fractional Brownian motion or the Rosenblatt process [8]. However, the analysis of fast-slow systems in this case is a challenging question, since these processes do not have in general stationary increments and are no longer Gaussian (as e.g. Rosenblatt processes).
Finally, one can also broaden the scope of the applications. Although climate dynamics is certainly a very important topic, where time-correlated noise is well-motivated by data such as temperature measurements, it is not the only possible application. Other areas, where fastslow systems with fBm could be considered are financial markets. For example, assets could be modelled as fast variables influenced by fBm stochastic forcing, while the slow variables are political/social factors influencing the market, which change on a much slower time scale in many cases. Similar remarks and examples of concrete applications are likely also exist in many contexts in neuroscience, ecology and epidemiology, where stochastic fast-slow systems with Brownian motion are already used frequently.

A On the Limit Superior of Gaussian Processes
The following proof has been developed in personal communication with Professor Andrey Dorogovtsev.
Theorem A.1. Let (Y t ) t≥0 be a centered Gaussian process with covariance function R(t, s) = E[Y t Y s ] satisfying 1. R(t, s) → 0 for t − s → ∞, 2. R(t, t) = 1 for all t ≥ 0. Proof. We aim to construct a sequence of independent random variables inductively to apply the Borel-Cantelli lemma. The strategy is highly based on the fact that for Gaussian random variables independence is equivalent to zero covariance. Let t 1 := 0. Given t 1 < · · · < t n we apply Gram-Schmidt orthogonalization to the variables Y t 1 , . . . , Y tn in L 2 (Ω, F, P) to obtain uncorrelated and normalized random variables S 1 , . . . , S n satisfying for each 1 ≤ i ≤ n where the coefficients a ij and b ij only depend on the on the values R(t i , t j ) for 1 ≤ i, j ≤ n by construction. Now for any t > t n we can project Y t on the space span 1≤j≤n {Y t j } = span 1≤j≤n {S j } Observe that by construction Y t is uncorrelated, and hence independent, of the the variables Y t 1 , . . . , Y tn . This holds in particular if we choose t n+1 ≥ max{t n , n + 1} with where the latter is possible because R(t, s) → 0 for t − s → ∞. Inductively we get for each n ∈ N Y tn = Y tn + γ n .
For the sequence of independent random variables (Y tn ) n∈N Now choose t 0 > s + T such that for all t ≥ t 0 Now, Theorem A.1 proves the claim.
Remark A.3. Note that one cannot directly apply classical probabilistic results such as the Borel-Cantelli Lemma, law of iterated logarithm or ergodic theorems to prove Corollary A.2, since the process ξ has neither stationary nor independent increments.