Homogenization of Coupled Fast-Slow Systems via Intermediate Stochastic Regularization

In this paper we study coupled fast-slow ordinary differential equations (ODEs) with small time scale separation parameter ε such that, for every fixed value of the slow variable, the fast dynamics are sufficiently chaotic with ergodic invariant measure. Convergence of the slow process to the solution of a homogenized stochastic differential equation (SDE) in the limit ε to zero, with explicit formulas for drift and diffusion coefficients, has so far only been obtained for the case that the fast dynamics evolve independently. In this paper we give sufficient conditions for the convergence of the first moments of the slow variable in the coupled case. Our proof is based upon a new method of stochastic regularization and functional-analytical techniques combined via a double limit procedure involving a zero-noise limit as well as considering ε to zero. We also give exact formulas for the drift and diffusion coefficients for the limiting SDE. As a main application of our theory, we study weakly-coupled systems, where the coupling only occurs in lower time scales.

In this paper we study multiscale ordinary differential equations (ODEs) with three separated time scales and fast chaotic dynamics: first, a fast time scale $O(\varepsilon^2)$ with nontrivial fast chaotic dynamics, on which the slow dynamics are practically in equilibrium; second, an intermediate time scale $O(\varepsilon)$, on which the fast dynamics have equilibrated; and finally a slow time scale $O(1)$ (the diffusive time scale). When the slow variables start to evolve under the influence of the fast dynamics, one observes induced fluctuations. In this setting, the method of reduction to a single slow equation is usually called homogenization. Common techniques to achieve the reduction include methods based upon partial differential equations (PDEs) via the Liouville or Fokker-Planck/Kolmogorov equations [10,37], techniques based upon semigroups [31], algorithmic approaches [22], as well as pathwise approaches via dynamical systems and probabilistic limit laws, which we will focus on. In recent years, Melbourne and co-workers [23,26,27,35] have obtained rigorous convergence results, with high generality and mild assumptions, for the slow process $x_\varepsilon$ within fast-slow systems of the form
$$\dot{x}_\varepsilon = a(x_\varepsilon, y_\varepsilon) + \varepsilon^{-1} b(x_\varepsilon, y_\varepsilon), \quad x_\varepsilon(0;\eta) = \xi \in \mathbb{R}^d \ \text{for all } \eta \in \Lambda \quad \text{(slow equation)}, \tag{1.1a}$$
$$\dot{y}_\varepsilon = \varepsilon^{-2} g(y_\varepsilon), \quad y_\varepsilon(0;\eta) = \eta \in \Lambda \subset \mathbb{R}^m \ \text{for all } \eta \in \Lambda \quad \text{(fast equation)}, \tag{1.1b}$$
where the vector fields $a, b : \mathbb{R}^d \times \mathbb{R}^m \to \mathbb{R}^d$ and $g : \mathbb{R}^m \to \mathbb{R}^m$ are $C^3$ and bounded with globally bounded derivatives. A main dynamical assumption is to require ergodicity for the fastest scale, i.e., the ODE $\dot{y} = g(y)$, $y \in \mathbb{R}^m$, generates a flow $\varphi_t : \mathbb{R}^m \to \mathbb{R}^m$ with a compact invariant set $\Lambda \subset \mathbb{R}^m$ and ergodic invariant probability measure $\mu$ supported on $\Lambda$. Another intrinsic part of this setup is the centering condition $\int_\Lambda b(x, y)\, d\mu(y) = 0$ for all $x \in \mathbb{R}^d$.
Systems of the form (1.1) are also called skew products, because they are not coupled; instead, the fast variables $y_\varepsilon$ can be described by a separate dynamical system on $\Lambda$. Further, we note that the initial condition $\eta$ is the only source of randomness in the system. Without particular mixing conditions on the flow $\varphi_t$, Kelly and Melbourne have shown [27] that for any finite $T > 0$ the slow process $x_\varepsilon$ converges weakly in $C([0, T], \mathbb{R}^d)$ to the solution $X$ of an Itô stochastic differential equation (SDE) of the form
$$dX = \tilde{a}(X)\, dt + \sigma(X)\, dW, \quad X(0) = \xi, \tag{1.2}$$
where $W$ is an $\mathbb{R}^d$-valued standard Brownian motion, $\sigma$ is a matrix-valued map and $\tilde{a}$ denotes a modified drift term. Mixing assumptions on the flow $\varphi_t$ are needed for more specific formulas for drift and diffusion coefficients. Although one might intuitively expect that fast chaotic noise may be approximated by a stochastic process, it is neither obvious which stochastic integral to consider nor how to prove the convergence to an SDE. The main difficulty lies in the fact that fast-slow systems are singular perturbation problems [30] as $\varepsilon \to 0$. Yet, as described above, there even exist exact formulas for the drift term $\tilde{a} : \mathbb{R}^d \to \mathbb{R}^d$ and the diffusion coefficient $\sigma : \mathbb{R}^d \to \mathbb{R}^{d \times d}$. However, the skew-product structure (1.1) is a severe practical restriction, since it is well-known that in most applications the fast and slow variables are coupled [30]. Our main goal in this paper is to study coupled deterministic fast-slow systems or, in other words, to generalize the study of systems of the form (1.1) by considering the case $g = g(x, y)$. Unlike skew products, coupled systems have barely been covered in the literature; to the best of our knowledge, the only results are those obtained by Dolgopyat in [15] for the discrete-time case. Informally speaking, we are going to prove that as $\varepsilon \to 0$, the solutions of the fast-slow ODE are well-approximated by an effective slow SDE; see Sect. 1.2 for precise statements.
Our strategy to achieve this result is to employ a double singular limit argument via an intermediate small-noise regularization, i.e., the idea is to pass to the stochastic level as early as possible in the proof and then use functional-analytic a-priori bounds to carry out both of the necessary limits. The proofs require limits of the integrals defining the coefficients, so that mixing assumptions have to be made; this is the price we pay to show such results for the coupled case.

Main Setup and Strategy for Coupled Systems
More precisely, in this paper we are interested in coupled fast-slow systems of the form
$$\dot{x}_\varepsilon = a(x_\varepsilon, y_\varepsilon) + \varepsilon^{-1} b(x_\varepsilon, y_\varepsilon), \quad x_\varepsilon(0;\eta) = \xi \in \mathbb{R}^d \ \text{for all } \eta \in \mathbb{T}^m \quad \text{(slow equation)}, \tag{1.3a}$$
$$\dot{y}_\varepsilon = \varepsilon^{-2} g(x_\varepsilon, y_\varepsilon), \quad y_\varepsilon(0;\eta) = \eta \in \Lambda \subset \mathbb{T}^m \ \text{for all } \eta \in \mathbb{T}^m \quad \text{(fast equation)}. \tag{1.3b}$$
Before we can provide our main results, we state several assumptions, which are supposed to hold:
(A1) The vector fields $a, b : \mathbb{R}^d \times \mathbb{T}^m \to \mathbb{R}^d$ are $C^3$ with globally bounded derivatives up to order one.
(A2) For every fixed $x \in \mathbb{R}^d$, when viewed as a parameter, the ODE $\dot{y} = g(x, y)$, $y \in \mathbb{T}^m$, generates a flow $\varphi^{0,t}_x : \mathbb{T}^m \to \mathbb{T}^m$ with a compact invariant set $\Lambda \subset \mathbb{T}^m$ and ergodic invariant probability measure $\mu^0_x$ supported on $\Lambda$. Furthermore, $g$ is $C^3$ with globally bounded derivatives up to order two.
(A3) For the function $b(x, \cdot) : \Lambda \to \mathbb{R}^d$, the following centering condition is satisfied:
$$\int_\Lambda b(x, y)\, d\mu^0_x(y) = 0 \quad \text{for all } x \in \mathbb{R}^d. \tag{1.4}$$
Due to the coupling, the argument used for skew products cannot be repeated (cf. Sect. 2.1) and we need a new ansatz. Our strategy is the following:
1. Instead of proving weak convergence of the slow process (as a measure in $C([0, 1], \mathbb{R}^d)$), we first try to prove a weaker form of convergence (e.g. convergence in distribution at any time).
2. We add small stochastic non-degenerate noise to the fast subsystem in order to use results on uniformly elliptic SDEs.
3. We let the noise in the stochastic system tend to zero and find the right limiting behaviour for the deterministic fast-slow system.
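The regularization step of this strategy can be illustrated numerically. The following is only a minimal sketch under loudly stated assumptions: the coefficients `a`, `b`, `g`, the scalar setting d = m = 1 with the circle R/Z as fast phase space, and the noise amplitude `sigma / eps` are hypothetical choices made for illustration and are not taken from the paper. For fixed x the toy fast dynamics is a rigid rotation, so Lebesgue measure is invariant for every noise level and `b` is centered, but the rotation is not chaotic and none of the mixing assumptions hold for it.

```python
import numpy as np

def simulate_regularized(eps, sigma, T=1.0, dt=1e-4, xi=0.5, eta=0.1, seed=0):
    """Euler-Maruyama sketch for a toy regularized fast-slow system
        dx = [a(x, y) + b(x, y) / eps] dt,
        dy = g(x, y) / eps**2 dt + (sigma / eps) dV,   y in R/Z.
    All coefficients and the noise scaling are illustrative assumptions."""
    rng = np.random.default_rng(seed)
    a = lambda x, y: -x
    b = lambda x, y: np.sin(2.0 * np.pi * y)    # zero mean w.r.t. Lebesgue measure
    g = lambda x, y: 1.0 + x**2 / (1.0 + x**2)  # depends on x: coupled case
    n = int(round(T / dt))
    xs = np.empty(n + 1)
    x, y = xi, eta
    xs[0] = x
    for k in range(n):
        dV = rng.normal(0.0, np.sqrt(dt))       # Brownian increment over one step
        x_next = x + (a(x, y) + b(x, y) / eps) * dt
        y = (y + g(x, y) / eps**2 * dt + sigma / eps * dV) % 1.0
        x = x_next
        xs[k + 1] = x
    return xs

# sigma = 0 recovers the deterministic system; sigma > 0 is the regularized one.
x_det = simulate_regularized(eps=0.1, sigma=0.0)
x_reg = simulate_regularized(eps=0.1, sigma=0.5)
print(x_det[-1], x_reg[-1])
```

The time step has to resolve the fastest scale, so `dt` should be chosen much smaller than `eps**2`.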
The main reason why we choose to work with stochastic systems as an intermediate step is that they provide a regularization: the infinitesimal generator for the semigroup of the associated Kolmogorov equation is uniformly elliptic. In particular, this case has been studied and weak convergence of the slow process has been rigorously proven. Such systems have the form [...] Here it is always assumed that $\delta > 0$, $V$ is an $m$-dimensional Brownian motion on a probability space $(\Omega, \mathcal{F}, \nu)$ and the SDE is to be understood as an integral equation, as usual, where $\frac{dV}{dt}$ denotes white noise viewed as the usual generalized stochastic process [2]. Further, let $\mathbb{E}$ denote the expectation with respect to the Wiener measure $\nu$. It is well-known that for a sufficiently smooth function $v$ : [...] satisfy the backward Kolmogorov equation [...] Here we use the notation $A : B = \operatorname{trace}(A^{\top} B) = \sum_{i,j} a_{ij} b_{ij}$ for the inner product of two matrices $A$ and $B$, $\nabla$ for the gradient and $\nabla\nabla$ for the Hessian matrix. Note that (see for example [38, Chapter 11]) the operator $L^\delta_1$ : [...] is uniformly elliptic and has, for every fixed $x \in \mathbb{R}^d$ viewed as a parameter, a one-dimensional null space. The null space is characterized by [...] where $C$ denotes the constant functions in $y$ and $\rho^\delta_\infty$ is the Lebesgue density of the measure $\mu^\delta_x$, i.e., $d\mu^\delta_x(y) = \rho^\delta_\infty(y; x)\, dy$. Assume additionally that the centering condition
$$\int_{\mathbb{T}^m} b(x, y)\, \rho^\delta_\infty(y; x)\, dy = 0 \tag{1.9}$$
is satisfied for all $x \in \mathbb{R}^d$ and $\delta > 0$. Then, due to the uniform ellipticity of $L^\delta_1$ for $\delta > 0$, applying the Fredholm alternative [38, Theorem 7.9] gives the existence of a unique centered solution $\Phi^\delta(y; x)$ of the so-called cell problem [...] (1.10) Using perturbation expansion techniques, which we will discuss in more detail in Sect. 2.3, it can be shown that $u^{\varepsilon,\delta}$ can be approximated by the leading order component $u^\delta_0$, which satisfies [...] where the operator $L^{0,\delta}$ acts on the twice continuously differentiable functions with compact support [...] where the coefficients $F^\delta$ and $A^\delta$ depend on the solution $\Phi^\delta$ of the cell problem (1.10) and are given by [...] (1.13) We are now ready to state our main theorems.

Main Results
In the following, let $(X_\varepsilon(t; \xi, \eta), Y_\varepsilon(t; \xi, \eta))$ denote the solution of the ODE (1.3) for any $\varepsilon > 0$ and let $C_0(\mathbb{R}^d)$ denote the space of continuous functions vanishing at infinity, i.e., as $\|x\| \to \infty$. Note that we still use the notation of Sect. 1.1. In addition we assume:
(A4) There exists a generator $L^{0,0}$ of a strongly continuous semigroup $T^{0,0}$ on $C_0(\mathbb{R}^d)$, [...] (1.14)
Theorem A Assume (A1)-(A4). Then, for every $f \in C_0(\mathbb{R}^d)$ and every sequence $\{\varepsilon_k\}_{k \geq 0}$ with $\varepsilon_k \to 0$ for $k \to \infty$, there exists a subsequence $\{\varepsilon_{k_m}\}_{m \geq 0}$ such that [...] where $\bar{T}$ is any finite time.
Theorem A provides a convergence result of the original fast-slow system, under sufficiently strong assumptions on the fast chaotic dynamics, to a Markov process, whose correspondence with a reduced slow SDE is specified below in the context of Theorem B (see (1.22)). The notion of convergence is to be understood in a weak averaged sense, but it does cover the coupled case. The proof of Theorem A is provided in Sect. 2.4. The second main result, Theorem B, gives sufficient conditions under which the main assumption (A4) in Theorem A is satisfied. Let us define the solution operator $\varphi^{\delta,t}_x(y)$ of the fast equation for $\varepsilon = 1$, solving, for a fixed $x \in \mathbb{R}^d$, the SDE [...] (1.15) Note that $\varphi^{\delta,t}_x(y)$ depends on a Brownian motion and, hence, is a stochastic process $\varphi^{\delta,t}_x(y)(\omega)$, $\omega \in \Omega$. Furthermore, notice that the flow $\varphi^{0,t}_x$ is purely deterministic.
Theorem B Assume that the unperturbed flow $\varphi^{0,t}_x$ has an ergodic invariant probability measure $\mu^0$ and summable stochastically stable decay of correlations $C(t; x)$ in the sense of Definitions 3.2 and 3.5. Additionally, assume that (A1)-(A2) are satisfied and suppose the following centering condition holds: [...] (1.16) Then we have the following:
1. In the case that $g = g(y)$ is independent of $x$, condition (A4) is satisfied.
2. In the general case that $g = g(x, y)$, (A4) holds provided that the centering condition [...] (1.17) and the growth assumption [...] (1.18) are satisfied (here, $\|\cdot\|_\alpha$ denotes the $\alpha$-Hölder norm for an $\alpha > 0$).
3. The operator $L^{0,0}$ can be written as [...] (1.19) where the diffusion coefficient $A^0$ is given by [...] (1.20) and the drift term $F^0$ is given by [...] (1.21)
Theorem B is proven at the end of Sect. 3. Note that the Markov process $X$ generated by $L^{0,0}$ is explicitly given by the SDE [...] (1.22) whose unique solvability is guaranteed by the smoothness and boundedness assumptions (A1), (A2). Moreover, the action of the semigroup $T^{0,0} f$ is given by $\mathbb{E}[f(X(t))]$. The growth assumption (1.18) is a strong mixing assumption on the flow and it remains to be determined precisely how large the class of functions satisfying this property is in applications (see remarks in Sect. 2.4). One possible way to weaken this assumption is to consider systems that are not coupled in the strongest possible sense, but for which the coupling occurs in smaller time scales. We refer to such systems as weakly-coupled and their general form is given by the following fast-slow ODE on [...] (1.23) Indeed, there are several examples of multiscale systems with interesting dynamical behaviour such as mixed-mode oscillations, where three time scales occur (see for example [12,28,29]). Furthermore, these three-scale systems are often very similar to related problems of van der Pol type, where rigorous proofs for chaos exist [25]. In the following, let $(X_\varepsilon(t; \xi, \eta), Y_\varepsilon(t; \xi, \eta))$ be the solution of the ODE (1.23).
In this case, the solution operator $\varphi^{\delta,t}$ for the fast dynamics of the stochastically perturbed system, given by [...] 2. The operator $L^{0,0}$ can be written as [...]
The proof of Theorem C is given with Theorem 4.1 below. Note once again that $T^{0,0}(t) f = \mathbb{E}[f(X(t))]$, where the Markov process $X$ is generated by $L^{0,0}$. Moreover, $X$ solves the SDE (1.22) (with modified drift $\tilde{F}^0$ instead of $F^0$). Basically, Theorem C states that we have the desired convergence, where the growth assumption on the correlation function is relaxed in the sense that weakly-coupled fast-slow systems behave more like the skew-product case. More precisely, for weakly-coupled systems of the form (1.23),
• with vanishing $h \equiv 0$ (i.e. with coupling occurring only in the lowest possible time scale), summable decay of correlations (DOC) is sufficient, provided that it is stochastically stable in the sense of Definition 3.5. There are plenty of examples of systems with summable DOC, including Anosov flows with exponential DOC, for instance geodesic flows on compact negatively curved surfaces [13] or contact Anosov flows [32], Axiom A flows with superpolynomial DOC (also called rapid mixing) [20], or non-hyperbolic flows with a stable $C^{1+\alpha}$ foliation, including some geometric Lorenz attractors [1]; see also Sect. 2.2. The assumption of stochastically stable DOC is crucial and, unfortunately, we so far lack any theory for proving that a given dynamical system satisfies this property. This may actually be difficult to prove and we leave it as an open problem for future research.
• with non-vanishing $h$, the correlation function must satisfy the stronger assumption (1.25).
In summary, our results cover an entire range from the more classical skew-product structure, via weak coupling, to strong coupling.

Remark 1.2
The explicit formulas for $A^0$ and $\tilde{F}^0$ for Theorem C are [...] (1.28)

Outline of the Paper
In Sect. 2 we first discuss the main idea of the proofs used in [26,27] for proving weak convergence of the slow process in skew product systems (Sect. 2.1) and we also summarize some progress, which has been achieved over the last years, in proving mixing properties of certain classes of flows (Sect. 2.2). We then recall and extend in Sect. 2.3 some basic facts required for stochastic systems. In Sect. 2.4, we prove Theorem A, which provides criteria to guarantee weak convergence of the slow process for coupled systems. In Sect. 3, we then prove Theorem B, which gives sufficient conditions for verifying the main assumption in Theorem A and provides explicit formulas for the drift and diffusion coefficients of the limiting Itô SDE. In Sect. 4 we apply our theory to weakly-coupled systems: we transfer the results obtained for coupled systems leading to the proof of Theorem C (Sect. 4.1) and, in addition, discuss a numerical example (Sect. 4.2). Finally, in Sect. 5 we state our conclusions and discuss open problems and directions for further research.

Main Idea Used in Previous Results
Before proving our main results, we briefly summarize the main idea used in [26] and [27] to study systems of the form (1.1). This provides suitable background for the reader and also shows that our approach to the problem works along a completely different route. The basic tool used in [26,27] is the so-called Weak Invariance Principle (WIP) and the idea of the proof can be illustrated very easily in the special case of multiplicative noise (considered in [26]), i.e., under the additional assumption that the vector field $b$ has a multiplicative structure [...] For simplicity, let us restrict in this section to the case that the vector field $a$ is also independent of $y$, i.e., $a = a(x)$. In this case the system (1.1) can be rewritten as [...] (2.2) where the family of random elements [...] The key observation now is that if the flow $\varphi_s$ is sufficiently chaotic, then the process $W_\varepsilon$ satisfies the WIP, which is a generalization of the Central Limit Theorem. Therefore, we are already tempted to conclude weak convergence of the slow process $X_\varepsilon$. The framework under which this intuitive idea has been rigorously justified is rough path theory [21]. Equation (2.2) can be interpreted as a rough differential equation [...] Noticing further, as shown in [26], that for any $\gamma > \tfrac{1}{3}$ an iterated WIP, i.e. [...] (2.5)
holds, one can conclude, due to continuity of the solution map of such rough differential equations [21] and the Continuous Mapping Theorem, the weak convergence of the slow process, i.e. a result of the form [...] (2.6) where $b(X) * dW$ is a certain kind of stochastic integral [26]. More general vector fields $b$ are considered in [27] and the main idea is to rewrite the system (1.1) in the form [...] where $V_\varepsilon$ and $W_\varepsilon$ are function space valued paths given by [...] In this context, the operators $F(x)$, $H(x)$ are interpreted as Dirac distributions located at $x$, that is $F(x)\phi = \phi(x)$ for any $\phi$ in the function space, and similarly for $H$. Under mixing assumptions the iterated WIP (2.5) holds and, as in the case of multiplicative noise, one can then conclude a result of the form (2.6). Exact formulas for the drift and diffusion coefficients are also given in [27]. In summary, the approach relies upon a pathwise viewpoint and continuity in the rough-path topology of solutions of ODEs/SDEs. Yet, this approach seems to be very difficult to generalize if the fast-slow system is fully coupled. In particular, this has motivated our approach to look for weaker convergence concepts in a more functional-analytic setting.

Rates of Mixing for Classes of Flows
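In the following, we briefly give an overview of rigorous results on mixing rates of certain classes of flows that thereby satisfy summable decay of correlations in the sense of Definition 3.2. Given a measure-preserving flow $\varphi_t : \Lambda \to \Lambda$, the correlation function of two observables $v, w \in L^2(\Lambda, \mu)$ is defined as
$$C_{v,w}(t) = \int_\Lambda v \, (w \circ \varphi_t) \, d\mu - \int_\Lambda v \, d\mu \int_\Lambda w \, d\mu$$
(see e.g. [34]).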
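In practice, correlation functions are often estimated from a single long orbit by replacing the integrals with time averages, which is justified for almost every initial condition when the invariant measure is ergodic. The following is a minimal sketch of such an estimator, assuming the standard definition of the correlation function (the integral of v times w composed with the flow, minus the product of the two means) and uniformly sampled observables; the function name and the test signal are hypothetical illustrations.

```python
import numpy as np

def empirical_correlation(v_series, w_series, max_lag):
    """Time-average estimate of the correlation function at lags 0..max_lag.

    v_series[j] and w_series[j] are the observables v and w evaluated along
    one orbit at equally spaced times j * h."""
    v = np.asarray(v_series, dtype=float)
    w = np.asarray(w_series, dtype=float)
    n = len(v)
    v_mean, w_mean = v.mean(), w.mean()
    corr = np.empty(max_lag + 1)
    for k in range(max_lag + 1):
        # average of v(y_s) * w(y_{s + k h}) over the orbit, minus product of means
        corr[k] = np.mean(v[: n - k] * w[k:]) - v_mean * w_mean
    return corr

# Rigid rotation on the circle: ergodic but not mixing, so correlations oscillate
# like 0.5 * cos(phi * k) instead of decaying.
phi = 0.1
theta = phi * np.arange(100000)
corr = empirical_correlation(np.sin(theta), np.sin(theta), max_lag=50)
print(corr[:3])
```

The rotation example shows why ergodicity alone is not enough for the results discussed here: its correlation function is not summable.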

Uniformly Hyperbolic Flows
Assume that the flow $\varphi_t : M \to M$ is $C^2$ and defined on a compact manifold $M$. An invariant compact set $\Lambda \subset M$ is a hyperbolic set for $\varphi_t$, provided that the tangent bundle over $\Lambda$ admits a continuous $D\varphi_t$-invariant splitting into uniformly contracting and expanding directions. For an Axiom A (uniformly hyperbolic) flow the dynamics can be reduced to finitely many hyperbolic sets $\Lambda_1, \ldots, \Lambda_k$, called hyperbolic basic sets, each of which contains a dense orbit. On every hyperbolic basic set $\Lambda = \Lambda_i$, for $i \in \{1, \ldots, k\}$, we can associate to every Hölder function on $\Lambda$ a unique invariant ergodic probability measure $\mu$. We can further categorize Axiom A flows depending on the speed of mixing. For example, for flows with exponential DOC, the correlation function $C_{v,w}(t)$ of two observables $v, w$, restricted to a suitable subspace of $L^2(\Lambda, \mu)$ (like, for example, an appropriate Hölder space), satisfies
$$|C_{v,w}(t)| \leq C e^{-\alpha t}$$
for constants $C, \alpha > 0$. This was proven, for example, for certain classes of Anosov flows (i.e. special types of Axiom A flows for which the whole set $M$ is uniformly hyperbolic) like geodesic flows on compact negatively curved surfaces [13] and contact Anosov flows [32]. Apart from exponential DOC we also have weaker notions, such as stretched exponential mixing, i.e.
$$|C_{v,w}(t)| \leq C e^{-\alpha t^{\beta}}$$
for some constant $0 < \beta \leq 1$, which was proven for a large class of Anosov flows in dimension 3 [11], and superpolynomial decay (or rapid mixing), i.e. for any $n > 0$ the correlation function satisfies
$$|C_{v,w}(t)| \leq C_n t^{-n},$$
or in other words, DOC at an arbitrary polynomial rate. Dolgopyat [14] proved rapid mixing for "typical" Axiom A flows. Moreover, he has shown that an open and dense set of Axiom A flows is rapid mixing, when restricted to sufficiently smooth observables [15]. For all mentioned classes of mixing flows, the correlation is summable, that is, we have $\int_0^\infty |C_{v,w}(t)| \, dt < \infty$.
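Whether a given decay rate is summable depends only on the tail of the bounding envelope. As a rough numerical illustration, the sketch below integrates a few hypothetical envelopes over [1, T] for growing T; the specific constants and exponents are assumptions chosen for illustration, and the lower limit 1 is used only to avoid the singularity of power laws at t = 0.

```python
import numpy as np

def partial_integral(envelope, T, n=200001):
    """Trapezoidal approximation of the integral of envelope(t) over [1, T]."""
    t = np.linspace(1.0, T, n)
    f = envelope(t)
    h = t[1] - t[0]
    return h * (np.sum(f) - 0.5 * (f[0] + f[-1]))

# Hypothetical envelopes for the decay of correlations.
envelopes = {
    "exponential, exp(-t)": lambda t: np.exp(-t),
    "stretched exponential, exp(-sqrt(t))": lambda t: np.exp(-np.sqrt(t)),
    "polynomial, t**-2": lambda t: t**-2.0,
    "polynomial, t**-1": lambda t: 1.0 / t,
}
for name, env in envelopes.items():
    values = [partial_integral(env, T) for T in (10.0, 100.0, 1000.0)]
    print(name, np.round(values, 4))
```

The first three partial integrals stabilize as T grows, while the last one keeps growing like log T, so a decay of order 1/t is not summable.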

Non-uniformly Hyperbolic Flows
Since the assumption of uniform hyperbolicity might be too restrictive for real applications, it is natural to seek a good mixing theory for non-uniformly hyperbolic flows. Over the last few years remarkable progress has been achieved in this area; see e.g. [34] and references therein for a good overview concerning results in this direction. For example, in [1], extending results from [4], exponential DOC is proven for a class of non-uniformly hyperbolic skew-product flows satisfying a uniform integrability condition, which contains an open set of geometric Lorenz attractors. Moreover, in [6], for certain types of Gibbs-Markov flows, including intermittent solenoidal flows and various Lorentz gas models, including the infinite horizon Lorentz gas, polynomial DOC of the correlation function, with a rate governed by a parameter $\beta > 1$, is proven. For such flows, the DOC is summable, provided that $\beta > 2$.

Basic Facts for Stochastic Systems
Let us now come back to the coupled systems (1.3). In the following we use the notation from Sect. 1.1. If we further consider the Banach space $X := (C_0(\mathbb{R}^d \times \mathbb{T}^m), \|\cdot\|_\infty)$ of continuous functions which vanish as $\|x\|_2 \to \infty$ for points $(x, y) \in \mathbb{R}^d \times \mathbb{T}^m$, with the usual supremum norm, it can be shown (cf. Lemma A.3 in the Appendix) that the closure $\bar{L}^\delta_1$ generates an ergodic strongly continuous contraction semigroup $\{S^\delta(t)\}_{t \geq 0}$ on $X$ (in the sense of Definition A.1) and $\bar{L}^{\varepsilon,\delta}$ generates a strongly continuous contraction semigroup on $X$ denoted by $\{T^{\varepsilon,\delta}(t)\}_{t \geq 0}$. Let $P^\delta$ be the projection corresponding to the ergodic semigroup produced by $L^\delta_1$, acting on $X$ explicitly via [...] The perturbation expansion leads, as shown for instance in [38] and [22] (cf. Sect. B in the Appendix for completeness), to the following equation for the leading order $u_0$: [...] (2.9) The operator $L^{0,\delta}$ acting on the right side of equation (2.9) can be evaluated more precisely, using the function $\Phi^\delta$ defined in (1.10). As shown in [38], equation (2.9) can be rewritten as [...] where the drift and diffusion coefficients are given by (1.13) and $L^{0,\delta} u^\delta_0$ is given by (1.11). The major disadvantage of the formulas (1.13) is that they use the solution $\Phi^\delta$ of the cell problem, which is not well-posed for $L^0_1$ or, in other words, in the case that we work with purely deterministic systems. However, there are also some alternative expressions, which are more suitable for deterministic systems and are already proven in [38], but which are for convenience included in the following Lemma 2.2, since we require some minor changes. The alternative expressions use the solution operator $\varphi^{\delta,t}_x(y)$ of the fast dynamics given by (1.15). Recall that $\mathbb{E}$ denotes the expectation with respect to the Wiener measure $\nu$ on $\Omega$ and further let $\mathbb{E}^{\mu_x \otimes \nu}$ denote the expectation with respect to the product measure $\mu^\delta_x \otimes \nu$, where $\mu^\delta_x$ is the ergodic measure defined in (1.8).
Lemma 2.1 (Differentiability of the solution operator with respect to x) There exists a version of the stochastic process $\varphi^{\delta,t}_x$ such that for almost all (a.a.) $\omega \in \Omega$ the function $x \mapsto \varphi^{\delta,t}_x$ is continuously differentiable for every $t$ and the differential $\nabla$ [...] $dZ_s$, and observe that all assumptions are satisfied since $g$ has bounded derivatives up to order two.

Lemma 2.2 (Alternative representations of the coefficients of the limiting SDE) Fix a δ > 0.
We have the following alternative formulas for the vector fields $F^\delta_0(x)$, $F^\delta_1(x)$ and the diffusion matrix $A^\delta_0(x)$ from equation (1.13): For all $y \in \mathbb{T}^m$ and for a.a. $\omega \in \Omega$ we have [...] and [...] and, if there exists a constant $D(t)$ such that [...] (2.14) then it also holds that [...]
Proof We follow the proof given in [38, Chapter 11]. We first calculate [...] Thus, using Fubini's theorem, [...] and by inserting into the expression for $A^\delta_0(x)$ we get that for a.a. $\omega \in \Omega$ equation (2.13) is satisfied. Analogously (noticing that condition (2.14) allows us to interchange the order of integration and the $\nabla_x$ operator), [...] By the chain rule we have that [...] Thus, setting [...]
Finally, let $(T^{0,\delta}(t))_{t \geq 0}$ denote the corresponding semigroup of the generator $L^{0,\delta}$ on $C_0(\mathbb{R}^d)$. The basic fact that we use in the following is that the semigroup $(T^{\varepsilon,\delta}(t))_{t \geq 0}$ converges towards $(T^{0,\delta}(t))_{t \geq 0}$ as $\varepsilon \to 0$, as stated in Theorem A.4, which has similarly been proven by Kurtz [31], but is formulated and shown in the Appendix for the reader's convenience. We are now ready to state the main result of this section.

Main Result for Coupled Systems
In the following, let $\{T^{\varepsilon,0}(t)\}_{t \geq 0}$ denote the semigroup on $X$ generated by $L^{\varepsilon,0}$, which is defined as in (1.6) with $\delta = 0$. Similarly we consider the generator $\bar{L}^{0,0}$ for the strongly continuous semigroup $T^{0,0}(t)$ on $C_0(\mathbb{R}^d)$.

Theorem 2.3
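Under the assumptions (A1)-(A4), it follows that for every $f \in C_0(\mathbb{R}^d)$ and every sequence $\{\varepsilon_k\}_{k \geq 0}$ with $\varepsilon_k \to 0$ for $k \to \infty$, there exists a subsequence $\{\varepsilon_{k_m}\}_{m \geq 0}$ such that for any finite time $\bar{T} > 0$ [...]
Proof We have by the triangle inequality, for $t \in [0, \bar{T}]$,
$$\|T^{\varepsilon,0}(t) f - T^{0,0}(t) f\|_\infty \leq \|T^{\varepsilon,0}(t) f - T^{\varepsilon,\delta}(t) f\|_\infty + \|T^{\varepsilon,\delta}(t) f - T^{0,\delta}(t) f\|_\infty + \|T^{0,\delta}(t) f - T^{0,0}(t) f\|_\infty. \tag{2.17}$$
Further, due to the definition of the operator $L^\delta_1$ we see immediately that for all $f \in D(L^{\varepsilon,\delta})$ [...] (2.18) Due to equations (2.18) and (1.14) and by the Trotter-Kato Theorem (see for example [16, Theorem 4.8]) we observe that for any fixed $\varepsilon > 0$ the first and the last term on the right side of equation (2.17) can be made arbitrarily small as $\delta \to 0$. The second difference can also be made arbitrarily small as $\varepsilon \to 0$, for any fixed $\delta > 0$, due to Theorem A.4. To be more precise, let $\{\varepsilon_k\}_{k \geq 0}$ be a sequence with $\varepsilon_k \to 0$ for $k \to \infty$. Then we can find for every $k \in \mathbb{N}$ a $\delta_k > 0$ so that [...] Moreover, for any $k \in \mathbb{N}$ we can fix an $l(k) \in \mathbb{N}$ so that [...] In this way we get a subsequence $\{\varepsilon_{l(k)}\}_{k \geq 0}$ for which [...] holds. The claim now follows by taking the limit $k \to \infty$.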

Remark 2.4
A sufficient condition for the key assumption (A4) to hold is that [...] provided that the expressions $F^0_0$, $F^0_1$, $A^0_0$ are well-defined, which requires sufficiently fast decay of correlations. Furthermore, Theorem B gives us precise conditions under which (A4) is satisfied. In the case that $g = g(y)$ is independent of $x$, the posed assumptions are relatively mild.

Corollary 2.5
Assume that (A1)-(A4) hold, that $L^{0,0}$ can be written as in (1.19) and that SDE (1.22) has the solution $X(t)$. Then for every $f \in C_0(\mathbb{R}^d)$ and every sequence $\{\varepsilon_k\}_{k \geq 0}$ with $\varepsilon_k \to 0$ for $k \to \infty$ there exists a subsequence $\{\varepsilon_{k_m}\}_{m \geq 0}$ such that [...] for $m \to \infty$, where the expectation $\mathbb{E}$ is taken with respect to the Wiener measure (defined on $\Omega$) of the Brownian motion $W$. It follows in particular that for any Borel probability measure $\mu$ on $\mathbb{T}^m$ we have [...]
Proof The first statement follows immediately from Theorem 2.3, observing that [...]. The last statement follows from the dominated convergence theorem.
[...] In fact, even continuity of $\bar{a}$ cannot be guaranteed in such cases. The problem of non-smooth dependence of the measures $\mu_x$ is known in statistical physics as "no linear response" and can appear even in relatively simple dynamical systems [8,9,24]. See also the work of Baladi and coworkers on unimodal maps, i.e., [3,5] and references therein.
Our next natural goal is now to check under which abstract assumptions on the original ODE problems, the condition (A4) (that is equation (1.14)) is satisfied.

Convergence of the Limiting Generator $L^{0,\delta}$
In this section we investigate requirements for condition (A4) to hold, which is the main assumption in Theorem 2.3 and also our last missing piece for proving convergence of the first moments of the slow process for the coupled deterministic systems (1.3). Let us recall that the operator $L^{0,\delta}$ is explicitly given by (1.12), where the drift term $F^\delta$ and the diffusion matrix $A^\delta$ are explicitly given by (1.13) and by the alternative expressions in Lemma 2.2. These alternative expressions use the solution operator $\varphi^{\delta,t}_x$ solving equation (1.15). Thus, a first step towards proving (A4) is to understand the behavior of $\varphi^{\delta,t}_x$ in the limit $\delta \to 0$:
Lemma 3.1 (i) For every $T > 0$ and $\omega \in \Omega$, there exists a positive constant $\beta(T, \omega) > 0$ (which is independent of $x$, $y$ and $\delta$) such that [...] where $|\cdot|_\infty$ denotes the supremum norm in $\mathbb{R}^m$. This implies that for all $\omega \in \Omega$ we have [...] Furthermore, it holds that [...]
(ii) There exists a version of the stochastic process $\varphi^{\delta,t}_x(y)$ such that for a.a. $\omega \in \Omega$ the map $x \mapsto \varphi^{\delta,t}_x(y)$ is continuously differentiable for every $t$ and the gradient $\nabla_x \varphi^{\delta,t}_x(y)$ satisfies the linear ODE [...] (3.4)

Proof (i) Due to the definition of the solution operator, it follows immediately that for any [...], where $\bar{C} := \sup_{x \in \mathbb{R}^d} C(x) < \infty$ due to the boundedness of $\nabla_x g$. Due to Gronwall's lemma it follows that for all $t \in [0, T]$ [...] $\nabla_y g(x, \varphi^{\delta,t}_x(y)) \to \nabla_y g(x, \varphi^{0,t}_x(y))$ as $\delta \to 0$ uniformly in $x$, $y$ and $t \in [0, T]$. Hence, the last equation is a consequence of continuous dependence of ODEs on the coefficients.
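The Gronwall argument can be checked numerically on a toy example. The sketch below is an illustration under stated assumptions, not the estimate of the lemma itself: it uses a hypothetical scalar vector field `g` with Lipschitz constant `L`, additive noise of amplitude `sigma` as a stand-in for the small-noise parameter, and an Euler discretization, for which the discrete Gronwall inequality gives the pathwise bound |y_sto(t) - y_det(t)| <= sigma * max|V| * exp(L t).

```python
import numpy as np

def gronwall_check(sigma, T=1.0, dt=1e-3, y0=0.2, seed=0):
    """Compare an Euler path with additive noise to the noise-free Euler path.

    Returns the pathwise error and the Gronwall-type bound
    sigma * max_s |V_s| * exp(L * t), both on the time grid."""
    rng = np.random.default_rng(seed)
    g = lambda y: 1.0 + 0.5 * np.sin(2.0 * np.pi * y)  # hypothetical vector field
    L = np.pi                                          # Lipschitz constant of g
    n = int(round(T / dt))
    dV = rng.normal(0.0, np.sqrt(dt), size=n)          # Brownian increments
    V = np.concatenate(([0.0], np.cumsum(dV)))         # Brownian path on the grid
    y_det = np.empty(n + 1)
    y_sto = np.empty(n + 1)
    y_det[0] = y_sto[0] = y0
    for k in range(n):
        y_det[k + 1] = y_det[k] + dt * g(y_det[k])
        y_sto[k + 1] = y_sto[k] + dt * g(y_sto[k]) + sigma * dV[k]
    t = dt * np.arange(n + 1)
    err = np.abs(y_sto - y_det)
    bound = sigma * np.max(np.abs(V)) * np.exp(L * t)
    return err, bound

for sigma in (0.1, 0.01, 0.001):
    err, bound = gronwall_check(sigma)
    print(sigma, err.max(), bound.max())
```

Both the error and the bound shrink with `sigma`, which is the mechanism behind the uniform convergence of the perturbed flow on finite time intervals; the bound grows exponentially in time, which is why the long-time behaviour needs the additional mixing assumptions.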
Having understood the behavior of $\varphi^{\delta,t}_x$ in the limit $\delta \to 0$, we now come back to the generator $L^{0,\delta}$ given in (1.12). Its coefficients, which use the solution operator $\varphi^{\delta,t}_x$, are given in Lemma 2.2. Seeing these expressions and Lemma 3.1, one might be tempted to conclude the convergence of $F^\delta$, $A^\delta$ and, as a consequence, equation (1.14). Unfortunately, it is not that simple, because for general functions $g$ the expressions $F^0_0$, $F^0_1$ and $A^0_0$ in Lemma 2.2 will not be well-defined. In fact, they are well-defined only when the flow $\varphi^{0,t}_x(y)$ has strong mixing properties. These considerations motivate the following definitions:

Definition 3.2 (Decay of correlations for deterministic systems)
We say that the flow $\varphi^{0,t}_x(y)$ is mixing with decay of correlations $C(t; x)$ provided that there exists an $\alpha > 0$ such that for all continuous functions $v, w : \mathbb{T}^m \to \mathbb{R}$, lying in the Hölder space $(C^{0,\alpha}, \|\cdot\|_\alpha)$, we have [...] We say that the decay of correlations is summable provided that [...] and we say that the decay of correlations is exponential provided that for every $x \in \mathbb{R}^d$ there exist constants $C(x), \rho(x) > 0$ such that $C(t; x) \leq C(x) e^{-\rho(x) t}$.

In particular, this implies that the stochastic flow has exponential decay of correlations in the sense of Definition 3.2.
Proof This is an easy application of [38, Theorem 6.16]: [...] This finishes the proof.

Definition 3.5 (Stochastically stable decay of correlations)
Let $v, w : \mathbb{T}^m \to \mathbb{R}$. Assume that the deterministic flow $\varphi^{0,t}_x$ has decay of correlations $C(t; x)$. We say that $\varphi^{0,t}_x$ has stochastically stable decay of correlations provided that for all small enough $\delta > 0$ and $x \in \mathbb{R}^d$

$$C(\delta; x)\, e^{-\rho(\delta; x) t} \leq C(t; x),$$
where the constants on the left side are as in Lemma 3.4.
These notions allow to prove the following statement concerning F 0 0 , F 0 1 and A 0 0 : Lemma 3.6 Assume that the unperturbed flow φ 0,t x has summable decay of correlations C(t; x) and stochastically stable decay of correlations in the sense of Definitions 3.2 and 3.5, and that the centering condition (1.16) is satisfied. Furthermore, consider, for δ ≥ 0, the well-defined expressions F δ 1 (x) (2.12), A δ 0 (x) (2.13) and, for g = g(y), which hold for all y ∈ T m and a.a. ω ∈ by ergodicity (cf. Lemma 2.2). Then we have (3.8) and, in the case that g = g(y), we additionally obtain Proof We first want to ensure that all considered expressions (2.12), (2.13) and (3.7) are welldefined for all δ ≥ 0. For (2.12) this is trivial. For (2.13) note that for a.a. ω ∈ , due to the centering condition (1.16), Lemma 3.4 and the stochastic stability we have componentwise in the tensor product is a constant which depends on b) and analogously for (3.7) in the case that g = g(y). We now start by estimating the difference F δ 1 − F 0 1 for δ > 0. Let ε > 0 and define, for For each δ > 0 we can fix a T = T 0 , which is independent of δ and x, y, ω, such that the first and last difference become smaller that ε 3 . To see this, note that the sequence 1 T T 0 sup δ,x,y,ω |a x, φ δ,s x (y)(ω) |ds is bounded from above and increasing, hence it converges. Moreover, due to Lemma 3.1 and due to the Lipschitz continuity of the vector field a, we have that Hence, for a.a. ω we have As before we split is bounded from above and increasing, hence it converges for every t. Hence, we can find a T = T 0 (t), which is independent of δ and and x, y and ω such that the first and last terms of equation (3.12) become smaller than ε. With this T 0 we have Finally, we deal with the difference |F δ 0 − F 0 0 | in case that g is independent of x. Proceeding as in our previous computations we can verify that uniformly in x, y and for t ∈ [0, T ]. 
This implies, due to the stochastically stable decay of correlations of φ that This finishes the proof.
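The ε/3 argument used for F δ 1 − F 0 1 in the proof above can be sketched as a three-term splitting. This is only a reading aid: it assumes that F δ 1 (x) is the ergodic time average of a along the flow φ δ,s x (as the reference to ergodicity in Lemma 3.6 suggests) and that Lemma 3.1 gives convergence of φ δ,s x to φ 0,s x on the fixed interval [0, T 0 ] as δ → 0. The constant Lip(a) is the Lipschitz constant of a.

```latex
\begin{align*}
\bigl|F_1^{\delta}(x) - F_1^{0}(x)\bigr|
 &\le \Bigl| F_1^{\delta}(x) - \frac{1}{T_0}\int_0^{T_0}
        a\bigl(x, \varphi^{\delta,s}_x(y)(\omega)\bigr)\,ds \Bigr| \\
 &\quad + \frac{1}{T_0}\int_0^{T_0}
        \bigl| a\bigl(x, \varphi^{\delta,s}_x(y)(\omega)\bigr)
             - a\bigl(x, \varphi^{0,s}_x(y)\bigr) \bigr|\,ds \\
 &\quad + \Bigl| \frac{1}{T_0}\int_0^{T_0}
        a\bigl(x, \varphi^{0,s}_x(y)\bigr)\,ds - F_1^{0}(x) \Bigr| , \\[4pt]
\text{middle term}
 &\le \operatorname{Lip}(a)\,
      \sup_{s \in [0,T_0]}
      \bigl| \varphi^{\delta,s}_x(y)(\omega) - \varphi^{0,s}_x(y) \bigr| .
\end{align*}
```

The first and last terms are controlled by the choice of T 0 , uniformly in δ, x, y and ω, while the middle term is made small by taking δ small for that fixed T 0 .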
It remains to deal with the term F 0 0 in the case that g also depends on x. The crucial ingredients are equations (1.17) and (1.18), which allow us to formulate the following result:

Lemma 3.7 For the case that g = g(x, y) also depends on x, we assume that the unperturbed flow φ 0,t x has summable and stochastically stable decay of correlations with respect to an ergodic invariant measure μ 0 x on T m . Additionally, we assume that the centering condition (1.17) and, for any y ∈ T m , the growth condition (1.18) are satisfied.
Then we obtain: 1. Setting we have that For δ ≥ 0 small enough, h(t) is an upper bound for f is well-defined and we have (3.19)
Proof We must first ensure that all expressions F δ 0 are well-defined. It is easy to see that for all δ ≥ 0 we have (3.20) for a constant C 2 > 0. Secondly for δ = 0, we set w x := ∇ y b(x, y) and v t,x := ∇ x φ 0,t x (y)b(x, y) in the definition of decay of correlations and, using condition (1.17), we observe that This fact together with the growth assumption (1.18) yields which, in particular, implies that F 0 0 is well-defined. Furthermore, due to stochastically stable decay of correlations, proceeding as in Lemma 3.6 (and using also Lemma 3.1 (ii)) we can show that

Finally, we can conclude (3.19) by dominated convergence.
This allows us now to conclude the main result of this section, Theorem B.

Proof of Theorem B
The statement follows immediately from Lemmas 3.6 and 3.7.

Remark 3.8 (i) Condition (1.18) seems to be a relatively strong mixing condition, which may be difficult to verify for certain practical examples. Indeed, one observes that ∇ x φ δ,t x (y) solves the first order linear inhomogeneous ODE (3.4). Thus, ∇ x φ δ,t x (y) can be calculated by variation of constants and is explicitly given by the formula Assuming for simplicity that the matrices $e^{\int_0^t \nabla_y g(x,\varphi^{\delta,\tau}_x(y))\,d\tau}$ and $e^{-\int_0^s \nabla_y g(x,\varphi^{\delta,\tau}_x(y))\,d\tau}$ commute, we obtain from the last equation From this we conclude that sup x,y,ω,δ where the constant is independent of t. Thus, the growth condition (1.18) might hold if the unperturbed flow φ 0,t x has exponential decay of correlations C(t; x) ≤ Ce −ρt , for all x ∈ R d , with ρ ≥ ‖∇ y g‖ ∞ . This inequality describes precisely the boundary of what we might optimistically expect as possible decay rates for correlations, and a further investigation is left as an open problem here.

(ii) The centering condition (1.16) might seem a strong assumption at first glance because it must be satisfied for all δ > 0 and x. However, the parameter δ > 0 has the effect of only "stretching" the invariant density ρ δ ∞ (y; x), so that the function b simply has to be some function which is in accordance with the symmetry of the invariant densities. The condition can also be relaxed by allowing the operator L 2 to be perturbed as well. More precisely, assume that the function b satisfies We consider suitable perturbed vector fields b δ satisfying the centering condition (1.9), for which additionally we have For example, we can consider functions of the form We then define the perturbed operators L δ and we can repeat the proof of Theorem 2.3 to get the statement.
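The growth estimate behind Remark 3.8 (i) can be sketched as follows. The sketch assumes that ODE (3.4) has the generic linear inhomogeneous form written in the first line, with zero initial condition because φ δ,0 x (y) = y does not depend on x. The symbols Z, A and B are shorthand introduced only for this illustration, and the commuting assumption is the simplification made in the remark itself.

```latex
\begin{align*}
\dot Z(t) &= A(t)\,Z(t) + B(t), \qquad Z(0) = 0, \\
Z(t) &:= \nabla_x \varphi^{\delta,t}_x(y), \quad
A(t) := \nabla_y g\bigl(x, \varphi^{\delta,t}_x(y)\bigr), \quad
B(t) := \nabla_x g\bigl(x, \varphi^{\delta,t}_x(y)\bigr), \\[4pt]
Z(t) &= \int_0^t \exp\Bigl( \int_s^t A(\tau)\,d\tau \Bigr) B(s)\,ds
      \qquad \text{(commuting case)}, \\[4pt]
|Z(t)| &\le \|\nabla_x g\|_\infty \int_0^t
          e^{\|\nabla_y g\|_\infty (t-s)}\,ds
       \;\le\; \frac{\|\nabla_x g\|_\infty}{\|\nabla_y g\|_\infty}\,
          e^{\|\nabla_y g\|_\infty\, t} .
\end{align*}
```

Under these assumptions the derivative of the flow can grow at most like e^{‖∇ y g‖ ∞ t}, so an exponential decay rate ρ of the correlations competes directly with ‖∇ y g‖ ∞ , which is the comparison made in the remark.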

Main Result
To provide an intermediate alternative to the strong mixing assumption (see condition (1.18)), we also consider the simpler case of so-called weakly-coupled systems. These are systems with coupling occurring only in lower time scales and they are given by equation (1.23). We also consider the corresponding stochastic version (4.1) We now use the assumptions (A1)-(A2), (A4)-(A5), and suitable centering and correlation decay conditions, but not (A6), to prove Theorem C. For any δ > 0 we set $\tilde{L}^{\delta}_1 := g(y) \cdot \nabla_y + \tfrac{1}{2}\,\delta\, I : \nabla_y \nabla_y$, with the commutative part L c 2 := b(x, y) · ∇ x and the remainder L nc 2 := h(x, y) · ∇ y . The operator L̃ is the backward Kolmogorov operator associated with the SDE (4.1). Assume that the centering condition (1.16) is satisfied. Consider the perturbation expansion which we substitute into the backward Kolmogorov equation Via the perturbation analysis given in Sect. B of the Appendix, we arrive at the following equation for the leading order u δ Here the drift coefficient in the homogenized equation (2.10) now changes to (4.5) and the diffusion coefficient A δ (x) remains unchanged (4.6) Note that (see for example [38,Result 11.8]) the solution δ of the cell problem admits the representation formula where the stochastic process φ δ,t (y) satisfies equation (1.24) and the term E[b(x, φ δ,t (y))] decays exponentially fast as t → ∞ (see [38,Theorem 6.16]). The above considerations allow us to repeat the arguments from the previous sections and we get the following theorem.
In the case that h does not vanish everywhere, we assume additionally that the centering condition (1.17) and the growth condition (1.25) hold. Then the following statements are true: (i) There exist vector fields F̃ 0 (x) and A 0 (x) such that where A 0 is explicitly given by (1.27) and the vector field F̃ 0 is given by (1.28).
Proof The arguments needed for the proof are identical with those given in Sects. 2 and 3. Thus we omit their exact repetition. We only want to note that in the case that h ≡ 0 the term ∇ y δ (x, y)h(x, y) in (4.5) vanishes, so that we can repeat the arguments from Lemma 3.6 to get the first statement. In the general case that h does not vanish everywhere, the term ∇ y δ (x, y)h(x, y) in equation (4.5) cannot be neglected. Thus we need to pose the additional assumptions (1.17) and (1.25) (which ensure especially that the expression is well-defined) and then we proceed as in Lemma 3.7 to get the first statement also for this case. Finally we note that for the second statement we repeat the arguments from Theorem B, for the third statement we need to repeat the proof of Theorem 2.3 and for the last statement see the proof of Corollary 2.5.
As we can see from the formulation of Theorem 4.1, we do not have to assume any additional growth condition for φ 0,t in the case that h in (4.1) vanishes. If h ≠ 0, the assumed growth condition (1.25) for the weakly-coupled system is clearly weaker than the growth condition (1.18) for the more general case: in (1.18), the integrability has to hold uniformly over all x ∈ R d , whereas φ 0,t does not depend on x in the weakly-coupled situation, hence the simplification to (1.25).

Numerical Example
As an application of the previous Sect. 4.1, we consider a weakly-coupled system on R × R 3 with chaotic fast dynamics on the Lorenz attractor. Let us recall that the classical Lorenz equations are given by the three-dimensional ODE system dy 1 /dt = s(y 2 − y 1 ), (4.13) In Fig. 1 sample paths of the process X ε,δ solving (4.13) for different values of ε and δ are shown. These paths illustrate that the deterministic flow displays stochastic-looking, chaotic oscillations, but one really needs to consider the limiting behaviour as ε → 0 before the visual difference between a deterministic and a stochastic process disappears. The fast subsystem has the ergodic measure μ supported on the Lorenz attractor. Let Q ⊂ R 3 be a sufficiently large cube containing the attractor. By identifying the opposite sides of the cube and rescaling the coordinates we can assume, without loss of generality, that Q = T 3 is the torus, so that the theory from the previous sections can be applied. We note further that it has already been verified numerically in [22] that the y 2 coordinate has zero average with respect to μ and, as a consequence, that the centering condition (1.4) is satisfied. Theorem 4.1 states that for every f ∈ C 0 (R) and every sequence {ε k } k≥0 with ε k → 0 for k → ∞ there exists a subsequence {ε k m } m≥0 such that where the process X solves the SDE Note that equation (4.15) describes an Ornstein-Uhlenbeck process which has the unique solution given by In general we know that for a square integrable function f on [0, T ], the random variable $\int_0^T f(t)\,dW_t$ is normally distributed with variance $\int_0^T f(t)^2\,dt$, and from this fact it is easy to see that X t is normally distributed with The exact value of σ is given by formula (1.28). In the following we use the estimate σ 2 ≈ 0.126 calculated in [22]. Furthermore, since C 0 (R) ⊂ C b (R), equation (4.14) is slightly weaker than uniform convergence in distribution of the process X ε km ,0 (t) towards X (t). The following Figs. 2 and 3 verify equation (4.14) numerically.
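The zero-average property of the y 2 coordinate, which underlies the centering condition, can be checked with a short simulation of the fast subsystem alone. The following is a minimal sketch, not the computation from [22]: it assumes the classical Lorenz system in its usual form with the standard parameter values s = 10, r = 28, q = 8/3 (the names r and q and all numerical values are assumptions, since only the first equation is quoted above), and it uses a hand-written RK4 integrator with illustrative step size and time horizon.

```python
import math


def lorenz_rhs(y, s=10.0, r=28.0, q=8.0 / 3.0):
    """Right-hand side of the classical Lorenz system.

    The parameter values are the standard chaotic choice and are an
    assumption of this sketch, not values taken from the paper.
    """
    y1, y2, y3 = y
    return (s * (y2 - y1), r * y1 - y2 - y1 * y3, y1 * y2 - q * y3)


def rk4_step(f, y, dt):
    """One classical fourth-order Runge-Kutta step for y' = f(y)."""
    k1 = f(y)
    k2 = f(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k1)))
    k3 = f(tuple(yi + 0.5 * dt * ki for yi, ki in zip(y, k2)))
    k4 = f(tuple(yi + dt * ki for yi, ki in zip(y, k3)))
    return tuple(
        yi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
        for yi, a, b, c, d in zip(y, k1, k2, k3, k4)
    )


def y2_time_statistics(t_end=100.0, dt=0.01, burn_in=10.0, y0=(1.0, 1.0, 1.0)):
    """Time average and standard deviation of y2 after a burn-in period."""
    n_steps = int(round(t_end / dt))
    n_burn = int(round(burn_in / dt))
    y = y0
    total, total_sq, count = 0.0, 0.0, 0
    for step in range(n_steps):
        y = rk4_step(lorenz_rhs, y, dt)
        if step >= n_burn:
            total += y[1]
            total_sq += y[1] ** 2
            count += 1
    mean = total / count
    std = math.sqrt(max(total_sq / count - mean ** 2, 0.0))
    return mean, std


if __name__ == "__main__":
    mean, std = y2_time_statistics()
    print("time average of y2:", mean, " standard deviation of y2:", std)
```

On a finite time window the average of y 2 does not vanish exactly; it should be small compared with the size of the fluctuations and shrink as the averaging window grows.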

Fig. 3 Histograms of the process X ε,0 (t), corresponding with Fig. 2, taking ε = 0.8, 0.3, 0.08 at time t = 10 (a) and again ε = 0.08, 0.05 at time t = 0.5 (b) satisfying equation (4.13), in comparison to the distribution of the limiting process X (t), solving (4.15) with the initial condition ξ = 0. We used ensembles of 500 realizations

Figure 2 shows that equation (4.14) is satisfied for f being the identity function (note that, since the process X ε,0 is uniformly bounded for every ε ≥ 0.05, we can assume without loss of generality that f coincides with the identity function only in a compact interval and that f ∈ C 0 (R)). Apart from that, Fig. 3 suggests that we actually have convergence in distribution of the slow process X ε,0 , satisfying the chaotic ODE (4.13) (for δ = 0), towards the limiting stochastic process X satisfying the SDE (4.15), which is a reduced stochastic equation for the slow process X ε,0 . This illustrates the reduction effect one is looking for, since now the chaotic fast degrees of freedom are encoded in a low-dimensional SDE.
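The reference distribution used in such a histogram comparison can be sketched for a generic Ornstein-Uhlenbeck process dX = −θ X dt + σ dW with X(0) = ξ, for which X(t) is Gaussian with mean ξ e^{−θt} and variance σ²(1 − e^{−2θt})/(2θ). The coefficients of (4.15) are not reproduced here, so the linear drift rate `theta` is a hypothetical placeholder; only the value σ² ≈ 0.126 and the initial condition ξ = 0 are taken from the text. The Euler-Maruyama ensemble below is an illustration under these assumptions, not the authors' numerical scheme.

```python
import math
import random


def ou_mean(xi, theta, t):
    """Mean of X(t) for dX = -theta * X dt + sigma dW, X(0) = xi."""
    return xi * math.exp(-theta * t)


def ou_variance(sigma2, theta, t):
    """Variance of X(t) for the same process (sigma2 = sigma squared)."""
    return sigma2 * (1.0 - math.exp(-2.0 * theta * t)) / (2.0 * theta)


def simulate_ou(xi, theta, sigma2, t, dt=0.01, n_paths=2000, seed=0):
    """Euler-Maruyama ensemble of X(t); returns one sample per path."""
    rng = random.Random(seed)
    n_steps = int(round(t / dt))
    noise_scale = math.sqrt(sigma2 * dt)
    samples = []
    for _path in range(n_paths):
        x = xi
        for _step in range(n_steps):
            x += -theta * x * dt + noise_scale * rng.gauss(0.0, 1.0)
        samples.append(x)
    return samples


def sample_mean_and_variance(samples):
    """Empirical mean and (biased) variance of a list of samples."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / n
    return mean, var


if __name__ == "__main__":
    theta = 1.0      # hypothetical drift rate, not taken from (4.15)
    sigma2 = 0.126   # estimate quoted in the text from [22]
    t = 0.5
    samples = simulate_ou(0.0, theta, sigma2, t)
    print("empirical:", sample_mean_and_variance(samples))
    print("exact:    ", (ou_mean(0.0, theta, t), ou_variance(sigma2, theta, t)))
```

To mimic the ensemble size used for Fig. 3, one can set `n_paths=500`; the empirical variance then fluctuates more strongly around the exact value.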

Conclusion and Outlook
In this paper we have extended results on deterministic homogenization of fast-slow ODEs to the case where coupling of the fast and slow variables is part of the model. Our main strategy was to add small stochastic noise to the fast subsystem and then take two independent limits, namely the zero-noise limit and the limit ε → 0, which enabled us to use results and functional-analytical methods from stochastic systems. For generally coupled systems, we have succeeded in proving a certain weak form of convergence of the slow process, similar to uniform convergence of the first moments, requiring strong mixing assumptions on the fast flow. However, for the intermediate case of weakly-coupled systems, the mixing assumptions are relatively mild. Our method also directly yields explicit expressions for the drift and diffusion coefficients of the limiting SDE.
This paper can be seen as one of the first steps to understand homogenization of coupled fast-slow systems in continuous time and leaves open several relevant questions for further research. One task is to find, numerically and/or analytically, more direct examples from applications for which the strong mixing condition (1.25) is satisfied. Moreover, the key assumption of stochastically stable DOC in the sense of Definition 3.5 needs to be investigated. Another goal will be to find alternative representations of the drift and diffusion coefficients of the limiting diffusion, such that potentially weaker or even no mixing assumptions are required, as seen in [26,27]. In addition to that, it will be crucial to study the behavior of the higher moments of the slow process in order to prove weak convergence of the respective measures in C([0, T ], R d ).
Funding Open Access funding enabled and organized by Projekt DEAL.

A Convergence of the Semigroup T ε,δ as ε → 0
Let X be a Banach space. We call P the projection corresponding to the semigroup.
Remark A.2 A sufficient condition for (A.1) to hold is that lim t→∞ S(t) f exists for every f ∈ X and then we also have that Using semigroup notation we can rewrite the last equation as See also [18,Remark 7.5].

Lemma A.3
For any fixed δ > 0 consider the operators L ε,δ and L δ 1 defined as in (1.6) on C 2 c (R d × T m ). Let X := (C 0 (R d × T m ), ‖·‖ ∞ ) be the Banach space of continuous functions which vanish as ‖x‖ 2 → ∞. Then the following statements are true: (i) L ε,δ generates a strongly continuous contraction semigroup (T ε,δ (t)) t≥0 on X .
(ii) L δ 1 generates an ergodic, strongly continuous contraction semigroup (S δ (t)) t≥0 on X .

Proof (i) Let ψ t (x, y) denote the solution map of the SDE corresponding to the generator L ε,δ . For f ∈ X define where P δ is the projection given by (2.7) Let ψ̃ t (x, y) denote the flow of the SDE corresponding to L δ 1 . Observe that due to the structure of the generator, the flow has the form ψ̃ t (x, y) = (x, φ δ,t x (y)), where φ δ,t x (y) solves (1.15). Due to [38,Theorem 6.16] we have since the constant C can be chosen to be independent of x, y (due to the uniform bounds on the coefficients of the SDE). This proves the ergodicity of the semigroup S δ (t) on X .
Theorem A.4 [18, Chapter 12, Theorem 2.4] Fix a δ > 0 and let L ε,δ be the operators as in (1.6). Define P δ by (2.7) and assume that the centering condition (1.9) is satisfied for all x ∈ R d . Furthermore let δ be the solution of the cell problem (1.10). Define For every f ∈ D let h ∈ X denote the unique solution of the Poisson equation h(x, y)ρ δ ∞ (y; x) dy = 0, (A.5) whose existence and uniqueness is guaranteed due to the centering condition and the Fredholm alternative, and let L 0,δ be the operator defined on D by (2.9). Assume that the closure L̄ 0,δ generates a strongly continuous contraction semigroup {T (t) 0,δ } t≥0 on C 0 (R d ). Then we have for every f ∈D and finite timesT < ∞ Note that since b and the coefficients of L 1 are smooth and L 1 is uniformly elliptic, is smooth in both arguments (see also [38,Lemma 17.2] for a similar situation). Having this in mind, it is easy to check that R(V ) ⊂ D(L δ 1 ) ∩ D(L 2 ) ∩ D(L 3 ) and, recalling the definitions of δ and L δ 1 , we also see that h = V ( f ) solves the Poisson equation Hence, The claim follows now from [18, Chapter 1, Corollary 7.8], setting A := L 2 , := L 3 and B := L δ 1 .

B Perturbation Analysis for Weakly-Coupled Systems
In the following we follow [38] and [22]. We provide the perturbation expansions here for completeness as they are the most convenient tool to formally derive the correct limiting behavior. Substituting We continue with the last equation (B.3). Solvability requires that the right-hand side is orthogonal to the null space of L 1 and this leads to the following equation for u δ 0 (x, t): In this way we obtain a closed equation for the dominant term u δ 0 , but we still have to evaluate the operators involved in it. Recall that δ denotes the solution of the cell problem (1.10). Thus, coming back to equation (B.2), we observe that, due to (B.4), u 1 must have the form u δ 1 (x, y, t) = δ (y; x) · ∇ x u δ 0 (x, t). (B.7) Hence,L and Putting everything together we get (4.4).
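The structure of this expansion can be sketched along the standard lines of [38]. The sketch assumes that the backward Kolmogorov equation has the three-scale form in the first line, that the cell problem is normalized as −L̃ δ 1 Φ δ = b, and that ρ δ ∞ (y) denotes the invariant density of the fast process. The symbol Φ δ for the cell-problem solution and the index notation are conventions of this sketch only, and the order-by-order equations are not matched to the labels (B.1)-(B.6).

```latex
\begin{align*}
\partial_t u^{\delta}
  &= \Bigl( \varepsilon^{-2}\,\tilde L^{\delta}_1
          + \varepsilon^{-1} L_2 + L_3 \Bigr) u^{\delta},
  \qquad L_2 = L_2^{c} + L_2^{nc}
             = b \cdot \nabla_x + h \cdot \nabla_y, \\
u^{\delta} &= u_0^{\delta} + \varepsilon\, u_1^{\delta}
            + \varepsilon^{2} u_2^{\delta} + \dots \\[4pt]
O(\varepsilon^{-2}):\quad
  & \tilde L^{\delta}_1 u_0^{\delta} = 0
    \;\Longrightarrow\; u_0^{\delta} = u_0^{\delta}(x,t), \\
O(\varepsilon^{-1}):\quad
  & -\tilde L^{\delta}_1 u_1^{\delta} = L_2 u_0^{\delta}
    = b \cdot \nabla_x u_0^{\delta}
    \;\Longrightarrow\;
    u_1^{\delta} = \Phi^{\delta} \cdot \nabla_x u_0^{\delta}, \\
O(1):\quad
  & -\tilde L^{\delta}_1 u_2^{\delta}
    = -\partial_t u_0^{\delta} + L_2 u_1^{\delta} + L_3 u_0^{\delta}, \\[4pt]
L_2 u_1^{\delta}
  &= b_i\, \Phi^{\delta}_j\, \partial_{x_i}\partial_{x_j} u_0^{\delta}
   + \bigl( b_i\, \partial_{x_i} \Phi^{\delta}_j
          + h_k\, \partial_{y_k} \Phi^{\delta}_j \bigr)\,
     \partial_{x_j} u_0^{\delta}, \\[4pt]
\partial_t u_0^{\delta}
  &= \int_{\mathbb{T}^m}
       \bigl( L_2 u_1^{\delta} + L_3 u_0^{\delta} \bigr)\,
       \rho^{\delta}_{\infty}(y)\, dy .
\end{align*}
```

In this form the non-commutative part L nc 2 contributes only the additional drift term built from ∇ y Φ δ h, while the second-order term, and hence the diffusion coefficient, is unaffected. This matches the statement in Sect. 4 that the drift changes and A δ (x) remains unchanged.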