1 Introduction

Many natural processes can be modeled by systems with two clearly separated sets of variables: a set of variables which evolve rapidly in time (for instance, within milliseconds) and a set of slowly varying variables (for instance, variables for which change is observed after hundreds of years); see [30] for many examples and techniques in fast-slow systems. In many applications the rapidly varying variables lie in a high-dimensional space and complicate the model significantly. Typical examples are chemical processes such as combustion [33], or climate dynamics [17]. Therefore, one naturally seeks reduced equations for the slow dynamics only. Several formal and rigorous reduction methods exist, such as Fenichel-Tikhonov slow manifolds [19, 30, 39], averaging [40] and homogenization [7, 38].

In this paper we are going to study multiscale ordinary differential equations (ODEs) with three separated time scales and fast chaotic dynamics: firstly, a fast time scale \({\mathcal {O}}(\varepsilon ^2)\) with nontrivial fast chaotic dynamics, but with slow dynamics which are practically in equilibrium, secondly an intermediate time scale \({\mathcal {O}}(\varepsilon )\) with fast dynamics which have equilibrated, and finally a slow time scale \({\mathcal {O}}(1)\) (diffusive time scale). When the slow variables start to evolve under the influence of the fast dynamics, one observes induced fluctuations. In this setting, the method of reduction to a single slow equation is usually called homogenization. Common techniques to achieve the reduction include methods based upon partial differential equations (PDEs) via the Liouville or Fokker-Planck/Kolmogorov equations [10, 37], techniques based upon semigroups [31], algorithmic approaches [22], as well as pathwise approaches via dynamical systems and probabilistic limit laws which we will focus on: in recent years, Melbourne and co-workers [23, 26, 27, 35] have obtained rigorous convergence results, with high generality and mild assumptions, for the slow process \(x_\varepsilon \) within fast-slow systems of the form

$$\begin{aligned} {\dot{x}}_\varepsilon&= a(x_\varepsilon ,y_\varepsilon ) +\varepsilon ^{-1}b(x_\varepsilon ,y_\varepsilon ), {\quad } x_\varepsilon (0;\eta ) = \xi \in {\mathbb {R}}^d , \text { for all }\eta \in \Omega , \quad \text { (slow equation),} \end{aligned}$$
(1.1a)
$$\begin{aligned} {\dot{y}}_\varepsilon&= \varepsilon ^{-2}g(y_\varepsilon ),{\quad } y_\varepsilon (0;\eta ) = \eta \in \Omega \subset {\mathbb {R}}^m,\text { for all }\eta \in \Omega , \quad \text {(fast equation)} , \end{aligned}$$
(1.1b)

where the vector fields \(a:{\mathbb {R}}^d\times {\mathbb {R}}^m\rightarrow {\mathbb {R}}^d\), \(b:{\mathbb {R}}^d \times {\mathbb {R}}^m \rightarrow {\mathbb {R}}^{d}\) are \(C^3\) and bounded with globally bounded derivatives. A main dynamical assumption is to require ergodicity for the fastest scale, i.e., the ODE \({\dot{y}} = g(y)\), \(y\in {\mathbb {R}}^m\), generates a flow \(\phi _t: {\mathbb {R}}^m \rightarrow {\mathbb {R}}^m \) with a compact invariant set \(\Omega \subset {\mathbb {R}}^m\) and ergodic invariant probability measure \(\mu \) supported on \(\Omega \). Another intrinsic part of this setup is the centering condition

$$\begin{aligned} \int _\Omega b(x,y) ~\mathrm {d}\mu (y) = 0, \text {\quad for all } x\in {\mathbb {R}}^d. \end{aligned}$$

Systems of the form (1.1) are also called skew products, because they are not coupled but instead the fast variables \(y_\varepsilon \) can be described by a separate dynamical system on \(\Omega \). Further, we note that the initial condition \(\eta \) is the only source of randomness in the system. Without particular mixing conditions on the flow \(\phi _t\), Kelly and Melbourne have shown [27] that for any finite \(T>0\) the slow process \(x_\varepsilon \) converges weakly in \(C([0,T], {\mathbb {R}}^d)\) to the solution X of an Itô stochastic differential equation (SDE) of the form

$$\begin{aligned} \mathrm {d}X = {\tilde{a}}(X) ~\mathrm {d}t + \sigma (X)~ \mathrm {d}W, {\quad } X(0) = \xi , \end{aligned}$$
(1.2)

where W is an \({\mathbb {R}}^d\)-valued standard Brownian motion, \(\sigma \) is a matrix-valued map and \({{\tilde{a}}}\) denotes a modified drift term. Mixing assumptions on the flow \(\phi _t\) are needed for more specific formulas for drift and diffusion coefficients.
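As a minimal sketch of how a limiting equation of the form (1.2) can be explored numerically, the following code applies the Euler-Maruyama scheme in the scalar case \(d=1\). The function name `euler_maruyama` and the drift and diffusion coefficients used in the example are hypothetical choices made for illustration only; they are not the coefficients derived in [27].

```python
import math
import random


def euler_maruyama(drift, sigma, xi, t_final, n_steps, rng):
    """Approximate one path of dX = drift(X) dt + sigma(X) dW on [0, t_final]."""
    dt = t_final / n_steps
    x = xi
    path = [x]
    for _ in range(n_steps):
        # Brownian increment over one step: Gaussian with mean 0 and variance dt.
        dw = rng.gauss(0.0, math.sqrt(dt))
        x = x + drift(x) * dt + sigma(x) * dw
        path.append(x)
    return path


# Hypothetical coefficients (illustration only): linear drift, constant noise.
rng = random.Random(0)
path = euler_maruyama(lambda x: -x, lambda x: 0.5, 1.0, 1.0, 1000, rng)
```

Averaging \(f(X(t))\) over many such paths gives a Monte Carlo estimate of \({\mathbb {E}}[f(X(t))]\); the scheme is only a discretization, so the step size has to be chosen small relative to the variation of the coefficients.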

Although one might intuitively expect that fast chaotic noise may be approximated by a stochastic process, it is neither obvious which stochastic integral to consider nor how to prove the convergence to an SDE. The main difficulty lies in the fact that fast-slow systems are singular perturbation problems [30] as \(\varepsilon \rightarrow 0\). Yet, as described above, there even exist exact formulas for the drift term \({\tilde{a}}: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^d\) and the diffusion coefficient \(\sigma : {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d\times d}\). However, the skew-product structure (1.1) is a big practical restriction as it is well-known that in most applications, the fast and slow variables are coupled [30]. Our main goal in this paper is to study coupled deterministic fast-slow systems or, in other words, to generalize the study of systems of the form (1.1) by considering the case \(g = g(x,y)\). Unlike skew products, coupled systems have barely been covered in the literature; to the best of our knowledge, the only results are those obtained by Dolgopyat [15] for the discrete-time case. Informally speaking, we are going to prove that as \(\varepsilon \rightarrow 0\), the solutions of the fast-slow ODE are well-approximated by an effective slow SDE; see Sect. 1.2 for precise statements. Our strategy to achieve this result is to employ a double singular limit argument via an intermediate small-noise regularization, i.e., the idea is to pass to the stochastic level as early as possible in the proof and then use functional-analytic a-priori bounds to carry out both of the necessary limits. The specific proofs will need limits of the respective integrals for the coefficients such that mixing assumptions have to be made; this is the price we pay to show such results for the coupled case.

1.1 Main Setup and Strategy for Coupled Systems

More precisely, in this paper we are interested in coupled fast-slow systems of the form

$$\begin{aligned} {\dot{x}}_\varepsilon&= a(x_\varepsilon ,y_\varepsilon ) +\varepsilon ^{-1}b(x_\varepsilon ,y_\varepsilon ), {\quad } x_\varepsilon (0;\eta ) = \xi \in {\mathbb {R}}^d , \text { for all }\eta \in {\mathbb {T}}^m,\quad \text { (slow equation),} \end{aligned}$$
(1.3a)
$$\begin{aligned} {\dot{y}}_\varepsilon&= \varepsilon ^{-2}g(x_\varepsilon ,y_\varepsilon ), {\quad } y_\varepsilon (0;\eta ) = \eta \in \Omega \subset {\mathbb {T}}^m,\text { for all }\eta \in {\mathbb {T}}^m, \quad \text {(fast equation)}. \end{aligned}$$
(1.3b)

Before we can provide our main results, we state several assumptions, which are supposed to hold:

Assumption 1.1

  1. (A1)

    The functions \(a:{\mathbb {R}}^d\times {\mathbb {T}}^m\rightarrow {\mathbb {R}}^d\) , \(b:{\mathbb {R}}^d \times {\mathbb {T}}^m \rightarrow {\mathbb {R}}^{d}\) are \(C^3\) with globally bounded derivatives up to order one.

  2. (A2)

    For every fixed \(x \in {\mathbb {R}}^d\), when viewed as a parameter, the ODE \({\dot{y}} = g(x,y)\) , \(y\in {\mathbb {T}}^m\), generates a flow \(\phi _x^{0,t}: {\mathbb {T}}^m \rightarrow {\mathbb {T}}^m \) with a compact invariant set \(\Omega \subset {\mathbb {T}}^m\) and ergodic invariant probability measure \(\mu _x^0\) supported on \(\Omega \). Furthermore, g is \(C^3\) with globally bounded derivatives up to order two.

  3. (A3)

    For the function \(b(x,\cdot ): \Omega \rightarrow {\mathbb {R}}^{d}\), the following centering condition is satisfied:

    $$\begin{aligned} \int _{\Omega } b(x,y) ~\mathrm {d}\mu _x^0(y) = 0 \quad \text {for all } x\in {\mathbb {R}}^d. \end{aligned}$$
    (1.4)

Due to the coupling, the argument used for skew products cannot be repeated (cf. Sect. 2.1) and we need a new ansatz. Our strategy is the following:

  1. 1.

    Instead of proving weak convergence of the slow process (as a measure in \(C([0,1],{\mathbb {R}}^d)\)), we first try to prove a weaker form of convergence (e.g. convergence in distribution at any time).

  2. 2.

    We add small stochastic non-degenerate noise to the fast subsystem in order to use results on uniformly elliptic SDEs.

  3. 3.

    We let the noise in the stochastic system tend to zero and find the right limiting behaviour for the deterministic fast-slow system.

The main reason why we choose to work with stochastic systems as an intermediate step is that they provide a regularization: the infinitesimal generator for the semigroup of the associated Kolmogorov equation is uniformly elliptic. In particular, this case has been studied and weak convergence of the slow process has been rigorously proven. Such systems have the form

$$\begin{aligned} \frac{\mathrm {d}x_{\varepsilon ,\delta }}{\mathrm {d}t}&= a(x_{\varepsilon ,\delta },y_{\varepsilon ,\delta }) + \frac{1}{\varepsilon } b(x_{\varepsilon ,\delta },y_{\varepsilon ,\delta }), {\quad } x_{\varepsilon ,\delta }(0) = x_0, \quad \text { (slow equation),} \end{aligned}$$
(1.5a)
$$\begin{aligned} \frac{\mathrm {d}y_{\varepsilon ,\delta }}{\mathrm {d}t}&= \frac{1}{\varepsilon ^2}g(x_{\varepsilon ,\delta },y_{\varepsilon ,\delta }) + \frac{1}{\varepsilon } \sqrt{\delta }\frac{\mathrm {d}V}{\mathrm {d}t}, {\quad } y_{\varepsilon ,\delta }(0) = y_0, \quad \text {(fast equation)}. \end{aligned}$$
(1.5b)

Here it is always assumed that \(\delta >0\), V is an m-dimensional Brownian motion on a probability space \((\Lambda ,{\mathcal {F}},\nu )\) and the SDE is to be understood as an integral equation, as usual, where \(\frac{\mathrm {d}V}{\mathrm {d}t}\) denotes white noise viewed as the usual generalized stochastic process [2]. Further, let \({\mathbb {E}}\) denote the expectation with respect to the Wiener measure \(\nu \). It is well-known that for a sufficiently smooth function \(v: {\mathbb {R}}^d\times {\mathbb {T}}^m\rightarrow {\mathbb {R}}\) the first moments

$$\begin{aligned} u^{\varepsilon ,\delta }(x,y,t):= {\mathbb {E}}[v(x_{\varepsilon ,\delta }(t), y_{\varepsilon ,\delta }(t))|(x_{\varepsilon ,\delta }(0), y_{\varepsilon ,\delta }(0)) = (x,y)] \end{aligned}$$

satisfy the backward Kolmogorov equation

$$\begin{aligned} \frac{\mathrm {d}u^{\varepsilon ,\delta }}{\mathrm {d}t} = {\mathcal {L}}^{\varepsilon ,\delta }u^{\varepsilon ,\delta } := \Big (\frac{1}{\varepsilon ^2} {\mathcal {L}}_1^\delta + \frac{1}{\varepsilon }{\mathcal {L}}_2 + {\mathcal {L}}_3 \Big ) u^{\varepsilon ,\delta }, \end{aligned}$$
(1.6)

where

$$\begin{aligned} {\mathcal {L}}_1^\delta u:= & {} g \cdot \nabla _y u + \frac{1}{2} \delta I: \nabla _y \nabla _y u, \\ {\mathcal {L}}_2 u:= & {} b\cdot \nabla _x u, \\ {\mathcal {L}}_3 u:= & {} a \cdot \nabla _x u. \end{aligned}$$

Here we use the notation \(A:B = \text {trace}(A^\top B)= \sum _{ij}a_{ij}b_{ij}\) for the inner product of two matrices A and B, \(\nabla \) for the gradient and \(\nabla \nabla \) for the Hessian matrix. Note that (see for example [38, Chapter 11]) the operator \({\mathcal {L}}_1^\delta : D({\mathcal {L}}_1^\delta ) \subset L^2({\mathbb {T}}^m) \rightarrow L^2({\mathbb {T}}^m)\) is uniformly elliptic and has for every fixed \(x \in {\mathbb {R}}^d\), viewed as a parameter, a one-dimensional null space. The null space is characterized by

$$\begin{aligned} \begin{aligned} {\mathcal {L}}_1^\delta C&=0, \\ \Big ({\mathcal {L}}_1^\delta \Big )^* \rho ^\delta _\infty (y;x)&= 0, \end{aligned} \end{aligned}$$
(1.7)

where C denotes the constant functions in y and \(\rho ^\delta _\infty \) is the Lebesgue density of the measure \(\mu _x^\delta \), i.e.,

$$\begin{aligned} \mathrm {d}\mu _x^\delta (y) := \rho ^\delta _\infty (y;x) ~\mathrm {d}\lambda ^m(y), \end{aligned}$$
(1.8)

where \(\mu _x^\delta \) is the unique ergodic invariant measure of the SDE

$$\begin{aligned} \frac{\mathrm {d}y}{\mathrm {d}t} = g(x,y) + \sqrt{\delta } \frac{\mathrm {d}V}{\mathrm {d}t}. \end{aligned}$$

Assume additionally that the centering condition

$$\begin{aligned} \int _{{\mathbb {T}}^m} b(x,y) \rho _\infty ^\delta (y;x) ~\mathrm {d}y = 0 \end{aligned}$$
(1.9)

is satisfied for all \(x \in {\mathbb {R}}^d\) and \(\delta > 0\). Then, due to the uniform ellipticity of \({\mathcal {L}}_1^\delta \) for \(\delta > 0\), applying the Fredholm alternative [38, Theorem 7.9] gives the existence of a unique centered solution \(\Phi ^\delta (y;x)\) of the so-called cell problem

$$\begin{aligned} - {\mathcal {L}}_1^\delta \Phi ^\delta (y;x) = b(x,y), {\quad } \int _{{\mathbb {T}}^m} \Phi ^\delta (y;x) \rho _\infty ^\delta (y;x) ~\mathrm {d}y = 0. \end{aligned}$$
(1.10)

Using perturbation expansion techniques, which we will discuss in more detail in Sect. 2.3, it can be shown that \(u^{\varepsilon ,\delta }\) can be approximated by the leading order component \(u_0^\delta \) which satisfies

$$\begin{aligned} \frac{\mathrm {d}u_0^\delta }{\mathrm {d}t} = {\mathcal {L}}^{0,\delta }u_0^\delta , \end{aligned}$$
(1.11)

where the operator \({\mathcal {L}}^{0,\delta }\) acts on the twice continuously differentiable functions with compact support \(C^2_{\text {c}}({\mathbb {R}}^d)\) via

$$\begin{aligned} {\mathcal {L}}^{0,\delta }u := F^\delta (x) \cdot \nabla _{x} u + \frac{1}{2} A^\delta (x)A^\delta (x)^\top : \nabla _x \nabla _{x} u, \end{aligned}$$
(1.12)

where the coefficients \(F^\delta \) and \(A^\delta \) depend on the solution \(\Phi ^\delta \) of the cell problem (1.10) and are given by

$$\begin{aligned} \begin{aligned} F^\delta (x)&:= \int _{{\mathbb {T}}^m} \Big (a(x,y) + (\nabla _x \Phi ^\delta (y;x))b(x,y) \Big )\rho _\infty ^\delta (y;x) ~\mathrm {d}y \\&= F_1^\delta (x) + F_0^\delta (x), \\ A^\delta (x)A^\delta (x)^\top&:=\frac{1}{2}\Big (A_0^\delta (x) + A_0^\delta (x)^\top \Big ),\\ A_0^\delta (x)&:= 2\int _{{\mathbb {T}}^m} b(x,y) \otimes \Phi ^\delta (y;x) \rho _\infty ^\delta (y;x)~{\text {d}}y. \end{aligned} \end{aligned}$$
(1.13)
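The last two lines of (1.13) only determine the product \(A^\delta (x)A^\delta (x)^\top \), not \(A^\delta (x)\) itself. The following sketch shows one possible way to obtain a factor numerically at a fixed \(x\): symmetrize a given matrix \(A_0^\delta (x)\) and take its Cholesky factor. The helper names and the numerical values are hypothetical, and the sketch assumes that the symmetrized matrix is positive definite, which is stronger than what (1.13) guarantees in general.

```python
import math


def symmetrize(a0):
    """Return (a0 + a0^T) / 2 for a square matrix given as nested lists."""
    n = len(a0)
    return [[0.5 * (a0[i][j] + a0[j][i]) for j in range(n)] for i in range(n)]


def cholesky(s):
    """Lower-triangular L with L L^T = s, for symmetric positive definite s."""
    n = len(s)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            acc = s[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                if acc <= 0.0:
                    raise ValueError("matrix is not positive definite")
                low[i][j] = math.sqrt(acc)
            else:
                low[i][j] = acc / low[j][j]
    return low


# Hypothetical value of A_0 at one point x (illustration only).
a0 = [[2.0, 1.0], [0.0, 2.0]]
sym = symmetrize(a0)     # plays the role of A A^T in (1.13)
factor = cholesky(sym)   # one admissible choice of A
```

Any other matrix square root of the symmetrized matrix would serve equally well, since only the product enters the generator (1.12).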

We are now ready to state our main theorems.

1.2 Main Results

In the following, let \((X^\varepsilon (t;\xi ,\eta ), Y^\varepsilon (t;\xi ,\eta ))\) denote the solution of the ODE (1.3) for any \(\varepsilon > 0\) and let \(C_0({\mathbb {R}}^d)\) denote the space of continuous functions vanishing at infinity, i.e., as \(\Vert x\Vert \rightarrow \infty \). Note that we still use the notation of Sect. 1.1. In addition we assume:

  1. (A4)

    There exists a generator \({\mathcal {L}}^{0, 0}\) of a strongly continuous semigroup \(T^{0,0}\) on \(C_0({\mathbb {R}}^d)\), with domain \(D\subset C_0({\mathbb {R}}^d)\) containing \(C^2_{\text {c}}({\mathbb {R}}^d)\), such that for all \(f \in C^2_{\text {c}}({\mathbb {R}}^d)\) we have

    $$\begin{aligned} \lim _{\delta \rightarrow 0}{\mathcal {L}}^{0, \delta } f = {\mathcal {L}}^{0, 0}f \quad \text {uniformly.} \end{aligned}$$
    (1.14)

Theorem A

Assume (A1)-(A4). Then, for every \(f \in C_0({\mathbb {R}}^d)\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \), there exists a subsequence \(\{\varepsilon _{k_m}\}_{m \ge 0}\) such that for \(m \rightarrow \infty \)

$$\begin{aligned} f(X^{\varepsilon _{k_m}}(t; \xi , \eta )) \rightarrow T^{0,0}(t) f (\xi ), \quad \text {uniformly in }\xi \in {\mathbb {R}}^d, \eta \in \Omega \text { and }t\in [0,{\hat{T}}], \end{aligned}$$

where \({\hat{T}}\) is any finite time.

Theorem A provides a convergence result of the original fast-slow system with sufficiently strong assumptions on the fast chaotic dynamics to a Markov process, whose correspondence with a reduced slow SDE is specified below in the context of Theorem B (see (1.22)). The notion of convergence is to be understood in a weak averaged sense but it does cover the coupled case. The proof of Theorem A is provided in Sect. 2.4. The second main result, Theorem B, gives sufficient conditions under which the main assumption (A4) in Theorem A is satisfied. Let us define the solution operator \(\phi _x^{\delta ,t}(y)\) of the fast equation for \(\varepsilon = 1\), solving, for a fixed \(x \in {\mathbb {R}}^d\), the SDE

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \phi _x^{\delta , t}(y) = g(x,\phi _x^{\delta , t}(y)) + \sqrt{\delta } \frac{\mathrm {d}V}{\mathrm {d}t}, {\quad } \phi _x^{\delta ,0}(y)= y. \end{aligned}$$
(1.15)

Note that \(\phi _x^{\delta ,t}(y)\) depends on a Brownian motion and, hence, is a stochastic process \(\phi _x^{\delta ,t}(y)(\omega )\), \(\omega \in \Lambda \). Furthermore, notice that the flow \(\phi _x^{0,t}\) is purely deterministic.

Theorem B

Assume that the unperturbed flow \(\phi _x^{0,t}\) has an ergodic invariant probability measure \(\mu ^0\) and summable stochastically stable decay of correlations \(C(t; x)\) in the sense of Definitions 3.2 and 3.5. Additionally, assume that (A1)-(A2) are satisfied and that the following centering condition holds

$$\begin{aligned} \int _{{\mathbb {T}}^m} b(x,y) ~\mathrm {d}\mu _x^\delta (y) =0 \quad \text {for all } x \in {\mathbb {R}}^d \text { and } \delta \ge 0. \end{aligned}$$
(1.16)

Then we have the following:

  1. 1.

    If \(g=g(y)\) is independent of x, then condition (A4) is satisfied.

  2. 2.

    In the general case that \(g=g(x,y)\), (A4) holds provided that the centering condition

    $$\begin{aligned} \int _{{\mathbb {T}}^m} \nabla _y b(x,y) ~\mathrm {d}\mu _x^\delta (y) = 0 \quad \text {for all } x \in {\mathbb {R}}^d \text { and } \delta \ge 0 \end{aligned}$$
    (1.17)

    and the growth assumption

    $$\begin{aligned} \int _0^\infty \sup _{x \in {\mathbb {R}}^d} \Big \{C(t; x) \parallel \nabla _x \phi _x^{0,t}(\cdot ) b(x,\cdot ) \parallel _\alpha \Big \}~\mathrm {d}t < \infty \end{aligned}$$
    (1.18)

    are satisfied (here, \(\parallel \cdot \parallel _{\alpha }\) denotes the \(\alpha \)-Hölder norm for some \(\alpha >0\)).

  3. 3.

    The operator \({\mathcal {L}}^{0,0}\) can be written as

    $$\begin{aligned} {\mathcal {L}}^{0,0}u = F^0(x) \cdot \nabla _x u + \frac{1}{2}A^0(x)A^0(x)^\top : \nabla _x \nabla _x u, \end{aligned}$$
    (1.19)

    where the diffusion coefficient \(A^0\) is given by

    $$\begin{aligned} \begin{aligned} A^0(x)A^0(x)^\top&= \frac{1}{2} \Big ( A_0^0(x) + A_0^0(x)^\top \Big ), \\ A_0^0(x)&= 2 \int _0^{\infty } \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T b(x, \phi _x^{0,s}(y)) b \Big (x,\phi _x^{0,t+s}(y)\Big )~ \mathrm {d}s ~\mathrm {d}t. \end{aligned} \end{aligned}$$
    (1.20)

    and the drift term \(F^0\) is given by

    $$\begin{aligned} F^0 (x)&= \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T a(x,\phi _x^{0,s}(y)) ~\mathrm {d}s \nonumber \\&\quad + \int _0^\infty \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T \Bigg (\nabla _x b\Big (x ,\phi _x^{0,t+s}(y)\Big ) \nonumber \\&\quad + \nabla _yb\Big (x,\phi _x^{0,t+s}(y)\Big )\nabla _x\phi _x^{0,t}(\phi _x^{0,s}(y)) \Bigg ) b\Big (x,\phi _x^{0,s}(y) \Big ) ~\mathrm {d}s ~\mathrm {d}t. \end{aligned}$$
    (1.21)

Theorem B is proven at the end of Sect. 3. Note that the Markov process X generated by \({\mathcal {L}}^{0,0}\) is explicitly given by the SDE

$$\begin{aligned} \mathrm {d}X = F^0(X)~\mathrm {d}t + A^0(X)~\mathrm {d}W, {\quad } X(0) = \xi \in {\mathbb {R}}^d, \end{aligned}$$
(1.22)

whose unique solvability is guaranteed by the smoothness and boundedness assumptions (A1), (A2). Moreover, the action of the semigroup \(T^{0,0}f\) is given by \({\mathbb {E}}[f(X(t))]\). The growth assumption (1.18) is a strong mixing assumption on the flow and it remains to be determined precisely how large the class of functions satisfying this property is in applications (see remarks in Sect. 2.4). One possible way to weaken this assumption is to consider systems that are not coupled in the strongest possible sense, but for which the coupling occurs in smaller time scales. We refer to such systems as weakly-coupled and their general form is given by the following fast-slow ODE on \({\mathbb {R}}^d \times {\mathbb {T}}^m\)

$$\begin{aligned} \frac{\mathrm {d}x_\varepsilon }{\mathrm {d}t}&= a(x_\varepsilon , y_\varepsilon ) + \frac{1}{\varepsilon } b(x_\varepsilon , y_\varepsilon ), {\quad } x_\varepsilon (0) = \xi , \end{aligned}$$
(1.23a)
$$\begin{aligned} \frac{\mathrm {d}y_\varepsilon }{\mathrm {d}t}&= \frac{1}{\varepsilon ^2} g(y_\varepsilon ) + \frac{1}{\varepsilon } h(x_\varepsilon ,y_\varepsilon ) + r(x_\varepsilon ,y_\varepsilon ), {\quad }y_\varepsilon (0) = \eta . \end{aligned}$$
(1.23b)

Indeed, there are several examples of multiscale systems with interesting dynamical behaviour such as mixed-mode oscillations, where three time scales occur (see for example [12, 28, 29]). Furthermore, these three-scale systems are often very similar to related problems of van der Pol type, where rigorous proofs for chaos exist [25].

In the following, let \((X^\varepsilon (t;\xi ,\eta ), Y^\varepsilon (t;\xi ,\eta ))\) be the solution of the ODE (1.23). In this case, the solution operator \(\phi ^{\delta ,t}\) for the fast dynamics of the stochastically perturbed system, given by

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \phi ^{\delta ,t}(y) = g(\phi ^{\delta ,t}(y)) + \sqrt{\delta } \frac{\mathrm {d}V}{\mathrm {d}t}, {\quad } \phi ^{\delta ,0}(y) = y, \end{aligned}$$
(1.24)

does not depend on x.

Theorem C

Assume (A1)-(A2) and

  1. 1.

    that the unperturbed flow \(\phi ^{0,t}\) has an ergodic invariant probability measure \(\mu ^0\), summable and stochastically stable decay of correlations C(t) in the sense of Definitions 3.2 and 3.5, and that the centering condition (1.16) is satisfied,

  2. 2.

    in the case that h does not vanish everywhere, additionally, that the centering condition (1.17) and the growth condition

    $$\begin{aligned} \int _0^\infty C(t) \sup _{x \in {\mathbb {R}}^d} \Big \{ \parallel \nabla _y \phi ^{0,t}(\cdot ) h(x,\cdot )\parallel _\alpha \Big \} ~\mathrm {d}t <\infty \end{aligned}$$
    (1.25)

    are both satisfied.

Then,

  1. 1.

    condition (A4) is satisfied and for every \(f \in C_0({\mathbb {R}}^d)\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \), there exists a subsequence \( \{ \varepsilon _{k_m} \}_{m \ge 0}\) such that

    $$\begin{aligned} f(X^{\varepsilon _{k_m}}(t; \xi , \eta )) \rightarrow T^{0,0}(t) f(\xi ), \quad \text {uniformly in }\xi \in {\mathbb {R}}^d, \eta \in \Omega \text { and }t\in [0,{\hat{T}}]. \end{aligned}$$
  2. 2.

    The operator \({\mathcal {L}}^{0,0}\) can be written as

    $$\begin{aligned} {\mathcal {L}}^{0,0}u = {\tilde{F}}^0(x) \cdot \nabla _x u + \frac{1}{2}A^0(x)A^0(x)^\top : \nabla _x \nabla _x u, \end{aligned}$$
    (1.26)

    where \({\tilde{F}}^0\) is given by (1.28) and \(A^0\) is given by (1.27).

The proof of Theorem C is given with Theorem 4.1 below. Note once again that \(T^{0,0}(t)f ={\mathbb {E}}[f(X(t))]\), where the Markov process X is generated by \({\mathcal {L}}^{0,0}\). Moreover, X solves the SDE (1.22) (with modified drift \({\tilde{F}}^0\) instead of \(F^0\)). Basically, Theorem C states that we have the desired convergence, where the growth assumption on the correlation function is relaxed in the sense that weakly-coupled fast-slow systems behave more like the skew-product case. More precisely, for weakly-coupled systems of the form (1.23),

  • with \(h \equiv 0\) (i.e. with coupling occurring only in the lowest possible time scale), summable decay of correlations (DOC) is sufficient, provided that it is stochastically stable in the sense of Definition 3.5. There are plenty of examples for systems with summable DOC, including Anosov flows with exponential DOC, like for instance geodesic flows on compact negatively curved surfaces [13] or contact Anosov flows [32], Axiom A flows with superpolynomial DOC (also called rapid mixing) [20] or non-hyperbolic flows with a stable \(C^{1+\alpha }\) foliation including some geometric Lorenz attractors [1], see also Sect. 2.2. The assumption of stochastically stable DOC is crucial; unfortunately, we currently lack any theory for proving that a given dynamical system satisfies this property. This may actually be difficult to prove and we leave it as an open problem for future research.

  • with non-vanishing h, the correlation function must satisfy the stronger assumption (1.25).

In summary, our results provide an entire scale of results from the more classical skew-product structure, via weak coupling to strong coupling.

Remark 1.2

The explicit formulas for \(A^0\) and \({\tilde{F}}^0\) for Theorem C are

$$\begin{aligned} \begin{aligned} A^0(x)A^0(x)^\top&= \frac{1}{2} \Big ( A_0^0(x) + A_0^0(x)^\top \Big ), \\ A_0^0(x)&= 2 \int _0^{\infty } \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T b(x, \phi ^{0,s}(y)) b \Big (x,\phi ^{0,t+s}(y)\Big ) ~\mathrm {d}s ~\mathrm {d}t. \end{aligned} \end{aligned}$$
(1.27)

and

$$\begin{aligned}&{\tilde{F}}^0 (x) = \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T a(x,\phi ^{0,s}(y)) ~\mathrm {d}s \nonumber \\&\quad + \int _0^\infty \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T \Bigg (\nabla _x b\Big (x,\phi ^{0,t+s}(y) \Big ) b\Big (x,\phi ^{0,s}(y)\Big ) \nonumber \\&\quad + \nabla _y b\Big (x,\phi ^{0,t+s}(y) \Big ) \nabla _y \phi ^{0,t} (\phi ^{0,s}(y)) h\Big (x,\phi ^{0,s}(y) \Big ) \Bigg ) ~\mathrm {d}s ~\mathrm {d}t. \end{aligned}$$
(1.28)
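In the scalar case \(d=1\), formula (1.27) reduces to twice the time integral of the autocorrelation of \(b(x,\cdot )\) along a single orbit of the fast flow. The following sketch approximates this quantity from equally spaced samples of \(b\) along an orbit at a fixed \(x\). The function names, the sampling step `h`, and the truncation lag `max_lag` are assumptions introduced for illustration; the samples are assumed to be centered in the sense of (1.16), and the truncated sum is only a reasonable approximation if the correlations are summable.

```python
def time_correlation(b, lag):
    """Time average of b(s) * b(s + lag) over the available samples."""
    n = len(b) - lag
    if n <= 0:
        raise ValueError("lag must be smaller than the number of samples")
    return sum(b[k] * b[k + lag] for k in range(n)) / n


def diffusion_coefficient(b, h, max_lag):
    """Approximate 2 * int_0^{max_lag*h} C(t) dt for scalar samples b."""
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    corr = [time_correlation(b, j) for j in range(max_lag + 1)]
    # Trapezoidal rule on the lag grid 0, h, ..., max_lag * h.
    integral = h * (0.5 * corr[0] + sum(corr[1:max_lag]) + 0.5 * corr[max_lag])
    return 2.0 * integral
```

For \(d=1\) the diffusion coefficient \(A^0(x)\) can then be taken as the square root of the returned value, provided the estimate is non-negative.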

1.3 Outline of the Paper

In Sect. 2 we first discuss the main idea of the proofs used in [26, 27] for proving weak convergence of the slow process in skew product systems (Sect. 2.1) and we also summarize some progress, which has been achieved in recent years, in proving mixing properties of certain classes of flows (Sect. 2.2). We then recall and extend in Sect. 2.3 some basic facts required for stochastic systems. In Sect. 2.4, we prove Theorem A, which provides criteria to guarantee weak convergence of the slow process for coupled systems. In Sect. 3, we then prove Theorem B, which gives sufficient conditions for verifying the main assumption in Theorem A and provides explicit formulas for the drift and diffusion coefficients of the limiting Itô SDE. In Sect. 4 we apply our theory to weakly-coupled systems: we transfer the results obtained for coupled systems leading to the proof of Theorem C (Sect. 4.1) and, in addition, discuss a numerical example (Sect. 4.2). Finally, in Sect. 5 we state our conclusions and discuss open problems and directions for further research.

2 From Skew Products to Coupled Systems

2.1 Main Idea Used in Previous Results

Before proving our main results, we want to quickly summarize the main idea used in [26] and [27] to study systems of the form (1.1). This provides suitable background for the reader and also shows that our approach to the problem works along a completely different route. The basic tool used in [26, 27] is the so-called Weak Invariance Principle (WIP) and the idea of the proof can be very easily illustrated in the special case of a multiplicative noise (considered in [26]), i.e., under the additional assumption that the vector-field b has a multiplicative structure

$$\begin{aligned} b(x,y) \equiv b(x) v(y), {\quad } b: {\mathbb {R}}^d \rightarrow {\mathbb {R}}^{d \times e}, v: \Omega \rightarrow {\mathbb {R}}^e. \end{aligned}$$
(2.1)

For simplicity, let us restrict in this section to the case that the vector field a is also independent of y, i.e., \(a = a(x)\). In this case the system (1.1) can be rewritten as

$$\begin{aligned} \mathrm {d}X_\varepsilon = a(X_\varepsilon )~\mathrm {d}t + b(X_\varepsilon )~\mathrm {d}W_\varepsilon , {\quad } X_\varepsilon (0;\eta )=\xi , \end{aligned}$$
(2.2)

where the family of random elements \(W_\varepsilon (\cdot ;\eta ) \in C([0,1],{\mathbb {R}}^e)\) is defined by

$$\begin{aligned} W_\varepsilon (t;\eta ):= \varepsilon v _{t\varepsilon ^{-2}}(\eta ), {\quad } v_t(\eta ) := \int _0^tv\circ \phi _s(\eta )\mathrm {d}s. \end{aligned}$$
(2.3)
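To make the rescaling in (2.3) concrete, the following sketch evaluates \(W_\varepsilon (t)\) from equally spaced samples of the observable along a fast orbit, assuming a scalar observable (\(e=1\)) and a left Riemann sum for the time integral. The function name `rescaled_path`, the sampling step `h`, and the input format are assumptions made for illustration.

```python
def rescaled_path(samples, h, eps, times):
    """W_eps(t) = eps * int_0^{t / eps^2} v(phi_s(eta)) ds via a left Riemann sum.

    samples[k] is assumed to hold v(phi_{k*h}(eta)).
    """
    # Running integral: integral[k] approximates the integral over [0, k*h].
    integral = [0.0]
    for value in samples:
        integral.append(integral[-1] + h * value)
    out = []
    for t in times:
        # Index of the fast time t / eps^2 on the sampling grid.
        k = int(round(t / (eps * eps * h)))
        if k >= len(integral):
            raise ValueError("not enough samples for the requested time")
        out.append(eps * integral[k])
    return out
```

Note that evaluating \(W_\varepsilon \) on \([0,1]\) requires the fast orbit up to time \(\varepsilon ^{-2}\), so the number of samples needed grows like \(\varepsilon ^{-2}/h\).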

The key observation now is that if the flow \(\phi _s\) is sufficiently chaotic, then the process \(W_\varepsilon \) satisfies the WIP

$$\begin{aligned} W_\varepsilon \rightarrow _w W \quad \text {in } C([0,1], {\mathbb {R}}^e) \text { as } \varepsilon \rightarrow 0, \end{aligned}$$
(2.4)

which is a generalization of the Central Limit Theorem. Therefore, we are already tempted to conclude weak convergence of the slow process \(X_\varepsilon \). The framework under which this intuitive idea has been rigorously justified is rough path theory [21]. Equation (2.2) can be interpreted as a rough differential equation

$$\begin{aligned} \mathrm {d}X_{\varepsilon } = a(X_\varepsilon )~\mathrm {d}t + b(X_\varepsilon )~\mathrm {d}(W_\varepsilon ,{\mathbb {W}}_\varepsilon ), \quad X_\varepsilon (0;\eta ) = \xi . \end{aligned}$$

Noticing further, as shown in [26], that for any \(\gamma > \frac{1}{3}\) an iterated WIP, i.e.

$$\begin{aligned} (W_\varepsilon , {\mathbb {W}}_\varepsilon ) \rightarrow _w (W, {\mathbb {W}}) \quad \text {as }\varepsilon \rightarrow 0\hbox { in the rough path }\rho _\gamma \text { topology}, \end{aligned}$$
(2.5)

holds, one can conclude, due to continuity of the solution map of such rough differential equations [21] and the Continuous Mapping Theorem, the weak convergence of the slow process, i.e. a result of the form

$$\begin{aligned} X_\varepsilon \rightarrow _w X \text { as }\varepsilon \rightarrow 0, \quad \mathrm {d}X = a(X)~\mathrm {d}t + b(X)*\mathrm {d}W, \end{aligned}$$
(2.6)

where \(b(X)*\mathrm {d}W\) is a certain kind of stochastic integral [26]. More general vector fields b are considered in [27] and the main idea is to rewrite the system (1.1) in the form

$$\begin{aligned} \mathrm {d}X_\varepsilon = F(X_\varepsilon ) ~\mathrm {d}V_\varepsilon + H(X_\varepsilon )~\mathrm {d}W_\varepsilon , \end{aligned}$$

where \(V_\varepsilon \) and \(W_\varepsilon \) are function space valued paths given by

$$\begin{aligned} V_\varepsilon (t) = \int _0^t a(\cdot , y_\varepsilon (r))~\mathrm {d}r \text { and } W_\varepsilon (t) = \varepsilon ^{-1} \int _0^t b(\cdot , y_\varepsilon (r))~\mathrm {d}r. \end{aligned}$$

In this context, the operators F(x), H(x) are interpreted as Dirac distributions located at x, that is \(F(x) \phi = \phi (x)\) for any \(\phi \) in the function space and similarly for H. Under mixing assumptions the iterated WIP (2.5) holds and as in the case of multiplicative noise one can then conclude a result of the form (2.6). Exact formulas of the drift and diffusion coefficients are also given in [27]. In summary, the approach relies upon a pathwise viewpoint and continuity in the rough-path topology to solutions of ODEs/SDEs. Yet, this approach seems to be very difficult to generalize if the fast-slow system is fully coupled. In particular, this has motivated our approach to look for weaker convergence concepts in a more functional-analytic setting.

2.2 Rates of Mixing for Classes of Flows

In the following, we briefly give an overview of rigorous results on mixing rates of certain classes of flows that thereby satisfy summable decay of correlations in the sense of Definition 3.2. Given a measure-preserving flow \(\phi _t: \Lambda \rightarrow \Lambda \), the correlation function is defined as

$$\begin{aligned} \rho _{A,B}(t) := \int _{\Lambda } (A \circ \phi _t) B ~\mathrm {d}\mu - \int _{\Lambda } A ~\mathrm {d}\mu \int _{\Lambda } B ~\mathrm {d}\mu \end{aligned}$$

for observables \(A, B \in L^2(\Lambda , \mu )\). The flow \(\phi _t\) is called mixing if and only if \(\rho _{A,B}(t)\rightarrow 0\) as \(t \rightarrow \infty \) for all \(A, B \in L^2(\Lambda , \mu )\) (see e.g. [34]).
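For an ergodic flow, the space averages in the definition of \(\rho _{A,B}\) can be replaced by time averages along a typical orbit, which suggests a simple empirical estimator. The following sketch assumes that the two observables have been sampled along the same orbit at equally spaced times; the function name and the convention that the lag is measured in samples are assumptions made for illustration.

```python
def empirical_correlation(a, b, lag):
    """Time-average estimate of rho_{A,B} at the given lag (in samples).

    a[k] and b[k] are assumed to hold A and B evaluated at the same orbit
    point at time k * h for some fixed sampling step h.
    """
    if len(a) != len(b):
        raise ValueError("both observables need the same number of samples")
    n = len(a) - lag
    if n <= 0:
        raise ValueError("lag must be smaller than the number of samples")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    # A is evaluated `lag` steps later than B, mirroring (A o phi_t) * B.
    cross = sum(a[k + lag] * b[k] for k in range(n)) / n
    return cross - mean_a * mean_b
```

A decay of this estimate towards zero with growing lag is consistent with mixing for the chosen pair of observables, but a finite sample can of course neither prove mixing nor determine the rate rigorously.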

2.2.1 Uniformly Hyperbolic Flows

Assume that the flow \(\phi _t: M \rightarrow M\) is \(C^2\) and defined on a compact manifold M. An invariant compact set \(\Lambda \subset M\) is a hyperbolic set for \(\phi _t\), provided that the tangent bundle over \(\Lambda \) admits a continuous \(D\phi _t\)-invariant splitting

$$\begin{aligned} T_\Lambda M = E^u \oplus E^0 \oplus E^s \end{aligned}$$

into uniformly expanding (\(E^u\)), flow (\(E^0\)) and uniformly contracting (\(E^s\)) directions. For an Axiom A (uniformly hyperbolic) flow the dynamics can be reduced to finitely many hyperbolic sets \(\Lambda _1, \ldots , \Lambda _k\), called hyperbolic basic sets, each of which contains a dense orbit. On every hyperbolic basic set \(\Lambda = \Lambda _i\), for \(i\in \{1,\ldots ,k \}\), we can associate to every Hölder function on \(\Lambda \) a unique invariant ergodic probability measure \(\mu \). We can further categorize Axiom A flows depending on the speed of mixing. For example, for flows with exponential DOC, the correlation function, restricted to a suitable subspace of \(L^2(\Lambda , \mu )\) (like, for example, an appropriate Hölder space), satisfies

$$\begin{aligned} \rho _{A,B}(t) \le C(A,B) e^{-\alpha t}, \quad \forall t>0, \end{aligned}$$

for constants \(C, \alpha >0\). This was proven for example for certain classes of Anosov flows (i.e. special types of Axiom A flows for which the whole set M is uniformly hyperbolic) like geodesic flows on compact negatively curved surfaces [13] and contact Anosov flows [32]. Apart from exponential DOC we also have weaker notions, such as stretched exponential mixing, i.e. for some constant \(0 <\beta \le 1\)

$$\begin{aligned} \rho _{A,B}(t) \le C(A,B) e^{-\alpha t^\beta }, \quad \forall t>0, \end{aligned}$$

which was proven for a large class of Anosov flows in dimension 3 [11], and superpolynomial decay (or rapid mixing), i.e. for any \(n>0\) the correlation function satisfies

$$\begin{aligned} \rho _{A,B}(t) \le C(A,B) t^{-n}, \quad \forall t>0, \end{aligned}$$

or in other words, DOC at an arbitrary polynomial rate. Dolgopyat [14] proved rapid mixing for “typical” Axiom A flows. Moreover, he has shown that an open and dense set of Axiom A flows is rapid mixing, when restricted to sufficiently smooth observables [15]. For all mentioned classes of mixing flows, the correlation is summable, that is, we have

$$\begin{aligned} \int _0^\infty \rho _{A,B}(t) dt < \infty . \end{aligned}$$

2.2.2 Non-uniformly Hyperbolic Flows

Since the assumption of uniform hyperbolicity might be too restrictive for real applications, it is natural to seek a good mixing theory for non-uniformly hyperbolic flows. Over the last few years remarkable progress has been achieved in this area; see e.g. [34] and references therein for a good overview of results in this direction. For example, in [1], extending results from [4], exponential DOC is proven for a class of non-uniformly hyperbolic skew-product flows satisfying a uniform integrability condition, which contains an open set of geometric Lorenz attractors. Moreover, in [6], for certain types of Gibbs-Markov flows, including intermittent solenoidal flows and various Lorentz gas models including the infinite horizon Lorentz gas, polynomial DOC of the correlation function

$$\begin{aligned} \rho _{A,B}(t) \le C(A,B) t^{-(\beta -1)} \quad \forall t>0, \end{aligned}$$

with \(\beta >1\), is proven. For such flows, the DOC is summable, provided that \(\beta >2\).
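The threshold \(\beta > 2\) can be checked directly from the polynomial bound. In the following short computation the lower integration limit 1 is our own choice, made only to avoid the singularity of the bound at \(t = 0\); on \([0,1]\) the correlation function is bounded for \(A, B \in L^2(\Lambda , \mu )\) by the Cauchy-Schwarz inequality.

$$\begin{aligned} \int _1^\infty t^{-(\beta -1)} ~\mathrm {d}t = \lim _{T \rightarrow \infty } \frac{T^{2-\beta } - 1}{2-\beta } = \frac{1}{\beta -2} \quad \text {for } \beta > 2, \end{aligned}$$

whereas for \(1 < \beta \le 2\) the integral of the bound diverges (logarithmically in the borderline case \(\beta = 2\)).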

2.3 Basic Facts for Stochastic Systems

Let us now come back to the coupled systems (1.3). In the following we use the notation from Sect. 1.1. If we further consider the Banach space \(X:= (C_0({\mathbb {R}}^d\times {\mathbb {T}}^m) , \Vert \cdot \Vert _\infty )\) of continuous functions on \({\mathbb {R}}^d\times {\mathbb {T}}^m\) which vanish as \(\Vert x\Vert _2 \rightarrow \infty \), equipped with the usual supremum norm, it can be shown (cf. Lemma A.3 in the Appendix) that the closure \(\bar{{\mathcal {L}}_1}^\delta \) generates an ergodic strongly continuous contraction semigroup \(\{ S^\delta (t) \}_{t\ge 0}\) on X (in the sense of Definition A.1) and \(\bar{{\mathcal {L}}}^{\varepsilon ,\delta }\) generates a strongly continuous contraction semigroup on X denoted by \(\{T^{\varepsilon ,\delta }(t) \}_{t\ge 0}\). Let \({\mathcal {P}}^\delta \) be the projection corresponding to the ergodic semigroup produced by \({\mathcal {L}}_1^\delta \), acting on X explicitly via

$$\begin{aligned} {\mathcal {P}}^\delta f (x,y) := \int _{{\mathbb {T}}^m} f(x,y) \rho _\infty ^\delta (y;x) ~\mathrm {d}y, f\in X. \end{aligned}$$
(2.7)

The perturbation expansion

$$\begin{aligned} u^{\varepsilon ,\delta } = u_0^\delta + \varepsilon u_1^\delta + \varepsilon ^2 u_2^\delta + \cdots , \end{aligned}$$
(2.8)

leads, as shown for instance in [38] and [22] (cf. Sect. B in the Appendix for completeness), to the following equation for the leading order term \(u_0^\delta \):

$$\begin{aligned} \frac{\mathrm {d}u_0^\delta }{\mathrm {d}t}(x,t)&= \int _{{\mathbb {T}}^m}\rho _\infty ^\delta (y;x){\mathcal {L}}_3 u_0^\delta (x,t) ~\mathrm {d}y - \int _{{\mathbb {T}}^m}\rho _\infty ^\delta (y;x){\mathcal {L}}_2\Big ({\mathcal {L}}_1^\delta \Big )^{-1} {\mathcal {L}}_2 u_0^\delta (x,t)~\mathrm {d}y \nonumber \\&= \Big ({\mathcal {P}}^\delta {\mathcal {L}}_3 {\mathcal {P}}^\delta - {\mathcal {P}}^\delta {\mathcal {L}}_2\Big ({\mathcal {L}}_1^\delta \Big )^{-1}{\mathcal {L}}_2 {\mathcal {P}}^\delta \Big ) u_0^\delta (x,t) \nonumber \\&=: {\mathcal {L}}^{0,\delta } u_0^\delta . \end{aligned}$$
(2.9)

The operator \({\mathcal {L}}^{0,\delta }\) acting on the right side of equation (2.9) can be more precisely evaluated, using the function \(\Phi ^\delta \) defined in (1.10). As shown in [38], equation (2.9) can be rewritten as

$$\begin{aligned} \frac{\mathrm {d}u_0^\delta }{\mathrm {d}t}&= F^\delta (x) \cdot \nabla _{x} u_0^\delta + \frac{1}{2} A^\delta (x)A^\delta (x)^\top : \nabla _x \nabla _{x} u_0^\delta \nonumber \\&= {\mathcal {L}}^{0,\delta }u_0^\delta , \end{aligned}$$
(2.10)

where the drift and diffusion coefficients are given by (1.13) and \({\mathcal {L}}^{0,\delta }u_0^\delta \) is given by (1.11).
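Since (2.10) is the backward Kolmogorov equation of a diffusion with drift \(F^\delta \) and diffusion matrix \(A^\delta \), its solution with initial datum \(f\) can be approximated by Monte Carlo simulation of \({\mathbb {E}}[f(X(t;x))]\), assuming the associated SDE is well-posed. The following is a minimal one-dimensional sketch using the Euler-Maruyama scheme. The coefficients \(F(x) = -x\), \(A(x) = 0.5\) and the function name `u0_monte_carlo` are hypothetical and chosen only because the mean is then known in closed form; they are not the coefficients (1.13).

```python
import numpy as np

def u0_monte_carlo(f, drift, diffusion, x0, t, dt=0.01, n_paths=20_000, seed=0):
    """Estimate E[f(X(t; x0))] for the scalar SDE dX = drift(X) dt + diffusion(X) dW."""
    rng = np.random.default_rng(seed)
    x = np.full(n_paths, float(x0))
    for _ in range(int(round(t / dt))):
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)   # Brownian increments
        x = x + drift(x) * dt + diffusion(x) * dw          # Euler-Maruyama step
    return f(x).mean()

# Hypothetical coefficients (an Ornstein-Uhlenbeck process): F(x) = -x, A(x) = 0.5.
# f(x) = x is not an element of C_0; it is used only because E[X(t)] = x0 * exp(-t).
estimate = u0_monte_carlo(
    f=lambda x: x,
    drift=lambda x: -x,
    diffusion=lambda x: 0.5 * np.ones_like(x),
    x0=1.0,
    t=1.0,
)
```

The estimate should agree with \({\text {e}}^{-1}\) up to the time-discretization bias and the Monte Carlo error.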

The major disadvantage of the formulas (1.13) is that they use the solution \(\Phi ^\delta \) of the cell problem, which is not well-posed for \({\mathcal {L}}_1^0\), or in other words, in the case of purely deterministic systems. However, there are alternative expressions which are more suitable for deterministic systems. They are already proven in [38], but for convenience we include them in Lemma 2.2 below, since we require some minor changes. The alternative expressions use the solution operator \(\phi _x^{\delta ,t}(y)\) of the fast dynamics given by (1.15). Recall that \({\mathbb {E}}\) denotes the expectation with respect to Wiener measure \(\nu \) on \(\Lambda \) and further let \({\mathbb {E}}^{\mu _x^\delta \otimes \nu }\) denote the expectation with respect to the product measure \(\mu _x^\delta \otimes \nu \), where \(\mu _x^\delta \) is the ergodic measure defined in (1.8).

Lemma 2.1

(Differentiability of the solution operator with respect to x) There exists a version of the stochastic process \(\phi _x^{\delta ,t}\) such that for almost all (a.a.) \(\omega \in \Lambda \) the map \(x \mapsto \phi _x^{\delta ,t}\) is continuously differentiable for every t and the differential \(\nabla _{x}\phi _x^{\delta ,t}(y) \in {\mathbb {R}}^{m\times d}\) satisfies the linear ODE

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \nabla _{x}\phi _x^{\delta ,t}(y) = \nabla _x g(x,\phi _x^{\delta ,t}(y)) + \nabla _y g(x,\phi _x^{\delta ,t}(y))\nabla _{x}\phi _x^{\delta ,t}(y), \quad \nabla _x\phi _x^{\delta ,0}(y) = 0.\qquad \end{aligned}$$
(2.11)

Proof

This follows from [36, Theorem 4.2], where we set \(v^x(t):= y +\sigma _2\frac{\mathrm {d}V}{\mathrm {d}t}\), \(u:= x\) and \(\mathrm {d}Z_s:= \mathrm {d}t\) such that \(\phi _x(t) = v^x(t) + \int _0^tg(x,\phi _x(s))~\mathrm {d}Z_s\), and observe that all assumptions are satisfied since g has bounded derivatives up to order two. \(\square \)

Lemma 2.2

(Alternative representations of the coefficients of the limiting SDE) Fix a \(\delta > 0\). We have the following alternative formulas for the vector fields \(F_0^\delta (x), F_1^\delta (x)\) and the diffusion matrix \(A_0^\delta (x)\) from equation (1.13): For all \(y\in {\mathbb {T}}^m\) and for a.a. \(\omega \in \Lambda \) we have

$$\begin{aligned} F_1^\delta (x) = \lim _{T\rightarrow \infty } \frac{1}{T} \int _0^{T} a \Big ( x,\phi _x^{\delta ,s}(y)(\omega ) \Big )~\mathrm {d}s \end{aligned}$$
(2.12)

and

$$\begin{aligned} A_0^\delta (x) = 2 \int _0^{\infty } \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T b(x, \phi _x^{\delta ,s}(y)(\omega ))\otimes {\mathbb {E}}[ b \Big (x,\phi _x^{\delta ,t} (\phi _x^{\delta ,s}(y)(\omega )) \Big )] ~\mathrm {d}s ~\mathrm {d}t,\qquad \end{aligned}$$
(2.13)

and if there exists a function D(t) such that

$$\begin{aligned} \nabla _x \Big ( {\mathbb {E}}[b(x,\phi _x^{\delta ,t}(y))] \Big ) \le D(t), \text { for all }x \in {\mathbb {R}}^d, \quad \int _0^\infty D(t) ~\mathrm {d}t <\infty , \end{aligned}$$
(2.14)

then, it holds also that

$$\begin{aligned} F_0^{\delta }(x)&= \int _0^\infty \Bigg (\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T {\mathbb {E}} \Big [ \nabla _x b\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)(\omega ))\Big ) \nonumber \\&\quad + \nabla _yb\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)(\omega )) \Big )\nabla _x \phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)(\omega )) \Big ] b\Big (x,\phi _x^{\delta ,s}(y)(\omega ) \Big )~ \mathrm {d}s \Bigg ) ~\mathrm {d}t. \end{aligned}$$
(2.15)

Proof

We follow the proof given in [38, Chapter 11]. We first calculate

$$\begin{aligned} \Phi ^\delta (y;x)&= \int _0^\infty ({\text {e}}^{{\mathcal {L}}_1^\delta t}b)(x,y) ~\mathrm {d}t&(\text {by }[38, \text {Result } 11.8]) \\&= \int _0^\infty {\mathbb {E}} [ b(x,\phi _x^{\delta ,t}(y)) ] ~\mathrm {d}t&(\text {by }[38, \text {Theorem } 6.6]) \end{aligned}$$

Thus, using Fubini’s theorem,

$$\begin{aligned} A_0^\delta (x)&= 2\int _{{\mathbb {T}}^m} b(x,y) \otimes \Phi ^\delta (y;x) \rho _\infty ^\delta (y;x)~\mathrm {d}y\\&= 2 \int _{{\mathbb {T}}^m} b(x,y) \otimes \int _0^\infty {\mathbb {E}} [ b(x,\phi _x^{\delta ,t}(y)) ] ~\mathrm {d}t ~ \rho _\infty ^\delta (y;x)~\mathrm {d}y \\&= 2\int _0^\infty \int _{{\mathbb {T}}^m} b(x,y) \otimes \int _\Lambda b(x,\phi _x^{\delta ,t}(y)(\omega )) ~\mathrm {d}\nu (\omega ) \rho ^\delta _\infty (y;x) ~\mathrm {d}y ~\mathrm {d}t\\&= 2 \int _0^{\infty } {\mathbb {E}}^{\mu _x^\delta \otimes \nu }[b(x,y) \otimes b(x,\phi _x^{\delta ,t}(y))] ~\mathrm {d}t. \end{aligned}$$

Setting \(h(x,y;t):= b(x,y) \otimes {\mathbb {E}} [b(x,\phi _x^{\delta ,t}(y))]\) we get from [38, Theorem 6.16] that for a.a. \(\omega \in \Lambda \) we have

$$\begin{aligned} \int _{{\mathbb {T}}^m} h(x,y;t) \rho _\infty ^\delta (y;x) ~\mathrm {d}y = \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T h(x,\phi _x^{\delta ,s}(y)(\omega );t)~\mathrm {d}s \end{aligned}$$

and by inserting into the expression for \(A_0^\delta (x)\) we get that for a.a. \(\omega \in \Lambda \) equation (2.13) is satisfied. Analogously (noticing that condition (2.14) allows us to interchange the order of integration and the \(\nabla _x\) operator),

$$\begin{aligned} F_0^\delta (x)&= \int _{{\mathbb {T}}^m} \rho _\infty ^\delta (y;x)\nabla _x \Phi ^\delta (y;x)b(x,y)~\mathrm {d}y\\&= \int _{{\mathbb {T}}^m} \rho _\infty ^\delta (y;x) \nabla _x [\int _0^\infty {\mathbb {E}} [ b(x,\phi _x^{\delta ,t}(y))] ~\mathrm {d}t] b(x,y) ~\mathrm {d}y\\&= \int _{{\mathbb {T}}^m} \rho _\infty ^\delta (y;x) [\int _0^\infty \int _{\Lambda }\nabla _x \Big (b(x,\phi _x^{\delta ,t}(y)(\omega )) \Big ) ~\mathrm {d}\nu (\omega ) ~\mathrm {d}t] b(x,y) ~\mathrm {d}y \\&= \int _0^\infty \int _{{\mathbb {T}}^m} \rho _\infty ^\delta (y;x)\int _{\Lambda }\nabla _x \Big (b(x,\phi _x^{\delta ,t}(y)(\omega )) \Big ) ~\mathrm {d}\nu (\omega )b(x,y) ~\mathrm {d}y ~\mathrm {d}t \\&= \int _0^\infty {\mathbb {E}}^{\mu _x^\delta \otimes \nu }[\nabla _x \Big ( b(x,\phi _x^{\delta ,t}(y))\Big ) b(x,y) ]~\mathrm {d}t. \end{aligned}$$

By the chain rule we have that

$$\begin{aligned} \nabla _x \Big (b(x,\phi _x^{\delta ,t}(y)) \Big ) = \nabla _xb(x,\phi _x^{\delta ,t}(y)) + \nabla _yb(x,\phi _x^{\delta ,t}(y))\nabla _x\phi _x^{\delta ,t}(y). \end{aligned}$$

Thus, setting

$$\begin{aligned} h(x,y;t) := {\mathbb {E}} [\nabla _xb(x,\phi _x^{\delta ,t}(y)) + \nabla _yb(x,\phi _x^{\delta ,t}(y)) \nabla _x\phi _x^{\delta ,t}(y) ] b(x,y), \end{aligned}$$

we get equation (2.15) by [38, Theorem 6.16]. Now the expression for \(F_1^\delta \) follows directly from [38, Theorem 6.16]. \(\square \)
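The last line of the computation for \(A_0^\delta \) is a Green-Kubo type formula: twice the time integral of the stationary autocorrelation of b along the fast process. The following sketch evaluates such an integral from a single long trajectory, in the spirit of the time-average representation (2.13). The setting is an assumption for illustration only: the fast process is a stationary Ornstein-Uhlenbeck process on \({\mathbb {R}}\) (not on the torus), \(b(y) = y\), the integral is truncated at a hypothetical cutoff `t_max`, and the function name `green_kubo` is our own. For this choice the autocorrelation is \({\text {e}}^{-t}\), so the exact value is 2.

```python
import numpy as np

def green_kubo(b_values, dt, t_max):
    """Approximate 2 * int_0^t_max C(t) dt, where C(t) is the time-averaged
    autocorrelation of the scalar series b_values sampled with step dt."""
    n_lags = int(round(t_max / dt))
    n = len(b_values) - n_lags
    # time average of b(y_s) * b(y_{s+t}) over the trajectory, for each lag t
    corr = np.array([np.dot(b_values[:n], b_values[k:k + n]) / n
                     for k in range(n_lags + 1)])
    # trapezoidal rule in the lag variable t
    integral = dt * (corr.sum() - 0.5 * (corr[0] + corr[-1]))
    return 2.0 * integral

# Hypothetical fast process: dy = -y dt + sqrt(2) dW, sampled exactly on a grid
# and started in its invariant measure N(0, 1); b(y) = y is centered.
rng = np.random.default_rng(0)
dt, n_steps = 0.05, 400_000
decay, noise = np.exp(-dt), np.sqrt(1.0 - np.exp(-2.0 * dt))
xi = rng.standard_normal(n_steps)
y = np.empty(n_steps)
y[0] = xi[0]
for i in range(1, n_steps):
    y[i] = decay * y[i - 1] + noise * xi[i]

A0 = green_kubo(y, dt, t_max=8.0)
```

The cutoff has to be large compared to the correlation time but small compared to the trajectory length; otherwise either the truncation error or the statistical error of the estimate dominates.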

Finally, let \((T^{0,\delta }(t))_{t \ge 0}\) denote the corresponding semigroup of the generator \({\mathcal {L}}^{0,\delta }\) on \(C_0({\mathbb {R}}^d)\). The key fact that we use in the following is that the semigroup \((T^{\varepsilon ,\delta }(t))_{t \ge 0}\) converges towards \((T^{0,\delta }(t))_{t \ge 0}\) as \(\varepsilon \rightarrow 0\), as stated in Theorem A.4. A similar result has been proven by Kurtz [31]; it is formulated and shown in the Appendix for the reader’s convenience. We are now ready to state the main result of this section.

2.4 Main Result for Coupled Systems

In the following, let \(\{T^{\varepsilon ,0}(t)\}_{t \ge 0}\) denote the semigroup on X generated by \({\mathcal {L}}^{\varepsilon ,0}\), which is defined as in (1.6) with \(\delta = 0\). Similarly we consider the generator \(\bar{{\mathcal {L}}}^{0,0}\) for the strongly continuous semigroup \(T^{0,0}(t)\) on \(C_0({\mathbb {R}}^d)\).

Theorem 2.3

Under the assumptions (A1)-(A4), it follows that for every \(f \in C_0({\mathbb {R}}^d)\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \), there exists a subsequence \( \{ \varepsilon _{k_m} \}_{m \ge 0}\) such that for any finite time \({\hat{T}}>0\)

$$\begin{aligned} \lim _{m \rightarrow \infty } \sup _{0 \le t \le {\hat{T}}} \parallel T^{\varepsilon _{k_m},0}(t)f - T^{0,0}(t)f \parallel _\infty = 0. \end{aligned}$$
(2.16)

Proof

Fix \(f \in C_0({\mathbb {R}}^d)\). We have by the triangle inequality

$$\begin{aligned} \begin{aligned} \parallel T^{\varepsilon , 0}(t) f - T^{0,0}(t) f \parallel _\infty&\le \parallel T^{\varepsilon , 0}(t)f - T^{\varepsilon ,\delta }(t)f \parallel _\infty + \parallel T^{\varepsilon ,\delta }(t)f - T^{0,\delta }(t)f \parallel _\infty \\&{\quad }+ \parallel T^{0,\delta }(t)f - T^{0,0}(t)f \parallel _\infty . \end{aligned}\nonumber \\ \end{aligned}$$
(2.17)

Further, due to the definition of the operator \({\mathcal {L}}_1^\delta \) we see immediately that for all \(f \in {\mathcal {D}}({\mathcal {L}}^{\varepsilon ,\delta })\)

$$\begin{aligned} \lim _{\delta \rightarrow 0}{\mathcal {L}}^{\varepsilon , \delta } f = {\mathcal {L}}^{\varepsilon ,0} f \quad \text {uniformly}. \end{aligned}$$
(2.18)

Due to equations (2.18) and (1.14) and by the Trotter-Kato Theorem (see for example [16, Theorem 4.8]) we observe that for any fixed \(\varepsilon >0\) the first and the last term on the right side of equation (2.17) can be made arbitrarily small as \(\delta \rightarrow 0\). For any fixed \(\delta > 0\), the second term can also be made arbitrarily small as \(\varepsilon \rightarrow 0\) due to Theorem A.4. To be more precise, let \( \{ \varepsilon _k \}_{k \ge 0} \) be a sequence with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \). Then we can find for every \(k \in {\mathbb {N}}\) a \(\delta _k > 0\) so that

$$\begin{aligned} \parallel T^{\varepsilon _k, 0}(t)f - T^{\varepsilon _k,\delta _k}(t)f \parallel _\infty + \parallel T^{0,\delta _k}(t)f - T^{0,0}(t)f \parallel _\infty < \frac{2\varepsilon _k}{3}. \end{aligned}$$

Moreover, for any \(k \in {\mathbb {N}}\) we can fix an \(l(k) \in {\mathbb {N}}\) so that

$$\begin{aligned} \parallel T^{\varepsilon _{l(k)},\delta _k}(t)f - T^{0,\delta _k}(t)f \parallel _\infty < \frac{\varepsilon _k}{3}. \end{aligned}$$

In this way we get a subsequence \(\{ \varepsilon _{l(k)} \}_{k \ge 0 }\) for which

$$\begin{aligned} \parallel T^{\varepsilon _{l(k)}, 0}(t) f - T^{0,0}(t) f \parallel _\infty \le \varepsilon _k \end{aligned}$$

holds. The claim now follows by taking the limit \(k\rightarrow \infty \). \(\square \)

Remark 2.4

A sufficient condition for the key assumption (A4) to hold is that

$$\begin{aligned} F_0^\delta \rightarrow F_0^0, {\quad } F_1^\delta \rightarrow F_1^0, {\quad } A_0^\delta \rightarrow A_0^0 \quad \text {uniformly in }x, \end{aligned}$$
(2.19)

provided that the expressions \(F_0^0, F_1^0, A_0^0\) are well-defined, which requires sufficiently fast decay of correlations. Furthermore, Theorem B gives us precise conditions under which (A4) is satisfied. In the case that \(g= g(y)\) is independent of x, the posed assumptions are relatively mild.

Next, recall that for \(\varepsilon > 0\) we denote by \((X^\varepsilon (t;\xi ,\eta ), Y^\varepsilon (t;\xi ,\eta ))\) the solution of the ODE (1.3).

Corollary 2.5

Assume that (A1)-(A4) hold, that \({\mathcal {L}}^{0,0}\) can be written as in (1.19) and that SDE (1.22) has the solution X(t). Then for every \(f \in C_0({\mathbb {R}}^d)\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \) there exists a subsequence \( \{ \varepsilon _{k_m} \}_{m \ge 0}\) such that for \(m \rightarrow \infty \),

$$\begin{aligned} f(X^{\varepsilon _{k_m}}(t; \xi , \eta )) \rightarrow {\mathbb {E}}[f(X(t;\xi ))], \quad \text {uniformly in }\xi \in {\mathbb {R}}^d, \eta \in \Omega \subset {\mathbb {T}}^m\text { and }t\in [0,{\hat{T}}], \end{aligned}$$

where the expectation \({\mathbb {E}}\) is taken with respect to the Wiener measure (defined on \(\Lambda \)) of the Brownian motion W. In particular, it follows that for any Borel probability measure \(\mu \) on \({\mathbb {T}}^m\) we have

$$\begin{aligned} {\mathbb {E}}^{\mu } [f(X^{\varepsilon _{k_m}}(t))]\rightarrow {\mathbb {E}}[f(X(t))]\quad \text {uniformly in }t \in [0,{\hat{T}}]. \end{aligned}$$

Proof

The first statement follows immediately from Theorem 2.3, observing that \((T^{\varepsilon ,0}(t) f)(x) = f(X^{\varepsilon }(t; x)) \) and \((T^{0,0}(t) f) (x) = {\mathbb {E}}[f(X(t;x))]\). The last statement follows from the dominated convergence theorem. \(\square \)

Remark 2.6

Note that if there exists a unique solution to the SDE (1.22), then this is exactly the Markov process generated by \({\mathcal {L}}^{0,0}\), but Theorem A does not necessarily need this restriction. A sufficient condition for existence and uniqueness of solutions of the SDE is global Lipschitz continuity of the drift and diffusion coefficients, which follows in the more particular context of Theorems B and C via the ergodic formulas (1.20), (1.21), (1.27), (1.28) and Assumptions (A1), (A2). In general, we need Lipschitz continuity of the averaged vector field

$$\begin{aligned} {\bar{a}}(x):= \int _{{\mathbb {T}}^m} a(x,y) ~\mathrm {d}\mu _x^0(y), \end{aligned}$$

which demands sufficiently smooth dependence of the invariant measures \(\mu _x\) on the parameter x. This can be violated, if for example the fast dynamics exhibits bifurcations upon varying x. In fact, even continuity of \({\bar{a}}\) cannot be guaranteed in such cases. The problem of non-smooth dependence of the measures \(\mu _x\) is known in statistical physics as “no linear response” and can appear even in relatively simple dynamical systems [8, 9, 24]. See also the work of Baladi and coworkers on unimodal maps, i.e., [3, 5] and references therein.

Our next goal is to determine under which abstract assumptions on the original ODE problem condition (A4), that is equation (1.14), is satisfied.

3 Convergence of the Limiting Generator \({\mathcal {L}}^{0,\delta }\)

In this section we investigate requirements for condition (A4) to hold. This condition is the main assumption in Theorem 2.3 and the last missing piece for proving convergence of the first moments of the slow process for the coupled deterministic systems (1.3). Let us recall that the operator \({\mathcal {L}}^{0,\delta }\) is explicitly given by (1.12) where the drift term \(F^\delta \) and the diffusion matrix \(A^\delta \) are explicitly given by (1.13) and by the alternative expressions in Lemma 2.2. These alternative expressions use the solution operator \(\phi _x^{\delta ,t}\) solving equation (1.15). Thus, a first step towards proving (A4) is to understand the behavior of \(\phi _x^{\delta ,t}\) in the limit \(\delta \rightarrow 0\):

Lemma 3.1

(Behavior of the solution operator as \(\delta \rightarrow 0\)) Under the previous assumptions, the following statements are true:

(i):

For every \(T>0\) and \(\omega \in \Lambda \), there exists a positive constant \(\beta (T ,\omega ) > 0\) (which is independent of x, y and \(\delta \)) such that:

$$\begin{aligned} | \phi _{x}^{\delta ,t}(y) - \phi _{x}^{0,t} (y) |_\infty \le \sqrt{\delta } \beta (T ,\omega ), \end{aligned}$$
(3.1)

where \(|\cdot |_\infty \) denotes the supremum norm in \({\mathbb {R}}^m\). This implies that for all \(\omega \in \Lambda \) we have

$$\begin{aligned} \phi _{x}^{\delta ,t}(y) \rightarrow \phi _{x}^{0,t} (y) \quad \text {as }\delta \rightarrow 0\text { uniformly in }x, y\text { and }t \in [0,T]. \end{aligned}$$
(3.2)

Furthermore, it holds that

$$\begin{aligned} {\mathbb {E}}[| (\phi _{x}^{\delta ,t}(y)) - \phi _{x}^{0,t} (y) |_\infty ] \le \sqrt{\delta } \beta (T), \end{aligned}$$
(3.3)

where \(\beta (T):= {\mathbb {E}}\left[ \beta (T ,\omega )\right] < \infty \).

(ii):

There exists a version of the stochastic process \(\phi _x^{\delta ,t}(y)\) such that for a.a. \(\omega \in \Lambda \) the map \(x \mapsto \phi _x^{\delta ,t}(y)\) is continuously differentiable for every t and the gradient \(\nabla _{x}\phi _x^{\delta ,t}(y)\) satisfies the linear ODE

$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} \nabla _{x}\phi _x^{\delta ,t}(y) = \nabla _x g(x,\phi _x^{\delta ,t}(y)) + \nabla _y g(x,\phi _x^{\delta ,t}(y))\nabla _{x}\phi _x^{\delta ,t}(y), \quad \nabla _x\phi _x^{\delta ,0}(y) = 0.\nonumber \\ \end{aligned}$$
(3.4)

Furthermore, for a.a. \(\omega \in \Lambda \) we have

$$\begin{aligned} \nabla _x \phi _x^{\delta ,t}(y) \rightarrow \nabla _x \phi _x^{0,t}(y) \quad \text {as }\delta \rightarrow 0\hbox { uniformly in }x,y\hbox { and }t \in [0,T]. \end{aligned}$$
(3.5)

Proof

(i) :

Due to the definition of the solution operator, it follows immediately that for any \(t \in [0,T]\)

$$\begin{aligned} | \phi _{x}^{\delta ,t} (y) - \phi _{x}^{0,t} (y)|_\infty&\le \int _0^t |g(x,\phi _{x}^{\delta ,s} (y) ) - g(x,\phi _{x}^{0,s} (y) )|_\infty ~\mathrm {d}s + \sqrt{\delta } |V(t) (\omega )|_\infty \\&\le C(x) \int _0^t | \phi _{x}^{\delta ,s} (y) - \phi _{x}^{0,s} (y)|_\infty ~\mathrm {d}s + \sqrt{\delta }|V(t)(\omega )|_\infty \\&\le {\tilde{C}} \int _0^t | \phi _{x}^{\delta ,s} (y) - \phi _{x}^{0,s} (y)|_\infty ~\mathrm {d}s + \sqrt{\delta } \underbrace{\sup _{t \in [0,T]}|V(t)(\omega )|_\infty }_{=:\alpha (T,\omega )}, \end{aligned}$$

where \({\tilde{C}} := \sup _{x \in {\mathbb {R}}^d} C(x) <\infty \) due to the boundedness of \(\nabla _yg\). Due to Gronwall’s lemma it follows that for all \(t\in [0,T]\)

$$\begin{aligned} | \phi _{x}^{\delta ,t} (y) - \phi _{x}^{0,t} (y)|_\infty \le \sqrt{\delta } \alpha (T ,\omega ) \exp ({\tilde{C}}T) = \sqrt{\delta } \beta (T ,\omega ), \end{aligned}$$
(3.6)

where we have set \(\beta (T,\omega ):= \alpha (T ,\omega ) \exp ({\tilde{C}}T)\). Further we see that

$$\begin{aligned} {\mathbb {E}} [\beta (T ,\cdot )]= {\text {e}}^{{\tilde{C}}T} {\mathbb {E}}[\alpha (T ,\cdot ) ]< \infty , \end{aligned}$$

which implies, by monotonicity of the integral, equation (3.3).

(ii) :

For the pathwise differentiability of the process \(\phi _x^{\delta ,t}\) see Lemma 2.1 (or [36, Theorem 4.2]). Due to (i) we see further that for a.a. \(\omega \in \Lambda \)

$$\begin{aligned} \nabla _x g(x,\phi _x^{\delta ,t}(y)) \rightarrow \nabla _x g(x,\phi _x^{0,t}(y)), \\ \nabla _y g(x,\phi _x^{\delta ,t}(y)) \rightarrow \nabla _y g(x,\phi _x^{0,t}(y)) \quad \text {as }\delta \rightarrow 0\text { uniformly in }x,y\text { and }t \in [0,T]. \end{aligned}$$

Hence, (3.5) is a consequence of the continuous dependence of solutions of the linear ODE (3.4) on its coefficients.

\(\square \)
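The estimate (3.1) can be checked numerically on a discretized version of the fast equation. The sketch below is an illustration under our own assumptions, not part of the proof: the fast vector field is the hypothetical choice \(g(y) = \sin (2\pi y)\) on the lift of the circle (Lipschitz constant \(2\pi \), no dependence on x), the equation \(\mathrm {d}y = g(y)\,\mathrm {d}t + \sqrt{\delta }\,\mathrm {d}V\) is solved with the explicit Euler scheme, and all values of \(\delta \) are driven by the same Brownian path. For the Euler scheme, a discrete Gronwall argument gives the same bound \(\sqrt{\delta }\, \alpha (T,\omega ) \exp ({\tilde{C}}T)\), with the supremum of \(|V|\) taken over the grid.

```python
import numpy as np

def euler_paths(g, y0, deltas, T=1.0, n_steps=2000, seed=0):
    """Euler scheme for dy = g(y) dt + sqrt(delta) dV, one Brownian path V for all deltas.

    Returns the paths (one row per delta) and alpha = sup_t |V(t)| on the grid.
    """
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    dV = rng.normal(0.0, np.sqrt(dt), size=n_steps)
    alpha = np.abs(np.cumsum(dV)).max()
    deltas = np.asarray(deltas, dtype=float)
    y = np.full(deltas.shape, float(y0))
    paths = [y.copy()]
    for k in range(n_steps):
        y = y + g(y) * dt + np.sqrt(deltas) * dV[k]
        paths.append(y.copy())
    return np.array(paths).T, alpha

# Hypothetical fast vector field with Lipschitz constant 2*pi.
g = lambda y: np.sin(2.0 * np.pi * y)
lipschitz, T = 2.0 * np.pi, 1.0
deltas = np.array([0.0, 1e-4, 1e-3, 1e-2])      # the first row is the deterministic flow
paths, alpha = euler_paths(g, y0=0.3, deltas=deltas, T=T)

sup_error = np.abs(paths[1:] - paths[0]).max(axis=1)   # sup_t |phi^delta - phi^0|
bound = np.sqrt(deltas[1:]) * alpha * np.exp(lipschitz * T)
```

Comparing `sup_error` with `bound` shows how conservative the Gronwall constant is on this example; the observed errors are typically far below the bound.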

After having understood the behavior of \(\phi _x^{\delta , t}\) in the limit \(\delta \rightarrow 0\) we now want to come back to the generator \({\mathcal {L}}^{0,\delta }\) given in (1.12). Its coefficients, which use the solution operator \(\phi _x^{\delta ,t}\), are given in Lemma 2.2. Seeing these expressions and Lemma 3.1 one might be tempted to conclude the convergence of \(F^\delta , A^\delta \) and as a consequence equation (1.14). Unfortunately, it is not that simple, because for general functions g the expressions \(F_0^0, F_1^0\) and \(A_0^0\) in Lemma 2.2 will not be well-defined. In fact, they are well-defined only when the flow \(\phi _x^{0,t}(y)\) has strong mixing properties. These considerations motivate the following definitions:

Definition 3.2

(Decay of correlations for deterministic systems) We say that the flow \(\phi _x^{0,t}(y)\) is mixing with decay of correlations \(C(t;x)\) provided that there exists an \(\alpha >0\) such that for all continuous functions \(v,w: {\mathbb {T}}^m \rightarrow {\mathbb {R}}\), lying in the Hölder space \((C^{0, \alpha }, \parallel \cdot \parallel _\alpha )\), we have

$$\begin{aligned}&\Big |\int _{{\mathbb {T}}^m} v(z) w(\phi _x^{0,t}(z)) ~\mathrm {d}\mu _x(z) - \int _{{\mathbb {T}}^m} v(z)~\mathrm {d}\mu _x(z) \int _{{\mathbb {T}}^m} w(z) ~\mathrm {d}\mu _x(z)\Big | \le C(t;x) \parallel v \parallel _\alpha \parallel w\parallel _\alpha ,\\&\quad \text { with } C(t;x) \rightarrow 0 \quad \text {as }t \rightarrow \infty \text { for all } x \in {\mathbb {R}}^d. \end{aligned}$$

We say that the decay of correlations is summable provided that

$$\begin{aligned} \int _0^\infty C(t;x) ~\mathrm {d}t < \infty \quad \text {for all } x \in {\mathbb {R}}^d, \end{aligned}$$

and we say that the decay of correlations is exponential provided that for every \(x \in {\mathbb {R}}^d\) there exist constants \(C(x),\rho (x) > 0\) such that

$$\begin{aligned} C(t;x) = C(x) {\text {e}}^{- \rho (x) t}. \end{aligned}$$

Remark 3.3

Note that in the special case where either \(\int _{{\mathbb {T}}^m} v(z) ~\mathrm {d}\mu _x(z)=0\) or \(\int _{{\mathbb {T}}^m} w(z) ~\mathrm {d}\mu _x(z)=0\) holds, summable decay of correlations implies that

$$\begin{aligned} \int _0^\infty \Big |\int _{{\mathbb {T}}^m} v(z) w(\phi _x^{0,t}(z)) ~\mathrm {d}\mu _x(z)\Big | ~\mathrm {d}t < \infty . \end{aligned}$$

Lemma 3.4

(Decay of correlations for stochastic systems) Fix a \(\delta > 0\). Then there exist constants \({\tilde{C}}(\delta ;x), \rho (\delta ;x) > 0\) such that for all continuous functions \(v,w: {\mathbb {T}}^m \rightarrow {\mathbb {R}}\) we have

$$\begin{aligned}&\Big |\int _{{\mathbb {T}}^m} v(z) {\mathbb {E}} [ w(\phi _x^{\delta ,t}(z)) ] ~\mathrm {d}\mu _x^\delta (z) - \int _{{\mathbb {T}}^m} v(z) ~\mathrm {d}\mu _x^\delta (z) \int _{{\mathbb {T}}^m} w(z) ~\mathrm {d}\mu _x^\delta (z) \Big | \\&\quad \le {\tilde{C}}(\delta ;x) \parallel v \parallel _\infty \parallel w \parallel _\infty {\text {e}}^{-\rho (\delta ;x)t}. \end{aligned}$$

In particular, this implies that the stochastic flow has exponential decay of correlations in the sense of Definition 3.2.

Proof

This is an easy application of [38, Theorem 6.16]:

$$\begin{aligned}&\Big | \int _{{\mathbb {T}}^m} v(z) {\mathbb {E}} [w(\phi _x^{\delta ,t}(z)) ]~\mathrm {d}\mu _x^\delta (z) - \int _{{\mathbb {T}}^m} v(z) ~\mathrm {d}\mu _x^\delta (z) \int _{{\mathbb {T}}^m} w(z) ~\mathrm {d}\mu _x^\delta (z) \Big | \\&\quad = \Big |\int _{{\mathbb {T}}^m} v(z) \Big \{{\mathbb {E}} [w(\phi _x^{\delta ,t}(z))] - \int _{{\mathbb {T}}^m} w({\tilde{z}}) ~\mathrm {d}\mu _x^\delta ({\tilde{z}}) \Big \}~\mathrm {d}\mu _x^\delta (z)\Big | \\&\quad \le \Big |\int _{{\mathbb {T}}^m} v(z) {\tilde{C}}(\delta ;x) \parallel w \parallel _\infty {\text {e}}^{-\rho (\delta ;x) t }~\mathrm {d}\mu _x^\delta (z)\Big | \\&\quad \le {\tilde{C}}(\delta ;x)\parallel v \parallel _\infty \parallel w \parallel _\infty {\text {e}}^{-\rho (\delta ;x)t}. \end{aligned}$$

This finishes the proof. \(\square \)

Definition 3.5

(Stochastically stable decay of correlations) Let \(v,w: {\mathbb {T}}^m \rightarrow {\mathbb {R}}\). Assume that the deterministic flow \(\phi _x^{0,t}\) has decay of correlations \(C(t;x)\). We say that \(\phi _x^{0,t}\) has stochastically stable decay of correlations provided that for all small enough \(\delta > 0\) and \(x \in {\mathbb {R}}^d\)

$$\begin{aligned} {\tilde{C}}(\delta ;x) {\text {e}}^{-\rho (\delta ;x)t} \le C(t;x), \end{aligned}$$

where the constants on the left side are as in Lemma 3.4.

These notions allow us to prove the following statement concerning \(F_0^0, F_1^0\) and \(A_0^0\):

Lemma 3.6

Assume that the unperturbed flow \(\phi _x^{0,t}\) has summable decay of correlations \(C(t;x)\) and stochastically stable decay of correlations in the sense of Definitions 3.2 and 3.5, and that the centering condition (1.16) is satisfied. Furthermore, consider, for \(\delta \ge 0\), the well-defined expressions \(F_1^\delta (x)\) (2.12), \(A_0^\delta (x)\) (2.13) and, for \(g=g(y)\),

$$\begin{aligned} F_0^\delta (x) = \int _0^\infty \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T {\mathbb {E}} [\nabla _x b\Big (x,\phi ^{\delta ,t}(\phi ^{\delta ,s}(y)(\omega )) \Big ) ] b\Big (x,\phi ^{\delta ,s}(y)(\omega )\Big ) \mathrm {d}s ~\mathrm {d}t, \end{aligned}$$
(3.7)

which hold for all \(y\in {\mathbb {T}}^m\) and a.a. \(\omega \in \Lambda \) by ergodicity (cf. Lemma 2.2).

Then we have

$$\begin{aligned} F_1^\delta \rightarrow F_1^0, {\quad }A_0^\delta \rightarrow A_0^0 \quad \text { as }\delta \rightarrow 0\text { uniformly in }x, \end{aligned}$$
(3.8)

and, in the case that \(g=g(y)\), we additionally obtain

$$\begin{aligned} F_0^\delta \rightarrow F_0^0 \quad \text {as }\delta \rightarrow 0\text { uniformly in } x. \end{aligned}$$
(3.9)

Proof

We first want to ensure that all considered expressions (2.12), (2.13) and (3.7) are well-defined for all \(\delta \ge 0\). For (2.12) this is trivial. For (2.13) note that for a.a. \(\omega \in \Lambda \), due to the centering condition (1.16), Lemma 3.4 and the stochastic stability we have componentwise in the tensor product

$$\begin{aligned}&\Big |\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T b\Big (x, \phi _x^{\delta ,s}(y)(\omega ) \Big ) \otimes {\mathbb {E}} [b \Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)(\omega )) \Big ) ]~\mathrm {d}s \Big | \\&\quad = \Big |\int _{{\mathbb {T}}^m} b(x,y) \otimes {\mathbb {E}} [ b(x,\phi _x^{\delta ,t}(y)) ] ~\mathrm {d}\mu _x^\delta (y)\Big | \\&\quad \le C_1(b) C(t;x) \end{aligned}$$

(\(C_1(b)\) is a constant which depends on b) and analogously for (3.7) in the case that \(g=g(y)\).

We now start by estimating the difference \(F_1^\delta - F_1^0\) for \(\delta > 0\). Let \(\varepsilon > 0\) and define, for \(T> 0\), \(F_1^{\delta , T}:= \frac{1}{T} \int _0^Ta(x, \phi _x^{\delta ,s}(y))~\mathrm {d}s\). For any \(\delta > 0\) we have that

$$\begin{aligned} |F_1^\delta - F_1^0| \le |F_1^\delta - F_1^{\delta ,T}| + |F_1^{\delta ,T} - F_1^{0,T}| + |F_1^{0,T}- F_1^0|. \end{aligned}$$

For each \(\delta > 0\) we can fix a \(T = T_0\), which is independent of \(\delta \) and \(x,y,\omega \), such that the first and last difference become smaller than \(\frac{\varepsilon }{3}\). To see this, note that the sequence \(\frac{1}{T} \int _0^T \sup _{\delta ,x,y,\omega } |a\Big (x, \phi _x^{\delta ,s}(y)(\omega ) \Big )|\mathrm {d}s\) is bounded from above and increasing, hence it converges. Moreover, due to Lemma 3.1 and due to the Lipschitz continuity of the vector field a, we have that

$$\begin{aligned} |F_1^{\delta ,T_0} - F_1^{0,T_0}|&\le \frac{1}{T_0} \int _0^{T_0} |a(x, \phi _x^{\delta ,s}(y)) - a(x, \phi _x^{0,s}(y))|~\mathrm {d}s \nonumber \\&\le \sqrt{\delta } C(T_0,\omega ) \rightarrow 0 \text { for }\delta \rightarrow 0. \end{aligned}$$
(3.10)

Hence, for a.a. \(\omega \) we have

$$\begin{aligned} F_1^\delta \rightarrow F_1^0 \quad \text {as }\delta \rightarrow 0\text { uniformly in } x,y. \end{aligned}$$
(3.11)
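To make the choice of \(\delta \) in this step explicit, the three estimates can be combined as follows. This is only a spelled-out version of the argument above, with \(\varepsilon \) denoting the tolerance fixed at the beginning of the estimate and \(C(T_0,\omega )\) the constant from (3.10):

$$\begin{aligned} |F_1^\delta - F_1^0| \le \frac{\varepsilon }{3} + \sqrt{\delta }\, C(T_0,\omega ) + \frac{\varepsilon }{3} < \varepsilon \quad \text {whenever } \delta < \Big ( \frac{\varepsilon }{3\, C(T_0,\omega )} \Big )^2. \end{aligned}$$

Since \(T_0\) does not depend on \(x\) and \(y\), neither does this threshold for \(\delta \), which is what gives the uniformity in (3.11).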

Next, for estimating \(A_0^\delta - A_0^0\) we define

$$\begin{aligned} a_\delta ^{i,j}(t; x,y,\omega )&:= \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T b^i \Big (x, \phi _x^{\delta ,s}(y) \Big ) {\mathbb {E}}\Big [ b^j\Big (x,\phi _x^{\delta ,t} (\phi _x^{\delta ,s}(y)) \Big ) \Big ] ~\mathrm {d}s, \\ a_0^{i,j}(t; x,y,\omega )&:= \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T b^i\Big (x, \phi _x^{0,s}(y) \Big ) b^j\Big (x,\phi _x^{0,t+s}(y) \Big )~\mathrm {d}s, \\ a_\delta ^{i,j,T}(t; x,y,\omega )&:= \frac{1}{T} \int _0^T b^i\Big (x, \phi _x^{\delta ,s}(y) \Big ) {\mathbb {E}}\Big [ b^j \Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)) \Big ) \Big ] ~\mathrm {d}s, \\ a_0^{i,j,T}(t; x,y,\omega )&:= \frac{1}{T} \int _0^T b^i\Big (x, \phi _x^{0,s}(y)\Big ) b^j\Big (x,\phi _x^{0,t+s}(y) \Big )~\mathrm {d}s. \end{aligned}$$

As before we split

$$\begin{aligned} |a_\delta ^{i,j} - a_0^{i,j}| \le |a_\delta ^{i,j} - a_\delta ^{i,j,T}| + |a_\delta ^{i,j,T} - a_0^{i,j,T}| + |a_0^{i,j,T}- a_0^{i,j}|. \end{aligned}$$

The sequence

$$\begin{aligned} \frac{1}{T} \int _0^T \sup _{\delta ,x,y,\omega } \Big | b^i(x, \phi _x^{\delta ,s}(y)) {\mathbb {E}} \Big [ b^j\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)) \Big ) \Big ] \Big |~ \mathrm {d}s \end{aligned}$$
(3.12)

is bounded from above and increasing, hence it converges for every t. Hence, we can find a \(T = T_0(t)\), which is independent of \(\delta \), \(x\), \(y\) and \(\omega \), such that the first and last terms of equation (3.12) become smaller than \(\varepsilon \). With this \(T_0\) we have

$$\begin{aligned}&| a_\delta ^{i,j,T_0}(t; x,y,\omega ) - a_0^{i,j,T_0}(t; x,y,\omega )| \\&\quad \le \frac{1}{T_0} \int _0^{T_0}| b^i\Big (x, \phi _x^{\delta ,s}(y) \Big ){\mathbb {E}} \Big [ b^j\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)) \Big ) \Big ] \\&\qquad - b^i\Big (x, \phi _x^{0,s}(y)\Big ) b^j\Big (x,\phi _x^{0,t+s}(y)\Big )|~\mathrm {d}s \\&\quad \le \frac{1}{T_0} \int _0^{T_0} \underbrace{\Big |b^i\Big (x, \phi _x^{\delta ,s}(y)\Big )\Big |}_{\le C_1 } \underbrace{\Big | {\mathbb {E}} \Big [ b^j\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y))\Big ) - b^j\Big (x,\phi _x^{0,t+s}(y)\Big ) \Big ] \Big |}_{\le \sqrt{\delta }C_2(t) \text { due to Lemma}~3.1} ~\mathrm {d}s \\&\qquad + \frac{1}{T_0} \int _0^{T_0} \Bigg |\underbrace{b^j\Big (x,\phi _x^{0,t+s}(y) \Big )}_{\le C_1} \underbrace{\Bigg \{b^i\Big (x, \phi _x^{\delta ,s}(y) \Big )- b^i\Big (x,\phi _x^{0,s}(y) \Big ) \Bigg \}}_{\le \sqrt{\delta } C_3(T_0,\omega ) \text { due to Lemma }3.1} \Bigg |~\mathrm {d}s \\&\quad \le \sqrt{\delta } C_4(t,T_0,\omega ) \rightarrow 0 \quad \text {for } \delta \rightarrow 0, \end{aligned}$$

where \( C_1, C_2, C_3, C_4\) denote positive constants. Hence, for all t and \(\omega \) we have

$$\begin{aligned} a_\delta ^{i,j}(t; x,y,\omega ) \rightarrow a_0^{i,j}(t; x,y,\omega ) \quad \text {as } \delta \rightarrow 0,\text { uniformly in } x,y. \end{aligned}$$
(3.13)

Due to the assumption on the fast dynamics we know further that for any fixed \(t,x,y,\omega \) we have

$$\begin{aligned} |a_\delta ^{i,j}(t; x,y,\omega )| < C(t;x) \quad \text {for } \delta \text { sufficiently small.} \end{aligned}$$
(3.14)

Using (3.13) and (3.14) we get by the dominated convergence theorem

$$\begin{aligned} \int _0^\infty a_\delta ^{i,j}(t; x,y,\omega ) ~\mathrm {d}t \rightarrow \int _0^\infty a_0^{i,j}(t; x, y) ~\mathrm {d}t \quad \text {as } \delta \rightarrow 0. \end{aligned}$$
(3.15)

Due to equation (3.13) the convergence is uniform in \(x\in {\mathbb {R}}^d\), \(y\in {\mathbb {T}}^m\). From (3.15), it follows that

$$\begin{aligned} A_0^\delta \rightarrow A_0^0 \quad \text {as }\delta \rightarrow 0\text { uniformly in }x \in {\mathbb {R}}^d. \end{aligned}$$
(3.16)

Finally, we deal with the difference \(|F_0^\delta - F_0^0|\) in the case that g is independent of x. Proceeding as in our previous computations, we can verify that

$$\begin{aligned}&\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T {\mathbb {E}} \Big [ \nabla _x b\Big (x,\phi _x^{\delta ,t} (\phi _x^{\delta ,s}(y)) \Big ) \Big ] b\Big (x,\phi _x^{\delta ,s}(y) \Big )~\mathrm {d}s \\&\quad \rightarrow \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T \nabla _x b\Big (x,\phi _x^{0,t+s}(y)\Big ) b\Big (x,\phi _x^{0,s}(y)\Big )\mathrm {d}s \quad \text {as } \delta \rightarrow 0, \end{aligned}$$

uniformly in \(x, y\) and for \(t \in [0,T]\). This implies, due to the stochastically stable decay of correlations of \(\phi \), that

$$\begin{aligned} F_0^\delta \rightarrow F_0^0 \quad \text {as }\delta \rightarrow 0\text { uniformly in } x. \end{aligned}$$
(3.17)

This finishes the proof. \(\square \)

It remains to deal with the term \(F_0^0\) in the case that g also depends on x. The crucial ingredients are equations (1.17) and (1.18), which allow us to formulate the following result:

Lemma 3.7

For the case that \(g=g(x,y)\) also depends on x, we assume that the unperturbed flow \(\phi _x^{0,t}\) has summable and stochastically stable decay of correlations with respect to an ergodic invariant measure \(\mu _x^0\) on \({\mathbb {T}}^m\). Additionally, we assume that the centering condition (1.17) and, for any \(y \in {\mathbb {T}}^m\), the growth condition (1.18) are satisfied.

Then we obtain:

  1.

    Setting

    $$\begin{aligned}&f_0^\delta (t,x) := \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T {\mathbb {E}} \Big [ \nabla _x b\Big (x ,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)) \Big ) \\&\quad + \nabla _yb\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y))\Big )\nabla _x \phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)) \Big ] b\Big (x,\phi _x^{\delta ,s}(y) \Big ) ~\mathrm {d}s, \end{aligned}$$

    we have that

    $$\begin{aligned} \parallel f_0^0(t,\cdot )\parallel _\infty \le h(t), \quad \text {for a function }h\text { with} \int _0^\infty h(t) ~\mathrm {d}t < \infty . \end{aligned}$$
    (3.18)
  2.

    For \(\delta \ge 0\) small enough, h(t) is an upper bound for \(f_0^\delta \), the expression

    $$\begin{aligned} F_0^\delta (x) = \int _0^\infty f_0^\delta (t,x) ~\mathrm {d}t \end{aligned}$$

    is well-defined and we have

    $$\begin{aligned} F_0^\delta \rightarrow F_0^0 \quad \text {as }\delta \rightarrow 0\text { uniformly in } x \in {\mathbb {R}}^d . \end{aligned}$$
    (3.19)

Proof

We must first ensure that all expressions \(F_0^\delta \) are well-defined. It is easy to see that for all \(\delta \ge 0\) we have

$$\begin{aligned} \Big |\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T {\mathbb {E}} \Big [ \nabla _x b\Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y)) \Big ) \Big ] b\Big (x,\phi _x^{\delta ,s} (y) \Big ) ~\mathrm {d}s \Big |_\infty \le C_2 C(t;x), \end{aligned}$$
(3.20)

for a constant \(C_2>0\). Secondly, for \(\delta = 0 \), we set \(w^{x}:= \nabla _yb(x,y)\) and \(v^{t,x}:= \nabla _x \phi _x^{0,t}(y) b(x,y)\) in the definition of decay of correlations and, using condition (1.17), we observe that

$$\begin{aligned}&\Big |\lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T {\mathbb {E}} \Big [ \nabla _yb \Big (x,\phi _x^{\delta ,t}(\phi _x^{\delta ,s}(y))\Big )\nabla _x\phi _x^{\delta ,t} (\phi _x^{\delta ,s}(y)) \Big ] b\Big (x,\phi _x^{\delta ,s}(y)\Big )~ \mathrm {d}s\Big | \\&\quad \le C(t,x) \parallel w^x \parallel _\alpha \parallel v^{t,x} \parallel _\alpha . \end{aligned}$$

This fact together with the growth assumption (1.18) yields

$$\begin{aligned} \parallel f_0^0(t, \cdot ) \parallel _\infty\le & {} \sup _{x \in {\mathbb {R}}^d} \Big \{ C(t;x) ( \parallel w^x \parallel _\alpha \parallel \nabla _x \phi _x^{0,t}(\cdot ) b(x,\cdot )\parallel _\alpha + C_2) \Big \}\\=: & {} h(t), {\quad } \int _0^\infty h(t) ~\mathrm {d}t < \infty , \end{aligned}$$

which, in particular, implies that \(F_0^0\) is well-defined. Furthermore, due to stochastically stable decay of correlations, proceeding as in Lemma 3.6 (and using also Lemma 3.1(ii)) we can show that

$$\begin{aligned} f_0^\delta \rightarrow f_0^0, {\quad } \parallel f_0^\delta (t, \cdot ) \parallel _\infty \le h(t). \end{aligned}$$

Finally, we can conclude (3.19) by dominated convergence. \(\square \)

This allows us now to conclude the main result of this section, Theorem B.

Proof of Theorem B

The statement follows immediately from Lemmas 3.6 and 3.7. \(\square \)

Remark 3.8

  1. (i)

    Condition (1.18) seems to be a relatively strong mixing condition, which may be difficult to verify for certain practical examples. Indeed, one observes that \(\nabla _x \phi _x^{\delta ,t} (y)\) solves the first order linear inhomogeneous ODE (3.4). Thus, \(\nabla _x \phi _x^{\delta ,t} (y) \) can be calculated by variation of constants and is explicitly given by the formula

    $$\begin{aligned} \nabla _x \phi _x^{\delta ,t} (y) = {\text {e}}^{ \int _0^t \nabla _yg(x,\phi _x^{\delta ,\tau }(y))~\mathrm {d}\tau } \Bigg ( \int _0^t {\text {e}}^{- \int _0^s \nabla _y g(x, \phi _x^{\delta ,\tau }(y)) ~\mathrm {d}\tau }\nabla _x g(x,\phi _x^{\delta ,s}(y)) ~\mathrm {d}s + y \Bigg ). \end{aligned}$$

    Assuming for simplicity that the matrices \({\text {e}}^{ \int _0^t \nabla _yg(x,\phi _x^{\delta ,\tau }(y))~\mathrm {d}\tau }\) and \({\text {e}}^{- \int _0^s \nabla _y g(x, \phi _x^{\delta ,\tau }(y))~\mathrm {d}\tau }\) commute, we obtain from the last equation

    $$\begin{aligned} |\nabla _x \phi _x^{\delta ,t} (y)|_\infty&\le \parallel \nabla _x g \parallel _\infty \int _0^t {\text {e}}^{\parallel \nabla _yg \parallel _\infty (t-s)}~\mathrm {d}s + {\text {e}}^{\parallel \nabla _y g\parallel _\infty t}. \end{aligned}$$

    From this we conclude that

    $$\begin{aligned} \sup _{x,y,\omega ,\delta }|\nabla _x \phi _x^{\delta ,t} (y)|_\infty \le K {\text {e}}^{\parallel \nabla _y g\parallel _\infty t}, \end{aligned}$$

    where the constant

    $$\begin{aligned} K:= \parallel \nabla _x g \parallel _\infty \int _0^\infty {\text {e}}^{-\parallel \nabla _yg \parallel _\infty s}~\mathrm {d}s + 1 \end{aligned}$$

    is independent of t. Thus, the growth condition (1.18) might hold if the unperturbed flow \(\phi _x^{0,t}\) has exponential decay of correlations \(C(t;x) \le C {\text {e}}^{-\rho t}\), for all \(x \in {\mathbb {R}}^d\), with \(\rho \ge \parallel \nabla _yg \parallel _\infty \). This inequality describes precisely the boundary of what we might optimistically expect as possible decay rates for correlations and a further investigation is left as an open problem here.

  2. (ii)

    The centering condition (1.16) might seem a strong assumption at first glance because it must be satisfied for all \(\delta >0\) and x. However, the parameter \(\delta > 0\) only has the effect of “stretching” the invariant density \(\rho _\infty ^\delta (y;x)\), so that the function b simply has to be a function which is in accordance with the symmetry of the invariant densities. The condition can also be relaxed by allowing the operator \({\mathcal {L}}_2\) to be perturbed as well. More precisely, assume that the function b satisfies

    $$\begin{aligned} \int _{{\mathbb {T}}^m} b(x,y)~\mathrm {d}\mu _x^0(y) = 0, \quad \text {for all } x \in {\mathbb {R}}^d. \end{aligned}$$

    We consider suitable perturbed vector fields \(b^\delta \) satisfying the centering condition (1.9), for which additionally we have

    $$\begin{aligned} b^\delta \rightarrow b \quad \text {uniformly.} \end{aligned}$$

    For example, we can consider functions of the form

    $$\begin{aligned} b^\delta (x,y):= b(x,y) - \int _{{\mathbb {T}}^m} b(x,z) \rho _\infty ^\delta (z;x) dz \end{aligned}$$

    We then define the perturbed operators

    $$\begin{aligned} {\mathcal {L}}_2^\delta u := b^\delta \cdot \nabla _x u, \end{aligned}$$
    $$\begin{aligned} {\mathcal {L}}^{\varepsilon ,\delta } =\frac{1}{\varepsilon ^2} {\mathcal {L}}_{1}^\delta + \frac{1}{\varepsilon } {\mathcal {L}}_2^\delta + {\mathcal {L}}_3 \end{aligned}$$

    and

    $$\begin{aligned} {\mathcal {L}}^{0,\delta } f := (-{\mathcal {P}}^\delta {\mathcal {L}}_2^\delta [{\mathcal {L}}_1^{\delta }]^{-1} {\mathcal {L}}_2^\delta {\mathcal {P}}^\delta + {\mathcal {P}}^\delta {\mathcal {L}}_3^\delta {\mathcal {P}}^\delta )f \end{aligned}$$

    and we can repeat the proof of Theorem 2.3 to get the statement.
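For the bound in Remark 3.8(i), the passage from the variation-of-constants estimate to the constant K can be spelled out as follows. Here \(L := \parallel \nabla _y g \parallel _\infty \) is a shorthand introduced only for this computation, and \(L > 0\) is assumed:

$$\begin{aligned} \parallel \nabla _x g \parallel _\infty \int _0^t {\text {e}}^{L (t-s)}~\mathrm {d}s + {\text {e}}^{L t}&= \parallel \nabla _x g \parallel _\infty \frac{{\text {e}}^{L t} - 1}{L} + {\text {e}}^{L t} \\&\le \Big ( \frac{\parallel \nabla _x g \parallel _\infty }{L} + 1 \Big ) {\text {e}}^{L t} = K {\text {e}}^{L t}, \end{aligned}$$

where the last equality uses \(\int _0^\infty {\text {e}}^{-L s}~\mathrm {d}s = 1/L\) in the definition of K.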

4 Weakly-Coupled Systems

4.1 Main Result

To provide an intermediate alternative to the strong mixing assumption (see condition (1.18)), we also consider the simpler case of so-called weakly-coupled systems. These are systems in which coupling occurs only at lower time scales, and they are given by equation (1.23). We also consider the corresponding stochastic version

$$\begin{aligned} \begin{aligned} \frac{\mathrm {d}x_\varepsilon }{\mathrm {d}t}&= a(x_\varepsilon , y_\varepsilon ) + \frac{1}{\varepsilon } b(x_\varepsilon , y_\varepsilon ),{\quad } x_\varepsilon (0) = x_0, \\ \frac{\mathrm {d}y_\varepsilon }{\mathrm {d}t}&= \frac{1}{\varepsilon ^2} g( y_\varepsilon ) + \frac{1}{\varepsilon } \left( h(x_\varepsilon ,y_\varepsilon ) + \sqrt{\delta } \frac{\mathrm {d}V}{\mathrm {d}t}\right) + r(x_\varepsilon ,y_\varepsilon ), {\quad } y_\varepsilon (0) = y_0. \end{aligned} \end{aligned}$$
(4.1)

We now use the assumptions (A1)-(A2), (A4)-(A5), and suitable centering and correlation decay conditions, but not (A6), to prove Theorem C. For any \(\delta > 0\) we set

$$\begin{aligned} \tilde{{\mathcal {L}}}_1^\delta&:= g(y) \cdot \nabla _y + \frac{1}{2}\delta I: \nabla _y\nabla _y , \\ \tilde{{\mathcal {L}}}_2&:= b(x,y) \cdot \nabla _x + h(x,y) \cdot \nabla _y = {\mathcal {L}}_2^c + {\mathcal {L}}_2^{nc}, \\ \tilde{{\mathcal {L}}}_3&= a(x,y) \cdot \nabla _x + r(x,y)\cdot \nabla _y, \end{aligned}$$

with the commutative part \({\mathcal {L}}_2^c := b(x,y) \cdot \nabla _x\) and the remainder \({\mathcal {L}}_2^{nc} := h(x,y) \cdot \nabla _y\). The operator

$$\begin{aligned} \tilde{{\mathcal {L}}}^{\varepsilon , \delta } = \frac{1}{\varepsilon ^2}\tilde{{\mathcal {L}}}_1^\delta + \frac{1}{\varepsilon }\tilde{{\mathcal {L}}}_2 + \tilde{{\mathcal {L}}}_3 \end{aligned}$$

is the backward Kolmogorov operator associated with the SDE (4.1). Assume that the centering condition (1.16) is satisfied. Consider the perturbation expansion

$$\begin{aligned} u^{\varepsilon , \delta } = u_0^\delta + \varepsilon u_1^\delta + \varepsilon ^2 u_2^\delta + \cdots \end{aligned}$$
(4.2)

which we substitute into the backward Kolmogorov equation

$$\begin{aligned} \frac{\mathrm {d}u^{\varepsilon ,\delta }}{\mathrm {d}t} = \tilde{{\mathcal {L}}}^{\varepsilon ,\delta }u^{\varepsilon ,\delta } := \Big (\frac{1}{\varepsilon ^2}\tilde{{\mathcal {L}}}_1^\delta + \frac{1}{\varepsilon }\tilde{{\mathcal {L}}}_2 + \tilde{{\mathcal {L}}}_3 \Big )u^{\varepsilon ,\delta }. \end{aligned}$$
(4.3)

Via the perturbation analysis given in Sect. B of the Appendix, we arrive at the following equation for the leading order \(u_0^\delta \)

$$\begin{aligned} \frac{\mathrm {d}u_0^\delta }{\mathrm {d}t} = {\tilde{F}}^{\delta } \cdot \nabla _x u_0^\delta + \frac{1}{2} A^\delta (x)A^\delta (x)^\top : \nabla _x\nabla _x u_0^\delta . \end{aligned}$$
(4.4)

Here the drift coefficient in the homogenized equation (2.10) now changes to

$$\begin{aligned} {\tilde{F}}^\delta (x):= \int _{{\mathbb {T}}^m} \Big (a(x,y) + \nabla _x \Phi ^\delta (y;x) b(x,y) + \nabla _y\Phi ^\delta (y;x)h(x,y) \Big ) \rho _\infty ^\delta (y;x) ~\mathrm {d}y\quad \end{aligned}$$
(4.5)

and the diffusion coefficient \(A^\delta (x)\) remains unchanged

$$\begin{aligned} \begin{aligned} A^\delta (x)A^\delta (x)^\top&= \frac{1}{2} \Big ( A_0^\delta (x) + A_0^\delta (x)^\top \Big ), \\ A_0^\delta (x)&= 2 \int _{{\mathbb {T}}^m} \Big (b(x,y) \otimes \Phi ^\delta (y;x)\Big ) \rho _\infty ^\delta (y;x) ~\mathrm {d}y. \end{aligned} \end{aligned}$$
(4.6)

Note that (see for example [38, Result 11.8]) the solution \(\Phi ^\delta \) of the cell problem admits the representation formula

$$\begin{aligned} \Phi ^\delta (y;x) = \int _0^\infty {\mathbb {E}} \Big [ b(x, \phi ^{\delta ,t}(y)) \Big ] ~\mathrm {d}t, \end{aligned}$$

where the stochastic process \(\phi ^{\delta ,t}(y)\) satisfies equation (1.24) and the term \({\mathbb {E}} [ b(x, \phi ^{\delta ,t}(y)) ]\) decays exponentially fast as \(t\rightarrow \infty \) (see [38, Theorem 6.16]). The above considerations allow us to repeat the arguments from the previous sections, and we get the following theorem.

Theorem 4.1

(Convergence of the slow process for weakly-coupled systems) Assume (A1)-(A2) and that the unperturbed flow \(\phi ^{0,t}\) has summable stochastically stable decay of correlations C(t) in the sense of Definitions 3.2 and 3.5. Furthermore, assume that the centering condition (1.16) is satisfied and define the operator \(\tilde{{\mathcal {L}}}^{0,\delta }\) on \(C^2_{\text {c}}({\mathbb {R}}^d)\) by

$$\begin{aligned} \tilde{{\mathcal {L}}}^{0,\delta }u := {\tilde{F}}^\delta \cdot \nabla _xu + \frac{1}{2}A^\delta (x)A^\delta (x)^\top :\nabla _x\nabla _xu. \end{aligned}$$
(4.7)

In the case that h does not vanish everywhere, we assume additionally that the centering condition (1.17) and the growth condition (1.25) hold.

Then the following statements are true:

(i):

There exist vector fields \({\tilde{F}}^0(x)\) and \(A^0(x)\) such that

$$\begin{aligned} {\tilde{F}}^\delta \rightarrow {\tilde{F}}^0, {\quad }A^\delta \rightarrow A^0, \quad \text {uniformly in }x\text { as }\delta \rightarrow 0, \end{aligned}$$
(4.8)

where \(A^0\) is explicitly given by (1.27) and the vector field \({\tilde{F}}^0\) is given by (1.28).

(ii):

For every \(f \in C^2_{\text {c}}({\mathbb {R}}^d)\)

$$\begin{aligned} \lim _{\delta \rightarrow 0}\tilde{{\mathcal {L}}}^{0, \delta } f = \tilde{{\mathcal {L}}}^{0,0} f \quad \text {uniformly}, \end{aligned}$$
(4.9)

where the operator \(\tilde{{\mathcal {L}}}^{0,0}\) is defined by

$$\begin{aligned} \tilde{{\mathcal {L}}}^{0,0} u := {\tilde{F}}^0 \cdot \nabla _xu + \frac{1}{2}A^0(x)A^0(x)^\top :\nabla _x\nabla _xu, \end{aligned}$$
(4.10)

and \(\bar{\tilde{{\mathcal {L}}}}^{0,0}\) generates the strongly continuous semigroup \(T(t)^{0,0}\) on X.

(iii):

Let \(T^{\varepsilon ,\delta }\) be the semigroup on \({\hat{C}}({\mathbb {R}}^d\times {\mathbb {T}}^m)\) generated by \(\bar{{\mathcal {L}}}^{\varepsilon ,\delta }\). Then for every \(f \in C_0({\mathbb {R}}^d)\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \), there exists a subsequence \( \{ \varepsilon _{k_m} \}_{m \ge 0}\) such that

$$\begin{aligned} \lim _{m \rightarrow \infty } \sup _{0 \le t \le {\hat{T}}} \parallel T^{\varepsilon _{k_m},0}(t)f - T^{0,0}(t)f \parallel _\infty = 0. \end{aligned}$$
(4.11)
(iv):

For \(\varepsilon > 0\) let \((X^\varepsilon (t;\xi ,\eta ), Y^\varepsilon (t;\xi ,\eta ))\) be the solution of the ODE (1.23).

Then for every initial condition \(f \in {\hat{C}}({\mathbb {R}}^d)\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \), there exists a subsequence \( \{ \varepsilon _{k_m} \}_{m \ge 0}\) such that

$$\begin{aligned} f(X^{\varepsilon _{k_m}}(t; \xi , \eta )) \rightarrow T^{0,0}(t) f(\xi ), \quad \text {uniformly in }\xi \in {\mathbb {R}}^d, \eta \in \Omega \text { and }t\in [0,{\hat{T}}]. \end{aligned}$$

Proof

The arguments needed for the proof are identical to those given in Sects. 2 and 3. Thus we omit their exact repetition. We only want to note that in the case that \(h \equiv 0\) the term \(\nabla _y\Phi ^\delta (x,y)h(x,y)\) in (4.5) vanishes, so that we can repeat the arguments from Lemma 3.6 to get the first statement. In the general case that h does not vanish everywhere, the term \(\nabla _y\Phi ^\delta (x,y)h(x,y)\) in equation (4.5) cannot be neglected. Thus we need to impose the additional assumptions (1.17) and (1.25) (which ensure, in particular, that the expression

$$\begin{aligned} \int _0^\infty \int _{{\mathbb {T}}^m} {\mathbb {E}} \Big [\nabla _yb(x, \phi _x^{\delta ,t}(y)) \nabla _y \phi _x^{\delta ,t}(y) \Big ] h(x,y) \rho ^\delta _\infty (y;x) dy dt <\infty \end{aligned}$$

is well-defined) and then we proceed as in Lemma 3.7 to get the first statement also for this case. Finally we note that for the second statement we repeat the arguments from Theorem B, for the third statement we need to repeat the proof of Theorem 2.3 and for the last statement see the proof of Corollary 2.5. \(\square \)

As we can see from the formulation of Theorem 4.1, we do not have to assume any additional growth condition for \(\phi ^{0,t}\) in case h in (4.1) vanishes. If \(h \ne 0\), the assumed growth condition (1.25) for the weakly-coupled system is clearly weaker than growth condition (1.18) for the more general case: in (1.18), the integrability has to hold uniformly over all \(x \in {\mathbb {R}}^d\), whereas \(\phi ^{0,t}\) does not depend on x in the weakly-coupled situation, hence the simplification to (1.25).

4.2 Numerical Example

As an application of the previous Sect. 4.1, we consider a weakly-coupled system on \({\mathbb {R}}\times {\mathbb {R}}^3\) with chaotic fast dynamics on the Lorenz attractor. Let us recall that the classical Lorenz equations are given by the three-dimensional ODE system

$$\begin{aligned} \begin{aligned} \frac{\mathrm {d}y_1}{\mathrm {d}t}&= s(y_2-y_1), \\ \frac{\mathrm {d}y_2}{\mathrm {d}t}&= \rho y_1 - y_2 -y_1y_3, \\ \frac{\mathrm {d}y_3}{\mathrm {d}t}&= y_1y_2 - \beta y_3, \end{aligned} \end{aligned}$$
(4.12)

with the parameters \(s, \rho , \beta >0\), where, in particular, s is called the Prandtl number and \(\rho \) is called the Rayleigh number. For the standard values \(s = 10, \rho = 28, \beta = 8/3\), the equations are ergodic with invariant measure \(\mu \) supported on the Lorenz attractor \(\Omega \). We now consider, motivated by [38, Section 11.7.2] and [22, Section 6.4], the following weakly-coupled systems on \({\mathbb {R}}\times {\mathbb {R}}^3\):

$$\begin{aligned} \begin{aligned} \frac{\mathrm {d}X^{\varepsilon ,\delta }}{\mathrm {d}t}&= -X^{\varepsilon ,\delta } + \frac{1}{\varepsilon }\frac{4}{90} Y^{\varepsilon ,\delta }_2 \\ \frac{\mathrm {d}Y^{\varepsilon ,\delta }_1}{\mathrm {d}t}&= \frac{10 }{\varepsilon ^2}(Y^{\varepsilon ,\delta }_2- Y^{\varepsilon ,\delta }_1) + X^{\varepsilon ,\delta }Y_3^{\varepsilon ,\delta } +\delta \frac{\mathrm {d}U}{\mathrm {d}t}\\ \frac{\mathrm {d}Y^{\varepsilon ,\delta }_2}{\mathrm {d}t}&= \frac{1}{\varepsilon ^2} (28Y^{\varepsilon ,\delta }_1 - Y^{\varepsilon ,\delta }_2 -Y^{\varepsilon ,\delta }_1Y^{\varepsilon ,\delta }_3) - X^{\varepsilon ,\delta }+ \delta \frac{\mathrm {d}V}{\mathrm {d}t} \\ \frac{\mathrm {d}Y^{\varepsilon ,\delta }_3}{\mathrm {d}t}&=\frac{1}{\varepsilon ^2}( Y^{\varepsilon ,\delta }_1 Y^{\varepsilon ,\delta }_2 - \frac{8}{3} Y^{\varepsilon ,\delta }_3) + X^{\varepsilon ,\delta } Y_1^{\varepsilon ,\delta } Y_2^{\varepsilon ,\delta } +\delta \frac{\mathrm {d}W}{\mathrm {d}t}. \end{aligned} \end{aligned}$$
(4.13)

In Fig. 1, sample paths of the process \(X^{\varepsilon ,\delta }\) solving (4.13) are shown for different values of \(\varepsilon \) and \(\delta \). These paths illustrate that the deterministic flow displays stochastic-looking, chaotic oscillations, but that one really has to look at the limiting behaviour as \(\varepsilon \rightarrow 0\) before the visual difference between a deterministic and a stochastic process disappears.

Fig. 1

Sample paths of the process \(X^{\varepsilon , \delta }\) satisfying equation (4.13), with the initial condition \([x^0, y_1^0, y_2^0, y_3^0]^\top =[0,13.93,20.06,26.87]^\top \), for \(\varepsilon = 0.8\) (a) and \(\varepsilon = 0.05\) (b), for different values of \(\delta \)
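A minimal sketch of how a sample path of \(X^{\varepsilon ,0}\) could be generated from (4.13) in the deterministic case \(\delta = 0\) is given below. The classical fixed-step Runge-Kutta scheme, the function names `rhs` and `integrate`, and the step size are assumptions made for illustration; the text does not state which integrator was used for Fig. 1. Because the fast subsystem runs on the time scale \(\varepsilon ^2\), the step size would have to shrink accordingly for small \(\varepsilon \).

```python
import numpy as np

def rhs(state, eps):
    """Right-hand side of system (4.13) with delta = 0 (no noise terms)."""
    x, y1, y2, y3 = state
    e2 = eps ** 2
    return np.array([
        -x + (4.0 / 90.0) * y2 / eps,                       # slow variable X
        10.0 * (y2 - y1) / e2 + x * y3,                     # Y_1
        (28.0 * y1 - y2 - y1 * y3) / e2 - x,                # Y_2
        (y1 * y2 - (8.0 / 3.0) * y3) / e2 + x * y1 * y2,    # Y_3
    ])

def integrate(eps, t_end, dt, state0=(0.0, 13.93, 20.06, 26.87)):
    """Fixed-step classical RK4; returns the time grid and the slow component X.

    The default initial condition is the one quoted in the caption of Fig. 1.
    """
    n = int(round(t_end / dt))
    state = np.array(state0, dtype=float)
    xs = np.empty(n + 1)
    xs[0] = state[0]
    for k in range(n):
        k1 = rhs(state, eps)
        k2 = rhs(state + 0.5 * dt * k1, eps)
        k3 = rhs(state + 0.5 * dt * k2, eps)
        k4 = rhs(state + dt * k3, eps)
        state = state + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        xs[k + 1] = state[0]
    return np.linspace(0.0, n * dt, n + 1), xs

# Short illustrative run for eps = 0.8 (step size chosen ad hoc).
times, x_path = integrate(eps=0.8, t_end=0.01, dt=1e-3)
```

For longer horizons or smaller \(\varepsilon \), an adaptive or stiff solver may be a better choice than this fixed-step sketch.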

The fast subsystem has the ergodic measure \(\mu \) supported on the Lorenz attractor \(\Omega \). Let \(Q \subset {\mathbb {R}}^3\) be a sufficiently large cube containing \( \Omega \). By identifying the opposite sides of the cube and rescaling the coordinates we can assume, without loss of generality, that \(Q = {\mathbb {T}}^3\) is the torus, so that the theory from the previous sections can be applied. We note further that it has already been verified numerically in [22] that the \(y_2\) coordinate has zero average with respect to \(\mu \) and, as a consequence, that the centering condition (1.4) is satisfied. Theorem 4.1 states that for every \(f \in C_0({\mathbb {R}})\) and every sequence \(\{\varepsilon _k\}_{k\ge 0}\) with \(\varepsilon _k \rightarrow 0\) for \(k \rightarrow \infty \) there exists a subsequence \( \{ \varepsilon _{k_m} \}_{m \ge 0}\) such that

$$\begin{aligned} {\mathbb {E}}^{\mu } [f(X^{\varepsilon _{k_m},0}(t))]\rightarrow {\mathbb {E}}[f(X(t))] \quad \text {as }m \rightarrow \infty \text { uniformly in }t \in [0,{\hat{T}}], \end{aligned}$$
(4.14)

where the process X solves the SDE

$$\begin{aligned} \frac{\mathrm {d}X}{\mathrm {d}t} = - X + \sigma \frac{\mathrm {d}W}{\mathrm {d}t}, {\quad } X(0) = \xi . \end{aligned}$$
(4.15)

Note that equation (4.15) describes an Ornstein-Uhlenbeck process which has the unique solution given by

$$\begin{aligned} X_t = {\text {e}}^{-t} \xi + \sigma {\text {e}}^{-t} \int _0^t {\text {e}}^{ \tau } ~\mathrm {d}W_\tau . \end{aligned}$$

In general we know that for a square integrable function f on [0, T], the random variable \(\int _0^T f(t) ~\mathrm {d}W_t\) is normally distributed with variance \(\int _0^T f(t)^2 ~\mathrm {d}t\) and from this fact it is easy to see that \(X_t\) is normally distributed with

$$\begin{aligned} X_t \sim N({\text {e}}^{- t} \xi , \frac{\sigma ^2}{2} {\text {e}}^{-2 t}({\text {e}}^{2t} - 1)). \end{aligned}$$

The exact value of \(\sigma \) is given by formula (1.28). In the following we use the estimate \(\sigma ^2 \simeq 0.126\) calculated in [22].
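The Gaussian law of \(X_t\) stated above can be checked numerically. The sketch below evaluates the exact mean \({\text {e}}^{-t}\xi \) and the variance, rewritten as \(\frac{\sigma ^2}{2}(1 - {\text {e}}^{-2t})\), and compares them with an Euler-Maruyama ensemble for (4.15). The function names, the Euler-Maruyama scheme, the ensemble size and the step size are assumptions chosen for illustration and are not taken from the text; only the value \(\sigma ^2 \simeq 0.126\) is the estimate quoted from [22].

```python
import numpy as np

def ou_mean_var(t, xi, sigma2):
    """Exact mean and variance of X_t for dX = -X dt + sigma dW, X(0) = xi."""
    mean = np.exp(-t) * xi
    var = 0.5 * sigma2 * (1.0 - np.exp(-2.0 * t))
    return mean, var

def ou_euler_maruyama(t_end, xi, sigma2, n_paths, dt, seed=0):
    """Euler-Maruyama samples of X(t_end) for (4.15); illustrative scheme only."""
    rng = np.random.default_rng(seed)
    n_steps = int(round(t_end / dt))
    x = np.full(n_paths, float(xi))
    noise_scale = np.sqrt(sigma2 * dt)  # standard deviation of sigma * dW over one step
    for _ in range(n_steps):
        x = x - x * dt + noise_scale * rng.standard_normal(n_paths)
    return x

SIGMA2 = 0.126  # estimate of sigma^2 quoted in the text from [22]
mean_exact, var_exact = ou_mean_var(2.0, 0.0, SIGMA2)
samples = ou_euler_maruyama(2.0, 0.0, SIGMA2, n_paths=20000, dt=0.01)
print(mean_exact, var_exact, samples.mean(), samples.var())
```

The ensemble mean and variance should lie close to the exact values, up to Monte Carlo error and the time-discretization bias of the scheme.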

Furthermore, since \(C_0({\mathbb {R}}) \subset C_b({\mathbb {R}})\), equation (4.14) is slightly weaker than uniform convergence in distribution of the process \(X^{\varepsilon _{k_m},0}(t)\) towards X(t). The following Figs. 2 and 3 verify equation (4.14) numerically.

Fig. 2

Averages of the process \(X^{\varepsilon , 0}(t)\) satisfying equation (4.13) for different values of \(\varepsilon >0\) and theoretical average for \(\varepsilon = 0\), i.e. for the limiting process X(t) satisfying (4.15) with the initial condition \(\xi =0\). The averages are taken over 500 different realizations on the Lorenz attractor. In (a), convergence of the averages for \(\varepsilon = 0.8, 0.3, 0.08\) is shown, while (b) illustrates the continuation of the converging behaviour on a smaller scale for \(\varepsilon = 0.08, 0.05\)

Fig. 3

Histograms of the process \(X^{\varepsilon , 0}(t)\), corresponding with Fig. 2, taking \(\varepsilon =0.8, 0.3, 0.08\) at time \(t=10\) (a) and again \(\varepsilon =0.08, 0.05\) at time \(t=0.5\) (b) satisfying equation (4.13), in comparison to the distribution of the limiting process X(t), solving (4.15) with the initial condition \(\xi =0\). We used ensembles of 500 realizations

Figure 2 shows that equation (4.14) is satisfied for f being the identity function (note that, since the process \(X^{\varepsilon ,0}\) is uniformly bounded for every \(\varepsilon \ge 0.05\), we can assume without loss of generality that f coincides with the identity function only in a compact interval and that \(f \in C_0({\mathbb {R}})\)). Apart from that, Fig. 3 suggests that we actually have convergence in distribution of the slow process \(X^{\varepsilon ,0}\), satisfying the chaotic ODE (4.13) (for \(\delta =0\)), towards the limiting stochastic process X satisfying the SDE (4.15), which is a reduced stochastic equation for the slow process \(X^{\varepsilon ,0}\). This illustrates the reduction effect one is looking for, since the chaotic fast degrees of freedom are now encoded in a low-dimensional SDE.

5 Conclusion and Outlook

In this paper we have extended results on deterministic homogenization of fast-slow ODEs to the case where coupling of the fast and slow variables is part of the model. Our main strategy was to add small stochastic noise to the fast subsystem and then take two independent limits (namely the zero-noise limit and the limit \(\varepsilon \rightarrow 0\)), which enabled us to use results and functional-analytical methods from stochastic systems. For generally coupled systems, we have succeeded in proving a certain weak form of convergence of the slow process, similar to uniform convergence of the first moments, requiring strong mixing assumptions on the fast flow. However, for the intermediate case of weakly-coupled systems, the mixing assumptions are relatively mild. Our method also directly yields explicit expressions for the drift and diffusion coefficients of the limiting SDE.

This paper can be seen as one of the first steps to understand homogenization of coupled fast-slow systems in continuous time and leaves open several relevant questions for further research. One task is to find, numerically and/or analytically, more direct examples from applications for which the strong mixing condition (1.25) is satisfied. Moreover, the key assumption of stochastically stable DOC in the sense of Definition 3.5 needs to be investigated. Another goal will be to find alternative representations of the drift and diffusion coefficients of the limiting diffusion, such that potentially weaker or even no mixing assumptions are required, as seen in [26, 27]. In addition to that, it will be crucial to study the behavior of the higher moments of the slow process in order to prove weak convergence of the respective measures in \(C([0,T],{\mathbb {R}}^d)\).