Homogenization for Generalized Langevin Equations with Applications to Anomalous Diffusion

We study homogenization for a class of generalized Langevin equations (GLEs) with state-dependent coefficients and exhibiting multiple time scales. In addition to the small mass limit, we focus on homogenization limits, which involve taking to zero the inertial time scale and, possibly, some of the memory time scales and noise correlation time scales. The latter are meaningful limits for a class of GLEs modeling anomalous diffusion. We find that, in general, the limiting stochastic differential equations (SDEs) for the slow degrees of freedom contain non-trivial drift correction terms and are driven by non-Markov noise processes. These results follow from a general homogenization theorem stated and proven here. We illustrate them using stochastic models of particle diffusion.


1.1. Motivation
Most mathematical models of diffusion phenomena use noise that is white (i.e. uncorrelated) or Markovian [53]. The present paper is a step towards removing this limitation. The diffusion models studied here are driven by noises belonging to a wide class of non-Markov processes. A standard example of Markovian noise is a multidimensional Ornstein-Uhlenbeck process. An important class of Gaussian stochastic processes is obtained by linear transformations of multidimensional Ornstein-Uhlenbeck processes. The covariance (equal to the correlation in the case of zero mean) of such a process is a linear combination of exponentials decaying, and possibly oscillating, on different time scales, and its spectral density (power spectrum) is a ratio of two positive semi-definite polynomials [16]. When the polynomial in the denominator has degenerate zeros, the covariance contains products of exponentials and polynomials in time. This is a very general class of processes: every stationary Gaussian process whose covariance is a Bohl function (see Section 3) can be obtained as a linear transformation of an Ornstein-Uhlenbeck process in some (finite) dimension. In general, these processes are not Markov. Let us mention here the seminal result by L.A. Khalfin from 1957 [32], who showed, quite generally, that in any system whose energy spectrum is bounded from below (a necessary condition for physical stability), correlations must decay no faster than a power law. To this day this result provides inspiration and motivation for further studies in the context of thermalization [70], cooling of atoms in photon reservoirs [40], decay of metastable states as monitored by luminescence [63], and the quantum anti-Zeno effect (cf. [59,41]), to name a few examples. Khalfin's result further motivates studying systems with non-Markovian noise, as most natural examples of strongly correlated processes do not satisfy the Markov property.
While the noise processes studied here have exponentially decaying covariances, their class is very rich, and they may be useful for approximating strongly correlated noises on the time intervals relevant to the phenomena under study [67]. In addition, as discussed in more detail later, a generalization of the method applied here may lead to a representation of a class of noises whose covariances decay as power laws (see Remark 3.7). Also, the representation of the spectral density of the noise process as a ratio of two polynomials is convenient in applications, in particular for solving the problem of predicting (in the least mean square sense) a colored noise process given observations on a finite segment of the past or on the full past [16].

1.2. Definitions and Models
We consider the following stochastic model for a particle (for instance, a Brownian particle or a tagged tracer particle) interacting with its environment (for instance, a heat bath or a viscous fluid). Let $x_t \in \mathbb{R}^d$ denote the particle's position, where $t \geq 0$ denotes time and $d$ is a positive integer. The evolution of the particle's velocity, $v_t := \dot{x}_t \in \mathbb{R}^d$, is described by the following generalized Langevin equation (GLE):
$$m \, dv_t = F_0(t, x_t, v_t, \eta_t) \, dt + F_1(t, \{x_s, v_s\}_{s \in [0,t]}, \xi_t) \, dt + F_e(t, x_t) \, dt. \tag{1.1}$$
In the above, $m > 0$ is the particle's mass, $\eta_t$ is a $k$-dimensional Gaussian white noise satisfying $E[\eta_t] = 0$ and $E[\eta_t \eta_s^*] = \delta(t - s) I$, and $\xi_t$ is a colored noise process independent of $\eta_t$. Here and throughout the paper, the superscript $*$ denotes transposition of matrices or vectors, $I$ denotes the identity matrix of appropriate dimension, $E$ denotes expectation, and $\mathbb{R}^+ := [0, \infty)$. The initial data are random variables, $x_0 = x$, $v_0 = v$, independent of $\{\xi_t, t \in \mathbb{R}^+\}$ and $\{\eta_t, t \in \mathbb{R}^+\}$.
The three terms on the right hand side of (1.1) model forces of different physical natures acting on the particle.
(i) $F_e$ is an external force field, which may be conservative (potential) or not.
(ii) $F_0$ is a Markovian force of the form
$$F_0(t, x_t, v_t, \eta_t) \, dt = -\gamma_0(t, x_t) v_t \, dt + \sigma_0(t, x_t) \, dW^{(k)}_t, \tag{1.2}$$
containing an instantaneous damping term and a multiplicative white noise term. The damping and noise coefficients, $\gamma_0 : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times d}$ and $\sigma_0 : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times k}$, may depend on the particle's position and on time. $W^{(k)}_t$ denotes a $k$-dimensional Wiener process, the time integral of the white noise $\eta_t$.
(iii) $F_1$ is a non-Markovian force of the form
$$F_1(t, \{x_s, v_s\}_{s \in [0,t]}, \xi_t) = -g(t, x_t) \int_0^t \kappa(t - s) h(s, x_s) v_s \, ds + \sigma(t, x_t) \xi_t, \tag{1.3}$$
containing a non-instantaneous damping term, describing the delayed drag effects of the environment on the particle, and a multiplicative colored noise term. The coefficients, $g : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times q}$, $h : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{q \times d}$ and $\sigma : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times r}$, depend in general on the particle's position and on time. In the above, $q$ and $r$ are positive integers, and the memory function $\kappa : \mathbb{R} \to \mathbb{R}^{q \times q}$ is a real-valued function that decays sufficiently fast at infinity. $\xi_t \in \mathbb{R}^r$ is a mean-zero stationary Gaussian vector process, to be defined in detail later. The statistical properties of the process $\xi_t$ are completely determined by its (matrix-valued) covariance function,
$$R(t) := E[\xi_t \xi_0^*] = R^*(-t) \in \mathbb{R}^{r \times r}, \tag{1.4}$$
or equivalently, by its spectral density, $S(\omega)$, i.e. the Fourier transform of $R(t)$, defined as:
$$S(\omega) = \int_{-\infty}^{\infty} R(t) e^{-i \omega t} \, dt. \tag{1.5}$$
For simplicity, we have omitted other forces, such as the Basset force [24], from Eqn. (1.1). Note that $F_0$ and $F_1$ describe two types of forces associated with different physical mechanisms. Of particular interest is the case in which the noise terms in $F_0$ and $F_1$ model environments of different natures (a passive bath and an active bath, respectively [14]) with which the particle interacts.
As the name itself suggests, GLEs are generalized versions of the Markovian Langevin equations frequently employed to model physical systems. A basic form of the GLEs was first introduced by Mori in [52] and subsequently used in numerous statistical physics models [35,71,75]. The study of GLEs has attracted increasing interest in recent years. We refer to, for instance, [49,45,68,26,23,46,38,74,69] for various applications of GLEs and [55,48,21,39] for their asymptotic analysis. The main merit of GLEs from a modeling point of view is that they take into account the effects of memory and the colored nature of the noise on the dynamics of the system.
Remark 1.1. In general, there need not be any relation between $\kappa(t)$ and $R(t)$, or between the damping coefficients and the noise coefficients appearing in the formulas for $F_0$ and $F_1$. A particular but important case, which we will revisit often in this paper, is that in which a fluctuation-dissipation relation holds. In this case, $\gamma_0$ is proportional to $\sigma_0 \sigma_0^*$, $h = g^*$, $g$ is proportional to $\sigma$ and (without loss of generality) $R(t) = \kappa(t)$. Studies of microscopic Hamiltonian models for open classical systems lead to GLEs of the form (1.1) satisfying the above fluctuation-dissipation relation (see, for instance, Appendix A of [42] or [11]). On another note, GLEs of the form (1.1) are extended versions of the ones studied in our previous work [42]: here the GLEs are generalized to include a Markovian force, in addition to the non-Markovian one, as well as explicit time dependence in the coefficients.
As a motivation, we now provide and elaborate on examples of systems that can be modeled by our GLEs.
An important type of diffusion, which has been observed in many physical systems, from charge transport in amorphous materials to intracellular particle motion in the cytoplasm of living cells [62], is ballistic diffusion. It is a subclass of anomalous diffusion and is characterized by the property that the particle's long-time mean-squared displacement grows quadratically in time, in contrast to the linear growth in usual diffusion. There are many different theoretical models of anomalous diffusion with diverse properties, coming from different physical assumptions; see [50] for a comprehensive survey. In the following, we provide two GLE models that are employed to study such phenomena. Their properties will be studied in Section 2, as an application of the results proven here.
Example 1. Two GLE models for anomalous diffusion of a free Brownian particle in a heat bath. A large class of models for diffusive systems is described by the system of equations (for simplicity, we restrict to one dimension): where x t , v t ∈ R are the position and velocity of the particle, κ(t) is called the memory function, and ξ t is a mean-zero stationary Gaussian process.
Observe that the spectral densities in both models share the same asymptotic behavior near $\omega = 0$, i.e. $S(\omega) \sim \omega^2$ as $\omega \to 0$, contributing to the enhanced diffusion (super-diffusion) of the particle, with mean-squared displacement growing as $t^2$ as $t \to \infty$ [68]. See Proposition 3.5 for a precise argument.
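To make the link between $S(\omega) \sim \omega^2$ and a vanishing integrated covariance concrete, the following short computation treats one noise of this type, the time derivative of a stationary Ornstein-Uhlenbeck process. The rate $\Gamma > 0$ and the unit noise strength are illustrative assumptions and need not match the constants used in (M1) and (M2); the Fourier convention is $S(\omega) = \int R(t) e^{-i\omega t} dt$.

```latex
% Illustrative normalization (assumption): d\beta_t = -\Gamma \beta_t \, dt + dW_t, in its stationary state.
\begin{align*}
R_\beta(t) &= \frac{1}{2\Gamma} e^{-\Gamma |t|},
  & S_\beta(\omega) &= \frac{1}{\omega^2 + \Gamma^2}, \\
S_{\dot\beta}(\omega) &= \omega^2 S_\beta(\omega)
  = \frac{\omega^2}{\omega^2 + \Gamma^2}
  = 1 - \frac{\Gamma^2}{\omega^2 + \Gamma^2},
  & R_{\dot\beta}(t) &= \delta(t) - \frac{\Gamma}{2} e^{-\Gamma |t|}, \\
\int_{-\infty}^{\infty} R_{\dot\beta}(t) \, dt &= S_{\dot\beta}(0) = 0,
  & S_{\dot\beta}(\omega) &\sim \frac{\omega^2}{\Gamma^2} \quad (\omega \to 0).
\end{align*}
```

The positive delta peak at $t = 0$ is exactly cancelled by the negative exponential tail, which is the mechanism behind the $\omega^2$ behavior at low frequencies.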
Other examples of systems that can be modeled by our GLEs are multiparticle systems with hydrodynamic interaction [17], active matter systems [65], among others. Although our main results are applicable to these systems, we will not pursue the study of these systems here.
1.3. Goals, Organization and Summary of Results of the Paper
Goals of the Paper. We aim to derive homogenized models for a general class of GLEs (see Section 3), containing the examples (M1) and (M2) as special cases (see Corollary 2.1 and Corollary 2.2). This will allow us to gain insight into the stochastic dynamics of such systems, including many systems that exhibit anomalous diffusion (see the discussion in the paragraph before Example 1); this is, in fact, the main motivation of the present paper. To the best of our knowledge, this is the first work that studies homogenization for GLE models describing anomalous diffusion. Given a GLE system, it is often desirable to work with simpler, reduced models that capture the essential features of its dynamics. To obtain satisfactory and optimal models, one needs to take into account the trade-off between the simplicity and the accuracy of the reduced models sought. Indeed, one may find that a reduced model, while simplified, fails to give a physically correct description of the system of interest [64]. Two successful reductions were carried out in [28] for the case $F_1 = 0$ and in [42] for the case $F_0 = 0$.
One of our main goals in this paper is to devise and study new homogenization procedures that yield reduced models retaining the essential features of a more general class of models. This program is important for the identification, parameter inference and uncertainty quantification of stochastic systems [61,25,45,38] arising in studies of anomalous diffusion [49,51], climate modeling [22,47] and molecular systems [10], among others. There is an increasing effort to implement this or related programs, starting from microscopic models [60] and using various techniques [20,57,7,19,26], for different systems of interest in the literature. The derived effective SDE models will be of particular interest to modelers of anomalous diffusion.
Organization of the Paper. The paper is organized as follows. In Section 2, we first apply the results obtained in later sections (Section 5 and Section 6) to study homogenization of generalized versions of the one-dimensional models (M1) and (M2) from Example 1. Since these results are easier to state and require minimal notation to understand, we have chosen to present them as early as possible to demonstrate the value of our study to application-oriented readers. The later sections study an extended, multidimensional version of the GLEs in Section 2. In Section 3 we introduce the GLEs to be studied and revisit them from the perspective of input-output stochastic dynamical systems exhibiting multiple time scales. In Section 4, we discuss various ways of homogenizing GLEs. Following this discussion, we study the small mass limit of the GLEs in Section 5. We introduce and study novel homogenization procedures for a class of GLEs in Section 6. We state conclusions and make final remarks in Section 7. Relevant technical details and supplementary materials are provided in the appendix. In particular, we state a homogenization theorem for a general class of SDEs with state-dependent coefficients in Appendix A. The proof of this theorem is given in Appendix B.
Summary of the Main Results. For the reader's convenience, we list below (not in exactly the order in which they appear in the paper) and summarize the main results obtained in the paper.
• The first main result is Theorem 5.4. It studies the small mass limit of the GLE described by (5.1)-(5.2). It states that the position process converges, in a strong pathwise sense, to a component of a higher-dimensional process satisfying an Itô SDE. The SDE contains non-trivial drift correction terms. We stress that, while being a component of a Markov process, the limiting position process itself is not Markov. This is in contrast to the nature of the limiting processes obtained in earlier works, a difference that has interesting implications from a physical point of view (recall the discussion after Eqn. (1.5)). Therefore, Theorem 5.4 constitutes a novel result, both mathematically and physically.
• The second main result is Theorem 6.7. It describes the homogenized behavior of a family of GLEs (Eqns. (6.16)-(6.17)), parametrized by $\varepsilon > 0$, in the limit as $\varepsilon \to 0$. This limit is equivalent to the limit in which the inertial time scale, some of the memory time scales and some of the noise correlation time scales in the pre-limit system tend to zero at the same rate. As in Theorem 5.4, the result here states that the position process converges, in a strong pathwise sense, to a component of a higher-dimensional process satisfying an Itô SDE which contains non-trivial drift correction terms. Again, the limiting position process is non-Markov. However, the structure of the SDE is rather different from the one obtained in Theorem 5.4. As discussed later, this result has interesting consequences for systems exhibiting anomalous diffusion.
• The third and fourth main results are Corollary 2.1 and Corollary 2.2. These results specialize the earlier ones to one-dimensional GLE models, which are generalizations of (M1) and (M2), and follow from the earlier theorems. They give explicit expressions for the drift correction terms present in the limiting SDEs and therefore may be used directly for modeling and simulation purposes. Furthermore, we show that, in the important case where the fluctuation-dissipation relation (see Remark 1.1) holds, the two corollaries are intimately connected. Recall that these results are presented first, in Section 2.
• The last main result is Theorem A.6, on homogenization of a family of parametrized SDEs whose coefficients are state-dependent. These SDEs are variants of the ones studied in earlier works [28,4,6]. In comparison with all the earlier studies, the state-dependent coefficients of the pre-limit SDEs (A.3)-(A.4) may depend explicitly on the parameter $\varepsilon > 0$ (to be taken to zero). Therefore, this result is new and not simply a minor generalization of earlier results. Moreover, it is needed in the present paper to study various homogenization limits of GLEs, whose importance is evident from the discussion above.

Application to One-Dimensional GLE Models
We first study the small mass limit of a one-dimensional GLE, which is a generalized version of the GLE in model (M2) of Example 1, modeling superdiffusion of a particle in a heat bath. Our models are generalized in that the coefficients of the GLEs are state-dependent. For simplicity, we are going to omit the explicit time dependence in the damping and noise coefficients-but not in the external force. For t ∈ R + , m > 0, let x t , v t ∈ R be the solutions to the equations: where Γ 1 > 0, and ξ t is the mean-zero stationary Gaussian process with the covariance function R(t) = κ(t) and spectral density, The initial data (x, v) are random variables independent of and have finite moments of all orders.
The following corollary describes the limiting SDE for the particle's position obtained in the small mass limit of (2.1)-(2.2).
We next specialize the result of Theorem 6.7 to study homogenization of one-dimensional GLEs which are generalizations of the model (M1) in Example 1: for t ∈ R + , m > 0, let x t , v t ∈ R be the solutions to the equations: with Γ 2 > Γ 1 > 0, and ξ t is the mean-zero stationary Gaussian process with the covariance function R(t) = κ(t) and spectral density, . (2.14) The initial data (x, v) are random variables independent of and have finite moments of all orders.
Discussion. We discuss the physical meaning behind the above rescaling of parameters. Recall that in the first case of Example 1 (i.e. the model (M1)), the mean-squared displacement of the particle grows as $t^2$ as $t \to \infty$, and therefore the above model describes a particle exhibiting super-diffusion. As $\varepsilon \to 0$, the environment allows for more and more negative correlation, and in the limit the covariance function consists of a delta-type peak at $t = 0$ and a long negative tail that compensates for the positive peak when integrated (see Figure 1 and also page 105 of [71]). Indeed, as $\varepsilon \to 0$. This is the so-called vanishing effective friction case in [1]. The noise with the covariance function $\kappa_\varepsilon(t)$ is called harmonic velocity noise, whereas the noise with the covariance function $\kappa(t)$ is the derivative of an Ornstein-Uhlenbeck process.
The following corollary provides the homogenized model in the limit $\varepsilon \to 0$ of (2.15)-(2.16).
Corollary 2.2. Assume that $g(y)$, $g'(y)$, $h(y)$, $h'(y)$ and $\sigma(y)$ are bounded continuous functions of $y \in \mathbb{R}$, that $F_e(t, y)$ is bounded and continuous in $t$ and $y$, and that all the listed functions have bounded derivatives in $y$. Then, in the limit $\varepsilon \to 0$, the particle's position $x_t \in \mathbb{R}$, satisfying (2.15)-(2.16), converges to $X_t$, where $X_t$ solves the following Itô SDE:

where $g = g(X_t)$, $h = h(X_t)$, $\sigma = \sigma(X_t)$, $W_t$ is a one-dimensional Wiener process, and Moreover, if in addition $g := \phi \sigma$, where $\phi > 0$, then the number of limiting SDEs reduces from three to two:
Proof. Let $d = 1$, $d_2 = d_4 = 2$ and denote the one-dimensional versions of the variables, coefficients and parameters in Theorem 6.7 by non-bold letters (for instance, $x_t$, $B_2$, $\Gamma_{2,2}$, etc.). Furthermore, set $B_2 = B_4 = \beta > 0$, $\gamma_{2,2} = \gamma_{4,2} = \gamma_2 > 0$ and $\Gamma_{2,1} = \Gamma_{4,1} = \Gamma_1$. Then it can be verified that the assumptions of Theorem 6.7 hold, and the results follow upon solving a Lyapunov equation.
(i) The homogenized position process is non-Markov, driven by a colored noise process which is the derivative of the Ornstein-Uhlenbeck process. This behavior is expected in view of the asymptotic behavior of the rescaled memory function and spectral density as $\varepsilon \to 0$.
(ii) Similarly to the small mass limit case considered earlier, the limiting equation for the particle's position not only contains noise-induced drift terms but is also coupled to equations for other slow variables. Moreover, the limiting equations for these other slow variables also contain non-trivial correction terms, the memory-induced drift.
give the same limiting SDE as taking the joint limit of $m \to 0$ and $\Gamma_2 \to \infty$. However, if one further assumes that $g$ is proportional to $\sigma$, then the limiting SDE systems coincide. An important particular case is when $g = h = \sigma$, in which case a fluctuation-dissipation relation holds and the GLE can be derived from a microscopic Hamiltonian model (see Remark 1.1). In this case, the homogenized model described in both corollaries reduces to:
To end this section, we remark that one could in principle repeat the above analysis for the case where the spectral density varies as $\omega^{2l}$, for $l = 2, 4, \ldots$ (i.e. the highly nonlinear case), as well as extend the studies done so far in various other directions.
To illustrate how non-trivial the calculations and results could become, we work out another example in Appendix D.

GLEs in Finite Dimensions
We call a system modeled by a GLE of the form (1.1) a generalized Langevin system. Its dynamics will be referred to as generalized Langevin dynamics.
We assume that the memory function $\kappa(t)$ in the GLE (1.1) is a Bohl function, i.e. that each matrix element of $\kappa(t)$ is a finite, real-valued linear combination of exponentials, possibly multiplied by polynomials and/or by trigonometric functions. The noise process, $\{\xi_t, t \in \mathbb{R}^+\}$, is a mean-zero, mean-square continuous, stationary Gaussian process with a Bohl covariance function, and therefore its spectral density $S(\omega)$ is a rational function (see Theorem 2.20 in [72]). In this case, the generalized Langevin dynamics can be realized by an SDE system in a finite-dimensional space (see the next subsection for details). The case in which an infinite-dimensional space is required is deferred to future work (see also Remark 3.7 and Section 7).
We recall a useful fact: given a rational spectral density $S(\omega) \in \mathbb{R}^{r \times r}$, there exists a rational function $G(z) \in \mathbb{C}^{r \times l}$, called a spectral factor, such that $S(\omega) = G(i\omega) G^*(-i\omega)$. We emphasize that such a factorization is not unique [43].
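The factorization and its non-uniqueness can be checked by hand in the scalar case. The sketch below is only an illustration under assumed choices: the spectral density is taken to be that of a scalar Ornstein-Uhlenbeck process with hypothetical parameters `gamma` and `sigma`, and the second factor is obtained by multiplying the first by an all-pass term with a hypothetical parameter `a`.

```python
"""Scalar check of the spectral factorization S(w) = G(iw) G*(-iw).

Assumption (illustrative): S(w) = sigma^2 / (w^2 + gamma^2), the spectral
density of a scalar Ornstein-Uhlenbeck process.
"""


def spectral_density(omega, gamma, sigma):
    """Rational spectral density sigma^2 / (omega^2 + gamma^2)."""
    return sigma ** 2 / (omega ** 2 + gamma ** 2)


def factor_simple(z, gamma, sigma):
    """One spectral factor: G(z) = sigma / (z + gamma)."""
    return sigma / (z + gamma)


def factor_all_pass(z, gamma, sigma, a=3.0):
    """Another factor: G(z) multiplied by the all-pass term (z - a) / (z + a)."""
    return factor_simple(z, gamma, sigma) * (z - a) / (z + a)


def reconstruct(factor, omega, gamma, sigma):
    """Evaluate G(i*omega) * G(-i*omega); for scalars the transpose is trivial."""
    return factor(1j * omega, gamma, sigma) * factor(-1j * omega, gamma, sigma)


if __name__ == "__main__":
    for omega in (0.0, 0.5, 2.0, 10.0):
        s = spectral_density(omega, gamma=2.0, sigma=1.5)
        s1 = reconstruct(factor_simple, omega, 2.0, 1.5)
        s2 = reconstruct(factor_all_pass, omega, 2.0, 1.5)
        print(omega, s, s1.real, s2.real)
```

Both factors reproduce the same density, since the all-pass term has unit modulus on the imaginary axis.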

Generalized Langevin Systems
Below we define the memory function and the noise process in the GLE (1.1) (see Eqn. (1.3)), and along the way introduce our notation. They are defined in a manner ensuring simplicity as well as providing sufficient parameters for matching the memory function and the correlation function of the noise, thereby preserving the essential statistical properties of the GLE. This provides a systematic framework for our homogenization studies (see the discussion in Section 4).
For $i = 1, 2, 3, 4$, let $\Gamma_i \in \mathbb{R}^{d_i \times d_i}$, $M_i \in \mathbb{R}^{d_i \times d_i}$, $\Sigma_i \in \mathbb{R}^{d_i \times q_i}$ be constant matrices. Also, let $C_i \in \mathbb{R}^{q \times d_i}$ (for $i = 1, 2$) and $C_i \in \mathbb{R}^{r \times d_i}$ (for $i = 3, 4$) be constant matrices. Here, the $d_i$ and $q_i$ ($i = 1, 2, 3, 4$) are positive integers. Let $\alpha_i \in \{0, 1\}$ be a "switch on or off" parameter. We define the memory function in terms of the sextuple $(\Gamma_1, M_1, C_1; \Gamma_2, M_2, C_2)$ of matrices:
$$\kappa(t) = \sum_{i=1,2} \alpha_i C_i e^{-\Gamma_i |t|} M_i C_i^*. \tag{3.1}$$
The noise process is defined as:
$$\xi_t = \sum_{j=3,4} \alpha_j C_j \beta^j_t, \tag{3.2}$$
where the $\beta^j_t \in \mathbb{R}^{d_j}$ ($j = 3, 4$) are independent Ornstein-Uhlenbeck type processes, i.e. solutions of the SDEs:
$$d\beta^j_t = -\Gamma_j \beta^j_t \, dt + \Sigma_j \, dW^{(q_j)}_t, \tag{3.3}$$
with the initial conditions, $\beta^j_0$, normally distributed with mean zero and covariance $M_j$. Here, $W^{(q_j)}_t$ denotes a $q_j$-dimensional Wiener process, independent of $\beta^j_0$. Also, the Wiener processes $W^{(q_3)}_t$ and $W^{(q_4)}_t$ are independent, and each $M_i$ satisfies the Lyapunov equation $\Gamma_i M_i + M_i \Gamma_i^* = \Sigma_i \Sigma_i^*$. The $M_i$ are therefore the steady-state covariances of the systems, i.e. the resulting Ornstein-Uhlenbeck processes are stationary. In control theory, $M_i$ is also known as the controllability Gramian for the pair $(\Gamma_i, \Sigma_i)$ [72]. The covariance matrix, $R(t)$, of the mean-zero Gaussian noise process is expressed by the sextuple $(\Gamma_3, M_3, C_3; \Gamma_4, M_4, C_4)$ of matrices as follows:
$$R(t) = \sum_{i=3,4} \alpha_i C_i e^{-\Gamma_i |t|} M_i C_i^*,$$
and so the sextuple $(\Gamma_3, M_3, C_3; \Gamma_4, M_4, C_4)$, together with the parameters $\alpha_3$, $\alpha_4$, completely determines the probability distribution of $\xi_t$. We denote the spectral density of the noise process by $S(\omega)$. We will view the system (3.2)-(3.3) (which is in a statistical steady state) as a representation of the noise process $\xi_t$ and call such a representation a (finite-dimensional) stochastic realization of $\xi_t$. Similarly, we view (3.1) as a representation of the memory function $\kappa(t)$ and call such a representation a (finite-dimensional, deterministic) memory realization of $\kappa(t)$. We call the Fourier transforms of $\kappa(t)$ and $R(t)$ the spectral density of the memory function and the spectral density of the noise process, respectively.
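The role of $M_j$ as a steady-state covariance can be seen in a scalar sketch. The code below assumes the standard Ornstein-Uhlenbeck form $d\beta = -\Gamma \beta \, dt + \Sigma \, dW$ with scalar $\Gamma > 0$, so that the Lyapunov equation reduces to $2 \Gamma M = \Sigma^2$; the function names and parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def stationary_variance(gamma, sigma):
    """Scalar Lyapunov equation 2*gamma*M = sigma^2, i.e. M = sigma^2 / (2*gamma)."""
    return sigma ** 2 / (2.0 * gamma)


def propagate_variance(var, gamma, sigma, h):
    """Exact variance of beta after time h, given variance `var` at the start."""
    decay = math.exp(-2.0 * gamma * h)
    return decay * var + stationary_variance(gamma, sigma) * (1.0 - decay)


def covariance(t, gamma, sigma, c=1.0):
    """Stationary covariance of xi = c*beta: c * exp(-gamma*|t|) * M * c."""
    return c * math.exp(-gamma * abs(t)) * stationary_variance(gamma, sigma) * c


def sample_path(gamma, sigma, h, n_steps, seed=0):
    """Sample a stationary path using the exact one-step transition (not Euler)."""
    rng = random.Random(seed)
    m = stationary_variance(gamma, sigma)
    a = math.exp(-gamma * h)
    noise_std = math.sqrt(m * (1.0 - a * a))
    beta = rng.gauss(0.0, math.sqrt(m))  # initial condition drawn from N(0, M)
    path = [beta]
    for _ in range(n_steps):
        beta = a * beta + noise_std * rng.gauss(0.0, 1.0)
        path.append(beta)
    return path
```

Starting from variance $M$, `propagate_variance` returns $M$ again for every step size, which is the scalar version of stationarity; starting from any other variance, it relaxes to $M$.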
An important message from the stochastic realization theory is that the system (3.2)-(3.3) is more than a representation of ξ t in terms of a white noise, in that it also contains state variables β j (j = 3, 4) which serve as a "dynamical memory". In contrast to standard treatments, this dynamical memory comes not from one, but from two independent systems of type (3.3). This will be used to include two distinct types of dynamical memory that can be switched on or off using the parameters α i -see Proposition 3.5. This consideration motivates us to define the memory function (and noise) explicitly using two independent systems, with different constraints on their parameters easier to state than if a single higher-dimensional system were used.
The sextuples that define the memory function in (3.1) and the noise process in (3.2) are unique only up to the following transformations:
$$(\Gamma_i, M_i, C_i) \mapsto (T_i \Gamma_i T_i^{-1}, \; T_i M_i T_i^*, \; C_i T_i^{-1}),$$
where $i = 1, 2, 3, 4$ and the $T_i$ are any invertible matrices of appropriate dimensions [43]. Different choices of $T_i$ correspond to different coordinate systems.
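That a change of coordinates leaves the memory and covariance functions untouched can be verified in one line. The computation below is a sketch that assumes each summand has the form $C_i e^{-\Gamma_i |t|} M_i C_i^*$ and that the transformation is the standard state-space similarity $\Gamma_i' = T_i \Gamma_i T_i^{-1}$, $M_i' = T_i M_i T_i^*$, $C_i' = C_i T_i^{-1}$ with real invertible $T_i$.

```latex
% Uses e^{-T \Gamma T^{-1} |t|} = T e^{-\Gamma |t|} T^{-1} and (C T^{-1})^* = (T^{-1})^* C^*.
\begin{align*}
C_i' \, e^{-\Gamma_i' |t|} \, M_i' \, (C_i')^*
  &= C_i T_i^{-1} \cdot T_i e^{-\Gamma_i |t|} T_i^{-1} \cdot T_i M_i T_i^* \cdot (T_i^{-1})^* C_i^* \\
  &= C_i \, e^{-\Gamma_i |t|} \, M_i \, C_i^* .
\end{align*}
```

All factors of $T_i$ cancel in adjacent pairs, so the sextuple is determined by the memory function or covariance only up to such a choice of coordinates.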
Remark 3.1. The realization of the memory function and the noise process in terms of the matrix sextuples, as defined above, covers all GLEs driven by Gaussian processes that can be realized in a finite dimension (see the propositions and theorems on pages 303-308 of [73]). See also the remarks on the subject in [42].
A summary of the above discussion is included in the following assumption.
Assumption 3.2. The memory function $\kappa(t)$ in the GLE (1.1) is a real-valued Bohl function defined by (3.1), and the noise process, $\{\xi_t, t \in \mathbb{R}^+\}$, is a mean-zero, mean-square continuous, stationary Gaussian process with a Bohl covariance function (hence, with a rational spectral density), admitting a stochastic realization given by (3.2)-(3.3). Furthermore, we assume that any spectral factors $\Phi_i(z)$ ($i = 1, 2, 3, 4$) of the spectral densities $S_i(\omega)$ are minimal (see Chapter 10 in [43]).
We introduce a generalized version of the effective damping constant and effective diffusion constant used in [42], which will be useful to study the asymptotic behavior of spectral densities. Definition 3.3. For n ∈ Z, the nth order effective damping constant is defined as the constant matrix, parametrized by α 1 , α 2 ∈ {0, 1}: . Likewise, the nth order effective diffusion constant, Note that the first order effective damping constant dt are simply the effective damping constant and effective diffusion constant introduced in [42]. The memory function and the covariance function of the noise process can be expressed in terms of these constants: in the expression for first order effective damping constant is invertible and the matrix K In order to develop intuition about general GLEs, it will be helpful to study the following exactly solvable special case.
x) = g and σ(t, x) = σ be constant matrices. The initial data are the random variables, , so that the fluctuation-dissipation relations hold (see Remark 1.1 and also Remark 3.6). The resulting GLE gives a simple model describing the motion of a free particle, interacting with a heat bath. Note that generally the process v(t) is not assumed to be stationary, in particular v(0) could be an arbitrarily distributed random variable.
The following proposition gives the asymptotic behavior of the spectral densities (equivalently, covariance functions or memory functions), the regularity 2 (in the mean-square sense) of the noise process, and, in the exactly solvable case of Example 2, the long-time mean-squared displacement of the particle.
as ω → 0. Also, let k ≥ 3 be a positive odd integer and assume that L (n) 4 = 0 for 0 < n < k, where n is odd, and L (k) be the particle's initial average kinetic energy. Assume for simplicity that R(t) = κ(t) and σκ(t)σ * = h * κ * (t)g * . Then we have the following formula for the particle's mean-squared displacement (MSD): For (iii) and (iv) below, we consider the process x t solving the GLE (3.10) can be 0 or 1 and F 0 can be zero or nonzero). Then E[x(t)x * (t)] = O(t) as t → ∞, in which case we say that the particle diffuses normally. (iv) Let α 1 = 0, α 2 = 1 and F 0 = 0 (the vanishing effective damping constant case). Then E[x(t)x * (t)] = O(t 2 ) as t → ∞, in which case we say that the particle exhibits a ballistic (super-diffusive) behavior.
Proof. (i) For i = 3, 4, it is easy to compute that and so one has: 3 A process X(t) is mean-square differentiable on a time interval τ if for every t ∈ τ , as ω → 0. The first two statements in (i) then follow by Assumption 3.4. The last statement follows from Lemma 6.11 in [44].
Remark 3.6. We emphasize that superdiffusion with $E[x(t)x^*(t)]$ behaving as $t^\alpha$ as $t \to \infty$, where $\alpha > 2$, cannot take place when the velocity process converges to a stationary state. For a system to behave this way, the velocity itself has to grow with time. Moreover, we remark that one could obtain a richer class of asymptotic behaviors for the MSD by relaxing the assumption of fluctuation-dissipation relations.
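The first claim of the remark follows from a one-line bound. The sketch below assumes, for simplicity, that $x(0) = 0$ and that the second moment of the velocity is uniformly bounded, $V := \sup_{s \geq 0} E|v(s)|^2 < \infty$ (which holds in particular when the velocity converges to a stationary state with finite variance); $V$ is notation introduced here only for this illustration.

```latex
% Cauchy-Schwarz in the time variable, then the uniform bound on E|v(s)|^2.
\begin{align*}
E|x(t)|^2
  = E\left| \int_0^t v(s) \, ds \right|^2
  \le t \int_0^t E|v(s)|^2 \, ds
  \le V t^2 .
\end{align*}
```

Hence the mean-squared displacement can grow at most ballistically, and any exponent $\alpha > 2$ requires $E|v(s)|^2$ itself to grow in time.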
To summarize, (i) says that in the case where $F_0 = 0$, $\alpha_1 = \alpha_3 = 0$, the $n$th order effective constants characterize the asymptotic behavior of the spectral densities at low frequencies; (ii) provides a formula for the particle's mean-squared displacement; and (iii)-(iv) classify the types of diffusive behavior of the GLE model in the exactly solvable case of Example 2, which satisfies the fluctuation-dissipation relations. We emphasize that in the sequel we go beyond this exactly solvable case; in particular, the coefficients $g$, $h$, $\sigma$, $\gamma_0$, $\sigma_0$ will in general depend on the particle's position. However, the GLE in the exactly solvable case can be viewed as a linear approximation to the general GLE (1.1) (obtained by expanding these coefficients in a Taylor series about a fixed position $x \in \mathbb{R}^d$).
In view of Proposition 3.5, the parameters α i ∈ {0, 1} allow us to control diffusive behavior of the generalized Langevin dynamics. Our GLE models are very general and need not satisfy a fluctuation-dissipation relation. As we will see, these different behaviors motivate our introduction and study of various homogenization schemes for the GLE. Depending on the physical systems under consideration, one scheme might be more realistic than the others. It is one of the goals of this paper to explore homogenization schemes for different GLE classes.
Remark 3.7. In finite dimension, it is not possible to realize generalized Langevin dynamics with a noise and/or memory function whose spectral density varies as $1/\omega^p$, $p \in (0, 1)$, near $\omega = 0$ (i.e. the so-called $1/f$-type noise [36]), so that the noise covariance function and/or memory function decays as a power $1/t^\alpha$, $\alpha \in (0, 1)$, as $t \to \infty$. In this case one can use the formula in (ii) of Proposition 3.5 to show, at least for the exactly solvable case in Example 2 where the fluctuation-dissipation relations hold, that the asymptotic behavior of the particle is sub-diffusive, i.e. $E[x(t)x^*(t)]$ behaves as $t^\beta$, where $\beta \in (0, 1)$, as $t \to \infty$ (see also the related works [48,15]). Sub-diffusive behavior has been discovered in a wide range of statistical and biological systems [34], which makes the study of this case relevant. One could, following the ideas in [21,54], extend the state space of the GLEs to an infinite-dimensional one in order to study the sub-diffusive case. Homogenization studies for this case, where more technicalities are expected due to the infinite-dimensional nature of the systems, will be explored in future work.
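Although an exact power law is out of reach in finite dimension, on a bounded time window a finite sum of decaying exponentials (a Bohl function) can track it closely. The sketch below illustrates one standard numerical device for this, which is not taken from the paper: a trapezoidal discretization, on a logarithmic grid of rates, of the identity $t^{-\alpha} = \Gamma(\alpha)^{-1} \int_0^\infty s^{\alpha - 1} e^{-st} \, ds$. The grid parameters `k_min`, `k_max` and `base` are hypothetical choices tuned for the window $t \in [1, 100]$.

```python
import math


def exponential_sum_for_power_law(alpha, k_min=-24, k_max=6, base=2.0):
    """Return (rates, weights) with sum_k w_k * exp(-s_k * t) close to t**(-alpha).

    Substituting s = exp(u) in the integral representation and applying the
    trapezoidal rule with step log(base) gives rates s_k = base**k and
    weights w_k = log(base) * s_k**alpha / Gamma(alpha).
    """
    log_step = math.log(base)
    rates = [base ** k for k in range(k_min, k_max + 1)]
    weights = [log_step * s ** alpha / math.gamma(alpha) for s in rates]
    return rates, weights


def evaluate(rates, weights, t):
    """Evaluate the exponential sum at time t > 0."""
    return sum(w * math.exp(-s * t) for s, w in zip(rates, weights))


if __name__ == "__main__":
    alpha = 0.5
    rates, weights = exponential_sum_for_power_law(alpha)
    for t in (1.0, 10.0, 100.0):
        approx = evaluate(rates, weights, t)
        exact = t ** (-alpha)
        print(t, approx, exact, abs(approx - exact) / exact)
```

Outside the chosen window the approximation degrades: for times much larger than the inverse of the smallest rate, the sum decays exponentially rather than as a power.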

Generalized Langevin Systems as Input-Output Stochastic Dynamical Systems with Multiple Time Scales
In this subsection, we discuss GLEs of the form (1.1), under Assumptions 3.2-3.4, from the input-output system-theoretic and multiple time scale points of view. First, we introduce the notion of stochastic dynamical systems.
Definition 3.8. A stochastic dynamical system is a pair (Z, F) of vector-valued stochastic processes satisfying equations of the form: where A, B, C are measurable (jointly in t and Z) mappings and η(t) is a random process (the input). Z(t) is called the state process and F(t) the output process (observation process). The system is linear if all the mappings are at most linear in Z; otherwise the system is nonlinear. The system is time-invariant if all the mappings are independent of t.
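As a minimal sketch of the simplest instance of Definition 3.8, the following Python snippet simulates a linear time-invariant system driven by white noise. The specific form dZ = A Z dt + B dW with output F = C Z, the Euler-Maruyama discretization, the function name `simulate_lti` and all numerical values are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def simulate_lti(A, B, C, z0, dt, n_steps, rng):
    """Euler-Maruyama scheme for the assumed linear time-invariant system
    dZ = A Z dt + B dW (state process), F = C Z (output process)."""
    A, B, C = (np.atleast_2d(M).astype(float) for M in (A, B, C))
    Z = np.empty((n_steps + 1, A.shape[0]))
    Z[0] = z0
    for k in range(n_steps):
        # Wiener increment with variance dt in each input channel
        dW = rng.normal(0.0, np.sqrt(dt), size=B.shape[1])
        Z[k + 1] = Z[k] + A @ Z[k] * dt + B @ dW
    F = Z @ C.T  # output is a linear read-out of the state
    return Z, F

# Hypothetical two-mode Ornstein-Uhlenbeck state with time scales 1 and 1/5.
rng = np.random.default_rng(0)
A = -np.diag([1.0, 5.0])
B = np.array([[np.sqrt(2.0)], [np.sqrt(10.0)]])
C = np.array([[1.0, 1.0]])
Z, F = simulate_lti(A, B, C, z0=np.zeros(2), dt=1e-3, n_steps=2000, rng=rng)
```

In this toy example the state Z is Markov, while the scalar output F, being a linear combination of two modes with different decay rates, is in general not a Markov process on its own.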
The equation for the particle's position, together with the GLE (1.1), can be cast as the system of SDEs for the Markov process where we have defined the auxiliary memory processes: It is easy to see that the pairs (β i , ξ i ), i = 3, 4, defined in the previous subsection, are linear time-invariant Gaussian stochastic dynamical systems with a white noise input (and therefore the state processes β j (t) are Markov) in the sense of Definition 3.8. Also, the pairs (y i t , C i y i t ) (i = 1, 2) are linear stochastic dynamical systems driven by the random processes x t )v t , which depend on the particle's position and velocity variables. The generalized Langevin system can be viewed as a nonlinear stochastic dynamical system (z, F ), where the components of z satisfy the SDEs (3.36)-(3.39) and F is a measurable mapping describing an output process or a quantity of interest, for instance, for p > 0 and T > 0. In the exactly solvable case of Example 2, the generalized Langevin system reduces to a linear time-invariant stochastic dynamical system and can be viewed as a network of input-output systems consisting of components modeling the memory and noise. One of the goals of homogenization of GLEs is to reduce the number of components needed to describe the effective dynamics in the considered limit. It is a natural question which class of GLEs should be taken as the starting point for homogenization. For a feasible treatment, the GLEs should be in some sense minimal. In the network interpretation, the original system should be completely described by a minimal number of components, with no redundancies. We will discuss this below, based on a time scale analysis.
The (discrete) spectrum of the Γ i (i = 1, 2) and of the Γ j (j = 3, 4) (or equivalently, the spectrum of the Bohl memory function κ(t) and that of the covariance function R(t); see Definition 2.5 in [72]) encodes information about the memory time scales and noise correlation time scales present in the generalized Langevin system, respectively. In realistic experiments, there may be many, possibly infinitely many, time scales (each corresponding to a mode of the environment), but typically they cannot all be observed and/or controlled. When modeling a system, it is important to focus on those time scales that are controllable and observable. This motivates the following definition, closely related to the notions of controllable and observable eigenvalues from systems theory [72].
Definition 3.9. Consider a linear stochastic dynamical system (Z, F ), as in
The following proposition, which follows from Theorem 3.13 in [72], states well-known results regarding the above notions.
(i) The system is controllable (i.e. the matrix $[B \;\; AB \;\; \cdots \;\; A^{n-1}B]$ is full rank) if and only if every time scale of the system is controllable.
(ii) The system is observable (i.e. the matrix $[C \;\; CA \;\; \cdots \;\; CA^{n-1}]^*$ is full rank) if and only if every time scale of the system is observable.
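The rank conditions above can be checked numerically. The following Python sketch builds the two Kalman matrices for a small example; the function names and the matrices `A`, `B`, `C` are hypothetical choices made for illustration, with `A` playing the role of a drift matrix with time scales 1 and 1/5.

```python
import numpy as np

def controllability_matrix(A, B):
    """Return [B, AB, ..., A^(n-1) B] for an n x n matrix A."""
    blocks = [B]
    for _ in range(A.shape[0] - 1):
        blocks.append(A @ blocks[-1])
    return np.hstack(blocks)

def observability_matrix(A, C):
    """Return the stacked matrix with block rows C, CA, ..., C A^(n-1)."""
    blocks = [C]
    for _ in range(A.shape[0] - 1):
        blocks.append(blocks[-1] @ A)
    return np.vstack(blocks)

def is_controllable(A, B):
    return np.linalg.matrix_rank(controllability_matrix(A, B)) == A.shape[0]

def is_observable(A, C):
    return np.linalg.matrix_rank(observability_matrix(A, C)) == A.shape[0]

# Hypothetical example: two decoupled modes driven by one common input.
A = np.diag([-1.0, -5.0])
B = np.array([[1.0], [1.0]])
C_partial = np.array([[1.0, 0.0]])  # reads out only the first mode
C_full = np.array([[1.0, 1.0]])     # reads out both modes

print(is_controllable(A, B))          # both modes are excited by the input
print(is_observable(A, C_partial))    # the second mode is hidden from the output
print(is_observable(A, C_full))
```

With `C_partial` the time scale 1/5 is present in the state but invisible in the output, so a realization that keeps it is not minimal. This is the kind of redundancy that Assumption 3.11 rules out.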
Our consideration of GLEs will be based on the following assumption.
Assumption 3.11. All the memory time scales and the noise correlation time scales in the generalized Langevin systems described by (1.1) are controllable and observable.
From the mathematical point of view, our consideration minimizes the dimension of the state space on which the GLE is realized and therefore minimizes the complexity of the model which will be taken as the starting point for our homogenization studies. Indeed, recall that a stochastic realization is minimal if the realized process has no other stochastic realization of smaller dimension. It follows from our assumptions that all the realizations of the memory function and noise process are minimal, since a sufficient condition for a linear stochastic dynamical system to be minimal is that it is controllable (or reachable in the language of [43]), observable and the spectral factor of its spectral density is minimal [43].

On the Homogenization of Generalized Langevin Dynamics
In this section, we discuss some new directions for homogenization of GLEs.
In the case of non-vanishing (first order) effective damping constant and effective diffusion constant, homogenization of a version of the GLE (1.1) was studied in [42], where a limiting SDE for the position process was obtained in the limit in which all the characteristic time scales of the system (i.e. the inertial time scale, the memory time scale and the noise correlation time scale) tend to zero at the same rate. Extending this result, we are going to focus on the following two cases.
(A) The case where an instantaneous damping term is present in the GLE, i.e. $F_0 \neq 0$, or the non-vanishing effective damping constant case, i.e. $\alpha_1 = 1$. Together with the conditions in Example 2, this gives a model for normally diffusing systems; see Proposition 3.5 (iii). One can study the limit in which the inertial time scale and a subset (possibly all or none) of the other characteristic time scales of the system tend to zero; in particular, the small mass limit of the generalized Langevin dynamics in the case $F_0 \neq 0$. We remark that the small mass limit is not well-defined in the case $F_0 = 0$ and $\alpha_1 = \alpha_3 = 1$; this was first observed in [49], where it was pointed out that the limit leads to the phenomenon of anomalous gap of the particle's mean-squared displacement (see also [10,29]).
(B) The vanishing effective damping constant and effective diffusion constant case, i.e. $F_0 = 0$, $\alpha_1 = \alpha_3 = 0$, $\alpha_2 = \alpha_4 = 1$. Together with the conditions in Example 2, this gives a model for systems with superdiffusive behavior; see Proposition 3.5 (iv). One can study the limit in which the inertial time scale, a subset of the memory time scales and a subset of the noise correlation time scales tend to zero at the same rate. Such effective models are physically relevant when they preserve the asymptotic behavior of the spectral densities at low and/or high frequencies in the limit. Situations are also possible where some of the eigenmodes of the memory and noise spectrum are damped much more strongly than others, for example due to an injection of monochromatic light from a laser into a system that is originally in thermal equilibrium. This justifies studying homogenization limits that selectively target a part of the frequencies of the memory and noise.
We will study homogenization of the GLE (1.1) in the limits described in the above scenarios. In all cases, the inertial time scale is taken to zero -this gives rise to the singular nature of the limit problems. We remark that one could also consider the more interesting scenarios in which the time scales tend to zero at different rates, but we choose not to pursue this in this already lengthy paper.
Notation. Throughout the paper, we denote the variables in the pre-limit equations by small letters (for instance, $x^\epsilon(t)$), and those of the limiting equations by capital letters (for instance, $X(t)$). We use Einstein's summation convention on repeated indices. The Euclidean norm of an arbitrary vector $w$ is denoted by $|w|$ and the (induced operator) norm of a matrix $A$ by $\|A\|$. For an $\mathbb{R}^{n_2 \times n_3}$-valued function $f(y) := ([f]_{jk}(y))_{j=1,\dots,n_2;\, k=1,\dots,n_3}$, $y := ([y]_1, \dots, [y]_{n_1}) \in \mathbb{R}^{n_1}$, we denote by $(f)_y(y)$ the $n_1 n_2 \times n_3$ matrix $(f)_y(y) = (\nabla_y [f]_{jk}(y))_{j=1,\dots,n_2;\, k=1,\dots,n_3}$, where $\nabla_y [f]_{jk}(y) \in \mathbb{R}^{n_1}$ denotes the gradient vector of $[f]_{jk}$ for every $j, k$. We denote by $\nabla \cdot$ the divergence operator which contracts a matrix-valued function to a vector-valued function, i.e. for the matrix-valued function $A(X)$, the $i$th component of its divergence is given by $(\nabla \cdot A)_i = \sum_j \partial A_{ij} / \partial X_j$. Lastly, the symbol $E$ denotes expectation with respect to the probability measure $P$.
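To make the divergence convention concrete, here is a small Python sketch that evaluates $(\nabla \cdot A)_i = \sum_j \partial A_{ij}/\partial X_j$ by central finite differences. The function `matrix_divergence`, the step size `h` and the example matrix field `A_example` are assumptions introduced purely for illustration.

```python
import numpy as np

def matrix_divergence(A, X, h=1e-6):
    """Approximate (div A)_i = sum_j dA_ij/dX_j at the point X.

    A maps a vector in R^n to an (m x n) matrix; derivatives are
    approximated by central finite differences with step h."""
    X = np.asarray(X, dtype=float)
    n = X.size
    m = A(X).shape[0]
    jac = np.empty((m, n, n))  # jac[i, j, k] approximates dA_ij/dX_k
    for k in range(n):
        e = np.zeros(n)
        e[k] = h
        jac[:, :, k] = (A(X + e) - A(X - e)) / (2.0 * h)
    # Contract the column index of A with the differentiation index.
    return np.einsum("ijj->i", jac)

def A_example(X):
    # Hypothetical matrix field on R^2.
    x0, x1 = X
    return np.array([[x0**2, x0 * x1],
                     [x1, x0 + x1]])

div = matrix_divergence(A_example, [2.0, 3.0])
```

For this example the exact divergence is $(3 x_0, 1)$, since the first row contributes $2x_0 + x_0$ and the second row contributes $0 + 1$.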

Small Mass Limit of Generalized Langevin Dynamics
Consider the following family of equations for the processes ( where κ(t) and ξ t are the memory function and noise process defined in (3.1) and (3.2) respectively, with each of the α i (i = 1, 2, 3, 4) equal to zero or to one. The equations (5.1)-(5.2) are equivalent to the following system of SDEs for the Markov process z m where we have defined the auxiliary memory processes: Note that the processes β 3,m t and β 4,m t do not actually depend on m, but we are adding the superscript m for a more homogeneous notation.
We make the following simplifying assumptions concerning (5.3)-(5.6). Let $W^{(q_j)}$ (j = 3, 4) be independent Wiener processes on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ satisfying the usual conditions and let $E$ denote expectation with respect to $P$.
Assumption 5.2. For $t \in \mathbb{R}^+$, $y \in \mathbb{R}^d$, the functions $F_e(t, y)$, $\sigma_0(t, y)$ and $\sigma(t, y)$ are continuous and bounded (in t and y) as well as Lipschitz in y, whereas the functions $\gamma_0(t, y)$, $g(t, y)$, $h(t, y)$, $(\gamma_0)_y(t, y)$, $(g)_y(t, y)$ and $(h)_y(t, y)$ are continuously differentiable and Lipschitz in y as well as bounded (in t and y). Moreover, the functions $(\gamma_0)_{yy}(t, y)$, $(g)_{yy}(t, y)$ and $(h)_{yy}(t, y)$ are bounded for every $t \in \mathbb{R}^+$, $y \in \mathbb{R}^d$.
Assumption 5.3. The initial data $x, v \in \mathbb{R}^d$ are $\mathcal{F}_0$-measurable random variables independent of the σ-algebra generated by the Wiener processes $W^{(q_j)}$ (j = 3, 4). They are independent of m and have finite moments of all orders.
The following theorem describes the homogenized behavior of the particle's position modeled by the family of the equations (5.1)-(5.2)-or, equivalently, by the SDE systems (5.3)-(5.6)-in the limit as the particle's mass tends to zero.
Theorem 5.4. Let $z^m_t$ be a family of processes solving the SDE system (5.3)-(5.6). Suppose that Assumptions 3.2-3.11 and Assumptions 5.1-5.3 hold. In addition, suppose that for every $m > 0$, $x \in \mathbb{R}^d$, the family of matrices $\gamma_0(t, x)$ is positive stable, uniformly in t and x. Then as $m \to 0$, the position process $x^m_t$ converges to $X_t$, where $X_t$ is the first component of the process satisfying the Itô SDE system: t , for l = 3, 4, (5.10) where the ith component of the $S^{(k)}$ (k = 0, 1, 2) is given by: and for k = 1, 2, with $J \in \mathbb{R}^{d \times d}$ solving the Lyapunov equation $\gamma_0 J + J \gamma_0^* = \sigma_0 \sigma_0^*$. The convergence is obtained in the following sense: for all finite $T > 0$, $\sup_{t \in [0,T]} |x^m_t - X_t| \to 0$ in probability, as $m \to 0$.
Observe that in the above formula, $a_i$, $b_i$, $\sigma_i$ (i = 1, 2) do not depend explicitly on $\epsilon = m$, so by the convention adopted earlier, we denote them Next, we verify the assumptions of Theorem A.6. Assumption A.1 clearly follows from Assumption 5.1. Since the family of matrices $\gamma_0(t, x)$ is positive stable (uniformly in t and x), Assumption A.2 is satisfied. It is straightforward to see that our assumptions on the coefficients of the GLE imply Assumption A.3. As $x(0)$ and $v(0)$ are random variables independent of m, Assumption A.4 holds by our assumptions on the initial conditions $x_0$, $v_0$ and $\beta^j_0$ (j = 3, 4). Finally, as noted earlier, Assumption A.5 holds with $a_i = b_i = c_i = d_i = \infty$. The assumptions of Theorem A.6 are thus satisfied. Applying it, we obtain the limiting SDE system (5.8)-(5.10).
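The matrix J entering the drift corrections is defined through the Lyapunov equation $\gamma_0 J + J \gamma_0^* = \sigma_0 \sigma_0^*$, which has a unique solution when $\gamma_0$ is positive stable. As a hedged illustration, the Python sketch below solves this equation at a single fixed $(t, x)$ by rewriting it as a linear system with Kronecker products. The function name `solve_lyapunov` and the example matrices are assumptions; real-valued matrices are assumed, so that $\gamma_0^*$ is the transpose.

```python
import numpy as np

def solve_lyapunov(gamma0, sigma0):
    """Solve gamma0 @ J + J @ gamma0.T = sigma0 @ sigma0.T for J.

    With row-major vectorization, vec(gamma0 J) = kron(gamma0, I) vec(J)
    and vec(J gamma0^T) = kron(I, gamma0) vec(J)."""
    d = gamma0.shape[0]
    I = np.eye(d)
    Q = sigma0 @ sigma0.T
    K = np.kron(gamma0, I) + np.kron(I, gamma0)
    return np.linalg.solve(K, Q.reshape(-1)).reshape(d, d)

# Hypothetical positive stable damping matrix (eigenvalues 2 and 3).
gamma0 = np.array([[2.0, 1.0],
                   [0.0, 3.0]])
sigma0 = np.eye(2)
J = solve_lyapunov(gamma0, sigma0)
residual = gamma0 @ J + J @ gamma0.T - sigma0 @ sigma0.T
```

In the scalar case the equation reduces to $2 \gamma_0 J = \sigma_0^2$, so that $J = \sigma_0^2 / (2 \gamma_0)$. For state-dependent coefficients the solve would have to be repeated at every $(t, x)$ where J is needed.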
We remark that the limiting SDE is unique up to the transformation in (3.6), as pointed out already in [42].
Remark 5.5. In the special case when α i = 0 for i = 1, 2, 3, 4 and the coefficients do not depend on t explicitly, Theorem 5.4 reduces to the result obtained in [28]. In general, by comparing the result with the one obtained in [28], we see that perturbing the original Markovian system by adding a memory and colored noise changes the behavior of the homogenized system obtained in the small mass limit. In particular, (i) the limiting equation for the particle's position not only contains a correction drift term (S (0) ) -the noise-induced drift, but is also coupled to equations for other slow variables; (ii) in the case when α 1 and/or α 2 equal 1, the limiting equation for the (slow) auxiliary memory variables contains correction drift terms (S (1) and/or S (2) ) -which could be called the memory-induced drifts. Interestingly, the memory-induced drifts disappear when h is proportional to γ 0 , a phenomenon that can be attributed to the interaction between the forces F 0 and F 1 .
Note that the highly coupled structure of the limiting SDEs is due to the fact that only one time scale (inertial time scale) was taken to zero in the limit. We expect the structure to simplify when all time scales present in the problem are taken to zero at the same rate.
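As a sanity check on the type of convergence asserted in the small mass limit, one can look at the simplest special case: no memory, no colored noise, and constant scalar coefficients, where the pre-limit system is $dx = v\,dt$, $m\,dv = -\gamma v\,dt + \sigma\,dW$ and the limiting equation is $\gamma\,dX = \sigma\,dW$ with no drift correction. The Python sketch below is only an illustration of this special case under these stated assumptions; the function name, the Euler-Maruyama discretization and all parameter values are hypothetical, and the sketch does not cover the coupled system of Theorem 5.4.

```python
import numpy as np

def small_mass_paths(m, gamma, sigma, T, dt, rng):
    """Euler-Maruyama paths of dx = v dt, m dv = -gamma v dt + sigma dW,
    together with the candidate limit X = (sigma / gamma) W driven by
    the same Wiener increments. Requires gamma * dt / m < 2 for stability."""
    n = int(round(T / dt))
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    x = np.zeros(n + 1)
    v = np.zeros(n + 1)
    for k in range(n):
        x[k + 1] = x[k] + v[k] * dt
        v[k + 1] = v[k] + (-gamma * v[k] * dt + sigma * dW[k]) / m
    X = np.concatenate(([0.0], np.cumsum(sigma / gamma * dW)))
    return x, v, X

gamma, sigma = 1.0, 1.0
errors = {}
for m in (1e-1, 1e-3):
    rng = np.random.default_rng(42)  # same noise path for both masses
    x, v, X = small_mass_paths(m, gamma, sigma, T=1.0, dt=1e-4, rng=rng)
    errors[m] = np.max(np.abs(x - X))  # sup-norm distance on [0, T]
```

Summing the discrete velocity equation gives the exact identity $x - X = -(m/\gamma)(v - v_0)$ along the simulated path, so the sup-norm error is controlled by $m \sup_t |v_t|$. Since the stationary velocity has standard deviation of order $m^{-1/2}$, the error is expected to shrink roughly like $\sqrt{m}$, up to slowly growing factors coming from the supremum.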

Homogenization for the Case of Vanishing Effective Damping Constant and Effective Diffusion Constant
In this section we consider the GLE (1.1), with $F_0 = 0$, $\alpha_1 = \alpha_3 = 0$, and $\alpha_2 = \alpha_4 = 1$. We explore a class of homogenization schemes, aiming to: (P1) reduce the complexity of the generalized Langevin dynamics in such a way that the homogenized dynamics can be realized on a state space with minimal dimension and are described by a minimal number of effective parameters; (P2) retain non-trivial effects of the memory and the colored noise in the homogenized dynamics by matching the asymptotic behavior of the spectral density of the noise process and memory function in the original and the effective model.
Remark 6.1. Generally, the larger the number of time scales (the eigenvalues of the Γ i ) present in the system, the higher the dimension of the state space needed to realize the generalized Langevin system. On the other hand, in addition to Γ i , information on C i and M i is needed to determine the asymptotic behavior of the spectral densities (see Proposition 3.5(i)). In other words, although analysis based solely on time scales consideration may reduce the dimension of the model, it does not in general allow one to achieve the model matching in (P2). It is desirable to have homogenization schemes that achieve both goals of dimension reduction (P1) and matching of models (P2). Such a scheme is considered below.
The idea is to consider the limit when the inertial time scale, a proper subset of the memory time scales and a proper subset of the noise correlation time scales tend to zero at the same rate. The case of sending all the characteristic time scales to zero is excluded here as it is uninteresting when the effective damping and diffusion vanish in the limit.
Recall that the notions of controllability and observability are invariant under the trivial equivalence relation of type (3.6). Therefore, one can, without loss of generality, assume that the $\Gamma_i$ (i = 1, 2, 3, 4) are already in the Jordan normal form and work in the Jordan basis. Such a form reveals the slow-fast time scale structure of the system and so gives us a rubric to develop homogenization schemes.
Assumption 6.2. Let i = 2, 4. All the $\Gamma_i$ are of the following Jordan normal form: 1, . . . , $N_i$) is the Jordan block associated with the (controllable and observable) eigenvalue $\lambda_{i,k}$ (or time scale $\tau_{i,k} = 1/\lambda_{i,k}$) and corresponds to the invariant subspace $X_{i,k} = \mathrm{Ker}(\lambda_{i,k} I - \Gamma_{i,k})^{\nu(\lambda_{i,k})}$, where $\nu(\lambda_{i,k})$ is the index of $\lambda_{i,k}$, i.e. the size of the largest Jordan block corresponding to the eigenvalue $\lambda_{i,k}$. Let $1 \le M_i < N_i$ and the eigenvalues be ordered as $0 < \lambda_{i,1} \le \cdots \le \lambda_{i,M_i} < \lambda_{i,M_i+1} \le \cdots \le \lambda_{i,N_i}$, so that we have the invariant subspace decomposition,
The following procedure studies generalized Langevin dynamics whose spectral densities of the memory and the noise process have the asymptotic behavior $S_i(\omega) \sim \omega^{2 l_i}$ for small $\omega$, and $S_i(\omega) \sim 1/\omega^{2 d_i}$ for large $\omega$, for i = 2, 4. We construct a homogenized version of the model in such a way that its memory and noise processes have spectral densities whose asymptotic behavior at low $\omega$ matches that of the original model (to achieve (P2)), while at high $\omega$ it varies as $1/\omega^{2 l_i}$ (to achieve (P1)).   4). Note that the matrix entries of the $M_i$ and/or $\Sigma_i$ necessarily depend on the $\lambda_{i,k}$ due to the Lyapunov equations that relate them to the $\Gamma_i$. (4) Apply Theorem A.6 to study the limit $\epsilon \to 0$ and obtain the homogenized model, under appropriate assumptions on the coefficients and parameters in the GLEs.
We remark that while one has the above procedure to study homogenization schemes that achieve (P1) and (P2), the derivations and formulas for the limiting equations could become tedious and complicated as the l i and d i become large. To illustrate this, we consider a simple yet still sufficiently general instance of Algorithm 6.3 in the following. Assumption 6.4. The spectral densities, S i (ω) = Φ i (iω)Φ * i (−iω) (i = 2, 4), with the (minimal) spectral factor: where the P i (z) ∈ R pi×mi are matrix-valued monomials with degree l i : and the Q i (z) ∈ R pi×pi are matrix-valued polynomials with degree d i , i.e.
Here p 2 = q, p 4 = r, the m i (i = 2, 4) are positive integers, the B li ∈ R pi×mi are constant matrices, Γ i,k ∈ R pi×pi are diagonal matrices with positive entries, and I denotes identity matrix of appropriate dimension.
Under Assumption 6.4, the spectral densities have the following asymptotic behavior: S i (ω) ∼ ω 2li for small ω, and S i (ω) ∼ 1/ω 2di for large ω. One can then implement Algorithm 6.3 explicitly to study homogenization for a sufficiently large class of GLEs, where the rescaled spectral densities tend to the ones with the asymptotic behavior mentioned in the paragraph just before Algorithm 6.3 in the limit. We discuss one such implementation in Appendix C. Since the calculations become more complicated as l i and d i become large, we will only study simpler cases and illustrate how things could get complicated in the following.
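The two asymptotic regimes can be read off numerically as slopes on a log-log plot. The Python sketch below does this for a hypothetical scalar rational spectral density $S(\omega) = \omega^{2l} / \prod_k (\omega^2 + \lambda_k^2)$ with $l = 1$ and two rates $\lambda_k \in \{1, 5\}$; this toy density, the function names and the parameter values are assumptions chosen for illustration and are not the spectral densities of Assumption 6.4. For this choice the small-frequency exponent is 2 and the large-frequency decay is $1/\omega^2$.

```python
import numpy as np

def spectral_density(omega, lams=(1.0, 5.0), l=1):
    """Toy scalar density S(w) = w^(2l) / prod_k (w^2 + lam_k^2)."""
    omega = np.asarray(omega, dtype=float)
    denom = np.ones_like(omega)
    for lam in lams:
        denom = denom * (omega**2 + lam**2)
    return omega ** (2 * l) / denom

def loglog_slope(f, omega, rel=1e-3):
    """Finite-difference estimate of d log f / d log omega at omega."""
    w1, w2 = omega, omega * (1.0 + rel)
    return (np.log(f(w2)) - np.log(f(w1))) / (np.log(w2) - np.log(w1))

slope_low = loglog_slope(spectral_density, 1e-3)   # small-frequency exponent
slope_high = loglog_slope(spectral_density, 1e3)   # large-frequency exponent
print(slope_low, slope_high)
```

Removing the fast rate (here $\lambda = 5$) from the denominator leaves the small-frequency slope unchanged while altering the large-frequency decay, which is the kind of matching that goal (P2) asks for at low frequencies.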
Together with the equation for the particle's position, the equations (6.16)-(6.17) form the SDE system: x t )dt, (6.23) In the following, we take $\epsilon \in E$ to be small. We make the following assumptions, similar to those made in Theorem 5.4.
Assumption 6.5. There are no explosions, i.e. almost surely, for every $\epsilon \in E$, there exist unique solutions on the time interval [0, T] to the pre-limit SDEs (6.22)-(6.27) and to the limiting SDEs (6.34).
Assumption 6.6. The initial data $x, v \in \mathbb{R}^d$ are $\mathcal{F}_0$-measurable random variables independent of the σ-algebra generated by the Wiener processes $W^{(q_j)}$ (j = 3, 4). They are independent of $\epsilon$ and have finite moments of all orders.
The following theorem describes the homogenized dynamics of the family of the GLEs (6.16)-(6.17) (or equivalently, of the SDEs (6.22)-(6.27)) in the limit $\epsilon \to 0$, i.e. when the inertial time scale, one half of the memory time scales and one half of the noise correlation time scales in the original generalized Langevin system tend to zero at the same rate.
Theorem 6.7. Consider the family of the GLEs (6.16)-(6.17) (or equivalently, of the SDEs (6.22)-(6.27)). Suppose that Assumption 5.2 and Assumptions 6.4-6.6 hold, with the $C_i$, $\Sigma_i$, $M_i$ and $\Gamma_i$ (i = 2, 4) given in (6.7)-(6.9).
Proof. We apply Theorem A.6 to the SDEs (6.22)-(6.27). To this end, we set, in Theorem A.6, $n_1 = n_2 = d + d_2/2 + d_4/2$, $k_1 = k_2 = d_4/2$ and as $\epsilon \to 0$. Using the bound $E[|z|^p] \le C_p (E[|z|^2])^{p/2}$, where z is a mean-zero Gaussian random variable, $C_p > 0$ is a constant and $p > 0$, it is straightforward to see that Assumption A.4 is satisfied. Note that $B_i = b_i$ (for i = 1, 2) by our convention, as the $b_i$ do not depend explicitly on $\epsilon$. The uniform convergence of $a_i(t, x, \epsilon)$, $(a_i)_x(t, x, \epsilon)$ and $\sigma_i(t, x, \epsilon)$ (in x) to $A_i(t, x)$, $(A_i)_x(t, x)$ and $\Sigma_i(t, x)$ respectively in the limit $\epsilon \to 0$ can be shown easily and, in fact, we see that where T and U are given in the theorem, and $d_i$ are from Assumption A.5 of Theorem A.6. Therefore, the first part of Assumption A.5 is satisfied. It remains to verify the (uniform) Hurwitz stability of $a_2$ and $A_2$ (i.e. Assumption A.2 and the last part of Assumption A.5). This can be done using the methods of the proof of Theorem 2 in [42] and we omit the details here. The results then follow by applying Theorem A.6, and (6.34)-(6.38) follow from matrix algebraic calculations.
It is clear from Theorem 6.7 that the homogenized position process is a component of the (slow) Markov process θ t . In general, it is not a Markov process itself. Also, the components of θ t are coupled in a non-trivial way. We emphasize that one could use Theorem A.6 to study cases in which the different time scales are taken to zero in a different manner.
The limiting SDE for the position process may simplify under additional assumptions. In particular, in the one-dimensional case, i.e. with d = 1 (or when all the matrix-valued coefficients and the parameters are diagonal in the multi-dimensional case), the formula for the limiting SDEs becomes more explicit. This special case has been studied in an earlier section in the context of the models (M1) and (M2) from Example 1.

Conclusions and Final Remarks
We have explored various homogenization schemes for a wide class of generalized Langevin equations. We have discussed the relevance of the studied limit problems in the context of usual and anomalous diffusion of a particle in a heat bath. Our explorations here open up a wide range of possibilities and provide insights into model reduction of, and effective drifts in, generalized Langevin systems.
The following summarizes the main conclusions of the paper: (i) (stochastic modeling point of view) Homogenization schemes producing effective SDEs driven by white noise should be the exception rather than the rule. This is particularly important if one seeks to reduce the original model while retaining its non-trivial features; (ii) (complexity reduction point of view) There is a trade-off in simplifying GLE models with state-dependent coefficients: the greater the level of model reduction, the more complicated the correction drift terms entering the homogenized model; (iii) (statistical physics point of view) The homogenized equations obtained could be further simplified, i.e. the number of effective equations could be reduced and the drift terms simplified, when certain special conditions, such as a fluctuation-dissipation theorem, hold.
We conclude this paper by mentioning a very interesting future direction. As mentioned in Remark 3.7, one could extend the current GLE studies to the infinite-dimensional setting so that a larger class of memory functions and covariance functions can be covered. To this end, one can define the noise process as an appropriate linear functional of a Hilbert space valued process solving a stochastic evolution equation [12,13]. This way, one can approach a class of GLEs, driven by noises having a completely monotone covariance function. This large class of functions contains covariances with power decay and thus the method outlined above can be viewed as an extension of those considered in [21,54], where the memory function and covariance of the driving noise are represented as suitable infinite series with a power-law tail (these works are, to our knowledge, among the few works that study rigorously GLEs with a power-law memory). This approach to systems driven by strongly correlated noise, which is our future project, is expected to involve substantial technical difficulties. More importantly, one can expect that power decay of correlations leads to new phenomena, altering the nature of noise-induced drift.

Appendix A. Homogenization for a Class of SDEs with State-Dependent Coefficients
In this section, we study homogenization for a general class of perturbed SDEs with state-dependent coefficients. Homogenization of differential equations has been extensively studied, from the seminal works of Kurtz [37], Papanicolaou [56] and Khasminsky [33] to the more recent works [58,57,28,27,5,4,9].
Here we are going to present yet another variant of a homogenization result that will be needed for studying homogenization for our GLEs (see the last paragraph in Section 1.3 for comments on the novelty of this result). Let $n_1$, $n_2$, $k_1$, $k_2$ be positive integers. Let $\epsilon \in (0, \epsilon_0] =: E$ be a small parameter and $x^\epsilon(t) \in \mathbb{R}^{n_1}$, $v^\epsilon(t) \in \mathbb{R}^{n_2}$ for $t \in [0, T]$, where $\epsilon_0 > 0$ and $T > 0$ are finite constants. Let $W^{(k_1)}$ and $W^{(k_2)}$ denote independent Wiener processes, which are $\mathbb{R}^{k_1}$-valued and $\mathbb{R}^{k_2}$-valued respectively, on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ satisfying the usual conditions [31].
With respect to the standard bases of $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ respectively, we write: We consider the following family of perturbed SDE systems for $(x^\epsilon(t), v^\epsilon(t)) \in \mathbb{R}^{n_1+n_2}$: with the initial conditions $x^\epsilon(0) = x^\epsilon$ and $v^\epsilon(0) = v^\epsilon$, where $x^\epsilon$ and $v^\epsilon$ are random variables that possibly depend on $\epsilon$. In the SDEs (A.3)-(A.4), the coefficients a 1 : are matrix-valued or vector-valued functions, which may depend on $x^\epsilon$, as well as on t and $\epsilon$ explicitly, as indicated by the parentheses $(t, x^\epsilon(t), \epsilon)$. In the case where the coefficients do not depend on $\epsilon$ explicitly, we will denote them by the corresponding capital letters (for instance, if $a_i(t, x, \epsilon) = a_i(t, x)$, then $a_i(t, x) := A_i(t, x)$ etc.). We are interested in the limit as $\epsilon \to 0$ of the SDEs (A.3)-(A.4), in particular the limiting behavior of the process $x^\epsilon(t)$, under appropriate assumptions on the coefficients. In this section, we present a homogenization theorem that studies this limit and defer its proof and applications to later sections.
We make the following assumptions concerning the SDEs (A. Assumption A.3. For $t \in [0, T]$, $y \in \mathbb{R}^{n_1}$, $\epsilon \in E$, and i = 1, 2, the functions $b_i(t, y, \epsilon)$ and $\sigma_i(t, y, \epsilon)$ are continuous and bounded in t and y, and Lipschitz in y, whereas the functions $a_i(t, y, \epsilon)$ and $(a_i)_y(t, y, \epsilon)$ are continuous in t, continuously differentiable in y, bounded in t and y, and Lipschitz in y. Moreover, the functions $(a_i)_{yy}(t, y, \epsilon)$ (i = 1, 2) are bounded for every $t \in [0, T]$, $y \in \mathbb{R}^{n_1}$ and $\epsilon \in E$.
We assume that the (global) Lipschitz constants are bounded by $L(\epsilon)$, where $L(\epsilon) = O(1)$ as $\epsilon \to 0$, i.e. for every $t \in [0, T]$, $x, y \in \mathbb{R}^{n_1}$,
Assumption A.4. The initial condition $x^\epsilon_0 = x^\epsilon \in \mathbb{R}^{n_1}$ is an $\mathcal{F}_0$-measurable random variable that may depend on $\epsilon$, and we assume that $E[|x^\epsilon|^p] = O(1)$ as $\epsilon \to 0$ for all $p > 0$. Also, $x^\epsilon$ converges, in the limit as $\epsilon \to 0$, to a random variable x as follows: is an $\mathcal{F}_0$-measurable random variable that may depend on $\epsilon$, and we assume that for every $p > 0$,
Assumption A.5. For i = 1, 2, $t \in [0, T]$, and every $x \in \mathbb{R}^{n_1}$, each of the matrix or vector entries of the (non-zero) functions $a_i(t, x, \epsilon)$ and $\sigma_i(t, x, \epsilon)$ converges, uniformly in x, to a unique non-zero element, in the limit as $\epsilon \to 0$. Their limits are denoted by $A_i(t, x)$ and $\Sigma_i(t, x)$ respectively. Their rate of convergence is assumed to satisfy the following power law bounds: for every $t \in [0, T]$, $x \in \mathbb{R}^{n_1}$ and i = 1, 2, , as $\epsilon \to 0$, for some positive exponents $a_i$, $b_i$, $c_i$ and $d_i$. Moreover, we assume that $A_2(t, x)$ is Hurwitz stable for every t and x.
Convention. In the case where the coefficients do not show explicit dependence on $\epsilon$, or the case when any of the coefficients $b_1$, $b_2$ and $\sigma_1$ is zero, we set the exponent describing the corresponding rate of convergence to infinity. For instance, if $a_i(t, x, \epsilon) = A_i(t, x)$, we set $a_i = \infty$. Meanwhile, if $\sigma_1 = 0$, we set $c_1 = \infty$, etc.
We now state our homogenization theorem.  $(x^\epsilon(t), v^\epsilon(t)) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$ be their solutions, with the initial conditions $(x^\epsilon, v^\epsilon)$. Let $X(t) \in \mathbb{R}^{n_1}$ be the solution to the following Itô SDE with the initial position $X(0) = x$: where $S(t, X(t))$ is the noise-induced drift vector whose ith component is given by where $i, l = 1, \dots, n_1$, $j, k = 1, \dots, n_2$, or in index-free notation, and $J \in \mathbb{R}^{n_2 \times n_2}$ is the unique solution to the Lyapunov equation: Then the process $x^\epsilon(t)$ converges, as $\epsilon \to 0$, to the solution $X(t)$ of the Itô SDE (A.10), in the following sense: for all finite $T > 0$, $p > 0$, there exists a positive random variable $\epsilon_1$ such that in the limit as $\epsilon \to 0$, where $r > 0$ is the rate determined to be: where the $a_i$, $b_i$, $c_i$, $d_i$ (i = 1, 2) are the positive constants from Assumption A.5. In particular, for all finite $T > 0$, $\sup_{t \in [0,T]} |x^\epsilon(t) - X(t)| \to 0$ (A.16) in probability, in the limit as $\epsilon \to 0$.
Remark A.7. With more work and additional assumptions, one could prove the statements in Assumption A.1 from Assumptions A.2-A.5. However, we choose to incorporate such existence and uniqueness results into our assumptions and work with the assumptions as stated above. Moreover, as we have forewarned the readers, our assumptions can be relaxed in various directions at the cost of more technicalities. For instance, the boundedness assumption on the coefficients of the SDEs may be removed while still obtaining a pathwise convergence result by adapting the techniques in [27]; see also the analogous remarks in Remark 5 in [42]. However, we choose not to pursue the above technical details in this already lengthy paper.

Appendix B. Proof of Theorem A.6
The proof of Theorem A.6 uses techniques developed in earlier works [28,6,42], but here one needs to additionally take into account the $\epsilon$-dependence of the coefficients in the SDEs (A.3)-(A.4). As a preparation for the proof, we need a few lemmas and propositions.
We start from an elementary calculus result.
Lemma B.1. For $i = 1, \dots, N$, let $f_i(y, \epsilon) : \mathbb{R}^n \times (0, \infty) \to \mathbb{R}^{m_i \times n}$ be bounded and globally Lipschitz in y for every $\epsilon > 0$, with a Lipschitz constant that is bounded as $\epsilon \to 0$, i.e. for every $y, z \in \mathbb{R}^n$, there exists a constant $M_i(\epsilon) > 0$ such that (i) Suppose that for each i and $y \in \mathbb{R}^n$, there exists a unique bounded $F_i(y) : \mathbb{R}^n \to \mathbb{R}^{m_i \times n}$ and a constant $C_i > 0$ such that $\|f_i(y, \epsilon) - F_i(y)\| \le C_i \epsilon^{r_i}$, for some positive constant $r_i$, as $\epsilon \to 0$ (i.e. the left-hand side is of order $O(\epsilon^{r_i})$ as $\epsilon \to 0$). Then there exist constants D, Then as $\epsilon \to 0$, where C, $D_{k+1}$ are positive constants and we have used the inductive hypothesis and assumptions of the lemma in the last two lines above. The last statement follows from: as $\epsilon \to 0$, where C is a positive constant. (ii) The statements can be proven using the same techniques used for (i) and so we omit the proof.
Let $x^\epsilon(t) \in \mathbb{R}^{n_1}$, $v^\epsilon(t) \in \mathbb{R}^{n_2}$ and $T > 0$. For $t \in [0, T]$, let $p^\epsilon(t) := \epsilon v^\epsilon(t)$ denote a solution of the SDE: (B.11) We provide estimates for the moments concerning the process $p^\epsilon(t)$, under appropriate assumptions on the coefficients and the initial conditions, in the limit as $\epsilon \to 0$.
We need the following lemma, adapted from Proposition A.2.3 of [30], to obtain an exponential bound on a certain fundamental matrix solution.
Then there exists a constant C > 0 and an (in general random 6 for all ≤ 1 and for all s, t ∈ [0, T ].
Proof. Let u ∈ [s, t]. We rewrite for ω ∈ Ω, s, t ∈ [0, T ]: and represent the solution to the IVP as: Using this, we obtain: We now prove a lemma that gives a bound on a class of stochastic integrals. It is modification of Lemma 5.1 in [4]. In both cases, the main idea is to rewrite some of the stochastic integrals in terms of ordinary ones. where N = max{k ∈ Z : kδ < T }, 1 , κ and C are from Lemma B.2, and l 2 -norm is used on every R k .
Proof. The proof is identical to that of Lemma 5.1 in [4] up to line (5.10), with the constant α there replaced by κ, etc. We let $\epsilon \le \epsilon_1$ and replace the bound in line (5.11) there by the following bound, which follows from the semigroup property of the fundamental matrix process and Lemma B.2: Then we proceed as in the proof of Lemma 5.1 in [4] to get the desired bound.
Proof. Let Φ (t) be the matrix-valued process solving the IVP: Then, Therefore, for T > 0 and p ≥ 1, using the bound for p ≥ 1 (here the a i ∈ R and N is a positive integer), taking supremum on both sides, and applying Lemma B.2 (with B = a 2 (t, x (t), )), we estimate: for ≤ 1 , where C > 0, κ > 0, and 1 > 0 is the random variable whose existence was proven in Lemma B. | v | p ] = O( α ) as → 0, for some α ≥ p/2. Therefore, combining the above estimates, we obtain: where C 1 (p), C 2 (p) > 0 are constants.
Next, the idea is to use Lemma B.3 and the Burkholder-Davis-Gundy inequality (see Theorem 3.28 in [31]) to estimate the last term on the right hand side above. This is analogous to the technique used in the proof of Proposition 5.1 in [4].
Therefore, we have as $\varepsilon \to 0$, for all $0 < \beta < p/2$. Combining all the estimates obtained, one has: where the $C_i$ are positive constants, $\alpha \ge p/2$ is some constant, and $b_2 > 0$ is the constant from Assumption A.5. The statement of the proposition follows.
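The scaling behind a moment estimate of this type can be explored numerically on a toy model. The sketch below is not the SDE (B.11): it assumes the scalar, constant-coefficient equation $\varepsilon\,dv = -\gamma v\,dt + \sigma\,dW$, discretized with the Euler-Maruyama scheme, and fits the exponent of $E[\sup_t |\varepsilon v(t)|^p]$ as a power of $\varepsilon$. The function names and all parameter values are hypothetical.

```python
import numpy as np

def sup_scaled_velocity(eps, gamma=1.0, sigma=1.0, T=1.0, seed=0):
    """Euler-Maruyama for the toy SDE eps*dv = -gamma*v dt + sigma dW.

    Returns sup over the time grid of |eps * v(t)|, starting from v(0) = 0.
    """
    rng = np.random.default_rng(seed)
    dt = eps / 20.0                        # resolve the fast time scale eps/gamma
    n = int(round(T / dt))
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    v = 0.0
    sup_p = 0.0
    for k in range(n):
        v += (-gamma * v * dt + sigma * dW[k]) / eps
        sup_p = max(sup_p, abs(eps * v))
    return sup_p

def estimate_exponent(p=2.0, eps_values=(1e-1, 1e-2, 1e-3), n_paths=10):
    """Least-squares slope of log E[sup |eps*v|^p] against log eps."""
    moments = []
    for eps in eps_values:
        sups = [sup_scaled_velocity(eps, seed=s) for s in range(n_paths)]
        moments.append(np.mean(np.power(sups, p)))
    slope, _ = np.polyfit(np.log(eps_values), np.log(moments), 1)
    return slope

print("fitted exponent for p = 2:", estimate_exponent())
```

In this toy model the fitted exponent is expected to lie somewhat below $p/2$, because the supremum over $[0, T]$ picks up a logarithmic correction. This is in line with an $O(\varepsilon^{\beta})$ bound holding for all $\beta < p/2$ rather than for $\beta = p/2$.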
We also need the following estimate on a class of integrals with respect to products of the coordinates of the process p (t).
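Proposition B.5. Suppose that Assumptions A.1-A.5 hold and $\varepsilon \in E$. Let $h^\varepsilon : \mathbb{R}^+ \times \mathbb{R}^{n_1} \to \mathbb{R}$ be a family of functions, continuously differentiable in $y \in \mathbb{R}^{n_1}$ and bounded (in $s \in \mathbb{R}^+$ and $y \in \mathbb{R}^{n_1}$), with bounded first derivatives $\nabla_y h^\varepsilon(y)$ for $y \in \mathbb{R}^{n_1}$. Assume that $h^\varepsilon$ and $\nabla_y h^\varepsilon(y)$ are $O(1)$ as $\varepsilon \to 0$. Moreover, assume that $\frac{\partial}{\partial s} h^\varepsilon$ is bounded (in all variables) and is $O(1)$ as $\varepsilon \to 0$.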
Now we proceed to prove Theorem A.6. Using the above moment estimates and the proof techniques in [4,6], we first obtain the convergence of $x^\varepsilon_t$ to $X_t$ in the limit as $\varepsilon \to 0$ in the following sense: for all finite $T > 0$, $p \ge 1$, as $\varepsilon \to 0$, where $\varepsilon_1$ is from Proposition B.4. The main tools are well known ordinary and stochastic integral inequalities, as well as a Gronwall type argument. This result would then imply that for all finite $T > 0$, $\sup_{t \in [0,T]} |x^\varepsilon_t - X_t| \to 0$ in probability, in the limit as $\varepsilon \to 0$ (see Lemma 1 in [42]).

Substituting this into (A.3), we obtain (B.54). In integral form, we have: Its $i$th component, $[x^\varepsilon]_i(t)$ ($i = 1, 2, \dots, n_1$), is (recall that we are employing Einstein's summation convention):

Next, we perform integration by parts in the second term on the right hand side above:

Next, we apply the Itô formula to $\varepsilon v^\varepsilon(t)(\varepsilon v^\varepsilon(t))^* \in \mathbb{R}^{n_2 \times n_2}$: Denoting $J^\varepsilon(t) := v^\varepsilon(t)(v^\varepsilon(t))^*$, we can rewrite the above as:

Since $-a_2(t, x^\varepsilon(t), \varepsilon)$ is positive stable uniformly (in $t$, $x$ and $\varepsilon$) by Assumption A.2, the solution of the Lyapunov equation (B.63) can be represented as: where $J_n(t) =$ Therefore, for $s \in [0, T]$, $\cdots\, e^{a_2^*(s, x^\varepsilon(s), \varepsilon) y}$

To estimate $R_5$, we use the Burkholder-Davis-Gundy inequality: where $C_p$ is a positive constant and $\|\cdot\|_F$ denotes the Frobenius norm. Using Hölder's inequality, Assumption A.3, Assumption A.5, and the above techniques, we obtain: and so $R_1 \le 6^{p-1} \sum_{k=0}^{6} M_k$, where $M_k$ denotes the corresponding expectation of $\sup_{t \in [0,T]} |\Pi_k|^p$. It is straightforward to show, using the boundedness assumptions of the theorem, that for $k = 0, 1, 2, 3, 5$: where the $C_k$ are positive constants. Applying Proposition B.5, we obtain:

By the last statement in Assumption A.5, $A_2$ is positive stable uniformly (in $X$ and $s$), therefore the above Lyapunov equation has a unique solution:

$$H^\varepsilon(s) = \int_0^\infty e^{A_2(s,X(s))y}\Big[-(\Sigma_2\Sigma_2^*)(s,X(s)) + (\sigma_2\sigma_2^*)(s,x^\varepsilon(s),\varepsilon) + G^\varepsilon(s)J_3(s) + J_3(s)(G^\varepsilon)^*(s)\Big]e^{A_2^*(s,X(s))y}\,dy. \tag{B.94}$$

Using (B.94), the assumptions of the theorem, and estimating as before, we obtain: on $S_1$, where $L_6(\varepsilon, p, T) = O(1)$ as $\varepsilon \to 0$, $C_6(p, T)$ is a positive constant, and $\alpha_i(\varepsilon)$, $\theta_i(\varepsilon)$ ($i = 1, 2$) and $\gamma_2(\varepsilon)$ are from Assumption A.5.
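The integral representation of the solution of a Lyapunov equation, as used in (B.94), can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof. It takes a constant real matrix `A` whose eigenvalues have negative real parts (so that the integral converges) and which is diagonalizable, and a symmetric matrix `Q`. It then compares the solution of $AH + HA^* = -Q$ obtained by vectorization with a trapezoidal approximation of $\int_0^\infty e^{Ay} Q e^{A^* y}\,dy$. The matrices and function names are hypothetical.

```python
import numpy as np

def lyapunov_via_kron(A, Q):
    """Solve A H + H A^T = -Q for real matrices by row-major vectorization."""
    n = A.shape[0]
    I = np.eye(n)
    K = np.kron(A, I) + np.kron(I, A)      # acts on the flattened H
    h = np.linalg.solve(K, -Q.reshape(-1))
    return h.reshape(n, n)

def lyapunov_via_integral(A, Q, y_max=40.0, n_steps=40000):
    """Trapezoidal approximation of int_0^inf exp(A y) Q exp(A^T y) dy.

    Assumes A is diagonalizable with eigenvalues in the open left half-plane.
    """
    lam, V = np.linalg.eig(A)
    V_inv = np.linalg.inv(V)
    ys = np.linspace(0.0, y_max, n_steps + 1)
    dy = ys[1] - ys[0]
    total = np.zeros_like(Q, dtype=float)
    for j, y in enumerate(ys):
        E = ((V * np.exp(lam * y)) @ V_inv).real   # exp(A y) via eigendecomposition
        w = 0.5 if j in (0, n_steps) else 1.0      # trapezoid end-point weights
        total += w * (E @ Q @ E.T)
    return total * dy

A = np.array([[-1.0, 0.5], [0.0, -2.0]])
Q = np.array([[2.0, 0.3], [0.3, 1.0]])
H_alg = lyapunov_via_kron(A, Q)
H_int = lyapunov_via_integral(A, Q)
print("max difference:", np.max(np.abs(H_alg - H_int)))
```

The two results should agree up to quadrature error. The stability condition on the exponent matrix is what makes the integrand decay and the solution unique.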
For $i = 2, 4$, set $m = m_0 \varepsilon$, $\Gamma_{i,k} = \gamma_{i,k}/\varepsilon$ for $k = l_i + 1, \dots, d_i$ and rescale the $B_{l_i}$ with $\varepsilon$ accordingly, so that the limit as $\varepsilon \to 0$ of the rescaled spectral densities gives us the desired asymptotic behavior. The choice of which and how many of the $\Gamma_{i,k}$ to rescale as well as the smallness of $\varepsilon$ (i.e. what determines the wide separation of time scales and their magnitude) depends on the physical system under study. The resulting family of GLEs can then be cast in a form suitable for application of Theorem A.6 and the homogenized SDE for the particle's position can be obtained, under appropriate assumptions on the coefficients of the GLE.
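To make the notion of a homogenized position equation concrete, the following minimal sketch treats the simplest possible case, which is an assumption of this example and not the GLE setting above: a scalar Langevin equation with constant coefficients, white noise and no memory, $dx = v\,dt$, $m_0\varepsilon\,dv = -\gamma v\,dt + \sigma\,dW$. In that case the limiting equation is $dX = (\sigma/\gamma)\,dW$ and no drift correction arises. The code drives both equations with the same Brownian increments and reports $\sup_t |x^\varepsilon(t) - X(t)|$. All names and parameter values are hypothetical.

```python
import numpy as np

def small_mass_error(eps, m0=1.0, gamma=1.0, sigma=1.0, T=1.0, seed=0):
    """sup_t |x_eps(t) - X(t)| for the toy small-mass problem (Euler-Maruyama).

    x_eps solves dx = v dt, m0*eps*dv = -gamma*v dt + sigma dW;
    X solves the limiting equation dX = (sigma/gamma) dW on the same path.
    """
    rng = np.random.default_rng(seed)
    dt = eps / 20.0                        # resolve the inertial time scale
    n = int(round(T / dt))
    dW = rng.normal(0.0, np.sqrt(dt), size=n)
    x, v, X = 0.0, 0.0, 0.0
    err = 0.0
    for k in range(n):
        x += v * dt
        v += (-gamma * v * dt + sigma * dW[k]) / (m0 * eps)
        X += (sigma / gamma) * dW[k]
        err = max(err, abs(x - X))
    return err

for eps in (1e-1, 1e-2, 1e-3):
    errs = [small_mass_error(eps, seed=s) for s in range(10)]
    print(f"eps = {eps:g}: mean sup|x_eps - X| = {np.mean(errs):.4f}")
```

In this toy case the discrepancy equals $(m_0\varepsilon/\gamma)\,|v^\varepsilon(t)|$ at every grid point, which is the discrete counterpart of solving the velocity equation for $v\,dt$ and substituting it into the position equation. The error therefore shrinks as $\varepsilon$ decreases.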