Homogenization for Generalized Langevin Equations with Applications to Anomalous Diffusion

We study homogenization for a class of generalized Langevin equations (GLEs) with state-dependent coefficients and exhibiting multiple time scales. In addition to the small mass limit, we focus on homogenization limits, which involve taking to zero the inertial time scale and, possibly, some of the memory time scales and noise correlation time scales. The latter are meaningful limits for a class of GLEs modeling anomalous diffusion. We find that, in general, the limiting stochastic differential equations for the slow degrees of freedom contain non-trivial drift correction terms and are driven by non-Markov noise processes. These results follow from a general homogenization theorem stated and proven here. We illustrate them using stochastic models of particle diffusion.

sense) a colored noise process given observations on a finite segment of the past or on the full past [16].

Definitions and Models
We consider the following stochastic model for a particle (for instance, a Brownian particle or a tagged tracer particle) interacting with its environment (for instance, a heat bath or a viscous fluid). Let $x_t \in \mathbb{R}^d$ denote the particle's position, where $t \geq 0$ denotes time and $d$ is a positive integer. The evolution of the particle's velocity, $v_t := \dot{x}_t \in \mathbb{R}^d$, is described by the following generalized Langevin equation (GLE):
\[ m\,dv_t = F_0(t, x_t, v_t, \eta_t)\,dt + F_1\big(t, \{x_s, v_s\}_{s \in [0,t]}, \xi_t\big)\,dt + F_e(t, x_t)\,dt. \tag{1.1} \]
In the above, $m > 0$ is the particle's mass, $\eta_t$ is a $k$-dimensional Gaussian white noise satisfying $E[\eta_t] = 0$ and $E[\eta_t \eta_s^*] = \delta(t - s) I$, and $\xi_t$ is a colored noise process independent of $\eta_t$. Here and throughout the paper, the superscript $*$ denotes transposition of matrices or vectors, $I$ denotes the identity matrix of appropriate dimension, $E$ denotes expectation, and $\mathbb{R}^+ := [0, \infty)$. The initial data are random variables, $x_0 = x$, $v_0 = v$, independent of $\{\xi_t, t \in \mathbb{R}^+\}$ and $\{\eta_t, t \in \mathbb{R}^+\}$.
The three terms on the right-hand side of (1.1) model forces of different physical natures acting on the particle.
(i) $F_e$ is an external force field, which may be conservative (potential) or not.
(ii) $F_0$ is a Markovian force of the form
\[ F_0(t, x_t, v_t, \eta_t)\,dt = -\gamma_0(t, x_t) v_t\,dt + \sigma_0(t, x_t)\,dW^{(k)}_t, \tag{1.2} \]
containing an instantaneous damping term and a multiplicative white noise term. The damping and noise coefficients, $\gamma_0 : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times d}$ and $\sigma_0 : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times k}$, may depend on the particle's position and on time. $W^{(k)}_t$ denotes a $k$-dimensional Wiener process, the time integral of the white noise $\eta_t$.
(iii) $F_1$ is a non-Markovian force of the form
\[ F_1\big(t, \{x_s, v_s\}_{s \in [0,t]}, \xi_t\big) = -g(t, x_t) \left( \int_0^t \kappa(t - s)\, h(s, x_s)\, v_s\,ds \right) + \sigma(t, x_t)\, \xi_t, \tag{1.3} \]
containing a non-instantaneous damping term, describing the delayed drag effects by the environment on the particle, and a multiplicative colored noise term. The coefficients, $g : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times q}$, $h : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{q \times d}$ and $\sigma : \mathbb{R}^+ \times \mathbb{R}^d \to \mathbb{R}^{d \times r}$, depend in general on the particle's position and on time. In the above, $q$ and $r$ are positive integers, and the memory function $\kappa : \mathbb{R} \to \mathbb{R}^{q \times q}$ is a real-valued function that decays sufficiently fast at infinity. $\xi_t \in \mathbb{R}^r$ is a mean-zero stationary Gaussian vector process, to be defined in detail later. The statistical properties of the process $\xi_t$ are completely determined by its (matrix-valued) covariance function,
\[ R(t) := E[\xi_t \xi_0^*] \in \mathbb{R}^{r \times r}, \tag{1.4} \]
or equivalently, by its spectral density, $S(\omega)$, i.e., the Fourier transform of $R(t)$ defined as:
\[ S(\omega) := \int_{-\infty}^{\infty} R(t)\, e^{-i \omega t}\,dt. \tag{1.5} \]
For simplicity, we have omitted other forces such as the Basset force [25] from Eq. (1.1). Note that $F_0$ and $F_1$ describe two types of forces associated with different physical mechanisms. Of particular interest is the case in which the noise terms in $F_0$ and $F_1$ model environments of different nature (passive bath and active bath, respectively [14]) that the particle interacts with.
As the name itself suggests, GLEs are generalized versions of the Markovian Langevin equations, frequently employed to model physical systems. A basic form of the GLEs was first introduced by Mori in [53] and subsequently used in numerous statistical physics models [36,72,76]. The study of GLEs has attracted increasing interest in recent years. We refer to, for instance, [24,27,39,46,47,50,69,70,75] for various applications of GLEs and [21,40,49,56] for their asymptotic analysis. The main merit of GLEs from a modeling point of view is that they take into account the effects of memory and the colored nature of noise on the dynamics of the system.
Remark 1.1. In general, there need not be any relation between $\kappa(t)$ and $R(t)$, or any relation between the damping coefficients and the noise coefficients appearing in the formulas for $F_0$ and $F_1$. A particular but important case that we will revisit often in this paper is the case when a fluctuation-dissipation relation holds. In this case, $\gamma_0$ is proportional to $\sigma_0 \sigma_0^*$, $h = g^*$, $g$ is proportional to $\sigma$ and (without loss of generality) $R(t) = \kappa(t)$. Studies of microscopic Hamiltonian models for open classical systems lead to GLEs of the form (1.1) satisfying the above fluctuation-dissipation relation (see, for instance, Appendix A of [43] or [11]). On another note, GLEs of the form (1.1) are extended versions of the ones studied in our previous work [43]: here the GLEs are generalized to include a Markovian force, in addition to the non-Markovian one, as well as explicit time dependence in the coefficients.
As a motivation, we now provide and elaborate on examples of systems that can be modeled by our GLEs.
An important type of diffusion, which has been observed in many physical systems, from charge transport in amorphous materials to intracellular particle motion in the cytoplasm of living cells [63], is ballistic diffusion. It is a subclass of anomalous diffusions and is characterized by the property that the particle's long-time mean-square displacement grows quadratically in time, in contrast to the linear growth in usual diffusion. There are many different theoretical models of anomalous diffusion with diverse properties, coming from different physical assumptions; see [51] for a comprehensive survey. In the following, we provide two GLE models that are employed to study such phenomena. Their properties will be studied in Sect. 2, as an application of the results proven here.
Example 1. Two GLE models for anomalous diffusion of a free Brownian particle in a heat bath. A large class of models for diffusive systems is described by the system of equations (for simplicity, we restrict to one dimension):
\[ dx_t = v_t\,dt, \tag{1.6} \]
\[ m\,dv_t = -\left( \int_0^t \kappa(t - s)\, v_s\,ds \right) dt + \xi_t\,dt, \tag{1.7} \]
where $x_t, v_t \in \mathbb{R}$ are the position and velocity of the particle, $\kappa(t)$ is called the memory function, and $\xi_t$ is a mean-zero stationary Gaussian process. Two particular GLE models are described by (1.6) and (1.7), with:
(M1) memory function of the bi-exponential form:
\[ \kappa(t) = \frac{\Gamma_2^2}{2(\Gamma_2^2 - \Gamma_1^2)} \left( \Gamma_2 e^{-\Gamma_2 |t|} - \Gamma_1 e^{-\Gamma_1 |t|} \right), \tag{1.8} \]
where the parameters satisfy $\Gamma_2 > \Gamma_1 > 0$, and $\xi_t$ has the covariance function $R(t) = \kappa(t)$ and thus the spectral density,
\[ S(\omega) = \frac{\Gamma_2^2\, \omega^2}{(\omega^2 + \Gamma_1^2)(\omega^2 + \Gamma_2^2)}. \tag{1.9} \]
This model is similar to the one first introduced and studied in [3]. The noise with the above covariance function can be realized by the difference between two Ornstein-Uhlenbeck processes, with different damping rates, driven by the same white noise. Various properties as well as applications of GLEs of the form (1.6) and (1.7) were studied in [2,3,69].
(M2) memory function of the form:
\[ \kappa(t) = \delta(t) - \frac{\Gamma_1}{2} e^{-\Gamma_1 |t|}, \tag{1.10} \]
where $\Gamma_1 > 0$, and $\xi_t$ has the covariance function $R(t) = \kappa(t)$ and thus the spectral density,
\[ S(\omega) = \frac{\omega^2}{\omega^2 + \Gamma_1^2}. \tag{1.11} \]
This model can be obtained from the one in (M1) by sending $\Gamma_2 \to \infty$ in the formula for $\kappa(t)$ in (1.8).
Observe that the spectral densities in both models share the same asymptotic behavior near $\omega = 0$, i.e., $S(\omega) \sim \omega^2$ as $\omega \to 0$, contributing to the enhanced diffusion (super-diffusion) of the particle with mean-square displacement growing as $t^2$ as $t \to \infty$ [67,69]. See Proposition 3.5 for a precise argument.
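The relation between the memory function of model (M1) and the spectral density (1.9) can be checked numerically. The following is a minimal sketch, not part of the original analysis. It assumes the Fourier convention $S(\omega) = \int_{-\infty}^{\infty} \kappa(t) e^{-i\omega t}\,dt$ and a closed form for the bi-exponential kernel obtained by inverting (1.9). The function names and the values $\Gamma_1 = 1$, $\Gamma_2 = 3$ are illustrative choices only.

```python
import numpy as np

def kappa_m1(t, g1, g2):
    """Bi-exponential memory function (assumed closed form, obtained by inverting (1.9))."""
    pref = g2**2 / (2.0 * (g2**2 - g1**2))
    return pref * (g2 * np.exp(-g2 * np.abs(t)) - g1 * np.exp(-g1 * np.abs(t)))

def spectral_m1(w, g1, g2):
    """Closed-form spectral density (1.9) of model (M1)."""
    return g2**2 * w**2 / ((w**2 + g1**2) * (w**2 + g2**2))

def spectral_m2(w, g1):
    """Spectral density of model (M2), i.e. the g2 -> infinity limit of (1.9)."""
    return w**2 / (w**2 + g1**2)

def cosine_transform(f, w, t_max=60.0, n=600_001):
    """S(w) = 2 * int_0^inf f(t) cos(w t) dt for an even function f (trapezoid rule)."""
    t = np.linspace(0.0, t_max, n)
    y = f(t) * np.cos(w * t)
    return 2.0 * np.sum((y[1:] + y[:-1]) * np.diff(t)) / 2.0

g1, g2 = 1.0, 3.0  # illustrative values with g2 > g1 > 0
for w in (0.0, 0.1, 1.0, 5.0):
    numeric = cosine_transform(lambda t: kappa_m1(t, g1, g2), w)
    print(f"w={w:4.1f}  numeric={numeric:.6f}  closed form={spectral_m1(w, g1, g2):.6f}")
```

The value at $\omega = 0$ is the time integral of $\kappa$, which vanishes here. This is the vanishing effective damping that the text associates with super-diffusion.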
supplementary materials are provided in the appendix. In particular, we state a homogenization theorem for a general class of SDEs with state-dependent coefficients in Appendix A. The proof of this theorem is given in Appendix B.
Summary of the Main Results. For the reader's convenience, we list and summarize below the main results obtained in the paper (not in exactly the order in which they appear).
• The first main result is Theorem 5.4. It studies the small mass limit of the GLE described by (5.1) and (5.2). It states that the position process converges, in a strong pathwise sense, to a component of a higher-dimensional process satisfying an Itô SDE. The SDE contains non-trivial drift correction terms. We stress that, while being a component of a Markov process, the limiting position process itself is not Markov. This is in contrast to the nature of limiting processes obtained in earlier works, a difference which holds interesting implications from a physical point of view [recall the discussion after Eq. (1.5)]. Therefore, Theorem 5.4 constitutes a novel result, both mathematically and physically.
• The second main result is Theorem 6.7. It describes the homogenized behavior of a family of GLEs [Eqs. (6.16) and (6.17)], parametrized by $\epsilon > 0$, in the limit as $\epsilon \to 0$. This limit is equivalent to the limit in which the inertial time scale, some of the memory time scales and some of the noise correlation time scales in the pre-limit system tend to zero at the same rate. As in Theorem 5.4, the result here states that the position process converges, in a strong pathwise sense, to a component of a higher-dimensional process satisfying an Itô SDE which contains non-trivial drift correction terms. Again, the limiting position process is non-Markov. However, the structure of the SDE is rather different from the one obtained in Theorem 5.4. As discussed later, this result holds interesting consequences for systems exhibiting anomalous diffusion.
• The third and fourth main results are Corollaries 2.1 and 2.2. These results specialize the earlier ones to one-dimensional GLE models, which are generalizations of (M1) and (M2), and follow from the earlier theorems. They give explicit expressions for the drift correction terms present in the limiting SDEs and therefore may be used directly for modeling and simulation purposes. Furthermore, we show that, in the important case where the fluctuation-dissipation relation (see Remark 1.1) holds, the two corollaries are intimately connected. Recall that these results are going to be presented first in Sect. 2.
• The last main result is Theorem A.6, on homogenization of a family of parametrized SDEs whose coefficients are state-dependent. These SDEs are variants of the ones studied in earlier works [4,6,29]. In comparison with all the earlier studies, the state-dependent coefficients of the pre-limit SDEs (A.3) and (A.4) may depend explicitly on the parameter $\epsilon > 0$ (to be taken to zero). Therefore, this result is new and not simply a minor generalization of earlier results. Moreover, it is important in the context of the present paper and is needed here to study various homogenization limits of GLEs, the importance of which is evident in the discussions above, in the main paper.

Application to One-Dimensional GLE Models
We first study the small mass limit of a one-dimensional GLE, which is a generalized version of the GLE in model (M2) of Example 1, modeling superdiffusion of a particle in a heat bath. Our models are generalized in that the coefficients of the GLEs are state-dependent. For simplicity, we omit the explicit time dependence in the damping and noise coefficients, but not in the external force. For $t \in \mathbb{R}^+$, $m > 0$, let $x_t, v_t \in \mathbb{R}$ be the solutions to the equations: where $\Gamma_1 > 0$, and $\xi_t$ is the mean-zero stationary Gaussian process with the covariance function $R(t) = \kappa(t)$ and spectral density, The initial data $(x, v)$ are random variables independent of $\{\xi_t, t \in \mathbb{R}^+\}$ and have finite moments of all orders. The following corollary describes the limiting SDE for the particle's position obtained in the small mass limit of (2.1) and (2.2).
Corollary 2.1. Assume that for every $y \in \mathbb{R}$, $g(y)$, $g'(y)$, $h(y)$, $h'(y)$, $\sigma(y)$ are bounded continuous functions in $y$, $F_e(t, y)$ is bounded and continuous in $t$ and $y$, and all the listed functions have bounded $y$-derivatives. Then in the limit $m \to 0$, the particle's position, $x_t \in \mathbb{R}$, satisfying (2.1) and (2.2), converges to $X_t$, where $X_t$ solves the following Itô SDE: where Moreover, if in addition $g := \phi \sigma$, where $\phi > 0$, then the number of limiting SDEs reduces from three to two:
Proof. We apply Theorem 5.4 by setting $d = 1$, The assumptions of Theorem 5.4 can be verified in a straightforward way and so the results of the corollary follow.
We next specialize the result of Theorem 6.7 to study homogenization of one-dimensional GLEs which are generalizations of the model (M1) in Example 1: for $t \in \mathbb{R}^+$, $m > 0$, let $x_t, v_t \in \mathbb{R}$ be the solutions to the equations: , (2.13) with $\Gamma_2 > \Gamma_1 > 0$, and $\xi_t$ is the mean-zero stationary Gaussian process with the covariance function $R(t) = \kappa(t)$ and spectral density, . (2.14) The initial data $(x, v)$ are random variables independent of $\{\xi_t, t \in \mathbb{R}^+\}$ and have finite moments of all orders. For $\epsilon > 0$, we set $m = m_0 \epsilon$ and $\Gamma_2 = \gamma_2 / \epsilon$ in (2.11) and (2.12), where $m_0$ and $\gamma_2$ are positive constants. This gives the family of equations: where and $\xi^\epsilon_t$ is the family of mean-zero stationary Gaussian processes with the covariance functions, $R_\epsilon(t) = \kappa_\epsilon(t)$.

Discussion.
We discuss the physical meaning behind the above rescaling of parameters. Recall that in the first case of Example 1 (i.e., the model (M1)), the mean-square displacement of the particle grows as $t^2$ as $t \to \infty$, and therefore, the above model describes a particle exhibiting super-diffusion. As $\epsilon \to 0$, the environment allows for more and more negative correlation and in the limit the covariance function consists of a delta-type peak at $t = 0$ and a negative long tail compensating for the positive peak when integrated (see Fig. 1 and also page 105 of [72]). Indeed, as $\epsilon \to 0$. This is the so-called vanishing effective friction case in [1]. The noise with the covariance function $\kappa_\epsilon(t)$ is called harmonic velocity noise, whereas the noise with the covariance function $\kappa(t)$ is the derivative of an Ornstein-Uhlenbeck process.
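The effect of the rescaling on the spectral density can be illustrated with a short computation. This is a sketch under stated assumptions: it uses the spectral density (1.9) of model (M1) with $\Gamma_2$ replaced by $\gamma_2/\epsilon$, and compares it with $\omega^2/(\omega^2 + \Gamma_1^2)$, the $\Gamma_2 \to \infty$ limit associated with model (M2). The function names and the values $\Gamma_1 = 1$, $\gamma_2 = 2$ are hypothetical choices for illustration.

```python
import numpy as np

def spectral_eps(w, eps, g1=1.0, gamma2=2.0):
    """Spectral density (1.9) with Gamma_2 replaced by gamma2 / eps (assumed rescaling)."""
    g2 = gamma2 / eps
    return g2**2 * w**2 / ((w**2 + g1**2) * (w**2 + g2**2))

def spectral_limit(w, g1=1.0):
    """Formal eps -> 0 limit: w^2 / (w^2 + Gamma_1^2)."""
    return w**2 / (w**2 + g1**2)

w = np.linspace(0.0, 5.0, 101)  # fixed frequency window
gaps = []
for eps in (1.0, 0.1, 0.01):
    gap = np.max(np.abs(spectral_eps(w, eps) - spectral_limit(w)))
    gaps.append(gap)
    print(f"eps={eps:5.2f}  max |S_eps - S| on [0, 5] = {gap:.3e}")
```

On any fixed frequency window the gap shrinks like $\epsilon^2$, while $S_\epsilon(0) = 0$ for every $\epsilon$, so the low-frequency behavior $S(\omega) \sim \omega^2$ is preserved along the rescaling.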
The following corollary provides the homogenized model in the limit $\epsilon \to 0$ of (2.15) and (2.16).

Corollary 2.2.
Assume that for every $y \in \mathbb{R}$, $g(y)$, $g'(y)$, $h(y)$, $h'(y)$, $\sigma(y)$ are bounded continuous functions in $y$, $F_e(t, y)$ is bounded and continuous in $t$ and $y$, and all the listed functions have bounded derivatives in $y$. Then in the limit $\epsilon \to 0$, the particle's position, $x^\epsilon_t \in \mathbb{R}$, satisfying (2.15) and (2.16), converges to $X_t$, where $X_t$ solves the following Itô SDE: , $W_t$ is a one-dimensional Wiener process, and Moreover, if in addition $g := \phi \sigma$, where $\phi > 0$, then the number of limiting SDEs reduces from three to two:
Proof. Let $d = 1$, $d_2 = d_4 = 2$ and denote the one-dimensional version of the variables, coefficients and parameters in Theorem 6.7 by non-bold letters (for instance, $x_t$, $B_2$, $\Gamma_{2,2}$ etc.). Furthermore, set $B_2 = B_4 = \beta > 0$, $\gamma_{2,2} = \gamma_{4,2} = \gamma_2 > 0$ and $\Gamma_{2,1} = \Gamma_{4,1} = \Gamma_1$. Then it can be verified that the assumptions of Theorem 6.7 hold and the results follow upon solving a Lyapunov equation.
(i) the homogenized position process is non-Markov, driven by a colored noise process which is the derivative of the Ornstein-Uhlenbeck process. This behavior is expected in view of the asymptotic behavior of the rescaled memory function and spectral density as $\epsilon \to 0$.
(ii) similarly to the small mass limit case considered earlier, the limiting equation for the particle's position not only contains noise-induced drift terms but is also coupled to equations for other slow variables. Moreover, the limiting equations for these other slow variables also contain nontrivial correction terms, the memory-induced drift.
To end this section, we remark that one could in principle repeat the above analysis for the case where the spectral density varies as $\omega^{2l}$, for $l = 2, 3, \dots$ (i.e., the highly nonlinear case).

GLEs in Finite Dimensions
We call a system modeled by GLE of the form (1.1) a generalized Langevin system. Its dynamics will be referred to as generalized Langevin dynamics.
We assume that the memory function κ(t) in the GLE (1.1) is a Bohl function, i.e., that each matrix element of κ(t) is a finite, real-valued linear combination of exponentials, possibly multiplied by polynomials and/or by trigonometric functions. The noise process, {ξ(t), t ∈ R + }, is a mean-zero, mean-square continuous stationary Gaussian process with Bohl covariance function and, therefore, its spectral density S(ω) is a rational function (see Theorem 2.20 in [73]). In this case, the generalized Langevin dynamics can be realized by an SDE system in a finite-dimensional space. The case in which an infinite-dimensional space is required is deferred to a future work (see also Remark 3.7 and Sect. 7).
Below we define the memory function and the noise process in the GLE (1.1) [see Eq. (1.3)] and along the way introduce our notation. They are defined in a manner ensuring simplicity as well as providing sufficient parameters for matching the memory function and the correlation function of the noise, thereby reflecting the essential statistical properties of the GLE. This provides a systematic framework for our homogenization studies (see the discussion in Sect. 4).
For $i = 1, 2, 3, 4$, let $\Gamma_i \in \mathbb{R}^{d_i \times d_i}$, $M_i \in \mathbb{R}^{d_i \times d_i}$ and $\Sigma_i \in \mathbb{R}^{d_i \times q_i}$ be constant matrices. Also, let $C_i \in \mathbb{R}^{q \times d_i}$ (for $i = 1, 2$) and $C_i \in \mathbb{R}^{r \times d_i}$ (for $i = 3, 4$) be constant matrices. Here, the $d_i$ and $q_i$ ($i = 1, 2, 3, 4$) are positive integers. Let $\alpha_i \in \{0, 1\}$ be a "switch on or off" parameter. We define the memory function in terms of the sextuple $(\Gamma_1, M_1, C_1; \Gamma_2, M_2, C_2)$ of matrices:
\[ \kappa(t) = \sum_{i=1,2} \alpha_i\, C_i\, e^{-\Gamma_i |t|} M_i C_i^*. \tag{3.1} \]
The noise process is defined as:
\[ \xi_t = \sum_{j=3,4} \alpha_j\, C_j\, \beta^j_t, \tag{3.2} \]
where the $\beta^j_t \in \mathbb{R}^{d_j}$ ($j = 3, 4$) are independent Ornstein-Uhlenbeck type processes, i.e., solutions of the SDEs:
\[ d\beta^j_t = -\Gamma_j \beta^j_t\,dt + \Sigma_j\,dW^{(q_j)}_t, \tag{3.3} \]
with the initial conditions, $\beta^j_0$, normally distributed with mean zero and covariance $M_j$. Here, $W^{(q_j)}_t$ denotes a $q_j$-dimensional Wiener process, independent of $\beta^j_0$. Also, the Wiener processes $W^{(q_3)}_t$ and $W^{(q_4)}_t$ are independent. For $i = 1, 2, 3, 4$, $\Gamma_i$ is positive stable and $M_i = M_i^* > 0$ satisfies the Lyapunov equation $\Gamma_i M_i + M_i \Gamma_i^* = \Sigma_i \Sigma_i^*$. The $M_i$ are therefore the steady-state covariances of the systems, i.e., the resulting Ornstein-Uhlenbeck processes are stationary. In control theory, $M_i$ is also known as the controllability Gramian for the pair $(\Gamma_i, \Sigma_i)$ [73]. The covariance matrix, $R(t)$, of the mean-zero Gaussian noise process is expressed by the sextuple $(\Gamma_3, M_3, C_3; \Gamma_4, M_4, C_4)$ of matrices as follows:
\[ R(t) = \sum_{i=3,4} \alpha_i\, C_i\, e^{-\Gamma_i |t|} M_i C_i^*, \]
and so the sextuple $(\Gamma_3, M_3, C_3; \Gamma_4, M_4, C_4)$, together with the parameters $\alpha_3$, $\alpha_4$, completely determines the probability distributions of $\xi_t$. We denote the spectral density of the noise process by $S(\omega)$. We will view the system (3.2) and (3.3) (which is in a statistical steady state) as a representation of the noise process $\xi_t$ and call such a representation a (finite-dimensional) stochastic realization of $\xi_t$.
Similarly, we view (3.1) as a representation of the memory function κ(t) and call such a representation a (finite-dimensional, deterministic) memory realization of κ(t). We call the Fourier transform of κ(t) and R(t) the spectral density of the memory function and spectral density of the noise process respectively.
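As an illustration of a finite-dimensional stochastic realization, the following sketch builds a two-dimensional realization of the harmonic velocity noise of model (M1) and checks its covariance against the bi-exponential kernel. It relies on assumptions that are not taken verbatim from the text: the state obeys $d\beta_t = -\Gamma \beta_t\,dt + \Sigma\,dW_t$, the noise is $\xi_t = C \beta_t$, the stationary covariance $M$ solves $\Gamma M + M \Gamma^* = \Sigma \Sigma^*$, and the covariance function is $C e^{-\Gamma |t|} M C^*$. The weights in `C` and the helper names are hypothetical choices made so that the two Ornstein-Uhlenbeck modes, driven by the same white noise, combine into a kernel with spectral density (1.9).

```python
import numpy as np

def solve_lyapunov(gamma, sigma):
    """Solve gamma @ M + M @ gamma.T = sigma @ sigma.T via a Kronecker-product linear system."""
    d = gamma.shape[0]
    eye = np.eye(d)
    lhs = np.kron(gamma, eye) + np.kron(eye, gamma)  # acts on the row-major vec of M
    rhs = (sigma @ sigma.T).reshape(-1)
    return np.linalg.solve(lhs, rhs).reshape(d, d)

def covariance(t, gamma, M, C):
    """C exp(-gamma |t|) M C^T, with the exponential from an eigendecomposition
    (assumes gamma is diagonalizable)."""
    lam, V = np.linalg.eig(gamma)
    decay = (V * np.exp(-lam * abs(t))) @ np.linalg.inv(V)
    return np.real(C @ decay @ M @ C.T)

def kappa_m1(t, g1, g2):
    """Bi-exponential kernel whose Fourier transform is (1.9) (assumed closed form)."""
    pref = g2**2 / (2.0 * (g2**2 - g1**2))
    return pref * (g2 * np.exp(-g2 * abs(t)) - g1 * np.exp(-g1 * abs(t)))

g1, g2 = 1.0, 3.0                              # illustrative damping rates, g2 > g1 > 0
gamma = np.diag([g1, g2])                      # two OU modes
sigma = np.array([[1.0], [1.0]])               # both driven by the same scalar white noise
C = (g2 / (g2 - g1)) * np.array([[-g1, g2]])   # weighted difference of the two modes

M = solve_lyapunov(gamma, sigma)               # here M[i, j] = 1 / (Gamma_i + Gamma_j)
for t in (0.0, 0.3, 1.0, 2.5):
    print(t, covariance(t, gamma, M, C)[0, 0], kappa_m1(t, g1, g2))

# Time integral of the kernel over [0, inf): C Gamma^{-1} M C^T
K1 = (C @ np.linalg.inv(gamma) @ M @ C.T)[0, 0]
print("integral of the kernel over [0, inf):", K1)
```

The last quantity vanishes up to rounding, consistent with the vanishing effective damping of this model.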
An important message from the stochastic realization theory is that the system (3.2) and (3.3) is more than a representation of ξ t in terms of a white noise, in that it also contains state variables β j (j = 3, 4) which serve as a "dynamical memory." In contrast to standard treatments, this dynamical memory comes not from one, but from two independent systems of type (3.3). This will be used to include two distinct types of dynamical memory that can be switched on or off using the parameters α i -see Proposition 3.5. This consideration motivates us to define the memory function (and noise) explicitly using two independent systems, with different constraints on their parameters easier to state than if a single higher-dimensional system were used.
The sextuples that define the memory function in (3.1) and the noise process in (3.2) are only unique up to the following transformations:
\[ \big( \Gamma_i' = T_i \Gamma_i T_i^{-1}, \quad M_i' = T_i M_i T_i^*, \quad C_i' = C_i T_i^{-1} \big), \]
where $i = 1, 2, 3, 4$ and $T_i$ are any invertible matrices of appropriate dimensions [44]. Different choices of $T_i$ correspond to different coordinate systems.
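The non-uniqueness of the realization can be checked numerically. The sketch below assumes that the memory kernel has the form $C e^{-\Gamma |t|} M C^*$ and that the change of coordinates acts as $\Gamma \mapsto T \Gamma T^{-1}$, $M \mapsto T M T^*$, $C \mapsto C T^{-1}$. The matrices are arbitrary illustrative values, not taken from the paper.

```python
import numpy as np

def kernel(t, gamma, M, C):
    """C exp(-gamma |t|) M C^T via an eigendecomposition (assumes gamma is diagonalizable)."""
    lam, V = np.linalg.eig(gamma)
    decay = (V * np.exp(-lam * abs(t))) @ np.linalg.inv(V)
    return np.real(C @ decay @ M @ C.T)

# Illustrative triple: gamma has eigenvalues 1 and 2, M is symmetric positive definite.
gamma = np.array([[1.0, 0.5], [0.0, 2.0]])
M = np.array([[1.0, 0.2], [0.2, 0.5]])
C = np.array([[1.0, -1.0]])

T = np.array([[2.0, 1.0], [1.0, 1.0]])   # invertible change of coordinates (det = 1)
T_inv = np.linalg.inv(T)

gamma_T = T @ gamma @ T_inv
M_T = T @ M @ T.T
C_T = C @ T_inv

for t in (0.0, 0.5, 2.0):
    print(t, kernel(t, gamma, M, C)[0, 0], kernel(t, gamma_T, M_T, C_T)[0, 0])
```

Both columns agree up to rounding, since the factors of $T$ cancel inside the product. The transformed triple describes the same kernel in different coordinates.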
Remark 3.1. Realization of the memory function and noise process in terms of the matrix sextuples, as defined above, covers all GLEs driven by Gaussian processes that can be realized in a finite dimension (see the propositions and theorems on page 303-308 of [74]). See also the remarks on the subject in [43].
A summary of the above discussion is included in the following: The memory function $\kappa(t)$ in the GLE (1.1) is a real-valued Bohl function defined by (3.1) and the noise process, $\{\xi_t, t \in \mathbb{R}^+\}$, is a mean-zero, mean-square continuous, stationary Gaussian process with Bohl covariance function (hence, with rational spectral density), admitting a stochastic realization given by (3.2) and (3.3).
We introduce a generalized version of the effective damping constant and effective diffusion constant used in [43], which will be useful to study the asymptotic behavior of spectral densities.

Definition 3.3.
For $n \in \mathbb{Z}$, the $n$th order effective damping constant is defined as the constant matrix, parametrized by $\alpha_1, \alpha_2 \in \{0, 1\}$:
\[ K^{(n)}(\alpha_1, \alpha_2) := \alpha_1 C_1 \Gamma_1^{-n} M_1 C_1^* + \alpha_2 C_2 \Gamma_2^{-n} M_2 C_2^* \in \mathbb{R}^{q \times q}. \]
Likewise, the $n$th order effective diffusion constant,
\[ L^{(n)}(\alpha_3, \alpha_4) := \alpha_3 C_3 \Gamma_3^{-n} M_3 C_3^* + \alpha_4 C_4 \Gamma_4^{-n} M_4 C_4^* \in \mathbb{R}^{r \times r}. \]
Note that the first-order effective damping constant $K^{(1)}(\alpha_1, \alpha_2) = \int_0^\infty \kappa(t)\,dt$ and the first-order effective diffusion constant $L^{(1)}(\alpha_3, \alpha_4) = \int_0^\infty R(t)\,dt$ are simply the effective damping constant and effective diffusion constant introduced in [43]. The memory function and the covariance function of the noise process can be expressed in terms of these constants: in the expression for first-order effective damping constant is invertible and the matrix K
In order to develop intuition about general GLEs, it will be helpful to study the following exactly solvable special case.

Example 2.
(An exactly solvable case) In the GLE (1.1), set F e = 0. Let γ 0 (t, x) = γ 0 , σ 0 (t, x) = σ 0 , h(t, x) = h, g(t, x) = g and σ(t, x) = σ be constant matrices. The initial data are the random variables, , so that the fluctuation-dissipation relations hold (see Remark 1.1 and also Remark 3.6). The resulting GLE gives a simple model describing the motion of a free particle, interacting with a heat bath. Note that generally the process v(t) is not assumed to be stationary, in particular v(0) could be an arbitrarily distributed random variable.
The following proposition gives the asymptotic behavior of the spectral densities (equivalently, covariance functions, or memory functions), the regularity 2 (in the mean-square sense) of the noise process, and, in the exactly solvable case of Example 2, the long-time mean-square displacement of the particle.
be the particle's initial average kinetic energy. Assume for simplicity that R(t) = κ(t) and σκ(t)σ * = h * κ * (t)g * . Then we have the following formula for the particle's mean-square displacement (MSD): 2 Sample path continuity does not in general imply mean-square continuity. 3 A process X(t) is mean-square differentiable (with derivative dX(t)/dt) on a time interval τ if for every t ∈ τ , For (iii) and (iv) below, we consider the process x t solving the GLE (3.10)

4, can be 0 or 1 and $F_0$ can be zero or nonzero). Then $E[x(t) x^*(t)] = O(t)$ as $t \to \infty$, in which case we say that the particle diffuses normally.
(iv) Let $\alpha_1 = 0$, $\alpha_2 = 1$ and $F_0 = 0$ (the vanishing effective damping constant case). Then $E[x(t) x^*(t)] = O(t^2)$ as $t \to \infty$, in which case we say that the particle exhibits a ballistic (super-diffusive) behavior.
Proof. (i) For i = 3, 4, it is easy to compute that and so one has: as ω → 0. The first two statements in (i) then follow by Assumption 3.4. The last statement follows from Lemma 6.11 in [45].
Therefore, using the mutual independence of v, {ξ(t), t ∈ R + } and {η(t), t ∈ R + }, the Itô isometry, and the assumption that R(t) = κ(t), we obtain: (3.22) where To compute the double integral L(t), we first rewrite it as We then compute: where denotes convolution. Now note that, by the convolution theorem, (σκσ * H * )(u) is the inverse Laplace transform of σκ(z)σ * Ĥ * (z), which can be written as I/z −(mzI +γ * 0 )Ĥ * (z) by using the assumption that σκ(t)σ * = h * κ * (t)g * . Computing the inverse transform gives us: Similarly, we obtain L 2 (t) = L 1 (t), and so L(t) = 2L 1 (t). Therefore, combining (3.22) and (3.30) gives us the desired formula for MSD. (iii) & (iv) The assumptions that g = h * = σ and R(t) = κ(t) = κ * (t) ensure that we can apply the MSD formula in (ii). The additional assumption that γ 0 = σ 0 σ * 0 /2 (fluctuation-dissipation relation of the first kind) implies thatĤ(z) =Ĥ * (z) and simplifies the formula to: To determine the behavior of E[x(t)x * (t)] as t → ∞, it suffices to investigate the asymptotic behavior ofĤ(z), whose formula is given in (ii), as z → 0. Noting that and using Assumption 3.4, we find that, as z → 0, The results in (iii) and (iv) then follow by applying the Tauberian theorems [18], which say, in particular, that ifĤ behaving as t α as t → ∞, where α > 2, cannot take place when the velocity process converges to a stationary state. For a system to behave this way, the velocity itself has to grow with time. Moreover, we remark that one could obtain a richer class of asymptotic behaviors for the MSD by relaxing the assumption of fluctuation-dissipation relations.
To summarize, (i) says that in the case where F 0 = 0, α 1 = α 3 = 0, the nth-order effective constants characterize the asymptotic behavior of the spectral densities at low frequencies; (ii) provides a formula for the particle's mean-square displacement, and (iii)-(iv) classify the types of diffusive behavior of the GLE model, in the exactly solvable case of Example 2, satisfying the fluctuation-dissipation relations. We emphasize that in the sequel we go beyond the above exactly solvable case; in particular the coefficients g, h, σ, γ 0 , σ 0 will depend in general on the particle's position. However, the GLE in the exactly solvable case can be viewed as linear approximation to the general GLE (1.1) (by expanding these coefficients in a Taylor series about a fixed position x ∈ R d ).
In view of Proposition 3.5, the parameters α i ∈ {0, 1} allow us to control diffusive behavior of the generalized Langevin dynamics. Our GLE models are very general and need not satisfy a fluctuation-dissipation relation. As we will see, these different behaviors motivate our introduction and study of various homogenization schemes for the GLE. Depending on the physical systems under consideration, one scheme might be more realistic than the others. It is one of the goals of this paper to explore homogenization schemes for different GLE classes. The equation for the particle's position, together with the GLE (1.1), can be cast as the system of SDEs for the Markov process where we have defined the auxiliary memory processes: Remark 3.7. In finite dimension, it is not possible to realize generalized Langevin dynamics with a noise and/or memory function whose spectral density varies as 1/ω p , p ∈ (0, 1), near ω = 0 (i.e., the so-called 1/f -type noise [37]), and, consequently, the noise covariance function and/or memory function decay as a power 1/t α , α ∈ (0, 1), as t → ∞. In this case, one can use the formula in (ii) of Proposition 3.5 to show, at least for the exactly solvable case in Example 2 where the fluctuation-dissipation relations hold, that the asymptotic behavior of the particle is sub-diffusive, i.e., where β ∈ (0, 1), as t → ∞ (see also the related works [15,49]). Sub-diffusive behavior has been discovered in a wide range of statistical and biological systems [35], making the study in this case relevant. One could, following the ideas in [21,55], extend the state space of the GLEs to an infinite-dimensional one, in order to study the sub-diffusive case. Homogenization in this case, where more technicalities are expected, will be explored in a future work.

On the Homogenization of Generalized Langevin Dynamics
In this section, we discuss some new directions for homogenization of GLEs.
In the case of nonvanishing (first-order) effective damping constant and effective diffusion constant, homogenization of a version of the GLE (1.1) was studied in [43], where a limiting SDE for the position process was obtained in the limit, in which all the characteristic time scales of the system (i.e., the inertial time scale, the memory time scale and the noise correlation time scale) tend to zero at the same rate. Extending this result, we are going to focus on the following two cases.
(A) The case where an instantaneous damping term is present in the GLE, i.e., $F_0 \neq 0$, or the nonvanishing effective damping constant case, i.e., $\alpha_1 = 1$. Together with the conditions in Example 2, this gives a model for normally diffusing systems; see Proposition 3.5 (iii). One can study the limit in which the inertial time scale and a subset (possibly all or none) of the other characteristic time scales of the system tend to zero; in particular the small mass limit in the case $F_0 \neq 0$ of the generalized Langevin dynamics. We remark that the small mass limit is not well-defined in the case $F_0 = 0$ and $\alpha_1 = \alpha_3 = 1$. This was first observed in [50], where it was pointed out that the limit leads to the phenomenon of anomalous gap of the particle's mean-square displacement (see also [10,30]).
(B) The vanishing effective damping constant and effective diffusion constant case, i.e., $F_0 = 0$ and $\alpha_1 = \alpha_3 = 0$. Together with the conditions in Example 2, this gives a model for systems with superdiffusive behavior; see Proposition 3.5 (iv). One can study the limit in which the inertial time scale, a subset of the memory time scales, and a subset of the noise correlation time scales tend to zero at the same rate. Such effective models are physically relevant when they preserve the asymptotic behavior of the spectral densities at low and/or high frequencies in the limit. Situations are also possible where some of the eigenmodes of the memory and noise spectrum are damped much more strongly than others, for example due to an injection of monochromatic light from a laser into the system, which is originally in thermal equilibrium. This justifies studying homogenization limits that selectively target a part of the frequencies of memory and noise.
We will study homogenization of the GLE (1.1) in the limits described in the above scenarios. In all cases, the inertial time scale is taken to zero-this gives rise to the singular nature of the limit problems. We remark that one could also consider the more interesting scenarios in which the time scales tend to zero at different rates, but we choose not to pursue this in this already long paper.
Notation. Throughout the paper, we denote the variables in the pre-limit equations by small letters (for instance, $x^\epsilon(t)$), and those of the limiting equations by capital letters (for instance, $X(t)$). We use Einstein's summation convention on repeated indices. The Euclidean norm of an arbitrary vector $w$ is denoted by $|w|$ and the (induced operator) norm of a matrix $A$ by $\|A\|$. For an $\mathbb{R}^{n_2 \times n_3}$-valued function $f(y)$ : where $\nabla_y [f]_{jk}(y)$ stands for the gradient vector for every $j, k$. We denote by $\nabla \cdot$ the divergence operator which contracts a matrix-valued function to a vector-valued function, i.e., for the matrix-valued function $A(X)$, the $i$th component of its divergence is given by $(\nabla \cdot A)_i = \sum_j \frac{\partial A_{ij}}{\partial X_j}$. Lastly, the symbol $E$ denotes expectation with respect to the probability measure $P$.
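To make the divergence convention concrete, here is a small numerical sketch. It evaluates $(\nabla \cdot A)_i = \sum_j \partial A_{ij} / \partial X_j$ by central finite differences for a hypothetical matrix-valued function $A$ chosen only for illustration; for this choice the exact divergence is $(3 X_2, \cos X_1 + 1)$.

```python
import numpy as np

def A(X):
    """Hypothetical 2x2 matrix-valued function of X = (X_1, X_2), for illustration only."""
    x1, x2 = X
    return np.array([[x1 * x2, x2**2],
                     [np.sin(x1), x1 + x2]])

def divergence(A, X, h=1e-6):
    """Row-wise divergence: (div A)_i = sum_j dA_ij / dX_j, by central differences."""
    X = np.asarray(X, dtype=float)
    n = X.size
    div = np.zeros(A(X).shape[0])
    for j in range(n):
        step = np.zeros(n)
        step[j] = h
        dA = (A(X + step) - A(X - step)) / (2.0 * h)  # derivative of every entry w.r.t. X_j
        div += dA[:, j]                               # only column j contributes to the sum
    return div

X0 = np.array([0.5, 2.0])
print(divergence(A, X0))                       # numerical value
print(np.array([3.0 * X0[1], np.cos(X0[0]) + 1.0]))  # exact value for this A
```

The contraction is over the column index, so each row of $A$ produces one component of the resulting vector.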

Small Mass Limit of Generalized Langevin Dynamics
Consider the following family of equations for the processes ( where κ(t) and ξ t are the memory function and noise process defined in (3.1) and (3.2), respectively, with each of the α i (i = 1, 2, 3, 4) equal to zero or to one. Equations (5.1) and (5.2) are equivalent to the following system of SDEs for the Markov process where we have defined the auxiliary memory processes: Note that the processes β 3,m t and β 4,m t do not actually depend on m, but we are adding the superscript m for a more homogeneous notation.
We make the following simplifying assumptions concerning (5.3)-(5.6). Let $W^{(q_j)}$ ($j = 3, 4$) be independent Wiener processes on a filtered probability space $(\Omega, \mathcal{F}, \mathcal{F}_t, P)$ satisfying the usual conditions [32] and let $E$ denote expectation with respect to $P$.
Assumption 5.2. For $t \in \mathbb{R}^+$, $y \in \mathbb{R}^d$, the functions $F_e(t, y)$, $\sigma_0(t, y)$ and $\sigma(t, y)$ are continuous and bounded (in $t$ and $y$) as well as Lipschitz in $y$, whereas the functions $\gamma_0(t, y)$, $g(t, y)$, $h(t, y)$, $(\gamma_0)_y(t, y)$, $(g)_y(t, y)$ and $(h)_y(t, y)$ are continuously differentiable and Lipschitz in $y$ as well as bounded (in $t$ and $y$). Moreover, the functions $(\gamma_0)_{yy}(t, y)$, $(g)_{yy}(t, y)$ and $(h)_{yy}(t, y)$ are bounded for every $t \in \mathbb{R}^+$, $y \in \mathbb{R}^d$.
Assumption 5.3. The initial data $x, v \in \mathbb{R}^d$ are $\mathcal{F}_0$-measurable random variables independent of the σ-algebra generated by the Wiener processes $W^{(q_j)}$ ($j = 3, 4$). They are independent of $m$ and have finite moments of all orders.
The following theorem describes the homogenized behavior of the particle's position modeled by the family of Eqs. (5.1) and (5.2) (or, equivalently, by the SDE systems (5.3)-(5.6)) in the limit as the particle's mass tends to zero.
Observe that in the above formula, $a_i$, $b_i$, $\sigma_i$ ($i = 1, 2$) do not depend explicitly on $\epsilon = m$, so by the convention adopted in Appendix A, we denote them $A_i$, $B_i$, $\Sigma_i$, respectively, and we put Next, we verify the assumptions of Theorem A.6. Assumption A.1 clearly follows from Assumption 5.1. Since the family of matrices $\gamma_0(t, x)$ is positive stable (uniformly in $t$ and $x$), Assumption A.2 is satisfied. It is straightforward to see that our assumptions on the coefficients of the GLE imply Assumption A.3. As $x(0)$ and $v(0)$ are random variables independent of $m$, Assumption A.4 holds by our assumptions on the initial conditions x 0 , v 0 and β j 0 (j = 3, 4). Finally, as noted earlier, Assumption A.5 holds with $a_i = b_i = c_i = d_i = \infty$. The assumptions of Theorem A.6 are thus satisfied. Applying it, we obtain the limiting SDE system (5.8)-(5.10).
We remark that the limiting SDE is unique up to the transformation in (3.6), as pointed out already in [43].
Remark 5.5. In the special case when $\alpha_i = 0$ for $i = 1, 2, 3, 4$ and the coefficients do not depend on $t$ explicitly, Theorem 5.4 reduces to the result obtained in [29]. In general, by comparing the result with the one obtained in [29], we see that perturbing the original Markovian system by adding a memory and colored noise changes the behavior of the homogenized system obtained in the small mass limit. In particular, (i) the limiting equation for the particle's position not only contains a correction drift term $S^{(0)}$, the noise-induced drift, but is also coupled to equations for other slow variables; (ii) in the case when $\alpha_1$ and/or $\alpha_2$ equal one, the limiting equation for the (slow) auxiliary memory variables contains correction drift terms ($S^{(1)}$ and/or $S^{(2)}$), which could be called the memory-induced drifts. Interestingly, the memory-induced drifts disappear when $h$ is proportional to $\gamma_0$, a phenomenon that can be attributed to the interaction between the forces $F_0$ and $F_1$. Note that the highly coupled structure of the limiting SDEs is due to the fact that only one time scale (the inertial time scale) was taken to zero in the limit. We expect the structure to simplify when all time scales present in the problem are taken to zero at the same rate.
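To make the notion of noise-induced drift concrete, the following minimal sketch treats the one-dimensional Markovian special case ($\alpha_i = 0$). It assumes the classical setting $m\,dv = -\gamma(x) v\,dt + \sigma(x)\,dW$, for which the drift correction is $S(x) = \frac{d}{dx}[\gamma(x)^{-1}]\, J(x)$ with $J$ solving the scalar Lyapunov equation $2\gamma J = \sigma^2$, so that $S = -\gamma' \sigma^2 / (2\gamma^3)$. This one-dimensional form, the coefficient choices, and all function names are assumptions made for illustration; they are not taken from the formulas of Theorem 5.4.

```python
def noise_induced_drift_1d(gamma, sigma, x, h=1e-5):
    """Evaluate S(x) = d/dx[1/gamma(x)] * J(x) with J = sigma^2 / (2*gamma).

    The derivative of 1/gamma is approximated by a central difference.
    """
    d_inv_gamma = (1.0 / gamma(x + h) - 1.0 / gamma(x - h)) / (2.0 * h)
    J = sigma(x) ** 2 / (2.0 * gamma(x))  # scalar Lyapunov: 2*gamma*J = sigma^2
    return d_inv_gamma * J


def gamma(x):
    # Hypothetical state-dependent damping coefficient (positive everywhere).
    return 1.0 + x * x


def sigma(x):
    # Hypothetical constant noise coefficient.
    return 0.5


def closed_form_drift(x):
    # S = -gamma'(x) * sigma^2 / (2 * gamma^3) for the choices above.
    return -(2.0 * x) * sigma(x) ** 2 / (2.0 * gamma(x) ** 3)


x0 = 1.0
numeric = noise_induced_drift_1d(gamma, sigma, x0)
exact = closed_form_drift(x0)
print(numeric, exact)
```

The drift vanishes wherever $\gamma' = 0$, which reflects the fact that the correction arises only from the state dependence of the damping.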

Homogenization for the Case of Vanishing Effective Damping Constant and Effective Diffusion Constant
In this section, we consider the GLE (1.1), with $F_0 = 0$, $\alpha_1 = \alpha_3 = 0$, and $\alpha_2 = \alpha_4 = 1$. We explore a class of homogenization schemes, aiming to: (P1) reduce the complexity of the generalized Langevin dynamics in such a way that the homogenized dynamics can be realized on a state space with minimal dimension and are described by a minimal number of effective parameters; (P2) retain non-trivial effects of the memory and the colored noise in the homogenized dynamics by matching the asymptotic behavior of the spectral density of the noise process and memory function in the original and the effective model. It is desirable to have homogenization schemes that achieve both goals of dimension reduction (P1) and matching of models (P2). Such a scheme is considered below.
The idea is to consider the limit when the inertial time scale, a proper subset of the memory time scales and a proper subset of the noise correlation time scales tend to zero at the same rate. The case of sending all the characteristic time scales to zero is excluded here as it is uninteresting when the effective damping and diffusion vanish in the limit.
We assume that the Γ i (i = 1, 2, 3, 4) are already in the Jordan normal form and work in Jordan basis. Such form will reveal the slow-fast time scale structure of the system and so give us a rubric to develop homogenization schemes. Assumption 6.2. Let i = 2, 4. All the Γ i are of the following Jordan normal form: is the Jordan block associated with the (controllable and observable) eigenvalue λ i,k (or time scale τ i,k = 1/λ i,k ) and corresponds to the invariant subspace is the index of λ i,k , i.e., the size of the largest Jordan block corresponding to the eigenvalue λ i,k . Let 1 ≤ M i < N i and the eigenvalues be ordered as 0 The following procedure studies generalized Langevin dynamics whose spectral densities of the memory and the noise process have the asymptotic behavior, S i (ω) ∼ ω 2li for small ω, and S i (ω) ∼ 1/ω 2di for large ω, for i = 2, 4. We construct a homogenized version of the model in such a way that its memory and noise processes have spectral densities whose asymptotic behavior at low ω matches that of the original model [to achieve (P2)], while that at high ω it varies as 1/ω 2li [to achieve (P1)]. We remark that while one has the above procedure to study homogenization schemes that achieve (P1) and (P2), the derivations and formulae for the limiting equations could become tedious and complicated as the l i and d i become large. Therefore, we consider a simple yet still sufficiently general instance of Algorithm 6.3 in the following.
where the P i (z) ∈ R pi×mi are matrix-valued monomials with degree l i : and the Q i (z) ∈ R pi×pi are matrix-valued polynomials of degree d i , i.e., Here p 2 = q, p 4 = r, the m i (i = 2, 4) are positive integers, the B li ∈ R pi×mi are constant matrices, Γ i,k ∈ R pi×pi are diagonal matrices with positive entries, and I denotes identity matrix of appropriate dimension.
Under Assumption 6.4, the spectral densities have the following asymptotic behavior: $S_i(\omega) \sim \omega^{2 l_i}$ for small $\omega$, and $S_i(\omega) \sim 1/\omega^{2 d_i}$ for large $\omega$. One can then implement Algorithm 6.3 explicitly to study homogenization for a sufficiently large class of GLEs, where the rescaled spectral densities tend, in the limit, to the ones with the asymptotic behavior mentioned in the paragraph just before Algorithm 6.3. We discuss one such implementation in Appendix C. Since the calculations become more complicated as $l_i$ and $d_i$ become large, we will only study simpler cases and illustrate how things could get complicated in the following.
The following theorem describes the homogenized dynamics of the family of the GLEs (6.16) and (6.17) [or equivalently, of the SDEs (6.22)-(6.27)] in the limit $\epsilon \to 0$, i.e., when the inertial time scale, one half of the memory time scales and one half of the noise correlation time scales in the original generalized Langevin system tend to zero at the same rate. Assume that for every $t \in \mathbb{R}^+$, $\epsilon > 0$, $x \in \mathbb{R}^d$,
Next, note that β 4, 0 = (β 4,1, ) is a random variable normally distributed with mean-zero and covariance: , (6.46) where E[|β 4,1,  , x), respectively, in the limit → 0 can be shown easily and, in fact, we see that A 1 = T , A 2 = −U , where T and U are given in (6.38), (6.51) and and d i are from Assumption A.5 of Theorem A.6. Therefore, the first part of Assumption A.5 is satisfied. It remains to verify the (uniform) Hurwitz stability of a 2 and A 2 (i.e., Assumption A.2 and the last part of Assumption A.5). This can be done using the methods of the proof of Theorem 2 in [43], and we omit the details here. The results then follow by applying Theorem A.6, and (6.34)-(6.38) follow from matrix algebraic calculations.
It is clear from Theorem 6.7 that the homogenized position process is a component of the (slow) Markov process θ t . In general, it is not a Markov process itself. Also, the components of θ t are coupled in a non-trivial way. We emphasize that one could use Theorem A.6 to study cases in which the different time scales are taken to zero in a different manner.
The limiting SDE for the position process may simplify under additional assumptions. In particular, in the one-dimensional case, i.e., with d = 1 (or when all the matrix-valued coefficients and the parameters are diagonal in the multi-dimensional case), the formula for the limiting SDEs becomes more explicit. This special case has been studied in Sect. 2 (see Corollary 2.2) in the context of the model (M1) in Example 1.

Conclusions and Final Remarks
We have explored various homogenization schemes for a wide class of generalized Langevin equations and the relevance of the studied limit problems in the context of usual and anomalous diffusion of a particle in a heat bath. Our explorations here open up a wide range of possibilities and provide insights into model reduction of, and effective drifts in, generalized Langevin systems.
The following summarizes the main conclusions of the paper: (i) (stochastic modeling point of view) Homogenization schemes producing effective SDEs driven by white noise should be the exception rather than the rule. This is particularly important if one seeks to reduce the original model while retaining its non-trivial features; (ii) (complexity reduction point of view) There is a trade-off in simplifying GLE models with state-dependent coefficients: the greater the level of model reduction, the more complicated the correction drift terms entering the homogenized model; (iii) (statistical physics point of view) The homogenized equations obtained could be further simplified, i.e., the number of effective equations could be reduced and the drift terms simplified, when certain special conditions, such as a fluctuation-dissipation theorem, hold.
We conclude this paper by mentioning a very interesting future direction. As mentioned in Remark 3.7, one could extend the current GLE studies to the infinite-dimensional setting so that a larger class of memory functions and covariance functions can be covered. To this end, one can define the noise process as an appropriate linear functional of a Hilbert space valued process solving a stochastic evolution equation [12,13]. This way, one can approach a class of GLEs driven by noises having a completely monotone covariance function. This large class of functions contains covariances with power decay, and thus the method outlined above can be viewed as an extension of those considered in [21,55], where the memory function and covariance of the driving noise are represented as suitable infinite series with a power-law tail. The works in [21,55] are, to the best of our knowledge, among the few works that rigorously study GLEs with a power-law memory. This approach to systems driven by strongly correlated noise, which is our future project, is expected to involve substantial technical difficulties. More importantly, one can expect that power decay of correlations leads to new phenomena, altering the nature of the noise-induced drift.

Appendix A: Homogenization for a Class of SDEs with State-Dependent Coefficients
In this section, we study homogenization for a general class of perturbed SDEs with state-dependent coefficients. Homogenization of differential equations has been extensively studied, from the seminal works of Kurtz [38], Papanicolaou [57] and Khasminskii [34] to the more recent works [4,5,9,28,29,58,59]. Here we present another variant of a homogenization result, which is needed for studying homogenization for our GLEs (see the last paragraph in Sect. 1.3 for comments on the novelty of this result).
With respect to the standard bases of $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$ respectively, we write: We consider the following family of perturbed SDE systems for $(x^\epsilon(t), v^\epsilon(t)) \in \mathbb{R}^{n_1+n_2}$: are matrix-valued or vector-valued functions, which may depend on $x^\epsilon$, as well as on $t$ and $\epsilon$ explicitly, as indicated by the parentheses $(t, x^\epsilon(t), \epsilon)$. In the case where the coefficients do not depend on $\epsilon$ explicitly, we will denote them by the corresponding capital letters (for instance, if $a_i(t, x, \epsilon) = a_i(t, x)$, then $a_i(t, x) := A_i(t, x)$ etc.). We are interested in the limit as $\epsilon \to 0$ of the SDEs (A.3) and (A.4), in particular the limiting behavior of the process $x^\epsilon(t)$, under appropriate assumptions on the coefficients. In this appendix, we present a homogenization theorem that studies this limit and defer its proof to Appendix B.
We make the following assumptions concerning the SDEs  i (t, y, ) and (a i ) y (t, y, ) are continuous in t, continuously differentiable in y, bounded in t and y, and Lipschitz in y. Moreover, the functions (a i ) y y (t, y, ) (i = 1, 2) are bounded for every t ∈ [0, T ], y ∈ R n1 and ∈ E.
We assume that the (global) Lipschitz constants are bounded by L( ), where L( ) = O(1) as → 0, i.e., for every t ∈ [0, T ], x, y ∈ R n1 , max a i (t, x, ) − a i (t, y, ) , (a i ) x (t, x, ) − (a i ) x (t, y, ) , t, y, )|, σ i (t, x, ) − σ i (t, y, ) Assumption A.4. The initial condition x 0 = x ∈ R n1 is an F 0 -measurable random variable that may depend on , and we assume that as → 0 for all p > 0. Also, x converges, in the limit as → 0, to a random variable x as follows: is an F 0 -measurable random variable that may depend on , and we assume that for every p > 0, Assumption A.5. For i = 1, 2, t ∈ [0, T ], and every x ∈ R n1 , each of the matrix or vector entries of the (nonzero) functions a i (t, x, ), (a i ) x (t, x, ), b i (t, x, ) and σ i (t, x, ), converges, uniformly in x, to a unique nonzero limit as → 0. Their limits are denoted by A i (t, x), , x), respectively. Their rate of convergence is assumed to satisfy the following power-law bounds: for every t ∈ [0, T ], x ∈ R n1 and i = 1, 2, , as → 0, for some positive exponents a i , b i , c i and d i . Moreover, we assume that A 2 (t, x) is Hurwitz stable for every t and x.

Convention.
In the case where the coefficients do not show explicit dependence on $\epsilon$ or in the case when any of the coefficients $b_1$, $b_2$ and $\sigma_1$ is zero, we set the exponent describing the corresponding rate of convergence to infinity.

where S(t, X(t)) is the noise-induced drift vector whose ith component is given by
and $J \in \mathbb{R}^{n_2 \times n_2}$ is the unique solution to the Lyapunov equation: Then the process $x^\epsilon(t)$ converges, as $\epsilon \to 0$, to the solution $X(t)$ of the Itô SDE (A.10), in the following sense: for all finite $T > 0$, $p > 0$, there exists a positive random variable $\epsilon_1$ such that in the limit as $\epsilon \to 0$, with r > 0 is defined as:
Remark A.7. With more work and additional assumptions, one could prove the statements in Assumption A.1 from Assumptions A.2-A.5. However, we choose to incorporate such existence and uniqueness results into our assumptions and work with the assumptions as stated above. Moreover, as we have forewarned the readers, our assumptions can be relaxed in various directions at the cost of more technicalities. For instance, the boundedness assumption on the coefficients of the SDEs may be removed to obtain still a pathwise convergence result by adapting the techniques in [28] (see also the analogous remarks in Remark 5 in [43]). However, we choose not to pursue the above technical details in this already long paper.
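The matrix $J$ in the theorem is obtained by solving a Lyapunov equation. As a minimal sketch, assume the standard form $A J + J A^* = -Q$ with $A$ Hurwitz stable and $Q = \Sigma \Sigma^*$ symmetric; this form, the example matrices, and the helper names below are assumptions for illustration and are not a restatement of the displayed equation of the theorem. The equation is linear in the entries of $J$, so it can be solved by vectorization and Gaussian elimination.

```python
def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for c in range(n):
        pivot = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(c + 1, n):
            factor = aug[r][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[r][k] -= factor * aug[c][k]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][k] * z[k] for k in range(r + 1, n))
        z[r] = (aug[r][n] - s) / aug[r][r]
    return z


def solve_lyapunov(A, Q):
    """Return J with A J + J A^T = -Q, using row-major vectorization of J."""
    n = len(A)
    K = [[0.0] * (n * n) for _ in range(n * n)]
    for i in range(n):
        for j in range(n):
            row = i * n + j
            for k in range(n):
                K[row][k * n + j] += A[i][k]  # (A J)_ij = sum_k A_ik J_kj
                K[row][i * n + k] += A[j][k]  # (J A^T)_ij = sum_k J_ik A_jk
    rhs = [-Q[i][j] for i in range(n) for j in range(n)]
    z = solve_linear(K, rhs)
    return [[z[i * n + j] for j in range(n)] for i in range(n)]


# Hypothetical example: A has eigenvalues -1 and -2, so it is Hurwitz stable.
A = [[-1.0, 1.0], [0.0, -2.0]]
Q = [[1.0, 0.0], [0.0, 1.0]]  # Q = Sigma Sigma^T with Sigma the identity
J = solve_lyapunov(A, Q)
print(J)  # hand computation gives [[7/12, 1/12], [1/12, 1/4]]
```

Uniqueness of $J$ in this sketch relies on the Hurwitz property: the linear system is nonsingular exactly when no two eigenvalues of $A$ sum to zero.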

Appendix B: Proof of Theorem A.6
The proof of Theorem A.6 uses techniques developed in earlier works [6,29,43], but here one needs to additionally take into account the $\epsilon$-dependence of the coefficients in the SDEs (A.3) and (A.4). As a preparation for the proof, we need a few lemmas and propositions. We start from an elementary calculus result.
(i) Suppose that for each i and y ∈ R n , there exists a unique bounded F i (y) : R n → R mi×n and a constant C i > 0 such that f i (y, )−F i (y) ≤ C i ri , for some positive constant r i , as → 0 (i.e., the left-hand side is of order O( ri ) as → 0). Then there exist constants D, as → 0. If, in addition, n = m 1 , f 1 (y, ) and F 1 (y) are invertible for every y ∈ R n and > 0, then Then ≤ C max{D k , C k+1 }( min(r1,...,r k ) + r k+1 ) ≤ D k+1 min(r1,...,r k+1 ) , (B.7) as → 0, where C, D k+1 are positive constants and we have used the inductive hypothesis and assumptions of the lemma in the last two lines above. The last statement follows from: as → 0, where C is a positive constant. (ii) The statements can be proven using the same techniques used for (i) and so we omit the proof. (B.11) We provide estimates for the moments of the process p (t), under appropriate assumptions on the coefficients and the initial conditions, in the limit as → 0.
We need the following lemma, adapted from Proposition A.2.3 of [31], to obtain an exponential bound on fundamental matrix solutions of a linear equation.
Then there exists a constant C > 0 and an (in general random 6 for all ≤ 1 and for all s, t ∈ [0, T ].
Proof. Let u ∈ [s, t]. We rewrite for ω ∈ Ω, s, t ∈ [0, T ]: and represent the solution to the IVP as: Using this, we obtain: This leads to the estimate: We now prove a lemma that gives a bound on a class of stochastic integrals. It is modification of Lemma 5.1 in [4]. In both cases, the main idea is to rewrite some of the stochastic integrals in terms of ordinary ones.
where N = max{k ∈ Z : kδ < T }, 1 , κ and C are from Lemma B.2, and l 2 -norm is used on every R k .
Proof. The proof is identical to that of Lemma 5.1 in [4] up to line (5.10), with the constant α there replaced by κ, etc. We let ≤ 1 and replace the bound in line (5.11) there by the following bound, which follows from the semigroup property of the fundamental matrix process and Lemma B.2: Then we proceed as in the proof of Lemma 5.1 in [4] to get the desired bound.
Proof. Let Φ (t) be the matrix-valued process solving the IVP: Then, Therefore, for T > 0 and p ≥ 1, using the bound for p ≥ 1 (here the a i ∈ R and N is a positive integer), taking supremum on both sides, and applying Lemma B.2 (with B = a 2 (t, x (t), )), we estimate: for ≤ 1 , where C > 0, κ > 0, and 1 > 0 is the random variable whose existence was proven in Lemma B.
Next, the idea is to use Lemma B.3 and the Burkholder-Davis-Gundy inequality (see Theorem 3.28 in [32]) to estimate the last term on the right-hand side above. This is analogous to the technique used in the proof of Proposition 5.1 in [4].
Let δ be a constant such that 0 < δ < T. Applying Lemma B.3, we estimate, using (B.30): where a 2 (t, x (t), ) ∞ := sup t∈[0,T ],y ∈R n 1 , ∈E a 2 (t, y, ) and We estimate:  40) with N := max{k ∈ Z : kδ < T }, where we have used the fact that the l ∞ -norm on R N is bounded by the l q norm for every q ≥ 1 and then applied Hölder's inequality to get the last two lines above. Now, letting δ = 1−h for 0 < h < 1, and using the Burkholder-Davis-Gundy inequality, Since Nδ < T , we have Nδ pq/2 < T δ pq/2−1 = T (1−h)(pq/2−1) . Therefore, . For all 0 < β < p/2, one can choose 0 < h < 1 and q > 1 such that Therefore, we have as → 0, for all 0 < β < p/2. Combining all the estimates obtained, one has: where the C i are positive constants, α ≥ p/2 is some constant, and b 2 > 0 is the constant from Assumption A.5. The statement of the proposition follows.
We also need the following estimate on a class of integrals with respect to products of the coordinates of the process p (t).
Now we proceed to prove Theorem A.6. Using the above moment estimates and the proof techniques in [4,6], we are going to first obtain the convergence of $x^\epsilon_t$ to $X_t$ in the limit as $\epsilon \to 0$ in the following sense: for all finite $T > 0$, $p \geq 1$, as $\epsilon \to 0$, where the $\epsilon_1$ is from Proposition B.4. The main tools are well-known ordinary and stochastic integral inequalities, as well as a Gronwall type argument. This result will then imply that for all finite $T > 0$, $\sup_{t \in [0,T]} |x^\epsilon_t - X_t| \to 0$ in probability, in the limit as $\epsilon \to 0$ (see Lemma 1 in [43]).

Proof of Theorem
Substituting this into (A.3), we obtain: (B.54) In integral form, we have: The ith component, [x ] i (t) (i = 1, 2, . . . , n 1 ) is (recall that we are employing Einstein's summation convention): Next, we perform integration by parts in the second term on the righthand side above: Denoting J (t) := v (t)(v (t)) * , we can rewrite the above as: Since −a 2 (t, x (t), ) is positive stable uniformly (in t, x and ) by Assumption A.2, the solution of the Lyapunov equation (B.63) can be represented as: On the other hand, by (A.10),
We have: To estimate R 5 , we use the Burkholder-Davis-Gundy inequality: where C p is a positive constant and · F denotes the Frobenius norm. Using Hölder's inequality, Assumption A.3, Assumption A.5, and the above techniques, we obtain: Using (B.94), the assumptions of the theorem, and estimating as before, we obtain: Collecting the above estimates for the M k , we obtain: (A.14) then follows for the case p > 2. The result for 0 < p ≤ 2 follows by an application of the Hölder's inequality: for 0 < p ≤ 2, taking q > 2 so that p/q < 1, we have for all 0 < β < p , as → 0. The statement on convergence in probability follows from Lemma 1 in [43].
Then it can be shown that $\Phi_i(z)$ admits the following (controllable) realization [8]: where $B_{l_i}$ is in the $l_i$th slot, Then the realization of the memory function (for the case $i = 2$) and noise process (for the case $i = 4$) can be obtained by taking $\Gamma_i = F_i$, $C_i = H_i$ and solving the following linear matrix inequality: for $M_i = M_i^*$ [74]. The above realization gives us the desired spectral densities. Indeed, let us use the transformation of type (3.6) to diagonalize the $M_i$, i.e., In this case, for i = 4 we have: (ξ i ) t = C i (β i ) t = C i β i t = ξ i t , where (β i ) t solves the SDE: and one can compute the spectral density to be:
$$S_i(\omega) = \Phi_i(-i\omega)\Phi_i^*(i\omega) = B_{l_i}\, \omega^{2 l_i} \left( (\omega^2 I + \Gamma_{i,1}^2) \cdots (\omega^2 I + \Gamma_{i,d_i}^2) \right)^{-1} B_{l_i}^*. \quad \text{(C.8)}$$
A similar discussion applies to the realization of the memory function.
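The low-frequency behavior of a spectral density of this type can be checked numerically. The following minimal sketch assumes a scalar version of the density, $S(\omega) = b^2 \omega^{2l} / \prod_{k=1}^{d} (\omega^2 + \Gamma_k^2)$, with hypothetical parameter values, and estimates the local log-log slope at a small frequency, which should be close to $2l$. The function names and parameters are illustrative assumptions only.

```python
import math


def spectral_density(omega, b, l, rates):
    """Scalar spectral density b^2 * omega^(2l) / prod_k (omega^2 + rate_k^2)."""
    denom = 1.0
    for g in rates:
        denom *= omega ** 2 + g ** 2
    return b ** 2 * omega ** (2 * l) / denom


def log_log_slope(f, omega, factor=2.0):
    """Estimate d log f / d log omega from the values at omega and factor*omega."""
    return math.log(f(factor * omega) / f(omega)) / math.log(factor)


# Hypothetical parameters: l = 1 and d = 2 decay rates.
b, l, rates = 1.0, 1, [1.0, 3.0]


def density(omega):
    return spectral_density(omega, b, l, rates)


low_slope = log_log_slope(density, 1e-4)
print(low_slope)  # close to 2*l = 2 at low frequency
```

At low frequency the denominator is nearly constant, so the slope is governed entirely by the monomial numerator of degree $l$.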
For $i = 2, 4$, set $m = m_0 \epsilon$, $\Gamma_{i,k} = \gamma_{i,k}/\epsilon$ for $k = l_i + 1, \ldots, d_i$ and rescale the $B_{l_i}$ with $\epsilon$ accordingly, so that the limit as $\epsilon \to 0$ of the rescaled spectral densities gives us the desired asymptotic behavior. The choice of which and how many of the $\Gamma_{i,k}$ to rescale, as well as the smallness of $\epsilon$ (i.e., what determines the wide separation of time scales and their magnitude), depends on the physical system under study. The resulting family of GLEs can then be cast in a form suitable for application of Theorem A.6 and the homogenized SDE for the particle's position can be obtained, under appropriate assumptions on the coefficients of the GLE.
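The effect of the rescaling can be illustrated in a scalar setting. The sketch below assumes a density of the form $b^2 \omega^{2l} / \prod_k (\omega^2 + \Gamma_k^2)$, replaces the fast rates by $\gamma_k/\epsilon$, and multiplies $b$ by $\prod_k \gamma_k/\epsilon$ over the fast modes. This particular rescaling of $b$ is an assumption chosen so that a finite limit exists; the text only states that the $B_{l_i}$ are rescaled accordingly. All names and values are hypothetical.

```python
def limiting_density(omega, b, l, slow_rates):
    """Density retaining only the slow modes: b^2 w^(2l) / prod (w^2 + g^2)."""
    denom = 1.0
    for g in slow_rates:
        denom *= omega ** 2 + g ** 2
    return b ** 2 * omega ** (2 * l) / denom


def rescaled_density(omega, eps, b, l, slow_rates, fast_rates):
    """Density with fast rates g/eps and b multiplied by prod(g/eps) (assumed)."""
    scale = 1.0
    fast_denom = 1.0
    for g in fast_rates:
        scale *= g / eps
        fast_denom *= omega ** 2 + (g / eps) ** 2
    return scale ** 2 * limiting_density(omega, b, l, slow_rates) / fast_denom


# Hypothetical parameters: l = 1 slow mode kept, one fast mode rescaled.
b, l = 1.0, 1
slow_rates, fast_rates = [1.0], [2.0]
omega = 1.0

target = limiting_density(omega, b, l, slow_rates)
rel_errors = [
    abs(rescaled_density(omega, eps, b, l, slow_rates, fast_rates) / target - 1.0)
    for eps in (1e-1, 1e-2, 1e-3)
]
print(rel_errors)  # relative deviation from the limit shrinks as eps decreases
```

For fixed $\omega$ each fast factor $(\gamma_k/\epsilon)^2 / (\omega^2 + (\gamma_k/\epsilon)^2)$ tends to one, so the relative error in this sketch decays like $\epsilon^2$.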