Eigenfunction martingale estimating functions and filtered data for drift estimation of discretely observed multiscale diffusions

We propose a novel method for drift estimation of multiscale diffusion processes when a sequence of discrete observations is given. For the Langevin dynamics in a two-scale potential, our approach relies on the eigenvalues and the eigenfunctions of the homogenized dynamics. Our first estimator is derived from a martingale estimating function of the generator of the homogenized diffusion process. However, the unbiasedness of the estimator depends on the rate with which the observations are sampled. We therefore introduce a second estimator which relies also on filtering the data, and we prove that it is asymptotically unbiased independently of the sampling rate. A series of numerical experiments illustrate the reliability and efficiency of our different estimators.


Introduction
Learning models from data is a problem of fundamental importance in modern applied mathematics. The abundance of data in many application areas such as molecular dynamics, atmosphere/ocean science makes it possible to develop physics-informed data driven methodologies for deriving models from data (Raissi et al. 2019;Yang et al. 2021;Zhang et al. 2019). Naturally, most problems of interest are characterised by a very high-dimensional state space and by the presence of many characteristic length and time scales. When it is possible to decompose the state space into the resolved Dedicated to the memory of Assyr Abdulle.
B Grigorios A. Pavliotis and unresolved degrees of freedom, then one is usually interested in the derivation of a model for the resolved degrees of freedom, while treating the unresolved scales as noise. Clearly, these reduced models are stochastic, often described by stochastic differential equations (SDEs). The goal of this paper is to derive rigorous and systematic methodologies for learning coarse-grained models that accurately describe the dynamics at macroscopic length and time scales from noisy observations of the full, unresolved dynamics. We apply the proposed methodologies to simple models of fast/slow SDEs for which the theory of homogenization exists, that enables us to study the inference problem in a rigorous and systematic manner.
In many applications, the available data are noisy, not equidistant and certainly not compatible with the coarsegrained model. The presence of observation noise and of the model-data mismatch renders the problem of learning macroscopic models from microscopic data highly ill-posed. Several examples from econometrics (market microstructure noise) (Aït-Sahalia et al. 2006) and molecular dynamics show that standard algorithms, e.g. maximum likelihood or quadratic variation for the diffusion coefficient, are asymptotically biased and they fail to estimate correctly the parameters in the coarse-grained model. In a series of earlier works, this problem was studied using maximum likelihood techniques with subsampled data (Pavliotis and Stuart 2007;Papavasiliou et al. 2009), methodologies based on the method of moments (Krumscheid et al. 2013;Kalliadasis et al. 2015;Krumscheid et al. 2015), quadratic programming approaches (Crommelin and Vanden-Eijnden 2006a) as well as Bayesian approaches (Abdulle and Di Blasio 2020;. We also mention the pioneering work on estimating the integrated stochastic volatility in the presence of market microstructure noise (Aït-Sahalia et al. 2006;Zhang et al. 2005). In particular, in Aït-Sahalia and Jacod (2014) the authors analyse the correct interplay between the intensity of the microstructure noise and the optimal rates of convergence.
The main observation in Pavliotis and Stuart (2007); Papavasiliou et al. (2009) is that when the maximum likelihood estimator (MLE) of the fast/slow system is evaluated at the full data, then the MLE becomes asymptotically biased; in fact, the original data are not compatible with the homogenized equation, and therefore data need to be preprocessed, for instance under the form of subsampling. On the other hand, when the MLE is evaluated at appropriately subsampled data, then it becomes asymptotically unbiased. Although this is an interesting theoretical observation (see also later developments in Spiliopoulos and Chronopoulou (2013)), it does not lead to an efficient algorithm. The reason for this is that the performance of the estimator depends very sensitively on the choice of the sampling rate. In addition, the optimal sampling rate is not known and is strongly dependent on the problem under investigation. Furthermore, subsampling naturally leads to an increase in the variance, unless appropriate variance reduction methodologies are used.
In a recent work (Abdulle et al. 2021), we addressed the problem of lack of robustness of the MLE with subsampling algorithm by introducing an appropriate filtering methodology that leads to a stable and robust algorithm. In particular, rather than subsampling the original trajectory, we smoothed the data by applying an appropriate linear time-invariant filter from the exponential family and we modified the MLE by inserting the new filtered data. This new estimator was thus independent of the subsampling rate and also asymptotically unbiased and robust with respect to the parameters of the filter.
However, the assumption that the full path of the solution is observed is not realistic in most applications. In fact, in all real problems one can only obtain discrete measurements of the diffusion process. Hence, in this paper we focus on the problem of learning the coarse-grained homogenized model assuming that we are given discrete observations from the microscopic model. In this paper, we use the martingale estimating functions that were introduced in Bibby and Sø rensen (1995), where the authors study drift estimation for discrete observations of one-scale processes and show that estimators based on the discretized continuous-version likelihood function can be strongly biased. They therefore propose martingale estimating functions obtained by adjusting the discretized continuous-version score function by its compensator which leads to unbiased estimators. Moreover, in Kessler and Sørensen (1999) a different type of martingale estimating function, which is dependent on the eigenvalues and eigenfunctions of the generator of the stochastic process, is introduced and asymptotic unbiasedness and normality are proved. Furthermore, another inference methodology that uses spectral information is proposed in Crommelin and Vanden-Eijnden (2006b). Their approach consists of inferring the drift and diffusion functions of a diffusion process by minimizing an objective function which measures how close the generator is to having a reference spectrum which is obtained from the time series through the construction of a discrete-time Markov chain. This idea has been further expanded in several directions in Crommelin and Vanden-Eijnden (2011).
In this paper, we propose a new estimator for learning homogenised SDEs from noisy discrete data that is based on the martingale estimators that were introduced in Kessler and Sørensen (1999). The main idea is to consider the eigenvalues and eigenfunctions of the generator of the homogenized process. This new estimator is asymptotically unbiased only if the distance between two consecutive observations is not too small compared with the multiscale parameter describing the fastest scale, i.e. if data are compatible with the homogenized model. Therefore, in order to obtain unbiased approximations independently of the sampling rate with which the observations are obtained, we propose a second estimator which, in addition to the original observations, relies also on filtered data obtained following the filtering methodology presented in Abdulle et al. (2021). We observe that smoothing the original data makes observations compatible with the homogenized process independently of the rate with which they are sampled and hence this second estimator gives a black-box tool for parameter estimation.

Our main contributions
The main goal of this paper is to propose new algorithms based on martingale estimating functions and filtered data for which we can prove rigorously that they are asymptotically unbiased and not sensitive with respect to, e.g. the sampling rate and the observation error. In particular, we combine two main ideas: • the use of martingale estimating functions for discretely observed diffusion processes based on the eigenvalues and the eigenfunctions of the generator of the homogenized process, which was originally presented for onescale problems in Kessler and Sørensen (1999); • the filtering methodology for smoothing the data in order to make them compatible with the homogenized model, which was introduced in Abdulle et al. (2021).
We prove theoretically and observe numerically that the estimator without filtered data is asymptotically unbiased if: • the observations are taken at the homogenized regime, i.e. the sampling rate is independent of the parameter measuring scale separation; • the observations are taken at the multiscale regime, i.e. the sampling rate is dependent on the fastest scale, and the sampling rate is bigger than the multiscale parameter.
Moreover, we show that the estimator with filtered data corrects the bias caused by a sampling rate smaller than the multiscale parameter, and therefore, it is asymptotically unbiased independently of the sampling rate.
Outline. The rest of the paper is organized as follows. In Sect. 2, we present the Langevin dynamics and its corresponding homogenized equation and we introduce the two proposed estimators based on eigenvalues and eigenfunctions of the generator with and without filtered data. In Sect. 3, we present the main results of this work, i.e. the asymptotic unbiasedness of the two estimators, and in Sect. 4 we perform numerical experiments which validate the efficacy of our methods. Section 4.1 is devoted to the proof of the main results which are presented in Sect. 3. Finally, in Appendix, we show some technical results which are employed in the analysis and we explain some details about the implementation of the proposed methodology.

Problem setting
In this work, we study the following class of multiscale diffusion processes. Consider the following two-scale SDE, observed over the time interval [0, T ] where ε > 0 describes the fast scale, α ∈ R M and σ > 0 are, respectively, the drift and diffusion coefficients and W t is a standard one-dimensional Brownian motion. The functions V : R → R M and p : R → R are the slow-scale and fast-scale parts of the potential, and they are assumed to be smooth. Moreover, we also assume p to be periodic with period L > 0. We remark that our setting can be considered as a semi-parametric framework similar to the one of Krumscheid et al. (2013). The components of the potential function V , in fact, can be viewed as basis functions for a truncated expansion (e.g. Taylor series or Fourier expansion) of the unknown slow-scale potential V (·; α) : R → R, where the components of the unknown drift term α contain the generalized Fourier coefficients, i.e.
We also mention that assuming a parametric form for the potential V is a technique usually employed in the statistics literature in order to regularize the likelihood function and obtain a parametric approximation of the actual MLE of V , which does not exist in general (Pokern et al. 2009).

Remark 2.1
For clarity of the presentation, we focus our analysis on scalar multiscale diffusions with a finite number of parameters in the drift that have to be learned from data. Nevertheless, we remark that all the following theory can be generalized to the case of multidimensional diffusion processes in R d , for which we provide further details in Appendix C and an example in Sect. 4.5. However, the problem becomes more complex and computationally expensive from a numerical viewpoint and it can be prohibitive if the dimension d is too large, since the methodology proposed in this paper requires the solution of the eigenvalue problem for the generator of a d-dimensional diffusion process.
The theory of homogenization (see, for example, Bensoussan et al. 2011, Chapter 3 or Pavliotis andStuart 2008, Chapter 18) guarantees the existence of the following homogenized SDE whose solution X 0 t is the limit in law of the solutions X ε t of (2.1) as random variables in C 0 ([0, T ]; R) (2.4) and where the function is the unique solution with zeromean with respect to the measure μ of the differential equation − p (y) (y) + σ (y) = p (y), 0 ≤ y ≤ L, (2.5) endowed with periodic boundary conditions. In particular, for one-dimensional diffusion processes, we have Our goal is to derive estimators for the homogenized drift coefficient A based on multiscale data originating from (2.1).
In this work, we consider the same setting as Abdulle et al. (2021), which is summarized by the following assumption.
Assumption 2.2 The potentials p and V satisfy and each component is polynomially bounded from above and bounded from below, and there exist b 1 , b 2 > 0 such that Let us remark that, under Assumption 2.2, it has been proved in Pavliotis and Stuart (2007) that both processes (2.1) and (2.2) are geometrically ergodic and their invariant measure has a density with respect to the Lebesgue measure. In particular, let us denote by ϕ ε and ϕ 0 the densities of the invariant measures of X ε t and X 0 t , respectively, defined by (2.6)

Remark 2.3
The value of the initial condition X ε 0 in the SDE (2.1) is important neither for the numerical experiments nor for the following analysis and can be chosen arbitrarily. In fact, the process X ε t is geometrically ergodic and therefore it converges to its invariant distribution with density ϕ ε exponentially fast for any initial condition.
Drift estimation problem. Consider N + 1 uniformly distributed observation times 0 = t 0 < t 1 < t 2 < . . . , < t N = T , set = t n −t n−1 and let (X ε t ) t∈[0,T ] be a realization of the solution of (2.1). We then assume to know a sample { X ε n } N n=0 of the realization where X ε n = X ε t n , and we aim to estimate the drift coefficient A of the homogenized equation (2.2). First, since we deal with discrete observations of stochastic processes, we employ martingale estimating functions based on eigenfunctions, which have already been studied for problems without a martingale structure in Kessler and Sørensen (1999). Second, by observing that if the time-step is too small with respect to the multiscale parameter ε, then the data could be compatible with the full dynamics rather than with the coarse-grained model, we also adopt the filtering methodology presented in Abdulle et al. (2021), which has been proved to be beneficial for correcting the behaviour of the maximum likelihood estimator (MLE) in the setting of continuous observations.

Martingale estimating functions based on eigenfunctions
We first remark that a general theory for martingale estimating functions exists and is thoroughly outlined in Bibby and Sø rensen (1995). They appear to be appropriate for multiscale problems due to their robustness properties. In this paper, we develop martingale estimating functions based on the eigenfunctions of the generator of the process, since the theory of the eigenvalue problem for elliptic differential operators and the multiscale analysis of this eigenvalue problem are well developed. Let A ⊂ R M be the set of admissible drift coefficients for which Assumption 2.2(ii) is satisfied. To describe our methodology, we consider the solution X t (a) of the homogenized process (2.2) with a generic parameter a ∈ A instead of the exact drift coefficient A: which, according to (2.6), has invariant measure The generator L a of (2.7) is defined for all u ∈ C 2 (R) as: where the subscript denotes the dependence of the generator on the unknown drift coefficient a. From the well-known spectral theory of diffusion processes and under our assumptions on the potential V , we deduce that L a has a countable set of eigenvalues (see, for example, Hansen et al. 1998).
In particular, let {(λ j (a), φ j (·; a))} ∞ j=0 be the sequence of eigenvalue-eigenfunction couples of the generator which solve the eigenvalue problem (2.10) which, due to (2.9), is equivalent to (2.11) and where the eigenvalues satisfy 0 = λ 0 (a) < λ 1 (a) < · · · < λ j (a) ↑ ∞ and the eigenfunctions form an orthonormal basis for the weighted space L 2 (ϕ 0 a ). We mention in passing that, by making a unitary transformation, the eigenvalue problem for the generator of the Langevin dynamics can be transformed to the standard Sturm-Liouville problem for Schrödinger operators (Pavliotis 2014, Chapter 4). We now state a formula, which has been proved in Kessler and Sørensen (1999) and will be fundamental in the rest of the paper (2.12) where = t n − t n−1 is the constant distance between two consecutive observations. We now discuss how this eigenvalue problem can be used for parameter estimation. Let J be a positive integer and let {β j (·; a)} J j=1 be J arbitrary functions β j (·; a) : R → R M possibly dependent on the parameter a, which satisfy Assumption 2.5(i)(ii) stated below, and define for x, y, z ∈ R the martingale estimating function (2.13) Then, given a set of observations { X ε n } N n=0 , we consider the score function G ε N ,J : (2.14) This function can be seen as an approximation in terms of eigenfunctions of the true score function, i.e. the gradient of the log-likelihood function with respect to the unknown parameter. The full derivation of a martingale estimating function as an approximation of the true score function is given in detail in Bibby and Rensen (1995, Sect. 2). The first step is a discretization of the gradient of the continuous-time log-likelihood, which yields a biased estimating function. Hence, the next step is adjusting this function by adding its compensator in order to obtain a zero-mean martingale. Moreover, by using the eigenfunctions of the generator, it is shown in Kessler and Sørensen (1999) that this approach is suitable for scalar diffusion processes with no multiscale structure, i.e. processes with a single characteristic length/time scale. In fact, by a classical result for ergodic diffusion processes (Pavliotis 2014, Sect. 4.7), any function in the L 2 space weighted by the invariant measure can be written as an infinite linear combination of the eigenfunctions of the generator of the diffusion process.

Remark 2.4
In the construction of the martingale estimating function G ε N ,J (a), we omitted the first index j = 0 because, for ergodic diffusion processes, the first eigenvalue is zero, λ 0 (a) = 0, and its corresponding eigenfunction is constant, φ 0 (a) = 1, and hence, they would give g 0 (x, y, z; a) = 0 independently of the function β 0 (z; a). Therefore, it would not provide us with any information about the unknown parameters in the drift. An intuition on why G ε N ,J is a good score function is given by the following result. Let G 0 N ,J be the score function where the observations of the slow variable of the multiscale process are replaced by the homogenized ones, then due to equation (2.12) which means that the zero of the expectation of the score function with homogenized observations is exactly the drift coefficient of the effective equation. In Algorithm 1, we summarize the main steps for computing the estimator A ε N ,J and further details about the implementation can be found in Appendix B. We finally introduce the following technical assumption which will be employed in the analysis.
Assumption 2.5 The following hold for all a ∈ A and for all j = 1, . . . , J : (i) β j (z; a) is continuously differentiable with respect to a for all z ∈ R; (ii) all components of β j (·; a), β j (·; a), . β j (·; a), . β j (·; a) are polynomially bounded; (iii) the slow-scale potential V is such that φ j (·; a), φ j (·; a), φ j (·; a), and all components of . φ j (·; a) are polynomially bounded; where the dot denotes either the Jacobian matrix or the gradient with respect to a.

Remark 2.6
In Kessler and Sørensen (1999) the authors propose a method to choose the functions {β j (·; a)} J j=1 in order to obtain optimality in the sense of Godambe and Heyde (1987): This optimal set of functions can be seen as the projection of the score function onto the set of martingale estimating functions obtained by varying the function {β j (·; a)} J j=1 . For the class of diffusion processes for which the eigenfunctions are polynomials, the optimal estimating functions can be computed analytically. In fact, they are related to the moments of the transition density, which can be computed explicitly. Moreover, another procedure is to choose functions which depend only on the unknown parameter and which minimize the asymptotic variance. This approach is strongly related to the asymptotic optimality criterion considered by Heyde and Gay (1989). For further details on how to choose these functions we refer to Kessler and Sørensen (1999), and we remark that their calculation requires additional computational cost. Nevertheless, the theory we develop is valid for all functions which satisfy Assumptions 2.5(i) and 2.5(ii) and we observed in practice that choosing simple functions independent of the unknown parameter, e.g. monomials of the form β j (z; a) = z k with k ∈ N, is sufficient to obtain satisfactory estimations. We also remark that in one dimension we can characterize completely all diffusion processes whose generator has orthogonal polynomials as eigenfunctions (Bakry et al. 2014, Sect. 2.7). Partial results in this directions also exist in higher dimensions.

The filtering approach
We now go back to our multiscale SDE (2.1) and, inspired by Abdulle et al. (2021), we propose a second estimator for the homogenized drift coefficient by filtering the data. In particular, we modify A ε N ,J by filtering the observations and inserting the new data into the score function G ε N ,J in order to take into account the case when the step size is too small with respect to the multiscale parameter ε. Let us consider the exponential kernel k : R + → R defined as for which a rigorous theory has been developed in Abdulle et al. (2021). We remark that this exponential kernel is a low-Algorithm 1: Estimation of A without filtered data Input: Observations X ε n N n=0 . Distance between two consecutive observations . Number of eigenvalues and eigenfunctions J .
pass filter, which cuts the high frequencies and highlights the slowest components. We then define the filtered observa- where the fast-scale component of the original multiscale trajectory is eliminated, and we define the new score function as a modification of (2.14), i.e. (2.17)

Remark 2.7
Notice that the filtered data only partially replace the original data in the definition of the score function. This idea is inspired by Abdulle et al. (2021) where the same approach is used with the maximum likelihood estimator. The importance of keeping also the original observations becomes apparent in the proofs of the main results. However, a simple intuition is provided by equation (2.12). This equation is essential in order to obtain the unbiasedness of the estimators when the sampling rate is independent of the multiscale parameter ε, but it is not valid for the filtered process.
The estimator A ε N ,J . The second estimator A ε N ,J is given by the solution of the M-dimensional nonlinear system The main steps to compute the estimator A ε N ,J are highlighted in Algorithm 2 and additional details about the implementation can be found in Appendix B. Note that (2.16) can be rewritten as We introduce its continuous version Z ε t which will be employed in the analysis We remark that the joint process (X ε t , Z ε t ) satisfies the system of multiscale SDEs and, using the theory of homogenization, when ε goes to zero it converges in law as a random variable in Moreover, it has been proved in Abdulle et al. (2021) that the two-dimensional processes (X ε t , Z ε t ) and (X 0 t , Z 0 t ) are geometrically ergodic and their respective invariant measures have densities with respect to the Lebesgue measure denoted respectively by ρ ε = ρ ε (x, z) and ρ 0 = ρ 0 (x, z). Let us finally remark that given discrete observations X ε n we can only compute Z ε n , but the theory, which has to be employed for proving the convergence results, has been studied for the continuous-time process Z ε t .

Remark 2.8
The only difference in the construction of the estimators A ε N ,J and A ε N ,J is the fact that the latter requires filtered data, which are obtained from discrete observations, and thus, it is computationally more expensive. Therefore, when it is possible to use the estimator without filtered data, it is preferable to employ it.

Main results
In this section, we present the main results of this work, i.e. the asymptotic unbiasedness of the proposed estimators. We first need to introduce the following technical assumption, which is a nondegeneracy hypothesis related to the use of Algorithm 2: Estimation of A with filtered data Input: Observations X ε n N n=0 . Distance between two consecutive observations . Number of eigenvalues and eigenfunctions J .
the implicit function theorem for the functions (2.14) and (2.17) in the limit as N → ∞.
Assumption 3.1 Let A be the homogenized drift coefficient of equation (2.2). Then, the following hold where ρ 0 is the invariant measure of the couple ( X 0 n , Z 0 n ), whose existence is guaranteed by Lemma A.2, and ∇ a X t (a) is the gradient of the stochastic process X t (a) in (2.7) with respect to the drift coefficient a.
Remark 3.2 The nondegeneracy Assumption 3.1, which is analogous to Condition 4.2(a) in Kessler and Sørensen (1999), holds true in all nonpathological examples and does not constitute an essential limitation on the range of validity of the results proved in this paper. Further details about the necessity of this assumption for the analysis of the proposed estimator will be given in Sect. 5.2.
The proofs of the following two main theorems are the focus of Sect. 5.

Theorem 3.3 Let J be a positive integer. Under Assumptions
exists with probability tending to one as N → ∞. Moreover,

Theorem 3.4 Let J be a positive integer. Under Assumptions
Remark 3.5 Notice that in both Theorem 3.3 and Theorem 3.4 the order of the limits is important and they cannot be interchanged. In fact, we first consider the large data limit, i.e. the number of observations N tends to infinity, and then we let the multiscale parameter ε vanish. Moreover, in Theorem 3.4 the values ζ = 1 and ζ = 2 are not allowed because of technicalities in the proof, but we observe numerically that the estimator works well also in these two particular cases.
These two theorems show that both estimators based on the multiscale data from (2.1) converge to the homogenized drift coefficient A of (2.2). Since the analysis is similar for the two cases, we will mainly focus on the second score function with filtered observations and at the end of each step we will state the differences with respect to the estimator without pre-processed data.

Remark 3.6
Since the main goal of this work is the estimation of the effective drift coefficient A, in the numerical experiments and in the following analysis we will always assume the effective diffusion coefficient to be known. Nevertheless, we remark that our methodology can be slightly modified in order to take into account the estimation of the effective diffusion coefficient too. In fact, the parameter a can be replaced by the parameter θ = (a, s) ∈ R M+1 where a stands for the drift and s stands for the diffusion, yielding nonlinear systems of dimension M + 1 corresponding to (2.15) and (2.18). The proofs of the asymptotic unbiasedness of the new estimators θ ε N ,J and θ ε N ,J can be adjusted analogously. For completeness, we provide a more detailed explanation and a numerical experiment illustrating this approach in Sect. 4.6.

A particular case
Before analysing the general framework, let us consider the simple case of the Ornstein-Uhlenbeck process, i.e. let the dimension of the parameter N = 1 and let V (x) = x 2 /2. Then, the multiscale SDE (2.1) becomes and its homogenized version is Letting a ∈ A, then the eigenfunctions φ j (·; a) and the eigenvalues λ j (a) satisfy The solution of the eigenvalue problem can be computed explicitly (see Pavliotis 2014, Sect. 4.4); we have and φ j (·; a) satisfies the recurrence relation It is also possible to prove by induction that Let us consider the simplest case with only one eigenfunction, i.e. J = 1, and β 1 (z; a) = z, which implies Then, the score functions (2.14) and (2.17) become The solutions of the equations G ε N ,1 (a) = 0 and G ε N ,1 (a) = 0 can be computed analytically and are given by Comparing these estimators with the discrete MLE defined in Pavliotis and Stuart (2007) without filtered data as , and the discrete MLE with filtered data we notice that they coincide in the limit as vanishes. We remark that we are comparing our estimator with the discrete MLE instead of the analytical formula for the MLE in continuous time since we assume that we are observing our process at discrete times. Therefore, the continuous time MLE has to be approximated using the available discrete data (Pavliotis 2014, Sect. 5.3). In the following theorems we show the asymptotic limit of the estimators. We do not provide a proof for these results since Theorem 3.7 and Theorem 3.9 are particular cases of Theorem 3.3 and Theorem 3.4 respectively, and Theorem 3.8 follows from the proof of Theorem 3.3 as highlighted in Remark 5.11.
where A is the drift coefficient of the homogenized equation (2.2).
Theorem 3.8 Let be independent of ε or = ε ζ with ζ > 2. Then, under Assumption 2.2, the estimator (3.1) satisfies where A is the drift coefficient of the homogenized equation (2.2).

Remark 3.10
Notice that it is possible to write different proofs for Theorems 3.7, 3.8 and 3.9, which take into account the specific form of the estimators, and thus show stronger results. In fact, if the distance between two consecutive observations is independent of the multiscale parameter ε, then the convergences in the statements do not only hold in probability, but also almost surely. We expect that almost sure convergence can be proved for a larger class of equations, but are neither aware of related literature showing such a stronger result, nor have been able to prove it.

Numerical experiments
In this section, we present numerical experiments which confirm our theoretical results and show the power of the martingale estimating functions based on eigenfunctions and filtered data to correct the unbiasedness caused by discretization and the fact that we are using multiscale data to fit homogenized models. Moreover, we present a sensitivity analysis with respect to the number N of observations and the number J of eigenvalues and eigenfunctions taken into account.
In the experiments that we present data are generated employing the Euler-Maruyama method with a fine time step h, in particular we set h = ε 3 . Letting , with t n = n . In view of Remark 2.3, we do not require stationarity of the multiscale dynamics; hence, we always set the initial condition to be X ε 0 = 0. Notice that the time step h is only used to generate numerically the original data and has to be chosen sufficiently small in order to have a reliable approximation of the continuous path. However, the distance between two consecutive observations is the rate at which we sample the data, which we assume to know, from the original trajectory. In order to compute the filtered data { Z ε n } N n=1 , we employ equation (2.19). We repeat this procedure for M = 15 different realizations of Brownian motion and we plot the average of the drift coefficients computed by the estimators. We finally remark that in order to compute our estimators we need the value of the diffusion coefficient of the homogenized equation. In all the numerical experiments, we compute it exactly using the formula for the coefficient K given by the theory of homogenization, but we also remark that its value could be estimated employing the subsampling technique presented in Pavliotis and Stuart (2007) or modifying the estimating function as explained in Remark 3.6.

Sensitivity analysis with respect to the number of observations
We consider the multiscale Ornstein-Uhlenbeck process, i.e. equation (2.1) with V (x) = x 2 /2, and we take p(y) = cos(y), the multiscale parameter ε = 0.1, the drift coefficient α = 1 and the diffusion coefficient σ = 1. Notice that for this choice of the slow-scale potential the technical assumptions required in the main Theorems 3.3, 3.4 can be easily checked. We plot the results computed by the estimator A ε N ,J with J = 1 and β(x; a) = x and we then divide the analysis in two cases: "small" and "big".
Let us first consider "small", i.e. = ε ζ with ζ = 0, 0.5, 1, 1.5, and take T = 400. In Fig. 1, we plot the results of the estimator as a function of the number of observations N . We remark that in this case the number of observations needed to reach convergence is strongly dependent and inversely proportional to the distance between two consecutive observations. This means that in order to reach convergence we need the final time T to be sufficiently large independently of . In fact, when the distance is small, the discrete observations are a good approximation of the continuous trajectory and therefore what matters most is the length T of the original path rather than the number N of observations.
In order to study the case "big", i.e. > 1, we set = 2 ζ with ζ = 1, 2, 3, 4, and take T = 2 15 . Figure 2 shows that in this case the number of observations needed to reach convergence is an increasing function of . Therefore, in order to have a reliable approximation of the drift coefficient of the homogenized equation, the final time T has to be chosen depending on . This is justified by the fact that, differently from the previous case, the discrete data are less correlated and therefore they do not well approximate the continuous trajectory. In particular, when the distance between two consecutive observations is very large, then in practice we need a huge amount of data because a good approximation of the unknown coefficient is obtained only if the final time T is very large.

Sensitivity analysis with respect to the number of eigenvalues and eigenfunctions
Let us now consider equation (2.1) with four different slowscale potentials The other functions and parameters of the SDE are chosen as in the previous subsection, i.e. p(y) = cos(y), α = 1, σ = 1 and ε = 0.1. Moreover, we set = ε and T = 500 and we vary J = 1, . . . , 10. The functions {β j } 10 j=1 appearing in the estimating function are given by β j (x; a) = x for all j = 1, . . . , J .
In Fig. 3, where we plot the values computed by A ε N ,J and A ε N ,J , we observe that the number J of eigenvalues and eigenfunctions slightly improve the results, in particular for the fourth potential, but the estimation stabilizes when the number of eigenvalues J is still small, e.g. J = 3. Therefore, in order to reduce the computational cost, it seems to be preferable not to take large values of J . This is related to how quickly the eigenvalues grow and, therefore, how quickly the corresponding exponential terms decay. The rigorous study of the accuracy of the spectral estimators as a function of the number of eigenvalues and eigenfunctions that we take into account will be investigated elsewhere.
In Fig. 4, we compare our martingale estimator A ε N ,J without filtered data with the discrete maximum likelihood estimator denoted MLE ε N , . The MLE does not provide good results for two reasons: • if is small, more precisely if = ε ζ with ζ > 1, sampling the data does not completely eliminate the fastscale components of the original trajectory; therefore, since we are employing data generated by the multiscale model, the estimator is trying to approximate the drift coefficient α of the multiscale equation, rather than the one of the homogenized equation; • if is relatively big, in particular if = ε ζ with ζ ∈ [0, 1), then we are taking into account only the slowscale components of the original trajectory, but a bias is with J = 1 without filtered data as a function of the distance between two successive observations for different slow-scale potentials still introduced because we are discretizing an estimator which is usually used for continuous data.
Nevertheless, as observed in these numerical experiments and investigated in greater detail in Pavliotis and Stuart (2007), there exists an optimal value of such that MLE ε N , works well, but this value is not known a priori and is strongly dependent on the problem, hence this technique is not robust. Figure 4 shows that the second issue, i.e. when is relatively big, can be solved employing A ε N ,J , an estimator for discrete observations, and that filtering the data is not needed as proved in Theorem 3.3.
Then, in order to solve also the first problem, in Fig. 5 we compare A ε N ,J with our martingale estimator A ε N ,J with filtered data. We observe that inserting filtered data in the estimator allows us to disregard the fast-scale components of the original trajectory and to obtain good approximations of the drift coefficient A of the homogenized equation independently of , as already shown in Theorem 3.4. In particular, we notice that the results still improve even for big values of if we employ the estimator based on filtered data. Finally, as highlighted in Remark 5.11, we observe that the limiting value of the estimator A ε N ,J as the number of observations N goes to infinity and the multiscale parameter ε vanishes is strongly dependent on the problem and cannot be computed theoretically. However, if we consider the slow-scale potential V 1 (x) = x 2 /2, i.e. the multiscale Ornstein-Uhlenbeck process, then the limit, as proved in Theorem 3.8, is the drift coefficient α of the multiscale equation.

Multidimensional drift coefficient
In this experiment, we consider a multidimensional drift coefficient, in particular we set N = 2. We then consider the bistable potential, i.e.
, and the fast-scale potential p(y) = cos(y). We choose the exact drift coefficient of the multiscale equation (2.1) to be α = 1.2 0.7 and the diffusion coefficient to be σ = 0.7. We also set the number of eigenfunctions J = 1, the function β(x; a) = x 3 x , the distance between two consecutive observations = 1 and the final time T = 1000. We then compute the estimator A ε N ,J after N = 100, 200, . . . , 1000 observations and in Fig. 6 we plot the result of the experiment for the cases ε = 0.1 and ε = 0.05. Since we are analysing the case independent of ε, filtering the data is not necessary and therefore we consider the estimator A ε N ,J which is computationally less expensive to compute.
We observe that the estimation is approaching the exact value A of the drift coefficient of the homogenized equation as the number of observations increases, until it starts oscillating around the true value A = 0.48 0.28 . Moreover, we notice that the time needed to reach a neighbourhood of A is smaller when the multiscale parameter ε is closer to its vanishing limit. In Table 1, we report the absolute error e ε N defined as where · 2 denotes the Euclidean norm, varying the number of observations N for the two values of the multiscale parameter.

Multidimensional stochastic process: interacting particles
In this section, we consider a system of d interacting particles in a two-scale potential, a problem with a wide range of applications which has been studied in Gomes and Pavliotis (2018). For t ∈ [0, T ] and for all i = 1, . . . , d, consider the system of SDEs (4.3) In this paper we fix the number of particles and study the performance of our estimators as ε vanishes. The very interesting problem of inference for mean field SDEs, obtained in the limit as d → ∞, will be investigated elsewhere. It can be shown (see, for example, Gomes and Pavliotis 2018, Sect. 2.1 and Duncan and Pavliotis (2016); Delgadino et al. (2021)) that (X ε 1 , . . . X ε d ) converges in law as ε goes to zero to the solution (X 0 1 , . . . , X 0 d ) of the homogenized system dX 0 (4.4) where = K θ and K is defined in (2.3). Moreover, the first eigenvalue and eigenfunction of the generator of the homogenized system can be computed explicitly and they are given, respectively, by Hence, letting > 0 independent of ε, given a sequence of observations

we can express the estimators analytically
Let us now set p(y) = cos(y), α = 1, σ = 1 and θ = 1. We then simulate system (4.3) for different final times T = 100, 200, . . . , 1000 and approximate the drift coefficient A of the homogenized system (4.4) for d = 2 and d = 5. In Figs. 7 and 8, we plot the results, respectively, of the estimators A ε N ,J with = 1 and A ε N ,J with = ε for two different values of ε = 0.1, 0.05. As expected, we observe that our estimator provides a better approximation of the unknown coefficient A when the time T increases and that this value stabilizes after approximately T = 500. Fig. 9 Simultaneous inference of drift and diffusion coefficient for the estimator A ε N ,J with J = 2 θ = a s ∈ R M+1 , whose exact value is given by

Simultaneous inference of drift and diffusion coefficients
, where A and are the drift and diffusion coefficients of the homogenized equation, respectively. Then, the eigenvalue problem reads for all j ∈ N where the eigenvalues and eigenfunctions are now dependent on the new parameter θ . Accordingly, also the functions {β j } J j=1 can be chosen dependent on both the drift and diffusion coefficients and, moreover, they have to take values in R M+1 , i.e. β j (·; θ) : R → R M+1 . Therefore, the new score functions G ε N ,J and G ε N ,J are defined from = A × S ⊂ R M+1 , which is the set of admissible parameters θ , to R M+1 and thus give nonlinear systems of dimension M +1. Finally, the solutions θ ε N ,J and θ ε N ,J of the systems are the estimators of both the drift and diffusion coefficients of the homogenized equation. In fact, small modifications in the proofs of the main results, in particular in the notation, yield the asymptotic unbiasedness of the estimators under the same conditions, i.e.
Consider now the same setting of Sect. 4.1, i.e. the multiscale Ornstein-Uhlenbeck potential with V (x) = x 2 /2, p(y) = cos(y), α = 1, σ = 1 and let us assume that both the drift and diffusion coefficients are unknown. We remark that in this case we have M = 1. Then, set the final time T = 1000, the sampling rate = 1 and the number of eigenfunctions and eigenvalues J = 2. Moreover, we choose the functions β 1 (x; θ) = β 2 (x; θ) = x 2 x . Since the distance between two consecutive observations is independent of the multiscale parameter ε, we consider the estimator A ε N ,J without filtered data. In Fig. 9, we plot the evolution of our estimator varying the number of observations N for two different values of ε, in particular ε = 0.1 and ε = 0.05. We observe that if the multiscale parameter is smaller, then the number of observations needed to obtain a reliable approximation of the unknown parameters is lower.

Asymptotic unbiasedness
In this section, we prove our main results. The plan of the proof is the following: • we first study the limiting behaviour of the score functions G ε N ,J and G ε N ,J defined in (2.14) and (2.17) as the number of observations N goes to infinity, i.e. as the final time T tends to infinity; • we then show the continuity of the limit of the score functions obtained in the previous step and we compute their limits as the multiscale parameter ε vanishes (Sect. 5.1); • we finally prove our main results, i.e. the asymptotic unbiasedness of the drift estimators (Sect. 5.2).
We first define the Jacobian matrix of the function g j introduced in (2.13) with respect to a: (5.1) which will be employed in the following and where ⊗ denotes the outer product in R M and the dot denotes either the Jacobian matrix or the gradient with respect to a, e.g. h j = .
g j . Then note that, under Assumption 2.2, due to ergodicity and stationarity and by Bibby and Rensen (1995, Lemma 3 , a), where E ϕ ε and E ρ ε denotes, respectively, that X ε 0 and (X ε 0 , Z ε 0 ) are distributed according to their invariant distribution. We remark that the invariant distribution ρ ε exists due to Lemma A.2. By equation (5.1) the Jacobian matrices of G J (ε, a) and G J (ε, a) with respect to a are given by

Continuity of the limit of the score function
In this section, we first prove the continuity of the functions We then study the limit of these functions for ε → 0. As the proof for the filtered and the non-filtered are similar, we will concentrate on the filtered case and comment on the non-filtered case. Before entering into the proof, we give two preliminary technical lemmas which will be used repeatedly and whose proof can be found, respectively, in Appendix A.1 and Appendix A.3.
Lemma 5.1 Let Z ε be defined in (2.16) and distributed according to the invariant measure ρ ε of the process ( X n , Z n ). Then, for any p ≥ 1 there exists a constant C > 0 uniform in ε such that Lemma 5.2 Let f : R → R be a C ∞ (R) function which is polynomially bounded along with all its derivatives. Then, where R(ε, ) satisfies for all p ≥ 1 and for a constant C > 0 independent of and ε We start here with a continuity result for the score function and its Jacobian matrix with respect to the unknown parameter. Proof We only prove the statement for G J , then the argument is similar for H J . Letting ε * ∈ (0, ∞) and a * ∈ A, we want to show that By the triangle inequality, we have then we divide the proof in two steps and we show that the two terms vanish.
Since β j and φ j are continuously differentiable with respect to a for all j = 1, . . . , J , respectively, due to Assumption 2.5 and Lemma A.4, then also g j is continuously differentiable with respect to a. Therefore, by the mean value theorem for vector-valued functions, we have Then, letting C > 0 be a constant independent of ε, since β j and φ j are polynomially bounded still by Assumption 2.5 and X ε 0 , X ε and Z ε 0 have bounded moments of any order by Pavliotis and Stuart (2007, Corollary 5.4) and Lemma 5.1, we obtain which implies that Q 1 (ε, a) vanishes as (ε, a) goes to (ε * , a * ) both if is independent of ε and if = ε ξ .
Step 2: Q 2 (ε) → 0 as ε → ε * . If is independent of ε, then we have and the right-hand side vanishes due to the continuity of g j for all j = 1, . . . , J and the continuity of the solution of a stochastic differential equation with respect to a parameter (see Krylov 2009, Theorem 2.8.1). Let us now consider the case = ε ζ with ζ > 0 and let us assume, without loss of generality, that ε > ε * . Denoting * = (ε * ) ζ and applying Itô's lemma we have for all j = 1, . . . , J then we can write where R(ε) is given by Let C > 0 be independent of ε and notice that since p is bounded, β j , φ j , φ j , V are polynomially bounded and X ε t and Z ε 0 have bounded moments of any order by Pavliotis and Stuart (2007, Corollary 5.4) and Lemma 5.1, applying Hölder's inequality we obtain Therefore, by the continuity of the solution of a stochastic differential equation with respect to a parameter (see Mishura et al. 2010) and due to the bound (5.4), we deduce that which implies that Q 2 (ε) vanishes as ε goes to ε * and concludes the proof.

Remark 5.4
Notice that the proof of Proposition 5.3 can be repeated analogously for the functions G J : (0, ∞) × A → R M and H J : (0, ∞) × A → R M×M without filtered data in order to prove their continuity.
Next we study the limit as ε vanishes and we divide the analysis in two cases. In particular, we consider independent of ε and = ε ζ with ζ > 0. In the first case (Proposition 5.5), data are sampled at the homogenized regime ignoring the fact that the they are generated by a multiscale model, while in the second case (Proposition 5.7) the distance between two consecutive observations is proportional to the multiscale parameter and thus, data are sampled at the multiscale regime preserving the multiscale structure of the full path.
Proposition 5.5 Let the functions G J : (0, ∞) × A → R M and H J , : (0, ∞) × A → R M×M be defined in (5.2) and (5.3) and let be independent of ε. Under Assumption 2.5 and for any a * ∈ A, we have Proof We only prove the statement for G J , then the argument is similar for H J . By the triangle inequality, we have which vanishes due to the first step of the proof of Proposition 5.3 and Let us remark that the convergence in law of the joint process which implies the desired result.
Remark 5.6 Similar results to Proposition 5.3 and Proposition 5.5 can be shown for the estimator without filtered data.
In particular we have that G J (ε, a) and H J (ε, a) are continuous in (0, ∞) × A and Since the proof is analogous, we do not report here the details.
where the generator L A is defined in (2.9).
Proof We only prove the statement for G J , then the argument is similar for H J . By the triangle inequality, we have then we need to show that the two terms vanish and we distinguish two cases. Case 1: ζ ∈ (0, 1). Applying Lemma 5.2 to the functions φ j (·; a * ) for all j = 1, . . . , J and noting that where R(ε, ) satisfies for a constant C > 0 independent of ε and and for all p ≥ 1 We now study the three terms separately. First, by Cauchy-Schwarz inequality, since β j (·; a * ) is polynomially bounded, Z ε 0 has bounded moments of any order by Lemma 5.1 and due to (5.5) we obtain Let us now focus on I ε 1 for which we have where Z ε 0 is distributed according to the invariant measure ρ ε of the continuous process (X ε t , Z ε t ) and lim ε→0 1 − e −λ j (a * ) = λ j (a * ). (5.7) By the mean value theorem for vector-valued functions, we have and since β j (·; a * ), φ j (·; a * ) are polynomially bounded, X ε 0 , Z ε 0 , Z ε 0 have bounded moments of any order, respectively, by Pavliotis and Stuart (2007, Corollary 5.4 Moreover, notice that by homogenization theory (see Abdulle et al. 2021, Sect. 3.2) the joint process (X ε 0 , Z ε 0 ) converges in law to the joint process (X 0 0 , Z 0 0 ) and therefore which together with (5.7) and (5.8) yields We now consider I ε 3 and we follow an argument similar to I ε 2 . We first have where the first term in the right-hand side converges due to homogenization theory and the second one is bounded by Therefore, we obtain which together with (5.6) and (5.9) implies which shows that Q 2 (ε) vanishes as ε goes to zero. Let us now consider Q 1 (ε, a). Following the first step of the proof of Proposition 5.3, we have where a assumes values in the line connecting a and a * , and repeating the same computation as above we obtain which together with (5.10) gives the desired result. Case 2: ζ ∈ (1, 2) ∪ (2, ∞). Let Z ε 0 be distributed according to the invariant measure ρ ε of the continuous process (X ε t , Z ε t ) and define Then, we have and we first bound the remainder R(ε, ). Applying Itô's lemma to the process X ε t with the functions φ j (·; a * ) for each j = 1, . . . , J we have (5.12) and observing that By the mean value theorem for vector-valued functions, we have and since β j (·; a * ), φ j (·; a * ) are polynomially bounded, X ε 0 , Z ε 0 , Z ε 0 have bounded moments of any order, respectively, by Pavliotis and Stuart (2007, Corollary 5.4), Abdulle (2021, Lemma C.1) and Lemma 5.1 and applying Hölder's inequality, we obtain for a constant C > 0 independent of ε and . We repeat a similar argument for R 2 (ε, ) and R 3 (ε, ) to get , which together with (5.14) yield Moreover, applying Lemma 5.2 and proceeding similarly to the first part of the first case of the proof, we have which together with (5.15) and Corollary A.3 implies (5.16) Let us now consider Q ε j . Replacing equation (5.12) into the definition of Q ε j in (5.11) and observing that similarly to (5.13), it holds We rewrite β j (Z ε 0 ; a * ) inside the integrals employing equation (2.21) and Itô's lemma hence due to stationarity we have Since φ j (·; a * ), φ j (·; a * ) and β j (·; a * ) are polynomially bounded, p is bounded and X ε t and Z ε t have bounded moments of any order, respectively, by Pavliotis and Stuart (2007, Corollary 5.4) and Abdulle et al. (2021, Lemma C Let us now move to Q ε j,1 and let us define the functions where ρ ε and ρ 0 are, respectively, the densities with respect to the Lebesgue measure of the invariant distributions of the joint processes (X ε t , Z ε t ) and (X 0 t , Z 0 t ) and ϕ ε and ϕ 0 are their marginals with respect to the first component. Integrating by parts we have Employing the last equation in the proof of Lemma 3.5 in Abdulle et al. (2021) and thus we obtain Letting ε go to zero and due to homogenization theory, it follows then applying formula (5.19) for the homogenized equation, i.e. with p(y) = 0 and α and σ replaced by A and , and integrating by parts we have Therefore, we obtain which together with (5.11), (5.17) and bounds (5.16) and (5.18) implies that Q 2 (ε) vanishes as ε goes to zero. Finally, analogously to the first case we can show that also Q 1 (ε, a) vanishes, concluding the proof.

Remark 5.8
A similar result to Proposition 5.7 can be shown for the estimator without filtered data only if ζ ∈ (0, 1), i.e. the first case in the proof. In particular, we have where the generator L A is defined in (2.9). Since the proof is analogous, we do not report here the details. On the other hand, if ζ > 2, we can show that The proof is omitted since it is similar to the second case of the proof of Proposition 5.7.

Proof of the main results
Let us remark that we aim to prove the asymptotic unbiasedness of the proposed estimators, i.e. their convergence to the homogenized drift coefficient A as the number of observations N tends to infinity and the multiscale parameter ε vanishes. Therefore, we study the limit of the score functions and their Jacobian matrices as N → ∞ and ε → 0 evaluated in the desired limit point A.
We first analyse the case independent of ε and we consider the limit of Proposition 5.5 and Remark 5.6 evaluated in a * = A. Then, due to equation (2.12) we get and similarly we obtain On the other hand, if is a power of ε, we study the limit of Proposition 5.7 and Remark 5.8 evaluated in a * = A and by (2.10) we have Moreover, differentiating equation (2.12) with respect to a, we get Therefore, due to (2.12) and (5.23), we have Then, due to Lemma A.4, we can differentiate the eigenvalue problem (2.11) with respect to a and deduce that where the dot denotes the gradient with respect to a and which together with (2.11) implies Before showing the main results, we need two auxiliary lemmas, which in turn rely on the technical Assumption 3.1, which can now be rewritten as: Since the proofs of the two lemmas are similar, we only write the details of the first one.
Proof Let us first extend the functions G J and H J by continuity in ε = 0 with their limit given by Proposition 5.5 and Proposition 5.7 depending on and note that due to (5.21) if is independent of ε and (5.22) otherwise, we have Moreover, by (5.24), (5.25) and Assumption 3.1, we know that det H J (0, A) = 0.
Therefore, since the functions G J and H J are continuous by Proposition 5.3, the implicit function theorem (see Hurwicz and Richter 2003, Theorem 2) gives the desired result.
Lemma 5.10 Under Assumption 2.5 and Assumption 3.1, there exists ε 0 > 0 such that for all 0 < ε < ε 0 there exists γ = γ (ε) such that if is independent of ε or = ε ζ with ζ ∈ (0, 1) We are now ready to prove the asymptotic unbiasedness of the estimators, i.e. Theorems 3.3 and 3.4. We only prove Theorem 3.4 for the estimator A ε N ,J with filtered data. The proof of Theorem 3.3 for the estimator A ε N ,J without filtered data is analogous and is omitted here.

Proof of Theorem 3.4
We need to show for a fixed 0 < ε < ε 0 : (i) the existence of the solution A ε N ,J of the system G ε N ,J (a) = 0 with probability tending to one as N → ∞; We first note that by Lemma 5.9 we have lim ε→0 γ (ε) = 0.
We then follow the steps of the proof of Bibby and Rensen (1995, Theorem 3.2). Due to Barndorff-Nielsen and Sorensen (1994, Theorem A.1), claims (i) and (ii) hold true if we verify that (5.26) and as N → ∞ where ε is a positive definite covariance matrix and for C > 0 small enough such that B C,1 ⊂ A. Result (5.27) is a consequence of Florens-Zmirou (1989, Theorem 1). We where the right-hand side vanishes by Bibby and Rensen (1995, Lemma 3.3) and the continuity of H (Proposition 5.3), implying result (5.26). Hence, we proved (i) and (ii), which conclude the proof of the theorem.

Remark 5.11
Notice that if = ε ζ with ζ > 2 and we do not employ the filter, in view of (5.20) and following the same proof of Theorem 3.4, we could compute the asymptotic limit of A ε N ,J as N goes to infinity and ε vanishes if we knew a * such that The value of a * cannot be found analytically since it is, in general, different from the drift coefficients α and A of the multiscale and homogenized equations (2.1) and (2.2). Nevertheless, we observe that in the simple scale of the multiscale Ornstein-Uhlenbeck process we have a * = α.

Conclusion
In this work, we presented new estimators for learning the effective drift coefficient of the homogenized Langevin dynamics when we are given discrete observations from the original multiscale diffusion process. Our approach relies on a martingale estimating function based on the eigenvalues and eigenfunctions of the generator of the coarse-grained model and on a linear time-invariant filter from the exponential family, which is employed to smooth the original data. We studied theoretically the convergence properties of our estimators when the sample size goes to infinity and the multiscale parameter describing the fastest scale vanishes. In Theorem 3.3 and Theorem 3.4, we proved, respectively, the asymptotic unbiasedness of the estimators with and without filtered data. We remark that the former is not robust with respect to the sampling rate at finite multiscale parameter, while the estimator with filtered data is robust independently of the sampling rate. We analysed numerically the dependence of our estimators on the number of observations and the number of eigenfunctions employed in the estimating function noticing that the first eigenvalues in magnitude are sufficient to approximate the drift coefficient. Moreover, we performed several numerical experiments, which highlighted the effectiveness of our approach and confirmed our theoretical results. We believe that eigenfunction estimators can be very useful in applications, for example to multiparticle systems and their mean field limit (Gomes and Pavliotis 2018), since the eigenvalue problem for the generator of a reversible Markov process is a very well-studied problem. This means, in particular, that it is possible to study rigorously the proposed estimators and to prove asymptotic unbiasedness and asymptotic normality. Furthermore, in order to be able to assess the accuracy of the estimators, we could analyse its rate of convergence with respect to both the number of observations and the fastest scale. This is a highly nontrivial problem since it first requires the development of a fully quantitative periodic homogenization theory and we will return to this problem in future work. Finally, we think that it would be interesting to extend our estimators to the nonparametric framework and consider more general multiscale models. expressed herein are solely those of the authors listed, and may differ from the views and opinions expressed by JPMorgan Chase & Co. or its affiliates. This material is not a product of the Research Department of J.P. Morgan Securities LLC. This material does not constitute a solicitation or offer in any jurisdiction.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecomm ons.org/licenses/by/4.0/.

A Technical results
In this section, we prove some technical results which are used to show the unbiasedness of the estimators A ε N ,J and A ε N ,J . We first study the properties of the filter applied to discrete data, and then, we focus on the regularity of the eigenfunctions and eigenvalues of the generator. We finally prove a formula which can be interpreted as an approximation of the Itô's lemma.

A.1 Application of the filter to discrete data
The following result quantifies the expected distance among the continuous process Z ε t and the filtered observations Z ε n .
Lemma A.1 Let 0 < < 1, N be a positive integer and let Z ε n and Z ε t be defined, respectively, in (2.16) and (2.20) with X ε 0 = X ε 0 distributed according to its invariant measure ϕ ε . Then, there exists a constant C > 0 independent of ε, and N such that for all n = 0, . . . , N and for all p ≥ 1 where E ϕ ε denotes the expectation with respect to the Wiener measure and the fact that X ε 0 is distributed according to ϕ ε .
Proof In order to simplify the notation, let us define the quantity which is equivalent to Then, by Jensen's inequality applied to the convex function y → |y| p and since X ε k = X ε k , we have We now study the two terms separately. Applying Abdulle et al. (2021, Lemma B.1), we first get where C > 0 is a constant independent of ε and or we notice that, since X ε t has bounded moments of any order by Pavliotis and Stuart (2007, Corollary 5.4) and p is bounded, it holds for all s ∈ [k , (k + 1) ) Therefore, applying Jensen's inequality and due to the fact that X ε k has bounded moments of any order by Pavliotis and Stuart (2007, Corollary 5.4) we have which, together with (A.1) and (A.5), gives the desired result.
We now show the ergodicity of the process ( X ε n , Z ε n ), where the first component is a sample from the continuoustime process, i.e. X ε n = X ε n , while the second component is computed starting from the discrete observations X ε n .
Lemma A.2 Let > 0 and let Assumption 2.2 hold. Then, the couple ( X ε n , Z ε n ), where X ε n is a sample from the continuous process (2.1) and Z ε n is defined in (2.16), admits a unique invariant measure with density with respect to the Lebesgue measure denoted by ρ ε = ρ ε (x, z). Moreover, if is independent of ε, it converges in law to the two-dimensional process ( X 0 n , Z 0 n ) with ρ 0 = ρ 0 (x, z) as density of the invariant measure.
Proof By definition (2.19), we obtain the following stochastic difference equation where X ε n is a stationary and ergodic sequence. Observing that log e − = − < 0, applying Theorem 1 and in view of Remark 1.3 in Brandt (1986) we deduce the existence of a unique invariant measure for the couple ( X ε n , Z ε n ). Let us notice that in the theorem the sequence X ε n must be defined for all n ∈ Z while in our framework n ∈ N, but let us also remark that any stationary process indexed by N can be extended to one indexed by Z in an essentially unique way. Moreover, if is independent of ε, the same reasoning can be repeated to get the existence of a unique invariant measure for the couple ( X 0 n , Z 0 n ). Finally, standard homogenization theory implies the weak convergence of ρ ε to ρ 0 , which concludes the proof.
Proof The result follows directly from Lemma A.1 by letting n go to infinity, noting that the constant C is independent of n and employing ergodicity given by Lemma A.2.
It directly follows that Z ε n has bounded moments of all order and, in particular, we can prove Lemma 5.1.

Proof of Lemma 5.1
Applying Jensen's inequality to the function x → |x| p , we have then bounding the two terms in the right-hand side, respectively, with Abdulle et al. (2021, Lemma C.1) and Corollary A.3 gives the desired result.
Then, we write f (X ε )= f (S ε )+ f (X ε )− f (S ε ) =: f (S ε )+R 3 (ε, ), and, in order to conclude, it only remains to bound the expectation of R 3 (ε, ). Applying the mean value theorem and the Cauchy-Schwarz inequality and due to (A.7), the hypotheses on f and the fact that X ε t has bounded moments of any order by Pavliotis and Stuart (2007, Corollary 5.4), we obtain where X takes values between X ε and S ε , and which together with the estimates for R 1 and R 2 implies the desired result.

B Implementation details
In this section, we present the main techniques that we employed in the implementation of the proposed method. The most important steps in the algorithm are the computation of the eigenvalues and eigenfunctions of the eigenvalue problem (2.11) φ j (x; a) − a · V (x)φ j (x; a) + λ j (a)φ j (x; a) = 0, and the solution of the nonlinear system (2.15) or (2.18) with filtered data. Let us first focus on the eigenvalue problem. We note that the domain of the eigenfunctions is the whole real line R and need to be truncated for numerical computations. We first consider the variational formulation of equation (2.11), i.e. we multiply it by vϕ a , where v is a test function and ϕ a is the invariant distribution defined in (2.8), and integrating by parts we obtain for all j ∈ N the following eigenvalue problem Since ϕ a decays to zero exponentially fast, for all δ > 0 there exists r > 0 such that |ϕ a (x)| < δ for all x / ∈ [−r , r ].
Hence, letting R > 0 we assume that ϕ a (±R) 0 and we solve the truncated problem and h = 2R/N h , and we construct the discrete space which is constituted by continuous piecewise linear functions. Note that the discretization parameter h is chosen to be h = 0.1 or h = 0.05. We pick the characteristic Lagrangian basis {ψ k } N h k=0 of X 1 h characterized by the following property where δ ik is the Kronecker delta. We want to find φ j (·; a) ∈ X 1 h such that equation (B.1) holds true for all v ∈ X 1 h . Therefore, in equation (B.1) we substitute which have dimension Md 2 . From a theoretical point of view, slight modifications of the proofs allow to conclude that analogous results to the main theorems hold true, i.e. that the estimators are asymptotically unbiased in the limit of infinite observations and when the multiscale parameter vanishes. However, the problem becomes more complex and computationally expensive from a numerical viewpoint, in particular when the dimension d is large. In fact, the final nonlinear system, which has to be solved, has dimension Md 2 instead of M and, most importantly, it is required to solve the eigenvalue problem for the generator of a diffusion process in d dimensions.