1 Introduction

Efficient parameter estimation for stochastic models is essential in a wide range of applications in the natural and social sciences. In several areas, the data originate from phenomena which vary continuously in time and which are endowed with a multiscale structure. This is the case, for example, in molecular dynamics, in oceanography and atmospheric science, and in econometrics. Frequently, it is desirable in these areas to infer from data a simpler model which effectively captures large-scale structures, or slow variations, while disregarding small-scale fluctuations or treating them as a source of noise. The mismatch between the data and their desired slow-scale representation is a typical instance of model misspecification, which, if ignored or handled incorrectly, can lead to erroneous inference. Indeed, the data, coming from the full dynamics, are compatible with the coarse-grained model only at the time scales at which the effective dynamics is valid.

In this paper, we consider a simple multiscale setting arising from models of molecular dynamics, with a complete separation between the fast and the slow scale. In particular, we consider diffusion processes for motion in a confining potential which has slow variations with rapid order-one oscillations superimposed. Given data in the form of a sample path from this simple class of model problems, we are interested in determining the drift coefficient of an equation of the overdamped Langevin type in which the fast-scale potential is eliminated. The theory of homogenization guarantees that such a single-scale equation can be uniquely determined, and our goal is therefore to obtain effective coarse-grained dynamics from data consistently with respect to the homogenization result.

Several methods exist to account for model misspecification in multiscale frameworks such as the one above. For diffusion processes, the proposed approaches rely to varying degrees on subsampling, which has proved effective to some extent in many applications, but which nevertheless requires precise knowledge of how separated the two characteristic time scales are. The robustness of this methodology is also questionable, as inference results tend to be extremely sensitive to the subsampling rate.

In the rest of the introduction, we first give a brief overview of the existing literature on the topic of deterministic and stochastic multiscale inference problems, then introduce our novel methodology and its favorable properties and conclude with an outline of this paper.

1.1 Literature Review

For simple models in molecular dynamics, the effect of model misspecification was studied in a series of papers [7, 8, 16, 17, 26, 28, 29] under the assumption of scale separation. In particular, for Brownian particles moving in two-scale potentials it was shown that, when fitting data from the full dynamics to the homogenized equation, the maximum likelihood estimator (MLE) is asymptotically biased [29, Theorem 3.4]. More precisely, in the large sample size limit, the data remain consistent with the multiscale problem at small scales. Ostensibly, this would seem to affect only the estimation of the diffusion coefficient. However, because of detailed balance, it also causes the MLE for the drift of a fitted single-scale model to misidentify the coefficient of the homogenized equation. The bias of the MLE can be eliminated by subsampling at an appropriate rate, which lies between the two characteristic time scales of the problem [29, Theorems 3.5 and 3.6].

Similar techniques can be employed in econometrics, in particular for the estimation of the integrated stochastic volatility in the presence of market microstructure noise. In this case, too, the data have to be subsampled at an appropriate rate [6, 25]. The correct subsampling rate can, in some instances, be rather extreme with respect to the frequency of the data itself, resulting in as much as \(99\%\) of the time series being ignored. As intuition suggests, this significantly increases the variance of the estimator, which is usually handled with additional bias corrections and variance reduction procedures. The need for such methodology is accentuated when data are obtained at high frequency [5, 35].

The problem of extracting large-scale variations from multiscale data is studied in atmosphere and ocean science. In this field, too, subsampling the data is necessary to obtain an accurate coarse-grained model [12, 34].

The necessity to subsample the data can be alleviated by using appropriate martingale estimators, as was done in [18, 21]. This class of estimators can be applied to the case where the noise is multiplicative and also given by a deterministic chaotic system, as opposed to white noise. Estimators of this family have been applied to time series from paleoclimatic data and marine biology and augmented with appropriate model selection methodologies [22].

In case the data consist of discrete observations and not of continuous time series, it is possible to employ estimators based on a spectral decomposition of the generator of the stochastic process. Methodologies of this kind have been applied successfully to inference problems for single-scale problems [13, 20], as well as more recently for multiscale diffusions [14].

Inference for diffusion processes can be naturally performed from a Bayesian perspective. If one focuses on the drift coefficient, the form of the likelihood function guarantees, under a Gaussian prior hypothesis, that the posterior distribution is itself Gaussian. The versatility of the Bayesian approach in the infinite-dimensional case [15, 33] makes it possible to extend the inference of the drift of a diffusion process to the nonparametric case [31, 32].

The issue of model misspecification in inverse problems with a multiscale structure has been treated in the context of partial differential equations, too. In particular, it has been shown that it is possible to infer a coarse-grained equation from data coming from the full model and to retrieve, in the large data limit, the correct result [24]. A series of papers [1,2,3] focuses on retrieving the full model when the multiscale coefficient is endowed with a specific parametrized structure. Since these problems are ill-posed, the latter is achieved via Tikhonov regularization [1, 24], adopting a Bayesian approach [2, 24] or exploiting techniques of Kalman filtering [3]. In [2, 3], the authors highlight the need to account explicitly for the modeling error due to homogenization and apply statistical techniques taken from [10, 11].

1.2 Our Contributions

In this paper, we bypass subsampling by designing a methodology based on filtered data. In particular, we smooth the time-series data from the multiscale model by application of an appropriate linear time-invariant filter, from the exponential family, and show that doing so allows us to accurately retrieve the drift coefficient of the homogenized model. The methodology we present is straightforward to implement, robust in practice and backed by theory. In particular, we show theoretically and demonstrate via numerical experiments that:

  (i) The smoothing width of the filter can be tuned to be proportional either to the speed of the slow process or to smaller scales, and yields in both cases unbiased results for maximum likelihood parameter estimation. Sharp estimates on the minimal width with respect to the multiscale parameter are provided. The unbiasedness results are given in Theorems 3.12 and 3.18 for filtered data in the homogenized and in the multiscale regimes, respectively.

  (ii) We additionally propose in the multiscale regime an estimator of the effective diffusion coefficient based on filtered data, as shown in Theorem 3.20.

  (iii) Estimations based on our technique are robust in practice with respect to the parameter of the filter. This is not the case for subsampling, which is strongly influenced by the subsampling frequency. The robustness of our technique is demonstrated via numerical experiments in Sects. 5.1 and 5.3.

  (iv) The entire stream of data is employed, which, in practice, enhances the quality of the filter-based MLE in terms of bias. Moreover, avoiding subsampling, and hence the discretization of the data, allows us to employ continuous-time theoretical tools.

  (v) It is possible to employ the filtered data approach within a continuous-time Bayesian framework by a careful modification of the likelihood function. Under mild hypotheses on the filter parameters, we are able to show that the posterior distributions obtained with our methodology are asymptotically consistent with respect to the drift parameter of the homogenized equation. Our main theoretical result is given in Theorem 4.5, and a numerical experiment for the combination of the filtered data approach and of Bayesian techniques is presented in Sect. 5.4.

1.3 Outline

The rest of the paper is organized as follows. In Sect. 2, we introduce the problem and lay the basis of our analysis setting the main assumptions and notation. In Sect. 3, we present our filtered data methodology, with a particular focus on ergodic properties, on multiscale convergence and, naturally, on the properties of our estimators. In Sect. 4, we introduce the Bayesian framework and show how it can be enhanced employing filtered data. Finally, in Sect. 5 we demonstrate the effectiveness of our methodology via a series of numerical experiments.

2 Problem Setting

In this section, we introduce the class of diffusion processes which we treat in this paper and the classical methodology employed for the estimation of the drift. Let \(\varepsilon > 0\) and let us consider the one-dimensional multiscale stochastic differential equation (SDE)

$$\begin{aligned} \mathrm {d}X_t^\varepsilon = -\alpha \cdot V'(X_t^\varepsilon ) \,\mathrm {d}t - \frac{1}{\varepsilon }p'\left( \frac{X_t^\varepsilon }{\varepsilon }\right) \,\mathrm {d}t + \sqrt{2\sigma } \,\mathrm {d}W_t, \end{aligned} \tag{1}$$

where, given a positive integer N, we have that \(\alpha \in {\mathbb {R}}^N\) and \(\sigma > 0\) are the drift and diffusion coefficients, respectively, and \(W_t\) is a standard one-dimensional Brownian motion. The functions \(V:{\mathbb {R}}\rightarrow {\mathbb {R}}^N\) and \(p:{\mathbb {R}}\rightarrow {\mathbb {R}}\) define the slow-scale and the fast-scale confining potentials, respectively. In particular, we assume

$$\begin{aligned} V(x) = \begin{pmatrix} V_1(x)&V_2(x)&\cdots&V_N(x) \end{pmatrix}^\top , \end{aligned}$$

for smooth functions \(V_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(i = 1, \ldots , N\). Moreover, we assume p to be smooth and periodic of period L. The theory of homogenization [9, Chapter 3] guarantees the existence of an SDE of the form

$$\begin{aligned} \mathrm {d}X_t = - A \cdot V'(X_t) \,\mathrm {d}t + \sqrt{2\varSigma } \,\mathrm {d}W_t, \end{aligned} \tag{3}$$

such that \(X_t^\varepsilon \rightarrow X_t\) for \(\varepsilon \rightarrow 0\) in law as random variables in \({\mathcal {C}}^0([0, T]; {\mathbb {R}})\). In particular, we have \(A = K\alpha \) and \(\varSigma = K \sigma \), where the coefficient \(0<K<1\) is given by the formula

$$\begin{aligned} K&= \int _0^L (1 + \varPhi '(y))^2 \, \mu (\mathrm {d}y), \\ \mu (\mathrm {d}y)&= \frac{1}{Z} e^{-p(y)/\sigma } \,\mathrm {d}y, \quad \text {where}\quad Z = \int _0^L e^{-p(y)/\sigma } \,\mathrm {d}y, \end{aligned}$$

and where the function \(\varPhi \) is the unique solution with zero mean with respect to the measure \(\mu \) of the two-point boundary value problem

$$\begin{aligned} -p'(y)\varPhi '(y) + \sigma \varPhi ''(y) = p'(y), \quad 0 \le y \le L, \end{aligned}$$

endowed with periodic boundary conditions. Let us remark that in this one-dimensional setting it is possible to determine \(\varPhi \) explicitly, and the homogenization coefficient K is given by

$$\begin{aligned} K = \frac{L^2}{Z{\widehat{Z}}}, \quad \text {where}\quad Z = \int _0^L e^{-p(y)/\sigma } \,\mathrm {d}y, \quad {\widehat{Z}} = \int _0^L e^{p(y)/\sigma } \,\mathrm {d}y. \end{aligned}$$
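As an illustration, the closed-form expression for K can be evaluated by numerical quadrature. The following is a minimal sketch, in which the function name `homogenization_coefficient` and the grid size are assumptions made for this example. For \(p(y) = \cos (y)\) one has \(Z = {\widehat{Z}} = 2\pi I_0(1/\sigma )\), with \(I_0\) the modified Bessel function of the first kind, so that \(K = I_0(1/\sigma )^{-2}\) provides an independent check.

```python
import numpy as np

def homogenization_coefficient(p, sigma, period, n_points=2000):
    """Approximate K = L^2 / (Z * Z_hat) for an L-periodic potential p."""
    # Uniform grid over one period, endpoint excluded: for a periodic
    # integrand this Riemann sum coincides with the trapezoidal rule.
    y = np.linspace(0.0, period, n_points, endpoint=False)
    dy = period / n_points
    Z = np.sum(np.exp(-p(y) / sigma)) * dy
    Z_hat = np.sum(np.exp(p(y) / sigma)) * dy
    return period**2 / (Z * Z_hat)

# Example values (assumed here): p(y) = cos(y), sigma = 0.5, L = 2 * pi.
K = homogenization_coefficient(np.cos, sigma=0.5, period=2.0 * np.pi)
K_bessel = 1.0 / np.i0(1.0 / 0.5) ** 2   # closed form for the cosine potential
print(K, K_bessel)
```

Both numbers should agree to high accuracy and lie strictly between 0 and 1, consistently with \(0<K<1\).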

We now briefly present the classical methodology for estimating the drift coefficient. Let \(T > 0\) and let \(X \,\,{:=}\,\,(X_t, 0\le t \le T)\) be a realization of the solution of (3) up to final time T. Girsanov’s change of measure formula applied to (3) allows us to write the likelihood of X given a drift coefficient A as

$$\begin{aligned} p(X \mid A)&= \exp \left( -\frac{I(X\mid A)}{2\varSigma } \right) , \quad \text {where} \\ I(X \mid A)&= \int _0^T A \cdot V'(X_t) \,\mathrm {d}X_t + \frac{1}{2} \int _0^T \left( A \cdot V'(X_t) \right) ^2 \,\mathrm {d}t. \end{aligned} \tag{6}$$

Minimizing the functional \(I(X \mid A)\) with respect to A therefore gives the maximum likelihood estimator (MLE) of A, which can be formally computed in closed form as

$$\begin{aligned} {\widehat{A}}(X, T) \,\,{:=}\,\,\arg \min _{A \in {\mathbb {R}}^N} I(X \mid A) = - M^{-1}(X)h(X), \end{aligned} \tag{7}$$

where \(M(X)\in {\mathbb {R}}^{N\times N}\) and \(h(X)\in {\mathbb {R}}^N\) are defined as

$$\begin{aligned} M(X) = \frac{1}{T} \int _0^T V'(X_t) \otimes V'(X_t) \,\mathrm {d}t, \quad h(X) = \frac{1}{T} \int _0^T V'(X_t) \,\mathrm {d}X_t, \end{aligned}$$

where \(\otimes \) denotes the outer product in \({\mathbb {R}}^N\). Let us now state the assumptions which will be employed throughout the rest of our work. In particular, we consider the same dissipative setting as [29, Assumption 3.1].

Assumption 2.1

The potentials p and V satisfy

  (i) \(p \in {\mathcal {C}}^\infty ({\mathbb {R}})\) and is L-periodic for some \(L > 0\);

  (ii) \(V_i \in {\mathcal {C}}^\infty ({\mathbb {R}})\) for all \(i=1, \ldots , N\) is polynomially bounded from above and bounded from below, and there exist \(a,b > 0\) such that

    $$\begin{aligned} -\alpha \cdot V'(x) x \le a - bx^2; \end{aligned}$$

  (iii) \(V'\) is Lipschitz continuous, i.e., there exists a constant \(C > 0\) such that

    $$\begin{aligned} \left\| V'(x) - V'(y)\right\| _2 \le C\left|x - y\right|, \end{aligned}$$

    and the components \(V'_i\) are polynomially bounded for all \(i = 1, \ldots , N\);

  (iv) for all \(T > 0\), the symmetric matrix M(X) is positive definite and there exists \({\bar{\lambda }} > 0\) such that \(\lambda _{\min }(M(X)) \ge {\bar{\lambda }}\).

Remark 2.2

In the following, in particular in the proof of Lemma 3.3, we will employ Assumption 2.1(ii) for the whole drift of the SDE (1), i.e., the function

$$\begin{aligned} V^\varepsilon (x) \,\,{:=}\,\,\alpha \cdot V(x) + p\left( \frac{x}{\varepsilon }\right) . \end{aligned}$$

Since \(p \in C^\infty ({\mathbb {R}})\) and is periodic, all derivatives of p are in \(L^\infty ({\mathbb {R}})\). Therefore, the assumption above is sufficient for \(V^\varepsilon \) to satisfy Assumption 2.1(ii) with different values for a and b. In particular, assume Assumption 2.1(ii) holds for V. Then, we have for all \(\gamma > 0\) by Young’s inequality

$$\begin{aligned} \begin{aligned} -(V^\varepsilon )'(x)x&\le a - bx^2 - \frac{1}{\varepsilon }p'\left( \frac{x}{\varepsilon }\right) x \\&\le \left( a + \frac{1}{2\varepsilon ^2\gamma }\left\| p'\right\| _{L^\infty ({\mathbb {R}})}^2\right) - \left( b - \frac{\gamma }{2}\right) x^2. \end{aligned} \end{aligned}$$

Hence, Assumption 2.1(ii) holds for \(V^\varepsilon \) with a coefficient b which is arbitrarily close to the coefficient for V alone.
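For completeness, the second inequality above can be spelled out as follows. It relies on the standard form \(uv \le u^2/(2\gamma ) + \gamma v^2/2\) of Young’s inequality, applied here with the choice \(u = \varepsilon ^{-1}\left\| p'\right\| _{L^\infty ({\mathbb {R}})}\) and \(v = \left|x\right|\):

$$\begin{aligned} - \frac{1}{\varepsilon }p'\left( \frac{x}{\varepsilon }\right) x \le \frac{1}{\varepsilon }\left\| p'\right\| _{L^\infty ({\mathbb {R}})} \left|x\right| \le \frac{1}{2\varepsilon ^2\gamma }\left\| p'\right\| _{L^\infty ({\mathbb {R}})}^2 + \frac{\gamma }{2} x^2. \end{aligned}$$

Any \(\gamma < 2b\) keeps the coefficient \(b - \gamma /2\) positive, at the price of an additive constant which grows like \(\varepsilon ^{-2}\).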

Under these assumptions, the MLE given in (7) is indeed the unique minimizer of the functional \(I(X \mid A)\), and hence the unique maximizer of the likelihood function, as shown in [31, Theorem 2.4].
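In practice, the path is observed on a time grid and the integrals defining M and h are replaced by sums. The following is a minimal sketch of such a discretization, assuming a uniform grid of step `dt` and left-endpoint (Itô) sums; the function name `drift_mle`, the basis \(V(x) = (x^2/2, x^4/4)^\top \) and all numerical values are choices made for this example only.

```python
import numpy as np

def drift_mle(x, dt, dV):
    """Discretized MLE  A_hat = -M^{-1} h  from a path sampled every dt.

    x  : array of shape (n + 1,), the observed path
    dV : callable mapping an array of shape (n,) to an array of shape (N, n),
         i.e., the derivatives V_1', ..., V_N' evaluated along the path
    """
    T = dt * (len(x) - 1)
    G = dV(x[:-1])                  # left endpoints, as in an Ito sum
    M = (G @ G.T) * dt / T          # (1/T) int V'(X_t) (x) V'(X_t) dt
    h = (G @ np.diff(x)) / T        # (1/T) int V'(X_t) dX_t
    return -np.linalg.solve(M, h)

# Hypothetical single-scale test case with N = 2 and A = (1.0, 0.5).
dV = lambda x: np.stack([x, x**3])
A_true = np.array([1.0, 0.5])
Sigma, dt, n = 0.5, 1e-3, 100_000

rng = np.random.default_rng(0)
noise = np.sqrt(2.0 * Sigma * dt) * rng.standard_normal(n)
x = np.empty(n + 1)
x[0] = 0.0
for j in range(n):                  # Euler-Maruyama for dX = -A . V'(X) dt + sqrt(2 Sigma) dW
    x[j + 1] = x[j] - (A_true[0] * x[j] + A_true[1] * x[j] ** 3) * dt + noise[j]

print(drift_mle(x, dt, dV))         # a noisy estimate of A_true
```

A linear solve is used rather than an explicit inverse of M. On a noise-free forward Euler path, the increments equal \(-A \cdot V'(X_j)\,\Delta t\) exactly, so the discretized estimator returns A up to round-off, which gives a simple consistency check of the implementation.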

Let us consider the modified estimator of the drift coefficient obtained by replacing X with \(X^\varepsilon \,\,{:=}\,\,(X_t^\varepsilon , 0 \le t \le T)\), the solution of (1), i.e.,

$$\begin{aligned} {\widehat{A}}(X^\varepsilon , T) \,\,{:=}\,\,\arg \min _{A \in {\mathbb {R}}^N} I(X^\varepsilon \mid A) = - M^{-1}(X^\varepsilon )h(X^\varepsilon ), \end{aligned} \tag{8}$$

where \(I(X^\varepsilon \mid A)\), the matrix \(M(X^\varepsilon )\) and the vector \(h(X^\varepsilon )\) are obtained by replacing each occurrence of X with \(X^\varepsilon \). In the following, we assume that Assumption 2.1(iv) holds as well for the matrix \(M(X^\varepsilon )\), and simply write \(M \,\,{:=}\,\,M(X^\varepsilon )\) and \(h \,\,{:=}\,\,h(X^\varepsilon )\) when no ambiguity arises. Given the convergence of \(X^\varepsilon \rightarrow X\) in the space of continuous stochastic processes, one would expect the MLE (8) to be asymptotically unbiased for the drift coefficient A of the homogenized equation (3). Instead, it is possible to prove that in the asymptotic limit for \(T \rightarrow \infty \) and \(\varepsilon \rightarrow 0\), the MLE tends to the drift coefficient \(\alpha \) of the unhomogenized equation (1). We report here this result, whose proof can be found for the case \(N = 1\) in [29, Theorem 3.4]. We remark that the proof for \(N > 1\) follows directly from the one-dimensional case.

Theorem 2.3

Let Assumption 2.1 hold and let \(X^\varepsilon _0\) be distributed according to the invariant measure of the process \(X^\varepsilon \) solution of (1). Then,

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\lim _{T \rightarrow \infty } {\widehat{A}}(X^\varepsilon , T) = \alpha , \quad \text {a.s.}, \end{aligned}$$

where \(\alpha \) is the drift coefficient of Eq. (1).

As anticipated in the introduction, the main existing tool for obtaining unbiased estimators in the literature is subsampling the data. In particular, let the dimension of the parameter \(N = 1\), let \(\delta > 0\) and let \(T = n\delta \) with n a positive integer. Then, a subsampled estimator for A is given by

$$\begin{aligned} {\widehat{A}}_\delta (X^\varepsilon , T) = - \frac{\sum _{j=0}^{n-1} V'(X^\varepsilon _{j\delta })\left( X^\varepsilon _{(j+1)\delta } - X^\varepsilon _{j\delta }\right) }{\delta \sum _{j=0}^{n-1} V'(X^\varepsilon _{j\delta })^2}, \end{aligned}$$

which is a discretized version of \({\widehat{A}}(X^\varepsilon , T)\). It is possible to show [29, Theorem 3.5] that, choosing \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0, 1)\), the estimator \({\widehat{A}}_\delta (X^\varepsilon , T)\) is asymptotically unbiased for A in the limit for \(\varepsilon \rightarrow 0\), in probability. Despite being widely employed in practice, estimators based on subsampling present some drawbacks, such as having a high variance, as mentioned in the introduction. In the following, we will introduce and analyze a novel approach for the drift estimation.
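The subsampled estimator above translates directly into code. The sketch below assumes that the path is stored on a fine uniform grid of step `dt` and that \(\delta \) is an integer multiple `stride` of `dt`; the function name and the noise-free test path are illustrative choices, and the example does not reproduce the multiscale result of [29, Theorem 3.5].

```python
import numpy as np

def subsampled_drift_estimator(x, dt, stride, dV):
    """Subsampled estimator for N = 1, with subsampling step delta = stride * dt."""
    xs = x[::stride]                # X_0, X_delta, X_{2 delta}, ...
    delta = stride * dt
    g = dV(xs[:-1])                 # V'(X_{j delta}) for j = 0, ..., n - 1
    return -np.sum(g * np.diff(xs)) / (delta * np.sum(g**2))

# Noise-free sanity check with V(x) = x^2 / 2: for x(t) = exp(-A t), the
# estimator equals (1 - exp(-A * delta)) / delta, which tends to A as delta -> 0.
A, dt = 0.7, 1e-3
t = dt * np.arange(10_001)
x = np.exp(-A * t)
for stride in (1, 10, 100):
    print(stride, subsampled_drift_estimator(x, dt, stride, lambda y: y))
```

Only a fraction 1/`stride` of the observations enters the sums, which is the source of the increased variance mentioned above.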

Estimating the effective diffusion coefficient \(\varSigma \) of the homogenized SDE (3) is also a relevant problem. Indeed, knowing \(\varSigma \) besides the drift coefficient A gives a complete estimation of the model (3), which describes the multiscale data generated by (1) in the sense of homogenization theory. The standard approach for estimating the diffusion coefficient is to compute the quadratic variation of the path. In [29, Theorem 3.4], the authors show that this approach fails in case the data are not pre-processed, meaning that the quadratic variation of \(X^\varepsilon \) equals the diffusion coefficient \(\sigma \) of (1), even in the limit for \(\varepsilon \rightarrow 0\). They therefore propose the estimator \({\widehat{\varSigma }}_\delta \) based on subsampling, which tends to the effective diffusion coefficient \(\varSigma \) [29, Theorem 3.6]. Although this work focuses mainly on the effective drift coefficient, we propose in the following an unbiased estimator for the effective diffusion coefficient which fits our novel approach.

Remark 2.4

We note that our framework may be viewed in the same semi-parametric setting as that of [21]. In particular, the functions \(V_i\), \(i=1, \ldots , N\) can be seen as the known basis functions of an expansion (e.g., a Taylor expansion) for the unknown confining potential \(V_\alpha :{\mathbb {R}}\rightarrow {\mathbb {R}}\) given by

$$\begin{aligned} V_\alpha (x) = \sum _{i=1}^N \alpha _i V_i(x). \end{aligned}$$

A numerical example highlighting the potential of our method in such a setting is given in Sect. 5.3.

Remark 2.5

Let us remark that for enhancing the clarity of the exposition, in this article we chose to focus on the case of a multidimensional parameter in the setting of one-dimensional diffusion processes. In fact, all the theory we present in the following could be generalized to the case of the d-dimensional version of the SDE (1), which can be written as

$$\begin{aligned} \mathrm {d}X_t^\varepsilon = - \sum _{i=1}^N \alpha _i \nabla V_i(X_t^\varepsilon ) \,\mathrm {d}t - \frac{1}{\varepsilon }\nabla p\left( \frac{X_t^\varepsilon }{\varepsilon }\right) \,\mathrm {d}t + \sqrt{2\sigma } \,\mathrm {d}W_t, \end{aligned}$$

where \(W_t\) is a standard d-dimensional Brownian motion. Slight modifications of the proofs show that results analogous to ours may be obtained in the d-dimensional case.

3 The Filtered Data Approach

In this section, we introduce and analyze a novel approach based on filtered data to address the issue that the MLE, when confronted with multiscale data, is biased. Let \(\beta , \delta > 0\) and let us consider a family of exponential kernel functions \(k :{\mathbb {R}}^+ \rightarrow {\mathbb {R}}\) defined as

$$\begin{aligned} k(r) = C_\beta \delta ^{-1/\beta } e^{-r^\beta /\delta }, \end{aligned} \tag{9}$$

where \(C_{\beta }\) is the normalizing constant given by

$$\begin{aligned} C_\beta = \beta \, \varGamma (1/\beta )^{-1}, \end{aligned}$$

so that

$$\begin{aligned} \int _0^\infty k(r) \,\mathrm {d}r = 1, \end{aligned}$$

and where \(\varGamma (\cdot )\) is the gamma function. We consider the process \(Z^\varepsilon \,\,{:=}\,\,(Z^\varepsilon _t, 0 \le t \le T)\) defined by the weighted average

$$\begin{aligned} Z^{\varepsilon }_t \,\,{:=}\,\,\int _0^t k(t - s)X^\varepsilon _s \,\mathrm {d}s. \end{aligned}$$

The process \(Z^\varepsilon \) can be interpreted as a smoothed version of the original trajectory \(X^\varepsilon \). In fact, in the field of signal processing, kernel (9) belongs to the class of low-pass linear time-invariant filters, which cut the high frequencies in a signal to highlight its slowest components. In the following, rigorous analysis is conducted only when \(\beta = 1\). Nonetheless, numerical experiments show that for higher values of \(\beta \) the estimators computed employing the filter are more robust and perform qualitatively better.
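The kernel and the filtered path can be approximated on a uniform time grid as sketched below. The function names `kernel` and `filter_path`, the grid, and the Riemann-sum discretization of the integral are assumptions of this example. The first part checks numerically that the kernel integrates to one for a few values of \(\beta \).

```python
import math
import numpy as np

def kernel(r, beta, delta):
    """k(r) = C_beta * delta^(-1/beta) * exp(-r^beta / delta), C_beta = beta / Gamma(1/beta)."""
    C_beta = beta / math.gamma(1.0 / beta)
    return C_beta * delta ** (-1.0 / beta) * np.exp(-(r**beta) / delta)

def filter_path(x, dt, beta, delta):
    """Riemann-sum approximation of Z_t = int_0^t k(t - s) X_s ds on a uniform grid."""
    n = len(x)
    k = kernel(dt * np.arange(n), beta, delta)
    # Full discrete convolution, truncated to the first n entries so that
    # only the past of the path (s <= t) contributes to Z_t.
    return dt * np.convolve(k, x)[:n]

# Normalization check by the trapezoidal rule on [0, 20].
r = np.linspace(0.0, 20.0, 200_001)
for beta in (1.0, 2.0, 4.0):
    k = kernel(r, beta, delta=0.5)
    mass = (r[1] - r[0]) * (np.sum(k) - 0.5 * (k[0] + k[-1]))
    print(beta, mass)
```

The direct convolution has quadratic cost in the number of samples; for long paths an FFT-based convolution would be the natural replacement.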

Remark 3.1

Given a trajectory \(X^\varepsilon \), computing \(Z^\varepsilon \) is relatively inexpensive. In particular, the process \(Z^\varepsilon \) is the truncated convolution of the kernel with the process \(X^\varepsilon \). Hence, computational tools based on the fast Fourier transform (FFT) can be used to compute \(Z^\varepsilon \) efficiently, component by component. Moreover, in case \(\beta = 1\), the process \(Z^\varepsilon \) can be computed recursively and therefore “online.”
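For \(\beta = 1\), the recursive computation can be sketched as follows, under the assumption that the path is held constant between consecutive samples. In that case the exponential kernel gives the exact update \(Z_{n+1} = q Z_n + (1 - q) X_n\) with \(q = e^{-\Delta t/\delta }\); the function name `filter_online` and the symbol \(\Delta t\) for the sampling step are introduced for this example.

```python
import math
import numpy as np

def filter_online(x, dt, delta):
    """Recursive evaluation of the beta = 1 filter, one sample at a time."""
    q = math.exp(-dt / delta)
    z = np.empty(len(x))
    z[0] = 0.0                      # Z_0 = 0 by definition of the integral
    for n in range(len(x) - 1):
        # Exact update when X equals x[n] on [t_n, t_n + dt).
        z[n + 1] = q * z[n] + (1.0 - q) * x[n]
    return z

# Constant input X = 1: the filtered path is Z_t = 1 - exp(-t / delta).
dt, delta = 1e-2, 0.5
t = dt * np.arange(501)
z = filter_online(np.ones(501), dt, delta)
print(np.max(np.abs(z - (1.0 - np.exp(-t / delta)))))
```

Each update needs only the previous value of the filter and the newest observation, so memory usage does not grow with the length of the path.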

Given a trajectory \(X^\varepsilon \) and the filtered data \(Z^\varepsilon \), the estimator of the drift coefficient we propose is given by

$$\begin{aligned} {\widehat{A}}_k(X^\varepsilon , T) = - {\widetilde{M}}^{-1}(X^\varepsilon ) \widetilde{h}(X^\varepsilon ), \end{aligned}$$

where we employ the subscript k for reference to the filter’s kernel in (9), and where

$$\begin{aligned} {\widetilde{M}}(X^\varepsilon ) = \frac{1}{T} \int _0^T V'(Z^\varepsilon _t) \otimes V'(X^\varepsilon _t) \,\mathrm {d}t, \qquad \text {and} \qquad {\widetilde{h}}(X^\varepsilon ) = \frac{1}{T} \int _0^T V'(Z^\varepsilon _t) \,\mathrm {d}X^\varepsilon _t. \end{aligned}$$

For economy of notation, we drop explicit reference to the dependence of \({\widetilde{M}}\) and \({\widetilde{h}}\) on \(X^\varepsilon \). Let us remark that the formula above is obtained from (8) by replacing only one instance of \(X_t^\varepsilon \) with \(Z_t^\varepsilon \) in both M and h. In particular, it is fundamental for proving unbiasedness to keep in the definition of h the differential of the original process \(\mathrm {d}X^\varepsilon _t\) (see Remark 3.7). Let us furthermore remark that \({\widehat{A}}_k(X^\varepsilon , T)\) need not be the minimizer of some likelihood function based on filtered data. In fact, if one were to replace \(Z_t^\varepsilon \) directly in (6), the symmetric part of the matrix \({\widetilde{M}}\) would appear and \(\widehat{A}_k(X^\varepsilon , T)\) would not be the minimizer. Therefore, the estimator \({\widehat{A}}_k(X^\varepsilon , T)\) has to be thought of as a perturbation of \({\widehat{A}}(X^\varepsilon , T)\), directly at the level of estimators and after the maximization procedure. The only theoretical guarantee which is still needed for the well-posedness of \(\widehat{A}_k(X^\varepsilon , T)\) is for \({\widetilde{M}}\) to be invertible, which we assume to be true and which we observed to hold in practice.
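To make the construction concrete, the following sketch simulates the multiscale SDE (1) by the Euler–Maruyama method and evaluates both the plain estimator \({\widehat{A}}(X^\varepsilon , T)\) and the filtered estimator \({\widehat{A}}_k(X^\varepsilon , T)\) with \(\beta = 1\). The parameter values follow the caption of Fig. 1, while the time step, the final time, the random seed and the use of the recursive filter update are assumptions made for this example, which should be read as an illustration rather than as the numerical setup of the paper.

```python
import math
import numpy as np

alpha, sigma, eps, delta = 1.0, 0.5, 0.1, 1.0
dt, n = 1e-3, 300_000               # time step and number of steps (assumed)

rng = np.random.default_rng(1)
noise = (math.sqrt(2.0 * sigma * dt) * rng.standard_normal(n)).tolist()
q = math.exp(-dt / delta)

x, z = 0.0, 0.0
M = h = M_f = h_f = 0.0             # running sums for M, h and their filtered versions
for j in range(n):
    # V(x) = x^2 / 2 and p(y) = cos(y), hence p'(y) = -sin(y).
    dx = (-alpha * x + math.sin(x / eps) / eps) * dt + noise[j]
    M += x * x * dt                 # V'(X) V'(X) dt
    h += x * dx                     # V'(X) dX
    M_f += z * x * dt               # V'(Z) V'(X) dt
    h_f += z * dx                   # V'(Z) dX, with the increment of the original path
    x, z = x + dx, q * z + (1.0 - q) * x   # the filter update uses the old x

A_plain = -h / M                    # the common factor 1/T cancels in the ratio
A_filtered = -h_f / M_f
print(A_plain, A_filtered)
```

For these values, the homogenized drift coefficient is \(A = K\alpha \) with \(K = I_0(1/\sigma )^{-2}\), where \(I_0\) is the modified Bessel function of the first kind, so that the two printed numbers can be compared with \(\alpha \) and A, respectively.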

Fig. 1 Filtering a trajectory \(X^\varepsilon \) obtained with \(V(x) = x^2 / 2\), \(p(y) = \cos (y)\), \(\alpha = 1\), \(\sigma = 0.5\) and \(\varepsilon = 0.1\). The filtering width is \(\delta \in \{1, \sqrt{\varepsilon }, \varepsilon \}\) from top to bottom, respectively, and \(\beta = 1\)

We now consider the diffusion coefficient and propose the estimator for \(\varSigma \) in (3) given by

$$\begin{aligned} {\widehat{\varSigma }}_k(X^\varepsilon ,T) \,\,{:=}\,\,\frac{1}{\delta T} \int _0^T \left( X_t^\varepsilon - Z_t^\varepsilon \right) ^2 \,\mathrm {d}t, \end{aligned}$$

where again we employ the subscript k for reference to kernel (9) of the filter. As we will show in the following, and in particular in Theorem 3.20, the estimator \({\widehat{\varSigma }}_k\) is unbiased for the effective diffusion coefficient \(\varSigma \) in case \(\beta = 1\) and when we filter data in the multiscale regime, i.e., when \(\delta \) is a vanishing function of \(\varepsilon \).
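A heuristic for the normalization \(1/\delta \) can be obtained in a single-scale toy case, which is an assumption of this sketch and not the multiscale setting of Theorem 3.20. If \(X_t = \sqrt{2\varSigma }\, W_t\) and \(\beta = 1\), the difference \(X_t - Z_t\) is an Ornstein–Uhlenbeck process with stationary variance \(\varSigma \delta \), so that the time average of \((X_t - Z_t)^2/\delta \) approaches \(\varSigma \). The function name and all numerical values below are illustrative.

```python
import math
import numpy as np

def diffusion_estimator(x, dt, delta):
    """Sigma_hat_k = 1 / (delta * T) * int_0^T (X_t - Z_t)^2 dt with the beta = 1 filter."""
    q = math.exp(-dt / delta)
    z, acc = 0.0, 0.0
    for xj in x[:-1].tolist():
        acc += (xj - z) ** 2 * dt
        z = q * z + (1.0 - q) * xj  # recursive update of the filtered path
    T = dt * (len(x) - 1)
    return acc / (delta * T)

# Scaled Brownian path X_t = sqrt(2 * Sigma) * W_t on a uniform grid.
Sigma, dt, n, delta = 0.5, 1e-3, 200_000, 0.1
rng = np.random.default_rng(2)
increments = math.sqrt(2.0 * Sigma * dt) * rng.standard_normal(n)
x = np.concatenate([[0.0], np.cumsum(increments)])
Sigma_hat = diffusion_estimator(x, dt, delta)
print(Sigma_hat)                    # expected to be close to Sigma
```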

Let us from now on consider \(\beta = 1\). For this value of \(\beta \), the parameter \(\delta \) appearing in (9) regulates the width of the filtering window. In practice, larger values of \(\delta \) will lead to trajectories which are smoother and for which fast-scale oscillations are practically canceled. Let us remark that the filtering width resembles the subsampling step employed for the estimator \({\widehat{A}}_\delta (X^\varepsilon , T)\) introduced and analyzed in [29]. For subsampling, the choice guaranteeing asymptotically unbiased results is \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0, 1)\), and a similar analysis is due for our technique. For visualization purposes, we depict in Fig. 1 the filtered trajectory \(Z^\varepsilon \) for three different values of \(\delta \), namely \(\delta \in \{1, \sqrt{\varepsilon }, \varepsilon \}\). With \(\delta = 1\), all oscillations at the fast scale are canceled and the filtered trajectory \(Z^\varepsilon \) presents only slow-scale variations. As the value of \(\delta \) is reduced, fast-scale oscillations are progressively taken into account.

In the following, we first focus on the ergodic properties of the process \(Z^\varepsilon \) when it is coupled with the process \(X^\varepsilon \). This analysis is practically independent of the choice of \(\delta \) and is therefore presented on its own. Then, we focus on two different cases which depend on the choice of the width \(\delta \) of the filter. First, in Sect. 3.2, we consider \(\delta \) to be independent of \(\varepsilon \), and therefore, we filter at the speed of the homogenized process. In this case, we are able to prove that our estimator of the drift coefficient of the homogenized equation is asymptotically unbiased almost surely. This result will be presented in Theorem 3.12. We then move on in Sect. 3.3 to the case \(\delta \propto \varepsilon ^\zeta \), which corresponds to filtering the data at the speed of the multiscale process. In this case, we show that under some conditions on the exponent \(\zeta \), we can still obtain estimators which are asymptotically unbiased in probability. This result is proved in Theorem 3.18. For this second case, we widely employ techniques and estimates which come from [29].

3.1 Ergodic Properties

Let us consider the filtering kernel (9) with \(\beta = 1\), i.e.,

$$\begin{aligned} k(r) = \frac{1}{\delta } e^{-r/\delta }. \end{aligned}$$

In this case, the Leibniz integral rule yields the equality

$$\begin{aligned} \mathrm {d}Z^\varepsilon _t = k(0) X^\varepsilon _t \,\mathrm {d}t + \int _0^t k'(t-s) X^\varepsilon _s \,\mathrm {d}s \,\mathrm {d}t = \frac{1}{\delta } \left( X^\varepsilon _t - Z^\varepsilon _t \right) \,\mathrm {d}t, \end{aligned}$$

which can be interpreted as an ordinary differential equation for \(Z_t^\varepsilon \) driven by the stochastic signal \(X^\varepsilon \). Considering the processes \(X^\varepsilon \) and \(Z^\varepsilon \) together, we obtain the system of two one-dimensional SDEs

$$\begin{aligned} \begin{aligned} \mathrm {d}X_t^\varepsilon&= -\alpha \cdot V'(X_t^\varepsilon ) \,\mathrm {d}t - \frac{1}{\varepsilon }p'\left( \frac{X_t^\varepsilon }{\varepsilon }\right) \,\mathrm {d}t+ \sqrt{2\sigma } \,\mathrm {d}W_t, \\ \mathrm {d}Z^\varepsilon _t&= \frac{1}{\delta } \left( X^\varepsilon _t - Z^\varepsilon _t \right) \,\mathrm {d}t. \end{aligned} \end{aligned} \tag{13}$$
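The coupled system lends itself to a joint time discretization. The sketch below applies the Euler–Maruyama method to the first equation and the forward Euler method to the second one; the function name `simulate_coupled`, its signature and the parameter values are assumptions made for illustration.

```python
import math
import numpy as np

def simulate_coupled(drift, sigma, delta, x0, dt, n, rng):
    """Discretize dX = drift(X) dt + sqrt(2 sigma) dW and dZ = (X - Z) / delta dt."""
    x = np.empty(n + 1)
    z = np.empty(n + 1)
    x[0], z[0] = x0, 0.0            # Z_0 = 0, consistent with the integral definition
    noise = math.sqrt(2.0 * sigma * dt) * rng.standard_normal(n)
    for j in range(n):
        x[j + 1] = x[j] + drift(x[j]) * dt + noise[j]
        z[j + 1] = z[j] + (x[j] - z[j]) * dt / delta   # no noise acts on Z directly
    return x, z

# Example with V(x) = x^2 / 2 and p(y) = cos(y), as in the caption of Fig. 1.
alpha, sigma, eps, delta = 1.0, 0.5, 0.1, 1.0
drift = lambda x: -alpha * x + math.sin(x / eps) / eps
x, z = simulate_coupled(drift, sigma, delta, x0=0.0, dt=1e-3, n=20_000,
                        rng=np.random.default_rng(3))
print(x[-1], z[-1])
```

Since the Brownian motion enters the second component only through \(X^\varepsilon \), the noise of the joint process is degenerate, which is why hypo-ellipticity is invoked in the analysis that follows.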

The first ingredient for verifying the ergodic properties of the two-dimensional process \((X^\varepsilon , Z^\varepsilon )^\top \,\,{:=}\,\,((X^\varepsilon _t, Z^\varepsilon _t)^\top , 0 \le t \le T)\) is verifying that the measure induced by the stochastic process admits a smooth density with respect to the Lebesgue measure. Since noise is present only on the first component, this is a consequence of the theory of hypo-ellipticity, as summarized in the following Lemma, whose proof is given in “Appendix A.”

Lemma 3.2

Let \((X^\varepsilon , Z^\varepsilon )^\top \) be the solution of (13) and let \(\mu ^\varepsilon _t\) be the measure induced by the joint process at time t. Then, the measure \(\mu ^\varepsilon _t\) admits a smooth density \(\rho ^\varepsilon _t\) with respect to the Lebesgue measure.

Once it is established that the law of the process admits a smooth density for all times \(t>0\), which satisfies a time-dependent Fokker–Planck equation, we are interested in the limiting properties of this law. In particular, we know that the process \(X^\varepsilon \) alone is geometrically ergodic [23, Theorem 4.4], and we wish the couple \((X^\varepsilon , Z^\varepsilon )^\top \) to inherit the same property. The following Lemma guarantees that the couple is indeed geometrically ergodic, and its proof is given in “Appendix A.”

Lemma 3.3

Let Assumption 2.1 hold and let \(b > 0\) be given in Assumption 2.1(ii). Then, if \(\delta > 1/(4b)\), the process \((X^\varepsilon , Z^\varepsilon )^\top \) solution of (13) is geometrically ergodic, i.e., there exist \(C, \lambda > 0\) such that for all measurable \(f:{\mathbb {R}}^2\rightarrow {\mathbb {R}}\) satisfying, for some integer \(q > 0\),

$$\begin{aligned} f(x, z) \le 1 + \left\| \begin{pmatrix} x&z \end{pmatrix}^\top \right\| _2^q, \end{aligned}$$

it holds

$$\begin{aligned} \left|{{\mathbb {E}}}f(X^{\varepsilon }_{t}, Z^{\varepsilon }_{t}) - \int _{\mathbb {R}}\int _{\mathbb {R}}f(x, z) \rho ^{\varepsilon }(x, z) \,\mathrm {d}x \,\mathrm {d}z\right| \le C\left( 1 + \left\| \begin{pmatrix} X^{\varepsilon }_{0}&Z^{\varepsilon }_{0} \end{pmatrix}^{\top }\right\| _{2}^{q} \right) e^{-\lambda t}, \end{aligned}$$

for \(\rho ^\varepsilon \)-a.e. couple \((X_0^\varepsilon , Z_0^\varepsilon )^\top \), where \({{\mathbb {E}}}\) denotes expectation with respect to the Wiener measure, and \(\rho ^\varepsilon \) is the solution to the stationary Fokker–Planck equation

$$\begin{aligned} \sigma \partial ^2_{xx} \rho ^\varepsilon (x,z) + \partial _x \left( \left( \alpha \cdot V'(x) + \frac{1}{\varepsilon } p' \left( \frac{x}{\varepsilon }\right) \right) \rho ^\varepsilon (x,z) \right) + \frac{1}{\delta } \partial _z \left( (z - x) \rho ^\varepsilon (x,z)\right) = 0. \end{aligned}$$
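For readers who wish to experiment with the pair \((X^\varepsilon , Z^\varepsilon )^\top \), the following is a minimal simulation sketch in the scalar case \(N = 1\). It assumes the dynamics that can be read off from the stationary Fokker–Planck equation above, namely a drift \(-\alpha V'(x) - \varepsilon ^{-1} p'(x/\varepsilon )\) and diffusion \(\sqrt{2\sigma }\) for \(X^\varepsilon \), and \(\mathrm {d}Z^\varepsilon _t = \delta ^{-1}(X^\varepsilon _t - Z^\varepsilon _t) \,\mathrm {d}t\) for the filter. The function name `simulate_filtered_pair`, the Euler–Maruyama discretization and the default choices \(V(x) = x^2/2\), \(p(y) = \cos (y)\) are illustrative assumptions and not part of the analysis.

```python
import math
import random


def simulate_filtered_pair(alpha, sigma, eps, delta, T, dt,
                           dV=lambda x: x,
                           dp=lambda y: -math.sin(y),
                           x0=0.0, z0=0.0, seed=0):
    """Euler-Maruyama sketch for the pair (X^eps, Z^eps), scalar case N = 1.

    Assumed dynamics (read off from the stationary Fokker-Planck equation):
        dX = -(alpha * V'(X) + p'(X / eps) / eps) dt + sqrt(2 sigma) dW,
        dZ = (X - Z) / delta dt.
    The defaults V(x) = x^2 / 2 and p(y) = cos(y) are illustrative choices.
    The step dt should be much smaller than eps^2 for the scheme to be stable.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    noise_scale = math.sqrt(2.0 * sigma * dt)
    x, z = x0, z0
    xs, zs = [x0], [z0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, 1.0)
        x_new = x - (alpha * dV(x) + dp(x / eps) / eps) * dt + noise_scale * dw
        z = z + (x - z) / delta * dt   # the filter is driven by the current X
        x = x_new
        xs.append(x)
        zs.append(z)
    return xs, zs
```

Noise enters only through the first component, which is the degenerate structure that motivates the hypo-ellipticity argument of Lemma 3.2.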

Remark 3.4

The condition \(\delta > 1 / (4b)\) is not very restrictive. Let the parameter dimension \(N = 1\) and let \(V(x) \propto x^{2r}\) for an integer \(r > 1\). Then, Assumption 2.1(ii) holds for an arbitrarily large \(b > 0\). Therefore, the parameter of the filter \(\delta \) can be chosen along the entire positive real axis. A similar argument can be employed for higher dimensions \(N > 1\).

In general, it is not possible to find an explicit solution to (14). Nevertheless, some relevant properties of the solution can be established; they are summarized in the following Lemma, whose proof is given in “Appendix A.”

Lemma 3.5

Under the assumptions of Lemma 3.3, let \(\rho ^\varepsilon \) be the solution of (14) and let us write

$$\begin{aligned} \rho ^\varepsilon (x, z) = \varphi ^\varepsilon (x)\psi ^\varepsilon (z)R^\varepsilon (x,z), \end{aligned}$$

where \(\varphi ^\varepsilon \) and \(\psi ^\varepsilon \) are the marginal densities of \(X^\varepsilon \) and \(Z^\varepsilon \), respectively, i.e.,

$$\begin{aligned} \varphi ^\varepsilon (x) = \int _{{\mathbb {R}}} \rho ^\varepsilon (x,z) \,\mathrm {d}z, \quad \psi ^\varepsilon (z) = \int _{{\mathbb {R}}} \rho ^\varepsilon (x,z) \,\mathrm {d}x. \end{aligned}$$

Then, it holds

$$\begin{aligned} \varphi ^\varepsilon (x) = \frac{1}{C_{\varphi ^\varepsilon }} \exp \left( -\frac{1}{\sigma } \alpha \cdot V(x) - \frac{1}{\sigma } p \left( \frac{x}{\varepsilon }\right) \right) , \end{aligned}$$

where

$$\begin{aligned} C_{\varphi ^\varepsilon } = \int _{{\mathbb {R}}} \exp \left( -\frac{1}{\sigma } \alpha \cdot V(x) - \frac{1}{\sigma } p \left( \frac{x}{\varepsilon }\right) \right) \,\mathrm {d}x. \end{aligned}$$

Moreover, it holds

$$\begin{aligned} \sigma \delta \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \varphi ^\varepsilon (x) \psi ^\varepsilon (z) \partial _x R^\varepsilon (x,z) \,\mathrm {d}x \,\mathrm {d}z = {{\mathbb {E}}}^{\rho ^\varepsilon }[(X^\varepsilon - Z^\varepsilon )^2 V''(Z^\varepsilon )]. \end{aligned}$$
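The explicit marginal density \(\varphi ^\varepsilon \) of Lemma 3.5 can be explored numerically. The sketch below computes its second moment by quadrature for the illustrative choices \(N = 1\), \(V(x) = x^2/2\) and \(p(y) = \cos (y)\), which are assumptions made only for this example. For small \(\varepsilon \) the fast oscillations average out and the second moment approaches \(\sigma /\alpha \), the variance of the Gaussian density proportional to \(\exp (-\alpha V(x)/\sigma )\); under the usual homogenization relations between \((\alpha , \sigma )\) and \((A, \varSigma )\), which are stated outside this section and assumed here, this is the same as \(\varSigma /A\).

```python
import math


def second_moment_multiscale(alpha, sigma, eps, half_width=10.0, n=40_001):
    """Second moment of the density proportional to
    exp(-(alpha * V(x) + p(x / eps)) / sigma), with the illustrative choices
    V(x) = x^2 / 2 and p(y) = cos(y), computed with the trapezoidal rule
    on the interval [-half_width, half_width]."""
    h = 2.0 * half_width / (n - 1)
    mass = 0.0
    moment = 0.0
    for i in range(n):
        x = -half_width + i * h
        w = 0.5 if i in (0, n - 1) else 1.0   # trapezoidal end-point weights
        dens = math.exp(-(alpha * 0.5 * x * x + math.cos(x / eps)) / sigma)
        mass += w * dens
        moment += w * x * x * dens
    return moment / mass                       # the grid spacing h cancels
```

Calling `second_moment_multiscale(1.0, 1.0, 0.05)` and `second_moment_multiscale(1.0, 1.0, 1.0)` contrasts a well-separated regime with one in which the oscillation is of the same scale as the confining potential.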

Remark 3.6

Lemma 3.5, and in particular equality (17), plays a fundamental role in the proof of unbiasedness of the estimator based on filtered data. In particular, this equality allows us to bypass explicit knowledge of the function \(R^\varepsilon (x,z)\), which governs the correlation between the processes \(X^\varepsilon \) and \(Z^\varepsilon \) at stationarity, and for which a closed-form expression is not available in the general case.

Remark 3.7

Let us return to the definition of \({\widehat{A}}_k\) and replace the differential \(\mathrm {d}X^\varepsilon _t\) with \(\mathrm {d}Z^\varepsilon _t\) in \({\widetilde{h}}\). In this case, it holds

$$\begin{aligned} \lim _{T \rightarrow \infty } \frac{1}{T} \int _0^T V'(Z_t^\varepsilon ) \,\mathrm {d}Z_t^\varepsilon= & {} \lim _{T \rightarrow \infty } \frac{1}{\delta T} \int _0^T V'(Z_t^\varepsilon ) (X_t^\varepsilon - Z_t^\varepsilon ) \,\mathrm {d}t\\= & {} \frac{1}{\delta }{{\mathbb {E}}}^{\rho ^\varepsilon } \left[ V'(Z^\varepsilon ) (X^\varepsilon - Z^\varepsilon ) \right] = 0, \end{aligned}$$

where the last equality is obtained as in the proof of Lemma 3.5, with the choice \(f(x,z) = V(z)\) at the last line. Therefore, we stress again that it is indeed necessary to employ the original differential \(\mathrm {d}X^\varepsilon _t\) in the vector \({\widetilde{h}}\) in the definition (10) of \(\widehat{A}_k^\varepsilon \).
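A pathwise way to see why the differential \(\mathrm {d}Z^\varepsilon _t\) carries no information is that \(Z^\varepsilon \) is continuously differentiable, so that \(V'(Z^\varepsilon _t) \,\mathrm {d}Z^\varepsilon _t = \mathrm {d}V(Z^\varepsilon _t)\) and the time average telescopes to \((V(Z^\varepsilon _T) - V(Z^\varepsilon _0))/T\). The sketch below illustrates this on a deterministic stand-in for the data; the input path, the helper names and the choice \(V(z) = z^4/4\) are hypothetical and serve only to make the telescoping visible.

```python
import math


def filtered_path(xs, dt, delta, z0=0.0):
    """Explicit Euler for dZ = (X - Z) / delta dt along a given path xs."""
    zs = [z0]
    for x in xs[:-1]:
        zs.append(zs[-1] + (x - zs[-1]) / delta * dt)
    return zs


def time_average_against_dZ(zs, T, dV):
    """Left-point Riemann sum for (1/T) * int_0^T V'(Z_t) dZ_t."""
    total = 0.0
    for n in range(len(zs) - 1):
        total += dV(zs[n]) * (zs[n + 1] - zs[n])
    return total / T


# Hypothetical bounded input path standing in for the observed data.
dt, T, delta = 0.001, 100.0, 0.5
n_points = int(round(T / dt)) + 1
xs = [math.sin(n * dt) + 0.5 * math.sin(math.sqrt(2.0) * n * dt)
      for n in range(n_points)]
zs = filtered_path(xs, dt, delta)

average = time_average_against_dZ(zs, T, lambda z: z ** 3)   # V(z) = z^4 / 4
boundary_term = (zs[-1] ** 4 - zs[0] ** 4) / (4.0 * T)       # (V(Z_T) - V(Z_0)) / T
```

Both `average` and `boundary_term` are of order \(1/T\) because \(Z\) stays bounded, whatever the drift coefficient of the underlying dynamics may be.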

Remark 3.8

Let us consider kernel (9) with \(\beta > 1\). In this case, the steps leading to system (13) do not yield a system of Itô SDEs, but of stochastic delay differential equations. The analysis of the estimator in the case \(\beta > 1\) is therefore based on arguments different from those presented in this work.

3.2 Filtered Data in the Homogenized Regime

In this section, we analyze the behavior of the estimator \(\widehat{A}_k(X^\varepsilon , T)\) based on filtered data given in (10) when the filtering width \(\delta \) is independent of \(\varepsilon \). The analysis in this case is based on the convergence of the couple \((X^\varepsilon , Z^\varepsilon )^\top \) with respect to the multiscale parameter \(\varepsilon \rightarrow 0\). In particular, it is known that the invariant measure of \(X^\varepsilon \) converges weakly to the invariant measure of X, the solution of the homogenized equation (3). The following result guarantees the same kind of convergence for the couple \((X^\varepsilon , Z^\varepsilon )^\top \).

Lemma 3.9

Under Assumption 2.1, let \(\mu ^\varepsilon \) be the invariant measure of the couple \((X^\varepsilon , Z^\varepsilon )^\top \). If \(\delta \) is independent of \(\varepsilon \), then the measure \(\mu ^\varepsilon \) converges weakly to the measure \(\mu ^0(\mathrm {d}x, \mathrm {d}z) = \rho ^0(x, z) \,\mathrm {d}x \,\mathrm {d}z\), whose density \(\rho ^0\) is the unique solution of the Fokker–Planck equation

$$\begin{aligned} \varSigma \partial ^2_{xx} \rho ^0(x,z) + \partial _x\left( A \cdot V'(x) \rho ^0(x,z) \right) + \frac{1}{\delta }\partial _z\left( (z - x) \rho ^0(x,z) \right) = 0, \end{aligned}$$

where A and \(\varSigma \) are the coefficients of the homogenized equation (3).


Proof

Let \((X, Z)^\top \,\,{:=}\,\,\left( (X_t, Z_t)^\top , 0\le t \le T\right) \) be the solution of

$$\begin{aligned} \begin{aligned} \mathrm {d}X_t&= -A \cdot V'(X_t) \,\mathrm {d}t + \sqrt{2\varSigma } \,\mathrm {d}W_t, \\ \mathrm {d}Z_t&= \frac{1}{\delta } \left( X_t - Z_t \right) \,\mathrm {d}t, \end{aligned} \end{aligned}$$

with \((X_0, Z_0)^\top \sim \mu ^0\). The arguments of Sect. 3.1 can be repeated to conclude that the invariant measure of \((X, Z)^\top \) admits a smooth density \(\rho ^0\) which satisfies (18). Moreover, standard homogenization theory (see, e.g., [9, Chapter 3, Theorem 6.4] or [30, Theorem 18.1]) guarantees that \((X^\varepsilon ,Z^\varepsilon )^\top \rightarrow (X,Z)^\top \) for \(\varepsilon \rightarrow 0\) in law as random variables with values in \({\mathcal {C}}^0([0, T]; {\mathbb {R}}^2)\), provided that \((X_0^\varepsilon , Z_0^\varepsilon )^\top \sim \mu ^\varepsilon \). We remark that traditionally it is assumed that the initial conditions satisfy \((X_0^\varepsilon , Z_0^\varepsilon )^\top = (X_0, Z_0)^\top \) for the homogenization result to hold, but the proof of, e.g., [30, Theorem 18.1] carries over with a minor modification when both the multiscale and the homogenized processes are at stationarity. Denoting \(E = {\mathcal {C}}^0([0,T]; {\mathbb {R}}^2)\), this means that the measure induced by \((X^\varepsilon , Z^\varepsilon )^\top \) on \((E, {\mathcal {B}}(E))\) converges weakly to the measure induced by \((X, Z)^\top \) on the same measurable space (see, e.g., [30, Definition 3.24]). Since evaluation at a fixed time is a continuous map from \(E\) to \({\mathbb {R}}^2\) and both processes are stationary, the time marginals, which coincide with the invariant measures, converge as well. Hence, the measure \(\mu ^\varepsilon \) converges weakly to \(\mu ^0\) for \(\varepsilon \rightarrow 0\). \(\square \)

Example 3.10

A closed-form solution of (18) can be obtained in a simple case. Let the dimension of the parameter \(N=1\) and let \(V(x) = x^2/2\). Then, the analytical solution is given by

$$\begin{aligned} \rho ^0(x,z) = \frac{1}{C_{\rho ^0}} \exp \left( -\frac{A}{\varSigma } \frac{x^2}{2} - \frac{1}{\delta \varSigma } \frac{(x - (1+A \delta )z)^2}{2}\right) , \end{aligned}$$

where

$$\begin{aligned} C_{\rho ^0} = \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} \exp \left( -\frac{A}{\varSigma } \frac{x^2}{2} - \frac{1}{\delta \varSigma } \frac{(x - (1+A \delta )z)^2}{2}\right) \,\mathrm {d}x \,\mathrm {d}z = \frac{2\pi \varSigma \sqrt{\delta }}{(1+A\delta )\sqrt{A}}. \end{aligned}$$

This is the density of a multivariate normal distribution \(\mathcal N(0, \varGamma )\), where the covariance matrix is given by

$$\begin{aligned} \varGamma = \frac{\varSigma }{A (1 + A\delta )} \begin{pmatrix} 1+A\delta &{}\quad 1 \\ 1 &{}\quad 1 \end{pmatrix}. \end{aligned}$$

Let us remark that this distribution can be obtained from direct computations involving Gaussian processes. In particular, we have that X is in this case an Ornstein–Uhlenbeck process and it is therefore known that \(X \sim \mathcal {GP}(m_t, {\mathcal {C}}(t, s))\), where at stationarity \(m_t = 0\) and

$$\begin{aligned} {\mathcal {C}}(t, s) = \frac{\varSigma }{A} e^{-A|t-s|}. \end{aligned}$$

The basic properties of Gaussian processes imply that Z is a Gaussian process and that the couple \((X, Z)^\top \) is a Gaussian process, too, whose mean and covariance are computable explicitly.
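The covariance matrix \(\varGamma \) of this example can also be cross-checked without Gaussian process arguments. For a linear system with drift matrix \(F\) and noise covariance rate \(Q\), the stationary covariance solves the Lyapunov equation \(F\varGamma + \varGamma F^\top + Q = 0\), a standard fact that we use here as an independent check. The sketch below evaluates the residual of this equation for the closed-form \(\varGamma \) and compares \(2\pi \sqrt{\det \varGamma }\) with the normalization constant \(C_{\rho ^0}\); the function names are hypothetical.

```python
import math


def stationary_covariance(A, Sigma, delta):
    """Closed-form covariance Gamma of (X, Z) from Example 3.10."""
    c = Sigma / (A * (1.0 + A * delta))
    return [[c * (1.0 + A * delta), c], [c, c]]


def lyapunov_residual(A, Sigma, delta):
    """Entries of F G + G F^T + Q for the linear system
    dX = -A X dt + sqrt(2 Sigma) dW,  dZ = (X - Z) / delta dt."""
    G = stationary_covariance(A, Sigma, delta)
    F = [[-A, 0.0], [1.0 / delta, -1.0 / delta]]
    Q = [[2.0 * Sigma, 0.0], [0.0, 0.0]]
    res = [[0.0, 0.0], [0.0, 0.0]]
    for i in range(2):
        for j in range(2):
            fg = sum(F[i][k] * G[k][j] for k in range(2))   # (F G)_{ij}
            gf = sum(G[i][k] * F[j][k] for k in range(2))   # (G F^T)_{ij}
            res[i][j] = fg + gf + Q[i][j]
    return res


def normalization_constant(A, Sigma, delta):
    """2 pi sqrt(det Gamma), to be compared with C_{rho^0}."""
    G = stationary_covariance(A, Sigma, delta)
    det = G[0][0] * G[1][1] - G[0][1] * G[1][0]
    return 2.0 * math.pi * math.sqrt(det)
```

A vanishing residual for arbitrary positive \(A\), \(\varSigma \) and \(\delta \) confirms that the stated \(\varGamma \) is the stationary covariance of the pair \((X, Z)^\top \).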

We now present an analogous result to Lemma 3.5 for the limit distribution.

Corollary 3.11

Let \(\rho ^0\) be the solution of (18) and let us write

$$\begin{aligned} \rho ^0(x, z) = \varphi ^0(x)\psi ^0(z)R^0(x,z), \end{aligned}$$

where \(\varphi ^0\) and \(\psi ^0\) are the marginal densities, i.e.,

$$\begin{aligned} \varphi ^0(x) = \int _{{\mathbb {R}}} \rho ^0(x,z) \,\mathrm {d}z, \quad \psi ^0(z) = \int _{{\mathbb {R}}} \rho ^0(x,z) \,\mathrm {d}x. \end{aligned}$$

Then, if A and \(\varSigma \) are the coefficients of the homogenized equation (3), it holds

$$\begin{aligned} \varphi ^0(x) = \frac{1}{C_{\varphi ^0}} \exp \left( - \frac{1}{\varSigma } A\cdot V(x)\right) , \quad \text {where } \quad C_{\varphi ^0} = \int _{{\mathbb {R}}} \exp \left( - \frac{1}{\varSigma } A\cdot V(x) \right) \,\mathrm {d}x. \end{aligned}$$

Moreover, it holds

$$\begin{aligned} \varSigma \delta \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \varphi ^0(x) \psi ^0(z) \partial _x R^0(x,z) \,\mathrm {d}x \,\mathrm {d}z = {{\mathbb {E}}}^{\rho ^0}[(X - Z)^2 V''(Z)]. \end{aligned}$$


The proof is directly obtained from Lemma 3.5 setting \(p(y)=0\) and replacing \(\alpha , \sigma \) by \(A, \varSigma \), respectively. \(\square \)

Let us introduce a notation which will be used throughout the rest of the paper. We denote

$$\begin{aligned} \widetilde{{\mathcal {M}}}_\varepsilon \,\,{:=}\,\,{{\mathbb {E}}}^{\rho ^\varepsilon }[V'(Z^\varepsilon )\otimes V'(X^\varepsilon )], \quad \widetilde{{\mathcal {M}}}_0 \,\,{:=}\,\,{{\mathbb {E}}}^{\rho ^0}[V'(Z)\otimes V'(X)], \end{aligned}$$

i.e., \(\widetilde{{\mathcal {M}}}_\varepsilon \) is obtained in the limit for \(T \rightarrow \infty \) applying the ergodic theorem elementwise to the matrix \({\widetilde{M}}\), and \(\widetilde{{\mathcal {M}}}_0\) is the limit for \(\varepsilon \rightarrow 0\) of the matrix \(\widetilde{{\mathcal {M}}}_\varepsilon \) due to Lemma 3.9. For completeness, we introduce here the symmetric matrices \({\mathcal {M}}_\varepsilon \) and \({\mathcal {M}}_0\) which are defined as

$$\begin{aligned} {\mathcal {M}}_\varepsilon \,\,{:=}\,\,{{\mathbb {E}}}^{\rho ^\varepsilon }[V'(X^\varepsilon )\otimes V'(X^\varepsilon )], \quad {\mathcal {M}}_0 \,\,{:=}\,\,{{\mathbb {E}}}^{\rho ^0}[V'(X)\otimes V'(X)], \end{aligned}$$

and which will be employed in the following. We can now introduce the main result, namely the convergence of the estimator based on filtered data of the drift coefficient of the homogenized equation.

Theorem 3.12

Let the assumptions of Lemmas 3.3 and 3.9 hold, and let \({\widehat{A}}_k(X^\varepsilon , T)\) be defined in (10) with \(\delta \) independent of \(\varepsilon \). If \({\widetilde{M}}\) is invertible, then

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } {\widehat{A}}_k(X^\varepsilon ,T) = A, \quad \text {a.s.}, \end{aligned}$$

where A is the drift coefficient of the homogenized equation (3).


Proof

Substituting the expression of \(\mathrm {d}X^\varepsilon _t\) into (11), we get for \({\widetilde{h}}\)

$$\begin{aligned} {\widetilde{h}} = -{\widetilde{M}} \alpha - \frac{1}{T} \int _0^T \frac{1}{\varepsilon } p' \left( \frac{X^\varepsilon _t}{\varepsilon } \right) V'(Z^\varepsilon _t) \,\mathrm {d}t + \frac{\sqrt{2\sigma }}{T} \int _0^T V'(Z^\varepsilon _t) \,\mathrm {d}W_t. \end{aligned}$$

Therefore, we have

$$\begin{aligned} \begin{aligned} {\widehat{A}}_k(X^\varepsilon , T)&= \alpha + \frac{1}{T} {\widetilde{M}}^{-1} \int _0^T \frac{1}{\varepsilon } p' \left( \frac{X^\varepsilon _t}{\varepsilon } \right) V'(Z^\varepsilon _t) \,\mathrm {d}t - \frac{\sqrt{2\sigma }}{T} {\widetilde{M}}^{-1} \int _0^T V'(Z^\varepsilon _t) \,\mathrm {d}W_t\\&\,\,{=:}\,\,\alpha + I_1^\varepsilon (T) - I_2^\varepsilon (T). \end{aligned}\nonumber \\ \end{aligned}$$

We study the terms \(I_1^\varepsilon (T)\) and \(I_2^\varepsilon (T)\) separately. First, the ergodic theorem applied to \(I_1^\varepsilon (T)\) yields

$$\begin{aligned} \lim _{T \rightarrow \infty } I_1^\varepsilon (T) = \widetilde{{\mathcal {M}}}_\varepsilon ^{-1} {{\mathbb {E}}}^{\rho ^\varepsilon } \left[ \frac{1}{\varepsilon } p' \left( \frac{X^\varepsilon }{\varepsilon } \right) V'(Z^\varepsilon ) \right] , \quad \text {a.s.} \end{aligned}$$

Substituting decomposition (15) and expression (16) of \(\varphi ^\varepsilon \), and integrating by parts, we have

$$\begin{aligned} \begin{aligned}&{{\mathbb {E}}}^{\rho ^\varepsilon } \left[ \frac{1}{\varepsilon } p' \left( \frac{X^\varepsilon }{\varepsilon } \right) V'(Z^\varepsilon ) \right] \\&\quad = \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \, \frac{1}{\varepsilon }p' \left( \frac{x}{\varepsilon }\right) \frac{1}{C_{\varphi ^\varepsilon }} e^{- \frac{1}{\sigma } \alpha \cdot V(x)} e^{ - \frac{1}{\sigma } p \left( \frac{x}{\varepsilon }\right) } \psi ^\varepsilon (z) R^\varepsilon (x,z) \,\mathrm {d}x \,\mathrm {d}z \\&\quad = -\sigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} \frac{\mathrm {d}}{\mathrm {d}x}\left( e^{ - \frac{1}{\sigma } p \left( \frac{x}{\varepsilon }\right) } \right) \frac{1}{C_{\varphi ^\varepsilon }} e^{- \frac{1}{\sigma } \alpha \cdot V(x)} V'(z) \psi ^\varepsilon (z) R^\varepsilon (x,z) \,\mathrm {d}x \,\mathrm {d}z \\&\quad = \sigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} \frac{1}{C_{\varphi ^\varepsilon }} e^{ - \frac{1}{\sigma } p \left( \frac{x}{\varepsilon }\right) } \partial _x \left( e^{- \frac{1}{\sigma } \alpha \cdot V(x)} R^\varepsilon (x,z) \right) V'(z) \psi ^\varepsilon (z) \,\mathrm {d}x \,\mathrm {d}z, \end{aligned} \end{aligned}$$

which implies

$$\begin{aligned} {{\mathbb {E}}}^{\rho ^\varepsilon } \left[ \frac{1}{\varepsilon } p' \left( \frac{X^\varepsilon }{\varepsilon } \right) V'(Z^\varepsilon ) \right]&= - \left( \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \otimes V'(x) \rho ^\varepsilon (x, z) \,\mathrm {d}x \,\mathrm {d}z\right) \alpha \\&\quad +\sigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \varphi ^\varepsilon (x) \psi ^\varepsilon (z) \partial _x R^\varepsilon (x,z) \,\mathrm {d}x \,\mathrm {d}z\\&= - \widetilde{{\mathcal {M}}}_\varepsilon \alpha + \sigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \varphi ^\varepsilon (x) \psi ^\varepsilon (z) \partial _x R^\varepsilon (x,z) \,\mathrm {d}x \,\mathrm {d}z. \end{aligned}$$

Substituting the equality above into (23), we obtain

$$\begin{aligned} \lim _{T \rightarrow \infty } I_1^\varepsilon (T) = -\alpha + \widetilde{\mathcal M}_\varepsilon ^{-1} \sigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \varphi ^\varepsilon (x) \psi ^\varepsilon (z) \partial _x R^\varepsilon (x,z) \,\mathrm {d}x \,\mathrm {d}z, \quad \text {a.s.} \end{aligned}$$

Due to Lemma 3.5, we therefore have

$$\begin{aligned} \lim _{T \rightarrow \infty } I_1^\varepsilon (T) = -\alpha + \frac{1}{\delta }\widetilde{{\mathcal {M}}}_\varepsilon ^{-1} {{\mathbb {E}}}^{\rho ^\varepsilon }[(X^\varepsilon - Z^\varepsilon )^2 V''(Z^\varepsilon )], \quad \text {a.s.} \end{aligned}$$

Since \(\delta \) is independent of \(\varepsilon \), we can pass to the limit as \(\varepsilon \) goes to zero and Lemma 3.9 yields

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } I_1^\varepsilon (T) = -\alpha + \frac{1}{\delta }\widetilde{{\mathcal {M}}}_0^{-1} {{\mathbb {E}}}^{\rho ^0}[(X - Z)^2 V''(Z)], \quad \text {a.s.} \end{aligned}$$

Due to Corollary 3.11, we have

$$\begin{aligned} \frac{1}{\delta } {{\mathbb {E}}}^{\rho ^0}[(X - Z)^2 V''(Z)] = \varSigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \varphi ^0(x) \psi ^0(z) \partial _x R^0(x,z) \,\mathrm {d}x \,\mathrm {d}z, \end{aligned}$$

and moreover, an integration by parts yields

$$\begin{aligned} \begin{aligned} \frac{1}{\delta } {{\mathbb {E}}}^{\rho ^0}[(X - Z)^2 V''(Z)]&= -\varSigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) (\varphi ^0)'(x) \psi ^0(z) R^0(x,z) \,\mathrm {d}x \,\mathrm {d}z\\&= -\varSigma \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \frac{\mathrm {d}}{\mathrm {d}x} \left( \frac{1}{C_{\varphi ^0}} e^{{-}\frac{1}{\varSigma } A{\cdot } V(x)} \right) \psi ^0(z) R^0(x,z) \,\mathrm {d}x \,\mathrm {d}z \\&= \left( \int _{{\mathbb {R}}} \int _{{\mathbb {R}}} V'(z) \otimes V'(x) \rho ^0(x,z) \,\mathrm {d}x \,\mathrm {d}z\right) A\\&= \widetilde{{\mathcal {M}}}_0 A. \end{aligned} \end{aligned}$$

We can therefore conclude that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } I_1^\varepsilon (T) = -\alpha + A, \quad \text {a.s.} \end{aligned}$$
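The last two steps can be verified by hand in the quadratic case of Example 3.10, where \(N = 1\), \(V'(x) = x\) and \(V''(x) = 1\). There, \(\widetilde{{\mathcal {M}}}_0 = {{\mathbb {E}}}^{\rho ^0}[ZX]\) and \({{\mathbb {E}}}^{\rho ^0}[(X - Z)^2]\) follow directly from the entries of \(\varGamma \), so that the quantity \(\delta ^{-1}\widetilde{{\mathcal {M}}}_0^{-1} {{\mathbb {E}}}^{\rho ^0}[(X - Z)^2 V''(Z)]\) can be evaluated in closed form. The short sketch below, with a hypothetical function name, carries out this arithmetic and is meant only as a sanity check of the identity in this special case.

```python
def filtered_drift_limit(A, Sigma, delta):
    """Evaluate (1 / delta) * E[(X - Z)^2] / E[Z X] for V(x) = x^2 / 2,
    using the stationary covariance Gamma of Example 3.10."""
    c = Sigma / (A * (1.0 + A * delta))
    var_x = c * (1.0 + A * delta)    # Gamma_11 = E[X^2]
    cov_xz = c                       # Gamma_12 = E[X Z]
    var_z = c                        # Gamma_22 = E[Z^2]
    second_moment_diff = var_x - 2.0 * cov_xz + var_z   # E[(X - Z)^2]
    return second_moment_diff / (delta * cov_xz)
```

The returned value equals \(A\) for every admissible \(\varSigma \) and \(\delta \), in agreement with the limit above.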

We now consider the second term \(I_2^\varepsilon (T)\) and rewrite it as

$$\begin{aligned} I^\varepsilon _2(T) = \sqrt{2\sigma } I_{2,1}^\varepsilon (T) I_{2,2}^\varepsilon (T), \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} I_{2,1}^\varepsilon (T)&\,\,{:=}\,\,\left( \frac{1}{T} \int _0^T V'(Z^\varepsilon _t) \otimes V'(X^\varepsilon _t) \,\mathrm {d}t\right) ^{-1}\left( \frac{1}{T} \int _0^T V'(Z^\varepsilon _t) \otimes V'(Z^\varepsilon _t) \,\mathrm {d}t\right) ,\\ I_{2,2}^\varepsilon (T)&\,\,{:=}\,\,\left( \frac{1}{T} \int _0^T V'(Z^\varepsilon _t) \otimes V'(Z^\varepsilon _t) \,\mathrm {d}t\right) ^{-1} \left( \frac{1}{T} \int _0^T V'(Z^\varepsilon _t) \,\mathrm {d}W_t\right) . \end{aligned} \end{aligned}$$

The ergodic theorem yields

$$\begin{aligned} \lim _{T \rightarrow \infty } I_{2,1}^\varepsilon (T) = \widetilde{\mathcal M}_\varepsilon ^{-1}{{\mathbb {E}}}^{\rho ^\varepsilon }\left[ V'(Z^\varepsilon ) \otimes V'(Z^\varepsilon )\right] \,\,{=:}\,\,R^\varepsilon , \end{aligned}$$

where \(R^\varepsilon \) is bounded uniformly in \(\varepsilon \) due to the theory of homogenization, Assumption 2.1(iii)–(iv) and Lemma C.1. Moreover, again by Lemma C.1 and Assumption 2.1(iii), \(V'(Z^\varepsilon )\) is square integrable, and hence, the strong law of large numbers for martingales implies

$$\begin{aligned} \lim _{T \rightarrow \infty } I_{2,2}^\varepsilon (T) = 0, \quad \text {a.s.}, \end{aligned}$$

independently of \(\varepsilon \). Therefore,

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\lim _{T \rightarrow \infty } I_2^\varepsilon (T) = 0, \quad \text {a.s.}, \end{aligned}$$

which, together with (26) and (22), proves the desired result. \(\square \)
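To make the objects in the proof concrete, the following is a minimal discretized sketch of the estimator in the scalar case \(N = 1\). It assumes, as the expressions of \({\widetilde{h}}\) and \({\widehat{A}}_k\) in the proof indicate, that \({\widehat{A}}_k = -{\widetilde{M}}^{-1}{\widetilde{h}}\) with \({\widetilde{M}} = \frac{1}{T}\int _0^T V'(Z^\varepsilon _t) V'(X^\varepsilon _t) \,\mathrm {d}t\) and \({\widetilde{h}} = \frac{1}{T}\int _0^T V'(Z^\varepsilon _t) \,\mathrm {d}X^\varepsilon _t\). The function names, the explicit Euler filter, the choice \(Z_0 = X_0\) and the Euler–Maruyama data generator are assumptions made for illustration only.

```python
import math
import random


def filtered_drift_estimate(xs, dt, delta, dV):
    """Discretised sketch of the filtered-data drift estimator, N = 1.

    Assumes A_hat = -h / M with
        M = (1/T) int V'(Z_t) V'(X_t) dt,   h = (1/T) int V'(Z_t) dX_t,
    and Z obtained from dZ = (X - Z) / delta dt (explicit Euler, Z_0 = X_0).
    """
    z = xs[0]
    M = 0.0
    h = 0.0
    for n in range(len(xs) - 1):
        M += dV(z) * dV(xs[n]) * dt
        h += dV(z) * (xs[n + 1] - xs[n])   # Ito (left-point) sum against dX
        z += (xs[n] - z) / delta * dt
    return -h / M                          # the common factor 1/T cancels


def simulate_multiscale(alpha, sigma, eps, T, dt, dV, dp, seed=0):
    """Euler-Maruyama for
    dX = -(alpha V'(X) + p'(X / eps) / eps) dt + sqrt(2 sigma) dW."""
    rng = random.Random(seed)
    scale = math.sqrt(2.0 * sigma * dt)
    x, xs = 0.0, [0.0]
    for _ in range(int(round(T / dt))):
        drift = -(alpha * dV(x) + dp(x / eps) / eps)
        x += drift * dt + scale * rng.gauss(0.0, 1.0)
        xs.append(x)
    return xs
```

With single-scale data, obtained by passing `dp=lambda y: 0.0`, the estimate fluctuates around \(\alpha \), as expected when no fast scale is present. With a genuinely oscillating `dp`, the time step must resolve the scale \(\varepsilon ^2\), and Theorem 3.12 suggests that for small \(\varepsilon \) and large \(T\) the estimate should approach the homogenized coefficient \(A\) instead.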

Remark 3.13

Let us remark that the assumption that \(\delta \) is independent of \(\varepsilon \) is necessary to pass from (24) to (25) but is not needed before (24). Moreover, the term \(I_2^\varepsilon (T)\) in the proof vanishes a.s. independently of \(\varepsilon \). Therefore, in the analysis of the case \(\delta = {\mathcal {O}}(\varepsilon ^\zeta )\) it will be sufficient for unbiasedness to show that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\frac{1}{\delta }\widetilde{{\mathcal {M}}}_\varepsilon ^{-1} {{\mathbb {E}}}^{\rho ^\varepsilon }[(X^\varepsilon - Z^\varepsilon )^2 V''(Z^\varepsilon )] = A, \end{aligned}$$

which is a non-trivial limit since \(\delta \rightarrow 0\) for \(\varepsilon \rightarrow 0\).

3.3 Filtered Data in the Multiscale Regime

We now consider the case of the filtering width \(\delta = \mathcal O(\varepsilon ^\zeta )\), where \(\zeta > 0\) will be specified in the following. In this case, the filtered process more closely resembles the original process \(X^\varepsilon \), as noted in Fig. 1. Moreover, the techniques employed for proving Theorem 3.12 can only be partly exploited, as highlighted in Remark 3.13. In fact, in order to prove unbiasedness it is necessary to characterize precisely the difference between the processes \(Z^\varepsilon \) and \(X^\varepsilon \). A first characterization is given by the following Proposition, whose proof is found in “Appendix B.”

Proposition 3.14

Let Assumption 2.1 hold and \(\varepsilon ,\delta >0\) be sufficiently small. Then, it holds for every \(t > 0\)

$$\begin{aligned} X_t^\varepsilon - Z_t^\varepsilon = \delta B^\varepsilon _t + R(\varepsilon ,\delta ), \end{aligned}$$

where the stochastic process \(B_t^\varepsilon \) is defined as

$$\begin{aligned} B_t^\varepsilon \,\,{:=}\,\,\sqrt{2\sigma } \int _0^t k(t-s)(1 + \varPhi '(Y_s^\varepsilon )) \,\mathrm {d}W_s, \end{aligned}$$

where \(\varPhi \) is the solution of the cell problem (5), \(W_s\) is the Brownian motion appearing in (1) and \(Y_t^\varepsilon = X_t^\varepsilon / \varepsilon \). Moreover, \(B_t^\varepsilon \) and the remainder \(R(\varepsilon ,\delta )\) satisfy for every \(p \ge 1\) the estimates

$$\begin{aligned} \left( {{\mathbb {E}}}^{\varphi ^\varepsilon } \left|B_t^\varepsilon \right|^p \right) ^{1/p} \le C \delta ^{-1/2}, \end{aligned}$$

and

$$\begin{aligned} \left( {{\mathbb {E}}}^{\varphi ^\varepsilon } \left| R(\varepsilon ,\delta ) \right| ^p \right) ^{1/p} \le C \left( \delta + \varepsilon + \max \{ 1,t \} e^{-t/\delta } \right) , \end{aligned}$$

where C is independent of \(\varepsilon \), \(\delta \) and t and \(\varphi ^\varepsilon \) is the density of the invariant measure of \(X^\varepsilon \).
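The scaling \(\delta ^{-1/2}\) in estimate (28) can be made plausible in a simplified setting. Assume, for illustration only, that the kernel is \(k(t) = \delta ^{-1} e^{-t/\delta }\) (consistent with the filter equation \(\mathrm {d}Z_t = \delta ^{-1}(X_t - Z_t) \,\mathrm {d}t\)) and that \(\varPhi ' \equiv 0\), i.e., that no fast scale is present. The Itô isometry then gives \({{\mathbb {E}}}|B_t|^2 = 2\sigma \int _0^t k(t-s)^2 \,\mathrm {d}s = \frac{\sigma }{\delta }\left( 1 - e^{-2t/\delta }\right) \). The sketch below compares a quadrature of the integral with this closed form; both function names are hypothetical.

```python
import math


def second_moment_B(sigma, delta, t, n=20_000):
    """Ito isometry: E[B_t^2] = 2 sigma * int_0^t k(t - s)^2 ds, with the
    assumed kernel k(u) = exp(-u / delta) / delta and Phi' = 0.
    The integral is approximated by the midpoint rule."""
    h = t / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * h                  # u = t - s
        k = math.exp(-u / delta) / delta
        total += k * k * h
    return 2.0 * sigma * total


def second_moment_B_closed_form(sigma, delta, t):
    """Closed form (sigma / delta) * (1 - exp(-2 t / delta))."""
    return sigma / delta * (1.0 - math.exp(-2.0 * t / delta))
```

In this simplified case, \(\delta \, {{\mathbb {E}}}|B_t|^2\) tends to \(\sigma \) for \(t \gg \delta \), so that \(({{\mathbb {E}}}|B_t|^2)^{1/2}\) is of order \(\delta ^{-1/2}\).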

It is clear from the Proposition above that understanding the properties of the process \(B_t^\varepsilon \) is key to understanding the behavior of the difference between \(X^\varepsilon \) and \(Z^\varepsilon \). In particular, we can write the dynamics of \(B_t^\varepsilon \) with an application of the Itô formula and due to the properties of the kernel k(t) as

$$\begin{aligned} \mathrm {d}B_t^\varepsilon = - \frac{1}{\delta }B_t^\varepsilon \,\mathrm {d}t + \frac{\sqrt{2\sigma }}{\delta }(1+\varPhi '(Y_t^\varepsilon )) \,\mathrm {d}W_t. \end{aligned}$$

This equation can be coupled with the dynamics of the processes \(X_t^\varepsilon \), \(Y_t^\varepsilon \) and \(Z_t^\varepsilon \), thus describing the evolution of the quadruple \((X^\varepsilon , Y^\varepsilon , Z^\varepsilon , B^\varepsilon )\) together. In particular, it is possible to show that the results of Sect. 3.1 hold for the quadruple, and the properties of the invariant measure of the quadruple can be exploited to prove the unbiasedness of the estimator in the case \(\delta = \mathcal O(\varepsilon ^\zeta )\) in the same way as in the case \(\delta \) independent of \(\varepsilon \). In this context, a further assumption on the potential V is necessary.

Assumption 3.15

The derivatives \(V''\) and \(V'''\) of the potential \(V :{\mathbb {R}}\rightarrow {\mathbb {R}}^N\) are component-wise polynomially bounded, and the second derivative is Lipschitz, i.e., there exists a constant \(L > 0\) such that

$$\begin{aligned} \left\| V''(x)-V''(y)\right\| \le L \left|x - y\right|, \end{aligned}$$

for all \(x, y \in {\mathbb {R}}\).

In light of Remark 3.13, it is fundamental to understand the behavior of the quantity

$$\begin{aligned} \frac{1}{\delta }(X_t^\varepsilon - Z_t^\varepsilon )^2 V''(Z_t^\varepsilon ), \end{aligned}$$

as well as its limit for \(t\rightarrow \infty \) and for \(\varepsilon \rightarrow 0\). Let us remark that due to Proposition 3.14 we have

$$\begin{aligned} \frac{1}{\delta }(X_t^\varepsilon - Z_t^\varepsilon )^2 V''(Z_t^\varepsilon ) \approx \delta (B_t^\varepsilon )^2 V''(Z_t^\varepsilon ), \end{aligned}$$

and therefore studying the right-hand side of the approximate equality above is the goal of the upcoming discussion. The following result, whose proof is in “Appendix C,” gives a first characterization.

Lemma 3.16

Under Assumptions 2.1 and 3.15, let \(\eta ^\varepsilon \) be the invariant measure of the quadruple \((X^\varepsilon , Y^\varepsilon , Z^\varepsilon , B^\varepsilon )\). Then, it holds

$$\begin{aligned} \delta {{\mathbb {E}}}^{\eta ^\varepsilon } \left[ (B^\varepsilon )^2 V''(Z^\varepsilon ) \right] = \sigma {{\mathbb {E}}}^{\eta ^\varepsilon } [(1 + \varPhi '(Y^\varepsilon ))^2V''(Z^\varepsilon ) ] + {\widetilde{R}}(\varepsilon ,\delta ), \end{aligned}$$

where the remainder \({\widetilde{R}}(\varepsilon ,\delta )\) satisfies

$$\begin{aligned} \left| {\widetilde{R}}(\varepsilon ,\delta ) \right| \le C \left( \delta ^{1/2} + \varepsilon \right) . \end{aligned}$$

Let us remark that the quantity appearing above hints toward the theory of homogenization. In fact, we recall that the homogenization coefficient K is given by

$$\begin{aligned} K = \int _0^L \left( 1 + \varPhi '(y)\right) ^2 \mu (\mathrm {d}y), \end{aligned}$$

where \(\mu \) is the marginal measure of the process \(Y^\varepsilon \) when coupled with \(X^\varepsilon \). Therefore, the next step is the homogenization limit, i.e., the limit of vanishing \(\varepsilon \), which is considered in the following Lemma, and whose proof is given in “Appendix C.”

Lemma 3.17

Let the assumptions of Lemma 3.16 hold, and let \(\delta = \varepsilon ^\zeta \) with \(\zeta > 0\). Then, it holds

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \sigma {{\mathbb {E}}}^{\eta ^\varepsilon } [(1 + \varPhi '(Y^\varepsilon ))^2V''(Z^\varepsilon ) ] = \varSigma {{\mathbb {E}}}^{\varphi ^0} [V''(X)], \end{aligned}$$

where \(\varSigma \) is the diffusion coefficient of the homogenized equation (3).
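For a concrete feeling of the size of the coefficient \(K\), which by the Lemma above links \(\sigma \) to \(\varSigma \), one can use the classical one-dimensional closed form \(K = L^2/(Z{\widehat{Z}})\), with \(Z = \int _0^L e^{-p(y)/\sigma } \,\mathrm {d}y\) and \({\widehat{Z}} = \int _0^L e^{p(y)/\sigma } \,\mathrm {d}y\). This formula follows from solving the cell problem explicitly in one dimension and is assumed here, since the cell problem (5) is stated outside this section. The sketch below evaluates it by quadrature; the function name and the choice of quadrature rule are illustrative.

```python
import math


def homogenization_coefficient(p, period, sigma, n=20_000):
    """K = period^2 / (Z * Z_hat), with
        Z     = int_0^period exp(-p(y) / sigma) dy,
        Z_hat = int_0^period exp( p(y) / sigma) dy.
    This closed form is the standard one-dimensional result (assumed here).
    The integrals use the rectangle rule, which is very accurate for smooth
    periodic integrands."""
    h = period / n
    Z = sum(math.exp(-p(i * h) / sigma) for i in range(n)) * h
    Z_hat = sum(math.exp(p(i * h) / sigma) for i in range(n)) * h
    return period ** 2 / (Z * Z_hat)
```

For \(p(y) = \cos (y)\), \(L = 2\pi \) and \(\sigma = 1\), both integrals equal \(2\pi I_0(1)\), with \(I_0\) the modified Bessel function of the first kind, so that \(K = I_0(1)^{-2} < 1\): the fast oscillations slow down the effective dynamics.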

Provided with the results presented above, we can prove the following Theorem, stating that the estimator \({\widehat{A}}_k(X^\varepsilon , T)\) is asymptotically unbiased even in the case of the filtering width \(\delta \) vanishing with respect to the multiscale parameter \(\varepsilon \).

Theorem 3.18

Let the assumptions of Lemmas 3.3 and 3.17 hold. Let \({\widehat{A}}_k(X^\varepsilon , T)\) be defined in (10) and \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0,2)\). If \({\widetilde{M}}\) is invertible, then

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } {\widehat{A}}_k(X^\varepsilon , T) = A, \quad \text {in probability}, \end{aligned}$$

where A is the drift coefficient of the homogenized equation (3).


Proof

Let us introduce the notation

$$\begin{aligned} {\mathcal {A}}^\varepsilon (\delta ) \,\,{:=}\,\,\frac{1}{\delta }\widetilde{\mathcal M}_\varepsilon ^{-1} {{\mathbb {E}}}^{\rho ^\varepsilon }[(X^\varepsilon - Z^\varepsilon )^2 V''(Z^\varepsilon )], \end{aligned}$$

where \(\widetilde{{\mathcal {M}}}_\varepsilon \) is defined in (20). Then, following the proof of Theorem 3.12 and in light of Remark 3.13, we only need to show that if \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0,2)\) we have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} {\mathcal {A}}^\varepsilon (\delta ) = A, \quad \text {in probability}. \end{aligned}$$

Using Proposition 3.14 and geometric ergodicity for taking the limit for \(t \rightarrow \infty \) (Lemma 3.3), we have the following equality

$$\begin{aligned} \begin{aligned} {\mathcal {A}}^\varepsilon (\delta )&= \widetilde{{\mathcal {M}}}_\varepsilon ^{-1} \frac{1}{\delta }\lim _{t \rightarrow \infty } {{\mathbb {E}}}[(X_t^\varepsilon - Z_t^\varepsilon )^2 V''(Z_t^\varepsilon )] \\&= \widetilde{{\mathcal {M}}}_\varepsilon ^{-1} \frac{1}{\delta }\lim _{t \rightarrow \infty } {{\mathbb {E}}}\left[ \left( \delta B_t^\varepsilon + R(\varepsilon ,\delta ) \right) ^2 V''(Z_t^\varepsilon ) \right] \\&\,\,{=:}\,\,\widetilde{{\mathcal {M}}}_\varepsilon ^{-1} \lim _{t \rightarrow \infty } \left( J_1^\varepsilon (t) + J_2^\varepsilon (t) + J_3^\varepsilon (t) \right) , \end{aligned} \end{aligned}$$

where \(R(\varepsilon , \delta )\) is given in Proposition 3.14, \({{\mathbb {E}}}\) denotes the expectation with respect to the Wiener measure and

$$\begin{aligned} \begin{aligned} J_{1}^{\varepsilon }(t)&= \delta {{\mathbb {E}}}\left[ (B_{t}^{\varepsilon })^{2} V''(Z_{t}^{\varepsilon }) \right] , \\ J_{2}^{\varepsilon }(t)&= 2 {{\mathbb {E}}}\left[ B_{t}^{\varepsilon } R(\varepsilon ,\delta ) V''(Z_{t}^{\varepsilon }) \right] , \\ J_{3}^{\varepsilon }(t)&= \frac{1}{\delta }{{\mathbb {E}}}\left[ R(\varepsilon ,\delta )^{2} V''(Z_{t}^{\varepsilon }) \right] . \end{aligned} \end{aligned}$$

Let us consider the three terms separately. First, by geometric ergodicity and applying Lemmas 3.16 and 3.17 we get

$$\begin{aligned} \begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{t \rightarrow \infty } J_1^\varepsilon (t)&= \lim _{\varepsilon \rightarrow 0}\delta {{\mathbb {E}}}^{\eta ^\varepsilon } \left[ (B^\varepsilon )^2 V''(Z^\varepsilon ) \right] \\&= \lim _{\varepsilon \rightarrow 0} \left( \sigma {{\mathbb {E}}}^{\eta ^\varepsilon } [V''(Z^\varepsilon ) (1 + \varPhi '(Y^\varepsilon ))^2] + {\widetilde{R}}(\varepsilon ,\delta )\right) \\&= \varSigma {{\mathbb {E}}}^{\varphi ^0} [V''(X)]. \end{aligned} \end{aligned}$$

Let us now consider \(J_2^\varepsilon (t)\). For Hölder conjugate exponents \(p, q, r\), i.e., such that \(1/p + 1/q + 1/r = 1\), the Hölder inequality yields

$$\begin{aligned} \left|J_2^\varepsilon (t)\right| \le 2\, {{\mathbb {E}}}[\left|B_t^\varepsilon \right|^p]^{1/p}{{\mathbb {E}}}[\left|R(\varepsilon ,\delta )\right|^q]^{1/q}{{\mathbb {E}}}[\left\| V''(Z_t^\varepsilon )\right\| ^r]^{1/r}. \end{aligned}$$

Now, we can bound the first two terms with (28) and (29), respectively. The third term is bounded due to Assumption 3.15 and Lemma C.1. Hence, we have for t sufficiently large

$$\begin{aligned} \left|J_2^\varepsilon (t)\right| \le C \left( \delta ^{1/2} + \varepsilon \delta ^{-1/2} \right) . \end{aligned}$$

We consider now \(J_3^\varepsilon (t)\). The Hölder inequality yields for conjugates p and q

$$\begin{aligned} \left|J_3^\varepsilon (t)\right| \le {{\mathbb {E}}}[R(\varepsilon , \delta )^{2p}]^{1/p} {{\mathbb {E}}}[V''(Z_t^\varepsilon )^q]^{1/q}, \end{aligned}$$

which, similarly as above, yields for t sufficiently large

$$\begin{aligned} \left|J_3^\varepsilon (t)\right| \le C \left( \delta + \varepsilon ^2 \delta ^{-1} \right) . \end{aligned}$$

Therefore, since \(\delta = {\mathcal {O}}(\varepsilon ^\zeta )\) for \(\zeta \in (0, 2)\), the terms \(J_2^\varepsilon (t)\) and \(J_3^\varepsilon (t)\) vanish in the limit for \(t \rightarrow \infty \) and \(\varepsilon \rightarrow 0\). Furthermore, by Lemma C.4 and by weak convergence of the invariant measure \(\mu ^\varepsilon \) to \(\mu ^0\), we have

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \widetilde{{\mathcal {M}}}_\varepsilon = {\mathcal {M}}_0, \end{aligned}$$

where \({\mathcal {M}}_0\) is defined in (21). Therefore,

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} {\mathcal {A}}^\varepsilon (\delta ) = \varSigma {\mathcal {M}}_0^{-1} {{\mathbb {E}}}^{\varphi ^0} [V''(X)], \end{aligned}$$

and, finally, employing (19) and (21) and integrating by parts yield

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} {\mathcal {A}}^\varepsilon (\delta ) = \varSigma {\mathcal {M}}_0^{-1} \frac{1}{\varSigma }{\mathcal {M}}_0 A = A, \end{aligned}$$

which implies the desired result. \(\square \)
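The role of the restriction \(\zeta \in (0,2)\) can be read off from the bounds on \(J_2^\varepsilon \) and \(J_3^\varepsilon \) by substituting \(\delta = \varepsilon ^\zeta \): the first bound becomes \(\varepsilon ^{\zeta /2} + \varepsilon ^{1-\zeta /2}\) and the second \(\varepsilon ^{\zeta } + \varepsilon ^{2-\zeta }\). The following sketch, with hypothetical helper names, records the resulting decay exponents and checks for which values of \(\zeta \) both remainders vanish.

```python
def remainder_exponents(zeta):
    """Powers of eps in the bounds on J_2 and J_3 when delta = eps**zeta.

    |J_2| <= C (delta^{1/2} + eps delta^{-1/2}) -> eps^{zeta/2} + eps^{1 - zeta/2}
    |J_3| <= C (delta + eps^2 / delta)          -> eps^{zeta}   + eps^{2 - zeta}
    Each bound decays like eps raised to the smaller of its two exponents."""
    j2 = min(zeta / 2.0, 1.0 - zeta / 2.0)
    j3 = min(zeta, 2.0 - zeta)
    return j2, j3


def remainders_vanish(zeta):
    """True when both bounds tend to zero as eps -> 0."""
    j2, j3 = remainder_exponents(zeta)
    return j2 > 0.0 and j3 > 0.0
```

Both exponents are positive exactly when \(0< \zeta < 2\), and the bound on \(J_2^\varepsilon \) decays fastest for \(\zeta = 1\), where it is of order \(\varepsilon ^{1/2}\). These are statements about the upper bounds only and say nothing about their sharpness.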

We conclude the analysis concerning the estimator \({\widehat{A}}_k\) for the effective drift coefficient with a negative convergence result, i.e., that if \(\delta = \varepsilon ^\zeta \) with \(\zeta > 2\), the estimator based on filtered data converges to the coefficient \(\alpha \) of the unhomogenized equation. This result is relevant for two reasons. First, it shows the sharpness of the bound on \(\zeta \) in the assumptions of Theorem 3.18. Second, it shows an interesting switch between two completely different regimes at \(\zeta = 2\), which happens arbitrarily fast in the limit \(\varepsilon \rightarrow 0\).

Theorem 3.19

Let the assumptions of Lemma 3.3 and Assumption 3.15 hold. Let \({\widehat{A}}_k(X^\varepsilon , T)\) be defined in (10) and \(\delta = \varepsilon ^\zeta \) with \(\zeta > 2\). If \({\widetilde{M}}\) is invertible, then

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } {\widehat{A}}_k(X^\varepsilon , T) = \alpha , \quad \text {in probability}, \end{aligned}$$

where \(\alpha \) is the drift coefficient of the multiscale equation (1).

The proof is given in “Appendix C.”

We conclude this section by proving a result of asymptotic unbiasedness for the estimator \({\widehat{\varSigma }}_k\) of the effective diffusion coefficient \(\varSigma \) defined in (12). The proof is given in “Appendix D.”

Theorem 3.20

Let the Assumptions of Theorem 3.19 hold. Then, if \(\delta = \varepsilon ^\zeta \), with \(\zeta \in (0,2)\), it holds

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } {\widehat{\varSigma }}_k(X^\varepsilon ,T) = \varSigma , \quad \text {in probability}, \end{aligned}$$

where \(\varSigma \) is the diffusion coefficient of the homogenized equation (3).

4 The Bayesian Setting

In this section, we present a Bayesian reinterpretation of the inference procedure, which, given the structure of the problem, allows full uncertainty quantification with little more computational effort than required for the MLE.

Let us fix a Gaussian prior \(\mu _0 = {\mathcal {N}}(A_0, C_0)\) on A, where \(A_0 \in {\mathbb {R}}^N\) and \(C_0 \in {\mathbb {R}}^{N\times N}\) is symmetric positive definite. Then, given a final time \(T > 0\), the posterior distribution \(\mu _{T,\varepsilon }\) admits a density \(p(A \mid X^\varepsilon )\) with respect to the Lebesgue measure which satisfies

$$\begin{aligned} p(A \mid X^\varepsilon ) = \frac{1}{Z^\varepsilon } \, p(X^\varepsilon \mid A) \, p_0(A), \end{aligned}$$

where \(Z^\varepsilon \) is the normalization constant, \(p_0\) is the density of \(\mu _0\) and the likelihood \(p(X^\varepsilon \mid A)\) is given in (6). The log-posterior density is therefore given by

$$\begin{aligned} \log p(A \mid X^\varepsilon )= & {} -\log Z^\varepsilon - \frac{T}{2\varSigma } A \cdot h \\&- \frac{T}{4\varSigma } A \cdot M A - \frac{1}{2} (A - A_0) \cdot C_0^{-1}(A-A_0), \end{aligned}$$

where M and h are defined in (8). Since the log-posterior density is quadratic in A, the posterior is Gaussian, and it is therefore sufficient to determine its mean and covariance to fully characterize it. We denote by \(m_{T,\varepsilon }\) and \(C_{T,\varepsilon }\) the mean and covariance matrix, respectively. Completing the squares in the log-posterior density, we formally obtain

$$\begin{aligned} \begin{aligned} C_{T,\varepsilon }^{-1}&= C_0^{-1} + \frac{T}{2\varSigma } M, \\ C_{T,\varepsilon }^{-1}m_{T,\varepsilon }&= C_0^{-1}A_0 - \frac{T}{2\varSigma } h. \end{aligned} \end{aligned}$$
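The two formulas above translate directly into a few lines of linear algebra. The following is a minimal sketch, not the authors' implementation: the function name `gaussian_posterior` and its argument names are hypothetical, and the inputs `M`, `h` and `Sigma` are assumed to have been computed beforehand.

```python
import numpy as np

def gaussian_posterior(M, h, T, Sigma, A0, C0):
    """Mean and covariance of the Gaussian posterior N(m, C).

    Implements C^{-1} = C0^{-1} + T/(2 Sigma) M and
    C^{-1} m = C0^{-1} A0 - T/(2 Sigma) h.
    """
    M = np.atleast_2d(np.asarray(M, dtype=float))
    h = np.atleast_1d(np.asarray(h, dtype=float))
    A0 = np.atleast_1d(np.asarray(A0, dtype=float))
    C0 = np.atleast_2d(np.asarray(C0, dtype=float))

    C0_inv = np.linalg.inv(C0)
    precision = C0_inv + T / (2.0 * Sigma) * M   # posterior precision matrix
    rhs = C0_inv @ A0 - T / (2.0 * Sigma) * h    # right-hand side for the mean

    C = np.linalg.inv(precision)
    m = np.linalg.solve(precision, rhs)
    return m, C
```

For \(T = 0\) the function returns the prior, and for large T the mean approaches \(-M^{-1}h\), in line with the formulas above.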

Under Assumption 2.1, one can show that the posterior at time \(T > 0\) is well defined and given by \(\mu _{T,\varepsilon }(\cdot \mid X^\varepsilon ) = {\mathcal {N}}(m_{T,\varepsilon }, C_{T,\varepsilon })\). Let us remark that in order to compute the posterior covariance \(C_{T,\varepsilon }\) the value of the diffusion coefficient \(\varSigma \) of the homogenized equation is needed. Although the exact value is in general unknown, it can be estimated employing the subsampling technique presented in [29] or with the estimator \({\widehat{\varSigma }}_k\) given in (12) based on filtered data. In fact, we verified in practice that the estimator of the diffusion coefficient based on subsampling is more robust with respect to the subsampling step than the estimator for the drift coefficient. In the following theorem, we show that the posterior distribution obtained with no pre-processing of the data contracts asymptotically to the drift coefficient of the unhomogenized equation. We characterize the contraction by verifying that the posterior measure concentrates in arbitrarily small balls. Let us finally remark that the measure \(\mu _{T, \varepsilon }\) is a random measure, and therefore, contraction has to be considered averaged with respect to the Wiener measure. The choice of the contraction measure and some parts of the proof are taken from [32, Theorem 5.2].

Theorem 4.1

Under Assumption 2.1, the posterior measure \(\mu _{T,\varepsilon }(\cdot \mid X^\varepsilon ) = {\mathcal {N}}(m_{T,\varepsilon }, C_{T,\varepsilon })\) satisfies for all \(c > 0\)

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\lim _{T \rightarrow \infty } {{\mathbb {E}}}\left[ \mu _{T,\varepsilon }\left( \{a:\left\| a-\alpha \right\| _2\ge c\}\mid X^\varepsilon \right) \right] = 0, \end{aligned}$$

where \({{\mathbb {E}}}\) denotes expectation with respect to the Wiener measure and \(\alpha \) is the drift coefficient of the unhomogenized equation (1).

Remark 4.2

The result above has the same consequences in the Bayesian setting as Theorem 2.3 has for the MLE. In particular, it shows that the posterior distribution obtained when data are not pre-processed concentrates asymptotically on the drift coefficient of the unhomogenized equation (1). Moreover, the proof yields the partial result that, for any fixed \(\varepsilon > 0\), the Bayesian and the MLE approaches are equivalent in the limit \(T \rightarrow \infty \). In particular, we have for all \(\varepsilon > 0\)

$$\begin{aligned} \begin{aligned}&\lim _{T\rightarrow \infty } \left\| C_{T,\varepsilon }\right\| _2 = 0,\\&\lim _{T \rightarrow \infty }\left\| m_{T,\varepsilon } - {\widehat{A}}(X^\varepsilon , T)\right\| _2 =0, \end{aligned} \end{aligned}$$

i.e., the weak limit of the posterior \(\mu _{T,\varepsilon }\) for \(T\rightarrow \infty \) is the Dirac delta concentrated on the limit of \(\widehat{A}(X^\varepsilon , T)\) for \(T\rightarrow \infty \).

Proof of Theorem 4.1

The proof of [32, Theorem 5.2] guarantees that if the trace of \(C_{T,\varepsilon }\) tends to zero and if the mean \(m_{T,\varepsilon }\) tends to \(\alpha \), then the desired result holds. Indeed, the triangle inequality yields

$$\begin{aligned} \begin{aligned} {{\mathbb {E}}}\left[ \mu _{T,\varepsilon }\left( \{a:\left\| a-\alpha \right\| _2\ge c\}\mid X^\varepsilon \right) \right]&\le {{\mathbb {E}}}\left[ \mu _{T,\varepsilon }\left( \left\{ a:\left\| a-m_{T,\varepsilon }\right\| _2\ge \frac{c}{2}\right\} \mid X^\varepsilon \right) \right] \\&\quad + {\mathbb {P}}\left( \left\| m_{T,\varepsilon } - \alpha \right\| _2 \ge \frac{c}{2}\right) . \end{aligned} \end{aligned}$$

If the mean converges in probability, then the second term vanishes. For the first term, Markov’s inequality yields

$$\begin{aligned} \mu _{T,\varepsilon }\left( \left\{ a:\left\| a-m_{T,\varepsilon }\right\| _2\ge \frac{c}{2}\right\} \mid X^\varepsilon \right) \le \frac{4}{c^2}\int _{{\mathbb {R}}^N} \left\| a-m_{T,\varepsilon }\right\| _2^2 \, \mu _{T,\varepsilon }(\mathrm {d}a\mid X^\varepsilon ), \end{aligned}$$

and a change of variable simply gives

$$\begin{aligned} \int _{{\mathbb {R}}^N} \left\| a-m_{T,\varepsilon }\right\| _2^2 \, \mu _{T,\varepsilon }(\mathrm {d}a\mid X^\varepsilon ) = {\text {tr}}(C_{T,\varepsilon }). \end{aligned}$$

This proves that we just have to verify that the covariance matrix vanishes and that the mean tends to the coefficient \(\alpha \). Let us first consider the covariance matrix. An algebraic identity yields

$$\begin{aligned} C_{T,\varepsilon } = \frac{2\varSigma }{T} \left( M^{-1} - Q^{-1}\right) , \end{aligned}$$

where

$$\begin{aligned} Q = M + \frac{T}{2\varSigma } M C_0 M. \end{aligned}$$
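The identity for \(C_{T,\varepsilon }\) in terms of \(M^{-1}\) and \(Q^{-1}\) is an instance of the Woodbury formula and can be checked numerically. The snippet below is only an illustrative sanity check on randomly generated symmetric positive definite matrices; the helper `random_spd` and the chosen values of N, T and \(\varSigma \) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, Sigma = 3, 50.0, 0.5

def random_spd(n):
    """Random symmetric positive definite matrix (illustrative helper)."""
    B = rng.standard_normal((n, n))
    return B @ B.T + n * np.eye(n)

M, C0 = random_spd(N), random_spd(N)
c = T / (2.0 * Sigma)

# Direct definition: C^{-1} = C0^{-1} + T/(2 Sigma) M
C_direct = np.linalg.inv(np.linalg.inv(C0) + c * M)

# Identity: C = (2 Sigma / T) (M^{-1} - Q^{-1}),  Q = M + T/(2 Sigma) M C0 M
Q = M + c * M @ C0 @ M
C_identity = (np.linalg.inv(M) - np.linalg.inv(Q)) / c

max_abs_diff = np.max(np.abs(C_direct - C_identity))
print(max_abs_diff)
```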

Let us first remark that due to the hypothesis on M (Assumption 2.1(iv)) and the ergodic theorem it holds for all \(T > 0\)

$$\begin{aligned} \left\| M^{-1}\right\| _2 \le \frac{1}{{\bar{\lambda }}}, \end{aligned}$$

where \({\bar{\lambda }}\) is given in Assumption 2.1(iv). We now have that for generic symmetric positive definite matrices R and S it holds

$$\begin{aligned} \left\| (R+S)^{-1}\right\| _2 \le \left\| S^{-1}\right\| _2. \end{aligned}$$
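One way to justify this bound is a standard eigenvalue argument, given here only as a sketch. Denoting by \(\lambda _{\min }\) the smallest eigenvalue, for symmetric positive definite R and S we have

$$\begin{aligned} \lambda _{\min }(R+S) \ge \lambda _{\min }(R) + \lambda _{\min }(S) \ge \lambda _{\min }(S) > 0, \end{aligned}$$

and since the spectral norm of the inverse of a symmetric positive definite matrix is the reciprocal of its smallest eigenvalue,

$$\begin{aligned} \left\| (R+S)^{-1}\right\| _2 = \frac{1}{\lambda _{\min }(R+S)} \le \frac{1}{\lambda _{\min }(S)} = \left\| S^{-1}\right\| _2. \end{aligned}$$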

Applying this inequality to \(Q^{-1}\), we obtain

$$\begin{aligned} \left\| Q^{-1}\right\| _2 \le \frac{2\varSigma }{T} \left\| (MC_0M)^{-1}\right\| _2 \le \frac{2\varSigma }{T} \left\| M^{-1}\right\| _2^2 \left\| C_0^{-1}\right\| _2 \le \frac{2\varSigma }{T{\bar{\lambda }}^2} \left\| C_0^{-1}\right\| _2, \end{aligned}$$

which implies

$$\begin{aligned} \lim _{T \rightarrow \infty }\left\| Q^{-1}\right\| _2 = 0, \end{aligned}$$

and due to the triangle inequality

$$\begin{aligned} \lim _{T \rightarrow \infty }\left\| C_{T,\varepsilon }\right\| _2 = 0. \end{aligned}$$

We proved that in the limit for \(T \rightarrow \infty \) the covariance shrinks to zero independently of \(\varepsilon \). We now consider the mean. First, we remark that the triangle inequality yields

$$\begin{aligned} \left\| m_{T,\varepsilon } - \alpha \right\| _2 \le \left\| m_{T,\varepsilon } - \widehat{A}(X^\varepsilon , T)\right\| _2 + \left\| {\widehat{A}}(X^\varepsilon , T) - \alpha \right\| _2. \end{aligned}$$

For the second term, Theorem 2.3 implies

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty }\left\| {\widehat{A}}(X^\varepsilon , T) - \alpha \right\| _2 = 0, \quad \text {a.s.} \end{aligned}$$

Let us now consider the first term. Substituting the expression of the maximum likelihood estimator (8) and applying the Cauchy–Schwarz and triangle inequalities, we obtain

$$\begin{aligned} \begin{aligned} \left\| m_{T,\varepsilon } - {\widehat{A}}(X^\varepsilon , T)\right\| _2&= \frac{2\varSigma }{T}\left\| M^{-1}C_0^{-1}A_0 - Q^{-1}\left( C_0^{-1}A_0 - \frac{T}{2\varSigma } h \right) \right\| _2\\&\le \frac{2\varSigma }{T{\bar{\lambda }}} \left\| C_0^{-1}\right\| _2 \left( \left\| A_0\right\| _2 + \frac{1}{{\bar{\lambda }}}\left\| h\right\| _2 + \frac{2\varSigma }{T{\bar{\lambda }}} \left\| C_0^{-1}\right\| _2\left\| A_0\right\| _2\right) . \end{aligned} \end{aligned}$$

Moreover, the ergodic theorem and the strong law of large numbers for martingales guarantee that \(\left\| h\right\| _2\) is bounded a.s. for \(T \rightarrow \infty \). Therefore,

$$\begin{aligned} \lim _{T\rightarrow \infty } \left\| m_{T,\varepsilon } - {\widehat{A}}(X^\varepsilon , T)\right\| _2 = 0, \quad \text {a.s.}, \end{aligned}$$

independently of \(\varepsilon \). Finally,

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty }\left\| m_{T,\varepsilon } - \alpha \right\| _2 = 0, \quad \text {a.s.}, \end{aligned}$$

which, together with (31), implies the desired result. \(\square \)

4.1 The Filtered Data Approach

In this section, we show how to correct the asymptotic bias of the posterior highlighted in Theorem 4.1 by employing filtered data. In the Bayesian setting, we consider the modified likelihood function

$$\begin{aligned} {\widetilde{p}}(X^\varepsilon \mid A) = \exp \left( -\frac{\widetilde{I}(X^\varepsilon \mid A)}{2\varSigma } \right) , \end{aligned}$$

where

$$\begin{aligned} \begin{aligned} {\widetilde{I}}(X^\varepsilon \mid A)&= \int _0^T A \cdot V'(Z^\varepsilon _t) \,\mathrm {d}X^\varepsilon _t + \frac{1}{2} \int _0^T \left( A \cdot V'(X^\varepsilon _t)\right) ^2 \,\mathrm {d}t \\&= {\widetilde{h}} \cdot A + \frac{1}{2} A \cdot M A. \end{aligned} \end{aligned}$$

Since M is symmetric positive definite, the function \(\widetilde{p}(X^\varepsilon \mid A)\) is indeed a valid Gaussian likelihood function. We then obtain the modified posterior \({{\widetilde{\mu }}}_{T,\varepsilon } = {\mathcal {N}}({\widetilde{m}}_{T, \varepsilon }, C_{T, \varepsilon })\), whose parameters are given by

$$\begin{aligned} \begin{aligned} C_{T, \varepsilon }^{-1}&= C_0^{-1} + \frac{T}{2\varSigma } M, \\ C_{T,\varepsilon }^{-1}{\widetilde{m}}_{T,\varepsilon }&= C_0^{-1}A_0 - \frac{T}{2\varSigma } {\widetilde{h}}. \end{aligned} \end{aligned}$$

Let us remark that the posterior \({{\widetilde{\mu }}}_{T,\varepsilon }\) has the same covariance as \(\mu _{T,\varepsilon }\) given in (30) and that therefore it is indeed a valid Gaussian posterior distribution. Nevertheless, in order to employ the tool of convergence introduced in Theorem 4.1, we need to study the properties of the MLE based on the likelihood \({\widetilde{p}}(X^\varepsilon \mid A)\), i.e., the quantity

$$\begin{aligned} {\widetilde{A}}_k(X^\varepsilon , T) = - M^{-1} {\widetilde{h}}. \end{aligned}$$

The following theorem guarantees the unbiasedness of this estimator under a condition on the parameter \(\delta \) of the filter.

Theorem 4.3

Let the assumptions of Theorem 3.18 hold. Then, if \(\delta = \varepsilon ^\zeta \), with \(\zeta \in (0, 2)\), it holds

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0} \lim _{T \rightarrow \infty } {\widetilde{A}}_k(X^\varepsilon , T) = A, \quad \text {in probability}, \end{aligned}$$

for \({\widetilde{A}}_k(X^\varepsilon , T)\) defined in (32).


Proof

We first consider the difference between the two estimators \({\widetilde{A}}_k(X^\varepsilon , T)\) and \({\widehat{A}}_k(X^\varepsilon , T)\). In particular, the ergodic theorem and an algebraic equality imply

$$\begin{aligned} \begin{aligned} \lim _{T \rightarrow \infty } \left( {\widehat{A}}_k(X^\varepsilon , T) - {\widetilde{A}}_k(X^\varepsilon , T)\right)&= \left( {\mathcal {M}}_\varepsilon ^{-1} - \widetilde{{\mathcal {M}}}_\varepsilon ^{-1}\right) \lim _{T \rightarrow \infty }{\widetilde{h}} \\&= -{\mathcal {M}}_\varepsilon ^{-1}\left( {\mathcal {M}}_\varepsilon - \widetilde{{\mathcal {M}}}_\varepsilon \right) \widetilde{{\mathcal {M}}}_\varepsilon ^{-1} \lim _{T \rightarrow \infty } {\widetilde{h}}\\&= {\mathcal {M}}_\varepsilon ^{-1}\left( {\mathcal {M}}_\varepsilon - \widetilde{\mathcal M}_\varepsilon \right) \lim _{T \rightarrow \infty } {\widehat{A}}_k(X^\varepsilon , T), \end{aligned} \end{aligned}$$

almost surely, where \({\mathcal {M}}_\varepsilon \) and \(\widetilde{\mathcal M}_\varepsilon \) are defined in (21) and (20), respectively. Therefore, due to Assumption 2.1 which allows controlling the norm of \(\mathcal M_\varepsilon ^{-1}\) and due to Lemma C.4 we have for a constant \(C > 0\)

$$\begin{aligned} \lim _{T \rightarrow \infty } \left\| {\widetilde{A}}_k(X^\varepsilon , T) - \widehat{A}_k(X^\varepsilon , T)\right\| _2 \le C \left( \varepsilon + \delta ^{1/2} \right) , \end{aligned}$$

where we remark that \({\widehat{A}}_k(X^\varepsilon , T)\) has a bounded norm for \(\varepsilon \) sufficiently small due to Theorem 3.18. Now, the triangle inequality yields

$$\begin{aligned} \left\| {\widetilde{A}}_k(X^\varepsilon , T) - A\right\| _2 \le \left\| \widetilde{A}_k(X^\varepsilon , T) - {\widehat{A}}_k(X^\varepsilon , T)\right\| _2 + \left\| \widehat{A}_k(X^\varepsilon , T) - A\right\| _2. \end{aligned}$$

Therefore, due to Theorem 3.18, inequality (33) and since \(\delta = \varepsilon ^\zeta \), the desired result holds. \(\square \)

Remark 4.4

One could argue that we could have carried out the whole analysis for the estimator \({\widetilde{A}}_k(X^\varepsilon , T)\) instead of the estimator \({\widehat{A}}_k(X^\varepsilon , T)\). Nevertheless, the latter guarantees the strong result of almost sure convergence when \(\delta \) is independent of \(\varepsilon \), which does not hold for the former. Conversely, analyzing the properties of the estimator \({\widetilde{A}}_k(X^\varepsilon , T)\) is fundamental for the Bayesian setting, in which the matrix \({\widetilde{M}}\) cannot be employed as its symmetric part is not positive definite in general.

In light of the proof of Theorem 4.1, Theorem 4.3 guarantees that the mean of the posterior distribution \({{\widetilde{\mu }}}_{T, \varepsilon }\) converges to the drift coefficient of the homogenized equation. Since the covariance matrix is the same for \(\mu _{T, \varepsilon }\) and \({{\widetilde{\mu }}}_{T, \varepsilon }\), it is possible to prove a positive convergence result for \(\widetilde{\mu }_{T, \varepsilon }\), which is given by the following theorem.

Theorem 4.5

Let the assumptions of Theorem 4.3 hold. Then, the modified posterior measure \({{\widetilde{\mu }}}_{T,\varepsilon }(\cdot \mid X^\varepsilon ) = {\mathcal {N}}({\widetilde{m}}_{T,\varepsilon }, C_{T,\varepsilon })\) satisfies for all \(c > 0\)

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\lim _{T \rightarrow \infty } {{\mathbb {E}}}\left[ \widetilde{\mu }_{T,\varepsilon }\left( \{a:\left\| a-A\right\| _2\ge c\}\mid X^\varepsilon \right) \right] = 0, \end{aligned}$$

where \({{\mathbb {E}}}\) denotes expectation with respect to the Wiener measure and A is the drift coefficient of the homogenized equation (3).


Proof

The result follows from the proof of Theorem 4.1 and from Theorem 4.3. \(\square \)

5 Numerical Experiments

In this section, we show numerical experiments confirming our theoretical findings and showcasing the potential of the filtered data approach to overcome model misspecification arising when multiscale data are used to fit homogenized models.

Remark 5.1

In the numerical experiments, we take the data to be a high-frequency discrete time series from the solution \(X^\varepsilon \) of (1). Let \(\tau > 0\) be the time step at which data are observed, and let \(X^\varepsilon \,\,{:=}\,\,(X^\varepsilon _0, X^\varepsilon _\tau , X^\varepsilon _{2\tau }, \ldots )\). We then compute the estimator \({\widehat{A}}_k\) as

$$\begin{aligned} {\widehat{A}}_{k,\tau }(X^\varepsilon , T) = - {\widetilde{M}}_\tau ^{-1}(X^\varepsilon ) {\widetilde{h}}_\tau (X^\varepsilon ), \end{aligned}$$

where

$$\begin{aligned} {\widetilde{M}}_\tau (X^\varepsilon ) = \frac{\tau }{T} \sum _{j=0}^{n-1} V'(Z^\varepsilon _{j\tau }) \otimes V'(X^\varepsilon _{j\tau }) , \qquad \widetilde{h}_\tau (X^\varepsilon ) = \frac{1}{T} \sum _{j=0}^{n-1} V'(Z^\varepsilon _{j\tau }) (X^\varepsilon _{(j+1)\tau } - X^\varepsilon _{j\tau }). \end{aligned}$$

We take in all experiments \(\tau \ll \varepsilon ^2\), so that the discretization of the data has negligible effects and does not compromise the validity of our theoretical results.
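The discrete estimator of this remark can be sketched in a few lines. The code below is an illustration under stated assumptions rather than the implementation used for the experiments: it assumes the exponential kernel \(k(r) = \delta ^{-1} e^{-r/\delta }\), i.e., \(\beta = 1\) in (9), so that the filtered path solves \(\mathrm {d}Z_t = \delta ^{-1}(X_t - Z_t)\,\mathrm {d}t\) with \(Z_0 = 0\). The function names and the convention that `dV` returns an array of shape (N, n) are hypothetical.

```python
import numpy as np

def exponential_filter(X, tau, delta):
    """Filtered path Z for the kernel exp(-r/delta)/delta (assumed beta = 1).

    Z solves dZ = (X - Z)/delta dt with Z_0 = 0. Freezing X on each step
    of length tau gives the exponential update used below.
    """
    X = np.asarray(X, dtype=float)
    decay = np.exp(-tau / delta)
    Z = np.zeros_like(X)
    for j in range(len(X) - 1):
        Z[j + 1] = decay * Z[j] + (1.0 - decay) * X[j]
    return Z

def filtered_drift_estimator(X, Z, tau, dV):
    """Discrete estimator -M_tilde^{-1} h_tilde from the data X and filter Z.

    dV maps an array of n states to an (N, n) array containing V'(x);
    for N = 1 a one-dimensional array of length n is also accepted.
    """
    X = np.asarray(X, dtype=float)
    Z = np.asarray(Z, dtype=float)
    n = len(X) - 1
    T = n * tau
    dV_Z = np.atleast_2d(dV(Z[:n]))        # V'(Z_{j tau}), j = 0, ..., n-1
    dV_X = np.atleast_2d(dV(X[:n]))        # V'(X_{j tau}), j = 0, ..., n-1
    M_tilde = (tau / T) * dV_Z @ dV_X.T    # sum of outer products
    h_tilde = dV_Z @ np.diff(X) / T        # sum of V'(Z) times increments
    return -np.linalg.solve(M_tilde, h_tilde)
```

For the quadratic potential of Sect. 5.1 one would call `filtered_drift_estimator(X, exponential_filter(X, tau, delta), tau, lambda x: x)`.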

5.1 Parameters of the Filter

For the first preliminary experiments, we consider \(N = 1\) and the quadratic potential \(V(x) = x^2/2\). In this case, the solution of the homogenized equation is an Ornstein–Uhlenbeck process. Moreover, we set the fast potential in the multiscale equation (1) as \(p(y) = \cos (y)\). In all experiments, data are generated employing the Euler–Maruyama method with a fine time step.
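As an illustration of the data generation step, the following sketch applies the Euler–Maruyama method to the setting of this subsection. It assumes that the multiscale equation (1), which is not restated here, reads \(\mathrm {d}X^\varepsilon _t = -\alpha V'(X^\varepsilon _t)\,\mathrm {d}t - \varepsilon ^{-1} p'(X^\varepsilon _t/\varepsilon )\,\mathrm {d}t + \sqrt{2\sigma }\,\mathrm {d}W_t\); the function name, the default arguments and the seeding are hypothetical choices.

```python
import numpy as np

def simulate_multiscale_ou(alpha, sigma, eps, T, dt, x0=0.0, seed=0):
    """Euler-Maruyama path for V(x) = x^2/2 and p(y) = cos(y).

    Assumed form of the SDE:
    dX = -alpha V'(X) dt - (1/eps) p'(X/eps) dt + sqrt(2 sigma) dW.
    """
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    X = np.empty(n + 1)
    X[0] = x0
    noise = np.sqrt(2.0 * sigma * dt) * rng.standard_normal(n)
    for j in range(n):
        # V'(x) = x and p'(y) = -sin(y), so -(1/eps) p'(x/eps) = sin(x/eps)/eps
        drift = -alpha * X[j] + np.sin(X[j] / eps) / eps
        X[j + 1] = X[j] + drift * dt + noise[j]
    return X
```

Since the fast drift has magnitude of order \(\varepsilon ^{-1}\) and varies on the scale \(\varepsilon \), the time step `dt` should be chosen small compared with \(\varepsilon ^2\).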

Fig. 2

Results for Sect. 5.1.1. In both figures, horizontal lines represent \(\alpha \) and A, the drift coefficients of the unhomogenized and homogenized equations, and the grey vertical line represents the lower bound for the validity of Theorem 3.18. The curved lines (dashed, dotted and dash-dotted) represent in figure a the values of \(\widehat{A}_k(X^\varepsilon , T)\) for \(\varepsilon = \{0.1, 0.05, 0.025\}\), respectively, computed with \(T = 10^3\). In figure b, they correspond to the values of \({\widehat{A}}_k(X^\varepsilon , T)\) at \(T = \{100, 300, 1000\}\), respectively, computed with \(\varepsilon = 0.05\). We plot next to both figures a and b a zoom on a neighborhood of \(\varepsilon ^2\) to show the transition between the two regimes highlighted by the theoretical results. Note that the \(\delta \)-axis is in logarithmic scale and is normalized with respect to \(\varepsilon \)

5.1.1 Verification of Theoretical Results

We first demonstrate numerically the validity of Theorem 3.12, Theorem 3.18 and Theorem 3.19, i.e., the unbiasedness of \({\widehat{A}}_k(X^\varepsilon , T)\) for \(\delta = \varepsilon ^\zeta \) with \(\zeta \in [0, 2)\) and its bias for \(\zeta > 2\). Let us recall that for \(\zeta = 0\) the analysis and the theoretical result are fundamentally different from those for \(\zeta \in (0, 2)\). We consider \(\varepsilon \in \{0.1, 0.05, 0.025\}\), the diffusion coefficient \(\sigma = 1\) and generate data \(X^\varepsilon _t\) for \(0 \le t \le T\) with \(T = 10^3\). Then, we filter the data by choosing \(\delta = \varepsilon ^\zeta \) with \(\zeta = 0, 0.1, 0.2,\ldots , 3\), and compute \(\widehat{A}_k(X^\varepsilon , T)\). Results are displayed in Fig. 2 and show that for \(\zeta > 2\), i.e., \(\delta = o(\varepsilon ^2)\), the estimator tends to the drift coefficient \(\alpha \) of the unhomogenized equation. Conversely, as predicted by the theory, for \(\zeta \in [0, 2)\) the estimator tends to A, the drift coefficient of the homogenized equation. Therefore, the point \(\delta = \varepsilon ^2\) acts asymptotically as a switch between two completely different regimes, which is theoretically sharp in the limit for \(T \rightarrow \infty \) and \(\varepsilon \rightarrow 0\). Let us remark that the results displayed in Fig. 2a demonstrate that the transition occurs more rapidly for the smallest values of \(\varepsilon \). Moreover, in Fig. 2b, one can see that with larger final times T the estimator is closer both to A when \(\zeta \in [0, 2)\) and to \(\alpha \) when \(\zeta > 2\). Still, we observe that in finite computations the switch between A and \(\alpha \) is smoother than the theory predicts, which suggests fixing \(\delta = 1\) whenever possible.

Fig. 3

Results for Sect. 5.1.2. The case of \(\delta = 1\) is highlighted as a solid dot for the filtered data technique, as the analysis and theoretical result are different in this case. The three rows correspond to \(\sigma = 0.5, 0.7, 1.0\) from top to bottom, and the dashed line corresponds to the true value of A

5.1.2 Comparison with Subsampling

We now compare the results given by the filtered data technique with the results given by subsampling the data, i.e., the difference between the estimators \({\widehat{A}}_k(X^\varepsilon , T)\) and \(\widehat{A}_\delta (X^\varepsilon , T)\). We fix the multiscale parameter \(\varepsilon = 0.1\) and generate data for \(0 \le t \le T\) with \(T = 10^3\). We choose \(\delta = \varepsilon ^{\zeta }\) and vary \(\zeta \in [0, 1]\), where \(\delta \) is the filtering and the subsampling width, respectively. Moreover, for the filtered data approach we consider both \(\beta = 1\) and \(\beta = 5\). We report in Fig. 3 the experimental results. Let us remark that:

(i) for \(\sigma = 0.5\) the results given by subsampling and by the filter with \(\beta = 1\) are similar, while for higher values of \(\sigma \) the filtered data approach seems better than subsampling;

(ii) in general, choosing a higher value of \(\beta \) seems beneficial for the quality of the estimator;

(iii) the dependence on \(\delta \) of numerical results given by the filter seems relevant only in case \(\beta = 1\) and for small values of \(\sigma \). For \(\beta = 1\) and higher values of \(\sigma \), the estimator is stable with respect to this parameter. The same stability can be observed for a higher value of \(\beta \), but we have no theoretical guarantee in this case.

5.1.3 The Influence of \(\beta \)

We finally test the variability of the estimator with respect to \(\beta \) in (9). We consider \(\delta = \varepsilon \), which corresponds to \(\zeta = 1\) and seems to be the worst-case scenario for the filter, at least for \(\beta = 1\). We consider again \(\sigma = 0.5, 0.7, 1\) and vary \(\beta = 1, 2, \ldots , 10\). Results, given in Fig. 4, show empirically that the estimator stabilizes fast with respect to \(\beta \). Nevertheless, there is no theoretical guarantee supporting this empirical observation.

Fig. 4

Results for the estimator based on filtered data with respect to the parameter \(\beta \) (Sect. 5.1.3). The result for \(\beta =1\), for which there are theoretical guarantees given in Theorem 3.18, is highlighted as a solid dot. From left to right, we consider different values of \(\sigma \), and the dashed line corresponds to the true value of A

5.2 Variance of the Estimators

We now compare the estimators \({\widehat{A}}_k\) based on filtered data and \({\widehat{A}}_\delta \) based on subsampling in terms of variance. We consider for this experiment the SDE (1) with \(N = 1\), the bistable potential \(V(x) = x^4/4 - x^2/2\), the multiscale drift coefficient \(\alpha = 1\), the diffusion coefficient \(\sigma = 1\) and \(\varepsilon = 0.1\). We then let \(X^\varepsilon = (X^\varepsilon _t, 0\le t\le T)\) be the solution of (1) and generate \(N_{\mathrm {s}} = 500\) i.i.d. samples of \(X^\varepsilon \). We then compute the estimators \({\widehat{A}}_k\) and \({\widehat{A}}_\delta \) on each of the realizations of \(X^\varepsilon \), thus obtaining \(N_{\mathrm {s}}\) replicas \(\{{\widehat{A}}_k^{(i)}\}_{i=1}^{N_{\mathrm {s}}}\) and \(\{\widehat{A}_\delta ^{(i)}\}_{i=1}^{N_{\mathrm {s}}}\). For the estimator \({\widehat{A}}_k\), we consider kernel (9) with \(\beta = \{1,5\}\) and with \(\delta = 1\). For the estimator \(\widehat{A}_\delta \), we employ the subsampling width \(\delta = \varepsilon ^{2/3}\), which is heuristically optimal following [29]. It could be argued that another estimator, based on subsampling and shifting, could be employed to reduce the variance. To define it, let \(\tau > 0\) be the time step at which the data are observed; indeed, in practice we work with high-frequency discrete data and observe \(X^\varepsilon \,\,{:=}\,\,(X^\varepsilon _0, X^\varepsilon _\tau , \ldots , X^\varepsilon _{n\tau })\), with \(n\tau = T\). We assume for simplicity that the subsampling width \(\delta \) is a multiple of \(\tau \) and compute for all \(k = 0, 1, \ldots , \delta /\tau -1\)

$$\begin{aligned} {\widehat{A}}_{\delta ,k}(X^\varepsilon ,T) = - \frac{\sum _{j=0}^{n-1} V'(X^\varepsilon _{j\delta + k}) (X^\varepsilon _{(j+1)\delta + k}-X^\varepsilon _{j\delta + k}) }{\delta \sum _{j=0}^{n-1} V'(X^\varepsilon _{j\delta + k})^2}, \end{aligned}$$

i.e., the subsampling estimator obtained by shifting the origin by \(k\tau \). We then average over the index k and obtain the new estimator

$$\begin{aligned} {\widehat{A}}_{\delta }^{\mathrm {avg}}(X^\varepsilon ,T) = \frac{\tau }{\delta } \sum _{k=0}^{\delta /\tau - 1}{\widehat{A}}_{\delta ,k}(X^\varepsilon ,T). \end{aligned}$$

We include this estimator in the numerical study for completeness and compute \(N_{\mathrm {s}}\) replicas of \(\widehat{A}_{\delta }^{\mathrm {avg}}\) on all the realizations of \(X^\varepsilon \). Results, given in Fig. 5 for the final times \(T = \{500, 1000\}\), show that our novel approach does not outperform subsampling in terms of variance, but clearly does in terms of bias. Moreover, we notice numerically that the shifted-averaged estimator \({\widehat{A}}_\delta ^{\mathrm {avg}}\) does not appreciably reduce the variance in this case with respect to \({\widehat{A}}_\delta \). In fact, this is only partly surprising, since the estimators \(\widehat{A}_{\delta ,k}\) of (34) are highly correlated. Finally, we notice that the filtering estimator \({\widehat{A}}_k\) with \(\beta = 5\) has a lower variance than the same estimator with \(\beta = 1\). This is consistent with the earlier observation that choosing a higher value of \(\beta \) improves the estimation of the effective drift coefficient.
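The shifted and averaged subsampling estimator can be sketched as follows for \(N = 1\). This is an illustration rather than the code used for Fig. 5: the function names are hypothetical, and each shifted estimator sums over the subsampled points actually available in the series, which is an assumption about how the sums are truncated.

```python
import numpy as np

def subsampling_estimator(Xs, delta, dV):
    """Drift estimator on a path Xs already subsampled at width delta (N = 1)."""
    Xs = np.asarray(Xs, dtype=float)
    g = dV(Xs[:-1])                      # V' at the left endpoints
    return -np.sum(g * np.diff(Xs)) / (delta * np.sum(g**2))

def shifted_average_estimator(X, tau, delta, dV):
    """Average of the subsampling estimators over all shifts k * tau."""
    X = np.asarray(X, dtype=float)
    stride = int(round(delta / tau))     # delta is assumed a multiple of tau
    estimates = [
        subsampling_estimator(X[k::stride], delta, dV) for k in range(stride)
    ]
    return float(np.mean(estimates))
```

For the bistable potential of this section one would pass `dV = lambda x: x**3 - x`. Because consecutive shifts reuse almost the same data, the individual estimates are strongly correlated, which is consistent with the limited variance reduction reported above.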

Fig. 5

Numerical results for Sect. 5.2. Comparison between the density of the estimator of the drift based on filtered data with \(\beta = \{1,5\}\), the estimator based on subsampling and the estimator based on shift-subsampling and averaging of (35). On the left and on the right, the final time is \(T = \{500, 1000\}\), respectively

5.3 Multidimensional Drift Coefficient

Let us consider the Chebyshev polynomials of the first kind, i.e., the polynomials \(T_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(i=0, 1, \ldots \), defined by the recurrence relation

$$\begin{aligned} T_0(x) = 1, \quad T_1(x) = x, \quad T_{i+1}(x) = 2xT_i(x) - T_{i-1}(x). \end{aligned}$$
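The recurrence can be evaluated directly, together with the derivatives \(T_i'\) that enter the estimators through \(V'\). The sketch below is illustrative; the function name is hypothetical, and the derivative recurrence \(T_{i+1}'(x) = 2T_i(x) + 2xT_i'(x) - T_{i-1}'(x)\) is obtained by differentiating the relation above.

```python
import numpy as np

def chebyshev_with_derivative(x, n_max):
    """Values T_i(x) and derivatives T_i'(x) for i = 0, ..., n_max.

    Both arrays have shape (n_max + 1,) + x.shape and are filled with
    the three-term recurrence and its derivative.
    """
    x = np.asarray(x, dtype=float)
    Tv = np.zeros((n_max + 1,) + x.shape)
    dTv = np.zeros((n_max + 1,) + x.shape)
    Tv[0] = 1.0                          # T_0 = 1, T_0' = 0
    if n_max >= 1:
        Tv[1] = x                        # T_1 = x, T_1' = 1
        dTv[1] = 1.0
    for i in range(1, n_max):
        Tv[i + 1] = 2.0 * x * Tv[i] - Tv[i - 1]
        dTv[i + 1] = 2.0 * Tv[i] + 2.0 * x * dTv[i] - dTv[i - 1]
    return Tv, dTv
```

Rows 1 to 4 of the second output give the components of \(V'\) for the potential considered below.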

We consider the potential function V(x) as in (2) with

$$\begin{aligned} V_i(x) = T_i(x), \quad i =1, \ldots , 4, \end{aligned}$$

thus considering the semi-parametric framework of Remark 2.4. This potential function satisfies Assumption 2.1 whenever N is even and if the leading coefficient \(\alpha _N\) is positive. We set \(N = 4\) and the drift coefficient \(\alpha = (-1, -1/2, 1/2, 1)\). With this drift coefficient, the potential function is of the bistable kind. Moreover, we set \(\varepsilon = 0.05\), the diffusion coefficient \(\sigma = 1\), the fast potential \(p(y) = \cos (y)\) and simulate a trajectory of \(X^\varepsilon \) for \(0 \le t \le T\) with \(T = 10^3\) employing the Euler–Maruyama method with time step \(\varDelta t = \varepsilon ^3\). We estimate the drift coefficient \(A \in {\mathbb {R}}^4\) with the estimators:

(i) \({\widehat{A}}(X^\varepsilon , T)\) based on the data \(X^\varepsilon \) itself;

(ii) \({\widehat{A}}_\delta (X^\varepsilon , T)\) based on subsampled data with subsampling parameter \(\delta = \varepsilon ^{2/3}\);

(iii) \({\widehat{A}}_k(X^\varepsilon , T)\) based on filtered data \(Z^\varepsilon \) computed with \(\beta = 1\) and \(\delta = 1\).

In particular, we pick this specific value of \(\delta \) for the subsampling following the optimality criterion given in [29]. Results, given in Fig. 6, show that the filter-based estimation captures well the homogenized potential as well as the coefficient A. Moreover, we observe that the negative result given in Theorem 2.3 holds in practice, i.e., with no pre-processing the estimator \(\widehat{A}(X^\varepsilon , T)\) tends to the drift coefficient \(\alpha \) of the unhomogenized equation. Finally, we can observe that the subsampling-based estimator fails to capture the homogenized coefficients. Indeed, the estimator strongly depends on the sampling rate and on the diffusion coefficient, as shown in the numerical experiments of [29]. Even though the authors suggest the choice of \(\delta = \varepsilon ^{2/3}\), this is just a heuristic and is not guaranteed to be the optimal value in all cases. In the asymptotic limit of \(\varepsilon \rightarrow 0\) and \(T \rightarrow \infty \), any valid choice of the subsampling rate is guaranteed theoretically to work, but not in the pre-asymptotic regime. Our estimator, conversely, seems to perform better with no particular tuning of the parameters even in this multidimensional case, which supports the robustness of our novel approach.

Fig. 6

Results for Sect. 5.3. In the figure, from left to right, the potential function estimated with the data itself, with the filter and with subsampled data. In the table, numerical results for the single components of the true and estimated drift coefficients

Fig. 7

Results for Sect. 5.4. Posterior distributions over the parameter \(A = (A_1, A_2)^\top \) for the bistable potential obtained with the filtered data approach. The figures refer to final time \(T = 100, 200, 400\) from left to right, respectively. The MLE \({\widetilde{A}}_k(X^\varepsilon , t)\) is represented with a circle, while the true value A of the drift coefficient of the homogenized equation is represented with a cross

5.4 The Bayesian Approach: Bistable Potential

In this numerical experiment, we consider \(N = 2\) and the bistable potential, i.e., the function V defined as

$$\begin{aligned} V(x) = \begin{pmatrix} \dfrac{x^4}{4}&-\dfrac{x^2}{2} \end{pmatrix}^\top , \end{aligned}$$

with coefficients \(\alpha _1 = 1\) and \(\alpha _2 = 2\). We then consider the multiscale equation with \(\sigma = 0.7\), the fast potential \(p(y) = \cos (y)\) and \(\varepsilon = 0.05\), thus simulating a trajectory \(X^\varepsilon \). We adopt here a Bayesian approach and compute the posterior distribution \({{\widetilde{\mu }}}_{T, \varepsilon }\) obtained with the filtered data approach introduced in Sect. 4.1. The parameters of the filter are set to \(\beta = 1\) and \(\delta = \varepsilon \) in (9). Moreover, we choose the non-informative prior \(\mu _0 = {\mathcal {N}}(0,I)\). Let us remark that in order to compute the posterior covariance the diffusion coefficient \(\varSigma \) of the homogenized equation has to be known. In this case, we pre-compute the value of \(\varSigma \) via the coefficient K and the theory of homogenization, but notice that \(\varSigma \) could be estimated either employing the subsampling technique of [29] or using the estimator \({\widehat{\varSigma }}_k\) based on filtered data defined in (12). In particular, in this case \(\varSigma \approx 0.2807\), and we compute numerically

$$\begin{aligned} {\widehat{\varSigma }}_k(X^{\varepsilon }, 100) = 0.2901, \quad \widehat{\varSigma }_k(X^{\varepsilon }, 200) = 0.2835, \quad {\widehat{\varSigma }}_k(X^{\varepsilon }, 400) = 0.2813, \end{aligned}$$

so that employing the estimator \({\widehat{\varSigma }}_k\) instead of the true value would have a negligible effect on the posterior over the effective drift coefficient. We stop the computations at times \(T \in \{100, 200, 400\}\) in order to observe the shrinkage of the Gaussian posterior toward the MLE \({\widetilde{A}}_k(X^\varepsilon , T)\) as T increases. In Fig. 7, we observe that the posterior does indeed shrink toward the MLE, which in turn gets progressively closer to the true value A of the drift coefficient of the homogenized equation.
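As a sanity check on the value \(\varSigma \approx 0.2807\) used in this experiment, the following minimal sketch evaluates the homogenization coefficient numerically. It assumes the standard one-dimensional formula \(K = L^2/(Z {\widehat{Z}})\), with \(Z = \int _0^L e^{-p(y)/\sigma } \, \mathrm {d}y\), \({\widehat{Z}} = \int _0^L e^{p(y)/\sigma } \, \mathrm {d}y\) and period \(L = 2\pi \), together with the relations \(\varSigma = K\sigma \) and \(A = K\alpha \). The function name `homogenization_coefficient` and the quadrature resolution `n` are illustrative choices and not part of the original experiments.

```python
import math


def homogenization_coefficient(p, sigma, period=2.0 * math.pi, n=2000):
    """Approximate K = L^2 / (Z * Z_hat) for an L-periodic fast potential p.

    Assumed formula (standard 1D homogenization result, not taken from this section):
        Z     = int_0^L exp(-p(y) / sigma) dy
        Z_hat = int_0^L exp(+p(y) / sigma) dy
    For a periodic integrand, the rectangle rule on a uniform grid
    coincides with the trapezoidal rule and converges very quickly.
    """
    h = period / n
    z = h * sum(math.exp(-p(i * h) / sigma) for i in range(n))
    z_hat = h * sum(math.exp(p(i * h) / sigma) for i in range(n))
    return period ** 2 / (z * z_hat)


# Parameters of the bistable experiment as stated in the text.
sigma = 0.7
alpha = (1.0, 2.0)

K = homogenization_coefficient(math.cos, sigma)
Sigma = K * sigma                 # effective diffusion coefficient (assumed Sigma = K * sigma)
A = tuple(K * a for a in alpha)   # effective drift coefficient (assumed A = K * alpha)

print(f"K = {K:.4f}, Sigma = {Sigma:.4f}, A = ({A[0]:.4f}, {A[1]:.4f})")
```

Under these assumptions, the computed \(\varSigma \) agrees with the quoted value 0.2807 to four decimal places, and the estimates \({\widehat{\varSigma }}_k\) reported above differ from it by at most about 0.01.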

6 Conclusion

In this work, we considered a novel methodology to address the problem of model misspecification when homogenized models are fit to multiscale data. Our approach is based on using filtered data for the estimation of the drift of the homogenized diffusion process. We proved asymptotic unbiasedness of the estimators obtained with our methodology. Moreover, we introduced a modified Bayesian approach, based on the same filtered data, which guarantees robust uncertainty quantification and posterior contraction. Numerical experiments demonstrate that the estimator based on filtered data requires less knowledge of the characteristic time scales of the multiscale equation than subsampling, and that it can be employed as a black-box tool for parameter estimation on a range of academic examples. We note that in many applications one can only obtain discrete measurements of the diffusion process. Recently, a new estimator for learning homogenized SDEs from noisy discrete data has been introduced in [4], combining the filtering approach developed in this paper with martingale estimating functions. We believe this work paves the way for several further developments. In particular, we believe it would be relevant to

(i) analyze the filtered data approach for \(\beta > 1\) in (9), which seems to give more robust results in practice,

(ii) extend the analysis to the nonparametric framework, most likely by means of Bayesian regularization techniques, thus allowing the recovery of effective drift functions for which a parametric representation does not exist,

(iii) consider multiscale models for which the homogenized equation presents multiplicative noise,

(iv) test the filtered data methodology against real-world data,

(v) apply similar methodologies to correct faulty behavior of other methods.