Abstract
We study the problem of drift estimation for two-scale continuous time series. We set ourselves in the framework of overdamped Langevin equations, for which a single-scale surrogate homogenized equation exists. In this setting, estimating the drift coefficient of the homogenized equation requires preprocessing of the data, often in the form of subsampling; this is because the two-scale equation and the homogenized single-scale equation are incompatible at small scales, generating mutually singular measures on the path space. We avoid subsampling and work instead with filtered data, found by application of an appropriate kernel function, and compute maximum likelihood estimators based on the filtered process. We show that the estimators we propose are asymptotically unbiased and demonstrate numerically the advantages of our method with respect to subsampling. Finally, we show how our filtered data methodology can be combined with Bayesian techniques and provide a full uncertainty quantification of the inference procedure.
Introduction
Efficient parameter estimation for stochastic models is essential in a wide range of applications in natural and social sciences. In several areas, the data originate from phenomena which vary continuously in time and which are endowed with a multiscale structure. This is the case, for example, in molecular dynamics, oceanography and atmospheric science, or in econometrics. Frequently, it is desirable in these areas to infer from data a simpler model which captures effectively large-scale structures, or slow variations, disregarding small-scale fluctuations or treating them as a source of noise. The mismatch between the data and their desired slow-scale representation is a typical instance of a problem of model misspecification, which, if ignored or handled incorrectly, can lead to erroneous inference. Indeed, the data, coming from the full dynamics, are compatible with the coarse-grained model only at the time scales at which the effective dynamics is valid.
In this paper, we consider a simple multiscale setting arising from models of molecular dynamics, with a complete separation between the fast and the slow scale. In particular, we consider diffusion processes for motion in a confining potential which has slow variations with rapid order-one oscillations superimposed. Given data in the form of a sample path from this simple class of model problems, we are interested in determining the drift coefficient of an equation of the overdamped Langevin type in which the fast-scale potential is eliminated. The theory of homogenization guarantees that such a single-scale equation can be uniquely determined, and our goal is therefore to obtain effective coarse-grained dynamics from data consistently with respect to the homogenization result.
Several methods exist to take into account model misspecification in multiscale frameworks such as the one above. For diffusion processes, the proposed approaches rely to varying degrees on subsampling, which has proved effective in many applications, but which nevertheless requires precise knowledge of how separated the two characteristic time scales are. The robustness of this methodology is also dubious, as inference results tend to be extremely sensitive to the subsampling rate.
In the rest of the introduction, we first give a brief overview of the existing literature on the topic of deterministic and stochastic multiscale inference problems, then introduce our novel methodology and its favorable properties and conclude with an outline of this paper.
Literature Review
For simple models in molecular dynamics, the effect of model misspecification was studied in a series of papers [7, 8, 16, 17, 26, 28, 29] under the assumption of scale separation. In particular, for Brownian particles moving in two-scale potentials it was shown that, when fitting data from the full dynamics to the homogenized equation, the maximum likelihood estimator (MLE) is asymptotically biased [29, Theorem 3.4]. To be more precise, in the large sample size limit, the data remain consistent with the multiscale problem at small scales. Ostensibly, this would seem to concern only the estimation of the diffusion coefficient. However, because of detailed balance, it also has the effect that the MLE for the drift, in a parameter fit of a single-scale model, incorrectly identifies the coefficient of the homogenized equation. The bias of the MLE can be eliminated by subsampling at an appropriate rate, which lies between the two characteristic time scales of the problem [29, Theorems 3.5 and 3.6].
Similar techniques can be employed in econometrics, in particular for the estimation of the integrated stochastic volatility in the presence of market microstructure noise. In this case, too, the data have to be subsampled at an appropriate rate [6, 25]. The correct subsampling rate can, in some instances, be rather extreme with respect to the frequency of the data itself, resulting in ignoring as much as \(99\%\) of the time series. As intuition suggests, this significantly increases the variance of the estimator, which is usually mitigated with additional bias corrections and variance reduction procedures. The need for such methodologies is accentuated when data are obtained at high frequency [5, 35].
The problem of extracting largescale variations from multiscale data is studied in atmosphere and ocean science. In this field, too, subsampling the data is necessary to obtain an accurate coarsegrained model [12, 34].
The necessity to subsample the data can be alleviated by using appropriate martingale estimators, as was done in [18, 21]. This class of estimators can be applied to the case where the noise is multiplicative, and also to the case where it is generated by a deterministic chaotic system, as opposed to white noise. Estimators of this family have been applied to time series from paleoclimatic data and marine biology and have been augmented with appropriate model selection methodologies [22].
When the data consist of discrete observations rather than continuous time series, it is possible to employ estimators based on a spectral decomposition of the generator of the stochastic process. Methodologies of this kind have been applied successfully to inference problems for single-scale diffusions [13, 20], as well as, more recently, for multiscale diffusions [14].
Inference for diffusion processes can be naturally performed from a Bayesian perspective. If one focuses on the drift coefficient, the form of the likelihood function guarantees, under a Gaussian prior hypothesis, that the posterior distribution is itself Gaussian. The versatility of the Bayesian approach in the infinite-dimensional case [15, 33] makes it possible to extend the study of inferring the drift of a diffusion process to the nonparametric case [31, 32].
The issue of model misspecification in inverse problems with a multiscale structure has been treated in the context of partial differential equations, too. In particular, it has been shown that it is possible to infer a coarse-grained equation from data coming from the full model and to retrieve, in the large data limit, the correct result [24]. A series of papers [1, 2, 3] focuses on retrieving the full model when the multiscale coefficient is endowed with a specific parametrized structure. Since these problems are ill-posed, the latter is achieved via Tikhonov regularization [1, 24], by adopting a Bayesian approach [2, 24] or by exploiting techniques of Kalman filtering [3]. In [2, 3], the authors highlight the need to account explicitly for the modeling error due to homogenization and apply statistical techniques taken from [10, 11].
Our Contributions
In this paper, we bypass subsampling by designing a methodology based on filtered data. In particular, we smooth the time-series data from the multiscale model by application of an appropriate linear time-invariant filter from the exponential family, and show that doing so allows us to accurately retrieve the drift coefficient of the homogenized model. The methodology we present is straightforward to implement, robust in practice and backed by theory. In particular, we show theoretically and demonstrate via numerical experiments that:

(i)
The smoothing width of the filter can be tuned either to be proportional to the speed of the slow process or to smaller scales, providing in both cases unbiased results for maximum likelihood parameter estimation. Sharp estimates on the minimal width with respect to the multiscale parameter are provided. The unbiasedness results are given in Theorems 3.12 and 3.18 for filtered data in the homogenized and in the multiscale regimes, respectively.

(ii)
We additionally propose in the multiscale regime an estimator of the effective diffusion coefficient based on filtered data, as shown in Theorem 3.20.

(iii)
Estimates obtained with our technique are robust in practice with respect to the parameter of the filter. This is not the case for subsampling, which is strongly influenced by the subsampling frequency. The robustness of our technique is demonstrated via numerical experiments in Sects. 5.1 and 5.3.

(iv)
The entire stream of data is employed, which, in practice, enhances the quality of the filter-based MLE in terms of bias. Moreover, avoiding subsampling, and thus the discretization of the data, allows us to employ continuous-time theoretical tools.

(v)
It is possible to employ the filtered data approach within a continuoustime Bayesian framework by a careful modification of the likelihood function. Under mild hypotheses on the filter parameters, we are able to show that the posterior distributions obtained with our methodology are asymptotically consistent with respect to the drift parameter of the homogenized equation. Our main theoretical result is given in Theorem 4.5, and a numerical experiment for the combination of the filtered data approach and of Bayesian techniques is presented in Sect. 5.4.
Outline
The rest of the paper is organized as follows. In Sect. 2, we introduce the problem and lay the basis of our analysis by setting the main assumptions and notation. In Sect. 3, we present our filtered data methodology, with a particular focus on ergodic properties, on multiscale convergence and, naturally, on the properties of our estimators. In Sect. 4, we introduce the Bayesian framework and show how it can be enhanced by employing filtered data. Finally, in Sect. 5, we demonstrate the effectiveness of our methodology via a series of numerical experiments.
Problem Setting
In this section, we introduce the class of diffusion processes which we treat in this paper and the classical methodology employed for the estimation of the drift. Let \(\varepsilon > 0\) and let us consider the one-dimensional multiscale stochastic differential equation (SDE)
where, given a positive integer N, we have that \(\alpha \in {\mathbb {R}}^N\) and \(\sigma > 0\) are the drift and diffusion coefficients, respectively, and \(W_t\) is a standard one-dimensional Brownian motion. The functions \(V:{\mathbb {R}}\rightarrow {\mathbb {R}}^N\) and \(p:{\mathbb {R}}\rightarrow {\mathbb {R}}\) define the slow-scale and the fast-scale confining potentials, respectively. In particular, we assume
for smooth functions \(V_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(i = 1, \ldots , N\). Moreover, we assume p to be smooth and periodic of period L. The theory of homogenization [9, Chapter 3] guarantees the existence of an SDE of the form
such that \(X_t^\varepsilon \rightarrow X_t\) for \(\varepsilon \rightarrow 0\) in law as random variables in \({\mathcal {C}}^0([0, T]; {\mathbb {R}})\). In particular, we have \(A = K\alpha \) and \(\varSigma = K \sigma \), where the coefficient \(0<K<1\) is given by the formula
with
and where the function \(\varPhi \) is the unique solution with zero mean with respect to the measure \(\mu \) of the two-point boundary value problem
endowed with periodic boundary conditions. Let us remark that in this one-dimensional setting it is possible to determine \(\varPhi \) explicitly, and the homogenization coefficient K is given by
where
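As a concrete illustration, in this one-dimensional reversible setting the classical explicit expression is \(K = L^2/(\widehat{Z} Z)\), with \(\widehat{Z} = \int_0^L e^{p(y)/\sigma }\,\mathrm {d}y\) and \(Z = \int_0^L e^{-p(y)/\sigma }\,\mathrm {d}y\). The following sketch (the choice \(p(y) = \cos (y)\) and all numerical values are ours, for illustration only) evaluates this expression by quadrature:

```python
import numpy as np

def homogenization_coefficient(p, sigma, L, n=200_001):
    """K = L^2 / (Z_hat * Z), with Z_hat = int_0^L exp(+p/sigma) dy
    and Z = int_0^L exp(-p/sigma) dy (trapezoidal quadrature)."""
    y, dy = np.linspace(0.0, L, n, retstep=True)
    f_plus = np.exp(p(y) / sigma)
    f_minus = np.exp(-p(y) / sigma)
    z_hat = dy * (f_plus.sum() - 0.5 * (f_plus[0] + f_plus[-1]))
    z = dy * (f_minus.sum() - 0.5 * (f_minus[0] + f_minus[-1]))
    return L**2 / (z_hat * z)

# Fast-scale potential p(y) = cos(y) with period L = 2*pi.
K = homogenization_coefficient(np.cos, sigma=1.0, L=2.0 * np.pi)
print(K)  # 0 < K < 1, as guaranteed by the theory
```

For \(p(y) = \cos (y)\) and \(\sigma = 1\), both integrals reduce to \(2\pi I_0(1)\), where \(I_0\) is the modified Bessel function, so that \(K = I_0(1)^{-2} \approx 0.62\); for \(p \equiv 0\), the coefficient is \(K = 1\) and the multiscale equation coincides with its homogenized limit.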
We now briefly present the classical methodology for estimating the drift coefficient. Let \(T > 0\) and let \(X \,\,{:=}\,\,(X_t, 0\le t \le T)\) be a realization of the solution of (3) up to final time T. Girsanov’s change of measure formula applied to (3) allows us to write the likelihood of X given a drift coefficient A as
where
Minimizing the functional \(I(X \mid A)\) with respect to A therefore gives the maximum likelihood estimator (MLE) of A, which can be formally computed in closed form as
where \(M(X)\in {\mathbb {R}}^{N\times N}\) and \(h(X)\in {\mathbb {R}}^N\) are defined as
where \(\otimes \) denotes the outer product in \({\mathbb {R}}^N\). Let us now state the assumptions which will be employed throughout the rest of our work. In particular, we consider the same dissipative setting as [29, Assumption 3.1].
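As a minimal numerical sketch of the closed-form MLE (our own illustration, with \(N = 1\) and \(V(x) = x^2/2\), so that \(V'(x) = x\) and both M and h reduce to scalars, and assuming the homogenized equation takes the standard overdamped form \(\mathrm {d}X_t = -A V'(X_t)\,\mathrm {d}t + \sqrt{2\varSigma }\,\mathrm {d}W_t\)), one can simulate the dynamics by Euler–Maruyama and evaluate \({\widehat{A}}(X, T) = M(X)^{-1} h(X)\) with Riemann and Itô sums:

```python
import numpy as np

rng = np.random.default_rng(0)

# Homogenized dynamics dX = -A * X dt + sqrt(2 * Sigma) dW
# (N = 1, V(x) = x^2 / 2, so V'(x) = x).
A_true, Sigma, dt, T = 1.0, 0.5, 1e-3, 200.0
n = int(T / dt)
X = np.empty(n + 1)
X[0] = 0.0
noise = rng.standard_normal(n) * np.sqrt(2 * Sigma * dt)
for k in range(n):
    X[k + 1] = X[k] - A_true * X[k] * dt + noise[k]

# MLE: A_hat = M^{-1} h with M = (1/T) int V'(X)^2 dt and
# h = -(1/T) int V'(X) dX, discretized as Riemann / Ito sums.
M = np.sum(X[:-1] ** 2) * dt / T
h = -np.sum(X[:-1] * np.diff(X)) / T
A_hat = h / M
print(A_hat)  # close to A_true for large T
```

Since the data here come from the single-scale model itself, the estimator is consistent; the interesting phenomena of this paper arise when the same formula is fed multiscale data.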
Assumption 2.1
The potentials p and V satisfy

(i)
\(p \in {\mathcal {C}}^\infty ({\mathbb {R}})\) and is L-periodic for some \(L > 0\);

(ii)
\(V_i \in {\mathcal {C}}^\infty ({\mathbb {R}})\) for all \(i=1, \ldots , N\) is polynomially bounded from above and bounded from below, and there exist \(a,b > 0\) such that
$$\begin{aligned} -\alpha \cdot V'(x) x \le a - bx^2; \end{aligned}$$ 
(iii)
\(V'\) is Lipschitz continuous, i.e., there exists a constant \(C > 0\) such that
$$\begin{aligned} \left\| V'(x) - V'(y)\right\| _2 \le C\left| x - y\right| , \end{aligned}$$and the components \(V'_i\) are polynomially bounded for all \(i = 1, \ldots , N\);

(iv)
for all \(T > 0\), the symmetric matrix M(X) is positive definite and there exists \({\bar{\lambda }} > 0\) such that \(\lambda _{\min }(M(X)) \ge {\bar{\lambda }}\).
Remark 2.2
In the following, in particular in the proof of Lemma 3.3, we will employ Assumption 2.1(ii) for the whole drift of the SDE (1), i.e., the function
Since \(p \in C^\infty ({\mathbb {R}})\) and is periodic, all derivatives of p are in \(L^\infty ({\mathbb {R}})\). Therefore, the assumption above is sufficient for \(V^\varepsilon \) to satisfy Assumption 2.1(ii) with different values for a and b. In particular, assume Assumption 2.1(ii) holds for V. Then, we have for all \(\gamma > 0\) by Young’s inequality
Hence, Assumption 2.1(ii) holds for \(V^\varepsilon \) with a coefficient b which is arbitrarily close to the coefficient for V alone.
Under these assumptions, the MLE given in (7) is indeed the unique minimizer of the likelihood function, as shown in [31, Theorem 2.4].
Let us consider the modified estimator of the drift coefficient obtained by replacing X with \(X^\varepsilon \,\,{:=}\,\,(X_t^\varepsilon , 0 \le t \le T)\), the solution of (1), i.e.,
where \(I(X^\varepsilon \mid A)\), the matrix \(M(X^\varepsilon )\) and the vector \(h(X^\varepsilon )\) are obtained by replacing each occurrence of X with \(X^\varepsilon \). In the following, we assume that Assumption 2.1(iv) holds for the matrix \(M(X^\varepsilon )\) as well, and we simply write \(M \,\,{:=}\,\,M(X^\varepsilon )\) and \(h \,\,{:=}\,\,h(X^\varepsilon )\) when there is no ambiguity. Given the convergence \(X^\varepsilon \rightarrow X\) in the space of continuous stochastic processes, one would expect the MLE (8) to be asymptotically unbiased for the drift coefficient A of the homogenized equation (3). Instead, it is possible to prove that in the asymptotic limit for \(T \rightarrow \infty \) and \(\varepsilon \rightarrow 0\), the MLE tends to the drift coefficient \(\alpha \) of the unhomogenized equation (1). We report here this result, whose proof can be found for the case \(N = 1\) in [29, Theorem 3.4]. We remark that the proof for \(N > 1\) follows directly from the one-dimensional case.
Theorem 2.3
Let Assumption 2.1 hold and let \(X^\varepsilon _0\) be distributed according to the invariant measure of the process \(X^\varepsilon \) solution of (1). Then,
where \(\alpha \) is the drift coefficient of Eq. (1).
As anticipated in the introduction, the main existing tool for obtaining unbiased estimators in the literature is subsampling the data. In particular, let the dimension of the parameter be \(N = 1\), let \(\delta > 0\) and let \(T = n\delta \) with n a positive integer. Then, a subsampled estimator for A is given by
which is a discretized version of \({\widehat{A}}(X^\varepsilon , T)\). It is possible to show [29, Theorem 3.5] that, choosing \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0, 1)\), the estimator \({\widehat{A}}_\delta (X^\varepsilon , T)\) is asymptotically unbiased for A in the limit \(\varepsilon \rightarrow 0\), in probability. Despite being widely employed in practice, estimators based on subsampling present some drawbacks, such as a high variance, as mentioned in the introduction. In the following, we introduce and analyze a novel approach to drift estimation.
Estimating the effective diffusion coefficient \(\varSigma \) of the homogenized SDE (3) is a relevant problem as well. Indeed, knowing \(\varSigma \) in addition to the drift coefficient A gives a complete estimation of the model (3), which is effective for the multiscale data generated by (1) in the sense of homogenization theory. The standard approach for estimating the diffusion coefficient is to compute the quadratic variation of the path. In [29, Theorem 3.4], the authors show that this approach fails if the data are not preprocessed, meaning that the quadratic variation of \(X^\varepsilon \) equals the diffusion coefficient \(\sigma \) of (1), even in the limit \(\varepsilon \rightarrow 0\). They therefore propose the estimator \({\widehat{\varSigma }}_\delta \) based on subsampling, which tends to the effective diffusion coefficient \(\varSigma \) [29, Theorem 3.6]. Although the main focus of this work is the effective drift coefficient, we propose in the following an unbiased estimator for the effective diffusion coefficient which fits our novel approach.
Remark 2.4
We note that our framework may be viewed in the same semiparametric setting as that of [21]. In particular, the functions \(V_i\), \(i=1, \ldots , N\), can be seen as the known basis functions of an expansion (e.g., a Taylor expansion) for the unknown confining potential \(V_\alpha :{\mathbb {R}}\rightarrow {\mathbb {R}}\) given by
A numerical example highlighting the potential of our method in such a setting is given in Sect. 5.3.
Remark 2.5
Let us remark that, to enhance the clarity of the exposition, in this article we choose to focus on the case of a multidimensional parameter in the setting of one-dimensional diffusion processes. In fact, all the theory we present in the following could be generalized to the d-dimensional version of the SDE (1), which can be written as
where \(W_t\) is a standard d-dimensional Brownian motion. Slight modifications of the proofs show that results analogous to ours may be obtained in the d-dimensional case.
The Filtered Data Approach
In this section, we introduce and analyze a novel approach based on filtered data to address the issue that the MLE estimator, when confronted with multiscale data, is biased. Let \(\beta , \delta > 0\) and let us consider a family of exponential kernel functions \(k :{\mathbb {R}}^+ \rightarrow {\mathbb {R}}\) defined as
where \(C_{\beta }\) is the normalizing constant given by
so that
and where \(\varGamma (\cdot )\) is the gamma function. We consider the process \(Z^\varepsilon \,\,{:=}\,\,(Z^\varepsilon _t, 0 \le t \le T)\) defined by the weighted average
The process \(Z^\varepsilon \) can be interpreted as a smoothed version of the original trajectory \(X^\varepsilon \). In fact, in the field of signal processing, kernel (9) belongs to the class of low-pass linear time-invariant filters, which cut the high frequencies in a signal to highlight its slowest components. In the following, rigorous analysis is conducted only for \(\beta = 1\). Nonetheless, numerical experiments show that for higher values of \(\beta \) the performance of estimators computed employing the filter is more robust and qualitatively better.
Remark 3.1
Given a trajectory \(X^\varepsilon \), it is computationally inexpensive to obtain \(Z^\varepsilon \). In particular, the process \(Z^\varepsilon \) is the truncated convolution of the kernel with the process \(X^\varepsilon \). Hence, computational tools based on the fast Fourier transform (FFT) exist and allow \(Z^\varepsilon \) to be computed efficiently. Moreover, in case \(\beta = 1\), the process \(Z^\varepsilon \) can be computed in a recursive manner and therefore “online.”
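For \(\beta = 1\), the kernel is \(k(r) = \delta ^{-1} e^{-r/\delta }\) and the truncated convolution admits an exact one-step recursion, which is the basis of the online computation mentioned above. A minimal sketch (our own, assuming samples on a uniform grid and a standard zero-order-hold approximation of the signal between grid points):

```python
import numpy as np

def exp_filter(x, dt, delta):
    """Causal exponential filter Z_t = (1/delta) int_0^t
    exp(-(t-s)/delta) x_s ds, computed recursively (zero-order hold)."""
    a = np.exp(-dt / delta)
    z = np.empty_like(x)
    z[0] = 0.0  # Z_0 = 0: the convolution window is empty at t = 0
    for k in range(len(x) - 1):
        z[k + 1] = a * z[k] + (1.0 - a) * x[k]
    return z

# Sanity check on a constant signal: Z_t = 1 - exp(-t/delta) exactly.
dt, delta, T = 1e-3, 0.5, 5.0
t = np.arange(0.0, T, dt)
z = exp_filter(np.ones_like(t), dt, delta)
print(z[-1])  # approx 1 - exp(-T/delta), close to 1 for T >> delta
```

The recursion updates one state per new observation, so the filtered trajectory can be maintained in O(1) memory as the data stream arrives.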
Given a trajectory \(X^\varepsilon \) and the filtered data \(Z^\varepsilon \), the estimator of the drift coefficient we propose is given by
where we employ the subscript k for reference to the filter’s kernel in (9), and where
For economy of notation, we drop explicit reference to the dependence of \({\widetilde{M}}\) and \({\widetilde{h}}\) on \(X^\varepsilon \). Let us remark that the formula above is obtained from (8) by replacing only one instance of \(X_t^\varepsilon \) with \(Z_t^\varepsilon \) in both M and h. In particular, it is fundamental for proving unbiasedness to keep in the definition of h the differential of the original process \(\mathrm {d}X^\varepsilon _t\) (see Remark 3.7). Let us furthermore remark that \({\widehat{A}}_k(X^\varepsilon , T)\) need not be the minimizer of some likelihood function based on filtered data. In fact, if one were to replace \(Z_t^\varepsilon \) directly in (6), the symmetric part of the matrix \({\widetilde{M}}\) would appear and \(\widehat{A}_k(X^\varepsilon , T)\) would not be the minimizer. Therefore, the estimator \({\widehat{A}}_k(X^\varepsilon , T)\) has to be thought of as a perturbation of \({\widehat{A}}(X^\varepsilon , T)\), directly at the level of estimators and after the maximization procedure. The only theoretical guarantee which is still needed for the well-posedness of \(\widehat{A}_k(X^\varepsilon , T)\) is the invertibility of \({\widetilde{M}}\), which we assume to hold and which we observed to hold in practice.
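In the scalar case \(N = 1\) with \(V'(x) = x\), the estimator reduces to \({\widehat{A}}_k = -\int_0^T Z_t^\varepsilon \,\mathrm {d}X_t^\varepsilon \big / \int_0^T Z_t^\varepsilon X_t^\varepsilon \,\mathrm {d}t\). The following sketch of its discrete evaluation is our own (the simulation parameters and the choice \(p(y) = \cos (y)\) are illustrative; the filter is applied in the homogenized regime, with \(\delta \) independent of \(\varepsilon \)):

```python
import numpy as np

rng = np.random.default_rng(2)

# Multiscale path (N = 1, V(x) = x^2/2, p(y) = cos(y)).
alpha, sigma, eps, dt, T = 1.0, 0.5, 0.1, 5e-4, 200.0
n = int(T / dt)
X = np.empty(n + 1)
X[0] = 0.0
noise = rng.standard_normal(n) * np.sqrt(2 * sigma * dt)
for k in range(n):
    drift = -alpha * X[k] + np.sin(X[k] / eps) / eps
    X[k + 1] = X[k] + drift * dt + noise[k]

# Exponential filter with beta = 1 and width delta = 1,
# independent of eps, computed recursively.
delta = 1.0
a = np.exp(-dt / delta)
Z = np.empty_like(X)
Z[0] = 0.0
for k in range(n):
    Z[k + 1] = a * Z[k] + (1.0 - a) * X[k]

# Filtered-data estimator: one occurrence of X replaced by Z,
# keeping the original differential dX (cf. (10)).
M_tilde = np.sum(Z[:-1] * X[:-1]) * dt / T
h_tilde = -np.sum(Z[:-1] * np.diff(X)) / T
A_k = h_tilde / M_tilde
print(A_k)  # pulled toward the homogenized coefficient A = K*alpha < alpha
```

In contrast with the full-data MLE, which converges to \(\alpha \), the filtered estimate lands well below \(\alpha \), in line with the behavior established in Theorem 3.12.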
We now consider the diffusion coefficient and propose the estimator for \(\varSigma \) in (3) given by
where again we employ the subscript k for reference to kernel (9) of the filter. As we will show in the following, and in particular in Theorem 3.20, the estimator \({\widehat{\varSigma }}\) is unbiased for the effective diffusion coefficient \(\varSigma \) in case \(\beta = 1\) and when we filter data in the multiscale regime, i.e., when \(\delta \) is a vanishing function of \(\varepsilon \).
Let us from now on consider \(\beta = 1\). For this value of \(\beta \), the parameter \(\delta \) appearing in (9) regulates the width of the filtering window. In practice, larger values of \(\delta \) lead to trajectories which are smoother and for which fast-scale oscillations are practically canceled. Let us remark that the filtering width resembles the subsampling step employed for the estimator \({\widehat{A}}_\delta (X^\varepsilon , T)\) introduced and analyzed in [29]. For subsampling, the choice guaranteeing asymptotically unbiased results is \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0, 1)\), and a similar analysis is needed for our technique. For visualization purposes, we depict in Fig. 1 the filtered trajectory \(Z^\varepsilon \) for three different values of \(\delta \), namely \(\delta \in \{1, \sqrt{\varepsilon }, \varepsilon \}\). With \(\delta = 1\), all oscillations at the fast scale are canceled and the filtered trajectory \(Z^\varepsilon \) presents only slow-scale variations. As the value of \(\delta \) is reduced, fast-scale oscillations are progressively retained.
In the following, we first focus on the ergodic properties of the process \(Z^\varepsilon \) when it is coupled with the process \(X^\varepsilon \). This analysis is practically independent of the choice of \(\delta \) and is therefore presented on its own. Then, we focus on two different cases which depend on the choice of the width \(\delta \) of the filter. First, in Sect. 3.2, we consider \(\delta \) to be independent of \(\varepsilon \), and therefore we filter at the speed of the homogenized process. In this case, we are able to prove that our estimator of the drift coefficient of the homogenized equation is asymptotically unbiased almost surely. This result is presented in Theorem 3.12. We then move on in Sect. 3.3 to the case \(\delta \propto \varepsilon ^\zeta \), which corresponds to filtering the data at the speed of the multiscale process. In this case, we show that, under some conditions on the exponent \(\zeta \), we can still obtain estimators which are asymptotically unbiased in probability. This result is proved in Theorem 3.18. For this second case, we rely extensively on techniques and estimates from [29].
Ergodic Properties
Let us consider the filtering kernel (9) with \(\beta = 1\), i.e.,
In this case, the Leibniz integral rule yields the equality
which can be interpreted as an ordinary differential equation for \(Z_t^\varepsilon \) driven by the stochastic signal \(X^\varepsilon \). Considering the processes \(X^\varepsilon \) and \(Z^\varepsilon \) together, we obtain the system of two one-dimensional SDEs
The first ingredient for establishing the ergodic properties of the two-dimensional process \((X^\varepsilon , Z^\varepsilon )^\top \,\,{:=}\,\,((X^\varepsilon _t, Z^\varepsilon _t)^\top , 0 \le t \le T)\) is verifying that the measure induced by the stochastic process admits a smooth density with respect to the Lebesgue measure. Since noise is present only in the first component, this is a consequence of the theory of hypoellipticity, as summarized in the following lemma, whose proof is given in “Appendix A.”
Lemma 3.2
Let \((X^\varepsilon , Z^\varepsilon )^\top \) be the solution of (13) and let \(\mu ^\varepsilon _t\) be the measure induced by the joint process at time t. Then, the measure \(\mu ^\varepsilon _t\) admits a smooth density \(\rho ^\varepsilon _t\) with respect to the Lebesgue measure.
Once it is established that the law of the process admits a smooth density for all times \(t>0\), which satisfies a time-dependent Fokker–Planck equation, we are interested in the limiting properties of this law. In particular, we know that the process \(X^\varepsilon \) alone is geometrically ergodic [23, Theorem 4.4], and we wish the couple \((X^\varepsilon , Z^\varepsilon )^\top \) to inherit the same property. The following lemma guarantees that the couple is indeed geometrically ergodic; its proof is given in “Appendix A.”
Lemma 3.3
Let Assumption 2.1 hold and let \(b > 0\) be given in Assumption 2.1(ii). Then, if \(\delta > 1/(4b)\), the process \((X^\varepsilon , Z^\varepsilon )^\top \) solution of (13) is geometrically ergodic, i.e., there exist \(C, \lambda > 0\) such that for all measurable \(f:{\mathbb {R}}^2\rightarrow {\mathbb {R}}\) such that for some integer \(q > 0\)
it holds
for \(\rho ^\varepsilon \)-a.e. couple \((X_0^\varepsilon , Z_0^\varepsilon )^\top \), where \({{\mathbb {E}}}\) denotes expectation with respect to the Wiener measure, and \(\rho ^\varepsilon \) is the solution to the stationary Fokker–Planck equation
Remark 3.4
The condition \(\delta > 1 / (4b)\) is not very restrictive. Let the parameter dimension be \(N = 1\) and let \(V(x) \propto x^{2r}\) for an integer \(r > 1\). Then, Assumption 2.1(ii) holds for an arbitrarily large \(b > 0\). Therefore, the parameter \(\delta \) of the filter can be chosen anywhere along the positive real axis. A similar argument can be employed for higher dimensions \(N > 1\).
In the general case, it is not possible to find an explicit solution to (14). Nevertheless, it is possible to show some relevant properties of the solution itself, which are summarized in the following lemma, whose proof is given in “Appendix A.”
Lemma 3.5
Under the assumptions of Lemma 3.3, let \(\rho ^\varepsilon \) be the solution of (14) and let us write
where \(\varphi ^\varepsilon \) and \(\psi ^\varepsilon \) are the marginal densities of \(X^\varepsilon \) and \(Z^\varepsilon \), respectively, i.e.,
Then, it holds
where
Moreover, it holds
Remark 3.6
Lemma 3.5, and in particular equality (17), plays a fundamental role in the proof of unbiasedness of the estimator based on filtered data. In particular, this equality allows us to bypass the explicit knowledge of the function R(x, z), which governs the correlation between the processes \(X^\varepsilon \) and \(Z^\varepsilon \) at stationarity and for which a closed-form expression is not available in the general case.
Remark 3.7
Let us return to the definition of \({\widehat{A}}_k\) and replace the differential \(\mathrm {d}X^\varepsilon _t\) with \(\mathrm {d}Z^\varepsilon _t\) in \({\widetilde{h}}\). In this case, it holds
where the last equality is obtained as in the proof of Lemma 3.5, with the choice \(f(x,z) = V(z)\) at the last line. Therefore, we stress again that it is indeed necessary to employ the original differential \(\mathrm {d}X^\varepsilon _t\) in the vector \({\widetilde{h}}\) in the definition (10) of \(\widehat{A}_k^\varepsilon \).
Remark 3.8
Let us consider kernel (9) with \(\beta > 1\). In this case, the steps leading to system (13) do not yield a system of Itô SDEs, but rather of stochastic delay differential equations. The analysis of the estimator in the case \(\beta > 1\) is therefore based on different arguments than those we present in this work.
Filtered Data in the Homogenized Regime
In this section, we analyze the behavior of the estimator \(\widehat{A}_k(X^\varepsilon , T)\) based on filtered data given in (10) when the filtering width \(\delta \) is independent of \(\varepsilon \). The analysis in this case is based on the convergence of the couple \((X^\varepsilon , Z^\varepsilon )^\top \) in the limit of the multiscale parameter \(\varepsilon \rightarrow 0\). In particular, it is known that the invariant measure of \(X^\varepsilon \) converges weakly to the invariant measure of X, the solution of the homogenized equation (3). The following result guarantees the same kind of convergence for the couple \((X^\varepsilon , Z^\varepsilon )^\top \).
Lemma 3.9
Under Assumption 2.1, let \(\mu ^\varepsilon \) be the invariant measure of the couple \((X^\varepsilon , Z^\varepsilon )^\top \). If \(\delta \) is independent of \(\varepsilon \), then the measure \(\mu ^\varepsilon \) converges weakly to the measure \(\mu ^0(\mathrm {d}x, \mathrm {d}z) = \rho ^0(x, z) \,\mathrm {d}x \,\mathrm {d}z\), whose density \(\rho ^0\) is the unique solution of the Fokker–Planck equation
where A and \(\varSigma \) are the coefficients of the homogenized equation (3).
Proof
Let \((X, Z)^\top \,\,{:=}\,\,\left( (X_t, Z_t)^\top , 0\le t \le T\right) \) be the solution of
with \((X_0, Z_0)^\top \sim \mu ^0\). The arguments of Sect. 3.1 can be repeated to conclude that the invariant measure of \((X, Z)^\top \) admits a smooth density \(\rho ^0\) which satisfies (18). Moreover, standard homogenization theory (see, e.g., [9, Chapter 3, Theorem 6.4] or [30, Theorem 18.1]) guarantees that \((X^\varepsilon ,Z^\varepsilon )^\top \rightarrow (X,Z)^\top \) for \(\varepsilon \rightarrow 0\) in law as random variables with values in \({\mathcal {C}}^0([0, T]; {\mathbb {R}}^2)\), provided that \((X_0^\varepsilon , Z_0^\varepsilon )^\top \sim \mu ^\varepsilon \). We remark that traditionally the initial conditions are assumed to satisfy \((X_0^\varepsilon , Z_0^\varepsilon )^\top = (X_0, Z_0)^\top \) for the homogenization result to hold, but the proof of, e.g., [30, Theorem 18.1] holds with a minor modification in case both the multiscale and the homogenized processes are at stationarity. Denoting \(E = C^0([0,T], {\mathbb {R}}^2)\), this means that the measure induced by \((X^\varepsilon , Z^\varepsilon )^\top \) on \((E, {\mathcal {B}}(E))\) converges weakly to the measure induced by \((X, Z)^\top \) on the same measurable space (see, e.g., [30, Definition 3.24]). Hence, the measure \(\mu ^\varepsilon \) converges weakly to \(\mu ^0\) for \(\varepsilon \rightarrow 0\). \(\square \)
Example 3.10
A closed-form solution of (18) can be obtained in a simple case. Let the dimension of the parameter be \(N=1\) and let \(V(x) = x^2/2\). Then, the analytical solution is given by
where
This is the density of a multivariate normal distribution \(\mathcal N(0, \varGamma )\), where the covariance matrix is given by
Let us remark that this distribution can be obtained from direct computations involving Gaussian processes. In particular, we have that X is in this case an Ornstein–Uhlenbeck process and it is therefore known that \(X \sim \mathcal {GP}(m_t, {\mathcal {C}}(t, s))\), where at stationarity \(m_t = 0\) and
The basic properties of Gaussian processes imply that Z is a Gaussian process and that the couple \((X, Z)^\top \) is a Gaussian process, too, whose mean and covariance are computable explicitly.
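To make the Gaussian computation concrete, the stationary second moments can be derived directly, assuming the exponential kernel (the \(\beta = 1\) case of (9)), for which the filtered process solves \(\mathrm{d}Z_t = \delta^{-1}(X_t - Z_t)\,\mathrm{d}t\); the following is a sketch under that assumption, not a quotation of the formula in the text:

```latex
% Stationary moments for dX = -A X dt + sqrt(2 \varSigma) dW,  dZ = (X - Z)/\delta dt.
% Since Z has no diffusion part, d[X, Z] = 0, and Ito's formula gives at stationarity:
\begin{aligned}
0 &= \tfrac{\mathrm{d}}{\mathrm{d}t}\,{\mathbb {E}}[X_t Z_t]
   = -A\,{\mathbb {E}}[XZ] + \tfrac{1}{\delta }\bigl({\mathbb {E}}[X^2] - {\mathbb {E}}[XZ]\bigr)
  \;\Longrightarrow\; {\mathbb {E}}[XZ] = \frac{{\mathbb {E}}[X^2]}{1 + A\delta } = \frac{\varSigma }{A(1 + A\delta )}, \\
0 &= \tfrac{\mathrm{d}}{\mathrm{d}t}\,{\mathbb {E}}[Z_t^2]
   = \tfrac{2}{\delta }\bigl({\mathbb {E}}[XZ] - {\mathbb {E}}[Z^2]\bigr)
  \;\Longrightarrow\; {\mathbb {E}}[Z^2] = {\mathbb {E}}[XZ].
\end{aligned}
```

In this sketch the covariance matrix reads \(\varGamma = \frac{\varSigma }{A}\begin{pmatrix} 1 &{} (1+A\delta )^{-1} \\ (1+A\delta )^{-1} &{} (1+A\delta )^{-1} \end{pmatrix}\), which degenerates to the covariance of X alone as \(\delta \rightarrow 0\), consistent with Z tracking X for small filtering widths.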
We now present a result for the limit distribution analogous to Lemma 3.5.
Corollary 3.11
Let \(\rho ^0\) be the solution of (18) and let us write
where \(\varphi ^0\) and \(\psi ^0\) are the marginal densities, i.e.,
Then, if A and \(\varSigma \) are the coefficients of the homogenized equation (3), it holds
Moreover, it holds
Proof
The proof is directly obtained from Lemma 3.5 setting \(p(y)=0\) and replacing \(\alpha , \sigma \) by \(A, \varSigma \), respectively. \(\square \)
Let us introduce some notation which will be used throughout the rest of the paper. We denote
i.e., \(\widetilde{{\mathcal {M}}}_\varepsilon \) is obtained in the limit \(T \rightarrow \infty \) by applying the ergodic theorem elementwise to the matrix \({\widetilde{M}}\), and \(\widetilde{{\mathcal {M}}}_0\) is the limit as \(\varepsilon \rightarrow 0\) of the matrix \(\widetilde{{\mathcal {M}}}_\varepsilon \), which exists due to Lemma 3.9. For completeness, we introduce here the symmetric matrices \({\mathcal {M}}_\varepsilon \) and \({\mathcal {M}}_0\), which are defined as
and which will be employed in the following. We can now state the main result, namely the convergence of the estimator of the drift coefficient of the homogenized equation based on filtered data.
Theorem 3.12
Let the assumptions of Lemmas 3.3 and 3.9 hold, and let \({\widehat{A}}_k(X^\varepsilon , T)\) be defined in (10) with \(\delta \) independent of \(\varepsilon \). If \({\widetilde{M}}\) is invertible, then
where A is the drift coefficient of the homogenized equation (3).
Proof
Substituting the expression for \(\mathrm {d}X^\varepsilon _t\) into (11), we get for \({\widetilde{h}}\)
Therefore, we have
We study the terms \(I_1^\varepsilon (T)\) and \(I_2^\varepsilon (T)\) separately. First, the ergodic theorem applied to \(I_1^\varepsilon (T)\) yields
Substituting decomposition (15) and expression (16) for \(\varphi ^\varepsilon \), and integrating by parts, we have
which implies
Substituting the equality above into (23), we obtain
Due to Lemma 3.5, we therefore have
Since \(\delta \) is independent of \(\varepsilon \), we can pass to the limit as \(\varepsilon \) goes to zero and Lemma 3.9 yields
Due to Corollary 3.11, we have
and moreover, an integration by parts yields
We can therefore conclude that
We now consider the second term \(I_2^\varepsilon (T)\) and rewrite it as
where
The ergodic theorem yields
where \(R^\varepsilon \) is bounded uniformly in \(\varepsilon \) due to the theory of homogenization, Assumption 2.1(iii)–(iv) and Lemma C.1. Moreover, again by Lemma C.1 and Assumption 2.1(iii), we have that \(V'(Z^\varepsilon )\) is square integrable, and hence, the strong law of large numbers for martingales implies
independently of \(\varepsilon \). Therefore,
which, together with (26) and (22), proves the desired result. \(\square \)
Remark 3.13
Let us remark that the assumption that \(\delta \) is independent of \(\varepsilon \) is necessary to pass from (24) to (25) but is not needed before (24). Moreover, the term \(I_2^\varepsilon (t)\) in the proof vanishes a.s. independently of \(\varepsilon \). Therefore, in the analysis of the case \(\delta = {\mathcal {O}}(\varepsilon ^\zeta )\) it will be sufficient for unbiasedness to show that
which is a nontrivial limit since \(\delta \rightarrow 0\) for \(\varepsilon \rightarrow 0\).
Filtered Data in the Multiscale Regime
We now consider the case of the filtering width \(\delta = \mathcal O(\varepsilon ^\zeta )\), where \(\zeta > 0\) will be specified in the following. In this case, the filtered process more closely resembles the original process \(X^\varepsilon \), as noted in Fig. 1. Moreover, the techniques employed for proving Theorem 3.12 can only be partly exploited, as highlighted in Remark 3.13. In fact, in order to prove unbiasedness it is necessary to characterize precisely the difference between the processes \(Z^\varepsilon \) and \(X^\varepsilon \). A first characterization is given by the following Proposition, whose proof is found in Appendix B.
Proposition 3.14
Let Assumption 2.1 hold and \(\varepsilon ,\delta >0\) be sufficiently small. Then, it holds for every \(t > 0\)
where the stochastic process \(B_t^\varepsilon \) is defined as
where \(\varPhi \) is the solution of the cell problem (5), \(W_s\) is the Brownian motion appearing in (1) and \(Y_t^\varepsilon = X_t^\varepsilon / \varepsilon \). Moreover, \(B_t^\varepsilon \) and the remainder \(R(\varepsilon ,\delta )\) satisfy for every \(p \ge 1\) the estimates
and
where C is independent of \(\varepsilon \), \(\delta \) and t and \(\varphi ^\varepsilon \) is the density of the invariant measure of \(X^\varepsilon \).
It is clear from the Proposition above that understanding the properties of the process \(B_t^\varepsilon \) is key to understanding the behavior of the difference between \(X^\varepsilon \) and \(Z^\varepsilon \). In particular, applying the Itô formula and using the properties of the kernel k(t), we can write the dynamics of \(B_t^\varepsilon \) as
This equation can be coupled with the dynamics of the processes \(X_t^\varepsilon \), \(Y_t^\varepsilon \) and \(Z_t^\varepsilon \), thus describing the evolution of the quadruple \((X^\varepsilon , Y^\varepsilon , Z^\varepsilon , B^\varepsilon )\). In particular, it is possible to show that the results of Sect. 3.1 hold for the quadruple, and the properties of its invariant measure can be exploited to prove the unbiasedness of the estimator in the case \(\delta = \mathcal O(\varepsilon ^\zeta )\) in the same way as in the case of \(\delta \) independent of \(\varepsilon \). In this context, a further assumption on the potential V is necessary.
Assumption 3.15
The derivatives \(V''\) and \(V'''\) of the potential \(V :{\mathbb {R}}\rightarrow {\mathbb {R}}^N\) are componentwise polynomially bounded, and the second derivative is Lipschitz, i.e., there exists a constant \(L > 0\) such that
for all \(x, y \in {\mathbb {R}}\).
In light of Remark 3.13, it is fundamental to understand the behavior of the quantity
as well as its limit for \(t\rightarrow \infty \) and for \(\varepsilon \rightarrow 0\). Let us remark that due to Proposition 3.14 we have
and therefore studying the right-hand side of the approximate equality above is the goal of the upcoming discussion. The following result, whose proof is in Appendix C, gives a first characterization.
Lemma 3.16
Under Assumptions 2.1 and 3.15, let \(\eta ^\varepsilon \) be the invariant measure of the quadruple \((X^\varepsilon , Y^\varepsilon , Z^\varepsilon , B^\varepsilon )\). Then, it holds
where the remainder \({\widetilde{R}}(\varepsilon ,\delta )\) satisfies
Let us remark that the quantity appearing above hints toward the theory of homogenization. In fact, we recall that the homogenization coefficient K is given by
where \(\mu \) is the marginal measure of the process \(Y^\varepsilon \) when coupled with \(X^\varepsilon \). Therefore, the next step is the homogenization limit, i.e., the limit of vanishing \(\varepsilon \), which is considered in the following Lemma, whose proof is given in Appendix C.
Lemma 3.17
Let the assumptions of Lemma 3.16 hold, and let \(\delta = \varepsilon ^\zeta \) with \(\zeta > 0\). Then, it holds
where \(\varSigma \) is the diffusion coefficient of the homogenized equation (3).
Equipped with the results presented above, we can prove the following Theorem, stating that the estimator \({\widehat{A}}_k(X^\varepsilon , T)\) is asymptotically unbiased even in the case of a filtering width \(\delta \) vanishing with respect to the multiscale parameter \(\varepsilon \).
Theorem 3.18
Let the assumptions of Lemmas 3.3 and 3.17 hold. Let \({\widehat{A}}_k(X^\varepsilon , T)\) be defined in (10) and \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0,2)\). If \({\widetilde{M}}\) is invertible, then
where A is the drift coefficient of the homogenized equation (3).
Proof
Let us introduce the notation
where \(\widetilde{{\mathcal {M}}}_\varepsilon \) is defined in (20). Then, following the proof of Theorem 3.12 and in light of Remark 3.13, we only need to show that if \(\delta = \varepsilon ^\zeta \) with \(\zeta \in (0,2)\) we have
Using Proposition 3.14 and geometric ergodicity (Lemma 3.3) to take the limit \(t \rightarrow \infty \), we have the following equality
where \(R(\varepsilon , \delta )\) is given in Proposition 3.14, \({{\mathbb {E}}}\) denotes the expectation with respect to the Wiener measure and
Let us consider the three terms separately. First, by geometric ergodicity and applying Lemmas 3.16 and 3.17 we get
Let us now consider \(J_2^\varepsilon (t)\). For Hölder conjugates p, q, r, the Hölder inequality yields
Now, we can bound the first two terms with (28) and (29), respectively. The third term is bounded due to Assumption 3.15 and Lemma C.1. Hence, we have for t sufficiently large
We consider now \(J_3^\varepsilon (t)\). The Hölder inequality yields for conjugates p and q
which, similarly to the above, yields for t sufficiently large
Therefore, since \(\delta = {\mathcal {O}}(\varepsilon ^\zeta )\) for \(\zeta \in (0, 2)\), the terms \(J_2^\varepsilon (t)\) and \(J_3^\varepsilon (t)\) vanish in the limit for \(t \rightarrow \infty \) and \(\varepsilon \rightarrow 0\). Furthermore, by Lemma C.4 and by weak convergence of the invariant measure \(\mu ^\varepsilon \) to \(\mu ^0\), we have
where \({\mathcal {M}}_0\) is defined in (21). Therefore,
and, finally, employing (19) and (21) and integrating by parts yield
which implies the desired result. \(\square \)
We conclude the analysis concerning the estimator \({\widehat{A}}_k\) for the effective drift coefficient with a negative convergence result, i.e., that if \(\delta = \varepsilon ^\zeta \) with \(\zeta > 2\), the estimator based on filtered data converges to the coefficient \(\alpha \) of the unhomogenized equation. This result is relevant for two reasons. First, it shows the sharpness of the bound on \(\zeta \) in the assumptions of Theorem 3.18. Second, it shows an interesting switch between two completely different regimes at \(\zeta = 2\), which becomes arbitrarily sharp in the limit \(\varepsilon \rightarrow 0\).
Theorem 3.19
Let the assumptions of Lemma 3.3 and Assumption 3.15 hold. Let \({\widehat{A}}_k(X^\varepsilon , T)\) be defined in (10) and \(\delta = \varepsilon ^\zeta \) with \(\zeta > 2\). If \({\widetilde{M}}\) is invertible, then
where \(\alpha \) is the drift coefficient of the multiscale equation (1).
The proof is given in Appendix C.
We conclude this section by proving a result of asymptotic unbiasedness for the estimator \({\widehat{\varSigma }}_k\) of the effective diffusion coefficient \(\varSigma \) defined in (12). The proof is given in Appendix D.
Theorem 3.20
Let the Assumptions of Theorem 3.19 hold. Then, if \(\delta = \varepsilon ^\zeta \), with \(\zeta \in (0,2)\), it holds
where \(\varSigma \) is the diffusion coefficient of the homogenized equation (3).
The Bayesian Setting
In this section, we present a Bayesian reinterpretation of the inference procedure, which, given the structure of the problem, allows full uncertainty quantification with little more computational effort than required for the MLE.
Let us fix a Gaussian prior \(\mu _0 = {\mathcal {N}}(A_0, C_0)\) on A, where \(A_0 \in {\mathbb {R}}^N\) and \(C_0 \in {\mathbb {R}}^{N\times N}\) is symmetric positive definite. Then, given a final time \(T > 0\), the posterior distribution \(\mu _{T,\varepsilon }\) admits a density \(p(A \mid X^\varepsilon )\) with respect to the Lebesgue measure which satisfies
where \(Z^\varepsilon \) is the normalization constant, \(p_0\) is the density of \(\mu _0\) and the likelihood \(p(X^\varepsilon \mid A)\) is given in (6). The log-posterior density is therefore given by
where M and h are defined in (8). Since the log-posterior density is quadratic in A, the posterior is Gaussian, and it is therefore sufficient to determine its mean and covariance to fully characterize it. We denote by \(m_{T,\varepsilon }\) and \(C_{T,\varepsilon }\) the mean and covariance matrix, respectively. Completing the square in the log-posterior density, we formally obtain
Under Assumption 2.1, one can show that the posterior at time \(T > 0\) is well defined and given by \(\mu _{T,\varepsilon }(\cdot \mid X^\varepsilon ) = {\mathcal {N}}(m_{T,\varepsilon }, C_{T,\varepsilon })\). Let us remark that in order to compute the posterior covariance \(C_{T,\varepsilon }\) the value of the diffusion coefficient \(\varSigma \) of the homogenized equation is needed. Although the exact value is in general unknown, it can be estimated employing the subsampling technique presented in [29] or with the estimator \({\widehat{\varSigma }}_k\) given in (12) based on filtered data. In fact, we verified in practice that the estimator of the diffusion coefficient based on subsampling is more robust with respect to the subsampling step than the estimator for the drift coefficient. In the following theorem, we show that the posterior distribution obtained with no preprocessing of the data contracts asymptotically to the drift coefficient of the unhomogenized equation. We characterize the contraction by verifying that the posterior measure concentrates in arbitrarily small balls. Let us finally remark that the measure \(\mu _{T, \varepsilon }\) is a random measure, and therefore, contraction has to be considered averaged with respect to the Wiener measure. The choice of the contraction measure and some parts of the proof are taken from [32, Theorem 5.2].
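As a concrete illustration of the conjugate Gaussian update described above, the following sketch computes \(m_{T,\varepsilon }\) and \(C_{T,\varepsilon }\) from a matrix M and vector h; the normalization \(-\tfrac{T}{2\varSigma }(A^\top M A - 2A^\top h)\) assumed here for the log-likelihood is our reading of (6) and (8), not a quotation from the text.

```python
import numpy as np

def gaussian_posterior(A0, C0, M, h, Sigma, T):
    """Conjugate Gaussian update for a quadratic log-likelihood of the
    assumed form -(T / (2 * Sigma)) * (A^T M A - 2 A^T h), with prior
    N(A0, C0). The scaling by T / Sigma is an assumed normalization."""
    prec = np.linalg.inv(C0) + (T / Sigma) * M      # posterior precision
    C_post = np.linalg.inv(prec)                    # posterior covariance
    m_post = C_post @ (np.linalg.inv(C0) @ A0 + (T / Sigma) * h)
    return m_post, C_post
```

For large T the prior contribution becomes negligible, so \(m_{T,\varepsilon } \approx M^{-1}h\), matching the MLE of (8), while \(C_{T,\varepsilon } = {\mathcal {O}}(1/T)\), consistent with the posterior contraction proved below.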
Theorem 4.1
Under Assumption 2.1, the posterior measure \(\mu _{T,\varepsilon }(\cdot \mid X^\varepsilon ) = {\mathcal {N}}(m_{T,\varepsilon }, C_{T,\varepsilon })\) satisfies for all \(c > 0\)
where \({{\mathbb {E}}}\) denotes expectation with respect to the Wiener measure and \(\alpha \) is the drift coefficient of the unhomogenized equation (1).
Remark 4.2
The result above has the same consequences in the Bayesian setting as Theorem 2.3 has for the MLE. In particular, it shows that the posterior distribution obtained when data are not preprocessed concentrates asymptotically on the drift coefficient of the unhomogenized equation (1). Moreover, a partial result which can be deduced from the proof is that, in the limit \(T \rightarrow \infty \) for fixed \(\varepsilon > 0\), the Bayesian and the MLE approaches are equivalent. In particular, we have for all \(\varepsilon > 0\)
i.e., the weak limit of the posterior \(\mu _{T,\varepsilon }\) for \(T\rightarrow \infty \) is the Dirac delta concentrated on the limit of \(\widehat{A}(X^\varepsilon , T)\) for \(T\rightarrow \infty \).
Proof of Theorem 4.1
The proof of [32, Theorem 5.2] guarantees that if the trace of \(C_{T,\varepsilon }\) tends to zero and if the mean \(m_{T,\varepsilon }\) tends to \(\alpha \), then the desired result holds. Indeed, the triangle inequality yields
If the mean converges in probability, then the second term vanishes. For the first term, Markov’s inequality yields
and a change of variable simply gives
Hence, it suffices to verify that the covariance matrix vanishes and that the mean tends to the coefficient \(\alpha \). Let us first consider the covariance matrix. An algebraic identity yields
where
Let us first remark that due to the hypothesis on M (Assumption 2.1(iv)) and the ergodic theorem it holds for all \(T > 0\)
where \({\bar{\lambda }}\) is given in Assumption 2.1(iv). We now have that for generic symmetric positive definite matrices R and S it holds
Applying this inequality to \(Q^{-1}\), we obtain
which implies
and due to the triangle inequality
We proved that in the limit for \(T \rightarrow \infty \) the covariance shrinks to zero independently of \(\varepsilon \). We now consider the mean. First, we remark that the triangle inequality yields
For the second term, Theorem 2.3 implies
Let us now consider the first term. Replacing the expression of the maximum likelihood estimator (8) and due to the Cauchy–Schwarz and triangle inequalities, we obtain
Moreover, the ergodic theorem and the strong law of large numbers for martingales guarantee that \(\left\Vert h\right\Vert _2\) is bounded a.s. as \(T \rightarrow \infty \). Therefore,
independently of \(\varepsilon \). Finally,
which, together with (31), implies the desired result. \(\square \)
The Filtered Data Approach
In this section, we present how to correct the asymptotic bias of the posterior highlighted in Theorem 4.1 by employing filtered data. In the Bayesian setting, we consider the modified likelihood function
where
Since M is symmetric positive definite, the function \(\widetilde{p}(X^\varepsilon \mid A)\) is indeed a valid Gaussian likelihood function. We then obtain the modified posterior \({{\widetilde{\mu }}}_{T,\varepsilon } = {\mathcal {N}}({\widetilde{m}}_{T, \varepsilon }, C_{T, \varepsilon })\), whose parameters are given by
Let us remark that the posterior \({{\widetilde{\mu }}}_{T,\varepsilon }\) has the same covariance as \(\mu _{T,\varepsilon }\) given in (30), and therefore it is indeed a valid Gaussian posterior distribution. Nevertheless, in order to employ the convergence argument introduced in Theorem 4.1, we need to study the properties of the MLE based on the likelihood \({\widetilde{p}}(X^\varepsilon \mid A)\), i.e., the quantity
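For intuition, here is a minimal discrete-time sketch of \({\widetilde{A}}_k(X^\varepsilon , T)\) in the scalar case. The precise form of (32) is not reproduced in this section; we assume it replaces the mixed matrix \({\widetilde{M}}\) by the symmetric time-average of \(V'(Z)^2\), consistent with the role of the symmetric matrices \({\mathcal {M}}_\varepsilon \) of (21). Both the scalar form and this normalization are our assumptions.

```python
import numpy as np

def modified_mle(X, Z, tau, Vp):
    """Assumed scalar (N = 1) form of the modified estimator (32):
    A~ = -(sum V'(Z) dX) / (tau * sum V'(Z)^2), i.e., with the symmetric
    matrix built from the filtered process only. This form is our
    assumption, not a quotation from the text."""
    num = -np.sum(Vp(Z[:-1]) * np.diff(X))   # ~ -int V'(Z) dX
    den = tau * np.sum(Vp(Z[:-1]) ** 2)      # ~  int V'(Z)^2 dt
    return num / den
```

For an Ornstein–Uhlenbeck trajectory with exponential filter, the stationary identity \({\mathbb {E}}[XZ] = {\mathbb {E}}[Z^2]\) suggests why such an estimator can remain unbiased despite the symmetrized denominator.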
The following theorem guarantees the unbiasedness of this estimator under a condition on the parameter \(\delta \) of the filter.
Theorem 4.3
Let the assumptions of Theorem 3.18 hold. Then, if \(\delta = \varepsilon ^\zeta \), with \(\zeta \in (0, 2)\), it holds
for \({\widetilde{A}}_k(X^\varepsilon , T)\) defined in (32).
Proof
We first consider the difference between the two estimators \({\widetilde{A}}_k(X^\varepsilon , T)\) and \({\widehat{A}}_k(X^\varepsilon , T)\). In particular, the ergodic theorem and an algebraic equality imply
almost surely, where \({\mathcal {M}}_\varepsilon \) and \(\widetilde{\mathcal M}_\varepsilon \) are defined in (21) and (20), respectively. Therefore, due to Assumption 2.1, which allows us to control the norm of \(\mathcal M_\varepsilon ^{-1}\), and due to Lemma C.4, we have for a constant \(C > 0\)
where we remark that \({\widehat{A}}_k(X^\varepsilon , T)\) has a bounded norm for \(\varepsilon \) sufficiently small due to Theorem 3.18. Now, the triangle inequality yields
Therefore, due to Theorem 3.18, inequality (33) and since \(\delta = \varepsilon ^\zeta \), the desired result holds. \(\square \)
Remark 4.4
One could argue that we could have carried out the whole analysis for the estimator \({\widetilde{A}}_k(X^\varepsilon , T)\) instead of the estimator \({\widehat{A}}_k(X^\varepsilon , T)\). Nevertheless, the latter guarantees the strong result of almost sure convergence in case \(\delta \) is independent of \(\varepsilon \), which is false for the former. Conversely, analyzing the properties of the estimator \({\widetilde{A}}_k(X^\varepsilon , T)\) is fundamental for the Bayesian setting, in which the matrix \({\widetilde{M}}\) cannot be employed since its symmetric part is not positive definite in general.
In light of the proof of Theorem 4.1, Theorem 4.3 guarantees that the mean of the posterior distribution \({{\widetilde{\mu }}}_{T, \varepsilon }\) converges to the drift coefficient of the homogenized equation. Since the covariance matrix is the same for \(\mu _{T, \varepsilon }\) and \({{\widetilde{\mu }}}_{T, \varepsilon }\), it is possible to prove a positive convergence result for \(\widetilde{\mu }_{T, \varepsilon }\), which is given by the following Theorem.
Theorem 4.5
Let the Assumptions of Theorem 4.3 hold. Then, the modified posterior measure \({{\widetilde{\mu }}}_{T,\varepsilon }(\cdot \mid X^\varepsilon ) = {\mathcal {N}}({\widetilde{m}}_{T,\varepsilon }, C_{T,\varepsilon })\) satisfies
where \({{\mathbb {E}}}\) denotes expectation with respect to the Wiener measure and A is the drift coefficient of the homogenized equation (3).
Proof
The proof follows from the proof of Theorem 4.1 and from Theorem 4.3. \(\square \)
Numerical Experiments
In this section, we show numerical experiments confirming our theoretical findings and showcasing the potential of the filtered data approach to overcome model misspecification arising when multiscale data are used to fit homogenized models.
Remark 5.1
In practice, for the numerical experiments we consider data in the form of a high-frequency discrete time series from the solution \(X^\varepsilon \) of (1). Let \(\tau > 0\) be the time step at which data are observed, and let \(X^\varepsilon \,\,{:=}\,\,(X^\varepsilon _0, X^\varepsilon _\tau , X^\varepsilon _{2\tau }, \ldots )\). We then compute the estimator \({\widehat{A}}_k\) as
where
In all experiments we take \(\tau \ll \varepsilon ^2\), so that the discretization of the data has negligible effects and does not compromise the validity of our theoretical results.
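Concretely, the discrete computation above can be sketched as follows for a scalar parameter (\(N = 1\)), assuming the exponential kernel \(k(r) = \delta ^{-1}e^{-r/\delta }\) (the \(\beta = 1\) case of (9)), for which the filtered process solves \(\mathrm {d}Z_t = \delta ^{-1}(X_t - Z_t)\,\mathrm {d}t\), and assuming the estimator (10) reduces to a ratio of Riemann–Itô sums; both assumptions are ours.

```python
import numpy as np

def filtered_mle(X, tau, delta, Vp):
    """Sketch of the filtered-data drift estimator (10) for N = 1,
    assuming the exponential kernel (beta = 1 in (9)), for which the
    filtered process solves dZ = (X - Z)/delta dt."""
    Z = np.empty_like(X)
    Z[0] = X[0]
    for j in range(len(X) - 1):
        # explicit Euler step for the filter ODE dZ = (X - Z)/delta dt
        Z[j + 1] = Z[j] + tau * (X[j] - Z[j]) / delta
    num = -np.sum(Vp(Z[:-1]) * np.diff(X))       # ~ -int V'(Z) dX
    den = tau * np.sum(Vp(Z[:-1]) * Vp(X[:-1]))  # ~  int V'(Z) V'(X) dt
    return num / den
```

On a trajectory of the homogenized equation itself this recovers the drift A; on multiscale data, Theorems 3.12 and 3.18 describe when it recovers A rather than \(\alpha \).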
Parameters of the Filter
For the first preliminary experiments, we consider \(N = 1\) and the quadratic potential \(V(x) = x^2/2\). In this case, the solution of the homogenized equation is an Ornstein–Uhlenbeck process. Moreover, we set the fast potential in the multiscale equation (1) as \(p(y) = \cos (y)\). In all experiments, data are generated employing the Euler–Maruyama method with a fine time step.
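The data generation itself can be sketched with Euler–Maruyama, under our reading of the multiscale dynamics (1) as \(\mathrm {d}X_t^\varepsilon = -\alpha V'(X_t^\varepsilon )\,\mathrm {d}t - \varepsilon ^{-1} p'(X_t^\varepsilon /\varepsilon )\,\mathrm {d}t + \sqrt{2\sigma }\,\mathrm {d}W_t\); this exact form of (1) is an assumption, not quoted from the text.

```python
import numpy as np

def simulate_multiscale(alpha, sigma, eps, T, dt, Vp, pp, x0=0.0, seed=0):
    """Euler-Maruyama sketch of the assumed multiscale dynamics
    dX = -alpha*V'(X) dt - (1/eps)*p'(X/eps) dt + sqrt(2*sigma) dW."""
    rng = np.random.default_rng(seed)
    n = int(round(T / dt))
    X = np.empty(n + 1)
    X[0] = x0
    for j in range(n):
        drift = -alpha * Vp(X[j]) - pp(X[j] / eps) / eps
        X[j + 1] = X[j] + drift * dt \
            + np.sqrt(2.0 * sigma * dt) * rng.standard_normal()
    return X
```

With \(V(x) = x^2/2\) and \(p(y) = \cos (y)\) this corresponds to `Vp=lambda x: x` and `pp=lambda y: -np.sin(y)`; the time step should satisfy \(\varDelta t \ll \varepsilon ^2\) to resolve the fast scale.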
Verification of Theoretical Results
We first demonstrate numerically the validity of Theorem 3.12, Theorem 3.18 and Theorem 3.19, i.e., the unbiasedness of \({\widehat{A}}_k(X^\varepsilon , T)\) for \(\delta = \varepsilon ^\zeta \) with \(\zeta \in [0, 2)\) and its bias for \(\zeta > 2\). Let us recall that for \(\zeta = 0\) the analysis and the theoretical result are fundamentally different from the case \(\zeta \in (0, 2)\). We consider \(\varepsilon \in \{0.1, 0.05, 0.025\}\), the diffusion coefficient \(\sigma = 1\) and generate data \(X^\varepsilon _t\) for \(0 \le t \le T\) with \(T = 10^3\). Then, we filter the data by choosing \(\delta = \varepsilon ^\zeta \), with \(\zeta = 0, 0.1, 0.2,\ldots , 3\), and compute \(\widehat{A}_k(X^\varepsilon , T)\). Results are displayed in Fig. 2 and show that for \(\zeta > 2\), i.e., \(\delta = o(\varepsilon ^2)\), the estimator tends to the drift coefficient \(\alpha \) of the unhomogenized equation. Conversely, as predicted by the theory, for \(\zeta \in [0, 2)\) the estimator tends to A, the drift coefficient of the homogenized equation. Therefore, the point \(\delta = \varepsilon ^2\) acts asymptotically as a switch between two completely different regimes, and this switch is theoretically sharp in the limit \(T \rightarrow \infty \) and \(\varepsilon \rightarrow 0\). Let us remark that the results displayed in Fig. 2a demonstrate that the transition occurs more rapidly for the smallest values of \(\varepsilon \). Moreover, in Fig. 2b, one can see how, with larger final times T, the estimator is closer both to A when \(\zeta \in [0, 2]\) and to \(\alpha \) when \(\zeta > 2\). Still, we observe that in finite computations the switch between A and \(\alpha \) is smoother than what the theory predicts, which suggests fixing, if possible, \(\delta = 1\).
Comparison with Subsampling
We now compare the results given by the filtered data technique with the results given by subsampling the data, i.e., we compare the estimators \({\widehat{A}}_k(X^\varepsilon , T)\) and \(\widehat{A}_\delta (X^\varepsilon , T)\). We fix the multiscale parameter \(\varepsilon = 0.1\) and generate data for \(0 \le t \le T\) with \(T = 10^3\). We choose \(\delta = \varepsilon ^{\zeta }\) and vary \(\zeta \in [0, 1]\), where \(\delta \) plays the role of the filtering width and of the subsampling width, respectively. Moreover, for the filtered data approach we consider both \(\beta = 1\) and \(\beta = 5\). We report in Fig. 3 the experimental results. Let us remark that:

(i)
for \(\sigma = 0.5\) the results given by subsampling and by the filter with \(\beta = 1\) are similar, while for higher values of \(\sigma \) the filtered data approach seems better than subsampling;

(ii)
in general, choosing a higher value of \(\beta \) seems beneficial for the quality of the estimator;

(iii)
the dependence on \(\delta \) of the numerical results given by the filter seems relevant only in the case \(\beta = 1\) with small values of \(\sigma \). For \(\beta = 1\) and higher values of \(\sigma \), the estimator is stable with respect to this parameter. The same stability can be observed for higher values of \(\beta \), but we have no theoretical guarantee in this case.
The Influence of \(\beta \)
We finally test the variability of the estimator with respect to \(\beta \) in (9). We consider \(\delta = \varepsilon \), which corresponds to \(\zeta = 1\) and seems to be the worst-case scenario for the filter, at least for \(\beta = 1\). We consider again \(\sigma = 0.5, 0.7, 1\) and vary \(\beta = 1, 2, \ldots , 10\). Results, given in Fig. 4, show empirically that the estimator stabilizes quickly with respect to \(\beta \). Nevertheless, there is no theoretical guarantee supporting this empirical observation.
Variance of the Estimators
We now compare the estimators \({\widehat{A}}_k\) based on filtered data and \({\widehat{A}}_\delta \) based on subsampling in terms of variance. We consider for this experiment the SDE (1) with \(N = 1\), the bistable potential \(V(x) = x^4/4 - x^2/2\), the multiscale drift coefficient \(\alpha = 1\), the diffusion coefficient \(\sigma = 1\) and \(\varepsilon = 0.1\). We then let \(X^\varepsilon = (X_t^\varepsilon , 0\le t\le T)\) be the solution of (1) and generate \(N_{\mathrm {s}} = 500\) i.i.d. samples of \(X^\varepsilon \). We then compute the estimators \({\widehat{A}}_k\) and \({\widehat{A}}_\delta \) on each of the realizations of \(X^\varepsilon \), thus obtaining \(N_{\mathrm {s}}\) replicas \(\{{\widehat{A}}_k^{(i)}\}_{i=1}^{N_{\mathrm {s}}}\) and \(\{\widehat{A}_\delta ^{(i)}\}_{i=1}^{N_{\mathrm {s}}}\). For the estimator \({\widehat{A}}_k\), we consider kernel (9) with \(\beta = \{1,5\}\) and with \(\delta = 1\). For the estimator \(\widehat{A}_\delta \), we employ the subsampling width \(\delta = \varepsilon ^{2/3}\), which is heuristically optimal following [29]. It could be argued that another estimator, based on subsampling and shifting, could be employed to reduce the variance. In particular, we let \(\tau > 0\) be the time step at which the data are observed. Indeed, in practice we work with high-frequency discrete data and observe \(X^\varepsilon \,\,{:=}\,\,(X^\varepsilon _0, X^\varepsilon _\tau , \ldots , X^\varepsilon _{n\tau })\), with \(n\tau = T\). We assume for simplicity that the subsampling width \(\delta \) is a multiple of \(\tau \) and compute for all \(k = 0, 1, \ldots , \delta /\tau - 1\)
i.e., the subsampling estimator obtained by shifting the origin by \(k\tau \). We then average over the index k and obtain the new estimator
We include this estimator in the numerical study for completeness and compute \(N_{\mathrm {s}}\) replicas of \(\widehat{A}_{\delta }^{\mathrm {avg}}\) on all the realizations of \(X^\varepsilon \). Results, given in Fig. 5 for the final times \(T = \{500, 1000\}\), show that our novel approach does not outperform subsampling in terms of variance, but clearly does in terms of bias. Moreover, we notice numerically that the shifted-averaged estimator \({\widehat{A}}_\delta ^{\mathrm {avg}}\) does not sensibly reduce the variance with respect to \({\widehat{A}}_\delta \) in this case. In fact, this is not entirely surprising, since the estimators \(\widehat{A}_{\delta ,k}\) of (34) are highly correlated. Finally, we notice that the filtering estimator \({\widehat{A}}_k\) with \(\beta = 5\) has a lower variance than the same estimator with \(\beta = 1\). This confirms that choosing a higher value of \(\beta \) improves the estimation of the effective drift coefficient.
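The shifted-averaged construction of (34)–(35) can be sketched as follows for \(N = 1\), assuming the subsampling estimator \(\widehat{A}_\delta \) has the scalar drift-MLE form \(-\bigl(\sum _j V'(X_{j\delta })(X_{(j+1)\delta } - X_{j\delta })\bigr)/\bigl(\delta \sum _j V'(X_{j\delta })^2\bigr)\); this form is our assumption.

```python
import numpy as np

def subsampled_mle(Xs, h, Vp):
    """Scalar drift MLE on a series Xs observed at spacing h; this form
    of the subsampling estimator A^_delta is our assumption."""
    num = -np.sum(Vp(Xs[:-1]) * np.diff(Xs))
    den = h * np.sum(Vp(Xs[:-1]) ** 2)
    return num / den

def shifted_avg_mle(X, tau, delta, Vp):
    """Sketch of the shifted-averaged estimator (34)-(35): subsample X
    (observed at spacing tau) at width delta, shifting the origin by
    k*tau for k = 0, ..., delta/tau - 1, then average the estimates."""
    m = int(round(delta / tau))  # delta is assumed to be a multiple of tau
    return np.mean([subsampled_mle(X[k::m], delta, Vp) for k in range(m)])
```

As noted above, the k-shifted estimators are built from strongly overlapping information, so averaging them cannot be expected to reduce the variance by a factor \(\delta /\tau \).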
Multidimensional Drift Coefficient
Let us consider the Chebyshev polynomials of the first kind, i.e., the polynomials \(T_i:{\mathbb {R}}\rightarrow {\mathbb {R}}\), \(i=0, 1, \ldots \), defined by the recurrence relation
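The recurrence in question is the standard one, \(T_0(x) = 1\), \(T_1(x) = x\), \(T_{i+1}(x) = 2x\,T_i(x) - T_{i-1}(x)\); a minimal evaluation sketch:

```python
def chebyshev_T(i, x):
    """First-kind Chebyshev polynomial T_i(x) via the three-term
    recurrence T_0 = 1, T_1 = x, T_{i+1} = 2x T_i - T_{i-1}."""
    if i == 0:
        return 1.0
    prev, cur = 1.0, x
    for _ in range(i - 1):
        prev, cur = cur, 2.0 * x * cur - prev
    return cur
```

For instance, \(T_2(x) = 2x^2 - 1\) and \(T_3(x) = 4x^3 - 3x\), so the basis contains both even and odd terms, which is what allows the linear combination below to produce a bistable shape.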
We consider the potential function V(x) as in (2) with
thus placing ourselves in the semiparametric framework of Remark 2.4. This potential function satisfies Assumption 2.1 whenever N is even and the leading coefficient \(\alpha _N\) is positive. We set \(N = 4\) and the drift coefficient \(\alpha = (1, 1/2, 1/2, 1)\). With this drift coefficient, the potential function is of the bistable kind. Moreover, we set \(\varepsilon = 0.05\), the diffusion coefficient \(\sigma = 1\), the fast potential \(p(y) = \cos (y)\) and simulate a trajectory of \(X^\varepsilon \) for \(0 \le t \le T\) with \(T = 10^3\), employing the Euler–Maruyama method with time step \(\varDelta t = \varepsilon ^3\). We estimate the drift coefficient \(A \in {\mathbb {R}}^4\) with the estimators:

(i)
\({\widehat{A}}(X^\varepsilon , T)\) based on the data \(X^\varepsilon \) itself;

(ii)
\({\widehat{A}}_\delta (X^\varepsilon , T)\) based on subsampled data with subsampling parameter \(\delta = \varepsilon ^{2/3}\);

(iii)
\({\widehat{A}}_k(X^\varepsilon , T)\) based on filtered data \(Z^\varepsilon \) computed with \(\beta = 1\) and \(\delta = 1\).
In particular, we pick this specific value of \(\delta \) for the subsampling following the optimality criterion given in [29]. Results, given in Fig. 6, show that the filter-based estimation captures well both the homogenized potential and the coefficient A. Moreover, one can observe that the negative result given in Theorem 2.3 holds in practice, i.e., with no preprocessing the estimator \(\widehat{A}(X^\varepsilon , T)\) tends to the drift coefficient \(\alpha \) of the unhomogenized equation. Finally, we can observe that the subsampling-based estimator fails to capture the homogenized coefficients. Indeed, the estimator strongly depends on the sampling rate and on the diffusion coefficient, as shown in the numerical experiments of [29]. Even though the authors suggest the choice of \(\delta = \varepsilon ^{2/3}\), this is just a heuristic and is not guaranteed to be the optimal value in all cases. In the asymptotic limit of \(\varepsilon \rightarrow 0\) and \(T \rightarrow \infty \), any valid choice of the subsampling rate is guaranteed theoretically to work, but not in the pre-asymptotic regime. Our estimator, conversely, seems to perform better with no particular tuning of the parameters even in this multidimensional case, which demonstrates the robustness of our novel approach.
The Bayesian Approach: Bistable Potential
In this numerical experiment, we consider \(N = 2\) and the bistable potential, i.e., the function V defined as
with coefficients \(\alpha _1 = 1\) and \(\alpha _2 = 2\). We then consider the multiscale equation with \(\sigma = 0.7\), the fast potential \(p(y) = \cos (y)\) and \(\varepsilon = 0.05\), thus simulating a trajectory \(X^\varepsilon \). We adopt here a Bayesian approach and compute the posterior distribution \({{\widetilde{\mu }}}_{T, \varepsilon }\) obtained with the filtered data approach introduced in Sect. 4.1. The parameters of the filter in (9) are set to \(\beta = 1\) and \(\delta = \varepsilon \). Moreover, we choose the non-informative prior \(\mu _0 = {\mathcal {N}}(0,I)\). Let us remark that in order to compute the posterior covariance the diffusion coefficient \(\varSigma \) of the homogenized equation has to be known. In this case, we precompute the value of \(\varSigma \) via the coefficient K and the theory of homogenization, but notice that \(\varSigma \) could be estimated either employing the subsampling technique of [29] or using the estimator \({\widehat{\varSigma }}_k\) based on filtered data defined in (12). In particular, in this case \(\varSigma \approx 0.2807\), and we compute numerically
so that employing the estimator \({\widehat{\varSigma }}_k\) instead of the true value would have a negligible effect on the computation of the posterior over the effective drift coefficient. We stop computations at times \(T \in \{100, 200, 400\}\) in order to observe the shrinkage of the Gaussian posterior toward the MLE \({\widetilde{A}}_k(X^\varepsilon , T)\) over time. In Fig. 7, we observe that the posterior does indeed shrink toward the MLE, which in turn gets progressively closer to the true value of the drift coefficient A of the homogenized equation.
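The mechanics of the posterior contraction can be sketched with a conjugate Gaussian computation. The snippet below is our own illustration, assuming a scalar drift coefficient A in \(\mathrm {d}X = -A V'(X)\,\mathrm {d}t + \sqrt{2\varSigma }\,\mathrm {d}W\), for which the Girsanov log-likelihood is quadratic in A; the path statistics fed in (`int_dV_dX`, `int_dV_sq`) are hypothetical values growing linearly in T, not the paper's data.

```python
import math

def gaussian_posterior(int_dV_dX, int_dV_sq, Sigma, prior_mean=0.0, prior_prec=1.0):
    """Conjugate Gaussian posterior for a scalar drift coefficient A in
    dX = -A V'(X) dt + sqrt(2 Sigma) dW, given the path statistics
    int_dV_dX = int_0^T V' dX and int_dV_sq = int_0^T (V')^2 dt.
    (In the paper's filtered approach, V'(X) is replaced by V'(Z) in these statistics.)"""
    prec = prior_prec + int_dV_sq / (2.0 * Sigma)
    mean = (prior_prec * prior_mean - int_dV_dX / (2.0 * Sigma)) / prec
    return mean, 1.0 / prec  # posterior mean and variance

# hypothetical stationary-regime statistics; Sigma as in the bistable experiment
Sigma = 0.2807
for T in (100, 200, 400):
    mean, var = gaussian_posterior(int_dV_dX=-0.9 * T, int_dV_sq=1.0 * T, Sigma=Sigma)
    # variance shrinks like 1/T; mean approaches the MLE -int V' dX / int (V')^2 dt = 0.9
```

As T grows, the posterior variance decays like 1/T and the mean converges to the MLE, which is the shrinkage behavior observed in Fig. 7.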
Conclusion
In this work, we introduced a novel methodology to address the problem of model misspecification that arises when homogenized models are fit to multiscale data. Our approach is based on using filtered data for the estimation of the drift of the homogenized diffusion process. We proved asymptotic unbiasedness of the estimators drawn from our methodology. Moreover, we presented a modified Bayesian approach, based on the same filtered data, which guarantees robust uncertainty quantification and posterior contraction. Numerical experiments demonstrate that the estimator based on filtered data requires less knowledge of the characteristic time scales of the multiscale equation than subsampling does, and that it can be employed as a black-box tool for parameter estimation on a range of academic examples. We note that in many applications one can only obtain discrete measurements of the diffusion process. Recently, a new estimator for learning homogenized SDEs from noisy discrete data has been introduced [4], combining the filtering approach developed in this paper with martingale estimating functions. We believe this work opens the way to several further developments. In particular, it would be relevant to

(i)
analyze the filtered data approach for \(\beta > 1\) in (9), which seems to give more robust results in practice,

(ii)
extend the analysis to the nonparametric framework, most likely by means of Bayesian regularization techniques, thus allowing the recovery of effective drift functions for which a parametric representation does not exist,

(iii)
consider multiscale models for which the homogenized equation presents multiplicative noise,

(iv)
test the filtered data methodology against real-world data,

(v)
apply similar methodologies to correct faulty behavior of other methods.
References
 1.
Abdulle, A., Di Blasio, A.: Numerical homogenization and model order reduction for multiscale inverse problems. Multiscale Model. Simul. 17(1), 399–433 (2019).
 2.
Abdulle, A., Di Blasio, A.: A Bayesian Numerical Homogenization Method for Elliptic Multiscale Inverse Problems. SIAM/ASA J. Uncertain. Quantif. 8(1), 414–450 (2020).
 3.
Abdulle, A., Garegnani, G., Zanoni, A.: Ensemble Kalman Filter for Multiscale Inverse Problems. Multiscale Model. Simul. 18(4), 1565–1594 (2020).
 4.
Abdulle, A., Pavliotis, G.A., Zanoni, A.: Eigenfunction martingale estimating functions and filtered data for drift estimation of discretely observed multiscale diffusions (2021). arXiv preprint arXiv:2104.10587
 5.
Aït-Sahalia, Y., Jacod, J.: High-frequency financial econometrics. Princeton University Press, Princeton (2014)
 6.
Aït-Sahalia, Y., Mykland, P.A., Zhang, L.: How often to sample a continuous-time process in the presence of market microstructure noise. Rev. Financ. Stud. 18(2), 351–416 (2005)
 7.
Azencott, R., Beri, A., Jain, A., Timofeyev, I.: Subsampling and parametric estimation for multiscale dynamics. Commun. Math. Sci. 11(4), 939–970 (2013).
 8.
Azencott, R., Beri, A., Timofeyev, I.: Adaptive subsampling for parametric estimation of Gaussian diffusions. J. Stat. Phys. 139(6), 1066–1089 (2010).
 9.
Bensoussan, A., Lions, J.L., Papanicolaou, G.: Asymptotic analysis for periodic structures. North-Holland Publishing Co., Amsterdam (1978)
 10.
Calvetti, D., Dunlop, M., Somersalo, E., Stuart, A.M.: Iterative updating of model error for Bayesian inversion. Inverse Problems 34(2), 025008, 38 (2018).
 11.
Calvetti, D., Ernst, O., Somersalo, E.: Dynamic updating of numerical model discrepancy using sequential sampling. Inverse Problems 30(11), 114019, 19 (2014).
 12.
Cotter, C.J., Pavliotis, G.A.: Estimating eddy diffusivities from noisy Lagrangian observations. Commun. Math. Sci. 7(4), 805–838 (2009).
 13.
Crommelin, D., Vanden-Eijnden, E.: Reconstruction of diffusions using spectral data from time-series. Commun. Math. Sci. 4(3), 651–668 (2006).
 14.
Crommelin, D., Vanden-Eijnden, E.: Diffusion estimation from multiscale data by operator eigenpairs. Multiscale Model. Simul. 9(4), 1588–1623 (2011).
 15.
Dashti, M., Stuart, A.M.: The Bayesian Approach to Inverse Problems. In: Handbook of Uncertainty Quantification, pp. 1–118. Springer (2016)
 16.
Gailus, S., Spiliopoulos, K.: Statistical inference for perturbed multiscale dynamical systems. Stochastic Process. Appl. 127(2), 419–448 (2017).
 17.
Gailus, S., Spiliopoulos, K.: Discretetime statistical inference for multiscale diffusions. Multiscale Model. Simul. 16(4), 1824–1858 (2018).
 18.
Kalliadasis, S., Krumscheid, S., Pavliotis, G.A.: A new framework for extracting coarse-grained models from time series with multiscale structure. J. Comput. Phys. 296, 314–328 (2015).
 19.
Karatzas, I., Shreve, S.E.: Brownian motion and stochastic calculus, Graduate Texts in Mathematics, vol. 113, 2nd edn. Springer-Verlag, New York (1991).
 20.
Kessler, M., Sørensen, M.: Estimating equations based on eigenfunctions for a discretely observed diffusion process. Bernoulli 5(2), 299–314 (1999).
 21.
Krumscheid, S., Pavliotis, G.A., Kalliadasis, S.: Semiparametric drift and diffusion estimation for multiscale diffusions. Multiscale Model. Simul. 11(2), 442–473 (2013).
 22.
Krumscheid, S., Pradas, M., Pavliotis, G.A., Kalliadasis, S.: Data-driven coarse graining in action: Modeling and prediction of complex systems. Physical Review E 92(4), 042139 (2015)
 23.
Mattingly, J.C., Stuart, A.M., Higham, D.J.: Ergodicity for SDEs and approximations: locally Lipschitz vector fields and degenerate noise. Stochastic Process. Appl. 101(2), 185–232 (2002).
 24.
Nolen, J., Pavliotis, G.A., Stuart, A.M.: Multiscale modeling and inverse problems. In: Numerical analysis of multiscale problems, Lect. Notes Comput. Sci. Eng., vol. 83, pp. 1–34. Springer, Heidelberg (2012).
 25.
Olhede, S.C., Sykulski, A.M., Pavliotis, G.A.: Frequency domain estimation of integrated volatility for Itô processes in the presence of marketmicrostructure noise. Multiscale Model. Simul. 8(2), 393–427 (2010).
 26.
Papavasiliou, A., Pavliotis, G.A., Stuart, A.M.: Maximum likelihood drift estimation for multiscale diffusions. Stochastic Process. Appl. 119(10), 3173–3210 (2009).
 27.
Pavliotis, G.A.: Stochastic processes and applications: Diffusion processes, the Fokker–Planck and Langevin equations. Texts in Applied Mathematics, vol. 60. Springer, New York (2014).
 28.
Pavliotis, G.A., Pokern, Y., Stuart, A.M.: Parameter estimation for multiscale diffusions: an overview. In: Statistical methods for stochastic differential equations, Monogr. Statist. Appl. Probab., vol. 124, pp. 429–472. CRC Press, Boca Raton, FL (2012).
 29.
Pavliotis, G.A., Stuart, A.M.: Parameter estimation for multiscale diffusions. J. Stat. Phys. 127(4), 741–781 (2007).
 30.
Pavliotis, G.A., Stuart, A.M.: Multiscale methods: averaging and homogenization, Texts in Applied Mathematics, vol. 53. Springer, New York (2008)
 31.
Pokern, Y., Stuart, A.M., Vanden-Eijnden, E.: Remarks on drift estimation for diffusion processes. Multiscale Model. Simul. 8(1), 69–95 (2009).
 32.
Pokern, Y., Stuart, A.M., van Zanten, J.H.: Posterior consistency via precision operators for Bayesian nonparametric drift estimation in SDEs. Stochastic Process. Appl. 123(2), 603–628 (2013).
 33.
Stuart, A.M.: Inverse problems: a Bayesian perspective. Acta Numer. 19, 451–559 (2010).
 34.
Ying, Y., Maddison, J., Vanneste, J.: Bayesian inference of ocean diffusivity from Lagrangian trajectory data. Ocean Model. 140 (2019)
 35.
Zhang, L., Mykland, P.A., Aït-Sahalia, Y.: A tale of two time scales: determining integrated volatility with noisy high-frequency data. J. Amer. Statist. Assoc. 100(472), 1394–1411 (2005).
Acknowledgements
We thank the anonymous reviewers whose comments helped improve and clarify this manuscript.
Funding
Open Access funding provided by EPFL Lausanne.
Additional information
Dedicated to the memory of Assyr Abdulle.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Assyr Abdulle, Andrea Zanoni and Giacomo Garegnani are partially supported by the Swiss National Science Foundation, under Grant No. 200020_172710. The work of Grigorios A. Pavliotis was partially funded by the EPSRC, Grant Number EP/P031587/1, and by JPMorgan Chase & Co. Any views or opinions expressed herein are solely those of the authors listed, and may differ from the views and opinions expressed by JPMorgan Chase & Co. or its affiliates. This material is not a product of the Research Department of J.P. Morgan Securities LLC. This material does not constitute a solicitation or offer in any jurisdiction. Andrew M. Stuart is grateful to NSF (Grant DMS 18189770) for financial support.
Communicated by Wolfgang Dahmen.
Appendices
Appendix A: Proofs of Section 3.1
Proof of Lemma 3.2
We have to show that the joint process solution to (13) is hypoelliptic. Denoting by \(f:{\mathbb {R}}\rightarrow {\mathbb {R}}\) the function
the generator of the process \((X^\varepsilon , Z^\varepsilon )^\top \) is given by
where
The commutator \([{\mathcal {X}}_0, {\mathcal {X}}_1]\) applied to a test function v then gives
Consequently,
which spans the tangent space of \({\mathbb {R}}^2\) at (x, z), denoted \(T_{x, z}{\mathbb {R}}^2\). The desired result then follows from Hörmander’s theorem (see, e.g., [27, Chapter 6]). \(\square \)
Proof of Lemma 3.3
Lemma 3.2 guarantees that the Fokker–Planck equation can be written directly from system (13). For geometric ergodicity, let
Due to Assumption 2.1(ii), Remark 2.2 and Young’s inequality, we then have for all \(\gamma > 0\)
We choose \(\gamma = \gamma ^* \,\,{:=}\,\,1 - b\delta + \sqrt{1 + (1 - b\delta )^2} > 0\) so that
and we notice that \(C(\gamma ^*) > 0\) if \(\delta > 1/(4b)\). In this case, we have
and problem (13) is dissipative. It remains to prove the irreducibility condition [23, Condition 4.3]. We remark that system (13) fits the framework of the example at the end of [23, Page 199], and therefore [23, Condition 4.3] is satisfied. The result then follows from [23, Theorem 4.4]. \(\square \)
Proof of Lemma 3.5
Integrating Eq. (14) with respect to z, we obtain the stationary Fokker–Planck equation for the process \(X^\varepsilon \), i.e.,
whose solution is given by
and which proves (16). In view of (15) and (36), Eq. (14) can be rewritten as
We now multiply the equation above by a continuously differentiable function \(f :{\mathbb {R}}^2 \rightarrow {\mathbb {R}}^N\), \(f = f(x,z)\), and integrate with respect to x and z. An integration by parts then yields
which implies the following identity in \({\mathbb {R}}^N\)
Finally, choosing
we obtain the desired result. \(\square \)
Appendix B: Proof of Proposition 3.14
B.1 Preliminary estimates
In order to prove the characterization provided by Proposition 3.14, we need two additional results on the filter. First, we prove a Jensen-like inequality for the kernel of the filter.
Lemma B.1
Let \(\delta > 0\) and k(r) be defined as
Then, for any \(t > 0\), \(p \ge 1\) and any function \(g\in \mathcal C^0([0, t])\) it holds
Proof
Let us first note that
Therefore, the measure \(\kappa _t(\mathrm {d}s)\) on [0, t] defined as
is a probability measure. An application of Jensen’s inequality therefore yields
Finally, since \(0< (1 - e^{-t/\delta })< 1\) and \(p \ge 1\), this yields the desired result. \(\square \)
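The proof of Lemma B.1 can be sanity-checked numerically. The sketch below (our own illustration) integrates the exponential kernel \(k(r) = \delta ^{-1} e^{-r/\delta }\) over [0, t] to confirm the normalization \(1 - e^{-t/\delta }\) of the measure \(\kappa _t\), and verifies a Jensen-type inequality \(|\int _0^t k(t-s)g(s)\,\mathrm {d}s|^p \le \int _0^t k(t-s)|g(s)|^p\,\mathrm {d}s\) on a sample function; since the display equations above are elided, the precise form of the inequality here is our reading of the lemma.

```python
import math

def kernel_integral(f, t, delta, n=20000):
    """Trapezoidal approximation of int_0^t k(t - s) f(s) ds, k(r) = exp(-r/delta)/delta."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        s = i * h
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-(t - s) / delta) / delta * f(s)
    return total * h

t, delta, p = 2.0, 0.3, 3
# normalization of the measure kappa_t: int_0^t k(t - s) ds = 1 - exp(-t/delta)
mass = kernel_integral(lambda s: 1.0, t, delta)
# Jensen-type inequality for g(s) = 1 + sin(s) and p >= 1
lhs = abs(kernel_integral(lambda s: 1.0 + math.sin(s), t, delta)) ** p
rhs = kernel_integral(lambda s: abs(1.0 + math.sin(s)) ** p, t, delta)
```

Since the total mass \(1 - e^{-t/\delta }\) is below one and \(p \ge 1\), the normalized Jensen inequality implies the unnormalized one, as the check confirms.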
The following lemma characterizes the action of the filter when it is applied to polynomials in \((t-s)\).
Lemma B.2
With the notation of Lemma B.1, it holds for all \(p \ge 0\)
where \(C > 0\) is a positive constant independent of \(\delta \).
Proof
The change of variable \(u = (t-s)/\delta \) yields
where \(\gamma \) is the lower incomplete gamma function, which is bounded by the complete gamma function \(\varGamma (p+1)\) independently of the second argument. \(\square \)
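Lemma B.2 can likewise be checked numerically. The sketch below (our own illustration) approximates \(\int _0^t k(t-s)(t-s)^p\,\mathrm {d}s\), which by the change of variable above equals \(\delta ^p \gamma (p+1, t/\delta )\), and confirms the bound \(\delta ^p \varGamma (p+1)\) claimed in the lemma.

```python
import math

def filtered_monomial(t, delta, p, n=20000):
    """Trapezoidal approximation of int_0^t k(t - s) (t - s)^p ds, k(r) = exp(-r/delta)/delta."""
    h = t / n
    total = 0.0
    for i in range(n + 1):
        r = i * h                      # r = t - s
        w = 0.5 if i in (0, n) else 1.0
        total += w * math.exp(-r / delta) / delta * r ** p
    return total * h

t, p = 5.0, 2.5
results = []
for delta in (0.5, 0.1, 0.02):
    val = filtered_monomial(t, delta, p)
    bound = delta ** p * math.gamma(p + 1)   # the lemma's bound C * delta^p, C = Gamma(p+1)
    results.append((val, bound))
```

For \(t/\delta \) large, the lower incomplete gamma function \(\gamma (p+1, t/\delta )\) is close to \(\varGamma (p+1)\), so the bound is nearly attained.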
B.2 Proof of Proposition 3.14
Denoting \(Y_t^\varepsilon \,\,{:=}\,\,X_t^\varepsilon /\varepsilon \), we will make use of the decomposition [29, Formula 5.8]
which is obtained applying the Itô formula to \(\varPhi \), the solution of the cell problem (5). Recall that by definition of \(Z_t^\varepsilon \) we have
Plugging decomposition (37) into the equation above, we obtain
where
Let us analyze the terms above one by one. For \(I_1^\varepsilon (t)\), one can show [29, Proposition 5.8]
where the remainder \(R_1^\varepsilon \) satisfies
Therefore, it holds
where we exploited the equality
and where
Now, Lemma B.1, inequality (38) and Lemma B.2 yield for all \(p \ge 1\)
where C is a positive constant independent of \(\varepsilon \) and \(\delta \). Therefore, for \(\delta \) sufficiently small, we get
We now consider the second term. Let us introduce the notation
and therefore rewrite
An application of the Itô formula to \(u(s, Q_s^\varepsilon )\) where \(u(s, x) = k(t-s)x\) yields
where \(B_t^\varepsilon \) is defined in (27). For the remainder \(R_2^\varepsilon (t)\), let us remark that for all \(p \ge 1\) it holds
where we applied Jensen’s inequality, an estimate for the moments of stochastic integrals [19, Formula (3.25), p. 163] and the boundedness of \(\varPhi \). Therefore, we have
In order to obtain bound (28) on \(B_t^\varepsilon \), let us remark that from (39) it holds for a constant \(C > 0\) depending only on p
The second term is bounded exponentially fast with respect to t and \(\delta \) due to (40). For the first term, applying Lemma B.1, inequality [19, Formula (3.25), p. 163] and Lemma B.2 we obtain for a constant \(C > 0\) independent of \(\delta \) and t
Therefore, it holds for \(\delta \) sufficiently small
which proves bound (28). Let us now consider \(I_3^\varepsilon (t)\). Since \(\varPhi \) is bounded, we simply have
almost surely. Finally, due to [29, Corollary 5.4], we know that \(X_t^\varepsilon \) has bounded moments of all orders and therefore
which concludes the proof. \(\square \)
Appendix C: Proofs of Section 3.3
C.1 Preliminary estimates
The following lemma shows that \(Z^\varepsilon \) has bounded moments of all orders.
Lemma C.1
Under Assumption 2.1, let \(Z^\varepsilon \) be distributed as the invariant measure \(\mu ^\varepsilon \) of the couple \((X^\varepsilon , Z^\varepsilon )^\top \). Then, for any \(p \ge 1\) there exists a constant \(C > 0\) uniform in \(\varepsilon \) such that
Proof
Let \(X_t^\varepsilon \) be at stationarity with respect to its invariant measure, whose density we recall is denoted by \(\varphi ^\varepsilon \). Let \(Z_t^\varepsilon \) be the corresponding filtered process. By definition of \(Z^\varepsilon _t\) and applying Lemma B.1, we have
which, together with the definition of k and the fact that \(X_s^\varepsilon \) has bounded moments of all orders [29, Corollary 5.4], implies for a constant \(C>0\)
In order to conclude, we remark that due to Lemma 3.3 we have for all \(t \ge 0\)
which, for t sufficiently large, yields the desired result. \(\square \)
Corollary C.2 is a direct consequence of Proposition 3.14 and provides a rough estimate of the difference between the trajectories \(X_t^\varepsilon \) and \(Z_t^\varepsilon \) when they are at stationarity.
Corollary C.2
Under Assumption 2.1, let the couple \((X^\varepsilon , Z^\varepsilon )^\top \) be distributed as its invariant measure \(\mu ^\varepsilon \). Then, if \(\delta \le 1\), it holds for any \(p \ge 1\)
for a constant \(C > 0\) independent of \(\varepsilon \) and \(\delta \).
Proof
Let \(p \ge 1\), then due to Proposition 3.14 there exists a constant \(C > 0\) depending only on p such that
Let us now remark that this result holds for \(X_t^\varepsilon \) being at stationarity and for \(Z_t^\varepsilon \) being its filtered process, and not for a couple \((X^\varepsilon , Z^\varepsilon )^\top \sim \mu ^\varepsilon \). In order to conclude, we remark that due to Lemma 3.3 we have for all \(t \ge 0\)
which, for t sufficiently large, yields the desired result. \(\square \)
The result above may at first seem counterintuitive. Indeed, for a fixed \(\varepsilon > 0\) and for \(\delta \rightarrow 0\) independently of \(\varepsilon \), one expects the filtered trajectory \(Z^\varepsilon \) to approach \(X^\varepsilon \). This is confirmed by the following lemma.
Lemma C.3
Under Assumption 2.1, let the couple \((X^\varepsilon , Z^\varepsilon )^\top \) be distributed as its invariant measure \(\mu ^\varepsilon \). Then, if \(\delta \le 1\), it holds for any \(p \ge 1\)
for a constant \(C > 0\) independent of \(\varepsilon \) and \(\delta \).
Proof
By Eq. (1), we have for all \(0 \le s < t\)
Therefore, by Assumption 2.1 and since \(X_t^\varepsilon \) has bounded moments of all orders at stationarity [29, Corollary 5.4], it holds for any \(p \ge 1\) and a constant \(C > 0\)
where \(\varphi ^\varepsilon \) is the invariant measure of \(X^\varepsilon \). By definition of \(Z_t^\varepsilon \), we have
which, applying Lemma B.1, inequality (41) and Lemma B.2, implies
Geometric ergodicity (Lemma 3.3) then implies for \(\rho ^\varepsilon \) the measure of the couple \((X^\varepsilon , Z^\varepsilon )^\top \)
which, for t sufficiently large and since \(\delta \le 1\), yields the desired result. \(\square \)
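The content of Lemma C.3 — that at stationarity the filtered path approaches the original one as \(\delta \rightarrow 0\) — is easy to observe numerically. The sketch below is our own illustration, with a simple Ornstein–Uhlenbeck process standing in for \(X^\varepsilon \); the parameter values are arbitrary.

```python
import math
import random

def mean_filter_gap(delta, T=50.0, dt=1e-3, seed=42):
    """Time-averaged |X_t - Z_t| for an OU path dX = -X dt + sqrt(2) dW
    filtered with the exponential kernel of width delta."""
    rng = random.Random(seed)
    n = int(T / dt)
    lam = math.exp(-dt / delta)
    X = Z = 0.0
    acc = 0.0
    for _ in range(n):
        X += -X * dt + math.sqrt(2.0 * dt) * rng.gauss(0.0, 1.0)
        Z = lam * Z + (1.0 - lam) * X   # discretization of dZ = (X - Z)/delta dt
        acc += abs(X - Z)
    return acc / n

# the average gap decreases as delta -> 0 with the same driving noise
gaps = [mean_filter_gap(d) for d in (1.0, 0.1, 0.01)]
```

Since the same seed (and hence the same noise path) is used for every \(\delta \), the monotone decrease of the gap isolates the effect of the filter width.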
Let us conclude with a last preliminary estimate concerning the matrices \(\widetilde{{\mathcal {M}}}_\varepsilon \) and \({\mathcal {M}}_\varepsilon \) defined in (20) and (21), respectively.
Lemma C.4
Let the assumptions of Corollary C.2 hold. Then, the matrices \({\mathcal {M}}_\varepsilon \) and \(\widetilde{{\mathcal {M}}}_\varepsilon \) satisfy
for a constant \(C > 0\) independent of \(\varepsilon \) and \(\delta \).
Proof
Applying Jensen’s and Cauchy–Schwarz inequalities, we have
The Lipschitz condition on \(V'\) together with the boundedness of the moments of \(X^\varepsilon \) and Corollary C.2 yield for a constant \(C > 0\)
which is the desired result. \(\square \)
C.2 Proof of Lemma 3.16
Let us consider the following system of stochastic differential equations for the processes \(X_t^\varepsilon , Z_t^\varepsilon , B_t^\varepsilon , Y_t^\varepsilon \)
whose generator \(\widetilde{{\mathcal {L}}}_\varepsilon \) is given by
Let us denote by \(\eta ^\varepsilon :{\mathbb {R}}^3 \times [0,L] \rightarrow {\mathbb {R}}\), \(\eta ^\varepsilon = \eta ^\varepsilon (x,z,b,y)\), the invariant measure of the quadruple \((X_t^\varepsilon , Z_t^\varepsilon , B_t^\varepsilon , Y_t^\varepsilon )\). Then, \(\eta ^\varepsilon \) solves the stationary Fokker–Planck equation \(\widetilde{{\mathcal {L}}}_\varepsilon ^* \eta ^\varepsilon = 0\), i.e., explicitly
We now multiply the equation above by a continuously differentiable function \(f :{\mathbb {R}}^2 \rightarrow {\mathbb {R}}^N\), \(f = f(z,b)\), and integrate with respect to x, z, b and y. An integration by parts then yields
which implies the following identity in \({\mathbb {R}}^N\)
Choosing
we obtain
We now consider the remainder and, applying Hölder’s inequality, Corollary C.2, Lemma C.1, Assumption 3.15 and (28), we get for p, q, r such that \(1/p+1/q+1/r=1\)
which completes the proof. \(\square \)
C.3 Proof of Lemma 3.17
Let us introduce the notation
and note that the aim is to show that \(\lim _{\varepsilon \rightarrow 0} \varDelta (\varepsilon ) = 0\). By the triangle inequality, we get
We first study \(\varDelta _1(\varepsilon )\); due to the boundedness of \(\varPhi '\), Assumption 3.15 and Corollary C.2, we have
which implies
We now consider \(\varDelta _2(\varepsilon )\). Integrating Eq. (42) with respect to z and b, we obtain the Fokker–Planck equation for the stationary marginal distribution \(\lambda :{\mathbb {R}}\times [0,L] \rightarrow {\mathbb {R}}\), \(\lambda = \lambda (x,y)\), of the couple \((X^\varepsilon ,Y^\varepsilon )\)
whose solution is given by
where
Therefore, since \(\varSigma = K\sigma \) and by Eqs. (4) and (19) we have
which shows that \(\varDelta _2(\varepsilon ) = 0\) and completes the proof. \(\square \)
C.4 Proof of Theorem 3.19
Let us consider decomposition (22), i.e.,
where \(I_1^\varepsilon (T)\) is defined in (22) and satisfies
and, by the proof of Theorem 3.12, we have independently of \(\varepsilon \)
A first-order Taylor expansion of \(V'\) yields
where \({\widetilde{X}}^\varepsilon \) is a random variable which assumes values between \(X^\varepsilon \) and \(Z^\varepsilon \). We can therefore write
We now consider the two terms separately and show they vanish for \(\varepsilon \rightarrow 0\). Integrating by parts in \(J_1^\varepsilon \), we obtain
We then pass to the limit as \(\varepsilon \rightarrow 0\) and integrate by parts again to obtain
We now turn to \(J_2^\varepsilon \). Hölder's inequality with conjugate exponents p and q and the assumptions on p and V yield
Since \({\widetilde{X}}^\varepsilon \) assumes values between \(X^\varepsilon \) and \(Z^\varepsilon \), it has bounded moments by [29, Corollary 5.4] and Lemma C.1. Hence, applying Lemma C.3 we have
which, since \(\delta = \varepsilon ^\zeta \) with \(\zeta > 2\), implies
Finally, Lemma C.4 and the weak convergence of the invariant measure \(\varphi ^\varepsilon \) to \(\varphi ^0\) imply
which, together with (44) and (45), implies that \(I_1^\varepsilon (T) \rightarrow 0\) as \(T \rightarrow \infty \) and \(\varepsilon \rightarrow 0\), and hence the desired result. \(\square \)
Appendix D: Proof of Theorem 3.20
First, the ergodic theorem yields
then applying Proposition 3.14 at stationarity we obtain
and due to the Cauchy–Schwarz inequality and estimates (28) and (29) we have
for a constant \(C>0\) independent of \(\varepsilon \) and \(\delta \). Let us now consider \(I_1^\varepsilon \). Employing Eq. (43) with the function \(f(z,b) = 1/2 b^2\) gives
which, together with bound (46) and the hypothesis on \(\delta \), implies
which is the desired result. \(\square \)
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Abdulle, A., Garegnani, G., Pavliotis, G.A. et al. Drift Estimation of Multiscale Diffusions Based on Filtered Data. Found. Comput. Math. (2021). https://doi.org/10.1007/s10208-021-09541-9
Keywords
 Parameter inference
 Diffusion process
 Data-driven homogenization
 Filtering
 Bayesian inference
 Langevin equation
Mathematics Subject Classification
 62F15
 65C30
 62M05
 74Q10