# Adaptive nonparametric drift estimation for diffusion processes using Faber–Schauder expansions


## Abstract

We consider the problem of nonparametric estimation of the drift of a continuously observed one-dimensional diffusion with periodic drift. Motivated by computational considerations, van der Meulen et al. (Comput Stat Data Anal 71:615–632, 2014) defined a prior on the drift as a randomly truncated and randomly scaled Faber–Schauder series expansion with Gaussian coefficients. We study the behaviour of the posterior obtained from this prior from a frequentist asymptotic point of view. If the true data generating drift is smooth, it is proved that the posterior is adaptive with posterior contraction rates for the \(L_2\)-norm that are optimal up to a log factor. Contraction rates in \(L_p\)-norms with \(p\in (2,\infty ]\) are derived as well.

## 1 Introduction

Assume continuous-time observations \(X^T=\{X_t\,:\, t\in [0,T]\}\) from a diffusion process *X* defined as (weak) solution to the stochastic differential equation (sde)

$$\begin{aligned} \mathrm {d}X_t = b_0(X_t)\,\mathrm {d}t + \mathrm {d}W_t. \end{aligned}\tag{1}$$

Here *W* is a Brownian motion and the drift \(b_0\) is assumed to be a real-valued measurable function on the real line that is 1-periodic and square integrable on [0, 1]. The assumed periodicity implies that we can alternatively view the process *X* as a diffusion on the circle. This model has been used for dynamic modelling of angles, see for instance Pokern (2007) and Hindriks (2011).

We are interested in nonparametric adaptive estimation of the drift. This problem has recently been studied by multiple authors. Spokoiny (2000) proposed a locally linear smoother with a data-driven bandwidth choice that is rate adaptive with respect to \(|b''(x)|\) for all *x* and optimal up to a log factor. Interestingly, the result is non-asymptotic and does not require ergodicity. Dalalyan and Kutoyants (2002) and Dalalyan (2005) consider ergodic diffusions and construct estimators that are asymptotically minimax and adaptive under Sobolev smoothness of the drift. Their results were extended to the multidimensional case by Strauch (2015).

In this paper we focus on Bayesian nonparametric estimation, a paradigm that has become increasingly popular over the past two decades. An overview of some advances of Bayesian nonparametric estimation for diffusion processes is given in van Zanten (2013).

The Bayesian approach requires the specification of a prior. Ideally, the prior on the drift is chosen such that drawing from the posterior is computationally efficient while at the same time ensuring that the resulting inference has good theoretical properties, which are quantified by a contraction rate. This is a rate at which we can shrink balls around the true parameter value while maintaining most of the posterior mass. More formally, if *d* is a semimetric on the space of drift functions, a contraction rate \(\varepsilon _T\) is a sequence of positive numbers \(\varepsilon _T\downarrow 0\) for which the posterior mass of the balls \(\{b\,:\, d(b,b_0)\le \varepsilon _T\}\) converges in probability to 1 as \(T\rightarrow \infty \), under the law of *X* with drift \(b_0\). For a general discussion on contraction rates, see for instance Ghosal et al. (2000) and Ghosal and van der Vaart (2007).

For diffusions, the problem of deriving optimal posterior convergence rates has been studied recently under the additional assumption that the drift integrates to zero, \(\int _0^1 b_0(x) d x =0\). In Papaspiliopoulos et al. (2012) a mean zero Gaussian process prior is proposed together with an algorithm to sample from the posterior. The precision operator (inverse covariance operator) of the proposed Gaussian process is given by \(\eta \left( (-\Delta )^{\alpha +1/2} + \kappa I\right) \), where \(\Delta \) is the one-dimensional Laplacian, *I* is the identity operator, \(\eta , \kappa >0\) and \(\alpha +1/2 \in \{2,3,\ldots \}\). A first consistency result was shown in Pokern et al. (2013).

*L* and \(\alpha \) are fixed and \(b_0\) is assumed to be \(\alpha \)-Sobolev smooth, then the optimal posterior rate of contraction, \(T^{-\alpha /(1+2\alpha )}\), is obtained. Note that this result is nonadaptive, as the regularity of the prior must match the regularity of \(b_0\). For obtaining optimal posterior contraction rates for the full range of possible regularities of the drift, two options are investigated: endowing either *L* or \(\alpha \) with a hyperprior. Only the second option results in the desired adaptivity over all possible regularities.

These functions feature prominently in the Lévy-Ciesielski construction of Brownian motion (see for instance Bhattacharya and Waymire 2007, paragraph 10.1).

The prior coefficients \(Z_{jk}\) are equipped with a Gaussian distribution, and the truncation level *R* and the scaling factor *S* are equipped with independent priors. Truncation in absence of scaling increases the apparent smoothness of the prior (as illustrated for deterministic truncation by example 4.5 in van der Vaart and van Zanten (2008)), whereas scaling by a number \(\ge 1\) decreases the apparent smoothness. (Scaling with a number \(\le 1\) only increases the apparent smoothness to a limited extent, see for example Knapik et al. (2011).)

The simplest type of prior is obtained by taking the coefficients \(Z_{jk}\) independent. We do however also consider the prior that is obtained by first expanding a periodic Ornstein–Uhlenbeck process into the Faber–Schauder basis, followed by random scaling and truncation. We will explain that specific stationarity properties of this prior make it a natural choice.

Draws from the posterior can be computed using a reversible jump Markov Chain Monte Carlo (MCMC) algorithm (cf. van der Meulen et al. (2014)). For both types of priors, fast computation is facilitated by leveraging inherent sparsity properties stemming from the compact support of the functions \(\psi _{jk}\). In the discussion of van der Meulen et al. (2014) it was argued that inclusion of both the scaling and random truncation in the prior is beneficial. However, this claim was only supported by simulation results.

*In this paper we support this claim theoretically by proving adaptive contraction rates of the posterior distribution in case the prior* (3) *is used.* We start from a general result in van der Meulen et al. (2006) on Brownian semimartingale models, which we adapt to our setting. Here we take into account that as the drift is assumed to be one-periodic, information accumulates in a different way compared to (general) ergodic diffusions. Subsequently we verify that the resulting prior mass, remaining mass and entropy conditions appearing in this adapted result are satisfied for the prior defined in Eq. (3). An application of our results shows that if the true drift function is \(B_{\infty ,\infty }^\beta \)-Besov smooth, \(\beta \in (0,2)\), then by appropriate choice of the variances of \(Z_{jk}\), as well as the priors on *R* and *S*, the posterior for the drift *b* contracts at the rate \((T/\log T)^{-\beta /(1+2\beta )}\) around the true drift in the \(L_2\)-norm. Up to the log factor this rate is minimax-optimal (see for instance Kutoyants 2004, Theorem 4.48). Moreover, it is adaptive: the prior does not depend on \(\beta \). In case the true drift has Besov-smoothness greater than or equal to 2, our method guarantees contraction rates equal to essentially \(T^{-2/5}\) (corresponding to \(\beta =2\)). A further application of our results shows that for \(L_p\)-norms we obtain contraction rate \(T^{-(\beta -1/2+1/p)/(1+2\beta )}\), up to log-factors.

The paper is organised as follows. In the next section we give a precise definition of the prior. In Sect. 3 a general contraction result for the class of diffusion processes considered here is derived. Our main result on posterior contraction for \(L^p\)-norms with \(p\ge 2\) is presented in Sect. 4. Many results of this paper concern general properties of the prior and their application is not confined to drift estimation of diffusion processes. To illustrate this, we show in Sect. 5 how these results can easily be adapted to nonparametric regression and nonparametric density estimation. Proofs are gathered in Sect. 6. The appendix contains a couple of technical results.

## 2 Prior construction

### 2.1 Model and posterior

### Lemma 1

If \(b_0 \in L^2({{\mathrm{\mathbb {T}}}}),\) then the SDE Eq. (1) has a unique weak solution.

The proof is in Sect. 6.1.

*b*. If \(P^0\) denotes the law of \(X^T\) when the drift is zero, then \(P^b\) is absolutely continuous with respect to \(P^0\) with Radon-Nikodym density

*A* is a Borel set of \(L^2(\mathbb {T})\). These assertions are verified as part of the proof of Theorem 3.

### 2.2 Motivating the choice of prior

We are interested in randomly truncated, scaled series priors that simultaneously enable a fast algorithm for obtaining draws from the posterior and enjoy good contraction rates.

Consider first a *finite* series prior. Let \(\{\psi _1,\ldots , \psi _r\}\) denote basis functions and \(Z=(Z_1,\ldots , Z_r)\) a mean zero Gaussian random vector with precision matrix \(\Gamma \). Assume that the prior for *b* is given by \(b=\sum _{i=1}^r Z_i \psi _i\). By conjugacy, it follows that \( Z \mid X^T \sim \mathrm N(W^{-1}\mu , W^{-1})\), where \(W= G + \Gamma \) and, for \(i, i' \in \{1,\ldots ,r\}\),

$$\begin{aligned} \mu _i = \int _0^T \psi _i(X_t)\,\mathrm {d}X_t, \qquad G_{i,i'} = \int _0^T \psi _i(X_t)\psi _{i'}(X_t)\,\mathrm {d}t. \end{aligned}\tag{6}$$

The matrix *G* is referred to as the Grammian. From these expressions it follows that it is computationally advantageous to exploit *compactly supported* basis functions. Whenever \(\psi _{i}\) and \(\psi _{i'}\) have nonoverlapping supports, we have \(G_{i, i'}=0\). Depending on the choice of such basis functions, the Grammian *G* will have a specific sparsity structure (a set of index pairs \((i,i')\) such that \(G_{i,i'} = 0\), independently of \(X^T\)). This sparsity structure is inherited by *W* as long as the sparsity structure of the prior precision matrix matches that of *G*.
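The conjugate update described above can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes a path observed on a fine time grid (so the stochastic and ordinary integrals are replaced by Itô and Riemann sums), an Euler scheme for simulation, and four disjointly supported periodic hat functions as a stand-in basis. The names `make_psi`, `conjugate_posterior` and `simulate_path` are hypothetical.

```python
import numpy as np

def hat(x):
    """Hat function on [0, 1] with peak 1 at 1/2 and value 0 elsewhere."""
    x = np.asarray(x, dtype=float)
    return np.clip(1.0 - np.abs(2.0 * x - 1.0), 0.0, None)

def make_psi(j, k):
    """1-periodic hat supported on [(k-1)/2**j, k/2**j] (illustrative basis choice)."""
    return lambda x: hat(2**j * np.mod(x, 1.0) - (k - 1))

def conjugate_posterior(path, dt, basis, Gamma):
    """Gaussian posterior of Z given a discretely sampled path.

    mu_i approximates int psi_i(X_t) dX_t by an Ito sum and
    G approximates the Grammian by a Riemann sum (discretisation is an assumption).
    """
    x = path[:-1]                     # left endpoints, as required for an Ito sum
    dx = np.diff(path)
    Psi = np.stack([psi(x) for psi in basis])   # shape (r, n)
    mu = Psi @ dx
    G = (Psi * dt) @ Psi.T
    W = G + Gamma                     # posterior precision
    mean = np.linalg.solve(W, mu)     # W^{-1} mu
    cov = np.linalg.inv(W)            # W^{-1}
    return mean, cov, G

def simulate_path(drift, T, dt, rng, x0=0.0):
    """Euler scheme for dX = drift(X) dt + dW (used only to generate test data)."""
    n = int(round(T / dt))
    x = np.empty(n + 1)
    x[0] = x0
    noise = rng.normal(0.0, np.sqrt(dt), size=n)
    for t in range(n):
        x[t + 1] = x[t] + drift(x[t]) * dt + noise[t]
    return x

rng = np.random.default_rng(0)
dt = 0.01
path = simulate_path(lambda x: np.sin(2 * np.pi * x), T=50.0, dt=dt, rng=rng)
basis = [make_psi(2, k) for k in range(1, 5)]   # four hats with disjoint interiors
mean, cov, G = conjugate_posterior(path, dt, basis, Gamma=np.eye(4))
```

Because the four hats have disjoint interiors, the Grammian computed here is diagonal whatever the path looks like, which is the sparsity property the paragraph refers to.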

In the next section we make a specific choice for the basis functions and the prior precision matrix \(\Gamma \).

### 2.3 Definition of the prior

*R* and the scaling factor *S* are equipped with (hyper)priors. We extend *b* periodically if we want to consider *b* as a function on the real line. If we identify the double index (*j*, *k*) in (3) with the single index \(i = 2^{j}+k\), then we can write \(b^{R,S} = S \sum _{i=1}^{2^{R+1}} \psi _i Z_i\). Let

*R*, *S*).

We will consider two choices of priors for the sequence \(Z_1,Z_2,\ldots \). Our first choice consists of taking independent Gaussian random variables. If the coefficients \(Z_{i}\) are independent with standard deviation \(2^{-\ell (i)/2}\), the random draws from this prior are scaled piecewise linear interpolations on a dyadic grid of a Brownian bridge on [0, 1] plus the random function \(Z_1\psi _1\). The choice of \(\psi _1\) is motivated by the fact that in this case \({{\text {Var}}}\left( b(t) \big | S=s, R=\infty \right) = s^2\) is independent of *t*.
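The constant-variance property can be checked numerically on a dyadic grid. The sketch below rests on assumptions that are not spelled out in the surviving text: the hat function \(\Lambda (x) = 1-|2x-1|\) on [0, 1], \(\psi _{j,k}\) supported on \([(k-1)2^{-j}, k2^{-j}]\), \(\psi _1\) equal to \(\psi _{0,1}\) shifted by 1/2, and \(\ell (1)=\ell (2)=0\), \(\ell (2^j+k)=j\). The function names are hypothetical.

```python
import numpy as np

def hat(x):
    """Assumed form of the hat function: peak 1 at 1/2, zero outside [0, 1]."""
    x = np.asarray(x, dtype=float)
    return np.clip(1.0 - np.abs(2.0 * x - 1.0), 0.0, None)

def level(i):
    """Level l(i): 0 for i in {1, 2}, and j for i = 2**j + k with 1 <= k <= 2**j."""
    if i <= 2:
        return 0
    return (i - 1).bit_length() - 1

def psi(i, x):
    """1-periodic Faber-Schauder function with single index i = 2**j + k."""
    x = np.mod(np.asarray(x, dtype=float), 1.0)
    if i == 1:
        # psi_1 is psi_{0,1} shifted by 1/2, so it peaks at 0 (assumed convention)
        return hat(np.mod(x + 0.5, 1.0))
    j = level(i)
    k = i - 2**j
    return hat(2**j * x - (k - 1))

def prior_variance(t, R, s=1.0):
    """Var(b(t) | S=s, R) for independent Z_i with standard deviation 2**(-l(i)/2)."""
    n = 2 ** (R + 1)
    return s**2 * sum(psi(i, t) ** 2 * 2.0 ** (-level(i)) for i in range(1, n + 1))

def draw_prior(t, R, s, rng):
    """One draw of b^{R,S} evaluated at the points t."""
    n = 2 ** (R + 1)
    sd = np.array([2.0 ** (-level(i) / 2) for i in range(1, n + 1)])
    z = rng.normal(0.0, sd)
    return s * sum(z[i - 1] * psi(i, t) for i in range(1, n + 1))

R = 4
grid = np.arange(2 ** (R + 1)) / 2 ** (R + 1)   # dyadic points k / 2**(R+1)
variances = prior_variance(grid, R)
sample = draw_prior(grid, R, s=1.0, rng=np.random.default_rng(0))
```

At the dyadic points of level \(R+1\) the truncated series coincides with the untruncated one, so under these assumptions `variances` equals \(s^2\) at every grid point, not just in the limit \(R\rightarrow \infty \).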

*V* is continuous and can be extended to a periodic function on \(\mathbb {R}\). Then *V* can be represented as an infinite series expansion in the Faber–Schauder basis:

*S* and truncating at *R* we obtain from *V* the second choice of prior on the drift function *b*. Visualisations of the covariance kernels \({{\text {Cov}}}\left( b(s) , b(t) \right) \) for the first prior (Brownian bridge type) and for the second prior (periodic Ornstein–Uhlenbeck process prior with parameter \(\gamma = 1.48\)) are shown in Fig. 2 (for \(S=1\) and \(R =\infty \)).

### 2.4 Sparsity structure induced by choice of \(Z_i\)

Conditional on *R* and *S*, the posterior of \(Z^R\) is Gaussian with precision matrix \(G^R+\Gamma ^R\) (here \(G^R\) is the Grammian corresponding to using all basis functions up to and including level *R*).

If the coefficients are independent it is trivial to see that the precision matrix \(\Gamma \) does not destroy the sparsity structure of *G*, as defined in (6). This is convenient for numerical computations. The next lemma details the situation for periodic Ornstein–Uhlenbeck processes.

### Lemma 2

Let *V* be defined as in Eq. (10).

- 1. The sparsity structure of the precision matrix of the infinite stochastic vector *Z* (appearing in the series representation (11)) equals the sparsity structure of *G*, as defined in (6).
- 2. The entries of the covariance matrix of the random Gaussian coefficients \(Z_i\) and \(Z_{i'}\), \(A_{i,i'} = \mathbb {E}Z_i Z_{i'}\), satisfy the following bounds: \(A_{11} = A_{22} = \tfrac{\sigma ^2}{2\gamma }\coth (\gamma /2)\) and \(A_{12} = A_{21} = \tfrac{\sigma ^2}{2\gamma }\sinh ^{-1}(\gamma /2)\), and for \(\gamma \le 1.5\) and \(i\ge 3\),
$$\begin{aligned} 0.95 \cdot 2^{-\ell (i)}\sigma ^2/4 \le A_{ii} \le 2^{-\ell (i)}\sigma ^2/4 \end{aligned}$$
and for \(i\ne i'\)
$$\begin{aligned} |A_{ii'}| \le {\left\{ \begin{array}{ll} 0.20\sigma ^22^{-1.5(\ell (i)\vee \ell ( i'))}&{} \qquad i \wedge i'\le 2<i\vee i',\\ 0.37 \sigma ^2 2^{-1.5(\ell (i)+\ell (i'))}&{} \qquad \text {otherwise.} \end{array}\right. } \end{aligned}$$

The proof is given in Sect. 6.2. By the first part of the lemma, this prior does not destroy the sparsity structure of *G* either. The second part asserts that while the off-diagonal entries of \(A^{r}\) are not zero, they are of smaller order than the diagonal entries, quantifying that the covariance matrix of the coefficients in the Schauder expansion is close to a diagonal matrix.

## 3 Posterior contraction for diffusion processes

*T*) and choose measurable subsets (sieves) \({\mathscr {B}}_T \subset L^2({{\mathrm{\mathbb {T}}}})\). Define the balls

The *covering number* of a set *A* for a semimetric \(\rho \), denoted by \(N(\varepsilon ,A,\rho )\), is defined as the minimal number of \(\rho \)-balls of radius \(\varepsilon \) needed to cover the set *A*. The logarithm of the covering number is referred to as the entropy.

The following theorem characterises the rate of posterior contraction for diffusions on the circle in terms of properties of the prior.

### Theorem 3

*T* big enough

*K* big enough,

Equations (12), (13) and (14) are referred to as the entropy condition, small ball condition and remaining mass condition of Theorem 3 respectively. The proof of this theorem is in Sect. 6.3.

## 4 Theorems on posterior contraction rates

The main result of this section, Theorem 9, characterises the frequentist rate of contraction of the posterior probability around a fixed parameter \(b_0\) of unknown smoothness using the truncated series prior from Sect. 2.

We make the following assumption on the true drift function.

### Assumption 4

Note that we use a slightly different symbol for the norm, as we denote the \(L^2\)-norm by \(\Vert \cdot \Vert _2\).

### Remark 5

If \(\beta \in (0,1)\), then \(\beta \)–Hölder smoothness and \(B^\beta _{\infty ,\infty }\)–smoothness coincide (cf. Proposition 4.3.23 in Giné and Nickl (2016)).

For the prior defined in Eqs. (7)–(9) we make the following assumptions.

### Assumption 6

*A* satisfies one of the following conditions:

- (A) For fixed \(\alpha >0\), \(A_{ii}=2^{-2\alpha \ell (i)}\) and \(A_{ii'}=0\) for \(i\ne i'\).
- (B) There exist \(0< c_1 < c_2\) and \(0 < c_3\) with \(3 c_3 < c_1\), independent of *r*, such that for all \(i, i' \in {\mathscr {I}}_r\)
$$\begin{aligned}&c_1 2^{-\ell (i)} \le A_{ii} \le c_2 2^{-\ell (i)},\\&|A_{ii'}| \le c_3 2^{-1.5(\ell (i)+\ell (i'))} \quad \text { if } i \ne i'. \end{aligned}$$

In particular, the second assumption is fulfilled by the prior defined by Eq. (10) if \(0 < \gamma \le 3/2\), for any \(\sigma ^2 > 0\).

### Assumption 7

The prior on *R* can be defined as \(R= \lfloor \log _2 Y\rfloor \), where *Y* is Poisson distributed. Equation (18) is satisfied for a whole range of distributions, including the popular family of inverse gamma distributions. Since the inverse gamma prior on \(S^2\) decays polynomially (Lemma 17), condition (A2) of Shen and Ghosal (2015) is not satisfied and hence their posterior contraction results cannot be applied to our prior. We obtain the following result for our prior.
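As an illustration of hyperpriors of this type, the sketch below draws a truncation level from a floored base-2 logarithm of a Poisson variable and a scale with inverse gamma distributed square. Several details are assumptions made only for the example: the Poisson mean, the inverse gamma shape and rate, and the convention of redrawing when \(Y=0\) (where the logarithm is undefined). The function names are hypothetical.

```python
import numpy as np

def sample_truncation_level(rng, lam=10.0):
    """Draw R = floor(log2(Y)) with Y ~ Poisson(lam), redrawing while Y = 0.

    Conditioning on Y >= 1 is an assumption made here so that the logarithm is defined.
    """
    y = 0
    while y < 1:
        y = int(rng.poisson(lam))
    return y.bit_length() - 1          # exact floor(log2(y)) for a positive integer

def sample_scale(rng, shape=2.0, rate=1.0):
    """Draw S with S**2 ~ inverse gamma(shape, rate); parameter values are illustrative."""
    s_squared = 1.0 / rng.gamma(shape, 1.0 / rate)   # reciprocal of a gamma draw
    return float(np.sqrt(s_squared))

rng = np.random.default_rng(1)
levels = [sample_truncation_level(rng) for _ in range(200)]
scales = [sample_scale(rng) for _ in range(200)]
```

The logarithm makes the number of basis functions \(2^{R+1}\) grow roughly like *Y* itself, so a Poisson tail on *Y* translates into a light tail on the model dimension, while the inverse gamma choice leaves a heavy polynomial tail on the scale.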

### Theorem 8

*n* sufficiently large

The following theorem is obtained by applying these bounds to Theorem 3 after taking \(\varepsilon _n=(T / \log T)^{-\beta /(1+2\beta )}\).

### Theorem 9

This means that when the true parameter is from \(B_{\infty ,\infty }^\beta [0,1]\) with \(\beta <2\), a rate is obtained that is optimal, possibly up to a log factor. When \(\beta \ge 2\), then \(b_0\) is in particular in the space \(B_{\infty ,\infty }^{2-\delta }[0,1]\) for every small positive \(\delta \), and therefore the posterior contracts at a rate of essentially \(T^{-2/5}\).

When a different function \(\Lambda \) is used, defined on a compact interval of \(\mathbb {R}\), and the basis elements are defined by \(\psi _{jk}=\sum _{m\in \mathbb {Z}}\Lambda (2^{j}(x-m)+k-1)\), forcing them to be 1-periodic, then Theorem 9 and the derived results for applications still hold, provided \(\Vert \psi _{jk}\Vert _\infty = 1\) and \(\psi _{j,k}\cdot \psi _{j,l}\equiv 0\) when \(|k-l|\ge d\) for a fixed \(d \in \mathbb {N}\), and provided the smoothness assumptions on \(b_0\) are changed accordingly. A finite number of basis elements can be added or redefined as long as they are 1-periodic.

It is easy to see that our results imply posterior convergence rates in weaker \(L^p\)-norms, \(1\le p<2\), with the same rate. When \(p\in (2,\infty ]\) the \(L^p\)-norm is stronger than the \(L^2\)-norm. We apply ideas of Knapik and Salomond (2014) to obtain rates for stronger \(L^p\)-norms.

### Theorem 10

These rates are similar to the rates obtained for density estimation in Giné and Nickl (2011). However, our proof is less involved. Note that we only have consistency for \(\beta >1/2-1/p\).
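To make the dependence of the rates on \(\beta \) and *p* concrete, the following small sketch evaluates the polynomial exponent \((\beta -1/2+1/p)/(1+2\beta )\) quoted in the introduction and the \(L^2\) rate \((T/\log T)^{-\beta /(1+2\beta )}\). Log factors in the \(L^p\) case are ignored, and the function names are hypothetical.

```python
import math

def rate_exponent(beta, p=2.0):
    """Exponent e such that the L^p contraction rate is T**(-e), up to log factors.

    For p = 2 this reduces to beta / (1 + 2 * beta); p may be math.inf.
    """
    return (beta - 0.5 + 1.0 / p) / (1.0 + 2.0 * beta)

def l2_rate(T, beta):
    """The L^2 rate (T / log T)**(-beta / (1 + 2 * beta))."""
    return (T / math.log(T)) ** (-beta / (1.0 + 2.0 * beta))

def is_consistent(beta, p):
    """The rate tends to zero only if beta > 1/2 - 1/p."""
    return rate_exponent(beta, p) > 0.0

# Exponents for a few smoothness levels in L^2 and in the supremum norm
table = {(beta, p): rate_exponent(beta, p)
         for beta in (0.25, 0.5, 1.0, 2.0)
         for p in (2.0, math.inf)}
```

For \(\beta =2\) and \(p=2\) the exponent equals 2/5, matching the rate \(T^{-2/5}\) mentioned above, while for \(p=\infty \) the exponent is positive only when \(\beta >1/2\).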

## 5 Applications to nonparametric regression and density estimation

Our general results also apply to other models. The following results are obtained for \(b_0\) satisfying Assumption 4 and the prior satisfying Assumptions 6 and 7.

### 5.1 Nonparametric regression model

### 5.2 Density estimation

*n* independent observations \(X^n := (X_1,\ldots ,X_n)\) with \(X_i\sim p_0\) where \(p_0\) is an unknown density on [0, 1] relative to the Lebesgue measure. Let \(\mathscr {P}\) denote the space of densities on [0, 1] relative to the Lebesgue measure. The natural distance for densities is the Hellinger distance *h* defined by

*b* is endowed with the prior of Theorem 9 or its non-periodic version. Assume that \(\log p_0\) is \(\beta \)-smooth in the sense of Assumption 4. Applying Ghosal et al. (2000), theorem 2.1 and van der Vaart and van Zanten (2008), lemma 3.1 to Theorem 8, we obtain for a big enough constant \(M>0\)

## 6 Proofs

### 6.1 Proof of lemma 1

Since conditions (ND) and (LI) of (Karatzas and Shreve 1991, theorem 5.15) hold, the SDE Eq. (1) has a unique weak solution up to an explosion time.

### 6.2 Proof of lemma 2

### Proof of the first part

For the proof we introduce some notation: for any (*j*, *k*), \((j', k')\) we write \((j, k) \prec (j', k')\) if \(\text {supp}\, \psi _{j',k'}\subset \text {supp}\, \psi _{j,k}\). The set of indices becomes a lattice with partial order \(\prec \), and by \((j,k) \vee (j',k')\) we denote the supremum. Identify *i* with (*j*, *k*) and similarly \(i'\) with \((j',k')\).

Since *V* is a Gaussian process, the vector *Z* is mean-zero Gaussian, say with (infinite) precision matrix \(\Gamma \). Now \(\Gamma _{i,i'}=0\) if there exists a set \({\mathscr {L}}\subset \mathbb {N}\) such that \({\mathscr {L}} \cap \{i,i'\}=\varnothing \) for which, conditional on \(\{ Z_{i^\star },\, i^\star \in {\mathscr {L}}\}\), \(Z_i\) and \(Z_{i'}\) are independent.

*V* at all times \(k 2^{-j^\star -1}\), \(k=0,\ldots ,2^{j^\star +1}\). Now \(Z_i\) and \(Z_{i'}\) are conditionally independent given \(\{V_t, t=k 2^{-j^\star -1},\, k=0,\ldots ,2^{j^\star +1}\}\) by (23) and the Markov property of the nonperiodic Ornstein–Uhlenbeck process. The result follows since \(\sigma (\{Z_{i^\star },\, i^\star \in {\mathscr {L}}\})=\sigma (\{V_t, t=k 2^{-j^\star -1},\, k=0,\ldots ,2^{j^\star +1}\})\).

### Lemma 11

### Proof

### Proof of the second part

Denote by [*a*, *b*], [*c*, *d*] the support of \(\psi _i\) and \(\psi _{i'}\) respectively and let \(m = (b+a)/2\) and \(n = (d+c)/2\), but for \(i=1\), let \(m=0\). We have \(Z_1 = V_0\), \(Z_2 = V_{1/2}\) and \({{\text {Var}}}\left( Z_1 \right) = {{\text {Var}}}\left( Z_2 \right) = \frac{\sigma ^2}{2\gamma }\coth (\gamma /2)\), and \({{\text {Cov}}}\left( Z_1 , Z_2 \right) = \frac{\sigma ^2}{2\gamma }\sinh ^{-1}(\gamma /2)\). Note that the \(2\times 2\) covariance matrix of \(Z_1\) and \(Z_2\) has eigenvalues \(\tfrac{\sigma ^2}{2\gamma } {\text {tanh}}(\gamma /4)\) and \(\tfrac{\sigma ^2}{2\gamma } \coth (\gamma /4)\) and is strictly positive definite. \(\square \)

By midpoint displacement, \(2Z_{i} = 2V_{m} - V_{a} - V_{b}\), \(i > 2\) and \(K(s,t)=\mathbb {E}{V}_s {V}_t= \frac{\sigma ^2}{2\gamma }\frac{1}{1-e^{-\gamma }} ( e^{-\gamma |t-s|}+e^{-\gamma (1-|t-s|)})\).
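The kernel *K* and the midpoint displacement identity can be used to check the diagonal entries of Lemma 2 numerically. The sketch below is only a sanity check under the formulas stated in this section: it expands \({{\text {Var}}}(V_m - (V_a+V_b)/2)\) in terms of *K*, uses stationarity so that only the half-width \(h=2^{-j-1}\) of the support matters, and compares the result with \(2^{-\ell (i)}\sigma ^2/4\). The function names are hypothetical.

```python
import numpy as np

def ou_kernel(s, t, gamma, sigma=1.0):
    """Covariance K(s, t) of the periodic Ornstein-Uhlenbeck process on [0, 1]."""
    d = np.abs(t - s)
    c = sigma**2 / (2.0 * gamma * (1.0 - np.exp(-gamma)))
    return c * (np.exp(-gamma * d) + np.exp(-gamma * (1.0 - d)))

def coefficient_variance(j, gamma, sigma=1.0):
    """Var(Z_i) for a level-j coefficient (j >= 1), from 2 Z_i = 2 V_m - V_a - V_b.

    With h = (b - a) / 2 = 2**(-j-1):
    Var(V_m - V_a/2 - V_b/2) = 1.5 K(0) + 0.5 K(2h) - 2 K(h).
    """
    h = 2.0 ** (-j - 1)
    k0 = ou_kernel(0.0, 0.0, gamma, sigma)
    kh = ou_kernel(0.0, h, gamma, sigma)
    k2h = ou_kernel(0.0, 2.0 * h, gamma, sigma)
    return 1.5 * k0 + 0.5 * k2h - 2.0 * kh

gamma = 1.48
# Ratio of Var(Z_i) to the upper bound 2**(-j) * sigma**2 / 4, for levels 1..8
ratios = [coefficient_variance(j, gamma) / (2.0 ** (-j) / 4.0) for j in range(1, 9)]
```

A first-order expansion of *K* around zero gives \({{\text {Var}}}(Z_i)\approx \sigma ^2 h/2 = 2^{-\ell (i)}\sigma ^2/4\), which is why the ratios approach 1 from below as the level increases.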

- 1. The entries on the diagonal, \(i = i'\);
- 2. The interiors of the supports of \(\psi _i\) and \(\psi _{i'}\) are non-overlapping;
- 3. The support of \(\psi _{i'}\) is contained in the support of \(\psi _i\).

*Case 1.* By elementary computations for \(i > 2\),

*Case 2.* Necessarily \(i, i' > 2\). By twofold application of lemma 11

*Case 3.*

*a*, *b* and *m* are not in (*c*, *d*), we obtain

### 6.3 Proof of theorem 3

- 1. For every \(T>0\) and \(b_1,b_2\in L^2(\mathbb {T})\) the measures \(P^{b_1,T}\) and \(P^{b_2,T}\) are equivalent.
- 2. The posterior as defined in Eq. (5) is well defined.
- 3. Define the (random) *Hellinger semimetric* \(h_T\) on \(L^2(\mathbb {T})\) by
$$\begin{aligned} h_T^2(b_1,b_2):= \int _0^{T} \Bigl (b_1-b_2\Bigr )^2(X_t)\,{\,\mathrm {d}}t, \quad b_1,\, b_2 \in L^2(\mathbb {T}). \end{aligned}\tag{28}$$
There are constants \(0<c<C\) for which
$$\begin{aligned} \lim _{T\rightarrow \infty } P^{\theta _0,T}\Bigl (c\sqrt{T}\Vert b_1-b_2\Vert _2\le h_T(b_1,b_2) \le C\,\sqrt{T}\Vert b_1-b_2\Vert _2, \forall \, b_1, b_2\in L^2(\mathbb {T}) \Bigr ) =1. \end{aligned}$$

*f* for which the above integrals are defined. Since we are working with 1-periodic functions, we define the periodic local time by

*f* we have

Conditions 1 and 2 now follow by arguing precisely as in lemmas A.2 and 3.1 of van Waaij and van Zanten (2016) respectively (the key observation being that the convergence result of \(\mathring{L}_T(x)/T\) also holds when \(\int _0^1b(x){\,\mathrm {d}}x\) is nonzero, whereas that paper assumes this integral to be zero).

The stated result follows from Theorem 2.1 in van der Meulen et al. (2006) (taking \(\mu _T=\sqrt{T} \varepsilon _T\) in their paper).

### 6.4 Proof of theorem 8 with Assumption 6 (A)

#### 6.4.1 Small ball probability

*r* instead of \(r_\varepsilon \) in the remainder of the proof. By lemma 16 we have \(\Vert b_0^{r}-b_0\Vert _\infty \le \varepsilon \). Therefore

*S*. For any \(x > 0\), we have

*q* are taken from Assumption 7. For \(\varepsilon \) sufficiently small, we have by the second part of Assumption 7

*r* and the first part of Assumption 7, there exists a positive constant *C* such that

*Bounding the first term on the RHS of* (31). For \(\varepsilon \) sufficiently small, we have

*Bounding the second term on the RHS of* (31). For \(\varepsilon \) sufficiently small, we have

*Bounding the third term on the RHS of* (31). For \(\varepsilon \) sufficiently small, in case \(\beta \ge \alpha \) we have

#### 6.4.2 Entropy and remaining mass conditions

### Proposition 12

### Proof

### Proposition 13

*K* such that

### Proof

*K* such that

We can now finish the proof for the entropy and remaining mass conditions. Choose \(r_n\) to be the smallest integer so that \(2^{r_n}\ge L\varepsilon _n^{-\frac{1}{\beta }}\), where *L* is a constant, and set \( {\mathscr {B}}_n={\mathscr {C}}_{r_n}\). The entropy bound then follows directly from Proposition 13.

*L* big enough.

### 6.5 Proof of theorem 8 under assumption 6 (B)

We start with a lemma.

### Lemma 14

*r*, such that for all \(i, i', 2\le \ell (i),\ell (i')\le r\),

### Proof

*A* as block matrix

*B*, \(A_2\) defined accordingly. By lemma 2

It follows that \(x'\Lambda x \asymp x'Ax\). This implies that the small ball probabilities and the mass outside a sieve behave similarly under Assumption 6(B) as when the \(Z_{i}\) are independent normally distributed with zero mean and variance \(\xi _i^2=\Lambda _{ii}\). As this case corresponds to Assumption 6(A) with \(\alpha = \frac{1}{2}\), for which posterior contraction has already been established, the stated contraction rate under Assumption 6(B) follows from Anderson’s lemma (lemma 19).

### 6.6 Proof of theorem 10: convergence in stronger norms

*m* as

### Theorem 15

Note that the sieves \({\mathscr {C}}_{r,t}\) which we define in Sect. 6.4.2 have by Eq. (15) the property \(\Pi ({\mathscr {C}}_{r,t}^c\mid X^T)\rightarrow 0.\) By lemmas 21 and 23, the modulus of continuity satisfies \(m({\mathscr {C}}_{r,u},\varepsilon _n)\lesssim 2^{r(1/2-1/p)}\varepsilon _n\) for all \(p\in (2,\infty ]\) (with the convention \(1/\infty =0\)), and the result follows.

## Notes

### Acknowledgements

This work was partly supported by the Netherlands Organisation for Scientific Research (NWO) under the research programme “Foundations of nonparametric Bayes procedures”, 639.033.110 and by the ERC Advanced Grant “Bayesian Statistics in Infinite Dimensions”, 320637.

## References

- Anderson TW (1955) The integral of a symmetric unimodal function over a symmetric convex set and some probability inequalities. Proc Am Math Soc 6:170–176
- Bhattacharya R, Waymire E (2007) A basic course in probability theory. Universitext, Springer, New York
- Dalalyan A (2005) Sharp adaptive estimation of the drift function for ergodic diffusions. Ann Stat 33(6):2507–2528
- Dalalyan AS, Kutoyants YA (2002) Asymptotically efficient trend coefficient estimation for ergodic diffusion. Math Methods Stat 11(4):402–427
- Ghosal S, van der Vaart AW (2007) Convergence rates of posterior distributions for noniid observations. Ann Stat 35(1):192–223
- Ghosal S, Ghosh JK, van der Vaart AW (2000) Convergence rates of posterior distributions. Ann Stat 28(2):500–531
- Giné E, Nickl R (2011) Rates of contraction for posterior distributions in \(L^r\)-metrics, \(1\le r\le \infty \). Ann Stat 39(6):2883–2911
- Giné E, Nickl R (2016) Mathematical foundations of infinite-dimensional statistical models. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge
- Hindriks R (2011) Empirical dynamics of neuronal rhythms. PhD thesis, Vrije Universiteit Amsterdam
- Karatzas I, Shreve SE (1991) Brownian motion and stochastic calculus, volume 113 of graduate texts in mathematics, 2nd edn. Springer, New York
- Knapik BT, van der Vaart AW, van Zanten JH (2011) Bayesian inverse problems with Gaussian priors. Ann Stat 39(5):2626–2657
- Knapik B, Salomond J-B (2014) A general approach to posterior contraction in nonparametric inverse problems. Bernoulli
- Kutoyants YA (2004) Statistical inference for ergodic diffusion processes. Springer, New York
- Papaspiliopoulos O, Pokern Y, Roberts GO, Stuart AM (2012) Nonparametric estimation of diffusions: a differential equations approach. Biometrika 99(3):511
- Pokern Y (2007) Fitting Stochastic Differential Equations to Molecular Dynamics Data. PhD thesis, University of Warwick
- Pokern Y, Stuart AM, van Zanten JH (2013) Posterior consistency via precision operators for Bayesian nonparametric drift estimation in SDEs. Stoch Process Appl 123(2):603–628
- Schauer M, van Zanten JH (2017) Uniform central limit theorems for additive functionals of diffusions on the circle. In preparation
- Shen W, Ghosal S (2015) Adaptive Bayesian procedures using random series priors. Scand J Stat 42(4):1194–1213
- Spokoiny VG (2000) Adaptive drift estimation for nonparametric diffusion model. Ann Stat 28(3):815–836
- Strauch C (2015) Sharp adaptive drift estimation for ergodic diffusions: the multivariate case. Stoch Process Appl 125(7):2562–2602
- van der Meulen FH, van der Vaart AW, van Zanten JH (2006) Convergence rates of posterior distributions for Brownian semimartingale models. Bernoulli 12(5):863–888
- van der Meulen FH, Schauer M, van Zanten JH (2014) Reversible jump MCMC for nonparametric drift estimation for diffusion processes. Comput Stat Data Anal 71:615–632
- van der Vaart AW, van Zanten JH (2008) Rates of contraction of posterior distributions based on Gaussian process priors. Ann Stat 36(3):1435–1463
- van Waaij J, van Zanten H (2016) Gaussian process methods for one-dimensional diffusions: optimal rates and adaptation. Electron J Stat 10(1):628–645
- van Zanten JH (2013) Nonparametric Bayesian methods for one-dimensional diffusion models. Math Biosci 243(2):215–222

## Copyright information

**Open Access** This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.