Abstract
For the class of Gauss–Markov processes we study the problem of asymptotic equivalence of the nonparametric regression model with errors given by the increments of the process and the continuous time model, where a whole path of a sum of a deterministic signal and the Gauss–Markov process can be observed. We derive sufficient conditions which imply asymptotic equivalence of the two models. We verify these conditions for the special cases of Sobolev ellipsoids and Hölder classes with smoothness index \(>1/2\) under mild assumptions on the Gauss–Markov process. To give a counterexample, we show that asymptotic equivalence fails to hold for the special case of Brownian bridge. Our findings demonstrate that the well-known asymptotic equivalence of the Gaussian white noise model and the nonparametric regression model with i.i.d. standard normal errors (see Brown and Low (Ann Stat 24:2384–2398, 1996)) can be extended to a setup with general Gauss–Markov noises.
References
Abundo, M. (2014). On the representation of an integrated Gauss–Markov process. Scientiae Mathematicae Japonicae, 77, 357–361.
Beder, J. H. (1987). A sieve estimator for the mean of a Gaussian process. The Annals of Statistics, 15, 59–78.
Berlinet, A., Thomas-Agnan, C. (2004). Reproducing kernel Hilbert spaces in probability and statistics. Boston, MA: Kluwer Academic Publishers.
Brown, L. D., Low, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. The Annals of Statistics, 24, 2384–2398.
Brown, L. D., Zhang, C. H. (1998). Asymptotic nonequivalence of nonparametric experiments when the smoothness index is \(1/2\). The Annals of Statistics, 26, 279–287.
Brown, L. D., Cai, T. T., Low, M. G., Zhang, C. H. (2002). Asymptotic equivalence theory for nonparametric regression with random design. The Annals of Statistics, 30, 688–707.
Carter, A. V. (2006). A continuous Gaussian approximation to a nonparametric regression in two dimensions. Bernoulli, 12, 143–156.
Carter, A. V. (2007). Asymptotic approximation of nonparametric regression experiments with unknown variances. The Annals of Statistics, 35, 1644–1673.
Carter, A. V. (2009). Asymptotically sufficient statistics in nonparametric regression experiments with correlated noise. Journal of Probability and Statistics, Article ID 275308.
Dette, H., Pepelyshev, A., Zhigljavsky, A. (2016). Optimal designs in regression with correlated errors. The Annals of Statistics, 44, 113–152.
Doob, J. L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics, 20, 393–403.
Golubev, G. K., Nussbaum, M., Zhou, H. H. (2010). Asymptotic equivalence of spectral density estimation and Gaussian white noise. The Annals of Statistics, 38, 181–214.
Grama, I., Nussbaum, M. (1998). Asymptotic equivalence for nonparametric generalized linear models. Probability Theory and Related Fields, 111, 167–214.
Grama, I. G., Neumann, M. H. (2006). Asymptotic equivalence of nonparametric autoregression and nonparametric regression. The Annals of Statistics, 34, 1701–1732.
Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adaptivity results. Statistica Sinica, 9, 51–83.
Johnstone, I. M., Silverman, B. W. (1997). Wavelet threshold estimators for data with correlated noise. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59, 319–351.
Karatzas, I., Shreve, S. E. (1991). Brownian motion and stochastic calculus. New York: Springer.
Le Cam, L. (1986). Asymptotic methods in statistical decision theory. New York: Springer.
Le Cam, L., Yang, G. L. (2000). Asymptotics in statistics. New York: Springer.
Mariucci, E. (2016). Le Cam theory on the comparison of statistical models. The Graduate Journal of Mathematics, 1, 81–91.
Mehr, C. B., McFadden, J. A. (1965). Certain properties of Gaussian processes and their first-passage times. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 27, 505–522.
Meister, A. (2011). Asymptotic equivalence of functional linear regression and a white noise inverse problem. The Annals of Statistics, 39, 1471–1495.
Neveu, J. (1968). Processus aléatoires gaussiens. Montréal: Les Presses de l’Université de Montréal, Séminaire de Mathématiques Supérieures, No. 34 (Été, 1968).
Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. The Annals of Statistics, 24, 2399–2430.
Paulsen, V. I., Raghupathi, M. (2016). An introduction to the theory of reproducing kernel Hilbert spaces. Cambridge: Cambridge University Press.
Reiß, M. (2008). Asymptotic equivalence for nonparametric regression with multivariate and random design. The Annals of Statistics, 36, 1957–1982.
Reiß, M. (2011). Asymptotic equivalence for inference on the volatility from noisy observations. The Annals of Statistics, 39, 772–802.
Rohde, A. (2004). On the asymptotic equivalence and rate of convergence of nonparametric regression and Gaussian white noise. Statistics & Decisions, 22, 235–243.
Schmidt-Hieber, J. (2014). Asymptotic equivalence for regression under fractional noise. The Annals of Statistics, 42, 2557–2585.
Slepian, D. (1961). First passage time for a particular Gaussian process. The Annals of Mathematical Statistics, 32, 610–612.
Torgersen, E. (1991). Comparison of statistical experiments. Cambridge: Cambridge University Press.
Wendland, H. (2005). Scattered data approximation. Cambridge: Cambridge University Press.
This research has been supported in part by the research grant DE 502/27-1 of the German Research Foundation (DFG).
Appendices
Proof of Theorem 2
The proof proceeds via two intermediate experiments, given through Eqs. (21) and (22) below, that lie between \({\mathfrak {E}}_{1,n}\) and \({\mathfrak {E}}_{2,n}\).
First step: We first show that under Condition (i) in Theorem 2 the model given by Equation (13) is asymptotically equivalent to observing
where \(\xi _{i,n}= \varXi _{t_i,n} -\varXi _{t_{i-1,n}}\) are the increments of the process \((\varXi _t)_{t\in [0,1]}\). Under Assumption 1, we can take advantage of the representation (8) and write
and the experiment (13) can be written as
Adding and subtracting \(v(t_{i,n}) W_{q(t_{i-1,n})}\), we get
Similarly, (21) can be written as
For the sake of a transparent notation let \({\mathbf {P}}^{{\mathbf {Y}}_n} = {\mathbf {P}}_{1,n}^f \) denote the distribution of the vector \({\mathbf {Y}}_n=(Y_{1,n},\ldots ,Y_{n,n})\), where we do not reflect the dependence on f in the notation. Similarly, let \({\mathbf {P}}^{{\mathbf {Y}}_n'}\) denote the distribution of the vector \({\mathbf {Y}}_n' = (Y'_{1,n},\ldots ,Y'_{n,n})\). Note that the squared total variation distance can be bounded by the Kullback-Leibler divergence, and therefore we will derive a bound for the Kullback-Leibler divergence between the distributions \({\mathbf {P}}^{{\mathbf {Y}}_n}\) and \({\mathbf {P}}^{{\mathbf {Y}}_n'}\) by suitable conditioning. Denote by \({\mathscr {F}}_{i,n}\) the \(\sigma \)-algebra generated by \(\left\{ W_{q({t})}, t \le t_{i,n} \right\} \). We have
where we use the notation \({\mathbf {Y}}_{1:n-1,n}=(Y_{1,n},\ldots ,Y_{n-1,n})\) and \({\mathbf {Y}}_{1:n-1,n}' = (Y'_{1,n},\ldots ,Y'_{n-1,n})\). Repeating the same argument, one obtains
In order to study the terms \({\mathbf {E}}[ \mathrm {KL}({\mathbf {P}}^{Y_{i,n}}, {\mathbf {P}}^{Y_{i,n}'} | {\mathscr {F}}_{i-1,n})]\), note that
where
Here and in the following, \({\mathcal {N}}(\mu , \sigma ^2)\) denotes a normal distribution with mean \(\mu \) and variance \(\sigma ^2\). Using the fact that the Kullback-Leibler divergence between two normal distributions with common variance is given by
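The identity referred to is the standard Gaussian formula, restated here for completeness:

```latex
\mathrm{KL}\bigl({\mathcal N}(\mu_1,\sigma^2),\,{\mathcal N}(\mu_2,\sigma^2)\bigr)
  \;=\; \frac{(\mu_1-\mu_2)^2}{2\sigma^2}.
```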
we have
This yields
and \(\mathrm {KL}({\mathbf {P}}^{{\mathbf {Y}}_n}, {\mathbf {P}}^{{\mathbf {Y}}_n'}) \rightarrow 0\) holds if and only if Condition (i) holds. Consequently, the experiments (13) and (21) are asymptotically equivalent in this case.
Second step: Let \(I(t | {\varvec{y}}_n)\) denote the Kriging interpolator which is defined as
for \({\varvec{y}}_n = (y_{1,n},\ldots ,y_{n,n})\) and \(\varvec{\varXi }_n = (\varXi _{t_{1,n}},\ldots ,\varXi _{t_{n,n}})\) (the additional condition that \(v(1) \ne 0\) guarantees the invertibility of the covariance matrix of \(\varvec{\varXi }_n\); see Lemma A.1 in Dette et al. (2016) for an explicit formula for the entries of the inverse matrix). By definition the Kriging predictor is linear in the argument \({\varvec{y}}_n\), and a simple argument shows the interpolation property
The second step now consists in proving (exact, that is, non-asymptotic) equivalence of the experiment defined by the discrete observations (21) and the experiment defined by the continuous path
where \({\mathbf {F}}_{f,n}=(F_{f,n}(t_{1,n}),\ldots ,F_{f,n}(t_{n,n}))\). Defining the partial sums \(S'_{k,n} = \sum _{j=1}^{k} Y'_{j,n}\) and recalling the notation \(\xi _{k,n}= \varXi _{t_{k,n}} -\varXi _{t_{k-1,n}} \) for the increments of the process \((\varXi _t)_{t\in [0,1]}\), we have
where we used the interpolating property (23) and the the definition (24). Let \((\varXi _t^\prime )_{t \in [0,1]}\) be an independent copy of \((\varXi _t)_{t \in [0,1]}\), and set \(R_t = \varXi _t^\prime - I(t| \varvec{\varXi }_n^\prime )\) with \(\varvec{\varXi }_n^\prime = (\varXi ^\prime _{t_{1,n}},\ldots ,\varXi ^\prime _{t_{n,n}})\). Then, the process
follows the same law as \((\varXi _t)_{t \in [0,1]}\) and \((\varXi _t^\prime )_{t \in [0,1]}\), which can be checked by a comparison of the covariance structure (indeed, this kind of construction is valid for any centered Gaussian process). Then, observing the definition (24), we have
where we used the notation \({\mathbf {S}}'_n = (S'_{1,n}, \ldots , S'_{n,n})\), Eq. (25) and the linearity of the Kriging estimator. Therefore, the process \(({\widetilde{Y}}_t)_{t \in [0,1]}\) can be constructed from the vector \({\mathbf {Y}}'_{n}\). On the other hand, the observations \(Y'_{1,n},\ldots ,Y'_{n,n}\) can be recovered from the trajectory \(({\widetilde{Y}}_t)_{t \in [0,1]}\) since for \(t=t_{k,n}\) the interpolation property (23) yields
and one obtains \(Y'_{i,n}\) as \(Y'_{i,n} =n{\widetilde{Y}}_{t_{i,n}} - n{\widetilde{Y}}_{t_{i-1,n}}\). Hence, the process \(({\widetilde{Y}}_t)_{t \in [0,1]}\) and the vector \({\mathbf {Y}}'_n\) contain the same information and the experiments (21) and (24) are equivalent.
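The two properties of the Kriging predictor used in this step, interpolation and linearity in the data, can be checked numerically. The sketch below uses the Brownian-motion covariance \(K(s,t)=\min(s,t)\) purely for illustration; the kernel, node placement, and data vectors are assumptions, not the paper's general \(K_\varXi\):

```python
import numpy as np

def kriging(t, nodes, y, kernel):
    """Simple-kriging predictor: conditional mean of a centered Gaussian
    process at time t, given observations y at the design points `nodes`."""
    K = kernel(nodes[:, None], nodes[None, :])  # covariance matrix of the observations
    k = kernel(t, nodes)                        # cross-covariances Cov(X_t, X_{t_j})
    return k @ np.linalg.solve(K, y)

# Brownian-motion kernel, chosen only for illustration
bm = lambda s, t: np.minimum(s, t)

nodes = np.linspace(0.2, 1.0, 5)
y = np.array([0.3, -0.1, 0.4, 0.2, 0.5])

# Interpolation property: the predictor reproduces the data at the design points
at_nodes = np.array([kriging(t, nodes, y, bm) for t in nodes])
print(np.allclose(at_nodes, y))  # True

# Linearity in the data vector: I(t | a*y1 + b*y2) = a*I(t | y1) + b*I(t | y2)
y2 = np.array([1.0, 0.0, -1.0, 0.5, 0.2])
lhs = kriging(0.55, nodes, 2.0 * y + 3.0 * y2, bm)
rhs = 2.0 * kriging(0.55, nodes, y, bm) + 3.0 * kriging(0.55, nodes, y2, bm)
print(np.isclose(lhs, rhs))  # True
```

For \(t\) equal to a design point, the cross-covariance vector is a row of the covariance matrix, so the predictor returns the observed value exactly; this is the discrete counterpart of the interpolation property (23).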
Third step: It remains to show that the experiment \(\widetilde{{\mathfrak {E}}}_{2,n}\) defined by the path in (24) is asymptotically equivalent to the experiment \({\mathfrak {E}}_{2,n}\) defined by
For this purpose we denote by \( {\mathbf {P}}^{(Y_t)_{t \in [0,1]}}\) and \( {\mathbf {P}}^{({\widetilde{Y}}_t)_{t \in [0,1]} }\) the distributions of the processes \((Y_t)_{t \in [0,1]}\) and \(({\widetilde{Y}}_t)_{t \in [0,1]}\), respectively, where the dependence on the parameter f is again suppressed. First, note that the representation of the Kriging estimator shows that the function \(t \rightarrow I( t | {\mathbf {F}}_{f,n})\) belongs to the RKHS \({\mathcal {H}}(\varXi )\) associated with the covariance kernel \(K_\varXi \). Second, Condition (ii) in the statement of Theorem 2 yields that the same holds true for the function \(F_{f,n}(\varvec{\cdot })\). Using the fact that the squared total variation distance can be bounded by the Kullback-Leibler divergence, we obtain
(the first equality follows from Lemma 2 in Schmidt-Hieber (2014), the second from Theorem 13.1 in Wendland (2005)), which completes the proof of Theorem 2.\(\square \)
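The total-variation bound invoked in the first and third steps is Pinsker's inequality (stated here with the total variation distance normalized as \(\sup _A |P(A)-Q(A)|\); other normalizations change the constant):

```latex
\|P - Q\|_{\mathrm{TV}}^2 \;\le\; \tfrac{1}{2}\,\mathrm{KL}(P, Q).
```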
Proofs of the results in Section 4
1.1 Proof of Theorem 3
The proof consists in checking the two conditions (i) and (ii) in Theorem 2.
Verification of condition (i): We have to show that the expression
converges to zero as \(n \rightarrow \infty \). By the assumptions regarding the functions v and q and an application of the mean value theorem this is equivalent to the condition
Let \(f\in \varTheta (\beta ,L)\) with Fourier expansion \(f(\varvec{\cdot }) = \sum _{k \in {\mathbb {Z}}} \theta _k \varvec{\mathrm e}_k(\varvec{\cdot })\). For any \(K \in {\mathbb {N}}\) (the appropriate value of \(K=K(n)\) for our purposes will be specified below) we define the functions
respectively. Consequently,
where
We now show for \(K=n\) that the estimates
hold uniformly with respect to \(f \in \varTheta (\beta ,L)\). Then, assertion (26) follows from (27) and the assumption \(\beta >1/2\).
Proof of (28):
The quantity
can be interpreted as the (average) energy of the discrete signal \(A^{(n)} = (A_{1,n},\ldots ,A_{n,n})\). Define
as the discrete Fourier transform of the signal \(A^{(n)}\), then Parseval’s identity for the discrete Fourier transform yields
and we have to derive an estimate for \(n \sum _{j=1}^n |F_j |^2 \). For this purpose, we recall the definition of \(A_{j,n}\) and note that
From now on, we take \(K=n\) and write
Since \(\sum _{j=1}^n |F_j |^2 \le 2 \sum _{j=1}^n |F_j^+ |^2 + 2\sum _{j=1}^n |F_j^- |^2\), it is sufficient to consider \(\sum _{j=1}^n |F_j^+ |^2\) (the term involving \(F_j^-\) is treated analogously). We have
where \(l(j) = n - j\) for \(j=1,\ldots ,n-1\) and \(l(n) = n\). Here, we used the well-known fact that for any integer \(m \in {\mathbb {Z}}\)
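The well-known fact referred to is the geometric-sum orthogonality relation for the \(n\)-th roots of unity:

```latex
\sum_{j=1}^{n} \mathrm e^{2\pi \mathrm i\, m j / n}
  \;=\; \begin{cases} n, & m \equiv 0 \pmod n, \\ 0, & \text{otherwise}. \end{cases}
```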
Thus, we obtain (uniformly with respect to \(\varTheta \))
An analogous argument for the term \(\sum _{j=1}^n |F_j^- |^2 \) proves the estimate (28). \(\square \)
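Parseval's identity for the discrete Fourier transform, on which this proof relies, can be sanity-checked numerically. Note that NumPy's `fft` uses the unnormalized forward transform, so in that convention the identity reads \(\sum _j |A_j|^2 = n^{-1} \sum _j |{\widehat{A}}_j|^2\); this normalization is NumPy's and may differ from the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # an arbitrary complex signal

# Unnormalized DFT: F_k = sum_j A_j * exp(-2*pi*i*j*k/n)
F = np.fft.fft(A)

# Parseval: time-domain energy equals (1/n) times frequency-domain energy
lhs = np.sum(np.abs(A) ** 2)
rhs = np.sum(np.abs(F) ** 2) / len(A)
print(np.isclose(lhs, rhs))  # True
```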
Proof of (29):
We have
and it is again sufficient to consider the sum running over \(k>K\). Using (31) again, we get
Taking \(K = n\) here as well yields
Now,
and we obtain
uniformly over \(f \in \varTheta (\beta ,L)\), which establishes (29). \(\square \)
Proof of (30):
Using Jensen’s inequality and Parseval’s identity we obtain
uniformly over \(f \in \varTheta (\beta ,L)\), which is of order \(n^{1-2\beta }\) if we choose \(K=n\). \(\square \)
Verification of condition (ii): We have to show that
uniformly over all \(f \in \varTheta (\beta ,L)\). Via the isomorphism \(\psi \) introduced in Eq. (10) we have for \(g=\psi F_f\)
Assuming that \(q'\) is bounded from above we obtain
Note that
with
With \(f_n(\varvec{\cdot }) = \sum _{\vert k\vert \le n} \theta _k \varvec{\mathrm e}_k(\varvec{\cdot })\) (and \(f_n^\top (\varvec{\cdot }) = f(\varvec{\cdot }) - f_n(\varvec{\cdot })\)) we define
Using these notations we get
where
We investigate the two terms \(I_1\) and \(I_2\) separately.
Bound for \(I_1\): We use the estimate
where
For the first integral \(I_{11}\) on the right-hand side of (36), we have
First, we further decompose (37) as
where
In the sequel, we consider only the term involving \(f_n^+\) since the sum involving \(f_n^-\) can be bounded using the same argument. We have the identity
From this identity we obtain (exploiting (31) again)
To derive an estimate of (38) we note that for any \(n \in {\mathbb {N}}\),
(the same estimate holding true for f instead of \(f_n\) which formally corresponds to \(n = \infty \)). Hence,
Because the product of a \(\gamma _1\)-Hölder function and a \(\gamma _2\)-Hölder function is (at least) Hölder with index \(\min \{ \gamma _1,\gamma _2 \}\) we obtain from our assumptions that the function \(vq'\) is Hölder continuous with index \(\gamma >1/2\). Thus,
and this is \(o(n^{-1})\) since \(\gamma > 1/2\). Combining these arguments we obtain
Finally, the second integral \(I_{12}\) on the right-hand side of (36) can be bounded as follows:
Observing the estimate (26) we finally obtain \(I_1= o(n^{-1})\).
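The Hölder-product fact invoked for \(vq'\) above rests on the elementary estimate: for bounded functions \(u, w\) with \(|u(x)-u(y)| \le C_1 |x-y|^{\gamma_1}\) and \(|w(x)-w(y)| \le C_2 |x-y|^{\gamma_2}\) and \(|x-y| \le 1\),

```latex
|u(x)w(x) - u(y)w(y)|
  \;\le\; |u(x)|\,|w(x)-w(y)| + |w(y)|\,|u(x)-u(y)|
  \;\le\; \bigl(\|u\|_\infty C_2 + \|w\|_\infty C_1\bigr)\,|x-y|^{\min\{\gamma_1,\gamma_2\}}.
```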
Bound for \(I_2\): In analogy to the decomposition of the term \(I_{11}\) on the right-hand side of (36) we have
Note that \(F_f\) is Lipschitz since it is continuously differentiable (recall that f itself is continuous since \(\beta > 1/2\)) and \(v'\) is Hölder with index \(\gamma >1/2\) due to our assumptions. Thus, the term (39) can be bounded as
which is of order \(o(n^{-1})\). For the term (40) we obtain using our assumptions that
Using the same arguments as for the bound of (38), this term can be shown to be of order \(o(n^{-1})\). Since both terms \(I_1\) and \(I_2\) are of order \(o(n^{-1})\) the assertion (32) follows from (33). \(\square \)
1.2 Proof of Theorem 4
As in the proof of Theorem 3, we have to verify the two conditions (i) and (ii) from Theorem 2.
Verification of condition (i): As in the Sobolev case it is sufficient to show that
By the mean value theorem \(n \int _{t_{i-1,n}}^{t_{i,n}} f(s)\mathrm {d}s = f(\zeta _{i,n})\) for some \(t_{i-1,n} \le \zeta _{i,n} \le t_{i,n}\). Thus, since \(f \in {\mathcal {F}}(\alpha ,L,M)\),
and the last term converges to zero uniformly over \(f \in {\mathcal {F}}(\alpha ,L,M)\) whenever \(\alpha > 1/2\).
Verification of condition (ii): The proof is based on nearly the same reduction as in the Sobolev case. Again, we consider the bound
where we define \(I_1\) and \(I_2\) as in Appendix B.1 (see the equations (34) and (35)) with the only exception that we now put
in the definition of \(I_1\). Then,
The first integral can be bounded as
which converges to zero as n increases. The second integral can be bounded as in the Sobolev case using the assumption that f is bounded (as one can easily see, the assumption of uniform boundedness can be dropped if \(v \cdot q'\) is constant; this is for instance satisfied in the case of Brownian motion). Hence, \(I_1\) converges to 0.
The term \(I_2\) in (41) can be bounded exactly as the corresponding term in the Sobolev ellipsoid case. This finishes the proof of the theorem. \(\square \)
1.3 Proof of Theorem 5
To prepare the proof, we recall an alternative (but equivalent) characterization of asymptotic equivalence in the framework of statistical decision theory. Let \({\mathfrak {E}}= ({\mathcal {X}},{\mathscr {X}},({\mathbf {P}}_{\theta })_{\theta \in \varTheta })\) be a statistical experiment. In decision theory one considers a decision space \(({\mathcal {A}},{\mathscr {A}})\) where the set \({\mathcal {A}}\) contains the potential decisions (or actions) that are at the observer's disposal and \({\mathscr {A}}\) is a \(\sigma \)-field on \({\mathcal {A}}\). In addition, there is a loss function
with the interpretation that a loss \(\ell (\theta ,a)\) occurs if the statistician chooses the action \(a \in {{{\mathcal {A}}}} \) and \(\theta \in \varTheta \) is the true state of nature. A (randomized) decision rule is a Markov kernel \(\rho :{\mathcal {X}}\times {\mathscr {A}}\rightarrow [0,1]\), and the associated risk is
Then, the deficiency between two experiments \({\mathfrak {E}}_1\) and \( {\mathfrak {E}}_2 \) is exactly the quantity
(see Mariucci, 2016, Theorem 2.7, and the references cited there), where the supremum is taken over all loss functions \(\ell \) with \(0 \le \ell (\theta ,a) \le 1\) for all \(\theta \in \varTheta \) and \(a \in {\mathcal {A}}\), all admissible parameters \(\theta \in \varTheta \), and decision rules \(\rho _2\) in the second experiment. The infimum is taken over all decision rules \(\rho _1\) in the first experiment.
After these preliminaries, let us go on to the proof of the theorem. We consider the decision space \(({\mathcal {A}},{\mathscr {A}}) = ({\mathbb {R}},{\mathscr {B}}({\mathbb {R}}))\) and the loss function
In the experiment \({\mathfrak {E}}_{2,n}\) we observe the whole path \(Y = \{ Y_t, \, t \in [0,1] \}\) satisfying
where \((B_t)_{t \in [0,1]}\) is a Brownian Bridge, and we consider the (non-randomized) decision rule \(\rho _2\) defined by
This directly yields \(\rho _2(Y) = \int _0^1 f(s)\mathrm {d}s\) since \(\varXi _0 = \varXi _1 = 0\) for the Brownian bridge. Hence,
for all \(f \in \varTheta (\beta ,L)\) and \(n \in {\mathbb {N}}\). From (42) we thus obtain
Recall the notation \(\varvec{\mathrm e}_n(x) = \exp (-2\pi \mathrm {i}n x)\) and introduce the functions \(f_0 \equiv 0\) and
for \(n \in {\mathbb {N}}\). It is easily seen that \(f_n\) belongs to \(\varTheta (\beta ,L)\). Note that by construction \(f_n(j/n) = 0\) for \(j=1,\ldots ,n\) and thus in the experiment \({\mathfrak {E}}_{1,n}\) the identity \({\mathbf {P}}_{1,n}^{f_0} = {\mathbf {P}}_{1,n}^{f_n}\) holds. For the considered loss function we have
where \(\rho _1\) is any (potentially randomized) decision rule. Because
at least one of the quantities \(\rho _1({\mathbf {Y}}_n) ( {\mathbb {R}}\backslash \{ \int _0^1 f_0(x)\mathrm {d}x \} )\) and \(\rho _1({\mathbf {Y}}_n) ( {\mathbb {R}}\backslash \{ \int _0^1 f_n(x)\mathrm {d}x \} )\) must be \(\ge 1/2\) for any \({\mathbf {Y}}_n\) (otherwise \(\rho _1({\mathbf {Y}}_n)\) would assign mass greater than \(1/2\) to each of the two distinct points \(\int _0^1 f_0(x)\mathrm {d}x\) and \(\int _0^1 f_n(x)\mathrm {d}x\), contradicting that it is a probability measure). Thus, setting \(A_\diamond = \{ {\mathbf {Y}}_n \in {\mathbb {R}}^n : \rho _1({\mathbf {Y}}_n) ( {\mathbb {R}}\backslash \{ \int _0^1 f_\diamond (x)\mathrm {d}x \} ) \ge 1/2 \}\) for \(\diamond \in \{ 0, n \}\) one has \(A_0 \cup A_n = {\mathbb {R}}^n\). As a consequence, either \({\mathbf {P}}_{1,n}^{f_0}(A_0) = {\mathbf {P}}_{1,n}^{f_n}(A_0) \ge 1/2\) or \({\mathbf {P}}_{1,n}^{f_0}(A_n) = {\mathbf {P}}_{1,n}^{f_n}(A_n) \ge 1/2\) holds. Without loss of generality, we assume that \({\mathbf {P}}_{1,n}^{f_0}(A_0) = {\mathbf {P}}_{1,n}^{f_n}(A_0) \ge 1/2\) (the other case follows by exactly the same argument). In this case, using the definition of the set \(A_0\),
which proves the assertion. \(\square \)
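The mechanism behind this counterexample, a perturbation that vanishes at every design point \(j/n\) yet shifts the functional \(\int _0^1 f\), can be illustrated numerically. The concrete perturbation below, \(c\,n^{-\beta }(1-\cos (2\pi n x))\), is a hypothetical stand-in for the paper's \(f_n\) (whose exact definition appears in an omitted display), chosen only to exhibit the same effect:

```python
import numpy as np

n, beta, c = 8, 1.0, 0.5
# Hypothetical perturbation: vanishes on the grid {j/n}, but has nonzero integral.
# This is a stand-in for the paper's f_n, chosen only to illustrate the mechanism.
f = lambda x: c * n ** (-beta) * (1 - np.cos(2 * np.pi * n * x))

grid = np.arange(1, n + 1) / n
print(np.allclose(f(grid), 0.0))  # True: invisible in the discrete experiment

# Midpoint-rule approximation of int_0^1 f(x) dx; the exact value is c * n**(-beta)
xs = (np.arange(10_000) + 0.5) / 10_000
integral = f(xs).mean()
print(np.isclose(integral, c * n ** (-beta)))  # True: visible in continuous time
```

Since \(\varXi _0 = \varXi _1 = 0\) for the Brownian bridge, the continuous-time observer recovers \(\int _0^1 f\) exactly and thus separates \(f_0\) from such a perturbation, while the discrete observations cannot.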
Dette, H., Kroll, M. Asymptotic equivalence for nonparametric regression with dependent errors: Gauss–Markov processes. Ann Inst Stat Math 74, 1163–1196 (2022). https://doi.org/10.1007/s10463-022-00826-6