Abstract
For the class of Gauss–Markov processes we study the problem of asymptotic equivalence of the nonparametric regression model with errors given by the increments of the process and the continuous time model, where a whole path of a sum of a deterministic signal and the Gauss–Markov process can be observed. We derive sufficient conditions which imply asymptotic equivalence of the two models. We verify these conditions for the special cases of Sobolev ellipsoids and Hölder classes with smoothness index \(>1/2\) under mild assumptions on the Gauss–Markov process. To give a counterexample, we show that asymptotic equivalence fails to hold for the special case of Brownian bridge. Our findings demonstrate that the well-known asymptotic equivalence of the Gaussian white noise model and the nonparametric regression model with i.i.d. standard normal errors (see Brown and Low (Ann Stat 24:2384–2398, 1996)) can be extended to a setup with general Gauss–Markov noises.
References
Abundo, M. (2014). On the representation of an integrated Gauss–Markov process. Scientiae Mathematicae Japonicae, 77, 357–361.
Beder, J. H. (1987). A sieve estimator for the mean of a Gaussian process. The Annals of Statistics, 15, 59–78.
Berlinet, A., Thomas-Agnan, C. (2004). Reproducing kernel Hilbert spaces in probability and statistics. Boston, MA: Kluwer Academic Publishers.
Brown, L. D., Low, M. G. (1996). Asymptotic equivalence of nonparametric regression and white noise. The Annals of Statistics, 24, 2384–2398.
Brown, L. D., Zhang, C. H. (1998). Asymptotic nonequivalence of nonparametric experiments when the smoothness index is \(1/2\). The Annals of Statistics, 26, 279–287.
Brown, L. D., Cai, T. T., Low, M. G., Zhang, C. H. (2002). Asymptotic equivalence theory for nonparametric regression with random design. The Annals of Statistics, 30, 688–707.
Carter, A. V. (2006). A continuous Gaussian approximation to a nonparametric regression in two dimensions. Bernoulli, 12, 143–156.
Carter, A. V. (2007). Asymptotic approximation of nonparametric regression experiments with unknown variances. The Annals of Statistics, 35, 1644–1673.
Carter, A. V. (2009). Asymptotically sufficient statistics in nonparametric regression experiments with correlated noise. Journal of Probability and Statistics, Article ID 275308.
Dette, H., Pepelyshev, A., Zhigljavsky, A. (2016). Optimal designs in regression with correlated errors. The Annals of Statistics, 44, 113–152.
Doob, J. L. (1949). Heuristic approach to the Kolmogorov-Smirnov theorems. The Annals of Mathematical Statistics, 20, 393–403.
Golubev, G. K., Nussbaum, M., Zhou, H. H. (2010). Asymptotic equivalence of spectral density estimation and Gaussian white noise. The Annals of Statistics, 38, 181–214.
Grama, I., Nussbaum, M. (1998). Asymptotic equivalence for nonparametric generalized linear models. Probability Theory and Related Fields, 111, 167–214.
Grama, I. G., Neumann, M. H. (2006). Asymptotic equivalence of nonparametric autoregression and nonparametric regression. The Annals of Statistics, 34, 1701–1732.
Johnstone, I. M. (1999). Wavelet shrinkage for correlated data and inverse problems: adaptivity results. Statistica Sinica, 9, 51–83.
Johnstone, I. M., Silverman, B. W. (1997). Wavelet threshold estimators for data with correlated noise. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59, 319–351.
Karatzas, I., Shreve, S. E. (1991). Brownian motion and stochastic calculus. New York: Springer.
Le Cam, L. (1986). Asymptotic methods in statistical decision theory. New York: Springer.
Le Cam, L., Yang, G. L. (2000). Asymptotics in statistics. New York: Springer.
Mariucci, E. (2016). Le Cam theory on the comparison of statistical models. The Graduate Journal of Mathematics, 1, 81–91.
Mehr, C. B., McFadden, J. A. (1965). Certain properties of Gaussian processes and their first-passage times. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 27, 505–522.
Meister, A. (2011). Asymptotic equivalence of functional linear regression and a white noise inverse problem. The Annals of Statistics, 39, 1471–1495.
Neveu, J. (1968). Processus aléatoires gaussiens. Montréal: Les Presses de l’Université de Montréal, Séminaire de Mathématiques Supérieures, No. 34 (Été, 1968).
Nussbaum, M. (1996). Asymptotic equivalence of density estimation and Gaussian white noise. The Annals of Statistics, 24, 2399–2430.
Paulsen, V. I., Raghupathi, M. (2016). An introduction to the theory of reproducing kernel Hilbert spaces. Cambridge: Cambridge University Press.
Reiß, M. (2008). Asymptotic equivalence for nonparametric regression with multivariate and random design. The Annals of Statistics, 36, 1957–1982.
Reiß, M. (2011). Asymptotic equivalence for inference on the volatility from noisy observations. The Annals of Statistics, 39, 772–802.
Rohde, A. (2004). On the asymptotic equivalence and rate of convergence of nonparametric regression and Gaussian white noise. Statistics & Decisions, 22, 235–243.
Schmidt-Hieber, J. (2014). Asymptotic equivalence for regression under fractional noise. The Annals of Statistics, 42, 2557–2585.
Slepian, D. (1961). First passage time for a particular Gaussian process. The Annals of Mathematical Statistics, 32, 610–612.
Torgersen, E. (1991). Comparison of statistical experiments. Cambridge: Cambridge University Press.
Wendland, H. (2005). Scattered data approximation. Cambridge: Cambridge University Press.
This research has been supported in part by the research grant DE 502/27-1 of the German Research Foundation (DFG).
Appendices
Proof of Theorem 2
The proof proceeds via two intermediate experiments, given through Eqs. (21) and (22) below, that lie between \({\mathfrak {E}}_{1,n}\) and \({\mathfrak {E}}_{2,n}\).
First step: We first show that under Condition (i) in Theorem 2 the model given by Equation (13) is asymptotically equivalent to observing
where \(\xi _{i,n}= \varXi _{t_i,n} -\varXi _{t_{i-1,n}}\) are the increments of the process \((\varXi _t)_{t\in [0,1]}\). Under Assumption 1, we can take advantage of the representation (8) and write
and the experiment (13) can be written as
Adding and subtracting \(v(t_{i,n}) W_{q(t_{i-1,n})}\), we get
Similarly, (21) can be written as
For the sake of a transparent notation let \({\mathbf {P}}^{{\mathbf {Y}}_n} = {\mathbf {P}}_{1,n}^f \) denote the distribution of the vector \({\mathbf {Y}}_n=(Y_{1,n},\ldots ,Y_{n,n})\), where we do not reflect the dependence on f in the notation. Similarly, let \({\mathbf {P}}^{{\mathbf {Y}}_n'}\) denote the distribution of the vector \({\mathbf {Y}}_n' = (Y'_{1,n},\ldots ,Y'_{n,n})\). Note that the squared total variation distance can be bounded by the Kullback-Leibler divergence, and therefore we will derive a bound for the Kullback-Leibler divergence between the distributions \({\mathbf {P}}^{{\mathbf {Y}}_n}\) and \({\mathbf {P}}^{{\mathbf {Y}}_n'}\) by suitable conditioning. Denote by \({\mathscr {F}}_{i,n}\) the \(\sigma \)-algebra generated by \(\left\{ W_{q({t})}, t \le t_{i,n} \right\} \). We have
where we use the notation \({\mathbf {Y}}_{1:n-1,n}=(Y_{1,n},\ldots ,Y_{n-1,n})\) and \({\mathbf {Y}}_{1:n-1,n}' = (Y'_{1,n},\ldots ,Y'_{n-1,n})\). Repeating the same argument, one obtains
In order to study the terms \({\mathbf {E}}[ \mathrm {KL}({\mathbf {P}}^{Y_{i,n}}, {\mathbf {P}}^{Y_{i,n}'} | {\mathscr {F}}_{i-1,n})]\), note that
where
Here and in the following, \({\mathcal {N}}(\mu , \sigma ^2)\) denotes a normal distribution with mean \(\mu \) and variance \(\sigma ^2\). Using the fact that the Kullback-Leibler divergence between two normal distributions with common variance is given by
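The identity referred to is the standard Gaussian formula, restated here for completeness:

```latex
\mathrm{KL}\bigl({\mathcal N}(\mu_1,\sigma^2),\,{\mathcal N}(\mu_2,\sigma^2)\bigr)
  \;=\; \frac{(\mu_1-\mu_2)^2}{2\sigma^2}.
```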
we have
This yields
and \(\mathrm {KL}({\mathbf {P}}^{{\mathbf {Y}}_n}, {\mathbf {P}}^{{\mathbf {Y}}_n'}) \rightarrow 0\) holds if and only if Condition (i) holds. Consequently, the experiments (13) and (21) are asymptotically equivalent in this case.
Second step: Let \(I(t | {\varvec{y}}_n)\) denote the Kriging interpolator which is defined as
for \({\varvec{y}}_n = (y_{1,n},\ldots ,y_{n,n})\) and \(\varvec{\varXi }_n = (\varXi _{t_{1,n}},\ldots ,\varXi _{t_{n,n}})\) (the additional condition that \(v(1) \ne 0\) guarantees the invertibility of the covariance matrix of \(\varvec{\varXi }_n\); see Lemma A.1 in Dette et al. (2016) for an explicit formula for the entries of the inverse matrix). By definition the Kriging predictor is linear in the argument \({\varvec{y}}_n\), and a simple argument shows the interpolation property
The second step now consists in proving (exact, that is, non-asymptotic) equivalence of the experiment defined by the discrete observations (21) and the experiment defined by the continuous path
where \({\mathbf {F}}_{f,n}=(F_{f,n}(t_{1,n}),\ldots ,F_{f,n}(t_{n,n}))\). Defining the partial sums \(S'_{k,n} = \sum _{j=1}^{k} Y'_{j,n}\) and recalling the notation \(\xi _{k,n}= \varXi _{t_{k,n}} -\varXi _{t_{k-1,n}} \) for the increments of the process \((\varXi _t)_{t\in [0,1]}\), we have
where we used the interpolating property (23) and the the definition (24). Let \((\varXi _t^\prime )_{t \in [0,1]}\) be an independent copy of \((\varXi _t)_{t \in [0,1]}\), and set \(R_t = \varXi _t^\prime - I(t| \varvec{\varXi }_n^\prime )\) with \(\varvec{\varXi }_n^\prime = (\varXi ^\prime _{t_{1,n}},\ldots ,\varXi ^\prime _{t_{n,n}})\). Then, the process
follows the same law as \((\varXi _t)_{t \in [0,1]}\) and \((\varXi _t^\prime )_{t \in [0,1]}\), which can be checked by a comparison of the covariance structure (indeed, this kind of construction is valid for any centered Gaussian process). Then, observing the definition (24), we have
where we used the notation \({\mathbf {S}}'_n = (S'_{1,n}, \ldots , S'_{n,n})\), Eq. (25) and the linearity of the Kriging estimator. Therefore, the process \(({\widetilde{Y}}_t)_{t \in [0,1]}\) can be constructed from the vector \({\mathbf {Y}}'_{n}\). On the other hand, the observations \(Y'_{1,n},\ldots ,Y'_{n,n}\) can be recovered from the trajectory \(({\widetilde{Y}}_t)_{t \in [0,1]}\) since for \(t=t_{k,n}\) the interpolation property (23) yields
and one obtains \(Y'_{i,n}\) as \(Y'_{i,n} =n{\widetilde{Y}}_{t_{i,n}} - n{\widetilde{Y}}_{t_{i-1,n}}\). Hence, the process \(({\widetilde{Y}}_t)_{t \in [0,1]}\) and the vector \({\mathbf {Y}}'_n\) contain the same information and the experiments (21) and (24) are equivalent.
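The two properties of the Kriging predictor used in this step, interpolation and linearity in the data, can be checked numerically. The sketch below uses the Brownian-motion covariance \(K(s,t)=\min(s,t)\) purely for illustration; the kernel, node placement, and data vectors are assumptions, not the paper's general \(K_\varXi\):

```python
import numpy as np

def kriging(t, nodes, y, kernel):
    """Simple-kriging predictor: conditional mean of a centered Gaussian
    process at time t, given observations y at the design points `nodes`."""
    K = kernel(nodes[:, None], nodes[None, :])  # covariance matrix of the observations
    k = kernel(t, nodes)                        # cross-covariances Cov(X_t, X_{t_j})
    return k @ np.linalg.solve(K, y)

# Brownian-motion kernel, chosen only for illustration
bm = lambda s, t: np.minimum(s, t)

nodes = np.linspace(0.2, 1.0, 5)
y = np.array([0.3, -0.1, 0.4, 0.2, 0.5])

# Interpolation property: the predictor reproduces the data at the design points
at_nodes = np.array([kriging(t, nodes, y, bm) for t in nodes])
print(np.allclose(at_nodes, y))  # True

# Linearity in the data vector: I(t | a*y1 + b*y2) = a*I(t | y1) + b*I(t | y2)
y2 = np.array([1.0, 0.0, -1.0, 0.5, 0.2])
lhs = kriging(0.55, nodes, 2.0 * y + 3.0 * y2, bm)
rhs = 2.0 * kriging(0.55, nodes, y, bm) + 3.0 * kriging(0.55, nodes, y2, bm)
print(np.isclose(lhs, rhs))  # True
```

For \(t\) equal to a design point, the cross-covariance vector is a row of the covariance matrix, so the predictor returns the observed value exactly; this is the discrete counterpart of the interpolation property (23).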
Third step: It remains to show that the experiment \(\widetilde{{\mathfrak {E}}}_{2,n}\) defined by the path in (24) is asymptotically equivalent to the experiment \({\mathfrak {E}}_{2,n}\) defined by
For this purpose we denote by \( {\mathbf {P}}^{(Y_t)_{t \in [0,1]}}\) and \( {\mathbf {P}}^{({\widetilde{Y}}_t)_{t \in [0,1]} }\) the distributions of the processes \((Y_t)_{t \in [0,1]}\) and \(({\widetilde{Y}}_t)_{t \in [0,1]}\), respectively, where the dependence on the parameter f is again suppressed. First, note that the representation of the Kriging estimator shows that the function \(t \rightarrow I( t | {\mathbf {F}}_{f,n})\) belongs to the RKHS \({\mathcal {H}}(\varXi )\) associated with the covariance kernel \(K_\varXi \). Second, Condition (ii) in the statement of Theorem 2 yields that the same holds true for the function \(F_{f,n}(\varvec{\cdot })\). Using the fact that the squared total variation distance can be bounded by the Kullback-Leibler divergence, we obtain
(the first equality follows from Lemma 2 in Schmidt-Hieber (2014), the second from Theorem 13.1 in Wendland (2005)), which completes the proof of Theorem 2.\(\square \)
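The total-variation bound invoked in the first and third steps is Pinsker's inequality (stated here with the total variation distance normalized as \(\sup _A |P(A)-Q(A)|\); other normalizations change the constant):

```latex
\|P - Q\|_{\mathrm{TV}}^2 \;\le\; \tfrac{1}{2}\,\mathrm{KL}(P, Q).
```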
Proofs of the results in Section 4
1.1 Proof of Theorem 3
The proof consists in checking the two conditions (i) and (ii) in Theorem 2.
Verification of condition (i): We have to show that the expression
converges to zero as \(n \rightarrow \infty \). By the assumptions regarding the functions v and q and an application of the mean value theorem this is equivalent to the condition
Let \(f\in \varTheta (\beta ,L)\) with Fourier expansion \(f(\varvec{\cdot }) = \sum _{k \in {\mathbb {Z}}} \theta _k \varvec{\mathrm e}_k(\varvec{\cdot })\). For any \(K \in {\mathbb {N}}\) (the appropriate value of \(K=K(n)\) for our purposes will be specified below) we define the functions
respectively. Consequently,
where
We now show for \(K=n\) that the estimates
hold uniformly with respect to \(f \in \varTheta (\beta ,L)\). Then, assertion (26) follows from (27) and the assumption \(\beta >1/2\).
Proof of (28):
The quantity
can be interpreted as the (average) energy of the discrete signal \(A^{(n)} = (A_{1,n},\ldots ,A_{n,n})\). Define
as the discrete Fourier transform of the signal \(A^{(n)}\), then Parseval’s identity for the discrete Fourier transform yields
and we have to derive an estimate for \(n \sum _{j=1}^n |F_j |^2 \). For this purpose, we recall the definition of \(A_{j,n}\) and note that
From now on, we take \(K=n\) and write
Since \(\sum _{j=1}^n |F_j |^2 \le 2 \sum _{j=1}^n |F_j^+ |^2 + 2\sum _{j=1}^n |F_j^- |^2\), it is sufficient to consider \(\sum _{j=1}^n |F_j^+ |^2\) (the term involving \(F_j^-\) is treated analogously). We have
where \(l(j) = n - j\) for \(j=1,\ldots ,n-1\) and \(l(n) = n\). Here, we used the well-known fact that for any integer \(m \in {\mathbb {Z}}\)
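The well-known fact referred to is the geometric-sum orthogonality relation for the \(n\)-th roots of unity:

```latex
\sum_{j=1}^{n} \mathrm e^{2\pi \mathrm i\, m j / n}
  \;=\; \begin{cases} n, & m \equiv 0 \pmod n, \\ 0, & \text{otherwise}. \end{cases}
```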
Thus, we obtain (uniformly with respect to \(\varTheta \))
An analogous argument for the term \(\sum _{j=1}^n |F_j^- |^2 \) proves the estimate (28). \(\square \)
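Parseval's identity for the discrete Fourier transform, on which this proof relies, can be sanity-checked numerically. Note that NumPy's `fft` uses the unnormalized forward transform, so in that convention the identity reads \(\sum _j |A_j|^2 = n^{-1} \sum _j |{\widehat{A}}_j|^2\); this normalization is NumPy's and may differ from the paper's:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal(8) + 1j * rng.standard_normal(8)  # an arbitrary complex signal

# Unnormalized DFT: F_k = sum_j A_j * exp(-2*pi*i*j*k/n)
F = np.fft.fft(A)

# Parseval: time-domain energy equals (1/n) times frequency-domain energy
lhs = np.sum(np.abs(A) ** 2)
rhs = np.sum(np.abs(F) ** 2) / len(A)
print(np.isclose(lhs, rhs))  # True
```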
Proof of (29):
We have
and it is again sufficient to consider the sum running over \(k>K\). Using (31) again, we get
Taking \(K = n\) here as well yields
Now,
and we obtain
uniformly over \(f \in \varTheta (\beta ,L)\), which establishes (29). \(\square \)
Proof of (30):
Using Jensen’s inequality and Parseval’s identity we obtain
uniformly over \(f \in \varTheta (\beta ,L)\), which is of order \(n^{1-2\beta }\) if we choose \(K=n\). \(\square \)
Verification of condition (ii): We have to show that
uniformly over all \(f \in \varTheta (\beta ,L)\). Via the isomorphism \(\psi \) introduced in Eq. (10) we have for \(g=\psi F_f\)
Assuming that \(q'\) is bounded from above we obtain
Note that
with
With \(f_n(\varvec{\cdot }) = \sum _{\vert k\vert \le n} \theta _k \varvec{\mathrm e}_k(\varvec{\cdot })\) (and \(f_n^\top (\varvec{\cdot }) = f(\varvec{\cdot }) - f_n(\varvec{\cdot })\)) we define
Using these notations we get
where
We investigate the two terms \(I_1\) and \(I_2\) separately.
Bound for \(I_1\): We use the estimate
where
For the first integral \(I_{11}\) on the right-hand side of (36), we have
First, we further decompose (37) as
where
In the sequel, we consider only the term involving \(f_n^+\) since the sum involving \(f_n^-\) can be bounded using the same argument. We have the identity
From this identity we obtain (exploiting (31) again)
To derive an estimate of (38) we note that for any \(n \in {\mathbb {N}}\),
(the same estimate holding true for f instead of \(f_n\) which formally corresponds to \(n = \infty \)). Hence,
Because the product of a \(\gamma _1\)-Hölder function and a \(\gamma _2\)-Hölder function is (at least) Hölder with index \(\min \{ \gamma _1,\gamma _2 \}\) we obtain from our assumptions that the function \(vq'\) is Hölder continuous with index \(\gamma >1/2\). Thus,
and this is \(o(n^{-1})\) since \(\gamma > 1/2\). Combining these arguments we obtain
Finally, the second integral \(I_{12}\) on the right-hand side of (36) can be bounded as follows:
Observing the estimate (26) we finally obtain \(I_1= o(n^{-1})\).
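The Hölder-product fact invoked for \(vq'\) above rests on the elementary estimate: for bounded functions \(u, w\) with \(|u(x)-u(y)| \le C_1 |x-y|^{\gamma_1}\) and \(|w(x)-w(y)| \le C_2 |x-y|^{\gamma_2}\) and \(|x-y| \le 1\),

```latex
|u(x)w(x) - u(y)w(y)|
  \;\le\; |u(x)|\,|w(x)-w(y)| + |w(y)|\,|u(x)-u(y)|
  \;\le\; \bigl(\|u\|_\infty C_2 + \|w\|_\infty C_1\bigr)\,|x-y|^{\min\{\gamma_1,\gamma_2\}}.
```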
Bound for \(I_2\): In analogy to the decomposition of the term \(I_{11}\) on the right-hand side of (36) we have
Note that \(F_f\) is Lipschitz since it is continuously differentiable (recall that f itself is continuous since \(\beta > 1/2\)) and \(v'\) is Hölder with index \(\gamma >1/2\) due to our assumptions. Thus, the term (39) can be bounded as
which is of order \(o(n^{-1})\). For the term (40) we obtain using our assumptions that
Using the same arguments as for the bound of (38), this term can be shown to be of order \(o(n^{-1})\). Since both terms \(I_1\) and \(I_2\) are of order \(o(n^{-1})\) the assertion (32) follows from (33). \(\square \)
1.2 Proof of Theorem 4
As in the proof of Theorem 3, we have to verify the two conditions (i) and (ii) from Theorem 2.
Verification of condition (i): As in the Sobolev case it is sufficient to show that
By the mean value theorem \(n \int _{t_{i-1,n}}^{t_{i,n}} f(s)\mathrm {d}s = f(\zeta _{i,n})\) for some \(t_{i-1,n} \le \zeta _{i,n} \le t_{i,n}\). Thus, since \(f \in {\mathcal {F}}(\alpha ,L,M)\),
and the last term converges to zero uniformly over \(f \in {\mathcal {F}}(\alpha ,L,M)\) whenever \(\alpha > 1/2\).
Verification of condition (ii): The proof is based on nearly the same reduction as in the Sobolev case. Again, we consider the bound
where we define \(I_1\) and \(I_2\) as in Appendix B.1 (see the equations (34) and (35)) with the only exception that we now put
in the definition of \(I_1\). Then,
The first integral can be bounded as
which converges to zero as n increases. The second integral can be bounded as in the Sobolev case using the assumption that f is bounded (as one can easily see, the assumption of uniform boundedness can be dropped if \(v \cdot q'\) is constant; this is for instance satisfied in the case of Brownian motion). Hence, \(I_1\) converges to 0.
The term \(I_2\) in (41) can be bounded exactly as the corresponding term in the Sobolev ellipsoid case. This finishes the proof of the theorem. \(\square \)
1.3 Proof of Theorem 5
To prepare the proof, we recall an alternative (but equivalent) characterization of asymptotic equivalence in the framework of statistical decision theory. Let \({\mathfrak {E}}= ({\mathcal {X}},{\mathscr {X}},({\mathbf {P}}_{\theta })_{\theta \in \varTheta })\) be a statistical experiment. In decision theory one considers a decision space \(({\mathcal {A}},{\mathscr {A}})\) where the set \({\mathcal {A}}\) contains the potential decisions (or actions) that are at the observer's disposal and \({\mathscr {A}}\) is a \(\sigma \)-field on \({\mathcal {A}}\). In addition, there is a loss function
with the interpretation that a loss \(\ell (\theta ,a)\) occurs if the statistician chooses the action \(a \in {{{\mathcal {A}}}} \) and \(\theta \in \varTheta \) is the true state of nature. A (randomized) decision rule is a Markov kernel \(\rho :{\mathcal {X}}\times {\mathscr {A}}\rightarrow [0,1]\), and the associated risk is
Then, the deficiency between two experiments \({\mathfrak {E}}_1\) and \( {\mathfrak {E}}_2 \) is exactly the quantity
(see Mariucci, 2016, Theorem 2.7, and the references cited there), where the supremum is taken over all loss functions \(\ell \) with \(0 \le \ell (\theta ,a) \le 1\) for all \(\theta \in \varTheta \) and \(a \in {\mathcal {A}}\), all admissible parameters \(\theta \in \varTheta \), and decision rules \(\rho _2\) in the second experiment. The infimum is taken over all decision rules \(\rho _1\) in the first experiment.
After these preliminaries, let us go on to the proof of the theorem. We consider the decision space \(({\mathcal {A}},{\mathscr {A}}) = ({\mathbb {R}},{\mathscr {B}}({\mathbb {R}}))\) and the loss function
In the experiment \({\mathfrak {E}}_{2,n}\) we observe the whole path \(Y = \{ Y_t, \, t \in [0,1] \}\) satisfying
where \((B_t)_{t \in [0,1]}\) is a Brownian Bridge, and we consider the (non-randomized) decision rule \(\rho _2\) defined by
This directly yields \(\rho _2(Y) = \int _0^1 f(s)\mathrm {d}s\) since \(\varXi _0 = \varXi _1 = 0\) for the Brownian bridge. Hence,
for all \(f \in \varTheta (\beta ,L)\) and \(n \in {\mathbb {N}}\). From (42) we thus obtain
Recall the notation \(\varvec{\mathrm e}_n(x) = \exp (-2\pi \mathrm {i}n x)\) and introduce the functions \(f_0 \equiv 0\) and
for \(n \in {\mathbb {N}}\). It is easily seen that \(f_n\) belongs to \(\varTheta (\beta ,L)\). Note that by construction \(f_n(j/n) = 0\) for \(j=1,\ldots ,n\) and thus in the experiment \({\mathfrak {E}}_{1,n}\) the identity \({\mathbf {P}}_{1,n}^{f_0} = {\mathbf {P}}_{1,n}^{f_n}\) holds. For the considered loss function we have
where \(\rho _1\) is any (potentially randomized) decision rule. Because
at least one of the quantities \(\rho _1({\mathbf {Y}}_n) ( {\mathbb {R}}\backslash \{ \int _0^1 f_0(x)\mathrm {d}x \} )\) and \(\rho _1({\mathbf {Y}}_n) ( {\mathbb {R}}\backslash \{ \int _0^1 f_n(x)\mathrm {d}x \} )\) must be \(\ge 1/2\) for any \({\mathbf {Y}}_n\) (otherwise \(\rho _1({\mathbf {Y}}_n)\) would assign mass greater than \(1/2\) to each of the two distinct points \(\int _0^1 f_0(x)\mathrm {d}x\) and \(\int _0^1 f_n(x)\mathrm {d}x\), contradicting that it is a probability measure). Thus, setting \(A_\diamond = \{ {\mathbf {Y}}_n \in {\mathbb {R}}^n : \rho _1({\mathbf {Y}}_n) ( {\mathbb {R}}\backslash \{ \int _0^1 f_\diamond (x)\mathrm {d}x \} ) \ge 1/2 \}\) for \(\diamond \in \{ 0, n \}\) one has \(A_0 \cup A_n = {\mathbb {R}}^n\). As a consequence, either \({\mathbf {P}}_{1,n}^{f_0}(A_0) = {\mathbf {P}}_{1,n}^{f_n}(A_0) \ge 1/2\) or \({\mathbf {P}}_{1,n}^{f_0}(A_n) = {\mathbf {P}}_{1,n}^{f_n}(A_n) \ge 1/2\) holds. Without loss of generality, we assume that \({\mathbf {P}}_{1,n}^{f_0}(A_0) = {\mathbf {P}}_{1,n}^{f_n}(A_0) \ge 1/2\) (the other case follows by exactly the same argument). In this case, using the definition of the set \(A_0\),
which proves the assertion. \(\square \)
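The mechanism behind this counterexample, a perturbation that vanishes at every design point \(j/n\) yet shifts the functional \(\int _0^1 f\), can be illustrated numerically. The concrete perturbation below, \(c\,n^{-\beta }(1-\cos (2\pi n x))\), is a hypothetical stand-in for the paper's \(f_n\) (whose exact definition appears in an omitted display), chosen only to exhibit the same effect:

```python
import numpy as np

n, beta, c = 8, 1.0, 0.5
# Hypothetical perturbation: vanishes on the grid {j/n}, but has nonzero integral.
# This is a stand-in for the paper's f_n, chosen only to illustrate the mechanism.
f = lambda x: c * n ** (-beta) * (1 - np.cos(2 * np.pi * n * x))

grid = np.arange(1, n + 1) / n
print(np.allclose(f(grid), 0.0))  # True: invisible in the discrete experiment

# Midpoint-rule approximation of int_0^1 f(x) dx; the exact value is c * n**(-beta)
xs = (np.arange(10_000) + 0.5) / 10_000
integral = f(xs).mean()
print(np.isclose(integral, c * n ** (-beta)))  # True: visible in continuous time
```

Since \(\varXi _0 = \varXi _1 = 0\) for the Brownian bridge, the continuous-time observer recovers \(\int _0^1 f\) exactly and thus separates \(f_0\) from such a perturbation, while the discrete observations cannot.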
Dette, H., Kroll, M. Asymptotic equivalence for nonparametric regression with dependent errors: Gauss–Markov processes. Ann Inst Stat Math 74, 1163–1196 (2022). https://doi.org/10.1007/s10463-022-00826-6