Abstract
Differenced estimators of variance bypass the estimation of the regression function and thus are simple to calculate. However, two problems remain: most differenced estimators do not achieve the asymptotically optimal rate for the mean squared error, and for finite samples the estimation bias, though also important, is usually not considered further. In this paper, we estimate the variance as the intercept in a linear regression with the lagged Gasser-type variance estimator as the dependent variable. For the equidistant design, our estimator is not only \(n^{1/2}\)-consistent and asymptotically normal, but also achieves the optimal bound in terms of estimation variance with less asymptotic bias. Simulation studies show that our estimator has smaller mean squared error than some existing differenced estimators, especially when the regression function oscillates strongly and the sample size is small.
1 Introduction
Consider the nonparametric regression model
where \(0\le x_{1}< \cdots < x_{n}\le 1\) are design points, \(Y_{i}\)’s are observed responses, \(m(\cdot )\) is an unknown smooth mean function, and \(\epsilon _{i}\)’s are independent and identically distributed random errors with zero mean and variance \(\sigma ^{2}\).
For the estimation of the variance \(\sigma ^{2}\), one usually first fits the regression function m by smoothing splines (Carter and Eagleson 1992; Carter et al. 1992) or kernel regression (Müller and Stadtmüller 1987; Hall and Carroll 1989; Neumann 1994), and then estimates \(\sigma ^{2}\) from the residual sum of squares. However, the regression function estimate depends on the amount of smoothing (Dette et al. 1998), which requires knowledge of unknown quantities such as \(\int _{0}^{1}\{m^{(1)}(x)\}^{2}\mathrm{d}x\) (Hall and Marron 1990), \(\int _{0}^{1}\{m^{(2)}(x)\}^{2}\mathrm{d}x\) (Buckley et al. 1988), or even \(\int _{0}^{1}\{m^{(l)}(x)\}^{2}\mathrm{d}x\) for a general derivative order l (Seifert et al. 1993; Eubank 1999; Wang 2011). Moreover, few methods have been proposed to estimate the amount of smoothness \(\int _{0}^{1}\{m^{(l)}(x)\}^{2}\mathrm{d}x\). Thus, in practical applications an estimator of \(\sigma ^{2}\) that does not require estimating the regression function is preferable.
A class of estimators that bypass the estimation of the regression function are the differenced estimators. Rice (1984) proposed the first-order differenced estimator
Later, Gasser et al. (1986) proposed the second-order differenced estimator
and Hall et al. (1990) generalized these to the kth-order differenced estimator by minimizing the estimation variance. Building on these estimators, further estimators of \(\sigma ^{2}\) have been proposed. With respect to the Rice-type estimator, Müller and Stadtmüller (1999) proposed the lagged Rice estimator; Müller et al. (2003) proposed a covariate-matched U-statistic; Tong and Wang (2005) further improved the lagged Rice estimator by estimating the variance as the intercept in a linear regression model, with the asymptotically optimal rate discussed in Tong et al. (2013); and Dai and Tong (2014) proposed a pairwise regression for models with jump discontinuities. As for the Gasser-type estimator, Seifert et al. (1993) generalized it to a higher-order version, and Du and Schick (2009) proposed a covariate-matched U-statistic.
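For concreteness, the two classical estimators named above can be computed directly from the raw observations. The following is a minimal sketch in Python with NumPy; the function names are ours:

```python
import numpy as np

def rice_estimator(y):
    """First-order differenced estimator of Rice (1984):
    sigma^2_R = 1/(2(n-1)) * sum_{i=2}^{n} (y_i - y_{i-1})^2."""
    d = np.diff(y)
    return np.sum(d**2) / (2.0 * (len(y) - 1))

def gasser_estimator(y):
    """Second-order differenced estimator of Gasser et al. (1986), based on
    pseudo-residuals y_{i-1}/2 - y_i + y_{i+1}/2, rescaled by 2/3 so that
    the estimator is unbiased for pure noise."""
    n = len(y)
    d2 = 0.5 * y[:-2] - y[1:-1] + 0.5 * y[2:]
    return (2.0 / 3.0) * np.sum(d2**2) / (n - 2)
```

Both bypass any fit of m: the differencing removes a locally (nearly) constant or linear trend, leaving mainly the noise.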
For the differenced estimators above, the following problems remain. Most estimators keep the order k constant and hence do not achieve the asymptotically optimal rate for the mean squared error (MSE), namely \(\mathrm{MSE}(\hat{\sigma }^{2})=n^{-1}var(\varepsilon ^{2})+o(n^{-1})\); exceptions are the estimators in Müller et al. (2003), Tong and Wang (2005), and Du and Schick (2009). Zhou et al. (2015) indicated that the estimators of Müller et al. (2003) and Du and Schick (2009) are not applicable in practice since they require an appropriate choice of bandwidth in advance. For the estimator of Tong and Wang (2005), the bias caused by the regression function is non-negligible in finite samples, especially when the regression function m oscillates strongly and the sample size n is small. However, Seifert et al. (1993) and Du and Schick (2009) both showed that their estimators based on the Gasser-type estimator have better bias properties than the Rice-type and Hall-type estimators. Thus, in practical applications, a method based on the Gasser-type estimator is preferred.
In this paper, we propose a new variance estimator based on the Gasser-type estimator. For equidistant designs, our estimator achieves the asymptotically optimal rate for the MSE. In finite samples, it effectively reduces the estimation bias caused by the regression function, especially when the regression function oscillates strongly and the sample size is small.
The remainder of this paper is organised as follows. In Sect. 2, we propose the lagged Gasser-type estimator, introduce the estimation methodology, and deduce the asymptotic results for our estimator. To assess the performance of our estimator, we conduct simulation studies and compare it with other differenced estimators in Sect. 3. All proofs of the theorems are given in “Appendices 1–4”.
2 Main results
In this section, we define the lag-l Gasser-type estimator and construct a new estimator based on these lagged estimators using least squares regression. It will be shown that the new estimator is more efficient than the estimator of Gasser et al. (1986) and reaches the optimal bound in terms of estimation variance. Meanwhile, its asymptotic normality is established.
2.1 Lagged Gasser-type estimator
Assume that \(x_{i}\)’s are equidistantly designed, that is, \(x_{i}=i/n\) for \(1\le i\le n\). For ease of notation, let \(m_{i}=m(x_{i})\), and \(\gamma _{4}=\mathrm{E}(\epsilon ^{4})/\sigma ^{4}\). Define the lag-l Gasser-type estimator as
which is similar to the estimator of Müller and Stadtmüller (1999). The difference is that their method focuses on the Rice-type estimator, whereas ours relies on the Gasser-type estimator to further reduce the bias.
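A direct implementation of \(s_{l}\) follows from the pseudo-residual form that reappears in the proof of Theorem 2 (Appendix 2); a sketch, with the function name ours:

```python
import numpy as np

def lag_gasser(y, l):
    """Lag-l Gasser-type estimator:
    s_l = 1/(n-2l) * sum_{i=l+1}^{n-l} (2/3) * (y_{i-l}/2 - y_i + y_{i+l}/2)^2."""
    n = len(y)
    # pseudo-residuals for i = l+1, ..., n-l (0-based slicing below)
    d = 0.5 * y[:n - 2 * l] - y[l:n - l] + 0.5 * y[2 * l:]
    return (2.0 / 3.0) * np.sum(d**2) / (n - 2 * l)
```

For \(l=1\) this reduces to the original Gasser-type estimator.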
For the lag-l Gasser-type estimator \(s_{l}\), we decompose it into three parts as
where
Taking expectation of \(I_{2l}\) and \(I_{3l}\), we have
Note that the lag-l Gasser-type estimator has a positive bias \(I_{1l}\). Assume that \(m(\cdot )\) has a bounded second-order derivative. According to the Taylor expansion of \(m_{i\pm l}\) at \(x_{i}\), we have
and
where \(J=(1/6)\int _{0}^{1}\{m^{(2)}(x)\}^{2}\mathrm{d}x\). For \(l=o(n)\), the expectation of \(s_{l}\) is
2.2 Estimation methodology
For any fixed \(k=o(n)\), we construct a linear regression model
where \(\beta =(\beta _{0}, \beta _{1})^{T}=(\sigma ^{2}, J)^{T}\), \(d_{l}=l^{4}/n^{4}\), and \(\delta _{l}=o(l^{4}/n^{4})+I_{2l}+(I_{3l}-\sigma ^{2})\) with \(\mathrm{E}(\delta _{l})\approx 0\) and \(var(\delta _{l})\approx (\gamma _{4}+8/9)\sigma ^{4}/n\) (see Theorem 2). The method is similar in spirit to the variance estimation (Müller and Stadtmüller 1999; Tong and Wang 2005) and the derivative estimation (Wang and Lin 2015).
Since the asymptotic variances of \(\delta _{l}\)’s are all the same, we use the least squares method to obtain the parameter estimation
where
That is
where \(\bar{s}=(1/k)\sum _{l=1}^{k}s_{l}\) and \(\bar{d}=(1/k)\sum _{l=1}^{k}d_{l}\). We define the proposed variance estimator as \(\hat{\sigma }^{2}=\hat{\beta }_{0}\). The following theorem gives some properties of the variance estimator \(\hat{\sigma }^{2}\).
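The estimator \(\hat{\sigma }^{2}=\hat{\beta }_{0}\) is simply the intercept of an ordinary least squares fit of \(s_{l}\) on \(d_{l}=l^{4}/n^{4}\). A self-contained sketch (function names ours):

```python
import numpy as np

def lag_gasser(y, l):
    # lag-l Gasser-type estimator s_l (Sect. 2.1)
    n = len(y)
    d = 0.5 * y[:n - 2 * l] - y[l:n - l] + 0.5 * y[2 * l:]
    return (2.0 / 3.0) * np.sum(d**2) / (n - 2 * l)

def sigma2_ols(y, k):
    """Estimate sigma^2 as the intercept of the OLS fit
    s_l = beta0 + beta1 * d_l + delta_l, with d_l = l^4 / n^4."""
    n = len(y)
    s = np.array([lag_gasser(y, l) for l in range(1, k + 1)])
    d = np.array([(l / n)**4 for l in range(1, k + 1)])
    sbar, dbar = s.mean(), d.mean()
    beta1 = np.sum((d - dbar) * (s - sbar)) / np.sum((d - dbar)**2)
    return sbar - beta1 * dbar   # intercept = variance estimate
```

By Theorem 1, this intercept is exactly unbiased when m is linear or quadratic, whatever the choice of k.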
Theorem 1
For equidistant designs, we have the following:
-
1.
\(\hat{\sigma }^{2}\) is unbiased when m is a linear or quadratic function regardless of the choice of k;
-
2.
\(\hat{\sigma }^{2}\) can be represented as a linear combination of \(\{s_{l}\}_{l=1}^{k}\), i.e., \(\hat{\sigma }^{2}=(1/k)\sum _{l=1}^{k}b_{l}s_{l}\), where \(b_{l}=1-\frac{k\bar{d}(d_{l}-\bar{d})}{\sum _{l=1}^{k}(d_{l}-\bar{d})^{2}} \approx \frac{25}{16}-\frac{45}{16}\frac{l^{4}}{k^{4}}\) for \(l=1,\ldots ,k\);
-
3.
\(\hat{\sigma }^{2}\) can be written as a quadratic form \(\hat{\sigma }^{2}=Y^{T} DY\), where \(D=(1/k)\sum _{l=1}^{k}b_{l}D_{l}\) is an \(n\times n\) matrix with \(tr(D)=1\), and \(D_{l}\) is an \(n\times n\) matrix corresponding to \(s_{l}=Y^{T} D_{l}Y\) with \(tr(D_{l})=1\).
Remark 1
From the second point of Theorem 1, \(\hat{\sigma }^{2}\) is a linear combination of \(\{s_{l}\}_{l=1}^{k}\). The weighted average has two advantages: it decreases the asymptotic estimation variance from \((\gamma _{4}+8/9)\sigma ^{4}\) to \((\gamma _{4}-1)\sigma ^{4}\) (see Theorems 2 and 4), and it eliminates the bias terms \(\{Jl^{4}/n^{4}\}_{l=1}^{k}\) since \(\sum _{l=1}^{k}b_{l}l^{4}/n^{4}=0\). Because \(\sum _{l=1}^{k}b_{l}=k\) and \(b_{l}<0\) for \(l>(5/9)^{1/4}k\approx 0.86k\), D is not guaranteed to be a positive definite matrix, so \(\hat{\sigma }^{2}\) may take a negative value. However, our simulations indicate that this rarely happens. If negative estimates do occur in practice, we recommend replacing them by zero, as Tong and Wang (2005) suggested.
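The two identities used in Remark 1, \(\sum _{l}b_{l}=k\) and \(\sum _{l}b_{l}d_{l}=0\), as well as the negativity of \(b_{l}\) near \(l=k\), can be checked numerically with the exact weights; a quick sketch (the values of n and k are arbitrary):

```python
import numpy as np

n, k = 100, 10
d = np.array([(l / n)**4 for l in range(1, k + 1)])
dbar = d.mean()
# exact weights b_l = 1 - k * dbar * (d_l - dbar) / sum_j (d_j - dbar)^2
b = 1.0 - k * dbar * (d - dbar) / np.sum((d - dbar)**2)

print(np.isclose(b.sum(), k))          # sum_l b_l = k
print(np.isclose(np.sum(b * d), 0.0))  # bias terms J * l^4 / n^4 cancel
print(b[-1] < 0)                       # b_l < 0 for l near k (l > ~0.86 k)
```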
For finite samples, since \(s_{l}\) is the average of \((n-2l)\) lag-l differences, it may be better to assign the weight \(w_{l}=(n-2l)/N\) to \(s_{l}\), with \(N=\sum _{l=1}^{k}(n-2l)\). We thus obtain the weighted least squares variance estimator
where \(\bar{s}_{w}=\sum _{l=1}^{k}w_{l}s_{l}\) and \(\bar{d}_{w}=\sum _{l=1}^{k}w_{l}d_{l}\).
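The weighted variant replaces the plain averages by \(\bar{s}_{w}\) and \(\bar{d}_{w}\) in the least squares fit; a sketch (function names ours):

```python
import numpy as np

def lag_gasser(y, l):
    # lag-l Gasser-type estimator s_l (Sect. 2.1)
    n = len(y)
    d = 0.5 * y[:n - 2 * l] - y[l:n - l] + 0.5 * y[2 * l:]
    return (2.0 / 3.0) * np.sum(d**2) / (n - 2 * l)

def sigma2_wls(y, k):
    """Weighted least squares intercept with weights w_l = (n - 2l) / N,
    N = sum_l (n - 2l), reflecting the number of lag-l differences."""
    n = len(y)
    ls = np.arange(1, k + 1)
    w = (n - 2.0 * ls) / np.sum(n - 2.0 * ls)
    s = np.array([lag_gasser(y, l) for l in ls])
    d = (ls / n)**4
    sbar_w, dbar_w = np.sum(w * s), np.sum(w * d)
    beta1 = np.sum(w * (d - dbar_w) * (s - sbar_w)) / np.sum(w * (d - dbar_w)**2)
    return sbar_w - beta1 * dbar_w
```

As noted next, the unweighted and weighted versions are asymptotically equivalent.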
In the construction of the least squares estimator and its weighted version, we ignore the correlation between the \(s_{l}\)’s. It is well known that the generalized least squares estimator \(\hat{\beta }_{\Sigma }=(X^{T}\Sigma ^{-1} X)^{-1}X^{T}\Sigma ^{-1} S\), where \(\Sigma \) is the asymptotic covariance matrix, is the best linear unbiased estimator (BLUE). By McElroy (1967), the BLUE coincides with the ordinary least squares estimator in linear regression if and only if the errors have equal variances and equal nonnegative correlation coefficients. By Theorem 2 and Lemma 3 below, the generalized least squares estimator is identical to the ordinary least squares estimator. Thus the above three estimators are asymptotically equivalent.
2.3 Asymptotic results
Next, we establish the asymptotic normality for the lag-l Gasser-type estimator and our estimator.
Theorem 2
Assume that m has a bounded second-order derivative. If \(l=o(n^{7/8})\) as \(n\rightarrow \infty \), then the lag-l Gasser-type estimator satisfies
With normal errors, the asymptotic variances of the lagged Rice-type and Gasser-type estimators are \(3\sigma ^{4}/n\) (Tong et al. 2013) and \((3+8/9)\sigma ^{4}/n\), respectively; thus the Rice-type estimator is theoretically more efficient than the Gasser-type estimator. However, for finite samples the Gasser-type estimator can be better when the regression function oscillates strongly, provided the bias caused by the regression function is controlled.
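The stated asymptotic variance of the lagged Gasser-type estimator, \((\gamma _{4}+8/9)\sigma ^{4}/n\), which equals \((3+8/9)\sigma ^{4}/n\approx 3.89\sigma ^{4}/n\) under normality, can be checked by simulation; a sketch with pure noise (\(m\equiv 0\)):

```python
import numpy as np

# Monte Carlo check: n * var(s_l) should be close to (3 + 8/9) * sigma^4
# for normal errors (gamma_4 = 3) and a zero mean function.
rng = np.random.default_rng(1)
n, sigma, l, reps = 5000, 1.0, 3, 2000
vals = np.empty(reps)
for r in range(reps):
    e = rng.normal(0.0, sigma, n)
    # pseudo-residuals eps_{i-l}/2 - eps_i + eps_{i+l}/2
    d = 0.5 * e[:n - 2 * l] - e[l:n - l] + 0.5 * e[2 * l:]
    vals[r] = (2.0 / 3.0) * np.sum(d**2) / (n - 2 * l)
print(n * vals.var())  # should be near 3.89
```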
Lemma 3
Assume that m has a bounded second-order derivative. If \(v=o(n^{3/4})\) as \(n\rightarrow \infty \) and \(v>u\), then the covariance between the lag-u and lag-v Gasser-type estimators is
Theorem 4
Assume that m has a bounded second-order derivative. For any \(k=O(n^{r})\) with \(0<r<3/4\) as \(n\rightarrow \infty \), we have
Remark 2
There are two kinds of differenced methods: direct-difference methods and integral-based methods. Most direct-difference estimators do not achieve the asymptotically optimal variance; e.g., the estimator of Gasser et al. (1986) has asymptotic variance \((\gamma _{4}+8/9)\sigma ^{4}\) by Theorem 2. As for the integral-based estimators, our estimator achieves the same optimal variance \((\gamma _{4}-1)\sigma ^{4}\) (Theorem 4) as that of Tong and Wang (2005), but ours has smaller bias (\(o(k^{4}/n^{4})\)) than Tong and Wang (2005)’s (\(o(k^{2}/n^{2})\)) under the same conditions of equidistant design and bounded second-order derivative.
3 Simulations
3.1 Finite-sample choice of the bandwidth
In our comparisons, we evaluate the performance of the estimators of Rice (1984), Gasser et al. (1986), Hall et al. (1990), Tong and Wang (2005), and ours, denoted by \(\hat{\sigma }_{R}^{2}\), \(\hat{\sigma }_{\textit{GSJ}}^{2}\), \(\hat{\sigma }_{\textit{HKT}}^{2}\), \(\hat{\sigma }_{\textit{TW}}^{2}\), and \(\hat{\sigma }^{2}\), respectively. For the estimator of Hall et al. (1990), we set \(k=2\) in our simulations, that is
As for the estimator of Tong and Wang (2005) and ours, the bandwidth is chosen as \(k=n^{1/3}\) for small n and \(k=n^{1/2}\) for large n, as in Tong and Wang (2005). Further explanation is given by Wang and Lin (2015) and Wang and Yu (2016).
3.2 Simulation results
The oscillation of a periodic function depends on its frequency and amplitude. We choose the regression function in our simulations to be
with design points \(x_{i}=i/n\); the errors are independent and identically distributed normal random variables with zero mean and variance \(\sigma ^{2}\). The regression function is similar to those in Seifert et al. (1993), Dette et al. (1998), and Tong and Wang (2005). We consider two sample sizes \(n=50, 500\), corresponding to small and large samples, three standard deviations \(\sigma =0.1, 0.5, 2\), two frequencies \(f=0.5, 2\), and two amplitudes \(A=5, 50\).
For each simulation setting, we generate observations and compute the estimators \(\hat{\sigma }_{R}^{2}\), \(\hat{\sigma }_{\textit{GSJ}}^{2}\), \(\hat{\sigma }_{\textit{HKT}}^{2}\), \(\hat{\sigma }_{\textit{TW}}^{2}\), \(\hat{\sigma }^{2}\). We repeat this process 1000 times and compute relative MSEs for these estimators.
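A small Monte Carlo experiment along these lines can be sketched as follows. The sinusoidal mean \(m(x)=A\sin (2\pi f x)\) is our stand-in for the oscillating regression function described above, so the numbers will not reproduce Table 1 exactly; the setting \((n, A, f, \sigma )=(50, 5, 0.5, 0.1)\) is one where the bias of the Rice estimator dominates:

```python
import numpy as np

rng = np.random.default_rng(0)

def rice(y):
    # first-order differenced estimator of Rice (1984)
    return np.sum(np.diff(y)**2) / (2.0 * (len(y) - 1))

def lag_gasser(y, l):
    # lag-l Gasser-type estimator s_l
    n = len(y)
    d = 0.5 * y[:n - 2 * l] - y[l:n - l] + 0.5 * y[2 * l:]
    return (2.0 / 3.0) * np.sum(d**2) / (n - 2 * l)

def sigma2_new(y, k):
    # intercept of the OLS fit of s_l on d_l = l^4 / n^4 (Sect. 2.2)
    n = len(y)
    s = np.array([lag_gasser(y, l) for l in range(1, k + 1)])
    d = np.array([(l / n)**4 for l in range(1, k + 1)])
    b1 = np.sum((d - d.mean()) * (s - s.mean())) / np.sum((d - d.mean())**2)
    return s.mean() - b1 * d.mean()

n, A, f, sigma = 50, 5.0, 0.5, 0.1
x = np.arange(1, n + 1) / n
m = A * np.sin(2 * np.pi * f * x)   # hypothetical oscillating mean
k = round(n ** (1 / 3))             # bandwidth k = n^(1/3) for small n

mse_rice, mse_new, reps = 0.0, 0.0, 200
for _ in range(reps):
    y = m + rng.normal(0.0, sigma, n)
    mse_rice += (rice(y) - sigma**2) ** 2 / reps
    mse_new += (sigma2_new(y, k) - sigma**2) ** 2 / reps
print(mse_new < mse_rice)   # the bias-corrected estimator wins here
```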
Table 1 lists the relative MSEs of all estimators, \(\mathrm{nMSE}/(2\sigma ^{4})\), since the asymptotically optimal variance is \(2\sigma ^{4}/n\) with normal errors. In general, \(\hat{\sigma }^{2}\) has the smallest relative MSE in most settings. With normal errors, the asymptotic relative MSEs are 1.25, 1.5, and 1.94 for \(\hat{\sigma }_{\textit{HKT}}^{2}\), \(\hat{\sigma }_{R}^{2}\), and \(\hat{\sigma }_{\textit{GSJ}}^{2}\), respectively. However, \(\hat{\sigma }_{\textit{GSJ}}^{2}\) is robust in most settings because it reduces more bias than \(\hat{\sigma }_{R}^{2}\) and \(\hat{\sigma }_{\textit{HKT}}^{2}\). Both \(\hat{\sigma }_{\textit{TW}}^{2}\) and \(\hat{\sigma }^{2}\) achieve the asymptotically optimal rate, but \(\hat{\sigma }^{2}\) has smaller relative MSE than \(\hat{\sigma }_{\textit{TW}}^{2}\) in all settings, except that the two perform similarly when a large sample size and variance, low frequency, and small amplitude occur simultaneously.
Next we explain why some quite large values appear in Table 1. The asymptotic MSE is composed of bias and variance, that is, \(\mathrm{MSE=Bias^{2}+Var}\). By Theorems 2 and 4, we know that the variance can be controlled as the sample size n becomes large. However, the bias is fixed, determined by the method used and the oscillation of the regression function. For example, \(\mathrm{AMSE}(\hat{\sigma }_{R}^{2})=1.47E2\) is very large for \(n=50, A=5, f=0.5, \sigma =0.1\). We compute the expectation of \(\hat{\sigma }_{R}^{2}\) as
Thus the bias is
and now
which is close to 147.
To further show the importance of bias correction, we consider the two cases \((n, A, f, \sigma ^{2})=(500,5,0.5,0.01)\) and (500, 50, 0.5, 0.01). Figure 1 indicates that both \(\hat{\sigma }_{\textit{TW}}^{2}\) and \(\hat{\sigma }^{2}\) are asymptotically normal around the variance \(\sigma ^{2}=0.01\). However, for \(A=5\), \(\hat{\sigma }_{\textit{TW}}^{2}\) has a small bias (0.0004) while \(\hat{\sigma }^{2}\) has almost no bias; as the amplitude A varies from 5 to 50, \(\hat{\sigma }_{\textit{TW}}^{2}\) has a larger bias (0.04) while \(\hat{\sigma }^{2}\) still has little bias (0.00018). So the new estimator based on the Gasser-type estimator controls the bias much better than the estimator of Tong and Wang (2005) based on the Rice-type estimator.
4 Discussion
In this paper, we propose a new variance estimator for the nonparametric regression model. The new estimator achieves the asymptotically optimal rate for the MSE; meanwhile, it is less biased than most differenced estimators.
This work concentrates on the equidistant design, but the idea can be generalized to non-equidistant designs. Following one reviewer’s comment, we assume that \(x_{i}=g(i/n)\) for \(i=1, \ldots , n\), where the function \(g(\cdot )\) has a positive derivative and \(c\le g(x)\le 1\) for some constant \(0<c<1\). We expand \(x_{i+l}-x_{i}\), \(x_{i+l}-x_{i-1}\), and \(x_{i}-x_{i-1}\) around the same design point \(i/n\) such that
Thus in the sense of linear approximation, we have the same result as the equidistant design such that
Further, assume that the function \(g(\cdot )\) is twice continuously differentiable. We expand \(x_{i+l}-x_{i}\), \(x_{i+l}-x_{i-1}\), and \(x_{i}-x_{i-1}\) around the point \(i/n\) such that
Thus we have
To eliminate the last term of the above equation, we need \(g^{(1)}(i/n)\) to be bounded away from 0 and \(g^{(2)}(i/n)\) to be close to 0. Otherwise the last term is non-negligible and our proposed method does not apply directly. In the particular case \(g(x)=x\), corresponding to the equidistant design, \(g^{(1)}(i/n)\equiv 1\) and \(g^{(2)}(i/n)\equiv 0\), which satisfies the above conditions.
In addition, our proposed method concentrates on a univariate x, while some differenced methods in the literature focus on multivariate x. Hall et al. (1991) generalized the idea in Hall et al. (1990) to the bivariate lattice design. Munk et al. (2005) further proposed a differenced estimator for multivariate regression (with dimension \(d\le 4\)). Our method can be generalized to the bivariate lattice design as in Hall et al. (1991). The generalization may be more efficient in some particular cases. For example, when the surface has a similar variation trend in the horizontal direction, differencing in the vertical direction can eliminate the trend while retaining the error information.
References
Buckley MJ, Eagleson GK, Silverman BW (1988) The estimation of residual variance in nonparametric regression. Biometrika 75:189–199
Carter CK, Eagleson GK (1992) A comparison of variance estimators in nonparametric regression. J R Stat Soc B 54:773–780
Carter CK, Eagleson GK, Silverman BW (1992) A comparison of the Reinsch and Speckman splines. Biometrika 71:81–91
Dai W, Tong T (2014) Variance estimation in nonparametric regression with jump discontinuities. J Appl Stat 41:530–545
Dette H, Munk A, Wagner T (1998) Estimating the variance in nonparametric regression—what is a reasonable choice? J R Stat Soc B 60:751–764
Du J, Schick A (2009) A covariate-matched estimator of the error variance in nonparametric regression. J Nonparametr Stat 21(3):263–285
Eubank RL (1999) Nonparametric regression and spline smoothing. Marcel Dekker Inc, New York
Gasser T, Sroka L, Jennen-Steinmetz C (1986) Residual variance and residual pattern in nonlinear regression. Biometrika 73:625–633
Hall P, Carroll RJ (1989) Variance function estimation in regression: the effect of estimating the mean. J R Stat Soc B 51:3–14
Hall P, Marron JS (1990) On variance estimation in nonparametric regression. Biometrika 77:415–419
Hall P, Kay JW, Titterington DM (1990) Asymptotically optimal difference-based estimation of variance in nonparametric regression. Biometrika 77:521–528
Hall P, Kay JW, Titterington DM (1991) On estimation of noise variance in two-dimensional signal processing. Appl Probab Trust 23(3):476–495
McElroy F (1967) A necessary and sufficient condition that ordinary least-squares estimators be best linear unbiased. J Am Stat Assoc 62(320):1302–1304
Müller HG, Stadtmüller U (1987) Estimation of heteroscedasticity in regression analysis. Ann Stat 15:610–635
Müller HG, Stadtmüller U (1999) Discontinuous versus smooth regression. Ann Stat 27:1215–1230
Müller UU, Schick A, Wefelmeyer W (2003) Estimating the error variance in nonparametric regression by a covariate-matched U-statistic. Statistics 37:179–188
Munk A, Bissantz N, Wagner T, Freitag G (2005) On difference-based variance estimation in nonparametric regression when the covariate is high dimensional. Aust N Z J Stat 44:479–488
Neumann MH (1994) Fully data-driven nonparametric variance estimators. Statistics 25:189–212
Rice JA (1984) Bandwidth choice for nonparametric regression. Ann Stat 12:1215–1230
Seifert B, Gasser T, Wolf A (1993) Nonparametric estimation of residual variance revisited. Biometrika 80:373–383
Tong T, Wang Y (2005) Estimating residual variance in nonparametric regression using least squares. Biometrika 92(4):821–830
Tong T, Ma Y, Wang Y (2013) Optimal variance estimation without estimating the mean function. Bernoulli 19(5A):1839–1854
Wang Y (2011) Smoothing splines: methods and applications. CRC Press, Boca Raton
Wang WW, Lin L (2015) Derivative estimation based on difference sequence via locally weighted least squares regression. J Mach Learn Res 16:2617–2641
Wang WW, Yu P (2016) Asymptotically optimal differenced estimators of error variance in nonparametric regression. Comput Stat Data Anal. Technical report, The University of Hong Kong. http://web.hku.hk/~pingyu/KernelD.pdf
Whittle P (1964) On the convergence to normality of quadratic forms in independent variables. Theory Probab Appl 9:103–108
Zhou Y, Cheng Y, Wang L, Tong T (2015) Optimal difference-based variance estimation in heteroscedastic nonparametric regression. Stat Sin 25:1377–1397
Acknowledgments
We would like to thank the four reviewers for the helpful comments and Ping Yu at the University of Hong Kong for useful discussions. The research was supported by NNSF Projects (11571204 and 11231005) of China.
Appendices
Appendix 1: Proof of Theorem 1
Let \(m(x)=a+bx+cx^{2}\). We have
Since this is an exact linear regression model, the coefficient estimate is unbiased. Thus,
Variance estimator \(\hat{\sigma }^{2}\) is a linear combination of \(\{s_{l}\}_{l=1}^{k}\), that is
where \(b_{l}=1-\frac{k\bar{d}(d_{l}-\bar{d})}{\sum _{l=1}^{k}(d_{l}-\bar{d})^{2}}\approx \frac{25}{16}-\frac{45}{16}\frac{l^{4}}{k^{4}}\).
According to the proof of (2) and \(\sum _{l=1}^{k}b_{l}=k\), (3) is straightforward.
Appendix 2: Proof of Theorem 2
For lag-l Gasser-type estimator,
Applying Taylor expansion to \(I_{1l}\), we can show that \(I_{1l}=(l^{4}/n^{4})J+o(l^{4}/n^{4})=o_{p}(n^{-1/2})\) if \(n^{-1}l^{8/7}\rightarrow 0\) as \(n\rightarrow \infty \). For \(I_{2l}\), we have
which implies that \(I_{2l}=o_{p}(n^{-1/2})\) for any \(l=o(n)\).
Rewrite \(I_{3l}\) as \(I_{3l}=\sigma ^{2}+\frac{1}{n-2l}\sum _{i=l+1}^{n-l}\eta _{i}(l)\), where \(\eta _{i}(l)=\frac{2}{3}\left(\frac{1}{2}\epsilon _{i-l}-\epsilon _{i}+\frac{1}{2}\epsilon _{i+l}\right)^{2}-\sigma ^{2}\). For any l, \(\{\eta _{i}(l)\}_{i=l+1}^{n-l}\) is a strictly stationary sequence of random variables with mean zero and autocovariance function
Note that the sequence \(\{\eta _{i}(l)\}_{i=l+1}^{n-l}\) is 2l-dependent. Thus by the central limit theorem in Whittle (1964), \(\sqrt{n}(I_{3l}-\sigma ^{2})\mathop {\longrightarrow }\limits ^{d}N(0, \zeta ^{2})\), where \(\zeta ^{2}=\gamma (0)+2\sum _{t=1}^{2l}\gamma (t)=(\gamma _{4}+8/9)\sigma ^{4}\). Using \(I_{1l}=o_{p}(n^{-1/2})\), \(I_{2l}=o_{p}(n^{-1/2})\) and \(s_{l}=I_{1l}+I_{2l}+I_{3l}\), we have
Appendix 3: Proof of Lemma 3
For \(1\le u<v=o(n)\), we have
where
Applying the second-order Taylor expansion, we have
Next, we decompose \(L_{1}(u, v)\) into ten parts such as
where
and \(\Gamma =\{(i, j)|u+1\le i\le n-u; v+1\le j\le n-v; \{i-u, i, i+u\}\cap \{j-v, j, j+v\}=\emptyset \}\). When \(v\ne 2u\), it is easy to verify the following
Therefore,
When \(v=2u\), note that \(L_{12}=L_{18}, L_{13}=L_{19}\) and \(\mathrm{E}(L_{12})=(5\gamma _{4}+23)\sigma ^{4}n/16+o(n)\). Thus,
Finally, if \(v=o(n^{3/4})\) as \(n\rightarrow \infty \), then
Appendix 4: Proof of Theorem 4
When \(k=O(n^{r})\) with \(0<r<3/4\) as \(n\rightarrow \infty \), the estimation bias is negligible, and Theorem 2 and Lemma 3 hold simultaneously. Therefore \(\hat{\sigma }^{2}\) is consistent and asymptotically normal since it is a least squares estimator. Next we derive the asymptotic variance
So we have
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Wang, W., Lin, L. & Yu, L. Optimal variance estimation based on lagged second-order difference in nonparametric regression. Comput Stat 32, 1047–1063 (2017). https://doi.org/10.1007/s00180-016-0666-2