1 Introduction

Consider the nonparametric regression model

$$\begin{aligned} Y_{i}=m(x_{i})+\epsilon _{i}\quad (i=1, \ldots , n), \end{aligned}$$
(1)

where \(0\le x_{1}< \cdots < x_{n}\le 1\) are design points, \(Y_{i}\)’s are observed responses, \(m(\cdot )\) is an unknown smooth mean function, and \(\epsilon _{i}\)’s are independent and identically distributed random errors with zero mean and variance \(\sigma ^{2}\).

For the estimation of the variance \(\sigma ^{2}\), one usually fits the regression function m first, by smoothing splines (Carter and Eagleson 1992; Carter et al. 1992) or kernel regression (Müller and Stadtmüller 1987; Hall and Carroll 1989; Neumann 1994), and then estimates \(\sigma ^{2}\) from the residual sum of squares. However, the estimation of the regression function depends on the amount of smoothing (Dette et al. 1998), which requires knowledge of unknown quantities such as \(\int _{0}^{1}\{m^{(1)}(x)\}^{2}\mathrm{d}x\) (Hall and Marron 1990), \(\int _{0}^{1}\{m^{(2)}(x)\}^{2}\mathrm{d}x\) (Buckley et al. 1988), or more generally \(\int _{0}^{1}\{m^{(l)}(x)\}^{2}\mathrm{d}x\) for a derivative order l (Seifert et al. 1993; Eubank 1999; Wang 2011). Moreover, few methods have been proposed to estimate the amount of smoothness \(\int _{0}^{1}\{m^{(l)}(x)\}^{2}\mathrm{d}x\). Thus, in practical applications an estimator of \(\sigma ^{2}\) that does not require estimating the regression function is preferable.

A class of estimators that bypass estimation of the regression function are the differenced estimators. Rice (1984) proposed the first-order differenced estimator

$$\begin{aligned} \hat{\sigma }_{R}^{2}=\frac{1}{2(n-1)}\sum _{i=2}^{n}(Y_{i}-Y_{i-1})^{2}. \end{aligned}$$

Later, Gasser et al. (1986) proposed the second-order differenced estimator

$$\begin{aligned} \hat{\sigma }_{\textit{GSJ}}^{2}=\frac{2}{3(n-2)}\sum _{i=2}^{n-1}\left( \frac{1}{2} Y_{i-1}-Y_{i}+\frac{1}{2}Y_{i+1}\right) ^{2}, \end{aligned}$$
(2)

and Hall et al. (1990) generalized it to a kth-order differenced estimator by minimizing the estimation variance. Building on these estimators, further estimators of \(\sigma ^{2}\) have been proposed. With respect to the Rice-type estimator, Müller and Stadtmüller (1999) proposed the lagged Rice estimator; Müller et al. (2003) proposed a covariate-matched U-statistic; Tong and Wang (2005) further improved the lagged Rice estimator by estimating the variance as the intercept in a linear regression model, and the asymptotically optimal rate is discussed in Tong et al. (2013); Dai and Tong (2014) proposed a pairwise regression for models with jump discontinuities. As for the Gasser-type estimator, Seifert et al. (1993) generalized it to a higher-order version, and Du and Schick (2009) proposed a covariate-matched U-statistic.

The differenced estimators above have the following problems. Most of them keep the order k constant and do not achieve the asymptotically optimal rate for the mean squared error (MSE), namely \(\mathrm{MSE}(\hat{\sigma }^{2})=n^{-1}\mathrm{var}(\epsilon ^{2})+o(n^{-1})\); the exceptions are the estimators in Müller et al. (2003), Tong and Wang (2005), and Du and Schick (2009). Zhou et al. (2015) indicated that the estimators of Müller et al. (2003) and Du and Schick (2009) are not applicable in practice since they require an appropriate choice of bandwidth in advance. In Tong and Wang (2005), the bias caused by the regression function is non-negligible for finite samples, especially when the regression function m oscillates strongly and the sample size n is small. However, Seifert et al. (1993) and Du and Schick (2009) both showed that their estimators based on the Gasser-type estimator have better bias properties than the Rice-type and Hall-type estimators. Thus, in practical applications the method based on the Gasser-type estimator is preferred.

In this paper, we propose a new variance estimator based on the Gasser-type estimator. With equidistant designs, our estimator achieves the asymptotically optimal rate for the MSE. In finite samples, it effectively reduces the estimation bias caused by the regression function, especially when the regression function oscillates strongly and the sample size is small.

The remainder of this paper is organised as follows. In Sect. 2, we propose the lagged Gasser-type estimator, introduce the estimation methodology, and derive the asymptotic results for our estimator. To assess the performance of our estimator, we conduct simulation studies and compare it with other differenced estimators in Sect. 3. All proofs of the theorems are given in “Appendices 1–4”.

2 Main results

In this section, we define the lag-l Gasser-type estimator and construct a new estimator from these lagged estimators using least squares regression. It will be shown that the new estimator is more efficient than the estimator of Gasser et al. (1986) and reaches the optimal bound in terms of estimation variance. We also establish its asymptotic normality.

2.1 Lagged Gasser-type estimator

Assume that the \(x_{i}\)’s are equidistant, that is, \(x_{i}=i/n\) for \(1\le i\le n\). For ease of notation, let \(m_{i}=m(x_{i})\) and \(\gamma _{4}=\mathrm{E}(\epsilon ^{4})/\sigma ^{4}\). Define the lag-l Gasser-type estimator as

$$\begin{aligned} s_{l}=\frac{2}{3(n-2l)}\sum _{i=l+1}^{n-l}\left( \frac{1}{2}Y_{i-l}-Y_{i} +\frac{1}{2}Y_{i+l}\right) ^{2}\quad (1\le l\le k), \end{aligned}$$
(3)

which is similar to the estimator of Müller and Stadtmüller (1999). The difference is that their method builds on the Rice-type estimator, while ours relies on the Gasser-type estimator to further reduce bias.
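To make the construction concrete, the following minimal NumPy sketch computes \(s_{l}\) from a response vector under an equidistant design; the function name and array conventions are our own illustration, not part of the paper.

```python
import numpy as np

def gasser_lag(y, l):
    """Lag-l Gasser-type statistic s_l in Eq. (3), assuming an equidistant design.

    y : 1-d array of responses Y_1, ..., Y_n
    l : lag, 1 <= l < n/2
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # pseudo-residuals (1/2) Y_{i-l} - Y_i + (1/2) Y_{i+l}, i = l+1, ..., n-l
    d = 0.5 * y[: n - 2 * l] - y[l : n - l] + 0.5 * y[2 * l :]
    return 2.0 / (3.0 * (n - 2 * l)) * np.sum(d ** 2)
```

For \(l=1\) this reduces to the estimator \(\hat{\sigma }_{\textit{GSJ}}^{2}\) in (2).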

For the lag-l Gasser-type estimator \(s_{l}\), we decompose it into three parts as

$$\begin{aligned} s_{l}=I_{1l}+I_{2l}+I_{3l}, \end{aligned}$$

where

$$\begin{aligned} I_{1l}= & {} \frac{2}{3(n-2l)}\sum _{i=l+1}^{n-l}\left( \frac{1}{2}m_{i-l}-m_{i}+\frac{1}{2} m_{i+l}\right) ^{2},\\ I_{2l}= & {} \frac{4}{3(n-2l)}\sum _{i=l+1}^{n-l}\left( \frac{1}{2}m_{i-l}-m_{i}+\frac{1}{2}m_{i+l}\right) \left( \frac{1}{2}\epsilon _{i-l}-\epsilon _{i}+\frac{1}{2}\epsilon _{i+l}\right) ,\\ I_{3l}= & {} \frac{2}{3(n-2l)}\sum _{i=l+1}^{n-l}\left( \frac{1}{2}\epsilon _{i-l}-\epsilon _{i}+\frac{1}{2}\epsilon _{i+l}\right) ^{2}. \end{aligned}$$

Taking expectation of \(I_{2l}\) and \(I_{3l}\), we have

$$\begin{aligned} \mathrm{E}(I_{2l})=0, \quad \mathrm{E}(I_{3l})=\sigma ^{2}. \end{aligned}$$

Note that the lag-l Gasser-type estimator has a positive bias \(I_{1l}\). Assume that \(m(\cdot )\) has a bounded second-order derivative. According to the Taylor expansion of \(m_{i\pm l}\) at \(x_{i}\), we have

$$\begin{aligned} \frac{1}{2}m_{i-l}-m_{i}+\frac{1}{2}m_{i+l}=\frac{m_{i}^{(2)}}{2} \frac{l^{2}}{n^{2}}+o\left( \frac{l^{2 }}{n^{2}}\right) , \end{aligned}$$

and

$$\begin{aligned} I_{1l}= & {} \frac{2}{3(n-2l)}\sum _{i=1+l}^{n-l}\left( \frac{m_{i}^{(2)}}{2} \frac{l^{2}}{n^{2}}+o\left( \frac{l^{2}}{n^{2}}\right) \right) ^{2}\\= & {} \frac{2}{3(n-2l)}\sum _{i=1+l}^{n-l}\left( \frac{m_{i}^{(2)}}{2}\right) ^{2} \frac{l^{4}}{n^{4}}+o\left( \frac{l^{4}}{n^{4}}\right) \\\approx & {} J\frac{l^{4}}{n^{4}}, \end{aligned}$$

where \(J=(1/6)\int _{0}^{1}\{m^{(2)}(x)\}^{2}\mathrm{d}x\). For \(l=o(n)\), the expectation of \(s_{l}\) is

$$\begin{aligned} \mathrm{E}(s_{l})\approx \sigma ^{2}+J\frac{l^{4}}{n^{4}}. \end{aligned}$$

2.2 Estimation methodology

For any fixed \(k=o(n)\), we construct a linear regression model

$$\begin{aligned} s_{l}=\beta _{0}+\beta _{1}d_{l}+\delta _{l} \quad (l=1,\ldots ,k), \end{aligned}$$
(4)

where \(\beta =(\beta _{0}, \beta _{1})^{T}=(\sigma ^{2}, J)^{T}\), \(d_{l}=l^{4}/n^{4}\), and \(\delta _{l}=o(l^{4}/n^{4})+I_{2l}+(I_{3l}-\sigma ^{2})\), with \(\mathrm{E}(\delta _{l})\approx 0\) and \(var(\delta _{l})\approx (\gamma _{4}+8/9)\sigma ^{4}/n\) (see Theorem 2). The method is similar in spirit to the variance estimation (Müller and Stadtmüller 1999; Tong and Wang 2005) and the derivative estimation (Wang and Lin 2015).

Since the asymptotic variances of the \(\delta _{l}\)’s are all the same, we use the least squares method to obtain the parameter estimate

$$\begin{aligned} \hat{\beta }= & {} \arg \min _{\beta _{0},\beta _{1}}\sum _{l=1}^{k}(s_{l}-\beta _{0}-\beta _{1}d_{l})^{2}\\= & {} (X^{T}X)^{-1}X^{T}S, \end{aligned}$$

where

$$\begin{aligned} X^{T}= \left( \begin{array}{ccc} 1&{}\quad \cdots &{}\quad 1\\ 1^{4}/n^{4}&{}\quad \cdots &{}\quad k^{4}/n^{4} \\ \end{array} \right) _{2\times k},\quad S^{T}= \left( \begin{array}{ccc} s_{1},&{} \cdots ,&{} s_{k}\\ \end{array} \right) _{1\times k}. \end{aligned}$$

That is

$$\begin{aligned} \hat{\beta }_{0}=\bar{s}-\hat{\beta }_{1}\bar{d},\quad \hat{\beta }_{1}=\frac{\sum _{l=1}^{k}s_{l}(d_{l}-\bar{d})}{\sum _{l=1}^{k}(d_{l}-\bar{d})^{2}}, \end{aligned}$$

where \(\bar{s}=(1/k)\sum _{l=1}^{k}s_{l}\) and \(\bar{d}=(1/k)\sum _{l=1}^{k}d_{l}\). We define the proposed variance estimator as \(\hat{\sigma }^{2}=\hat{\beta }_{0}\). The following theorem gives some properties of the variance estimator \(\hat{\sigma }^{2}\).

Theorem 1

For equidistant designs, we have the following:

  1. \(\hat{\sigma }^{2}\) is unbiased when m is a linear or quadratic function, regardless of the choice of k;

  2. \(\hat{\sigma }^{2}\) can be represented as a linear combination of \(\{s_{l}\}_{l=1}^{k}\), i.e., \(\hat{\sigma }^{2}=(1/k)\sum _{l=1}^{k}b_{l}s_{l}\), where \(b_{l}=1-\frac{k\bar{d}(d_{l}-\bar{d})}{\sum _{l=1}^{k}(d_{l}-\bar{d})^{2}} \approx \frac{25}{16}-\frac{45}{16}\frac{l^{4}}{k^{4}}\) for \(l=1,\ldots ,k\);

  3. \(\hat{\sigma }^{2}\) can be written as a quadratic form \(\hat{\sigma }^{2}=Y^{T} DY\), where \(D=(1/k)\sum _{l=1}^{k}b_{l}D_{l}\) is an \(n\times n\) matrix with \(tr(D)=1\), and \(D_{l}\) is the \(n\times n\) matrix corresponding to \(s_{l}=Y^{T} D_{l}Y\) with \(tr(D_{l})=1\).

Remark 1

From the second point of Theorem 1, \(\hat{\sigma }^{2}\) is a linear combination of \(\{s_{l}\}_{l=1}^{k}\). This weighted average has two advantages: it decreases the asymptotic estimation variance from \((\gamma _{4}+8/9)\sigma ^{4}\) to \((\gamma _{4}-1)\sigma ^{4}\) (see Theorems 2 and 4), and it eliminates the bias terms \(\{Jl^{4}/n^{4}\}_{l=1}^{k}\) since \(\sum _{l=1}^{k}b_{l}l^{4}/n^{4}=0\). Because \(\sum _{l=1}^{k}b_{l}=k\) and \(b_{l}<0\) for \(l>(5/9)^{1/4}k\approx 0.86k\), D is not guaranteed to be positive definite, and \(\hat{\sigma }^{2}\) may take a negative value. However, our simulations indicate that this rarely happens. If a negative estimate does occur in practice, we recommend replacing it by zero, as Tong and Wang (2005) suggested.
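The weight identities used in this remark are easy to check numerically. The short sketch below (with n and k chosen purely for illustration) verifies that \(\sum _{l}b_{l}=k\), that \(\sum _{l}b_{l}d_{l}=0\), and that \(b_{l}\) is close to the asymptotic form \(25/16-45/16\,(l/k)^{4}\).

```python
import numpy as np

n, k = 1000, 50                           # illustrative values only
l = np.arange(1, k + 1)
d = l ** 4 / n ** 4                       # d_l = l^4 / n^4
d_bar = d.mean()
b = 1.0 - k * d_bar * (d - d_bar) / np.sum((d - d_bar) ** 2)

print(np.isclose(b.sum(), k))             # True: the weights sum to k
print(np.isclose(np.sum(b * d), 0.0))     # True: the bias terms J*l^4/n^4 cancel
print(np.abs(b - (25 / 16 - 45 / 16 * (l / k) ** 4)).max())  # small for large k
```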

For finite samples, since \(s_{l}\) is an average of \((n-2l)\) lag-l differences, it may be better to assign the weight \(w_{l}=(n-2l)/N\) to \(s_{l}\), with \(N=\sum _{l=1}^{k}(n-2l)\). We thus obtain the weighted least squares variance estimator

$$\begin{aligned} \hat{\beta }_{0}=\bar{s}_{w}-\hat{\beta }_{1}\bar{d}_{w},\quad \hat{\beta }_{1}=\frac{\sum _{l=1}^{k}w_{l}s_{l}(d_{l}-\bar{d}_{w})}{\sum _{l=1}^{k}w_{l}(d_{l}-\bar{d}_{w})^{2}}, \end{aligned}$$

where \(\bar{s}_{w}=\sum _{l=1}^{k}w_{l}s_{l}\) and \(\bar{d}_{w}=\sum _{l=1}^{k}w_{l}d_{l}\).
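A minimal sketch of the resulting estimator, covering both the ordinary and weighted least squares versions, is given below; it reuses the gasser_lag helper from the earlier sketch, and all names are illustrative.

```python
import numpy as np

def variance_estimate(y, k, weighted=False):
    """Proposed estimator sigma^2-hat = beta_0-hat from regressing s_l on d_l."""
    n = len(y)
    lags = np.arange(1, k + 1)
    s = np.array([gasser_lag(y, int(l)) for l in lags])    # lag-l statistics s_l
    d = lags ** 4 / n ** 4                                  # regressors d_l
    w = (n - 2 * lags) / np.sum(n - 2 * lags) if weighted else np.full(k, 1.0 / k)
    s_bar, d_bar = np.sum(w * s), np.sum(w * d)
    beta1 = np.sum(w * s * (d - d_bar)) / np.sum(w * (d - d_bar) ** 2)
    return s_bar - beta1 * d_bar                            # the intercept beta_0-hat

# toy usage: m(x) = 5 sin(pi x) with N(0, 0.25) errors, n = 500, k = n^(1/2)
rng = np.random.default_rng(0)
n = 500
x = np.arange(1, n + 1) / n
y = 5 * np.sin(np.pi * x) + rng.normal(0.0, 0.5, n)
print(variance_estimate(y, k=int(round(np.sqrt(n)))))       # should be near 0.25
```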

In the construction of the least squares estimator and its weighted version, we ignore the correlation between the \(s_{l}\)’s. It is well known that the generalized least squares estimator \(\hat{\beta }_{\Sigma }=(X^{T}\Sigma ^{-1} X)^{-1}X^{T}\Sigma ^{-1} S\) is the best linear unbiased estimator (BLUE), where \(\Sigma \) is the asymptotic covariance matrix. By McElroy (1967), “BLUE \(=\) OLSE” in linear regression if and only if the errors have equal variances and equal nonnegative correlation coefficients. From Theorem 2 and Lemma 3 below, the generalized least squares estimator is asymptotically identical to the ordinary least squares estimator. Thus the above three estimators are asymptotically equivalent.

2.3 Asymptotic results

Next, we establish the asymptotic normality for the lag-l Gasser-type estimator and our estimator.

Theorem 2

Assume that m has a bounded second-order derivative. If \(l=o(n^{7/8})\) as \(n\rightarrow \infty \), then the lag-l Gasser-type estimator satisfies

$$\begin{aligned} \sqrt{n}\left( s_{l}-\sigma ^{2}\right) \mathop {\longrightarrow }\limits ^{d}N\left( 0, \left( \gamma _{4}+\frac{8}{9}\right) \sigma ^{4}\right) . \end{aligned}$$

With normal errors, the asymptotic variances of the lagged Rice-type and Gasser-type estimators are \(3\sigma ^{4}/n\) (Tong et al. 2013) and \((3+8/9)\sigma ^{4}/n\), respectively; thus the Rice-type estimator is theoretically more efficient than the Gasser-type estimator. However, for finite samples the Gasser-type estimator is preferable when the regression function oscillates strongly, provided the bias caused by the regression function is controlled.

Lemma 3

Assume that m has a bounded second-order derivative. If \(v=o(n^{3/4})\) as \(n\rightarrow \infty \) and \(v>u\), then the covariance between the lag-u and lag-v Gasser-type estimators is

$$\begin{aligned} \mathrm{Cov}(s_{u}, s_{v})=\left\{ \begin{array}{ll} (\gamma _{4}-1)\sigma ^{4}/n+o(1/n),&{}\quad v\ne 2u,\\ (\gamma _{4}-\frac{13}{9})\sigma ^{4}/n+o(1/n),&{}\quad v=2u. \end{array} \right. \end{aligned}$$

Theorem 4

Assume that m has a bounded second-order derivative. For any \(k=O(n^{r})\) with \(0<r<3/4\) as \(n\rightarrow \infty \), we have

$$\begin{aligned} \sqrt{n}\left( \hat{\sigma }^{2}-\sigma ^{2}\right) \mathop {\longrightarrow }\limits ^{d}N\left( 0, \left( \gamma _{4}-1\right) \sigma ^{4}\right) . \end{aligned}$$

Remark 2

There are two kinds of differenced methods: direct-difference methods and integral-based methods. Most direct-difference estimators do not achieve the asymptotically optimal variance; for example, the estimator of Gasser et al. (1986) has asymptotic variance \((\gamma _{4}+8/9)\sigma ^{4}\) by Theorem 2. As for the integral-based estimators, our estimator achieves the same optimal variance \((\gamma _{4}-1)\sigma ^{4}\) (Theorem 4) as that of Tong and Wang (2005), but ours has smaller bias (\(o(k^{4}/n^{4})\)) than Tong and Wang (2005)’s (\(o(k^{2}/n^{2})\)) under the same conditions of equidistant design and bounded second-order derivative.

3 Simulations

3.1 Finite-sample choice of the bandwidth

In our comparisons, we evaluate the performance of the estimators of Rice (1984), Gasser et al. (1986), Hall et al. (1990), Tong and Wang (2005), and ours, denoted by \(\hat{\sigma }_{R}^{2}\), \(\hat{\sigma }_{\textit{GSJ}}^{2}\), \(\hat{\sigma }_{\textit{HKT}}^{2}\), \(\hat{\sigma }_{\textit{TW}}^{2}\), and \(\hat{\sigma }^{2}\), respectively. For Hall et al. (1990)’s estimator, we set \(k=2\) in our simulations, that is,

$$\begin{aligned} \hat{\sigma }_{\textit{HKT}}^{2}=\frac{1}{n-2}\sum _{i=2}^{n-1}(0.809Y_{i-1} -0.5Y_{i}-0.309Y_{i+1})^{2}. \end{aligned}$$
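For reference, minimal sketches of the three fixed-order estimators above are given below; these are our own illustrative implementations, and the Gasser-type estimator reuses gasser_lag from Sect. 2.

```python
import numpy as np

def rice(y):
    """First-order differenced estimator of Rice (1984)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    return np.sum(np.diff(y) ** 2) / (2 * (n - 1))

def gsj(y):
    """Second-order differenced estimator of Gasser et al. (1986), Eq. (2)."""
    return gasser_lag(y, 1)

def hkt(y):
    """Hall et al. (1990) estimator with the order-2 difference sequence."""
    y = np.asarray(y, dtype=float)
    n = y.size
    d = 0.809 * y[:-2] - 0.5 * y[1:-1] - 0.309 * y[2:]
    return np.sum(d ** 2) / (n - 2)
```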

For Tong and Wang (2005)’s estimator and ours, the bandwidth is chosen as \(k=n^{1/3}\) for small n and \(k=n^{1/2}\) for large n, as in Tong and Wang (2005). Further explanation is given by Wang and Lin (2015) and Wang and Yu (2016).

3.2 Simulation results

The oscillation of a periodic function depends on its frequency and amplitude. We choose the regression function in our simulations to be

$$\begin{aligned} m(x)=A\sin (2\pi fx), \end{aligned}$$

with design points \(x_{i}=i/n\), and errors that are independent and identically distributed normal random variables with zero mean and variance \(\sigma ^{2}\). The regression function is similar to those in Seifert et al. (1993), Dette et al. (1998), and Tong and Wang (2005). We consider two sample sizes \(n=50, 500\) (small and large), three standard deviations \(\sigma =0.1, 0.5, 2\), two frequencies \(f=0.5, 2\), and two amplitudes \(A=5, 50\).

Table 1 Relative MSEs of various estimators. Note that \(1.47\mathrm{E}2=1.47\times 10^{2}\)

For each simulation setting, we generate observations and compute the estimators \(\hat{\sigma }_{R}^{2}\), \(\hat{\sigma }_{\textit{GSJ}}^{2}\), \(\hat{\sigma }_{\textit{HKT}}^{2}\), \(\hat{\sigma }_{\textit{TW}}^{2}\), and \(\hat{\sigma }^{2}\). We repeat this process 1000 times and compute the relative MSE of each estimator.
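A sketch of this Monte Carlo procedure is shown below, assuming the estimator functions introduced earlier; the relative MSE is computed as \(n\,\mathrm{MSE}/(2\sigma ^{4})\).

```python
import numpy as np

def relative_mse(estimator, n, A, f, sigma, reps=1000, seed=0):
    """Monte Carlo relative MSE  n * MSE / (2 sigma^4)  for one simulation setting."""
    rng = np.random.default_rng(seed)
    x = np.arange(1, n + 1) / n
    m = A * np.sin(2 * np.pi * f * x)                        # regression function
    est = np.array([estimator(m + rng.normal(0.0, sigma, n)) for _ in range(reps)])
    return n * np.mean((est - sigma ** 2) ** 2) / (2 * sigma ** 4)

# example: relative MSE of the Rice estimator for n=50, A=5, f=0.5, sigma=0.1
print(relative_mse(rice, n=50, A=5, f=0.5, sigma=0.1))
```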

Table 1 lists the relative MSEs, \(\mathrm{nMSE}/(2\sigma ^{4})\), of all estimators, since the asymptotically optimal variance is \(2\sigma ^{4}/n\) with normal errors. In general, \(\hat{\sigma }^{2}\) has the smaller relative MSE in most settings. With normal errors, the asymptotic relative MSEs of \(\hat{\sigma }_{\textit{HKT}}^{2}\), \(\hat{\sigma }_{R}^{2}\), and \(\hat{\sigma }_{\textit{GSJ}}^{2}\) are 1.25, 1.5, and 1.94, respectively. However, \(\hat{\sigma }_{\textit{GSJ}}^{2}\) is robust in most settings because it reduces more bias than \(\hat{\sigma }_{R}^{2}\) and \(\hat{\sigma }_{\textit{HKT}}^{2}\). Both \(\hat{\sigma }_{\textit{TW}}^{2}\) and \(\hat{\sigma }^{2}\) achieve the asymptotically optimal rate, but \(\hat{\sigma }^{2}\) has smaller relative MSE than \(\hat{\sigma }_{\textit{TW}}^{2}\) in all settings, except that the two perform similarly when large sample size, large variance, low frequency, and small amplitude occur simultaneously.

Fig. 1 Histograms of the variance estimators \(\hat{\sigma }_{\textit{TW}}^{2}\) and \(\hat{\sigma }^{2}\) for the cases \((n, A, f, \sigma ^{2})=(500,5,0.5,0.01)\) and (500, 50, 0.5, 0.01) in 10,000 simulations. a Histograms of the estimators with \(A=5\). b Histograms of the estimators with \(A=50\)

Next we explain why some values in Table 1 are quite large. The asymptotic MSE is composed of squared bias and variance, that is, \(\mathrm{MSE=Bias^{2}+Var}\). By Theorems 2 and 4, we know that the variance can be controlled as the sample size n becomes large. However, the bias is fixed: it is determined by the method used and the oscillation of the regression function. For example, \(\mathrm{AMSE}(\hat{\sigma }_{R}^{2})=1.47\mathrm{E}2\) is quite large for \(n=50, A=5, f=0.5, \sigma =0.1\). We compute the expectation of \(\hat{\sigma }_{R}^{2}\) as

$$\begin{aligned} \mathrm{E}(\hat{\sigma }_{R}^{2})= & {} \mathrm{E}\left( \frac{1}{2(n-1)}\sum _{i=2}^{n}(Y_{i}-Y_{i-1})^{2}\right) \\= & {} \mathrm{E}\left( \frac{1}{2(n-1)}\sum _{i=2}^{n}\left\{ (m_{i}-m_{i-1})^{2}+(\epsilon _{i}-\epsilon _{i-1})^{2} +2(m_{i}-m_{i-1})(\epsilon _{i}-\epsilon _{i-1})\right\} \right) \\= & {} \sigma ^{2}+\frac{1}{2(n-1)}\sum _{i=2}^{n}(m_{i}-m_{i-1})^{2}. \end{aligned}$$

Thus the bias is

$$\begin{aligned} \mathrm{Bias}(\hat{\sigma }_{R}^{2})=\frac{1}{2(n-1)}\sum _{i=2}^{n} (m_{i}-m_{i-1})^{2}\approx 0.025, \end{aligned}$$

and now

$$\begin{aligned} \mathrm{nMSE}/(2\sigma ^{4})\approx 50\times \mathrm{Bias}^{2}(\hat{\sigma }_{R}^{2})/(2\sigma ^{4})\approx 156, \end{aligned}$$

which is close to 147.
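This back-of-the-envelope calculation can be reproduced directly; a short sketch is given below.

```python
import numpy as np

n, A, f, sigma = 50, 5, 0.5, 0.1
x = np.arange(1, n + 1) / n
m = A * np.sin(2 * np.pi * f * x)
bias = np.sum(np.diff(m) ** 2) / (2 * (n - 1))    # bias of the Rice estimator
print(bias)                                        # roughly 0.025
print(n * bias ** 2 / (2 * sigma ** 4))            # roughly 156, close to 1.47E2
```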

To further show the importance of bias correction, we consider the two cases \((n, A, f, \sigma ^{2})=(500,5,0.5,0.01)\) and (500, 50, 0.5, 0.01). Figure 1 indicates that both \(\hat{\sigma }_{\textit{TW}}^{2}\) and \(\hat{\sigma }^{2}\) are approximately normally distributed around the true variance \(\sigma ^{2}=0.01\). However, with \(A=5\), \(\hat{\sigma }_{\textit{TW}}^{2}\) has a small bias (0.0004) while \(\hat{\sigma }^{2}\) has almost no bias; as the amplitude A increases from 5 to 50, \(\hat{\sigma }_{\textit{TW}}^{2}\) has a larger bias (0.04) while \(\hat{\sigma }^{2}\) still has little bias (0.00018). So the new estimator based on the Gasser-type estimator controls bias much better than Tong and Wang (2005)’s estimator based on the Rice-type estimator.

4 Discussion

In this paper, we propose a new variance estimator for the nonparametric regression model. The new estimator achieves the asymptotically optimal rate for the MSE and is less biased than most differenced estimators.

This work concentrates on the equidistant design, but the idea can be generalized to non-equidistant designs. Following a reviewer’s comment, we assume that \(x_{i}=g(i/n)\) for \(i=1, \ldots , n\), where the function \(g(\cdot )\) has a positive derivative and \(c\le g(x)\le 1\) for some constant \(0<c<1\). We expand \(x_{i+l}-x_{i}\), \(x_{i+l}-x_{i-l}\), and \(x_{i}-x_{i-l}\) around the design point \(i/n\), so that

$$\begin{aligned} x_{i+l}-x_{i}\approx & {} g^{(1)}(i/n)l/n, \\ x_{i+l}-x_{i-l}\approx & {} 2g^{(1)}(i/n)l/n, \\ x_{i}-x_{i-l}\approx & {} g^{(1)}(i/n)l/n. \end{aligned}$$

Thus, in the sense of this linear approximation, we have the same result as in the equidistant design:

$$\begin{aligned} \tilde{s}_{l}= & {} \frac{1}{n-2l}\sum _{i=l+1}^{n-l}\frac{\{(x_{i+l}-x_{i}) Y_{i-l}-(x_{i+l}-x_{i-l})Y_{i}+(x_{i}-x_{i-l})Y_{i+l}\}^{2}}{(x_{i+l}-x_{i})^{2}+(x_{i+l}-x_{i-l})^{2}+(x_{i}-x_{i-l})^{2}} \\\approx & {} \frac{2}{3(n-2l)}\sum _{i=l+1}^{n-l}\left( \frac{1}{2}Y_{i-l}-Y_{i} +\frac{1}{2}Y_{i+l}\right) ^{2}. \end{aligned}$$
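A sketch of this design-weighted statistic is given below; for an equidistant design it reduces exactly to \(s_{l}\), and the helper name is again our own illustration.

```python
import numpy as np

def gasser_lag_noneq(x, y, l):
    """Lag-l Gasser-type statistic for a (possibly) non-equidistant design."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = y.size
    a = x[2 * l :] - x[l : n - l]        # x_{i+l} - x_i
    b = x[2 * l :] - x[: n - 2 * l]      # x_{i+l} - x_{i-l}
    c = x[l : n - l] - x[: n - 2 * l]    # x_i - x_{i-l}
    num = (a * y[: n - 2 * l] - b * y[l : n - l] + c * y[2 * l :]) ** 2
    return np.sum(num / (a ** 2 + b ** 2 + c ** 2)) / (n - 2 * l)
```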

Further, assume that the function \(g(\cdot )\) is twice continuously differentiable. We expand \(x_{i+l}-x_{i}\), \(x_{i+l}-x_{i-l}\), and \(x_{i}-x_{i-l}\) around the point \(i/n\), so that

$$\begin{aligned} x_{i+l}-x_{i}\approx & {} g^{(1)}(i/n)l/n+\frac{g^{(2)}(i/n)}{2}(l/n)^{2}, \\ x_{i+l}-x_{i-l}\approx & {} g^{(1)}(i/n)2l/n, \\ x_{i}-x_{i-l}\approx & {} g^{(1)}(i/n)l/n-\frac{g^{(2)}(i/n)}{2}(l/n)^{2}. \end{aligned}$$

Thus we have

$$\begin{aligned} \tilde{s}_{l}= & {} \frac{1}{n-2l}\sum _{i=l+1}^{n-l}\frac{\{(x_{i+l}-x_{i})Y_{i-l}-(x_{i+l}-x_{i-l})Y_{i}+(x_{i}-x_{i-l})Y_{i+l}\}^{2}}{(x_{i+l}-x_{i})^{2}+(x_{i+l}-x_{i-l})^{2}+(x_{i}-x_{i-l})^{2}} \\\approx & {} \frac{1}{n-2l}\sum _{i=l+1}^{n-l}\frac{\{g^{(1)}(i/n)(Y_{i-l}-2Y_{i}+Y_{i+l})(l/n) +g^{(2)}(i/n)(Y_{i-l}-Y_{i+l})(l/n)^{2}/2\}^{2}}{6[g^{(1)}(i/n)]^{2}(l/n)^{2}+[g^{(2)}(i/n)]^{2}(l/n)^{4}/2}\\\approx & {} \frac{2}{3(n-2l)}\sum _{i=l+1}^{n-l}\left( \frac{1}{2}Y_{i-l}-Y_{i}+\frac{1}{2}Y_{i+l}\right) ^{2} \\&+\frac{1}{6(n-2l)}\sum _{i=l+1}^{n-l}\frac{g^{(2)}(i/n)}{g^{(1)}(i/n)}(Y_{i-l}-2Y_{i}+Y_{i+l})(Y_{i-l}-Y_{i+l})(l/n). \end{aligned}$$

To eliminate the last term of the above equation, we need \(g^{(1)}(i/n)\) to be bounded away from 0 and \(g^{(2)}(i/n)\) to be close to 0. Otherwise the last term is non-negligible and our proposed method does not apply directly. In the particular case \(g(x)=x\), corresponding to the equidistant design, we have \(g^{(1)}(i/n)\equiv 1\) and \(g^{(2)}(i/n)\equiv 0\), which satisfies these conditions.

In addition, our proposed method concentrates on univariate x, while some differenced methods in the literature deal with multivariate x. Hall et al. (1991) generalized the idea of Hall et al. (1990) to the bivariate lattice design, and Munk et al. (2005) further proposed a differenced estimator for multivariate regression (with dimension \(d\le 4\)). Our method can be generalized to the bivariate lattice design as Hall et al. (1991) did, and the generalization may be more efficient in some particular cases. For example, when the surface has a similar variation trend in the horizontal direction, differencing in the vertical direction can eliminate the trend while retaining the error information.