Metrika, Volume 76, Issue 6, pp 733–764

Local block bootstrap inference for trending time series

Authors

  • Arif Dowla
    • Stochastic Logic Ltd.
  • Efstathios Paparoditis
    • Department of Mathematics and Statistics, University of Cyprus
  • Dimitris N. Politis
    • Department of Mathematics, University of California
Article

DOI: 10.1007/s00184-012-0413-9

Cite this article as:
Dowla, A., Paparoditis, E. & Politis, D.N. Metrika (2013) 76: 733. doi:10.1007/s00184-012-0413-9

Abstract

Resampling for stationary sequences has been well studied in the last couple of decades. In the paper at hand, we focus on nonstationary time series data where the nonstationarity is due to a slowly-changing deterministic trend. We show that the local block bootstrap methodology is appropriate for inference under this locally stationary setting without the need of detrending the data. We prove the asymptotic consistency of the local block bootstrap in the smooth trend model, and complement the theoretical results by a finite-sample simulation.

Keywords

Bootstrap · Dependent data · Kernel smoothing · Local stationarity · Regression

1 Introduction

Resampling for stationary sequences has been well studied in the last couple of decades; see the monograph of Lahiri (2003), or the recent review paper by Kreiss and Paparoditis (2011) and the references therein. Nevertheless, it is often unrealistic to assume that the stochastic structure of a time series stays invariant over a long stretch of time; a more realistic model might be to assume a slowly-changing stochastic structure.

As a convenient model for such a phenomenon, Dahlhaus (1996, 1997) defined the notion of a locally stationary process \(\{X_{t,n}, t=1,2,\ldots ,n\}\) by means of a time varying spectral representation of the form
$$\begin{aligned} X_{t,n}=\mu \left( \frac{t}{n}\right) +\int \limits _{-\pi }^{\pi }\exp (i\lambda t)A^{o}\left(\frac{t}{n}, \lambda \right) d\xi (\lambda )\quad \text{for }\ t=1,\ldots ,n, \end{aligned}$$
(1.1)
with deterministic trend \(\mu \) and transfer function \(A^{o}\); as usual, \(\xi (\lambda )\) is an orthogonal increment stochastic process on \([-\pi ,\pi ]\). The functions \(\mu (u)\) and \(A^{o}(u,\lambda )\) are assumed to be continuous in \(u\). The process \(\{X_{t,n}\}\) carries a double index because, for technical reasons that will become apparent in the sequel, the range of the time index is mapped onto the unit interval.

To address this locally stationary setting, we propose to use the local block bootstrap (LBB), which is a modification of the block bootstrap of Künsch (1989) and Liu and Singh (1992). The idea of the local block bootstrap was introduced by Paparoditis and Politis (2002), and Dowla et al. (2003). The crux of the LBB methodology is that a bootstrap series is formed by resampling blocks as in the block bootstrap method, but with the constraint that to fill a position \(x\) (say) in the bootstrap series only blocks that are ‘near’ \(x\) in the original series are considered as possible/probable candidates; the local block bootstrap will be rigorously defined shortly.

Although the local block bootstrap is a very general procedure, its asymptotic validity must, as usual, be demonstrated on a case-by-case basis. For this reason, this paper will focus on a particularly interesting example of a series exemplifying local (but not global) stationarity, namely the situation of a stationary series with an additive slowly varying trend which is to be estimated in a nonparametric fashion. Thus, consider the model:
$$\begin{aligned} Y_{t}=s(t)+e_{t}\;\;\;\;\;t\in N \end{aligned}$$
(1.2)
where \(s(t)\) is a deterministic trend function and \(\{ e_{t}, t\in N\} \) is a mean zero, stationary noise process. Kernel regression is a popular nonparametric method for estimating deterministic trend functions. These estimates are typically of the form
$$\begin{aligned} \hat{s}(t)=\frac{1}{nh}\sum \limits _{i=1}^{n}K((t-t_{i})/h)Y_{i} \end{aligned}$$
(1.3)
where \((t_{i},Y_{i}),\ i=1,\ldots ,n,\) are pairs of observations from the model (1.2) and \(h\) is the bandwidth of the kernel. This is the Nadaraya-Watson kernel estimator (cf. Nadaraya 1964; Watson 1964). Other linear smoothers, like Gasser-Müller type kernel smoothing (Gasser and Müller 1979), would also apply to our procedures. The method of kernel smoothing with dependent errors has been previously considered by Altman (1990), Roussas (1990) and Hall and Hart (1990).
Of course, to bring model (1.2) into the locally stationary framework of Dahlhaus (1997), the range of the time index should be mapped onto the unit interval, e.g. by defining \(s_{n}(t)=m(t/n)\) and so forth. Note that model (1.2) has been previously considered in the literature; in particular, under the assumption that \(\{ e_{t}, t\in N\} \) is a linear AR(\(\infty \)) process, Bühlmann (1998) used a sieve bootstrap procedure to conduct inference about the trend function \(s(\cdot )\). The method uses a pilot estimate of the trend, \(\tilde{s}(t)\), which is obtained by oversmoothing the data with a kernel. The residuals obtained from this process, namely \(Y_{t}-\tilde{s}(t)\), are fitted with an autoregressive model of order \(p(n)\), where \(n\) is the size of the data set; in addition, \(p(n)\rightarrow \infty \) as \(n\rightarrow \infty \). The errors from the autoregressive model are resampled using the i.i.d. bootstrap of Efron (1979). A bootstrap noise process \(\{ e_{t}^{*}, t\in N\} \) is reconstructed, which is then used to form a pseudo time series,
$$\begin{aligned} Y_{t}^{*}=\tilde{s}(t)+e_{t}^{*}\;\;\;\;\;t\in N \end{aligned}$$
(1.4)
From this new series, a bootstrap replication of the kernel smoothed estimator is obtained. Repeating this procedure many times gives bootstrap estimates of the distribution of the kernel smoothed estimator.

One difficulty of the sieve bootstrap in this context is that a pilot estimator has to be constructed, and a smoothing parameter for the pilot has to be determined. An additional limitation is that the error process was assumed by Bühlmann (1998) to be an AR(\(\infty \)) time series, although this might have been just an artifact of his method of proof. It seems natural to seek a methodology for trend function inference under fewer restrictions on the error process; the local block bootstrap manages to achieve this goal by replacing linearity with a strong mixing assumption. Note also that the local block bootstrap method does not require the construction of pilot estimates, which may be a tricky issue in practice.

Our paper is organized as follows. In Sect. 2 the model assumptions and kernel estimator are discussed. In Sect. 3 we describe the bootstrap algorithm. In Sect. 4 the main results are presented along with their supporting lemmas. In Sect. 5 we give the simulation results. All proofs are given in the last section.

2 Kernel estimator of the trend function

We begin by recalling the definition of strong mixing; see Roussas et al. (1992) and the references therein. Let \(\{Z_{j}, j\in N\}\) be a strictly stationary sequence of random variables defined on some probability space \((\Omega ,\mathcal{A},P)\). Let \(\mathcal{F}_{m}^{n}\) be the \(\sigma \)-field induced in \(\Omega \) by the random variables \(\{ Z_{j},\ m\le j\le n\} \). The sequence \(\{Z_{j}, j\in N\}\) is strong mixing (also called \(\alpha \)-mixing) if \(\alpha _{Z}(n)\downarrow 0\) as \(n\rightarrow \infty \), where
$$\begin{aligned} \alpha _{Z}(n)=\underset{A\in \mathcal{F}_{1}^{k},\ B\in \mathcal{F}_{k+n}^{\infty } }{\sup }\left| P(A\cap B)-P(A)P(B)\right|. \end{aligned}$$
For our theorems we will require the following assumptions.
  1. A.1.

    \(\{ e_{t}, t\in N\} \) is a strictly stationary, strong mixing process with mean zero and autocovariance sequence \(c(k)=E(e_t e_{t+k})\).

     
  2. A.2.

    For some \(\delta >0\), \(E\left|e_{t}\right|^{6+\delta }<\infty \) and \(\sum \nolimits _{i=0}^{\infty }i^{2}\alpha _{e}(i)^{\delta /(6+\delta )} <\infty \), where \(\alpha _{e}\) is the mixing coefficient of \(\{ e_{t}, t\in N\} \). To define our nonparametric estimator we need a kernel which has the following properties.

     
  3. A.3.
    1. i)

      The kernel \(K\) is a symmetric probability density that is twice continuously differentiable.

       
    2. ii)

      \(\int u^{2}K(u)d(u)<\infty \)

       
    3. iii)

      \(\int K^{2}(u)d(u)<\infty \)

       
    4. iv)

      \(K\) is compactly supported on the interval \([-\frac{1}{2},\frac{1}{2}]\).

       
     
  4. A.4.

    \(h=h(n)\) is the bandwidth of the kernel with \(h\rightarrow 0\) and \(nh\rightarrow \infty \) as \(n\rightarrow \infty .\)

     
  5. A.5.

    The data \(Y_{1},\ldots \ \), \(Y_{n}\) are observations from the model:

     
$$\begin{aligned} Y_{t}=s_{n}(t)+e_{t}\;\;\;\;\;t\in N \end{aligned}$$
where \(s_{n}(t)\equiv m(t/n)\), for \(t=1,\ldots ,n\), and \(m:[0,1] \rightarrow R\) is a twice continuously differentiable function.
Note that the above model is a slight modification of Eq. (1.2) that allows us to map the entire trend function onto the unit interval. In this connection, we also let \(x_{i}\;=\frac{i}{n}.\) The estimator we consider is defined by
$$\begin{aligned} \hat{m}(x)=\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)Y_{i}\quad \text{for }\ x\in [h,1-h] \end{aligned}$$
(2.1)
which is the Nadaraya-Watson kernel estimator. The following theorem summarizes the properties of \(\hat{m}(x).\)

Theorem 1

Assume \(x\in (0,1)\). Under assumptions A.1.–A.5. and \(h=O(n^{-\frac{1}{5}})\) the following hold:

  1. i)

    \(\lim \limits _{n\rightarrow \infty }(nh)^{\frac{1}{2}}(E(\hat{m} (x))-m(x))=B_{as}(x).\)

     
  2. ii)

    \(\lim \limits _{n\rightarrow \infty }Var\big((nh)^{\frac{1}{2}}\hat{m}(x)\big)=\sigma _{as}^{2}.\)

     
  3. iii)

    \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\Longrightarrow N(0,\sigma _{as}^{2})\) as \(n\rightarrow \infty .\)

     
where \(B_{as}(x)=\frac{m^{\prime \prime }(x)}{2}\int u^{2} K(u)d(u)\) for \(h=\Omega (n^{-\frac{1}{5}})\), \(B_{as}(x)=0\) for \(h=o(n^{-\frac{1}{5}})\), and \(\sigma _{as}^{2}=2\pi f(0)\int K^{2} (u)d(u)\), where \(f(\omega )=\frac{1}{2\pi }\sum \nolimits _{k=-\infty }^{\infty }c(k)e^{ik\omega }\) is the spectral density function. The notation \(h=\Omega (n^{\nu })\) means that there exist positive constants \(c_{1}\) and \(c_{2}\) such that asymptotically \(c_{1}n^{\nu }\le h(n)\le c_{2}n^{\nu }\). Proofs of i) and ii) can be found in a similar setting in Altman (1990) and Hart (1991), but we give a proof pertaining to our particular setting at the end of the paper. The proof of iii), which establishes the asymptotic normality of \(\hat{m}(x)\), can be found in Roussas et al. (1992). The assumptions in their paper are straightforward consequences of assumptions A.2. and A.3.: the assumptions pertaining to kernel weights are fairly weak and are satisfied by most kernels, and the remaining assumption concerns the mixing conditions. We verify the applicability of their assumptions and conditions in the last section.
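
For concreteness, the following is a minimal sketch of the estimator (2.1) under the equally spaced design \(x_{i}=i/n\); the rescaled Epanechnikov kernel of Sect. 5 is used purely for illustration, and the function names are ours.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel rescaled to have support [-1/2, 1/2] (cf. Sect. 5)."""
    u = np.asarray(u, dtype=float)
    return np.where(np.abs(u) <= 0.5, 6.0 * (0.25 - u ** 2), 0.0)

def nw_estimate(x, Y, h, kernel=epanechnikov):
    """Nadaraya-Watson type estimate (2.1) of m(x) on the design x_i = i/n."""
    n = len(Y)
    xi = np.arange(1, n + 1) / n        # design points x_i = i/n
    w = kernel((x - xi) / h)            # kernel weights K((x - x_i)/h)
    return w.dot(Y) / (n * h)           # (1/(nh)) * sum_i K((x - x_i)/h) Y_i
```

For \(x\in [h,1-h]\) no boundary correction is needed, and since \(\frac{1}{nh}\sum _{i}K((x-x_{i})/h)\approx 1\) the weights are not renormalized, exactly as in (2.1).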

3 Local block bootstrap algorithm

The goal is to generate a pseudo time series \(Y_{1}^{*},\ldots , Y_{n}^{*}\) by concatenating blocks of size \(b\) which are resampled from the original series by a probability mechanism. The probability mechanism selects each block of size \(b\) from a set of blocks of the original series whose time indices are close to those of the block being filled. This method of choosing blocks ‘locally’ with respect to the time index is the main idea behind the local block bootstrap (LBB) of Paparoditis and Politis (2002), and Dowla et al. (2003).

We precisely describe the LBB algorithm by the following steps.
  1. 1

    Select a number \(B\in (0,1)\) such that \(nB\) is a positive integer.

     
  2. 2

    Let \(k_{0},k_{1},\ldots , k_{q}\) (with \(q=\left\lfloor n/b\right\rfloor -1\)) be i.i.d. integers selected from a discrete probability distribution which assigns the probability \(w(k)=1/(2nB+1)\) to the value \(k\), where \(-nB \le k\le nB\).

     
Note that the LBB can work under various choices for the probability weights \(w(k)\); the uniform weights are chosen for simplicity, and also because the original block bootstrap algorithm for stationary sequences uses uniform weights.
  1. 3

    Define the bootstrap pseudo series \(Y^*_1,\ldots , Y^*_n\) by \(Y_{j+ib}^{*} =Y _{j+ib+k_{i}}\) for \(j=1,\ldots ,b\) and \(i=0,\ldots ,\left\lfloor n/b\right\rfloor -1\), where the \(k_{i}\) are as given in Step 2.

     
If for some \(i\) and \(j, \ j+ib+k_{i}\) is outside the range of integers 1 to \(n\), then an adjustment is needed; a simple adjustment is to take \(k_{i}\) equal to \(-k_{i}.\) Observe that this will allow us to define the beginning and end of the pseudo series by choosing blocks from the right when at the beginning of the series and blocks from the left when at the right end of the series.
  1. 4
    Based on the bootstrap sample, define the bootstrap estimator \(\hat{m}^{*}(x)\) as
    $$\begin{aligned} \hat{m}^{*}(x)=\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)Y_{i}^{*} \end{aligned}$$
    (3.1)
    We observe that the \(k_{i}\) allow us to replace a designated block of b observations by another block of the same size but shifted by \(k_{i} \) indices. The range of \(k\), i.e., \(-nB\le k\le nB,\) denotes the size of the window from which a block of size b can be selected.
     
Let \(b\) denote the size of the blocks, \(nB\) denote the approximate number of observations in the bootstrap window, and \(nh\) denote the approximate number of observations in the kernel smoothing window. The values \(B\) and \(h\) correspond to the bootstrap window size and kernel smoothing bandwidth parameter respectively. Both depend on \(n\), and can be written as \(B(n)\) and \(h(n),\) but we write them here without the argument for a concise notation.
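
The resampling step (Steps 1–3) is straightforward to implement; a minimal sketch follows (our function name), assuming for simplicity that \(b\) divides \(n\) and using the boundary adjustment \(k_{i}\mapsto -k_{i}\) described above.

```python
import numpy as np

def lbb_resample(Y, b, nB, rng):
    """One local block bootstrap pseudo series Y*_1, ..., Y*_n (Steps 1-3)."""
    n = len(Y)
    nblocks = n // b                            # number of blocks, floor(n/b)
    Ystar = np.empty(nblocks * b)
    # Step 2: i.i.d. shifts k_0, ..., k_{floor(n/b)-1}, uniform on {-nB, ..., nB}
    k = rng.integers(-nB, nB + 1, size=nblocks)
    for i in range(nblocks):
        for j in range(1, b + 1):               # Step 3: Y*_{j+ib} = Y_{j+ib+k_i}
            idx = j + i * b + k[i]
            if idx < 1 or idx > n:              # out of range: replace k_i by -k_i
                idx = j + i * b - k[i]
            Ystar[i * b + j - 1] = Y[idx - 1]   # shift to 0-based Python indexing
    return Ystar
```

The bootstrap estimator (3.1) is then obtained by applying the kernel smoother of Sect. 2 to the pseudo series \(Y_{1}^{*},\ldots ,Y_{n}^{*}\).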

Observe that the local block bootstrap procedure has to capture the stochastic structure of the error process locally, allow for the asymptotic normality of the estimator and ensure that bias is negligible. This would require the rates of the block size \(b\), the bootstrap window size \(nB\) and the kernel window size \(nh\) to be appropriately chosen so that all the conditions are satisfied simultaneously. Fortunately, we can satisfy all our requirements with four easily satisfiable constraints. We state them as our last assumptions and motivate them subsequently.

Let \(b\), \(B\), and \(h\) be \(\Omega (n^{\delta _{1}})\), \(O(n^{-\delta _{2}})\) and \(O(n^{-\delta _{3}})\), respectively, where \(\delta _{1},\delta _{2},\delta _{3} \in (0,1).\)
  1. A.6.
    1. (i)

      \(\frac{b^{\frac{5}{2}}}{nh}\rightarrow 0\;\;[ \text{ i.e.} \ \ n^{\frac{5}{2} \delta _{1}}=o(n^{(1-\delta _{3})})]\).

       
    2. (ii)

      \(nB=o(n^{\frac{2}{3}})\;[n^{\frac{1}{3}} =o(n^{\delta _{2}})].\)

       
    For simplicity of notation we want \(nB\) and \(nh\) to be positive integers, and we can pick a sequence of numbers which satisfy the order conditions given.
     
  2. A.7.

    \(nh =\ o(n^{\frac{4}{5}})\) [\( \text{ i.e.} \ \ n^{\frac{1}{5}}=o(n^{\delta _{3}})\), or equivalently, \(h=o(n^{-\frac{1}{5}}).]\)

     
A.6. (i) is needed to ensure that the blocks capture the stochastic structure locally without propagating the errors due to nonstationarity of the entire time series. A.6. (i) is also necessary to establish the asymptotic normality of the bootstrap estimator. A.6. (ii) is used to establish an appropriate rate for the convergence of the bootstrap covariance to the covariance of the original time series. A.7. allows us to construct confidence intervals for m(x) using the bootstrap estimator without having to explicitly estimate \(\left|E(\hat{m}(x))-m(x)\right|\) via undersmoothing (see Theorem 1). The good news is that these conditions are not that restrictive, and will be illustrated in our simulations. Most importantly, these assumptions are what one would naturally expect from the nature of the problem the Local Block Bootstrap (LBB) algorithm is intended to solve.
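
As an illustration only, the following sketch turns these order conditions into one concrete admissible choice of tuning parameters; the default exponents are our own, borrowed from the middle setting of the simulation study in Sect. 5 (\(nh=n^{0.70}\), \(nB=n^{0.46}\), \(b=n^{0.19}\)), and are not a general recommendation.

```python
def lbb_tuning(n, d1=0.19, d2=0.54, d3=0.30):
    """One admissible choice of b ~ n^d1, nB ~ n^(1-d2), nh ~ n^(1-d3)
    satisfying A.6-A.7: (5/2) d1 < 1 - d3, 1 - d2 < 2/3, 1 - d3 < 4/5."""
    assert 2.5 * d1 < 1 - d3 and 1 - d2 < 2 / 3 and 1 - d3 < 4 / 5
    b = max(1, round(n ** d1))           # block size
    nB = max(b, round(n ** (1 - d2)))    # half-width of the local resampling window
    nh = round(n ** (1 - d3))            # number of observations in the kernel window
    return b, nB, nh, nh / n             # the last entry is the bandwidth h
```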

4 Main results

Note that we can write
$$\begin{aligned} \hat{m}(x)=\frac{1}{nh}\sum \limits _{i=\left\lceil -nh/2+xn\right\rceil }^{\left\lfloor nh/2+xn\right\rfloor }K((x-x_{i})/h)Y_{i} \end{aligned}$$
(4.1)
because \(-\frac{1}{2}\le (x-x_{i})/h\le \frac{1}{2}\) implies \(\left\lceil -nh/2+xn\right\rceil \le i\le \left\lfloor nh/2+xn\right\rfloor \) due to the compact support of \(K\).

The local block bootstrap relies on the smoothness of \(m(x)\). The following results allow us to establish the consistency of the bootstrap estimator \(\hat{m}^{*}(x)\) and provide information about its rate of convergence to \(\hat{m}(x)\) in terms of the size of the bootstrap window. Let us consider the expected value of \(\hat{m}^{*}(x)\).

Theorem 2

Under assumptions A.1.–A.5., for any \(x\in (0,1)\),
$$\begin{aligned} \left| E^{*}(\hat{m}^{*}(x))-E(\hat{m}(x))\right| =O_{p} ((nB)^{-\frac{1}{2}})+O(B)=O_{p}(n^{\max ((\delta _{2}-1)/2,\,-\delta _{2})}). \end{aligned}$$
To see that the local block bootstrap captures the variance of \(\hat{m}(x),\) we do the following computation. For a fixed \(n\), the explicit expression for the variance of \(\hat{m}(x)\) is
$$\begin{aligned} Var(\hat{m}(x))&= E\left( \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i} )/h)Y_{i}\;-E\left( \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i} )/h)Y_{i}\right) \right) ^{2}\\&= \frac{1}{n^{2}h^{2}}\sum \limits _{i,j=1}^{n}K((x-x_{i})/h)K((x-x_{j} )/h)E(e_{i}e_{j})\\&= \frac{1}{n^{2}h^{2}}\sum \limits _{i,j=1}^{n}K((x-x_{i})/h)K((x-x_{j} )/h)c(i-j) \end{aligned}$$
where \(c(i-j)\) is the covariance of errors at lag \(i-j\). We write \(Var(\hat{m}(x))\) explicitly to compare it with the expression for \(Var^{*}(\hat{m}^{*}(x))\). We already established in Theorem 1 that \(\sigma _{as}^{2}\equiv \)\(2\pi f(0)\int K^{2}(u)d(u)\) is the limit of \(nhVar(\hat{m}(x))\) as \(n\rightarrow \infty \). Let us consider the variance of the bootstrap estimate. Using (3.1),
$$\begin{aligned} Var^{*}(\hat{m}^{*}(x))&= E^{*}\left( \frac{1}{nh}\sum \limits _{i=1} ^{n}K((x-x_{i})/h)(Y_{i}^{*}-E^{*}(Y_{i}^{*}))\right) ^{2}\\&= \frac{1}{n^{2}h^{2}}\sum \limits _{i,j=1}^{n}K((x-x_{i})/h)K((x-x_{j} )/h)c^{*}(i,j) \end{aligned}$$
where \(c^{*}(i,j)\) is defined to be
$$\begin{aligned} c^{*}(i,j)=E^{*}\left( (Y_{i}^{*}-E^{*}(Y_{i}^{*} ))(Y_{j}^{*}-E^{*}(Y_{j}^{*}))\right). \end{aligned}$$
Note that the variances of both \(\hat{m}(x)\) and \(\hat{m}^{*}(x)\) are given by the same formula, but with \(c^{*}(i,j)\) instead of \(c(i-j)\) in the bootstrap variance expression. We need to show that \(c^{*}(i,j) \approx c(i-j)\). We can calculate \(c^{*}(i,j)\) exactly according to our bootstrap mechanism. First, define the point \(k\) to be an end point if either \(1\le k \le nB\) or \(n-nB \le k \le n\). When \(\left\lceil \frac{i}{b}\right\rceil -\left\lceil \frac{j}{b}\right\rceil =0\), that is to say \(i\) and \(j\) are in the same bootstrap block, and if neither \(i\) nor \(j\) is an end point, then we have
$$\begin{aligned} c^{*}(i,j)&= E^{*}\left( (Y_{i}^{*}-E^{*}(Y_{i}^{*} ))(Y_{j}^{*}-E^{*}(Y_{j}^{*}))\right)\\&= \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( Y_{i+r}-\frac{1}{2nB+1} \sum \limits _{l=-nB}^{nB}Y_{i+l}\right) \left( Y_{j+r}-\frac{1}{2nB+1} \sum \limits _{l=-nB}^{nB}Y_{j+l}\right) \end{aligned}$$
which can be rewritten as
$$\begin{aligned} \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}Y_{i+r}Y_{j+r}-\left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}Y_{i+r}\right) \left( \frac{1}{2nB+1} \sum \limits _{r=-nB}^{nB}Y_{j+r}\right) \end{aligned}$$
and the following lemma may be proven.

Lemma 1

\(c^{*}(i,j)=c(i-j)+O_{p}((nB)^{-\frac{1}{2}} )\) when \(\left\lceil \frac{i}{b}\right\rceil -\left\lceil \frac{j}{b}\right\rceil =0\) for any fixed \(i\) and \(j\) that are not end points.

This result establishes that the pseudo time series under the bootstrap mechanism has the same asymptotic covariance as the original series. It takes into account the idea of the local block bootstrap. We see that bootstrapping the \(Y\) values in a small neighborhood is really equivalent to bootstrapping the errors when \(m(x)\) is smooth. This is because \(m(x)\) is ‘almost’ constant in a small neighborhood. Thus, if \(i\) and \(j\) are not end points, we have the following two cases:
$$\begin{aligned}&c^{*}(i,j)=c(i-j)+\;O_{p}((nB)^{-\frac{1}{2}}),\quad \text{ if} \;\left\lceil \frac{i}{b}\right\rceil -\left\lceil \frac{j}{b}\right\rceil \;=0\end{aligned}$$
(4.2)
$$\begin{aligned}&c^{*}(i,j)=0,\quad \text{ if} \;\left\lceil \frac{i}{b}\right\rceil -\left\lceil \frac{j}{b}\right\rceil \;\ne \;0 \end{aligned}$$
(4.3)
Equation (4.3) holds because the bootstrap probability mechanism chooses the blocks independently. Therefore, when \(i\) and \(j \) belong to different bootstrap blocks, we have \(c^{*}(i,j)=0\).
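
The closed form derived above translates directly into code; the following sketch (our function name) computes \(c^{*}(i,j)\) for \(i,j\) in the same block and away from the end points, with 1-based indices as in the text.

```python
import numpy as np

def c_star(Y, i, j, nB):
    """Bootstrap covariance c*(i, j) of Y*_i and Y*_j when i and j lie in the
    same bootstrap block and neither is an end point (1-based i, j)."""
    r = np.arange(-nB, nB + 1)
    Yi = Y[i - 1 + r]                    # Y_{i+r}, r = -nB, ..., nB
    Yj = Y[j - 1 + r]
    return Yi.dot(Yj) / (2 * nB + 1) - Yi.mean() * Yj.mean()
```

By Lemma 1, this quantity differs from \(c(i-j)\) only by a term of order \(O_{p}((nB)^{-1/2})\).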

Now we need to ensure that the constant in the order term in Lemma 1 does not depend on \(i\) and \(j \). This is the subject of the following lemma.

Lemma 2

Under assumptions A.1.–A.6., if \(i\) and \(j\) are not end points, then \(c^{*}(i,j)=c(i-j)+O_{p}((nB)^{-\frac{1}{2}})\), where the \(O_{p}((nB)^{-\frac{1}{2}})\) term does not depend on \(i\) and \(j\).

The following result establishes that the asymptotic variance of the bootstrap estimator is the same as that of the regular kernel estimator—see Theorem 1.

Theorem 3

Under assumptions A.1.–A.6., for any \(x\in (0,1)\),
$$\begin{aligned} \hat{\sigma }_{n}^{*2}\equiv Var^{*}\big((nh)^{\frac{1}{2}}\hat{m}^{*}(x)\big)\rightarrow _{p}\sigma _{as}^{2}\quad \text{as }\ n\rightarrow \infty \end{aligned}$$

Theorem 3 gives us a method to estimate computationally the variance of the regular kernel estimator, by generating many bootstrap samples and computing the variance of \(\hat{m}^{*}(x)\); this would otherwise be a difficult undertaking. We can form confidence intervals for \(m(x)\) using asymptotic normality with variance estimated by the LBB (i.e. a combination of Theorems 1 and 3) by writing a \((1-\alpha )100\,\%\) level confidence interval as \(\hat{m}(x)\pm \hat{\sigma }_{as}^{*}(nh)^{-\frac{1}{2}}z_{\alpha /2}\). Here \(\hat{\sigma }_{as}^{*2}\) is the variance of \((nh)^{\frac{1}{2}}\hat{m}^{*}(x)\) computed from bootstrap resamples.
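
A minimal sketch of this normal-approximation interval, reusing the hypothetical helpers nw_estimate and lbb_resample from the earlier sketches (and, as there, assuming that \(b\) divides \(n\)):

```python
import numpy as np
from statistics import NormalDist

def lbb_normal_ci(Y, x, h, b, nB, alpha=0.05, n_boot=400, rng=None):
    """(1-alpha) interval m_hat(x) +/- z_{alpha/2} * sigma*_as * (nh)^(-1/2),
    where sigma*_as^2 is the bootstrap variance of (nh)^(1/2) m*_hat(x)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(Y)
    mhat = nw_estimate(x, Y, h)
    boot = np.array([nw_estimate(x, lbb_resample(Y, b, nB, rng), h)
                     for _ in range(n_boot)])
    sigma_star = np.sqrt(n * h * boot.var())     # estimate of sigma_as
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sigma_star / np.sqrt(n * h)
    return mhat - half, mhat + half
```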

Theorem 4

Under assumptions A.1.–A.6., for any \(x\in (0,1)\), we have
$$\begin{aligned} \sup _{u}\left| P^{*}\left((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\le u\right) -P\left((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\le u\right) \right|\rightarrow _{p}0. \end{aligned}$$
If in addition assumption A.7. holds, then we also have
$$\begin{aligned} \sup _{u}\left| P^{*}\left((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\le u\right) -P\left((nh)^{\frac{1}{2}}(\hat{m}(x)-m(x))\le u\right) \right|\rightarrow _{p}0. \end{aligned}$$
Theorem 4 allows us to form a bootstrap confidence interval for \(E\hat{m}(x)\) without appealing to the normal distribution. Note, however, that Theorem 4 gives confidence intervals for \(m(x)\) itself only if the bias \(E\hat{m}(x)-m(x)\) is negligible (i.e. if \(h\) is smaller than the optimal order of magnitude, as dictated by assumption A.7.). Thus, to obtain a confidence interval for \(m(x)\), undersmoothing is required if one wants to avoid an explicit bias correction.

A multivariate extension is given below.

Theorem 5

Under assumptions A.1.–A.6., consider fixed points \(a_{i},\ i=1,\ldots ,d,\) such that \(0<a_{1}<a_{2}<\cdots <a_{d}<1\). We have
$$\begin{aligned} \sup _{u}\Big| P^{*}\left(\left((nh)^{\frac{1}{2}}(\hat{m}^{*}(a_{1})-E^{*}\hat{m}^{*}(a_{1})),\ldots ,(nh)^{\frac{1}{2}}(\hat{m}^{*}(a_{d})-E^{*}\hat{m}^{*}(a_{d}))\right)\le u\right)&\\ -P\left(\left((nh)^{\frac{1}{2}}(\hat{m}(a_{1})-E\hat{m}(a_{1})),\ldots ,(nh)^{\frac{1}{2}}(\hat{m}(a_{d})-E\hat{m}(a_{d}))\right)\le u\right)\Big|&\rightarrow _{p}0 \end{aligned}$$
where \(u=(u_{1},u_{2},\ldots ,u_{d})\) is a \(d\) dimensional vector and the inequality holds coordinatewise. If in addition assumption A.7. holds, then we also have
$$\begin{aligned} \sup _{u}\Big| P^{*}\left(\left((nh)^{\frac{1}{2}}(\hat{m}^{*}(a_{1})-E^{*}\hat{m}^{*}(a_{1})),\ldots ,(nh)^{\frac{1}{2}}(\hat{m}^{*}(a_{d})-E^{*}\hat{m}^{*}(a_{d}))\right)\le u\right)&\\ -P\left(\left((nh)^{\frac{1}{2}}(\hat{m}(a_{1})-m(a_{1})),\ldots ,(nh)^{\frac{1}{2}}(\hat{m}(a_{d})-m(a_{d}))\right)\le u\right)\Big|&\rightarrow _{p}0. \end{aligned}$$

Theorem 5 is proven using the Cramér-Wold device and the asymptotic independence of \(\hat{m}^{*}(a_{i})\) and \(\hat{m}^{*}(a_{j})\) for \(i\ne j\). This result allows us to construct simultaneous confidence intervals for the underlying trend function.

5 Simulations

We generate time series that satisfy our model assumptions in order to test our LBB procedure empirically. We consider the nonlinear time series
$$\begin{aligned} Y_{t}=\;20e^{-(t/n-.5)^{2}}+e_{t} \end{aligned}$$
where {e\(_{t}\}\) satisfy the nonlinear autoregression
$$\begin{aligned} e_{t}=\sin (e_{t-1})+\epsilon _{t} \end{aligned}$$
and \(\epsilon _{t}\) is i.i.d. N(0,1). The Epanechnikov kernel is used to compute \(\hat{m}(x)\). The kernel is defined as
$$\begin{aligned} K(x)=6\left(\frac{1}{2}-x\right)\left(\frac{1}{2}+x\right)\quad \text{for }\ -\frac{1}{2} \le x \le \frac{1}{2},\ \text{and } 0 \text{ otherwise.} \end{aligned}$$
We apply the local block bootstrap (LBB) algorithm to develop confidence intervals for \(m(x)\). In Tables 1, 2 and 3 we report the bootstrap coverage of 95 % confidence intervals for \(m(x)\) at \(x=0.1\), \(x=0.3\) and \(x=0.5\), respectively. We consider these values of \(x\) in order to study the effect of the slope and curvature of \(m(x)\) on the LBB-based inference. We also vary the sizes of the kernel and bootstrap windows as well as the block sizes. We conduct our simulation on data sizes \(n=1{,}000\) and \(n=10{,}000\).
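
A minimal sketch of this data-generating process (our function name; the error recursion is started from a pre-sample value \(e_{0}=0\) without a burn-in period, which is our simplification):

```python
import numpy as np

def simulate_series(n, rng):
    """One series Y_t = 20 exp(-(t/n - .5)^2) + e_t,
    with e_t = sin(e_{t-1}) + eps_t and eps_t i.i.d. N(0, 1)."""
    eps = rng.standard_normal(n)
    e = np.empty(n)
    e[0] = eps[0]                          # pre-sample value e_0 = 0, so e_1 = eps_1
    for t in range(1, n):
        e[t] = np.sin(e[t - 1]) + eps[t]
    u = np.arange(1, n + 1) / n            # rescaled time t/n
    m = 20.0 * np.exp(-(u - 0.5) ** 2)     # trend m(t/n)
    return m + e, m
```

For example, `Y, m = simulate_series(1000, np.random.default_rng(0))` generates one series of the smaller size used below.
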
Table 1

Coverage of 95 % confidence intervals of \(m(.1)\) (\(m^{(1)}(.1) = 13.5\) and \(m^{(2)}(.1) = -23.2\)). Entries give the % coverage by the pivotal method, with the normal-approximation coverage in parentheses

| Kernel window size \(1-\delta _{3}\) | Bootstrap window size \(1-\delta _{2}\) | Block size \(\delta _{1}\) | \(n=10^{3}\) | \(n=10^{4}\) |
| --- | --- | --- | --- | --- |
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 80.0 (84.5) | 87.0 (87.7) |
| .65 (89) (398) | .44 (20) (57) | .19 (4) (6) | 82.0 (86.0) | 89.7 (91.0) |
| .65 (89) (398) | .44 (20) (57) | .22 (5) (8) | 79.0 (84.5) | 88.7 (90.5) |
| .65 (89) (398) | .46 (23) (69) | .16 (3) (4) | 80.0 (85.5) | 86.8 (89.7) |
| .65 (89) (398) | .46 (23) (69) | .19 (4) (6) | 81.2 (86.3) | 85.5 (87.7) |
| .65 (89) (398) | .46 (23) (69) | .22 (5) (8) | 82.5 (86.3) | 89.7 (91.0) |
| .65 (89) (398) | .48 (27) (83) | .16 (3) (4) | 76.0 (82.5) | 84.8 (88.2) |
| .65 (89) (398) | .48 (27) (83) | .19 (4) (6) | 78.3 (85.8) | 89.0 (91.8) |
| .65 (89) (398) | .48 (27) (83) | .22 (5) (8) | 80.0 (84.8) | 90.0 (92.0) |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 79.3 (81.7) | 85.3 (85.0) |
| .70 (125) (630) | .44 (20) (57) | .19 (4) (6) | 81.7 (84.8) | 89.2 (89.7) |
| .70 (125) (630) | .44 (20) (57) | .22 (5) (8) | 83.7 (86.8) | 90.5 (91.5) |
| .70 (125) (630) | .46 (23) (69) | .16 (3) (4) | 82.2 (86.0) | 86.0 (87.0) |
| .70 (125) (630) | .46 (23) (69) | .19 (4) (6) | 84.5 (87.0) | 89.7 (91.3) |
| .70 (125) (630) | .46 (23) (69) | .22 (5) (8) | 83.0 (85.8) | 89.7 (90.5) |
| .70 (125) (630) | .48 (27) (83) | .16 (3) (4) | 83.0 (86.3) | 84.8 (86.8) |
| .70 (125) (630) | .48 (27) (83) | .19 (4) (6) | 84.0 (86.8) | 90.0 (90.0) |
| .70 (125) (630) | .48 (27) (83) | .22 (5) (8) | 84.0 (88.7) | 90.2 (92.0) |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 83.5 (84.8) | 87.3 (87.5) |
| .75 (177) (1,000) | .44 (20) (57) | .19 (4) (6) | 84.0 (86.0) | 85.8 (86.5) |
| .75 (177) (1,000) | .44 (20) (57) | .22 (5) (8) | 87.3 (88.5) | 89.7 (89.5) |
| .75 (177) (1,000) | .46 (23) (69) | .16 (3) (4) | 88.7 (88.5) | 89.7 (89.5) |
| .75 (177) (1,000) | .46 (23) (69) | .19 (4) (6) | 90.0 (89.7) | 85.5 (86.8) |
| .75 (177) (1,000) | .46 (23) (69) | .22 (5) (8) | 89.5 (91.5) | 93.3 (93.3) |
| .75 (177) (1,000) | .48 (27) (83) | .16 (3) (4) | 88.0 (89.5) | 87.5 (89.0) |
| .75 (177) (1,000) | .48 (27) (83) | .19 (4) (6) | 86.8 (90.0) | 88.7 (88.5) |
| .75 (177) (1,000) | .48 (27) (83) | .22 (5) (8) | 87.0 (87.5) | 92.3 (92.0) |

The numbers in parentheses in the first three columns are the actual number of observations for the two different data sizes

Table 2

Coverage of 95 % confidence intervals of \(m(.3)\) (\(m^{(1)}(.3) = 7.7\) and \(m^{(2)}(.3) = -35.4\)). Entries give the % coverage by the pivotal method, with the normal-approximation coverage in parentheses

| Kernel window size \(1-\delta _{3}\) | Bootstrap window size \(1-\delta _{2}\) | Block size \(\delta _{1}\) | \(n=10^{3}\) | \(n=10^{4}\) |
| --- | --- | --- | --- | --- |
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 76.7 (81.5) | 84.5 (85.8) |
| .65 (89) (398) | .44 (20) (57) | .19 (4) (6) | 81.2 (85.8) | 88.7 (90.2) |
| .65 (89) (398) | .44 (20) (57) | .22 (5) (8) | 84.0 (87.3) | 86.3 (86.8) |
| .65 (89) (398) | .46 (23) (69) | .16 (3) (4) | 73.8 (79.8) | 86.0 (87.7) |
| .65 (89) (398) | .46 (23) (69) | .19 (4) (6) | 79.3 (84.5) | 87.5 (88.2) |
| .65 (89) (398) | .46 (23) (69) | .22 (5) (8) | 83.5 (86.8) | 89.5 (92.3) |
| .65 (89) (398) | .48 (27) (83) | .16 (3) (4) | 74.8 (82.7) | 84.8 (88.5) |
| .65 (89) (398) | .48 (27) (83) | .19 (4) (6) | 79.0 (83.7) | 85.5 (88.2) |
| .65 (89) (398) | .48 (27) (83) | .22 (5) (8) | 82.0 (89.0) | 87.0 (90.8) |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 82.0 (83.7) | 86.0 (86.3) |
| .70 (125) (630) | .44 (20) (57) | .19 (4) (6) | 83.5 (86.3) | 85.8 (86.0) |
| .70 (125) (630) | .44 (20) (57) | .22 (5) (8) | 84.5 (86.3) | 91.8 (92.8) |
| .70 (125) (630) | .46 (23) (69) | .16 (3) (4) | 83.5 (87.0) | 87.7 (87.7) |
| .70 (125) (630) | .46 (23) (69) | .19 (4) (6) | 82.7 (85.5) | 88.7 (89.7) |
| .70 (125) (630) | .46 (23) (69) | .22 (5) (8) | 81.7 (85.3) | 91.8 (93.0) |
| .70 (125) (630) | .48 (27) (83) | .16 (3) (4) | 81.0 (84.2) | 88.7 (90.2) |
| .70 (125) (630) | .48 (27) (83) | .19 (4) (6) | 84.5 (86.0) | 88.7 (91.5) |
| .70 (125) (630) | .48 (27) (83) | .22 (5) (8) | 83.0 (87.7) | 88.7 (90.5) |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 79.5 (81.7) | 88.2 (87.7) |
| .75 (177) (1,000) | .44 (20) (57) | .19 (4) (6) | 84.2 (86.0) | 89.7 (90.0) |
| .75 (177) (1,000) | .44 (20) (57) | .22 (5) (8) | 87.3 (88.2) | 90.2 (91.8) |
| .75 (177) (1,000) | .46 (23) (69) | .16 (3) (4) | 84.5 (85.0) | 87.7 (88.2) |
| .75 (177) (1,000) | .46 (23) (69) | .19 (4) (6) | 82.5 (84.2) | 86.8 (88.0) |
| .75 (177) (1,000) | .46 (23) (69) | .22 (5) (8) | 82.0 (84.0) | 88.2 (89.2) |
| .75 (177) (1,000) | .48 (27) (83) | .16 (3) (4) | 79.5 (80.8) | 84.5 (85.3) |
| .75 (177) (1,000) | .48 (27) (83) | .19 (4) (6) | 84.0 (86.3) | 90.2 (90.8) |
| .75 (177) (1,000) | .48 (27) (83) | .22 (5) (8) | 87.5 (90.2) | 87.3 (88.2) |

The numbers in parentheses in the first three columns are the actual number of observations for the two different data sizes

Table 3

Coverage of 95 % confidence intervals of \(m(.5)\) (\(m^{(1)}(.5) = 0\) and \(m^{(2)}(.5) = -40\)). Entries give the % coverage by the pivotal method, with the normal-approximation coverage in parentheses

| Kernel window size \(1-\delta _{3}\) | Bootstrap window size \(1-\delta _{2}\) | Block size \(\delta _{1}\) | \(n=10^{3}\) | \(n=10^{4}\) |
| --- | --- | --- | --- | --- |
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 79.3 (83.7) | 85.3 (86.3) |
| .65 (89) (398) | .44 (20) (57) | .19 (4) (6) | 80.0 (84.8) | 86.5 (89.2) |
| .65 (89) (398) | .44 (20) (57) | .22 (5) (8) | 81.7 (87.5) | 89.2 (91.3) |
| .65 (89) (398) | .46 (23) (69) | .16 (3) (4) | 80.0 (86.8) | 84.8 (87.3) |
| .65 (89) (398) | .46 (23) (69) | .19 (4) (6) | 83.0 (87.0) | 89.2 (91.5) |
| .65 (89) (398) | .46 (23) (69) | .22 (5) (8) | 80.3 (86.3) | 88.2 (90.5) |
| .65 (89) (398) | .48 (27) (83) | .16 (3) (4) | 74.8 (83.2) | 88.0 (90.2) |
| .65 (89) (398) | .48 (27) (83) | .19 (4) (6) | 80.3 (88.0) | 85.5 (88.7) |
| .65 (89) (398) | .48 (27) (83) | .22 (5) (8) | 82.7 (87.0) | 86.0 (90.5) |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 83.7 (85.0) | 83.5 (84.5) |
| .70 (125) (630) | .44 (20) (57) | .19 (4) (6) | 80.8 (82.2) | 90.8 (92.8) |
| .70 (125) (630) | .44 (20) (57) | .22 (5) (8) | 82.2 (84.5) | 92.3 (93.0) |
| .70 (125) (630) | .46 (23) (69) | .16 (3) (4) | 79.3 (83.7) | 84.2 (85.0) |
| .70 (125) (630) | .46 (23) (69) | .19 (4) (6) | 81.5 (84.2) | 89.2 (89.2) |
| .70 (125) (630) | .46 (23) (69) | .22 (5) (8) | 83.0 (85.0) | 90.5 (92.0) |
| .70 (125) (630) | .48 (27) (83) | .16 (3) (4) | 81.2 (84.0) | 87.0 (90.0) |
| .70 (125) (630) | .48 (27) (83) | .19 (4) (6) | 84.5 (87.0) | 86.5 (88.0) |
| .70 (125) (630) | .48 (27) (83) | .22 (5) (8) | 82.0 (85.3) | 88.7 (90.0) |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 84.2 (84.8) | 84.5 (85.0) |
| .75 (177) (1,000) | .44 (20) (57) | .19 (4) (6) | 77.0 (79.5) | 88.2 (89.5) |
| .75 (177) (1,000) | .44 (20) (57) | .22 (5) (8) | 85.5 (86.5) | 90.2 (90.0) |
| .75 (177) (1,000) | .46 (23) (69) | .16 (3) (4) | 81.5 (82.0) | 87.7 (88.2) |
| .75 (177) (1,000) | .46 (23) (69) | .19 (4) (6) | 87.7 (88.5) | 88.5 (89.5) |
| .75 (177) (1,000) | .46 (23) (69) | .22 (5) (8) | 85.0 (85.3) | 89.7 (91.0) |
| .75 (177) (1,000) | .48 (27) (83) | .16 (3) (4) | 79.3 (81.7) | 88.2 (88.5) |
| .75 (177) (1,000) | .48 (27) (83) | .19 (4) (6) | 84.0 (85.5) | 90.2 (90.5) |
| .75 (177) (1,000) | .48 (27) (83) | .22 (5) (8) | 86.3 (88.2) | 92.5 (92.5) |

The numbers in parentheses in the first three columns are the actual number of observations for the two different data sizes

We use the first column to denote the kernel window size, which is \(nh\) and can be written as \(n^{1-\delta _{3}}\). Similarly, we use the second column to denote the bootstrap window size \(nB\), which is \(n^{1-\delta _{2}}\), and the third column to denote the block size. The exponent corresponding to each size is given along with the exact number of observations corresponding to the two different data sizes that we use. An equal-tailed 95 % bootstrap confidence interval is constructed using the pivotal method as described in Politis (1998), and the coverage is computed. A 95 % confidence interval based on the asymptotic normality of the bootstrap estimator is also given.

Our choice for the size of the kernel window is based on the fact that we want to undersmooth to remove bias. For that reason we choose \(h=o(n ^{-\frac{1}{5} })\), which restricts us to window sizes of order \(o(n^{4/5})\). Our bootstrap window is restricted to \(o(n^{2/3})\), and our block sizes need to have an exponent less than half of the exponent of the bootstrap window. Based on these restrictions, which are explicitly stated in our assumptions, we choose a suitable range of values for our simulation.

We also compute the bootstrap estimate of the variance of \(\hat{m}(x)\). We generate 400 time series. From each of them \(\hat{m}(x)\) is computed. From each of these time series we construct 400 bootstrap pseudo time series using the LBB algorithm. We compute \(\hat{m} ^{*}(x)\) for each pseudo series and construct confidence intervals. Tables 1, 2 and 3 report the coverage of the 95 % confidence interval constructed by the pivotal method. The confidence interval is given by
$$\begin{aligned} \left[ \hat{m}(x)-q^{*}(1-\alpha /2),\;\hat{m}(x)-q^{*}(\alpha /2)\right] \end{aligned}$$
where \(q^{*}(p)\) is the value of the bootstrap estimator at the \(p\)th quantile of the bootstrap distribution figuring in Theorem 4, and \(\alpha =.05\). The confidence interval constructed using asymptotic normality is given by
$$\begin{aligned}{}[\hat{m}(x)\pm 1.96\sqrt{Var(\hat{m}^{*}(x))}] \end{aligned}$$
where the \(z\)-value corresponding to \(\alpha =.05\) is used. The variance estimate for \(\sqrt{nh}\,\hat{m}^{*}(x)\) is computed from the 400 bootstrap samples. Tables 4, 5 and 6 compare the variance of \(\sqrt{nh}\,\hat{m}(x)\) with the variance of \(\sqrt{nh}\,\hat{m}^{*}(x)\) for different window and block sizes.
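
A minimal sketch of the equal-tailed pivotal interval, again reusing the hypothetical helpers from the earlier sketches; centering the bootstrap estimates at their Monte Carlo mean is our reading of the centered bootstrap distribution figuring in Theorem 4.

```python
import numpy as np

def lbb_pivotal_ci(Y, x, h, b, nB, alpha=0.05, n_boot=400, rng=None):
    """Equal-tailed interval [m_hat - q*(1 - alpha/2), m_hat - q*(alpha/2)],
    where q*(p) is the p-quantile of m*_hat(x) - E*(m*_hat(x))."""
    rng = np.random.default_rng() if rng is None else rng
    mhat = nw_estimate(x, Y, h)
    boot = np.array([nw_estimate(x, lbb_resample(Y, b, nB, rng), h)
                     for _ in range(n_boot)])
    centered = boot - boot.mean()                 # m*_hat(x) - E*(m*_hat(x))
    lo_q, hi_q = np.quantile(centered, [alpha / 2, 1 - alpha / 2])
    return mhat - hi_q, mhat - lo_q
```
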
Table 4

Variance comparison at \(m(.1)\): variance of \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) [denoted V\(^{*}\)] compared to the variance of \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\) [denoted V] at \(x=.1\). For each data size the columns report V (with E V\(^{*}\) in parentheses), Var V\(^{*}\) and MSE V\(^{*}\)

| Kernel window size \(1-\delta _{3}\) | Bootstrap window size \(1-\delta _{2}\) | Block size \(\delta _{1}\) | V (E V\(^{*}\)), \(n=10^{3}\) | Var V\(^{*}\), \(n=10^{3}\) | MSE V\(^{*}\), \(n=10^{3}\) | V (E V\(^{*}\)), \(n=10^{4}\) | Var V\(^{*}\), \(n=10^{4}\) | MSE V\(^{*}\), \(n=10^{4}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 5.4 (3.0) | .45 | 6.7 | 5.7 (3.5) | .25 | 5.0 |
| .65 (89) (398) | .44 (20) (57) | .19 (4) (6) | 6.2 (3.3) | .64 | 8.7 | 5.1 (4.0) | .42 | 1.7 |
| .65 (89) (398) | .44 (20) (57) | .22 (5) (8) | 5.1 (3.6) | .98 | 3.1 | 4.8 (4.2) | .56 | 1.0 |
| .65 (89) (398) | .46 (23) (69) | .16 (3) (4) | 5.1 (3.0) | .43 | 4.7 | 5.7 (3.5) | .23 | 5.1 |
| .65 (89) (398) | .46 (23) (69) | .19 (4) (6) | 5.5 (3.5) | .62 | 4.8 | 5.5 (4.0) | .34 | 2.5 |
| .65 (89) (398) | .46 (23) (69) | .22 (5) (8) | 4.9 (3.7) | 1.06 | 2.4 | 6.0 (4.3) | .66 | 3.6 |
| .65 (89) (398) | .48 (27) (83) | .16 (3) (4) | 5.8 (3.1) | .38 | 7.5 | 5.1 (3.5) | .21 | 2.6 |
| .65 (89) (398) | .48 (27) (83) | .19 (4) (6) | 5.6 (3.5) | .73 | 5.2 | 5.5 (4.1) | .39 | 2.5 |
| .65 (89) (398) | .48 (27) (83) | .22 (5) (8) | 5.4 (3.8) | .96 | 3.5 | 5.5 (4.3) | .52 | 1.9 |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 5.5 (3.0) | .38 | 6.4 | 5.5 (3.5) | .19 | 4.4 |
| .70 (125) (630) | .44 (20) (57) | .19 (4) (6) | 5.3 (3.4) | .61 | 4.4 | 5.6 (3.9) | .33 | 3.1 |
| .70 (125) (630) | .44 (20) (57) | .22 (5) (8) | 5.6 (3.5) | .67 | 5.0 | 5.9 (4.1) | .37 | 3.7 |
| .70 (125) (630) | .46 (23) (69) | .16 (3) (4) | 4.9 (3.0) | .36 | 3.7 | 5.5 (3.5) | .15 | 4.1 |
| .70 (125) (630) | .46 (23) (69) | .19 (4) (6) | 5.5 (3.4) | .47 | 5.0 | 4.6 (4.0) | .30 | 0.7 |
| .70 (125) (630) | .46 (23) (69) | .22 (5) (8) | 5.8 (3.6) | .74 | 5.9 | 6.1 (4.3) | .48 | 3.8 |
| .70 (125) (630) | .48 (27) (83) | .16 (3) (4) | 5.1 (3.2) | .40 | 4.2 | 5.5 (3.6) | .18 | 3.8 |
| .70 (125) (630) | .48 (27) (83) | .19 (4) (6) | 5.0 (3.6) | .51 | 2.5 | 5.1 (4.1) | .30 | 1.3 |
| .70 (125) (630) | .48 (27) (83) | .22 (5) (8) | 5.8 (3.8) | .70 | 5.1 | 5.0 (4.3) | .40 | 1.0 |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 5.3 (3.0) | .26 | 5.3 | 5.5 (3.5) | .18 | 4.1 |
| .75 (177) (1,000) | .44 (20) (57) | .19 (4) (6) | 5.6 (3.4) | .40 | 5.3 | 5.2 (3.9) | .24 | 1.9 |
| .75 (177) (1,000) | .44 (20) (57) | .22 (5) (8) | 5.8 (3.7) | .53 | 5.2 | 5.7 (4.1) | .35 | 2.7 |
| .75 (177) (1,000) | .46 (23) (69) | .16 (3) (4) | 5.1 (3.3) | .26 | 3.5 | 5.2 (3.5) | .17 | 2.9 |
| .75 (177) (1,000) | .46 (23) (69) | .19 (4) (6) | 5.8 (3.7) | .40 | 4.8 | 5.8 (4.0) | .24 | 3.3 |
| .75 (177) (1,000) | .46 (23) (69) | .22 (5) (8) | 5.5 (3.9) | .50 | 3.0 | 4.9 (4.2) | .32 | 0.7 |
| .75 (177) (1,000) | .48 (27) (83) | .16 (3) (4) | 5.3 (3.7) | .29 | 3.0 | 5.0 (3.6) | .15 | 2.2 |
| .75 (177) (1,000) | .48 (27) (83) | .19 (4) (6) | 5.5 (4.2) | .39 | 2.2 | 5.3 (4.0) | .26 | 1.9 |
| .75 (177) (1,000) | .48 (27) (83) | .22 (5) (8) | 6.2 (4.5) | .60 | 3.2 | 5.0 (4.3) | .34 | 0.9 |

The numbers in parentheses in the first three columns are the actual number of observations for the two different data sizes

Table 5

Variance comparison at \(m(.3)\): variance of \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) [denoted V\(^{*}\)] compared to the variance of \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\) [denoted V] at \(x=.3\). For each data size the columns report V (with E V\(^{*}\) in parentheses), Var V\(^{*}\) and MSE V\(^{*}\)

| Kernel window size \(1-\delta _{3}\) | Bootstrap window size \(1-\delta _{2}\) | Block size \(\delta _{1}\) | V (E V\(^{*}\)), \(n=10^{3}\) | Var V\(^{*}\), \(n=10^{3}\) | MSE V\(^{*}\), \(n=10^{3}\) | V (E V\(^{*}\)), \(n=10^{4}\) | Var V\(^{*}\), \(n=10^{4}\) | MSE V\(^{*}\), \(n=10^{4}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 5.9 (2.9) | .46 | 9.1 | 5.9 (3.5) | .25 | 6.1 |
| .65 (89) (398) | .44 (20) (57) | .19 (4) (6) | 5.1 (3.2) | .62 | 4.3 | 5.1 (3.9) | .37 | 1.8 |
| .65 (89) (398) | .44 (20) (57) | .22 (5) (8) | 5.4 (3.4) | .94 | 4.9 | 6.0 (4.1) | .64 | 4.1 |
| .65 (89) (398) | .46 (23) (69) | .16 (3) (4) | 4.8 (3.0) | .44 | 3.9 | 5.6 (3.5) | .22 | 4.8 |
| .65 (89) (398) | .46 (23) (69) | .19 (4) (6) | 5.1 (3.2) | .69 | 4.2 | 5.5 (4.0) | .44 | 2.8 |
| .65 (89) (398) | .46 (23) (69) | .22 (5) (8) | 5.8 (3.4) | .84 | 6.2 | 6.0 (4.2) | .60 | 3.8 |
| .65 (89) (398) | .48 (27) (83) | .16 (3) (4) | 5.1 (3.1) | .45 | 4.7 | 5.4 (3.6) | .21 | 3.7 |
| .65 (89) (398) | .48 (27) (83) | .19 (4) (6) | 5.5 (3.4) | .74 | 5.0 | 5.1 (4.1) | .43 | 1.8 |
| .65 (89) (398) | .48 (27) (83) | .22 (5) (8) | 5.1 (3.5) | .84 | 3.1 | 5.5 (4.2) | .56 | 2.0 |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 6.3 (2.9) | .37 | 12.2 | 6.1 (3.5) | .18 | 6.8 |
| .70 (125) (630) | .44 (20) (57) | .19 (4) (6) | 5.2 (3.2) | .49 | 4.5 | 4.8 (3.9) | .28 | 1.1 |
| .70 (125) (630) | .44 (20) (57) | .22 (5) (8) | 5.6 (3.5) | .76 | 5.2 | 5.7 (4.1) | .42 | 2.9 |
| .70 (125) (630) | .46 (23) (69) | .16 (3) (4) | 6.1 (2.9) | .36 | 10.3 | 6.0 (3.5) | .19 | 6.7 |
| .70 (125) (630) | .46 (23) (69) | .19 (4) (6) | 5.4 (3.3) | .51 | 5.8 | 5.3 (4.0) | .32 | 1.9 |
| .70 (125) (630) | .46 (23) (69) | .22 (5) (8) | 5.4 (3.6) | .77 | 4.3 | 5.8 (4.2) | .42 | 2.8 |
| .70 (125) (630) | .48 (27) (83) | .16 (3) (4) | 5.9 (3.0) | .39 | 8.6 | 5.0 (3.5) | .19 | 2.3 |
| .70 (125) (630) | .48 (27) (83) | .19 (4) (6) | 5.2 (3.4) | .50 | 3.8 | 5.6 (4.0) | .33 | 2.9 |
| .70 (125) (630) | .48 (27) (83) | .22 (5) (8) | 5.3 (3.6) | .65 | 3.7 | 5.3 (4.3) | .45 | 1.4 |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 5.9 (2.9) | .25 | 9.3 | 6.0 (3.5) | .17 | 6.1 |
| .75 (177) (1,000) | .44 (20) (57) | .19 (4) (6) | 5.9 (3.2) | .38 | 7.3 | 6.0 (4.0) | .30 | 4.6 |
| .75 (177) (1,000) | .44 (20) (57) | .22 (5) (8) | 5.6 (3.4) | .46 | 5.3 | 5.6 (4.2) | .34 | 2.5 |
| .75 (177) (1,000) | .46 (23) (69) | .16 (3) (4) | 5.8 (3.0) | .26 | 8.2 | 5.9 (3.5) | .19 | 6.0 |
| .75 (177) (1,000) | .46 (23) (69) | .19 (4) (6) | 4.8 (3.3) | .40 | 2.8 | 4.9 (4.0) | .28 | 1.2 |
| .75 (177) (1,000) | .46 (23) (69) | .22 (5) (8) | 5.4 (3.5) | .47 | 4.1 | 5.5 (4.2) | .33 | 2.0 |
| .75 (177) (1,000) | .48 (27) (83) | .16 (3) (4) | 5.4 (3.0) | .26 | 5.9 | 5.4 (3.5) | .20 | 3.6 |
| .75 (177) (1,000) | .48 (27) (83) | .19 (4) (6) | 5.8 (3.4) | .40 | 6.2 | 5.8 (4.0) | .29 | 3.4 |
| .75 (177) (1,000) | .48 (27) (83) | .22 (5) (8) | 4.7 (3.6) | .58 | 1.8 | 5.6 (4.3) | .34 | 2.1 |

The numbers in parentheses in the first three columns are the actual number of observations for the two different data sizes

Table 6

Variance comparison at \(m(.5)\): variance of \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) [denoted V\(^{*}\)] compared to the variance of \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\) [denoted V] at \(x=.5\). For each data size the columns report V (with E V\(^{*}\) in parentheses), Var V\(^{*}\) and MSE V\(^{*}\)

| Kernel window size \(1-\delta _{3}\) | Bootstrap window size \(1-\delta _{2}\) | Block size \(\delta _{1}\) | V (E V\(^{*}\)), \(n=10^{3}\) | Var V\(^{*}\), \(n=10^{3}\) | MSE V\(^{*}\), \(n=10^{3}\) | V (E V\(^{*}\)), \(n=10^{4}\) | Var V\(^{*}\), \(n=10^{4}\) | MSE V\(^{*}\), \(n=10^{4}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 5.0 (2.9) | .47 | 5.0 | 5.5 (3.5) | .23 | 4.2 |
| .65 (89) (398) | .44 (20) (57) | .19 (4) (6) | 5.4 (3.2) | .69 | 5.3 | 6.0 (3.9) | .37 | 4.7 |
| .65 (89) (398) | .44 (20) (57) | .22 (5) (8) | 4.8 (3.3) | .73 | 2.8 | 5.0 (4.2) | .65 | 1.4 |
| .65 (89) (398) | .46 (23) (69) | .16 (3) (4) | 5.7 (3.0) | .37 | 7.9 | 5.7 (3.5) | .23 | 5.0 |
| .65 (89) (398) | .46 (23) (69) | .19 (4) (6) | 5.6 (3.2) | .64 | 6.2 | 5.0 (4.0) | .41 | 1.6 |
| .65 (89) (398) | .46 (23) (69) | .22 (5) (8) | 6.3 (3.4) | .71 | 9.0 | 5.6 (4.2) | .60 | 2.6 |
| .65 (89) (398) | .48 (27) (83) | .16 (3) (4) | 4.9 (3.0) | .44 | 4.1 | 4.9 (3.5) | .24 | 2.3 |
| .65 (89) (398) | .48 (27) (83) | .19 (4) (6) | 5.5 (3.3) | .63 | 5.4 | 6.0 (4.0) | .42 | 4.3 |
| .65 (89) (398) | .48 (27) (83) | .22 (5) (8) | 4.8 (3.5) | .90 | 2.4 | 5.3 (4.2) | .50 | 1.6 |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 6.4 (2.9) | .32 | 12.8 | 5.9 (3.5) | .19 | 5.9 |
| .70 (125) (630) | .44 (20) (57) | .19 (4) (6) | 5.7 (3.2) | .55 | 6.9 | 5.3 (3.9) | .33 | 2.3 |
| .70 (125) (630) | .44 (20) (57) | .22 (5) (8) | 5.2 (3.4) | .68 | 3.9 | 5.6 (4.1) | .40 | 2.6 |
| .70 (125) (630) | .46 (23) (69) | .16 (3) (4) | 5.6 (2.9) | .33 | 7.6 | 5.1 (3.5) | .19 | 2.8 |
| .70 (125) (630) | .46 (23) (69) | .19 (4) (6) | 5.8 (3.2) | .47 | 7.0 | 5.4 (4.0) | .30 | 2.5 |
| .70 (125) (630) | .46 (23) (69) | .22 (5) (8) | 5.7 (3.5) | .75 | 5.6 | 5.7 (4.3) | .47 | 2.4 |
| .70 (125) (630) | .48 (27) (83) | .16 (3) (4) | 5.7 (3.0) | .36 | 7.8 | 5.7 (3.5) | .18 | 4.7 |
| .70 (125) (630) | .48 (27) (83) | .19 (4) (6) | 5.7 (3.3) | .50 | 5.9 | 4.9 (4.0) | .29 | 1.1 |
| .70 (125) (630) | .48 (27) (83) | .22 (5) (8) | 5.8 (3.5) | .66 | 6.1 | 5.6 (4.3) | .42 | 2.2 |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 5.4 (2.9) | .28 | 6.6 | 5.7 (3.5) | .18 | 5.2 |
| .75 (177) (1,000) | .44 (20) (57) | .19 (4) (6) | 6.2 (3.1) | .38 | 10.1 | 5.6 (3.9) | .26 | 3.2 |
| .75 (177) (1,000) | .44 (20) (57) | .22 (5) (8) | 5.0 (3.4) | .56 | 3.2 | 5.0 (4.2) | .34 | 1.0 |
| .75 (177) (1,000) | .46 (23) (69) | .16 (3) (4) | 5.4 (2.9) | .25 | 6.5 | 5.3 (3.5) | .20 | 3.4 |
| .75 (177) (1,000) | .46 (23) (69) | .19 (4) (6) | 6.0 (3.3) | .41 | 8.2 | 6.4 (3.9) | .24 | 6.3 |
| .75 (177) (1,000) | .46 (23) (69) | .22 (5) (8) | 4.7 (3.4) | .49 | 2.3 | 5.4 (4.2) | .35 | 1.9 |
| .75 (177) (1,000) | .48 (27) (83) | .16 (3) (4) | 5.9 (3.0) | .26 | 8.8 | 4.9 (3.5) | .20 | 2.1 |
| .75 (177) (1,000) | .48 (27) (83) | .19 (4) (6) | 5.6 (3.3) | .41 | 5.5 | 5.4 (4.0) | .26 | 2.2 |
| .75 (177) (1,000) | .48 (27) (83) | .22 (5) (8) | 5.0 (3.5) | .52 | 2.7 | 5.3 (4.3) | .36 | 1.3 |

The numbers in parentheses in the first three columns are the actual number of observations for the two different data sizes

The size of the kernel window should be smaller when the curvature is large in absolute value. The bootstrap window relies on the constancy of the underlying function, and should be smaller if the first derivative is large in absolute value. The block size should be larger if the data is strongly dependent. We note that for larger data sets our bootstrap variance approaches the variance of \(\sqrt{nh}\hat{m} (x)\). Many of the patterns we would expect theoretically can be seen in our simulation.

As a referee pointed out, the coverage for the confidence interval using the pivotal method was typically lower than the coverage using asymptotic normality. This is particularly evident in the smaller data set, where coverage is lower than in the larger data set. This was puzzling at first, but an explanation for this phenomenon is possible. Recall that the noise satisfies the nonlinear autoregression \(e_{t}=\sin (e_{t-1})+\epsilon _{t}\). In general, \(|\sin (x)|\le |x|\), so the autoregression is stable. However, for small \(x\) we have the approximation \(\sin (x)\approx x\). Since \(\epsilon _{t}\) is i.i.d. N(0,1), most of the \(e_{t}\) are indeed small, hence the autoregression, although formally nonlinear, is close to linearity (and to normality due to the normal inputs). A histogram (and QQ-plot) of the generated \(e_{t}\) confirms that their marginal distribution is indistinguishable from a Gaussian. If this is the case, then the distribution of \(\hat{m}(x)\) would also be (finite-sample) normal, since \(\hat{m}(x)\) is linear in the errors. Finally, using the normal reference distribution (with estimated variance) is obviously preferable when the estimator happens to have a (finite-sample) normal distribution.

The coverage improves considerably across all window and block sizes as we increase the data size. The smaller data set is much more sensitive to window and block sizes. We notice both these effects as we compare the coverage percentages of the two data sizes in Tables 1, 2 and 3. The coverage for the confidence interval using the pivotal method was lower than the coverage using asymptotic normality in the smaller data set. For the larger data size, this gap is reduced considerably, and in many cases the two coverages are within 0.5 % of each other (i.e. equal to the nearest percent).

As the block sizes are increased, the coverage improves. This improvement is very clear in the smaller data set when we compare the different sets of three numbers keeping the bootstrap and kernel windows constant. Most of these sets show improvement with increasing block size in Tables 1, 2 and 3, corresponding to the different values of \(x\). This relationship is also present in the larger data set, although it is less clear. One reason could be that, since the coverage is higher for the larger data set, the variation between coverage values is small and consequently the patterns are harder to detect.

The coverage improves slightly as the bootstrap window size is increased. This is observed in both Tables 1 and 2 as we note the changes in coverage across different bootstrap window sizes, keeping the block size and the kernel window size constant. It does not increase for all such comparisons, but it does for a majority of them. One would expect that a larger bootstrap window would be less favorable at \(x=0.1\), where \(\left|m^{(1)}(.1)\right|=13.5\), than at \(x=0.3\), where \(\left|m^{(1)}(.3)\right|=7.7\), corresponding to Tables 1 and 2 respectively. This is observed for \(n=1{,}000\) and kernel size \(n^{0.65}\): the coverage improves in Table 2 as the bootstrap window size increases, while in Table 1 coverage decreases for most comparisons. In Table 3, where \(\left|m^{(1)}(.5)\right|=0\), the bootstrap window size did not seem to affect the coverage.

The coverage improved with increasing kernel window size in Tables 1, 2 and 3 for the smaller data set. For \(n=10{,}000\) this trend is less noticeable, but still present. This is again observed by comparing the coverage for different kernel sizes, keeping the other size variables constant. The kernel window size was restricted to be smaller than \(O(n^{4/5})\) so that the bias goes to zero asymptotically; this allows us to construct the bootstrap confidence intervals without the need for bias correction. The bias is proportional to the second derivative \(m^{(2)}(x)\). In Table 1, \(\left|m^{(2)}(.1)\right|=23.2\) and in Table 2, \(\left|m^{(2)}(.3)\right|=35.4\). One would expect comparatively lower values of the kernel window size to favor the point with the higher absolute curvature. We could not find any strong effect of this phenomenon in our results.

The variance of \(\sqrt{nh}\,\hat{m}^{*}(x)\) is close to the variance of \(\sqrt{nh}\,\hat{m}(x)\) for the larger data set and is fairly stable. For large data sets, we could use the LBB algorithm to estimate the variance of \(\sqrt{nh}\,\hat{m}(x)\). It would be a difficult task to compute the variance by direct calculation, because one would have to estimate covariances and truncate the sum of these covariances. Using the LBB, we can simply compute the variance of the bootstrap estimates. In general, the bootstrap estimator variance is smaller than the variance of \(\sqrt{nh}\,\hat{m}(x)\), but it tends to increase and close the gap as we increase the block size. This is observed in Tables 4, 5 and 6 when we look at the sets of three numbers corresponding to varying block sizes. This is expected, because having bigger blocks retains more of the dependence structure of the original time series.

We notice that our results vary depending on the ratios of the kernel window, bootstrap window and block sizes. There seem to be more subtle relationships between the relative window sizes and the magnitudes of the slope and curvature; these would require further investigation. It would seem that one should empirically establish these ratios for a given data set at a specified point by testing the procedure on several subsets of the data. This is akin to the subsampling cross-validation idea proposed in Hall et al. (1995). One can then use the LBB procedure with those values which work well on these subsets.

6 Proofs

Proof of Theorem 1 (i)

Recall that \(x\) is a fixed number in \( (0,1)\). In what follows, we assume that \(n\) is big enough, so that \(h\) (that tends to zero) is small enough to guarantee that either \(x>h\) (if \(x<1/2\)), or \(1-x>h\) (if \(x\ge 1/2\)); in this way the kernel estimator is defined without boundary effects.

Now consider
$$\begin{aligned} E(\hat{m}(x))&= E\left( \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i} )/h)Y_{i}\right)\\&= \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)E(Y_{i})=\frac{1}{nh} \sum \limits _{i=1}^{n}K((x-x_{i})/h)m(x_{i})\\&= \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)\left( m(x)+m^{\prime }(x)(x-x_{i})+m^{\prime \prime }(x_{i})(x-x_{i})^{2}/2+o((x-x_{i})^{2})\right)\\&= \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)m(x)+\frac{1}{nh} \sum \limits _{i=1}^{n}K((x-x_{i})/h)m^{\prime }(x)(x-x_{i})\\&\quad +\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)m^{\prime \prime } (x_{i})(x-x_{i})^{2}/2+O(h^{3})\\&= A_{n,1}+A_{n,2}+A_{n,3}+O(h^{3}). \end{aligned}$$
We observe that \(A_{n,1}\rightarrow m(x)\). Consider \(A_{n,2}\):
$$\begin{aligned} A_{n,2}&= \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)m^{\prime }(x)(x-x_{i})\\&= m^{\prime }(x)\frac{1}{n}\sum \limits _{i=1}^{n}((x-x_{i})/h)K((x-x_{i} )/h)\rightarrow m^{\prime }(x)\int uK(u)d(u)=0 \end{aligned}$$
Consider \(A_{n,3}.\)
$$\begin{aligned} A_{n,3}&= \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)m^{\prime \prime }(x)(x-x_{i})^{2}/2\\&= \frac{m^{\prime \prime }(x)}{2}\frac{1}{nh}\sum \limits _{i=1}^{n}(x-x_{i})^{2}K((x-x_{i})/h)\\&= \frac{m^{\prime \prime }(x)}{2}h^{2}\frac{1}{nh}\sum \limits _{i=1}^{n} ((x-x_{i})/h)^{2}K((x-x_{i})/h)\sim \frac{m^{\prime \prime }(x)}{2} h^{2}\int u^{2}K(u)d(u) \end{aligned}$$
This establishes our result. \(\square \)

Proof of Theorem 1 (ii)

$$\begin{aligned} Var(\hat{m}(x))&= Var\left( \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i} )/h)Y_{i}\right)\\&= \frac{1}{(nh)^{2}}\sum \limits _{i,j=1}^{n}K((x-x_{i})/h)K((x-x_{j} )/h)Cov(Y_{i},Y_{j}) \end{aligned}$$
We note that \(Cov(Y_{i},Y_{j})\) = \(Cov(e_{i},e_{j})\) which we denote as \(c(i-j).\) So we have
$$\begin{aligned} nhVar(\hat{m}(x))=\frac{1}{nh}\sum \limits _{i,j=1}^{n}K((x-x_{i})/h)K((x-x_{j} )/h)c(i-j) \end{aligned}$$
and to simplify computation we first let \(x-x_{i}=\frac{\left\lfloor xn\right\rfloor }{n}-\frac{i}{n}\). Since \(K\) has compact support on \([-\frac{1}{2},\frac{1}{2}]\), the weight \(K((x-x_{i})/h)\) vanishes unless \(\left|x-x_{i}\right|\le h/2\), so only the \(nh/2\) of the \(x_{i}\) to the left and right of \(x\) make a non-zero contribution to the sum. We can pick the sequence \(nh\) in a way that \(nh/2\) is an integer. With this in mind, and without loss of generality, we can rewrite the above sum as
$$\begin{aligned}&\frac{1}{nh}\sum \limits _{i,j=1}^{n}K\left(\left(\frac{i}{n}\right)/h\right) K\left(\left( \frac{j}{n}\right)/h\right)c(i-j)\\&\quad =\sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2} ^{nh/2-\left|s\right|}K\left(\frac{i}{nh}\right)K\left(\frac{i+\left|s\right|}{nh}\right) \end{aligned}$$
We note that
$$\begin{aligned} K\left(\frac{i+\left| s\right| }{nh}\right)=K\left(\frac{i}{nh}\right)+\frac{\left| s\right| }{nh}K^{\prime }(\zeta _{i})\quad \text{where }\ \frac{i}{nh}<\zeta _{i}<\frac{i+\left| s\right| }{nh} \end{aligned}$$
So we have
$$\begin{aligned} nhVar(\hat{m}(x))&= \sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2}^{nh/2-\left| s\right| }K\left(\frac{i}{nh}\right)\left( K\left(\frac{i}{nh}\right)+\frac{\left| s\right| }{nh}K^{\prime }(\zeta _{i})\right)\\&= \sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2} ^{nh/2-\left| s\right| }K^{2}\left(\frac{i}{nh}\right)\nonumber \\&+\sum \limits _{s=-nh+1} ^{nh-1}c(s)\frac{\left| s\right| }{(nh)^{2}}\sum \limits _{i=-nh/2} ^{nh/2-\left| s\right| }K\left(\frac{i}{nh}\right)K^{\prime }(\zeta _{i})\\&= V_{1,n}+V_{2,n} \end{aligned}$$
Let us consider \(V_{1,n}.\) We have
$$\begin{aligned} V_{1,n}&= \sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2} ^{nh/2-\left| s\right| }K^{2}\left(\frac{i}{nh}\right)\\&= \sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2} ^{nh/2}K^{2}\left(\frac{i}{nh}\right)\\&\quad +\sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\left( \sum \limits _{i=-nh/2} ^{nh/2-\left| s\right| }K^{2}\left(\frac{i}{nh}\right)-\sum \limits _{i=-nh/2} ^{nh/2}K^{2}\left(\frac{i}{nh}\right)\right)\\&= \sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2}^{nh/2} K^{2}\left(\frac{i}{nh}\right)+\tilde{V}_{1,n} \end{aligned}$$
We have that
$$\begin{aligned} \left| \tilde{V}_{1,n}\right|&= \left| \sum \limits _{s=-nh+1}^{nh-1} c(s)\frac{1}{nh}\left( \sum \limits _{i=-nh/2}^{nh/2-\left| s\right| } K^{2}\left(\frac{i}{nh}\right)-\sum \limits _{i=-nh/2}^{nh/2}K^{2}\left(\frac{i}{nh}\right)\right) \right|\\&\le 2C\sum \limits _{s=0}^{nh-1}\frac{\left| s\right| }{nh}c(s) \end{aligned}$$
$$\begin{aligned} =2C\frac{1}{nh}\sum \limits _{s=0}^{nh-1}\left|s\right|c(s)\rightarrow 0\quad \text{for }\ nh\rightarrow \infty \end{aligned}$$
where \(\left|K^{2}(\cdot )\right|\le C\). The limit is implied by A.2., using the mixing inequality to write \(\left|c(i)\right|\le C\alpha _{e}(i)^{\frac{2+\delta }{4+\delta }}\) (since we have \(6+\delta \) moments). We then apply Kronecker’s Lemma to establish the limit. Furthermore, we have
$$\begin{aligned} \sum \limits _{s=-nh+1}^{nh-1}c(s)\frac{1}{nh}\sum \limits _{i=-nh/2}^{nh/2} K^{2}\left(\frac{i}{nh}\right)\rightarrow \sum \limits _{s=-\infty }^{\infty }c(s)\int K^{2}(u)d(u)\quad \text{for }\ nh\rightarrow \infty \end{aligned}$$
We have established that
$$\begin{aligned} V_{1,n}\rightarrow 2\pi f(0)\int K^{2}(u)d(u)\quad \text{for }\ nh\rightarrow \infty \end{aligned}$$
Consider \(V_{2,n}.\)
$$\begin{aligned} \left| V_{2,n}\right|&\le \left( \sum \limits _{s=-nh+1}^{nh-1}\left| c(s)\right| \frac{\left| s\right| }{nh}\right) \left( \frac{1}{nh} \sum \limits _{i=-nh/2}^{nh/2-\left| s\right| }K\left(\frac{i}{nh}\right)\right) \left| K^{\prime }(\zeta _{i})\right|\\&\le \left( \sum \limits _{s=-nh+1}^{nh-1}\left| c(s)\right| \frac{\left| s\right| }{nh}\right) \left( \frac{1}{nh}\sum \limits _{i=-nh/2} ^{nh/2-\left| s\right| }K\left(\frac{i}{nh}\right)\right) M \end{aligned}$$
since we have \(\left| K^{\prime }(s)\right| \le M\) for some \(M>0\). We also note that
$$\begin{aligned} \left( \frac{1}{nh}\sum \limits _{i=-nh+1}^{nh-1}K\left(\frac{i}{nh}\right)\right) = 1+O((nh)^{-1})\quad \text{for }\ nh\rightarrow \infty \end{aligned}$$
and
$$\begin{aligned} \left| \sum \limits _{s=-nh+1}^{nh-1}\left| c(s)\right| \frac{\left| s\right| }{nh}\right| <2\left| \sum \limits _{s=0}^{nh-1}\left| c(s)\right| \frac{\left| s\right| }{nh}\right| \rightarrow 0. \end{aligned}$$
The limit follows in the same manner as the limit for \(\tilde{V}_{1,n}\) explained above, using A.2., the mixing inequality and Kronecker’s Lemma, to establish that \(V_{2,n}\rightarrow 0\) for \(nh\rightarrow \infty \). Altogether,
$$\begin{aligned} nhVar(\hat{m}(x))\rightarrow 2\pi f(0)\int K^{2}(u)d(u)\quad \text{as }\ nh\rightarrow \infty . \end{aligned}$$
\(\square \)

Proof of Theorem 1 (iii)

We refer to Theorem 3.1 of Roussas et al. (1992) for this proof. We verify their assumptions and conditions in our setting, denoting their assumptions by the initials RTI. (A1) RTI is satisfied by the definition of our model along with A.1., A.2. and A.5. (A2) (i) RTI is a consequence of the fact that
$$\begin{aligned} \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)=\sum \limits _{i=1}^{n}\left| \frac{K((x-x_{i})/h)}{nh}\right| \rightarrow 1\;\text{ as} \quad n\rightarrow \infty . \end{aligned}$$
because \(K(x) \ge 0\) and integrates to one. (A2) (ii) RTI is a result of the fact that there exists a constant \(C>0\) such that
$$\begin{aligned} \left| \frac{K((x-x_{i})/h)}{nh}\right| \le \frac{C}{nh}\quad \text{for all }\ i. \end{aligned}$$
as is implied by A.3. i) and iv), i.e. compact support, positivity and continuity of the kernel \(K\). Furthermore,
$$\begin{aligned} \sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}=\frac{1}{nh}\left( \frac{1}{nh}\sum \limits _{i=1}^{n}\left( K((x-x_{i})/h)\right) ^{2}\right) =O\left( \frac{1}{nh}\right) \end{aligned}$$
because
$$\begin{aligned} \frac{1}{nh}\sum \limits _{i=1}^{n}\left( K((x-x_{i})/h)\right) ^{2} \rightarrow C_{1}\quad \text{ as}\;n\rightarrow \infty \end{aligned}$$
for some \(C_{1}>0\) as a consequence of A.3. iii). Therefore we have
$$\begin{aligned} \underset{i}{\max }\left| \frac{K((x-x_{i})/h)}{nh}\right| =O\left( \sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}\right). \end{aligned}$$
Assumption (A3) RTI is satisfied by the fact that the variance of \(\hat{m}(x)\) is \(O(\frac{1}{nh})\) due to Theorem 1 ii), which implies that
$$\begin{aligned} \sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}=O\left( Var(\hat{m}(x))\right). \end{aligned}$$
Assumption (A4) RTI is an immediate consequence of A.1. Condition (2.21) RTI has three parts. We first observe that the effective number of terms in our sum is \(nh\). We have to check that there exist \(p\) and \(q\) as defined in Tran and Ioannides (1992) where \(qp^{-1}\rightarrow 0\) and where
$$\begin{aligned}&nhqp^{-1}\sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}\rightarrow 0,\\&\quad p^{2}\sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}\rightarrow 0\quad \text{ and}\quad nhp^{-1}\alpha (q)\rightarrow 0\;\text{ as} \ n \rightarrow \infty . \end{aligned}$$
Since \(qp^{-1}\rightarrow 0\) and
$$\begin{aligned} nh\sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}=O(1) \end{aligned}$$
as we had shown earlier. As a result we confirm that the first condition holds. Once again, since
$$\begin{aligned} \sum \limits _{i=1}^{n}\left( \frac{K((x-x_{i})/h)}{nh}\right) ^{2}=O\left( \frac{1}{nh}\right) \end{aligned}$$
we require that \(p^{2}=o(nh)\) or \(p=o(\sqrt{nh}).\) From our assumption A.2. we have that
$$\begin{aligned} q\alpha (q)=o\left(\frac{1}{q}\right)\quad \text{ or} \quad \alpha (q)=o\left( \frac{1}{q^{2} }\right) \end{aligned}$$
and so we have
$$\begin{aligned} nh\left( o(\left( nh\right) ^{\frac{1}{2}})\right) o\left( \frac{1}{q^{2}}\right) =o(1) \end{aligned}$$
which requires that \(\left( nh\right) ^{\frac{1}{4}}=o(q).\) This satisfies the third condition of (2.21) RTI. A feasible range of values is \(p= \left( nh\right) ^{\frac{1}{2}-\varepsilon }\) and \(q= \left( nh\right) ^{\frac{1}{4}+\varepsilon }\) for \(0<\varepsilon <\frac{1}{4}\).

Proof of Theorem 2

We have
$$\begin{aligned} E^{*}(\hat{m}^{*}(x))=\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i} )/h)E^{*}(Y_{i}^{*}). \end{aligned}$$
As long as \(i\) is not an end point, we also have
$$\begin{aligned} E^{*}(Y_{i}^{*})=\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}Y_{i+r}. \end{aligned}$$
Recall that \(x\) is a fixed number in \( (0,1)\). Let \(n\) be large enough that \(h\) and \(B\) (which tend to zero) are small enough to ensure \(x> \max (B+h/2,h)\) if \(x<1/2\), or \(1-x> \max (B+h/2,h)\) if \(x\ge 1/2\); this guarantees that end points do not influence our estimates.

To see why, consider the case \(x<1/2\) and the influence of an end point \(x_i\). Being a left end point, \(x_i=i/n\) for some \(i\le nB\), i.e., \(x_i \le nB/n =B\). Since \(x> B+h/2\), it follows that \(x-x_i> h/2\), and hence, by the compact support of \(K\), that \(K((x-x_{i})/h)=0\); the effect of the end points can thus be neglected. The case \(x\ge 1/2\) is similar.
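For instance, with a kernel supported on \([-1/2,1/2]\) (a hypothetical but compatible choice), one can check directly that all left end points receive zero kernel weight once \(x>B+h/2\):

```python
import numpy as np

n, x = 2000, 0.3
h, B = 0.05, 0.02          # hypothetical bandwidths; here x > max(B + h/2, h)
xi = np.arange(1, n + 1) / n
K = 2.0 * np.clip(1.0 - 2.0 * np.abs((x - xi) / h), 0.0, None)  # support [-1/2, 1/2]
left_end = xi <= B          # indices i <= nB, i.e. the left end points
print(K[left_end].max())    # 0.0: end points receive zero kernel weight
```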

Replacing \(Y_{i}\) with \(m(x_{i})\) + \(e_{i}\) in the expression for \(E^{*}(\hat{m}^{*}(x))\), we get
$$\begin{aligned} E^{*}(\hat{m}^{*}(x))=\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i} )/h)\left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}(m(x_{r+i})+e_{r+i})\right). \end{aligned}$$
Let us consider the expression
$$\begin{aligned}&\left| E^{*}(\hat{m}^{*}(x))-E(\hat{m}(x))\right|\\&\quad =\left|\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)\left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}(m(x_{r+i})-m(x_{i}))+\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}e_{r+i}\right) \right| =D_{n}. \end{aligned}$$
Since
$$\begin{aligned} \left|\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}m(x_{r+i}) -m(x_{i})\right|\le \sup _{t\in \left[ x-B,x+B\right] }\left|m^{\prime }(t)\right|2B \end{aligned}$$
we have
$$\begin{aligned} D_{n}\le \left| \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)\left( \left| \sup _{t\in \left[ x-B,x+B\right] }m^{\prime }(t)B\right| +\left| \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}e_{r+i}\right| \right) \right|. \end{aligned}$$
We know that \(\left| m^{\prime }(x)\right| \le M\) for some \(M>0,\) and \(B\rightarrow 0\) at a rate of \(O(\,n^{-\delta _{2}})\). Also
$$\begin{aligned} \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}e_{r+i}= O_{p}((nB)^{-\frac{1}{2}}) \end{aligned}$$
by the central limit theorem for strictly stationary, strong mixing sequences satisfying the moment and mixing conditions given in our assumptions (Theorem 18.5.3 of Ibragimov and Linnik). Furthermore,
$$\begin{aligned} \frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)=1+O(\left( nh\right) ^{-1}). \end{aligned}$$
Putting this together we have
$$\begin{aligned} \left| E^{*}(\hat{m}^{*}(x))-E(\hat{m}(x))\right| =O_{p} ((nB)^{-\frac{1}{2}})+O(B)=O_{p}(n^{\max ((\delta _{2}-1)/2,-\delta _{2} )}). \end{aligned}$$
\(\square \)
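Although the local block bootstrap is defined formally earlier in the paper, the following minimal sketch (with a hypothetical sine trend, AR(1) errors and crude clipping at the series boundaries) illustrates the resampling mechanism as it enters the above moment calculations: each block of length \(b\) is filled with a copy of the original series shifted by an independent uniform draw from \(\{-nB,\ldots ,nB\}\), so that \(E^{*}(Y_{i}^{*})\) is the local average used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)

def lbb_resample(Y, b, nB, rng):
    """One local block bootstrap pseudo-series: block k (length b) is filled
    with Y shifted by an independent uniform draw from {-nB, ..., nB}.
    Boundary indices are clipped, which only matters near the end points."""
    n = len(Y)
    Ystar = np.empty(n)
    for start in range(0, n, b):
        r = rng.integers(-nB, nB + 1)
        idx = np.clip(np.arange(start, min(start + b, n)) + r, 0, n - 1)
        Ystar[start:start + len(idx)] = Y[idx]
    return Ystar

# hypothetical data: smooth trend plus AR(1) errors
n, b, nB = 1000, 20, 50
t = np.arange(1, n + 1) / n
e = np.zeros(n)
for s in range(1, n):
    e[s] = 0.5 * e[s - 1] + rng.normal(scale=0.5)
Y = np.sin(2 * np.pi * t) + e

# Monte Carlo estimate of E*(Y_i*) versus the local-average formula
i = 500
mc = np.mean([lbb_resample(Y, b, nB, rng)[i] for _ in range(5000)])
loc_avg = Y[i - nB:i + nB + 1].mean()       # (2nB+1)^{-1} sum_r Y_{i+r}
print(round(mc, 2), round(loc_avg, 2))      # agree up to Monte Carlo error
```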

Proof of Lemma 1

$$\begin{aligned} c^{*}(i,j)&= \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( Y_{i+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}Y_{i+l}\right) \left( Y_{j+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}Y_{j+l}\right)\nonumber \\&= \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( \left( m(x_{i+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{i+l})\right) +\left( e_{i+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB} e_{i+l}\right) \right)\nonumber \\&\quad \times \left( \left( m(x_{j+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB} ^{nB}m(x_{j+l})\right) +\left( e_{j+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB} ^{nB}e_{j+l}\right) \right) \end{aligned}$$
(6.1)
We note that
$$\begin{aligned}&\left|\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( m(x_{i+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{i+l})\right) \left( m(x_{j+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{j+l})\right) \right|\nonumber \\&\quad \le \left( 2B\underset{x\in [0,1]}{\max }\left|m^{\prime }(x)\right|\right) ^{2}=O(n^{-2\delta _{2}}) \end{aligned}$$
(6.2)
We also observe that by the Cauchy-Schwarz Inequality
$$\begin{aligned}&\left|\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( m(x_{i+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{i+l})\right) \left( e_{j+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB} e_{j+l}\right) \right|\nonumber \\&\quad \le \left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( m(x_{i+r} )-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{i+l})\right) ^{2}\right) ^{\frac{1}{2}}\nonumber \\&\qquad \times \left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( e_{j+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}e_{j+l}\right) ^{2}\right) ^{\frac{1}{2}} \end{aligned}$$
(6.3)
We have
$$\begin{aligned}&\left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( m(x_{i+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{i+l})\right) ^{2}\right) ^{\frac{1}{2}}\nonumber \\&\quad \le \left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( 2B\underset{x\in [0,1]}{\max }\left| m^{\prime }(x)\right| \right) ^{2}\right) ^{\frac{1}{2}}=O(B)=O(n^{-\delta _{2}}) \end{aligned}$$
(6.4)
Also
$$\begin{aligned}&\left( \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( e_{j+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}e_{j+l}\right) ^{2}\right) ^{\frac{1}{2}}\nonumber \\&\quad =\left( c(0)+O_{p}((nB)^{-\frac{1}{2}})\right) ^{\frac{1}{2}} =O(1) \end{aligned}$$
(6.5)
by Theorem 3.1 of Romano and Thombs (1996). Therefore we can use (6.4) and (6.5) to conclude that (6.3) is \(O_{p}(n^{-\delta _{2}}).\) By the same argument, the other middle term in the expansion (6.1), namely
$$\begin{aligned}&\left|\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( m(x_{j+r})-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}m(x_{j+l})\right) \left( e_{i+r}-\frac{1}{2nB+1}\sum \limits _{l=-nB}^{nB}e_{i+l}\right) \right|\\&\qquad =O_{p}(n^{-\delta _{2}}) \end{aligned}$$
(6.6)
Now we consider the last (and dominant) term in the expansion of (6.1).
$$\begin{aligned}&\frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( e_{i+r}-\frac{1}{2nB+1} \sum \limits _{l=-nB}^{nB}e_{i+l}\right) \left( e_{j+r}-\frac{1}{2nB+1} \sum \limits _{l=-nB}^{nB}e_{j+l}\right)\nonumber \\&\qquad =c(i-j)+O_{p}(\left( nB\right) ^{-\frac{1}{2}})\end{aligned}$$
(6.7)
$$\begin{aligned}&\qquad =c(i-j)+O_{p}(\left( n^{(\delta _{2}-1)/2}\right) ) \end{aligned}$$
(6.8)
by Romano and Thombs (1996). Therefore, by (6.2), (6.3), (6.6), and (6.8), we have that
$$\begin{aligned} c^{*}(i,j)&= c(i-j)+O(n^{-2\delta _{2}})+O_{p}(n^{-\delta _{2}})+O_{p} (n^{(\delta _{2}-1)/2})\nonumber \\&= c(i-j)+O_{p}(n^{(\delta _{2}-1)/2}) \end{aligned}$$
due to A.6. (ii) which ensures that max(\(-\delta _{2}/2,(\delta _{2} -1)/2)=(\delta _{2}-1)/2\). \(\square \)
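Purely as an illustration of Lemma 1, the following sketch evaluates \(c^{*}(i,j)\) from (6.1) for a hypothetical sine trend with AR(1) errors and compares it with the true error autocovariance \(c(i-j)\).

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical design: sine trend, AR(1) errors, B = 0.02 so that nB = 400
n, nB, phi = 20000, 400, 0.5
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi * e[t - 1] + rng.normal()
x = np.arange(1, n + 1) / n
Y = np.sin(2 * np.pi * x) + e

def c_star(Y, i, j, nB):
    """Local covariance estimator c*(i, j) as in (6.1)."""
    wi = Y[i - nB:i + nB + 1]
    wj = Y[j - nB:j + nB + 1]
    return np.mean((wi - wi.mean()) * (wj - wj.mean()))

i = n // 2
for lag in (0, 1, 2, 5):
    true_c = phi ** lag / (1 - phi ** 2)    # autocovariance of the AR(1) errors
    print(lag, round(c_star(Y, i, i + lag, nB), 2), round(true_c, 2))
```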

Proof of Lemma 2

$$\begin{aligned} Var\left( \frac{1}{N}\sum \limits _{r=1}^{N}e_{r}e_{r+\left|s\right|}\right) =\frac{1}{N^{2}}Var\left( \sum \limits _{r=1}^{N}e_{r}e_{r+\left|s\right|}\right) \end{aligned}$$
Let \(Y_{r}^{s}=e_{r}e_{r+\left|s\right|}\). Then we get
$$\begin{aligned} \frac{1}{N^{2}}Var\left( \sum \limits _{r=1}^{N}Y_{r}^{s}\right)&= \frac{1}{N^{2}}\sum \limits _{i=1}^{N}\sum \limits _{j=1}^{N}Cov(Y_{i}^{s},Y_{j}^{s})\\&= \frac{1}{N^{2}}\sum \limits _{k=-N+1}^{N-1}\sum \limits _{i=1}^{N-\left|k\right|}Cov(Y_{i}^{s},Y_{i+\left|k\right|}^{s}). \end{aligned}$$
This last step is due to stationarity. We need only consider the uniform boundedness over \(s\) of
$$\begin{aligned} \sum \limits _{k=-N+1}^{N-1}Cov(Y_{1}^{s},Y_{1+\left|k\right|} ^{s})&= \sum \limits _{k=-N+1}^{N-1}Cov(e_{1}e_{1+\left|s\right|},e_{1+\left|k\right|}e_{1+\left|k\right|+\left|s\right|})\\&= \sum \limits _{k=-N+1}^{N-1}\left( E(e_{1}e_{1+\left|s\right|}e_{1+\left|k\right|}e_{1+\left|k\right|+\left|s\right|})-E(e_{1}e_{1+\left|s\right|})E(e_{1+\left|k\right|}e_{1+\left|k\right|+\left|s\right|})\right) \end{aligned}$$
If we wish to bound the above sum uniformly over \(s\), we need to establish summability of fourth-order cumulants of strong mixing random variables. This requires the mixing and moment conditions of assumptions A.1. and A.2., which are the conditions used by Künsch (1989) in the proof of his Theorem 3.3. Therefore, as a consequence of the Dirichlet test, we have
$$\begin{aligned} \frac{1}{N^{2}}Var\left( \sum \limits _{r=1}^{N}Y_{r}^{s}\right) =\frac{1}{N}\sum \limits _{k=-N+1}^{N-1}\left( 1-\frac{\left|k\right|}{N}\right) Cov(Y_{1}^{s},Y_{1+\left|k\right|}^{s})\rightarrow 0\quad \text{as } N \rightarrow \infty \;\text{uniformly over all } s. \end{aligned}$$
We have thus shown that the convergence of the dominant term of \(c^*(i,j)\) is uniform—cf. eq. (6.7); the other terms follow similarly. \(\square \)
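The uniformity over \(s\) can also be seen numerically: for hypothetical AR(1) errors, the Monte Carlo variance of \(N^{-1}\sum _{r}e_{r}e_{r+|s|}\) is of the same small order for short and long lags alike.

```python
import numpy as np

rng = np.random.default_rng(2)

def ar1(N, phi=0.5, rng=rng):
    e = np.zeros(N)
    for t in range(1, N):
        e[t] = phi * e[t - 1] + rng.normal()
    return e

N, reps, lags = 2000, 500, (0, 1, 5, 20, 100)
draws = {s: [] for s in lags}
for _ in range(reps):
    e = ar1(N + max(lags))
    for s in lags:
        draws[s].append(np.mean(e[:N] * e[s:s + N]))
print({s: round(np.var(draws[s]), 4) for s in lags})  # small, and of similar size across s
```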

Proof of Theorem 3

$$\begin{aligned}&\hat{m}^{*}(x)=\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)Y_{i}^{*}=\frac{1}{nh}\sum \limits _{i=\left\lceil -nh/2+xn\right\rceil }^{\left\lfloor nh/2+xn\right\rfloor }K((x-x_{i})/h)Y_{i}^{*}\\&\quad = \frac{1}{nh}\sum \limits _{i\!=\!\left\lceil -nh/2+xn\right\rceil }^{b\left\lceil \left\lceil \!-\!nh/2\!+\!xn\right\rceil /b\right\rceil }K((x-x_{i})/h)Y_{i}^{*}\!+\!\frac{1}{nh}\sum \limits _{i\!=\!b\left\lceil \left\lceil \!-\!nh/2\!+\!xn\right\rceil /b\right\rceil +1}^{\left\lfloor \left\lfloor nh/2+xn\right\rfloor /b\right\rfloor b}K((x-x_{i})/h)Y_{i}^{*}\nonumber \\&\qquad +\frac{1}{nh}\sum \limits _{i=\left\lfloor \left\lfloor nh/2+xn\right\rfloor /b\right\rfloor b+1}^{\left\lfloor nh/2+xn\right\rfloor }K((x-x_{i} )/h)Y_{i}^{*}\nonumber \end{aligned}$$
(6.9)
\(=T_{1,n}+V_{n}+T_{2,n}\), respectively. The terms \(T_{1,n}\) and \(T_{2,n}\) are sums over truncated bootstrap blocks at the beginning and end of (6.9), and they are independent of each other and of \(V_{n}\). The term \(V_{n}\) is the sum of approximately \(nh/b\) independent blocks of size \(b\). We write \(nhVar^{*}(\hat{m}^{*}(x))\) as
$$\begin{aligned}&\frac{1}{nh}Var^{*}\left( \sum \limits _{i=\left\lceil -nh/2+xn\right\rceil }^{b\left\lceil \left\lceil -nh/2+xn\right\rceil /b\right\rceil } K((x-x_{i})/h)Y_{i}^{*}\right)\\&\quad +\frac{1}{nh}Var^{*}\left( \sum \limits _{i=b\left\lceil \left\lceil -nh/2+xn\right\rceil /b\right\rceil +1}^{\left\lfloor \left\lfloor nh/2+xn\right\rfloor /b\right\rfloor b} K((x-x_{i})/h)Y_{i}^{*}\right) \\&\quad +\frac{1}{nh}Var^{*}\left( \sum \limits _{i=\left\lfloor \left\lfloor nh/2+xn\right\rfloor /b\right\rfloor b+1}^{\left\lfloor nh/2+xn\right\rfloor }K((x-x_{i})/h)Y_{i}^{*}\right) \end{aligned}$$
\(=S_{1,n}+W_{n}+S_{2,n}\), respectively. We observe that \(S_{1,n}\) and \(S_{2,n}\) tend to zero at rate \(O(b^{2}/(nh))\), as implied by assumption A.6.(iii). Therefore we need only consider the term \(W_{n}\), which is the scaled sum of the variances of approximately \(nh/b\) independent blocks of size \(b\):
$$\begin{aligned}&\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{j=1}^{b}\sum \limits _{l=1}^{b}K((x-x_{ib+j})/h)K((x-x_{ib+l})/h)Cov^{*}(Y_{ib+j}^{*},Y_{ib+l}^{*})\\&\quad =\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{j=1}^{b} \sum \limits _{l=1}^{b}K((x-x_{ib+j})/h)K((x-x_{ib+l})/h)\\&\qquad \times \frac{1}{2nB+1}\sum \limits _{r=-nB}^{nB}\left( Y_{ib+j+r}-\frac{1}{2nB+1}\sum \limits _{k=-nB}^{nB}Y_{ib+j+k}\right) \left( Y_{ib+l+r}-\frac{1}{2nB+1}\sum \limits _{k=-nB}^{nB} Y_{ib+l+k}\right) \end{aligned}$$
Using Lemmas 1 and 2, assumption A.6.(i), and the error bounds established there, we can write the above sum as
$$\begin{aligned}&\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil } ^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{j=1}^{b} \sum \limits _{l=1}^{b}K((x-x_{ib+j})/h)K((x-x_{ib+l})/h)(c(j-l)\nonumber \\&\qquad +O_{p} ((nB)^{-\frac{1}{2}})) \end{aligned}$$
We note that Lemma 1 gives the \(O_{p}((nB)^{-\frac{1}{2}})\) rate for the covariance estimate, and Lemma 2 shows that this rate holds uniformly in the lag, so the associated constant can be treated as uniform. Letting \(C =\max _s K^{2}(s),\) we have
$$\begin{aligned} C\left(\frac{1}{nh}\right)\left(\frac{nh}{b}\right)(b^{2})\,O_{p}((nB)^{-\frac{1}{2}})=O_{p}\left( \frac{b}{\sqrt{nB}}\right) \rightarrow 0\quad \text{by A.6.(ii).} \end{aligned}$$
We therefore need to consider
$$\begin{aligned}&\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil } ^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{j=1}^{b} \sum \limits _{l=1}^{b}K((x-x_{ib+j})/h)K((x-x_{ib+l})/h)c(j-l)\\&\quad =\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{s=-b+1}^{b-1} \sum \limits _{j=1}^{b-\left| s\right| }K((x-x_{ib+j})/h)K((x-x_{ib+j+\left| s\right| })/h)c(s) \end{aligned}$$
We note that
$$\begin{aligned} K((x-x_{ib+j+\left|s\right|})/h)=K((x-x_{ib+j})/h)+\frac{\left|s\right|}{nh}K^{\prime }(\zeta _{i,j})\; \end{aligned}$$
where
$$\begin{aligned} (x-x_{ib+j+\left|s\right|})/h\le \zeta _{i,j}\le (x-x_{ib+j})/h. \end{aligned}$$
So we have \(nhVar^{*}(\hat{m}^{*}(x))\)
$$\begin{aligned}&=\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{s=-b+1}^{b-1} \sum \limits _{j=1}^{b-\left| s\right| }K((x-x_{ib+j})/h)\left( K((x-x_{ib+j})/h)+\frac{\left| s\right| }{nh}K^{\prime }(\zeta _{i,j})\right) c(s)\\&=\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{s=-b+1}^{b-1} \sum \limits _{j=1}^{b-\left| s\right| }K^{2}((x-x_{ib+j})/h)c(s)\\&\quad +\frac{1}{n^{2}h^{2}}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} \sum \limits _{s=-b+1}^{b-1}\left| s\right| \sum \limits _{j=1}^{b-\left| s\right| }K((x-x_{ib+j})/h)K^{\prime }(\zeta _{i,j})c(s)=V_{1,n}+V_{2,n} \end{aligned}$$
Let us consider \(V_{1,n}.\) We have
$$\begin{aligned} V_{1,n}&=\left( \frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} \sum \limits _{s=-b+1}^{b-1}\sum \limits _{j=1}^{b-\left| s\right| } K^{2}((x-x_{ib+j})/h)c(s)\right)\\&=\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\frac{1}{2nB+1}\sum \limits _{r=-nB} ^{nB}\sum \limits _{s=-b+1}^{b-1}\sum \limits _{j=1}^{b}K^{2}((x-x_{ib+j} )/h) E(e_{ib+r}e_{ib+r+\left| s\right| })\\&\quad +\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{s=-b+1}^{b-1}\left( \sum \limits _{j=1}^{b-\left| s\right| }K^{2}((x-x_{ib+j})/h)-\sum \limits _{j=1}^{b}K^{2}((x-x_{ib+j})/h)\right) c(s)\\&=\frac{1}{nh}\sum \limits _{s=-b+1}^{b-1}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} \sum \limits _{j=1}^{b}K^{2}((x-x_{ib+j})/h)c(s)+\tilde{V}_{1,n} \end{aligned}$$
(6.10)
We have that \(\left| \tilde{V}_{1,n}\right| \) is
$$\begin{aligned}&\left| \frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\sum \limits _{s=-b+1}^{b-1}\left( \sum \limits _{j=1}^{b-\left| s\right| }K^{2}((x-x_{ib+j})/h)-\sum \limits _{j=1}^{b}K^{2}((x-x_{ib+j})/h)\right) c(s)\right|\\&\quad \le 2C\sum \limits _{s=0}^{b-1}\frac{\left| s\right| }{b}\left| c(s)\right| \end{aligned}$$
and
$$\begin{aligned} 2C\frac{1}{b}\sum \limits _{s=0}^{b-1}\left| s\right| \left| c(s)\right| \rightarrow 0\quad \text{as } b\rightarrow \infty \end{aligned}$$
(6.11)
where \(K^{2}(\cdot)\le C.\) The limit is implied by A.2. by using the mixing inequality to write \(\left| c(i)\right| \le C\alpha _{e}(i)^{\frac{2+\delta }{4+\delta }}\) (since we have at least \(4+\delta\) moments). We then apply Kronecker’s Lemma to establish the limit.
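Spelled out, the last step reads as follows: since A.2. implies \(\alpha _{e}(q)=o(q^{-2})\) (cf. the proof of Theorem 1 (iii)), the above bound makes \(\left| c(s)\right| \) summable, and Kronecker’s lemma (applied with weights \(b_{s}=s\)) gives
$$\begin{aligned} \sum \limits _{s=1}^{\infty }\left| c(s)\right| <\infty \quad \Longrightarrow \quad \frac{1}{b}\sum \limits _{s=1}^{b-1}s\left| c(s)\right| \rightarrow 0\quad \text{as } b\rightarrow \infty . \end{aligned}$$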
As a result, (6.10) can be written as
$$\begin{aligned} \frac{1}{nh}\sum \limits _{s=-b+1}^{b-1}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} \sum \limits _{j=1}^{b}K^{2}((x-x_{ib+j})/h)c(s)\;+\;o_{p}(1) \end{aligned}$$
As a consequence, we arrive at the result
$$\begin{aligned}&\sum \limits _{s=-b+1}^{b-1}c(s)\frac{1}{nh}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} \sum \limits _{j=1}^{b}K^{2}((x-x_{ib+j})/h)\rightarrow \sum \limits _{s=-\infty }^{\infty }c(s)\int K^{2}(u)\,du\\&\quad \text{as } b\rightarrow \infty \;\text{and } n\rightarrow \infty . \end{aligned}$$
We have established that
$$\begin{aligned} V_{1,n}\rightarrow 2\pi f(0)\int K^{2}(u)\,du\quad \text{as } n\rightarrow \infty . \end{aligned}$$
Consider \(V_{2,n}.\)
$$\begin{aligned} \left| V_{2,n}\right|&\le \frac{1}{n^{2}h^{2}}\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} \sum \limits _{s=-b+1}^{b-1}\left| s\right| \left| c(s)\right| \sum \limits _{j=1}^{b-\left| s\right| }K((x-x_{ib+j})/h)\left| K^{\prime }(\zeta _{i,j})\right|\\&\le \left( \frac{1}{nh}\sum \limits _{s=-b+1}^{b-1}\left| c(s)\right| \left| s\right| \right) \left( \frac{1}{nh}\sum \limits _{i=-nh/2}^{nh/2}K((x-x_{i} )/h)\right) M \end{aligned}$$
We use the fact that \(\left| K^{\prime }(.)\right| \le M.\) We also note that
$$\begin{aligned}&\left( \frac{1}{nh}\sum \limits _{i=-nh/2}^{nh/2}K((x-x_{i})/h)\right) \rightarrow 1\quad \text{as } n\rightarrow \infty \\&\qquad \qquad \qquad \left| \frac{1}{nh}\sum \limits _{s=-b+1}^{b-1}c(s)\left| s\right| \right| <\left| \frac{2}{b}\sum \limits _{s=0}^{b}c(s)\left| s\right| \right| \rightarrow 0. \end{aligned}$$
The limit is implied by A.2. by using the mixing inequality and Kronecker’s Lemma as explained for (6.11). Therefore we have
$$\begin{aligned} nhVar^{*}(\hat{m}^{*}(x))\rightarrow 2\pi f(0)\int K^{2}(u)\,du\quad \text{as } n\rightarrow \infty . \end{aligned}$$
\(\square \)
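A finite-sample impression of this limit (outside the formal proof) can be obtained by Monte Carlo: in a hypothetical design with a sine trend, AR(1) errors with unit innovation variance and a triangular kernel on \([-1/2,1/2]\), the simulated value of \(nhVar^{*}(\hat{m}^{*}(x))\) is comparable to \(2\pi f(0)\int K^{2}(u)\,du=\frac{4}{3}(1-\phi )^{-2}\), and the agreement improves as \(n\) and \(b\) grow.

```python
import numpy as np

rng = np.random.default_rng(3)

# hypothetical design: sine trend, AR(1) errors, triangular kernel on [-1/2, 1/2]
n, h, B, b, phi, x = 50000, 0.04, 0.04, 20, 0.5, 0.5
nB = int(n * B)
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi * e[t - 1] + rng.normal()
xi = np.arange(1, n + 1) / n
Y = np.sin(2 * np.pi * xi) + e
K = 2.0 * np.clip(1.0 - 2.0 * np.abs((x - xi) / h), 0.0, None)

starts = np.arange(0, n, b)                       # block start positions
base = np.repeat(starts, b) + np.tile(np.arange(b), len(starts))

def mhat_star():
    """One LBB draw of m*(x): every length-b block is shifted by an
    independent uniform draw from {-nB, ..., nB} (indices clipped at the ends)."""
    shifts = np.repeat(rng.integers(-nB, nB + 1, size=len(starts)), b)
    return np.sum(K * Y[np.clip(base + shifts, 0, n - 1)]) / (n * h)

draws = np.array([mhat_star() for _ in range(2000)])
boot = n * h * draws.var()                        # simulated nh Var*(mhat*(x))
target = 4.0 / 3.0 / (1 - phi) ** 2               # 2*pi*f(0) * int K^2(u) du
print(round(boot, 2), round(target, 2))           # comparable; closer as n, b grow
```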

Proof of Theorem 4

As in the proof of Theorem 2, assume \(n\) is large enough that \(h\) and \(B\) are small enough to ensure that the effect of end points is negligible. Let us consider the expression
$$\begin{aligned} (nh)^{-\frac{1}{2}}\sum \limits _{i=1}^{n}K((x-x_{i})/h)(Y_{i}^{*}-E^{*}Y_{i}^{*}) \end{aligned}$$
Let \(e_{i}^{*}=Y_{i}^{*}-E^{*}Y_{i}^{*}\). Using (3.2) we write
$$\begin{aligned}&\left( nh\right) ^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))=\left( nh\right) ^{-\frac{1}{2}}\sum \limits _{i=\left\lceil -nh/2+xn\right\rceil }^{\left\lfloor nh/2+xn\right\rfloor }K((x-x_{i})/h)e_{i}^{*}\\&\qquad =\left( nh\right) ^{-\frac{1}{2}}\sum \limits _{i=\left\lceil -nh/2+xn\right\rceil }^{b\left\lceil \left\lceil -nh/2+xn\right\rceil /b\right\rceil }K((x-x_{i})/h)e_{i}^{*}+\left( nh\right) ^{-\frac{1}{2} }\sum \limits _{i=b\left\lceil \left\lceil -nh/2+xn\right\rceil /b\right\rceil +1}^{\left\lfloor \left\lfloor nh/2+xn\right\rfloor /b\right\rfloor b}K((x-x_{i})/h)e_{i}^{*}\\&\quad \qquad +\left( nh\right) ^{-\frac{1}{2}}\sum \limits _{i=\left\lfloor \left\lfloor nh/2+xn\right\rfloor /b\right\rfloor b+1}^{\left\lfloor nh/2+xn\right\rfloor }K((x-x_{i})/h)e_{i}^{*} \end{aligned}$$
\(=T_{1,n}+V_{n}+T_{2,n}\), respectively. We observe that \(T_{1,n}\) and \(T_{2,n}\) are sums over truncated bootstrap blocks and are independent of each other and of \(V_{n}\). By A.6.(iii) both \(T_{1,n}\) and \(T_{2,n}\) tend to zero at rate \(O(b/\sqrt{nh})\). Therefore we need only consider the term \(V_{n}\), which is the sum of approximately \(nh/b\) independent blocks of size \(b\). We write
$$\begin{aligned} V_{n}&= \sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\left( (nh)^{-\frac{1}{2}}\sum \limits _{j=1} ^{b}K((x-x_{ib+j})/h)e_{ib+j}^{*}\right)\\&= \sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}\xi _{i,n}^{*} \end{aligned}$$
Let
$$\begin{aligned} \xi _{i,n}^{*}=(nh)^{-\frac{1}{2}}\sum \limits _{j=1}^{b}K((x-x_{ib+j} )/h)e_{ib+j}^{*} \end{aligned}$$
The \(\xi _{i,n}^{*}\) are sums over independent blocks of size \(b\). We need to show that the \(\xi _{i,n}^{*}\) satisfy the Lindeberg-Feller condition, so that we can invoke the central limit theorem for triangular arrays of sums of independent, non-identically distributed random variables. Since we have moments of order \(6+\delta\), we can use the Lyapunov condition (which implies Lindeberg-Feller) to prove this result. From Theorem 3 we know that
$$\begin{aligned} \lim _{n\rightarrow \infty }\sum \limits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1} Var^{*}(\xi _{i,n}^{*})=\sigma _{as}^{2}. \end{aligned}$$
We also have
$$\begin{aligned} E^{*}\left( (\xi _{i,n}^{*})^{6}\right) =E^{*}\left( (nh)^{-\frac{1}{2}} \sum \limits _{j=1}^{b}K((x-x_{ib+j})/h)e_{ib+j}^{*}\right) ^{6}=\frac{1}{(nh)^{3}}O_{p}(b^{6}) \end{aligned}$$
Therefore we obtain
$$\begin{aligned} \frac{\sum \nolimits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}E^{*}\left( (\xi _{i,n}^{*})^{6}\right) }{\left( \sum \nolimits _{i=\left\lceil (-nh/2+xn)/b\right\rceil }^{\left\lfloor (nh/2+xn)/b\right\rfloor -1}Var^{*}(\xi _{i,n}^{*})\right) ^{\frac{6}{2}}}=\frac{\frac{nh}{b}\left( \frac{1}{(nh)^{3}}O_{p}(b^{6})\right) }{(\sigma _{as}^{2})^{3}}=O_{p}\left( \frac{b^{5}}{(nh)^{2}}\right) \end{aligned}$$
We apply A.6.(iii) and invoke Lyapunov’s condition to show that \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) is asymptotically normal with mean zero. Furthermore, from Theorem 3 we know that the variance is \(\sigma _{as}^{2}.\) This establishes the result
$$\begin{aligned} \sup _{u}\left|P^{*}\left((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*} \hat{m}^{*}(x))\le u\right)-\Phi \left( \frac{u}{\sigma _{as}}\right) \right|\rightarrow _{p}0. \end{aligned}$$
We know that if \(h=o( n^{-\frac{1}{5}}\)) we have
$$\begin{aligned} (nh)^{\frac{1}{2}}(E\hat{m}(x)-m(x))=o(1). \end{aligned}$$
The second result follows from the above equation and Slutsky’s lemma. \(\square \)
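The asymptotic normality itself can be inspected in the same hypothetical design used after the proof of Theorem 3: standardizing a large number of LBB draws of \((nh)^{\frac{1}{2}}\hat{m}^{*}(x)\) yields empirical quantiles close to those of the standard normal distribution.

```python
import numpy as np

rng = np.random.default_rng(4)

# same hypothetical design as in the sketch following the proof of Theorem 3
n, h, B, b, phi, x = 50000, 0.04, 0.04, 20, 0.5, 0.5
nB = int(n * B)
e = np.zeros(n)
for t in range(1, n):
    e[t] = phi * e[t - 1] + rng.normal()
xi = np.arange(1, n + 1) / n
Y = np.sin(2 * np.pi * xi) + e
K = 2.0 * np.clip(1.0 - 2.0 * np.abs((x - xi) / h), 0.0, None)
starts = np.arange(0, n, b)
base = np.repeat(starts, b) + np.tile(np.arange(b), len(starts))

def mhat_star():
    # one LBB draw of mhat*(x): each block gets an independent local shift
    shifts = np.repeat(rng.integers(-nB, nB + 1, size=len(starts)), b)
    return np.sum(K * Y[np.clip(base + shifts, 0, n - 1)]) / (n * h)

draws = np.sqrt(n * h) * np.array([mhat_star() for _ in range(4000)])
z = (draws - draws.mean()) / draws.std()                # centred and scaled LBB draws
print(np.round(np.quantile(z, [0.05, 0.5, 0.95]), 2))   # roughly -1.64, 0.00, 1.64
```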

Proof of Theorem 5

Observe that \(\sigma _{as}^{2}=2\pi f(0)\int K^{2}(u)\,du\) does not depend on \(x\). This is due to the stationarity of the errors \(e_{t}\) and because \(m(x)\) is a deterministic function. We also note that \(\hat{m}^{*}(a_{i})\) and \(\hat{m}^{*}(a_{j})\) for \(i\ne j\) are asymptotically independent. This is because the number of observations between \(a_{i}\) and \(a_{j}\) is \(n\left|a_{i}-a_{j}\right|\) and, since the kernel window smooths over \(nh\) observations, \(2nh+b<n\left|a_{i}-a_{j}\right|\) for \(n\) large enough. This implies that \(\hat{m}^{*}(a_{i})\) and \(\hat{m}^{*}(a_{j})\) are independent for \(i\ne j\), because they are kernel estimators computed over disjoint and independent observations of the pseudo series. Also, \(\hat{m}(a_{i})\) and \(\hat{m}(a_{j})\) are asymptotically independent because \(2nh<n\left|a_{i}-a_{j}\right|\) and the \(e_{t}\) are strong mixing. We now use Theorem 4 to establish the result. Let \(\lambda =(\lambda _{1},\lambda _{2},\ldots ,\lambda _{d})\in R^{d}.\) Then
$$\begin{aligned} P^{*}\left(\lambda ^{T}\left((nh)^{\frac{1}{2}}(\hat{m}^{*}(a_{1})-E^{*}\hat{m}^{*} (a_{1})),\ldots ,(nh)^{\frac{1}{2}}(\hat{m}^{*}(a_{d})-E^{*}\hat{m}^{*}(a_{d}))\right)\le u\right)-\Phi \left( \frac{u}{((\lambda _{1}^{2}+\lambda _{2}^{2}+\cdots +\lambda _{d}^{2})\sigma _{as}^{2})^{\frac{1}{2}}}\right) \rightarrow _{p}0. \end{aligned}$$
Therefore, as an application of the Cramér-Wold device, we have our result.
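The separation condition above can be checked directly for concrete (hypothetical) values of \(n\), \(h\), \(b\) and two points \(a_{i}\), \(a_{j}\): once \(2nh+b<n\left|a_{i}-a_{j}\right|\), the two kernel windows, each enlarged by one resampling block, are disjoint.

```python
import numpy as np

# hypothetical values: the estimators at a_i and a_j share no resampling block
n, h, b = 100000, 0.02, 30
a_i, a_j = 0.30, 0.55
win_i = np.arange(int(n * (a_i - h / 2)) - b, int(n * (a_i + h / 2)) + b + 1)
win_j = np.arange(int(n * (a_j - h / 2)) - b, int(n * (a_j + h / 2)) + b + 1)
print(2 * n * h + b < n * abs(a_j - a_i),        # the separation condition
      len(np.intersect1d(win_i, win_j)) == 0)    # extended windows disjoint
```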

Our second result can be obtained by observing that
$$\begin{aligned} (nh)^{\frac{1}{2}}(E\hat{m}(a_{i})-m(a_{i}))=o(1)\quad \text{ for} \text{ all} \ i. \end{aligned}$$
We can apply Slutsky’s lemma to obtain the second part of our theorem. \(\square \)
