# Local block bootstrap inference for trending time series

Dowla, A., Paparoditis, E. & Politis, D.N. Metrika (2013) 76: 733. doi:10.1007/s00184-012-0413-9

## Abstract

Resampling for stationary sequences has been well studied in the last couple of decades. In the paper at hand, we focus on nonstationary time series data where the nonstationarity is due to a slowly-changing deterministic trend. We show that the *local block bootstrap* methodology is appropriate for inference under this locally stationary setting without the need of detrending the data. We prove the asymptotic consistency of the local block bootstrap in the smooth trend model, and complement the theoretical results by a finite-sample simulation.

### Keywords

Bootstrap · Dependent data · Kernel smoothing · Local stationarity · Regression

## 1 Introduction

Resampling for stationary sequences has been well studied in the last couple of decades; see the monograph of Lahiri (2003), or the recent review paper by Kreiss and Paparoditis (2011) and the references therein. Nevertheless, it is often unrealistic to assume that the stochastic structure of a time series stays invariant over a long stretch of time; a more realistic model might be to assume a slowly-changing stochastic structure.

One such framework defines a *locally stationary* process \(\{X_{t,n}, t=1,2,\ldots ,n\}\) by means of a time-varying spectral representation; in this paper we focus on the special case where the nonstationarity enters through a smooth deterministic trend.

To address this locally stationary setting, we propose to use the *local block bootstrap* (LBB), which is a modification of the block bootstrap of Künsch (1989) and Liu and Singh (1992). The idea of the local block bootstrap was introduced by Paparoditis and Politis (2002), and Dowla et al. (2003). The crux of the LBB methodology is that a bootstrap series is formed by resampling blocks as in the block bootstrap method, but with the constraint that to fill a position \(x\) (say) in the bootstrap series only blocks that are ‘near’ \(x\) in the original series are considered as possible candidates; the local block bootstrap will be rigorously defined shortly.

One difficulty of the sieve bootstrap in this context is that a pilot estimator has to be constructed, and a smoothing parameter for the pilot has to be determined. An additional limitation is that the error process was assumed by Bühlmann (1998) to be an AR(\(\infty \)) time series, although this might have been just an artifact of his method of proof. It seems natural to desire a methodology for trend function inference under fewer restrictions on the error process; the local block bootstrap achieves this goal by replacing linearity with a strong mixing assumption. Note also that the local block bootstrap does not require the construction of pilot estimates, which may be a tricky issue in practice.

Our paper is organized as follows. In Sect. 2 the model assumptions and kernel estimator are discussed. In Sect. 3 we describe the bootstrap algorithm. In Sect. 4 the main results are presented along with their supporting lemmas. In Sect. 5 we give the simulation results. All proofs are given in the last section.

## 2 Kernel estimator of the trend function

- A.1.
\(\{ e_{t}\}_{t\in \mathbb{N}}\) is a strictly stationary, strong mixing process with mean zero and autocovariance sequence \(c(k)=E(e_t e_{t+k})\).

- A.2.
For some \(\delta >0\), \(E\left|e_{t}\right|^{6+\delta }<\infty \) and \(\sum \nolimits _{i=0}^{\infty }i^{2}\alpha _{e}(i)^{\delta /(6+\delta )} <\infty \), where \(\alpha _{e}\) is the strong mixing coefficient of \(\{ e_{t}\}_{t\in \mathbb{N}}\). To define our nonparametric estimator we need a kernel with the following properties.

- A.3.
- i)
The kernel \(K\) is a symmetric probability density that is twice continuously differentiable.

- ii)
\(\int u^{2}K(u)\,du<\infty \)

- iii)
\(\int K^{2}(u)\,du<\infty \)

- iv)
\(K\) is compactly supported on the interval \([-\frac{1}{2},\frac{1}{2}]\).

- A.4.
\(h=h(n)\) is the bandwidth of the kernel with \(h\rightarrow 0\) and \(nh\rightarrow \infty \) as \(n\rightarrow \infty .\)

- A.5.
The data \(Y_{1},\ldots ,Y_{n}\) are observations from the model:

**Theorem 1**

Assume \(x\in (0,1)\). Under assumptions A.1.–A.5. and \(h=O(n^{-\frac{1}{5}})\) the following hold:

- i)
\(\lim \limits _{n\rightarrow \infty }(nh)^{\frac{1}{2}}(E(\hat{m} (x))-m(x))=B_{as}(x).\)

- ii)
\(\lim \limits _{n\rightarrow \infty }\mathrm{Var}\left((nh)^{\frac{1}{2}}\hat{m}(x)\right)=\sigma _{as}^{2}.\)

- iii)
\((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\Longrightarrow N(0,\sigma _{as}^{2})\) as \(n\rightarrow \infty .\)
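As a concrete illustration (ours, not the paper's), the fixed-design kernel estimator \(\hat{m}(x)=\frac{1}{nh}\sum _{i=1}^{n}K((x-x_{i})/h)Y_{i}\) with \(x_{i}=i/n\) can be sketched in Python; the triweight kernel rescaled to \([-\frac{1}{2},\frac{1}{2}]\) is one kernel choice consistent with A.3:

```python
def kernel(u):
    # Triweight kernel rescaled to support [-1/2, 1/2]: a symmetric,
    # twice continuously differentiable probability density (A.3 i-iv).
    return (35.0 / 16.0) * (1.0 - 4.0 * u * u) ** 3 if abs(u) <= 0.5 else 0.0

def m_hat(x, y, h):
    # Fixed-design kernel estimator on the grid x_i = i/n:
    # m_hat(x) = (nh)^{-1} sum_i K((x - x_i)/h) * Y_i
    n = len(y)
    return sum(kernel((x - (i + 1) / n) / h) * y[i] for i in range(n)) / (n * h)
```

For a constant series the estimator reproduces the constant up to a small Riemann-sum error, a quick sanity check that the weights sum to approximately one.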

## 3 Local block bootstrap algorithm

The goal is to generate a pseudo time series \(Y_{1}^{*},\ldots , Y_{n}^{*}\) by concatenating blocks of size \(b\) which are resampled from the original series by a probability mechanism. The probability mechanism will select blocks of size \(b\) from a set of blocks that are indexwise close to the original time series. This method of choosing blocks ‘locally’ with respect to the time index is the main idea behind the local block bootstrap (LBB) of Paparoditis and Politis (2002), and Dowla et al. (2003).

- 1
Select a number \(B\in (0,1)\) such that \(nB\) is a positive integer.

- 2
Let \(k_{0},k_{1},\ldots ,k_{q}\) (with \(q=\left\lfloor n/b\right\rfloor -1\)) be i.i.d. integers drawn from the discrete probability distribution that assigns probability \(w(k)=1/(2nB+1)\) to each value \(k\) with \(-nB \le k\le nB.\)

- 3
Define the bootstrap pseudo series \(Y^*_1,\ldots , Y^*_n\) by \(Y_{j+ib}^{*} =Y _{j+ib+k_{i}}\) for \(j=1,\ldots ,b,\) where \(k_{i}\) is as given above for \(i=0,\ldots ,\left\lfloor n/b\right\rfloor -1\).

- 4
Based on the bootstrap sample, define the bootstrap estimator \(\hat{m}^{*}(x)\) as
$$\begin{aligned} \hat{m}^{*}(x)=\frac{1}{nh}\sum \limits _{i=1}^{n}K((x-x_{i})/h)Y_{i}^{*} \end{aligned}$$
(3.1)

We observe that the \(k_{i}\) allow us to replace a designated block of \(b\) observations by another block of the same size but shifted by \(k_{i}\) indices. The range of \(k\), i.e., \(-nB\le k\le nB\), determines the size of the window from which a block of size \(b\) can be selected.
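Steps 1–3 above can be sketched as follows (a minimal illustration; restricting the shift near the boundaries so that the source block stays inside the series is our own convention, not part of the algorithm as stated):

```python
import random

def local_block_bootstrap(y, b, nB, rng=random):
    # Build one pseudo-series Y*_1,...,Y*_n by concatenating blocks of size b.
    # Block i (0-based start i*b) is copied from the original series shifted
    # by k_i, drawn uniformly from {-nB, ..., nB} (probability 1/(2nB+1) each).
    n = len(y)
    y_star = []
    for i in range(n // b):
        start = i * b
        # Boundary convention (our assumption): keep the source block in range.
        lo = max(-nB, -start)
        hi = min(nB, n - b - start)
        k = rng.randint(lo, hi)
        y_star.extend(y[start + k : start + k + b])
    return y_star
```

Each pseudo-block is a contiguous stretch of the original series, so the within-block dependence structure is preserved exactly.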

Observe that the local block bootstrap procedure has to capture the stochastic structure of the error process locally, allow for the asymptotic normality of the estimator and ensure that bias is negligible. This would require the rates of the block size \(b\), the bootstrap window size \(nB\) and the kernel window size \(nh\) to be appropriately chosen so that all the conditions are satisfied simultaneously. Fortunately, we can satisfy all our requirements with four easily satisfiable constraints. We state them as our last assumptions and motivate them subsequently.

- A.6.
For simplicity of notation we want \(nB\) and \(nh\) to be positive integers, and we can pick a sequence of numbers which satisfies the order conditions given.
- (i)
\(\frac{b^{\frac{5}{2}}}{nh}\rightarrow 0\;\;[ \text{ i.e.} \ \ n^{\frac{5}{2} \delta _{1}}=o(n^{(1-\delta _{3})})]\).

- (ii)
\(nB=o(n^{\frac{2}{3}})\;[n^{\frac{1}{3}} =o(n^{\delta _{2}})].\)

- A.7.
\(nh = o(n^{\frac{4}{5}})\) [i.e., \(n^{\frac{1}{5}}=o(n^{\delta _{3}})\), or equivalently, \(h=o(n^{-\frac{1}{5}})\)].

## 4 Main results

The local block bootstrap relies on the smoothness of \(m(x)\). The following result allows us to establish the consistency of the bootstrap estimator \(\hat{m}^{*}(x)\) and provides information about its rate of convergence to \(\hat{m}(x)\) in terms of the size of the bootstrap window. Let us consider the expected value of \(\hat{m}^{*}(x).\)

**Theorem 2**

We call an index \(k\) an *end point* if either \(1\le k \le nB\) or \(n-nB \le k \le n.\) When \(\left\lceil \frac{i}{b}\right\rceil -\left\lceil \frac{j}{b}\right\rceil =0\), that is, \(i\) and \(j\) are in the same bootstrap block, and neither \(i\) nor \(j\) is an end point, then we have

**Lemma 1**

\(c^{*}(i,j)=c(i-j)+O_{p}((nB)^{-\frac{1}{2}} )\) when \(\left\lceil \frac{i}{b}\right\rceil -\left\lceil \frac{j}{b}\right\rceil =0\) for any fixed \(i\) and \(j\) that are not end points.

Now we need to ensure that the constant in the order term in Lemma 1 does not depend on \(i\) and \(j \). This is the subject of the following lemma.

**Lemma 2**

Under assumptions A.1–A.6, if \(i\) and \(j\) are not end points, then \(c^{*}(i,j)=c(i-j)+O_{p}((nB)^{-\frac{1}{2}})\) where the \(O_{p}((nB)^{-\frac{1}{2}})\) term does not depend on \(i\) and \(j\).

The following result establishes that the asymptotic variance of the bootstrap estimator is the same as that of the regular kernel estimator—see Theorem 1.

**Theorem 3**

Theorem 3 gives us a method to estimate the variance of the regular kernel estimator computationally, by generating many bootstrap samples and computing the variance of \(\hat{m}^{*}(x)\); this would otherwise be a difficult undertaking. We can form confidence intervals for \(m(x)\) using asymptotic normality with variance estimated by the LBB (i.e., a combination of Theorems 1 and 3) by writing a \((1-\alpha )100\,\%\) level confidence interval as \(\hat{m}(x)\pm \hat{\sigma }_{as}^{*}(nh)^{-\frac{1}{2}}z_{\alpha /2}.\) Here \(\hat{\sigma }_{as}^{*2}\) is the variance of \((nh)^{\frac{1}{2}}\hat{m}^{*}(x)\) computed from bootstrap resamples.
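Given LBB replicates of \(\hat{m}^{*}(x)\), the interval \(\hat{m}(x)\pm \hat{\sigma }_{as}^{*}(nh)^{-\frac{1}{2}}z_{\alpha /2}\) can be computed as in this sketch (the function name and argument layout are ours):

```python
import math

def lbb_normal_ci(m_hat_x, boot_estimates, n, h, z=1.96):
    # sigma*^2_as = sample variance of (nh)^{1/2} m_hat*(x) over resamples;
    # the interval is m_hat(x) +- sigma*_as * (nh)^{-1/2} * z_{alpha/2}.
    r = len(boot_estimates)
    mean_b = sum(boot_estimates) / r
    sigma2_star = (n * h) * sum((v - mean_b) ** 2 for v in boot_estimates) / (r - 1)
    half = z * math.sqrt(sigma2_star) / math.sqrt(n * h)
    return (m_hat_x - half, m_hat_x + half)
```

Note that the \((nh)^{1/2}\) factors cancel, so the half-width reduces to \(z_{\alpha /2}\) times the sample standard deviation of the bootstrap estimates themselves.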

**Theorem 4**

*undersmoothing* is required if one wants to avoid an explicit bias correction.

A multivariate extension is given below.

**Theorem 5**

Theorem 5 is proven using the Cramér–Wold device and the asymptotic independence of \(\hat{m}^{*}(a_{i})\) and \(\hat{m}^{*}(a_{j})\) for \(i\ne j\). This result allows us to construct *simultaneous* confidence intervals for the underlying trend function.
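One way to exploit the asymptotic independence in Theorem 5 for simultaneous intervals (our sketch, assuming the product rule for independent events): give each of the \(d\) marginal intervals level \((1-\alpha )^{1/d}\), so that the joint coverage is approximately \(1-\alpha \).

```python
from statistics import NormalDist

def simultaneous_z(alpha, d):
    # Per-point critical value: each marginal interval gets level
    # (1 - alpha)^(1/d); with asymptotic independence across the d
    # points the joint level is then approximately 1 - alpha.
    marginal = (1.0 - alpha) ** (1.0 / d)
    return NormalDist().inv_cdf(0.5 + marginal / 2.0)
```

For \(d=1\) this reduces to the usual \(z_{\alpha /2}\); the critical value grows slowly with \(d\).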

## 5 Simulations

**Table 1** Coverage of 95 % confidence intervals for \(m(.1)\) (\(m^{(1)}(.1) = 13.5\) and \(m^{(2)}(.1) = -23.2\))

| Kernel window \(n^{1-\delta _{3}}\) | Bootstrap window \(n^{1-\delta _{2}}\) | Block size \(n^{\delta _{1}}\) | Pivotal (Normal), \(n=10^{3}\) | Pivotal (Normal), \(n=10^{4}\) |
|---|---|---|---|---|
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 80.0 (84.5) | 87.0 (87.7) |
| | | .19 (4) (6) | 82.0 (86.0) | 89.7 (91.0) |
| | | .22 (5) (8) | 79.0 (84.5) | 88.7 (90.5) |
| | .46 (23) (69) | .16 (3) (4) | 80.0 (85.5) | 86.8 (89.7) |
| | | .19 (4) (6) | 81.2 (86.3) | 85.5 (87.7) |
| | | .22 (5) (8) | 82.5 (86.3) | 89.7 (91.0) |
| | .48 (27) (83) | .16 (3) (4) | 76.0 (82.5) | 84.8 (88.2) |
| | | .19 (4) (6) | 78.3 (85.8) | 89.0 (91.8) |
| | | .22 (5) (8) | 80.0 (84.8) | 90.0 (92.0) |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 79.3 (81.7) | 85.3 (85.0) |
| | | .19 (4) (6) | 81.7 (84.8) | 89.2 (89.7) |
| | | .22 (5) (8) | 83.7 (86.8) | 90.5 (91.5) |
| | .46 (23) (69) | .16 (3) (4) | 82.2 (86.0) | 86.0 (87.0) |
| | | .19 (4) (6) | 84.5 (87.0) | 89.7 (91.3) |
| | | .22 (5) (8) | 83.0 (85.8) | 89.7 (90.5) |
| | .48 (27) (83) | .16 (3) (4) | 83.0 (86.3) | 84.8 (86.8) |
| | | .19 (4) (6) | 84.0 (86.8) | 90.0 (90.0) |
| | | .22 (5) (8) | 84.0 (88.7) | 90.2 (92.0) |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 83.5 (84.8) | 87.3 (87.5) |
| | | .19 (4) (6) | 84.0 (86.0) | 85.8 (86.5) |
| | | .22 (5) (8) | 87.3 (88.5) | 89.7 (89.5) |
| | .46 (23) (69) | .16 (3) (4) | 88.7 (88.5) | 89.7 (89.5) |
| | | .19 (4) (6) | 90.0 (89.7) | 85.5 (86.8) |
| | | .22 (5) (8) | 89.5 (91.5) | 93.3 (93.3) |
| | .48 (27) (83) | .16 (3) (4) | 88.0 (89.5) | 87.5 (89.0) |
| | | .19 (4) (6) | 86.8 (90.0) | 88.7 (88.5) |
| | | .22 (5) (8) | 87.0 (87.5) | 92.3 (92.0) |

**Table 2** Coverage of 95 % confidence intervals for \(m(.3)\) (\(m^{(1)}(.3) = 7.7\) and \(m^{(2)}(.3) = -35.4\))

| Kernel window \(n^{1-\delta _{3}}\) | Bootstrap window \(n^{1-\delta _{2}}\) | Block size \(n^{\delta _{1}}\) | Pivotal (Normal), \(n=10^{3}\) | Pivotal (Normal), \(n=10^{4}\) |
|---|---|---|---|---|
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 76.7 (81.5) | 84.5 (85.8) |
| | | .19 (4) (6) | 81.2 (85.8) | 88.7 (90.2) |
| | | .22 (5) (8) | 84.0 (87.3) | 86.3 (86.8) |
| | .46 (23) (69) | .16 (3) (4) | 73.8 (79.8) | 86.0 (87.7) |
| | | .19 (4) (6) | 79.3 (84.5) | 87.5 (88.2) |
| | | .22 (5) (8) | 83.5 (86.8) | 89.5 (92.3) |
| | .48 (27) (83) | .16 (3) (4) | 74.8 (82.7) | 84.8 (88.5) |
| | | .19 (4) (6) | 79.0 (83.7) | 85.5 (88.2) |
| | | .22 (5) (8) | 82.0 (89.0) | 87.0 (90.8) |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 82.0 (83.7) | 86.0 (86.3) |
| | | .19 (4) (6) | 83.5 (86.3) | 85.8 (86.0) |
| | | .22 (5) (8) | 84.5 (86.3) | 91.8 (92.8) |
| | .46 (23) (69) | .16 (3) (4) | 83.5 (87.0) | 87.7 (87.7) |
| | | .19 (4) (6) | 82.7 (85.5) | 88.7 (89.7) |
| | | .22 (5) (8) | 81.7 (85.3) | 91.8 (93.0) |
| | .48 (27) (83) | .16 (3) (4) | 81.0 (84.2) | 88.7 (90.2) |
| | | .19 (4) (6) | 84.5 (86.0) | 88.7 (91.5) |
| | | .22 (5) (8) | 83.0 (87.7) | 88.7 (90.5) |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 79.5 (81.7) | 88.2 (87.7) |
| | | .19 (4) (6) | 84.2 (86.0) | 89.7 (90.0) |
| | | .22 (5) (8) | 87.3 (88.2) | 90.2 (91.8) |
| | .46 (23) (69) | .16 (3) (4) | 84.5 (85.0) | 87.7 (88.2) |
| | | .19 (4) (6) | 82.5 (84.2) | 86.8 (88.0) |
| | | .22 (5) (8) | 82.0 (84.0) | 88.2 (89.2) |
| | .48 (27) (83) | .16 (3) (4) | 79.5 (80.8) | 84.5 (85.3) |
| | | .19 (4) (6) | 84.0 (86.3) | 90.2 (90.8) |
| | | .22 (5) (8) | 87.5 (90.2) | 87.3 (88.2) |

**Table 3** Coverage of 95 % confidence intervals for \(m(.5)\) (\(m^{(1)}(.5) = 0\) and \(m^{(2)}(.5) = -40\))

| Kernel window \(n^{1-\delta _{3}}\) | Bootstrap window \(n^{1-\delta _{2}}\) | Block size \(n^{\delta _{1}}\) | Pivotal (Normal), \(n=10^{3}\) | Pivotal (Normal), \(n=10^{4}\) |
|---|---|---|---|---|
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 79.3 (83.7) | 85.3 (86.3) |
| | | .19 (4) (6) | 80.0 (84.8) | 86.5 (89.2) |
| | | .22 (5) (8) | 81.7 (87.5) | 89.2 (91.3) |
| | .46 (23) (69) | .16 (3) (4) | 80.0 (86.8) | 84.8 (87.3) |
| | | .19 (4) (6) | 83.0 (87.0) | 89.2 (91.5) |
| | | .22 (5) (8) | 80.3 (86.3) | 88.2 (90.5) |
| | .48 (27) (83) | .16 (3) (4) | 74.8 (83.2) | 88.0 (90.2) |
| | | .19 (4) (6) | 80.3 (88.0) | 85.5 (88.7) |
| | | .22 (5) (8) | 82.7 (87.0) | 86.0 (90.5) |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 83.7 (85.0) | 83.5 (84.5) |
| | | .19 (4) (6) | 80.8 (82.2) | 90.8 (92.8) |
| | | .22 (5) (8) | 82.2 (84.5) | 92.3 (93.0) |
| | .46 (23) (69) | .16 (3) (4) | 79.3 (83.7) | 84.2 (85.0) |
| | | .19 (4) (6) | 81.5 (84.2) | 89.2 (89.2) |
| | | .22 (5) (8) | 83.0 (85.0) | 90.5 (92.0) |
| | .48 (27) (83) | .16 (3) (4) | 81.2 (84.0) | 87.0 (90.0) |
| | | .19 (4) (6) | 84.5 (87.0) | 86.5 (88.0) |
| | | .22 (5) (8) | 82.0 (85.3) | 88.7 (90.0) |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 84.2 (84.8) | 84.5 (85.0) |
| | | .19 (4) (6) | 77.0 (79.5) | 88.2 (89.5) |
| | | .22 (5) (8) | 85.5 (86.5) | 90.2 (90.0) |
| | .46 (23) (69) | .16 (3) (4) | 81.5 (82.0) | 87.7 (88.2) |
| | | .19 (4) (6) | 87.7 (88.5) | 88.5 (89.5) |
| | | .22 (5) (8) | 85.0 (85.3) | 89.7 (91.0) |
| | .48 (27) (83) | .16 (3) (4) | 79.3 (81.7) | 88.2 (88.5) |
| | | .19 (4) (6) | 84.0 (85.5) | 90.2 (90.5) |
| | | .22 (5) (8) | 86.3 (88.2) | 92.5 (92.5) |

The first column gives the kernel window size \(nh\), which can be written as \(n^{1-\delta _{3}}\); the second column gives the bootstrap window size \(nB\), which is \(n^{1-\delta _{2}}\); and the third column gives the block size. The exponent corresponding to each size is given along with the exact number corresponding to the two different data sizes that we use. An equal-tailed 95 % bootstrap confidence interval is constructed using the pivotal method as described in Politis (1998) and the coverage is computed. A 95 % confidence interval based on the asymptotic normality of the bootstrap estimator is also given.

Our choice for the size of the kernel window is based on the fact that we want to undersmooth to remove bias. For that reason we choose \(h=o(n^{-\frac{1}{5}})\), which restricts us to window sizes of order \(o(n^{4/5})\). Our bootstrap window is restricted to \(o(n^{2/3})\), and our block sizes need to have an exponent less than half of the exponent of the bootstrap window. Based on these restrictions, which are explicitly stated in our assumptions, we choose a suitable range of values for our simulation.
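The tabulated sizes can be reproduced from the exponents; flooring the windows and rounding the block size is one convention that matches the numbers in the tables (our reconstruction, not stated in the paper):

```python
def table_sizes(n, delta3, delta2, delta1):
    # Kernel window nh = n^(1-delta3) and bootstrap window nB = n^(1-delta2),
    # both truncated down; block size b = n^(delta1), rounded to the
    # nearest integer.
    return int(n ** (1 - delta3)), int(n ** (1 - delta2)), round(n ** delta1)
```

For example, `table_sizes(10**3, 0.35, 0.56, 0.16)` gives (89, 20, 3), matching the first row of Table 1.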

**Table 4** Variance comparison at \(x=.1\): variance of \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) [denoted \(V^{*}\)] compared to the variance of \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\) [denoted \(V\)]

| Kernel window \(n^{1-\delta _{3}}\) | Bootstrap window \(n^{1-\delta _{2}}\) | Block size \(n^{\delta _{1}}\) | \(V\) \((E^{*}V^{*})\), \(n=10^{3}\) | Var\((V^{*})\) | MSE\((V^{*})\) | \(V\) \((E^{*}V^{*})\), \(n=10^{4}\) | Var\((V^{*})\) | MSE\((V^{*})\) |
|---|---|---|---|---|---|---|---|---|
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 5.4 (3.0) | .45 | 6.7 | 5.7 (3.5) | .25 | 5.0 |
| | | .19 (4) (6) | 6.2 (3.3) | .64 | 8.7 | 5.1 (4.0) | .42 | 1.7 |
| | | .22 (5) (8) | 5.1 (3.6) | .98 | 3.1 | 4.8 (4.2) | .56 | 1.0 |
| | .46 (23) (69) | .16 (3) (4) | 5.1 (3.0) | .43 | 4.7 | 5.7 (3.5) | .23 | 5.1 |
| | | .19 (4) (6) | 5.5 (3.5) | .62 | 4.8 | 5.5 (4.0) | .34 | 2.5 |
| | | .22 (5) (8) | 4.9 (3.7) | 1.06 | 2.4 | 6.0 (4.3) | .66 | 3.6 |
| | .48 (27) (83) | .16 (3) (4) | 5.8 (3.1) | .38 | 7.5 | 5.1 (3.5) | .21 | 2.6 |
| | | .19 (4) (6) | 5.6 (3.5) | .73 | 5.2 | 5.5 (4.1) | .39 | 2.5 |
| | | .22 (5) (8) | 5.4 (3.8) | .96 | 3.5 | 5.5 (4.3) | .52 | 1.9 |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 5.5 (3.0) | .38 | 6.4 | 5.5 (3.5) | .19 | 4.4 |
| | | .19 (4) (6) | 5.3 (3.4) | .61 | 4.4 | 5.6 (3.9) | .33 | 3.1 |
| | | .22 (5) (8) | 5.6 (3.5) | .67 | 5.0 | 5.9 (4.1) | .37 | 3.7 |
| | .46 (23) (69) | .16 (3) (4) | 4.9 (3.0) | .36 | 3.7 | 5.5 (3.5) | .15 | 4.1 |
| | | .19 (4) (6) | 5.5 (3.4) | .47 | 5.0 | 4.6 (4.0) | .30 | 0.7 |
| | | .22 (5) (8) | 5.8 (3.6) | .74 | 5.9 | 6.1 (4.3) | .48 | 3.8 |
| | .48 (27) (83) | .16 (3) (4) | 5.1 (3.2) | .40 | 4.2 | 5.5 (3.6) | .18 | 3.8 |
| | | .19 (4) (6) | 5.0 (3.6) | .51 | 2.5 | 5.1 (4.1) | .30 | 1.3 |
| | | .22 (5) (8) | 5.8 (3.8) | .70 | 5.1 | 5.0 (4.3) | .40 | 1.0 |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 5.3 (3.0) | .26 | 5.3 | 5.5 (3.5) | .18 | 4.1 |
| | | .19 (4) (6) | 5.6 (3.4) | .40 | 5.3 | 5.2 (3.9) | .24 | 1.9 |
| | | .22 (5) (8) | 5.8 (3.7) | .53 | 5.2 | 5.7 (4.1) | .35 | 2.7 |
| | .46 (23) (69) | .16 (3) (4) | 5.1 (3.3) | .26 | 3.5 | 5.2 (3.5) | .17 | 2.9 |
| | | .19 (4) (6) | 5.8 (3.7) | .40 | 4.8 | 5.8 (4.0) | .24 | 3.3 |
| | | .22 (5) (8) | 5.5 (3.9) | .50 | 3.0 | 4.9 (4.2) | .32 | 0.7 |
| | .48 (27) (83) | .16 (3) (4) | 5.3 (3.7) | .29 | 3.0 | 5.0 (3.6) | .15 | 2.2 |
| | | .19 (4) (6) | 5.5 (4.2) | .39 | 2.2 | 5.3 (4.0) | .26 | 1.9 |
| | | .22 (5) (8) | 6.2 (4.5) | .60 | 3.2 | 5.0 (4.3) | .34 | 0.9 |

**Table 5** Variance comparison at \(x=.3\): variance of \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) [denoted \(V^{*}\)] compared to the variance of \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\) [denoted \(V\)]

| Kernel window \(n^{1-\delta _{3}}\) | Bootstrap window \(n^{1-\delta _{2}}\) | Block size \(n^{\delta _{1}}\) | \(V\) \((E^{*}V^{*})\), \(n=10^{3}\) | Var\((V^{*})\) | MSE\((V^{*})\) | \(V\) \((E^{*}V^{*})\), \(n=10^{4}\) | Var\((V^{*})\) | MSE\((V^{*})\) |
|---|---|---|---|---|---|---|---|---|
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 5.9 (2.9) | .46 | 9.1 | 5.9 (3.5) | .25 | 6.1 |
| | | .19 (4) (6) | 5.1 (3.2) | .62 | 4.3 | 5.1 (3.9) | .37 | 1.8 |
| | | .22 (5) (8) | 5.4 (3.4) | .94 | 4.9 | 6.0 (4.1) | .64 | 4.1 |
| | .46 (23) (69) | .16 (3) (4) | 4.8 (3.0) | .44 | 3.9 | 5.6 (3.5) | .22 | 4.8 |
| | | .19 (4) (6) | 5.1 (3.2) | .69 | 4.2 | 5.5 (4.0) | .44 | 2.8 |
| | | .22 (5) (8) | 5.8 (3.4) | .84 | 6.2 | 6.0 (4.2) | .60 | 3.8 |
| | .48 (27) (83) | .16 (3) (4) | 5.1 (3.1) | .45 | 4.7 | 5.4 (3.6) | .21 | 3.7 |
| | | .19 (4) (6) | 5.5 (3.4) | .74 | 5.0 | 5.1 (4.1) | .43 | 1.8 |
| | | .22 (5) (8) | 5.1 (3.5) | .84 | 3.1 | 5.5 (4.2) | .56 | 2.0 |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 6.3 (2.9) | .37 | 12.2 | 6.1 (3.5) | .18 | 6.8 |
| | | .19 (4) (6) | 5.2 (3.2) | .49 | 4.5 | 4.8 (3.9) | .28 | 1.1 |
| | | .22 (5) (8) | 5.6 (3.5) | .76 | 5.2 | 5.7 (4.1) | .42 | 2.9 |
| | .46 (23) (69) | .16 (3) (4) | 6.1 (2.9) | .36 | 10.3 | 6.0 (3.5) | .19 | 6.7 |
| | | .19 (4) (6) | 5.4 (3.3) | .51 | 5.8 | 5.3 (4.0) | .32 | 1.9 |
| | | .22 (5) (8) | 5.4 (3.6) | .77 | 4.3 | 5.8 (4.2) | .42 | 2.8 |
| | .48 (27) (83) | .16 (3) (4) | 5.9 (3.0) | .39 | 8.6 | 5.0 (3.5) | .19 | 2.3 |
| | | .19 (4) (6) | 5.2 (3.4) | .50 | 3.8 | 5.6 (4.0) | .33 | 2.9 |
| | | .22 (5) (8) | 5.3 (3.6) | .65 | 3.7 | 5.3 (4.3) | .45 | 1.4 |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 5.9 (2.9) | .25 | 9.3 | 6.0 (3.5) | .17 | 6.1 |
| | | .19 (4) (6) | 5.9 (3.2) | .38 | 7.3 | 6.0 (4.0) | .30 | 4.6 |
| | | .22 (5) (8) | 5.6 (3.4) | .46 | 5.3 | 5.6 (4.2) | .34 | 2.5 |
| | .46 (23) (69) | .16 (3) (4) | 5.8 (3.0) | .26 | 8.2 | 5.9 (3.5) | .19 | 6.0 |
| | | .19 (4) (6) | 4.8 (3.3) | .40 | 2.8 | 4.9 (4.0) | .28 | 1.2 |
| | | .22 (5) (8) | 5.4 (3.5) | .47 | 4.1 | 5.5 (4.2) | .33 | 2.0 |
| | .48 (27) (83) | .16 (3) (4) | 5.4 (3.0) | .26 | 5.9 | 5.4 (3.5) | .20 | 3.6 |
| | | .19 (4) (6) | 5.8 (3.4) | .40 | 6.2 | 5.8 (4.0) | .29 | 3.4 |
| | | .22 (5) (8) | 4.7 (3.6) | .58 | 1.8 | 5.6 (4.3) | .34 | 2.1 |

**Table 6** Variance comparison at \(x=.5\): variance of \((nh)^{\frac{1}{2}}(\hat{m}^{*}(x)-E^{*}\hat{m}^{*}(x))\) [denoted \(V^{*}\)] compared to the variance of \((nh)^{\frac{1}{2}}(\hat{m}(x)-E\hat{m}(x))\) [denoted \(V\)]

| Kernel window \(n^{1-\delta _{3}}\) | Bootstrap window \(n^{1-\delta _{2}}\) | Block size \(n^{\delta _{1}}\) | \(V\) \((E^{*}V^{*})\), \(n=10^{3}\) | Var\((V^{*})\) | MSE\((V^{*})\) | \(V\) \((E^{*}V^{*})\), \(n=10^{4}\) | Var\((V^{*})\) | MSE\((V^{*})\) |
|---|---|---|---|---|---|---|---|---|
| .65 (89) (398) | .44 (20) (57) | .16 (3) (4) | 5.0 (2.9) | .47 | 5.0 | 5.5 (3.5) | .23 | 4.2 |
| | | .19 (4) (6) | 5.4 (3.2) | .69 | 5.3 | 6.0 (3.9) | .37 | 4.7 |
| | | .22 (5) (8) | 4.8 (3.3) | .73 | 2.8 | 5.0 (4.2) | .65 | 1.4 |
| | .46 (23) (69) | .16 (3) (4) | 5.7 (3.0) | .37 | 7.9 | 5.7 (3.5) | .23 | 5.0 |
| | | .19 (4) (6) | 5.6 (3.2) | .64 | 6.2 | 5.0 (4.0) | .41 | 1.6 |
| | | .22 (5) (8) | 6.3 (3.4) | .71 | 9.0 | 5.6 (4.2) | .60 | 2.6 |
| | .48 (27) (83) | .16 (3) (4) | 4.9 (3.0) | .44 | 4.1 | 4.9 (3.5) | .24 | 2.3 |
| | | .19 (4) (6) | 5.5 (3.3) | .63 | 5.4 | 6.0 (4.0) | .42 | 4.3 |
| | | .22 (5) (8) | 4.8 (3.5) | .90 | 2.4 | 5.3 (4.2) | .50 | 1.6 |
| .70 (125) (630) | .44 (20) (57) | .16 (3) (4) | 6.4 (2.9) | .32 | 12.8 | 5.9 (3.5) | .19 | 5.9 |
| | | .19 (4) (6) | 5.7 (3.2) | .55 | 6.9 | 5.3 (3.9) | .33 | 2.3 |
| | | .22 (5) (8) | 5.2 (3.4) | .68 | 3.9 | 5.6 (4.1) | .40 | 2.6 |
| | .46 (23) (69) | .16 (3) (4) | 5.6 (2.9) | .33 | 7.6 | 5.1 (3.5) | .19 | 2.8 |
| | | .19 (4) (6) | 5.8 (3.2) | .47 | 7.0 | 5.4 (4.0) | .30 | 2.5 |
| | | .22 (5) (8) | 5.7 (3.5) | .75 | 5.6 | 5.7 (4.3) | .47 | 2.4 |
| | .48 (27) (83) | .16 (3) (4) | 5.7 (3.0) | .36 | 7.8 | 5.7 (3.5) | .18 | 4.7 |
| | | .19 (4) (6) | 5.7 (3.3) | .50 | 5.9 | 4.9 (4.0) | .29 | 1.1 |
| | | .22 (5) (8) | 5.8 (3.5) | .66 | 6.1 | 5.6 (4.3) | .42 | 2.2 |
| .75 (177) (1,000) | .44 (20) (57) | .16 (3) (4) | 5.4 (2.9) | .28 | 6.6 | 5.7 (3.5) | .18 | 5.2 |
| | | .19 (4) (6) | 6.2 (3.1) | .38 | 10.1 | 5.6 (3.9) | .26 | 3.2 |
| | | .22 (5) (8) | 5.0 (3.4) | .56 | 3.2 | 5.0 (4.2) | .34 | 1.0 |
| | .46 (23) (69) | .16 (3) (4) | 5.4 (2.9) | .25 | 6.5 | 5.3 (3.5) | .20 | 3.4 |
| | | .19 (4) (6) | 6.0 (3.3) | .41 | 8.2 | 6.4 (3.9) | .24 | 6.3 |
| | | .22 (5) (8) | 4.7 (3.4) | .49 | 2.3 | 5.4 (4.2) | .35 | 1.9 |
| | .48 (27) (83) | .16 (3) (4) | 5.9 (3.0) | .26 | 8.8 | 4.9 (3.5) | .20 | 2.1 |
| | | .19 (4) (6) | 5.6 (3.3) | .41 | 5.5 | 5.4 (4.0) | .26 | 2.2 |
| | | .22 (5) (8) | 5.0 (3.5) | .52 | 2.7 | 5.3 (4.3) | .36 | 1.3 |

The size of the kernel window should be smaller when the curvature is large in absolute value. The bootstrap window relies on the constancy of the underlying function, and should be smaller if the first derivative is large in absolute value. The block size should be larger if the data is strongly dependent. We note that for larger data sets our bootstrap variance approaches the variance of \(\sqrt{nh}\hat{m} (x)\). Many of the patterns we would expect theoretically can be seen in our simulation.

As a referee pointed out, the coverage for the confidence interval using the pivotal method was typically lower than the coverage using asymptotic normality. This is particularly evident in the smaller data set, where coverage is lower than in the larger data set. This was puzzling at first, but an explanation for this phenomenon is possible. Recall that the noise satisfies the nonlinear autoregression \(e_{t}=\sin (e_{t-1})+\epsilon _{t}\). In general, \(|\sin (x)|\le |x|\), so the autoregression is stable. However, for small \(x\) we have the approximation \(\sin (x)\approx x\). Due to the \(\epsilon _{t}\) being i.i.d. \(N(0,1)\), most of the \(e_{t}\) are indeed small, hence the autoregression—although formally nonlinear—is close to linearity (and to normality, due to the normal inputs). A histogram (and QQ-plot) of the generated \(e_{t}\) confirms that their marginal distribution is indistinguishable from a Gaussian. If this is the case, then the distribution of \(\hat{m}(x)\) would also be (finite-sample) normal, since \(\hat{m}(x)\) is linear in the errors. Finally, using the normal reference distribution (with estimated variance) is obviously the better thing to do when the estimator happens to have a (finite-sample) normal distribution.
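The noise recursion used in the simulation can be reproduced as follows (the burn-in length is our own choice; the paper does not specify one):

```python
import math, random

def simulate_noise(n, rng, burn=200):
    # Nonlinear autoregression e_t = sin(e_{t-1}) + eps_t with i.i.d.
    # N(0,1) innovations; an initial burn-in is discarded so the
    # returned series is approximately stationary.
    e, out = 0.0, []
    for t in range(n + burn):
        e = math.sin(e) + rng.gauss(0.0, 1.0)
        if t >= burn:
            out.append(e)
    return out
```

A histogram or QQ-plot of the output can be used to check the near-Gaussian marginal discussed above.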

The coverage improves considerably across all window and block sizes as we increase the data size. The smaller data set is much more sensitive to window and block sizes. We notice both these effects as we compare the coverage percentages of the two data sizes in Tables 1, 2 and 3. The coverage for the confidence interval using the pivotal method was lower than the coverage using asymptotic normality in the smaller data set. On the larger data set, this gap is reduced considerably, and in many cases the two coverages are within 0.5 % of each other (i.e., equal to the nearest percent).

As the block sizes are increased, the coverage improves. This improvement is very clear in the smaller data set as we compare the different sets of three numbers keeping the bootstrap and kernel windows constant. Most of these sets show improvement with increasing block size in Tables 1, 2 and 3, corresponding to the different values of \(x\). This relationship is also present in the larger data set, although it is less clear there. One reason could be that, since the coverage is higher for the larger data set, the variation between coverage values is small and consequently the patterns are harder to detect.

The coverage improves slightly as the bootstrap window size is increased. This is observed in both Table 1 and Table 2 as we note the changes in coverage across different bootstrap window sizes keeping the block size and the kernel window size constant. It does not increase for all such comparisons, but for a majority of them. One would expect that a larger bootstrap window would be less favorable at \(x=0.1\), where \(\left|m^{(1)}(.1)\right|=13.5\), than at \(x=0.3\), where \(\left|m^{(1)}(.3)\right|=7.7\), corresponding to Tables 1 and 2 respectively. This is observed for \(n=10^{3}\) and kernel size \(n^{0.65}\), where the coverage improves in Table 2 as the bootstrap window size increases, while in Table 1 coverage decreases for most comparisons. In Table 3, where \(\left|m^{(1)}(.5)\right|=0\), the bootstrap window size did not seem to affect the coverage.

The coverage improved with increasing kernel window size in Tables 1, 2 and 3 for the smaller data set. For \(n=10^{4}\), this trend is less noticeable, but present. This is again observed by comparing the coverage for different kernel sizes keeping the other size variables constant. The kernel window size was restricted to be less than \(O(n^{4/5})\) so that the bias goes to zero asymptotically. This allows us to construct the bootstrap confidence intervals without the need for bias correction. The bias is proportional to the second derivative \(m^{(2)}(x)\). In Table 1, \(\left|m^{(2)}(.1)\right|=23.2\), and in Table 2, \(\left|m^{(2)}(.3)\right|=35.4\). One would expect that comparatively lower values of the kernel window size would favor the data set with the higher absolute curvature. We could not find any strong effect of this phenomenon in our results.

The variance of \(\sqrt{nh}\,\hat{m}^{*}(x)\) is close to the variance of \(\sqrt{nh}\,\hat{m}(x)\) for the larger data set and is fairly stable. For large data sets, we could therefore use the LBB algorithm to estimate the variance of \(\sqrt{nh}\,\hat{m}(x)\). It would be a difficult task to compute this variance by direct calculation, because one would have to estimate covariances and truncate the sum of these covariances; using the LBB, we can simply compute the variance of the bootstrap estimates. In general, the bootstrap estimator variance is smaller than the variance of \(\sqrt{nh}\,\hat{m}(x)\), but it tends to increase and close the gap as we increase the block size. This is observed in Tables 4, 5 and 6 when we look at the sets of three numbers corresponding to varying block sizes. This is expected because bigger blocks retain more of the dependence structure of the original time series.

We notice that our results vary depending on the ratios of the kernel window, bootstrap window and block sizes. There seem to be more subtle relationships between the relative window sizes and the magnitudes of the slope and curvature; these would require further investigation. It would seem that one should empirically establish these ratios for a given data set at a specified point by testing the procedure on several subsets of the data, much like the subsampling-cross-validation idea proposed in Hall et al. (1995). One can then use the LBB procedure with those values which work well on these subsets.
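A sketch of how such a search over candidate ratios might be organized on a subseries (a hypothetical helper; the finite-sample screening inequalities are crude stand-ins for the asymptotic conditions A.6–A.7):

```python
def candidate_sizes(n_sub, d3_list, d2_list, d1_list):
    # Enumerate (nh, nB, b) triples for a subseries of length n_sub,
    # keeping only triples that heuristically respect the order
    # restrictions (b^{5/2} < nh and nB < n_sub^{2/3}).
    out = []
    for d3 in d3_list:
        for d2 in d2_list:
            for d1 in d1_list:
                nh = int(n_sub ** (1 - d3))
                nB = int(n_sub ** (1 - d2))
                b = max(2, round(n_sub ** d1))
                if b ** 2.5 < nh and nB < n_sub ** (2 / 3):
                    out.append((nh, nB, b))
    return out
```

Each surviving triple could then be scored by, e.g., empirical coverage of the resulting intervals on the subseries.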

## 6 Proofs

*Proof of Theorem 1 (i)*

Recall that \(x\) is a fixed number in \( (0,1)\). In what follows, we assume that \(n\) is big enough, so that \(h\) (that tends to zero) is small enough to guarantee that either \(x>h\) (if \(x<1/2\)), or \(1-x>h\) (if \(x\ge 1/2\)); in this way the kernel estimator is defined without boundary effects.

*Proof of Theorem 1 (ii)*

*Proof of Theorem 1 (iii)*

*Proof of Theorem 2*

To see why, consider the case \(x<1/2\) and the influence of an end point \(x_i\). Being a left end point, \(x_i=i/n\) for some \(i\le nB\), i.e., \(x_i=i/n \le nB/n =B\). Since \(x> B+h/2\) for \(n\) large enough (both \(B\) and \(h\) tend to zero), it follows that \(x-x_i> h/2\). By the compact support of \(K\), this implies that \(K((x-x_{i})/h)=0\), and thus the effect of the end points can be neglected. The case \(x\ge 1/2\) is similar.

*Proof of Lemma 1*

*Proof of Lemma 2*

*Proof of Theorem 3*

*Proof of Theorem 4*

*Proof of Theorem 5*

Observe that \(\sigma _{as}^{2}=2\pi f(0)\int K^{2}(u)\,du\) does not depend on \(x\). This is due to the stationarity of the errors \(\varepsilon _{t}\) and because \(m(x)\) is a deterministic function. We also note that \(\hat{m}^{*}(a_{i})\) and \(\hat{m}^{*}(a_{j})\) for \(i\ne j\) are asymptotically independent. This is because the number of observations between \(a_{i}\) and \(a_{j}\) is \(n\left|a_{i}-a_{j}\right|\), and since the kernel window smooths over \(nh\) observations, \(2nh+b<n\left|a_{i}-a_{j}\right|\) for \(n\) large enough. This implies \(\hat{m}^{*}(a_{i})\) and \(\hat{m}^{*}(a_{j})\) are independent for \(i\ne j\), because they are kernel estimators computed over disjoint and independent observations of the pseudo series. Also \(\hat{m}(a_{i})\) and \(\hat{m}(a_{j})\) are asymptotically independent, because \(2nh<n\left|a_{i}-a_{j}\right|\) and the \(\varepsilon _{t}\) are strong mixing. We now use Theorem 4 to establish the result. Let \(\lambda =(\lambda _{1},\lambda _{2},\ldots ,\lambda _{d})\in \mathbb{R}^{d}.\) Then, \(P(\lambda ^{T}((nh)^{\frac{1}{2}}(\hat{m}^{*}(x_{1})-E^{*}\hat{m}^{*}(x_{1})),\ldots ,(nh)^{\frac{1}{2}}(\hat{m}^{*}(x_{d})-E^{*}\hat{m}^{*}(x_{d})))\le u)\)

\(-\Phi \left( \frac{u}{((\lambda _{1}^{2}+\lambda _{2}^{2}+\cdots +\lambda _{d}^{2})\sigma _{as}^{2})^{\frac{1}{2}}}\right) \rightarrow _{p}0.\) Therefore, as an application of the Cramér–Wold device, we have our result.