1 Introduction

Directional or circular time series arise in many scientific fields such as meteorology, oceanography, biology, neuroscience, bioinformatics, geoscience and cosmology. An overview of directional data analysis can be found for instance in books by Mardia (1972), Fisher (1993), Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001) and Ley and Verdebout (2017). Most of the literature deals with iid observations. More generally, for cases where more than one circular variable is observed, parametric and nonparametric circular-circular regression with iid residuals is discussed for instance in Gould (1969), Fisher and Lee (1992), Johnson and Wehrly (1978), Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001), Kato et al. (2008), Kim and SenGupta (2016) and Polsen and Taylor (2015). Nonparametric density estimation on the circle is discussed for instance by Watson (1983), Hall et al. (1987), Bai et al. (1988), Fisher (1989, 1993) and Taylor (2008). See also Tsurata and Sagae (2017) for properties of higher order circular kernels. In practice, one often observes circular time series that exhibit serial dependence. Circular processes with weak serial dependence are considered for instance in Wehrly and Johnson (1980), Breckling (1989), Kato (2010), Modlin et al. (2012) and Wang and Gelfand (2014). Circular time series with long-range dependence are considered in Di Marzio et al. (2012) and Beran and Ghosh (2019). Di Marzio et al. (2012) discuss nonparametric trend estimation, whereas Beran and Ghosh (2019) define models with long-range dependence using Gaussian subordination and derive asymptotic results for parametric estimators where the mean direction depends on deterministic explanatory variables. In this paper, we extend the results in Beran and Ghosh (2019) to a nonparametric circular-circular regression model of the form

$$\begin{aligned} \vartheta _{t}=\left[ \mu \left( \psi _{t}\right) +Z_{t}\right] {\text {mod}}2\pi \end{aligned}$$
(1)

where \(\vartheta _{t}\), \(\psi _{t}\in [-\pi ,\pi )\) (\(t=1,2,\ldots \)), \(\psi _{t} \) and \(Z_{t}\) are stationary long-memory processes defined by Gaussian subordination, and \(Z_{t}\) is such that

$$\begin{aligned} E\left( \sin Z_{t}\right) =0\text {, }E\left( \cos Z_{t}\right) =R>0 \end{aligned}$$
(2)

(see e.g. Gould 1969). Equations (1) and (2) mean that, given \(\psi _{t}=w^{0}\), the mean direction of \(\vartheta _{t}\) is \(\mu \left( w^{0}\right) \), i.e.

$$\begin{aligned} E\left[ \exp \left( i\vartheta _{t}\right) \mid \psi _{t}=w^{0}\right] =R\exp \left( i\mu \left( w^{0}\right) \right) \text { (}w^{0}\in [-\pi ,\pi )\text {).} \end{aligned}$$
(3)

We consider asymptotic properties of circular kernel estimators of \(\mu (w^{0})\). Limit theorems are derived using in particular general results in Mielniczuk and Wu (2004). Due to long-range dependence, a range of asymptotically optimal bandwidths can be found where the asymptotic rate of convergence does not depend on the bandwidth. The results can be used for obtaining simple simultaneous confidence bands for \(\mu (w_{1}^{0} ),\ldots ,\mu (w_{m}^{0})\) (\(w_{1}^{0},\ldots ,w_{m}^{0}\in [-\pi ,\pi )\), \(m\in {\mathbb {N}}\)).

The paper is organized as follows. A general introduction and definitions are given in Sect. 2. Limit theorems are discussed in Sect. 3. In Sect. 4, asymptotic results from Sect. 3 are used to obtain confidence bands for \(\mu (\psi )\). An application to wind direction data is discussed in Sect. 5. Proofs are given in the Appendix.

2 Definitions

2.1 Definition of the model

We extend the model introduced in Beran and Ghosh (2019) to bivariate circular time series with long-range dependence. First we recall the definition of long memory in the sense of second order dependence. A real-valued second order stationary time series with autocovariance function \(\gamma (k)\) is said to exhibit long-range (or strong) dependence, if \(\sum \gamma (k)=\infty \). Often it is assumed that

$$\begin{aligned} \gamma (k)\sim c_{\gamma }k^{2H-2}\text { (}k\rightarrow \infty \text {),} \end{aligned}$$
(4)

or that the spectral density has a pole at the origin characterized by

$$\begin{aligned} f\left( \lambda \right) =\frac{1}{2\pi }\sum _{k=-\infty }^{\infty }\gamma \left( k\right) e^{-ik\lambda }\underset{\lambda \rightarrow 0}{\sim }c_{f}\left| \lambda \right| ^{1-2H} \end{aligned}$$
(5)

with \(H\in (\frac{1}{2},1)\) and \(0<c_{\gamma },c_{f}<\infty \). Here, “\(\sim \)” means that the ratio of the left- and right-hand sides converges to 1. References to the extensive literature on long-memory processes can be found for instance in Beran (1994), Giraitis et al. (2012) and Beran et al. (2013). For generating schemes for long-memory processes see also e.g. Davidson and Sibbertsen (2005).

To obtain bivariate circular time series \((\psi _{t},\vartheta _{t})\in [0,2\pi )^{2}\) (\(t\in {\mathbb {Z}}\)) with long-range dependence, the following assumptions will be used:

  • (A1) Let \(Z_{t}\), \(X_{t}\) (\(t\in {\mathbb {Z}}\)) be stationary Gaussian processes with \(E(Z_{t})=E(X_{t})=0\), \(0<\sigma _{Z}^{2}=var(Z_{t})<\infty \), \(\sigma _{X}^{2}=var(X_{t})=1\), and autocovariance functions \(\gamma _{Z}\), \(\gamma _{X}\) and spectral densities \(f_{Z}\), \(f_{X}\) such that

    $$\begin{aligned}&\gamma _{Z}\left( k\right) \underset{k\rightarrow \infty }{\sim }c_{\gamma ,Z}k^{2H_{Z}-2}, \end{aligned}$$
    (6)
    $$\begin{aligned}&f_{Z}\left( \lambda \right) =\frac{1}{2\pi }\sum _{k=-\infty }^{\infty } \gamma _{Z}\left( k\right) e^{-ik\lambda }\underset{\lambda \rightarrow 0}{\sim }c_{f,Z}\left| \lambda \right| ^{1-2H_{Z}}, \end{aligned}$$
    (7)
    $$\begin{aligned}&\gamma _{X}\left( k\right) \underset{k\rightarrow \infty }{\sim }c_{\gamma ,X}k^{2H_{X}-2}, \end{aligned}$$
    (8)
    $$\begin{aligned}&f_{X}\left( \lambda \right) =\frac{1}{2\pi }\sum _{k=-\infty }^{\infty } \gamma _{X}\left( k\right) e^{-ik\lambda }\underset{\lambda \rightarrow 0}{\sim }c_{f,X}\left| \lambda \right| ^{1-2H_{X}} \end{aligned}$$
    (9)

    for some constants \(0<c_{\gamma ,Z},c_{f,Z},c_{\gamma ,X},c_{f,X}<\infty \) and \(\frac{1}{2}<H_{Z},H_{X}<1\). Moreover, assume that the two processes are independent of each other.

  • (A2) Let \(G_{\psi }(u)=\int _{-\pi }^{u}g_{\psi }(v)dv\) (\(u\in [-\pi ,\pi )\)) be an absolutely continuous circular cumulative distribution function with density \(g_{\psi }=G_{\psi }^{^{\prime }}\), and denote by \(\Phi \) the standard normal distribution function. Define the circular time series \(\psi _{t}\) (\(t\in {\mathbb {Z}}\)) by

    $$\begin{aligned} \psi _{t}=G_{\psi }^{-1}\left( \Phi \left( X_{t}\right) \right) . \end{aligned}$$
    (10)
  • (A3) Let \(\mu :[-\pi ,\pi )\rightarrow {\mathbb {R}}\) be a twice continuously differentiable function. The circular time series \(\vartheta _{t}\) (\(t\in {\mathbb {Z}}\)) is defined by

    $$\begin{aligned} \vartheta _{t}=\left[ \mu \left( \psi _{t}\right) +Z_{t}\right] {\text {mod}}2\pi . \end{aligned}$$
    (11)
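Assumptions (A1) to (A3) can be turned into a small simulation. The following is a minimal sketch under stated assumptions, not the authors' implementation: the Gaussian long-memory inputs are approximated by a truncated moving average with FARIMA(0, d, 0) weights (\(d=H-\frac{1}{2}\)), \(G_{\psi }\) is taken to be a von Mises distribution inverted numerically on a grid, and the regression function \(\mu (w)=w+0.5\sin w\) as well as all function names are purely illustrative.

```python
import numpy as np
from math import erf, pi, sqrt

def long_memory_gaussian(n, H, rng, trunc=2000):
    """Approximate zero-mean, unit-variance Gaussian series with long memory.

    Truncated MA(infinity) filter with FARIMA(0, d, 0) weights, d = H - 1/2,
    so that the weights decay like j^(H - 3/2). The truncation makes this an
    approximation (an illustrative choice, not taken from the paper).
    """
    d = H - 0.5
    j = np.arange(1, trunc)
    a = np.concatenate(([1.0], np.cumprod((j - 1 + d) / j)))  # a_0 = 1
    eps = rng.standard_normal(n + trunc - 1)
    x = np.convolve(eps, a, mode="valid")      # x_t = sum_j a_j eps_{t-j}
    return x / np.sqrt(np.sum(a ** 2))         # rescale to variance 1

def std_normal_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + np.vectorize(erf)(x / sqrt(2.0)))

def von_mises_quantile(p, kappa, grid=4001):
    """Numerical inverse of a von Mises cdf on [-pi, pi] (mean direction 0)."""
    u = np.linspace(-pi, pi, grid)
    dens = np.exp(kappa * (np.cos(u) - 1.0))   # unnormalised density
    steps = 0.5 * (dens[1:] + dens[:-1]) * np.diff(u)
    cdf = np.concatenate(([0.0], np.cumsum(steps)))
    return np.interp(p, cdf / cdf[-1], u)

def simulate_model(n, H_X, H_Z, sigma_Z, mu, kappa=1.0, seed=0):
    """Simulate (psi_t, theta_t) following (10) and (11)."""
    rng = np.random.default_rng(seed)
    X = long_memory_gaussian(n, H_X, rng)
    Z = sigma_Z * long_memory_gaussian(n, H_Z, rng)      # independent of X
    psi = von_mises_quantile(std_normal_cdf(X), kappa)   # eq. (10)
    theta = np.mod(mu(psi) + Z, 2.0 * pi)                # eq. (11)
    return psi, theta

psi, theta = simulate_model(500, H_X=0.7, H_Z=0.8, sigma_Z=0.3,
                            mu=lambda w: w + 0.5 * np.sin(w))
```

Any other circular distribution can be substituted for the von Mises choice by replacing `von_mises_quantile` with the corresponding quantile function.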

Remark 1

Assumption (A1) implies

$$\begin{aligned} E\left( \sin Z_{t}\right)= & {} 0\text {, }E\left( \cos Z_{t}\right) =R_{Z}>0, \end{aligned}$$
(12)
$$\begin{aligned} E\left( \sin X_{t}\right)= & {} 0\text {, }E\left( \cos X_{t}\right) =R_{X}>0. \end{aligned}$$
(13)

Note also that \(Z_{t}\) and \(X_{t}\) can be written in the Wold representation with iid innovations \(\varepsilon _{t}\sim N(0,\sigma _{\varepsilon }^{2})\) and \(\eta _{t}\sim N(0,\sigma _{\eta }^{2})\) respectively, as

$$\begin{aligned} Z_{t}=\sum \limits _{j=0}^{\infty }{a_{j}\varepsilon _{t-j}}\text {, }X_{t} =\sum \limits _{j=0}^{\infty }{c_{j}\eta _{t-j}}\text { (}t\in {\mathbb {Z}} \text {)} \end{aligned}$$
(14)

where \(a_{0}=c_{0}=1\) and

$$\begin{aligned} a_{j}\sim C_{a}j^{H_{Z}-3/2}\text {,}\ c_{j}\sim C_{c}j^{H_{X}-3/2} \ (j\rightarrow \infty ) \end{aligned}$$
(15)

for some suitable constants \(0<C_{a}\), \(C_{c}<\infty \).

Remark 2

The circular time series \(\psi _{t}=G_{\psi }^{-1}(\Phi (X_{t}))\) is said to be subordinated to the Gaussian process \(X_{t}\) (see e.g. Rosenblatt 1961, 1979; Taqqu 1975, 1979; Dobrushin and Major 1979; Dobrushin 1980). By definition, the marginal distribution of the series is the circular distribution \(G_{\psi }\). Note that the circular density function \(g_{\psi }=G_{\psi }^{\prime }\) is arbitrary, i.e. using (10) any circular density function can be obtained as a marginal density via Gaussian subordination.

Assumption (8) means that \(X_{t}\) exhibits long-range dependence. For the circular time series \(\psi _{t}\), a different definition of autocovariance and autocorrelation function has to be used. Let \(\nu _{\psi }\) denote the mean direction of \(\psi _{t}\), i.e. \(E[\exp (i\psi _{t})]=R\exp (i\nu _{\psi })\). Jammalamadaka and Sarma (1988) proposed the circular autocorrelation function

$$\begin{aligned} \rho _{\text {circular}}\left( k\right) =\frac{E\left[ \sin \left( \psi _{t}-\nu _{\psi }\right) \sin \left( \psi _{t+k}-\nu _{\psi }\right) \right] }{E\left[ \sin ^{2}\left( \psi _{1}-\nu _{\psi }\right) \right] }. \end{aligned}$$
(16)
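A sample analogue of (16) can be sketched as follows. This is a plug-in estimator written for illustration only: the mean direction is replaced by the direction of the sample mean resultant vector, and expectations are replaced by averages. The function names are hypothetical.

```python
import numpy as np

def mean_direction(psi):
    """Direction of the sample mean resultant vector of the angles psi."""
    return np.arctan2(np.mean(np.sin(psi)), np.mean(np.cos(psi)))

def circular_acf(psi, max_lag):
    """Plug-in estimate of rho_circular(k) in (16) for k = 0, ..., max_lag."""
    s = np.sin(psi - mean_direction(psi))
    denom = np.mean(s ** 2)
    return np.array([np.mean(s[: s.size - k] * s[k:]) / denom
                     for k in range(max_lag + 1)])

# Usage: for independent uniform angles the estimates at lags k >= 1 are close to 0.
rng = np.random.default_rng(0)
acf_iid = circular_acf(rng.uniform(-np.pi, np.pi, size=5000), max_lag=5)
```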

To see whether long memory of \(X_{t}\) is inherited by \(\psi _{t}\) one needs to introduce the notion of Hermite rank. Denote by \(\varphi (x)=(2\pi )^{-\frac{1}{2}}\exp (-\frac{1}{2}x^{2})\) the standard normal density function, and by \(L^{2}({\mathbb {R}};\varphi )\) the space of real valued functions h with \(||h||^{2}=\int |h(z)|^{2}\varphi (z)dz<\infty \). Equipped with the scalar product \(\langle h,{\tilde{h}}\rangle =\int h(x){\tilde{h}}(x)\varphi (x)dx\), \(L^{2} ({\mathbb {R}};\varphi )\) is a Hilbert space. Hermite polynomials

$$\begin{aligned} {\mathcal {H}}_{q}\left( z\right) =\left( -1\right) ^{q}\exp \left( \frac{1}{2}z^{2}\right) \frac{d^{q}}{dz^{q}}\exp \left( -\frac{1}{2}z^{2}\right) \text { (}q=0,1,\ldots \text {)} \end{aligned}$$

form an orthogonal basis in \(L^{2}({\mathbb {R}};\varphi )\). Thus, every function \(h\in L^{2}({\mathbb {R}};\varphi )\) has an orthogonal \(L^{2}\)-representation

$$\begin{aligned} h\left( z\right) =\sum _{q=0}^{\infty }\frac{a_{q}}{q!}{\mathcal {H}}_{q}\left( z\right) , \end{aligned}$$

where the \(q\hbox {th}\) Hermite coefficient \(a_{q}\) is given by

$$\begin{aligned} a_{q}=\left\langle h,{\mathcal {H}}_{q}\right\rangle =\int _{-\infty }^{\infty }h\left( z\right) {\mathcal {H}}_{q}\left( z\right) \varphi (z)dz. \end{aligned}$$

A function h with \(a_{0}=0\) is said to have Hermite rank \(m\ge 1\), if \(a_{m}\ne 0\) and \(a_{j}=0\) (\(j<m\)) (Taqqu 1975). The following Lemma is derived in Beran and Ghosh (2019):

Lemma 1

Let \(\psi _{t}\) (\(t\in {\mathbb {Z}}\)) be defined in (10) with \(\gamma _{X}\) characterized by (8). Denote by \(\nu _{\psi }\) the mean direction of \(\psi _{t}\), and define

$$\begin{aligned} h\left( x\right) =\sin \left\{ G_{\psi }^{-1}\left( \Phi \left( x\right) \right) -\nu _{\psi }\right\} . \end{aligned}$$

Suppose that the Hermite rank of h is m. Furthermore assume that

$$\begin{aligned} H_{X}>1-\frac{1}{2m}. \end{aligned}$$
(17)

Then, setting

$$\begin{aligned} H_{m}:=1+\left( H_{X}-1\right) m, \end{aligned}$$
(18)

we have

$$\begin{aligned} \rho _{\text {circular}}\left( k\right) \underset{k\rightarrow \infty }{\sim }c_{m}k^{2H_{m}-2} \end{aligned}$$
(19)

where

$$\begin{aligned} c_{m}=\frac{a_{m}^{2}c_{\gamma ,X}^{m}}{m!E\left[ \sin ^{2}\left( \psi _{1}-\nu _{\psi }\right) \right] }. \end{aligned}$$

Remark 3

Note that, by definition of the mean direction \(\nu _{\psi }\),

$$\begin{aligned} E\left[ h\left( X_{t}\right) \right] =E\left[ \sin \left\{ G_{\psi } ^{-1}\left( \Phi \left( X_{t}\right) \right) -\nu _{\psi }\right\} \right] =0. \end{aligned}$$

Lemma 1 means that, under assumption (17), the directional series \(\psi _{t}\) has long-range dependence in the sense that its circular autocorrelation \(\rho _{\text {circular}}(k)\) has a hyperbolic decay and is not summable.
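The Hermite coefficients and the resulting memory parameter \(H_{m}\) can be approximated numerically. The sketch below uses Gauss-Hermite quadrature for the weight \(\exp (-x^{2}/2)\) as provided by NumPy; the function names, the number of quadrature nodes and the tolerance used to decide whether a coefficient vanishes are assumptions made for illustration.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def hermite_coefficients(h, q_max, n_nodes=80):
    """Approximate a_q = E[h(X) He_q(X)], X ~ N(0,1), for q = 0, ..., q_max."""
    x, w = hermegauss(n_nodes)            # quadrature for weight exp(-x^2/2)
    w = w / np.sqrt(2.0 * np.pi)          # weights now give N(0,1) expectations
    hx = h(x)
    coeffs = []
    for q in range(q_max + 1):
        c = np.zeros(q + 1)
        c[q] = 1.0                        # selects the Hermite polynomial He_q
        coeffs.append(np.sum(w * hx * hermeval(x, c)))
    return np.array(coeffs)

def hermite_rank(h, q_max=10, tol=1e-8):
    """Smallest q >= 1 with |a_q| > tol (None if no such q up to q_max)."""
    a = hermite_coefficients(h, q_max)
    for q in range(1, q_max + 1):
        if abs(a[q]) > tol:
            return q
    return None

def inherited_memory(H_X, m):
    """H_m from (18) if condition (17) holds, otherwise None."""
    if H_X > 1.0 - 1.0 / (2.0 * m):
        return 1.0 + (H_X - 1.0) * m
    return None

# Usage: h(x) = x^2 - 1 has Hermite rank 2.
rank_example = hermite_rank(lambda x: x ** 2 - 1.0)
H_m_example = inherited_memory(0.9, rank_example)
```

For the function h in Lemma 1, the argument of `hermite_rank` would be \(x\mapsto \sin \{G_{\psi }^{-1}(\Phi (x))-\nu _{\psi }\}\) for the chosen circular distribution.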

As for the second circular series \(\vartheta _{t}\), (11) implies

$$\begin{aligned} \vartheta _{t}&=\left[ \mu \left( G_{\psi }^{-1}\left( \Phi \left( X_{t}\right) \right) \right) +Z_{t}\right] {\text {mod}} 2\pi \end{aligned}$$
(20)
$$\begin{aligned}&=\left[ {\tilde{\mu }}\left( X_{t}\right) +Z_{t}\right] {\text {mod}} 2\pi \end{aligned}$$
(21)

where

$$\begin{aligned} {\tilde{\mu }}\left( x\right) =\mu \left( G_{\psi }^{-1}\left( \Phi \left( x\right) \right) \right) . \end{aligned}$$

Since the processes \(X_{t}\) and \(Z_{t}\) are stationary, \(\vartheta _{t}\) is stationary with a constant mean direction \(\nu _{\vartheta }\). By definition, the conditional mean direction of \(\vartheta _{t}\) given \(\psi _{t}=w^{0}\) is equal to \(\mu (w^{0})\) (see (3)). Moreover, (20) implies that long-range dependence in \(Z_{t}\) leads to long-range dependence in \(\vartheta _{t}\).

2.2 Kernel estimation of the conditional mean direction

Let \(w^{0}\in [0,2\pi )\). The conditional mean direction \(\mu (w^{0})\) of \(\vartheta _{t}\) can be estimated by a circular kernel estimator. In the context of iid observations, circular kernel estimators have been discussed for instance by Hall et al. (1987), Fisher (1989), Jammalamadaka and SenGupta (2001), Taylor (2008), Tsurata and Sagae (2017), Tsuruta and Sagae (2020) and Bedouhene and Zougab (2020). Here, we will consider Nadaraya–Watson estimators of the form

$$\begin{aligned} {\hat{\mu }}_{n}(w^{0})=\frac{\sum \nolimits _{s=1}^{n}{K_{b}}\left( \psi _{s} -w^{0}\right) \vartheta _{s}}{\sum \nolimits _{s=1}^{n}{K_{b}}\left( \psi _{s}-w^{0}\right) }. \end{aligned}$$
(22)

For the kernel function \(K_{b}\) we adopt the definition introduced in Hall et al. (1987) and Tsurata and Sagae (2017) among others.

Definition 1

Let \(L_{\kappa }(u)=L(\kappa (1-\cos u))\) and \(C_{\kappa }(L)=\int _{-\pi }^{\pi }L_{\kappa }(u)du\). Then, setting \(\kappa =b^{-2}\) (\(b>0\)), we define circular kernels

$$\begin{aligned} K_{b}\left( u\right) =C_{\kappa }^{-1}\left( L\right) L_{\kappa }\left( u\right) \text { (}u\in [-\pi ,\pi )\text {).} \end{aligned}$$
(23)

Moreover, the \(l\hbox {th}\) moment of L is defined by

$$\begin{aligned} m_{l}\left( L\right) =\int _{0}^{\infty }L(r)r^{(l-1)/2}dr\text { (}l=2k\text {, }k\in \left\{ 0,1,2,\ldots \right\} \text {).} \end{aligned}$$
(24)

Remark 4

Note that by definition \(\int _{-\pi }^{\pi }K_{b}(u)du=1\). Thus, if \(L\ge 0\), then \(K_{b}\) is a circular density function.
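Definition 1 and the estimator (22) translate directly into code. The following sketch is illustrative only: the normalizing constant \(C_{\kappa }(L)\) is approximated by a rectangle rule on an equispaced grid, the estimator is implemented literally as the weighted average in (22), and the function names are hypothetical.

```python
import numpy as np

def circular_kernel(L, b, grid=20000):
    """Return K_b(u) = L(kappa (1 - cos u)) / C_kappa(L) with kappa = b^(-2)."""
    kappa = b ** -2.0
    u = np.linspace(-np.pi, np.pi, grid, endpoint=False)
    # C_kappa(L): integral of L_kappa over [-pi, pi), rectangle rule
    C = 2.0 * np.pi * np.mean(L(kappa * (1.0 - np.cos(u))))
    return lambda v: L(kappa * (1.0 - np.cos(v))) / C

def nadaraya_watson(w0, psi, theta, K):
    """Kernel estimate (22) at one or several points w0."""
    w0 = np.atleast_1d(w0)
    weights = K(psi[None, :] - w0[:, None])    # shape (len(w0), n)
    return (weights @ theta) / weights.sum(axis=1)

# Usage with L(r) = exp(-r), which yields a von Mises kernel.
K = circular_kernel(lambda r: np.exp(-r), b=0.1)
psi_grid = np.linspace(-np.pi, np.pi, 2000, endpoint=False)
theta_grid = 2.0 + 0.5 * np.sin(psi_grid)      # noise-free illustrative data
estimate = nadaraya_watson(0.5, psi_grid, theta_grid, K)
```

Because \(K_{b}\) depends on its argument only through \(\cos u\), the weights are automatically \(2\pi \)-periodic and no explicit wrapping of \(\psi _{s}-w^{0}\) is needed.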

The following assumptions on L are used in Tsurata and Sagae (2017):

  • (L1) \(L^{\prime }(r)=dL/dr\) is continuous.

  • (L2)

    $$\begin{aligned} \int _{0}^{\infty }L^{2}\left( \frac{x^{2}}{2}\right) dx<\infty \text {, } \int _{0}^{\infty }L^{2}\left( \frac{x^{2}}{2}\right) x^{2}dx<\infty \text {.} \end{aligned}$$
  • (L3) For \(j_{0}=2k_{0}\), we have \(m_{l}(L)<\infty \) (\(0\le l\le j_{0}\)) and

    $$\begin{aligned} m_{l}\left( L\right) =\int _{0}^{y}L(r)r^{(l-1)/2}dr+O\left( y^{-(j_{0} +1)/2}\right) \text { (}0\le l\le j_{0}\text {).} \end{aligned}$$
  • (L4)

    $$\begin{aligned} \lim _{r\rightarrow \infty }L\left( r\right) r^{(j_{0}+1)/2}=0. \end{aligned}$$

Note that, under assumptions (L1) to (L4), \(K_{b}\) has a Fourier series representation

$$\begin{aligned} {K}_{b}\left( u\right) = \frac{1}{2\pi }\sum _{s=-\infty }^{\infty }\alpha _{s}\left( b\right) \exp \left( -isu\right) \text { (}u\in [-\pi ,\pi )\text {)} \end{aligned}$$
(25)

with

$$\begin{aligned} \alpha _{-s}\left( b\right) =\alpha _{s}\left( b\right) =\int _{-\pi }^{\pi }{K}_{b}\left( u\right) \exp \left( isu\right) du\text {.} \end{aligned}$$

The order of a circular kernel is defined as follows (Tsurata and Sagae 2017):

Definition 2

Let \(p=2k\) for some \(k\in \{1,2,\ldots \}\). Then \(K_{b}(u)=C_{\kappa } ^{-1}(L)L_{\kappa }(u)\) with \(\kappa =b^{-2}\) is called a circular kernel of order p, if (L1) to (L4) hold with \(j_{0}\ge p+2\), and

$$\begin{aligned} m_{0}\left( L\right) \ne 0\text {, }m_{l}\left( L\right) =0\text { (}l=2,4,\ldots ,p-2\text {), }m_{p}\left( L\right) \ne 0. \end{aligned}$$

Remark 5

Tsurata and Sagae (2017) discuss methods for constructing higher order circular kernels from lower order kernels. For instance, let

$$\begin{aligned} L_{2}\left( r\right)= & {} \exp \left( -r\right) \text {, }L_{2}^{(1)}\left( r\right) =\frac{d}{dr}L_{2}\left( r\right) ,\\ r_{\kappa }\left( u\right)= & {} \kappa \left( 1-\cos u\right) , \\ L_{\kappa ,2}\left( u\right)= & {} L_{2}\left( r_{\kappa }\left( u\right) \right) =\exp \left( -\kappa \left( 1-\cos u\right) \right) , \end{aligned}$$

and denote by \(C_{\kappa }(L_{\kappa ,2})\) the corresponding normalizing constant. Then

$$\begin{aligned} m_{l}\left( L_{2}\right) =\int _{0}^{\infty }L_{2}(r)r^{(l-1)/2}dr=\Gamma \left( \frac{l+1}{2}\right) \text { (}l=2k\text {, }k\in \left\{ 0,1,2,\ldots \right\} \text {)} \end{aligned}$$

(Tsurata and Sagae 2017). In particular, \(m_{0}(L_{2})=\Gamma (\frac{1}{2})=\sqrt{\pi }\), and

$$\begin{aligned} m_{2}\left( L_{2}\right) =\Gamma \left( \frac{3}{2}\right) =\frac{1}{2}\sqrt{\pi }. \end{aligned}$$

Thus, \(K_{b,2}(u)=C_{\kappa }^{-1}(L_{\kappa ,2})L_{\kappa ,2}(u)\) with \(\kappa =b^{-2}\) is a circular kernel of order 2. Now, let

$$\begin{aligned} L_{4}\left( r\right) =\frac{3}{2}L_{2}\left( r\right) +rL_{2}^{(1)}\left( r\right) , \end{aligned}$$

set \(L_{\kappa ,4}(u)=L_{4}(r_{\kappa }(u))\) and denote by \(C_{\kappa }(L_{\kappa ,4})\) the corresponding normalizing constant. Then

$$\begin{aligned} K_{b,4}\left( u\right) =C_{\kappa }^{-1}\left( L_{\kappa ,4}\right) L_{\kappa ,4}\left( u\right) \end{aligned}$$

is a kernel of order 4. Similarly, kernels \(K_{b,p+2}\) of order \(p+2\) can be obtained recursively by setting

$$\begin{aligned} L_{p+2}\left( r\right) =\frac{p+1}{p}L_{p}\left( r\right) +\frac{2}{p}rL_{p}^{(1)}\left( r\right) \text { (}p=2,4,6,\ldots \text {)} \end{aligned}$$

\(L_{\kappa ,p+2}(u)=L_{p+2}(r_{\kappa }(u))\) and

$$\begin{aligned} K_{b,p+2}\left( u\right) =C_{\kappa }^{-1}\left( L_{\kappa ,p+2}\right) L_{\kappa ,p+2}\left( u\right) . \end{aligned}$$
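The moment conditions of Definition 2 can be checked numerically for the kernels constructed in Remark 5. The sketch below evaluates (24) after the substitution \(r=x^{2}\), which removes the singularity of \(r^{-1/2}\) at the origin for \(l=0\); the truncation point of the integral, the grid size and the function names are assumptions made for illustration.

```python
import numpy as np

def moment(L, l, upper=8.0, n=200001):
    """m_l(L) from (24); with r = x^2 this equals 2 * int_0^inf L(x^2) x^l dx."""
    x = np.linspace(0.0, upper, n)
    f = 2.0 * L(x ** 2) * x ** l
    dx = x[1] - x[0]
    return dx * (f.sum() - 0.5 * (f[0] + f[-1]))    # trapezoidal rule

def next_order(Lp, dLp, p):
    """L_{p+2}(r) = (p+1)/p * L_p(r) + 2/p * r * L_p'(r), as in Remark 5."""
    return lambda r: (p + 1.0) / p * Lp(r) + 2.0 / p * r * dLp(r)

L2 = lambda r: np.exp(-r)            # generator of the second order kernel
dL2 = lambda r: -np.exp(-r)          # its derivative
L4 = next_order(L2, dL2, p=2)        # equals (3/2) L_2(r) + r L_2'(r)

for name, L in (("L2", L2), ("L4", L4)):
    print(name, [moment(L, l) for l in (0, 2, 4)])
```

For `L4`, the printed moment with \(l=2\) should vanish up to numerical error while those with \(l=0\) and \(l=4\) should not, which is the pattern required of a kernel of order 4.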

3 Asymptotic results

The asymptotic rate of convergence of \({\hat{\mu }}_{n}(w^{0})-\mu (w^{0})\) is characterized by the following decomposition:

Theorem 1

Let \(w^{0}\in [-\pi ,\pi )\), suppose that (A1) to (A3) and (L1) to (L4) hold and \(K_{b}\) is a kernel of order \(p=2k\). Furthermore assume that \({\tilde{\mu }}(x)=\mu \left( G_{\psi }^{-1}\left( \Phi \left( x\right) \right) \right) \) is \(p+2\) times continuously differentiable, the derivatives \({\tilde{\mu }}^{(s)}\) and \(g_{\psi }^{(s)}\) (\(s=1,2,\ldots ,p\)) are square integrable, \(g_{\psi }(w)>0\) (\(w\in [-\pi ,\pi )\)) and

$$\begin{aligned} n\rightarrow \infty \text {, }b\rightarrow 0\text {, }nb\rightarrow \infty . \end{aligned}$$

Then,

$$\begin{aligned} {\hat{\mu }}_{n}\left( w^{0}\right) -\mu \left( w^{0}\right) =O\left( b^{2k}\right) +O_{p}\left( \frac{1}{\sqrt{nb}}\right) +O_{p}\left( n^{H_{Z}-1}\right) +O_{p}\left( b^{2}n^{H_{X}-1}\right) . \end{aligned}$$
(26)

Remark 6

The first term in (26) is deterministic. It is due to the difference \(\mu _{n}(w^{0})-\mu (w^{0})\) where \(\mu _{n}(w^{0} )=E[K_{b}(\psi -w^{0})\vartheta ]/E[K_{b}(\psi -w^{0})].\) The order of this error is smaller the higher the order of the kernel \(K_{b}\) is.

Equation (26) can be used to address the question of optimal bandwidth choice. Note first that the third term in (26) is the only one that does not depend on the bandwidth b. This means that an error of order \(O_{p}(n^{H_{Z}-1})\) is always there, independently of the choice of b. A sequence of bandwidths \(b_{n}\) may therefore be called optimal, if

$$\begin{aligned} \max \left\{ b_{n}^{2k},\frac{1}{\sqrt{nb_{n}}},b_{n}^{2}n^{H_{X}-1}\right\} =o\left( n^{H_{Z}-1}\right) . \end{aligned}$$
(27)

The first question is whether (27) can be achieved. The answer is given by the following Lemma:

Lemma 2

Under the assumptions of Theorem 1 the following statements are equivalent:

  • (i) The set of bandwidth sequences such that \(b_{n}>0\), \(b_{n}\rightarrow 0\), \(nb_{n}\rightarrow \infty \) and (27) holds is not empty.

  • (ii) The set of bandwidth sequences such that \(b_{n}>0\), \(b_{n} \rightarrow 0\), \(nb_{n}\rightarrow \infty \) and

    $$\begin{aligned} \lim _{n\rightarrow \infty }n^{(1-H_{Z})/2k}b_{n}=0\text {, } \lim _{n\rightarrow \infty }n^{(H_{X}-H_{Z})/2}b_{n}=0\text {, } \lim _{n\rightarrow \infty } n^{2H_{Z}-1}b_n=\infty \end{aligned}$$
    (28)

    is not empty.

  • (iii)

    $$\begin{aligned} H_{Z}>\max \left\{ \frac{H_{X}+2}{5},\frac{2k+1}{4k+1}\right\} . \end{aligned}$$
    (29)
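For power-law bandwidths \(b_{n}=n^{-\delta }\), the three limits in (28) become inequalities for the exponent \(\delta \), and the equivalence with (29) can be checked directly. The restriction to power-law bandwidths and the function names below are assumptions made for illustration; Lemma 2 itself covers general sequences.

```python
def condition_29(H_Z, H_X, k):
    """Condition (29): H_Z > max{(H_X + 2)/5, (2k + 1)/(4k + 1)}."""
    return H_Z > max((H_X + 2.0) / 5.0, (2.0 * k + 1.0) / (4.0 * k + 1.0))

def optimal_exponent_range(H_Z, H_X, k):
    """Open interval of delta in (0, 1) such that b_n = n^(-delta) satisfies (28).

    (28) requires delta > (1 - H_Z)/(2k), delta > (H_X - H_Z)/2 and
    delta < 2 H_Z - 1. Returns None if the interval is empty.
    """
    lower = max((1.0 - H_Z) / (2.0 * k), (H_X - H_Z) / 2.0, 0.0)
    upper = min(2.0 * H_Z - 1.0, 1.0)
    return (lower, upper) if lower < upper else None

# Usage: a second order kernel (k = 1) with H_Z = 0.8 and H_X = 0.7.
print(condition_29(0.8, 0.7, 1), optimal_exponent_range(0.8, 0.7, 1))
```

Any exponent strictly inside the returned interval gives a bandwidth sequence for which the bandwidth-dependent terms in (26) are of smaller order than \(n^{H_{Z}-1}\).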

As a corollary we obtain:

Corollary 1

Suppose that the assumptions of Theorem 1 and condition (29) hold. Then the set of optimal sequences is not empty. Moreover, for any such sequence \(b_{n}\) we have

$$\begin{aligned} {\hat{\mu }}_{n}\left( w^{0}\right) -\mu \left( w^{0}\right) =O_{p}\left( n^{H_{Z}-1}\right) \text { (}w^{0}\in [0,2\pi )\text {).} \end{aligned}$$
(30)

A closer analysis of (30) leads to convergence in distribution:

Theorem 2

Suppose that the assumptions of Theorem 1 and condition (29) hold. Then

$$\begin{aligned} n^{1-H_{Z}} \left[ {\hat{\mu }}_{n}\left( w^{0}\right) - \mu \left( w^{0} \right) \right] \underset{d}{\rightarrow } c_{\mu }\zeta \end{aligned}$$
(31)

where

$$\begin{aligned} c_{\mu }=\sqrt{\frac{\sin \left( \pi H_{Z}-\frac{\pi }{2}\right) \Gamma \left( 2-2H_{Z}\right) c_{f,Z}}{H_{Z}\left( H_{Z}-\frac{1}{2}\right) }}, \end{aligned}$$

\(c_{f,Z}\) is defined in (7) and \(\zeta \) is a standard normal random variable. More generally, for \(w_{1}^{0}<w_{2}^{0}<\cdots <w_{m}^{0}\) (\(w_{j}^{0}\in [0,2\pi )\)),

$$\begin{aligned} n^{1-H_{Z}} \left[ {\hat{\mu }}_{n}\left( w_{1}^{0}\right) -\mu \left( w_{1}^{0}\right) ,\ldots ,{\hat{\mu }}_{n}\left( w_{m}^{0} \right) - \mu \left( w_{m}^{0}\right) \right] \underset{d}{\rightarrow } \left[ c_{\mu },\ldots ,c_{\mu }\right] \cdot \zeta . \end{aligned}$$
(32)

Remark 7

Equation (32) means in particular that the standardized random deviations of estimates at different values \(w_{j}^{0}\), \(w_{l}^{0}\) are asymptotically perfectly correlated. This is very much in contrast to usual behaviour in nonparametric regression. The degenerate limit theorem simplifies the construction of simultaneous confidence intervals for \(\mu (w_{1}^{0}),\ldots ,\mu (w_{m}^{0})\).

Remark 8

It can also be shown that, if \(H_{X}\le \frac{1}{2}\), then (32) holds for any \(H_{Z}>(2k+1)/(4k+1)\). Note that in this case we also have \(\max \{ (H_{X}+2)/5,(2k+1)/(4k+1) \}=(2k+1)/(4k+1)\).

Remark 9

For \(k=1\), we have \((2k+1)/(4k+1)=3/5\). Now \(H_{X}<1\) implies \((H_{X}+2)/5<3/5\) so that condition (29) reduces to

$$\begin{aligned} H_{Z}>\frac{3}{5}. \end{aligned}$$
(33)

On the other hand, for \(k\rightarrow \infty \), the ratio \((2k+1)/(4k+1)\) decreases monotonically to 1/2. Therefore, in the limit (as \(k\rightarrow \infty \)), under the assumption that \(H_{X}>1/2\) condition (29) reduces to \(H_{Z}>(H_{X}+2)/5\).

Under simple additional conditions, uniform convergence of \({\hat{\mu }} _{n}(w^{0})\) can be obtained:

Theorem 3

Suppose that the assumptions of Theorem 1 and condition (29) hold. Moreover, assume

$$\begin{aligned} \lim _{n\rightarrow \infty }\left( n^{H_{Z}+H_{X}-2}\sum _{s=-\infty }^{\infty }\left| \alpha _{s}\left( b_{n}\right) \right| \right) =0, \end{aligned}$$
(34)

where \(\alpha _{s}(b)\) are the coefficients in the Fourier series representation (25). Then

$$\begin{aligned} \sup _{w^{0}\in [0,2\pi )}\left| {\hat{\mu }}_{n}\left( w^{0}\right) -\mu \left( w^{0}\right) \right| \underset{p}{\rightarrow }0. \end{aligned}$$

Remark 10

The conditions in Theorem 3 are related to Lemma 3.2 and Corollary 3.2 in Ghosh (2014).

Typically, \(\sum _{s=-\infty }^{\infty }|\alpha _{s}(b_{n})|\) is asymptotically proportional to \(b_{n}^{-\beta }\) for some \(\beta >0\). Theorem 3 then simplifies as follows.

Corollary 2

Suppose that the assumptions of Theorem 1 and condition (29) hold. Moreover, assume that

$$\begin{aligned} \sum _{s=-\infty }^{\infty }\left| \alpha _{s}\left( b\right) \right| =O\left( b^{-\beta }\right) \text { (}b\rightarrow 0\text {)} \end{aligned}$$
(35)

for some \(\beta >0\), and \(b_{n}\) is a sequence of bandwidths such that \(b_{n}>0\), \(b_{n}\rightarrow 0\), \(nb_{n}\rightarrow \infty \), (27) holds and

$$\begin{aligned} \lim _{n\rightarrow \infty }n^{2-H_{Z}-H_{X}}b_{n}^{\beta }=\infty . \end{aligned}$$
(36)

Then

$$\begin{aligned} \sup _{w^{0}\in [0,2\pi )}\left| {\hat{\mu }}_{n}\left( w^{0}\right) -\mu \left( w^{0}\right) \right| \underset{p}{\rightarrow }0. \end{aligned}$$

Example 1

Denote by \(I_{s}(\kappa )\) (\(s=0,1,2,\ldots \)) modified Bessel functions of order s. Consider a von Mises kernel with \(\kappa =b^{-2}\),

$$\begin{aligned} K_{b}\left( u\right) =\frac{1}{2\pi I_{0}\left( \kappa \right) }\exp \left( \kappa \cos u\right) . \end{aligned}$$

The Fourier series representation of \(K_{b}\) is given by

$$\begin{aligned} K_{b}\left( u\right) =\frac{1}{2\pi }\sum _{s=-\infty }^{\infty }\alpha _{s}\left( \kappa \right) \exp \left( -isu\right) \end{aligned}$$

where

$$\begin{aligned} \alpha _{s}\left( \kappa \right) =\frac{I_{s}\left( \kappa \right) }{I_{0}\left( \kappa \right) }. \end{aligned}$$

Since \(I_{s}(\kappa )>0\), we have

$$\begin{aligned} \sum _{s=-\infty }^{\infty }\left| \alpha _{s}\left( \kappa \right) \right| =2\pi K_{b}\left( 0\right) =\frac{\exp \left( \kappa \right) }{I_{0}\left( \kappa \right) }. \end{aligned}$$

Then, \(I_{0}(\kappa )\sim \exp (\kappa )/\sqrt{2\pi \kappa }\) (\(\kappa \rightarrow \infty \)) (see e.g. Robert 1990) implies

$$\begin{aligned} \sum _{s=-\infty }^{\infty }\left| \alpha _{s}\left( \kappa \right) \right| \underset{\kappa \rightarrow \infty }{\sim }\sqrt{2\pi \kappa } =\sqrt{2\pi }b^{-1}. \end{aligned}$$

Thus, condition (35) with \(\beta =1\) holds. Similar results can be obtained for the generalized von Mises distribution introduced by Kim et al. (2013).

Example 2

For a wrapped normal kernel with \(\sigma =b\),

$$\begin{aligned} K_{b}\left( u\right) =\frac{1}{\sqrt{2\pi }b}\sum _{j=-\infty }^{\infty } \exp \left( -\frac{\left( u+2\pi j\right) ^{2}}{2b^{2}}\right) , \end{aligned}$$

we have the Fourier series representation

$$\begin{aligned} K_{b}\left( u\right) =\frac{1}{2\pi }\sum _{s=-\infty }^{\infty }\exp \left( -\frac{b^{2}}{2}s^{2}\right) \exp \left( -isu\right) . \end{aligned}$$

Thus,

$$\begin{aligned} \sum _{s=-\infty }^{\infty }\left| \alpha _{s}\left( b\right) \right|&=2\pi K_{b}\left( 0\right) \\&=\frac{\sqrt{2\pi }}{b}\sum _{j=-\infty }^{\infty }\exp \left( -\frac{2\pi ^{2} }{b^{2}}j^{2}\right) , \end{aligned}$$

and, for \(b\le 1\),

$$\begin{aligned} \sum _{s=-\infty }^{\infty }\left| \alpha _{s}\left( b\right) \right|&\le b^{-1}\sqrt{2\pi }\sum _{j=-\infty }^{\infty }\exp \left( -2\pi ^{2} j^{2}\right) \\&=Cb^{-1}. \end{aligned}$$

Therefore, condition (35) with \(\beta =1\) holds.
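The growth rate \(b^{-1}\) obtained in Examples 1 and 2 can be checked numerically. The sketch below approximates the Fourier coefficients \(\alpha _{s}(b)\) by a discrete Fourier transform of the kernel on an equispaced grid and compares the sum of their absolute values with \(\sqrt{2\pi }b^{-1}\). The grid size, the truncation of the wrapped normal series and the function names are assumptions made for illustration.

```python
import numpy as np

def abs_fourier_sum(K_vals):
    """Approximate sum_s |alpha_s| from kernel values on an equispaced grid."""
    N = K_vals.size
    return 2.0 * np.pi / N * np.abs(np.fft.fft(K_vals)).sum()

def von_mises_kernel(u, b):
    kappa = b ** -2.0
    # exp(kappa cos u) / (2 pi I_0(kappa)), written to avoid overflow in exp
    return np.exp(kappa * (np.cos(u) - 1.0)) / (
        2.0 * np.pi * np.i0(kappa) * np.exp(-kappa))

def wrapped_normal_kernel(u, b, wraps=3):
    j = np.arange(-wraps, wraps + 1)[:, None]      # truncated wrapping index
    terms = np.exp(-(u[None, :] + 2.0 * np.pi * j) ** 2 / (2.0 * b ** 2))
    return terms.sum(axis=0) / (np.sqrt(2.0 * np.pi) * b)

u = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
for b in (0.4, 0.2, 0.1):
    vm = abs_fourier_sum(von_mises_kernel(u, b))
    wn = abs_fourier_sum(wrapped_normal_kernel(u, b))
    # ratios to sqrt(2 pi) / b; both should approach 1 as b decreases
    print(b, vm * b / np.sqrt(2.0 * np.pi), wn * b / np.sqrt(2.0 * np.pi))
```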

4 Asymptotic confidence intervals

Theorem 2 can be used to obtain asymptotic confidence sets for \(\mu (w_{1}^{0}),\ldots ,\mu (w_{m}^{0})\). Since condition (29) has to be checked first, a two-step procedure has to be applied. Suppose for instance that we use a kernel of order 2. Then we have to test the null hypothesis \(\hbox {H}_{0}:H_{Z}\le 3/5\) against the alternative \(\hbox {H}_{1}:H_{Z}>3/5\). If \(\hbox {H}_{{0}}\) is rejected, then (32) implies that confidence intervals with asymptotic coverage probability \(1-\alpha \) can be defined by

$$\begin{aligned} \left[ {\hat{\mu }}_{n}\left( w_{1}^{0}\right) ,\ldots ,{\hat{\mu }}_{n}\left( w_{m}^{0}\right) \right] \pm n^{H_{Z}-1}q_{1-\alpha /2}\left[ c_{\mu }\left( w_{1}^{0}\right) ,\ldots ,c_{\mu }\left( w_{m}^{0}\right) \right] \end{aligned}$$
(37)

where \(q_{1-\alpha /2}\) is the \((1-\alpha /2)\)-quantile of the standard normal distribution (see (32)).
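The half-width of the intervals in (37) can be computed as follows. The sketch uses the constant \(c_{\mu }\) from Theorem 2 and treats \(H_{Z}\) and \(c_{f,Z}\) as given inputs; in an application they would be replaced by estimates. The function names and the numerical values in the usage line are illustrative assumptions.

```python
from math import gamma, pi, sin, sqrt
from statistics import NormalDist

def c_mu(H_Z, c_fZ):
    """Scaling constant c_mu from Theorem 2."""
    numerator = sin(pi * H_Z - pi / 2.0) * gamma(2.0 - 2.0 * H_Z) * c_fZ
    return sqrt(numerator / (H_Z * (H_Z - 0.5)))

def band_half_width(n, H_Z, c_fZ, alpha=0.05):
    """Half-width n^(H_Z - 1) * q_{1 - alpha/2} * c_mu of the intervals (37)."""
    q = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return n ** (H_Z - 1.0) * q * c_mu(H_Z, c_fZ)

# Usage with hypothetical inputs (not values from the paper).
print(band_half_width(n=1000, H_Z=0.75, c_fZ=1.0))
```

Since the limit in (32) is the same random variable at every point, the same half-width applies at all \(w_{j}^{0}\), which is what makes the simultaneous band simple to construct.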

Testing \(H_{Z}\le 3/5\) versus \(H_{Z}>3/5\) can be done as follows. Under the given conditions, \(S_{Z,t}=\sin (Z_{t}{\text {mod}}2\pi )\) has the same long-memory parameter \(H_{Z}\) as \(Z_{t}\). Therefore a consistent estimator of \(H_{Z}\) can be based on the periodogram of the series \(S_{{\hat{Z}},t}=\sin {\hat{Z}}_{t}\) (\(t=1,\ldots ,n\)) where \({\hat{Z}}_{t}=[\vartheta _{t}-{\hat{\mu }} _{n}(\psi _{t})]{\text {mod}}2\pi \). There is a vast range of consistent methods for estimating the long-memory parameter (see e.g. Beran et al. 2013, chapter 5, and references therein). For instance, let \({\hat{H}}_{Z}\) be a local Whittle estimator based on the periodogram of \(S_{{\hat{Z}},t}\) at the m lowest frequencies. Then, under mild regularity conditions (see Robinson 1995), as \(m\rightarrow \infty \), \(m/n\rightarrow 0\),

$$\begin{aligned} \sqrt{m}\left( {\hat{H}}_{Z}-H_{Z}\right) \underset{d}{\rightarrow }\frac{1}{2}\zeta _{1} \end{aligned}$$

where \(\zeta _{1}\) is a standard normal random variable. Thus, given a level of significance \(\alpha \in (0,1)\), we reject \(\hbox {H}_{0}:H_{Z}\le 3/5\), if

$$\begin{aligned} {\hat{H}}_{Z}>\frac{3}{5}+\frac{1}{2\sqrt{m}}q_{1-\alpha } \end{aligned}$$

where \(q_{1-\alpha }\) is the \((1-\alpha )\)-quantile of the standard normal distribution.
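The test can be sketched in code. The following is a minimal illustration, not a reference implementation: the local Whittle objective of Robinson (1995) is minimized by a simple grid search over H, the periodogram is computed from the mean-corrected series, and the function names and grid are assumptions.

```python
import numpy as np
from statistics import NormalDist

def periodogram(x, m):
    """Periodogram at the m lowest Fourier frequencies 2 pi j / n, j = 1..m."""
    n = x.size
    dft = np.fft.fft(x - x.mean())[1 : m + 1]
    lam = 2.0 * np.pi * np.arange(1, m + 1) / n
    return lam, np.abs(dft) ** 2 / (2.0 * np.pi * n)

def local_whittle(x, m, grid=np.linspace(0.01, 0.99, 981)):
    """Grid-search minimizer of the local Whittle objective in H."""
    lam, I = periodogram(x, m)
    mean_log_lam = np.mean(np.log(lam))
    # R(H) = log(mean(lam^(2H-1) * I)) - (2H - 1) * mean(log lam)
    R = [np.log(np.mean(lam ** (2.0 * H - 1.0) * I))
         - (2.0 * H - 1.0) * mean_log_lam for H in grid]
    return float(grid[int(np.argmin(R))])

def reject_null(H_hat, m, H0=0.6, alpha=0.05):
    """Reject H_0: H_Z <= H0 if H_hat > H0 + q_{1-alpha} / (2 sqrt(m))."""
    q = NormalDist().inv_cdf(1.0 - alpha)
    return H_hat > H0 + q / (2.0 * np.sqrt(m))

# Usage on white noise (true H = 1/2), where H_0 should typically not be rejected.
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
m = int(x.size ** 0.6)
H_hat = local_whittle(x, m)
print(H_hat, reject_null(H_hat, m))
```

In the setting of this section, `x` would be the series \(\sin {\hat{Z}}_{t}\) computed from the kernel residuals.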

Remark 11

Confidence intervals (37) are conditional, since they are computed under the condition that \(\hbox {H}_{0}:H_{Z}\le 3/5\) is rejected. In principle this would call for an adjustment of quantiles, for instance using a Bonferroni correction. However, note that (32) and (37) are applicable only if \(H_{Z}>3/5\). If this is the case (i.e. \(H_{Z}>3/5\)), then \(\hbox {H}_{{0}}\) is rejected with asymptotic probability one, provided that we use a consistent estimator \({\hat{H}}_{Z}\) of \(H_{Z}\). Therefore, no correction for multiple testing needs to be applied asymptotically.

Remark 12

In practice, the constants \(c_{\mu }(w_{j}^{0})\) (\(j=1,\ldots ,m\)) in (37) have to be estimated. This means that \(H_{Z}\) and \(c_{f,Z}\) have to be replaced by consistent estimates.

Remark 13

If one uses a kernel of order \(p=2k\) with \(k\ge 2\) (instead of \(k=1\)), a test of

$$\begin{aligned} \text {H}_{0}:H_{Z}\le \max \left\{ \frac{H_{X}+2}{5},\frac{2k+1}{4k+1}\right\} \end{aligned}$$

against

$$\begin{aligned} \text {H}_{1}:H_{Z}>\max \left\{ \frac{H_{X}+2}{5},\frac{2k+1}{4k+1}\right\} \end{aligned}$$

has to be carried out. This is slightly more complicated, because the right hand side of the inequality involves the unknown parameter \(H_{X}\). Since \(\sin \psi _{t}\) has the same long-memory parameter \(H_{X}\) as \(\psi _{t}\), \(H_{X}\) can be estimated based on the periodogram of the series \(S_{\psi ,t}=\sin \psi _{t}\) (\(t=1,\ldots ,n\)). Suppose for instance that \({\hat{H}}_{Z}\) and \({\hat{H}}_{X}\) are local Whittle estimators based on \(S_{{\hat{Z}},t}=\sin \hat{Z}_{t}\) (see above) and \(S_{\psi ,t}\) respectively. Then

$$\begin{aligned} \sqrt{m}\left( {\hat{H}}_{Z}-H_{Z},{\hat{H}}_{X}-H_{X}\right) \underset{d}{\rightarrow }\frac{1}{2}\left( \zeta _{1},\zeta _{2}\right) \end{aligned}$$

where \(\zeta _{1}\), \(\zeta _{2}\) are independent standard normal random variables and m is the number of lowest frequencies used for estimation. Let

$$\begin{aligned} T={\hat{H}}_{Z}-\max \left\{ \frac{{\hat{H}}_{X}+2}{5},\frac{2k+1}{4k+1}\right\} . \end{aligned}$$

A rejection region at an asymptotic level of significance \(\alpha \) is then defined by the condition \(T>q_{1-\alpha }\) where \(q_{1-\alpha }\) is the \((1-\alpha )\)-quantile of the random variable

$$\begin{aligned} W=W\left( H_{Z},H_{X}\right) =H_{Z}+\frac{1}{2\sqrt{m}}\zeta _{1} -\max \left\{ \frac{1}{5}H_{X}+\frac{2}{5}+\frac{1}{10\sqrt{m}}\zeta _{2} ,\frac{2k+1}{4k+1}\right\} . \end{aligned}$$
(38)

The distribution of W depends on \(H_{Z}\) and \(H_{X}\). For each pair \((H_{Z},H_{X})\in \left( \frac{1}{2},1\right) ^{2}\), \((1-\alpha )\)-quantiles \(q_{1-\alpha }=q_{1-\alpha }(H_{Z},H_{X})\) of \(W(H_{Z},H_{X})\) can be obtained by numerical integration or by simulations. If \(H_{Z}\) and \(H_{X}\) are replaced by their consistent estimates, approximate \((1-\alpha )\)-quantiles are given by \(q_{1-\alpha }({\hat{H}}_{Z},{\hat{H}}_{X})\).
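The simulation route for the quantiles of W in (38) can be sketched as follows. This is a plain Monte Carlo approximation under the independence of \(\zeta _{1}\) and \(\zeta _{2}\) stated above; the number of replications, the seed and the function name are assumptions made for illustration.

```python
import numpy as np

def w_quantile(H_Z, H_X, k, m, alpha=0.05, n_sim=200_000, seed=0):
    """Monte Carlo (1 - alpha)-quantile of W(H_Z, H_X) defined in (38)."""
    rng = np.random.default_rng(seed)
    z1 = rng.standard_normal(n_sim)
    z2 = rng.standard_normal(n_sim)
    bound = np.maximum(
        H_X / 5.0 + 2.0 / 5.0 + z2 / (10.0 * np.sqrt(m)),
        (2.0 * k + 1.0) / (4.0 * k + 1.0),
    )
    W = H_Z + z1 / (2.0 * np.sqrt(m)) - bound
    return float(np.quantile(W, 1.0 - alpha))

# Usage with plug-in values (hypothetical numbers, not estimates from the paper).
print(w_quantile(H_Z=0.8, H_X=0.7, k=2, m=100))
```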

5 Data example

We consider daily average wind directions recorded in Chicago and Milwaukee between 1 January and 31 December 2015 (data source https://www.glerl.noaa.gov/). Figure 1a–d show windrose plots of wind directions in Chicago (here given in degrees between \(0^{\circ }\) and \(360^{\circ }\)), split into four three-month periods, i.e. (a) January to March, (b) April to June, (c) July to September and (d) October to December. The same plots are displayed for Milwaukee in Fig. 2a–d. These figures indicate typical seasonal patterns. In a first step we therefore remove a deterministic seasonal trend from both series, using the deseasonalization method for circular data described in Beran and Ghosh (2019). Figure 3a shows the two series together with the fitted seasonal trends. The residual series are displayed in Fig. 3b. Note that for better visibility, the Milwaukee series was shifted down vertically by \(360^{\circ }\).

Fig. 1

Windrose plots of daily wind directions in Chicago in 2015, split into four three-month periods: (a) January to March, (b) April to June, (c) July to September and (d) October to December

Fig. 2

Windrose plots of daily wind directions in Milwaukee in 2015, split into four three-month periods: (a) January to March, (b) April to June, (c) July to September and (d) October to December

Fig. 3

(a) Daily wind directions in Chicago and Milwaukee in 2015, together with the fitted seasonal trends. For better visibility, the Milwaukee series is shifted vertically by \(-360^{\circ }\). (b) Residual series obtained after removing the seasonal trend

Fig. 4

Scatterplot of seasonally adjusted wind directions (Milwaukee vs. Chicago)

Fig. 5

Log-log-periodogram of \(\sin {\hat{Z}}_{t}\)

Fig. 6

Scatterplot of seasonally adjusted wind directions (Milwaukee vs. Chicago) together with \({\hat{\mu }}(\psi )\) and a \(95\%\)-confidence band

Now define \(\vartheta _{t}\) to be the residual series for Milwaukee (lower series in Fig. 3b), and \(\psi _{t}\) the residual series for Chicago. The plot of \(\vartheta _{t}\) vs. \(\psi _{t}\) in Fig. 4 indicates a possible relationship between the two wind directions, though the circular nature of the variables makes visual inspection more difficult. The Nadaraya–Watson estimate of \(\mu (\psi )\), based on a von Mises kernel, is shown in Fig. 6. Since we are using a kernel of order 2, the value of \(H_{X}\) does not play any role (see Remark 9). To estimate \(H_{Z}\), we consider a local Whittle estimator with \(m=[n^{0.6}]\) frequencies. For \(S_{{\hat{Z}},t}=\sin {\hat{Z}}_{t}\) where \({\hat{Z}}_{t}=\left[ \vartheta _{t}-{\hat{\mu }}_{n}(\psi _{t})\right] {\text {mod}}2\pi \), we obtain \({\hat{H}}_{Z}=0.75\) with a \(95\%\)-confidence interval of [0.58, 0.92]. Since this interval excludes \(\frac{1}{2}\), there is significant evidence for long memory in \(Z_{t}\). The estimate does not change much when other estimation methods are used. For instance, fitting fractional autoregressive models of orders \(p=0,1,\ldots ,3\), and selecting the best model using the BIC (see e.g. Beran et al. 1998), leads to \({\hat{H}}_{Z}=0.67\) ([0.59, 0.75]). Figure 5 shows the log-log-periodogram of \(\sin {\hat{Z}}_{t}\), together with a fitted spectral density.
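For readers who wish to reproduce this type of estimate, a local Whittle estimator can be sketched as follows. The code is a minimal illustration and not the implementation used for the results above: the function name `local_whittle_h`, the grid search over H and the grid resolution are assumptions made here. It minimizes the usual local Whittle objective \(R(H)=\log \left( m^{-1}\sum _{j=1}^{m}\lambda _{j}^{2H-1}I(\lambda _{j})\right) -(2H-1)m^{-1}\sum _{j=1}^{m}\log \lambda _{j}\) over the m lowest Fourier frequencies \(\lambda _{j}=2\pi j/n\), where I denotes the periodogram.

```python
import numpy as np


def local_whittle_h(x, m, grid=None):
    """Local Whittle estimate of the long-memory parameter H (grid search).

    x    : observed series, e.g. sin(Z_hat_t)
    m    : number of lowest Fourier frequencies used
    grid : candidate values of H (illustrative default: 0.01, 0.011, ..., 0.99)
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    if grid is None:
        grid = np.linspace(0.01, 0.99, 981)

    # Periodogram at the Fourier frequencies lambda_j = 2 pi j / n, j = 1..m
    lam = 2 * np.pi * np.arange(1, m + 1) / n
    dft = np.fft.fft(x - x.mean())[1 : m + 1]
    pgram = np.abs(dft) ** 2 / (2 * np.pi * n)

    # Objective R(H) evaluated for every candidate H on the grid
    expo = 2 * grid - 1
    weighted = lam[None, :] ** expo[:, None] * pgram[None, :]
    objective = np.log(weighted.mean(axis=1)) - expo * np.log(lam).mean()
    return float(grid[np.argmin(objective)])


# Illustration on simulated white noise, for which H = 1/2.
rng = np.random.default_rng(1)
noise = rng.standard_normal(4096)
h_noise = local_whittle_h(noise, m=int(noise.size ** 0.6))
```

In the setting of this section, `x` would be the series \(\sin {\hat{Z}}_{t}\) and `m` would be \([n^{0.6}]\); an approximate \(95\%\)-confidence interval then follows from the asymptotic standard deviation \(1/(2\sqrt{m})\).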

The next step is to test \(H_{Z}\le 3/5\) against \(H_{Z}>3/5\). For the local Whittle estimator, the critical limit at the \(5\%\) level of significance is \(3/5+1.645\cdot \frac{1}{2}m^{-1/2}=0.74\). Since \({\hat{H}}_{Z}=0.75\) is above this threshold, there is evidence at the \(5\%\) level of significance for condition (29). The same conclusion is reached when using estimates based on the BIC method. Confidence bands for \(\mu (\psi )\) based on Theorem 2 and (37) are given in Fig. 6. As a cautionary remark, one should bear in mind that, strictly speaking, the band is simultaneous only for a finite (though possibly large) number of \(\psi \)-values. Note also that, due to the circular nature of the data, there is an apparent jump around \(\psi =17.6^{\circ }\). This is not a jump in the function \(\mu \), but rather an effect of plotting modulo \(360^{\circ }\). The same comment applies to the confidence band.
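The numerical values quoted for the local Whittle estimator can be checked with a few lines of arithmetic. The sketch below assumes \(n=365\) complete daily observations for 2015, which is an assumption made here and not stated explicitly above; the variable names are illustrative.

```python
import math

n = 365                      # assumption: one observation per day in 2015
m = math.floor(n ** 0.6)     # number of frequencies, m = [n^0.6]
se = 1 / (2 * math.sqrt(m))  # asymptotic standard deviation of the estimator

h_z_hat = 0.75                    # local Whittle estimate reported in the text
critical = 3 / 5 + 1.645 * se     # one-sided 5% critical limit
ci = (h_z_hat - 1.96 * se, h_z_hat + 1.96 * se)  # two-sided 95% interval

print(m, round(critical, 2), tuple(round(c, 2) for c in ci))
# prints: 34 0.74 (0.58, 0.92)
```

Under this assumption the computed values agree with the critical limit 0.74 and the interval [0.58, 0.92] reported above.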