On nonparametric regression for bivariate circular long-memory time series

We consider nonparametric regression for bivariate circular time series with long-range dependence. Asymptotic results for circular Nadaraya–Watson estimators are derived. Due to long-range dependence, a range of asymptotically optimal bandwidths can be found where the asymptotic rate of convergence does not depend on the bandwidth. The result can be used for obtaining simple confidence bands for the regression function. The method is illustrated by an application to wind direction data.


Introduction
Directional or circular time series arise in many scientific fields such as meteorology, oceanography, biology, neuroscience, bioinformatics, geoscience and cosmology. An overview of directional data analysis can be found for instance in books by Mardia (1972), Fisher (1993), Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001) and Ley and Verdebout (2017). Most literature deals with iid observations. More generally, for cases where more than one circular variable is observed, parametric and nonparametric circular-circular regression with iid residuals is discussed for instance in Gould (1969), Fisher and Lee (1992), Johnson and Wehrly (1978), Mardia and Jupp (2000), Jammalamadaka and SenGupta (2001), Kato et al. (2008), Kim and SenGupta (2016) and Polsen and Taylor (2015). Nonparametric density estimation on the circle is discussed for instance by Watson (1983), Hall et al. (1987), Bai et al. (1988), Fisher (1989Fisher ( , 1993 and Taylor (2008). Also see Tsurata and Sagae (2017) for properties of higher order circular kernels. In practice, one often observes circular time series that exhibit serial dependence. Circular processes with weak serial dependence are considered for instance in Wehrly and Johnson (1980), Breckling (1989), Kato (2010), Modlin et al. (2012) and Wang and Gelfand (2014). Circular time series with longrange dependence are considered in Di Marzio et al. (2012) and Beran and Ghosh (2019). Di Marzio et al. (2012) discuss nonparametric trend estimation, Beran and Ghosh (2019) define models with long-range dependence using Gaussian subordination, and derive asymptotic results for parametric estimators where the mean direction depends on deterministic explanatory variables. In this paper, we extend the results in Beran and Ghosh (2019) to a nonparametric circular-circular regression model of the form where ϑ t , ψ t ∈ [−π, π) (t = 1, 2, . . .), ψ t and Z t are stationary long-memory processes defined by Gaussian subordination, and Z t is such that (see e.g. Gould 1969). Equations (1) and (2) mean that, given ψ t = w 0 , the mean direction of ϑ t is μ w 0 , i.e.
We consider asymptotic properties of circular kernel estimators of μ(w 0 ). Limit theorems are derived using in particular general results in Mielniczuk and Wu (2004). Due to long-range dependence, a range of asymptotically optimal bandwidths can be found where the asymptotic rate of convergence does not depend on the bandwidth. The results can be used for obtaining simple simultaneous confidence bands for μ(w 0 1 ), . . . , μ(w 0 m ) (w 0 1 , . . . , w 0 m ∈ [−π, π), m ∈ N). The paper is organized as follows. A general introduction and definitions are given in Sect. 2. Limit theorems are discussed in Sect. 3. In Sect. 4, asymptotic results from Sect. 3 are used to obtain confidence bands for μ(ψ). An application to wind direction data is discussed in Sect. 5. Proofs are given in the Appendix.

Definition of the model
We extend the model introduced in Beran and Ghosh (2019) to bivariate circular time series with long-range dependence. First we recall the definition of long memory in the sense of second order dependence. A real-valued second order stationary time series with autocovariance function γ (k) is said to exhibit long-range (or strong) dependence, if γ (k) = ∞. Often it is assumed that or that the spectral density has a pole at the origin characterized by with H ∈ ( 1 2 , 1) and 0 < c γ < ∞. Here, "∼" means that the ratio of the left and right hand side converges to 1. References to the extended literature on long-memory processes can be found for instance in Beran (1994), Giraitis et al. (2012) and Beran et al. (2013). For generating schemes for long memory processes also see e.g. Davidson and Sibbertsen (2005).
To obtain bivariate circular time series (ψ t , ϑ t ) ∈ [0, 2π) 2 (t ∈ Z) with long-range dependence, the following assumptions will be used: for some constants 0 < c γ,Z , c f ,Z , c γ,X , c f ,X < ∞ and 1 2 < H Z , H X < 1. Moreover, assume that the two processes are independent from each other. π)) be an absolutely continuous circular cumulative distribution function with density g ψ = G ψ , and denote by the standard normal distribution function. Define the circular time series ψ t (t ∈ Z) by • (A3) Let μ : [−π, π) → R be a twice continuously differentiable function. The circular time series ϑ t (t ∈ Z) is defined by Remark 1 Assumption (A1) implies Note also that Z t and X t can be written in the Wold representation with iid innovations ε t ∼ N (0, σ 2 ε ) and η t ∼ N (0, σ 2 η ) respectively, as where a 0 = c 0 = 1 and for some suitable constants 0 < C a , C c < ∞.

Remark 2
The circular time series ψ t = G −1 ψ ( (X t )) is said to be subordinated to the Gaussian process X t (see e.g. Rosenblatt 1961Rosenblatt , 1979Taqqu 1975Taqqu , 1979Dobrushin and Major 1979;Dobrushin 1980). By definition, the marginal distribution of the series is the circular distribution G ψ . Note that the circular density function g ψ = G ψ is arbitrary, i.e. using (10) any circular density function can be obtained as a marginal density via Gaussian subordination.
Assumption (8) means that X t exhibits long-range dependence. For the circular time series ψ t , a different definition of autocovariance and autocorrelation function has to be used. Let ν ψ denote the mean direction of ψ t , i.e. E[exp(iψ t )] = R exp(iν ψ ). Jammalamadaka and Sarma (1988) proposed the circular autocorrelation function To see whether long memory of X t is inherited by ψ t one needs to introduce the notion of Hermite rank. Denote by ϕ(x) = (2π) − 1 2 exp(− 1 2 x 2 ) the standard normal density function, and by L 2 (R; ϕ) the space of real valued functions h with build an orthogonal basis in L 2 (R; ϕ). Thus, every function h ∈ L 2 (R; ϕ) has an orthogonal L 2 -representation where the qth Hermite coefficient a q is given by A function h with a 0 = 0 is said to have Hermite rank m ≥ 1, if a m = 0 and a j = 0 ( j < m) (Taqqu 1975). The following Lemma is derived in Beran and Ghosh (2019): Lemma 1 Let ψ t (t ∈ Z) be defined in (10) with γ X characterized by (8). Denote by ν ψ the mean direction of ψ t , and define Suppose that the Hermite rank of h is m. Furthermore assume that Then, setting we have Remark 3 Note that, by definition of the mean direction ν ψ , Lemma 1 means that, under assumption (17), the directional series ψ t has longrange dependence in the sense that its circular autocorrelation ρ circular (k) has a hyperbolic decay and is not summable.
As for the second circular series ϑ t , (11) implies Since the processes X t and Z t are stationary, ϑ t is stationary with a constant mean direction ν ϑ . By definition, the conditional mean direction of ϑ t given ψ t = w 0 is equal to μ(w 0 ) (see (3)). Moreover, (20) implies that long-range dependence in Z t leads to long-range dependence in ϑ t .

Kernel estimation of the conditional mean direction
Let w 0 ∈ [0, 2π). The conditional mean direction μ(w 0 ) of ϑ t can be estimated by a circular kernel estimator. In the context of iid observations, circular kernel estimators have been discussed for instance by Hall et al. (1987), Fisher (1989), Jammalamadaka and SenGupta (2001), Taylor (2008), Tsurata and Sagae (2017); Tsuruta and Sagae (2020), Bedouhene and Zougab (2020). Here, we will consider Nadaraya-Watson estimators of the formμ For the kernel function K b we adopt the definition introduced in Hall et al. (1987) and Tsurata and Sagae (2017) among others.
Moreover, the lth moment of L is defined by

Remark 4 Note that by definition
The following assumptions on L are used in Tsurata and Sagae (2017): Note that, under assumptions (L1) to (L4), K b has a Fourier series representation The order of a circular kernel is defined as follows (Tsurata and Sagae 2017): (2017) discuss methods for constructing higher order circular kernels from lower order kernels. For instance, let

Asymptotic results
The asymptotic rate of convergence ofμ n (w 0 ) − μ(w 0 ) is characterized by the following decomposition: Then, Remark 6 The first term in (26) The order of this error is smaller the higher the order of the kernel K b is.
Equation (26) can be used to address the question of optimal bandwidth choice. Note first that the third term in (26) is the only one that does not depend on the bandwidth b. This means that an error of order O p (n H Z −1 ) is always there, independently of the choice of b. A sequence of bandwidths b n may therefore be called optimal, if The first question is whether (27) can be achieved. The answer is given by the following Lemma: Lemma 2 Under the assumptions of Theorem 1 the following statements are equivalent: (i) The set of bandwidth sequences such that b n > 0, b n → 0, nb n → ∞ and (27) holds is not empty. (ii) The set of bandwidth sequences such that b n > 0, b n → 0, nb n → ∞ and As a corollary we obtain: Corollary 1 Suppose that the assumptions of Theorem 1, and (29) hold. Then the set of optimal sequences is not empty. Moreover, for any such sequence b n we havê A closer analysis of (30) leads to convergence in distribution: Theorem 2 Suppose that the assumptions of Theorem 1, and (29) hold. Then (7) and ζ is a standard normal random variable. More generally, for w 0 1 < w 0 2 < · · · < w 0 m (w 0 j ∈ [0, 2π)), Remark 7 Equation (32) means in particular that the standardized random deviations of estimates at different values w 0 j , w 0 l are asymptotically perfectly correlated. This is very much in contrast to usual behaviour in nonparametric regression. The degenerate limit theorem simplifies the construction of simultaneous confidence intervals for μ(w 0 1 ), . . . , μ(w 0 m ).
Under simple additional conditions, uniform convergence ofμ n (w 0 ) can be obtained: Theorem 3 Suppose that the assumptions of Theorem 1, and (29) hold. Moreover, assume where α s (b) are the coefficients in the Fourier series representation (25). Then Remark 10 The conditions in Theorem 3 are related to Lemma 3.2 and Corollary 3.2 in Ghosh (2014).
Corollary 2 Suppose that the assumptions of Theorem 1, and (29) hold. Moreover, assume that for some β > 0, and b n is a sequence of bandwidths such that b n > 0, b n → 0, nb n → ∞, (27) holds and Then Example 1 Denote by I s (κ) (s = 0, 1, 2, . . .) modified Bessel functions of order s. Consider a von Mises kernel with κ = b −2 , The Fourier series representation of K b is given by Thus, condition (36) with β = 1 holds. Similar results can be obtained for the generalized von Mises distribution introduced by Kim et al. (2013).
Example 2 For a wrapped normal kernel with σ = b, we have the Fourier series representation Therefore, condition (36) with β = 1 holds.

Asymptotic confidence intervals
Theorem 2 can be used to obtain asymptotic confidence sets for μ(w 0 1 ),…,μ(w 0 m ). Since condition (29) has to be checked first, a two-step procedure has to be applied. Suppose for instance that we use a kernel of order 2. Then we have to test the null hypothesis H 0 : H Z ≤ 3/5 against the alternative H 1 : H Z > 3/5. If H 0 is rejected, then (32) implies that confidence intervals with asymptotic coverage probability 1 − α can be defined by where q 1−α/2 is the (1 − α/2)-quantile of the standard normal distribution (see (32)).
Testing H Z ≤ 3/5 versus H Z > 3/5 can be done as follows. Under the given conditions, S Z ,t = sin Z t mod 2π has the same long-memory parameter H Z as Z t . Therefore a consistent estimator of H Z can be based on the periodogram of the series SẐ ,t = sinẐ t (t = 1, . . . , n) whereẐ t = [ϑ t −μ n (ψ t )] mod 2π . There is a vast range of consistent methods for estimating the long-memory parameter (see e.g. Beran et al. 2013, chapter 5, and references therein). For instance, letĤ Z be a local Whittle estimator based on the periodogram of SẐ ,t at the m lowest frequencies. Then, under mild regularity conditions (see Robinson 1995), as m → ∞, m/n → 0, where ζ 1 is a standard normal random variable. Thus, given a level of significance α ∈ (0, 1), we reject H 0 : where q 1−α is the (1 − α)-quantile of the standard normal distribution.

Remark 11
Confidence intervals (37) are conditional, since they are computed under the condition that H 0 : H Z ≤ 3/5 is rejected. In principle this would call for an adjustment of quantiles, for instance using a Bonferroni correction. However, note that (32) and (37) are applicable only, if H Z > 3/5. If this is the case (i.e. H Z > 3/5), then then H 0 is rejected with asymptotic probability one, provided that we use a consistent estimatorĤ Z of H Z . Therefore, no correction for multiple testing needs to be applied asymptotically.

Remark 12
In practice, the constants c μ (w 0 j ) ( j = 1, . . . , m) in (37) have to be estimated. This means that H Z and c f ,Z have to be replaced by consistent estimates.

Remark 13
If one uses a kernel of order p = 2k with k ≥ 2 (instead of k = 1), a test of has to be carried out. This is slightly more complicated, because the right hand side of the inequality involves the unknown parameter H X . Since sin ψ t has the same longmemory parameter H X as ψ t , H X can be estimated based on the periodogram of the series S ψ,t = sin ψ t (t = 1, . . . , n). Suppose for instance thatĤ Z andĤ X are local Whittle estimators based on SẐ ,t = sinẐ t (see above) and S ψ,t respectively. Then where ζ 1 , ζ 2 are independent standard normal random variables and m is the number of lowest frequencies used for estimation. Let A rejection region at an asymptotic level of significance α is then defined by the condition T > q 1−α where q 1−α is the (1 − α)-quantile of the random variable The distribution of W depends on H Z and H X . For each pair (H Z , H X ) ∈ 1 2 , 1 2 ,

Data example
We consider daily average wind directions recorded in Chicago and Milwaukee between 1 January and 31 December, 2015 (data source https://www.glerl.noaa.gov/).  Fig. 2a-d. These figures indicate typical seasonal patterns. In a first step we therefore remove a deterministic seasonal trend from both series. This is done using the deseasonalization method for circular data described in Beran and Ghosh (2019). Figure 3a shows the two series together with the fitted seasonal trend. The residual series are displayed in Fig. 3b. Note that for better visibility, the Milwaukee series was moved down vertically by 360 • . Define now ϑ t to be the residual series for Milwaukee (lower series in Fig. 3b), and ψ t the residual series for Chicago. The plot of ϑ t vs. ψ t in Fig. 4 indicates a possible relationship between the two wind directions, though the circular nature of (a) January to March (b) April to June (c) July to September (d) October to December  Fig. 6. Since we are using a kernel of order 2, the value of H X does not play any role (see Remark 9). With respect to estimation of H Z , we consider a local Whittle estimator with m = [n 0.6 ] frequencies.
For SẐ ,t = sinẐ t whereẐ t = ϑ t −μ n (ψ t ) mod 2π , we obtainĤ Z = 0.75 with a 95%-confidence interval of [0.58, 0.92]. Thus, there is significant evidence for long memory in Z t . The estimate does not change much, when other estimation methods are used. For instance, fitting fractional autoregressive models of orders p = 0, 1, . . . , 3, and selecting the best model using the BIC (see e.g. Beran et al. 1998), leads to  Figure 5 shows the log-log-periodogram sinẐ t , together with a fitted spectral density. The next step is to test H Z ≤ 3/5 against H Z > 3/5. For the local Whittle estimator, the critical limit for the 5% level of significance is 3/5 + 1.645 · 1 2 m −1/2 = 0.74. Sincê H Z = 0.75 is above this threshold, there is evidence at the 5%-level of significance for condition (29). The same conclusion is reached when using estimates based on the BIC method. Confidence bands for μ(ψ) based on Theorem 2 and (37) are given in Fig. 6. As a cautionary remark one should bear in mind that, strictly speaking, the band is simultaneous for a finite -though possibly large -number of ψ-values only. Note also that due to the circular nature of the data, there is an apparent jump around ψ = 17.6 o . This is not a jump in the function μ, but rather due to plotting modulo 360 • . The same comment applies to the confidence band.
Acknowledgements Open Access funding enabled and organized by Projekt DEAL. The data set was obtained from the homepage of the NOAA Great Lakes Environmental Research Laboratory (https://www. glerl.noaa.gov/). We would like to thank the associate editor and the referees for their insightful remarks.
Funding Open Access funding enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Proofs
Proof (of Theorem 1) The method of the proof of Theorem 1 follows similar lines as Proposition 1 and Section 3.2 in Mielniczuk and Wu (2004). For ψ 0 ∈ [0, 2π), we haveμ is a kernel density estimator of the density f ψ at ψ 0 . Using the notation The first term is deterministic and does not depend on the dependence structure. Following arguments in Tsurata and Sagae (2017) (for the special case of k = 1 also see Taylor 2008), leads to To study asymptotic properties of D n , the following notation is useful: and

Consider the decomposition
The asymptotic results in Mielniczuk and Wu (2004) are obtained for kernel regression with non-circular kernels. Taking into account the definition and properties of circular kernels, the arguments in Mielniczuk and Wu (2004) can be reformulated accordingly, to yield and r n = o p (max{A n , B n , C n }).
Proof (of Lemma 2) From Theorem 1 and (26) we havê In order that the term O p (n H Z −1 ) dominates we need max b 2k , (nb) −1/2 , b 2 n H X −1 = o n H Z −1 .

This leads to
and In order that (39) and (40) hold simultaneously, we need Proof (of Corollary 1) The result follows directly from Lemma 2 and Theorem 1.