Kernel regression for errors-in-variables problems in the circular domain

We study the problem of estimating a regression function when the predictor and/or the response are circular random variables in the presence of measurement errors. We propose estimators whose weight functions are deconvolution kernels defined according to the nature of the involved variables. We derive the asymptotic properties of the proposed estimators and consider possible generalizations and extensions. We provide some simulation results and a real data case study to illustrate and compare the proposed methods.


Introduction
Circular, or angular, data are observations consisting of directions or angles and, as such, are defined on a circle with unit radius once the origin and orientation are established. Such data occur in many fields, for example: meteorology (wind and marine current directions), biology (directions of animal species migration), bioinformatics (conformational angles of a protein), geology (directions of rock fracture), social science and economics (clock or calendar effects). The fact that the maximum and minimum of the measurement scale coincide makes classical statistical methods generally inappropriate for circular data. However, circular statistics is a very active research field, and there are counterparts for most inferential techniques. For a recent comprehensive account of circular statistics, see Ley and Verdebout (2017) and Ley and Verdebout (2018).
Statistical regression models are generally based on the assumption that predictors have been measured exactly. However, sometimes they are not directly observable or are measured with errors, for instance due to imperfect measurement devices (deterioration or miscalibration), or the impossibility of directly accessing the variables of interest. When this is the case, specific models, known as errors-in-variables or measurement error models, have to be considered. A key to solving many errors-in-variables problems is often to take the Fourier transform of the various functions involved, because, in the Fourier domain, equations generally become much simpler to solve. For a review of the extensive literature about the so-called deconvolution problem see, e.g., Carroll and Hall (1988); Liu and Taylor (1989); Carroll et al. (1995).
In the regression setting the kernel deconvolution estimator has been shown to reach the optimal rate of convergence by Fan and Truong (1993). The generalization of the higher order version of the local constant estimator, along with the derivation of the asymptotic normality, has been proposed by Delaigle et al. (2009). Carroll et al. (1999) introduced two new approaches to nonparametric regression in the presence of measurement error, based on the simulation-extrapolation method and on regression splines. Also, estimators involving kernel and orthogonal series methods based on a low order approximation approach have been proposed by Carroll and Hall (2004). The estimation of a nonparametric regression function with a covariate contaminated by a mixture of Berkson and classical measurement errors has been treated by Carroll et al. (2007).
The problem of estimating the density of an unobserved circular variable measured with error has been recently addressed by Di Marzio et al. (2021). Also, nonparametric regression for circular responses has been introduced by Di Marzio et al. (2012a) and Di Marzio et al. (2012b).
In this paper we introduce a nonparametric regression estimator that is consistent in the presence of measurement error when the data can be represented as points on the circumference of the unit circle. Specifically, we present a deconvolution estimator, and show that the resulting rates of the asymptotic accuracy measures are comparable to the Euclidean deconvolution ones. The finite-sample properties of the estimator are investigated through Monte Carlo experiments.
We collect some basic concepts about characteristic functions in Sect. 2, and recall the Nadaraya-Watson estimators involving circular variables in Sect. 3. Estimators which take into account the presence of the measurement error are proposed in Sect. 4, along with some asymptotics. In Sect. 5 we present some simulation results, and we conclude with a real data case study exploring the relation between levels of carbon monoxide and wind direction in Sect. 6.

Some preliminaries
Given a real random variable $X$ with distribution function $F_X$, its characteristic function is defined as
$$\varphi_X(t) = E[\exp(itX)] = \int \exp(itx)\,dF_X(x), \qquad t \in \mathbb{R}.$$
Since there is a one-to-one correspondence between characteristic functions and distribution functions, if $\varphi_X$ is absolutely integrable, it is possible to recover the density function $f_X$ of $X$, for every $x \in \mathbb{R}$, from the characteristic one by using the well known inversion formula
$$f_X(x) = \frac{1}{2\pi}\int \exp(-itx)\,\varphi_X(t)\,dt. \tag{1}$$
Now consider the random variable $\Theta$ taking values on the unit circle. In this case, the density of $\Theta$, say $f_\Theta$, is a $2\pi$-periodic density function, i.e. $f_\Theta(\theta) = f_\Theta(\theta + 2\pi r)$ for any integer $r$; then its characteristic function, say $\varphi_\Theta$, is just defined for integer $\ell$, and satisfies $\varphi_\Theta(\ell) = \varphi_{\Theta + 2\pi}(\ell)$, $\ell \in \mathbb{Z}$. Notice that $\varphi_\Theta(\ell) = \alpha_\ell + i\beta_\ell$, with $\alpha_\ell = E[\cos(\ell\Theta)]$ and $\beta_\ell = E[\sin(\ell\Theta)]$, corresponds to the $\ell$th trigonometric moment of $\Theta$, and that $\beta_\ell = 0$ when $f_\Theta$ is symmetric around zero. It is interesting to note that $\alpha_\ell$ and $\beta_\ell$ are the $\ell$th order Fourier coefficients of the density $f_\Theta$. Analogously to the inversion formula for characteristic functions of real-valued random variables, if $f_\Theta$ is square integrable on $[0, 2\pi)$, one can represent $f_\Theta(\theta)$, $\theta \in [0, 2\pi)$, through the Fourier series
$$f_\Theta(\theta) = \frac{1}{2\pi}\sum_{\ell=-\infty}^{\infty}\varphi_\Theta(\ell)\exp(-i\ell\theta) = \frac{1}{2\pi}\Big\{1 + 2\sum_{\ell=1}^{\infty}\big(\alpha_\ell\cos(\ell\theta) + \beta_\ell\sin(\ell\theta)\big)\Big\}. \tag{2}$$
The smoothness of a generic density function can be determined by the rate of decay of the characteristic function: a polynomial decay characterizes ordinary smooth functions, while an exponential decay characterizes supersmooth ones. When the random variable is real-valued, examples of ordinary smooth densities include the Laplace and Gamma, while the Normal and Cauchy ones are supersmooth. In the circular case, examples for these two classes of functions are obtained by considering the wrapped version of the aforementioned densities. The von Mises density belongs to the class of the supersmooth ones.
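As an illustration of the trigonometric moments and of the Fourier series representation recalled above, the following minimal Python sketch computes $\alpha_\ell$ and $\beta_\ell$ for a von Mises density by numerical quadrature and rebuilds the density from a truncated series. The function names, the grid size and the concentration value are assumptions made for the example, not part of the paper.

```python
import math

GRID = 2000  # quadrature nodes on [0, 2*pi); an arbitrary choice for this sketch
NODES = [2.0 * math.pi * k / GRID for k in range(GRID)]
STEP = 2.0 * math.pi / GRID


def make_von_mises(kappa, mu=0.0):
    """Return the von Mises density with mean direction mu and concentration kappa."""
    # The normalising constant 2*pi*I_0(kappa) is obtained numerically.
    norm = sum(math.exp(kappa * math.cos(t)) for t in NODES) * STEP
    return lambda theta: math.exp(kappa * math.cos(theta - mu)) / norm


def trig_moment(density, ell):
    """Return (alpha_ell, beta_ell) = (E[cos(ell*Theta)], E[sin(ell*Theta)])."""
    alpha = sum(math.cos(ell * t) * density(t) for t in NODES) * STEP
    beta = sum(math.sin(ell * t) * density(t) for t in NODES) * STEP
    return alpha, beta


def fourier_reconstruction(density, theta, max_ell):
    """Evaluate the Fourier series of the density, truncated at max_ell."""
    total = 1.0
    for ell in range(1, max_ell + 1):
        alpha, beta = trig_moment(density, ell)
        total += 2.0 * (alpha * math.cos(ell * theta) + beta * math.sin(ell * theta))
    return total / (2.0 * math.pi)


f = make_von_mises(kappa=2.0)
alpha1, beta1 = trig_moment(f, 1)  # beta1 is numerically zero: the density is symmetric
```

For a supersmooth density such as the von Mises, the coefficients decay very quickly, so a short truncated series already reproduces the density closely.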

Circular kernel regression in the error-free case
In this section we briefly recall the local constant regression estimators, also known as kernel regression or Nadaraya-Watson estimators, proposed by Di Marzio et al. (2009) and Di Marzio et al. (2012a) when the predictor and/or the response variables have circular nature.

Circular response
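Consider a pair of $(\mathcal{D} \times \mathbb{T})$-valued random variables $(\Psi, \Delta)$, where $\mathcal{D}$ is a generic domain and $\mathbb{T} = [0, 2\pi)$. We are interested in the dependence of the response $\Delta$ on the predictor $\Psi$. Given the random sample $(\Psi_1, \Delta_1), \dots, (\Psi_n, \Delta_n)$, assume the model
$$\Delta_i = (m(\Psi_i) + \varepsilon_i)\ \mathrm{mod}(2\pi), \qquad i = 1, \dots, n, \tag{3}$$
where the $\varepsilon_i$s are i.i.d. random angles with zero mean direction and non-zero concentration (so we do not allow a $U(0, 2\pi)$ distribution for $\varepsilon_i$), which are independent of the $\Psi_i$s. Now, for $\psi \in \mathcal{D}$, let
$$m(\psi) = \mathrm{atan2}(g_1(\psi), g_2(\psi)), \tag{4}$$
with $g_j = m_j f$, for $j \in (1, 2)$, where $m_1(\psi) = E[\sin(\Delta) \mid \Psi = \psi]$, $m_2(\psi) = E[\cos(\Delta) \mid \Psi = \psi]$, $f$ is the design density, i.e. the density of the covariate, which may be linear or circular, and the function $\mathrm{atan2}(y, x)$ returns the angle between the $x$-axis and the vector from the origin to $(x, y)$. A kernel estimator for the regression function $m$ at $\psi$ has been proposed by Di Marzio et al. (2012a) as
$$\hat m(\psi; \kappa) = \mathrm{atan2}(\hat g_1(\psi; \kappa), \hat g_2(\psi; \kappa)), \tag{5}$$
where
$$\hat g_1(\psi; \kappa) = \frac{1}{n}\sum_{i=1}^{n}\sin(\Delta_i)\,W_\kappa(\Psi_i - \psi), \qquad \hat g_2(\psi; \kappa) = \frac{1}{n}\sum_{i=1}^{n}\cos(\Delta_i)\,W_\kappa(\Psi_i - \psi), \tag{6}$$
with $W_\kappa$ being a weight function, which may be linear or circular according to the nature of $\Psi$, having $\kappa > 0$ as its smoothing parameter, whose role is to emphasize the contribution of the observations close to the estimation point $\psi$. Noting that $m_1(\psi)$ and $m_2(\psi)$ are the components of the conditional first trigonometric moment of $\Delta$, we can write $m_1(\psi) = C(\psi) f_s(\psi)$ and $m_2(\psi) = C(\psi) f_c(\psi)$, where $f_s(\psi) = \sin(m(\psi))$, $f_c(\psi) = \cos(m(\psi))$ and $C(\psi) = \{m_1^2(\psi) + m_2^2(\psi)\}^{1/2}$. In what follows, we recall both the cases where $\Psi$ is a random angle, i.e. $\mathcal{D} = [0, 2\pi)$, or $\Psi$ is a linear random variable, i.e. $\mathcal{D} = \mathbb{R}$.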

Circular predictor
Consider the case where both $\mathcal{D}$ and $\mathbb{T}$ are $[0, 2\pi)$, and denote the predictor variable as $\Theta$. For $\theta \in [0, 2\pi)$, a local constant estimator for $m(\theta)$ is defined as in Eq. (5), where the functions $\hat g_j$, $j \in (1, 2)$, involve a circular kernel $K_\kappa$ with a smoothing parameter $\kappa > 0$ called the concentration.
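We recall that a circular kernel of order $r$, defined by Di Marzio et al. (2009) as an $r$th sin-order kernel, is a function $K_\kappa$ symmetric around the null mean direction, with $\kappa$ increasing with $n$ in such a way that, as $\kappa$ increases, $\int_{-\delta}^{\delta} K_\kappa(\theta)\,d\theta$ tends to 1, for $\delta \in (0, \pi)$, and, denoting $\eta_j(K_\kappa) = \int_0^{2\pi} \sin^j(\theta) K_\kappa(\theta)\,d\theta$, it holds
$$\eta_0(K_\kappa) = 1, \qquad \eta_j(K_\kappa) = 0 \ \text{ for } 0 < j < r, \qquad \eta_r(K_\kappa) \neq 0.$$
Some asymptotic properties for the local constant estimator $\hat m(\theta; \kappa)$ are collected in the following result.

Result 1 Given the $[0, 2\pi) \times [0, 2\pi)$-valued random sample $(\Theta_1, \Delta_1), \dots, (\Theta_n, \Delta_n)$, consider estimator (5) equipped with a circular kernel $K_\kappa$ as the weight function. If
(i) $K_\kappa$ is a second sin-order kernel admitting a convergent Fourier series representation $\frac{1}{2\pi}\{1 + 2\sum_{\ell=1}^{\infty}\gamma_\ell(\kappa)\cos(\ell\theta)\}$, with $\kappa$ increasing with $n$ in such a way that, for $\ell \in \mathbb{Z}^+$, $\lim_{n\to\infty}\frac{1 - \gamma_\ell(\kappa)}{1 - \gamma_2(\kappa)} = \frac{\ell^2}{4}$, $\lim_{n\to\infty}\gamma_\ell(\kappa) = 1$ and $\lim_{n\to\infty}\frac{1}{n}\sum_{\ell=1}^{\infty}\gamma_\ell^2(\kappa) = 0$;
(ii) the design density $f_\Theta$ and the conditional expectations $m_1$ and $m_2$ are twice continuously differentiable in a neighbourhood of $\theta$, and

Remark 1 Condition (i) of Result 1 is very mild because most of the usual circular densities, which are symmetric about the null mean direction, are included in the class of second sin-order kernels. Among these, an uncommon case is the uniform kernel on $[-\pi/(\kappa + 1), \pi/(\kappa + 1)]$, which has a smaller support than the circle, where $\kappa \in \mathbb{N}$. A kernel satisfying the above conditions without being a density is instead the Dirichlet kernel $(2\pi\sin(\theta/2))^{-1}\sin((\kappa + 1/2)\theta)$. It can be negative, and its order depends on the value of $\kappa$, being $\kappa + 1$ if $\kappa$ is odd, and $\kappa + 2$ otherwise.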
A local linear version of the estimator $\hat m$ can be obtained using the circular analogue of a local linear weight in defining the sample statistics $\hat g_1$ and $\hat g_2$; see Di Marzio et al. (2012a) for details.
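The local constant estimator recalled in this subsection can be sketched in a few lines of Python. This is only an illustrative sketch: the von Mises kernel is assumed as the weight function, its normalising constant is omitted because it cancels inside atan2, and the function name is hypothetical.

```python
import math


def nw_circular_response(theta, thetas, deltas, kappa):
    """Local constant estimate of m(theta) for a circular response and predictor.

    thetas, deltas: observed predictor and response angles (radians).
    kappa: concentration of the (unnormalised) von Mises kernel.
    """
    g1 = 0.0  # kernel-weighted sum of sin(Delta_i)
    g2 = 0.0  # kernel-weighted sum of cos(Delta_i)
    for t, d in zip(thetas, deltas):
        # Subtracting 1 in the exponent avoids overflow for large kappa.
        w = math.exp(kappa * (math.cos(t - theta) - 1.0))
        g1 += w * math.sin(d)
        g2 += w * math.cos(d)
    return math.atan2(g1, g2) % (2.0 * math.pi)
```

Larger values of kappa concentrate the weights around the estimation point, so the estimate becomes more local and more variable.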

Linear predictor
Now we consider the case where $\mathcal{D} = \mathbb{R}$, and denote the predictor variable as $X$.
Here the functions $\hat g_j$, $j \in (1, 2)$, of the local constant estimator (5) use a linear kernel
$$K_h(\cdot) = \frac{1}{h}K\Big(\frac{\cdot}{h}\Big), \tag{7}$$
where $K$ is a symmetric density function with maximum at 0, and $h > 0$ is a smoothing parameter called the bandwidth. Also define the quantities $\mu_j(K) = \int x^j K(x)\,dx$ and $\nu(K) = \int K^2(x)\,dx$, and recall that $K$ is a $k$th-order kernel if $\mu_0(K) = 1$, $\mu_j(K) = 0$ for $j = 1, \dots, k - 1$, and $\mu_k(K) \neq 0$.
The asymptotic properties are collected in the following result.

Result 2 Given the $\mathbb{R} \times [0, 2\pi)$-valued random sample $(X_1, \Delta_1), \dots, (X_n, \Delta_n)$, consider the estimator given by (5), equipped with a linear kernel $K_h$ as the weight function. If
(i) $K_h$ is a second order kernel such that $h \to 0$ and $nh \to \infty$ as $n \to \infty$;
(ii) the design density $f_X$ and the conditional expectations $m_1$ and $m_2$ are twice continuously differentiable in a neighbourhood of $x$,
then, for any interior point $x$ of the support of $f_X$, and

Remark 2 Condition (i) of Result 2 is a basic requirement and is satisfied by common second order kernels such as the Uniform, Epanechnikov, Biweight, Triweight and Gaussian ones.
Analogously to the previous case, a local linear version of the estimator $\hat m$ can be obtained using a local linear weight in defining the sample statistics $\hat g_j$, $j \in (1, 2)$, as detailed in Di Marzio et al. (2012a).

Linear response
Consider the case where $\mathcal{D} = [0, 2\pi)$ and $\mathbb{T} = \mathbb{R}$, and denote, respectively, the predictor and the response variables as $\Psi$ and $Y$. Given the random sample $(\Psi_1, Y_1), \dots, (\Psi_n, Y_n)$, assume the model
$$Y_i = m(\Psi_i) + \sigma(\Psi_i)\,\varepsilon_i, \qquad i = 1, \dots, n, \tag{8}$$
where the $\varepsilon_i$s are i.i.d. real random variables with zero mean and unit variance, and $\sigma^2(\cdot)$ is the conditional variance of $Y$.
A kernel regression estimator for $m$ at $\psi \in [0, 2\pi)$ is defined as
$$\hat m(\psi; \kappa) = \frac{\sum_{i=1}^{n} K_\kappa(\Psi_i - \psi)\,Y_i}{\sum_{i=1}^{n} K_\kappa(\Psi_i - \psi)}, \tag{9}$$
with $K_\kappa$ being a circular kernel. Some asymptotic properties are collected in the following result.

Result 3 Given the $[0, 2\pi) \times \mathbb{R}$-valued random sample $(\Psi_1, Y_1), \dots, (\Psi_n, Y_n)$, consider the estimator given by (9). If assumption (i) of Result 1 holds, and
(i) the second derivative of the regression function $m$ is continuous,
(ii) the conditional variance $\sigma^2(\cdot)$ is continuous, and the density $f_\Psi$ is continuously differentiable,
then and

More generally, a class of nonparametric regression estimators for the case of a linear response and a circular predictor has been described in detail by Di Marzio et al. (2009).

Circular kernel regression with errors-in-variables
Now we consider the errors-in-variables context, where the predictor variable is observed with errors. Specifically, suppose that we are interested in estimating nonparametrically the regression of $Y$ on $X$, denoted as $m$, and that our data are realizations from the variables $Z = X + \xi$ and $Y$, say $(z_1, y_1), \dots, (z_n, y_n)$. A general model for this case could be
$$y_i = m(x_i) + \varepsilon_i, \qquad z_i = x_i + \xi_i,$$
for $i = 1, \dots, n$, where $X$ and $Y$ respectively refer to the predictor and response variable, the $\varepsilon_i$s are realizations of the random error term $\varepsilon$, and the $\xi_i$s are realizations of the random measurement error $\xi$. The unobserved variable $X$ is usually referred to as the latent or true variable. The usual assumptions include that $\xi$ is independent of both $X$ and $\varepsilon$, and that the distribution of $\varepsilon$ is unknown but has mean 0 and constant variance, while the distribution of $\xi$ is known. Let $f_Z$, $f_X$ and $f_\xi$ respectively denote the probability density functions of $Z$, $X$ and $\xi$. Basic theoretical considerations suggest that $f_Z$ is obtained from the convolution between $f_X$ and $f_\xi$, i.e.
$$f_Z(z) = \int f_X(z - u)\,dF_\xi(u),$$
where $F_\xi$ denotes the distribution function of $\xi$. As a consequence, the estimators of the error-free model, when applied to the contaminated data, are clearly not consistent.
A deconvolution approach can be used to obtain accurate estimators for $m$. We start by expressing the above relationship using the characteristic functions
$$\varphi_Z(t) = \varphi_X(t)\,\varphi_\xi(t),$$
where $\varphi_Z$, $\varphi_X$ and $\varphi_\xi$ denote the characteristic functions of $Z$, $X$ and $\xi$, respectively. Assuming that $\varphi_\xi$ is known, and that $\varphi_Z$ can be estimated on the basis of sample data, the quantity of interest can be identified by the ratio
$$\varphi_X(t) = \frac{\varphi_Z(t)}{\varphi_\xi(t)}. \tag{10}$$
The problem arises from the fact that $\varphi_\xi(t)$ vanishes as $t \to \infty$. Hence, plugging the estimate $\hat\varphi_Z$ in Eq. (10) may not yield a consistent estimate of $\varphi_X$, because even very small overestimates of $|\varphi_Z|$ are magnified by the arbitrarily large factor $1/\varphi_\xi$. This is the well-known ill-posed inverse problem. The scenario is exacerbated if the error density is supersmooth, because this makes the characteristic function tend to zero very fast as $t \to \infty$. The solution is represented by the so-called kernel deconvolution estimator proposed by Stefanski and Carroll (1990), which uses a kernel whose Fourier transform has compact support. This yields a compactly supported estimate $\hat\varphi_Z$, and, consequently, $\hat\varphi_Z$ will vanish before small values of $\varphi_\xi(t)$ cause the ratio to diverge. A more general perspective suggests using a damping factor, i.e. multiplying $\hat\varphi_Z$ by a function that steadily goes to zero. Usually this function is the Fourier transform of an ordinary kernel $W_\kappa$. Consequently, using the inversion theorem (1), the deconvolution kernel will be
$$\tilde W_\kappa(u) = \frac{1}{2\pi}\int \exp(-itu)\,\frac{\varphi_{W_\kappa}(t)}{\varphi_\xi(t)}\,dt, \tag{11}$$
where $\varphi_{W_\kappa}$ is the Fourier transform of the kernel $W_\kappa$.
The strategy described above is implemented in the next sections to obtain errors-in-variables estimators for the cases when the predictor and/or the response have a circular nature.

Circular response
We are interested in estimating a regression function $m$ as in Eq. (4), but now we only observe the sample $(\Phi_1, \Delta_1), \dots, (\Phi_n, \Delta_n)$ of i.i.d. observations, with the $\Delta_i$s obtained according to model (3), and the $\Phi_i$s being independent copies of the random variable
$$\Phi = (\Psi + \xi)\ \mathrm{mod}(2\pi), \tag{12}$$
where the $\Psi_i$s are independent copies of the latent variable $\Psi$, whose density function $f_\Psi$ is defined on a generic domain $\mathcal{D}$, and $\xi$ is a random measurement error assumed to be independent of $(\Psi, \varepsilon)$, with a known density function $f_\xi$ which is symmetric around zero. We also assume that $f_\xi$, $f_\Psi$ and $f_\Phi$ are square integrable densities.
On the basis of the deconvolution approach, a local estimator for $m$ at $\psi \in \mathcal{D}$ can be defined as
$$\tilde m(\psi; \kappa) = \mathrm{atan2}(\tilde g_1(\psi; \kappa), \tilde g_2(\psi; \kappa)), \tag{13}$$
where the functions $\tilde g_j$, $j \in (1, 2)$, have the same structure as $\hat g_j$ in Eq. (6), but employ the deconvolution kernel (11) in place of the weight function $W_\kappa$.

Circular predictor
Consider the case where $\mathcal{D} = [0, 2\pi)$ and denote the predictor as $\Theta$. Also, let $f_\xi$ be a circular density admitting an absolutely convergent Fourier series representation. A nonparametric estimator for $m$ at $\theta \in [0, 2\pi)$, denoted by $\tilde m(\theta; \kappa)$, can be obtained by employing circular deconvolution kernels in formula (13). Therefore, recalling that the characteristic function of a periodic density takes values only for integer numbers, using the inversion formula (2), and considering that for a symmetric function $\beta_\ell = 0$ for any $\ell$, we have
$$\tilde K_\kappa(\theta) = \frac{1}{2\pi}\Big\{1 + 2\sum_{\ell=1}^{\infty}\frac{\gamma_\ell(\kappa)}{\lambda_\ell(\kappa_\xi)}\cos(\ell\theta)\Big\}, \tag{14}$$
with smoothing parameter $\kappa > 0$, where $\gamma_\ell(\kappa)$ and $\lambda_\ell(\kappa_\xi)$, for $\ell \in \mathbb{Z}$, respectively are the $\ell$th Fourier coefficients of the periodic weight function $K_\kappa$ and of the error density $f_\xi$, whose concentration is $\kappa_\xi$. The estimator is well defined when the error density has nonvanishing Fourier coefficients, $\gamma_\ell(\kappa)$ is not identically zero and $\sum_{\ell=1}^{\infty}\gamma_\ell^2(\kappa)/\lambda_\ell^2(\kappa_\xi) < +\infty$, which imply that both $K_\kappa$ and $\tilde K_\kappa$ are square integrable functions.
Some asymptotic properties are collected in the following result.

Result 4 Given the $[0, 2\pi) \times [0, 2\pi)$-valued random sample $(\Phi_1, \Delta_1), \dots, (\Phi_n, \Delta_n)$, consider the estimator $\tilde m(\theta; \kappa)$, $\theta \in [0, 2\pi)$. If the assumptions of Result 1 hold, then and

Proof See Appendix. ◻

We notice that the measurement error does not affect the bias of the estimator $\tilde m$, which is identical to the error-free case, while the variance is considerably larger. This result will hold for the deconvolution estimators described in the next sections too.
Remark 3 Estimator (13) can also be obtained by using the unbiased score approach, which is based on the idea that it suffices to impose the constraint that the $\tilde g_j$, for $j \in (1, 2)$, employ an unknown weight function $L_\kappa$ such that
$$E[L_\kappa(\Phi_i - \theta) \mid \Theta_i] = K_\kappa(\Theta_i - \theta),$$
i.e.
$$\int L_\kappa(\Theta_i + u - \theta)\,f_\xi(u)\,du = K_\kappa(\Theta_i - \theta).$$
By working in the Fourier domain, it can be seen that $L_\kappa(\theta) = \tilde K_\kappa(\theta)$.
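To make the circular deconvolution kernel concrete, here is a hedged Python sketch of a truncated version of its Fourier series, in which each kernel coefficient is divided by the corresponding error coefficient. It assumes a von Mises weight function, whose coefficients are computed by quadrature, and a wrapped Laplace error parameterised by a scale b, so that its coefficients are 1/(1 + b²ℓ²); this parameterisation, the truncation point and all function names are assumptions of the example.

```python
import math


def von_mises_coeffs(kappa, max_ell, grid=2000):
    """Cosine Fourier coefficients I_ell(kappa)/I_0(kappa), ell = 1..max_ell, by quadrature."""
    nodes = [2.0 * math.pi * k / grid for k in range(grid)]
    weights = [math.exp(kappa * (math.cos(t) - 1.0)) for t in nodes]
    total = sum(weights)
    return [
        sum(w * math.cos(ell * t) for w, t in zip(weights, nodes)) / total
        for ell in range(1, max_ell + 1)
    ]


def wrapped_laplace_coeffs(scale, max_ell):
    """Fourier coefficients of a wrapped Laplace error with zero mean (assumed scale form)."""
    return [1.0 / (1.0 + (scale * ell) ** 2) for ell in range(1, max_ell + 1)]


def deconvolution_kernel(theta, gammas, lambdas):
    """Truncated series (1/2pi){1 + 2 sum_ell (gamma_ell / lambda_ell) cos(ell * theta)}."""
    total = 1.0
    for ell, (gamma, lam) in enumerate(zip(gammas, lambdas), start=1):
        total += 2.0 * (gamma / lam) * math.cos(ell * theta)
    return total / (2.0 * math.pi)


gammas = von_mises_coeffs(kappa=3.0, max_ell=10)
lambdas = wrapped_laplace_coeffs(scale=0.2, max_ell=10)
```

Setting every error coefficient to 1 recovers the truncated series of the ordinary kernel, which is the error-free case; the division by coefficients smaller than 1 is what sharpens the kernel and inflates the variance.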

Linear predictor
Now, we consider the case where $\mathcal{D} = \mathbb{R}$, and denote the predictor variable as $X$. We assume that the measurement errors come from a known, symmetric density $f_\xi$ with variance $\sigma_\xi^2$ and characteristic function $\varphi_\xi(t) \neq 0$ for all $t$.
A kernel regression estimator for $m$ at $x \in \mathbb{R}$, denoted by $\tilde m(x; h)$, is defined by employing in estimator (13) a linear deconvolution kernel
$$\tilde K_h(x) = \frac{1}{2\pi h}\int \exp(-itx/h)\,\frac{\varphi_K(t)}{\varphi_\xi(t/h)}\,dt, \tag{15}$$
with a smoothing parameter $h > 0$, where $\varphi_K(t) = \int \exp(itx)K(x)\,dx$ is the Fourier transform of the kernel $K$ appearing in formula (7). In this case we assume that $\varphi_K$ is not identically zero and that $\int |\varphi_K(t)/\varphi_\xi(t/h)|\,dt < \infty$ for all $h > 0$. As for the asymptotic properties we have the following result.

Result 5 Given the $\mathbb{R} \times [0, 2\pi)$-valued random sample $(X_1, \Delta_1), \dots, (X_n, \Delta_n)$, consider the estimator $\tilde m(x; h)$, $x \in \mathbb{R}$. If the assumptions of Result 2 hold, then, for any interior point $x$ of the support of $f_X$, and

Remark 4 If the support of $f_X$ is $[a, b]$, where $-\infty < a < b < +\infty$, then a different bias and variance hold when $x$ is close to the boundary. For this special case we need to adapt the boundary theory described in Di Marzio et al. (2012a) by employing a deconvolution kernel (15).
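For one common pairing the linear deconvolution kernel has a closed form, which the following Python sketch checks against a direct numerical Fourier inversion. The sketch assumes a standard Normal kernel and a Laplace measurement error with scale b (characteristic function 1/(1 + b²t²)), and works on the standardised scale u = x/h; the function names and the quadrature settings are illustrative assumptions.

```python
import math


def std_normal_pdf(u):
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def laplace_deconv_kernel(u, b_over_h):
    """Closed form for a Gaussian kernel and Laplace error: phi(u) * {1 + (b/h)^2 (1 - u^2)}."""
    return std_normal_pdf(u) * (1.0 + b_over_h ** 2 * (1.0 - u * u))


def deconv_kernel_by_inversion(u, b_over_h, t_max=12.0, n=4800):
    """Trapezoidal approximation of (1/2pi) * integral of cos(t u) phi_K(t) / phi_err(t/h) dt."""
    step = 2.0 * t_max / n
    total = 0.0
    for k in range(n + 1):
        t = -t_max + k * step
        weight = 0.5 if k in (0, n) else 1.0
        # Gaussian kernel transform exp(-t^2/2) divided by the Laplace transform 1/(1 + (b t / h)^2).
        total += weight * math.cos(t * u) * math.exp(-0.5 * t * t) * (1.0 + (b_over_h * t) ** 2)
    return total * step / (2.0 * math.pi)
```

The closed form shows how the correction grows with the ratio b/h: a small bandwidth relative to the error scale produces a kernel with a higher peak and negative side lobes, in line with the larger variance of the deconvolution estimator.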

Linear response
We are now interested in the dependence of the real-valued response $Y$ on the circular predictor $\Psi$, when the random sample $(\Phi_1, Y_1), \dots, (\Phi_n, Y_n)$, modelled according to Eqs. (8) and (12), is available. Here the $\xi_i$s are i.i.d. circular random variables with zero mean direction and finite concentration, and are independent of the $(\Psi_i, \varepsilon_i)$s.
The local constant estimator for $m$ is defined by
$$\tilde m(\psi; \kappa) = \frac{\sum_{i=1}^{n}\tilde K_\kappa(\Phi_i - \psi)\,Y_i}{\sum_{i=1}^{n}\tilde K_\kappa(\Phi_i - \psi)}, \tag{16}$$
where $\tilde K_\kappa$ is a circular deconvolution kernel defined in formula (14).
As for the asymptotic properties we have the following result.

Result 6 Given the $[0, 2\pi) \times \mathbb{R}$-valued random sample $(\Phi_1, Y_1), \dots, (\Phi_n, Y_n)$, consider estimator (16). If assumption (i) of Result 1, and assumptions (i) and (ii) of Result 3 hold, then and

Proof See Appendix. ◻

Remark 5 We notice that, as in the Euclidean setting, the measurement error has no effect on the asymptotic bias of the estimator, which, when the predictor observed with error is circular (linear, respectively), depends only on the second moment of the classical kernel $K_\kappa$ ($K_h$, resp.). The asymptotic variance, similarly to the Euclidean setting, depends on the Fourier coefficients (characteristic function, resp.) of the error density appearing in the roughness of the deconvolution kernel $\tilde K_\kappa$ ($\tilde K_h$, resp.).
Concerning the distribution of the estimators, asymptotic results can be obtained following the same approach as Delaigle et al. (2009), where some regularity conditions, directly applicable to our linear predictor case, are provided. In particular, using their assumptions about the kernel, the design density, the moments of the response and the error density (with the adaptations for the circular predictor case), the asymptotic normality of the estimators could be established along the same lines as Theorems 1 and 2 of Delaigle et al. (2009). In fact, in the simulations of the next section we observed that the distribution of the estimates in regions of high design density is bell-shaped, closely resembling a Gaussian one.

Simulations
Our overall goal is to compare the performance of the standard Nadaraya-Watson regression estimator (SNW) with the proposed deconvolution one (DNW).Note that we will use "local constant" estimators only in these examples.We choose to avoid the task of smoothing degree selection in the estimates, consequently our results illustrate the potential of each method, with the caveat that the best performance is obtained conditional on an optimally selected smoothing degree.Our motivation for this is that we have not presented any data driven rule for smoothing selection, and so it appears preferable to avoid the situation in which the adoption of a sub-optimal rule then hides the strict merits of the estimators.
We consider these three simulation scenarios:
(i) the circular-circular (C-C) case, where the regression function $m$ is modelled as the measurement and regression errors are assumed to follow, respectively, a wrapped Laplace (wL) and von Mises (vM) distribution, i.e., for $i = 1, \dots, n$, $\xi_i \sim wL(0, 0.2)$ and $\varepsilon_i \sim vM(0, 5)$, the $\Theta_i$ come from a von Mises density with mean $\pi$ and concentration parameter 0.01, and a von Mises density is used as the kernel;
(ii) the circular-linear (C-L) case, where we assume that the measurement and regression errors follow, respectively, a wrapped Laplace and Normal distribution, i.e. $\xi_i \sim wL(0, 0.2)$ and $\varepsilon_i \sim N(0, 0.2^2)$, for $i = 1, \dots, n$, $\Theta_i \sim vM(\pi, 0.01)$, and the employed weight function is the von Mises density;
(iii) the linear-circular (L-C) case, where we use for $m$ the model the measurement and regression errors follow, respectively, a Laplace and von Mises distribution, i.e. $\xi_i \sim L(0, 0.2)$ and $\varepsilon_i \sim vM(0, 4)$, for $i = 1, \dots, n$, $X_i \sim N(6, 2^2)$, and the standard Normal distribution is employed as the weight function.
An illustration of the above regression models is shown in Fig. 1. In both scenarios (i) and (ii) we truncated the summation in Eq. (14) at $\ell = 10$. Here the truncation point does not play the role of a smoothing parameter, but it is necessary to set it in order to ensure an accurate description of the chosen kernel through its Fourier coefficients. We have prudently chosen $\ell = 10$ because this value largely guarantees an adequate representation of the deconvolution kernel in all of our simulation case studies.
Obviously, to study the effect of the measurement errors, it is possible to consider additional scenarios with different concentration (dispersion) parameters for the error model. However, we notice that when the latter goes to infinity (zero) we can ignore the error in the analysis, and therefore the errors-in-variables estimator gives the same results as the standard kernel regression estimator. This is easily seen considering that in this case $\lambda_\ell \to 1$ for any $\ell$. In contrast, when it approaches zero (infinity), the target regression model becomes particularly hard to estimate because the error makes it unidentifiable. Additional simulation results are reported in the Supplementary Material. In a first experiment we use the best possible smoothing degree for each sample, obtained as the minimizer of the averaged squared error (ASE) along a grid of size $M$ describing the design density support. Due to the nature of the response variables, ASE has two definitions. For the $b$th sample, $b \in \{1, \dots, B\}$, in the L-C and C-C cases it is defined as
$$\mathrm{ASE}_b(\nu) = \frac{1}{M}\sum_{j=1}^{M}\big\{1 - \cos\big(\hat m_b(s_j; \nu) - m(s_j)\big)\big\},$$
where $\hat m_b$ denotes the estimate obtained from the $b$th sample, $\nu \in \{h, \kappa\}$, with $s_j \in \{x_j, \theta_j\}$ being a grid element, and $(1 - \cos(\theta - \phi))$ is the usual angular distance between the circular locations $\theta$ and $\phi$. For the C-L case, ASE is instead defined as
$$\mathrm{ASE}_b(\kappa) = \frac{1}{M}\sum_{j=1}^{M}\big\{\hat m_b(\theta_j; \kappa) - m(\theta_j)\big\}^2,$$
where $\theta_j$ is a grid element. For both $n = 200$ and $n = 500$ we have drawn $B = 500$ samples. Averaging over samples leads to the following global performance index (see, for example, Hart (1997), p. 86)
$$\mathrm{MASE}(\nu) = \frac{1}{B}\sum_{b=1}^{B}\mathrm{ASE}_b(\nu),$$
where $\nu \in \{h, \kappa\}$. We show MASE as a function of $\nu$, and the corresponding minima, in Fig. 2 and Table 1, respectively. We note that the deconvolution estimator gives a clear advantage, while both methods improve as the sample size increases.
Interestingly, the figure shows that the deconvolution method tends to be uniformly superior for large $n$ and the same choice of smoothing parameter, and this appears to be reassuring if we consider that a proper data-driven smoothing selector still does not exist. However, the MASE curves also allow a kind of sensitivity analysis, in the sense that the curvatures of the deconvolution MASEs are generally much more pronounced than in the naive case. This suggests that a hypothetical selection method would be doomed to high variability.
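The accuracy measures used in this experiment are straightforward to compute. The Python sketch below assumes that ASE is a plain average over the grid, using the angular distance 1 − cos for circular responses and the squared difference for linear ones, and that MASE is the average of the ASE values over the samples; the function names are hypothetical.

```python
import math


def ase_circular(estimates, truths):
    """Average of 1 - cos(estimate - truth) over the grid (circular response)."""
    return sum(1.0 - math.cos(e - t) for e, t in zip(estimates, truths)) / len(truths)


def ase_linear(estimates, truths):
    """Average squared difference over the grid (linear response)."""
    return sum((e - t) ** 2 for e, t in zip(estimates, truths)) / len(truths)


def mase(ase_values):
    """Mean of the ASE values obtained from the simulated samples."""
    return sum(ase_values) / len(ase_values)
```

The angular distance is insensitive to the choice of origin and to wrapping: two estimates that differ by a multiple of 2π give the same value, and the maximum, 2, is reached for antipodal directions.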

Mis-specification of the measurement error
In this section we consider the case of a wrongly specified measurement error model. In a first experiment the data are corrupted by a supersmooth error while an ordinary smooth model is assumed; in a second experiment we consider the opposite scenario. We have ensured that the concentration (or variability) of the error sample is very similar to that of the assumed model in order to isolate the misspecification effect. Results are reported in Table 2. In panel A (B), in the C-C and C-L cases the data are corrupted by a von Mises (wrapped Laplace) measurement error, while a wrapped Laplace (von Mises) noise model is assumed. In the L-C case

From Panel A we see that, if we wrongly assume a (wrapped) Laplace model, results are still reasonable, because we obtain positively biased estimates of the higher order coefficients, and this does not strongly affect the stability of the DNW estimator, given that such estimates appear in the denominators of the deconvolution kernels. The opposite scenario clearly yields poorer performance, as seen in Panel B, because assuming a supersmooth error density leads to negatively biased estimates of the higher order coefficients, hampering the stability of the DNW estimator. As a result, our advice is: if in doubt, assume an ordinary smooth error model.

Pollution and surface wind data
The amount of pollution faced by a particular location will depend on a variety of factors. In this section we consider the response variable to be the amount of carbon monoxide (CO), and the explanatory variable to be the wind direction. In this case, if the source of the pollution is upwind of the sensor, then a higher amount of pollution is likely, and vice versa. The data were obtained from the Texas Commission on Environmental Quality, which has many monitoring sites. Figure 3 shows the locations of sites which are close to Houston. We have selected a site near Houston ("North Loop") in Harris County at Latitude: 29.81° North and Longitude: −95.39° West, using data from 2018. The data are collected hourly, but we have calculated the average daily wind direction (using the directional average), and the average daily CO (in parts per million). We note that on 6 (out of 365) days the wind direction had two peaks, and in such cases the average is not so meaningful. But this is a small proportion (less than 2%) of the days and is unlikely to change our conclusions. These daily averages were "thinned" to reduce serial correlation, resulting in 183 observations from alternate days. A technical treatment of correlation in circular data has been given by Di Marzio et al. (2012b). It is argued there that, for a fixed sample size, the variance of an estimator increases with the correlation, and it is often the case that the autocorrelation structure determines the optimal smoothing degree.
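The directional average used to summarise the hourly wind directions can be sketched as follows. The function name is hypothetical and angles are assumed to be in radians; the mean resultant length is returned as well, since a value near zero flags days, such as those with two opposite peaks, on which the average direction is not meaningful.

```python
import math


def directional_average(angles):
    """Return (mean direction in [0, 2*pi), mean resultant length) of a list of angles."""
    s = sum(math.sin(a) for a in angles) / len(angles)
    c = sum(math.cos(a) for a in angles) / len(angles)
    mean_direction = math.atan2(s, c) % (2.0 * math.pi)
    resultant_length = math.hypot(s, c)
    return mean_direction, resultant_length


# Directions of 350 and 10 degrees average to north (0 degrees),
# whereas their arithmetic mean would be 180 degrees.
mean_dir, r_len = directional_average([math.radians(350.0), math.radians(10.0)])
```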
As a first benchmark we initially fit a parametric model in which CO ($y$) is related to wind direction ($\theta$) using a sine-cosine model
$$y_i = \beta_0 + \beta_1\sin\theta_i + \beta_2\cos\theta_i + \varepsilon_i, \qquad i = 1, \dots, n.$$
This gives fitted values $\hat\beta_0 = 0.568$, $\hat\beta_1 = -0.173$, $\hat\beta_2 = 0.074$, with the prediction curve plotted with the data in Fig. 4. It is clear that the CO pollution is highest when the wind is coming from the south (2.73 radians).
As a second benchmark, we fit a standard circular-linear nonparametric regression, in which the measurements are treated as error free. The smoothing parameter (chosen by leave-one-out cross-validation) was selected as $\kappa = 7.77$ for a von Mises kernel, and the resulting curve is also shown in Fig. 4. For this model, the maximum CO occurs at 2.11 radians.
In this circular-linear case, we use a measurement error model for the observed wind direction which can be approximated by a wrapped Normal error with zero mean and concentration $\rho$ equal to 0.9. This choice was motivated, in part, by Di Marzio et al. (2021) when dealing with surface wind data, but also influenced by a desire to note a difference from the error-free case (which is equivalent to taking $\rho = 1$). As suggested by the simulation results, we truncated the summation in Eq. (14) at $\ell = 10$. The resulting deconvolution kernel depends only on $\kappa$, since $\lambda_\ell$ is determined by the wrapped Normal concentration parameter. The estimated CO is then given using Eq. (16), in which $\kappa$ was found by leave-one-out cross-validation to be 3.35. Naive cross-validation is sometimes used in practice although it does not have a sound theoretical foundation. The resulting curve is shown in Fig. 4, and is seen to be somewhat less smooth than the error-free model estimate. The nonparametric errors-in-variables model has a residual sum of squares equal to 1.91, whereas that of the parametric model is slightly larger (2.40) and that of the error-free model very similar (1.99). The maximum estimated CO occurs at $\theta = 2.17$ for the errors-in-variables model.

Discussion
In the paper we introduce a local constant fit for circular data when the sample is affected by a measurement error.Future research work will deal with the generalization of the proposed methodology to higher order interpolating polynomials, the specification of more suitable smoothing degree selectors and the treatment of the circular errors-in-variables regression problem with other methods, such as the low-order approximation or equivalence ones.

Appendix A
Proof of Result 4. By following the same lines as in the proof of Lemma 1 in Di Marzio et al. (2012a), and noting that, for a circular kernel $K_\kappa$, its second sin-moment $\eta_2(K_\kappa) = \int \sin^2(\theta)K_\kappa(\theta)\,d\theta$ and roughness $R(K_\kappa) = \int K_\kappa^2(\theta)\,d\theta$ are equal to $(1 - \gamma_2(\kappa))/2$ and $\{1 + 2\sum_{\ell=1}^{\infty}\gamma_\ell^2(\kappa)\}/(2\pi)$, respectively, we obtain for $j \in (1, 2)$, and Then applying the same arguments as in the proof of Theorem 1 in Di Marzio et al. (2012a) gives the asymptotic bias and the asymptotic variance results. □

Proof of Result 5. By reasoning as in the proof of Lemma 3 in Di Marzio et al. (2012a), and considering that $x \in \mathbb{R}$ is an interior point of the support of $f_X$, we obtain for $j \in (1, 2)$, and Then applying the same arguments as in the proof of Theorem 3 in Di Marzio et al. (2012a) gives the asymptotic bias and the asymptotic variance results. □

Proof of Result 6. We obtain the asymptotic bias by reasoning as in the proof of Result 4. This result can, additionally, be derived by employing the unbiased score approach. The asymptotic variance follows from the same arguments as in the proof of Theorem 3 in Di Marzio et al. (2009), when considering the roughness of a deconvolution kernel. □

Fig. 1 From left: C-C, C-L and L-C regression models

Fig. 2 MASE curves of SNW (continuous) and DNW (dashed) estimators over a grid of smoothing values for the C-C (top), C-L (middle), and L-C (bottom) scenarios. Empty circles indicate the minima of the curves

Fig. 3 Active monitoring sites close to Houston, Texas. Data from Houston North Loop is selected for our illustration. Extracted using the GeoTAM Map Viewer at https://www.tceq.texas.gov/airquality/monops/sites/air-mon-sites

Fig. 4 Carbon monoxide vs wind direction at Houston North Loop monitoring station: alternate daily averages for 2018. Parametric sin/cos model (red), fitted nonparametric errors-in-variables model (black) and standard circular-linear (no error model) kernel regression (dashed)

Table 1 Minimum values of MASEs over 500 samples of size n for SNW and DNW estimators