A reliable data-based smoothing parameter selection method for circular kernel estimation

A new data-based smoothing parameter for circular kernel density (and density derivative) estimation is proposed. Following plug-in ideas, the unknown quantities in the optimal smoothing parameter are replaced by suitable estimates. This paper provides a circular version of the well-known Sheather and Jones bandwidths (DOI: 10.1111/j.2517-6161.1991.tb01857.x), with direct and solve-the-equation plug-in rules. Theoretical support for our developments, related to the asymptotic mean squared error of the estimators of the density, its derivatives, and its functionals, for circular data, is provided. The proposed selectors are compared with previous data-based smoothing parameters for circular kernel density estimation. This paper also contributes to the study of the optimal kernel for circular data. An illustration of the proposed plug-in rules is also shown using real data on the time of car accidents.


Introduction
Circular data are observations that can be represented on the unit circumference and where periodicity must be taken into account. Classic examples appear when the goal is to model orientations or a periodic phenomenon with a known period. Several applications of circular data can be found, e.g., in Ley and Verdebout (2018) or SenGupta and Arnold (2022). The complicated features that circular data can exhibit in real data applications have led to several new flexible models in the statistical literature. A recent review of flexible parametric circular distributions can be found in Ameijeiras-Alonso and Crujeiras (2022). When trying to obtain a flexible fit of the density function, an alternative to parametric models is kernel density estimation. Kernel density estimation for circular data dates back to Beran (1979) and Hall et al. (1987), while the estimator of the density derivatives was studied by Klemelä (2000) and Di Marzio et al. (2011). It is well known that the choice of the smoothing parameter is critical when using these kernel methods.
In the usual linear inferential framework, where random variables are supported on the Euclidean space, one can find many approaches for selecting the "best" data-driven bandwidth parameter (see, e.g., Jones et al., 1996, for a discussion on this topic). Due to its good performance, one of the most employed bandwidth selectors is the plug-in bandwidth proposed by Sheather and Jones (1991). The relevance of this plug-in selector is evident from the impressive number of citations that Sheather and Jones (1991) have received, and although other authors have introduced new bandwidth selectors, none of the proposals outperforms, in general, their plug-in selector.
There exists some literature on smoothing parameter selection for circular data; some of these ideas are based on cross-validation (Hall et al., 1987), rule-of-thumb (Taylor, 2008), adaptive-mixture of von Mises (Oliveira et al., 2012), or bootstrap (Di Marzio et al., 2011) techniques. But none of the approaches introduced so far presents an outstanding performance with respect to its competitors, as is the case with the proposal by Sheather and Jones (1991) in the linear setting. Hence, the goal of this paper is to provide the theory needed to derive an algorithm that replicates the idea of the two-stage direct and solve-the-equation plug-in bandwidth selectors for circular kernel density estimation. In addition, the developed theory can also be employed when estimating the density derivatives.
Regarding the kernel choice, most of the results for circular density estimation fix the kernel to be the von Mises density function. In this paper, we study the asymptotic results for a more general class of kernels. The developed results allow us to obtain the optimal kernels in the circular estimation context. This paper is organized as follows. Section 2 is devoted to the definition of the circular kernel density derivative estimator. Also, the key function needed to derive the optimal smoothing parameter is introduced in this section. For a general circular kernel, the asymptotic mean integrated squared error and the optimal smoothing parameter of the derivative estimator are derived in Section 3. Section 4 is devoted to the estimation of density functionals, needed to derive the plug-in rules. In Section 5, we discuss the kernel choice. Section 6 provides all the needed details to compute the two-stage direct and solve-the-equation plug-in smoothing selectors. A simulation study showing that the proposed rules provide a competitive smoothing parameter is given in Section 7. Section 8 uses a real data example to show the applicability of the proposed selector. Some final remarks are provided in Section 9. Section 10 describes the software that implements the proposed smoothing parameters. The proofs of the theoretical results appear in Appendices A and B.

Circular kernel density derivative estimation
Given a circular random sample of angles Θ_1, …, Θ_n ∈ [−π, π), with associated density function f, the circular kernel density estimator (see, for example, Oliveira et al., 2012) can be defined as

f̂(θ; ν) = (1/n) Σ_{i=1}^{n} K_ν(θ − Θ_i), (2.1)

where K_ν is the circular kernel with smoothing parameter ν ∈ [0, 1], denoting by ν the mean resultant length (differently from the previous literature, where ν usually stands for a concentration parameter).
This allows us to establish a theory that will be valid for both, kernels depending on the concentration, such as the von Mises, or on the mean resultant length, such as the wrapped normal (see Section 5 for a formal introduction of these kernels).
The estimator (2.1) can be easily extended when the objective is estimating the r-th derivative of f, denoted by f^{(r)}. In that case, the estimator of f^{(r)} can be defined as (Di Marzio et al., 2011)

f̂^{(r)}(θ; ν) = (1/n) Σ_{i=1}^{n} K_ν^{(r)}(θ − Θ_i). (2.2)

The first question is: which functions can be employed as kernels? In this case, we will assume that K satisfies the standard conditions that one can find for the canonical linear kernel (Wand and Jones, 1995, Ch. 2). In particular, K will be a circular density, (reflectively) symmetric about zero, unimodal, and square-integrable on [−π, π). Note that these last conditions ensure that the kernel K has the following convergent Fourier series representation (see Mardia and Jupp, 1999, Section 4.2),

K_ν(θ) = (1/(2π)) {1 + 2 Σ_{j=1}^{∞} α_{K,j}(ν) cos(jθ)}, (2.3)

where only the values of α_{K,j}(ν) ∈ [0, 1], for j ∈ {1, 2, …}, depend on the employed kernel and the smoothing parameter ν.
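As an illustration, estimators of the form (2.1) and (2.2) can be coded in a few lines. The sketch below assumes a von Mises kernel, parametrised here by its concentration κ for simplicity; the function names are ours, not the paper's:

```python
import numpy as np
from scipy.special import i0

def vm_kde(theta, data, kappa):
    """Circular kernel density estimate with a von Mises kernel:
    average of K_kappa(theta - Theta_i) over the sample."""
    theta = np.atleast_1d(theta)
    diffs = theta[:, None] - data[None, :]
    return np.exp(kappa * np.cos(diffs)).mean(axis=1) / (2 * np.pi * i0(kappa))

def vm_kde_deriv(theta, data, kappa):
    """Derivative estimator for r = 1: average of the differentiated kernel,
    d/du K_kappa(u) = -kappa sin(u) exp(kappa cos(u)) / (2 pi I0(kappa))."""
    theta = np.atleast_1d(theta)
    diffs = theta[:, None] - data[None, :]
    return (-kappa * np.sin(diffs) * np.exp(kappa * np.cos(diffs))).mean(axis=1) \
        / (2 * np.pi * i0(kappa))
```

Since the estimate is itself a circular density, it integrates to one over [−π, π), and the estimate of f' integrates to zero by periodicity.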
Secondly, a crucial element in kernel density estimation is the smoothing parameter. Generally, this parameter is taken as a non-random sequence, depending on the sample size, and then as a fixed value for a sample realization. In this work, we will introduce the function h depending on ν,

h_K(ν) = ∫_{−π}^{π} θ² K_ν(θ) dθ. (2.4)

Then, h can be seen as the bandwidth, and the sequence of numbers h_n ≡ h_K(ν_n) must satisfy the condition lim_{n→∞} h_n = 0.
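The bandwidth-like quantity h_K(ν) can be evaluated numerically for a given kernel. The sketch below assumes that h is the circular second moment ∫ θ² K_ν(θ) dθ (an assumption consistent with the approximations used later in the paper) and uses the von Mises kernel, parametrised by its concentration κ:

```python
import numpy as np
from scipy.special import i0

def h_vm(kappa, m=20000):
    """h_K(nu) as the circular second moment of the von Mises kernel,
    computed by quadrature on a uniform grid over [-pi, pi)."""
    theta = np.linspace(-np.pi, np.pi, m, endpoint=False)
    dens = np.exp(kappa * np.cos(theta)) / (2 * np.pi * i0(kappa))
    return (theta**2 * dens).mean() * 2 * np.pi
```

For the uniform density (κ = 0), the second moment is π²/3, and h decreases towards 0 as the kernel concentrates, matching the requirement h_n → 0.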

Asymptotic results
In this section, we establish the results needed for deriving the asymptotic mean integrated squared error (AMISE) and the AMISE-optimal smoothing parameter.Throughout this section, for a given derivative order r, we will employ the following assumptions to derive the asymptotic results.
(A2) The kernel K is a bounded circular density, (reflectively) symmetric about zero, unimodal and its r-th derivative is square integrable in [−π, π).
Assumptions (A1) and (A2) coincide with those employed in the standard linear case. For a large class of kernels, including the von Mises or the wrapped normal (see Section 5), we will see that Assumption (A3) translates into the standard conditions on the bandwidth, namely, h_n → 0 and n h_n^{(2r+1)/2} → ∞. The result in Theorem 1 (see Section A of the Appendix for a formal proof) states the AMISE order of the kernel derivative estimator (2.2). If we also assume the following two extra conditions, we can derive an explicit expression of the AMISE.
Theorem 1 states the asymptotic expression of the MISE, which depends on the sample size n, the derivative of the density function f^{(r+2)}, the kernel K, and the smoothing parameter ν. The complete expression of the AMSE is provided in Section A of the Appendix.
Remark 1. Di Marzio et al. (2009, 2011) also analysed the AMISE of the circular kernel estimator, obtaining a result similar to (3.1), with h_K(ν_n) replaced by (1 − α_{K,2}(ν_n))/2. Note that, although not stated there, Condition (E1) must be imposed to derive their asymptotic results. This can be seen in the remainder term of the asymptotic bias in the proof of Theorem 1 in Di Marzio et al. (2009).
Assuming Condition (E1), we obtain that h K (ν n ) can be approximated by (1 − α K,2 (ν n ))/2.Thus, asymptotically, both results coincide, but the expression provided in this paper will allow us to derive an expression of the optimal concentration.
The issue with the general AMISE expression given in (3.1) is that it is not straightforward to derive an explicit expression of the optimal smoothing parameter, unless a specific kernel, such as the von Mises, is chosen (see, e.g., Di Marzio et al., 2011). This problem can be solved when the following extra condition is assumed.
(E3) As ν_n → 1, R_{K;r,2}(ν_n) = Q_{K;r,2} h_K(ν_n)^{−(2r+1)/2} (1 + o(1)), where Q_{K;r,2} is a constant only depending on the kernel K and r.
Note that, under Condition (E3), Assumption (A3) simplifies to h_n → 0 and n h_n^{(2r+1)/2} → ∞. Using this last assumption, we can observe the classic variance-bias trade-off, where the bias is reduced if h_n is "small" and the variance decreases if h_n is "large". The optimal smoothing parameter with respect to the AMISE criterion can be obtained using the following corollary of Theorem 1, which is a direct consequence of h_K(ν) > 0 (see Section A of the Appendix). Note that, in the following corollary, we obtain that, under the previous assumptions, the AMISE order coincides with the one obtained in the linear case, O(n^{−4/(2r+5)}) (Wand and Jones, 1995, Section 2.5).
As usual, when using data-based plug-in smoothing parameters, the main problem with employing the optimal h_{K;r;AMISE} in Corollary 1 is that its value depends on the unknown quantity ∫_{−π}^{π} f^{(r+2)}(θ)² dθ. A rule-of-thumb smoothing selector can be obtained by replacing f with a simple and standard density function, such as the von Mises density. One can also follow the Ćwik and Koronacki (1997) approach and replace f with a mixture model, such as a mixture of von Mises densities. Both techniques were already proposed and studied in the circular literature when employing the von Mises density as the kernel K (see Taylor, 2008; Oliveira et al., 2012). Following Sheather and Jones (1991), an alternative is to estimate density functionals related to ∫_{−π}^{π} f^{(s)}(θ)² dθ. In the next section, we study how to obtain a kernel estimator of this last quantity in the circular context.

Estimation of density functionals
As mentioned in the previous section, an issue with using, in practice, the optimal smoothing parameter (3.2) is its dependence on the unknown quantity ∫_{−π}^{π} f^{(r+2)}(θ)² dθ. In this section, we will see how to estimate this last quantity using kernel techniques. For doing so, we first define the functional

ψ_s = ∫_{−π}^{π} f^{(s)}(θ) f(θ) dθ.

Note that, under sufficient smoothness assumptions on f (e.g., the conditions needed to apply integration by parts), we obtain that

∫_{−π}^{π} f^{(s)}(θ)² dθ = (−1)^s ψ_{2s}. (4.1)

Since ψ_s = E(f^{(s)}(Θ)), the following estimator can be employed to estimate the unknown quantity ∫_{−π}^{π} f^{(r+2)}(θ)² dθ in the optimal smoothing parameter (3.2),

ψ̂_{s;ρ} = (1/n²) Σ_{i=1}^{n} Σ_{j=1}^{n} L_ρ^{(s)}(Θ_i − Θ_j), (4.2)

where L and ρ are a kernel and a concentration parameter, which are possibly different from K and ν.
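Estimators of the functional ψ_s can be implemented directly. The sketch below assumes the double-sum form familiar from the linear Sheather and Jones construction, with a von Mises pilot kernel and s = 0, so that no kernel derivatives are needed (the function name is ours):

```python
import numpy as np
from scipy.special import i0

def psi0_hat(data, rho_kappa):
    """Double-sum estimator of psi_0 = ∫ f(θ)² dθ, i.e. the s = 0 case of
    n^{-2} Σ_i Σ_j L_ρ^{(s)}(Θ_i − Θ_j), with a von Mises pilot kernel."""
    diffs = data[:, None] - data[None, :]
    return np.exp(rho_kappa * np.cos(diffs)).mean() / (2 * np.pi * i0(rho_kappa))
```

For data spread uniformly over the circle the estimate is close to the uniform value 1/(2π), while clustered data yield a larger value, as ∫ f² should.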
Using the estimator (4.2), a direct plug-in smoothing parameter h_{K;r;PI} can be obtained from (3.2), replacing the quantity depending on the true f by its estimator (4.3). The problem with the direct plug-in estimator (4.3) is that it still depends on a choice of the pilot smoothing parameter ρ. Below, we establish the asymptotic theory needed to derive the optimal smoothing parameter of ψ̂_{2r+4;ρ}. To obtain that result, the following condition on the pilot parameter is required.
Replacing, in Assumptions (A1)-(A3), the kernel K by L, the smoothing parameter ν_n by ρ_n, and the order of the derivative r by s, we obtain the following AMSE for the estimator ψ̂_{s;ρ} (see Section B of the Appendix for a formal proof).
Theorem 2. Under Assumptions (A1)-(A4), (E1), and (E2) (using L instead of K; ρ_n instead of ν_n; and s, an even number, instead of r), we have (4.4). As for Corollary 1, for deriving a simpler expression of the AMSE and the optimal value of h_n ≡ h_L(ρ_n), we will assume the following extra conditions.
(E4) As ρ_n → 1, R_{L;s,1}(ρ_n) = Q_{L;s,1} h_L(ρ_n)^{−(s+1)/2} (1 + o(1)), where Q_{L;s,1} is a constant only depending on the kernel L and the even number s. The sign of Q_{L;s,1} is equal to the sign of (−1)^{s/2}.
From the AMSE expression in (4.4), we can see that the optimal h n value, in terms of AMSE, will depend on the relation between R L;s,1 (ρ n ), R L;s,2 (ρ n ), and h L (ρ n ).Conditions (E3) and (E4) help to establish this relation, from which the following corollary is derived (see Section B of the Appendix).
Corollary 2. Consider the assumptions of Theorem 2 and Conditions (E3) and (E4). Then, for the kernel estimator of ψ_s (see (4.2)), we have that, asymptotically, the optimal value of h_n, in terms of the AMSE expression, can be obtained from (4.5). Under the previous assumptions, if Condition (E5) is also assumed, the minimal AMSE for (4.2) is of order O(n^{−min(5,s+3)/(s+3)}). If s is an even number greater than 2, the minimal AMSE, of order n^{−5/(s+3)}, is given in (4.6). When s = 2, the minimal AMSE is equal to the sum of the right-hand sides of (4.6) and (4.7).
Note that Condition (E5) is only assumed to derive the same AMSE optimal order as in the linear case.If that condition is not fulfilled, then, when s > 0, the leading term in the AMSE could be of a larger order (see Section B of the Appendix).Another important consideration is that the sign of ψ s+2 is the same as that of (−1) s/2+1 and, using Condition (E4), it also coincides with the sign of −Q L;s,1 .Therefore, we always have that h L;s;AMSE ≥ 0 in (4.5).
Using the quantity h L;2r+4;AMSE in (4.5), one can obtain the pilot smoothing parameter ρ needed to derive the direct plug-in estimator (4.3).The issue, as in the linear case, is that h L;2r+4;AMSE still depends on the unknown value of the functional ψ 2r+6 .We comment on how to overcome this difficulty in Section 6.

The kernel choice
Theoretical results in Sections 3 and 4 provide mathematical support for deriving the optimal smoothing parameter. However, we have not yet discussed how to obtain the plug-in concentration parameter ν_{r;AMISE} from h_{K;r;AMISE}. As mentioned in Section 3, we must solve the equation h_K(ν) = h_{K;r;AMISE}. We also need to obtain the values of Q_{K;r,1} and Q_{K;r,2} in (3.2) and (4.5).
In this section, we study what happens when the "most common" circular models are employed as kernels.For doing so, we restrict our attention to the four standard choices of circular densities (see Mardia and Jupp, 1999, Section 3.5): cardioid, von Mises, wrapped normal, and wrapped Cauchy.
Once the behaviour of the standard kernels is studied, a second question is which kernel is optimal in the circular context. As in Muller (1984), we study this optimality in terms of the AMISE expression in (3.3). Fixing f and the sample size n, in Section 5.3, we obtain the circular kernel that minimizes the AMISE.

von Mises and wrapped normal kernels
First, denoting by I_j the modified Bessel function of the first kind and order j, let us consider the density expressions of the von Mises (VM) and the wrapped normal (WN) kernels.
For both kernels, when ν_n → 1, the previous assumptions hold; even more, it is easy to show that, in that setting, Conditions (E1) and (E2) are satisfied. Thus, as n → ∞, we obtain the following asymptotic approximations,

h_{K_VM}(ν_n) ≈ 1/κ_n, (5.1)

h_{K_WN}(ν_n) ≈ −2 log(ν_n). (5.2)

From the previous approximations, we can see that κ_n or ν_n is easily derived after computing the optimal/plug-in value of h_n (see, e.g., (3.2) or (4.3)). Again, considering ν_n → 1, it can be seen that Conditions (E3)-(E5) are satisfied for the von Mises and wrapped normal kernels. For both kernels and a non-negative integer r, the constants in (E3) and (E4) admit explicit expressions, from which the AMSE and AMISE results derived in Sections 3 and 4 can be easily obtained in practice. These optimal asymptotic results coincide for both the von Mises and the wrapped normal kernels.
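These small-dispersion approximations can be checked numerically. The sketch below assumes that h_K(ν) is the circular second moment of the kernel, and compares it with 1/κ for the von Mises and with −2 log ν for the wrapped normal, whose Fourier coefficients are α_j(ν) = ν^{j²}:

```python
import numpy as np
from scipy.special import i0

def h_vm(kappa, m=20000):
    """Circular second moment of the von Mises kernel, by quadrature."""
    theta = np.linspace(-np.pi, np.pi, m, endpoint=False)
    dens = np.exp(kappa * np.cos(theta)) / (2 * np.pi * i0(kappa))
    return (theta**2 * dens).mean() * 2 * np.pi

def h_wn_exact(nu, terms=60):
    """Second moment of the wrapped normal kernel via its Fourier coefficients:
    ∫ θ² K = π²/3 + 4 Σ_j (−1)^j ν^{j²} / j² (Fourier series of θ² on [-π, π])."""
    j = np.arange(1, terms + 1)
    return np.pi**2 / 3 + 4 * np.sum((-1.0)**j * nu**(j**2) / j**2)
```

The wrapped normal check uses an exact Fourier identity, so the only discrepancy with −2 log ν is the approximation error itself, which vanishes as ν → 1.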

Wrapped Cauchy kernel
The wrapped Cauchy kernel density expression is

K_{WC;ν}(θ) = (1 − ν²) / {2π(1 + ν² − 2ν cos(θ))}.

From this expression, we derive that h_{K_WC}(ν) = π²/3 + 4 Li_2(−ν), where Li_s is the polylogarithm of order s. This implies that ν can be obtained, in terms of the polylogarithm, by inverting (5.4). If we consider a value of h_n such that h_n → 0 and n h_n^{2r+1} → ∞, Condition (A3) is satisfied. Therefore, from Theorem 1, if f also satisfies Assumption (A1), we obtain the AMISE for the wrapped Cauchy kernel given in (5.5). The AMISE expression (5.5) is minimized if h_n is of order n^{−1/(2r+3)}. Using the previous result, we obtain that the AMISE order of the wrapped Cauchy kernel is worse than that obtained with the von Mises or the wrapped normal kernel (see Corollary 1). In particular, its minimal AMISE is of order O(n^{−2/(2r+3)}).
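The closed form of h_{K_WC}(ν) can be verified against direct quadrature of the second moment; SciPy's `spence` implements the dilogarithm in the convention spence(z) = Li_2(1 − z), so Li_2(−ν) = spence(1 + ν):

```python
import numpy as np
from scipy.special import spence

def h_wc_closed(nu):
    """h_{K_WC}(nu) = pi^2/3 + 4*Li2(-nu), using spence(z) = Li2(1 - z)."""
    return np.pi**2 / 3 + 4 * spence(1.0 + nu)

def h_wc_quad(nu, m=200000):
    """Circular second moment of the wrapped Cauchy kernel, by quadrature."""
    theta = np.linspace(-np.pi, np.pi, m, endpoint=False)
    dens = (1 - nu**2) / (2 * np.pi * (1 + nu**2 - 2 * nu * np.cos(theta)))
    return (theta**2 * dens).mean() * 2 * np.pi
```

At ν = 0 the kernel is uniform and both forms give π²/3, consistent with the largest attainable value of h.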
When searching for an explicit expression of the optimal smoothing parameter, one could be tempted to combine (3.1) and (5.4). But note that this cannot be done, as Condition (E1) is not verified. Thus, the explicit expression of the asymptotic bias cannot be obtained following the steps in Section A of the Appendix. This means that Equations (A.1) and, therefore, (3.1) are not necessarily true for this particular kernel. By contrast, Tsuruta and Sagae (2017) were able to obtain an optimal smoothing parameter for the wrapped Cauchy kernel from the results by Di Marzio et al. (2009).
Nevertheless, it should be noted that the results of Di Marzio et al. (2009) must not be employed for this kernel, as Condition (E1) is not satisfied (see Remark 1).

Wrapped bounded-support kernels
To find the optimal kernels, consider a wrapped kernel K_ν(θ) = Σ_{j=−∞}^{∞} K_{X;λ}(θ + 2πj), where K_{X;λ} is a linear kernel density with bounded support [−λ, λ]. Consider the asymptotic case for the linear bounded-support kernels, i.e., λ_n → 0. Fixing the unknown linear density function f_X, Muller (1984) gives explicit expressions of the bounded-support kernels minimizing the AMISE when estimating the density function and its derivatives in the linear case.
Assume now that the wrapped bounded-support kernel satisfies the Assumptions of Corollary 1.
Except for the values not depending on the kernel, the minimal AMISE of (3.3) is equal to the minimal AMISE obtained in the linear case. As a consequence, Theorem 2.4 of Muller (1984) can be employed to see that the optimal kernels in terms of AMISE coincide with the wrapped versions of those employed in linear kernel estimation.
This means that the optimal kernel for circular density estimation is the wrapped Epanechnikov, whose associated linear density is K_{X;λ}(x) = 3(1 − (x/λ)²)/(4λ). When λ < π, the mean resultant length is ν = (3 sin(λ) − 3λ cos(λ))/λ³, h = λ²/5, and R_{K;0,2}(ν) = 3/(5λ). Thus, for circular density estimation with the wrapped Epanechnikov kernel, the optimal (AMISE) value of h_n is obtained by plugging these quantities into (3.2). Given a circular density f, independently of the kernel, the minimal AMISE that could be obtained when estimating the circular density is the one attained by the wrapped Epanechnikov. For the derivatives of the density, we refer to Muller (1984). There, we can see, e.g., that the wrapped Biweight would be the optimal kernel for the first derivative of f.
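The stated wrapped Epanechnikov quantities are easy to verify numerically; for λ < π the wrapped density has non-overlapping support, so the linear and circular moments agree:

```python
import numpy as np

def epanechnikov_moments(lam, m=200000):
    """Midpoint-rule moments of K_{X;lam}(x) = 3(1 - (x/lam)^2)/(4*lam)
    on [-lam, lam]: mean resultant length, second moment, and roughness."""
    dx = 2 * lam / m
    x = -lam + (np.arange(m) + 0.5) * dx
    k = 3 * (1 - (x / lam)**2) / (4 * lam)
    nu = np.sum(np.cos(x) * k) * dx      # mean resultant length
    h = np.sum(x**2 * k) * dx            # second moment h
    r2 = np.sum(k**2) * dx               # roughness R_{K;0,2}
    return nu, h, r2
```

The three closed-form expressions quoted above are recovered to numerical precision.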

The plug-in smoothing parameters
In this section, we will study how to implement, in practice, the plug-in smoothing parameter h_{K;r;PI} in (4.3). As mentioned in Section 4, the issue of directly employing (4.3) is that the AMISE-optimal smoothing parameter for ψ̂_{s;ρ} will always depend on an unknown value of ψ_{s+2}. A way to overcome this difficulty is to provide an l-stage direct plug-in smoothing selector. This procedure consists of estimating each ψ_s with ψ̂_{s;ρ}, in an iterative process, until some point at which ψ_{2r+2l+4} is replaced by its value obtained with a simple density (see Section 6.1 for more details). An alternative selector can be computed by noting that the smoothing parameter for f̂^{(r)} can be obtained as a function of the smoothing parameter for ψ̂_{2r+4}. This allows us to construct a solve-the-equation rule. In Section 6.2, we discuss in more detail how this selector is derived.

The two-stage direct plug-in smoothing selector
The procedure to obtain the l-stage direct plug-in smoothing selector consists in, at stage 0, using a simple rule of thumb to compute the smoothing parameter of ψ̂_{2r+4+2l;ρ}. Once this initial step is achieved, from that estimator of ψ_{2r+4+2l}, the following functional estimators are derived in an iterative process (see Algorithm 1 for details).
Two decisions remain from this brief explanation: the number of stages l and the reference density to employ at stage 0. Regarding the number of stages, we suggest l = 2, for two reasons. First, l = 2 is a common choice in the linear case (see, e.g., Wand and Jones, 1995, Section 3.6). Second, when replicating the simulation study in Section 7 with l = 3, similar results were obtained with respect to l = 2, with the added inconvenience of the extra computational time.
As mentioned before, at stage 0, a reference density is needed to compute the smoothing parameter of ψ̂_{2r+2l+4;ρ}. Here, in the same spirit as the original rule of thumb proposed by Taylor (2008), a natural selection would be to replace ψ_{2r+2l+4} in (4.5) by the quantity obtained when assuming that the true density follows a von Mises distribution. In the circular case, the issue with employing that simple strategy is that a uniform estimation of the density can be obtained even if the true distribution is not uniform. This occurs, for example, when considering distributions with antipodal symmetry (see also Oliveira et al., 2012, for further discussion on this topic). The consequence would be a value of ψ̂_{2r+2l+4;ρ} close to zero, which results in a "large" value of the smoothing parameter at the next stage, h_{L;2r+2l+2;PI}.
To avoid that last issue, while still having a simple model, one can use as the reference density the following mixture of M von Mises densities, all having the same concentration parameter κ ≥ 0,

f_M(θ) = Σ_{m=1}^{M} w_m exp{κ cos(θ − μ_m)} / (2π I_0(κ)), (6.1)
where the parameters μ_m ∈ [−π, π), w_m ∈ [0, 1], for all m ∈ {1, …, M}, and Σ_{m=1}^{M} w_m = 1. The value of ψ_{2r+2l+4} is calculated from the density (6.1), replacing its parameters by their maximum likelihood estimates obtained from the sample. An algorithm providing the maximum likelihood estimates for the density (6.1) was implemented in the R (R Core Team, 2022) library movMF by Hornik and Grün (2014). In practice, following Oliveira et al. (2012), the value of M can be chosen, using the Akaike Information Criterion (AIC), by comparing the results obtained with the mixtures (6.1) of M = 1, …, M_max components.
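At stage 0, functionals ψ_s must then be evaluated under the fitted reference density (6.1). A hedged sketch of that computation: the mixture parameters are taken as given (in practice they would come from a maximum likelihood fit such as movMF's), and f^{(s)} is obtained by spectral differentiation on a periodic grid rather than by symbolic derivatives:

```python
import numpy as np
from scipy.special import i0

def vm_mix_density(theta, mus, kappa, ws):
    """Equal-concentration von Mises mixture, as in (6.1)."""
    comps = np.exp(kappa * np.cos(theta[:, None] - np.asarray(mus)[None, :]))
    return comps @ np.asarray(ws) / (2 * np.pi * i0(kappa))

def psi_s_reference(mus, kappa, ws, s, m=4096):
    """psi_s = ∫ f^{(s)} f dθ under the reference density, with f^{(s)}
    computed via FFT (integer wavenumbers on a uniform periodic grid)."""
    theta = np.linspace(-np.pi, np.pi, m, endpoint=False)
    f = vm_mix_density(theta, mus, kappa, ws)
    k = np.fft.fftfreq(m, d=1.0 / m)  # integer wavenumbers 0, 1, ..., -1
    mult = np.ones(m, dtype=complex) if s == 0 else (1j * k)**s
    f_s = np.real(np.fft.ifft(mult * np.fft.fft(f)))
    return np.mean(f_s * f) * 2 * np.pi
```

For a single von Mises component, ψ_0 = ∫ f² has the closed form I_0(2κ)/(2π I_0(κ)²), which the spectral computation reproduces; odd-order functionals vanish for any density, since ∫ f' f dθ = 0 by periodicity.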
In Algorithm 1, we summarize the steps that are needed to obtain our proposed two-stage direct plug-in smoothing parameter. There, for simplicity, we use K = L, a kernel that satisfies Assumptions (A1)-(A4) and the extra Conditions (E1)-(E4).
Step 1: Use a rule of thumb to obtain the estimator of ψ_{2r+8}, ψ̂_{2r+8;RT}. This can be achieved as follows.
Step 1.a: For every M = 1, …, M_max, obtain the maximum likelihood estimators of the parameters in the mixture of M von Mises densities, all having the same concentration parameter, i.e., of μ, κ, and w in (6.1).
Step 1.b: Select the number of components in the mixture, M AIC , employing the AIC.
Step 4.a: Obtain the two-stage direct plug-in smoothing parameter h K;r;PI from (4.3), with the pilot smoothing parameter ρ 2 .
Step 4.b: The selected concentration parameter ν DPI is obtained from the value that satisfies h K (ν DPI ) = h K;r;PI .
The uniform distribution belongs to all the reference families mentioned before. Thus, in practice, the denominator of (4.3) or (4.5) could be equal to zero. If that occurs, we suggest directly returning a value of the concentration parameter ν_DPI = 0, which corresponds to the uniform estimation of the density.
Given a finite value of n in (4.3) or (4.5), the equation h_K(ν) = h_{K;r;PI}, or its approximation α_{K,2}(ν) = 1 − 2h_{K;r;PI}, cannot be solved for "large" values of h_{K;r;PI}. The reason is that h_K(ν) and α_{K,2}(ν) are non-negative and bounded for any value of ν. Since a "large" value of h_K(ν) corresponds to ν = 0, if the equation h_K(ν) = h_{K;r;PI} cannot be solved, we also suggest employing the uniform density as the density estimator.
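The inversion step and the uniform fallback can be sketched as a small root-finding helper; the von Mises kernel and the second-moment form of h_K are assumptions here, and the helper name is ours:

```python
import numpy as np
from scipy.special import i0
from scipy.optimize import brentq

def h_vm(kappa, m=20000):
    """Second moment of the von Mises kernel, by quadrature."""
    theta = np.linspace(-np.pi, np.pi, m, endpoint=False)
    dens = np.exp(kappa * np.cos(theta)) / (2 * np.pi * i0(kappa))
    return (theta**2 * dens).mean() * 2 * np.pi

def solve_concentration(h_target, kappa_max=500.0):
    """Solve h_K(kappa) = h_target; return kappa = 0 (a uniform density
    estimate) when the target exceeds the largest attainable value pi^2/3."""
    if h_target >= np.pi**2 / 3:
        return 0.0
    return brentq(lambda k: h_vm(k) - h_target, 1e-8, kappa_max)
```

Since h is strictly decreasing in the concentration, `brentq` needs only one bracketing interval; the fallback mirrors the suggestion of returning the uniform estimate when the equation has no solution.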

Solve-the-equation plug-in smoothing selector
An alternative to the previous smoothing selector is the solve-the-equation rule. This rule consists in searching for the smoothing parameter that satisfies Equation (6.2), in which the smoothing parameter for ψ̂_{2r+4} is written as a function γ of the smoothing parameter for f̂^{(r)}. We suggest choosing the function γ by looking at the relation between the optimal smoothing parameters of these two estimators.
Using a plug-in rule, this last relation suggests taking a function γ proportional to h^{(2r+5)/(2r+7)}, as stated in (6.3). The two concentration parameters of the density functional estimators, ψ̂_{2r+4} and ψ̂_{2r+6}, can be obtained from (4.5). Those smoothing parameters will depend on two other density functionals, ψ_{2r+6} and ψ_{2r+8}. Following Sheather and Jones (1991), we suggest estimating these two new functionals with a rule of thumb. In Algorithm 2, we summarize the steps that are needed to obtain our proposed solve-the-equation plug-in smoothing parameter. Again, for simplicity, we use K = L, a kernel that satisfies Assumptions (A1)-(A4) and the extra Conditions (E1)-(E4).
Step 1: Use a rule of thumb to obtain the estimators of ψ_{2r+6} and ψ_{2r+8}, namely ψ̂_{2r+6;RT} and ψ̂_{2r+8;RT}. This can be done as in Step 1 of Algorithm 1.
Step 4: Using the γ function of Step 3, select the value of h K;r;STE by solving Equation (6.2).The solve-the-equation concentration parameter ν STE is obtained from the value that satisfies h K (ν STE ) = h K;r;STE .
Note that two extra functionals must be estimated in Step 1 of Algorithm 2. For this reason, the obtained smoothing parameter could be considered a two-stage solve-the-equation plug-in selector. As for the direct rule, the number of stages could be increased to estimate the density functionals. In particular, a rule of thumb can be employed to estimate ψ_{2r+10} or ψ_{2r+12} and then, in an iterative process, the values of ρ_1 and ρ_2 are obtained (as in Steps 2 and 3 of Algorithm 1).
The simulation study in Section 7 was also carried out, under the same conditions, using three (relying on ψ̂_{2r+8;RT} and ψ̂_{2r+10;RT}) and four (relying on ψ̂_{2r+10;RT} and ψ̂_{2r+12;RT}) stages. The results in practice were similar to those obtained with Algorithm 2, with the drawback of the extra computational time.
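The mechanics of the solve-the-equation step reduce to one-dimensional root finding. The sketch below is purely illustrative: `toy` and the constant `c` are placeholders, not the paper's expressions; only the fixed-point structure h = AMISE-formula(γ(h)), with γ proportional to h^{(2r+5)/(2r+7)}, is taken from the text:

```python
import numpy as np
from scipy.optimize import brentq

def gamma_fn(h, r=0, c=1.2):
    """Pilot smoothing as a function of h, proportional to h^{(2r+5)/(2r+7)};
    the constant c stands in for the functional-dependent factor (placeholder)."""
    return c * h**((2 * r + 5) / (2 * r + 7))

def solve_the_equation(amise_formula, r=0):
    """Find h satisfying h = amise_formula(gamma_fn(h)) by root bracketing;
    amise_formula mimics the AMISE-optimal rule evaluated at the pilot."""
    return brentq(lambda h: h - amise_formula(gamma_fn(h, r)), 1e-8, 10.0)
```

Any decreasing, bounded stand-in for the plug-in formula yields a unique, easily bracketed root, which is why a scalar solver suffices in practice.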

Simulation study
In this section, we performed a simulation study to analyse the performance of the direct two-stage plug-in concentration parameter (see Algorithm 1) and the solve-the-equation plug-in smoothing selector (see Algorithm 2).We focused only on the density estimation case (r = 0 in (2.2)), with K being the von Mises kernel.The reason is that their effectiveness can be compared with other smoothing parameters proposed in the literature.But note that our data-driven smoothing parameters could be employed to estimate any derivative and with other kernels.In particular, in this framework, slightly better results are expected by employing the wrapped Epanechnikov kernel (see Section 5.3).
Regarding the choice of M max at stage 0 (see Step 1 of Algorithms 1 and 2), we studied two scenarios.First, the reference model for the plug-in smoothing parameters was a simple von Mises (M max = 1).As a second choice, we allowed for mixture models (6.1), with M max = 5, in the reference density.In Tables 1 and 2, we show both results, M max = 1 (ν DPI;1 ) and M max = 5 (ν DPI;5 ), for the direct plug-in concentration parameter.While for the solve-the-equation plug-in smoothing selector (ν STE;1 ), we only show the results when M max = 1, given that our empirical experiments reveal that for the shown "small/moderate" sample sizes, most of the time M max = 1 performs better, even in the more complex models, and it requires less computational time.
The results of using a common concentration parameter in the reference density (see (6.1)) at Step 1 were compared with those of the full von Mises mixture (allowing for different concentration parameters in each component).For most of the studied scenarios, better or similar results were obtained using a common value of κ, while keeping the computational efficiency.For that reason, we only show the results obtained using a common concentration parameter (as described in Algorithms 1 and 2).
We will compare the performance of the proposed concentration parameters with the following three rules for smoothing selection in circular density estimation, all implemented in the R library NPCirc (Oliveira et al., 2014).
• Rule of thumb of Taylor (2008), ν_RT, where, to compute ψ_4 in (4.3), it is assumed that f follows a von Mises density.
• Plug-in rule of Oliveira et al. (2012), ν MvM , where to compute ψ 4 in (4.3), it is assumed that f follows a mixture of M von Mises, with different concentration parameters.The number M is chosen between 1 and 5, according to the AIC.
• Likelihood cross-validation of Hall et al. (1987), ν_LCV. Consider the leave-one-out estimator f̂_{−i;ν}(θ), which corresponds to the kernel density estimator (2.1) computed after leaving out the i-th observation. Then, the concentration parameter ν_LCV is obtained as the value of ν that maximizes Σ_{i=1}^{n} log f̂_{−i;ν}(Θ_i).

The reasons for not showing the results of the other two well-known rules for obtaining the concentration parameter can be found in Oliveira et al. (2012). Oliveira et al. (2012) mention that their empirical experiments show that the likelihood cross-validation rule provides a more stable behaviour than the least-squares cross-validation method (Hall et al., 1987). The reason for not including the bootstrap smoothing selector of Di Marzio et al. (2011) is that it relies on an optimization algorithm that searches for a local minimum. In that optimization, one needs to impose the possible range of concentration values and, when n is not "too large" (n = 100), the needed local minimum was not found for several samples. For "large" n (n = 250 or n = 500), Oliveira et al. (2012) show that their plug-in rule outperforms the bootstrap method except when data are generated from the simplest models (M1-M4 below).
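The likelihood cross-validation selector is simple to implement for a von Mises kernel; a minimal grid-search sketch (the grid, data, and function name are illustrative):

```python
import numpy as np
from scipy.special import i0

def lcv_select(data, kappa_grid):
    """Likelihood cross-validation: maximise sum_i log f_{-i}(Theta_i)
    over a grid of von Mises concentrations."""
    n = len(data)
    diffs = data[:, None] - data[None, :]
    scores = []
    for kappa in kappa_grid:
        kmat = np.exp(kappa * np.cos(diffs)) / (2 * np.pi * i0(kappa))
        np.fill_diagonal(kmat, 0.0)              # leave one out
        scores.append(np.sum(np.log(kmat.sum(axis=1) / (n - 1))))
    return kappa_grid[int(np.argmax(scores))]
```

For tightly clustered data the criterion favours a concentrated kernel, while near-uniform data push the selected concentration towards zero.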
For comparing the performance of the different rules, we have employed the 20 reference models that can be found in Oliveira et al. (2012).They correspond to the uniform (M1), simple unimodal models (symmetric, M2-M5, and asymmetric M6), two-component mixture models (M7-M10), models with more than two components (M11-M16), and more complex models (M17-M20).Their density representations can be found in Tables 1 and 2, and their density expressions appear in Oliveira et al. (2012).From each model, we have generated 1000 random samples of sizes n = 50 and n = 100.Given the true density f , for each sample, we have computed the integrated squared error (ISE) as the criterion to analyse the performance of the different estimators.
As summary measures in Tables 1 (for Models M1-M14) and 2 (for Models M15-M20), for each rule for concentration selection, we show the average ISE and standard deviations, computed over 1000 replicates.The smallest average ISE values are highlighted in bold.On those tables, as a benchmark, we have also included the results with the "gold standard" smoothing parameter ν GS , which is the value of ν minimizing the ISE(ν) for each generated sample.
In Tables 1 and 2, we can observe that the proposed plug-in rules are the ones providing the smallest or the second smallest average ISE. The obtained average ISE is also close to the benchmark, which is remarkable considering that no extra information is given about the true density and the sample sizes are "small". The "smallest" dispersion, measured through the standard deviation, is obtained either by the proposed plug-in rules (ν_DPI;1 and ν_STE;1) or by the rule of thumb (ν_RT).
In the following, we describe in more detail the behaviour of the two-stage plug-in rule in terms of the average ISE reported in Tables 1 and 2. First, we can observe that the plug-in rule with a simple von Mises at stage 0 (ν DPI;1 ) provides, in general, the smallest average ISE, especially when the sample size is "small" (n = 50). The first exception to this general pattern occurs for densities that are well approximated by a von Mises (M1, M3, and M4), where, as expected, the rule of thumb (ν RT ) provides a slightly lower average ISE, with ν DPI;1 being the second-best rule.
The second and more important exception occurs for Models M7, M11, M13, M14, M16, and M20, whose associated densities are multimodal and (almost) k-fold rotationally symmetric. As mentioned in Section 6.1, the "bad" performance of ν DPI;1 is probably due to the uniform estimation of the density by the von Mises density at stage 0. As commented there, this issue of the two-stage plug-in rule can be solved by using the mixture of von Mises at stage 0 (ν DPI;5 ). For those densities, we can see that ν DPI;5 provides better results than ν DPI;1 . For the more complex models, namely those densities that are multimodal (M7, M8, M10-M16, and M18-M20) or "peaked" (M5 and M17), the smallest average ISE is, in general, obtained by the solve-the-equation plug-in smoothing selector (ν STE;1 ), especially when the sample size is "moderate" (n = 100). Even for the strongly (reflectively) asymmetric model (M6), ν STE;1 provides the smallest average ISE (n = 100). Thus, the only cases where ν STE;1 does not provide the smallest average ISE are the models well approximated by a von Mises density (M1-M4 and M9). For the remaining models, if ν STE;1 is not the "best" smoothing selector (M5, M15, M19, and M20), it is the rule providing the second smallest average ISE, after the direct plug-in rule (ν DPI ).
In summary, we have shown that the proposed plug-in rules provide competitive smoothing parameters for "small/moderate" sample sizes. We have also replicated the same simulation study for larger sample sizes (n = 250, n = 500, and n = 1000). For the simple models (the ones well approximated by a von Mises density), ν DPI;1 and ν RT still provide the "best" results. For the remaining models, with n = 250, we observe a pattern similar to that described for n = 100, with ν STE;1 providing the "best" results most of the time. The relative performance (when compared with the other data-based smoothing parameters) of ν DPI;5 and ν MvM improves with the sample size. Regarding the solve-the-equation plug-in smoothing selector, as mentioned before, the ISE values with M max = 5, for n = 50 and n = 100, were not shown since, in most of the cases, M max = 1 provided better results. We observed a different behaviour, in the complex models, for larger sample sizes (n = 500 and n = 1000). In those cases, the solve-the-equation plug-in smoothing selector with M max = 5 (ν STE;5 ) obtained lower ISE values than those derived with M max = 1 (ν STE;1 ). As a result, for most of the non-simple models, ν MvM and ν STE;5 become the smoothing selectors providing the "best" results when n = 500 and n = 1000. In those scenarios, the selector ν MvM behaves better when data is generated by a mixture of von Mises, while ν STE;5 provides the best results for mixtures involving other (more complex) density models.
As a summary of our findings in this simulation study, our recommendations are as follows.
• For small sample sizes (say n = 50), employ the proposed direct plug-in rule with M max = 1 (ν DPI;1 ). The only exception to this recommendation is if there is evidence of the density function being multimodal and k-fold rotationally symmetric. In that case, employ the solve-the-equation plug-in smoothing selector with M max = 1 (ν STE;1 ).
• For moderate sample sizes (say n = 100 or n = 250), employ the solve-the-equation plug-in smoothing selector with M max = 1 (ν STE;1 ). The direct plug-in rule with M max = 1 (ν DPI;1 ) is also recommendable if there is evidence of unimodality.
• For large sample sizes (say n = 500 or n = 1000), employ the plug-in rule of Oliveira et al. (2012) (ν MvM ). An alternative would be the solve-the-equation plug-in smoothing parameter with M max = 5 (ν STE;5 ). When compared with ν MvM , the selector ν STE;5 should provide a slightly better estimation when the density presents a complex shape.
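The three recommendations above can be condensed into a small helper; this is only an illustrative encoding of the guidance in this section (the function name and arguments are ours, not part of the NPCirc package):

```r
# Illustrative dispatcher for the recommended smoothing selector,
# following the sample-size recommendations of this section
recommend_selector <- function(n, multimodal_kfold = FALSE) {
  if (n <= 50) {
    if (multimodal_kfold) "STE;1" else "DPI;1"
  } else if (n <= 250) {
    "STE;1"
  } else {
    "MvM"   # or "STE;5" when the density presents a complex shape
  }
}

recommend_selector(100)
```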

Real data application
In this section, we revisit the car accident data that can be found in Ameijeiras-Alonso and Crujeiras (2022). This real dataset consists of the time of the day (at a resolution of one minute) at which each car crash happened in El Paso County (Texas, USA) in 2018. A total of 85 observations were recorded on the webpage of the National Highway Traffic Safety Administration of the United States. In Ameijeiras-Alonso and Crujeiras (2022), several features of the shape of the distribution are analysed with different inferential tools. The conclusion is that the density is unimodal and (reflectively) asymmetric. More specifically, among the several parametric models studied by Ameijeiras-Alonso and Crujeiras (2022), the "best" parametric fit is achieved by the wrapped skew normal density of Pewsey (2000). A representation of the fitted wrapped skew normal density is provided in Figure 1 (left, thick solid line). This real dataset constitutes a good example where we would recommend employing the proposed direct plug-in smoothing selector, with M max = 1, to estimate the density function, as the sample size is "small" (n = 85) and unimodality cannot be rejected for this sample. In Figure 1 (left), we show the estimated density function employing this direct plug-in rule (DPI; 1), the proposed solve-the-equation plug-in selector (STE; 1), the rule of thumb of Taylor (2008) (RT), and the plug-in rule of Oliveira et al. (2012) (MvM). We did not include the likelihood cross-validation of Hall et al. (1987), as its density estimate was close to the one obtained with the direct plug-in rule (DPI; 1). If we take the wrapped skew normal density as a reference, we can see visually that the closest kernel estimate is provided by the direct plug-in rule. This can be confirmed by computing the ISE (7.1), replacing f with the fitted wrapped skew normal density. In Table 3, we can observe that the smallest ISE is obtained with the proposed direct plug-in rule.
Finally, another application of the results derived in this paper arises when the interest is in estimating the density derivative. As before, we focus on the results obtained with the direct plug-in rule, with M max = 1. In that case, the smoothing parameter for the kernel estimator can be derived with Algorithm 1 (with r = 1). The obtained estimator, with the von Mises kernel, for the car accident data is plotted in Figure 1 (right). There, one can see that the derivative estimate crosses zero at the estimated modal and antimodal directions of the density (dotted lines in Figure 1, right).

Availability
The proposed l-stage direct plug-in rule and the solve-the-equation selector have been added to the R library NPCirc (Oliveira et al., 2014). Given a dataset x, the two-stage plug-in rule, described in Algorithm 1 (with M max = 5), for density estimation, can be obtained with the function bw.AA(x, method="dpi"). The solve-the-equation smoothing selector, described in Algorithm 2 (with M max = 1), for density estimation, is computed with bw.AA(x) or bw.AA(x, method="ste"). Thus, bw.AA is the circular equivalent of the function bw.SJ of the stats R package.
Some further options are available in the function bw.AA. The kernel (K) and the derivative order (r in (2.2)) can be selected, respectively, with the options kernel and deriv.order. If the practitioner prefers to use a different number of stages l, this can be selected with nstage. At stage 0, this function also allows for modifying the number of components M (M) in the mixture (6.1), and for selecting whether a common concentration parameter is employed in all the components (commonkappa).
Below, we provide the 3-stage direct plug-in rule for the first derivative of the density function. We employ the wrapped normal kernel and a mixture of 4 components, with different concentration parameters, at stage 0. We generated a sample x of 50 observations from Model 18 of Oliveira et al. (2012).
R> library("NPCirc")
R> x <- rcircmix(n=50, model=18)
R> bw.AA(x, deriv.order=1, method="dpi", nstage=3, kernel="wrappednormal",
+   M=4, commonkappa=FALSE)

For the direct plug-in rule, a slightly more accurate (and computationally more expensive) concentration parameter could be obtained if the asymptotic approximations are avoided; see, e.g., (5.1) and (5.3) for the von Mises kernel. This can be done if, at the end of every step of Algorithm 1, we run an optimization routine searching for the value of ν that minimizes the corresponding MISE/AMISE criterion (3.1) or (4.4). In that optimization routine, we could employ as the initial value the concentration value achieved at each specific step (i.e., the values obtained by (4.5) and (4.3)). This extra substep is performed, in the bw.AA function, if we set the argument approximate=FALSE. The simulation study of Section 7 was also carried out with this extra substep, and similar results were obtained.
A numerical routine is needed to obtain the value of the smoothing parameter, ν STE , satisfying Equality (6.2) in Step 4 of Algorithm 2. In our simulations and in bw.AA(x, method="ste"), the uniroot function of the stats R package was employed. The range over which the value of ν STE is searched is specified with the function arguments (lower, upper). The convergence tolerance, for searching the smoothing parameter with uniroot, can be modified with the argument tol.
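The pattern of that root-finding step can be illustrated with uniroot; here g is only a placeholder for the fixed-point map defined by Equality (6.2), not the actual equation:

```r
# Solve nu - g(nu) = 0 over a bracket (lower, upper), as in Step 4 of Algorithm 2
g <- function(nu) 5 + log1p(nu)   # placeholder map, NOT the actual Equality (6.2)
sol <- uniroot(function(nu) nu - g(nu), lower = 0.1, upper = 100, tol = 1e-6)
sol$root
```

The arguments lower, upper, and tol play the same roles as the corresponding arguments of bw.AA.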
In the function bw.AA, the values of the constants in (E4) and (E5) are only provided for the von Mises and the wrapped normal kernels. These values can be specified for other kernels with the arguments Q1 (for Q K;r,1 ) and Q2 (for Q K;r,2 ).
The kernel density derivative estimate for circular data was also added to the NPCirc package, inside the kern.den.circ function. By default, it employs, as the smoothing parameter, the proposed solve-the-equation plug-in selector (with M max = 1), described in Algorithm 2. Alternatively, the concentration parameter (bw) can be a real value or a character string related to the available data-based smoothing parameters (e.g., if bw="dpi", the direct plug-in rule is employed). The derivative order can be selected with the argument deriv.order. This function generates an object of the class density.circular (see the circular R package; Agostinelli and Lund, 2022). The remaining available options of kern.den.circ are, thus, similar to the ones in the function density.circular.
A representation of the kernel density derivative estimator for the sample x could be obtained as follows.
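For instance, a minimal sketch using the kern.den.circ interface described above (it requires the NPCirc package and the sample x generated earlier, and the plot method is the one inherited from density.circular):

```r
R> est <- kern.den.circ(x, deriv.order=1)   # STE;1 selector by default
R> plot(est)
```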

A Asymptotic results for the kernel derivative estimator
The main objective of this section is to compute the mean squared error (MSE) of the kernel derivative estimator in (2.2), i.e., E[(f (r) ν (θ) − f (r) (θ))²]. To do so, we first consider the bias term and, secondly, the variance term. Regarding the expected value of f (r) ν (θ), we obtain the following result, using Taylor's theorem, Assumption (A1) on f, and Assumption (A2) on K.
If Condition (E1) is not fulfilled, then the asymptotic bias is of the same order, but we cannot derive a general explicit expression valid for any kernel L: Bias(ψ s;ρ ) = n −1 R L;s,1 (ρ n ) + O(h L (ρ n )).
For the variance, consider again the reparametrization (B.1). Now, since L (s) is symmetric for s even, we have the following result (see, e.g., Wand and Jones, 1995, Ch. 3).
Finally, the AMSE expression in (4.4) is obtained by adding the square of the asymptotic bias (see (B.3)) and the asymptotic variance (B.7).


Figure 1: Car accident data. Left, density estimation: fitted wrapped skew normal density (thick solid line) and kernel density estimates with the von Mises kernel and different smoothing parameters: rule of thumb of Taylor (2008) (RT, dashed line), plug-in rule of Oliveira et al. (2012) (MvM, dotted line), proposed direct plug-in rule with M max = 1 (DPI; 1, dot-dashed line), and proposed solve-the-equation plug-in rule with M max = 1 (STE; 1, thin solid line). Right (solid line): kernel density derivative estimate, with a von Mises kernel and the proposed two-stage direct plug-in smoothing selector (M max = 1). Right (dotted lines): estimated location of the modal and antimodal directions.