Nonparametric estimation of directional highest density regions

Highest density regions (HDRs) are defined as level sets containing sample points of relatively high density. Although Euclidean HDR estimation from a random sample, generated from the underlying density, has been widely considered in the statistical literature, this problem has not been contemplated for directional data yet. In this work, directional HDRs are formally defined and plug-in estimators based on kernel smoothing and associated confidence regions are proposed. We also provide a new suitable bootstrap bandwidth selector for plug-in HDRs estimation based on the minimization of an error criteria that involves the Hausdorff distance between the boundaries of the theoretical and estimated HDRs. An extensive simulation study shows the performance of the resulting estimator for the circle and for the sphere. The methodology is applied to analyze two real data sets in animal orientation and seismology.


Introduction
Set estimation is focused on the reconstruction of a set (or the approximation of any of its characteristic features such as its boundary or its volume) from a random sample of points. One of the specific topics in this area is concerned with the estimation of sets directly related to density functions such as level sets. Mathematically, for a given level t > 0, the goal is to reconstruct the unknown set B Paula Saavedra-Nieves paula.saavedra@usc.es 1 Universidade de Santiago de Compostela, Santiago de Compostela, Spain from a random sample of points of a density function g on R d . This topic has received considerable attention in the statistical literature, specially since the notion of population clusters was established in Hartigan (1975) as the connected components of the set in (1). This cluster definition relies clearly on the user-specified level t, so for addressing this problem, an algorithm for estimating the smallest level with more than a single connected component was proposed in Steinwart (2015). For a general review on clustering, see (Anderberg 1973;Everitt 1993;Cuevas and Fraiman 1997) and (Rinaldo and Wasserman 2001). The number of clusters is a basic feature for a statistical population. However, the problem of its estimation is not always taken into account in cluster analysis where it is usually chosen by the practitioner as a first step. Since the number of clusters is equal to the number of connected components of a level set, a very natural estimator for this populational parameter is the number of the connected components of the level set reconstruction. This perspective that solves the problem of selecting this unknown population parameter is considered, for instance, in Cuevas et al. (2000), Cuevas et al. (2001) and Biau et al. (2007).
Level set estimation theory has been mainly established for a density supported on an Euclidean space such as in Eq. (1) with very few contributions in other domains. Cuevas et al. (2006) consider the estimation of level sets for general functions (not necessarily a density) providing some consistency theoretical results and showing a level set on the sphere for illustration. More recently, the reconstruction of density level sets on manifolds is studied in Cholaquidis et al. (2020). Through some simulations, the behavior of the proposed method is analyzed on the torus and on the sphere.
Unfortunately, for most applications, the specific value of the level t in (1) is fully unknown by the practitioner. In addition, areas of the distribution support where g is close to zero (non-effective support) are usually of limited interest for applications. If the practitioner establishes the probability content instead of the level t, a new kind of density level sets emerges known as highest density regions (HDRs) (see Box andTiao 1973 andHyndman 1996). The estimation of HDRs involves further complexities given that the threshold of this particular type of level sets must be determined from the established probability content. Perhaps due to its practical importance, HDRs plug-in reconstruction from the linear kernel density estimator has been widely studied considering also the problem of selecting an appropriate bandwidth specifically devised for the HDR reconstruction (see, for instance, Baíllo andCuevas 2006 or Samworth andWand 2010). However, as far as we know, the notion of HDR has not been introduced for directional data yet. Therefore, the main goals of this work are to (1) generalize HDRs definition to the directional setting, (2) establish a plug-in procedure for HDRs reconstruction from the proposal of a new bootstrap bandwidth for a well-known directional kernel density estimator that can be seen as the first specific selector for directional HDRs, (3) check its performance through an extensive simulation study analysing the effect of considering a smoothing parameter not specifically designed for HDR estimation and (4) apply this methodology to analyze data on animal orientation and on seismology.
One may argue that such an absence of a general and effective proposal for directional HDRs estimation may be due to a lack of practical interest, but this is far from the truth, so let us present two application examples that motivate the developments Zouara beach Fig. 1 Geographical location of Zouara beach (right). Talorchestia brito (center) and Talitrus saltator (left) in this work. The first one concerns a problem from animal orientation studies and the second one is related to earthquakes occurrences. Both datasets are available in the R package HDiR. 1

Some motivating examples
Animal orientation example. Behavioral plasticity is considered by biologists as a feature of adaptation to changing beach environments. In particular, orientation is an adaptation characteristic that can not be modified by a single factor. Nonetheless, experts found some regularities in the orientation of sandhoppers and other animals from beach environments by changing one factor at a time under other controlled conditions.
For instance, the orientation of two sandhoppers species (Talitrus saltator and Talorchestia brito) is analyzed in Scapini et al. (2002). Both species are shown in Fig. 1. Bottom pictures can be found in Dekker (1978). Comparing the two species through regresion procedures, Scapini et al. (2002) conclude that Talitrus saltator showed more differentiated orientations, depending on the time of day, period of the year and sex, with respect to Talorchestia brito. Moreover, it seems that Talitrus saltator shows a higher flexibility (variation) of orientation than Talorchestia brito under the same environmental conditions, supporting the hypothesis that the former has a higher level of terrestrialization. As an illustration, Fig. 2 (left panel) shows the 36 orientation points (slightly jittered) corresponding to males of the specie Talitrus saltator measurements during the noon in April. It also contains the 77 angles (slightly jittered) when the measures are taken in October (Fig. 2, right panel). Differences in the distribution on the circle of these two samples can be easily observed. Therefore, the month of the year seems to play a significant role in sandhoppers behavior. In particular, two clusters for October measurements can be detected around the angle π but they are not present for the April sample. Similar comments could be done for the situation registered around the angles π/2. HDRs reconstruction (with low probability content) would allow to determine the biggest modes of the distribution, and then, its clusters. Therefore, HDRs can be seen as a useful alternative to analyze sandhoppers orientation.
Earthquakes occurrences. The European-Mediterranean Seismological Centre (EMSC) 2 is a non-governmental and non-profit organisation that has been established in 1975 at the request of the European Seismological Commission. Since the European-Mediterranean region has suffered several destructive earthquakes, there was a need for a scientific organisation to be in charge of the determination, as quickly as possible (within one hour of the earthquake occurrence), of the characteristics of such earthquakes. These predictions are based on the seismological data received from more than 65 national seismological agencies, mostly in the Euro-Med region. Figure 3 (left) shows the geographical coordinates (red points), downloaded from EMSC website, of a total of 272 medium and strong world earthquakes registered between 1 th October 2004 and 9 th April 2020. The magnitude of all these events is at least 2.5 degrees on the Richter scale. Of course, these planar points correspond to spherical coordinates on Earth. Due to the important damages that earthquakes cause, cluster detection of HDRs could be also useful to identify, from a real dataset, where earthquakes are specially likely. This information is key for decision-making, for example, to update construction codes guaranteeing a better building seismic-resistance. An interactive representation of the sphere can be seen in Appendix D.

Paper organization
This paper is organized as follows. Section 2 contains some background ideas on directional level set estimation including some discussion on error measurements and some existing consistency results in the directional setting that will be really useful to extent the definition of HDRs for directional data in Sect. 3. There, plug-in estimators and the corresponding confidence regions are also established. Concretely, we consider the plug-in methods based on a well-known directional kernel density estimator, which requires a smoothing parameter (bandwidth) for its practical implementation. An appropriate bootstrap bandwidth selector, the first one specifically designed for directional HDRs estimation, is also introduced in this section. Section 4 presents an extensive simulation study illustrating the performance of the resulting plug-in reconstruction for the HDRs (for circular and spherical domains) considering the new bandwidth selector. These results are compared with those obtained with directional smoothing parameters not specifically designed for HDR estimation. In Sect. 5, the proposed methodology is applied to analyze the two real data examples presented in the Introduction. Finally, some conclusions and ideas for further research are presented in Sect. 6. Appendix A and the supplementary material that completes this work include further information on the datasets. Appendix B specifies the parameters taken for the construction of the spherical densities in the simulation study. Appendix C contains some additional results of simulations. Appendix D collects the description of the bandwidth selectors considered in the simulation study. All the methods presented in this paper, along with the real data examples, are accessible in HDiR package.

Some background on directional level sets
The specific problem of reconstructing density level sets in the directional setting is reviewed in this section: the definition of directional level set is introduced jointly with a plug-in estimator. Based on the real data and simulated examples, some discussion about how to measure the estimation error and some asymptotic results are also included.

On directional level sets
Consider a random vector X taking values on a d−dimensional unit sphere S d−1 with density f . Given a level t > 0, the directional level set is defined as: The nature of different level sets is shown in Fig. 4, which represents G f (t) in grey color for three different circular densities and three different values of the level t. The threshold t is represented through a dotted grey line. Note that, if large values of t are considered (bottom row in Fig. 4), G f (t) coincides with the greatest modes. However, for small values of t, the level set G f (t) is virtually equal to the support of the distribution.
It is important to noticed that, following (Hartigan 1975), we may also establish the concept of cluster in directional setting as the connected components of the level set G f (t). With this view in mind, note that the density represented in the second row of Fig. 4 presents four connected components for all of the considered values for t, determining four population clusters.
Plug-in estimation is the most natural and common choice for reconstructing density level sets in the Euclidean space. A review of other existing estimation alternatives can be seen in Rodríguez-Casal and Saavedra-Nieves (2019). Plug-in methods are devised to reconstruct the level set in (1) aŝ

Fig. 4
For thee different circular densities, G f (t) for t = t 1 (first column), t = t 2 (second column) and t = t 3 (third column) verifying 0 < t 1 < t 2 < t 3 . Equivalently, L( f τ ) for τ = 0.2 (first column), τ = 0.5 (second column) and τ = 0.8 (third column) where g n usually denotes the classical kernel estimator for Euclidean data (see Parzen 1962 andRosenblatt 1956). This methodology, which has received considerable attention (see, for instance, Tsybakov 1997;Baíllo 2003;Mason and Polonik 2009;Rigollet and Vert 2009;Mammen and Polonik 2013;Polonik 2013or Chen et al. 2017 can be easily generalized to the directional setting. Given a random sample X n = {X 1 , . . . , X n } ∈ S d−1 of the unknown directional density f , the corresponding level set G f (t) in (2) can be reconstructed aŝ where f n denotes a nonparametric directional density estimator. Following the classical ideas for real-valued random variables, a kernel estimator on S d−1 is provided in Bai et al. (1989) (d > 2) who also proved strong pointwise consistency, uniform consistency, and L 1 −norm consistency of the estimator (see also Hall et al. 1987 andKlemelä 2000 for further results). Following Bai et al. (1989), from a random sample X n on a d−dimensional sphere the directional kernel density estimator at a point x ∈ S d−1 is defined as where 1/h 2 > 0 is concentration parameter and K v M denotes the von Mises-Fisher kernel density (see Appendix B for explicit formulae). The consideration of a von Mises kernel in Eq. (4) is not the only option and it is particularly interesting to point out the use of a wrapped-normal kernel in the circular setting. In this case, Huckemann et al. (2016) proved that this kernel guarantees the monotonicity on the number of modes with respect to the smoothing parameter, something that also happens for the gaussian kernel in the linear case. It may be argued that such a kernel shoud be used in our problem. Nevertheless, it is computationally more expensive and our practical experience shows that results in practice are quite similar. Note that the kernel estimator in (4) can be viewed as a mixture of von Mises-Fisher. Furthermore, the concentration parameter 1/h 2 plays an analogous role to the bandwidth in the Euclidean case. For small values of 1/h 2 , the density estimator is oversmoothed. The opposite effect is obtained as 1/h 2 increases: with a large value of 1/h 2 , the estimator is clearly undersmoothing the underlying target density. Hence, the choice of h is a crucial issue. For simplicity, in what follows, we refer to h as bandwidth parameter. Several approaches for selecting h in practice, in circular and even directional settings, have been proposed in the literature (see Appendix D). All the existing proposals aim to minimize some error criterion on the target density, but none of them is specifically designed focusing on the reconstruction of a directional level set. Figure 5 shows three plug-in estimatorsĜ f (t) for models (black colour) and levels Plug-in density level setsĜ f (t) from X 250 for three different circular densities with t = t 1 (first column), t = t 2 (second column) and t = t 3 (third column) verifying For instance, for the sandhoppers example, Fig. 6 shows the plug-in estimators obtained for the two samples of sandhoppers represented in Fig. 2. It is possible to detect the largest modes of the two sample distributions corresponding to April and October samples. These results allow us to confirm the differences between the two populations. The largest cluster of April orientations is located around the angle 7π/4. However, the pattern observed for October registries is completely different. Although an only cluster is identified around the angle 3π/2, if the level t decreases slightly two additional groups can be detected around the angles 3π/4 and 5π/4, respectively.
Regarding the earthquakes illustration, Fig. 3 (right) shows the plug-in contour in blue obtained from the selected sample of world earthquakes considered. Chosing a convenient value of the level t, the greatest mode of sample distribution is identified in the Southeast of Europe. Countries such as Italy, Greece or Turkey (located withint this cluster) are the most affected areas in the recent past.

Error measures and consistency results on directional level sets
The level setĜ f (t 3 ) represented in Fig. 5 (third column) presents two connected components. However, Fig. 4 (right plot in the bottom row) shows that the theoretical level set G f (t 3 ) has exactly three components. Therefore, the estimation error is considerably large. Distances between sets are the common criteria considered in set estimation to measure the discrepancies between the theoretical region to be estimated and the corresponding reconstruction. Of course, this is also applicable when the goal is to estimate level sets or HDRs.
The distance in measure d μ between two Borel sets A and B in R d is defined as where μ denotes the Lebesgue measure and A B, the symmetric difference of A and B calculated as Consistency results for directional plug-in estimators have been already obtained in the literature for this distance. For the estimator established in (3) defined on S 2 , Cuevas et al. (2006) and Cholaquidis et al. (2020) ≥ t} and f n denotes the kernel estimator (for manifolds with boundary) proposed in Berry and Sauer (2017). From the definition in (5), it easy to check that the distance in measure d μ does not penalize those level set estimators that have an isolated point as a connected component or any other set with null Lebesgue measure. Additionally, the undersmoothing caused by the choice of a small bandwidth value may provoque that the estimatorĜ f (t) presents non-significant connected components with small Lebesgue measure. In this case, d μ would not be as effective as, for instance, the Hausdorff distance in detecting this situation. Let us recall that, if A and B are now non-empty compact sets in R d , the Hausdorff distance between A and B is established as follows where ρ({x}, B) = inf y∈B {ρ(x, y)} being ρ(x, y) the distance between two points. Note that the definition of the Hausdorff distance is very general and depending on the selection of the distance ρ, different error criteria emerge. Usually, ρ corresponds to the chordal distance (Euclidean distance in R d , ρ 1 ).

Remark 1
Other natural choices such as the geodesic distance (great circle, ρ 2 ) could be considered in Eq. (6). Hopf-Rinow Theorem states that ρ 1 and ρ 2 induce the same topology on S d−1 . Figure 1 in Jeong et al. (2017) illustrates that ρ 1 (x, y) ≤ ρ 2 (x, y) for any pair of points x, y in the unit circle. Following Lemma 3 in Boissonnat et al.
(2019), a general upper bound for the ρ 2 (x, y) for all x, y ∈ S d−1 depending on ρ 1 (x, y). Specifically, it is possible to prove that ρ 2 (x, y) ≤ arcsin(ρ 1 (x, y)) for all x, y ∈ S d−1 when the constant r is equal to 1/2.
The metric d H is not completely successful in detecting differences in shape properties. In other words, two sets can be very close in Hausdorff distance and still show quite different shapes. This typically happens where the boundaries ∂ A and ∂ B are far apart, no matter the proximity of A and B. So a natural way to reinforce the notion of visual proximity between two sets provided by Hausdorff distance is to account also for the proximity of the respective boundaries. This error criterion has been also considered for establishing consistency results of several directional plug-in reconstructions. Cuevas et al. (2006) prove that lim n→∞ d H (∂G f (t), ∂Ĝ f (t)) = 0, a.s., when the Hausdorff distance is defined from ρ 1 . If the Hausdorff distance involves ρ 2 , (Cholaquidis et al. 2020) s. The existing monotone relationship between chordal and geodesic distances guarantees the consistency of the plug-in estimator in Cholaquidis et al. (2020) also when the Hausdorff distance depends on ρ 1 instead of ρ 2 . Therefore, if the target is the reconstruction of a set, the Hausdorff metric (defined from the chordal distance) can be seen as a suitable error criteria in the directional setting.

HDRs in the directional setting
As noted in the Introduction, the level t is usually unknown in (1) and, for practical purposes, the practitioner chooses the probability content of the set instead of the level t. These particular class of level sets widely considered for Euclidean data are the socalled HDRs (see Box and Tiao 1973;Hyndman 1996 or Samworth andWand 2010). However, as far as we know, HDRs were not defined in the directional context yet. Motivating the need for a proper extension of this notion and the proposal of adequate estimation tools can be easily justified. Figure 7 (top) shows four different 50% circular regions (regions containing 50% of the probability, empirically approximated) for the kernel density estimator f n represented in grey. Although all of them have probability content equal to 50%, they exhibit completely different shapes. Therefore, it is obvious that there exists an infinite number of ways to choose a region with given coverage probability and in a general scenario, it may not be clear which region must be chosen. The same happens for real-valued random variables, and (Hyndman 1996) suggests that HDRs are the best subset to summarize a probability distribution.
The usual purpose in summarizing a probability distribution by a region of the sample space is to delineate a comparatively small set which contains most of the probability, although the density may be nonzero over infinite regions of the sample space. Therefore, as in the Euclidean case, it is necessary to decide what properties the region has to verify. The following conditions are natural: (C1) The region should occupy the smallest possible volume in the sample space. (C2) Every point inside the region should have probability density at least as large as every point outside the region.
Following (Box and Tiao 1973), conditions (C1) and (C2) are equivalent and lead to regions called HDRs. Definition 1 formalizes this concept in the directional context taking into account the second criterion.
Definition 1 Let f be a directional density function on S d−1 of a random vector X . Given τ ∈ (0, 1), the 100(1 − τ )% HDR is the subset where f τ can be seen as the largest constant such that with respect to the distribution induced by f .
According to Polonik (1997) and García et al. (2003) in the Euclidean context, L( f τ ) is the minimum volume level set with probability content at least (1 − τ ). Figure 4 shows the HDR L( f τ ) in grey for three different circular densities and three different values of τ . The threshold f τ is represented through a dotted grey line. Note that, if large values of τ are considered, L( f τ ) is equal to the greatest modes and, therefore, the most differentiated clusters can be easily identified. However, for small values of τ , L( f τ ) is almost equal to the support of the distribution.

Plug-in estimation of directional HDRs
The first step to reconstruct the HDR in Definition 1 for a given τ ∈ (0, 1) is to estimate the threshold f τ . As in the Euclidean case, numerical integration methods could be also used in the directional setting in order to approximate its value. However, when the dimension increases, the computational cost becomes a major issue due to the complexity of the numerical integration algorithms considered on high dimensional spaces. An alternative approach for estimating f τ with a feasible computational cost is described next. As before, let X be a random vector with directional density f and let Y = f (X ) be the random vector obtained by transforming X by its own density function.
Since (Hyndman 1996), f τ can be estimated as a sample quantile from a set of independent and identically distributed random vectors with the same distribution as Y .
In particular, if X n = {X 1 , . . . , X n } denotes a set of independent observations in Obviously, if f is a known function, the observations can be pseudorandomly generated and the estimation of f τ could be made arbitrarily accurate by increasing n. In practice, f is often unknown and we have as only information a random sample of points X n from an unknown density f . From this sample, we propose first to determine the kernel estimator f n in (4). If n is large enough, then calculate the set { f n (X 1 ), . . . , f n (X n )} in order to estimate f empirically. If n is moderate, it may be preferable to generate observations X n = {X l , . . . , X N } of large size N from f n . For small values of n it may not be possible to get a reasonable density estimate. Besides, with few observations and no prior knowledge of the underlying density, there seems little point in attempting to summarize the sample space (see Wand and Jones 1995 for some discussion on the number of observations needed for a reasonable linear density estimate). Note that the problem here is not with the density quantile algorithm (that give results to an arbitrary degree of accuracy given a density), but with estimating the density from insufficient data.
Once the threshold f τ is estimated, plug-in methods reconstruct the 100(1 − τ )% HDR namely L( f τ ) in (7) aŝ (9) Figure 7 shows the circular kernel estimator f n (grey color) calculated from a sample X 250 generated from the second model (black color) in Fig. 4 and different empirically approximated 50% circular regions (grey color, top). The boxplot of the transformed values denoted by { f n (X 1 ), . . . , f n (X 250 )} is also shown (bottom). The dotted lines represent the quantiles that determine the corresponding 50% (probability coverage) circular region. Note that only the estimated HDR (left),L(f τ ), is able to show the existence of the five existing modes.
Apart from the consistency off τ , Cadre et al. (2009) establish the exact convergerce rate (considering the distance in measure d μ as error criteria) for Euclidean HDRs. The extension of these results to the directional setting does not seem straightforward. However, iff τ −consistency remains true, we could prove thatL(f τ ) is also a d H −consistent estimator of L( f τ ) in S 2 under the assumptions of Corollary 1 and condition (T) in Cuevas et al. (2006). To complete the proof is only necessary to apply a triangle inequality on d H (L( f τ ),L(f τ )).

Confidence regions for estimated HDRs
The density quantile algorithm detailed above for approximating the threshold f τ involves an empirical approximation. Then, it is convenient to compute some uncertainty limits on the estimated regions.
For the simplest case of X being a circular random variable (following Hyndman 1996), standard asymptotic results for a sample in Cox and Hinkley (1979) allow to prove thatf τ is asymptotically normally distributed with mean f τ and variance and {z i } denote those points in the sample space of X such that f (z i ) = y, i = 1, 2, . . . , n(y).
Alternatively, a bootstrap algorithm can be easily designed to compute confidence regions for estimated HDRs. The procedure is detailed in Algorithm 1.
Algorithm 1: Bootstrap procedure to estimate HDRs confidence regions.
Inputs: X n , τ , the number of bootstrap resamples B and the confidence level α.
1. Calculate the directional kernel density estimador f n from X n .
As an illustration, Fig. 8 shows the estimated confidence regions using the asymptotic approach (first row, in dark red color) and the bootstrap procedure (second row, in purple color) for three different values of τ when α = 0.05 and B = 250. Cross validation bandwidths introduced in Hall et al. (1987) were used as smoothing parameters for circular density estimation in both approaches.

A suitable bootstrap bandwidth selector
The plug-in reconstruction of the directional HDRs in (9) involves the calculation of the kernel density estimator in (4) that is known to be heavily dependent on the selection of h. The existing methods for selecting an optimal value for h aim for minimizing some error criterion on the target density f , but they are not specifically Fig. 8 95 % Confidence regions considering the asymptotic approach (first row, in dark red color) and the bootstrap procedure with B = 250 (second row, in purple color) from X 500 of a circular density with τ 1 (first column), τ 2 (second column) and τ 3 (third column) verifying 0 < τ 1 < τ 2 < τ 3 < 1 designed for the estimation of HDRs. The goal of this section is to propose the first selector of h specifically designed for HDRs reconstruction.
A bootstrap bandwidth selector focused on the problem of reconstructing HDRs is introduced in what follows. The idea is to use an error criterion that quantifies the differences between the theoretical region and its plug-in reconstruction. In the realvalued setting, Samworth and Wand (2010) propose one of the first bandwidth selectors for HDRs estimation studying an relatively uncommon distance (depending on both μ and g) between these sets. In this work, we consider the classical Hausdorff distance (introduced in Sect. 2.2) between the boundaries of the HDR and the corresponding estimator.
In the directional case, the closed expression of d H (∂ L( f τ ), ∂L(f τ )) is not known. However, it could be estimated through a bootstrap procedure. Therefore, a new bandwidth selector can be established as where E B denotes the bootstrap expectation with respect to random samples X * n = {X * 1 , . . . , X * n } generated from the directional kernel f n that, of course, is dependent on a pilot bandwidth and also on the choice of the distance ρ in Eq. (6). Figure 12 shows the theoretical HDR for model S3 (see Sect. 4.2) when τ = 0.5 (first and second columns). Moreover, the plug-in estimatorL(f τ ) obtained from a sample of size n = 1000 and considering the bandwidth proposed in García-Portugués (2013) when τ = 0.5 is also represented (third column). Note that, for this sample size, only the largest mode is detected. In this particular case, the Hausdorff error is smaller if the HDR is reconstructed from a cross-validation bandwidth designed for density estimation (fourth and fifth columns). A relevant issue appears when h 1 is estimated from imprecise HDR estimators. Remember that the minimization procedure considered for determining h 1 involves the boundary of the setL(f τ ). If this set is poorly approximated the resulting bandwidth surely will not provide competitive results. Therefore, largest sample sizes will be considered in this section for avoiding this problem.
Another point that is worth to mention is that diverse bandwidths selectors emerge from the consideration of the different choices of ρ in the definition of the Hausdorff distance. In fact, other bandwidths could be defined if, for example, d H in (10) is replaced by a completely different error criteria such as the distance in measure d μ that, unlike Hausdorff distance, does not take in account the connected components of a set only composed by a isolated point. Therefore, we could propose as many bandwidths as existing distances between sets attending to the specific properties and characteristics of each distance.

Simulation study
The performance of the proposed bandwidth selector is explored in this section. As it has been mentioned, there exist other bandwidth selectors for directional kernel density estimation (see Appendix D), although not specifically designed for HDR reconstruction. We will also check the impact of considering some of these selectors in the HDR plug-in estimation. Specifically, the selector h 1 established in (10) was implemented considering the chordal distance ρ 1 that, as we show in Sect. 2.2, guarantees good asymptotic properties of directional level sets. The code for computing it is available in the R library HDiR. All the other bandwidths are implemented in the R packages NPCirc 3 and Directional 4 . Sects. 4.1 and 4.2 contain the results obtained in circular and spherical settings, respectively. Some additional results of simulations are also contained in Appendix C.

Estimation of circular HDRs
A collection of 9 circular densities (models C1 to C9) have been considered in this simulation study. These models are mixture of different circular distributions and they correspond to densities 5, 6, 7, 8, 10, 11, 16, 19 and 20 fully described in Oliveira et al. (2014). Figure 9 shows these densities and the thresholds f τ for τ = 0.2, τ = 0.5 and τ = 0.8 through dotted circles.
A total of 250 random samples of sizes n = 500 and n = 1000 were generated for each of these models. From each sample, circular HDRs are reconstructed for τ = 0.2, τ = 0.5 and τ = 0.8. Results for τ = 0.2 and τ = 0.8 are exposed in Appendix C.1. The behavior of plug-in methods that emerge from the consideration of different bandwidth parameters will be checked. Apart from h 1 , we will consider the circular rule-of-thumb by Taylor (2008)  For each method and each sample, the estimation error is measured by computing the Hausdorff distance (d H ) between the boundaries of estimated HDR and the frontier of theoretical set. As a reference, note that the maximum value of this criteria in S 1 is 2 (the length of the diameter of the circle).
Tables 1 and 2 show the means and the standard deviations of the 250 estimation errors obtained when τ = 0.5 from samples of sizes n = 500 and n = 1000, respec-  Fig. 10 Violin plots of Hausdorff errors for models C3, C6 and C8 when τ = 0.5 and n = 1000. Note that due to the behaviour of h 2 , the scales of these figures are different tively. Bold numbers correspond to the lowest mean errors obtained for each density. Taking into account the variety of models considered, exhibiting different features, it is not surprising that all of the bandwidth selectors are the best ones for some model, showing h 1 a competitive behavior in all cases. In fact, it is the best one for models C3, C5, C6 and C8 (with n = 1000). Figure 10 shows the violin plots of Hausdorff errors obtained for some of the simulation models when τ = 0.5 (n = 1000). It shows that h 2 is the selector that presents a worst behavior for models C3 and C6. Furthermore, its variance is again specially large for model C3.
Finally, it is worth to mention that results in Appendix C.1 shows that the competitive behavior of h 1 improves considerably when high values of τ are selected. This is not a minor question when the goal is to estimate the biggest modes of a distribution.

Estimation of spherical HDRs
For the spherical scenario, 9 density models have been considered. These models, namely S1 to S9, are mixtures of von Mises-Fisher densities on the sphere, allowing to represent complex structures showing multimodality and/or asymetry. Parameters of mixtures are fully established in Table 8 in Appendix B for reproducibility. Moreover, these densities are also implemented in the R package HDiR. Figure 11 shows them and the corresponding HDRs for τ = 0.2, τ = 0.5 and τ = 0.8.
For sample sizes n = 500, n = 1500 and n = 2500, 200 random samples were generated from models S1 to S9. From each sample, HDRs are reconstructed for τ = 0.2, τ = 0.5 and τ = 0.8. As before, results for τ = 0.2 and τ = 0.8 are contained in Appendix C.2. The performance of different plug-in methods that emerge from the consideration of different bandwidth parameters discussed in this work is checked. Apart from h 1 , cross-validation bandwidth selectors for data on a sphere S d−1 (h 5 ) and the plug-in bandwidth selector introduced by García-Portugués (2013) (h 7 ) are taken into account. In this case, a total of B = 50 resamples are established for estimating the proposed bootstrap bandwidth h 1 , taking h 5 as a pilot bandwidth. For each method and each sample, the estimation error is again measured calculating the Hausdorff distance between the boundaries of estimated HDR and the frontier of theoretical set. As reference, note that the maximum value of both criteria S 2 is also 2. In this case, this upper bound coincides exactly with the length of the diameter of the sphere.
Tables 3, 4 and 5 contains the results for τ = 0.5 when n = 500, n = 1500 and n = 2500, respectively. Bold numbers correspond to the lowest mean errors obtained for each density. The proposed selector h 1 is the best or second best in all cases. In fact, h 1 and h 5 usually behave similarly and h 7 is the worst selector for S3. Figure 13 contains the violin plots of Hausdorff errors for some of the considered models when τ = 0.5 and n = 1500. Remark that h 1 and h 5 usually present similar results, see densities S5 and S9. However, h 1 is clearly more competitive for models S1 and S8.
To conclude, simulations in Appendix C.2 allows to confirm the good performance of the selector h 1 when small or big values of τ are considered for spherical data.

Real data analysis
The proposed methodology is now applied to the two real datasets presented in the Introduction, exemplifying the aplicability of the method for circular and spherical data.

Behavioral plasticity of sandhoppers
Adaptation to changing beach environments for the real example on sandhoppers introduced in Sect. 1.1 is analyzed from HDRs estimation perspective. HDRs are estimated for τ = 0.8 disaggregating the sandhoppers data taking into account the categories of variables specie, sex, time of day and month of year. As consequence, a total of 24 set estimators are determined, numbered E1 to E24. Variables combinations yielding this group classification are presented in Table 7 in Appendix A. Note that the estimated HDRs correspond to the largest modes of the orientation distributions. Hausdorff distances between the boundaries of these 24 sets are able to establish the degree of dissimilarity of HDRs. In general, large distances between the boundaries of two sets indicate the existence of modes in different directions. If the categories of all variables with the exception of one are fixed, it is possible to check if the different values of the non-fixing variable has some influence in sandhoppers orientation through the comparison of the estimated HDRs. The upper triangular matrix in Table 6 contains the Hausdorff distances (defined from ρ 1 ) between boundaries. The largest distances are represented in blue color. Grey color is used in order to depict the next largest values. Furthermore, Table 6 (top) contains some of the estimated HDRs that present the largest distances.

Table 6
Upper triangular matrix contains the Hausdorff distances between boundaries of sets from 1 to 24. HDRs representations for τ = 0.8 for some sets between 1 and 24 (top) In particular, Hausdorff distance between regions 5 and 11 is equal to 1.91 (close to 2, the maximum value of Hausdorff distance). According to Table 7, the variable configuration 5 corresponds to the largest orientation modes for females of the specie Talitrus saltator when the orientation is measure in noon during October. Region 11 refers to same measurements taken in April. Therefore, the month can be seen as variable that has influence on the orientation for sandhoppers.
Hausdorff distance between regions 5 and 6 is equal to 1.93. According to Table  7, set 6 also corresponds to the HDR for females of the specie Talitrus saltator but, in this case, when the orientation is registered in morning during October. Then, the moment of the day also seems a factor with influence on the sandhoppers behavior.
Several cells in Table 6 are represented in pink color. All of them corresponds to considerable large values of distances (larger than 1.00) and they are used to analyze briefly the influence of each of the variables in the dataset. Under the same values of the rest of variables Talitrus saltator and Talorchestia brito present different behaviors. For instance, distances between sets 5 and 17 or 3 and 15 correspond to this situation. Sets 5 and 17 can be compared using their representations in Table 6 (top).
The importance of the sex variable for the specie Talitrus saltator can be also seen considering the Hausdorff distances of the sets 2 and 5, 3 and 6 or 18 and 15. According to images in Table 6, these sets present their largest modes in completely different directions. Note that the role of the variable month is clearly remarkable. The relatively high values of the distances between sets 1 and 7 and 6 and 12 or 14 and 20 for the species Talitrus saltator and Talorchestia brito also corresponds to the existence of modes in different directions. Finally, the importance of the moment of the day for the Talitrus saltator can be studied through the distances between sets 4 and 5 or 4 and 6. Remark that set 4 has two connected components while set 5 only presents one.
Finally, the analysis of Hausdorff distances for the two species of sandhoppers shows that the median of the Talitrus saltator in Hausdorff distance is 0.76, clearly bigger than the median of Talorchestia brito that is equal to 0.52. Therefore, Talitrus saltator presents more differentiated orientations, depending on the time of day, period of year and sex, with respect to Talorchestia brito. Therefore, conclusions in Scapini et al. (2002) are corroborated from this perspective.

Earthquakes distribution on Earth
According to the theory of plate tectonics, Earth is an active planet. Its surface is composed of about 15 individual plates that move and interact, constantly changing and reshaping Earth's outer layer. These movements are usually the main cause of volcanoes and earthquakes. Seismologists have related these natural phenomena to the boundaries of tectonic plates because they tend to occur there, see Selley et al. (2004). In fact, the concentration of earthquake epicenters traces the filamentary network of fault lines and, consequently, they could be analyzed alternatively from the perspective of nonparametric filamentary structure estimation (see, for instance, Genovese et al. 2012). Moreover, tectonic hazards can provoque important damages (destroy buildings, infrastructures or even cause deaths). Therefore, it is important to detect which areas are specially risky. As an illustration, the recent world earthquakes distribution is analyzed next through HDRs estimation. Figure 14 shows the margins of the tectonic plates (grey color) and the geographical coordinates (red points) of a total of 272 medium and strong earthquakes registered between 1th October 2004 and 9th April 2020 already introduced in Sect. 1.1. Note that most of events are exactly located on the plates boundaries.
Our main goal is to detect which areas are really problematic nowadays. In Sect. 1.1, we show that the largest mode is located on the Southeast Europe considering a value of τ = 0.8. However, a more general view on earthquakes distribution could be obtained if more HDRs are reconstructed for a range of values of τ . Specifically, they were estimated choosing τ 1 = 0.1, τ 2 = 0.3, τ 3 = 0.5, τ 4 = 0.7 and τ 5 = 0.9. The bandwidth parameter used is the proposed in García-Portugués (2013). The corresponding contours are also represented in Fig. 14 using blue colors. An interactive representation of these HDRs can be seen in Appendix D.
The two smallest contours (dark blue colors) corresponds to density regions with probability at least 1 − τ 5 = 0.1 and 1 − τ 4 = 0.3, respectively. Therefore, they match with the greatest modes of earthquakes world distribution and they identify the more risky parts of the world. They are located on Europe. Concretely, on the boundaries intersection for the Eurasian and African Plates. Note that the second of these regions even includes the frontier of the Arabian Plate. Contours for τ 2 = 0.3 and τ 3 = 0.5 are related to Indo-Australian Plate and margins of Philippine Sea and Pacific Plates appears when τ 1 = 0.1.
As for America, the most problematic area is detected in Central America. Concretely, it is mainly located on the frontiers of Cocos, Nazca and Caribbean Plates. According to the contours shown, this region belongs to the zone of the world where the 70% (1−τ 2 %) of earthquakes are registered. If τ 1 = 0.1 is considered then Pacific, North and South American plates appears as risky areas.

Conclusions and discussion
The main goals of this work are to extend the definition of HDRs for directional data and propose a plug-in estimator based on a new bootstrap bandwidth selector that is focused on HDRs reconstruction. The route designed to reach this goal can be summarized as follows: (1) Extending the definition of HDRs for directional data, (2) proposing general HDRs plug-in estimators and two different procedures for estimating confidence regions, (3) introducing the first specific selector of the bandwidth parameter for directional HDRs reconstruction, (4) studying the practical behavior of the plug-in estimators (using the new selector and other classical directional bandwidths not specifically designed for HDR reconstruction) and (5) applying the plug-in reconstruction of HDRs to the real data on sandhoppers orientation and earthquakes.
Some further research on the proposed estimator and some natural extensions of this work are discussed. The performance of the procedures for estimating the HDRs confidence regions should be compared, for instance, through simulations. Additionally, consistency results on the proposed HDR estimator and the bootstrap bandwidth selector could be explored following the scheme in Cadre et al. (2009). Regarding the procedure for bandwidth selection, there are two natural extensions. Firstly, as it has been mentioned along the paper, other distances may be used. Secondly, the consideration of the kernel density estimates proposed in Di Marzio et al. (2011) (torus) and García-Portugués et al. (2013) (cylinder) enables the adaptation of our proposal to these settings.
Note also that in the Introduction, we refer to the notion of cluster as the number of connected components of the probability density. With this view in mind, an estimator of the number of directional clusters can be given by the number of connected components of the HDRs plug-in estimator. In addition, (two or more) directional densities could be also compared using the ideas explored in this work: we may compare the discrepancy between directional HDRs estimations, for instance, measuring distances between boundaries. The simple geometric structure of estimators could be used to compute the procedure and calibrate the test using re-sampling schemes.
Finally, earthquakes on Earth could be analyzed following alternative approaches. Note that contour lines in Fig. 14 do not clearly follow the geometry of tectonic plates. A possible cause of this behavior is that earthquakes occur very close to the boundary of the density support (that is, the frontiers of the tectonic plates) and this issue may produce a bias in the estimator. For manifolds with known boundaries, Theorem 3.1 in Berry and Sauer (2017) provides a consistent estimate of the density both in the interior and the frontier, reducing the bias for density evaluations closed to the boundaries. Since the concentration of earthquakes epicenters traces the filamentary network of fault lines, following (Genovese et al. 2012), the performance of nonparametric filament estimators should be also checked for further insight in this problem.
Acknowledgements R.M. Crujeiras and P. Saavedra-Nieves acknowledge the financial support of Ministerio de Economía y Competitividad and Ministerio de Ciencia e Innovación of the Spanish government under grants MTM2016-76969P, MTM2017-089422-P, PID2020-118101GBI00 and PID2020-116587GB-I00 and ERDF. Authors also thank Elena Vázquez Abal for her help, Prof. Felicita Scapini for providing the sandhoppers data (collected under the support of the European Project ERB ICI8-CT98-0270), the computational resources of the CESGA Supercomputing Center and the referees for the constructive comments which have improved the paper.
Funding Open Access funding provided thanks to the CRUE-CSIC agreement with Springer Nature.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A.1 Levels to the estimated HDRs disaggregating the sandhoppers variables
The orientation of two sandhoppers species (Talitrus saltator and Talorchestia brito) is analyzed in Scapini et al. (2002). The experiment was carried out on the exposed nontidal sand of Zouara beach located in the Tunisian northwestern coast. Apart from the specie and the orientation angles, this dataset contains information about other variables such as sex (male, female), month (April, October) and moment of the day (morning, afternoon and noon) when the experiment was done. We refer to Scapini et al. (2002) and Marchetti and Scapini (2003) for further details on the dataset and the experimental design. Table 7 contains the associated levels to the 24 estimated HDRs when variables are disaggregated.

A.2 Interactive representation of HDRs for eathquakes on Earth
An interactive representation of HDRs contours for earthquakes around the world is contained in the supplementary material.

C.1 Circular HDRs estimation
Tables 9 and 10 show the results for τ = 0.2 (for n = 500 and n = 1000, respectively). In this case, h 1 (being a competitive selector in all the scenarios) is the best one for models C3 and C5 (with n = 1000). Note that h 2 presents a poor behavior for models C3, C6, C7, C8 and C9, and h 6 performance is also unsatisfactory for models C5 and C9 (n = 500), although it improves with sample size. Tables 11 and 12 contain the results obtained for τ = 0.8 when n = 500 and n = 1000, respectively. According to Table 12, h 1 is the best selector for five models (C2, C6, C7, C8 and C9). It is clear that the new selector improves its results when large values of τ are considered and, therefore, largest modes are identified. Figure 15 shows the violin plots of Hausdorff errors obtained for some of the simulation models when τ = 0.2 (n = 1000). If τ = 0.2, h 2 is the selector that presents a worst behavior for models C3 and C6. Furthermore, its variance is always specially large.

C.2 Spherical HDRs estimation
Tables 13 and 14 show the means and the standard deviations of the 200 estimation errors obtained when τ = 0.2 from samples of sizes n = 1500 and n = 2500, respectively. Bold numbers correspond to the lowest mean errors obtained for each density. Except for model S8, h 1 is the best or shows a competitive performance. Tables 15 and 16 show the means and the standard deviations of the 200 estimation errors obtained when τ = 0.8 from samples of size n = 1500 and n = 2500, respectively. Although results for S7 are not good when h 1 is considered, this selector is again the best or competitive with h 5 . As for h 7 , results are remarkably poor in S2 and S6. Figure 16 contains the violin plots of Hausdorff errors for models S3, S4, S6 and S9 when τ = 0.2 and n = 2500. Note that he performance of selector h 1 is considerably good.

D Some details on the directional bandwidth selectors
We briefly revise in this section some bandwidth selection methods designed for kernel density estimation. Although these methods do not focus on HDRs, but on the reconstruction of the whole density curve, it may be argued that they could also be used for constructing the proposed plug-in estimator. The performance of our proposal is compared in all the simulated scenarios with different bandwidth selectors for circular and spherical data. As in the Euclidean setting, most used techniques for selecting h are based on the minimization of some error criteria that quantify the accuracy of the kernel density estimator. One of the most simple errors to be considered is the mean integrated squared error that can be written as follows: where ω d denotes the Lebesgue in S d−1 . Then, a possibility is to search for the bandwidth that minimizes (11). However, the asymptotic version of M I SE, AM I S E, is more commonly used in the literature. A rule of thumb proposed in Taylor (2008) adapts the idea in Silverman (1986) in kernel linear density estimation to the circular setting. The resulting plug-in selector assumes that the data follow a von Mises distribution to determine the AM I S E. The bandwidth is chosen by first obtaining an estimationκ of the concentration parameter κ in the reference density (for example, by maximum likelihood) through the formula h 2 = 4π 1/2 I 0 (κ) 2 3κ 2 I 2 (2κ)n 1/5 . Remark that the parametrization in Taylor (2008) has been adapted to the context of the estimator (4) by denoting by h the inverse of the squared concentration parameter employed in his paper. The poor performance of this rule is sometimes due to the non robust estimation by maximum likelihood of the concentration parameter. An alternative and robustified estimation procedure is considered in Oliveira et al. (2013).
A new selector also devoted to the circular case is established in Oliveira et al. (2012). It improves the performance of the Taylor's proposal allowing for more flexibility in the reference density, considering a mixture of von Mises. This selector is mainly based on two elements. First, the AM I S E expansion derived in Di Marzio et al. (2009) for the circular kernel density estimator by the use of Fourier expansions of the circular kernels. This expression has the following form when the kernel is a circular von Mises (the estimator is equivalent to consider L(r ) = e −r and h as the inverse of the squared concentration parameter in (4): The second element is the Expectation-Maximization (EM) algorithm in Banerjee et al. (2005) for fitting mixtures of directional von Mises. The selector, that is denoted by h 3 , proceeds as follows: first, apply the EM algorithm to fit mixtures with different number of components; then, choose the fitted mixture with the lowest AIC. Finally, compute the curvature term in (12) using the fitted mixture and seek for the h that minimizes this expression. This value of h is denoted by h 3 .
Of course, plug-in rules are not the only alternative to smoothing parameter selection. Some other data-driven directional procedures were already proposed in Hall et al. (1987) using cross-validation ideas. Specifically, Least Squares Cross-Validation (LSCV) and Likelihood Cross-Validation (LCV) bandwidth are introduced, arising as the minimizers of the cross-validated estimates of the squared error loss and the Kullback-Leibler loss, respectively. The selectors have the following expressions: where f −i n represents the kernel estimator computed without the i−th observation.
A bootstrap bandwidth selection procedure for data lying on a d−dimensional torus is proposed in Di Marzio et al. (2011). If a von Mises kernel is used, then the bootstrap MISE has a closed expression. Then, h 6 is selected as the value that minimizes where E B denotes the bootstrap expectation with respect to random samples {X * 1 , . . . , X * n } generated from f n (X ). A common problem for small samples is that a local minimum may be chosen, as pointed out by Oliveira et al. (2012).
Apart from existing cross-validation procedures in the directional setting, García-Portugués (2013) derives a plug-in directional analogue to the rule of thumb in Silverman (1986) using the properties of the von Mises density. Moreover, it is the optimal AM I S E bandwidth for normal reference density and normal kernel. Concretely, if the von Mises kernel is considered and κ is estimated by maximum likelihood, Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.