Pointwise density estimation on metric spaces and applications in seismology

We are studying the problem of estimating density in a wide range of metric spaces, including the Euclidean space, the sphere, the ball, and various Riemannian manifolds. Our framework involves a metric space with a doubling measure and a self-adjoint operator, whose heat kernel exhibits Gaussian behaviour. We begin by reviewing the construction of kernel density estimators and the related background information. As a novel result, we present a pointwise kernel density estimation for probability density functions that belong to general H\"{o}lder spaces. The study is accompanied by an application in Seismology. Precisely, we analyze a globally-indexed dataset of earthquake occurrence and compare the out-of-sample performance of several approximated kernel density estimators indexed on the sphere.


Introduction
Today, technology has equipped science with a massive amount of data that requires rigorous analysis.In astronomy, data can come from missions to other planets, telescopes observing distant parts of the universe, or programs studying Cosmic Microwave Background Radiation.In climatology and environmental science, sensors provide data on the atmosphere.Medicine uses scans to track the growth of tumors and monitor their development, while embryology uses data to track the growth and ensure the health of developing humans.Essentially all scientific fields now heavily rely on data.
The complexity and form of the data reflect their nature.In the examples mentioned above, the data can be represented by geometric structures that capture their form and dynamics.A dataset should be understood as independent realizations of a random variable (rv) X.Such a rv lives in some domain according to its nature.For instance when X represents the locations on some planet, then X lives on the sphere S 2 of the Euclidean space R 3 .The same is true of CMB radiation.For geological data in the interior of Earth, or another celestial body, the domain of study may be the ball B 3 .Similarly, in the field of medicine, the domain of definition of X can become much more complicated geometrically, and as a result, the general target domain becomes an abstract metric space M.
Let X be a rv distributed on a metric measure space M and let f = f X be its unknown probability density function (pdf).Density estimation, estimating a pdf from data X 1 , . . ., X n , represents an important problem in Statistics.To this end we need to construct a density estimator, which is an object of the form fn (X 1 , . . ., X n ; x), where fn : M n × M → R a measurable function.A famous method for obtaining such an estimator is by the so-called "kernel density estimators".
Here we study kernel density estimators on metric measure spaces under very broad assumptions.The setting we will work covers simultaneously the classical cases of Euclidean space R d , the sphere S d , the ball B d and many more significant examples of independent interest.Furthermore it contains more sophisticated geometric settings like manifolds and Lie groups.On the other hand, some techniques originated from spectral theory will simplify and unify several aspects of the approach.We shall operate in the setting put forward in Coulhon et al. (2012), which we describe next in a simplified form: I.We assume that (M, ρ, µ) is a metric measure space such that (M, ρ) is locally compact with distance ρ(•, •) and µ is a positive Radon measure satisfying: The number d is the so-called Ahlfors dimension of the space.
II.The second assumption is that there exists an essentially self-adjoint nonnegative operator L on L 2 (M, dµ), mapping real-valued to real-valued functions, such that the associated semigroup (more details in §2) P t = e −tL , t > 0, consists of integral operators with (heat) kernel p t (x, y) obeying the conditions: (ii) Gaussian localization: There exist constants c 2 , c 3 > 0 such that (iii) Hölder continuity: There exists a constant α > 0 such that for every x, y, y ∈ M such that ρ(y, y ) ≤ √ t and t > 0.
This setting we study generalizes (by default) the Euclidean space.Moreover, it contains spaces like the sphere, the ball, the interval, cubes/rectangles, the simplex, Riemannian manifolds with non-negative Ricci curvature and more, each equipped with their natural metrics and measures associated with Laplace or Laplace-Beltrami operators.For more examples we refer the reader to Coulhon et al. (2012); Georgiadis and Nielsen (2017); Kerkyacharian et al. (2020); Kerkyacharian and Petrushev (2015).
Some first contributions in Statistics in this generality can be found in Castillo et al. (2014); Cleanthous et al. (2020Cleanthous et al. ( , 2022)); Kerkyacharian et al. (2018), while there is a large number of open problems in front of the community.
The aim of the present study is threefold: (α) To review the setting and the construction of kernel density estimators together with the corresponding theoretical background, which is demanding, on the broad framework under study; Section 2.
(β) As novel results, we obtain optimal pointwise density estimation on Hölder spaces; see Sections 3 and 4 and we shed light in the assumptions and methods.
(γ) As an application, we perform a data-analysis of earthquakes; Section 5, using our kernel density estimators.Precisely we compare the out-of-sample performance of several approximated kernel density estimators and we plot the heat map of the estimated density using the selected model.
Remarks and Examples are placed in several points of the manuscript for highlighting notions and ideas.The new results are contained in Section 3 and under more general assumptions in Section 4 and are accompanied with remarks that could be used for future studies.
Section 5 is dedicated to the data analysis of earthquakes.In this Section we apply the theoretical results of the paper and show how one can use these approaches with occurrence data on the Earth.The data used in this analysis are freely available through the United States Geological Survey website https: //earthquake.usgs.gov/earthquakes/search/.
Notation: Throughout positive constants will be denoted by c, and will be allowed to vary at every occurrence.The dependence of a constant to the geometric structure constants c 1 , c 2 , c 3 , α and d will not be stated, but the dependence to a parameter q, will be stated as c q .We denote by N, R, R + the sets of positive integers, real numbers and non-negative real numbers respectively.If τ ∈ N, the class of differentiable functions on R + with continuous derivatives up to order τ will be stated as C τ (R + ).For s > 0, we will denote by s the greatest integer that is strictly less than s and by s the smaller integer strictly larger than s.

Density estimation on metric spaces associated with operators:
A review The first part of our study consists of a review of density estimation on metric spaces associated with operators.One of the milestones is to construct kernels.We expand here the methods used in Cleanthous et al. (2020Cleanthous et al. ( , 2022) ) inspired by the corresponding machinery built in Coulhon et al. (2012) based on the powerful Spectral Theory.
2.1.Functional calculus.We start by some fundamental notions of Spectral Theory providing a minimum background of this wide scientific field; the reader is further referred to Prugovečki (1981); Reed and Simon (1980); Yoshida (1978).
Recall that L is assumed to be a non-negative self-adjoint operator that maps real-valued to real-valued functions.Then (Prugovečki, 1981, Section 5) L admits a unique spectral measure E; that is a projector-valued mapping as follows: Denote by B the Borel σ-algebra on R. For every S ∈ B, we correspond an orthogonal projection E(S) : L 2 (M, dµ) → L 2 (M, dµ) such that: (i) E(R) = I (the identity operator on L 2 (M, dµ)).
(ii) For every sequence of disjoint Borel sets {S n } n∈N ⊂ B (2.5) in the strong L 2 (M, dµ) sense; i.e. for every f ∈ L 2 (M, dµ), Thanks to (ii), for every f, g ∈ L 2 (M, dµ) the set-function (2.7) ν f,g (S) := E(S)f, g , for every S ∈ B, is a complex measure on (R, B).Moreover for every f ∈ L 2 (M, dµ) the set-function The study can be slightly simplified by the following projector-valued function which is referred as the spectral resolution of L. Moreover for every f ∈ L 2 (M, dµ) and every λ ∈ R, we have Given further that L is assumed non-negative, by (Prugovečki, 1981, Theorem 6.3), the domain Dom(L) of L consists of all functions f ∈ L 2 (M, dµ) such that (2.9) Moreover for every f ∈ Dom(L) and g ∈ L 2 (M, dµ) It is customary to write symbolically the so-called spectral decomposition of L.
The next logical step is the functional calculus associated with the operator L; see also (Reed and Simon, 1980, Theorem VIII.5).
Let g : R + → R a Borel measurable function.Then the operator g(L) defined on (2.12) Dom(g) := ϕ ∈ L 2 (M, dµ) : is a self-adjoint operator mapping real-valued functions to real-valued functions.
If g is further assumed to be bounded, then Dom(g) = L 2 (M, dµ) and g(L) : L 2 (M, dµ) → L 2 (M, dµ) is a bounded operator.The operator g(L) is referred as the spectral multiplier associated with g and L and it is symbolically expressed -in the spirit of (2.11)-as The above spectral multipliers can take an explicit form for particular operators L and metric spaces, as we will see in §2.2.
For the purpose of the study of kernel density estimators we turn our attention to spectral multipliers associated with the operator √ L, which is well-defined and self-adjoint; see Yoshida (1978).The exact reasons behind the switch to √ L are discussed in (Cleanthous et al., 2022, Remark 2.2.(α)).
We denote by {F λ : λ ≥ 0} the spectral resolution of √ L. Then F λ = E λ 2 and for every Borel measurable g : R + → R it holds Summary and Notation.For the rest of our study we fix the following terminology and notation.
As a symbol we will refer to a Borel measurable and bounded function The spectral multiplier associated with the symbol k and the operator √ L as in (2.15) will be denoted by the corresponding capital letter: By the above discussion, the operator K: (i) is bounded on L 2 (M, dµ), (ii) is self-adjoint, and (iii) maps real-valued functions to real-valued functions.
For the purpose of kernel density estimation we are interested in the following class of operators: We say that K is an integral operator, when there exists a measurable function K(x, y) -referred as the kernel of the operator K- Note further that when the spectral multiplier K = k( √ L) is an integral operator, then its kernel is real valued and symmetric; Such kernels are exactly the objects we will use for the kernel density estimation.
As always we need a notion of dilations suitable for use in the current framework.Let k : R + a symbol, K the spectral multiplier associated with k and √ L and assume that K is an integral operator with kernel K(x, y), as above.For every h > 0 we denote by We shall need the following result from smooth functional calculus induced by the heat kernel, developed in Coulhon et al. (2012); Kerkyacharian and Petrushev (2015).We fix the following notation first: Let h > 0 and τ > 0. We denote by (2.17) where c > 0 is a constant depending on τ and the structural geometric constants of the setting.Moreover, for every h > 0 and x ∈ M (2.20) Let us return to the setting's Assumption II and seed more light on it.The heat kernel p t (x, y) consists the kernel of the operator e −tL .At this point, being more familiar with Spectral Theory, we define k(λ) := e −λ 2 .Then for every t > 0 the operator e −tL is just the spectral multiplier K √ t , which is an integral operator by Theorem 2.1, and the heat kernel equals p t (x, y) = K √ t (x, y).2020) and the references therein.In the following spaces we also express the kernels obtained in an abstract sense of existence in Theorem 2.1.
Example 2.3.Let M = R d the Euclidean space associated with the operator L = −∆, the negative Laplacian.By default this space is included in our study.We proceed to the kernels.Let k : R + → R a symbol satisfying the assumptions of Theorem 2.1.We extend the symbol ) is nothing but the Fourier multiplier associated with the symbol k(ξ).Denote by f and by F −1 f the Fourier transform and the inverse Fourier transform of the function f : R d → R, respectively.Then: The kernel K h (x, y), by the properties of the Fourier transform, is the familiar This example sheds light on the notion of spectral multipliers.Specifically, on R d , they are the well-known Fourier multipliers, and the corresponding kernels K(x, y) are the convolution kernels of the symbol κ = F −1 k.Moreover, the existing knowledge on R d , together with the present correspondence, acts as a guide for the several developments on the setting of metric spaces associated with operators.
In the next examples we consider spaces M of finite measure.In this caseas it has been proved in (Coulhon et al., 2012, Proposition 3.20)-the operator L presents a discrete spectrum 0 ≤ λ 0 < λ 1 < • • • .This implies the discrete decomposition Let {e ν i } i=1,...,dν be an orthonormal basis of the eigenspace E ν and d ν := dimE ν < ∞, ν ≥ 0. Then we have the projector operators Let k : R + → R a symbol satisfying the assumptions of Theorem 2.1.The corresponding spectral multiplier K h , h > 0, has the following kernel For more details we refer to Castillo et al. (2014); Kerkyacharian et al. (2020).Importantly, when dealing with a specific metric measure space of finite measure, associated with an operator L, we just need to know (i) the eigenvalues and (ii) the projector operators, and then we get the kernels in (2.22).
Next, we present precise expressions of (2.22) on the cases of the unit sphere and the unit ball of R 3 , which seems to be the most applicable.
Example 2.4.Let M = S 2 the unit sphere of R 3 associated with the angular distance, the spherical measure and the spherical Laplacian.Then this space satisfies our Assumptions I and II; Kerkyacharian et al. (2020).The kernel takes the form: where P ν the Legendre polynomials and •, • the inner product on R 3 .
Example 2.5.The unit ball B 3 of R 3 equipped with the distance Dai and Xu ( 2013) the measure and the operator satisfies the assumptions of our setting; Kerkyacharian et al. (2020).
Expanding the discussion in Cleanthous et al. (2020) and using Kyriazis et al. (2008), the kernel takes the form where and C 1 ν the Gegenbauer polynomials of order 1.We emphasize that the kernels existing by Theorem 2.1 may look completely different as in (2.21), (2.23), and (2.27); however, all of them enjoy the decay in (2.19), which is sharp in all the above cases as it can be confirmed by the properties of Fourier transform, Legendre polynomials and Gegenbauer polynomials, respectively.
The advantage of the general theory is that it unifies spaces of different nature, extracts general results and expresses them in the particular cases of interest.Definition 2.6.Let n ∈ N and X 1 , . . ., X n be iid random variables on M. Let k : R + → R be a symbol satisfying the assumptions of Theorem 2.1, as well as k(0) = 1, and h > 0 a bandwidth.The associated kernel density estimator (kde) is defined as Note that (2.29) is well-defined for every k, as guaranteed by Theorem 2.1.In addition (2.20) implies the fundamental property which is a standard assumption for the kernels in the Euclidean setting.
We further express explicitly the kde in (2.29) on R d , S 2 and B 3 , just by expanding the Examples 2.3, 2.4 and 2.5.Let k a symbol as in Definition 2.6 and h > 0.
(α) When M = R d , and which is the very well-known form of a kde on R d .(β) When M = S 2 , equipped with the angular distance, the spherical measure and the spherical Laplacian, (γ) When M = B 3 , equipped with the distance in (2.24), the measure in (2.25) and the operator in (2.26), where G ν as in (2.28).
2.4.Hölder spaces.We are closing this review by presenting some regularity spaces.In nonparametric estimation we assume that the density under study belongs to large regularity spaces.Regularity spaces on R and R d have been studied for a century within many scientific disciplines.Historically, the first way to express the notion of regularity (or smoothness) was in terms of derivatives, and gradually Fourier transforms and convolutions extended such notions.For the historical path, we refer the reader to Triebel (1983).
Hölder spaces are a suitable choice for the purpose of pointwise density estimation (see Tsybakov (2009)) that we will obtain in the present study.Let us recall this class on R 1 : Let s > 0 and denote by := s the greatest integer strictly less than s.The Hölder space Ḣs (R) is the set of function f : R → R that are -times differentiable and for some constant 0 ≤ c < ∞ and every x = y.Note that slightly different versions of these spaces can be found in different sources, but the overall purpose is more or less the same.
We must define a suitable extension of (2.33) on a metric space.For the right hand side, we simply use a power of the distance ρ(x, y).Metric spaces lack the notion of derivatives, so a substitute for the left side is more challenging, but a solution comes from the operator L. In all of our examples in Section 2.2 we observe that the differentiability is linked with the definition of L. We also note that in every case presented in Section 2.2 L is a differential operator of order 2.These facts justify the following definition: Definition 2.7.Let s > 0 and denote by = s .The Hölder space of order s, Ḣs , is defined as the set of all functions f : M → R such that (2.34) f Ḣs := sup For the connection between these spaces and other smoothness spaces in our setting, we refer to Coulhon et al. (2012).For the use of regularity spaces in Nonparametric Statistics in this generality, we further refer to Castillo et al. (2014); Cleanthous et al. (2020Cleanthous et al. ( , 2022)).

Pointwise density estimation
We proceed to present some new results.Namely the pointwise estimation of densities enjoying Hölder regularity.
One of the main ways to measure the accuracy of the estimator fn,h (x) at a given point x ∈ M is by the mean squared error (MSE): where E is the expectation of (X 1 , . . ., X n ), i.e.
What we are called to do is to determine the proper assumptions on the symbols k, so that the MSE of the corresponding kernel density estimator fn,h to be optimally estimated, provided that the unknown density f belongs to a certain Hölder space.
The main new result of this paper is the following: and for some r > τ + d, We pick h = h n = n − 1 2s+d .Then for every n ∈ N the corresponding kde fn,h satisfies where the constant c > 0, depends only on τ, s, C τ and the structural constants of the setting, while C(f ) is given by While approaching the proof of Theorem 3.1 we will have the opportunity to present the action on the setting and highlighting the correspondence with the classical Euclidean framework.
We first take a closer look at the assumptions on the symbol generating the kernels.We restrict ourselves to the example of M = R d .As we saw in Example 2.3, the radial extension k of the symbol k is the Fourier transform of the function κ, which yields the usual kde as in (2.30).Translating the assumptions of Theorem 3.1 in the Fourier transform language we get the usual assumptions on the κ for Euclidean spaces; (α) Assumption (3.37), means simply that the κ enjoys vanishing moments up to some certain order.
(β) Assumption (3.38), thanks to Theorem 2.1 and (2.21), ensures that The assumption k(0) = 1, simply asserts that The standard approach for dealing with the MSE is to decompose it as follows: where the function σ 2 (x) is the variance of the estimator fn,h (x), i.e. (3.43) We will separate the proof of Theorem 3.1 in the two usual steps: the estimation of the variance and the estimation of the bias, but before the proof, we provide some remarks below.
(β) For latter use we state that our choice of h n gives h n → 0 and nh d n → ∞ when n → ∞.
The following simple inequality is established in Coulhon et al. (2012) under more general assumptions.Here we express it for Ahlfors regular spaces and we give its proof for having the opportunity to present some first calculations on metric spaces: Lemma 3.3.If τ > d, there exists a constant c = c τ > 0 such that for every h > 0 and x ∈ M Proof.We split the metric space as where M 0 := B(x, h) and Of course, Combining all the above and since we assumed that τ > d we conclude to Let us point out that the above estimate is sharp in the sense that (3.47) It is well known that such an integral is classical on the Euclidean space and can be handled using (generalized) polar coordinates, exactly as we did in (3.41).On an abstract Ahlfors regular metric space it can be sharply estimated as above.
3.1.Estimation of the variance.We proceed to the first step of the proof of Theorem 3.1.We will estimate the variance of bounded densities.As always there is not any regularity required.The main tools are Theorem 2.1 and Lemma 3.3.Proposition 3.4.Let f ∈ L ∞ , τ > d and a multiplier k ∈ C τ (R + ) satisfying (3.37) and (3.38).Then for every x ∈ M, 0 < h ≤ 1 and n ∈ N , where the constant C 1 > 0 depends only on τ, C τ and the structural constants.
Proof.Recalling the results expanded in §2, the spectral multiplier K h associated with the dilated symbol k h (λ) = k(hλ), is an integral operator with kernel K h (x, y), x, y ∈ M.
We introduce the random variables and we observe that η 1 (x), . . ., η n (x) are iid random variables with E[η i (x)] = 0, for i = 1, . . ., n.For their variance we have By Theorem 2.1 we have the certain bounds where for the ultimate inequality we used (3.46).
We observe that Bearing in mind that the independent random variables η i have zero mean and guided by (3.52) and (3.53) we arrive at 3.2.Estimation of the bias.We will estimate the bias under the assumption that the pdf f lies in the Hölder space.
Proof.Since X i are iid with common density f , we obtain where I the identity operator on M and K h = k h ( √ L) the spectral multiplier associated with the dilated symbol k h and the operator √ L, as in §2.For the given bandwidth 0 < h ≤ 1 there exists a unique integer i ∈ N 0 such that We consider the symbol ψ ∈ C ∞ (R + ) with supp ψ ⊂ [0, 2], ψ(λ) = 1, for every λ ∈ [0, 1] and 0 ≤ ψ(λ) ≤ 1, for every λ ∈ [0, 2].
We set ϕ(λ) By the construction of the above functions, it turns out that Then by (Coulhon et al., 2012, Corollary 3.9) where by the capital Ψ 2 −i and Φ 2 −j we denoted the spectral multipliers as in Section 2.1; We set := s and we introduce the symbols We proceed to justify that the assumptions of Theorem 2.1 are fulfilled for the symbols g j , j ≥ i, using Remark 2.2 (β).
Moreover by the vanishing derivatives' assumption (3.37), the decay in (3.38) and the bottom of the support of ϕ, we obtain after some calculus that where the above constant c(τ, ) > 0 is independent of j.
By Theorem 2.1, coupled with Remark 2.2(β), the spectral multipliers G j 2 −j = g j (2 −j √ L), j ≥ i, are integral operators and their corresponding kernels G j 2 −j (x, y) present the behaviour By the definition of the symbols g j , j ≥ i in (3.59) we get Combining (3.56), (3.58) with (3.61) and (3.62) we obtain the expansion Since G j 2 −j are integral operators and because of g j (0) = 0, for every j ≥ i, using (2.20) we express G j 2 −j L /2 f (x) as The membership of f in the Hölder space Ḣs implies that We equip (3.63) with (3.64), (3.60) and (3.65) to arrive at the expression (3.66) where I 2 −j ,τ −s (x), as in Lemma 3.3.Thanks to the assumption τ > d+s, by (3.46), the fact that s > 0 and (3.57) we conclude that and the proof is complete.
End of the proof of Theorem 3.1.We combine Propositions 3.4 and 3.5 to conclude the proof of Theorem 3.1 in the standard way.
3.3.Kernel density estimators on the sphere.The shape of the Earth justifies the unit sphere S 2 of R 3 as the most important domain for the purposes of several sciences.In the present paper we study earthquakes that are the subject of seismology, but many other sciences like astrophysics, environment and geology could be interested in this geometry too.We describe how the kernels obtained in Section 2.2 should be used in a data analysis.
We consider the symbols for σ ∈ N, with σ > 1. Evidently for every σ > 1, the symbol g σ is an even function such that g σ ∈ C σ−1 (R), g σ (0) = 1, (g σ ) (ν) (0) = 0, for every 1 ≤ ν ≤ σ − 1 and presents the decay as in (3.38) for r = σ.Such symbols are suitable for generating kdes by choosing the appropriate value of σ depending on the dimension d and the regularity s and then the appropriate bandwidth h depending also on the datasize n.
For the purpose of our present study, we restrict our attention on the case of the unit sphere M = S 2 .
Let s > 0 and denote by s the smallest integer strictly grater than s.The symbols (3.68) for r = σ := 5 + s satisfy the assumptions of Theorem 3.1 for densities on Ḣs .
The expression (2.23) could be used by R (or Python etc) after the infinite series be truncated until some certain integer N ∈ N, namely Note that Moreover, for the Legendre polynomials it is well-known that |P ν (u)| ≤ 1, for every u ∈ [−1, 1] and of course 1+2ν 4π ≤ 0.51 ν π , for every ν ≥ 25.Then the error (absolute value of the difference) because of the truncation of (2.23) until the order N ≥ 24 can be safely bounded from above by error Recall that by Theorem 3.1 h = n −1/(2s+2) , where n is our datasize.Expression (3.70) asserts that an effectively large N could provide a certain error-bound when the datasize n and the regularity s are considered as fixed.
As an example, take s ∈ (0, 1] (the less restrictive range) which corresponds to the value r = 6.In this case and after setting h = n −1/2(s+1) the error is at most In a specific data analysis, with a given datasize n, one should bear in mind to respect (3.71) for the hypothetical smoothness' level s and obtain appropriate values for the error is pre-defined as "suitable".For a data analysis of earthquakes the reader is referred to Section 5.

Spaces of homogeneous type
We proceed to more general assumptions than those in Section 1. Specifically, we no longer assume that our space enjoys the Ahlfors regularity, rather than the so-called doubling volume property (4.72) below.Such a setting is what is referred as a space of homogeneous type.
We replace Assumption I with the following: (δ) In the framework of spaces of homogeneous type we replace the kernels defined in (2.17 On the background results: (i) Theorem 2.1 holds as it is, but with the kernels D h,τ as in (4.77).The last implies that the kernel in (4.77) is estimated by: We are now in place to express the boundedness of the Mean Squared Error on the more general setting of spaces of homogeneous type associated with operators: and for some r > τ + d, We pick h = h n = n − 1 2s+d .Then for every n ∈ N the corresponding kde fn,h satisfies where the constant c > 0, depends only on τ, s, c τ , β and the structural constants of the setting, while C(f ) is given by In the light of (3.42), we need to bound the variance and the bias in a similar manner as in Propositions 3.4 and 3.5.
We start with the variance.By (3.51) and (4.80) we get the behaviour The last replaced in (3.52) implies since τ > 3d/2 + s > d, where we used (4.78) and (4.76) respectively.Now the estimation of the variance in (3.48) follows as in (3.53).
We proceed to bound the bias as in Proposition 3.5.Recall that := s .This time the kernels G j 2 −j (x, y) enjoy the behaviour thanks to (4.80).Then where we used (4.78) which is valid because τ > 3d/2 + s.
The reminder of the proof is the same as in the proof of Theorem 3.1.
Let us close this section with some comments on the geometric assumptions.
Obviously the doubling property (4.72) is more general assumption than Ahlfors regularity (1.1).A simple illustrative example could be the case of the weighted ball, B m := x ∈ R m : x < 1 , of R m with the distance (2.24) and the weighted measure; Dai and Xu (2013) (4.86) As in Dai and Xu (2013) we have that which implies that (M, ρ, µ γ ) satisfies the doubling property (4.72) and the noncollapsing condition (4.73).Precisely, the homogeneous dimension is d = m + 2 max(γ, 0).Clearly m ≤ d, with the equality exactly when γ = 0, which correspond to the unweighted case and widows the space as an Ahlfors regular one.
Moreover, it is now apparent how the homogeneous dimension, fundamentally depends on the measure of the space and may or may not be an integer.
Note also that in the weighted case, the proper operator is which satisfies Assumption II; see Dai and Xu (2013); Kerkyacharian et al. (2020).
Note finally that the non-collapsing condition holds true for every space (M, ρ, µ) of homogeneous type which is of finite measure µ(M) < ∞; see Coulhon et al. (2012).

Data Illustration
In this data illustration, we use earthquake location data for all earthquakes with a reported magnitude of 6.5 or higher between 1990-2021 (inclusive).These data are freely available through the United States Geological Survey website https: //earthquake.usgs.gov/earthquakes/search/.In total, there are n = 1507 earthquakes that fit these criteria, and we plot the location of these earthquakes in Figure 1.
To explain some of the earthquake patterns in Figure 1, we briefly discuss plate tectonics.The Earth's crust or lithosphere is divided into distinct and irregular sections of solid rock called tectonic plates.The tectonic plates float and gradually move on the molten rock of the Earth's mantle.Many geological events (e.g., volcanic eruptions and earthquakes) occur where different tectonic plates meet.For this reason, high magnitude earthquakes are highly concentrated around tectonic plate boundaries, and this global network of plate boundaries is evident in the earthquake patterns in Figure 1.Earthquakes also occur elsewhere in the world at lower rates.
The Circum-Pacific Belt (the west coasts of the American continents, from Alaska to East Asia, stretching down to the Pacific Islands), sometimes called the Pacific Rim, is the most seismically active.Note that the Pacific Islands (e.g., Tonga, Fiji, New Zealand, and New Caledonia) appear on the left and right of Figure 1.The entire Pacific Rim has high concentrations of high magnitude earthquakes, but we point out two other areas with very high concentrations of earthquakes.There are many earthquakes in a small area around the South Sandwich Islands, including earthquakes with 7.5 and 8.1 magnitudes on August 12, 2021.We also point out the Alpide Belt, a region that runs along the Azores, the Mediterranean, the Middle East, the Himalayas, Indonesia, and connects to the Pacific Rim in the Pacific Islands.Given the distribution of earthquakes seen here, we anticipate the need for a heterogeneous density estimate.
These data are distributed globally and are indexed on the sphere.To estimate the density of earthquakes, we approximate the density estimator in (2.31) by selecting a finite truncation point N , (5.88) fn,h,N (ξ) := where X i are earthquake locations, k(•) is defined as 3.68 with r = 5 + s , and P ν are Legendre polynomials.
Because the truncation induces error in the estimation, we anticipate that lower values of N will decrease the accuracy.In Figure 2, given n, we plot the theoretical upper bound of the truncation error in (3.70) against the truncation point for Because (5.88) is not guaranteed to be positive, we use a rectified density estimate, (5.89) f * n,h,N (ξ) = max(10 −3 , fn,h,N (ξ)).In our analysis of these data, we explore the effect of the bandwidth and the truncation point of (5.88) on the density estimates of earthquake locations.We use out-ofsample performance to determine the bandwidth and truncation point.Specifically, we randomly hold out 20% of the earthquakes as a test dataset {X test 1 , ..., X test ntest } to validate the density estimator.Using many different bandwidth and truncation point combinations, we calculate density estimators f * ntrain,h,N (ξ) using the remaining 80% of the data.Then, at the hold-out locations, we evaluate f * ntrain,h,N (X test 1 ), . . ., f * ntrain,h,N (X test ntest ).Using these evaluations, we compute the out-of-sample mean log-loss (negative logscore) at the hold out locations ) .
The use of log-loss as a proper scoring rule is common; see for example (Good (1952), Gneiting and Raftery (2007)).
Rather than consider bandwidth directly, we let h = n −1/(2s+2) and consider s ∈ {0.001, 0.01, 0.05, 0.5, 1}, where s indexes the smoothness of the density (See Section 2.4).Based on how concentrated earthquake events are, small values of s (i.e., smaller bandwidths) are preferable to smoother alternatives that will yield more uniform density estimators.In our analysis, we also vary the truncation point of the density estimator N ∈ {5, 10, 20, 30, 40, 50, 75, 100}.Larger values of N yielded no improvement in log-loss.We select the truncation point and bandwidth with the lowest out-of-sample mean log-loss.
We plot the mean log loss as a function of truncation point for various values of s in Figure 3. Overall, for each s, increasing N improves out-of-sample performance up to a point; then, improvement flattens and appears to reach an asymptote.In addition, smaller values of s have better out-of-sample performance; however, values of s less that 0.01 do not change model performance.The best out-of-sample performance (lowest log loss) is with N = 75, and there is no appreciable difference between s = 0.01 and s = 0.001 (or even smaller values of s).For this reason, we use s = 0.01.
For N = 75 and s = 0.01, we plot a heat map of the estimated density over a fine grid over the Earth in Figure 4.The colors are on a natural log scale to better see variability in the density.The Pacific Rim is evident in the density estimate, but the most striking features are the high estimated densities around the Pacific Islands and the Pacific Rim.We also note that the South Sandwich Islands and the Alpide belt are visible for their relatively high earthquake densities.
In this data illustration, we considered a global dataset of earthquakes from 2021.We compared many truncated kernel density estimators indexed on the sphere on a test set.We found that the kernel density estimates perform better out-of-sample with shorter bandwidths (smaller s) and more polynomial terms (higher N ).For the best combination of s and N considered, we plot the estimated density and comment on its features.In future analyses of these data, one may also estimate the density of earthquake magnitude.In these cases, it may be beneficial to allow the magnitude density to vary smoothly over space as in Sheanshang et al. (2021).In addition, one may account for aftershock excitation in the estimation, as is sometimes used in point process methodology (see, e.g., Hawkes (1971a), Hawkes (1971b), Ogata (1988), Ogata (1998), White and Gelfand (2021)).

Data availability statement:
The data used in this analysis are freely available through the United States Geological Survey website https://earthquake.usgs.gov/earthquakes/search/

( i )
Ahlfors regularity: There exist constants c 1 ≥ 1 and d > 0 such that (1.1) c −1 1 r d ≤ |B(x, r)| ≤ c 1 r d for every x ∈ M and r > 0, where |B(x, r)| is the volume of the open ball B(x, r) := {y ∈ M : ρ(x, y) < r} centred at x of radius r.

2. 2 .
Examples.We present the most basic examples of spaces (M, ρ, µ, L) falling under our umbrella.For more examples we refer to Cleanthous et al. (2020); Kerkyacharian et al. (

2. 3 .
Kernel density estimators on (M, L).We are now ready to present kernel density estimators on the current general setting as introduced inCleanthous et al. (2020).
(a) Doubling volume condition: There exists a constant c 0 > 1 such that (4.72) 0 < |B(x, 2r)| ≤ c 0 |B(x, r)| < ∞ for all x ∈ M and r > 0, where |B(x, r)| is the volume of the open ball B(x, r) centred at x of radius r.(b) Noncollapsing condition: There exists a constant c 1 > 0 such that (4.73) inf x∈M |B(x, 1)| ≥ c 1 .We modify Assumption II by replacing the factor t −din order: (α) Of course (a) and (b) hold trivially true under (i).(β) From (4.72) it follows that there exist c 0 > 0 and d > 0 such that (4.75) |B(x, λr)| ≤ c 0 λ d |B(x, r)| for every x ∈ M, r > 0, and λ > 1, the constant d above d is referred as the homogeneous dimension of (M, ρ, µ).This generalizes effectively the Ahlfors dimension used in the previous sections.(γ) A connection between the volume of balls of small radius, with their radius and the dimension comes from (4.73) and (4.75): (4.76) |B(x, r)| ≥ cr d , x ∈ M, 0 < r ≤ 1.

Figure 2 .
Figure 2. The theoretical upper bound on the truncation error from (3.70) for combinations of s and N .The dashed line indicates an error of 0.01.

−
log f * ntrain,h,N (X test i

Figure 3 .
Figure 3.The mean out-of-sample log loss (negative log-score) for various combinations of s and N .

Figure 4 .
Figure 4. Kernel density estimate for the 1990-2021 earthquake data for s = 0.01 and N = 75.The colors of the density are presented on the log scale.