1 Introduction

Today, technology has equipped science with a massive amount of data that requires rigorous analysis. In astronomy, data can come from missions to other planets, telescopes observing distant parts of the universe, or programs studying Cosmic Microwave Background Radiation. In climatology and environmental science, sensors provide data on the atmosphere. Medicine uses scans to track the growth of tumors and monitor their development, while embryology uses data to track the growth and ensure the health of developing humans. Essentially all scientific fields now heavily rely on data.

The complexity and form of the data reflect their nature. In the examples mentioned above, the data can be represented by geometric structures that capture their form and dynamics. A dataset should be understood as a collection of independent realizations of a random variable (rv) X. Such a rv lives in a domain determined by its nature. For instance, when X represents locations on some planet, X lives on the sphere \({\mathbb S}^2\) of the Euclidean space \({\mathbb {R}}^3\); the same is true of CMB radiation. For geological data in the interior of the Earth, or of another celestial body, the domain of study may be the ball \({\mathbb B}^3\). Similarly, in medicine, the domain of definition of X can become much more complicated geometrically, and as a result the general target domain becomes an abstract metric space \(\mathcal {M}\).

Let X be a rv distributed on a metric measure space \(\mathcal {M}\) and let \(f=f_{X}\) be its unknown probability density function (pdf). Density estimation, that is, estimating a pdf from data \(X_1,\dots ,X_n\), is an important problem in Statistics. To this end we need to construct a density estimator, which is an object of the form \({\hat{f}}_{n}(X_1,\dots ,X_n;x)\), where \({\hat{f}}_n:\mathcal {M}^{n}\times \mathcal {M}\rightarrow {\mathbb R}\) is a measurable function. A famous method for obtaining such an estimator is via the so-called “kernel density estimators”.

Nonparametric Statistics approaches the problem of density estimation by constructing appropriate kernel density estimators, which can approximate any density belonging to certain regularity spaces. Historically, these methods were pioneered by Rosenblatt (1956), Parzen (1962) and Bretagnolle and Huber (1979). The first books on the topic include Silverman (1986) and Härdle et al. (1998), while today the book Tsybakov (2009) is considered one of the main reference points. For an indicative list of contributions we refer to Baldi et al. (2009), Baraud et al. (2014), Bates and Mio (2014), Berry and Sauer (2017), Birgé (2014), Devroye and Györfi (1985), Devroye and Lugosi (1996), Devroye and Lugosi (1997), Donoho et al. (1996), Efroimovich (1986), Goldenshluger and Lepski (2014), Goldenshluger and Lepski (2011a, 2011b, 2022a, 2022b), Hall et al. (1987), Hasminskii and Ibragimov (1990), Ibragimov and Khasminski (1980), Kerkyacharian et al. (1996), Kerkyacharian et al. (2001), Kerkyacharian et al. (2008), Massart (2007), Pelletier (2005), Pelletier (2006), Rigollet (2006), Rigollet and Tsybakov (2007), Samarov and Tsybakov (2007).

Here we study kernel density estimators on metric measure spaces under very broad assumptions. The setting in which we work simultaneously covers the classical cases of the Euclidean space \({\mathbb R}^d\), the sphere \({\mathbb {S}}^{d}\), the ball \({\mathbb {B}}^d\) and many more significant examples of independent interest. Furthermore, it contains more sophisticated geometric settings such as manifolds and Lie groups. On the other hand, techniques originating from spectral theory will simplify and unify several aspects of the approach. We shall operate in the setting put forward in Coulhon et al. (2012), which we describe next in a simplified form:

I. We assume that \((\mathcal {M},\rho ,\mu )\) is a metric measure space such that \((\mathcal {M}, \rho )\) is locally compact with distance \(\rho (\cdot , \cdot )\) and \(\mu \) is a positive Radon measure satisfying:

(i) Ahlfors regularity: There exist constants \(c_1\ge 1\) and \(d>0\) such that

$$\begin{aligned} c_1^{-1}r^d\le |B(x,r)| \le c_1 r^{d} \quad \hbox {for every }x \in \mathcal {M}\hbox { and }r>0, \end{aligned}$$
(1.1)

where |B(x, r)| is the volume of the open ball \(B(x,r):=\{y\in \mathcal {M}:\rho (x,y)<r\}\) centred at x of radius r.

The number d is the so-called Ahlfors dimension of the space.
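For orientation, Ahlfors regularity is immediate in the basic Euclidean case, where \(|B(x,r)|=c_d r^d\) with \(c_d\) the volume of the unit ball. The following minimal sketch (plain Python, purely illustrative and not part of the paper's machinery) checks (1.1) with \(c_1=\max (c_d,c_d^{-1})\):

```python
import math

def ball_volume(d, r):
    # Lebesgue volume of a Euclidean ball of radius r in R^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

# Ahlfors regularity (1.1) on R^d: |B(x, r)| = c_d * r^d exactly,
# so c_1 = max(c_d, 1/c_d) works for every centre x and every radius r.
d = 3
c_d = ball_volume(d, 1.0)
c1 = max(c_d, 1.0 / c_d)
for r in (0.1, 1.0, 10.0):
    vol = ball_volume(d, r)
    assert c1 ** -1 * r ** d <= vol <= c1 * r ** d
```

In this case the Ahlfors dimension d coincides with the usual Euclidean dimension.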

II. The second assumption is that there exists an essentially self-adjoint non-negative operator L on \({\mathbb L}^2(\mathcal {M}, d\mu )\), mapping real-valued to real-valued functions, such that the associated semigroup (more details in Sect. 2) \(P_t=e^{-tL}\), \(t>0\), consists of integral operators with (heat) kernel \(p_t(x,y)\) obeying the conditions:

(ii) Gaussian localization: There exist constants \(c_2,c_3>0\) such that

$$\begin{aligned} |p_t(x,y)| \le c_2 t^{-d/2}\exp \Big \{-c_3\frac{\rho ^2(x,y)}{t}\Big \} \quad \hbox {for every} \;\;x,y\in \mathcal {M},\,t>0. \end{aligned}$$
(1.2)

(iii) Hölder continuity: There exists a constant \(\alpha >0\) such that

$$\begin{aligned} \big | p_t(x,y) - p_t(x,y') \big | \le c_2\Big (\frac{\rho (y,y')}{\sqrt{t}}\Big )^\alpha t^{-d/2}\exp \Big \{-c_3\frac{\rho ^2(x,y)}{t}\Big \} \end{aligned}$$
(1.3)

for every \(x, y, y'\in \mathcal {M}\) such that \(\rho (y,y')\le \sqrt{t}\) and \(t>0\).

(iv) Markov property:

$$\begin{aligned} \int _{\mathcal {M}} p_t(x,y) d\mu (y)= 1 \quad \hbox {for every }x\in \mathcal {M}\hbox { and } t >0. \end{aligned}$$
(1.4)
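Conditions (ii)–(iv) are modeled on the classical heat kernel \(p_t(x,y)=(4\pi t)^{-d/2}e^{-|x-y|^2/4t}\) of the Euclidean semigroup \(e^{t\Delta }\), for which (1.2) holds with equality when \(c_2=(4\pi )^{-d/2}\) and \(c_3=1/4\). A quick numerical sanity check in dimension one (illustrative only):

```python
import math

def heat_kernel(x, y, t):
    # Euclidean heat kernel on R (d = 1): the kernel of e^{t Laplacian}
    return (4 * math.pi * t) ** -0.5 * math.exp(-(x - y) ** 2 / (4 * t))

t = 0.5
c2, c3 = (4 * math.pi) ** -0.5, 0.25
# (ii) Gaussian localization (1.2), here attained with equality
for x in (-1.0, 0.0, 2.0):
    p = heat_kernel(x, 0.0, t)
    assert p <= c2 * t ** -0.5 * math.exp(-c3 * x ** 2 / t) + 1e-12

# (iv) Markov property (1.4): the kernel integrates to one in y
mass = sum(heat_kernel(0.0, -20 + 0.01 * i, t) * 0.01 for i in range(4001))
assert abs(mass - 1.0) < 1e-3
```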

The setting we study generalizes (by default) the Euclidean space. Moreover, it contains spaces like the sphere, the ball, the interval, cubes/rectangles, the simplex, Riemannian manifolds with non-negative Ricci curvature and more, each equipped with its natural metric and measure and associated with Laplace or Laplace–Beltrami operators. For more examples we refer the reader to Coulhon et al. (2012), Georgiadis and Nielsen (2017), Kerkyacharian et al. (2020), Kerkyacharian and Petrushev (2015).

Some first contributions in Statistics in this generality can be found in Castillo et al. (2014), Cleanthous et al. (2020, 2022), Kerkyacharian et al. (2018), while a large number of open problems remain for the community.

The aim of the present study is threefold:

\((\alpha )\) To review the setting and the construction of kernel density estimators together with the corresponding theoretical background, which is demanding, on the broad framework under study; Sect. 2.

\((\beta )\) As novel results, we obtain optimal pointwise density estimation on Hölder spaces (see Sects. 3 and 4), and we shed light on the assumptions and methods.

\((\gamma )\) As an application, we perform a data analysis of earthquakes (Sect. 5) using our kernel density estimators. Precisely, we compare the out-of-sample performance of several approximated kernel density estimators and plot the heat map of the estimated density using the selected model.

Remarks and examples are placed at several points of the manuscript to highlight notions and ideas. The new results are contained in Sect. 3, and under more general assumptions in Sect. 4, and are accompanied by remarks that could serve future studies.

Section 5 is dedicated to the data analysis of earthquakes. In this Section we apply the theoretical results of the paper and show how one can use these approaches with occurrence data on the Earth. The data used in this analysis are freely available through the United States Geological Survey website https://earthquake.usgs.gov/earthquakes/search/.

Notation: Throughout, positive constants will be denoted by c, and will be allowed to vary at every occurrence. The dependence of a constant on the geometric structure constants \(c_1,c_2,c_3,\alpha \) and d will not be stated, but its dependence on a parameter q will be indicated as \(c_q\). We denote by \({\mathbb N},\;{\mathbb R},\;{\mathbb R}_+\) the sets of positive integers, real numbers and non-negative real numbers, respectively. If \(\tau \in {\mathbb N}\), the class of differentiable functions on \({\mathbb R}_+\) with continuous derivatives up to order \(\tau \) will be denoted by \(\mathcal {C}^{\tau }({\mathbb R}_+)\). For \(s>0\), we will denote by \(\lfloor s\rfloor \) the greatest integer strictly less than s and by \(\lceil s\rceil \) the smallest integer strictly larger than s.

2 Density estimation on metric spaces associated with operators: A review

The first part of our study consists of a review of density estimation on metric spaces associated with operators. One of the milestones is the construction of kernels. We expand here the methods used in Cleanthous et al. (2020, 2022), inspired by the corresponding machinery built in Coulhon et al. (2012), which rests on powerful spectral theory.

2.1 Functional calculus

We start with some fundamental notions of Spectral Theory, providing a minimal background on this wide scientific field; the reader is further referred to Prugovečki (1981), Reed and Simon (1980), Yoshida (1978).

Recall that L is assumed to be a non-negative self-adjoint operator that maps real-valued to real-valued functions. Then (Prugovečki 1981, Section 5) L admits a unique spectral measure E; that is, a projector-valued mapping defined as follows:

Denote by \(\mathcal {B}\) the Borel \(\sigma \)-algebra on \({\mathbb R}\). To every \(S\in \mathcal {B}\) we associate an orthogonal projection \(E(S):{\mathbb L}^2(\mathcal {M},d\mu )\rightarrow {\mathbb L}^2(\mathcal {M},d\mu )\) such that:

(i) \(E({\mathbb R})=I\) (the identity operator on \({\mathbb L}^2(\mathcal {M},d\mu )\)).

(ii) For every sequence of disjoint Borel sets \(\{S_n\}_{n\in {\mathbb N}}\subset \mathcal {B}\)

$$\begin{aligned} E(S)=\sum _{n=1}^{\infty }E(S_n),\quad \text {where}\;\; S:=\bigcup _{n=1}^{\infty }S_{n}, \end{aligned}$$
(2.1)

in the strong \({\mathbb L}^2(\mathcal {M},d\mu )\) sense; i.e. for every \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\),

$$\begin{aligned} \left\| \left( E(S)-\sum _{n=1}^{N}E(S_n)\right) f\right\| _2\xrightarrow {N\rightarrow \infty }0. \end{aligned}$$
(2.2)

Thanks to (ii), for every \(f,g\in {\mathbb L}^2(\mathcal {M},d\mu )\) the set-function

$$\begin{aligned} \nu _{f,g}(S):=\langle E(S)f,g\rangle ,\quad \text {for every }\;S\in \mathcal {B}, \end{aligned}$$
(2.3)

is a complex measure on \(({\mathbb R},\mathcal {B})\).

Moreover for every \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\) the set-function

$$\begin{aligned} \nu _{f}(S)&:=\nu _{f,f}(S)=\langle E(S)f,f\rangle \\&=\langle E(S)^2 f,f\rangle =\langle E(S)f,E(S)f\rangle =\Vert E(S)f\Vert _2^2,\quad S\in \mathcal {B}\end{aligned}$$

is a measure on \(({\mathbb R},\mathcal {B})\), which is finite and precisely \(\nu _{f}({\mathbb R})=\Vert f\Vert _2^2<\infty \).

The study can be slightly simplified by means of the following projector-valued function

$$\begin{aligned} {\mathbb R}\ni \lambda \mapsto E_{\lambda }:=E(I_\lambda ),\quad I_{\lambda }:=(-\infty ,\lambda ] \end{aligned}$$
(2.4)

which is referred to as the spectral resolution of L. Moreover, for every \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\) and every \(\lambda \in {\mathbb R}\), we have \(\nu _{f}(\lambda ):=\nu _{f}(I_\lambda )=\langle E_{\lambda }f,f\rangle =\Vert E_{\lambda }f\Vert _2^2\).

Given further that L is assumed non-negative, by Prugovečki (1981, Theorem 6.3), the domain \(\text {Dom}(L)\) of L consists of all functions \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\) such that

$$\begin{aligned} \int _{0}^{\infty }\lambda ^2 d\nu _{f}(\lambda )=\int _{0}^{\infty }\lambda ^2 d\langle E_{\lambda }f,f\rangle <\infty . \end{aligned}$$
(2.5)

Moreover for every \(f\in \text {Dom}(L)\) and \(g\in {\mathbb L}^2(\mathcal {M},d\mu )\)

$$\begin{aligned} \langle Lf,g\rangle =\int _{0}^{\infty } \lambda d\langle E_{\lambda }f,g\rangle . \end{aligned}$$
(2.6)

It is customary to write symbolically

$$\begin{aligned} L=\int _{0}^{\infty }\lambda dE_{\lambda }, \end{aligned}$$
(2.7)

the so-called spectral decomposition of L.

The next logical step is the functional calculus associated with the operator L; see also Reed and Simon (1980, Theorem VIII.5).

Let \(g:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a Borel measurable function. Then the operator g(L), defined on

$$\begin{aligned} \text {Dom}(g):=\left\{ \varphi \in {\mathbb L}^2(\mathcal {M},d\mu ):\;\int _{0}^{\infty }|g(\lambda )|^2 d\langle E_{\lambda }\varphi ,\varphi \rangle <\infty \right\} , \end{aligned}$$
(2.8)

as

$$\begin{aligned} \langle g(L)\varphi ,\psi \rangle =\int _{0}^{\infty }g(\lambda ) d\langle E_{\lambda }\varphi ,\psi \rangle ,\quad \text {for every }\varphi \in \text {Dom}(g),\;\psi \in {\mathbb L}^2(\mathcal {M},d\mu ),\nonumber \\ \end{aligned}$$
(2.9)

is a self-adjoint operator mapping real-valued functions to real-valued functions. If g is further assumed to be bounded, then \(\text {Dom}(g)={\mathbb L}^2(\mathcal {M},d\mu )\) and \(g(L):{\mathbb L}^2(\mathcal {M},d\mu )\rightarrow {\mathbb L}^2(\mathcal {M},d\mu )\) is a bounded operator. The operator g(L) is referred to as the spectral multiplier associated with g and L, and it is symbolically expressed —in the spirit of (2.7)— as

$$\begin{aligned} g(L)=\int _{0}^{\infty }g(\lambda ) dE_{\lambda }. \end{aligned}$$
(2.10)

The above spectral multipliers can take an explicit form for particular operators L and metric spaces, as we will see in Sect. 2.2.

For the purpose of the study of kernel density estimators we turn our attention to spectral multipliers associated with the operator \(\sqrt{L}\), which is well-defined and self-adjoint; see Yoshida (1978). The exact reasons behind the switch to \(\sqrt{L}\) are discussed in Cleanthous et al. (2022, Remark 2.2.\((\alpha )\)).

We denote by \(\{F_{\lambda }:\lambda \ge 0\}\) the spectral resolution of \(\sqrt{L}\). Then \(F_{\lambda }=E_{\lambda ^2}\) and for every Borel measurable \(g:{\mathbb R}_{+}\rightarrow {\mathbb R}\) it holds

$$\begin{aligned} g(\sqrt{L})=\int _{0}^{\infty }g(\lambda )dF_{\lambda }=\int _{0}^{\infty }g(\sqrt{\lambda })dE_{\lambda }. \end{aligned}$$
(2.11)

Summary and notation For the rest of our study we fix the following terminology and notation.

As a symbol we will refer to a Borel measurable and bounded function

$$\begin{aligned} k:{\mathbb R}_+\rightarrow {\mathbb R}\quad \text {(symbol)}. \end{aligned}$$

The spectral multiplier associated with the symbol k and the operator \(\sqrt{L}\) as in (2.11) will be denoted by the corresponding capital letter:

$$\begin{aligned} K:=k(\sqrt{L}):{\mathbb L}^2(\mathcal {M},d\mu )\rightarrow {\mathbb L}^2(\mathcal {M},d\mu )\quad \text {(spectral multiplier)}. \end{aligned}$$

By the above discussion, the operator K:

(i) is bounded on \({\mathbb L}^2(\mathcal {M},d\mu )\),

(ii) is self-adjoint, and

(iii) maps real-valued functions to real-valued functions.

For the purpose of kernel density estimation we are interested in the following class of operators: we say that K is an integral operator when there exists a measurable function \(\mathcal {K}(x, y)\) —referred to as the kernel of the operator K—

$$\begin{aligned} \mathcal {M}\times \mathcal {M}\ni (x,y)\mapsto \mathcal {K}(x,y)\in {\mathbb R}\quad (\text {kernel}) \end{aligned}$$

such that

$$\begin{aligned} K(f)(x)=\int _{\mathcal {M}}\mathcal {K}(x,y)f(y)d\mu (y),\quad f\in \text {Dom}(K),\;x\in \text {Dom}(f). \end{aligned}$$
(2.12)

Note further that when the spectral multiplier \(K=k(\sqrt{L})\) is an integral operator, its kernel is real-valued and symmetric:

$$\begin{aligned} \mathcal {K}(x,y)=\mathcal {K}(y,x)\in {\mathbb R}. \end{aligned}$$

Such kernels are exactly the objects we will use for the kernel density estimation.

As always, we need a notion of dilation suitable for use in the current framework.

Let \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a symbol, let K be the spectral multiplier associated with k and \(\sqrt{L}\), and assume that K is an integral operator with kernel \(\mathcal {K}(x,y)\), as above. For every \(h>0\) we denote by

(i) \(k_{h}(\lambda ):=k(h\lambda ),\;\lambda \in {\mathbb R}_+\), the symbol induced by k dilated by h.

(ii) \(K_h=k_h(\sqrt{L})=k(h\sqrt{L})\), the spectral multiplier associated with \(k_h\) and \(\sqrt{L}\).

(iii) \(\mathcal {K}_h(x,y)=\mathcal {K}_h(y,x)\), the symmetric real-valued kernel of \(K_h\).

We shall need the following result from the smooth functional calculus induced by the heat kernel, developed in Coulhon et al. (2012) and Kerkyacharian and Petrushev (2015). We first fix the following notation: let \(h>0\) and \(\tau >0\). We denote by

$$\begin{aligned} \mathcal {D}_{h,\tau }(x,y):=h^{-d}\big (1+h^{-1}\rho (x,y)\big )^{-\tau },\quad \text {for}\;x,y\in \mathcal {M}. \end{aligned}$$
(2.13)

Theorem 2.1

Suppose \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) is a symbol such that \(k\in \mathcal {C}^\tau ({\mathbb R}_+)\) for some \(\tau >d\),

$$\begin{aligned} |k^{(\nu )}(\lambda )|\le C_\tau (1+\lambda )^{-r} \quad \text {for every}\; \lambda \ge 0\;\text {and}\;0\le \nu \le \tau ,\;\text {where}\; r > \tau +d,\nonumber \\ \end{aligned}$$
(2.14)

and \(k^{(2\nu +1)}(0)=0\) for every \(\nu \ge 0\) such that \(1\le 2\nu +1 \le \tau \).

Then \(K_h\), \(h>0\), is an integral operator with kernel \(\mathcal {K}_h(x, y)\) satisfying

$$\begin{aligned} \big |\mathcal {K}_h(x, y)\big | \le cC_\tau \mathcal {D}_{h,\tau }(x,y), \end{aligned}$$
(2.15)

where \(c>0\) is a constant depending on \(\tau \) and the structural geometric constants of the setting.

Moreover, for every \(h>0\) and \(x\in \mathcal {M}\)

$$\begin{aligned} \int _{\mathcal {M}} \mathcal {K}_h(x, y)d\mu (y)=k(0). \end{aligned}$$
(2.16)

Remark 2.2

Let us comment on Theorem 2.1.

\((\alpha )\) Let \(k\in \mathcal {C}^{\tau }({\mathbb R})\) be an even function. Then the assumption \(k^{(2\nu +1)}(0)=0\), \(0<2\nu +1\le \tau \), holds automatically.

\((\beta )\) Let \(k\in \mathcal {C}^{\tau }({\mathbb R}_+)\) be such that \(\mathrm{{supp}\, }k\subset [0,b]\), for some \(b>0\). Then (2.14) holds for \(C_{\tau }:=(1+b)^r\max \{\Vert k^{(\nu )}\Vert _{\infty }:0\le \nu \le \tau \}\).

\((\gamma )\) Let us return to Assumption II of the setting and shed more light on it. The heat kernel \(p_t(x,y)\) is the kernel of the operator \(e^{-tL}\). At this point, being more familiar with Spectral Theory, we define \(k(\lambda ):=e^{-\lambda ^2}\). Then for every \(t>0\) the operator \(e^{-tL}\) is precisely the spectral multiplier \(K_{\sqrt{t}}\), which is an integral operator by Theorem 2.1, and the heat kernel equals \(p_{t}(x,y)=\mathcal {K}_{\sqrt{t}}(x,y)\).
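For concreteness, a standard symbol covered by Remark 2.2\((\beta )\) is a smooth cut-off equal to 1 on [0, 1] and supported in [0, 2]; being constant near the origin, all of its derivatives vanish at 0, so the odd-derivative condition of Theorem 2.1 holds automatically. A minimal sketch of such a construction via the classical \(e^{-1/x}\) glue (illustrative; this specific cut-off is our choice, not prescribed by the paper):

```python
import math

def bump(x):
    # smooth transition function: 0 for x <= 0, positive for x > 0
    return math.exp(-1.0 / x) if x > 0 else 0.0

def k_symbol(lam):
    # C-infinity symbol with k = 1 on [0, 1], supp k in [0, 2], k(0) = 1;
    # all derivatives vanish at 0 since k is constant near the origin
    if lam <= 1.0:
        return 1.0
    if lam >= 2.0:
        return 0.0
    return bump(2.0 - lam) / (bump(2.0 - lam) + bump(lam - 1.0))

assert k_symbol(0.0) == 1.0 and k_symbol(2.5) == 0.0
```

Being smooth and compactly supported, such a symbol satisfies (2.14) with the constant \(C_{\tau }\) of Remark 2.2\((\beta )\).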

2.2 Examples

We present the most basic examples of spaces \((\mathcal {M},\rho ,\mu ,L)\) falling under our umbrella. For more examples we refer to Cleanthous et al. (2020), Kerkyacharian et al. (2020) and the references therein. In the following spaces we also express explicitly the kernels whose existence, in an abstract sense, is guaranteed by Theorem 2.1.

Example 2.3

Let \(\mathcal {M}={\mathbb R}^d\) be the Euclidean space associated with the operator \(L=-\Delta \), the negative Laplacian. By default, this space is included in our study.

We proceed to the kernels. Let \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a symbol satisfying the assumptions of Theorem 2.1. We extend the symbol k to \({\mathbb R}^d\) radially: \({\tilde{k}}(\xi ):=k(|\xi |)\) for every \(\xi \in {\mathbb R}^d\). Then the spectral multiplier \(K=k(\sqrt{L})\) is nothing but the Fourier multiplier associated with the symbol \({\tilde{k}}(\xi )\). Denote by \({\hat{f}}\) and \(\mathcal {F}^{-1}f\) the Fourier transform and the inverse Fourier transform of a function \(f:{\mathbb R}^d\rightarrow {\mathbb R}\), respectively. Then:

$$\begin{aligned} K(f)(x)&=\mathcal {F}^{-1}({\tilde{k}}{\hat{f}})(x)\\&=\kappa *f(x),\quad \text {where}\;\;\kappa :=\mathcal {F}^{-1}{\tilde{k}}\\&=\int _{{\mathbb R}^d}\kappa (x-y)f(y)dy,\quad \text {so}\;\;\mathcal {K}(x,y)=\kappa (x-y). \end{aligned}$$

The kernel \(\mathcal {K}_{h}(x,y)\), by the properties of the Fourier transform, is the familiar

$$\begin{aligned} \mathcal {K}_{h}(x,y)=\frac{1}{h^d}\kappa \left( \frac{x-y}{h}\right) ,\quad x,y\in {\mathbb R}^d,\;h>0. \end{aligned}$$
(2.17)

This example sheds light on the notion of spectral multipliers. Specifically, on \({\mathbb R}^d\), they are the well-known Fourier multipliers, and the corresponding kernels \(\mathcal {K}(x,y)\) are the convolution kernels of the symbol \(\kappa =\mathcal {F}^{-1}{\tilde{k}}\). Moreover, the existing knowledge on \({\mathbb R}^d\), together with the present correspondence, acts as a guide for the several developments on the setting of metric spaces associated with operators.
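As a concrete illustration of this correspondence (under the Fourier convention \(\kappa (x)=(2\pi )^{-1}\int {\tilde{k}}(\xi )e^{ix\xi }d\xi \), and with the Gaussian symbol \(k(\lambda )=e^{-\lambda ^2}\) chosen only for the example), the inverse Fourier transform can be computed numerically and compared with the closed form \(\kappa (x)=(2\sqrt{\pi })^{-1}e^{-x^2/4}\); the dilated kernel is then exactly (2.17) with d = 1:

```python
import math

def kappa_numeric(x, step=0.001, cut=10.0):
    # numerical inverse Fourier transform of the radial symbol e^{-xi^2}
    n = int(2 * cut / step)
    s = sum(math.exp(-(-cut + i * step) ** 2) * math.cos(x * (-cut + i * step))
            for i in range(n + 1))
    return s * step / (2 * math.pi)

def kappa_exact(x):
    # closed form: F^{-1}[e^{-xi^2}](x) = e^{-x^2/4} / (2 sqrt(pi))
    return math.exp(-x ** 2 / 4) / (2 * math.sqrt(math.pi))

def kernel_h(x, y, h):
    # the dilated kernel (2.17) in dimension d = 1
    return kappa_exact((x - y) / h) / h

for x in (0.0, 0.5, 2.0):
    assert abs(kappa_numeric(x) - kappa_exact(x)) < 1e-6
```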

In the next examples we consider spaces \(\mathcal {M}\) of finite measure. In this case —as proved in Coulhon et al. (2012, Proposition 3.20)— the operator L has a discrete spectrum \(0\le \lambda _0<\lambda _1<\cdots \). This implies the discrete decomposition

$$\begin{aligned} {\mathbb {L}}^2=\bigoplus _{\nu =0}^{\infty } E_{\nu };\quad E_{\nu }:=\mathrm{{ker}}(L-\lambda _{\nu }I),\;\nu \ge 0. \end{aligned}$$

Let \(\{e_{i}^{\nu }\}_{i=1,\dots ,d_{\nu }}\) be an orthonormal basis of the eigenspace \(E_{\nu }\), where \(d_{\nu }:=\mathrm{{dim}}E_{\nu }<\infty \), \(\nu \ge 0\). Then we form the kernels of the projection operators

$$\begin{aligned} P_{\nu }(x,y):=\sum _{i=1}^{d_{\nu }}e_i^{\nu }(x)\overline{e_i^{\nu }(y)},\quad x,y\in \mathcal {M},\;\nu \ge 0. \end{aligned}$$

Let \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a symbol satisfying the assumptions of Theorem 2.1. The corresponding spectral multiplier \(K_h\), \(h>0\), has the kernel

$$\begin{aligned} \mathcal {K}_{h}(x,y)=\sum _{\nu =0}^{\infty }k(h\sqrt{\lambda _{\nu }})P_{\nu }(x,y),\quad x,y\in \mathcal {M}. \end{aligned}$$
(2.18)

For more details we refer to Castillo et al. (2014), Kerkyacharian et al. (2020).

Importantly, when dealing with a specific metric measure space of finite measure associated with an operator L, we only need to know (i) the eigenvalues and (ii) the projection operators; we then obtain the kernels from (2.18).

Next, we present the precise form of (2.18) in the cases of the unit sphere and the unit ball of \({\mathbb R}^3\), which seem to be the most relevant for applications.

Example 2.4

Let \(\mathcal {M}={\mathbb S}^2\) be the unit sphere of \({\mathbb R}^{3}\), associated with the angular distance, the spherical measure and the spherical Laplacian. This space satisfies our Assumptions I and II; see Kerkyacharian et al. (2020). The kernel takes the form:

$$\begin{aligned} \mathcal {K}_h(\xi ,\eta )=\sum _{\nu =0}^{\infty }\frac{1+2\nu }{4\pi }k\big (h\sqrt{\nu (\nu +1)}\big )P_{\nu }\big (\langle \xi ,\eta \rangle \big ),\quad \xi ,\eta \in {\mathbb S}^2, \end{aligned}$$
(2.19)

where \(P_{\nu }\) denotes the Legendre polynomial of degree \(\nu \) and \(\langle \cdot ,\cdot \rangle \) the inner product on \({\mathbb R}^3\).
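In computations the series (2.19) is truncated; for a rapidly decaying symbol such as the Gaussian \(k(\lambda )=e^{-\lambda ^2}\) (an illustrative choice) a few dozen terms already suffice, since the coefficients decay like \(e^{-h^2\nu (\nu +1)}\). A minimal sketch using the Legendre three-term recurrence:

```python
import math

def legendre(nu, t):
    # Legendre polynomial P_nu(t) via the three-term recurrence
    if nu == 0:
        return 1.0
    p_prev, p = 1.0, t
    for n in range(1, nu):
        p_prev, p = p, ((2 * n + 1) * t * p - n * p_prev) / (n + 1)
    return p

def sphere_kernel(cos_angle, h, k=lambda lam: math.exp(-lam ** 2), nmax=40):
    # truncation of (2.19); the Gaussian symbol makes the tail negligible
    return sum((1 + 2 * nu) / (4 * math.pi)
               * k(h * math.sqrt(nu * (nu + 1)))
               * legendre(nu, cos_angle)
               for nu in range(nmax + 1))
```

Since \(\int _{-1}^{1}P_{\nu }(t)dt=0\) for \(\nu \ge 1\), only the \(\nu =0\) term contributes to the total mass, consistent with (2.16).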

Example 2.5

The unit ball \({\mathbb B}^3\) of \({\mathbb R}^3\), equipped with the distance (Dai and Xu 2013)

$$\begin{aligned} \rho (x,y)=\arccos \big (\langle x,y\rangle +\sqrt{1-|x|^2}\sqrt{1-|y|^2}\big ), \end{aligned}$$
(2.20)

the measure

$$\begin{aligned} d\mu (x)=\big (1-|x|^2\big )^{-1/2}dx, \end{aligned}$$
(2.21)

and the operator

$$\begin{aligned} L&=-\sum _{i=1}^{3}\partial _i^2 +\sum _{i,j=1}^{3}x_i x_j \partial _i \partial _j +3\sum _{i=1}^{3}x_i \partial _i, \end{aligned}$$
(2.22)

satisfies the assumptions of our setting; see Kerkyacharian et al. (2020).

Expanding the discussion in Cleanthous et al. (2020) and using Kyriazis et al. (2008), the kernel takes the form

$$\begin{aligned} \mathcal {K}_{h}(x,y)=\sum _{\nu =0}^{\infty }\frac{1+\nu }{2\pi ^2}k\big (h\sqrt{\nu (\nu +2)}\big )G_{\nu }(x,y),\quad x,y\in {\mathbb B}^3, \end{aligned}$$
(2.23)

where

$$\begin{aligned} G_{\nu }(x,y):=C_{\nu }^{1}&\big (\langle x,y\rangle +\sqrt{1-|x|^2}\sqrt{1-|y|^2}\big ) \nonumber \\&+C_{\nu }^{1}\big (\langle x,y\rangle -\sqrt{1-|x|^2}\sqrt{1-|y|^2}\big ) \end{aligned}$$
(2.24)

and \(C_{\nu }^1\) denotes the Gegenbauer polynomial of order 1 and degree \(\nu \).
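Since the Gegenbauer polynomials of order 1 coincide with the Chebyshev polynomials of the second kind, \(C_{\nu }^{1}(\cos \theta )=\sin ((\nu +1)\theta )/\sin \theta \), the kernel (2.23) is straightforward to evaluate numerically. A sketch (illustrative; points of \({\mathbb B}^3\) are given as 3-tuples and the Gaussian symbol is our choice):

```python
import math

def gegenbauer1(nu, t):
    # C^1_nu = Chebyshev U_nu: U_0 = 1, U_1 = 2t, U_{n+1} = 2t U_n - U_{n-1}
    if nu == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * t
    for n in range(1, nu):
        c_prev, c = c, 2 * t * c - c_prev
    return c

def G(nu, x, y):
    # the building block (2.24) for points x, y of the ball B^3
    dot = sum(a * b for a, b in zip(x, y))
    w = math.sqrt(1 - sum(a * a for a in x)) * math.sqrt(1 - sum(b * b for b in y))
    return gegenbauer1(nu, dot + w) + gegenbauer1(nu, dot - w)

def ball_kernel(x, y, h, k=lambda lam: math.exp(-lam ** 2), nmax=40):
    # truncation of (2.23)
    return sum((1 + nu) / (2 * math.pi ** 2)
               * k(h * math.sqrt(nu * (nu + 2)))
               * G(nu, x, y)
               for nu in range(nmax + 1))
```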

We emphasize that the kernels provided by Theorem 2.1 may look completely different, as in (2.17), (2.19) and (2.23); however, all of them enjoy the decay in (2.15), which is sharp in each of the above cases, as can be confirmed by the properties of the Fourier transform, the Legendre polynomials and the Gegenbauer polynomials, respectively.

The advantage of the general theory is that it unifies spaces of different nature, extracts general results and expresses them in the particular cases of interest.

2.3 Kernel density estimators on \(\varvec{(\mathcal {M},L)}\)

We are now ready to present kernel density estimators on the current general setting as introduced in Cleanthous et al. (2020).

Definition 2.6

Let \(n\in {\mathbb {N}}\) and \(X_1,\dots ,X_n\) be iid random variables on \(\mathcal {M}\). Let \(k:{\mathbb {R}}_+\rightarrow {\mathbb {R}}\) be a symbol satisfying the assumptions of Theorem 2.1, as well as \(k(0)=1\), and \(h>0\) a bandwidth. The associated kernel density estimator (kde) is defined as

$$\begin{aligned} {\hat{f}}_{n,h}(x):={\hat{f}}_{n,h}(X_1,\dots ,X_n;x):=\frac{1}{n}\sum \limits _{i=1}^{n} \mathcal {K}_{h}(X_i,x), \quad x\in \mathcal {M}. \end{aligned}$$
(2.25)

Note that (2.25) is well-defined for every such k, as guaranteed by Theorem 2.1. In addition, (2.16) implies the fundamental property

$$\begin{aligned} \int _{\mathcal {M}}\mathcal {K}_{h}(x,y)d\mu (y)=k(0)=1, \end{aligned}$$

which is a standard assumption for the kernels in the Euclidean setting.

We further express the kde (2.25) explicitly on \({\mathbb R}^d\), \({\mathbb S}^2\) and \({\mathbb B}^3\), simply by expanding Examples 2.3, 2.4 and 2.5. Let k be a symbol as in Definition 2.6 and \(h>0\).

\((\alpha )\) When \(\mathcal {M}={\mathbb R}^d\), and \(L=-\Delta \),

$$\begin{aligned} {\hat{f}}_{n,h}(x)=\frac{1}{n}\frac{1}{h^d}\sum \limits _{i=1}^{n} \kappa \Big (\frac{X_i-x}{h}\Big ), \quad x\in {\mathbb R}^d,\;\;\text {where}\;\;\kappa =\mathcal {F}^{-1}{\tilde{k}}, \end{aligned}$$
(2.26)

which is the very well-known form of a kde on \({\mathbb R}^d\).

\((\beta )\) When \(\mathcal {M}={\mathbb S}^2\), equipped with the angular distance, the spherical measure and the spherical Laplacian,

$$\begin{aligned} {\hat{f}}_{n,h}(\xi )=\frac{1}{n}\sum \limits _{i=1}^{n} \sum _{\nu =0}^{\infty }\frac{1+2\nu }{4\pi }k\big (h\sqrt{\nu (\nu +1)}\big )P_{\nu } \big (\langle \xi ,X_i\rangle \big ),\quad \xi \in {\mathbb S}^2. \end{aligned}$$
(2.27)

\((\gamma )\) When \(\mathcal {M}={\mathbb B}^3\), equipped with the distance in (2.20), the measure in (2.21) and the operator in (2.22),

$$\begin{aligned} {\hat{f}}_{n,h}(x)=\frac{1}{n}\sum \limits _{i=1}^{n} \sum _{\nu =0}^{\infty }\frac{1+\nu }{2\pi ^2}k\big (h\sqrt{\nu (\nu +2)}\big )G_{\nu }(x,X_i),\quad x\in {\mathbb B}^3, \end{aligned}$$
(2.28)

where \(G_{\nu }\) is as in (2.24).
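For illustration, here is the Euclidean kde (2.26) in dimension one with the Gaussian choice \(\kappa (u)=(2\pi )^{-1/2}e^{-u^2/2}\), used only as an example of an admissible kernel; the total-mass property below reflects \(k(0)=1\):

```python
import math, random

def kde(data, x, h):
    # the Euclidean kde (2.26) with d = 1 and Gaussian kappa
    return sum(math.exp(-((xi - x) / h) ** 2 / 2) for xi in data) \
        / (len(data) * h * math.sqrt(2 * math.pi))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
h = 0.3
# the estimator integrates to one (Riemann sum on a wide grid)
grid = [-6 + 0.01 * i for i in range(1201)]
mass = sum(kde(sample, x, h) * 0.01 for x in grid)
assert abs(mass - 1.0) < 1e-2
```

The spherical and ball estimators (2.27) and (2.28) are obtained the same way, replacing the Gaussian by the truncated series kernels of Examples 2.4 and 2.5.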

2.4 Hölder spaces

We close this review by presenting some regularity spaces. In nonparametric estimation one assumes that the density under study belongs to large regularity spaces. Regularity spaces on \({\mathbb R}\) and \({\mathbb R}^d\) have been studied for a century across many scientific disciplines. Historically, the first way to express the notion of regularity (or smoothness) was in terms of derivatives; gradually, Fourier transforms and convolutions extended such notions. For the historical path, we refer the reader to Triebel (1983).

Hölder spaces are a suitable choice for the purpose of the pointwise density estimation (see Tsybakov 2009) that we will obtain in the present study. Let us recall this class on \({\mathbb R}\): Let \(s>0\) and denote by \(\ell :=\lfloor s\rfloor \) the greatest integer strictly less than s. The Hölder space \({\dot{\mathcal {H}}}^s({\mathbb R})\) is the set of functions \(f:{\mathbb R}\rightarrow {\mathbb R}\) that are \(\ell \)-times differentiable and satisfy

$$\begin{aligned} |f^{(\ell )}(x)-f^{(\ell )}(y)|\le c|x-y|^{s-\ell }, \end{aligned}$$
(2.29)

for some constant \(0\le c<\infty \) and every \(x\ne y\). Note that slightly different versions of these spaces can be found in different sources, but the overall purpose is more or less the same.

We must define a suitable extension of (2.29) to a metric space. For the right-hand side, we simply use a power of the distance \(\rho (x,y)\). Metric spaces lack a notion of derivatives, so a substitute for the left-hand side is more challenging, but a solution comes from the operator L. In all of our examples in Sect. 2.2 we observe that differentiability is linked with the definition of L. We also note that in every case presented in Sect. 2.2, L is a differential operator of order 2. These facts justify the following definition:

Definition 2.7

Let \(s>0\) and set \(\ell :=\lfloor s\rfloor \). The Hölder space of order s, \({\dot{\mathcal {H}}}^s\), is defined as the set of all functions \(f:\mathcal {M}\rightarrow {\mathbb R}\) such that

$$\begin{aligned} \Vert f\Vert _{{\dot{\mathcal {H}}}^s}:=\sup \limits _{x\ne y} \frac{\big |L^{\ell /2}f(x)-L^{\ell /2}f(y)\big |}{\rho (x,y)^{s-\ell }}<\infty . \end{aligned}$$
(2.30)

For the connection between these spaces and other smoothness spaces in our setting, we refer to Coulhon et al. (2012). For the use of regularity spaces in Nonparametric Statistics in this generality, we further refer to Castillo et al. (2014), Cleanthous et al. (2020), Cleanthous et al. (2022).

3 Pointwise density estimation

We proceed to present some new results, namely the pointwise estimation of densities enjoying Hölder regularity.

One of the main ways to measure the accuracy of the estimator \({\hat{f}}_{n,h}(x)\) at a given point \(x\in \mathcal {M}\) is by the mean squared error (MSE):

$$\begin{aligned} \text {MSE}=\text {MSE}({\hat{f}}_{n,h}(x)):={\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ],\quad x\in \mathcal {M}, \end{aligned}$$
(3.1)

where \({\mathbb {E}}\) denotes expectation with respect to \((X_1,\dots ,X_n)\), i.e.

$$\begin{aligned} \text {MSE}= & {} {\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ] \nonumber \\= & {} \int _M\cdots \int _M \big ({\hat{f}}_{n,h}(x;x_1,\dots ,x_n)-f(x)\big )^2 f(x_1)\cdots f(x_n)d\mu (x_1)\cdots d\mu (x_n).\nonumber \\ \end{aligned}$$
(3.2)
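The multiple integral in (3.2) is rarely available in closed form; in practice one approximates the MSE by Monte Carlo, repeatedly resampling \((X_1,\dots ,X_n)\). A sketch in the Euclidean case d = 1 with a Gaussian kernel and a standard normal density (all choices illustrative):

```python
import math, random

def kde(data, x, h):
    # Gaussian kde on R, i.e. the case (2.26) with d = 1
    return sum(math.exp(-((xi - x) / h) ** 2 / 2) for xi in data) \
        / (len(data) * h * math.sqrt(2 * math.pi))

# Monte Carlo approximation of MSE(f_hat_{n,h}(x)) at x = 0 for f = N(0, 1)
random.seed(1)
f0 = 1 / math.sqrt(2 * math.pi)       # true density value at 0
n, h, reps = 100, 0.4, 100
mse = sum((kde([random.gauss(0.0, 1.0) for _ in range(n)], 0.0, h) - f0) ** 2
          for _ in range(reps)) / reps
```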

Our task is to determine the proper assumptions on the symbol k so that the MSE of the corresponding kernel density estimator \({\hat{f}}_{n,h}\) is optimally bounded, provided that the unknown density f belongs to a certain Hölder space.

The main new result of this paper is the following:

Theorem 3.1

Let \(s>0\), \(f\in L^{\infty }\cap {\dot{\mathcal {H}}}^s\), and let \(k \in \mathcal {C}^{\tau }({\mathbb {R}}_+)\) be a symbol for some \(\tau >d+s\), satisfying: \(k(0)=1\),

$$\begin{aligned} k^{(\nu )}(0)=0,\quad \text {for every}\;\;1\le \nu \le \tau , \end{aligned}$$
(3.3)

and for some \(r> \tau +d\),

$$\begin{aligned} |k^{(\nu )}(\lambda )|\le C_{\tau }(1+\lambda )^{-r},\quad \text {for every}\;\;\lambda \ge 0,\;0\le \nu \le \tau . \end{aligned}$$
(3.4)

We pick \(h=h_n=n^{-\frac{1}{2s+d}}\). Then for every \(n\in {\mathbb N}\) the corresponding kde \({\hat{f}}_{n,h}\) satisfies

$$\begin{aligned} \sup \limits _{x\in \mathcal {M}} \textrm{MSE}({\hat{f}}_{n,h}(x))=\sup \limits _{x\in \mathcal {M}} {\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ]\le c C(f) n^{-\frac{2s}{2s+d}}, \end{aligned}$$
(3.5)

where the constant \(c>0\) depends only on \(\tau ,\;s,\;C_{\tau }\) and the structural constants of the setting, while C(f) is given by

$$\begin{aligned} C(f):=\max \big (\Vert f\Vert _{\infty },\Vert f\Vert _{{\dot{\mathcal {H}}}^s}^2\big ). \end{aligned}$$
(3.6)

Note that the rate obtained in (3.5) is the optimal one, matching the rates in Cleanthous et al. (2020, 2022), where the \(L^p\)-risk was used for densities in Besov and Sobolev regularity spaces, respectively. For the connection between the several regularity spaces, we refer to Proposition 6.4 in Coulhon et al. (2012) and Theorem 7.8 in Kerkyacharian and Petrushev (2015).
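The bandwidth prescription of Theorem 3.1 is fully explicit once s and d are fixed. For instance, on the sphere \({\mathbb S}^2\) (d = 2) a density of Hölder regularity s = 1 calls for \(h_n=n^{-1/4}\) and yields the rate \(n^{-1/2}\); a trivial sketch:

```python
def optimal_bandwidth(n, s, d):
    # Theorem 3.1: h_n = n^{-1/(2s+d)}
    return n ** (-1.0 / (2 * s + d))

def mse_rate(n, s, d):
    # the corresponding optimal MSE rate n^{-2s/(2s+d)} in (3.5)
    return n ** (-2.0 * s / (2 * s + d))

# e.g. d = 2 (the sphere S^2), s = 1, n = 10000 observations:
h = optimal_bandwidth(10_000, 1, 2)   # 10000^{-1/4} = 0.1
rate = mse_rate(10_000, 1, 2)         # 10000^{-1/2} = 0.01
```

Smoother densities (larger s) permit larger bandwidths and faster rates, approaching the parametric rate \(n^{-1}\) as \(s\rightarrow \infty \).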

While approaching the proof of Theorem 3.1 we will have the opportunity to present the machinery of the setting in action and to highlight the correspondence with the classical Euclidean framework.

We first take a closer look at the assumptions on the symbol generating the kernels. We restrict ourselves to the example of \(\mathcal {M}={\mathbb R}^d\). As we saw in Example 2.3, the radial extension \({\tilde{k}}\) of the symbol k is the Fourier transform of the function \(\kappa \), which yields the usual kde as in (2.26). Translating the assumptions of Theorem 3.1 into the language of the Fourier transform, we recover the usual assumptions on \(\kappa \) for Euclidean spaces:

\((\alpha )\) Assumption (3.3) simply means that \(\kappa \) enjoys vanishing moments up to a certain order.

\((\beta )\) Assumption (3.4), thanks to Theorem 2.1 and (2.17), ensures that

$$\begin{aligned} \int _{{\mathbb R}^d}(1+|\xi |)^s|\kappa (\xi )|d\xi&=\int _{{\mathbb R}^d}(1+|\xi |)^s|\mathcal {K}(\xi ,0)|d\xi \le c\int _{{\mathbb R}^d}(1+|\xi |)^s\mathcal {D}_{1,\tau }(\xi ,0)d\xi \nonumber \\&=c\int _{{\mathbb R}^d}(1+|\xi |)^{-(d+\varepsilon )}d\xi ,\quad \varepsilon :=\tau -d-s>0 \nonumber \\&=c_d\int _{0}^{\infty }\frac{\varrho ^{d-1}d\varrho }{(1+\varrho )^{d+\varepsilon }}\quad \text {(polar coordinates)} \nonumber \\&\le c\int _{0}^{\infty }\frac{d\varrho }{(1+\varrho )^{1+\varepsilon }}<\infty . \end{aligned}$$
(3.7)

\((\gamma )\) The assumption \(k(0)=1\) simply asserts that

$$\begin{aligned} \int _{{\mathbb R}^d}\kappa (\xi )d\xi ={\hat{\kappa }}(0)={\tilde{k}}(0)=k(0)=1. \end{aligned}$$
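The finiteness in (3.7) is easy to confirm numerically. As an illustration (not part of the argument), the following snippet evaluates the radial integral for \(d=2\) and \(\varepsilon =1\), where the substitution \(u=1+\varrho \) gives the exact value 1/2:

```python
from scipy.integrate import quad

d, eps = 2, 1.0  # illustrative choices; any d >= 1 and eps > 0 give a finite value

# Radial integral from (3.7): int_0^inf rho^(d-1) (1 + rho)^(-(d+eps)) d rho
val, abserr = quad(lambda r: r ** (d - 1) * (1 + r) ** (-(d + eps)), 0, float("inf"))

# For d = 2, eps = 1, substituting u = 1 + rho gives the exact value 1/2.
print(val)  # ≈ 0.5
```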

The standard approach for dealing with the MSE is to decompose it as follows:

$$\begin{aligned} \text {MSE}({\hat{f}}_{n,h}(x))=\sigma ^2 (x)+b^2 (x) \end{aligned}$$
(3.8)

where the function \(\sigma ^2 (x)\) is the variance of the estimator \({\hat{f}}_{n,h}(x)\), i.e.

$$\begin{aligned} \sigma ^2(x):={\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-{\mathbb {E}}[{\hat{f}}_{n,h}(x)]\big )^2\big ],\quad x\in \mathcal {M}\end{aligned}$$
(3.9)

and b(x) is the bias of \({\hat{f}}_{n,h}(x),\) i.e.

$$\begin{aligned} b(x):={\mathbb {E}}\big [{\hat{f}}_{n,h}(x)\big ]-f(x), \quad x\in \mathcal {M}. \end{aligned}$$
(3.10)

We will separate the proof of Theorem 3.1 into the two usual steps: the estimation of the variance and the estimation of the bias. Before the proof, we provide some remarks.
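To fix ideas, the decomposition (3.8) can be checked numerically in the simplest Euclidean situation. The sketch below (a toy one-dimensional Gaussian kernel example, not the spectral estimator of this paper) simulates many datasets from the standard normal and verifies that the Monte Carlo MSE at a point splits exactly into empirical variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def kde(data, x, h):
    # Classical Gaussian kernel density estimator on the real line.
    return np.mean(np.exp(-((x - data) / h) ** 2 / 2)) / (h * np.sqrt(2 * np.pi))

n, h, x0 = 200, 0.3, 0.0
f_true = 1 / np.sqrt(2 * np.pi)  # true standard normal density at 0

# Replicate the estimator over many simulated datasets.
estimates = np.array([kde(rng.standard_normal(n), x0, h) for _ in range(2000)])

mse = np.mean((estimates - f_true) ** 2)                 # Monte Carlo MSE
variance = np.mean((estimates - estimates.mean()) ** 2)  # empirical sigma^2(x0)
bias_sq = (estimates.mean() - f_true) ** 2               # empirical b(x0)^2

# MSE = sigma^2 + b^2 holds exactly for these empirical moments.
print(mse, variance + bias_sq)
```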

Remark 3.2

\((\alpha )\) Another form for the conclusion of Theorem 3.1 is

$$\begin{aligned} \sup _{f\in {\mathbb {F}}^{s}(m)}\sup \limits _{x\in M} {\mathbb {E}}\Big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\Big ]\le cn^{-\frac{2s}{2s+d}}, \end{aligned}$$
(3.11)

where \({\mathbb {F}}^{s}(m):=\{f\in L^{\infty }\cap {\dot{\mathcal {H}}}^{s}: \Vert f\Vert _{\infty }\le m\;\text {and}\;\Vert f\Vert _{{\dot{\mathcal {H}}}^{s}}\le m\}\), \(m>0\), and the constant \(c>0\) depends also on \(m>0\).

\((\beta )\) For later use we note that our choice of \(h_n\) gives \(h_n\rightarrow 0\) and \(nh_n^d\rightarrow \infty \) as \(n\rightarrow \infty \).

\((\gamma )\) If the symbol k is compactly supported, then it obviously satisfies (3.4); see Remark 2.2\((\beta )\).

\((\delta )\) The rate obtained in Theorem 3.1 is the optimal one (see e.g. Tsybakov (2009)).

The following simple inequality is established in Coulhon et al. (2012) under more general assumptions. Here we state it for Ahlfors regular spaces and give its proof as an opportunity to present some first calculations on metric spaces:

Lemma 3.3

If \(\tau >d\), there exists a constant \(c=c_\tau >0\) such that for every \(h>0\) and \(x\in \mathcal {M}\)

$$\begin{aligned} I_{h,\tau }(x):=\int _{\mathcal {M}} \big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y) \le ch^{d}. \end{aligned}$$
(3.12)

Proof

We split the metric space as

$$\begin{aligned} \mathcal {M}=\bigcup _{\nu =0}^{\infty }M_{\nu }, \end{aligned}$$

where \(M_0:=B(x,h)\) and \(M_{\nu }:=B(x,2^{\nu }h)\setminus B(x,2^{\nu -1}h)\), for every \(\nu \in {\mathbb N}\).

Then

$$\begin{aligned} I_{h,\tau }(x)=\sum _{\nu =0}^{\infty }\int _{M_{\nu }}\big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y). \end{aligned}$$

Of course,

$$\begin{aligned} \int _{M_{0}}\big (1+h^{-1}\rho (x, y)\big )^{-\tau }d\mu (y)\le |B(x,h)|\le c_1 h^{d}, \end{aligned}$$

thanks to (1.1).

Let \(\nu \in {\mathbb N}\) and \(y\in M_{\nu }\subset B(x,2^{\nu -1}h)^c\). Then \(1+h^{-1}\rho (x, y)\ge 1+2^{\nu -1}> 2^{\nu -1}\). This together with (1.1) implies that

$$\begin{aligned}\int _{M_{\nu }}\big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y)&\le 2^{-(\nu -1)\tau }|M_{\nu }|\le 2^{\tau }2^{-\nu \tau }|B(x,2^{\nu }h)|\\&\le c_1 2^{\tau }2^{-\nu (\tau -d)}h^{d}. \end{aligned}$$

Combining all the above and using the assumption \(\tau >d\), we conclude that

$$\begin{aligned} I_{h,\tau }(x)&\le c_1 2^{\tau }\sum _{\nu =0}^{\infty }2^{-\nu (\tau -d)}h^{d}=c_1 2^{\tau }\frac{1}{1-2^{-\tau +d}}h^{d} \\&=c_1\frac{2^{2\tau }}{2^{\tau }-2^{d}}h^{d}=:c_{\tau }h^{d}. \end{aligned}$$

\(\square \)

Let us point out that the above estimate is sharp in the sense that

$$\begin{aligned} I_{h,\tau }(x)\ge \int _{M_{0}}\big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y)\ge \frac{2^{-\tau }}{c_1}h^{d}, \end{aligned}$$
(3.13)

thanks to (1.1).

Such an integral is classical on the Euclidean space, where it can be handled using (generalized) polar coordinates, exactly as we did in (3.7). On an abstract Ahlfors regular metric space it can be sharply estimated as above.
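For instance, on the real line (\(d=1\), with Lebesgue measure) the integral \(I_{h,\tau }\) has the closed form \(2h/(\tau -1)\), so the \(h^{d}\)-scaling of (3.12) and (3.13) can be confirmed directly; the snippet below is purely illustrative:

```python
from scipy.integrate import quad

tau = 3.0  # any tau > d = 1 works

def I(h):
    # I_{h,tau}(0) on the real line: integral of (1 + |y|/h)^(-tau) dy,
    # computed as twice the integral over the positive half-line.
    val, _ = quad(lambda y: (1 + y / h) ** (-tau), 0, float("inf"))
    return 2 * val

# Closed form: I(h) = 2h/(tau - 1); for tau = 3 this equals h, so I(h)/h = 1,
# matching the two-sided h^d bounds of (3.12) and (3.13).
print(I(0.1) / 0.1, I(0.01) / 0.01)  # both ≈ 1.0
```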

3.1 Estimation of the variance

We proceed to the first step of the proof of Theorem 3.1: the estimation of the variance for bounded densities. As usual, no regularity is required here. The main tools are Theorem 2.1 and Lemma 3.3.

Proposition 3.4

Let \(f\in L^{\infty }\), \(\tau >d\) and let \(k\in \mathcal {C}^{\tau }({\mathbb {R}}_+)\) be a multiplier satisfying (3.3) and (3.4). Then for every \(x\in \mathcal {M}\), \(0<h\le 1\) and \(n\in {\mathbb {N}}\)

$$\begin{aligned} \sigma ^2(x)\le \frac{C_1}{nh^d}\Vert f\Vert _{\infty }, \end{aligned}$$
(3.14)

where the constant \(C_1>0\) depends only on \(\tau ,\;C_\tau \) and the structural constants.

Proof

Recalling the results developed in Sect. 2, the spectral multiplier \(K_h\) associated with the dilated symbol \(k_h(\lambda )=k(h\lambda )\) is an integral operator with kernel \(\mathcal {K}_h(x,y)\), \(x,y\in \mathcal {M}\).

We introduce the random variables

$$\begin{aligned} \eta _i(x):=\mathcal {K}_h(X_i,x)-{\mathbb {E}}\big [\mathcal {K}_h(X_i,x)\big ],\quad x\in \mathcal {M},\;i=1,\dots ,n \end{aligned}$$
(3.15)

and we observe that \(\eta _1(x),\dots ,\eta _n(x)\) are iid random variables with \({\mathbb {E}}[\eta _i(x)]=0,\) for \(i=1,\dots ,n\). For their variance we have

$$\begin{aligned} {\mathbb {E}}[\eta _i^2(x)]= & {} {\mathbb {E}}\big [\big (\mathcal {K}_h(X_i,x)\big )^2\big ]- \left( {\mathbb {E}}\big [\mathcal {K}_h(X_i,x)\big ]\right) ^2\nonumber \\\le & {} {\mathbb {E}}\big [\big (\mathcal {K}_h(X_i,x)\big )^2\big ] \nonumber \\= & {} \int _\mathcal {M}|\mathcal {K}_h(x,y)|^2 f(y) d\mu (y). \end{aligned}$$
(3.16)

By Theorem 2.1 we have the bounds

$$\begin{aligned} |\mathcal {K}_h(x,y)|\le cC_{\tau }\mathcal {D}_{h,\tau }(x,y). \end{aligned}$$
(3.17)

Since \(f\in L^{\infty }\) we derive

$$\begin{aligned} {\mathbb {E}}[\eta _i^2(x)]\le & {} c \Vert f\Vert _{\infty } h^{-2d}\int _\mathcal {M}\big (1+h^{-1}\rho (x,y)\big )^{-2\tau }d\mu (y)\nonumber \\= & {} c \Vert f\Vert _{\infty } h^{-2d}I_{h,2\tau }\le C_1 \Vert f\Vert _{\infty } h^{-d}, \end{aligned}$$
(3.18)

where for the ultimate inequality we used (3.12).

We observe that

$$\begin{aligned} \sum \limits _{i=1}^n \eta _i (x)= n\big ({\hat{f}}_{n,h}(x)-{\mathbb {E}}\big [{\hat{f}}_{n,h}(x)\big ]\big ). \end{aligned}$$
(3.19)

Bearing in mind that the independent random variables \(\eta _i\) have zero mean and guided by (3.18) and (3.19) we arrive at

$$\begin{aligned} \sigma ^2(x)= & {} {\mathbb {E}}\left[ \big ({\hat{f}}_{n,h}(x)-{\mathbb {E}}[{\hat{f}}_{n,h}(x)]\big )^2\right] ={\mathbb {E}}\left[ \Big (\frac{1}{n} \sum \limits _{i=1}^n \eta _i (x)\Big )^2\right] \nonumber \\= & {} \frac{1}{n^2} \sum \limits _{i=1}^n{\mathbb {E}}\big [\eta _i^2(x)\big ] \le C_1 \Vert f\Vert _{\infty }\frac{1}{nh^d}. \end{aligned}$$
(3.20)

\(\square \)

3.2 Estimation of the bias

We will estimate the bias under the assumption that the pdf f lies in the Hölder space \({\dot{\mathcal {H}}}^s\).

Proposition 3.5

Let \(s>0\), \(f\in L^{\infty }\cap {\dot{\mathcal {H}}}^s\) and let \(k\in \mathcal {C}^{\tau }({\mathbb {R}}_+)\), for some \(\tau >d+s\), be a multiplier satisfying \(k(0)=1\), (3.3) and (3.4). Then for every \(x\in \mathcal {M}\), \(0<h\le 1\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} |b(x)|\le C_2 \Vert f\Vert _{{\dot{\mathcal {H}}}^s} h^s, \end{aligned}$$
(3.21)

where the constant \(C_2>0\) depends only on \(s,\;\tau ,\;C_{\tau }\) and the structural constants of the setting.

Proof

Since \(X_i\) are iid with common density f, we obtain

$$\begin{aligned} b(x)= & {} {\mathbb {E}}\big [{\hat{f}}_{n,h}(x)\big ]-f(x) \nonumber \\= & {} \frac{1}{n}\sum \limits _{i=1}^n {\mathbb {E}}\big [\mathcal {K}_h(X_i,x)\big ]-f(x) \nonumber \\= & {} \big (K_h-I\big )f(x), \end{aligned}$$
(3.22)

where I is the identity operator on \(\mathcal {M}\) and \(K_h=k_h(\sqrt{L})\) is the spectral multiplier associated with the dilated symbol \(k_h\) and the operator \(\sqrt{L}\), as in Sect. 2.

For the given bandwidth \(0<h\le 1\) there exists a unique integer \(i\in {\mathbb N}_0\) such that

$$\begin{aligned} 2^{-i}\le h<2^{-i+1}. \end{aligned}$$
(3.23)

We consider the symbol \(\psi \in \mathcal {C}^{\infty }({\mathbb R}_+)\) with \(\mathrm{{supp}\, }\psi \subset [0,2]\), \(\psi (\lambda )=1\), for every \(\lambda \in [0,1]\) and \(0\le \psi (\lambda )\le 1\), for every \(\lambda \in [0,2]\).

We set \(\varphi (\lambda ):=\psi (\lambda )-\psi (2\lambda )\) which is \(\mathcal {C}^{\infty }\) and supported in \([2^{-1},2]\).

By the construction of the above functions, it turns out that

$$\begin{aligned} \psi (2^{-i}\lambda )+\sum \limits _{j=i+1}^{\infty }\varphi (2^{-j}\lambda )=1,\;\;\text {for every}\;\lambda \in {\mathbb R}_+. \end{aligned}$$

Then by Coulhon et al. (2012, Corollary 3.9)

$$\begin{aligned} f=\Psi _{2^{-i}}f+\sum \limits _{j=i+1}^{\infty }\Phi _{2^{-j}}f, \end{aligned}$$
(3.24)

where the capital \(\Psi _{2^{-i}}\) and \(\Phi _{2^{-j}}\) denote the spectral multipliers as in Sect. 2.1; \(\Psi _{2^{-i}}=\psi (2^{-i}\sqrt{L})\) and \(\Phi _{2^{-j}}=\varphi (2^{-j}\sqrt{L})\).

We set \(\ell :=\lfloor s\rfloor \) and we introduce the symbols

$$\begin{aligned} g^i(\lambda ):=\frac{(k(h2^{i}\lambda )-1)\psi (\lambda )}{|\lambda |^{\ell }}\;\;\text {and}\;\;g^j(\lambda ):=\frac{(k(h2^{j}\lambda )-1)\varphi (\lambda )}{|\lambda |^{\ell }},\;j>i.\nonumber \\ \end{aligned}$$
(3.25)

We proceed to justify that the assumptions of Theorem 2.1 are fulfilled for the symbols \(g^j\), \(j\ge i\), using Remark 2.2\((\beta )\).

By the fact that \(k\in \mathcal {C}^{\tau }({\mathbb R}_+)\), the values of the derivatives \(k^{(\nu )}(0)\), \(0\le \nu \le \tau \), (3.23) and the definitions of the symbols \(\psi \) and \(\varphi \) we have that:

\(g^j\in \mathcal {C}^{\tau -\ell }({\mathbb R}_+)\), for every \(j\ge i\). Of course \(\tau -\ell>d+s-\ell \ge d\).

\(\mathrm{{supp}\, }g^i\subset [0,2]\) and \(\mathrm{{supp}\, }g^j\subset [2^{-1},2]\), for every \(j>i\).

\(g^i(0)=\lim _{\lambda \rightarrow 0^{+}}\frac{k(h2^{i}\lambda )-1}{\lambda ^{\ell }}\psi (\lambda )=\frac{(h2^{i})^{\ell }k^{(\ell )}(0)}{\ell !}\psi (0)=0\).

\((g^i)^{(2\nu +1)}(0)=0\), for every \(1\le 2\nu +1\le \tau -\ell \).

Moreover by the vanishing derivatives’ assumption (3.3), the decay in (3.4) and the bottom of the support of \(\varphi \), we obtain after some calculus that

$$\begin{aligned} |(g^j)^{(\nu )}(\lambda )|\le c(\tau ,\ell ),\quad \text {for every}\;\lambda \ge 0,\;0\le \nu \le \tau -\ell ,\;j\ge i, \end{aligned}$$

where the above constant \(c(\tau ,\ell )>0\) is independent of j.

By Theorem 2.1, coupled with Remark 2.2\((\beta )\), the spectral multipliers \(G^j_{2^{-j}}=g^{j}(2^{-j}\sqrt{L})\), \(j\ge i\), are integral operators and their corresponding kernels \(\mathcal {G}^j_{2^{-j}}(x,y)\) present the behaviour

$$\begin{aligned} |\mathcal {G}^j_{2^{-j}}(x,y)|\le c\mathcal {D}_{2^{-j},\tau -\ell }(x,y),\quad x,y\in \mathcal {M},\;j\ge i. \end{aligned}$$
(3.26)

By the definition of the symbols \(g^j\), \(j\ge i\) in (3.25) we get

$$\begin{aligned} (K_h-I)\Psi _{2^{-i}}f=2^{-\ell i}G^i_{2^{-i}}L^{\ell /2}f \end{aligned}$$
(3.27)

and

$$\begin{aligned} (K_h-I)\Phi _{2^{-j}}f=2^{-\ell j}G^j_{2^{-j}}L^{\ell /2}f,\;\;j>i. \end{aligned}$$
(3.28)

Combining (3.22), (3.24) with (3.27) and (3.28) we obtain the expansion

$$\begin{aligned} b(x)=\sum _{j=i}^{\infty }2^{-\ell j}G^j_{2^{-j}}L^{\ell /2}f(x). \end{aligned}$$
(3.29)

Since \(G^j_{2^{-j}}\) are integral operators and because of \(g^j(0)=0\), for every \(j\ge i\), using (2.16) we express \(G^j_{2^{-j}}L^{\ell /2}f(x)\) as

$$\begin{aligned} G^j_{2^{-j}}L^{\ell /2}f(x)&=\int _{\mathcal {M}}\mathcal {G}^j_{2^{-j}}(x,y)L^{\ell /2}f(y)d\mu (y) \nonumber \\&=\int _{\mathcal {M}}\mathcal {G}^j_{2^{-j}}(x,y)\big (L^{\ell /2}f(y)-L^{\ell /2}f(x)\big )d\mu (y). \end{aligned}$$
(3.30)

The membership of f in the Hölder space \({\dot{\mathcal {H}}}^s\) implies that

$$\begin{aligned} \big |L^{\ell /2}f(y)-L^{\ell /2}f(x)\big |&\le \Vert f\Vert _{{\dot{\mathcal {H}}}^s}\rho (x,y)^{s-\ell } \nonumber \\&\le 2^{\ell j}2^{-sj}\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\big (1+2^{j}\rho (x,y)\big )^{s-\ell },\quad j\ge i. \end{aligned}$$
(3.31)

Combining (3.29) with (3.30), (3.26) and (3.31), we arrive at the estimate

$$\begin{aligned} |b(x)|&\le \sum _{j=i}^{\infty }2^{-\ell j}\int _{\mathcal {M}}|\mathcal {G}^j_{2^{-j}}(x,y)|\big |L^{\ell /2}f(y)-L^{\ell /2}f(x)\big |d\mu (y) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}\int _{\mathcal {M}}\mathcal {D}_{2^{-j},\tau -s}(x,y)d\mu (y) \nonumber \\&= c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}2^{jd}I_{2^{-j},\tau -s}(x), \end{aligned}$$
(3.32)

where \(I_{2^{-j},\tau -s}(x)\) is as in Lemma 3.3. Thanks to the assumption \(\tau >d+s\), by (3.12), the fact that \(s>0\) and (3.23) we conclude that

$$\begin{aligned} |b(x)|&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}2^{jd}I_{2^{-j},\tau -s}(x) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}2^{-is}\le C_2\Vert f\Vert _{{\dot{\mathcal {H}}}^s}h^{s} \end{aligned}$$
(3.33)

and the proof is complete. \(\square \)

End of the proof of Theorem 3.1. We combine Propositions 3.4 and 3.5 to conclude the proof of Theorem 3.1 in the standard way.

3.3 Kernel density estimators on the sphere

The shape of the Earth justifies the unit sphere \({\mathbb S}^2\) of \({\mathbb R}^3\) as the most important domain for the purposes of several sciences. In the present paper we study earthquakes, the subject of seismology, but many other fields, such as astrophysics, environmental science and geology, are interested in this geometry too. We describe how the kernels obtained in Sect. 2.2 can be used in a data analysis.

We consider the symbols

$$\begin{aligned} g^{\sigma }(\lambda ):=(1+|\lambda |^{\sigma })^{-1},\;\lambda \in {\mathbb R}, \end{aligned}$$
(3.34)

for \(\sigma \in {\mathbb N}\), with \(\sigma >1\). Evidently for every \(\sigma >1\), the symbol \(g^{\sigma }\) is an even function such that \(g^{\sigma }\in \mathcal {C}^{\sigma -1}({\mathbb R})\), \(g^{\sigma }(0)=1\), \((g^{\sigma })^{(\nu )}(0)=0\) for every \(1\le \nu \le \sigma -1\), and it presents the decay in (3.4) with \(r=\sigma \). Such symbols are suitable for generating kdes: one chooses the appropriate value of \(\sigma \) depending on the dimension d and the regularity s, and then the appropriate bandwidth h depending also on the sample size n.
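These properties are straightforward to verify symbolically. The sketch below (illustrative, taking \(\sigma =6\)) checks that \(g^{\sigma }(0)=1\) and that the first \(\sigma -1\) derivatives vanish at the origin:

```python
import sympy as sp

lam = sp.symbols("lam", real=True)
sigma = 6  # illustrative choice; any integer sigma > 1 behaves the same way

# The symbol g^sigma of (3.34), restricted to lam >= 0.
g = 1 / (1 + lam**sigma)

# g(0) = 1 and the first sigma - 1 derivatives vanish at the origin,
# as required by k(0) = 1 and (3.3).
derivs = [sp.diff(g, lam, nu).subs(lam, 0) for nu in range(sigma)]
print(derivs)  # [1, 0, 0, 0, 0, 0]
```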

For the purpose of our present study, we restrict our attention on the case of the unit sphere \(\mathcal {M}={\mathbb S}^2\).

Let \(s>0\) and denote by \(\lceil s\rceil \) the smallest integer strictly greater than s. The symbols (3.34) for \(r=\sigma :=5+\lceil s\rceil \) satisfy the assumptions of Theorem 3.1 for densities on \({\dot{\mathcal {H}}}^s\).

The expression (2.19) can be implemented in R (or Python, etc.) after the infinite series is truncated at some integer \(N\in {\mathbb N}\), namely

$$\begin{aligned} {\hat{f}}_{n,h,N}(\xi ):=\frac{1}{n}\sum \limits _{i=1}^{n}\sum _{\nu =0}^{N} \frac{1+2\nu }{4\pi }g^r\big (h\sqrt{\nu (\nu +1)}\big )P_{\nu }\big (\langle \xi ,X_i\rangle \big ),\quad \xi \in {\mathbb S}^2. \end{aligned}$$
(3.35)

Note that

$$\begin{aligned} g^{r}(h\sqrt{\nu (\nu +1)})< (h\sqrt{\nu (\nu +1)})^{-r}<h^{-r}\nu ^{-r},\quad \nu \in {\mathbb N}. \end{aligned}$$

Moreover, for the Legendre polynomials it is well-known that \(|P_{\nu }(u)|\le 1,\) for every \(u\in [-1,1]\) and of course \(\frac{1+2\nu }{4\pi }\le 0.51\frac{\nu }{\pi }\), for every \(\nu \ge 25\).

Then the error (the absolute value of the difference) due to truncating (2.19) at order \(N\ge 24\) can be safely bounded from above by

$$\begin{aligned} \text {error}&\le \sum _{\nu >N}\frac{0.51 \nu }{\pi }h^{-r}\nu ^{-r}=\frac{0.51}{\pi }h^{-r}\sum _{\nu =N+1}^{\infty }\nu ^{-r+1} \nonumber \\&\le \frac{0.51}{\pi }h^{-r}\int _{N}^{\infty }x^{-r+1}dx=\frac{0.51 h^{-r}N^{-r+2}}{\pi (r-2)}. \end{aligned}$$
(3.36)

Recall that by Theorem 3.1, \(h=n^{-1/(2s+2)}\) (since \(d=2\) on the sphere), where n is our sample size.

Expression (3.36) asserts that a sufficiently large N guarantees any prescribed error bound when the sample size n and the regularity s are considered fixed.

As an example, take \(s\in (0,1]\) (the least restrictive range), which corresponds to the value \(r=6\). In this case, after setting \(h=n^{-1/(2(s+1))}\), the error is at most

$$\begin{aligned} \frac{0.51\, n^{3/(s+1)}}{4\pi }N^{-4}. \end{aligned}$$
(3.37)

In a specific data analysis, with a given sample size n, one should respect (3.37) for the hypothesized smoothness level s and choose N large enough that the error stays below a pre-defined “suitable” threshold. For a data analysis of earthquakes the reader is referred to Sect. 5.
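A minimal sketch of such an implementation in Python, under assumed parameter choices (\(r=6\), \(h=n^{-1/(2s+2)}\)) and with synthetic data concentrated near the north pole, might look as follows; the helper `kde_sphere` is a hypothetical name of ours, not part of any library:

```python
import numpy as np
from scipy.special import eval_legendre

def kde_sphere(X, xi, h, r=6, N=75):
    """Truncated estimator (3.35): X is an (n, 3) array of unit vectors (data),
    xi an (m, 3) array of unit vectors (evaluation points)."""
    inner = np.clip(xi @ X.T, -1.0, 1.0)  # <xi, X_i>, shape (m, n)
    est = np.zeros(len(xi))
    for nu in range(N + 1):
        # Symbol g^r of (3.34), evaluated at h * sqrt(nu (nu + 1)).
        g = 1.0 / (1.0 + (h * np.sqrt(nu * (nu + 1.0))) ** r)
        est += (1 + 2 * nu) / (4 * np.pi) * g * eval_legendre(nu, inner).mean(axis=1)
    return est

rng = np.random.default_rng(0)
n, s = 500, 0.5
h = n ** (-1 / (2 * s + 2))  # bandwidth rule of Theorem 3.1 with d = 2

# Synthetic sample concentrated near the north pole.
X = rng.normal([0.0, 0.0, 3.0], 1.0, size=(n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

poles = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
f_hat = kde_sphere(X, poles, h)
print(f_hat)  # much larger at the north pole than at the south pole
```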

4 Spaces of homogeneous type

We proceed to more general assumptions than those in Sect. 1. Specifically, we no longer assume that our space enjoys Ahlfors regularity, but only the so-called doubling volume property (4.1) below. Such a setting is referred to as a space of homogeneous type.

We replace Assumption I with the following:

(a) Doubling volume condition: There exists a constant \(c_0>1\) such that

$$\begin{aligned} 0< |B(x,2r)| \le c_0|B(x,r)|<\infty \quad \hbox {for all }x \in \mathcal {M}\hbox { and }r>0, \end{aligned}$$
(4.1)

where |B(x,r)| is the volume of the open ball B(x,r) centred at x with radius r.

(b) Noncollapsing condition: There exists a constant \(c_1>0\) such that

$$\begin{aligned} \inf _{x\in \mathcal {M}}|B(x,1)|\ge c_1. \end{aligned}$$
(4.2)

We modify Assumption II by replacing the factor \(t^{-d/2}\) by

$$\begin{aligned} \big (|B(x,\sqrt{t})||B(y,\sqrt{t})|\big )^{-1/2} \end{aligned}$$
(4.3)

in equations (1.2) and (1.3).

Some remarks are in order:

\((\alpha )\) Of course, (a) and (b) hold trivially under (i).

\((\beta )\) From (4.1) it follows that there exist \(c_0'>0\) and \(d>0\) such that

$$\begin{aligned} |B(x,\lambda r)| \le c_0'\lambda ^{d} |B(x,r)| \quad \hbox {for every } x \in \mathcal {M}, r>0,\hbox { and }\lambda >1, \end{aligned}$$
(4.4)

where the constant d above is referred to as the homogeneous dimension of \((\mathcal {M},\rho ,\mu )\). This effectively generalizes the Ahlfors dimension used in the previous sections.

\((\gamma )\) A connection between the volume of balls of small radius, with their radius and the dimension comes from (4.2) and (4.4):

$$\begin{aligned} |B(x, r)|\ge c r^d, \quad x\in \mathcal {M},\;0<r\le 1. \end{aligned}$$
(4.5)

\((\delta )\) In the framework of spaces of homogeneous type we replace the kernels defined in (2.13) by

$$\begin{aligned} \mathcal {D}_{h,\tau }(x,y):=\frac{\big (1+h^{-1}\rho (x,y)\big )^{-\tau }}{(|B(x,h)||B(y,h)|)^{1/2}},\quad \text {for}\;x,y\in \mathcal {M}. \end{aligned}$$
(4.6)

On the background results:

(i) Theorem 2.1 holds as it is, but with the kernels \(\mathcal {D}_{h,\tau }\) as in (4.6).

(ii) Lemma 3.3 takes the form: for every \(\tau >d\), there exists a constant \(c=c_{\tau }>0\) such that

$$\begin{aligned} I_{h,\tau }(x)\le c|B(x,h)|,\quad \text {for every}\;x\in \mathcal {M},\;h>0. \end{aligned}$$
(4.7)

\((\varepsilon )\) To compare the volumes of balls with different centers \(x, y\in \mathcal {M}\) and the same radius r, we note first that \(B(x,r) \subset B\big (y, \rho (y,x) +r\big )\), which coupled with (4.4) leads to

$$\begin{aligned} |B(x, r)| \le c\big (1+ \rho (x,y)/r\big )^d |B(y, r)|, \quad x, y\in \mathcal {M}, \; r>0. \end{aligned}$$
(4.8)

The latter implies that the kernel in (4.6) can be estimated by:

$$\begin{aligned} \mathcal {D}_{h,\tau }(x,y)\le c|B(x,h)|^{-1}(1+h^{-1}\rho (x,y))^{-\tau +d/2}. \end{aligned}$$
(4.9)

We are now in a position to state the boundedness of the mean squared error in the more general setting of spaces of homogeneous type associated with operators:

Theorem 4.1

Let \(s>0,\) \(f\in L^{\infty }\cap {\dot{\mathcal {H}}}^s\) and a symbol \(k \in \mathcal {C}^{\tau }({\mathbb {R}}_+)\) for some \(\tau >3d/2+s\), satisfying: \(k(0)=1\),

$$\begin{aligned} k^{(\nu )}(0)=0,\quad \text {for every}\;\;1\le \nu \le \tau , \end{aligned}$$
(4.10)

and for some \(r> \tau +d\),

$$\begin{aligned} |k^{(\nu )}(\lambda )|\le C_{\tau }(1+\lambda )^{-r},\quad \text {for every}\;\;\lambda \ge 0,\;0\le \nu \le \tau . \end{aligned}$$
(4.11)

We pick \(h=h_n=n^{-\frac{1}{2s+d}}\). Then for every \(n\in {\mathbb N}\) the corresponding kde \({\hat{f}}_{n,h}\) satisfies

$$\begin{aligned} \sup \limits _{x\in \mathcal {M}} {\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ]\le c C(f) n^{-\frac{2s}{2s+d}}, \end{aligned}$$
(4.12)

where the constant \(c>0\) depends only on \(\tau ,\;s,\;C_{\tau }\) and the structural constants of the setting, while C(f) is given by

$$\begin{aligned} C(f):=\max \big (\Vert f\Vert _{\infty },\Vert f\Vert _{{\dot{\mathcal {H}}}^s}^2\big ). \end{aligned}$$
(4.13)

Proof

In the light of (3.8), we need to bound the variance and the bias in a similar manner as in Propositions 3.4 and 3.5.

We start with the variance.

By (3.17) and (4.9) we get the behaviour

$$\begin{aligned} |\mathcal {K}_h(x,y)|\le c|B(x,h)|^{-1}\big (1+h^{-1}\rho (x,y)\big )^{-\tau +d/2},\quad \text {for every}\;x,y\in \mathcal {M}. \end{aligned}$$

Substituting this into (3.18), and since \(\tau>3d/2+s>d\), we obtain

$$\begin{aligned} {\mathbb {E}}[\eta _i^2(x)]\le & {} c \Vert f\Vert _{\infty } |B(x,h)|^{-2}\int _M\big (1+h^{-1}\rho (x,y)\big )^{-2\tau +d}d\mu (y)\\= & {} c \Vert f\Vert _{\infty } |B(x,h)|^{-2}I_{h,2\tau -d}\le c \Vert f\Vert _{\infty } |B(x,h)|^{-1} \le C_1 \Vert f\Vert _{\infty } h^{-d}, \end{aligned}$$

where we used (4.7) and (4.5) respectively. The estimation of the variance in (3.14) now follows as in (3.19) and (3.20).

We proceed to bound the bias as in Proposition 3.5. Recall that \(\ell :=\lfloor s\rfloor \). This time the kernels \(\mathcal {G}^j_{2^{-j}}(x,y)\) enjoy the behaviour

$$\begin{aligned} |\mathcal {G}^j_{2^{-j}}(x,y)|\le c\mathcal {D}_{2^{-j},\tau -\ell }(x,y)\le c|B(x,2^{-j})|^{-1}\big (1+2^{j}\rho (x,y)\big )^{-\tau +\ell +d/2}, \end{aligned}$$

thanks to (4.9). Then

$$\begin{aligned} |b(x)|&\le \sum _{j=i}^{\infty }2^{-\ell j}\int _{\mathcal {M}}|\mathcal {G}^j_{2^{-j}}(x,y)|\big |L^{\ell /2}f(y)-L^{\ell /2}f(x)\big |d\mu (y) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}|B(x,2^{-j})|^{-1}I_{2^{-j},\tau -\frac{d}{2}-s}(x) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}\le C_2\Vert f\Vert _{{\dot{\mathcal {H}}}^s} h^{s}, \end{aligned}$$
(4.14)

where we used (4.7) which is valid because \(\tau >3d/2+s\).

The remainder of the proof is the same as in the proof of Theorem 3.1. \(\square \)

Let us close this section with some comments on the geometric assumptions.

Obviously the doubling property (4.1) is a more general assumption than Ahlfors regularity (1.1). A simple illustrative example is the weighted ball \({\mathbb B}^m:=\big \{x\in \mathbb {R}^m: \Vert x\Vert <1\big \}\) of \({\mathbb R}^m\), equipped with the distance (2.20) and the weighted measure (Dai and Xu 2013)

$$\begin{aligned} d\mu _{\gamma }(x):= (1-\Vert x\Vert ^2)^{\gamma -1/2} dx, \quad \gamma >-1. \end{aligned}$$
(4.15)

As in Dai and Xu (2013) we have that

$$\begin{aligned} |B(x, r)| \sim r^m(1-\Vert x\Vert ^2+r^2)^\gamma , \end{aligned}$$
(4.16)

which implies that \((\mathcal {M}, \rho , \mu _{\gamma })\) satisfies the doubling property (4.1) and the non-collapsing condition (4.2).

Precisely, the homogeneous dimension is \(d=m+2\max (\gamma ,0)\). Clearly \(m\le d\), with equality exactly when \(\gamma =0\), which corresponds to the unweighted case and renders the space Ahlfors regular.

Moreover, it is now apparent how the homogeneous dimension fundamentally depends on the measure of the space and may or may not be an integer.

Note also that in the weighted case, the proper operator is

$$\begin{aligned} L:=L_{\gamma }:= -\sum _{i=1}^m (1-x_i^2)\partial ^2_i + 2\sum _{1\le i < j \le m}x_i x_j\partial _i\partial _j + (m+2 \gamma )\sum _{i=1}^m x_i \partial _i. \end{aligned}$$

This operator satisfies Assumption II; see Dai and Xu (2013) and Kerkyacharian et al. (2020).

Note finally that the non-collapsing condition holds true for every space \((\mathcal {M},\rho ,\mu )\) of homogeneous type which is of finite measure \(\mu (\mathcal {M})<\infty \); see Coulhon et al. (2012).

5 Data Illustration

In this data illustration, we use earthquake location data for all earthquakes with a reported magnitude of 6.5 or higher between 1990 and 2021 (inclusive). These data are freely available through the United States Geological Survey website https://earthquake.usgs.gov/earthquakes/search/. In total, there are \(n = 1507\) earthquakes that fit these criteria, and we plot the locations of these earthquakes in Fig. 1.

Fig. 1
figure 1

Earthquakes from 1990–2021 with a magnitude of 6.5 or greater

To explain some of the earthquake patterns in Fig. 1, we briefly discuss plate tectonics. The Earth’s crust or lithosphere is divided into distinct and irregular sections of solid rock called tectonic plates. The tectonic plates float and gradually move on the molten rock of the Earth’s mantle. Many geological events (e.g., volcanic eruptions and earthquakes) occur where different tectonic plates meet. For this reason, high magnitude earthquakes are highly concentrated around tectonic plate boundaries, and this global network of plate boundaries is evident in the earthquake patterns in Fig. 1. Earthquakes also occur elsewhere in the world at lower rates.

The Circum-Pacific Belt (the west coasts of the American continents, from Alaska to East Asia, stretching down to the Pacific Islands), sometimes called the Pacific Rim, is the most seismically active. Note that the Pacific Islands (e.g., Tonga, Fiji, New Zealand, and New Caledonia) appear on the left and right of Fig. 1. The entire Pacific Rim has high concentrations of high magnitude earthquakes, but we point out two other areas with very high concentrations of earthquakes. There are many earthquakes in a small area around the South Sandwich Islands, including earthquakes with 7.5 and 8.1 magnitudes on August 12, 2021. We also point out the Alpide Belt, a region that runs along the Azores, the Mediterranean, the Middle East, the Himalayas, Indonesia, and connects to the Pacific Rim in the Pacific Islands. Given the distribution of earthquakes seen here, we anticipate the need for a heterogeneous density estimate.

These data are distributed globally and are indexed on the sphere. To estimate the density of earthquakes, we approximate the density estimator in (2.27) by selecting a finite truncation point N,

$$\begin{aligned} {\hat{f}}_{n,h,N}( \xi ):= \frac{1}{n} \sum ^n_{i=1} \sum ^N_{\nu =0}\frac{1 + 2 \nu }{4 \pi } k(h\sqrt{\nu (\nu +1)}) P_\nu \left( \langle \xi , X_i \rangle \right) , \end{aligned}$$
(5.1)

where \(X_i\) are earthquake locations, \(k(\cdot )\) is defined in (3.34) with \(r = 5 + \lceil s\rceil \), and \(P_\nu \) are Legendre polynomials.

Because the truncation induces error in the estimation, we anticipate that lower values of N will decrease the accuracy. In Fig. 2, given n, we plot the theoretical upper bound of the truncation error in (3.36) against the truncation point for various values of s. For all values of s, the upper bound of truncation error decreases as N increases.

Fig. 2
figure 2

The theoretical upper bound on the truncation error from (3.36) for combinations of s and N. The dashed line indicates an error of 0.01
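A short computation in the spirit of Fig. 2 tabulates, for a few smoothness levels, the smallest truncation point whose bound (3.36) falls below the 0.01 threshold (an illustrative sketch; n = 1507 as in these data):

```python
import numpy as np

def error_bound(n, s, N):
    """Upper bound (3.36) on the truncation error, with r = 5 + ceil(s)
    and the bandwidth h = n^(-1/(2s+2)) of Theorem 3.1 (d = 2)."""
    r = 5 + int(np.ceil(s))
    h = n ** (-1 / (2 * s + 2))
    return 0.51 * h ** (-r) * N ** (-(r - 2)) / (np.pi * (r - 2))

n = 1507  # number of earthquakes in the dataset
for s in [0.01, 0.5, 1.0]:
    # Smallest N on a coarse grid (N >= 24 is required by the derivation
    # of (3.36)) keeping the bound below 0.01.
    N_ok = next(N for N in range(25, 2001) if error_bound(n, s, N) <= 0.01)
    print(s, N_ok)
```

Note how rougher densities (smaller s) force a much larger truncation point, consistent with the curves in Fig. 2.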

Because (5.1) is not guaranteed to be positive, we use a rectified density estimate,

$$\begin{aligned} {\hat{f}}^*_{n,h,N}( \xi ) = \max (10^{-3},{\hat{f}}_{n,h,N}( \xi ) ). \end{aligned}$$
(5.2)

In our analysis of these data, we explore the effect of the bandwidth and the truncation point of (5.1) on the density estimates of earthquake locations. We use out-of-sample performance to determine the bandwidth and truncation point. Specifically, we randomly hold out 20% of the earthquakes as a test dataset \(\{X^{\text {test}}_1,\ldots ,X^{\text {test}}_{n_{\text {test}}}\}\) to validate the density estimator. Using many different bandwidth and truncation point combinations, we calculate density estimators \({\hat{f}}^*_{n_{\text {train}},h,N}( \xi )\) using the remaining 80% of the data. Then, at the hold-out locations, we evaluate

$$\begin{aligned} {\hat{f}}^*_{n_{\text {train}},h,N}( X^{\text {test}}_1 ),\dots , {\hat{f}}^*_{n_{\text {train}},h,N}( X^{\text {test}}_{n_{\text {test}}} ). \end{aligned}$$

Using these evaluations, we compute the out-of-sample mean log-loss (negative log-score) at the hold-out locations

$$\begin{aligned} \frac{1}{n_{\text {test}}}\sum ^{n_{\text {test}}}_{i=1} -\log \left( {\hat{f}}^*_{n_{\text {train}},h,N}( X^{\text {test}}_i )\right) . \end{aligned}$$

The use of the log-loss as a proper scoring rule is common; see, for example, Good (1952) and Gneiting and Raftery (2007).

Rather than consider the bandwidth directly, we let \(h = n^{-1/(2s + 2)}\) and consider \(s \in \{0.001, 0.01, 0.05, 0.5, 1 \}\), where s indexes the smoothness of the density (see Sect. 2.4). Based on how concentrated earthquake events are, small values of s (i.e., smaller bandwidths) are preferable to smoother alternatives, which would yield more uniform density estimates. In our analysis, we also vary the truncation point of the density estimator, \(N \in \{5,10,20,30,40,50,75,100\}\); larger values of N yielded no improvement in log-loss. We select the truncation point and bandwidth with the lowest out-of-sample mean log-loss.
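The scoring step can be sketched as follows; the uniform density on the sphere stands in for a fitted estimator (this is an illustration, not the code used in the analysis), and its known log-loss \(\log (4\pi )\) serves as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_log_loss(density, X_test):
    # Rectify the density values as in (5.2), then score with the
    # out-of-sample mean log-loss (negative log-score).
    vals = np.maximum(1e-3, density(X_test))
    return -np.mean(np.log(vals))

# Hold-out locations: uniform points on the sphere (normalized Gaussians).
X_test = rng.standard_normal((300, 3))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

# Stand-in "fitted" estimator: the uniform density 1/(4 pi) on S^2.
uniform = lambda X: np.full(len(X), 1 / (4 * np.pi))
print(mean_log_loss(uniform, X_test))  # = log(4*pi) ≈ 2.531
```

In the actual analysis this score is computed for every (s, N) combination on the grid, and the pair minimizing it is selected.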

We plot the mean log loss as a function of the truncation point for various values of s in Fig. 3. Overall, for each s, increasing N improves out-of-sample performance up to a point; then, the improvement flattens and appears to reach an asymptote. In addition, smaller values of s have better out-of-sample performance; however, values of s less than 0.01 do not change model performance. The best out-of-sample performance (lowest log loss) is with \(N = 75\), and there is no appreciable difference between \(s = 0.01\) and \(s = 0.001\) (or even smaller values of s). For this reason, we use \(s = 0.01\).

Fig. 3
figure 3

The mean out-of-sample log loss (negative log-score) for various combinations of s and N

For \(N = 75\) and \(s = 0.01\), we plot a heat map of the estimated density over a fine grid over the Earth in Fig. 4. The colors are on a natural log scale to better show variability in the density. The Pacific Rim is evident in the density estimate, with the most striking feature being the very high estimated densities around the Pacific Islands. We also note that the South Sandwich Islands and the Alpide Belt are visible for their relatively high earthquake densities.

Fig. 4
figure 4

Kernel density estimate for the 1990–2021 earthquake data for \(s = 0.01\) and \(N = 75\). The colors of the density are presented on the log scale

In this data illustration, we considered a global dataset of earthquakes from 1990–2021. We compared many truncated kernel density estimators indexed on the sphere on a test set. We found that the kernel density estimates perform better out-of-sample with shorter bandwidths (smaller s) and more polynomial terms (higher N). For the best combination of s and N considered, we plotted the estimated density and commented on its features. In future analyses of these data, one may also estimate the density of earthquake magnitude; in that case, it may be beneficial to allow the magnitude density to vary smoothly over space as in Sheanshang et al. (2021). In addition, one may account for aftershock excitation in the estimation, as is sometimes done in point process methodology (see, e.g., Hawkes 1971a, b; Ogata 1988, 1998; White and Gelfand 2021).