1 Introduction

Today, technology has equipped science with a massive amount of data that requires rigorous analysis. In astronomy, data can come from missions to other planets, telescopes observing distant parts of the universe, or programs studying Cosmic Microwave Background Radiation. In climatology and environmental science, sensors provide data on the atmosphere. Medicine uses scans to track the growth of tumors and monitor their development, while embryology uses data to track the growth and ensure the health of developing humans. Essentially all scientific fields now heavily rely on data.

The complexity and form of the data reflect their nature. In the examples mentioned above, the data can be represented by geometric structures that capture their form and dynamics. A dataset should be understood as a collection of independent realizations of a random variable (rv) X. Such a rv lives in a domain determined by its nature. For instance, when X represents locations on some planet, X lives on the sphere \({\mathbb S}^2\) of the Euclidean space \({\mathbb {R}}^3\); the same is true of CMB radiation. For geological data in the interior of the Earth, or of another celestial body, the domain of study may be the ball \({\mathbb B}^3\). Similarly, in medicine, the domain of definition of X can become much more complicated geometrically, and as a result the general target domain becomes an abstract metric space \(\mathcal {M}\).

Let X be a rv distributed on a metric measure space \(\mathcal {M}\) and let \(f=f_{X}\) be its unknown probability density function (pdf). Density estimation, that is, estimating a pdf from data \(X_1,\dots ,X_n\), is an important problem in Statistics. To this end we need to construct a density estimator, which is an object of the form \({\hat{f}}_{n}(X_1,\dots ,X_n;x)\), where \({\hat{f}}_n:\mathcal {M}^{n}\times \mathcal {M}\rightarrow {\mathbb R}\) is a measurable function. A famous method for obtaining such an estimator is via the so-called “kernel density estimators”.

Nonparametric Statistics approaches the problem of density estimation by constructing appropriate kernel density estimators, which can approximate any density belonging to certain regularity spaces. Historically, these methods were pioneered by Rosenblatt (1956), Parzen (1962) and Bretagnolle and Huber (1979). The first books on the topic include Silverman (1986) and Härdle et al. (1998), while today the book Tsybakov (2009) is considered one of the main reference points. For an indicative list of contributions we refer to Baldi et al. (2009), Baraud et al. (2014), Bates and Mio (2014), Berry and Sauer (2017), Birgé (2014), Devroye and Györfi (1985), Devroye and Lugosi (1996), Devroye and Lugosi (1997), Donoho et al. (1996), Efroimovich (1986), Goldenshluger and Lepski (2014), Goldenshluger and Lepski (2011a, 2011b, 2022a, 2022b), Hall et al. (1987), Hasminskii and Ibragimov (1990), Ibragimov and Khasminski (1980), Kerkyacharian et al. (1996), Kerkyacharian et al. (2001), Kerkyacharian et al. (2008), Massart (2007), Pelletier (2005), Pelletier (2006), Rigollet (2006), Rigollet and Tsybakov (2007), Samarov and Tsybakov (2007).

Here we study kernel density estimators on metric measure spaces under very broad assumptions. The setting in which we work simultaneously covers the classical cases of the Euclidean space \({\mathbb R}^d\), the sphere \({\mathbb {S}}^{d}\), the ball \({\mathbb {B}}^d\) and many more significant examples of independent interest. Furthermore, it contains more sophisticated geometric settings such as manifolds and Lie groups. On the other hand, techniques originating from spectral theory will simplify and unify several aspects of the approach. We shall operate in the setting put forward in Coulhon et al. (2012), which we describe next in a simplified form:

I. We assume that \((\mathcal {M},\rho ,\mu )\) is a metric measure space such that \((\mathcal {M}, \rho )\) is locally compact with distance \(\rho (\cdot , \cdot )\) and \(\mu \) is a positive Radon measure satisfying:

(i) Ahlfors regularity: There exist constants \(c_1\ge 1\) and \(d>0\) such that

$$\begin{aligned} c_1^{-1}r^d\le |B(x,r)| \le c_1 r^{d} \quad \hbox {for every }x \in \mathcal {M}\hbox { and }r>0, \end{aligned}$$
(1.1)

where |B(x, r)| is the volume of the open ball \(B(x,r):=\{y\in \mathcal {M}:\rho (x,y)<r\}\) centred at x of radius r.

The number d is the so-called Ahlfors dimension of the space.
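For orientation, Ahlfors regularity is immediate in the basic Euclidean case, where \(|B(x,r)|=c_d r^d\) with \(c_d\) the volume of the unit ball. The following minimal sketch (plain Python, purely illustrative and not part of the paper's machinery) checks (1.1) with \(c_1=\max (c_d,c_d^{-1})\):

```python
import math

def ball_volume(d, r):
    # Lebesgue volume of a Euclidean ball of radius r in R^d
    return math.pi ** (d / 2) / math.gamma(d / 2 + 1) * r ** d

# Ahlfors regularity (1.1) on R^d: |B(x, r)| = c_d * r^d exactly,
# so c_1 = max(c_d, 1/c_d) works for every centre x and every radius r.
d = 3
c_d = ball_volume(d, 1.0)
c1 = max(c_d, 1.0 / c_d)
for r in (0.1, 1.0, 10.0):
    vol = ball_volume(d, r)
    assert c1 ** -1 * r ** d <= vol <= c1 * r ** d
```

In this case the Ahlfors dimension d coincides with the usual Euclidean dimension.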

II. The second assumption is that there exists an essentially self-adjoint non-negative operator L on \({\mathbb L}^2(\mathcal {M}, d\mu )\), mapping real-valued to real-valued functions, such that the associated semigroup (more details in Sect. 2) \(P_t=e^{-tL}\), \(t>0\), consists of integral operators with (heat) kernel \(p_t(x,y)\) obeying the conditions:

(ii) Gaussian localization: There exist constants \(c_2,c_3>0\) such that

$$\begin{aligned} |p_t(x,y)| \le c_2 t^{-d/2}\exp \Big \{-c_3\frac{\rho ^2(x,y)}{t}\Big \} \quad \hbox {for every} \;\;x,y\in \mathcal {M},\,t>0. \end{aligned}$$
(1.2)

(iii) Hölder continuity: There exists a constant \(\alpha >0\) such that

$$\begin{aligned} \big | p_t(x,y) - p_t(x,y') \big | \le c_2\Big (\frac{\rho (y,y')}{\sqrt{t}}\Big )^\alpha t^{-d/2}\exp \Big \{-c_3\frac{\rho ^2(x,y)}{t}\Big \} \end{aligned}$$
(1.3)

for every \(x, y, y'\in \mathcal {M}\) such that \(\rho (y,y')\le \sqrt{t}\) and \(t>0\).

(iv) Markov property:

$$\begin{aligned} \int _{\mathcal {M}} p_t(x,y) d\mu (y)= 1 \quad \hbox {for every }x\in \mathcal {M}\hbox { and } t >0. \end{aligned}$$
(1.4)
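Conditions (ii)–(iv) are modeled on the classical heat kernel \(p_t(x,y)=(4\pi t)^{-d/2}e^{-|x-y|^2/4t}\) of the Euclidean semigroup \(e^{t\Delta }\), for which (1.2) holds with equality when \(c_2=(4\pi )^{-d/2}\) and \(c_3=1/4\). A quick numerical sanity check in dimension one (illustrative only):

```python
import math

def heat_kernel(x, y, t):
    # Euclidean heat kernel on R (d = 1): the kernel of e^{t Laplacian}
    return (4 * math.pi * t) ** -0.5 * math.exp(-(x - y) ** 2 / (4 * t))

t = 0.5
c2, c3 = (4 * math.pi) ** -0.5, 0.25
# (ii) Gaussian localization (1.2), here attained with equality
for x in (-1.0, 0.0, 2.0):
    p = heat_kernel(x, 0.0, t)
    assert p <= c2 * t ** -0.5 * math.exp(-c3 * x ** 2 / t) + 1e-12

# (iv) Markov property (1.4): the kernel integrates to one in y
mass = sum(heat_kernel(0.0, -20 + 0.01 * i, t) * 0.01 for i in range(4001))
assert abs(mass - 1.0) < 1e-3
```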

The setting we study generalizes (by default) the Euclidean space. Moreover, it contains spaces like the sphere, the ball, the interval, cubes/rectangles, the simplex, Riemannian manifolds with non-negative Ricci curvature and more, each equipped with its natural metric and measure and associated with Laplace or Laplace–Beltrami operators. For more examples we refer the reader to Coulhon et al. (2012), Georgiadis and Nielsen (2017), Kerkyacharian et al. (2020), Kerkyacharian and Petrushev (2015).

Some first contributions in Statistics in this generality can be found in Castillo et al. (2014), Cleanthous et al. (2020, 2022), Kerkyacharian et al. (2018), while a large number of open problems remain for the community.

The aim of the present study is threefold:

\((\alpha )\) To review the setting and the construction of kernel density estimators together with the corresponding theoretical background, which is demanding, on the broad framework under study; Sect. 2.

\((\beta )\) As novel results, we obtain optimal pointwise density estimation on Hölder spaces (see Sects. 3 and 4), and we shed light on the assumptions and methods.

\((\gamma )\) As an application, we perform a data analysis of earthquakes (Sect. 5) using our kernel density estimators. Precisely, we compare the out-of-sample performance of several approximated kernel density estimators and plot the heat map of the estimated density using the selected model.

Remarks and examples are placed at several points of the manuscript to highlight notions and ideas. The new results are contained in Sect. 3, and under more general assumptions in Sect. 4, and are accompanied by remarks that could serve future studies.

Section 5 is dedicated to the data analysis of earthquakes. In this Section we apply the theoretical results of the paper and show how one can use these approaches with occurrence data on the Earth. The data used in this analysis are freely available through the United States Geological Survey website https://earthquake.usgs.gov/earthquakes/search/.

Notation: Throughout, positive constants will be denoted by c, and will be allowed to vary at every occurrence. The dependence of a constant on the geometric structure constants \(c_1,c_2,c_3,\alpha \) and d will not be stated, but its dependence on a parameter q will be indicated as \(c_q\). We denote by \({\mathbb N},\;{\mathbb R},\;{\mathbb R}_+\) the sets of positive integers, real numbers and non-negative real numbers, respectively. If \(\tau \in {\mathbb N}\), the class of differentiable functions on \({\mathbb R}_+\) with continuous derivatives up to order \(\tau \) will be denoted by \(\mathcal {C}^{\tau }({\mathbb R}_+)\). For \(s>0\), we will denote by \(\lfloor s\rfloor \) the greatest integer strictly less than s and by \(\lceil s\rceil \) the smallest integer strictly larger than s.

2 Density estimation on metric spaces associated with operators: A review

The first part of our study consists of a review of density estimation on metric spaces associated with operators. One of the milestones is the construction of kernels. We expand here the methods used in Cleanthous et al. (2020, 2022), inspired by the corresponding machinery built in Coulhon et al. (2012), which rests on powerful spectral theory.

2.1 Functional calculus

We start with some fundamental notions of Spectral Theory, providing a minimal background on this wide scientific field; the reader is further referred to Prugovečki (1981), Reed and Simon (1980), Yoshida (1978).

Recall that L is assumed to be a non-negative self-adjoint operator that maps real-valued to real-valued functions. Then (Prugovečki 1981, Section 5) L admits a unique spectral measure E; that is, a projector-valued mapping defined as follows:

Denote by \(\mathcal {B}\) the Borel \(\sigma \)-algebra on \({\mathbb R}\). To every \(S\in \mathcal {B}\) we associate an orthogonal projection \(E(S):{\mathbb L}^2(\mathcal {M},d\mu )\rightarrow {\mathbb L}^2(\mathcal {M},d\mu )\) such that:

(i) \(E({\mathbb R})=I\) (the identity operator on \({\mathbb L}^2(\mathcal {M},d\mu )\)).

(ii) For every sequence of disjoint Borel sets \(\{S_n\}_{n\in {\mathbb N}}\subset \mathcal {B}\)

$$\begin{aligned} E(S)=\sum _{n=1}^{\infty }E(S_n),\quad \text {where}\;\; S:=\bigcup _{n=1}^{\infty }S_{n}, \end{aligned}$$
(2.1)

in the strong \({\mathbb L}^2(\mathcal {M},d\mu )\) sense; i.e. for every \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\),

$$\begin{aligned} \left\| \left( E(S)-\sum _{n=1}^{N}E(S_n)\right) f\right\| _2\xrightarrow {N\rightarrow \infty }0. \end{aligned}$$
(2.2)

Thanks to (ii), for every \(f,g\in {\mathbb L}^2(\mathcal {M},d\mu )\) the set-function

$$\begin{aligned} \nu _{f,g}(S):=\langle E(S)f,g\rangle ,\quad \text {for every }\;S\in \mathcal {B}, \end{aligned}$$
(2.3)

is a complex measure on \(({\mathbb R},\mathcal {B})\).

Moreover for every \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\) the set-function

$$\begin{aligned} \nu _{f}(S)&:=\nu _{f,f}(S)=\langle E(S)f,f\rangle \\&=\langle E(S)^2 f,f\rangle =\langle E(S)f,E(S)f\rangle =\Vert E(S)f\Vert _2^2,\quad S\in \mathcal {B}\end{aligned}$$

is a measure on \(({\mathbb R},\mathcal {B})\), which is finite and precisely \(\nu _{f}({\mathbb R})=\Vert f\Vert _2^2<\infty \).

The study can be slightly simplified by means of the following projector-valued function

$$\begin{aligned} {\mathbb R}\ni \lambda \mapsto E_{\lambda }:=E(I_\lambda ),\quad I_{\lambda }:=(-\infty ,\lambda ] \end{aligned}$$
(2.4)

which is referred to as the spectral resolution of L. Moreover, for every \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\) and every \(\lambda \in {\mathbb R}\), we have \(\nu _{f}(\lambda ):=\nu _{f}(I_\lambda )=\langle E_{\lambda }f,f\rangle =\Vert E_{\lambda }f\Vert _2^2\).

Given further that L is assumed non-negative, by Prugovečki (1981, Theorem 6.3), the domain \(\text {Dom}(L)\) of L consists of all functions \(f\in {\mathbb L}^2(\mathcal {M},d\mu )\) such that

$$\begin{aligned} \int _{0}^{\infty }\lambda ^2 d\nu _{f}(\lambda )=\int _{0}^{\infty }\lambda ^2 d\langle E_{\lambda }f,f\rangle <\infty . \end{aligned}$$
(2.5)

Moreover for every \(f\in \text {Dom}(L)\) and \(g\in {\mathbb L}^2(\mathcal {M},d\mu )\)

$$\begin{aligned} \langle Lf,g\rangle =\int _{0}^{\infty } \lambda d\langle E_{\lambda }f,g\rangle . \end{aligned}$$
(2.6)

It is customary to write symbolically

$$\begin{aligned} L=\int _{0}^{\infty }\lambda dE_{\lambda }, \end{aligned}$$
(2.7)

the so-called spectral decomposition of L.

The next logical step is the functional calculus associated with the operator L; see also Reed and Simon (1980, Theorem VIII.5).

Let \(g:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a Borel measurable function. Then the operator g(L), defined on

$$\begin{aligned} \text {Dom}(g):=\left\{ \varphi \in {\mathbb L}^2(\mathcal {M},d\mu ):\;\int _{0}^{\infty }|g(\lambda )|^2 d\langle E_{\lambda }\varphi ,\varphi \rangle <\infty \right\} , \end{aligned}$$
(2.8)

as

$$\begin{aligned} \langle g(L)\varphi ,\psi \rangle =\int _{0}^{\infty }g(\lambda ) d\langle E_{\lambda }\varphi ,\psi \rangle ,\quad \text {for every }\varphi \in \text {Dom}(g),\;\psi \in {\mathbb L}^2(\mathcal {M},d\mu ),\nonumber \\ \end{aligned}$$
(2.9)

is a self-adjoint operator mapping real-valued functions to real-valued functions. If g is further assumed to be bounded, then \(\text {Dom}(g)={\mathbb L}^2(\mathcal {M},d\mu )\) and \(g(L):{\mathbb L}^2(\mathcal {M},d\mu )\rightarrow {\mathbb L}^2(\mathcal {M},d\mu )\) is a bounded operator. The operator g(L) is referred to as the spectral multiplier associated with g and L, and it is symbolically expressed —in the spirit of (2.7)— as

$$\begin{aligned} g(L)=\int _{0}^{\infty }g(\lambda ) dE_{\lambda }. \end{aligned}$$
(2.10)

The above spectral multipliers can take an explicit form for particular operators L and metric spaces, as we will see in Sect. 2.2.

For the purpose of the study of kernel density estimators we turn our attention to spectral multipliers associated with the operator \(\sqrt{L}\), which is well-defined and self-adjoint; see Yoshida (1978). The exact reasons behind the switch to \(\sqrt{L}\) are discussed in Cleanthous et al. (2022, Remark 2.2.\((\alpha )\)).

We denote by \(\{F_{\lambda }:\lambda \ge 0\}\) the spectral resolution of \(\sqrt{L}\). Then \(F_{\lambda }=E_{\lambda ^2}\) and for every Borel measurable \(g:{\mathbb R}_{+}\rightarrow {\mathbb R}\) it holds

$$\begin{aligned} g(\sqrt{L})=\int _{0}^{\infty }g(\lambda )dF_{\lambda }=\int _{0}^{\infty }g(\sqrt{\lambda })dE_{\lambda }. \end{aligned}$$
(2.11)

Summary and notation For the rest of our study we fix the following terminology and notation.

As a symbol we will refer to a Borel measurable and bounded function

$$\begin{aligned} k:{\mathbb R}_+\rightarrow {\mathbb R}\quad \text {(symbol)}. \end{aligned}$$

The spectral multiplier associated with the symbol k and the operator \(\sqrt{L}\) as in (2.11) will be denoted by the corresponding capital letter:

$$\begin{aligned} K:=k(\sqrt{L}):{\mathbb L}^2(\mathcal {M},d\mu )\rightarrow {\mathbb L}^2(\mathcal {M},d\mu )\quad \text {(spectral multiplier)}. \end{aligned}$$

By the above discussion, the operator K:

(i) is bounded on \({\mathbb L}^2(\mathcal {M},d\mu )\),

(ii) is self-adjoint, and

(iii) maps real-valued functions to real-valued functions.

For the purpose of kernel density estimation we are interested in the following class of operators: we say that K is an integral operator when there exists a measurable function \(\mathcal {K}(x, y)\) —referred to as the kernel of the operator K—

$$\begin{aligned} \mathcal {M}\times \mathcal {M}\ni (x,y)\mapsto \mathcal {K}(x,y)\in {\mathbb R}\quad (\text {kernel}) \end{aligned}$$

such that

$$\begin{aligned} K(f)(x)=\int _{\mathcal {M}}\mathcal {K}(x,y)f(y)d\mu (y),\quad f\in \text {Dom}(K),\;x\in \text {Dom}(f). \end{aligned}$$
(2.12)

Note further that when the spectral multiplier \(K=k(\sqrt{L})\) is an integral operator, its kernel is real-valued and symmetric:

$$\begin{aligned} \mathcal {K}(x,y)=\mathcal {K}(y,x)\in {\mathbb R}. \end{aligned}$$

Such kernels are exactly the objects we will use for the kernel density estimation.

As always, we need a notion of dilation suitable for use in the current framework.

Let \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a symbol, let K be the spectral multiplier associated with k and \(\sqrt{L}\), and assume that K is an integral operator with kernel \(\mathcal {K}(x,y)\), as above. For every \(h>0\) we denote by

(i) \(k_{h}(\lambda ):=k(h\lambda ),\;\lambda \in {\mathbb R}_+\), the symbol induced by k dilated by h.

(ii) \(K_h=k_h(\sqrt{L})=k(h\sqrt{L})\), the spectral multiplier associated with \(k_h\) and \(\sqrt{L}\).

(iii) \(\mathcal {K}_h(x,y)=\mathcal {K}_h(y,x)\), the symmetric real-valued kernel of \(K_h\).

We shall need the following result from the smooth functional calculus induced by the heat kernel, developed in Coulhon et al. (2012) and Kerkyacharian and Petrushev (2015). We first fix the following notation: let \(h>0\) and \(\tau >0\). We denote by

$$\begin{aligned} \mathcal {D}_{h,\tau }(x,y):=h^{-d}\big (1+h^{-1}\rho (x,y)\big )^{-\tau },\quad \text {for}\;x,y\in \mathcal {M}. \end{aligned}$$
(2.13)

Theorem 2.1

Suppose \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) is a symbol such that \(k\in \mathcal {C}^\tau ({\mathbb R}_+)\) for some \(\tau >d\),

$$\begin{aligned} |k^{(\nu )}(\lambda )|\le C_\tau (1+\lambda )^{-r} \quad \text {for every}\; \lambda \ge 0\;\text {and}\;0\le \nu \le \tau ,\;\text {where}\; r > \tau +d,\nonumber \\ \end{aligned}$$
(2.14)

and \(k^{(2\nu +1)}(0)=0\) for every \(\nu \ge 0\) such that \(1\le 2\nu +1 \le \tau \).

Then \(K_h\), \(h>0\), is an integral operator with kernel \(\mathcal {K}_h(x, y)\) satisfying

$$\begin{aligned} \big |\mathcal {K}_h(x, y)\big | \le cC_\tau \mathcal {D}_{h,\tau }(x,y), \end{aligned}$$
(2.15)

where \(c>0\) is a constant depending on \(\tau \) and the structural geometric constants of the setting.

Moreover, for every \(h>0\) and \(x\in \mathcal {M}\)

$$\begin{aligned} \int _{\mathcal {M}} \mathcal {K}_h(x, y)d\mu (y)=k(0). \end{aligned}$$
(2.16)

Remark 2.2

Let us comment on Theorem 2.1.

\((\alpha )\) Let \(k\in \mathcal {C}^{\tau }({\mathbb R})\) be an even function. Then the assumption \(k^{(2\nu +1)}(0)=0\), \(0<2\nu +1\le \tau \), holds automatically.

\((\beta )\) Let \(k\in \mathcal {C}^{\tau }({\mathbb R}_+)\) be such that \(\mathrm{{supp}\, }k\subset [0,b]\), for some \(b>0\). Then (2.14) holds for \(C_{\tau }:=(1+b)^r\max \{\Vert k^{(\nu )}\Vert _{\infty }:0\le \nu \le \tau \}\).

\((\gamma )\) Let us return to Assumption II of the setting and shed more light on it. The heat kernel \(p_t(x,y)\) is the kernel of the operator \(e^{-tL}\). At this point, being more familiar with Spectral Theory, we define \(k(\lambda ):=e^{-\lambda ^2}\). Then for every \(t>0\) the operator \(e^{-tL}\) is precisely the spectral multiplier \(K_{\sqrt{t}}\), which is an integral operator by Theorem 2.1, and the heat kernel equals \(p_{t}(x,y)=\mathcal {K}_{\sqrt{t}}(x,y)\).
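For concreteness, a standard symbol covered by Remark 2.2\((\beta )\) is a smooth cut-off equal to 1 on [0, 1] and supported in [0, 2]; being constant near the origin, all of its derivatives vanish at 0, so the odd-derivative condition of Theorem 2.1 holds automatically. A minimal sketch of such a construction via the classical \(e^{-1/x}\) glue (illustrative; this specific cut-off is our choice, not prescribed by the paper):

```python
import math

def bump(x):
    # smooth transition function: 0 for x <= 0, positive for x > 0
    return math.exp(-1.0 / x) if x > 0 else 0.0

def k_symbol(lam):
    # C-infinity symbol with k = 1 on [0, 1], supp k in [0, 2], k(0) = 1;
    # all derivatives vanish at 0 since k is constant near the origin
    if lam <= 1.0:
        return 1.0
    if lam >= 2.0:
        return 0.0
    return bump(2.0 - lam) / (bump(2.0 - lam) + bump(lam - 1.0))

assert k_symbol(0.0) == 1.0 and k_symbol(2.5) == 0.0
```

Being smooth and compactly supported, such a symbol satisfies (2.14) with the constant \(C_{\tau }\) of Remark 2.2\((\beta )\).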

2.2 Examples

We present the most basic examples of spaces \((\mathcal {M},\rho ,\mu ,L)\) falling under our umbrella. For more examples we refer to Cleanthous et al. (2020), Kerkyacharian et al. (2020) and the references therein. In the following spaces we also express explicitly the kernels whose existence, in an abstract sense, is guaranteed by Theorem 2.1.

Example 2.3

Let \(\mathcal {M}={\mathbb R}^d\) be the Euclidean space associated with the operator \(L=-\Delta \), the negative Laplacian. By default, this space is included in our study.

We proceed to the kernels. Let \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a symbol satisfying the assumptions of Theorem 2.1. We extend the symbol k to \({\mathbb R}^d\) radially: \({\tilde{k}}(\xi ):=k(|\xi |)\) for every \(\xi \in {\mathbb R}^d\). Then the spectral multiplier \(K=k(\sqrt{L})\) is nothing but the Fourier multiplier associated with the symbol \({\tilde{k}}(\xi )\). Denote by \({\hat{f}}\) and \(\mathcal {F}^{-1}f\) the Fourier transform and the inverse Fourier transform of a function \(f:{\mathbb R}^d\rightarrow {\mathbb R}\), respectively. Then:

$$\begin{aligned} K(f)(x)&=\mathcal {F}^{-1}({\tilde{k}}{\hat{f}})(x)\\&=\kappa *f(x),\quad \text {where}\;\;\kappa :=\mathcal {F}^{-1}{\tilde{k}}\\&=\int _{{\mathbb R}^d}\kappa (x-y)f(y)dy,\quad \text {so}\;\;\mathcal {K}(x,y)=\kappa (x-y). \end{aligned}$$

The kernel \(\mathcal {K}_{h}(x,y)\), by the properties of the Fourier transform, is the familiar

$$\begin{aligned} \mathcal {K}_{h}(x,y)=\frac{1}{h^d}\kappa \left( \frac{x-y}{h}\right) ,\quad x,y\in {\mathbb R}^d,\;h>0. \end{aligned}$$
(2.17)

This example sheds light on the notion of spectral multipliers. Specifically, on \({\mathbb R}^d\), they are the well-known Fourier multipliers, and the corresponding kernels \(\mathcal {K}(x,y)\) are the convolution kernels of the symbol \(\kappa =\mathcal {F}^{-1}{\tilde{k}}\). Moreover, the existing knowledge on \({\mathbb R}^d\), together with the present correspondence, acts as a guide for the several developments on the setting of metric spaces associated with operators.
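As a concrete illustration of this correspondence (under the Fourier convention \(\kappa (x)=(2\pi )^{-1}\int {\tilde{k}}(\xi )e^{ix\xi }d\xi \), and with the Gaussian symbol \(k(\lambda )=e^{-\lambda ^2}\) chosen only for the example), the inverse Fourier transform can be computed numerically and compared with the closed form \(\kappa (x)=(2\sqrt{\pi })^{-1}e^{-x^2/4}\); the dilated kernel is then exactly (2.17) with d = 1:

```python
import math

def kappa_numeric(x, step=0.001, cut=10.0):
    # numerical inverse Fourier transform of the radial symbol e^{-xi^2}
    n = int(2 * cut / step)
    s = sum(math.exp(-(-cut + i * step) ** 2) * math.cos(x * (-cut + i * step))
            for i in range(n + 1))
    return s * step / (2 * math.pi)

def kappa_exact(x):
    # closed form: F^{-1}[e^{-xi^2}](x) = e^{-x^2/4} / (2 sqrt(pi))
    return math.exp(-x ** 2 / 4) / (2 * math.sqrt(math.pi))

def kernel_h(x, y, h):
    # the dilated kernel (2.17) in dimension d = 1
    return kappa_exact((x - y) / h) / h

for x in (0.0, 0.5, 2.0):
    assert abs(kappa_numeric(x) - kappa_exact(x)) < 1e-6
```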

In the next examples we consider spaces \(\mathcal {M}\) of finite measure. In this case —as proved in Coulhon et al. (2012, Proposition 3.20)— the operator L has a discrete spectrum \(0\le \lambda _0<\lambda _1<\cdots \). This implies the discrete decomposition

$$\begin{aligned} {\mathbb {L}}^2=\bigoplus _{\nu =0}^{\infty } E_{\nu };\quad E_{\nu }:=\mathrm{{ker}}(L-\lambda _{\nu }I),\;\nu \ge 0. \end{aligned}$$

Let \(\{e_{i}^{\nu }\}_{i=1,\dots ,d_{\nu }}\) be an orthonormal basis of the eigenspace \(E_{\nu }\), where \(d_{\nu }:=\mathrm{{dim}}E_{\nu }<\infty \), \(\nu \ge 0\). Then we form the kernels of the projection operators

$$\begin{aligned} P_{\nu }(x,y):=\sum _{i=1}^{d_{\nu }}e_i^{\nu }(x)\overline{e_i^{\nu }(y)},\quad x,y\in \mathcal {M},\;\nu \ge 0. \end{aligned}$$

Let \(k:{\mathbb R}_{+}\rightarrow {\mathbb R}\) be a symbol satisfying the assumptions of Theorem 2.1. The corresponding spectral multiplier \(K_h\), \(h>0\), has the kernel

$$\begin{aligned} \mathcal {K}_{h}(x,y)=\sum _{\nu =0}^{\infty }k(h\sqrt{\lambda _{\nu }})P_{\nu }(x,y),\quad x,y\in \mathcal {M}. \end{aligned}$$
(2.18)

For more details we refer to Castillo et al. (2014), Kerkyacharian et al. (2020).

Importantly, when dealing with a specific metric measure space of finite measure associated with an operator L, we only need to know (i) the eigenvalues and (ii) the projection operators; we then obtain the kernels from (2.18).

Next, we present the precise form of (2.18) in the cases of the unit sphere and the unit ball of \({\mathbb R}^3\), which seem to be the most relevant for applications.

Example 2.4

Let \(\mathcal {M}={\mathbb S}^2\) be the unit sphere of \({\mathbb R}^{3}\), associated with the angular distance, the spherical measure and the spherical Laplacian. This space satisfies our Assumptions I and II; see Kerkyacharian et al. (2020). The kernel takes the form:

$$\begin{aligned} \mathcal {K}_h(\xi ,\eta )=\sum _{\nu =0}^{\infty }\frac{1+2\nu }{4\pi }k\big (h\sqrt{\nu (\nu +1)}\big )P_{\nu }\big (\langle \xi ,\eta \rangle \big ),\quad \xi ,\eta \in {\mathbb S}^2, \end{aligned}$$
(2.19)

where \(P_{\nu }\) denotes the Legendre polynomial of degree \(\nu \) and \(\langle \cdot ,\cdot \rangle \) the inner product on \({\mathbb R}^3\).
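In computations the series (2.19) is truncated; for a rapidly decaying symbol such as the Gaussian \(k(\lambda )=e^{-\lambda ^2}\) (an illustrative choice) a few dozen terms already suffice, since the coefficients decay like \(e^{-h^2\nu (\nu +1)}\). A minimal sketch using the Legendre three-term recurrence:

```python
import math

def legendre(nu, t):
    # Legendre polynomial P_nu(t) via the three-term recurrence
    if nu == 0:
        return 1.0
    p_prev, p = 1.0, t
    for n in range(1, nu):
        p_prev, p = p, ((2 * n + 1) * t * p - n * p_prev) / (n + 1)
    return p

def sphere_kernel(cos_angle, h, k=lambda lam: math.exp(-lam ** 2), nmax=40):
    # truncation of (2.19); the Gaussian symbol makes the tail negligible
    return sum((1 + 2 * nu) / (4 * math.pi)
               * k(h * math.sqrt(nu * (nu + 1)))
               * legendre(nu, cos_angle)
               for nu in range(nmax + 1))
```

Since \(\int _{-1}^{1}P_{\nu }(t)dt=0\) for \(\nu \ge 1\), only the \(\nu =0\) term contributes to the total mass, consistent with (2.16).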

Example 2.5

The unit ball \({\mathbb B}^3\) of \({\mathbb R}^3\), equipped with the distance (Dai and Xu 2013)

$$\begin{aligned} \rho (x,y)=\arccos \big (\langle x,y\rangle +\sqrt{1-|x|^2}\sqrt{1-|y|^2}\big ), \end{aligned}$$
(2.20)

the measure

$$\begin{aligned} d\mu (x)=\big (1-|x|^2\big )^{-1/2}dx, \end{aligned}$$
(2.21)

and the operator

$$\begin{aligned} L&=-\sum _{i=1}^{3}\partial _i^2 +\sum _{i,j=1}^{3}x_i x_j \partial _i \partial _j +3\sum _{i=1}^{3}x_i \partial _i, \end{aligned}$$
(2.22)

satisfies the assumptions of our setting; see Kerkyacharian et al. (2020).

Expanding the discussion in Cleanthous et al. (2020) and using Kyriazis et al. (2008), the kernel takes the form

$$\begin{aligned} \mathcal {K}_{h}(x,y)=\sum _{\nu =0}^{\infty }\frac{1+\nu }{2\pi ^2}k\big (h\sqrt{\nu (\nu +2)}\big )G_{\nu }(x,y),\quad x,y\in {\mathbb B}^3, \end{aligned}$$
(2.23)

where

$$\begin{aligned} G_{\nu }(x,y):=C_{\nu }^{1}&\big (\langle x,y\rangle +\sqrt{1-|x|^2}\sqrt{1-|y|^2}\big ) \nonumber \\&+C_{\nu }^{1}\big (\langle x,y\rangle -\sqrt{1-|x|^2}\sqrt{1-|y|^2}\big ) \end{aligned}$$
(2.24)

and \(C_{\nu }^1\) denotes the Gegenbauer polynomial of order 1 and degree \(\nu \).
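Since the Gegenbauer polynomials of order 1 coincide with the Chebyshev polynomials of the second kind, \(C_{\nu }^{1}(\cos \theta )=\sin ((\nu +1)\theta )/\sin \theta \), the kernel (2.23) is straightforward to evaluate numerically. A sketch (illustrative; points of \({\mathbb B}^3\) are given as 3-tuples and the Gaussian symbol is our choice):

```python
import math

def gegenbauer1(nu, t):
    # C^1_nu = Chebyshev U_nu: U_0 = 1, U_1 = 2t, U_{n+1} = 2t U_n - U_{n-1}
    if nu == 0:
        return 1.0
    c_prev, c = 1.0, 2.0 * t
    for n in range(1, nu):
        c_prev, c = c, 2 * t * c - c_prev
    return c

def G(nu, x, y):
    # the building block (2.24) for points x, y of the ball B^3
    dot = sum(a * b for a, b in zip(x, y))
    w = math.sqrt(1 - sum(a * a for a in x)) * math.sqrt(1 - sum(b * b for b in y))
    return gegenbauer1(nu, dot + w) + gegenbauer1(nu, dot - w)

def ball_kernel(x, y, h, k=lambda lam: math.exp(-lam ** 2), nmax=40):
    # truncation of (2.23)
    return sum((1 + nu) / (2 * math.pi ** 2)
               * k(h * math.sqrt(nu * (nu + 2)))
               * G(nu, x, y)
               for nu in range(nmax + 1))
```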

We emphasize that the kernels provided by Theorem 2.1 may look completely different, as in (2.17), (2.19) and (2.23); however, all of them enjoy the decay in (2.15), which is sharp in each of the above cases, as can be confirmed by the properties of the Fourier transform, the Legendre polynomials and the Gegenbauer polynomials, respectively.

The advantage of the general theory is that it unifies spaces of different nature, extracts general results and expresses them in the particular cases of interest.

2.3 Kernel density estimators on \(\varvec{(\mathcal {M},L)}\)

We are now ready to present kernel density estimators on the current general setting as introduced in Cleanthous et al. (2020).

Definition 2.6

Let \(n\in {\mathbb {N}}\) and \(X_1,\dots ,X_n\) be iid random variables on \(\mathcal {M}\). Let \(k:{\mathbb {R}}_+\rightarrow {\mathbb {R}}\) be a symbol satisfying the assumptions of Theorem 2.1, as well as \(k(0)=1\), and \(h>0\) a bandwidth. The associated kernel density estimator (kde) is defined as

$$\begin{aligned} {\hat{f}}_{n,h}(x):={\hat{f}}_{n,h}(X_1,\dots ,X_n;x):=\frac{1}{n}\sum \limits _{i=1}^{n} \mathcal {K}_{h}(X_i,x), \quad x\in \mathcal {M}. \end{aligned}$$
(2.25)

Note that (2.25) is well-defined for every such k, as guaranteed by Theorem 2.1. In addition, (2.16) implies the fundamental property

$$\begin{aligned} \int _{\mathcal {M}}\mathcal {K}_{h}(x,y)d\mu (y)=k(0)=1, \end{aligned}$$

which is a standard assumption for the kernels in the Euclidean setting.

We further express the kde (2.25) explicitly on \({\mathbb R}^d\), \({\mathbb S}^2\) and \({\mathbb B}^3\), simply by expanding Examples 2.3, 2.4 and 2.5. Let k be a symbol as in Definition 2.6 and \(h>0\).

\((\alpha )\) When \(\mathcal {M}={\mathbb R}^d\), and \(L=-\Delta \),

$$\begin{aligned} {\hat{f}}_{n,h}(x)=\frac{1}{n}\frac{1}{h^d}\sum \limits _{i=1}^{n} \kappa \Big (\frac{X_i-x}{h}\Big ), \quad x\in {\mathbb R}^d,\;\;\text {where}\;\;\kappa =\mathcal {F}^{-1}{\tilde{k}}, \end{aligned}$$
(2.26)

which is the very well-known form of a kde on \({\mathbb R}^d\).

\((\beta )\) When \(\mathcal {M}={\mathbb S}^2\), equipped with the angular distance, the spherical measure and the spherical Laplacian,

$$\begin{aligned} {\hat{f}}_{n,h}(\xi )=\frac{1}{n}\sum \limits _{i=1}^{n} \sum _{\nu =0}^{\infty }\frac{1+2\nu }{4\pi }k\big (h\sqrt{\nu (\nu +1)}\big )P_{\nu } \big (\langle \xi ,X_i\rangle \big ),\quad \xi \in {\mathbb S}^2. \end{aligned}$$
(2.27)

\((\gamma )\) When \(\mathcal {M}={\mathbb B}^3\), equipped with the distance in (2.20), the measure in (2.21) and the operator in (2.22),

$$\begin{aligned} {\hat{f}}_{n,h}(x)=\frac{1}{n}\sum \limits _{i=1}^{n} \sum _{\nu =0}^{\infty }\frac{1+\nu }{2\pi ^2}k\big (h\sqrt{\nu (\nu +2)}\big )G_{\nu }(x,X_i),\quad x\in {\mathbb B}^3, \end{aligned}$$
(2.28)

where \(G_{\nu }\) is as in (2.24).
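For illustration, here is the Euclidean kde (2.26) in dimension one with the Gaussian choice \(\kappa (u)=(2\pi )^{-1/2}e^{-u^2/2}\), used only as an example of an admissible kernel; the total-mass property below reflects \(k(0)=1\):

```python
import math, random

def kde(data, x, h):
    # the Euclidean kde (2.26) with d = 1 and Gaussian kappa
    return sum(math.exp(-((xi - x) / h) ** 2 / 2) for xi in data) \
        / (len(data) * h * math.sqrt(2 * math.pi))

random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(500)]
h = 0.3
# the estimator integrates to one (Riemann sum on a wide grid)
grid = [-6 + 0.01 * i for i in range(1201)]
mass = sum(kde(sample, x, h) * 0.01 for x in grid)
assert abs(mass - 1.0) < 1e-2
```

The spherical and ball estimators (2.27) and (2.28) are obtained the same way, replacing the Gaussian by the truncated series kernels of Examples 2.4 and 2.5.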

2.4 Hölder spaces

We close this review by presenting some regularity spaces. In nonparametric estimation one assumes that the density under study belongs to large regularity spaces. Regularity spaces on \({\mathbb R}\) and \({\mathbb R}^d\) have been studied for a century across many scientific disciplines. Historically, the first way to express the notion of regularity (or smoothness) was in terms of derivatives; gradually, Fourier transforms and convolutions extended such notions. For the historical path, we refer the reader to Triebel (1983).

Hölder spaces are a suitable choice for the purpose of the pointwise density estimation (see Tsybakov 2009) that we will obtain in the present study. Let us recall this class on \({\mathbb R}\): Let \(s>0\) and denote by \(\ell :=\lfloor s\rfloor \) the greatest integer strictly less than s. The Hölder space \({\dot{\mathcal {H}}}^s({\mathbb R})\) is the set of functions \(f:{\mathbb R}\rightarrow {\mathbb R}\) that are \(\ell \)-times differentiable and satisfy

$$\begin{aligned} |f^{(\ell )}(x)-f^{(\ell )}(y)|\le c|x-y|^{s-\ell }, \end{aligned}$$
(2.29)

for some constant \(0\le c<\infty \) and every \(x\ne y\). Note that slightly different versions of these spaces can be found in different sources, but the overall purpose is more or less the same.

We must define a suitable extension of (2.29) to a metric space. For the right-hand side, we simply use a power of the distance \(\rho (x,y)\). Metric spaces lack a notion of derivatives, so a substitute for the left-hand side is more challenging, but a solution comes from the operator L. In all of our examples in Sect. 2.2 we observe that differentiability is linked with the definition of L. We also note that in every case presented in Sect. 2.2, L is a differential operator of order 2. These facts justify the following definition:

Definition 2.7

Let \(s>0\) and set \(\ell :=\lfloor s\rfloor \). The Hölder space of order s, \({\dot{\mathcal {H}}}^s\), is defined as the set of all functions \(f:\mathcal {M}\rightarrow {\mathbb R}\) such that

$$\begin{aligned} \Vert f\Vert _{{\dot{\mathcal {H}}}^s}:=\sup \limits _{x\ne y} \frac{\big |L^{\ell /2}f(x)-L^{\ell /2}f(y)\big |}{\rho (x,y)^{s-\ell }}<\infty . \end{aligned}$$
(2.30)

For the connection between these spaces and other smoothness spaces in our setting, we refer to Coulhon et al. (2012). For the use of regularity spaces in Nonparametric Statistics in this generality, we further refer to Castillo et al. (2014), Cleanthous et al. (2020), Cleanthous et al. (2022).

3 Pointwise density estimation

We proceed to present some new results, namely the pointwise estimation of densities enjoying Hölder regularity.

One of the main ways to measure the accuracy of the estimator \({\hat{f}}_{n,h}(x)\) at a given point \(x\in \mathcal {M}\) is by the mean squared error (MSE):

$$\begin{aligned} \text {MSE}=\text {MSE}({\hat{f}}_{n,h}(x)):={\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ],\quad x\in \mathcal {M}, \end{aligned}$$
(3.1)

where \({\mathbb {E}}\) denotes expectation with respect to \((X_1,\dots ,X_n)\), i.e.

$$\begin{aligned} \text {MSE}= & {} {\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ] \nonumber \\= & {} \int _M\cdots \int _M \big ({\hat{f}}_{n,h}(x;x_1,\dots ,x_n)-f(x)\big )^2 f(x_1)\cdots f(x_n)d\mu (x_1)\cdots d\mu (x_n).\nonumber \\ \end{aligned}$$
(3.2)
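The multiple integral in (3.2) is rarely available in closed form; in practice one approximates the MSE by Monte Carlo, repeatedly resampling \((X_1,\dots ,X_n)\). A sketch in the Euclidean case d = 1 with a Gaussian kernel and a standard normal density (all choices illustrative):

```python
import math, random

def kde(data, x, h):
    # Gaussian kde on R, i.e. the case (2.26) with d = 1
    return sum(math.exp(-((xi - x) / h) ** 2 / 2) for xi in data) \
        / (len(data) * h * math.sqrt(2 * math.pi))

# Monte Carlo approximation of MSE(f_hat_{n,h}(x)) at x = 0 for f = N(0, 1)
random.seed(1)
f0 = 1 / math.sqrt(2 * math.pi)       # true density value at 0
n, h, reps = 100, 0.4, 100
mse = sum((kde([random.gauss(0.0, 1.0) for _ in range(n)], 0.0, h) - f0) ** 2
          for _ in range(reps)) / reps
```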

Our task is to determine the proper assumptions on the symbol k so that the MSE of the corresponding kernel density estimator \({\hat{f}}_{n,h}\) is optimally bounded, provided that the unknown density f belongs to a certain Hölder space.

The main new result of this paper is the following:

Theorem 3.1

Let \(s>0\), \(f\in L^{\infty }\cap {\dot{\mathcal {H}}}^s\), and let \(k \in \mathcal {C}^{\tau }({\mathbb {R}}_+)\) be a symbol for some \(\tau >d+s\), satisfying: \(k(0)=1\),

$$\begin{aligned} k^{(\nu )}(0)=0,\quad \text {for every}\;\;1\le \nu \le \tau , \end{aligned}$$
(3.3)

and for some \(r> \tau +d\),

$$\begin{aligned} |k^{(\nu )}(\lambda )|\le C_{\tau }(1+\lambda )^{-r},\quad \text {for every}\;\;\lambda \ge 0,\;0\le \nu \le \tau . \end{aligned}$$
(3.4)

We pick \(h=h_n=n^{-\frac{1}{2s+d}}\). Then for every \(n\in {\mathbb N}\) the corresponding kde \({\hat{f}}_{n,h}\) satisfies

$$\begin{aligned} \sup \limits _{x\in \mathcal {M}} \textrm{MSE}({\hat{f}}_{n,h}(x))=\sup \limits _{x\in \mathcal {M}} {\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ]\le c C(f) n^{-\frac{2s}{2s+d}}, \end{aligned}$$
(3.5)

where the constant \(c>0\) depends only on \(\tau ,\;s,\;C_{\tau }\) and the structural constants of the setting, while C(f) is given by

$$\begin{aligned} C(f):=\max \big (\Vert f\Vert _{\infty },\Vert f\Vert _{{\dot{\mathcal {H}}}^s}^2\big ). \end{aligned}$$
(3.6)

Note that the rate obtained in (3.5) is the optimal one, matching the rates in Cleanthous et al. (2020, 2022), where the \(L^p\)-risk was used for densities in Besov and Sobolev regularity spaces, respectively. For the connection between the several regularity spaces, we refer to Proposition 6.4 in Coulhon et al. (2012) and Theorem 7.8 in Kerkyacharian and Petrushev (2015).
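The bandwidth prescription of Theorem 3.1 is fully explicit once s and d are fixed. For instance, on the sphere \({\mathbb S}^2\) (d = 2) a density of Hölder regularity s = 1 calls for \(h_n=n^{-1/4}\) and yields the rate \(n^{-1/2}\); a trivial sketch:

```python
def optimal_bandwidth(n, s, d):
    # Theorem 3.1: h_n = n^{-1/(2s+d)}
    return n ** (-1.0 / (2 * s + d))

def mse_rate(n, s, d):
    # the corresponding optimal MSE rate n^{-2s/(2s+d)} in (3.5)
    return n ** (-2.0 * s / (2 * s + d))

# e.g. d = 2 (the sphere S^2), s = 1, n = 10000 observations:
h = optimal_bandwidth(10_000, 1, 2)   # 10000^{-1/4} = 0.1
rate = mse_rate(10_000, 1, 2)         # 10000^{-1/2} = 0.01
```

Smoother densities (larger s) permit larger bandwidths and faster rates, approaching the parametric rate \(n^{-1}\) as \(s\rightarrow \infty \).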

While approaching the proof of Theorem 3.1 we will have the opportunity to present the machinery of the setting in action and to highlight the correspondence with the classical Euclidean framework.

We first take a closer look at the assumptions on the symbol generating the kernels. We restrict ourselves to the example of \(\mathcal {M}={\mathbb R}^d\). As we saw in Example 2.3, the radial extension \({\tilde{k}}\) of the symbol k is the Fourier transform of the function \(\kappa \), which yields the usual kde as in (2.26). Translating the assumptions of Theorem 3.1 into the language of the Fourier transform, we recover the usual assumptions on \(\kappa \) for Euclidean spaces:

\((\alpha )\) Assumption (3.3) simply means that \(\kappa \) enjoys vanishing moments up to a certain order.

\((\beta )\) Assumption (3.4), thanks to Theorem 2.1 and (2.17), ensures that

$$\begin{aligned} \int _{{\mathbb R}^d}(1+|\xi |)^s|\kappa (\xi )|d\xi&=\int _{{\mathbb R}^d}(1+|\xi |)^s|\mathcal {K}(\xi ,0)|d\xi \le c\int _{{\mathbb R}^d}(1+|\xi |)^s\mathcal {D}_{1,\tau }(\xi ,0)d\xi \nonumber \\&=c\int _{{\mathbb R}^d}(1+|\xi |)^{-(d+\varepsilon )}d\xi ,\quad \varepsilon :=\tau -d-s>0 \nonumber \\&=c_d\int _{0}^{\infty }\frac{\varrho ^{d-1}d\varrho }{(1+\varrho )^{d+\varepsilon }}\quad \text {(polar coordinates)} \nonumber \\&\le c\int _{0}^{\infty }\frac{d\varrho }{(1+\varrho )^{1+\varepsilon }}<\infty . \end{aligned}$$
(3.7)

\((\gamma )\) The assumption \(k(0)=1\) simply asserts that

$$\begin{aligned} \int _{{\mathbb R}^d}\kappa (\xi )d\xi ={\hat{\kappa }}(0)={\tilde{k}}(0)=k(0)=1. \end{aligned}$$
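The finiteness in (3.7) is easy to confirm numerically. As an illustration (not part of the argument), the following snippet evaluates the radial integral for \(d=2\) and \(\varepsilon =1\), where the substitution \(u=1+\varrho \) gives the exact value 1/2:

```python
from scipy.integrate import quad

d, eps = 2, 1.0  # illustrative choices; any d >= 1 and eps > 0 give a finite value

# Radial integral from (3.7): int_0^inf rho^(d-1) (1 + rho)^(-(d+eps)) d rho
val, abserr = quad(lambda r: r ** (d - 1) * (1 + r) ** (-(d + eps)), 0, float("inf"))

# For d = 2, eps = 1, substituting u = 1 + rho gives the exact value 1/2.
print(val)  # ≈ 0.5
```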

The standard approach for dealing with the MSE is to decompose it as follows:

$$\begin{aligned} \text {MSE}({\hat{f}}_{n,h}(x))=\sigma ^2 (x)+b^2 (x) \end{aligned}$$
(3.8)

where the function \(\sigma ^2 (x)\) is the variance of the estimator \({\hat{f}}_{n,h}(x)\), i.e.

$$\begin{aligned} \sigma ^2(x):={\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-{\mathbb {E}}[{\hat{f}}_{n,h}(x)]\big )^2\big ],\quad x\in \mathcal {M}\end{aligned}$$
(3.9)

and b(x) is the bias of \({\hat{f}}_{n,h}(x),\) i.e.

$$\begin{aligned} b(x):={\mathbb {E}}\big [{\hat{f}}_{n,h}(x)\big ]-f(x), \quad x\in \mathcal {M}. \end{aligned}$$
(3.10)

We will separate the proof of Theorem 3.1 into the two usual steps: the estimation of the variance and the estimation of the bias. Before the proof, we provide some remarks.
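To fix ideas, the decomposition (3.8) can be checked numerically in the simplest Euclidean situation. The sketch below (a toy one-dimensional Gaussian kernel example, not the spectral estimator of this paper) simulates many datasets from the standard normal and verifies that the Monte Carlo MSE at a point splits exactly into empirical variance plus squared bias:

```python
import numpy as np

rng = np.random.default_rng(0)

def kde(data, x, h):
    # Classical Gaussian kernel density estimator on the real line.
    return np.mean(np.exp(-((x - data) / h) ** 2 / 2)) / (h * np.sqrt(2 * np.pi))

n, h, x0 = 200, 0.3, 0.0
f_true = 1 / np.sqrt(2 * np.pi)  # true standard normal density at 0

# Replicate the estimator over many simulated datasets.
estimates = np.array([kde(rng.standard_normal(n), x0, h) for _ in range(2000)])

mse = np.mean((estimates - f_true) ** 2)                 # Monte Carlo MSE
variance = np.mean((estimates - estimates.mean()) ** 2)  # empirical sigma^2(x0)
bias_sq = (estimates.mean() - f_true) ** 2               # empirical b(x0)^2

# MSE = sigma^2 + b^2 holds exactly for these empirical moments.
print(mse, variance + bias_sq)
```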

Remark 3.2

\((\alpha )\) Another form for the conclusion of Theorem 3.1 is

$$\begin{aligned} \sup _{f\in {\mathbb {F}}^{s}(m)}\sup \limits _{x\in M} {\mathbb {E}}\Big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\Big ]\le cn^{-\frac{2s}{2s+d}}, \end{aligned}$$
(3.11)

where \({\mathbb {F}}^{s}(m):=\{f\in L^{\infty }\cap {\dot{\mathcal {H}}}^{s}: \Vert f\Vert _{\infty }\le m\;\text {and}\;\Vert f\Vert _{{\dot{\mathcal {H}}}^{s}}\le m\}\), \(m>0\), and the constant \(c>0\) depends also on \(m>0\).

\((\beta )\) For later use we note that our choice of \(h_n\) gives \(h_n\rightarrow 0\) and \(nh_n^d\rightarrow \infty \) as \(n\rightarrow \infty \).

\((\gamma )\) If the symbol k is compactly supported, then it obviously satisfies (3.4); see Remark 2.2\((\beta )\).

\((\delta )\) The rate obtained in Theorem 3.1 is the optimal one (see e.g. Tsybakov (2009)).

The following simple inequality is established in Coulhon et al. (2012) under more general assumptions. Here we state it for Ahlfors regular spaces and give its proof as an opportunity to present some first calculations on metric spaces:

Lemma 3.3

If \(\tau >d\), there exists a constant \(c=c_\tau >0\) such that for every \(h>0\) and \(x\in \mathcal {M}\)

$$\begin{aligned} I_{h,\tau }(x):=\int _{\mathcal {M}} \big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y) \le ch^{d}. \end{aligned}$$
(3.12)

Proof

We split the metric space as

$$\begin{aligned} \mathcal {M}=\bigcup _{\nu =0}^{\infty }M_{\nu }, \end{aligned}$$

where \(M_0:=B(x,h)\) and \(M_{\nu }:=B(x,2^{\nu }h)\setminus B(x,2^{\nu -1}h)\), for every \(\nu \in {\mathbb N}\).

Then

$$\begin{aligned} I_{h,\tau }(x)=\sum _{\nu =0}^{\infty }\int _{M_{\nu }}\big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y). \end{aligned}$$

Of course,

$$\begin{aligned} \int _{M_{0}}\big (1+h^{-1}\rho (x, y)\big )^{-\tau }d\mu (y)\le |B(x,h)|\le c_1 h^{d}, \end{aligned}$$

thanks to (1.1).

Let \(\nu \in {\mathbb N}\) and \(y\in M_{\nu }\subset B(x,2^{\nu -1}h)^c\). Then \(1+h^{-1}\rho (x, y)\ge 1+2^{\nu -1}> 2^{\nu -1}\). This together with (1.1) implies that

$$\begin{aligned}\int _{M_{\nu }}\big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y)&\le 2^{-(\nu -1)\tau }|M_{\nu }|\le 2^{\tau }2^{-\nu \tau }|B(x,2^{\nu }h)|\\&\le c_1 2^{\tau }2^{-\nu (\tau -d)}h^{d}. \end{aligned}$$

Combining all the above and using the assumption \(\tau >d\), we conclude that

$$\begin{aligned} I_{h,\tau }(x)&\le c_1 2^{\tau }\sum _{\nu =0}^{\infty }2^{-\nu (\tau -d)}h^{d}=c_1 2^{\tau }\frac{1}{1-2^{-\tau +d}}h^{d} \\&=c_1\frac{2^{2\tau }}{2^{\tau }-2^{d}}h^{d}=:c_{\tau }h^{d}. \end{aligned}$$

\(\square \)

Let us point out that the above estimate is sharp in the sense that

$$\begin{aligned} I_{h,\tau }(x)\ge \int _{M_{0}}\big (1+h^{-1}\rho (x, y)\big )^{-\tau } d\mu (y)\ge \frac{2^{-\tau }}{c_1}h^{d}, \end{aligned}$$
(3.13)

thanks to (1.1).

Such an integral is classical on the Euclidean space, where it can be handled using (generalized) polar coordinates, exactly as we did in (3.7). On an abstract Ahlfors regular metric space it can be sharply estimated as above.
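For instance, on the real line (\(d=1\), with Lebesgue measure) the integral \(I_{h,\tau }\) has the closed form \(2h/(\tau -1)\), so the \(h^{d}\)-scaling of (3.12) and (3.13) can be confirmed directly; the snippet below is purely illustrative:

```python
from scipy.integrate import quad

tau = 3.0  # any tau > d = 1 works

def I(h):
    # I_{h,tau}(0) on the real line: integral of (1 + |y|/h)^(-tau) dy,
    # computed as twice the integral over the positive half-line.
    val, _ = quad(lambda y: (1 + y / h) ** (-tau), 0, float("inf"))
    return 2 * val

# Closed form: I(h) = 2h/(tau - 1); for tau = 3 this equals h, so I(h)/h = 1,
# matching the two-sided h^d bounds of (3.12) and (3.13).
print(I(0.1) / 0.1, I(0.01) / 0.01)  # both ≈ 1.0
```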

3.1 Estimation of the variance

We proceed to the first step of the proof of Theorem 3.1: the estimation of the variance for bounded densities. As usual, no regularity is required here. The main tools are Theorem 2.1 and Lemma 3.3.

Proposition 3.4

Let \(f\in L^{\infty }\), \(\tau >d\) and let \(k\in \mathcal {C}^{\tau }({\mathbb {R}}_+)\) be a multiplier satisfying (3.3) and (3.4). Then for every \(x\in \mathcal {M}\), \(0<h\le 1\) and \(n\in {\mathbb {N}}\)

$$\begin{aligned} \sigma ^2(x)\le \frac{C_1}{nh^d}\Vert f\Vert _{\infty }, \end{aligned}$$
(3.14)

where the constant \(C_1>0\) depends only on \(\tau ,\;C_\tau \) and the structural constants.

Proof

Recalling the results developed in Sect. 2, the spectral multiplier \(K_h\) associated with the dilated symbol \(k_h(\lambda )=k(h\lambda )\) is an integral operator with kernel \(\mathcal {K}_h(x,y)\), \(x,y\in \mathcal {M}\).

We introduce the random variables

$$\begin{aligned} \eta _i(x):=\mathcal {K}_h(X_i,x)-{\mathbb {E}}\big [\mathcal {K}_h(X_i,x)\big ],\quad x\in \mathcal {M},\;i=1,\dots ,n \end{aligned}$$
(3.15)

and we observe that \(\eta _1(x),\dots ,\eta _n(x)\) are iid random variables with \({\mathbb {E}}[\eta _i(x)]=0,\) for \(i=1,\dots ,n\). For their variance we have

$$\begin{aligned} {\mathbb {E}}[\eta _i^2(x)]= & {} {\mathbb {E}}\big [\big (\mathcal {K}_h(X_i,x)\big )^2\big ]- \left( {\mathbb {E}}\big [\mathcal {K}_h(X_i,x)\big ]\right) ^2\nonumber \\\le & {} {\mathbb {E}}\big [\big (\mathcal {K}_h(X_i,x)\big )^2\big ] \nonumber \\= & {} \int _\mathcal {M}|\mathcal {K}_h(x,y)|^2 f(y) d\mu (y). \end{aligned}$$
(3.16)

By Theorem 2.1 we have the bounds

$$\begin{aligned} |\mathcal {K}_h(x,y)|\le cC_{\tau }\mathcal {D}_{h,\tau }(x,y). \end{aligned}$$
(3.17)

Since \(f\in L^{\infty }\) we derive

$$\begin{aligned} {\mathbb {E}}[\eta _i^2(x)]\le & {} c \Vert f\Vert _{\infty } h^{-2d}\int _\mathcal {M}\big (1+h^{-1}\rho (x,y)\big )^{-2\tau }d\mu (y)\nonumber \\= & {} c \Vert f\Vert _{\infty } h^{-2d}I_{h,2\tau }\le C_1 \Vert f\Vert _{\infty } h^{-d}, \end{aligned}$$
(3.18)

where for the ultimate inequality we used (3.12).

We observe that

$$\begin{aligned} \sum \limits _{i=1}^n \eta _i (x)= n\big ({\hat{f}}_{n,h}(x)-{\mathbb {E}}\big [{\hat{f}}_{n,h}(x)\big ]\big ). \end{aligned}$$
(3.19)

Bearing in mind that the independent random variables \(\eta _i\) have zero mean and guided by (3.18) and (3.19) we arrive at

$$\begin{aligned} \sigma ^2(x)= & {} {\mathbb {E}}\left[ \big ({\hat{f}}_{n,h}(x)-{\mathbb {E}}[{\hat{f}}_{n,h}(x)]\big )^2\right] ={\mathbb {E}}\left[ \Big (\frac{1}{n} \sum \limits _{i=1}^n \eta _i (x)\Big )^2\right] \nonumber \\= & {} \frac{1}{n^2} \sum \limits _{i=1}^n{\mathbb {E}}\big [\eta _i^2(x)\big ] \le C_1 \Vert f\Vert _{\infty }\frac{1}{nh^d}. \end{aligned}$$
(3.20)

\(\square \)

3.2 Estimation of the bias

We will estimate the bias under the assumption that the pdf f lies in the Hölder space \({\dot{\mathcal {H}}}^s\).

Proposition 3.5

Let \(s>0\), \(f\in L^{\infty }\cap {\dot{\mathcal {H}}}^s\) and let \(k\in \mathcal {C}^{\tau }({\mathbb {R}}_+)\), for some \(\tau >d+s\), be a multiplier satisfying \(k(0)=1\), (3.3) and (3.4). Then for every \(x\in \mathcal {M}\), \(0<h\le 1\) and \(n\in {\mathbb {N}}\),

$$\begin{aligned} |b(x)|\le C_2 \Vert f\Vert _{{\dot{\mathcal {H}}}^s} h^s, \end{aligned}$$
(3.21)

where the constant \(C_2>0\) depends only on \(s,\;\tau ,\;C_{\tau }\) and the structural constants of the setting.

Proof

Since \(X_i\) are iid with common density f, we obtain

$$\begin{aligned} b(x)= & {} {\mathbb {E}}\big [{\hat{f}}_{n,h}(x)\big ]-f(x) \nonumber \\= & {} \frac{1}{n}\sum \limits _{i=1}^n {\mathbb {E}}\big [\mathcal {K}_h(X_i,x)\big ]-f(x) \nonumber \\= & {} \big (K_h-I\big )f(x), \end{aligned}$$
(3.22)

where I is the identity operator on \(\mathcal {M}\) and \(K_h=k_h(\sqrt{L})\) is the spectral multiplier associated with the dilated symbol \(k_h\) and the operator \(\sqrt{L}\), as in Sect. 2.

For the given bandwidth \(0<h\le 1\) there exists a unique integer \(i\in {\mathbb N}_0\) such that

$$\begin{aligned} 2^{-i}\le h<2^{-i+1}. \end{aligned}$$
(3.23)

We consider the symbol \(\psi \in \mathcal {C}^{\infty }({\mathbb R}_+)\) with \(\mathrm{{supp}\, }\psi \subset [0,2]\), \(\psi (\lambda )=1\), for every \(\lambda \in [0,1]\) and \(0\le \psi (\lambda )\le 1\), for every \(\lambda \in [0,2]\).

We set \(\varphi (\lambda ):=\psi (\lambda )-\psi (2\lambda )\) which is \(\mathcal {C}^{\infty }\) and supported in \([2^{-1},2]\).

By the construction of the above functions, it turns out that

$$\begin{aligned} \psi (2^{-i}\lambda )+\sum \limits _{j=i+1}^{\infty }\varphi (2^{-j}\lambda )=1,\;\;\text {for every}\;\lambda \in {\mathbb R}_+. \end{aligned}$$

Then by Coulhon et al. (2012, Corollary 3.9)

$$\begin{aligned} f=\Psi _{2^{-i}}f+\sum \limits _{j=i+1}^{\infty }\Phi _{2^{-j}}f, \end{aligned}$$
(3.24)

where the capital \(\Psi _{2^{-i}}\) and \(\Phi _{2^{-j}}\) denote the spectral multipliers as in Sect. 2.1; \(\Psi _{2^{-i}}=\psi (2^{-i}\sqrt{L})\) and \(\Phi _{2^{-j}}=\varphi (2^{-j}\sqrt{L})\).

We set \(\ell :=\lfloor s\rfloor \) and we introduce the symbols

$$\begin{aligned} g^i(\lambda ):=\frac{(k(h2^{i}\lambda )-1)\psi (\lambda )}{|\lambda |^{\ell }}\;\;\text {and}\;\;g^j(\lambda ):=\frac{(k(h2^{j}\lambda )-1)\varphi (\lambda )}{|\lambda |^{\ell }},\;j>i.\nonumber \\ \end{aligned}$$
(3.25)

We proceed to justify that the assumptions of Theorem 2.1 are fulfilled for the symbols \(g^j\), \(j\ge i\), using Remark 2.2\((\beta )\).

By the fact that \(k\in \mathcal {C}^{\tau }({\mathbb R}_+)\), the values of the derivatives \(k^{(\nu )}(0)\), \(0\le \nu \le \tau \), (3.23) and the definitions of the symbols \(\psi \) and \(\varphi \) we have that:

\(g^j\in \mathcal {C}^{\tau -\ell }({\mathbb R}_+)\), for every \(j\ge i\). Of course \(\tau -\ell>d+s-\ell \ge d\).

\(\mathrm{{supp}\, }g^i\subset [0,2]\) and \(\mathrm{{supp}\, }g^j\subset [2^{-1},2]\), for every \(j>i\).

\(g^i(0)=\lim _{\lambda \rightarrow 0^{+}}\frac{k(h2^{i}\lambda )-1}{\lambda ^{\ell }}\psi (\lambda )=\frac{(h2^{i})^{\ell }k^{(\ell )}(0)}{\ell !}\psi (0)=0\).

\((g^i)^{(2\nu +1)}(0)=0\), for every \(1\le 2\nu +1\le \tau -\ell \).

Moreover by the vanishing derivatives’ assumption (3.3), the decay in (3.4) and the bottom of the support of \(\varphi \), we obtain after some calculus that

$$\begin{aligned} |(g^j)^{(\nu )}(\lambda )|\le c(\tau ,\ell ),\quad \text {for every}\;\lambda \ge 0,\;0\le \nu \le \tau -\ell ,\;j\ge i, \end{aligned}$$

where the above constant \(c(\tau ,\ell )>0\) is independent of j.

By Theorem 2.1, coupled with Remark 2.2\((\beta )\), the spectral multipliers \(G^j_{2^{-j}}=g^{j}(2^{-j}\sqrt{L})\), \(j\ge i\), are integral operators and their corresponding kernels \(\mathcal {G}^j_{2^{-j}}(x,y)\) present the behaviour

$$\begin{aligned} |\mathcal {G}^j_{2^{-j}}(x,y)|\le c\mathcal {D}_{2^{-j},\tau -\ell }(x,y),\quad x,y\in \mathcal {M},\;j\ge i. \end{aligned}$$
(3.26)

By the definition of the symbols \(g^j\), \(j\ge i\) in (3.25) we get

$$\begin{aligned} (K_h-I)\Psi _{2^{-i}}f=2^{-\ell i}G^i_{2^{-i}}L^{\ell /2}f \end{aligned}$$
(3.27)

and

$$\begin{aligned} (K_h-I)\Phi _{2^{-j}}f=2^{-\ell j}G^j_{2^{-j}}L^{\ell /2}f,\;\;j>i. \end{aligned}$$
(3.28)

Combining (3.22), (3.24) with (3.27) and (3.28) we obtain the expansion

$$\begin{aligned} b(x)=\sum _{j=i}^{\infty }2^{-\ell j}G^j_{2^{-j}}L^{\ell /2}f(x). \end{aligned}$$
(3.29)

Since \(G^j_{2^{-j}}\) are integral operators and because of \(g^j(0)=0\), for every \(j\ge i\), using (2.16) we express \(G^j_{2^{-j}}L^{\ell /2}f(x)\) as

$$\begin{aligned} G^j_{2^{-j}}L^{\ell /2}f(x)&=\int _{\mathcal {M}}\mathcal {G}^j_{2^{-j}}(x,y)L^{\ell /2}f(y)d\mu (y) \nonumber \\&=\int _{\mathcal {M}}\mathcal {G}^j_{2^{-j}}(x,y)\big (L^{\ell /2}f(y)-L^{\ell /2}f(x)\big )d\mu (y). \end{aligned}$$
(3.30)

The membership of f in the Hölder space \({\dot{\mathcal {H}}}^s\) implies that

$$\begin{aligned} \big |L^{\ell /2}f(y)-L^{\ell /2}f(x)\big |&\le \Vert f\Vert _{{\dot{\mathcal {H}}}^s}\rho (x,y)^{s-\ell } \nonumber \\&\le 2^{\ell j}2^{-sj}\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\big (1+2^{j}\rho (x,y)\big )^{s-\ell },\quad j\ge i. \end{aligned}$$
(3.31)

Combining (3.29) with (3.30), (3.26) and (3.31), we arrive at the estimate

$$\begin{aligned} |b(x)|&\le \sum _{j=i}^{\infty }2^{-\ell j}\int _{\mathcal {M}}|\mathcal {G}^j_{2^{-j}}(x,y)|\big |L^{\ell /2}f(y)-L^{\ell /2}f(x)\big |d\mu (y) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}\int _{\mathcal {M}}\mathcal {D}_{2^{-j},\tau -s}(x,y)d\mu (y) \nonumber \\&= c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}2^{jd}I_{2^{-j},\tau -s}(x), \end{aligned}$$
(3.32)

where \(I_{2^{-j},\tau -s}(x)\) is as in Lemma 3.3. Thanks to the assumption \(\tau >d+s\), by (3.12), the fact that \(s>0\) and (3.23) we conclude that

$$\begin{aligned} |b(x)|&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}2^{jd}I_{2^{-j},\tau -s}(x) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}2^{-is}\le C_2\Vert f\Vert _{{\dot{\mathcal {H}}}^s}h^{s} \end{aligned}$$
(3.33)

and the proof is complete. \(\square \)

End of the proof of Theorem 3.1. We combine Propositions 3.4 and 3.5 to conclude the proof of Theorem 3.1 in the standard way.

3.3 Kernel density estimators on the sphere

The shape of the Earth justifies the unit sphere \({\mathbb S}^2\) of \({\mathbb R}^3\) as the most important domain for the purposes of several sciences. In the present paper we study earthquakes, the subject of seismology, but many other fields, such as astrophysics, environmental science and geology, are interested in this geometry too. We describe how the kernels obtained in Sect. 2.2 can be used in a data analysis.

We consider the symbols

$$\begin{aligned} g^{\sigma }(\lambda ):=(1+|\lambda |^{\sigma })^{-1},\;\lambda \in {\mathbb R}, \end{aligned}$$
(3.34)

for \(\sigma \in {\mathbb N}\), with \(\sigma >1\). Evidently for every \(\sigma >1\), the symbol \(g^{\sigma }\) is an even function such that \(g^{\sigma }\in \mathcal {C}^{\sigma -1}({\mathbb R})\), \(g^{\sigma }(0)=1\), \((g^{\sigma })^{(\nu )}(0)=0\) for every \(1\le \nu \le \sigma -1\), and it presents the decay in (3.4) with \(r=\sigma \). Such symbols are suitable for generating kdes: one chooses the appropriate value of \(\sigma \) depending on the dimension d and the regularity s, and then the appropriate bandwidth h depending also on the sample size n.
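These properties are straightforward to verify symbolically. The sketch below (illustrative, taking \(\sigma =6\)) checks that \(g^{\sigma }(0)=1\) and that the first \(\sigma -1\) derivatives vanish at the origin:

```python
import sympy as sp

lam = sp.symbols("lam", real=True)
sigma = 6  # illustrative choice; any integer sigma > 1 behaves the same way

# The symbol g^sigma of (3.34), restricted to lam >= 0.
g = 1 / (1 + lam**sigma)

# g(0) = 1 and the first sigma - 1 derivatives vanish at the origin,
# as required by k(0) = 1 and (3.3).
derivs = [sp.diff(g, lam, nu).subs(lam, 0) for nu in range(sigma)]
print(derivs)  # [1, 0, 0, 0, 0, 0]
```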

For the purpose of our present study, we restrict our attention on the case of the unit sphere \(\mathcal {M}={\mathbb S}^2\).

Let \(s>0\) and denote by \(\lceil s\rceil \) the smallest integer strictly greater than s. The symbols (3.34) for \(r=\sigma :=5+\lceil s\rceil \) satisfy the assumptions of Theorem 3.1 for densities on \({\dot{\mathcal {H}}}^s\).

The expression (2.19) can be implemented in R (or Python, etc.) after the infinite series is truncated at some integer \(N\in {\mathbb N}\), namely

$$\begin{aligned} {\hat{f}}_{n,h,N}(\xi ):=\frac{1}{n}\sum \limits _{i=1}^{n}\sum _{\nu =0}^{N} \frac{1+2\nu }{4\pi }g^r\big (h\sqrt{\nu (\nu +1)}\big )P_{\nu }\big (\langle \xi ,X_i\rangle \big ),\quad \xi \in {\mathbb S}^2. \end{aligned}$$
(3.35)

Note that

$$\begin{aligned} g^{r}(h\sqrt{\nu (\nu +1)})< (h\sqrt{\nu (\nu +1)})^{-r}<h^{-r}\nu ^{-r},\quad \nu \in {\mathbb N}. \end{aligned}$$

Moreover, for the Legendre polynomials it is well-known that \(|P_{\nu }(u)|\le 1,\) for every \(u\in [-1,1]\) and of course \(\frac{1+2\nu }{4\pi }\le 0.51\frac{\nu }{\pi }\), for every \(\nu \ge 25\).

Then the error (the absolute value of the difference) due to truncating (2.19) at order \(N\ge 24\) can be safely bounded from above by

$$\begin{aligned} \text {error}&\le \sum _{\nu >N}\frac{0.51 \nu }{\pi }h^{-r}\nu ^{-r}=\frac{0.51}{\pi }h^{-r}\sum _{\nu =N+1}^{\infty }\nu ^{-r+1} \nonumber \\&\le \frac{0.51}{\pi }h^{-r}\int _{N}^{\infty }x^{-r+1}dx=\frac{0.51 h^{-r}N^{-r+2}}{\pi (r-2)}. \end{aligned}$$
(3.36)

Recall that by Theorem 3.1, \(h=n^{-1/(2s+2)}\) (since \(d=2\) on the sphere), where n is our sample size.

Expression (3.36) asserts that a sufficiently large N guarantees any prescribed error bound when the sample size n and the regularity s are considered fixed.

As an example, take \(s\in (0,1]\) (the least restrictive range), which corresponds to the value \(r=6\). In this case, after setting \(h=n^{-1/(2(s+1))}\), the error is at most

$$\begin{aligned} \frac{0.51\, n^{3/(s+1)}}{4\pi }N^{-4}. \end{aligned}$$
(3.37)

In a specific data analysis, with a given sample size n, one should respect (3.37) for the hypothesized smoothness level s and choose N large enough that the error stays below a pre-defined “suitable” threshold. For a data analysis of earthquakes the reader is referred to Sect. 5.
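A minimal sketch of such an implementation in Python, under assumed parameter choices (\(r=6\), \(h=n^{-1/(2s+2)}\)) and with synthetic data concentrated near the north pole, might look as follows; the helper `kde_sphere` is a hypothetical name of ours, not part of any library:

```python
import numpy as np
from scipy.special import eval_legendre

def kde_sphere(X, xi, h, r=6, N=75):
    """Truncated estimator (3.35): X is an (n, 3) array of unit vectors (data),
    xi an (m, 3) array of unit vectors (evaluation points)."""
    inner = np.clip(xi @ X.T, -1.0, 1.0)  # <xi, X_i>, shape (m, n)
    est = np.zeros(len(xi))
    for nu in range(N + 1):
        # Symbol g^r of (3.34), evaluated at h * sqrt(nu (nu + 1)).
        g = 1.0 / (1.0 + (h * np.sqrt(nu * (nu + 1.0))) ** r)
        est += (1 + 2 * nu) / (4 * np.pi) * g * eval_legendre(nu, inner).mean(axis=1)
    return est

rng = np.random.default_rng(0)
n, s = 500, 0.5
h = n ** (-1 / (2 * s + 2))  # bandwidth rule of Theorem 3.1 with d = 2

# Synthetic sample concentrated near the north pole.
X = rng.normal([0.0, 0.0, 3.0], 1.0, size=(n, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)

poles = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, -1.0]])
f_hat = kde_sphere(X, poles, h)
print(f_hat)  # much larger at the north pole than at the south pole
```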

4 Spaces of homogeneous type

We proceed to more general assumptions than those in Sect. 1. Specifically, we no longer assume that our space enjoys Ahlfors regularity, but only the so-called doubling volume property (4.1) below. Such a setting is referred to as a space of homogeneous type.

We replace Assumption I with the following:

(a) Doubling volume condition: There exists a constant \(c_0>1\) such that

$$\begin{aligned} 0< |B(x,2r)| \le c_0|B(x,r)|<\infty \quad \hbox {for all }x \in \mathcal {M}\hbox { and }r>0, \end{aligned}$$
(4.1)

where |B(x,r)| is the volume of the open ball B(x,r) centred at x with radius r.

(b) Noncollapsing condition: There exists a constant \(c_1>0\) such that

$$\begin{aligned} \inf _{x\in \mathcal {M}}|B(x,1)|\ge c_1. \end{aligned}$$
(4.2)

We modify Assumption II by replacing the factor \(t^{-d/2}\) by

$$\begin{aligned} \big (|B(x,\sqrt{t})||B(y,\sqrt{t})|\big )^{-1/2} \end{aligned}$$
(4.3)

in equations (1.2) and (1.3).

Some remarks are in order:

\((\alpha )\) Of course, (a) and (b) hold trivially under (i).

\((\beta )\) From (4.1) it follows that there exist \(c_0'>0\) and \(d>0\) such that

$$\begin{aligned} |B(x,\lambda r)| \le c_0'\lambda ^{d} |B(x,r)| \quad \hbox {for every } x \in \mathcal {M}, r>0,\hbox { and }\lambda >1, \end{aligned}$$
(4.4)

where the constant d above is referred to as the homogeneous dimension of \((\mathcal {M},\rho ,\mu )\). This effectively generalizes the Ahlfors dimension used in the previous sections.

\((\gamma )\) A connection between the volume of balls of small radius, with their radius and the dimension comes from (4.2) and (4.4):

$$\begin{aligned} |B(x, r)|\ge c r^d, \quad x\in \mathcal {M},\;0<r\le 1. \end{aligned}$$
(4.5)

\((\delta )\) In the framework of spaces of homogeneous type we replace the kernels defined in (2.13) by

$$\begin{aligned} \mathcal {D}_{h,\tau }(x,y):=\frac{\big (1+h^{-1}\rho (x,y)\big )^{-\tau }}{(|B(x,h)||B(y,h)|)^{1/2}},\quad \text {for}\;x,y\in \mathcal {M}. \end{aligned}$$
(4.6)

On the background results:

(i) Theorem 2.1 holds as it is, but with the kernels \(\mathcal {D}_{h,\tau }\) as in (4.6).

(ii) Lemma 3.3 takes the form: for every \(\tau >d\), there exists a constant \(c=c_{\tau }>0\) such that

$$\begin{aligned} I_{h,\tau }(x)\le c|B(x,h)|,\quad \text {for every}\;x\in \mathcal {M},\;h>0. \end{aligned}$$
(4.7)

\((\varepsilon )\) To compare the volumes of balls with different centers \(x, y\in \mathcal {M}\) and the same radius r, we note first that \(B(x,r) \subset B\big (y, \rho (y,x) +r\big )\), which coupled with (4.4) leads to

$$\begin{aligned} |B(x, r)| \le c\big (1+ \rho (x,y)/r\big )^d |B(y, r)|, \quad x, y\in \mathcal {M}, \; r>0. \end{aligned}$$
(4.8)

The latter implies that the kernel in (4.6) can be estimated by:

$$\begin{aligned} \mathcal {D}_{h,\tau }(x,y)\le c|B(x,h)|^{-1}(1+h^{-1}\rho (x,y))^{-\tau +d/2}. \end{aligned}$$
(4.9)

We are now in a position to state the boundedness of the mean squared error in the more general setting of spaces of homogeneous type associated with operators:

Theorem 4.1

Let \(s>0,\) \(f\in L^{\infty }\cap {\dot{\mathcal {H}}}^s\) and a symbol \(k \in \mathcal {C}^{\tau }({\mathbb {R}}_+)\) for some \(\tau >3d/2+s\), satisfying: \(k(0)=1\),

$$\begin{aligned} k^{(\nu )}(0)=0,\quad \text {for every}\;\;1\le \nu \le \tau , \end{aligned}$$
(4.10)

and for some \(r> \tau +d\),

$$\begin{aligned} |k^{(\nu )}(\lambda )|\le C_{\tau }(1+\lambda )^{-r},\quad \text {for every}\;\;\lambda \ge 0,\;0\le \nu \le \tau . \end{aligned}$$
(4.11)

We pick \(h=h_n=n^{-\frac{1}{2s+d}}\). Then for every \(n\in {\mathbb N}\) the corresponding kde \({\hat{f}}_{n,h}\) satisfies

$$\begin{aligned} \sup \limits _{x\in \mathcal {M}} {\mathbb {E}}\big [\big ({\hat{f}}_{n,h}(x)-f(x)\big )^2\big ]\le c C(f) n^{-\frac{2s}{2s+d}}, \end{aligned}$$
(4.12)

where the constant \(c>0\) depends only on \(\tau ,\;s,\;C_{\tau }\) and the structural constants of the setting, while C(f) is given by

$$\begin{aligned} C(f):=\max \big (\Vert f\Vert _{\infty },\Vert f\Vert _{{\dot{\mathcal {H}}}^s}^2\big ). \end{aligned}$$
(4.13)

Proof

In the light of (3.8), we need to bound the variance and the bias in a similar manner as in Propositions 3.4 and 3.5.

We start with the variance.

By (3.17) and (4.9) we get the behaviour

$$\begin{aligned} |\mathcal {K}_h(x,y)|\le c|B(x,h)|^{-1}\big (1+h^{-1}\rho (x,y)\big )^{-\tau +d/2},\quad \text {for every}\;x,y\in \mathcal {M}. \end{aligned}$$

Substituting this into (3.18), and since \(\tau>3d/2+s>d\), we obtain

$$\begin{aligned} {\mathbb {E}}[\eta _i^2(x)]\le & {} c \Vert f\Vert _{\infty } |B(x,h)|^{-2}\int _M\big (1+h^{-1}\rho (x,y)\big )^{-2\tau +d}d\mu (y)\\= & {} c \Vert f\Vert _{\infty } |B(x,h)|^{-2}I_{h,2\tau -d}\le c \Vert f\Vert _{\infty } |B(x,h)|^{-1} \le C_1 \Vert f\Vert _{\infty } h^{-d}, \end{aligned}$$

where we used (4.7) and (4.5) respectively. The estimation of the variance in (3.14) now follows as in (3.19) and (3.20).

We proceed to bound the bias as in Proposition 3.5. Recall that \(\ell :=\lfloor s\rfloor \). This time the kernels \(\mathcal {G}^j_{2^{-j}}(x,y)\) enjoy the behaviour

$$\begin{aligned} |\mathcal {G}^j_{2^{-j}}(x,y)|\le c\mathcal {D}_{2^{-j},\tau -\ell }(x,y)\le c|B(x,2^{-j})|^{-1}\big (1+2^{j}\rho (x,y)\big )^{-\tau +\ell +d/2}, \end{aligned}$$

thanks to (4.9). Then

$$\begin{aligned} |b(x)|&\le \sum _{j=i}^{\infty }2^{-\ell j}\int _{\mathcal {M}}|\mathcal {G}^j_{2^{-j}}(x,y)|\big |L^{\ell /2}f(y)-L^{\ell /2}f(x)\big |d\mu (y) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}|B(x,2^{-j})|^{-1}I_{2^{-j},\tau -\frac{d}{2}-s}(x) \nonumber \\&\le c\Vert f\Vert _{{\dot{\mathcal {H}}}^s}\sum _{j=i}^{\infty }2^{-sj}\le C_2\Vert f\Vert _{{\dot{\mathcal {H}}}^s} h^{s}, \end{aligned}$$
(4.14)

where we used (4.7) which is valid because \(\tau >3d/2+s\).

The remainder of the proof is the same as in the proof of Theorem 3.1. \(\square \)

Let us close this section with some comments on the geometric assumptions.

Obviously the doubling property (4.1) is a more general assumption than Ahlfors regularity (1.1). A simple illustrative example is the weighted ball \({\mathbb B}^m:=\big \{x\in \mathbb {R}^m: \Vert x\Vert <1\big \}\) of \({\mathbb R}^m\), equipped with the distance (2.20) and the weighted measure (Dai and Xu 2013)

$$\begin{aligned} d\mu _{\gamma }(x):= (1-\Vert x\Vert ^2)^{\gamma -1/2} dx, \quad \gamma >-1. \end{aligned}$$
(4.15)

As in Dai and Xu (2013) we have that

$$\begin{aligned} |B(x, r)| \sim r^m(1-\Vert x\Vert ^2+r^2)^\gamma , \end{aligned}$$
(4.16)

which implies that \((\mathcal {M}, \rho , \mu _{\gamma })\) satisfies the doubling property (4.1) and the non-collapsing condition (4.2).

Precisely, the homogeneous dimension is \(d=m+2\max (\gamma ,0)\). Clearly \(m\le d\), with equality exactly when \(\gamma =0\), which corresponds to the unweighted case and renders the space Ahlfors regular.

Moreover, it is now apparent how the homogeneous dimension fundamentally depends on the measure of the space and may or may not be an integer.

Note also that in the weighted case, the proper operator is

$$\begin{aligned} L:=L_{\gamma }:= -\sum _{i=1}^m (1-x_i^2)\partial ^2_i + 2\sum _{1\le i < j \le m}x_i x_j\partial _i\partial _j + (m+2 \gamma )\sum _{i=1}^m x_i \partial _i. \end{aligned}$$

This operator satisfies Assumption II; see Dai and Xu (2013) and Kerkyacharian et al. (2020).

Note finally that the non-collapsing condition holds true for every space \((\mathcal {M},\rho ,\mu )\) of homogeneous type which is of finite measure \(\mu (\mathcal {M})<\infty \); see Coulhon et al. (2012).

5 Data Illustration

In this data illustration, we use earthquake location data for all earthquakes with a reported magnitude of 6.5 or higher between 1990 and 2021 (inclusive). These data are freely available through the United States Geological Survey website https://earthquake.usgs.gov/earthquakes/search/. In total, there are \(n = 1507\) earthquakes that fit these criteria, and we plot the locations of these earthquakes in Fig. 1.

Fig. 1
figure 1

Earthquakes from 1990–2021 with a magnitude of 6.5 or greater

To explain some of the earthquake patterns in Fig. 1, we briefly discuss plate tectonics. The Earth’s crust or lithosphere is divided into distinct and irregular sections of solid rock called tectonic plates. The tectonic plates float and gradually move on the molten rock of the Earth’s mantle. Many geological events (e.g., volcanic eruptions and earthquakes) occur where different tectonic plates meet. For this reason, high magnitude earthquakes are highly concentrated around tectonic plate boundaries, and this global network of plate boundaries is evident in the earthquake patterns in Fig. 1. Earthquakes also occur elsewhere in the world at lower rates.

The Circum-Pacific Belt (the west coasts of the American continents, from Alaska to East Asia, stretching down to the Pacific Islands), sometimes called the Pacific Rim, is the most seismically active. Note that the Pacific Islands (e.g., Tonga, Fiji, New Zealand, and New Caledonia) appear on the left and right of Fig. 1. The entire Pacific Rim has high concentrations of high magnitude earthquakes, but we point out two other areas with very high concentrations of earthquakes. There are many earthquakes in a small area around the South Sandwich Islands, including earthquakes with 7.5 and 8.1 magnitudes on August 12, 2021. We also point out the Alpide Belt, a region that runs along the Azores, the Mediterranean, the Middle East, the Himalayas, Indonesia, and connects to the Pacific Rim in the Pacific Islands. Given the distribution of earthquakes seen here, we anticipate the need for a heterogeneous density estimate.

These data are distributed globally and are indexed on the sphere. To estimate the density of earthquakes, we approximate the density estimator in (2.27) by selecting a finite truncation point N,

$$\begin{aligned} {\hat{f}}_{n,h,N}( \xi ):= \frac{1}{n} \sum ^n_{i=1} \sum ^N_{\nu =0}\frac{1 + 2 \nu }{4 \pi } k(h\sqrt{\nu (\nu +1)}) P_\nu \left( \langle \xi , X_i \rangle \right) , \end{aligned}$$
(5.1)

where \(X_i\) are earthquake locations, \(k(\cdot )\) is defined in (3.34) with \(r = 5 + \lceil s\rceil \), and \(P_\nu \) are Legendre polynomials.

Because the truncation induces error in the estimation, we anticipate that lower values of N will decrease the accuracy. In Fig. 2, given n, we plot the theoretical upper bound of the truncation error in (3.36) against the truncation point for various values of s. For all values of s, the upper bound of truncation error decreases as N increases.

Fig. 2
figure 2

The theoretical upper bound on the truncation error from (3.36) for combinations of s and N. The dashed line indicates an error of 0.01
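A short computation in the spirit of Fig. 2 tabulates, for a few smoothness levels, the smallest truncation point whose bound (3.36) falls below the 0.01 threshold (an illustrative sketch; n = 1507 as in these data):

```python
import numpy as np

def error_bound(n, s, N):
    """Upper bound (3.36) on the truncation error, with r = 5 + ceil(s)
    and the bandwidth h = n^(-1/(2s+2)) of Theorem 3.1 (d = 2)."""
    r = 5 + int(np.ceil(s))
    h = n ** (-1 / (2 * s + 2))
    return 0.51 * h ** (-r) * N ** (-(r - 2)) / (np.pi * (r - 2))

n = 1507  # number of earthquakes in the dataset
for s in [0.01, 0.5, 1.0]:
    # Smallest N on a coarse grid (N >= 24 is required by the derivation
    # of (3.36)) keeping the bound below 0.01.
    N_ok = next(N for N in range(25, 2001) if error_bound(n, s, N) <= 0.01)
    print(s, N_ok)
```

Note how rougher densities (smaller s) force a much larger truncation point, consistent with the curves in Fig. 2.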

Because (5.1) is not guaranteed to be positive, we use a rectified density estimate,

$$\begin{aligned} {\hat{f}}^*_{n,h,N}( \xi ) = \max (10^{-3},{\hat{f}}_{n,h,N}( \xi ) ). \end{aligned}$$
(5.2)

In our analysis of these data, we explore the effect of the bandwidth and the truncation point of (5.1) on the density estimates of earthquake locations. We use out-of-sample performance to determine the bandwidth and truncation point. Specifically, we randomly hold out 20% of the earthquakes as a test dataset \(\{X^{\text {test}}_1,\ldots ,X^{\text {test}}_{n_{\text {test}}}\}\) to validate the density estimator. Using many different bandwidth and truncation point combinations, we calculate density estimators \({\hat{f}}^*_{n_{\text {train}},h,N}( \xi )\) using the remaining 80% of the data. Then, at the hold-out locations, we evaluate

$$\begin{aligned} {\hat{f}}^*_{n_{\text {train}},h,N}( X^{\text {test}}_1 ),\dots , {\hat{f}}^*_{n_{\text {train}},h,N}( X^{\text {test}}_{n_{\text {test}}} ). \end{aligned}$$

Using these evaluations, we compute the out-of-sample mean log-loss (negative log-score) at the hold-out locations

$$\begin{aligned} \frac{1}{n_{\text {test}}}\sum ^{n_{\text {test}}}_{i=1} -\log \left( {\hat{f}}^*_{n_{\text {train}},h,N}( X^{\text {test}}_i )\right) . \end{aligned}$$

The use of the log-loss as a proper scoring rule is common; see, for example, Good (1952) and Gneiting and Raftery (2007).

Rather than consider the bandwidth directly, we let \(h = n^{-1/(2s + 2)}\) and consider \(s \in \{0.001, 0.01, 0.05, 0.5, 1 \}\), where s indexes the smoothness of the density (see Sect. 2.4). Based on how concentrated earthquake events are, small values of s (i.e., smaller bandwidths) are preferable to smoother alternatives, which would yield more uniform density estimates. In our analysis, we also vary the truncation point of the density estimator, \(N \in \{5,10,20,30,40,50,75,100\}\); larger values of N yielded no improvement in log-loss. We select the truncation point and bandwidth with the lowest out-of-sample mean log-loss.
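The scoring step can be sketched as follows; the uniform density on the sphere stands in for a fitted estimator (this is an illustration, not the code used in the analysis), and its known log-loss \(\log (4\pi )\) serves as a sanity check:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_log_loss(density, X_test):
    # Rectify the density values as in (5.2), then score with the
    # out-of-sample mean log-loss (negative log-score).
    vals = np.maximum(1e-3, density(X_test))
    return -np.mean(np.log(vals))

# Hold-out locations: uniform points on the sphere (normalized Gaussians).
X_test = rng.standard_normal((300, 3))
X_test /= np.linalg.norm(X_test, axis=1, keepdims=True)

# Stand-in "fitted" estimator: the uniform density 1/(4 pi) on S^2.
uniform = lambda X: np.full(len(X), 1 / (4 * np.pi))
print(mean_log_loss(uniform, X_test))  # = log(4*pi) ≈ 2.531
```

In the actual analysis this score is computed for every (s, N) combination on the grid, and the pair minimizing it is selected.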

We plot the mean log loss as a function of the truncation point for various values of s in Fig. 3. Overall, for each s, increasing N improves out-of-sample performance up to a point; then, the improvement flattens and appears to reach an asymptote. In addition, smaller values of s have better out-of-sample performance; however, values of s less than 0.01 do not change model performance. The best out-of-sample performance (lowest log loss) is with \(N = 75\), and there is no appreciable difference between \(s = 0.01\) and \(s = 0.001\) (or even smaller values of s). For this reason, we use \(s = 0.01\).

Fig. 3
figure 3

The mean out-of-sample log loss (negative log-score) for various combinations of s and N

For \(N = 75\) and \(s = 0.01\), we plot a heat map of the estimated density over a fine grid over the Earth in Fig. 4. The colors are on a natural log scale to better show variability in the density. The Pacific Rim is evident in the density estimate, with the most striking feature being the very high estimated densities around the Pacific Islands. We also note that the South Sandwich Islands and the Alpide Belt are visible for their relatively high earthquake densities.

Fig. 4
figure 4

Kernel density estimate for the 1990–2021 earthquake data for \(s = 0.01\) and \(N = 75\). The colors of the density are presented on the log scale

In this data illustration, we considered a global dataset of earthquakes from 1990–2021. We compared many truncated kernel density estimators indexed on the sphere on a test set. We found that the kernel density estimates perform better out-of-sample with shorter bandwidths (smaller s) and more polynomial terms (higher N). For the best combination of s and N considered, we plotted the estimated density and commented on its features. In future analyses of these data, one may also estimate the density of earthquake magnitude; in that case, it may be beneficial to allow the magnitude density to vary smoothly over space as in Sheanshang et al. (2021). In addition, one may account for aftershock excitation in the estimation, as is sometimes done in point process methodology (see, e.g., Hawkes 1971a, b; Ogata 1988, 1998; White and Gelfand 2021).