BUMP HUNTING THROUGH DENSITY CURVATURE FEATURES

Abstract. Bump hunting deals with finding in sample spaces meaningful data subsets known as bumps. These have traditionally been conceived as modal or concave regions in the graph of the underlying density function. We define an abstract bump construct based on curvature functionals of the probability density. Then, we explore several alternative characterizations involving derivatives up to second order. In particular, a suitable implementation of Good and Gaskins' original concave bumps is proposed in the multivariate case. Moreover, we bring to exploratory data analysis concepts like the mean curvature and the Laplacian that have produced good results in applied domains. Our methodology addresses the approximation of the curvature functional with a plug-in kernel density estimator. We provide theoretical results that ensure the asymptotic consistency of bump boundaries in the Hausdorff distance with affordable convergence rates. We also present asymptotically valid and consistent confidence regions bounding curvature bumps. The theory is illustrated through several use cases in sports analytics with datasets from the NBA, MLB and NFL. We conclude that the different curvature instances effectively combine to generate insightful visualizations.


Introduction
The subject of bump hunting (BH) refers to the set estimation task [2] of discovering meaningful data regions, called bumps, in a sample space [21]. The most representative example is the modal regions in a probability density function (pdf), which are literally bumps in its graph. Even though the concept has a broader scope, BH remains relatively unexplored.
Consider the problem of identifying made shots on a basketball court. Coaches, scouts and other personnel might be interested in extracting shooting patterns for adopting specific pre-game strategies, assessing talent or working on player development. Fig. 1 illustrates four different ways of constructing bumps with basketball shot data. Fig. 1a and Fig. 1b correspond to Hyndman's classical highest density region (HDR) configurations, while Fig. 1c and Fig. 1d follow our novel curvature-based characterizations. Each of them presents a distinctive perspective on the underlying shooting tendencies. Fig. 1a and Fig. 1c point at fine-grained locations, whereas Fig. 1b and Fig. 1d cover entire influence areas. Smaller regions suggest spots to prioritize in an offensive or defensive scheme. The larger ones connect the dots, revealing general trends. Both views complement each other to offer a complete picture.

1.1. Goals. We propose a new curvature-based BH methodology addressing some blind spots of classical methods. Fig. 1a and Fig. 1b either miss or mask relevant information. The finer-grained 35%-HDR does not include the perimeter concave bumps in Fig. 1c. Meanwhile, the 95%-HDR fails to keep the short, mid and long ranges well separated, as opposed to the Laplacian bumps in Fig. 1d.
Contributions. The main contributions of this paper are:
➤ Presenting a general set estimation framework for curvature-based BH.
➤ Extending concave bumps to the multivariate setting.
➤ Introducing mean curvature and Laplacian bumps.
➤ Deriving consistency convergence rates for curvature bump boundaries.
➤ Building valid and consistent confidence regions for curvature bumps.
➤ Showcasing the numerous applications of curvature-based BH.
1.2. Related work. One of the first BH references was due to Good and Gaskins in 1980 [21]. They offered a premier definition of a bump as the concave region delimited between two inflection points. Moreover, they suggested an extension to the multivariate case. Fig. 1c corresponds to our implementation of multivariate concave bumps.
In 1996, Hyndman introduced the concept of HDR, which he conceives as level sets of the pdf f that enclose a certain probability mass [23]. More formally, the (1 − α)-level HDR is defined as R(f_α) = {x : f(x) ≥ f_α}, where f_α is the largest value such that P(X ∈ R(f_α)) ≥ 1 − α, and the random variable (rv) X is such that X ∼ f. HDRs satisfy the nice property of being the smallest sets with a given probability mass.
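Hyndman's threshold f_α can be approximated from a sample by evaluating the density estimate at the sample points themselves: the empirical α-quantile of those values leaves roughly a (1 − α) fraction of the mass above it. A minimal sketch with SciPy's `gaussian_kde`; the bivariate normal sample and the default bandwidth are illustrative assumptions, not the paper's tuning:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))            # i.i.d. sample from a bivariate pdf
kde = stats.gaussian_kde(X.T)            # gaussian_kde expects shape (d, n)

# Plug-in recipe for f_alpha: the empirical alpha-quantile of the density
# values at the sample points, so that roughly a (1 - alpha) fraction of
# the probability mass lies above the threshold.
alpha = 0.05
dens_at_points = kde(X.T)
f_alpha = np.quantile(dens_at_points, alpha)

# A point x belongs to the estimated (1 - alpha)-HDR iff f_hat(x) >= f_alpha.
origin_in_hdr = kde(np.zeros((2, 1)))[0] >= f_alpha
```

The quantile trick sidesteps any numerical integration: the threshold is calibrated directly on the sample.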
In 1999, Chaudhuri and Marron presented SIgnificant ZERo crossings of derivatives (SiZer), envisioning bumps as places where the first derivative becomes zero [9].
In 2002, Chaudhuri and Marron showcased the role of second derivatives in an unpublished manuscript [10]. Also in 2002, Godtliebsen, Marron, and Chaudhuri explored curvature features from a pointwise perspective by assessing Hessian eigenvalue sign combinations in the bivariate case [20]. A multivariate extension to [20] was formulated in 2008 by Duong et al., targeting the pointwise significance of nonzero Hessian determinants [18]. Lastly, in 2021, Marron and Dryden elaborated on second derivatives in their book Object Oriented Data Analysis [27].

1.3. Outline. The new methodology is presented in Section 2. The supplementary material (SM) [8] provides the necessary differential geometry foundations. In turn, Section 3 is entirely dedicated to asymptotic consistency and inference results. A sports analytics application is explored in Section 4. The SM [8] includes all the proofs and computational details. We reflect on the proposed methodology in Section 5.

Methods
Our methodology finds alternative ways of analyzing sample spaces by exploiting pdfs' curvature properties, adhering to Chaudhuri and Marron's defence of pdf derivatives. Considering Hyndman's approach a well-established tool, we believe there are still some blind spots to address with curvature.
Hyndman's HDRs have the advantage of always including global modes. However, they may miss local modes if these are small enough; lowering the threshold α might not capture them without obfuscating the HDR. On the other hand, when varying α works, questions remain on the specific value it should take. Moreover, sometimes it is necessary to explore the whole range of α ∈ (0, 1) to recover all the relevant pdf features [31].
Consider a d-variate pdf f : R^d → [0, ∞). We define bumps as subsets of R^d of the form

(1) B^ϕ = {x ∈ R^d : (−1)^s ϕ[f](x) > 0},

for some functional ϕ measuring the curvature of f at any point, and some sign selector s ∈ {0, 1} that will usually be kept implicit. If the gradient ∇ϕ[f] does not vanish near the zero-level set of ϕ[f], the bump boundary ∂B^ϕ is retrieved by substituting the inequality with an equality sign in (1) [29, Remark 3.1]; see Theorem 2 ahead for a formal condition [14, Assumption G] [11]. Contrary to HDRs, the idea behind (1) is that ϕ carries an implicit threshold, namely zero, to determine whether a point belongs to the bump, solving the arbitrariness of the choice of α in HDRs.
Once some curvature functional is chosen, we propose to employ a kernel plug-in estimator of B^ϕ, replacing f with its kernel density estimator (KDE) in (1). Thus, given a sample X_1, . . . , X_n of independent and identically distributed (i.i.d.) random variables with pdf f and a bandwidth h > 0, we consider the KDE of f as

(2) f̂_{n,h}(x) = (1/n) ∑_{i=1}^n K_h(x − X_i), where K_h(·) = h^{−d} K(·/h),

for some kernel function K, typically a d-variate pdf. Using (2), we then define the plug-in estimator of (1) as

(3) B̂^ϕ_{n,h} = {x ∈ R^d : (−1)^s ϕ[f̂_{n,h}](x) > 0}.

To a first approximation, a scalar bandwidth is chosen for simplicity. Chacón and Duong demonstrated that, for d > 1, unconstrained bandwidth matrices produce significant performance gains, especially in kernel density derivative estimation (KDDE) [6, Section 5.2]. Preliminary experiments seem to support their recommendation also for curvature-based BH. Nonetheless, all the theoretical developments and, consequently, all the exhibition figures in this paper obey this simplification. On the other hand, the kernel K has a lower impact on the results. Most of the statements in Section 3 do not impose a particular choice. However, all of them are compatible with the Gaussian kernel (see [1, 12, 13]), which is almost universally preferred in a multivariate setting [6, p. 15].

For d = 1, Chaudhuri and Marron studied the functional ϕ[f] = f′′, which leads to concave bumps, if s = 1, or convex dips, if s = 0. Different alternatives arise in the multivariate case. The geometrical concepts in the SM [8] lay the grounds for characterizing bumps in alternative ways to HDRs. Considering pdfs as hypersurfaces, notions like the mean and Gaussian curvatures find new usages in statistics. Fig. 2 illustrates the two main kinds of curvature bumps in this paper. Even though ϕ may a priori depend on partial derivatives of f of arbitrary order, the theory of hypersurfaces in the SM [8] suggests that our quest for curvature features is essentially fulfilled with up to second derivatives of the pdf f.
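The KDE and the plug-in rule translate almost directly into code. A minimal sketch with a Gaussian kernel and a scalar bandwidth; the sample, the bandwidth and the trivial `phi` passed at the end are illustrative assumptions:

```python
import numpy as np

def kde(x, data, h):
    # (2): f_hat_{n,h}(x) = (1/n) sum_i K_h(x - X_i), here with a Gaussian
    # kernel and a scalar bandwidth h, where K_h(.) = h^{-d} K(./h).
    n, d = data.shape
    u = (x[None, :] - data) / h
    return np.mean(np.exp(-0.5 * np.sum(u**2, axis=1))) / ((2 * np.pi) ** (d / 2) * h**d)

def in_bump(x, data, h, phi, s=1):
    # (3): plug-in rule -- x belongs to the estimated bump iff
    # (-1)^s phi[f_hat](x) > 0, for a callable phi evaluating the
    # curvature functional of the KDE at x.
    return (-1) ** s * phi(x, data, h) > 0

rng = np.random.default_rng(1)
sample = rng.normal(size=(400, 2))       # illustrative bivariate sample
f0 = kde(np.zeros(2), sample, h=0.5)     # density estimate at the origin
```

Any of the curvature functionals introduced below can be dropped in as `phi`; the sign selector `s` switches between bump and dip detection.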
Given the connection of curvature with second derivatives, we propose targeting r = 2 in one of the standard bandwidth selectors [5]. The same heuristic worked well for KDE-based applications such as mean shift clustering or feature significance testing [6, Chapter 6].
2.1. Concavity and convexity. Given a sufficiently smooth pdf f, let us define λ_i[f], for i ∈ {1, 2, . . . , d}, as the function mapping x ∈ R^d to the i-th largest possibly repeated eigenvalue of D²f(x), the Hessian matrix of f at x, i.e.,

(4) λ_1[f](x) ≥ λ_2[f](x) ≥ · · · ≥ λ_d[f](x), for all x ∈ R^d.

As mentioned in the SM [8], the eigenvalues of the Hessian (or the shape operator, equivalently) determine local concavity and convexity. Let us assume that (−1)^s λ_i[f] > 0, for all i, on some subset U ⊂ R^d. If s = 0, f will be locally convex, whereas if s = 1, it will be locally concave on U. Considering the ordering of functions (4), we can express the former concave and convex bumps in terms of a single functional, aligned with a specific sign s, as, respectively,

(5) B^{λ_1} = {x ∈ R^d : λ_1[f](x) < 0} (s = 1),
(6) B^{λ_d} = {x ∈ R^d : λ_d[f](x) > 0} (s = 0).

The concave region (5) yields the most recognizable flavour of bumps in the literature, this time in a multivariate setting. It is the method depicted in Fig. 1c. As for (6), they are actually not bumps but dips. Assuming non-degenerate Hessians, concave bumps typically delineate areas near local pdf modes, while convex dips do so with local minima. Consequently, the former and the latter are known as peaks and holes [20, Table 1].
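Since the Gaussian-kernel KDE has a closed-form Hessian, the eigenvalue conditions behind concave bumps (5) and convex dips (6) can be checked pointwise. A sketch under the same scalar-bandwidth simplification; the sample and bandwidth are illustrative assumptions:

```python
import numpy as np

def kde_hessian(x, data, h):
    # Closed-form Hessian of a Gaussian-kernel KDE:
    # D^2 f_hat(x) = (1/n) sum_i K_h(x - X_i) (u_i u_i^T - I) / h^2,
    # where u_i = (x - X_i) / h.
    n, d = data.shape
    u = (x[None, :] - data) / h
    k = np.exp(-0.5 * np.sum(u**2, axis=1)) / ((2 * np.pi) ** (d / 2) * h**d)
    outer = np.einsum('ni,nj->nij', u, u) - np.eye(d)[None]
    return np.mean(k[:, None, None] * outer, axis=0) / h**2

def in_concave_bump(x, data, h):
    # (5): x is in the concave bump iff the largest eigenvalue is negative.
    return np.linalg.eigvalsh(kde_hessian(x, data, h))[-1] < 0  # lambda_1 < 0

def in_convex_dip(x, data, h):
    # (6): x is in the convex dip iff the smallest eigenvalue is positive.
    return np.linalg.eigvalsh(kde_hessian(x, data, h))[0] > 0   # lambda_d > 0

rng = np.random.default_rng(2)
sample = rng.normal(size=(800, 2))
at_mode_concave = in_concave_bump(np.zeros(2), sample, h=0.6)
at_mode_convex = in_convex_dip(np.zeros(2), sample, h=0.6)
```

At the mode of a unimodal sample, the estimated Hessian is negative definite, so the point falls in the concave bump and not in the convex dip.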
When concave bumps contain local modes, they make the most natural definition of a d-dimensional neighbourhood. Although straightforward, considering modal regions as ε-fattenings or enlargements (see Section 3.1.1 below) poses challenges regarding the choice of ε > 0, as similarly argued for α in HDRs. Besides, employing a single radius ε limits the overall expressiveness of the bump. On the other hand, if we saw modal regions as basins of attraction instead [3], despite ε disappearing and attaining more flexibility, we would not be pursuing a solution to a BH problem any more but a clustering one, giving up on the cohesive sense of bumps. In this respect, concave bumps provide us with an elegant compromise answer.
Moreover, this modal vicinity notion seamlessly incorporates the missing mode scenario. Concave bumps point out incipient modal regions such as the central mouth in Fig. 2a and Fig. 2c, which does not contain a mode. Such weak modal regions are well known in the context of univariate mode hunting as shoulders, representing complicated cases [15]. As for BH, d-dimensional shoulders deserve attention as evidence of hidden structure. See the NFL application in the SM [8] for an interpretable dynamic shoulder. In turn, the mouth in Fig. 2 is characteristic of mixtures whose components influence each other significantly. All in all, concave bumps subsume the modal regions, having a slightly broader reach.
2.2. Gradient divergence. Concave bumps may be too restrictive in some use cases. Imagine the pdf graph as a landscape, with mountains being local high-density regions. Concave bumps originate near mountain peaks, missing most of the hillside. Mean curvature allows the discovery of entire mountain chains.
The shape operator is a linear map of the tangent space that measures how a manifold bends in different directions (see the SM [8] for a formal definition). Let us consider its eigenvalues: the principal curvatures. Concavity requires all principal curvatures to be negative. By contrast, the mean curvature adds them all so that only the net sign matters. Computing curvature in this way fills the gaps between concave peaks in a long ridge [20], as depicted in Fig. 2b and Fig. 2c in the form of a boomerang.
The SM [8] shows the connection between the mean curvature and the divergence of the normalized version of the gradient, ∇̃f = ∇f / √(1 + ∥∇f∥²). The divergence operator takes positive values when the argument field diverges from a point, whereas the sign is negative when it converges. Therefore, we define the mean curvature bump as

(7) B^div = {x ∈ R^d : div(∇̃f)(x) < 0}.

When the gradient is small, as is usually the case for pdfs (one can even tweak the scale of the random variables to make ∥∇f∥ small), the Laplacian ∆f = div(∇f) = ∑_{i=1}^d ∂²f/∂x_i² roughly approximates the mean curvature (see [19, Equation 5.28]). Hence, we define the Laplacian bump as

(8) B^∆ = {x ∈ R^d : ∆f(x) < 0}.

Note that B^{λ_1} ⊂ B^∆. Even though (8) may be less intrinsic than (7), it has a more straightforward form, for ∆ is a second-order linear differential operator on f. A discretized version of the Laplacian operator has been used for contour detection in image processing through the Laplacian-of-Gaussian algorithm [22]. We have already seen an example of a Laplacian bump in Fig. 1d. The results would have been almost indistinguishable if the mean curvature had been employed.

The term ridge was used above to convey a mountain range covering several peaks, following [20]. Ridges also refer in the statistical literature to a specific definition of higher-dimensional pdf modes [13]. This concept of ridge shares with Laplacian and mean curvature bumps the ability to unveil filament-like structures. However, ridges are intrinsically one-dimensional in their most typical form. For them to extend to R^d, one would need to take an ε-enlargement, introducing some arbitrariness and rigidity with ε that gradient divergence bumps do not have. In our context, we will stick to the informal meaning of ridge in the following sections.
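Because the Laplacian is the trace of the Hessian, the Gaussian-KDE case again admits a closed form. A sketch of the Laplacian bump membership test (8); the sample and bandwidth are illustrative assumptions:

```python
import numpy as np

def kde_laplacian(x, data, h):
    # Trace of the closed-form Gaussian-KDE Hessian:
    # Delta f_hat(x) = (1/n) sum_i K_h(x - X_i) (||u_i||^2 - d) / h^2,
    # where u_i = (x - X_i) / h.
    n, d = data.shape
    u = (x[None, :] - data) / h
    sq = np.sum(u**2, axis=1)
    k = np.exp(-0.5 * sq) / ((2 * np.pi) ** (d / 2) * h**d)
    return np.mean(k * (sq - d)) / h**2

def in_laplacian_bump(x, data, h):
    # (8): x belongs to the Laplacian bump iff Delta f_hat(x) < 0.
    return kde_laplacian(x, data, h) < 0

rng = np.random.default_rng(3)
sample = rng.normal(size=(600, 2))
lap_at_mode = kde_laplacian(np.zeros(2), sample, h=0.6)
```

Only a single scalar per point is needed, which is why Laplacian bumps are cheaper to map on a grid than the eigenvalue-based ones.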
2.3. Intrinsic curvature. The Gaussian curvature is an intrinsic measure derived from the shape operator (see the SM [8] for a precise definition). This and the Hessian determinant provide alternative ways to detect warps. The analysis of these two notions is more subtle than in the previous sections: from the definition of Gaussian curvature in the SM [8], many sign combinations among the multiplied principal curvatures produce the same net sign.
The Gaussian curvature and the Hessian determinant differ by a positive factor; thus, if we set the bump detection threshold at zero, we can restrict our analysis to the latter and define

(9) B^det = {x ∈ R^d : det D²f(x) > 0}.

In the bivariate case, the bump (9) coincides with the union of (5) and (6). Therefore, (9) is helpful for detecting both concave bumps and convex dips simultaneously. We will refer to (9) as a Gaussian bump.
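For d = 2, the determinant criterion reduces to one scalar computed from the closed-form KDE Hessian. A sketch, with an illustrative sample and bandwidth:

```python
import numpy as np

def kde_hessian_det(x, data, h):
    # Determinant of the closed-form Gaussian-KDE Hessian at x.
    n, d = data.shape
    u = (x[None, :] - data) / h
    k = np.exp(-0.5 * np.sum(u**2, axis=1)) / ((2 * np.pi) ** (d / 2) * h**d)
    H = np.mean(k[:, None, None] * (np.einsum('ni,nj->nij', u, u) - np.eye(d)[None]),
                axis=0) / h**2
    return np.linalg.det(H)

def in_gaussian_bump(x, data, h):
    # (9): for d = 2, det D^2 f_hat(x) > 0 flags both concave bumps
    # (two negative eigenvalues) and convex dips (two positive ones).
    return kde_hessian_det(x, data, h) > 0

rng = np.random.default_rng(5)
sample = rng.normal(size=(600, 2))
det_at_mode = kde_hessian_det(np.zeros(2), sample, h=0.6)
```

At a well-separated mode both eigenvalues are negative, so the determinant is positive and the point falls in the Gaussian bump.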

Asymptotics
This section will demonstrate the soundness of plug-in estimators in the asymptotic regime for curvature bumps.
3.1. Consistency. We rely on a recent result by Chen to prove consistency [11]. Let

(10) M = {x ∈ R^d : Ψ(x) = 0} and (11) M̂ = {x ∈ R^d : Ψ̂(x) = 0}

be two solution manifolds defined by their criterion functions Ψ, Ψ̂ : R^d → R, respectively. Chen's stability theorem shows that M and M̂ are near whenever the criterion functions and their derivatives are close. In our context, Ψ will represent a curvature measure and Ψ̂ the corresponding kernel plug-in estimator, so that M and M̂ are the boundaries of the associated curvature bumps.
3.1.1. Notation and set concepts. For a derivative index vector β = (β_1, . . . , β_d) ∈ Z_+^d, with |β| = β_1 + · · · + β_d, we write ∂^β for the partial derivative operator ∂^{|β|}/(∂x_1^{β_1} · · · ∂x_d^{β_d}). We also include the case β = 0, which represents the identity. We will denote by C^ℓ(A) the class of functions φ : A ⊂ R^d → R with continuous partial derivatives up to ℓ-th order. Likewise, we will say that a function φ : R^d → R is Hölder continuous with exponent α ∈ (0, 1] if there exists C ∈ (0, ∞) such that |φ(x) − φ(y)| ≤ C∥x − y∥^α, for all x, y ∈ R^d [24]. By convention, we include the case α = 0 when Hölder continuity does not hold for any positive exponent.
For any φ : R^d → R and some A ⊂ R^d, we denote ∥φ∥_∞ = sup_{x∈A} |φ(x)|, and we will indicate that the supremum is over A by explicitly stating that ∥φ∥_∞ satisfies some property on A. Also, write ∥φ∥_{∞,k} = max{∥∂^β φ∥_∞ : β ∈ Z_+^d, |β| = k}. All these norms will formalize how close the criterion functions and their respective derivatives are.
On the other hand, the stability theorem invokes some other concepts related to sets. Let us define the distance from a point x ∈ R^d to some subset A ⊂ R^d as d(x, A) = inf_{y∈A} ∥x − y∥, and the ε-fattening of a set A ⊂ R^d, where ε > 0, as A ⊕ ε = {x ∈ R^d : d(x, A) ≤ ε}.

3.1.2. Bias analysis. The problem of uniformly bounding the KDDE error refers to finding an infinitesimal bound for sup_{x∈R^d} |∂^β f̂_{n,h}(x) − ∂^β f(x)|. We first deal with the deterministic (bias) part of the error.

Lemma 1. Let β ∈ Z_+^d be a partial derivative index vector. Let f be a pdf in C^{|β|+r}(R^d), for some r ∈ Z_+ ∪ {∞}, with all partial derivatives bounded up to (|β| + r)-th order. Assume that ∂^β f is Hölder continuous on R^d with exponent α ∈ [0, 1]. If the exponent is α = 0, then ultimately assume that ∂^β f is uniformly continuous. Finally, let f̂_{n,h} be the KDE of f based on a true pdf kernel K vanishing at infinity and satisfying the moment constraints ∫_{R^d} z_i K(z) dz = 0 and ∫_{R^d} |z_i z_j| K(z) dz < ∞, for all i, j ∈ {1, . . . , d}. Then, sup_{x∈R^d} |E[∂^β f̂_{n,h}(x)] − ∂^β f(x)| = O(h^s) as h → 0, where s = max{α, min{r, 2}}.
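Lemma 1's bias bound, of order h^s with s = 2 in the smooth case, can be verified exactly in a toy case: for f = N(0, 1) and a Gaussian kernel, the mean of the KDE has the closed form f * K_h = N(0, 1 + h²), so the sup-norm bias is computable without simulation. This one-dimensional check is an illustrative assumption, not part of the paper's setting:

```python
import numpy as np
from scipy.stats import norm

# For f = N(0, 1) smoothed by a Gaussian kernel with bandwidth h, the
# expected KDE is the N(0, 1 + h^2) density.  With a twice-differentiable
# f and a symmetric kernel, s = 2, so halving h should divide the sup-norm
# bias by roughly 4.
x = np.linspace(-6.0, 6.0, 4001)

def sup_bias(h):
    smoothed = norm.pdf(x, scale=np.sqrt(1.0 + h**2))   # E[f_hat_h]
    return np.max(np.abs(smoothed - norm.pdf(x)))

ratio = sup_bias(0.2) / sup_bias(0.1)   # expect roughly 4 for an O(h^2) bias
```

The observed ratio close to 4 matches the quadratic rate predicted by the lemma for smooth densities.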
3.1.3. Stochastic error analysis. Lemma 2 below appears as an auxiliary result in [1] in the case ℓ = 3, but the proof works for an arbitrary ℓ.
Lemma 2 (Arias-Castro, Mason, and Pelletier, 2016 [1]). Let f be a bounded pdf in R^d and let f̂_{n,h} be the KDE of f. Fix a nonnegative integer ℓ as the maximum partial derivative order. Assume that K is a product kernel of the form K(x) = ∏_{i=1}^d κ_i(x_i), where each κ_i is a univariate pdf of class C^ℓ(R). Further, assume that all the partial derivatives up to ℓ-th order of K are of bounded variation and integrable on R^d. Then, there exists b ∈ (0, 1) such that, if h ≡ h_n is a sequence satisfying log n ≤ nh^d ≤ bn, then, for every β ∈ Z_+^d with |β| ≤ ℓ,

sup_{x∈R^d} |∂^β f̂_{n,h}(x) − E[∂^β f̂_{n,h}(x)]| = O(√(log n / (n h^{d+2|β|}))) a.s. as n → ∞.

Finally, note that Lemma 2 also holds for a sufficiently small but constant h.

3.1.4. Total error analysis.
Combining Lemma 1 and Lemma 2, we obtain a general consistency result in the supremum norm for KDDE.We will focus on the Gaussian kernel for simplicity, but any other satisfying the conditions in both Lemma 1 and Lemma 2 would do.
Theorem 1. Let β ∈ Z_+^d be a partial derivative index vector. Let f be a pdf in C^{|β|+r}(R^d), for some r ∈ Z_+ ∪ {∞}, with all partial derivatives bounded up to (|β| + r)-th order. Assume that ∂^β f is Hölder continuous on R^d with exponent α ∈ [0, 1]. If the exponent is α = 0, then ultimately assume that ∂^β f is uniformly continuous. Let f̂_{n,h} be the KDE of f based on the Gaussian kernel. Finally, let h ≡ h_n be a sequence converging to zero as n → ∞ and satisfying nh^d ≥ log n. Then,

sup_{x∈R^d} |∂^β f̂_{n,h}(x) − ∂^β f(x)| = O(h^s + √(log n / (n h^{d+2|β|}))) a.s. as n → ∞,

where s = max{α, min{r, 2}}. In particular, the left-hand side vanishes a.s. whenever n h^{d+2|β|}/log n → ∞.

3.1.5. Manifold stability. Theorem 2 gathers the essential elements of Chen's stability theorem needed in our context.
Theorem 2 (Chen, 2022 [11]). Let Ψ, Ψ̂ : R^d → R and let M and M̂ be as defined in (10) and (11), respectively. Assume that:
A1. There exists δ > 0 such that Ψ̂ has bounded first-order derivatives on M ⊕ δ.
A2. There exists λ > 0 such that ∥∇Ψ(x)∥ ≥ λ, for all x ∈ M ⊕ δ.
A3. ∥Ψ̂ − Ψ∥_∞ and ∥Ψ̂ − Ψ∥_{∞,1} are sufficiently small on M ⊕ δ.
Then, M̂ ⊂ M ⊕ ε for some ε ∈ (0, δ), and Haus(M, M̂) = O(∥Ψ̂ − Ψ∥_∞).

We have introduced in Theorem 2 a slight relaxation on the differentiability constraint for Ψ̂. Chen supposes differentiability and bounds on R^d, whereas we allow for a narrower domain M ⊕ δ. This deviation is justified since hypotheses (A) imply M̂ ⊂ M ⊕ ε ⊂ M ⊕ δ, where ε < δ. Since pdfs typically vanish at infinity, it might be unfeasible to ask Ψ̂ = ϕ[f̂_{n,h}] to be differentiable everywhere. This is the case for the eigenvalues (4) in Proposition 1 below, where condition (12) would not hold if the infimum were taken over R^d.
Finally, putting all the pieces together, we get the following main result.

Theorem 3. Assume the following:
♦ Let ϕ be a curvature functional defined over d-variate pdfs depending on their partial derivatives up to ℓ-th order. More formally, given a pdf p, we have ϕ[p] = φ(∂^{β_1}p, . . . , ∂^{β_m}p), for some φ : R^m → R and derivative index vectors β_1, . . . , β_m ∈ Z_+^d with max_i |β_i| = ℓ.
♦ Let f be a pdf in C^{ℓ+r}(R^d), for some r ∈ {1, 2, . . . , ∞}, with all partial derivatives bounded up to (ℓ + r)-th order. If r = 1, further assume that the (ℓ + 1)-th partial derivatives of f are either Hölder continuous with exponent α ∈ (0, 1] or uniformly continuous.
♦ Let fn,h be the KDE of f based on the Gaussian kernel.
♦ Let h ≡ h_n converge to zero and satisfy lim_{n→∞} n^{−1} h^{−(d+2ℓ+2)} log n = 0.
♦ Let the curvature bump boundary and its plug-in estimator, respectively, be ∂B^ϕ = {x ∈ R^d : ϕ[f](x) = 0} and ∂B̂^ϕ_{n,h} = {x ∈ R^d : ϕ[f̂_{n,h}](x) = 0}.
Further, suppose that there exists δ > 0 such that φ is infinitely differentiable on a neighbourhood U ⊂ R^m of the derivative values attained by f on ∂B^ϕ ⊕ δ, and that ∥∇ϕ[f]∥ is bounded away from zero on ∂B^ϕ ⊕ δ. Then, Haus(∂B̂^ϕ_{n,h}, ∂B^ϕ) → 0 a.s. as n → ∞, and the optimal bound is Haus(∂B̂^ϕ_{n,h}, ∂B^ϕ) = O((log n / n)^{2/(d+2ℓ+4)}). This rate coincides up to a logarithmic term with the optimum in KDDE for ℓ-th order partial derivatives according to the root mean integrated square error criterion, which is O(n^{−2/(d+2ℓ+4)}) [7].
Theorem 3 straightforwardly leads to bump boundary convergence results for the determinants and traces of the shape operator and the Hessian matrix.
Example 1. Consider the Laplacian and Gaussian bumps (8) and (9) for a bivariate pdf f. For the trace, the underlying derivative functional is φ(a_1, a_2) = a_1 + a_2, considering β_1 = (2, 0) and β_2 = (0, 2); for the determinant, it is φ(a_1, a_2, a_3) = a_1 a_2 − a_3², taking β_1 and β_2 as before plus β_3 = (1, 1). In both cases, φ is an infinitely smooth function over U = R^m, making every δ > 0 satisfy the requirement in Theorem 3 without imposing additional hypotheses on the original pdf and its KDE.
The case for the Hessian eigenvalues is more involved. The functions λ_i[f] in (4) are not generally R^d-differentiable. To solve this differentiability issue, we will follow the standard assumption in Kato's book that, for every x ∈ R^d, all the eigenvalues of D²f(x) have multiplicity one [25, Theorem 5.16, p. 119]. We will ask for an even stronger hypothesis to ensure that all plug-in estimators λ_i[f̂_{n,h}] are eventually distinct everywhere for large n a.s.

Proposition 1. Let f be a pdf and let f̂_{n,h} be its KDE. Let us assume that f and f̂_{n,h} satisfy all the conditions in Theorem 1 so that the second-order partial derivatives of f are consistently approximated with plug-in estimators. Let us call ∂B^ϕ the bump boundary for the criterion function ϕ ≡ λ_j[f], for some j ∈ {1, . . . , d}. If there exists δ > 0 such that

(12) inf_{x ∈ ∂B^ϕ ⊕ δ} (λ_i[f](x) − λ_{i+1}[f](x)) > 0, for all i ∈ {1, . . . , d − 1},

then (12) also holds a.s. for n sufficiently large if we replace f by f̂_{n,h}. In particular, both λ_j[f] and λ_j[f̂_{n,h}] are infinitely differentiable functions of the second-order partial derivatives of f and f̂_{n,h}, respectively, on some neighbourhood ∂B^ϕ ⊕ δ a.s. for n sufficiently large.

3.2. Inference. In this section, we derive bootstrap inference for curvature bumps, following similar steps as in the scheme developed by Chen, Genovese, and Wasserman for pdf level sets [14]. To accommodate the required techniques, we will exclusively focus on curvature functionals ϕ deriving from the pdf Hessian D²f.

3.2.1. Inference scheme. We will simplify the inference problem by targeting f_h = E[f̂_{n,h}] = f * K_h, instead of f, considering the bias negligible for a small h. There are compelling arguments favouring f_h against f for inference purposes (see [14, Section 2.2] for a thorough discussion).
Let us call B^ϕ_h the smoothed version of (1) derived by replacing f with f_h. We will assume that B^ϕ_h ⊂ Θ, for some Θ ⊂ R^d, or at least that the inferential procedure focuses on B^ϕ_h ∩ Θ. Ideally, Θ should be as small as possible (hopefully Θ ≠ R^d) so that the resulting confidence regions are efficient.
Given α ∈ (0, 1), a path for narrowing down a (1 − α)-level confidence region for B^ϕ_h ∩ Θ is to consider, for some margin ζ^α_{n,h} > 0, the set bounds

(13) {x ∈ Θ : (−1)^s ϕ[f̂_{n,h}](x) ≥ ζ^α_{n,h}} ⊂ B^ϕ_h ∩ Θ ⊂ {x ∈ Θ : (−1)^s ϕ[f̂_{n,h}](x) ≥ −ζ^α_{n,h}};

thus, (13) are set bounds built from the B̂^ϕ_{n,h} in (3) approximating B^ϕ_h. This vertical scheme is similar to Chen, Genovese, and Wasserman's second method for pdf level set inference [14] and a particular case of Mammen and Polonik's universal approach [26].
Our inference results will establish conditions ensuring that the previous set inequality eventually holds with probability 1 − α when the margin is replaced with a data-driven estimate, that is,

(14) lim inf_{n→∞} P((13) holds with margin ζ̂^α_{n,h}) ≥ 1 − α,

while the set bounds (13) shrink towards B^ϕ_h ∩ Θ as n → ∞, for some sequence {ζ̂^α_{n,h}}_{n=1}^∞. The inference scheme (14) can be proven for all curvature bumps using Theorem 4. From Section 3.1, it is an exercise to realize that, under the conditions in which (14) will hold, and with a few mild additional assumptions, the boundaries of the set bounds (13) converge in the Hausdorff distance to ∂B^ϕ_h. In what follows, we will equivalently denote Q_p{X} ≡ Q_X(p) the p-th quantile, p ∈ (0, 1), of the rv X [32, p. 304].
Theorem 4. In the context described above, assume the following:
I. There exists a sequence of random variables Z_{n,h} ≥ S_{n,h}[ϕ] = sup_{x∈Θ} |ϕ[f̂_{n,h}](x) − ϕ[f_h](x)|. Let us further assume that √n Z_{n,h} converges weakly [33] to some rv Z as n → ∞, denoted by √n Z_{n,h} ⇝ Z. Suppose that Z has a continuous and strictly increasing cumulative distribution function (cdf).
II. For each α ∈ (0, 1), there is a margin estimate ζ̂^α_{n,h} such that √n ζ̂^α_{n,h} converges in probability to Q_{1−α}{Z} as n → ∞.
Then, for all α ∈ (0, 1), the asymptotic validity of the inference scheme (14) holds.
The following sections will introduce theoretical results leading to bootstrap estimates ζ̂^α_{n,h} that can be feasibly computed in practice. Mammen and Polonik's approach [26] achieves a sharp asymptotic coverage probability 1 − α in (14). A key difference separating their proposal from Chen, Genovese, and Wasserman's and ours is that they manage to bootstrap from an rv that is a supremum over a neighbourhood of the level set, unlike S_{n,h}[ϕ] in Theorem 4, which considers the whole Θ. See [30] for an overview of similar local strategies for level sets. Based on that, Mammen and Polonik's method will generally be less conservative.

3.2.2. Bootstrap outline.
The main point to fill the Theorem 4 template is approximating the stochastic errors for second-order linear differential operators D,

(15) E_{n,h}[D] = sup_{x∈Θ} |D f̂_{n,h}(x) − D f_h(x)|,

using bootstrap estimates

(16) E*_{n,h}[D | X_n] = sup_{x∈Θ} |D f̂*_{n,h}(x) − D f̂_{n,h}(x)|,

where f̂*_{n,h} is the KDE built on a bootstrap resample drawn with replacement from the original sample X_n. Assume that both (15) and (16) use the same kernel K everywhere. Estimating confidence regions for curvature bumps will go through, directly or indirectly, approximating the cdf of (15) with that of (16).
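The bootstrap stochastic error in (16) can be approximated on a finite grid standing in for Θ. A sketch for the Laplacian operator; the grid, bandwidth and number of resamples are illustrative choices, not the paper's tuning:

```python
import numpy as np

def kde_laplacian_grid(grid, data, h):
    # Laplacian of a Gaussian-kernel KDE evaluated at each row of `grid`.
    n, d = data.shape
    u = (grid[:, None, :] - data[None, :, :]) / h       # (m, n, d)
    sq = np.sum(u**2, axis=2)
    k = np.exp(-0.5 * sq) / ((2 * np.pi) ** (d / 2) * h**d)
    return np.mean(k * (sq - d), axis=1) / h**2

rng = np.random.default_rng(7)
data = rng.normal(size=(300, 2))
h = 0.6
g = np.linspace(-2.0, 2.0, 15)
grid = np.array([(a, b) for a in g for b in g])          # grid standing in for Theta
base = kde_laplacian_grid(grid, data, h)

# (16): resample X_n with replacement, recompute the operator with the same
# kernel and bandwidth, and take the sup over the grid.
boot_sups = np.empty(50)
for b in range(boot_sups.size):
    idx = rng.integers(0, len(data), size=len(data))
    boot = kde_laplacian_grid(grid, data[idx], h)
    boot_sups[b] = np.max(np.abs(boot - base))
```

A high quantile of `boot_sups` then provides a data-driven margin of the kind the inference scheme calls for.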

3.2.3. Gaussian process approximation. Lemma 3 below allows a Gaussian process (GP) approximation between the suprema (15) and (16). See [33] for further knowledge about GPs. The empirical process [33] on a sample X_1, . . . , X_n of i.i.d. d-dimensional random variables indexed by a class F of measurable functions φ : R^d → R is defined as the functional G_n mapping a function φ ∈ F to the rv

G_n(φ) = (1/√n) ∑_{i=1}^n (φ(X_i) − E[φ(X_1)]).

Lemma 3 invokes the pointwise measurable (PM) and Vapnik-Chervonenkis (VC) type classes of functions. We refer the reader to [33] for the former and briefly define the latter, including the auxiliary Definition 1.
Definition 1 (Covering number [33]).Let (V, ∥•∥) be a vector space with a seminorm and let F ⊂ V. We define the ϵ-covering number of F, denoted by N (F, V, ϵ), as the minimum number of ϵ-balls of the form {x ∈ V : ∥x − y∥ < ϵ}, where y ∈ V, needed to cover F.
Definition 2 (VC-type class of functions [16]). Let F be a class of measurable functions φ : R^d → R. Let Ψ be an envelope function for F, i.e., Ψ : R^d → R measurable such that sup_{φ∈F} |φ(x)| ≤ Ψ(x) for all x ∈ R^d. An F class equipped with an envelope Ψ is called a VC-type class if there exist A, ν ∈ (0, ∞) such that, for all ϵ ∈ (0, 1),

sup_Q N(F, L²(Q), ϵ ∥Ψ∥_{L²(Q)}) ≤ (A/ϵ)^ν,

where the supremum is taken over all finitely discrete probability measures Q defined on R^d and ∥Ψ∥_{L²(Q)} = (∫ Ψ² dQ)^{1/2}.

We will denote the Kolmogorov distance as d_K(X, Y) = sup_{t∈R} |F_X(t) − F_Y(t)|, where F_X is the cdf of the rv X. Likewise, X =_d Y will denote equality in distribution between the random variables.
If we apply Lemma 3 to (15), we get the following result.
Theorem 5. Let D denote any linear ℓ-th order differential operator. Let K ∈ C^ℓ(R^d) be a kernel with bounded ℓ-th derivatives. Further, suppose that the class K in (18) is VC-type. Let h ≡ h_n be a sequence with h ∈ (0, 1) and h^{−(d+ℓ)} = O(log n). Moreover, let B be a GP with the same properties as in Lemma 3 and indexed by the class F_h in (19). Then, there exists B_h =_d sup_{φ∈F_h} |B(φ)| such that, for n sufficiently large, d_K(√n E_{n,h}[D], B_h) = o(1). Moreover, the statement remains valid if we fix a constant h ∈ (0, 1), with the scaling adjusted accordingly.

A similar result establishes the asymptotic distribution for (16).
Theorem 6. Let D denote any linear ℓ-th order differential operator. Let K ∈ C^ℓ(R^d) be a kernel with bounded ℓ-th derivatives. Further, suppose that the class K in (18) is VC-type. Moreover, let B_{X_n} be a GP with the same properties as in Lemma 3, indexed by F_h as in (19), and with covariance given by the empirical analogue of (17), where x_i is the i-th observation in X_n. If h ≡ h_n is a sequence with h ∈ (0, 1) and h^{−(d+ℓ)} = O(log n), then the analogous Kolmogorov-distance approximation holds between √n E*_{n,h}[D | X_n] and B_{n,h}{X_n} = sup_{φ∈F_h} |B_{X_n}(φ)|.

Theorem 6 holds for any observations X_n. The applicability of this theorem relies on the assumption that B_{n,h}{X_n} ⇝ B_h a.s. This connection crystallises in the following result, which can be directly derived from Theorem 5 and Theorem 6.
Theorem 7. Let D denote any linear ℓ-th order differential operator. Let K ∈ C^ℓ(R^d) be a kernel with bounded ℓ-th derivatives. Further, suppose that the class K in (18) is VC-type. Let h ≡ h_n be a sequence with h ∈ (0, 1) and h^{−(d+ℓ)} = O(log n), and let Ω_{n,h}(X_n) = d_K(B_h, B_{n,h}{X_n}), where B_h and B_{n,h}{X_n} are as in Theorem 5 and Theorem 6, respectively. Let us allow X_n to vary as a random sample from the pdf f underlying the covariance structure (17) of B_h. Further, suppose that Ω_{n,h}(X_n) = o(1) a.s. under the previous hypotheses on h. Then, for n sufficiently large, d_K(√n E_{n,h}[D], √n E*_{n,h}[D | X_n]) = o(1) a.s.

We can state sufficient conditions under which Ω_{n,h}(X_n) would converge to zero a.s. Corollary 1 gathers all the previous findings in an easy, ready-to-use form.
Corollary 1. In the hypotheses of Theorem 7, if we further take a constant h and define B̄_h = B_h / √(h^{d+ℓ}), then the cdf of √n E*_{n,h}[D | X_n] consistently approximates that of √n E_{n,h}[D] in the Kolmogorov distance a.s. In particular, the bootstrap quantiles of E*_{n,h}[D | X_n] are asymptotically valid substitutes for those of E_{n,h}[D]. Finally, B̄_h has a continuous and strictly increasing cdf.

3.2.4. Inference for curvature bumps. The results from the previous section hold the key to ensuring (14) for curvature bumps.
Theorem 8. Let us fix h ∈ (0, 1). Let E*_{n,h}[• | X_n] be as defined in (16) with KDE based on a kernel K ∈ C²(R^d) with bounded second derivatives. Taking ℓ = 2, suppose that the class K in (18) is VC-type. For any α ∈ (0, 1), define the margin ζ̂^α_{n,h} = Q_{1−α}{E*_{n,h}[∆ | X_n]}, conditionally on X_n. Then, for all α ∈ (0, 1), the asymptotic validity of the inference scheme (14) holds a.s. for the smoothed version of the Laplacian bump (8).
Concave bumps & convex dips. Concave bumps and convex dips are more involved. To obtain a parallel result to Theorem 8, we will borrow the Tail Value at Risk (TVaR) concept from financial risk management [17]. The TVaR at level p ∈ (0, 1) of an rv X is defined as TVaR_p{X} = (1 − p)^{−1} ∫_p^1 Q_u{X} du. The TVaR is utilised to aggregate risks governed by an unknown dependence structure, for it satisfies TVaR_p{X} ≥ Q_p{X} and is sub-additive [17]. Contrary to quantiles, weak convergence does not guarantee TVaR convergence. Lemma 4 requires the random variables to be asymptotically uniformly integrable (a.u.i.) [32, p. 17].
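The TVaR has a direct empirical counterpart: average the sample values at or beyond the empirical p-quantile. A small sketch; the standard normal sample is an illustrative assumption (for N(0, 1), TVaR_{0.95} ≈ 2.06):

```python
import numpy as np

def tvar(samples, p):
    # Empirical Tail Value at Risk: the mean of the sample values at or
    # beyond the empirical p-quantile, approximating
    # (1 - p)^{-1} * integral over (p, 1) of the quantile function.
    q = np.quantile(samples, p)
    return samples[samples >= q].mean()

rng = np.random.default_rng(11)
z = rng.standard_normal(100_000)
q95 = np.quantile(z, 0.95)      # around 1.645 for N(0, 1)
t95 = tvar(z, 0.95)             # around 2.06 for N(0, 1)
```

The gap between `t95` and `q95` illustrates the conservativeness bought by replacing quantiles with TVaRs: the tail mean always dominates the quantile.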
Lemma 4. Let {X_n}_{n=1}^∞ be an a.u.i. sequence of random variables satisfying X_n ⇝ X for some rv X with a strictly increasing cdf. Then, lim_{n→∞} TVaR_p{X_n} = TVaR_p{X} for all p ∈ (0, 1), the limit being finite.
Then, Lemma 4 allows proving the main result.
Theorem 9. Let us fix h ∈ (0, 1). Let E_{n,h}[•] and E*_{n,h}[• | X_n] be as defined in (15) and (16) with KDE based on the same kernel K ∈ C²(R^d) with bounded second derivatives. Taking ℓ = 2, suppose that the class K in (18) is VC-type. For any α ∈ (0, 1), define the margin ζ̂^α_{n,h} by aggregating, through the TVaR at level 1 − α, the bootstrap suprema E*_{n,h}[D_ij | X_n], where D_ij denotes second-order partial differentiation in the i and j variables. Moreover, let us assume the following:
(1) Letting B̄_h[D_ij] be the rv such that √n E_{n,h}[D_ij] ⇝ B̄_h[D_ij], the corresponding sum of the B̄_h[D_ij] has a continuous and strictly increasing cdf.
(2) For each pair (i, j), the sequences {√n E_{n,h}[D_ij]}_{n=1}^∞ and {√n E*_{n,h}[D_ij | X_n]}_{n=1}^∞ are a.u.i.
Then, for all α ∈ (0, 1), the asymptotic validity of the inference scheme (14) holds a.s. for the smoothed version of the concave bump (5) and the convex dip (6).
The assumptions (1) and (2) seem natural. Hypothesis (1) asks a sum of nonnegative random variables with continuous and strictly increasing cdfs to have a continuous and strictly increasing cdf too, which should be valid except in pathological cases. Similarly, knowing both sequences in hypothesis (2) converge weakly, being a.u.i. amounts to the convergence of their expectations [32, Theorem 2.20].
Gaussian bumps. A similar result to Theorem 9 holds for Gaussian bumps (9).

Theorem 10. Consider the same hypotheses in Theorem 9 in the case d = 2. Assume a Gaussian kernel K. Further, assume that the true pdf f is bounded. Let C be a constant such that C > (πh⁴)^{−1}. For any α ∈ (0, 1), define the margin analogously to Theorem 9, with the constant C entering its scaling. Then, for all α ∈ (0, 1), the asymptotic validity of the inference scheme (14) holds a.s. for the smoothed version of the Gaussian bump (9).

Application
We will explore a sports analytics application for d = 2 in the National Basketball Association (NBA). See the SM [8] for additional applications with d ∈ {1, 3} in two American leagues: the National Football League (NFL) and Major League Baseball (MLB). Each player and team has its own style, a form of DNA. Following the biological analogy, just as a single gene activates a trait in natural DNA, even minor bumps in data may reveal essential features.
All three sports applications are representative of the use of kernel methods for exploratory data analysis (EDA). Moreover, our proposal has a marked visual intent, thus excelling in low dimensions. In this context, the curse of dimensionality that harms kernel methods, demanding larger sample sizes to retain precision, becomes less relevant [6, Section 2.8].
Bivariate made shots in the NBA. Most people are familiar with basketball's three-point line (3PL), behind which a made shot earns not two but three points. Sports analytics has demonstrated that attempting more of these shots is well worth the risk, given the increased efficiency of three-point shooters. This trend has recently changed the basketball landscape, especially in the NBA.
Chacón exemplified univariate multimodality with shooting distances to the basket in the NBA [4]. There, we could see that the highest mode in a pdf model of all shots for the 2014-2015 season peaked beyond the 3PL. Looking at shots from a bivariate perspective will reveal the 3PL not as two separate modes but as a ridge [6].
We will examine bumps from shot data by the three best scorers in the 2015-2016 NBA season: Stephen Curry, James Harden and Kevin Durant. Fig. 3 and Fig. 4 present different perspectives on concave and Laplacian bumps. Setting the near-the-rim shots aside, the three players have different shooting DNAs. Stephen Curry (Fig. 4a) operates beyond the 3PL, covering the entire ridge. He also demonstrates good range, with even some half-court shots. However, he barely uses the mid-range area. His shooting patterns are mostly symmetrical. James Harden (Fig. 4b) has similar trends to Curry's. He almost covers the 3PL while leaning towards some mid-range areas, without half-court shots. Some notable asymmetries are present. Kevin Durant (Fig. 4c) has a more balanced game between mid and long shots. He shoots mainly facing the basket, with lower usage of lateral shots. Fig. 5 complements the previous figures with confidence sets. As regards concave bumps, a wholly or partially ring-shaped area around the basket can be excluded with confidence for the three players. Apart from the shots near the rim, we cannot find other spots likely contained in the concave bumps. Regarding Laplacian bumps, the lower bound confidence sets become more relevant, even far from the rim. For Curry, up to four high-confidence spots appear beyond the 3PL, including the left corner; for Harden, the number of outside high-confidence spots decreases to two, while for Durant, there is only one.

Discussion
Our curvature BH methodology represents the next step in density BH techniques, a path opened by Good and Gaskins [21] and consolidated by Hyndman [23] and Chaudhuri and Marron [9]. Rather than sticking to a purely probabilistic view of pdfs, our proposal thrives on sound geometry principles that have produced good results in applied areas like image processing [22].
Our work strongly relies on KDDE, continuing the exploration of applications for higher-order partial derivatives of the pdf [5]. On the other hand, we bring to curvature BH some cutting-edge techniques for level set estimation and inference that extend the pointwise-oriented initial works by Godtliebsen, Marron, and Chaudhuri [20] and Duong et al. [18].
The presented curvature framework shows great applicability from a theoretical standpoint. Under mild assumptions, the mean curvature, Laplacian and Gaussian bumps are consistent with affordable convergence rates. The confidence regions for Laplacian bumps are also asymptotically valid and consistent. The cases of Gaussian bumps (inference) and concave bumps and convex dips (consistency and inference) are slightly more technical. Notwithstanding, pathological cases should not often appear in practice.
The NBA application shows promise for EDA and clustering. Fig. 4a presents a most pleasing result, identifying the 3PL area and the most relevant shooting spots. Both bump types are valuable and combine to produce insightful visualizations.
Comparing the pictures in Fig. 4, we see that curvature bumps capture the players' rich shooting DNAs. Despite the ultimately unavoidable threat of the curse of dimensionality in KDE settings [6], the relatively small sample sizes did not detract from the accuracy of the results. Our methodology's apparently least impressive achievement is confidence regions, despite their asymptotic guarantees. In Fig. 5, the upper-bound confidence sets tend to be conservative. This was not wholly unexpected, as Chen, Genovese, and Wasserman warned [14]. The margin is especially coarse for the concave bumps. In practice, we can mitigate this effect by splitting the bump and calculating the margin over smaller domains, employing a pilot estimation for guidance. Nonetheless, further research following Mammen and Polonik's universal approach [26] should yield even better results.
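To make the margin construction concrete, here is a minimal univariate sketch of the bootstrap scheme described for Fig. 5, under illustrative assumptions not taken from the paper (Gaussian kernel, a hypothetical fixed bandwidth h = 0.4, second derivative as the curvature functional): the margin is the (1 − α)-quantile of the sup-norm deviation between bootstrap and original curvature estimates, and inflating or deflating the zero level set by it yields the upper and lower confidence bumps.

```python
import numpy as np

def kde_second_deriv(x_grid, data, h):
    """Second derivative of a Gaussian-kernel KDE on a grid:
    f''(x) = (1/(n h^3)) sum_i K''((x - X_i)/h), K''(u) = (u^2-1) phi(u)."""
    u = (x_grid[:, None] - data[None, :]) / h
    ddK = (u**2 - 1) * np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)
    return ddK.sum(axis=1) / (len(data) * h**3)

rng = np.random.default_rng(1)
data = rng.normal(size=500)
grid = np.linspace(-3, 3, 200)
h = 0.4                                   # hypothetical bandwidth
est = kde_second_deriv(grid, data, h)

# Bootstrap margin: (1-alpha)-quantile of the sup-norm deviation
# between bootstrap and original estimates (alpha = 0.10, B = 200).
B, sups = 200, []
for _ in range(B):
    boot = rng.choice(data, size=len(data), replace=True)
    sups.append(np.abs(kde_second_deriv(grid, boot, h) - est).max())
margin = np.quantile(sups, 0.90)

# Inflated/deflated zero level sets give the two confidence bumps:
lower_bump = grid[est < -margin]   # likely contained in the concave bump
upper_bump = grid[est < margin]    # likely contains the concave bump
assert margin > 0 and len(lower_bump) <= len(upper_bump)
```

The conservativeness discussed above shows up here directly: a sup-norm margin taken over the whole grid inflates the upper bound everywhere, which is what splitting the domain would mitigate.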
the maximum absolute value of a vector's components. Likewise, for r = 1, let , which yields the desired orders after considering the moment constraints on K.
In turn, when r = 0, we can resort to Hölder continuity if α > 0. Considering the ∥•∥_1 norm without loss of generality and letting C > 0 be the corresponding Hölder constant, from (S2) follows Then, the integral is finite because of the moment constraints on K after applying Jensen's inequality with ∥z∥^α_1 → (∥z∥^α_1). This leads to a bound similar to the case r = 1, proving the order O(h^α).
Finally, if α = 0, we apply uniform continuity. Branching from (S1) with a change of variables, we get By uniform continuity, the supremum term will be small as long as δ is small, regardless of x. Once a sufficiently small δ is fixed, the integral term approaches zero as h does. In conclusion, the bias is o(1) as h → 0. ■ Proof of Theorem 3. The result follows after applying Theorem 2 with Ψ = ϕ[f] and Ψ = ϕ[f̂_n,h]. We must check that ∥ϕ[f̂_n,h] − ϕ[f]∥_{∞,k}, k ∈ {0, 1}, can be made arbitrarily small. From Theorem 1, uniform convergence for all the KDE's derivatives is ensured, except for a finite union of zero-probability events. Then, since the partial derivatives of f are bounded, those of f̂_n,h are also eventually bounded a.s. Therefore, we can restrict the domain of the function φ to a compact set U. This claim implies that both ϕ[p] and ∇ϕ[p] are uniformly continuous functions of the partial derivatives of p, which proves that ∥ϕ[f̂_n,h] − ϕ[f]∥_{∞,k}, for k ∈ {0, 1}, can be made arbitrarily small. In particular, φ is Lipschitz continuous, which yields the desired convergence order for the Hausdorff distance. ■ Proof of Proposition 1.
From [27, Wielandt-Hoffman theorem], any ordered eigenvalue function is Lipschitz continuous. In particular, plug-in estimators of the eigenvalues are consistent in the uniform norm a.s. Therefore, using the triangle inequality, one arrives at inf where we have used that weak convergence to a continuous rv implies convergence in the Kolmogorov distance [23, Lemma 2.11], and , for all i. This leads to (19). On the other hand, the class , for any h > 0, is VC-type by our assumption on the kernel class K in (18), since VC-type classes are closed under summation and product [5]. Also, F′_h is clearly PM, as the continuity of DK allows for a countable subset indexed by a rational d-tuple x. Therefore, we can apply Lemma 3 to F′_h, yielding the existence of | with high probability. Now, by our boundedness assumption on the ℓ-th derivatives of K, letting C > 0 be an envelope for F_h, and assuming h ∈ (0, 1), the same C will be an envelope for , and, bypassing the empirical process, we get To go from (S6) to the desired result, we apply [4, Lemma 10, supplementary material], which in turn derives from [5, Lemma 2.3]. As in [4], the assumptions (A1)-(A3) hold under our hypotheses. (A1) is a PM requirement, which again stands valid for F_h. Then, (A2) is satisfied with the bound C, and we can use [5, Lemma 2.1] to infer the pre-Gaussian requirement (A3) from (A2) and the fact that VC-type classes have a finite uniform entropy integral [5, 23]. Therefore, for positive constants µ, µ′, where it has been used that

by the coupling hypothesis between h and n, E [B
uniformly in h due to Dudley's inequality [24, Corollary 2.2.8] applied to the VC-type class ∪_{δ>0} F′_δ, which is larger than F′_h. Differentiating the right-hand side of the last inequality with respect to γ yields the optimal order O[log^{7/8} n/(nh^{d+ℓ})^{1/8}], which also happens to be that of γ.
Finally, plugging the optimal γ in (S6), given ϵ > 0, there exists n_0 ∈ N such that, for n ≥ n_0, we have
Proof of Theorem 6. Let us start by decoupling the sample sizes in (16). Let us fix some observed values X_n = {x_1, . . ., x_n} and take X*_1, . . ., X*_m i.i.d. random variables from P*_n{X_n}, for m independent of n. Keeping n fixed, let us consider the decoupled version of (16) and, subsequently, (19). From this point on, one can apply exactly the same steps as in the proof of Theorem 5 to obtain an approximation for E*_{m,h}[D|X_n] in the form of the supremum B_{n,h}{X_n} of a GP. The covariance structure for this B_{n,h}{X_n} immediately follows by considering X*_1 and P*_n{X_n} in (17). A question remains to be solved to finish the proof: undoing the decoupling of m from n and the original sample X_n. We have proved that, given X_n, the Gaussian approximation works for sufficiently large m, but we have no guarantee the same would work if we took m = n → ∞. A close look at Lemma 3 and Theorem 5 shows all the constants involved are universal or depend only on F_h, which remains unchanged for every X_n. Therefore, we can take m = n and conclude that the result holds for a sufficiently large n. ■ Proof of Corollary 1. It suffices to check that Ω_{n,h}(X_n) converges to zero. First, note that the GP B_{X_n} converges weakly to the GP B since the sample covariances converge to the population covariances (17) a.s. due to the law of large numbers. Therefore, applying the continuous mapping theorem [24, Theorem 1.3.6], we assure the suprema of the GPs also converge weakly. Let us assume for a moment that B_h has a continuous cdf. Then, the uniform convergence in distribution over all t ∈ R would follow using [23, Lemma 2.11], finishing the proof. Now, B_h has a continuous and strictly increasing cdf following [11, Corollary 1.3 + Remark 4.1], since we automatically rule out the case that K in (18) is degenerate by the regularity assumptions on the kernel K. That is, we have K ≠ {y ↦ 0}. Therefore, B_h cannot be the Dirac distribution concentrated at zero. Meanwhile, the properties of B̂_h derive from those of B_h. ■ Proof of Lemma 4. From [6, Theorem 2.1], we have, for all p ∈ (0, 1), where (•)_+ ≡ max{•, 0}. Now, since X has a strictly increasing cdf, its quantile function is continuous at every p ∈ (0, 1). Following [23, (4)]. Taking suprema at both sides of the inequality, and noting that the terms in the sum jointly converge [23, Theorem 2.7 (vi)], we can then apply the continuous mapping theorem [23, Theorem 2.3]. This fulfils pillar (I) in Theorem 4. Now, Z_{n,h} is a sum of random variables whose cdfs can be approximated via bootstrap. The obstacle here lies in the dependencies between the terms in Z_{n,h}; Corollary 1 only applies to the marginal cdfs, not the joint cdf. Notwithstanding, we only need some ζ^α_{n,h} satisfying ζ^α_{n,h} ≥ Q_{1−α}{Z_{n,h}}. This ζ^α_{n,h} can be obtained by resorting to the TVaR concept, which bounds the corresponding quantile from above at the same confidence level while being sub-additive [6, Equation 37]. Interestingly, the TVaR is the lowest concave distortion risk measure satisfying both requirements [6, Theorem 5.2.2]. Therefore, note that ζ^α_{n,h} converges to zero because, by Lemma 4 and assumption (2), for every pair (i, j), the corresponding limits vanish, which finishes the proof. In turn, this follows using again [6, Lemma 2.1] after checking that, for every pair (i, j), the required bound holds a.s. for sufficiently large n. From this point on, the proof follows the same steps as that of Theorem 9.
The rationale behind Section 2 lies in classical geometry. We refer the reader to do Carmo's books for a comprehensive introduction to differential and Riemannian geometry [7, 8] and to Grinfeld's for an operational perspective on tensor calculus [13]. Then, Petersen's book has a complete chapter devoted to the central topic of hypersurfaces [22, Chapter 4]. The appendix of Ecker's book also includes some key concepts [9]. Finally, we recommend Gray, Abbena, and Salamon's book to build up some intuition in R^3 [12, Chapter 13].
B.1. Outline. After recalling some basic notions, we will address the concept of principal curvature. Concavity and convexity are defined through specific sign configurations among the principal curvatures. On the other hand, the mean curvature and the Gaussian curvature are alternative scalar summaries of the principal curvatures. All these measures involve first and second-order derivatives of the pdf. Theoretical considerations below will discard the need for higher-order derivatives beyond those. Meanwhile, combining derivatives of different orders is a major inconvenience in a KDE setting. Hence, we present similarly-intended features that rely only on the Hessian matrix. B.2. Fundamental concepts. Let f : R^d → [0, ∞) be a d-dimensional pdf. Let us assume that f is as smooth as needed. The graph of f, that is, G = {(x, f(x)) ∈ R^{d+1} : x ∈ R^d}, is a hypersurface embedded in R^{d+1} and, in particular, a d-dimensional smooth manifold. Let X : G → R^d be the global coordinate chart of G mapping every (x, f(x)) ∈ G to x. Let also F : R^d → G be the global parameterization of G, defined as F(x) = X^{−1}(x) = (x, f(x)).
At every p ∈ G, we can define a d-dimensional tangent vector space T_p G comprising different ways of deriving smooth functions from G to R in a neighbourhood of p [8]. In particular, the i-th element of the canonical basis determined by the chart is the i-th partial derivative ∂_i|_p acting on functions φ. The union of all tangent spaces across p ∈ G is also a smooth manifold, known as the tangent bundle TG [8]. Elements in TG are called vector fields. A vector field can be thought of as a function mapping each point p ∈ G to a tangent vector in T_p G. Every vector field X can be expressed, using Einstein's summation convention [13], as X = X^i ∂_i. Let us now build on the previous concepts, considering that G is a hypersurface. B.2.1. Tangent vector fields. G being immersed in R^{d+1}, there exists a canonical identification of basis vector fields ∂_i with real functions R^d → R^{d+1} [9]. To see this, consider the canonical identity immersion ι : G → R^{d+1}. The differential dι transforms vector fields in TG into vector fields in R^{d+1}. Given a smooth function φ : R^{d+1} → R, we have [dι , where F_ι is the immersed parameterization of G. Now, using the chain rule, for any x ∈ R^d, D_i(φ • F_ι)(x) = ⟨∇φ(x, f(x)), D_i F_ι(x)⟩, where D_i F_ι is the i-th column vector field of the Jacobian matrix of F_ι. Combining the last two equations, we see that dι(∂_i) derives the function φ in the direction of the vector field D_i F_ι. Therefore, there is a canonical isomorphism, where e_i denotes the i-th canonical basis vector in the Euclidean space R^{d+1}. The identification (S8) unlocks both intrinsic and extrinsic properties of G.
On the one hand, we have an inner product in R^{d+1} that can be brought into TG via (S8). On the other hand, we have d tangent vectors in TG and d + 1 linearly independent vector fields in R^{d+1}, leaving space for an orthogonal complement normal to the hypersurface. B.2.2. Normal vector field. At any given x ∈ R^d, the vectors D_i F_ι(x) form a basis for the d-dimensional tangent hyperplane π_x ⊂ R^{d+1} at F(x). Its orthogonal complement π⊥_x is a straight line passing through F(x). One can check that Ň(x) = (∇f(x), −1)/√(1 + ∥∇f(x)∥²) corresponds to the downward unit normal vector to π_x, perpendicular to the vectors (S8), thus generating π⊥_x [9, Equation A.2]. Studying how the normal vector field (S9) changes along all directions over G provides a means to characterize curvature. B.2.3. First fundamental form. Utilizing (S8), the Euclidean inner product in R^{d+1} induces a metric field g on G with components g_ij = δ_ij + (∂_i f)(∂_j f) [9], where δ_ij denotes the Kronecker delta twice covariant tensor. The metric tensor g is known as the first fundamental form. It allows measuring angles and lengths on TG intrinsically. In particular, it defines at each point a norm ∥X∥ = √g(X, X), for X ∈ TG.
The metric tensor has an inverse with components g^{ij} satisfying δ^i_k = g^{ij} g_{jk}, where δ^i_k is the Kronecker delta (1, 1)-tensor. From the previous equation and (S10), using the Sherman-Morrison formula [15, Corollary 18.2.11], one can check that g^{ij} = δ^{ij} − (∂_i f)(∂_j f)/(1 + ∥∇f∥²). A simple expression for the metric determinant follows from [15, Corollary 18.1.3] when applied to (S10), yielding det(g) = 1 + ∥∇f • X∥². In our context, compatibility with the metric tensor leads to a natural definition of derivatives for tangent vector fields over G. B.2.4. Field divergence. The metric tensor allows defining covariant derivatives over vector fields through the Levi-Civita connection ∇^G [8], with components given by the Christoffel symbols Γ^k_ij [13, Equation 5.66]. Letting X, Y ∈ TG, the covariant derivative ∇^G_Y X is a vector field in TG. See [13, Equation 8.9] for an explicit expression of ∇^G_i X ≡ ∇^G_{∂_i} X involving the Christoffel symbols, the components of X and the basis vector fields.
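The two identities just quoted, det(g) = 1 + ∥∇f∥² and the Sherman-Morrison inverse, are easy to verify numerically. A minimal sketch, using a hypothetical gradient value in place of ∇f(x):

```python
import numpy as np

# First fundamental form of the graph of f at a point with gradient grad:
# g_ij = delta_ij + (d_i f)(d_j f), i.e., g = I + grad grad^T.
grad = np.array([0.3, -1.2])           # hypothetical gradient value
d = grad.size
g = np.eye(d) + np.outer(grad, grad)

# det(g) = 1 + ||grad||^2 (matrix determinant lemma)
assert np.isclose(np.linalg.det(g), 1 + grad @ grad)

# Sherman-Morrison: g^{-1} = I - grad grad^T / (1 + ||grad||^2)
g_inv = np.eye(d) - np.outer(grad, grad) / (1 + grad @ grad)
assert np.allclose(g_inv @ g, np.eye(d))
```

Both checks pass for any gradient vector, since g is a rank-one update of the identity.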
The covariant derivative can then be used to define the divergence operator [13, Equation 8.2] on X ∈ TG as div_G(X) = (∇^G_i X)^i, which is a function G → R. As it turns out, the divergence can be extended to extrinsic vector fields V⃗ : R^d → R^{d+1} living in the ambient space through [9] div, meaning the extended operator coincides with div_G(•) over tangent vector fields, i.e., for all j, div_G(D_j F_ι) = div_G(∂_j). To see this, note that D, where N is a normal vector field [9].
The divergence operator (S11) appears in an alternative expression for the mean curvature defined below. B.2.5. Second fundamental form. Similarly to how the first fundamental form (S10) measures change intrinsically, we can define another twice covariant tensor measuring changes in the normal vector field along different tangent vector fields. This new tensor A is the second fundamental form [9]. It has components A_ij = (D_ij f)/√(1 + ∥∇f∥²), where D_ij denotes second-order partial differentiation in the i and j variables [9].
The second fundamental form is extrinsically defined since it contains the normal vector, which lives outside the tangent space.
B.2.6. Shape operator. Having introduced the first and second fundamental forms, the shape operator S : TG → TG connects the two via g(S(∂_i), ∂_j) = A_ij [22]. One can easily verify that the shape operator has components S^i_k = g^{ij} A_jk [9]. Since A is symmetric, i.e., A_ij = A_ji, one can check that S is a self-adjoint operator, meaning g(S(X), Y) = g(X, S(Y)), for X, Y ∈ TG. This implies that, at any point p ∈ G, all the eigenvalues of S_p are real and there exists an associated orthonormal basis consisting of eigenvectors of S_p [22, Section 2.4.6]. These eigenvalues correspond to a curvature measure along the direction of the corresponding eigenvectors.
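Since S has components g^{ij} A_jk, its eigenvalues can be computed numerically by solving the matrix problem g^{-1}A. A minimal sketch under explicit assumptions (second fundamental form taken as A = D²f/√(1 + ∥∇f∥²) with the upward normal; the paper's downward normal Ň flips all signs):

```python
import numpy as np

def shape_operator_eigenvalues(grad, hess):
    """Eigenvalues of the shape operator S = g^{-1} A for the graph of f.
    g = I + grad grad^T is the first fundamental form and
    A = hess / sqrt(1 + |grad|^2) the second fundamental form
    (upward-normal sign convention)."""
    w = np.sqrt(1.0 + grad @ grad)
    g = np.eye(grad.size) + np.outer(grad, grad)
    A = hess / w
    # S is self-adjoint w.r.t. g, so the eigenvalues are real.
    kappas = np.linalg.eigvals(np.linalg.solve(g, A))
    return np.sort(kappas.real)

# Paraboloid f(x) = (x1^2 + x2^2)/2 at the origin: both eigenvalues are 1.
k = shape_operator_eigenvalues(np.zeros(2), np.eye(2))
assert np.allclose(k, [1.0, 1.0])
```

These eigenvalues are exactly the principal curvatures discussed in the next subsection.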
The shape operator is easily confused with the second fundamental form. Petersen refers to this issue as a matter of perspective [22, Section 2.3.1]. Ecker even employs the same letter A instead of S to refer to the shape operator, relying on the position of the indices to distinguish between them (S is a (1, 1)-tensor, while A is (2, 0)). Meanwhile, Gray, Abbena, and Salamon equivalently define A from S and g [12, Definition 13.28].
The curvature tensor, pervasive in Riemannian geometry, takes a simple form for hypersurfaces, exclusively involving g and S [22, Section 4.2]. This virtually ensures that curvature can be apprehended by inspecting only the first and second derivatives of the pdf f. B.2.7. Normal and principal curvatures. The definition of the normal curvature [7, 12] of G in the non-null direction X = X^i ∂_i ∈ TG also applies to hypersurfaces: κ(X) = A(X, X)/g(X, X), where the denominator ensures κ(λX) = κ(X), for all λ ≠ 0. Using the Cauchy-Schwarz inequality, for all non-null X ∈ TG, we have at any point |κ(X)| ≤ ∥S(X̄)∥, where X̄ = X/∥X∥. Moreover, the equality holds if and only if (iff) X is an eigenvector of S. In that case, κ(X) is equal to the eigenvalue associated with X. Given an orthonormal basis of eigenvectors X_i of S, we say that X_i is the i-th principal direction of G and its corresponding eigenvalue κ_i is known as the i-th principal curvature [8]. The minimax principle establishes a variational connection between the principal curvatures and the quotient (S13) [18, Section 6.10]. In particular, for any non-null X ∈ TG, min_i κ_i ≤ κ(X) ≤ max_i κ_i.
The principal curvatures can be summarized into a single value according to several criteria, giving rise to the mean and Gaussian curvatures. B.2.8. Mean curvature. At any point in G, the mean curvature D is defined as the sum of all principal curvatures, that is, the trace of the diagonalized matrix of S [9]. Since the trace is a matrix invariant, we have D = Σ_{i=1}^d κ_i = tr(S) = S^i_i. Equivalently, using (S11) we get (S14), while extra calculations yield (S15): where ∇̄f = ∇f/√(1 + ∥∇f∥²) [9, Equation A.3]. Both expressions convey the same idea: D is the divergence of a normalized vector field. The mean curvature will take negative values when Ň and ∇f locally converge towards a point.
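The divergence form of the mean curvature admits a closed expression in terms of the gradient and the Hessian, which the following sketch evaluates at the mode of a standard bivariate Gaussian pdf. The expansion of div(∇f/√(1 + |∇f|²)) used below is standard calculus, not a formula quoted from the paper:

```python
import numpy as np

def mean_curvature(grad, hess):
    """Mean curvature of the graph of f from its gradient and Hessian:
    D = [(1+|grad|^2) tr(hess) - grad^T hess grad] / (1+|grad|^2)^{3/2},
    the expanded form of div(grad f / sqrt(1 + |grad f|^2))."""
    s = 1.0 + grad @ grad
    return (s * np.trace(hess) - grad @ hess @ grad) / s**1.5

# Standard bivariate Gaussian pdf f(x) = exp(-|x|^2/2)/(2*pi)
def gauss(x):
    return np.exp(-x @ x / 2) / (2 * np.pi)

def gauss_grad(x):
    return -gauss(x) * x

def gauss_hess(x):
    return gauss(x) * (np.outer(x, x) - np.eye(2))

x = np.array([0.0, 0.0])
# At the mode, grad f = 0 and D = tr(hess) = -1/pi < 0: a bump point.
D = mean_curvature(gauss_grad(x), gauss_hess(x))
assert D < 0
```

The negative value at the mode matches the remark above: the gradient field converges towards the mode, so its normalized divergence is negative there.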
B.2.9. Gaussian curvature. The shape operator contains another way of summarizing the principal curvatures, this time through its determinant: the Gaussian curvature B = det(S). One can infer that S coincides in signature with A and, consequently, with D²f. On the other hand, it is well known that the eigenvalues of the Hessian D²f relate to the concavity and convexity of f. Namely, if all the eigenvalues at a point are negative (alternatively, positive), i.e., D²f is negative (positive) definite, then f is locally strictly concave (convex) around that point. Hence, the shape operator also determines local concavity and convexity because of the previous argument [22, p. 91].
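The concavity criterion just stated can be checked pointwise: a point belongs to a concave bump exactly when the largest Hessian eigenvalue is negative. A minimal sketch for the standard bivariate Gaussian, whose concave bump (as an illustrative fact about this pdf, not a result from the paper) is the open unit disk:

```python
import numpy as np

def gauss_hess(x):
    """Hessian of the standard bivariate Gaussian pdf at x."""
    f = np.exp(-x @ x / 2) / (2 * np.pi)
    return f * (np.outer(x, x) - np.eye(2))

def in_concave_bump(x):
    """x lies in the concave bump iff the largest Hessian eigenvalue
    lambda_1[f](x) is negative, i.e., D^2 f(x) is negative definite."""
    return np.linalg.eigvalsh(gauss_hess(x)).max() < 0

# For the standard Gaussian, D^2 f(x) is negative definite iff |x| < 1:
assert in_concave_bump(np.array([0.5, 0.5]))       # inside the unit disk
assert not in_concave_bump(np.array([1.2, 0.0]))   # outside the unit disk
```

In the plug-in methodology, the same sign check would be applied to the Hessian of the KDE rather than of the true pdf.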
The shape operator and the Hessian are equal iff G is the identity matrix or, equivalently, iff ∥∇f∥ = 0. The connection between the Gaussian curvature and the Hessian determinant is evident from (S16). Furthermore, the trace of the Hessian is equal to the divergence of the gradient, meaning tr(D²f) = div_{R^d}(∇f). Therefore, attending to (S15), the mean curvature differs from the Hessian trace by a normalizing term acting on the gradient argument (see [10, Equation 5.28] for an explicit link between the two). In conclusion, the essence of curvature lies in the Hessian D²f. Appendix C. Extra applications. C.1. Univariate receiving yards in the NFL. American football is inherently one-dimensional: only distances projected on the sidelines matter, despite the game taking place on a rectangular field. Hence, we propose a case study about the NFL to illustrate our methods on univariate data.
The New England Patriots from the NFL experienced noticeable changes in their passing DNA while winning the Super Bowl in 2014, 2016 and 2018. We will analyze the bumps in their receiving profiles in those three championship seasons. Fig. 6 shows the original data and the estimated curvature bumps. All three subfigures have a primary bump between 0 and 40 yards. In 2014, this bump is just slightly narrower than in 2016 and 2018. What differentiates the three seasons is the existence or absence of a secondary bump. In 2014, the second bump ranges between 80 and 100 yards and is perfectly visible. Two years later, this bump still exists but has become smoother. Finally, in 2018, the bump has been almost wholly ironed out. This evolution reflects the transition from a team with a few go-to players to a larger receiver squad sharing responsibilities. C.2. Trivariate pitches in MLB. The pitching event lives in at least two dimensions. The batter needs to anticipate the arrival coordinates of the ball. KDE methods have quickly become standard for these bivariate samples. However, the pitched ball has other attributes that will make the batter's job difficult. A first approximation suggests considering pitches as a pair of location coordinates with a speed value measured from the release point. We propose studying trivariate pitching samples from three top MLB pitchers in the 2019 season: Clayton Kershaw, Justin Verlander and Max Scherzer. Fig. 7 shows pitching scatter data and bumps. In general, the three pitching patterns resemble Gaussian mixtures. It seems that each player chooses his pitches from a finite arsenal. Then, each pitch type has its own location and speed variability. Most pitchers employ three mechanisms: the fastball, which relies on sheer speed; the slider, which moves sideways at relatively high speeds; and the curveball, which sinks at low speeds.
In Fig. 7a, Kershaw's fastballs and sliders have speeds ranging between 85 and 90 miles per hour (mph). Fastballs usually range beyond 95 mph, as for Verlander (Fig. 7b) and Scherzer (Fig. 7c), who show a defined separation between both pitch types. As for curveballs, Kershaw's have low speed and high vertical variability. Verlander's and Scherzer's curveball clouds have similar compact shapes, but the former usually aims for the right side of the strike zone and the latter for the left. Scherzer's usage of curveballs is the lowest of all three.

Fig. 1. Four ways of constructing bumps for basketball converted shot data. The exact 804 made-shot locations are scattered across each sub-figure. The top left and right bumps correspond to HDRs comprising 35% and 95% of all observations. The bottom left bumps highlight regions where the pdf subgraph is locally concave. The bottom right bumps comprise points where the Laplacian of the underlying pdf takes negative values.

Fig. 2. Curvature bumps for a bivariate Gaussian mixture encompassing two equally-weighted components with means µ1 = [−3/2, 0], µ2 = [3/2, 0] and covariance matrices Σ1 = [1, −0.7; −0.7, 1], Σ2 = [1, 0.7; 0.7, 1]. The top two sub-figures show the same graph of the pdf f. The area colours refer to the values taken by a specific curvature functional ϕ[f] at each point. For the left-hand picture, this functional is the λ1[f] that defines concave bumps (5); on the right, it is the mean curvature div(∇̄f) in (7). The magenta halos represent the zero level sets of those functionals and, thus, the corresponding bump boundaries. Concave and mean curvature bump boundaries show in blue and cyan in the bottom sub-figure, along with a 1,000-observation random sample from the mixture, where each point is coloured according to the value of f.

Fig. 3. Concave and Laplacian bumps for Stephen Curry, James Harden and Kevin Durant. The three sub-figures have the same structure. On the left are concave bumps (5); on the right are Laplacian bumps (8). On either side, the two-dimensional surface is the fitted KDE pdf. The area colour refers to the curvature functional value at each point. The bump boundaries appear as lines on a flat basketball court at the top.

Fig. 4. Shot scatter data with concave and Laplacian bumps for Stephen Curry, James Harden and Kevin Durant. The three sub-figures have the same structure. Each point corresponds to a made shot location. The number of observations is 804, 710 and 698 for Stephen Curry, James Harden and Kevin Durant, respectively. The lines represent bump boundaries: magenta for concave bumps (5); blue, for Laplacian bumps (8). The colour of the dots in the scatter plot conveys the value of the KDE pdf at each point.

Fig. 5. Confidence sets for Stephen Curry, James Harden and Kevin Durant's bumps. The three sub-figures have the same structure. On the left, 90%-confidence sets for concave bumps (5); on the right, 90%-confidence sets for Laplacian bumps (8). The confidence margins are based on 200 bootstrap samples, each with the same resample size as the original one. On either side, the area colours convey the same meaning. The non-blue sandy areas fall outside the confidence set bounds; the blue-coloured areas lie inside the confidence region. The darkest blue corresponds to the lower bound confidence set: a set that is likely contained in the bump. The remaining blue areas cover the upper bound confidence set: a set that likely contains the bump. Finally, the mid-light blue colour points out the estimated bump.
whence we can take b = C and σ = C√(h^{d+ℓ}) in Lemma 3. Dividing both sides of the inequality inside P in Lemma 3 by √(h^{d+ℓ}), we arrive at

The Gaussian curvature B owns outstanding geometrical and topological properties. Gauss' Theorema Egregium states that B and |B| are intrinsic invariants of a hypersurface, respectively, if d is even or odd [22, Lemma 3.1, p. 96]. Interestingly, when d = 2, B is precisely half the scalar curvature, the scalar contraction of the curvature tensor (see [22, Section 2.2.5] and [22, Proposition 2.1, p. 92]). B.2.10. Hessian matrix. The Hessian matrix is present in the previous curvature concepts through the second fundamental form (S12). The Hessian D²f corresponds to the non-normalized version of A, replacing Ň by N = (∇f, −1) in (S12). The eigenvalues of D²f are scalar multiples of those of A. Moreover, expressing the shape operator in matrix form as S = G⁻¹A, and using Sylvester's law of inertia [15, Exercise 12 (c), p. 275],

Fig. 6. New England Patriots' receiving yards in the 2014, 2016 and 2018 seasons. The three sub-figures have the same structure. On the left is the underlying KDE pdf graph; on the right is a one-dimensional scatter plot of the original observations with random jitter for better resolution. Each point in the scatter plot corresponds to the receiving yards from Tom Brady by some player in some game. The number of observations is 98, 82 and 111 for the 2014, 2016 and 2018 seasons, respectively. The coloured areas under the pdf's curve represent the sign of the second derivative: red means convex; blue, concave. The colour of the dots in the scatter plot conveys the value of the KDE pdf at each point. Inflection points show up as vertical dotted lines on both sides.

Fig. 7. Pitching scatter data for Clayton Kershaw, Justin Verlander and Max Scherzer, with 2680, 3839 and 3284 pitches, respectively. Concave bumps (5) and Laplacian bumps (8) show in blue and grey, respectively. Speeds are reported in mph. The arrival coordinates at the home plate for each speed value should be interpreted relative to an averaged strike zone, which shows at the front. The colour of each point corresponds to the value of the underlying KDE pdf.

Note that the latter is bounded by the bias sup_{x∈R^d} |E[∂^β f̂_n,h(x)] − ∂^β f(x)| plus the stochastic error sup_{x∈R^d} |∂^β f̂_n,h(x) − E[∂^β f̂_n,h(x)]|. Here, P_n{X_n} assigns equal masses 1/n to each component x_i ∈ R^d of a particular n-size i.i.d. realization X_n = {x_1, . . ., x_n} from f, and f̂_n,h(•|X_n) is the realization of the KDE based on X_n; f̂*_n,h(•|X_n) denotes the KDE based on n i.i.d. random variables X*_1, . . ., X*_n ∼ P*_n{X_n} of the empirical bootstrap probability measure P*_n{X_n}. a.s. for n sufficiently large, which proves (12) for f̂_n,h. Proof of Theorem 4. It is an exercise to show that sup_{x∈Θ} |ϕ[f̂_n,h](x) − ϕ[f_h](x)| ≤ Let λ_j : R^{d²} → R be the j-th eigenvalue function defined over d-dimensional square matrices. From [18, Theorem 5.16, p. 119], for every x ∈ ∂B_ϕ ⊕ δ, λ_j is infinitely differentiable in some neighbourhoods of D²f(x) and D²f̂_n,h(x), since all eigenvalues have multiplicity one in both cases. ■ A.2. Inference.