1 Introduction

In the literature of manifold learning, data in a high dimensional Euclidean space are usually assumed to lie on a low dimensional manifold \({\mathcal {M}}\). Inferring the geometry and topology of \({\mathcal {M}}\) is therefore critical for understanding and using the data, e.g., in dimension reduction, clustering and visualization. Topological quantities often involve persistent homology (Edelsbrunner & Harer, 2008; Zomorodian & Carlsson, 2005), homotopy groups (Letscher, 2012) and fundamental groups (Batan et al., 2019), while geometric quantities include intrinsic dimension (Little et al., 2017), tangent space (Singer & Wu, 2012), geodesics (Li & Dunson, 2019), Laplacian operators (Belkin & Niyogi, 2004; Belkin & Niyogi, 2007) and curvature (Aamari & Levrard, 2019; Buet et al., 2018). The estimation of volume and support is also an active topic. This paper focuses on the estimation of the Weingarten map, or the second fundamental form, for point clouds sampled from submanifolds embedded in Euclidean spaces. The second fundamental form is a useful tool for studying a Riemannian manifold extrinsically. On the one hand, since there is no prior information about the manifold, purely intrinsic ways of studying the manifold from sampled data are ruled out. On the other hand, the second fundamental form is closely related to other geometric quantities of the manifold. For example, Niyogi et al. (2008) showed that its operator norm controls the reach of the manifold, and Absil et al. (2013) proposed a simple way to compute the Riemannian Hessian based on the Weingarten map. Furthermore, it is well known that the second fundamental form measures how a manifold curves in the ambient space: once the second fundamental form is known, all kinds of curvature can be computed from it.

Our motivation comes from the need to estimate curvature for unstructured point cloud data. Efficient estimation of (Gaussian/mean/principal) curvature for point cloud data is an important but difficult problem. In geometry, curvature carries much information about the underlying space of an unordered point set, and therefore provides prior information in many applications such as surface segmentation (Woo et al., 2002; Rabbani et al., 2006), surface reconstruction (Wang et al., 2006; Berger et al., 2016; Tang & Feng, 2018), shape inference (Tang & Medioni, 2002; Dey et al., 2003), point cloud simplification (Kim et al., 2002; Pauly et al., 2002) and feature extraction (Huang & Menq, 2001). However, existing methods for estimating curvature are limited. Direct computation on point clouds often requires a local parametrization: one first fits a local parametric surface and then obtains the curvature by substituting the fitted coefficients into an analytical formula. Another way is to estimate curvature after surface reconstruction, which turns point clouds into triangular meshes or level sets of some distance function (Sander & Zucker, 1990; Hoppe et al., 1992; Levin, 1998; Meek & Walton, 2000). However, there is little theory on the estimation error, such as convergence rates. This is not surprising, since these algorithms aim to minimize the error of the surface approximation rather than the estimation error of the curvature. In addition, even when two surfaces are close under the Euclidean distance, their curvatures need not be close: for example, a straight line can be perturbed slightly so that its curvature is far from zero. As a result, a more direct and efficient approach with theoretical guarantees on the estimation error is needed.

Recently several methods have been proposed to estimate the second fundamental form. In Buet et al. (2018), the authors proposed notions of generalized second fundamental form and mean curvature based on the theory of varifolds. The generalized mean curvature descends to the classical mean curvature if the varifold is the standard measure on a submanifold. Under specific conditions, they proved that the optimal convergence rate is \(O(n^{-1/2})\), where n is the sample size, regardless of the dimension of the varifold. In Aamari and Levrard (2019), the authors used polynomial fitting to estimate the tangent space, the second fundamental form and the support of the manifold simultaneously. Under some assumptions on the regularity of the manifold and the sampling, they proved that the convergence rate for the second fundamental form is \(O(n^{-(k-2)/m})\), where k is the regularity parameter and m is the dimension of the manifold. In computer vision, several approaches are based on triangular meshes rather than point clouds. For example, S. Rusinkiewicz approximated the second fundamental form per triangle for meshes (Rusinkiewicz, 2004), and J. Berkmann and T. Caelli proposed two covariance matrices to approximate the second fundamental form (Berkmann & Caelli, 1994). There are also curvature estimation methods that do not estimate the second fundamental form. In Mérigot et al. (2011), the authors introduced the Voronoi covariance measure (VCM) to estimate curvature for noisy point clouds; although a stability theorem is proved, no convergence rate is given. In the classic paper (Taubin, 1995), G. Taubin defined a matrix by an integration formula which, as illustrated in Lange and Polthier (2005), is nothing but the Weingarten map. This formula was adopted by Lange and Polthier (2005) to estimate the principal curvatures on point sets; the authors then proposed a method for anisotropic fairing of a point-sampled surface using mean curvature flow.

1.1 Our contribution

In this paper, we propose a two-step Weingarten map estimation (WME) procedure for point clouds sampled from submanifolds embedded in Euclidean spaces. First, we use local PCA to estimate the tangent and normal spaces. Second, we apply a least-squares method to compute the matrix representing the Weingarten map under the estimated tangent basis. The algorithm works for point clouds in any dimension and is efficient to implement due to its low complexity.

A statistical model is set up to analyze the convergence rate of the Weingarten map estimator. Under the assumption of exact tangent spaces and a given normal vector field, we prove that if the bandwidth is chosen to be \(O(n^{-1/(m+4)})\), then the optimal convergence rate is \(O(n^{-4/(m+4)})\). Besides the kernel method, we also discuss the k-nearest-neighbor method, which is more convenient in practice. Compared with the method proposed in Buet et al. (2018), our method converges faster in low dimensions. In comparison with the estimator proposed in Aamari and Levrard (2019), our method has a closed form and is easier to compute.

The convergence rate is verified by numerical experiments on synthetic data sets. We also compare WME with the traditional quadratic fitting method, the state-of-the-art algorithm in this literature; our method yields better results in both MSE and robustness. As an application, we propose a curvature-based clustering method for point cloud simplification, and we reconstruct surfaces from the simplified point clouds to give a visual comparison. Three real data sets are tested to show the gain of our WME algorithm.

1.2 Outline

This paper is organized as follows. We introduce the WME method in Sect. 2, followed by a statistical model to analyze the convergence rate in Sect. 3. In Sect. 4, we verify the convergence rate and compare our method with the quadratic surface fitting method on synthetic data. Applications to brain cortical data and experiments on point cloud simplification are given in Sect. 5. In Sect. 6, we discuss possible applications of the WME algorithm in future work.

2 Algorithm description

Let \({\mathcal {M}}\subseteq {\mathbb{E}}^d\) be an m-dimensional submanifold in a d-dimensional Euclidean space with induced Riemannian metric. At each point p we have the following decomposition

$$\begin{aligned} T_p{\mathcal {M}}\oplus T_p^\perp {\mathcal {M}}={\mathbb{E}}^d \end{aligned}$$
(1)

Let \(\overline{\nabla }\) be the standard connection in \({\mathbb{E}}^d\). For any normal vector \(\xi \in T_p^\perp {\mathcal {M}}\), extend \(\xi \) to be a normal vector field \(\tilde{\xi }\) on \({\mathcal {M}}\). The Weingarten map or the shape operator at p with respect to \(\xi \) is defined as

$$\begin{aligned} \begin{aligned}&A_\xi :T_p{\mathcal {M}}\rightarrow T_p{\mathcal {M}}\\&A_\xi (X)=-(\overline{\nabla }_X\tilde{\xi })^\top \end{aligned} \end{aligned}$$
(2)

where \(^\top :{\mathbb{E}}^d\rightarrow T_p{\mathcal {M}}\) denotes the orthogonal projection onto the tangent space. It can be verified that the definition of \(A_\xi \) is independent of the extension of \(\xi \) (see Appendix 1).

Example 1

Let \({\mathcal {H}}^{d-1}\subseteq {\mathbb{E}}^d\) be a hypersurface, and \(\tilde{\xi }\) be a unit normal vector field on \({\mathcal {H}}^{d-1}\). The Gauss map \(g:{\mathcal {H}}^{d-1}\rightarrow {\mathbb{S}}^{d-1}\) sending any point on the hypersurface to a point on the unit sphere is defined by \(g(p)=\tilde{\xi }_p\). For any \(X\in T_p{\mathcal {H}}^{d-1}\), we have

$$\begin{aligned} A_{\tilde{\xi }_p}(X)=-\mathrm{d}g(X) \end{aligned}$$
(3)

that is, \(-A_{\tilde{\xi }_p}\) is the tangent map of the Gauss map.
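As a concrete special case of Example 1, consider the round sphere \({\mathbb{S}}^{d-1}_r=\{p\in {\mathbb{E}}^d:\Vert p\Vert =r\}\) with outward unit normal field \(\tilde{\xi }_p=p/r\). Then \(\overline{\nabla }_X\tilde{\xi }=X/r\) for every tangent vector X, so

$$\begin{aligned} A_{\tilde{\xi }_p}(X)=-\frac{1}{r}X \end{aligned}$$

that is, the Weingarten map is \(-\frac{1}{r}\) times the identity and every principal curvature with respect to the outward normal equals \(-1/r\).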

The Weingarten map measures the variation of the normal vector field. In fact, the following ‘Taylor expansion’ of a normal vector field shows that the Weingarten map plays a role similar to the derivative of a function.

Proposition 1

Let \(\tilde{\xi }\) be a normal vector field on the submanifold \({\mathcal {M}}\subseteq {\mathbb{R}}^d\). Suppose that p is a point on \({\mathcal {M}}\) and q is any point within the geodesic neighborhood of p. Denote by \(^\top \) the orthogonal projection onto the tangent space at p. We have

$$\begin{aligned} (\tilde{\xi }_q-\tilde{\xi }_p)^\top =-A_{\tilde{\xi }_p}\big ((q-p)^\top \big )+O(\Vert q-p\Vert ^2) \end{aligned}$$
(4)

Proof

Let \({\mathbf{r}}:U^m\subseteq {\mathbb{R}}^m\rightarrow {\mathcal {M}}\) be the exponential map such that \({\mathbf{r}}(0)=p\). Denote \({\mathbf{u}}=(u^1,\ldots,u^m)\in U^m\). Then the vector fields \(\{\frac{\partial {\mathbf{r}}}{\partial u^i}\}_{i=1}^m\) form a local tangent frame. Denote \(\tilde{\xi }({\mathbf{r}}({\mathbf{u}}))\) by \(\tilde{\xi }({\mathbf{u}})\). Consider the following expansions

$$\begin{aligned} \begin{aligned}&\tilde{\xi }({\mathbf{u}})=\tilde{\xi }(0)+\sum _{i=1}^{m}u^i\frac{\partial \tilde{\xi }}{\partial u^i}(0)+O(\Vert {\mathbf{u}}\Vert ^2)\\&{\mathbf{r}}({\mathbf{u}})={\mathbf{r}}(0)+\sum _{i=1}^{m}u^i\frac{\partial {\mathbf{r}}}{\partial u^i}(0)+O(\Vert {\mathbf{u}}\Vert ^2) \end{aligned} \end{aligned}$$
(5)

By definition of \(A_{\tilde{\xi }_p}\), we have

$$\begin{aligned} A_{\tilde{\xi }_p}\left( \frac{\partial {\mathbf{r}}}{\partial u^i}(0)\right) =-\left( \overline{\nabla }_{\frac{\partial {\mathbf{r}}}{\partial u^i}(0)}\tilde{\xi }\right) ^\top =-\left( \frac{\partial \tilde{\xi }}{\partial u^i}(0)\right) ^\top \end{aligned}$$
(6)

Substituting (6) into (5), we obtain that

$$\begin{aligned} \begin{aligned} \big (\tilde{\xi }({\mathbf{u}})-\tilde{\xi }(0)\big )^\top =&\left( \sum _{i=1}^{m}u^i\frac{\partial \tilde{\xi }}{\partial u^i}(0)+O(\Vert {\mathbf{u}}\Vert ^2)\right) ^\top =-A_{\tilde{\xi }_p}\big (({\mathbf{r}}({\mathbf{u}})-{\mathbf{r}}(0))^\top \big )+O(\Vert {\mathbf{u}}\Vert ^2)\\ \end{aligned} \end{aligned}$$
(7)

Note that \(\Vert {\mathbf{u}}\Vert \) is the geodesic distance on the manifold by the property of the exponential map. According to Proposition 2 in Appendix 2, the geodesic distance is approximated by the Euclidean distance to the same order. Therefore, (4) follows. \(\square \)

Assume that n points \(x_1,\ldots, x_n\), viewed as points in \({\mathbb{R}}^d\), are independently sampled from some distribution on \({\mathcal {M}}\). The objective is to estimate the Weingarten map at each \(x_i\) (\(i=1,2,\ldots,n\)). Since the estimation of Weingarten maps does not make sense without specifying the normal directions, we first need to estimate the tangent and normal spaces. Local PCA is an extensively used method in manifold learning for this purpose, and its effectiveness and consistency are well understood (see Singer & Wu 2012). Second, after the tangent space estimation, Proposition 1 suggests a simple linear model to estimate the Weingarten maps from the data points, which can be solved by a least-squares method. Therefore, our WME algorithm is a two-step procedure in which each step can be carried out efficiently.

2.1 Local PCA

For every data point \(x_i\) (\(i=1,2,\ldots,n\)) we estimate a basis \(e_i^1,\ldots,e_i^m\) for the tangent space and a basis \(\xi _i^1,\ldots,\xi _i^{d-m}\) for the normal space. Fix a parameter \(h_{\text {PCA}}>0\) and define the index set \(I_i=\{j\in {\mathbb{N}}\,|\,\Vert x_j-x_i\Vert \le h_{\text {PCA}}\}\). The local covariance matrix is defined as

$$\begin{aligned} \text {Cov}= \sum _{j\in I_i} (x_j-\bar{x}_i)(x_j-\bar{x}_i)^\mathrm{t} \end{aligned}$$
(8)

where each data point is regarded as a column vector in \({\mathbb{R}}^d\) and \(\bar{x}_i=\frac{1}{|I_i|}\sum _{j\in I_i}x_j\) is the mean of the neighboring points. The m eigenvectors of \(\text {Cov}\) corresponding to the m largest eigenvalues constitute the basis for the tangent space at \(x_i\), while the \(d-m\) eigenvectors corresponding to the \(d-m\) smallest eigenvalues form the basis for the normal space at \(x_i\).
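The following is a minimal NumPy sketch of this local PCA step; the function name, the brute-force neighbor search and the assumption that the intrinsic dimension m is known are our own simplifications.

```python
import numpy as np

def local_pca(X, i, h_pca, m):
    """Estimate tangent/normal bases at X[i] by local PCA.

    X     : (n, d) array of points, one per row
    i     : index of the base point
    h_pca : neighborhood radius
    m     : intrinsic dimension of the manifold
    Returns (E, N): the columns of E span the estimated tangent space,
    the columns of N span the estimated normal space.
    """
    xi = X[i]
    # Disk neighborhood I_i = {j : ||x_j - x_i|| <= h_pca}
    nbrs = X[np.linalg.norm(X - xi, axis=1) <= h_pca]
    centered = nbrs - nbrs.mean(axis=0)
    cov = centered.T @ centered                 # local covariance matrix, Eq. (8)
    eigval, eigvec = np.linalg.eigh(cov)        # eigenvalues in ascending order
    E = eigvec[:, -m:]                          # m largest eigenvalues: tangent basis
    N = eigvec[:, :-m]                          # d - m smallest eigenvalues: normal basis
    return E, N
```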

2.2 Normal vector extension

Suppose that we pick the normal vector \(\xi _i^\alpha \) (\(i=1,2,\ldots, n\), \(\alpha =1,2,\ldots,d-m\)) from local PCA and want to estimate the Weingarten map with respect to \(\xi _i^\alpha \). Extend \(\xi _i^\alpha \) to a normal vector field \(\tilde{\xi }^\alpha \) on \({\mathcal {M}}\) by setting

$$\begin{aligned} \tilde{\xi }^\alpha _{x_j}=\sum _{\beta =1}^{d-m}\langle \xi _i^\alpha,\xi _j^\beta \rangle \xi _j^\beta \end{aligned}$$
(9)

for \(j=1,2,\ldots,n\). That is, we project \(\xi _i^\alpha \) onto the normal space at \(x_j\). Since the projection varies smoothly with the base point on the manifold, this yields a smooth normal vector field.

Assume that \(K:{\mathbb{R}}\rightarrow {\mathbb{R}}\) is a twice differentiable function supported on [0, 1], for example the truncated Gaussian kernel on [0, 1] or the Epanechnikov kernel. Let \(y\in {\mathbb{R}}^d\). Given \(h>0\), define \(K_h:{\mathbb{R}}^d\rightarrow {\mathbb{R}}\) by

$$\begin{aligned} K_h(y)=\frac{1}{h^m}K(\frac{\Vert y\Vert }{h}) \end{aligned}$$
(10)

Let \(E_i=[e_i^1,\ldots,e_i^m]\) be the matrix consisting of the basis to the tangent space at \(x_i.\) According to Proposition 1, if \(x_j\) is close to \(x_i,\)

$$\begin{aligned} E_i^\mathrm{t}(\tilde{\xi }^\alpha _{x_j}-\xi _i^\alpha )= -A_{\xi _i^\alpha }E_i^\mathrm{t}(x_j-x_i)+O(\Vert x_j-x_i\Vert ^2) \end{aligned}$$
(11)

where \(A_{\xi _i^\alpha }\) is understood as \(m\times m\) matrix. Therefore, we want to find a matrix \(\widetilde{A}_{\xi _i^\alpha }\) which minimizes the following residual

$$\begin{aligned} \sum _{j=1}^{n}\Vert E_i^\mathrm{t}(\tilde{\xi }^\alpha _{x_j}-\xi _i^\alpha )+\widetilde{A}_{\xi _i^\alpha }E_i^\mathrm{t}(x_j-x_i)\Vert ^2K_h(x_j-x_i) \end{aligned}$$
(12)

Set

$$\begin{aligned} \begin{aligned}&\widetilde{\Delta }_i=E_i^\mathrm{t}\left[ x_1-x_i,\ldots,x_n-x_i\right] \\&\widetilde{\Xi }_i^\alpha =E_i^\mathrm{t}\left[ \tilde{\xi }^\alpha _{x_1}-\xi _i^\alpha,\ldots,\tilde{\xi }^\alpha _{x_n}-\xi _i^\alpha \right] \\&W_i = \text {diag}\{K_h(x_1-x_i),\ldots,K_h(x_n-x_i)\} \end{aligned} \end{aligned}$$
(13)

Then the solution of (12) is given in the following closed form

$$\begin{aligned} \widetilde{A}_{\xi _i^\alpha }=-\widetilde{\Xi }_i^\alpha W_i\widetilde{\Delta }_i^\mathrm{t}(\widetilde{\Delta }_i W_i\widetilde{\Delta }_i^\mathrm{t})^{-1} \end{aligned}$$
(14)
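The second step can likewise be written in a few lines. The sketch below (our own code, with illustrative names) implements the normal extension (9), the kernel weights (10) and the closed-form solution (14); the Epanechnikov kernel is used only as an example of a compactly supported K, and the tangent/normal bases are assumed to come from the local PCA step above.

```python
import numpy as np

def epanechnikov(t):
    """A compactly supported kernel on [0, 1], one of the examples mentioned in the text."""
    return np.where(t <= 1.0, 0.75 * (1.0 - t ** 2), 0.0)

def weingarten_at_point(X, i, E_list, N_list, alpha, h, m):
    """Estimate the Weingarten map at X[i] with respect to the alpha-th normal vector.

    E_list[j] : (d, m) tangent basis at X[j] from local PCA
    N_list[j] : (d, d-m) normal basis at X[j] from local PCA
    Returns the m x m matrix of Eq. (14).
    """
    n, d = X.shape
    xi, Ei = X[i], E_list[i]
    xi_alpha = N_list[i][:, alpha]                      # chosen normal vector xi_i^alpha
    # Normal extension, Eq. (9): project xi_i^alpha onto the normal space at each point
    Xi_ext = np.stack([N_list[j] @ (N_list[j].T @ xi_alpha) for j in range(n)], axis=1)
    Delta = Ei.T @ (X - xi).T                           # m x n tangent coordinates of x_j - x_i
    Xi = Ei.T @ (Xi_ext - xi_alpha[:, None])            # m x n tangent coordinates of xi_tilde - xi
    w = epanechnikov(np.linalg.norm(X - xi, axis=1) / h) / h ** m   # kernel weights, Eq. (10)
    W = np.diag(w)
    A = -Xi @ W @ Delta.T @ np.linalg.inv(Delta @ W @ Delta.T)      # closed form, Eq. (14)
    return 0.5 * (A + A.T)                              # optional symmetrization (Remark 2)
```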

Remark 1

The above method estimates the Weingarten map at each point with respect to every normal basis vector. Since each Weingarten map is an \(m\times m\) matrix and the normal basis consists of \(d-m\) vectors, the whole procedure gives \(m\times m\times (d-m)\) coefficients (in fact \(\frac{m(m+1)}{2}\times (d-m)\) coefficients, since the Weingarten map is symmetric) at each point. Using these coefficients we can estimate the second fundamental form, the mean curvature and the sectional curvature at each point. If the underlying manifold is of low dimension, then all the coefficients together are of size O(dn), where d is the dimension of the ambient space and n is the size of the point cloud.
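For instance, for a surface in \({\mathbb{R}}^3\) (\(m=2\), \(d=3\)) the estimated \(2\times 2\) Weingarten matrix immediately yields the Gaussian and mean curvature. The helper below is our own and follows the conventions \(K=\det A\) and \(H=\mathrm{tr}\,A/2\), consistent with Eq. (57) used later for the quadratic fitting baseline.

```python
import numpy as np

def surface_curvatures(A):
    """Gaussian and mean curvature of a surface in R^3 from its 2x2 Weingarten matrix."""
    A = 0.5 * (A + A.T)                    # symmetrize (Remark 2)
    k1, k2 = np.linalg.eigvalsh(A)         # principal curvatures
    gaussian = k1 * k2                     # K = det(A)
    mean = 0.5 * (k1 + k2)                 # H = tr(A) / 2
    return gaussian, mean
```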

Remark 2

Since the Weingarten map is a self-adjoint operator on the tangent space, the matrix \(\widetilde{A}_{\xi _i^\alpha }\) should be symmetric, and it would be natural to solve (12) over the space of symmetric matrices. However, as we prove later, the unconstrained solution of (12) already converges to the true matrix, so it is not necessary to solve a more complex optimization problem on the space of symmetric matrices. In cases where symmetry is important we can always use the symmetrization \(\frac{1}{2}(\widetilde{A}_{\xi _i^\alpha }^\mathrm{t}+\widetilde{A}_{\xi _i^\alpha })\).

3 Convergence rate

3.1 Statistical modeling

Let \(\xi \) be a normal vector field on \({\mathcal {M}}\), and let P be a random vector valued in \({\mathcal {M}}\) with smooth positive density function f. Fix a point \(p\in {\mathcal {M}}\). To avoid notational complexity, we drop the subscripts and \(A:T_p\mathcal {M}\rightarrow T_p\mathcal {M}\) is always understood as the Weingarten map (or its matrix representation if a basis is specified) at p with respect to \(\xi _p\). We rewrite Proposition 1 as follows: when P is within the normal neighborhood of p, we have

$$\begin{aligned} (\xi _P-\xi _p)^\top =-A\big ((P-p)^\top \big )+\eta (P)\Vert P-p\Vert ^2 \end{aligned}$$
(15)

where \(\eta :{\mathcal {M}}\rightarrow {\mathbb{R}}^m\) is assumed to be a bounded smooth function in a neighborhood of p. In addition, assume that a basis \(e_1,e_2,\ldots,e_m\) for the tangent space \(T_p\mathcal {M}\) is given. Set \(\mathbf {\Xi }=(\xi _P-\xi _p)^\top \) and \(\mathbf {\Delta }=(P-p)^\top \), regarded as coordinate vectors under this basis, and set \(\mathbf {K}_P=K_h(P-p)\). Consider the following optimization problem

$$\begin{aligned} \mathop {\arg \min }_{{\mathbf{A}}\in {\mathbb{R}}^{m\times m}}{\mathbb{E}}_f\left[ \Vert \mathbf {\Xi }+{\mathbf{A}}\mathbf {\Delta }\Vert ^2\mathbf {K}_P\right] \end{aligned}$$
(16)

That is, we want to find the minimizer of the function

$$\begin{aligned} \begin{aligned} F({\mathbf{A}})&={\mathbb{E}}_f\left[ \text {Tr}(\mathbf {\Xi }\mathbf {\Xi }^\mathrm{t}+\mathbf {\Xi }\mathbf {\Delta }^\mathrm{t}{\mathbf{A}}^\mathrm{t}+{\mathbf{A}}\mathbf {\Delta }\mathbf {\Xi }^\mathrm{t}+{\mathbf{A}}\mathbf {\Delta }\mathbf {\Delta }^\mathrm{t}{\mathbf{A}}^\mathrm{t})\mathbf {K}_P\right] \\&=\text {Tr}({\mathbb{E}}_f[\mathbf {\Xi }\mathbf {\Xi }^\mathrm{t}\mathbf {K}_P])+2\text {Tr}({\mathbb{E}}_f[\mathbf {\Xi }\mathbf {\Delta }^\mathrm{t}\mathbf {K}_P]{\mathbf{A}}^\mathrm{t})+\text {Tr}({\mathbf{A}}{\mathbb{E}}_f[\mathbf {\Delta }\mathbf {\Delta }^\mathrm{t}\mathbf {K}_P]{\mathbf{A}}^\mathrm{t}) \end{aligned} \end{aligned}$$
(17)

Setting \(\mathrm{d}F/\mathrm{d}{\mathbf{A}}=2{\mathbb{E}}_f[\mathbf {\Xi }\mathbf {\Delta }^\mathrm{t}\mathbf {K}_P]+2{\mathbf{A}}\,{\mathbb{E}}_f[\mathbf {\Delta }\mathbf {\Delta }^\mathrm{t}\mathbf {K}_P]=0\), the population solution can be given in the following closed form

$$\begin{aligned} {\mathbf{A}}=-{\mathbb{E}}_f[\mathbf {\Xi }\mathbf {\Delta }^\text {t}\mathbf {K}_P]({\mathbb{E}}_f[\mathbf {\Delta }\mathbf {\Delta }^\text {t}\mathbf {K}_P])^{-1}=-\mathbf {L}\mathbf {D}^{-1} \end{aligned}$$
(18)

where we have set \(\mathbf {L}={\mathbb{E}}_f[\mathbf {\Xi }\mathbf {\Delta }^\text {t}\mathbf {K}_P]\) and \(\mathbf {D}={\mathbb{E}}_f[\mathbf {\Delta }\mathbf {\Delta }^\text {t}\mathbf {K}_P]\). Denote the coordinate components \(\mathbf {\Xi }\cdot e_j\) by \((\mathbf {\Xi })_j\) and \(\mathbf {\Delta }\cdot e_j\) by \((\mathbf {\Delta })_j\) for \(j=1,2,\ldots,m\). In matrix form we have

$$\begin{aligned} \begin{aligned}&\mathbf {L}=\begin{bmatrix} {\mathbb{E}}_f[(\mathbf {\Xi })_1(\mathbf {\Delta })_1\mathbf {K}_P]&{}\quad \cdots &{}\quad {\mathbb{E}}_f[(\mathbf {\Xi })_1(\mathbf {\Delta })_m\mathbf {K}_P]\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ {\mathbb{E}}_f[(\mathbf {\Xi })_m(\mathbf {\Delta })_1\mathbf {K}_P]&{}\quad \cdots &{}\quad {\mathbb{E}}_f[(\mathbf {\Xi })_m(\mathbf {\Delta })_m\mathbf {K}_P]\\ \end{bmatrix}\\&\mathbf {D}=\begin{bmatrix} {\mathbb{E}}_f[(\mathbf {\Delta })_1^2\mathbf {K}_P]&{}\quad \cdots &{}\quad {\mathbb{E}}_f[(\mathbf {\Delta })_1(\mathbf {\Delta })_m\mathbf {K}_P]\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ {\mathbb{E}}_f[(\mathbf {\Delta })_m(\mathbf {\Delta })_1\mathbf {K}_P]&{}\quad \cdots &{}\quad {\mathbb{E}}_f[(\mathbf {\Delta })_m^2\mathbf {K}_P]\\ \end{bmatrix}\\ \end{aligned} \end{aligned}$$
(19)

Let \(x_1,\ldots, x_n\) be i.i.d. samples from P. Let \(\Xi _i,\Delta _i\) be the quantities obtained by replacing the random vector in \(\mathbf {\Xi },\mathbf {\Delta }\) with the sample \(x_i\), for \(i=1,2,\ldots,n\). Replacing the expectations by empirical means, we obtain the empirical solution

$$\begin{aligned} \widetilde{A}=-\widetilde{L}\widetilde{D}^{-1} \end{aligned}$$
(20)

where

$$\begin{aligned} \begin{aligned}&\widetilde{L}=\begin{bmatrix} \frac{1}{n}\sum _{i=1}^{n}(\Xi _i)_1(\Delta _i)_1\mathbf {K}_{x_i}&{}\quad \cdots &{}\quad \frac{1}{n}\sum _{i=1}^{n}(\Xi _i)_1(\Delta _i)_m\mathbf {K}_{x_i}\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ \frac{1}{n}\sum _{i=1}^{n}(\Xi _i)_m(\Delta _i)_1\mathbf {K}_{x_i}&{}\quad \cdots &{}\quad \frac{1}{n}\sum _{i=1}^{n}(\Xi _i)_m(\Delta _i)_m\mathbf {K}_{x_i}\\ \end{bmatrix}\\&\widetilde{D}=\begin{bmatrix} \frac{1}{n}\sum _{i=1}^{n}(\Delta _i)^2_1\mathbf {K}_{x_i}&{}\quad \cdots &{}\quad \frac{1}{n}\sum _{i=1}^{n}(\Delta _i)_1(\Delta _i)_m\mathbf {K}_{x_i}\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ \frac{1}{n}\sum _{i=1}^{n}(\Delta _i)_m(\Delta _i)_1\mathbf {K}_{x_i}&{}\quad \cdots &{}\quad \frac{1}{n}\sum _{i=1}^{n}(\Delta _i)^2_m\mathbf {K}_{x_i}\\ \end{bmatrix}\\ \end{aligned} \end{aligned}$$
(21)

It is easy to check that the empirical solution given here is the same as the one given in (14). That is, it is the solution of the following empirical optimization problem

$$\begin{aligned} \mathop {\arg \min }_{\widetilde{A}\in {\mathbb{R}}^{m\times m}}\sum _{i=1}^{n}\Vert \Xi _i+\widetilde{A}\Delta _i\Vert ^2\mathbf {K}_{x_i} \end{aligned}$$
(22)

Finally, the mean square error (MSE) is defined and bounded as

$$\begin{aligned} \text {MSE}={\mathbb{E}}_f\left[ \Vert \widetilde{A}-A\Vert _F^2\right] \le 2\bigg (\underbrace{{\mathbb{E}}_f\left[ \Vert \widetilde{A}-{\mathbf{A}}\Vert _F^2\right] }_{\text {Variance}}+\underbrace{\Vert A-{\mathbf{A}}\Vert _F^2}_{\text {Bias}^2}\bigg ) \end{aligned}$$
(23)

where \(\Vert \cdot \Vert _F\) denotes the Frobenius norm.

3.2 Mean square error

In the estimation of either the variance or the bias, the main obstacle is to control the norms of the inverse matrices \(\mathbf {D}^{-1}\) and \(\widetilde{D}^{-1}\). The method is somewhat similar to that used in kernel density estimation. When the bandwidth h is small, the integration is taken over a normal neighborhood of p. Using the exponential map as a parametrization, the integral domain becomes the tangent space and Taylor expansions yield the leading terms. Before that, we record the following fact concerning the Euclidean and manifold distances, which can be found in Ozakin and Gray (2009).

Lemma 1

Let \(d_p\) and \(\rho _p\) be the Riemannian distance and Euclidean distance to p, respectively. There exists a function \(R_p(h)\) and positive constants \(\delta _{R_p}\), \(C_{R_p}\) such that when \(h<\delta _{R_p}\), \(d_p(y)\le R_p(h)\) for all y with \(\rho _p(y)\le h\), and furthermore,

$$\begin{aligned} h\le R_p(h)\le h+C_{R_p}h^3 \end{aligned}$$
(24)

Lemma 1 indicates that, for h small enough, if \(\Vert P-p\Vert \le h\) then the geodesic distance between P and p is controlled by \(R_p(h)\). First we give the estimation of \(\mathbf {D}^{-1}\). Some properties of the exponential map are presented in Appendix 2.

Lemma 2

There exists a positive constant \(h_0\) such that when \(h<h_0\),

$$\begin{aligned} \mathbf {D}=h^2\left( f(p)\int _{\Vert \mathbf {z}\Vert \le 1}(z^1)^2K(\Vert \mathbf {z}\Vert )\mathrm{d}\mathbf {z}\right) (I+o(1)) \end{aligned}$$
(25)

where I is the identity matrix. Thus, the inverse is given by

$$\begin{aligned} \mathbf {D}^{-1}=\frac{1}{h^2\left( f(p)\int _{\Vert \mathbf {z}\Vert \le 1}(z^1)^2K(\Vert \mathbf {z}\Vert )\mathrm{d}\mathbf {z}\right) }(I+o(1)) \end{aligned}$$
(26)

Proof

Pick \(h_0\) such that for \(h<h_0\) the exponential map is well defined on the geodesic ball of radius \(R_p(h)\). For an arbitrary element \(\mathbf {D}_{kl}\), let \({\mathbf{r}}\) be the exponential map. We have

$$\begin{aligned} \begin{aligned} {\mathbb{E}}_f[(\mathbf {\Delta })_k(\mathbf {\Delta })_l\mathbf {K}_P]&=\int _{\mathcal {M}}((P-p)\cdot e_k)((P-p)\cdot e_l)K_h(P-p)f(P)\mathrm{d}v\\&=\frac{1}{h^m}\int _{\Vert {\mathbf{u}}\Vert \le R_p(h)} (u^k+o(\Vert {\mathbf{u}}\Vert ))(u^l+o(\Vert {\mathbf{u}}\Vert ))\\&\quad K\left( \frac{\Vert {\mathbf{u}}\Vert +o(\Vert {\mathbf{u}}\Vert )}{h}\right) f({\mathbf{r}}({\mathbf{u}}))\sqrt{\det (g({\mathbf{u}}))}{\mathrm{d}}{\mathbf{u}}\end{aligned} \end{aligned}$$
(27)

where \(g({\mathbf{u}})\) denotes the Riemannian metric matrix. In normal coordinates, \(g_{ij}({\mathbf{u}})=\delta _{ij}+o(\Vert {\mathbf{u}}\Vert )\). Changing the variable of integration to \({\mathbf{u}}=h\mathbf {z}\), we obtain that

$$\begin{aligned} \begin{aligned} (27)&= \int _{\Vert \mathbf {z}\Vert \le R_p(h)/h}(hz^k+o(h\Vert \mathbf {z}\Vert ))(hz^l+o(h\Vert \mathbf {z}\Vert ))\\&\qquad K(\Vert \mathbf {z}\Vert +o(\Vert \mathbf {z}\Vert ))f({\mathbf{r}}(h\mathbf {z}))\sqrt{\det (g(h\mathbf {z}))}{\mathrm{d}}\mathbf {z}\end{aligned} \end{aligned}$$
(28)

The integral domain can be divided into two parts: the unit ball \(Q_1=\{{\mathbf{z}}\,|\,\Vert \mathbf {z}\Vert \le 1\}\) and the shell \(Q_2=\{{\mathbf{z}}\,|\,1\le \Vert \mathbf {z}\Vert \le R_p(h)/h\}\). On \(Q_1\), the integral is

$$\begin{aligned} h^2\left( \int _{\Vert \mathbf {z}\Vert \le 1}z^kz^lK(\Vert \mathbf {z}\Vert )f(p)\mathrm{d}\mathbf {z}+o(1)\right) =h^2\left( \delta _{kl}f(p)\int _{\Vert \mathbf {z}\Vert \le 1 }(z^1)^2K(\Vert \mathbf {z}\Vert )\mathrm{d}\mathbf {z}+o(1)\right) \end{aligned}$$
(29)

where we have used the symmetry of the integral and \(\delta _{kl}\) denotes the Kronecker delta. On \(Q_2\), the integrand is \(O(h^2)\) whereas the volume of the integral domain is \(O(h^2)\) by Lemma 1, so its contribution is \(O(h^4)=h^2\cdot o(1)\). Overall, we have shown that

$$\begin{aligned} {\mathbb{E}}_f[(\mathbf {\Delta })_k(\mathbf {\Delta })_l\mathbf {K}_P]=h^2\left( \delta _{kl}f(p)\int _{\Vert \mathbf {z}\Vert \le 1 }(z^1)^2K(\Vert \mathbf {z}\Vert ){\mathrm{d}}\mathbf {z}+o(1)\right) \end{aligned}$$
(30)

which proves Eq. (25). The inverse matrix follows from the identity \((I-D)^{-1}=\sum _{k=0}^{\infty }D^k\), valid for small \(\Vert D\Vert \). \(\square \)

Using the presented results, we can now give the convergence order of the bias.

Lemma 3

As \(h\rightarrow 0\), \(\Vert {\mathbf{A}}-A\Vert _F^2= O(h^4)\).

Proof

According to the model assumption (15), if we set \(\mathbf {\Theta }=\mathbf {\Xi }+A\mathbf {\Delta }=\eta (P)\Vert P-p\Vert ^2\), then we have

$$\begin{aligned} \begin{aligned} {\mathbf{A}}&=-{\mathbb{E}}_f[(-A\mathbf {\Delta }+\mathbf {\Theta })\mathbf {\Delta }^\text {t}\mathbf {K}_P]({\mathbb{E}}_f[\mathbf {\Delta }\mathbf {\Delta }^\text {t}\mathbf {K}_P])^{-1}\\&=A-{\mathbb{E}}_f[\mathbf {\Theta }\mathbf {\Delta }^\text {t}\mathbf {K}_P]({\mathbb{E}}_f[\mathbf {\Delta }\mathbf {\Delta }^\text {t}\mathbf {K}_P])^{-1} \end{aligned} \end{aligned}$$
(31)

It suffices to estimate \({\mathbb{E}}_f[\mathbf {\Theta }\mathbf {\Delta }^\text {t}\mathbf {K}_P]\). As in the proof of Lemma 2, for \(h<h_0\), the integration is

$$\begin{aligned} \begin{aligned}&{\mathbb{E}}_f[(\mathbf {\Theta })_k(\mathbf {\Delta })_l\mathbf {K}_P]=\int _{\mathcal {M}}\eta ^k(P)\Vert P-p\Vert ^2((P-p)\cdot e_l)K_h(P-p)f(P)\mathrm{d}v\\&\quad =\frac{1}{h^m}\int _{\Vert {\mathbf{u}}\Vert \le R_p(h)}\eta ^k({\mathbf{r}}({\mathbf{u}}))(\Vert {\mathbf{u}}\Vert ^2+o(\Vert {\mathbf{u}}\Vert ^2))(u^l+O(\Vert {\mathbf{u}}\Vert ^2))\\&\quad \qquad K\left( \frac{\Vert {\mathbf{u}}\Vert +o(\Vert {\mathbf{u}}\Vert )}{h}\right) f({\mathbf{r}}({\mathbf{u}}))\sqrt{\det (g({\mathbf{u}}))}\mathrm{d}{\mathbf{u}}\end{aligned} \end{aligned}$$
(32)

After the change of variables \({\mathbf{u}}=h\mathbf {z}\), note that by symmetry we have

$$\begin{aligned} \int _{\Vert \mathbf {z}\Vert \le 1}z^l\Vert \mathbf {z}\Vert ^2K(\Vert \mathbf {z}\Vert )d\mathbf {z}=0 \end{aligned}$$
(33)

Thus the coefficient of \(h^3\) vanishes. Set

$$\begin{aligned} \begin{aligned}&\partial _l\eta ^k(0)=\left. \frac{\partial \eta ^k({\mathbf{r}}({\mathbf{u}}))}{\partial u^l}\right| _{{\mathbf{u}}=0}\\&\partial _j\partial _sr^i(0)=\left. \frac{\partial ^2 r^i({\mathbf{u}})}{\partial u^j\partial u^s}\right| _{{\mathbf{u}}=0} \end{aligned} \end{aligned}$$
(34)

The leading term is

$$\begin{aligned} h^4\left( \bigg (\partial _l\eta ^k(0)+\frac{1}{2}\eta ^k(p)\sum _{i,j}\partial _j\partial _jr^i(0)\partial _lr^i(0)\bigg )f(p)\int _{\Vert \mathbf {z}\Vert \le 1}(z^1)^2\Vert \mathbf {z}\Vert ^2K(\Vert \mathbf {z}\Vert )\mathrm{d}\mathbf {z}\right) \end{aligned}$$
(35)

Utilizing the estimation for \(\mathbf {D}\), we obtain that

$$\begin{aligned} \Vert {\mathbf{A}}-A\Vert _F^2\le \Vert {\mathbb{E}}_f[\mathbf {\Theta }\mathbf {\Delta }^t\mathbf {K}_P]\Vert _F^2\Vert \mathbf {D}^{-1}\Vert _F^2=O(h^4) \end{aligned}$$
(36)

\(\square \)

For the matrix \(\widetilde{D}\) there is additional error coming from random sampling, so the estimates only hold with high probability (w.h.p.), that is, for any \(\epsilon >0\) they hold with probability greater than \(1-\epsilon \). For simplicity of the statements, we omit the explicit computation of the quantities involving \(\epsilon \).

Lemma 4

When \(h\rightarrow 0\) and \(nh^m\rightarrow \infty \) as \(n\rightarrow \infty \), w.h.p. the following equality holds

$$\begin{aligned} \widetilde{D}=\mathbf {D}+O\left( \frac{h^2}{\sqrt{nh^m}}\right) \end{aligned}$$
(37)

Thus the inverse is given by

$$\begin{aligned} \widetilde{D}^{-1}=\mathbf {D}^{-1}-\mathbf {D}^{-1}O\left( \frac{h^2}{\sqrt{nh^m}}\right) \mathbf {D}^{-1} \end{aligned}$$
(38)

Proof

For an arbitrary element \(\widetilde{D}_{kl}\), note that

$$\begin{aligned} {\mathbb{E}}_f[\widetilde{D}_{kl}]=\mathbf {D}_{kl} \end{aligned}$$
(39)

The variance is

$$\begin{aligned} {\mathbb{E}}_f\left[ (\widetilde{D}_{kl}-\mathbf {D}_{kl})^2\right] =\frac{1}{n}{\mathbb{E}}_f\left[ ((\mathbf {\Delta })_k(\mathbf {\Delta })_l\mathbf {K}_P-\mathbf {D}_{kl})^2\right] \le \frac{1}{n}{\mathbb{E}}_f\left[ ((\mathbf {\Delta })_k(\mathbf {\Delta })_l\mathbf {K}_P)^2\right] \end{aligned}$$
(40)

Using the same method as in the proof of Lemma 2, we find that the leading term is

$$\begin{aligned} \frac{1}{h^m}\left( h^4f(p)\int _{\Vert \mathbf {z}\Vert \le 1}(z^1)^2(z^2)^2K^2(\Vert \mathbf {z}\Vert )\mathrm{d}\mathbf {z}+o(h^4)\right) \end{aligned}$$
(41)

By Chebyshev’s inequality, for any \(\epsilon >0\),

$$\begin{aligned} {\mathbb{P}}\left( |\widetilde{D}_{kl}-\mathbf {D}_{kl}|\ge \epsilon \right) \le \frac{1}{\epsilon ^2}O(\frac{h^4}{nh^m}) \end{aligned}$$
(42)

as \(h\rightarrow 0\) and \(nh^m\rightarrow \infty \). Setting the right-hand side to be O(1), we see that w.h.p. \(|\widetilde{D}_{kl}-\mathbf {D}_{kl}|\le h^2/\sqrt{nh^m}\), which proves Eq. (37). The inverse is given by the following identity

$$\begin{aligned} \begin{aligned} \left( \mathbf {D}+O\left( \frac{h^2}{\sqrt{nh^m}}\right) \right) ^{-1}&=\left( I+\mathbf {D}^{-1}O\left( \frac{h^2}{\sqrt{nh^m}}\right) \right) ^{-1}\mathbf {D}^{-1}\\&=\left( I-\mathbf {D}^{-1}O\left( \frac{h^2}{\sqrt{nh^m}}\right) \right) \mathbf {D}^{-1}\\&=\mathbf {D}^{-1}-\mathbf {D}^{-1}O\left( \frac{h^2}{\sqrt{nh^m}}\right) \mathbf {D}^{-1} \end{aligned} \end{aligned}$$
(43)

\(\square \)

Using the presented results, we can now estimate the variance term.

Lemma 5

When \(h\rightarrow 0\) and \(nh^m\rightarrow \infty \) as \(n\rightarrow \infty \), the variance is \(O(\frac{1}{nh^m})\) w.h.p.

Proof

First, by inserting an intermediate term we have

$$\begin{aligned} \begin{aligned}&{\mathbb{E}}_f\left[ \Vert {\mathbf{A}}-\widetilde{A}\Vert _F^2\right] = {\mathbb{E}}_f\left[ \Vert {\mathbf{A}}+\widetilde{L}\mathbf {D}^{-1}-\widetilde{L}\mathbf {D}^{-1}-\widetilde{A}\Vert _F^2\right] \\&\quad \le 2\bigg (\underbrace{{\mathbb{E}}_f\left[ \Vert {\mathbf{A}}+\widetilde{L}\mathbf {D}^{-1}\Vert _F^2\right] }_{(*)}+\underbrace{{\mathbb{E}}_f\left[ \Vert \widetilde{L}\mathbf {D}^{-1}+\widetilde{A}\Vert _F^2\right] }_{(**)}\bigg ) \end{aligned} \end{aligned}$$
(44)

For the first term we have

$$\begin{aligned} (*)={\mathbb{E}}_f\left[ \Vert (\mathbf {L}-\widetilde{L})\mathbf {D}^{-1}\Vert _F^2\right] \le {\mathbb{E}}_f\left[ \Vert \mathbf {L}-\widetilde{L}\Vert _F^2\right] \Vert \mathbf {D}^{-1}\Vert _F^2 \end{aligned}$$
(45)

For an arbitrary term in \(\mathbf {L}-\widetilde{L}\) we have

$$\begin{aligned} \begin{aligned}&{\mathbb{E}}_f\left[ \bigg (\frac{\sum _{i=1}^{n}(\Xi _i)_k(\Delta _i)_lK_{x_i}}{n}-{\mathbb{E}}_f[(\mathbf {\Xi })_k(\mathbf {\Delta })_l\mathbf {K}_P]\bigg )^2\right] \\&\quad =\frac{1}{n}{\mathbb{V}}\text {ar}_f\left[ (\mathbf {\Xi })_k(\mathbf {\Delta })_l\mathbf {K}_P\right] \le \frac{1}{n}{\mathbb{E}}_f\left[ (\mathbf {\Xi })_k^2(\mathbf {\Delta })_l^2\mathbf {K}^2_P\right] \end{aligned} \end{aligned}$$
(46)

When \(h<h_0\), by the model assumption (15), the above quantity (46) is bounded by

$$\begin{aligned} \begin{aligned}&\frac{\Vert A\Vert ^2_F}{h^{2m}}\int _{\Vert {\mathbf{u}}\Vert \le R_p(h)}(u^k+o(\Vert {\mathbf{u}}\Vert ))^2(u^l+o(\Vert {\mathbf{u}}\Vert ))^2\\&\quad K^2\left( \frac{\Vert {\mathbf{u}}\Vert +o(\Vert {\mathbf{u}}\Vert )}{h}\right) f({\mathbf{r}}({\mathbf{u}}))\sqrt{\det (g({\mathbf{u}}))}\mathrm{d}{\mathbf{u}}\end{aligned} \end{aligned}$$
(47)

where the leading term is

$$\begin{aligned} f(p)\frac{\Vert A\Vert _F^2}{h^{m-4}}\int _{\Vert \mathbf {z}\Vert \le 1}(z^1)^2(z^2)^2K(\Vert \mathbf {z}\Vert )^2\mathrm{d}\mathbf {z}\end{aligned}$$
(48)

Together with the estimation for \(\mathbf {D}^{-1}\), we see that \((*)= O(\frac{1}{nh^m})\).

For the second term we have

$$\begin{aligned} \begin{aligned} (**)&={\mathbb{E}}_f\left[ \Vert \widetilde{L}(\mathbf {D}^{-1}-\widetilde{D}^{-1})\Vert _F^2\right] ={\mathbb{E}}_f\left[ \Vert \widetilde{L}\mathbf {D}^{-1}(\widetilde{D}-\mathbf {D})\widetilde{D}^{-1}\Vert _F^2\right] \\&\le {\mathbb{E}}_f\left[ \Vert \widetilde{L}\Vert _F^2\right] {\mathbb{E}}_f\left[ \Vert \widetilde{D}-\mathbf {D}\Vert _F^2\right] {\mathbb{E}}_f\left[ \Vert \widetilde{D}^{-1}\Vert _F^2\right] \Vert \mathbf {D}^{-1}\Vert _F^2 \end{aligned} \end{aligned}$$
(49)

Now we bound each factor in (49).

  1. Similar to the proof of Lemma 4, note that \({\mathbb{E}}_f[\widetilde{L}_{kl}]=\mathbf {L}_{kl}\) and the variance is controlled by a quantity of order \(O(h^4/(nh^m))\) w.h.p. Thus,

    $$\begin{aligned} {\mathbb{E}}_f\left[ \Vert \widetilde{L}\Vert _F^2\right] \le O(h^4) \end{aligned}$$
    (50)

  2. Using the same method as in the estimation of \(\Vert \mathbf {L}-\widetilde{L}\Vert _F^2\), the factor \({\mathbb{E}}_f[\Vert \widetilde{D}-\mathbf {D}\Vert _F^2]\) is controlled by \(O(1/(nh^{m-4}))\) w.h.p.

  3. By Lemma 4, \({\mathbb{E}}_f[\Vert \widetilde{D}^{-1}\Vert _F^2]\) is \(O(1/h^4)\) w.h.p.

  4. By Lemma 2, \(\Vert \mathbf {D}^{-1}\Vert _F^2\) is \(O(1/h^4)\).

Hence, the order of \((**)\) is \(O(\frac{1}{nh^m})\). Overall, the rate for variance is proved. \(\square \)

Altogether, we can give the convergence rate of the MSE.

Theorem 1

Let \(\xi \) be a normal vector field on \({\mathcal {M}}\), and let P be a random vector valued in \({\mathcal {M}}\) with smooth positive density function f. Assume that \(K:{\mathbb{R}}\rightarrow {\mathbb{R}}\) is a twice differentiable function supported on [0, 1]. In addition, assume that a basis \(e_1,e_2,\ldots,e_m\) for the tangent space \(T_p\mathcal {M}\) is given. Let \(x_1,\ldots, x_n\) be i.i.d. samples from P. When \(h\rightarrow 0\) and \(nh^m\rightarrow \infty \) as \(n\rightarrow \infty \), the mean square error defined in (23) satisfies

$$\begin{aligned} \mathrm {MSE}= O(h^4)+O\left( \frac{1}{nh^m}\right) \end{aligned}$$
(51)

If h is chosen to be proportional to \(n^{-1/(m+4)}\), the optimal convergence rate is given by \(O(n^{-4/(m+4)})\).
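The choice of h comes from balancing the squared bias and the variance terms in (51):

$$\begin{aligned} h^4\asymp \frac{1}{nh^m}\;\Longleftrightarrow \; h\asymp n^{-1/(m+4)},\qquad \mathrm {MSE}=O\left( n^{-4/(m+4)}\right) \end{aligned}$$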

Remark 3

In the theorem we assume that the normal vector field is exact. The model becomes much more complicated if we also account for the error from the tangent/normal space estimation. However, in applications the results are not sensitive to the choice of \(h_{\text {PCA}}\). We choose \(h_{\text {PCA}}=O(n^{-1/m})\) as in Little et al. (2017) to guarantee that \(I_i\) contains enough points.

3.3 k-nearest-neighbor method

The k-nearest-neighbor method is widely used in many settings since it is intuitively simple, easy to implement and computationally efficient. For simplicity, we focus on disk neighborhoods with fixed radius h in this article; however, the disk neighborhood can be replaced by k nearest neighbors without any other changes. According to Theorem 1 and the well-known relation \(\frac{k}{n} \sim h^m\), we suggest setting \(k = O(n^{4/(m+4)})\) in practice to reach the optimal convergence rate \(O(n^{-4/(m+4)})\).
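In practice only the sample size n and the intrinsic dimension m are needed to set these scale parameters. A small helper of our own is given below; the proportionality constants c_h and c_k are user-chosen, since the theory only fixes the exponents.

```python
def wme_scales(n, m, c_h=1.0, c_k=1.0):
    """Bandwidth and neighborhood size suggested by Theorem 1 and k/n ~ h^m."""
    h = c_h * n ** (-1.0 / (m + 4))                    # kernel bandwidth, h ~ n^{-1/(m+4)}
    k = max(int(c_k * n ** (4.0 / (m + 4))), m + 1)    # k-NN size, k ~ n^{4/(m+4)}
    return h, k

# Example: n = 20000 points on a surface (m = 2) gives k close to n^{2/3}
print(wme_scales(20000, 2))
```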

3.4 Comparison with other estimators

In Buet et al. (2018), the authors proposed a generalized second fundamental form and mean curvature for varifolds based on geometric measure theory. Therefore, even if the underlying space has singularities (for example, self-intersection points), they are able to compute the generalized mean curvature. Basically, an m-varifold is a Radon measure on the space \({\mathbb{R}}^d\times {\mathbb{G}}_{m,d}\), where \({\mathbb{G}}_{m,d}\) is the Grassmannian of m-planes in \({\mathbb{R}}^d\). For an m-varifold V, they define the generalized mean curvature field using the first variation

$$\begin{aligned} \begin{aligned} \delta V: C_c^1({\mathbb{R}}^d,{\mathbb{R}}^d)&\rightarrow {\mathbb{R}}\\ X&\mapsto \int _{{\mathbb{R}}^d\times {\mathbb{G}}_{m,d}}\text {div}_S X(x)\mathrm{d}V(x,S) \end{aligned} \end{aligned}$$
(52)

The generalized mean curvature descends to the classical mean curvature in the sense that \(\delta V=-H\mathcal {H}^m_{|{\mathcal {M}}}\) if V is the standard measure on \({\mathcal {M}}\). However, in general, the first variation cannot be given in closed form. For a point cloud varifold \(V=\sum _{j=1}^n m_j\delta _{x_j}\otimes \delta _{P_j}\), the first variation \(\delta V\) is not a measure. Therefore, they put forward the following quantity to approximate the mean curvature of point clouds

$$\begin{aligned} H^V_{\alpha,\beta,\epsilon }=\frac{C_{\beta }}{C_{\alpha }\epsilon }\frac{\sum _{x_j\in B_\epsilon (x)\backslash \{x\}}m_j\alpha '(\frac{\Vert x_j-x\Vert }{\epsilon })\Pi _{P_j}\frac{x_j-x}{\Vert x_j-x\Vert }}{\sum _{x_j\in B_{\epsilon }(x)}m_j\beta (\frac{\Vert x_j-x\Vert }{\epsilon })} \end{aligned}$$
(53)

where \(\alpha,\beta \) are functions supported on \([-1,1]\) such that \(\int _{{\mathbb{R}}^d}\alpha (x)\mathrm{d}x=\int _{{\mathbb{R}}^d}\beta (x)\mathrm{d}x=1\), and \(C_\alpha,C_\beta \) are constants determined by \(\alpha,\beta \). They proved that \(H^V_{\alpha,\beta,\epsilon }\) converges to H under certain conditions (Theorem 3.6 in Buet et al., 2018). Specifically, if the points are uniformly sampled and the tangents are exact, the convergence rate is \(\frac{1}{n\epsilon }+\epsilon \) provided that \(n\epsilon \rightarrow \infty \). Therefore, the optimal bandwidth is \(\epsilon =n^{-1/2}\) and the optimal convergence rate is also \(n^{-1/2}\). Compared with the WME algorithm, this method yields faster convergence when the underlying manifold has relatively high dimension; however, for curves and surfaces our algorithm converges faster.

In Aamari and Levrard (2019), the authors proposed estimators for the tangent space, the second fundamental form and the support of the manifold based on local polynomial fitting, and derived finite-sample rates. Let \({\mathcal {M}}\) be a submanifold subject to some regularity conditions, and assume in addition that the data are sampled uniformly on the manifold. Then the convergence rate is \(O(n^{-(k-2)/m})\), where k is the regularity parameter and m is the dimension of the manifold. However, the estimator cannot be given in closed form, and nonconvex optimization techniques have to be employed to approximate it.

4 Numerical experiments

We apply the WME algorithm to several synthetic data sets and verify the optimal convergence rate proved in Theorem 1. Both the kernel method and the k-nearest-neighbor method are used to show the consistency of the WME algorithm. Then the algorithm is applied to curvature estimation, and a comparison with the classical local quadratic fitting method is carried out to demonstrate its efficiency and robustness.

4.1 Kernel method

We verify the convergence rates on three synthetic data sets:

4.1.1 Conical spiral

A conical spiral is a space curve given by

$$\begin{aligned} \gamma (t) = (rt\cos (at),rt\sin (at),t) \end{aligned}$$
(54)

where \(r,a>0\) are parameters. Here we set \(r=1\) and \(a=1.5\). It is a 1-dimensional manifold embedded in 3-dimensional Euclidean space. If we choose h proportional to \(n^{-1/5}\), then the optimal convergence rate is \(O(n^{-4/5})\).

4.1.2 Torus

A 2-dimensional torus is given by

$$\begin{aligned} F(\theta,\alpha )=((R+r\cos (\theta ))\cos (\alpha ),(R+r\cos (\theta ))\sin (\alpha ),r\sin (\theta )) \end{aligned}$$
(55)

where \(R>2r>0\) are parameters. Here we set \(R=4\) and \(r=0.5\). Thus if we choose h proportional to \(n^{-1/6}\), the optimal rate will be \(O(n^{-2/3})\).

4.1.3 Ellipsoid

A 3-dimensional ellipsoid is given by

$$\begin{aligned} F(\theta,\alpha,\beta )=(a\cos (\theta ),b\sin (\theta )\cos (\alpha ),c\sin (\theta )\sin (\alpha )\cos (\beta ),d\sin (\theta )\sin (\alpha )\sin (\beta )) \end{aligned}$$
(56)

where \(a,b,c,d>0\) are parameters. Here we set \(a=1\), \(b=1.15\), \(c=1.31\) and \(d=1.7\). If the bandwidth is chosen proportional to \(n^{-1/7}\), the convergence rate will be \(O(n^{-4/7})\).

We uniformly sample 1000–20,000 points using the parametrizations given above. The bandwidth is chosen so that the convergence rate is optimal, and we perform leave-one-out estimates at the data points. The \(\log (\text {MSE})\) is plotted against \(\log (\text {number of points})\), so the slope corresponds to the order of the convergence rate. The results are shown in Fig. 1.
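As an illustration of this protocol (our own sketch), the torus of Sect. 4.1.2 can be sampled via its parametrization (55) with uniform angles, and the empirical convergence order is read off as the slope of a least-squares line fitted to the \(\log (n)\)-\(\log (\text {MSE})\) pairs.

```python
import numpy as np

def torus_sample(n, R=4.0, r=0.5, seed=0):
    """Sample n points from the torus parametrization (55), uniform in the angles."""
    rng = np.random.default_rng(seed)
    theta, alpha = rng.uniform(0.0, 2.0 * np.pi, size=(2, n))
    return np.column_stack([(R + r * np.cos(theta)) * np.cos(alpha),
                            (R + r * np.cos(theta)) * np.sin(alpha),
                            r * np.sin(theta)])

def empirical_rate(ns, mses):
    """Slope of log(MSE) against log(n); Theorem 1 predicts a value close to -4/(m+4)."""
    slope, _ = np.polyfit(np.log(ns), np.log(mses), 1)
    return slope
```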

Fig. 1: The \(\log (n)\)-\(\log (\text {MSE})\) plot for the conical spiral, the 2-dimensional torus and the 3-dimensional ellipsoid. Slopes match the rates of convergence

4.2 k-nearest-neighbor method

We test the optimal convergence rate on a 2-dimensional torus and a 2-dimensional ellipsoid. The sample size for the torus with major radius 5 and minor radius 2 ranges from 1000 to 20,000, and the sample size for the ellipsoid with principal axes of lengths 6, 6, 8 ranges from 7000 to 30,000. k is chosen to be \(n^{\frac{2}{3}}\). Points are uniformly sampled. Figure 2 shows the \( \log (n)\)-\(\log (\text {MSE})\) plot; the slopes match the optimal convergence rate.

Fig. 2: The \(\log (n)\)-\(\log (\text {MSE})\) plot for the 2-dimensional torus and the 2-dimensional ellipsoid. Slopes match the convergence rate

4.3 Curvature estimation

We compare our method with a local quadratic surface fitting method, chosen for two reasons. On the one hand, quadratic fitting is a commonly used method, while more complicated fitting algorithms involve extra scaling parameters that are difficult to tune in practice. On the other hand, quadratic fitting has been studied extensively. In Magid et al. (2007), the authors compared five methods for computing the Gaussian and mean curvature of triangular meshes of 2-dimensional surfaces, and quadratic fitting outperformed the other methods in computing the mean curvature.

The quadratic surface fitting method proceeds as follows. First, translate and rotate the k nearest neighbors of a point so that its normal vector coincides with the z-axis. Then fit the paraboloid \(z=ax^2+bxy+cy^2\) by least squares. The Gaussian curvature and mean curvature at the origin are given by

$$\begin{aligned} K=4ac-b^2,H=a+c. \end{aligned}$$
(57)
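For completeness, a compact NumPy sketch of this baseline (our own code): the neighbors are expressed in a local frame whose third axis is the estimated unit normal, the paraboloid is fitted by least squares, and (57) returns the curvatures.

```python
import numpy as np

def quadratic_fit_curvature(nbrs, p, normal):
    """Estimate Gaussian/mean curvature at p from its neighbors by paraboloid fitting.

    nbrs   : (k, 3) neighboring points
    p      : (3,) base point
    normal : (3,) estimated unit normal at p
    """
    # Build an orthonormal frame (t1, t2, normal) and express the neighbors in it.
    t1 = np.linalg.svd(normal.reshape(1, 3))[2][1]      # a unit vector orthogonal to the normal
    t2 = np.cross(normal, t1)
    local = (nbrs - p) @ np.column_stack([t1, t2, normal])
    x, y, z = local[:, 0], local[:, 1], local[:, 2]
    # Least-squares fit of z = a x^2 + b x y + c y^2
    a, b, c = np.linalg.lstsq(np.column_stack([x**2, x*y, y**2]), z, rcond=None)[0]
    K = 4 * a * c - b ** 2                              # Gaussian curvature, Eq. (57)
    H = a + c                                           # mean curvature, Eq. (57)
    return K, H
```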

The MSE of the Gaussian and mean curvature is compared as follows. We sample 1000–20,000 points on a torus with major radius 5 and minor radius 2. The number of nearest neighbors is set to 100 for each run. The result in Fig. 3 shows that our method outperforms the quadratic fitting method without introducing any additional computational complexity.

Fig. 3: MSE of Gaussian curvature and mean curvature on the torus obtained by WME and quadratic fitting

Robustness is compared as follows. Again we sample \(1000,2000,\ldots,20{,}000\) points on the same torus, perturbed by multivariate Gaussian noise with zero mean and covariance \(\sigma ^2 I_3\), where \(\sigma ^2\in \{0.01,0.05,0.001\}\). The MSE of the Gaussian and mean curvature for the different noise levels, shown in Fig. 4, indicates that our method is more robust.

Fig. 4: Robustness comparison on the noisy torus. Left and right plots are comparisons of Gaussian and mean curvature respectively

5 Applications

We apply the WME algorithm to real data sets. The first application is curvature estimation for brain cortical surfaces, where we demonstrate the robustness of the WME method on a real cortical surface data set. The second application is point cloud simplification, an active topic in computer vision, for which we propose a new curvature-based method to simplify large point cloud data sets. The results of the following experiments show that our algorithm is also practical for real point clouds.

Fig. 5: The first panel is the cortical surface data colored by the Gaussian curvature. The last two plots are \(\log (\text {MSE})\) of the Gaussian and the mean curvature on this dataset

5.1 Brain cortical surface data

To further illustrate the robustness, we test our method on real brain cortical surface data. A point cloud of a left brain cortical surface is obtained from the Human Connectome Project (s3://hcp-openaccess/HCP_1200/100206/MNINonLinear/Native/), consisting of 166,737 position vectors. The data are noisy and there is no information about the true curvature of the surface, so there is no ground truth and the estimation error cannot be computed directly. Instead, we propose an indirect way to evaluate the performance. First, we estimate the Gaussian and mean curvature of the point cloud based on the entire dataset; the results are regarded as the true curvature of the underlying cortical surface. Then the data are divided into training and testing sets, and we recalculate the Gaussian and mean curvature on the training data. For each testing point, the curvature is inferred as the mean of the curvature values of its k nearest neighbors in the training data. Finally, we compute the mean square error of the curvature on the testing data. The same procedure is also carried out using the quadratic surface fitting method. From Fig. 5, the mean square error obtained from WME decreases monotonically as the number of testing points increases, whereas the error from the quadratic surface fitting method fluctuates, which means that WME is more robust on this real and complicated dataset.
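A sketch of this indirect evaluation protocol is given below (our own code, using SciPy's k-d tree for the neighbor search); `curv_train` and `curv_true_test` stand for the curvature values re-estimated on the training split and the reference values computed from the full cloud, respectively.

```python
import numpy as np
from scipy.spatial import cKDTree

def knn_transfer_mse(X_train, curv_train, X_test, curv_true_test, k=10):
    """Transfer curvature to test points by k-nearest-neighbor averaging and score by MSE."""
    tree = cKDTree(X_train)
    _, idx = tree.query(X_test, k=k)              # indices of the k nearest training points
    curv_pred = curv_train[idx].mean(axis=1)      # average curvature over the neighbors
    return np.mean((curv_pred - curv_true_test) ** 2)
```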

5.2 Point cloud simplification

Point clouds are often converted into a continuous surface representation such as polygonal meshes or splines; this process is called surface reconstruction (Berger et al., 2016). Reconstruction algorithms require large amounts of memory and do not scale well with data size, so the complexity of the point cloud should be reduced before further processing. In Pauly et al. (2002), the authors proposed three types of simplification algorithms: clustering methods, iterative simplification and particle simulation. These methods are based on a quantity defined by the covariance of local data which, as claimed by Pauly et al. (2002), reflects how the point cloud curves; however, the precise relation between this quantity and curvature remains to be studied. Here we propose a curvature-adaptive clustering simplification algorithm and compare it with the uniform clustering simplification algorithm.

The uniform clustering method is described as follows. Starting from a random seed point, a cluster \(C_0\) is built by successively adding nearest neighbors. The process terminates when the size of the cluster reaches a preset threshold. The next cluster \(C_1\) is built in the same way with all points in \(C_0\) excluded. Each cluster is represented by its mean, and the simplified point cloud consists of these representatives.

Intuitively, to preserve the geometric details of the point cloud, points in highly curved regions should be kept. Therefore, a seed point with larger curvature should generate a smaller cluster. Let \(\Omega \) denote any kind of (Gaussian, mean or principal) curvature and let \(|\Omega |_{\max }\) be the largest absolute curvature over the entire surface. Starting from a random seed point p with absolute curvature \(|\Omega |_p\), a cluster \(C_p\) is built by successively adding nearest neighbors. The process terminates when the size of the cluster reaches

$$\begin{aligned} \#C_p=\left\lceil \left( 1-c\frac{|\Omega |_p}{|\Omega |_{\max }}\right) T\right\rceil \end{aligned}$$
(58)

where \(0<c<1\) is a scaling constant and T is the preset threshold; \(\lceil \cdot \rceil \) denotes the ceiling function. Each cluster and its curvature are represented by the mean of its points and the mean of the corresponding curvature values. This yields a non-uniform clustering method.
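A minimal NumPy sketch of this curvature-adaptive clustering is given below (our own code); setting \(c=0\) recovers the uniform clustering described above. For clarity it grows each cluster by brute-force nearest-neighbor search, which is adequate as a sketch but not optimized for very large clouds.

```python
import numpy as np

def adaptive_cluster_simplify(X, curv, T, c=0.5, seed=0):
    """Curvature-adaptive clustering simplification, following Eq. (58).

    X    : (n, d) points
    curv : (n,) absolute curvature values (e.g. |mean curvature|)
    T    : preset cluster-size threshold
    c    : scaling constant in (0, 1); c = 0 gives uniform clustering
    Returns the representatives (cluster means) and their averaged curvature.
    """
    rng = np.random.default_rng(seed)
    remaining = np.arange(len(X))
    reps, rep_curv = [], []
    cmax = np.abs(curv).max()
    while remaining.size > 0:
        seed_idx = rng.choice(remaining)                                   # random seed point
        size = int(np.ceil((1 - c * np.abs(curv[seed_idx]) / cmax) * T))   # Eq. (58)
        size = max(1, min(size, remaining.size))
        d = np.linalg.norm(X[remaining] - X[seed_idx], axis=1)
        cluster = remaining[np.argsort(d)[:size]]                          # nearest neighbors of the seed
        reps.append(X[cluster].mean(axis=0))
        rep_curv.append(np.abs(curv[cluster]).mean())
        remaining = np.setdiff1d(remaining, cluster)
    return np.array(reps), np.array(rep_curv)
```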

The algorithms are applied to three scanned data sets: the Duke dragon, the Stanford bunny and the Armadillo. We adopt the absolute mean curvature for curvature-based simplification. After simplification, we apply the Moving Least Squares (MLS) method for surface reconstruction (Berger et al., 2016). The visualized surfaces in Figs. 6, 7 and 8 give a direct comparison of the two algorithms. In each figure, the first subfigure shows the simplified point cloud obtained by uniform clustering and the second subfigure the surface reconstructed from it; the third subfigure shows the result of curvature-based clustering and the fourth subfigure the surface reconstructed from it. The results show that WME preserves more geometric information than the uniform method, especially in regions with larger curvature.

Fig. 6: Duke Dragon dataset. The surfaces are reconstructed from 6500 points

Fig. 7: Stanford Bunny dataset. The surfaces are reconstructed from 4400 points

Fig. 8: Armadillo dataset. The surfaces are reconstructed from 7800 points

6 Discussion and future work

This paper introduced a new algorithm to estimate the Weingarten map for point clouds sampled from some distribution on a submanifold embedded in Euclidean space. A statistical model was also established to investigate the optimal convergence rate of the Weingarten map estimator. Numerical experiments were carried out to validate the consistency of the algorithm. Compared with other methods concerning the second fundamental form and curvature estimation, our method showed great robustness and efficiency. Applications to real data sets indicated that our method worked well in practice.

The Weingarten map is a vital tool in the study of submanifolds. In the literature of manifold learning, where data sets in a high dimensional Euclidean space are assumed to be sampled from some underlying low dimensional manifold, the Weingarten map helps researchers investigate the underlying space from an extrinsic point of view. Furthermore, the Weingarten map is closely related to many other concepts in Riemannian geometry, curvature being one of them. Therefore, we expect many possible applications in the future; the following are of particular interest to us.

Optimization on manifolds

In Absil et al. (2013) the authors presented a useful relationship between the Weingarten map and the Riemannian Hessian of functions, and the latter plays an important role in Riemannian optimization. In particular, the Riemannian Hessian is critical in all kinds of second-order optimization methods on Riemannian manifolds (Boumal, 2020; Huang et al., 2015). We expect our method to be applicable to Riemannian optimization problems such as low-rank matrix completion (Vandereycken, 2013), high-dimensional tensor completion (Steinlechner, 2016) and independent subspace analysis (Nishimori et al., 2006).

Laplacian-based methods

Laplacian-based methods are extensively studied in manifold learning. Theories and applications of the graph Laplacian are popular in computer vision (Dinesh et al., 2020), network analysis (Grindrod et al., 2018), and statistical learning (Belkin & Niyogi, 2002; Belkin & Niyogi, 2007; Belkin & Niyogi, 2004). It is known that the Weingarten map is related to the Laplace–Beltrami operator on Riemannian manifolds. We expect to find new ways to construct Laplacian estimators on point clouds.

Statistical inference on point clouds

In the application to brain cortical data, our method showed great efficiency in curvature estimation. In recent research, various kinds of curvature have proved to be useful indices in statistical inference and regression (Luders et al., 2006; Yue et al., 2020). We expect to find applications in fields such as biostatistics, medical imaging and neuroscience.