Abstract
We study the convergence of the graph Laplacian of a random geometric graph generated by an i.i.d. sample from an m-dimensional submanifold \({\mathcal {M}}\) in \(\mathbb {R}^d\) as the sample size n increases and the neighborhood size h tends to zero. We show that eigenvalues and eigenvectors of the graph Laplacian converge with a rate of \(O\Big (\big (\frac{\log n}{n}\big )^\frac{1}{2m}\Big )\) to the eigenvalues and eigenfunctions of the weighted Laplace–Beltrami operator of \({\mathcal {M}}\). No information on the submanifold \({\mathcal {M}}\) is needed in the construction of the graph or the “out-of-sample extension” of the eigenvectors. Of independent interest is a generalization of the rate of convergence of empirical measures on submanifolds in \(\mathbb {R}^d\) in infinity transportation distance.
Notes
Note that as stated, our theorems give \(C_{m,\alpha ,r}\), but in this case \(C_{m,\alpha ,r }= C_{m,\alpha } r\) because we can always rescale to the unit ball.
Note that as stated, Theorem 1.1 in [11] gives \(C_{m,\alpha ,\beta ,r }\), but in this case \(C_{m,\alpha ,\beta ,r }= C_{m,\alpha ,\beta }\, r\) as one can simply rescale to the unit ball.
Abbreviations
- \({\mathcal {M}}\): Compact manifold without boundary embedded in \(\mathbb {R}^d\); the Riemannian metric on \({\mathcal {M}}\) is the one inherited from \(\mathbb {R}^d\)
- m: The dimension of \({\mathcal {M}}\)
- \(\hbox {Vol}(A)\): The volume of \(A \subset {\mathcal {M}}\) with respect to the Riemannian volume form
- d(x, y): The geodesic distance between points \(x,y \in {\mathcal {M}}\)
- \(B_{\mathcal {M}}(x,r)\): Ball in \({\mathcal {M}}\) with respect to the geodesic distance on \({\mathcal {M}}\)
- B(r): Ball in \(\mathbb {R}^d\) of radius r, centered at the origin
- \(\mu \): Probability measure supported on \({\mathcal {M}}\) that describes the data distribution
- p: Density of \(\mu \) with respect to the volume form on \({\mathcal {M}}\)
- \(\rho \): Density of the weight measure (which allows us to consider the normalized graph Laplacian) with respect to \(\mu \)
- \(\alpha \): Constant describing the bounds on the densities p and \(\rho \); see (1.2) and (1.9)
- X: Point cloud \(X = \{x_1, \dots , x_n\} \subset {\mathcal {M}}\) drawn from the distribution \(\mu \); also considered as the set of vertices of the associated graph
- \(\mu _n\): Empirical measure of the sample X
- \(\mathbf {m}\): The vector giving the values of the discrete weights used in various forms of the graph Laplacian; see Sects. 1.2.1 and 1.2.2
- \(w_{i,j}\): Edge weight between the vertices \(x_i\) and \(x_j\)
- \(\delta u\): Differential of a function \(u : X \rightarrow \mathbb {R}\); it maps edges to \(\mathbb {R}\) and is defined by \(\delta u_{i,j} = u(x_j) - u(x_i)\)
- \(i_0\): Injectivity radius of \({\mathcal {M}}\). The injectivity radius at a point \(p \in {\mathcal {M}}\) is the largest radius of a ball on which the exponential map at p is a diffeomorphism; \(i_0\) is the infimum of the injectivity radii over all points of \({\mathcal {M}}\)
- K: Maximum of the absolute value of the sectional curvature of \({\mathcal {M}}\)
- \(R\): Reach of \({\mathcal {M}}\), defined in (1.37)
- \(\eta \): Nonnegative function setting the edge weights as a function of the distance between vertices; see (1.5)
- h: Length scale such that the weight between vertices is large if their distance is comparable to or less than h
- \(\sigma _\eta \): Kernel-dependent scaling factor relating the graph Laplacian and the continuum Laplacian; defined in (1.4)
- \(\omega _m\): The volume of the unit ball in \(\mathbb {R}^m\)
- \(d_\infty (\mu , \nu )\): Infinity transportation distance between the measures \(\mu \) and \(\nu \)
- \(\varepsilon \): Upper bound on the transportation distance between \(\mu \) and \(\mu _n\)
- L: Lipschitz constant of various functions: p, \(\rho \) and \(\eta \)
- P: Discretization operator defined in (1.24)
- \(P^*\): The adjoint of P if \(\rho \equiv 1\), and an approximate adjoint otherwise
- I: Interpolation operator defined in (1.24)
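To make the roles of X, \(w_{i,j}\), \(\eta \), and h above concrete, the following is a minimal numerical sketch of a graph Laplacian built from a point cloud sampled from a one-dimensional submanifold of \(\mathbb {R}^2\). The triangle kernel \(\eta (t)=(1-t)_+\), the scale h = 0.3, and the unnormalized form L = D − W are illustrative assumptions, not the paper's exact weighted or normalized operators.

```python
import numpy as np

def graph_laplacian(X, h, eta=lambda t: np.maximum(1.0 - t, 0.0)):
    """Unnormalized graph Laplacian L = D - W with w_ij = eta(|x_i - x_j| / h)."""
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    W = eta(D / h)                      # eta is supported on [0, 1]
    np.fill_diagonal(W, 0.0)            # no self-loops
    return np.diag(W.sum(axis=1)) - W

# n points sampled uniformly from the unit circle, a 1-d submanifold of R^2
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=300)
X = np.column_stack([np.cos(theta), np.sin(theta)])

L = graph_laplacian(X, h=0.3)
eigvals = np.linalg.eigvalsh(L)         # nondecreasing; eigvals[0] is ~0
```

The paper's theorems concern a rescaled version of such a matrix (the rescaling involves \(\sigma _\eta \), h, and n); the sketch omits that normalization and only illustrates the graph construction.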
References
W. Arendt and A. F. M. ter Elst, Sectorial forms and degenerate differential operators, J. Operator Theory, 67 (2012), pp. 33–72.
M. Belkin and P. Niyogi, Laplacian eigenmaps for dimensionality reduction and data representation, Neural Computation, 15 (2002), pp. 1373–1396.
M. Belkin and P. Niyogi, Convergence of Laplacian eigenmaps, Advances in Neural Information Processing Systems (NIPS), 19 (2007), p. 129.
M. Belkin and P. Niyogi, Towards a theoretical foundation for Laplacian-based manifold methods, J. Comput. System Sci., 74 (2008), pp. 1289–1308.
A. L. Besse, Manifolds all of whose geodesics are closed, vol. 93 of Ergebnisse der Mathematik und ihrer Grenzgebiete [Results in Mathematics and Related Areas], Springer-Verlag, Berlin-New York, 1978. With appendices by D. B. A. Epstein, J.-P. Bourguignon, L. Bérard-Bergery, M. Berger and J. L. Kazdan.
D. Burago, S. Ivanov, and Y. Kurylev, A graph discretization of the Laplace-Beltrami operator, J. Spectr. Theory, 4 (2014), pp. 675–714.
I. Chavel, Eigenvalues in Riemannian geometry, Academic Press, New York, 1984.
R. R. Coifman and S. Lafon, Diffusion maps, Appl. Comput. Harmon. Anal., 21 (2006), pp. 5–30.
M. P. do Carmo, Riemannian geometry, Mathematics: Theory & Applications, Birkhäuser Boston, Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.
K. Fujiwara, Eigenvalues of Laplacians on a closed Riemannian manifold and its nets, Proc. Amer. Math. Soc., 123 (1995), pp. 2585–2594.
N. García Trillos and D. Slepčev, On the rate of convergence of empirical measures in \(\infty \)-transportation distance, Canad. J. Math., 67 (2015), pp. 1358–1383.
N. García Trillos and D. Slepčev, A variational approach to the consistency of spectral clustering, Appl. Comput. Harmon. Anal., 45 (2018), pp. 239–281.
E. Giné and V. Koltchinskii, Empirical graph Laplacian approximation of Laplace-Beltrami operators: large sample results, in High dimensional probability, vol. 51 of IMS Lecture Notes Monogr. Ser., Inst. Math. Statist., Beachwood, OH, 2006, pp. 238–259.
M. Hein, Uniform convergence of adaptive graph-based regularization, in Proc. of the 19th Annual Conference on Learning Theory (COLT), G. Lugosi and H. U. Simon, eds., Springer, 2006, pp. 50–64.
M. Hein, J.-Y. Audibert, and U. von Luxburg, Graph Laplacians and their convergence on random neighborhood graphs, Journal of Machine Learning Research, 8 (2007), pp. 1325–1368.
T. Leighton and P. Shor, Tight bounds for minimax grid matching with applications to the average case analysis of algorithms, Combinatorica, 9 (1989), pp. 161–187.
B. Mohar, Some applications of Laplace eigenvalues of graphs, in Graph Theory, Combinatorics and Applications, Y. Alavi, G. Chartrand, O. R. Oellermann, and A. J. Schwenk, eds., Wiley, 1991, pp. 871–898.
D. Mugnolo and R. Nittka, Convergence of operator semigroups associated with generalised elliptic forms, J. Evol. Equ., 12 (2012), pp. 593–619.
P. Niyogi, S. Smale, and S. Weinberger, Finding the homology of submanifolds with high confidence from random samples, Discrete Comput. Geom., 39 (2008), pp. 419–441.
M. Penrose, Random geometric graphs, vol. 5 of Oxford Studies in Probability, Oxford University Press, Oxford, 2003.
L. Rosasco, M. Belkin, and E. De Vito, On learning with integral operators, Journal of Machine Learning Research, 11 (2010), pp. 905–934.
Y. Shi and B. Xu, Gradient estimate of an eigenfunction on a compact Riemannian manifold without boundary, Ann. Global Anal. Geom., 38 (2010), pp. 21–26.
Z. Shi, Convergence of Laplacian spectra from random samples. preprint, arXiv:1507.00151, 2015.
P. W. Shor and J. E. Yukich, Minimax grid matching and empirical measures, Ann. Probab., 19 (1991), pp. 1338–1348.
A. Singer, From graph to manifold Laplacian: the convergence rate, Appl. Comput. Harmon. Anal., 21 (2006), pp. 128–134.
A. Singer and H.-T. Wu, Spectral convergence of the connection Laplacian from random samples, Information and Inference: A Journal of the IMA, 6 (2017), pp. 58–123.
M. Talagrand, The generic chaining, Springer Monographs in Mathematics, Springer-Verlag, Berlin, 2005. Upper and lower bounds of stochastic processes.
D. Ting, L. Huang, and M. I. Jordan, An analysis of the convergence of graph Laplacians, in Proc. of the 27th Int. Conference on Machine Learning (ICML), 2010.
U. von Luxburg, A tutorial on spectral clustering, Statistics and Computing, 17 (2007), pp. 395–416.
U. von Luxburg, M. Belkin, and O. Bousquet, Consistency of spectral clustering, Ann. Statist., 36 (2008), pp. 555–586.
Acknowledgements
We are grateful to Yaroslav Kurylev who generously shared his knowledge of the techniques that were crucial for this work and who also encouraged our collaboration. DS is grateful for support of the National Science Foundation under the Grants DMS 1516677 and DMS 1814991. MH is grateful for the support by the ERC Grant NOLEPRO. The authors are also grateful to the Center for Nonlinear Analysis (CNA) for support.
Additional information
Communicated by Alan Edelman.
Dedicated to the memory of Yaroslav Kurylev.
Appendices
Proofs of Propositions in Sect. 1.6
Proof
(of Proposition 1) The first claim follows immediately from (1.33). To deduce the second part, let \(q_1, q_2 \in B_{\mathcal {M}}(p,\frac{r}{2})\). Consider a smooth curve \(\tilde{\gamma }:[0,1] \rightarrow {\mathcal {M}}\) connecting \(q_1\) and \(q_2\), i.e., \(\tilde{\gamma }(0)=q_1\) and \(\tilde{\gamma }(1)=q_2\). We observe that if \(\tilde{\gamma }\) is not contained in \(B_{\mathcal {M}}(p,r)\), then \(r \le {{\,\mathrm{Length}\,}}(\tilde{\gamma })\).
In fact, to deduce that \(r \le {{\,\mathrm{Length}\,}}(\tilde{\gamma })\) let \(s \in (0,1) \) be such that \(\tilde{\gamma }(s) \not \in B_{\mathcal {M}}(p,r)\). It is straightforward to see that the length of the restriction of \(\tilde{\gamma }\) to the interval [0, s] is larger than the distance between \(\tilde{\gamma }(s)\) and \(\partial B_{\mathcal {M}}(p,\frac{r}{2})\), which in turn is larger than \(\frac{r}{2}\). Similarly, the length of the restriction of \(\tilde{\gamma }\) to the interval [s, 1] is larger than \(\frac{r}{2}\). Hence, \(r \le {{\,\mathrm{Length}\,}}(\tilde{\gamma })\) as desired.
Now, let \(\tilde{\gamma }\) be a smooth curve realizing the distance between \(q_1 \) and \(q_2\) (which after appropriate normalization has to be a geodesic). From the previous observation, we see that \(\tilde{\gamma }\) is contained in \(B_{\mathcal {M}}(p, r)\). Consider \(\gamma :=\exp _p^{-1} \circ \tilde{\gamma }\), where we note that \(\exp ^{-1}_p\) is well defined along \(\tilde{\gamma }\) given that \(r \le i_0\). From the first part of the proposition, we deduce that
Finally, for an arbitrary smooth curve \(\gamma :[0,1] \rightarrow B(r) \subseteq T_p{\mathcal {M}}\) with \(\gamma (0)=\exp ^{-1}_p(q_1)\) and \(\gamma (1)= \exp ^{-1}_p(q_2)\) we have
Taking the infimum on the right-hand side over all such curves \(\gamma \), we deduce that \(d(q_1,q_2) \le 2 d(\exp ^{-1}_p(q_1),\exp ^{-1}_p(q_2))\). This completes the proof. \(\square \)
Proof
(of Proposition 2) The inequality \(\left|x-y \right| \le d(x,y)\) is trivial. To show the other inequality, we note that since \(\left|x-y \right| \le \frac{R}{2}\), it follows from [19, Prop 6.3] that
Using the fact that for every \(t \in [0,1]\), \(\sqrt{1-t}\ge 1- \frac{1}{2} t - \frac{1}{2} t^2 \), we obtain
To improve the error estimate, let \(L=d(x,y)\) and let \(\gamma :[0,L] \rightarrow {\mathcal {M}}\) be an arc-length-parameterized length-minimizing geodesic between x and y. Heuristically, \(\gamma \) is a “straight” line in \({\mathcal {M}}\), and thus, its curvature in \(\mathbb {R}^d\) is bounded by the maximal principal curvature of \({\mathcal {M}}\) in \(\mathbb {R}^d\), which is bounded by \(\frac{1}{R}\). More precisely, we claim that
This statement follows from [19, Prop 6.1] (and is used in the proof of Proposition 6.3 of [19]). Using translation, we can assume that \(x=0\). Furthermore, note that \({\dot{\gamma }}(t) \cdot \ddot{\gamma }(t)=0\) for all t. Thus,
Combining with (A.1) implies \(L \le |x-y| + \frac{8}{R^2} |x-y|^3\). \(\square \)
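The two-sided comparison \(|x-y| \le d(x,y) \le |x-y| + \frac{8}{R^2}|x-y|^3\) in Proposition 2 can be sanity-checked numerically on the unit circle, whose reach is \(R = 1\): there the geodesic distance between points at angle t apart is t, and the Euclidean (chord) distance is \(2\sin (t/2)\). This check is our own illustration, not part of the proof.

```python
import numpy as np

# Unit circle: reach R = 1, geodesic distance = angle t, chord = 2*sin(t/2).
R = 1.0
for t in np.linspace(1e-3, 2.0 * np.arcsin(0.25), 200):  # keeps chord <= R/2
    chord = 2.0 * np.sin(t / 2.0)    # Euclidean distance |x - y|
    geo = t                          # geodesic distance d(x, y)
    assert chord <= geo <= chord + (8.0 / R**2) * chord**3
```

Since \(t - 2\sin (t/2) = \frac{t^3}{24} + O(t^5)\), the cubic correction \(8|x-y|^3/R^2\) leaves ample room, consistent with the proposition's constant not being optimal.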
Kernel-Density Estimates via Transportation
Here, we use the estimates on the infinity transportation distance established in Sect. 2 to prove the kernel-density estimates we need. While the estimates we prove are not optimal, they do not affect the rate of convergence of eigenvalues and eigenfunctions in our main theorems. We chose this presentation because it highlights how optimal transportation estimates can be used to obtain general kernel-density estimates in a simple and direct way.
Lemma 18
Consider \(\eta : \mathbb {R}\rightarrow \mathbb {R}\), nonincreasing, supported on [0, 1], and normalized: \( \int _{\mathbb {R}^m} \eta (\left|x \right|) \hbox {d}x =1\). Let \(h>0\) satisfy Assumption 3. Then, (1.12) holds. That is, there exists a universal constant \(C>0\) such that
where \(\varepsilon \) is the \(\infty \)-OT distance between \(\mu _n\) and \(\mu \) (see Sect. 2).
The weights \(\mathbf {m}\) are defined by
Here, p is the density of \(\mu \) with respect to the volume form on \({\mathcal {M}}\). We remark that we do not require \(\eta \) to be Lipschitz on [0, 1].
Proof
First, notice that for every i, j with \(|x_i - x_j |\le h \) we have \(|x_i - x_j |\le \frac{R}{2} \), and hence, Proposition 2 implies that
Therefore, for every i, j and every \(y \in U_j\),
where we recall that \(\varepsilon \) is the \(\infty \)-OT distance between \(\mu _n\) and \(\mu \) and where \({\hat{h}}:= h+ \frac{27 h^3}{R^2}\). From this, it follows that
where the last inequality follows using the Lipschitz continuity of p, the fact that \(\varepsilon < h \) and the fact that \(h < \frac{R}{2}\) (so that in particular \({\hat{h}} + \varepsilon < 10 h \)). Now,
where C is a universal constant. The last integral above can be estimated as follows
Using the binomial theorem, we obtain
where in the first equality we have used the fact that \(\eta \) was assumed to be normalized, and in the last inequality, we have used
Combining (B.2), (B.3) and (B.4), we conclude that
for a universal constant \(C>0\).
In a similar fashion, we can find an upper bound for \(p(x_i) -m_i\). Indeed, observe that for every i, j and \( y\in U_i\) we have
and so
The above integral can be estimated from below by
where the second equality follows using polar coordinates and a change of variables; the last inequality follows from the fact that \(\eta \) is assumed to be normalized. In turn,
where we have used the fact that \(\eta \) was assumed to be normalized. Combining the above inequalities, we deduce that
\(\square \)
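The conclusion of Lemma 18, that the discrete weights \(\mathbf {m}\) concentrate around the density p, can be illustrated numerically. The sketch below is our own check under the assumed kernel-density form \(m_i = \frac{1}{n h^m} \sum _j \eta (|x_i - x_j|/h)\), using uniform samples on the unit circle (so \(m=1\) and \(p \equiv 1/(2\pi )\) with respect to arc length) and the indicator kernel \(\eta (t) = \frac{1}{2}\mathbf {1}_{[0,1]}(t)\), which is nonincreasing, supported on [0, 1], and normalized for \(m = 1\).

```python
import numpy as np

# Check that m_i = (1/(n*h^m)) * sum_j eta(|x_i - x_j| / h) approximates p(x_i)
# for uniform samples on the unit circle: m = 1, p = 1/(2*pi) w.r.t. arc length.
rng = np.random.default_rng(1)
n, h = 3000, 0.1
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
X = np.column_stack([np.cos(theta), np.sin(theta)])

# Indicator kernel, normalized so that int_R eta(|x|) dx = 1 (for m = 1).
eta = lambda t: 0.5 * (t <= 1.0)

D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
m_weights = eta(D / h).sum(axis=1) / (n * h)

p_true = 1.0 / (2.0 * np.pi)
```

The average of the weights matches \(p\) up to a relative error of a few percent at this n and h, consistent with the \(O(\frac{\varepsilon }{h} + Lh)\)-type error the lemma quantifies.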
Cite this article
García Trillos, N., Gerlach, M., Hein, M. et al. Error Estimates for Spectral Convergence of the Graph Laplacian on Random Geometric Graphs Toward the Laplace–Beltrami Operator. Found Comput Math 20, 827–887 (2020). https://doi.org/10.1007/s10208-019-09436-w
Keywords
- Graph Laplacian
- Spectral clustering
- Discrete to continuum limit
- Spectral convergence
- Random geometric graph
- Point cloud