Abstract
Manifold learning methods play a prominent role in nonlinear dimensionality reduction and other tasks involving high-dimensional data sets with low intrinsic dimensionality. Many of these methods are graph-based: they associate a vertex with each data point and a weighted edge with each pair. Existing theory shows that the Laplacian matrix of the graph converges to the Laplace–Beltrami operator of the data manifold, under the assumption that the pairwise affinities are based on the Euclidean norm. In this paper, we determine the limiting differential operator for graph Laplacians constructed using any norm. Our proof involves an interplay between the second fundamental form of the manifold and the convex geometry of the given norm’s unit ball. To demonstrate the potential benefits of non-Euclidean norms in manifold learning, we consider the task of mapping the motion of large molecules with continuous variability. In a numerical simulation we show that a modified Laplacian eigenmaps algorithm, based on the Earthmover’s distance, outperforms the classic Euclidean Laplacian eigenmaps, both in terms of computational cost and the sample size needed to recover the intrinsic geometry.
Notes
Compactness of \(\mathcal {M}\) and the Hopf–Rinow theorem imply that \(\exp _\mathbf{p }\) is defined on the entire tangent space \(T_\mathbf{p } \mathcal {M}\).
References
Al-Gwaiz, M.: Sturm-Liouville Theory and Its Applications. Springer, London (2008)
Bates, J.: The embedding dimension of Laplacian eigenfunction maps. Appl. Comput. Harmon. Anal. 37(3), 516–530 (2014). https://doi.org/10.1016/j.acha.2014.03.002
Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003). https://doi.org/10.1162/089976603321780317
Belkin, M., Niyogi, P.: Semi-supervised learning on Riemannian manifolds. Mach. Learn. 56(1–3), 209–239 (2004). https://doi.org/10.1023/B:MACH.0000033120.25363.1e
Belkin, M., Niyogi, P.: Convergence of Laplacian eigenmaps. In: Neural Information Processing Systems (NIPS) (2007). https://doi.org/10.7551/mitpress/7503.003.0021
Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. J. Comput. Syst. Sci. 74(8), 1289–1308 (2008). https://doi.org/10.1016/j.jcss.2007.08.006
Bellet, A., Habrard, A., Sebban, M.: Metric learning. Synth. Lect. Artif. Intell. Mach. Learn. 9(1), 1–151 (2015). https://doi.org/10.2200/S00626ED1V01Y201501AIM030
Bendory, T., Bartesaghi, A., Singer, A.: Single-particle cryo-electron microscopy: mathematical theory, computational challenges, and opportunities. IEEE Signal Process. Mag. 37(2), 58–76 (2020). https://doi.org/10.1109/MSP.2019.2957822
Cheng, M.Y., Wu, H.T.: Local linear regression on manifolds and its geometric interpretation. J. Am. Stat. Assoc. 108(504), 1421–1434 (2013). https://doi.org/10.1080/01621459.2013.827984
Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmon. Anal. 21(1), 5–30 (2006). https://doi.org/10.1016/j.acha.2006.04.006
Coifman, R.R., Leeb, W.: Earth mover’s distance and equivalent metrics for spaces with hierarchical partition trees. Tech. rep., Yale University (2013)
Coifman, R.R., Lafon, S., Lee, A.B., Maggioni, M., Nadler, B., Warner, F., Zucker, S.W.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. 102(21), 7426–7431 (2005). https://doi.org/10.1073/pnas.0500334102
Dashti, A., et al.: Trajectories of the ribosome as a Brownian nanomachine. Proc. Natl. Acad. Sci. 111(49), 17492–17497 (2014). https://doi.org/10.1073/pnas.1419276111
Dashti, A., et al.: Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11(1), 4734 (2020). https://doi.org/10.1038/s41467-020-18403-x
Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591–5596 (2003). https://doi.org/10.1073/pnas.1031596100
Frank, J.: New opportunities created by single-particle cryo-EM: the mapping of conformational space. Biochemistry 57(6), 888 (2018). https://doi.org/10.1021/acs.biochem.8b00064
Frank, J., Ourmazd, A.: Continuous changes in structure mapped by manifold embedding of single-particle data in cryo-EM. Methods 100, 61–67 (2016). https://doi.org/10.1016/j.ymeth.2016.02.007
García Trillos, N., Slepčev, D.: A variational approach to the consistency of spectral clustering. Appl. Comput. Harmon. Anal. 45(2), 239–281 (2018). https://doi.org/10.1016/j.acha.2016.09.003
García Trillos, N., Gerlach, M., Hein, M., Slepčev, D.: Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator. Found. Comput. Math. 20(4), 827–887 (2020). https://doi.org/10.1007/s10208-019-09436-w
Gavish, M., Nadler, B., Coifman, R.R.: Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning. In: International Conference on Machine Learning (ICML) (2010)
Giné, E., Koltchinskii, V.: Empirical graph Laplacian approximation of Laplace-Beltrami operators: large sample results. In: High Dimensional Probability, vol. 51, pp. 238–259. Institute of Mathematical Statistics, Beachwood, Ohio, USA (2006). https://doi.org/10.1214/074921706000000888
Glaeser, R.M., Nogales, E., Chiu, W. (eds.): Single-Particle Cryo-EM of Biological Macromolecules. IOP Publishing (2021). https://doi.org/10.1088/978-0-7503-3039-8
Goldberg, A.B., Zhu, X., Singh, A., Xu, Z., Nowak, R.: Multi-manifold semi-supervised learning. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 169–176 (2009)
Hein, M., Audibert, J.Y., von Luxburg, U.: From graphs to manifolds—weak and strong pointwise consistency of graph Laplacians. In: International Conference on Computational Learning Theory (COLT), pp. 470–485 (2005). https://doi.org/10.1007/11503415_32
Hein, M., Audibert, J.Y., von Luxburg, U.: Graph Laplacians and their convergence on random neighborhood graphs. J. Mach. Learn. Res. 8, 1325–1368 (2007)
Hug, D., Weil, W.: Lectures on convex geometry. In: Graduate Texts in Mathematics, vol. 286. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-50180-8
Jin, Q., et al.: Iterative elastic 3D-to-2D alignment method using normal modes for studying structural dynamics of large macromolecular complexes. Structure 22(3), 496–506 (2014). https://doi.org/10.1016/j.str.2014.01.004
Lederman, R.R., Andén, J., Singer, A.: Hyper-molecules: on the representation and recovery of dynamical structures for applications in flexible macro-molecules in cryo-EM. Inverse Prob. 36(4), 044005 (2020). https://doi.org/10.1088/1361-6420/ab5ede
Lee, J.M.: Riemannian manifolds. In: Graduate Texts in Mathematics, vol. 176. Springer New York (1997). https://doi.org/10.1007/b98852
Lee, J.M.: Introduction to smooth manifolds. In: Graduate Texts in Mathematics, vol. 218. Springer, New York (2012). https://doi.org/10.1007/978-1-4419-9982-5
Lee, A.B., Izbicki, R.: A spectral series approach to high-dimensional nonparametric regression. Electron. J. Stat. 10(1), 423–463 (2016). https://doi.org/10.1214/16-EJS1112
Lee, G., Gommers, R., Waselewski, F., Wohlfahrt, K., O’Leary, A.: PyWavelets: a Python package for wavelet analysis. J. Open Source Softw. 4(36), 1237 (2019). https://doi.org/10.21105/joss.01237
Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK users’ guide. Soc. Ind. Appl. Math. (1998). https://doi.org/10.1137/1.9780898719628
Liao, W., Maggioni, M., Vigogna, S.: Learning adaptive multiscale approximations to data and functions near low-dimensional sets. In: IEEE Information Theory Workshop (ITW), pp. 226–230. IEEE (2016). https://doi.org/10.1109/ITW.2016.7606829
Lieu, L., Saito, N.: Signal ensemble classification using low-dimensional embeddings and earth mover's distance. In: Wavelets and Multiscale Analysis, pp. 227–256. Birkhäuser, Boston (2011). https://doi.org/10.1007/978-0-8176-8095-4_11
Mallat, S.: A Wavelet Tour of Signal Processing, 3rd edn. Elsevier, New York (2009)
McInnes, L., Healy, J., Saul, N., Großberger, L.: UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3(29), 861 (2018). https://doi.org/10.21105/joss.00861
Mishne, G., Talmon, R., Meir, R., Schiller, J., Lavzin, M., Dubin, U., Coifman, R.R.: Hierarchical coupled-geometry analysis for neuronal structure and activity pattern discovery. IEEE J. Select. Top. Signal Process. 10(7), 1238–1253 (2016). https://doi.org/10.1109/JSTSP.2016.2602061
Mishne, G., Talmon, R., Cohen, I., Coifman, R.R., Kluger, Y.: Data-driven tree transforms and metrics. IEEE Trans. Signal Inf. Process. Netw. 4(3), 451–466 (2018). https://doi.org/10.1109/TSIPN.2017.2743561
Monera, M.G., Montesinos-Amilibia, A., Sanabria-Codesal, E.: The Taylor expansion of the exponential map and geometric applications. Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales - Serie A 108(2), 881–906 (2014). https://doi.org/10.1007/s13398-013-0149-z
Moscovich, A., Jaffe, A., Nadler, B.: Minimax-optimal semi-supervised regression on unknown manifolds. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 933–942. PMLR (2017)
Moscovich, A., Halevi, A., Andén, J., Singer, A.: Cryo-EM reconstruction of continuous heterogeneity by Laplacian spectral volumes. Inverse Prob. 36(2), 024003 (2020). https://doi.org/10.1088/1361-6420/ab4f55
Nadler, B., Lafon, S., Coifman, R.R., Kevrekidis, I.G.: Diffusion maps, spectral clustering and eigenfunctions of Fokker–Planck operators. In: Neural Information Processing Systems (NIPS), pp. 955–962 (2005)
Nakane, T., Kimanius, D., Lindahl, E., Scheres, S.H.: Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION. eLife 7, 1–18 (2018). https://doi.org/10.7554/eLife.36861
Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin, T.E.: UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 25(13), 1605–1612 (2004). https://doi.org/10.1002/jcc.20084
Punjani, A., Fleet, D.J.: 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213(2), 107702 (2021). https://doi.org/10.1016/j.jsb.2021.107702
Rao, R., Moscovich, A., Singer, A.: Wasserstein K-means for clustering tomographic projections. In: Machine Learning for Structural Biology Workshop, NeurIPS (2020)
Rosasco, L., Belkin, M., De Vito, E.: On learning with integral operators. J. Mach. Learn. Res. 11, 905–934 (2010)
Rose, P., et al.: The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 45(D1), D271–D281 (2017). https://doi.org/10.1093/nar/gkw1000
Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000). https://doi.org/10.1126/science.290.5500.2323
Ruszczyński, A.: Nonlinear Optimization. Princeton University Press, Princeton (2011). https://doi.org/10.2307/j.ctvcm4hcj
Sathyanarayanan, N., Cannone, G., Gakhar, L., Katagihallimath, N., Sowdhamini, R., Ramaswamy, S., Vinothkumar, K.R.: Molecular basis for metabolite channeling in a ring opening enzyme of the phenylacetate degradation pathway. Nat. Commun. 10(1), 4127 (2019). https://doi.org/10.1038/s41467-019-11931-1
Schwander, P., Fung, R., Ourmazd, A.: Conformations of macromolecules and their complexes from heterogeneous datasets. Philos. Trans. R. Soc. B 369(1647), 1–8 (2014). https://doi.org/10.1098/rstb.2013.0567
Shirdhonkar, S., Jacobs, D.W.: Approximate earth mover’s distance in linear time. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008). https://doi.org/10.1109/CVPR.2008.4587662
Singer, A.: From graph to manifold Laplacian: the convergence rate. Appl. Comput. Harmon. Anal. 21(1), 128–134 (2006). https://doi.org/10.1016/j.acha.2006.03.004
Singer, A., Sigworth, F.J.: Computational methods for single-particle electron cryomicroscopy. Ann. Rev. Biomed. Data Sci. 3(1), 163–190 (2020). https://doi.org/10.1146/annurev-biodatasci-021020-093826
Sober, B., Aizenbud, Y., Levin, D.: Approximation of functions over manifolds: a moving Least-squares approach. J. Comput. Appl. Math. 383, 113140 (2021). https://doi.org/10.1016/j.cam.2020.113140
Sorzano, C.O.S., et al.: Survey of the analysis of continuous conformational variability of biological macromolecules by electron microscopy. Acta Crystallogr. Sect. F 75(1), 19–32 (2019). https://doi.org/10.1107/S2053230X18015108
Stock, D., Leslie, A., Walker, J.: Molecular architecture of the rotary motor in ATP synthase. Science 286(5445), 1700–1705 (1999). https://doi.org/10.1126/science.286.5445.1700
Tagare, H.D., Kucukelbir, A., Sigworth, F.J., Wang, H., Rao, M.: Directly reconstructing principal components of heterogeneous particles from cryo-EM images. J. Struct. Biol. 191(2), 245–262 (2015). https://doi.org/10.1016/j.jsb.2015.05.007
Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000). https://doi.org/10.1126/science.290.5500.2319
Ting, D., Huang, L., Jordan, M.: An analysis of the convergence of graph Laplacians. In: International Conference on Machine Learning (ICML) (2010)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(86), 2579–2605 (2008)
Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2
Von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007). https://doi.org/10.1007/s11222-007-9033-z
von Luxburg, U., Belkin, M., Bousquet, O.: Consistency of spectral clustering. Ann. Stat. 36(2), 555–586 (2008). https://doi.org/10.1214/009053607000000640
Winkelbauer, A.: Moments and absolute moments of the normal distribution. arXiv:1209.4340v2 (2012)
Wormell, C.L., Reich, S.: Spectral convergence of diffusion maps: improved error bounds and an alternative normalization. SIAM J. Numer. Anal. 59(3), 1687–1734 (2021). https://doi.org/10.1137/20M1344093
Yoshida, M., Muneyuki, E., Hisabori, T.: ATP synthase—a marvellous rotary engine of the cell. Nat. Rev. Mol. Cell Biol. 2(9), 669–677 (2001). https://doi.org/10.1038/35089509
Zelesko, N., Moscovich, A., Kileel, J., Singer, A.: Earthmover-based manifold learning for analyzing molecular conformation spaces. In: IEEE International Symposium on Biomedical Imaging (ISBI) (2020). https://doi.org/10.1109/ISBI45749.2020.9098723
Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313–338 (2004). https://doi.org/10.1137/S1064827502419154
Zhang, S., Moscovich, A., Singer, A.: Product manifold learning. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2021)
Zhong, E.D., Bepler, T., Berger, B., Davis, J.H.: CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18(2), 176–185 (2021). https://doi.org/10.1038/s41592-020-01049-4
Acknowledgements
We thank Charles Fefferman, William Leeb, Eitan Levin and John Walker for enlightening discussions. Most of this work was performed while AM was affiliated with PACM at Princeton University. This research was supported by AFOSR FA9550-17-1-0291, ARO W911NF-17-1-0512, NSF BIGDATA IIS-1837992, the Simons Investigator Award, the Moore Foundation Data-Driven Discovery Investigator Award, the Simons Collaboration on Algorithms and Geometry, and start-up grants from the College of Natural Sciences and Oden Institute for Computational Engineering and Sciences at UT Austin.
Additional information
Communicated by Isaak Pesenson.
Appendices
Proof of Lemma 3
\(\underline{\mathbf{Step~1:~LHS} \subseteq \mathbf{RHS.}}\) By the identity (13) for tangent cones of convex sets, we have
By definition of \(\partial \) and the fact that the relative interior of a convex set equals the relative interior of its closure, the LHS of Eq. (14) reads
Let \(\mathbf{d} \in \partial \left( TC_\mathbf{y }(\mathcal {B}) \right) \). By Eq. (94), \(\mathbf{d} = \lim _{k \rightarrow \infty } \beta _k (\widetilde{\mathbf{y }}_k - \mathbf{y} )\) for some \(\beta _k \in \mathbb {R}_{>0}\) and \(\widetilde{\mathbf{y }}_k \in \mathcal {B}\). Without loss of generality, we assume \(\widetilde{\mathbf{y }}_k \in \partial \mathcal {B}\) for each k; by compactness of \(\partial \mathcal {B}\), after passing to a subsequence we may also assume \(\widetilde{\mathbf{y }}_k \rightarrow \widetilde{\mathbf{y }} \in \partial \mathcal {B}\). We break into cases.
-
Case A: \(\widetilde{\mathbf{y }} = \mathbf{y} \).
Either \(\mathbf{d} = 0 \in TC_\mathbf{y }(\partial \mathcal {B})\), or \(\mathbf{d} \ne 0\), in which case \(\beta _k \rightarrow \infty \) and so \(\tau _k := 1 / \beta _k \rightarrow 0\) as \(k \rightarrow \infty \). In the latter case, the sequences \((\widetilde{\mathbf{y }}_k)_{k=1}^{\infty } \subseteq \partial \mathcal {B}\) and \((\tau _k)_{k=1}^{\infty }\subseteq \mathbb {R}_{>0}\) witness \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\).
-
Case B: \(\widetilde{\mathbf{y }} \ne \mathbf{y} \).
Here, \(\widetilde{\mathbf{y }}_k - \mathbf{y} \rightarrow \widetilde{\mathbf{y }} - \mathbf{y} \ne 0\) while \(\beta _k (\widetilde{\mathbf{y }}_k - \mathbf{y} )\) converges, so the limit \(\lim _{k \rightarrow \infty } \beta _k =: \beta \in \mathbb {R}_{\ge 0}\) exists, and \(\mathbf{d} = \beta (\widetilde{\mathbf{y }} - \mathbf{y} )\). If \(\beta =0\), then \(\mathbf{d} = 0 \in TC_\mathbf{y }(\partial \mathcal {B})\). Suppose \(\beta \ne 0\). Let the line segment joining \(\widetilde{\mathbf{y }}\) and \(\mathbf{y} \) be
$$\begin{aligned} \texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} := \{\alpha \widetilde{\mathbf{y }} + (1 - \alpha )\mathbf{y} \in \mathbb {R}^D : \alpha \in [0,1]\}. \end{aligned}$$
By convexity, \(\texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} \subseteq \mathcal {B}\). We claim \(\texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} \subseteq \partial \mathcal {B}\). Assume not. That is,
$$\begin{aligned} \exists \, \alpha \in (0,1) \text { such that } \mathbf{z} := \alpha \widetilde{\mathbf{y }} + (1 - \alpha ) \mathbf{y} \in \mathcal {B}^{\circ }. \end{aligned}$$
But then,
$$\begin{aligned} \mathbf{d} \, = \, \beta (\widetilde{\mathbf{y }} - \mathbf{y} ) \, = \, (\beta / \alpha ) (\mathbf{z} - \mathbf{y} ) \, \in \, \mathbb {R}_{>0}\left( \mathcal {B}^{\circ } - \mathbf{y} \right) \, \subseteq \, \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }. \end{aligned}$$
This contradicts \(\mathbf{d} \in \partial \left( TC_\mathbf{y }(\mathcal {B}) \right) \) (see Eq. (94)). So, indeed \(\texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} \subseteq \partial \mathcal {B}\). Now, define
$$\begin{aligned} \widehat{\mathbf{y }}_k := \frac{1}{k} \widetilde{\mathbf{y }} + \left( 1 - \frac{1}{k}\right) \mathbf{y} \in \partial \mathcal {B} \quad \text {and} \quad \tau _k := \frac{1}{\beta k} \in \mathbb {R}_{>0}. \end{aligned}$$
Then, \(\frac{\widehat{\mathbf{y }}_k - \mathbf{y} }{\tau _k} = \beta (\widetilde{\mathbf{y }} - \mathbf{y} ) = \mathbf{d} \) for each k, and the sequences \((\widehat{\mathbf{y }}_k)_{k=1}^{\infty }\) and \((\tau _k)_{k=1}^{\infty }\) witness \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\).
In all cases, we have verified \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\). This gives LHS \(\subseteq \) RHS in (14).
\(\underline{\mathbf{Step~2:~LHS} \supseteq \mathbf{RHS.}}\) Let \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\). By the definition of tangent cones (12), \(\mathbf{d} = \lim _{k \rightarrow \infty } \tau _k^{-1} \left( \widetilde{\mathbf{y }}_k - \mathbf{y} \right) \) for some \(\tau _{k} \in \mathbb {R}_{>0}\) and \(\widetilde{\mathbf{y }}_k \in \partial \mathcal {B}\) with \(\tau _k \rightarrow 0\) and \(\widetilde{\mathbf{y }}_k \rightarrow \mathbf{y} \) as \(k \rightarrow \infty \). By (94), we need to show \(\mathbf{d} \notin \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }\).
First, we will prove \(\texttt {conv}\{\mathbf{d }+\mathbf{y} , \mathbf{y }\} \cap \mathcal {B}^{\circ } = \emptyset \). Assume not, i.e.,
Let \({\widehat{\tau }}_k = \tau _k / \alpha \in \mathbb {R}_{>0}\), so that
Since \(\mathcal {B}^{\circ }\) is open, there exists \(\delta > 0\) with
By Eq. (95), there exists K such that for all \(k \ge K\),
On the other hand, it is easy to see for each \(\mathbf{w} \in \mathcal {B}^{\circ }\),
for some \(\mathbf{w} ' \in \partial \mathcal {B}\), using convexity and compactness of \(\mathcal {B}\). In addition,
using \(\mathbf{w} \in \texttt {conv}\{\mathbf{y }, \mathbf{w '}\}\), \(\Vert \mathbf{w} \Vert _{\mathcal {B}}< 1\), and the triangle inequality for \(\Vert \cdot \Vert _{\mathcal {B}}\). Clearly,
Now, let \(\epsilon := \min _\mathbf{w \in \mathcal {N}} \Vert \mathbf{w} - \mathbf{y} \Vert _2\). Note \(\epsilon > 0\). For each \(k \ge K\), we apply (96), (97) to \(\mathbf{w} = {\widehat{\tau }}_k^{~-1}(\widetilde{\mathbf{y }}_k - \mathbf{y} ) + \mathbf{y} \in \mathcal {N}\). Then, \(\mathbf{w} ' = \widetilde{\mathbf{y }}_k\). By (98),
But (99) contradicts \(\widetilde{\mathbf{y }}_k \rightarrow \mathbf{y} \) as \(k \rightarrow \infty \). Therefore, \(\texttt {conv}\{\mathbf{d }+\mathbf{y} ,\mathbf{y }\} \cap \mathcal {B}^{\circ } = \emptyset \).
Translating by \(-\mathbf{y} \), \(\texttt {conv}\{\mathbf{d }, 0\} \cap (\mathcal {B} - \mathbf{y} )^{\circ } = \emptyset \). By this and convexity, it follows there exists a properly separating hyperplane:
In particular,
Also, for any open neighborhood \(\mathcal {D} \subseteq \mathbb {R}^D\) with \(\mathbf{d} \in \mathcal {D}\),
We conclude \(\mathbf{d} \notin \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }\), as desired. This gives \(\mathbf{d} \in \partial \left( TC_\mathbf{y }(\mathcal {B})\right) \), and LHS \(\supseteq \) RHS in Eq. (14), completing the proof of the lemma. \(\square \)
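As a concrete sanity check of the lemma (our illustration), consider the \(\ell ^1\) unit ball \(\mathcal {B} = \{\mathbf{x} \in \mathbb {R}^2 : |x_1| + |x_2| \le 1\}\) and the vertex \(\mathbf{y} = (1, 0)\). For small \(t > 0\), \(\mathbf{y} + t\mathbf{d} \in \mathcal {B}\) if and only if \(d_1 + |d_2| \le 0\), so
$$\begin{aligned} TC_\mathbf{y }(\mathcal {B}) = \{\mathbf{d} \in \mathbb {R}^2 : d_1 + |d_2| \le 0\}, \qquad \partial \left( TC_\mathbf{y }(\mathcal {B})\right) = \{\mathbf{d} \in \mathbb {R}^2 : d_1 = -|d_2|\}. \end{aligned}$$
On the other hand, near \(\mathbf{y} \) the boundary \(\partial \mathcal {B}\) consists of the two segments \(x_1 + |x_2| = 1\), \(x_1 > 0\), whose tangent cone at \(\mathbf{y} \) is exactly \(\{\mathbf{d} \in \mathbb {R}^2 : d_1 = -|d_2|\}\), in agreement with Eq. (14).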
Proof of Proposition 5
For item 1, we first note that \({\text {grad}}\Vert \cdot \Vert _{\mathcal {B}}(\widehat{\mathbf{a }})\) is nonzero, since the directional derivative of the norm function at \(\widehat{\mathbf{a }}\) in the direction \(\widehat{\mathbf{a }}\) is nonzero. Indeed, the function \(\mathbb {R} \rightarrow \mathbb {R}\), \(\lambda \mapsto \Vert \widehat{\mathbf{a }} + \lambda \widehat{\mathbf{a }} \Vert _{\mathcal {B}}\), has derivative \(\Vert \widehat{\mathbf{a }} \Vert _{\mathcal {B}} = 1\) at \(\lambda = 0\), by homogeneity of \(\Vert \cdot \Vert _{\mathcal {B}}\) under positive scaling. Item 1 now follows immediately from [51, Thm. 3.15] and the paragraph preceding it in that reference, which notes that metric regularity is implied by linear independence of the gradients.
For item 2, note that since \(\Vert \cdot \Vert _{\mathcal {B}}\) is \(C^1\) around \(L_\mathbf{p }({\widehat{\mathbf{s }}})\), homogeneity of the norm implies it is also \(C^1\) around \(L_\mathbf{p }({\widehat{\mathbf{s }}}) / \Vert L_\mathbf{p }({\widehat{\mathbf{s }}}) \Vert _{\mathcal {B}}\), and it holds that
Thus, item 1 applies and implies that the tangent cone on the right-hand side of Eq. (15) is the hyperplane normal to \(L_\mathbf{p }({\widehat{\mathbf{s }}})\). We finish by setting the inner product of \({\text {grad}}\Vert \cdot \Vert _{\mathcal {B}}(L_\mathbf{p }({\widehat{\mathbf{s }}}))\) with the LHS of Eq. (15) to zero, and solving for \(\eta \). \(\square \)
Proof of Lemma 8
Given \(\mathcal {M}\) and \(\mathcal {B}\), we need to show that there exists a positive constant C (independent of \(\mathbf{p} , \xi \)) such that for all \(\mathbf{p} \in \mathcal {M}\) and all vectors \(\xi \in T_\mathbf{p }\mathcal {M}\) we have
To this end, use linearity of the integral to rewrite the left-hand side of (100) as
By the equivalence of norms on \(\mathbb {R}^D\), there exists a positive constant c such that for all \(\mathbf{v} \in \mathbb {R}^D\) we have that \(\Vert \mathbf{v} \Vert _{2} \le c\) implies \(\Vert \mathbf{v} \Vert _{\mathcal {B}} \le 1\). In particular, the domain of integration in Eq. (101) is inner-approximated by
Since the integrand in (101) is non-negative, it follows
Since \(L_\mathbf{p }\) is an isometry, the right-hand side is
Using rotational symmetry of Euclidean balls, this equals
where \(s_1\) denotes the first coordinate of \(\mathbf{s} \) with respect to the fixed orthonormal basis on \(T_\mathbf{p }\mathcal {M}\) (Sect. 3.1). Now note that the parenthesized quantity in (102) is a positive constant C depending only on c and the manifold dimension d. By the preceding estimates, it satisfies the bound (100), as desired. \(\square \)
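For intuition, such a constant admits a closed form. Here is a short computation, under the assumption (suggested by the surrounding text) that the parenthesized quantity in (102) equals the second moment \(\int _{\Vert \mathbf{s} \Vert _2 \le c} s_1^2 \, d\mathbf{s} \) over the radius-c Euclidean ball in \(\mathbb {R}^d \cong T_\mathbf{p }\mathcal {M}\):
$$\begin{aligned} \int _{\Vert \mathbf{s} \Vert _2 \le c} s_1^2 \, d\mathbf{s} \, = \, \frac{1}{d} \int _{\Vert \mathbf{s} \Vert _2 \le c} \Vert \mathbf{s} \Vert _2^2 \, d\mathbf{s} \, = \, \frac{1}{d} \int _{r=0}^{c} r^2 \cdot d \, V_d \, r^{d-1} \, dr \, = \, \frac{V_d \, c^{d+2}}{d+2}, \qquad V_d = \frac{\pi ^{d/2}}{\varGamma (\frac{d}{2}+1)}, \end{aligned}$$
where the first equality uses the symmetry among the coordinates of \(\mathbf{s} \), and the second uses polar coordinates (the sphere of radius r in \(\mathbb {R}^d\) has surface measure \(d \, V_d \, r^{d-1}\)). In particular, the constant is positive and depends only on c and d.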
Proof of Proposition 9
-
1.
Denote the function (37) by \(F : \mathcal {M} \rightarrow {\text {Sym}}^2(T\mathcal {M})\). Let \((\mathbf{p} _k)_{k=1}^{\infty } \subseteq \mathcal {M}\) be a sequence converging to \(\mathbf{p} \in \mathcal {M}\). To move to one fixed space, we identify tangent spaces using the Levi-Civita connection on \(\mathcal {M}\). After choosing a smooth path \(\gamma :[0,1] \rightarrow \mathcal {M}\) such that \(\gamma (\tfrac{1}{k}) = \mathbf{p} _k\) for each \(k \ge 1\) and \(\gamma (0)=\mathbf{p} \), parallel transport along \(\gamma \) gives isometries \(\tau _{k}: T_\mathbf{p }\mathcal {M} \rightarrow T_\mathbf{p _k}\mathcal {M}\). Furthermore, \(\tau _k\) converges to the identity map on \(T_\mathbf{p }\mathcal {M}\), as elements of \((T\mathcal {M})^* \otimes T\mathcal {M}\), as \(k \rightarrow \infty \).
We want to show \(F(\mathbf{p} _k) \rightarrow F(\mathbf{p} )\) in \({\text {Sym}}^2(T\mathcal {M})\). It suffices to show \((\tau _k^{-1} \otimes \tau _k^{-1})(F(\mathbf{p} _k)) \rightarrow F(\mathbf{p} )\) in \({\text {Sym}}^2(T_\mathbf{p }\mathcal {M})\) (last sentence of the previous paragraph). Changing variables \(\mathbf{s} \leftarrow \tau _k^{-1}(\mathbf{s} )\) and using that \(\tau _k\) is an isometry, we have
$$\begin{aligned} (\tau _k^{-1} \otimes \tau _k^{-1})(F(\mathbf{p} _k)) = \frac{1}{2} \int _\mathbf{s \in T_\mathbf{p }\mathcal {M} : \Vert L_\mathbf{p _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}} \le 1} \mathbf{s} \mathbf{s} ^{\top } d\mathbf{s} . \end{aligned}$$Write this as
$$\begin{aligned} \int _\mathbf{s \in T_\mathbf{p }\mathcal {M}} \mathbbm {1}( \Vert L_\mathbf{p _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top } d\mathbf{s} \, \in \, {\text {Sym}}^2(T_\mathbf{p }\mathcal {M}). \end{aligned}$$Compare this to
$$\begin{aligned} F(\mathbf{p} ) = \int _\mathbf{s \in T_\mathbf{p }\mathcal {M}} \mathbbm {1}(\Vert L_\mathbf{p }(\mathbf{s} ) \Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top } d\mathbf{s} \, \in \, {\text {Sym}}^2(T_\mathbf{p }\mathcal {M}). \end{aligned}$$Since \(L_\mathbf{p _k} \rightarrow L_\mathbf{p }\), \(\tau _k \rightarrow {\text {Id}}_{T_\mathbf{p }\mathcal {M}}\) and \(\Vert \cdot \Vert _{\mathcal {B}}\) is continuous on \(\mathbb {R}^D\), for each \(\mathbf{s} \in T_\mathbf{p }\mathcal {M}\) there is the pointwise convergence:
$$\begin{aligned} \mathbbm {1}( \Vert L_\mathbf{p _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}}< 1) \, \mathbf{s} \mathbf{s} ^{\top } \longrightarrow \mathbbm {1}(\Vert L_\mathbf{p }(\mathbf{s} )\Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top }. \end{aligned}$$Also, letting \(c \in \mathbb {R}_{>0}\) be a constant such that \(\Vert \mathbf{u} \Vert _{2} \le c \Vert \mathbf{u} \Vert _{\mathcal {B}}\) for all \(\mathbf{u} \in \mathbb {R}^D\), we have the uniform bound:
$$\begin{aligned} \Vert \mathbbm {1}( \Vert L_\mathbf{p _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top }\Vert _F \le c^2 \quad \text {for all } \mathbf{s} \in T_\mathbf{p }\mathcal {M} \text { and } k \ge 1, \end{aligned}$$since \(L_\mathbf{p }\) and \(\tau _k\) are both isometries. Hence, the bounded convergence theorem is applicable, and implies \((\tau _k^{-1} \otimes \tau _k^{-1})(F(\mathbf{p} _k)) \rightarrow F(\mathbf{p} )\).
-
2.
Denote the function (38) by \(G : \mathcal {M} \rightarrow T\mathcal {M}\). Let \(\mathbf{p} \) be a point satisfying the stated assumption. There exists an open neighborhood \(\mathcal {U}\) of \(\mathbf{p} \) in \(\mathcal {M}\) such that for each \(\mathbf{p} _* \in \mathcal {U}\) the norm \(\Vert \cdot \Vert _{\mathcal {B}}\) is \(C^1\) in some neighborhood of \(L_\mathbf{p _*}(T_\mathbf{p _*}\mathcal {M}) \cap \mathbb {S}^{D-1}\). Let \((\mathbf{p} _k)_{k=1}^{\infty } \subseteq \mathcal {U}\) be a sequence converging to \(\mathbf{p} \). Identifying tangent spaces as above, it suffices to show \(\tau _k^{-1}(G(\mathbf{p} _k)) \rightarrow G(\mathbf{p} )\).
By the local \(C^1\) assumption, item 2 of Proposition 5 applies and gives
$$\begin{aligned} G(\mathbf{p} ) = \int _{{\widehat{\mathbf{s }}} \in T_\mathbf{p }\mathcal {M} : \Vert {\widehat{\mathbf{s }}}\Vert _2=1} -{\widehat{\mathbf{s }}} \Vert L_\mathbf{p }({\widehat{\mathbf{s }}}) \Vert ^{-d-2}_{\mathcal {B}} \frac{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_\mathbf{p }({\widehat{\mathbf{s }}})), \, \tfrac{1}{2} Q_\mathbf{p }({\widehat{\mathbf{s }}}) \right\rangle }{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_\mathbf{p }({\widehat{\mathbf{s }}})), \, L_\mathbf{p }({\widehat{\mathbf{s }}}) \right\rangle } \, d{\widehat{\mathbf{s }}}. \end{aligned}$$Likewise, by a change of variables using that \(\tau _k^{-1}\) preserves unit spheres:
$$\begin{aligned}&\tau _k^{-1}(G(\mathbf{p} _k))= \\&\int _{{\widehat{\mathbf{s }}} \in T_\mathbf{p }\mathcal {M} : \Vert {\widehat{\mathbf{s }}}\Vert _2=1} -{\widehat{\mathbf{s }}} \Vert L_\mathbf{p _k}(\tau _k({\widehat{\mathbf{s }}})) \Vert ^{-d-2}_{\mathcal {B}} \frac{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_\mathbf{p _k}(\tau _k({\widehat{\mathbf{s }}}))), \, \tfrac{1}{2} Q_\mathbf{p _k}(\tau _k({\widehat{\mathbf{s }}})) \right\rangle }{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_\mathbf{p _k}(\tau _k({\widehat{\mathbf{s }}}))), \, L_\mathbf{p _k}(\tau _k({\widehat{\mathbf{s }}})) \right\rangle } d{\widehat{\mathbf{s }}}. \end{aligned}$$Boundedness and pointwise convergence hold since \({\text {grad}} \Vert \cdot \Vert _{\mathcal {B}}\) is locally continuous. So the bounded convergence theorem implies the first sentence in the statement. The second sentence follows from the example in Sect. 3.7. \(\square \)
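To make these objects concrete, here is a minimal numerical sketch of the function F from item 1, specialized to the circle example of Sect. 3.7 (the unit circle in \(\mathbb {R}^2\) with a weighted \(\ell ^1\) norm; the weights below are arbitrary choices). Since \(d = 1\), the matrix \(\mathbf{s} \mathbf{s} ^{\top }\) reduces to the scalar \(s^2\):

```python
import numpy as np

w1, w2 = 2.0, 1.0                       # assumed weights for the weighted l^1 norm
norm_B = lambda x: w1 * abs(x[0]) + w2 * abs(x[1])

def F_numeric(theta, n_grid=100_000):
    # F(p) = (1/2) * integral of s^2 over {s in R : ||s * t(theta)||_B <= 1},
    # where t(theta) = (-sin(theta), cos(theta)) spans T_p(M) at p = (cos(theta), sin(theta)).
    t = (-np.sin(theta), np.cos(theta))
    r = 1.0 / norm_B(t)                 # by homogeneity, the sub-level set is [-r, r]
    s = np.linspace(-r, r, n_grid)
    return 0.5 * np.sum(s**2) * (s[1] - s[0])   # Riemann sum

def F_closed_form(theta):
    # Direct evaluation: (1/2) * int_{-r}^{r} s^2 ds = r^3 / 3.
    r = 1.0 / (w1 * abs(np.sin(theta)) + w2 * abs(np.cos(theta)))
    return r**3 / 3.0

for theta in (0.1, 0.7, 1.3):
    print(f"{theta:.1f}  {F_numeric(theta):.6f}  {F_closed_form(theta):.6f}")
```

Sweeping \(\theta \) over \([0, 2\pi ]\) shows that F varies continuously, as item 1 asserts.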
Tail Bounds and Absolute Moments of the Gaussian
We recall some basic properties of the Gaussian. As in Sect. 4.2, \(\kappa _{\sigma }(s) := \frac{2s}{\sigma ^2}e^{-s^2/\sigma ^2}\).
-
For each even \(k \ge 0\) and \(\delta \ge 0\), by substitution and then integration by parts k/2 times,
$$\begin{aligned}&\int _{s=\delta }^{\infty } s^k \kappa _{\sigma }(s) ds \nonumber \\&= \sigma ^k e^{-\delta ^2/\sigma ^2} \left( \left( \frac{\delta ^2}{\sigma ^2}\right) ^{\frac{k}{2}} + \, \frac{k}{2} \left( \frac{\delta ^2}{\sigma ^2}\right) ^{\frac{k}{2}-1} + \, \frac{k}{2}\left( \frac{k}{2} - 1\right) \left( \frac{\delta ^2}{\sigma ^2}\right) ^{\frac{k}{2}-2} + \cdots + \left( \frac{k}{2}\right) ! \right) \nonumber \\&= e^{-\delta ^2/\sigma ^2}\, \mathrm {poly}(\sigma , \delta ). \end{aligned}$$(103)
-
For each odd \(k \ge 0\) and \(\delta > 0\), using \(s/\delta \ge 1\) for \(s \in [\delta , \infty )\) and Eq. (103),
$$\begin{aligned} \int _{s=\delta }^{\infty } s^k \kappa _{\sigma }(s) ds \le (1/\delta ) \int _{s=\delta }^{\infty } s^{k+1} \kappa _{\sigma }(s) ds = e^{-\delta ^2/\sigma ^2} (1/\delta ) \, \mathrm {poly}(\sigma , \delta ). \end{aligned}$$(104)
-
For each \(k \ge 0\), from [67, Equation 18],
$$\begin{aligned} \int _{s=0}^{\infty } s^k \kappa _{\sigma }(s) ds = \sigma ^k \varGamma \left( \tfrac{k+2}{2}\right) . \end{aligned}$$(105)
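These identities are straightforward to check numerically. Below is a small verification sketch in Python (the particular values of \(\sigma \), \(\delta \), and k are arbitrary):

```python
import numpy as np
from math import gamma, factorial
from scipy.integrate import quad

sigma, delta = 0.7, 1.3
kappa = lambda s: (2 * s / sigma**2) * np.exp(-(s / sigma) ** 2)

# Eq. (105): int_0^inf s^k kappa_sigma(s) ds = sigma^k * Gamma((k+2)/2).
for k in range(5):
    lhs, _ = quad(lambda s: s**k * kappa(s), 0, np.inf)
    print(k, lhs, sigma**k * gamma((k + 2) / 2))

# Eq. (103) for even k: the tail integral equals
# sigma^k * exp(-u) * sum_{j=0}^{k/2} ((k/2)!/j!) * u^j, with u = delta^2/sigma^2.
k, u = 4, delta**2 / sigma**2
lhs, _ = quad(lambda s: s**k * kappa(s), delta, np.inf)
rhs = sigma**k * np.exp(-u) * sum(factorial(k // 2) / factorial(j) * u**j
                                  for j in range(k // 2 + 1))
print(lhs, rhs)
```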
Numerical Estimation of One-Dimensional Eigenfunctions
The eigenfunctions of the limiting operator are of key interest for manifold learning methods in general. For the circle example (Sect. 3.7), these are the functions \(\varphi :[0,2\pi ] \rightarrow \mathbb {R}\) that solve the generalized Helmholtz boundary value problem
$$\begin{aligned} \varDelta _{\mathcal {M}, \mathcal {B}} \, \varphi = \lambda \varphi , \end{aligned}$$(106)
where \(\varDelta _{\mathcal {M}, \mathcal {B}}\) is the limiting Laplacian-like differential operator in Eq. (42), subject to the periodic boundary conditions
$$\begin{aligned} \varphi (0) = \varphi (2\pi ), \qquad \varphi '(0) = \varphi '(2\pi ). \end{aligned}$$
Figure 8 shows numerically computed solutions of Eq. (106) for different choices of \(w_1, w_2\). Notice the eigenfunctions are oscillatory, as dictated by Sturm-Liouville theory [1].
Fig. 8 Eigenfunctions. The five eigenfunctions with eigenvalues smallest in magnitude for the weighted \(\ell _1\) Laplacian on the unit circle (42), computed numerically. In these plots, \(w_2 = 1\), and the choices \(w_1 \in \{1, 2, 4, 8\}\) are displayed from top to bottom.
We now describe the numerical computation of these limiting eigenfunctions. We used a standard finite-difference scheme where the first derivative was replaced by the symmetrized difference
$$\begin{aligned} f'(\theta _i) \approx \frac{f(\theta _{i+1}) - f(\theta _{i-1})}{2h}, \end{aligned}$$(107)
and the second derivative by
$$\begin{aligned} f''(\theta _i) \approx \frac{f(\theta _{i+1}) - 2f(\theta _i) + f(\theta _{i-1})}{h^2}. \end{aligned}$$(108)
In these equations, f is taken to be a cyclic function defined over the discrete range \(\theta _i = 2\pi i / n\), \(i = 0, 1, \ldots , n-1\), with grid spacing \(h = 2\pi / n\) and indices taken modulo n.
To compute the solutions, we formed a sparse \(n \times n\) matrix L that corresponds to the finite-difference operator obtained by substituting (107) and (108) into the first- and second-derivative terms of Eq. (106). The eigenvalues and eigenvectors of L were found using the function scipy.sparse.linalg.eigs() from the SciPy package [64], which wraps the ARPACK library for large-scale eigenvalue problems [33]. Recall that in our problem all eigenvalues are non-positive. To obtain the eigenvalues smallest in magnitude and their corresponding eigenvectors, we used the eigs() function in shift-invert mode with \(\sigma =1\). The particular choice of \(\sigma \) did not seem to matter much as long as \(\sigma > 0\); however, choosing \(\sigma =0\) resulted in instabilities and convergence errors. This is because shift-invert mode finds solutions of \((L-\sigma I)^{-1}\mathbf{x}= \lambda \mathbf{x}\), and since zero is an eigenvalue of L, the choice \(\sigma =0\) entails inverting an ill-conditioned matrix. The use of sparse matrices allows one to take large values of n, since applying the finite-difference operator defined above costs only O(n).
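For concreteness, here is a minimal self-contained sketch of this pipeline in Python. The coefficient functions a and b below are placeholders standing in for the actual second- and first-derivative coefficients of the operator in Eq. (42); everything else follows the scheme just described:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigs

n = 4096                                  # grid size; applying L costs O(n)
h = 2 * np.pi / n
theta = np.arange(n) * h                  # theta_i = 2*pi*i/n

# Placeholder coefficients (stand-ins for those of Eq. (42)).
a = 1.0 + 0.5 * np.cos(theta) ** 2        # multiplies f''
b = 0.5 * np.sin(2 * theta)               # multiplies f'

# Cyclic shift matrices: (S_plus f)_i = f_{i+1 mod n}, (S_minus f)_i = f_{i-1 mod n}.
S_plus = sp.diags([np.ones(n - 1), [1.0]], offsets=[1, -(n - 1)], format="csc")
S_minus = S_plus.T
I = sp.identity(n, format="csc")

D1 = (S_plus - S_minus) / (2 * h)         # symmetrized difference, Eq. (107)
D2 = (S_plus - 2 * I + S_minus) / h**2    # second difference, Eq. (108)
L = sp.diags(a) @ D2 + sp.diags(b) @ D1   # sparse finite-difference operator

# Shift-invert mode around sigma = 1 returns the eigenvalues closest to sigma,
# i.e., the non-positive eigenvalues smallest in magnitude; sigma = 0 is avoided
# since 0 is an eigenvalue of L.
vals, vecs = eigs(L, k=5, sigma=1.0, which="LM")
order = np.argsort(np.abs(vals))
print(vals[order].real)
```

With the actual coefficients of Eq. (42) substituted for a and b, the columns of vecs, plotted against theta, approximate the eigenfunctions shown in Fig. 8.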
Cite this article
Kileel, J., Moscovich, A., Zelesko, N. et al. Manifold Learning with Arbitrary Norms. J Fourier Anal Appl 27, 82 (2021). https://doi.org/10.1007/s00041-021-09879-2
Keywords
- Dimensionality reduction
- Diffusion maps
- Laplacian eigenmaps
- Second-order differential operator
- Riemannian geometry
- Convex body