Manifold Learning with Arbitrary Norms

Abstract

Manifold learning methods play a prominent role in nonlinear dimensionality reduction and other tasks involving high-dimensional data sets with low intrinsic dimensionality. Many of these methods are graph-based: they associate a vertex with each data point and a weighted edge with each pair. Existing theory shows that the Laplacian matrix of the graph converges to the Laplace–Beltrami operator of the data manifold, under the assumption that the pairwise affinities are based on the Euclidean norm. In this paper, we determine the limiting differential operator for graph Laplacians constructed using any norm. Our proof involves an interplay between the second fundamental form of the manifold and the convex geometry of the given norm’s unit ball. To demonstrate the potential benefits of non-Euclidean norms in manifold learning, we consider the task of mapping the motion of large molecules with continuous variability. In a numerical simulation we show that a modified Laplacian eigenmaps algorithm, based on the Earthmover’s distance, outperforms the classic Euclidean Laplacian eigenmaps, both in terms of computational cost and the sample size needed to recover the intrinsic geometry.

Notes

  1. Compactness of \(\mathcal {M}\) and the Hopf–Rinow theorem imply that \(\exp _\mathbf{p }\) is defined on the entire tangent space \(T_\mathbf{p } \mathcal {M}\).

References

  1. Al-Gwaiz, M.: Sturm-Liouville Theory and Its Applications. Springer, London (2008)

  2. Bates, J.: The embedding dimension of Laplacian eigenfunction maps. Appl. Comput. Harmon. Anal. 37(3), 516–530 (2014). https://doi.org/10.1016/j.acha.2014.03.002

  3. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput. 15(6), 1373–1396 (2003). https://doi.org/10.1162/089976603321780317

  4. Belkin, M., Niyogi, P.: Semi-supervised learning on Riemannian manifolds. Mach. Learn. 56(1–3), 209–239 (2004). https://doi.org/10.1023/B:MACH.0000033120.25363.1e

  5. Belkin, M., Niyogi, P.: Convergence of Laplacian eigenmaps. In: Neural Information Processing Systems (NIPS) (2007). https://doi.org/10.7551/mitpress/7503.003.0021

  6. Belkin, M., Niyogi, P.: Towards a theoretical foundation for Laplacian-based manifold methods. J. Comput. Syst. Sci. 74(8), 1289–1308 (2008). https://doi.org/10.1016/j.jcss.2007.08.006

  7. Bellet, A., Habrard, A., Sebban, M.: Metric learning. Synth. Lect. Artif. Intell. Mach. Learn. 9(1), 1–151 (2015). https://doi.org/10.2200/S00626ED1V01Y201501AIM030

  8. Bendory, T., Bartesaghi, A., Singer, A.: Single-particle cryo-electron microscopy: mathematical theory, computational challenges, and opportunities. IEEE Signal Process. Mag. 37(2), 58–76 (2020). https://doi.org/10.1109/MSP.2019.2957822

  9. Cheng, M.Y., Wu, H.T.: Local linear regression on manifolds and its geometric interpretation. J. Am. Stat. Assoc. 108(504), 1421–1434 (2013). https://doi.org/10.1080/01621459.2013.827984

  10. Coifman, R.R., Lafon, S.: Diffusion maps. Appl. Comput. Harmon. Anal. 21(1), 5–30 (2006). https://doi.org/10.1016/j.acha.2006.04.006

  11. Coifman, R.R., Leeb, W.: Earth mover’s distance and equivalent metrics for spaces with hierarchical partition trees. Tech. rep., Yale University (2013)

  12. Coifman, R.R., Lafon, S., Lee, A.B., Maggioni, M., Nadler, B., Warner, F., Zucker, S.W.: Geometric diffusions as a tool for harmonic analysis and structure definition of data: diffusion maps. Proc. Natl. Acad. Sci. 102(21), 7426–7431 (2005). https://doi.org/10.1073/pnas.0500334102

  13. Dashti, A., et al.: Trajectories of the ribosome as a Brownian nanomachine. Proc. Natl. Acad. Sci. 111(49), 17492–17497 (2014). https://doi.org/10.1073/pnas.1419276111

  14. Dashti, A., et al.: Retrieving functional pathways of biomolecules from single-particle snapshots. Nat. Commun. 11(1), 4734 (2020). https://doi.org/10.1038/s41467-020-18403-x

  15. Donoho, D.L., Grimes, C.: Hessian eigenmaps: locally linear embedding techniques for high-dimensional data. Proc. Natl. Acad. Sci. 100(10), 5591–5596 (2003). https://doi.org/10.1073/pnas.1031596100

  16. Frank, J.: New opportunities created by single-particle cryo-EM: the mapping of conformational space. Biochemistry 57(6), 888 (2018). https://doi.org/10.1021/acs.biochem.8b00064

  17. Frank, J., Ourmazd, A.: Continuous changes in structure mapped by manifold embedding of single-particle data in cryo-EM. Methods 100, 61–67 (2016). https://doi.org/10.1016/j.ymeth.2016.02.007

  18. García Trillos, N., Slepčev, D.: A variational approach to the consistency of spectral clustering. Appl. Comput. Harmon. Anal. 45(2), 239–281 (2018). https://doi.org/10.1016/j.acha.2016.09.003

  19. García Trillos, N., Gerlach, M., Hein, M., Slepčev, D.: Error estimates for spectral convergence of the graph Laplacian on random geometric graphs toward the Laplace-Beltrami operator. Found. Comput. Math. 20(4), 827–887 (2020). https://doi.org/10.1007/s10208-019-09436-w

  20. Gavish, M., Nadler, B., Coifman, R.R.: Multiscale wavelets on trees, graphs and high dimensional data: theory and applications to semi supervised learning. In: International Conference on Machine Learning (ICML) (2010)

  21. Giné, E., Koltchinskii, V.: Empirical graph Laplacian approximation of Laplace-Beltrami operators: large sample results. In: High Dimensional Probability, vol. 51, pp. 238–259. Institute of Mathematical Statistics, Beachwood, Ohio, USA (2006). https://doi.org/10.1214/074921706000000888

  22. Glaeser, R.M., Nogales, E., Chiu, W. (eds.): Single-Particle Cryo-EM of Biological Macromolecules. IOP Publishing (2021). https://doi.org/10.1088/978-0-7503-3039-8

  23. Goldberg, A.B., Zhu, X., Singh, A., Xu, Z., Nowak, R.: Multi-manifold semi-supervised learning. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 169–176 (2009)

  24. Hein, M., Audibert, J.Y., von Luxburg, U.: From graphs to manifolds—weak and strong pointwise consistency of graph Laplacians. In: International Conference on Computational Learning Theory (COLT), pp. 470–485 (2005). https://doi.org/10.1007/11503415_32

  25. Hein, M., Audibert, J.Y., von Luxburg, U.: Graph Laplacians and their convergence on random neighborhood graphs. J. Mach. Learn. Res. 8, 1325–1368 (2007)

  26. Hug, D., Weil, W.: Lectures on convex geometry. In: Graduate Texts in Mathematics, vol. 286. Springer International Publishing, Cham (2020). https://doi.org/10.1007/978-3-030-50180-8

  27. Jin, Q., et al.: Iterative elastic 3D-to-2D alignment method using normal modes for studying structural dynamics of large macromolecular complexes. Structure 22(3), 496–506 (2014). https://doi.org/10.1016/j.str.2014.01.004

  28. Lederman, R.R., Andén, J., Singer, A.: Hyper-molecules: on the representation and recovery of dynamical structures for applications in flexible macro-molecules in cryo-EM. Inverse Prob. 36(4), 044005 (2020). https://doi.org/10.1088/1361-6420/ab5ede

  29. Lee, J.M.: Riemannian manifolds. In: Graduate Texts in Mathematics, vol. 176. Springer New York (1997). https://doi.org/10.1007/b98852

  30. Lee, J.M.: Introduction to smooth manifolds. In: Graduate Texts in Mathematics, vol. 218. Springer, New York (2012). https://doi.org/10.1007/978-1-4419-9982-5

  31. Lee, A.B., Izbicki, R.: A spectral series approach to high-dimensional nonparametric regression. Electron. J. Stat. 10(1), 423–463 (2016). https://doi.org/10.1214/16-EJS1112

  32. Lee, G., Gommers, R., Waselewski, F., Wohlfahrt, K., O’Leary, A.: PyWavelets: a Python package for wavelet analysis. J. Open Source Softw. 4(36), 1237 (2019). https://doi.org/10.21105/joss.01237

  33. Lehoucq, R.B., Sorensen, D.C., Yang, C.: ARPACK users’ guide. Soc. Ind. Appl. Math. (1998). https://doi.org/10.1137/1.9780898719628

  34. Liao, W., Maggioni, M., Vigogna, S.: Learning adaptive multiscale approximations to data and functions near low-dimensional sets. In: IEEE Information Theory Workshop (ITW), pp. 226–230. IEEE (2016). https://doi.org/10.1109/ITW.2016.7606829

  35. Lieu, L., Saito, N.: Signal ensemble classification using low-dimensional embeddings and earth mover’s distance. In: Wavelets and Multiscale Analysis, pp. 227–256. Birkhäuser Boston (2011). https://doi.org/10.1007/978-0-8176-8095-4_11

  36. Mallat, S.: A Wavelet Tour of Signal Processing, 3rd edn. Elsevier, New York (2009)

  37. McInnes, L., Healy, J., Saul, N., Großberger, L.: UMAP: uniform manifold approximation and projection. J. Open Source Softw. 3(29), 861 (2018). https://doi.org/10.21105/joss.00861

  38. Mishne, G., Talmon, R., Meir, R., Schiller, J., Lavzin, M., Dubin, U., Coifman, R.R.: Hierarchical coupled-geometry analysis for neuronal structure and activity pattern discovery. IEEE J. Select. Top. Signal Process. 10(7), 1238–1253 (2016). https://doi.org/10.1109/JSTSP.2016.2602061

  39. Mishne, G., Talmon, R., Cohen, I., Coifman, R.R., Kluger, Y.: Data-driven tree transforms and metrics. IEEE Trans. Signal Inf. Process. Netw. 4(3), 451–466 (2018). https://doi.org/10.1109/TSIPN.2017.2743561

  40. Monera, M.G., Montesinos-Amilibia, A., Sanabria-Codesal, E.: The Taylor expansion of the exponential map and geometric applications. Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales - Serie A 108(2), 881–906 (2014). https://doi.org/10.1007/s13398-013-0149-z

  41. Moscovich, A., Jaffe, A., Nadler, B.: Minimax-optimal semi-supervised regression on unknown manifolds. In: International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 933–942. PMLR (2017)

  42. Moscovich, A., Halevi, A., Andén, J., Singer, A.: Cryo-EM reconstruction of continuous heterogeneity by Laplacian spectral volumes. Inverse Prob. 36(2), 024003 (2020). https://doi.org/10.1088/1361-6420/ab4f55

  43. Nadler, B., Lafon, S., Coifman, R.R., Kevrekidis, I.G.: Diffusion maps, spectral clustering and eigenfunctions of Fokker–Planck operators. In: Neural Information Processing Systems (NIPS), pp. 955–962 (2005)

  44. Nakane, T., Kimanius, D., Lindahl, E., Scheres, S.H.: Characterisation of molecular motions in cryo-EM single-particle data by multi-body refinement in RELION. eLife 7, 1–18 (2018). https://doi.org/10.7554/eLife.36861

  45. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., Ferrin, T.E.: UCSF Chimera–a visualization system for exploratory research and analysis. J. Comput. Chem. 25(13), 1605–1612 (2004). https://doi.org/10.1002/jcc.20084

  46. Punjani, A., Fleet, D.J.: 3D variability analysis: resolving continuous flexibility and discrete heterogeneity from single particle cryo-EM. J. Struct. Biol. 213(2), 107702 (2021). https://doi.org/10.1016/j.jsb.2021.107702

  47. Rao, R., Moscovich, A., Singer, A.: Wasserstein K-means for clustering tomographic projections. In: Machine Learning for Structural Biology Workshop, NeurIPS (2020)

  48. Rosasco, L., Belkin, M., De Vito, E.: On learning with integral operators. J. Mach. Learn. Res. 11, 905–934 (2010)

  49. Rose, P., et al.: The RCSB protein data bank: integrative view of protein, gene and 3D structural information. Nucleic Acids Res. 45(D1), D271–D281 (2017). https://doi.org/10.1093/nar/gkw1000

  50. Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500), 2323–2326 (2000). https://doi.org/10.1126/science.290.5500.2323

  51. Ruszczyński, A.: Nonlinear Optimization. Princeton University Press, Princeton (2011). https://doi.org/10.2307/j.ctvcm4hcj

  52. Sathyanarayanan, N., Cannone, G., Gakhar, L., Katagihallimath, N., Sowdhamini, R., Ramaswamy, S., Vinothkumar, K.R.: Molecular basis for metabolite channeling in a ring opening enzyme of the phenylacetate degradation pathway. Nat. Commun. 10(1), 4127 (2019). https://doi.org/10.1038/s41467-019-11931-1

  53. Schwander, P., Fung, R., Ourmazd, A.: Conformations of macromolecules and their complexes from heterogeneous datasets. Philos. Trans. R. Soc. B 369(1647), 1–8 (2014). https://doi.org/10.1098/rstb.2013.0567

  54. Shirdhonkar, S., Jacobs, D.W.: Approximate earth mover’s distance in linear time. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8. IEEE (2008). https://doi.org/10.1109/CVPR.2008.4587662

  55. Singer, A.: From graph to manifold Laplacian: the convergence rate. Appl. Comput. Harmon. Anal. 21(1), 128–134 (2006). https://doi.org/10.1016/j.acha.2006.03.004

  56. Singer, A., Sigworth, F.J.: Computational methods for single-particle electron cryomicroscopy. Ann. Rev. Biomed. Data Sci. 3(1), 163–190 (2020). https://doi.org/10.1146/annurev-biodatasci-021020-093826

  57. Sober, B., Aizenbud, Y., Levin, D.: Approximation of functions over manifolds: a moving least-squares approach. J. Comput. Appl. Math. 383, 113140 (2021). https://doi.org/10.1016/j.cam.2020.113140

  58. Sorzano, C.O.S., et al.: Survey of the analysis of continuous conformational variability of biological macromolecules by electron microscopy. Acta Crystallogr. Sect. F 75(1), 19–32 (2019). https://doi.org/10.1107/S2053230X18015108

  59. Stock, D., Leslie, A., Walker, J.: Molecular architecture of the rotary motor in ATP synthase. Science 286(5445), 1700–1705 (1999). https://doi.org/10.1126/science.286.5445.1700

  60. Tagare, H.D., Kucukelbir, A., Sigworth, F.J., Wang, H., Rao, M.: Directly reconstructing principal components of heterogeneous particles from cryo-EM images. J. Struct. Biol. 191(2), 245–262 (2015). https://doi.org/10.1016/j.jsb.2015.05.007

  61. Tenenbaum, J.B., de Silva, V., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500), 2319–2323 (2000). https://doi.org/10.1126/science.290.5500.2319

  62. Ting, D., Huang, L., Jordan, M.: An analysis of the convergence of graph Laplacians. In: International Conference on Machine Learning (ICML) (2010)

  63. van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(86), 2579–2605 (2008)

  64. Virtanen, P., et al.: SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat. Methods 17(3), 261–272 (2020). https://doi.org/10.1038/s41592-019-0686-2

  65. von Luxburg, U.: A tutorial on spectral clustering. Stat. Comput. 17(4), 395–416 (2007). https://doi.org/10.1007/s11222-007-9033-z

  66. von Luxburg, U., Belkin, M., Bousquet, O.: Consistency of spectral clustering. Ann. Stat. 36(2), 555–586 (2008). https://doi.org/10.1214/009053607000000640

  67. Winkelbauer, A.: Moments and absolute moments of the normal distribution, pp. 1–4 (2012). arXiv:1209.4340v2

  68. Wormell, C.L., Reich, S.: Spectral convergence of diffusion maps: improved error bounds and an alternative normalization. SIAM J. Numer. Anal. 59(3), 1687–1734 (2021). https://doi.org/10.1137/20M1344093

  69. Yoshida, M., Muneyuki, E., Hisabori, T.: ATP synthase—a marvellous rotary engine of the cell. Nat. Rev. Mol. Cell Biol. 2(9), 669–677 (2001). https://doi.org/10.1038/35089509

  70. Zelesko, N., Moscovich, A., Kileel, J., Singer, A.: Earthmover-based manifold learning for analyzing molecular conformation spaces. In: IEEE International Symposium on Biomedical Imaging (ISBI) (2020). https://doi.org/10.1109/ISBI45749.2020.9098723

  71. Zhang, Z., Zha, H.: Principal manifolds and nonlinear dimensionality reduction via tangent space alignment. SIAM J. Sci. Comput. 26(1), 313–338 (2004). https://doi.org/10.1137/S1064827502419154

  72. Zhang, S., Moscovich, A., Singer, A.: Product manifold learning. In: International Conference on Artificial Intelligence and Statistics (AISTATS) (2021)

  73. Zhong, E.D., Bepler, T., Berger, B., Davis, J.H.: CryoDRGN: reconstruction of heterogeneous cryo-EM structures using neural networks. Nat. Methods 18(2), 176–185 (2021). https://doi.org/10.1038/s41592-020-01049-4


Acknowledgements

We thank Charles Fefferman, William Leeb, Eitan Levin and John Walker for enlightening discussions. Most of this work was performed while AM was affiliated with PACM at Princeton University. This research was supported by AFOSR FA9550-17-1-0291, ARO W911NF-17-1-0512, NSF BIGDATA IIS-1837992, the Simons Investigator Award, the Moore Foundation Data-Driven Discovery Investigator Award, the Simons Collaboration on Algorithms and Geometry, and start-up grants from the College of Natural Sciences and Oden Institute for Computational Engineering and Sciences at UT Austin.

Author information

Corresponding author

Correspondence to Joe Kileel.

Additional information

Communicated by Isaak Pesenson.

Appendices

Proof of Lemma 3

\(\underline{\mathbf{Step~1: LHS} \subseteq \mathbf{RHS}}\). By the identity (13) for tangent cones of convex sets, we have

$$\begin{aligned} TC_{\mathbf{y}}({\mathcal {B}}) = \overline{\mathbb {R}_{>0}({\mathcal {B}}- \mathbf{y})}. \end{aligned}$$

By definition of \(\partial \) and the fact that the relative interior of a convex set equals the relative interior of its closure, the LHS of Eq. (14) reads

$$\begin{aligned} \partial \left( TC_\mathbf{y }(\mathcal {B}) \right) = \overline{\mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} )} \setminus \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }. \end{aligned}$$
(94)

Let \(\mathbf{d} \in \partial \left( TC_\mathbf{y }(\mathcal {B}) \right) \). By Eq. (94), \(\mathbf{d} = \lim _{k \rightarrow \infty } \beta _k (\widetilde{\mathbf{y }}_k - \mathbf{y} )\) for some \(\beta _k \in \mathbb {R}_{>0}\) and \(\widetilde{\mathbf{y }}_k \in \mathcal {B}\). Without loss of generality, we assume \(\widetilde{\mathbf{y }}_k \in \partial \mathcal {B}\) for each k; by compactness of \(\partial \mathcal {B}\), after passing to a subsequence we may also assume \(\widetilde{\mathbf{y }}_k \rightarrow \widetilde{\mathbf{y }}\) for some \(\widetilde{\mathbf{y }} \in \partial \mathcal {B}\). We break into cases.

  • Case A: \(\widetilde{\mathbf{y }} = \mathbf{y} \).

    Either \(\mathbf{d} = 0 \in TC_\mathbf{y }(\partial \mathcal {B})\), or \(\beta _k \rightarrow \infty \), i.e., \(\tau _k := 1 / \beta _k \rightarrow 0\), as \(k \rightarrow \infty \). If the latter, the sequences \((\widetilde{\mathbf{y }}_k)_{k=1}^{\infty } \subseteq \partial \mathcal {B}\) and \((\tau _k)_{k=1}^{\infty }\subseteq \mathbb {R}_{>0}\) witness \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\).

  • Case B: \(\widetilde{\mathbf{y }} \ne \mathbf{y} \).

    Here, \(\lim _{k \rightarrow \infty } \beta _k =: \beta \in \mathbb {R}_{\ge 0}\) exists, and \(\mathbf{d} = \beta (\widetilde{\mathbf{y }} - \mathbf{y} )\). If \(\beta =0\), then \(\mathbf{d} = 0 \in TC_\mathbf{y }(\partial \mathcal {B})\). Suppose \(\beta \ne 0\). Let the line segment joining \(\widetilde{\mathbf{y }}\) and \(\mathbf{y} \) be

    $$\begin{aligned} \texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} := \{\alpha \widetilde{\mathbf{y }} + (1 - \alpha )\mathbf{y} \in \mathbb {R}^D : \alpha \in [0,1]\}. \end{aligned}$$

    So, \(\texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} \subseteq \mathcal {B}\). We claim \(\texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} \subseteq \partial \mathcal {B}\). Assume not. That is,

    $$\begin{aligned} \exists \, \alpha \in (0,1) \text { such that } \mathbf{z} := \alpha \widetilde{\mathbf{y }} + (1 - \alpha ) \mathbf{y} \in \mathcal {B}^{\circ }. \end{aligned}$$

    But then,

    $$\begin{aligned} \mathbf{d} \, = \, \beta (\widetilde{\mathbf{y }} - \mathbf{y} ) \, = \, (\beta / \alpha ) (\mathbf{z} - \mathbf{y} ) \, \in \, \mathbb {R}_{>0}\left( \mathcal {B}^{\circ } - \mathbf{y} \right) \, \subseteq \, \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }. \end{aligned}$$

    This contradicts \(\mathbf{d} \in \partial \left( TC_\mathbf{y }(\mathcal {B}) \right) \) (see Eq. (94)). So, indeed \(\texttt {conv}\{\widetilde{\mathbf{y }}, \mathbf{y} \} \subseteq \partial \mathcal {B}\). Now, define

    $$\begin{aligned} \widehat{\mathbf{y }}_k := \frac{1}{k} \widetilde{\mathbf{y }} + \left( 1 - \frac{1}{k}\right) \mathbf{y} \in \partial \mathcal {B} \quad \text { and } \quad \tau _k := \frac{1}{\beta k} \in \mathbb {R}_{>0}. \end{aligned}$$

    Then, \(\frac{\widehat{\mathbf{y }}_k - \mathbf{y} }{\tau _k} = \mathbf{d} \) for each k, and \((\widehat{\mathbf{y }}_k)_{k=1}^{\infty }\) and \((\tau _k)_{k=1}^{\infty }\) witness \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\).

In all cases, we have verified \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\). This gives LHS \(\subseteq \) RHS in (14).

\(\underline{\mathbf{Step~2: LHS} \supseteq \mathbf{RHS}}\). Let \(\mathbf{d} \in TC_\mathbf{y }(\partial \mathcal {B})\). By the definition of tangent cones (12), \(\mathbf{d} = \lim _{k \rightarrow \infty } \tau _k^{-1} \left( \widetilde{\mathbf{y }}_k - \mathbf{y} \right) \) for some \(\tau _{k} \in \mathbb {R}_{>0}\) and \(\widetilde{\mathbf{y }}_k \in \partial \mathcal {B}\) with \(\tau _k \rightarrow 0\) and \(\widetilde{\mathbf{y }}_k \rightarrow \mathbf{y} \) as \(k \rightarrow \infty \). By (94), we need to show \(\mathbf{d} \notin \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }\).

First, we will prove \(\texttt {conv}\{\mathbf{d }+\mathbf{y} , \mathbf{y }\} \cap \mathcal {B}^{\circ } = \emptyset \). Assume not, i.e.,

$$\begin{aligned} \exists \, \alpha \in (0,1) \text { such that } \mathbf{z} := \alpha (\mathbf{d} +\mathbf{y} ) + (1 - \alpha ) \mathbf{y} = \alpha \mathbf{d} + \mathbf{y} \in \mathcal {B}^{\circ }. \end{aligned}$$

Let \({\widehat{\tau }}_k = \tau _k / \alpha \in \mathbb {R}_{>0}\), so that

$$\begin{aligned} \alpha \mathbf{d} = \lim _{k \rightarrow \infty } {\widehat{\tau }}_k^{-1}\left( \widetilde{\mathbf{y }}_k - \mathbf{y} \right) . \end{aligned}$$
(95)

Since \(\mathcal {B}^{\circ }\) is open, there exists \(\delta > 0\) with

$$\begin{aligned} \mathcal {N} := \{ \mathbf{w} \in \mathbb {R}^D : \Vert \mathbf{w} - \mathbf{z} \Vert _2 \le \delta \} \subseteq \mathcal {B}^{\circ }. \end{aligned}$$

By Eq. (95), there exists K such that for all \(k \ge K\),

$$\begin{aligned} {\widehat{\tau }}_k^{-1} (\widetilde{\mathbf{y }}_k - \mathbf{y} ) + \mathbf{y} \in \mathcal {N}. \end{aligned}$$

On the other hand, it is easy to see for each \(\mathbf{w} \in \mathcal {B}^{\circ }\),

$$\begin{aligned} \left( \mathbf{y } + {\mathbb {R}}_{\ge 0} (\mathbf{w } - \mathbf{y }) \right) \cap {\mathcal {B}} \, = \, \texttt {conv}\{\mathbf{y }, \mathbf{w '}\} \end{aligned}$$
(96)

for some \(\mathbf{w} ' \in \partial \mathcal {B}\), using convexity and compactness of \(\mathcal {B}\). In addition,

$$\begin{aligned} \left( \mathbf{y} + \mathbb {R}_{\ge 0} (\mathbf{w} - \mathbf{y} ) \right) \cap \partial \mathcal {B} = \{\mathbf{y }, \mathbf{w '}\}, \end{aligned}$$
(97)

using \(\mathbf{w} \in \texttt {conv}\{\mathbf{y }, \mathbf{w '}\}\), \(\Vert \mathbf{w} \Vert _{\mathcal {B}}< 1\), and the triangle inequality for \(\Vert \cdot \Vert _{\mathcal {B}}\). Clearly,

$$\begin{aligned} \Vert \mathbf{w} ' - \mathbf{y} \Vert _2 > \Vert \mathbf{w } - \mathbf{y} \Vert _2. \end{aligned}$$
(98)

Now, let \(\epsilon := \min _{\mathbf{w} \in \mathcal {N}} \Vert \mathbf{w} - \mathbf{y} \Vert _2\). Note \(\epsilon > 0\). For each \(k \ge K\), we apply (96), (97) to \(\mathbf{w} = {\widehat{\tau }}_k^{~-1}(\widetilde{\mathbf{y }}_k - \mathbf{y} ) + \mathbf{y} \in \mathcal {N}\). Then, \(\mathbf{w} ' = \widetilde{\mathbf{y }}_k\). By (98),

$$\begin{aligned} \Vert \widetilde{\mathbf{y }}_k - \mathbf{y} \Vert _2 > \Vert \mathbf{w} - \mathbf{y} \Vert _2 \ge \epsilon \quad \text { for all } k \ge K. \end{aligned}$$
(99)

But (99) contradicts \(\widetilde{\mathbf{y }}_k \rightarrow \mathbf{y} \) as \(k \rightarrow \infty \). Therefore, \(\texttt {conv}\{\mathbf{d }+\mathbf{y} ,\mathbf{y }\} \cap \mathcal {B}^{\circ } = \emptyset \).

Translating by \(-\mathbf{y} \), \(\texttt {conv}\{\mathbf{d }, 0\} \cap (\mathcal {B} - \mathbf{y} )^{\circ } = \emptyset \). By this and convexity, it follows that there exists a properly separating hyperplane:

$$\begin{aligned}&\exists \, \mathbf{v} \in \mathbb {R}^D \setminus \{0\}, \, \exists \, \gamma \in \mathbb {R} \text { such that } \forall \, \mathbf{u} _1 \in \texttt {conv}\{\mathbf{d }, 0\}, \, \forall \, \mathbf{u} _2 \in \mathcal {B} - \mathbf{y} : \\&\quad \langle \mathbf{v} , \mathbf{u} _1 \rangle \ge \gamma , \quad \langle \mathbf{v} , \mathbf{u} _2 \rangle \le \gamma , \text { and } \exists \, \widetilde{\mathbf{u }}_2 \in \mathcal {B} - \mathbf{y} \text { such that } \langle \mathbf{v} , \widetilde{\mathbf{u }}_2 \rangle < \gamma . \end{aligned}$$

In particular,

$$\begin{aligned} \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \subseteq \{\mathbf{u } \in \mathbb {R}^D : \langle \mathbf{v} , \mathbf{u} \rangle \le \gamma \}. \end{aligned}$$

Also, for any open neighborhood \(\mathcal {D} \subseteq \mathbb {R}^D\) with \(\mathbf{d} \in \mathcal {D}\),

$$\begin{aligned} \exists \, \widetilde{\mathbf{d }} \in \mathcal {D} \text { such that } \langle \mathbf{v} , \widetilde{\mathbf{d }} \rangle > \langle \mathbf{v} , \mathbf{d} \rangle \ge \gamma . \end{aligned}$$

We conclude \(\mathbf{d} \notin \left( \mathbb {R}_{>0} (\mathcal {B} - \mathbf{y} ) \right) ^{\circ }\), as desired. This gives \(\mathbf{d} \in \partial \left( TC_\mathbf{y }(\mathcal {B})\right) \), and LHS \(\supseteq \) RHS in Eq. (14), completing the proof of the lemma. \(\square \)
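
To illustrate Lemma 3 with a concrete example (added here for the reader's convenience, and easily verified by hand): let \(\mathcal {B} = \{\mathbf{x} \in \mathbb {R}^2 : |x_1| + |x_2| \le 1\}\) be the \(\ell _1\) unit ball and \(\mathbf{y} = (1,0)\) its rightmost vertex. Then

$$\begin{aligned} TC_\mathbf{y }(\mathcal {B}) = \{\mathbf{d} \in \mathbb {R}^2 : d_1 + |d_2| \le 0\}, \end{aligned}$$

whose boundary is the pair of rays \(\{\mathbf{d} : d_1 = -|d_2|\}\). This agrees with \(TC_\mathbf{y }(\partial \mathcal {B})\), the union of the two rays generated by the edge directions \((-1,1)\) and \((-1,-1)\), exactly as the lemma asserts.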

Proof of Proposition 5

For item 1, we first note that \({\text {grad}}\Vert \cdot \Vert _{\mathcal {B}}(\widehat{\mathbf{a }})\) is nonzero, since the directional derivative of the norm function at \(\widehat{\mathbf{a }}\) in the direction of \(\widehat{\mathbf{a }}\) is nonzero. Indeed, the function \(\mathbb {R} \rightarrow \mathbb {R}\), \(\lambda \mapsto \Vert \widehat{\mathbf{a }} + \lambda \widehat{\mathbf{a }} \Vert _{\mathcal {B}}\), has derivative \(\Vert \widehat{\mathbf{a }} \Vert _{\mathcal {B}} = 1\) at \(\lambda = 0\), by homogeneity of \(\Vert \cdot \Vert _{\mathcal {B}}\) under positive scaling. Item 1 now follows from [51, Thm. 3.15] and the paragraph preceding it in that reference, which notes that metric regularity is implied by linear independence of the gradients.

For item 2, we note that by homogeneity of the norm, since \(\Vert \cdot \Vert _{\mathcal {B}}\) is \(C^1\) around \(L_\mathbf{p }({\widehat{\mathbf{s }}})\), it is also \(C^1\) around \(L_\mathbf{p }({\widehat{\mathbf{s }}}) / \Vert L_\mathbf{p }({\widehat{\mathbf{s }}}) \Vert _{\mathcal {B}}\), and

$$\begin{aligned} {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}}(L_\mathbf{p }(\widehat{\mathbf{s }})/\Vert L_\mathbf{p }(\widehat{\mathbf{s }})\Vert _{\mathcal {B}}) = (1/\Vert L_\mathbf{p }({\widehat{\mathbf{s }}})\Vert _{\mathcal {B}}) {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}}(L_\mathbf{p }(\widehat{\mathbf{s }})). \end{aligned}$$

Thus, item 1 applies and implies that the tangent cone on the right-hand side of Eq. (15) is the hyperplane orthogonal to \({\text {grad}}\Vert \cdot \Vert _{\mathcal {B}}(L_\mathbf{p }({\widehat{\mathbf{s }}}))\). Now we finish by equating the inner product of \({\text {grad}}\Vert \cdot \Vert _{\mathcal {B}}(L_\mathbf{p }({\widehat{\mathbf{s }}}))\) and the LHS of Eq. (15) with 0, and solving for \(\eta \). \(\square \)

Proof of Lemma 8

Given \(\mathcal {M}\) and \(\mathcal {B}\), we need to show that there exists a positive constant C (independent of \(\mathbf{p} , \xi \)) such that for all \(\mathbf{p} \in \mathcal {M}\) and all vectors \(\xi \in T_\mathbf{p }\mathcal {M}\) we have

$$\begin{aligned} \left\langle \xi \xi ^{\top } , \, \tfrac{1}{2} \int _{\{\mathbf{s } \in T_\mathbf{p }\mathcal {M}: \Vert L_\mathbf{p }(\mathbf{s} ) \Vert _{\mathcal {B}}\le 1\}} \mathbf{s} \mathbf{s} ^{\top } d \mathbf{s} \right\rangle \,\,\, \ge \,\,\, C \left\| \xi \right\| _2^2. \end{aligned}$$
(100)

To this end, use linearity of the integral to rewrite the left-hand side of (100) as

$$\begin{aligned} \frac{1}{2} \int _{\{\mathbf{s } \in T_\mathbf{p }\mathcal {M}: \Vert L_\mathbf{p }(\mathbf{s} ) \Vert _{\mathcal {B}}\le 1\}} \langle \xi \xi ^{\top }, \mathbf{s} \mathbf{s} ^{\top } \rangle \, d \mathbf{s} \,\, = \,\, \frac{1}{2} \int _{\{\mathbf{s } \in T_\mathbf{p }\mathcal {M}: \Vert L_\mathbf{p }(\mathbf{s} ) \Vert _{\mathcal {B}}\le 1 \}} \langle \xi , \mathbf{s} \rangle ^2 \, d \mathbf{s} . \end{aligned}$$
(101)

By the equivalence of norms on \(\mathbb {R}^D\), there exists a positive constant c such that for all \(\mathbf{v} \in \mathbb {R}^D\) we have that \(\Vert \mathbf{v} \Vert _{2} \le c\) implies \(\Vert \mathbf{v} \Vert _{\mathcal {B}} \le 1\). In particular, the domain of integration in Eq. (101) is inner-approximated by

$$\begin{aligned} \{\mathbf{s } \in T_\mathbf{p }\mathcal {M} : \Vert L_\mathbf{p }(\mathbf{s} )\Vert _2 \le c \} \, \subseteq \, \{\mathbf{s } \in T_\mathbf{p }\mathcal {M} : \Vert L_\mathbf{p }(\mathbf{s} )\Vert _{\mathcal {B}} \le 1 \}. \end{aligned}$$

Since the integrand in (101) is non-negative, it follows

$$\begin{aligned} \frac{1}{2} \int _{\{\mathbf{s } \in T_{\mathbf{p }}{\mathcal {M}}: \Vert L_{\mathbf{p }}(\mathbf{s }) \Vert _{\mathcal {B}}\le 1 \}} \langle \xi , \mathbf{s } \rangle ^2 \, d \mathbf{s } \,\, \ge \,\, \frac{1}{2} \int _{\{\mathbf{s } \in T_\mathbf{p }\mathcal {M}: \Vert L_\mathbf{p }(\mathbf{s} ) \Vert _{2} \le c \}} \langle \xi , \mathbf{s} \rangle ^2 \, d \mathbf{s} . \end{aligned}$$

Since \(L_\mathbf{p }\) is an isometry, the right-hand side is

$$\begin{aligned} \frac{1}{2} \int _{\{\mathbf{s } \in T_\mathbf{p }\mathcal {M}: \Vert \mathbf{s} \Vert _{2} \le c \}} \langle \xi , \mathbf{s} \rangle ^2 \, d \mathbf{s} . \end{aligned}$$

Using rotational symmetry of Euclidean balls, this equals

$$\begin{aligned} \left( \frac{1}{2} \int _{\{\mathbf{s } \in T_\mathbf{p }\mathcal {M}: \Vert \mathbf{s} \Vert _{2} \le c\}} s_1^2 \, d \mathbf{s} \right) \Vert \xi \Vert _2^2, \end{aligned}$$
(102)

where \(s_1\) denotes the first coordinate of \(\mathbf{s} \) with respect to the fixed orthonormal basis on \(T_\mathbf{p }\mathcal {M}\) (Sect. 3.1). Now note the parenthesized quantity in (102) is a positive constant C depending only on c and the manifold dimension d. By what we have said, it satisfies the bound (100) as desired. \(\square \)
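
Remark (added for concreteness; a standard polar-coordinates computation, not part of the original text). The parenthesized constant in (102) can be evaluated in closed form: by symmetry, \(\int _{\Vert \mathbf{s} \Vert _2 \le c} s_1^2 \, d\mathbf{s} = \tfrac{1}{d} \int _{\Vert \mathbf{s} \Vert _2 \le c} \Vert \mathbf{s} \Vert _2^2 \, d\mathbf{s} \), so

$$\begin{aligned} C = \frac{1}{2d} \cdot \frac{2 \pi ^{d/2}}{\varGamma (d/2)} \int _0^c r^{d+1} \, dr = \frac{\pi ^{d/2} \, c^{d+2}}{d(d+2) \, \varGamma (d/2)}, \end{aligned}$$

using the surface area \(2\pi ^{d/2}/\varGamma (d/2)\) of the unit sphere in \(\mathbb {R}^d\).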

Proof of Proposition 9

  1. Denote the function (37) by \(F : \mathcal {M} \rightarrow {\text {Sym}}^2(T\mathcal {M})\). Let \((\mathbf{p} _k)_{k=1}^{\infty } \subseteq \mathcal {M}\) be a sequence converging to \(\mathbf{p} \in \mathcal {M}\). To move to one fixed space, we identify tangent spaces using the Levi-Civita connection on \(\mathcal {M}\). After choosing a smooth path \(\gamma :[0,1] \rightarrow \mathcal {M}\) such that \(\gamma (\tfrac{1}{k}) = \mathbf{p} _k\) for each \(k \ge 1\) and \(\gamma (0)=\mathbf{p} \), the Levi-Civita connection gives isometries \(\tau _{k}: T_\mathbf{p }\mathcal {M} \rightarrow T_{\mathbf{p} _k}\mathcal {M}\). Furthermore, \(\tau _k\) converges to the identity map on \(T_\mathbf{p }\mathcal {M}\) as elements of \((T\mathcal {M})^* \otimes T\mathcal {M}\) as \(k \rightarrow \infty \).

    We want to show \(F(\mathbf{p} _k) \rightarrow F(\mathbf{p} )\) in \({\text {Sym}}^2(T\mathcal {M})\). It suffices to show \((\tau _k^{-1} \otimes \tau _k^{-1})(F(\mathbf{p} _k)) \rightarrow F(\mathbf{p} )\) in \({\text {Sym}}^2(T_\mathbf{p }\mathcal {M})\), by the last sentence of the previous paragraph. Changing variables \(\mathbf{s} \leftarrow \tau _k^{-1}(\mathbf{s} )\) and using that \(\tau _k\) is an isometry, we have

    $$\begin{aligned} (\tau _k^{-1} \otimes \tau _k^{-1})(F(\mathbf{p} _k)) = \frac{1}{2} \int _{\{\mathbf{s} \in T_\mathbf{p }\mathcal {M} : \Vert L_{\mathbf{p} _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}} \le 1\}} \mathbf{s} \mathbf{s} ^{\top } d\mathbf{s} . \end{aligned}$$

    Write this as

    $$\begin{aligned} \frac{1}{2} \int _{\mathbf{s} \in T_\mathbf{p }\mathcal {M}} \mathbbm {1}( \Vert L_{\mathbf{p} _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top } d\mathbf{s} \, \in \, {\text {Sym}}^2(T_\mathbf{p }\mathcal {M}). \end{aligned}$$

    Compare this to

    $$\begin{aligned} F(\mathbf{p} ) = \frac{1}{2} \int _{\mathbf{s} \in T_\mathbf{p }\mathcal {M}} \mathbbm {1}(\Vert L_\mathbf{p }(\mathbf{s} ) \Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top } d\mathbf{s} \, \in \, {\text {Sym}}^2(T_\mathbf{p }\mathcal {M}). \end{aligned}$$

    Since \(L_\mathbf{p _k} \rightarrow L_\mathbf{p }\), \(\tau _k \rightarrow {\text {Id}}_{T_\mathbf{p }\mathcal {M}}\) and \(\Vert \cdot \Vert _{\mathcal {B}}\) is continuous on \(\mathbb {R}^D\), for each \(\mathbf{s} \in T_\mathbf{p }\mathcal {M}\) there is the pointwise convergence:

    $$\begin{aligned} \mathbbm {1}( \Vert L_\mathbf{p _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}}< 1) \, \mathbf{s} \mathbf{s} ^{\top } \longrightarrow \mathbbm {1}(\Vert L_\mathbf{p }(\mathbf{s} )\Vert _{\mathcal {B}} < 1) \mathbf{s} \mathbf{s} ^{\top } \end{aligned}$$

    Also, letting \(c \in \mathbb {R}_{>0}\) be a constant such that \(\Vert \mathbf{u} \Vert _{2} \le c \Vert \mathbf{u} \Vert _{\mathcal {B}}\) for all \(\mathbf{u} \in \mathbb {R}^D\), we have the uniform bound:

    $$\begin{aligned} \Vert \mathbbm {1}( \Vert L_{\mathbf{p} _k}(\tau _k(\mathbf{s} ))\Vert _{\mathcal {B}} < 1) \, \mathbf{s} \mathbf{s} ^{\top }\Vert _F \le c^2 \quad \text {for all } \mathbf{s} \in T_\mathbf{p }\mathcal {M} \text { and } k \ge 1, \end{aligned}$$

    since \(L_\mathbf{p }\) and \(\tau _k\) are both isometries. Hence, the bounded convergence theorem is applicable, and implies \((\tau _k^{-1} \otimes \tau _k^{-1})(F(\mathbf{p} _k)) \rightarrow F(\mathbf{p} )\).

  2. Denote the function (38) by \(G : \mathcal {M} \rightarrow T\mathcal {M}\). Let \(\mathbf{p} \) be a point satisfying the stated assumption. There exists an open neighborhood \(\mathcal {U}\) of \(\mathbf{p} \) in \(\mathcal {M}\) such that for each \(\mathbf{p} _* \in \mathcal {U}\) the norm \(\Vert \cdot \Vert _{\mathcal {B}}\) is \(C^1\) in some neighborhood of \(L_{\mathbf{p} _*}(T_{\mathbf{p} _*}\mathcal {M}) \cap \mathbb {S}^{D-1}\). Let \((\mathbf{p} _k)_{k=1}^{\infty } \subseteq \mathcal {U}\) be a sequence converging to \(\mathbf{p} \). Identifying tangent spaces as above, it suffices to show \(\tau _k^{-1}(G(\mathbf{p} _k)) \rightarrow G(\mathbf{p} )\).

    By the local \(C^1\) assumption, Proposition 5, item 2 applies and gives

    $$\begin{aligned} G(\mathbf{p} ) = \int _{{\widehat{\mathbf{s }}} \in T_\mathbf{p }\mathcal {M} : \Vert {\widehat{\mathbf{s }}}\Vert _2=1} -{\widehat{\mathbf{s }}} \Vert L_\mathbf{p }({\widehat{\mathbf{s }}}) \Vert ^{-d-2}_{\mathcal {B}} \frac{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_\mathbf{p }({\widehat{\mathbf{s }}})), \, \tfrac{1}{2} Q_\mathbf{p }({\widehat{\mathbf{s }}}) \right\rangle }{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_\mathbf{p }({\widehat{\mathbf{s }}})), \, L_\mathbf{p }({\widehat{\mathbf{s }}}) \right\rangle } \, d{\widehat{\mathbf{s }}}. \end{aligned}$$

    Likewise, by a change of variables using that \(\tau _k^{-1}\) preserves unit spheres:

    $$\begin{aligned}&\tau _k^{-1}(G(\mathbf{p} _k))= \\&\int _{{\widehat{\mathbf{s }}} \in T_\mathbf{p }\mathcal {M} : \Vert {\widehat{\mathbf{s }}}\Vert _2=1} -{\widehat{\mathbf{s }}} \Vert L_{\mathbf{p} _k}(\tau _k({\widehat{\mathbf{s }}})) \Vert ^{-d-2}_{\mathcal {B}} \frac{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_{\mathbf{p} _k}(\tau _k({\widehat{\mathbf{s }}}))), \, \tfrac{1}{2} Q_{\mathbf{p} _k}(\tau _k({\widehat{\mathbf{s }}})) \right\rangle }{\left\langle {\text {grad}} \Vert \cdot \Vert _{\mathcal {B}} (L_{\mathbf{p} _k}(\tau _k({\widehat{\mathbf{s }}}))), \, L_{\mathbf{p} _k}(\tau _k({\widehat{\mathbf{s }}})) \right\rangle } d{\widehat{\mathbf{s }}}. \end{aligned}$$

    Boundedness and pointwise convergence hold since \({\text {grad}} \Vert \cdot \Vert _{\mathcal {B}}\) is locally continuous. So the bounded convergence theorem implies the first sentence in the statement. The second sentence follows from the example in Sect. 3.7. \(\square \)

Tail Bounds and Absolute Moments of the Gaussian

We recall some basic properties of the Gaussian. As in Sect. 4.2, \(\kappa _{\sigma }(s) := \frac{2s}{\sigma ^2}e^{-s^2/\sigma ^2}\).

  • For each even \(k \ge 0\) and \(\delta \ge 0\), by substitution and then integration by parts k/2 times,

    $$\begin{aligned}&\int _{s=\delta }^{\infty } s^k \kappa _{\sigma }(s) \, ds \nonumber \\&\quad = \sigma ^k e^{-\delta ^2/\sigma ^2} \left( \left( \frac{\delta ^2}{\sigma ^2}\right) ^{\frac{k}{2}} + \frac{k}{2} \left( \frac{\delta ^2}{\sigma ^2}\right) ^{\frac{k}{2}-1} + \frac{k}{2}\left( \frac{k}{2} - 1\right) \left( \frac{\delta ^2}{\sigma ^2}\right) ^{\frac{k}{2}-2} + \cdots + \left( \frac{k}{2}\right) ! \right) \nonumber \\&\quad = e^{-\delta ^2/\sigma ^2} \, {\text {poly}}(\sigma , \delta ). \end{aligned}$$
    (103)
  • For each odd \(k \ge 0\) and \(\delta > 0\), using \(s/\delta \ge 1\) for \(s \in [\delta , \infty )\) and Eq. (103),

    $$\begin{aligned} \int _{s=\delta }^{\infty } s^k \kappa _{\sigma }(s) \, ds \le (1/\delta ) \int _{s=\delta }^{\infty } s^{k+1} \kappa _{\sigma }(s) \, ds = e^{-\delta ^2/\sigma ^2} \, (1/\delta ) \, {\text {poly}}(\sigma , \delta ). \end{aligned}$$
    (104)
  • For each \(k \ge 0\), from [67, Equation 18],

    $$\begin{aligned} \int _{s=0}^{\infty } s^k \kappa _{\sigma }(s) ds = \sigma ^k \varGamma \left( \tfrac{k+2}{2}\right) . \end{aligned}$$
    (105)
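
As a quick numerical sanity check of Eq. (105), one can compare SciPy quadrature against the closed form \(\sigma ^k \varGamma \left( \tfrac{k+2}{2}\right) \). The snippet below is an illustrative verification added here, not part of the paper's experiments; the test value of sigma is arbitrary.

import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

sigma = 0.7  # arbitrary test value

def kappa(s, sigma):
    # kappa_sigma(s) = (2 s / sigma^2) exp(-s^2 / sigma^2), as in Sect. 4.2
    return (2 * s / sigma**2) * np.exp(-s**2 / sigma**2)

for k in range(6):
    numeric, _ = quad(lambda s: s**k * kappa(s, sigma), 0, np.inf)
    closed_form = sigma**k * gamma((k + 2) / 2)
    print(f"k={k}: quadrature={numeric:.10f}, formula={closed_form:.10f}")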

Numerical Estimation of One-Dimensional Eigenfunctions

The eigenfunctions of the limiting operator are of key interest for manifold learning methods in general. For the case of the circle example (Sect. 3.7), these are the functions \(\varphi :[0,2\pi ] \rightarrow \mathbb {R}\) that solve the following generalized Helmholtz boundary value problem:

$$\begin{aligned} \varDelta _{\mathcal {M}, \mathcal {B}} \, \varphi + \lambda \varphi = 0, \end{aligned}$$
(106)

where \(\varDelta _{\mathcal {M}, \mathcal {B}}\) is the limiting Laplacian-like differential operator in Eq. (42), subject to the periodic boundary conditions:

$$\begin{aligned} \varphi (\theta + 2 \pi )&= \varphi (\theta ),\\ \varphi '(\theta + 2 \pi )&= \varphi '(\theta ). \end{aligned}$$

Figure 8 shows numerically computed solutions of Eq. (106) for different choices of \(w_1, w_2\). Notice the eigenfunctions are oscillatory, as dictated by Sturm-Liouville theory [1].

Fig. 8 Eigenfunctions. The five eigenfunctions with eigenvalues smallest in magnitude for the weighted \(\ell _1\) Laplacian on the unit circle (42), computed numerically. In these plots, \(w_2 = 1\), and the choices \(w_1 \in \{1, 2, 4, 8\}\) are displayed from top to bottom

We describe the numerical computation of these limiting eigenfunctions. We used a standard finite-difference scheme where the first derivative was replaced by a symmetrized difference

$$\begin{aligned} \frac{d f}{d \theta } \rightarrow \frac{f(\theta + \varDelta \theta ) - f(\theta - \varDelta \theta )}{2 \varDelta \theta }, \end{aligned}$$
(107)

and the second derivative by

$$\begin{aligned} \frac{d^2 f}{d \theta ^2} \rightarrow \frac{f(\theta + \varDelta \theta ) - 2 f(\theta ) + f(\theta - \varDelta \theta ) }{(\varDelta \theta )^2}. \end{aligned}$$
(108)

In this equation, f is taken to be a cyclic function defined over the discrete range

$$\begin{aligned} \left\{ 0, \frac{2\pi }{n}, \ldots , \frac{2\pi (n-1)}{n}\right\} . \end{aligned}$$

To compute the solutions we formed a sparse \(n \times n\) matrix L that corresponds to the finite-difference operator obtained by substituting (107) and (108) for the first- and second-derivative terms in Eq. (106). The eigenvalues and eigenvectors of L were found using the function scipy.sparse.linalg.eigs() from the SciPy package [64], which wraps the ARPACK library for large-scale eigenvalue problems [33]. Recall that in our problem, all eigenvalues are non-positive. To obtain the smallest (in magnitude) eigenvalues and their corresponding eigenvectors, we used the eigs() function in shift-invert mode with \(\sigma =1\). The particular choice of \(\sigma \) did not seem to matter much as long as \(\sigma > 0\); however, choosing \(\sigma =0\) resulted in instabilities and convergence errors. This is because shift-invert mode seeks solutions of \((L-\sigma I)^{-1}\mathbf{x}= \lambda \mathbf{x}\), and since zero is an eigenvalue of L, the choice \(\sigma =0\) requires inverting an ill-conditioned matrix. The use of sparse matrices allows one to take large values of n, since applying the finite-difference operator defined above costs only O(n).
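
To make the construction concrete, here is a minimal sketch of the scheme just described (not the code used for the paper's experiments). The coefficient functions of the limiting operator (42) are not reproduced in this appendix, so the callables a and b below are hypothetical stand-ins for the second- and first-derivative coefficients in Eq. (106); with \(a \equiv 1\) and \(b \equiv 0\), the sketch recovers the standard Laplacian on the circle, whose eigenvalues are \(0, -1, -1, -4, -4, \ldots \).

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def fd_operator(a, b, n):
    # Sparse cyclic discretization of a(theta) f'' + b(theta) f' on [0, 2*pi),
    # using the symmetrized differences (107) and (108).
    # NOTE: `a` and `b` are hypothetical coefficient callables, not Eq. (42).
    theta = 2 * np.pi * np.arange(n) / n
    h = 2 * np.pi / n
    av, bv = a(theta), b(theta)
    main = -2.0 * av / h**2               # coefficient of f(theta), from (108)
    upper = av / h**2 + bv / (2 * h)      # coefficient of f(theta + h)
    lower = av / h**2 - bv / (2 * h)      # coefficient of f(theta - h)
    L = sp.diags([main, upper[:-1], lower[1:]], [0, 1, -1], format="lil")
    L[0, n - 1] = lower[0]                # periodic boundary conditions
    L[n - 1, 0] = upper[n - 1]
    return L.tocsr()

n = 2000
L = fd_operator(lambda t: np.ones_like(t), lambda t: np.zeros_like(t), n)

# Shift-invert with sigma = 1 targets the eigenvalues of smallest magnitude
# while avoiding the (near-singular) inversion that sigma = 0 would require.
vals, vecs = spla.eigs(L, k=5, sigma=1.0)
print(np.round(np.sort(vals.real)[::-1], 4))  # approximately 0, -1, -1, -4, -4

Since the matrix is tridiagonal up to the two wrap-around entries, both assembly and each matrix-vector product cost O(n), consistent with the cost discussion above.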

About this article

Cite this article

Kileel, J., Moscovich, A., Zelesko, N. et al. Manifold Learning with Arbitrary Norms. J Fourier Anal Appl 27, 82 (2021). https://doi.org/10.1007/s00041-021-09879-2


Keywords

  • Dimensionality reduction
  • Diffusion maps
  • Laplacian eigenmaps
  • Second-order differential operator
  • Riemannian geometry
  • Convex body