Heuristic Framework for Multiscale Testing of the Multi-Manifold Hypothesis

Part of the book series: Association for Women in Mathematics Series (AWMS, volume 17)

Abstract

When analyzing empirical data, we often find that global linear models overestimate the number of parameters required. In such cases, we may ask whether the data lie on or near a manifold, or a set of manifolds (referred to as a multi-manifold), of lower dimension than the ambient space. This question can be phrased as a (multi-)manifold hypothesis. The identification of such intrinsic multiscale features is a cornerstone of data analysis and representation, and has given rise to a large body of work on manifold learning. In this work, we review key results on multiscale data analysis and intrinsic dimension, and then introduce a heuristic multiscale framework for testing the multi-manifold hypothesis. Our method implements a hypothesis test on a set of spline-interpolated manifolds constructed from variance-based intrinsic dimensions. The workflow is suitable for empirical data analysis, as we demonstrate on two use cases.

Notes

  1. We define $d_{\mathrm{VLID}}$ to be a pointwise statistic that depends on a set of local neighborhoods at each point. The intrinsic dimension $d$ is computed for the set of data points in each local neighborhood. Then $d_{\mathrm{VLID}}$ is the minimum of these intrinsic dimensions. Hence points sampled from a local manifold of dimension $d$ have $d_{\mathrm{VLID}}$ equal to $d$. A more formal definition is in Sect. 2.4.

  2. The description of each attribute below is taken verbatim from the website http://desktop.arcgis.com/en/arcmap/ (in "Fundamentals about LiDAR under Manage Data").
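
Note 1 describes the pointwise statistic only in words, so a small sketch may help make the "minimum over local neighborhoods" idea concrete. The code below is an illustration under stated assumptions, not the chapter's definition (which is in Sect. 2.4): neighborhoods are taken to be k-nearest-neighbor sets at a few hypothetical sizes `ks`, and the local intrinsic dimension is taken to be the smallest number of principal components explaining a hypothetical `threshold` fraction of the local variance. The function names `local_dimension` and `vlid` are likewise illustrative.

```python
import numpy as np


def local_dimension(points, threshold=0.95):
    """Smallest number of principal components whose cumulative
    variance reaches `threshold` (an assumed cutoff) for one neighborhood."""
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the variances
    # along the principal directions of the neighborhood.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    total = variances.sum()
    if total == 0.0:
        return 0  # all points coincide
    explained = np.cumsum(variances) / total
    count = int(np.searchsorted(explained, threshold)) + 1
    return min(count, len(variances))


def vlid(data, ks=(8, 16, 32), threshold=0.95):
    """Pointwise minimum of local dimensions over several
    k-nearest-neighbor neighborhoods (assumed neighborhood choice)."""
    data = np.asarray(data, dtype=float)
    n = len(data)
    result = np.empty(n, dtype=int)
    for i in range(n):
        # Brute-force distances from point i to every point (including itself).
        dists = np.linalg.norm(data - data[i], axis=1)
        order = np.argsort(dists, kind="stable")
        dims = [local_dimension(data[order[:min(k, n)]], threshold) for k in ks]
        result[i] = min(dims)
    return result


if __name__ == "__main__":
    # Points on a straight line embedded in three dimensions.
    line = np.outer(np.arange(50.0), [1.0, 2.0, 3.0])
    print(vlid(line))
```

For points on a straight line in three-dimensional space, as in the usage example, every neighborhood is one-dimensional and the sketch reports 1 at each point. The brute-force neighbor search is quadratic in the number of points and is used here only to keep the example self-contained.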

Acknowledgements

This research started at the Women in Data Science and Mathematics Research Collaboration Workshop (WiSDM), July 17–21, 2017, at the Institute for Computational and Experimental Research in Mathematics (ICERM). The workshop was partially supported by grant number NSF-HRD 1500481-AWM ADVANCE and co-sponsored by Brown’s Data Science Initiative.

Additional support for some participant travel was provided by DIMACS in association with and through its Special Focus on Information Sharing and Dynamic Data Analysis. Linda Ness worked on this project during a visit to DIMACS, partially supported by the National Science Foundation under grant number CCF-1445755. F. Patricia Medina received partial travel funding from the Mathematical Science Department at Worcester Polytechnic Institute.

We thank Brie Finegold and Katherine M. Kinnaird for their participation in the workshop and in early-stage experiments. In addition, we thank Anna Little for helpful discussions on intrinsic dimensions and Jason Stoker for sharing material on LiDAR data.

Author information

Corresponding author

Correspondence to Linda Ness.

Copyright information

© 2019 The Author(s) and the Association for Women in Mathematics

About this chapter

Cite this chapter

Medina, F.P., Ness, L., Weber, M., Djima, K.Y. (2019). Heuristic Framework for Multiscale Testing of the Multi-Manifold Hypothesis. In: Gasparovic, E., Domeniconi, C. (eds) Research in Data Science. Association for Women in Mathematics Series, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-030-11566-1_3
