Abstract
When analyzing empirical data, we often find that global linear models overestimate the number of parameters required. In such cases, we may ask whether the data lies on or near a manifold or a set of manifolds, referred to as a multi-manifold, of lower dimension than the ambient space. This question can be phrased as a (multi-)manifold hypothesis. The identification of such intrinsic multiscale features is a cornerstone of data analysis and representation, and has given rise to a large body of work on manifold learning. In this work, we review key results on multiscale data analysis and intrinsic dimension, and then introduce a heuristic multiscale framework for testing the multi-manifold hypothesis. Our method implements a hypothesis test on a set of spline-interpolated manifolds constructed from variance-based intrinsic dimensions. The workflow is suitable for empirical data analysis, as we demonstrate on two use cases.
Notes
1. We define d_VLID to be a pointwise statistic that depends on a set of local neighborhoods at each point. The intrinsic dimension d is computed for the set of data points in each local neighborhood, and d_VLID is the minimum of these intrinsic dimensions. Hence points sampled from a local manifold of dimension d have d_VLID equal to d. A more formal definition is in Sect. 2.4.
2. The description of each of the attributes below is taken verbatim from the website http://desktop.arcgis.com/en/arcmap/ (in “Fundamentals about LiDAR under Manage Data”).
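As an illustration of Note 1, the following is a minimal sketch of a pointwise minimum-over-neighborhoods dimension statistic. It is not the formal definition from Sect. 2.4, which is not reproduced here. The function names, the use of k-nearest-neighbor neighborhoods, the neighborhood sizes, and the 95% explained-variance threshold are all assumptions made for this example.

```python
import numpy as np


def local_variance_dimension(points, variance_threshold=0.95):
    """Number of principal directions needed to reach the variance threshold.

    Assumption: "variance-based intrinsic dimension" is approximated here by
    an explained-variance cutoff on a local PCA.
    """
    centered = points - points.mean(axis=0)
    # Squared singular values are proportional to the variance along each
    # principal direction of the neighborhood.
    variances = np.linalg.svd(centered, compute_uv=False) ** 2
    total = variances.sum()
    if total == 0.0:
        return 0  # all points coincide
    cumulative = np.cumsum(variances) / total
    dim = int(np.searchsorted(cumulative, variance_threshold)) + 1
    return min(dim, len(variances))


def vlid(data, neighborhood_sizes=(10, 20, 40), variance_threshold=0.95):
    """Pointwise minimum of local dimensions over several neighborhoods.

    Assumption: the neighborhoods at each point are its k nearest neighbors
    (the point itself included) for each k in neighborhood_sizes.
    """
    data = np.asarray(data, dtype=float)
    # Brute-force squared pairwise distances; adequate for small samples.
    diffs = data[:, None, :] - data[None, :, :]
    order = np.argsort((diffs ** 2).sum(axis=-1), axis=1)
    result = np.empty(data.shape[0], dtype=int)
    for i in range(data.shape[0]):
        result[i] = min(
            local_variance_dimension(data[order[i, :k]], variance_threshold)
            for k in neighborhood_sizes
        )
    return result


def sample_line_and_plane():
    """Toy multi-manifold in R^3: a line segment well separated from a planar grid."""
    t = np.linspace(0.0, 1.0, 50)
    line = np.stack([t, 2.0 * t, 10.0 - t], axis=1)
    gx, gy = np.meshgrid(np.arange(15.0), np.arange(15.0))
    plane = np.stack([gx.ravel(), gy.ravel(), np.zeros(gx.size)], axis=1)
    return line, plane
```

On the toy data from `sample_line_and_plane`, calling `vlid(np.vstack([line, plane]))` assigns a value to every point, so points on the line and points on the plane can be compared directly. For noisy data, the threshold and the neighborhood sizes would need to be tuned, since the minimum over neighborhoods is sensitive to small, nearly degenerate neighborhoods.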
Acknowledgements
This research started at the Women in Data Science and Mathematics Research Collaboration Workshop (WiSDM), July 17–21, 2017, at the Institute for Computational and Experimental Research in Mathematics (ICERM). The workshop was partially supported by grant number NSF-HRD 1500481-AWM ADVANCE and co-sponsored by Brown’s Data Science Initiative.
Additional support for some participant travel was provided by DIMACS in association with and through its Special Focus on Information Sharing and Dynamic Data Analysis. Linda Ness worked on this project during a visit to DIMACS, partially supported by the National Science Foundation under grant number CCF-1445755. F. Patricia Medina received partial travel funding from the Mathematical Science Department at Worcester Polytechnic Institute.
We thank Brie Finegold and Katherine M. Kinnaird for their participation in the workshop and in early-stage experiments. In addition, we thank Anna Little for helpful discussions on intrinsic dimensions and Jason Stoker for sharing material on LiDAR data.
Copyright information
© 2019 The Author(s) and the Association for Women in Mathematics
About this chapter
Cite this chapter
Medina, F.P., Ness, L., Weber, M., Djima, K.Y. (2019). Heuristic Framework for Multiscale Testing of the Multi-Manifold Hypothesis. In: Gasparovic, E., Domeniconi, C. (eds) Research in Data Science. Association for Women in Mathematics Series, vol 17. Springer, Cham. https://doi.org/10.1007/978-3-030-11566-1_3
DOI: https://doi.org/10.1007/978-3-030-11566-1_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-11565-4
Online ISBN: 978-3-030-11566-1
eBook Packages: Mathematics and Statistics (R0)