Multi-Resolution Geometric Analysis for Data in High Dimensions

  • Guangliang Chen
  • Anna V. Little
  • Mauro Maggioni
Chapter
Part of the Applied and Numerical Harmonic Analysis book series (ANHA)

Abstract

Large data sets arise in a wide variety of applications and are often modeled as samples from a probability distribution in a high-dimensional space. It is sometimes assumed that the support of such a probability distribution is well approximated by a set of low intrinsic dimension, perhaps even a low-dimensional smooth manifold. Samples are often corrupted by high-dimensional noise. We are interested in developing tools for studying the geometry of such high-dimensional data sets. In particular, we present here a multiscale transform that maps high-dimensional data of this kind to a set of multiscale coefficients that are compressible or sparse under suitable assumptions on the data. We think of this as a geometric counterpart to multi-resolution analysis in wavelet theory: whereas wavelets map a signal (typically low-dimensional, such as a one-dimensional time series or a two-dimensional image) to a set of multiscale coefficients, the geometric wavelets discussed here map points in a high-dimensional point cloud to a multiscale set of coefficients. The geometric multi-resolution analysis (GMRA) we construct depends on the support of the probability distribution, and in this sense it fits the paradigm of dictionary learning or data-adaptive representations, although the representation we construct is in fact mildly nonlinear, as opposed to standard linear representations. Finally, we apply the transform to a set of synthetic and real-world data sets.
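
As a rough illustration of the kind of transform described above, the following Python sketch builds a toy multiscale decomposition of a point cloud. The abstract does not specify the construction, so every detail here is an assumption made for illustration: cells come from recursive median splits along the first principal direction, each cell is approximated by a low-dimensional affine plane fitted by local PCA, and the coefficient at each scale is the difference between a point's projections at consecutive scales. The function names (`local_affine_fit`, `project`, `build_tree`, `transform`) and all parameters are hypothetical and are not taken from the authors' software.

```python
import numpy as np


def local_affine_fit(cell_points, dim):
    """Local PCA: mean and top `dim` principal directions of one cell."""
    center = cell_points.mean(axis=0)
    # Right singular vectors of the centered cell are its principal directions.
    _, _, vt = np.linalg.svd(cell_points - center, full_matrices=False)
    return center, vt[:dim].T  # basis: (ambient_dim, dim), orthonormal columns


def project(x, node):
    """Orthogonal projection of x onto the node's affine plane."""
    basis = node["basis"]
    return node["center"] + basis @ (basis.T @ (x - node["center"]))


def build_tree(points, idx, dim, max_depth, min_size, depth=0):
    """Recursively bisect a cell at the median of its first principal coordinate.

    Assumes dim >= 1; the splitting rule is an illustrative choice.
    """
    center, basis = local_affine_fit(points[idx], dim)
    node = {"idx": idx, "center": center, "basis": basis, "children": []}
    if depth < max_depth:
        coords = (points[idx] - center) @ basis[:, 0]
        go_left = coords <= np.median(coords)
        halves = [idx[go_left], idx[~go_left]]
        # Stop splitting when a half would be too small for a stable fit.
        if min(len(h) for h in halves) >= min_size:
            node["children"] = [
                build_tree(points, h, dim, max_depth, min_size, depth + 1)
                for h in halves
            ]
    return node


def transform(points, root, i):
    """Coarsest approximation of point i plus one detail vector per finer scale."""
    x, node = points[i], root
    coarse = project(x, node)
    previous, details = coarse, []
    while node["children"]:
        # Descend into the child cell that contains point i.
        node = next(c for c in node["children"] if i in c["idx"])
        current = project(x, node)
        details.append(current - previous)
        previous = current
    return coarse, details


# Toy data: a unit circle (intrinsic dimension 1) embedded in R^10.
theta = np.linspace(0.0, 2.0 * np.pi, 1024, endpoint=False)
points = np.zeros((1024, 10))
points[:, 0], points[:, 1] = np.cos(theta), np.sin(theta)

root = build_tree(points, np.arange(1024), dim=1, max_depth=5, min_size=8)
coarse, details = transform(points, root, 0)
finest = coarse + sum(details)  # telescoping sum gives the finest approximation
print([round(float(np.linalg.norm(d)), 4) for d in details])
```

Because the coefficients are differences of successive projections, adding them to the coarsest approximation recovers the finest one. For data lying near a smooth low-dimensional set, one would expect the finer-scale coefficients to be small, which is the compressibility the abstract refers to.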

Keywords

  • Multiscale analysis
  • Geometric analysis
  • High-dimensional data
  • Covariance matrix estimation

Notes

Acknowledgements

The authors thank E. Monson for useful discussions. AVL was partially supported by NSF and ONR. GC was partially supported by DARPA, ONR, NSF CCF, and the NSF/DHS FODAVA program. MM is grateful for partial support from DARPA, NSF, ONR, and the Sloan Foundation.

Copyright information

© Birkhäuser Boston 2013

Authors and Affiliations

  • Guangliang Chen (1)
  • Anna V. Little (1)
  • Mauro Maggioni (1)
  1. Mathematics and Computer Science Departments, Duke University, Durham, USA
