Metric Structures on Datasets: Stability and Classification of Algorithms

  • Facundo Mémoli
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6855)

Abstract

Several methods in data and shape analysis can be regarded as transformations between metric spaces. Examples are hierarchical clustering methods, the higher order constructions of computational persistent topology, and several computational techniques that operate within the context of data/shape matching under invariances.

Metric geometry, and in particular different variants of the Gromov-Hausdorff distance, provide a point of view which is applicable in different scenarios. The underlying idea is to regard datasets as metric spaces, or metric measure spaces (a.k.a. mm-spaces, which are metric spaces enriched with probability measures), and then, crucially, at the same time regard the collection of all datasets as a metric space in itself. Variations of this point of view give rise to different taxonomies that include several methods for extracting information from datasets.
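
To make the idea of a distance between datasets concrete, here is a minimal sketch that computes the Gromov-Hausdorff distance between two very small finite metric spaces by brute force. It relies on the standard correspondence formulation (half the smallest distortion over all correspondences between the two point sets), which the abstract does not spell out. The function name `gromov_hausdorff` and the distance-matrix input format are illustrative assumptions, and the exhaustive search is only feasible for a handful of points.

```python
from itertools import product


def gromov_hausdorff(dX, dY):
    """Brute-force Gromov-Hausdorff distance between two tiny finite
    metric spaces, each given as a square distance matrix (list of lists).

    Hypothetical helper for illustration: cost grows as 2^(n*m).
    """
    n, m = len(dX), len(dY)
    pairs = list(product(range(n), range(m)))
    best = float("inf")
    # Enumerate every non-empty subset of X x Y using a bitmask.
    for mask in range(1, 1 << len(pairs)):
        R = [p for k, p in enumerate(pairs) if (mask >> k) & 1]
        # A correspondence must cover every point of X and every point of Y.
        if {i for i, _ in R} != set(range(n)):
            continue
        if {j for _, j in R} != set(range(m)):
            continue
        # Distortion: worst mismatch between matched pairwise distances.
        distortion = max(
            abs(dX[i][k] - dY[j][l]) for (i, j) in R for (k, l) in R
        )
        best = min(best, distortion)
    return best / 2


# Two points at distance 1 compared with a single point.
two_points = [[0, 1], [1, 0]]
one_point = [[0]]
print(gromov_hausdorff(two_points, one_point))
```

In this example the only correspondence matches both points to the single point, so the result is half the diameter of the two-point space.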

Imposing metric structures on the collection of all datasets could be regarded as a "soft" construction. The classification of algorithms, or their axiomatic characterization, could be achieved by imposing the more "rigid" category structures on the collection of all finite metric spaces and demanding functoriality of the algorithms. In this case, one would hope to single out all the algorithms that satisfy certain natural conditions, which would clarify the landscape of available methods. We describe how using this formalism leads to an axiomatic description of many clustering algorithms, both flat and hierarchical.
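
As a rough illustration of what functoriality can mean for a hierarchical clustering method, the sketch below treats single linkage as a map from a finite metric space to an ultrametric on the same points, then checks one instance of the functoriality condition: a distance non-increasing map between the inputs stays distance non-increasing between the outputs. The choice of single linkage, the helper names, and the two toy datasets are assumptions made for this example and are not taken from the abstract.

```python
def single_linkage_ultrametric(d):
    """Single-linkage ultrametric of a finite metric space.

    u[i][j] is the smallest value t such that i and j are joined by a
    chain of points whose consecutive distances are all at most t.
    Computed with a minimax variant of Floyd-Warshall.
    """
    n = len(d)
    u = [row[:] for row in d]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


def is_ultrametric(u):
    """Check the strong triangle inequality u(i,j) <= max(u(i,k), u(k,j))."""
    n = len(u)
    return all(
        u[i][j] <= max(u[i][k], u[k][j])
        for i in range(n) for j in range(n) for k in range(n)
    )


def is_non_increasing(f, dA, dB):
    """True if the index map f never increases distances from dA to dB."""
    n = len(dA)
    return all(dB[f[i]][f[j]] <= dA[i][j] for i in range(n) for j in range(n))


# Toy data (hypothetical): points 0, 1, 3 and points 0, 0.5, 2 on a line.
dX = [[0, 1, 3], [1, 0, 2], [3, 2, 0]]
dY = [[0, 0.5, 2], [0.5, 0, 1.5], [2, 1.5, 0]]
f = [0, 1, 2]  # maps the i-th point of X to the i-th point of Y

uX = single_linkage_ultrametric(dX)
uY = single_linkage_ultrametric(dY)

print(is_non_increasing(f, dX, dY))  # condition on the inputs
print(is_non_increasing(f, uX, uY))  # same condition on the outputs
```

A single example does not prove functoriality; the check only shows the shape of the condition an axiomatic description would impose.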

Keywords

metric geometry, categories and functors, metric spaces, Gromov-Hausdorff distance, Gromov-Wasserstein distance

Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Facundo Mémoli (1, 2)
  1. Department of Mathematics, Stanford University, USA
  2. Department of Computer Science, The University of Adelaide, Australia
