
Foundations of Computational Mathematics, Volume 13, Issue 2, pp. 221–252

Classifying Clustering Schemes

  • Gunnar Carlsson
  • Facundo Mémoli

Abstract

Many clustering schemes are defined by optimizing an objective function over the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality.

Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the dataset, for example by adding points or by applying functions to it. We show that, within this framework, one can prove a theorem analogous to one of Kleinberg (Becker et al. (eds.), NIPS, pp. 446–453, MIT Press, Cambridge, 2002), in which, for example, one obtains an existence and uniqueness theorem instead of a nonexistence result.
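To make the idea of comparing clusterings across a map of datasets concrete, here is a minimal sketch under simplifying assumptions: a finite metric space is a distance matrix, a map X → Y is a list of image indices, and a clustering is a list of labels. The helper names `is_non_expanding` and `maps_clusters_into_clusters` are hypothetical and not taken from the paper. The checks encode one natural reading of the condition: a map that does not increase distances should send points clustered together in X to points clustered together in Y.

```python
from typing import Sequence

# Assumption (not from the paper): a finite metric space is a square distance
# matrix, a map X -> Y is a list whose i-th entry is the index of f(i) in Y,
# and a clustering is a list of cluster labels, one per point.
Matrix = Sequence[Sequence[float]]


def is_non_expanding(d_x: Matrix, d_y: Matrix, f: Sequence[int]) -> bool:
    """True if d_Y(f(i), f(j)) <= d_X(i, j) for every pair of points i, j in X."""
    n = len(d_x)
    return all(d_y[f[i]][f[j]] <= d_x[i][j] for i in range(n) for j in range(n))


def maps_clusters_into_clusters(
    labels_x: Sequence[int], labels_y: Sequence[int], f: Sequence[int]
) -> bool:
    """True if any two points sharing a cluster in X land in a shared cluster in Y."""
    n = len(labels_x)
    return all(
        labels_y[f[i]] == labels_y[f[j]]
        for i in range(n)
        for j in range(n)
        if labels_x[i] == labels_x[j]
    )


# Toy data: X has three points; the hypothetical map f collapses points 1 and 2
# of X onto point 1 of Y and never increases distances.
d_x = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
d_y = [[0.0, 1.0],
       [1.0, 0.0]]
f = [0, 1, 1]

labels_x = [0, 0, 1]  # clustering of X: {0, 1} and {2}
print(is_non_expanding(d_x, d_y, f))                     # True
print(maps_clusters_into_clusters(labels_x, [0, 0], f))  # True: Y is one cluster
print(maps_clusters_into_clusters(labels_x, [0, 1], f))  # False: splits the image of {0, 1}
```

The abstract notes that the classification depends on which maps of finite metric spaces are allowed; replacing `is_non_expanding` with a stricter predicate (for instance, one that also requires f to be injective) is one hypothetical way to model that choice in this sketch.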

We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality that one requires from the schemes, it is possible to construct richer families of clustering schemes that exhibit sensitivity to density.
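Reading clustering as a statistical analogue of connected components suggests a concrete scheme: fix a scale δ, join two points whenever their distance is at most δ, and return the connected components of the resulting graph. This is standard single-linkage clustering at scale δ (single linkage appears among the keywords). The sketch below assumes the metric space is given as a distance matrix; `single_linkage_at_scale` is a hypothetical name, and the code is an illustration rather than the paper's own construction.

```python
from typing import Dict, List, Sequence


def single_linkage_at_scale(d: Sequence[Sequence[float]], delta: float) -> List[int]:
    """Label each point by its connected component in the graph with an
    edge i -- j whenever d[i][j] <= delta (single linkage at scale delta)."""
    n = len(d)
    parent = list(range(n))  # union-find forest: one tree per component

    def find(i: int) -> int:
        # Walk to the root, halving the path along the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if d[i][j] <= delta:
                parent[find(i)] = find(j)  # merge the two components

    # Relabel component roots as 0, 1, 2, ... in order of first appearance.
    roots: Dict[int, int] = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]


# Points on a line: a tight group {0, 1, 2} and a pair {10, 11}.
pts = [0.0, 1.0, 2.0, 10.0, 11.0]
d = [[abs(a - b) for b in pts] for a in pts]

print(single_linkage_at_scale(d, 0.5))  # [0, 1, 2, 3, 4]
print(single_linkage_at_scale(d, 1.0))  # [0, 0, 0, 1, 1]
print(single_linkage_at_scale(d, 8.0))  # [0, 0, 0, 0, 0]
```

At δ = 8 the single gap of length 8 between 2 and 10 is enough to merge everything. This is a small instance of the chaining effect listed among the keywords: one short link joins two otherwise well-separated groups.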

Keywords

Clustering · Functoriality · Chaining effect · Single linkage · Data analysis

Mathematics Subject Classification

62H30 · 68T10

Notes

Acknowledgements

This work is supported by DARPA grant HR0011-05-1-0007, ONR grant N00014-09-1-0783, and AFOSR Grant FA9550-09-1-0643 Princeton Subaward 00001716-2.

References

  1. M. Ackerman, S. Ben-David, D. Loker, Characterization of linkage-based clustering, in COLT, ed. by A.T. Kalai, M. Mohri (Omnipress, Madison, 2010), pp. 270–281.
  2. S. Ben-David, M. Ackerman, Measures of clustering quality: a working set of axioms for clustering, in NIPS, ed. by D. Koller, D. Schuurmans, Y. Bengio, L. Bottou (Curran Associates, Inc., San Francisco, 2008), pp. 121–128.
  3. G. Carlsson, Topology and data, Bull. Am. Math. Soc. 46, 255–308 (2009).
  4. G. Carlsson, V. de Silva, Zigzag persistence, Found. Comput. Math. 10(4), 367–405 (2010).
  5. G. Carlsson, F. Mémoli, Persistent clustering and a theorem of J. Kleinberg, arXiv e-prints, August 2008.
  6. G. Carlsson, F. Mémoli, Characterization, stability and convergence of hierarchical clustering methods, J. Mach. Learn. Res. 11, 1425–1470 (2010).
  7. G. Carlsson, F. Mémoli, Multiparameter clustering methods, in Classification as a Tool for Research, ed. by H. Locarek-Junge, C. Weihs. Proc. 11th Conference of the International Federation of Classification Societies (IFCS-09) (Springer, Heidelberg, 2010), pp. 63–70.
  8. G. Carlsson, A. Zomorodian, The theory of multidimensional persistence, in Symposium on Computational Geometry (2007), pp. 184–193.
  9. V. de Silva, R. Ghrist, Coverage in sensor networks via persistent homology, Algebr. Geom. Topol. 7, 339–358 (2007).
  10. M. Ester, H.P. Kriegel, J. Sander, X. Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise (AAAI Press, Menlo Park, 1996), pp. 226–231.
  11. R. Ghrist, Barcodes: the persistent topology of data, Bull. Am. Math. Soc. 45(1), 61–75 (2008).
  12. R. Ghrist, Three examples of applied and computational homology, Lab Papers (GRASP), p. 18 (2008).
  13. I. Guyon, U. von Luxburg, R. Williamson, Clustering: science or art? Technical report, paper presented at the NIPS 2009 Workshop Clustering: Science or Art? http://www.kyb.tuebingen.mpg.de/bs/people/ule/publications/publication_downloads/IsClusteringScience.pdf (2009).
  14. J.C. Hausmann, On the Vietoris-Rips complexes and a cohomology theory for metric spaces, in Prospects in Topology, Princeton, NJ, 1994, Ann. of Math. Stud., vol. 138 (Princeton Univ. Press, Princeton, 1995), pp. 175–188.
  15. J.R. Isbell, Six theorems about injective metric spaces, Comment. Math. Helv. 39, 65–76 (1964).
  16. A.K. Jain, R.C. Dubes, Algorithms for Clustering Data. Prentice Hall Advanced Reference Series (Prentice Hall, Englewood Cliffs, 1988).
  17. N. Jardine, R. Sibson, Mathematical Taxonomy. Wiley Series in Probability and Mathematical Statistics (Wiley, London, 1971).
  18. D.N. Kozlov, Combinatorial Algebraic Topology (Springer, Berlin, 2008).
  19. J.M. Kleinberg, An impossibility theorem for clustering, in NIPS, ed. by S. Becker, S. Thrun, K. Obermayer (MIT Press, Cambridge, 2002), pp. 446–453.
  20. G.N. Lance, W.T. Williams, A general theory of classificatory sorting strategies 1. Hierarchical systems, Comput. J. 9(4), 373–380 (1967).
  21. S. Mac Lane, Categories for the Working Mathematician, 2nd edn. Graduate Texts in Mathematics, vol. 5 (Springer, New York, 1998).
  22. J.R. Munkres, Topology: A First Course (Prentice Hall, Englewood Cliffs, 1975).
  23. A. Nayak, I. Stojmenovic, Handbook of Applied Algorithms: Solving Scientific, Engineering, and Practical Problems (Wiley/IEEE Press, New York, 2008).
  24. G. Palla, I. Derényi, I. Farkas, T. Vicsek, Uncovering the overlapping community structure of complex networks in nature and society, Nature 435(7043), 814–818 (2005).

Copyright information

© SFoCM 2013

Authors and Affiliations

  1. Department of Mathematics, Stanford University, Stanford, USA
  2. School of Computer Science, The University of Adelaide, Adelaide, Australia
