Classifying Clustering Schemes
Many clustering schemes are defined by optimizing an objective function defined on the partitions of the underlying set of a finite metric space. In this paper, we construct a framework for studying what happens when we instead impose various structural conditions on the clustering schemes, under the general heading of functoriality.
Functoriality refers to the idea that one should be able to compare the results of clustering algorithms as one varies the dataset, for example by adding points or by applying functions to it. We show that, within this framework, one can prove a theorem analogous to one of Kleinberg (Becker et al. (eds.), NIPS, pp. 446–453, MIT Press, Cambridge, 2002), but one that yields an existence and uniqueness theorem instead of a nonexistence result.
We obtain a full classification of all clustering schemes satisfying a condition we refer to as excisiveness. The classification can be changed by varying the notion of maps of finite metric spaces. The conditions occur naturally when one considers clustering as the statistical version of the geometric notion of connected components. By varying the degree of functoriality that one requires from the schemes, it is possible to construct richer families of clustering schemes that exhibit sensitivity to density.
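To make the connected-components picture concrete, the following is a minimal sketch, not the paper's construction. It assumes single linkage at a fixed scale `delta`, where two points share a cluster when a chain of points with consecutive distances at most `delta` joins them. The function names, the helper `refines`, and the sample points are all hypothetical and chosen only for illustration. The sketch also checks one instance of the comparison that functoriality asks for: after a point is added, every old cluster sits inside a new one.

```python
from itertools import combinations


def single_linkage_partition(points, dist, delta):
    """Connected components of the graph joining x, y when dist(x, y) <= delta."""
    parent = {p: p for p in points}

    def find(p):
        # Follow parent links to the representative, halving the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for x, y in combinations(points, 2):
        if dist(x, y) <= delta:
            parent[find(x)] = find(y)

    blocks = {}
    for p in points:
        blocks.setdefault(find(p), set()).add(p)
    return {frozenset(b) for b in blocks.values()}


def refines(fine, coarse):
    """True if every block of `fine` is contained in some block of `coarse`."""
    return all(any(a <= b for b in coarse) for a in fine)


def dist(x, y):
    # Hypothetical metric: absolute difference on the real line.
    return abs(x - y)


X = [0.0, 1.0, 5.0]
Y = X + [3.0]  # the same data with one point added

clusters_X = single_linkage_partition(X, dist, delta=2.0)
clusters_Y = single_linkage_partition(Y, dist, delta=2.0)

# Each cluster of X lies inside a cluster of Y.
print(refines(clusters_X, clusters_Y))
```

In this toy data the added point 3.0 bridges the two clusters of X into a single cluster of Y, which is a small instance of the chaining effect listed among the keywords.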
Keywords: Clustering, Functoriality, Chaining effect, Single linkage, Data analysis
Mathematics Subject Classification: 62H30, 68T10
This work is supported by DARPA grant HR0011-05-1-0007, ONR grant N00014-09-1-0783, and AFOSR Grant FA9550-09-1-0643 Princeton Subaward 00001716-2.
- 1. M. Ackerman, S. Ben-David, D. Loker, Characterization of linkage-based clustering, in COLT, ed. by A.T. Kalai, M. Mohri (Omnipress, Madison, 2010), pp. 270–281.
- 2. S. Ben-David, M. Ackerman, Measures of clustering quality: a working set of axioms for clustering, in NIPS, ed. by D. Koller, D. Schuurmans, Y. Bengio, L. Bottou (Curran Associates, Inc., San Francisco, 2008), pp. 121–128.
- 5. G. Carlsson, F. Mémoli, Persistent clustering and a theorem of J. Kleinberg. ArXiv e-prints, August 2008.
- 8. G. Carlsson, A. Zomorodian, The theory of multidimensional persistence, in Symposium on Computational Geometry (2007), pp. 184–193.
- 9. V. de Silva, R. Ghrist, Coverage in sensor networks via persistent homology, in Algebraic and Geometric Topology, vol. 7 (2007), pp. 339–358.
- 10. M. Ester, H.P. Kriegel, J. Sander, X. Xu, A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise (AAAI Press, Menlo Park, 1996), pp. 226–231.
- 12. R. Ghrist, Three examples of applied and computational homology, Lab Papers (GRASP), p. 18 (2008).
- 13. I. Guyon, U. von Luxburg, R. Williamson, Clustering: Science or art? Technical report, Paper presented at the NIPS 2009 Workshop Clustering: science or Art? http://www.kyb.tuebingen.mpg.de/bs/people/ule/publications/publication_downloads/IsClusteringScience.pdf (2009).
- 14. J.C. Hausmann, On the Vietoris-Rips complexes and a cohomology theory for metric spaces, in Prospects in Topology. Princeton, NJ, 1994, Ann. of Math. Stud., vol. 138 (Princeton Univ. Press, Princeton, 1995), pp. 175–188.
- 19. J.M. Kleinberg, An impossibility theorem for clustering, in NIPS, ed. by S. Becker, S. Thrun, K. Obermayer (MIT Press, Cambridge, 2002), pp. 446–453.