Abstract
We propose an extension of hierarchical clustering methods, called multiparameter hierarchical clustering methods, that are designed to be sensitive to density while retaining desirable theoretical properties. The input to the proposed method is a triple (X, d, f), where (X, d) is a finite metric space and f : X → ℝ is a function defined on the data X; f could be a density estimate or could represent some other type of information. The output of our method is more general than a dendrogram in that it tracks two parameters: the usual scale parameter and a parameter related to the function f. Our construction is motivated by the methods of persistent topology (Edelsbrunner et al. 2000), the Reeb graph, and cluster trees (Stuetzle 2003). We present both a characterization and a stability theorem.
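The abstract does not spell out the construction. The sketch below is one plausible reading of "tracking two parameters", under an assumption of ours that may not match the authors' exact definition. For a pair (ε, α), restrict X to the sublevel set X_α = {x ∈ X : f(x) ≤ α}. Then take single-linkage clusters at scale ε: two points share a cluster if a chain of points of X_α joins them with consecutive distances at most ε. Under this reading, every cluster for (ε, α) lies inside a cluster for any (ε', α') with ε ≤ ε' and α ≤ α'. This nesting is what produces a two-parameter family rather than a single dendrogram. The function name multiparameter_clusters and the example values are hypothetical.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, TypeVar

T = TypeVar("T")


def multiparameter_clusters(
    points: Sequence[T],
    d: Callable[[T, T], float],
    f: Callable[[T], float],
    eps: float,
    alpha: float,
) -> List[List[int]]:
    """Hypothetical sketch: single-linkage clusters at scale eps of the
    sublevel set {x : f(x) <= alpha}. Returns sorted lists of point indices.
    (Assumed construction, not necessarily the paper's definition.)"""
    # Keep only the points whose function value is at most alpha.
    kept = [i for i, x in enumerate(points) if f(x) <= alpha]
    # Union-find structure over the indices of the kept points.
    parent: Dict[int, int] = {i: i for i in kept}

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of kept points at distance <= eps (single linkage).
    for i, j in combinations(kept, 2):
        if d(points[i], points[j]) <= eps:
            ri, rj = find(i), find(j)
            if ri != rj:
                parent[ri] = rj

    # Group kept indices by their root to obtain the partition.
    blocks: Dict[int, List[int]] = {}
    for i in kept:
        blocks.setdefault(find(i), []).append(i)
    return sorted(sorted(b) for b in blocks.values())


if __name__ == "__main__":
    pts = [0.0, 1.0, 2.0, 5.0, 6.0]

    def dist(a: float, b: float) -> float:
        return abs(a - b)

    # Hypothetical function values (think "low value = high density").
    fvals = {0.0: 0.0, 1.0: 1.0, 2.0: 0.0, 5.0: 0.0, 6.0: 0.0}
    print(multiparameter_clusters(pts, dist, fvals.__getitem__, 1.0, 1.0))
    print(multiparameter_clusters(pts, dist, fvals.__getitem__, 1.0, 0.5))
```

In this toy example, lowering α from 1.0 to 0.5 drops the point 1.0. That point was the only link between 0.0 and 2.0 at scale ε = 1, so the first cluster breaks apart. Raising ε to 2 merges 0.0 and 2.0 again. This illustrates how the second parameter adds sensitivity to f on top of the usual scale parameter.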
References
Ben-David, S., von Luxburg, U., & Pál, D. (2006). A sober look at clustering stability. In G. Lugosi & H.-U. Simon (Eds.), COLT, Vol. 4005 of Lecture Notes in Computer Science (pp. 5–19). Berlin, Heidelberg, New York: Springer.
Biau, G., Cadre, B., & Pelletier, B. (2007). A graph-based estimator of the number of clusters. ESAIM: Probability and Statistics, 11, 272–280.
Burago, D., Burago, Y., & Ivanov, S. (2001). A course in metric geometry, Vol. 33 of Graduate Studies in Mathematics. American Mathematical Society.
Carlsson, G., & Mémoli, F. (2008). Persistent clustering and a theorem of J. Kleinberg. arXiv e-prints.
Cuevas, A., Febrero, M., & Fraiman, R. (2001). Cluster analysis: A further approach based on density estimation. Computational Statistics and Data Analysis, 36(4), 441–459.
Edelsbrunner, H., Letscher, D., & Zomorodian, A. (2000). Topological persistence and simplification. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science (pp. 454–463).
Ester, M., Kriegel, H.-P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. In Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96) (pp. 226–231). Menlo Park, CA, USA: AAAI Press.
Hartigan, J. A. (1975). Clustering algorithms. Wiley Series in Probability and Mathematical Statistics. New York, London, Sydney: Wiley.
Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76(374), 388–394.
Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. Wiley Series in Probability and Mathematical Statistics. London: Wiley.
Kleinberg, J. M. (2002). An impossibility theorem for clustering. In S. Becker, S. Thrun, & K. Obermayer (Eds.), NIPS (pp. 446–453). Cambridge, MA: MIT Press.
Klemelä, J. (2004). Visualization of multivariate density estimates with level set trees. Journal of Computational and Graphical Statistics, 13(3), 599–620.
Lance, G. N., & Williams, W. T. (1967). A general theory of classificatory sorting strategies: 1. Hierarchical systems. Computer Journal, 9(4), 373–380.
Mac Lane, S. (1998). Categories for the working mathematician (2nd ed.), Vol. 5 of Graduate Texts in Mathematics. New York: Springer-Verlag.
Stuetzle, W. (2003). Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of Classification, 20(1), 25–47.
Stuetzle, W., & Nugent, R. (2008). A generalized single linkage method for estimating the cluster tree of a density.
Wishart, D. (1969). Mode analysis: A generalization of nearest neighbor which reduces chaining effects. In Numerical Taxonomy (pp. 282–311). London: Academic Press.
Wong, M. A., & Lane, T. (1983). A kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society: Series B, 45(3), 362–368.
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Carlsson, G., Mémoli, F. (2010). Multiparameter Hierarchical Clustering Methods. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_6
DOI: https://doi.org/10.1007/978-3-642-10745-0_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10744-3
Online ISBN: 978-3-642-10745-0
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)