Multiparameter Hierarchical Clustering Methods

Carlsson, Gunnar; Mémoli, Facundo

doi:10.1007/978-3-642-10745-0_6


Part of the book series: Studies in Classification, Data Analysis, and Knowledge Organization (STUDIES CLASS)


Abstract

We propose an extension of hierarchical clustering methods, called multiparameter hierarchical clustering methods, which are designed to exhibit sensitivity to density while retaining desirable theoretical properties. The input of the method we propose is a triple (X, d, f), where (X, d) is a finite metric space and f : X → ℝ is a function defined on the data X, which could be a density estimate or could represent some other type of information. The output of our method is more general than a dendrogram in that we track two parameters: the usual scale parameter and a parameter related to the function f. Our construction is motivated by the methods of persistent topology (Edelsbrunner et al. 2000), the Reeb graph, and Cluster Trees (Stuetzle 2003). We present both a characterization and a stability theorem.
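
To make the two-parameter idea concrete, here is a minimal Python sketch of one plausible reading of the abstract. It is not the authors' construction: the chapter body is not available here, so the choice to filter by sublevel sets of f (keeping points with f(x) ≤ sigma) and to cluster the kept points by single linkage at scale eps is an assumption. The names two_parameter_clusters, eps, and sigma are hypothetical.

```python
from itertools import combinations


def two_parameter_clusters(points, dist, f, eps, sigma):
    """Cluster {x : f(x) <= sigma} by single linkage at scale eps.

    Assumption (not taken from the chapter): points are filtered by
    sublevel sets of f, then joined whenever they are within eps.
    """
    kept = [x for x in points if f(x) <= sigma]
    parent = {x: x for x in kept}  # union-find forest over kept points

    def find(x):
        # Follow parent links to the root, compressing the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the components of every pair of kept points within distance eps.
    for a, b in combinations(kept, 2):
        if dist(a, b) <= eps:
            parent[find(a)] = find(b)

    clusters = {}
    for x in kept:
        clusters.setdefault(find(x), []).append(x)
    return sorted(sorted(c) for c in clusters.values())


# Toy data on the real line; the values of f are made up for illustration.
points = [0.0, 1.0, 2.0, 10.0, 11.0]
f_values = {0.0: 0.1, 1.0: 0.5, 2.0: 0.1, 10.0: 0.2, 11.0: 0.9}


def line_dist(a, b):
    return abs(a - b)


for eps, sigma in [(1.0, 1.0), (1.0, 0.3), (2.0, 0.3)]:
    print(eps, sigma, two_parameter_clusters(points, line_dist, f_values.get, eps, sigma))
```

Varying eps with sigma fixed gives an ordinary single-linkage hierarchy on the kept points; varying sigma changes which points take part at all. If f is a density estimate, one might pass its negative so that high-density points enter first, which is again an assumption of this sketch.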


References

Wong, M. A., & Lane, T. (1983). A kth nearest neighbour clustering procedure. Journal of the Royal Statistical Society: Series B, 45(3), 362–368.
Ben-David, S., von Luxburg, U., & Pál, D. (2006). A sober look at clustering stability. In G. Lugosi & H.-U. Simon (Eds.), COLT, volume 4005 of Lecture Notes in Computer Science (pp. 5–19). Berlin, Heidelberg, New York: Springer.
Biau, G., Cadre, B., & Pelletier, B. (2007). A graph-based estimator of the number of clusters. ESAIM Probability and Statistics, 11, 272–280.
Burago, D., Burago, Y., & Ivanov, S. (2001). A course in metric geometry, volume 33 of AMS Graduate Studies in Maths. American Mathematical Society.
Carlsson, G., & Mémoli, F. (2008). Persistent clustering and a theorem of J. Kleinberg. ArXiv e-prints.
Cuevas, A., Febrero, M., & Fraiman, R. (2001). Cluster analysis: A further approach based on density estimation. Computational Statistics and Data Analysis, 36(4), 441–459.
Edelsbrunner, H., Letscher, D., & Zomorodian, A. (2000). Topological persistence and simplification. In Proceedings of the 41st Annual IEEE Symposium on Foundations of Computer Science (pp. 454–463).
Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise, 226–231. Menlo Park, CA, USA: AAAI Press.
Hartigan, J. A. (1975). Clustering algorithms. New York-London-Sydney: Wiley. Wiley Series in Probability and Mathematical Statistics.
Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. Journal of the American Statistical Association, 76(374), 388–394.
Jardine, N., & Sibson, R. (1971). Mathematical taxonomy. London: Wiley. Wiley Series in Probability and Mathematical Statistics.
Kleinberg, J. M. (2002). An impossibility theorem for clustering. In S. Becker, S. Thrun and K. Obermayer (Eds.), NIPS (pp. 446–453). Cambridge, MA: MIT Press.
Klemelä, J. (2004). Visualization of multivariate density estimates with level set trees. Journal of Computational and Graphical Statistics, 13(3), 599–620.
Lance, G. N., & Williams, W. T. (1967). A general theory of classificatory sorting strategies 1. Hierarchical systems. Computer Journal, 9(4), 373–380.
Mac Lane, S. (1998). Categories for the working mathematician (2nd ed.), Vol. 5 of Graduate Texts in Mathematics. New York: Springer-Verlag.
Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. Journal of Classification, 20(1), 25–47.
Stuetzle, W., & Nugent, R. (2008). A generalized single linkage method for estimating the cluster tree of a density.
Wishart, D. (1969). Mode analysis: a generalization of nearest neighbor which reduces chaining effects. In Numerical Taxonomy (pp. 282–311). London: Academic Press.


Author information

Authors and Affiliations

Mathematics Department, Stanford University, Stanford, CA, USA
Facundo Mémoli


Corresponding author

Correspondence to Facundo Mémoli.

Editor information

Editors and Affiliations

LS für BWL, insb. Finanzwirtschaft und, Finanzdienstleistungen, TU Dresden, Helmholtzstr. 10, Dresden, 01062, Germany
Hermann Locarek-Junge
FG Computergestützte Statistik, Univ. Dortmund, Vogelpothsweg 87, Dortmund, 44227, Germany
Claus Weihs


Cite this paper

Carlsson, G., Mémoli, F. (2010). Multiparameter Hierarchical Clustering Methods. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_6


DOI: https://doi.org/10.1007/978-3-642-10745-0_6
Published: 03 May 2010
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10744-3
Online ISBN: 978-3-642-10745-0
eBook Packages: Mathematics and Statistics (R0)
