Improved Analysis of Complete-Linkage Clustering

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9294)

Abstract

Complete-linkage clustering is a very popular method for computing hierarchical clusterings in practice, but it is not fully understood theoretically. Given a finite set P ⊆ ℝ^d of points, the complete-linkage method starts with each point of P in a cluster of its own and then iteratively merges the two clusters of the current clustering whose union has the smallest diameter.
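
The merge rule described above can be sketched in a few lines of Python. This is a naive illustration under stated assumptions, not the authors' implementation: the names `diameter` and `complete_linkage` are hypothetical, the Euclidean norm is chosen only for concreteness, and ties are broken by taking the first minimal pair found.

```python
import math
from itertools import combinations


def diameter(cluster):
    """Largest pairwise Euclidean distance in a cluster (0 for a singleton)."""
    return max((math.dist(p, q) for p, q in combinations(cluster, 2)), default=0.0)


def complete_linkage(points, k):
    """Naive complete-linkage: merge clusters until only k remain.

    Assumption: ties are broken by taking the first minimal pair found.
    """
    # Start with every point in a cluster of its own.
    clusters = [[tuple(p)] for p in points]
    while len(clusters) > k:
        # Pick the pair of clusters whose union has the smallest diameter.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: diameter(clusters[ij[0]] + clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(complete_linkage(pts, 2))
```

Recomputing every candidate diameter in each round keeps the sketch short but makes it slow; it is meant only to make the merge criterion concrete for small inputs.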

We study the problem of partitioning P into k clusters such that the largest diameter of the clusters is minimized, and we prove that the complete-linkage method computes an O(1)-approximation for this problem for any metric that is induced by a norm, assuming that the dimension d is a constant. This improves the best previously known bound of O(log k) due to Ackermann et al. (Algorithmica, 2014). Our improved bound also carries over to the k-center and the discrete k-center problem.
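
To make the objective concrete, the following sketch evaluates the largest cluster diameter of a given partition and finds the optimal value by brute force. It is an illustration only: the names `max_diameter` and `optimal_max_diameter` are hypothetical, the Euclidean norm is an assumption, and the exhaustive search is feasible only for a handful of points. An α-approximation is a clustering whose largest diameter is at most α times this optimum.

```python
import math
from itertools import combinations, product


def max_diameter(clusters):
    """Objective value: the largest pairwise distance inside any cluster."""
    return max(
        max((math.dist(p, q) for p, q in combinations(c, 2)), default=0.0)
        for c in clusters
    )


def optimal_max_diameter(points, k):
    """Brute-force optimum over all partitions into k non-empty clusters.

    Assumption: the input is tiny, since this tries k**len(points) labelings.
    """
    best = math.inf
    for labels in product(range(k), repeat=len(points)):
        if len(set(labels)) != k:
            continue  # skip labelings that leave a cluster empty
        clusters = [
            [p for p, lab in zip(points, labels) if lab == c] for c in range(k)
        ]
        best = min(best, max_diameter(clusters))
    return best


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    print(optimal_max_diameter(pts, 2))
```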

References

  1. Ackermann, M.R., Blömer, J., Kuntze, D., Sohler, C.: Analysis of agglomerative clustering. Algorithmica 69(1), 184–215 (2014)
  2. Cole, J.R., Wang, Q., Fish, J.A., Chai, B., McGarrell, D.M., Sun, Y., Brown, C.T., Porras-Alfaro, A., Kuske, C.R., Tiedje, J.M.: Ribosomal database project: data and tools for high throughput rRNA analysis. Nucleic Acids Research (2013)
  3. Dasgupta, S., Long, P.M.: Performance guarantees for hierarchical clustering. Journal of Computer and System Sciences 70(4), 555–569 (2005)
  4. Defays, D.: An efficient algorithm for a complete link method. The Computer Journal 20(4), 364–366 (1977)
  5. Feder, T., Greene, D.H.: Optimal algorithms for approximate clustering. In: Proc. of the 20th Annual ACM Symposium on Theory of Computing (STOC), pp. 434–444 (1988)
  6. Ghaemmaghami, H., Dean, D., Vogt, R., Sridharan, S.: Speaker attribution of multiple telephone conversations using a complete-linkage clustering approach. In: Proc. of the 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4185–4188 (2012)
  7. Rieck, K., Trinius, P., Willems, C., Holz, T.: Automatic analysis of malware behavior using machine learning. Journal of Computer Security 19(4), 639–668 (2011)

Copyright information

© Springer-Verlag Berlin Heidelberg 2015

Authors and Affiliations

  1. Department of Computer Science, University of Bonn, Bonn, Germany
