Computational Aspects of Trimmed Single-Link Clustering

  • Evangelos Tabakis
Part of the Lecture Notes in Statistics book series (LNS, volume 109)


The main drawback of the single-link method is the chaining effect. A few observations lying between clusters can create a chain, i.e. a path of short edges that joins the real clusters and thus makes their single-link distances small. It has been suggested (Tabakis, 1992a) that a density estimator could help by eliminating the low-density observations that create the problem. Since the amount of trimming necessary is not known, we may have to compute and compare several minimal spanning trees using standard algorithms such as Prim’s algorithm. In this paper we are concerned with the computational aspects of the problem. In particular, we prove that trimmed single-link can be applied to n points in O(n²) time, i.e. its time complexity does not exceed that of the classical single-link method. We also discuss the space requirements of the method.
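As a rough illustration of the idea in the abstract, the following Python sketch builds a minimal spanning tree with the dense O(n²) form of Prim's algorithm and then repeats the computation after trimming low-density points. It is not the chapter's algorithm: the k-th nearest neighbour distance used as a density proxy, the function names, and the parameters `k` and `trim_fraction` are all assumptions made for this example. The sketch recomputes the tree from scratch for each trimming level and sorts distances to score density, so it does not reproduce the O(n²) bound claimed for the full method.

```python
import math


def prim_mst(points):
    """Return MST edges (i, j, length) of the complete Euclidean graph on
    `points`, using the dense O(n^2) form of Prim's algorithm."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each point to the tree
    parent = [-1] * n       # tree vertex realising that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the closest point not yet in the tree: O(n).
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Update distances from the remaining points to the tree: O(n).
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def knn_distance(points, k):
    """Distance from each point to its k-th nearest neighbour.
    Assumed density proxy: a large value indicates low density."""
    out = []
    for p in points:
        ds = sorted(math.dist(p, q) for q in points)
        out.append(ds[k])   # ds[0] == 0.0 is the point itself
    return out


def trimmed_longest_edge(points, k, trim_fraction):
    """Drop the `trim_fraction` of points with the lowest estimated density,
    then return the longest MST edge among the remaining points."""
    score = knn_distance(points, k)
    order = sorted(range(len(points)), key=lambda i: score[i])
    n_keep = len(points) - int(trim_fraction * len(points))
    kept = [points[i] for i in sorted(order[:n_keep])]
    return max((length for _, _, length in prim_mst(kept)), default=0.0)


# Two tight clusters joined by a single "bridge" observation.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 0), (10, 1), (11, 0), (11, 1),
       (5.5, 0.5)]
untrimmed = trimmed_longest_edge(pts, k=1, trim_fraction=0.0)
trimmed = trimmed_longest_edge(pts, k=1, trim_fraction=0.12)
```

In this toy data set the bridge point roughly halves the longest edge of the untrimmed tree, which is the chaining effect; once the bridge is trimmed, the longest edge is the true gap between the two clusters.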

Keywords and phrases

Cluster analysis, single-link, minimal spanning tree, computational complexity

AMS 1991 subject classifications

Primary 62H30; secondary 62G35, 90C27




References

  1. Gower, J.C. and Ross, C.J.S. (1969): Minimum spanning trees and single linkage cluster analysis. J. Roy. Statist. Soc. C 18 54–64.
  2. Hartigan, J.A. (1981): Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76 388–394.
  3. Hu, T.C. (1982): Combinatorial Algorithms. Addison-Wesley, Reading.
  4. Kittler, Josef (1976): A locally sensitive method for cluster analysis. Pattern Recognition 8 23–33.
  5. Kittler, Josef (1979): Comments on “Single-link characteristics of a mode-seeking algorithm”. Pattern Recognition 11 71–73.
  6. Lebart, L., Morineau, A. and Warwick, K.M. (1984): Multivariate Descriptive Statistical Analysis: Correspondence Analysis and Related Techniques for Large Matrices. Wiley.
  7. Papadimitriou, Christos H. and Steiglitz, Kenneth (1982): Combinatorial Optimization. Algorithms and Complexity. Prentice-Hall, Englewood Cliffs.
  8. Rozál, G.P.M. and Hartigan, J.A. (1994): The MAP Test for Multimodality. Journal of Classification 11 5–36.
  9. Schaffer, E., Dubes, R. and Jain, A. (1979): Single-link characteristics of a mode-seeking clustering algorithm. Pattern Recognition 11 65–70.
  10. Steele, J. Michael (1988): Growth rates of euclidean minimal spanning trees with power weighted edges. Ann. Prob. 16 1767–1787.
  11. Tabakis, Evangelos (1992a): Visualizing Cluster Structure in High Dimensions. Computing Science and Statistics 24 582–586.
  12. Tabakis, Evangelos (1992b): Assessing cluster structure on the real line using single-link distances. Submitted to Comm. Statist. A, under revision.
  13. Tabakis, Evangelos (1995): On the length of the longest edge of the minimal spanning tree. In From Data to Knowledge: Theoretical and Practical Aspects of Classification, Data Analysis and Knowledge Organization (W. Gaul, D. Pfeifer, eds.), 222–230. Springer-Verlag, Berlin.
  14. Wong, M. Antony (1982): A hybrid clustering method for identifying high-density clusters. J. Amer. Statist. Assoc. 77 841–847.
  15. Wong, M. Antony and Lane, Tom (1983): A kth nearest neighbour clustering procedure. J. Roy. Statist. Soc. B 45 362–368.

Copyright information

© Springer-Verlag New York, Inc. 1996

Authors and Affiliations

  • Evangelos Tabakis, Universität Bayreuth, Germany
