Computational Aspects of Trimmed Single-Link Clustering
The main drawback of the single-link method is the chaining effect. A few observations lying between clusters can create a chain, i.e. a path of short edges joining the real clusters, which makes the single-link distances between those clusters small. It has been suggested (Tabakis (1992a)) that a density estimator could help by eliminating the low-density observations that create the problem. Since the amount of trimming necessary is not known in advance, we may have to compute and compare several minimal spanning trees using standard algorithms such as Prim’s algorithm. In this paper we are concerned with the computational aspects of the problem. In particular, we prove that trimmed single-link can be applied to n points in O(n^2) time, i.e. its time complexity does not exceed that of the classical single-link method. We also discuss the space requirements of the method.
Keywords and phrases: Cluster analysis, single-link, minimal spanning tree, computational complexity
AMS 1991 subject classifications: Primary 62H30; secondary 62G35, 90C27
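As an illustration of the naive approach the abstract alludes to (recomputing a minimal spanning tree after each amount of trimming), the following Python sketch may help. It is not the paper's algorithm: the function names, the use of the k-th nearest-neighbour distance as a stand-in for a density estimator, and the parameters `k` and `trim_fraction` are all assumptions made for this example. The dense form of Prim's algorithm shown here runs in O(n^2) time with O(n) working space for a single tree; the density proxy as written costs O(n^2 log n).

```python
import math


def prim_mst(points):
    """Dense Prim: MST edges (i, j, length) of the complete Euclidean graph.

    Runs in O(n^2) time and O(n) extra space; no distance matrix is stored.
    """
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n   # shortest known distance from each vertex to the tree
    parent = [-1] * n       # tree vertex realising that distance
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the vertex outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=best.__getitem__)
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u, best[u]))
        # Relax distances of the remaining vertices against the new tree vertex.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v] = d
                    parent[v] = u
    return edges


def knn_radius(points, k):
    """Distance to the k-th nearest neighbour (assumed low-density proxy).

    A large radius suggests low density. Assumes k < len(points).
    """
    radii = []
    for p in points:
        dists = sorted(math.dist(p, q) for q in points)
        radii.append(dists[k])  # dists[0] == 0.0 is the point itself
    return radii


def trimmed_longest_edge(points, k, trim_fraction):
    """Drop the lowest-density points, then return the longest MST edge."""
    radii = knn_radius(points, k)
    n_drop = int(trim_fraction * len(points))
    order = sorted(range(len(points)), key=radii.__getitem__)
    kept = [points[i] for i in order[:len(points) - n_drop]]
    return max((w for _, _, w in prim_mst(kept)), default=0.0)
```

For example, with two unit squares of points centred ten units apart and a single observation midway between them, the longest edge of the untrimmed tree is the short bridge through the middle point, whereas trimming that one low-density point leaves the full gap between the squares as the longest edge. Repeating this for several values of `trim_fraction` rebuilds the tree each time, which is the cost the paper aims to avoid.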
- Lebart, L., Morineau, A. and Warwick, K.M. (1984): Multivariate Descriptive Statistical Analysis: Correspondence Analysis and Related Techniques for Large Matrices. Wiley.
- Tabakis, Evangelos (1992a): Visualizing Cluster Structure in High Dimensions. Computing Science and Statistics 24 582–586.
- Tabakis, Evangelos (1992b): Assessing cluster structure on the real line using single-link distances. Submitted to Comm. Statist. A, under revision.
- Tabakis, Evangelos (1995): On the length of the longest edge of the minimal spanning tree. In From Data to Knowledge: Theoretical and Practical Aspects of Classification, Data Analysis and Knowledge Organization (W. Gaul, D. Pfeifer, eds.), 222–230. Springer-Verlag, Berlin.