Abstract
It is well known that the clusters produced by a clustering algorithm depend on the chosen initial centers. In this paper we present a measure of the degree to which a given clustering algorithm depends on the choice of initial centers for a given data set. We calculate this measure for four well-known offline clustering algorithms (k-means Forgy, k-means Hartigan, k-means Lloyd and fuzzy c-means) on five benchmark data sets. We also calculate it for ECM, an online algorithm that does not require the number of initial centers as input, but whose resulting clusters can depend on the order in which the input arrives. Our main finding is that this initialization dependence measure can also be used to determine the optimal number of clusters.
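The abstract does not define the measure, so the following is only an illustrative sketch of how initialization dependence could be quantified, not the authors' method. It assumes a plain Lloyd-style k-means, random initial centers drawn from the data, and a score equal to one minus the mean pairwise Rand index over repeated runs. The names `lloyd_kmeans`, `rand_index` and `initialization_dependence` are hypothetical.

```python
import itertools
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_kmeans(points, k, rng, max_iters=100):
    """Lloyd-style k-means; initial centers are k data points sampled by rng."""
    centers = rng.sample(points, k)
    labels = None
    for _ in range(max_iters):
        # Assignment step: each point joins its nearest center.
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centers[j]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # an empty cluster keeps its previous center
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree (same/different cluster)."""
    pairs = list(itertools.combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def initialization_dependence(points, k, runs=10, seed=0):
    """Assumed score: 1 - mean pairwise Rand index over runs with random inits.

    0.0 means every run produced the same partition; larger values mean the
    result varies more with the initial centers.
    """
    rng = random.Random(seed)
    partitions = [lloyd_kmeans(points, k, rng) for _ in range(runs)]
    scores = [rand_index(p, q) for p, q in itertools.combinations(partitions, 2)]
    return 1.0 - sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy data (an assumption for illustration): two well-separated groups.
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    for k in (2, 3):
        print(k, initialization_dependence(data, k))
```

Under these assumptions, evaluating the score over a range of k and looking for values of k where it is low is one way to explore the abstract's claim that initialization dependence can help choose the number of clusters.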
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
De Mulder, W., Schliebs, S., Boel, R., Kuiper, M. (2009). Initialization Dependence of Clustering Algorithms. In: Köppen, M., Kasabov, N., Coghill, G. (eds) Advances in Neuro-Information Processing. ICONIP 2008. Lecture Notes in Computer Science, vol 5507. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03040-6_75
DOI: https://doi.org/10.1007/978-3-642-03040-6_75
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03039-0
Online ISBN: 978-3-642-03040-6
eBook Packages: Computer Science, Computer Science (R0)