A Bad Instance for k-Means++
k-means++ is a seeding technique for the k-means method with an expected approximation ratio of O(log k), where k denotes the number of clusters. Examples are known on which the expected approximation ratio of k-means++ is Ω(log k), showing that the upper bound is asymptotically tight. However, it remained open whether k-means++ yields an O(1)-approximation with probability 1/poly(k) or even with constant probability. We settle this question and present instances on which k-means++ achieves an approximation ratio of (2/3 − ε)·log k only with exponentially small probability.
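For readers unfamiliar with the seeding step, the following is a minimal sketch of the standard k-means++ rule (D² sampling): the first center is drawn uniformly at random, and each further center is drawn with probability proportional to its squared distance to the nearest center chosen so far. It does not reproduce the instances constructed in the paper. The names `squared_distance`, `kmeans_pp_seeding`, and `kmeans_cost` are hypothetical helpers introduced here for illustration only.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_pp_seeding(points, k, rng=None):
    """Pick k initial centers from `points` by D^2 sampling (illustrative sketch)."""
    rng = rng or random.Random()
    # First center: uniform at random.
    centers = [rng.choice(points)]
    # d2[i] is the squared distance from points[i] to its nearest chosen center.
    d2 = [squared_distance(p, centers[0]) for p in points]
    while len(centers) < k:
        if sum(d2) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            # (This fallback is an assumption of the sketch.)
            centers.append(rng.choice(points))
        else:
            # Sample an index with probability proportional to d2.
            idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
            centers.append(points[idx])
        # Update nearest-center distances with the newly added center.
        d2 = [min(d, squared_distance(p, centers[-1])) for d, p in zip(d2, points)]
    return centers


def kmeans_cost(points, centers):
    """k-means objective: sum of squared distances to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)
```

The approximation ratio discussed above compares `kmeans_cost` for the sampled centers against the cost of an optimal set of k centers. The result of the paper concerns how rarely this ratio falls below (2/3 − ε)·log k on the constructed instances.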