Abstract
The k-means algorithm is a well-known method for partitioning n points that lie in d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its running time (i.e., \(n^{O(kd)}\)) is, in general, exponential in the number of points (when \(kd=\varOmega(n/\log n)\)). Recently, Arthur and Vassilvitskii (Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153, 2006) presented a super-polynomial worst-case analysis, improving the best known lower bound from \(\varOmega(n)\) to \(2^{\varOmega (\sqrt{n})}\) with a construction in \(d=\varOmega (\sqrt{n})\) dimensions. In the same paper, they also conjectured the existence of super-polynomial lower bounds for any \(d\ge 2\).
Our contribution is twofold: we prove this conjecture and we improve the lower bound, by presenting a simple construction in the plane that leads to the exponential lower bound \(2^{\varOmega(n)}\).
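The statement that the \(n^{O(kd)}\) upper bound is exponential in the number of points once \(kd=\varOmega(n/\log n)\) can be checked with a one-line rewriting. Here \(c>0\) and \(c'>0\) are our own names for the constants hidden in the \(O(\cdot)\) and \(\varOmega(\cdot)\) notation, and logarithms are taken base 2 (with \(n\ge 2\)):

```latex
n^{c\,kd} \;=\; 2^{c\,kd\,\log_2 n} \;\ge\; 2^{c\,c'\,n}
\qquad\text{whenever}\qquad kd \;\ge\; c'\,\frac{n}{\log_2 n}.
```

So in that regime the upper bound is of the form \(2^{\varOmega(n)}\), which is why a matching \(2^{\varOmega(n)}\) lower bound, even for \(d=2\), closes most of the gap in the exponent.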
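For readers who want to see what is being counted, the following Python sketch runs the standard Lloyd-style k-means iteration (assign every point to its nearest center, then move every center to the centroid of its cluster) and reports how many rounds change the partition before it stabilizes. It is a minimal illustration under our own assumptions: the helper names (`lloyd_kmeans`, `sq_dist`, `centroid`), lowest-index tie-breaking, and keeping an empty cluster's previous center are hypothetical choices, not details taken from the paper. The toy input is not the paper's planar lower-bound construction.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def centroid(cluster: List[Point]) -> Point:
    """Coordinate-wise mean of a non-empty list of points."""
    dim = len(cluster[0])
    return tuple(sum(p[i] for p in cluster) / len(cluster) for i in range(dim))


def lloyd_kmeans(
    points: List[Point], centers: List[Point], max_rounds: int = 10_000
) -> Tuple[Optional[List[int]], List[Point], int]:
    """Hypothetical helper: run Lloyd's k-means from the given initial centers.

    Returns (assignment, centers, rounds), where `rounds` is the number of
    assignment+update rounds in which the partition changed before it
    stabilized (or max_rounds if it never did).
    """
    centers = [tuple(c) for c in centers]
    assignment: Optional[List[int]] = None
    for rnd in range(1, max_rounds + 1):
        # Assignment step: each point joins its nearest center
        # (min() breaks ties in favour of the lowest center index).
        new_assignment = [
            min(range(len(centers)), key=lambda j: sq_dist(p, centers[j]))
            for p in points
        ]
        if new_assignment == assignment:
            # Partition unchanged, so the centers would not move again.
            return assignment, centers, rnd - 1
        assignment = new_assignment
        # Update step: move each center to the centroid of its cluster.
        # Assumption: an empty cluster keeps its previous center.
        for j in range(len(centers)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = centroid(members)
    return assignment, centers, max_rounds


if __name__ == "__main__":
    # Toy planar input (illustrative only, not the paper's construction).
    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (11.0, 0.0)]
    labels, final_centers, rounds = lloyd_kmeans(pts, [(0.0, 0.0), (1.0, 0.0)])
    print(rounds, labels, final_centers)  # 2 [0, 0, 1, 1] [(0.5, 0.0), (10.5, 0.0)]
```

On inputs like this one the loop stops after a couple of rounds, consistent with the method's reputation for speed in practice; the worst-case bounds discussed above concern how large this round count can be forced to grow, and the construction in the plane makes it \(2^{\varOmega(n)}\).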
References
Agarwal, P.K., Mustafa, N.H.: k-means projective clustering. In: Proceedings of the 23rd Symposium on Principles of Database Systems, pp. 155–165 (2004)
Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153 (2006)
Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM J. Comput. 39(2), 766–782 (2009)
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (2009)
Berkhin, P.: Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, USA (2002)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2000)
Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications. In: Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics, vol. 21, p. 768 (1965)
Gibou, F., Fedkiw, R.: A fast hybrid k-means level set algorithm for segmentation. In: Proceedings of the 4th Annual Hawaii International Conference on Statistics and Mathematics, pp. 281–291 (2005)
Har-Peled, S., Sadri, B.: How fast is the k-means method? Algorithmica 41(3), 185–202 (2005)
Inaba, M., Katoh, N., Imai, H.: Variance-based k-clustering algorithms by Voronoi diagrams and randomization. IEICE Trans. Inf. Syst. E83-D(6), 1199–1206 (2000)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2–3), 89–112 (2004)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–136 (1982)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297 (1967)
Manthey, B., Röglin, H.: Improved smoothed analysis of the k-means method. In: Proceedings of the 20th Annual Symposium on Discrete Algorithms, pp. 461–470 (2009)
Spielman, D.A., Teng, S.: Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004)
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proceedings of the 25th Annual Symposium on Computational Geometry, pp. 324–332 (2009)
Vattani, A.: k-means lower bound implementation, www.cse.ucsd.edu/~avattani/k-means/lowerbound.py
Additional information
A preliminary version of this paper appeared in SoCG 2009 [16].
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Vattani, A. k-means Requires Exponentially Many Iterations Even in the Plane. Discrete Comput Geom 45, 596–616 (2011). https://doi.org/10.1007/s00454-011-9340-1