Abstract
The k-means algorithm is a well-known method for partitioning n points that lie in d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its running time, \(n^{O(kd)}\), is in general exponential in the number of points (when \(kd=\varOmega (n/\log n)\)). Recently Arthur and Vassilvitskii (Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153, 2006) showed a super-polynomial worst-case lower bound, improving the best known bound from \(\varOmega (n)\) to \(2^{\varOmega (\sqrt{n})}\) with a construction in \(d=\varOmega (\sqrt{n})\) dimensions. In the same paper, they also conjectured the existence of super-polynomial lower bounds for any \(d\ge 2\).
Our contribution is twofold: we prove this conjecture and we improve the lower bound, by presenting a simple construction in the plane that leads to the exponential lower bound \(2^{\varOmega (n)}\).
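For readers unfamiliar with the quantity being bounded, the following is a minimal sketch of the standard k-means (Lloyd) iteration that counts how many rounds run before the assignment stops changing. It is an illustration only, not the construction from the paper and not the author's lowerbound.py script. The function name `lloyd_iterations`, the tie-breaking rule (lowest center index wins), and the handling of empty clusters (the center stays where it is) are assumptions made for this sketch.

```python
def lloyd_iterations(points, centers, max_iter=10_000):
    """Run Lloyd's k-means from the given centers.

    Returns (final_centers, iterations), where iterations counts the rounds
    in which the assignment of points to centers changed.
    """
    centers = [tuple(c) for c in centers]
    assignment = None
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point goes to its nearest center.
        # min() returns the first minimum, so ties go to the lowest index
        # (an assumption of this sketch).
        new_assignment = [
            min(
                range(len(centers)),
                key=lambda j: sum((p - c) ** 2 for p, c in zip(pt, centers[j])),
            )
            for pt in points
        ]
        if new_assignment == assignment:
            # Nothing changed in this round, so the previous round was the last.
            return centers, iteration - 1
        assignment = new_assignment
        # Update step: move each center to the mean of its cluster.
        for j in range(len(centers)):
            members = [pt for pt, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center (assumption)
                dim = len(members[0])
                centers[j] = tuple(
                    sum(m[i] for m in members) / len(members) for i in range(dim)
                )
    return centers, max_iter


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
    final_centers, rounds = lloyd_iterations(pts, [(0, 0), (10, 0)])
    print(final_centers, rounds)
```

On typical inputs such as the four points above, the loop stops after very few rounds. The lower bound stated in the abstract says that there are planar inputs and starting centers for which the number of rounds grows exponentially in n.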
References
Agarwal, P.K., Mustafa, N.H.: k-means projective clustering. In: Proceedings of the 23rd Symposium on Principles of Database Systems, pp. 155–165 (2004)
Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153 (2006)
Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM J. Comput. 39(2), 766–782 (2009)
Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (2009)
Berkhin, P.: Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, USA (2002)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2000)
Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications. In: Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics, vol. 21, p. 768 (1965)
Gibou, F., Fedkiw, R.: A fast hybrid k-means level set algorithm for segmentation. In: Proceedings of the 4th Annual Hawaii International Conference on Statistics and Mathematics, pp. 281–291 (2005)
Har-Peled, S., Sadri, B.: How fast is the k-means method? Algorithmica 41(3), 185–202 (2005)
Inaba, M., Katoh, N., Imai, H.: Variance-based k-clustering algorithms by Voronoi diagrams and randomization. IEICE Trans. Inf. Syst. E83-D(6), 1199–1206 (2000)
Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2–3), 89–112 (2004)
Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–136 (1982)
MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297 (1967)
Manthey, B., Röglin, H.: Improved smoothed analysis of the k-means method. In: Proceedings of the 20th Annual Symposium on Discrete Algorithms, pp. 461–470 (2009)
Spielman, D.A., Teng, S.: Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004)
Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proceedings of the 25th Annual Symposium on Computational Geometry, pp. 324–332 (2009)
Vattani, A.: k-means lower bound implementation, www.cse.ucsd.edu/~avattani/k-means/lowerbound.py
Additional information
A preliminary version of this paper appeared in SoCG 2009 [16].
Rights and permissions
Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Vattani, A. k-means Requires Exponentially Many Iterations Even in the Plane. Discrete Comput Geom 45, 596–616 (2011). https://doi.org/10.1007/s00454-011-9340-1
DOI: https://doi.org/10.1007/s00454-011-9340-1