Discrete & Computational Geometry, Volume 45, Issue 4, pp 596–616

k-means Requires Exponentially Many Iterations Even in the Plane

  • Andrea Vattani
Open Access

Abstract

The k-means algorithm is a well-known method for partitioning n points in d-dimensional space into k clusters. Its main features are simplicity and speed in practice. Theoretically, however, the best known upper bound on its running time, \(n^{O(kd)}\), is in general exponential in the number of points (when \(kd=\varOmega (n/\log n)\)). Recently Arthur and Vassilvitskii (Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153, 2006) gave a super-polynomial worst-case analysis, improving the best known lower bound from \(\varOmega (n)\) to \(2^{\varOmega (\sqrt{n})}\) with a construction in \(d=\varOmega (\sqrt{n})\) dimensions. In the same paper, they also conjectured the existence of super-polynomial lower bounds for any d≥2.
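
The claim that the upper bound is exponential in that regime follows from a short calculation. The sketch below writes the constants hidden in the O and Ω notation as c and c′; these two symbols are assumptions introduced only for this illustration.

\[
n^{c\,kd} \;=\; 2^{c\,kd\,\log_2 n} \;\ge\; 2^{c\,c'\,n}
\qquad\text{whenever}\qquad kd \;\ge\; c'\,\frac{n}{\log_2 n}.
\]

So when kd grows at least as fast as n/log n, the bound \(n^{O(kd)}\) is itself of the form \(2^{\varOmega (n)}\) and does not rule out an exponential number of iterations.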

Our contribution is twofold: we prove this conjecture and we improve the lower bound, by presenting a simple construction in the plane that leads to the exponential lower bound \(2^{\varOmega (n)}\).
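
For orientation, here is a minimal Python sketch of the standard k-means (Lloyd) iteration, instrumented to count how many iterations change the clustering before it stabilizes. It is not the lower-bound construction of the paper and not the author's implementation; the function names (`lloyd`, `nearest_center`, `mean`), the lowest-index tie-breaking rule, the choice to leave an empty cluster's center in place, and the `max_iterations` cap are all assumptions made for this illustration.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_center(p: Sequence[float], centers: Sequence[Point]) -> int:
    """Index of the closest center; ties go to the lowest index (an assumption)."""
    return min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))


def mean(points: Sequence[Point]) -> Point:
    """Coordinate-wise mean (center of mass) of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def lloyd(
    points: Sequence[Point],
    centers: Sequence[Point],
    max_iterations: int = 10_000,
) -> Tuple[List[Point], List[int], int]:
    """Run k-means from the given initial centers.

    Returns (final centers, final assignment, number of iterations in which
    the assignment changed). The cap max_iterations is a safety net only.
    """
    current = [tuple(c) for c in centers]
    assignment: Optional[List[int]] = None
    for iteration in range(1, max_iterations + 1):
        # Assignment step: each point joins the cluster of its nearest center.
        new_assignment = [nearest_center(p, current) for p in points]
        if new_assignment == assignment:
            # Nothing moved, so the clustering is stable.
            return current, assignment, iteration - 1
        assignment = new_assignment
        # Update step: each center moves to the mean of its cluster.
        for j in range(len(current)):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:  # an empty cluster keeps its old center (an assumption)
                current[j] = mean(members)
    return current, assignment, max_iterations


if __name__ == "__main__":
    pts = [(0.0, 0.0), (1.0, 0.0), (9.0, 0.0), (10.0, 0.0)]
    final_centers, labels, steps = lloyd(pts, [(0.0, 0.0), (1.0, 0.0)])
    print(final_centers, labels, steps)
```

On small, well-separated inputs like the one in the usage lines, the loop stabilizes after a couple of iterations; the result stated in the abstract is that carefully chosen planar inputs force exponentially many.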

Keywords

k-means, Local search, Lower bounds

References

  1. Agarwal, P.K., Mustafa, N.H.: k-means projective clustering. In: Proceedings of the 23rd Symposium on Principles of Database Systems, pp. 155–165 (2004)
  2. Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the 22nd Annual Symposium on Computational Geometry, pp. 144–153 (2006)
  3. Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. SIAM J. Comput. 39(2), 766–782 (2009)
  4. Arthur, D., Manthey, B., Röglin, H.: k-means has polynomial smoothed complexity. In: Proceedings of the 50th Annual IEEE Symposium on Foundations of Computer Science (2009)
  5. Berkhin, P.: Survey of clustering data mining techniques. Technical report, Accrue Software, San Jose, CA, USA (2002)
  6. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley, New York (2000)
  7. Forgy, E.W.: Cluster analysis of multivariate data: efficiency versus interpretability of classifications. In: Biometric Society Meeting, Riverside, California, 1965. Abstract in Biometrics, vol. 21, p. 768 (1965)
  8. Gibou, F., Fedkiw, R.: A fast hybrid k-means level set algorithm for segmentation. In: Proceedings of the 4th Annual Hawaii International Conference on Statistics and Mathematics, pp. 281–291 (2005)
  9. Har-Peled, S., Sadri, B.: How fast is the k-means method? Algorithmica 41(3), 185–202 (2005)
  10. Inaba, M., Katoh, N., Imai, H.: Variance-based k-clustering algorithms by Voronoi diagrams and randomization. IEICE Trans. Inf. Syst. E83-D(6), 1199–1206 (2000)
  11. Kanungo, T., Mount, D.M., Netanyahu, N.S., Piatko, C.D., Silverman, R., Wu, A.Y.: A local search approximation algorithm for k-means clustering. Comput. Geom. 28(2–3), 89–112 (2004)
  12. Lloyd, S.P.: Least squares quantization in PCM. IEEE Trans. Inf. Theory 28(2), 129–136 (1982)
  13. MacQueen, J.B.: Some methods for classification and analysis of multivariate observations. In: Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, pp. 281–297 (1967)
  14. Manthey, B., Röglin, H.: Improved smoothed analysis of the k-means method. In: Proceedings of the 20th Annual Symposium on Discrete Algorithms, pp. 461–470 (2009)
  15. Spielman, D.A., Teng, S.: Smoothed analysis of algorithms: why the simplex algorithm usually takes polynomial time. J. ACM 51(3), 385–463 (2004)
  16. Vattani, A.: k-means requires exponentially many iterations even in the plane. In: Proceedings of the 25th Annual Symposium on Computational Geometry, pp. 324–332 (2009)
  17. Vattani, A.: k-means lower bound implementation, www.cse.ucsd.edu/~avattani/k-means/lowerbound.py

Copyright information

© The Author(s) 2011

Authors and Affiliations

  1. University of California, San Diego, La Jolla, USA