Abstract
The k-means++ seeding algorithm is one of the most popular methods for choosing the initial k centers for the k-means heuristic. It is a simple sampling procedure that can be described as follows:
Pick the first center uniformly at random from the given points. For i > 1, pick a point to be the i-th center with probability proportional to the square of its Euclidean distance to the closest of the (i − 1) previously chosen centers.
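The sampling rule above (often called D² sampling) picks each point x with probability D(x)² / Σ_y D(y)², where D(x) is the distance from x to its nearest already-chosen center. A minimal sketch of it follows, assuming points are given as coordinate tuples; the names `sq_dist` and `kmeanspp_seed` are hypothetical and not taken from the paper.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(p: Sequence[float], q: Sequence[float]) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeanspp_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """Choose k initial centers from `points` by k-means++ (D^2) seeding."""
    if not 1 <= k <= len(points):
        raise ValueError("k must satisfy 1 <= k <= len(points)")
    # First center: uniformly at random from the input points.
    centers = [rng.choice(points)]
    # d2[j] = squared distance from points[j] to its closest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    for _ in range(1, k):
        if sum(d2) == 0:
            # Every point coincides with a chosen center; any pick adds no cost.
            new = rng.choice(points)
        else:
            # Pick points[j] with probability d2[j] / sum(d2).
            new = rng.choices(points, weights=d2, k=1)[0]
        centers.append(new)
        # Keep d2 current: distance to the closest of the centers chosen so far.
        d2 = [min(old, sq_dist(p, new)) for old, p in zip(d2, points)]
    return centers
```

Updating `d2` incrementally keeps the total work at O(nk) distance computations. On two far-apart clusters, for example, `kmeanspp_seed(pts, 2, random.Random(0))` almost always returns one center from each cluster, because the second draw is weighted by squared distance.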
The k-means++ seeding algorithm is not only simple and fast but also gives an O(log k) approximation in expectation, as shown by Arthur and Vassilvitskii [7]. There are datasets [7,3] on which this seeding algorithm gives an approximation factor of Ω(log k) in expectation. These results, however, do not show whether the algorithm achieves a good approximation factor with reasonably high probability (say, 1/poly(k)). Brunsch and Röglin [9] gave a dataset on which the k-means++ seeding algorithm achieves an O(log k) approximation ratio only with probability that is exponentially small in k. However, this and all other known lower-bound examples [7,3] are high-dimensional, so an open problem was to understand the behavior of the algorithm on low-dimensional datasets. In this work, we give a simple two-dimensional dataset on which the seeding algorithm achieves an O(log k) approximation ratio only with probability exponentially small in k. This solves open problems posed by Mahajan et al. [13] and by Brunsch and Röglin [9].
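The distinction drawn above, between a bound on the expected approximation ratio and the probability that the ratio is small, can be checked empirically by repeating the seeding and counting how often its cost is at most α·OPT. The sketch below does this. The abstract does not describe the two-dimensional lower-bound instance, so the dataset here is a hypothetical toy example and not the paper's construction; on it, seeding succeeds with probability close to 1, in contrast to the exponentially small probability established for the paper's instance. The helper names `cost`, `d2_seed`, and `success_rate` are illustrative.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def sq_dist(p: Point, q: Point) -> float:
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def cost(points: List[Point], centers: List[Point]) -> float:
    """k-means cost: each point pays its squared distance to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def d2_seed(points: List[Point], k: int, rng: random.Random) -> List[Point]:
    """k-means++ seeding: uniform first pick, then sampling proportional to D^2."""
    centers = [rng.choice(points)]
    while len(centers) < k:
        w = [min(sq_dist(p, c) for c in centers) for p in points]
        if sum(w) == 0:
            centers.append(rng.choice(points))
        else:
            centers.append(rng.choices(points, weights=w, k=1)[0])
    return centers


def success_rate(points, k, opt_cost, alpha, trials=1000, seed=0) -> float:
    """Empirical Pr[cost(seeding) <= alpha * OPT] over independent seeding runs."""
    rng = random.Random(seed)
    hits = sum(cost(points, d2_seed(points, k, rng)) <= alpha * opt_cost
               for _ in range(trials))
    return hits / trials


# Hypothetical toy instance (NOT the paper's construction): k vertical pairs of
# points at distance 1, spaced 100 apart. OPT places one center at each pair's
# midpoint, so OPT = k * 2 * 0.5**2 = k / 2.
k = 5
pts = [(100.0 * i, y) for i in range(k) for y in (0.0, 1.0)]
opt = k / 2
```

Because seeding only picks data points, every unchosen point pays at least 1, so the ratio is always at least 2 on this instance: `success_rate(pts, k, opt, alpha=1.99)` is 0.0, while `alpha=2.0` succeeds in nearly every run.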
References
[1] Ackermann, M.R., Blömer, J.: Bregman clustering for separable instances. In: Kaplan, H. (ed.) SWAT 2010. LNCS, vol. 6139, pp. 212–223. Springer, Heidelberg (2010)
[2] Agarwal, M., Jaiswal, R., Pal, A.: k-means++ under approximation stability. In: Chan, T.-H.H., Lau, L.C., Trevisan, L. (eds.) TAMC 2013. LNCS, vol. 7876, pp. 84–95. Springer, Heidelberg (2013)
[3] Aggarwal, A., Deshpande, A., Kannan, R.: Adaptive sampling for k-means clustering. In: Dinur, I., Jansen, K., Naor, J., Rolim, J. (eds.) APPROX/RANDOM 2009. LNCS, vol. 5687, pp. 15–28. Springer, Heidelberg (2009)
[4] Ailon, N., Jaiswal, R., Monteleoni, C.: Streaming k-means approximation. In: NIPS, pp. 10–18 (2009)
[5] Arthur, D., Vassilvitskii, S.: How slow is the k-means method? In: Proceedings of the Twenty-second Annual Symposium on Computational Geometry, SCG 2006, pp. 144–153. ACM, New York (2006)
[6] Arthur, D., Vassilvitskii, S.: Worst-case and smoothed analysis of the ICP algorithm, with an application to the k-means method. In: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2006, pp. 153–164. IEEE Computer Society, Washington, DC (2006)
[7] Arthur, D., Vassilvitskii, S.: k-means++: the advantages of careful seeding. In: Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2007, pp. 1027–1035. Society for Industrial and Applied Mathematics, Philadelphia (2007)
[8] Bahmani, B., Moseley, B., Vattani, A., Kumar, R., Vassilvitskii, S.: Scalable k-means++. Proc. VLDB Endow. 5(7), 622–633 (2012)
[9] Brunsch, T., Röglin, H.: A bad instance for k-means++. Theoretical Computer Science (2012)
[10] Dasgupta, S.: The hardness of k-means clustering. Technical report, University of California San Diego
[11] Jaiswal, R., Garg, N.: Analysis of k-means++ for separable data. In: Gupta, A., Jansen, K., Rolim, J., Servedio, R. (eds.) APPROX/RANDOM 2012. LNCS, vol. 7408, pp. 591–602. Springer, Heidelberg (2012)
[12] Jaiswal, R., Kumar, A., Sen, S.: A simple D²-sampling based PTAS for k-means and other clustering problems. Algorithmica (2013)
[13] Mahajan, M., Nimbhorkar, P., Varadarajan, K.: The planar k-means problem is NP-hard. Theoretical Computer Science 442, 13–21 (2012); Special Issue on the Workshop on Algorithms and Computation (WALCOM 2009)
[14] Vattani, A.: The planar k-means problem is NP-hard. Manuscript (2009)
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Bhattacharya, A., Jaiswal, R., Ailon, N. (2014). A Tight Lower Bound Instance for k-means++ in Constant Dimension. In: Gopal, T.V., Agrawal, M., Li, A., Cooper, S.B. (eds) Theory and Applications of Models of Computation. TAMC 2014. Lecture Notes in Computer Science, vol 8402. Springer, Cham. https://doi.org/10.1007/978-3-319-06089-7_2
DOI: https://doi.org/10.1007/978-3-319-06089-7_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-06088-0
Online ISBN: 978-3-319-06089-7
eBook Packages: Computer Science, Computer Science (R0)