Abstract
We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems, namely those that satisfy two properties: existence of a random sampling procedure, and tightness. We prove these properties for the k-median and the discrete k-means clustering problems, which yields (1+ε)-approximation algorithms for these problems with running time $O(2^{(k/\varepsilon)^{O(1)}} dn)$. These are the first algorithms for these problems that are linear in the size of the input (nd for n points in d dimensions) and whose exponent is independent of the dimension, assuming k and ε to be fixed. A key ingredient of the k-median result is a (1+ε)-approximation algorithm for the 1-median problem with running time $O(2^{(1/\varepsilon)^{O(1)}} d)$. The previous best known algorithm for this problem had linear running time.
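To make the 1-median objective and the role of random sampling concrete, here is a minimal Python sketch. It is not the algorithm of the paper: it draws a small uniform sample and runs Weiszfeld's iteration on that sample only, a standard heuristic swapped in for illustration, and it carries no (1+ε) guarantee. The function names `median_cost`, `weiszfeld` and `sampled_one_median`, as well as the `sample_size` and `iterations` parameters, are assumptions introduced here. What the sketch does show is why a sampling-based candidate center can be computed in time that depends on the sample size and d but not on n.

```python
import math
import random


def median_cost(points, center):
    """1-median objective: sum of Euclidean distances from the points to center."""
    return sum(math.dist(p, center) for p in points)


def weiszfeld(points, iterations=50, eps=1e-12):
    """Weiszfeld's iteration for the geometric median of a (small) point set.

    Illustrative heuristic only; distances are clamped at eps to avoid
    division by zero when the current center coincides with a data point.
    """
    dim = len(points[0])
    # Start from the centroid of the points.
    center = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
    for _ in range(iterations):
        weights = [1.0 / max(math.dist(p, center), eps) for p in points]
        total = sum(weights)
        center = tuple(
            sum(w * p[i] for w, p in zip(weights, points)) / total
            for i in range(dim)
        )
    return center


def sampled_one_median(points, sample_size, rng):
    """Candidate 1-median computed from a uniform random sample only.

    The work after sampling depends on sample_size and the dimension,
    not on the total number of points.
    """
    sample = rng.sample(points, min(sample_size, len(points)))
    return weiszfeld(sample)


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical data: 10,000 points in 5 dimensions.
    data = [tuple(rng.gauss(0.0, 1.0) for _ in range(5)) for _ in range(10_000)]
    candidate = sampled_one_median(data, sample_size=64, rng=rng)
    print("cost of sampled candidate:", median_cost(data, candidate))
```

Evaluating `median_cost` over all n points, as the usage example does, is itself linear in n; only the construction of the candidate avoids touching the full input.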
References
Arora, S.: Polynomial time approximation schemes for Euclidean TSP and other geometric problems. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 2–11 (1996)
Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 106–113 (1998)
Badoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 250–257 (2002)
Bern, M., Eppstein, D.: Approximation algorithms for geometric problems. In: Approximation algorithms for NP-Hard problems, pp. 296–345. PWS Publishing Company (1997)
Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the Web. In: Proc. of 6th International World Wide Web Conference, pp. 391–404 (1997)
de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pp. 50–58 (2003)
Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2001)
Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W., Petkovic, D., Equitz, W.: Efficient and effective querying by image content. Journal of Intelligent Information Systems 3(3), 231–262 (1994)
Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291–300 (2004)
Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In: Proceedings of the tenth annual symposium on Computational Geometry, pp. 332–339 (1994)
Indyk, P.: High Dimensional Computational Geometry. Ph.D. Thesis. Department of Computer Science, Stanford University (2004)
Kolliopoulos, S., Rao, S.: A nearly linear time approximation scheme for the Euclidean k-medians problem. In: Proceedings of the 7th European Symposium on Algorithms, pp. 362–371 (1999)
Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual Symposium on Foundations of Computer Science, pp. 454–462 (2004)
Matousek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24, 61–84 (2000)
Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision, 11–32 (1991)
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Kumar, A., Sabharwal, Y., Sen, S. (2005). Linear Time Algorithms for Clustering Problems in Any Dimensions. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds) Automata, Languages and Programming. ICALP 2005. Lecture Notes in Computer Science, vol 3580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11523468_111
DOI: https://doi.org/10.1007/11523468_111
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-27580-0
Online ISBN: 978-3-540-31691-6
eBook Packages: Computer Science, Computer Science (R0)