Linear Time Algorithms for Clustering Problems in Any Dimensions

  • Conference paper
Automata, Languages and Programming (ICALP 2005)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 3580)

Abstract

We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in O(2^{(k/ε)^{O(1)}} dn) time (1+ε)-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input (nd for n points in d dimensions), independent of dimensions in the exponent, assuming k and ε to be fixed. A key ingredient of the k-median result is a (1+ε)-approximation algorithm for the 1-median problem which has running time O(2^{(1/ε)^{O(1)}} d). The previous best known algorithm for this problem had linear running time.
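To make the objectives named in the abstract concrete, here is a minimal Python sketch of the k-median cost (sum of distances to the nearest center), the k-means cost (sum of squared distances), and the discrete k-means variant, where centers must be chosen from the input points. The function names are hypothetical, and the brute-force search is only an illustration for tiny inputs: it is exponential in k and is not the algorithm of the paper.

```python
import math
from itertools import combinations

def dist(p, q):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def k_median_cost(points, centers):
    """Sum over all points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)

def k_means_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(dist(p, c) ** 2 for c in centers) for p in points)

def brute_force_discrete_k_means(points, k):
    """Try every k-subset of the input as centers (illustration only)."""
    best = min(combinations(points, k),
               key=lambda cs: k_means_cost(points, cs))
    return list(best), k_means_cost(points, best)

# Two well-separated pairs of points in the plane.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]

# Discrete k-means: centers restricted to input points.
centers, cost = brute_force_discrete_k_means(points, 2)

# Unrestricted centers (the pair midpoints) give a lower k-means cost.
free_cost = k_means_cost(points, [(0.0, 0.5), (10.0, 0.5)])
```

On this input the discrete optimum costs 2.0 while the midpoint centers cost 1.0, which shows why the discrete and unrestricted versions are treated as separate problems.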

References

  1. Arora, S.: Polynomial time approximation schemes for Euclidean TSP and other geometric problems. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 2–11 (1996)

  2. Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 106–113 (1998)

  3. Badoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 250–257 (2002)

  4. Bern, M., Eppstein, D.: Approximation algorithms for geometric problems. In: Approximating algorithms for NP-Hard problems, pp. 296–345. PWS Publishing Company (1997)

  5. Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the Web. In: Proc. of 6th International World Wide Web Conference, pp. 391–404 (1997)

  6. de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pp. 50–58 (2003)

  7. Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)

  8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2001)

  9. Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W., Petkovic, D., Equitz, W.: Efficient and effective querying by image content. Journal of Intelligent Information Systems 3(3), 231–262 (1994)

  10. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291–300 (2004)

  11. Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In: Proceedings of the tenth annual symposium on Computational Geometry, pp. 332–339 (1994)

  12. Indyk, P.: High Dimensional Computational Geometry. Ph.D. Thesis. Department of Computer Science, Stanford University (2004)

  13. Kolliopoulos, S., Rao, S.: A nearly linear time approximation scheme for the Euclidean k-medians problem. In: Proceedings of the 7th European Symposium on Algorithms, pp. 362–371 (1999)

  14. Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual Symposium on Foundations of Computer Science, pp. 454–462 (2004)

  15. Matousek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24, 61–84 (2000)

  16. Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision, 11–32 (1991)

Copyright information

© 2005 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Kumar, A., Sabharwal, Y., Sen, S. (2005). Linear Time Algorithms for Clustering Problems in Any Dimensions. In: Caires, L., Italiano, G.F., Monteiro, L., Palamidessi, C., Yung, M. (eds) Automata, Languages and Programming. ICALP 2005. Lecture Notes in Computer Science, vol 3580. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11523468_111

  • DOI: https://doi.org/10.1007/11523468_111

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-27580-0

  • Online ISBN: 978-3-540-31691-6

  • eBook Packages: Computer Science, Computer Science (R0)
