Linear Time Algorithms for Clustering Problems in Any Dimensions

  • Amit Kumar
  • Yogish Sabharwal
  • Sandeep Sen
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3580)

Abstract

We generalize the k-means algorithm presented by the authors [14] and show that the resulting algorithm can solve a larger class of clustering problems that satisfy certain properties (existence of a random sampling procedure and tightness). We prove these properties for the k-median and the discrete k-means clustering problems, resulting in O(2^((k/ε)^O(1)) dn) time (1+ε)-approximation algorithms for these problems. These are the first algorithms for these problems linear in the size of the input (nd for n points in d dimensions), independent of dimensions in the exponent, assuming k and ε to be fixed. A key ingredient of the k-median result is a (1+ε)-approximation algorithm for the 1-median problem which has running time O(2^((1/ε)^O(1)) d). The previous best known algorithm for this problem had linear running time.
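As a point of reference for the objectives named in the abstract, the following is a minimal Python sketch of the standard cost functions: k-median sums distances to the nearest center, k-means sums squared distances, and the discrete variant restricts centers to input points. The function names and the `sampled_one_median` heuristic are hypothetical illustrations and are not the authors' algorithm; in particular, this heuristic evaluates each candidate against all n points, so it does not achieve the n-independent 1-median running time stated above.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_median_cost(points, centers):
    """Sum over all points of the distance to the nearest center."""
    return sum(min(math.sqrt(sq_dist(p, c)) for c in centers) for p in points)


def k_means_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center.

    For the discrete k-means problem, `centers` must be chosen from `points`.
    """
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def sampled_one_median(points, sample_size, rng):
    """Hypothetical heuristic, not the paper's method: draw a random sample
    of input points and return the sampled point with the lowest 1-median
    cost over the full input."""
    candidates = rng.sample(points, min(sample_size, len(points)))
    return min(candidates, key=lambda c: k_median_cost(points, [c]))
```

With the five points (0,0), (2,0), (0,2), (2,2), (1,1) and the single center (1,1), each corner lies at distance √2 from the center, so the 1-median cost is 4√2 and the 1-means cost is 8.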

Keywords

Approximation Algorithm; Cluster Problem; Linear Time Algorithm; Constant Probability; Random Sampling Procedure
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.

References

  1. Arora, S.: Polynomial time approximation schemes for Euclidean TSP and other geometric problems. In: Proceedings of the 37th Annual Symposium on Foundations of Computer Science, pp. 2–11 (1996)
  2. Arora, S., Raghavan, P., Rao, S.: Approximation schemes for Euclidean k-medians and related problems. In: Proceedings of the thirtieth annual ACM symposium on Theory of computing, pp. 106–113 (1998)
  3. Badoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the thirty-fourth annual ACM symposium on Theory of computing, pp. 250–257 (2002)
  4. Bern, M., Eppstein, D.: Approximation algorithms for geometric problems. In: Approximation algorithms for NP-Hard problems, pp. 296–345. PWS Publishing Company (1997)
  5. Broder, A., Glassman, S., Manasse, M., Zweig, G.: Syntactic clustering of the Web. In: Proc. of 6th International World Wide Web Conference, pp. 391–404 (1997)
  6. de la Vega, W.F., Karpinski, M., Kenyon, C., Rabani, Y.: Approximation schemes for clustering problems. In: Proceedings of the thirty-fifth annual ACM symposium on Theory of computing, pp. 50–58 (2003)
  7. Deerwester, S., Dumais, S.T., Landauer, T.K., Furnas, G.W., Harshman, R.A.: Indexing by latent semantic analysis. Journal of the American Society for Information Science 41(6), 391–407 (1990)
  8. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-Interscience, New York (2001)
  9. Faloutsos, C., Barber, R., Flickner, M., Hafner, J., Niblack, W., Petkovic, D., Equitz, W.: Efficient and effective querying by image content. Journal of Intelligent Information Systems 3(3), 231–262 (1994)
  10. Har-Peled, S., Mazumdar, S.: On coresets for k-means and k-median clustering. In: Proceedings of the thirty-sixth annual ACM symposium on Theory of computing, pp. 291–300 (2004)
  11. Inaba, M., Katoh, N., Imai, H.: Applications of weighted Voronoi diagrams and randomization to variance-based k-clustering. In: Proceedings of the tenth annual symposium on Computational Geometry, pp. 332–339 (1994)
  12. Indyk, P.: High Dimensional Computational Geometry. Ph.D. Thesis. Department of Computer Science, Stanford University (2004)
  13. Kolliopoulos, S., Rao, S.: A nearly linear time approximation scheme for the Euclidean k-medians problem. In: Proceedings of the 7th European Symposium on Algorithms, pp. 362–371 (1999)
  14. Kumar, A., Sabharwal, Y., Sen, S.: A simple linear time (1 + ε)-approximation algorithm for k-means clustering in any dimensions. In: Proceedings of the 45th Annual Symposium on Foundations of Computer Science, pp. 454–462 (2004)
  15. Matousek, J.: On approximate geometric k-clustering. Discrete and Computational Geometry 24, 61–84 (2000)
  16. Swain, M.J., Ballard, D.H.: Color indexing. International Journal of Computer Vision, 11–32 (1991)

Copyright information

© Springer-Verlag Berlin Heidelberg 2005

Authors and Affiliations

  • Amit Kumar (1)
  • Yogish Sabharwal (2)
  • Sandeep Sen (3)
  1. Dept of Comp Sc & Engg, Indian Institute of Technology, New Delhi, India
  2. IBM India Research Lab, Block-I, IIT Delhi, Hauz Khas, New Delhi, India
  3. Dept of Comp Sc & Engg, Indian Institute of Technology, Kharagpur, India
