Sublinear-Time Approximation for Clustering Via Random Sampling

  • Artur Czumaj
  • Christian Sohler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3142)


In this paper we present a novel analysis of a random sampling approach for three clustering problems in metric spaces: k-median, min-sum k-clustering, and balanced k-median. For all these problems we consider the following simple sampling scheme: select a small sample set of points uniformly at random from the input point set V, and then run some approximation algorithm on this sample set to compute an approximation of the best possible clustering of this set. Our main technical contribution is a significantly strengthened analysis of the approximation guarantee achieved by this scheme for these clustering problems.
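The sampling scheme described above can be illustrated for k-median with a minimal sketch. Everything named here is an assumption made for illustration: the function names, the `sample_size` parameter (left as a free input, not the sample bound derived in the paper), and the exhaustive search used as a stand-in for "some approximation algorithm" on the sample.

```python
import random
from itertools import combinations


def kmedian_cost(points, centers, dist):
    """k-median cost: sum over points of the distance to the nearest center."""
    return sum(min(dist(p, c) for c in centers) for p in points)


def solve_on_sample(sample, k, dist):
    """Stand-in solver (assumption): try every k-subset of the sample.

    Any k-median approximation algorithm could be plugged in here instead;
    exhaustive search is only practical because the sample is small.
    """
    return min(combinations(sample, k),
               key=lambda centers: kmedian_cost(sample, centers, dist))


def sampled_kmedian(points, k, sample_size, dist, rng=None):
    """Draw a uniform random sample from `points` and cluster only the sample.

    `sample_size` is a hypothetical free parameter, not the paper's bound.
    """
    rng = rng or random.Random()
    s = min(sample_size, len(points))
    if k > s:
        raise ValueError("sample must contain at least k points")
    sample = rng.sample(points, s)  # uniform sampling without replacement
    return list(solve_on_sample(sample, k, dist))


if __name__ == "__main__":
    # Two well-separated groups of points on the real line.
    pts = [0, 1, 2, 100, 101, 102]
    d = lambda a, b: abs(a - b)
    centers = sampled_kmedian(pts, k=2, sample_size=4, dist=d,
                              rng=random.Random(0))
    print(centers, kmedian_cost(pts, centers, d))
```

Note that the solver only ever evaluates distances between sampled points, so its work depends on `sample_size` and k rather than on the number of input points; evaluating `kmedian_cost` on the full input, as the demo does, is for inspection only.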

The main motivation behind our analyses was to design sublinear-time algorithms for clustering problems. Our second contribution is the development of new approximation algorithms for the aforementioned clustering problems. Using our random sampling approach, we obtain for the first time approximation algorithms whose running time is independent of the input size and depends only on k and the diameter of the metric space.





Copyright information

© Springer-Verlag Berlin Heidelberg 2004

Authors and Affiliations

  • Artur Czumaj (1)
  • Christian Sohler (2)
  1. Department of Computer Science, New Jersey Institute of Technology, Newark, USA
  2. Heinz Nixdorf Institute and Department of Computer Science, University of Paderborn, Paderborn, Germany
