Skip to main content

Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity

  • Conference paper

Part of the book series: Lecture Notes in Computer Science ((LNTCS,volume 5171))

Abstract

Clustering is a common problem in the analysis of large data sets. Streaming algorithms, which make a single pass over the data set using small working memory and produce a clustering comparable in cost to the optimal offline solution, are especially useful. We develop the first streaming algorithms achieving a constant-factor approximation to the cluster radius for two variations of the k-center clustering problem. We give a streaming (4 + ε)-approximation algorithm using O(ε − 1 kz) memory for the problem with outliers, in which the clustering is allowed to drop up to z of the input points; previous work used a random sampling approach which yields only a bicriteria approximation. We also give a streaming (6 + ε)-approximation algorithm using O(ε − 1 ln (ε − 1) k + k 2) memory for a variation motivated by anonymity considerations in which each cluster must contain at least a certain number of input points.

This is a preview of subscription content, log in via an institution.

Buying options

Chapter
USD   29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD   84.99
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD   109.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Learn about institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Agarwal, P., Har-Peled, S., Yu, H.: Robust shape fitting via peeling and grating coresets. In: Proc. of ACM Symp. on Discrete Algorithms (SODA), pp. 182–191 (2006)

    Google Scholar 

  2. Aggarwal, G., Feder, T., Kenthapadi, K., Khuller, S., Panigrahy, R., Thomas, D., Zhu, A.: Achieving anonymity via clustering. In: Proc. of ACM Principles of Database Systems (PODS), pp. 153–162 (2006)

    Google Scholar 

  3. Aldenderfer, M.S., Blashfield, R.K.: Cluster Analysis. Sage, Beverly Hills (1984)

    Google Scholar 

  4. Bădoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 250–257 (2002)

    Google Scholar 

  5. Charikar, M., Chekuri, C., Feder, T., Motwani, R.: Incremental clustering and dynamic infomation retrieval. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 626–635 (1997)

    Google Scholar 

  6. Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proc. of ACM Symp. on Theory of Computing (STOC), pp. 30–39 (2003)

    Google Scholar 

  7. Charikar, M., Khuller, S., Mount, D., Narasimhan, G.: Algorithms for facility location problems with outliers. In: Proc. of ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 642–651 (2001)

    Google Scholar 

  8. Chen, K.: A constant factor approximation algorithm for k-median clustering with outliers. In: Proc. of ACM-SIAM Symp. on Discrete Algorithms (SODA), pp. 826–835 (2008)

    Google Scholar 

  9. Everitt, B.: Cluster Analysis. Heinemann Educational, London (1974)

    Google Scholar 

  10. Gonzalez, T.: Clustering to minimize the maximum inter-cluster distance. Theoretical Computer Science 38, 293–306 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  11. Guha, S.: The k-center karma of a data stream (unpublished manuscript) (2007)

    Google Scholar 

  12. Guha, S., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams. In: Proc. of IEEE Foundations of Computer Science (FOCS), pp. 359–366 (2000)

    Google Scholar 

  13. Hartigan, J.A.: Clustering Algorithms. Wiley, Chichester (1975)

    MATH  Google Scholar 

  14. Hochbaum, D., Shmoys, D.B.: A best possible approximation algorithm for the k-center problem. Math. of Operations Research 10, 180–184 (1985)

    Article  MATH  MathSciNet  Google Scholar 

  15. Jain, A.K., Dubes, R.C.: Algorithms for clustering data. Prentice Hall, NJ (1988)

    MATH  Google Scholar 

  16. Rasmussen, E.: Clustering algorithms. In: Frakes, W., Baeza-Yates, R. (eds.) Information Retrieval: Data Structures and Algorithms. Prentice Hall, Englewood Cliffs (1992)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Ashish Goel Klaus Jansen José D. P. Rolim Ronitt Rubinfeld

Rights and permissions

Reprints and permissions

Copyright information

© 2008 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Matthew McCutchen, R., Khuller, S. (2008). Streaming Algorithms for k-Center Clustering with Outliers and with Anonymity. In: Goel, A., Jansen, K., Rolim, J.D.P., Rubinfeld, R. (eds) Approximation, Randomization and Combinatorial Optimization. Algorithms and Techniques. APPROX RANDOM 2008 2008. Lecture Notes in Computer Science, vol 5171. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-85363-3_14

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-85363-3_14

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-85362-6

  • Online ISBN: 978-3-540-85363-3

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics