Accurate Low-Space Approximation of Metric k-Median for Insertion-Only Streams

  • Conference paper
Algorithms and Discrete Applied Mathematics (CALDAM 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10156)

Abstract

We present a low-constant approximation for metric k-median on an insertion-only stream of n points using \(O(\epsilon ^{-3} k \log n)\) space. In particular, we present a streaming \((O(\epsilon ^{-3} k \log n), 2 + \epsilon )\)-bicriterion solution that reports cluster weights. It is well-known that running an offline algorithm on this bicriterion solution yields a \((17.66 + \epsilon )\)-approximation.
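As a minimal sketch of the terminology, assume the usual reading in which an \((\alpha ,\beta )\)-bicriterion solution uses at most \(\alpha \) centers and has cost at most \(\beta \) times the optimal cost with k centers. The example below checks this on a toy 1-D point set with a brute-force optimum. The helper names (`kmedian_cost`, `optimal_cost`, `is_bicriterion`), the line metric, and the restriction of centers to input points are illustrative assumptions, not taken from the paper.

```python
from itertools import combinations

def kmedian_cost(points, centers):
    """Sum, over all points, of the distance to the nearest center (1-D line metric)."""
    return sum(min(abs(p - c) for c in centers) for p in points)

def optimal_cost(points, k):
    """Brute-force optimal k-median cost, with centers restricted to input points (assumption)."""
    distinct = sorted(set(points))
    k = min(k, len(distinct))  # cannot open more centers than distinct points
    return min(kmedian_cost(points, cs) for cs in combinations(distinct, k))

def is_bicriterion(points, centers, k, alpha, beta):
    """True if `centers` uses at most alpha centers and costs at most beta * OPT_k."""
    within_budget = len(centers) <= alpha
    within_cost = kmedian_cost(points, centers) <= beta * optimal_cost(points, k)
    return within_budget and within_cost

points = [0, 1, 2, 10, 11, 12, 50]
k = 2
opt = optimal_cost(points, k)                 # optimum with exactly k = 2 centers
extra = kmedian_cost(points, [1, 11, 50])     # cost when one extra center is allowed
ok = is_bicriterion(points, [1, 11, 50], k, alpha=3, beta=2.0)
```

Allowing more than k centers can push the cost below the k-center optimum, which is why a bicriterion guarantee is weaker than a true k-median approximation and is typically followed by an offline step that reduces the summary to k centers.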

Previously, there have been two lines of research that trade off between space and accuracy in the streaming k-median problem. To date, the best-known \((k,\epsilon )\)-coreset construction requires \(O(\epsilon ^{-2} k \log ^4 n)\) space [8], while the best-known \(O(k \log n)\)-space algorithm provides only a \((O(k \log n), 1063)\)-bicriterion [3]. Our work narrows this gap significantly, matching the best-known space while significantly improving the accuracy from 1063 to \(2+\epsilon \). We also provide a matching lower bound, showing that any \({\text {polylog}}(n)\)-space streaming algorithm that maintains an \((\alpha ,\beta )\)-bicriterion must have \(\beta \ge 2\).

Our technique breaks the stream into segments defined by jumps in the optimal clustering cost, which increases monotonically as the stream progresses. By storing an accurate summary of recent segments and a lower-space summary of older segments, our algorithm maintains a \((O(\epsilon ^{-3} k \log n), 2 + \epsilon )\)-bicriterion solution for the entire input.
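The idea of cutting a stream at jumps in the optimal cost can be illustrated with a toy sketch. This is not the paper's algorithm: it recomputes the optimum of every prefix by brute force (which a low-space streaming algorithm cannot do), it uses a 1-D line metric, and the rule "start a new segment when the optimal cost exceeds `growth` times its value at the last boundary" is an assumption chosen only for illustration. All names are hypothetical.

```python
from itertools import combinations

def optimal_cost(points, k):
    """Brute-force optimal k-median cost on a 1-D line, centers drawn from input points."""
    distinct = sorted(set(points))
    if len(distinct) <= k:
        return 0  # every distinct point can be its own center
    return min(
        sum(min(abs(p - c) for c in cs) for p in points)
        for cs in combinations(distinct, k)
    )

def segment_stream(stream, k, growth=2.0):
    """Return (segment start indices, optimal cost of each prefix).

    A new segment starts when the prefix optimum is positive and exceeds
    `growth` times the optimum recorded at the previous boundary
    (illustrative rule, not taken from the paper).
    """
    boundaries = [0]
    costs = []
    prefix = []
    baseline = 0  # optimal cost at the most recent boundary
    for i, p in enumerate(stream):
        prefix.append(p)
        cost = optimal_cost(prefix, k)
        costs.append(cost)
        if cost > 0 and cost > growth * baseline:
            boundaries.append(i)
            baseline = cost
    return boundaries, costs

boundaries, costs = segment_stream([0, 1, 2, 10, 11, 12, 50], k=2)
```

In this toy run the prefix optimum never decreases, and each boundary marks a point at which it has more than doubled since the previous boundary.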

V. Braverman—This material is based upon work supported in part by the National Science Foundation under Grants IIS-1447639 and CCF-1650041.

H. Lang—This research is supported by the Franco-American Fulbright Commission. The author thanks INRIA (l’Institut national de recherche en informatique et en automatique) for hosting him during the writing of this paper.

References

  1. Bādoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thirty-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pp. 250–257. ACM, New York (2002)

  2. Bentley, J.L., Saxe, J.B.: Decomposable searching problems I. Static-to-dynamic transformation. J. Algorithms 1(4), 301–358 (1980)

  3. Braverman, V., Meyerson, A., Ostrovsky, R., Roytman, A., Shindler, M., Tagiku, B.: Streaming k-means on well-clusterable data. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, pp. 26–40. SIAM (2011)

  4. Bury, M., Schwiegelshohn, C.: Random projections for k-means: maintaining coresets beyond merge & reduce. CoRR, abs/1504.01584 (2015)

  5. Byrka, J., Pensyl, T., Rybicki, B., Srinivasan, A., Trinh, K.: An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pp. 737–756. SIAM (2015)

  6. Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2003, pp. 30–39. ACM, New York (2003)

  7. Chen, K.: On coresets for \(k\)-median and \(k\)-means clustering in metric and Euclidean spaces and their applications. SIAM J. Comput. 39(3), 923–947 (2009)

  8. Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC 2011, pp. 569–578. ACM, New York (2011)

  9. Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40450-4_41

  10. Guha, S.: Tight results for clustering and summarizing data streams. In: Proceedings of the 12th International Conference on Database Theory, ICDT 2009, pp. 268–275. ACM, New York (2009)

  11. Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)

  12. Har-Peled, S., Kushal, A.: Smaller coresets for k-median and k-means clustering. Discrete Comput. Geom. 37(1), 3–19 (2007)

  13. Har-Peled, S., Mazumdar, S.: Coresets for \(k\)-means and \(k\)-median clustering and their applications. In: STOC 2004, pp. 291–300 (2004)

  14. Meyerson, A.: Online facility location. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS 2001, p. 426. IEEE Computer Society, Washington, DC (2001)

  15. Shindler, M., Wong, A., Meyerson, A.W.: Fast and accurate k-means for large datasets. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 24, pp. 2375–2383. Curran Associates Inc., Red Hook (2011)

Corresponding author

Correspondence to Keith Levin.

Copyright information

© 2017 Springer International Publishing AG

About this paper

Cite this paper

Braverman, V., Lang, H., Levin, K. (2017). Accurate Low-Space Approximation of Metric k-Median for Insertion-Only Streams. In: Gaur, D., Narayanaswamy, N. (eds) Algorithms and Discrete Applied Mathematics. CALDAM 2017. Lecture Notes in Computer Science, vol 10156. Springer, Cham. https://doi.org/10.1007/978-3-319-53007-9_7

  • DOI: https://doi.org/10.1007/978-3-319-53007-9_7

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-53006-2

  • Online ISBN: 978-3-319-53007-9

  • eBook Packages: Computer Science, Computer Science (R0)
