Abstract
We present a low-constant approximation for metric k-median on an insertion-only stream of n points using \(O(\epsilon ^{-3} k \log n)\) space. In particular, we present a streaming \((O(\epsilon ^{-3} k \log n), 2 + \epsilon )\)-bicriterion solution that reports cluster weights. It is well-known that running an offline algorithm on this bicriterion solution yields a \((17.66 + \epsilon )\)-approximation.
Previously, there have been two lines of research that trade off between space and accuracy in the streaming k-median problem. To date, the best-known \((k,\epsilon )\)-coreset construction requires \(O(\epsilon ^{-2} k \log ^4 n)\) space [8], while the best-known \(O(k \log n)\)-space algorithm provides only a \((O(k \log n), 1063)\)-bicriterion [3]. Our work narrows this gap significantly, matching the best-known space while significantly improving the accuracy from 1063 to \(2+\epsilon \). We also provide a matching lower bound, showing that any \({\text {polylog}}(n)\)-space streaming algorithm that maintains an \((\alpha ,\beta )\)-bicriterion must have \(\beta \ge 2\).
Our technique breaks the stream into segments defined by jumps in the optimal clustering cost, which increases monotonically as the stream progresses. By a storing an accurate summary of recent segments and a lower-space summary of older segments, our algorithm maintains a \((O(\epsilon ^{-3} k \log n), 2 + \epsilon )\)-bicriterion solution for the entire input.
V. Braverman—This material is based upon work supported in part by the National Science Foundation under Grants IIS-1447639 and CCF-1650041.
H. Lang—This research is supported by the Franco-American Fulbright Commission. The author thanks INRIA (l’Institut national de recherche en informatique et en automatique) for hosting him during the writing of this paper.
Access this chapter
Tax calculation will be finalised at checkout
Purchases are for personal use only
References
Bādoiu, M., Har-Peled, S., Indyk, P.: Approximate clustering via core-sets. In: Proceedings of the Thiry-Fourth Annual ACM Symposium on Theory of Computing, STOC 2002, pp. 250–257. ACM, New York (2002)
Bentley, J.L., Saxe, J.B.: Decomposable searching problems I. Static-to-dynamic transformation. J. Algorithms 1(4), 301–358 (1980)
Braverman, V., Meyerson, A., Ostrovsky, R., Roytman, A., Shindler, M., Tagiku, B.: Streaming k-means on well-clusterable data. In: Proceedings of the Twenty-Second Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2011, pp. 26–40. SIAM (2011)
Bury, M., Schwiegelshohn, C.: Random projections for k-means: maintaining coresets beyond merge & reduce. CoRR, abs/1504.01584 (2015)
Byrka, J., Pensyl, T., Rybicki, B., Srinivasan, A., Trinh, K.: An improved approximation for k-median, and positive correlation in budgeted optimization. In: Proceedings of the Twenty-Sixth Annual ACM-SIAM Symposium on Discrete Algorithms, SODA 2015, pp. 737–756. SIAM (2015)
Charikar, M., O’Callaghan, L., Panigrahy, R.: Better streaming algorithms for clustering problems. In: Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC 2003, pp. 30–39. ACM, New York (2003)
Chen, K.: On coresets for \(k\)-median and \(k\)-means clustering in metric and euclidean spaces and their applications. SIAM J. Comput. 39(3), 923–947 (2009)
Feldman, D., Langberg, M.: A unified framework for approximating and clustering data. In: Proceedings of the Forty-Third Annual ACM Symposium on Theory of Computing, STOC 2011, pp. 569–578. ACM, New York (2011)
Fichtenberger, H., Gillé, M., Schmidt, M., Schwiegelshohn, C., Sohler, C.: BICO: BIRCH meets coresets for k-means clustering. In: Bodlaender, H.L., Italiano, G.F. (eds.) ESA 2013. LNCS, vol. 8125, pp. 481–492. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40450-4_41
Guha, S.: Tight results for clustering and summarizing data streams. In: Proceedings of the 12th International Conference on Database Theory, ICDT 2009, pp. 268–275. ACM, New York (2009)
Guha, S., Meyerson, A., Mishra, N., Motwani, R., O’Callaghan, L.: Clustering data streams: theory and practice. IEEE Trans. Knowl. Data Eng. 15(3), 515–528 (2003)
Har-Peled, S., Kushal, A.: Smaller coresets for k-median and k-means clustering. Discrete Comput. Geom. 37(1), 3–19 (2007)
Har-Peled, S., Mazumdar, S.: Coresets for \(k\)-means and \(k\)-median clustering and their applications. In: STOC 2004, pp. 291–300 (2004)
Meyerson, A.: Online facility location. In: Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, FOCS 2001, p. 426. IEEE Computer Society, Washington, DC (2001)
Shindler, M., Wong, A., Meyerson, A.W.: Fast and accurate k-means for large datasets. In: Shawe-Taylor, J., Zemel, R., Bartlett, P., Pereira, F., Weinberger, K. (eds.) Advances in Neural Information Processing Systems 24, pp. 2375–2383. Curran Associates Inc., Red Hook (2011)
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2017 Springer International Publishing AG
About this paper
Cite this paper
Braverman, V., Lang, H., Levin, K. (2017). Accurate Low-Space Approximation of Metric k-Median for Insertion-Only Streams. In: Gaur, D., Narayanaswamy, N. (eds) Algorithms and Discrete Applied Mathematics. CALDAM 2017. Lecture Notes in Computer Science(), vol 10156. Springer, Cham. https://doi.org/10.1007/978-3-319-53007-9_7
Download citation
DOI: https://doi.org/10.1007/978-3-319-53007-9_7
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-53006-2
Online ISBN: 978-3-319-53007-9
eBook Packages: Computer ScienceComputer Science (R0)