
Algorithmica, Volume 81, Issue 6, pp 2222–2243

Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks

  • Zengfeng Huang
  • Ke Yi
  • Qin Zhang

Abstract

We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the count-tracking problem, where there are k players, each holding a counter \(n_i\) that gets incremented over time, and the goal is to track an \(\varepsilon \)-approximation of their sum \(n=\sum _i n_i\) continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is \({\varTheta }(k/\varepsilon \cdot \log N)\), where N is the final value of n when the tracking finishes, we show that with randomization, the communication cost can be reduced to \({\varTheta }(\sqrt{k}/\varepsilon \cdot \log N)\). Our algorithm is simple and uses only O(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: frequency-tracking and rank-tracking, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature.
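To make the problem setting concrete, the following is a minimal Python sketch of count-tracking with a simple deterministic threshold rule: each player reports its counter to a coordinator whenever it has grown by more than a factor of \(1+\varepsilon \) since its last report. This is only an illustrative baseline whose message count grows roughly like \(k/\varepsilon \cdot \log N\); it is not the randomized algorithm of the paper, which the abstract does not describe. The function name, its parameters, and the round-robin arrival pattern are assumptions made for illustration.

```python
def track_count_deterministic(arrivals, k, eps):
    """Simulate a simple deterministic count-tracking baseline (illustrative).

    arrivals: iterable of player indices in [0, k), one per increment.
    Returns (messages_sent, max_relative_error_observed).
    """
    local = [0] * k       # n_i: the true counter held by each player
    reported = [0] * k    # last value each player sent to the coordinator
    true_n = 0            # n = sum of all n_i
    estimate = 0          # coordinator's estimate: sum of reported values
    messages = 0
    max_rel_err = 0.0

    for site in arrivals:
        local[site] += 1
        true_n += 1
        # Report only when the counter has outgrown (1 + eps) times
        # the value the coordinator currently holds for this player.
        if local[site] > (1 + eps) * reported[site]:
            estimate += local[site] - reported[site]
            reported[site] = local[site]
            messages += 1
        # The tracking guarantee must hold after every increment.
        max_rel_err = max(max_rel_err, abs(true_n - estimate) / true_n)

    return messages, max_rel_err


if __name__ == "__main__":
    # Hypothetical workload: 4 players incremented in round-robin order.
    arrivals = [t % 4 for t in range(4000)]
    messages, err = track_count_deterministic(arrivals, k=4, eps=0.1)
    print(f"messages={messages}, max relative error={err:.4f}")
```

Because every player's reported value stays within a factor \(1+\varepsilon \) of its true counter, the coordinator's sum is an \(\varepsilon \)-approximation of n at all times. The abstract's result is that randomization reduces the dependence on k from linear to \(\sqrt{k}\).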

Keywords

Continuous distributed tracking · Randomized algorithms · Distributed streaming


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. School of Data Science, Fudan University, Shanghai, China
  2. The Hong Kong University of Science and Technology, Clear Water Bay, Hong Kong
  3. Indiana University Bloomington, Bloomington, USA
