# Randomized Algorithms for Tracking Distributed Count, Frequencies, and Ranks

## Abstract

We show that randomization can lead to significant improvements for a few fundamental problems in distributed tracking. Our basis is the *count-tracking* problem, where there are *k* players, each holding a counter \(n_i\) that gets incremented over time, and the goal is to track an \(\varepsilon \)-approximation of their sum \(n=\sum _i n_i\) continuously at all times, using minimum communication. While the deterministic communication complexity of the problem is \({\varTheta }(k/\varepsilon \cdot \log N)\), where *N* is the final value of *n* when the tracking finishes, we show that with randomization, the communication cost can be reduced to \({\varTheta }(\sqrt{k}/\varepsilon \cdot \log N)\). Our algorithm is simple and uses only *O*(1) space at each player, while the lower bound holds even assuming each player has infinite computing power. Then, we extend our techniques to two related distributed tracking problems: *frequency-tracking* and *rank-tracking*, and obtain similar improvements over previous deterministic algorithms. Both problems are of central importance in large data monitoring and analysis, and have been extensively studied in the literature.
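To make the count-tracking setting concrete, here is a minimal simulation sketch. The abstract does not spell out the algorithm, so everything here is an illustrative assumption rather than the authors' method: each player reports its current counter with a fixed probability `p` on every increment, and the coordinator estimates a reporting player's counter as its last reported value minus 1 plus `1/p`, which is an unbiased estimator for this kind of sampling. The names `Coordinator`, `Player`, and `simulate` are hypothetical. Holding `p` fixed is a simplification; a scheme aiming for the \(\sqrt{k}/\varepsilon \cdot \log N\) bound would presumably lower `p` as *n* grows.

```python
import random


class Coordinator:
    """Hypothetical coordinator: stores the last value reported by each player."""

    def __init__(self, k, p):
        self.p = p                # sampling probability (assumed fixed here)
        self.last = [None] * k    # last reported counter per player
        self.messages = 0         # total messages received

    def receive(self, i, value):
        self.last[i] = value
        self.messages += 1

    def estimate(self):
        # A player whose last report was v contributes v - 1 + 1/p;
        # a player that has never reported contributes 0.
        return sum(v - 1 + 1 / self.p for v in self.last if v is not None)


class Player:
    """Hypothetical player: holds one counter n_i, so O(1) state."""

    def __init__(self, i, coordinator, rng):
        self.i = i
        self.n = 0
        self.coordinator = coordinator
        self.rng = rng

    def increment(self):
        self.n += 1
        # Report the current counter with probability p.
        if self.rng.random() < self.coordinator.p:
            self.coordinator.receive(self.i, self.n)


def simulate(k, total, p, seed=0):
    """Distribute `total` increments over k players; return (estimate, messages)."""
    rng = random.Random(seed)
    coord = Coordinator(k, p)
    players = [Player(i, coord, rng) for i in range(k)]
    for _ in range(total):
        players[rng.randrange(k)].increment()
    return coord.estimate(), coord.messages
```

With `p = 1.0` every increment is reported and the estimate is exact, which is the trivial high-communication extreme. Smaller values of `p` trade accuracy for fewer messages: the estimate stays unbiased, but its variance grows roughly as \(k/p^2\) across the *k* players.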

## Keywords

Continuous distributed tracking, Randomized algorithms, Distributed streaming
