Abstract
We consider the scenario of n sensor nodes observing streams of data. The nodes are connected to a central server whose task is to compute some function over all data items observed by the nodes. In our case, there exists a total order on the data items observed by the nodes. Our goal is to compute the k lowest currently observed values, or a value with rank in \([(1-\varepsilon)k, (1+\varepsilon)k]\) with probability \(1-\delta\). We propose solutions for these problems in an extension of the distributed monitoring model in which the server can send broadcast messages to all nodes at unit cost. We want to minimize communication over multiple time steps, where there are m updates to a node's value between queries. Our result consists of two main parts, each of which may be of independent interest:
1. Protocols which answer Top-\(k\) and \(k\)-Select queries. These protocols are memoryless in the sense that they gather all information at the time of the request.
2. A dynamic data structure which tracks, for every k, an element whose rank is close to k.
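The memoryless idea in item 1 (gather everything at query time, keep no state between queries) can be illustrated with two standard randomized techniques in the broadcast model. This is a hedged sketch, not the authors' protocols, and it does not attempt to match their bounds. Messages are counted as in the model above: one unit per server broadcast, one unit per node-to-server message. `find_next_min` uses geometric sampling: in round r, each eligible node replies with probability min(1, 2^r/n), and the server broadcasts the smallest key seen so far; the last round has probability 1, so the result is exact. `top_k` repeats this k times. `approx_select` answers a k-Select query by Bernoulli sampling with probability p and returning the sample element of rank about pk; for a sufficiently large constant `c`, a standard Chernoff-bound argument suggests the rank lands in \([(1-\varepsilon)k, (1+\varepsilon)k]\) with probability at least \(1-\delta\). The names `Net`, `find_next_min`, `top_k`, `approx_select` and the default `c` are assumptions for illustration.

```python
import math
import random


class Net:
    """Hypothetical message counter: each broadcast costs 1, each node reply costs 1."""

    def __init__(self, values):
        # (value, node_id) keys give a strict total order even if values repeat.
        self.keys = [(v, i) for i, v in enumerate(values)]
        self.messages = 0


def find_next_min(net, lower, rng):
    """Smallest key strictly greater than `lower`, or None if there is none."""
    n = len(net.keys)
    best = None
    # Rounds r = 0..ceil(log2 n); in the last round every eligible node replies.
    for r in range((n - 1).bit_length() + 1):
        net.messages += 1  # broadcast: round number r and the current best key
        p = min(1.0, 2 ** r / n)
        # Nodes decide using the state broadcast at the start of the round.
        replies = [key for key in net.keys
                   if key > lower and (best is None or key < best)
                   and rng.random() < p]
        net.messages += len(replies)
        if replies:
            best = min(replies)
    return best


def top_k(net, k, rng):
    """The k smallest values; no state is kept between queries (memoryless)."""
    result, lower = [], (-math.inf, -1)
    for _ in range(k):
        key = find_next_min(net, lower, rng)
        if key is None:
            break  # fewer than k nodes exist
        result.append(key[0])
        lower = key
    return result


def approx_select(net, k, eps, delta, rng, c=12.0):
    """A value whose rank is near k w.h.p., via Bernoulli sampling (c is an untuned assumption)."""
    p = min(1.0, c * math.log(1 / delta) / (eps ** 2 * k))
    net.messages += 1  # broadcast the sampling probability p
    sample = sorted(key for key in net.keys if rng.random() < p)
    net.messages += len(sample)
    if not sample:
        return None
    target = max(1, round(p * k))  # expected sample rank of the true k-th smallest key
    return sample[min(target, len(sample)) - 1][0]


rng = random.Random(0)
net = Net([rng.random() for _ in range(1000)])
print(top_k(net, 3, rng), net.messages)
```

The trivial alternative, asking every node for its value, costs n + 1 messages per query. The sampling-based `approx_select` costs about 1 + p·n messages in expectation, which grows with n/k, so it serves only as a simple baseline for comparison.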
We describe how to combine the two parts to obtain a protocol answering the stated queries over multiple time steps. Overall, for Top-k queries we use and for k-Select queries messages in expectation. These results are shown to be asymptotically tight if m is not too small.
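The idea of a dynamic structure that survives across time steps can be illustrated with a much simpler hypothetical design than the paper's. In this sketch the server fixes a few threshold values (assumed to be broadcast to all nodes once) and keeps, for each threshold, the exact number of node values at or below it. A node sends one message only when an update moves its value across at least one threshold, so updates that stay between thresholds are free. For any k, the server can then name the threshold whose count is closest to k. The class name `ThresholdTracker`, the fixed threshold list, and the one-message-per-crossing rule are assumptions for illustration, not the authors' data structure.

```python
class ThresholdTracker:
    """Hypothetical sketch: exact counts of node values at or below fixed thresholds."""

    def __init__(self, values, thresholds):
        self.values = list(values)            # current value of each node
        self.thresholds = sorted(thresholds)  # assumed known to every node
        # Initial synchronisation: count node values <= each threshold.
        self.counts = [sum(v <= t for v in self.values) for t in self.thresholds]
        self.messages = 0

    def update(self, node, new_value):
        """Node changes its value; it messages the server only if a threshold is crossed."""
        old = self.values[node]
        self.values[node] = new_value
        crossed = False
        for i, t in enumerate(self.thresholds):
            was_below, is_below = old <= t, new_value <= t
            if was_below != is_below:
                self.counts[i] += 1 if is_below else -1
                crossed = True
        if crossed:
            self.messages += 1  # one message carries both the old and the new value

    def near_rank(self, k):
        """(threshold, count) pair whose count of values <= threshold is closest to k."""
        i = min(range(len(self.thresholds)), key=lambda j: abs(self.counts[j] - k))
        return self.thresholds[i], self.counts[i]


tracker = ThresholdTracker(values=[5, 12, 30, 45, 60, 80], thresholds=[10, 40, 70])
tracker.update(0, 6)   # 5 -> 6 crosses no threshold: no message
tracker.update(2, 75)  # 30 -> 75 crosses 40 and 70: one message
print(tracker.near_rank(4), tracker.messages)  # (70, 4) 1
```

With this sketch, the pair returned by `near_rank` bounds the rank of a known value without any query-time communication; a memoryless protocol would still be needed to turn it into an exact Top-k answer or an ε-approximate k-Select answer.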
This work was partially supported by the German Research Foundation (DFG) within the Priority Program “Algorithms for Big Data” (SPP 1736).
Copyright information
© 2018 Springer International Publishing AG, part of Springer Nature
About this paper
Cite this paper
Biermeier, F., Feldkord, B., Malatyali, M., Meyer auf der Heide, F. (2018). A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries. In: Solis-Oba, R., Fleischer, R. (eds) Approximation and Online Algorithms. WAOA 2017. Lecture Notes in Computer Science, vol 10787. Springer, Cham. https://doi.org/10.1007/978-3-319-89441-6_21
DOI: https://doi.org/10.1007/978-3-319-89441-6_21
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-89440-9
Online ISBN: 978-3-319-89441-6
eBook Packages: Computer Science, Computer Science (R0)