A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries

  • Felix Biermeier
  • Björn Feldkord
  • Manuel Malatyali (email author)
  • Friedhelm Meyer auf der Heide
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10787)

Abstract

We consider the scenario of n sensor nodes observing streams of data. The nodes are connected to a central server whose task is to compute some function over all data items observed by the nodes. In our case, there exists a total order on the data items observed by the nodes. Our goal is to compute the k currently lowest observed values or a value with rank in \([(1-\varepsilon )k,(1+\varepsilon )k]\) with probability \((1-\delta )\). We propose solutions for these problems in an extension of the distributed monitoring model where the server can send broadcast messages to all nodes for unit cost. We want to minimize communication over multiple time steps where there are m updates to a node’s value in between queries. The result is composed of two main parts, each of which may be of independent interest:
  1. Protocols which answer Top-\(k\) and \(k\)-Select queries. These protocols are memoryless in the sense that they gather all information at the time of the request.
  2. A dynamic data structure which tracks, for every k, an element whose rank is close to k.

We describe how to combine the two parts to obtain a protocol answering the stated queries over multiple time steps. Overall, for Top-k queries we use [formula not recovered] and for k-Select queries [formula not recovered] messages in expectation. These results are shown to be asymptotically tight if m is not too small.
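
To make the two query types concrete, the following minimal Python sketch spells out what a correct answer looks like. It is not the protocol of the paper: it is a naive baseline in which every node reports its current value to the server (n messages per query), which is exactly the communication cost the paper aims to avoid. The function names `top_k`, `rank`, and `is_valid_k_select` are hypothetical, and the sketch assumes distinct values with rank 1 denoting the lowest value.

```python
import heapq


def top_k(values, k):
    """Return the k lowest observed values in ascending order.

    Naive baseline (an assumption for illustration): the server has
    collected one current value from each of the n nodes.
    """
    return heapq.nsmallest(k, values)


def rank(values, x):
    """Rank of x among the observed values: 1 for the lowest value.

    Assumes distinct values, so the rank is the count of values <= x.
    """
    return sum(1 for v in values if v <= x)


def is_valid_k_select(values, x, k, eps):
    """Check the approximate k-Select condition from the abstract:
    x is an observed value with rank in [(1 - eps) * k, (1 + eps) * k].
    """
    if x not in values:
        return False
    return (1 - eps) * k <= rank(values, x) <= (1 + eps) * k


# One current value per node (n = 8); illustrative data only.
values = [17, 4, 23, 8, 15, 42, 16, 9]

print(top_k(values, 3))                         # [4, 8, 9]
print(is_valid_k_select(values, 9, 4, 0.25))    # True: rank 3 lies in [3, 5]
print(is_valid_k_select(values, 17, 4, 0.25))   # False: rank 6 lies outside [3, 5]
```

A randomized k-Select protocol is allowed to return any value for which `is_valid_k_select` holds, and it may fail to do so with probability at most \(\delta\); the checker above only tests the rank condition for a single answer.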

Copyright information

© Springer International Publishing AG, part of Springer Nature 2018

Authors and Affiliations

  • Felix Biermeier (1)
  • Björn Feldkord (1)
  • Manuel Malatyali (1, email author)
  • Friedhelm Meyer auf der Heide (1)
  1. Computer Science Department, Heinz Nixdorf Institute, Paderborn University, Paderborn, Germany
