A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries

  • Conference paper
Approximation and Online Algorithms (WAOA 2017)

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 10787)

Abstract

We consider the scenario of n sensor nodes observing streams of data. The nodes are connected to a central server whose task it is to compute some function over all data items observed by the nodes. In our case, there exists a total order on the data items observed by the nodes. Our goal is to compute the k currently lowest observed values or a value with rank in \([(1-\varepsilon )k,(1+\varepsilon )k]\) with probability \((1-\delta )\). We propose solutions for these problems in an extension of the distributed monitoring model where the server can send broadcast messages to all nodes for unit cost. We want to minimize communication over multiple time steps where there are m updates to a node’s value in between queries. The result is composed of two main parts, which each may be of independent interest:

  1. Protocols which answer Top-\(k\) and \(k\)-Select queries. These protocols are memoryless in the sense that they gather all information at the time of the request.

  2. A dynamic data structure which tracks, for every \(k\), an element whose rank is close to \(k\).

We describe how to combine the two parts to obtain a protocol answering the stated queries over multiple time steps. Overall, for Top-k queries we use and for k-Select queries messages in expectation. These results are shown to be asymptotically tight if m is not too small.
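
To make the two query types concrete, the following is a minimal centralized sketch of what a correct answer looks like. It is not the protocol from the paper: it ignores communication cost entirely and simply sorts all values in one place. The function names (`top_k`, `rank`, `is_valid_k_select`) and the sample readings are hypothetical, and the sketch assumes that all observed values are distinct, so that the rank of a value is the number of observed values less than or equal to it.

```python
def top_k(values, k):
    """Return the k lowest observed values in ascending order."""
    return sorted(values)[:k]


def rank(values, x):
    """Number of observed values <= x (the 1-based rank of x if values are distinct)."""
    return sum(1 for v in values if v <= x)


def is_valid_k_select(values, x, k, eps):
    """Check that x was observed and its rank lies in [(1 - eps) * k, (1 + eps) * k]."""
    return x in values and (1 - eps) * k <= rank(values, x) <= (1 + eps) * k


# Hypothetical current readings, one value per sensor node.
readings = [17, 4, 42, 8, 15, 23, 16, 9]

print(top_k(readings, 3))                        # [4, 8, 9]
print(is_valid_k_select(readings, 15, 4, 0.25))  # True: rank 4 lies in [3, 5]
```

A k-Select protocol in the sense of the abstract may return any value for which `is_valid_k_select` holds, and it is allowed to fail with probability at most \(\delta\). The sketch only checks an answer and does not model that randomness.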

This work was partially supported by the German Research Foundation (DFG) within the Priority Program “Algorithms for Big Data” (SPP 1736).



Author information

Corresponding author

Correspondence to Manuel Malatyali.

Copyright information

© 2018 Springer International Publishing AG, part of Springer Nature

About this paper

Cite this paper

Biermeier, F., Feldkord, B., Malatyali, M., Meyer auf der Heide, F. (2018). A Communication-Efficient Distributed Data Structure for Top-k and k-Select Queries. In: Solis-Oba, R., Fleischer, R. (eds) Approximation and Online Algorithms. WAOA 2017. Lecture Notes in Computer Science, vol 10787. Springer, Cham. https://doi.org/10.1007/978-3-319-89441-6_21

  • DOI: https://doi.org/10.1007/978-3-319-89441-6_21

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-89440-9

  • Online ISBN: 978-3-319-89441-6

  • eBook Packages: Computer Science, Computer Science (R0)
