Approximate distributed top-k queries

Distributed Computing

Abstract

We consider a distributed system in which each node keeps a local count for items (similar to elections, where nodes are ballot boxes and items are candidates). A top-k query in such a system asks for the k items whose global count, across all nodes in the system, is the largest. In this paper, we present a Monte Carlo algorithm that outputs, with high probability, a set of k candidates that approximates the top-k items. The algorithm is motivated by sensor networks, in that it focuses on reducing the individual communication complexity. In contrast to previous algorithms, its communication complexity depends only on the global scores and not on the partition of scores among nodes. If the number of nodes is large, our algorithm dramatically reduces the communication complexity compared with deterministic algorithms. We show that the complexity of our algorithm is close to a lower bound on the cell-probe complexity of any non-interactive top-k approximation algorithm. We also show that for some natural global distributions (such as the Geometric or Zipf distributions), our algorithm needs only a polylogarithmic number of communication bits per node.
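The query defined above can be pinned down with a brute-force baseline. The sketch below is a hypothetical illustration, not the authors' Monte Carlo algorithm: every node ships its entire local count table to one collector, which sums the counts and returns the k largest. The helper names `global_top_k` and `naive_message_count` are ours, and charging one message per nonzero (node, item) entry is a simplifying cost assumption.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical brute-force baseline, NOT the paper's Monte Carlo algorithm.
# Each dict is one node's local counts (a "ballot box"); keys are items.


def global_top_k(local_counts: List[Dict[str, int]], k: int) -> List[str]:
    """Return the k items with the largest global (summed) count."""
    totals: Counter = Counter()
    for node in local_counts:
        totals.update(node)  # Counter.update adds counts rather than replacing them
    # most_common orders items with equal counts by first appearance
    return [item for item, _ in totals.most_common(k)]


def naive_message_count(local_counts: List[Dict[str, int]]) -> int:
    """Baseline cost, assuming one message per nonzero (node, item) entry."""
    return sum(1 for node in local_counts for c in node.values() if c != 0)


if __name__ == "__main__":
    boxes = [
        {"alice": 5, "bob": 3},
        {"bob": 4, "carol": 1},
        {"alice": 1, "carol": 6},
    ]
    print(global_top_k(boxes, 2))  # ['bob', 'carol']
    print(naive_message_count(boxes))  # 6
    # Same global totals, held by a single node: fewer entries to ship.
    print(naive_message_count([{"alice": 6, "bob": 7, "carol": 7}]))  # 3
```

In this example each ballot box has a different local leader, yet the global answer is bob and carol (tied at 7). The same global totals held by a single node cost 3 messages instead of 6, so this baseline's cost depends on how the scores are partitioned among nodes, which is exactly the dependence the abstract says the paper's algorithm avoids.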

Author information

Correspondence to Boaz Patt-Shamir.

Additional information

An extended abstract of this paper appeared in Proc. 13th Int. Colloquium on Structural Information and Communication Complexity, SIROCCO 2006, Lecture Notes in Computer Science 4056, pp. 319–333.

About this article

Cite this article

Patt-Shamir, B., Shafrir, A. Approximate distributed top-k queries. Distrib. Comput. 21, 1–22 (2008). https://doi.org/10.1007/s00446-008-0055-3

  • DOI: https://doi.org/10.1007/s00446-008-0055-3
