Optimizing Distributed Top-k Queries

  • Thomas Neumann
  • Matthias Bender
  • Sebastian Michel
  • Ralf Schenkel
  • Peter Triantafillou
  • Gerhard Weikum
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5175)

Abstract

Top-k query processing is a fundamental building block for efficient ranking in a large number of applications. Efficiency is a central issue, especially in distributed settings, where the data is spread across different nodes in a network. This paper introduces novel optimization methods for top-k aggregation queries in such distributed environments; the methods can be applied to all algorithms that fall into the frameworks of the prior TPUT and KLEE methods. The optimizations address (1) hierarchically grouping input lists into top-k operator trees and optimizing the tree structure, and (2) computing data-adaptive scan depths for different input sources. The paper presents comprehensive experiments with two different real-life datasets, using the ns-2 network simulator for a packet-level simulation of a large Internet-style network.
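
For orientation, the following is a minimal sketch of a TPUT-style three-phase top-k aggregation, the kind of baseline framework the abstract refers to. It is not the paper's optimized method: it uses neither operator trees nor data-adaptive scan depths. The sketch assumes non-negative local scores, summation as the aggregation function, and at least one node; the names `tput_top_k` and `kth_largest` and the in-memory dictionaries standing in for network nodes are illustrative assumptions.

```python
import heapq
from collections import defaultdict


def kth_largest(values, k):
    """Return the k-th largest value, or 0.0 if fewer than k values exist."""
    top = heapq.nlargest(k, values)
    return top[-1] if len(top) == k else 0.0


def tput_top_k(nodes, k):
    """TPUT-style sketch. `nodes` is a non-empty list of dicts mapping
    item -> non-negative local score (hypothetical stand-ins for remote
    nodes). Returns the k items with the highest summed score."""
    m = len(nodes)
    seen = defaultdict(dict)  # item -> {node index: reported local score}

    # Phase 1: every node reports its local top-k items.
    for i, local in enumerate(nodes):
        for item, score in heapq.nlargest(k, local.items(), key=lambda kv: kv[1]):
            seen[item][i] = score
    # The k-th best partial sum is a lower bound on the true k-th best score.
    tau1 = kth_largest([sum(s.values()) for s in seen.values()], k)
    threshold = tau1 / m  # uniform per-node threshold

    # Phase 2: every node reports all items with local score >= threshold.
    for i, local in enumerate(nodes):
        for item, score in local.items():
            if score >= threshold:
                seen[item][i] = score
    tau2 = kth_largest([sum(s.values()) for s in seen.values()], k)
    # An unreported local score is below the threshold, so the threshold
    # serves as an upper bound for each missing value.
    candidates = [
        item for item, s in seen.items()
        if sum(s.values()) + threshold * (m - len(s)) >= tau2
    ]

    # Phase 3: look up the exact scores of the surviving candidates.
    exact = {
        item: sum(local.get(item, 0.0) for local in nodes)
        for item in candidates
    }
    return heapq.nlargest(k, exact.items(), key=lambda kv: kv[1])


if __name__ == "__main__":
    example_nodes = [
        {"a": 10, "b": 8, "c": 1, "d": 0.5},
        {"a": 1, "b": 9, "c": 7, "d": 0.5},
        {"a": 6, "b": 2, "c": 7, "d": 0.5},
    ]
    print(tput_top_k(example_nodes, 2))
```

In a real deployment each loop over `nodes` would be a network round trip, so the cost the paper targets is the volume of data shipped in phases 1 and 2. The uniform threshold used here is the simplest choice; the abstract's data-adaptive scan depths would replace it with per-source values.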

References

  1. Akbarinia, R., et al.: Reducing network traffic in unstructured p2p systems using top-k queries. Distributed and Parallel Databases 19(2-3), 67–86 (2006)
  2. Balke, W.-T., Nejdl, W., Siberski, W., Thaden, U.: Progressive distributed top k retrieval in peer-to-peer networks. In: ICDE, pp. 174–185 (2005)
  3. Brijs, T., Swinnen, G., Vanhoof, K., Wets, G.: Using association rules for product assortment decisions: A case study. In: KDD, pp. 254–260 (1999)
  4. Cao, P., Wang, Z.: Efficient top-k query calculation in distributed networks. In: PODC, pp. 206–215 (2004)
  5. Chang, K.C.-C., won Hwang, S.: Minimal probing: supporting expensive predicates for top-k queries. In: SIGMOD, pp. 346–357 (2002)
  6. Das, G., Gunopulos, D., Koudas, N., Sarkas, N.: Ad-hoc top-k query answering for data streams. In: VLDB, pp. 183–194 (2007)
  7. Das, G., Gunopulos, D., Koudas, N., Tsirogiannis, D.: Answering top-k queries using views. In: VLDB, pp. 451–462 (2006)
  8. Dubinko, M., Kumar, R., Magnani, J., Novak, J., Raghavan, P., Tomkins, A.: Visualizing tags over time. In: WWW, pp. 193–202 (2006)
  9. Garofalakis, M. (ed.): Special issue on in-network query processing. IEEE Data Eng. Bull. 28(1) (2005)
  10. Fagin, R., Lotem, A., Naor, M.: Optimal aggregation algorithms for middleware. J. Comput. Syst. Sci. 66(4), 614–656 (2003)
  11. Güntzer, U., Balke, W.-T., Kießling, W.: Optimizing multi-feature queries for image databases. In: VLDB, pp. 419–428 (2000)
  12. Güntzer, U., Balke, W.-T., Kießling, W.: Towards efficient multi-feature queries in heterogeneous environments. In: ITCC, pp. 622–628 (2001)
  13. Information Sciences Institute, The University of Southern California: The network simulator - ns-2 (2007), http://www.isi.edu/nsnam/ns/
  14. Marian, A., Bruno, N., Gravano, L.: Evaluating top-k queries over web-accessible databases. ACM Trans. Database Syst. 29(2), 319–362 (2004)
  15. Michel, S., Triantafillou, P., Weikum, G.: KLEE: A framework for distributed top-k query algorithms. In: VLDB, pp. 637–648 (2005)
  16. Nepal, S., Ramakrishna, M.V.: Query processing issues in image (multimedia) databases. In: ICDE, pp. 22–29 (1999)
  17. Neumann, T., Michel, S.: Algebraic query optimization for distributed top-k queries. In: BTW, pp. 324–343 (2007)
  18. Soliman, M.A., Ilyas, I.F., Chang, K.C.-C.: Top-k query processing in uncertain databases. In: ICDE, pp. 896–905 (2007)
  19. Theobald, M., Weikum, G., Schenkel, R.: Top-k query evaluation with probabilistic guarantees. In: VLDB, pp. 648–659 (2004)
  20. Winick, J., Jamin, S.: Inet-3.0: Internet topology generator. Technical Report UM-CSE-TR-456-02, EECS, University of Michigan (2002)
  21. Xin, D., Han, J., Chang, K.C.-C.: Progressive and selective merge: computing top-k with ad-hoc ranking functions. In: SIGMOD, pp. 103–114 (2007)
  22. Yu, H., Li, H.-G., Wu, P., Agrawal, D., Abbadi, A.E.: Efficient processing of distributed top-k queries. In: Andersen, K.V., Debenham, J., Wagner, R. (eds.) DEXA 2005. LNCS, vol. 3588, pp. 65–74. Springer, Heidelberg (2005)
  23. Zeinalipour-Yazti, D., et al.: The threshold join algorithm for top-k queries in distributed sensor networks. In: DMSN (2005)
  24. Zhang, J., Suel, T.: Efficient query evaluation on large textual collections in a peer-to-peer environment. In: Peer-to-Peer Computing, pp. 225–233 (2005)

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thomas Neumann (1)
  • Matthias Bender (1)
  • Sebastian Michel (2)
  • Ralf Schenkel (1)
  • Peter Triantafillou (3)
  • Gerhard Weikum (1)

  1. Max-Planck-Institut Informatik, Saarbrücken, Germany
  2. École Polytechnique Fédérale de Lausanne, Switzerland
  3. RACTI and University of Patras, Greece
