The VLDB Journal, Volume 13, Issue 3, pp 207–221

Supporting top-k join queries in relational databases

  • Ihab F. Ilyas
  • Walid G. Aref
  • Ahmed K. Elmagarmid

Abstract.

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, where users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middlewares, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators based on variants of ripple join that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics to integrate the new join operators in practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares recent algorithms for joining ranked inputs and shows superior performance.
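The progressive ranking idea described in the abstract can be illustrated with a small sketch. This is not the authors' operator: it is a minimal hash-based variant written under stated assumptions. The function name `rank_join`, the `(join_key, score)` tuple layout, the additive scoring function, and the strict alternation between inputs are all hypothetical choices made for illustration. Each input must already be sorted by score in descending order. A buffered join result is emitted only once its combined score is at least an upper bound on the score of any result that has not been formed yet.

```python
import heapq
from collections import defaultdict
from itertools import count


def rank_join(left, right, k):
    """Yield up to k join results in descending combined score.

    left, right: lists of (join_key, score), each sorted by score descending.
    Assumption: the combined score is the sum of the two input scores.
    Each result is (combined_score, join_key, left_score, right_score).
    """
    inputs = (left, right)
    pos = [0, 0]                                  # tuples consumed per input
    seen = (defaultdict(list), defaultdict(list))  # hash tables of seen scores
    heap = []                                     # max-heap via negated score
    tie = count()                                 # tie-breaker for heap entries
    emitted = 0
    turn = 0

    def threshold():
        # Upper bound on the score of any result not yet formed. Such a
        # result needs an unseen tuple from some input i, whose score is at
        # most the last score seen from i, paired with at best the top
        # score of the other input.
        bounds = []
        for i in (0, 1):
            j = 1 - i
            if pos[i] < len(inputs[i]) and inputs[j]:
                last_i = inputs[i][max(pos[i] - 1, 0)][1]
                bounds.append(last_i + inputs[j][0][1])
        return max(bounds, default=float("-inf"))

    while emitted < k:
        if heap and -heap[0][0] >= threshold():
            # No future result can beat the best buffered one: report it.
            yield heapq.heappop(heap)[2]
            emitted += 1
            continue
        if pos[0] >= len(left) and pos[1] >= len(right):
            break  # both inputs exhausted and nothing left to report
        # Alternate between inputs, skipping one that is exhausted.
        i = turn if pos[turn] < len(inputs[turn]) else 1 - turn
        turn = 1 - i
        key, score = inputs[i][pos[i]]
        pos[i] += 1
        # Probe the other input's hash table for matching tuples.
        for other in seen[1 - i][key]:
            total = score + other
            pair = (score, other) if i == 0 else (other, score)
            heapq.heappush(heap, (-total, next(tie), (total, key) + pair))
        seen[i][key].append(score)


left = [("a", 9), ("b", 8), ("c", 3)]
right = [("b", 10), ("a", 7), ("c", 6)]
top2 = list(rank_join(left, right, 2))
print(top2)
```

Because the function is a generator, results are produced one at a time, and a consumer that stops early leaves the remaining input tuples unread. In the example above, the top two results are reported without reading the last tuple of `right`.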

Keywords:

Ranking, Top-k queries, Rank aggregation, Query operators

Copyright information

© Springer-Verlag Berlin/Heidelberg 2004

Authors and Affiliations

  • Ihab F. Ilyas (1)
  • Walid G. Aref (2)
  • Ahmed K. Elmagarmid (2)

  1. School of Computer Science, University of Waterloo, Waterloo, Canada
  2. Department of Computer Sciences, Purdue University, West Lafayette, USA
