Supporting top-k join queries in relational databases

The VLDB Journal

Abstract.

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, and users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middleware, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address support for top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators, based on variants of ripple join, that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics for integrating the new join operators into practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation compares our approach with recent algorithms for joining ranked inputs and shows superior performance.
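The progressive ranking idea in the abstract can be illustrated with a small sketch. It is not the operators from the paper: it assumes two in-memory inputs already sorted by descending score, an equality join on a single key, a sum as the scoring function, and a simple alternating read order. The name `rank_join` and its parameters are hypothetical. A buffered join result is reported only once its score reaches an upper bound on the score of any result that has not been produced yet, so the top-k results can often be returned before the inputs are read in full.

```python
import heapq
from collections import defaultdict
from itertools import count
from math import inf


def rank_join(left, right, k):
    """Return up to k (join_key, combined_score) pairs, best first.

    Assumptions of this sketch: left and right are lists of
    (join_key, score) sorted by score descending, and the combined
    score is the sum of the two input scores (a monotone function).
    """
    inputs = (left, right)
    pos = [0, 0]            # index of the next unread tuple per input
    top = [None, None]      # first (highest) score read per input
    last = [None, None]     # most recent (lowest) score read per input
    seen = (defaultdict(list), defaultdict(list))  # key -> scores read so far
    heap = []               # buffered join results, max-heap via negated score
    tie = count()           # tie-breaker so heap entries always compare
    out = []
    side = 0                # input to read from next

    def bound(s):
        """Upper bound on results that use a not-yet-read tuple of input s."""
        other = 1 - s
        if pos[s] >= len(inputs[s]) or not inputs[other]:
            return -inf     # no unread tuples on s, or nothing to join with
        if last[s] is None or top[other] is None:
            return inf      # not enough information yet
        return last[s] + top[other]

    while len(out) < k:
        exhausted = [pos[s] >= len(inputs[s]) for s in (0, 1)]
        if all(exhausted) and not heap:
            break
        if not all(exhausted):
            if exhausted[side]:
                side = 1 - side
            key, score = inputs[side][pos[side]]
            pos[side] += 1
            if top[side] is None:
                top[side] = score
            last[side] = score
            seen[side][key].append(score)
            # Join the new tuple with everything already read from the other input.
            for other_score in seen[1 - side][key]:
                heapq.heappush(heap, (-(score + other_score), next(tie), key))
            side = 1 - side
        threshold = max(bound(0), bound(1))
        # Report buffered results that no future result can outrank.
        while heap and len(out) < k and -heap[0][0] >= threshold:
            neg_score, _, key = heapq.heappop(heap)
            out.append((key, -neg_score))
    return out


left = [("a", 9), ("b", 8), ("c", 3)]
right = [("b", 10), ("c", 7), ("a", 2)]
top2 = rank_join(left, right, 2)
```

In this example the best result, `("b", 18)`, is reported after only two tuples have been read from each input, because the bound on all unseen results has already dropped to 18 at that point.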

Author information

Correspondence to Ihab F. Ilyas.

Additional information

Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004

Edited by: S. Abiteboul

Extended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765

About this article

Cite this article

Ilyas, I.F., Aref, W.G. & Elmagarmid, A.K. Supporting top-k join queries in relational databases. The VLDB Journal 13, 207–221 (2004). https://doi.org/10.1007/s00778-004-0128-2


  • DOI: https://doi.org/10.1007/s00778-004-0128-2
