Supporting top-k join queries in relational databases

Ilyas, Ihab F.; Aref, Walid G.; Elmagarmid, Ahmed K.

doi:10.1007/s00778-004-0128-2

Published: September 2004

Volume 13, pages 207–221, (2004)

The VLDB Journal

Abstract.

Ranking queries, also known as top-k queries, produce results that are ordered on some computed score. Typically, these queries involve joins, and users are usually interested only in the top-k join results. Top-k queries are dominant in many emerging applications, e.g., multimedia retrieval by content, Web databases, data mining, middleware, and most information retrieval applications. Current relational query processors do not handle ranking queries efficiently, especially when joins are involved. In this paper, we address supporting top-k join queries in relational query processors. We introduce a new rank-join algorithm that makes use of the individual orders of its inputs to produce join results ordered on a user-specified scoring function. The idea is to rank the join results progressively during the join operation. We introduce two physical query operators, based on variants of ripple join, that implement the rank-join algorithm. The operators are nonblocking and can be integrated into pipelined execution plans. We also propose an efficient heuristic designed to optimize a top-k join query by choosing the best join order. We address several practical issues and optimization heuristics for integrating the new join operators into practical query processors. We implement the new operators inside a prototype database engine based on PREDATOR. The experimental evaluation of our approach compares it with recent algorithms for joining ranked inputs and shows superior performance.
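The idea of ranking join results progressively can be illustrated with a small sketch. It is not the operators described in the paper; it is a simplified two-input, in-memory variant written under several assumptions: each input is a list of `(join_key, score)` pairs already sorted by descending score, the scoring function is the sum of the two scores (any monotone function would work the same way), and inputs are read alternately. The names `rank_join`, `LEFT` and `RIGHT` are hypothetical. A buffered join result is emitted only once its score reaches an upper bound on the score of every result not yet generated, which is what lets the loop stop before reading the inputs in full.

```python
import heapq
from collections import defaultdict
from itertools import count
from math import inf


def rank_join(left, right, k):
    """Yield up to k (key, left_score, right_score, total), best total first.

    Assumes both inputs are sorted by descending score and that the
    combined score is the sum of the two input scores.
    """
    inputs = (left, right)
    seen = (defaultdict(list), defaultdict(list))  # hash tables of tuples read so far
    pos = [0, 0]      # number of tuples consumed from each input
    heap = []         # buffered join results, max-heap via negated score
    tie = count()     # tie-breaker so heap entries never compare keys
    emitted = 0
    turn = 0

    def threshold():
        # Upper bound on the score of any join result not yet generated.
        # Such a result must use an unseen tuple from at least one input.
        bound = -inf
        for i in (0, 1):
            j = 1 - i
            if pos[i] >= len(inputs[i]):
                continue  # input i is exhausted: no unseen tuples on this side
            if pos[i] == 0 or pos[j] == 0:
                return inf  # nothing read yet, so no bound is known
            last_i = inputs[i][pos[i] - 1][1]  # last score read from input i
            top_j = inputs[j][0][1]            # best score of the other input
            bound = max(bound, last_i + top_j)
        return bound

    while emitted < k:
        if heap and -heap[0][0] >= threshold():
            neg_total, _, key, ls, rs = heapq.heappop(heap)
            yield key, ls, rs, -neg_total
            emitted += 1
            continue
        if pos[0] >= len(left) and pos[1] >= len(right):
            return  # both inputs exhausted and nothing left to emit
        # Alternate between inputs, skipping one that is exhausted.
        i = turn if pos[turn] < len(inputs[turn]) else 1 - turn
        turn = 1 - turn
        key, score = inputs[i][pos[i]]
        pos[i] += 1
        seen[i][key].append(score)
        # Join the new tuple with matching tuples already read from the other input.
        for other in seen[1 - i].get(key, ()):
            ls, rs = (score, other) if i == 0 else (other, score)
            heapq.heappush(heap, (-(ls + rs), next(tie), key, ls, rs))


LEFT = [("a", 9), ("b", 8), ("c", 3)]
RIGHT = [("b", 10), ("c", 7), ("a", 2)]
print(list(rank_join(LEFT, RIGHT, 2)))
```

With these example inputs, the first result is produced after reading only two tuples from each side, because at that point no unread combination can beat it. Alternating between inputs is just one possible polling strategy and is chosen here for simplicity.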

Author information

Authors and Affiliations

School of Computer Science, University of Waterloo, N2L 3G1, Waterloo, Ontario, Canada
Ihab F. Ilyas
Department of Computer Sciences, Purdue University, IN 47907-1398, West Lafayette, USA
Walid G. Aref & Ahmed K. Elmagarmid

Corresponding author

Correspondence to Ihab F. Ilyas.

Additional information

Received: 23 December 2003, Accepted: 31 March 2004, Published online: 12 August 2004

Edited by: S. Abiteboul

Extended version of the paper published in the Proceedings of the 29th International Conference on Very Large Databases, VLDB 2003, Berlin, Germany, pp 754-765

About this article

Cite this article

Ilyas, I.F., Aref, W.G. & Elmagarmid, A.K. Supporting top-k join queries in relational databases. The VLDB Journal 13, 207–221 (2004). https://doi.org/10.1007/s00778-004-0128-2

Issue Date: September 2004
DOI: https://doi.org/10.1007/s00778-004-0128-2
