RRPJ: Result-Rate Based Progressive Relational Join

Tok, Wee Hyong; Bressan, Stéphane; Lee, Mong-Li

doi:10.1007/978-3-540-71703-4_6

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4443)

Included in the following conference series:

International Conference on Database Systems for Advanced Applications

Abstract

Progressive join algorithms produce results incrementally as input data becomes available. Because they are non-blocking, they are particularly suitable for online processing of data streams. Reference algorithms of this family are the symmetric hash join, XJoin and, more recently, the rate-based progressive join (RPJ).

The symmetric hash join introduces the idea of processing the input streams symmetrically, but assumes sufficient main memory. XJoin suggests that processing can scale to very large amounts of data if main memory is regularly flushed to disk and a reactive/cleanup phase is triggered for disk-resident data. Its flushing strategy is a simple largest-first policy: the largest partition is flushed to disk. The more recently proposed RPJ instead predicts which main-memory tuples or partitions should be flushed to disk in order to maximize throughput, by computing the probability that each contributes to a result.
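To make the two baseline ideas concrete, the following is a minimal in-memory sketch of symmetric hash joining together with a largest-first choice of flush victim. It is an illustration under stated assumptions, not code from the paper or from the original XJoin implementation: the function names, the `(side, tuple)` input encoding and the use of the first tuple field as join key are all hypothetical, and disk flushing and the reactive/cleanup phase are omitted.

```python
from collections import defaultdict


def symmetric_hash_join(stream, key=lambda t: t[0]):
    """Yield join results as soon as a matching pair has arrived.

    `stream` yields (side, tuple) pairs, where side is "R" or "S".
    This encoding is an assumption made for the sketch.
    """
    tables = {"R": defaultdict(list), "S": defaultdict(list)}
    for side, tup in stream:
        other = "S" if side == "R" else "R"
        k = key(tup)
        # Insert the new tuple into the hash table of its own input.
        tables[side][k].append(tup)
        # Probe the hash table of the opposite input for partners.
        for match in tables[other].get(k, ()):
            # Always emit results in (R tuple, S tuple) order.
            yield (tup, match) if side == "R" else (match, tup)


def largest_first_victim(partition_sizes):
    """Return the id of the largest partition (largest-first policy)."""
    return max(partition_sizes, key=partition_sizes.get)


if __name__ == "__main__":
    arrivals = [("R", (1, "a")), ("S", (1, "x")), ("R", (1, "c"))]
    for pair in symmetric_hash_join(arrivals):
        print(pair)
```

Because each arriving tuple is both inserted and probed, neither input has to be read completely before the first result appears, which is what makes the join non-blocking.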

In this paper, we discuss the limitations of RPJ and propose a novel extension, called Result Rate-based Progressive Join (RRPJ), which addresses these limitations. Instead of computing the probabilities from statistics over the input data, RRPJ directly observes statistics of the output (result). This not only yields better performance, but also simplifies the generalization of the algorithm to non-relational data such as multidimensional and hierarchical data. We empirically show that RRPJ is effective and efficient and outperforms the state-of-the-art RPJ. We also investigate the relevance and performance of an adaptive version of these algorithms using amortization parameters.
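The idea of choosing a flush victim from observed output statistics can be sketched as follows. This is a hedged illustration only: the abstract does not give the actual RRPJ formulas, so the class name `ResultRateStats`, the per-partition counters and the "results per resident tuple" score are assumptions made for the example, and the amortization parameters of the adaptive version are not modeled.

```python
class ResultRateStats:
    """Track, per partition, how many results its resident tuples produced.

    Hypothetical bookkeeping for illustration; not the paper's data structure.
    """

    def __init__(self, num_partitions):
        self.tuples = [0] * num_partitions   # tuples currently held in memory
        self.results = [0] * num_partitions  # results observed so far

    def record_insert(self, p):
        self.tuples[p] += 1

    def record_results(self, p, n):
        self.results[p] += n

    def rate(self, p):
        """Observed results per resident tuple (assumed score)."""
        return self.results[p] / self.tuples[p] if self.tuples[p] else 0.0

    def victim(self):
        """Pick the non-empty partition with the lowest observed result rate."""
        candidates = [p for p in range(len(self.tuples)) if self.tuples[p] > 0]
        return min(candidates, key=self.rate, default=None)


if __name__ == "__main__":
    stats = ResultRateStats(2)
    stats.record_insert(0)
    stats.record_insert(1)
    stats.record_results(0, 3)
    print(stats.victim())
```

In contrast to a largest-first policy, a score of this kind can select a small partition for flushing if that partition has produced few results, since only the observed output drives the decision.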

Author information

Authors and Affiliations

School of Computing, National University of Singapore,
Wee Hyong Tok, Stéphane Bressan & Mong-Li Lee

Editor information

Ramamohanarao Kotagiri, P. Radha Krishna, Mukesh Mohania, Ekawit Nantajeewarawat

Cite this paper

Tok, W.H., Bressan, S., Lee, ML. (2007). RRPJ: Result-Rate Based Progressive Relational Join. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_6

DOI: https://doi.org/10.1007/978-3-540-71703-4_6
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71702-7
Online ISBN: 978-3-540-71703-4
eBook Packages: Computer Science, Computer Science (R0)
