Skip to main content

RRPJ: Result-Rate Based Progressive Relational Join

  • Conference paper
Advances in Databases: Concepts, Systems and Applications (DASFAA 2007)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 4443))

Included in the following conference series:

Abstract

Progressive join algorithms are join algorithms that produce results incrementally as input data is available. Because they are non-blocking, they are particularly suitable for online processing of data streams. Reference algorithms of this family are the symmetric hash join, the X-join and more recently, the rate-based progressive join (RPJ).

While the symmetric hash join introduces the idea of a symmetric processing of the input streams but assumes sufficient main memory, the X-Join suggests that the processing can scale to very large amounts of data if main memory is regularly flushed to disk, and a reactive/cleanup phase is triggered for disk-resident data. The X-join flushing strategy is based on a simple largest-first strategy, where the largest partition is flushed to disk. The recently proposed RPJ predicts the main memory tuples or partitions that should be flushed to disk in order to maximize throughput by computing their probabilities to contribute to a result.

In this paper, we discuss the limitations of RPJ and propose a novel extension, called Result Rate-based Progressive Join (RRPJ), which addresses these limitations. Instead of computing the probabilities from statistics over the input data, RRPJ directly observes the output (result) statistics. This not only yields a better performance, but also simplifies the generalization of the algorithm to non-relational data such as multidimensional data and hierarchical data. We empirically show that RRPJ is effective and efficient and outperforms the state-of-art RPJ. We also investigate the relevance and performance of an adaptive version of these algorithms using amortization parameters.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 129.00
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 169.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Preview

Unable to display preview. Download preview PDF.

Unable to display preview. Download preview PDF.

References

  1. Haas, P.J., Hellerstein, J.M.: Ripple join for online aggregation. In: SIGMOD, pp. 287–298 (1999)

    Google Scholar 

  2. Urhan, T., Franklin, M.J., Amsaleg, L.: Cost based query scrambling for initial delays. In: SIGMOD, pp. 130–141 (1998)

    Google Scholar 

  3. Urhan, T., Franklin, M.J.: XJoin: Getting fast answers from slow and bursty networks. Technical Report CS-TR-3994, Computer Science Department, University of Maryland (1999)

    Google Scholar 

  4. Avnur, R., Hellerstein, J.M.: Eddies: Continuously adaptive query processing. In: SIGMOD, pp. 261–272 (2000)

    Google Scholar 

  5. Madden, S., Shah, M.A., Hellerstein, J.M., Raman, V.: Continuously adaptive continuous queries over streams. In: SIGMOD, pp. 49–60 (2002)

    Google Scholar 

  6. Tao, Y., Yiu, M.L., Papadias, D., Hadjieleftheriou, M., Mamoulis, N.: Rpj: Producing fast join results on streams through rate-based optimization. In: SIGMOD, pp. 371–382 (2005)

    Google Scholar 

  7. Li, F., Chang, C., Kollios, G., Bestavros, A.: Characterizing and exploiting reference locality in data stream applications. In: ICDE, p. 81 (2006)

    Google Scholar 

  8. Dittrich, J.-P., Seeger, B., Taylor, D.S., Widmayer, P.: Progressive merge join: A generic and non-blocking sort-based join algorithm. In: VLDB, pp. 299–310 (2002)

    Google Scholar 

  9. Dittrich, J.-P., Seeger, B., Taylor, D.S., Widmayer, P.: On producing join results early. In: PODS, pp. 134–142 (2003)

    Google Scholar 

  10. Mokbel, M.F., Lu, M., Aref, W.G.: Hash-merge join: A non-blocking join algorithm for producing fast and early join results. In: ICDE, pp. 251–263 (2004)

    Google Scholar 

  11. Lawrence, R.: Early hash join: A configurable algorithm for the efficient and early production of join results. In: VLDB, pp. 841–852 (2005)

    Google Scholar 

  12. Wilschut, A.N., Apers, P.M.G.: Dataflow query execution in a parallel main-memory environment. In: PDIS, pp. 68–77 (1991)

    Google Scholar 

  13. Tok, W.H., Bressan, S., Lee, M.L.: Progressive spatial join. In: SSDBM, pp. 353–358 (2006)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Editor information

Ramamohanarao Kotagiri P. Radha Krishna Mukesh Mohania Ekawit Nantajeewarawat

Rights and permissions

Reprints and permissions

Copyright information

© 2007 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Tok, W.H., Bressan, S., Lee, ML. (2007). RRPJ: Result-Rate Based Progressive Relational Join. In: Kotagiri, R., Krishna, P.R., Mohania, M., Nantajeewarawat, E. (eds) Advances in Databases: Concepts, Systems and Applications. DASFAA 2007. Lecture Notes in Computer Science, vol 4443. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71703-4_6

Download citation

  • DOI: https://doi.org/10.1007/978-3-540-71703-4_6

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71702-7

  • Online ISBN: 978-3-540-71703-4

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics