Sampling the Join of Streams

Conference paper, in: Classification as a Tool for Research

Abstract

One of the most critical operators in a Data Stream Management System is the join operator. Unfortunately, the join between two streams A and B is a blocking operator: for each current tuple of stream A, the entire stream B has to be scanned. The usual technique for unblocking stream operators consists in restricting the processing to a sliding window. This technique emphasizes recent data, which are considered more relevant than old data. However, a Data Stream Management System needs a general approach that can join any data streams for any application. Our approach is to treat the data stream join as an estimation problem. The estimation model is simple and generic: one reservoir per data stream is used to model the join. The quality of the join estimator depends on the frequencies of the join key in the join. We propose four algorithms to feed the reservoirs. The proposed methods outperform the plain reservoir sampling approach on synthetic and real data streams.
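To make the "one reservoir per stream" model concrete, the following is a minimal sketch of the plain reservoir sampling baseline that the abstract compares against. It is not one of the four proposed feeding algorithms, which the abstract does not describe. The names `Reservoir` and `estimate_join_size`, the `(key, payload)` tuple layout, and the choice of join size as the estimated quantity are all assumptions made for illustration.

```python
import random
from collections import Counter


class Reservoir:
    """Fixed-size uniform sample of a stream (classic reservoir sampling)."""

    def __init__(self, capacity, rng):
        self.capacity = capacity
        self.rng = rng
        self.items = []   # sampled (key, payload) tuples
        self.seen = 0     # number of tuples offered so far

    def offer(self, item):
        self.seen += 1
        if len(self.items) < self.capacity:
            # The reservoir is not full yet: keep every tuple.
            self.items.append(item)
        else:
            # Keep the new tuple with probability capacity / seen,
            # evicting a uniformly chosen resident tuple.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = item


def estimate_join_size(res_a, res_b):
    """Join the two samples on the key and scale up to the full streams.

    Assumption: the two reservoirs are independent uniform samples, so each
    matching pair survives with probability (n_a / N_a) * (n_b / N_b).
    """
    if not res_a.items or not res_b.items:
        return 0.0
    freq_a = Counter(key for key, _ in res_a.items)
    freq_b = Counter(key for key, _ in res_b.items)
    sample_join = sum(n * freq_b[key] for key, n in freq_a.items())
    scale = (res_a.seen / len(res_a.items)) * (res_b.seen / len(res_b.items))
    return sample_join * scale


if __name__ == "__main__":
    rng = random.Random(0)
    res_a, res_b = Reservoir(200, rng), Reservoir(200, rng)
    for i in range(10_000):
        # Synthetic streams with a skewed integer join key (illustrative only).
        res_a.offer((int(rng.expovariate(0.1)), i))
        res_b.offer((int(rng.expovariate(0.1)), i))
    print("estimated join size:", estimate_join_size(res_a, res_b))
```

With uniform sampling, rare join keys are often missing from one of the reservoirs, so the estimate can vary widely on skewed streams. This is consistent with the abstract's remark that estimator quality depends on the join-key frequencies.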

Corresponding author

Correspondence to Raphaël Féraud.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Féraud, R., Clérot, F., Gouzien, P. (2010). Sampling the Join of Streams. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_33
