Abstract
One of the most critical operators for a Data Stream Management System is the join operator. Unfortunately, the join between two streams A and B is a blocking operator: for each current tuple of stream A, the entire stream B has to be scanned. The usual technique for unblocking stream operators consists in restricting the processing to a sliding window. This technique emphasizes recent data, which are considered more relevant than old data. However, a Data Stream Management System needs a general approach that can join any data streams for any application. Our approach is to treat the data stream join as an estimation problem. The estimation model is simple and generic: one reservoir per data stream is used to model the join. The quality of the join estimator depends on the frequencies of the join keys in the join. We propose four algorithms to feed the reservoirs. The proposed methods outperform the reservoir sampling approach on synthetic and real data streams.
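To make the estimation model concrete, the following is a minimal sketch of the baseline the abstract compares against, not of the four proposed feeding algorithms, which the abstract does not specify. It assumes uniform reservoir sampling (Vitter, 1985) with one reservoir per stream, and a simple scaled-count estimate of the join size. The names `Reservoir`, `offer`, and `estimate_join_size` are hypothetical and chosen for illustration only.

```python
import random
from collections import Counter


class Reservoir:
    """Fixed-size uniform sample of a stream (classic reservoir sampling)."""

    def __init__(self, capacity, rng=None):
        self.capacity = capacity
        self.items = []   # current sample
        self.seen = 0     # number of tuples observed so far
        self.rng = rng or random.Random()

    def offer(self, item):
        """Observe one tuple; keep it with probability capacity / seen."""
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(item)
        else:
            j = self.rng.randrange(self.seen)  # uniform in [0, seen)
            if j < self.capacity:
                self.items[j] = item


def estimate_join_size(res_a, res_b, key=lambda t: t):
    """Estimate |A join B| by joining the two samples and scaling the count
    by the inverse sampling fraction of each stream (illustrative estimator,
    assumed here rather than taken from the chapter)."""
    if not res_a.items or not res_b.items:
        return 0.0
    freq_a = Counter(key(t) for t in res_a.items)
    freq_b = Counter(key(t) for t in res_b.items)
    # Number of matching pairs inside the samples, computed per join key.
    matches = sum(count * freq_b[k] for k, count in freq_a.items())
    scale = (res_a.seen / len(res_a.items)) * (res_b.seen / len(res_b.items))
    return matches * scale
```

When a reservoir is larger than its stream, the sample is the whole stream and the estimate equals the exact join size. With small reservoirs, rare join keys are likely to be missed in one sample or the other, which is one way to see why key frequencies drive the quality of the estimate.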
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Féraud, R., Clérot, F., Gouzien, P. (2010). Sampling the Join of Streams. In: Locarek-Junge, H., Weihs, C. (eds) Classification as a Tool for Research. Studies in Classification, Data Analysis, and Knowledge Organization. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-10745-0_33
DOI: https://doi.org/10.1007/978-3-642-10745-0_33
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-10744-3
Online ISBN: 978-3-642-10745-0
eBook Packages: Mathematics and Statistics (R0)