Abstract
Reservoir sampling, the sampling of continuous data streams with limited memory, is a utility algorithm. Standard reservoir sampling maintains a random sample of the entire stream as it has arrived so far. This does not meet the requirements of the many applications that need to give preference to recent data. The simplest algorithm for maintaining a random sample of a sliding window periodically reproduces the same sample design, which is also undesirable for many applications. Other existing algorithms use variable-size memory or variable-size samples, or maintain biased samples and allow expired data in the sample.
We propose an effective algorithm, which is very simple and therefore efficient, for maintaining a near-random fixed-size sample of a sliding window. Strictly speaking, our algorithm maintains a biased sample that may contain expired data. Yet it is a good approximation of a random sample, with expired data present only with low probability. We explain analytically why, and under which parameter settings, the algorithm is effective. We empirically evaluate its effectiveness and compare it with existing representative algorithms for random sampling over sliding windows and for biased sampling.
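For orientation, the two baselines mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the algorithm proposed in the paper: `reservoir_sample` follows the classic Algorithm R for sampling the whole stream, and `periodic_window_sample` is one common reading of the "simplest" sliding-window scheme, in which an expired sample element is replaced by the element that arrives at that moment. The function names and the parameters `k` (sample size) and `n` (window size) are assumptions made for this sketch.

```python
import random


def reservoir_sample(stream, k, rng=random):
    """Algorithm R: uniform random sample of size k over everything seen so far."""
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)           # fill the reservoir first
        else:
            j = rng.randrange(i + 1)      # uniform integer in [0, i]
            if j < k:
                sample[j] = item          # keep the new item with probability k / (i + 1)
    return sample


def periodic_window_sample(stream, k, n, rng=random):
    """Simple sliding-window baseline (assumed form, window of the last n items).

    Runs Algorithm R on the first n items; afterwards, whenever the item that
    leaves the window is in the sample, it is replaced by the arriving item.
    Yields a snapshot of (index, item) pairs after every arrival.
    """
    slots = []
    for i, item in enumerate(stream):
        if i < n:
            if i < k:
                slots.append((i, item))
            else:
                j = rng.randrange(i + 1)
                if j < k:
                    slots[j] = (i, item)
        else:
            expired = i - n               # index that just left the window
            for s, (idx, _) in enumerate(slots):
                if idx == expired:
                    slots[s] = (i, item)  # same position modulo n, one period later
                    break
        yield list(slots)
```

In the second function the sampled positions, taken modulo `n`, never change once the first window has been filled. That is the periodic repetition of the sample design that the abstract describes as undesirable.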
This work was partially funded under research grant number R-252-000-301-646 from NUS.
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lu, X., Tok, W.H., Raissi, C., Bressan, S. (2010). A Simple, Yet Effective and Efficient, Sliding Window Sampling Algorithm. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 5981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12026-8_27
DOI: https://doi.org/10.1007/978-3-642-12026-8_27
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-12025-1
Online ISBN: 978-3-642-12026-8
eBook Packages: Computer Science (R0)