A Simple, Yet Effective and Efficient, Sliding Window Sampling Algorithm

  • Conference paper
Database Systems for Advanced Applications (DASFAA 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 5981)

Abstract

Sampling a stream of continuous data with limited memory, known as reservoir sampling, is a utility algorithm. Standard reservoir sampling maintains a random sample of the entire stream as it has arrived so far. This does not meet the requirements of the many applications that need to give preference to recent data. The simplest algorithm for maintaining a random sample of a sliding window periodically reproduces the same sample design, which is also undesirable for many applications. Other existing algorithms use variable-size memory, produce variable-size samples, or maintain biased samples and allow expired data in the sample.
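As an illustration of the two baselines mentioned above, the following is a minimal sketch, not code from the paper. `reservoir_sample` follows the classic reservoir scheme over the whole stream. `periodic_window_sample` is one common reading of the "simplest" sliding-window approach, assumed here: build a reservoir over the first window of `w` elements, then replace each sampled element by the newly arrived one at the moment it expires. The function names, the count-based window of size `w`, and the requirement `k <= w` are assumptions made for the example.

```python
import random
from typing import Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, rng: random.Random) -> List[T]:
    """Uniform random sample of up to k items over everything seen so far."""
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # The first k items fill the reservoir.
            reservoir.append(item)
        else:
            # Item i (0-based) is kept with probability k / (i + 1).
            j = rng.randint(0, i)  # inclusive on both ends
            if j < k:
                reservoir[j] = item
    return reservoir


def periodic_window_sample(
    stream: Iterable[T], k: int, w: int, rng: random.Random
) -> List[Tuple[int, T]]:
    """Sample of (stream index, item) pairs over the last w items.

    Assumes k <= w. After the first window, a sampled item is replaced
    by the arriving item exactly when it expires, so the sampled
    positions repeat with period w.
    """
    sample: List[Tuple[int, T]] = []
    for i, item in enumerate(stream):
        if i < w:
            # Ordinary reservoir sampling over the first window.
            if i < k:
                sample.append((i, item))
            else:
                j = rng.randint(0, i)
                if j < k:
                    sample[j] = (i, item)
        else:
            # The item at index i - w has just left the window.
            expired = i - w
            for pos, (idx, _) in enumerate(sample):
                if idx == expired:
                    sample[pos] = (i, item)
                    break
    return sample
```

Calling `periodic_window_sample(range(60), 5, 20, random.Random(1))` returns five pairs whose indices all lie in the last window, and whose indices modulo 20 equal those chosen in the first window. That fixed pattern is the periodic sample design the abstract describes as undesirable.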

We propose an effective algorithm, which is very simple and therefore efficient, for maintaining a near-random, fixed-size sample of a sliding window. Our algorithm does maintain a biased sample that may contain expired data, yet it is a good approximation of a random sample, with expired data present only with low probability. We analytically explain why, and under which parameter settings, the algorithm is effective. We empirically evaluate its performance (effectiveness) and compare it with that of representative existing algorithms for random sampling over sliding windows and for biased sampling.

This work was partially funded under research grant number R-252-000-301-646 from NUS.

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lu, X., Tok, W.H., Raissi, C., Bressan, S. (2010). A Simple, Yet Effective and Efficient, Sliding Window Sampling Algorithm. In: Kitagawa, H., Ishikawa, Y., Li, Q., Watanabe, C. (eds) Database Systems for Advanced Applications. DASFAA 2010. Lecture Notes in Computer Science, vol 5981. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-12026-8_27

  • DOI: https://doi.org/10.1007/978-3-642-12026-8_27

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-12025-1

  • Online ISBN: 978-3-642-12026-8

  • eBook Packages: Computer Science, Computer Science (R0)
