Abstract
In this work, we present a comprehensive treatment of weighted random sampling (WRS) over data streams. More precisely, we examine two natural interpretations of the item weights, describe an existing algorithm for each case [3, 8], discuss sampling with and without replacement, and show adaptations of the algorithms for several WRS problems and evolving data streams.
Notes
- 1. We say “supposed” because even though WRS is best described with a sequential sampling procedure, it is not inherently sequential. Algorithm A-ES [8], which we will use to solve WRS-W problems, can be executed in sequential, parallel and distributed settings.
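To make the note above concrete, here is a minimal Python sketch of the key-based idea behind A-ES as described in [8]: each item with weight w draws a uniform random number u and receives the key u^(1/w), and the k items with the largest keys form the sample. The function name `a_es_sample`, its parameters, and the handling of non-positive weights are assumptions made for illustration and are not taken from the chapter.

```python
import heapq
import random


def a_es_sample(stream, k, rng=None):
    """Sketch: weighted sample without replacement of up to k items.

    `stream` yields (item, weight) pairs. Hypothetical helper, not the
    chapter's reference implementation.
    """
    rng = rng or random.Random()
    heap = []  # min-heap of (key, arrival index, item); root has the smallest key
    for index, (item, weight) in enumerate(stream):
        if weight <= 0:
            continue  # assumption: non-positive weights are never sampled
        key = rng.random() ** (1.0 / weight)  # key = u^(1/w)
        if len(heap) < k:
            heapq.heappush(heap, (key, index, item))
        elif key > heap[0][0]:
            # The new key beats the smallest retained key, so replace it.
            heapq.heapreplace(heap, (key, index, item))
    return [item for _, _, item in heap]


if __name__ == "__main__":
    data = [("a", 1.0), ("b", 2.0), ("c", 7.0)]
    print(a_es_sample(data, 2, random.Random(42)))
```

Because each key depends only on the item's own weight and its own random draw, partial reservoirs computed on separate substreams could in principle be merged by keeping the k largest keys overall, which is one way to read the remark that the procedure is not inherently sequential.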
References
Aggarwal, C.C.: On biased reservoir sampling in the presence of stream evolution. In: VLDB 2006: Proceedings of the 32nd International Conference on Very Large Data Bases, pp. 607–618. VLDB Endowment (2006)
Al-Kateb, M., Lee, B.S.: Adaptive stratified reservoir sampling over heterogeneous data streams. Inf. Syst. 39, 199–216 (2014)
Chao, M.T.: A general purpose unequal probability sampling plan. Biometrika 69(3), 653–656 (1982)
Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Optimal sampling from distributed streams. In: Proceedings of the Twenty-ninth ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2010, pp. 77–86. ACM, New York (2010)
Cormode, G., Muthukrishnan, S., Yi, K., Zhang, Q.: Continuous sampling from distributed streams. J. ACM 59(2), 10:1–10:25 (2012)
Cormode, G., Shkapenyuk, V., Srivastava, D., Xu, B.: Forward decay: a practical time decay model for streaming systems. In: Proceedings of the 2009 IEEE International Conference on Data Engineering, ICDE 2009, pp. 138–149. IEEE Computer Society, Washington, DC (2009)
Devroye, L.: Non-Uniform Random Variate Generation. Springer, New York (1986)
Efraimidis, P.S., Spirakis, P.G.: Weighted random sampling with a reservoir. Inf. Process. Lett. 97(5), 181–185 (2006)
Goldberg, G., Harnik, D., Sotnikov, D.: The case for sampling on very large file systems. In: 30th Symposium on Mass Storage Systems and Technologies (MSST), pp. 1–11, June 2014
Hu, X., Qiao, M., Tao, Y.: Independent range sampling. In: Proceedings of the 33rd ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, PODS 2014, pp. 246–255. ACM, New York (2014)
Knuth, D.E.: The Art of Computer Programming: Seminumerical Algorithms, vol. 2, 2nd edn. Addison-Wesley Publishing Company, Reading (1981)
Li, K.-H.: Reservoir-sampling algorithms of time complexity O(n(1 + log(N/n))). ACM Trans. Math. Softw. 20(4), 481–493 (1994)
Longbo, Z., Zhanhuai, L., Yiqiang, Z., Min, Y., Yang, Z.: A priority random sampling algorithm for time-based sliding windows over weighted streaming data. In: Proceedings of the 2007 ACM Symposium on Applied Computing, SAC 2007, pp. 453–456. ACM, New York (2007)
Olken, F.: Random sampling from databases. Ph.D. thesis, Department of Computer Science, University of California at Berkeley (1993)
Tirthapura, S., Woodruff, D.P.: Optimal random sampling from distributed streams revisited. In: Peleg, D. (ed.) Distributed Computing. LNCS, vol. 6950, pp. 283–297. Springer, Heidelberg (2011)
Vitter, J.S.: Faster methods for random sampling. Commun. ACM 27(7), 703–718 (1984)
Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11(1), 37–57 (1985)
WRS.: A stream sampler for weighted random sampling. https://euclid.ee.duth.gr/demo/wrs/
Acknowledgments
The present work was supported in part by the project ATLAS (Advanced Tourism Planning), GSRT/CO-OPERATION/11SYN-10-1730, and by national ETAA funds.
Copyright information
© 2015 Springer International Publishing Switzerland
About this chapter
Cite this chapter
Efraimidis, P.S. (2015). Weighted Random Sampling over Data Streams. In: Zaroliagis, C., Pantziou, G., Kontogiannis, S. (eds) Algorithms, Probability, Networks, and Games. Lecture Notes in Computer Science, vol 9295. Springer, Cham. https://doi.org/10.1007/978-3-319-24024-4_12
DOI: https://doi.org/10.1007/978-3-319-24024-4_12
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24023-7
Online ISBN: 978-3-319-24024-4
eBook Packages: Computer Science, Computer Science (R0)