Encyclopedia of Algorithms

2008 Edition
Editors: Ming-Yang Kao

Weighted Random Sampling

2005; Efraimidis, Spirakis
  • Pavlos Efraimidis
  • Paul Spirakis
Reference work entry
DOI: https://doi.org/10.1007/978-0-387-30162-4_478

Keywords and Synonyms

Random number generation; Sampling

Problem Definition

The problem of random sampling without replacement (RS) calls for the selection of m distinct random items out of a population of size n. If all items have the same probability of being selected, the problem is known as uniform RS. Uniform random sampling in one pass is discussed in [1,6,11]. Reservoir-type uniform sampling algorithms over data streams are discussed in [12]. A parallel uniform random sampling algorithm is given in [10]. In weighted random sampling (WRS), the items are weighted and the probability of each item being selected is determined by its relative weight. WRS can be defined with the following algorithm D:

Algorithm D, a definition of WRS
Input: A population V of n weighted items

Output: A set S with a WRS of size m

1: For \( k = 1 \)
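The statement of algorithm D breaks off at this point in the text, so the following is only a minimal sketch of one common reading of the definition above, not the authors' algorithm. It assumes that items are drawn one at a time, and that at each step an item is chosen from those not yet selected with probability equal to its weight divided by the total weight of the remaining items. The function name `weighted_sample` and its parameters are hypothetical, introduced for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def weighted_sample(
    items: Sequence[T],
    weights: Sequence[float],
    m: int,
    rng: Optional[random.Random] = None,
) -> List[T]:
    """Draw m distinct items, one per round, proportionally to remaining weight.

    Assumption: this sequential scheme is one reading of the WRS definition;
    it is not taken from the truncated algorithm D.
    """
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    if not 0 <= m <= len(items):
        raise ValueError("m must satisfy 0 <= m <= n")
    if any(w <= 0 for w in weights):
        raise ValueError("weights must be strictly positive")

    rng = rng if rng is not None else random.Random()
    remaining = list(zip(items, weights))  # items not yet selected
    sample: List[T] = []

    for _round in range(m):
        total = sum(weight for _item, weight in remaining)
        threshold = rng.random() * total  # uniform in [0, total)
        cumulative = 0.0
        chosen = len(remaining) - 1  # fallback if rounding skips every bucket
        for index, (_item, weight) in enumerate(remaining):
            cumulative += weight
            if threshold < cumulative:
                chosen = index
                break
        item, _weight = remaining.pop(chosen)  # remove so it cannot repeat
        sample.append(item)
    return sample


if __name__ == "__main__":
    population = ["a", "b", "c", "d"]
    print(weighted_sample(population, [1.0, 2.0, 3.0, 4.0], m=2))
```

Each round scans the remaining items, so the sketch takes O(nm) time and keeps the whole population in memory; it is meant to illustrate the definition rather than to be efficient.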


Recommended Reading

  1. Ahrens, J.H., Dieter, U.: Sequential random sampling. ACM Trans. Math. Softw. 11, 157–169 (1985)
  2. Babcock, B., Babu, S., Datar, M., Motwani, R., Widom, J.: Models and issues in data stream systems. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on Principles of database systems, pp. 1–16. ACM Press (2002)
  3. Devroye, L.: Non-uniform Random Variate Generation. Springer, New York (1986)
  4. Efraimidis, P., Spirakis, P.: Weighted Random Sampling with a reservoir. Inf. Process. Lett. J. 97(5), 181–185 (2006)
  5. Jermaine, C., Pol, A., Arumugam, S.: Online maintenance of very large random samples. In: SIGMOD '04: Proceedings of the 2004 ACM SIGMOD international conference on Management of data, New York, pp. 299–310. ACM Press (2004)
  6. Knuth, D.: The Art of Computer Programming, vol. 2: Seminumerical Algorithms, 2nd edn. Addison-Wesley Publishing Company, Reading (1981)
  7. Lin, J.-H., Vitter, J.: ϵ-approximations with minimum packing constraint violation. In: 24th ACM STOC, pp. 771–782 (1992)
  8. Muthukrishnan, S.: Data streams: Algorithms and applications. Found. Trends Theor. Comput. Sci. 1, pp. 1–126 (2005)
  9. Olken, F.: Random Sampling from Databases. Ph.D. thesis, Department of Computer Science, University of California, Berkeley (1993)
  10. Rajan, V., Ghosh, R., Gupta, P.: An efficient parallel algorithm for random sampling. Inf. Process. Lett. 30, 265–268 (1989)
  11. Vitter, J.: Faster methods for random sampling. Commun. ACM 27, 703–718 (1984)
  12. Vitter, J.: Random sampling with a reservoir. ACM Trans. Math. Softw. 11, 37–57 (1985)

Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  • Pavlos Efraimidis (1)
  • Paul Spirakis (2)
  1. Department of Electrical and Computer Engineering, Democritus University of Thrace, Xanthi, Greece
  2. Department of Computer Engineering and Informatics, Research and Academic Computer Technology Institute, Patras University, Patras, Greece