# Efficient Sampling Methods for Discrete Distributions


## Abstract

We study the fundamental problem of the exact and efficient generation of random values from a finite, discrete probability distribution. Suppose that we are given *n* distinct events with associated probabilities \(p_1, \dots , p_n\). First, we consider the problem of sampling from the distribution in which the *i*-th event has probability proportional to \(p_i\). Second, we study the problem of sampling a subset that includes the *i*-th event independently with probability \(p_i\). For both problems, and on two different classes of inputs (sorted and general probabilities), we present efficient data structures consisting of a preprocessing and a query algorithm. Varying the allotted preprocessing time yields a trade-off between preprocessing and query time, which we prove to be asymptotically optimal everywhere.

### Keywords

Sampling algorithm · Subset sampling · Distribution · Proportional sampling · Data structures

## 1 Introduction

Generating random variables from finite and discrete distributions has long been an important building block in many applications. For example, computer simulations typically make a huge number of random decisions based on prespecified or dynamically changing distributions. In this work we consider two fundamental computational problems, namely sampling *from a distribution* and sampling *independent events*. We consider these problems on general probabilities as well as restricted to sorted probabilities. The latter case is motivated by the fact that many natural distributions, such as the geometric or binomial distribution, are unimodal, i.e., they change monotonicity at most once. After splitting up such a distribution at its unique extremum, we obtain two sorted sequences of probabilities; see Sect. 5 for a thorough discussion. As we will see, there is a rich interplay in designing efficient algorithms that solve these different problem variants.

We present our results on the classical Real RAM model of computation [1, 13]. In particular, we will assume that the following operations take constant time: (1) accessing the content of any memory cell, (2) generating a uniformly distributed real number in the interval [0, 1], and (3) performing basic arithmetical operations involving real numbers like addition, multiplication, division, comparison, truncation, and evaluating any fundamental function like exp and log. We argue in Sect. 5 that our algorithms can also be adapted to work on the Word RAM model of computation.

### 1.1 Proportional Sampling

We first focus on the classic problem of sampling from a given distribution. Given \(\mathbf {p}= (p_1,\ldots ,p_n) \in \mathbb {R}_{\ge 0}^n\), we define a random variable \(Y = Y_{\mathbf {p}}\) that takes values in [*n*] such that \({\text {Pr}}[Y = i] = {p_i}/{\mu }\), where \(\mu = \sum _{i=1}^n p_i\) is assumed to be positive. Note that if \(\mu =1\) then \(\mathbf {p}\) is indeed a probability distribution; otherwise we need to normalize first. We concern ourselves with the problem of sampling *Y*. We study this problem on two different classes of input sequences, sorted and general (i.e., not necessarily sorted) sequences; depending on the class under consideration we call the problem SortedProportionalSampling or UnsortedProportionalSampling.

A *single-sample algorithm* for SortedProportionalSampling or UnsortedProportionalSampling gets input \(\mathbf {p}\) and outputs a number \(s \in [n]\) that has the same distribution as *Y*. When we speak of “input \(\mathbf {p}\)” we mean that the algorithm gets to know *n* and can access every \(p_{i}\) in constant time. This can be achieved by storing all \(p_{i}\)’s in an array, but also, e.g., by having access to an algorithm computing any \(p_i\) in constant time. In particular, the algorithm does not know the number of *i*’s with \(p_{i} = 0\). Moreover, the input format is not sparse. For this problem we prove the following result.

### Theorem 1.1

There is a single-sample algorithm for SortedProportionalSampling with expected time \(\mathcal {O}\big (\frac{\log n}{\log \log n}\big )\) and for UnsortedProportionalSampling with expected time \(\mathcal {O}(n)\). Both bounds are asymptotically tight.

We remark that all our lower bounds only hold for algorithms that work for all *n* and all (sorted) sequences \(p_{1},\ldots ,p_{n}\). They are worst-case bounds over the input sequence \(\mathbf {p}\) and asymptotic in *n*. For particular instances \(\mathbf {p}\) there can be faster algorithms. To avoid any confusion, note that we mean worst-case bounds whenever we speak of *(running) time* and expected bounds whenever we speak of *expected (running) time*.

To obtain faster sampling times, we consider *sampling data structures* that support ProportionalSampling as a *query*. We view building the data structure as *preprocessing* of the input. More precisely, in this preprocessing-query variant we consider the interplay of two algorithms. First, the *preprocessing algorithm* *P* gets \(\mathbf {p}\) as input and computes some auxiliary data \(D = D(\mathbf {p})\). Second, the *query algorithm* *Q* gets input \(\mathbf {p}\) and *D*, and samples *Y*, i.e., for any \(s \in [n]\) we have \({\text {Pr}}[Q(\mathbf {p},D) = s] = {\text {Pr}}[Y = s]\). In \({\text {Pr}}[Q(\mathbf {p},D) = s]\), the probability is taken only over the random choices of *Q*, so that, after running the preprocessing once, running the query algorithm multiple times generates multiple independent samples. In this setting we prove the following tight result.

### Theorem 1.2

For any \(2 \le \beta \le \mathcal {O}(\frac{\log n}{\log \log n})\), SortedProportionalSampling can be solved in preprocessing time \(\mathcal {O}(\log _\beta n)\) and expected query time \(\mathcal {O}(\beta )\). This is optimal, as there is a constant \(\varepsilon > 0\) such that for all \(2 \le \beta \le \mathcal {O}(\frac{\log n}{\log \log n})\) SortedProportionalSampling has no data structure with preprocessing time \(\varepsilon \log _\beta (n)\) and expected query time \(\varepsilon \, \beta \).

Note that if we can afford a preprocessing time of \(\mathcal {O}(\log n)\) then the query time is already \(\mathcal {O}(1)\), which is optimal. Thus, larger preprocessing times cannot yield better query times. Moreover, for \(\beta = \Theta (\frac{\log n}{\log \log n})\) the preprocessing time is equal to the query time. Thus, we may skip the preprocessing phase and run both the preprocessing and query algorithm for every sample. We obtain a single-sample algorithm with runtime \(\mathcal {O}(\frac{\log n}{\log \log n})\). This shows that \(\beta \gg \frac{\log n}{\log \log n}\) makes no sense and explains why we allow preprocessing time \(\mathcal {O}(\log _\beta n)\) with \(2 \le \beta \le \mathcal {O}(\frac{\log n}{\log \log n})\). Varying \(\beta \) yields a trade-off between preprocessing and query time; if one wants to have a large number of samples, one should set \(\beta = 2\) to minimize query time, while a large \(\beta \) yields superior runtimes if one wants only a small number of samples. Note that we prove a matching lower bound for this trade-off for all \(\beta \).

For general input sequences, ProportionalSampling can be solved by the technique known as *pairing* or *aliasing* [8, 17]; see also Mihai Pătraşcu’s blog [14] for an excellent exposition. Basically, we use \(\mathcal {O}(n)\) preprocessing to distribute the probabilities of all elements over *n* urns such that any urn contains exactly \(1/n\) probability mass, stemming from at most two elements. For querying we first choose an urn uniformly at random. Then we choose one of the two included elements randomly according to their probability mass in the urn, resulting in an \(\mathcal {O}(1)\) (worst-case) query time. This result is not new, but will be used in the proofs of Theorem 1.5 and Theorem 1.6 below, so we include it for completeness.
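As a concrete illustration, the urn construction and query can be sketched in a few lines (a minimal Python version in Vose's formulation of the alias method; the function names are ours, not from the paper):

```python
import random

def build_alias_table(p):
    """Preprocessing: distribute the mass of p over n urns so that each urn
    carries total mass 1/n, stemming from at most two elements."""
    n = len(p)
    total = sum(p)
    scaled = [x * n / total for x in p]          # now each urn should hold mass 1
    small = [i for i, x in enumerate(scaled) if x < 1.0]
    large = [i for i, x in enumerate(scaled) if x >= 1.0]
    prob = [0.0] * n   # prob[i]: share of urn i belonging to element i itself
    alias = [0] * n    # alias[i]: the second element stored in urn i
    while small and large:
        s, l = small.pop(), large.pop()
        prob[s], alias[s] = scaled[s], l
        scaled[l] -= 1.0 - scaled[s]             # urn s borrows the rest from l
        (small if scaled[l] < 1.0 else large).append(l)
    for i in large + small:                      # leftovers hold exactly mass 1
        prob[i] = 1.0
    return prob, alias

def alias_sample(prob, alias):
    """Query in O(1): pick an urn uniformly, then one of its two elements."""
    i = random.randrange(len(prob))
    return i if random.random() < prob[i] else alias[i]
```

Preprocessing runs in time \(\mathcal {O}(n)\) and each query takes constant time, matching the bounds of Theorem 1.3.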

### Theorem 1.3

UnsortedProportionalSampling can be solved in preprocessing time \(\mathcal {O}(n)\) and query time \(\mathcal {O}(1)\). This is optimal, as there is a constant \(\varepsilon > 0\) such that UnsortedProportionalSampling has no data structure with preprocessing time \(\varepsilon n\) and expected query time \(\varepsilon n\).

Note that any data structure with preprocessing time \(t_p\) and query time \(t_q\) can be transformed into a single-sample algorithm with expected time \(t_p+t_q\), so the single-sample variant of the problem is also solved by the preprocessing-query variant. This argument proves that Theorem 1.1 follows from Theorems 1.2 and 1.3.

*Related work* The fundamental problem of the exact and efficient generation of random values from discrete and continuous distributions has been studied extensively in the literature. The seminal work [9] examines the power of several restricted devices, like finite-state machines; the articles [6, 18] provide a further refined treatment of the topic. However, their results are not directly comparable to ours, since on the one hand they do not make any assumption on the sequence of probabilities and use unbiased coin flips as the only source of randomness, but on the other hand they cannot guarantee efficient precomputation on general sequences. Furthermore, [7] and [10] provided algorithms for a dynamic version of UnsortedProportionalSampling, where the probabilities may change over time. In particular, under certain mild conditions their results guarantee the same bounds as in Theorem 1.3. Finally, there is a solution to UnsortedProportionalSampling [3] that can be implemented on a Word RAM (i.e., the \(p_i\)’s are each represented by *w* bits, and the usual arithmetic operations on *w*-bit integers take constant time) that improves upon Walker’s technique and has optimal space and time requirements.

### 1.2 Subset Sampling

In the subset sampling problem we want to generate a random subset *S* of \(\{1,\ldots ,n\}\), where the values \(p_i = {\text {Pr}}[i \in S]\) are given as an input, and the events “\(i \in S\)” are independent. In other words, we are given \(\mathbf {p}= (p_1,\ldots ,p_n)\) as input and we want to sample a random variable \(X \subseteq [n]\) with \({\text {Pr}}[X = T] = \prod _{i \in T} p_i \cdot \prod _{i \in [n] \setminus T} (1-p_i)\) for every \(T \subseteq [n]\). We call the problem of sampling *X* SortedSubsetSampling or UnsortedSubsetSampling, if we consider it on sorted or general input sequences, respectively.

The motivation for these problems comes from sampling certain random graphs. Consider for instance the Chung-Lu random graph model [4]: We are given weights \(w_1 \ge \cdots \ge w_n\) and sample a graph on vertex set [*n*] where the edge \(\{i,j\}\) is independently present with probability \({\text {min}}\{1, \frac{w_i w_j}{\sum _k w_k}\}\). Note that for any fixed vertex *i*, the edge probabilities to vertices \(j>i\) are sorted in descending order. Thus, sampling the set of neighbors of vertex *i* is an instance of SortedSubsetSampling. Solving these instances for all vertices *i* yields a Chung-Lu random graph, and our algorithms from this paper do this in total time \(\mathcal {O}(n \log n + m)\), where *m* is the expected number of edges. This does not match the optimal \(\mathcal {O}(n+m)\) [11], because we ignore the structure connecting the different arising instances. However, it serves as a motivating example.

As previously, we consider two variations of SubsetSampling. In the *single-sample* variant we are given \(\mathbf {p}\) and we want to compute an output that has the same distribution as *X*. Moreover, in the *preprocessing-query* variant we have a precomputation algorithm that is given \(\mathbf {p}\) and computes some auxiliary data *D*, and a query algorithm that is given \(\mathbf {p}\) and *D* and has an output with the same distribution as *X*, where the results of multiple calls to the query algorithm are independent.

No query algorithm can run faster than \(\mathcal {O}(1 + \mu )\) in expectation, as its expected output size is \(\mu \) and any algorithm requires a running time of \(\Omega (1)\). Whether this query time is achievable depends on \(\mu \) and the allotted preprocessing time, as our results below make precise. Note that the single-sample variant of UnsortedSubsetSampling can be solved trivially in time \(\mathcal {O}(n)\); we just toss a biased coin for every \(p_i\). This algorithm is optimal, as shown by the following tight result.
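The trivial linear-time algorithm just mentioned is a one-liner; a Python sketch:

```python
import random

def naive_subset_sample(p):
    """Toss one biased coin per element; time Theta(n) regardless of
    the expected output size mu = sum(p). Indices are reported 1-based."""
    return [i for i, pi in enumerate(p, start=1) if random.random() < pi]
```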

### Theorem 1.4

Let us discuss what we mean by “asymptotically tight for any fixed \(\mu = \mu (n)\)”. Fix any \(\mu = \mu (n)\). Consider any single-sample algorithm for SortedSubsetSampling that, given any \(\mathbf {p}\) (not necessarily with \(\mu _\mathbf {p}= \mu \)), correctly samples from the desired distribution. Then there exists an input \(\mathbf {p}\) with \(\mu _\mathbf {p}= \mu \) such that the expected time of the algorithm on input \(\mathbf {p}\) is \(\Omega (t(n,\mu ))\), where \(t(n,\mu )\) is defined in Theorem 1.4. This holds even if we allow the algorithm to have a very large runtime for all instances with \(\mu _\mathbf {p}\ne \mu \). In particular, our runtime bound is not only tight for one infinite family of input \(\mathbf {p}\) (realizing a particular function \(\mu (n)\)), but for every \(\mu (n)\) we construct a hard family of inputs. A similar discussion applies to Theorems 1.5 and 1.6 below.

As for ProportionalSampling, the single-sample result Theorem 1.4 follows from our results on the preprocessing-query variant below.

### Theorem 1.5

Observe that setting \(\beta =2\) in the above result yields a preprocessing time of \(\mathcal {O}(\log n)\) and an (optimal) expected query time of \(\mathcal {O}(1+\mu )\).

The next result addresses the case of general, i.e., not necessarily sorted, probabilities.

### Theorem 1.6

UnsortedSubsetSampling can be solved in preprocessing time \(\mathcal {O}(n)\) and expected query time \(\mathcal {O}(1 + \mu )\). This is optimal, as there is a constant \(\varepsilon > 0\) such that UnsortedSubsetSampling has no data structure with preprocessing time \(\varepsilon n\) and expected query time \(\varepsilon n\) for any fixed \(\mu = \mu (n)\).

Both positive results in the previous theorems depend heavily on each other. In particular, as demonstrated in Sect. 2.2, we prove them by repeatedly reducing the instance size *n* and switching from one problem variant to the other.

We also present a relation between ProportionalSampling and SubsetSampling that suggests that the classic problem ProportionalSampling is the easier of the two problems (or can be seen as a special case of SubsetSampling). Specifically, we present a reduction that allows one to infer the upper bounds for ProportionalSampling (Theorems 1.2 and 1.3) from the upper bounds for SubsetSampling (Theorems 1.5 and 1.6), see Sect. 4 for details.

*Related work* A classic algorithm solves SubsetSampling for \(p_1 = \ldots = p_n = p\) in the optimal expected time \(\mathcal {O}(1 + \mu )\), see, e.g., the monographs [5] and [8], where also many other cases are discussed. Indeed, observe that the index \(i_1\) of the first sampled element is geometrically distributed, i.e., \({\text {Pr}}[i_1 = i] = (1-p)^{i-1} p\). Such a random value can be generated by setting \(i_1 = \lceil \frac{\log {\text {rand}}()}{\log (1-p)} \rceil \). Moreover, after having sampled the index of the first element, we iterate the process starting at \(i_1+1\) to sample the second element, and so on, until we arrive for the first time at an index \(i_k > n\). In [16] the “orthogonal” problem is considered, where we want to uniformly sample a fixed number of elements from a stream of objects. The problem of UnsortedSubsetSampling was considered also in [15], where algorithms with linear preprocessing time and suboptimal query time \(\mathcal {O}(\log n + \mu )\) were designed. Our results improve upon this running time, and provide matching lower bounds.
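For equal probabilities \(p_1 = \ldots = p_n = p\), this skipping procedure can be sketched as follows (Python; we use the ceiling form of the geometric inversion so that each gap is at least 1, and guard against \({\text {rand}}() = 0\)):

```python
import math
import random

def equal_prob_subset(n, p):
    """Sample S, a subset of {1,...,n}, with Pr[i in S] = p independently,
    in expected time O(1 + n*p): jump directly to the next sampled index."""
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(1, n + 1))
    out, i = [], 0
    while True:
        u = random.random() or 1e-300                        # avoid log(0)
        i += math.ceil(math.log(u) / math.log(1.0 - p))      # geometric gap >= 1
        if i > n:
            return out
        out.append(i)
```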

### 1.3 Notation and Organization

In the remainder, we will write \(\ln x\) for the natural logarithm of *x*, \(\log _t x = \ln x / \ln t\), and \(\log x = \log _2 x\). Finally, we will write \({\text {rand}}()\) for a uniform random number in [0, 1].

The rest of the paper is structured as follows. In Sect. 2 we present our new algorithms, proving (the upper bounds of) Theorem 1.2 in Sect. 2.1 and Theorems 1.5 and 1.6 in Sect. 2.2. In Sect. 3 we present the lower bounds, proving (the lower bounds of) Theorems 1.3 and 1.6 in Sect. 3.1, Theorem 1.2 in Sect. 3.2, and Theorem 1.5 in Sect. 3.3. We present our reduction from ProportionalSampling to SubsetSampling in Sect. 4. We discuss relaxations to our input and machine model and possible extensions in Sect. 5.

## 2 Upper Bounds

### 2.1 A Simple Algorithm for Sorted Proportional Sampling

In this section, we prove the upper bound of Theorem 1.2 by presenting an algorithm for SortedProportionalSampling with \(\mathcal {O}(\beta )\) expected query time after \(\mathcal {O}(\log _\beta n)\) preprocessing, where \(2 \le \beta \le \mathcal {O}(\frac{\log n}{\log \log n})\) is a parameter. We remark that our algorithm also works for \(\beta \gg \frac{\log n}{\log \log n}\), but is not meaningful in this case, because then the preprocessing time is less than the query time.

We partition [*n*] into blocks \(B_k := \{\beta ^k,\ldots ,\beta ^{k+1}-1\} \cap [n]\) for \(0 \le k \le L\), where \(L := \lfloor \log _\beta n \rfloor \). For \(i \in B_k\) we set \(\overline{p}_i := p_{\beta ^k}\), which is an upper bound for \(p_i\), since the input is sorted. Let \(\mu := \sum _i p_i\) and \(\overline{\mu }:= \sum _i \overline{p}_i\). We also set for \(0 \le k \le L\) the block mass \(q_k := \sum _{i \in B_k} \overline{p}_i = |B_k| \cdot p_{\beta ^k}\).

The query algorithm first samples an index *i* with distribution proportional to \(\overline{p}_1,\ldots ,\overline{p}_n\). To this end, we sample a block \(B_k\) proportional to the distribution \(q_0,\ldots ,q_L\) and then sample an index \(i \in B_k\) uniformly at random. Second, with probability \(1 - p_i/\overline{p}_i\) we reject *i* and repeat the whole process. Otherwise we return *i*. This culminates in Algorithm 1.

Overall, the probability of returning *i* is proportional to \(\overline{p}_i \cdot p_i/\overline{p}_i = p_i\), so we obtain an exact sampling algorithm. Moreover, in any iteration of the loop the probability *r* of not rejecting, i.e., of leaving the loop, is \(r = \mu / \overline{\mu }\), which is at least \(1/\beta \) by the following lemma.
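A Python sketch of this rejection scheme (our rendering of the idea behind Algorithm 1; for brevity the block is selected by a linear scan over the block masses, whereas the actual data structure samples the block in constant time via the structure of Theorem 1.3):

```python
import random

def preprocess(p, beta):
    """p sorted non-increasingly. Block B_k = {beta^k, ..., beta^(k+1)-1}
    (1-based, clipped to n); its upper-bound mass is q_k = |B_k| * p[beta^k - 1]."""
    n = len(p)
    q, blocks = [], []
    start = 1
    while start <= n:
        end = min(start * beta - 1, n)
        q.append((end - start + 1) * p[start - 1])
        blocks.append((start, end))
        start *= beta
    return q, blocks

def query(p, q, blocks):
    """Return i with probability proportional to p[i-1], by rejection."""
    total = sum(q)                # the upper-bound mass; precomputed in practice
    while True:
        r = random.random() * total
        for (start, end), qk in zip(blocks, q):   # block chosen ~ q_0, ..., q_L
            r -= qk
            if r < 0:
                break
        i = random.randint(start, end)            # uniform index in the block
        if random.random() * p[start - 1] < p[i - 1]:
            return i              # accept with probability p_i / (upper bound)
```

Each loop iteration succeeds with probability \(\mu / \overline{\mu }\ge 1/\beta \) by Lemma 2.1, so the expected number of iterations is at most \(\beta \).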

### Lemma 2.1

We have \(\mu \le \overline{\mu }\le \beta \cdot \mu \).

### Proof

The first inequality is immediate, since \(\overline{p}_i \ge p_i\) for all *i*. For the second inequality, note that \(|B_k| \le (\beta - 1)\beta ^k\) and, by sortedness, \(p_{\beta ^k} \le p_i\) for all \(\beta ^{k-1} < i \le \beta ^k\). Hence, for \(k \ge 1\) we obtain \(\sum _{i \in B_k} \overline{p}_i \le (\beta - 1)\beta ^k p_{\beta ^k} \le \beta \sum _{\beta ^{k-1} < i \le \beta ^k} p_i\), while for \(k = 0\) we have \(\sum _{i \in B_0} \overline{p}_i \le (\beta - 1) p_1 \le \beta p_1\). Summing over all *k*, the index ranges on the right-hand sides are disjoint, so \(\overline{\mu }\le \beta \mu \). \(\square \)

### 2.2 Subset Sampling

In this section we consider SortedSubsetSampling and UnsortedSubsetSampling and prove the upper bounds of Theorems 1.5 and 1.6. An interesting interplay between both of these problem variants will be revealed on the way.

We begin with an algorithm for unsorted probabilities that has a quite large preprocessing time, but will be used as a base case later. The algorithm uses Theorem 1.3.

### Lemma 2.2

UnsortedSubsetSampling can be solved in preprocessing time \(\mathcal {O}(n^2)\) and expected query time \(\mathcal {O}(1+\mu )\).

### Proof

In the preprocessing phase we compute, for each \(i \in [n]\), the distribution of \(S_i\), the smallest sampled element \(j \ge i\), or \(\infty \), if no such element is sampled. Then \(S_i\) is a random variable such that \({\text {Pr}}[S_i = j] = p_j \prod _{k=i}^{j-1} (1-p_k)\) for \(i \le j \le n\), and \({\text {Pr}}[S_i = \infty ] = \prod _{k=i}^{n} (1-p_k)\). This distribution can be computed in time \(\mathcal {O}(n)\) for each *i*, i.e., in time \(\mathcal {O}(n^2)\) for all *i*. After having computed the distribution of the \(S_i\)’s, we execute, for each \(i \in [n]\), the preprocessing of Theorem 1.3, which allows us to quickly sample \(S_i\) later on. This preprocessing takes time \(\mathcal {O}(n^2)\).

For querying, we start at \(i=1\) and iteratively sample the smallest element \(j \ge i\) (i.e., sample \(S_i\)), output *j*, and start over with \(i = j+1\). This is done until \(j = \infty \) or \(i = n+1\). Note that any sample of \(S_i\) can be computed in \(\mathcal {O}(1)\) time with our preprocessing, so that sampling \(S \subseteq [n]\) will be done in time \(\mathcal {O}(1 + |S|)\). The expected runtime is, thus, \(\mathcal {O}(1+\mu )\). \(\square \)
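A Python sketch of this base case (for brevity each \(S_i\) is sampled below by a linear scan over its distribution; the actual construction instead feeds each distribution into the structure of Theorem 1.3 to sample \(S_i\) in constant time):

```python
import random

def preprocess_successors(p):
    """For each i (0-based), the distribution of the smallest sampled index
    j >= i: dist[i][j] = p[j] * prod_{k=i..j-1} (1 - p[k]); the last entry
    dist[i][n] is the probability that no j >= i is sampled ('infinity')."""
    n = len(p)
    dist = []
    for i in range(n):
        row, stay = [0.0] * (n + 1), 1.0
        for j in range(i, n):
            row[j] = stay * p[j]
            stay *= 1.0 - p[j]
        row[n] = stay
        dist.append(row)
    return dist

def query_subset(dist, n):
    """Chain the successor samples: expected time O(1 + |S|), given O(1)-time
    sampling per row (simulated here by a linear scan)."""
    out, i = [], 0
    while i < n:
        r, j = random.random(), i
        while j < n and r >= dist[i][j]:
            r -= dist[i][j]
            j += 1
        if j >= n:                 # fell into the 'infinity' mass
            break
        out.append(j + 1)          # report 1-based indices
        i = j + 1
    return out
```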

After having established this base case, we turn towards reductions between SortedSubsetSampling and UnsortedSubsetSampling. First, we give an algorithm for UnsortedSubsetSampling that reduces the problem to SortedSubsetSampling. For this, we roughly sort the probabilities so that we get good upper bounds for each probability. Then these upper bounds will be a sorted instance. After querying from this sorted instance, we use rejection (see, e.g., [8]) to sample with the original probabilities.

### Lemma 2.3

Assume that SortedSubsetSampling can be solved in preprocessing time \(t_p(n,\mu )\) and expected query time \(t_q(n,\mu )\), where \(t_p\) and \(t_q\) are monotonically increasing in *n* and \(\mu \). Then UnsortedSubsetSampling can be solved in preprocessing time \(\mathcal {O}(n + t_p(n, 2\mu + 1))\) and expected query time \(\mathcal {O}(1 + \mu + t_q(n, 2\mu + 1))\).

### Proof

In the preprocessing phase we compute, in time \(\mathcal {O}(n)\), upper bounds \(\overline{p}_i \ge p_i\) (e.g., by rounding each \(p_i\) up to the next power of \(\frac{1}{2}\)) that can be arranged into a sorted instance with total mass \(\overline{\mu }\le 2\mu + 1\), and we run the given preprocessing for SortedSubsetSampling on this sorted instance. For querying, we first sample a set \(S'\) according to the upper bounds \(\overline{p}_i\), and then use rejection: each \(i \in S'\) is kept with probability \(p_i/\overline{p}_i\), and with the remaining probability we delete *i* from \(S'\); the surviving elements form *S*. Note that we have thus sampled *i* with probability \(\overline{p}_i \cdot p_i / \overline{p}_i = p_i\), and all elements are sampled independently, so *S* has the desired distribution. Moreover, since the expected size of \(S'\) is \(\overline{\mu }\), the expected query time is bounded by \(\mathcal {O}(1 + \overline{\mu }+ t_q(n, \overline{\mu })) = \mathcal {O}(1 + \mu + t_q(n, 2\mu + 1))\). \(\square \)
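The rejection step at the heart of this reduction is elementary; a Python sketch (here `s_prime` stands for a set sampled according to upper bounds `p_bar`, as returned by the query on the sorted instance):

```python
import random

def thin_to_target(s_prime, p, p_bar):
    """Keep each i in s_prime independently with probability p[i]/p_bar[i].
    Since i enters s_prime with probability p_bar[i], it survives with
    probability p_bar[i] * (p[i] / p_bar[i]) = p[i]."""
    return [i for i in s_prime if random.random() < p[i] / p_bar[i]]
```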

We also give a reduction in the other direction, solving SortedSubsetSampling by UnsortedSubsetSampling.

### Lemma 2.4

Let \(2 \le \beta < n\). Assume that UnsortedSubsetSampling can be solved in preprocessing time \(t_p(n,\mu )\) and expected query time \(t_q(n,\mu )\), where \(t_p\) and \(t_q\) are monotonically increasing in *n* and \(\mu \). Then SortedSubsetSampling can be solved in preprocessing time \(\mathcal {O}(\log _\beta n + t_p(1+ \log _\beta n, \beta \mu ))\) and expected query time \(\mathcal {O}(1 + \beta \mu + t_q(1 + \log _\beta n, \beta \mu ))\). More precisely, our preprocessing computes a value \(\overline{\mu }\) with \(\mu \le \overline{\mu }\le \beta \mu \) and the expected query time is \(\mathcal {O}(1 + \overline{\mu }+ t_q(1 + \log _\beta n, \overline{\mu }))\).

### Proof

We again partition the (sorted) input into blocks and use the largest probability in each block as a common upper bound for the probabilities in that block. We first determine which blocks contain at least one element sampled with respect to these upper bounds, calling such elements *potential*, and then use rejection. For this, let \(X_k\) be an indicator random variable for the event that we sample *at least one* potential element in \(B_k\). The \(X_k\) form an instance of UnsortedSubsetSampling of size \(\mathcal {O}(\log _\beta n)\), which we solve using the given algorithm. Within a block that contains a potential element, the positions of the potential elements can then be generated using a geometric random variable *Z* with parameter *p* truncated at *m*, i.e., \({\text {Pr}}[Z = i] = p(1-p)^i / q\) for \(i \in \{0,\ldots ,m-1\}\), where \(q := 1-(1-p)^m\). Then \(\lfloor \log (1-q \cdot {\text {rand}}()) / \log (1-p) \rfloor \) samples from *Z*; see also [8]. Finally, each potential element *i* is kept with probability \(p_i/\overline{p}_i\).
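A Python sketch of this constant-time inversion (the final `min` merely guards against floating-point rounding; mathematically the floor already lies in \(\{0,\ldots ,m-1\}\)):

```python
import math
import random

def truncated_geometric(p, m):
    """Sample Z with Pr[Z = i] = p * (1-p)^i / q for i in {0, ..., m-1},
    where q = 1 - (1-p)^m, assuming 0 < p < 1 and m >= 1."""
    q = 1.0 - (1.0 - p) ** m
    u = random.random()
    z = math.floor(math.log(1.0 - q * u) / math.log(1.0 - p))
    return min(m - 1, z)
```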

Next, we put the above three lemmas together to prove the upper bounds of Theorems 1.5 and 1.6.

### Proof of Theorem 1.6, upper bound

To solve UnsortedSubsetSampling, we use the reduction Lemma 2.3 and then Lemma 2.4 (where we set \(\beta = 2\)), followed by the base case Lemma 2.2. This reduces the instance size from *n* to \(\mathcal {O}(\log n)\), so that preprocessing costs \(\mathcal {O}(n)\) for the invocation of the first lemma, \(\mathcal {O}(\log n)\) for the second, and \(\mathcal {O}(\log ^2 n)\) for the third. Note that \(\mu \) is increased only by constant factors, so that we indeed get a query time of \(\mathcal {O}(1 + \mu )\).

For SortedSubsetSampling we first prove a weaker statement than Theorem 1.5, which follows from simply putting together the reductions of this section.

### Lemma 2.5

Let \(2 \le \beta < n\). Then SortedSubsetSampling can be solved in preprocessing time \(\mathcal {O}(\log _\beta n)\) and expected query time \(\mathcal {O}(1 + \beta \mu )\). More precisely, our preprocessing computes a value \(\overline{\mu }\) with \(\mu \le \overline{\mu }\le \beta \mu \) and the expected query time is \(\mathcal {O}(1 + \overline{\mu })\).

### Proof

To solve SortedSubsetSampling, we use the reduction presented in Lemma 2.4 followed by the upper bound of Theorem 1.6 that we proved above. This reduces the instance size from *n* to \(\mathcal {O}(\log _\beta n)\) while \(\mu \) is increased to \(\mathcal {O}(1+\beta \mu )\). We obtain the desired preprocessing time \(\mathcal {O}(\log _\beta n)\) and query time \(\mathcal {O}(1 + \beta \mu )\). \(\square \)

### Proof of Theorem 1.5, upper bound

Assume that we are allowed preprocessing time \(\mathcal {O}(\log _{\tilde{\beta }} n)\) for some \(2 \le \tilde{\beta }< n\). Our algorithm for SortedSubsetSampling simply runs the preprocessing of Lemma 2.5 with \(\beta = \tilde{\beta }\) to satisfy the preprocessing time constraint.

For querying, we improve upon the runtime of Lemma 2.5 as follows. For any \(\beta \in \{2,\ldots ,n\}\), let \(\overline{\mu }(\beta )\) be the upper bound on \(\mu \) computed by Lemma 2.5 given \(\mathcal {O}(\log _\beta n)\) preprocessing time. Initially, we set \(\beta := \tilde{\beta }\) so that \(\overline{\mu }(\beta ) = \overline{\mu }(\tilde{\beta })\) was computed by our preprocessing. If \(1 + \overline{\mu }(\tilde{\beta }) \le \log _{\tilde{\beta }} n\) then we run the query algorithm of Lemma 2.5 and are done. Otherwise, we repeatedly set \(\beta := \lceil \beta ^{1/2} \rceil \) and rerun the preprocessing of Lemma 2.5, until \(\beta = 2\) or \(1 + \overline{\mu }(\beta ) \le \log _\beta n\). Then we run the query algorithm of Lemma 2.5.

## 3 Lower Bounds

We prove most of our lower bounds by reducing ArraySearch to the various sampling problems, using the following fact: searching in an unordered array of length *m* takes time \(\Omega (m)\). A notable exception is Lemma 3.4.

### Fact 3.1

Consider problem ArraySearch: Given *m* and query access to an array \(A \in \{0,1\}^m\) consisting of *m* bits, with exactly one bit set to 1, find the position of this bit. Any randomized algorithm for ArraySearch needs \(\Omega (m)\) accesses to *A* in expectation.

### 3.1 Proportional Sampling on Unsorted Probabilities

The lower bound for Theorem 1.3 is provided by the following lemma that reduces ArraySearch to UnsortedProportionalSampling. Moreover, the same proof yields the lower bound of Theorem 1.6 for UnsortedSubsetSampling.

### Lemma 3.2

Any single-sample algorithm for UnsortedProportionalSampling has expected time \(\Omega (n)\). Moreover, any single-sample algorithm for UnsortedSubsetSampling has expected time \(\Omega (n)\).

### Proof

Let *A* be an instance of ArraySearch of size *n*, say with 1-bit at position \(\ell ^*\). We consider the instance \(\mathbf {p}^A\) with \(p_i^A := A[i]\) for all \(i \in [n]\). On this instance, any sampling algorithm for UnsortedProportionalSampling returns \(\ell ^*\) with probability 1, i.e., it finds the 1-bit of *A*. Hence, by Fact 3.1, any algorithm for UnsortedProportionalSampling takes expected time \(\Omega (n)\).

Observe that on the same instance \(\mathbf {p}^A\) any sampling algorithm for UnsortedSubsetSampling returns the set \(\{\ell ^*\}\) with probability 1. This needs expected time \(\Omega (n)\) for the same reasons. With varying \(\mu \), no better bound is possible, either: If \(\mu \ge 1\), consider an ArraySearch instance *A* of length \(n-s\), where \(s := \lceil \mu - 1 \rceil \). Let \(p_i^{A} = A[i]\) for \(1 \le i \le n-s\) and set the last *s* probabilities \(p_i^A\) to values that sum up to \(\mu - 1\). Then we still need runtime \(\Omega (n - \mu )\) by Fact 3.1. As we also need runtime \(\Omega (\mu )\) for outputting the result, the lower bound of \(\Omega (n)\) follows. Otherwise, if \(\mu < 1\), then we consider \(\tilde{p}_i^A := \mu \cdot A[i]\). Since the algorithm does not know \(\mu \), it behaves just as in the case \(\mu = 1\) *until it reads* \(p_{\ell ^*}^A\). However, finding \(\ell ^*\) takes time \(\Omega (n)\), which yields the result. \(\square \)

### 3.2 Proportional Sampling on Sorted Probabilities

Here we present the proof of the lower bound of Theorem 1.2 for SortedProportionalSampling.

### Proof of Theorem 1.2, lower bound

Let \(n \in \mathbb {N}\) and \(2 \le \beta \le \mathcal {O}(\frac{\log n}{\log \log n})\). Let \(s_i := \sum _{j=0}^{i-1} \beta ^j = (\beta ^i - 1)/(\beta - 1)\). Let *L* be maximal with \(s_L \le n\) and note that \(L = \Theta (\log _\beta n)\). Then \(\beta \le \mathcal {O}(\frac{\log n}{\log \log n})\) implies \(\beta = \mathcal {O}(L)\). We consider blocks \(B_i := \{ s_i, s_i+1, \ldots , s_i+\beta ^{i-1}-1 \}\), for \(i=1,\ldots ,L\), that partition \(\{1,\ldots ,s_L\}\).

Let *A* be an instance of ArraySearch of size *L*, say with 1-bit at position \(\ell ^*\). To construct the instance \(\mathbf {p}= \mathbf {p}^A = (p_1^A,\ldots ,p_n^A)\) we set the input to \(p_j^A := \beta ^{-\ell + A[\ell ]}\) for any \(\ell \in \{1,\ldots ,L\}\) and \(j \in B_\ell \), and to \(p_j^A := 0\) for \(s_L < j \le n\); note that this sequence is indeed sorted.

In the following we will prove that there is no sampling algorithm where the preprocessing reads at most \(\varepsilon L\) input values and the querying reads at most \(\varepsilon \beta \) input values in expectation, for a sufficiently small constant \(\varepsilon >0\). Assume, for the sake of contradiction, that such an algorithm exists. On \(\mathbf {p}^A\) we run the preprocessing and then *K* times the query algorithm, sampling *K* numbers \(X_1,\ldots ,X_K \in \{1,\ldots ,n\}\). Denote by \(Y_k\) the block of \(X_k\), i.e., \(X_k \in B_{Y_k}\). If \(A[Y_k] = 1\) for some \(1 \le k \le K\) then we return \(Y_k\), otherwise we linearly search for the 1-bit of *A*.

This procedure finds the 1-bit of *A*. Since the total probability mass of block \(B_{\ell ^*}\) is \(\beta \), while the total mass is \(\mu _{\mathbf {p}^A} = L + \beta - 1 = \mathcal {O}(L)\), each query samples an element of \(B_{\ell ^*}\) with probability \(\Omega (\beta /L)\). Hence, for \(K = \Theta (\log (1/\varepsilon ) \cdot L / \beta )\) the probability that no query hits \(B_{\ell ^*}\) is at most \(\varepsilon \). The expected runtime on *A* of the constructed algorithm is (counting preprocessing, *K* queries, and a possible linear search through *A*) at most \(\mathcal {O}(\varepsilon L + K \cdot \varepsilon \beta + \varepsilon L) = \mathcal {O}(\log (1/\varepsilon ) \, \varepsilon L)\). Thus the constructed algorithm solves ArraySearch on instances of size *L* in expected time \(\mathcal {O}(\log (1/\varepsilon ) \, \varepsilon L)\), which contradicts Fact 3.1 for sufficiently small \(\varepsilon > 0\). \(\square \)

Note that the same proof also works for single-sample algorithms. In this case the preprocessing reads no input values, and the only restriction is \(\beta \le \mathcal {O}(L)\). Setting \(\beta = \Theta (\log (n)/\log \log (n))\), this yields a lower bound of \(\Omega (\log (n)/\log \log (n))\) on the expected runtime of any single-sample algorithm for SortedProportionalSampling.

### 3.3 Subset Sampling on Sorted Probabilities

We first prove two lemmas that establish lower bounds for SortedSubsetSampling in different situations. Then we show how the lower bound of Theorem 1.5 follows from these lemmas.

### Lemma 3.3

Let \(\beta \in \{2,\ldots ,n\}\). Consider any data structure for SortedSubsetSampling with preprocessing time \(\varepsilon \log _{\beta } n\) (where \(\varepsilon >0\) is a sufficiently small constant) and query time \(t_q(n,\mu )\). Then for any \(\mu = \mu (n)\) with \(\beta (1+\mu ) = \mathcal {O}(\log _\beta n)\) we have \(t_q(n,\mu ) = \Omega (\beta \mu )\).

### Proof

We closely follow the proof of the lower bound of Theorem 1.2 (Sect. 3.2). Let \(s_i := \sum _{j=0}^{i-1} \beta ^j = (\beta ^i - 1)/(\beta - 1)\). Let *L* be maximal with \(s_L \le n\) and note that \(L = \Theta (\log _\beta n)\). We consider blocks \(B_i := \{ s_i, s_i+1, \ldots , s_i+\beta ^{i-1}-1 \}\), for \(i=1,\ldots ,L\), that partition \(\{1,\ldots ,s_L\}\).

Note that our assumptions imply \(\beta = \mathcal {O}(\log _\beta n)\), from which it follows that \(\beta = \mathcal {O}(\log n)\) and thus \(L = \Theta (\log _\beta n) = \Omega (\log n / \log \log n)\) grows with *n*. Since we can assume that *n* is sufficiently large, we thus can assume the same for *L*. By assumption we also have \(\mu = \mathcal {O}(\log _\beta n) = \mathcal {O}(L)\). If \(\mu > L\), then we introduce elements \(p_1=\ldots =p_{\lceil \mu -L\rceil } = 1\). Then on the remainder \(p_{\lceil \mu -L\rceil +1},\ldots ,p_n\) we have a probability mass \(\mu - \lceil \mu - L \rceil \), which is at most *L*, but still \(\Omega (\mu )\) (where we use that *L* is at least a sufficiently large constant). Hence, it suffices to show that sampling from the remainder takes query time \(\Omega (\beta \mu )\). Focussing on this remainder, without loss of generality we can from now on assume \(\mu \le L\).

Let *A* be an instance of ArraySearch of size *L*, say with 1-bit at position \(\ell ^*\). To construct the instance \(\mathbf {p}= \mathbf {p}^A = (p_1^A,\ldots ,p_n^A)\), for some \(0 \le \alpha \le 1\) we set for any \(\ell \in \{1,\ldots ,L\}\) and \(j \in B_\ell \) the input to \( p_j^A := \alpha \cdot \beta ^{-\ell + A[\ell ]} \), and for \(s_L < j \le n\) to \(p_j^A := 0\). As block \(B_\ell \) has size \(\beta ^\ell \), the total probability mass of \(B_\ell \) is \(\sum _{j \in B_\ell } p_j^A = \alpha \cdot \beta ^{A[\ell ]}\). Observe that \( \mu = \sum _{i=1}^n p_i^A = \alpha (L + \beta - 1)\) indeed has a solution \(0 \le \alpha \le 1\), since \(\mu \le L\). Furthermore, note that \(p_1^A,\ldots ,p_n^A\) is indeed sorted.

Assume for the sake of contradiction that there is a data structure for SortedSubsetSampling where the preprocessing reads at most \(\varepsilon \log _\beta n\) input values and the querying reads at most \(\varepsilon \beta \mu \) input values in expectation, for a sufficiently small constant \(\varepsilon > 0\).

On \(\mathbf {p}^A\) we run the preprocessing and then *K* times the query algorithm, sampling *K* sets \(X_1,\ldots ,X_K \subseteq \{1,\ldots ,n\}\). For every \(x \in \bigcup _{k=1}^K X_k\) we determine its block \(B_y\) and check whether \(A[y] = 1\). If so, we have found the 1-bit of *A*. Otherwise we linearly search for the 1-bit of *A*.

Let \(\ell ^*\) be the position of the 1-bit in *A*. The probability of not sampling any \(i \in B_{\ell ^*}\) in any of the *K* queries is \(\prod _{j \in B_{\ell ^*}} (1-p_j^A)^K \le \exp (-K \alpha \beta )\), since the total probability mass of \(B_{\ell ^*}\) is \(\alpha \cdot \beta ^{A[\ell ^*]} = \alpha \beta \). Hence, choosing \(K := \lceil \ln (1/\varepsilon ) / (\alpha \beta ) \rceil \) makes this failure probability at most \(\varepsilon \). The expected runtime on *A* of the constructed algorithm is (counting preprocessing, *K* queries, and a linear search through *A* with probability at most \(\varepsilon \)) \(\mathcal {O}(\varepsilon \log _\beta n + K \varepsilon \beta \mu + \varepsilon L)\). Since \(K \beta \mu = \mathcal {O}(\log (1/\varepsilon ) \, \mu / \alpha ) = \mathcal {O}(\log (1/\varepsilon ) (L + \beta - 1)) = \mathcal {O}(\log (1/\varepsilon ) L)\), this bounds the expected number of read positions of *A* by \(\mathcal {O}(\log (1/\varepsilon ) \varepsilon L)\), which contradicts Fact 3.1 for sufficiently small \(\varepsilon >0\). \(\square \)

### Lemma 3.4

Note that this lemma directly implies the lower bound of Theorem 1.4 for SortedSubsetSampling assuming \(\mu \le \tfrac{1}{2}\).

### Proof

Let (*P*, *Q*) be a preprocessing and a query algorithm, and let \(\mathbf {p}\) be an instance. Let \(D = P(\mathbf {p})\) be the result of the precomputation. By definition we have

Letting \(\mathcal P \subseteq [n]\) denote the set of positions read during the computation of *D*, note that \(|\mathcal P| \le t_p = t_p(n)\). Without loss of generality, we can assume that \(1,n \in \mathcal P\), i.e., that the preprocessing reads \(p_{1}\) and \(p_{n}\), as this adjustment of the algorithm does not increase its runtime asymptotically.

Let \(\mathcal Q \subseteq [n]\) be the (random) set of positions such that *Q* (with input \(\mathbf {p},D\)) reads exactly the values \(p_{i}\) with \(i \in \mathcal Q\) before returning \(\emptyset \). We clearly have

Since with constant probability *Q* on input \(\mathbf {p},D\) runs for time at most \(4 t_q\), we have

Since the number of subsets of [*n*] of size at most \(4t_q\) is \(\sum _{s=0}^{4t_q} \binom{n}{s} \le \big (e n / 4t_q \big )^{4t_q} \le n^{4t_q}/4\), there exists a set \(\mathcal Q^* \subseteq [n]\), \(|\mathcal Q^*| \le 4t_q\), with

A tedious case distinction now shows that the lower bound of Theorem 1.5 follows from the above two lemmas.

### Proof of Theorem 1.5, lower bound

Case 1, \(\mu \ge \frac{1}{2}\): We split this into three subcases as follows.

Case 1.1, \(\mu \ge \tfrac{1}{2} \log n\): As the expected output size is \(\mu \), the expected query time is always \(\Omega (\mu )\), which is tight in this case.

Case 1.2, \(\mu \ge \frac{1}{2}\) and \(\tfrac{1}{\beta }\log _\beta n \le \mu < \tfrac{1}{2} \log n\): In this case, we can choose \(2 \le \gamma \le \beta \) such that \(\mu = \Theta (\tfrac{1}{\gamma }\log _\gamma n)\). Solving for \(\gamma \) yields \(\gamma = \Theta \big ( \tfrac{\log n}{\mu } \big / \log \tfrac{\log n}{\mu }\big )\). We have \(\gamma \le 2 \gamma \mu \le \mathcal {O}(\log _\gamma n)\), so Lemma 3.3 (applied with \(\beta \) replaced by \(\gamma \)) yields a lower bound of \(\Omega (\gamma \mu ) = \Omega \big (\tfrac{\log n}{\log \frac{\log n}{\mu }}\big )\) for any data structure with preprocessing time \(\mathcal {O}(\log _\beta n) \le \mathcal {O}(\log _\gamma n)\).

Case 1.3, \(\frac{1}{2} \le \mu < \tfrac{1}{\beta }\log _\beta n\): These inequalities imply \(\beta \le 2 \beta \mu \le 2 \log _\beta n\). Thus, Lemma 3.3 applies, showing that the query time is \(\Omega (\beta \mu )\). As any algorithm takes time \(\Omega (1)\), the query time is also bounded by \(\Omega (1 + \beta \mu )\), as desired.

Case 2, \(\mu < \frac{1}{2}\): We split this into three subcases as follows.

Case 2.1, \(\tfrac{1}{\beta }\log _\beta n \le \mu < \frac{1}{2}\): Note that \(\mu \ge \tfrac{1}{\beta }\log _\beta n\) implies \(\beta ^2 \ge \beta \log \beta \ge \tfrac{1}{\mu }\log n\) so that \(\log \beta = \Omega \big (\log \tfrac{\log n}{\mu }\big )\). Hence, the preprocessing time is \(\varepsilon \log _\beta n = \mathcal {O}\big (\varepsilon \tfrac{\log n}{\log \frac{\log n}{\mu }}\big )\). For sufficiently small \(\varepsilon > 0\), Lemma 3.4 now implies \(t_q(n,\mu ) = \Omega \big ( \tfrac{\log n}{\log \frac{\log n}{\mu }}\big )\), as desired.

Case 2.2, \(\mu < \frac{1}{2}\) and \(\tfrac{1}{\beta ^3} \log n \le \mu < \tfrac{1}{\beta }\log _\beta n\): Then \(\log \beta = \Omega \big (\log \tfrac{\log n}{\mu }\big )\) and \(\log _\beta n = \mathcal {O}\big (\tfrac{\log n}{\log \frac{\log n}{\mu }}\big )\). Hence, with \(\varepsilon \log _\beta n\) preprocessing time and sufficiently small \(\varepsilon >0\), Lemma 3.4 implies that \(t_q(n,\mu ) = \Omega \big (\tfrac{\log n}{\log \frac{\log n}{\mu }}\big ) \ge \Omega (\log _\beta n) \ge \Omega (\beta \mu )\), where the last inequality follows from \(\mu < \tfrac{1}{\beta }\log _\beta n\). Since any algorithm takes time \(\Omega (1)\), this yields a lower bound of \(\Omega (1+\beta \mu )\), as desired.

## 4 Reduction from Proportional Sampling to Subset Sampling

In this section, we present a reduction from (Sorted or Unsorted) ProportionalSampling to (Sorted or Unsorted) SubsetSampling. This yields an alternative proof of the upper bounds for ProportionalSampling (Theorems 1.2 and 1.3) using the upper bounds for SubsetSampling (Theorems 1.5 and 1.6). Moreover, it shows that the classic ProportionalSampling problem is at most as hard as SubsetSampling (the former can essentially be seen as a special case of the latter).

We first present a reduction that works for \(\mu \le 1\) and yields a query time proportional to \(1/\mu \). Then we show how to ensure \(1/\beta \le \mu \le 1\) after \(\mathcal {O}(\log _\beta n)\) preprocessing, which together with the first reduction shows the main result of this section, Proposition 4.5.

### 4.1 Special Case

Let \(\mathbf {p}\) be an instance of SortedProportionalSampling or UnsortedProportionalSampling. We assume \(\mu \le 1\) and will obtain a running time proportional to \(\frac{1}{\mu }\), which is most useful when \(\mu \) lies in a small interval \([1/\beta ,1]\). Instead of \(\mathbf {p}\) we consider \(\mathbf {p}' = (p_1',\ldots ,p_n')\) with \(p_i' := {p_i}/({1+p_i})\). Note that if \(\mathbf {p}\) is sorted then \(\mathbf {p}'\) is also sorted. Moreover, \(\mu ' := \sum _{i=1}^n p_i'\) is in the range \([{\mu }/{2},\mu ]\).

Let \(Y = \textsc {ProportionalSampling}(\mathbf {p})\) be the random variable denoting proportional sampling on input \(\mathbf {p}\), and \(X = \textsc {SubsetSampling}(\mathbf {p}')\) be the random variable denoting subset sampling on input \(\mathbf {p}'\). Then conditioned on sampling exactly one element \(X = \{i\}\), this element *i* is distributed exactly as *Y*, as formulated by the following lemma.

### Lemma 4.1

### Proof

Moreover, the probability of sampling exactly one element is not too small, as shown in the following lemma. This bound is not best possible but sufficient for our purposes.

### Lemma 4.2

### Proof

*X* implies that

We put these facts together to show the following result. We need \(\mu \le 1\), and we want \(\mu \) as large as possible, since the obtained running time is proportional to \(\frac{1}{\mu }\). In the next section we will see that we can assume \(\frac{1}{\beta }\le \mu \le 1\) after \(\mathcal {O}(\log _\beta n)\) preprocessing.

### Lemma 4.3

Assume that (Sorted or Unsorted) SubsetSampling can be solved in preprocessing time \(t_p(n,\mu )\) and expected query time \(t_q(n,\mu )\), where \(t_p\) and \(t_q\) are monotonically increasing in *n* and \(\mu \). Then (Sorted or Unsorted, respectively) ProportionalSampling on instances with \(\mu \le 1\) can be solved in preprocessing time \(\mathcal {O}(t_p(n,\mu ))\) and expected query time \(\mathcal {O}(\tfrac{1}{\mu }\cdot \, t_q(n,\mu ))\).

### Proof

For preprocessing, given input \(\mathbf {p}\), we run the preprocessing of SubsetSampling on input \(\mathbf {p}'\). This does not mean that we compute the vector \(\mathbf {p}'\) beforehand; rather, if the preprocessing algorithm of SubsetSampling reads the *i*-th input value, we compute \(p_i' = {p_i}/{(1+p_i)}\) on the fly, so that the preprocessing needs runtime \(\mathcal {O}(t_p(n,\mu ))\) (recall that \(\mu ' \le \mu \)). This preprocessing allows us to sample *X* later on in expected runtime \(\mathcal {O}(t_q(n,\mu ))\), using the same trick of computing \(\mathbf {p}'\) on the fly.

For querying, we repeatedly sample *X* until we sample a set *S* of size one. Returning the unique element of *S* results in a proper sample according to ProportionalSampling by Lemma 4.1. Moreover, by Lemma 4.2 and the fact that sampling *X* needs expected time \(\mathcal {O}(t_q(n,\mu ))\) after our preprocessing, the total expected query time is \(\mathcal {O}(\tfrac{1}{\mu }\cdot t_q(n,\mu ))\). \(\square \)
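The reduction of Lemma 4.3 can be sketched as follows (a minimal Python sketch; the naive \(\mathcal {O}(n)\) subset sampler stands in for the efficient data structures, and all function names are ours):

```python
import random

def subset_sample(p_prime, rng):
    """Naive O(n) subset sampler; a stand-in for the efficient structures."""
    return [i for i, q in enumerate(p_prime) if rng.random() < q]

def proportional_sample(p, rng):
    """Sample i with probability p[i] / mu, assuming mu = sum(p) <= 1."""
    p_prime = [x / (1.0 + x) for x in p]   # p'_i = p_i / (1 + p_i), on the fly
    while True:
        s = subset_sample(p_prime, rng)
        if len(s) == 1:                    # accept only singletons (Lemma 4.1)
            return s[0]

rng = random.Random(42)
p = [0.05, 0.1, 0.15, 0.2]                 # mu = 0.5
counts = [0] * 4
for _ in range(20000):
    counts[proportional_sample(p, rng)] += 1
# empirical frequencies approach p_i / mu = (0.1, 0.2, 0.3, 0.4)
```

Conditioned on \(|X| = 1\), the probability of \(X = \{i\}\) is proportional to \(p_i'/(1-p_i') = p_i\), which is exactly the acceptance argument of Lemma 4.1.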

### 4.2 General Case

In this subsection we reduce the general case with arbitrary \(\mu \) to the special case \(1/\beta \le \mu \le 1\). In the unsorted case, we simply compute \(\mu \) exactly in time \(\mathcal {O}(n)\), which shows the following proposition. In the sorted case, we approximate \(\mu \) using an idea of Sect. 2.1, see Proposition 4.5.

### Proposition 4.4

Assume that UnsortedSubsetSampling can be solved in preprocessing time \(t_p(n,\mu )\) and expected query time \(t_q(n,\mu )\), where \(t_p\) and \(t_q\) are monotonically increasing in *n* and \(\mu \). Then UnsortedProportionalSampling can be solved in preprocessing time \(\mathcal {O}(n + t_p(n,1))\) and expected query time \(\mathcal {O}(t_q(n,1))\).

Note that plugging Theorem 1.6 into the above proposition yields the upper bound of Theorem 1.3.

### Proof

In the preprocessing we compute \(\mu \) in time \(\mathcal {O}(n)\), and set \(\tilde{p}_i := p_i/\mu \) for \( i\in [n]\). This rescaling ensures \(\tilde{\mu }= \sum _i \tilde{p}_i = 1\). Then we run the algorithm guaranteed by Lemma 4.3 on \(\tilde{p}_1, \ldots , \tilde{p}_n\). \(\square \)

### Proposition 4.5

Let \(\beta \in \{2, \dots , n\}\). Assume that SortedSubsetSampling can be solved in preprocessing time \(t_p(n,\mu )\) and expected query time \(t_q(n,\mu )\), where \(t_p\) and \(t_q\) are monotonically increasing in *n* and \(\mu \). Then SortedProportionalSampling can be solved in preprocessing time \(\mathcal {O}(\log _\beta n + t_p(n,1))\) and expected query time \(\mathcal {O}({\text {max}}_{1/\beta \le \nu \le 1} \tfrac{1}{\nu } t_q(n, \nu ))\).

Note that plugging Theorem 1.5 into the above proposition yields the upper bound of Theorem 1.2 (to see the bound on the query time, note that we can set \(t_q(n,\mu ) = \mathcal {O}(1+\beta \mu )\) by Theorem 1.5 or Lemma 2.5, so that \({\text {max}}_{1/\beta \le \nu \le 1} \tfrac{1}{\nu } t_q(n, \nu ) = \mathcal {O}( {\text {max}}_{1/\beta \le \nu \le 1} \tfrac{1}{\nu } (1 + \beta \nu ) ) = \mathcal {O}( \beta )\)).

### Proof

In the preprocessing, we approximate \(\mu \) in time \(\mathcal {O}(\log _\beta n)\) using the idea of Sect. 2.1 and rescale to obtain an instance \(\mathbf {p}'\) with \(1/\beta \le \mu ' \le 1\). As before, we do not compute \(\mathbf {p}'\) beforehand; whenever an algorithm reads the *i*-th input value, we compute \(p_i'\) on the fly. This way we need a total runtime for preprocessing of \(\mathcal {O}(\log _\beta n + t_p(n,1))\).

For querying, Lemma 4.3 allows us to query according to \(\mathbf {p}'\) in expected runtime \(\mathcal {O}(\tfrac{1}{\mu '} t_q(n,\mu ')) \le \mathcal {O}({\text {max}}_{1/\beta \le \nu \le 1} \tfrac{1}{\nu } t_q(n, \nu ))\), where we again compute values of \(\mathbf {p}'\) on the fly as needed. Since \(\mathbf {p}'\) is a rescaled version of \(\mathbf {p}\), a proportional sample with respect to \(\mathbf {p}'\) has the same distribution as a proportional sample with respect to \(\mathbf {p}\), so we simply return the sampled number. \(\square \)

## 5 Relaxations

In this section we describe some natural relaxations for the input and machine model studied so far in this paper.

*Large Deviations for the Running Times* The query runtimes in Theorems 1.2, 1.5 and 1.6 are, in fact, not only small in expectation, but they are also concentrated, i.e., they satisfy large deviation estimates in the following sense. Let *t* be the expected runtime bound and *T* the actual runtime. Then \({\text {Pr}}[T \ge k \, t] \le e^{-\Omega (k)}\) for every *k*. This is shown rather straightforwardly along the lines of our proofs of these theorems, except for the fact that the size of the random set *X* in SubsetSampling is concentrated. Note that for any \(a>1\) the Chernoff bound shows that \({\text {Pr}}[|X| \ge a \mu ] \le (e^{a-1}/a^a)^{\mu }\). This does *not* show a tail bound of \(e^{-\Omega (k)}\) for \({\text {Pr}}[|X| > k \mu ]\), and in fact such a tail bound does not hold. However, it suffices that \(|X|\) is not much larger than \(1+\mu \) to bound our algorithms’ running times, and this indeed has an exponential tail bound, since by setting \(a = k (\mu + 1)/\mu \) we obtain \({\text {Pr}}[|X| \ge k(\mu + 1)] \le e^{-\Omega (k)}\).

*Partially Sorted Input*The condition of sorted input for SortedSubsetSampling and SortedProportionalSampling can easily be relaxed, as long as we have sorted upper bounds of the probabilities. Given input \(\mathbf {p}\) and sorted \(\overline{\mathbf {p}}\) with \(p_{i} \le \overline{p}_{i}\) for all \(i \in [n]\), we simply sample according to \(\overline{\mathbf {p}}\) and use rejection to get down to the probabilities \(\mathbf {p}\). This allows for the optimal query time \(\mathcal {O}(1+\mu )\) as long as \(\overline{\mu }= \sum _{i=1}^{n} \overline{p}_{i} = \mathcal {O}(1 + \mu )\), where \(\mu = \sum _{i=1}^{n} p_{i}\).
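This rejection step can be sketched as follows (again with a naive linear sampler standing in for the efficient sorted data structure; the function name is ours):

```python
import random

def subset_sample_with_upper_bounds(p, p_bar, rng):
    """Sample w.r.t. the sorted upper bounds p_bar, then keep each sampled
    element i only with probability p[i] / p_bar[i] (rejection)."""
    sample = []
    for i, (pi, ub) in enumerate(zip(p, p_bar)):
        if ub > 0 and rng.random() < ub:      # sample according to p_bar
            if rng.random() < pi / ub:        # thin down to probability p[i]
                sample.append(i)
    return sample

rng = random.Random(7)
p = [0.3, 0.2, 0.1]
p_bar = [0.5, 0.5, 0.5]                       # sorted upper bounds on p
hits = sum(0 in subset_sample_with_upper_bounds(p, p_bar, rng)
           for _ in range(20000))
# element 0 appears with frequency close to p[0] = 0.3
```

Each element *i* survives both coin flips with probability \(\overline{p}_{i} \cdot p_{i}/\overline{p}_{i} = p_{i}\), so the output is distributed exactly according to \(\mathbf {p}\).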

*Unimodal Input* Many natural distributions \(\mathbf {p}\) are not sorted, but unimodal, meaning that \(p_{i}\) is monotonically increasing for \(1 \le i \le m\) and monotonically decreasing for \(m \le i \le n\) (or the other way round). Knowing *m*, we can run the algorithms developed in this paper on both sorted halves and combine the return values, which gives an optimal query algorithm for unimodal inputs. Alternatively, if we have strict monotonicity, we can search for *m* in time \(\mathcal {O}(\log n)\) using ternary search.

This naturally generalizes to *k*-modal inputs, where the monotonicity changes *k* times.
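The search for the extremum *m* can be sketched as follows (assuming a strictly increasing, then strictly decreasing sequence; the function name is ours):

```python
def find_mode(p):
    """Locate the maximum of a strictly unimodal sequence in O(log n) steps."""
    lo, hi = 0, len(p) - 1
    while hi - lo > 2:
        third = (hi - lo) // 3
        m1, m2 = lo + third, hi - third      # two probes, m1 < m2
        if p[m1] < p[m2]:
            lo = m1 + 1                      # the maximum cannot lie in [lo, m1]
        else:
            hi = m2 - 1                      # the maximum cannot lie in [m2, hi]
    return max(range(lo, hi + 1), key=lambda i: p[i])
```

Once *m* is known, one sorted half is \(p_1 \le \ldots \le p_m\) and the other is \(p_m \ge \ldots \ge p_n\), and each can be handled by the sorted algorithms.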

*Approximate Input* In some applications it may be costly to compute the probabilities \(p_{i}\) exactly, but we are able to compute approximations \(\overline{p}_{i}(\varepsilon ) \ge p_{i} \ge \underline{p}_{i}(\varepsilon )\), with relative error at most \(\varepsilon \), where the cost of computing these approximations depends on \(\varepsilon \). We can still guarantee optimal query time, if the costs of computing these approximations are small enough, see e.g. [12].

We sketch this for SubsetSampling. We can surely sample a superset \(\overline{S}\) with respect to the probabilities \(\overline{p}_{i}(\frac{1}{2})\). Then we want to use rejection, i.e., for each element \(i \in \overline{S}\) we want to compute a random number \(r := {\text {rand}}()\) and delete *i* from \(\overline{S}\) if \(r \cdot \overline{p}_{i}(\frac{1}{2}) > p_{i}\), to get a sample set *S*. This check can be performed as follows. We initialize \(k:=1\). If \(r \cdot \overline{p}_{i}(\frac{1}{2}) > \overline{p}_{i}(2^{-k})\) we delete *i* from \(\overline{S}\). If \(r \cdot \overline{p}_{i}(\frac{1}{2}) \le \underline{p}_{i}(2^{-k})\) we keep *i* and are done. Otherwise, we increase *k* by 1. This method needs an expected number of \(\mathcal {O}(1)\) rounds of increasing *k*; the probability of needing *k* rounds is \(\mathcal {O}(2^{-k})\). Hence, if the cost of computing \(\overline{p}_{i}(\varepsilon )\) and \(\underline{p}_{i}(\varepsilon )\) is \(\mathcal {O}(\varepsilon ^{-c})\) with \(c < 1\), the expected overall cost is constant, and we get an optimal expected query time of \(\mathcal {O}(1+\mu )\).
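The adaptive-precision check for a single element can be sketched as follows (a Python sketch; `upper` and `lower` are hypothetical oracles returning \(\overline{p}_{i}(\varepsilon )\) and \(\underline{p}_{i}(\varepsilon )\)):

```python
def keep_with_adaptive_precision(r, lower, upper):
    """Decide 'keep i' iff r * upper(1/2) <= p_i, using only approximations
    upper(eps) >= p_i >= lower(eps) of increasing precision."""
    threshold = r * upper(0.5)
    k = 1
    while True:
        eps = 2.0 ** -k
        if threshold > upper(eps):    # certainly above p_i: reject i
            return False
        if threshold <= lower(eps):   # certainly at most p_i: keep i
            return True
        k += 1                        # halve the allowed error and retry

# Hypothetical oracles for a single probability p_i = 0.3 with relative
# error at most eps:
upper = lambda eps: 0.3 * (1 + eps)
lower = lambda eps: 0.3 * (1 - eps)
```

Each extra round shrinks the undecided interval around \(p_{i}\) geometrically, which matches the stated \(\mathcal {O}(2^{-k})\) probability of needing *k* rounds.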

*Word RAM* Throughout the paper we worked in the Real RAM model of computation, where every memory cell can store a real number. In the more realistic Word RAM model each cell consists of \(w = \Omega (\log n)\) bits and any reasonable operation on two words can be performed in constant time. In addition to the standard repertoire of operations, we assume that we can generate a uniformly random word in constant time. It is known that in this model Bernoulli and geometric random variates can be drawn in constant time [2] and the classic aliasing method for UnsortedProportionalSampling still works [3]. This already allows one to translate large parts of the algorithms of this paper to the Word RAM. Unfortunately, terms like \(\prod _{1\le k \le n} (1-p_k)\) (see Sect. 2.2) cannot be evaluated exactly on the Word RAM, as the result would need at least *n* bits. This difficulty can be solved by working with \(\mathcal {O}(\log n)\) bit approximations and increasing the precision as needed, similarly to the generalization to *approximate input* that we discussed in the last paragraph. This way one can obtain a complete translation of our algorithms to the Word RAM. We omit the details.

## Footnotes

- 1.
Throughout the paper, we abbreviate \([n] = \{1, \dots , n\}\).

## Notes

### Acknowledgments

Open access funding provided by Max Planck Society (or associated institution if applicable).

### References

- 1. Borodin, A., Munro, I.: The Computational Complexity of Algebraic and Numeric Problems. Elsevier Publishing Company, London (1975)
- 2. Bringmann, K., Friedrich, T.: Exact and efficient generation of geometric random variates and random graphs. In: Proceedings of the 40th International Colloquium on Automata, Languages, and Programming (ICALP’13), pp. 267–278 (2013)
- 3. Bringmann, K., Green Larsen, K.: Succinct sampling from discrete distributions. In: Proceedings of the 45th Annual ACM Symposium on Theory of Computing (STOC’13), pp. 775–782 (2013)
- 4. Chung, F., Lu, L.: The average distance in a random graph with given expected degrees. Internet Math. **1**(1), 91–113 (2004)
- 5. Devroye, L.: Nonuniform Random Variate Generation. Springer, New York (1986)
- 6. Flajolet, P., Saheb, N.: The complexity of generating an exponentially distributed variate. J. Algorithms **7**(4), 463–488 (1986)
- 7. Hagerup, T., Mehlhorn, K., Munro, I.: Maintaining discrete probability distributions optimally. In: Proceedings of the 20th International Colloquium on Automata, Languages, and Programming (ICALP’93), pp. 253–264 (1993)
- 8. Knuth, D.E.: The Art of Computer Programming, Vol. 2: Seminumerical Algorithms, 3rd edn. Addison-Wesley Publishing Company, Boston (2009)
- 9. Knuth, D.E., Yao, A.C.: The complexity of nonuniform random number generation. In: Traub, J.F. (ed.) Algorithms and Complexity: New Directions and Recent Results, pp. 357–428. Academic Press, New York (1976)
- 10. Matias, Y., Vitter, J.S., Ni, W.-C.: Dynamic generation of discrete random variates. Theory Comput. Syst. **36**(4), 329–358 (2003)
- 11. Miller, J.C., Hagberg, A.A.: Efficient generation of networks with given expected degrees. In: Proceedings of the 8th International Workshop on Algorithms and Models for the Web Graph (WAW’11), pp. 115–126 (2011)
- 12. Nacu, Ş., Peres, Y.: Fast simulation of new coins from old. Ann. Appl. Probab. **15**(1A), 93–115 (2005)
- 13. Preparata, F.P., Shamos, M.I.: Computational Geometry. Texts and Monographs in Computer Science. Springer, New York (1985)
- 14. Pătraşcu, M.: WebDiarios de Motocicleta: Sampling a discrete distribution. http://infoweekly.blogspot.com/2011/09/sampling-discrete-distribution.html (2011)
- 15. Tsai, M.-T., Wang, D.-W., Liau, C.-J., Hsu, T.-S.: Heterogeneous subset sampling. In: Proceedings of the 16th Annual International Computing and Combinatorics Conference (COCOON’10), pp. 500–509 (2010)
- 16. Vitter, J.S.: Random sampling with a reservoir. ACM Trans. Math. Softw. **11**(1), 37–57 (1985)
- 17. Walker, A.J.: New fast method for generating discrete random numbers with arbitrary distributions. Electron. Lett. **10**, 127–128 (1974)
- 18. Yao, A.C.: Context-free grammars and random number generation. Combinatorial Algorithms on Words **12**, 357–361 (1985)

## Copyright information

**Open Access**This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.