1 Introduction

A Poisson binomial distribution of order \(n\) is the discrete probability distribution of the sum of \(n\) independent indicator random variables. The distribution is parameterized by a vector \((p_i)_{i=1}^n \in [0, 1]^n\) of probabilities, and is denoted \(\mathrm{PBD}(p_1, \ldots , p_n)\). In this paper we establish that the set \(\mathcal{S}_n\) of all Poisson Binomial distributions of order \(n\) admits certain useful covers with respect to the total variation distance \(d_{\mathrm{TV}}\left( \cdot ,\cdot \right) \) between distributions. Namely

Theorem 1

(Main Theorem) For all \(n, \epsilon >0\), there exists a set \(\mathcal{S}_{n,\epsilon } \subset \mathcal{S}_n\) such that:

  1. \(\mathcal{S}_{n,\epsilon }\) is an \(\epsilon \)-cover of \(\mathcal{S}_n\) in total variation distance; that is, for all \(D \in \mathcal{S}_n\), there exists some \(D' \in \mathcal{S}_{n,\epsilon }\) such that \(d_{\mathrm{TV}}\left( D,D'\right) \le \epsilon \);

  2. \(|\mathcal{S}_{n,\epsilon }| \le n^2 + n \cdot \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\);

  3. \(\mathcal{S}_{n,\epsilon }\) can be computed in time \(O(n^2 \log n) + O(n \log n) \cdot \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\).

Moreover, all distributions \(\mathrm{PBD}(p_1,\ldots ,p_n) \in \mathcal{S}_{n,\epsilon }\) in the cover satisfy at least one of the following properties, for some positive integer \(k=k(\epsilon ) = O(1/\epsilon )\!:\)

  • (\(k\)-sparse form) there is some \( \ell \le k^3\) such that, for all \(i \le \ell , p_i \in \left\{ {1 \over k^2}, {2\over k^2},\ldots , {k^2-1 \over k^2 }\right\} \) and, for all \(i >\ell , p_i \in \{0, 1\}\); or

  • (\((n,k)\)-binomial form) there is some \(\ell \in \{1,\dots ,n\}\) and \(q \in \left\{ {1 \over n}, {2 \over n},\ldots , {n \over n} \right\} \) such that, for all \(i \le \ell , p_i = q\) and, for all \(i >\ell , p_i = 0\); moreover, \(\ell \) and \(q\) satisfy \(\ell q \ge k^2\) and \(\ell q(1-q) \ge k^2- k-1\).

Covers such as the one provided by Theorem 1 are of interest in the design of algorithms, when one is searching a class of distributions \(C\) to identify an element of the class with some quantitative property, or in optimizing over a class with respect to some objective. If the metric used in the construction of the cover is relevant for the problem at hand, and the cover is discrete, relatively small and easy to construct, then one can provide a useful approximation to the sought distribution by searching the cover, instead of searching all of \(C\). For example, it is shown in [12–14] that Theorem 1 implies efficient algorithms for computing approximate Nash equilibria in an important class of multiplayer games, called anonymous [5, 20].

We proceed with a fairly detailed sketch of the proof of our main cover theorem, Theorem 1, stating two additional results, Theorems 2 and 3. The complete proofs of Theorems 1, 2 and 3 are deferred to Sects. 3, 4 and 5 respectively. Section 1.4 discusses related work, while Sect. 2 provides formal definitions, as well as known approximations to the Poisson Binomial distribution by simpler distributions, which are used in the proof.

1.1 Proof outline and additional results

At a high level, the proof of Theorem 1 is obtained in two steps. First, we establish the existence of an \(\epsilon \)-cover whose size is polynomial in \(n\) and \((1/\epsilon )^{1/\epsilon ^2}\), via Theorem 2. We then show that this cover can be pruned to size polynomial in \(n\) and \((1/\epsilon )^{\log ^2(1/\epsilon )}\) using Theorem 3, which provides a quantification of how the total variation distance between Poisson Binomial distributions depends on the number of their first moments that are equal.

We proceed to state the two ingredients of the proof, Theorems 2 and 3. We start with Theorem 2 whose detailed sketch is given in Sect. 1.2, and complete proof in Sect. 4.

Theorem 2

Let \(X_1,\ldots ,X_n\) be arbitrary mutually independent indicators, and \(k \in \mathbb {N}\). Then there exist mutually independent indicators \(Y_1,\ldots ,Y_n\) satisfying the following:

  1. \(d_{\mathrm{TV}}\left( \sum _{i}{X_i},\sum _{i}{Y_i}\right) \le 41/k ;\)

  2. at least one of the following is true:

     (a) (\(k\)-sparse form) there exists some \( \ell \le k^3\) such that, for all \(i \le \ell , \mathbb {E}[{Y_i}] \in \left\{ {1 \over k^2}, {2\over k^2},\ldots , {k^2-1 \over k^2 }\right\} \) and, for all \(i >\ell , \mathbb {E}[{Y_i}] \in \{0, 1\};\) or

     (b) (\((n,k)\)-Binomial form) there is some \(\ell \in \{1,\dots ,n\}\) and \(q \in \left\{ {1 \over n}, {2 \over n},\ldots , {n \over n} \right\} \) such that, for all \(i \le \ell , \mathbb {E}[{Y_i}] = q\) and, for all \(i >\ell , \mathbb {E}[{Y_i}] = 0;\) moreover, \(\ell \) and \(q\) satisfy \(\ell q \ge k^2\) and \(\ell q(1-q) \ge k^2- k-1.\)

Theorem 2 implies the existence of an \(\epsilon \)-cover of \(\mathcal{S}_n\) whose size is \(n^2 + n \cdot \left( {1 / \epsilon }\right) ^{O({1/\epsilon ^2})}\). This cover can be obtained by enumerating over all Poisson Binomial distributions of order \(n\) that are in \(k\)-sparse or \((n,k)\)-Binomial form as defined in the statement of the theorem, for \(k=\lceil 41/ \epsilon \rceil \).

The next step is to sparsify this cover by removing elements to obtain Theorem 1. Note that the term \(n \cdot \left( {1 / \epsilon }\right) ^{O({1/\epsilon ^2})}\) in the size of the cover is due to the enumeration over distributions in sparse form. Using Theorem 3 below, we argue that there is a lot of redundancy in those distributions, and that it suffices to only include \(n\cdot \left( {1 / \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\) of them in the cover. In particular, Theorem 3 establishes that, if two Poisson Binomial distributions have their first \(O(\log 1/\epsilon )\) moments equal, then their distance is at most \(\epsilon \). So we only need to include at most one sparse form distribution with the same first \(O(\log 1/\epsilon )\) moments in our cover. We proceed to state Theorem 3, postponing its proof to Sect. 5. In Sect. 1.3 we provide a sketch of the proof.

Theorem 3

Let \(\mathcal {P}:=(p_i )_{i=1}^n \in [0,1/2]^n\) and \(\mathcal {Q}:=(q_i)_{i=1}^n \in [0,1/2]^n\) be two collections of probability values. Let also \(\mathcal {X}:=(X_i)_{i=1}^n\) and \(\mathcal {Y}:=(Y_i)_{i=1}^n\) be two collections of mutually independent indicators with \(\mathbb {E}[X_i]=p_i\) and \(\mathbb {E}[Y_i]=q_i\), for all \(i \in [n]\). If for some \(d \in [n]\) the following condition is satisfied:

$$\begin{aligned} (C_d)&:&\quad \sum _{i=1}^n p_i^{\ell } = \sum _{i=1}^n q_i^{\ell },\quad \text {for all } \ell =1,\ldots ,d, \nonumber \\&\text {then}\quad d_{\mathrm{TV}}\left( \sum _{i}{X_i},\sum _{i}{Y_i}\right) \le 13(d+1)^{1/4} 2^{-(d+1)/2}. \end{aligned}$$
(1)

Remark 1

Condition \((C_d)\) in the statement of Theorem 3 constrains the first \(d\) power sums of the expectations of the constituent indicators of two Poisson Binomial distributions. To relate these power sums to the moments of these distributions we can use the theory of symmetric polynomials to arrive at the following equivalent condition to \((C_d)\):

$$\begin{aligned} (V_d):\quad \mathbb {E}\left[ \left( \sum _{i=1}^n X_i\right) ^{\ell }\right] = \mathbb {E}\left[ \left( \sum _{i=1}^n Y_i\right) ^{\ell }\right] ,\quad ~\text {for all } \ell \in [d]. \end{aligned}$$

We provide a proof that \((C_d) \Leftrightarrow (V_d)\) in Proposition 2 of Sect. 6.

Remark 2

In view of Remark 1, Theorem 3 says the following:

“If two sums of independent indicators with expectations in [0,1/2] have equal first \(d\) moments, then their total variation distance is \(2^{-\Omega (d)}\).”

We note that the bound (1) does not depend on the number of variables \(n\), and in particular does not rely on summing a large number of variables. We also note that, since we impose no constraint on the expectations of the indicators, we also impose no constraint on the variance of the resulting Poisson Binomial distributions. Hence we cannot use Berry-Esséen type bounds to bound the total variation distance of the two Poisson Binomial distributions by approximating them with Normal distributions. Finally, it is easy to see that Theorem 3 holds if we replace \([0,1/2]\) with \([1/2,1]\). See Corollary 1 in Sect. 6.
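To get a feel for the quantitative content of Theorem 3, the following minimal Python sketch (the function name is ours) computes the smallest \(d\) for which the right-hand side of (1) drops below a target \(\epsilon \); the output grows like \(\log 1/\epsilon \), which is exactly the choice \(d(\epsilon ) = O(\log 1/\epsilon )\) used in the pruning argument of Sect. 3.

```python
import math

def smallest_d(eps):
    """Smallest d with 13 * (d+1)**(1/4) * 2**(-(d+1)/2) <= eps, as in bound (1)."""
    d = 0
    while 13 * (d + 1) ** 0.25 * 2 ** (-(d + 1) / 2) > eps:
        d += 1
    return d

# The number of moments that need to match grows only logarithmically in 1/eps.
for eps in (0.1, 0.01, 0.001):
    print(eps, smallest_d(eps), round(math.log2(1 / eps), 2))
```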

In Sect. 3 we show how to use Theorems 2 and 3 to obtain Theorem 1. We continue with the outlines of the proofs of Theorems 2 and 3, postponing their complete proofs to Sects. 4 and 5.

1.2 Outline of proof of Theorem 2

Given arbitrary indicators \(X_1,\ldots ,X_n\) we obtain indicators \(Y_1,\ldots ,Y_n\), satisfying the requirements of Theorem 2, in two steps. We first massage the given variables \(X_1,\ldots ,X_n\) to obtain variables \(Z_1,\ldots ,Z_n\) such that

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i}{X_i},\sum _{i}{Z_i}\right) \le 7/k; \nonumber \\&\quad \text {and}\quad \mathbb {E}[Z_i] \notin \left( 0,\frac{1}{k}\right) \cup \left( 1-\frac{1}{k},1\right) ; \end{aligned}$$
(2)

that is, we eliminate from our collection variables whose expectations are very close to \(0\) or \(1\), without moving far, in total variation distance, from the starting Poisson Binomial distribution.

The variables \(Z_1,\ldots ,Z_n\) do not necessarily satisfy Property 2(a) or 2(b) in the statement of Theorem 2, but they allow us to define variables \(Y_1,\ldots ,Y_n\) which do satisfy one of these properties and, moreover,

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i}{Z_i},\sum _{i}{Y_i}\right) \le 34/k. \end{aligned}$$
(3)

(2), (3) and the triangle inequality imply \(d_{\mathrm{TV}}\left( \sum _{i}{X_i},\sum _{i}{Y_i}\right) \le {41 \over k}\), concluding the proof of Theorem 2.

Let us call Stage 1 the process of determining the \(Z_i\)’s and Stage 2 the process of determining the \(Y_i\)’s. The two stages are described briefly below, and in detail in Sects. 4.1 and 4.2 respectively. For convenience, we use the following notation: for \(i=1,\ldots ,n, p_i=\mathbb {E}[X_i]\) will denote the expectation of the given indicator \(X_i, p_i' = \mathbb {E}[Z_i]\) the expectation of the intermediate indicator \(Z_i\), and \(q_i = \mathbb {E}[Y_i]\) the expectation of the final indicator \(Y_i\).

Stage 1: Recall that our goal in this stage is to define a Poisson Binomial distribution \(\sum _i Z_i\) whose constituent indicators have no expectation in \(\mathcal{T}_k:=(0,\frac{1}{k})\cup (1-\frac{1}{k},1)\). The expectations \((p_i'=\mathbb {E}[Z_i])_i\) are defined in terms of the corresponding \((p_i)_i\) as follows. For all \(i\), if \(p_i \notin \mathcal{T}_k\) we set \(p_i' = p_i\). Then, if \(\mathcal{L}_k\) is the set of indices \(i\) such that \(p_i \in (0,1/k)\), we choose any collection \((p_i')_{i \in \mathcal {L}_k}\) so as to satisfy \(|\sum _{i \in \mathcal{L}_k} p_i- \sum _{i \in \mathcal{L}_k} p_i'| \le 1/k\) and \(p_i' \in \{0,1/k\}\), for all \(i \in \mathcal {L}_k\). That is, we round all indicators’ expectations to \(0\) or \(1/k\) while preserving the expectation of their sum, to within \(1/k\). Using the Poisson approximation to the Poisson Binomial distribution, given as Theorem 4 in Sect. 2.1, we can argue that \(\sum _{i \in \mathcal{L}_k} X_i\) is within \(1/k\) of a Poisson distribution with the same mean. By the same token, \(\sum _{i \in \mathcal{L}_k} Z_i\) is \(1/k\)-close to a Poisson distribution with the same mean. And the two resulting Poisson distributions have means that are within \(1/k\), and are therefore \(1.5/k\)-close to each other (see Lemma 3). Hence, by triangle inequality \(\sum _{i \in \mathcal{L}_k} X_i\) is \(3.5/k\)-close to \(\sum _{i \in \mathcal{L}_k} Z_i\). A similar construction is used to define the \(p_i'\)’s corresponding to the \(p_i\)’s lying in \((1-1/k,1)\). The details of this step can be found in Sect. 4.1.
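The rounding used for the indices in \(\mathcal{L}_k\) is simple enough to state as code. The sketch below (function and variable names are ours) rounds every expectation lying in \((0,1/k)\) to \(0\) or \(1/k\) while preserving the sum to within \(1/k\), mirroring the procedure of Sect. 4.1; the indices in \((1-1/k,1)\) are treated symmetrically.

```python
import math

def round_small_probs(ps, k):
    """Round probabilities lying in (0, 1/k) to {0, 1/k}, preserving their sum to within 1/k.

    ps: the p_i with p_i in (0, 1/k); returns the corresponding p_i'."""
    r = math.floor(sum(ps) * k)            # how many indicators get expectation 1/k
    return [1.0 / k] * r + [0.0] * (len(ps) - r)

# Example: the rounded sum differs from the original sum by less than 1/k.
k = 10
ps = [0.03, 0.07, 0.02, 0.09, 0.05]
qs = round_small_probs(ps, k)
assert abs(sum(ps) - sum(qs)) < 1.0 / k
```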

Stage 2: The definition of \((q_i)_i\) depends on the number \(m\) of \(p_i'\)’s which are not \(0\) or \(1\). The case \(m \le k^3\) corresponds to Case 2a in the statement of Theorem 2, while the case \(m > k^3\) corresponds to Case 2b.

  • Case \(m \le k^3\): First, we set \(q_i = p_i'\), if \(p_i' \in \{0,1\}\). We then argue that each \(p'_i, {i \in \mathcal {M}}:=\{i~\vert ~p_i' \notin \{0,1\} \}\), can be rounded to some \(q_i\), which is an integer multiple of \(1/k^2\), so that (3) holds. Notice that, if we were allowed to use multiples of \(1/k^4\), this would be immediate via an application of Lemma 2:

    $$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}{Z_i},\sum _{i}{Y_i}\right) \le \sum _{i \in \mathcal {M}} |p_i' - q_i|. \end{aligned}$$

    We improve the required accuracy to \(1/k^2\) via a series of Binomial approximations to the Poisson Binomial distribution, using Ehm’s bound [15] stated as Theorem 5 in Sect. 2.1. The details involve partitioning the interval \([1/k,1-1/k]\) into irregularly sized subintervals, whose endpoints are integer multiples of \(1/k^2\). We then round all but one of the \(p_i'\)’s falling in each subinterval to the endpoints of the subinterval so as to maintain their total expectation, and apply Ehm’s approximation to argue that the distribution of their sum is not affected by more than \(O(1/k^2)\) in total variation distance. It is crucial that the total number of subintervals is \(O(k)\) to get a total hit of at most \(O(1/k)\) in variation distance in the overall distribution. The details are given in Sect. 4.2.1.

  • Case \(m > k^3\): We approximate \(\sum _{i}Z_i\) with a Translated Poisson distribution (defined formally in Sect. 2), using Theorem 6 of Sect. 2.1 due to Röllin [23]. The quality of the approximation is inversely proportional to the standard deviation of \(\sum _i Z_i\), which is at least \(k\), by the assumption \(m>k^3\). Hence, we show that \(\sum _{i}Z_i\) is \(3/k\)-close to a Translated Poisson distribution. We then argue that the latter is \(6/k\)-close to a Binomial distribution \(B(m',q)\), where \(m' \le n\) and \(q\) is an integer multiple of \(\frac{1}{n}\). In particular, we show that an appropriate choice of \(m'\) and \(q\) implies (3), if we set \(m'\) of the \(q_i\)'s equal to \(q\) and the remaining equal to \(0\). The details are in Sect. 4.2.2.

1.3 Outline of proof of Theorem 3

Using Roos’s expansion [24], given as Theorem 7 of Sect. 2.1, we express \(\mathrm{PBD}(p_1,\ldots ,p_n)\) as a weighted sum of the Binomial distribution \({\mathcal {B}}(n,p)\) at \(p = \bar{p}= {\sum p_i / n}\) and its first \(n\) derivatives with respect to \(p\) also at value \(p=\bar{p}\). (These derivatives correspond to finite signed measures.) We notice that the coefficients of the first \(d+1\) terms of this expansion are symmetric polynomials in \(p_1,\ldots ,p_n\) of degree at most \(d\). Hence, from the theory of symmetric polynomials, each of these coefficients can be written as a function of the power-sum symmetric polynomials \(\sum _i p_i^{\ell }\) for \(\ell =1,\ldots ,d\). So, whenever two Poisson Binomial distributions satisfy Condition \((C_d)\), the first \(d+1\) terms of their expansions are exactly identical, and the total variation distance of the distributions depends only on the other terms of the expansion (those corresponding to higher derivatives of the Binomial distribution). The proof is concluded by showing that the joint contribution of these terms to the total variation distance can be bounded by \(2^{-\Omega (d)}\), using Proposition 1 of Sect. 2.1, which is also due to Roos [24]. The details are provided in Sect. 5.

1.4 Related work

It is believed that Poisson [22] was the first to study the Poisson Binomial distribution, hence its name. Sometimes the distribution is also referred to as “Poisson’s Binomial Distribution.” PBDs have many uses in research areas such as survey sampling, case-control studies, and survival analysis; see e.g. [8] for a survey of their uses. They are also very important in the design of randomized algorithms [21].

In Probability and Statistics there is a broad literature studying various properties of these distributions; see [28] for an introduction to some of this work. Many results provide approximations to the Poisson Binomial distribution via simpler distributions. In a well-known result, Le Cam [18] shows that, for any vector \((p_i)_{i=1}^n \in [0, 1]^n\),

$$\begin{aligned} d_\mathrm{TV} \left( \mathrm{PBD}(p_1, \ldots , p_n),\mathrm{Poisson} \left( \sum _{i=1}^n p_i \right) \right) \le \sum _{i=1}^n p_i^2, \end{aligned}$$

where \(\mathrm{Poisson} (\lambda )\) is the Poisson distribution with parameter \(\lambda \). Subsequently many other proofs of this bound and improved ones, such as Theorem 4 of Sect. 2.1, were given, using a range of different techniques; [2, 7, 11, 17] is a sampling of work along these lines, and Steele [26] gives an extensive list of relevant references. Much work has also been done on approximating PBDs by Normal distributions (see e.g. [1, 6, 16, 19, 27]) and by Binomial distributions; see e.g. Ehm’s result [15], given as Theorem 5 of Sect. 2.1, as well as Soon’s result [25] and Roos’s result [24], given as Theorem 7 of Sect. 2.1.

These results provide structural information about PBDs that can be well approximated by simpler distributions, but fall short of our goal of approximating a PBD to within arbitrary accuracy. Indeed, the approximations obtained in the probability literature (such as the Poisson, Normal and Binomial approximations) typically depend on the first few moments of the PBD being approximated, while higher moments are crucial for arbitrary approximation [24]. At the same time, algorithmic applications often require that the approximating distribution is of the same kind as the distribution that is being approximated. E.g., in the anonymous game application mentioned earlier, the parameters of the given PBD correspond to mixed strategies of players at Nash equilibrium, and the parameters of the approximating PBD correspond to mixed strategies at approximate Nash equilibrium. Approximating the given PBD via a Poisson or a Normal distribution would not have any meaning in the context of a game.

As outlined above, the proof of our main result, Theorem 1, builds on Theorems 2 and 3. A weaker form of these theorems was announced in [9, 13], while a weaker form of Theorem 1 was announced in [10].

2 Preliminaries

For a positive integer \(\ell \), we denote by \([\ell ]\) the set \(\{1,\dots ,\ell \}\). For a random variable \(X\), we denote by \(\mathcal{L}(X)\) its distribution. We further need the following definitions.

Total variation distance: For two distributions \(\mathbb {P}\) and \(\mathbb {Q}\) supported on a finite set \({A}\) their total variation distance is defined as

$$\begin{aligned} d_{\mathrm{TV}}\left( \mathbb {P},\mathbb {Q}\right) := \frac{1}{2} \sum _{\alpha \in A}{\left| \mathbb {P}(\alpha )-\mathbb {Q}(\alpha )\right| }. \end{aligned}$$

An equivalent way to define \(d_{\mathrm{TV}}\left( \mathbb {P},\mathbb {Q}\right) \) is to view \(\mathbb {P}\) and \(\mathbb {Q}\) as vectors in \(\mathbb {R}^{A}\), and define \(d_{\mathrm{TV}}\left( \mathbb {P},\mathbb {Q}\right) = {1 \over 2}\Vert \mathbb {P} - \mathbb {Q}\Vert _1\) to equal half of their \(\ell _1\) distance. If \(X\) and \(Y\) are random variables ranging over a finite set, their total variation distance, denoted \(d_{\mathrm{TV}}\left( X,Y\right) ,\) is defined to equal \(d_{\mathrm{TV}}\left( \mathcal{L}(X),\mathcal{L}(Y)\right) \).

Covers: Let \(\mathcal{F}\) be a set of probability distributions. A subset \(\mathcal{G} \subseteq \mathcal{F}\) is called a (proper) \(\epsilon \)-cover of \(\mathcal{F}\) in total variation distance if, for all \(D \in \mathcal{F}\), there exists some \(D' \in \mathcal{G}\) such that \(d_{\mathrm{TV}}\left( D,D'\right) \le \epsilon \).

Poisson binomial distribution: A Poisson binomial distribution of order \(n \in \mathbb {N}\) is the discrete probability distribution of the sum \(\sum _{i=1}^n X_i\) of \(n\) mutually independent Bernoulli random variables \(X_1,\ldots ,X_n\). We denote the set of all Poisson Binomial distributions of order \(n\) by \(\mathcal{S}_{n}\).

By definition, a Poisson Binomial distribution \(D \in \mathcal{S}_n\) can be represented by a vector \((p_i)_{i=1}^n \in [0,1]^n\) of probabilities as follows. We map \(D \in \mathcal{S}_n\) to a vector of probabilities by finding a collection \(X_1,\ldots ,X_n\) of mutually independent indicators such that \(\sum _{i=1}^n X_i\) is distributed according to \(D\), and setting \(p_i = \mathbb {E}[X_i]\) for all \(i\). The following lemma implies that the resulting vector of probabilities is unique up to a permutation, so that there is a one-to-one correspondence between Poisson Binomial distributions and vectors \((p_i)_{i=1}^n \in [0,1]^n\) such that \(0\le p_1 \le p_2 \le \cdots \le p_n \le 1\). The proof of this lemma can be found in Sect. 6.

Lemma 1

Let \(X_1,\ldots ,X_n\) be mutually independent indicators with expectations \(p_1 \le p_2 \le \cdots \le p_n\) respectively. Similarly let \(Y_1,\ldots ,Y_n\) be mutually independent indicators with expectations \(q_1 \le \cdots \le q_n\) respectively. The distributions of \(\sum _i X_i\) and \(\sum _i Y_i\) are different if and only if \((p_1,\ldots ,p_n) \ne (q_1,\ldots ,q_n)\).

We will be denoting a Poisson Binomial distribution \(D \in \mathcal{S}_n\) by \(\mathrm{PBD}(p_1,\ldots ,p_n)\) when it is the distribution of the sum \(\sum _{i=1}^n X_i\) of mutually independent indicators \(X_1,\ldots ,X_n\) with expectations \(p_i=\mathbb {E}[X_i]\), for all \(i\). Given the above discussion, the representation is unique up to a permutation of the \(p_i\)’s.
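For concreteness, a Poisson Binomial distribution can be handled explicitly through its probability mass function, computed from \((p_i)_{i=1}^n\) by iterated convolution; the sketch below is our own illustration (it is not used in the constructions of this paper) and also evaluates the total variation distance between two such distributions, as defined above.

```python
def pbd_pmf(ps):
    """Probability mass function of PBD(p_1,...,p_n) on {0,...,n}, by iterated convolution."""
    pmf = [1.0]
    for p in ps:
        new = [0.0] * (len(pmf) + 1)
        for j, mass in enumerate(pmf):
            new[j] += mass * (1 - p)      # the next indicator equals 0
            new[j + 1] += mass * p        # the next indicator equals 1
        pmf = new
    return pmf

def tv(pmf1, pmf2):
    """Total variation distance: half the l1 distance between the two pmfs."""
    m = max(len(pmf1), len(pmf2))
    a = pmf1 + [0.0] * (m - len(pmf1))
    b = pmf2 + [0.0] * (m - len(pmf2))
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

# Permuting the p_i's leaves the distribution unchanged (cf. Lemma 1).
print(round(tv(pbd_pmf([0.1, 0.5, 0.9]), pbd_pmf([0.9, 0.1, 0.5])), 12))  # 0.0
```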

Translated Poisson distribution: We say that an integer random variable \(Y\) has a translated Poisson distribution with parameters \(\mu \) and \(\sigma ^2\) and write \(\mathcal {L}(Y)=TP(\mu ,\sigma ^2)\) iff

$$\begin{aligned} \mathcal {L}(Y - \lfloor \mu -\sigma ^2\rfloor ) = \mathrm{Poisson}(\sigma ^2+ \{\mu -\sigma ^2\}), \end{aligned}$$

where \(\{\mu -\sigma ^2\}\) represents the fractional part of \(\mu -\sigma ^2\).
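The definition can be used directly to evaluate a translated Poisson distribution: shift a \(\mathrm{Poisson}(\sigma ^2+\{\mu -\sigma ^2\})\) variable by \(\lfloor \mu -\sigma ^2\rfloor \). A minimal sketch (the helper name is ours):

```python
import math

def tp_pmf(y, mu, sigma2):
    """P[Y = y] for Y ~ TP(mu, sigma2), i.e. Y - floor(mu - sigma2) ~ Poisson(sigma2 + frac(mu - sigma2))."""
    shift = math.floor(mu - sigma2)
    lam = sigma2 + (mu - sigma2 - shift)   # Poisson parameter sigma^2 + {mu - sigma^2}
    j = y - shift
    if j < 0:
        return 0.0
    return math.exp(j * math.log(lam) - lam - math.lgamma(j + 1))

# The mean works out to exactly mu; the variance is sigma2 + {mu - sigma2}, which lies in [sigma2, sigma2 + 1).
mu, sigma2 = 12.3, 4.7
print(round(sum(y * tp_pmf(y, mu, sigma2) for y in range(200)), 6))  # 12.3
```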

Order notation: Let \(f(x)\) and \(g(x)\) be two positive functions defined on some infinite subset of \(\mathbb {R}_+\). One writes \(f(x)=O(g(x))\) if and only if, for sufficiently large values of \(x, f(x)\) is at most a constant times \(g(x)\). That is, \(f(x) = O(g(x))\) if and only if there exist positive real numbers \(M\) and \(x_0\) such that

$$\begin{aligned} f(x) \le \; M g(x), \text{ for } \text{ all } x>x_0. \end{aligned}$$

Similarly, we write \(f(x) = \Omega (g(x))\) if and only if there exist positive reals \(M\) and \(x_0\) such that

$$\begin{aligned} f(x) \ge \; M g(x), \text{ for } \text{ all } x>x_0. \end{aligned}$$

We are casual in our use of the order notation \(O(\cdot )\) and \(\Omega (\cdot )\) throughout the paper. Whenever we write \(O(f(n))\) or \(\Omega (f(n))\) in some bound where \(n\) ranges over the integers, we mean that there exists a constant \(c >0 \) such that the bound holds true for sufficiently large \(n\) if we replace the \(O(f(n))\) or \(\Omega (f(n))\) in the bound by \(c \cdot f(n)\). On the other hand, whenever we write \(O(f(1/\epsilon ))\) or \(\Omega (f(1/\epsilon ))\) in some bound where \(\epsilon \) ranges over the positive reals, we mean that there exists a constant \(c>0\) such that the bound holds true for sufficiently small \(\epsilon \) if we replace the \(O(f(1/\epsilon ))\) or \(\Omega (f(1/\epsilon ))\) in the bound with \(c \cdot f(1/\epsilon )\).

We conclude with an easy but useful lemma whose proof we defer to Sect. 6.

Lemma 2

Let \(X_1,\ldots ,X_n\) be mutually independent random variables, and let \(Y_1,\ldots ,Y_n\) be mutually independent random variables. Then

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i=1}^n X_i,\sum _{i=1}^n Y_i\right) \le \sum _{i=1}^nd_{\mathrm{TV}}\left( X_i,Y_i\right) . \end{aligned}$$

2.1 Approximations to the Poisson binomial distribution

We present a collection of known approximations to the Poisson Binomial distribution via simpler distributions. The quality of these approximations can be quantified in terms of the first few moments of the Poisson Binomial distribution that is being approximated. We will make use of these bounds to approximate Poisson Binomial distributions in different regimes of their moments. Theorems 4–6 are obtained via the Stein-Chen method.

Theorem 4

(Poisson approximation [2, 3]) Let \(J_1,\ldots ,J_n\) be mutually independent indicators with \(\mathbb {E}[J_i]=t_i\). Then

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i=1}^nJ_i,\mathrm{Poisson}\left( \sum _{i=1}^nt_i\right) \right) \le \frac{\sum _{i=1}^nt_i^2}{\sum _{i=1}^nt_i}. \end{aligned}$$

Theorem 5

(Binomial Approximation [15]) Let \(J_1,\ldots ,J_n\) be mutually independent indicators with \(\mathbb {E}[J_i]=t_i\), and \(\bar{t} = {\sum _i t_i \over n}\). Then

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i=1}^nJ_i,{\mathcal {B}}\left( n,\bar{t}\right) \right) \le \frac{\sum _{i=1}^n (t_i-\bar{t})^2}{(n+1) \bar{t}(1-\bar{t})}, \end{aligned}$$

where \({\mathcal {B}}\left( n,\bar{t}\right) \) is the Binomial distribution with parameters \(n\) and \(\bar{t}\).

Theorem 6

(Translated Poisson Approximation [23]) Let \(J_1,\ldots ,J_n\) be mutually independent indicators with \(\mathbb {E}[J_i]=t_i\). Then

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i=1}^nJ_i,TP(\mu , \sigma ^2)\right) \le \frac{\sqrt{\sum _{i=1}^nt_i^3(1-t_i)}+2}{\sum _{i=1}^nt_i(1-t_i)}, \end{aligned}$$

where \(\mu =\sum _{i=1}^nt_i\) and \(\sigma ^2 = \sum _{i=1}^nt_i(1-t_i)\).
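To see which of Theorems 4–6 is useful in which regime, the following sketch simply evaluates the three error bounds for a given vector of expectations (it computes only the right-hand sides of the bounds; the function names are ours). The Poisson bound is good when the \(t_i\) are small, Ehm's Binomial bound when the \(t_i\) are close to one another, and the translated Poisson bound when the variance \(\sum _i t_i(1-t_i)\) is large.

```python
import math

def poisson_bound(ts):                 # right-hand side of Theorem 4
    return sum(t * t for t in ts) / sum(ts)

def binomial_bound(ts):                # right-hand side of Theorem 5
    n = len(ts)
    tbar = sum(ts) / n
    return sum((t - tbar) ** 2 for t in ts) / ((n + 1) * tbar * (1 - tbar))

def translated_poisson_bound(ts):      # right-hand side of Theorem 6
    var = sum(t * (1 - t) for t in ts)
    return (math.sqrt(sum(t ** 3 * (1 - t) for t in ts)) + 2) / var

ts = [0.4 + 0.001 * i for i in range(100)]   # 100 indicators with similar, not-too-small expectations
print(poisson_bound(ts), binomial_bound(ts), translated_poisson_bound(ts))
```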

The approximation theorems stated above do not always provide tight enough approximations. When these fail, we employ the following theorem of Roos [24], which provides an expansion of the Poisson Binomial distribution as a weighted sum of a finite number of signed measures: the Binomial distribution \(\mathcal {B}({n,p})\) (for an arbitrary value of \(p\)) and its first \(n\) derivatives with respect to the parameter \(p\), at the chosen value of \(p\). For the purposes of the following statement we denote by \(\mathcal {B}_{n,p}(m)\) the probability assigned by the Binomial distribution \({\mathcal {B}}(n,p)\) to integer \(m\).

Theorem 7

([24]) Let \(\mathcal {P}:=(p_i )_{i=1}^n \in [0,1]^n, X_1,\ldots ,X_n\) be mutually independent indicators with expectations \(p_1,\ldots ,p_n\), and \(X=\sum _i X_i\). Then, for all \(m \in \{0,\ldots ,n\}\) and \(p \in [0,1]\),

$$\begin{aligned} Pr[X = m] = \sum _{\ell = 0}^n \alpha _{\ell }(\mathcal {P}, p)\cdot \delta ^{\ell }\mathcal {B}_{n,p}(m), \end{aligned}$$
(4)

where for the purposes of the above expression:

  • \(\alpha _0(\mathcal {P},p):=1\) and for \(\ell \in [n]\!:\)

    $$\begin{aligned} \alpha _{\ell }(\mathcal {P},p):= \sum _{1 \le k(1) < \cdots < k(\ell ) \le n} \prod _{r=1}^{\ell }(p_{k(r)}-p); \end{aligned}$$
  • and for all \(\ell \in \{0,\ldots ,n\}\!:\)

    $$\begin{aligned} \delta ^{\ell }\mathcal {B}_{n,p}(m):=\frac{(n-\ell )!}{n!} \frac{d^{\ell }}{d p^{\ell }}\mathcal {B}_{n,p}(m), \end{aligned}$$

    where for the last definition we interpret \(\mathcal {B}_{n,p}(m)\equiv \binom{n}{m} p^m (1-p)^{n-m}\) as a function of \(p\).

We can use Theorem 7 to get tighter approximations to the Poisson Binomial distribution by appropriately tuning the number of terms of summation (4) that we keep. The following proposition, shown in the proof of Theorem 2 of [24], bounds the \(\ell _1\) approximation error to the Poisson Binomial distribution when only the first \(d+1\) terms of summation (4) are kept. The error decays exponentially in \(d\) as long as the quantity \(\theta ({\mathcal {P}},p)\) in the proposition statement is smaller than \(1\).

Proposition 1

([24]) Let \(\mathcal {P}=(p_i )_{i=1}^n \in [0,1]^n, p \in [0,1], \alpha _{\ell }(\cdot , \cdot )\) and \(\delta ^{\ell }\mathcal {B}_{n,p}(\cdot )\) as in the statement of Theorem 7, and take

$$\begin{aligned} \theta ({\mathcal {P}},p)= \frac{2 \sum _{i=1}^n(p_i - p)^2 + (\sum _{i=1}^n(p_i - p))^2}{2np(1-p)}. \end{aligned}$$

If \(\theta ({\mathcal {P}},p)< 1,\) then, for all \(d \ge 0\):

$$\begin{aligned}&\sum _{\ell = d+1}^n |\alpha _{\ell }(\mathcal {P}, p)|\cdot \Vert \delta ^{\ell }\mathcal {B}_{n,p}(\cdot )\Vert _1 \le {\sqrt{e}(d+1)^{1/4}} \theta ({\mathcal {P}},p)^{(d+1)/2} \frac{1- \frac{d}{d+1}\sqrt{\theta ({\mathcal {P}},p)}}{(1-\sqrt{\theta ({\mathcal {P}},p)})^2}, \end{aligned}$$

where \(\Vert \delta ^{\ell }\mathcal {B}_{n,p}(\cdot )\Vert _1 := \sum _{m=0}^n | \delta ^{\ell }\mathcal {B}_{n,p}(m) |\).
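The following minimal sketch (names are ours) evaluates \(\theta ({\mathcal {P}},p)\) and the right-hand side of Proposition 1. For \(p=\bar p\) and \(p_i \in [0,1/2]\) one gets \(\theta \le 1/2\), so the tail of the expansion decays like \(2^{-\Omega (d)}\); this is the mechanism behind Theorem 3.

```python
import math

def theta(ps, p):
    """theta(P, p) as defined in Proposition 1."""
    n = len(ps)
    s1 = sum(pi - p for pi in ps)
    s2 = sum((pi - p) ** 2 for pi in ps)
    return (2 * s2 + s1 ** 2) / (2 * n * p * (1 - p))

def roos_tail_bound(ps, p, d):
    """Right-hand side of Proposition 1: a bound on the l1 mass of the terms beyond the first d+1."""
    th = theta(ps, p)
    assert th < 1
    return (math.sqrt(math.e) * (d + 1) ** 0.25 * th ** ((d + 1) / 2)
            * (1 - d / (d + 1) * math.sqrt(th)) / (1 - math.sqrt(th)) ** 2)

ps = [0.1, 0.2, 0.3, 0.45, 0.5]
pbar = sum(ps) / len(ps)
print(round(theta(ps, pbar), 4), [round(roos_tail_bound(ps, pbar, d), 6) for d in range(8)])
```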

3 Proof of Theorem 1

We first argue that Theorem 2 already implies the existence of an \(\epsilon \)-cover \(\mathcal{S}_{n,\epsilon }'\) of \(\mathcal{S}_n\) of size at most \(n^2 + n \cdot \left( {1 \over \epsilon }\right) ^{O({1/\epsilon ^2})}\). This cover is obtained by taking the union of all Poisson Binomial distributions in \((n,k)\)-Binomial form and all Poisson Binomial distributions in \(k\)-sparse form, for \(k=\lceil 41/ \epsilon \rceil \). The total number of Poisson Binomial distributions in \((n,k)\)-Binomial form is at most \(n^2\), since there are at most \(n\) choices for the value of \(\ell \) and at most \(n\) choices for the value of \(q\). The total number of Poisson Binomial distributions in \(k\)-sparse form is at most \((k^3+1)\cdot k^{3 k^2} \cdot (n+1) = n \cdot \left( {1 \over \epsilon }\right) ^{O({1/\epsilon ^2})}\) since there are \(k^3+1\) choices for \(\ell \), at most \(k^{3k^2}\) choices of probabilities \(p_1 \le p_2 \le \cdots \le p_{\ell }\) in \(\left\{ {1 \over k^2}, {2\over k^2},\ldots , {k^2-1 \over k^2 }\right\} \), and at most \(n+1\) choices for the number of variables indexed by \(i > \ell \) that have expectation equal to \(1\) (see Footnote 1). Notice that enumerating over the above distributions takes time \(O(n^2 \log n) + O(n \log n) \cdot \left( {1 \over \epsilon }\right) ^{O({1/\epsilon ^2})}\), as a number in \(\{0,\ldots ,n\}\) and a probability in \(\left\{ {1 \over n},{2 \over n},\ldots ,{n \over n}\right\} \) can be represented using \(O(\log n)\) bits, while a number in \(\{0,\ldots ,k^3\}\) and a probability in \(\left\{ {1 \over k^2}, {2\over k^2},\ldots , {k^2-1 \over k^2 }\right\} \) can be represented using \(O(\log k)=O(\log 1/\epsilon )\) bits.
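The \((n,k)\)-Binomial part of this cover is straightforward to enumerate directly, since each member is described by a pair \((\ell , q)\) with \(q\) a multiple of \(1/n\), subject to the constraints \(\ell q \ge k^2\) and \(\ell q(1-q)\ge k^2-k-1\). A sketch of this enumeration (our own illustration) is given below; the \(k\)-sparse part is the one that requires the pruning described next.

```python
def binomial_form_cover(n, k):
    """Enumerate the (n,k)-Binomial form members of the cover as pairs (l, q) with q = j/n.

    Each pair encodes PBD(q,...,q,0,...,0) with l coordinates equal to q;
    for an eps-cover one takes k = ceil(41/eps)."""
    cover = []
    for l in range(1, n + 1):
        for j in range(1, n + 1):
            q = j / n
            if l * q >= k ** 2 and l * q * (1 - q) >= k ** 2 - k - 1:
                cover.append((l, q))
    return cover                        # at most n^2 members

print(len(binomial_form_cover(n=1000, k=5)))
```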

We next show that we can remove from \(\mathcal{S}_{n,\epsilon }'\) a large number of the sparse-form distributions it contains to obtain a \(2\epsilon \)-cover of \(\mathcal{S}_n\). In particular, we shall only keep \(n \cdot \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\) sparse-form distributions by appealing to Theorem 3. To explain the pruning we introduce some notation. For a collection \({\mathcal {P}}=(p_i)_{i \in [n]} \in [0,1]^n\) of probability values we denote by \(\mathcal{L}_{{\mathcal {P}}} =\{i~|~p_i\in (0,1/2]\}\) and by \(\mathcal{R}_{{\mathcal {P}}}= \{i~|~p_i\in (1/2,1)\}\). Theorem 3, Corollary 1, Lemmas 1 and 2 imply that if two collections \({\mathcal {P}}=(p_i)_{i \in [n]}\) and \(\mathcal {Q}=(q_i)_{i \in [n]}\) of probability values satisfy

$$\begin{aligned} \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{t}&= \sum _{i \in \mathcal{L}_{\mathcal {Q}}} q_i^{t},\quad \text {for all } t=1,\ldots ,d; \\ \sum _{i \in \mathcal{R}_{{\mathcal {P}}}} p_i^{t}&= \sum _{i \in \mathcal{R}_{\mathcal {Q}}} q_i^{t},\quad \text {for all } t=1,\ldots ,d; \text {and}\\ (p_i)_{[n]{\setminus } (\mathcal {L}_{{\mathcal {P}}} \cup \mathcal {R}_{{\mathcal {P}}})}~\text {and}&~(q_i)_{[n]{\setminus } (\mathcal {L}_{\mathcal {Q}} \cup \mathcal {R}_{\mathcal {Q}})}~\text {are equal up to a permutation;} \end{aligned}$$

then \(d_\mathrm{TV}(\mathrm{PBD}({\mathcal {P}}), \mathrm{PBD}(\mathcal {Q})) \le 2\cdot 13(d+1)^{1/4} 2^{-(d+1)/2}\). In particular, for some \(d(\epsilon )= O(\log 1/\epsilon )\), this bound becomes at most \(\epsilon \).

For a collection \({\mathcal {P}}=(p_i)_{i \in [n]} \in [0,1]^n\), we define its moment profile \(m_{{\mathcal {P}}}\) to be the \((2 d(\epsilon )+1)\)-dimensional vector

$$\begin{aligned} m_{{\mathcal {P}}} = \left( \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i, \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{2},\ldots ,\sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{d(\epsilon )}; \sum _{i \in \mathcal{R}_{{\mathcal {P}}}} p_i, \ldots ,\sum _{i \in \mathcal{R}_{{\mathcal {P}}}} p_i^{d(\epsilon )} ; |\{ i~|~p_i\!=\!1 \}|\!\right) . \end{aligned}$$

By the previous discussion, for two collections \({\mathcal {P}}, \mathcal {Q}\), if \(m_{{\mathcal {P}}} = m_{\mathcal {Q}}\) then \(d_\mathrm{TV}(\mathrm{PBD}({\mathcal {P}}), \mathrm{PBD}(\mathcal {Q})) \le \epsilon \).

Given the above we sparsify \(\mathcal{S}_{n,\epsilon }'\) as follows: for every possible moment profile that can arise from a Poisson Binomial distribution in \(k\)-sparse form, we keep in our cover a single Poisson Binomial distribution with such moment profile. The cover resulting from this sparsification is a \(2 \epsilon \)-cover, since the sparsification loses us an additional \(\epsilon \) in total variation distance, as argued above.
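A sketch of the moment-profile bookkeeping behind this sparsification (the helper names and the float rounding guard are ours; \(d\) stands for \(d(\epsilon )=O(\log 1/\epsilon )\)):

```python
def moment_profile(ps, d):
    """(2d+1)-dimensional moment profile of a probability vector, as defined above."""
    low = [p for p in ps if 0 < p <= 0.5]
    high = [p for p in ps if 0.5 < p < 1]
    ones = sum(1 for p in ps if p == 1)
    return (tuple(sum(p ** t for p in low) for t in range(1, d + 1))
            + tuple(sum(p ** t for p in high) for t in range(1, d + 1))
            + (ones,))

def prune(sparse_form_vectors, d):
    """Keep one representative probability vector per moment profile."""
    seen = {}
    for ps in sparse_form_vectors:
        key = tuple(round(x, 12) for x in moment_profile(ps, d))   # guard against float noise
        seen.setdefault(key, ps)
    return list(seen.values())
```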

We now bound the cardinality of the sparsified cover. The total number of moment profiles of \(k\)-sparse Poisson Binomial distributions is \(k^{O(d(\epsilon )^2)} \cdot (n+1)\). Indeed, consider a Poisson Binomial distribution \(\mathrm{PBD}({\mathcal {P}}=(p_i)_{i \in [n]})\) in \(k\)-sparse form. There are at most \(k^3+1\) choices for \(|\mathcal{L}_{{\mathcal {P}}}|\), at most \(k^3+1\) choices for \(|\mathcal{R}_{{\mathcal {P}}}|\), and at most \((n+1)\) choices for \(|\{i~|~p_i = 1\}|\). We also claim that the total number of possible vectors

$$\begin{aligned} \left( \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i, \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{2},\ldots ,\sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{d(\epsilon )}\right) \end{aligned}$$

is \(k^{O(d(\epsilon )^2)}\). Indeed, if \(|\mathcal {L}_{{\mathcal {P}}}|=0\) there is just one such vector, namely the all-zero vector. If \(|\mathcal {L}_{{\mathcal {P}}}|> 0\), then, for all \(t=1,\ldots ,d(\epsilon ), \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{t} \in (0, |\mathcal{L}_{{\mathcal {P}}}|]\) and it must be an integer multiple of \(1/k^{2t}\). So the total number of possible values of \(\sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{t}\) is at most \(k^{2t} |\mathcal{L}_{{\mathcal {P}}}| \le k^{2t} k^3\), and the total number of possible vectors

$$\begin{aligned} \left( \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i, \sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{2},\ldots ,\sum _{i \in \mathcal{L}_{{\mathcal {P}}}} p_i^{d(\epsilon )}\right) \end{aligned}$$

is at most

$$\begin{aligned} \prod _{t=1}^{d(\epsilon )}k^{2t} k^3 \le k^{O(d(\epsilon )^2)}. \end{aligned}$$

The same upper bound applies to the total number of possible vectors

$$\begin{aligned} \left( \sum _{i \in \mathcal{R}_{{\mathcal {P}}}} p_i, \sum _{i \in \mathcal{R}_{{\mathcal {P}}}} p_i^{2},\ldots ,\sum _{i \in \mathcal{R}_{{\mathcal {P}}}} p_i^{d(\epsilon )}\right) . \end{aligned}$$

The moment profiles we enumerated over are a superset of the moment profiles of \(k\)-sparse Poisson Binomial distributions. We call them compatible moment profiles. We argued that there are at most \(k^{O(d(\epsilon )^2)} \cdot (n+1)\) compatible moment profiles, so the total number of Poisson Binomial distributions in \(k\)-sparse form that we keep in the cover is at most \(k^{O(d(\epsilon )^2)} \cdot (n+1) = n \cdot \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\). The number of Poisson Binomial distributions in \((n,k)\)-Binomial form is the same as before, i.e. at most \(n^2\), as we did not eliminate any of them. So the size of the sparsified cover is \(n^2 + n \cdot \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\).

To finish the proof it remains to argue that we don’t actually need to first compute \(\mathcal{S}_{n,\epsilon }'\) and then sparsify it to obtain our cover, but can produce it directly in time \(O(n^2 \log n) + O(n \log n) \cdot \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\). We claim that, given a moment profile \(m\) that is compatible with a \(k\)-sparse Poisson Binomial distribution, we can compute some \(\mathrm{PBD}({\mathcal {P}}=(p_i)_i)\) in \(k\)-sparse form such that \(m_{{\mathcal {P}}}=m\), if such a distribution exists, in time \(O(\log n) \left( {1 \over \epsilon }\right) ^{O(\log ^2{1/\epsilon })}\). This follows from Claim 1 of Sect. 6 (see Footnote 2). So our algorithm enumerates over all moment profiles that are compatible with a \(k\)-sparse Poisson Binomial distribution and for each profile invokes Claim 1 to find a Poisson Binomial distribution with such moment profile, if such distribution exists, adding it to the cover if it does exist. It then enumerates over all Poisson Binomial distributions in \((n,k)\)-Binomial form and adds them to the cover as well. The overall running time is as promised.

4 Proof of Theorem 2

We organize the proof according to the structure and notation of our outline in Sect. 1.2. In particular, we proceed to provide the details of Stages 1 and 2, described in the outline. The reader should refer to Sect. 1.2 for notation.

4.1 Details of stage 1

Define \(\mathcal{L}_k\!:=\! \left\{ i~\vert ~i\in [n] \wedge p_i\!\in \!(0,1/k)\right\} \) and \(\mathcal{H}_k\!:=\! \left\{ i~\vert ~i\in [n] \wedge p_i\!\in \!(1-1/k,1)\right\} \!.\) We define the expectations \((p_i')_i\) of the intermediate indicators \((Z_i)_i\) as follows.

First, we set \(p'_i=p_i\), for all \(i \in [n]{\setminus }(\mathcal{L}_k \cup \mathcal{H}_k)\). It follows that

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i \in [n]{\setminus }(\mathcal{L}_k \cup \mathcal{H}_k)}X_i,\sum _{i \in [n]{\setminus }(\mathcal{L}_k \cup \mathcal{H}_k)}Z_i\right) = 0. \end{aligned}$$
(5)

Next, we define the probabilities \(p'_i, i\in \mathcal{L}_k\), using the following procedure:

  1. Set \(r=\left\lfloor \frac{\sum _{i \in \mathcal{L}_k}p_i}{1/k}\right\rfloor \), and let \(\mathcal{L}_k' \subseteq \mathcal{L}_k\) be an arbitrary subset of cardinality \(|\mathcal{L}_k'|=r\).

  2. Set \(p_i'=\frac{1}{k}\), for all \(i \in \mathcal{L}_k'\), and \(p_i'=0\), for all \(i \in \mathcal{L}_k{\setminus } \mathcal{L}_k'\).

We bound the total variation distance \(d_{\mathrm{TV}}\left( \sum _{i \in \mathcal{L}_k}X_i,\sum _{i \in \mathcal{L}_k}Z_i\right) \) using the Poisson approximation to the Poisson Binomial distribution. In particular, Theorem 4 implies

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i \in \mathcal{L}_k}X_i,\mathrm{Poisson}\left( \sum _{i \in \mathcal {L}_k} p_i\right) \right) \le \frac{\sum _{i \in \mathcal {L}_k}p_i^2}{\sum _{i \in \mathcal {L}_k} p_i} \le {{1 \over k} \sum _{i \in \mathcal {L}_k} p_i \over \sum _{i \in \mathcal {L}_k} p_i}= 1/k. \end{aligned}$$

Similarly, \(d_{\mathrm{TV}}\left( \sum _{i \in \mathcal{L}_k}Z_i,\mathrm{Poisson}\left( \sum _{i \in \mathcal {L}_k} p'_i\right) \right) \le 1/k.\) Finally, we use Lemma 3 (given below and proved in Sect. 6) to bound the distance

$$\begin{aligned} d_{\mathrm{TV}}\left( \mathrm{Poisson}\left( \sum _{i \in \mathcal {L}_k} p_i\right) ,\mathrm{Poisson}\left( \sum _{i \in \mathcal {L}_k} p'_i\right) \right) \le {1\over 2}\left( e^{1\over k}-e^{-{1 \over k}} \right) \le {1.5 \over k}, \end{aligned}$$

where we used that \(|\sum _{i \in \mathcal {L}_k} p_i - \sum _{i \in \mathcal {L}_k} p'_i | \le 1/k\). Using the triangle inequality the above imply

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i \in \mathcal{L}_k}X_i,\sum _{i \in \mathcal{L}_k}Z_i\right) \le \frac{3.5}{k}. \end{aligned}$$
(6)

Lemma 3

(Variation Distance of Poisson Distributions) Let \(\lambda _1, \lambda _2>0\). Then

$$\begin{aligned}&d_{\mathrm{TV}}\left( \mathrm{Poisson}(\lambda _1),\mathrm{Poisson}(\lambda _2)\right) \le {1 \over 2} \left( e^{|\lambda _1-\lambda _2|}-e^{-|\lambda _1-\lambda _2|}\right) .\end{aligned}$$

We follow a similar rounding scheme to define \((p_i')_{i \in \mathcal {H}_k}\) from \((p_i)_{i \in \mathcal{H}_k}\). That is, we round some of the \(p_i\)’s to \(1-1/k\) and some of them to \(1\) so that \(|\sum _{i \in \mathcal {H}_k}p_i -\sum _{i \in \mathcal {H}_k}p'_i | \le 1/k\). As a result, we get (to see this, repeat the argument employed above to the variables \(1-X_i\) and \(1-Z_i, i\in \mathcal{H}_k\))

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i \in \mathcal{H}_k}X_i,\sum _{i \in \mathcal{H}_k}Z_i\right) \le \frac{3.5}{k}. \end{aligned}$$
(7)

Using (5), (6), (7) and Lemma 2 we get (2).

4.2 Details of stage 2

Recall that \(\mathcal{M}:=\{i~|~p'_i \notin \{0,1\}\}\) and \(m:=|\mathcal{M}|\). Depending on whether \(m \le k^3\) or \(m > k^3\) we follow different strategies to define the expectations \((q_i)_i\) of indicators \((Y_i)_i\).

4.2.1 The case \(m \le k^3\)

First we set \(q_i = p'_i\), for all \(i\in [n] {\setminus } \mathcal{M}.\) It follows that

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i \in [n] {\setminus } \mathcal{M}}Z_i,\sum _{i \in [n] {\setminus } \mathcal{M}}Y_i\right) = 0. \end{aligned}$$
(8)

For the definition of \((q_i)_{i \in \mathcal {M}}\), we make use of Ehm’s Binomial approximation to the Poisson Binomial distribution, stated as Theorem 5 in Sect. 2.1. We start by partitioning \(\mathcal {M}\) as \(\mathcal {M}=\mathcal {M}_l \sqcup \mathcal {M}_h\), where \(\mathcal {M}_l = \{i \in \mathcal {M}~|~p'_i \le 1/2 \}\), and describe below a procedure for defining \((q_i)_{i \in \mathcal {M}_l}\) so that the following hold:

  1. \(d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_l} Z_i,\sum _{i \in \mathcal {M}_l}Y_i\right) \le 17/k\);

  2. For all \(i \in \mathcal {M}_l, q_i\) is an integer multiple of \(1/k^2\).

To define \((q_i)_{i \in \mathcal {M}_h}\), we apply the same procedure to \((1-p_i')_{i\in \mathcal {M}_h}\) to obtain \((1-q_i)_{i \in \mathcal {M}_h}\). Assuming the correctness of our procedure for probabilities \(\le 1/2\) the following should also hold:

  1. \(d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_h} Z_i,\sum _{i \in \mathcal {M}_h}Y_i\right) \le 17/k\);

  2. For all \(i \in \mathcal {M}_h, q_i\) is an integer multiple of \(1/k^2\).

Using Lemma 2, the above bounds imply

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}} Z_i,\sum _{i \in \mathcal {M}}Y_i\right)&\le d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_l} Z_i,\sum _{i \in \mathcal {M}_l}Y_i\right) \nonumber \\&\quad + d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_h} Z_i,\sum _{i \in \mathcal {M}_h}Y_i\right) \le 34/k. \end{aligned}$$
(9)

Now that we have (9), using (8) and Lemma 2 we get (3).

So it suffices to define the \((q_i)_{i \in \mathcal {M}_l}\) properly. To do this, we define the partition \(\mathcal {M}_l=\mathcal {M}_{l,1} \sqcup \mathcal {M}_{l,2} \sqcup \cdots \sqcup \mathcal {M}_{l,k-1}\) where for all \(j\):

$$\begin{aligned} \mathcal {M}_{l,j} = \left\{ i~\Big |~ p_i' \in \left[ {1 \over k}+{(j-1)j \over 2}{1 \over k^2},{1 \over k}+{(j+1)j \over 2}{1 \over k^2} \right) \right\} . \end{aligned}$$

(Notice that the length of interval used in the definition of \(\mathcal {M}_{l,j}\) is \( {j \over k^2}\).) Now, for each \(j=1,\ldots ,k-1\) such that \(\mathcal {M}_{l,j} \ne \emptyset \), we define \((q_i)_{i \in \mathcal {M}_{l,j}}\) via the following procedure:

  1. Set \(p_{j,\min }:={1 \over k}+{(j-1)j \over 2}{1 \over k^2}, p_{j,\max }:={1 \over k}+{(j+1)j \over 2}{1 \over k^2}, n_j = |\mathcal {M}_{l,j}|, \bar{p}_j = {\sum _{i \in \mathcal{M}_{l,j}}p_i' \over n_j}\).

  2. Set \(r=\left\lfloor \frac{n_j (\bar{p}_j-p_{j,\min })}{j/k^2}\right\rfloor \); let \(\mathcal{M}_{l,j}' \subseteq \mathcal{M}_{l,j}\) be an arbitrary subset of cardinality \(r\).

  3. Set \(q_i=p_{j,\max }\), for all \(i \in \mathcal{M}_{l,j}'\).

  4. For an arbitrary index \(i_j^* \in \mathcal{M}_{l,j} {\setminus } \mathcal{M}_{l,j}'\), set \(q_{i^*_j}=n_j \bar{p}_j - (r p_{j,\max } + (n_j-r-1)p_{j,\min })\).

  5. Finally, set \(q_i=p_{j,\min }\), for all \(i \in \mathcal{M}_{l,j} {\setminus } \mathcal{M}_{l,j}' {\setminus } \{i^*_j\}\).

It is easy to see that

  1. \(\sum _{i \in \mathcal{M}_{l,j}}p_i' = \sum _{i \in \mathcal{M}_{l,j}}q_i \equiv n_j \bar{p}_j\);

  2. For all \(i \in \mathcal{M}_{l,j}{\setminus } \{i_j^*\}, q_i\) is an integer multiple of \(1/k^2\).

Moreover Theorem 5 implies:

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_{l,j}}Z_i,{\mathcal {B}}\left( n_j, \bar{p}_j\right) \right)&\le \frac{\sum _{i \in \mathcal {M}_{l,j}} (p_i'-\bar{p}_j)^2}{(n_j+1) \bar{p}_j(1-\bar{p}_j)} \\&\le {\left\{ \begin{array}{ll}{ n_j (j {1 \over k^2})^2 \over (n_j+1) p_{j,\min } (1-p_{j,\min })},\quad \text {when }j <k-1\\ {n_j (j {1 \over k^2})^2 \over (n_j+1) p_{j,\max } (1-p_{j,\max })},\quad \text {when }j = k-1\end{array}\right. }\\&\le {8 \over k^2}. \end{aligned}$$

A similar derivation gives \(d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_{l,j}}Y_i,{\mathcal {B}}\left( n_j,\bar{p}_j\right) \right) \le {8 \over k^2}\). So by the triangle inequality:

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_{l,j}}Z_i,\sum _{i \in \mathcal {M}_{l,j}}Y_i\right) \le {16 \over k^2}. \end{aligned}$$
(10)

As Eq. (10) holds for all \(j=1,\ldots ,k-1\), an application of Lemma 2 gives:

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_{l}}Z_i,\sum _{i \in \mathcal {M}_{l}}Y_i\right) \le \sum _{j=1}^{k-1} d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_{l,j}}Z_i,\sum _{i \in \mathcal {M}_{l,j}}Y_i\right) \le {16 \over k}. \end{aligned}$$

Moreover, the \(q_i\)’s defined above are integer multiples of \(1/k^2\), except maybe for \(q_{i_1^*},\ldots ,q_{i_{k-1}^*}\). But we can round these to their closest multiple of \(1/k^2\), increasing \(d_{\mathrm{TV}}\left( \sum _{i \in \mathcal {M}_l} Z_i,\sum _{i \in \mathcal {M}_l}Y_i\right) \) by at most \(1/k\).
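The per-subinterval rounding described above can be summarized as follows (a sketch with our own names; it keeps the sum of expectations within the subinterval exactly unchanged, with one leftover value absorbing the remainder, mirroring steps 1–5 above):

```python
import math

def round_subinterval(ps, p_min, p_max):
    """Round probabilities lying in [p_min, p_max) to the endpoints, keeping their sum unchanged.

    One leftover value absorbs the remainder; all other outputs are endpoints of the subinterval."""
    n_j = len(ps)
    total = sum(ps)                                # equals n_j * pbar_j
    width = p_max - p_min                          # equals j / k^2 in the partition above
    r = math.floor((total - n_j * p_min) / width)  # how many values are rounded up to p_max
    leftover = total - (r * p_max + (n_j - r - 1) * p_min)
    return [p_max] * r + [leftover] + [p_min] * (n_j - r - 1)

ps = [0.112, 0.104, 0.119, 0.101]
qs = round_subinterval(ps, 0.10, 0.12)
assert abs(sum(ps) - sum(qs)) < 1e-12              # the expectation of the sum is preserved
```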

4.2.2 The case \(m > k^3\)

Let \(t = | \{ i~|~p'_i=1\}|\). We show that the random variable \(\sum _{i}Z_i\) is within total variation distance \(9/k\) from the Binomial distribution \({\mathcal {B}}(m',q)\) where

$$\begin{aligned} m':=\left\lceil \frac{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}{\sum _{i \in \mathcal{M}}p_i'^2+t} \right\rceil \quad \text {and}\quad&~q:=\frac{\ell ^*}{n}, \end{aligned}$$

where \(\ell ^*\) satisfies \(\frac{\sum _{i \in \mathcal{M}}p_i' + t}{m'} \in [\frac{\ell ^*-1}{n}, \frac{\ell ^*}{n}]\). Notice that:

  • \(\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2 \le (\sum _{i \in \mathcal{M}}p_i'^2+t) (m+t)\), by the Cauchy-Schwarz inequality; and

  • \(\frac{\sum _{i \in \mathcal{M}}p_i' + t}{m'} \le {\sum _{i \in \mathcal {M}} p_i' + t \over \frac{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}{\sum _{i \in \mathcal{M}}p_i'^2+t}} = {\sum _{i \in \mathcal{M}}p_i'^2+t \over \sum _{i \in \mathcal{M}}p_i'+t} \le 1\).

So \(m' \le m+t \le n\), and there exists some \(\ell ^* \in \{1,\ldots ,n\}\) so that \(\frac{\sum _{i \in \mathcal{M}}p_i' + t}{m'} \in [\frac{\ell ^*-1}{n}, \frac{\ell ^*}{n}]\).
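A sketch of this choice of \(m'\) and \(q\) (names are ours); the two observations above guarantee \(m' \le n\) and the existence of a valid \(\ell ^*\).

```python
import math

def binomial_parameters(ps_prime, n):
    """Compute m' and q = l*/n as above, from the intermediate expectations p_i'."""
    t = sum(1 for p in ps_prime if p == 1.0)
    m_vals = [p for p in ps_prime if 0.0 < p < 1.0]           # the expectations indexed by M
    mean_sum = sum(m_vals) + t
    m_prime = math.ceil(mean_sum ** 2 / (sum(p * p for p in m_vals) + t))
    l_star = math.ceil(n * mean_sum / m_prime)                # smallest l with mean_sum / m' <= l / n
    return m_prime, l_star / n

ps_prime = [0.3] * 50 + [0.7] * 30 + [1.0] * 10 + [0.0] * 10
print(binomial_parameters(ps_prime, n=100))                   # (73, 0.64)
```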

For fixed \(m'\) and \(q\), we set \(q_i=q\), for all \(i \le m'\), and \(q_i=0\), for all \(i >m'\), and compare the distributions of \(\sum _{i \in \mathcal{M}}Z_i\) and \(\sum _{i \in \mathcal{M}}Y_i\). For convenience we define

$$\begin{aligned} \mu&:=\mathbb {E}\left[ \sum _{i \in \mathcal{M}}Z_i\right] \quad \text {and}\quad \mu ':=\mathbb {E}\left[ \sum _{i \in \mathcal{M}}Y_i\right] ,\\ \sigma ^2&:=\text {Var}\left[ \sum _{i \in \mathcal{M}}Z_i\right] \quad \text {and}\quad \sigma '^2:=\text {Var}\left[ \sum _{i \in \mathcal{M}}Y_i\right] . \end{aligned}$$

The following lemma compares the values \(\mu , \mu ', \sigma , \sigma '\).

Lemma 4

The following hold

$$\begin{aligned} \mu&\le \mu ' \le \mu + 1, \end{aligned}$$
(11)
$$\begin{aligned} \sigma ^2-1&\le \sigma '^2 \le \sigma ^2 + 2, \end{aligned}$$
(12)
$$\begin{aligned} \mu&\ge k^2, \end{aligned}$$
(13)
$$\begin{aligned} \sigma ^2&\ge k^2\left( 1-\frac{1}{k} \right) . \end{aligned}$$
(14)

The proof of Lemma 4 is given in Sect. 6. To compare \(\sum _{i \in \mathcal{M}}Z_i\) and \(\sum _{i \in \mathcal{M}}Y_i\) we approximate both by Translated Poisson distributions. Theorem 6 implies that

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}Z_i,TP(\mu , \sigma ^2)\right)&\le \frac{\sqrt{\sum _{i}p_i'^3(1-p'_i)}+2}{\sum _{i}p'_i(1-p'_i)} \le \frac{\sqrt{\sum _{i}p_i'(1-p'_i)}+2}{\sum _{i}p'_i(1-p'_i)}\\&\le \frac{1}{\sqrt{\sum _{i}p_i'(1-p'_i)}}+\frac{2}{\sum _{i}p'_i(1-p'_i)}= \frac{1}{\sigma }+\frac{2}{\sigma ^2}\\&\le \frac{1}{k\sqrt{1-1/k}}+\frac{2}{k^2\left( 1-\frac{1}{k}\right) }\quad \text {(using (14))}\\&\le {3 \over k}, \end{aligned}$$

where for the last inequality we assumed \(k\ge 3\), but the bound of \(3/k\) clearly also holds for \(k=1,2\). Similarly,

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}Y_i,TP(\mu ', \sigma '^2)\right)&\le \frac{1}{\sigma '}+\frac{2}{\sigma '^2} \\&\le \frac{1}{k\sqrt{1\,{-}\,\frac{1}{k} \,{-}\, \frac{1}{k^2} }}+\frac{2}{k^2\left( 1\,{-}\,\frac{1}{k} \,{-}\, \frac{1}{k^2} \right) }\quad \text {(using (12), (14))}\\&\le {3 \over k}, \end{aligned}$$

where for the last inequality we assumed \(k\ge 3\), but the bound of \(3/k\) clearly also holds for \(k=1,2\). By the triangle inequality we then have that

$$\begin{aligned}&d_{\mathrm{TV}}\left( \sum _{i}Z_i,\sum _{i }Y_i\right) \nonumber \\&\quad \le d_{\mathrm{TV}}\left( \sum _{i }Z_i,TP(\mu , \sigma ^2)\right) \nonumber \\&\quad \quad +d_{\mathrm{TV}}\left( \sum _{i}Y_i,TP(\mu ', \sigma '^2)\right) + d_{\mathrm{TV}}\left( TP(\mu , \sigma ^2),TP(\mu ', \sigma '^2)\right) \nonumber \\&\quad \le 6/k+d_{\mathrm{TV}}\left( TP(\mu , \sigma ^2),TP(\mu ', \sigma '^2)\right) . \end{aligned}$$
(15)

It remains to bound the total variation distance between the two Translated Poisson distributions. We make use of the following lemma.

Lemma 5

([4]) Let \(\mu _1, \mu _2 \in \mathbb {R}\) and \(\sigma _1^2, \sigma _2^2 \in \mathbb {R}_+ {\setminus } \{0\}\) be such that \(\lfloor \mu _1-\sigma _1^2 \rfloor \le \lfloor \mu _2-\sigma _2^2 \rfloor \). Then

$$\begin{aligned}&d_{\mathrm{TV}}\left( TP(\mu _1,\sigma _1^2),TP(\mu _2,\sigma _2^2)\right) \le \frac{|\mu _1-\mu _2|}{\sigma _1}+\frac{|\sigma _1^2-\sigma _2^2|+1}{\sigma _1^2}. \end{aligned}$$

Lemma 5 implies

$$\begin{aligned}&d_{\mathrm{TV}}\left( TP(\mu , \sigma ^2),TP(\mu ', \sigma '^2)\right) \le \frac{|\mu -\mu '|}{\min (\sigma ,\sigma ')}+ \frac{|\sigma ^2-\sigma '^2|+1}{\min (\sigma ^2,\sigma '^2)} \nonumber \\&\quad \le \frac{1}{k\sqrt{1-\frac{1}{k} - \frac{1}{k^2} }}+ \frac{3}{k^2\left( 1-\frac{1}{k} - \frac{1}{k^2} \right) }\quad \text {(using Lemma 4)} \nonumber \\&\quad \le 3/k, \end{aligned}$$
(16)

where for the last inequality we assumed \(k > 3\), but the bound clearly also holds for \(k=1,2,3\). Using (15) and (16) we get

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}Z_i,\sum _{i}Y_i\right) \le 9/k, \end{aligned}$$
(17)

which implies (3).

5 Proof of Theorem 3

Let \(\mathcal {X}\) and \(\mathcal {Y}\) be two collections of indicators as in the statement of Theorem 3. For \(\alpha _{\ell }(\cdot ,\cdot )\) defined as in the statement of Theorem 7, we claim the following.

Lemma 6

If \({\mathcal {P}}, \mathcal {Q}\in [0,1]^n\) satisfy property \((C_d)\) in the statement of Theorem 3, then, for all \(p \in [0,1]\) and all \(\ell \in \{0,\ldots ,d\}\):

$$\begin{aligned} \alpha _{\ell }(\mathcal {P},p) = \alpha _{\ell }(\mathcal {Q},p). \end{aligned}$$

Proof of Lemma 6

First \(\alpha _{0}(\mathcal {P},p) = 1 = \alpha _{0}(\mathcal {Q},p)\) by definition. Now fix \(\ell \in \{1,\ldots ,d\}\) and consider the function \(f(\vec {x}):=\alpha _{\ell }((x_1,\ldots ,x_n),p)\) in the variables \(x_1,\ldots ,x_n \in \mathbb {R}\). Observe that \(f\) is a symmetric polynomial of degree \(\ell \) on \(x_1,\ldots ,x_n\). Hence, from the theory of symmetric polynomials, it follows that \(f\) can be written as a polynomial function of the power-sum symmetric polynomials \(\pi _1, \ldots ,\pi _{\ell }\), where

$$\begin{aligned} \pi _j(x_1,\ldots ,x_n):=\sum _{i=1}^nx_i^j,~\text {for all }j \in [\ell ], \end{aligned}$$

as the elementary symmetric polynomial of degree \(j \in [n]\) can be written as a polynomial function of the power-sum symmetric polynomials \(\pi _1,\ldots ,\pi _j\) (e.g. [29]). Now \((C_d)\) implies that \(\pi _j({\mathcal {P}}) = \pi _j(\mathcal {Q})\), for all \(j \le \ell \). So \(f({\mathcal {P}})=f(\mathcal {Q})\), i.e. \(\alpha _{\ell }(\mathcal {P},p) = \alpha _{\ell }(\mathcal {Q},p)\). \(\square \)
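Lemma 6 is easy to check numerically: by the formula in Theorem 7, \(\alpha _{\ell }(\mathcal {P},p)\) is the \(\ell \)-th elementary symmetric polynomial of the shifted values \(p_i-p\), i.e. the coefficient of \(x^{\ell }\) in \(\prod _i(1+(p_i-p)x)\). The sketch below (our own check, not part of the proof) verifies that two collections with equal first \(d\) power sums have equal \(\alpha _{\ell }\) for all \(\ell \le d\), while the higher coefficients may differ.

```python
def alphas(ps, p):
    """alpha_l(P, p) for l = 0,...,n, read off as the coefficients of prod_i (1 + (p_i - p) x)."""
    coeffs = [1.0]
    for pi in ps:
        shifted = pi - p
        new = coeffs + [0.0]
        for l in range(1, len(new)):
            new[l] += shifted * coeffs[l - 1]
        coeffs = new
    return coeffs

# Two collections with the same first two power sums: sums are both 0.9, sums of squares both 0.33.
P = [0.1, 0.4, 0.4]
Q = [0.2, 0.2, 0.5]
p = 0.3
aP, aQ = alphas(P, p), alphas(Q, p)
print([round(aP[l] - aQ[l], 12) for l in range(3)])   # zeros for l = 0, 1, 2 (here d = 2)
print(round(aP[3] - aQ[3], 12))                       # generally nonzero for l = d + 1
```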

For all \(p\in [0,1]\), by combining Theorem 7 and Lemma 6 we get that

$$\begin{aligned} Pr[X = m] - Pr[Y=m]&= \sum _{\ell = d+1}^n (\alpha _{\ell }(\mathcal {P}, p)-\alpha _{\ell }(\mathcal {Q}, p))\cdot \delta ^{\ell }\mathcal {B}_{n,p}(m),\\&\quad \quad \text {for all }m \in \{0,\ldots ,n\}. \end{aligned}$$

Hence, for all \(p\):

$$\begin{aligned} d_{\mathrm{TV}}\left( X,Y\right)&= \frac{1}{2} \sum _{m=0}^n|Pr[X = m] - Pr[Y=m]| \nonumber \\&\le \frac{1}{2} \sum _{\ell = d+1}^n |\alpha _{\ell }(\mathcal {P}, p)-\alpha _{\ell }(\mathcal {Q}, p)|\cdot \Vert \delta ^{\ell }\mathcal {B}_{n,p}(\cdot )\Vert _1\nonumber \\&\le \frac{1}{2} \sum _{\ell = d+1}^n \left( |\alpha _{\ell }(\mathcal {P}, p)|+|\alpha _{\ell }(\mathcal {Q}, p)|\right) \cdot \Vert \delta ^{\ell }\mathcal {B}_{n,p}(\cdot )\Vert _1. \end{aligned}$$
(18)

Plugging \(p = \bar{p}:= \frac{1}{n}\sum _{i} p_i\) into Proposition 1, we get

$$\begin{aligned} \theta ({\mathcal {P}},\bar{p})= \frac{\sum _{i=1}^n(p_i - \bar{p})^2 }{n\bar{p}(1-\bar{p})} \le \Big |\max _i\{p_i\}-\min _i\{p_i\}\Big | \le \frac{1}{2}\quad \text {(see [24])} \end{aligned}$$

and then

$$\begin{aligned} \frac{1}{2} \sum _{\ell = d+1}^n |\alpha _{\ell }(\mathcal {P}, \bar{p})|\cdot \Vert \delta ^{\ell }\mathcal {B}_{n,\bar{p}}(\cdot )\Vert _1&\le \sqrt{e}(d+1)^{1/4} 2^{-(d+1)/2} \frac{1- \frac{1}{\sqrt{2}}\frac{d}{d+1}}{(\sqrt{2}-1)^2}\\ {}&\le 6.5(d+1)^{1/4} 2^{-(d+1)/2}. \end{aligned}$$

But \((C_d)\) implies that \(\sum _i q_i = \sum _i p_i = n\bar{p} \). So we get in a similar fashion

$$\begin{aligned} \frac{1}{2} \sum _{\ell = d+1}^n |\alpha _{\ell }(\mathcal {Q}, \bar{p})|\cdot \Vert \delta ^{\ell }\mathcal {B}_{n,\bar{p}}(\cdot )\Vert _1 \le 6.5(d+1)^{1/4} 2^{-(d+1)/2}. \end{aligned}$$

Plugging these bounds into (18) we get

$$\begin{aligned} d_{\mathrm{TV}}\left( X,Y\right) \le 13(d+1)^{1/4} 2^{-(d+1)/2}. \end{aligned}$$

6 Deferred proofs

Proof of Lemma 1

Let \(X=\sum _i X_i\) and \(Y=\sum _i Y_i\). It is obvious that, if \((p_1,\ldots ,p_n)=(q_1,\ldots ,q_n)\), then the distributions of \(X\) and \(Y\) are the same. In the other direction, we show that, if \(X\) and \(Y\) have the same distribution, then \((p_1,\ldots ,p_n)=(q_1,\ldots ,q_n)\). Consider the polynomials:

$$\begin{aligned} g_X(s)=\mathbb {E}\left[ (1+s)^X\right] = \prod _{i=1}^n \mathbb {E}\left[ (1+s)^{X_i}\right] = \prod _{i=1}^n (1+p_i s);\\ g_Y(s)=\mathbb {E}\left[ (1+s)^Y\right] = \prod _{i=1}^n \mathbb {E}\left[ (1+s)^{Y_i}\right] = \prod _{i=1}^n (1+q_i s). \end{aligned}$$

Since \(X\) and \(Y\) have the same distribution, \(g_X\) and \(g_Y\) are equal, so they have the same degree and roots. Notice that \(g_X\) has degree \(n-|\{i~|~p_i=0\}|\) and roots \(\{ -{1 \over p_i}~|~p_i \ne 0 \}\). Similarly, \(g_Y\) has degree \(n-|\{i~|~q_i=0\}|\) and roots \(\{ -{1 \over q_i}~|~q_i \ne 0 \}\). Hence, \((p_1,\ldots ,p_n)=(q_1,\ldots ,q_n)\). \(\square \)
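The argument is in fact constructive: given the probability mass function of \(\sum _i X_i\), one can form \(g_X(s)=\sum _m \Pr [X=m](1+s)^m\) and read the \(p_i\)'s off its roots. A small sketch using numpy (our own illustration; numerically reliable only for small \(n\)):

```python
import math
import numpy as np

def recover_probs(pmf):
    """Recover the multiset {p_i} (up to permutation) from the pmf of a PBD, via the roots of g_X."""
    n = len(pmf) - 1
    # Coefficients (lowest degree first) of g_X(s) = sum_m Pr[X=m] (1+s)^m = prod_i (1 + p_i s).
    coeffs = [0.0] * (n + 1)
    for m, mass in enumerate(pmf):
        for j in range(m + 1):
            coeffs[j] += mass * math.comb(m, j)
    roots = np.roots(coeffs[::-1])                   # numpy expects highest-degree coefficient first
    probs = sorted((-1.0 / r).real for r in roots)   # each nonzero p_i appears as the root -1/p_i
    return [0.0] * (n - len(probs)) + probs          # the degree drop counts the p_i equal to 0

pmf = [0.04, 0.41, 0.46, 0.09]                       # the pmf of PBD(0.2, 0.5, 0.9)
print(recover_probs(pmf))                            # approximately [0.2, 0.5, 0.9]
```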

Proof of Lemma 2

It follows from the coupling lemma that for any coupling of the variables \(X_1,\ldots ,X_n, Y_1,\ldots ,Y_n\):

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i=1}^n X_i,\sum _{i=1}^n Y_i\right)&\le \Pr \left[ \sum _{i=1}^n X_i \ne \sum _{i=1}^n Y_i \right] \nonumber \\&\le \sum _{i=1}^n \Pr [X_i \ne Y_i]. \end{aligned}$$
(19)

We proceed to fix a specific coupling. For all \(i\), it follows from the optimal coupling theorem that there exists a coupling of \(X_i\) and \(Y_i\) such that \(\Pr [X_i \ne Y_i] = d_{\mathrm{TV}}\left( X_i,Y_i\right) \). Using these individual couplings for each \(i\) we define a grand coupling of the variables \(X_1,\ldots ,X_n, Y_1,\ldots ,Y_n\) such that \(\Pr [X_i \ne Y_i] = d_{\mathrm{TV}}\left( X_i,Y_i\right) \), for all \(i\). This coupling is faithful because \(X_1,\ldots , X_n\) are mutually independent and \(Y_1,\ldots ,Y_n\) are mutually independent. Under this coupling Eq. (19) implies:

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i=1}^n X_i,\sum _{i=1}^n Y_i\right) \le \sum _{i=1}^n \Pr [X_i \ne Y_i] \equiv \sum _{i=1}^n d_{\mathrm{TV}}\left( X_i,Y_i\right) . \end{aligned}$$
(20)

\(\square \)
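For indicators the optimal coupling is concrete: \(d_{\mathrm{TV}}\left( X_i,Y_i\right) =|p_i-q_i|\), and drawing \(X_i\) and \(Y_i\) from a common uniform variable attains this. The following Python sketch (our illustration only) samples such a grand coupling and estimates both sides of (20).

```python
# Illustration of the grand coupling in the proof of Lemma 2: each pair
# (X_i, Y_i) is driven by the same uniform U_i, giving
# Pr[X_i != Y_i] = |p_i - q_i| = d_TV(X_i, Y_i); pairs are independent across i.
import random

def sample_grand_coupling(ps, qs):
    us = [random.random() for _ in ps]
    X = sum(u <= p for u, p in zip(us, ps))
    Y = sum(u <= q for u, q in zip(us, qs))
    return X, Y

ps = [0.30, 0.55, 0.70, 0.20]
qs = [0.25, 0.60, 0.70, 0.35]
trials = 200_000
mismatch = sum(x != y
               for x, y in (sample_grand_coupling(ps, qs) for _ in range(trials)))
# Pr[sum X_i != sum Y_i] under this coupling: an upper bound on the l.h.s. of (20)
print(mismatch / trials)
print(sum(abs(p - q) for p, q in zip(ps, qs)))  # right-hand side of (20): 0.25
```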

Claim 1

Fix positive integers \(\tilde{n}, \delta , B, k \in \mathbb {N}_+\) with \(\tilde{n}, k\ge 2\). Given values \(\mu _1,\ldots ,\mu _{\delta }, \mu '_1,\ldots ,\mu '_{\delta }\), where, for all \(\ell =1,\ldots ,{\delta }\),

$$\begin{aligned} \mu _{\ell },\mu '_{\ell } \in \left\{ 0,\left( {1 \over k^2}\right) ^{\ell },2\left( {1 \over k^2}\right) ^{\ell },\ldots ,B\right\} , \end{aligned}$$

discrete sets \(\mathcal {T}_1,\ldots ,\mathcal {T}_{\tilde{n}} \subseteq \left\{ 0, {1 \over k^2}, {2 \over k^2},\ldots , 1\right\} \), and four integers \(n_0, n_1 \le {\tilde{n}}, n_s, n_b \le B\), it is possible to solve the system of equations:

$$\begin{aligned} (\Sigma ):\quad \sum _{p_i \in (0,1/2]}p_i^{\ell }&=\mu _{\ell }, \quad \text { for all }\ell =1,\ldots ,{\delta },\\ \sum _{p_i \in (1/2,1)}p_i^{\ell }&=\mu '_{\ell }, \quad \text { for all }\ell =1,\ldots ,{\delta },\\ |\{i | p_i = 0\}|&= n_0\\ |\{i | p_i = 1\}|&=n_1\\ |\{i | p_i \in (0, 1/2]\}|&=n_s\\ |\{i | p_i \in (1/2, 1)\}|&=n_b \end{aligned}$$

with respect to the variables \(p_1 \in \mathcal {T}_1,\ldots , p_{\tilde{n}} \in \mathcal {T}_{\tilde{n}}\), or to determine that no solution exists, in time

$$\begin{aligned} O({\tilde{n}}^3 \log _2 {\tilde{n}}) B^{O({\delta })} k^{O({\delta }^2)}. \end{aligned}$$

Proof of Claim 1

We use dynamic programming. Let us consider the following tensor of dimension \(2{\delta }+5\):

$$\begin{aligned} A(i, z_0, z_1, z_s, z_b; \nu _1,\ldots ,\nu _{\delta } ; \nu '_1,\ldots ,\nu '_{\delta }), \end{aligned}$$

where \(i \in [\tilde{n}], z_0, z_1 \in \{0,\ldots ,\tilde{n}\}, z_s, z_b \in \{0,\ldots ,B\}\) and

$$\begin{aligned} \nu _{\ell }, \nu '_{\ell } \in \left\{ 0,\left( {1 \over k^2}\right) ^{\ell },2\left( {1 \over k^2}\right) ^{\ell },\ldots ,B\right\} , \end{aligned}$$

for \(\ell =1,\ldots ,{\delta }.\) The total number of cells in \(A\) is

$$\begin{aligned}&\tilde{n} \cdot (\tilde{n}+1)^2 \cdot (B+1)^2 \cdot \left( \prod _{\ell =1}^{\delta } (B k^{2 \ell }+1) \right) ^2 \le O(\tilde{n}^3) B^{O({\delta })} k^{O({\delta }^2)}. \end{aligned}$$

Every cell of \(A\) is assigned value \(0\) or \(1\), as follows:

$$\begin{aligned}&A(i, z_0, z_1, z_s, z_b ; \nu _1,\ldots ,\nu _{\delta }, \nu '_1,\ldots ,\nu '_{\delta })=1\\&\Longleftrightarrow \left( \begin{array}{c} \hbox {There exist } p_1 \in \mathcal {T}_1, \ldots , p_i \in \mathcal {T}_i \hbox { such} \\ \hbox {that } |\{j \le i | p_j =0\}|=z_0,\\ |\{j \le i | p_j =1\}|=z_1, \\ |\{j \le i | p_j \in (0,1/2]\}|=z_s,\\ |\{j \le i | p_j \in (1/2,1)\}|=z_b,\\ \sum _{j \le i: p_j \in (0,1/2]}p_j^{\ell }=\nu _{\ell },\hbox { for all}\\ \ell =1,\ldots ,{\delta }, \sum _{j \le i: p_j \in (1/2,1)}p_j^{\ell }=\nu '_{\ell },\hbox { for}\\ \hbox {all }\ell =1,\ldots ,{\delta }.\end{array}\right) . \end{aligned}$$

Notice that we need \(O(\tilde{n}^3) B^{O({\delta })} k^{O({\delta }^2)}\) bits to store \(A\) and \(O(\log \tilde{n}+ \delta \log B + \delta ^2 \log k)\) bits to address its cells. To complete \(A\) we work in layers of increasing \(i\). We initialize all entries to value \(0\). Then, the first layer \(A(1,\cdot ,\cdot ,\cdot ,\cdot ~;~\cdot ,\ldots ,\cdot ~;~\cdot ,\ldots ,\cdot )\) can be completed easily as follows:

$$\begin{aligned}&A(1, 1, 0, 0, 0 ; 0, 0, \ldots ,0 ; 0, 0, \ldots , 0)=1 \Leftrightarrow 0 \in \mathcal {T}_1;\\&A(1, 0, 1, 0, 0 ; 0, 0, \ldots ,0 ; 0, 0 \ldots , 0)=1 \Leftrightarrow 1 \in \mathcal {T}_1;\\&A(1, 0, 0, 1, 0 ; p, p^2, \ldots ,p^{\delta } ; 0,\ldots ,0)=1 \Leftrightarrow p \in \mathcal {T}_1 \cap (0,1/2];\\&A(1, 0, 0, 0, 1 ; 0, \ldots ,0; p, p^2, \ldots ,p^{\delta })=1 \Leftrightarrow p \in \mathcal {T}_1 \cap (1/2,1). \end{aligned}$$

Inductively, to complete layer \(i+1\), we consider all the non-zero entries of layer \(i\) and for every such non-zero entry and for every \(v_{i+1} \in \mathcal {T}_{i+1}\), we find which entry of layer \(i+1\) we would transition to if we chose \(p_{i+1}=v_{i+1}\). We set that entry equal to \(1\) and we also save a pointer to this entry from the corresponding entry of layer \(i\), labeling that pointer with the value \(v_{i+1}\). The bit operations required to complete layer \(i+1\) are bounded by

$$\begin{aligned} |\mathcal {T}_{i+1}| (\tilde{n}+1)^2 B^{O({\delta })} k^{O({\delta }^2)} O(\log \tilde{n} + \delta \log B + \delta ^2 \log k) \le O(\tilde{n}^2 \log \tilde{n}) B^{O({\delta })} k^{O({\delta }^2)}. \end{aligned}$$

Therefore, the overall time needed to complete \(A\) is

$$\begin{aligned} O(\tilde{n}^3 \log \tilde{n}) B^{O({\delta })} k^{O({\delta }^2)}. \end{aligned}$$

Having completed \(A\), it is easy to check if there is a solution to \((\Sigma )\). A solution exists if and only if

$$\begin{aligned} A(\tilde{n},{n}_0,n_1,n_s,n_b;\mu _1,\ldots ,\mu _{\delta } ; \mu '_1,\ldots ,\mu '_{\delta })=1, \end{aligned}$$

and can be found by tracing the pointers from this cell of \(A\) back to level \(1\). The overall running time is dominated by the time needed to complete \(A\). \(\square \)
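The layered computation is straightforward to implement. The sketch below (our illustration; the function and variable names are ours, and exact `Fraction` arithmetic stands in for the fixed-point grid) stores, for each layer \(i\), the reachable tuples \((z_0,z_1,z_s,z_b;\nu _1,\ldots ,\nu _{\delta };\nu '_1,\ldots ,\nu '_{\delta })\) together with back-pointers, exactly as in the proof.

```python
# Sketch (ours) of the dynamic program in the proof of Claim 1. States are
# tuples (z0, z1, zs, zb, nu_1..nu_delta, nu'_1..nu'_delta); back-pointers
# labelled by the chosen value p_i allow recovering a solution of (Sigma).
from fractions import Fraction

def solve_sigma(T, delta, n0, n1, ns, nb, mus, mups):
    """T: the allowed value sets, one per variable (Fractions in [0,1]);
    mus/mups: target moments. Returns (p_1, ..., p_n) or None."""
    half = Fraction(1, 2)
    zero = (Fraction(0),) * delta
    layers = [{(0, 0, 0, 0) + zero + zero: None}]        # layer 0: empty prefix
    for Ti in T:
        nxt = {}
        for state in layers[-1]:
            z0, z1, zs, zb = state[:4]
            nus, nups = state[4:4 + delta], state[4 + delta:]
            for p in Ti:
                if p == 0:
                    new = (z0 + 1, z1, zs, zb) + nus + nups
                elif p == 1:
                    new = (z0, z1 + 1, zs, zb) + nus + nups
                elif p <= half:                           # contributes to the nu's
                    new = (z0, z1, zs + 1, zb) + tuple(
                        nu + p ** l for l, nu in enumerate(nus, 1)) + nups
                else:                                     # contributes to the nu''s
                    new = (z0, z1, zs, zb + 1) + nus + tuple(
                        nu + p ** l for l, nu in enumerate(nups, 1))
                nxt.setdefault(new, (state, p))           # back-pointer labelled by p
        layers.append(nxt)
    target = (n0, n1, ns, nb) + tuple(mus) + tuple(mups)
    if target not in layers[-1]:
        return None                                       # (Sigma) has no solution
    choices, state = [], target
    for i in range(len(T), 0, -1):                        # trace pointers back to layer 1
        state, p = layers[i][state]
        choices.append(p)
    return choices[::-1]
```

Keeping only the reachable states in a dictionary per layer never stores more entries than the cell count bounded above, so this sketch matches the stated running time up to the cost of hashing the state tuples.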

Proof of Lemma 3

Without loss of generality assume that \(0<\lambda _1 \le \lambda _2\) and denote \(\delta = \lambda _2-\lambda _1\). For all \(i\in \{0,1,\ldots \}\), denote

$$\begin{aligned} p_i=e^{-\lambda _1}\frac{\lambda _1^i}{i!} \quad \text { and }\quad q_i=e^{-\lambda _2}\frac{\lambda _2^i}{i!}. \end{aligned}$$

Finally, define \(\mathcal {I}^*=\{i:p_i\ge q_i\}\).

We have

$$\begin{aligned} \sum _{i \in \mathcal {I}^*}{|p_i - q_i|} = \sum _{i \in \mathcal {I}^*}{(p_i - q_i)}&\le \sum _{i \in \mathcal {I}^*}{\frac{1}{i!}(e^{-\lambda _1}\lambda _1^i - e^{-\lambda _1-\delta }\lambda _1^i)}\\&= \sum _{i \in \mathcal {I}^*}{\frac{1}{i!}e^{-\lambda _1}\lambda _1^i(1 - e^{-\delta })}\\&\le (1 - e^{-\delta }) \sum _{i=0}^{+\infty }{\frac{1}{i!}e^{-\lambda _1}\lambda _1^i}=1 - e^{-\delta }. \end{aligned}$$

On the other hand

$$\begin{aligned} \sum _{i \notin \mathcal {I}^*}{|p_i - q_i|} = \sum _{i \notin \mathcal {I}^*}{(q_i - p_i)}&\le \sum _{i \notin \mathcal {I}^*}{\frac{1}{i!}(e^{-\lambda _1}(\lambda _1+\delta )^i - e^{-\lambda _1}\lambda _1^i)}\\&= \sum _{i \notin \mathcal {I}^*}{\frac{1}{i!}e^{-\lambda _1}((\lambda _1+\delta )^i-\lambda _1^i)}\\&\le \sum _{i=0}^{+\infty }{\frac{1}{i!}e^{-\lambda _1}((\lambda _1+\delta )^i-\lambda _1^i)}\\&= e^{\delta }\sum _{i=0}^{+\infty }{\frac{1}{i!}e^{-(\lambda _1+\delta )}(\lambda _1+\delta )^i }- \sum _{i=0}^{+\infty }{\frac{1}{i!}e^{-\lambda _1}\lambda _1^i}\\&= e^{\delta }-1. \end{aligned}$$

Combining the above we get the result. \(\square \)
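Combining the two displays bounds the total variation distance by \(\frac{1}{2}\left( (1-e^{-\delta })+(e^{\delta }-1)\right) \). The short Python check below (our illustration only) compares this combined bound against the exact total variation distance for a few parameter pairs.

```python
# Numerical check (illustration only) of the bound obtained by combining the
# two displays above: d_TV(Poi(l1), Poi(l2)) <= ((1 - e^-d) + (e^d - 1)) / 2.
from math import exp

def poisson_pmf(lam, n_max):
    """P[X = 0], ..., P[X = n_max - 1] for X ~ Poisson(lam), computed iteratively."""
    pmf, p = [], exp(-lam)
    for i in range(n_max):
        pmf.append(p)
        p *= lam / (i + 1)
    return pmf

def tv_poisson(l1, l2, n_max=200):
    return 0.5 * sum(abs(a - b)
                     for a, b in zip(poisson_pmf(l1, n_max), poisson_pmf(l2, n_max)))

for l1, l2 in [(1.0, 1.3), (5.0, 5.5), (10.0, 10.1)]:
    delta = abs(l2 - l1)
    bound = 0.5 * ((1 - exp(-delta)) + (exp(delta) - 1))
    print(f"exact {tv_poisson(l1, l2):.4f}  <=  bound {bound:.4f}")
```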

Proof of Lemma 4

We have

$$\begin{aligned} {\mu \over m'}=\frac{\sum _{i \in \mathcal{M}}p_i'+t}{m'} \le q=\frac{\ell ^*}{n} \le \frac{\sum _{i \in \mathcal{M}}p_i'+t}{m'}+\frac{1}{n} = {\mu \over m'}+{1 \over n}. \end{aligned}$$

Multiplying by \(m'\) we get:

$$\begin{aligned} {\mu } \le m' q \le \mu +{m' \over n}. \end{aligned}$$

As \(\mu ' = m' q\) and \(m' \le n\), we get \({\mu } \le \mu ' \le \mu + 1.\) Moreover, since \(m\ge k^3\),

$$\begin{aligned} \mu \ge \sum _{i \in \mathcal{M}}p'_i \ge m \frac{1}{k} \ge k^2. \end{aligned}$$

For the variances we have:

$$\begin{aligned} \sigma '^2 = m' q (1-q)&= m' \cdot {\ell ^* \over n} \cdot \left( 1- {\ell ^*-1 \over n} -{1 \over n}\right) \nonumber \\&\ge \left( \sum _{i \in \mathcal {M}}p_i' +t \right) \cdot \left( 1- {1 \over n} - {\sum _{i \in \mathcal {M}} p_i' +t \over m'} \right) \nonumber \\&= (1-1/n) \left( \sum _{i \in \mathcal {M}}p_i' +t \right) - {(\sum _{i \in \mathcal {M}} p_i' +t)^2 \over m'}\nonumber \\&\ge (1-1/n) \left( \sum _{i \in \mathcal {M}}p_i' +t \right) - {(\sum _{i \in \mathcal {M}} p_i' +t)^2 \over \frac{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}{\sum _{i \in \mathcal{M}}p_i'^2+t}}\nonumber \\&= \sum _{i \in \mathcal{M}}p'_i(1-p'_i) -{1 \over n} \left( \sum _{i \in \mathcal {M}}p_i' +t \right) \nonumber \\&= \sigma ^2 -{1 \over n} \left( \sum _{i \in \mathcal {M}}p_i' +t \right) \ge \sigma ^2 -1. \end{aligned}$$
(21)

In the other direction:

$$\begin{aligned} \sigma '^2 = m' q (1-q)&= m' \cdot \left( {\ell ^* -1 \over n} + {1 \over n} \right) \cdot \left( 1- {\ell ^* \over n} \right) \nonumber \\&\le m' \cdot \left( {\ell ^* -1 \over n}\right) \cdot \left( 1- {\ell ^* \over n} \right) +{m' \over n} \nonumber \\&\le \left( {\sum _{i \in \mathcal{M}}p_i' + t}\right) \cdot \left( 1- \frac{\sum _{i \in \mathcal{M}}p_i' + t}{m'}\right) +1\nonumber \\&= \left( {\sum _{i \in \mathcal{M}}p_i' + t}\right) - \frac{\left( \sum _{i \in \mathcal{M}}p_i' + t \right) ^2}{m'}+1\nonumber \\&\le \left( {\sum _{i \in \mathcal{M}}p_i' + t}\right) - \frac{\left( \sum _{i \in \mathcal{M}}p_i' + t \right) ^2}{{{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2} \over {\sum _{i \in \mathcal{M}}p_i'^2+t}} +1 }+1\nonumber \\&= \left( {\sum _{i \in \mathcal{M}}p_i' + t}\right) - \left( {\sum _{i \in \mathcal{M}}p_i'^2+t}\right) \nonumber \\&\times \frac{\left( \sum _{i \in \mathcal{M}}p_i' + t \right) ^2}{{{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}} +{\sum _{i \in \mathcal{M}}p_i'^2+t} }+1\nonumber \\&= \left( {\sum _{i \in \mathcal{M}}p_i' + t}\right) - \left( {\sum _{i \in \mathcal{M}}p_i'^2+t}\right) \nonumber \\&\times \left( 1\!-\!{\sum _{i \in \mathcal{M}}p_i'^2 + t \over {{{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}} \!+\!{\sum _{i \in \mathcal{M}}p_i'^2+t} }}\right) \!+\!1\nonumber \\&= {\sum _{i \in \mathcal{M}}p_i' (1-p_i')} + {\left( \sum _{i \in \mathcal{M}}p_i'^2 + t \right) ^2 \over {{{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}} +{\sum _{i \in \mathcal{M}}p_i'^2+t} }}+1\nonumber \\&= \sigma ^2 + {\left( \sum _{i \in \mathcal{M}}p_i'^2 + t \right) ^2 \over {{{\left( \sum _{i \in \mathcal{M}}p_i' + t\right) ^2}} +{\sum _{i \in \mathcal{M}}p_i'^2+t} }}+1 \le \sigma ^2+2. \end{aligned}$$
(22)

Finally,

$$\begin{aligned} \sigma ^2 = \sum _{i \in \mathcal{M}}p'_i(1-p'_i) \ge m \frac{1}{k}\left( 1-\frac{1}{k} \right) \ge k^2\left( 1-\frac{1}{k} \right) . \end{aligned}$$

\(\square \)

Proposition 2

For all \(d \in [n]\), Condition \((C_d)\) in the statement of Theorem 3 is equivalent to the following condition:

$$\begin{aligned} (V_d):\quad \mathbb {E}\left[ \left( \sum _{i=1}^n X_i\right) ^{\ell }\right] = \mathbb {E}\left[ \left( \sum _{i=1}^n Y_i\right) ^{\ell }\right] ,\quad ~\text {for all } \ell \in [d]. \end{aligned}$$

Proof of Proposition 2

\((V_d) \Rightarrow (C_d)\): First notice that, for all \(\ell \!\in \! [n], \mathbb {E}\left[ \left( \sum _{i=1}^n X_i\right) ^{\ell }\right] \) can be written as a weighted sum of the elementary symmetric polynomials \(\psi _1({\mathcal {P}}), \psi _2({\mathcal {P}}),\ldots ,\psi _{\ell }({\mathcal {P}})\), where, for all \(t \in [n], \psi _{t}({\mathcal {P}})\) is defined as

$$\begin{aligned} \psi _{t}({\mathcal {P}}):=(-1)^{t}\sum _{\begin{array}{c} S \subseteq [n]\\ |S|=t \end{array}} \prod _{i \in S} p_i. \end{aligned}$$

Since the coefficient of \(\psi _{\ell }({\mathcal {P}})\) in the expansion of the \(\ell \)-th moment is non-zero, \((V_d)\) then implies, by induction on \(\ell \),

$$\begin{aligned} \psi _{\ell }({\mathcal {P}})=\psi _{\ell }(\mathcal {Q}),\quad \text {for all }\ell = 1,\ldots ,d. \end{aligned}$$
(23)

Next, for all \(t \in [n]\), define \(\pi _{t}({\mathcal {P}})\) to be the power sum symmetric polynomial of degree \(t\)

$$\begin{aligned} \pi _{t}(\mathcal {P}):=\sum _{i=1}^np_i^{t}. \end{aligned}$$

Now, fix any \(\ell \le d\). Since \(\pi _{\ell }({\mathcal {P}})\) is a symmetric polynomial of degree \(\ell \) on the variables \(p_1,\ldots ,p_n\), it can be expressed as a function of the elementary symmetric polynomials \(\psi _1({\mathcal {P}}),\ldots ,\psi _{\ell }({\mathcal {P}})\). So, by (23), \(\pi _{\ell }(\mathcal {P}) = \pi _{\ell }(\mathcal {Q})\). Since this holds for any \(\ell \le d, (C_d)\) is satisfied.

The implication \((C_d) \Rightarrow (V_d)\) is established in a similar fashion. \((C_d)\) says that

$$\begin{aligned} \pi _{\ell }(\mathcal {P})=\pi _{\ell }(\mathcal {Q}), \quad \text {for all } \ell =1,\ldots ,d. \end{aligned}$$
(24)

Fix some \(\ell \le d\). \(\mathbb {E}\left[ \left( \sum _{i=1}^n X_i\right) ^{\ell }\right] \) can be written as a weighted sum of the elementary symmetric polynomials \(\psi _1({\mathcal {P}}), \psi _2({\mathcal {P}}),\ldots ,\psi _{\ell }({\mathcal {P}})\). Also, for all \(t \in [\ell ], \psi _t({\mathcal {P}})\) can be written as a polynomial function of \(\pi _1({\mathcal {P}}),\ldots ,\pi _{t}({\mathcal {P}})\) (see, e.g., [29]). So from (24) it follows that \(\mathbb {E}\left[ \left( \sum _{i=1}^n X_i\right) ^{\ell }\right] =\mathbb {E}\left[ \left( \sum _{i=1}^n Y_i\right) ^{\ell }\right] \). Since this holds for any \(\ell \le d, (V_d)\) is satisfied. \(\square \)
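The two passages between power sums and elementary symmetric polynomials used above are instances of Newton's identities, \(t\,e_t=\sum _{j=1}^{t}(-1)^{j-1}e_{t-j}\pi _j\), where \(e_t=(-1)^t\psi _t\) denotes the unsigned elementary symmetric polynomial. The small self-contained Python check below (our illustration, with helper names of our choosing) recovers \(e_1,\ldots ,e_t\) from \(\pi _1,\ldots ,\pi _t\) and compares against a direct computation.

```python
# Newton's identities: t * e_t = sum_{j=1..t} (-1)^(j-1) * e_{t-j} * pi_j,
# with e_0 = 1. Used here (illustration only) to recover e_1..e_3 from the
# power sums pi_1..pi_3 and compare against a direct computation.
from fractions import Fraction
from itertools import combinations
from math import prod

def elementary_from_power_sums(pi, t_max):
    e = [Fraction(1)]                            # e_0 = 1
    for t in range(1, t_max + 1):
        s = sum((-1) ** (j - 1) * e[t - j] * pi[j] for j in range(1, t + 1))
        e.append(s / t)
    return e[1:]

p = [Fraction(1, 4), Fraction(1, 3), Fraction(2, 5)]
pi = {l: sum(x ** l for x in p) for l in range(1, 4)}
direct = [sum(prod(c) for c in combinations(p, t)) for t in range(1, 4)]
assert elementary_from_power_sums(pi, 3) == direct
```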

Corollary 1

Let \(\mathcal {P}:=(p_i )_{i=1}^n \in [1/2,1]^n\) and \(\mathcal {Q}:=(q_i)_{i=1}^n \in [1/2,1]^n\) be two collections of probability values in \([1/2,1]\). Let also \(\mathcal {X}:=(X_i)_{i=1}^n\) and \(\mathcal {Y}:=(Y_i)_{i=1}^n\) be two collections of mutually independent indicators with \(\mathbb {E}[X_i]=p_i\) and \(\mathbb {E}[Y_i]=q_i\), for all \(i \in [n]\). If for some \(d \in [n]\) Condition \((C_d)\) in the statement of Theorem 3 is satisfied, then

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}{X_i},\sum _{i}{Y_i}\right) \le 13(d+1)^{1/4} 2^{-(d+1)/2}. \end{aligned}$$

Proof of Corollary 1

Define \(X'_i = 1-X_i\) and \(Y'_i=1-Y_i\), for all \(i\). Also, denote \(p_i'=\mathbb {E}[X'_i]=1-p_i\) and \(q_i'=\mathbb {E}[Y'_i]=1-q_i\), for all \(i\). By assumption:

$$\begin{aligned} \sum _{i=1}^n \left( 1-p_i'\right) ^{\ell } = \sum _{i=1}^n \left( 1-q_i'\right) ^{\ell },\quad \text {for all } \ell =1,\ldots ,d. \end{aligned}$$
(25)

Using the Binomial theorem and induction, we see that (25) implies:

$$\begin{aligned} \sum _{i=1}^n p_i'^{\ell } = \sum _{i=1}^n q_i'^{\ell },\quad \text {for all } \ell =1,\ldots ,d. \end{aligned}$$

Hence we can apply Theorem 3 to deduce

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}{X'_i},\sum _{i}{Y'_i}\right) \le 13(d+1)^{1/4} 2^{-(d+1)/2}. \end{aligned}$$

The proof is completed by noticing that \(\sum _i X_i = n - \sum _i X'_i\) and \(\sum _i Y_i = n - \sum _i Y'_i\), so that

$$\begin{aligned} d_{\mathrm{TV}}\left( \sum _{i}{X_i},\sum _{i}{Y_i}\right) =d_{\mathrm{TV}}\left( \sum _{i}{X'_i},\sum _{i}{Y'_i}\right) . \end{aligned}$$

\(\square \)
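The final equality holds because the map \(m \mapsto n-m\) is a bijection applied to both sums simultaneously, which leaves total variation distance unchanged. A short exact check (our illustration only), computing Poisson Binomial probability mass functions by convolution:

```python
# Exact check (illustration only) that d_TV(sum X_i, sum Y_i) equals
# d_TV(sum X'_i, sum Y'_i) when X'_i = 1 - X_i and Y'_i = 1 - Y_i:
# reflecting both pmfs about n/2 preserves total variation distance.
def pbd_pmf(ps):
    """pmf of PBD(ps) over {0, ..., n}, by convolving the Bernoulli pmfs."""
    pmf = [1.0]
    for p in ps:
        pmf = [(1 - p) * a + p * b
               for a, b in zip(pmf + [0.0], [0.0] + pmf)]
    return pmf

def tv(f, g):
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))

ps = [0.6, 0.7, 0.8, 0.95]
qs = [0.55, 0.75, 0.85, 0.9]
fx, fy = pbd_pmf(ps), pbd_pmf(qs)
fxp, fyp = pbd_pmf([1 - p for p in ps]), pbd_pmf([1 - q for q in qs])
print(tv(fx, fy), tv(fxp, fyp))        # the two values coincide
```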