Abstract
Under the assumption that the distribution of a nonnegative random variable \(X\) admits a bounded coupling with its size biased version, we prove simple and strong concentration bounds. In particular the upper tail probability is shown to decay at least as fast as the reciprocal of a Gamma function, guaranteeing a moment generating function that converges everywhere. The class of infinitely divisible distributions with finite mean, whose Lévy measure is supported on an interval contained in \([0,c]\) for some \(c < \infty \), forms a special case in which this upper bound is logarithmically sharp. In particular the asymptotic estimate for the Dickman function, that \(\rho (u) \approx u^{-u}\) for large \(u\), is shown to be universal for this class. A special case of our bounds arises when \(X\) is a sum of independent random variables, each admitting a 1-bounded size bias coupling. In this case, our bounds are comparable to Chernoff–Hoeffding bounds; however, ours are broader in scope, sharper for the upper tail, and equal for the lower tail. We discuss bounded and monotone couplings, give a sandwich principle, and show how this gives an easy conceptual proof that any finite positive mean sum of independent Bernoulli random variables admits a 1-bounded coupling with the same sum conditioned to be nonzero.
1 Introduction and main results
For any nonnegative random variable \(X\) with \(0 < \mathbb {E \,}X < \infty \), we say that the distribution of \(Y\) is the size biased distribution of \(X\), written \(Y =^d X^*\), if the Radon–Nikodym derivative of the distribution of \(Y\), with respect to the distribution of \(X\), is given by \(\mathbb {P}(Y \in dx)/\mathbb {P}(X \in dx) = x/ \mathbb {E \,}X\). If \(Y =^d X^*\), then for all bounded measurable \(g\), \(\mathbb {E \,}g(Y) = \mathbb {E \,}( X g(X)) / \mathbb {E \,}X\). For much more information about size biased distributions see [3], or [1, pp 78–80].
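For a discrete law the size biased distribution can be written down explicitly, and the defining identity checked by direct summation. The following sketch is purely illustrative (the Binomial(3, 0.4) example and the test function \(g\) are our choices, not taken from the text); it verifies \(\mathbb {E \,}g(Y) = \mathbb {E \,}( X g(X)) / \mathbb {E \,}X\):

```python
import math

# Size biasing a discrete distribution: p*_k = k p_k / E X.
# Binomial(3, 0.4) is used only as a concrete example; g is an arbitrary bounded function.
n, p = 3, 0.4
pmf = {k: math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}
mean = sum(k * pk for k, pk in pmf.items())           # E X = np = 1.2
sb_pmf = {k: k * pk / mean for k, pk in pmf.items()}  # size biased pmf

g = lambda k: math.sin(k) + 2.0                       # any bounded test function
lhs = sum(g(k) * pk for k, pk in sb_pmf.items())      # E g(Y)
rhs = sum(k * g(k) * pk for k, pk in pmf.items()) / mean  # E (X g(X)) / E X
assert abs(lhs - rhs) < 1e-12
```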
In this paper we shall assume that \(X\) admits a \(c\)-bounded size bias coupling. More precisely we make the hypothesis
BSBC: \(X\) is a non-negative random variable with positive finite expected value \(\mathbb {E \,}X = a\), and for \(Y =^d X^*\) there exists a coupling in which \( Y \le X+c\) for some \(c \in (0,\infty )\).
Throughout the paper the numbers \(a\) and \(c\) will always refer to their definitions in BSBC. Examples of random variables which admit a \(c\)-bounded size bias coupling are given in Sect. 5, and some equivalent formulations of the assumption BSBC are given in Sect. 7.
Define
When \(X\) has finite mean \(a\), concentration inequalities refer to upper bounds on the upper tail probability \(G(x) = \mathbb {P}(X \ge x)\) for \(x \ge a\) and on the lower tail probability \(F(x) = \mathbb {P}(X \le x)\) for \(x \le a\). For a review of early results on concentration inequalities see Ledoux [19]. More recently Chatterjee [7] has used Stein’s method for exchangeable pairs to obtain concentration inequalities, see also [8].
The remarkably effective idea of using bounded size bias couplings to prove concentration inequalities comes from Ghosh and Goldstein [12]; their proof is inspired by the argument, based on the convexity of \(x \mapsto e^x\), used to prove the Hoeffding concentration bounds, see [18]. For many examples of the application of concentration bounds derived from size bias couplings, in situations involving dependence, see [2, 5, 11, 12]. Some details of the example in [2] are given in Sect. 5. An extension to a multivariate setting is given in Ghosh and Işlak [13].
Number theorists denote by \(\Psi (x,y)\) the number of positive integers not exceeding \(x\) and free of prime factors larger than \(y\). Dickman [9] showed that for any \(u > 0\)
where \(\rho (u)\), the Dickman function, is the unique continuous function satisfying
In Hildebrand and Tenenbaum [17, Lemma 2.5] it is shown that
and then a simple inductive proof gives the inequality
where \(\Gamma (u)\) denotes the usual Gamma function. It is well known that \(\log \Gamma (u) \sim u \log u\) as \(u \rightarrow \infty \), and the Dickman function \(\rho (u)\) exhibits similar asymptotic behavior: \(\log \rho (u) \sim -u \log u\) as \(u \rightarrow \infty \), see [17, Cor 2.3].
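Both the inequality against \(1/\Gamma (u+1)\) and the value \(\rho (2) = 1 - \log 2\) can be checked numerically. The sketch below assumes the standard characterization of \(\rho \), namely \(u \rho '(u) = -\rho (u-1)\) with \(\rho \equiv 1\) on \([0,1]\) (the displayed equations above are not reproduced here, so this form is stated as an assumption):

```python
import math

# Numerically solve the Dickman delay differential equation
#   u * rho'(u) = -rho(u - 1),  rho(u) = 1 for 0 <= u <= 1
# (standard characterization, assumed here), on a uniform grid.
h = 1e-4
N = int(round(1 / h))               # grid points per unit interval
rho = [1.0] * (5 * N + 1)           # rho = 1 on [0, 1]
for n in range(N, 5 * N):
    u = n * h
    # trapezoidal step for rho'(u) = -rho(u - 1) / u
    rho[n + 1] = rho[n] - 0.5 * h * (rho[n - N] / u + rho[n + 1 - N] / (u + h))

assert abs(rho[2 * N] - (1 - math.log(2))) < 1e-3   # exact: rho(2) = 1 - log 2
for u in (2, 3, 4, 5):
    assert rho[u * N] <= 1 / math.gamma(u + 1)      # rho(u) <= 1/Gamma(u+1)
```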
Inspired by the proof of (1), we prove stronger concentration inequalities than [12], under weaker hypotheses, and with a simpler proof. In the following, the notation \(\lfloor x \rfloor \) is used for the floor function, that is, the greatest integer less than or equal to \(x\).
Theorem 1.1
Assume BSBC. Given \(x\) let
so that \(k\) is a nonnegative integer, possibly zero. Then
and
Remark 1.1
(Scaling) It is simplest, both for notation and concept, to work with the special case where the constant \(c\) in BSBC satisfies \(c=1\). The results derived for this special case easily transform into results for the general case, since if \(Y =^d X^*\) and \(Y \le X+c\), then
In particular the upper bounds in (3) and (4) satisfy
for all \(a,c>0\) and \(x \ge 0\).
An opportunity to use (6) occurs in the following result, which provides a more convenient closed form version of the concentration inequalities above.
Theorem 1.2
The upper tail bound defined by (3) and the lower tail bound defined by (4) satisfy
and
with, in each case, equality if and only if \(x-a\) is an integer.
It is an immediate consequence of these results that if \(X\) satisfies BSBC then
In Sect. 6 we present a class of random variables satisfying BSBC for which the \(\limsup \) and inequality in (9) can be replaced by \(\lim \) and equality, see Theorem 6.1.
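The relation between the product bound and the Gamma bound, including the equality case, is easy to check numerically. In the sketch below the closed forms are reconstructions (for \(c=1\): the product \(u(x,a,1)=\prod _{i=0}^{k} a/(x-i)\) inferred from Remarks 2.1 and 5.1, and \(a^{x-a}\,\Gamma (a+1)/\Gamma (x+1)\) for the Gamma bound inferred from the proof in Sect. 3), so the displayed formulas (3) and (7) remain the authority:

```python
import math

def product_bound(x, a):      # u(x, a, 1), reconstructed from Remark 2.1
    k = math.floor(x - a)
    out = 1.0
    for i in range(k + 1):    # includes the factor indexed by i = 0
        out *= a / (x - i)
    return out

def gamma_bound(x, a):        # assumed closed form of (7) for c = 1
    return a**(x - a) * math.gamma(a + 1) / math.gamma(x + 1)

a = 2.3
# equality when x - a is an integer, strict inequality otherwise
assert abs(product_bound(5.3, a) - gamma_bound(5.3, a)) < 1e-12
assert product_bound(5.0, a) < gamma_bound(5.0, a)
```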
An alternative approach, described in Sect. 4, involves an upper bound on the moment generating function of \(X\) and gives the following result.
Theorem 1.3
Assume BSBC. Then
and
Here we use the convention that \((1/0)^0 = \lim _{x \rightarrow 0^+} (1/x)^x = 1\), so we can regard \(\displaystyle { \left( a/x\right) ^{x/c}}\) as being well-defined and taking the value 1 when \(x = 0\). Some elementary analysis, see Lemmas 4.1 and 4.2, allows us to replace the bounds in Theorem 1.3 with the strictly weaker (for \(x \ne a\)) bounds obtained by Ghosh and Goldstein in [12].
Corollary 1.1
Assume BSBC. Then
and
For a simple application of these bounds, suppose that \(X = X_1+X_2+ \cdots + X_n\) is the sum of independent random variables with values in the interval \([0,c]\). Then \(X\) satisfies BSBC, see Sect. 5. The estimates in Theorem 1.3 are the standard Hoeffding inequalities, see Hoeffding [18] and Sect. 9.2, and the estimates in Corollary 1.1 are those obtained by Chatterjee [7] as a simple example of his results on concentration inequalities for exchangeable pairs.
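These bounds are easy to exercise on a small Bernoulli sum, where the tails can be computed exactly. The closed forms used below are reconstructions from the proofs (Theorem 1.3: \((a/x)^{x/c} e^{(x-a)/c}\) for both tails; Corollary 1.1, via Lemmas 4.1 and 4.2: \(\exp (-(x-a)^2/(c(x+a)))\) for the upper tail and \(\exp (-(a-x)^2/(2ac))\) for the lower tail), so the displayed inequalities remain the authority:

```python
import math
from itertools import product

# Reconstructed closed forms (c = 1 here), treated as assumptions:
def thm13(x, a, c=1.0):       # Theorem 1.3 bound, with the x = 0 convention
    return math.exp(-a / c) if x == 0 else (a / x)**(x / c) * math.exp((x - a) / c)

# Exact tail probabilities for a sum of independent Bernoulli(p_i) variables,
# which admits a 1-bounded size bias coupling (Example 5.5).
ps = [0.2, 0.5, 0.7, 0.9]
a = sum(ps)                   # a = 2.3
probs = {}
for bits in product([0, 1], repeat=len(ps)):
    s = sum(bits)
    w = math.prod(p if b else 1 - p for p, b in zip(ps, bits))
    probs[s] = probs.get(s, 0.0) + w

for x in (3.0, 3.5, 4.0):     # upper tail, x >= a
    tail = sum(w for s, w in probs.items() if s >= x)
    assert tail <= thm13(x, a) <= math.exp(-(x - a)**2 / (x + a))
for x in (0.0, 1.0, 2.0):     # lower tail, x <= a
    tail = sum(w for s, w in probs.items() if s <= x)
    assert tail <= thm13(x, a) <= math.exp(-(x - a)**2 / (2 * a))
```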
The proofs of Theorems 1.1, 1.2 and 1.3 and Corollary 1.1 are given in Sects. 2, 3 and 4. Some simple examples of random variables satisfying BSBC are given in Sect. 5, and a particular family of infinitely divisible measures satisfying BSBC, the so called Lévy(\([0,c]\)) random variables, is studied in Sect. 6. Section 7 deals with the relationship between monotone and bounded couplings. Finally in Sects. 8 and 9 we consider the relative strengths of the bounds in Theorems 1.1 and 1.2 (coming from the elementary argument in Lemma 2.1) as opposed to those in Theorem 1.3 and Corollary 1.1 (obtained via the moment generating function); the technical calculations appear in Sect. 8 and the discussion is in Sect. 9.
2 Product bounds
We start with an elementary argument giving powerful bounds on the upper and lower tail probabilities.
Lemma 2.1
Assume BSBC. Then
and
Proof
To prove the upper bound on \(G(x)\), note that BSBC implies that the event \(Y \ge x\) is a subset of the event \(X \ge x-c\). Hence for \(x>0\),
When \(x>0\) we can divide by \(x\) to get (14).
To prove the upper bound on \(F(x)\), note that BSBC implies that the event \(Y \le x\) is a superset of the event \(X \le x-c\). Hence
This does not require that \(x\) be positive; for \(x<0\) it is the trivial inequality, that \(0 \ge 0\). Replacing \(x\) by \(x+c\) and dividing by \(a>0\) yields (15). \(\square \)
Proof of Theorem 1.1
Given \(x>0\), the obvious strategy for obtaining good bounds is to iterate (14) or (15) for as long as the new value of \(x\), say \(x'=x \pm i c\), still gives a favorable ratio, \(a/x'\) in (14), or \((x'+c)/a\) in (15), and using \(G(t) \le 1\) or \(F(t) \le 1\) as needed, to finish off. The proof is now a simple matter of carrying out this strategy. \(\square \)
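The iteration just described is easy to carry out numerically. In the sketch below the closed forms of the product bounds, \(u(x,a,c)=\prod _{i=0}^{k} a/(x-ic)\) and \(\ell (x,a,c)=\prod _{i=1}^{k} (x+ic)/a\), are inferred from Remarks 2.1 and 5.1 and should be treated as reconstructions of (3) and (4); they are checked against exact Poisson tails, the Poisson distribution satisfying BSBC with \(c=1\) (Example 5.1):

```python
import math

def upper_product_bound(x, a, c=1.0):
    # u(x, a, c): includes the factor indexed by i = 0 (Remark 2.1)
    k = math.floor((x - a) / c)
    out = 1.0
    for i in range(k + 1):
        out *= a / (x - i * c)
    return out

def lower_product_bound(x, a, c=1.0):
    # l(x, a, c): excludes the factor indexed by i = 0 (Remark 2.1)
    k = math.floor((a - x) / c)
    out = 1.0
    for i in range(1, k + 1):
        out *= (x + i * c) / a
    return out

# Check against exact Poisson(2) tails.
a = 2.0
pmf = [math.exp(-a) * a**j / math.factorial(j) for j in range(80)]
for x in (3.0, 5.5, 8.0):
    assert sum(pmf[j] for j in range(80) if j >= x) <= upper_product_bound(x, a)
for x in (0.0, 0.5, 1.0):
    assert sum(pmf[j] for j in range(80) if j <= x) <= lower_product_bound(x, a)
```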
Corollary 2.1
Assume BSBC. Then the moment generating function of \(X\) is finite everywhere, that is, for all \(\beta \in \mathbb {R}\), \(M(\beta ) := \mathbb {E \,}e^{\beta X} < \infty \).
Proof
Since \(X \ge 0\), trivially \(M(\beta ) \le 1\) if \(\beta \le 0\), so assume \(\beta > 0\). For motivation: as \(x\) increases by \(c\), \(e^{\beta x}\) increases by a factor of \(e^{\beta c}\), while the upper bound \(u(x,a,c)\) on \(G(x)\) decreases by a factor of \(a/(x+c)\); hence for \(x > x_0 := 2 a e^{\beta c}\), the product \(e^{\beta x} u(x,a,c)\) decreases by a factor of at least 2. More precisely, for \(x \ge a\) we have \(u(x+c,a,c) = \frac{a}{x+c}u(x,a,c)\) and so
for \(x > x_0\). Writing
the series on the right side is bounded by a geometric series with ratio 1/2, and the net result will be \(M(\beta ) \le \exp (\beta x_0) + 2 \exp (\beta (x_0+c))\). \(\square \)
Remark 2.1
Note the difference between the indexing in the products (3, 4): \(u(x,a,c)\) includes the factor indexed by \(i=0\), while \(\ell (x,a,c)\) excludes the factor indexed by \(i=0\). In case \(x \in (a,a+c)\), which is equivalent to \(a < x\) and \(k=0\), the bound in (3) has one factor, and simplifies to \(G(x) \le u(x,a,c) = a/x < 1\). In case \(x \in (a-c,a)\), which is equivalent to \(x < a\) and \(k=0\), the bound in (4) has no factors, and simplifies to the trivial observation \(F(x) \le \ell (x,a,c)=1\).
Remark 2.2
(Product bound combined with the one-sided Chebyshev inequality) The recursive nature of Lemma 2.1 allows the possibility of combining it with other information about \(F(x)\) or \(G(x)\). Here we pursue one possibility.
For a random variable \(X\) with mean \(a\) and variance \(\sigma ^2\), the one-sided Chebyshev inequality states that, for all \(x \le a\), \(F(x) = \mathbb {P}(X \le x)\) is upper bounded by \(\sigma ^2/(\sigma ^2+(a-x)^2)\). In our situation, \(Y =^d X^*\) and \(\mathbb {E \,}X=a\) yields \(\mathbb {E \,}X^2 = a \, \mathbb {E \,}Y\), and \(Y \le X+c\) implies \(\mathbb {E \,}Y \le a+c\). Hence \(\mathbb {E \,}X^2 = a \, \mathbb {E \,}Y \le a(a+c)\), so that \(\sigma ^2 \le ac\). Thus, under the hypotheses of Lemma 2.1, for all \(x \le a\),
We want to improve on (4) by using one-sided Chebyshev in combination with iteration of (15). More precisely, given \(x < a\) and any non-negative integer \(j\) such that \(x+jc \le a\) we can iterate (15) from \(x\) to \(x+jc\) and then use the one-sided Chebyshev inequality at \(x+jc\) to obtain
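The combined bound can be sketched directly from this recipe. The closed form used below, \(\ell _j(x,a,c) = \big (\prod _{i=1}^{j} (x+ic)/a\big ) \cdot ac/(ac+(a-x-jc)^2)\), is a reconstruction obtained by iterating (15) \(j\) times and then applying (16) at \(x+jc\); the displayed formula (17) remains the authority:

```python
import math

def l_j(x, a, c, j):
    # iterate F(x) <= ((x + c)/a) F(x + c) for j steps, then use
    # one-sided Chebyshev with sigma^2 <= ac at the point x + jc
    assert x + j * c <= a
    out = 1.0
    for i in range(1, j + 1):
        out *= (x + i * c) / a
    t = x + j * c
    return out * (a * c) / (a * c + (a - t)**2)

# Example: Poisson(10), c = 1, lower tail at x = 4; every l_j is a valid bound.
a, c, x = 10.0, 1.0, 4.0
exact = sum(math.exp(-a) * a**k / math.factorial(k) for k in range(5))  # F(4)
for j in range(0, 7):
    assert exact <= l_j(x, a, c, j)
```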
3 Gamma bounds
Here we replace the product bounds \(u(x,a,c)\) and \(\ell (x,a,c)\) in Theorem 1.1 by the simpler (but weaker) bounds given in Theorem 1.2. In this section we restrict to the case \(c = 1\). Results for general \(c > 0\) can be recovered using Eq. (6), see Remark 1.1. For \(c=1\), making use of \(z\Gamma (z) = \Gamma (z+1)\), the conclusions (3) and (4) in Theorem 1.1 can be rewritten as: for \(x \ge a\) and \(k=\lfloor x-a \rfloor \),
and for \(0 \le x \le a\) and \(k=\lfloor a-x \rfloor \),
These upper and lower tail bounds might be viewed as too complicated; as \(x\) varies they are not closed-form expressions, and are not analytic functions. Here we replace them by expressions which are analytic in \(a\) and \(x\).
Lemma 3.1
For \(a > 0\) and \(0 \le f \le 1\)
with equality for \(f=0,1\) and strict inequality for \(f \in (0,1)\).
Proof
The result is true (with equality) when \(f = 0\) or \(1\), so we can assume \(0 < f < 1\). We use the integral formula \(\Gamma (x) = \int _0^\infty t^{x-1}e^{-t}dt\) for \(x > 0\). Writing \(t^{a+f-1} = (t^{a-1})^{1-f} (t^a)^f\) and using Hölder’s inequality (with \(p = 1/(1-f)\)) gives
Since \( a \Gamma (a) = \Gamma (a+1)\) we get
and we are done. \(\square \)
Proof of Theorem 1.2
(i) Let \(k = \lfloor x-a \rfloor \) and \(f= x-a-k \in [0,1)\). Since \(c=1\) and \(x \ge a\), this is consistent with the notation in Theorem 1.1. Note that \(x-k=a+f\), and combine (18) with (20).
(ii) Let \(k = \lfloor a-x \rfloor \) and \(f = a-x-k \in [0,1).\) Since \(c=1\) and \(a \ge x\), this is consistent with the notation in Theorem 1.1. Replacing \(f\) by \(1-f\) in (20) gives \(a^f \Gamma (a-f+1) \le \Gamma (a+1)\). Combining this with (19) and noting that \(x+k=a-f\) gives the result. \(\square \)
4 Moment generating function
We recall our notation, \(M(\beta ) := \mathbb {E \,}e^{\beta X}\) for the moment generating function of \(X\). We observe that if \(X\) is Poisson with parameter \(a\), then \({\log M}(\beta ) = a(e^{\beta }-1)\), and \(X\) admits a \(c\)-bounded size bias coupling with \(c=1\), so that in this case, the inequality (21) holds as an equality for all \(\beta \).
Proposition 4.1
Assume BSBC. Then the moment generating function \(M(\beta )\) for \(X\) satisfies
for all \(\beta \in \mathbb {R}\).
Proof
We know from Corollary 2.1 that the moment generating function \(M(\beta )\) is finite for all \(\beta \in \mathbb {R}\). It follows that \(M\) is continuously differentiable and \( M'(\beta ) = \mathbb {E \,}(Xe^{\beta X})\). Moreover, since \(Y\) is the size biased version of \(X\), we have \(\mathbb {E \,}(Xe^{\beta X}) = a \mathbb {E \,}(e^{\beta Y})\). Together we have \(M'(\beta ) = a \mathbb {E \,}(e^{\beta Y})\).
For \(\beta \ge 0\) we have \(e^{\beta Y} \le e^{\beta (X+c)} = e^{\beta c}e^{\beta X}\) so that
Then
so that
for all \(\beta \ge 0\).
For \(\beta \le 0\) we have \(e^{\beta Y} \ge e^{\beta (X+c)} = e^{\beta c}e^{\beta X}\) so that
Then
so that
and the proof is complete. \(\square \)
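Integrating these differential inequalities gives \(M(\beta ) \le \exp (a(e^{\beta c}-1)/c)\) for all \(\beta \), which matches the Poisson equality case noted at the start of this section; this closed form is inferred from the proof and should be checked against the display (21). A quick numerical check for a Bernoulli sum with \(c=1\):

```python
import math

# Check M(beta) <= exp(a (e^beta - 1)) for a sum of independent Bernoullis
# (c = 1); each factor satisfies 1 - p + p e^beta <= exp(p (e^beta - 1)).
ps = [0.1, 0.4, 0.8]
a = sum(ps)
for beta in (-2.0, -0.5, 0.0, 0.7, 2.0):
    M = math.prod(1 - p + p * math.exp(beta) for p in ps)
    assert M <= math.exp(a * (math.exp(beta) - 1)) + 1e-15
```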
The upper tail and lower tail bounds in Theorem 1.3 can now be obtained using the standard “large deviation upper bound” method together with the information about \(M(\beta )\) in Proposition 4.1. Here are the details.
Proof of Theorem 1.3
Suppose first \(x \ge a\). For any \(\beta \ge 0\) we have
Choosing \(\beta = (1/c) \log (x/a) \ge 0\) we get
Now suppose \(0 \le x \le a\). For any \(\beta \le 0\) we have
If \(0 < x \le a\) take \(\beta = (1/c) \log (x/a) \le 0\) to get
while if \(x = 0\) let \(\beta \rightarrow -\infty \) to get \(\mathbb {P}(X \le 0) \le e^{-a/c}\).\(\square \)
The passage from Theorem 1.3 to Corollary 1.1 is a direct application of the following two lemmas.
Lemma 4.1
Suppose \(0 < a \le x\) and \(c >0\). Then
with strict inequality whenever \(x > a\).
Proof
The expressions both take the value 1 when \(x = a\), so we can assume \(x > a\). Jensen’s inequality applied to the function \(f(x) = 1/x\) on the interval \([1,1+u]\) gives \(\log (1+ u) > 2u/(2+u)\) for \(u > 0\). Taking \(u = (x-a)/a\) we get \(\log (x/a) > 2(x-a)/(x+a)\) and so
Dividing by \(c\) and applying the exponential function to both sides gives (22). \(\square \)
Lemma 4.2
Suppose \(0 \le x \le a\) and \(a,c >0\). Then
with strict inequality whenever \(x < a\).
Proof
The expressions both take the value 1 when \(x = a\), so we can assume \(x < a\). Integrating the inequality \(x^{-1} \le (1+x^{-2})/2\) from \(1\) to \(v \) gives \(\log v < (v-1/v)/2\) for \(v > 1\). Putting \(v=a/x\) gives
Dividing by \(c\) and applying the exponential function to both sides gives (23). \(\square \)
5 Examples admitting a 1-bounded size bias coupling
Here we give some examples of random variables \(X\) which satisfy BSBC with \(c = 1\). Examples with general \(c > 0\) can be obtained by scaling, see Remark 1.1.
Example 5.1
If \(X\) has a Poisson distribution, it can be verified directly that \(X+1 =^d X^*\). Conversely if \(X \ge 0\) with \(0 < \mathbb {E \,}X <\infty \) and \(X+1 =^d X^*\), then the inequality in the statement (and proof) of Proposition 4.1 can be replaced by equality, so that \(X\) must have a Poisson distribution.
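The first claim can be verified term by term: the size biased pmf \(k\,\mathbb {P}(X=k)/a\) of a Poisson(\(a\)) variable equals \(\mathbb {P}(X = k-1)\), which is the pmf of \(X+1\). A one-line check:

```python
import math

# Direct verification that X + 1 =^d X* when X is Poisson(a):
# k P(X = k)/a should equal P(X = k - 1) for every k >= 1.
a = 1.7
for k in range(1, 40):
    p_k = math.exp(-a) * a**k / math.factorial(k)
    p_km1 = math.exp(-a) * a**(k - 1) / math.factorial(k - 1)
    assert abs(k * p_k / a - p_km1) < 1e-15
```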
Remark 5.1
(Sharpness of the bounds) Suppose \(X\) is Poisson distributed with mean \(a \in (0,\infty )\). Taking \(x = 0\) in the lower tail bound (11) from Theorem 1.3 we get \(F(0) \le e^{-a}\) whereas the exact value is \(F(0) = e^{-a}\). Therefore in this setting the lower tail bound (11) from Theorem 1.3 is sharp.
Now suppose further that \(a \in (0,1]\). When \(x=n\), the upper tail bound (3) in Theorem 1.1 simplifies, with \(k=n-1\), to \(G(x) \le u(n,a,1)= a^n/n!\), while for large \(n\), \(G(n) \sim \mathbb {P}(X=n) = e^{-a} a^n/n!\). Hence, for large \(x\) with \(x\) an integer, the upper bound (3) is sharp up to a factor of approximately \(e^{a}\). Letting \(a \rightarrow 0\) so that \(e^{a} \rightarrow 1\), one sees that the upper bound (3), for large \(x\), is sharp up to a factor arbitrarily close to 1.
Example 5.2
Lévy([0,1]), the infinitely divisible distributions with Lévy measure supported on \([0,1]\). This is the case \(c=1\) of the Lévy([0,\(c\)]) distributions, discussed in detail in Sect. 6.
Example 5.3
Let \(X\) be a random variable with values in \([0,1]\), with \(\mathbb {E \,}X > 0\). This includes, but is not restricted to, Bernoulli random variables. The size biased version \(Y\), say, of \(X\) takes values in \([0,1]\) also, and hence \(Y \le 1 \le 1+X\), so that \(X\) admits a 1-bounded size bias coupling.
Example 5.4
The uniform distribution on \([0,b]\) admits a 1-bounded size bias coupling for any \(b \in (0,4]\).
Proof
Suppose that \(X\) is uniformly distributed on \([0,1]\), and that \(Y\) is the size biased version of \(X\). On \([0,1]\), the density of \(X\) is \(f_X(x)=1\), the density of \(Y\) is \(f_Y(x)=2x\), the cumulative distribution functions are \(F_X(t)=t\) and \(F_Y(t)=t^2\), and the inverse cumulative distribution functions are \(F_X^{(-1)}(u)=u\) and \(F_Y^{(-1)}(u)=\sqrt{u}\) for \(0 \le u \le 1\). Observe that \(\max _{0 \le u \le 1} (\sqrt{u}-u) = \frac{1}{4}\), achieved at \(u=\frac{1}{4}\). Hence, using the quantile transform as in the proof of Lemma 7.1, there is a \(c\)-bounded size bias coupling for the standard uniform with \(c=\frac{1}{4}\). By scaling, as in Remark 1.1, the uniform distribution on \([0,b]\) admits a \(b/4\)-bounded size bias coupling. \(\square \)
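The key calculus fact in this proof, that the quantile coupling gap \(\sqrt{u}-u\) on \([0,1]\) is maximized at \(u = \frac{1}{4}\) with value \(\frac{1}{4}\), is easily confirmed:

```python
import math

# Quantile coupling for the standard uniform: X = U and Y = sqrt(U)
# (the size biased density on [0,1] is 2x, cdf t^2, quantile sqrt(u)).
# The coupling gap sqrt(u) - u is maximized at u = 1/4, with value 1/4.
gap = lambda u: math.sqrt(u) - u
grid_max = max(gap(u / 10000) for u in range(10001))
assert abs(gap(0.25) - 0.25) < 1e-15
assert grid_max <= 0.25 + 1e-12
```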
Proposition 5.1
Suppose \(X = \sum _{i} X_i\) is the sum of finitely or countably many independent non-negative random variables \(X_i\) with \(0 < \mathbb {E \,}X = \sum _i \mathbb {E \,}X_i <\infty \). If each \(X_i\) admits a 1-bounded size bias coupling, then so does \(X\).
Proof
For each \(i\) there is a coupling of \(X_i\) with its size biased version \(Y_i\), say, so that \(Y_i \le X_i+1\). Let \(I\) denote an index \(i\) chosen (independently of all the \(X_j\) and \(Y_j\)) with probability \(\mathbb {E \,}X_i/\mathbb {E \,}X\). Then \(Y:=X-X_I+Y_I\) is the size biased version of \(X\), see [15, Lemma 2.1] or [3, Sect. 2.4]. Since \(Y_i \le X_i +1\) for all \(i\), it follows that \(Y \le X+1\). \(\square \)
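This construction can be checked exactly for a small Bernoulli sum, where the size biased version of each Bernoulli summand is the constant 1 (the choice of parameters below is ours, for illustration):

```python
import math
from itertools import product

# Exact check of Y = X - X_I + Y_I for a sum of independent Bernoulli(p_i):
# Y_i = 1 (size bias of a Bernoulli), and I = i with probability p_i / a.
ps = [0.2, 0.6, 0.9]
a = sum(ps)

pmf_X, pmf_Y = {}, {}
for bits in product([0, 1], repeat=len(ps)):
    w = math.prod(p if b else 1 - p for p, b in zip(ps, bits))
    x = sum(bits)
    pmf_X[x] = pmf_X.get(x, 0.0) + w
    for i, p in enumerate(ps):            # condition on the choice I = i
        y = x - bits[i] + 1               # replace X_i by its size biased value 1
        pmf_Y[y] = pmf_Y.get(y, 0.0) + w * p / a

# pmf_Y matches the size biased law k P(X = k)/a; also Y <= X + 1 by construction.
for k, w in pmf_Y.items():
    assert abs(w - k * pmf_X.get(k, 0.0) / a) < 1e-12
```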
Example 5.5
By Proposition 5.1, if \(X\) is Binomial, or more generally if \(X\) is the sum of (possibly infinitely many) independent Bernoulli random variables (with possibly different parameters) with \(0 < \mathbb {E \,}X < \infty \) then \(X\) has a 1-bounded size bias coupling.
Example 5.6
Let \(C(b,n)\) denote the number of collisions which occur when \(b\) balls are tossed into \(n\) equally likely bins. (A collision is said to occur whenever a ball lands in a bin that already contains at least one ball.) Clearly \(C(b,n)\) can be expressed as the sum of dependent Bernoulli random variables. In [2, Prop 15] it is shown that \(C(b,n)\) admits a 1-bounded size bias coupling.
The paper [2] studies the number \(B(\kappa ,n)\) of balls that must be tossed into \(n\) equally likely bins in order to obtain \(\kappa \) collisions. For the sake of concrete analysis, we suppose here that \(\kappa \sim n^\alpha \) for some fixed \(\alpha \in (1/2,1)\); in this case, the goal is to prove that the variance of \(B(\kappa ,n)\) is asymptotic to \(n/2\). Rényi’s central limit theorem [21] for the number of collisions \(C(b,n)\) obtained with a given number \(b\) of balls, combined with duality, leads easily to a central limit theorem for the number of balls \(B(\kappa ,n)\) that must be tossed; the hard part, establishing the asymptotic variance, requires uniform integrability, which is obtained from concentration inequalities for \(C(b,n)\). The traditional tool, the Azuma–Hoeffding bounded difference martingale approach [4], leads to \(\mathbb {P}( |C(b,n)- \mathbb {E \,}C(b,n) | \ge t) \le 2 \exp ( -t^2/(2b))\). In contrast, bounded size bias coupling inequalities such as Corollary 1.1 involve \(a = \mathbb {E \,}C(b,n)\) rather than \(b\) in the denominator of the fraction in the exponential. The situation in [2] has \(a/b \rightarrow 0\); it turns out that the Azuma–Hoeffding inequality is inadequate in this setting, whereas the bounded size bias coupling inequalities are strong enough, see [2, Sect. 7] for details.
6 Lévy(\([0,c]\)), the infinitely divisible distributions with Lévy measure supported on \([0,c]\)
Among the distributions satisfying BSBC are those with characteristic function of the form
where \(a \in (0,\infty )\) and \(\alpha \) is the probability distribution of a nonnegative nonzero random variable \(D\), with \(\mathbb {P}(D\in [0,c])=1\). Given this characteristic function, the random variable \(X\) has \(a=\mathbb {E \,}X\), and, with \(X,D\) independent, \(X^*=^d X+D\). See [3] for the connection between infinite divisibility of \(X\) and size biasing of \(X\) by adding an independent \(D\). Special cases include:
1. \(c=1, \mathbb {P}(D=1)=1\); \(X\) is Poisson with mean \(a\).
2. \(c=1, a=1\), \(D\) is uniformly distributed on \((0,1)\); \(X\) has density \(f(x) = e^{-\gamma } \rho (x)\) where \(\rho \) is Dickman’s function and \(\gamma \) is Euler’s constant.
The characteristic function in (24) can also be expressed as
Here \(\gamma \) is a nonnegative measure on \((0,\infty )\), with \(\gamma (dy) /\alpha (dy) = a/y\), and may be called the Lévy measure of the infinitely divisible random variable \(X\); see Sato [22, Section 51], Bertoin [6, Section 1.2], or [3].
Theorem 6.1
Suppose \(X\) has distribution given by (24) for \(c >0\), and that for every \(\varepsilon > 0\), the probability measure \(\alpha \) is not supported on \([0,c-\varepsilon ]\). Then \(G(x) := \mathbb {P}(X \ge x)\) satisfies, as \(x \rightarrow \infty \),
Proof
As observed in the Introduction, the upper bound on \(G(x)\) follows directly and easily from (3) in Theorem 1.1 and (7) in Theorem 1.2.
For the lower bound, let \(\varepsilon >0\) be given, with \(\varepsilon < c\). Using the representation (25), the random variable \(X\) can be realized as the constant \(a \alpha _0\) plus the sum of the arrivals in the Poisson process \({\mathcal X}\) on \((0,c]\) with intensity measure \(\gamma \). Let \(Z\) be the number of arrivals of \({\mathcal X}\) in \([c-\varepsilon ,c]\). We have \(X \ge (c - \varepsilon ) Z\). This yields
Finally, \(Z\) is a Poisson random variable with mean \(\lambda = \gamma ([c-\varepsilon ,c] ) \ge (a/c) \alpha ([c-\varepsilon ,c]) > 0\). We recall an elementary calculation: for integers \(k \rightarrow \infty \), the Poisson (\(\lambda \)) distribution for \(Z\) has \(\mathbb {P}(Z \ge k) \ge \mathbb {P}(Z=k)\) with
We use this with \(k = \lceil x/(c-\varepsilon ) \rceil \sim x/(c-\varepsilon )\) and \(\log k \sim \log x\). \(\square \)
Remark 6.1
Most probabilists are familiar with the \(\approx \) notation of (26), with \(a_n \approx b_n\) defined to mean \(\log a_n \sim \log b_n\), for use in the context where \(a_n\) grows or decays exponentially. The standard example is the large deviation statement that for i.i.d. sums, \(\mathbb {P}(S_n \ge a n ) \approx \exp (-n I(a) )\), and the \(\approx \) relation hides factors with a slower than exponential order of growth or decay, in this case \(1/\sqrt{n}\). When both \(a_n\) and \(b_n\) grow or decay even faster, for example with \(a_n \sim n^{\pm n/c}\), an unfamiliar phenomenon arises, with \(\approx \) hiding factors which grow or decay exponentially fast. One example appears in the conclusion (26) of Theorem 6.1, where \(x^{-x/c} \approx (x/c)^{-x/c} \approx 1/ \Gamma (x/c)\) as \(x \rightarrow \infty \)—with the last expression being relevant because it corresponds to the Gamma function upper bound in Theorem 1.2. A second example occurs in (27), where at first it appears strange that the parameter \(\lambda \) does not appear on the right hand side—this reflects the fact that for any fixed \(\lambda , \lambda ' >0\), when \(Z\) and \(Z'\) are Poisson with parameters \(\lambda , \lambda '\) respectively, \(\mathbb {P}(Z=k) \approx \mathbb {P}(Z'=k)\) as \(k \rightarrow \infty \). The first example hides the exponentially growing factor \(c^{x/c}\), and the second hides the factor \((\lambda ' / \lambda )^k\).
7 Bounded coupling, monotone coupling, and a sandwich principle
Suppose that the distributions of random variables \(X,Y\) have been specified, with cumulative distribution functions \(F_X,F_Y\) respectively. We will clarify the relations between hypotheses of the form
It is well-known that if \(Y =^d X^*\), then (31) holds. We observe that if \(Y =^d X^*\), then (28) is the hypothesis BSBC, while (29) and (30) are used as hypotheses for the first results on concentration via size bias couplings, in [12].
Proposition 7.1
Given that (31) holds, all of (28)–(30) are equivalent.
The proof of Proposition 7.1 will be given later in this section.
7.1 One sided bounds
Among the four hypotheses (28) – (31), only (31) is standard. The relation of stochastic domination, that \(X\) lies below \(Y\), written \(X \preceq Y\), is usually defined by the condition
and it is well known that (32) is equivalent to (31). See for example Lemma 7.1. Stochastic domination is often considered in the more general context where \(X,Y\) are random elements in a partially ordered set, see for example [20].
7.2 Two sided bounds: historical perspective
Dudley [10, Prop 1 and Thm 2], following earlier work of Strassen [23], proved the following result. Given distributions \(\mu \) and \(\nu \) respectively for random variables \(X\) and \(Y\) taking values in a Polish metric space \((S,d)\) and \(\alpha > 0\), write \(A^\alpha = \{ x : d(x,A) \le \alpha \}\) to denote the closed \(\alpha \) neighborhood of the set \(A \subset S\). Then the following are equivalent:
When \(X,Y\) are real valued random variables, and \(\alpha = c\), condition (29) is trivially equivalent to (35), and (33) or (34) can be shown to be equivalent to \(\forall t, \ F_Y(t-c) \le F_X(t) \le F_Y(t+c)\).
7.3 Proof of Proposition 7.1
It is clear that (30) implies (29) implies (28). It remains to show that (31) and (28) together imply (30), and this follows immediately by using the implication (i) implies (iii), with \(b = 0\), in the following lemma.
Lemma 7.1
For random variables \(X,Y\) with cumulative distribution functions \(F_X, F_Y\) respectively and \(b,c \in \mathbb {R}\), the following are equivalent:
(i) There exist a coupling such that \(\mathbb {P}(X \le Y+b) =1\) and a coupling such that \(\mathbb {P}(Y \le X+c) = 1\);
(ii) For all \(t\), \(F_Y(t-b) \le F_X(t) \le F_Y(t+c)\);
(iii) There exists a coupling such that \(\mathbb {P}(X-b \le Y \le X+c)=1\);
(iv) There exists a coupling such that \(X-b \le Y \le X+c\) for all \(\omega \).
Proof
To prove that (i) implies (ii) we use the coupling in which \(\mathbb {P}(X \le Y+b) =1\) and calculate
and similarly for \(F_X(t) \le F_Y(t+c)\).
To show (ii) implies (iv) we may use the quantile transform to couple \(X\) and \(Y\) to a single random variable \(U\), uniformly distributed in (0,1). This transform is written informally as
and more formally as \(X(\omega ) := \inf \{t: F_X(t) \ge U(\omega ) \}\), \(Y(\omega ) := \inf \{t: F_Y(t) \ge U(\omega ) \}\). Under this coupling, \(F_Y(t-b) \le F_X(t)\) for all \(t\) implies \(X(\omega ) \le Y(\omega )+b\) for all \(\omega \), and \(F_X(t) \le F_Y(t+c)\) for all \(t\) implies \(Y(\omega ) \le X(\omega )+c\) for all \(\omega \). Together, (ii) implies that \( X(\omega )-b \le Y(\omega ) \le X(\omega )+c \) holds for all \(\omega \).
Finally, (iv) implies (iii) trivially, and (iii) implies (i) a fortiori. \(\square \)
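The quantile coupling in this proof is concrete enough to run. The sketch below (our example: \(X\) Binomial(3, 1/2) and \(Y =^d X^*\), which satisfy BSBC with \(c=1\) by Example 5.5) realizes both variables from one uniform \(U\) and confirms \(0 \le Y - X \le 1\) pointwise:

```python
import bisect

# Quantile-transform coupling: X ~ Binomial(3, 1/2) and Y its size biased
# version, both realized from one uniform U via the inverse cdfs.
pmf_X = [1 / 8, 3 / 8, 3 / 8, 1 / 8]
a = 1.5
pmf_Y = [k * pmf_X[k] / a for k in range(4)]          # size biased pmf

cdf_X = [sum(pmf_X[: k + 1]) for k in range(4)]
cdf_Y = [sum(pmf_Y[: k + 1]) for k in range(4)]

def quantile(cdf, u):                                 # inf{t : F(t) >= u}
    return bisect.bisect_left(cdf, u)

for j in range(1, 1000):                              # sweep u through (0, 1)
    u = j / 1000
    x, y = quantile(cdf_X, u), quantile(cdf_Y, u)
    assert 0 <= y - x <= 1                            # monotone 1-bounded coupling
```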
7.4 A sandwich principle
Corollary 7.1
(Sandwich principle) Suppose \(X,Y,Z\) are random variables with cumulative distribution functions \(F_X, F_Y, F_Z\) satisfying \(F_Z(t) \le F_Y(t) \le F_X(t)\) for all \(t\). If there is a \(c\)-bounded coupling of \(X\) and \(Z\), as defined via (29), then there exists a \(c\)-bounded monotone coupling of \(X\) and \(Y\), as defined by (30).
Proof
Using Lemma 7.1 with \(Z\) in place of \(Y\), the \(c\)-bounded coupling of \(X\) and \(Z\) implies \(F_X(t) \le F_Z(t+c)\) for all \(t\). Then \(F_Y(t) \le F_X(t) \le F_Z(t+c) \le F_Y(t+c)\) for all \(t\), and Lemma 7.1 gives the existence of the \(c\)-bounded monotone coupling of \(X\) and \(Y\). \(\square \)
Application. As an illustration of the sandwich principle, we prove the following corollary. The special case where \(X\) is Binomial\((m,p)\) for \(p \in (0,1)\) and \(m \ge 1\) was proved in [14, Lemmas 3.2, 3.3] by an explicit calculation. The sandwich principle enables a short proof for the general case, with no calculation.
Corollary 7.2
Suppose \(X\) satisfies BSBC, with \(c = 1\), and the distribution of \(Y\) is defined to be that of \(X\), conditional on \(X > 0\). Then there is a coupling in which \(Y-X \in [0,1]\) for all \(\omega \).
Proof
Trivially, \(Y\) stochastically dominates \(X\). Take the distribution of \(Z\) to be the size biased distribution of \(X\). By assumption, there is a coupling in which \(\mathbb {P}(Z-X \in [0,1])=1\). Trivially, the distribution of \(Z\), initially defined as the size-biased distribution of \(X\), is also the size-biased distribution of \(Y\). Hence \(Z\) dominates \(Y\), and the sandwich principle applies with \(c = 1\). \(\square \)
This result may be applied to any of the random variables \(X\) discussed in Sect. 5, and in particular to those obtained using Proposition 5.1. If \(X\) is (non-negative) integer valued, for example, a Binomial random variable or a sum of independent Bernoulli random variables, then \(Y\) is also integer valued and the conclusion of Corollary 7.2 is easily strengthened to \(Y-X \in \{0,1\}\) for all \(\omega \).
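For the Binomial special case the three cumulative distribution functions in the sandwich can be computed exactly; the sketch below (parameters ours) uses the standard fact that the size biased version of Binomial\((n,p)\) has the law of \(1 + \text{Binomial}(n-1,p)\):

```python
import math

# Sandwich check for X ~ Binomial(n, p): Y is X conditioned on X > 0,
# Z =^d X* has the law of X' + 1 with X' ~ Binomial(n - 1, p);
# the cdfs should satisfy F_Z <= F_Y <= F_X pointwise.
n, p = 5, 0.3
pmf_X = [math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]
pmf_Y = [0.0] + [w / (1 - pmf_X[0]) for w in pmf_X[1:]]
pmf_Z = [0.0] + [math.comb(n - 1, k) * p**k * (1 - p)**(n - 1 - k)
                 for k in range(n)]                    # law of X' + 1

cdf = lambda q: [sum(q[: k + 1]) for k in range(n + 1)]
for fz, fy, fx in zip(cdf(pmf_Z), cdf(pmf_Y), cdf(pmf_X)):
    assert fz <= fy + 1e-12 and fy <= fx + 1e-12
```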
8 Analysis of the upper tail and lower tail bounds
In this section we obtain results enabling us to compare the upper and lower tail bounds in Theorem 1.2 with those in Theorem 1.3.
Lemma 8.1
For \(0 < u \le v\)
Proof
For \(x > 0\), using Gauss’ formula, see [24, Sect 12.3],
together with
we obtain
Therefore for \(0 < u \le v\),
\(\square \)
Proposition 8.1
For \(0 < a \le x\),
Proof
Taking \(u = a+1/2\) and \(v=x+1/2\) in (36), we get
giving (37). A simple calculus argument shows that \(t \mapsto (a+t)^a/(x+t)^x\) is decreasing for \(t \in [0,\infty )\). In particular \(\left( a+\frac{1}{2}\right) ^a/\left( x+\frac{1}{2}\right) ^x \le a^a/x^x\), and this gives (38). \(\square \)
Proposition 8.2
For \(0 < x <a\),
Proof
Essentially the same as for Proposition 8.1, but with the roles of \(x\) and \(a\) switched. \(\square \)
9 Comparison of the bounds
9.1 Relative strength of our upper and lower tail bounds
Here we compare the elementary “product bounds” (3, 4) given in Theorem 1.1 with the Gamma function bounds (7, 8) given in Theorem 1.2 and the bounds (10, 11) obtained in Theorem 1.3 from the moment generating function estimate.
It is clear from Theorem 1.2 [together with the scaling relationship (6)] that the product bounds (3, 4) are at least as strong (i.e. small) as the corresponding Gamma bounds (7, 8). However the relationship between the Gamma bounds and the bounds (10, 11) in Theorem 1.3 is more complicated. See Propositions 8.1 and 8.2. For the upper tail, with \(a \le x\), the Gamma bound (7) is sharper than the bound (10) from Theorem 1.3 by a factor \(\sqrt{(a+c/2)/(x+c/2)}\). However, the situation is reversed for the lower tail, with \(0 \le x \le a\), where now the bound (11) from Theorem 1.3 is sharper than the Gamma bound (8) by a factor \(\sqrt{(x+c/2)/(a+c/2)}\). Since the Gamma bound (8) and the product bound (4) agree whenever \(a-x\) is an integer, this suggests (but does not prove) that the bound (11) is the best of the three for the lower tail.
Numerical investigations suggest that (11) is in fact the best estimate to use for the lower tail, beating the simple product bound \(\ell (x,a,c)\) of (4). Recall, however, from Remark 2.2 the lower tail bounds derived from (15) in combination with the one-sided Chebyshev inequality (16), in particular the functions \(\ell _j(x,a,c)\) defined in (17). Numerical investigations suggest that for all \(a,c > 0\) with sufficiently large \(a/c\) there exists \(x \in (0,a)\) such that the bound given in (11) is less than the product bound (4) and less than the one-sided Chebyshev estimate (16), but greater than \(\ell _j(x,a,c)\) for some nonnegative integer \(j\le (a-x)/c\).
9.2 Hoeffding bounds
Suppose \(X = X_1+ \cdots + X_n\) where the \(X_i\) are independent and take values in \([0,1]\), and let \(a = \mathbb {E \,}X\). Clearly \(a < \infty \), and to avoid trivialities we assume \(a > 0\). Hoeffding [18, Thm 1] proved that for \(a \le x < n\)
\[
\mathbb {P}(X \ge x) \;\le\; \Big (\frac{a}{x}\Big )^{x} \Big (\frac{n-a}{n-x}\Big )^{n-x}.
\]
The inequality
\[
\Big (\frac{n-a}{n-x}\Big )^{n-x} \;\le\; e^{x-a}
\]
for \(0 < x < n\) (see for example [16]), and the fact that \(\mathbb {P}(X \ge x) = 0\) for \(x > n\), together give the upper tail bound
\[
\mathbb {P}(X \ge x) \;\le\; e^{x-a} \Big (\frac{a}{x}\Big )^{x} \quad \text{for } x \ge a. \qquad (41)
\]
A similar argument with \(X_i\) replaced by \(1-X_i\) and \(X\) replaced by \(n-X\) gives the lower tail bound
\[
\mathbb {P}(X \le x) \;\le\; e^{x-a} \Big (\frac{a}{x}\Big )^{x} \quad \text{for } 0 < x \le a. \qquad (42)
\]
Notice that the right sides of (41) and (42) do not depend on the number \(n\) of summands in \(X\). Other related inequalities, also referred to as Hoeffding or Chernoff–Hoeffding bounds, involve the parameter \(n\).
From Proposition 5.1 we know that any random variable \(X\) of the form above admits a 1-bounded size bias coupling. Therefore the Hoeffding bounds (41, 42) are a special case of the bounds (10, 11) in our Theorem 1.3. Our best upper tail bound, given by (3), is smaller than the Hoeffding upper tail bound (41) by a factor \({\big (\frac{a+1/2}{x+1/2}\big )^{1/2}}\). Moreover our results have broader scope, applying to any random variable \(X\) which admits a 1-bounded size bias coupling; in particular, they apply to sums of independent nonnegative random variables such as Uniform[0,4] and Lévy[0,1] (as discussed in Examples 5.4 and 5.2).
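As a quick numerical illustration of the comparison above, the following sketch evaluates the Hoeffding upper tail bound and the same bound improved by the square-root factor, in the 1-bounded case \(c = 1\). The function names and the sample values \(a = 10\), \(x = 15\) are ours, chosen purely for illustration.

```python
import math

def hoeffding_upper(x, a):
    # Chernoff-Hoeffding upper tail bound e^{x-a} (a/x)^x, valid for x >= a.
    return math.exp(x - a) * (a / x) ** x

def improved_upper(x, a, c=1.0):
    # The upper tail bound multiplied by the improvement factor
    # sqrt((a + c/2)/(x + c/2)) discussed in the text (here with c = 1).
    return hoeffding_upper(x, a) * math.sqrt((a + c / 2) / (x + c / 2))

a, x = 10.0, 15.0  # illustrative values with a <= x
assert improved_upper(x, a) < hoeffding_upper(x, a) < 1.0
print(hoeffding_upper(x, a), improved_upper(x, a))
```

Since the factor \(\sqrt{(a+1/2)/(x+1/2)}\) tends to \(0\) as \(x \rightarrow \infty \) with \(a\) fixed, the relative advantage of the sharper bound grows in the far upper tail.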
References
Arratia, R., Barbour, A.D., Tavaré, S.: Logarithmic Combinatorial Structures: A Probabilistic Approach. EMS Monographs in Mathematics. European Mathematical Society (EMS), Zürich (2003)
Arratia, R., Garibaldi, S., Killian, J.: Asymptotic distribution for the birthday problem with multiple coincidences, via an embedding of the collision process. Preprint (2013)
Arratia, R., Goldstein, L., Kochman, F.: Size bias for one and all. Preprint (2013)
Azuma, K.: Weighted sums of certain dependent random variables. Tôhoku Math. J. (2) 19, 357–367 (1967)
Bartroff, J., Goldstein, L., Işlak, U.: Bounded size biased couplings for log concave distributions and concentration of measure for occupancy models. Preprint (2013)
Bertoin, J.: Subordinators: examples and applications. In: Lectures on probability theory and statistics (Saint-Flour, 1997), Lecture Notes in Math., vol. 1717, pp. 1–91. Springer, Berlin (1999)
Chatterjee, S.: Stein’s method for concentration inequalities. Probab. Theory Relat. Fields 138, 305–321 (2007)
Chatterjee, S., Dey, P.S.: Applications of Stein’s method for concentration inequalities. Ann. Probab. 38(6), 2443–2485 (2010)
Dickman, K.: On the frequency of numbers containing prime factors of a certain relative magnitude. Ark. Mat. Astr. Fys. 22, 1–14 (1930)
Dudley, R.M.: Distances of probability measures and random variables. Ann. Math. Stat. 39, 1563–1572 (1968)
Ghosh, S., Goldstein, L.: Applications of size biased couplings for concentration of measures. Electron. Commun. Probab. 16, 70–83 (2011)
Ghosh, S., Goldstein, L.: Concentration of measures via size-biased couplings. Probab. Theory Relat. Fields 149(1–2), 271–278 (2011)
Ghosh, S., Işlak, U.: Multivariate concentration inequalities with size biased couplings. Preprint (2013)
Goldstein, L., Penrose, M.D.: Normal approximation for coverage models over binomial point processes. Ann. Appl. Probab. 20(2), 696–721 (2010)
Goldstein, L., Rinott, Y.: Multivariate normal approximations by Stein’s method and size bias couplings. J. Appl. Probab. 33(1), 1–17 (1996)
Hagerup, T., Rüb, C.: A guided tour of Chernoff bounds. Inform. Process. Lett. 33(6), 305–308 (1990)
Hildebrand, A., Tenenbaum, G.: Integers without large prime factors. J. Théor. Nombres Bordeaux 5(2), 411–484 (1993)
Hoeffding, W.: Probability inequalities for sums of bounded random variables. J. Amer. Stat. Assoc. 58, 13–30 (1963)
Ledoux, M.: The Concentration of Measure Phenomenon, Mathematical Surveys and Monographs, vol. 89. American Mathematical Society, Providence, RI (2001)
Liggett, T.M.: Interacting particle systems, Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences], vol. 276. Springer, New York (1985)
Rényi, A.: Three new proofs and a generalization of a theorem of Irving Weiss. Publ. Math. Inst. Hung. Acad. Sci. 7, 203–214 (1962). Reprinted in his Selected Papers, vol. III, pp. 67–77
Sato, K.-I.: Lévy Processes and Infinitely Divisible Distributions, Cambridge Studies in Advanced Mathematics, vol. 68. Cambridge University Press, Cambridge (1999). Translated from the 1990 Japanese original, revised by the author
Strassen, V.: The existence of probability measures with given marginals. Ann. Math. Stat. 36, 423–439 (1965)
Whittaker, E.T., Watson, G.N.: A Course of Modern Analysis. An Introduction to the General Theory of Infinite Processes and of Analytic Functions; with an Account of the Principal Transcendental Functions, 4th edn., reprinted. Cambridge University Press, New York (1962)
Arratia, R., Baxendale, P. Bounded size bias coupling: a Gamma function bound, and universal Dickman-function behavior. Probab. Theory Relat. Fields 162, 411–429 (2015). https://doi.org/10.1007/s00440-014-0572-x