1 Introduction and main results

For any nonnegative random variable \(X\) with \(0 < \mathbb {E \,}X < \infty \), we say that the distribution of \(Y\) is the size biased distribution of \(X\), written \(Y =^d X^*\), if the Radon–Nikodym derivative of the distribution of \(Y\), with respect to the distribution of \(X\), is given by \(\mathbb {P}(Y \in dx)/\mathbb {P}(X \in dx) = x/ \mathbb {E \,}X\). If \(Y =^d X^*\), then for all bounded measurable \(g\), \(\mathbb {E \,}g(Y) = \mathbb {E \,}( X g(X)) / \mathbb {E \,}X\). For much more information about size biased distributions see [3], or [1, pp 78–80].
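To make the definition concrete, here is a minimal Python sketch (the discrete distribution and the test function \(g\) are our own arbitrary choices, not taken from the paper) which forms the size biased distribution of a small discrete random variable and checks the identity \(\mathbb {E \,}g(Y) = \mathbb {E \,}( X g(X)) / \mathbb {E \,}X\).

```python
# Minimal sketch: size biasing a discrete distribution and checking
# E[g(Y)] = E[X g(X)] / E[X].  Support, weights and g are arbitrary choices.
support = [1.0, 2.0, 5.0]
probs = [0.5, 0.3, 0.2]

mean_x = sum(x * p for x, p in zip(support, probs))
# size biased probabilities: P(Y = x) = x P(X = x) / E[X]
sb_probs = [x * p / mean_x for x, p in zip(support, probs)]

def g(t):
    return t ** 2 + 1.0   # any bounded (on the support) test function

lhs = sum(g(x) * q for x, q in zip(support, sb_probs))            # E[g(Y)]
rhs = sum(x * g(x) * p for x, p in zip(support, probs)) / mean_x  # E[X g(X)] / E[X]
print(lhs, rhs)   # the two values agree
```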

In this paper we shall assume that \(X\) admits a \(c\)-bounded size bias coupling. More precisely we make the hypothesis

BSBC: \(X\) is a non-negative random variable with positive finite expected value \(\mathbb {E \,}X = a\), and for \(Y =^d X^*\) there exists a coupling in which \( Y \le X+c\) for some \(c \in (0,\infty )\).

Throughout the paper the numbers \(a\) and \(c\) will always refer to their definitions in BSBC. Examples of random variables which admit a \(c\)-bounded size bias coupling are given in Sect. 5, and some equivalent formulations of the assumption BSBC are given in Sect. 7.

Define

$$\begin{aligned} F(x ) := \mathbb {P}( X \le x), \ \ G(x) := \mathbb {P}(X \ge x). \end{aligned}$$

When \(X\) has finite mean \(a\), concentration inequalities refer to upper bounds on the upper tail probability \(G(x) = \mathbb {P}(X \ge x)\) for \(x \ge a\) and on the lower tail probability \(F(x) = \mathbb {P}(X \le x)\) for \(x \le a\). For a review of early results on concentration inequalities see Ledoux [19]. More recently Chatterjee [7] has used Stein’s method for exchangeable pairs to obtain concentration inequalities, see also [8].

The remarkably effective idea of using bounded size bias couplings to prove concentration inequalities comes from Ghosh and Goldstein [12]; their proof is inspired by the argument, based on the convexity of \(x \mapsto e^x\), used to prove the Hoeffding concentration bounds, see [18]. For many examples of the application of concentration bounds derived from size bias couplings, in situations involving dependence, see [2, 5, 11, 12]. Some details of the example in [2] are given in Sect. 5. An extension to a multivariate setting is given in Ghosh and Işlak [13].

Number theorists denote by \(\Psi (x,y)\) the number of positive integers not exceeding \(x\) and free of prime factors larger than \(y\). Dickman [9] showed that for any \(u > 0\)

$$\begin{aligned} \lim _{y \rightarrow \infty } \Psi (y^u,y) y^{-u} = \rho (u) \end{aligned}$$

where \(\rho (u)\), the Dickman function, is the unique continuous function satisfying

$$\begin{aligned} \left\{ \begin{array}{ll} u\rho '(u) = -\rho (u-1) & \quad \text{ if } u > 1, \\ \rho (u) = 1 & \quad \text{ if } 0 \le u \le 1. \end{array} \right. \end{aligned}$$

In Hildebrand and Tenenbaum [17, Lemma 2.5] it is shown that

$$\begin{aligned} u \rho (u) = \int \limits _{u-1}^u \rho (v)\,dv, \quad u \ge 1 \end{aligned}$$

and then a simple inductive proof gives the inequality

$$\begin{aligned} 0 < \rho (u) \le \frac{1}{\Gamma (u+1)}, \quad u \ge 0, \end{aligned}$$
(1)

where \(\Gamma (u)\) denotes the usual Gamma function. It is well known that \(\log \Gamma (u) \sim u \log u\) as \(u \rightarrow \infty \), and the Dickman function \(\rho (u)\) exhibits similar asymptotic behavior: \(\log \rho (u) \sim -u \log u\) as \(u \rightarrow \infty \), see [17, Cor 2.3].
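The inequality (1) is easy to check numerically; the following sketch (grid resolution and range are arbitrary choices of ours) integrates the delay differential equation for \(\rho \) by a forward Euler scheme and compares the result with \(1/\Gamma (u+1)\).

```python
import math

# Sketch: integrate the Dickman delay differential equation
#   u rho'(u) = -rho(u - 1),  rho(u) = 1 on [0, 1],
# by forward Euler on a uniform grid, and check rho(u) <= 1/Gamma(u+1), inequality (1).
steps = 10_000                      # grid points per unit length (arbitrary)
h = 1.0 / steps
umax = 6
rho = [1.0] * (steps * umax + 1)    # rho[i] approximates rho(i * h); equals 1 on [0, 1]

for i in range(steps, steps * umax):
    u = i * h
    rho[i + 1] = rho[i] - h * rho[i - steps] / u    # Euler step for rho'(u) = -rho(u-1)/u

for u in range(1, umax):
    print(u, rho[u * steps], 1 / math.gamma(u + 1))  # rho(u) never exceeds 1/Gamma(u+1)
```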

Inspired by the proof of (1), we prove stronger concentration inequalities than those in [12], under weaker hypotheses, and with a simpler proof. In the following, the notation \(\lfloor x \rfloor \) is used for the floor function, that is, the greatest integer less than or equal to \(x\).

Theorem 1.1

Assume BSBC. Given \(x\) let

$$\begin{aligned} k = \lfloor \frac{| x-a|}{c} \rfloor , \end{aligned}$$
(2)

so that \(k\) is a nonnegative integer, possibly zero. Then

$$\begin{aligned} \text { for } x\ge a, \ \ G(x) \le u(x,a,c) := \prod _{0 \le i \le k} \frac{a}{x-ic}, \end{aligned}$$
(3)

and

$$\begin{aligned} \text { for } 0 \le x \le a, \ \ F(x) \le \ell (x,a,c) := \prod _{0 < i \le k} \frac{x+ic}{a}. \end{aligned}$$
(4)
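For readers who wish to experiment with Theorem 1.1, here is a short Python sketch of the products in (3) and (4); the function names and the sample parameter values are our own choices, not part of the paper.

```python
import math

def k_of(x, a, c):
    # k = floor(|x - a| / c), as in (2)
    return math.floor(abs(x - a) / c)

def upper_tail_bound(x, a, c):
    # u(x, a, c) = prod_{0 <= i <= k} a / (x - i c), for x >= a, as in (3)
    k = k_of(x, a, c)
    return math.prod(a / (x - i * c) for i in range(k + 1))

def lower_tail_bound(x, a, c):
    # l(x, a, c) = prod_{0 < i <= k} (x + i c) / a, for 0 <= x <= a, as in (4)
    k = k_of(x, a, c)
    return math.prod((x + i * c) / a for i in range(1, k + 1))

# example values: a = E[X] = 3, c = 1
print(upper_tail_bound(10.0, 3.0, 1.0))   # bound on P(X >= 10)
print(lower_tail_bound(0.5, 3.0, 1.0))    # bound on P(X <= 0.5)
```

Note that for \(x \ge a\) the smallest denominator in (3) is \(x - kc \ge a > 0\), so the product is well defined.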

Remark 1.1

(Scaling) It is simplest, both for notation and concept, to work with the special case where the constant \(c\) in BSBC satisfies \(c=1\). The results derived for this special case easily transform into results for the general case, since if \(Y =^d X^*\) and \(Y \le X+c\), then

$$\begin{aligned} (Y/c) =^d (X/c)^*, \ \text { and }\quad (Y/c) \le (X/c) + 1. \end{aligned}$$
(5)

In particular the upper bounds in (3) and (4) satisfy

$$\begin{aligned} u(x,a,c) = u(x/c,a/c,1), \ \ \ \ell (x,a,c) = \ell (x/c,a/c,1), \end{aligned}$$
(6)

for all \(a,c>0\) and \(x \ge 0\).

An opportunity to use (6) occurs in the following result, which provides a more convenient closed-form version of the concentration inequalities above.

Theorem 1.2

The upper tail bound defined by (3) and the lower tail bound defined by (4) satisfy

$$\begin{aligned} \text { for } x\ge a, \ \ u(x,a,1) \le \frac{ a^{x-a} \ \Gamma (a+1)}{\Gamma (x+1)}, \end{aligned}$$
(7)

and

$$\begin{aligned} \text { for } 0 \le x \le a, \ \ \ell (x,a,1) \le \frac{\Gamma (a+1)}{a^{a-x}\Gamma (x+1)}, \end{aligned}$$
(8)

with, in each case, equality if and only if \(x-a\) is an integer.
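As a numerical sanity check (our own sketch, with arbitrary parameter values), the Gamma function bounds of Theorem 1.2 can be compared with the products of Theorem 1.1 using `math.lgamma`; the products never exceed the Gamma bounds, with equality exactly when \(x-a\) is an integer.

```python
import math

def product_upper(x, a):   # u(x, a, 1) from (3), with c = 1
    k = math.floor(x - a)
    return math.prod(a / (x - i) for i in range(k + 1))

def gamma_upper(x, a):     # right side of (7): a^{x-a} Gamma(a+1) / Gamma(x+1)
    return math.exp((x - a) * math.log(a) + math.lgamma(a + 1) - math.lgamma(x + 1))

a = 3.0
for x in [3.0, 3.5, 4.0, 7.25]:    # equality expected when x - a is an integer
    print(x, product_upper(x, a), gamma_upper(x, a))
```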

It is an immediate consequence of these results that if \(X\) satisfies BSBC then

$$\begin{aligned} \limsup _{x \rightarrow \infty } \frac{\log G(x)}{(x/c) \log x} \le -1. \end{aligned}$$
(9)

In Sect. 6 we present a class of random variables satisfying BSBC for which the \(\limsup \) and inequality in (9) can be replaced by \(\lim \) and equality, see Theorem 6.1.

An alternative approach, described in Sect. 4, involves an upper bound on the moment generating function of \(X\) and gives the following result.

Theorem 1.3

Assume BSBC. Then

$$\begin{aligned} \text { for } x\ge a, \ \ G(x) \le \left( \frac{a}{x}\right) ^{x/c} e^{(x-a)/c}, \end{aligned}$$
(10)

and

$$\begin{aligned} \text { for } 0 \le x \le a, \ \ F(x) \le \left( \frac{a}{x}\right) ^{x/c} e^{(x-a)/c}. \end{aligned}$$
(11)

Here we use the convention that \((1/0)^0 = \lim _{x \rightarrow 0^+} (1/x)^x = 1\), so we can regard \(\displaystyle { \left( a/x\right) ^{x/c}}\) as being well-defined and taking the value 1 when \(x = 0\). Some elementary analysis, see Lemmas 4.1 and 4.2, allows us to replace the bounds in Theorem 1.3 with the strictly weaker (for \(x \ne a\)) bounds obtained by Ghosh and Goldstein in [12].
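A hedged sketch of the bound appearing in Theorem 1.3, with the convention at \(x = 0\) handled as just described (the helper name and the sample values are ours):

```python
import math

def mgf_bound(x, a, c):
    # (a/x)^(x/c) * e^((x-a)/c); upper bound for P(X >= x) when x >= a,
    # and for P(X <= x) when 0 <= x <= a; value e^{-a/c} at x = 0 by convention
    if x == 0:
        return math.exp(-a / c)
    return math.exp((x / c) * math.log(a / x) + (x - a) / c)

print(mgf_bound(10.0, 3.0, 1.0))   # upper tail, x >= a
print(mgf_bound(0.5, 3.0, 1.0))    # lower tail, 0 <= x <= a
print(mgf_bound(0.0, 3.0, 1.0))    # e^{-3}
```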

Corollary 1.1

Assume BSBC. Then

$$\begin{aligned} \text { for } x\ge a, \ \ G(x) \le \exp \left( -\frac{(x-a)^2}{c(a+x)}\right) , \end{aligned}$$
(12)

and

$$\begin{aligned} \text { for } 0 \le x \le a, \ \ F(x) \le \exp \left( -\frac{(x-a)^2}{2ca}\right) . \end{aligned}$$
(13)

For a simple application of these bounds, suppose that \(X = X_1+X_2+ \cdots + X_n\) is the sum of independent random variables with values in the interval \([0,c]\). Then \(X\) satisfies BSBC, see Sect. 5. The estimates in Theorem 1.3 are the standard Hoeffding inequalities, see Hoeffding [18] and Sect. 9.2, and the estimates in Corollary 1.1 are those obtained by Chatterjee [7] as a simple example of his results on concentration inequalities for exchangeable pairs.
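To see these bounds side by side in a concrete case, the following sketch (a Binomial(\(n,p\)) example of our own choosing) compares the exact upper tail with the bound of Theorem 1.3 and the weaker bound of Corollary 1.1, using \(c = 1\).

```python
import math

n, p = 50, 0.1
a = n * p                      # a = E[X] for X ~ Binomial(n, p); BSBC holds with c = 1

def exact_upper_tail(x):
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(math.ceil(x), n + 1))

def thm13_bound(x):            # (a/x)^x e^{x-a}, Theorem 1.3 with c = 1
    return math.exp(x * math.log(a / x) + x - a)

def cor11_bound(x):            # exp(-(x-a)^2 / (a + x)), Corollary 1.1 with c = 1
    return math.exp(-(x - a) ** 2 / (a + x))

for x in [8, 12, 16]:
    print(x, exact_upper_tail(x), thm13_bound(x), cor11_bound(x))
```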

The proofs of Theorems 1.1, 1.2 and 1.3 and Corollary 1.1 are given in Sects. 2, 3 and 4. Some simple examples of random variables satisfying BSBC are given in Sect. 5, and a particular family of infinitely divisible measures satisfying BSBC, the so-called Lévy(\([0,c]\)) random variables, is studied in Sect. 6. Section 7 deals with the relationship between monotone and bounded couplings. Finally, in Sects. 8 and 9 we consider the relative strengths of the bounds in Theorems 1.1 and 1.2 (coming from the elementary argument in Lemma 2.1) as opposed to those in Theorem 1.3 and Corollary 1.1 (obtained via the moment generating function); the technical calculations appear in Sect. 8 and the discussion is in Sect. 9.

2 Product bounds

We start with an elementary argument giving powerful bounds on the upper and lower tail probabilities.

Lemma 2.1

Assume BSBC. Then

$$\begin{aligned} \forall x>0, \ \ G(x) \le \frac{a}{x} \, G(x-c), \end{aligned}$$
(14)

and

$$\begin{aligned} \forall x, \ \ F(x) \le \frac{x+c}{a} \, F(x+c). \end{aligned}$$
(15)

Proof

To prove the upper bound on \(G(x)\), note that BSBC implies that the event \(Y \ge x\) is a subset of the event \(X \ge x-c\). Hence for \(x>0\),

$$\begin{aligned} x G(x) = x \, \mathbb {E \,}1(X \ge x)&\le \mathbb {E \,}(X 1(X \ge x)) \\&= a \, \mathbb {P}(Y \ge x) \\&\le a \,G(x-c). \end{aligned}$$

Dividing by \(x\) gives (14).

To prove the upper bound on \(F(x)\), note that BSBC implies that the event \(Y \le x\) is a superset of the event \(X \le x-c\). Hence

$$\begin{aligned} x F(x) = x \, \mathbb {E \,}1(X \le x)&\ge \mathbb {E \,}(X 1(X \le x)) \\&= a \, \mathbb {P}(Y \le x) \\&\ge a \,F(x-c). \end{aligned}$$

This does not require that \(x\) be positive; for \(x<0\) it reduces to the trivial inequality \(0 \ge 0\). Replacing \(x\) by \(x+c\) and dividing by \(a>0\) yields (15). \(\square \)

Proof of Theorem 1.1

Given \(x>0\), the obvious strategy for obtaining good bounds is to iterate (14) or (15) for as long as the new value of \(x\), say \(x'=x \pm i c\), still gives a favorable ratio, \(a/x'\) in (14), or \((x'+c)/a\) in (15), and using \(G(t) \le 1\) or \(F(t) \le 1\) as needed, to finish off. The proof is now a simple matter of carrying out this strategy.   \(\square \)

Corollary 2.1

Assume BSBC. Then the moment generating function of \(X\) is finite everywhere, that is, for all \(\beta \in \mathbb {R}\), \(M(\beta ) := \mathbb {E \,}e^{\beta X} < \infty \).

Proof

Since \(X \ge 0\), trivially \(M(\beta ) \le 1\) if \(\beta \le 0\), so assume \(\beta > 0\). For motivation: as \(x\) increases by \(c\), \(e^{\beta x}\) increases by a factor of \(e^{\beta c}\), while the upper bound \(u(x,a,c)\) on \(G(x)\) decreases by a factor of \(a/(x+c)\); hence for \(x > x_0 := 2 a e^{\beta c}\), the product \(e^{\beta x} u(x,a,c)\) decreases by a factor of at least 2. More precisely, for \(x \ge a\) we have \(u(x+c,a,c) = \frac{a}{x+c}u(x,a,c)\) and so

$$\begin{aligned} e^{\beta (x+c)}u(x+c,a,c) = \left( \frac{ae^{\beta c}}{x+c}\right) e^{\beta x}u(x,a,c) \le \frac{1}{2} e^{\beta x} u(x,a,c) \end{aligned}$$

for \(x > x_0\). Writing

$$\begin{aligned} M(\beta )&= \mathbb {E}\left[ e^{\beta X}1(X < x_0)\right] + \sum _{i \ge 0} \mathbb {E} \left[ e^{\beta X}1(x_0+ic \le X < x_0+ic+c)\right] \\&\le \mathbb {E}\left[ e^{\beta X}1(X < x_0)\right] + \sum _{i \ge 0} e^{\beta (x_0+ic+c)} u(x_0+ic,a,c), \end{aligned}$$

the series on the right side is bounded by a geometric series with ratio 1/2, and the net result is \(M(\beta ) \le \exp (\beta x_0) + 2 \exp (\beta (x_0+c))\). \(\square \)

Remark 2.1

Note the difference between the indexing in the products (3) and (4): \(u(x,a,c)\) includes the factor indexed by \(i=0\), while \(\ell (x,a,c)\) excludes the factor indexed by \(i=0\). In case \(x \in (a,a+c)\), which is equivalent to \(a < x\) and \(k=0\), the bound in (3) has one factor, and simplifies to \(G(x) \le u(x,a,c) = a/x < 1\). In case \(x \in (a-c,a)\), which is equivalent to \(x < a\) and \(k=0\), the bound in (4) has no factors, and simplifies to the trivial observation \(F(x) \le \ell (x,a,c)=1\).

Remark 2.2

(Product bound combined with the one-sided Chebyshev inequality) The recursive nature of Lemma 2.1 allows the possibility of combining it with other information about \(F(x)\) or \(G(x)\). Here we pursue one possibility.

For a random variable \(X\) with mean \(a\) and variance \(\sigma ^2\), the one-sided Chebyshev inequality states that, for all \(x \le a\), \(F(x) = \mathbb {P}(X \le x)\) is upper bounded by \(\sigma ^2/(\sigma ^2+(a-x)^2)\). In our situation, \(Y =^d X^*\) and \(\mathbb {E \,}X=a\) yield \(\mathbb {E \,}X^2 = a \, \mathbb {E \,}Y\), and \(Y \le X+c\) implies \(\mathbb {E \,}Y \le a+c\). Hence \(\mathbb {E \,}X^2 = a \, \mathbb {E \,}Y \le a(a+c)\), so that \(\sigma ^2 \le ac\). Thus, under the hypotheses of Lemma 2.1, for all \(x \le a\),

$$\begin{aligned} \mathrm{Var}X \le ac, \ \ F(x) \le \frac{ac}{ac+(a-x)^2}. \end{aligned}$$
(16)

We want to improve on (4) by using one-sided Chebyshev in combination with iteration of (15). More precisely, given \(x < a\) and any non-negative integer \(j\) such that \(x+jc \le a\) we can iterate (15) from \(x\) to \(x+jc\) and then use the one-sided Chebyshev inequality at \(x+jc\) to obtain

$$\begin{aligned} F(x) \le \ell _j(x,a,c):=\frac{ac}{ac+(a-x-jc)^2} \prod _{0 < i \le j}\frac{x+ic}{a}. \end{aligned}$$
(17)
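A short sketch of the combined bound (17), with names and example parameters of our own choosing:

```python
import math

def chebyshev_lower(x, a, c):
    # one-sided Chebyshev bound (16), using Var(X) <= a c
    return a * c / (a * c + (a - x) ** 2)

def l_j(x, a, c, j):
    # iterate (15) j times from x to x + j c, then apply (16) at x + j c, as in (17)
    assert x + j * c <= a
    return chebyshev_lower(x + j * c, a, c) * math.prod((x + i * c) / a for i in range(1, j + 1))

# example: a = 10, c = 1, x = 4; try every admissible j and keep the smallest bound
a, c, x = 10.0, 1.0, 4.0
best = min(l_j(x, a, c, j) for j in range(int((a - x) / c) + 1))
print(best)
```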

3 Gamma bounds

Here we replace the product bounds \(u(x,a,c)\) and \(\ell (x,a,c)\) in Theorem 1.1 by the simpler (but weaker) bounds given in Theorem 1.2. In this section we restrict to the case \(c = 1\). Results for general \(c > 0\) can be recovered using Eq. (6), see Remark 1.1. For \(c=1\), making use of \(z\Gamma (z) = \Gamma (z+1)\), the conclusions (3) and (4) in Theorem 1.1 can be rewritten as: for \(x \ge a\) and \(k=\lfloor x-a \rfloor \),

$$\begin{aligned} G(x) \le u(x,a,1) := \prod _{0 \le i \le k} \frac{a}{x-i} = \frac{a^{k+1} \Gamma (x-k)}{\Gamma (x+1)}, \end{aligned}$$
(18)

and for \(0 \le x \le a\) and \(k=\lfloor a-x \rfloor \),

$$\begin{aligned} F(x) \le \ell (x,a,1) := \prod _{0 < i \le k} \frac{x+i}{a} = \frac{\Gamma (x+k+1)}{a^k\Gamma (x+1)}. \end{aligned}$$
(19)

These upper and lower tail bounds might be viewed as too complicated; as \(x\) varies they are not closed-form expressions, and are not analytic functions. Here we replace them by expressions which are analytic in \(a\) and \(x\).

Lemma 3.1

For \(a > 0\) and \(0 \le f \le 1\)

$$\begin{aligned} a^{1-f}\Gamma (a+f) \le \Gamma (a+1), \end{aligned}$$
(20)

with equality for \(f=0,1\) and strict inequality for \(f \in (0,1)\).

Proof

The result is true (with equality) when \(f = 0\) or \(1\), so we can assume \(0 < f < 1\). We use the integral formula \(\Gamma (x) = \int _0^\infty t^{x-1}e^{-t}dt\) for \(x > 0\). Writing \(t^{a+f-1} = (t^{a-1})^{1-f} (t^a)^f\) and using Hölder’s inequality (with \(p = 1/(1-f)\)) gives

$$\begin{aligned} \Gamma (a+f) < \left[ \Gamma (a)\right] ^{1-f}\left[ \Gamma (a+1)\right] ^f\!. \end{aligned}$$

Since \( a \Gamma (a) = \Gamma (a+1)\) we get

$$\begin{aligned} a^{1-f}\Gamma (a+f) < \left[ a\Gamma (a)\right] ^{1-f}\left[ \Gamma (a+1)\right] ^f = \Gamma (a+1), \end{aligned}$$

and we are done. \(\square \)

Proof of Theorem 1.2

(i) Let \(k = \lfloor x-a \rfloor \) and \(f= x-a-k \in [0,1)\). Since \(c=1\) and \(x \ge a\), this is consistent with the notation in Theorem 1.1. Note that \(x-k=a-f\), and combine (18) with (20).

(ii) Let \(k = \lfloor a-x \rfloor \) and \(f = a-x-k \in [0,1).\) Since \(c=1\) and \(a \ge x\), this is consistent with the notation in Theorem 1.1. Replacing \(f\) by \(1-f\) in (20) gives \(a^f \Gamma (a-f+1) \le \Gamma (a+1)\). Combining this with (19) and noting that \(x+k=a-f\) gives the result. \(\square \)

4 Moment generating function

We recall our notation, \(M(\beta ) := \mathbb {E \,}e^{\beta X}\) for the moment generating function of \(X\). We observe that if \(X\) is Poisson with parameter \(a\), then \({\log M}(\beta ) = a(e^{\beta }-1)\), and \(X\) admits a \(c\)-bounded size bias coupling with \(c=1\), so that in this case, the inequality (21) below holds as an equality for all \(\beta \).
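As a quick numerical check of this observation (a sketch with an arbitrary Poisson mean and argument), the Poisson moment generating function attains equality in (21) below with \(c = 1\):

```python
import math

a, beta = 2.5, 0.7             # arbitrary Poisson mean and argument
term = math.exp(-a)            # k = 0 term of E[e^{beta X}] for X ~ Poisson(a)
M = term
for k in range(1, 100):
    term *= a * math.exp(beta) / k     # term_k = e^{-a} (a e^beta)^k / k!
    M += term
print(math.log(M), a * (math.exp(beta) - 1))   # the two values agree
```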

Proposition 4.1

Assume BSBC. Then the moment generating function \(M(\beta )\) for \(X\) satisfies

$$\begin{aligned} {\log M}(\beta ) \le \frac{a}{c}\left( e^{\beta c}-1\right) \end{aligned}$$
(21)

for all \(\beta \in \mathbb {R}\).

Proof

We know from Corollary 2.1 that the moment generating function \(M(\beta )\) is finite for all \(\beta \in \mathbb {R}\). It follows that \(M\) is continuously differentiable and \( M'(\beta ) = \mathbb {E \,}(Xe^{\beta X})\). Moreover, since \(Y\) is the size biased version of \(X\), we have \(\mathbb {E \,}(Xe^{\beta X}) = a \mathbb {E \,}(e^{\beta Y})\). Together we have \(M'(\beta ) = a \mathbb {E \,}(e^{\beta Y})\).

For \(\beta \ge 0\) we have \(e^{\beta Y} \le e^{\beta (X+c)} = e^{\beta c}e^{\beta X}\) so that

$$\begin{aligned} M'(\beta ) = a \mathbb {E \,}(e^{\beta Y}) \le ae^{\beta c} \mathbb {E \,}(e^{\beta X}) = a e^{\beta c} M(\beta ). \end{aligned}$$

Then

$$\begin{aligned} (\log M)'(\beta ) \le a e^{\beta c} \end{aligned}$$

so that

$$\begin{aligned} {\log M}(\beta ) = {\log M}(\beta )-{\log M}(0) \le \int \limits _0^\beta a e^{uc}\,du = \frac{a}{c}\left( e^{\beta c}-1\right) , \end{aligned}$$

for all \(\beta \ge 0\).

For \(\beta \le 0\) we have \(e^{\beta Y} \ge e^{\beta (X+c)} = e^{\beta c}e^{\beta X}\) so that

$$\begin{aligned} M'(\beta ) = a \mathbb {E \,}(e^{\beta Y}) \ge ae^{\beta c} \mathbb {E \,}(e^{\beta X}) = a e^{\beta c} M(\beta ). \end{aligned}$$

Then

$$\begin{aligned} ({\log M})'(\beta ) \ge a e^{\beta c} \end{aligned}$$

so that

$$\begin{aligned} -{\log M}(\beta ) = {\log M}(0) - {\log M}(\beta ) \ge \int \limits _\beta ^0 a e^{uc}\,du = \frac{a}{c}\left( 1-e^{\beta c}\right) , \end{aligned}$$

and the proof is complete. \(\square \)

The upper tail and lower tail bounds in Theorem 1.3 can now be obtained using the standard “large deviation upper bound” method together with the information about \(M(\beta )\) in Proposition 4.1. Here are the details.

Proof of Theorem 1.3

Suppose first \(x \ge a\). For any \(\beta \ge 0\) we have

$$\begin{aligned} \mathbb {P}(X \ge x) = \mathbb {P}(e^{\beta X} \ge e^{\beta x}) \le M(\beta )/e^{\beta x} \le \exp \left\{ \frac{a}{c}\left( e^{\beta c}-1 \right) -\beta x \right\} . \end{aligned}$$

Choosing \(\beta = (1/c) \log (x/a) \ge 0\) we get

$$\begin{aligned} \mathbb {P}(X \ge x) \le \exp \left\{ \frac{a}{c}\left( \frac{x}{a}-1\right) -\frac{x}{c} \log \left( \frac{x}{a}\right) \right\} = e^{(x-a)/c} \left( \frac{a}{x}\right) ^{x/c}. \end{aligned}$$

Now suppose \(0 \le x \le a\). For any \(\beta \le 0\) we have

$$\begin{aligned} \mathbb {P}(X \le x) = \mathbb {P}(e^{\beta X} \ge e^{\beta x}) \le M(\beta )/e^{\beta x} \le \exp \left\{ \frac{a}{c}\left( e^{\beta c}-1 \right) -\beta x \right\} . \end{aligned}$$

If \(0 < x \le a\) take \(\beta = (1/c) \log (x/a) \le 0\) to get

$$\begin{aligned} \mathbb {P}(X \le x) \le \exp \left\{ \frac{a}{c}\left( \frac{x}{a}-1\right) -\frac{x}{c} \log \left( \frac{x}{a}\right) \right\} = e^{(x-a)/c} \left( \frac{a}{x}\right) ^{x/c}, \end{aligned}$$

while if \(x = 0\) let \(\beta \rightarrow -\infty \) to get \(\mathbb {P}(X \le 0) \le e^{-a/c}\).\(\square \)

The passage from Theorem 1.3 to Corollary 1.1 is a direct application of the following two lemmas.

Lemma 4.1

Suppose \(0 < a \le x\) and \(c >0\). Then

$$\begin{aligned} \left( \frac{a}{x}\right) ^{x/c} e^{(x-a)/c} \le e^{-(x-a)^2/(ca+cx)}, \end{aligned}$$
(22)

with strict inequality whenever \(x > a\).

Proof

The expressions both take the value 1 when \(x = a\), so we can assume \(x > a\). Jensen’s inequality applied to the convex function \(t \mapsto 1/t\) on the interval \([1,1+u]\) gives \(\log (1+ u) > 2u/(2+u)\) for \(u > 0\). Taking \(u = (x-a)/a\) we get \(\log (x/a) > 2(x-a)/(x+a)\) and so

$$\begin{aligned} x \log (a/x) +x-a < \frac{-2x(x-a)}{x+a} + x-a = -\frac{(x-a)^2}{x+a}. \end{aligned}$$

Dividing by \(c\) and applying the exponential function to both sides gives (22). \(\square \)

Lemma 4.2

Suppose \(0 \le x \le a\) and \(a,c >0\). Then

$$\begin{aligned} \left( \frac{a}{x}\right) ^{x/c} e^{(x-a)/c} \le e^{-(x-a)^2/(2ca)}, \end{aligned}$$
(23)

with strict inequality whenever \(x < a\).

Proof

The expressions both take the value 1 when \(x = a\), so we can assume \(x < a\). Integrating the inequality \(t^{-1} \le (1+t^{-2})/2\) from \(1\) to \(v\) gives \(\log v < (v-1/v)/2\) for \(v > 1\). Putting \(v=a/x\) gives

$$\begin{aligned} x \log (a/x) -a+x \le \frac{x}{2}\left( \frac{a}{x}-\frac{x}{a}\right) -a+x = -\frac{(x-a)^2}{2a}. \end{aligned}$$

Dividing by \(c\) and applying the exponential function to both sides gives (23). \(\square \)

5 Examples admitting a 1-bounded size bias coupling

Here we give some examples of random variables \(X\) which satisfy BSBC with \(c = 1\). Examples with general \(c > 0\) can be obtained by scaling, see Remark 1.1.

Example 5.1

If \(X\) has a Poisson distribution, it can be verified directly that \(X+1 =^d X^*\). Conversely if \(X \ge 0\) with \(0 < \mathbb {E \,}X <\infty \) and \(X+1 =^d X^*\), then the inequality in the statement (and proof) of Proposition 4.1 can be replaced by equality, so that \(X\) must have a Poisson distribution.

Remark 5.1

(Sharpness of the bounds) Suppose \(X\) is Poisson distributed with mean \(a \in (0,\infty )\). Taking \(x = 0\) in the lower tail bound (11) from Theorem 1.3 we get \(F(0) \le e^{-a}\), whereas the exact value is \(F(0) = e^{-a}\). Therefore in this setting the bound (11) is sharp.

Now suppose further that \(a \in (0,1]\). When \(x=n\), the upper tail bound (3) in Theorem 1.1 simplifies, with \(k=n-1\), to \(G(x) \le u(n,a,1)= a^n/n!\), while for large \(n\), \(G(n) \sim \mathbb {P}(X=n) = e^{-a} a^n/n!\). Hence, for large \(x\) with \(x\) an integer, the upper bound (3) is sharp up to a factor of approximately \(e^{a}\). Letting \(a \rightarrow 0\) so that \(e^{a} \rightarrow 1\), one sees that the upper bound (3), for large \(x\), is sharp up to a factor arbitrarily close to 1.

Example 5.2

Lévy([0,1]), the infinitely divisible distributions with Lévy measure supported on \([0,1]\). This is the case \(c=1\) of the Lévy([0,\(c\)]) distributions, discussed in detail in Sect. 6.

Example 5.3

Let \(X\) be a random variable with values in \([0,1]\), with \(\mathbb {E \,}X > 0\). This includes, but is not restricted to, Bernoulli random variables. The size biased version \(Y\), say, of \(X\) takes values in \([0,1]\) also, and hence \(Y \le 1 \le 1+X\), so that \(X\) admits a 1-bounded size bias coupling.

Example 5.4

The uniform distribution on \([0,b]\) admits a 1-bounded size bias coupling for any \(b \in (0,4]\).

Proof

Suppose that \(X\) is uniformly distributed on \([0,1]\), and that \(Y\) is the size biased version of \(X\). On \([0,1]\), the density of \(X\) is \(f_X(x)=1\), the density of \(Y\) is \(f_Y(x)=2x\), the cumulative distribution functions are \(F_X(t)=t\) and \(F_Y(t)=t^2\), and the inverse cumulative distribution functions are \(F_X^{(-1)}(u)=u\) and \(F_Y^{(-1)}(u)=\sqrt{u}\) for \(0 \le u \le 1\). Observe that \(\max _{0 \le u \le 1} (\sqrt{u}-u) = \frac{1}{4}\), achieved at \(u=\frac{1}{4}\). Hence, using the quantile transform as in the proof of Lemma 7.1, there is a \(c\)-bounded size bias coupling for the standard uniform with \(c=\frac{1}{4}\). By scaling, as in Remark 1.1, the uniform distribution on \([0,b]\) admits a \(b/4\)-bounded size bias coupling; since \(b/4 \le 1\) for \(b \in (0,4]\), this is in particular a 1-bounded coupling. \(\square \)
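The quantile coupling used in this proof is easy to simulate; the sketch below (sample size arbitrary) draws \(U\) uniform on \((0,1)\), sets \(X = U\) and \(Y = \sqrt{U}\), and confirms that \(Y - X\) stays in \([0, 1/4]\) while \(Y\) has mean \(2/3\), as it should under the size biased density \(2x\).

```python
import random

random.seed(0)
pairs = [(u, u ** 0.5) for u in (random.random() for _ in range(100_000))]
print(max(y - x for x, y in pairs))               # close to 1/4, never above it
print(sum(y for _, y in pairs) / len(pairs))      # ~ 2/3 = E[Y] for the density 2x on [0,1]
```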

Proposition 5.1

Suppose \(X = \sum _{i} X_i\) is the sum of finitely or countably many independent non-negative random variables \(X_i\) with \(0 < \mathbb {E \,}X = \sum _i \mathbb {E \,}X_i <\infty \). If each \(X_i\) admits a 1-bounded size bias coupling, then so does \(X\).

Proof

For each \(i\) there is a coupling of \(X_i\) with its size biased version \(Y_i\), say, so that \(Y_i \le X_i+1\). Let \(I\) denote an index \(i\) chosen (independently of all the \(X_j\) and \(Y_j\)) with probability \(\mathbb {E \,}X_i/\mathbb {E \,}X\). Then \(Y:=X-X_I+Y_I\) is the size biased version of \(X\), see [15, Lemma 2.1] or [3, Sect. 2.4]. Since \(Y_i \le X_i +1\) for all \(i\), it follows that \(Y \le X+1\). \(\square \)
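A simulation sketch of this construction for a sum of independent Bernoulli variables (the parameters are our own choice; for a Bernoulli variable the size biased version is the constant 1):

```python
import random

random.seed(1)
p = [0.2, 0.5, 0.7]                      # Bernoulli parameters; a = E[X] = sum(p)
a = sum(p)
weights = [pi / a for pi in p]           # P(I = i) = E[X_i] / E[X]

def sample_pair():
    xs = [1 if random.random() < pi else 0 for pi in p]
    i = random.choices(range(len(p)), weights=weights)[0]
    x = sum(xs)
    y = x - xs[i] + 1                    # Y = X - X_I + Y_I, with Y_I = 1 for a Bernoulli
    return x, y

pairs = [sample_pair() for _ in range(200_000)]
print(all(y <= x + 1 for x, y in pairs))                        # the coupling is 1-bounded
print(sum(y for _, y in pairs) / len(pairs),                    # empirical E[Y]
      1 + a - sum(pi * pi for pi in p) / a)                     # E[X^2]/E[X], the exact value
```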

Example 5.5

By Proposition 5.1, if \(X\) is Binomial, or more generally if \(X\) is the sum of (possibly infinitely many) independent Bernoulli random variables (with possibly different parameters) with \(0 < \mathbb {E \,}X < \infty \) then \(X\) has a 1-bounded size bias coupling.

Example 5.6

Let \(C(b,n)\) denote the number of collisions which occur when \(b\) balls are tossed into \(n\) equally likely bins. (A collision is said to occur whenever a ball lands in a bin that already contains at least one ball.) Clearly \(C(b,n)\) can be expressed as the sum of dependent Bernoulli random variables. In [2, Prop 15] it is shown that \(C(b,n)\) admits a 1-bounded size bias coupling.

The paper [2] studies the number \(B(\kappa ,n)\) of balls that must be tossed into \(n\) equally likely bins in order to obtain \(\kappa \) collisions. For the sake of concrete analysis, we suppose here that \(\kappa \sim n^\alpha \) for some fixed \(\alpha \in (1/2,1)\); in this case, the goal is to prove that the variance of \(B(\kappa ,n)\) is asymptotic to \(n/2\). Rényi’s central limit theorem [21] for the number of collisions \(C(b,n)\) obtained with a given number \(b\) of balls, combined with duality, leads easily to a central limit theorem for the number of balls \(B(\kappa ,n)\) that must be tossed; the hard part, for getting the asymptotic variance, requires uniform integrability, to be obtained from concentration inequalities for \(C(b,n)\). The traditional tool, the Azuma-Hoeffding bounded difference martingale approach [4], leads to \(\mathbb {P}( |C(b,n)- \mathbb {E \,}C(b,n) | \ge t) \le 2 \exp ( -t^2/(2b))\). In contrast, bounded size bias coupling inequalities such as Corollary 1.1 involve \(a = \mathbb {E \,}C(b,n)\) rather than \(b\) in the denominator of the fraction in the exponential. The situation in [2] has \(a/b \rightarrow 0\); it turns out that the Azuma-Hoeffding inequality is inadequate in this setting, whereas the bounded size bias coupling inequalities are strong enough, see [2, Sect. 7] for details.

6 Lévy(\([0,c]\)), the infinitely divisible distributions with Lévy measure supported on \([0,c]\)

Among the distributions satisfying BSBC are those with characteristic function of the form

$$\begin{aligned} \phi _X(u) := \mathbb {E \,}e^{i u X} =\exp \left( a \left( i u \, \alpha (\{0\}) +\ \int \limits _{(0,c]} \frac{e^{iuy}-1}{y} \ \alpha (dy) \right) \right) \end{aligned}$$
(24)

where \(a \in (0,\infty )\) and \(\alpha \) is the probability distribution of a nonnegative nonzero random variable \(D\), with \(\mathbb {P}(D\in [0,c])=1\). Given this characteristic function, the random variable \(X\) has \(a=\mathbb {E \,}X\), and, with \(X,D\) independent, \(X^*=^d X+D\). See [3] for the connection between infinite divisibility of \(X\) and size biasing of \(X\) by adding an independent \(D\). Special cases include:

  1. \(c=1, \mathbb {P}(D=1)=1; X\) is Poisson with mean \(a\).

  2. \(c=1,a=1, D\) is uniformly distributed on \((0,1)\); \(X\) has density \(f(x) = e^{-\gamma } \rho (x)\) where \(\rho \) is Dickman’s function and \(\gamma \) is Euler’s constant.

The characteristic function in (24) can also be expressed as

$$\begin{aligned} \phi _X(u) = \exp \left( i a u \alpha _0+ \int \limits _{(0,c]} \left( e^{iuy}-1 \right) \gamma (dy) \right) \end{aligned}$$
(25)

Here \(\alpha _0 := \alpha (\{0\})\), and \(\gamma \) is a nonnegative measure on \((0,c]\), with \(\gamma (dy) /\alpha (dy) = a/y\); it may be called the Lévy measure of the infinitely divisible random variable \(X\); see Sato [22, Section 51], Bertoin [6, Section 1.2], or [3].

Theorem 6.1

Suppose \(X\) has distribution given by (24) for \(c >0\), and that for every \(\varepsilon > 0\), the probability measure \(\alpha \) is not supported on \([0,c-\varepsilon ]\). Then \(G(x) := \mathbb {P}(X \ge x)\) satisfies, as \(x \rightarrow \infty \),

$$\begin{aligned} G(x) \approx x^{-x/c}, \text { that is}, \ \frac{\log G(x)}{(x/c) \log x} \rightarrow -1. \end{aligned}$$
(26)

Proof

As observed in the Introduction, the upper bound on \(G(x)\) follows directly and easily from (3) in Theorem 1.1 and (7) in Theorem 1.2.

For the lower bound, let \(\varepsilon >0\) be given, with \(\varepsilon < c\). Using the representation (25), the random variable \(X\) can be realized as the constant \(a \alpha _0\) plus the sum of the arrivals in the Poisson process \({\mathcal X}\) on \((0,c]\) with intensity measure \(\gamma \). Let \(Z\) be the number of arrivals of \({\mathcal X}\) in \([c-\varepsilon ,c]\). We have \(X \ge (c - \varepsilon ) Z\). This yields

$$\begin{aligned} G(x) = \mathbb {P}(X \ge x) \ge \mathbb {P}\left( Z \ge \frac{x}{c-\varepsilon }\right) . \end{aligned}$$

Finally, \(Z\) is a Poisson random variable with mean \(\lambda = \gamma ([c-\varepsilon ,c] ) \ge (a/c) \alpha ([c-\varepsilon ,c]) > 0\). We recall an elementary calculation: for integers \(k \rightarrow \infty \), the Poisson (\(\lambda \)) distribution for \(Z\) has \(\mathbb {P}(Z \ge k) \ge \mathbb {P}(Z=k)\) with

$$\begin{aligned} \log \mathbb {P}(Z=k) = -\lambda + k \log \lambda - \log k! \sim - \log k! \sim -k \log k. \end{aligned}$$
(27)

We use this with \(k = \lceil x/(c-\varepsilon ) \rceil \sim x/(c-\varepsilon )\) and \(\log k \sim \log x\), which gives \(\liminf _{x \rightarrow \infty } \frac{\log G(x)}{(x/c) \log x} \ge -c/(c-\varepsilon )\); since \(\varepsilon \in (0,c)\) was arbitrary, this completes the proof of (26). \(\square \)

Remark 6.1

Most probabilists are familiar with the \(\approx \) notation of (26), with \(a_n \approx b_n\) defined to mean \(\log a_n \sim \log b_n\), for use in the context where \(a_n\) grows or decays exponentially. The standard example is the large deviation statement that for i.i.d. sums, \(\mathbb {P}(S_n \ge a n ) \approx \exp (-n I(a) )\), and the \(\approx \) relation hides factors with a slower than exponential order of growth or decay, in this case \(1/\sqrt{n}\). When both \(a_n\) and \(b_n\) grow or decay even faster, for example with \(a_n \sim n^{\pm n/c}\), an unfamiliar phenomenon arises, with \(\approx \) hiding factors which grow or decay exponentially fast. One example appears in the conclusion (26) of Theorem 6.1, where \(x^{-x/c} \approx (x/c)^{-x/c} \approx 1/ \Gamma (x/c)\) as \(x \rightarrow \infty \)—with the last expression being relevant because it corresponds to the Gamma function upper bound in Theorem 1.2. A second example occurs in (27), where at first it appears strange that the parameter \(\lambda \) does not appear on the right hand side—this reflects the fact that for any fixed \(\lambda , \lambda ' >0\), when \(Z\) and \(Z'\) are Poisson with parameters \(\lambda , \lambda '\) respectively, \(\mathbb {P}(Z=k) \approx \mathbb {P}(Z'=k)\) as \(k \rightarrow \infty \). The first example hides the exponentially growing factor \(c^{x/c}\), and the second hides the factor \((\lambda ' / \lambda )^k\).

7 Bounded coupling, monotone coupling, and a sandwich principle

Suppose that the distributions of random variables \(X,Y\) have been specified, with cumulative distribution functions \(F_X,F_Y\) respectively. We will clarify the relations between hypotheses of the form

$$\begin{aligned} \exists \text { a coupling},&\mathbb {P}(Y \le X+c)=1, \end{aligned}$$
(28)
$$\begin{aligned} \exists \text { a coupling},&\mathbb {P}( |Y-X| \le c)=1, \end{aligned}$$
(29)
$$\begin{aligned} \exists \text { a coupling},&\mathbb {P}(Y \in [X,X+c])=1, \end{aligned}$$
(30)
$$\begin{aligned} \exists \text { a coupling},&\mathbb {P}( X \le Y)=1. \end{aligned}$$
(31)

It is well-known that if \(Y =^d X^*\), then (31) holds. We observe that if \(Y =^d X^*\), then (28) is the hypothesis BSBC, while (29) and (30) are used as hypotheses for the first results on concentration via size bias couplings, in [12].

Proposition 7.1

Given that (31) holds, all of (28)–(30) are equivalent.

The proof of Proposition 7.1 will be given later in this section.

7.1 One sided bounds

Among the four hypotheses (28)–(31), only (31) is standard. The relation of stochastic domination, that \(X\) lies below \(Y\), written \(X \preceq Y\), is usually defined by the condition

$$\begin{aligned} \forall t, F_X(t) \ge F_Y(t), \end{aligned}$$
(32)

and it is well known that (32) is equivalent to (31). See for example Lemma 7.1. Stochastic domination is often considered in the more general context where \(X,Y\) are random elements in a partially ordered set, see for example [20].

7.2 Two sided bounds: historical perspective

Dudley [10, Prop 1 and Thm 2], following earlier work of Strassen [23], proved the following result. Given distributions \(\mu \) and \(\nu \) respectively for random variables \(X\) and \(Y\) taking values in a Polish metric space \((S,d)\) and \(\alpha > 0\), write \(A^\alpha = \{ x : d(x,A) \le \alpha \}\) to denote the closed \(\alpha \) neighborhood of the set \(A \subset S\). Then the following are equivalent:

$$\begin{aligned}&\text {For all closed sets } A, \ \ \mu (A) \le \nu (A^\alpha ) \end{aligned}$$
(33)
$$\begin{aligned}&\text {For all closed sets } A, \ \ \nu (A) \le \mu (A^\alpha ) \end{aligned}$$
(34)
$$\begin{aligned}&\text {There exists a coupling under which } \mathbb {P}(d(X,Y) \le \alpha ) = 1. \end{aligned}$$
(35)

When \(X,Y\) are real valued random variables, and \(\alpha = c\), condition (29) is trivially equivalent to (35), and (33) or (34) can be shown to be equivalent to \(\forall t, \ F_Y(t-c) \le F_X(t) \le F_Y(t+c)\).

7.3 Proof of Proposition 7.1

It is clear that (30) implies (29) implies (28). It remains to show that (31) and (28) together imply (30), and this follows immediately by using the implication (i) implies (iii) with \(b = 0\) in the following Lemma.

Lemma 7.1

For random variables \(X,Y\) with cumulative distribution functions \(F_X, F_Y\) respectively and \(b,c \in \mathbb {R}\), the following are equivalent:

  (i) There exist a coupling such that \(\mathbb {P}(X \le Y+b) =1\) and a coupling such that \(\mathbb {P}(Y \le X+c) = 1\);

  (ii) For all \(t\), \(F_Y(t-b) \le F_X(t) \le F_Y(t+c)\);

  (iii) There exists a coupling such that \(\mathbb {P}(X-b \le Y \le X+c)=1\);

  (iv) There exists a coupling such that \(X-b \le Y \le X+c\) for all \(\omega \).

Proof

To prove that (i) implies (ii) we use the coupling in which \(\mathbb {P}(X \le Y+b) =1\) and calculate

$$\begin{aligned} F_Y(t-b) = \mathbb {P}(Y \le t-b) \le \mathbb {P}(X \le t) + \mathbb {P}(X > Y+b) = \mathbb {P}(X \le t) = F_X(t) \end{aligned}$$

and similarly for \(F_X(t) \le F_Y(t+c)\).

To show (ii) implies (iv) we may use the quantile transform to couple \(X\) and \(Y\) to a single random variable \(U\), uniformly distributed in \((0,1)\). This transform is written informally as

$$\begin{aligned} X(\omega ) := F_X^{(-1)}(U(\omega )), \ Y(\omega ) := F_Y^{(-1)}(U(\omega )) \end{aligned}$$

and more formally as \(X(\omega ) := \inf \{t: F_X(t) \ge U(\omega ) \}\), \(Y(\omega ) := \inf \{t: F_Y(t) \ge U(\omega ) \}\). Under this coupling, \(F_Y(t-b) \le F_X(t)\) for all \(t\) implies \(X(\omega ) \le Y(\omega )+b\) for all \(\omega \), and \(F_X(t) \le F_Y(t+c)\) for all \(t\) implies \(Y(\omega ) \le X(\omega )+c\) for all \(\omega \). Together, (ii) implies that \( X(\omega )-b \le Y(\omega ) \le X(\omega )+c \) holds for all \(\omega \).

Finally, (iv) implies (iii) trivially, and (iii) implies (i) a fortiori. \(\square \)

7.4 A sandwich principle

Corollary 7.1

(Sandwich principle) Suppose \(X,Y,Z\) are random variables with cumulative distribution functions \(F_X, F_Y, F_Z\) satisfying \(F_Z(t) \le F_Y(t) \le F_X(t)\) for all \(t\). If there is a \(c\)-bounded coupling of \(X\) and \(Z\), as defined via (29), then there exists a \(c\)-bounded monotone coupling of \(X\) and \(Y\), as defined by (30).

Proof

Using Lemma 7.1 with \(Z\) in place of \(Y\), the \(c\)-bounded coupling of \(X\) and \(Z\) implies \(F_X(t) \le F_Z(t+c)\) for all \(t\). Then \(F_Y(t) \le F_X(t) \le F_Z(t+c) \le F_Y(t+c)\) for all \(t\), and Lemma 7.1 gives the existence of the \(c\)-bounded monotone coupling of \(X\) and \(Y\). \(\square \)

Application As an illustration of the sandwich principle, we prove the following corollary. The special case where \(X\) is Binomial\((m,p)\) for \(p \in (0,1)\) and \(m \ge 1\) was proved in [14, Lemmas 3.2, 3.3] by an explicit calculation. The sandwich principle enables a short proof for the general case, with no calculation.

Corollary 7.2

Suppose \(X\) satisfies BSBC, with \(c = 1\), and the distribution of \(Y\) is defined to be that of \(X\), conditional on \(X > 0\). Then there is a coupling in which \(Y-X \in [0,1]\) for all \(\omega \).

Proof

Trivially, \(Y\) stochastically dominates \(X\). Take the distribution of \(Z\) to be the size biased distribution of \(X\). By assumption, together with Proposition 7.1, there is a coupling in which \(\mathbb {P}(Z-X \in [0,1])=1\). Trivially, the distribution of \(Z\), initially defined as the size-biased distribution of \(X\), is also the size-biased distribution of \(Y\). Hence \(Z\) dominates \(Y\), and the sandwich principle applies with \(c = 1\). \(\square \)

This result may be applied to any of the random variables \(X\) discussed in Sect. 5, and in particular to those obtained using Proposition 5.1. If \(X\) is (non-negative) integer valued, for example, a Binomial random variable or a sum of independent Bernoulli random variables, then \(Y\) is also integer valued and the conclusion of Corollary 7.2 is easily strengthened to \(Y-X \in \{0,1\}\) for all \(\omega \).

8 Analysis of the upper tail and lower tail bounds

In this section we obtain results enabling us to compare the upper and lower tail bounds in Theorem 1.2 with those in Theorem 1.3.

Lemma 8.1

For \(0 < u \le v\)

$$\begin{aligned} \frac{\Gamma (v+1/2)}{\Gamma (u+1/2)} \ge \frac{v^v}{e^{v-u}u^u}. \end{aligned}$$
(36)

Proof

For \(x > 0\), using Gauss’ formula, see [24, Sect 12.3],

$$\begin{aligned} ( \log \Gamma )'(x+ 1/2) = \int \limits _0^\infty \left( \frac{e^{-t}}{t} - \frac{e^{-t(x+1/2)}}{1-e^{-t}}\right) \,dt \end{aligned}$$

together with

$$\begin{aligned} \log x = \int \limits _0^\infty \left( \frac{e^{-t} - e^{-tx}}{t}\right) \,dt \end{aligned}$$

we obtain

$$\begin{aligned} ( \log \Gamma )'(x+1/2) - \log x&= \int \limits _0^\infty \left( \frac{e^{-tx}}{t} - \frac{e^{-t(x+1/2)}}{1-e^{-t}}\right) \,dt\\&= \int \limits _0^\infty \frac{e^{-t(x+1/2)}(2 \sinh (t/2) -t)}{t(1-e^{-t})}\,dt \\&\ge 0. \end{aligned}$$

Therefore for \(0 < u \le v\),

$$\begin{aligned} \frac{\Gamma (v+1/2)}{\Gamma (u+1/2)}&\ge \exp \left\{ \int \limits _u^v \log x \,dx \right\} = \exp \left\{ \left[ x \log x - x\right] _u^v\right\} = \frac{v^v}{e^{v-u}u^u}. \end{aligned}$$

\(\square \)

Proposition 8.1

For \(0 < a \le x\),

$$\begin{aligned} \frac{a^{x-a} \Gamma (a+1)}{\Gamma (x+1)}&\le \frac{\left( a+\frac{1}{2}\right) ^{a+1/2}(ae)^{x-a}}{\left( x+\frac{1}{2}\right) ^{x+1/2}} \end{aligned}$$
(37)
$$\begin{aligned}&\le \left( \frac{a+\frac{1}{2}}{x+\frac{1}{2}}\right) ^{1/2} \left( \frac{a}{x}\right) ^x e^{x-a}. \end{aligned}$$
(38)

Proof

Taking \(u = a+1/2\) and \(v=x+1/2\) in (36), we get

$$\begin{aligned} \frac{\Gamma (x+1)}{\Gamma (a+1)} \ge \frac{(x+\frac{1}{2})^{x+1/2}}{e^{x-a}(a+\frac{1}{2})^{a+1/2}}, \end{aligned}$$

giving (37). A simple calculus argument shows that \(t \mapsto (a+t)^a/(x+t)^x\) is decreasing for \(t \in [0,\infty )\). In particular \(\left( a+\frac{1}{2}\right) ^a/\left( x+\frac{1}{2}\right) ^x \le a^a/x^x\), and this gives (38). \(\square \)

Proposition 8.2

For \(0 < x <a\),

$$\begin{aligned} \frac{\Gamma (a+1)}{a^{a-x} \Gamma (x+1)}&\ge \frac{\left( a+\frac{1}{2}\right) ^{a+1/2}}{(ae)^{a-x}\left( x+\frac{1}{2}\right) ^{x+1/2}} \end{aligned}$$
(39)
$$\begin{aligned}&\ge \left( \frac{a+\frac{1}{2}}{x+\frac{1}{2}}\right) ^{1/2} \left( \frac{a}{x}\right) ^x e^{x-a}. \end{aligned}$$
(40)

Proof

Essentially the same as for Proposition 8.1, but with the roles of \(x\) and \(a\) switched. \(\square \)

9 Comparison of the bounds

9.1 Relative strength of our upper and lower tail bounds

Here we compare the elementary “product bounds” (3) and (4) given in Theorem 1.1 with the Gamma function bounds (7) and (8) given in Theorem 1.2 and the bounds (10) and (11) obtained in Theorem 1.3 from the moment generating function estimate.

It is clear from Theorem 1.2 [together with the scaling relationship (6)] that the product bounds (3) and (4) are at least as strong (i.e. as small) as the corresponding Gamma bounds (7) and (8). However the relationship between the Gamma bounds and the bounds (10) and (11) in Theorem 1.3 is more complicated. See Propositions 8.1 and 8.2. For the upper tail, with \(a \le x\), the Gamma bound (7) is sharper than the bound (10) from Theorem 1.3 by a factor \(\sqrt{(a+c/2)/(x+c/2)}\). However, the situation is reversed for the lower tail, with \(0 \le x \le a\), where now the bound (11) from Theorem 1.3 is sharper than the Gamma bound (8) by a factor \(\sqrt{(x+c/2)/(a+c/2)}\). Since the Gamma bound (8) and the product bound (4) agree whenever \(a-x\) is an integer, this suggests (but does not prove) that the bound (11) is the best of the three for the lower tail.

Numerical investigations suggest that (11) is in fact the best estimate to use for the lower tail, and beats the simple product rule \(\ell (x,a,c)\) of (4). Recall however, from Remark 2.2, the lower tail bounds derived from (15) in combination with the one-sided Chebyshev inequality (16), in particular the functions \(\ell _j(x,a,c)\) defined in (17). Numerical investigations suggest that for all \(a,c > 0\) with sufficiently large \(a/c\) there exists \(x \in (0,a)\) such that the bound given in (11) is less than the product bound (4) and is less than the one-sided Chebyshev estimate (16), but is greater than \(\ell _j(x,a,c)\) for some nonnegative integer \(j\le (a-x)/c\).
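The following self-contained sketch illustrates the kind of numerical comparison described here, for the lower tail with \(c = 1\) (the parameter values are arbitrary choices of ours):

```python
import math

def product_lower(x, a):      # l(x, a, 1) from (4)
    return math.prod((x + i) / a for i in range(1, math.floor(a - x) + 1))

def gamma_lower(x, a):        # right side of (8)
    return math.exp(math.lgamma(a + 1) - (a - x) * math.log(a) - math.lgamma(x + 1))

def mgf_lower(x, a):          # (11) with c = 1
    return math.exp(x * math.log(a / x) + x - a) if x > 0 else math.exp(-a)

def chebyshev_lower(x, a):    # (16) with c = 1
    return a / (a + (a - x) ** 2)

def best_l_j(x, a):           # smallest of the combined bounds (17), c = 1
    return min(chebyshev_lower(x + j, a) * math.prod((x + i) / a for i in range(1, j + 1))
               for j in range(math.floor(a - x) + 1))

a, x = 30.0, 18.5
for name, b in [("product (4)", product_lower(x, a)), ("Gamma (8)", gamma_lower(x, a)),
                ("mgf (11)", mgf_lower(x, a)), ("Chebyshev (16)", chebyshev_lower(x, a)),
                ("best l_j (17)", best_l_j(x, a))]:
    print(name, b)
```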

9.2 Hoeffding bounds

Suppose \(X = X_1+ \cdots + X_n\) where the \(X_i\) are independent and take values in \([0,1]\), and let \(a = \mathbb {E \,}X\). Clearly \(a < \infty \) and to avoid trivialities we assume \(a > 0\). Hoeffding [18, Thm 1] proved that for \(a \le x < n\)

$$\begin{aligned} \mathbb {P}(X \ge x) \le \left( \frac{a}{x}\right) ^x \left( \frac{n-a}{n-x}\right) ^{n-x}. \end{aligned}$$

The inequality

$$\begin{aligned} \left( \frac{n-a}{n-x}\right) ^{n-x} \le e^{x-a} \end{aligned}$$

for \(0 < x < n\) (see for example [16]), and the fact that \(\mathbb {P}(X \ge x) = 0\) for \(x > n\), together give the upper tail bound

$$\begin{aligned} \mathbb {P}(X \ge x) \le \left( \frac{a}{x}\right) ^x e^{x-a} \quad \text{ for } \text{ all } x \ge a. \end{aligned}$$
(41)

A similar argument with \(X_i\) replaced by \(1-X_i\) and \(X\) replaced by \(n-X\) gives the lower tail bound

$$\begin{aligned} \mathbb {P}(X \le x) \le \left( \frac{a}{x}\right) ^x e^{x-a} \quad \text{ whenever } 0 \le x \le a. \end{aligned}$$
(42)

Notice that the right sides of (41) and (42) do not depend on the number \(n\) of summands in \(X\). Other related inequalities, also referred to as Hoeffding or Chernoff–Hoeffding bounds, involve the parameter \(n\).

From Proposition 5.1 we know that any random variable \(X\) of the form above admits a 1-bounded size bias coupling. Therefore the Hoeffding bounds (41) and (42) are a special case of the bounds (10) and (11) in our Theorem 1.3. Our best upper tail bound, given by (3), is smaller than the Hoeffding upper tail bound (41) by a factor \({\big (\frac{a+1/2}{x+1/2}\big )^{1/2}}\). Moreover our results have a broader scope, applying to any random variable \(X\) which admits a 1-bounded size bias coupling. In particular they apply to sums of independent nonnegative random variables such as Uniform\([0,4]\) and Lévy(\([0,1]\)) (as discussed in Examples 5.4 and 5.2).
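As a concrete illustration of the last comparison (our own sketch; the Binomial parameters are arbitrary), one can tabulate the product bound (3), the Hoeffding bound (41), and the Hoeffding bound multiplied by the factor \(\sqrt{(a+1/2)/(x+1/2)}\):

```python
import math

n, p = 100, 0.05
a = n * p                        # X ~ Binomial(n, p) admits a 1-bounded size bias coupling

def product_upper(x, a):         # u(x, a, 1) from (3)
    return math.prod(a / (x - i) for i in range(math.floor(x - a) + 1))

def hoeffding_upper(x, a):       # (41): (a/x)^x e^{x-a}
    return math.exp(x * math.log(a / x) + x - a)

for x in [8.0, 12.0, 20.0]:
    factor = math.sqrt((a + 0.5) / (x + 0.5))
    print(x, product_upper(x, a), hoeffding_upper(x, a), factor * hoeffding_upper(x, a))
```

In each row the product bound is at most the factor times the Hoeffding bound, in line with Proposition 8.1.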