## 1 Introduction

The task of approximating the distribution of a sum of independent random variables lies at the heart of probability theory. The central role is played by the normal approximation. However, in situations where one deals with rare events, Poisson or compound Poisson approximation may be preferable (cf. [4, 5, 35, 36, 40]).

Interest in the topic of Poisson/compound Poisson approximation arises in connection with applications in extreme value theory, insurance, reliability theory, etc. (cf. [6, 7, 29, 33]). The theory of Poisson/compound Poisson approximation underpins extreme value theory [29, 33].

Let $$X_1,X_2,\ldots$$ be integer-valued non-negative random variables (r.v.s). Denote $$S_0=0,$$

\begin{aligned} S_n = X_1\!+\!\cdots \!+\!X_n,\quad \lambda \!\equiv \!\lambda (n)\!=\!I\!\!ES_n\, \quad (n\!>\!1). \end{aligned}

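For example, given r.v.s $$\xi _1,\xi _2,\ldots$$, one may deal with exceedances of a chosen “threshold” x. Set $$X_i = 1\!\!I\{\xi _i\!>\!x\}\ (i\!\ge \!1).$$ Then

\begin{aligned} S_n(x) = \sum _{i=1}^n 1\!\!I\{\xi _i\!>\!x\} \end{aligned}

denotes the number of exceedances of threshold x. In particular,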

\begin{aligned} \Big \{\!\max _{1\le i\le n} \xi _i\!\le \!x\Big \}=\{S_n(x)\!=\!0\},\ \ \{\xi _{k,n}\!\le \!x\} = \{S_n(x)\!<\!k\} \quad (k\!\in \!I\!\!N), \end{aligned}
(1)

where $$\xi _{k,n}$$ denotes the k-th largest element among $$\xi _1,\ldots ,\xi _n$$. Thus, results concerning the distribution of sample extremes can be derived from the corresponding results concerning $$\mathcal{L}(S_n)$$.
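Relation (1) can be checked numerically on simulated data. The following is a minimal Python sketch, assuming that an exceedance means a strict inequality $$\xi _i\!>\!x$$; the function names, the Gaussian sample and the threshold are illustrative choices and not part of the results discussed here.

```python
import random

def exceedance_count(sample, x):
    """S_n(x): number of sample elements strictly exceeding the threshold x."""
    return sum(1 for v in sample if v > x)

def kth_largest(sample, k):
    """xi_{k,n}: the k-th largest element (k = 1 gives the maximum)."""
    return sorted(sample, reverse=True)[k - 1]

random.seed(0)  # fixed seed, so that the run is reproducible
sample = [random.gauss(0.0, 1.0) for _ in range(200)]
x = 1.5  # hypothetical threshold
s = exceedance_count(sample, x)

# First identity in (1): the maximum is below x iff there are no exceedances.
max_check = (max(sample) <= x) == (s == 0)

# Second identity in (1): the k-th largest is below x iff fewer than k exceedances.
checks = [(kth_largest(sample, k) <= x) == (s < k) for k in range(1, 11)]
```

Both identities are deterministic, so `max_check` and every entry of `checks` hold for any sample and any threshold.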

In applications, indicators may be dependent. A well-known approach (Bernstein’s blocks method [14]) consists of grouping observations into blocks which can be considered almost independent. The number of rare events in a block is an integer-valued random variable; hence, the total number of rare events is a sum of almost independent integer-valued r.v.s.

In (re)insurance applications, the sum of integer-valued r.v.s allows one to account for the total loss from the claims $$\{Y_i\}$$ exceeding threshold x [22]. More information concerning applications can be found in [6, 7, 22, 29, 35, 36].

The distribution of a sum $$S_n(x)$$ of integer-valued non-negative random variables can often be approximated by a Poisson or compound Poisson law. In the early 1950s, Kolmogorov formulated the task of evaluating the accuracy of approximation of the distribution of a sum $$S_n$$ of independent and identically distributed (i.i.d.) r.v.s by infinitely divisible distributions (Kolmogorov’s uniform approximation problem). The topic has attracted a lot of attention among researchers (see, e.g., [4, 19, 35, 36, 40, 49] and references therein).

From a theoretical point of view, the question concerning the accuracy of Poisson/compound Poisson approximation is a particular case of Kolmogorov’s problem. Besides, it was noticed that estimates of the accuracy of approximation to the Binomial distribution can provide important insights in other areas of probability theory [32, 38]. In a sense, the Binomial distribution plays the role of a “testing stone” [4].

Note that there is a strong connection between the topics of Poisson and compound Poisson approximation [35, 36, 38, 48]; the latter plays a special role in approximating $$\mathcal{L}(S_n)$$ by infinitely divisible laws since the class of infinitely divisible distributions coincides with the class of weak limits of compound Poisson distributions [26, Theorem 26].

In a range of situations, both normal and (compound) Poisson approximations can be applicable (cf. [4, 40]). Due to the complex structure of the compound Poisson distribution, in applications one would prefer pure Poisson approximation where possible.

One can choose between possible types of approximation by comparing estimates of the accuracy of approximation. Naturally, one would choose according to the sharpest estimate, hence the need for sharp estimates of the accuracy of approximation with explicit constants.

### 1.1 Notation

Let $$\mathcal{S}$$ denote the class of measurable functions taking values in [0; 1]. Then

\begin{aligned} d_{_{TV}}\!\left( X;Y\right) \equiv d_{_{TV}}\!(\mathcal{L}(X);\mathcal{L}(Y)) = \sup _{h\in \mathcal{S}}\,(I\!\!Eh(X)\!-\!I\!\!Eh(Y)) \end{aligned}

denotes the total variation distance between the distributions of r.v.s X and Y.

We denote by

\begin{aligned} {d_{_K}}\!(X;Y) \equiv {d_{_K}}\!(\mathcal{L}(X);\mathcal{L}(Y)) = \sup _x|F_X(x)-F_Y(x)| \end{aligned}

the uniform (Kolmogorov’s) distance between the distributions of random variables X and Y with distribution functions (d.f.s) $$F_X$$ and $$F_Y$$.

The Gini–Kantorovich distance between the distributions of r.v.s X and Y with finite first moments (known also as the Kantorovich–Wasserstein distance) is

\begin{aligned} d_{_G}(X;Y) \equiv d_{_{G}}(\mathcal{L}(X);\mathcal{L}(Y)) = \sup _{g\in \mathcal{L}}\left| I\!\!Eg(X)-I\!\!Eg(Y)\right| , \end{aligned}

where $$\mathcal{L} = \{g\!:|g(x)\!-\!g(y)|\le |x\!-\!y|\}$$ is the set of Lipschitz functions. Note that

\begin{aligned} d_{_G} \!\left( X;Y\right) = \inf _{X',Y'} I\!\!E|X'-Y'|, \end{aligned}

where the infimum is taken over all random pairs $$(X',Y')$$ such that $$\mathcal{L}(X')=\mathcal{L}(X)$$ and $$\mathcal{L}(Y')=\mathcal{L}(Y)$$.

Set $$\Vert f\Vert = \sup _k |f(k)|.$$ We denote by $$\Vert \cdot \Vert _{_1}$$ the $$L_1$$-norm of a function. Given a discrete r.v. Y,  we denote

\begin{aligned} I\!\!P_Y = I\!\!P(Y\!=\!\cdot ). \end{aligned}

In the case of discrete distributions, it is natural to exploit the point metric

\begin{aligned} d_\textrm{o}(X;Y) \equiv d_\textrm{o}(\mathcal{L}(X);\mathcal{L}(Y)) = \sup _k |I\!\!P(X\!=\!k)\!-\!I\!\!P(Y\!=\!k)| = \Vert I\!\!P_X\!-\!I\!\!P_Y \Vert , \end{aligned}

where $$\Vert \cdot \Vert$$ denotes the sup-norm.
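For integer-valued laws, the three distances $$d_{_{TV}},\ d_K,\ d_\textrm{o}$$ can be computed directly from the probability mass functions. The following Python sketch is an illustration only: the helper names, the truncation level `kmax` and the values of n and p are assumptions made for the example. It uses the standard fact that for discrete laws the supremum defining $$d_{_{TV}}$$ equals half the $$L_1$$-distance between the mass functions.

```python
import math

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax (requires lam > 0)."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(kmax + 1)]

def binomial_pmf(n, p, kmax):
    """Binomial(n, p) probabilities for k = 0..kmax (zero beyond n)."""
    return [math.comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0
            for k in range(kmax + 1)]

def d_tv(f, g):
    """Total variation distance: half the L1 distance between mass functions."""
    return 0.5 * sum(abs(a - b) for a, b in zip(f, g))

def d_k(f, g):
    """Uniform (Kolmogorov) distance: largest gap between distribution functions."""
    gap, cum_f, cum_g = 0.0, 0.0, 0.0
    for a, b in zip(f, g):
        cum_f += a
        cum_g += b
        gap = max(gap, abs(cum_f - cum_g))
    return gap

def d_o(f, g):
    """Point metric: largest gap between the mass functions."""
    return max(abs(a - b) for a, b in zip(f, g))

n, p = 50, 0.1   # illustrative parameters
kmax = 80        # truncation level; the neglected Poisson tail is negligible here
f = binomial_pmf(n, p, kmax)
g = poisson_pmf(n * p, kmax)
tv, kol, pt = d_tv(f, g), d_k(f, g), d_o(f, g)
```

In this example one can observe the general relations $$d_K\le d_{_{TV}}$$ and $$d_\textrm{o}\le d_{_{TV}}$$, with the point metric being the smallest of the three.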

Initially, estimates of the accuracy of Poisson approximation to the Binomial distribution $$\textbf{B}(n,p)$$ were established in terms of the uniform distance $$d_K$$ and the total variation distance $$d_{_{TV}}\!$$ [27, 40, 47]. Metrics $$d_{_{TV}}\!,\ d_K,\ d_\textrm{o}$$ have obvious merits. For instance, they are shift- and scale-invariant. The $$O(n^{-2/3})$$ estimate of the accuracy of approximation in Kolmogorov’s problem holds in terms of $$d_K$$ [1,2,3] (and hence in terms of $$d_\textrm{o}$$) but generally not in terms of $$d_{_{TV}}\!$$ [49]. Bounds in terms of $$d_\textrm{o}$$ can be sharper than those in terms of the uniform and the total variation distances.

In extreme value theory, one is interested in the distribution of the k-th largest sample element $$X_{n:k}$$. Metric $$d_\textrm{o}$$ appears more suitable than $$d_K$$ and $$d_{_{TV}}\!$$ if one evaluates probabilities like $$I\!\!P(S_n(x)\!<\!k)$$ for “small” k,  cf. (1). A similar observation can be made concerning the length of the longest head run (LLHR), the length of the longest match pattern (LLMP), etc. (see [35]). Another advantage of using $$d_\textrm{o}$$ in comparison with $$d_{_{TV}}\!,\ d_K$$ is the higher rate of approximation (cf. (4), (8)).

### 1.2 Independent Bernoulli r.v.s

Hereinafter, $$\{X_i\}$$ are independent non-negative integer-valued random variables; in formulas, multiplication takes precedence over division (e.g. $$3\theta /4e$$ stands for $$3\theta /(4e)$$). W.l.o.g. we may assume that $$I\!\!EX_i\!>\!0\ (\forall i)$$.

Let $$\pi _\lambda$$ denote a Poisson $$\mathbf{\Pi }(\lambda )$$ r.v. Many authors worked on the problem of evaluating the accuracy of approximation

\begin{aligned} \,S_n\approx \pi _\lambda \, \end{aligned}

in the case where $$\{X_i\}$$ are 0-1 r.v.s (see, e.g., [4, 7, 35, 36] and references therein). The problem goes back to Prokhorov [40].

Estimates of the accuracy of Poisson approximation to the distribution of a sum of independent 0-1 random variables in terms of the uniform distance $$d_K$$ and the total variation distance $$d_{_{TV}}\!$$ have been derived by Kolmogorov [27, Lemma 5], Tsaregradskii [47], LeCam [30], Kerstan [25], Romanowska [41], Shorgin [46], Barbour and Eagleson [12] and other authors (see, e.g., [20, 34, 39, 45]). Concerning estimates in terms of some other metrics, see [15, 21, 35,36,37] and references therein.

In the case of independent 0-1 r.v.s, estimates with correct (the best possible) constants at the leading terms have been found by Roos [45]:

\begin{aligned} d_{_{TV}}\!\!\left( S_n;\pi _\lambda \right) \!\le & {} \! 3\theta /4e (1\!-\!\sqrt{\theta }\,)^{3/2},\end{aligned}
(2)
\begin{aligned} d_{_K}(S_n;\pi _\lambda ) \!\le & {} \! \theta /2e + 1.2\theta \sqrt{\theta }/(1\!-\!\sqrt{\theta }), \end{aligned}
(3)
\begin{aligned} d_\textrm{o}(S_n;\pi _\lambda ) \!\le & {} \! \theta (3/2e)^{3/2}/2\lambda ^{1/2} + \frac{\theta \sqrt{\theta }}{3\sqrt{\lambda }} \frac{6\!-\!4\sqrt{\theta }}{(1\!-\!\sqrt{\theta })^2}, \end{aligned}
(4)

where $$\theta =\sum ^n_{i=1} p_i^2/\lambda ,\,p_i=I\!\!P(X_i\!=\!1)\ \ (i\!\ge \!1)$$; constants at the leading terms in (2)–(4) cannot be improved. Note that $$3/4e\approx 0.276,$$ $$1/2e \le 0.184,\ \frac{1}{2} (3/2e)^{3/2}\le 0.205$$.

Estimate (3) has a sharper constant at the leading term than that in (2), estimate (4) has a sharper rate of decay if $$\lambda \!\equiv \!\lambda (n)\!\rightarrow \!\infty$$ as $$n\!\rightarrow \!\infty .$$
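To see how (2)–(4) compare with the actual distances, one may evaluate both sides numerically. The sketch below is written in Python under stated assumptions: the probabilities $$p_i$$ are an arbitrary illustrative choice, the helper names are hypothetical, and the right-hand sides are read with multiplication taking precedence over division (so that, e.g., the bound in (2) is $$3\theta$$ divided by $$4e(1\!-\!\sqrt{\theta })^{3/2}$$).

```python
import math

def bernoulli_sum_pmf(ps):
    """Law of S_n = X_1 + ... + X_n for independent Bernoulli(p_i) summands."""
    pmf = [1.0]
    for p in ps:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)      # X_i = 0
            nxt[k + 1] += mass * p        # X_i = 1
        pmf = nxt
    return pmf

def poisson_pmf(lam, kmax):
    """Poisson(lam) probabilities for k = 0..kmax (requires lam > 0)."""
    return [math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
            for k in range(kmax + 1)]

def roos_bounds(ps):
    """Right-hand sides of (2), (3), (4), reading a/bc as a/(bc)."""
    lam = sum(ps)
    theta = sum(p * p for p in ps) / lam
    rt = math.sqrt(theta)
    b_tv = 3 * theta / (4 * math.e * (1 - rt) ** 1.5)
    b_k = theta / (2 * math.e) + 1.2 * theta * rt / (1 - rt)
    b_o = (theta * (3 / (2 * math.e)) ** 1.5 / (2 * math.sqrt(lam))
           + theta * rt / (3 * math.sqrt(lam)) * (6 - 4 * rt) / (1 - rt) ** 2)
    return b_tv, b_k, b_o

ps = [0.05 + 0.002 * i for i in range(50)]   # illustrative p_1, ..., p_50
lam = sum(ps)
s = bernoulli_sum_pmf(ps) + [0.0] * 50       # pad so both laws live on 0..100
pi = poisson_pmf(lam, len(s) - 1)

tv = 0.5 * sum(abs(a - b) for a, b in zip(s, pi))
pt = max(abs(a - b) for a, b in zip(s, pi))
kol, cum_s, cum_pi = 0.0, 0.0, 0.0
for a, b in zip(s, pi):
    cum_s += a
    cum_pi += b
    kol = max(kol, abs(cum_s - cum_pi))

b_tv, b_k, b_o = roos_bounds(ps)
```

With these parameters each computed distance should lie below the corresponding bound, and the point metric is noticeably smaller than the other two distances, in line with the extra factor $$\lambda ^{-1/2}$$ in (4).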

Estimates of the accuracy of shifted (translated) Poisson approximation to the distribution of a sum of independent 0-1 r.v.s have been given by Barbour and Xia [9], Čekanavičius and Vaitkus [18], Barbour and Čekanavičius [11], Röllin [43], Novak [34, 37]. Kruopis [28] has evaluated the accuracy of shifted Poisson approximation to $$\mathcal{L}(S_n)$$ in terms of $$d_{\textrm{o}}\!:$$

\begin{aligned} d_{\textrm{o}}(S_n;Y)\le \sum _{j=1}^n p_j^2 \min \left\{ 1; \sqrt{\frac{e}{\pi }}\, \bigg (\sum _{j=1}^np_j(1\!-\!p_j)\bigg )^{-3/2} \right\} , \end{aligned}
(5)

where $$Y = \lfloor \lambda _2\!+\!0.5\rfloor + \pi _{\lambda -\lfloor \lambda _2+0.5\rfloor }$$, $$\lambda _2 \!=\! \sum _{j=1}^n p_j^2$$. Note that $$I\!\!EY\!=\!\lambda ,$$ $$|{\textrm{var}\,}Y\!-\!{\textrm{var}\,}S_n|\!\le \!1/2.$$ In the case of the Binomial distribution $$\textbf{B}(n,p)$$, the right-hand side (r.h.s) of (5) is of order $$\sqrt{p/n}$$.

### 1.3 Integer-valued r.v.s

Let $$\{X_i\}_{i\ge 1}$$ be independent non-negative integer-valued r.v.s. The problem of evaluating the accuracy of Poisson approximation to the distribution of a sum of independent non-negative integer-valued r.v.s has been considered, e.g. in [8, 11, 13, 21, 25, 33, 37]; an overview of the results on the topic can be found in [35, 36].

Röllin [43, formula (2.13)], assuming that third moments are finite, states that

\begin{aligned} d_{\textrm{o}}(S_n;[\lambda \!-\!\sigma ^2]\!+\!\pi _{\sigma ^2+\{\lambda -\sigma ^2\}}) \le \Bigg (\!2\!+\!d\,' \sum _{i=1}^n\psi _i\!\Bigg )/\sigma ^2, \end{aligned}
(6)

where $$\sigma ^2=\hbox {var }S_n,$$ $$\sigma ^2_i=\hbox {var }X_i,$$ $$\psi _i = \sigma ^2_i I\!\!EX_i(X_i\!-\!1) \!+\! |I\!\!EX_i\!-\!\sigma ^2_i| I\!\!E(X_i\!-\!1)(X_i\!-\!2) \!+\! I\!\!E|X_i(X_i\!-\!1)(X_i\!-\!2)|,$$ $$d\,' = \max _{i\le n} \Vert I\!\!P_{S_{n,i}+2}-2I\!\!P_{S_{n,i}+1}+I\!\!P_{S_{n,i}}\Vert _{_1}/2,$$ $$S_{n,i}=S_n\!-\!X_i$$, $$[x]=\max \{k\!\in \!\textbf{Z}\!: k\!\le \!x\},$$ $$\{x\}=x\!-\![x]$$. In the case of the Binomial distribution $$\textbf{B}(n,p)$$, bound (6) is of order 1/np.

Tsaregradskii [47] has shown that the rate of the accuracy of compound Poisson approximation to the Binomial distribution $$\textbf{B}(n,p)$$ in terms of the uniform distance is O(1/np). Presman [38] has shown that

\begin{aligned} d_{_{TV}}\!(\textbf{B}(n,p);P_{n,p}) \le C \min \big \{ np^2; p; \max \{(np)^{-2};n^{-1}\} \big \} \qquad (0\!\le \!p\!\le \!1/2), \end{aligned}

where C is an absolute constant and $$P_{n,p}$$ is a particular compound Poisson distribution. Hence,

\begin{aligned} \sup _{p\le 1/2} d_{_{TV}}\!(\textbf{B}(n,p);P_{n,p}) = O(n^{-2/3}). \end{aligned}
(7)

According to Čekanavičius [16], there exists an absolute constant C such that

\begin{aligned} d_{\textrm{o}}(\textbf{B}(n,p);P_{n,p}) \le C \max \{(np)^{-2};n^{-1}\} (np)^{-1/2} \qquad (n^{-2/3}\!\le \!p\!\le \!1/2). \end{aligned}
(8)

In the case of i.i.d.r.v.s $$X,X_1,X_2,\ldots$$ obeying $$I\!\!EX^4\!<\!\infty ,$$ Čekanavičius [17] has shown that the accuracy of compound Poisson approximation to $$\mathcal{L}(S_n)$$ is $$C_Xn^{-3/2}$$, where $$C_X$$ depends on $$\mathcal{L}(X)$$.

The problem of establishing an estimate of the accuracy of Poisson approximation to the distribution of a sum of independent non-negative integer-valued r.v.s in terms of the point metric with a correct constant at the leading term remained open for a long while. In particular, an open question was whether $$\frac{1}{2} (3/2e)^{3/2}$$ would remain the best possible constant at the leading term in the case of integer-valued r.v.s. We give an affirmative answer to that question below. We generalise and sharpen the corresponding results from [16, 21, 43, 45].

Theorem 1 presents an estimate of the accuracy of Poisson approximation to $$\mathcal{L}(S_n)$$ in terms of the point metric with a correct constant at the leading term. Theorem 2 provides a first-order asymptotic expansion. Theorem 3 presents an estimate of the accuracy of shifted Poisson approximation. Theorem 4 provides an estimate of the accuracy of compound Poisson approximation in terms of the point metric. Theorems 1–3 only assume that the second moments are finite, Theorem 4 does not impose moment requirements, and the constants are explicit.

## 2 Results

Let $$\{X_i\}_{i\ge 1}$$ be independent non-negative integer-valued r.v.s.

First, we present an estimate of the accuracy of pure Poisson approximation.

### 2.1 Poisson Approximation

If random variables $$X,\xi ,\eta$$ have finite second moments, let

\begin{aligned} \kappa _{_X} = I\!\!EX\!-\!\textrm{var}X,\ \gamma _{\xi ,\eta } = I\!\!E|\xi (\xi \!-\!1)-\eta (\eta \!-\!1)|. \end{aligned}

Denote $$X_0\!=\!0,$$ $$\varepsilon ^*_\lambda = 1\!\wedge \!1/\sqrt{2\pi [\lambda ]}$$. If $$i\!\in \!\{0,1,\ldots ,n\},$$ let $$S_{n,i}=S_n\!-\!X_i$$, $$\lambda _i\!=\!I\!\!ES_{n,i}$$,

\begin{aligned} u_i= & {} 1\!-\!d_{_{TV}}\!(X_i;X_i\!+\!1),\ \ U\!=\!\sum ^n_{i=1}u_i,\ \ u^*=\max _{1\le i\le n} u_i,\ \ U_* = U\!-\!u^*,\\ \varepsilon _{i,n}= & {} \varepsilon _{i,n}^\textrm{o}\wedge \! \left( \varepsilon ^*_{\lambda _i}\!+\!2\varepsilon _{i,n}^+ \right) ,\ \ \varepsilon _{i,n}^\textrm{o}= 1 \!\wedge \! \sqrt{2/\pi }\, \big (1/4\!+\!U\!-\!u_i\big )^{-1/2},\\ \varepsilon _{i,n}^+= & {} \frac{1\!-\!e^{-\lambda }}{\lambda _i} \!\sum ^n_{j=1} d_{_{G}}(X_j;X_j^*) I\!\!EX_j,\\ r_{i,n}^*= & {} \min \!\Big \{ \varepsilon _{i,n}; \frac{_8}{^\pi } (U^2_*\!+\!(1\!-\!2u^*)U_*\!+\!1/4)^{-1/2} \Big \}, \end{aligned}

where $$X^*_i$$ denotes a random variable with the distribution

\begin{aligned} I\!\!P(X^*_i\!=\!m) = (m\!+\!1)I\!\!P(X_i\!=\!m\!+\!1)/I\!\!EX_i \qquad (m\!\ge \!0) \end{aligned}
(9)

(cf. [33, Ch. 4.4]). Note that $$U^2_*\!+\!(1\!-\!2u^*)U_*\!+\!1/4\!\ge \!0$$. If $$u^*\!\le \!1/4,$$ then $$U^2_*\!+\!(1\!-\!2u^*)U_*\!+\!1/4\!\ge \!U^2$$, and $$r_{i,n}^* \le \min \!\big \{ \varepsilon _{i,n};8/\pi U\big \}$$.
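The transformation (9) and the quantity $$u_i$$ are easy to evaluate for a law with finite support. The following Python sketch is illustrative: the function names and the two test laws (a Bernoulli law and a three-point law) are assumptions made for the example. It uses the identity $$1\!-\!d_{_{TV}}\!(X;X\!+\!1)=\sum _k \min \{I\!\!P(X\!=\!k);I\!\!P(X\!=\!k\!-\!1)\}$$, which holds for any integer-valued r.v.

```python
def star_pmf(pmf):
    """Law of X* from (9): P(X* = m) = (m + 1) P(X = m + 1) / E X."""
    mean = sum(k * q for k, q in enumerate(pmf))
    return [(m + 1) * pmf[m + 1] / mean for m in range(len(pmf) - 1)]

def u_value(pmf):
    """u = 1 - d_TV(X; X + 1), the overlap of the law of X with its unit shift."""
    return sum(min(a, b) for a, b in zip(pmf[1:], pmf[:-1]))

p = 0.3  # illustrative parameter

# Bernoulli B(p): X* is degenerate at 0 and u = min(p, 1 - p).
bern = [1 - p, p]
bern_star = star_pmf(bern)
bern_u = u_value(bern)

# Three-point law on {0, 1, 2}: here X* turns out to be Bernoulli B(p).
three = [1 - p + p * p / 2, p - p * p, p * p / 2]
three_star = star_pmf(three)
three_u = u_value(three)
```

For the Bernoulli law, $$X^*\!\equiv \!0$$, so that $$d_{_G}(X;X^*)=I\!\!EX$$; for the three-point law, $$u$$ evaluates to $$p\!-\!p^2/2$$.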

Set

\begin{aligned} \varepsilon _1^\textrm{o}= & {} \lambda ^{-1} \sum ^n_{i=1} \min \{ 4d_{_G}(X_i;X_i^*) \varepsilon _{i,n}; \gamma _{X_i,X_i^*} r_{i,n}^* \} I\!\!EX_i,\ \ \varepsilon ^\textrm{o}_+ = 2 \sum ^n_{i=1} |\kappa _{X_i}| \varepsilon _{i,n}/\lambda ,\\ \varepsilon _2^\textrm{o}= & {} 2\lambda ^{-1} \sum ^n_{i=1} |\kappa _{X_i}| I\!\!EX_i r_{i,n}^*,\ \ \varepsilon _3^\textrm{o}= 2\lambda ^{-1} |\kappa _{S_n}| (\varepsilon _1^\textrm{o}+ \varepsilon ^\textrm{o}_+). \end{aligned}

### Theorem 1

If $$X_1,\ldots ,X_n$$ are independent non-negative integer-valued random variables with finite second moments, then

\begin{aligned} d_\textrm{o}(S_n;\pi _\lambda ) \!\le & {} \! c_\textrm{o}|\kappa _{S_n}| \lambda ^{-3/2} + (1\!-\!e^{-\lambda })(\varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}+\varepsilon _3^\textrm{o}) \qquad (n\!>\!1), \end{aligned}
(10)

where $$c_\textrm{o}= \frac{1}{2} (3/2e)^{3/2}$$.

If $$X,X_1,\ldots ,X_n$$ are i.i.d.r.v.s, set $$\theta ^*\!=\!|\kappa _X|/I\!\!EX,$$

\begin{aligned} \varepsilon _{_X}^\textrm{o}= & {} \tilde{\varepsilon }_1^\textrm{o}\!+\! 16|\kappa _X|/\pi (n\!-\!1)u_1 \!+\! 2\theta ^* ( 2\theta ^*\varepsilon _{1,n}\!+\tilde{\varepsilon }_1^\textrm{o}),\\ \tilde{\varepsilon }_1^\textrm{o}= & {} 2 d_{_G}(X;X^*) \varepsilon _{1,n} \!\wedge \! 8\gamma _{X,X^*}/\pi (n\!-\!1)u_1. \end{aligned}

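Then (10) becomes

\begin{aligned} d_\textrm{o}(S_n;\pi _\lambda ) \le c_\textrm{o}\theta ^*\big /\sqrt{nI\!\!EX}\, + (1\!-\!e^{-\lambda }) \varepsilon _{_X}^\textrm{o}\qquad (n\!>\!1). \end{aligned}
(10$$^*$$)

According to [45], constant $$c_\textrm{o}$$ in (10), (10$$^*$$) cannot be improved.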

Note that the moment assumption in Theorem 1 can be relaxed at the cost of adding an extra term if one uses truncation at some levels $$\{K_i\}$$ (i.e. switches from $$\{X_i\}$$ to $$\{X_i'\},$$ where $$X_i'$$ denotes $$X_i$$ truncated at level $$K_i$$), since $$d_{_{TV}}\!((X_1,\ldots ,\!X_n);(X_1',\ldots ,\!X_n')) \le \sum ^n_{i=1} \!I\!\!P(X_i\!>\!K_i)$$.

### Example 1

Let $$\{X_i\}$$ be independent Bernoulli $$\textbf{B}(p_i)$$ random variables, where $$p_i\!\in \![0;1/2]\ (i\!\ge \!1).$$ Then $$\lambda =\sum ^n_{i=1}p_i$$, $$\kappa _{S_n} = \sum ^n_{i=1}p_i^2$$, $$\varepsilon _1^\textrm{o}\!=\!0,$$ and (10) yields

\begin{aligned} d_\textrm{o}(S_n;\pi _\lambda )\le & {} c_\textrm{o}\sum ^n_{i=1}p_i^2 \Bigg (\sum ^n_{i=1}p_i\Bigg )^{-3/2} \nonumber \\{} & {} + 2\lambda ^{-1} (1\!-\!e^{-\lambda }) \Bigg (\sum ^n_{i=1}p_i^3r_{i,n}^* + \sum ^n_{i=1}p_i^2 \varepsilon _+^\textrm{o}\Bigg ). \end{aligned}
(11)

The constant at the leading term in (11) is sharper than that in (5).

In the case of the Binomial distribution $$\textbf{B}(n,p)$$, one has $$\kappa _X\!=\!p^2$$, $$u_1\!=\!p$$,

\begin{aligned} \varepsilon _2^\textrm{o}\le 16p/\pi (n\!-\!1),\ \ \varepsilon ^\textrm{o}_+ = 2p\varepsilon _{1,n},\ \ \varepsilon _3^\textrm{o}= 4p^2 \varepsilon _{1,n},\ \ \varepsilon _{1,n} \le \tilde{\varepsilon }_n, \end{aligned}

where $$\tilde{\varepsilon }_n = \min \!\Big \{ \sqrt{2/\pi } \Big /\!\Big (1/4\!+\!(n\!-\!1)p\Big )^{1/2}; 1/\sqrt{2\pi [(n\!-\!1)p]}\,+2(1\!-\!e^{-np})p/(1\!-\!1/n) \Big \}.$$ Then

\begin{aligned} d_\textrm{o}(\textbf{B}(n,p);\mathbf{\Pi }(np))\le & {} \frac{_1}{^2} (3/2e)^{3/2} \sqrt{p/n}\nonumber \\{} & {} + 4(1\!-\!e^{-np}) p \left( \sqrt{2p/\pi (n\!-\!1)} + 4/\pi (n\!-\!1)\right) . \end{aligned}
(12)

The rate of the second-order term in (12) is sharper than that in (4) if $$p\!\equiv \!p(n)\!\rightarrow \!0$$ as $$n\!\rightarrow \!\infty$$. Note that $$\sup _k I\!\!P(S_n\!=\!k) = O(1/\sqrt{np}\,),$$ cf. [31, 42].
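Bound (12) can be compared with the exact point distance between $$\textbf{B}(n,p)$$ and $$\mathbf{\Pi }(np)$$. The Python sketch below is an illustration under stated assumptions: the function names and the parameter pairs are hypothetical, and the right-hand side of (12) is read with multiplication taking precedence over division.

```python
import math

C_O = 0.5 * (3 / (2 * math.e)) ** 1.5   # the constant c_o, about 0.205

def poisson_pmf(lam, k):
    """P(pi_lam = k) for k >= 0 and lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def binomial_pmf(n, p, k):
    """P(B(n,p) = k); zero for k > n."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k) if k <= n else 0.0

def exact_point_distance(n, p):
    """d_o(B(n,p); Pi(np)); beyond k = n + 1 the Poisson mass only decreases."""
    lam = n * p
    return max(abs(binomial_pmf(n, p, k) - poisson_pmf(lam, k))
               for k in range(n + 2))

def rhs_12(n, p):
    """Right-hand side of (12)."""
    leading = C_O * math.sqrt(p / n)
    second = 4 * (1 - math.exp(-n * p)) * p * (
        math.sqrt(2 * p / (math.pi * (n - 1))) + 4 / (math.pi * (n - 1)))
    return leading + second

cases = [(50, 0.1), (200, 0.02), (100, 0.5)]   # illustrative (n, p) pairs
results = [(exact_point_distance(n, p), rhs_12(n, p)) for n, p in cases]
```

Each pair in `results` holds the exact distance and the bound; for small p the exact distance is expected to be close to the leading term $$c_\textrm{o}\sqrt{p/n}$$.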

### Example 2

Let $$X,X_1,X_2,\ldots$$ be independent geometric $$\mathbf{\Gamma _0}(p)$$ r.v.s:

\begin{aligned} I\!\!P(X\!=\!m) = (1\!-\!p)p^m \quad (m\!\ge \!0,\ 0\!\le \!p\!<\!1). \end{aligned}

Then $$S_n$$ is a Negative Binomial $$\mathbf{N\!B}(n,p)$$ r.v.:

\begin{aligned} I\!\!P(S_n\!=\!j) = \frac{\Gamma (n\!+\!j)}{\Gamma (n)\,j!} (1\!-\!p)^np^j \qquad (j\!\ge \!0,\ 0\!\le \!p\!<\!1), \end{aligned}

where $$\Gamma (y) = \int _0^\infty x^{y-1}e^{-x}dx.$$ It is easy to see that $$I\!\!P(X_i^*=m) = (m\!+\!1)p^m(1\!-\!p)^2$$. Hence,

\begin{aligned} X_i^* {\mathop {=}\limits ^{d}}X_i+X. \end{aligned}

Set $$r\!=\!p/(1\!-\!p).$$ Note that $$\lambda \!\equiv \! I\!\!ES_n \!=\!nr$$.

It is easy to check that $$\kappa _X\!=\!-r^2,$$ $$\gamma _{X_1,X_1^*}\!=\!4r^2,$$ $$u_1\!=\!p,$$ $$\theta ^*\!=\!r,$$

\begin{aligned} \varepsilon _X^\textrm{o}\le & {} \tilde{\varepsilon }_1^\textrm{o}+ 32p/\pi q(n\!-\!1) + 2r\!\left( 2r \sqrt{2/\pi (1/4+(n\!-\!1)p)} + \tilde{\varepsilon }_1^\textrm{o}\right) ,\\ \tilde{\varepsilon }_1^\textrm{o}\le & {} 32p/\pi q^2(n\!-\!1),\ \ \varepsilon _{1,n} \le \varepsilon _{n,p}^\star , \end{aligned}

where $$q\!=\!1\!-\!p$$ and $$\varepsilon _{n,p}^\star = \sqrt{2/\pi }\!\Big /\!\Big (1/4\!+\!(n\!-\!1)p\Big )^{1/2}\! \wedge \Big ( 1\big /\sqrt{2\pi [(n\!-\!1)p]}\, + 2r/(1\!-\!1/n) \Big ).$$ Theorem 1 yields

\begin{aligned} d_\textrm{o}(S_n;\pi _\lambda ) \le \frac{_1}{^2} (3/2e)^{3/2} \sqrt{r/n}\, + (1\!-\!e^{-nr}) \varepsilon _X^\textrm{o}. \end{aligned}
(13)

Estimate (13) has a correct constant at the leading term, $$\varepsilon _X^\textrm{o}=O(p\sqrt{p/n}+\!p/n)$$.

### Example 3

Let $$X,X_1,\ldots$$ be i.i.d.r.v.s with the distribution

\begin{aligned} I\!\!P(X\!=\!0)\!=\!1\!-\!p\!+\!p^2/2,\ I\!\!P(X\!=\!1)\!=\!p\!-\!p^2,\ I\!\!P(X\!=\!2)\!=\!p^2/2, \end{aligned}

where $$p\!\in \![0;1/2].$$ Then $$I\!\!EX = \textrm{var}X = p$$, $$\mathcal{L}(X^*)=\textbf{B}(p)$$.

Note that $$I\!\!EX\!=\!I\!\!EX^*=p,$$

\begin{aligned} \kappa _{X}\!=\!0,\ \ \varepsilon _2^\textrm{o}\!=\!\varepsilon _3^\textrm{o}\!=\!0,\ \ u_1 = p\!-\!p^2/2,\ \ \gamma _{X,X^*}\!=\! d_{_{G}}(X;X^*)\!=\! p^2. \end{aligned}

Hence, $$\varepsilon _1^\textrm{o}\le 8p/\pi (1\!-\!p/2)(n\!-\!1),$$ and Theorem 1 yields

\begin{aligned} d_\textrm{o}(S_n;\pi _\lambda ) \!\le \! 8(1\!-\!e^{-np})p/\pi (1\!-\!p/2)(n\!-\!1) \qquad (n\!>\!1). \end{aligned}
(14)

While the rate of the accuracy of approximation in (12) is $$\sqrt{p/n},$$ the rate is p/n in (14).
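The improvement from $$\sqrt{p/n}$$ to p/n in Example 3 can be observed numerically by computing $$\mathcal{L}(S_n)$$ through repeated convolution. The following Python sketch is illustrative only: the helper names and the values of n and p are assumptions, and the right-hand side of (14) is read with multiplication taking precedence over division.

```python
import math

def convolve(f, g):
    """Mass function of the sum of two independent non-negative integer r.v.s."""
    out = [0.0] * (len(f) + len(g) - 1)
    for i, a in enumerate(f):
        for j, b in enumerate(g):
            out[i + j] += a * b
    return out

def poisson_pmf(lam, k):
    """P(pi_lam = k) for k >= 0 and lam > 0."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

n, p = 50, 0.1   # illustrative parameters
x_pmf = [1 - p + p * p / 2, p - p * p, p * p / 2]   # the law of X in Example 3

s = [1.0]
for _ in range(n):
    s = convolve(s, x_pmf)           # after the loop: law of S_n on 0..2n

lam = n * p
mean = sum(k * q for k, q in enumerate(s))
var = sum(k * k * q for k, q in enumerate(s)) - mean ** 2

s_padded = s + [0.0]                 # one extra point beyond the support of S_n
exact = max(abs(q - poisson_pmf(lam, k)) for k, q in enumerate(s_padded))
bound_14 = 8 * (1 - math.exp(-lam)) * p / (math.pi * (1 - p / 2) * (n - 1))
```

Here `mean` and `var` both equal np up to rounding, which reflects $$\kappa _{S_n}\!=\!0$$, and `exact` should lie below `bound_14`.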

### 2.2 A First-Order Asymptotic Expansion

For any function f, denote $$\Delta f(\cdot ) = f(\cdot +\!1)-f(\cdot ).$$ The following theorem provides a first-order asymptotic expansion in terms of $$d_{\textrm{o}}$$.

### Theorem 2

Let $$X_1,\ldots ,X_n$$ be independent non-negative integer-valued random variables with finite second moments. If $$h(\cdot )=1\!\!I\{\cdot =k\},$$ where $$k\!\in \!I\!\!N,$$ then

\begin{aligned} \left| I\!\!Eh(S_n)-I\!\!Eh(\pi _\lambda ) + I\!\!E\Delta ^2 h(\pi _\lambda ) \kappa _{S_n} \!\big /2 \right| \le (1\!-\!e^{-\lambda })\left( \varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}+ \varepsilon _3^\textrm{o}\right) . \end{aligned}
(15)

Let $$\pi ^\star _\lambda$$ denote a random variable with the distribution

\begin{aligned} I\!\!P(\pi ^\star _\lambda \!=\!k) = I\!\!P(\pi _\lambda \!=\!k)(k\!-\!\lambda )^2/\lambda \qquad (k\!\in \!\mathbb {Z}_+), \end{aligned}
(16)

where $$\mathbb {Z}_+=\{0,1,2,\ldots \}$$ is the set of non-negative integer numbers. Note that

\begin{aligned} \lambda I\!\!E\Delta ^2 h(\pi _\lambda ) = I\!\!Eh(\pi _\lambda ^\star )-I\!\!Eh(\pi _\lambda \!+\!1) = I\!\!P(\pi _\lambda ^\star \!=\!k) - I\!\!P(\pi _\lambda \!+\!1\!=\!k) \end{aligned}
(17)

(cf. [34, Remark 1]). Thus,

\begin{aligned}{} & {} \Big | I\!\!P(S_n\!=\!k) \!-\! I\!\!P(\pi _\lambda \!=\!k) + (I\!\!P(\pi _\lambda ^\star \!=\!k) \!-\! I\!\!P(\pi _\lambda \!+\!1\!=\!k)) \kappa _{S_n} \!\big /2\lambda \Big | \\{} & {} \quad \le (1\!-\!e^{-\lambda })\left( \varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}+\varepsilon _3^\textrm{o}\right) . \end{aligned}

In particular,

\begin{aligned} | d_{\textrm{o}}(S_n;\pi _\lambda ) - d_{\textrm{o}}(\pi _\lambda ^\star ;\pi _\lambda \!+\!1) |\kappa _{S_n}\!| \big /2\lambda | \le (1\!-\!e^{-\lambda })\left( \varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}+\varepsilon _3^\textrm{o}\right) . \end{aligned}

According to Roos [44], $$\Big |\lambda \Vert \Delta ^2I\!\!P_{\pi _\lambda }\Vert - 1/\sqrt{2\pi \lambda }\,\Big | \le C/\lambda .$$ Therefore,

\begin{aligned} | d_{\textrm{o}}(S_n;\pi _\lambda ) - |\kappa _{S_n}\!| \lambda ^{-3/2}\big /2\sqrt{2\pi } | = O(\varepsilon _1^\textrm{o}\!+\! \varepsilon _2^\textrm{o}\!+\! \varepsilon _3^\textrm{o}\!+\! |\kappa _{S_n}\!|\lambda ^{-2}). \end{aligned}
(18)
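Identity (17) for the indicator function $$h(\cdot )=1\!\!I\{\cdot =k\}$$ can be verified numerically, since in this case $$I\!\!E\Delta ^2h(\pi _\lambda ) = I\!\!P(\pi _\lambda \!=\!k\!-\!2)-2I\!\!P(\pi _\lambda \!=\!k\!-\!1)+I\!\!P(\pi _\lambda \!=\!k)$$. The Python sketch below is an illustration; the function names and the value of $$\lambda$$ are assumptions made for the example.

```python
import math

def poisson_pmf(lam, k):
    """P(pi_lam = k); zero for negative k."""
    if k < 0:
        return 0.0
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))

def lhs_17(lam, k):
    """lam * E Delta^2 h(pi_lam) for h the indicator of the point k."""
    return lam * (poisson_pmf(lam, k - 2)
                  - 2 * poisson_pmf(lam, k - 1)
                  + poisson_pmf(lam, k))

def rhs_17(lam, k):
    """P(pi*_lam = k) - P(pi_lam + 1 = k), with pi*_lam defined by (16)."""
    star = poisson_pmf(lam, k) * (k - lam) ** 2 / lam
    return star - poisson_pmf(lam, k - 1)

lam = 7.5   # illustrative value
gaps = [abs(lhs_17(lam, k) - rhs_17(lam, k)) for k in range(60)]

# (16) defines a probability distribution because var(pi_lam) = lam.
star_mass = sum(poisson_pmf(lam, k) * (k - lam) ** 2 / lam for k in range(200))

# The quantity compared with 1/sqrt(2*pi*lam) in the estimate attributed to Roos [44].
sup_second_difference = lam * max(abs(lhs_17(lam, k)) / lam for k in range(200))
reference = 1 / math.sqrt(2 * math.pi * lam)
```

The entries of `gaps` vanish up to rounding errors, and `star_mass` equals 1 up to the truncation of the series; `sup_second_difference` and `reference` can be printed to compare the two sides of the asymptotic relation.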

If $$X_1,\ldots ,X_n$$ are i.i.d. Bernoulli $$\textbf{B}(p)$$ r.v.s, then for any $$k\!\in \!I\!\!N$$

\begin{aligned} \Vert I\!\!P(S_n\!=\!k) - I\!\!P(\pi _{np}\!=\!k) + (I\!\!P(\pi _{np}^\star \!=\!k) \!-\! I\!\!P(\pi _{np}\!+\!1\!=\!k))p/2 \Vert \le (1\!-\!e^{-np}) \varepsilon _X^\textrm{o}, \end{aligned}

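where $$\varepsilon _X^\textrm{o}= 4p \left( \sqrt{2p/\pi (n\!-\!1)}\, + 4/\pi (n\!-\!1)\right)$$; hence,

\begin{aligned} \big | d_{\textrm{o}}(\textbf{B}(n,p);\mathbf{\Pi }(np)) - d_{\textrm{o}}(\pi _{np}^\star ;\pi _{np}\!+\!1) p/2 \big | \le (1\!-\!e^{-np}) \varepsilon _X^\textrm{o}. \end{aligned}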

### 2.3 Shifted Poisson Approximation

Set $$\mu \!=\!\textrm{var}\,S_n\!+\!\{\kappa _{S_n}\}.$$ The next theorem deals with shifted Poisson approximation

\begin{aligned} \mathcal{L}(S_n)\approx \mathcal{L}([\kappa _{S_n}]\!+\!\pi _{\mu }). \end{aligned}

Note that $$[\kappa _{S_n}] \!+\! I\!\!E\pi _{\mu } \!= I\!\!ES_n,\ \,|\textrm{var}\, \pi _{\mu }\!-\textrm{var}\,S_n|\!<\!1.$$ W.l.o.g. we may assume that $$\mu \!>\!0$$.

Given a r.v. Y,  we denote $$\bar{Y} = Y\!-\!I\!\!EY.$$ Let $$\sigma ^2\!=\!\textrm{var}\, S_n$$, $$U_i \!=\! U\!-\!u_i,$$

\begin{aligned} \hat{\varepsilon }_\mu= & {} \min \!\Big \{ 2\mu ^{-1}(1\!-\!e^{-\mu })\varepsilon _{0,n}; \bar{\varepsilon }_\mu \Big \},\ \ \bar{\varepsilon }_\mu = 2\varepsilon ^*_{\mu }\sqrt{2/e\mu }\, + 2\mu ^{-1}(1\!-\!e^{-\mu }) \varepsilon ^\star _{\mu },\\ \varepsilon _\mu ^{_\#}= & {} \mu ^{-1} \sum ^n_{i=1} \min \{ 4I\!\!E|\bar{X}_iX_i| \varepsilon _{i,n}^\textrm{o}; I\!\!E|\bar{X}_iX_i| |X_i\!-\!1\!-\!2\tilde{X}_i| r^*_{i,n} \}. \end{aligned}

### Theorem 3

Let $$X_1,\ldots ,X_n$$ be independent non-negative integer-valued random variables with finite second moments. Then

\begin{aligned} d_\textrm{o}(S_n;[\kappa _{S_n}]\!+\!\pi _{\mu })\le & {} |\{\kappa _{S_n}\}| \hat{\varepsilon }_\mu + (1\!-\!e^{-\mu }) \varepsilon _\mu ^{_\#}. \end{aligned}
(19)

Example 1 (continued). Let $$\mathcal{L}(S_n)=\textbf{B}(n,p),$$ where $$p\!\in \![0;1/2].$$ Set $$q=1\!-\!p$$. Clearly,

\begin{aligned} \mu = npq\!+\!\{np^2\},\ \ \kappa _{S_n} = np^2,\ \delta _{X_i}^{(\mu )} = 0,\ \ \varepsilon _{0,n} \le \sqrt{2/\pi np},\ \ r_{1,n}^* \le 8/\pi (n\!-\!1)p. \end{aligned}

Therefore, $$\mu \!\le \!np,$$ $$\hat{\varepsilon }_\mu \!\le \!2\mu ^{-1}\varepsilon _{0,n},$$ and (19) yields $$(n\!>\!1)$$

\begin{aligned} d_\textrm{o}(S_n;[np^2]\!+\!\pi _{\mu }) \le 2\sqrt{2/\pi } (1\!-\!e^{-np}) (npq)^{-3/2} + 16(1\!-\!e^{-np})q/\pi (n\!-\!1). \end{aligned}
(20)

Using (12) when $$p\!<\!1/\sqrt{n}$$ and (20) when $$1/\sqrt{n}\le \!p\!\le \!1/2,$$ we derive

\begin{aligned} \sup _{0\le p\le 1/2}\! d_\textrm{o}(S_n;[np^2]\!+\!\pi _{\mu }) \le \frac{2\sqrt{2/\pi }}{n^{3/4} (1\!-\!1/\sqrt{n}\,)^{3/2}} + \frac{16}{\pi (n\!-\!1)} + \frac{4\sqrt{2/\pi }}{(n\!-\!1)^{5/4}}.\nonumber \\ \end{aligned}
(21)

Indeed, (12) entails

\begin{aligned} d_\textrm{o}(S_n;\pi _{np}) \le c_\textrm{o}n^{-3/4} + 4\sqrt{2/\pi }(n\!-\!1)^{-5/4} + 16/\pi (n\!-\!1)^{3/2}\, \end{aligned}

if $$p\!<\!1/\sqrt{n}$$. If $$p\!\ge \!1/\sqrt{n},$$ then (20) yields

\begin{aligned} d_\textrm{o}(S_n;[np^2]\!+\!\pi _{\mu }) \le 2\sqrt{2/\pi }\, n^{-3/4} (1\!-\!1/\sqrt{n}\,)^{-3/2} + 16/\pi (n\!-\!1), \end{aligned}

and (21) follows. A uniform in $$p\!\in \![0;1/2]$$ estimate of the accuracy of Poisson approximation to the Binomial distribution $$\textbf{B}(n,p)$$ in terms of the point metric seems to be new.

## 3 Compound Poisson Approximation

The topic of compound Poisson approximation to the distribution of a sum of random variables has attracted a lot of attention in the past decades (see, e.g., [19] and references therein).

The topic has applications in extreme value theory, insurance, reliability theory, pattern matching, etc. (cf. [6, 10, 22, 33]). In order to decide if a particular compound Poisson approximation to $$\mathcal{L}(S_n)$$ is applicable, one would require an estimate of the accuracy of compound Poisson approximation indicating that the distance between the two distributions is “small”, hence the need for sharp bounds with explicit constants.

From a theoretical point of view, interest in the topic arises in connection with Kolmogorov’s problem concerning the accuracy of approximation of the distribution of a sum of independent r.v.s by infinitely divisible laws (see [4, 19] and references therein). From a practical point of view, the problem has important applications in insurance, reliability theory, extreme value theory, etc., cf. [19] and references therein.

Estimates of the accuracy of compound Poisson approximation have been derived mainly in terms of the uniform distance and the total variation distance. Very few estimates of the accuracy of compound Poisson approximation are available in terms of the point metric. However, in situations where one needs to evaluate $$I\!\!P(S_n\!<\!k)$$ for a “small” k, the point metric may be advantageous, as estimates in terms of the point metric are expected to have a better rate of approximation than estimates in terms of the uniform or the total variation distances, cf. (2)–(4).

By Khintchine’s formula (see [24, Ch. 2]), the distribution of any random variable X obeys

\begin{aligned} X {\mathop {=}\limits ^{d}}\tau _p X', \end{aligned}
(22)

where $$\tau _p$$ and $$X'$$ are independent r.v.s, $$\mathcal{L}(X') = \mathcal{L}(X|X\!\ne \!0),$$ $$\tau _p$$ is a Bernoulli $$\textbf{B}(p)$$ r.v., $$p=I\!\!P(X\!\ne \!0).$$ Since $$X_1,\ldots ,X_n$$ are independent r.v.s,

\begin{aligned} S_n {\mathop {=}\limits ^{d}}\tau _1 X_1'+\cdots +\tau _n X_n', \end{aligned}
(23)

where $$\tau _1,X_1',\ldots ,\tau _n,X_n'$$ are independent r.v.s, $$p_i=I\!\!P(X_i\!\ne \!0),$$

\begin{aligned} \mathcal{L}(X_i')=\mathcal{L}(X_i|X_i\!\ne \!0),\ \, \mathcal{L}(\tau _i)=\textbf{B}(p_i) \quad (\forall i). \end{aligned}
(24)

Assume that $$X',X_1',\ldots ,X_n'$$ are i.i.d.r.v.s. Then

\begin{aligned} S_n {\mathop {=}\limits ^{d}}\sum _{i=1}^{\nu _n} X_i', \end{aligned}
(25)

where r.v. $$\nu _n=\tau _1+\cdots +\tau _n$$ is independent of $$\{X_i'\}$$.

If $$\mathcal{L}(X')$$ is degenerate (i.e. $$X'\!=\!c,$$ where c is a constant), then $$S_n\!{\mathop {=}\limits ^{d}}\!c\nu _n,$$ and the problem of compound Poisson approximation to $$\mathcal{L}(S_n)$$ reduces to the problem of Poisson approximation to $$\mathcal{L}(\nu _n).$$

Assume that $$\mathcal{L}(X')$$ is not degenerate. According to Kolmogorov [27, formula (30)],

\begin{aligned} d_{_{TV}}\!\Bigg ( \sum _{i=1}^{\nu _n}X'_i;\sum _{i=1}^{\pi _\lambda }X'_i \Bigg ) \le d_{_{TV}}\!(\nu _n;\pi _\lambda ), \end{aligned}
(26)

where $$\pi _\lambda$$ denotes a Poisson $$\mathbf{\Pi }(\lambda )$$ r.v. Thus, an estimate of the accuracy of “accompanying” compound Poisson approximation (the terminology of Gnedenko and Kolmogorov [23]) in terms of the total variation distance follows from an estimate of the accuracy of pure Poisson approximation.

The following theorem presents an estimate of the accuracy of “accompanying” compound Poisson approximation to $$\mathcal{L}(S_n)$$ in terms of the point metric.

Recall (24), where $$p_i=I\!\!P(X_i\!\ne \!0).$$ Denote $$\bar{p}= \sum _{i=1}^np_i/n.$$

### Theorem 4

If $$X,X_1,X_2,\ldots$$ are independent integer-valued random variables, $$X',X_1',X_2',\ldots$$ are i.i.d.r.v.s, where $$\mathcal{L}(X')=\mathcal{L}(X|X\!\ne \!0),$$ then

\begin{aligned} d_\textrm{o}\!\Bigg (S_n;\sum _{i=1}^{\pi _{n\bar{p}}} X'_i\Bigg )\le & {} c_\textrm{o}\sum ^n_{i=1}p_i^2 \Bigg (\sum ^n_{i=1}p_i\Bigg )^{-3/2}\nonumber \\{} & {} + 2\frac{1\!-\!e^{-{n\bar{p}}}}{n\bar{p}} \Bigg (\sum ^n_{i=1}p_i^3 r^*_{i,n} + \sum ^n_{i=1}p_i^2 \varepsilon _+^\textrm{o}\Bigg ). \end{aligned}
(27)

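If $$X,X_1,\ldots$$ are i.i.d.r.v.s, then $$\bar{p}\!=\!p,$$ where $$p\!=\!I\!\!P(X\!\ne \!0),$$ and (27) becomes

\begin{aligned} d_\textrm{o}\!\Bigg (S_n;\sum _{i=1}^{\pi _{np}} X'_i\Bigg ) \le c_\textrm{o}\sqrt{p/n}\, + 4(1\!-\!e^{-np}) p \left( \sqrt{2p/\pi (n\!-\!1)} + 4/\pi (n\!-\!1)\right) . \end{aligned}
(27$$^*$$)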

Bound (27) appears suitable for evaluating probabilities $$I\!\!P(S_n\!<\!k)$$ when k is “small”. A feature of the bound is that Theorem 4 does not impose moment requirements.

The rate of the accuracy of approximation in (27$$^*$$) is at least $$n^{-1/2},$$ the rate is $$o(n^{-1/2})$$ if $$p\!\equiv \!p(n)\!\rightarrow \!0$$ as $$n\!\rightarrow \!\infty$$.

If X is a Bernoulli r.v., then $$X'\!\equiv \!1,$$ $$\sum _{i=1}^{\pi _{np}} X'_i = \pi _{np}$$ is a Poisson r.v., and (27) coincides with (12). Thus, the constant at the leading term in (27) cannot in general be improved.

The advantage of employing the “accompanying” compound Poisson distribution is the simplicity of the approximating distribution. An open question is whether the accuracy of compound Poisson approximation to $$\mathcal{L}(S_n)$$ can be improved using more complex approximating laws. Relation (29) suggests the following hypothesis. Let $$X,X_1,X_2,\ldots$$ be i.i.d.r.v.s. Denote $$P_{_X}=\mathcal{L}(X),$$ and let $$\mathbb{C}\mathbb{P}$$ denote the class of (shifted) compound Poisson r.v.s. It is an open question whether there exists an absolute constant $$C_\textrm{o}$$ such that

\begin{aligned} \sup _{P_{_X}} d_{\textrm{o}}(S_n;\mathbb{C}\mathbb{P}) \le C_\textrm{o} n^{-5/6}. \end{aligned}

### Example 4

Suppose that $$X,X_1,\ldots$$ are i.i.d. random variables,

\begin{aligned} I\!\!P(X\!=\!0)=1\!-\!p,\ I\!\!P(X\!=\!1)=I\!\!P(X\!=\!2)=p/2. \end{aligned}

Then $$I\!\!P(X'\!=\!1)=I\!\!P(X'\!=\!2)\!=\!1/2.$$ Hence, $$\sum _{i=1}^{\pi _{np}} X'_i {\mathop {=}\limits ^{d}}\pi '\!+\!2\pi ''$$, where $$\pi ',\,\pi ''$$ are independent Poisson $$\mathbf{\Pi }(np/2)$$ r.v.s. Note that $$\pi '\!+\!2\pi ''$$ is a compound Poisson r.v. Theorem 4 yields

\begin{aligned} d_\textrm{o}(S_n;\pi '\!+\!2\pi '') \le c_\textrm{o}\sqrt{p/n}\, + 4\sqrt{2/\pi }\, p^{3/2}/\sqrt{n\!-\!1} + 16p/(\pi (n\!-\!1)). \end{aligned}
(28)

If $$p\!\equiv \!p(n)\!\rightarrow \!0$$ as $$n\!\rightarrow \!\infty$$, which is typically the case when one deals with rare events, then the r.h.s. of (28) is $$o(n^{-1/2})$$.
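The approximation in Example 4 can be examined numerically. The sketch below is illustrative only: the function names are hypothetical, the laws are truncated to $$\{0,\ldots ,2n\}$$ (where $$\mathcal{L}(S_n)$$ is supported), and the values of n and p are arbitrary choices. It computes the point distance between $$\mathcal{L}(S_n)$$ and $$\mathcal{L}(\pi '\!+\!2\pi '')$$ by direct convolution.

```python
import math

def poisson_pmf(lam, size):
    """Return [P(Poisson(lam) = j) for j in range(size)] via the standard recursion."""
    pmf = [math.exp(-lam)]
    for j in range(1, size):
        pmf.append(pmf[-1] * lam / j)
    return pmf

def convolve(a, b, size):
    """Convolution of two mass functions, truncated to the first `size` atoms."""
    out = [0.0] * size
    for i, ai in enumerate(a):
        if ai == 0.0:
            continue
        for j, bj in enumerate(b):
            if i + j >= size:
                break
            out[i + j] += ai * bj
    return out

def example4_point_distance(n, p):
    """max_m |P(S_n = m) - P(pi' + 2 pi'' = m)| over m = 0..2n (truncation is an assumption)."""
    size = 2 * n + 1
    step = [1.0 - p, p / 2.0, p / 2.0]     # law of X in Example 4
    s_n = [1.0]
    for _ in range(n):
        s_n = convolve(s_n, step, size)    # law of S_n
    pois = poisson_pmf(n * p / 2.0, size)  # law of pi' (and of pi'')
    doubled = [0.0] * size                 # law of 2 * pi''
    for j in range(size):
        if 2 * j < size:
            doubled[2 * j] = pois[j]
    compound = convolve(pois, doubled, size)
    return max(abs(x - y) for x, y in zip(s_n, compound))

if __name__ == "__main__":
    for p in (0.2, 0.05, 0.01):
        print(p, example4_point_distance(100, p))
```

Atoms above 2n are ignored; for moderate np the compound Poisson mass there is negligible, so the truncation should not affect the reported maximum.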

We now present a uniform in $$p\!\in \![0;1/2]$$ estimate of the accuracy of compound Poisson approximation to the Binomial distribution in terms of the point metric.

Let $$P_{n,p}$$ denote the compound Poisson distribution from (7).

### Proposition 5

There exists an absolute constant C such that

\begin{aligned} \sup _{0\le p\le 1/2} d_{\textrm{o}}(\textbf{B}(n,p);\varvec{{\Pi }}(np)) \wedge d_{\textrm{o}}(\textbf{B}(n,p);P_{n,p}) \le C n^{-5/6}. \end{aligned}
(29)

Bound (29) provides a better rate of approximation than (21). However, (21) offers a simpler approximating distribution, and the constants are explicit.

## 4 Proofs

First, we present the proof of Theorem 2, then the proof of Theorem 1.

We start with a particular lemma, which is needed in the proof of Theorem 2.

For any function $$h\!\in \!\mathcal{S}$$, we denote by $$g\equiv g_h$$ the solution of the Stein equation

\begin{aligned} g(n\!+\!1)-\lambda ^{-1}ng(n) = h(n) - I\!\!Eh(\pi _\lambda ) \qquad (n\!\in \!\mathbb {Z}_+\!). \end{aligned}
(30)

Note that it is possible to write down the Stein equation in a different manner (see, e.g., [33, Remark 4.1]). The way shown in (30) is in line with the general approach to the characterisation of discrete distributions (cf. [33, Ch. 12]).

### Lemma 6

Let $$h\!=\!h_k,$$ where $$h_k(\cdot )=1\!\!I\{\cdot \!=\!k\},$$ for a particular $$k\!\in \!\mathbb {Z}_+$$. If $$g_h$$ is given by (30), then

\begin{aligned} \Vert g_h\Vert \le 1\!-\!e^{-\lambda }. \end{aligned}
(31)

### Proof of Lemma 6

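It is known that the solution of Eq. (30) is

\begin{aligned} g(m) = I\!\!E\big (h(\pi _\lambda )-I\!\!Eh(\pi _\lambda )\big ) 1\!\!I\{\pi _\lambda \!<\!m\} \big / I\!\!P(\pi _\lambda \!=\!m\!-\!1) \quad (m\!\ge \!1\!) \end{aligned}
(32)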

(see, e.g., [7, 33]). The value of g(0) is irrelevant (we can set $$g(0)\!=\!0$$).

If $$h(\cdot )=1\!\!I\{\cdot \!=\!k\},$$ where $$k\!\in \!\mathbb {Z}_+,$$ then

\begin{aligned} g(m) = \left( I\!\!P(\pi _\lambda \!=\!k\!<\!m) - I\!\!P(\pi _\lambda \!=\!k) I\!\!P(\pi _\lambda \!<\!m) \right) \!/ I\!\!P(\pi _\lambda \!=\!m\!-\!1) \quad (m\!\ge \!1\!). \end{aligned}
(33)

Denote

\begin{aligned} G(n) = I\!\!P(\pi _\lambda \!>\!n)/I\!\!P(\pi _\lambda \!=\!n),\ \ G_*(n) = I\!\!P(\pi _\lambda \!\le \!n)/I\!\!P(\pi _\lambda \!=\!n) \quad (n\!\in \!\mathbb {Z}_+). \end{aligned}

It is known (see, e.g., [33, p. 82]) that function G is decreasing; function $$G_*$$ is increasing. Therefore, (33) yields

\begin{aligned} -g_{h_k}(m)= & {} I\!\!P(\pi _\lambda \!=\!k) G_*(m\!-\!1) \le I\!\!P(\pi _\lambda \!=\!k) G_*(k\!-\!1) = I\!\!P(\pi _\lambda \!\le \!k\!-\!1)\lambda /k \\\le & {} I\!\!P(\pi _\lambda \!\le \!k) \!-\! e^{-\lambda } \le 1\!-\!e^{-\lambda }\, \quad (m\!\le \!k),\\ g_{h_k}(m)= & {} I\!\!P(\pi _\lambda \!=\!k) G(m\!-\!1) \le I\!\!P(\pi _\lambda \!=\!k) G(k) = I\!\!P(\pi _\lambda \!>\!k) \le 1\!-\!e^{-\lambda }\, \qquad (m\!>\!k), \end{aligned}

and (31) follows (Röllin [43] mentions without proof that $$\Vert g_h\Vert \!\le \!1$$). $$\square$$
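Lemma 6 lends itself to a quick numerical sanity check. The sketch below is not part of the proof: it evaluates $$g_{h_k}$$ through (33), using a truncated tail sum for $$I\!\!P(\pi _\lambda \!>\!m\!-\!1)$$, and reports both the largest $$|g_{h_k}(m)|$$ found and the largest residual in the Stein equation (30). The function names, the ranges of k and m, and the number of tail terms are assumptions made for illustration.

```python
import math

def poisson_pmf(lam, j):
    """P(pi_lambda = j), computed in log-space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def stein_solution(lam, k, m, terms=400):
    """g_{h_k}(m) for m >= 1, where h_k is the indicator of {k}; see (33)."""
    pk = poisson_pmf(lam, k)
    denom = poisson_pmf(lam, m - 1)
    if m > k:
        # P(pi > m-1), summed directly to avoid cancellation in 1 - cdf
        tail = sum(poisson_pmf(lam, j) for j in range(m, m + terms))
        return pk * tail / denom
    cdf = sum(poisson_pmf(lam, j) for j in range(m))   # P(pi <= m-1)
    return -pk * cdf / denom

def check_lemma6(lam, k_max=15, m_max=30):
    """Return (max |g|, max residual of Eq. (30)) over the scanned k and m."""
    sup_norm, worst_residual = 0.0, 0.0
    for k in range(k_max + 1):
        pk = poisson_pmf(lam, k)
        # g(0) is irrelevant; it is set to 0 here
        g = [0.0] + [stein_solution(lam, k, m) for m in range(1, m_max + 2)]
        sup_norm = max(sup_norm, max(abs(v) for v in g))
        for n in range(m_max + 1):
            lhs = g[n + 1] - n * g[n] / lam
            rhs = (1.0 if n == k else 0.0) - pk
            worst_residual = max(worst_residual, abs(lhs - rhs))
    return sup_norm, worst_residual

if __name__ == "__main__":
    for lam in (0.5, 2.0, 5.0):
        print(lam, check_lemma6(lam), 1.0 - math.exp(-lam))
```

For $$k\!=\!0,\,m\!=\!1$$ formula (33) gives $$g(1)=1\!-\!e^{-\lambda },$$ so the scan is expected to attain the bound in (31) up to rounding.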

### Proof of Theorem 2

We will use Stein’s method. The details of the method have been presented in many publications (see, e.g., [7, 33]).

According to (30),

\begin{aligned} I\!\!Eh(S_n)-I\!\!Eh(\pi _\lambda ) = I\!\!Eg(S_n\!+\!1)-\lambda ^{-1}I\!\!ES_ng(S_n). \end{aligned}

Below we evaluate $$|I\!\!Eg(S_n\!+\!1)-\lambda ^{-1}I\!\!ES_ng(S_n)|$$.

Let $$h(\cdot )=1\!\!I\{\cdot \!=\!k\},$$ where $$k\!\in \!\mathbb {Z}_+.$$ Then

\begin{aligned}{} & {} g(m) = I\!\!P(\pi _\lambda \!=\!k) \frac{I\!\!P(\pi _\lambda \!>\!m\!-\!1)}{I\!\!P(\pi _\lambda \!=\!m\!-\!1)}\quad (k\!<\!m),\\{} & {} \quad g(m) = - I\!\!P(\pi _\lambda \!=\!k) \frac{I\!\!P(\pi _\lambda \!\le \!m\!-\!1)}{I\!\!P(\pi _\lambda \!=\!m\!-\!1)} \quad (k\!\ge \!m). \end{aligned}

If $$\Vert \Delta g\Vert \!=\!0,$$ then g is a constant, and $$\lambda I\!\!Eg(S_n\!+\!1)-I\!\!ES_ng(S_n)=0.$$ Therefore, without loss of generality we may assume that $$\Vert \Delta g\Vert \!>\!0.$$

Recall that $$S_{n,i}=S_n\!-\!X_i$$. Set

\begin{aligned} g_i(\cdot )=I\!\!Eg(S_{n,i}\!+\!1\!+\cdot ). \end{aligned}

It is known that

\begin{aligned} I\!\!EX_i f(X_i) = I\!\!EX_i I\!\!Ef(X_i^*\!+\!1) \end{aligned}
(34)

for any function f such that $$I\!\!E|X_i f(X_i)|<\infty$$ (cf. [33, Ch. 4]). Therefore,

\begin{aligned} \nonumber \lambda I\!\!Eg(S_n\!+\!1)-I\!\!ES_ng(S_n)= & {} \lambda I\!\!Eg(S_n\!+\!1)-\sum ^n_{i=1} I\!\!EX_ig(S_{n,i}\!+\!X_i) \\ \nonumber= & {} \sum ^n_{i=1} I\!\!EX_i\left( I\!\!Eg(S_{n,i}\!+\!X_i\!+\!1) - I\!\!Eg(S_{n,i}\!+\!X_i^*\!+\!1) \right) \\= & {} \sum ^n_{i=1} I\!\!EX_i \left( I\!\!Eg_i(X_i) - I\!\!Eg_i(X_i^*) \right) . \end{aligned}
(35)
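Identity (34) can be verified exactly on a toy example. The sketch below assumes that $$X^*$$ has the distribution $$I\!\!P(X^*\!=\!m)=(m\!+\!1)I\!\!P(X\!=\!m\!+\!1)/I\!\!EX$$ $$(m\!\ge \!0)$$; this definition is not restated in this section, so it is an assumption here. The law of X and the test function f are arbitrary illustrative choices.

```python
from fractions import Fraction

def star_law(pmf):
    """Law of X* under the assumed definition P(X* = m) = (m+1) P(X = m+1) / EX."""
    mean = sum(j * q for j, q in enumerate(pmf))
    return [(m + 1) * pmf[m + 1] / mean for m in range(len(pmf) - 1)], mean

# Toy law of X on {0, 1, 2, 3} (illustrative choice)
pmf_x = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 8), Fraction(1, 8)]
pmf_star, mean_x = star_law(pmf_x)

def f(j):
    """An arbitrary test function."""
    return j * j - 3 * j + 1

# Left-hand side of (34): E X f(X)
lhs = sum(j * f(j) * q for j, q in enumerate(pmf_x))
# Right-hand side of (34): EX * E f(X* + 1)
rhs = mean_x * sum(f(m + 1) * q for m, q in enumerate(pmf_star))

if __name__ == "__main__":
    print(lhs, rhs)
```

Exact rational arithmetic is used so that the two sides can be compared without rounding.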

Given a function $$f\!:\mathbb {Z}_+\!\rightarrow \!I\!\!R,$$ we denote $$(\ell \!\ge \!0,m\!\ge \!0,k\!\ge \!0)$$

\begin{aligned} R_f(m,k,\ell )= & {} f(m)-f(k)-(m\!-\!k)\Delta f(\ell ),\\ c_1(f)= & {} \sup \nolimits _{i,j}|\Delta f(i)\!-\!\Delta f(j)|,\ c_2(f)=\Vert \Delta ^2 f\Vert , \nonumber \\ \delta _{m,k}^{(\ell )}= & {} \min \!\left\{ c_1(f) |m\!-\!k|; c_2(f) |(m\!-\!\ell )(m\!-\!\ell -\!1\!) - (k\!-\!\ell )(k\!-\!\ell -\!1\!)|/2 \right\} .\nonumber \end{aligned}
(36)

According to Proposition 4 in [35, 36],

\begin{aligned} |f(m)-f(k)-(m\!-\!k)\Delta f(\ell )| \le \delta _{m,k}^{(\ell )}. \end{aligned}
(37)

Clearly,

\begin{aligned} g_i(X_i)-g_i(X_i^*) = (X_i\!-\!X_i^*) \Delta g_i(0) + R_{g_i}(X_i,X_i^*,0). \end{aligned}
(38)

From (35), (38), (37),

\begin{aligned} \nonumber{} & {} \Bigg | \lambda I\!\!Eg(S_n\!+\!1)-I\!\!ES_ng(S_n) - \sum ^n_{i=1} I\!\!EX_i I\!\!E(X_i\!-\!X_i^*) I\!\!E\Delta g(S_{n,i}\!+\!1) \Bigg | \\{} & {} \quad \le \sum ^n_{i=1} I\!\!EX_i I\!\!E\delta _{X_i,X_i^*}^{(0)}. \end{aligned}
(39)

It is known (see Barbour and Eagleson [12] or [7, Remark 1.1.2]) that

\begin{aligned} \Vert \Delta g_h\Vert \le 1\!-\!e^{-\lambda },\ \ \Vert g_h\Vert \le \sqrt{2\lambda /e} \wedge \lambda . \end{aligned}
(40)

Thus, $$|c_1(g_i)| \le 2\Vert \Delta g_i\Vert \le 2(1\!-\!e^{-\lambda }).$$ Using (51), we get

\begin{aligned} \Vert \Delta g_i\Vert \le \Vert g_i\Vert \,\sum _k |\Delta I\!\!P(S_{n,i}\!=\!k)| = 2\Vert g_i\Vert \, d_{_{TV}}\!(S_{n,i};S_{n,i}\!+\!1). \end{aligned}
(41)

Recall that

\begin{aligned} d_{_{TV}}\!(S_{n,i};S_{n,i}\!+\!1) \le \varepsilon _{i,n}. \end{aligned}
(42)

Taking into account Lemma 6 and (42), we derive

\begin{aligned} |c_1(g_i)| \le 4(1\!-\!e^{-\lambda })\varepsilon _{i,n}. \end{aligned}

Note that

\begin{aligned} |I\!\!E\Delta ^2 g(S_{n,i}\!+\!\ell )| \le 2\Vert \Delta g\Vert \varepsilon _{i,n}. \end{aligned}

Taking into account (40), $$\Vert \Delta ^2 g_i\Vert \le 2(1\!-\!e^{-\lambda }) \varepsilon _{i,n}$$. By (51), (56) and (31),

\begin{aligned} |\Delta ^2 g_i(\cdot )| \le \Vert g\Vert \, \sum _k |\Delta ^2 I\!\!P(S_{n,i}\!=\!k)| \le \frac{_{16}}{^\pi } (1\!-\!e^{-\lambda }) \big / \sqrt{U_*^2\!+\!(1\!-\!2u^*)U\!+\!1/4}. \end{aligned}

Therefore,

\begin{aligned} \Vert \Delta ^2 g_i(\cdot )\Vert /2 \le (1\!-\!e^{-\lambda }) r_{i,n}^*. \end{aligned}
(43)

Combining these estimates, we get

\begin{aligned} I\!\!E\delta _{X_i,X_i^*}^{(0)} \le (1\!-\!e^{-\lambda }) \min \{ 4I\!\!E|X_i\!-\!X_i^*| \varepsilon _{i,n}; \gamma _{X_i,X_i^*} r_{i,n}^* \}. \end{aligned}
(44)

Here random variables $$X_i,X_i^*$$ can be defined on a common probability space in such a way that $$I\!\!E|X_i\!-\!X_i^*| = d_{_G}(X_i,X_i^*).$$ Notice that

\begin{aligned} \kappa _{X_i} = I\!\!EX_i I\!\!E(X_i\!-\!X_i^*) \end{aligned}
(45)

(cf. (34)). Taking into account (39), (44), (45), we derive

\begin{aligned} \Big | \lambda I\!\!Eg(S_n\!+\!1)-I\!\!ES_ng(S_n) - \sum ^n_{i=1} \kappa _{X_i} I\!\!E\Delta g(S_{n,i}\!+\!1) \Big | \le \lambda (1\!-\!e^{-\lambda }) \varepsilon _1^\textrm{o}. \end{aligned}
(46)

By (51), $$I\!\!E\Delta g(Y) = -\sum _k g(k) \Delta I\!\!P_Y(k).$$ Using (42) and (31), we get

\begin{aligned} |I\!\!E\Delta g(S_{n,i}\!+\!1)| \le 2\Vert g\Vert d_{_{TV}}\!(S_{n,i};S_{n,i}\!+\!1) \le 2(1\!-\!e^{-\lambda }) \varepsilon _{i,n}. \end{aligned}

This and (46) entail

\begin{aligned} d_{\textrm{o}}(S_n;\pi _\lambda ) \le (1\!-\!e^{-\lambda })(\varepsilon _1^\textrm{o}+\varepsilon ^\textrm{o}_+). \end{aligned}
(47)

Now we replace $$I\!\!E\Delta g(S_{n,i}\!+\!1)$$ in (46) with $$I\!\!E\Delta g(S_n\!+\!1).$$

If $$m\!>\!k,$$ then

\begin{aligned} f(m)-f(k) = \sum _{i=k}^{m-1} \Delta f(i) \end{aligned}
(48)

for any function f. In particular,

\begin{aligned} I\!\!E\Delta g(S_{n,i}\!+\!X_i\!+\!1) - I\!\!E\Delta g(S_{n,i}\!+\!1) = I\!\!E\sum _{\ell =0}^{X_i-1} \Delta ^2 g_i(\ell ). \end{aligned}
(49)

According to [35], Lemma 5, for any bounded function f

\begin{aligned} |I\!\!E\Delta f(S_{n,i})| \le \min \left\{ 2\Vert f\Vert \varepsilon _{i,n}; (\Vert \Delta f\Vert \!\wedge \!2\Vert f\Vert \varepsilon ^*_{\lambda _i})+2\Vert \Delta f\Vert \varepsilon _{i,n}^+ \right\} . \end{aligned}
(50)

An application of (50) with $$f=\Delta g_i$$ yields

\begin{aligned} \left| I\!\!E\Delta g(S_{n,i}\!+\!X_i\!+\!1)- I\!\!E\Delta g(S_{n,i}\!+\!1) \right| \le 2\Vert \Delta g\Vert \varepsilon _{i,n} I\!\!EX_i, \end{aligned}

where $$\Vert \Delta g\Vert \!\le \!1\!-\!e^{-\lambda }$$ by (40). Note that (43) and (49) yield

\begin{aligned} | I\!\!E\Delta g(S_{n,i}\!+\!X_i\!+\!1)- I\!\!E\Delta g(S_{n,i}\!+\!1) | \le 2(1\!-\!e^{-\lambda }) r_{i,n}^* I\!\!EX_i. \end{aligned}

Therefore,

\begin{aligned} \Bigg | \sum ^n_{i=1} \kappa _{X_i} \Big ( I\!\!E\Delta g(S_n\!+\!1) - I\!\!E\Delta g(S_{n,i}\!+\!1) \Big ) \Bigg | \le 2(1\!-\!e^{-\lambda }) \sum ^n_{i=1} |\kappa _{X_i}| I\!\!EX_i r_{i,n}^*. \end{aligned}

We have shown that

\begin{aligned} \left| I\!\!Eh(S_n) - I\!\!Eh(\pi _\lambda ) - \lambda ^{-1} \kappa _{S_n} I\!\!E\Delta g(S_n\!+\!1) \right| \le (1\!-\!e^{-\lambda }) (\varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}). \end{aligned}

It remains to evaluate $$I\!\!E\Delta g(S_n\!+\!1)-I\!\!E\Delta g(\pi _\lambda \!+\!1)$$.

Recall that $$h(\cdot )=1\!\!I\{\cdot \!=\!k\},$$ where $$k\!\in \!\mathbb {Z}_+.$$ It is known that $$\Delta g(i)\!\le \!0$$ if $$i\!\ne \!k,$$ while $$0\!\le \!\Delta g(k)\!\le \!1\!-\!e^{-\lambda }$$ (cf. [12] or the proof of Lemma 6). Note that $$\sum _{i\ne k} \Delta g(i) = -\Delta g(k)$$. Therefore,

\begin{aligned} \left| \sum _{i\ne k} \Delta g(i) \left( I\!\!P(S_n\!+\!1\!=\!i) - I\!\!P(\pi _\lambda \!+\!1\!=\!i) \right) \right| \!\le & {} \! (1\!-\!e^{-\lambda }) d_{\textrm{o}}(S_n;\pi _\lambda ),\\ \left| \Delta g(k) \left( I\!\!P(S_n\!+\!1\!=\!k) - I\!\!P(\pi _\lambda \!+\!1\!=\!k)\right) \right| \!\le & {} \! (1\!-\!e^{-\lambda }) d_{\textrm{o}}(S_n;\pi _\lambda ). \end{aligned}

An application of (47) yields

\begin{aligned} | I\!\!E\Delta g(S_n\!+\!1)-I\!\!E\Delta g(\pi _\lambda \!+\!1) | \le 2(1\!-\!e^{-\lambda }) d_{\textrm{o}}(S_n;\pi _\lambda ) \le 2(1\!-\!e^{-\lambda }) (\varepsilon _1^\textrm{o}+ \varepsilon ^\textrm{o}_+). \end{aligned}

Thus,

\begin{aligned} \left| I\!\!Eh(S_n) - I\!\!Eh(\pi _\lambda ) - \lambda ^{-1} \kappa _{S_n} I\!\!E\Delta g(\pi _\lambda \!+\!1) \right| \le (1\!-\!e^{-\lambda }) (\varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}+\varepsilon _3^\textrm{o}). \end{aligned}

Note that

\begin{aligned} I\!\!E\Delta g_{h_A}(\pi _\lambda \!+\!1) = \left( I\!\!P(\pi _\lambda \!+\!1\!\in \! A) - I\!\!P(\pi _\lambda ^\star \!\in \! A)\right) \!/2 \end{aligned}

for any indicator function $$h_A(\cdot )=1\!\!I\{\cdot \!\in \!A\}$$ (cf. (4.31) in [33]). Since every function $$h\!\in \!\mathcal{S}$$ can be represented as $$h(\cdot )=\sum _k c_k 1\!\!I\{\cdot \!=\!k\},$$ where $$\{c_k\}$$ are constants,

\begin{aligned} I\!\!E\Delta g_h(\pi _\lambda \!+\!1) = \left( I\!\!Eh(\pi _\lambda \!+\!1) - I\!\!Eh(\pi _\lambda ^\star )\right) \!/2 \qquad (\forall h\!\in \!\mathcal{S}). \end{aligned}

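Therefore,

\begin{aligned} \left| I\!\!Eh(S_n) - I\!\!Eh(\pi _\lambda ) - \frac{\kappa _{S_n}}{2\lambda } \left( I\!\!Eh(\pi _\lambda \!+\!1) - I\!\!Eh(\pi _\lambda ^\star )\right) \right| \le (1\!-\!e^{-\lambda }) (\varepsilon _1^\textrm{o}+\varepsilon _2^\textrm{o}+\varepsilon _3^\textrm{o}). \end{aligned}

This and (17) lead to (15). $$\square$$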

### Proof of Theorem 1

If $$\lambda \!=\!0,$$ then $$S_n\!=\!\pi _\lambda \!=\!0,$$ and (10) trivially holds. Therefore, w.l.o.g. we may assume that $$\lambda \!>\!0$$.

Given a r.v. Y, denote

\begin{aligned} \Delta I\!\!P_Y(\cdot ) = I\!\!P(Y\!+\!1\!=\!\cdot )-I\!\!P(Y\!=\!\cdot ). \end{aligned}

In particular, $$\Delta f(\cdot ,\cdot )$$ means the increment of the first argument.

Clearly, for any bounded function h

\begin{aligned} I\!\!E\Delta h(Y) = - \sum _j h(j) \Delta I\!\!P_Y(j). \end{aligned}
(51)

Given an arbitrary $$k\!\in \!\mathbb {Z}_+$$, let $$h(\cdot )=1\!\!I\{\cdot \!=\!k\}$$. We apply (15).

Note that

Therefore,

According to Lemma 3 in Roos [45],

(52)

Bounds (15) and (52) entail

The proof is complete. $$\square$$

The proof of Theorem 3 requires the following

### Proposition 7

For any bounded function f and any integer-valued random variable Y

\begin{aligned} |I\!\!E\Delta f(Y)|\le & {} \!2\Vert f\Vert \, d_{_{TV}}\!(Y;Y\!+\!1), \end{aligned}
(53)
\begin{aligned} |I\!\!E\Delta ^2 f(Y)|\le & {} \Vert f\Vert \, \Vert \Delta ^2I\!\!P_Y\Vert _{_1}. \end{aligned}
(54)

As a consequence,

\begin{aligned} |I\!\!E\Delta f(S_n)|\le & {} 2\Vert f\Vert \, \varepsilon _{0,n}, \end{aligned}
(55)
\begin{aligned} |I\!\!E\Delta ^2 f(S_n)|\le & {} \frac{_{16}}{^\pi } \Vert f\Vert \big / \sqrt{U^2\!+\!(1\!-\!2u^*)U\!+\!1/4}. \end{aligned}
(56)

If $$2u^*\!\le \!1,$$ then (56) yields

\begin{aligned} |I\!\!E\Delta ^2 f(S_n)| \le 16\Vert f\Vert \big /(\pi U). \end{aligned}

If $$2u^*>1,$$ then (56) entails

\begin{aligned} |I\!\!E\Delta ^2 f(S_n)| \le 16\Vert f\Vert \big /(\pi (U\!-\!1/2)). \end{aligned}

### Proof of Proposition 7

Relation (53) follows from (51): $$I\!\!E\Delta f(Y) = -\sum _k f(k) \Delta I\!\!P_Y(k),$$ hence

\begin{aligned} |I\!\!E\Delta f(Y)| \le \Vert f\Vert \,\Vert \Delta I\!\!P_Y\Vert _{_1} = 2\Vert f\Vert \, d_{_{TV}}\!(Y;Y\!+\!1). \end{aligned}

Similarly,

\begin{aligned} |I\!\!E\Delta ^2 f(Y)| = \Big |\sum _k f(k) \Delta ^2 I\!\!P_Y(k)\Big | \le \Vert f\Vert \, \Vert \Delta ^2I\!\!P_Y\Vert _{_1}. \end{aligned}

Bound (55) is an immediate consequence of (53) and (42). Relation (56) will follow from (54) and the following inequality:

\begin{aligned} \Vert \Delta ^2I\!\!P(S_n\!=\!\cdot )\Vert _{_1} \le 16\big /\Big (\pi \sqrt{U^2\!+\!(1\!-\!2u^*)U\!+\!1/4}\Big ). \end{aligned}
(57)

We proceed with the proof of (57); the argument is similar to that behind (4.9) in [11].

Let $$I_x$$ denote the distribution concentrated at point x, $$I\!\equiv \!I_\textrm{0},$$ and let $$*$$ denote the convolution of measures. Then

\begin{aligned} \Delta I\!\!P_Y = I\!\!P_Y *(I_1\!-\!I),\ \,\Delta ^2I\!\!P_Y(\cdot ) = (I_{1}\!-\!I)^{*2}*I\!\!P_Y. \end{aligned}

Let $$Q_1,\,Q_2$$ be two measures. By the property of the total variation norm,

\begin{aligned} \Vert Q_1*Q_2\Vert _{_1} \le \Vert Q_{_1}\Vert _{_1}\, \Vert Q_2\Vert _{_1}. \end{aligned}

Set $$J\!=\!\{1,\ldots ,n\}.$$ If $$J=C\cup D,$$ we denote $$S'=\sum _{i\in C}X_i$$, $$S''=\sum _{i\in D}X_i$$. Since $$I\!\!P_{S_n}=I\!\!P_{S'}*I\!\!P_{S''}$$, we have

\begin{aligned} \Vert \Delta ^2I\!\!P_{S_n}\Vert _{_1}= & {} \Vert (I_1\!-\!I)^{*2}*I\!\!P_{S_n}\Vert _{_1} = \Vert (I_1\!-\!I)\!*\!I\!\!P_{S'}*(I_1\!-\!I)\!*\!I\!\!P_{S''}\Vert _{_1} \\\le & {} \Vert (I_1\!-\!I)\!*\!I\!\!P_{S'}\Vert _{_1}\, \Vert (I_1\!-\!I)\!*\!I\!\!P_{S''}\Vert _{_1} = 4d_{_{TV}}\!(S';S'\!+\!1) d_{_{TV}}\!(S'';S''\!+\!1). \end{aligned}

We will exploit this bound and (59).

If $$U\!=\!0,$$ then (57) trivially holds. Therefore, we may assume that $$U\!>\!0.$$ The set J can be split into $$C\!\cup \!D$$ so that the sets C, D are non-empty and

\begin{aligned} U_C\!>\!U/2\!-\!u^*,\ \ U_D\!\ge \!U/2, \end{aligned}
(58)

where $$U_C=\sum _{i\in C}u_i,\ U_D:=\sum _{i\in D} u_i$$.

Indeed, r.v.s $$X_1,\ldots ,X_n$$ (and hence numbers $$u_1,\ldots ,u_n$$) can be rearranged without affecting $$S_n$$. Therefore, we may assume that $$u_0\!:=\!0\!\le \!u_1\!\le \cdots \le \!u_n\!=\!u^*$$. Denote

\begin{aligned} \Sigma _k = u_1+\cdots +u_k,\ \ \nu = \min \{k\!\ge \!1\!: \Sigma _k\!>\!U/2\}. \end{aligned}

If $$\Sigma _{n-1}\!\le \!U/2,$$ then $$\nu =n,$$ hence $$u_n\!\ge \!U/2.$$ One can choose $$C=\{1,\ldots ,n\!-\!1\},\,D\!=\!\{n\}.$$ Then

\begin{aligned} U_C\!=\!U\!-\!u^* \ge (U/2\!-\!u^*)_+,\ U_D\!=\! u_n\!\ge \!U/2. \end{aligned}

If $$\Sigma _{n-1}\!>\!U/2,$$ then $$\nu \!\le \!n\!-\!1$$ and $$u^*\!\le \!U/2.$$ One can choose $$C=\{1,\ldots ,\nu \!-\!1\},\,D=\{\nu ,\ldots ,n\}.$$ Then

\begin{aligned} U_C = \Sigma _\nu -u_\nu \ge \Sigma _\nu -u^* > U/2-u^*,\ \ U_D = U-\Sigma _{\nu -1} \ge U/2, \end{aligned}

and (58) holds.
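The construction of the sets C and D described above can be written as a short routine. This is a sketch with a hypothetical function name; it assumes $$n\!\ge \!2$$ and $$U\!>\!0,$$ returns the sorted weights rather than the index sets, and uses the observation that in both cases of the argument C consists of the first $$\nu \!-\!1$$ sorted weights.

```python
from fractions import Fraction

def split_weights(u):
    """Split u_1 <= ... <= u_n into (C, D): C gets the first nu-1 weights, D the rest.

    nu is the first index whose partial sum exceeds U/2, as in the proof.
    Assumes len(u) >= 2 and sum(u) > 0.
    """
    u = sorted(u)
    half = sum(u) / 2
    prefix = 0
    nu = len(u)
    for k, value in enumerate(u, start=1):
        prefix += value
        if prefix > half:
            nu = k
            break
    cut = nu - 1          # covers both nu = n and nu <= n-1
    return u[:cut], u[cut:]

if __name__ == "__main__":
    weights = [Fraction(2, 5), Fraction(1, 10), Fraction(3, 10), Fraction(1, 5)]
    c_part, d_part = split_weights(weights)
    print(c_part, d_part)
```

Rational weights are used in the usage example so that the inequalities in (58) can be tested without rounding error.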

According to Mattner and Roos [31, Corollary 1.6],

\begin{aligned} d_{_{TV}}\!(S_n;S_n\!+\!1) \le \sqrt{2/\pi }\Big / \left( 1/4\!+\!U\right) ^{1/2}. \end{aligned}
(59)

Inequality (59) can be applied to $$S',\,S''$$:

\begin{aligned} d_{_{TV}}\!(S';S'\!+\!1) \le \sqrt{2/\pi }\Big / \Big (1/4\!+\!U_C\Big )^{1/2},\ \ d_{_{TV}}\!(S'';S''\!+\!1) \le \sqrt{2/\pi }\Big / \Big (1/4\!+\!U_D\Big )^{1/2}. \end{aligned}

One can check that $$(1/4\!+\!U_C) (1/4\!+\!U_D) \ge U^2/4\!+\!(1\!-\!2u^*)U/4\!+\!1/16$$. Hence,

\begin{aligned} \Vert \Delta ^2I\!\!P_{S_n}\Vert _{_1} \le \frac{_8}{^\pi } (U_C\!+\!1/4)^{-1/2} (U_D\!+\!1/4)^{-1/2} \le \frac{_{16}}{^\pi } (U^2\!+\!(1\!-\!2u^*)U\!+\!1/4)^{-1/2}. \end{aligned}

This leads to (57) and hence to (56). The proof is complete. $$\square$$
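The two ingredients of this proof can be illustrated numerically for a Binomial sum. The sketch below makes several assumptions that are not taken from this section: the $$X_i$$ are Bernoulli $$\textbf{B}(p)$$, $$u_i$$ is read as $$1\!-\!d_{_{TV}}\!(X_i;X_i\!+\!1)$$ (which equals $$\min \{p,1\!-\!p\}$$ for a Bernoulli r.v.), and J is simply cut into two halves instead of the sets C, D constructed above. It compares $$\Vert \Delta ^2I\!\!P_{S_n}\Vert _{_1}$$ with the product bound and with the right-hand side of (57).

```python
import math

def binomial_pmf(n, p):
    """Mass function of B(n, p) on {0, ..., n}."""
    return [math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(n + 1)]

def diff_l1(pmf, order):
    """l1-norm of the order-th difference of a mass function (zero outside its support)."""
    seq = list(pmf)
    for _ in range(order):
        padded = [0.0] + seq + [0.0]
        # P(Y + 1 = j) - P(Y = j) for j = 0, ..., len(seq)
        seq = [padded[j] - padded[j + 1] for j in range(len(padded) - 1)]
    return sum(abs(v) for v in seq)

def check_second_difference(n, p):
    """Return (||D^2 P_{S_n}||_1, product bound, right-hand side of (57))."""
    n1 = n // 2
    n2 = n - n1
    lhs = diff_l1(binomial_pmf(n, p), 2)
    # ||(I_1 - I) * P_{S'}||_1 * ||(I_1 - I) * P_{S''}||_1 with S', S'' two halves
    product = diff_l1(binomial_pmf(n1, p), 1) * diff_l1(binomial_pmf(n2, p), 1)
    u_star = min(p, 1 - p)      # assumed meaning of u_i for a Bernoulli r.v.
    big_u = n * u_star
    bound57 = 16 / (math.pi * math.sqrt(big_u**2 + (1 - 2 * u_star) * big_u + 0.25))
    return lhs, product, bound57

if __name__ == "__main__":
    print(check_second_difference(50, 0.3))
```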

### Proof of Theorem 3

Recall that $$\mathbb {Z}$$ denotes the set of integer numbers. Set

\begin{aligned} a = [\kappa _{S_n}],\ b=\{\kappa _{S_n}\}. \end{aligned}

Then

\begin{aligned} \mu = \sigma ^2\!+b = \lambda \!-\!a\ge 0. \end{aligned}
(60)

We need to evaluate $$|I\!\!Eh(S_n)-I\!\!Eh(\pi _{\mu }\!+\!a)|,$$ where $$h(\cdot )=1\!\!I\{\cdot \!=\!k\},\ k\!\in \!\mathbb {Z}$$.

W.l.o.g. we may assume that $$\mu \!>\!0$$: if $$\mu \!=0,$$ then (60) yields $$\sigma ^2 = \{\lambda \}\!=\!0$$; hence, every $$X_i$$ is a constant, $$a=\lambda ,$$ and (19) trivially holds.

In the case of shifted Poisson approximation, the basic equation is

\begin{aligned} f(k\!+\!1)-\mu ^{-1}(k\!-\!a)f(k) = h(k) - I\!\!Eh(\pi _{\mu }\!+a) \qquad (k\!\ge \!a) \end{aligned}
(61)

(cf. (12.26) in [33]). The solution $$f \equiv f_{h}$$ of Eq. (61) is

\begin{aligned} f(k) \!=\! 0 \quad (k\!\le \!a),\ f(k) = g_{\tilde{h}}(k\!-\!a)\quad (k\!\ge \!a), \end{aligned}

where $$\tilde{h}(m)=h(m\!+\!a)\ (m\!\ge \!0)$$ and $$g\equiv g_{\tilde{h}}$$ is given by (32) with $$\lambda$$ replaced with $$\mu$$.

According to (61),

\begin{aligned} I\!\!Eh(S_n)-I\!\!Eh(\pi _{\mu }\!+\!a) = I\!\!Ef(S_n\!+\!1)-\mu ^{-1}I\!\!E(S_n\!-\!a) f(S_n). \end{aligned}

Below we evaluate $$|\mu I\!\!Ef(S_n\!+\!1)-I\!\!E(S_n\!-\!a)f(S_n)|$$.

First, we show that

\begin{aligned} d_\textrm{o}(S_n;\pi _\mu \!+\!a) \le \mu ^{-1}|b|\, |I\!\!E\Delta f(S_n)| + (1\!-\!e^{-\mu }) \varepsilon _\mu ^{_\#}. \end{aligned}
(62)

Let $$\{\tilde{X}_j\}$$ denote independent copies of $$\{X_j\}$$. Set $$\bar{X}_i = X_i\!-\!I\!\!EX_i,$$

\begin{aligned} f_i(\cdot )=I\!\!Ef(S_{n,i}\!+\!\cdot ),\ r_i = R_{f_i}(X_i,0,\tilde{X}_i), \end{aligned}

cf. (36). Because of (60),

\begin{aligned} \mu f(S_n\!+\!1) - (S_n\!-\!a) f(S_n)= & {} \mu f(S_n\!+\!1) - (\bar{S}_n\!+\!\mu ) f(S_n)\\= & {} \mu \Delta f(S_n) - \bar{S}_n f(S_n). \end{aligned}

Since $$r_i = f_i(X_i)-f_i(0)-X_i\Delta f_i(\tilde{X}_i),$$

\begin{aligned} I\!\!E\bar{S}_n f(S_n)= & {} \sum ^n_{i=1} I\!\!E\bar{X}_i \left( f(S_n)-f(S_{n,i})\right) \\= & {} \sigma ^2I\!\!E\Delta f(S_n) + \sum ^n_{i=1} I\!\!E\bar{X}_i r_i. \end{aligned}

Recall that $$\mu = \sigma ^2\!+\!b.$$ Hence,

\begin{aligned} \mu I\!\!Ef(S_n\!+\!1) - I\!\!E(S_n\!-\!a) f(S_n) = bI\!\!E\Delta f(S_n) - \sum ^n_{i=1} I\!\!E\bar{X}_i r_i. \end{aligned}

Note that $$|r_i| \le \delta _{X_i,0}^{\tilde{X}_i},$$ $$c_1(f_i) \le 2\Vert \Delta f_i\Vert \le 4(1\!-\!e^{-\mu }) \varepsilon _{i,n}^\textrm{o}$$ by (37), (41), (42), (31). Therefore,

\begin{aligned} I\!\!E| \bar{X}_i r_i | \le 4 (1\!-\!e^{-\mu }) \varepsilon _{i,n}^\textrm{o}I\!\!E|\bar{X}_iX_i|. \end{aligned}

According to (55), $$\Vert \Delta ^2 f_i\Vert \le 2\Vert \Delta g\Vert \varepsilon _{i,n},$$ while (56) yields

\begin{aligned} \Vert \Delta ^2 f_i\Vert \le \frac{_{16}}{^\pi } \Vert g\Vert \big / \sqrt{U_i^2\!+\!(1\!-\!2u^*)U_i\!+\!1/4}. \end{aligned}

In view of (40) and (31), $$\max \{ \Vert g\Vert ;\Vert \Delta g\Vert \} \le 1\!-\!e^{-\mu }$$. Therefore,

\begin{aligned} c_2(f_i) \equiv \Vert \Delta ^2 f_i\Vert \le 2(1\!-\!e^{-\mu }) r^*_{i,n}. \end{aligned}
(63)

Note that $$(m\!-\!\ell )(m\!-\!\ell -\!1\!) - \ell (\ell \!+\!1) = m(m\!-\!1\!) - 2m\ell .$$ Hence,

\begin{aligned} I\!\!E| \bar{X}_i r_i |\le & {} I\!\!E|\bar{X}_i| |X_i(X_i\!-\!1)\!-\!2X_i\tilde{X}_i| \Vert \Delta ^2 f_i\Vert /2\\\le & {} (1\!-\!e^{-\mu }) I\!\!E|\bar{X}_i| |X_i(X_i\!-\!1)\!-\!2X_i\tilde{X}_i| r^*_{i,n} \end{aligned}

by (37), (40), (55), (31), (63). Thus,

\begin{aligned} I\!\!E| \bar{X}_i r_i | \le (1\!-\!e^{-\mu }) \min \{ 4I\!\!E|\bar{X}_iX_i| \varepsilon _{i,n}^\textrm{o}; I\!\!E|\bar{X}_iX_i| |X_i\!-\!1\!-\!2\tilde{X}_i| r^*_{i,n} \}, \end{aligned}

i.e. (62) holds.

It is known (cf. [35, 36]) that

\begin{aligned} |I\!\!E\Delta f(S_n)| \le \mu \bar{\varepsilon }_\mu . \end{aligned}

According to (51), (41), (42), (31),

\begin{aligned} |I\!\!E\Delta f(S_n)| \le 2(1\!-\!e^{-\mu })\varepsilon _{0,n}. \end{aligned}

Thus, (19) holds. The proof is complete. $$\square$$

### Remark 1

Estimates of Theorems 1–3 work best if the maximum span of $$\mathcal{L}(X_1),\ldots ,\mathcal{L}(X_n)$$ is 1. If the maximum span of $$\mathcal{L}(X_i)$$ is $$>\!1$$ for a particular i, then $$d_{_{TV}}\!(X_i;X_i\!+\!1)$$ may be equal to 1, reducing $$U\!=\!\sum ^n_{i=1}u_i$$ and hence increasing the bounds.

The following inequality is employed in the proof of Theorem 4.

### Lemma 8

Let $$\tilde{\nu },\nu ,X,X_1,X_2,\ldots$$ be independent r.v.s taking values in $$\mathbb {Z}_+$$, $$X_i{\mathop {=}\limits ^{d}}X\ (\forall i).$$ Denote $$S_0=0,\,S_k=X_1+\cdots +X_k\ (k\!\in \!I\!\!N).$$ Then

\begin{aligned} d_{\textrm{o}}(S_{\tilde{\nu }};S_\nu ) \le d_\textrm{o}(\tilde{\nu };\nu )/I\!\!P(X\!\ne \!0). \end{aligned}
(64)

Relation (64) is an analogue of (26) in terms of the point metric.

### Proof of Lemma 8

W.l.o.g. we may assume that $$p\!:=\!I\!\!P(X\!\ne \!0)\!\ne \!0.$$ For any $$m\in \mathbb {Z}_+$$

\begin{aligned} I\!\!P(S_{\tilde{\nu }}\!=\!m)-I\!\!P(S_\nu \!=\!m) = \sum _{k\ge 0} \left( I\!\!P(\tilde{\nu }\!=\!k)-I\!\!P(\nu \!=\!k)\right) I\!\!P(S_k\!=\!m). \end{aligned}

Therefore,

\begin{aligned} |I\!\!P(S_{\tilde{\nu }}\!=\!m)-I\!\!P(S_\nu \!=\!m)| \le d_\textrm{o}(\tilde{\nu };\nu ) \sum _{k\ge 0} I\!\!P(S_k\!=\!m). \end{aligned}
(65)

Denote

\begin{aligned} f(m) = \sum _{k\ge 0} I\!\!P(S_k\!=\!m) \qquad (m\!\in \!\mathbb {Z}_+). \end{aligned}

Estimate (64) will follow if we show that $$f(m)\!\le \!1/p$$ for any $$m\!\in \!\mathbb {Z}_+$$.

Clearly, $$f(0)\!=\!\sum _{k\ge 0}q^k \!=\!1/p,$$ where $$q\!=\!1\!-\!p.$$ Let $$m\!\in \!I\!\!N.$$ By Khintchine’s formula (23),

\begin{aligned} S_k {\mathop {=}\limits ^{d}}S'_{\nu _k}, \end{aligned}

where Binomial $$\textbf{B}(k,p)$$ r.v. $$\nu _k$$ is independent of $$\{X_i'\}$$. Hence

\begin{aligned} f(m)= & {} \sum _{k\ge 0} I\!\!P(S'_{\nu _k}\!=\!m) = \sum _{k\ge 0} \sum _{j=0}^k {k \atopwithdelims ()j} p^j q^{k-j} I\!\!P(S'_j\!=\!m)\\= & {} \sum _{j\ge 0} p^j I\!\!P(S'_j\!=\!m) \sum _{k\ge j} {k \atopwithdelims ()j} q^{k-j} = \sum _{j\ge 0} I\!\!P(S'_j\!=\!m)/p, \end{aligned}

where $$0^0\!:=\!1$$; we have used the fact that $$\sum _{k\ge j} {k \atopwithdelims ()j} p^j q^{k-j} = 1/p$$.

Denote by $$\eta (\cdot )$$ the renewal process

\begin{aligned} \eta (m) = \max \{k\!\in \!I\!\!N\!: S'_k\!\le \!m\} \qquad (m\!\in \!\mathbb {Z}_+). \end{aligned}
(66)

Then $$\{S'_k\!\le \!m\} = \{k\!\le \!\eta (m)\},$$ $$I\!\!E\eta (m) \!=\! \sum _{k\ge 1} I\!\!P(S'_k\!\le \!m),$$

\begin{aligned} \sum _{k\ge 0} I\!\!P(S'_k\!=\!m) = I\!\!E\eta (m)-I\!\!E\eta (m\!-\!1) \qquad (m\!\in \!I\!\!N). \end{aligned}

Since $$X'$$ takes values in $$I\!\!N,$$ we have

\begin{aligned} |\eta (m)-\eta (m\!-\!1)|\!\le \!1. \end{aligned}

Indeed, $$S'_{\eta (m-1)}\!\le \!m\!-\!1,\ S'_{\eta (m-1)+1}\!\ge \!m$$ by (66). If $$S'_{\eta (m-1)+1}\!>\!m,$$ then $$\eta (m)=\eta (m\!-\!1).$$ Therefore,

Thus, $$\sum _{j\ge 0}I\!\!P(S'_j\!=\!m)\!\le \!1$$ for all $$m\!\in \!\mathbb {Z}_+,$$ and (65) entails (64). $$\square$$
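The key step of the proof, $$f(m)\!\le \!1/p,$$ can be observed numerically. The sketch below uses a hypothetical function name and the law of X from Example 4 as an illustrative choice. It accumulates the partial sums $$\sum _{k\le K} I\!\!P(S_k\!=\!m)$$ by repeated convolution; since the summands are non-negative, every partial sum is also bounded by 1/p.

```python
def renewal_mass(step_pmf, m_max, k_max):
    """Partial sums f(m) = sum_{k=0}^{k_max} P(S_k = m) for m = 0, ..., m_max.

    Truncating each law to {0, ..., m_max} is exact for these atoms because
    the summands X_i are non-negative.
    """
    size = m_max + 1
    law = [1.0] + [0.0] * m_max          # law of S_0 = 0
    f = list(law)
    for _ in range(k_max):
        nxt = [0.0] * size
        for i, a in enumerate(law):
            if a == 0.0:
                continue
            for j, b in enumerate(step_pmf):
                if i + j < size:
                    nxt[i + j] += a * b
        law = nxt                         # law of S_{k+1}, truncated
        f = [x + y for x, y in zip(f, law)]
    return f

if __name__ == "__main__":
    p = 0.3
    values = renewal_mass([1.0 - p, p / 2.0, p / 2.0], m_max=20, k_max=400)
    print(max(values), 1.0 / p)
```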

### Proof of Theorem 4

According to (25),

\begin{aligned} S_n{\mathop {=}\limits ^{d}}S'_{\nu _n}, \end{aligned}

where $$\nu _n=\tau _1+\cdots +\tau _n,$$ $$\tau _1,X_1',\ldots ,\tau _n,X_n'$$ are independent r.v.s, $$\mathcal{L}(X_i')=\mathcal{L}(X_i|X_i\!\ne \!0),\,\mathcal{L}(\tau _i)=\textbf{B}(p_i)\ (\forall i),$$ $$p_i=I\!\!P(X_i\!\ne \!0),$$

\begin{aligned} S'_0:=0,\ \ \ S'_k=X_1'+\cdots +X_k'\ \ \ (k\!\in \!I\!\!N). \end{aligned}

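Thus, $$d_{\textrm{o}}(S_n;S'_{\pi _{n\bar{p}}}) = d_\textrm{o}(S'_{\nu _n};S'_{\pi _{n\bar{p}}}).$$ Since $$I\!\!P(X'\!\ne \!0)\!=\!1,$$ inequality (64) yields

\begin{aligned} d_\textrm{o}(S'_{\nu _n};S'_{\pi _{n\bar{p}}}) \le d_\textrm{o}(\nu _n;\pi _{n\bar{p}}). \end{aligned}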

An application of (11) leads to (27). $$\square$$
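The representation $$S_n{\mathop {=}\limits ^{d}}S'_{\nu _n}$$ used at the start of this proof can be checked numerically in the i.i.d. case, where $$\nu _n$$ is a Binomial $$\textbf{B}(n,p)$$ r.v. The sketch below uses hypothetical function names and, as an illustrative choice, the law of X from Example 4. It compares the n-fold convolution with the Binomial mixture of the laws of $$S'_j$$.

```python
import math

def convolve(a, b):
    """Full convolution of two mass functions given as lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def khintchine_gap(n, p):
    """max_m |P(S_n = m) - sum_j P(nu_n = j) P(S'_j = m)| for the law of Example 4."""
    step = [1.0 - p, p / 2.0, p / 2.0]      # law of X
    direct = [1.0]
    for _ in range(n):
        direct = convolve(direct, step)      # law of S_n on {0, ..., 2n}
    cond = [0.0, 0.5, 0.5]                   # law of X' = (X | X != 0)
    mixture = [0.0] * (2 * n + 1)
    s_prime = [1.0]                          # law of S'_0 = 0
    for j in range(n + 1):
        weight = math.comb(n, j) * p**j * (1.0 - p)**(n - j)   # P(nu_n = j)
        for m, v in enumerate(s_prime):
            mixture[m] += weight * v
        s_prime = convolve(s_prime, cond)    # law of S'_{j+1}
    return max(abs(x - y) for x, y in zip(direct, mixture))

if __name__ == "__main__":
    print(khintchine_gap(30, 0.2))
```

The two mass functions should agree up to floating-point rounding.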

### Proof of Proposition 5

If $$p\!\le \!n^{-2/3},$$ then (12) yields

\begin{aligned} d_{\textrm{o}}(\textbf{B}(n,p);\mathbf{\Pi }(np)) \le \frac{_1}{^2} (3/2e)^{3/2} n^{-5/6} + 4\sqrt{2/\pi }\,(n\!-\!1)^{-3/2} + \frac{_{16}}{^\pi } (n\!-\!1)^{-5/3}. \end{aligned}

If $$p\!>\!n^{-2/3}$$, then we apply (8) to get

\begin{aligned} d_{\textrm{o}}(\textbf{B}(n,p);P_{n,p}) \le Cn^{-5/6}. \end{aligned}

Combining these bounds, we derive (29). $$\square$$
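The first case of the proof can be illustrated by evaluating the point distance at the switching value $$p\!=\!n^{-2/3}$$ and rescaling it by $$n^{5/6}$$. The sketch below uses hypothetical function names; the sample sizes and the number of extra atoms scanned beyond n are arbitrary choices. It is a numerical illustration only and does not estimate the constant C in (29).

```python
import math

def binomial_pmf(n, p, j):
    """P(B(n, p) = j), computed in log-space; zero for j > n. Assumes 0 < p < 1."""
    if j > n:
        return 0.0
    log_pmf = (math.lgamma(n + 1) - math.lgamma(j + 1) - math.lgamma(n - j + 1)
               + j * math.log(p) + (n - j) * math.log1p(-p))
    return math.exp(log_pmf)

def poisson_pmf(lam, j):
    """P(Poisson(lam) = j), computed in log-space."""
    return math.exp(-lam + j * math.log(lam) - math.lgamma(j + 1))

def point_distance(n, p, extra=50):
    """max_j |P(B(n,p) = j) - P(Poisson(np) = j)| over j = 0, ..., n + extra."""
    lam = n * p
    return max(abs(binomial_pmf(n, p, j) - poisson_pmf(lam, j))
               for j in range(n + extra + 1))

def scaled_distances(sizes=(10, 100, 1000)):
    """Rows (n, p, d_o, n^{5/6} d_o) with p = n^{-2/3}."""
    rows = []
    for n in sizes:
        p = n ** (-2.0 / 3.0)
        d = point_distance(n, p)
        rows.append((n, p, d, d * n ** (5.0 / 6.0)))
    return rows

if __name__ == "__main__":
    for row in scaled_distances():
        print(row)
```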