1 Introduction

In this paper, we are interested in properties of two probability distributions defined on the infinite set \(\{0,1,2, \ldots \}\) and generalizing the ordinary discrete uniform and binomial distributions. Both of these extensions have been recently discussed in Calude and Dumitrescu (2020) and mentioned in Zhigljavsky (2012); both extensions use the notion of grossone. The grossone, introduced in Sergeyev (2013) and denoted by \(\textcircled {1}\), is a model of infinity which, as shown in Sergeyev (2009), Sergeyev (2017) and many other publications can be very useful in solving diverse problems of computational mathematics and optimization; in such applications, \(\textcircled {1}\) is used as numerical infinity. Grossone can also be useful as a theoretical model of infinity, see, e.g., (Zhigljavsky 2012; Sergeyev 2017). Some historical, philosophical and logical aspects of grossone have been considered in Lolli (2012), Lolli (2015), Hansson (2020). In Sect. 1, we consider and briefly discuss postulates of \(\textcircled {1}\).

For a positive integer n, the discrete uniform distribution on the set \(\{0,1, \ldots , n-1\}\) assigns equal mass 1/n to all integers \(j\in \{0, \ldots , n-1\}\). We use the notation DU(n) from Balakrishnan and Nevzorov (2004) for this distribution. An extension of this distribution to the infinite set \(\{0,1,2, \ldots \}\) is denoted by DU\((\infty )\) and is used in Bayesian statistics as a vague prior (often called the ‘Jeffreys prior’) for large integer-valued parameters, in particular for the parameter N in the binomial distribution Bin(N,p), see, e.g., (Raftery 1988). An extension of DU(n) to DU\((\textcircled {1})\) is straightforward: for a random variable (r.v.) \(\xi \sim \)DU\((\textcircled {1})\), we have \(\mathrm{Pr} \{ \xi = k \}=1/(\textcircled {1})\) for all \(k \in \{0,1, \ldots ,\textcircled {1}-1\}\). This distribution has been considered, in particular, in Calude and Dumitrescu (2020). The distribution DU\((\textcircled {1})\) is easier to work with than DU\((\infty )\): indeed, DU\((\infty )\) is a vague (improper) distribution but, if one agrees to perform calculations with \(\textcircled {1}\), DU\((\textcircled {1})\) is a well-defined distribution, very similar to DU(n).

In Sect. 2.2, we consider the problem of decomposing a random variable \(\xi \sim \) DU\((\textcircled {1})\) into sums \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\), where \(\xi _1 , \ldots , \xi _m\) are independent non-degenerate random variables and the equality \({\mathop {=}\limits ^\mathrm{d}}\) means that the distributions of the random variables in the lhs and rhs of the equation are equal. In particular, we shall establish that DU\((\textcircled {1})\) is not an infinitely divisible distribution, although its infinite divisibility might have been expected in view of the results of Warde and Katti (1971).

The probability mass function (pmf) for Bin(N,p), the binomial distribution with parameters N and p, is

$$\begin{aligned} b_x={\frac{N!\,}{ \left( N-x \right) !\, x!}} {p}^{x} \left( 1-p \right) ^{N-x}\, ,\;\;x=0,1, \ldots , N, \end{aligned}$$
(1)

where N is usually interpreted as the number of Bernoulli trials and p as the probability of success in these trials. We are interested in approximating the binomial probabilities (1) in the case when N is (very) large but p is rather small, say \(p=c/N^{\alpha }\) with finite \(c>0\) and \(1/2<\alpha \le 1\). This case is important for understanding the distribution Bin\((\textcircled {1},p)\), the grossone extension of Bin(N,p).

According to the central limit theorem, for any fixed p and large N, the distribution of [Bin(\(N,p)-Np]/\sqrt{Np(1-p)}\) is approximately the standard normal distribution \({{\mathcal {N}}}(0,1)\) and, therefore, the binomial distribution Bin(N,p) can be approximated by the normal distribution \( {{\mathcal {N}}}(Np,{Np(1-p)}). \) However, if p is small then, even for very large N, this normal approximation is very poor, especially in the tails, see for example (Berry 1941). Also, the support of a random variable with distribution \({{\mathcal {N}}}(Np,Np(1-p))\) barely resembles the support of Bin(N,p), and this could be a serious problem in practice. There are many improvements to the normal approximation, see, e.g., (Brown et al. 2001). However, even corrected normal approximations are rather poor in approximating tails; in particular, the approximations based on the Edgeworth expansion do not guarantee that the approximations to individual binomial probabilities are non-negative, see for example (Petrov 1995) for an excellent account of different approximations in the CLT. Even the shape of the normal approximation \( {{\mathcal {N}}}(Np,Np(1-p))\) may be misleading. Consider, for instance, the skewness, which is a widely accepted measure of the asymmetry of a distribution. The skewness of \({{\mathcal {N}}}(0,1)\) is zero, whereas the skewness of [Bin(\(\textcircled {1}\),p)\(-p\) \(\textcircled {1}\) \(]/\sqrt{p(1-p)\textcircled {1}}\) is \( \gamma _1={{(1-2p)/\sqrt{p(1-p)\textcircled {1}}}.} \) As an example, for \(p=\lambda /\textcircled {1}\) we have \(\gamma _1=1/\sqrt{\lambda } + O(\textcircled {1}^{-1})\), which shows that even if N is very large, the binomial distribution Bin(N,p) can still be very asymmetric for small p, even after renormalization.
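
To illustrate how poor the normal approximation can be in the lower tail, the following small Python check (added here for illustration; it is not part of the original study) compares the exact binomial tail probability with its normal counterpart and evaluates the skewness \(\gamma _1\) for \(N=1000\) and \(p=0.03\), the values used in Fig. 1 below.

import math
from scipy.stats import binom, norm

N, p = 1000, 0.03                                   # values used in Fig. 1
mu, s = N * p, math.sqrt(N * p * (1 - p))

# Skewness of Bin(N,p): (1 - 2p)/sqrt(Np(1-p)); clearly non-zero for small p
gamma1 = (1 - 2 * p) / s
print(f"skewness gamma_1 = {gamma1:.4f}")

# Exact vs normal lower-tail probability Pr(X <= mu - 4s)
x_tail = int(mu - 4 * s)
print(f"exact  P(X <= {x_tail}) = {binom.cdf(x_tail, N, p):.3e}")
print(f"normal P(X <= {x_tail}) = {norm.cdf(x_tail, loc=mu, scale=s):.3e}")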

Bearing in mind that the normal approximation to Bin(N,p) cannot be suitably corrected if p is small, in Sect. 3 we will concentrate on correcting the Poisson approximation to Bin(N,p), assuming that N is very large but p is of order \(c/N^{\alpha }\) with finite \(c>0\) and \(1/2<\alpha \le 1\).

One of the central concepts used below is the concept of grossone which has been introduced in Sergeyev (2013), developed in a series of papers by Ya. Sergeyev and coauthors and recently comprehensively reviewed in Sergeyev (2017). Grossone can be defined axiomatically, see (Sergeyev 2017). The two main axioms are given below.

Axiom 1

(Grossone is ‘the largest natural number’). The set of natural numbers is \({\mathbb {N}}=\{ 1,2, \ldots , \textcircled {1} \}\), where \(\textcircled {1}\) is the grossone.

Axiom 2

(Divisibility) For any finite positive integer n, \(\textcircled {1}\) is divisible by n.

The grossone models infinity. Similarly, quantities like \(1/\textcircled {1}\) and \(1/\textcircled {1}^2\) model infinitesimals. These models, as comprehensively discussed in Sergeyev (2017), are very useful both as theoretical models and as models of numerical infinity and infinitesimals. A very attractive feature of this numerical infinity and these infinitesimals is the possibility of operating with them numerically, exactly as with ordinary numbers (rather than symbolically, as in MAPLE); this feature is the key concept of the ‘infinity computer’ discussed in many publications of Ya. Sergeyev and coauthors, see (Sergeyev 2009, 2017).
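
As a rough illustration of this point, in the grossone numeral system a number is recorded as a finite combination \(\sum _i c_i \textcircled {1}^{\,p_i}\) of gross-powers with finite coefficients, and arithmetic acts on such records termwise. The toy Python sketch below is our own illustration of this idea (it is not the arithmetic of the Infinity Computer): records are stored as dictionaries mapping powers to coefficients, and the infinitesimal \(1/\textcircled {1}\) is multiplied by \(\textcircled {1}-1\).

# A toy sketch (not the actual Infinity Computer arithmetic): a number
# sum_i c_i * grossone**p_i is stored as the dictionary {p_i: c_i}.
def gross_add(a, b):
    out = dict(a)
    for power, coeff in b.items():
        out[power] = out.get(power, 0) + coeff
    return {q: c for q, c in out.items() if c != 0}

def gross_mul(a, b):
    out = {}
    for qa, ca in a.items():
        for qb, cb in b.items():
            out[qa + qb] = out.get(qa + qb, 0) + ca * cb
    return out

one_over_g = {-1: 1.0}                # the infinitesimal 1/grossone
g_minus_1  = {1: 1.0, 0: -1.0}        # grossone - 1
print(gross_add({0: 1.0}, one_over_g))    # 1 + 1/grossone
print(gross_mul(one_over_g, g_minus_1))   # 1 - 1/grossone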

In mathematics, a more common approach to modeling infinitesimal quantities is to use the framework of non-standard analysis. The non-standard analysis approach for modeling infinitesimal probabilities has been recently discussed in Benci et al. (2018). Modeling infinitesimal probabilities with \(1/\textcircled {1}\) and similar quantities involving \(\textcircled {1}\) has also attracted serious attention, see (Calude and Dumitrescu 2020; Sergeyev 2017; Rizza 2018). One of the attractive features of the grossone-based approach is that one may simultaneously work with infinitesimal probabilities of different orders, such as \(1/\textcircled {1}\) and \(1/\textcircled {1}^2\). It should be stressed that the grossone-based methodology is different from the approach based on non-standard analysis, see (Sergeyev 2019).

2 Deconvolutions of discrete distributions

2.1 Deconvolution and infinite divisibility of a discrete r.v.

The concepts of deconvolution of a r.v. and its infinite divisibility are closely related.

A r.v. \(\xi \) can be deconvoluted if it can be represented as \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1+ \xi _2\), where \(\xi _1\) and \( \xi _2\) are independent but not necessarily identically distributed r.v. Let \(\phi (t)= E \exp {(i t\xi )}\) be the characteristic function (c.f.) of \(\xi \). Clearly, \(\xi \) can be deconvoluted if and only if \(\phi (t)\) can be written as a product of two or more characteristic functions of non-degenerate r.v. In Sect. 2.2, we consider the case where \(\xi \) is a discrete uniform r.v.

Let now \(\xi \) be a discrete r.v. taking values \(0,1, \ldots \) with probabilities \(p_i=\mathrm{Pr} (\xi =i)\), \(i=0,1, \ldots \) (here we use the traditional language). This r.v. \(\xi \) is infinitely divisible if for any positive integer n there exist i.i.d.r.v. \(\xi _{j,n}\) (\(j=1, \ldots , n\)) such that \(\xi {} {\mathop {=}\limits ^\mathrm{d}} \xi _{1,n}+\ldots + \xi _{n,n}\). Equivalently, a r.v. \(\xi \) is infinitely divisible if and only if its c.f. \(\phi (t)= E \exp {(i t\xi )}\) satisfies the relation \(\phi (t) =\left[ \phi _n(t) \right] ^n \) for any \(n = 1,2,\ldots ,\) where \(\phi _n(t)\) are some characteristic functions.
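
As a standard illustration, a Poisson r.v. with parameter \(\lambda \) is infinitely divisible: its c.f. factorizes for every finite n as

$$\begin{aligned} \phi (t)=\exp \{ \lambda (e^{it}-1)\} =\left[ \exp \{ (\lambda /n)(e^{it}-1)\} \right] ^{n} = \left[ \phi _n(t)\right] ^{n} , \end{aligned}$$

where \(\phi _n(t)\) is the c.f. of a Poisson r.v. with parameter \(\lambda /n\); no such factorization is possible for non-degenerate distributions with bounded support, as explained next.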

This implies, in particular, that a discrete r.v. \(\xi \) with finite support cannot be infinitely divisible: if a non-degenerate \(\xi \) with bounded range were a sum of n i.i.d. components, each component would have range of order 1/n, so the variance of \(\xi \) would tend to 0 as \(n\rightarrow \infty \), a contradiction. Hence, the discrete uniform DU(n) and binomial Bin(N,p) r.v. are not infinitely divisible. Note that the c.f. \(\phi _{\mathrm{DU}(n)}(t)\) of \(\xi \sim \)DU(n) and the c.f. \(\phi _{\mathrm{Bin}(N,p)}(t)\) of \(\xi \sim \)Bin(N,p) are

$$\begin{aligned} \phi _{\mathrm{DU}(n)}(t)= \frac{1-e^{int}}{n(1-e^{it})}\, , \;\;\; \phi _{\mathrm{Bin}(N,p)}(t)= \left( 1-p+p\, e^{it}\right) ^N, \end{aligned}$$
(2)

see formulas (2.22) and (5.5) in Balakrishnan and Nevzorov (2004).
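
The formulas (2) are easy to verify numerically; the short Python check below (added here for illustration) compares them with the c.f. computed directly from the pmf.

import numpy as np
from scipy.stats import binom

def cf_from_pmf(pmf, t):
    # Direct evaluation of E exp(i t xi) for a pmf supported on {0,1,...,len(pmf)-1}
    k = np.arange(len(pmf))
    return np.sum(pmf * np.exp(1j * t * k))

t = 0.4
n = 7                                       # DU(n)
pmf_du = np.full(n, 1 / n)
print(cf_from_pmf(pmf_du, t),
      (1 - np.exp(1j * n * t)) / (n * (1 - np.exp(1j * t))))

N, p = 9, 0.3                               # Bin(N,p)
pmf_bin = binom.pmf(np.arange(N + 1), N, p)
print(cf_from_pmf(pmf_bin, t), (1 - p + p * np.exp(1j * t)) ** N)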

A very general sufficient condition for the infinite divisibility of a discrete r.v. has been established in Warde and Katti (1971):

Theorem 1

(see Theorem 2.1 in Warde and Katti (1971)) If \( p_0 \ne 0\), \( p_1\ne 0\) and the ratios \(p_{j+1}/p_j \) (\(j=0,1, \ldots \)) form a monotonically non-decreasing sequence, then the r.v. \(\xi \) is infinitely divisible.
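
For instance (a small numerical illustration added here, using scipy), the ratios \(p_{j+1}/p_j\) are constant for a geometric distribution on \(\{0,1,\ldots \}\), so Theorem 1 applies and the geometric distribution is infinitely divisible; for Bin(N,p) the ratios are strictly decreasing, so the criterion is not applicable, in agreement with Bin(N,p) not being infinitely divisible.

import numpy as np
from scipy.stats import geom, binom

j = np.arange(0, 8)

# Geometric distribution on {0,1,...}: the ratios p_{j+1}/p_j are constant
# (hence non-decreasing), so Theorem 1 applies; the geometric distribution
# is indeed infinitely divisible.
p_geom = geom.pmf(j + 1, 0.3)          # scipy's geom is supported on {1,2,...}
print(np.round(p_geom[1:] / p_geom[:-1], 4))    # constant 0.7

# Bin(N,p): the ratios are strictly decreasing, so Theorem 1 is not
# applicable, in agreement with Bin(N,p) not being infinitely divisible.
p_bin = binom.pmf(j, 20, 0.3)
print(np.round(p_bin[1:] / p_bin[:-1], 4))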

One of our aims in this paper is to generalize the concept of infinite divisibility to the case when the vague \(\infty \) is replaced with the rigid \(\textcircled {1}\) and to check whether Theorem 1 can still be applied. In Sect. 2.2 below, we consider DU\((\textcircled {1})\) and show that the formal extension of Theorem 1 to random variables defined on \(\{0,1, \ldots , \textcircled {1}-1\}\) or \(\{0,1, \ldots , \textcircled {1}\}\) fails. On the other hand, as shown below in this section, the formal extension of Theorem 1 does not apply to Bin\((\textcircled {1},p)\), and yet the distribution Bin\((\textcircled {1},p)\) is infinitely divisible.

Consider a discrete r.v. \(\xi \) taking values either in \(\{0,1, \ldots , \textcircled {1}-1\}\) or in \(\{0,1, \ldots , \textcircled {1}\}\). In both cases, \(\phi (t)= E \exp {(i t\xi )}\), the c.f. of \(\xi \), is well-defined and we can still classify a r.v. \(\xi \) as infinitely divisible if \(\phi (t) =\left[ \phi _n(t) \right] ^n \) for any finite integer \(n = 1,2,\ldots ,\) where \(\phi _n(t)\) are some characteristic functions.

Consider \(\xi \sim \)Bin\((\textcircled {1},p)\) defined on \(\{0,1, \ldots , \textcircled {1}\}\) with

$$\begin{aligned} p_j={\frac{\textcircled {1}!\,}{ \left( \textcircled {1}-j \right) !\, j!}} {p}^{j} \left( 1-p \right) ^{\textcircled {1}-j}\, ,\;\;j=0,1, \ldots , \textcircled {1}\, . \end{aligned}$$
(3)

For the ratios \(p_{j+1}/p_j\), we have

$$\begin{aligned} \frac{p_{j+1}}{p_j}= & {} \frac{\textcircled {1}-j}{j+1} \frac{p}{1-p}= \frac{\textcircled {1} \, p}{(1-p)(j+1)}\nonumber \\&- \frac{jp}{(1-p)(j+1)}\, . \end{aligned}$$
(4)

As \(\textcircled {1} \) is much larger than any finite j, the second term in (4) can be neglected for such j, and we clearly see that, at least for finite j, the ratios \(p_{j+1}/p_j\) decrease with j. The formal extension of Theorem 1 is thus not applicable; nevertheless, \(\xi \sim \)Bin\((\textcircled {1},p)\) is infinitely divisible if we assume Axiom 2. This is a direct consequence of (2) with \(N=\textcircled {1}\).

2.2 Deconvolution of a discrete uniform distribution DU\((\textcircled {1})\)

Decomposition of the discrete uniform random variable \(\xi \sim \,\)DU(n) into a sum \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\), where \(\xi _1 , \ldots , \xi _m\) are independent non-degenerate random variables, is considered in Zhigljavsky et al. (2016). It is shown in Zhigljavsky et al. (2016) that such decompositions exist if and only if n is a composite number and that the number of different decompositions \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\) is equal to the number of all ordered factorizations of n. The results of Zhigljavsky et al. (2016) have been recently extended in Gillard and Zhigljavsky (2016), Golyandina and Zhigljavsky (2020) to cover the case of deconvolution of positive definite matrices.

Theorem 1 of Zhigljavsky et al. (2016) states that if \(\xi \sim \,\)DU(n) can be represented as \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \xi _2\), where \(\xi _1 \) and \( \xi _2\) are independent non-degenerate random variables, then both \(\xi _1 \) and \( \xi _2\) are uniformly distributed on some subsets of \(\{0,1, \ldots , n-1\}\). Let us show that such random variables \(\xi _1 \) and \( \xi _2\) cannot be identically distributed. Indeed, the probability generating function (pgf) of \(\xi \) is \( \mathrm{F}(s)=(1+s+\ldots +s^{n-1})/n. \) The assumption \(\xi = \xi _1 + \xi _2\) with \(\xi _1, \) \( \xi _2\) identically distributed implies \(\mathrm{F}(s)= \mathrm{F}_0^2(s)\), where \(\mathrm{F}_0(s)\) is the pgf of \(\xi _1 \) and \( \xi _2\). From the above-mentioned Theorem 1 of Zhigljavsky et al. (2016), \( \mathrm{F}_0(s)=(a_0+a_1s+\ldots +a_{n-1}s^{n-1})/k \) with \(k=\sqrt{n}\) and coefficients \(a_j\in \{0,1\}\). The squared pgf \(\mathrm{F}_0^2(s)\) cannot have the form \((1+s+\ldots +s^{n-1})/n\), since in the expansion of \(n\mathrm{F}_0^2(s)\) the coefficients of some powers of s are necessarily larger than 1. Similarly, for any \(n, m>1\) the random variable \(\xi \sim \) DU(n) cannot be represented as a sum \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\), where \(\xi _1 , \ldots , \xi _m\) are i.i.d.r.v.
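
The structure of these decompositions is easy to check numerically. In the sketch below (our own illustration in numpy), the ordered factorization \(12=3\cdot 4\) gives DU(12) as the sum of a uniform r.v. on \(\{0,1,2\}\) and an independent uniform r.v. on \(\{0,3,6,9\}\); the second check confirms that squaring a polynomial with 0/1 coefficients and at least two nonzero terms always produces a coefficient larger than 1, so no i.i.d. decomposition exists.

import numpy as np

# Ordered factorization 12 = 3 * 4: xi_1 uniform on {0,1,2} and xi_2 uniform
# on {0,3,6,9}; the pmf of their independent sum is the convolution below.
pmf1 = np.zeros(12); pmf1[[0, 1, 2]] = 1 / 3
pmf2 = np.zeros(12); pmf2[[0, 3, 6, 9]] = 1 / 4
print(np.allclose(np.convolve(pmf1, pmf2)[:12], 1 / 12))   # True: DU(12)

# The square of a polynomial with 0/1 coefficients and at least two nonzero
# terms always has a coefficient >= 2, so n * F_0(s)^2 cannot equal
# 1 + s + ... + s^(n-1): no i.i.d. decomposition of DU(n) exists.
a = np.zeros(12); a[[0, 2, 4, 6]] = 1          # an arbitrary 0/1 pattern
print(np.convolve(a, a).max())                 # 4.0 here, in any case >= 2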

All results of Zhigljavsky et al. (2016) can be extended to the case \(n=\textcircled {1}\). Likewise, we arrive at the conclusion that \(\xi \sim \) DU\((\textcircled {1})\) cannot be represented as a sum \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\), where \(\xi _1 , \ldots , \xi _m\) are non-degenerate i.i.d.r.v. (\(1<m \le \textcircled {1}\)). This conclusion may seem to contradict Theorem 1 of Sect. 2.1. However, the proof of Theorem 1 of Sect. 2.1 (which is Theorem 2.1 of Warde and Katti (1971)) cannot be extended to the case when \(\mathrm{Pr} (\xi =j+1)/\mathrm{Pr} (\xi =j)=1 \) for all j, since for the validity of formula (3) of Warde and Katti (1971) at least one of the inequalities \(p_{j+1}/p_j \ge p_{j}/p_{j-1} \) must be strict.

For \(\xi \sim \) DU\((\textcircled {1})\), the number of different decompositions \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\) with independent non-degenerate random variables \(\xi _1 , \ldots , \xi _m\) is equal to the number of all ordered factorizations of \(\textcircled {1}\). In view of Hille (1936) and (Chor et al. 2000), the number of all ordered factorizations of n (and hence the number of different decompositions of \(\xi \sim \) DU(n) into sums of independent non-degenerate random variables) may reach the order \(n^\rho \), where \(\rho \simeq 1.72865\) is defined as the solution of the equation \(\zeta (\rho )=2\) and \(\zeta (\cdot )\) is the classical Riemann zeta-function; this function is unambiguously defined for all s with Re\((s)>1\).
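
For finite n, the number of ordered factorizations and the constant \(\rho \) can be computed directly; the following sketch (an illustration we add here; ordered_factorizations is our own helper and counts n itself as a one-factor product) does this in Python.

from functools import lru_cache
from scipy.optimize import brentq
from scipy.special import zeta

@lru_cache(maxsize=None)
def ordered_factorizations(n):
    # Number of ways to write n as an ordered product of factors > 1
    # (n itself, as a single factor, is counted; the value for n = 1 is 1).
    if n == 1:
        return 1
    return sum(ordered_factorizations(n // d)
               for d in range(2, n + 1) if n % d == 0)

print(ordered_factorizations(12))         # 8
print(ordered_factorizations(2 ** 10))    # grows quickly with the number of divisors

# The exponent rho solving zeta(rho) = 2, cf. Hille (1936)
print(brentq(lambda s: zeta(s) - 2, 1.5, 2.0))    # approx 1.72865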

Assuming the grossone divisibility axiom, we thus expect the number of all ordered factorizations of \(\textcircled {1}\) to be much larger than \(\textcircled {1}\), possibly reaching the order \(\textcircled {1}\) \(^{\rho }\) with \(\rho \simeq 1.72865\). Note that this is not a precise statement, as the divisibility axiom does not give us enough information about all divisors of \(\textcircled {1}\). Note also that there are difficulties (discussed in Sergeyev (2017), Ya (2011)) regarding the use of the zeta-function in the grossone-based universe, since many different zeta-functions can be distinguished in the latter.

3 Improving Poisson approximation to the Binomial probabilities

The pmf for Poi\((\lambda )\), the Poisson distribution with parameter \(\lambda \), is defined by

$$\begin{aligned} p_x={\frac{{\mathrm{e}^{-\lambda }}{\lambda }^{x}}{x!}}\, ,~~ x=0,1,2, \ldots \,. \end{aligned}$$
(5)

There is a large literature on the accuracy of the Poisson approximation to Bin(N,p), see for example (Hodges and Le Cam 1960; Duembgen et al. 2019). If we are not interested in approximating the tails of the binomial distribution, then the Poisson approximation Poi\((\lambda )\) to Bin(N,p) is rather accurate when \(\lambda =pN\) is not too large. However, unless \(\lambda \) is small enough, Poi\((\lambda )\) does not approximate the tails of Bin(N,p) well even if N is very large. There are several approaches for correcting the Poisson approximation, including the Stein–Chen method, see (Barbour et al. 1992). Among these asymptotic corrections, the best known is based on the expansion with respect to the Charlier polynomials. This expansion was developed in Uspensky (1931) and can be written as follows:

$$\begin{aligned} \frac{b_x}{p_x}= & {} \left[ 1\! -\!\frac{{\tilde{c}}_2(x;\lambda )}{2N} \! + \!\frac{8 {\tilde{c}}_3(x;\lambda ) \! + \! 3 {\tilde{c}}_4(x;\lambda ) }{4!~ N^2} \right. \nonumber \\&\left. \!-\! \frac{12 {\tilde{c}}_4(x;\lambda )\! +\! 8 {\tilde{c}}_5(x;\lambda )\!+\!{\tilde{c}}_6(x;\lambda ) }{2\cdot 4!~ N^3}\! +\!O\left( \frac{1}{N^4}\right) \right] \;, \end{aligned}$$
(6)

where \(\lambda =Np\), \(b_x\) and \(p_x\) are defined by (1) and (5), respectively, \({\tilde{c}}_j(x;\lambda )= \lambda ^j {c}_j(x;\lambda )\) and \({c}_j(x;\lambda )\) are the Charlier polynomials

$$\begin{aligned} {c}_j(x;\lambda )=\sum _{k=0}^j (-1)^k \lambda ^{-k} \left( \begin{array}{c} j \\ k \\ \end{array} \right) \left( \begin{array}{c} x \\ k \\ \end{array} \right) k!\, . \end{aligned}$$
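
For reference, the first-order Charlier correction in (6), i.e. keeping only the term \({\tilde{c}}_2(x;\lambda )/(2N)\), can be implemented as follows (our own illustrative sketch in Python; binom_pmf_charlier1 is a name we introduce here, and the values \(N=10^4\), \(p=0.008\) are those used in Fig. 4 below).

import math
from scipy.stats import binom, poisson
from scipy.special import comb

def charlier(j, x, lam):
    # Charlier polynomial c_j(x; lambda) as defined above
    return sum((-1) ** k * lam ** (-k) * comb(j, k) * comb(x, k) * math.factorial(k)
               for k in range(j + 1))

def binom_pmf_charlier1(x, N, p):
    # Poisson pmf corrected by the first term of (6): 1 - tilde{c}_2(x;lambda)/(2N)
    lam = N * p
    return poisson.pmf(x, lam) * (1 - lam ** 2 * charlier(2, x, lam) / (2 * N))

N, p = 10_000, 0.008                   # lambda = 80, the setting of Fig. 4
for x in (40, 60, 80, 100):
    print(x, binom.pmf(x, N, p), binom_pmf_charlier1(x, N, p))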

We will derive an alternative improvement to the Poisson approximation which, as we will demonstrate, is very accurate at the lower tail of the binomial distribution Bin(N,p) with very large N and very small p; the value of \(\lambda =Np\) may be rather large but should be smaller than \(\sqrt{N}\).

To start with, we rearrange the binomial probability \(b_x\) as follows

$$\begin{aligned} b_x= & {} {\frac{N!\,}{ ( N-x ) !\, x!}} {p}^{x} \left( 1-p \right) ^{N-x}= \frac{\lambda ^x}{x!} \underbrace{ \frac{N!\,}{ ( N-x ) ! N^x} }_{R} \nonumber \\&\underbrace{\left( 1-\frac{\lambda }{N} \right) ^N}_{Q} \underbrace{\left( 1-\frac{\lambda }{N} \right) ^{-x}}_{T}. \end{aligned}$$
(7)

Assume that N is large and \(p=\lambda /N\). For all x such that \(\lambda x \ll N\), we have

$$\begin{aligned} T=\left( 1-\frac{\lambda }{N} \right) ^{-x}= & {} 1+ \sum _{j=1}^\infty T_j(\lambda ,x) N^{-j} \end{aligned}$$
(8)

with

$$\begin{aligned} T_j(\lambda ,x)= \frac{{\lambda }^{j}\, x^{(j)}}{j! } \;\;{ (j=1,2, \ldots ),} \end{aligned}$$

where \(x^{(j)}=x(x+1)\ldots (x+j-1)\) is the rising factorial. The series (8) converges if \(\lambda x /N \rightarrow 0\) as \(N \rightarrow \infty \).

Now, consider Q of (7):

$$\begin{aligned} Q= & {} \left( 1-\frac{\lambda }{N} \right) ^N = \exp \left\{ N \log \left( 1-\frac{\lambda }{N} \right) \right\} \\= & {} \exp \left\{ -N \sum _{k=1}^\infty \frac{\lambda ^k}{k\,N^k} \right\} =e^{-\lambda } e^{-y}, \end{aligned}$$

where

$$\begin{aligned} y= \sum _{k=1}^\infty \frac{\lambda ^{k+1}}{(k+1)\, N^k} = \lambda \sum _{k=1}^\infty \frac{\lambda ^{k}}{(k+1)\, N^k} \, . \end{aligned}$$

Using the expansion \(e^{-y}= 1+\sum _{m=1}^\infty (-1)^m y^m /m! \) and writing \(y^m\) in the form of the multiple sum

$$\begin{aligned} y^m= \lambda ^m \sum _{k_1=1}^\infty \ldots \sum _{k_m=1}^\infty \frac{ \lambda ^{k_1} \ldots \lambda ^{k_m} }{ N^{k_1} \ldots N^{k_m} (k_1+1) \ldots (k_m+1) } \, , \end{aligned}$$

we obtain

$$\begin{aligned}&\left( 1-\frac{\lambda }{N} \right) ^N = e^{-\lambda } \left[ 1+ \sum _{m=1}^\infty \frac{(-1)^m \lambda ^m}{m!} \sum _{k_1=1}^\infty \right. \\&\quad \left. \ldots \sum _{k_m=1}^\infty \frac{ \lambda ^{k_1} \ldots \lambda ^{k_m} }{ N^{k_1} \ldots N^{k_m} (k_1+1) \ldots (k_m+1) } \, \right] . \end{aligned}$$

Collecting the terms by powers of 1/N, we obtain

$$\begin{aligned} Q=\left( 1-\frac{\lambda }{N} \right) ^N = e^{-\lambda } \left[ 1+ \sum _{j=1}^\infty Q_j(\lambda )N^{-j} \right] , \end{aligned}$$
(9)

where the first four polynomials \(Q_j(\lambda )\) are

$$\begin{aligned} Q_1(\lambda )= & {} -\frac{1}{2}~\lambda ^2,~~\\ Q_2(\lambda )= & {} \frac{1}{4!}~ {\lambda }^{3} \left( 3\,\lambda -8 \right) ,~~\\ Q_3(\lambda )= & {} -\frac{1}{2 \cdot 4!}~ {\lambda }^{4} \left( \lambda -2 \right) \left( \lambda -6 \right) ,\\ Q_4(\lambda )= & {} \frac{ 1}{8 \cdot 6!}~ {\lambda }^{5} \left( 15\,{\lambda }^{3}-240\,{\lambda }^{2}+1040\,\lambda -1152 \right) . \end{aligned}$$

The series (9) converges if \(\lambda /\sqrt{N} \rightarrow 0 \) as \(N \rightarrow \infty \).

For the term R of (7), we have:

$$\begin{aligned} R= {\frac{N!\,}{ ( N-x )! N^x \,}} = \prod _{i=0}^{x-1} \left( 1-\frac{i}{N}\right) = 1+\sum _{j=1}^\infty R_j(x)N^{-j},\nonumber \\ \end{aligned}$$
(10)

where the first four polynomials \(R_j(x)\) are

$$\begin{aligned} R_1(x)= & {} -\frac{1}{2}~x(x-1),~~\\ R_2(x)= & {} \frac{1}{4!}~x(x-1)(x-2)(3x-1),~~\\ R_3(x)= & {} -\frac{1}{2 \cdot 4!}~ x^2(x-1)^2(x-2)(x-3),\\ R_4(x)= & {} \frac{1}{8 \cdot 6!}x(x-1)(x-2)(x-3)(x-4)\\&(15x^3-30x^2+5x+2). \end{aligned}$$

Combining (7)–(10), we obtain the following expansion for the ratio \(b_x/p_x\):

$$\begin{aligned} \frac{b_x}{p_x}= & {} \left[ 1+ \sum _{j=1}^\infty Q_j(\lambda )N^{-j} \right] \left[ 1+\sum _{j=1}^\infty R_j(x)N^{-j} \right] \nonumber \\&\left[ 1+ \sum _{j=1}^\infty T_j(\lambda ,x)N^{-j} \right] \, . \end{aligned}$$
(11)

All the series converge if \(\lambda x /N \rightarrow 0\) as \(N \rightarrow \infty \). If \(\lambda x /N \) does not tend to 0 as \(N \rightarrow \infty \), but x/N and \(\lambda /\sqrt{N} \) do, then we recommend using the approximation

$$\begin{aligned} \frac{b_x}{p_x}= & {} \left[ 1+ \sum _{j=1}^\infty Q_j(\lambda )N^{-j} \right] \left[ 1+\sum _{j=1}^\infty R_j(x)N^{-j} \right] \nonumber \\&\left( 1-\frac{\lambda }{N} \right) ^{-x} \, , \end{aligned}$$
(12)

where the term T of (7) is not expanded. Numerical results show that, for x in the left tail of the binomial distribution, the approximations (11) and (12) practically coincide if we keep enough terms in the expansion for T.
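
A possible implementation of the approximation (12), with the four polynomials \(Q_j\) and \(R_j\) listed above, is sketched below (our own illustrative Python code; binom_pmf_approx12 is a name we introduce here, and more terms of (9) and (10) would be needed for still higher accuracy).

from scipy.stats import binom, poisson

def Q(lam):
    # Q_1,...,Q_4 of the expansion (9)
    return [-lam ** 2 / 2,
            lam ** 3 * (3 * lam - 8) / 24,
            -lam ** 4 * (lam - 2) * (lam - 6) / 48,
            lam ** 5 * (15 * lam ** 3 - 240 * lam ** 2 + 1040 * lam - 1152) / 5760]

def R(x):
    # R_1,...,R_4 of the expansion (10)
    return [-x * (x - 1) / 2,
            x * (x - 1) * (x - 2) * (3 * x - 1) / 24,
            -x ** 2 * (x - 1) ** 2 * (x - 2) * (x - 3) / 48,
            x * (x - 1) * (x - 2) * (x - 3) * (x - 4)
            * (15 * x ** 3 - 30 * x ** 2 + 5 * x + 2) / 5760]

def binom_pmf_approx12(x, N, p):
    # Approximation (12): truncated series for Q and R, exact factor T of (7)
    lam = N * p
    q_factor = 1 + sum(q / N ** (j + 1) for j, q in enumerate(Q(lam)))
    r_factor = 1 + sum(r / N ** (j + 1) for j, r in enumerate(R(x)))
    t_factor = (1 - lam / N) ** (-x)
    return poisson.pmf(x, lam) * q_factor * r_factor * t_factor

N, p = 10_000, 0.008                   # lambda = 80, as in Fig. 4
for x in (30, 50, 80):
    print(x, binom.pmf(x, N, p), binom_pmf_approx12(x, N, p))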

Let us rewrite the result (11) in terms of the binomial probabilities of Bin\((\textcircled {1},\lambda /\textcircled {1})\) keeping only two main terms in the expansion:

$$\begin{aligned} \frac{b_x}{p_x}= & {} 1- \frac{(\lambda -x)^2 -x}{2\textcircled {1}} \nonumber \\&+ \frac{3(\lambda -x)^4 +9x^2 -2( 4\lambda ^3-3\lambda ^2 x-6\lambda x^2+5 x^3+x) }{24 \textcircled {1}^2} \nonumber \\&+ O\left( \frac{1}{\textcircled {1}^3}\right) \, .\;\; \end{aligned}$$
(13)

This formula makes sense in the grossone universe. Indeed, both \(\lambda \) and x may be infinitely large, but \((\lambda +x)^2/\textcircled {1} \) should be kept infinitesimal, since otherwise the correction terms in the rhs of (13) are no longer small. In particular, the expansion (13) is valid if \(\lambda =c_1 \textcircled {1}^\alpha \) and \(x=c_2 \textcircled {1}^\beta \), where \(c_1\) and \(c_2\) are finite constants and both \(\alpha \) and \(\beta \) are smaller than 1/2.
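
Using a large finite N as a stand-in for \(\textcircled {1}\), the two correction terms displayed in (13) can be checked against the exact ratio \(b_x/p_x\); the small sketch below is our own verification and is not part of the original derivation.

from scipy.stats import binom, poisson

def ratio_two_terms(x, lam, N):
    # The two correction terms displayed in (13), with the finite N playing
    # the role of grossone
    first = -((lam - x) ** 2 - x) / (2 * N)
    second = (3 * (lam - x) ** 4 + 9 * x ** 2
              - 2 * (4 * lam ** 3 - 3 * lam ** 2 * x - 6 * lam * x ** 2
                     + 5 * x ** 3 + x)) / (24 * N ** 2)
    return 1 + first + second

N, lam = 10 ** 6, 50.0
for x in (30, 50, 70):
    exact = binom.pmf(x, N, lam / N) / poisson.pmf(x, lam)
    print(x, exact, ratio_two_terms(x, lam, N))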

4 Numerical study

In Figs. 1, 2 and 3, we demonstrate the accuracy of several approximations for \(b_x\) in (1) by plotting the ratio \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is an approximation for \(b_x\). We have chosen the following coding for the x axis: “0” is the mean \(\lambda =Np\) of the distribution Bin(N,p) and \(j= \pm 1, \pm 2, \ldots \) denote the points \(\lambda +js\), where \(s=\sqrt{Np(1-p)}\) is the standard deviation of Bin(N,p).
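
The grid of points \(\lambda +js\) used on the x axis can be generated as in the following sketch (a small helper we add for reproducibility; ratio_on_sigma_grid is our own name and approx_pmf stands for any of the approximations considered).

import numpy as np
from scipy.stats import binom, poisson

def ratio_on_sigma_grid(approx_pmf, N, p, j_range=range(-6, 7)):
    # Evaluate approx_pmf / exact binomial pmf at the points lambda + j*s
    lam, s = N * p, np.sqrt(N * p * (1 - p))
    xs = sorted({int(round(lam + j * s)) for j in j_range if 0 <= lam + j * s <= N})
    return [(x, approx_pmf(x, N, p) / binom.pmf(x, N, p)) for x in xs]

# Example: the plain (uncorrected) Poisson approximation, N = 1000, p = 0.03
for x, r in ratio_on_sigma_grid(lambda x, N, p: poisson.pmf(x, N * p), 1000, 0.03):
    print(x, round(r, 3))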

Fig. 1

Ratios \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is computed using the approximations: normal (black solid), Poisson (blue dashed), first-order Charlier (red dotted) and second-order Charlier (green dash-dotted); \(N=1000\), \(p=0.03\) (color figure online)

Fig. 2

Ratios \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is computed using the approximations: normal (black solid), Poisson (blue dashed), first-order Charlier (red dotted) and second-order Charlier (green dash-dotted); \(N=10^4\), \(p=0.03\) (color figure online)

Fig. 3

Ratios \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is computed using the approximations: normal (black solid), Poisson (blue dashed), first-order Charlier (red dotted) and second-order Charlier (green dash-dotted); \(N=10^5\), \(p=0.03\) (color figure online)

In Figs. 1, 2 and 3, we have chosen \(p=0.03\) and \(N=10^3,10^4,10^5\), so that \(\lambda =30, 300\) and 3000. The normal approximation (depicted by the black solid line) is always very poor. Due to the incorrect skewness, the normal approximation considerably underestimates the probabilities \(b_x\) for \(x<\lambda \) and overestimates them for \(x>\lambda \). We have tried several improved normal approximations, in particular ones based on the Edgeworth expansion, but these expansions were not much better and in many cases some of the estimators \({\tilde{b}}_x\) became negative. The uncorrected Poisson approximation (blue dashed line) is slightly more accurate than the normal one, but it is still quite poor. The corrected Poisson approximations (6), based on the expansion with respect to the Charlier polynomials, significantly improve the Poisson approximation. The first-order Charlier approximation (red dotted line), where we keep only the term \({\tilde{c}}_2(x;\lambda )/{(2N)} \) in (6), works well for x within the \(3\sigma \)-interval; the second-order Charlier approximation is very accurate for x within the \(4\sigma \)-interval.

The style of Figs. 4, 5 and 6 is similar to that of Figs. 1, 2 and 3. In these figures, we again demonstrate the accuracy of several approximations \({\tilde{b}}_x\) for the binomial probabilities \(b_x\) by plotting the ratio \({\tilde{b}}_x/b_x\). As in Figs. 1, 2 and 3, on the x-axis \(j= 0,\pm 1, \pm 2, \ldots \) denote the points \(\lambda +js\), where \(s=\sqrt{Np(1-p)}\). In Figs. 4, 5 and 6, the smallest value on the x-axis corresponds to \(x=0 \) in (1), so that we show the values of the ratios \({\tilde{b}}_x/b_x\) for \(x=0,1, \ldots , \lambda +6 \sqrt{Np(1-p)}\).

Fig. 4

Ratios \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is computed using the approximations: (12) (brown dashed), first-order Charlier (red dotted), second-order Charlier (green dash-dotted), third-order Charlier (magenta long dashed); \(N=10^4\), \(p=0.008\) (color figure online)

Fig. 5

Ratios \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is computed using the approximations: (12) (brown dashed), first-order Charlier (red dotted), second-order Charlier (green dash-dotted), third-order Charlier (magenta long dashed); \(N=10^5\), \(p=0.0015\) (color figure online)

Fig. 6

Ratios \({\tilde{b}}_x/b_x\), where \({\tilde{b}}_x\) is computed using the approximations: (12) (brown dashed), first-order Charlier (red dotted), second-order Charlier (green dash-dotted), third-order Charlier (magenta long dashed); \(N=10^6\), \(p=0.0002\) (color figure online)

In Figs. 4, 5 and 6, we do not use uncorrected normal and Poisson approximations as these approximations are poor. We use three approximations based on the expansion (6): the first-order Charlier approximation (red dotted line), second-order Charlier (green dash-dotted) and third-order Charlier (magenta long dashed). We also use the expansion (12) (brown dashed line), where we keep the first four terms in the expansions for Q and R.

The Charlier-corrected Poisson approximations (6) again improve substantially on the plain Poisson approximation. The first-order Charlier approximation works well for x within the \(3\sigma \)-interval, while the second-order and especially the third-order Charlier approximations are much more accurate. The new approximation (12) (which, at the lower tail, practically coincides with (11)) is essentially exact at the lower tail of the binomial distribution and outperforms the Charlier approximations.

5 Conclusion

We study properties of two probability distributions defined on the infinite set \(\{0,1,2, \ldots \}\) and generalizing the ordinary discrete uniform and binomial distributions. Both extensions use the notion of grossone, denoted by \(\textcircled {1}\). The uniform distribution assigns mass \(1/\textcircled {1}\) to each point of the set \( \{0,1, \ldots ,\textcircled {1}-1\}\). We study the problem of decomposing a r.v. \(\xi \) with this distribution into a sum \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\), where \(\xi _1 , \ldots , \xi _m\) are independent non-degenerate r.v. We establish that, under the grossone divisibility axiom, such decompositions exist, but all r.v. \(\xi _j\) in the decomposition \(\xi {\mathop {=}\limits ^\mathrm{d}} \xi _1 + \ldots + \xi _m\) must have different distributions and, as a corollary, that the discrete uniform distribution on the set \( \{0,1, \ldots ,\textcircled {1}-1\}\) is not infinitely divisible, where the natural extension of the notion of infinite divisibility (introduced in Sect. 2.1) is used.

Then, we study the accuracy of different approximations for the probability mass function of the binomial distribution Bin\((\textcircled {1},p)\) with \(p=c/\textcircled {1}^{\alpha }\), where \(c>0\) is finite and \(1/2<\alpha \le 1\). We demonstrate that the normal and uncorrected Poisson approximations are rather poor and develop a new approximation which is shown to be extremely accurate on the lower tail of Bin\((\textcircled {1},p)\). We compare the accuracy of the developed approximation with the corrected Poisson approximations constructed from the expansion with respect to the Charlier polynomials. The accuracy of the approximations is assessed on the basis of a numerical study. To derive the approximations, we use asymptotic expansions formulated in the standard language, but we translate the final results into the language of grossone.