1 Introduction

The classical occupancy distribution is an important and underappreciated discrete probability distribution, which describes the behaviour of the number of occupied bins when we allocate \(n\in {\mathbb{N}}\) balls at random to \(m\in {\mathbb{N}}\) bins. Analysis of the distribution can be found in Harkness (1969), Uppuluri and Carpenter (1971), Johnson and Kotz (1977), Kolchin et al. (1978) and Holst (1986), and some further analysis and computational aspects are discussed in O’Neill (2021). The distribution is useful in problems involving sampling with replacement, and it is especially useful in the context of bootstrapping techniques, where it can be applied to find the probability of any given level of coverage of the original sample in a random resampling. The classical distribution covers the case where balls allocated to bins automatically “occupy” those bins. For reasons that will become clear later, it turns out to be very useful to generalise to the case where each ball has some fixed probability \(0\le \theta \le 1\) of “occupying” its allocated bin, and corresponding probability \(1-\theta\) of “falling through” the bin, so that it does not occupy the bin (see e.g., Uppuluri and Carpenter 1971; Samuel-Cahn 1974).

In this paper we will examine the extended occupancy problem, which seeks the marginal and conditional distributions of the “occupancy number” (counting the number of occupied bins) under conditions where the balls can fall through the bins with a fixed probability. We will derive three important distributional forms arising in this framework, and we will see how these distributions relate to one another. The first two distributions we examine are generalisations of the binomial and negative binomial distributions, and the third is a new distribution relating the extended occupancy distribution to the binomial in a useful way. All three distributions involve the noncentral Stirling numbers of the second kind, and are mathematically interesting forms that arise by normalising the terms of well-known summation formulae involving the Stirling numbers.

We examine the extended occupancy process by framing the sequence of occupancy numbers as a Markov chain, and analysing the transition matrix of the chain. To set up our analysis, we first describe the underlying mathematics of this stochastic process, based on a sequence of randomly allocated balls that can occupy or fall through the bins. Consider two independent sequences of random variables:

$$\begin{aligned}&{\widetilde{U}}_{1},{\widetilde{U}}_{2},{\widetilde{U}}_{3},\dots \sim \mathrm{IID\, Unif\, }\left\{1,\dots ,m\right\}\text{,}\\& {Q}_{1},{Q}_{2},{Q}_{3},\dots \sim \mathrm{IID\, Bern}\left(\theta \right).\end{aligned}$$

The first sequence represents balls allocated at random to \(m\) bins and the second sequence gives indicators of whether these balls “occupy” those bins (as opposed to “falling through” the bins). From these underlying sequences we define the occupancy of each ball by the valuesFootnote 1:

$${U}_{i}=\left\{ \begin{array}{cc} \bullet& \text{if }{Q}_{i}=0\text{,}\\ {\widetilde{U}}_{i} & \text{if }{Q}_{i}=1.\end{array}\right.$$

The outcome \({U}_{i}= \bullet\) means that the ball fell through its bin and so it makes no contribution to the occupancy, whereas an outcome \({U}_{i}=1,\dots ,m\) means that the ball occupies its allocated bin.Footnote 2 For any number of balls \(n\) we define the occupied bin counts over bins \(\ell=1,\dots ,m\) and the corresponding occupancy number respectively by:

$${N}_{n,\ell}\equiv \sum\limits_{i=1}^{n}{\mathbb{I}}\left({U}_{i}=\ell\right) \qquad\qquad {K}_{n}\equiv \sum\limits_{\ell=1}^{m}{\mathbb{I}}\left({N}_{n,\ell}>0\right).$$

We have \(n\) balls in our problem, but there are \({n}_{\mathrm{eff}}\equiv n-{N}_{n,\bullet}=\sum_{i=1}^{n}{\mathbb{I}}\left({U}_{i}\ne \bullet\right)\) effective balls (i.e., balls that occupy their allocated bins). The occupancy number counts the number of occupied bins, i.e., the bins whose counts are above zero.
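The process is straightforward to simulate. The sketch below (in Python, with illustrative parameter values) draws the underlying sequences \({\widetilde{U}}_{i}\) and \({Q}_{i}\) and computes the bin counts and occupancy number as defined above.

```python
import random

def simulate_occupancy(n, m, theta, seed=0):
    """One realisation of the extended occupancy process: returns the
    bin counts (N_{n,1}, ..., N_{n,m}) and the occupancy number K_n."""
    rng = random.Random(seed)
    counts = [0] * m
    for _ in range(n):
        u = rng.randrange(m)        # ball allocated uniformly to one of m bins
        if rng.random() < theta:    # Q_i = 1: the ball occupies its bin;
            counts[u] += 1          # Q_i = 0: it "falls through" (no count)
    k_n = sum(c > 0 for c in counts)
    return counts, k_n

counts, k_n = simulate_occupancy(n=10, m=12, theta=0.8)
```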

The occupancy process is illustrated in Fig. 1 below, where we show a tabular arrangement of balls randomly allocated to bins. We show outcomes of \(n=10\) balls randomly allocated to \(m=12\) bins. Yellow squares show balls that fell through their bins and black squares show balls that occupy their bins. The bottom row of the figure shows the bin counts \({N}_{n,\ell}\), which add up the number of black squares in the columns above. The effective number of balls \({n}_{\mathrm{eff}}\) is obtained by counting the number of black squares in the whole figure, and the occupancy number \({K}_{n}\) is obtained by counting the number of bins with at least one occupying ball (i.e., the number of columns with at least one black square).

Fig. 1 Outcomes of \(n=10\) balls randomly allocated to \(m=12\) bins. Yellow squares show balls that fell through their bins and black squares show balls that occupy their bins. Counts for each bin are shown in the bottom row. There are \({n}_{\mathrm{eff}}=8\) effective balls (black squares) in this case and the occupancy number is \({K}_{n}=6\) (number of columns with at least one black square)

The above figure can be extended to accommodate more balls and/or bins, and the probability of a ball falling through its allocated bin can also be varied. In any case, we have now set out the mathematical foundation of the extended occupancy process, so we are in a position to state the extended occupancy problem, which seeks the marginal and conditional distributions of the occupancy number. Although we will ultimately be interested in three distributional forms arising in the extended occupancy process, our first task will be to derive a distributional form that solves this problem and to examine its properties.

Definition (The Extended Occupancy Problem)

For \(0\le t\le k\le \mathrm{min}\left(n,m\right)\) we wish to find distributional forms for the marginal and conditional probabilities:

$${\mathbb{P}}\left({K}_{n}=k\right) \qquad\qquad {\mathbb{P}}\left({K}_{\acute{n}+n}=k|{K}_{\acute{n}}=t\right).$$

This problem is an extension of the classical occupancy problem, which occurs when \(\theta =1\) (i.e., all balls occupy their allocated bins with probability one). □

2 The Occupancy Process and the Extended Occupancy Distribution

Our approach to the occupancy problem is to look at the stochastic process \(\left\{{K}_{n}|n=\mathrm{0,1},2,\dots \right\}\), which shows the evolution of the occupancy number as we add more balls to the process. This approach is also used in Harkness (1969) and in Uppuluri and Carpenter (1971). Each time we allocate one new ball, this ball will either occupy a bin that is not already occupied (increasing the occupancy number by one) or it will fall through its allocated bin or be allocated to a bin that is already occupied (leaving the occupancy number unchanged). Since we are assuming that balls are allocated to bins by a uniform distribution, the probability of these two outcomes is conditional on the present occupancy number, but it is not affected by which bins are occupied. In this case, the conditional probability that a newly allocated ball increases the occupancy number, given the “history” of the process, depends only on the present occupancy number, so the chain obeys the Markov property. We formalise this argument, and give the resulting transition probability and transition probability matrix, in the following theorem.

Theorem 1 (Markov Characterisation)

Let \({{\varvec{u}}}_{n}\equiv \left({u}_{1},{u}_{2},\dots ,{u}_{n}\right)\) denote the outcomes of the first \(n\) balls in the series and let \({K}_{n}=t\) denote their occupancy number. Then the conditional probability for the occupancy number with one more allocated ball is:

$${\mathbb{P}}\left({K}_{n+1}=r+t|{{\varvec{u}}}_{n}\right)={\mathbb{P}}\left({K}_{n+1}=r+t|{K}_{n}=t\right)=\left\{ \begin{array}{ccc}1-\theta \left(1-t/m\right)& & \text{for }r=0\text{,}\\ \theta \left(1-t/m\right)& & \text{for }r=1\text{,}\\ 0& & {\text{otherwise}}.\end{array}\right.$$

This conditional probability depends on the allocation history only through \({K}_{n}\), so the process satisfies the Markov property —i.e., it is a Markov chain.

Corollary (Transition Probability Matrix)

For all the possible states \(k=\mathrm{0,1},2,\dots ,m\) the transition probability matrix for the Markov chain is the \(\left(m+1\right)\times \left(m+1\right)\) bidiagonal matrix:

$$\mathbf{P}\equiv \frac{1}{m}\left[ \begin{array}{cccccc}m-m\theta & m\theta & 0& \cdots & 0& 0\\ 0& m-\left(m-1\right)\theta & \left(m-1\right)\theta & \cdots & 0& 0\\ 0& 0& m-\left(m-2\right)\theta & \cdots & 0& 0\\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0& 0& 0& \cdots & m-\theta & \theta \\ 0& 0& 0& \cdots & 0& m\end{array} \right].$$

(In accordance with standard conventions for Markov chains, we will index the elements of this matrix by the state values, so the first row and column will each use the zero index; i.e., the row and column indices will both run over \(i,j=\mathrm{0,1},\dots ,m\).)

Theorem 1 shows that the stochastic process \(\left\{{K}_{n}|n=\mathrm{0,1},2,\dots \right\}\) for the occupancy number is a Markov chain with a bidiagonal transition matrix, giving us a discrete “pure birth” process. (In the nomenclature of pure birth processes, an increase in the occupancy number is a “birth”.) The intuition behind this transition matrix is straightforward: if the existing occupancy number is \(t\) then the newly allocated ball increases the occupancy number by one so long as it is allocated to an unoccupied bin, and does not “fall through” that bin; the probability of allocation to an unoccupied bin is \(\left(1-t/m\right)\) and the probability that the ball does not fall through the bin is \(\theta\). If the newly allocated ball falls through its allocated bin, or is allocated to a bin that is already occupied, the occupancy number does not increase.

The extended occupancy problem seeks both the marginal and conditional probabilities for the occupancy number. In the marginal problem our starting point for the chain is \({K}_{0}=0\), and in the conditional problem our starting point is \({K}_{\acute{n}}=t\). In either case, the required probabilities are easily obtained as appropriate elements of the powers of the transition matrix. Specifically, for all \(k,t=\mathrm{0,1},2,\dots ,m\) and all \(\acute{n},n=\mathrm{0,1},2,\dots\) we have:

$${\mathbb{P}}\left({K}_{n}=k\right)={\left[{\mathbf{P}}^{n}\right]}_{0,k} \qquad\qquad {\mathbb{P}}\left({K}_{\acute{n}+n}=k|{K}_{\acute{n}}=t\right)={\left[{\mathbf{P}}^{n}\right]}_{t,k}.$$

Obtaining these probabilities requires us to take arbitrarily large powers of the transition matrix \(\mathbf{P}\), so it is useful to examine its spectral decomposition. It turns out that the transition matrix has eigenvectors that do not depend on the probability parameter \(\theta\), and this makes our matrix characterisation especially useful, leading to a simple spectral form for the distribution.
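For moderate \(m\) these matrix-power formulas can also be evaluated numerically; a minimal sketch (in Python with numpy, using illustrative parameter values):

```python
import numpy as np

def transition_matrix(m, theta):
    """The (m+1)x(m+1) bidiagonal transition matrix P, indexed by states 0..m."""
    P = np.zeros((m + 1, m + 1))
    for t in range(m + 1):
        p_birth = theta * (1 - t / m)   # new ball occupies a previously empty bin
        P[t, t] = 1 - p_birth
        if t < m:
            P[t, t + 1] = p_birth
    return P

# marginal distribution of K_n for n = 10 balls, m = 12 bins, theta = 0.8
Pn = np.linalg.matrix_power(transition_matrix(12, 0.8), 10)
marginal = Pn[0]        # row t = 0 gives P(K_n = k) for k = 0, ..., m
```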

Theorem 2 (Spectral Decomposition)

The probability matrix \(\mathbf{P}\) has eigenvalue matrix:

$${\Lambda}\equiv \left[ \begin{array}{ccccc}{\lambda }_{0}& 0& 0& \cdots & 0\\ 0& {\lambda }_{1}& 0& \cdots & 0\\ 0& 0& {\lambda }_{2}& \cdots & 0\\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0& 0& 0& \cdots & {\lambda }_{m}\end{array} \right] \qquad \qquad {\lambda }_{i}=1-\left(1-\frac{i}{m}\right)\cdot \theta .$$

Its (unscaled) eigenvector matrix \(\mathbf{v}\) and inverse eigenvector matrix \(\mathbf{w}={\mathbf{v}}^{-1}\) have elements given respectively by:

$${\left(\mathbf{v}\right)}_{i,j}={v}_{i,j}=\left(\begin{array}{c}m-i\\ j-i\end{array}\right) \qquad \qquad {\left(\mathbf{w}\right)}_{i,j}={w}_{i,j}={\left(-1\right)}^{i-j}\left(\begin{array}{c}m-i\\ j-i\end{array}\right).$$

(Again, we remind the reader that we use indexing where the first row and column each use the zero index; this applies also to the indices for the eigenvalue and inverse eigenvalue matrices.) The columns of the eigenvector matrix (and its inverse matrix) are linearly independent, so the transition matrix is diagonalisable, with spectral decomposition \(\mathbf{P}=\mathbf{v}{\varvec{\Lambda}}\mathbf{w}\).
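The decomposition in Theorem 2 is easy to verify numerically for small \(m\). In the sketch below (Python, with illustrative values of \(m\) and \(\theta\)) we build \(\mathbf{v}\), \(\mathbf{w}\) and \({\varvec{\Lambda}}\) from their stated elements and confirm that \(\mathbf{w}={\mathbf{v}}^{-1}\) and \(\mathbf{P}=\mathbf{v}{\varvec{\Lambda}}\mathbf{w}\).

```python
import numpy as np
from math import comb

m, theta = 6, 0.7    # illustrative values

# eigenvector matrix v, inverse w, and eigenvalue matrix Lambda (Theorem 2);
# (-1)**(j - i) has the same parity as (-1)**(i - j) in the theorem
v = np.array([[comb(m - i, j - i) if j >= i else 0
               for j in range(m + 1)] for i in range(m + 1)], dtype=float)
w = np.array([[(-1) ** (j - i) * comb(m - i, j - i) if j >= i else 0
               for j in range(m + 1)] for i in range(m + 1)], dtype=float)
lam = np.diag([1 - (1 - i / m) * theta for i in range(m + 1)])

# bidiagonal transition matrix P from the earlier corollary
P = np.zeros((m + 1, m + 1))
for t in range(m + 1):
    P[t, t] = 1 - theta * (1 - t / m)
    if t < m:
        P[t, t + 1] = theta * (1 - t / m)
```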

Theorem 2 gives the eigenvalue and eigenvector matrices of the transition matrix, which allows us to take arbitrarily large powers of this matrix using its spectral decomposition \({\mathbf{P}}^{n}=\mathbf{v}{{\varvec{\Lambda}}}^{n}\mathbf{w}\). Uppuluri and Carpenter (1971) derive this same spectral decomposition by way of the general form for the spectral decomposition of a bidiagonal matrix (i.e., a general “pure-birth” process). To apply the spectral decomposition, we let \({\mathbf{v}}_{k}\) denote the \(k\)th row of the eigenvector matrix and we let \({\mathbf{w}}_{k}\) denote the \(k\)th column of the inverse eigenvector matrix. We then have:

$$\begin{aligned}\mathbb{P}\left({K}_{\acute{n}+n}=k|{K}_{\acute{n}}=t\right)={\left[{\mathbf{P}}^{n}\right]}_{t,k}&={\left(\mathbf{v}{{\boldsymbol{\Lambda}}}^{n}\mathbf{w}\right)}_{t,k}={\mathbf{v}}_{t}{{\boldsymbol{\Lambda}}}^{n}{\mathbf{w}}_{k}\\&=\sum_{i=0}^{m}{\lambda }_{i}^{n}\cdot {v}_{t,i}\cdot {w}_{i,k}\\&=\sum_{i=0}^{m}{\left(1-\theta \cdot \frac{m-i}{m}\right)}^{n}\cdot {\left(-1\right)}^{i-k}\left(\begin{array}{c}m-t\\ i-t\end{array}\right)\cdot \left(\begin{array}{c}m-i\\ k-i\end{array}\right)\\&=\sum_{i=0}^{m}{\left(1-\theta \cdot \frac{m-i}{m}\right)}^{n}\cdot {\left(-1\right)}^{i-k}\left(\begin{array}{c}m\\ k\end{array}\right)\cdot \left(\begin{array}{c}k\\ i\end{array}\right)\cdot \frac{{\left(i\right)}_{t}}{{\left(m\right)}_{t}}\\&=\left(\begin{array}{c}m\\ k\end{array}\right)\sum_{i=t}^{k}\left(\begin{array}{c}k\\ i\end{array}\right){\left(-1\right)}^{i-k}\cdot {\left(1-\theta \cdot \frac{m-i}{m}\right)}^{n}\cdot \frac{{\left(i\right)}_{t}}{{\left(m\right)}_{t}}.\end{aligned}$$

(Here we use the “falling factorials” \({\left(m\right)}_{t}=\prod_{i=0}^{t-1}\left(m-i\right)\) to expand binomial coefficients.) This gives a general form for the conditional probabilities in the extended occupancy problem. Taking \(t=0\) gives the marginal form:

$$\begin {aligned}{\mathbb{P}}\left({K}_{n}=k\right)={\left[{\mathbf{P}}^{n}\right]}_{0,k}&=\left(\begin{array}{c}m\\ k\end{array}\right)\sum_{i=0}^{k}\left(\begin{array}{c}k\\ i\end{array}\right){\left(-1\right)}^{i-k}\cdot {\left(1-\theta \cdot \frac{m-i}{m}\right)}^{n}\\&=\frac{{\theta }^{n}}{{m}^{n}}\cdot {\left(m\right)}_{k}\cdot S\left(n,k,m\cdot \frac{1-\theta }{\theta }\right)\text{,}\end {aligned}$$

where \(S\left(n,k,\phi \right)\) are the noncentral Stirling numbers of the second kind (see Appendix 1). This gives us a succinct form for the marginal probabilities arising in the occupancy problem.
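The alternating-sum form above translates directly into code; a small sketch (Python, with illustrative parameter values), noting that the alternating sum is exact in rational arithmetic but can suffer cancellation in floating point for large \(n\) and \(m\):

```python
from math import comb, perm

def occ_pmf(k, n, m, theta):
    """P(K_n = k) = C(m,k) * sum_i C(k,i) (-1)^(i-k) (1 - theta(m-i)/m)^n."""
    return comb(m, k) * sum(comb(k, i) * (-1) ** (k - i)
                            * (1 - theta * (m - i) / m) ** n
                            for i in range(k + 1))

# marginal distribution for n = 10 balls, m = 12 bins, theta = 0.8
dist = [occ_pmf(k, 10, 12, 0.8) for k in range(13)]
```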

In the above working, we offered the conditional form of the extended occupancy distribution as our most general form, with the marginal form occurring for \(t=0\). However, it is possible to rewrite the conditional form of the extended occupancy distribution using the marginal form with a corresponding variation in the number of bins and the probability of “falling through” a bin. To see this, we first note that —with a bit of algebra— it can be shown that:

$$\left(\begin{array}{c}m\\ k\end{array}\right)\left(\begin{array}{c}k\\ i\end{array}\right)\cdot \frac{{\left(i\right)}_{t}}{{\left(m\right)}_{t}}=\left(\begin{array}{c}m-t\\ k-t\end{array}\right)\left(\begin{array}{c}k-t\\ i-t\end{array}\right).$$

Hence, an alternative form for the conditional occupancy probability is:

$$\begin{aligned}{\mathbb{P}}\left({K}_{\acute{n}+n}=k|{K}_{\acute{n}}=t\right)&=\left(\begin{array}{c}m\\ k\end{array}\right)\sum_{i=t}^{k}\left(\begin{array}{c}k\\ i\end{array}\right){\left(-1\right)}^{i-k}\cdot {\left(1-\theta \cdot \frac{m-i}{m}\right)}^{n}\cdot \frac{{\left(i\right)}_{t}}{{\left(m\right)}_{t}}\\&=\left(\begin{array}{c}m-t\\ k-t\end{array}\right)\sum_{i=t}^{k}\left(\begin{array}{c}k-t\\ i-t\end{array}\right){\left(-1\right)}^{i-k}\cdot {\left(1-\theta \cdot \frac{m-i}{m}\right)}^{n}\\&=\left(\begin{array}{c}m-t\\ k-t\end{array}\right)\sum_{i=0}^{k-t}\left(\begin{array}{c}k-t\\ i\end{array}\right){\left(-1\right)}^{i-\left(k-t\right)}\cdot {\left(1-\theta \cdot \frac{m-t-i}{m}\right)}^{n}\\&=\left(\begin{array}{c}m-t\\ k-t\end{array}\right)\sum_{i=0}^{k-t}\left(\begin{array}{c}k-t\\ i\end{array}\right){\left(-1\right)}^{i-\left(k-t\right)}\cdot {\left(1-\theta \left(1-\frac{t}{m}\right)\cdot \frac{m-t-i}{m-t}\right)}^{n}\\&=\frac{{\left(\theta \left(1-t/m\right)\right)}^{n}}{{\left(m-t\right)}^{n}}\cdot {\left(m-t\right)}_{k-t}\cdot S\left(n,k-t,\left(m-t\right)\cdot \frac{1-\theta \left(1-t/m\right)}{\theta \left(1-t/m\right)}\right).\end{aligned}$$

This is the same form as the marginal occupancy probability given above, but with an argument value of \(k-t\) occupied bins out of \(m-t\) bins, and with probability parameter \(\theta \left(1-t/m\right)\).

This result also follows simple intuition. To convert the conditional occupancy probability to a marginal occupancy probability, we can treat the problem as a marginal occupancy problem where we allocate all the new balls to the \(m-t\) unoccupied bins, but we also reduce the probability parameter so that a new ball is considered to “fall through” its bin if it would have been allocated to one of the \(t\) bins already occupied by previous balls. Under this construction, the conditional probability of allocating a new ball to an already occupied bin is “folded into” the probability parameter, allowing us to write the conditional occupancy probability as a marginal occupancy probability. This result shows that both the marginal and conditional occupancy probabilities can be written in the same distributional form, which can be used to solve the extended occupancy problem. Below, we formalise this and look at three distributions that arise in our analysis.

Our above analysis gives solutions to the extended occupancy problem. We can make these solutions clearer and more succinct by introducing a class of distributions for the extended occupancy distribution, and naming its parameters.

Definition (The Extended Occupancy Distribution)

This is a discrete distribution with probability mass function given byFootnote 3:

$$\mathrm{Occ}\left(k|n,m,\theta \right)=\frac{{\theta }^{n}}{{m}^{n}}\cdot {\left(m\right)}_{k}\cdot S\left(n,k,m\cdot \frac{1-\theta }{\theta }\right) \qquad\qquad 0\le k\le \mathrm{min}\left(n,m\right)\text{,}$$

where \(m\in \overline{\mathbb{N} }\) is the space parameter (number of bins),Footnote 4\(n\in {\mathbb{N}}\) is the size parameter (number of balls), and \(0<\theta \le 1\) is the probability parameter.Footnote 5 In the special case where \(\theta =1\) the distribution reduces to the classical occupancy distribution \(\mathrm{Occ}\left(k|n,m\right)\equiv \mathrm{Occ}\left(k|n,m,1\right)\). □

Our analysis shows that both the marginal and conditional distributions arising in the extended occupancy problem take the same distributional form (but with different parameters). Using the notation introduced in our definition of the extended occupancy distribution, the marginal and conditional probabilities of interest are:

$$\begin{aligned}{\mathbb{P}}\left({K}_{n}=k\right)&=\mathrm{Occ}\left(k|n,m,\theta \right)\text{,} \, \\ \\ {\mathbb{P}}\left({K}_{\acute{n}+n}=k|{K}_{\acute{n}}=t\right)&=\mathrm{Occ}\left(k-t|n,m-t,\theta \left(1-t/m\right)\right).\end{aligned}$$

The special case where \(\theta =1\) leads to the classical occupancy distribution for the marginal distribution, and in this case the distribution can be derived by a combinatorial argument (see e.g., O’Neill 2021). The main value of extending the classical occupancy distribution to the extended occupancy distribution is that the latter is “closed under conditioning”, by which we mean that this family accommodates both the marginal and conditional distributions of the occupancy number. The extended occupancy distribution has been examined by a number of authors including Park (1972), Johnson and Kotz (1977, Section 3.3, pp. 139–146), Samuel-Cahn (1974) and Holst (1986). Broader extension to general occupancy problems and corresponding distributions can be found in Charalambides (2005).
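The closure-under-conditioning property can be checked numerically by comparing the matrix-power probabilities with the re-parameterised marginal form; a sketch with illustrative values:

```python
import numpy as np
from math import comb

def occ_pmf(k, n, m, theta):
    # Occ(k|n,m,theta) via the finite alternating sum derived earlier
    return comb(m, k) * sum(comb(k, i) * (-1) ** (k - i)
                            * (1 - theta * (m - i) / m) ** n for i in range(k + 1))

m, theta, n, t = 12, 0.8, 7, 3      # illustrative values; condition on K = t = 3
P = np.zeros((m + 1, m + 1))
for s in range(m + 1):
    P[s, s] = 1 - theta * (1 - s / m)
    if s < m:
        P[s, s + 1] = theta * (1 - s / m)
Pn = np.linalg.matrix_power(P, n)

# conditional distribution from the chain vs. the re-parameterised Occ form
chain = [Pn[t, k] for k in range(t, m + 1)]
closed = [occ_pmf(k - t, n, m - t, theta * (1 - t / m)) for k in range(t, m + 1)]
```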

Remark 1

Mathematically, the mass function of the extended occupancy distribution arises from the expansion \({\left(m+\phi \right)}^{n}=\sum_{k=0}^{n}{\left(m\right)}_{k}\cdot S\left(n,k,\phi \right)\) for the non-central Stirling numbers of the second kind (see Appendix 1). Each of the terms in this sum is non-negative, and those terms constitute a kernel for the mass function of the extended occupancy distribution. □

The occupancy distribution provides a natural extension to the binomial distribution, insofar as it takes the count of the “effective balls” and “squashes” this number down to the occupancy number by counting only those effective balls that are not duplicating the occupancy of a bin. In the next section we will see that the occupancy distribution actually provides a generalisation of the binomial distribution. However, for the moment, it is worth comparing the forms of the mass functions of the two distributions. To do this, it is quite useful to write the mass function of the occupancy distribution in an alternative form as a product of the binomial mass function multiplied by an adjustment term involving the scaled Stirling function (see Appendix 1):

$$\begin {aligned}\mathrm{Occ}\left(k|n,m,\theta \right)&=\frac{{\theta }^{n}}{{m}^{n}}\cdot {\left(m\right)}_{k}\cdot S\left(n,k,m\cdot \frac{1-\theta }{\theta }\right)\\&=\frac{{\left(m\right)}_{k}}{{m}^{k}}\cdot \frac{S\left(n,k,m\cdot \frac{1-\theta }{\theta }\right)}{\left(\begin{array}{c}n\\ k\end{array}\right){\left(m\cdot \frac{1-\theta }{\theta }\right)}^{n-k}}\cdot \left(\begin{array}{c}n\\ k\end{array}\right)\cdot {\theta }^{k}\cdot {\left(1-\theta \right)}^{n-k}\\&=\frac{{\left(m\right)}_{k}}{{m}^{k}}\cdot\Pi \left(n,k,m\cdot \frac{1-\theta }{\theta }\right)\times \mathrm{Bin}\left(k|n,\theta \right).\end {aligned}$$

This form shows a close resemblance between the mass functions of these two distributions. Moreover, the monotonicity and limit properties of the scaled Stirling function (see Lemma 1 in Appendix 1) allow us to obtain useful properties of the occupancy distribution.

A full account of the properties of the extended occupancy distribution is beyond the scope of this paper. Nevertheless, it is worth giving some basic properties including its moments and asymptotic form, since these are useful for computational purposes. As with many discrete distributions, the moments of the extended occupancy distribution are simplest when presented through the factorial moments. These factorial moments yield corresponding functions for the raw and central moments, which can be computed with a reasonable amount of algebra. We will go as far as the kurtosis of the distribution, noting that the form of this moment is already quite cumbersome. We will also show the asymptotic form of important moments. Higher-order raw and central moments can be computed from the factorial moments, but they are not particularly illuminating.

Theorem 3 (Factorial and Raw Moments)

Letting \({E}_{r}\equiv {\left(1-\theta r/m\right)}^{n}\), we have:

$${\mathbb{E}}\left({\left(m-{K}_{n}\right)}_{r}\right)={\left(m\right)}_{r}\cdot {E}_{r}.$$

We can see from Theorem 3 that the occupancy distribution gives a simple form for the factorial moments, in terms of the quantities \({E}_{r}\). (This notation comes in handy below when we write the central moments of the distribution.) Since \({\left(m-{K}_{n}\right)}_{r}\) is a polynomial in \({K}_{n}\), the factorial moments can be used to derive the raw and central moments. The algebra is cumbersome, so for brevity we state them here as corollaries to the theorem, without further derivation.
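Theorem 3 can be checked directly against the mass function by summing \({\left(m-k\right)}_{r}\) over the pmf; a sketch with illustrative parameter values:

```python
from math import comb

def occ_pmf(k, n, m, theta):
    # Occ(k|n,m,theta) via the finite alternating sum derived earlier
    return comb(m, k) * sum(comb(k, i) * (-1) ** (k - i)
                            * (1 - theta * (m - i) / m) ** n for i in range(k + 1))

def falling(x, r):
    """Falling factorial (x)_r = x (x-1) ... (x-r+1)."""
    out = 1
    for j in range(r):
        out *= x - j
    return out

n, m, theta = 10, 12, 0.8           # illustrative values
pmf = [occ_pmf(k, n, m, theta) for k in range(m + 1)]
# E((m - K_n)_r) computed from the pmf vs. (m)_r * E_r from Theorem 3
lhs = [sum(falling(m - k, r) * pmf[k] for k in range(m + 1)) for r in range(5)]
rhs = [falling(m, r) * (1 - theta * r / m) ** n for r in range(5)]
```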

Corollary (Central Moments)

The extended occupancy distribution has mean, variance, skewness and kurtosis given respectively by:

$$\begin{aligned}&{\mu }_{n,m,\theta }=m\left(1-{E}_{1}\right)\text{, }\\ \\& {\sigma }_{n,m,\theta }^{2}=m\left[\left(m-1\right){E}_{2}+{E}_{1}-m{E}_{1}^{2}\right]\text{,}\\ \\& {\gamma }_{n,m,\theta }=-\frac{{E}_{1}-3{E}_{2}+2{E}_{3}+m\left(\begin{array}{c}3\left({E}_{2}-{E}_{1}^{2}\right)\\ +\,2m\left({E}_{1}^{3}-{E}_{1}{E}_{2}\right)\\ +\,\left(m-3\right)\left({E}_{3}-{E}_{1}{E}_{2}\right)\end{array}\right)}{{m}^{1/2}{\left[\left(m-1\right){E}_{2}+{E}_{1}-m{E}_{1}^{2}\right]}^{3/2}}\text{,}\\ \\& {\kappa }_{n,m,\theta }=\frac{\left(\begin{array}{c}{E}_{1}-4m{E}_{1}^{2}+6{m}^{2}{E}_{1}^{3}-3{m}^{3}{E}_{1}^{4}\\ +\,7\left(m-1\right){E}_{2}+6\left(m-1\right)\left(m-2\right){E}_{3}\\ +\left(m-1\right)\left(m-2\right)\left(m-3\right){E}_{4}\\ -12m\left(m-1\right){E}_{1}{E}_{2}+6{m}^{2}\left(m-1\right){E}_{1}^{2}{E}_{2}\\ -4m\left(m-1\right)\left(m-2\right){E}_{1}{E}_{3}\end{array}\right)}{m{\left[\left(m-1\right){E}_{2}+{E}_{1}-m{E}_{1}^{2}\right]}^{2}}.\end{aligned}$$
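The mean and variance formulas are easy to confirm against the mass function; a sketch with illustrative parameter values:

```python
from math import comb

def occ_pmf(k, n, m, theta):
    # Occ(k|n,m,theta) via the finite alternating sum derived earlier
    return comb(m, k) * sum(comb(k, i) * (-1) ** (k - i)
                            * (1 - theta * (m - i) / m) ** n for i in range(k + 1))

n, m, theta = 10, 12, 0.8                     # illustrative values
E1 = (1 - theta / m) ** n
E2 = (1 - 2 * theta / m) ** n
mean_formula = m * (1 - E1)
var_formula = m * ((m - 1) * E2 + E1 - m * E1 ** 2)

pmf = [occ_pmf(k, n, m, theta) for k in range(m + 1)]
mean_direct = sum(k * p for k, p in enumerate(pmf))
var_direct = sum((k - mean_direct) ** 2 * p for k, p in enumerate(pmf))
```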

Corollary (Asymptotic Central Moments as \(n\to \infty\))

As \(n\to \infty\) we have the asymptotic equivalence \({E}_{r} \sim {e}^{-\theta rn/m}\) which gives the asymptotic forms:

$$\begin{aligned}\mu_{n,m,\theta } &\sim m\left(1-{e}^{-\theta n/m}\right)\text{,}\\ \\ {\sigma }_{n,m,\theta }^{2}& \sim m{e}^{-\theta n/m}\left(1-{e}^{-\theta n/m}\right)\text{,}\\ \\ {\gamma }_{n,m,\theta } &\sim -\frac{1}{\sqrt{m}}\cdot \frac{1-2{e}^{-\theta n/m}}{\sqrt{{e}^{-\theta n/m}\left(1-{e}^{-\theta n/m}\right)}}\text{,}\\ \\ {\kappa }_{n,m,\theta } &\sim 3+\frac{1}{m}\cdot \frac{1-6{e}^{-\theta n/m}\left(1-{e}^{-\theta n/m}\right)}{{e}^{-\theta n/m}\left(1-{e}^{-\theta n/m}\right)}.\end{aligned}$$

If \(n\to \infty\) and \(m\to \infty\) in a way that yields a fixed finite limit for \(n/m\) then we have \({\gamma }_{n,m}\to 0\) and \({\kappa }_{n,m}\to 3\) (so the distribution is asymptotically unskewed and mesokurtic).

Corollary (Asymptotic Central Moments as \(m\to \infty\))

As \(m\to \infty\) we have the asymptotic equivalence \({m}^{a}{E}_{r}^{b} \sim \sum_{i=0}^{a}\left(\begin{array}{c}bn\\ i\end{array}\right){\left(-1\right)}^{i}{\left(r\theta \right)}^{i}{m}^{a-i}\) which gives the asymptotic forms:

$$\begin{aligned}&{\mu }_{n,m,\theta } \sim n\theta \text{,}\\ \\& {\sigma }_{n,m,\theta }^{2} \sim n\theta \left(1-\theta \right)\text{,}\\ \\&{\gamma }_{n,m,\theta } \sim \frac{1}{\sqrt{n}}\cdot \frac{1-2\theta }{\sqrt{\theta \left(1-\theta \right)}}\text{,}\\ \\& {\kappa }_{n,m,\theta } \sim 3+\frac{1}{n}\cdot \frac{1-6\theta \left(1-\theta \right)}{\theta \left(1-\theta \right)}.\end{aligned}$$

The moments of the extended occupancy distribution give us a reasonable sense of the shape of the distribution. In particular, we see that —under broad limit conditions— the distribution is asymptotically unskewed and mesokurtic. In fact, this is just a partial aspect of a powerful limit result for general occupancy distributions given in Hwang and Janson (2008). If \(n\to \infty\) and \(m\to \infty\) in such a way that \({\sigma }_{n,m}^{2}\to \infty\) (having a fixed finite limit for \(n/m\) is a sufficient condition for this convergence) then the mass function for the extended occupancy distribution converges uniformly to the normal density with the same mean and variance. This result can be used to approximate the occupancy distribution for large values of \(n\) and \(m\) where it is not feasible to compute the distribution (due to difficulties computing the Stirling numbers of the second kind for large input values).
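The normal approximation can be inspected numerically. The sketch below (Python, with illustrative moderate values of \(n\) and \(m\)) compares the exact pmf at its mean to the matching normal density; the agreement tightens as \(n\) and \(m\) grow.

```python
from math import comb, exp, sqrt, pi

def occ_pmf(k, n, m, theta):
    # Occ(k|n,m,theta) via the finite alternating sum derived earlier
    return comb(m, k) * sum(comb(k, i) * (-1) ** (k - i)
                            * (1 - theta * (m - i) / m) ** n for i in range(k + 1))

n, m, theta = 60, 40, 0.9            # illustrative values (kept moderate: the
E1 = (1 - theta / m) ** n            # alternating sum loses floating-point
E2 = (1 - 2 * theta / m) ** n        # precision for very large n and m)
mu = m * (1 - E1)
var = m * ((m - 1) * E2 + E1 - m * E1 ** 2)

# exact pmf at the (rounded) mean vs. normal density with matched moments
k0 = round(mu)
exact = occ_pmf(k0, n, m, theta)
normal = exp(-(k0 - mu) ** 2 / (2 * var)) / sqrt(2 * pi * var)
```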

3 The Extended Occupancy Distribution Generalises the Binomial Distribution

Here we will show that the occupancy distribution provides a generalisation of the binomial distribution (i.e., it subsumes the binomial as a special case), which will later allow us to frame various properties and mixture characterisations as extensions of well-known characterisations for the binomial distribution. The attentive reader may already have noticed that our definition of the extended occupancy distribution allows the space parameter (number of bins) to occur in the extended domain \(m\in \overline{\mathbb{N} }\), which allows the value \(m=\infty\). In this latter case we define the distribution by its limit \(\mathrm{Occ}\left(k|n,\infty ,\theta \right)\equiv {\mathrm{lim}}_{m\to \infty }\mathrm{Occ}\left(k|n,m,\theta \right)=\mathrm{Bin}\left(k|n,\theta \right)\). (This case forms part of our definition of the extended occupancy distribution, but we omitted it from the above definition in order to discuss it in more detail here.)

Harkness (1969) notes the similarity of the occupancy distribution to the binomial distribution (pp. 112–114). We will prove the special case identified above formally in a moment, but the simplest way to see that this limit gives the binomial distribution is to observe that the transition probability matrix converges to the infinite dimensional matrix:

$${\mathbf{P}}_{\infty }\equiv \underset{m\to \infty }{\mathrm{lim}}\mathbf{P}\equiv \left[ \begin{array}{ccccccc}1-\theta & \theta & 0& \cdots & 0& 0& \cdots \\ 0& 1-\theta & \theta & \cdots & 0& 0& \cdots \\ 0& 0& 1-\theta & \cdots & 0& 0& \cdots \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots \\ 0& 0& 0& \cdots & 1-\theta & \theta & \cdots \\ 0& 0& 0& \cdots & 0& 1-\theta & \cdots \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots \end{array} \right].$$

The reader will recognise this matrix as the transition probability matrix for a Bernoulli process, and its powers give matrices with elements:

$${\left[{\mathbf{P}}_{\infty }^{n}\right]}_{t,k}\equiv \mathrm{Bin}\left(k-t|n,\theta \right).$$

Thus, starting in state \({K}_{0}=0\) and taking \(n\) steps (i.e., allocating \(n\) balls at random) gives the state \({K}_{n}\) having a binomial distribution with size parameter \(n\) and probability parameter \(\theta\). We have derived the result heuristically here, but it can be formally established either via analysis of the limiting properties of Markov chains, or by purely algebraic analysis on the established probability mass function for the extended occupancy distribution.

Theorem 4a (Generalisation of the Binomial Distribution)

The occupancy distribution satisfies the limiting form:

$$\begin{array}{c}\underset{m\to \infty }{\mathrm{lim}}\mathrm{Occ}\left(k|n,m,\theta \right)=\mathrm{Bin}\left(k|n,\theta \right).\end{array}$$

This theorem formally establishes that the extended occupancy distribution is a generalisation of the binomial distribution. Intuitively, the limiting result reflects the fact that, with infinite bins, there is zero probability that any two balls will fall in the same bin. Thus, the occupancy number is then the “effective” number of balls that have not “fallen through” their allocated bins, which is merely a count of independent Bernoulli random variables with fixed probability. Indeed, going back to our initial setup for the occupancy problem, we note that the number of effective balls in the problem is \({n}_{\mathrm{eff}}=\sum_{i=1}^{n}{\mathbb{I}}\left({U}_{i}\ne \bullet\right)=\sum_{i=1}^{n}{\mathbb{I}}\left({Q}_{i}=1\right)\), with the underlying values \({Q}_{1},{Q}_{2},{Q}_{3},\dots \sim \mathrm{IID \, Bern}\left(\theta \right)\).
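This limit can be observed numerically; the sketch below uses exact rational arithmetic (the alternating sum suffers severe cancellation in floating point when \(m\) is large relative to \(n\)), with illustrative parameter values:

```python
from fractions import Fraction
from math import comb

def occ_pmf_exact(k, n, m, theta):
    """Occ(k|n,m,theta) computed exactly for a rational theta."""
    return comb(m, k) * sum(comb(k, i) * (-1) ** (k - i)
                            * (1 - theta * (m - i) / m) ** n
                            for i in range(k + 1))

n, theta = 8, Fraction(3, 5)        # illustrative values
binom = [comb(n, k) * theta ** k * (1 - theta) ** (n - k) for k in range(n + 1)]
errs = []
for m in (10, 100, 1000):
    occ = [occ_pmf_exact(k, n, m, theta) for k in range(n + 1)]
    errs.append(max(abs(float(a - b)) for a, b in zip(occ, binom)))
# errs shrinks towards zero as m grows, reflecting Occ -> Bin
```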

Since the occupancy distribution generalises the binomial, and since the binomial distribution is well-known to lead to other distributional forms (e.g., the Poisson) under appropriate limits, a natural follow-up question is to ask whether we obtain any interesting distribution if we keep the space parameter \(m\) as a finite value, but take the limits on the other parameters that would yield the Poisson distribution from the binomial. It turns out that this limiting exercise leads us to another binomial distribution (with different parameters). Since the binomial itself yields the Poisson distribution under appropriate limits, the occupancy distribution generalises both the binomial and the Poisson.Footnote 6

Theorem 4b (Limit to the Binomial Distribution)

The occupancy distribution satisfies the limiting form:

$$\mathop{\lim}\limits_{n\to \infty ,\theta \to 0 \atop n\theta \to \lambda}\mathrm{Occ}\left(k|n,m,\theta \right)=\mathrm{Bin}\left(k\Big{|}m,1-\mathrm{exp}\left(-\frac{\lambda }{m}\right)\right).$$

The binomial distribution has a number of well-known properties relating to recursion on its parameters, monotone likelihood ratio with respect to its parameters, and resulting stochastic dominance (see e.g., Johnson and Kotz 1969, pp. 50–86). Since the occupancy distribution generalises the binomial, it is useful to see how the properties of the binomial are generalised in the occupancy distribution. In particular, it is well-known that a binomial random variable has a monotone likelihood ratio and is therefore “stochastically increasing” in \(n\) and \(\theta\) (in the sense of first-order stochastic dominance), and that it obeys simple recursive and differential equations on its parameters. We now establish a set of generalised equations, monotonicity and stochastic dominance results that apply to the extended occupancy distribution, with the binomial recursive equations and stochastic dominance results occurring as a special case.

Theorem 5 (Recursive and Differential Equations)

The extended occupancy distribution satisfies the following recursive/differential equations:

$$\begin{aligned}\mathrm{Occ}\left(k|n+1,m,\theta \right)&=\theta \cdot \frac{m-k+1}{m}\cdot \mathrm{Occ}\left(k-1|n,m,\theta \right)+\left(1-\theta \cdot \frac{m-k}{m}\right)\cdot \mathrm{Occ}\left(k|n,m,\theta \right)\text{,}\\ \\ \mathrm{Occ}\left(k|n,m+1,\theta \right)&=\frac{m+1}{m-k+1}\cdot {\left(1-\frac{\theta }{m+1}\right)}^{n}\cdot \mathrm{Occ}\left(k\Big{|}n,m,\frac{m\theta }{1-\theta +m}\right)\text{,} \, \\ \\ \frac{\partial }{\partial \theta }\mathrm{Occ}\left(k|n,m,\theta \right)&=-\frac{m-k}{m}\cdot \mathrm{Occ}\left(k|n-1,m,\theta \right)+\frac{m-k+1}{m}\cdot \mathrm{Occ}\left(k-1|n-1,m,\theta \right).\end{aligned}$$

Corollary

In the case where \(m=\infty\) we have the binomial recursive/differential equations:

$$\begin{aligned}\mathrm{Bin}\left(k|n+1,\theta \right)=\theta \cdot \mathrm{Bin}\left(k-1|n,\theta \right)+\left(1-\theta \right)\cdot \mathrm{Bin}\left(k|n,\theta \right)\text{,}\\ \\ \frac{\partial }{\partial \theta }\mathrm{Bin}\left(k|n,\theta \right)=-\mathrm{Bin}\left(k|n-1,\theta \right)+\mathrm{Bin}\left(k-1|n-1,\theta \right).\end{aligned}$$
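The first recursion in Theorem 5 also yields a simple computational scheme: starting from the point mass \(\mathrm{Occ}\left(k|0,m,\theta \right)={\mathbb{I}}\left(k=0\right)\) and applying the recursion \(n\) times gives the entire mass function in \(O(nm)\) operations. A minimal illustrative sketch in Python (the function name is our own):

```python
def occ_pmf_by_recursion(n, m, theta):
    # Builds the vector p[k] = Occ(k | n, m, theta) by applying the first
    # recursion of Theorem 5 n times, starting from Occ(k | 0) = I(k = 0).
    p = [0.0] * (m + 1)
    p[0] = 1.0
    for _ in range(n):
        q = [0.0] * (m + 1)
        for k in range(m + 1):
            # the new ball falls through, or lands in one of the k occupied bins
            q[k] += (1.0 - theta * (m - k) / m) * p[k]
            # the new ball occupies one of the m - k + 1 previously empty bins
            if k > 0:
                q[k] += theta * (m - k + 1) / m * p[k - 1]
        p = q
    return p
```

For example, `occ_pmf_by_recursion(2, 2, 1.0)` returns `[0.0, 0.5, 0.5]`, matching the classical occupancy probabilities for two balls allocated to two bins.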

Theorem 6 (First-Order Stochastic Dominance)

Let \(F\left(k|n,m,\theta \right)\equiv {\mathbb{P}}\left({K}_{n}\le k\right)\) denote the cumulative distribution function for the extended occupancy distribution. This satisfies the following first-order stochastic dominance relations:

$$\begin{array}{lllllll}n\le n^{\prime}& & \Rightarrow & & F\left(k|n,m,\theta \right)\ge F\left(k|n^{\prime},m,\theta \right)& & \left(\text{strict if }n<{n}^{\prime} \text{and }m>1\right)\text{,}\\ m\le m^{\prime}& & \Rightarrow & & F\left(k|n,m,\theta \right)\ge F\left(k|n,m^{\prime},\theta \right)& & \left(\text{strict if }m<m^{\prime} \text{and }n>1\right)\text{,}\\ \theta \le \theta ^{\prime}& & \Rightarrow & & F\left(k|n,m,\theta \right)\ge F\left(k|n,m,\theta ^{\prime}\right)& & \left(\text{strict if }\theta <\theta ^{\prime}\right).\end{array}$$

Theorem 5 establishes equations for the occupancy distribution generalising similar equations for the binomial distribution. Theorem 6 shows that the stochastic dominance results for the binomial distribution also hold for the extended occupancy distribution, and are also extended to the new parameter \(m\). The theorem establishes that an extended occupancy random variable is “stochastically increasing” in \(n\), \(m\) and \(\theta\) (in the sense of first-order stochastic dominance). This result accords with intuition, since increasing the number of balls, the number of bins, or the probability of occupancy, will all tend to increase the number of occupied bins. This stochastic dominance result also gives us a useful intuitive understanding of the effect of the generalisation from the binomial to the occupancy distribution. By imposing a finite parameter \(m\) (rather than the value \(m=\infty\) that gives the binomial distribution) we “squash” the effective balls into a finite number of available bins, which gives rise to the possibility that more than one effective ball will share a bin, so the excess balls will not count towards our occupancy number. A simple corollary of Theorem 6 is that imposition of a finite value of \(m\) (instead of \(m=\infty\)) will tend to reduce the value of the occupancy number.

In Appendix 2 we prove the stochastic dominance results in Theorem 6 algebraically from the mass function for the occupancy distribution. However, a simpler intuition for the result is obtained by noting that the Markov chain defining the extended occupancy distribution has a number of monotonicity properties. The rows of the matrix have cumulative sums that are increasing, so the matrix is “monotone” in the sense described in Daley (1968). Moreover, in any row of the matrix, the cumulative sum of terms is decreasing in \(m\) and \(\theta\) (in the degenerate case where \(\theta =0\) it is only non-increasing in \(m\)). An alternative proof (not pursued here) could be couched in terms of these general properties of monotone Markov chains.

4 Excess Hitting Times and the Negative Occupancy Distribution

Another distribution related to the binomial distribution is the negative binomial distribution, and it turns out that we can find a natural extension of this latter distribution arising in the occupancy process. This distribution is obtained by considering the “excess hitting time” for the event \({K}_{n}=k\) in the Markov chain, which we denote by:

$${T}_{k}\equiv \mathrm{min}\left\{t=0,1,2,\dots |{K}_{k+t}=k\right\}.$$

Derivation of the distribution of this excess hitting time is quite straightforward. The event \({T}_{k}\le t\) is equivalent to the event \({K}_{k+t}\ge k\). Hence, for all \(0<k\le m\) and \(t\ge 0\) the cumulative distribution of \({T}_{k}\) can be obtained from the occupancy distribution as:

$${F}_{{T}_{k}}\left(t\right)={\mathbb{P}}\left({K}_{k+t}\ge k\right)=\sum_{r=0}^{m-k}\mathrm{Occ}\left(k+r|k+t,m,\theta \right).$$

To find the probability mass function we take the first difference of the distribution function. As a preliminary step, applying the first recursive equation in Theorem 5 we obtain:

$$\begin {aligned}\mathrm{Occ}\left(k+r|k+t,m,\theta \right)=\, &\theta \cdot \frac{m-k-r+1}{m}\cdot \mathrm{Occ}\left(k+r-1|k+t-1,m,\theta \right)\\&+\left(1-\theta \cdot \frac{m-k-r}{m}\right)\cdot \mathrm{Occ}\left(k+r|k+t-1,m,\theta \right).\end {aligned}$$

We therefore have:

$$\begin {aligned}\mathrm{Occ}\left(k+r|k+t,m,\theta \right)\, &-\mathrm{Occ}\left(k+r|k+t-1,m,\theta \right)\\&=\frac{\theta }{m}\left[\begin{array}{c}\left(m-k-r+1\right)\cdot \mathrm{Occ}\left(k+r-1|k+t-1,m,\theta \right)\\ -\left(m-k-r\right)\cdot \mathrm{Occ}\left(k+r|k+t-1,m,\theta \right) \end{array}\right].\end {aligned}$$

Using this result, the mass function for the excess hitting time is:

$$\begin {aligned}{\mathbb{P}}\left({T}_{k}=t\right)\, &={F}_{{T}_{k}}\left(t\right)-{F}_{{T}_{k}}\left(t-1\right)\\&=\sum_{r=0}^{m-k}\left[\mathrm{Occ}\left(k+r|k+t,m,\theta \right)-\mathrm{Occ}\left(k+r|k+t-1,m,\theta \right)\right]\\&=\frac{\theta }{m}\cdot \left[\begin{array}{c} \sum\limits_{r=0}^{m-k}\left(m-k-r+1\right)\cdot \mathrm{Occ}\left(k+r-1|k+t-1,m,\theta \right)\\ -\sum\limits_{r=1}^{m-k+1}\left(m-k-r+1\right)\cdot \mathrm{Occ}\left(k+r-1|k+t-1,m,\theta \right) \end{array}\right]\\&=\frac{\theta }{m}\cdot \left[\begin{array}{c} {\left.\left(m-k-r+1\right)\cdot \mathrm{Occ}\left(k+r-1|k+t-1,m,\theta \right)\right|}_{r=0} \\ -{\left.\left(m-k-r+1\right)\cdot \mathrm{Occ}\left(k+r-1|k+t-1,m,\theta \right)\right|}_{r=m-k+1}\end{array}\right]\\&=\frac{\theta }{m}\cdot \left(m-k+1\right)\cdot \mathrm{Occ}\left(k-1|k+t-1,m,\theta \right)\\&=\theta \cdot \frac{m-k+1}{m}\cdot \frac{{\theta }^{k+t-1}}{{m}^{k+t-1}}\cdot {\left(m\right)}_{k-1}\cdot S\left(k+t-1,k-1,m\cdot \frac{1-\theta }{\theta }\right)\\&=\frac{{\theta }^{k+t}}{{m}^{k+t}}\cdot {\left(m\right)}_{k}\cdot S\left(k+t-1,k-1,m\cdot \frac{1-\theta }{\theta }\right).\end {aligned}$$

Below we formalise this mass function as defining a class of distributions we call the “negative occupancy distribution”. We also consider a special case of this distribution, which describes the behaviour of the excess coupons collected in the famous coupon-collector problem.

Definition (The Negative Occupancy Distribution)

This is a discrete distribution that has probability mass function given over all integer arguments \(t\ge 0\) as:

$$\mathrm{NegOcc}\left(t|m,k,\theta \right)=\frac{{\theta }^{k+t}}{{m}^{k+t}}\cdot {\left(m\right)}_{k}\cdot S\left(k+t-1,k-1,m\cdot \frac{1-\theta }{\theta }\right)\text{,}$$

where \(0<k\le m\le \infty\) are the occupancy parameter (number of occupied bins) and space parameter (number of bins) respectively, and \(0<\theta \le 1\) is the probability parameter. □
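As a numerical check on the derivation above, the closed-form mass function can be compared against the first difference of the distribution function \({F}_{{T}_{k}}\left(t\right)=\sum_{r=0}^{m-k}\mathrm{Occ}\left(k+r|k+t,m,\theta \right)\). A Python sketch (function names our own; the noncentral Stirling recurrence is assumed consistent with Appendix 1):

```python
from math import prod

def stirling2_nc(n, k, phi):
    # Noncentral Stirling recurrence S(n,k,phi) = S(n-1,k-1,phi) + (k+phi)*S(n-1,k,phi).
    S = [[0.0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1.0
    for i in range(1, n + 1):
        S[i][0] = phi ** i
        for j in range(1, min(i, k) + 1):
            S[i][j] = S[i - 1][j - 1] + (j + phi) * S[i - 1][j]
    return S[n][k]

def occ_pmf(k, n, m, theta):
    # Extended occupancy mass function Occ(k | n, m, theta).
    phi = m * (1 - theta) / theta
    return (theta / m) ** n * prod(m - i for i in range(k)) * stirling2_nc(n, k, phi)

def negocc_pmf(t, m, k, theta):
    # NegOcc(t | m, k, theta) from the closed-form mass function above.
    phi = m * (1 - theta) / theta
    return (theta / m) ** (k + t) * prod(m - i for i in range(k)) * stirling2_nc(k + t - 1, k - 1, phi)

def hitting_cdf(t, m, k, theta):
    # F_{T_k}(t) = P(K_{k+t} >= k), summed from the occupancy distribution.
    return sum(occ_pmf(k + r, k + t, m, theta) for r in range(m - k + 1))
```

The first differences of `hitting_cdf` reproduce `negocc_pmf`, as in the telescoping derivation above.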

Definition (Coupon-Collector Distribution)

This is a discrete distribution with probability mass function given over all integer argument values \(t\ge 0\) as:

$$\mathrm{CoupColl}\left(t|m,\theta \right)=\frac{{\theta }^{m+t}}{{m}^{m+t}}\cdot m!\cdot S\left(m+t-1,m-1,m\cdot \frac{1-\theta }{\theta }\right)\text{,}$$

where \(0<m<\infty\) is the space parameter (number of bins) and \(0<\theta \le 1\) is the probability parameter. Since \(\mathrm{CoupColl}\left(t|m,\theta \right)=\mathrm{NegOcc}\left(t|m,m,\theta \right)\) the coupon-collector distribution is a special case of the negative occupancy distribution with \(k=m\). □

Remark 2

Mathematically, the mass function of the negative occupancy distribution arises from the ordinary generating function for the noncentral Stirling numbers of the second kind, which is \(\sum_{n=k}^{\infty }S\left(n,k,\phi \right)\cdot {x}^{n}={x}^{k}/\prod_{i=0}^{k}\left(1-\left(i+\phi \right)x\right)\) (see Appendix 1). Substituting the noncentrality parameter \(\phi =m\left(1-\theta \right)/\theta\) and the argument \(x=\theta /m\) and rearranging gives the norming equation for the mass function of the negative occupancy distribution. □

Since \({\mathbb{P}}\left({T}_{k}=t\right)=\mathrm{NegOcc}\left(t|m,k,\theta \right)\) we can see that the negative occupancy distribution is the appropriate family of distributions to describe the behaviour of the excess hitting time in the occupancy process. In the case where \(k=m\) we are looking at the excess number of balls that are required to fully occupy all the bins in the occupancy problem. In this case, we have referred to the distribution as the “coupon-collector” distribution. The classical version of the coupon-collector distribution arises in the “coupon-collector problem”, which examines the number of randomly obtained coupons that need to be collected to obtain a full set (see e.g., Dawkins 1991; Adler et al. 2003). Note that our distribution describes the excess number of coupons required for a full set, not the total number of coupons; it is trivial to convert to the distribution of the total number of coupons if required.
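In the classical case \(\theta =1\) the coupon-collector distribution can be checked against the well-known expected total of \(m\cdot {H}_{m}\) draws (where \({H}_{m}\) is the \(m\)-th harmonic number), so that the expected excess is \(m\cdot {H}_{m}-m\). A Python sketch using the central Stirling numbers (i.e., noncentrality \(\phi =0\); function names are our own):

```python
from math import factorial

def stirling2(n, k):
    # Central Stirling number of the second kind S(n, k), exact integers.
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = S[i - 1][j - 1] + j * S[i - 1][j]
    return S[n][k]

def coupcoll_pmf(t, m):
    # CoupColl(t | m, theta = 1): mass function of the excess number of
    # coupons beyond m needed to complete a full set of m coupon types.
    return factorial(m) * stirling2(m + t - 1, m - 1) / m ** (m + t)

# Truncated mean of the excess for m = 4 (the tail beyond t = 300 is negligible).
mean_excess = sum(t * coupcoll_pmf(t, 4) for t in range(300))
```

The truncated mean agrees with \(4\cdot {H}_{4}-4=13/3\approx 4.33\) to high accuracy.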

As with the occupancy distribution above, it is useful to write the mass function of the negative occupancy distribution as a product of the negative binomial mass function multiplied by an adjustment term involving the scaled Stirling function (Appendix 1):

$$\begin {aligned}\mathrm{NegOcc}\left(t|m,k,\theta \right)\, &=\frac{{\theta }^{k+t}}{{m}^{k+t}}\cdot {\left(m\right)}_{k}\cdot S\left(k+t-1,k-1,m\cdot \frac{1-\theta }{\theta }\right)\\&=\frac{{\left(m\right)}_{k}}{{m}^{k}}\cdot \frac{S\left(k+t-1,k-1,m\cdot \frac{1-\theta }{\theta }\right)}{\left(\begin{array}{c}k+t-1\\ k-1\end{array}\right){\left(m\cdot \frac{1-\theta }{\theta }\right)}^{t}}\cdot \left(\begin{array}{c}k+t-1\\ t\end{array}\right)\cdot {\left(1-\theta \right)}^{t}\cdot {\theta }^{k}\\&=\frac{{\left(m\right)}_{k}}{{m}^{k}}\cdot\Pi \left(k+t-1,k-1,m\cdot \frac{1-\theta }{\theta }\right)\times \mathrm{NegBin}\left(t|k,1-\theta \right).\end {aligned}$$

This alternative form shows that there is a close resemblance between the mass functions of these two distributions. In fact, it is simple to show that the negative occupancy distribution generalises the negative binomial distribution. In the case \(m=\infty\) we define the distribution by the limit \(\mathrm{NegOcc}\left(t|\infty ,k,\theta \right)\equiv {\mathrm{lim}}_{m\to \infty }\mathrm{NegOcc}\left(t|m,k,\theta \right)=\mathrm{NegBin}\left(t|k,1-\theta \right)\).

Theorem 7 (Generalisation of Negative Binomial Distribution)

The negative occupancy distribution satisfies the limiting form:

$$\begin{array}{c}\underset{m\to \infty }{\mathrm{lim}}\mathrm{NegOcc}\left(t|m,k,\theta \right)=\mathrm{NegBin}\left(t|k,1-\theta \right){.}\end{array}$$

The negative occupancy distribution generalises the negative binomial distribution by adding a space parameter \(m\) that allows the occupancy to be “squashed” into a finite number of bins. The distribution obeys a number of recursive/differential results, and stochastic dominance properties that generalise results for the negative binomial distribution.

Theorem 8 (Recursive and Differential Equations)

The negative occupancy distribution has recursive/differential equations with respect to its parameters given by:

$$\begin{aligned}\mathrm{NegOcc}\left(t|m+1,k,\theta \right)&=\frac{m+1}{m-k+1}\cdot {\left(1-\frac{\theta }{m+1}\right)}^{k+t}\cdot \mathrm{NegOcc}\left(t\Big{|}m,k,\frac{m\theta }{1-\theta +m}\right)\text{,} \, \\ \mathrm{NegOcc}\left(t|m,k+1,\theta \right)&=\theta \cdot \frac{m-k}{m}\cdot \sum\limits_{i=0}^{t}{\left(1-\theta \cdot \frac{m-k}{m}\right)}^{i}\cdot \mathrm{NegOcc}\left(t-i|m,k,\theta \right)\text{,} \, \\ \frac{\partial }{\partial \theta }\mathrm{NegOcc}\left(t|m,k,\theta \right)&=\frac{1}{\theta }\cdot \mathrm{NegOcc}\left(t|m,k,\theta \right)+\frac{m-k+1}{m}\cdot \left[\begin{array}{c} \mathrm{NegOcc}\left(t|m,k-1,\theta \right)\\ -\mathrm{NegOcc}\left(t-1|m,k,\theta \right)\end{array}\right].\end{aligned}$$

Corollary

In the case where \(m=\infty\) we have the recursive/differential equations for the negative binomial distribution:

$$\begin{aligned}\mathrm{NegBin}\left(t|k+1,1-\theta \right)&=\theta \cdot \sum\limits_{i=0}^{t}{\left(1-\theta \right)}^{i}\cdot \mathrm{NegBin}\left(t-i|k,1-\theta \right)\text{,} \, \\ \frac{\partial }{\partial \theta }\mathrm{NegBin}\left(t|k,1-\theta \right)&=\frac{1}{\theta }\cdot \mathrm{NegBin}\left(t|k,1-\theta \right)+\left[\begin{array}{c} \mathrm{NegBin}\left(t|k-1,1-\theta \right)\\ -\mathrm{NegBin}\left(t-1|k,1-\theta \right)\end{array}\right].\end{aligned}$$

Theorem 8 gives recursive/differential equations for the negative occupancy distribution. The corollary shows that these equations are extensions of well-known equations for the negative binomial distribution. (The reader should note that corresponding recursive equations for the coupon-collector distribution are a little more complicated than for the negative occupancy distribution, owing to the fact that two parameters are collapsed into one. Detailed analysis of this distribution is outside the scope of the present paper.) These recursive equations give rise to corresponding stochastic dominance results, as shown in the theorem below.

Theorem 9 (First-Order Stochastic Dominance)

Let \(F\left(t|m,k,\theta \right)\equiv {\mathbb{P}}\left({T}_{k}\le t\right)\) denote the cumulative distribution function for the negative occupancy distribution. This satisfies the following first-order stochastic dominance relations:

$$\begin{array}{lllllll}m\le m^{\prime}& & \Rightarrow & & F\left(t|m,k,\theta \right)\le F\left(t|m^{\prime},k,\theta \right)& & \left(\text{strict if }m<{m}^{\prime}\text{ and }k>1\text{ or }\theta <1\right)\text{,}\\ k\le k^{\prime}& & \Rightarrow & & F\left(t|m,k,\theta \right)\ge F\left(t|m,k^{\prime},\theta \right)& & \left(\text{strict if }k<{k}^{\prime}\right)\text{,}\\ \theta \le \theta ^{\prime}& & \Rightarrow & & F\left(t|m,k,\theta \right)\le F\left(t|m,k,\theta ^{\prime}\right)& & \left(\text{strict if }\theta <{\theta }^{\prime}\right).\end{array}$$

In the proofs in Appendix 2, we establish the above stochastic dominance results algebraically from the mass function of the negative occupancy distribution. These results follow directly from the monotone likelihood-ratio properties of the distribution with respect to its parameters, but they also have some basic statistical intuition. In particular, the stochastic dominance results for the negative occupancy distribution are intuitively related to stochastic dominance for the extended occupancy distribution —ceteris paribus, increases in either \(m\) or \(\theta\) will tend to increase the occupancy number for any fixed number of balls, and will thus tend to decrease the number of excess balls required to achieve a fixed occupancy number. Contrarily, if we increase the occupancy number \(k\) this will tend to directly increase the excess hitting time, since the hitting time is now for a larger outcome value in a pure birth process.

The negative occupancy distribution provides us with a description of the stochastic behaviour of the excess number of balls required to achieve a given occupancy number in the extended occupancy problem. By a simple shift in location, it can also be used to describe the stochastic behaviour of the total number of balls required to achieve a given occupancy number. The coupon-collector distribution is a special case of the negative occupancy distribution, which solves the famous “coupon-collector problem”, giving a full description of the behaviour of the minimum number of balls needed to achieve full occupancy.

5 The “Spillage” and its Conditional Distribution

From our previous analysis, we have already seen that taking \(m=\infty\) means that each ball falls into a different bin, yielding standard Bernoulli sampling. In this special case the occupancy number must be equal to the effective number of balls in the occupancy problem, and thus, we will have \({n}_{\mathrm{eff}}-{K}_{n}=0\). If we use a finite number of bins \(m<\infty\) this is no longer guaranteed, since it is possible for some of the effective balls to occupy the same bin, so that the effective number of balls may exceed the occupancy number. If we consider a bin containing a single ball to be occupied, we can consider the value \({n}_{\mathrm{eff}}-{K}_{n}\) to constitute “spillage” of balls in excess of the number required to occupy the occupied bins.

The third distribution we will examine in the extended occupancy problem is the conditional distribution of the “spillage”, conditional on the occupancy number. We will derive this distribution using Bayes’ theorem. Conditional on the effective number of balls \({n}_{\mathrm{eff}}=s\), the distribution of the occupancy number is the classical occupancy distribution:

$${\mathbb{P}}\left({K}_{n}=k|{n}_{\mathrm{eff}}=s,n,m,\theta \right)=\mathrm{Occ}\left(k|s,m\right).$$

Thus, for all occupancy values \(1\le k\le \mathrm{min}\left(n,m\right)\) and all argument values \(k\le s\le n\) for the effective number of balls, we have:

$$\begin {aligned}{\mathbb{P}}\left({n}_{\mathrm{eff}}=s|{K}_{n}=k,n,m,\theta \right)\, &=\frac{{\mathbb{P}}\left({K}_{n}=k|{n}_{\mathrm{eff}}=s,n,m,\theta \right)\times {\mathbb{P}}\left({n}_{\mathrm{eff}}=s|n,m,\theta \right)}{{\mathbb{P}}\left({K}_{n}=k|n,m,\theta \right)}\\&=\frac{\mathrm{Occ}\left(k|s,m\right)\times \mathrm{Bin}\left(s|n,\theta \right)}{\mathrm{Occ}\left(k|n,m,\theta \right)}\\&=\frac{\frac{1}{{m}^{s}}\cdot {\left(m\right)}_{k}\cdot S\left(s,k\right)\times \left(\begin{array}{c}n\\ s\end{array}\right)\cdot {\theta }^{s}\cdot {\left(1-\theta \right)}^{n-s}}{\frac{{\theta }^{n}}{{m}^{n}}\cdot {\left(m\right)}_{k}\cdot S\left(n,k,m\cdot \frac{1-\theta }{\theta }\right)}\\&=\frac{\frac{{\theta }^{n}}{{m}^{n}}\cdot {\left(m\right)}_{k}\cdot S\left(s,k\right)\times \left(\begin{array}{c}n\\ s\end{array}\right)\cdot {\left(m\cdot \frac{1-\theta }{\theta }\right)}^{n-s}}{\frac{{\theta }^{n}}{{m}^{n}}\cdot {\left(m\right)}_{k}\cdot S\left(n,k,m\cdot \frac{1-\theta }{\theta }\right)}\\&=\left(\begin{array}{c}n\\ s\end{array}\right)\cdot {\left(m\cdot \frac{1-\theta }{\theta }\right)}^{n-s}\cdot \frac{S\left(s,k\right)}{S\left(n,k,m\cdot \left(1-\theta \right)/\theta \right)}.\end {aligned}$$

Taking \(s=k+r\) gives the conditional distribution of the “spillage”, which is:

$$\begin {aligned}{\mathbb{P}}\left({n}_{\mathrm{eff}}-{K}_{n}=r|{K}_{n}=k,n,m,\theta \right)\, &={\mathbb{P}}\left({n}_{\mathrm{eff}}=k+r|{K}_{n}=k,n,m,\theta \right) \\&=\left(\begin{array}{c}n\\ k+r\end{array}\right)\cdot {\left(m\cdot \frac{1-\theta }{\theta }\right)}^{n-k-r}\cdot \frac{S\left(k+r,k\right)}{S\left(n,k,m\cdot \left(1-\theta \right)/\theta \right)}.\end {aligned}$$

Definition (The Spillage Distribution)

This distribution is a discrete probability distribution with probability mass function given by:

$$\mathrm{Spillage}\left(r|n,k,\phi \right)\equiv \left(\begin{array}{c}n\\ k+r\end{array}\right)\cdot {\phi }^{n-k-r}\cdot \frac{S\left(k+r,k\right)}{S\left(n,k,\phi \right)} \qquad\qquad r=0,\dots ,n-k\text{,}$$

where \(n\in {\mathbb{N}}\) is the size parameter (number of balls), \(0\le k\le n\) is the occupancy parameter (occupancy number) and \(0\le \phi \le \infty\) is the scale parameter. □

We can see that the spillage distribution describes the behaviour of the “spillage” given our knowledge of the occupancy number. The distribution can also be shifted to describe the behaviour of the number of effective balls given our knowledge of the occupancy number. These two conditional probabilities are given respectively by:

$$\begin{aligned}{\mathbb{P}}\left({n}_{\mathrm{eff}}-{K}_{n}=r|{K}_{n}=k,n,m,\theta \right)&=\mathrm{Spillage}\left(r|n,k,m\cdot \frac{1-\theta }{\theta }\right)\text{, }\\ {\mathbb{P}}\left({n}_{\mathrm{eff}}=s|{K}_{n}=k,n,m,\theta \right)&=\mathrm{Spillage}\left(s-k|n,k,m\cdot \frac{1-\theta }{\theta }\right).\end{aligned}$$

It is worth noting here that the distribution of the “spillage” depends on \(m\) and \(\theta\) only through the scale parameter \(\phi =m\cdot \left(1-\theta \right)/\theta\). In the classical case where \(\theta =1\) we have the scale parameter \(\phi =0\) so \({n}_{\mathrm{eff}}=n\) with probability one (and the corresponding “spillage” is \(n-k\)). In the case where \(m=\infty\) and \(0<\theta <1\) we have the scale parameter \(\phi =\infty\) so \({n}_{\mathrm{eff}}=k\) with probability one (and the corresponding “spillage” is zero). We have been unable to identify this distribution in the existing mathematical or statistical literature, and so to our knowledge it is a “new” distributional family; the name we have ascribed here is our own creation. The name we have chosen reflects the fact that the distribution arises when balls in excess of the number required to occupy a bin “spill” over the capacity of the bin.

Remark 3

Mathematically, the mass function of the spillage distribution arises from the well-known expansion for the noncentral Stirling numbers of the second kind in terms of the central Stirling numbers of the second kind (see Appendix 1), which can be written as:

$$S\left(n,k,\phi \right)=\sum_{r=0}^{n-k}\left(\begin{array}{c}n\\ k+r\end{array}\right)\cdot {\phi }^{n-k-r}\cdot S\left(k+r,k\right).$$

For \(\phi \ge 0\) the terms in this sum are non-negative and the terms give the kernel of the mass function of the spillage distribution. □
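The expansion in Remark 3 can be verified numerically, along with the fact that the spillage mass function sums to one over \(r=0,\dots ,n-k\). A Python sketch (function names our own; both Stirling recurrences are assumed consistent with Appendix 1):

```python
from math import comb

def stirling2(n, k):
    # Central Stirling numbers of the second kind (exact integers).
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = S[i - 1][j - 1] + j * S[i - 1][j]
    return S[n][k]

def stirling2_nc(n, k, phi):
    # Noncentral recurrence S(n,k,phi) = S(n-1,k-1,phi) + (k+phi)*S(n-1,k,phi).
    S = [[0.0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1.0
    for i in range(1, n + 1):
        S[i][0] = phi ** i
        for j in range(1, min(i, k) + 1):
            S[i][j] = S[i - 1][j - 1] + (j + phi) * S[i - 1][j]
    return S[n][k]

def spillage_pmf(r, n, k, phi):
    # Spillage(r | n, k, phi) for r = 0, ..., n - k.
    return comb(n, k + r) * phi ** (n - k - r) * stirling2(k + r, k) / stirling2_nc(n, k, phi)
```

Summing the kernel terms over \(r\) recovers \(S\left(n,k,\phi \right)\), which is exactly why the mass function is properly normed.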

Unlike our previous two distributions, the present distribution does not generalise any common non-trivial distribution arising as a variant of the binomial. In fact, we see from the theorem below that in the case where we have an infinite number of bins (giving \(\phi =\infty\)) the distribution degenerates to a point mass on \(r=0\), reflecting the fact that there is no “spillage” in this case. Thus, rather than providing a useful generalisation of an existing distribution, like our previous distributions, the spillage distribution is a new form that describes the divergence between the effective number of balls and the occupancy number in the setting of the extended occupancy problem. In the case of a finite number of bins, the occupancy number may be “squashed” down below the effective number of balls by the fact that balls may share bins with non-zero probability.

Theorem 10 (Limit of the Spillage Distribution with Infinite Bins)

The spillage distribution satisfies the limiting form:

$$\begin{array}{c}\mathop{\lim}\limits_{\phi \to \infty}\mathrm{Spillage}\left(r|n,k,\phi \right)={\mathbb{I}}\left(r=0\right).\end{array}$$

Taking \(n=k+r\) gives the probability that the effective number of balls is equal to the full number of balls (i.e., that all balls were effective). The probability of this outcome, written in terms of the occupancy number \(k\) and the spillage \(r\) is:

$$\mathrm{Spillage}\left(r|k+r,k,\phi \right)=\frac{S\left(k+r,k\right)}{S\left(k+r,k,\phi \right)}.$$

As we did above with our first two distributions, it is useful to write the mass function of the spillage distribution in an alternative form involving the scaled Stirling function (see Appendix 1). With a bit of algebra it can be shown that:

$$\left(\begin{array}{c}n\\ k+r\end{array}\right)=\frac{\left(\begin{array}{c}n\\ k\end{array}\right)\left(\begin{array}{c}n-k\\ r\end{array}\right)}{\left(\begin{array}{c}k+r\\ k\end{array}\right)}.$$

This gives an alternative form for the mass function of the spillage distribution:

$$\begin {aligned}\mathrm{Spillage}\left(r|n,k,\phi \right)\, &=\left(\begin{array}{c}n\\ k+r\end{array}\right)\cdot {\phi }^{n-k-r}\cdot \frac{S\left(k+r,k\right)}{S\left(n,k,\phi \right)} \\&=\frac{\left(\begin{array}{c}n\\ k\end{array}\right)\left(\begin{array}{c}n-k\\ r\end{array}\right)}{\left(\begin{array}{c}k+r\\ k\end{array}\right)}\cdot {\phi }^{n-k-r}\cdot \frac{S\left(k+r,k\right)}{S\left(n,k,\phi \right)} \\&=\left(\begin{array}{c}n-k\\ r\end{array}\right)\cdot \frac{S\left(k+r,k,\phi \right)}{\left(\begin{array}{c}k+r\\ k\end{array}\right)\cdot {\phi }^{r}}\Bigg{/}\frac{S\left(n,k,\phi \right)}{\left(\begin{array}{c}n\\ k\end{array}\right)\cdot {\phi }^{n-k}}\times \frac{S\left(k+r,k\right)}{S\left(k+r,k,\phi \right)}\\&=\left(\begin{array}{c}n-k\\ r\end{array}\right)\cdot \frac{\Pi \left(k+r,k,\phi \right)}{\Pi \left(n,k,\phi \right)}\times \mathrm{Spillage}\left(r|k+r,k,\phi \right).\end {aligned}$$

This form of the mass function frames the probabilities relative to the conditional probability that all the balls in the occupancy problem are effective. As can be seen, the form involves writing the mass function as the product of this probability and an adjustment term involving the scaled Stirling numbers.

As with the other two distributions we have examined in this paper, it is possible to use the recursive/differential properties of the noncentral Stirling numbers of the second kind to obtain corresponding recursive/differential equations for the spillage distribution. These equations are rather cumbersome, and not particularly illuminating, so they are omitted here. As should be unsurprising, the spillage is stochastically increasing in \(n\) and decreasing in \(k\). It is also possible to establish that the spillage is stochastically decreasing in \(\phi\), which means it is stochastically decreasing in \(m\) and stochastically increasing in \(\theta\).

6 Mixture Properties Involving the Occupancy Distributions

The three occupancy distributions we have examined have a number of interesting mixture characterisations that are useful for computational and analytic purposes. We will look at each of the distributions in the order presented in our previous examination, and derive mixture characterisations for each, beginning with the extended occupancy distribution. The mixtures in Theorems 12–13 below are also shown in Harkness (1969) (Eqs. 24 and 23 respectively). To the knowledge of the present author, the remaining results are new.

One way to derive mixture results for the extended occupancy distribution is to treat the number of balls in the occupancy problem as a random variable with a specified distribution over the natural numbers. This leads to a general mixture form shown in the theorem below, where the marginal mass function of the occupancy number involves the probability generating function of the underlying distribution.

Theorem 11 (Random Number of Balls)

Suppose we let the number of balls in the occupancy problem be a random variable \(N \sim {p}_{N}\) and let \({G}_{N}\) be the corresponding probability generating function of \(N\). Then the marginal mass function for the occupancy number is:

$${\mathbb{P}}\left({K}_{N}=k|m,\theta \right)=\left(\begin{array}{c}m\\ k\end{array}\right)\sum\limits_{i=0}^{k}\left(\begin{array}{c}k\\ i\end{array}\right){\left(-1\right)}^{k-i}{G}_{N}\left(1-\theta \cdot \frac{m-i}{m}\right).$$

There are some distributions with simple probability generating functions that conform nicely with this sum expression above, in such a way as to yield useful mixture characterisations. In Theorems 12–13 below we give our first mixture results, which show that binomial and Poisson mixtures of the occupancy distribution both give rise to simple marginal distributions. Later we show some mixture results for the negative occupancy and spillage distributions.
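As an illustration of Theorem 11, taking \(N\sim \mathrm{Pois}\left(\lambda \right)\) gives \({G}_{N}\left(s\right)=\mathrm{exp}\left(\lambda \left(s-1\right)\right)\), and the resulting marginal mass function should match the binomial form asserted by Theorem 13 below. A Python sketch (function names our own):

```python
from math import comb, exp

def occ_marginal_poisson(k, m, theta, lam):
    # P(K_N = k | m, theta) for N ~ Poisson(lam), via the pgf formula of
    # Theorem 11 with G_N(s) = exp(lam * (s - 1)).
    total = 0.0
    for i in range(k + 1):
        s = 1.0 - theta * (m - i) / m
        total += comb(k, i) * (-1) ** (k - i) * exp(lam * (s - 1.0))
    return comb(m, k) * total

def bin_pmf(k, n, p):
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Per Theorem 13, the marginal should be Bin(k | m, 1 - exp(-lam*theta/m)).
p = 1.0 - exp(-3.2 * 0.5 / 6)
gap = max(abs(occ_marginal_poisson(k, 6, 0.5, 3.2) - bin_pmf(k, 6, p)) for k in range(7))
```

The discrepancy `gap` is at floating-point level, consistent with the Poisson mixture result.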

Theorem 12 (Occupancy Distribution is a Binomial Mixture of Occupancy Distributions)

The occupancy distribution satisfies the equation:

$$\mathrm{Occ}\left(k|n,m,\gamma \theta \right)=\sum_{r=0}^{n}\mathrm{Bin}\left(r|n,\theta \right)\cdot \mathrm{Occ}\left(k|r,m,\gamma \right).$$

In the special case where \(\gamma =1\) we obtain the useful mixture equation:

$$\mathrm{Occ}\left(k|n,m,\theta \right)=\sum_{r=0}^{n}\mathrm{Bin}\left(r|n,\theta \right)\cdot \mathrm{Occ}\left(k|r,m\right).$$

Theorem 12 has a simple intuition when we interpret it as involving two independent events resulting in “falling through” the bins. The binomial distribution in the mixture gives the number of “effective” balls that do not fall through the bins due to the new event, and each of these terms is multiplied by the occupancy distribution without the probability of that event incorporated. We have stated the theorem in a general form, but the most important case occurs when \(\gamma =1\), which allows us to write the occupancy distribution as a binomial mixture of the classical occupancy distribution. (This mixture is useful for computational purposes; it can be combined with the algorithms in O’Neill 2021 to yield an algorithm to compute the extended occupancy distribution.)
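The computationally useful special case \(\gamma =1\) of Theorem 12 can be verified numerically by comparing the extended occupancy mass function against the binomial mixture of classical occupancy mass functions. A Python sketch (function names our own; the noncentral Stirling recurrence is assumed consistent with Appendix 1):

```python
from math import comb, prod

def stirling2_nc(n, k, phi):
    # Noncentral Stirling recurrence; phi = 0 gives the central numbers.
    S = [[0.0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1.0
    for i in range(1, n + 1):
        S[i][0] = phi ** i
        for j in range(1, min(i, k) + 1):
            S[i][j] = S[i - 1][j - 1] + (j + phi) * S[i - 1][j]
    return S[n][k]

def occ_ext(k, n, m, theta):
    # Extended occupancy mass function Occ(k | n, m, theta).
    phi = m * (1 - theta) / theta
    return (theta / m) ** n * prod(m - i for i in range(k)) * stirling2_nc(n, k, phi)

def occ_classical(k, n, m):
    # Classical occupancy mass function Occ(k | n, m), i.e. theta = 1.
    return prod(m - i for i in range(k)) * stirling2_nc(n, k, 0.0) / m ** n

def bin_pmf(r, n, p):
    return comb(n, r) * p ** r * (1 - p) ** (n - r)

def occ_by_mixture(k, n, m, theta):
    # Occ(k | n, m, theta) as a binomial mixture of classical occupancy laws.
    return sum(bin_pmf(r, n, theta) * occ_classical(k, r, m) for r in range(n + 1))
```

The mixture form only requires classical occupancy probabilities, which is the sense in which Theorem 12 supports computation of the extended distribution.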

Theorem 13 (Binomial Distribution is a Poisson Mixture of Occupancy Distributions)

The binomial distribution satisfies the equation:

$$\mathrm{Bin}\left(k\Big{|}m,1-\mathrm{exp}\left(-\frac{\lambda \theta }{m}\right)\right)=\sum_{r=0}^{\infty }\mathrm{Pois}\left(r|\lambda \right)\cdot \mathrm{Occ}\left(k|r,m,\theta \right).$$

With \(\theta >0\) this can be written to yield a binomial distribution with parameter \(0<\gamma <1\) as:

$$\mathrm{Bin}\left(k|m,\gamma \right)=\sum_{r=0}^{\infty }\mathrm{Pois}\left(r\Bigg{|}m\cdot \frac{\left|\mathrm{ln}\left(1-\gamma \right)\right|}{\theta }\right)\cdot \mathrm{Occ}\left(k|r,m,\theta \right).$$

Theorem 13 gives a mixture characterisation of the binomial distribution as a Poisson mixture of underlying occupancy distributions. We have already noted that the occupancy distribution is a generalisation of the binomial distribution, so this gives us yet another characterisation of the binomial distribution. Theorems 12–13 are extensions of well-known characterisations of the binomial and Poisson distributions. Taking \(m\to \infty\) gives \(\mathrm{Occ}\left(k|r,m,\gamma \right)\to \mathrm{Bin}\left(k|r,\gamma \right)\) so that Theorem 12 reduces to the well-known mixture:

$$\mathrm{Bin}\left(k|n,\gamma \theta \right)=\sum_{r=0}^{n}\mathrm{Bin}\left(r|n,\theta \right)\cdot \mathrm{Bin}\left(k|r,\gamma \right).$$

Taking \(m\to \infty\) and \(\gamma \to 0\) with \(m\gamma \to \lambda \theta\) we can apply L’Hôpital’s rule to show that:

$$m\cdot \frac{\left|\mathrm{ln}\left(1-\gamma \right)\right|}{\theta }=\frac{m\gamma }{\theta }\cdot \frac{\left|\mathrm{ln}\left(1-\gamma \right)\right|}{\gamma }\to \frac{m\gamma }{\theta }\cdot \frac{1}{1-\gamma }\to \lambda .$$

Since these limits give \(\mathrm{Occ}\left(k|r,m,\theta \right)\to \mathrm{Bin}\left(k|r,\theta \right)\) and \(\mathrm{Bin}\left(k|m,\gamma \right)\to \mathrm{Pois}\left(k|\lambda \theta \right)\) we see that the mixture equation in Theorem 13 (second equation) reduces to the well-known mixture:

$$\mathrm{Pois}\left(k|\lambda \theta \right)=\sum_{r=0}^{\infty }\mathrm{Pois}\left(r|\lambda \right)\cdot \mathrm{Bin}\left(k|r,\theta \right).$$

The extended occupancy distribution provides a useful extension to the binomial distribution, with mixture properties that connect it to other common discrete distributions arising in statistical practice. In particular, we have established that a Poisson mixture of occupancy distributions yields the binomial distribution, which provides a natural link between these two distributions. By viewing the occupancy number as the state of a Markov chain, we are able to establish its properties either via the theory of Markov chains or by direct algebraic analysis of the mass function.

As with the occupancy distribution, it is also possible to derive interesting mixture results using the negative occupancy distribution, which generalise well-known mixture representations for the negative binomial distribution. We will derive the mixture characterisation directly through the mass function in this case.

Theorem 14 (Negative Occupancy Distribution is a Negative Binomial Mixture of Negative Occupancy Distributions)

The negative occupancy distribution satisfies the equation:

$$\mathrm{NegOcc}\left(t|m,k,\gamma \theta \right)=\sum_{r=0}^{t}\mathrm{NegBin}\left(t-r|k+r,1-\theta \right)\cdot \mathrm{NegOcc}\left(r|m,k,\gamma \right).$$

In the special case where \(\gamma =1\) we obtain the useful mixture equation:

$$\mathrm{NegOcc}\left(t|m,k,\theta \right)=\sum_{r=0}^{t}\mathrm{NegBin}\left(t-r|k+r,1-\theta \right)\cdot \mathrm{NegOcc}\left(r|m,k\right).$$

Theorem 14 is the negative occupancy analogue to Theorem 12 for the occupancy distribution. This theorem also has a simple intuition when we interpret it as involving two independent events resulting in “falling through” the bins. The negative binomial distribution in the mixture gives the component of the excess hitting time that is attributable to the new event, and each of these terms is multiplied by the negative occupancy distribution without the probability of that event incorporated. We have stated the theorem in a general form, but the most important case occurs when \(\gamma =1\), which allows us to write the negative occupancy distribution as a negative binomial mixture of the classical negative occupancy distribution. Again, this latter mixture is especially useful for computational purposes.
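The \(\gamma =1\) case of Theorem 14 is also straightforward to verify numerically. The sketch below is our own: it computes the negative occupancy mass from the hitting-time relation for the pure-birth occupancy chain, \({\mathbb{P}}\left({T}_{k}=t\right)={\mathbb{P}}\left({K}_{k+t-1}=k-1\right)\cdot \theta \left(m-k+1\right)/m\), with the occupancy mass in inclusion-exclusion form, and then checks the negative binomial mixture:

```python
from math import comb

def occ_extended(k, n, m, theta):
    """Extended occupancy mass Occ(k|n,m,theta) via inclusion-exclusion."""
    return comb(m, k) * sum((-1) ** (k - j) * comb(k, j)
                            * (1 - theta * (m - j) / m) ** n
                            for j in range(k + 1))

def negocc(t, m, k, theta):
    """NegOcc(t|m,k,theta): the occupancy number first reaches k at ball
    k + t.  Uses the pure-birth hitting-time relation."""
    if k == 0:
        return 1.0 if t == 0 else 0.0
    return occ_extended(k - 1, k + t - 1, m, theta) * theta * (m - k + 1) / m

def negbin(x, r, q):
    """NegBin(x|r,q): x failures before the r-th success, failure prob. q."""
    return comb(r + x - 1, x) * (1 - q) ** r * q ** x

# Theorem 14 with gamma = 1: negative binomial mixture of the classical
# (theta = 1) negative occupancy distribution.
m, k, theta = 4, 3, 0.6
for t in range(8):
    lhs = negocc(t, m, k, theta)
    rhs = sum(negbin(t - r, k + r, 1 - theta) * negocc(r, m, k, 1.0)
              for r in range(t + 1))
    assert abs(lhs - rhs) < 1e-12
```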

We have seen that the negative occupancy distribution generalises the negative binomial distribution, so it is useful to compare the mixture characterisation in Theorem 14 to known characterisations of the negative binomial distribution. Taking \(m\to \infty\) in Theorem 14 gives \(\mathrm{NegOcc}\left(t|m,k,\gamma \right)\to \mathrm{NegBin}\left(t|k,1-\gamma \right)\), so that Theorem 14 reduces asymptotically to the well-known negative binomial mixture:

$$\mathrm{NegBin}\left(t|k,1-\gamma \theta \right)=\sum_{r=0}^{t}\mathrm{NegBin}\left(t-r|k+r,1-\theta \right)\cdot \mathrm{NegBin}\left(r|k,1-\gamma \right).$$

Each of the above mixtures extends known mixture characterisations for the binomial, Poisson, or negative binomial distributions. To complete our analysis of mixture characterisations, we will derive one final mixture result using the spillage distribution. This mixture does not extend any other well-known mixture results. (Taking \(m\to \infty\) reduces the spillage distribution to a point-mass distribution on zero, so in this limiting case the mixture reduces to a trivial assertion that \(\mathrm{Bin}\left(s|n,\theta \right)=\mathrm{Bin}\left(s|n,\theta \right)\).)

Theorem 15 (Binomial Distribution is a Spillage Mixture of Occupancy Distributions)

The binomial distribution satisfies the equation:

$$\mathrm{Bin}\left(s|n,\theta \right)=\sum_{k=0}^{s}\mathrm{Spillage}\left(s-k\Bigg{|}n,k,m\cdot \frac{1-\theta }{\theta }\right)\cdot \mathrm{Occ}\left(k|n,m,\theta \right).$$

We now have a reasonably complete set of mixture results that relate the various occupancy distributions to other well-known discrete distributions. In particular, the most useful mixture results here are Theorems 12 and 14, which allow us to generate the occupancy distribution and the negative occupancy distribution as mixtures of their classical versions. (In the first case this is a binomial mixture and in the second case it is a negative binomial mixture.) These results are useful for computational purposes, since they allow us to generate the extended distributions from their classical counterparts.

7 Application to “Coverage” Analysis in Bootstrapping/Resampling Problems

The extended occupancy problem and the three distributions discussed in this paper arise when we undertake simple-random-sampling with replacement (SRSWR) from a finite set of objects. One statistical context in which this occurs is when we use resampling methods such as “bootstrap” estimation (see e.g., Hall 1992). In this context we may wish to examine the coverage of the data points in the original sample, which leads us to the classical occupancy problem. Consequently, various aspects of the coverage of the original sample are described by the occupancy distributions we have looked at in this paper. In particular, the marginal and conditional distributions of the coverage of the original sample are described by the extended occupancy distribution, and our other distributions describe related aspects of the problem.

The formal description of occupancy analysis in resampling problems is fairly straightforward. In order to stick with our existing notation throughout this paper, suppose we have an initial sample of data points \({\varvec{x}}=\left({x}_{1},\dots ,{x}_{m}\right)\) and we decide to resample \(n\) data points via SRSWR. We note that it is usual in bootstrapping to generate resamples that are of the same size as the original sample (i.e., with \(n=m\)). We will proceed in greater generality because our coverage analysis applies just as well to resampling that does not impose this restriction. For simplicity, we will also assume that all the original data points are distinct, such as would occur when the underlying distribution is continuous. (Analysis can be extended to the case where there are duplicate values in the original sample, but certain aspects of the problem then go beyond the extended occupancy problem.) Formally, bootstrapping works by generating a resampled data vector \({\varvec{y}}=\left({y}_{1},\dots ,{y}_{n}\right)\) where the elements are:

$${y}_{i}={x}_{{U}_{i}} \qquad\qquad {U}_{1},\dots ,{U}_{n} \sim \mathrm{IID\, U}\left\{1,\dots ,m\right\}.$$

Let \({\mathcal{J}}_{n}\equiv \bigcup_{i=1}^{n}\left\{{U}_{i}\right\}\subseteq \left\{1,\dots ,m\right\}\) be the set of data points that were resampled (as described by their indices) and let \({K}_{n}\equiv \left|{\mathcal{J}}_{n}\right|\) be the size of this set. Let \({T}_{k}\equiv \mathrm{min}\left\{t=0,1,2,\dots \,|\,{K}_{k+t}=k\right\}\) be the number of excess resampled values required to ensure that the resampled vector includes \(k\) different data points from the original sample. The set \({\mathcal{J}}_{n}\) describes the “coverage” of the original data points in the resample, and the quantity \({K}_{n}\) tells us the number of data points in the original sample that appear in the resample.

Since the indices for the resampled values are independent uniform random variables over the original indices for the data points, the “coverage” of the original sample is described by the classical occupancy problem. (Our notation \({K}_{n}\) and \({T}_{k}\) for the coverage of the original sample and its excess hitting time reflects the fact that these are the occupancy number and excess hitting time in the classical occupancy problem.) In particular, the number of resampled data points \({K}_{n}\) is an occupancy number from the classical occupancy problem, so it follows the marginal and conditional distributions:

$$\begin{aligned}{\mathbb{P}}\left({K}_{n}=k\right)&=\mathrm{Occ}\left(k|n,m\right)\text{,} \, \\ {\mathbb{P}}\left({K}_{\acute{n}+n}=k|{K}_{\acute{n}}=r\right)&=\mathrm{Occ}\left(k-r|n,m-r,1-r/m\right).\end{aligned}$$
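These marginal and conditional forms are consistent via the Chapman-Kolmogorov equation: mixing the conditional distribution over the marginal at an intermediate sample size must return the marginal. The sketch below is our own check of this for a small case, with the occupancy masses computed from the standard Stirling-number and inclusion-exclusion forms:

```python
from math import comb

def stirling2(n, k):
    """Stirling number of the second kind S(n, k)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def occ_classical(k, n, m):
    """Classical occupancy mass Occ(k|n,m) = (m)_k * S(n,k) / m^n."""
    falling = 1
    for i in range(k):
        falling *= m - i
    return falling * stirling2(n, k) / m ** n

def occ_extended(k, n, m, theta):
    """Extended occupancy mass via inclusion-exclusion."""
    return comb(m, k) * sum((-1) ** (k - j) * comb(k, j)
                            * (1 - theta * (m - j) / m) ** n
                            for j in range(k + 1))

# Chapman-Kolmogorov:
# Occ(k|n1+n2, m) = sum_r Occ(r|n1, m) * Occ(k-r|n2, m-r, 1-r/m)
m, n1, n2 = 5, 3, 4
for k in range(min(n1 + n2, m) + 1):
    lhs = occ_classical(k, n1 + n2, m)
    rhs = sum(occ_classical(r, n1, m)
              * occ_extended(k - r, n2, m - r, 1 - r / m)
              for r in range(min(k, n1) + 1))
    assert abs(lhs - rhs) < 1e-12
```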

Similarly, we can easily see that \({T}_{k}\) is an excess hitting time in the classical occupancy problem, so it follows the marginal and conditional distributions:

$$\begin{aligned}{\mathbb{P}}\left({T}_{k}=t\right)&=\mathrm{NegOcc}\left(t|m,k\right)\text{,} \, \\ {\mathbb{P}}\left({T}_{\acute{k}+k}=t|{T}_{\acute{k}}=r\right)&=\mathrm{NegOcc}\left(t-r|m-\acute{k},k,1-\acute{k}/m\right).\end{aligned}$$

Since resampling follows the classical occupancy problem, in this context the “spillage” is trivial; we always have \({n}_{\mathrm{eff}}=n\) so \({n}_{\mathrm{eff}}-{K}_{n}=n-{K}_{n}\) with probability one.

Bootstrapping involves taking some large number of resamples from an original sample vector in order to estimate the sampling distribution of a quantity of interest. Suppose we generate bootstrap simulations \(s=1,\dots ,S\) giving corresponding resample vectors \({{\varvec{y}}}^{\left(1\right)},\dots ,{{\varvec{y}}}^{\left(S\right)}\) using the above method. If \(S\) is large then we can rely on the strong law of large numbers to assure ourselves that the empirical distributions of the coverage quantities for these resamples will converge almost surely to their true distributions:

$$\begin{aligned}&\frac{1}{S}\sum\limits_{s=1}^{S}{\mathbb{I}}\left({K}_{n}^{\left(s\right)}=k\right)\stackrel{\mathrm{a}.\mathrm{s}.}{\to }\mathrm{Occ}\left(k|n,m\right)\text{,} \\& \frac{1}{S}\sum\limits_{s=1}^{S}{\mathbb{I}}\left({T}_{k}^{\left(s\right)}=t\right)\stackrel{\mathrm{a}.\mathrm{s}.}{\to }\mathrm{NegOcc}\left(t|m,k\right).\end{aligned}$$

In Fig. 2 below we show the coverage of \(S=30\) bootstrap resamples of an original sample containing \(m=25\) data points. Each simulated resample is shown by one of the larger squares, and the occupancy number for each resample is the number of red squares in the larger square. (Since we are interested only in the coverage of the original sample here, the figure does not show how many times each value was resampled, only whether it was resampled at least once.) As we take more and more resamples (i.e., as \(S\to \infty\)), the empirical distribution of the occupancy numbers will converge to the classical occupancy distribution with \(n=m=25\).

Fig. 2

Coverage of simulated bootstrap resamples for an original sample of m = 25 data points using n = 25 resampled points. There are S = 30 squares showing simulated resamples. Each red square in the larger square represents a data point that is included in the resample; the occupancy number for each resample is the number of red squares

Analysis of coverage of the original sample has been used in bootstrapping analysis in order to examine issues of bias that arise from conflation of training and testing. For example, Efron and Tibshirani (1997) examine bootstrapping and cross-validation methods for estimation of error rates in binary regression modelling. In order to correct for bias arising from training and testing with the same data, they examine a special type of bootstrapping analysis formulated in Efron (1983) called the “0.632+ bootstrap method” (see also Efron 1986). Though they give their own explanation of the method, here we will examine it in terms of our own coverage analysis. To do this, we will begin by noting that the expectation and variance of the coverage proportion are:

$$\begin{aligned}&{\mathbb{E}}\left(\frac{{K}_{n}}{m}\right)=1-{\left(1-\frac{1}{m}\right)}^{n}\text{,} \\& {\mathbb{V}}\left(\frac{{K}_{n}}{m}\right)=\frac{1}{m}\left[\left(m-1\right){\left(1-\frac{2}{m}\right)}^{n}+{\left(1-\frac{1}{m}\right)}^{n}-m{\left(1-\frac{1}{m}\right)}^{2n}\right].\end{aligned}$$
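The exact moments of the coverage proportion, \({\mathbb{E}}\left({K}_{n}/m\right)=1-{\left(1-1/m\right)}^{n}\) and the corresponding variance, can be confirmed by brute force for a tiny case by enumerating all \({m}^{n}\) equally likely allocations. (The sketch below is our own, not from the paper.)

```python
from itertools import product

# Enumerate all m^n equally likely resampling outcomes for a tiny case and
# compare the exact moments of the coverage proportion K_n/m with the
# closed-form expressions.
m, n = 3, 4
props = [len(set(w)) / m for w in product(range(m), repeat=n)]
mean = sum(props) / len(props)
var = sum(p * p for p in props) / len(props) - mean ** 2

mean_formula = 1 - (1 - 1 / m) ** n
var_formula = ((m - 1) * (1 - 2 / m) ** n + (1 - 1 / m) ** n
               - m * (1 - 1 / m) ** (2 * n)) / m
assert abs(mean - mean_formula) < 1e-12
assert abs(var - var_formula) < 1e-12
```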

If we take \(n\to \infty\) and \(m\to \infty\) subject to some fixed limiting ratio \(n/m\to \lambda\) then we have the asymptotic equivalence:

$${\mathbb{E}}\left(\frac{{K}_{n}}{m}\right) \sim 1-{e}^{-\lambda} \qquad\qquad {\mathbb{V}}\left(\frac{{K}_{n}}{m}\right) \sim \frac{{e}^{-\lambda }-\left(1+\lambda \right){e}^{-2\lambda }}{m}.$$

Using the standard bootstrapping method where we use a resample that is the same size as the original sample (i.e., with \(n=m\) so that \(\lambda =1\)) we have convergence \({K}_{n}/m\to 1-1/e\approx 0.632\) (using the strong law of large numbers this is almost sure convergence). This is empirically evident in the resample simulations in Fig. 2.

For any particular data point \(i\), it can similarly be shown that \({\mathbb{P}}\left(i\in {\mathcal{J}}_{n}\right)\to 1-1/e\). Efron and Tibshirani (1997) note that error rate analysis suffers from bias when a data point used for prediction purposes (the “test point”) is also included in the training sample, and this is usually dealt with in cross-validation by using the “leave one out” method (i.e., the test point is left out of the training set used for constructing its prediction). However, in bootstrapping analysis the resample will include the test point with approximate probability \(0.632\). The “0.632+ bootstrap method” formulates an estimator that takes a weighted average of two estimators with known upward and downward biases, where the probabilities of inclusion/exclusion of the test point are used as weightings in the method. Our purpose here is not to recommend this method. (Indeed, the present author has a great deal of scepticism towards bootstrapping methods, but that is beyond the scope of this paper.) It is simply to note that analysis of coverage of the original data points is an important issue in the analysis of bootstrapping and resampling, and it leads to methods that take account of coverage probabilities pertaining to the original sample.

Suppose now that we look more broadly than the bootstrap, at resampling methods that may use a different number of resample values than were in the original sample (i.e., allowing for the case where \(n\ne m\)). One natural question in this context is how many points one should resample in order to get some stipulated minimum probability of a particular level of coverage of the original sample (e.g., including at least \(k\) distinct points from the original sample). The probability that a resample of size \(n\) covers at least \(k\) data points in the original sample is:

$${\mathbb{P}}\left({K}_{n}\ge k\right)={\mathbb{P}}\left({T}_{k}\le n-k\right)=\sum_{r=k}^{m}\mathrm{Occ}\left(r|n,m\right)=\sum_{s=0}^{n-k}\mathrm{NegOcc}\left(s|m,k\right).$$

Given some stipulated minimum probability \(0<\phi <1\) we can use the occupancy distributions to find the required resample size for this problem:

$$\begin {aligned}{\widehat{n}}_{k}\left(\phi \right)\, &\equiv \mathrm{min}\left\{n\in {\mathbb{N}}\big{|}{\sum}_{r=k}^{m}\mathrm{Occ}\left(r\big{|}n,m\right)\ge \phi \right\}\\&=\mathrm{min}\left\{n\in {\mathbb{N}}\big{|}{\sum}_{r=k}^{m}{\left(m\right)}_{r}\cdot S\left(n,r\right)\ge \phi {m}^{n}\right\}.\end {aligned}$$

The value \({\widehat{n}}_{k}\left(\phi \right)\) is the smallest number of resampled values required to give a probability of at least \(\phi\) of an occupancy number at least \(k\). Computation of this quantity allows an analyst to pre-determine the required resample size for a coverage requirement on the original sample. The special case where we seek full coverage of the original sample (i.e., \({K}_{n}=m\)) is a variation of the coupon-collector problem. Investigations of this kind can be of use if an analyst wishes to undertake resampling in a manner that is likely to give some specified level of coverage of the original sample.
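As an illustration, \({\widehat{n}}_{k}\left(\phi \right)\) can be computed by a direct scan over \(n\), using the Stirling-number form of the classical occupancy mass. (The sketch below is our own; the function names and the scan limit are arbitrary choices.)

```python
def stirling2(n, k):
    """Stirling number of the second kind S(n, k)."""
    S = [[0] * (k + 1) for _ in range(n + 1)]
    S[0][0] = 1
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            S[i][j] = j * S[i - 1][j] + S[i - 1][j - 1]
    return S[n][k]

def occ_classical(k, n, m):
    """Classical occupancy mass Occ(k|n,m) = (m)_k * S(n,k) / m^n."""
    falling = 1
    for i in range(k):
        falling *= m - i
    return falling * stirling2(n, k) / m ** n

def required_resample_size(m, k, phi, n_max=100_000):
    """Smallest n with P(K_n >= k) >= phi, found by scanning n upward.
    (The tail probability is monotone increasing in n.)"""
    for n in range(1, n_max + 1):
        if sum(occ_classical(r, n, m) for r in range(k, m + 1)) >= phi:
            return n
    raise ValueError("no resample size up to n_max achieves the target")
```

For example, with \(m=3\) original points, full coverage (\(k=3\)) with probability at least \(\phi =0.9\) requires \(\widehat{n}=9\) resampled points.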

8 Summary and Concluding Remarks

Our goal in this paper has been to derive and discuss three interesting distributions arising from the extended occupancy problem. This problem can be framed as a pure-birth Markov chain describing the evolution of the occupancy number as more and more balls are added to a fixed number of bins, with some fixed probability of occupancy for each ball. The three distributions we have examined arise to describe the behaviour of various aspects of this Markov chain — the occupancy number, the excess hitting time for the occupancy number, and the “spillage” describing the difference between the effective number of balls and the occupancy number. It is interesting that all three distributions involve the noncentral Stirling numbers of the second kind, and the first two distributions generalise other well-known discrete distributions.

Setting aside their statistical derivation, the mathematical form of these distributions is also interesting, insofar as each distribution can be framed as the distributional analogue to a well-known equation for the noncentral Stirling numbers of the second kind (i.e., each corresponds to a normed version of a summation result involving the noncentral Stirling numbers of the second kind). The occupancy distribution arises as the distributional analogue to the equation expressing a power-sum as a sum of falling factorials of one of the values, with the noncentral Stirling numbers of the second kind arising as the coefficients. The negative occupancy distribution arises as the distributional analogue to the ordinary generating function of the noncentral Stirling numbers of the second kind. Finally, the spillage distribution arises as the distributional analogue to the equation expressing the noncentral Stirling numbers of the second kind as a weighted sum of the (central) Stirling numbers of the second kind.

The occupancy distributions in this paper arise in contexts where we undertake simple-random-sampling with replacement from a finite set of items, and we then examine the “occupancy number” and related quantities pertaining to the sample. This also arises in bootstrapping and other resampling techniques, where we can use the occupancy distributions to describe the stochastic behaviour of various quantities looking at the coverage of the original sample.

We hope that this inquiry has given the reader an appreciation for the various ways that the noncentral Stirling numbers of the second kind arise in the extended occupancy problem, and has likewise given an appreciation for the fact that this simple problem generates distributions that provide analogues to a wide range of equations involving the noncentral Stirling numbers of the second kind. Both the statistical and mathematical aspects of these three distributions are interesting, and they provide a broad class of discrete distributional forms created from the noncentral Stirling numbers of the second kind. It is particularly interesting that our first two distributions provide generalisations of the binomial and negative binomial distributions, with an additional parameter \(0\le m\le \infty\) that has the effect of “squashing” the occupancy number when we impose a finite value. Imposing a finite number of bins on the extended occupancy problem will tend to give a lower value of the occupancy number, and thus a higher value of the excess hitting time, than would be the case if balls were allocated among an infinite number of bins. This also leads to a non-trivial distribution for the “spillage”, measuring the difference between the effective number of balls and the occupancy number. As a reference, we give some tables (Tables 1, 2, 3 and 4) below that summarise our three distributions, and summarise the mixture results involving these distributions.

Table 1 Summary of the occupancy distribution
Table 2 Summary of the negative occupancy distribution
Table 3 Summary of the spillage distribution
Table 4 Summary of mixture results involving the occupancy distributions