Abstract
The classical and extended occupancy distributions are useful for examining the number of occupied bins in problems involving random allocation of balls to bins. We examine the extended occupancy problem by framing it as a Markov chain and deriving the spectral decomposition of the transition probability matrix. We look at three distributions of interest that arise from the problem, all involving the noncentral Stirling numbers of the second kind. These distributions provide useful generalisations of the binomial and negative binomial distributions. We examine how these distributions relate to one another, and we derive recursive properties and mixture properties that characterise the distributions.
1 Introduction
The classical occupancy distribution is an important and underappreciated discrete probability distribution, which describes the behaviour of the number of occupied bins when we allocate \(n\in {\mathbb{N}}\) balls at random to \(m\in {\mathbb{N}}\) bins. Analysis of the distribution can be found in Harkness (1969), Uppuluri and Carpenter (1971), Johnson and Kotz (1977), Kolchin et al. (1978) and Holst (1986), and some further analysis and computational aspects are discussed in O’Neill (2021). The distribution is useful in problems involving sampling with replacement, and it is especially useful in the context of bootstrapping techniques, where it can be applied to find the probability of any given level of coverage of the original sample in a random resampling. The classical distribution covers the case where balls allocated to bins automatically “occupy” those bins. For reasons that will become clear later, it turns out to be very useful to generalise to the case where each ball has some fixed probability \(0\le \theta \le 1\) of “occupying” its allocated bin, and corresponding probability \(1-\theta\) of “falling through” the bin, so that it does not occupy the bin (see e.g., Uppuluri and Carpenter 1971; Samuel-Cahn 1974).
In this paper we will examine the extended occupancy problem, which seeks the marginal and conditional distributions of the “occupancy number” (counting the number of occupied bins) under conditions where the balls can fall through the bins with a fixed probability. We will derive three important distributional forms arising in this framework, and we will see how these distributions relate to one another. The first two distributions we examine are generalisations of the binomial and negative binomial distributions, and a third is a new distribution relating the extended occupancy distribution to the binomial in a useful way. All three distributions involve the noncentral Stirling numbers of the second kind, and are mathematically interesting forms that arise as norms of well-known summation formulae involving the Stirling numbers.
We examine the extended occupancy process by framing the sequence of occupancy numbers as a Markov chain, and analysing the transition matrix of the chain. To set up our analysis, we first describe the underlying mathematics of this stochastic process, based on a sequence of randomly allocated balls that can occupy or fall through the bins. Consider two independent sequences of random variables:
The first sequence represents balls allocated at random to \(m\) bins and the second sequence gives indicators of whether these balls “occupy” those bins (as opposed to “falling through” the bins). From these underlying sequences we define the occupancy of each ball by the values:
The outcome \({U}_{i}= \bullet\) means that the ball fell through its bin and so it makes no contribution to the occupancy, whereas an outcome \({U}_{i}=1,\dots ,m\) means that the ball occupies its allocated bin. For any number of balls \(n\) we define the occupied bin counts over bins \(\ell=1,\dots ,m\) and the corresponding occupancy number respectively by:
We have \(n\) balls in our problem, but there are \({n}_{\mathrm{eff}}\equiv n-{N}_{n,\bullet}=\sum_{i=1}^{n}{\mathbb{I}}\left({U}_{i}\ne \bullet\right)\) effective balls (i.e., balls that occupy their allocated bins). The occupancy number counts the number of bins that are occupied, which are the bin counts that are above zero.
The occupancy process is illustrated in Fig. 1 below, which shows a tabular arrangement of \(n=10\) balls randomly allocated to \(m=12\) bins. The bottom row of the figure shows the bin counts \({N}_{n,\ell}\), which add up the number of black squares in the columns above. The effective number of balls \({n}_{\mathrm{eff}}\) is obtained by counting the number of black squares in the whole figure, and the occupancy number \({K}_{n}\) is obtained by counting the number of bins with at least one occupying ball (i.e., the number of columns with at least one black square).
Fig. 1 Outcomes of \(n=10\) balls randomly allocated to \(m=12\) bins. Yellow squares show balls that fell through their bins and black squares show balls that occupy their bins. Counts for each bin are shown in the bottom row. There are \({n}_{\mathrm{eff}}=8\) effective balls (black squares) in this case and the occupancy number is \({K}_{n}=6\) (number of columns with at least one black square)
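To make this setup concrete, the process is easy to simulate. The sketch below (in Python; the function name and seed are our own, purely illustrative) allocates each ball uniformly at random, lets it occupy its bin with probability \(\theta\), and then reads off the bin counts \({N}_{n,\ell}\), the effective number of balls \({n}_{\mathrm{eff}}\), and the occupancy number \({K}_{n}\):

```python
import random

def simulate_occupancy(n, m, theta, seed=0):
    """Simulate one run of the extended occupancy process.

    Each of the n balls is allocated uniformly at random to one of the
    m bins, and occupies its bin with probability theta (otherwise it
    "falls through").  Returns the bin counts, the effective number of
    balls, and the occupancy number.
    """
    rng = random.Random(seed)
    counts = [0] * m
    for _ in range(n):
        bin_index = rng.randrange(m)   # uniform allocation U_i
        if rng.random() < theta:       # ball occupies rather than falls through
            counts[bin_index] += 1
    n_eff = sum(counts)                     # black squares in Fig. 1
    occupancy = sum(c > 0 for c in counts)  # columns with >= 1 black square
    return counts, n_eff, occupancy

counts, n_eff, k_n = simulate_occupancy(n=10, m=12, theta=0.8)
```

A single run reproduces the kind of outcome tabulated in Fig. 1; repeated runs give the empirical distribution of \({K}_{n}\).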
The above figure can be extended to accommodate more balls and/or bins, and the probability of a ball falling through its allocated bin can also be varied. In any case, we have now shown the mathematical foundation of the extended occupancy process and so we are in a position to state the extended occupancy problem, which seeks the marginal and conditional distributions of the occupancy number. Although we will ultimately be interested in three distributional forms arising in the extended occupancy process, our first task will be to derive a distribution form that solves this problem, and examine its properties.
Definition (The Extended Occupancy Problem)
For \(0\le t\le k\le \mathrm{min}\left(n,m\right)\) we wish to find distributional forms for the marginal and conditional probabilities:
This problem is an extension of the classical occupancy problem, which occurs when \(\theta =1\) (i.e., all balls occupy their allocated bins with probability one). □
2 The Occupancy Process and the Extended Occupancy Distribution
Our approach to the occupancy problem is to look at the stochastic process \(\left\{{K}_{n}|n=\mathrm{0,1},2,\dots \right\}\), which shows the evolution of the occupancy number as we add more balls to the process. This approach is also used in Harkness (1969) and in Uppuluri and Carpenter (1971). Each time we allocate one new ball, this ball will either occupy a bin that is not already occupied (increasing the occupancy number by one) or it will fall through its allocated bin or be allocated to a bin that is already occupied (leaving the occupancy number unchanged). Since balls are allocated to bins uniformly at random, the probability of each of these two outcomes depends on the present occupancy number, but not on which particular bins are occupied. Consequently, the conditional probability that a newly allocated ball increases the occupancy number, given the “history” of the process, depends only on the present occupancy number, so the chain obeys the Markov property. We formalise this argument, and give the resulting transition probability and transition probability matrix, in the following theorem.
Theorem 1 (Markov Characterisation)
Let \({{\varvec{u}}}_{n}\equiv \left({u}_{1},{u}_{2},\dots ,{u}_{n}\right)\) denote the outcomes of the first \(n\) balls in the series and let \({K}_{n}=t\) denote their occupancy number. Then the conditional probability for the occupancy number with one more allocated ball is:
This conditional probability depends on the allocation history only through \({K}_{n}\), so the process satisfies the Markov property —i.e., it is a Markov chain.
Corollary (Transition Probability Matrix)
For all the possible states \(k=\mathrm{0,1},2,\dots ,m\) the transition probability matrix for the Markov chain is the \(\left(m+1\right)\times \left(m+1\right)\) bidiagonal matrix:
(In accordance with standard conventions for Markov chains, we will index the elements of this matrix by the state values, so the first row and column will each use the zero index; i.e., the row and column indices will both run over \(i,j=\mathrm{0,1},\dots ,m\).)
Theorem 1 shows that the stochastic process \(\left\{{K}_{n}|n=\mathrm{0,1},2,\dots \right\}\) for the occupancy number is a Markov chain with a bidiagonal transition matrix, giving us a discrete “pure birth” process. (In the nomenclature of pure birth processes, an increase in the occupancy number is a “birth”.) The intuition behind this transition matrix is straightforward: if the existing occupancy number is \(t\) then the newly allocated ball increases the occupancy number by one so long as it is allocated to an unoccupied bin, and does not “fall through” that bin; the probability of allocation to an unoccupied bin is \(\left(1-t/m\right)\) and the probability that the ball does not fall through the bin is \(\theta\). If the newly allocated ball does fall through its allocated bin, or is allocated to a bin that is already occupied, the occupancy number does not increase.
The extended occupancy problem seeks both the marginal and conditional probabilities for the occupancy number. In the marginal problem our starting point for the chain is \({K}_{0}=0\), and in the conditional problem our starting point is \({K}_{\acute{n}}=t\). In either case, the required probabilities are easily obtained as appropriate elements of the powers of the transition matrix. Specifically, for all \(k,t=\mathrm{0,1},2,\dots ,m\) and all \(\acute{n},n=\mathrm{0,1},2,\dots\) we have:
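To illustrate, the transition matrix and its powers can be computed exactly in a few lines of Python (rational arithmetic; the function names and parameter values are illustrative). Row \(t=0\) of \({\mathbf{P}}^{n}\) gives the marginal law of \({K}_{n}\), and row \(t\) gives the conditional law starting from \(t\) occupied bins:

```python
from fractions import Fraction

def transition_matrix(m, theta):
    """(m+1) x (m+1) bidiagonal transition matrix; states are t = 0, 1, ..., m."""
    P = [[Fraction(0)] * (m + 1) for _ in range(m + 1)]
    for t in range(m + 1):
        birth = theta * Fraction(m - t, m)  # allocate to an empty bin and occupy it
        P[t][t] = 1 - birth                 # fall through, or bin already occupied
        if t < m:
            P[t][t + 1] = birth
    return P

def matrix_power(P, n):
    """Compute P^n by repeated multiplication (adequate for small m and n)."""
    size = len(P)
    R = [[Fraction(int(i == j)) for j in range(size)] for i in range(size)]
    for _ in range(n):
        R = [[sum(R[i][s] * P[s][j] for s in range(size)) for j in range(size)]
             for i in range(size)]
    return R

m, theta = 4, Fraction(1, 2)
Pn = matrix_power(transition_matrix(m, theta), 6)
marginal = Pn[0]     # row t = 0: marginal law of K_6
conditional = Pn[2]  # row t = 2: law of the occupancy number after 6 more
                     # balls, given that 2 bins are currently occupied
```

As a quick sanity check, \({[{\mathbf{P}}^{n}]}_{00}={(1-\theta)}^{n}\), since the chain stays at state \(0\) only if every ball falls through.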
Obtaining these probabilities requires us to take arbitrarily large powers of the transition matrix \(\mathbf{P}\), so it is useful to examine its spectral decomposition. It turns out that the transition matrix has eigenvectors that do not depend on the probability parameter \(\theta\), and this makes our matrix characterisation especially useful, leading to a simple spectral form for the distribution.
Theorem 2 (Spectral Decomposition)
The probability matrix \(\mathbf{P}\) has eigenvalue matrix:
Its (unscaled) eigenvector matrix \(\mathbf{v}\) and inverse eigenvector matrix \(\mathbf{w}={\mathbf{v}}^{-1}\) have elements given respectively by:
(Again, we remind the reader that we use indexing where the first row and column each use the zero index; this applies also to the indices for the eigenvalue and inverse eigenvalue matrices.) The columns of the eigenvector matrix (and its inverse matrix) are linearly independent, so the transition matrix is diagonalisable, with spectral decomposition \(\mathbf{P}=\mathbf{v}{\varvec{\Lambda}}\mathbf{w}\).
Theorem 2 gives the eigenvalue and eigenvector matrices of the transition matrix, which allows us to take arbitrarily large powers of this matrix using its spectral decomposition \({\mathbf{P}}^{n}=\mathbf{v}{{\varvec{\Lambda}}}^{n}\mathbf{w}\). Uppuluri and Carpenter (1971) derive this same spectral decomposition by way of the general form for the spectral decomposition of a bidiagonal matrix (i.e., a general “pure-birth” process). To apply the spectral decomposition, we let \({\mathbf{v}}_{k}\) denote the \(k\)th row of the eigenvector matrix and we let \({\mathbf{w}}_{k}\) denote the \(k\)th column of the inverse eigenvector matrix. We then have:
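The \(\theta\)-free structure of the eigenvectors can be checked directly. Since the transition matrix is upper bidiagonal, its eigenvalues are the diagonal entries \(1-\theta \left(1-j/m\right)\) for \(j=0,1,\dots ,m\), and solving \(\mathbf{P}\mathbf{x}={\lambda }_{j}\mathbf{x}\) componentwise gives the ratio \({x}_{t+1}/{x}_{t}=\left(j-t\right)/\left(m-t\right)\), so one convenient unscaled choice is \({x}_{t}=\binom{j}{t}/\binom{m}{t}\), which is free of \(\theta\) as claimed. A small exact check of this structure (in Python, with illustrative parameter values):

```python
from fractions import Fraction
from math import comb

def transition_matrix(m, theta):
    """(m+1) x (m+1) bidiagonal transition matrix of the occupancy chain."""
    P = [[Fraction(0)] * (m + 1) for _ in range(m + 1)]
    for t in range(m + 1):
        birth = theta * Fraction(m - t, m)
        P[t][t] = 1 - birth
        if t < m:
            P[t][t + 1] = birth
    return P

def eigenvector(m, j):
    """Unscaled eigenvector for the eigenvalue 1 - theta(1 - j/m).

    Components x_t = C(j, t)/C(m, t); these vanish for t > j and,
    notably, do not depend on theta.
    """
    return [Fraction(comb(j, t), comb(m, t)) for t in range(m + 1)]

m, theta = 5, Fraction(2, 3)
P = transition_matrix(m, theta)
checks = []
for j in range(m + 1):
    lam = 1 - theta * Fraction(m - j, m)
    x = eigenvector(m, j)
    Px = [sum(P[t][s] * x[s] for s in range(m + 1)) for t in range(m + 1)]
    checks.append(Px == [lam * xt for xt in x])
```

The exact equality of \(\mathbf{P}\mathbf{x}\) and \({\lambda }_{j}\mathbf{x}\) for every \(j\) confirms the diagonalisability asserted in Theorem 2 for this parameter choice.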
(Here we use the “falling factorials” \({\left(m\right)}_{t}=\prod_{i=0}^{t-1}\left(m-i\right)\) to expand binomial coefficients.) This gives a general form for the conditional probabilities in the extended occupancy problem. Taking \(t=0\) gives the marginal form:
where \(S\left(n,k,\phi \right)\) are the noncentral Stirling numbers of the second kind (see Appendix 1). This gives us a succinct form for the marginal probabilities arising in the occupancy problem.
In the above working, we offered the conditional form of the extended occupancy distribution as our most general form, with the marginal form occurring for \(t=0\). However, it is possible to rewrite the conditional form of the extended occupancy distribution using the marginal form with a corresponding variation in the number of bins and the probability of “falling through” a bin. To see this, we first note that —with a bit of algebra— it can be shown that:
Hence, an alternative form for the conditional occupancy probability is:
This is the same form as the marginal occupancy probability given above, but with an argument value of \(k-t\) occupied bins out of \(m-t\) bins, and with probability parameter \(\theta \left(1-t/m\right)\).
This result also has a simple intuition. To convert the conditional occupancy probability to a marginal occupancy probability, we can treat the problem as a marginal occupancy problem where we allocate all the new balls to \(m-t\) unoccupied bins, but we also have to reduce the probability parameter so that a new ball is considered to “fall through” its bin if it would have been allocated to one of the \(t\) bins already occupied by previous balls. In this method, the conditional probability of allocating a new ball to an already occupied bin is “folded into” the probability parameter, allowing us to write the conditional occupancy probability as a marginal occupancy probability. This result shows that both the marginal occupancy probability and the conditional occupancy probability can be written in the same distributional form, which can be used to solve the extended occupancy problem. In the next section, we will formalise this and look at three distributions that arise in our analysis.
Our above analysis gives solutions to the extended occupancy problem. We can make these solutions clearer and more succinct by formally defining the extended occupancy distribution as a parametric class and naming its parameters.
Definition (The Extended Occupancy Distribution)
This is a discrete distribution with probability mass function given by:
where \(m\in \overline{\mathbb{N} }\) is the space parameter (number of bins), \(n\in {\mathbb{N}}\) is the size parameter (number of balls), and \(0<\theta \le 1\) is the probability parameter. In the special case where \(\theta =1\) the distribution reduces to the classical occupancy distribution \(\mathrm{Occ}\left(k|n,m\right)\equiv \mathrm{Occ}\left(k|n,m,1\right)\). □
As shown above, both the marginal and conditional distributions arising in the extended occupancy problem have the same distributional form (but with different parameters). Using the notation introduced in our definition of the extended occupancy distribution, the marginal and conditional probabilities of interest are:
The special case where \(\theta =1\) leads to the classical occupancy distribution for the marginal distribution, and in this case the distribution can be derived by a combinatorial argument (see e.g., O’Neill 2021). The main value of extending the classical occupancy distribution to the extended occupancy distribution is that the latter is “closed under conditioning”, by which we mean that this family accommodates both the marginal and conditional distributions of the occupancy number. The extended occupancy distribution has been examined by a number of authors including Park (1972), Johnson and Kotz (1977, Section 3.3, pp. 139–146), Samuel-Cahn (1974) and Holst (1986). Broader extension to general occupancy problems and corresponding distributions can be found in Charalambides (2005).
Remark 1
Mathematically, the mass function of the extended occupancy distribution arises from the expansion \({\left(m+\phi \right)}^{n}=\sum_{k=0}^{n}{\left(m\right)}_{k}\cdot S\left(n,k,\phi \right)\) for the non-central Stirling numbers of the second kind (see Appendix 1). Each of the terms in this sum is non-negative, and those terms constitute a kernel for the mass function of the extended occupancy distribution. □
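The norming role of this expansion can be verified numerically. The sketch below computes the noncentral Stirling numbers by the standard triangular recursion \(S\left(n,k,\phi \right)=S\left(n-1,k-1,\phi \right)+\left(k+\phi \right)S\left(n-1,k,\phi \right)\), forms the candidate mass function with \(\phi =m\left(1-\theta \right)/\theta\) (so that \(m+\phi =m/\theta\)), and checks it exactly against the distribution obtained by iterating the Markov chain of Theorem 1 (function names and parameter values illustrative):

```python
from fractions import Fraction
from functools import lru_cache

def occ_pmf_stirling(k, n, m, theta):
    """Occ(k | n, m, theta) as the k-th term of the expansion of (m + phi)^n."""
    phi = m * (1 - theta) / theta

    @lru_cache(maxsize=None)
    def S(n, k):
        # noncentral Stirling numbers of the second kind, S(n, k, phi)
        if n == 0:
            return Fraction(int(k == 0))
        if k < 0 or k > n:
            return Fraction(0)
        return S(n - 1, k - 1) + (k + phi) * S(n - 1, k)

    falling = Fraction(1)
    for i in range(k):          # falling factorial (m)_k
        falling *= m - i
    return falling * S(n, k) / (m + phi) ** n

def occ_pmf_chain(n, m, theta):
    """Exact distribution of K_n by iterating the Markov chain of Theorem 1."""
    p = [Fraction(0)] * (m + 1)
    p[0] = Fraction(1)
    for _ in range(n):
        q = [Fraction(0)] * (m + 1)
        for t in range(m + 1):
            birth = theta * Fraction(m - t, m)
            q[t] += (1 - birth) * p[t]
            if t < m:
                q[t + 1] += birth * p[t]
        p = q
    return p

n, m, theta = 7, 5, Fraction(3, 4)
chain = occ_pmf_chain(n, m, theta)
stirling = [occ_pmf_stirling(k, n, m, theta) for k in range(m + 1)]
```

Exact agreement of the two computations illustrates that the terms of the expansion do indeed norm to one.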
The occupancy distribution provides a natural extension to the binomial distribution, insofar as it takes the count of the “effective balls” and “squashes” this number down to the occupancy number by counting only those effective balls that are not duplicating the occupancy of a bin. In the next section we will see that the occupancy distribution actually provides a generalisation of the binomial distribution. However, for the moment, it is worth comparing the forms of the mass functions of the two distributions. To do this, it is quite useful to write the mass function of the occupancy distribution in an alternative form as a product of the binomial mass function multiplied by an adjustment term involving the scaled Stirling function (see Appendix 1):
This form shows a close resemblance between the mass functions of these two distributions. Moreover, the monotonicity and limit properties of the scaled Stirling function (see Lemma 1 in Appendix 1) allow us to obtain useful properties of the occupancy distribution.
A full account of the properties of the extended occupancy distribution is beyond the scope of this paper. Nevertheless, it is worth giving some basic properties including its moments and asymptotic form, since these are useful for computational purposes. As with many discrete distributions, the moments of the extended occupancy distribution are simplest when presented through the factorial moments. These factorial moments yield corresponding functions for the raw and central moments, which can be computed with a reasonable amount of algebra. We will go as far as the kurtosis of the distribution, noting that the form of this moment is already quite cumbersome. We will also show the asymptotic form of important moments. Higher-order raw and central moments can be computed from the factorial moments, but they are not particularly illuminating.
Theorem 3 (Factorial and Raw Moments)
Letting \({E}_{r}\equiv {\left(1-\theta r/m\right)}^{n}\), we have:
We can see from Theorem 3 that the occupancy distribution gives a simple form for the factorial moments, in terms of the quantities \({E}_{r}\). (This notation comes in handy below when we write the central moments of the distribution.) Since \({\left(m-{K}_{n}\right)}_{r}\) is a polynomial in \({K}_{n}\), the factorial moments can be used to derive the raw and central moments. The algebra is cumbersome, so for brevity we state them here as corollaries to the theorem above, without further derivation.
Corollary (Central Moments)
The extended occupancy distribution has mean, variance, skewness and kurtosis given respectively by:
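These moment results can be sanity-checked against the exact distribution. The quantity \({E}_{r}={\left(1-\theta r/m\right)}^{n}\) has a direct interpretation: any fixed set of \(r\) bins is entirely unoccupied with probability \({E}_{r}\), so the factorial moments take the form \({\mathbb{E}}\left[{\left(m-{K}_{n}\right)}_{r}\right]={\left(m\right)}_{r}{E}_{r}\), which gives the mean \(m\left(1-{E}_{1}\right)\) and (after some algebra) the variance \(m\left(m-1\right){E}_{2}+m{E}_{1}-{m}^{2}{E}_{1}^{2}\). A minimal exact check of these formulae (illustrative values, rational arithmetic):

```python
from fractions import Fraction

def occ_pmf_chain(n, m, theta):
    """Exact distribution of K_n by iterating the occupancy Markov chain."""
    p = [Fraction(0)] * (m + 1)
    p[0] = Fraction(1)
    for _ in range(n):
        q = [Fraction(0)] * (m + 1)
        for t in range(m + 1):
            birth = theta * Fraction(m - t, m)
            q[t] += (1 - birth) * p[t]
            if t < m:
                q[t + 1] += birth * p[t]
        p = q
    return p

def falling(x, r):
    out = Fraction(1)
    for i in range(r):
        out *= x - i
    return out

n, m, theta = 9, 6, Fraction(2, 5)
pmf = occ_pmf_chain(n, m, theta)
E = {r: (1 - theta * Fraction(r, m)) ** n for r in (1, 2, 3)}

# factorial moments E[(m - K_n)_r] versus the closed form (m)_r * E_r
fact_exact = {r: sum(pmf[k] * falling(m - k, r) for k in range(m + 1))
              for r in (1, 2, 3)}
fact_formula = {r: falling(m, r) * E[r] for r in (1, 2, 3)}

mean = sum(k * pmf[k] for k in range(m + 1))
var = sum(k * k * pmf[k] for k in range(m + 1)) - mean ** 2
mean_formula = m * (1 - E[1])
var_formula = m * (m - 1) * E[2] + m * E[1] - m ** 2 * E[1] ** 2
```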
Corollary (Asymptotic Central Moments)
As \(n\to \infty\) we have the asymptotic equivalence \({E}_{r} \sim {e}^{-\theta rn/m}\) which gives the asymptotic forms:
If \(n\to \infty\) and \(m\to \infty\) in a way that yields a fixed finite limit for \(n/m\) then we have \({\gamma }_{n,m}\to 0\) and \({\kappa }_{n,m}\to 3\) (so the distribution is asymptotically unskewed and mesokurtic). □
Corollary (Asymptotic Central Moments as \(m\to \infty\))
As \(m\to \infty\) we have the asymptotic equivalence \({m}^{a}{E}_{r}^{b} \sim \sum_{i=0}^{a}\left(\begin{array}{c}bn\\ i\end{array}\right){\left(-1\right)}^{i}{\left(r\theta \right)}^{i}{m}^{a-i}\) which gives the asymptotic forms:
The moments of the extended occupancy distribution give us a reasonable sense of the shape of the distribution. In particular, we see that —under broad limit conditions— the distribution is asymptotically unskewed and mesokurtic. In fact, this is just a partial aspect of a powerful limit result for general occupancy distributions given in Hwang and Janson (2008). If \(n\to \infty\) and \(m\to \infty\) in such a way that \({\sigma }_{n,m}^{2}\to \infty\) (having a fixed finite limit for \(n/m\) is a sufficient condition for this convergence) then the mass function for the extended occupancy distribution converges uniformly to the normal density with the same mean and variance. This result can be used to approximate the occupancy distribution for large values of \(n\) and \(m\) where it is not feasible to compute the distribution (due to difficulties computing the Stirling numbers of the second kind for large input values).
3 The Extended Occupancy Distribution Generalises the Binomial Distribution
Here we will show that the occupancy distribution provides a generalisation of the binomial distribution (i.e., it subsumes the binomial as a special case), which will later allow us to frame various properties and mixture characterisations as extensions of well-known characterisations for the binomial distribution. The attentive reader may already have noticed that our definition of the extended occupancy distribution allows the space parameter (number of bins) to occur in the extended domain \(m\in \overline{\mathbb{N} }\), which allows the value \(m=\infty\). In this latter case we define the distribution by its limit \(\mathrm{Occ}\left(k|n,\infty ,\theta \right)\equiv {\mathrm{lim}}_{m\to \infty }\mathrm{Occ}\left(k|n,m,\theta \right)=\mathrm{Bin}\left(k|n,\theta \right)\). (This case forms part of our definition of the extended occupancy distribution, but we omitted it from the above definition in order to discuss it in more detail here.)
Harkness (1969) notes the similarity of the occupancy distribution to the binomial distribution (pp. 112–114). We will prove the special case identified above formally in a moment, but the simplest way to see that this limit gives the binomial distribution is to observe that the transition probability matrix converges to the infinite dimensional matrix:
The reader will recognise this matrix as the transition probability matrix for a Bernoulli process, and its powers give matrices with elements:
Thus, starting in state \({K}_{0}=0\) and taking \(n\) steps (i.e., allocating \(n\) balls at random) gives the state \({K}_{n}\) having a binomial distribution with size parameter \(n\) and probability parameter \(\theta\). We have derived the result heuristically here, but it can be formally established either via analysis of the limiting properties of Markov chains, or by purely algebraic analysis on the established probability mass function for the extended occupancy distribution.
Theorem 4a (Generalisation of the Binomial Distribution)
The occupancy distribution satisfies the limiting form:
This theorem formally establishes that the extended occupancy distribution is a generalisation of the binomial distribution. Intuitively, the limiting result reflects the fact that, with infinite bins, there is zero probability that any two balls will fall in the same bin. Thus, the occupancy number is then the “effective” number of balls that have not “fallen through” their allocated bins, which is merely a count of independent Bernoulli random variables with fixed probability. Indeed, going back to our initial setup for the occupancy problem, we note that the number of effective balls in the problem is \({n}_{\mathrm{eff}}=\sum_{i=1}^{n}{\mathbb{I}}\left({U}_{i}\ne \bullet\right)=\sum_{i=1}^{n}{\mathbb{I}}\left({Q}_{i}=1\right)\), with the underlying values \({Q}_{1},{Q}_{2},{Q}_{3},\dots \sim \mathrm{IID \, Bern}\left(\theta \right)\).
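The limit can also be observed numerically: holding \(n\) and \(\theta\) fixed and increasing \(m\), the occupancy mass function (computed here in floating point by iterating the chain) approaches the binomial mass function. A rough sketch, with illustrative values:

```python
from math import comb

def occ_pmf_chain(n, m, theta):
    """Distribution of K_n by iterating the occupancy chain (floating point)."""
    p = [0.0] * (m + 1)
    p[0] = 1.0
    for _ in range(n):
        q = [0.0] * (m + 1)
        for t in range(m + 1):
            birth = theta * (m - t) / m
            q[t] += (1 - birth) * p[t]
            if t < m:
                q[t + 1] += birth * p[t]
        p = q
    return p

def binomial_pmf(k, n, theta):
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

n, theta = 5, 0.3
errors = {}
for m in (10, 100, 1000):
    occ = occ_pmf_chain(n, m, theta)
    errors[m] = max(abs(occ[k] - binomial_pmf(k, n, theta))
                    for k in range(n + 1))
```

The maximum pointwise discrepancy shrinks roughly in proportion to \(1/m\), in line with the vanishing probability of two balls sharing a bin.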
Since the occupancy distribution generalises the binomial, and since the binomial distribution is well-known to lead to other distributional forms (e.g., the Poisson) under appropriate limits, a natural follow-up question is to ask whether we obtain any interesting distribution if we keep the space parameter \(m\) as a finite value, but take the limits on the other parameters that would yield the Poisson distribution from the binomial. It turns out that this limiting exercise leads us to another binomial distribution (with different parameters). Since the binomial is itself a generalisation of the Poisson distribution, under appropriate limits, the occupancy distribution generalises both the binomial and the Poisson.
Theorem 4b (Limit to the Binomial Distribution)
The occupancy distribution satisfies the limiting form:
The binomial distribution has a number of well-known properties relating to recursion on its parameters, monotone likelihood ratio with respect to its parameters, and resulting stochastic dominance (see e.g., Johnson and Kotz 1969, pp. 50–86). Since the occupancy distribution generalises the binomial, it is useful to see how the properties of the binomial are generalised in the occupancy distribution. In particular, it is well-known that a binomial random variable has a monotone likelihood ratio and is therefore “stochastically increasing” in \(n\) and \(\theta\) (in the sense of first-order stochastic dominance), and that it obeys simple recursive and differential equations on its parameters. We now establish a set of generalised equations, monotonicity and stochastic dominance results that apply to the extended occupancy distribution, with the binomial recursive equations and stochastic dominance results occurring as a special case.
Theorem 5 (Recursive and Differential Equations)
The extended occupancy distribution satisfies the following recursive/differential equations:
Corollary
In the case where \(m=\infty\) we have the binomial recursive/differential equations:
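In the binomial case, the basic one-step recursions are the familiar identities \(\mathrm{Bin}\left(k|n+1,\theta \right)=\theta \,\mathrm{Bin}\left(k-1|n,\theta \right)+\left(1-\theta \right)\mathrm{Bin}\left(k|n,\theta \right)\) (Pascal-type recursion in \(n\)) and \(\mathrm{Bin}\left(k+1|n,\theta \right)=\frac{\theta }{1-\theta }\cdot \frac{n-k}{k+1}\,\mathrm{Bin}\left(k|n,\theta \right)\) (recursion in \(k\)). These are easy to verify exactly (a sketch with illustrative values):

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, theta):
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

n, theta = 8, Fraction(1, 3)

# Pascal-type recursion in n
ok_n = all(
    binomial_pmf(k, n + 1, theta)
    == theta * binomial_pmf(k - 1, n, theta)
    + (1 - theta) * binomial_pmf(k, n, theta)
    for k in range(1, n + 1)
)

# recursion in the argument k
ok_k = all(
    binomial_pmf(k + 1, n, theta)
    == theta / (1 - theta) * Fraction(n - k, k + 1) * binomial_pmf(k, n, theta)
    for k in range(n)
)
```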
Theorem 6 (First-Order Stochastic Dominance)
Let \(F\left(k|n,m,\theta \right)\equiv {\mathbb{P}}\left({K}_{n}\le k\right)\) denote the cumulative distribution function for the extended occupancy distribution. This satisfies the following first-order stochastic dominance relations:
Theorem 5 establishes equations for the occupancy distribution generalising similar equations for the binomial distribution. Theorem 6 shows that the stochastic dominance results for the binomial distribution also hold for the extended occupancy distribution, and are also extended to the new parameter \(m\). The theorem establishes that an extended occupancy random variable is “stochastically increasing” in \(n\), \(m\) and \(\theta\) (in the sense of first-order stochastic dominance). This result accords with intuition, since increasing the number of balls, the number of bins, or the probability of occupancy, will all tend to increase the number of occupied bins. This stochastic dominance result also gives us a useful intuitive understanding of the effect of the generalisation from the binomial to the occupancy distribution. By imposing a finite parameter \(m\) (rather than the value \(m=\infty\) that gives the binomial distribution) we “squash” the effective balls into a finite number of available bins, which gives rise to the possibility that more than one effective ball will share a bin, so the excess balls will not count towards our occupancy number. A simple corollary of Theorem 6 is that imposition of a finite value of \(m\) (instead of \(m=\infty\)) will tend to reduce the value of the occupancy number.
In Appendix 2 we prove the stochastic dominance results in Theorem 6 algebraically from the mass function for the occupancy distribution. However, a simpler intuition for the result is obtained by noting that the Markov chain defining the extended occupancy distribution has a number of monotonicity properties. The rows of the matrix have cumulative sums that are increasing, so the matrix is “monotone” in the sense described in Daley (1968). Moreover, in any row of the matrix, the cumulative sum of terms is decreasing in \(m\) and \(\theta\) (in the degenerate case where \(\theta =0\) it is only non-increasing in \(m\)). An alternative proof (not pursued here) could be couched in terms of these general properties of monotone Markov chains.
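The dominance relations in Theorem 6 are also easy to check numerically for particular parameter values. The sketch below compares cumulative distribution functions after increasing each of \(n\), \(m\) and \(\theta\) in turn (rational arithmetic; values illustrative):

```python
from fractions import Fraction
from itertools import accumulate

def occ_pmf_chain(n, m, theta):
    """Exact distribution of K_n by iterating the occupancy chain."""
    p = [Fraction(0)] * (m + 1)
    p[0] = Fraction(1)
    for _ in range(n):
        q = [Fraction(0)] * (m + 1)
        for t in range(m + 1):
            birth = theta * Fraction(m - t, m)
            q[t] += (1 - birth) * p[t]
            if t < m:
                q[t + 1] += birth * p[t]
        p = q
    return p

def occ_cdf(n, m, theta):
    return list(accumulate(occ_pmf_chain(n, m, theta)))

n, m, theta = 6, 5, Fraction(1, 2)
base = occ_cdf(n, m, theta)

# stochastically increasing in n: the CDF with n+1 balls lies below
more_balls = occ_cdf(n + 1, m, theta)
dominates_n = all(more_balls[k] <= base[k] for k in range(m + 1))

# stochastically increasing in m (compared over the common support 0..m)
more_bins = occ_cdf(n, m + 1, theta)
dominates_m = all(more_bins[k] <= base[k] for k in range(m + 1))

# stochastically increasing in theta
higher_theta = occ_cdf(n, m, Fraction(3, 4))
dominates_t = all(higher_theta[k] <= base[k] for k in range(m + 1))
```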
4 Excess Hitting Times and the Negative Occupancy Distribution
Another distribution related to the binomial distribution is the negative binomial distribution, and it turns out that we can find a natural extension of this latter distribution arising in the occupancy process. This distribution is obtained by considering the “excess hitting time” for the event \({K}_{n}=k\) in the Markov chain, which we denote by:
Derivation of the distribution of this excess hitting time is quite straightforward. The event \({T}_{k}\le t\) is equivalent to the event \({K}_{k+t}\ge k\). Hence, for all \(0<k\le m\) and \(t\ge 0\) the cumulative distribution of \({T}_{k}\) can be obtained from the occupancy distribution as:
To find the probability mass function we take the first difference of the distribution function. As a preliminary step, applying the first recursive equation in Theorem 5 we obtain:
We therefore have:
Using this result, the mass function for the excess hitting time is:
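This first-difference construction translates directly into computation: since \({\mathbb{P}}\left({T}_{k}\le t\right)={\mathbb{P}}\left({K}_{k+t}\ge k\right)\), the mass function of the excess hitting time can be obtained from occupancy tail probabilities. A floating-point sketch (function names and parameter values illustrative):

```python
def occ_pmf_chain(n, m, theta):
    """Distribution of K_n by iterating the occupancy chain (floating point)."""
    p = [0.0] * (m + 1)
    p[0] = 1.0
    for _ in range(n):
        q = [0.0] * (m + 1)
        for t in range(m + 1):
            birth = theta * (m - t) / m
            q[t] += (1 - birth) * p[t]
            if t < m:
                q[t + 1] += birth * p[t]
        p = q
    return p

def hitting_cdf(t, m, k, theta):
    """P(T_k <= t) = P(K_{k+t} >= k)."""
    pmf = occ_pmf_chain(k + t, m, theta)
    return sum(pmf[k:])

def neg_occ_pmf(t, m, k, theta):
    """P(T_k = t) as a first difference of the hitting-time CDF."""
    if t == 0:
        return hitting_cdf(0, m, k, theta)
    return hitting_cdf(t, m, k, theta) - hitting_cdf(t - 1, m, k, theta)

m, k, theta = 6, 4, 0.6
probs = [neg_occ_pmf(t, m, k, theta) for t in range(200)]
total = sum(probs)
```

The terms are non-negative and sum (over a long enough range, since the tail decays geometrically) to one, as a mass function must.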
Below we formalise this mass function as defining a class of distributions we call the “negative occupancy distribution”. We also consider a special case of this distribution, which describes the behaviour of the excess coupons collected in the famous coupon-collector problem.
Definition (The Negative Occupancy Distribution)
This is a discrete distribution that has probability mass function given over all integer arguments \(t\ge 0\) as:
where \(0<k\le m\le \infty\) are the occupancy parameter (number of occupied bins) and space parameter (number of bins) respectively, and \(0<\theta \le 1\) is the probability parameter. □
Definition (Coupon-Collector Distribution)
This is a discrete distribution with probability mass function given over all integer argument values \(t\ge 0\) as:
where \(0<m<\infty\) is the space parameter (number of bins) and \(0<\theta \le 1\) is the probability parameter. Since \(\mathrm{CoupColl}\left(t|m,\theta \right)=\mathrm{NegOcc}\left(t|m,m,\theta \right)\) the coupon-collector distribution is a special case of the negative occupancy distribution with \(k=m\). □
Remark 2
Mathematically, the mass function of the negative occupancy distribution arises from the ordinary generating function for the noncentral Stirling numbers of the second kind, which is \(\sum_{n=k}^{\infty }S\left(n,k,\phi \right)\cdot {x}^{n}={x}^{k}/\prod_{i=0}^{k}\left(1-\left(i+\phi \right)x\right)\) (see Appendix 1). Substituting the noncentrality parameter \(\phi =m\left(1-\theta \right)/\theta\) and the argument \(x=\theta /m\) and rearranging gives the norming equation for the mass function of the negative occupancy distribution. □
Since \({\mathbb{P}}\left({T}_{k}=t\right)=\mathrm{NegOcc}\left(t|m,k,\theta \right)\) we can see that the negative occupancy distribution is the appropriate family of distributions to describe the behaviour of the excess hitting time in the occupancy process. In the case where \(k=m\) we are looking at the excess number of balls that are required to fully occupy all the bins in the occupancy problem. In this case, we have referred to the distribution as the “coupon-collector” distribution. The classical version of the coupon-collector distribution arises in the “coupon-collector problem”, which examines the number of randomly obtained coupons that need to be collected to obtain a full set (see e.g., Dawkins 1991; Adler et al. 2003). Note that our distribution describes the excess number of coupons required for a full set, not the total number of required coupons; it is trivial to convert to the distribution of the total required number of coupons if required.
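A quick simulation check of the coupon-collector case: in the pure birth chain, the waiting time to raise the occupancy number from \(t\) to \(t+1\) is geometric with success probability \(\theta \left(m-t\right)/m\), so the expected excess number of balls needed for full occupancy is \(\sum_{t=0}^{m-1}m/\left(\theta \left(m-t\right)\right)-m=\left(m/\theta \right){H}_{m}-m\), where \({H}_{m}\) is the \(m\)-th harmonic number. A hedged sketch (seed, function name and parameters illustrative):

```python
import random

def excess_full_occupancy(m, theta, rng):
    """Simulate T_m: excess balls beyond m needed to occupy all m bins."""
    occupied = [False] * m
    remaining = m
    balls = 0
    while remaining > 0:
        balls += 1
        idx = rng.randrange(m)                     # uniform allocation
        if rng.random() < theta and not occupied[idx]:
            occupied[idx] = True                   # a "birth" in the chain
            remaining -= 1
    return balls - m

m, theta, trials = 5, 0.7, 20000
rng = random.Random(1)
empirical_mean = sum(excess_full_occupancy(m, theta, rng)
                     for _ in range(trials)) / trials
harmonic = sum(1 / i for i in range(1, m + 1))
exact_mean = (m / theta) * harmonic - m
```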
As with the occupancy distribution above, it is useful to write the mass function of the negative occupancy distribution as a product of the negative binomial mass function multiplied by an adjustment term involving the scaled Stirling function (Appendix 1):
This alternative form shows that there is a close resemblance between the mass functions of these two distributions. In fact, it is simple to show that the negative occupancy distribution generalises the negative binomial distribution. In the case \(m=\infty\) we define the distribution by the limit \(\mathrm{NegOcc}\left(t|\infty ,k,\theta \right)\equiv {\mathrm{lim}}_{m\to \infty }\mathrm{NegOcc}\left(t|m,k,\theta \right)=\mathrm{NegBin}\left(t|k,1-\theta \right)\).
Theorem 7 (Generalisation of Negative Binomial Distribution)
The negative occupancy distribution satisfies the limiting form:
The negative occupancy distribution generalises the negative binomial distribution by adding a space parameter \(m\) that allows the occupancy to be “squashed” into a finite number of bins (footnote 10). The distribution obeys a number of recursive/differential results and stochastic dominance properties that generalise results for the negative binomial distribution.
Theorem 8 (Recursive and Differential Equations)
The negative occupancy distribution has recursive/differential equations with respect to its parameters given by:
Corollary
In the case where \(m=\infty\) we have the recursive/differential equations for the negative binomial distribution:
Theorem 8 gives recursive/differential equations for the negative occupancy distribution. The corollary shows that these equations are extensions of well-known equations for the negative binomial distribution. (The reader should note that corresponding recursive equations for the coupon-collector distribution are a little more complicated than for the negative occupancy distribution, owing to the fact that two parameters are collapsed into one. Detailed analysis of this distribution is outside the scope of the present paper.) These recursive equations give rise to corresponding stochastic dominance results, as shown in the theorem below.
Theorem 9 (First-Order Stochastic Dominance)
Let \(F\left(t|m,k,\theta \right)\equiv {\mathbb{P}}\left({T}_{k}\le t\right)\) denote the cumulative distribution function for the negative occupancy distribution. This satisfies the following first-order stochastic dominance relations:
In the proofs in Appendix 2, we establish the above stochastic dominance results algebraically from the mass function of the negative occupancy distribution. These results follow directly from the monotone likelihood-ratio properties of the distribution with respect to its parameters, but they also have some basic statistical intuition. In particular, the stochastic dominance results for the negative occupancy distribution are intuitively related to stochastic dominance for the extended occupancy distribution: ceteris paribus, increases in either \(m\) or \(\theta\) will tend to increase the occupancy number for any fixed number of balls, and will thus tend to decrease the number of excess balls required to achieve a fixed occupancy number. Conversely, if we increase the occupancy number \(k\) this will tend to directly increase the excess hitting time, since the hitting time is now for a larger outcome value in a pure birth process.
The negative occupancy distribution provides us with a description of the stochastic behaviour of the excess number of balls required to achieve a given occupancy number in the extended occupancy problem. By a simple shift in location, it can also be used to describe the stochastic behaviour of the total number of balls required to achieve a given occupancy number. The coupon-collector distribution is a special case of the negative occupancy distribution, which solves the famous “coupon-collector problem”, giving a full description of the behaviour of the minimum number of balls needed to achieve full occupancy.
5 The “Spillage” and its Conditional Distribution
From our previous analysis, we have already seen that taking \(m=\infty\) means that each ball falls into a different bin, yielding standard Bernoulli sampling. In this special case the occupancy number must be equal to the effective number of balls in the occupancy problem, and thus, we will have \({n}_{\mathrm{eff}}-{K}_{n}=0\). If we use a finite number of bins \(m<\infty\) this is no longer guaranteed, since it is possible for some of the effective balls to occupy the same bin, so that the effective number of balls may exceed the occupancy number. Since a bin containing a single ball is already occupied, we can regard the value \({n}_{\mathrm{eff}}-{K}_{n}\) as the “spillage” of balls in excess of the number required to occupy the occupied bins.
The third distribution we will examine in the extended occupancy problem is the conditional distribution of the “spillage”, conditional on the occupancy number. We will derive this distribution using Bayes’ theorem. Conditional on the effective number of balls \({n}_{\mathrm{eff}}=s\), the distribution of the occupancy number is the classical occupancy distribution:
Thus, for all occupancy values \(1\le k\le \mathrm{min}\left(n,m\right)\) and all argument values \(k\le s\le n\) for the effective number of balls, we have:
Taking \(s=k+r\) gives the conditional distribution of the “spillage”, which is:
Definition (The Spillage Distribution)
This distribution is a discrete probability distribution with probability mass function given by (footnote 11):
where \(n\in {\mathbb{N}}\) is the size parameter (number of balls), \(0\le k\le n\) is the occupancy parameter (occupancy number) and \(0\le \phi \le \infty\) is the scale parameter. □
We can see that the spillage distribution describes the behaviour of the “spillage” given our knowledge of the occupancy number. The distribution can also be shifted to describe the behaviour of the number of effective balls given our knowledge of the occupancy number. These two conditional probabilities are given respectively by (footnote 12):
It is worth noting here that the distribution of the “spillage” depends on \(m\) and \(\theta\) only through the scale parameter \(\phi =m\cdot \left(1-\theta \right)/\theta\). In the classical case where \(\theta =1\) we have the scale parameter \(\phi =0\) so \({n}_{\mathrm{eff}}=n\) with probability one (and the corresponding “spillage” is \(n-k\)). In the case where \(m=\infty\) and \(0<\theta <1\) we have the scale parameter \(\phi =\infty\) so \({n}_{\mathrm{eff}}=k\) with probability one (and the corresponding “spillage” is zero). We have been unable to identify this distribution in the existing mathematical or statistical literature, and so to our knowledge it is a “new” distributional family; the name we have ascribed here is our own creation. The name we have chosen reflects the fact that the distribution arises when we consider excess balls above what is required to occupy a bin to “spill” over the capacity of the bin.
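The Bayes'-theorem construction of the spillage distribution can be illustrated directly in code. The sketch below (function names are our own) builds the spillage pmf from the binomial distribution of \({n}_{\mathrm{eff}}\) and the classical occupancy distribution, and confirms with exact rational arithmetic that the result depends on \(m\) and \(\theta\) only through \(\phi =m\left(1-\theta \right)/\theta\):

```python
from fractions import Fraction
from math import comb

def stirling2(n, k):
    """Central Stirling numbers of the second kind via
    S(n+1, j) = S(n, j-1) + j * S(n, j)."""
    row = [1] + [0] * k                       # S(0, j) for j = 0..k
    for _ in range(n):
        row = [0] + [row[j - 1] + j * row[j] for j in range(1, k + 1)]
    return row[k]

def falling(m, k):
    out = 1
    for i in range(k):
        out *= m - i
    return out

def spillage_pmf(n, m, k, theta):
    """Conditional pmf of the spillage r = n_eff - K_n given K_n = k, via
    Bayes' theorem: n_eff ~ Bin(n, theta), and given n_eff = s the occupancy
    number follows the classical occupancy distribution."""
    joint = {}
    for r in range(n - k + 1):
        s = k + r
        p_eff = comb(n, s) * theta ** s * (1 - theta) ** (n - s)
        p_occ = Fraction(stirling2(s, k) * falling(m, k), m ** s)
        joint[r] = p_eff * p_occ
    tot = sum(joint.values())
    return {r: p / tot for r, p in joint.items()}

# both parameter pairs below give phi = m (1 - theta) / theta = 10,
# so the two spillage distributions agree exactly:
d1 = spillage_pmf(8, 10, 3, Fraction(1, 2))
d2 = spillage_pmf(8, 20, 3, Fraction(2, 3))
assert d1 == d2
assert sum(d1.values()) == 1
```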
Remark 3
Mathematically, the mass function of the spillage distribution arises from the well-known expansion for the noncentral Stirling numbers of the second kind in terms of the central Stirling numbers of the second kind (see Appendix 1), which can be written as:
For \(\phi \ge 0\) the terms in this sum are non-negative and the terms give the kernel of the mass function of the spillage distribution. □
Unlike our previous two distributions, the present distribution does not generalise any common non-trivial distribution arising as a variant of the binomial. In fact, we see from the theorem below that in the case where we have an infinite number of bins (giving \(\phi =\infty\)) the distribution degenerates down to a point mass on \(r=0\), reflecting the fact that there is no “spillage” in this case. Thus, rather than providing a useful generalisation of an existing distribution, like our previous distributions, the spillage distribution is a new form that describes the divergence between the effective number of balls and the occupancy number in the setting of the extended occupancy problem. In the case of a finite number of bins, the occupancy number may be “squashed” down below the effective number of balls by the fact that balls may share bins with non-zero probability.
Theorem 10 (Limit of the Spillage Distribution with Infinite Bins)
The spillage distribution satisfies the limiting form:
Taking \(n=k+r\) gives the probability that the effective number of balls is equal to the full number of balls (i.e., that all balls were effective). The probability of this outcome, written in terms of the occupancy number \(k\) and the spillage \(r\) is:
As we did above with our first two distributions, it is useful to write the mass function of the spillage distribution in an alternative form involving the scaled Stirling function (see Appendix 1). With a bit of algebra it can be shown that:
This gives an alternative form for the mass function of the spillage distribution:
This form of the mass function frames the probabilities relative to the conditional probability that all the balls in the occupancy problem are effective. As can be seen, the form involves writing the mass function as the product of this probability and an adjustment term involving the scaled Stirling numbers.
As with the other two distributions we have examined in this paper, it is possible to use the recursive/differential properties of the noncentral Stirling numbers of the second kind to obtain corresponding recursive/differential equations for the spillage distribution. These equations are rather cumbersome, and not particularly illuminating, so they are omitted here. As should be unsurprising, the spillage is stochastically increasing in \(n\) and decreasing in \(k\). It is also possible to establish that the spillage is stochastically decreasing in \(\phi\), which means it is stochastically decreasing in \(m\) and stochastically increasing in \(\theta\).
6 Mixture Properties Involving the Occupancy Distributions
The three occupancy distributions we have examined have a number of interesting mixture characterisations that are useful for computational and analytic purposes. We will look at each of the distributions in the order presented in our previous examination, and derive mixture characterisations for each, beginning with the extended occupancy distribution. The mixtures in Theorems 12–13 below are also shown in Harkness (1969) (Eqs. 24 and 23 respectively). To the knowledge of the present author, the remaining results are new.
One way to derive mixture results for the extended occupancy distribution is to treat the number of balls in the occupancy problem as a random variable with a specified distribution over the natural numbers. This leads to a general mixture form shown in the theorem below, where the marginal mass function of the occupancy number involves the probability generating function of the underlying distribution.
Theorem 11 (Random Number of Balls)
Suppose we let the number of balls in the occupancy problem be a random variable \(N \sim {p}_{N}\) and let \({G}_{N}\) be the corresponding probability generating function of \(N\). Then the marginal mass function for the occupancy number is:
There are some distributions with simple probability generating functions that conform nicely with this sum expression above, in such a way as to yield useful mixture characterisations. In Theorems 12–13 below we give our first mixture results, which show that binomial and Poisson mixtures of the occupancy distribution both give rise to simple marginal distributions. Later we show some mixture results for the negative occupancy and spillage distributions.
Theorem 12 (Occupancy Distribution is a Binomial Mixture of Occupancy Distributions)
The occupancy distribution satisfies the equation:
In the special case where \(\gamma =1\) we obtain the useful mixture equation:
Theorem 12 has a simple intuition when we interpret it as involving two independent events resulting in “falling through” the bins. The binomial distribution in the mixture gives the number of “effective” balls that do not fall through the bins due to the new event, and each of these terms is multiplied by the occupancy distribution without the probability of that event incorporated. We have stated the theorem in a general form, but the most important case occurs when \(\gamma =1\), which allows us to write the occupancy distribution as a binomial mixture of the classical occupancy distribution. (This mixture is useful for computational purposes; it can be combined with the algorithms in O’Neill 2021 to yield an algorithm to compute the extended occupancy distribution.)
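The \(\gamma =1\) case of Theorem 12 can be verified numerically. In the sketch below we assume the closed form \(\mathrm{Occ}\left(k|n,m,\theta \right)={\left(m\right)}_{k}\,S\left(n,k,\phi \right)\,{\left(\theta /m\right)}^{n}\) with \(\phi =m\left(1-\theta \right)/\theta\) for the extended occupancy mass function (our reading of the mass function, since the paper's display is not reproduced in this extract), and check the binomial mixture identity exactly with rational arithmetic:

```python
from fractions import Fraction
from math import comb

def stirling2_nc(n, k, phi):
    """Noncentral Stirling numbers S(n, k, phi) via
    S(n+1, j) = S(n, j-1) + (j + phi) * S(n, j)."""
    row = [Fraction(1)] + [Fraction(0)] * k
    for _ in range(n):
        row = [(row[j - 1] if j else Fraction(0)) + (j + phi) * row[j]
               for j in range(k + 1)]
    return row[k]

def falling(m, k):
    out = 1
    for i in range(k):
        out *= m - i
    return out

def occ_ext(k, n, m, theta):
    # assumed closed form: (m)_k * S(n, k, phi) * (theta / m)^n
    phi = m * (1 - theta) / theta
    return falling(m, k) * stirling2_nc(n, k, phi) * (theta / Fraction(m)) ** n

def occ_classical(k, s, m):
    # classical occupancy distribution (theta = 1, so phi = 0)
    return falling(m, k) * stirling2_nc(s, k, Fraction(0)) / Fraction(m) ** s

n, m, theta = 7, 5, Fraction(3, 4)
for k in range(min(n, m) + 1):
    mixture = sum(comb(n, s) * theta ** s * (1 - theta) ** (n - s) * occ_classical(k, s, m)
                  for s in range(k, n + 1))
    assert occ_ext(k, n, m, theta) == mixture
assert sum(occ_ext(k, n, m, theta) for k in range(min(n, m) + 1)) == 1
```

The equality is exact here because the mixture is precisely the decomposition of the extended problem into an effective number of balls (the binomial component) followed by classical allocation.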
Theorem 13 (Binomial Distribution is a Poisson Mixture of Occupancy Distributions)
The binomial distribution satisfies the equation:
With \(\theta >0\) this can be written to yield a binomial distribution with parameter \(0<\gamma <1\) as:
Theorem 13 gives a mixture characterisation of the binomial distribution as a Poisson mixture of underlying occupancy distributions. We have already noted that the occupancy distribution is a generalisation of the binomial distribution, so this gives us yet another characterisation of the binomial distribution. Theorems 12–13 are extensions of well-known characterisations of the binomial and Poisson distributions. Taking \(m\to \infty\) gives \(\mathrm{Occ}\left(k|r,m,\gamma \right)\to \mathrm{Bin}\left(k|r,\gamma \right)\) so that Theorem 12 reduces down to the well-known mixture:
Taking \(m\to \infty\) and \(\gamma \to 0\) with \(m\gamma \to \lambda \theta\) we can apply L’Hôpital’s rule to show that:
Since these limits give \(\mathrm{Occ}\left(k|r,m,\theta \right)\to \mathrm{Bin}\left(k|r,\theta \right)\) and \(\mathrm{Bin}\left(k|m,\gamma \right)\to \mathrm{Pois}\left(k|\lambda \theta \right)\) we see that the mixture equation in Theorem 13 (second equation) reduces to the well-known mixture:
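The Poisson mixture in Theorem 13 can also be checked by direct computation. Since the theorem's display is not reproduced in this extract, we take the binomial parameter to be \(\gamma =1-\mathrm{exp}\left(-\lambda \theta /m\right)\) (the standard Poissonization reading: with a \(\mathrm{Pois}\left(\lambda \right)\) number of balls, each bin is occupied independently with this probability). The occupancy distribution itself is computed by iterating the pure-birth transition of the Markov chain:

```python
from math import comb, exp, factorial

def occ_dist(n, m, theta):
    """Distribution of the occupancy number K_n, computed by iterating the
    pure-birth transition: from level j, a ball occupies a new bin with
    probability theta * (m - j) / m."""
    dist = [1.0] + [0.0] * m
    for _ in range(n):
        new = [0.0] * (m + 1)
        for j, p in enumerate(dist):
            up = theta * (m - j) / m
            new[j] += p * (1 - up)
            if j < m:
                new[j + 1] += p * up
        dist = new
    return dist

lam, m, theta = 6.0, 8, 0.7
gamma = 1 - exp(-lam * theta / m)          # assumed binomial parameter
R = 80                                     # truncation point for the Poisson mixture
mix = [0.0] * (m + 1)
for r in range(R + 1):
    w = exp(-lam) * lam ** r / factorial(r)
    d = occ_dist(r, m, theta)
    for k in range(m + 1):
        mix[k] += w * d[k]
binom = [comb(m, k) * gamma ** k * (1 - gamma) ** (m - k) for k in range(m + 1)]
err = max(abs(a - b) for a, b in zip(mix, binom))
assert err < 1e-9
```

The truncation at \(R=80\) discards only a negligible Poisson tail for \(\lambda =6\).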
The extended occupancy distribution provides a useful extension to the binomial distribution, with mixture properties that connect it to other common discrete distributions that arise in statistical practice. In particular, we have established that a Poisson mixture of occupancy distributions yields the binomial distribution, and thus provides a natural link between these two distributions. By viewing the occupancy distribution as a distribution relating to a Markov chain we are able to establish its properties either via the theory of Markov chains, or by direct algebraic analysis of the mass function.
As with the occupancy distribution, it is also possible to derive interesting mixture results using the negative occupancy distribution, which generalise well-known mixture representations for the negative binomial distribution. We will derive the mixture characterisation directly through the mass function in this case.
Theorem 14 (Negative Occupancy Distribution is a Negative Binomial Mixture of Negative Occupancy Distributions)
The negative occupancy distribution satisfies the equation:
In the special case where \(\gamma =1\) we obtain the useful mixture equation:
Theorem 14 is the negative occupancy analogue to Theorem 12 for the occupancy distribution. This theorem also has a simple intuition when we interpret it as involving two independent events resulting in “falling through” the bins. The negative binomial distribution in the mixture gives the component of the excess hitting time that is attributable to the new event, and each of these terms is multiplied by the negative occupancy distribution without the probability of that event incorporated. We have stated the theorem in a general form, but the most important case occurs when \(\gamma =1\), which allows us to write the negative occupancy distribution as a negative binomial mixture of the classical negative occupancy distribution. Again, this latter mixture is especially useful for computational purposes.
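The \(\gamma =1\) case of Theorem 14 can be checked along similar lines. One decomposition consistent with the thinning construction (the theorem's exact display is not reproduced in this extract, so its form may differ) splits the excess hitting time as \({T}_{k}=U+W\), where \(U\) is the classical excess and, given \(U=u\), \(W\sim \mathrm{NegBin}\left(k+u,1-\theta \right)\) counts the fall-through balls. The sketch below verifies this numerically against the direct geometric-convolution computation:

```python
from math import comb

def negocc_pmf(m, k, theta, tmax):
    """pmf of the excess hitting time on {0, ..., tmax}, by convolving the
    geometric waiting times with success probability theta * (m - i) / m."""
    dist = [1.0] + [0.0] * tmax
    for i in range(k):
        p = theta * (m - i) / m
        geo = [(1 - p) ** j * p for j in range(tmax + 1)]
        new = [0.0] * (tmax + 1)
        for a, pa in enumerate(dist):
            for b in range(tmax + 1 - a):
                new[a + b] += pa * geo[b]
        dist = new
    return dist

def negbin_pmf(t, k, p_fail):
    return comb(k + t - 1, t) * p_fail ** t * (1 - p_fail) ** k

m, k, theta, tmax = 12, 5, 0.6, 80
direct = negocc_pmf(m, k, theta, tmax)
classical = negocc_pmf(m, k, 1.0, tmax)        # theta = 1: classical excess U
# mixture form: T = U + W with W | U = u distributed NegBin(k + u, 1 - theta)
mixture = [sum(classical[u] * negbin_pmf(t - u, k + u, 1 - theta) for u in range(t + 1))
           for t in range(tmax + 1)]
err = max(abs(a - b) for a, b in zip(direct, mixture))
assert err < 1e-9
```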
We have seen that the negative occupancy distribution generalises the negative binomial distribution, so it is useful to compare the mixture characterisation in Theorem 14 to the known characterisations of the negative binomial distribution. Taking \(m\to \infty\) in Theorem 14 gives \(\mathrm{NegOcc}\left(t|m,k,\gamma \right)\to \mathrm{NegBin}\left(t|k,1-\gamma \right)\) so that Theorem 14 reduces in the limit to the well-known negative binomial mixture:
Each of the above mixtures extends known mixture characterisations for the binomial, Poisson, or negative binomial distributions. To complete our analysis of mixture characterisations, we will derive one final mixture result using the spillage distribution. This mixture does not extend any other well-known mixture results. (Taking \(m\to \infty\) reduces the spillage distribution to a point-mass distribution on zero, so in this limiting case the mixture reduces to a trivial assertion that \(\mathrm{Bin}\left(s|n,\theta \right)=\mathrm{Bin}\left(s|n,\theta \right)\).)
Theorem 15 (Binomial Distribution is a Spillage Mixture of Occupancy Distributions)
The binomial distribution satisfies the equation:
We now have a reasonably complete set of mixture results that relate the various occupancy distributions to other well-known discrete distributions. In particular, the most useful mixture results here are Theorems 12 and 14, which allow us to generate the occupancy distribution and the negative occupancy distribution as mixtures of their classical versions. (In the first case this is a binomial mixture and in the second case it is a negative binomial mixture.) These results are useful for computational purposes, since they allow us to generate the extended distributions from their classical counterparts.
7 Application to “Coverage” Analysis in Bootstrapping/Resampling Problems
The extended occupancy problem and the three distributions discussed in this paper arise when we undertake simple-random-sampling with replacement (SRSWR) from a finite set of objects. One statistical context in which this occurs is when we use resampling methods such as “bootstrap” estimation (see e.g., Hall 1992). In this context we may wish to examine the coverage of the data points in the original sample, which leads us to the classical occupancy problem. Consequently, various aspects of the coverage of the original sample are described by the occupancy distributions we have looked at in this paper. In particular, the marginal and conditional distributions of the coverage of the original sample are described by the extended occupancy distribution, and our other distributions describe related aspects of the problem.
The formal description of occupancy analysis in resampling problems is fairly straightforward. In order to stick with our existing notation throughout this paper, suppose we have an initial sample of data points \({\varvec{x}}=\left({x}_{1},\dots ,{x}_{m}\right)\) and we decide to resample \(n\) data points via SRSWR. We note that it is usual in bootstrapping to generate resamples that are of the same size as the original sample (i.e., with \(n=m\)). We will proceed in greater generality because our coverage analysis applies just as well to resampling that does not impose this restriction. For simplicity, we will also assume that all the original data points are distinct, such as would occur when the underlying distribution is continuous. (Analysis can be extended to the case where there are duplicate values in the original sample, but certain aspects of the problem then go beyond the extended occupancy problem.) Formally, bootstrapping works by generating a resampled data vector \({\varvec{y}}=\left({y}_{1},\dots ,{y}_{n}\right)\) where the elements are:
Let \({\mathcal{J}}_{n}\equiv \bigcup_{i=1}^{n}\left\{{U}_{i}\right\}\subseteq \left\{1,\dots ,m\right\}\) be the set of data points that were resampled (as described by their indices) and let \({K}_{n}\equiv \left|{\mathcal{J}}_{n}\right|\) be the size of this set. Let \({T}_{k}\equiv \mathrm{min}\left\{t=0,1,2,\dots |{K}_{k+t}=k\right\}\) be the number of excess resampled values required to ensure that the resampled vector includes \(k\) different data points from the original sample. The set \({\mathcal{J}}_{n}\) describes the “coverage” of the original data points in the resample, and the quantity \({K}_{n}\) tells us the number of data points in the original sample that appear in the resample.
Since the indices for the resampled values are independent uniform random variables over the original indices for the data points, we can easily see that the “coverage” of the original sample is described by the classical occupancy problem. (Our notation \({K}_{n}\) and \({T}_{k}\) for the coverage of the original sample and its excess hitting time reflect the fact that these are the occupancy number and excess hitting time in the classical occupancy problem.) We can easily see that the number of resampled data points \({K}_{n}\) is an occupancy number from the classical occupancy problem, so it follows the marginal and conditional distributions:
Similarly, we can easily see that \({T}_{k}\) is an excess hitting time in the classical occupancy problem, so it follows the marginal and conditional distributions:
Since resampling follows the classical occupancy problem, in this context the “spillage” is trivial; we always have \({n}_{\mathrm{eff}}=n\) so \({n}_{\mathrm{eff}}-{K}_{n}=n-{K}_{n}\) with probability one.
Bootstrapping involves taking some large number of resamples from an original sample vector in order to estimate the sampling distribution of a quantity of interest. Suppose we generate bootstrap simulations \(s=1,\dots ,S\) giving corresponding resample vectors \({{\varvec{y}}}^{\left(1\right)},\dots ,{{\varvec{y}}}^{\left(S\right)}\) using the above method. If \(S\) is large then we can rely on the strong law of large numbers to assure ourselves that the empirical distributions of the coverage quantities for these resamples will converge almost surely to their true distributions:
In Fig. 2 below we show the coverage of \(S=30\) bootstrap resamples of an original sample containing \(m=25\) data points. Each simulated resample is shown by one of the larger squares, and the occupancy number for each resample is the number of red squares in the larger square. (Since we are interested only in the coverage of the original sample here, the figure does not show how many times each value was resampled, only whether it was resampled at least once.) As we take more and more resamples (i.e., as \(S\to \infty\)), the empirical distribution of the occupancy numbers will converge to the classical occupancy distribution with \(n=m=25\).
Coverage of simulated bootstrap resamples for an original sample of m = 25 data points using n = 25 resampled points. There are S = 30 squares showing simulated resamples. Each red square within a larger square represents a data point that is included in the resample; the occupancy number for each resample is the number of red squares
Analysis of coverage of the original sample has been used in bootstrapping analysis in order to examine issues of bias that arise from conflation of training and testing. For example, Efron and Tibshirani (1997) examine bootstrapping and cross-validation methods for estimation of error rates in binary regression modelling. In order to correct for bias arising from training and testing with the same data, they examine a special type of bootstrapping analysis formulated in Efron (1983) called the “0.632 + bootstrap method” (see also Efron 1986). Though they give their own explanation of this method, here we will examine this method in terms of our own coverage analysis. To do this, we will begin by noting that the expectation and variance of the coverage proportion are:
If we take \(n\to \infty\) and \(m\to \infty\) subject some fixed limiting ratio \(n/m\to \lambda\) then we have the asymptotic equivalence:
Using the standard bootstrapping method where we use a resample that is the same size as the original sample (i.e., with \(n=m\) so that \(\lambda =1\)) we have convergence \({K}_{n}/m\to 1-1/e\approx 0.632\) (using the strong law of large numbers this is almost sure convergence). This is empirically evident in the resample simulations in Fig. 2.
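This convergence is easy to illustrate by simulation. The sketch below (the seed, number of resamples, and variable names are our own illustrative choices) draws bootstrap resamples with \(n=m=25\) and compares the average coverage proportion with the exact expectation \(1-{\left(1-1/m\right)}^{m}\) and its limit \(1-1/e\):

```python
import math
import random

random.seed(1)
m = 25          # original sample size (x_1, ..., x_m)
S = 20000       # number of simulated bootstrap resamples
total = 0.0
for _ in range(S):
    covered = {random.randrange(m) for _ in range(m)}   # indices hit by n = m draws
    total += len(covered) / m                           # coverage proportion K_n / m
mean_prop = total / S
exact = 1 - (1 - 1 / m) ** m        # E[K_n / m] for n = m
print(mean_prop, exact, 1 - math.exp(-1))
assert abs(mean_prop - exact) < 0.01
assert abs(exact - (1 - math.exp(-1))) < 0.02
```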
For any particular data point \(i\), it can similarly be shown that \({\mathbb{P}}\left(i\in {\mathcal{J}}_{n}\right)\to 1-1/e\). Efron and Tibshirani (1997) note that error rate analysis suffers from bias when a data point used for prediction purposes (the “test point”) is also included in the training sample, and this is usually dealt with in cross-validation by using the “leave one out” method (i.e., the test point is left out of the training set used for constructing its prediction). However, in bootstrapping analysis the resample will include the test point with approximate probability \(0.632\). The “0.632+ bootstrap method” formulates an estimator that takes a weighted average of two estimators with known upward and downward biases, where the probabilities of inclusion/exclusion of the test point are used as weightings in the method. Our purpose here is not to recommend this method. (Indeed, the present author has a great deal of scepticism towards bootstrapping methods, but that is beyond the scope of this paper.) It is simply to note that analysis of coverage of the original data points is an important issue in the analysis of bootstrapping and resampling, and it leads to methods that take account of coverage probabilities pertaining to the original sample.
Suppose now that we look more broadly than the bootstrap, at resampling methods that may use a different number of resample values than were in the original sample (i.e., allowing for the case where \(n\ne m\)). One natural question in this context is how many points one should resample in order to get some stipulated minimum probability of a particular level of coverage of the original sample (e.g., including at least \(k\) distinct points from the original sample). The probability that a resample of size \(n\) covers at least \(k\) data points in the original sample is:
Given some stipulated minimum probability \(0<\phi <1\) we can use the occupancy distributions to find the required resample size for this problem:
The value \({\widehat{n}}_{k}\left(\phi \right)\) is the smallest number of resampled values required to give a probability of at least \(\phi\) of an occupancy number at least \(k\). Computation of this quantity allows an analyst to pre-determine the required resample size for a coverage requirement on the original sample. The special case where we seek full coverage of the original sample (i.e., \({K}_{n}=m\)) is a variation of the coupon-collector problem. Investigations of this kind can be of use if an analyst wishes to undertake resampling in a manner that is likely to give some specified level of coverage of the original sample.
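The required resample size can be computed by a simple search over the classical occupancy distribution. In the sketch below (function names are ours, and the threshold is written `p_min` to avoid a clash with the scale parameter \(\phi\) used earlier) the occupancy distribution is obtained by iterating the pure-birth transition of the chain:

```python
def occ_dist(n, m):
    """Classical occupancy distribution of K_n (theta = 1), computed by
    iterating the pure-birth transition of the occupancy Markov chain."""
    dist = [1.0] + [0.0] * m
    for _ in range(n):
        new = [0.0] * (m + 1)
        for j, p in enumerate(dist):
            up = (m - j) / m
            new[j] += p * (1 - up)
            if j < m:
                new[j + 1] += p * up
        dist = new
    return dist

def required_resample_size(m, k, p_min):
    """Smallest n such that P(K_n >= k) >= p_min."""
    n = k
    while sum(occ_dist(n, m)[k:]) < p_min:
        n += 1
    return n

# coupon-collector variant: full coverage of m = 25 points with probability >= 0.5
n_hat = required_resample_size(25, 25, 0.5)
print(n_hat)
assert n_hat > 25
assert sum(occ_dist(n_hat, 25)[25:]) >= 0.5
assert sum(occ_dist(n_hat - 1, 25)[25:]) < 0.5
```

Recomputing the distribution from scratch at each \(n\) is wasteful but keeps the sketch short; an incremental update of the distribution across \(n\) would be the natural refinement.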
8 Summary and Concluding Remarks
Our goal in this paper has been to derive and discuss three interesting distributions arising from the extended occupancy problem. This problem can be framed as a pure-birth Markov chain describing the evolution of the occupancy number as more and more balls are added to a fixed number of bins, with some fixed probability of occupancy for each ball. The three distributions we have examined arise to describe the behaviour of various aspects of this Markov chain — the occupancy number, the excess hitting time for the occupancy number, and the “spillage” describing the difference between the effective number of balls and the occupancy number. It is interesting that all three distributions involve the noncentral Stirling numbers of the second kind, and the first two distributions generalise other well-known discrete distributions.
Setting aside their statistical derivation, the mathematical form of these distributions is also interesting, insofar as each distribution can be framed as the distributional analogue to a well-known equation for the noncentral Stirling numbers of the second kind (i.e., each corresponds to a normed version of a summation result involving the noncentral Stirling numbers of the second kind). The occupancy distribution arises as the distributional analogue to the equation expressing a power-sum as a sum of falling factorials of one of the values, with the noncentral Stirling numbers of the second kind arising as the coefficients. The negative occupancy distribution arises as the distributional analogue to the ordinary generating function of the noncentral Stirling numbers of the second kind. Finally, the spillage distribution arises as the distributional analogue to the equation expressing the noncentral Stirling numbers of the second kind as a weighted sum of the (central) Stirling numbers of the second kind.
The occupancy distributions in this paper arise in contexts where we undertake simple-random-sampling with replacement from a finite set of items, and we then examine the “occupancy number” and related quantities pertaining to the sample. This also arises in bootstrapping and other resampling techniques, where we can use the occupancy distributions to describe the stochastic behaviour of various quantities looking at the coverage of the original sample.
We hope that this inquiry has given the reader an appreciation for the various ways that the noncentral Stirling numbers of the second kind arise in the extended occupancy problem, and has likewise given an appreciation for the fact that this simple problem generates distributions that provide analogues to a wide range of equations involving the noncentral Stirling numbers of the second kind. Both the statistical and mathematical aspects of these three distributions are interesting, and they provide a broad class of discrete distributional forms created from the noncentral Stirling numbers of the second kind. It is particularly interesting that our first two distributions provide generalisations of the binomial and negative binomial distributions, with an additional parameter \(0\le m\le \infty\) that has the effect of “squashing” the occupancy number when we impose a finite value. Imposing a finite number of bins on the extended occupancy problem will tend to give a lower value of the occupancy number, and thus a higher value of the excess hitting time, than would be the case if balls were allocated among an infinite number of bins. This also leads to a non-trivial distribution for the “spillage”, measuring the difference between the effective number of balls and the occupancy number. As a reference, we give some tables (Tables 1, 2, 3 and 4) below that summarise our three distributions, and summarise the mixture results involving these distributions.
Availability of Data and Materials
Not applicable.
Notes
The setup for the observed data in this problem is similar to that occurring in problems with missing data. The outcome \({U}_{i}= \bullet\) is akin to the ball being “missing”.
Since balls that fall through their bins make no contribution to the occupancy, we can legitimately ignore loss of information of which bin they were allocated to in this case.
It is worth noting that we can allow argument values \(k>\mathrm{min}\left(n,m\right)\) in this expression, and in these cases, the stated expression for the mass function reduces down to zero (see e.g., Ruiz 1996). This means that we can validly use this mass function for all \(k=0,1,2,\dots\) and the argument values above the specified support will have zero probability. This is a useful aspect of the mass function, since it allows us to use this function for any non-negative argument value, which allows us to play “fast and loose” with the upper bound of sums over the mass function. This will be especially valuable when we look at expected values of functions of the occupancy number (e.g., the moments of the distribution).
We will examine the special case where \(m=\infty\) in the next section, and there we will show that the extended occupancy distribution degenerates to the binomial distribution in this case.
In the trivial case where \(\theta =0\) we have \(\mathrm{Occ}\left(k|n,m,\theta \right)={\mathbb{I}}\left(k=0\right)\) so that the distribution is a point mass on \(k=0\). This reflects the fact that setting the probability parameter to zero means that all the balls fall through their allocated bins with probability one. This trivial case is not particularly interesting, so we have removed it from the scope of our analysis in this paper. In all subsequent equations we will assume that \(\theta >0\).
The latter is easily achieved by taking \(n\to \infty\), \(m\to \infty\) and \(\theta \to 0\) such that \(n\theta \to \lambda\). Taking these limits gives \(m\left(1-\mathrm{exp}\left(-\lambda /m\right)\right)\to \lambda\) so that \(\mathrm{Occ}\left(k|n,m,\theta \right)\to \mathrm{Pois}\left(k|\lambda \right)\).
That is, the function \(\Lambda \left(t,k\right)\equiv {\mathbb{P}}\left({K}_{n+1}\le k|{K}_{n}=t\right)\) is strictly increasing in \(t\) for every \(k\). (In the degenerate case where \(\theta =0\) it is only non-decreasing.) This can be established from the transition probability matrix \(\mathbf{P}\).
Since the occupancy process is a “pure birth process”, the occupancy number can increase by no more than one unit with each ball, which means that reaching the occupancy number \({K}_{n}=k\) requires at least \(n=k\) balls. Our “excess” hitting time measures the number of “excess” balls (i.e., the number of balls beyond this minimum) needed to obtain the stipulated occupancy number.
In this equation we are using the “standard” parameterisation of the negative binomial distribution, giving the probability of seeing \(t\) “failures” before we first get \(k\) “successes”, where the probability parameter in the mass function is the probability of a “failure”. In case of any doubt, we stipulate the definition:
$$\begin{array}{ccc}\mathrm{NegBin}\left(t|k,p\right)=\left(\begin{array}{c}k+t-1\\ t\end{array}\right)\cdot {p}^{t}\cdot {\left(1-p\right)}^{k}& & \text{for all }t=0,1,2,\dots \end{array}$$
Note that the negative binomial distribution here is parameterised in terms of the probability \(1-\theta\) of falling through the bin. In the context of our analysis, the “failures” are balls that do not contribute to the occupancy number (either because they fall through their bins or because they occupy a bin that is already occupied) and the “successes” are balls that occupy a new bin and therefore increase the occupancy number. In the case where \(m=\infty\) the parameter \(\theta\) is the probability of occupancy, which is the probability of a “success”, so the corresponding probability of a “failure” is \(1-\theta\); this is why this value appears as the probability parameter for the negative binomial distribution.
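As a quick numerical check of this parameterisation (assuming only the mass function displayed above), the mass sums to one and the mean number of “failures” is \(kp/\left(1-p\right)\):

```python
from math import comb

def negbin_pmf(t, k, p):
    # NegBin(t | k, p) as defined above: the probability of t "failures"
    # before the k-th "success", parameterised by the failure probability p.
    return comb(k + t - 1, t) * p ** t * (1 - p) ** k

k, p = 3, 0.4
pmf = [negbin_pmf(t, k, p) for t in range(200)]

# The mass sums to one (the tail beyond t = 200 is negligible here).
assert abs(sum(pmf) - 1) < 1e-12
# Mean number of failures before the k-th success is k * p / (1 - p).
mean = sum(t * pmf[t] for t in range(200))
assert abs(mean - k * p / (1 - p)) < 1e-9
```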
The case where \(\phi =0\) gives the point-mass distribution \(\mathrm{Spillage}\left(r|n,k,0\right)={\mathbb{I}}\left(r=n-k\right)\). Additionally, the case where \(\phi =\infty\) is defined via the appropriate limit, as the point-mass distribution:
$$\begin{array}{c}\mathrm{Spillage}\left(r|n,k,\infty \right)\equiv \underset{\phi \to \infty }{\mathrm{lim}}\mathrm{Spillage}\left(r|n,k,\phi \right)={\mathbb{I}}\left(r=0\right).\end{array}$$
To avoid ambiguity, we need to comment on the case where \(m=\infty\) and \(\theta =1\), which gives an indeterminate form for the scale parameter. This case corresponds to allocation of balls to an infinite number of bins, with zero probability of falling through the bins, and so it should give \({n}_{\mathrm{eff}}=n=k\) with probability one. By convention, in this case we set \(\phi \equiv 0\) so that the distribution for the spillage is a point mass on the value \(r=n-k\). Note that even with this convention, since the values \(n\) and \(k\) are conditioning parameters in the distribution, it is possible to stipulate values \(n\ne k\), and this can give pathological results. This occurs because we are dealing with a conditional distribution where it is possible to stipulate conditioning values that occur jointly with probability zero; the fact that an unusual result can occur in this case is of no consequence.
The second of these conditions is a restatement of an algebraic identity given in Ruiz (1996).
In the cases where \(k>n\) we define \(\Pi \left(n,k,\phi \right)\equiv 0\) by convention. We allow infinite inputs for the argument variables, and in these cases we define the output of the function by the corresponding limits.
Mathematics Subject Classification
60C05: Combinatorial probability
60E05: Probability distributions: general theory
60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
References
Adler I, Oren S, Ross SM (2003) The coupon collector’s problem revisited. J Appl Probab 40(2):513–518
Charalambides CA (2005) Combinatorial Methods in Discrete Distributions. John Wiley and Sons, New York
Daley DJ (1968) Stochastically monotone Markov chains. Probab Theory Relat Fields 10(4):305–317
Dawkins B (1991) Siobhan’s problem: the coupon-collector revisited. Am Stat 45(1):76–82
Efron B (1983) Estimating the error rate of a prediction rule: improvement on cross-validation. J Am Stat Assoc 78(382):316–331
Efron B (1986) How biased is the apparent error rate of a prediction rule? J Am Stat Assoc 81(394):461–470
Efron B, Tibshirani R (1997) Improvements on cross-validation: the .632+ bootstrap method. J Am Stat Assoc 92(438):548–560
Feller W (1950) An Introduction to Probability Theory and its Applications. John Wiley and Sons, New York
Hall P (1992) The Bootstrap and Edgeworth Expansion. Springer-Verlag, New York
Harkness (1969) The classical occupancy problem revisited. Technical Report, Penn State University. Reprinted in Patil GP (ed) (1970) Random Counts in Scientific Work (Volume 3). Penn State Statistics Series, Penn State University Press, pp 107–120
Holst L (1986) On birthday, collectors’, occupancy and other classical urn problems. Int Stat Rev 54(1):15–27
Hwang HK, Janson S (2008) Local limit theorems for finite and infinite urn models. Ann Probab 36(3):992–1022
Johnson NL, Kotz S (1969) Discrete Distributions. John Wiley and Sons, New York
Johnson NL, Kotz S (1977) Urn models and their applications. John Wiley and Sons, New York, pp 107–175, 318–370
Kolchin VF, Sevast’yanov BA, Chistyakov VP (1978) Random Allocations. John Wiley and Sons, New York
Koutras M (1982) Non-central Stirling numbers and some applications. Discret Math 42(1):73–89
O’Neill B (2021) The classical occupancy distribution: computation and approximation. Am Stat 75(4):355–363
Park CJ (1972) A note on the classical occupancy problem. Ann Math Stat 43(5):1698–1701
Ruiz S (1996) An algebraic identity leading to Wilson’s theorem. Math Gaz 80(489):579–582
Samuel-Cahn E (1974) Asymptotic distributions for occupancy and waiting time problems with positive probability of falling through the cells. Ann Probab 2(3):515–521
Uppuluri VRR, Carpenter JA (1971) A generalization of the classical occupancy problem. J Math Anal Appl 34(2):316–324
Acknowledgements
The author would like to thank two anonymous referees at the Journal for Statistical Distributions and Applications (JSDA) who provided useful suggestions to improve a previous draft of this paper. The author would also like to thank the editors of Methodology and Computing in Applied Probability (MCAP) for accepting this paper after publication at JSDA fell through due to unforeseen circumstances (see note on publication history below).
Note on Publication History
This paper has had a complicated publication history which may bear noting. The paper was first written by November 2019 and final revisions for publication in the Journal for Statistical Distributions and Applications (JSDA) (Springer) were done in July 2021. The paper was accepted for publication in that journal and proofs and copyright permission were completed with Springer. Unfortunately, JSDA ceased publication with Springer at around this time, and the publication of the present paper was delayed. (We published a preprint to arXiv on 6 September 2022 to allow visibility of the paper, since other research was building on it.) The paper was accepted to appear in MCAP as an alternative outlet. The author has made some minor revisions to incorporate additional references during this time.
Funding
Open Access funding enabled and organized by CAUL and its Member Institutions. There was no specific funding for this paper or project.
Contributions
The sole author undertook all work on the present paper (i.e., conception, analysis and writing for the paper).
Ethics declarations
Competing Interests
The author declares that he has no competing interests.
Appendices
Appendix 1: Noncentral Stirling Numbers of the Second Kind
In this appendix we set out some basic material on noncentral Stirling numbers of the second kind, which run through much of our analysis. At the outset, we note a trap for the unwary, which is that there are two alternative definitions of these numbers in the literature. Our definition of these numbers is equivalent to the one presented in Charalambides (2005, p. 75, Eqn. 2.7), using the values \(S\left(n,k,\phi \right)\) that satisfy the expansion:
where the values \({\left(t\right)}_{k}={\prod }_{i=0}^{k-1}\left(t-i\right)\) are falling factorials. We can see from this equation that the noncentral Stirling numbers of the second kind are the coefficients of the expansion that converts a power of a sum into a sum of falling factorials. The values \(n\) and \(k\) are non-negative integers and the value \(\phi\) is a real noncentrality parameter. In the special case where \(\phi =0\) we obtain the standard (central) Stirling numbers of the second kind, which satisfy the simpler equation \({t}^{n}=\sum_{k=0}^{n}S\left(n,k\right){\left(t\right)}_{k}\). The Stirling numbers of the second kind arise in discrete analysis, mostly through these conversion equations. Using the forward difference operator \(\Delta\) these numbers can be written as \(S\left(n,k,\phi \right)={\left[{\Delta }^{k}{t}^{n}/k!\right]}_{t=\phi }\) (p. 76, Eqn. 2.21). This method of framing the operator is also used by Uppuluri and Carpenter (1971) (p. 316).
Note: The two alternative definitions of the noncentral Stirling numbers of the second kind correspond to the two directions in which the noncentrality parameter can be expressed. The definition used here differs from the definition used in Koutras (1982, pp. 81–84), where the author measures noncentrality in the other direction, using the noncentrality parameter \(a=-\phi\). When using this latter definition, all the resulting equations involving these numbers include negative signs attached to the noncentrality parameter (and powers of negative signs in the various expansions). Either definition is perfectly serviceable; they merely express noncentrality in different directions. However, when dealing with the extended occupancy problem, it is more natural to use the definition that avoids adding unnecessary negative signs to our various equations. The reader should of course bear our definition in mind when applying any of the equations in this paper; use of our equations with the alternative definition of these numbers leads to errors. □
Our goal in this paper concerns analysis of the extended occupancy problem, so we will not derive the properties of the Stirling numbers. Charalambides (2005, pp. 73–96) provides a comprehensive introduction to these numbers, with relevant derivations of properties, and this is a useful reference for the interested reader. We give a selection of useful results here. There are several explicit forms for the noncentral Stirling numbers of the second kind, which are each useful in different contexts. One explicit form (p. 85, Eqn. 2.26) is:
The noncentral Stirling numbers of the second kind can also be written explicitly as a sum of powers of values that are increments of the noncentrality parameter (pp. 93–94), as:
where the sum is taken over the set of all vectors \({\varvec{j}}=\left({j}_{1},\dots ,{j}_{k}\right)\) of non-negative integer values with sum \(\sum_{i}{j}_{i}\le n-k\). In practice, the noncentral Stirling numbers of the second kind are usually computed via the following triangular recursive equation (p. 88, Eqn. 2.34):
combined with the initial conditions:
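The recursion and initial conditions referenced above can be implemented directly; the sketch below assumes the standard form \(S\left(n+1,k,\phi \right)=S\left(n,k-1,\phi \right)+\left(k+\phi \right)S\left(n,k,\phi \right)\) with \(S\left(0,0,\phi \right)=1\) and \(S\left(0,k,\phi \right)=0\) for \(k>0\) (cf. Charalambides 2005), and cross-checks it against an explicit finite-difference form:

```python
from math import comb, factorial

def stirling2_nc(n, k, phi):
    # Triangular recursion (assumed standard form, cf. Charalambides 2005,
    # Eqn. 2.34): S(n+1, k, phi) = S(n, k-1, phi) + (k + phi)*S(n, k, phi),
    # with S(0, 0, phi) = 1 and S(0, k, phi) = 0 for k > 0.
    S = [[0.0] * (n + 1) for _ in range(n + 1)]
    S[0][0] = 1.0
    for nn in range(1, n + 1):
        for kk in range(nn + 1):
            S[nn][kk] = (kk + phi) * S[nn - 1][kk]
            if kk > 0:
                S[nn][kk] += S[nn - 1][kk - 1]
    return S[n][k]

def stirling2_nc_explicit(n, k, phi):
    # Explicit finite-difference form (cf. Eqn. 2.26):
    # S(n, k, phi) = (1/k!) * sum_j (-1)^(k-j) * C(k, j) * (phi + j)^n.
    return sum((-1) ** (k - j) * comb(k, j) * (phi + j) ** n
               for j in range(k + 1)) / factorial(k)

phi = 2.5
for n in range(8):
    for k in range(n + 1):
        assert abs(stirling2_nc(n, k, phi)
                   - stirling2_nc_explicit(n, k, phi)) < 1e-8
```

The recursion has all positive terms, so it is the numerically preferable route for large arguments; the explicit alternating sum suffers cancellation as \(\phi\) grows.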
Many properties of the Stirling numbers are derived using generating functions. The ordinary generating function for these numbers (p. 93, Eqn. 2.40) is:
The exponential generating function for these numbers (p. 80, Eqn. 2.24) is:
A related generating function (p. 80, Eqn. 2.22) is:
Our analysis looks at recursive properties of distributions involving the noncentral Stirling numbers of the second kind. We first note that these numbers can be written in terms of the (central) Stirling numbers of the second kind via the equation (p. 77, Eqn. 2.16):
Lemma 1 (Recursive and Differential Equations)
We have:
Lemma 2 (Telescoping Equation)
We have:
Lemma 3 (Moving the Noncentrality Parameter)
For all \(\phi >0\) and \(\phi ^{\prime}>0\) we have:
To facilitate our analysis we will formulate a function giving a scaled version of the noncentral Stirling numbers of the second kind in the case where \(\phi >0\). For fixed values of \(n\) and \(k\) the second of these equations is an explicit polynomial in \(\phi\). It can be written in long-form as:
Dividing by the leading term of this polynomial gives the scaled Stirling function:
The scaled Stirling function can be written in various useful forms, among which is:
(Here we use the “falling factorials” \({\left(m\right)}_{t}=\prod_{i=0}^{t-1}\left(m-i\right)\) to expand binomial coefficients.) This is a rational function of \(\phi\) that reduces to a polynomial in \(1/\phi\). The polynomial is scaled so that its constant term is unity. The scaled Stirling function is shown below in Fig. 3 for fixed values of \(n\) and \(k\), plotted as a function of \(\phi\) on a dual-logarithmic scale. As \(\phi \to 0\) we have asymptotic equivalence \(\mathrm{log\,\Pi }\left(n,k,\phi \right) \sim \mathrm{log}\,S\left(n,k\right)-\mathrm{log}\left(\genfrac{}{}{0pt}{}{n}{k}\right)-\left(n-k\right)\,\mathrm{log}\,\phi\), which means that on the dual-logarithmic scale the curves left-converge to lines with slope \(-\left(n-k\right)\).
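The scaled Stirling function can be sketched numerically; assuming the definition \(\Pi \left(n,k,\phi \right)=S\left(n,k,\phi \right)/\left[\left(\genfrac{}{}{0pt}{}{n}{k}\right){\phi }^{n-k}\right]\) (the Stirling number divided by the leading term of its polynomial in \(\phi\)), the monotonicity and limiting behaviour described here and in Lemma 4 can be verified directly:

```python
from math import comb

def stirling2_nc(n, k, phi):
    # Noncentral Stirling numbers via the positive-term triangular
    # recursion (numerically stable; no alternating-sum cancellation).
    row = [1.0]                       # row n = 0: S(0, 0, phi) = 1
    for nn in range(1, n + 1):
        new = [phi * row[0]]          # S(nn, 0, phi) = phi ** nn
        for kk in range(1, nn + 1):
            prev = row[kk] if kk < len(row) else 0.0
            new.append(row[kk - 1] + (kk + phi) * prev)
        row = new
    return row[k]

def scaled_stirling(n, k, phi):
    # Assumed definition: the Stirling number divided by the leading
    # term C(n, k) * phi^(n - k) of its polynomial in phi.
    return stirling2_nc(n, k, phi) / (comb(n, k) * phi ** (n - k))

n, k = 9, 4
# Monotonically decreasing in phi, tending to 1 as phi grows (Lemma 4a).
vals = [scaled_stirling(n, k, phi) for phi in (0.5, 1.0, 2.0, 4.0, 8.0, 1e6)]
assert all(a > b for a, b in zip(vals, vals[1:]))
assert abs(vals[-1] - 1) < 1e-4
# Boundary cases Pi(n, 0, phi) = Pi(n, n, phi) = 1 (Lemma 4c).
assert abs(scaled_stirling(n, 0, 2.0) - 1) < 1e-9
assert abs(scaled_stirling(n, n, 2.0) - 1) < 1e-9
```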
The support of the scaled Stirling function occurs over the subdomain \(0\le k\le n\le \infty\). Within this support, the function can be written in its “canonical form” as follows. Define the set \({\mathcal{J}}_{n,k}\) containing all vectors \({\varvec{j}}=\left({j}_{1},\dots ,{j}_{k}\right)\) of non-negative integer values with sum \(\sum_{i}{j}_{i}\le n-k\). The number of elements in this set is equal to the number of ways to distribute \(n-k\) indistinguishable objects among \(k+1\) distinguishable categories. Using the classic “stars and bars” combinatorial method of Feller (1950) we can represent this as \(k\) “bars” placed among \(n-k\) “stars”, which yields \(\left|{\mathcal{J}}_{n,k}\right|=\left(\genfrac{}{}{0pt}{}{n}{k}\right)\). (Note that this set is empty if we are outside the support of \(\Pi\).) Since \(\phi >0\) over the domain of our function, we can take the corresponding equation for the noncentral Stirling numbers of the second kind and divide it through by \({\phi }^{n-k}\) to get the canonical form:
This equation can be given a probabilistic interpretation, by observing that averaging over a set of vector inputs is equivalent to finding the expected value of that function for a random vector that is uniformly distributed over the aforementioned set. To formalise this idea, we define the function \(H\left({\mathbf{J}}_{k},\phi \right)\equiv {\left(1+1/\phi \right)}^{{J}_{1}}\cdots {\left(1+k/\phi \right)}^{{J}_{k}}\). Taking \({\mathbf{J}}_{k}\sim \mathrm{U}\left({\mathcal{J}}_{n,k}\right)\) as a random vector that is uniformly distributed over the set \({\mathcal{J}}_{n,k}\), we then have:
In order to derive certain properties of the scaled Stirling function, it is useful to decompose this function using the law of total probability, by conditioning on \(\dot{J}\equiv \sum_{i}{J}_{i}\). To formalise this, for each \(\ell=k,\dots ,n\) we let \({\mathcal{U}}_{\ell,k}\) denote the set of all vectors \({\varvec{j}}=\left({j}_{1},\dots ,{j}_{k}\right)\) containing non-negative integer values with sum \(\sum_{i}{j}_{i}=\ell-k\). Using the same “stars and bars” argument used above, we can establish that \(\left|{\mathcal{U}}_{\ell,k}\right|=\left(\genfrac{}{}{0pt}{}{\ell-1}{k-1}\right)\), and so we have:
Defining the conditional expectation function \(\Lambda \left(\ell,k,\phi \right)\equiv {\mathbb{E}}\left(H\left({\mathbf{J}}_{k},\phi \right)|\,\dot{J}=\ell-k\right)\), and using the law of total probability, we can decompose the scaled Stirling function as:
We can now establish some useful asymptotic properties for the scaled Stirling function, which we state in conjunction with corresponding limiting results.
Lemma 4 (Monotonicity, Concavity, and Asymptotics)
The scaled Stirling function has the following monotonicity, concavity, and asymptotic properties over its support.
(a) If \(k<n\) the function is monotonically decreasing in \(\phi\) with limits:
$$\begin{array}{ccc}\underset{\phi \downarrow 0}{\mathrm{lim}}\,\Pi \left(n,k,\phi \right)=\infty & & \underset{\phi \uparrow \infty }{\mathrm{lim}}\,\Pi \left(n,k,\phi \right)=1.\end{array}$$
(b) The function is monotonically increasing in \(n\) with limits:
$$\begin{array}{ccc}\Pi \left(k,k,\phi \right)=1& & \underset{n\uparrow \infty }{\mathrm{lim}}\,\Pi \left(n,k,\phi \right)=\infty .\end{array}$$
(c) The function has limits in \(k\) given by:
$$\begin{array}{ccc}\Pi \left(n,0,\phi \right)=1& &\Pi \left(n,n,\phi \right)=1.\end{array}$$
Appendix 2: Proofs of Theorems
Proof of Lemma 1
To establish the first and second equations we use the expansion:
For the first equation in the lemma we extract the term \(\left(i+\phi \right)=\left(k+\phi \right)-\left(k-i\right)\) to obtain the two parts of required sum, and for the second we extract the term \(\left(i+\phi \right)=\left(\phi \right)+\left(i\right)\) to obtain the two parts of the required sum. Equating the right-hand-sides of the first and second equations and simplifying gives \(S\left(n,k-1,\phi +1\right)=k\cdot S\left(n,k,\phi \right)+S\left(n,k-1,\phi \right)\), which gives the third equation. Finally, differentiating with respect to \(\phi\) gives:
which establishes the last equation. ■
Proof of Lemma 2
We prove the lemma by induction. For \(n=k-1\) the equation reduces to \(S\left(k,k,\phi \right)=S\left(k,k,\phi \right)\) and for \(n<k-1\) the equation reduces to \(0=0\). Assuming the equation holds for some \(n\) we can apply the first equation of Lemma 1 to obtain:
This establishes the inductive step, which completes the proof. ■
Proof of Lemma 3
Using the expansion of the noncentral Stirling numbers of the second kind in terms of the (central) Stirling numbers of the second kind, we have:
Taking \({\phi }^{\prime}=\left({\phi }^{\prime}-\phi \right)+\phi\) and expanding via the binomial theorem then gives:
which was to be shown. ■
Proof of Lemma 4
We establish the monotonicity, concavity, and asymptotic properties of the function in the specified order, using the canonical form of the function.
(a)
Each of the terms \(\left(1+i/\phi \right)\) for \(i=1,\dots ,k\) are clearly monotonically decreasing in \(\phi\). Since each of the indices \({j}_{1},\dots ,{j}_{k}\) are non-negative (and at least some of these elements will be strictly positive over at least some of the elements of the sum) the entire sum is therefore monotonically decreasing in \(\phi\), which establishes the monotonicity requirement in the theorem. For the lower limit we have:
$$\begin {aligned} \underset{\phi \to 0}{\mathrm{lim}}\,\Pi \left(n,k,\phi \right)\, &=\underset{\phi \to 0}{\mathrm{lim}}\sum_{{\mathbf{j}}}{\left(1+\frac{1}{\phi }\right)}^{{j}_{1}}\dots {\left(1+\frac{k}{\phi }\right)}^{{j}_{k}}\Bigg{/}\left(\begin{array}{c}n\\ k\end{array}\right)\\&\ge \underset{\phi \to 0}{\mathrm{lim}}\sum_{{\mathbf{j}}}{\left(1+\frac{1}{\phi }\right)}^{{j}_{1}+\dots +{j}_{k}}\Bigg{/}\left(\begin{array}{c}n\\ k\end{array}\right)\\&\ge \underset{\phi \to 0}{\mathrm{lim}}{\left(1+\frac{1}{\phi }\right)}^{n-k}\Bigg{/}\left(\begin{array}{c}n\\ k\end{array}\right)=\infty .\end {aligned}$$The upper limit is:
$$\begin {aligned}\underset{\phi \to \infty }{\mathrm{lim}}\Pi \left(n,k,\phi \right)\, &=\underset{\phi \to \infty }{\mathrm{lim}}\sum_{{\mathbf{j}}}{\left(1+\frac{1}{\phi }\right)}^{{j}_{1}}\dots {\left(1+\frac{k}{\phi }\right)}^{{j}_{k}}\Bigg{/}\left(\begin{array}{c}n\\ k\end{array}\right)\\&=\sum_{{\mathbf{j}}}{1}^{{j}_{1}}\dots {1}^{{j}_{k}}\Bigg{/}\left(\begin{array}{c}n\\ k\end{array}\right)\\&=\left(\begin{array}{c}n\\ k\end{array}\right)\Bigg{/}\left(\begin{array}{c}n\\ k\end{array}\right)=1.\end {aligned}$$
(b)
Consider a problem with a fixed value of \(k\) but variable value of \(n\). Suppose we generate a sequence of independent random vectors \({\mathbf{J}}_{1},{\mathbf{J}}_{2},{\mathbf{J}}_{3},\dots\) as \({\mathbf{J}}_{\ell} \sim \mathrm{U}\left({\mathcal{U}}_{\ell,k}\right)\) and another sequence of independent random variables \({U}_{1},{U}_{2},{U}_{3},\dots \sim \mathrm{IID \,U}\left\{1,\dots ,k\right\}\) (all mutually independent). We clearly have the distributional equivalence:
$$H\left({\mathbf{J}}_{\ell},\phi \right)\cdot \left(1+{U}_{\ell}/\phi \right) \stackrel{\mathrm{Dist}}{\sim } H\left({\mathbf{J}}_{\ell+1},\phi \right).$$We also have:
$${\mathbb{E}}\left(1+{U}_{\ell}/\phi \right)=\frac{1}{k}\sum_{i=1}^{k}\left(1+\frac{i}{\phi }\right)=1+\frac{1}{\phi }\cdot \frac{k+1}{2}.$$It follows that:
$$\begin {aligned}\Lambda \left(\ell+1,k,\phi \right)\, &={\mathbb{E}}\left(H\left(\mathbf{J},\phi \right)|\dot{J}=\ell-k+1\right)\\&={\mathbb{E}}\left(H\left({\mathbf{J}}_{\ell+1},\phi \right)\right)\\&={\mathbb{E}}\left(H\left({\mathbf{J}}_{\ell},\phi \right)\cdot \left(1+{U}_{\ell}/\phi \right)\right)\\&={\mathbb{E}}\left(1+{U}_{\ell}/\phi \right)\cdot {\mathbb{E}}\left(H\left({\mathbf{J}}_{\ell},\phi \right)\right)\\&={\mathbb{E}}\left(1+{U}_{\ell}/\phi \right)\cdot {\mathbb{E}}\left(H\left(\mathbf{J},\phi \right)|\dot{J}=\ell-k\right)\\&=\left(1+\frac{1}{\phi }\cdot \frac{k+1}{2}\right)\cdot\Lambda \left(\ell,k,\phi \right).\end {aligned}$$Applying this to the decomposition of the scaled Stirling function gives:
$$\begin {aligned}\Pi \left(n+1,k,\phi \right)\, &=\sum_{\ell=k}^{n+1}\frac{\left(\genfrac{}{}{0pt}{}{\ell-1}{k-1}\right)}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(\ell,k,\phi \right)\\&=\frac{1}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(k,k,\phi \right)+\sum_{\ell=k+1}^{n+1}\frac{\left(\genfrac{}{}{0pt}{}{\ell-1}{k-1}\right)}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(\ell,k,\phi \right)\\&=\frac{1}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(k,k,\phi \right)+\sum_{\ell=k}^{n}\frac{\left(\genfrac{}{}{0pt}{}{\ell}{k-1}\right)}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(\ell+1,k,\phi \right)\\&=\frac{1}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(k,k,\phi \right)+\left(1+\frac{1}{\phi }\cdot \frac{k+1}{2}\right)\sum_{\ell=k}^{n}\frac{\left(\genfrac{}{}{0pt}{}{\ell}{k-1}\right)}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(\ell,k,\phi \right)\\&>\frac{1}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(k,k,\phi \right)+\left(1+\frac{1}{\phi }\cdot \frac{k+1}{2}\right)\sum_{\ell=k}^{n}\frac{\left(\genfrac{}{}{0pt}{}{\ell-1}{k-1}\right)}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(\ell,k,\phi \right)\\&=\frac{1}{\left(\genfrac{}{}{0pt}{}{n}{k}\right)}\cdot\Lambda \left(k,k,\phi \right)+\left(1+\frac{1}{\phi }\cdot \frac{k+1}{2}\right)\Pi \left(n,k,\phi \right)\\&>\left(1+\frac{1}{2\phi }\right)\cdot\Pi \left(n,k,\phi \right)>\Pi \left(n,k,\phi \right).\end {aligned}$$This establishes the monotonicity requirement in the theorem. For the lower limit we have \(\Pi \left(k,k,\phi \right)=1\) from the initial conditions for the noncentral Stirling numbers of the second kind. To obtain the upper limit we first note from the above inequalities that:
$$\Pi \left(n,k,\phi \right)=\prod_{\ell=k}^{n-1}\frac{\Pi \left(\ell+1,k,\phi \right)}{\Pi \left(\ell,k,\phi \right)}>{\left(1+\frac{1}{2\phi }\right)}^{n-k}.$$This inequality gives \({\mathrm{lim}}_{n\to \infty }\Pi \left(n,k,\phi \right)\ge {\mathrm{lim}}_{n\to \infty }{\left(1+1/2\phi \right)}^{n-k}=\infty\), which gives us the upper limit in the theorem.
(c)
The limits in this part follow trivially from substitution into the function and use of the initial conditions for the noncentral Stirling numbers of the second kind. ■
Proof of Theorem 1:
Let \({\mathcal{S}}_{n}\equiv {\mathcal{S}}_{n}\left({{\varvec{x}}}_{n}\right)\) be the set of bins occupied by the first \(n\) balls, and note that this is a function of \({{\varvec{x}}}_{n}\). Since \(\left|{\mathcal{S}}_{n}\right|={K}_{n}=t\) we have \({\mathbb{P}}\left({X}_{n+1}\notin {\mathcal{S}}_{n}|{{\varvec{x}}}_{n}\right)=1-t/m\), and we also have:
Hence, we have:
Since \({K}_{n+1}-{K}_{n}\) must be either zero or one, we then have:
which was to be shown. (Note that this function depends on \({{\varvec{x}}}_{n}\) only through \({K}_{n}=t\) so this also justifies the intermediate statement that replaces the conditioning value.) ■
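The transition probabilities established in this proof can be verified numerically: iterating the one-step transition reproduces the occupancy mass function (computed here via an assumed inclusion-exclusion form):

```python
from math import comb

def occ_pmf(k, n, m, theta):
    # Extended occupancy mass function (an assumed inclusion-exclusion
    # form over the set of exactly-occupied bins).
    return comb(m, k) * sum(
        (-1) ** i * comb(k, i) * ((k - i) * theta / m + 1 - theta) ** n
        for i in range(k + 1)
    )

n, m, theta = 7, 5, 0.6

# Iterate the one-step transition from Theorem 1:
# P(K_{j+1} = t+1 | K_j = t) = theta*(m - t)/m, else the chain stays at t.
p = [0.0] * (m + 1)
p[0] = 1.0
for _ in range(n):
    q = [0.0] * (m + 1)
    for t in range(m + 1):
        up = theta * (m - t) / m
        q[t] += p[t] * (1 - up)
        if t < m:
            q[t + 1] += p[t] * up
    p = q

# The n-step distribution of the chain matches the mass function.
for k in range(m + 1):
    assert abs(p[k] - occ_pmf(k, n, m, theta)) < 1e-12
```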
Proof of Theorem 2:
Since \(\mathbf{P}\) is an upper triangular matrix, its eigenvalues are its diagonal elements, which establishes the eigenvalue matrix in the theorem. Let \({\mathbf{v}}_{k}\) be the eigenvector corresponding to the eigenvalue \({\lambda }_{k}\). To confirm that \(\mathbf{v}\) is an eigenvector matrix, we need to establish that for each \(k=0,1,\dots ,m\) we have the characteristic equation:
Since \(m{\lambda }_{k}=m-\left(m-k\right)\theta\) we have:
Hence, the characteristic equation corresponds to the scalar equations:
It is easily shown that substitution of the values in the theorem satisfies these equations, which establishes the stated eigenvector matrix. Now, to establish the inverse eigenvector matrix we need to establish that \(\mathbf{w}\mathbf{v}={\varvec{I}}\). For all \(k=\mathrm{0,1},\dots ,m\) we have:
(In the step to the second line we use the fact that \(\left(\begin{array}{c}m-i\\ k-i\end{array}\right)\cdot \left(\begin{array}{c}m-k\\ j-k\end{array}\right)=\left(\begin{array}{c}m-i\\ j-i\end{array}\right)\cdot \left(\begin{array}{c}j-i\\ j-k\end{array}\right)\), which is easily established by expanding these terms out as ratios of factorials.) This shows that the inverse eigenvector matrix is indeed the inverse of the eigenvector matrix. The only remaining part of the theorem is the fact that the eigenvectors are linearly independent, which follows trivially from the fact that \(\mathbf{v}\) is an upper triangular matrix with non-zero elements on its main diagonal. ■
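A numerical sketch of the theorem (assuming NumPy is available): the eigenvalues of the upper triangular matrix \(\mathbf{P}\) are its diagonal elements \({\lambda }_{k}=\left(m-\left(m-k\right)\theta \right)/m\), and the resulting spectral decomposition reproduces the \(n\)-step transition matrix:

```python
import numpy as np

m, theta, n = 6, 0.7, 9

# Transition matrix of the occupancy chain: from state k the chain
# moves to k+1 with probability theta*(m - k)/m and otherwise stays.
P = np.zeros((m + 1, m + 1))
for k in range(m + 1):
    up = theta * (m - k) / m
    P[k, k] = 1.0 - up
    if k < m:
        P[k, k + 1] = up

# P is upper triangular, so its eigenvalues are its diagonal elements
# lambda_k = (m - (m - k)*theta)/m, which are distinct for theta > 0.
lam = np.array([(m - (m - k) * theta) / m for k in range(m + 1)])
eigvals, V = np.linalg.eig(P)
assert np.allclose(np.sort(eigvals), np.sort(lam))

# The spectral decomposition reproduces the n-step transition matrix.
Pn = V @ np.diag(eigvals ** n) @ np.linalg.inv(V)
assert np.allclose(Pn, np.linalg.matrix_power(P, n))
```

Since the eigenvalues are distinct for \(\theta >0\), the matrix is diagonalisable and the eigenvector matrix is invertible, as the theorem asserts.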
Proof of Theorem 3:
With a bit of algebra it can be shown that:
We therefore have:
which was to be shown. ■
Proof of Theorem 4a:
Since \(\theta >0\) we have:
Thus, the limit \(m\to \infty\) implies that \(\phi \equiv m\cdot \left(1-\theta \right)/\theta \to \infty\). Applying Lemma 4 then gives:
The result follows using the alternative form for the occupancy mass function (written in terms of the scaled Stirling function). ■
Proof of Theorem 4b:
Taking the limits \(n\to \infty\) and \(\theta \to 0\) such that \(n\theta \to \lambda\) gives:
Applying this limit to each term in the summation in occupancy distribution then gives:
which was to be shown. ■
Proof of Theorem 5a:
Using the noncentrality parameter \(\phi =m\cdot \left(1-\theta \right)/\theta\) we have:
We therefore have:
which was to be shown. ■
Lemma 5
The classical occupancy distribution satisfies the following recursive equation:
Proof of Lemma 5
The standard form for the classical occupancy distribution is written in terms of the falling factorials and the Stirling numbers, as:
We therefore have:
which was to be shown. ■
Proof of Theorem 5b:
We first note that with a bit of algebra it can be shown that:
Thus, we have:
Applying the binomial mixture in Theorem 12 and using Lemma 5 we have:
which was to be shown. ■
Proof of Theorem 5c:
Writing the occupancy distribution in explicit form we have:
Differentiating the occupancy distribution with respect to \(\theta\) gives:
which was to be shown. ■
Proof of Theorem 6a:
Applying the first recursive equation in Theorem 5 gives:
Thus, for all \(n\le n^{\prime}\) we have:
where the inequality is strict if \(n<{n}^{\prime}\) and \(m>1\) (and noting that \(\theta >0\)). ■
Proof of Theorem 6b:
As a preliminary step, we note that the second recursive equation in Theorem 5 can be applied for the case \(\theta =1\) to give the recursive formula:
To facilitate our analysis, we define the function \(\Psi\) by:
It is simple to establish that \(\Psi \left(m,0\right)=\Psi \left(m,1\right)=1\) and the function is strictly decreasing after this. For all \(r=\mathrm{0,1},2,\dots ,m-1\) we have:
Now, using the binomial mixture for the extended occupancy distribution in Theorem 12, and then applying this recursive formula, gives:
Thus, for all \(m\le m^{\prime}\) we have:
where the inequality is strict if \(m<m^{\prime}\) and \(n>1\) (and noting that \(\theta >0\)). ■
Proof of Theorem 6c:
Applying the differential equation in Theorem 5 gives:
Thus, for all \(\theta \le \theta ^{\prime}\) we have:
where the inequality is strict if \(\theta <\theta ^{\prime}\). ■
Proof of Theorem 7:
The proof is analogous to that of Theorem 4(a), using the alternative form for the negative occupancy mass function (written in terms of the scaled Stirling function). ■
Proof of Theorem 8a:
Using the second recursive equation in Theorem 5 we obtain:
which was to be shown. ■
Proof of Theorem 8b:
Using the first recursive equation in Theorem 5 we obtain:
Applying this recursive equation repeatedly to reduce the occupancy parameter in the second term gives us the desired result. (This result can be formally obtained by using induction with the present recursive equation.) ■
Proof of Theorem 8c:
Using the differential equation in Theorem 5 we obtain:
which was to be shown. ■
Proof of Theorem 9a:
To facilitate our analysis, define the updated probability parameter \({\theta }^{\prime}=m\theta /\left(1-\theta +m\right)\) which satisfies the equation \(m\cdot \left(1-{\theta }^{\prime}\right)/{\theta }^{\prime}=\left(m+1\right)\cdot \left(1-\theta \right)/\theta\). Also, recall the function \(\Psi\) defined in the proof of Theorem 6b. Using the alternative form of the negative occupancy distribution, it is possible to show that:
(We have equality if \(k=1\) and \(\theta =1\), and we have strict inequality if \(k>1\) or \(\theta <1\).) Thus, with the same conditions on strict inequality, we have:
Thus, for all \(m\le m^{\prime}\) we have:
where the inequality is strict if \({m}^{\prime}>m\) and either \(k>1\) or \(\theta <1\). ■
Proof of Theorem 9b:
Applying the second recursive equation in Theorem 8 gives:
Thus, for all \(k\le k^{\prime}\) we have:
where the inequality is strict if \(k<k^{\prime}\). ■
Proof of Theorem 9c:
The likelihood-ratio function for the negative occupancy distribution (comparing parameter values \(\theta\) and \(\theta ^{\prime}\)) is:
For \({\theta }^{\prime}\ge \theta\) this function is monotone non-increasing in \(t\), and for \({\theta }^{\prime}>\theta\) it is monotone decreasing in \(t\). The stochastic dominance result follows directly from this monotonicity property. ■
Proof of Theorem 10:
The proof is analogous to that of Theorem 4(a), but we set it out explicitly to avoid any confusion. Applying Lemma 4 gives:
which was to be shown. ■
Proof of Theorem 11:
Using the law of total probability we have:
which was to be shown. ■
Proof of Theorem 12:
This theorem is an application of Theorem 11 where \(N \sim \mathrm{Bin}\left(n,\theta \right)\). This distribution has probability generating function \({G}_{N}\left(z\right)={\left(1-\theta +\theta z\right)}^{n}\) so we have:
Applying Theorem 3 we have:
which establishes the main theorem. The special case holds trivially by substitution. ■
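The binomial mixture in Theorem 12 can be checked numerically: mixing the classical (\(\theta =1\)) occupancy distribution over a \(\mathrm{Bin}\left(n,\theta \right)\) number of “effective” balls recovers the extended occupancy distribution (mass functions computed via an assumed inclusion-exclusion form):

```python
from math import comb

def occ_pmf(k, n, m, theta):
    # Extended occupancy mass function (an assumed inclusion-exclusion form).
    return comb(m, k) * sum(
        (-1) ** i * comb(k, i) * ((k - i) * theta / m + 1 - theta) ** n
        for i in range(k + 1)
    )

n, m, theta = 8, 5, 0.55
for k in range(m + 1):
    # Mix the classical (theta = 1) occupancy distribution over a
    # Binomial(n, theta) number of effective balls.
    mixed = sum(
        comb(n, j) * theta ** j * (1 - theta) ** (n - j) * occ_pmf(k, j, m, 1.0)
        for j in range(n + 1)
    )
    assert abs(mixed - occ_pmf(k, n, m, theta)) < 1e-12
```

The mixture reflects the underlying mechanism: each ball is independently “effective” (does not fall through) with probability \(\theta\), and the effective balls then behave as in the classical occupancy problem.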
Proof of Theorem 13:
This result is an application of Theorem 11 where \(N \sim \mathrm{Pois}\left(\lambda \right)\). This distribution has probability generating function \({G}_{N}\left(z\right)=\mathrm{exp}\left(\lambda \left(z-1\right)\right)\) so we have:
Applying Theorem 3 and using the binomial theorem we have:
which establishes the first equation in the theorem. The second equation follows immediately by substitution of the parameter value \(\lambda =m\cdot \left|\mathrm{ln}\left(1-\phi \right)\right|/\theta\). ■
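Similarly, the generating function \({G}_{N}\left(z\right)=\mathrm{exp}\left(\lambda \left(z-1\right)\right)\) for \(N \sim \mathrm{Pois}\left(\lambda \right)\) used above can be checked numerically (a sketch only; the series is summed iteratively to avoid overflow in the factorial):

```python
from math import exp

def pois_pgf(lam, z, terms=100):
    # Truncated series: G_N(z) = sum_t e^{-lam} * lam^t / t! * z^t,
    # accumulating each term from the previous one.
    total, term = 0.0, exp(-lam)  # term for t = 0
    for t in range(terms):
        total += term
        term *= lam * z / (t + 1)
    return total

lam = 3.5
for z in (0.0, 0.5, 1.0, 1.5):
    assert abs(pois_pgf(lam, z) - exp(lam * (z - 1))) < 1e-9
```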
Proof of Theorem 14:
Let \(\phi =m\cdot \left(1-\gamma \right)/\gamma\) and \(\phi ^{\prime}=m\cdot \left(1-\gamma \theta \right)/\gamma \theta\) and note that:
By moving the noncentrality parameter using Lemma 3, we have:
Hence, we have:
which establishes the main theorem. The special case holds trivially by substitution. ■
Proof of Theorem 15:
This theorem can be established as a consequence of the law of total probability, using the known distributions established in the body of the paper:
which was to be shown. ■
O’Neill, B. Three Distributions in the Extended Occupancy Problem. Methodol Comput Appl Probab 25, 84 (2023). https://doi.org/10.1007/s11009-023-10053-y