A generalised matching distribution for the problem of coincidences

This paper examines the classical matching distribution arising in the "problem of coincidences". We generalise the classical matching distribution with a preliminary round of allocation in which items are correctly matched with some fixed probability, and remaining non-matched items are allocated using simple random sampling without replacement. Our generalised matching distribution is a convolution of the classical matching distribution and the binomial distribution. We examine the properties of this distribution and show how its probability functions can be computed. We also show how to use the distribution for matching tests and inferences of matching ability.

The history of the problem is examined in Takács (1980), and unsurprisingly, it is tied up with the historical derivation of the subfactorial numbers (representing the number of permutations with no matches).

The classical matching distribution
Each variant of the problem can be formulated in general terms by taking a set of n distinct items (labelled here as consecutive integers). Suppose we permute the order of these items at random to obtain a permutation vector π = (π_1, …, π_n), assumed to be equiprobable over all permutations of 1, …, n (i.e., simple random sampling without replacement). We then define the matching statistic as the number of "fixed points" in the permutation vector:

K ≡ #{i = 1, …, n : π_i = i}.

The matching statistic counts the number of items in the original vector that are in the same place after a random permutation. The classical version of the problem focused specifically on the probability that there are no matches, which is ℙ(K = 0). However, it is simple to extend the problem to consider the full distribution of the number of matches, which fully describes the stochastic behaviour of the matching statistic. This distribution (which we will call the "classical matching distribution") is defined below.
DEFINITION (Classical matching distribution): This is a family of discrete distributions characterised by the probability mass function:¹

Match(k|n) = (1/k!) ∑_{i=0}^{n−k} (−1)^i/i!   for k = 0, 1, …, n,

where we have a size parameter n = 0, 1, 2, …, ∞. In the special case where n = ∞ we take Match(k|∞) ≡ Pois(k|1) by convention (based on limiting results shown below). □

We begin by demonstrating that the matching statistic follows the matching distribution. Since each permutation of the initial items is (by assumption) equiprobable, there are n! equiprobable outcomes for the permutation vector π. The number of outcomes with K = k can be obtained by choosing k fixed values from the n elements and then counting the number of derangements of the remaining n − k elements, which we denote as D(n − k). The number of derangements is given by the subfactorial numbers, whose properties have been extensively studied (see e.g., Hassani 2003 and Hassani 2004). Applying the fundamental principle of counting then gives:

#{π : K = k} = C(n, k) · D(n − k).

Consequently, under the assumption that all possible permutations are equiprobable, we have:

ℙ(K = k) = C(n, k) · D(n − k)/n! = D(n − k)/(k! (n − k)!).

The values D(n − k) are the subfactorials, defined recursively by the base values D(0) = 1 and D(1) = 0 and the recursion D(n − k) = (n − k − 1)[D(n − k − 1) + D(n − k − 2)].
Consequently, we can write the probabilities for the matching statistic as:

ℙ(K = k) = D(n − k)/(k! (n − k)!) = (1/k!) ∑_{i=0}^{n−k} (−1)^i/i!.

In the definition of the matching distribution, we have allowed the special case n = ∞, which we conceive as a limiting case. To obtain this special case we examine the limiting form of the distribution that occurs when n → ∞. In this case, using a well-known limit result for the ratio of derangements to permutations (see e.g., Hassani 2003, pp. 1-2) we get:

lim_{n→∞} Match(k|n) = (1/k!) ∑_{i=0}^{∞} (−1)^i/i! = e^{−1}/k! = Pois(k|1).

We can see that the distribution approaches the Poisson distribution with unit scale in the limit as n → ∞. This result has been noted in much of the literature on the problem of coincidences (see e.g., Penrice 1991, Crain 1992, Knudsen and Skau 1996), and it implies the asymptotic probability lim_{n→∞} ℙ(K = 0) = 1/e, which gives an approximate solution to the original problem. In our definition of the matching distribution we have allowed n = ∞ as an explicit case, and we set the distribution equal to its limit in this case. This convention has the advantage of giving a family of distributions that is closed under limits on its size parameter.
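To make the derivation concrete, here is a short Python sketch (our own illustration, not code from the paper) that computes the classical matching probabilities from the subfactorial recursion and checks them against the Poisson limit:

```python
from math import comb, exp, factorial

def subfactorial(m):
    """Derangement count D(m) via the recursion D(m) = (m-1)[D(m-1) + D(m-2)]."""
    d = [1, 0]  # D(0) = 1, D(1) = 0
    for j in range(2, m + 1):
        d.append((j - 1) * (d[-1] + d[-2]))
    return d[m]

def match_pmf(k, n):
    """Classical matching probability P(K = k) = C(n, k) D(n-k) / n!."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfactorial(n - k) / factorial(n)

print(match_pmf(0, 4))                           # D(4)/4! = 9/24 = 0.375
print(sum(match_pmf(k, 12) for k in range(13)))  # probabilities sum to 1
print(abs(match_pmf(0, 12) - exp(-1)))           # rapid approach to 1/e
```

Note that P(K = n − 1) comes out as exactly zero here because D(1) = 0, which anticipates the remark below.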
REMARK: One noteworthy aspect of the matching statistic is that we cannot have K = n − 1, and therefore the matching distribution always has ℙ(K = n − 1) = 0. Intuitively, this reflects the fact that if n − 1 items are matched in the permutation then the last item must also be matched; that is, it is not possible to match all but one item in any permutation. Consequently, the matching distribution has a strange-looking support, with a missing value at n − 1. Since the distribution has a non-zero probability at K = n, this also means that (except in the trivial cases n = 0, 1, where the distribution is a point-mass) the distribution is not quasi-concave (unimodal) and is instead slightly bimodal. This slight bimodality of the distribution may need to be taken into account in some cases where the user forms a highest density region (HDR) for the distribution, though it is marginal even in this case. □

Important properties of the classical matching distribution
We will begin our examination of the classical matching distribution by deriving its moments.
As with many discrete distributions on the non-negative integers, it is easiest to examine the moments by first looking at the factorial moments and then using these to obtain the raw moments. Abramson (1973) has previously derived the factorial moments of the distribution and Feller (1968, p. 231) has derived the mean and variance of the distribution. We give a full derivation of all factorial and raw moments, the moment generating function, and the central moments up to fourth order. In Theorems 1-3 below we derive the form of the factorial moments, the raw moments and the moment generating function. As with the general form of the distribution, these results generalise the case of a Poisson distribution with unit scale.
THEOREM 1 (Factorial moments): The factorial moments of the matching distribution are:

E[(K)_r] = 1 for r ≤ n  and  E[(K)_r] = 0 for r > n.

THEOREM 2 (Raw moments): Let S(r, j) denote the Stirling numbers of the second kind. The raw moments of the matching distribution are:

E[K^r] = ∑_{j=0}^{min(r,n)} S(r, j).

In the case where r ≤ n this reduces to the Bell numbers:

E[K^r] = B_r.

THEOREM 3 (MGF): The moment generating function for the matching distribution is:

M(t) = ∑_{i=0}^{n} (e^t − 1)^i/i!.

The form of these moments is quite interesting in its own right. The factorial moments of the distribution are unit value for all r ≤ n and zero thereafter. In the special case where n = ∞ the distribution degenerates to the Poisson distribution with unit scale, which is known to have unit factorial moments and raw moments equal to the Bell numbers. Some important moments of the distribution are shown in Theorem 4 below.
THEOREM 4 (Central moments): Some important moments of the matching distribution are:

E(K) = 1 (for n ≥ 1),  V(K) = 1 (for n ≥ 2),  Skew(K) = 1 (for n ≥ 3),  Kurt(K) = 4 (for n ≥ 4).

In the special cases where n = 0 or n = 1 the matching distribution is a point-mass on K = 0 or K = 1 respectively, and the moments for those cases reflect this. In the case where n = 2 the matching distribution is equiprobable on the values K = 0 and K = 2, yielding a symmetric and extremely platykurtic distribution (known to be the most platykurtic distribution). In the case where n = 3 the matching distribution is positively skewed and mesokurtic, and in the most general case where n ≥ 4 the distribution is positively skewed and leptokurtic. The fixed values of the main central moments for n ≥ 4, and the slowly changing nature of the moment generating function, suggest that the distribution does not change much with respect to the size parameter n once this value is already large. In particular, we see that the moment generating function has the form of a partial exponential generating function up to order n, so in a neighbourhood of t = 0 the early terms are much larger than the later terms.
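These fixed moment values can be checked numerically. The following Python sketch (our own check, with helper names of our choosing) computes the exact mass values from the subfactorial recursion and recovers a mean and variance of 1, skewness 1 and kurtosis 4 at a moderate size:

```python
from math import comb, factorial

def subfactorial(m):
    d = [1, 0]  # D(0) = 1, D(1) = 0
    for j in range(2, m + 1):
        d.append((j - 1) * (d[-1] + d[-2]))
    return d[m]

def match_pmf(k, n):
    return comb(n, k) * subfactorial(n - k) / factorial(n)

n = 10
ks = range(n + 1)
pmf = [match_pmf(k, n) for k in ks]
mean = sum(k * p for k, p in zip(ks, pmf))
m2 = sum((k - mean) ** 2 * p for k, p in zip(ks, pmf))  # variance
m3 = sum((k - mean) ** 3 * p for k, p in zip(ks, pmf))  # third central moment
m4 = sum((k - mean) ** 4 * p for k, p in zip(ks, pmf))  # fourth central moment
print(mean, m2, m3 / m2**1.5, m4 / m2**2)  # approximately 1, 1, 1, 4
```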
In Figure 1 we plot the distribution for size values n = 1, …, 12 and confirm this fact. This shows that there is very little change in the distribution once n > 6. (At this point there is no discernible change in the barplot.) To confirm this observation, we plot the SSE comparing each distribution to the limiting case n = ∞ (i.e., the Poisson distribution with unit scale) in Figure 2.² From this latter plot we can confirm that the SSE diminishes rapidly, and for n > 6 it is no greater than 6 × 10, which shows very little deviation from the limiting case. This latter result can be formalised with analytical bounds on the difference between probabilities in the matching distribution and probabilities in the Poisson distribution with unit scale (see e.g., DasGupta 2005, pp. 384-385). An interesting result is that these differences have alternating signs, though they diminish rapidly to zero (ibid., p. 385). The explicit form for the matching distribution involves a sum of rapidly decreasing numbers with alternating sign. This form is not particularly helpful for computational purposes, since sums of this kind commonly lead to problems of arithmetic underflow. In order to compute the mass function of the distribution, it is useful to characterise this function in recursive form.
In Theorem 5 we give a recursive form for the probabilities in the distribution. In Theorem 6 we establish a recursive form comparing mass values at a fixed k as n increases. Both of these forms are simple consequences of the recursive properties of the subfactorial numbers.
THEOREM 5 (Recursive form): The mass function of the matching distribution satisfies the recursive equations:

Match(k|n) = Match(0|n − k)/k!,  with  Match(0|n) = ((n − 1)/n) Match(0|n − 1) + (1/n) Match(0|n − 2).

THEOREM 6 (Recursive form): The mass function of the matching distribution satisfies the recursive equation:

Match(k|n) = Match(k|n − 1) + (−1)^{n−k}/(k! (n − k)!).

Consequently, for all fixed k we have:

lim_{n→∞} Match(k|n) = e^{−1}/k! = Pois(k|1).

For computational purposes, it is useful to compute the distribution in log-space (i.e., compute the log-probabilities rather than the probabilities) and then convert back to regular probability space at the end of the computation. Consequently, the best way to compute the mass function is to use the log-space recursive equations in the corollary to Theorem 5. The stability of the distribution for large n is exhibited by the recursive equation and limiting result in Theorem 6.
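As an illustration of log-space computation (a Python sketch of our own devising, not the paper's implementation), the log-probabilities can be accumulated through the subfactorial recursion using a log-add-exp step, so the alternating-sign explicit sum is never evaluated:

```python
import math

def logaddexp(a, b):
    """Stable computation of log(exp(a) + exp(b))."""
    if a == float("-inf"):
        return b
    if b == float("-inf"):
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))

def log_match_pmf(n):
    """Log-probabilities log P(K = k), k = 0..n, computed entirely in log-space.

    Uses log D(m) = log(m-1) + logaddexp(log D(m-1), log D(m-2)),
    the log-space form of the subfactorial recursion."""
    log_d = [0.0, float("-inf")]  # log D(0), log D(1)
    for m in range(2, n + 1):
        log_d.append(math.log(m - 1) + logaddexp(log_d[-1], log_d[-2]))
    log_fact = [0.0]
    for j in range(1, n + 1):
        log_fact.append(log_fact[-1] + math.log(j))
    # log P(K = k) = log D(n-k) - log k! - log (n-k)!
    return [log_d[n - k] - log_fact[k] - log_fact[n - k] for k in range(n + 1)]

probs = [math.exp(lp) for lp in log_match_pmf(8)]
print(sum(probs))   # mass sums to 1
print(probs[7])     # P(K = n-1) = 0 for n = 8
```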
We defer computation of the mass function for now, since we will first want to generalise this distribution to handle a broader model for matching that allows results that are "better than random".The classical matching distribution gives the baseline behaviour of the matching number when allocation is done at random, and so it gives a null distribution for this hypothesis, which can serve as a basis for hypothesis testing for better allocation.In the next section we generalise the matching distribution to explicitly model allocation that is better than random.

Generalising the matching distribution
The essence of the classical problem of coincidences is that each selection of an item is assumed to occur via simple random sampling without replacement, such that all permutations over the set of items are equiprobable. This represents the premise that the allocation of items is based on purely random "guesses" or "draws" from the person doing the allocation. However, in certain kinds of matching problems, it is possible that the allocator may have some partial control over the matching. For example, consider the game where an allocator attempts to match baby photos to adults, where there is one baby photo for each adult. If the allocator is completely unable to match characteristics in the photos with characteristics of the adults then we would hypothesise that the selection is essentially just a random permutation of photos.

However, if the allocator has some ability to match characteristics in the photos, he should tend to perform better than a random selection. Indeed, testing whether allocation is "better than random" is one of the primary purposes of the matching distribution.
Consideration of this kind of case leads to a desire to generalise the matching distribution to allow an additional parameter that will tend to increase the number of matches in some natural way. There are many ways that one could go about seeking this generalisation.³ In this paper we will consider the matching mechanism as a two-step process. In the first step, for each item, suppose there is a fixed probability θ that the allocator will know the correct allocation for the item and allocate it accordingly. In the second step, all remaining items are allocated via a random permutation (i.e., via simple random sampling without replacement). This two-step process can be described mathematically by letting B ~ Bin(n, θ) denote the (random) number of items with known allocation, so that the number of matches is then given by:

K* = B + K′  where K′ | B ~ Match(·|n − B).

The resulting distribution of the matching number K* is then given by the convolution:

Match(k|n, θ) = ∑_{ℓ=0}^{k} Bin(ℓ|n, θ) · Match(k − ℓ|n − ℓ).

We take this distribution to be a reasonable model of the matching process in cases where there may be some ability by the allocator to match the items "better than random".
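The two-step mechanism can be checked by direct simulation. The following Python sketch (our own illustration, with made-up parameter values) draws the known allocations as Bernoulli trials, randomly permutes the remainder, and compares the Monte Carlo mean against the mean formula E(K*) = 1 + nθ − θ^n that is consistent with the method-of-moments equation used later in the paper:

```python
import random

def simulate_matches(n, theta, rng):
    """One draw of K*: step 1 fixes each item with probability theta;
    step 2 allocates the remaining items by a uniform random permutation."""
    known = [i for i in range(n) if rng.random() < theta]
    rest = [i for i in range(n) if i not in known]
    shuffled = rest[:]
    rng.shuffle(shuffled)
    extra = sum(1 for a, b in zip(rest, shuffled) if a == b)  # chance fixed points
    return len(known) + extra

rng = random.Random(1)
n, theta = 16, 0.25
draws = [simulate_matches(n, theta, rng) for _ in range(100_000)]
mc_mean = sum(draws) / len(draws)
theory = 1 + n * theta - theta**n  # theoretical mean E(K*)
print(mc_mean, theory)  # the two values should be close
```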
DEFINITION (Generalised matching distribution): This is a family of discrete distributions characterised by the probability mass function:

Match(k|n, θ) = ∑_{ℓ=0}^{k} Bin(ℓ|n, θ) · Match(k − ℓ|n − ℓ),

where we have a size parameter n ∈ ℕ and probability parameter 0 ≤ θ ≤ 1. The generalised matching distribution reduces to the classical matching distribution when θ = 0. □

³ Another interesting generalisation, which we will not pursue in detail here, is to generalise using the distribution with moment generating function proportional to ∑_i (θ(e^t − 1))^i/i!, converging to the Poisson distribution as n → ∞. Unfortunately, this only leads to a valid distribution when 0 ≤ θ ≤ 1, which allows a lower number of matches but does not allow a larger number of matches. This is an interesting distribution, but it does not allow modelling of cases where allocation is "better than random", so it is unsuitable for our purposes.
Unfortunately, the mass function for the generalised distribution cannot be simplified further; it is a rather cumbersome form that presents some computational challenges. It is interesting to note that the mass function for the generalised matching distribution relates to the mass function for the classical case through the expectation of a falling factorial involving a binomial random variable. Specifically, we have Match(k|n, θ) = Match(k|n) · Π(n, k, θ), where Π(n, k, θ) = E[(k)_X] for a random variable X ~ Bin(n, θ). The reader should note that the quantity Π(n, k, θ) is not a standard "factorial moment" for the binomial random variable, since those are of the form E[(X)_k], not E[(k)_X].
In the special cases where n = 0 or n = 1 the generalised matching distribution is a point-mass on K* = 0 or K* = 1 respectively, just as in the classical case. For larger values the form of the distribution is more complicated, but it preserves the property that ℙ(K* = n − 1) = 0, reflecting the fact that it is impossible to match all but one item in any permutation. In the special case where θ = 0 the generalised distribution reduces down to the classical form, and in the special case where θ = 1 it degenerates down to a point-mass distribution on n. Later we will show that the distribution is stochastically increasing in θ, so these special cases give us the extremes. In Figure 3 we plot the distribution for size values n = 1, …, 12, which shows some of these properties. Once n becomes large the distribution converges to an asymptotic form for the sum of a binomial random variable and a Poisson random variable with unit scale; of course, for large n the binomial part will tend to dominate the Poisson part, so the distribution will be close to a binomial distribution. The factorial moments and raw moments are messy, and not particularly illuminating. It is also quite messy to obtain the higher-order moments of the distribution. In Theorem 8 we give the mean and variance of the distribution, which are polynomials in the probability parameter; the reader can easily verify that these moments reduce down to those of the classical matching distribution in the case where θ = 0. In Theorem 9 we show that the generalised distribution obeys a central limit theorem as n → ∞, so long as 0 < θ < 1.

THEOREM 7 (MGF):
The moment generating function for the generalised distribution is:

M(t) = ∑_{ℓ=0}^{n} Bin(ℓ|n, θ) e^{tℓ} ∑_{i=0}^{n−ℓ} (e^t − 1)^i/i!.

THEOREM 8 (Important central moments): The mean and variance of the generalised distribution are given respectively by:

E(K*) = 1 + nθ − θ^n,
V(K*) = nθ(1 − θ) + 1 − θ^n − nθ^{n−1}(1 − θ) + θ^n(1 − θ^n) − 2nθ^n(1 − θ),

with the skewness and kurtosis taking similar (but lengthier) polynomial forms. When 0 < θ < 1 and n is large we obtain the asymptotic equivalences:

E(K*) ~ nθ,  V(K*) ~ nθ(1 − θ),  Skew(K*) → 0,  Kurt(K*) → 3.

This shows that the distribution is asymptotically unskewed and mesokurtic when 0 < θ < 1. (When θ = 0 we instead have Skew(K*) ~ 1 and Kurt(K*) ~ 4 for the Poisson with unit rate.)

THEOREM 9 (Central limit theorem): If 0 < θ < 1 we get the following asymptotic result:

(K* − nθ)/√(nθ(1 − θ)) → N(0, 1) in distribution as n → ∞.

It is worth noting some intuition about the central limit theorem for the generalised distribution. From the fact that K* = B + K′, if 0 < θ < 1 we can see that when n → ∞ the binomial random variable B will come to dominate this sum and K*/B will converge to unity. Since the binomial distribution converges to the normal distribution (using the classical central limit theorem) this means that K* also converges to a normal random variable in this limit. The quantity used in Theorem 9 is the standardised value of K* using its asymptotic mean and variance from Theorem 8.
The main value of our generalised matching distribution is that it allows us to model allocation that is "better than random" and is of increasing accuracy. The case where θ = 0 corresponds to the classical matching distribution (where items under consideration are allocated at random) and the case where θ = 1 gives a point-mass distribution on K* = n. Between these extremes it can be shown that the parameter θ has a monotonic effect on the number of matches. To see this, we note that the process of accumulating matches as additional items become known is a pure-birth process: if K*_b denotes the number of matches when b items have known allocation, then K*_{b+1} ≥ K*_b for all b ∈ ℕ. Consequently, for all values 0 ≤ ℓ < n we have:

ℙ(K* ≥ k | B = ℓ + 1) ≥ ℙ(K* ≥ k | B = ℓ) for all k.

This establishes that K* is stochastically increasing in B, and since B is stochastically increasing in θ it then follows that K* is stochastically increasing in θ. If we again exclude the extremes and look at the case where 0 < θ < 1, the matching distribution obeys the central limit theorem shown in Theorem 9.
Since the mass function for the generalised matching distribution is a weighted sum of binomial mass functions, it is a polynomial in θ. To facilitate this analysis, and some further analysis later in the paper, we will use the succinct notation f(k, ℓ) ≡ Match(k − ℓ|n − ℓ) to denote relevant values of the mass function for the classical matching distribution, which then gives:

Match(k|n, θ) = ∑_{ℓ=0}^{k} Bin(ℓ|n, θ) · f(k, ℓ).

This mixture form for the generalised matching distribution is useful for computation. We can use recursive methods to compute the classical matching distribution and then use the mixture result to obtain the generalised matching distribution.

Computing the generalised matching distribution
As noted above, the explicit form for the mass function for the matching distribution involves a sum of terms with alternating sign. Such formulae are notoriously problematic for computation since they often lead to arithmetic underflow problems if one tries to compute via the explicit form. Moreover, it is best to compute probability functions in log-space to avoid arithmetic underflow from small probabilities. Consequently, the best way to compute the mass function for the classical matching distribution is to work in log-space using the recursive form for the mass function (using the corollary to Theorem 5) and then convert back to regular probability space at the end of the computation. For the generalised matching distribution, we can compute the log-probabilities from the mass function using its convolution representation in terms of the classical matching distribution and the binomial distribution. In Algorithm 1 below we implement this computational method, dealing separately with some trivial special cases.
ALGORITHM 1: Generalised Matching Distribution

Input: size parameter n (positive integer or Inf); probability parameter θ (probability value); logical value log (specifying whether output is log-probability)
Output: vector of probabilities/log-probabilities from the generalised matching distribution mass function for k = 0, …, n

#Compute the matrix M
M <- matrix with rows indexed by k = 0, …, n and columns indexed by l = 0, …, n (all starting values are -Inf)

#Compute the generalised matching distribution using the mixture method
LOGPROBS <- vector indexed by k = 0, …, n (all starting values are -Inf)
LOGBINOM <- log(Bin(0:n | n, θ))

The computational method in Algorithm 1 is stable and accurate and allows the user to compute the mass function for any allowable parameter inputs within the computational power of the machine.⁴ The algorithm computes the probability mass function for the generalised matching distribution, but it also forms the basis for computing other probability functions, including the cumulative distribution function and quantile function. In the stat.extend package this algorithm is used in the functions dmatching, pmatching and qmatching for the mass function, cumulative distribution function and quantile function. The latter functions involve some further computational steps but use the same algorithm for computing the underlying log-probabilities for the distribution. Rather than producing the mass function (or other functions) over the full range k = 0, …, n, these functions are modified to instead take an arbitrary input vector of argument values and compute outputs over these argument values (i.e., they are "vectorised" versions of the probability functions). Each of these probability functions other than the random-generation function is built on the computational method in Algorithm 1. For random generation from the generalised matching distribution we can use inverse-transformation sampling via the quantile function, computed using the algorithm. Alternatively, we can generate values using direct simulation of the binomial matches and direct simulation of random permutations for the matching part. Interested readers can review the code for the functions dmatching, pmatching, qmatching and rmatching in the stat.extend package to see how Algorithm 1 is adapted to create each probability function.
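To illustrate the mixture computation, here is a plain-probability Python sketch (our own illustration, not the log-space R implementation in stat.extend) that assembles the generalised mass function from binomial weights and classical matching probabilities:

```python
from math import comb, factorial

def subfactorial(m):
    """Derangement count D(m) via D(m) = (m-1)[D(m-1) + D(m-2)]."""
    d = [1, 0]
    for j in range(2, m + 1):
        d.append((j - 1) * (d[-1] + d[-2]))
    return d[m]

def match_pmf(k, n):
    """Classical matching PMF."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfactorial(n - k) / factorial(n)

def gen_match_pmf(k, n, theta):
    """Generalised matching PMF as a binomial mixture of classical PMFs."""
    total = 0.0
    for l in range(min(k, n) + 1):
        weight = comb(n, l) * theta**l * (1 - theta) ** (n - l)
        total += weight * match_pmf(k - l, n - l)
    return total

n, theta = 12, 0.3
pmf = [gen_match_pmf(k, n, theta) for k in range(n + 1)]
print(sum(pmf))                                     # mass sums to 1
print(pmf[n - 1])                                   # P(K* = n-1) = 0
print(gen_match_pmf(0, n, 0.0) == match_pmf(0, n))  # theta = 0 gives the classical case
```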

Likelihood function and inference for the unknown probability parameter
In applications of the matching problem, the size n is fixed by the design of the matching game, but the probability parameter θ is unknown. In a matching game, this parameter represents the ability of the allocator to allocate the items "better than random", with a higher probability value representing greater likelihood of correct allocation. Here we will consider some methods to make inferences about this parameter. We will begin by considering some point estimators and then later we will look at interval estimates. We will see that there are some essential coherence properties we would wish to impose on interval estimators for the parameter, which presents some challenges in finding an appropriate interval estimator.
In order to make inferences about the parameter θ we will derive the form of the log-likelihood function, score function, and information function, first for a single observation and then for a vector of IID observations. For purposes of numerical stability, and for other reasons, it is also useful to consider the problem framed in terms of the transformed parameter θ ⟼ φ given by:

φ = logit(θ) = log(θ/(1 − θ)).

COROLLARY: Consider the transformed parameter θ ⟼ φ given above. The score function and Hessian function for φ are given (in terms of the parameter θ) by applying the chain rule to the corresponding functions for θ. We can generalise this likelihood analysis to the case where a single player plays a matching game with n objects m times, where we assume that these are IID trials with fixed parameters (i.e., we assume that the player is not getting better or worse at the game as they play it more).
Suppose that we observe the outcomes x = (x_1, …, x_m) over these games. Taking n as fixed, the log-likelihood function, score function and Hessian function are then given by summing over the data points in the observed data vector.
It is simple to obtain the MLE for an IID sample from the generalised matching distribution by maximising the log-likelihood function using numerical methods. Since we have an explicit form for the Hessian function (and therefore the Fisher information) we can also compute the asymptotic standard error of the MLE and use this to obtain the standard asymptotic confidence interval for the unknown parameter. Alternatively, we can use bootstrapping to obtain a set of values for the MLE (based on resamples of the observed data x) and use this to form a bootstrap confidence interval for the unknown parameter. In either case, it is best to undertake both these computations in terms of the parameter φ and then transform back to obtain a corresponding MLE and confidence interval for θ. Using the transformed parameter φ converts the problem to an unconstrained optimisation, which improves numerical stability, and also ensures that the resulting confidence interval respects the allowable parameter range.
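As a sketch of the numerical maximisation (our own stdlib Python illustration with a hypothetical data vector; the paper's actual implementation is the MLE.match function in stat.extend), the unimodal log-likelihood can be maximised by a simple golden-section search over the probability parameter:

```python
from math import comb, factorial, log

def subfactorial(m):
    d = [1, 0]
    for j in range(2, m + 1):
        d.append((j - 1) * (d[-1] + d[-2]))
    return d[m]

def gen_match_pmf(k, n, theta):
    """Generalised matching PMF via the binomial mixture representation."""
    total = 0.0
    for l in range(min(k, n) + 1):
        weight = comb(n, l) * theta**l * (1 - theta) ** (n - l)
        total += weight * comb(n - l, k - l) * subfactorial(n - k) / factorial(n - l)
    return total

def log_lik(theta, data, n):
    return sum(log(gen_match_pmf(x, n, theta)) for x in data)

def mle_theta(data, n, lo=1e-6, hi=1 - 1e-6, iters=100):
    """Golden-section search for the maximiser of the (unimodal) log-likelihood."""
    g = (5**0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c, d = b - g * (b - a), a + g * (b - a)
        if log_lik(c, data, n) < log_lik(d, data, n):
            a = c
        else:
            b = d
    return (a + b) / 2

DATA = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]  # hypothetical observed match counts, n = 16
theta_hat = mle_theta(DATA, 16)
print(theta_hat)
```

A search directly over θ on a bracket just inside [0, 1] is used here for brevity; the logit transform described above serves the same purpose in an unconstrained setting.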
The log-likelihood for IID data from the generalised matching distribution is unimodal, so there is a unique maximum likelihood estimator (MLE) for θ. In the special case where x̄ ≤ 1 the log-likelihood is monotonically decreasing in θ, so the MLE is θ̂ = 0. In the special case where x̄ = n the log-likelihood is monotonically increasing in θ, so the MLE is θ̂ = 1.
In the remaining cases the MLE is found using numerical methods. To compute the confidence interval, we recommend eschewing an equal-tail interval in favour of a "moving" interval. In view of the monotonicity properties of the log-likelihood, when x̄ ≤ 1 the lower bound of the interval should be at zero and when x̄ = n the upper bound of the interval should be at one.
Consequently, at confidence level 1 − α we recommend using a lower tail area that interpolates between these extremes. Use of this lower-tail area gives a "moving" interval which (roughly) takes account of the changing shape of the log-likelihood function. In particular, it gives the bounds required by the monotonicity properties of the log-likelihood function. It is worth noting that the Hessian of the log-likelihood function vanishes at its extremes, and so the standard asymptotic confidence interval performs poorly when the MLE for θ is near to zero or one (due to the failure of the underlying assumptions for that interval). Consequently, we recommend using the bootstrap interval in these cases.
The MLE and resulting confidence interval are implemented in the MLE.match function in the stat.extend package. This function allows the user to compute the confidence interval using the standard asymptotic form or via bootstrapping. In both cases the function uses the moving interval with the lower-tail area described above. In the code below we give an example where we generate a random sample of size m = 40 from the matching distribution and use this to estimate the probability parameter. (Our commands are shown in blue and the output is shown in black.) In this example we use the bootstrap confidence interval, which gives better results in cases where the true probability parameter is near its boundaries (i.e., close to zero or one). As can be seen from the output, the MLE in this example is close to the true value of the probability parameter and the confidence interval contains the true value. The MLE is just one way that the probability parameter can be estimated. A simpler estimator for the probability parameter can be obtained using the method of moments. Recalling from Theorem 8 that E(K*) = 1 + nθ − θ^n, and given the monotonicity properties of the likelihood function, the natural method-of-moments (MOM) estimator θ̃ is given by the root of the polynomial equation:⁶

θ̃^n − nθ̃ + max(x̄ − 1, 0) = 0.
For n = 1 there is no valid MOM estimator, since the matching distribution does not depend on θ (in this case any value for θ satisfies the above equation). However, for n > 1 there is a unique MOM estimator, and it is simple to use root-finding methods to obtain this value.⁷ It is easy to see that if x̄ = 0 or x̄ = 1 then θ̃ = 0, and if x̄ = n then θ̃ = 1. Moreover, the estimate θ̃ is increasing in x̄ for values between these two extremes. If n is large we have the approximate MOM estimator:

θ̃ ≈ max(x̄ − 1, 0)/(n − 1).
In the code below we compute the MOM estimate and the approximate MOM estimator for the above example.As can be seen, in this example the MOM estimate is close to the MLE, but slightly further away from the true probability parameter than this latter estimate.Moreover, the approximate MOM estimate is still further from the true probability parameter.
#Set the MOM function
FUNC <- function(t) { t^n - n*t + max(mean(DATA) - 1, 0) }

#Find the MOM estimator
uniroot(f = FUNC, lower = 0, upper = 1)$root
[1] 0.0390625

#Find the approximate MOM estimator
max(mean(DATA) - 1, 0)/(n - 1)
[1] 0.04666667

⁶ Note that we have made a slight adjustment to the method of moments, since E(K*) ≥ 1 and so an observed mean x̄ < 1 falls below the expected matches for any value of the probability parameter. In this special case, we estimate θ̃ = 0, which is a sensible estimate given that the matching statistic is stochastically increasing in θ.

⁷ To see this, we observe that the MOM estimator is given by the roots g(θ) = 0 using the function:

g(θ) = θ^n − nθ + max(x̄ − 1, 0).

For n > 1 and θ < 1 we have g′(θ) = −n(1 − θ^{n−1}) < 0, so g is strictly decreasing. At the boundaries of the parameter range we have g(0) = max(x̄ − 1, 0) ≥ 0 and g(1) = max(x̄, 1) − n ≤ 0. Consequently, under the condition that n > 1 there is a unique critical point θ̃ satisfying g(θ̃) = 0.

The matching test -hypothesis testing for the probability parameter
Instead of generating an interval estimate for the probability parameter, it is sometimes useful to conduct a hypothesis test on this parameter. We will again consider the case where a player plays a matching game with n objects m times, giving rise to IID trials with fixed parameters, and we observe the outcomes x = (x_1, …, x_m) over these games. We will consider both one-sided and two-sided tests using a simple null hypothesis positing a value θ₀ for the probability parameter. This encompasses a range of useful tests for the matching probability.
Since the generalised matching distribution has the monotone likelihood ratio property, a reasonable test statistic for the hypothesis test is the mean number of matches over the trials, with a larger mean number of matches constituting evidence for a higher value of θ and a lower mean number of matches constituting evidence for a lower value of θ.⁸ Of course, it is trivial to see that the special case where m = 1 gives the generalised matching distribution previously described, so we have Match(k|n, 1, θ) = Match(k|n, θ). If m is not too large, then we can compute the exact distribution of the total matches via convolutions of the generalised matching distribution. Contrarily, if m is large, we can rely on the central limit theorem to approximate the generalised matching distribution with a normal distribution over the appropriate support.⁹ We also note that there are some trivial special cases where we can perform exact computation easily, even when m is large.¹⁰

⁸ Another way to conduct the hypothesis test is to use the likelihood-ratio statistic for the generalised matching distribution. That method is more complex than the test we will use here.

⁹ For the normal approximation we use the moments in Theorem 8 and we approximate the generalised distribution by Match(k|n, m, θ) ≈ N(k|E(K*_m), V(K*_m)) over k = 0, …, mn, where K*_m denotes the total number of matches over the m trials. (Since it is impossible to get n − 1 matches in a single trial, the support of this generalised distribution always excludes the point k = mn − 1.)

¹⁰ Some trivial special cases are noted here. If n = 0 then the distribution is a point-mass on k = 0, if n = 1 then the distribution is a point-mass on k = m, and if θ = 1 then the distribution is a point-mass on k = mn. If n = ∞ and θ > 0 then the distribution is a point-mass on k = ∞, and if n = ∞ and θ = 0 then the distribution reduces to the Poisson distribution Match(k|∞, m, 0) = Pois(k|m). In these cases we would not use the normal approximation to the distribution even if m is large.
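For small m, the exact distribution of the total matches can be computed by direct convolution. The following Python sketch (our own illustration, with arbitrary parameter values) convolves the single-trial mass function with itself; note the support runs over k = 0, …, mn and the total mass still sums to one:

```python
from math import comb, factorial

def subfactorial(m):
    d = [1, 0]
    for j in range(2, m + 1):
        d.append((j - 1) * (d[-1] + d[-2]))
    return d[m]

def gen_match_pmf(k, n, theta):
    """Single-trial generalised matching PMF via the binomial mixture."""
    total = 0.0
    for l in range(min(k, n) + 1):
        weight = comb(n, l) * theta**l * (1 - theta) ** (n - l)
        total += weight * comb(n - l, k - l) * subfactorial(n - k) / factorial(n - l)
    return total

def convolve(p, q):
    """Distribution of a sum of independent counts with PMF vectors p and q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

n, theta, m = 8, 0.2, 5
single = [gen_match_pmf(k, n, theta) for k in range(n + 1)]
total = single[:]
for _ in range(m - 1):
    total = convolve(total, single)
print(len(total) == m * n + 1)  # support runs over k = 0, ..., m*n
print(sum(total))               # total mass sums to 1
```

Since no single trial can yield n − 1 matches, the total mn − 1 is also impossible, consistent with the footnote above.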
The various functions for the generalised matching distribution in the stat.extend package accommodate this generalisation. Each function allows an input for the number of trials N, with default behaviour setting N = 1 if the number of trials is not specified.¹¹ This distribution allows us to easily compute the p-value for any variation of the matching test. For the one-sided versions of the test (testing for parameter values above/below the null value) the p-value is the probability that the total number of matches T would be no less than/no greater than the total observed matches t_obs. For the two-sided version of the test (testing for parameter values different to the null value) the p-value is the probability that the total number of matches T takes on a value no more probable than the total observed matches t_obs. Each of these probabilities can easily be computed from the generalised matching distribution for T.
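As a concrete illustration, the exact p-value computation described above can be sketched numerically. This is a minimal Python sketch, not the paper's R implementation (matching.test in stat.extend); all function names here are our own, and the null distribution of the total T is built by direct convolution of the single-trial distribution.

```python
from math import comb, factorial

def subfact(m):
    # Subfactorial (derangement) numbers: D(0) = 1, D(1) = 0,
    # D(m) = (m - 1) * (D(m - 1) + D(m - 2)).
    if m == 0:
        return 1
    d_prev, d_curr = 1, 0
    for k in range(2, m + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

def match_pmf(k, n):
    # Classical matching distribution: C(n, k) * D(n - k) / n!
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfact(n - k) / factorial(n)

def gmd_pmf(t, n, p):
    # Generalised matching distribution: binomial pre-matching of l items
    # convolved with classical matching on the remaining n - l items.
    return sum(comb(n, l) * p**l * (1 - p)**(n - l) * match_pmf(t - l, n - l)
               for l in range(t + 1))

def total_pmf(n, N, p):
    # Distribution of the total number of matches over N IID trials,
    # computed by repeated convolution of the single-trial distribution.
    single = [gmd_pmf(t, n, p) for t in range(n + 1)]
    dist = [1.0]
    for _ in range(N):
        new = [0.0] * (len(dist) + n)
        for i, a in enumerate(dist):
            for j, b in enumerate(single):
                new[i + j] += a * b
        dist = new
    return dist  # dist[t] = P(T = t) for t = 0, ..., n*N

def matching_pvalues(t_obs, n, N, p0=0.0):
    # One-sided and two-sided p-values under the simple null p = p0.
    null = total_pmf(n, N, p0)
    upper = sum(pr for t, pr in enumerate(null) if t >= t_obs)
    lower = sum(pr for t, pr in enumerate(null) if t <= t_obs)
    two_sided = sum(pr for pr in null if pr <= null[t_obs] + 1e-15)
    return upper, lower, two_sided
```

For the data used later in the paper (n = 16, N = 40, mean matches 1.625, so t_obs = 65), the upper-tail p-value from this sketch should agree closely with the value reported by matching.test.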
The canonical version of the matching test occurs when we test whether there is evidence that the matching done by a player is "better than random", which amounts to testing whether the probability parameter is zero. In this case we have the hypotheses:

H0: p = 0    vs    HA: p > 0.

The matching test is implemented in the matching.test function in the stat.extend package. In the code below we implement the canonical matching test, and then a more specific one-sided matching test, on the data vector previously generated from the generalised matching distribution. (By way of reminder, the vector DATA consists of N = 40 independent values from the generalised matching distribution with size n = 16 and probability p = 0.04.) As can be seen from the outcome of the two tests in this example, we find strong evidence that p > 0 (i.e., the matching is "better than random") but we find no evidence that p > 0.05.
#Perform a matching test for matching "better than random"
matching.test(DATA, size = 16)

        Matching test

data:  DATA
mean matches = 1.625, p-value = 0.0001726
alternative hypothesis: true prob is greater than 0

11 This is done using the trials input in the functions. By default, this parameter is set to one, which gives the generalised matching distribution for a single trial. By default the function will compute the exact distribution using convolutions of the generalised matching distribution for a single trial. In the case where N > 100 it will switch (by default) to the normal approximation. The user can override this behaviour by using the approx input to tell the function whether or not to use the normal approximation to the distribution.

For a test with significance level α, the critical value of the test is:

t*(α) ≡ min{t = 0, …, nN + 1 | ∑_{r≥t} Match(r|n, N, 0) < α}.
The power function for the canonical matching test is then given by:

Power(p) = ℙ(T ≥ t*(α)) = ∑_{t≥t*(α)} Match(t|n, N, p).

This power function gives the probability of rejecting the null hypothesis H0: p = 0 given the true value of the probability parameter. Ideally, we would like the power to converge to one whenever p > 0 (i.e., whenever the alternative hypothesis is true).
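The critical value and power function above can be sketched numerically. This is again a hedged Python sketch with our own function names, assuming the convolution construction of the total-match distribution described earlier, rather than the stat.extend implementation.

```python
from math import comb, factorial

def subfact(m):
    # Subfactorial numbers: D(0) = 1, D(1) = 0, D(m) = (m-1)(D(m-1) + D(m-2)).
    if m == 0:
        return 1
    d_prev, d_curr = 1, 0
    for k in range(2, m + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

def match_pmf(k, n):
    # Classical matching distribution pmf.
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfact(n - k) / factorial(n)

def gmd_pmf(t, n, p):
    # Generalised matching distribution pmf (binomial-classical convolution).
    return sum(comb(n, l) * p**l * (1 - p)**(n - l) * match_pmf(t - l, n - l)
               for l in range(t + 1))

def total_pmf(n, N, p):
    # Distribution of the total matches over N IID trials via convolution.
    single = [gmd_pmf(t, n, p) for t in range(n + 1)]
    dist = [1.0]
    for _ in range(N):
        new = [0.0] * (len(dist) + n)
        for i, a in enumerate(dist):
            for j, b in enumerate(single):
                new[i + j] += a * b
        dist = new
    return dist

def critical_value(alpha, n, N):
    # t*(alpha): smallest total-match count whose upper-tail probability
    # under the null (p = 0) falls below the significance level alpha.
    null = total_pmf(n, N, 0.0)
    tail = sum(null)
    for t in range(len(null)):
        if tail < alpha:
            return t
        tail -= null[t]
    return len(null)

def power(p, alpha, n, N):
    # Probability of rejecting H0: p = 0 at level alpha when the true
    # probability parameter is p.
    t_star = critical_value(alpha, n, N)
    dist = total_pmf(n, N, p)
    return sum(dist[t_star:])
```

The power is zero beyond the support, so the sum over the rejection region is finite and exact; the sketch makes it easy to tabulate power over a grid of p values, as in the figure discussed below.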
In Figure 5 below we plot the power functions for a 5% significance level canonical matching test for the parameter combinations n = 4, 6, 8, 10 and N = 1, 2, 3, 4, 5. As can be seen, even for modest values of the size and trials parameters the power function increases quite rapidly with respect to the probability parameter p. We can also see that, for a fixed number of total observations, the test favours using a higher size parameter n and a lower trials parameter N. The general shape of the power functions in Figure 5 illustrates convergence towards full power for values p > 0 (i.e., for all values in the alternative hypothesis) as the size and trials parameters increase. This is desirable behaviour for the test, and it reflects the fact that the scaled mean number of matches converges in probability to a quantity that identifies p under broad limiting conditions on either parameter. It turns out that our power function will converge to this ideal power function if n → ∞ or if n > 1 and N → ∞. (In the case where n = 0 or n = 1 the generalised matching distribution is a point-mass distribution and the matching test has power zero. This holds even if we take the limit N → ∞.) Before demonstrating this we will first show that the mean number of matches allows us to perfectly estimate a positive probability parameter under either of these limit conditions. Applying the moment results in Theorem 8 and the rules for means and variances of sums of IID random variables, we find that for p > 0, and under either of the stated limit conditions, the mean of the scaled estimator converges to the required limit and its variance converges to zero. Consequently, the scaled mean number of matches converges in probability to its limit under either limit condition. This shows that it is a consistent estimator for p under either limit condition, but it also allows us to show that the power of the canonical matching test converges to one under the stated limiting conditions over all parameter values in the alternative hypothesis. Under either limiting condition the scaled total number of matches converges to a positive limit while the correspondingly scaled critical value t*(α) converges to zero.
Thus, for any value p > 0 the power converges to one. One special case of the matching test that is noteworthy is the case where n = 2, such that the player in the matching game is trying to match a pair of objects in their correct order in each trial. In this case, in a single trial there are only two possible outcomes: the incorrect way of matching the objects, which gives m = 0, and the correct way of matching the objects, which gives m = 2. The player will know the correct order for the objects (not just guess it correctly via a random choice) if the player knows the place of at least one of the two objects, which occurs with probability 1 − (1 − p)² = 2p − p² = p(2 − p). If the player does not know the correct order for the objects then the player will still get the correct match with probability ½ due to randomly matching the two objects. Consequently, the probability θ of correctly matching both the objects in the matching game is:

θ = p(2 − p) + ½(1 − p)²,

and we have the simplified distributions Match(0|2, p) = 1 − θ and Match(2|2, p) = θ. It is simple to show that θ is a strictly increasing function over 0 ≤ p ≤ 1, which means that the canonical matching test effectively reduces to an exact binomial test of the hypotheses:

H0: θ = ½    vs    HA: θ > ½.

In the code below we show an example of the canonical matching test for some data generated with n = 2, and we confirm that this gives the same p-value as the exact binomial test computed with the binom.test function in the stats package (R Core Team 2021).¹³ (Note that the "probability of success" mentioned in the latter test is the parameter θ, not the parameter p.)
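The claimed properties of θ are easy to verify numerically. A small Python sketch (function names are our own, assuming the pmf construction given earlier in the paper) confirms that θ(0) = ½, that θ is strictly increasing, and that θ coincides with Match(2|2, p) computed by the binomial-classical convolution:

```python
from math import comb, factorial

def theta(p):
    # Probability of matching the pair: know at least one position
    # (probability p*(2 - p)), or guess the order correctly at random
    # (probability 1/2 on the remaining (1 - p)^2 event).
    return p * (2 - p) + 0.5 * (1 - p) ** 2

def subfact(m):
    # Subfactorial numbers: D(0) = 1, D(1) = 0, D(m) = (m-1)(D(m-1) + D(m-2)).
    if m == 0:
        return 1
    d_prev, d_curr = 1, 0
    for k in range(2, m + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

def match_pmf(k, n):
    # Classical matching distribution pmf.
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfact(n - k) / factorial(n)

def gmd_pmf(t, n, p):
    # Generalised matching distribution pmf (binomial-classical convolution).
    return sum(comb(n, l) * p**l * (1 - p)**(n - l) * match_pmf(t - l, n - l)
               for l in range(t + 1))
```

The derivative θ′(p) = 1 − p is positive on [0, 1), which is the monotonicity used to reduce the matching test to an exact binomial test.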
13 The p-values are identical to within a small tolerance, the difference being due to rounding error in the computations.

It is possible to form an alternative matching test using the likelihood-ratio statistic instead of the mean/total number of matches. This has certain known advantages, including the fact that the Neyman-Pearson lemma ensures efficiency of the test. Nevertheless, the form of test we offer here, using the mean/total number of matches as the test statistic, has the advantage of simplicity and maintains reasonable power even for modest values of the parameters.

Summary and conclusion
In this paper we have examined the properties of the classical matching distribution and a useful generalisation of this distribution. The classical matching distribution arises in the classical "problem of coincidences", an antique problem dating back to the early eighteenth century.
In this problem a player attempts to match the unknown order of a finite set of objects, and the order chosen by the player is taken to be a random permutation equivalent to simple random sampling without replacement. Our generalised distribution models the case where a player is first able to identify known matches for objects with a fixed probability (for each object), with the ordering of the remaining objects occurring "at random" via a random permutation. This generalisation allows us to deal with cases of matching games where the player has some ability to match the objects under consideration "better than random".
We have given a comprehensive analysis of the generalised matching distribution, including derivation of its probability mass function, moment generating function, main central moments and asymptotic behaviour. We have also developed a useful recursive algorithm to compute the probability mass function for the distribution, and we have discussed how this algorithm can be extended to compute other probability functions. Finally, we have examined estimators, confidence intervals and hypothesis tests for the probability parameter in the distribution, under the condition that the size parameter is fixed by experimental design. All of this is implemented in functions in the stat.extend package for easy use by readers (see the table of available functions and their inputs below).
Our generalisation of the matching distribution allows a non-zero matching probability, so it can accommodate cases where matching is done "better than random". This allows us to use matching data to make inferences about the matching probability. In particular, we can use the canonical matching test to test for matching that is "better than random" and we can determine the power function for this test. Our view is that this is a useful and realistic generalisation of the classical "problem of coincidences". Indeed, matching analysis has occurred in contexts such as the hat-check problem and the secret-Santa problem, where the nature of the objects (or their owners/givers) may give some clues as to their proper order. In such cases it is useful to be able to model the possibility of a non-zero matching probability occurring prior to the random permutation of objects.
We hope that this paper piques the reader's interest in an interesting variation on an antique problem in probability theory. We also hope that it serves to assist analysis of this problem in a range of realistic cases. The generalised matching distribution we have examined in this paper is a natural extension of the classical matching distribution, and it stands as an interesting and useful family of discrete distributions.

Function Inputs
For k = 0 we have:

For k = 1 we have:

For k = 2 we have:

For k = 3 we have:

For k ≥ 4 we have:

The raw and central moments in the theorem follow directly from these values. ■

PROOF OF THEOREM 5: The base equation in the theorem follows by substitution:

To establish the recursive equation in the theorem we use the corresponding recursive equation for the subfactorial numbers (see e.g., Hassani 2004), which is:

D(m) = (m − 1)[D(m − 1) + D(m − 2)].

Applying this equation gives:

which was to be shown. ■

PROOF OF THEOREM 6:
To establish the result we use the corresponding recursive equation for the subfactorial numbers, which is:

D(m) = (m − 1)[D(m − 1) + D(m − 2)].

Applying this equation with m = n − t + 1 gives:

which was to be shown. ■

PROOF OF THEOREM 7: Using Theorem 3 and the relationship between the two parameterisations, we have:

which was to be shown. ■
PROOF OF THEOREM 8: From Theorem 2 we have the general rule:

and for all k > 0 the term at index zero vanishes, so we can remove the first index in the sum to get:

Applying this rule for k = 1, …, 4 gives the specific results:

From these raw moments we can obtain the relevant central moments, which give the moments in the theorem. ■

With a bit of algebra it can be shown that:

Substituting these derivatives into the forms for the logarithmic derivatives gives the stated score function and Hessian function. The corollary is easily obtained using the transformation of the probability parameter to the unconstrained scale.
An effective way to compute the mass function for the generalised matching distribution is to first compute the mass function values for all classical matching distributions up to the required size (using the recursive equations in Theorem 5) and then apply the convolution equation

Match(t|n, p) = ∑_{ℓ=0}^{t} Bin(ℓ|n, p) • Match(t − ℓ|n − ℓ)

to compute the mass values for the generalised distribution. It is advisable to conduct all these computations in log-space to avoid arithmetic underflow. Probability functions for the matching distribution are available in the stat.extend package in R (O'Neill and Fultz 2021). This package gives standard functions dmatching, pmatching, qmatching and rmatching respectively for the probability mass function, cumulative distribution function, quantile function, and random generation function for the distribution. (All the plots and computations in this paper were produced using this package.)
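The log-space computation recommended above can be sketched as follows. This is an illustrative Python sketch rather than the stat.extend implementation; the table and function names are ours. Log-probabilities are combined with the log-sum-exp trick to avoid underflow, and impossible outcomes (such as exactly n − 1 matches) are represented by −inf:

```python
from math import comb, factorial, log, exp, inf

def subfact(m):
    # Subfactorial numbers: D(0) = 1, D(1) = 0, D(m) = (m-1)(D(m-1) + D(m-2)).
    if m == 0:
        return 1
    d_prev, d_curr = 1, 0
    for k in range(2, m + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

def log_match_tables(n_max):
    # Log pmf of the classical matching distribution for every size up
    # to n_max; impossible outcomes get log-probability -inf.
    tables = []
    for n in range(n_max + 1):
        log_nfact = log(factorial(n))
        row = []
        for k in range(n + 1):
            w = comb(n, k) * subfact(n - k)
            row.append(log(w) - log_nfact if w > 0 else -inf)
        tables.append(row)
    return tables

def log_gmd_pmf(t, n, p, tables):
    # Convolution Match(t|n,p) = sum_l Bin(l|n,p) * Match(t-l|n-l),
    # evaluated in log-space with the log-sum-exp trick.
    terms = []
    for l in range(t + 1):
        b = comb(n, l) * p**l * (1 - p)**(n - l)
        c = tables[n - l][t - l]
        if b > 0 and c > -inf:
            terms.append(log(b) + c)
    if not terms:
        return -inf
    m = max(terms)
    return m + log(sum(exp(x - m) for x in terms))
```

Building the full table of classical mass values once and reusing it across sizes mirrors the recursive strategy described in the text.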

FIGURE 4: The generalised matching distribution with n = 12 and p = 0.2; (top left) probability mass function; (bottom left) cumulative distribution function; (bottom right) quantile function; (top right) proportions from random sample.

The transformation transforms the probability parameter 0 ≤ p ≤ 1 to a value η ∈ ℝ on the extended real line, which means that subsequent steps use unconstrained optimisation instead of constrained optimisation. For purposes of succinct statement of the functions, we will use the notation H(t, ℓ) ≡ Match(t − ℓ|n − ℓ) introduced above. We begin by considering the case of a single observation x, corresponding to a single round of the matching game. The size parameter n is fixed by the design of the matching game (i.e., it is a known constant). The log-likelihood function, score function and Hessian function are shown in Theorem 10 below, and the corresponding functions for the transformed parameter η are shown in the corollary.

THEOREM 10 (Score function and Hessian function): For a single observation x from the generalised matching distribution the log-likelihood function is:

ℓ_x(p) = log ∑_{ℓ=0}^{x} Bin(ℓ|n, p) • H(x, ℓ).

The corresponding score function and Hessian function are given respectively by:

We can write the score function and Hessian function for η in terms of the score function and Hessian function for p as follows:
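The maximum-likelihood computation built on this log-likelihood can be illustrated with a crude numerical sketch. This Python sketch uses a simple grid search over [0, 1] rather than the unconstrained optimisation over the transformed parameter used by MLE.matching; all function names are our own:

```python
from math import comb, factorial, log, inf

def subfact(m):
    # Subfactorial numbers: D(0) = 1, D(1) = 0, D(m) = (m-1)(D(m-1) + D(m-2)).
    if m == 0:
        return 1
    d_prev, d_curr = 1, 0
    for k in range(2, m + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

def match_pmf(k, n):
    # Classical matching distribution pmf.
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfact(n - k) / factorial(n)

def gmd_pmf(t, n, p):
    # Generalised matching distribution pmf (binomial-classical convolution).
    return sum(comb(n, l) * p**l * (1 - p)**(n - l) * match_pmf(t - l, n - l)
               for l in range(t + 1))

def log_likelihood(p, data, n):
    # Sum of log Match(x|n, p) over the observed match counts.
    total = 0.0
    for x in data:
        f = gmd_pmf(x, n, p)
        if f <= 0:
            return -inf
        total += log(f)
    return total

def mle_grid(data, n, points=1001):
    # Crude grid search for the maximum-likelihood estimate of p.
    grid = [i / (points - 1) for i in range(points)]
    return max(grid, key=lambda p: log_likelihood(p, data, n))
```

A grid search is adequate for illustration because the likelihood is a smooth function of a single bounded parameter; production code would optimise on the unconstrained scale as described above.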

#Load library and set parameters
library(stat.extend)
n    <- 16
PROB <- 0.04

#Generate an IID random sample
set.seed(1)
DATA <- rmatching(40, size = n, prob = PROB)

#Generate the MLE and bootstrap confidence interval
MLE.matching(x = DATA, size = n, CI.method = 'bootstrap', conf.level = 0

8 This makes it useful to compute the distribution of the total number of matches T = ∑ Mᵢ over N trials, where M₁, …, M_N ~ IID Match(n, p). Using convolutions of the generalised matching distribution it is fairly simple to further generalise the distribution to add a trials parameter N and return the distribution of the total number of matches under that number of trials. To this end, we will use the following notation to denote this more generalised distribution: Match(t|n, N, p) = ℙ(T = t|n, N, p).
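The N-trial generalisation described here can be sketched directly. The following Python sketch (function names are our own, not part of stat.extend) builds the distribution of the total T by N-fold convolution of the single-trial distribution; note that the support correctly excludes the point nN − 1, since no single trial can yield exactly n − 1 matches:

```python
from math import comb, factorial

def subfact(m):
    # Subfactorial numbers: D(0) = 1, D(1) = 0, D(m) = (m-1)(D(m-1) + D(m-2)).
    if m == 0:
        return 1
    d_prev, d_curr = 1, 0
    for k in range(2, m + 1):
        d_prev, d_curr = d_curr, (k - 1) * (d_curr + d_prev)
    return d_curr

def match_pmf(k, n):
    # Classical matching distribution pmf.
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * subfact(n - k) / factorial(n)

def gmd_pmf(t, n, p):
    # Generalised matching distribution pmf (binomial-classical convolution).
    return sum(comb(n, l) * p**l * (1 - p)**(n - l) * match_pmf(t - l, n - l)
               for l in range(t + 1))

def total_pmf(n, N, p):
    # Distribution of T = sum of N IID single-trial match counts,
    # computed by N-fold convolution.
    single = [gmd_pmf(t, n, p) for t in range(n + 1)]
    dist = [1.0]
    for _ in range(N):
        new = [0.0] * (len(dist) + n)
        for i, a in enumerate(dist):
            for j, b in enumerate(single):
                new[i + j] += a * b
        dist = new
    return dist  # dist[t] = P(T = t) for t = 0, ..., n*N
```

By linearity, the mean of T is N times the single-trial mean, which provides a quick consistency check on the convolution.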