# Distributions of Random Partitions and Their Applications


## Abstract

Assume that a random sample of size \(m\) is selected from a population containing a countable number of classes (subpopulations) of elements (individuals). Partitioning the set of sample elements into (unordered) subsets, each containing the elements that belong to the same class, induces a random partition of the sample size \(m\), with part sizes \(\{Z_1, Z_2, \ldots, Z_N\}\) being positive integer-valued random variables. Alternatively, if \(N_j\) is the number of different classes represented in the sample by exactly \(j\) elements, for \(j = 1, 2, \ldots, m\), then \((N_1, N_2, \ldots, N_m)\), with \(\sum_{j=1}^{m} j N_j = m\), represents the same random partition. The joint and marginal distributions of \((N_1, N_2, \ldots, N_m)\), as well as the distribution of the number of classes represented in the sample, \(N = \sum_{j=1}^{m} N_j\), are of particular interest in statistical inference. From the inference point of view, it is desirable that all the information about the population be contained in \((N_1, N_2, \ldots, N_m)\). This requires that no physical, genetic, or other significance be attached to the actual labels of the population classes. In the present paper, combinatorial, probabilistic and compound sampling models are reviewed. Sampling models with population classes of random weights (proportions), in particular the Ewens and Pitman sampling models, to which many publications are devoted, are also presented extensively.
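To make the two equivalent descriptions of the random partition concrete, the sketch below takes a sample of class labels and returns both the part sizes \(Z_1, \ldots, Z_N\) (sorted in decreasing order, since the parts are unordered) and the vector \((N_1, \ldots, N_m)\). The labels are used only to group the elements and are then discarded, which is the label invariance described above. The helper name `partition_of_sample` and the letter labels are illustrative assumptions, not taken from the paper.

```python
from collections import Counter


def partition_of_sample(labels):
    """Return (part sizes Z, multiplicities [N_1, ..., N_m]) of the partition
    induced by a sample of class labels. The labels themselves are discarded."""
    m = len(labels)
    # Part sizes: how many sample elements fall in each represented class,
    # sorted in decreasing order as a canonical form for an unordered partition.
    z = sorted(Counter(labels).values(), reverse=True)
    # N_j = number of classes represented by exactly j sample elements, j = 1..m.
    size_counts = Counter(z)
    n = [size_counts.get(j, 0) for j in range(1, m + 1)]
    return z, n


sample = ["a", "b", "a", "c", "a", "b", "d"]
z, n = partition_of_sample(sample)
print(z)       # [3, 2, 1, 1]
print(n)       # [2, 1, 1, 0, 0, 0, 0]
print(sum(n))  # 4, the number N of classes represented in the sample
```

Renaming the labels (for example replacing `a` by `x` throughout) leaves both outputs unchanged, and the output satisfies \(\sum_j j N_j = m\) and \(\sum_j N_j = N\).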
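For the Ewens sampling model, both the joint distribution of \((N_1, \ldots, N_m)\) and the distribution of \(N\) have well-known closed forms. With a parameter \(\theta > 0\) and the rising factorial \(\theta^{(m)} = \theta(\theta+1)\cdots(\theta+m-1)\),

\[
P(N_1 = n_1, \ldots, N_m = n_m) = \frac{m!}{\theta^{(m)}} \prod_{j=1}^{m} \frac{(\theta/j)^{n_j}}{n_j!}, \qquad \sum_{j=1}^{m} j\, n_j = m,
\]

\[
P(N = k) = \frac{|s(m,k)|\, \theta^{k}}{\theta^{(m)}}, \qquad k = 1, \ldots, m,
\]

where \(|s(m,k)|\) are the signless Stirling numbers of the first kind. The sketch below evaluates both exactly with Python's `fractions.Fraction`; the function names are hypothetical, \(\theta\) is assumed to be an `int` or `Fraction`, and the paper's own notation and derivation may differ.

```python
from fractions import Fraction
from math import factorial


def rising(theta, m):
    """Rising factorial theta^(m) = theta (theta + 1) ... (theta + m - 1)."""
    out = Fraction(1)
    for i in range(m):
        out *= theta + i
    return out


def ewens_pmf(n, theta):
    """Ewens sampling formula P(N_1 = n_1, ..., N_m = n_m).

    n = [n_1, ..., n_m] must satisfy sum_j j * n_j = m.
    theta must be an int or Fraction so that the result stays exact.
    """
    m = len(n)
    if sum(j * nj for j, nj in enumerate(n, start=1)) != m:
        raise ValueError("need sum_j j * n_j == m")
    p = Fraction(factorial(m)) / rising(theta, m)
    for j, nj in enumerate(n, start=1):
        p *= Fraction(theta, j) ** nj / factorial(nj)
    return p


def unsigned_stirling1(m):
    """Row [c(m, 0), ..., c(m, m)] of signless Stirling numbers of the first kind."""
    row = [1]  # c(0, 0) = 1
    for n in range(m):
        # Recurrence: c(n + 1, k) = n * c(n, k) + c(n, k - 1)
        row = [(n * row[k] if k <= n else 0) + (row[k - 1] if k >= 1 else 0)
               for k in range(n + 2)]
    return row


def ewens_number_of_classes_pmf(m, theta):
    """[P(N = 0), ..., P(N = m)] under the Ewens model: c(m, k) theta^k / theta^(m)."""
    c = unsigned_stirling1(m)
    denom = rising(theta, m)
    return [c[k] * Fraction(theta) ** k / denom for k in range(m + 1)]


print(ewens_number_of_classes_pmf(3, 2))
# [Fraction(0, 1), Fraction(1, 6), Fraction(1, 2), Fraction(1, 3)]
print(ewens_pmf([1, 1, 0], 2))  # 1/2
```

For \(m = 3\) and \(\theta = 2\), each value of \(N\) corresponds to exactly one partition, so the two functions agree term by term: for instance \((n_1, n_2, n_3) = (1, 1, 0)\) is the only partition with two parts, and both give probability \(1/2\).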
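The Ewens and Pitman models can also be generated sequentially. One standard construction, often called the two-parameter Chinese restaurant process (with \(\alpha = 0\) it reduces to Hoppe's Pólya-like urn for the Ewens model), assigns the \((n+1)\)-th sample element, given \(n\) elements already spread over \(k\) classes of sizes \(z_1, \ldots, z_k\), to existing class \(i\) with probability \((z_i - \alpha)/(\theta + n)\) and to a new class with probability \((\theta + k\alpha)/(\theta + n)\). The sketch below simulates this scheme, assuming \(0 \le \alpha < 1\) and \(\theta > -\alpha\); the function names are hypothetical, and the paper may present these models through a different urn formulation. As a sanity check it compares the simulated mean of \(N\) under the Ewens model with the exact value \(E[N] = \sum_{i=0}^{m-1} \theta/(\theta + i)\).

```python
import random


def pitman_partition(m, theta, alpha=0.0, rng=random):
    """Simulate the part sizes of a random partition of m under the sequential
    Pitman (alpha, theta) scheme; alpha = 0 gives the Ewens model.
    Assumes 0 <= alpha < 1 and theta > -alpha."""
    if not (0 <= alpha < 1 and theta > -alpha):
        raise ValueError("need 0 <= alpha < 1 and theta > -alpha")
    sizes = []
    for n in range(m):
        if n == 0:
            sizes.append(1)  # the first element always starts a new class
            continue
        k = len(sizes)
        # Unnormalised weights: existing classes, then a new class; total theta + n.
        weights = [z - alpha for z in sizes] + [theta + k * alpha]
        i = rng.choices(range(k + 1), weights=weights)[0]
        if i == k:
            sizes.append(1)  # the element founds a new class
        else:
            sizes[i] += 1    # the element joins existing class i
    return sizes


def ewens_mean_classes(m, theta):
    """Exact E[N] under the Ewens model: sum_{i=0}^{m-1} theta / (theta + i)."""
    return sum(theta / (theta + i) for i in range(m))


rng = random.Random(2024)
reps = 20000
avg = sum(len(pitman_partition(10, 1.0, 0.0, rng)) for _ in range(reps)) / reps
print(round(avg, 2), round(ewens_mean_classes(10, 1.0), 2))  # the two should be close
```

With `alpha > 0` the number of classes grows on the order of \(m^{\alpha}\) rather than logarithmically in \(m\), which is the main qualitative difference between the Pitman and Ewens models.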

## Keywords

Combinatorial sampling model; Compound sampling model; Dirichlet–Poisson distribution; Exchangeable random partitions; Ewens sampling formula; Partition structures; Pitman sampling formula; Pólya urn model; Stirling numbers

## AMS 2000 Subject Classification

Primary 60C05, 62D05; Secondary 05A05, 05A17

