Prediction in Ewens–Pitman sampling formula and random samples from number partitions

Annals of the Institute of Statistical Mathematics

Abstract

Motivated by marine ecological data on species abundance that include records of subsamples, this paper investigates two problems under the Ewens–Pitman sampling formula: predicting the number of new species that will appear if the catch is continued, and describing how the number of species decreases in random subsamples. Related statistics and extended models are also considered. The main tool is the generalized Stirling numbers of three variables.
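
The two problems in the abstract can be illustrated by simulation. The sketch below is not the paper's method, which works analytically with generalized Stirling numbers. It assumes the standard sequential (predictive) form of the Ewens–Pitman model with parameters `theta` and `alpha` (0 ≤ alpha < 1, theta > -alpha): after n individuals in k species, the next individual belongs to a new species with probability (theta + k·alpha)/(theta + n), and to an existing species with count c with probability (c - alpha)/(theta + n). The function names, parameter values, and sample sizes are hypothetical and chosen only for illustration.

```python
import random


def extend_partition(counts, m, theta, alpha, rng):
    """Add m individuals to a sample whose species counts are `counts`,
    using the two-parameter predictive rule. Returns the new counts."""
    counts = list(counts)
    n = sum(counts)
    for _ in range(m):
        k = len(counts)
        # The first individual always starts a new species; otherwise a new
        # species appears with probability (theta + k*alpha) / (theta + n).
        if not counts or rng.random() < (theta + k * alpha) / (theta + n):
            counts.append(1)
        else:
            # Existing species j is chosen with weight (count_j - alpha).
            weights = [c - alpha for c in counts]
            j = rng.choices(range(k), weights=weights)[0]
            counts[j] += 1
        n += 1
    return counts


def subsample_species(counts, size, rng):
    """Number of distinct species in a random subsample of `size`
    individuals drawn without replacement."""
    labels = [j for j, c in enumerate(counts) for _ in range(c)]
    return len(set(rng.sample(labels, size)))


if __name__ == "__main__":
    rng = random.Random(2014)
    theta, alpha = 5.0, 0.3                      # illustrative values only
    catch = extend_partition([], 500, theta, alpha, rng)

    # Problem 1: new species expected if the catch is continued by 200.
    reps = 200
    new = [len(extend_partition(catch, 200, theta, alpha, rng)) - len(catch)
           for _ in range(reps)]
    print("species in catch:", len(catch))
    print("mean new species in 200 more:", sum(new) / reps)

    # Problem 2: species retained in a random subsample of 100.
    kept = [subsample_species(catch, 100, rng) for _ in range(reps)]
    print("mean species in subsample of 100:", sum(kept) / reps)
```

Monte Carlo averages like these only approximate the quantities of interest. They can serve as a rough check on closed-form results when parameter values are given.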

Acknowledgments

The author thanks the referees for useful comments that improved the paper. The data in Sect. 5 were made available by courtesy of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia. The author thanks the scientists and crews who participated in the 1997 RV Southern Surveyor voyages and CSIRO Marine and Atmospheric Research. He thanks Dr. Ross Darnell, CSIRO Mathematics, and Dr. Hideyasu Shimadzu, Geoscience Australia, for their guidance on marine science and the subsampling problem. This work was supported by a Grant-in-Aid for Scientific Research awarded by the Japan Society for the Promotion of Science.

Author information

Correspondence to Masaaki Sibuya.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (pdf 205 KB)

About this article

Cite this article

Sibuya, M. Prediction in Ewens–Pitman sampling formula and random samples from number partitions. Ann Inst Stat Math 66, 833–864 (2014). https://doi.org/10.1007/s10463-013-0427-8

  • DOI: https://doi.org/10.1007/s10463-013-0427-8
