Abstract
Motivated by marine ecological data on species abundance that include records of subsamples, this paper investigates two problems under the Ewens–Pitman sampling formula: predicting the number of new species that will appear if the catch is continued, and determining how the number of species decreases in random subsamples. Related statistics and extended models are also considered. A tool used in this work is the family of generalized Stirling numbers of three variables.
Acknowledgments
The author thanks the referees for useful comments that improved the paper. The data in Sect. 5 were made available by courtesy of the Commonwealth Scientific and Industrial Research Organisation (CSIRO), Australia. The author thanks the scientists and crews who participated in the 1997 RV Southern Surveyor voyages and CSIRO Marine and Atmospheric Research. He thanks Dr. Ross Darnell, CSIRO Mathematics, and Dr. Hideyasu Shimadzu, Geoscience Australia, for their guidance on marine science and the subsampling problem. This work was supported by a Grant-in-Aid for Scientific Research awarded by the Japan Society for the Promotion of Science.
Cite this article
Sibuya, M. Prediction in Ewens–Pitman sampling formula and random samples from number partitions. Ann Inst Stat Math 66, 833–864 (2014). https://doi.org/10.1007/s10463-013-0427-8