Abstract
We consider the determination of optimal sample sizes to estimate the concentration of organisms in ballast water via a semiparametric Bayesian approach involving a Dirichlet process mixture based on a Poisson model. This semiparametric model offers greater flexibility in modelling the organism distribution than competing parametric models and is robust against misspecification. To obtain the optimal sample size, we use a total cost minimization criterion based on the sum of a Bayes risk and a sampling cost function. Credible intervals obtained via the proposed model may be used to verify compliance of the water with international standards before deballasting.
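The total cost criterion can be illustrated with a deliberately simplified parametric stand-in. The sketch below is not the semiparametric model of the article: it assumes a conjugate Gamma(a, b) prior (rate b) for a single Poisson mean, one unit of volume per sample, squared-error loss and a linear sampling cost. Under those assumptions the Bayes risk is the expected posterior variance, a / (b (b + n)), and the optimal n minimizes risk plus cost. All function names and numerical values are hypothetical.

```python
def bayes_risk(n, a, b):
    """Expected posterior variance of a Poisson mean under a Gamma(a, rate=b)
    prior after n unit-volume counts (squared-error loss, assumed here)."""
    return a / (b * (b + n))


def total_cost(n, a, b, cost_per_sample):
    """Total cost = Bayes risk + linear sampling cost (both are assumptions)."""
    return bayes_risk(n, a, b) + cost_per_sample * n


def optimal_sample_size(a, b, cost_per_sample, n_max=10_000):
    """Grid search for the integer n in [0, n_max] minimizing the total cost."""
    return min(range(n_max + 1),
               key=lambda n: total_cost(n, a, b, cost_per_sample))


# Hypothetical prior and cost values, chosen only for illustration.
a, b, c = 2.0, 1.0, 0.001
n_opt = optimal_sample_size(a, b, c)
print(n_opt, total_cost(n_opt, a, b, c))
```

In this toy setting the continuous minimizer is sqrt(a / (b c)) - b, so the grid search can be checked by hand. For the Dirichlet process mixture the Bayes risk generally has no closed form and would have to be approximated by simulation.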
Acknowledgements
This research received financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 153526/2014-9 and 304841/2019-6) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grant 2013/21728-2), Brazil. This research was also supported by the Fundação para a Ciência e Tecnologia (FCT), Portugal, under Projects UID/MAT/00006/2019 and UID/MAT/00006/2013. The authors are also grateful to Prof. Peter Müller for the constructive comments and suggestions.
Appendix A
Algorithm 1: Drawing samples from the joint posterior distribution of the \(\lambda _i\).
- Step 1. Simulate initial values for \(\lambda _i\), \(i=1,\ldots ,n\), from \(F_0\);
- Step 2. Under a Gibbs sampling scheme, update \(\lambda _i\), \(i=1,\ldots ,n\), using (4);
- Step 3. Update the values obtained in Step 2 using (5);
- Step 4. Repeat Steps 2-3 a number of times as a burn-in; the values obtained in the last iteration are the required values.
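The steps of Algorithm 1 can be sketched as follows. Expressions (4) and (5) are not reproduced in this appendix, so the code assumes the usual Pólya-urn conditional for a Dirichlet process mixture of Poisson counts, with a Gamma(a, rate b) base measure \(F_0\) and total mass \(\alpha\), followed by a remixing step that redraws each distinct value from its Gamma full conditional. The function names, the base measure and the data are illustrative assumptions, not the authors' implementation.

```python
import math
import random


def log_poisson(x, lam):
    """Log Poisson pmf of count x at mean lam."""
    return x * math.log(lam) - lam - math.lgamma(x + 1)


def log_marginal(x, a, b):
    """Log of the Poisson pmf integrated over Gamma(a, rate=b) (negative binomial)."""
    return (math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(b) - (a + x) * math.log(b + 1))


def gibbs_dp_poisson(x, alpha, a, b, burn_in=200, rng=None):
    """Return one draw of (lambda_1, ..., lambda_n) after burn_in Gibbs sweeps."""
    rng = rng or random.Random()
    n = len(x)
    # Step 1: initial values from the assumed base measure F0 = Gamma(a, rate=b).
    lam = [rng.gammavariate(a, 1.0 / b) for _ in range(n)]
    for _ in range(burn_in):
        # Step 2: Polya-urn update of each lambda_i given the others (assumed form of (4)).
        for i in range(n):
            others = [lam[j] for j in range(n) if j != i]
            logw = [math.log(alpha) + log_marginal(x[i], a, b)]
            logw += [log_poisson(x[i], v) for v in others]
            top = max(logw)
            weights = [math.exp(v - top) for v in logw]
            k = rng.choices(range(n), weights=weights)[0]
            if k == 0:   # new value from the posterior under F0
                lam[i] = rng.gammavariate(a + x[i], 1.0 / (b + 1))
            else:        # copy the value of another observation
                lam[i] = others[k - 1]
        # Step 3: remix the distinct values cluster by cluster (assumed form of (5)).
        clusters = {}
        for i, v in enumerate(lam):
            clusters.setdefault(v, []).append(i)
        for members in clusters.values():
            total = sum(x[i] for i in members)
            new = rng.gammavariate(a + total, 1.0 / (b + len(members)))
            for i in members:
                lam[i] = new
    # Step 4: the state after the last sweep is the required draw.
    return lam


counts = [0, 1, 2, 10, 12, 11]   # hypothetical organism counts
draw = gibbs_dp_poisson(counts, alpha=1.0, a=2.0, b=1.0, rng=random.Random(1))
print(draw)
```

Grouping by exact floating-point value works here because tied \(\lambda _i\) are literal copies of one another.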
Algorithm 2: Drawing samples of the random mean \(\overline{\lambda }\).
- Step 1. Set a value for \(\epsilon\), set \(\overline{\lambda }_1^\ell =0\) and take \(\overline{\lambda }_1^u\) as the largest floating-point value representable on the computer being employed (in our case, \(1.79\times 10^{308}\));
- Step 2.
- Step 3. If the absolute difference between \(\overline{\lambda }_t^u\) and \(\overline{\lambda }_t^\ell\) is smaller than \(\epsilon\), the required value \(\overline{\lambda }\) may be taken as either \(\overline{\lambda }_t^u\) or \(\overline{\lambda }_t^\ell\). Otherwise, return to Step 2.
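For comparison, a different and only approximate way to draw a Dirichlet process mean is truncated stick-breaking. This is not the bounding scheme of Algorithm 2. The sketch below assumes a total mass \(\alpha\) and a user-supplied sampler for \(F_0\), here a hypothetical Gamma(2, rate 1). It accumulates weighted atoms until the unassigned stick mass falls below a tolerance, which plays a role loosely analogous to \(\epsilon\).

```python
import random


def dp_mean_stick_breaking(alpha, draw_f0, tol=1e-10, rng=None):
    """Approximate draw of the mean of a DP(alpha, F0) random measure.

    Weights follow the stick-breaking construction with Beta(1, alpha)
    fractions; the sum is truncated once the remaining mass is below tol.
    """
    rng = rng or random.Random()
    remaining = 1.0   # stick mass not yet assigned to an atom
    mean = 0.0
    while remaining > tol:
        frac = rng.betavariate(1.0, alpha)
        mean += remaining * frac * draw_f0(rng)
        remaining *= 1.0 - frac
    return mean


def gamma_f0(rng):
    """Hypothetical base measure: Gamma with shape 2 and rate 1."""
    return rng.gammavariate(2.0, 1.0)


sample = dp_mean_stick_breaking(1.0, gamma_f0, rng=random.Random(7))
print(sample)
```

The truncation error is bounded by the discarded mass times the size of the omitted atoms, so the result is approximate, whereas a scheme based on converging upper and lower bounds controls the error directly.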
Algorithm 3: Drawing samples from the distribution of \(\overline{\lambda }^{(n)}\).
- Step 1. Simulate \(B_*\) from a \(\text {Beta}(n, \alpha )\) distribution;
- Step 2. Simulate \(\overline{\lambda }\) using Algorithm 2;
- Step 3. Simulate \(D_i\), \(i=1,\ldots ,n\), from a multivariate uniform distribution;
- Step 4. Simulate \((Z_1,\ldots ,Z_n)\) from \(\nu (d{\varvec{\lambda }}_n\vert {\varvec{x}}_n)\) using Algorithm 1;
- Step 5. Obtain the required value using the quantities generated in Steps 1-4 and (10) of the article.
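The way the pieces of Algorithm 3 fit together can be sketched under two assumptions, since expression (10) is not reproduced here. First, the "multivariate uniform" weights \(D_i\) are taken to be uniform on the simplex, i.e. Dirichlet(1, ..., 1). Second, the combination is taken to be the convex mixture \((1-B_*)\overline{\lambda } + B_*\sum _i D_i Z_i\), which is the standard representation of a posterior Dirichlet process mean. The draws of \(\overline{\lambda }\) (Step 2) and of \((Z_1,\ldots ,Z_n)\) (Step 4) are passed in as arguments; the function name and the numerical values are hypothetical.

```python
import random


def posterior_mean_draw(z, lam_bar, alpha, rng=None):
    """Combine one draw of lam_bar and of (Z_1, ..., Z_n) into a draw of the
    posterior random mean, under the assumed form of expression (10)."""
    rng = rng or random.Random()
    n = len(z)
    # Step 1: weight attached to the atoms Z_i.
    b_star = rng.betavariate(n, alpha)
    # Step 3: Dirichlet(1, ..., 1) weights via normalized exponentials.
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    d = [v / total for v in e]
    # Step 5: convex combination of the prior-mean draw and the weighted atoms.
    atoms = sum(di * zi for di, zi in zip(d, z))
    return (1.0 - b_star) * lam_bar + b_star * atoms


value = posterior_mean_draw([1.0, 2.0, 5.0], lam_bar=2.5, alpha=1.0,
                            rng=random.Random(3))
print(value)
```

Because the result is a convex combination, it always lies between the smallest and largest of \(\overline{\lambda }\) and the \(Z_i\), which gives a quick sanity check on any implementation.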
Cite this article
Costa, E.G., Paulino, C.D. & Singer, J.M. Optimal sample size for estimating the mean concentration of invasive organisms in ballast water via a semiparametric Bayesian analysis. Stat Methods Appl 32, 57–74 (2023). https://doi.org/10.1007/s10260-022-00639-0