Optimal sample size for estimating the mean concentration of invasive organisms in ballast water via a semiparametric Bayesian analysis

Costa, Eliardo G.; Paulino, Carlos Daniel; Singer, Julio M.

doi:10.1007/s10260-022-00639-0

Original Paper
Published: 13 May 2022

Volume 32, pages 57–74, (2023)
Statistical Methods & Applications

Abstract

We consider the determination of optimal sample sizes to estimate the concentration of organisms in ballast water via a semiparametric Bayesian approach involving a Dirichlet process mixture based on a Poisson model. This semiparametric model provides greater flexibility to model the organism distribution than that allowed by competing parametric models and is robust against misspecification. To obtain the optimal sample size we use a total cost minimization criterion, based on the sum of a Bayes risk and a sampling cost function. Credible intervals obtained via the proposed model may be used to verify compliance of the water with international standards before deballasting.
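
As a rough illustration of the total cost minimization criterion, the sketch below swaps the Dirichlet process mixture for a much simpler conjugate Gamma-Poisson model with squared-error loss. Neither choice is taken from the article, and the names `a`, `b` (Gamma shape and rate) and `c` (cost per sampled aliquot) are hypothetical. Under these assumptions the Bayes risk is the expected posterior variance, a / (b (b + n)), and the total cost adds a linear sampling cost c n.

```python
import math

def bayes_risk(n, a, b):
    # Expected posterior variance of lambda after n Poisson counts,
    # assuming lambda ~ Gamma(shape=a, rate=b) and squared-error loss.
    return a / (b * (b + n))

def total_cost(n, a, b, c):
    # Total cost = Bayes risk + sampling cost (assumed linear in n).
    return bayes_risk(n, a, b) + c * n

def optimal_sample_size(a, b, c, n_max=10_000):
    # Grid search over integer sample sizes 0, 1, ..., n_max.
    return min(range(n_max + 1), key=lambda n: total_cost(n, a, b, c))

# Hypothetical prior and cost values, chosen only for illustration.
a, b, c = 2.0, 1.0, 1e-4
n_opt = optimal_sample_size(a, b, c)

# Continuous minimizer of the same total cost, for comparison.
n_continuous = math.sqrt(a / (b * c)) - b
print(n_opt, round(n_continuous, 2))
```

In this toy setting the minimizer is available in closed form, so the grid search only serves as a check. With a semiparametric model the Bayes risk generally has to be approximated by simulation for each candidate n.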

Acknowledgements

This research received financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 153526/2014-9 and 304841/2019-6) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grant 2013/21728-2), Brazil. This research was also supported by the Fundação para a Ciência e Tecnologia (FCT), Portugal, under Projects UID/MAT/00006/2019 and UID/MAT/00006/2013. The authors are also grateful to Prof. Peter Müller for the constructive comments and suggestions.

Author information

Carlos Daniel Paulino and Julio M. Singer contributed equally.

Authors and Affiliations

Departamento de Estatística, Universidade Federal do Rio Grande do Norte, Natal, Brazil
Eliardo G. Costa
Departamento de Matemática, IST and CEAUL, FCUL, Universidade de Lisboa, Lisboa, Portugal
Carlos Daniel Paulino
Departamento de Estatística, Universidade de São Paulo, São Paulo, Brazil
Julio M. Singer

Corresponding author

Correspondence to Eliardo G. Costa.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A

Algorithm 1: Drawing samples from the joint posterior distribution of the \(\lambda_i\).
Step 1. Simulate initial values for \(\lambda_i\), \(i=1,\ldots,n\), from \(F_0\);
Step 2. Under a Gibbs sampling scheme, update \(\lambda_i\), \(i=1,\ldots,n\), using (4);
Step 3. Update the values obtained in Step 2 using (5);
Step 4. Repeat Steps 2 and 3 a number of times as a burn-in; the values obtained in the last iteration are the required values.
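
The article's expressions (4) and (5) are not reproduced on this page, so the following is only a sketch of the general shape of Algorithm 1. It assumes counts \(x_i \sim \text{Poisson}(\lambda_i)\), a Gamma base measure \(F_0\) with hypothetical shape `a` and rate `b`, standard Pólya-urn Gibbs updates in place of (4), and a resampling of the distinct values in place of (5). All of these are assumptions rather than the authors' exact scheme.

```python
import math
import numpy as np

def log_poisson(x, lam):
    # Log of the Poisson(lam) probability of the count x.
    return x * math.log(lam) - lam - math.lgamma(x + 1)

def log_marginal(x, a, b):
    # Log marginal probability of x when lam ~ Gamma(shape=a, rate=b).
    return (math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(b / (b + 1.0)) - x * math.log(b + 1.0))

def gibbs_dpm_poisson(x, alpha, a, b, n_burn, rng):
    n = len(x)
    lam = rng.gamma(a, 1.0 / b, size=n)          # Step 1: draw from F_0
    for _ in range(n_burn):                      # Step 4: burn-in loop
        for i in range(n):                       # Step 2: urn update
            others = np.delete(lam, i)
            logw = np.array(
                [math.log(alpha) + log_marginal(x[i], a, b)]
                + [log_poisson(x[i], l) for l in others])
            w = np.exp(logw - logw.max())
            k = rng.choice(n, p=w / w.sum())
            if k == 0:
                # New value from the single-observation posterior.
                lam[i] = rng.gamma(a + x[i], 1.0 / (b + 1.0))
            else:
                # Join the cluster of another observation.
                lam[i] = others[k - 1]
        for value in np.unique(lam):             # Step 3: refresh clusters
            members = lam == value
            lam[members] = rng.gamma(a + x[members].sum(),
                                     1.0 / (b + members.sum()))
    return lam

# Hypothetical counts and hyperparameters, for illustration only.
rng = np.random.default_rng(0)
x = np.array([0, 1, 2, 10, 12, 11])
lam = gibbs_dpm_poisson(x, alpha=1.0, a=2.0, b=0.5, n_burn=200, rng=rng)
print(lam)
```

The returned vector is the state of the chain after the last burn-in iteration, which is what Step 4 designates as the required values.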

Algorithm 2: Drawing samples of the random mean \(\overline{\lambda}\).
Step 1. Set a value for \(\epsilon\), set \(\overline{\lambda}_1^\ell=0\) and take \(\overline{\lambda}_1^u\) as the largest floating-point value representable on the computer being employed (in our case, \(1.79\times 10^{308}\));
Step 2. Update the upper and lower quantities using (8) and (9);
Step 3. If the absolute difference between the two quantities is smaller than \(\epsilon\), the required value \(\overline{\lambda}\) may be taken as either \(\overline{\lambda}_t^u\) or \(\overline{\lambda}_t^\ell\). Otherwise, return to Step 2.
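
Expressions (8) and (9) are not shown on this page. The sketch below therefore assumes that the bounds are driven by the well-known distributional identity for the mean of a Dirichlet process with total mass \(\alpha\) and base measure \(F_0\): \(\overline{\lambda}\) has the same distribution as \(BY + (1-B)\overline{\lambda}\), with \(B \sim \text{Beta}(1,\alpha)\) and \(Y \sim F_0\) independent. The function name `draw_dp_mean` and the callback `draw_base` are hypothetical, and the update may differ in detail from the article's.

```python
import sys
import numpy as np

def draw_dp_mean(alpha, draw_base, eps, rng, max_iter=1_000_000):
    lower = 0.0                         # Step 1: lower bound starts at 0
    upper_start = sys.float_info.max    # Step 1: about 1.79e308
    remaining = 1.0                     # product of the (1 - B) factors so far
    for _ in range(max_iter):
        b = rng.beta(1.0, alpha)
        y = draw_base(rng)
        # Step 2: both bounds are pushed through the same random maps,
        # so they share the partial sum and differ only in the tail term.
        lower += remaining * b * y
        remaining *= 1.0 - b
        upper = lower + remaining * upper_start
        # Step 3: stop once the two bounds agree to within eps.
        if upper - lower < eps:
            return lower
    raise RuntimeError("bounds did not meet within max_iter iterations")

# Hypothetical settings: alpha = 2 and a Gamma(shape=2, scale=1) base measure.
rng = np.random.default_rng(1)
sample = draw_dp_mean(2.0, lambda r: r.gamma(2.0, 1.0), eps=1e-8, rng=rng)
print(sample)
```

Starting the upper bound at the largest representable float lets the same loop handle base measures with unbounded support; the gap shrinks geometrically, so on average the number of iterations grows roughly in proportion to \(\alpha\).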

Algorithm 3: Drawing samples from the distribution of \(\overline{\lambda}^{(n)}\).
Step 1. Simulate \(B_*\) from a \(\text{Beta}(n, \alpha)\) distribution;
Step 2. Simulate \(\overline{\lambda}\) using Algorithm 2;
Step 3. Simulate \(D_i\), \(i=1,\ldots,n\), from a multivariate uniform distribution;
Step 4. Simulate \((Z_1,\ldots,Z_n)\) from \(\nu(d{\varvec{\lambda}}_n\vert{\varvec{x}}_n)\) using Algorithm 1;
Step 5. Obtain the required value using the quantities generated in Steps 1 to 4 and (10) of the article.

About this article

Cite this article

Costa, E.G., Paulino, C.D. & Singer, J.M. Optimal sample size for estimating the mean concentration of invasive organisms in ballast water via a semiparametric Bayesian analysis. Stat Methods Appl 32, 57–74 (2023). https://doi.org/10.1007/s10260-022-00639-0

Accepted: 13 April 2022
Published: 13 May 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s10260-022-00639-0
