Optimal sample size for estimating the mean concentration of invasive organisms in ballast water via a semiparametric Bayesian analysis

  • Original Paper
Statistical Methods & Applications

Abstract

We consider the determination of optimal sample sizes to estimate the concentration of organisms in ballast water via a semiparametric Bayesian approach involving a Dirichlet process mixture based on a Poisson model. This semiparametric model provides greater flexibility to model the organism distribution than that allowed by competing parametric models and is robust against misspecification. To obtain the optimal sample size we use a total cost minimization criterion, based on the sum of a Bayes risk and a sampling cost function. Credible intervals obtained via the proposed model may be used to verify compliance of the water with international standards before deballasting.
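The total cost criterion can be illustrated with a deliberately simplified stand-in. The sketch below does not use the Dirichlet process mixture of the article. It assumes a conjugate Gamma prior on the Poisson mean, squared-error loss, and a linear sampling cost, so that the Bayes risk has a closed form. The names `bayes_risk`, `total_cost`, `optimal_sample_size`, and all numerical values are hypothetical.

```python
def bayes_risk(n, a, b):
    """Expected posterior variance of lambda after n Poisson counts.

    Assumption: lambda ~ Gamma(shape=a, rate=b) and squared-error loss.
    The posterior is Gamma(a + sum(x), b + n), and averaging its variance
    over the prior predictive gives a / (b * (b + n)).
    """
    return a / (b * (b + n))


def total_cost(n, a, b, cost_per_unit):
    """Bayes risk plus an assumed linear sampling cost."""
    return bayes_risk(n, a, b) + cost_per_unit * n


def optimal_sample_size(a, b, cost_per_unit, n_max=10_000):
    """Grid search for the integer n that minimizes the total cost."""
    return min(range(n_max + 1),
               key=lambda n: total_cost(n, a, b, cost_per_unit))


# Hypothetical prior and cost values, chosen only for illustration.
n_opt = optimal_sample_size(a=4.0, b=0.5, cost_per_unit=0.001)
```

In this toy setting the continuous minimizer is \(\sqrt{a/(bc)} - b\), with \(c\) the cost per sampled unit, so the grid search can be checked by hand. Under the semiparametric model the Bayes risk has no such closed form and would have to be approximated by simulation.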

Acknowledgements

This research received financial support from Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES), Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq, grants 153526/2014-9 and 304841/2019-6) and Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP, grant 2013/21728-2), Brazil. This research was also supported by the Fundação para a Ciência e Tecnologia (FCT), Portugal, under Projects UID/MAT/00006/2019 and UID/MAT/00006/2013. The authors are also grateful to Prof. Peter Müller for the constructive comments and suggestions.

Author information

Corresponding author

Correspondence to Eliardo G. Costa.

Appendix A

Algorithm 1: Drawing samples from the joint posterior distribution of the \(\lambda_i\).

Step 1. Simulate initial values for \(\lambda_i\), \(i=1,\ldots,n\), from \(F_0\);

Step 2. Under a Gibbs sampling scheme, update \(\lambda_i\), \(i=1,\ldots,n\), using (4);

Step 3. Update the values obtained in Step 2 using (5);

Step 4. Repeat Steps 2-3 a number of times as a burn-in; the values obtained in the last iteration are the required values.
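The steps of Algorithm 1 can be sketched as follows. Equations (4) and (5) are not reproduced in this text, so the sketch assumes the standard Pólya-urn full conditional for a Dirichlet process mixture of Poisson distributions (Step 2) and the usual remixing of distinct values (Step 3), with a Gamma base measure \(F_0\) chosen for conjugacy. The function names, the Gamma choice, and the toy data are all assumptions, not the authors' implementation.

```python
import math
import random


def log_poisson(x, lam):
    """Log of the Poisson(lam) probability of the count x."""
    return -lam + x * math.log(lam) - math.lgamma(x + 1)


def log_marginal(x, a, b):
    """Log marginal of x when lambda ~ Gamma(shape=a, rate=b) (negative binomial)."""
    return (math.lgamma(a + x) - math.lgamma(a) - math.lgamma(x + 1)
            + a * math.log(b / (b + 1.0)) - x * math.log(b + 1.0))


def gibbs_polya_urn(x, alpha, a, b, burn_in, rng):
    n = len(x)
    # Step 1: initial values from the assumed base measure F_0 = Gamma(a, rate=b).
    lam = [rng.gammavariate(a, 1.0 / b) for _ in range(n)]
    for _ in range(burn_in):
        # Step 2: update each lambda_i from its Polya-urn full conditional.
        for i in range(n):
            others = [lam[j] for j in range(n) if j != i]
            log_w = [math.log(alpha) + log_marginal(x[i], a, b)]
            log_w += [log_poisson(x[i], other) for other in others]
            top = max(log_w)
            weights = [math.exp(v - top) for v in log_w]
            k = rng.choices(range(n), weights=weights)[0]
            if k == 0:
                # Fresh value from the posterior under F_0 given x_i alone.
                lam[i] = rng.gammavariate(a + x[i], 1.0 / (b + 1.0))
            else:
                # Join the cluster of an existing value.
                lam[i] = others[k - 1]
        # Step 3: redraw each distinct value given all counts in its cluster.
        clusters = {}
        for i, value in enumerate(lam):
            clusters.setdefault(value, []).append(i)
        for members in clusters.values():
            shape = a + sum(x[i] for i in members)
            new_value = rng.gammavariate(shape, 1.0 / (b + len(members)))
            for i in members:
                lam[i] = new_value
    # Step 4: the state after the burn-in is returned as one joint draw.
    return lam


# Hypothetical counts and hyperparameters, for illustration only.
counts = [0, 1, 0, 3, 12, 9, 1, 2]
draw = gibbs_polya_urn(counts, alpha=1.0, a=1.0, b=0.5,
                       burn_in=200, rng=random.Random(1))
```

Note that `random.gammavariate` takes a scale parameter, hence the reciprocals of the rates. The weights are computed on the log scale and shifted by their maximum to avoid underflow for large counts.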

Algorithm 2: Drawing samples of the random mean \(\overline{\lambda}\).

Step 1. Set a value for \(\epsilon\), set \(\overline{\lambda}_1^\ell = 0\) and take \(\overline{\lambda}_1^u\) as the largest finite floating-point value representable on the computer being employed (in our case, \(1.79\times 10^{308}\));

Step 2. Update the upper and lower quantities using (8) and (9);

Step 3. If the absolute difference between the upper and lower quantities is smaller than \(\epsilon\), the required value \(\overline{\lambda}\) may be taken as either \(\overline{\lambda}_t^u\) or \(\overline{\lambda}_t^\ell\). Otherwise, return to Step 2.
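The bounding idea of Algorithm 2 can be sketched with a stick-breaking construction. Equations (8) and (9) are not reproduced in this text, so the updates below are an assumption: the lower quantity accumulates stick-breaking weights times draws from \(F_0\), and the upper quantity adds the unallocated stick mass times an assumed bound `upper0` on the base draws. The names `dp_mean_bounds`, `draw_base`, and `upper0`, and the finite bound used in the example, are hypothetical.

```python
import random


def dp_mean_bounds(alpha, draw_base, upper0, eps, rng):
    """Return (lower, upper) bracketing one draw of the mean of DP(alpha * F_0).

    Assumptions: draws from F_0 are nonnegative and treated as bounded by
    upper0, and the weights follow the stick-breaking representation.
    """
    lower = 0.0       # Step 1: lower quantity starts at zero
    remaining = 1.0   # stick mass not yet allocated
    # Step 3 check: stop once the gap between the bounds is below eps.
    while remaining * upper0 >= eps:
        # Step 2: break off a Beta(1, alpha) fraction of the remaining stick.
        b = rng.betavariate(1.0, alpha)
        y = draw_base(rng)
        lower += remaining * b * y
        remaining *= 1.0 - b
    return lower, lower + remaining * upper0


# Hypothetical base measure Gamma(shape=2, scale=1) and bound, for illustration.
lo, hi = dp_mean_bounds(alpha=1.0,
                        draw_base=lambda r: r.gammavariate(2.0, 1.0),
                        upper0=1e6, eps=1e-8, rng=random.Random(0))
```

Either `lo` or `hi` can then be taken as the draw, since they differ by less than `eps`. A finite `upper0` is used here instead of the largest floating-point value to keep the arithmetic well away from overflow.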

Algorithm 3: Drawing samples from the distribution of \(\overline{\lambda}^{(n)}\).

Step 1. Simulate \(B_*\) from a \(\text{Beta}(n, \alpha)\) distribution;

Step 2. Simulate \(\overline{\lambda}\) using Algorithm 2;

Step 3. Simulate \(D_i\), \(i=1,\ldots,n\), from a multivariate uniform distribution;

Step 4. Simulate \((Z_1,\ldots,Z_n)\) from \(\nu(d{\varvec{\lambda}}_n\vert {\varvec{x}}_n)\) using Algorithm 1;

Step 5. Obtain the required value using the quantities generated in Steps 1-4 and (10) of the article.
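A minimal sketch of how the pieces of Algorithm 3 might be combined is given below. Equation (10) is not reproduced in this text, so the combination rule is an assumption based on the standard decomposition of a posterior Dirichlet process: \((1 - B_*)\,\overline{\lambda} + B_* \sum_i D_i Z_i\). The "multivariate uniform" of Step 3 is read here as uniform on the simplex, that is, Dirichlet\((1,\ldots,1)\), which is also an assumption. The function name and the input values are hypothetical; `z` and `lam_bar` stand for outputs of Algorithms 1 and 2.

```python
import random


def posterior_mean_draw(z, lam_bar, alpha, rng):
    """One draw of the posterior random mean under the assumed decomposition.

    z       : values (Z_1, ..., Z_n), e.g. one output of Algorithm 1
    lam_bar : one draw of the prior random mean, e.g. from Algorithm 2
    """
    n = len(z)
    # Step 1: weight given to the atoms at the Z_i.
    b_star = rng.betavariate(n, alpha)
    # Step 3: Dirichlet(1, ..., 1) weights via normalized Exp(1) draws.
    e = [rng.expovariate(1.0) for _ in range(n)]
    total = sum(e)
    d = [v / total for v in e]
    # Step 5: convex combination of the prior mean draw and the weighted atoms.
    atoms = sum(di * zi for di, zi in zip(d, z))
    return (1.0 - b_star) * lam_bar + b_star * atoms


# Hypothetical inputs, for illustration only.
z_values = [0.4, 0.4, 2.1, 9.8, 9.8, 1.3]
value = posterior_mean_draw(z_values, lam_bar=3.0, alpha=1.0,
                            rng=random.Random(5))
```

Because the result is a convex combination, each draw lies between the smallest and largest of the inputs, which gives a quick sanity check on an implementation.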

Cite this article

Costa, E.G., Paulino, C.D. & Singer, J.M. Optimal sample size for estimating the mean concentration of invasive organisms in ballast water via a semiparametric Bayesian analysis. Stat Methods Appl 32, 57–74 (2023). https://doi.org/10.1007/s10260-022-00639-0

  • DOI: https://doi.org/10.1007/s10260-022-00639-0
