Bayesian two-part modeling of phytoplankton biomass and occurrence

  • Primary Research Paper
  • Published in Hydrobiologia

Abstract

Phytoplankton biomass data often contain zeros, which rules out continuous distributions with strictly positive support such as the lognormal distribution commonly used to describe ecological data. The two usual workarounds, ignoring the zeros or adding a small positive number to all outcomes, induce bias and reduce predictive power. To address these shortcomings, we design a Bayesian two-part model with a binary component for presence or absence and a continuous component in which non-zero biomass follows a lognormal model. We specify two equations relating species-specific occurrence probabilities and expected log-biomasses when present to potential covariates, with spike-and-slab priors imposed on the linear effects to selectively discard irrelevant predictors. We analyze the biomass data of 74 phytoplankton species (57 diatoms and 17 dinoflagellates) recorded weekly at Station L4 (Western English Channel, UK) between April 2003 and December 2009, along with measurements of abiotic covariates. Our results reveal different combinations of environmental predictors for the occurrence and the biomass of individual species. Overall, the occurrence of dinoflagellates is associated with higher temperature and irradiance levels than that of diatoms, with virtually no dependence on nutrient concentrations. Irradiance emerges as the key predictor of biomass when species are present. Optimum temperatures for biomass accumulation and temperature sensitivities vary widely among and within functional types. Compared with one-stage models based on the usual zero-handling approaches, our two-part model achieves higher prediction accuracy. The two-part modeling approach provides a valuable framework for decoupling the predictors of species occurrence and abundance from observational data.
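The two-part idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' model: it has no covariates, no spike-and-slab priors, and uses simple maximum-likelihood estimates rather than Bayesian inference. All function names and the simulation settings (occurrence probability 0.6, log-mean 1.0, log-standard deviation 0.5, added constant 0.001) are assumptions chosen for illustration. The sketch simulates biomass that is zero when a species is absent and lognormal when it is present, fits the two parts separately, and compares the implied mean biomass with the value obtained by adding a small constant to every outcome.

```python
import math
import random


def simulate(n, p_present, mu, sigma, rng):
    """Draw n outcomes: 0 with probability 1 - p_present, else lognormal(mu, sigma)."""
    return [rng.lognormvariate(mu, sigma) if rng.random() < p_present else 0.0
            for _ in range(n)]


def fit_two_part(y):
    """Fit the binary part and the lognormal part separately (maximum likelihood)."""
    positives = [v for v in y if v > 0]
    p_hat = len(positives) / len(y)                 # occurrence probability
    logs = [math.log(v) for v in positives]
    mu_hat = sum(logs) / len(logs)                  # mean log-biomass when present
    sigma2_hat = sum((x - mu_hat) ** 2 for x in logs) / len(logs)
    return p_hat, mu_hat, sigma2_hat


def two_part_mean(p, mu, sigma2):
    """E[Y] = P(Y > 0) * E[Y | Y > 0], with a lognormal positive part."""
    return p * math.exp(mu + sigma2 / 2)


def add_constant_mean(y, c=1e-3):
    """One-stage alternative: fit a single lognormal to y + c, then subtract c."""
    logs = [math.log(v + c) for v in y]
    m = sum(logs) / len(logs)
    s2 = sum((x - m) ** 2 for x in logs) / len(logs)
    return math.exp(m + s2 / 2) - c


if __name__ == "__main__":
    rng = random.Random(0)
    y = simulate(20000, 0.6, 1.0, 0.5, rng)
    p_hat, mu_hat, s2_hat = fit_two_part(y)
    print("true mean biomass:     ", 0.6 * math.exp(1.0 + 0.5 ** 2 / 2))
    print("two-part estimate:     ", two_part_mean(p_hat, mu_hat, s2_hat))
    print("add-constant estimate: ", add_constant_mean(y))
```

In this simplified setting the zeros pull the fitted log-mean down and inflate the fitted log-variance of the one-stage model, which is one way the bias mentioned in the abstract can arise. Dropping the zeros instead would estimate only the biomass when present and ignore occurrence altogether.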

Data availability

Source data were obtained from, and are available from, the Western Channel Observatory (www.westernchannelobservatory.org.uk). The L4 environmental data are available from the British Oceanographic Data Centre, https://www.bodc.ac.uk/ (https://doi.org/10.5285/f1968a39-26bf-55fe-e044-000b5de50f38).

Code availability

The OpenBUGS code used to fit the model to data is available in the Online Supplementary Materials.

Acknowledgements

This work was supported by the Simons Collaboration on Computational Biogeochemical Modeling of Marine Ecosystems/CBIOMES (Grant ID: 549935, AJI). C.E.W. was funded through the UK Natural Environment Research Council’s National Capability Long-term Single Centre Science Programme, “Climate Linked Atlantic Sector Science” (Grant No. NE/R015953/1); this work is a contribution to Theme 1.3, Biological Dynamics. Phytoplankton biomass and environmental data were provided by the Plymouth Marine Laboratory’s Western Channel Observatory (www.westernchannelobservatory.org.uk), which was funded as part of the UK Natural Environment Research Council’s National Capability.

Author information

Corresponding author

Correspondence to Crispin M. Mutshinda.

Ethics declarations

Conflict of interest

The authors declare no conflict of interest.

Additional information

Handling Editor: Alex Elliott.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file1 (DOCX 23 KB)

About this article

Cite this article

Mutshinda, C.M., Mishra, A., Finkel, Z.V. et al. Bayesian two-part modeling of phytoplankton biomass and occurrence. Hydrobiologia 849, 1287–1300 (2022). https://doi.org/10.1007/s10750-021-04789-2

  • DOI: https://doi.org/10.1007/s10750-021-04789-2
