Abstract
Phytoplankton biomass data often contain zeroes, which rules out a description by continuous distributions with positive support, such as the lognormal distribution commonly used for ecological data. Two common workarounds, ignoring the zeroes or adding a small positive number to all outcomes, induce bias and reduce predictive power. To address these shortcomings, we design a Bayesian two-part model with a binary component for presence or absence and a continuous component involving a lognormal model for non-zero biomass. We specify two equations relating species-specific occurrence probabilities and expected log-biomasses when present to potential covariates, with spike-and-slab priors imposed on the linear effects to selectively discard irrelevant predictors. We analyze the biomass data of 74 phytoplankton species (57 diatoms and 17 dinoflagellates) recorded weekly at Station L4 (Western English Channel, UK) between April 2003 and December 2009, along with measurements of abiotic covariates. Our results reveal different combinations of environmental predictors for the occurrence and the biomass of individual species. Overall, the occurrence of dinoflagellates is associated with higher temperature and irradiance levels than that of diatoms, with virtually no dependence on nutrient concentrations. Irradiance emerges as the key predictor of biomass when species are present. Optimum temperatures for biomass accumulation and temperature sensitivities vary widely among and within functional types. Compared to one-stage models based on the usual zero-handling approaches, our two-part model achieves higher prediction accuracy. The two-part modeling approach provides a valuable framework for decoupling the predictors of species occurrence and abundance from observational data.
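The two-part structure described above can be illustrated with a minimal, non-Bayesian sketch. It is not the authors' OpenBUGS model: it has no covariates and no spike-and-slab priors, and all function names and parameter values (`p_present`, `mu_log`, `sigma_log`, the constant `c`) are hypothetical choices made for illustration. It simulates biomass that is zero with some probability and lognormal otherwise, estimates the two parts separately, and compares the implied mean biomass with the one obtained by adding a small constant before fitting a single lognormal.

```python
import math
import random


def simulate_two_part(n, p_present, mu_log, sigma_log, seed=0):
    """Draw n values: zero with probability 1 - p_present, else lognormal."""
    rng = random.Random(seed)
    return [
        rng.lognormvariate(mu_log, sigma_log) if rng.random() < p_present else 0.0
        for _ in range(n)
    ]


def fit_two_part(y):
    """Intercept-only fit: occurrence frequency, then mean/sd of log positives."""
    logs = [math.log(v) for v in y if v > 0]
    p_hat = len(logs) / len(y)
    mu_hat = sum(logs) / len(logs)
    var_hat = sum((v - mu_hat) ** 2 for v in logs) / (len(logs) - 1)
    return p_hat, mu_hat, math.sqrt(var_hat)


def two_part_mean(p, mu, sigma):
    """Expected biomass = P(present) * mean of the lognormal part."""
    return p * math.exp(mu + 0.5 * sigma ** 2)


def add_constant_mean(y, c):
    """Back-transformed mean from one lognormal fitted to log(y + c)."""
    logs = [math.log(v + c) for v in y]
    m = sum(logs) / len(logs)
    s2 = sum((v - m) ** 2 for v in logs) / (len(logs) - 1)
    return math.exp(m + 0.5 * s2) - c


# Hypothetical parameter values, chosen only for this illustration.
y = simulate_two_part(n=20000, p_present=0.6, mu_log=1.0, sigma_log=0.5, seed=42)
p_hat, mu_hat, sigma_hat = fit_two_part(y)

true_mean = two_part_mean(0.6, 1.0, 0.5)
est_two_part = two_part_mean(p_hat, mu_hat, sigma_hat)
est_add_const = add_constant_mean(y, c=0.001)

print(f"true mean biomass:      {true_mean:.3f}")
print(f"two-part estimate:      {est_two_part:.3f}")
print(f"add-constant estimate:  {est_add_const:.3f}")
```

In this toy setting the two-part estimate recovers the simulated mean closely, whereas the add-constant estimate is distorted because the zeroes become extreme negative values on the log scale and inflate the fitted variance. The paper's own comparison concerns prediction accuracy of full regression models, which this sketch does not reproduce.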
Data availability
Source data were obtained from, and are available from, the Western Channel Observatory (www.westernchannelobservatory.org.uk). The L4 environmental data are available at the British Oceanographic Data Centre, https://www.bodc.ac.uk/ (doi: https://doi.org/10.5285/f1968a39-26bf-55fe-e044-000b5de50f38).
Code availability
The OpenBUGS code used to fit the model to data is available in the Online Supplementary Materials.
Acknowledgements
This work was supported by the Simons Collaboration on Computational Biogeochemical Modeling of Marine Ecosystems/CBIOMES (Grant ID: 549935, AJI). C.E.W. was funded through the UK Natural Environment Research Council’s National Capability Long-term Single Centre Science Programme, “Climate Linked Atlantic Sector Science”, Grant No. NE/R015953/1; this work is a contribution to Theme 1.3: Biological Dynamics. Phytoplankton biomass and environmental data were provided by the Plymouth Marine Laboratory’s Western Channel Observatory (www.westernchannelobservatory.org.uk), which was funded as part of the UK Natural Environment Research Council’s National Capability.
Ethics declarations
Conflict of interest
The authors declare no conflict of interest.
Additional information
Handling Editor: Alex Elliott.
About this article
Cite this article
Mutshinda, C.M., Mishra, A., Finkel, Z.V. et al. Bayesian two-part modeling of phytoplankton biomass and occurrence. Hydrobiologia 849, 1287–1300 (2022). https://doi.org/10.1007/s10750-021-04789-2
DOI: https://doi.org/10.1007/s10750-021-04789-2