Abstract
In this paper we consider generalized linear latent variable models that can handle overdispersed counts and continuous but non-negative data. Such data are common in ecological studies when modelling multivariate abundances or biomass. By extending the standard generalized linear modelling framework to include latent variables, we can account for covariation between species that is not explained by the predictors, notably species interactions and correlations driven by missing covariates. We show how estimation and inference for the considered models can be performed efficiently using the Laplace approximation method, and we use simulations to study the finite-sample properties of the resulting estimates. In the overdispersed count data case, the Laplace-approximated estimates perform similarly to estimates based on the variational approximation method, another method that provides a closed-form approximation of the likelihood. In the biomass data case, we show that ignoring the correlation between taxa adversely affects the regression estimates. To illustrate how our methods can be used in unconstrained ordination and in making inference on environmental variables, we apply them to two ecological datasets: abundances of bacterial species in three arctic locations in Europe and abundances of coral reef species in Indonesia.
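To make the model structure concrete, the following Python sketch simulates overdispersed counts from a simplified generalized linear latent variable model. It assumes no covariates or row effects, a log link, the linear predictor \(\eta_{ij} = \beta_{0j} + \varvec{u}_i'\varvec{\gamma}_j\) with \(\varvec{u}_i \sim N(\varvec{0}, \varvec{I}_d)\), and a negative binomial response generated as a gamma–Poisson mixture with variance \(\mu_{ij} + \phi\mu_{ij}^2\). The function names and parameter values are hypothetical illustrations, not the authors' code. The shared latent variables are what induce covariation between species: on the linear-predictor scale the implied correlation between species \(j\) and \(k\) is \(\varvec{\gamma}_j'\varvec{\gamma}_k / (\|\varvec{\gamma}_j\|\,\|\varvec{\gamma}_k\|)\).

```python
import math
import random


def poisson_draw(lam, rng):
    """Poisson draw by Knuth's multiplication method (adequate for moderate means)."""
    threshold, k, prod = math.exp(-lam), 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k


def simulate_gllvm_nb(n, m, d, phi=0.5, seed=1):
    """Hypothetical NB GLLVM: eta_ij = beta0_j + u_i' gamma_j, u_i ~ N(0, I_d),
    y_ij ~ NB with mean mu_ij = exp(eta_ij) and variance mu_ij + phi * mu_ij**2."""
    rng = random.Random(seed)
    beta0 = [rng.uniform(0.0, 2.0) for _ in range(m)]                     # species intercepts
    gammas = [[rng.gauss(0.0, 0.7) for _ in range(d)] for _ in range(m)]  # loadings gamma_j
    us = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]      # latent variables u_i
    y = []
    for u in us:
        row = []
        for j in range(m):
            mu = math.exp(beta0[j] + sum(uk * gk for uk, gk in zip(u, gammas[j])))
            # gamma-Poisson mixture: the multiplier has mean 1 and variance phi
            lam = mu * rng.gammavariate(1.0 / phi, phi)
            row.append(poisson_draw(lam, rng))
        y.append(row)
    return y, gammas, us


def latent_correlation(gammas):
    """Correlation implied by the latent variables on the linear-predictor scale."""
    def dot(a, b):
        return sum(x * z for x, z in zip(a, b))
    return [[dot(gj, gk) / math.sqrt(dot(gj, gj) * dot(gk, gk)) for gk in gammas]
            for gj in gammas]


y, gammas, us = simulate_gllvm_nb(n=20, m=5, d=2)
R = latent_correlation(gammas)
```

With `d = 1` every implied correlation is either 1 or -1, which shows that a single latent variable can only represent a very restricted correlation structure.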
Supplementary materials accompanying this paper appear on-line.




References
Araújo, M. B. and Luoto, M. (2007). The importance of biotic interactions for modelling species distributions under climate change. Global Ecology and Biogeography, 16:743–753.
Bartholomew, D. J., Knott, M., and Moustaki, I. (2011). Latent variable models and factor analysis: A unified approach. Wiley: New York.
Bianconcini, S. and Cagnone, S. (2012). Estimation of generalized linear latent variable models via fully exponential Laplace approximation. Journal of Multivariate Analysis, 112:183–193.
Blanchet, F. (2014). HMSC: Hierarchical modelling of species community. R package version 0.6-2.
Brown, A. M., Warton, D. I., Andrew, N. R., Binns, M., Cassis, G., and Gibb, H. (2014). The fourth-corner solution - using predictive models to understand how species traits interact with the environment. Methods in Ecology and Evolution, 5:344–352.
Burnham, K. and Anderson, D. (2002). Model selection and multimodel inference: A practical information-theoretic approach. Springer.
Chu, H., Fierer, N., Lauber, C. L., Caporaso, J. G., Knight, R., and Grogan, P. (2010). Soil bacterial diversity in the arctic is not fundamentally different from that found in other biomes. Environmental Microbiology, 12:2998–3006.
Cressie, N., Calder, C. A., Clark, J. S., Ver Hoef, J. M., and Wikle, C. K. (2009). Accounting for uncertainty in ecological analysis: the strengths and limitations of hierarchical statistical modeling. Ecological Applications, 19(3):553–570.
Dunn, P. K. and Smyth, G. K. (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics, 5:236–244.
Dunn, P. K. and Smyth, G. K. (2005). Series evaluation of Tweedie exponential dispersion model densities. Statistics and Computing, 15:267–280.
Dunstan, P. K., Foster, S. D., Hui, F., and Warton, D. I. (2013). Finite mixture of regression modeling for high-dimensional count and biomass data in ecology. Journal of Agricultural, Biological, and Environmental Statistics, 18:357–375.
Foster, S. D. and Bravington, M. V. (2013). A Poisson–Gamma model for analysis of ecological non-negative continuous data. Environmental and Ecological Statistics, 20:533–552.
Hall, P., Ormerod, J. T., and Wand, M. (2011a). Theory of Gaussian variational approximation for a Poisson mixed model. Statistica Sinica, 21:369–389.
Hall, P., Pham, T., Wand, M. P., and Wang, S. S. (2011b). Asymptotic normality and valid inference for Gaussian variational approximation. The Annals of Statistics, 39:2502–2532.
Huber, P. and Ronchetti, E. (2009). Robust Statistics. Wiley: New York.
Huber, P., Ronchetti, E., and Victoria-Feser, M. (2004). Estimation of generalized linear latent variable models. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 66:893–908.
Hui, F. K. C. (2016). boral–Bayesian Ordination and Regression Analysis of Multivariate Abundance Data in R. Methods in Ecology and Evolution, 7:744–750.
Hui, F. K. C., Taskinen, S., Pledger, S., Foster, S. D., and Warton, D. I. (2015). Model-Based Approaches to Unconstrained Ordination. Methods in Ecology and Evolution, 6:399–411.
Hui, F. K. C., Warton, D., Ormerod, J., Haapaniemi, V., and Taskinen, S. (2016). Variational Approximations for Generalized Linear Latent Variable Models. Journal of Computational and Graphical Statistics. In press.
Joe, H. (2008). Accuracy of Laplace approximation for discrete response mixed models. Computational Statistics & Data Analysis, 52:5066–5074.
Jorgensen, B. (1997). The Theory of Dispersion Models. Chapman & Hall.
Kendal, W. S. (2004). Taylor’s ecological power law as a consequence of scale invariant exponential dispersion models. Ecological Complexity, 1(3):193–209.
Kristensen, K., Nielsen, A., Berg, C., Skaug, H., and Bell, B. (2016). TMB: Automatic differentiation and Laplace approximation. Journal of Statistical Software, 70(5):1–21.
Letten, A. D., Keith, D. A., Tozer, M. G., and Hui, F. K. (2015). Fine-scale hydrological niche differentiation through the lens of multi-species co-occurrence models. Journal of Ecology, 103:1264–1275.
Männistö, M. K., Tiirola, M., and Häggblom, M. M. (2007). Bacterial communities in arctic fjelds of Finnish Lapland are stable but highly pH-dependent. FEMS Microbiology Ecology, 59:452–465.
Martin, T. G., Wintle, B. A., Rhodes, J. R., Kuhnert, P. M., Field, S. A., Low-Choy, S. J., Tyre, A. J., and Possingham, H. P. (2005). Zero tolerance ecology: improving ecological inference by modelling the source of zero observations. Ecology Letters, 8:1235–1246.
Morales-Castilla, I., Matias, M. G., Gravel, D., and Araújo, M. B. (2015). Inferring biotic interactions from proxies. Trends in Ecology & Evolution, 30(6):347–356.
Moustaki, I. (1996). A latent trait and a latent class model for mixed observed variables. British Journal of Mathematical and Statistical Psychology, 49:313–334.
Moustaki, I. and Knott, M. (2000). Generalized latent trait models. Psychometrika, 65:391–411.
Nakagawa, S. and Schielzeth, H. (2013). A general and simple method for obtaining R2 from generalized linear mixed-effects models. Methods in Ecology and Evolution, 4:133–142.
Nissinen, R., Männistö, M., and van Elsas, J. (2012). Endophytic bacterial communities in three arctic plants from low arctic fell tundra are cold-adapted and host-plant specific. FEMS Microbiology Ecology, 82:510–522.
Ovaskainen, O., Abrego, N., Halme, P., and Dunson, D. (2016a). Using latent variable models to identify large networks of species-to-species associations at different spatial scales. Methods in Ecology and Evolution, 7:549–555.
Ovaskainen, O., de Knegt, H. J., and Delgado Sanchez, M. d. M. (2016b). Quantitative Ecology and Evolutionary Biology: Integrating Models with Data. Oxford: Oxford University Press.
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2002). Reliable estimation of generalized linear mixed models using adaptive quadrature. Stata Journal, 2:1–21.
Rodrigues-Motta, M., Pinheiro, H. P., Martins, E. G., Araujo, M. S., and dos Reis, S. F. (2013). Multivariate models for correlated count data. Journal of Applied Statistics, 40:1586–1596.
Sammel, M. D., Ryan, L. M., and Legler, J. M. (1997). Latent variable models for mixed discrete and continuous outcomes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 59:667–678.
Skrondal, A. and Rabe-Hesketh, S. (2004). Generalized Latent Variable Modeling: Multilevel, Longitudinal and Structural Equation Models. Chapman & Hall, Boca Raton.
Taylor, L. R. (1961). Aggregation, variance and the mean. Nature, 189:732–735.
Warton, D. I. (2005). Many zeros does not mean zero inflation: comparing the goodness-of-fit of parametric models to multivariate abundance data. Environmetrics, 16:275–289.
Warton, D. I., Blanchet, F. G., O’Hara, R., Ovaskainen, O., Taskinen, S., Walker, S. C., and Hui, F. K. (2016). Extending Joint Models in Community Ecology: A Response to Beissinger et al. Trends in Ecology & Evolution, 31:737–738.
Warton, D. I., Blanchet, F. G., O’Hara, R., Ovaskainen, O., Taskinen, S., Walker, S. C., and Hui, F. K. C. (2015). So many variables: Joint modeling in community ecology. Trends in Ecology & Evolution, 30:766–779.
Warwick, R., Clarke, K., and Suharsono (1990). A statistical analysis of coral community responses to the 1982–83 El Niño in the Thousand Islands, Indonesia. Coral Reefs, 8:171–179.
Welsh, A. H., Cunningham, R. B., Donnelly, C., and Lindenmayer, D. B. (1996). Modelling the abundance of rare species: statistical models for counts with extra zeros. Ecological Modelling, 88:297–308.
Yu, D. W., Ji, Y., Emerson, B. C., Wang, X., Ye, C., Yang, C., and Ding, Z. (2012). Biodiversity soup: metabarcoding of arthropods for rapid biodiversity assessment and biomonitoring. Methods in Ecology and Evolution, 3:613–623.
Acknowledgements
We thank the Associate Editor and the referees for their helpful comments. We also thank Dr Manoj Kumar and Dr Riitta Nissinen for providing us with the plant–microbial diversity data. JN and ST were supported by the Academy of Finland grants 251965 and 283323.
Appendices
A Proofs
A.1 Laplace Approximations for the General Exponential Family
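Assume that the responses \(y_{ij}\) come from the exponential family of distributions with mean \(\mu _{ij}=E(y_{ij})\), and write \(f(y_{ij}|\varvec{u}_i,\varvec{\Psi }) = \exp \left\{ y_{ij}a_j(\mu _{ij})-b_j(\mu _{ij}) + c_j(y_{ij})\right\} \), where \(a_j(\cdot )\), \(b_j(\cdot )\) and \(c_j(\cdot )\) are known functions, and \(\varvec{\Psi }\) includes all model parameters. The log-likelihood function (5) for parameter vector \(\varvec{\Psi }\) now equals
\[ l(\varvec{\Psi }) = \sum _{i=1}^n \log \int \exp \left[ \sum _{j=1}^m \left\{ y_{ij}a_j(\mu _{ij})-b_j(\mu _{ij}) + c_j(y_{ij})\right\} - \frac{\varvec{u}_i'\varvec{u}_i}{2}\right] (2\pi )^{-d/2}\,d\varvec{u}_i , \]
and the Laplace approximation of the log-likelihood function is
\[ \tilde{l}(\varvec{\Psi }) = \sum _{i=1}^n \left\{ m\,Q(\varvec{\Psi },\varvec{\hat{u}}_i) - \frac{1}{2}\log \det \varvec{\Gamma }(\varvec{\Psi },\varvec{\hat{u}}_i)\right\} , \]
where
\[ \varvec{\Gamma }(\varvec{\Psi },\varvec{\hat{u}}_i) = -m\,\frac{\partial ^2 Q(\varvec{\Psi },\varvec{u}_i)}{\partial \varvec{u}_i\,\partial \varvec{u}_i'}\bigg |_{\varvec{u}_i=\varvec{\hat{u}}_i} , \]
and \(\varvec{\hat{u}}_i\) is the maximum of \(Q(\varvec{\Psi },\varvec{u}_{i}) = (1/m)\left( \sum \limits _{j=1}^m\log f(y_{ij}|\varvec{u}_i;\varvec{\Psi }) - \varvec{u}_i'\varvec{u}_i/2\right) \) with respect to \(\varvec{u}_i\). This result was established by Huber et al. (2004).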
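For a single latent variable (\(d = 1\)) the Laplace approximation for site \(i\) reduces to locating the mode \(\hat{u}_i\) of \(mQ(\varvec{\Psi}, u_i) = \sum_j \log f(y_{ij}|u_i;\varvec{\Psi}) - u_i^2/2\) and evaluating \(mQ(\varvec{\Psi}, \hat{u}_i) - \tfrac{1}{2}\log \Gamma\), where \(\Gamma\) is minus the second derivative of \(mQ\) at the mode. The Python sketch below does this with Newton's method and central finite differences for any user-supplied \(\sum_j \log f\); the function name `laplace_site` and the step size `h` are our own assumptions, and the sketch presumes \(mQ\) is smooth and unimodal. A convenient check is a Gaussian response with identity link, where \(mQ\) is quadratic in \(u_i\) and the approximation is exact.

```python
import math


def laplace_site(log_f, u0=0.0, h=1e-4, tol=1e-10, max_iter=100):
    """Laplace approximation of log integral of prod_j f(y_ij | u) phi(u) du, scalar u.

    log_f(u) must return sum_j log f(y_ij | u; Psi) for one site i.
    Returns (g(u_hat) - 0.5 * log(Gamma), u_hat), where g(u) = log_f(u) - u**2 / 2
    equals m * Q(Psi, u) and Gamma = -g''(u_hat).
    """
    def g(u):
        return log_f(u) - 0.5 * u * u

    def g1(u):  # central difference for g'
        return (g(u + h) - g(u - h)) / (2.0 * h)

    def g2(u):  # central difference for g''
        return (g(u + h) - 2.0 * g(u) + g(u - h)) / (h * h)

    u = u0
    for _ in range(max_iter):
        step = -g1(u) / g2(u)  # Newton step towards the mode
        u += step
        if abs(step) < tol:
            break
    gamma = -g2(u)
    if gamma <= 0.0:
        raise ValueError("Gamma must be positive at the mode")
    # the (2*pi)^(-1/2) of the N(0, 1) density cancels with the Gaussian integral
    return g(u) - 0.5 * math.log(gamma), u


# Check case: Gaussian responses y_j = b_j + c_j * u + e_j with e_j ~ N(0, s2)
ys, b, c, s2 = [1.2, -0.4, 2.1], [0.5, 0.0, 1.0], [0.8, -0.3, 1.1], 0.6


def gaussian_log_f(u):
    return sum(-0.5 * math.log(2 * math.pi * s2) - (yj - bj - cj * u) ** 2 / (2 * s2)
               for yj, bj, cj in zip(ys, b, c))


approx, u_hat = laplace_site(gaussian_log_f)
```

In the Gaussian check the returned value agrees with the closed-form multivariate normal log-density of \(\varvec{y}_i\) (covariance \(s_2\varvec{I} + \varvec{c}\varvec{c}'\)) up to finite-difference rounding.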
A.2 Poisson Responses
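Species counts can be modelled as Poisson distributed responses, \(y_{ij}\sim Poisson(\mu _{ij})\), with a log link function. Then \(a_j(\mu _{ij}) = \log (\mu _{ij})\), \(b_j(\mu _{ij})=\mu _{ij}\) and \(c_j(y_{ij})=-\log (y_{ij}!)\), and the following Laplace approximation \(\tilde{l}\) for the log-likelihood function is obtained:
\[ \tilde{l}(\varvec{\Psi }) = \sum _{i=1}^n \left[ \sum _{j=1}^m \left\{ y_{ij}\hat{\eta }_{ij} - \exp (\hat{\eta }_{ij}) - \log (y_{ij}!)\right\} - \frac{\varvec{\hat{u}}_i'\varvec{\hat{u}}_i}{2} - \frac{1}{2}\log \det \varvec{\Gamma }(\varvec{\Psi },\varvec{\hat{u}}_i)\right] , \]
where \(\varvec{\Gamma }(\varvec{\Psi },\varvec{\hat{u}}_i)= \sum \nolimits _{j=1}^m \exp (\hat{\eta }_{ij})\varvec{\gamma }_j\varvec{\gamma }_j' + \varvec{I}_d\), with \(\hat{\eta }_{ij}=\alpha _i + \beta _{0j} + \varvec{x}'_i \varvec{\beta }_j + \hat{\varvec{u}_i}'\varvec{\gamma }_j\), and \(\varvec{\hat{u}}_i\) is the maximum of
\[ Q(\varvec{\Psi },\varvec{u}_{i}) = \frac{1}{m}\left[ \sum _{j=1}^m \left\{ y_{ij}\eta _{ij} - \exp (\eta _{ij}) - \log (y_{ij}!)\right\} - \frac{\varvec{u}_i'\varvec{u}_i}{2}\right] \]
with respect to \(\varvec{u}_i\), where \(\eta _{ij}=\alpha _i + \beta _{0j} + \varvec{x}'_i \varvec{\beta }_j + \varvec{u}_i'\varvec{\gamma }_j\).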
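The Poisson case can be coded directly from \(\varvec{\Gamma }(\varvec{\Psi },\varvec{\hat{u}}_i)\): because \(\varvec{\Gamma}\) is the negative Hessian of \(mQ\), a Newton step is \(\varvec{u}_i \leftarrow \varvec{u}_i + \varvec{\Gamma}^{-1}\nabla\) with \(\nabla = \sum_j \{y_{ij} - \exp(\eta_{ij})\}\varvec{\gamma}_j - \varvec{u}_i\), and \(\log\det\varvec{\Gamma}\) follows from its Cholesky factor. The pure-Python sketch below returns one site's contribution \(\sum_j\{y_{ij}\hat\eta_{ij} - \exp(\hat\eta_{ij}) - \log(y_{ij}!)\} - \hat{\varvec{u}}_i'\hat{\varvec{u}}_i/2 - \tfrac12\log\det\varvec{\Gamma}\). It assumes the offset \(\alpha_i + \beta_{0j} + \varvec{x}_i'\varvec{\beta}_j\) has already been computed; the function names are hypothetical, and the step halving is our addition for numerical safety rather than part of the method described here.

```python
import math


def cholesky(a):
    """Lower-triangular L with L L' = a, for a symmetric positive-definite matrix a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def chol_solve(low, b):
    """Solve (L L') x = b by forward and then backward substitution."""
    n = len(b)
    z = [0.0] * n
    for i in range(n):
        z[i] = (b[i] - sum(low[i][k] * z[k] for k in range(i))) / low[i][i]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(low[k][i] * x[k] for k in range(i + 1, n))) / low[i][i]
    return x


def poisson_laplace_site(y, offset, gammas, tol=1e-10, max_iter=100):
    """Laplace-approximated log-likelihood contribution of one site (hypothetical helper).

    y[j]: count of species j; offset[j]: alpha_i + beta_0j + x_i' beta_j;
    gammas[j]: loading vector gamma_j of length d. Returns (value, u_hat).
    """
    m, d = len(y), len(gammas[0])

    def parts(u):
        eta = [offset[j] + sum(u[k] * gammas[j][k] for k in range(d)) for j in range(m)]
        mu = [math.exp(e) for e in eta]
        # g(u) = m * Q(Psi, u) = sum_j log f(y_ij | u) - u'u / 2
        g = sum(y[j] * eta[j] - mu[j] - math.lgamma(y[j] + 1) for j in range(m))
        g -= 0.5 * sum(v * v for v in u)
        grad = [sum((y[j] - mu[j]) * gammas[j][k] for j in range(m)) - u[k] for k in range(d)]
        # Gamma(Psi, u) = sum_j exp(eta_ij) gamma_j gamma_j' + I_d
        gam = [[sum(mu[j] * gammas[j][k] * gammas[j][l] for j in range(m))
                + (1.0 if k == l else 0.0) for l in range(d)] for k in range(d)]
        return g, grad, gam

    u = [0.0] * d
    g, grad, gam = parts(u)
    for _ in range(max_iter):
        step = chol_solve(cholesky(gam), grad)  # Newton direction Gamma^{-1} grad
        t = 1.0
        while True:  # halve the step if g clearly decreases
            cand = [u[k] + t * step[k] for k in range(d)]
            new = parts(cand)
            if new[0] >= g - 1e-10 or t < 1e-8:
                break
            t *= 0.5
        u, (g, grad, gam) = cand, new
        if max(abs(t * s) for s in step) < tol:
            break
    logdet = 2.0 * sum(math.log(row[k]) for k, row in enumerate(cholesky(gam)))
    return g - 0.5 * logdet, u


y = [2, 5, 0, 3]
offset = [math.log(2.0), math.log(4.0), math.log(0.5), math.log(3.0)]
value, u_hat = poisson_laplace_site(y, offset, [[0.8], [0.5], [-0.3], [0.6]])
```

Summing the returned values over sites gives \(\tilde{l}(\varvec{\Psi})\), which an outer optimizer would then maximize over the model parameters; automatic differentiation tools such as TMB (Kristensen et al. 2016) automate this kind of nested computation.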
A.3 Proof of Theorem 2
Assume that the responses \(y_{ij}\) come from the zero-inflated Poisson distribution with mean \(E(y_{ij})=(1-p_j)\mu _{ij}\) and density of the form (3). The log-likelihood function (5) then equals
Hence, the Laplace approximation of the log-likelihood function is
where
with \(\hat{\eta }_{ij}=\alpha _i + \beta _{0j} + \varvec{x}'_i \varvec{\beta }_j + \hat{\varvec{u}_i}'\varvec{\gamma }_j\) and \(\hat{A}_{ij}=\exp \{-\exp (\hat{\eta }_{ij})\}\), and \(\varvec{\hat{u}}_i\) is the maximum of \(Q(\varvec{\Psi },\varvec{u}_{i}) = (1/m)\left( \sum \nolimits _{j=1}^m\log f(y_{ij}|\varvec{u}_i;\varvec{\Psi }) - \varvec{u}_i'\varvec{u}_i/2\right) \).
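The zero-inflated case can be sketched the same way, assuming the standard zero-inflated Poisson form \(P(y_{ij}=0) = p_j + (1-p_j)A_{ij}\) and \(P(y_{ij}=k) = (1-p_j)A_{ij}\mu_{ij}^k/k!\) for \(k \ge 1\), with \(A_{ij} = \exp\{-\exp(\eta_{ij})\}\). This matches the stated mean \((1-p_j)\mu_{ij}\), but density (3) itself is not reproduced here, so treat the form as an assumption. Because \(\log f\) need not be concave in \(u_i\) for zero observations, the hypothetical `zip_laplace_site` below (scalar \(u_i\), finite-difference derivatives) starts Newton's method from the best point on a coarse grid.

```python
import math


def zip_log_f(y, eta, p):
    """log f(y | eta) under the assumed ZIP form with zero-inflation probability p."""
    mu = math.exp(eta)
    if y == 0:
        a = math.exp(-mu)  # Poisson probability of a zero, A_ij = exp{-exp(eta_ij)}
        return math.log(p + (1.0 - p) * a)
    return math.log(1.0 - p) + y * eta - mu - math.lgamma(y + 1)


def zip_laplace_site(y, offset, gamma, p, h=1e-4, tol=1e-10, max_iter=100):
    """Laplace approximation for one site with a scalar latent variable u_i.

    offset[j] = alpha_i + beta_0j + x_i' beta_j, gamma[j] = loading, p[j] = zero-inflation.
    """
    def g(u):  # m * Q(Psi, u)
        return sum(zip_log_f(y[j], offset[j] + gamma[j] * u, p[j])
                   for j in range(len(y))) - 0.5 * u * u

    def g1(u):
        return (g(u + h) - g(u - h)) / (2.0 * h)

    def g2(u):
        return (g(u + h) - 2.0 * g(u) + g(u - h)) / (h * h)

    # g need not be concave here, so start from the best point of a coarse grid
    u = max((k / 10.0 for k in range(-50, 51)), key=g)
    for _ in range(max_iter):
        curv = g2(u)
        if curv >= 0.0:  # Newton needs local concavity
            break
        step = -g1(u) / curv
        u += step
        if abs(step) < tol:
            break
    gam = -g2(u)
    if gam <= 0.0:
        raise ValueError("Gamma must be positive at the mode")
    return g(u) - 0.5 * math.log(gam), u


y = [0, 3, 1, 4]
offset = [0.0, math.log(3.0), 0.0, math.log(4.0)]
gamma = [0.5, 0.7, 0.4, 0.6]
value, u_hat = zip_laplace_site(y, offset, gamma, p=[0.3, 0.2, 0.2, 0.1])
```

Setting every \(p_j = 0\) recovers the Poisson Laplace approximation, which gives a quick consistency check.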
A.4 Proof of Theorem 3
Assume that the responses \(y_{ij}\) come from the Tweedie distribution with mean \(E(y_{ij})=\mu _{ij}\) and density of the form (4). The log-likelihood function (5) then equals
Hence, the Laplace approximation of the log-likelihood function is
where
with \(\hat{\eta }_{ij}=\alpha _i + \beta _{0j} + \varvec{x}'_i \varvec{\beta }_j + \hat{\varvec{u}_i}'\varvec{\gamma }_j\) and \(\hat{A}_{ij}=\exp \{-\exp (\hat{\eta }_{ij})\}\), and \(\varvec{\hat{u}}_i\) is the maximum of \(Q(\varvec{\Psi },\varvec{u}_{i}) = (1/m)\left( \sum \nolimits _{j=1}^m\log f(y_{ij}|\varvec{u}_i;\varvec{\Psi }) - \varvec{u}_i'\varvec{u}_i/2\right) \).
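For Tweedie responses the normalizing term of the density has no closed form (it is usually evaluated by series; see Dunn and Smyth 2005), but it does not depend on \(\varvec{u}_i\), so it affects neither \(\hat{\varvec{u}}_i\) nor \(\varvec{\Gamma}\). The sketch below assumes the usual exponential dispersion form with power parameter \(1<\nu<2\), dispersion \(\phi\) and log link, \(\log f = \phi^{-1}\{y\mu^{1-\nu}/(1-\nu) - \mu^{2-\nu}/(2-\nu)\} + \log a(y,\phi)\), and returns the Laplace approximation for one site up to the constant \(\sum_j \log a(y_{ij},\phi)\). For a scalar \(u_i\), differentiating twice gives \(\Gamma = \phi^{-1}\sum_j\{(2-\nu)\mu_{ij}^{2-\nu} - (1-\nu)y_{ij}\mu_{ij}^{1-\nu}\}\gamma_j^2 + 1\), which is positive because \(1<\nu<2\) and \(y_{ij}\ge 0\). The symbols \(\nu\), \(\phi\) and \(a\), and the function names, are our assumptions rather than the notation of density (4).

```python
import math


def tweedie_kernel(y, mu, phi, nu):
    """u-dependent part of log f for the assumed Tweedie form with 1 < nu < 2."""
    return (y * mu ** (1.0 - nu) / (1.0 - nu) - mu ** (2.0 - nu) / (2.0 - nu)) / phi


def tweedie_laplace_site(y, offset, gamma, phi, nu, tol=1e-10, max_iter=100):
    """Scalar-u Laplace approximation for one site, up to the constant sum_j log a(y_ij, phi).

    Log link: mu_ij = exp(offset[j] + gamma[j] * u). Returns (value, u_hat, Gamma).
    """
    def g(u):  # m * Q(Psi, u) without the u-free normalizing terms
        return sum(tweedie_kernel(yj, math.exp(oj + gj * u), phi, nu)
                   for yj, oj, gj in zip(y, offset, gamma)) - 0.5 * u * u

    def grad_and_gamma(u):
        grad, gam = -u, 1.0
        for yj, oj, gj in zip(y, offset, gamma):
            mu = math.exp(oj + gj * u)
            grad += gj * mu ** (1.0 - nu) * (yj - mu) / phi
            # minus the second derivative of the kernel; >= 0 since 1 < nu < 2, y >= 0
            gam += gj * gj * ((2.0 - nu) * mu ** (2.0 - nu)
                              - (1.0 - nu) * yj * mu ** (1.0 - nu)) / phi
        return grad, gam

    u = 0.0
    for _ in range(max_iter):
        grad, gam = grad_and_gamma(u)
        step = grad / gam  # Newton step, since Gamma = -g''(u)
        while g(u + step) < g(u) - 1e-10 and abs(step) > tol:  # damp overshoots
            step *= 0.5
        u += step
        if abs(step) < tol:
            break
    _, gam = grad_and_gamma(u)
    return g(u) - 0.5 * math.log(gam), u, gam


y = [0.0, 2.5, 0.7, 4.1]  # biomass-type data: exact zeros and positive values
offset = [math.log(0.5), math.log(2.0), 0.0, math.log(3.5)]
gamma = [0.4, 0.6, -0.3, 0.5]
value, u_hat, gam = tweedie_laplace_site(y, offset, gamma, phi=1.2, nu=1.5)
```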
B Additional Application Results
The ordination of \(n=56\) sites based on a generalized linear latent variable model without covariates, assuming negative binomial distributed responses. The sites are coloured according to their (a) soil organic matter (SOM) values and (b) phosphorus (P) values, and labelled according to the sampling site.
The ordination of \(n=56\) sites based on a generalized linear latent variable model with pH, soil organic matter and phosphorus as covariates, assuming negative binomial distributed responses. The sites are coloured according to their (a) pH values, (b) soil organic matter (SOM) values and (c) phosphorus (P) values, and labelled according to the sampling site. The effect of the environmental variables vanishes, but the ordination is still affected by the sampling location, with a few Kilpisjärvi sites differing from the others in species composition.
About this article
Cite this article
Niku, J., Warton, D.I., Hui, F.K.C. et al. Generalized Linear Latent Variable Models for Multivariate Count and Biomass Data in Ecology. JABES 22, 498–522 (2017). https://doi.org/10.1007/s13253-017-0304-7
DOI: https://doi.org/10.1007/s13253-017-0304-7
Keywords
- Biomass
- Laplace approximation
- Ordination
- Overdispersed count
- Species interactions