Environmental and Ecological Statistics, Volume 18, Issue 3, pp 427–446

Compositional analysis of overdispersed counts using generalized estimating equations

  • David I. Warton
  • Peter Guttorp


Multivariate abundance data are commonly collected in ecology and used to explore questions of “community composition”: how the relative abundance of different taxa changes with environmental conditions. In this paper, we propose a log-linear marginal modeling approach for analyzing such compositional count data, via generalized estimating equations. This method exploits the multiplicative nature of log-linear models for counts, by reparameterizing models that describe marginal effects on mean abundance. This allows partitioning into “main effects” and compositional effects, which is appealing for interpretation. We apply the proposed approach to reanalyze compositional counts of benthic invertebrates from Delaware Bay, and data on invertebrate communities inhabiting Acacia plants in eastern Australia. In both cases we resort to a resampling approach to make inferences about regression parameters, because the number of clusters was not large compared to cluster size.
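The partition into “main effects” and compositional effects can be illustrated with a minimal sketch. It assumes a sum-to-zero parameterization, in which the main effect is the across-taxon mean of the per-taxon slopes; the abstract does not state the exact reparameterization used in the paper, so this is only one plausible reading. The function names, intercepts, and slope values are hypothetical and are not taken from the paper or its data.

```python
import numpy as np

def partition_effects(beta):
    """Split per-taxon log-linear slopes into a main effect and compositional effects.

    Assumption (not from the paper): the main effect is the across-taxon mean
    slope, so the compositional effects sum to zero.
    """
    beta = np.asarray(beta, dtype=float)
    main = beta.mean()
    comp = beta - main
    return main, comp

# Hypothetical intercepts and slopes for three taxa (illustrative values only).
intercepts = np.array([1.0, 2.0, 0.5])
beta = np.array([0.5, 0.2, -0.1])
main, comp = partition_effects(beta)

def relative_abundance(slopes, x):
    """Relative mean abundance of each taxon at covariate value x under a log link."""
    mu = np.exp(intercepts + slopes * x)
    return mu / mu.sum()

# Under a log link, exp(main * x) multiplies every taxon's mean by the same
# factor, so it cancels when means are normalized to proportions.
x = 1.7
full = relative_abundance(beta, x)
comp_only = relative_abundance(comp, x)
print(np.allclose(full, comp_only))
```

Because the model is multiplicative on the mean scale, the main effect rescales total abundance while relative abundance depends only on the compositional effects, which is why the two sets of proportions agree.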


Keywords: Bootstrap; Community composition data; Log-linear models; Negative binomial regression; Reparameterization; Multivariate analysis



Supplementary material

ESM 1: 10651_2010_145_MOESM1_ESM.pdf (PDF, 69 kb)



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  1. School of Mathematics and Statistics, Evolution and Ecology Research Centre, The University of New South Wales, Sydney, Australia
  2. Department of Statistics, The University of Washington, Seattle, USA
