Environmental and Ecological Statistics, Volume 19, Issue 3, pp 327–344

Hierarchical Bayesian strategy for modeling correlated compositional data with observed zero counts

  • Carolyn Huston
  • Carl Schwarz


This article proposes a hierarchical multivariate conditional autoregressive (MVCAR) model for a compositional response vector. We focus on situations in which the composition is discrete, which occurs when observations are based on small multinomial counts. We address drawbacks of current modeling approaches for such data. We demonstrate our hierarchical model with data used to help manage a commercial sockeye salmon fishery in the Fraser River of British Columbia.
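
The difficulty with observed zero counts can be illustrated with a small sketch. It is not the authors' model: it only assumes the standard additive log-ratio transform for compositions and a plain multinomial likelihood, and the function names and the counts are hypothetical. Applying the log-ratio transform directly to observed proportions fails when a category has a zero count, whereas placing the transform on latent proportions and treating the counts as multinomial keeps the likelihood finite.

```python
import math


def alr(proportions):
    """Additive log-ratio transform: log(p_i / p_D) for every i before the last."""
    ref = proportions[-1]
    return [math.log(p / ref) for p in proportions[:-1]]


def inverse_alr(eta):
    """Map unconstrained values back to a composition that sums to one."""
    expd = [math.exp(e) for e in eta] + [1.0]
    total = sum(expd)
    return [e / total for e in expd]


def multinomial_loglik(counts, probs):
    """Multinomial log-likelihood; categories with zero counts add nothing to the sum."""
    n = sum(counts)
    ll = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    ll += sum(c * math.log(p) for c, p in zip(counts, probs) if c > 0)
    return ll


# Hypothetical small sample with an observed zero in the second category.
counts = [3, 0, 2]
observed = [c / sum(counts) for c in counts]

# Transforming the observed proportions directly hits log(0).
try:
    alr(observed)
    alr_defined = True
except ValueError:
    alr_defined = False

# Latent proportions on the log-ratio scale (illustrative values) are strictly
# positive, so the multinomial likelihood of the same counts is well defined.
latent = inverse_alr([0.5, -2.0])
loglik = multinomial_loglik(counts, latent)
```

In a hierarchical setting of the kind the abstract describes, the unconstrained log-ratio values would carry the correlation structure, while the sum-to-one constraint is restored by the inverse transform.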


Compositional data, MVCAR, Hierarchical model, Bayesian, Sum-to-one, Zero counts





Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Statistics and Actuarial Science, Simon Fraser University, Burnaby, Canada
