Lobachevskii Journal of Mathematics, Volume 39, Issue 3, pp 398–412

A Dirichlet Regression Model for Compositional Data with Zeros

  • Michail Tsagris
  • Connie Stewart


Compositional data arise in many different fields, such as economics, archaeometry, ecology, geology and political science. Regression where the dependent variable is a composition is usually carried out via a log-ratio transformation of the composition or via the Dirichlet distribution. However, when the data contain zero values, neither approach is readily applicable. Suggestions for this problem exist, but most of them rely on substituting the zero values. In this paper we adjust the Dirichlet distribution, when covariates are present, to allow zero values in the data without modifying any values. To do so, we modify the log-likelihood of the Dirichlet distribution to account for zero values. Examples and simulation studies illustrate the performance of the zero-adjusted Dirichlet regression.


Keywords: compositional data, regression, Dirichlet distribution, zero values
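
The abstract says that the Dirichlet log-likelihood is modified to account for zero values but does not give the formula. The sketch below shows one plausible reading, not necessarily the authors' exact method: each observation contributes a Dirichlet log-density computed over its non-zero components only, using the matching subset of parameters. The multinomial-logit link for the means, the common precision parameter `phi`, and the function names `dirichlet_means` and `zero_adjusted_loglik` are all assumptions made for illustration.

```python
import math


def dirichlet_means(x, betas):
    """Map covariates x to mean proportions with a multinomial-logit link.

    Assumption: the first component is the reference (linear predictor 0),
    and betas holds one coefficient vector per remaining component.
    """
    etas = [0.0] + [sum(b * v for b, v in zip(beta, x)) for beta in betas]
    top = max(etas)  # subtract the maximum for numerical stability
    weights = [math.exp(e - top) for e in etas]
    total = sum(weights)
    return [w / total for w in weights]


def zero_adjusted_loglik(Y, X, betas, phi):
    """Sum of Dirichlet log-densities over the non-zero parts of each row.

    Assumption: a zero component is simply dropped for that observation,
    and the remaining parameters phi * mu_j are used unchanged.
    """
    loglik = 0.0
    for y, x in zip(Y, X):
        mu = dirichlet_means(x, betas)
        keep = [j for j, value in enumerate(y) if value > 0]
        alpha = [phi * mu[j] for j in keep]
        # log normalising constant of the Dirichlet on the kept components
        loglik += math.lgamma(sum(alpha)) - sum(math.lgamma(a) for a in alpha)
        # log kernel: sum of (alpha_j - 1) * log(y_j) over kept components
        loglik += sum((a - 1.0) * math.log(y[j]) for a, j in zip(alpha, keep))
    return loglik


# Hypothetical data: two compositions with three parts, one containing a zero.
Y = [[0.2, 0.3, 0.5], [0.4, 0.6, 0.0]]
X = [[1.0], [1.0]]          # intercept-only design
betas = [[0.0], [0.0]]      # all means equal to 1/3
print(zero_adjusted_loglik(Y, X, betas, phi=3.0))
```

In this sketch no zero is replaced or imputed: a row without zeros contributes the usual Dirichlet log-density, and a row with zeros contributes the density of a lower-dimensional Dirichlet. Maximising the function over `betas` and `phi` with a general-purpose optimiser would give the regression estimates under these assumptions.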





Copyright information

© Pleiades Publishing, Ltd. 2018

Authors and Affiliations

  1. Department of Computer Science, University of Crete, Heraklion, Crete, Greece
  2. Department of Mathematics and Statistics, University of New Brunswick, Saint John, New Brunswick, Canada
