A Dirichlet Regression Model for Compositional Data with Zeros

Abstract

Compositional data arise in many fields, such as economics, archaeometry, ecology, geology and political science. Regression in which the dependent variable is a composition is usually carried out via a log-ratio transformation of the composition or via the Dirichlet distribution. However, neither approach is readily applicable when the data contain zero values, since both involve logarithms of the components. Remedies for this problem exist, but most of them rely on substituting the zero values. In this paper we adjust the Dirichlet distribution when covariates are present, so that zero values can remain in the data without any values being modified. To do so, we modify the log-likelihood of the Dirichlet distribution to account for zero values. Examples and simulation studies illustrate the performance of the zero-adjusted Dirichlet regression.
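The idea of a Dirichlet log-likelihood that tolerates zeros can be sketched as follows. This is only an illustrative reading of the abstract, not the paper's exact likelihood: it assumes that, for each observation, the zero components are dropped and the density is evaluated on the remaining components with their own parameters. The function names, the multinomial-logit link for the means and the precision parameter `phi` are all assumptions introduced for this sketch.

```python
import math
from typing import Sequence


def alphas_from_covariates(
    x: Sequence[float], betas: Sequence[Sequence[float]], phi: float
) -> list[float]:
    """Map covariates to Dirichlet parameters alpha_j = phi * mu_j.

    Assumption: a multinomial-logit link with the first component as the
    reference, so betas holds one coefficient vector per remaining component.
    """
    etas = [sum(b * xi for b, xi in zip(beta, x)) for beta in betas]
    denom = 1.0 + sum(math.exp(eta) for eta in etas)
    mus = [1.0 / denom] + [math.exp(eta) / denom for eta in etas]
    return [phi * mu for mu in mus]


def dirichlet_loglik_nonzero(y: Sequence[float], alpha: Sequence[float]) -> float:
    """Dirichlet log-density of y using its non-zero components only.

    Assumption: zero components contribute nothing, and the normalising
    constant is recomputed from the alphas of the non-zero components.
    """
    if len(y) != len(alpha):
        raise ValueError("y and alpha must have the same length")
    kept = [(yj, aj) for yj, aj in zip(y, alpha) if yj > 0.0]
    if len(kept) < 2:
        raise ValueError("at least two non-zero components are required")
    alpha_sum = sum(aj for _, aj in kept)
    log_norm = math.lgamma(alpha_sum) - sum(math.lgamma(aj) for _, aj in kept)
    log_kernel = sum((aj - 1.0) * math.log(yj) for yj, aj in kept)
    return log_norm + log_kernel


# Hypothetical usage: one observation with a zero in its second component.
alpha = alphas_from_covariates(x=[1.0, 0.5], betas=[[0.0, 0.0], [0.0, 0.0]], phi=3.0)
loglik = dirichlet_loglik_nonzero([0.5, 0.0, 0.5], alpha)
```

Summing such per-observation terms over the sample would give an objective that could be maximised numerically over the coefficients and `phi`; how the zero pattern itself is modelled is left out of this sketch.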

Author information

Corresponding author

Correspondence to Michail Tsagris.

Additional information

(Submitted by A. I. Volodin)

About this article

Cite this article

Tsagris, M., Stewart, C. A Dirichlet Regression Model for Compositional Data with Zeros. Lobachevskii J Math 39, 398–412 (2018). https://doi.org/10.1134/S1995080218030198

Keywords

  • Compositional data
  • regression
  • Dirichlet distribution
  • zero values