Regression for compositions based on a generalization of the Dirichlet distribution

Abstract

The simplex is the geometric locus of D-dimensional positive data with constant sum, called compositions. A possible distribution for compositions is the Dirichlet. Dirichlet models contain no scale parameters, and the D shape parameters are assumed to depend on auxiliary variables. This peculiar feature makes Dirichlet models difficult to apply and to interpret. Here, we propose a generalization of the Dirichlet, called the simplicial generalized Beta (SGB) distribution. It includes an overall shape parameter, a scale composition and the D Dirichlet shapes. The SGB is flexible enough to accommodate many practical situations. SGB regression models are applied to data from the United Kingdom Time Use Survey. The R package SGB makes the methods accessible to users.





Author information


Correspondence to Monique Graf.


Appendices

Proofs

Proof of Theorem 1

  1.

    \(\{a_k,\, k=1,\ldots ,D\}\) not constant implies dependence on \(\theta\).

    Making the change of variables defined by \(t = \sum _{j=1}^D y_j\) and \(u_k = y_k/t\), \(k=1,\ldots ,D-1\), and setting \(u_D=1-\sum _{j=1}^{D-1} u_j\), we obtain

    $$\begin{aligned}&f({\mathbf {u}},t | \theta ) = \prod _{k=1}^D \left[ \frac{a_k}{\Gamma (p_k) \theta ^{1/a_k}b_k}\left( \frac{t u_k}{\theta ^{1/a_k}b_k}\right) ^{a_k p_k-1} \exp \left( -\left( \frac{t u_k}{\theta ^{1/a_k}b_k}\right) ^{a_k}\right) \right] t^{D-1} \nonumber \\&= \left[ \prod _{k=1}^D \frac{a_k}{\Gamma (p_k) b_k }\left( \frac{u_k}{b_k}\right) ^{a_k p_k-1}\right] \exp \left[ -\sum _{k=1}^D \left( \frac{t}{\theta ^{1/a_k}}\frac{u_k}{b_k}\right) ^{a_k}\right] \prod _{k=1}^D \left( \frac{t}{\theta ^{1/a_k}}\right) ^{a_kp_k } \frac{1}{t} \nonumber \\&=f({\mathbf {u}}|\theta )f(t|{\mathbf {u}},\theta ). \end{aligned}$$

    We want to find the constant of integration C, such that

    $$\begin{aligned} C\int _0^{\infty }f(t|{\mathbf {u}},\theta ) dt&= \int _0^{\infty } \exp \left[ -\sum _{k=1}^D \left( \frac{t}{\theta ^{1/a_k}}\frac{u_k}{b_k}\right) ^{a_k}\right] \prod _{k=1}^D \left( \frac{t}{\theta ^{1/a_k}}\right) ^{a_kp_k } \frac{1}{t} dt \\&= \int _0^{\infty } \exp \left[ -\theta ^{-1}\sum _{k=1}^D \left( \frac{t\,u_k}{b_k}\right) ^{a_k}\right] \theta ^{-P} \prod _{k=1}^D t^{a_kp_k } \frac{1}{t} dt. \end{aligned}$$

    It is clear that, if the parameters \(a_k\) are not constant, the result still depends on \(\theta\). This implies that in this case the distribution of the composition depends on the mixing scheme.

  2.

    \(\{a_k,\, k=1,\ldots ,D\}\) constant implies independence of \(\theta\).

    If \(a_k=a\) for all \(k=1,\ldots ,D\), \(f(t|{\mathbf {u}},\theta )\) is easily integrated. Setting

    $$\begin{aligned}c_k= (u_k/b_k)^{a}, \quad v=\left( \sum _{k=1}^D c_k\right) \frac{t^{a}}{\theta } \text { and } dv=a\left( \sum _{k=1}^D c_k\right) \frac{t^{a}}{\theta }\frac{1}{t}dt, \end{aligned}$$

    we have

    $$\begin{aligned} C\int _0^{\infty }f(t|{\mathbf {u}},\theta ) dt&= \frac{\Gamma (P)}{a\left( \sum _{k=1}^D (u_k/b_k)^{a}\right) ^P}. \end{aligned}$$
    (16)

    Thus the constant C in Eq. (16) does not depend on \(\theta\), and hence neither does the distribution of the composition.

\(\square\)
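Part 2 of the proof can also be checked numerically. With a common shape \(a\), the generalized Gamma density above admits the stochastic representation \(Y_k = b_k(\theta G_k)^{1/a}\) with \(G_k \sim \mathrm{Gamma}(p_k, 1)\), so \(\theta^{1/a}\) is a common factor that cancels in the closure. A minimal sketch in Python (standard library only; the representation is inferred here from the density, not quoted from the text):

```python
import random

# Assumed stochastic representation (from the generalized Gamma density in
# the proof): Y_k = b_k * (theta * G_k)**(1/a) with G_k ~ Gamma(p_k, 1).
# In the closure U_k = Y_k / sum_j Y_j the common factor theta**(1/a)
# cancels, so the composition does not depend on theta.
def composition(a, b, theta, gammas):
    y = [bk * (theta * g) ** (1.0 / a) for bk, g in zip(b, gammas)]
    s = sum(y)
    return [yk / s for yk in y]

random.seed(1)
a, b, p = 2.0, [0.2, 0.5, 0.3], [1.5, 2.0, 0.8]
gammas = [random.gammavariate(pk, 1.0) for pk in p]

u_theta1 = composition(a, b, 1.0, gammas)   # theta = 1
u_theta9 = composition(a, b, 9.0, gammas)   # theta = 9, same Gamma draws

# identical compositions, whatever the mixing value theta
max_diff = max(abs(x - y) for x, y in zip(u_theta1, u_theta9))
```

With unequal shapes \(a_k\), the factor \(\theta^{1/a_k}\) differs across parts and no longer cancels, which is the content of the first part of the theorem.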

Proof of Theorem 2

Taking the density \(f_{{\mathbf {U}}}({\mathbf {u}}_{-D})\) as expressed in Eq. (4), we see that the kernel is, up to a constant factor,

$$\begin{aligned} K({\mathbf {u}}_{-D}) \propto \frac{\left[ (1-\sum _{j=1}^{D-1}u_j)/b_D\right] ^{ap_D-1}}{\left\{ \sum _{j=1}^{D-1}(u_j/b_j)^{a} + \left[ (1-\sum _{j=1}^{D-1}u_j)/b_D\right] ^{a}\right\} ^{P}}, \end{aligned}$$

and cannot be put into the form \(K({\mathbf {u}}_{-D})=h\left( \sum _{j=1}^{D-1} (u_j/b_j)^{\beta _j}\right)\), except if \(a=b_j=1\), in which case it reduces to \(K({\mathbf {u}}_{-D};\,a=1,b_j=1,j=1,\ldots ,D) \propto \left[ 1-\sum _{j=1}^{D-1}u_j\right] ^{p_D-1}.\)

Thus \(h(x;\,a=1,b_j=1,j=1,\ldots ,D)=(1-x)^{p_D-1}.\) \(\square\)

Proof of Theorem 3

Without loss of generality, suppose that \(J=1,\ldots ,r\), where \(2\le r<D-1\). Consider the following change of variables: \(x = \sum _{j=1}^r u_j\); \(v_k = u_k/x\) if \(1 \le k \le r-1\); \(w_k = u_k/(1-x)\) if \(r+1 \le k \le D-1\). The Jacobian is \(x^{r-1}(1-x)^{D-r-1}\).

Let \({\mathbf {b}}_1=(b_1,\ldots ,b_r)\) and \({\mathbf {b}}_2=(b_{r+1},\ldots ,b_D)\). We have \(\left( \| {\mathbf {u}}/{\mathbf {b}}\| _a\right) ^a=x^a\left( \| {\mathbf {v}}/{\mathbf {b}}_1\| _a\right) ^a+(1-x)^a \left( \| {\mathbf {w}}/{\mathbf {b}}_2\| _a\right) ^a.\) Making the above change of variables in Eq. (5) and setting \(P_1=\sum _{j=1}^r p_j\) and \(P_2=\sum _{j=r+1}^D p_j,\) we obtain, rearranging terms

$$\begin{aligned}&f_{X,{\mathbf {V}},{\mathbf {W}}}(x,{\mathbf {v}},{\mathbf {w}}) = f_{{\mathbf {U}}}({\mathbf {u}}_{-D}(x,{\mathbf {v}},{\mathbf {w}}))x^{r-1}(1-x)^{D-r-1} \nonumber \\& =\frac{\Gamma(P)a^{D-1}}{\prod_{j=1}^D \Gamma(p_j)} \prod_{k=1}^{r} \left\{\frac{xv_k/b_k}{\left[x^a\left(\|{\mathbf {v}}/{\mathbf {b}}_1\|_a\right)^a+(1-x)^a \left(\|{\mathbf {w}}/{\mathbf {b}}_2\|_a\right)^a\right]^{1/a}}\right\}^{ap_k} \times \nonumber \\&\prod _{k=r+1}^{D} \left\{ \frac{(1-x)w_k/b_k}{\left[ x^a\left( \| {\mathbf {v}}/{\mathbf {b}}_1\| _a\right) ^a+(1-x)^a \left( \| {\mathbf {w}}/{\mathbf {b}}_2\| _a\right) ^a\right] ^{1/a}}\right\} ^{ap_k} \times \nonumber \\&\frac{x^{r-1}(1-x)^{D-r-1}}{x^r\prod _{k=1}^{r-1}v_k \left( 1-\sum _{j=1}^{r-1}v_j\right) (1-x)^{D-r}\prod _{k=r+1}^{D-1}w_k \left( 1-\sum _{j=r+1}^{D-1}w_j\right) } \nonumber \\ \nonumber \\&= \frac{\Gamma (P_1)a^{r-1}}{\prod _{j=1}^r \Gamma (p_j)} \prod _{k=1}^{r}\left\{ \frac{v_k/b_k}{\| {\mathbf {v}}/{\mathbf {b}}_1\| _a}\right\} ^{ap_k} \frac{1}{\prod _{k=1}^{r-1}v_k\left( 1-\sum _{j=1}^{r-1}v_j\right) } \times \end{aligned}$$
(17)
$$\begin{aligned}&\frac{\Gamma (P_2)a^{D-r-1}}{\prod _{j=r+1}^D \Gamma (p_j)}\prod _{k=r+1}^{D}\left\{ \frac{w_k/b_k}{\| {\mathbf {w}}/{\mathbf {b}}_2\| _a}\right\} ^{ap_k} \frac{1}{\prod _{k=r+1}^{D-1}w_k\left( 1-\sum _{j=r+1}^{D-1}w_j\right) } \times \nonumber \\&\frac{\Gamma (P)a}{ \Gamma (P_1)\Gamma (P_2)}\left\{ \frac{x^a(\| {\mathbf {v}}/{\mathbf {b}}_1\| _a)^a}{x^a\left( \| {\mathbf {v}}/{\mathbf {b}}_1\| _a\right) ^a+(1-x)^a \left( \| {\mathbf {w}}/{\mathbf {b}}_2\| _a\right) ^a}\right\} ^{P_1} \times \nonumber \\&\left\{ \frac{(1-x)^a(\| {\mathbf {w}}/{\mathbf {b}}_2\| _a)^a}{x^a\left( \| {\mathbf {v}}/{\mathbf {b}}_1\| _a\right) ^a+(1-x)^a \left( \| {\mathbf {w}}/{\mathbf {b}}_2\| _a\right) ^a}\right\} ^{P_2} \times \frac{1}{x(1-x)} \nonumber \\&\nonumber \\&= f_{{\mathbf {V}}}({\mathbf {v}}) f_{{\mathbf {W}}}({\mathbf {w}}) f_{X | {\mathbf {V}}, {\mathbf {W}}}(x;{\mathbf {v}},{\mathbf {w}}). \end{aligned}$$
(18)

Thus the amalgamation is, conditionally on the two sub-compositions,

\(SGB\left( a,\{(\| {\mathbf {v}}/{\mathbf {b}}_1\| _a^{-1},P_1), (\| {\mathbf {w}}/{\mathbf {b}}_2\| _a^{-1},P_2) \}\right) ,\) with conditional density

$$\begin{aligned}&f_{X | {\mathbf {V}}, {\mathbf {W}}}(x;{\mathbf {v}},{\mathbf {w}}) = \frac{\Gamma (P)a}{\Gamma (P_1)\Gamma (P_2)} \times \left\{ \frac{x^a(\| {\mathbf {v}}/{\mathbf {b}}_1\| _a)^a}{x^a\left( \| {\mathbf {v}}/{\mathbf {b}}_1\| _a\right) ^a+(1-x)^a \left( \| {\mathbf {w}}/{\mathbf {b}}_2\| _a\right) ^a}\right\} ^{P_1} \times \nonumber \\&\left\{ \frac{(1-x)^a(\| {\mathbf {w}}/{\mathbf {b}}_2\| _a)^a}{x^a\left( \| {\mathbf {v}}/{\mathbf {b}}_1\| _a\right) ^a+(1-x)^a \left( \| {\mathbf {w}}/{\mathbf {b}}_2\| _a\right) ^a}\right\} ^{P_2} \times \frac{1}{x(1-x)}. \end{aligned}$$
(19)

The constant of integration does not involve \({\mathbf {v}}\) and \({\mathbf {w}}\). Thus the two sub-compositions \({\mathbf {V}}\) and \({\mathbf {W}}\) are independent SGB.

  1.

    The densities of \({\mathbf {V}}\) and \({\mathbf {W}}\) are at the first two rows of Eq. (17). The independence follows directly from Eq. (18).

  2.

    The conditional distribution of \((X|{\mathbf {V}},{\mathbf {W}})\) is given in Eq. (19).

  3.

    The conditional expectation of \(\log (X/(1-X))\) is a direct application of Eq. (21).

  4.

    The expression for \({\mathrm {{E}}}_A(X | {\mathbf {V}}={\mathbf {v}}, {\mathbf {W}}={\mathbf {w}})\) is an application of Eq. (8) to the density in Eq. (19).

\(\square\)
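The independence claim of Theorem 3 can be probed by simulation. Sampling \({\mathbf U}\) through independent generalized Gamma parts with common shape \(a\) (the representation \(Y_k = b_k G_k^{1/a}\), \(G_k \sim \mathrm{Gamma}(p_k,1)\), is an assumption inferred from the proof of Theorem 1), log-ratios within \({\mathbf V}\) and within \({\mathbf W}\) should be uncorrelated:

```python
import random, math

# Sample U ~ SGB(a, b, p) via independent generalized Gammas with common
# shape a, split the parts into the two sub-compositions V (parts 1..r)
# and W (parts r+1..D) of Theorem 3, and check that log-ratios within V
# and within W are uncorrelated, as independence implies.  Log-ratios of
# sub-composition parts equal log-ratios of the raw parts, since the
# closure factor cancels.
random.seed(7)
a = 1.5
b = [0.3, 0.2, 0.1, 0.25, 0.15]
p = [1.2, 0.9, 1.5, 0.7, 1.1]
r, n = 2, 100_000

lv, lw = [], []
for _ in range(n):
    y = [bk * random.gammavariate(pk, 1.0) ** (1.0 / a) for bk, pk in zip(b, p)]
    lv.append(math.log(y[0] / y[1]))        # log-ratio inside V
    lw.append(math.log(y[r] / y[r + 1]))    # log-ratio inside W

def corr(x, z):
    mx, mz = sum(x) / len(x), sum(z) / len(z)
    cov = sum((x1 - mx) * (z1 - mz) for x1, z1 in zip(x, z))
    vx = sum((x1 - mx) ** 2 for x1 in x)
    vz = sum((z1 - mz) ** 2 for z1 in z)
    return cov / math.sqrt(vx * vz)

rho = corr(lv, lw)   # should be close to zero
```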

Moments of ratios and log-ratios of parts

1. It is equivalent to compute moments of ratios and log-ratios of parts from the distribution of the composition \({\mathbf {U}}\) or from the initial vector \({\mathbf {Y}}\), because

$$\begin{aligned} U_k/U_j=Y_k/Y_j\qquad \text {for all } j,k=1,\ldots ,D. \end{aligned}$$

The mixed moments of the random vector \({\mathbf {Y}}\) are given by \(M_{{\mathbf {Y}}}(t_1,\ldots ,t_D)= \mathrm {{E}}\left( Y_1^{t_1} \ldots Y_{D}^{t_D}\right) .\)

Set \(t_+=\sum _{j=1}^{D-1} t_j\). Then the mixed moments of ratios of a random composition following an SGB distribution \({{SGB(a, \{b_j,p_j\})}},\) \(j=1,\ldots ,D,\) are given by the corresponding moments of a product of generalized Gamma random variables, namely,

$$\begin{aligned}&M_{{\mathbf {U}}}(t_1,\ldots ,t_{D-1})= M_{{\mathbf {Y}}}(t_1,\ldots ,t_{D-1},-t_+) \nonumber \\&= \mathrm {{E}}\left[ \left( \frac{U_1}{U_{D}}\right) ^{t_1}\ldots \left( \frac{U_{D-1}}{U_{D}}\right) ^{t_{D-1}}\right] \nonumber \\&= \frac{\prod _{k=1}^{D-1} (b_k)^{t_k}}{(b_D)^{t_+}} \frac{\left\{ \prod _{k=1}^{D-1} \Gamma (p_k+t_k/a)\right\} \Gamma (p_D-t_+/a)}{\prod _{j=1}^D \Gamma (p_j)} \nonumber \\&\qquad -ap_k< t_k, k=1,\ldots ,D-1; \, t_+ < ap_D. \end{aligned}$$
(20)

2. The function \(M_{{\mathbf {U}}}\) in Eq. (20) is the moment generating function of the log-ratios of parts. By taking the first and second derivative of \(M_{{\mathbf {U}}}(t{\mathbf {e}}_i)\) at \(t=0\), Eqs. (21) and (22) are obtained.

$$\begin{aligned} \mathrm {{E}}\log \left( U_i/U_{D}\right)&= \log \left( b_i/b_D\right) +\frac{1}{a}\left( \psi (p_i)-\psi (p_D)\right) \end{aligned}$$
(21)
$$\begin{aligned} \mathrm {{E}}\left[ \log (U_i/U_D)\right] ^2&= \left[ \log \left( b_i/b_D\right) + \frac{1}{a}\left( \psi (p_i)-\psi (p_D)\right) \right] ^2 \nonumber \\&+ \frac{1}{a^2}\left( \psi ^{(1)}(p_i)+\psi ^{(1)}(p_D)\right) . \end{aligned}$$
(22)

Distinct pairs of log-ratios of parts are uncorrelated. The technique can be readily applied to log-ratio transforms of any kind. From Eq. (21), we recover Eq. (8).
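Eq. (21) lends itself to a direct Monte Carlo check. The sketch below samples \({\mathbf U}\) via the assumed representation \(Y_k = b_k G_k^{1/a}\), \(G_k \sim \mathrm{Gamma}(p_k,1)\) (log-ratios of \({\mathbf U}\) equal those of \({\mathbf Y}\)), and approximates the digamma function \(\psi\) by a central difference of `math.lgamma`:

```python
import random
from math import lgamma, log

# Monte Carlo check of Eq. (21):
#   E log(U_i/U_D) = log(b_i/b_D) + (psi(p_i) - psi(p_D)) / a
def digamma(x, h=1e-5):
    # central difference of log-Gamma; accurate enough for this check
    return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)

random.seed(3)
a, b, p = 2.0, [0.4, 0.35, 0.25], [1.3, 0.8, 1.6]
n, i, D = 200_000, 0, 3

acc = 0.0
for _ in range(n):
    # assumed representation: Y_k = b_k * G_k**(1/a), G_k ~ Gamma(p_k, 1)
    y = [bk * random.gammavariate(pk, 1.0) ** (1.0 / a) for bk, pk in zip(b, p)]
    acc += log(y[i] / y[D - 1])
mc_mean = acc / n

theory = log(b[i] / b[D - 1]) + (digamma(p[i]) - digamma(p[D - 1])) / a
```

The same construction, with a squared log-ratio inside the loop, checks Eq. (22).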

Partial derivatives of the pseudo-log-likelihood

Let n be the sample size, D the number of parts and p the number of explanatory variables. Set

$$\begin{aligned} K_{bi}= \sum _{j=1}^D z_j({\mathbf {u}}_i) \log \left( \frac{u_{ij}}{b_{ij}}\right) , \end{aligned}$$

where \({\mathbf {u}}_i=(u_{i1},u_{i2},\ldots , u_{iD}), i=1,\ldots ,n\) are the observed compositions, \(b_{ij}, j=1,\ldots ,D\) the corresponding scales and \(z_j({\mathbf {u}}_i)\) the j-th component of the vector defined in Eq. (6).

The partial derivatives of the pseudo-log-likelihood in Eq. (12) are

$$\begin{aligned} \frac{\partial \ell }{\partial a}&= \frac{n(D-1)}{a} + \sum _{i=1}^n w_i\sum _{k=1}^D p_k \left[ \log \left( \frac{u_{ik}}{b_{ik}}\right) -K_{bi}\right] \\ \frac{\partial \ell }{\partial b_{ik}}&= w_i \sum _{j=1}^D p_j \frac{\partial \log z_j({\mathbf {u}}_i)}{\partial b_{ik}} \nonumber \\&= w_i \frac{a}{b_{ik}}\left( P z_k({\mathbf {u}}_i)-p_k\right) \quad k=1,\ldots ,D, \quad i=1,\ldots ,n\\ \frac{\partial \ell }{\partial p_k}&= n(\psi (P) - \psi (p_k)) + \sum _{i=1}^n w_i \log z_k({\mathbf {u}}_i) \quad k=1,\ldots ,D, \end{aligned}$$

where \(\psi\) is the digamma function.
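These derivatives can be validated against numerical differentiation. The sketch below reconstructs the \((a,{\mathbf p})\)-dependent part of the pseudo-log-likelihood with unit weights \(w_i=1\), assuming (from the form of \(\partial \ell/\partial p_k\)) that \(z_k({\mathbf u}) = (u_k/b_k)^a / \sum_{j=1}^D (u_j/b_j)^a\); terms free of \(a\) and \({\mathbf p}\) are dropped, since they do not affect these gradients:

```python
from math import lgamma, log

# z_k(u) = (u_k/b_k)**a / sum_j (u_j/b_j)**a  (assumed form of Eq. (6))
def z(u, b, a):
    t = [(uk / bk) ** a for uk, bk in zip(u, b)]
    s = sum(t)
    return [tk / s for tk in t]

# (a, p)-dependent part of the pseudo-log-likelihood, unit weights
def ell(a, p, data, b):
    D, n = len(p), len(data)
    out = n * (lgamma(sum(p)) - sum(lgamma(pk) for pk in p) + (D - 1) * log(a))
    for u in data:
        out += sum(pk * log(zk) for pk, zk in zip(p, z(u, b, a)))
    return out

def digamma(x, h=1e-6):
    return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)

a, p = 1.7, [1.2, 0.9, 1.4]
b = [0.3, 0.45, 0.25]
data = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3], [0.25, 0.35, 0.4]]
n, D, P = len(data), len(p), sum(p)

# analytic d(ell)/dp_1 = n(psi(P) - psi(p_1)) + sum_i log z_1(u_i)
grad_p1 = n * (digamma(P) - digamma(p[0])) + sum(log(z(u, b, a)[0]) for u in data)

# analytic d(ell)/da = n(D-1)/a + sum_i sum_k p_k [log(u_ik/b_k) - K_bi]
def K(u):
    return sum(zk * log(uk / bk) for zk, uk, bk in zip(z(u, b, a), u, b))
grad_a = n * (D - 1) / a + sum(
    sum(pk * (log(uk / bk) - K(u)) for pk, uk, bk in zip(p, u, b)) for u in data
)

h = 1e-6
num_p1 = (ell(a, [p[0] + h, p[1], p[2]], data, b)
          - ell(a, [p[0] - h, p[1], p[2]], data, b)) / (2 * h)
num_a = (ell(a + h, p, data, b) - ell(a - h, p, data, b)) / (2 * h)
```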

The derivatives with respect to the regression parameters are given by

$$\begin{aligned} \frac{\partial \ell }{\partial \beta _{j m}} = \sum _{i=1}^n \sum _{k=1}^D \frac{\partial \ell }{\partial b_{ik}} \frac{\partial b_{ik}}{\partial \beta _{j m}},\quad j=1,\ldots ,p;\, m=1,\ldots ,D-1. \end{aligned}$$

The partial derivatives \(\partial b_{ik}/\partial \beta _{j m}\) are computed in two steps.

Setting \({\mathbf {s}}_i={\mathbf {x}}_i^t {\mathbf {B}}{\mathbf {V}}^+ = (s_{ir}),r=1,\ldots ,D\) in Eq. (14), we have

$$\begin{aligned} \frac{\partial b_{ik}}{\partial s_{ir}}&= b_{ik}(\delta _{kr}-b_{ir}) \, \qquad \qquad \text {where } \delta _{kr}=1 \text { if } r=k, 0 \text { otherwise}.\\ \frac{\partial s_{ir}}{\partial \beta _{jm}}&= x_{ij}\,({\mathbf {V}}^+)_{rm}. \end{aligned}$$

Thus, denoting by \(({\mathbf {V}}^+)_{m}\) the m-th column of \({\mathbf {V}}^+\), we have

$$\begin{aligned} \frac{\partial b_{ik}}{\partial \beta _{j m}}&= \sum _{r=1}^D \frac{\partial b_{ik}}{\partial s_{ir}} \frac{\partial s_{ir}}{\partial \beta _{jm}} = x_{ij}b_{ik}\left[ ({\mathbf {V}}^+)_{km}-{\mathbf {b}}_i^t ({\mathbf {V}}^+)_{m}\right] , \\ \frac{\partial \ell }{\partial \beta _{j m}}&= \sum _{i=1}^n \sum _{k=1}^D w_i \frac{a}{b_{ik}}\left( P z_k({\mathbf {u}}_i)-p_k\right) x_{ij}b_{ik}\left[ ({\mathbf {V}}^+)_{km}-{\mathbf {b}}_i^t ({\mathbf {V}}^+)_{m}\right] \\&= a\sum _{i=1}^n w_i x_{ij} \sum _{k=1}^D \left( P z_k({\mathbf {u}}_i)-p_k\right) ({\mathbf {V}}^+)_{km}, \end{aligned}$$

because \(\sum _{k=1}^D \left( P z_k({\mathbf {u}}_i)-p_k\right) = 0\).
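The Jacobian \(\partial b_{ik}/\partial s_{ir} = b_{ik}(\delta_{kr}-b_{ir})\) is exactly that of a closure of exponentials (softmax). Assuming Eq. (14) has this form, \(b_{ik} = \exp(s_{ik})/\sum_j \exp(s_{ij})\) (an assumption, since Eq. (14) is not reproduced here), the formula can be verified numerically:

```python
from math import exp

# Closure of exponentials (softmax): b_k = exp(s_k) / sum_j exp(s_j)
# (assumed form of the link in Eq. (14), consistent with the stated Jacobian)
def softmax(s):
    m = max(s)                       # subtract max for numerical stability
    e = [exp(sk - m) for sk in s]
    t = sum(e)
    return [ek / t for ek in e]

s = [0.4, -1.1, 0.7, 0.2]
bvec = softmax(s)

h = 1e-6
max_err = 0.0
for r in range(len(s)):
    sp = s[:]; sp[r] += h
    sm = s[:]; sm[r] -= h
    bp, bm = softmax(sp), softmax(sm)
    for k in range(len(s)):
        num = (bp[k] - bm[k]) / (2 * h)              # numerical derivative
        ana = bvec[k] * ((1.0 if k == r else 0.0) - bvec[r])
        max_err = max(max_err, abs(num - ana))
```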


Cite this article

Graf, M. Regression for compositions based on a generalization of the Dirichlet distribution. Stat Methods Appl 29, 913–936 (2020). https://doi.org/10.1007/s10260-020-00512-y


Keywords

  • Compositions
  • Simplicial generalized Beta distribution
  • Maximum likelihood estimation
  • Imputation
  • Multiple regression

Mathematics Subject Classification

  • 62E15
  • 62F10