Appendix
Proof of Proposition 3
The conditional distribution function of \({\varvec{S}}_1\mid {\varvec{X}}_{2}={\varvec{x}}_{2}\) can be derived most easily by conditioning on \({\varvec{Z}}\):
$$\begin{aligned} \begin{aligned} F_{{\varvec{S}}_1\mid {\varvec{X}}_{2}={\varvec{x}}_{2}}({\varvec{s}}_1)=&\sum _{i\le D} F_{{\varvec{S}}_1\mid {\varvec{X}}_{2}={\varvec{x}}_{2},{\varvec{Z}}={\varvec{e}}_i}({\varvec{s}}_1)\\&\cdot P({\varvec{Z}}={\varvec{e}}_i\mid {\varvec{X}}_{2}={\varvec{x}}_{2}). \end{aligned} \end{aligned}$$
(35)
Since \({\varvec{X}}\mid {\varvec{Z}}={\varvec{e}}_i\sim \mathcal{D}({\dot{\varvec{\alpha }}}_i)\), well-known independence properties of the Dirichlet distribution imply that:
$$\begin{aligned} {\varvec{S}}_1 | {\varvec{X}}_{2}={\varvec{x}}_{2},{\varvec{Z}}={\varvec{e}}_i\sim {\varvec{S}}_1\mid {\varvec{Z}}={\varvec{e}}_i. \end{aligned}$$
Recalling that the Dirichlet distribution is closed under the operation of subcomposition, it follows that:
$$\begin{aligned} {\varvec{S}}_1 | {\varvec{Z}}={\varvec{e}}_i\;\sim \;\mathcal{D}(\alpha _1,\ldots ,\alpha _i+\tau _i,\ldots ,\alpha _k),\quad i\le k \end{aligned}$$
and
$$\begin{aligned} {\varvec{S}}_1 | {\varvec{Z}}={\varvec{e}}_i\;\sim \;\mathcal{D}(\alpha _1,\ldots ,\alpha _k),\quad i> k, \end{aligned}$$
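The subcomposition closure invoked here can be checked numerically. The sketch below is only illustrative: all parameter values are hypothetical, and it assumes, consistently with the displayed laws, that \({\dot{\varvec{\alpha }}}_i={\varvec{\alpha }}+\tau _i{\varvec{e}}_i\). It samples \({\varvec{X}}\sim \mathcal{D}({\dot{\varvec{\alpha }}}_i)\) for an \(i\le k\), forms the subcomposition \({\varvec{S}}_1\) over the first k parts, and compares the empirical component means with those of the claimed Dirichlet law:

```python
import random

def rdirichlet(alpha, rng):
    """One Dirichlet draw via normalized Gamma variates."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

# Hypothetical setting: D = 5 parts, subcomposition over the first k = 3,
# perturbed component i = 2 (0-based index 1), so i <= k.
alpha = [2.0, 3.0, 1.5, 2.5, 4.0]
tau_i, i, k = 1.5, 1, 3
alpha_dot = alpha[:]
alpha_dot[i] += tau_i            # assumed form of alpha-dot_i

rng = random.Random(42)
n = 100_000
sums = [0.0] * k
for _ in range(n):
    x = rdirichlet(alpha_dot, rng)
    tot = sum(x[:k])             # X_1^+
    for l in range(k):
        sums[l] += x[l] / tot    # subcomposition S_1
emp_means = [s / n for s in sums]

# Claimed law: S_1 | Z = e_i ~ D(alpha_1, ..., alpha_i + tau_i, ..., alpha_k)
sub = alpha[:k]
sub[i] += tau_i
theo_means = [a / sum(sub) for a in sub]
assert max(abs(e - t) for e, t in zip(emp_means, theo_means)) < 0.01
```

The assertion uses the fact that under \(\mathcal{D}(\gamma _1,\ldots ,\gamma _k)\) the mean of component l is \(\gamma _l/\gamma ^+\).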
The probabilities \(P({\varvec{Z}}={\varvec{e}}_i\mid {\varvec{X}}_{2}={\varvec{x}}_{2})\) can be computed via Bayes' theorem. In particular, the distribution of \(({\varvec{X}}_{2},1-X_2^+)^\intercal | {\varvec{Z}}={\varvec{e}}_i\) can be obtained thanks to the closure of the Dirichlet distribution under marginalization; it takes the form
$$\begin{aligned} ({\varvec{X}}_{2},1-X_2^+)^\intercal | {\varvec{Z}}={\varvec{e}}_i\;\sim \;\mathcal{D}(\alpha _{k+1},\ldots ,\alpha _D,\alpha _1^++\tau _i) \end{aligned}$$
if \(i\le k\) and
$$\begin{aligned} ({\varvec{X}}_{2},1-X_2^+)^\intercal | {\varvec{Z}}={\varvec{e}}_i\;\sim \;\mathcal{D}(\alpha _{k+1},\ldots ,\alpha _i+\tau _i,\ldots ,\alpha _D,\alpha _1^+) \end{aligned}$$
if \(i> k\). By Bayes' formula, some algebraic manipulation shows that the probabilities \(P({\varvec{Z}}={\varvec{e}}_i\mid {\varvec{X}}_{2}={\varvec{x}}_{2})\) are proportional to the \(p_{i}^{'}\)'s provided by (14). Plugging all the computed quantities into (35) yields the result.
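The closure under marginalization used for \(({\varvec{X}}_{2},1-X_2^+)^\intercal \) can be checked in the same Monte Carlo fashion; again the parameter values are hypothetical and \({\dot{\varvec{\alpha }}}_i={\varvec{\alpha }}+\tau _i{\varvec{e}}_i\) is assumed:

```python
import random

def rdirichlet(alpha, rng):
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [v / s for v in g]

# Hypothetical setting: D = 5, k = 3, perturbed component i = 2 (0-based 1).
alpha = [2.0, 3.0, 1.5, 2.5, 4.0]
tau_i, i, k = 1.5, 1, 3
alpha_dot = alpha[:]
alpha_dot[i] += tau_i

rng = random.Random(7)
n = 100_000
D = len(alpha)
sums = [0.0] * (D - k + 1)       # means of (X_{k+1}, ..., X_D, 1 - X_2^+)
for _ in range(n):
    x = rdirichlet(alpha_dot, rng)
    tail = x[k:]
    for j, v in enumerate(tail):
        sums[j] += v
    sums[-1] += 1.0 - sum(tail)  # the amalgamated part X_1^+
emp = [s / n for s in sums]

# Claimed law for i <= k: D(alpha_{k+1}, ..., alpha_D, alpha_1^+ + tau_i)
marg = alpha[k:] + [sum(alpha[:k]) + tau_i]
theo = [a / sum(marg) for a in marg]
assert max(abs(e - t) for e, t in zip(emp, theo)) < 0.01
```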
Proof of Proposition 5
Clearly, if \(\varvec{\theta }=\varvec{\theta }^\prime \), then \(\mathbf{X } \sim \mathbf{X }^\prime \). To show the converse, one can focus on the marginal distribution of \(X_i\). By virtue of Proposition 3, its density function \(g(x_i; \varvec{\theta })\) can be written as:
$$\begin{aligned} \begin{aligned} g(x_i; \varvec{\theta })&= x_i^{\alpha _i - 1} (1-x_i)^{\alpha ^+-\alpha _i -1} \\&\cdot \left\{ p_i \frac{\Gamma (\alpha ^++ \tau _i) x_i^{\tau _i}}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)} \right. \\&+\, \left. \sum _{l\ne i} p_l \frac{\Gamma (\alpha ^++ \tau _l)(1-x_i)^{\tau _l} }{\Gamma (\alpha _i)\Gamma (\alpha ^+- \alpha _i + \tau _l)} \right\} . \end{aligned} \end{aligned}$$
(36)
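Each term in braces in (36), once multiplied by the leading factor, is a Beta density weighted by \(p_i\) or \(p_l\), so the marginal density must integrate to 1. The following numerical sketch (hypothetical parameter values) verifies this:

```python
from math import gamma

# Hypothetical parameters for the marginal density (36); i is 0-based.
p    = [0.3, 0.5, 0.2]
alph = [2.0, 3.0, 2.5]           # alpha^+ = 7.5
tau  = [1.0, 0.5, 2.0]
i = 0
ap = sum(alph)

def g(x):
    """Marginal density of X_i, transcribed term by term from (36)."""
    core = x ** (alph[i] - 1) * (1 - x) ** (ap - alph[i] - 1)
    mix = p[i] * gamma(ap + tau[i]) * x ** tau[i] / (
        gamma(alph[i] + tau[i]) * gamma(ap - alph[i]))
    for l in range(len(p)):
        if l != i:
            mix += p[l] * gamma(ap + tau[l]) * (1 - x) ** tau[l] / (
                gamma(alph[i]) * gamma(ap - alph[i] + tau[l]))
    return core * mix

# Midpoint quadrature on (0, 1): the mixture weights p_l sum to 1,
# so the integral of g must equal 1.
m = 20_000
integral = sum(g((j + 0.5) / m) for j in range(m)) / m
assert abs(integral - 1.0) < 1e-5
```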
If \(\mathbf{X } \sim \mathbf{X }^\prime \), then \(X_i \sim X_i^\prime \) and therefore \(g(x_i; \varvec{\theta }) = g(x_i; \varvec{\theta }^\prime )\) \(\forall \) \(x_i\) \(\in \) (0, 1), as these density functions are continuous. It follows that \(\displaystyle \lim \limits _{x_i \rightarrow 0^+} \frac{g(x_i; \varvec{\theta })}{x_i^{\alpha _i - 1}} = \lim \limits _{x_i \rightarrow 0^+} \frac{g(x_i; \varvec{\theta }^\prime )}{x_i^{\alpha _i - 1}}\). We have:
$$\begin{aligned} \displaystyle \lim \limits _{x_i \rightarrow 0^+} \frac{g(x_i; \varvec{\theta })}{x_i^{\alpha _i - 1}} = \sum _{l \ne i} \frac{p_l \Gamma (\alpha ^++ \tau _l)}{\Gamma (\alpha _i)\Gamma (\alpha ^+-\alpha _i+\tau _l)} \end{aligned}$$
and
$$\begin{aligned} \begin{aligned} \lim \limits _{x_i \rightarrow 0^+} \frac{g(x_i; \varvec{\theta }^\prime )}{x_i^{\alpha _i - 1}}&= \left( \lim \limits _{x_i \rightarrow 0^+} \frac{x_i^{\alpha _i^\prime - 1}}{x_i^{\alpha _i - 1}} \right) \\&\cdot \sum _{l \ne i} \frac{p_l^\prime \Gamma (\alpha ^{\prime +} + \tau _l^\prime )}{\Gamma (\alpha _i^\prime )\Gamma (\alpha ^{\prime +}-\alpha _i^\prime +\tau _l^\prime )}. \end{aligned} \end{aligned}$$
For these two limits to be equal, the quantity \(\displaystyle \left( \lim _{x_i \rightarrow 0^+} \frac{x_i^{\alpha _i^\prime - 1}}{x_i^{\alpha _i - 1}}\right) \) must be finite and different from 0.
Indeed, the limit is 0 when \(\alpha _i^\prime >\alpha _i\) and \(+\infty \) when \(\alpha _i^\prime <\alpha _i\), so \(\alpha _i=\alpha _i^\prime \); since the index i is arbitrary, \(\varvec{\alpha }=\varvec{\alpha }^\prime \). As a consequence, the equality \(g(x_i; \varvec{\theta }) = g(x_i; \varvec{\theta }^\prime )\) can be rewritten as:
$$\begin{aligned}&\, \frac{p_i \Gamma (\alpha ^++ \tau _i)x_i^{\tau _i}}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)} + \sum _{l\ne i} \frac{ p_l \Gamma (\alpha ^++ \tau _l)(1-x_i)^{\tau _l}}{\Gamma (\alpha _i)\Gamma (\alpha ^+- \alpha _i + \tau _l)} = \nonumber \\&\quad = \frac{p_i^\prime \Gamma (\alpha ^++ \tau _i^\prime )x_i^{\tau _i^\prime }}{\Gamma (\alpha _i + \tau _i^\prime ) \Gamma (\alpha ^+-\alpha _i)} + \sum _{l\ne i} \frac{p_l^\prime \Gamma (\alpha ^++ \tau _l^\prime )(1-x_i)^{\tau _l^\prime }}{\Gamma (\alpha _i)\Gamma (\alpha ^+- \alpha _i + \tau _l^\prime )}.\nonumber \\ \end{aligned}$$
(37)
By taking the limits as \(x_i \rightarrow 1^-\) on both sides, one obtains:
$$\begin{aligned} \frac{p_i \Gamma (\alpha ^++ \tau _i)}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)} = \frac{p_i^\prime \Gamma (\alpha ^++ \tau _i^\prime )}{\Gamma (\alpha _i + \tau _i^\prime ) \Gamma (\alpha ^+-\alpha _i)}. \end{aligned}$$
(38)
Equation (38) implies that \(p_i\) and \(p_i^\prime \) are either both null or both strictly positive. In the former case, by the definition of the parameter space, \(\tau _i=\tau _i^\prime =1\). In the latter case, substituting (38) into (37) and differentiating both sides with respect to \(x_i\), the following equality must hold \(\forall \) \(x_i\) \(\in \) (0, 1):
$$\begin{aligned} \begin{aligned}&\, \frac{p_i \tau _i \Gamma (\alpha ^++ \tau _i) x_i^{\tau _i-1}}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)}\\&\qquad - \sum _{l\ne i} \frac{p_l \tau _l \Gamma (\alpha ^++ \tau _l)(1-x_i)^{\tau _l-1}}{\Gamma (\alpha _i)\Gamma (\alpha ^+- \alpha _i + \tau _l)} = \\&\quad = \frac{ p_i \tau _i^\prime \Gamma (\alpha ^++ \tau _i)x_i^{\tau _i^\prime -1}}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)}\\&\qquad -\sum _{l\ne i} \frac{p_l^\prime \tau _l^\prime \Gamma (\alpha ^++ \tau _l^\prime )(1-x_i)^{\tau _l^\prime -1}}{\Gamma (\alpha _i)\Gamma (\alpha ^+- \alpha _i + \tau _l^\prime )}. \end{aligned} \end{aligned}$$
(39)
Taking the limits as \(x_i \rightarrow 1^-\) on both sides, we have:
$$\begin{aligned} \frac{p_i \tau _i \Gamma (\alpha ^++ \tau _i)}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)} = \frac{p_i \tau _i ^\prime \Gamma (\alpha ^++ \tau _i)}{\Gamma (\alpha _i + \tau _i) \Gamma (\alpha ^+-\alpha _i)}. \end{aligned}$$
(40)
It follows that \(\tau _i=\tau _i^\prime \) for any i such that \(p_i>0\), and hence for all i. Finally, substituting this constraint into (38), we conclude that \(\mathbf{p }= \mathbf{p }^\prime \).
Proof of Proposition 8
Recall that \(\mathbf{X }| Y^+ = y^+ \sim EFD(\varvec{\alpha },\mathbf{p }^*(y^+),\varvec{\tau }, \beta )\), where the \(\mathbf{p }^*(y^+)\) are defined as in (23). If \(\tau _i=\tau \) \(\forall i\), it is immediate to see that the \(p_i^*(y^+)\)'s do not depend on \(y^+\) (and coincide with the \(p_i\)'s). Conversely, if the basis is compositionally invariant, then \(p_i^*(y^+)\) does not depend on \(y^+\), and therefore neither does the ratio \(p_i^*(y^+)/p_l^*(y^+)\) \(\forall i\ne l\). Since this ratio is proportional to \({(y^+)}^{\tau _i-\tau _l}\), it follows that \(\tau _i=\tau _l\) \(\forall i\ne l\).
Partial derivatives
In this section we report the partial derivatives of the complete-data log-likelihood (25). In particular, for \(i=1, \ldots ,D\), the first-order partial derivatives are:
$$\begin{aligned} \frac{\partial l_c(\varvec{\theta })}{\partial p_i} = \frac{z_{\cdot i}}{p_i} - \frac{z_{\cdot D}}{p_D}, \end{aligned}$$
where \(z_{\cdot i}=\sum _{j=1}^n z_{ji}\).
$$\begin{aligned} \frac{\partial l_c(\varvec{\theta })}{\partial \alpha _i}= & {} \left( \sum _{l=1}^D z_{\cdot l} \psi (\alpha ^++ \tau _l)\right) + \sum _{j=1}^n \log x_{ji}\\&+\, z_{\cdot i} \left( \psi (\alpha _i) - \psi (\alpha _i + \tau _i) \right) - n \psi (\alpha _i).\\ \frac{\partial l_c(\varvec{\theta })}{\partial \tau _i}= & {} z_{\cdot i} \left( \psi (\alpha ^++ \tau _i) - \psi (\alpha _i + \tau _i)\right) + \sum _{j=1}^n z_{ji} \log x_{ji}. \end{aligned}$$
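These formulas can be validated by finite differences. The sketch below assumes that (25) is the standard complete-data log-likelihood of the mixture representation used in the proof of Proposition 3, \(l_c(\varvec{\theta })=\sum _j\sum _i z_{ji}\,[\log p_i+\log f({\varvec{x}}_j;{\varvec{\alpha }}+\tau _i{\varvec{e}}_i)]\) with f a Dirichlet density; the data, labels and parameter values are hypothetical. It checks the \(\tau _i\) derivative:

```python
from math import lgamma, log

def ldirichlet(x, a):
    """Log of a Dirichlet density with parameter vector a."""
    return (lgamma(sum(a)) - sum(lgamma(ai) for ai in a)
            + sum((ai - 1) * log(xi) for ai, xi in zip(a, x)))

def lc(alpha, tau, p, X, Z):
    """Assumed complete-data log-likelihood of the mixture representation."""
    out = 0.0
    for x, z in zip(X, Z):
        for c, zc in enumerate(z):
            if zc:
                a = alpha[:]
                a[c] += tau[c]
                out += zc * (log(p[c]) + ldirichlet(x, a))
    return out

def psi(x, h=1e-5):
    """Digamma via a central difference of lgamma (adequate at this scale)."""
    return (lgamma(x + h) - lgamma(x - h)) / (2 * h)

# Hypothetical data, labels and parameters.
alpha = [2.0, 3.0, 1.5]
tau   = [1.0, 0.5, 2.0]
p     = [0.3, 0.5, 0.2]
X = [[0.2, 0.5, 0.3], [0.1, 0.6, 0.3], [0.4, 0.4, 0.2]]
Z = [[1, 0, 0], [0, 1, 0], [0, 1, 0]]
i = 1
z_dot_i = sum(z[i] for z in Z)
ap = sum(alpha)

# Analytic tau_i derivative as reported in the text.
analytic = (z_dot_i * (psi(ap + tau[i]) - psi(alpha[i] + tau[i]))
            + sum(z[i] * log(x[i]) for x, z in zip(X, Z)))

# Central-difference derivative of l_c with respect to tau_i.
h = 1e-6
tp = tau[:]; tp[i] += h
tm = tau[:]; tm[i] -= h
numeric = (lc(alpha, tp, p, X, Z) - lc(alpha, tm, p, X, Z)) / (2 * h)
assert abs(analytic - numeric) < 1e-4
```

The same scheme applies verbatim to the \(\alpha _i\) and \(p_i\) derivatives.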
Table 7 Goodness-of-fit measures for two-part compositions
The second-order partial derivatives are:
$$\begin{aligned} \frac{\partial ^2 l_c(\varvec{\theta })}{\partial p_i \partial p_h} = - \frac{z_{\cdot D}}{p_D^2} - \mathbb {1}_{i=h} \cdot \frac{z_{\cdot i}}{p_i^2}, \end{aligned}$$
where \(\mathbb {1}_{i=h}\) is the indicator function that is equal to 1 if \(i = h\) and 0 otherwise.
$$\begin{aligned} \frac{\partial ^2 l_c(\varvec{\theta })}{\partial p_i \partial \alpha _h}= & {} \frac{\partial ^2 l_c(\varvec{\theta })}{\partial p_i \partial \tau _h} = 0.\\ \frac{\partial ^2 l_c(\varvec{\theta })}{\partial \alpha _i \partial \alpha _h}= & {} \left( \sum _{l=1}^D z_{\cdot l} \psi ^\prime (\alpha ^++ \tau _l) \right) - \mathbb {1}_{i=h} n\psi ^\prime (\alpha _i)\\&+\,\mathbb {1}_{i=h} \cdot \left[ z_{\cdot i} \left( \psi ^\prime (\alpha _i) - \psi ^\prime (\alpha _i + \tau _i)\right) \right] , \end{aligned}$$
where \(\psi ^\prime (\cdot )\) is the trigamma function.
$$\begin{aligned} \frac{\partial ^2 l_c(\varvec{\theta })}{\partial \alpha _i \partial \tau _h}= & {} z_{\cdot h} \psi ^\prime (\alpha ^++ \tau _h) - \mathbb {1}_{i=h} z_{\cdot i} \psi ^\prime (\alpha _i + \tau _i).\\ \frac{\partial ^2 l_c(\varvec{\theta })}{\partial \tau _i \partial \tau _h}= & {} \mathbb {1}_{i=h} \left[ z_{\cdot i} \left( \psi ^\prime (\alpha ^++ \tau _i) - \psi ^\prime (\alpha _i + \tau _i) \right) \right] . \end{aligned}$$
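As a further check, the \((p_i,p_h)\) entry of the Hessian can be compared with a finite-difference second derivative of the p-dependent part of \(l_c\), namely \(\sum _i z_{\cdot i}\log p_i\) with \(p_D=1-\sum _{i<D}p_i\); all values below are hypothetical:

```python
from math import log

# Hypothetical column sums z_.1, ..., z_.D of the label matrix.
z_dot = [4.0, 7.0, 2.0, 5.0]

def lp(free):
    """p-part of l_c over the D-1 free weights; p_D is the complement."""
    pD = 1.0 - sum(free)
    return sum(z * log(q) for z, q in zip(z_dot, free + [pD]))

free = [0.2, 0.3, 0.1]           # p_1, p_2, p_3, hence p_4 = 0.4
i, h_idx = 0, 1                  # check the off-diagonal (p_1, p_2) entry
step = 1e-5

def grad_i(w):
    """Central difference of lp with respect to p_i."""
    w1 = w[:]; w1[i] += step
    w0 = w[:]; w0[i] -= step
    return (lp(w1) - lp(w0)) / (2 * step)

f1 = free[:]; f1[h_idx] += step
f0 = free[:]; f0[h_idx] -= step
numeric = (grad_i(f1) - grad_i(f0)) / (2 * step)

# Formula from the text: -z_.D / p_D^2 - 1{i=h} z_.i / p_i^2.
pD = 1.0 - sum(free)
analytic = -z_dot[-1] / pD ** 2 - (z_dot[i] / free[i] ** 2 if i == h_idx else 0.0)
assert abs(numeric - analytic) < 1e-3
```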
Results for the univariate case of the olive oil dataset
In this section we report the values of the AIC and BIC criteria for the models considered (Table 7), together with the fitted density curves (Fig. 9).