1 Introduction

Compositional data analysis is an increasingly popular topic for understanding processes that consist of values corresponding to disjoint categories whose sum is a constant. These values are usually proportions or percentages, in which case the constant is 1 or 100. The data generated by such processes are widely known as Compositional Data (CoDa). For the sake of simplicity and without loss of generality, from now on we assume the constant to be 1. Connor and Mosimann (1969) proposed Dirichlet regression to deal with CoDa. Since then, several studies have been conducted using this technique, and most of them have shown that it is a very valuable tool for modelling CoDa; see for example Hijazi and Jernigan (2009) and Pirzamanbein et al. (2020).

There are other approaches to CoDa analysis. Aitchison (1986) presented a unified theory, developing a range of methods based on the idea that “information in compositional vectors is concerned with relative, not absolute magnitudes”. With this statement, the notion of ratios among proportions emerged and log-ratios arose as the preferred device for dealing with CoDa. Modelling CoDa using the logistic-normal distribution gained ground, and the foundations of CoDa analysis were established.

A vast body of literature exists on the subject of applying these methods using both Dirichlet regression and logistic-normal regression in different fields, including Ecology (Kobal et al. 2017; Douma and Weedon 2019), Geology (Buccianti and Grunsky 2014; Engle and Rowan 2014), Genomics (Tsilimigras and Fodor 2016; Shi et al. 2016; Washburne et al. 2017; Creus Martí et al. 2022), Environmental Sciences (Aguilera et al. 2021; Mota-Bertran et al. 2022) or Medicine (Dumuid et al. 2018; Fairclough et al. 2018).

Nevertheless, one of the biggest problems encountered when dealing with CoDa models is performing inference. Different approaches have been proposed for this purpose; in particular, many R packages have been implemented, not only from the frequentist perspective (Cribari-Neto and Zeileis 2010; Templ et al. 2011; Maier 2014), but also from the Bayesian paradigm. Packages such as BayesX (Klein et al. 2015), Stan (Sennhenn-Reulen 2018), BUGS (van der Merwe 2018) and R-JAGS (Plummer 2016) have tools for dealing with CoDa. These Bayesian packages are mainly based on Markov chain Monte Carlo (MCMC) methods, which construct a Markov chain whose stationary distribution is the posterior distribution; however, the computational cost of MCMC can be high. The integrated nested Laplace approximation (INLA) methodology (Rue et al. 2009), which approximates the posterior distribution using the Laplace integration method, has become an alternative to MCMC, offering much higher computational speed for Latent Gaussian Models (LGMs). With the incorporation of new techniques from Bayesian variational inference (Niekerk and Rue 2021; Van Niekerk et al. 2023) and the optimisation of the computation, which improves its parallel performance (Gaedke-Merzhäuser et al. 2023), a new era is emerging in the INLA software. Hence, incorporating a tool for dealing with CoDa into INLA would be a convenient way to tackle the large CoDa databases sometimes encountered.

Nonetheless, in R-INLA it is still a challenge to fit models with a multivariate likelihood such as those defined on the simplex of dimension D (\(\mathbb {S}^D\)). There are some approximations for the Dirichlet likelihood that involve converting the original Dirichlet observations into Gaussian pseudo-observations conditioned on the linear predictor (Martínez-Minaya et al. 2023), or simply converting a multivariate CoDa response into coordinates using the isometric log-ratio transformation (Mota-Bertran et al. 2022) and fitting them independently. However, there is no unified way to fit these models inside R-INLA and take advantage of all its facilities.

In this paper we present the logistic-normal Dirichlet model (LNDM), which uses the logistic-normal distribution with Dirichlet covariance, via the additive log-ratio transformation, as its likelihood. This allows us to integrate it within the R-INLA package in a very simple way, so that we benefit from all the other features of R-INLA for model fitting, model selection and prediction within the framework of LGMs. Additionally, we present how measures such as the Deviance Information Criterion (Spiegelhalter et al. 2002, DIC), the Watanabe-Akaike information criterion (Watanabe and Opper 2010; Gelman et al. 2014, WAIC), and the cross-validated conditional predictive ordinate (CPO) for evaluating predictive capacity (Pettit 1990; Roos and Held 2011) are computed in R-INLA when dealing with CoDa. To show how the method works, two simulated examples and a real example in the field of Ecology were implemented. In the last part, we conduct a spatial analysis of the plant Arabidopsis thaliana on the Iberian Peninsula.

The paper is then divided into 7 more sections. Section 2 introduces CoDa, the distributions that can be defined in \(\mathbb {S}^D\), and their equivalence. Section 3 presents some fundamentals of the INLA methodology. Section 4 is devoted to introducing the logistic-normal regression with Dirichlet covariance. In Sect. 5, we introduce spatial models as well as model selection measures in CoDa. Section 6 focuses on presenting a simulated spatial study. In Sect. 7, we provide a real application of this method and, finally, Sect. 8 concludes and discusses future avenues of research.

2 CoDa background

This section is devoted to introducing some preliminary concepts for a better understanding of CoDa. In particular, we present some basic and formal definitions of the two main distributions employed when we deal with CoDa.

2.1 CoDa: Definitions

Let \(\varvec{y}_{D \times 1}\) be a vector that satisfies \(\sum _{d = 1}^D y_d = 1\), and \(0< y_d < 1\), \(d = 1,\ldots , D\). This vector is called a composition, and it pertains to the simplex sample space. The simplex of dimension D, denoted by \(\mathbb {S}^D\), is defined as:

$$\begin{aligned} \mathbb {S}^D = \left\{ {\varvec{y}} \in \mathbb {R}^D \mid 0< y_d < 1; \ \ \sum _{d = 1}^D y_d = 1 \right\} . \end{aligned}$$
(1)

As in the ordinary real Euclidean space, there is a geometry defined in \(\mathbb {S}^D\). It does not follow the usual Euclidean geometry; it was introduced by Pawlowsky-Glahn and Egozcue (2001) and Egozcue et al. (2003), and is called the Aitchison geometry. The definitions of perturbation and powering are sufficient to obtain a vector space of compositions in which the usual properties such as commutativity, associativity and distributivity hold. With the definition of the Aitchison inner product, the Aitchison norm and the Aitchison distance, a Euclidean linear vector space is obtained (Pawlowsky-Glahn and Egozcue 2001).

Following the fundamentals proposed by Aitchison (1986), log-ratios play an important role in CoDa analysis. They can be constructed in different ways, including the centered log-ratio, isometric log-ratio or additive log-ratio, among others (Egozcue et al. 2012). In this work, we focus on the well-known additive log-ratio (alr) transformation because of its straightforward interpretation (Greenacre et al. 2023) and because it is a one-to-one mapping from \(\mathbb {S}^D\) to \(\mathbb {R}^{D-1}\). It is defined as:

$$\begin{aligned} {\varvec{z}}_{(D-1)\times 1}= alr({\varvec{y}}):= \left[ \log \left( \frac{y_1}{y_D}\right) , \ldots , \log \left( \frac{y_{D-1}}{y_{D}}\right) \right] , \end{aligned}$$
(2)

where D is the reference category. Greenacre et al. (2023) outlined some criteria for selecting the reference category: they recommended choosing as reference the category whose logarithm has low variance, and avoiding a reference with low relative abundance across samples. The new variables generated are called alr-coordinates. The inverse transformation, denoted \(alr^{-1}\), is

$$\begin{aligned} alr^{-1}({\varvec{z}}) = \left[ \frac{\exp {(z_1)}}{1 + \sum _{d=1}^{D-1} \exp {(z_d)}}, \ldots , \frac{\exp {(z_{D-1})}}{1+ \sum _{d=1}^{D-1} \exp {(z_d)}}, \frac{1}{1+ \sum _{d=1}^{D-1} \exp {(z_d)}} \right] . \end{aligned}$$
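
Purely as an illustration (this snippet is ours, not from the original paper), both mappings take a few lines of R; the names alr and alr_inv are our own:

```r
# alr transformation: the last column of y acts as the reference category
alr <- function(y) log(y[, -ncol(y), drop = FALSE] / y[, ncol(y)])

# Inverse alr: back from R^(D-1) to the simplex S^D
alr_inv <- function(z) {
  e <- cbind(exp(z), 1)
  e / rowSums(e)
}

y <- matrix(c(0.2, 0.3, 0.5, 0.1, 0.6, 0.3), ncol = 3, byrow = TRUE)
all.equal(alr_inv(alr(y)), y, check.attributes = FALSE)  # TRUE
```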

In addition to Aitchison geometry, several probability distributions have also been characterised in \(\mathbb {S}^D\) (Figueras et al. 2003), although here we focus on the normal distribution on the simplex or logistic-normal distribution, and the Dirichlet distribution.

2.2 Logistic-normal distribution and Dirichlet distribution

The logistic-normal distribution was defined by Aitchison and Shen (1980) and was studied in depth in Aitchison (1986). A D-dimensional random vector \({\varvec{y}}\) is said to have a logistic-normal distribution \({{\mathcal {L}}}{{\mathcal {N}}}(\varvec{\mu }, \varvec{\Sigma })\), or alternatively a normal distribution on \(\mathbb {S}^D\), if its vector of log-ratio coordinates has a joint \((D-1)\)-variate normal distribution. This definition can be applied directly to a CoDa response using alr-coordinates:

$$\begin{aligned} {\varvec{y}} \mid \varvec{\mu }, \varvec{\Sigma } \sim {{\mathcal {L}}}{{\mathcal {N}}}(\varvec{\mu }, \varvec{\Sigma }) \Longleftrightarrow alr({\varvec{y}}) \mid \varvec{\mu }, \varvec{\Sigma } \sim {\mathcal {N}}(\varvec{\mu }, \varvec{\Sigma }), \end{aligned}$$
(3)

\(\varvec{\mu }\) being a \((D-1)\)-dimensional mean vector and \(\varvec{\Sigma }\) a \((D-1) \times (D-1)\) covariance matrix. The Dirichlet distribution, in turn, was introduced in Connor and Mosimann (1969) and is the generalisation of the widely known beta distribution. A D-dimensional random vector \({\varvec{y}}\) is said to have a Dirichlet distribution \({\mathcal {D}}(\varvec{\alpha })\) if it has the following probability density:

$$\begin{aligned} p(\varvec{y} \mid \varvec{\alpha })= \frac{1}{\text {B}(\varvec{\alpha })} \prod _{d=1}^D y_d^{\alpha _d -1} , \end{aligned}$$
(4)

\(\varvec{\alpha } = (\alpha _1, \ldots , \alpha _D)\) being the vector of shape parameters for each category, with \(\alpha _d>0\) \(\forall d\), \(y_d \in (0,1)\), \(\sum _{d=1}^D y_d=1\), and \(\text {B}(\varvec{\alpha })\) the multinomial Beta function, which serves as the normalising constant. The multinomial Beta function is defined as \(\text {B}(\varvec{\alpha })=\prod _{d=1}^D \Gamma (\alpha _d)/ \Gamma (\sum _{d=1}^D \alpha _d)\). The sum of all \(\alpha \)’s, \(\alpha _0=\sum _{d=1}^D \alpha _d\), is usually interpreted as a precision parameter. The beta distribution is the particular case when \(D=2\), and each component is marginally beta distributed with \(\alpha =\alpha _d\) and \(\beta =\alpha _0-\alpha _d\). If \(\varvec{y} \sim {\mathcal {D}}(\varvec{\alpha })\), the expected values are \(\text {E}(y_d)=\alpha _d/\alpha _0\), the variances are \(\text {Var}(y_d)=[\alpha _d(\alpha _0 - \alpha _d)]/[\alpha _0^2(\alpha _0 + 1)]\) and the covariances are \(\text {Cov}(y_d, y_{d'})=-\alpha _d \alpha _{d'}/[\alpha _0^2(\alpha _0 + 1)]\).
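
These moments are easy to check by simulation through the gamma representation of the Dirichlet; a minimal sketch in base R:

```r
# Dirichlet(alpha) draws via normalised independent gamma variables
set.seed(1)
alpha <- c(2, 3, 5); a0 <- sum(alpha); n <- 1e5
g <- sapply(alpha, function(a) rgamma(n, shape = a))
y <- g / rowSums(g)

colMeans(y)   # approx. alpha / a0
var(y[, 1])   # approx. alpha[1] * (a0 - alpha[1]) / (a0^2 * (a0 + 1))
```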

2.3 Relation between the two distributions

As pointed out in Aitchison (1986, 126–129), the logistic-normal and the Dirichlet distribution are separate in the sense that they are never exactly equal for any choice of parameters. However, the Dirichlet distribution can be approximated by a logistic-normal through the Kullback–Leibler divergence (KL), which measures by how much an approximation q misses a target p. The KL divergence

$$\begin{aligned} K(p, q) = \int _{\mathbb {S}^D} p({\varvec{y}} \mid \varvec{\alpha }) \log \left( \frac{p({\varvec{y}} \mid \varvec{\alpha })}{q({\varvec{y}} \mid \varvec{\mu }, \varvec{\Sigma })} \right) d {\varvec{y}}, \end{aligned}$$
(5)

where \(p({\varvec{y}} \mid \varvec{\alpha })\) represents the density function of the Dirichlet and \(q({\varvec{y}} \mid \varvec{\mu }, \varvec{\Sigma })\) the logistic-normal density function, is minimised by:

$$\begin{aligned} \begin{array}{rcl} \varvec{\mu } & = & {\varvec{E}}\left[ \log \left( \frac{y_{1}}{y_D}\right) , \ldots , \log \left( \frac{y_{D-1}}{y_D}\right) \right] = {\varvec{E}}\left[ alr({\varvec{y}}) \right] , \\ \varvec{\Sigma } & = & \varvec{Var}\left[ \log \left( \frac{y_{1}}{y_D}\right) , \ldots , \log \left( \frac{y_{D-1}}{y_D}\right) \right] = \varvec{Var}\left[ alr({\varvec{y}}) \right] , \end{array} \end{aligned}$$
(6)

and the solution can be written in terms of the digamma \(\phi \) and trigamma \(\phi '\) functions as:

$$\begin{aligned} \begin{array}{rcl} \mu _d & = & \phi (\alpha _d) - \phi (\alpha _D), \quad d = 1, \ldots , D-1, \\ \Sigma _{dd} & = & \phi '(\alpha _d) + \phi '(\alpha _D), \quad d = 1, \ldots , D-1, \\ \Sigma _{dk} & = & \phi '(\alpha _D), \quad d \ne k. \end{array} \end{aligned}$$
(7)
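
Since the digamma and trigamma functions are available in base R as digamma() and trigamma(), the matching in Eq. (7) is straightforward to compute. The helper below, with the hypothetical name ln_approx, is a minimal sketch:

```r
# Logistic-normal (mu, Sigma) that best approximates a Dirichlet(alpha)
# in the Kullback-Leibler sense, following Eq. (7)
ln_approx <- function(alpha) {
  D <- length(alpha)
  mu <- digamma(alpha[-D]) - digamma(alpha[D])
  Sigma <- matrix(trigamma(alpha[D]), D - 1, D - 1)
  diag(Sigma) <- trigamma(alpha[-D]) + trigamma(alpha[D])
  list(mu = mu, Sigma = Sigma)
}

ln_approx(c(2, 3, 5))
```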

This approach plays an important role in this paper, as it constitutes the basis for defining logistic-normal regression with Dirichlet covariance. But first we introduce the model framework in which this likelihood is included, that is, Latent Gaussian Models (LGMs, Rue et al. 2009).

3 LGMs and INLA

The popularity of INLA lies in the fact that it allows fast approximate inference for LGMs. Furthermore, the INLA software is experiencing a new era, facilitated by the integration of novel techniques from Bayesian variational inference (Niekerk and Rue 2021; Van Niekerk et al. 2023) and enhanced computational optimisation, leading to improved parallel performance (Gaedke-Merzhäuser et al. 2023). This section briefly introduces the structure of LGMs and how INLA performs inference and prediction in light of these new advances.

3.1 LGMs

A new formulation of INLA is presented in Van Niekerk et al. (2023), and we follow it to introduce the main notions. LGMs can be seen as three-stage hierarchical Bayesian models in which observations \(\varvec{y}_{N \times 1}\) are assumed to be conditionally independent given a latent Gaussian random field \({\mathcal {\varvec{X}}}\) and hyperparameters \(\varvec{\theta }_1\):

$$\begin{aligned} \varvec{y} \mid {\mathcal {\varvec{X}}}, \varvec{\theta }_1 \sim \prod _{n=1}^N p(y_n \mid {\mathcal {\varvec{X}}},\varvec{\theta }_1). \end{aligned}$$
(8)

The versatility of the model class is related to the specification of the latent Gaussian field:

$$\begin{aligned} \begin{aligned} {\mathcal {\varvec{X}}} \mid \varvec{\theta }_2 \sim {\mathcal {N}}(\varvec{0}, \varvec{Q}^{-1}(\varvec{\theta }_2)), \end{aligned} \end{aligned}$$
(9)

which includes all the latent (non-observable) components of interest, such as fixed effects and random terms, describing the process underlying the data. The hyperparameters \(\varvec{\theta }=\{\varvec{\theta }_1, \varvec{\theta }_2\}\) control the latent Gaussian field and/or the likelihood for the data.

Additionally, LGMs generalise a large number of related variants of additive and generalised linear models. If \(\varvec{\eta }_{N \times 1}\) is a column vector representing the linear predictor, then different effects can be added to it:

$$\begin{aligned} \varvec{\eta }_{N \times 1} = {\varvec{X}} \varvec{\beta } + \sum _{l = 1}^L f_l(\varvec{u}_l) \, \end{aligned}$$
(10)

where \({\varvec{X}}\) is the design matrix for the fixed part (including a first column of 1s if intercepts are added to the model), and \(\varvec{\beta }_{(M + 1) \times 1}\) is a column vector for the linear effects of \(\varvec{X}\) on \(\varvec{\eta }\). \(\{\varvec{f}\}\) are unknown functions of \(\varvec{U}\). This formulation covers any model in which each of the \(f_l(\cdot )\) terms can be written in matrix form as \({\varvec{A}}_l\varvec{u}_l\). Expression (10) can thus be rewritten as \(\varvec{\eta } = {\varvec{A}} {\mathcal {\varvec{X}}}\), with \({\varvec{A}}\) a sparse design matrix that links the linear predictors to the latent field.

When we do inference, the aim is to estimate \({\mathcal {\varvec{X}}}_{(M + 1 + L) \times 1} = \{\varvec{\beta }, \varvec{f}\}\), which represents the set of unobserved latent variables (the latent field). If a Gaussian prior is assumed for \(\varvec{\beta }\) and \(\varvec{f}\), the joint prior distribution of \({\mathcal {\varvec{X}}}\) is Gaussian. This yields the latent field \({\mathcal {\varvec{X}}}\) in the hierarchical LGM formulation. The vector of hyperparameters \(\varvec{\theta }\) contains the non-Gaussian parameters of the likelihood and the model components; these commonly include variance, scale or correlation parameters.

In most cases the latent field, in addition to being Gaussian, is also a Gaussian Markov random field (GMRF, Rue and Held 2005). A GMRF is a multivariate Gaussian random variable with additional conditional independence properties: \(x_i\) and \(x_j\) are conditionally independent given the remaining elements if and only if the \((i,j)\) entry of the precision matrix is 0. Implementations of the INLA method use this property to speed up computation.

3.2 INLA

The main idea of the INLA approach is to approximate the posteriors of interest: the marginal posteriors for the latent field, \(p({\mathcal {X}}_m \mid \varvec{y})\), and the marginal posteriors for the hyperparameters, \(p(\theta _k \mid \varvec{y})\). In the modern formulation (Van Niekerk et al. 2023), the main enhancement is that the latent field is no longer augmented with the ‘noisy’ linear predictors. The joint density of the latent field, hyperparameters and data is then:

$$\begin{aligned} p({\mathcal {\varvec{X}}}, \varvec{\theta } \mid {\varvec{y}}) \propto p(\varvec{\theta }) p({\mathcal {\varvec{X}}} \mid \varvec{\theta }) \prod _{n = 1}^N p(y_n \mid ({\varvec{A}} {\mathcal {\varvec{X}}})_n, \varvec{\theta }). \end{aligned}$$
(11)

Thus, the initial step in approaching the posterior distributions involves determining the mode and the Hessian at the mode of \({\tilde{p}}(\varvec{\theta } \mid \varvec{y})\):

$$\begin{aligned} {\tilde{p}}(\varvec{\theta } \mid \varvec{y}) \propto \frac{p({\mathcal {\varvec{X}}}, \varvec{\theta } \mid \varvec{y})}{p_{G}({\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y})} \bigg |_{{\mathcal {\varvec{X}}} = \varvec{\mu (\theta )}}. \end{aligned}$$
(12)

being \(p_{G}\left( {\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y}\right) \) the Gaussian approximation to \(p({\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y})\), computed as described in Van Niekerk et al. (2023):

$$\begin{aligned} \begin{aligned} {\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y} \sim {\mathcal {N}}(\varvec{\mu } (\varvec{\theta }), \varvec{Q}_{{\mathcal {\varvec{X}}}}^{-1}(\varvec{\theta })). \end{aligned} \end{aligned}$$
(13)

The subsequent step involves obtaining the conditional posterior distributions of the elements of \({\mathcal {\varvec{X}}}\). To achieve this, it suffices to integrate \(\varvec{\theta }\) out of (13) using T integration points \(\theta _t\) and area weights \(\delta _t\) defined by some numerical integration scheme:

$$\begin{aligned} {\tilde{p}}({\mathcal {X}}_m \mid \varvec{y}) = \int p_{G}({\mathcal {X}}_m \mid \varvec{\theta }, \varvec{y}) \, {\tilde{p}}(\varvec{\theta } \mid \varvec{y}) \, d \varvec{\theta } \approx \sum _{t = 1}^T p_{G}({\mathcal {X}}_m \mid \theta _t, \varvec{y}) \, {\tilde{p}}(\theta _t \mid \varvec{y}) \, \delta _t. \end{aligned}$$
(14)

Finally, the recently proposed Variational Bayes correction to Gaussian means (Niekerk and Rue 2021) is used to efficiently compute an improved mean for the marginal posterior of the latent field. All this methodology can be used from R through the R-INLA package. For more details about R-INLA we refer the reader to Blangiardo and Cameletti (2015), Zuur et al. (2017), Wang et al. (2018), Krainski et al. (2018), Moraga (2019), Gómez-Rubio (2020) and Van Niekerk et al. (2023), where practical examples and code guidelines are provided.

4 INLA for fitting logistic-normal regression with Dirichlet covariance

This part of the paper focuses on presenting our approach for fitting CoDa models.

4.1 Bayesian logistic-normal regression with Dirichlet covariance

To define the likelihood we need the logistic-normal distribution and the structure of the variance–covariance matrix presented in Eq. (7).

Definition 1

\({\varvec{y}} \in \mathbb {S}^D\) follows a logistic-normal distribution with Dirichlet covariance \(\mathcal {LND}(\varvec{\mu }, \varvec{\Sigma })\) if and only if \(alr({\varvec{y}}) \sim {\mathcal {N}}(\varvec{\mu }, \varvec{\Sigma })\), and:

$$\begin{aligned} \begin{array}{rcl} \Sigma _{dd} & = & \sigma _d^2 + \gamma , \quad d = 1, \ldots , D-1, \\ \Sigma _{dk} & = & \gamma , \quad d \ne k, \end{array} \end{aligned}$$

where \(\sigma _d^2 + \gamma \) represents the variance of each log-ratio and \(\gamma \) is the covariance between log-ratios.

From now on, we will refer to \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu },\varvec{\Sigma })\) as the multivariate normal with Dirichlet covariance structure, as given in Definition 1. Let \(\varvec{y}\) be a multivariate random variable such that \(\varvec{y} \sim \mathcal {LND}(\varvec{\mu }, \varvec{\Sigma })\), which by definition is equivalent to \(alr(\varvec{y}) \sim {{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\). Because of its easy interpretability in terms of log-ratios with respect to the reference category, we focus on modelling \(alr(\varvec{y})\) as a \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\).

Let \(\varvec{\mu }^{(d)}_{N \times 1}\) be the column vector of linear predictors for the dth alr-coordinate, and \({\varvec{X}}^{(d)}\), with dimension \(N \times (M^{(d)} + 1)\), \(d = 1, \ldots , D-1\), the design matrix, which can be different for each alr-coordinate; in other words, each alr-coordinate can be explained by different covariates. Let \({\varvec{f}}^{(d)}\) be a set of \(L^{(d)}\) unknown functions of \({\varvec{U}}\) that can also vary with the alr-coordinate. For the sake of simplicity, and without loss of generality, we assume \(M^{(d)} = M\) and \(L^{(d)}=L\), fixing the number of covariates and functions to be the same in each linear predictor. Finally, we define \(\varvec{\beta }^{(d)}_{(M+1) \times 1}\) as the \((M + 1)\)-dimensional column vector containing the parameters corresponding to the fixed effects, including the intercept.

Then, the logistic-normal Dirichlet model (LNDM) can be expressed as follows:

$$\begin{aligned} alr(\varvec{y}) \sim {{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma }) , \end{aligned}$$
(15)
$$\begin{aligned} \varvec{\mu }^{(d)} = {\varvec{X}} \varvec{\beta }^{(d)} + \sum _{l=1}^L {\varvec{f}}_l^{(d)} (u_l) , \end{aligned}$$
(16)

being \({\mathcal {\varvec{X}}} = \{\varvec{\beta }^{(d)}, {\varvec{f}}^{(d)}; d = 1, \ldots , D - 1\}\,\) the latent field, \(\varvec{\theta }_1 = \{\sigma _d^2, \gamma : d = 1, \ldots , D-1 \}\) the hyperparameters corresponding to the likelihood, and \(\varvec{\theta }_2\) the hyperparameters corresponding to the functions f.

4.2 LNDM in R-INLA

R-INLA was originally implemented under the assumption that each data item is linked to one element of the Gaussian field. Although this restriction disappears in the new INLA era (Van Niekerk et al. 2023), fitting models with multivariate likelihoods remains a challenge. Some approximations exist for the Multinomial likelihood using the Poisson–Laplace trick (Baker 1994), or for the Dirichlet likelihood by converting the original Dirichlet observations into Gaussian pseudo-observations conditioned on the linear predictor (Martínez-Minaya et al. 2023). In our case, the main challenge is to estimate the variance-covariance matrix of the \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\) distribution, in particular \(p(\gamma \mid {\varvec{y}})\). To do so, we adopt the strategy of modelling each alr-coordinate as if we were modelling multiple likelihoods (Krainski et al. 2018), and the covariance hyperparameter is estimated using an independent shared random effect through the following well-known proposition.

Proposition 1

Let \(z_d\), \(d = 1, \ldots , D-1\), be independent Gaussian random variables with means \(\mu _d\) and variances \(\sigma _{d}^2\), and let \(u \sim {\mathcal {N}}(0, \gamma )\) be independent of them. Then, the multivariate random variable \({\varvec{y}}\), defined as:

$$\begin{aligned} \begin{array}{rcl} y_1 & = & z_1 + u, \\ y_2 & = & z_2 + u, \\ \vdots & & \vdots \\ y_{D-1} & = & z_{D-1} + u, \end{array} \end{aligned}$$
(17)

follows a multivariate Gaussian with mean \(\varvec{\mu }\) and covariance matrix \(\varvec{\Sigma }\) whose elements are:

$$\begin{aligned} \begin{array}{rcl} \Sigma _{dd} & = & \sigma _d^2 + \gamma , \quad d = 1, \ldots , D-1, \\ \Sigma _{dj} & = & \gamma , \quad d \ne j. \end{array} \end{aligned}$$

This proposition is simple but powerful: with independent Gaussian distributions and a random effect shared between predictors, \(p(\gamma \mid {\varvec{y}})\) can be easily estimated, and this structure fits perfectly in the context of LGMs. Thus, to fit an LNDM in R-INLA, we only need to add an individual random effect shared between the linear predictors corresponding to the different alr-coordinates.
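
A quick empirical check of Proposition 1 (an illustrative sketch, not code from the paper):

```r
# The shared effect u induces covariance gamma between the coordinates
set.seed(1)
n <- 1e5; sigma2 <- c(0.5, 0.4); gamma <- 0.1
u <- rnorm(n, 0, sqrt(gamma))
y <- cbind(rnorm(n, 0, sqrt(sigma2[1])) + u,
           rnorm(n, 0, sqrt(sigma2[2])) + u)
cov(y)  # approx. matrix(gamma, 2, 2) + diag(sigma2)
```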

4.3 A simulated example

In this section we use a simulated scenario to exemplify the process of fitting CoDa with R-INLA. We start with a simple case featuring only three categories and one covariate, and we assume that the effect of this covariate differs for each linear predictor. We designate this model as a Type II model (the naming is explained in Sect. 5). The model structure used in this example is:

$$\begin{aligned} alr({\varvec{Y}}) \sim {{\mathcal {N}}}{{\mathcal {D}}}((\varvec{\mu }^{(1)}, \varvec{\mu }^{(2)}), \varvec{\Sigma }) , \end{aligned}$$
(18)
$$\begin{aligned} \varvec{\mu }^{(d)} = {\varvec{X}} \varvec{\beta }^{(d)} , \end{aligned}$$
(19)

where \(\varvec{X}_{N \times 2}\) is a matrix with ones in the first column and, in the second, values of the covariate simulated from a Uniform distribution between \(-0.5\) and 0.5. Four different parameters compose the model, forming the latent field \({\mathcal {\varvec{X}}} = \{\beta _0^{(1)}, \beta _0^{(2)}, \beta _1^{(1)}, \beta _1^{(2)}\}\). Moreover, three hyperparameters are included in the model, forming the set \(\varvec{\theta } = \{\sigma _1^2, \sigma _2^2, \gamma \}\).

4.3.1 Data simulation

In this part of the manuscript, we present an example of how the simulation can be conducted. First of all, we define the values of the hyperparameters and build the covariance matrix \(\varvec{\Sigma }\). The chosen values for the simulation are \(N = 1000\), \(D = 3\), \(\sigma _1^2 = 0.5\), \(\sigma _2^2=0.4\) and \(\gamma = 0.1\).

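The original code listings did not survive extraction, so throughout this section we provide minimal R sketches consistent with the text; all object names are our own illustrative choices. First, the hyperparameters and the covariance matrix:

```r
N <- 1000
D <- 3
sigma2 <- c(0.5, 0.4)  # sigma_1^2 and sigma_2^2
gamma  <- 0.1

# Covariance matrix with Dirichlet structure (Definition 1)
Sigma <- matrix(gamma, nrow = D - 1, ncol = D - 1)
diag(Sigma) <- sigma2 + gamma
```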

The correlation matrix can also be easily computed. It contains \(((D-1)^2 - (D-1))/2\) distinct values off the diagonal.

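A sketch:

```r
# Correlation matrix implied by Sigma
R <- cov2cor(Sigma)
```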

The next step is to simulate the covariate.

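For instance:

```r
set.seed(314)  # arbitrary seed for reproducibility
x <- runif(N, min = -0.5, max = 0.5)
```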

Subsequently, with fixed betas \(\beta _0^{(1)} = -1\), \(\beta _1^{(1)} = 1\), \(\beta _0^{(2)} = -1\), \(\beta _1^{(2)} = 2\), we construct the values of the two linear predictors.

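A sketch:

```r
beta0 <- c(-1, -1)  # intercepts of the two linear predictors
beta1 <- c(1, 2)    # covariate effects
mu <- cbind(beta0[1] + beta1[1] * x,
            beta0[2] + beta1[2] * x)
```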

The next step is to simulate from a multivariate Gaussian with the structure previously constructed; this gives us the alr-coordinates.

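For example, using MASS::mvrnorm():

```r
library(MASS)

# alr-coordinates: linear predictors plus correlated Gaussian noise
z <- mu + mvrnorm(N, mu = c(0, 0), Sigma = Sigma)
```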

Finally, we move to the simplex, taking the third category as the reference one. The output is a matrix containing the response variable, with rows summing to one. We create a data.frame to keep the CoDa, the alr-coordinates and the covariate x. The generated CoDa and alr-coordinates are depicted in Fig. 1.

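A sketch of the back-transformation:

```r
# Inverse alr with the third category as reference
denom <- 1 + exp(z[, 1]) + exp(z[, 2])
y <- cbind(exp(z[, 1]) / denom, exp(z[, 2]) / denom, 1 / denom)

df <- data.frame(y = y, z = z, x = x)
```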
Fig. 1 Top: simulated CoDa represented in the simplex. Bottom: alr-coordinates in terms of the generated covariate x

4.3.2 Preparing data for being introduced in R-INLA

In this section, the most labour-intensive step is preparing the data to be input into R-INLA. To do this, we make use of structures like inla.stack. In this structure we need to include the multivariate response, incorporating the different alr-coordinates. Additionally, we input the covariates, indicating which alr-coordinate they affect, along with an index that assists us in introducing the shared random effect used to estimate the hyperparameter \(\gamma \). We start by defining such an index.

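Continuing with the objects defined above:

```r
# Which alr-coordinate each stacked observation belongs to (1 or 2)
alr.idx <- rep(1:(D - 1), each = N)
```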

Next, we extend the dataset to construct the multivariate response, a matrix of dimension \((N \cdot (D-1)) \times (D-1)\): the first column contains the first alr-coordinate in the first N rows and NAs elsewhere; the second column contains the second alr-coordinate in positions \((N + 1):(2N)\) and NAs elsewhere, and so on.

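For \(D = 3\) this reads:

```r
# Multivariate response in multiple-likelihood format
Y <- matrix(NA, nrow = N * (D - 1), ncol = D - 1)
Y[1:N, 1] <- z[, 1]
Y[(N + 1):(2 * N), 2] <- z[, 2]
```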

In the model, covariates are included as random effects with a large fixed variance. We therefore need the values of the covariates, together with an index indicating the alr-coordinate to which each belongs.

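One possible construction:

```r
x.stacked <- rep(x, D - 1)   # covariate values per stacked row
id.x <- alr.idx              # selects the coordinate-specific slope
id.u <- rep(1:N, D - 1)      # individual index for the shared effect
```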

Finally, we create the inla.stack for estimation, and we are ready to fit the model.

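A sketch of the stack (the effect names are our own):

```r
library(INLA)

stk <- inla.stack(
  data    = list(Y = Y),
  A       = list(1),
  effects = list(list(cat = factor(alr.idx), x = x.stacked,
                      id.x = id.x, id.u = id.u)),
  tag     = "est"
)
```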

4.3.3 Fitting the model

To fit the model, priors must be defined for the parameters and hyperparameters. The priors considered for the parameters are the R-INLA defaults, while PC-priors (Simpson et al. 2017) are considered for the standard deviations and for the square root of the covariance parameter \(\gamma \); in particular, a PC-prior(1, 0.01) was used for \(\sigma _1\), \(\sigma _2\) and \(\sqrt{\gamma }\). The formula to be introduced in R-INLA was:

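One plausible formula under the conventions above (a sketch, not the authors' exact code): coordinate-specific intercepts via cat, coordinate-specific slopes on x via an iid effect with a vague fixed precision, and the shared iid effect id.u whose variance corresponds to \(\gamma \):

```r
formula <- Y ~ -1 + cat +
  # slopes beta_1^(1) and beta_1^(2): iid effect with vague, fixed precision
  f(id.x, x, model = "iid",
    hyper = list(prec = list(initial = log(0.001), fixed = TRUE))) +
  # shared effect inducing the covariance gamma (PC-prior on its sd)
  f(id.u, model = "iid",
    hyper = list(prec = list(prior = "pc.prec", param = c(1, 0.01))))
```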

and the call to R-INLA:

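A sketch, with PC-priors on the Gaussian observation precisions set through control.family:

```r
res <- inla(formula,
            family = rep("gaussian", D - 1),
            data = inla.stack.data(stk),
            control.family = rep(list(list(hyper = list(prec = list(
              prior = "pc.prec", param = c(1, 0.01))))), D - 1),
            control.predictor = list(A = inla.stack.A(stk)))

summary(res)
```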

In Figs. 2 and 3, the marginal posterior distributions are depicted together with the simulated values, showing that we were able to recover the original values.

Fig. 2 Marginal posterior distributions for the fixed effects. Vertical lines represent the true values

Fig. 3 Marginal posterior distributions for the hyperparameters. Vertical lines represent the true values

5 Spatial LNDM and model selection

Once the LNDM is defined, a particular focus lies on how more intricate structures within the linear predictor can be accommodated within the R-INLA framework. A further issue is model selection. Hence, this section is dedicated to spatial LNDMs and to the use of measures such as the Deviance Information Criterion (DIC), the Watanabe-Akaike information criterion (WAIC) and the LCPO for model selection.

5.1 Spatial LNDMs

Of particular interest are LNDMs in the spatial context. Spatial analysis refers to the analysis of data collected in space, which can be indexed over a discrete or a continuous domain. Spatial statistics is thus traditionally divided into three main areas depending on the type of problem and data: lattice data, geostatistics and point patterns. For a review of models for different types of spatial data, see Haining (2003) and Cressie and Wikle (2015). When a spatial effect has to be included in the model, it is common to formulate mixed-effects regression models in which the linear predictor is made up of a trend plus a spatial variation, the spatial effect being modelled with correlated random effects, matching perfectly the structure presented in Eq. (16).

R-INLA provides many options for implementing Gaussian latent spatial effects (Gómez-Rubio 2020), including intrinsic conditional autoregressive (iCAR) or conditional autoregressive (CAR) models for areal data (Besag et al. 1991), or spatial effects with a Matérn covariance function for continuous processes (Lindgren et al. 2011). In this manuscript we focus on the latter, but the approach is easily applicable to other latent Gaussian effects.

The Matérn covariance function is one of the most widely used in geostatistics due to its flexibility. Although initially it could not be directly incorporated into the R-INLA structure, Lindgren et al. (2011) introduced a solution through the SPDE module, approximating the spatial latent effect with a Matérn covariance function as the solution to a stochastic partial differential equation using the finite element method (FEM). Since then, this methodology has been applied in numerous scientific articles across different areas (Martínez-Minaya et al. 2018).

These effects can easily be included in the LNDM. As we are adopting a multiple-likelihood modelling strategy, we make use of the features that R-INLA provides for fitting multiple likelihoods jointly. The copy feature is intended to share random effects, i.e. to use the same latent effect in different linear predictors; it also allows sharing exactly the same latent effect while adding a proportionality hyperparameter. The replicate feature provides a way to add a different random effect per linear predictor while sharing the same hyperparameters. For details about their implementation, we refer the reader to the website https://www.r-inla.org/ and the books by Krainski et al. (2018) and Gómez-Rubio (2020).
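
Purely as orientation, a hedged sketch of the three options (spde and the spatial index vectors s1, s2 and s are assumed to come from the usual SPDE set-up, and cat denotes coordinate-specific intercepts):

```r
# Shared field (Types III/IV): both predictors use the same realisation
form.shared <- Y ~ -1 + cat + f(s1, model = spde) + f(s2, copy = "s1")

# Proportional fields (Types V/VI): same realisation, scaled by an extra
# proportionality hyperparameter estimated from the data
form.prop <- Y ~ -1 + cat + f(s1, model = spde) +
  f(s2, copy = "s1", fixed = FALSE)

# Different realisations with common hyperparameters (Types VII/VIII)
form.repl <- Y ~ -1 + cat + f(s, model = spde, replicate = alr.idx)
```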

Applying these principles and emphasizing both fixed effects and continuous spatial random effects, the examples presented in this paper follow a systematic framework that leads to the development of eight distinct model types. Then, the model structure employed for the remainder of the paper is as follows:

$$\begin{aligned} alr(\varvec{Y}) \sim {{\mathcal {N}}}{{\mathcal {D}}}((\varvec{\mu }^{(1)}, \ldots , \varvec{\mu }^{(D-1)}), \varvec{\Sigma }), \end{aligned}$$
(20)
$$\begin{aligned} \varvec{\mu }^{(d)} = {\varvec{X}} \varvec{\beta }^{(d)}+\varvec{\omega }^{(d)} , \quad d = 1,\ldots ,D-1 , \end{aligned}$$
(21)

\(\varvec{\mu }^{(d)} = (\mu ^{(d)}_{1}, \ldots , \mu ^{(d)}_{N})\) being the linear predictors for the dth alr-coordinate, and \({\varvec{X}}_{N \times (M + 1)}\) the design matrix, containing 1s in the first column if intercepts are considered in the model. \(\varvec{\omega }^{(d)}\) represents the spatial random effect with Matérn covariance for the dth alr-coordinate, \(\varvec{\omega }^{(d)} \sim {\mathcal {N}}(\varvec{0}, {\varvec{Q}}^{-1}(\sigma _{\varvec{\omega }}, \phi ))\), depending on the standard deviation of the spatial effect \(\sigma _{\varvec{\omega }}\) and its range \(\phi \). \(\varvec{\beta }^{(d)}_{(M + 1)\times 1}\) is the parameter vector corresponding to the fixed effects. The latent field is composed of the parameters corresponding to the fixed effects and the realisations of the random field:

$$\begin{aligned} {\mathcal {\varvec{X}}} = \{\varvec{\beta }^{(d)}, \varvec{\omega }^{(d)}: d = 1, \ldots , (D-1)\}. \end{aligned}$$

In contrast, \(\varvec{\theta }_1 = \{\sigma _d^2, \gamma : d = 1, \ldots , (D-1) \}\) are the hyperparameters corresponding to the likelihood, and \(\varvec{\theta }_2 = \{\sigma _{\varvec{\omega }}, \phi \}\) those corresponding to the spatial random effect. Together they form the set of hyperparameters. Gaussian priors are usually assigned to the fixed effects and PC-priors to the hyperparameters (Simpson et al. 2017).

Based on the model structure defined in Eq. (21), R-INLA offers flexibility by allowing us to introduce fixed and random effects in different ways using the features explained above. For the fixed effects, two different assumptions about the parameters of the different alr-coordinates are plausible. The first is that the effect of the mth covariate is the same for all alr-coordinates, i.e. they share the same parameter: \(\beta _m^{(d)} = \beta _m^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\), \(m = 0, \ldots , M\); we denote it by \(\beta _m\). The second is that the effect of the mth covariate can differ for each alr-coordinate. Note that this is more general, as it includes the case where the effects are equal as well as the case where the linear predictors do not share the same covariates; we denote these by \(\beta _m^{(d)}\).

With regard to the random effects, we distinguish three cases. The first considers the spatial random field to be the same for all linear predictors, i.e. \(\varvec{\omega }^{(d)} = \varvec{\omega }^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\); they share exactly the same spatial term, so we denote it by \(\varvec{\omega }\), as it does not depend on the alr-coordinate. The second assumes the spatial fields to be proportional, in other words \({\varvec{\omega }}^{(d)} = \alpha ^{(d)} {\varvec{\omega }}^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\); we denote this by \(\varvec{\omega }^{(*d)}\). Finally, the third case states that the realisation of the spatial random effect is different for each linear predictor, although they share the same hyperparameters, i.e. \(\varvec{\omega }^{(d)} \ne \varvec{\omega }^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\), with \(\varvec{\omega }^{(d)} \sim {\mathcal {N}}(0, {\varvec{Q}}^{-1}(\sigma _{\varvec{\omega }}, \phi ))\); we denote this by \(\varvec{\omega }^{(d)}\).

By combining fixed and random terms, we reach eight different structures for the linear predictors (See Table 1 for details about the latent field and hyperparameters):

  • Type I: share the same parameters for fixed effects, and do not include spatial random effects.

  • Type II: have different parameters for fixed effects, and do not include spatial random effects.

  • Type III: share the same parameters for fixed effects, and share the same spatial effect.

  • Type IV: have different parameters for fixed effects, and share the same spatial effect.

  • Type V: share the same parameters for fixed effects, and the spatial effects between linear predictors are proportional. Realisations of the spatial field are the same, but a proportionality hyperparameter is added in all but one of the linear predictors.

  • Type VI: have different parameters for fixed effects, and the spatial effects between linear predictors are proportional. Realisations of the spatial field are the same, but a proportionality hyperparameter is added in all but one of the linear predictors.

  • Type VII: share the same parameters for fixed effects, and different realisations of the spatial effect for each linear predictor. Although realisations of random effects are different, they share the same hyperparameters.

  • Type VIII: have different parameters for fixed effects, and different realisations of the spatial effect for each linear predictor. Although realisations of random effects are different, they share the same hyperparameters.

Table 1 Different structures included in the model in an additive way with their corresponding latent field and the hyperparameters to be estimated

5.2 Model selection and validation

Regarding the model selection process, there can be a large number of models resulting from all the possible combinations of covariates, and combining these with the possible latent effects increases the number of possibilities exponentially. R-INLA has proved fast enough to compute huge numbers of models, as well as different measures that make the model selection process feasible. Such measures include the Deviance Information Criterion (Spiegelhalter et al. 2002, DIC), defined as a hierarchical-modelling generalisation of the Akaike information criterion (AIC); the Watanabe-Akaike information criterion (Watanabe and Opper 2010; Gelman et al. 2014, WAIC), the sum of a term quantifying model fit and a term evaluating model complexity; and the cross-validation measure conditional predictive ordinate (CPO) for evaluating predictive capacity, with its log-score (Pettit 1990; Roos and Held 2011, LCPO). Models with the lowest values of DIC, WAIC or LCPO are preferred over the rest.

However, R-INLA is programmed to handle univariate likelihoods, and the variability added by the inclusion of the new random effect is not taken into account when the deviance is computed. This affects the computation of the DIC and WAIC, so an additional step is needed to calculate them when the response variable follows a multivariate normal distribution. This step must be able to incorporate the elements off the diagonal of the variance–covariance matrix. To achieve this, a post-processing of the model is performed: samples from the joint posterior distribution are obtained with the inla.posterior.sample function, and the likelihood of the multivariate normal distribution is calculated. The remaining calculations for the DIC follow the formula defined in Spiegelhalter et al. (2002), while the WAIC is computed following Watanabe and Opper (2010). Both have been implemented in two R functions, DIC.mult and WAIC.mult, available in the repository https://github.com/jmartinez-minaya/INLAcomp.
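
Purely as an illustration of the post-processing idea behind DIC.mult (the released implementation lives in the repository above), the sketch below assumes that S posterior draws of the linear predictors (eta) and of the covariance matrix (Sigmas) have already been rebuilt from inla.posterior.sample output, and that z holds the observed alr-coordinates:

```r
library(mvtnorm)  # for dmvnorm()

# z: N x (D-1) observed alr-coordinates
# eta: list of S posterior draws of the N x (D-1) linear predictors
# Sigmas: list of S posterior draws of the (D-1) x (D-1) covariance
dic_mult <- function(z, eta, Sigmas) {
  S <- length(eta)
  dev <- sapply(seq_len(S), function(s)
    -2 * sum(dmvnorm(z - eta[[s]], sigma = Sigmas[[s]], log = TRUE)))
  eta_bar   <- Reduce(`+`, eta) / S
  Sigma_bar <- Reduce(`+`, Sigmas) / S
  dev_bar   <- -2 * sum(dmvnorm(z - eta_bar, sigma = Sigma_bar, log = TRUE))
  p_D <- mean(dev) - dev_bar  # effective number of parameters
  c(DIC = dev_bar + 2 * p_D, p_D = p_D)
}
```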

The same does not apply to the CPO, since it is based on the posterior predictive distribution; Appendix A contains a proof that the CPO is unaffected by the approach proposed here. However, we believe that the CPO cannot be calculated in the usual way when dealing with CoDa, and we therefore propose a new definition.

Fig. 4 Simulated CoDa: proportion per category

5.2.1 CPO

In the CoDa cross-validation process, excluding a single category from a CoDa point may not make sense, since CoDa are subject to a constraint: their sum must be 1. This implies that the remaining categories provide valuable information about the category we are excluding. One might think that working in log-ratio coordinates could alleviate this issue, but that is not the case: the reference category is present in all the log-ratios, so we encounter a similar situation, with the remaining log-ratio coordinates providing information about the coordinate removed during cross-validation. In this manner, the concept of friendship emerges: the first alr-coordinate of individual n is a friend of the second alr-coordinate of individual n, and thereby contributes information about it. Hence, to conduct cross-validation for individual n and alr-coordinate d, it is necessary to exclude the values of all alr-coordinates pertaining to that individual. Accordingly, we define the CPO for the nth data point and dth alr-coordinate as:

$$\begin{aligned} \text {CPO}_n^{(d)} = \int p(alr(\varvec{y})_n^{(d)} \mid {\mathcal {\varvec{X}}}, \varvec{\theta }) \, p({\mathcal {\varvec{X}}}, \varvec{\theta } \mid alr(\varvec{y})_{-n}^{\bullet }) \, d{\mathcal {\varvec{X}}} \, d \varvec{\theta } , \end{aligned}$$
(22)

being \(alr(\varvec{y})_n^{(d)}\) the observed value for the nth data point and dth alr-coordinate, while \(alr(\varvec{y})_{-n}^{\bullet }\) represents the observed data in alr-coordinates (\(N-1\) data points with \(D-1\) components each) excluding the nth data point with its corresponding \(D-1\) alr-coordinates. We then easily compute the log-score (Gneiting and Raftery 2007) as:

$$\begin{aligned} \text {LCPO} = -\frac{1}{N \cdot (D-1)} \sum _{d=1}^{D-1} \sum _{n=1}^{N} \log {\left( \text {CPO}_n^{(d)}\right) }. \end{aligned}$$
(23)

6 Continuous spatial data: a simulation study

The goals of this simulation are twofold. Firstly, we seek to assess the reliability of the model selection criteria presented above; as we have pointed out, these metrics play a crucial role in identifying the model that best represents the underlying process. Secondly, we aim to demonstrate the capability of R-INLA to accurately recover the original parameters.

6.1 Simulated data

We simulated from a spatial LNDM of Type VIII, the most flexible structure, since the fixed effects vary by linear predictor and the realizations of the spatial effects differ accordingly. The simulation involved one covariate, simulated from a Uniform distribution between \(-0.5\) and 0.5; two different realizations of a Matérn field on the square [0,10] \(\times \) [0,10] with range \(\phi = 4\) and \(\sigma _{\omega } = 1\) (see Fig. 6); one thousand observations (\(N = 1000\)); and three dimensions (\(D = 3\)). Given that \(D = 3\), applying the alr transformation yields two linear predictors. In the context of Type VIII, and considering that we simulated only one covariate, we have to estimate two parameters, denoted \(\beta _1^{(1)}\) and \(\beta _1^{(2)}\), which were pre-set to 2.27 and \(-2.3\), respectively. Turning to the likelihood hyperparameters, we have two variance hyperparameters, \(\sigma _1^2\) and \(\sigma _2^2\), and one covariance parameter, \(\gamma \), fixed at 0.32, 0.59 and 0.1, respectively. The resulting simulated data are depicted in Fig. 4, and the alr-coordinates using the third category as reference are displayed in Fig. 5. We selected the third category as the reference because it was the one whose logarithm had the lowest variance.

Fig. 5 Additive log-ratio transformation of the CoDa using the third category as the reference

6.2 Model selection

The simulation originates from the Type VIII model, and we fitted the alternative model types (see Table 1), computing the DIC, WAIC and LCPO for each model. The results are shown in Table 2. In all three cases, the Type VIII model consistently exhibits the best fit to our simulated data, with the smallest values across the three evaluation metrics.

Table 2 LNDMs with their corresponding DIC, WAIC and LCPO

6.3 Parameters recovery

As previously discussed, the optimal model is the Type VIII model. It comprises two parameters corresponding to fixed effects, \(\beta _{1}^{(1)}\) and \(\beta _{1}^{(2)}\), plus the realizations of the spatial random effects, which form the latent Gaussian field (\({\mathcal {\varvec{X}}}\)); and three hyperparameters related to the likelihood, \(\sigma _1^2\), \(\sigma _2^2\) and \(\gamma \), plus two hyperparameters associated with the spatial random effects, which form the set of hyperparameters (\(\varvec{\theta }\)).

The 95% credible interval of the parameter \(\beta _1^{(1)}\) is [2.103, 2.4] with a median value of 2.251. In contrast, for the parameter \(\beta _1^{(2)}\), the 95% credible interval is \([-2.469, -2.086]\) with a median value of \(-2.277\). Comparing these intervals with the true parameter values, 2.27 and \(-2.3\) respectively, we conclude that the estimation is accurate enough. A similar pattern emerges for the latent fields with Matérn covariance matrices. In Fig. 6, we depict the original spatial latent fields alongside the medians and estimated 95% credible intervals; once again, we observe a reliable estimation. Finally, we examine the behaviour of the hyperparameters. In Fig. 7, the posterior distributions of the hyperparameters are illustrated together with the true values, and the estimations again align well with the actual values. From these findings, we conclude that the method recovers the true parameter values effectively.

Fig. 6 True values of the latent fields with Matérn covariance matrix used in the simulation, together with the median and 95% credible intervals of the estimated fields

Fig. 7 Marginal posterior distributions for the hyperparameters. Vertical lines represent the true values

7 The case of Arabidopsis thaliana

This section is devoted to showing an application of continuous spatial LNDMs in a real setting.

7.1 The data and the model

We worked with a collection of 301 accessions of the annual plant Arabidopsis thaliana on the Iberian Peninsula. For each accession, the probabilities of belonging to each of the four genetic clusters (GCs) inferred in Martínez-Minaya et al. (2019), namely GC1, GC2, GC3 and GC4, were available (Fig. 8), their sum being 1. We were interested in estimating the probability of membership, which in this particular context can be thought of as the habitat suitability of each genetic cluster. To do so, we employed LNDMs including climate covariates and spatial terms in the linear predictor. In particular, two bioclimatic variables were used to define the climatic part: annual mean temperature (BIO1) and annual precipitation (BIO12). The complete dataset was downloaded from the repository of Martínez-Minaya et al. (2019). Climate covariates were scaled before conducting the analysis.

Fig. 8 Probability of membership of GC1, GC2, GC3 and GC4 on the Iberian Peninsula

As mentioned, four categories were employed in this problem: GC1, GC2, GC3 and GC4, so we dealt with proportions in \(\mathbb {S}^4\). To build the LNDM, we selected GC4 as the reference category because it was the one whose logarithm had the lowest variance. We were thus dealing with a three-dimensional \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\). The transformed data are shown in Fig. 9.

Fig. 9 Additive log-ratio transformation of the proportions of GC1, GC2, GC3 and GC4 on the Iberian Peninsula, using GC4 as the reference category

7.2 Model selection, model fitting and prediction

Model selection was conducted including the intercept and the two climatic covariates, combined with the spatial effects under the different structures presented in Table 1. Eight models were fitted, and the DIC, WAIC and LCPO were computed (Table 3).

In view of the model selection results, and based on DIC and WAIC, the model with Type VIII structure seemed to best represent the process of interest. In contrast, the LCPO indicated that the best model featured a Type VI structure; however, as the difference was just 0.019, we proceeded with the Type VIII model to compute the posterior distributions and make predictions. R-INLA then allowed us to compute the posterior distributions of the fixed effects (Fig. 10) for each alr-coordinate. As we have argued in favour of the alr, these are easy to interpret in terms of ratios.

If we focus on the covariate BIO1 (annual mean temperature), we observe that, in the presence of BIO12, it is relevant, with a probability of 0.972 of the coefficient being lower than 0 in the first alr-coordinate, 0.99 in the second, and 0.99 in the third. Therefore, in all three cases we presume the covariate to be relevant and proceed to interpret the coefficients (Fig. 10). The ratio between the probability of belonging to GC1 and the probability of belonging to GC4 is reduced by approximately 20% when the scaled annual mean temperature increases by one unit. The ratio between the probabilities of belonging to GC2 and to GC4 decreases by 32% when the scaled annual mean temperature increases by one unit. Finally, the ratio between the probabilities of belonging to GC3 and to GC4 decreases by 50% under the same increase.

If we focus on the covariate BIO12 (annual precipitation), we note that, in the presence of BIO1, it is relevant, with a probability of 0.72 of the coefficient being lower than 0 in the first alr-coordinate. The same does not happen for the second and third alr-coordinates, where the probabilities of the coefficient being lower than 0 are 0.43 and 0.46, respectively. As a result, we assume the covariate's relevance only in the first alr-coordinate and proceed to interpret its coefficient (Fig. 10): the ratio between the probabilities of belonging to GC1 and to GC4 decreases by approximately 6% when the scaled BIO12 increases by one unit and BIO1 remains constant.

Table 3 LNDMs with their corresponding DIC, WAIC and LCPO
Fig. 10 Marginal posterior distributions for the parameters corresponding to the fixed effects for each of the alr-coordinates: BIO1 and BIO12

With the method implemented here, we are able to make predictions not only on the alr-coordinate scale (Fig. 11), but also on the original scale (Fig. 12). Focusing on Fig. 11, we observe that in the north-west of Spain the ratio between the probabilities of belonging to GC1 and to GC4 reaches 12, meaning that at those locations the probability of belonging to GC1 is 12 times greater than that of belonging to GC4. Something similar happens in the north-east of the Iberian Peninsula, where the probability of belonging to GC2 is 12 times greater than that of GC4. The third alr-coordinate behaves somewhat differently, with the greatest difference between the probabilities of belonging to GC3 and to GC4 found in the centre of the Iberian Peninsula.

Fig. 11 Mean and standard deviation of the posterior predictive distribution for the alr-coordinates

Fig. 12 Mean and standard deviation of the posterior predictive distribution for the probability of belonging to GC1, GC2, GC3 and GC4

Finally, the marginal posterior distributions of the hyperparameters, and consequently of the covariance parameter between the alr-coordinates, can also be computed (Fig. 13).

Fig. 13 Marginal posterior distributions for the hyperparameters of the model

8 Conclusions and future work

CoDa are becoming more and more common, especially in the context of genomics, and require increasingly powerful computational tools for their analysis. We believe that including a likelihood that can deal with CoDa in the context of LGMs facilitates inference and prediction. That is why, in this manuscript, we have introduced a different way to perform inference in Bayesian CoDa analysis, placing it in the context of LGMs and thereby making the range of possibilities that R-INLA offers available to the logistic-normal likelihood with Dirichlet covariance.

The main idea underlying the proposed method is to approximate the multivariate likelihood with univariate Gaussian likelihoods sharing an independent random effect, a structure that R-INLA can fit. This idea is similar to the one used for the Multinomial likelihood in R-INLA, where the Poisson trick (Baker 1994) reparameterises the model so that independent Poisson observations are fitted, and to the approach in Martínez-Minaya et al. (2023) that approximates the Dirichlet likelihood using conditionally independent Gaussians. Simpson et al. (2016) also used a similar strategy, constructing a Poisson approximation to the true log-Gaussian Cox process likelihood that makes it possible to carry out inference on a regular lattice over the observation window by counting the number of points in each cell. This work does not intend to substitute the dirinla package (Martínez-Minaya et al. 2023) or the Bayesian ilr approach (Mota-Bertran et al. 2022): it is simply a viable alternative for dealing with CoDa that allows the estimation and prediction of very complex models. Furthermore, functions are provided for the computation of DIC and WAIC within the framework of R-INLA, accompanied by a definition of the CPO for CoDa.

We have reported an example in the field of Ecology, showing the potential of R-INLA when continuous spatial effects are added to the linear predictor. We have exploited the options that R-INLA makes available in the context of multiple likelihoods, such as copy and replicate (Gómez-Rubio 2020), with the aim of showing practitioners the number of models that can be fitted in this context. Although here we have focused mainly on spatial processes, this tool can easily be applied in other contexts (temporal, spatiotemporal, etc.), as long as we express the model in the context of LGMs.

9 Supplementary information

Code: the functions are stored in an R package called INLAcomp, available at https://github.com/jmartinez-minaya/INLAcomp. The results shown in the paper are stored at https://jmartinez-minaya.github.io/supplementary.html.