1 Introduction

Compositional data analysis is an increasingly popular topic for understanding processes that consist of values corresponding to disjoint categories whose sum is a constant. These values are usually proportions or percentages, in which case the constant is 1 or 100. The data generated by such processes are widely known as Compositional Data (CoDa). For the sake of simplicity and without loss of generality, from now on we assume the constant to be 1. Connor and Mosimann (1969) proposed Dirichlet regression to deal with CoDa. Since then, several studies have been conducted using this technique, and most of them have shown that it is a very valuable tool for modelling CoDa; see for example Hijazi and Jernigan (2009) and Pirzamanbein et al. (2020).

There are other approaches to CoDa analysis. Aitchison (1986) presented a unified theory, developing a range of methods based on the idea that “information in compositional vectors is concerned with relative, not absolute magnitudes”. With this statement, the notion of ratios among proportions emerged and log-ratios arose as the preferred device for dealing with CoDa. Modelling CoDa using the logistic-normal distribution gained ground, and the foundations of CoDa analysis were established.

A vast body of literature exists on the subject of applying these methods using both Dirichlet regression and logistic-normal regression in different fields, including Ecology (Kobal et al. 2017; Douma and Weedon 2019), Geology (Buccianti and Grunsky 2014; Engle and Rowan 2014), Genomics (Tsilimigras and Fodor 2016; Shi et al. 2016; Washburne et al. 2017; Creus Martí et al. 2022), Environmental Sciences (Aguilera et al. 2021; Mota-Bertran et al. 2022) or Medicine (Dumuid et al. 2018; Fairclough et al. 2018).

Nevertheless, one of the biggest problems encountered when dealing with CoDa models is performing inference. Different approaches have been proposed for this purpose; in particular, many R packages have been implemented, not only from the frequentist perspective (Cribari-Neto and Zeileis 2010; Templ et al. 2011; Maier 2014), but also from the Bayesian paradigm. Packages such as BayesX (Klein et al. 2015), Stan (Sennhenn-Reulen 2018), BUGS (van der Merwe 2018) and R-JAGS (Plummer 2016) have tools for dealing with CoDa. These Bayesian packages are mainly based on Markov chain Monte Carlo (MCMC) methods, which construct a Markov chain whose stationary distribution is the posterior distribution; however, the computational cost of MCMC can be high. The integrated nested Laplace approximation (INLA) methodology (Rue et al. 2009), which approximates the posterior distribution using the Laplace integration method, has become an alternative to MCMC, offering much higher computational speed for Latent Gaussian Models (LGMs). With the incorporation of new techniques from Bayesian variational inference (Niekerk and Rue 2021; Van Niekerk et al. 2023) and the optimisation of the computation, which improves its parallel performance (Gaedke-Merzhäuser et al. 2023), a new era is emerging in the INLA software. Hence, incorporating a tool for dealing with CoDa into INLA would be a convenient way to tackle the large CoDa databases sometimes encountered.

Nonetheless, in R-INLA it is still a challenge to fit models with a multivariate likelihood such as those defined on the simplex of dimension D (\(\mathbb {S}^D\)). There are some approximations for the Dirichlet likelihood that involve converting the original Dirichlet observations into Gaussian pseudo-observations conditioned on the linear predictor (Martínez-Minaya et al. 2023), or simply converting a multivariate CoDa response into coordinates using the isometric log-ratio transformation (Mota-Bertran et al. 2022) and fitting them independently. However, there is no unified way to fit these models inside R-INLA and take advantage of all its facilities.

In this paper we present the logistic-normal Dirichlet model (LNDM), which uses the logistic-normal distribution with Dirichlet covariance, via the additive log-ratio transformation, as its likelihood. This allows us to integrate it within the R-INLA package in a very simple way, so that we benefit from all the other features of R-INLA for model fitting, model selection and prediction within the framework of LGMs. Additionally, we present how measures such as the Deviance Information Criterion (Spiegelhalter et al. 2002, DIC), the Watanabe-Akaike information criterion (Watanabe and Opper 2010; Gelman et al. 2014, WAIC), and the cross-validated conditional predictive ordinate (CPO) for evaluating predictive capacity (Pettit 1990; Roos and Held 2011) are computed in R-INLA when dealing with CoDa. To show how the method works, two simulated examples and a real example in the field of Ecology were implemented. In the last part, we conduct a spatial analysis of the plant Arabidopsis thaliana on the Iberian Peninsula.

The paper is then divided into 7 more sections. Section 2 introduces CoDa, the distributions that can be defined in \(\mathbb {S}^D\), and their equivalence. Section 3 presents some fundamentals of the INLA methodology. Section 4 is devoted to introducing the logistic-normal regression with Dirichlet covariance. In Sect. 5, we introduce spatial models as well as model selection measures in CoDa. Section 6 focuses on presenting a simulated spatial study. In Sect. 7, we provide a real application of this method and, finally, Sect. 8 concludes and discusses future avenues of research.

2 CoDa background

This section is devoted to introducing some preliminary concepts for a better understanding of CoDa. In particular, we present some basic and formal definitions of the two main distributions employed when we deal with CoDa.

2.1 CoDa: Definitions

Let \(\varvec{y}_{D \times 1}\) be a vector that satisfies \(\sum _{d = 1}^D y_d = 1\), and \(0< y_d < 1\), \(d = 1,\ldots , D\). This vector is called a composition, and it pertains to the simplex sample space. The simplex of dimension D, denoted by \(\mathbb {S}^D\), is defined as:

$$\begin{aligned} \mathbb {S}^D = \left\{ {\varvec{y}} \in \mathbb {R}^D \mid 0< y_d < 1; \ \ \sum _{d = 1}^D y_d = 1 \right\} . \end{aligned}$$
(1)

As in the ordinary real Euclidean space, there is a geometry defined in \(\mathbb {S}^D\). It does not follow the usual Euclidean geometry; it was introduced by Pawlowsky-Glahn and Egozcue (2001) and Egozcue et al. (2003), and is called the Aitchison geometry. The definitions of perturbation and powering are sufficient to obtain a vector space of compositions in which the usual properties such as commutativity, associativity and distributivity hold. With the definition of the Aitchison inner product, the Aitchison norm and the Aitchison distance, a Euclidean linear vector space is obtained (Pawlowsky-Glahn and Egozcue 2001).

Following the fundamentals proposed by Aitchison (1986), log-ratios play an important role in CoDa analysis. They can be constructed in different ways, including the centered log-ratio, isometric log-ratio or additive log-ratio, among others (Egozcue et al. 2012). In this work, we focus on the well-known additive log-ratio (alr) transformation because of its straightforward interpretation (Greenacre et al. 2023) and because it is a one-to-one mapping from \(\mathbb {S}^D\) to \(\mathbb {R}^{D-1}\). It is defined as:

$$\begin{aligned} {\varvec{z}}_{(D-1)\times 1}= alr({\varvec{y}}):= \left[ \log \left( \frac{y_1}{y_D}\right) , \ldots , \log \left( \frac{y_{D-1}}{y_{D}}\right) \right] , \end{aligned}$$
(2)

where D is the reference category. Greenacre et al. (2023) outlined some criteria for selecting the reference category: they recommended choosing as reference the category whose logarithm has low variance, and avoiding a reference with low relative abundance across samples. The new variables generated are called alr-coordinates. The inverse transformation, denoted \(alr^{-1}\), is

$$\begin{aligned} alr^{-1}({\varvec{z}}) = \left[ \frac{\exp {(z_1)}}{1 + \sum _{d=1}^{D-1} \exp {(z_d)}}, \ldots , \frac{\exp {(z_{D-1})}}{1+ \sum _{d=1}^{D-1} \exp {(z_d)}}, \frac{1}{1+ \sum _{d=1}^{D-1} \exp {(z_d)}} \right] . \end{aligned}$$
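
Purely as an illustration (this snippet is ours, not from the original paper), both mappings take a few lines of R; the names alr and alr_inv are our own:

```r
# alr transformation: the last column of y acts as the reference category
alr <- function(y) log(y[, -ncol(y), drop = FALSE] / y[, ncol(y)])

# Inverse alr: back from R^(D-1) to the simplex S^D
alr_inv <- function(z) {
  e <- cbind(exp(z), 1)
  e / rowSums(e)
}

y <- matrix(c(0.2, 0.3, 0.5, 0.1, 0.6, 0.3), ncol = 3, byrow = TRUE)
all.equal(alr_inv(alr(y)), y, check.attributes = FALSE)  # TRUE
```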

In addition to Aitchison geometry, several probability distributions have also been characterised in \(\mathbb {S}^D\) (Figueras et al. 2003), although here we focus on the normal distribution on the simplex or logistic-normal distribution, and the Dirichlet distribution.

2.2 Logistic-normal distribution and Dirichlet distribution

The logistic-normal distribution was defined by Aitchison and Shen (1980) and was studied in depth in Aitchison (1986). A D-dimensional random vector \({\varvec{y}}\) is said to have a logistic-normal distribution \({{\mathcal {L}}}{{\mathcal {N}}}(\varvec{\mu }, \varvec{\Sigma })\), or alternatively a normal distribution on \(\mathbb {S}^D\), if its vector of log-ratio coordinates has a joint \((D-1)\)-variate normal distribution. This definition can be applied directly to a CoDa response using alr-coordinates:

$$\begin{aligned} {\varvec{y}} \mid \varvec{\mu }, \varvec{\Sigma } \sim {{\mathcal {L}}}{{\mathcal {N}}}(\varvec{\mu }, \varvec{\Sigma }) \Longleftrightarrow alr({\varvec{y}}) \mid \varvec{\mu }, \varvec{\Sigma } \sim {\mathcal {N}}(\varvec{\mu }, \varvec{\Sigma }), \end{aligned}$$
(3)

\(\varvec{\mu }\) being a \((D-1)\)-dimensional mean vector and \(\varvec{\Sigma }\) a \((D-1) \times (D-1)\) covariance matrix. The Dirichlet distribution, in turn, was introduced in Connor and Mosimann (1969) and is the generalisation of the widely known beta distribution. A D-dimensional random vector \({\varvec{y}}\) is said to have a Dirichlet distribution \({\mathcal {D}}(\varvec{\alpha })\) if it has the following probability density:

$$\begin{aligned} p(\varvec{y} \mid \varvec{\alpha })= \frac{1}{\text {B}(\varvec{\alpha })} \prod _{d=1}^D y_d^{\alpha _d -1} , \end{aligned}$$
(4)

\(\varvec{\alpha } = (\alpha _1, \ldots , \alpha _D)\) being the vector of shape parameters for each category, with \(\alpha _d>0\) \(\forall d\), \(y_d \in (0,1)\), \(\sum _{d=1}^D y_d=1\), and \(\text {B}(\varvec{\alpha })\) the multinomial Beta function, which serves as the normalising constant. The multinomial Beta function is defined as \(\text {B}(\varvec{\alpha })=\prod _{d=1}^D \Gamma (\alpha _d)/ \Gamma (\sum _{d=1}^D \alpha _d)\). The sum of all \(\alpha \)’s, \(\alpha _0=\sum _{d=1}^D \alpha _d\), is usually interpreted as a precision parameter. The beta distribution is the particular case when \(D=2\), and each component is marginally beta distributed with \(\alpha =\alpha _d\) and \(\beta =\alpha _0-\alpha _d\). If \(\varvec{y} \sim {\mathcal {D}}(\varvec{\alpha })\), the expected values are \(\text {E}(y_d)=\alpha _d/\alpha _0\), the variances are \(\text {Var}(y_d)=[\alpha _d(\alpha _0 - \alpha _d)]/[\alpha _0^2(\alpha _0 + 1)]\) and the covariances are \(\text {Cov}(y_d, y_{d'})=-\alpha _d \alpha _{d'}/[\alpha _0^2(\alpha _0 + 1)]\).
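
These moments are easy to check by simulation through the gamma representation of the Dirichlet; a minimal sketch in base R:

```r
# Dirichlet(alpha) draws via normalised independent gamma variables
set.seed(1)
alpha <- c(2, 3, 5); a0 <- sum(alpha); n <- 1e5
g <- sapply(alpha, function(a) rgamma(n, shape = a))
y <- g / rowSums(g)

colMeans(y)   # approx. alpha / a0
var(y[, 1])   # approx. alpha[1] * (a0 - alpha[1]) / (a0^2 * (a0 + 1))
```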

2.3 Relation between the two distributions

As pointed out in Aitchison (1986, 126–129), the logistic-normal and the Dirichlet distribution are separate in the sense that they are never exactly equal for any choice of parameters. However, the Dirichlet distribution can be approximated by a logistic-normal through the Kullback–Leibler divergence (KL), which measures by how much an approximation q misses a target p. The KL divergence

$$\begin{aligned} K(p, q) = \int _{\mathbb {S}^D} p({\varvec{y}} \mid \varvec{\alpha }) \log \left( \frac{p({\varvec{y}} \mid \varvec{\alpha })}{q({\varvec{y}} \mid \varvec{\mu }, \varvec{\Sigma })} \right) d {\varvec{y}}, \end{aligned}$$
(5)

where \(p({\varvec{y}} \mid \varvec{\alpha })\) represents the density function of the Dirichlet and \(q({\varvec{y}} \mid \varvec{\mu }, \varvec{\Sigma })\) the logistic-normal density function, is minimised by:

$$\begin{aligned} \begin{array}{rcl} \varvec{\mu } & = & {\varvec{E}}\left[ \log \left( \frac{y_{1}}{y_D}\right) , \ldots , \log \left( \frac{y_{D-1}}{y_D}\right) \right] = {\varvec{E}}\left[ alr({\varvec{y}}) \right] , \\ \varvec{\Sigma } & = & \varvec{Var}\left[ \log \left( \frac{y_{1}}{y_D}\right) , \ldots , \log \left( \frac{y_{D-1}}{y_D}\right) \right] = \varvec{Var}\left[ alr({\varvec{y}}) \right] , \end{array} \end{aligned}$$
(6)

and the solution can be written in terms of the digamma \(\phi \) and trigamma \(\phi '\) functions as:

$$\begin{aligned} \begin{array}{rcl} \mu _d & = & \phi (\alpha _d) - \phi (\alpha _D), \quad d = 1, \ldots , D-1, \\ \Sigma _{dd} & = & \phi '(\alpha _d) + \phi '(\alpha _D), \quad d = 1, \ldots , D-1, \\ \Sigma _{dk} & = & \phi '(\alpha _D), \quad d \ne k. \end{array} \end{aligned}$$
(7)
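
Since the digamma and trigamma functions are available in base R as digamma() and trigamma(), the matching in Eq. (7) is straightforward to compute. The helper below, with the hypothetical name ln_approx, is a minimal sketch:

```r
# Logistic-normal (mu, Sigma) that best approximates a Dirichlet(alpha)
# in the Kullback-Leibler sense, following Eq. (7)
ln_approx <- function(alpha) {
  D <- length(alpha)
  mu <- digamma(alpha[-D]) - digamma(alpha[D])
  Sigma <- matrix(trigamma(alpha[D]), D - 1, D - 1)
  diag(Sigma) <- trigamma(alpha[-D]) + trigamma(alpha[D])
  list(mu = mu, Sigma = Sigma)
}

ln_approx(c(2, 3, 5))
```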

This approach plays an important role in this paper, as it constitutes the basis for defining logistic-normal regression with Dirichlet covariance. But first we introduce the model framework in which this likelihood is included, that is, Latent Gaussian Models (LGMs, Rue et al. 2009).

3 LGMs and INLA

The popularity of INLA lies in the fact that it allows fast approximate inference for LGMs. Furthermore, the INLA software is experiencing a new era, facilitated by the integration of novel techniques from Bayesian variational inference (Niekerk and Rue 2021; Van Niekerk et al. 2023) and enhanced computational optimisation, leading to improved parallel performance (Gaedke-Merzhäuser et al. 2023). This section briefly introduces the structure of LGMs and how INLA performs inference and prediction in light of these new advances.

3.1 LGMs

A new formulation of INLA is presented in Van Niekerk et al. (2023), and we follow it to introduce the main notions. LGMs can be seen as three-stage hierarchical Bayesian models in which observations \(\varvec{y}_{N \times 1}\) are assumed to be conditionally independent given a latent Gaussian random field \({\mathcal {\varvec{X}}}\) and hyperparameters \(\varvec{\theta }_1\):

$$\begin{aligned} \varvec{y} \mid {\mathcal {\varvec{X}}}, \varvec{\theta }_1 \sim \prod _{n=1}^N p(y_n \mid {\mathcal {\varvec{X}}},\varvec{\theta }_1). \end{aligned}$$
(8)

The versatility of the model class is related to the specification of the latent Gaussian field:

$$\begin{aligned} \begin{aligned} {\mathcal {\varvec{X}}} \mid \varvec{\theta }_2 \sim {\mathcal {N}}(\varvec{0}, \varvec{Q}^{-1}(\varvec{\theta }_2)), \end{aligned} \end{aligned}$$
(9)

which includes all the latent (non-observable) components of interest, such as fixed effects and random terms, describing the process underlying the data. The hyperparameters \(\varvec{\theta }=\{\varvec{\theta }_1, \varvec{\theta }_2\}\) control the latent Gaussian field and/or the likelihood for the data.

Additionally, LGMs generalise a large number of related variants of additive and generalised linear models. If \(\varvec{\eta }_{N \times 1}\) is a column vector representing the linear predictor, then different effects can be added to it:

$$\begin{aligned} \varvec{\eta }_{N \times 1} = {\varvec{X}} \varvec{\beta } + \sum _{l = 1}^L f_l(\varvec{u}_l) \, \end{aligned}$$
(10)

where \({\varvec{X}}\) is the design matrix for the fixed part (including a first column of 1s if intercepts are added to the model), and \(\varvec{\beta }_{(M + 1) \times 1}\) is a column vector for the linear effects of \(\varvec{X}\) on \(\varvec{\eta }\). \(\{\varvec{f}\}\) are unknown functions of \(\varvec{U}\). This formulation covers any model in which each of the \(f_l(\cdot )\) terms can be written in matrix form as \({\varvec{A}}_l\varvec{u}_l\). Expression (10) can thus be rewritten as \(\varvec{\eta } = {\varvec{A}} {\mathcal {\varvec{X}}}\), with \({\varvec{A}}\) a sparse design matrix that links the linear predictors to the latent field.

When we do inference, the aim is to estimate \({\mathcal {\varvec{X}}}_{(M + 1 + L) \times 1} = \{\varvec{\beta }, \varvec{f}\}\), which represents the set of unobserved latent variables (the latent field). If a Gaussian prior is assumed for \(\varvec{\beta }\) and \(\varvec{f}\), the joint prior distribution of \({\mathcal {\varvec{X}}}\) is Gaussian. This yields the latent field \({\mathcal {\varvec{X}}}\) in the hierarchical LGM formulation. The vector of hyperparameters \(\varvec{\theta }\) contains the non-Gaussian parameters of the likelihood and the model components; these commonly include variance, scale or correlation parameters.

In most cases the latent field, in addition to being Gaussian, is also a Gaussian Markov random field (GMRF, Rue and Held 2005). A GMRF is a multivariate Gaussian random variable with additional conditional independence properties: \(x_i\) and \(x_j\) are conditionally independent given the remaining elements if and only if the \((i,j)\) entry of the precision matrix is 0. Implementations of the INLA method use this property to speed up computation.

3.2 INLA

The main idea of the INLA approach is to approximate the posteriors of interest: the marginal posteriors for the latent field, \(p({\mathcal {X}}_m \mid \varvec{y})\), and the marginal posteriors for the hyperparameters, \(p(\theta _k \mid \varvec{y})\). In the modern formulation (Van Niekerk et al. 2023), the main enhancement is that the latent field is no longer augmented with the ‘noisy’ linear predictors. The joint density of the latent field, hyperparameters and data is then:

$$\begin{aligned} p({\mathcal {\varvec{X}}}, \varvec{\theta } \mid {\varvec{y}}) \propto p(\varvec{\theta }) p({\mathcal {\varvec{X}}} \mid \varvec{\theta }) \prod _{n = 1}^N p(y_n \mid ({\varvec{A}} {\mathcal {\varvec{X}}})_n, \varvec{\theta }). \end{aligned}$$
(11)

Thus, the initial step in approaching the posterior distributions involves determining the mode and the Hessian at the mode of \({\tilde{p}}(\varvec{\theta } \mid \varvec{y})\):

$$\begin{aligned} {\tilde{p}}(\varvec{\theta } \mid \varvec{y}) \propto \frac{p({\mathcal {\varvec{X}}}, \varvec{\theta } \mid \varvec{y})}{p_{G}({\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y})} \bigg |_{{\mathcal {\varvec{X}}} = \varvec{\mu (\theta )}}. \end{aligned}$$
(12)

being \(p_{G}\left( {\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y}\right) \) the Gaussian approximation to \(p({\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y})\), computed as described in Van Niekerk et al. (2023):

$$\begin{aligned} \begin{aligned} {\mathcal {\varvec{X}}} \mid \varvec{\theta }, \varvec{y} \sim {\mathcal {N}}(\varvec{\mu } (\varvec{\theta }), \varvec{Q}_{{\mathcal {\varvec{X}}}}^{-1}(\varvec{\theta })). \end{aligned} \end{aligned}$$
(13)

The subsequent step involves obtaining the conditional posterior distributions of the elements of \({\mathcal {\varvec{X}}}\). To achieve this, it suffices to integrate \(\varvec{\theta }\) out of (13) using T integration points \(\theta _t\) and area weights \(\delta _t\) defined by some numerical integration scheme:

$$\begin{aligned} {\tilde{p}}({\mathcal {X}}_m \mid \varvec{y}) = \int p_{G}({\mathcal {X}}_m \mid \varvec{\theta }, \varvec{y}) \, {\tilde{p}}(\varvec{\theta } \mid \varvec{y}) \, d \varvec{\theta } \approx \sum _{t = 1}^T p_{G}({\mathcal {X}}_m \mid \theta _t, \varvec{y}) \, {\tilde{p}}(\theta _t \mid \varvec{y}) \, \delta _t. \end{aligned}$$
(14)

Finally, the recently proposed Variational Bayes correction to Gaussian means (Niekerk and Rue 2021) is used to efficiently compute an improved mean for the marginal posterior of the latent field. All this methodology can be used from R through the R-INLA package. For more details about R-INLA we refer the reader to Blangiardo and Cameletti (2015), Zuur et al. (2017), Wang et al. (2018), Krainski et al. (2018), Moraga (2019), Gómez-Rubio (2020) and Van Niekerk et al. (2023), where practical examples and code guidelines are provided.

4 INLA for fitting logistic-normal regression with Dirichlet covariance

This part of the paper focuses on presenting our approach for fitting CoDa models.

4.1 Bayesian logistic-normal regression with Dirichlet covariance

To define the likelihood we need the logistic-normal distribution and the structure of the variance–covariance matrix presented in Eq. (7).

Definition 1

\({\varvec{y}} \in \mathbb {S}^D\) follows a logistic-normal distribution with Dirichlet covariance \(\mathcal {LND}(\varvec{\mu }, \varvec{\Sigma })\) if and only if \(alr({\varvec{y}}) \sim {\mathcal {N}}(\varvec{\mu }, \varvec{\Sigma })\), and:

$$\begin{aligned} \begin{array}{rcl} \Sigma _{dd} & = & \sigma _d^2 + \gamma , \quad d = 1, \ldots , D-1, \\ \Sigma _{dk} & = & \gamma , \quad d \ne k, \end{array} \end{aligned}$$

where \(\sigma _d^2 + \gamma \) represents the variance of each log-ratio and \(\gamma \) is the covariance between log-ratios.

From now on, we will refer to \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu },\varvec{\Sigma })\) as the multivariate normal with Dirichlet covariance structure, as given in Definition 1. Let \(\varvec{y}\) be a multivariate random variable such that \(\varvec{y} \sim \mathcal {LND}(\varvec{\mu }, \varvec{\Sigma })\), which by definition is equivalent to \(alr(\varvec{y}) \sim {{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\). Because of its easy interpretability in terms of log-ratios with respect to the reference category, we focus on modelling \(alr(\varvec{y})\) as a \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\).

Let \(\varvec{\mu }^{(d)}_{N \times 1}\) be the column vector of linear predictors for the dth alr-coordinate, and \({\varvec{X}}^{(d)}\), with dimension \(N \times (M^{(d)} + 1)\), \(d = 1, \ldots , D-1\), the design matrix, which can be different for each alr-coordinate; in other words, each alr-coordinate can be explained by different covariates. Let \({\varvec{f}}^{(d)}\) be a set of \(L^{(d)}\) unknown functions of \({\varvec{U}}\) that can also vary with the alr-coordinate. For the sake of simplicity, and without loss of generality, we assume \(M^{(d)} = M\) and \(L^{(d)}=L\), fixing the number of covariates and functions to be the same in each linear predictor. Finally, we define \(\varvec{\beta }^{(d)}_{(M+1) \times 1}\) as the \((M + 1)\)-dimensional column vector containing the parameters corresponding to the fixed effects, including the intercept.

Then, the logistic-normal Dirichlet model (LNDM) can be expressed as follows:

$$\begin{aligned} alr(\varvec{y}) \sim {{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma }) , \end{aligned}$$
(15)
$$\begin{aligned} \varvec{\mu }^{(d)} = {\varvec{X}} \varvec{\beta }^{(d)} + \sum _{l=1}^L {\varvec{f}}_l^{(d)} (u_l) , \end{aligned}$$
(16)

being \({\mathcal {\varvec{X}}} = \{\varvec{\beta }^{(d)}, {\varvec{f}}^{(d)}; d = 1, \ldots , D - 1\}\,\) the latent field, \(\varvec{\theta }_1 = \{\sigma _d^2, \gamma : d = 1, \ldots , D-1 \}\) the hyperparameters corresponding to the likelihood, and \(\varvec{\theta }_2\) the hyperparameters corresponding to the functions f.

4.2 LNDM in R-INLA

R-INLA was originally implemented under the assumption that each data item is linked to one element of the Gaussian field. Although this restriction disappears in the new INLA era (Van Niekerk et al. 2023), fitting models with multivariate likelihoods remains a challenge. Some approximations exist for the Multinomial likelihood using the Poisson–Laplace trick (Baker 1994), or for the Dirichlet likelihood by converting the original Dirichlet observations into Gaussian pseudo-observations conditioned on the linear predictor (Martínez-Minaya et al. 2023). In our case, the main challenge is to estimate the variance-covariance matrix of the \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\) distribution, in particular \(p(\gamma \mid {\varvec{y}})\). To do so, we adopt the strategy of modelling each alr-coordinate as if we were modelling multiple likelihoods (Krainski et al. 2018), and the covariance hyperparameter is estimated using an independent shared random effect through the following well-known proposition.

Proposition 1

Let \(z_d\), \(d = 1, \ldots , D-1\), be independent Gaussian random variables with means \(\mu _d\) and variances \(\sigma _{d}^2\), and let \(u \sim {\mathcal {N}}(0, \gamma )\) be independent of them. Then, the multivariate random variable \({\varvec{y}}\), defined as:

$$\begin{aligned} \begin{array}{rcl} y_1 & = & z_1 + u, \\ y_2 & = & z_2 + u, \\ \vdots & & \vdots \\ y_{D-1} & = & z_{D-1} + u, \end{array} \end{aligned}$$
(17)

follows a multivariate Gaussian with mean \(\varvec{\mu }\) and covariance matrix \(\varvec{\Sigma }\) whose elements are:

$$\begin{aligned} \begin{array}{rcl} \Sigma _{dd} & = & \sigma _d^2 + \gamma , \quad d = 1, \ldots , D-1, \\ \Sigma _{dj} & = & \gamma , \quad d \ne j. \end{array} \end{aligned}$$

This proposition is simple but powerful: with independent Gaussian distributions and a random effect shared between predictors, \(p(\gamma \mid {\varvec{y}})\) can be easily estimated, and this structure fits perfectly in the context of LGMs. Thus, to fit an LNDM in R-INLA, we only need to add an individual random effect shared between the linear predictors corresponding to the different alr-coordinates.
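
A quick empirical check of Proposition 1 (an illustrative sketch, not code from the paper):

```r
# The shared effect u induces covariance gamma between the coordinates
set.seed(1)
n <- 1e5; sigma2 <- c(0.5, 0.4); gamma <- 0.1
u <- rnorm(n, 0, sqrt(gamma))
y <- cbind(rnorm(n, 0, sqrt(sigma2[1])) + u,
           rnorm(n, 0, sqrt(sigma2[2])) + u)
cov(y)  # approx. matrix(gamma, 2, 2) + diag(sigma2)
```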

4.3 A simulated example

In this section we use a simulated scenario to exemplify the process of fitting CoDa with R-INLA. We start with a simple case featuring only three categories and one covariate, and we assume that the effect of this covariate differs for each linear predictor. We designate this model as a Type II model (the naming is explained in Sect. 5). The model structure used in this example is:

$$\begin{aligned} alr({\varvec{Y}}) \sim {{\mathcal {N}}}{{\mathcal {D}}}((\varvec{\mu }^{(1)}, \varvec{\mu }^{(2)}), \varvec{\Sigma }) , \end{aligned}$$
(18)
$$\begin{aligned} \varvec{\mu }^{(d)} = {\varvec{X}} \varvec{\beta }^{(d)} , \end{aligned}$$
(19)

where \(\varvec{X}_{N \times 2}\) is a matrix with ones in the first column and, in the second, values of the covariate simulated from a Uniform distribution between \(-0.5\) and 0.5. Four different parameters compose the model, forming the latent field \({\mathcal {\varvec{X}}} = \{\beta _0^{(1)}, \beta _0^{(2)}, \beta _1^{(1)}, \beta _1^{(2)}\}\). Moreover, three hyperparameters are included in the model, forming the set \(\varvec{\theta } = \{\sigma _1^2, \sigma _2^2, \gamma \}\).

4.3.1 Data simulation

In this part of the manuscript, we present an example of how the simulation can be conducted. First of all, we define the values of the hyperparameters and build the covariance matrix \(\varvec{\Sigma }\). The chosen values for the simulation are \(N = 1000\), \(D = 3\), \(\sigma _1^2 = 0.5\), \(\sigma _2^2=0.4\) and \(\gamma = 0.1\).

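The original code listings did not survive extraction, so throughout this section we provide minimal R sketches consistent with the text; all object names are our own illustrative choices. First, the hyperparameters and the covariance matrix:

```r
N <- 1000
D <- 3
sigma2 <- c(0.5, 0.4)  # sigma_1^2 and sigma_2^2
gamma  <- 0.1

# Covariance matrix with Dirichlet structure (Definition 1)
Sigma <- matrix(gamma, nrow = D - 1, ncol = D - 1)
diag(Sigma) <- sigma2 + gamma
```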

The correlation matrix can also be easily computed. It contains \(((D-1)^2 - (D-1))/2\) distinct values off the diagonal.

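A sketch:

```r
# Correlation matrix implied by Sigma
R <- cov2cor(Sigma)
```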

The next step is to simulate the covariate.

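For instance:

```r
set.seed(314)  # arbitrary seed for reproducibility
x <- runif(N, min = -0.5, max = 0.5)
```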

Subsequently, with fixed betas \(\beta _0^{(1)} = -1\), \(\beta _1^{(1)} = 1\), \(\beta _0^{(2)} = -1\), \(\beta _1^{(2)} = 2\), we construct the values of the two linear predictors.

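A sketch:

```r
beta0 <- c(-1, -1)  # intercepts of the two linear predictors
beta1 <- c(1, 2)    # covariate effects
mu <- cbind(beta0[1] + beta1[1] * x,
            beta0[2] + beta1[2] * x)
```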

The next step is to simulate from a multivariate Gaussian with the structure previously constructed; this gives us the alr-coordinates.

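For example, using MASS::mvrnorm():

```r
library(MASS)

# alr-coordinates: linear predictors plus correlated Gaussian noise
z <- mu + mvrnorm(N, mu = c(0, 0), Sigma = Sigma)
```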

Finally, we move to the simplex, taking the third category as the reference one. The output is a matrix containing the response variable, with rows summing to one. We create a data.frame to keep the CoDa, the alr-coordinates and the covariate x. The generated CoDa and alr-coordinates are depicted in Fig. 1.

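A sketch of the back-transformation:

```r
# Inverse alr with the third category as reference
denom <- 1 + exp(z[, 1]) + exp(z[, 2])
y <- cbind(exp(z[, 1]) / denom, exp(z[, 2]) / denom, 1 / denom)

df <- data.frame(y = y, z = z, x = x)
```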
Fig. 1 Top: simulated CoDa represented in the simplex. Bottom: alr-coordinates in terms of the generated covariate x

4.3.2 Preparing data for being introduced in R-INLA

In this section, the most labour-intensive step is preparing the data to be input into R-INLA. To do this, we make use of structures like inla.stack. In this structure we need to include the multivariate response, incorporating the different alr-coordinates. Additionally, we input the covariates, indicating which alr-coordinate they affect, along with an index that assists us in introducing the shared random effect used to estimate the hyperparameter \(\gamma \). We start by defining such an index.

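Continuing with the objects defined above:

```r
# Which alr-coordinate each stacked observation belongs to (1 or 2)
alr.idx <- rep(1:(D - 1), each = N)
```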

Next, we extend the dataset to construct the multivariate response, a matrix of dimension \((N \cdot (D-1)) \times (D-1)\): the first column contains the first alr-coordinate in the first N rows and NAs elsewhere; the second column contains the second alr-coordinate in positions \((N + 1):(2N)\) and NAs elsewhere, and so on.

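For \(D = 3\) this reads:

```r
# Multivariate response in multiple-likelihood format
Y <- matrix(NA, nrow = N * (D - 1), ncol = D - 1)
Y[1:N, 1] <- z[, 1]
Y[(N + 1):(2 * N), 2] <- z[, 2]
```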

In the model, covariates are included as random effects with a large fixed variance. We therefore need the values of the covariates, together with an index indicating the alr-coordinate to which each belongs.

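One possible construction:

```r
x.stacked <- rep(x, D - 1)   # covariate values per stacked row
id.x <- alr.idx              # selects the coordinate-specific slope
id.u <- rep(1:N, D - 1)      # individual index for the shared effect
```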

Finally, we create the inla.stack for estimation, and we are ready to fit the model.

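A sketch of the stack (the effect names are our own):

```r
library(INLA)

stk <- inla.stack(
  data    = list(Y = Y),
  A       = list(1),
  effects = list(list(cat = factor(alr.idx), x = x.stacked,
                      id.x = id.x, id.u = id.u)),
  tag     = "est"
)
```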

4.3.3 Fitting the model

To fit the model, priors must be defined for the parameters and hyperparameters. The priors considered for the parameters are the R-INLA defaults, while PC-priors (Simpson et al. 2017) are considered for the standard deviations and for the square root of the covariance parameter \(\gamma \); in particular, a PC-prior(1, 0.01) was used for \(\sigma _1\), \(\sigma _2\) and \(\sqrt{\gamma }\). The formula to be introduced in R-INLA was:

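One plausible formula under the conventions above (a sketch, not the authors' exact code): coordinate-specific intercepts via cat, coordinate-specific slopes on x via an iid effect with a vague fixed precision, and the shared iid effect id.u whose variance corresponds to \(\gamma \):

```r
formula <- Y ~ -1 + cat +
  # slopes beta_1^(1) and beta_1^(2): iid effect with vague, fixed precision
  f(id.x, x, model = "iid",
    hyper = list(prec = list(initial = log(0.001), fixed = TRUE))) +
  # shared effect inducing the covariance gamma (PC-prior on its sd)
  f(id.u, model = "iid",
    hyper = list(prec = list(prior = "pc.prec", param = c(1, 0.01))))
```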

and the call to R-INLA:

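A sketch, with PC-priors on the Gaussian observation precisions set through control.family:

```r
res <- inla(formula,
            family = rep("gaussian", D - 1),
            data = inla.stack.data(stk),
            control.family = rep(list(list(hyper = list(prec = list(
              prior = "pc.prec", param = c(1, 0.01))))), D - 1),
            control.predictor = list(A = inla.stack.A(stk)))

summary(res)
```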

In Figs. 2 and 3, the marginal posterior distributions are depicted together with the simulated values, showing that we were able to recover the original values.

Fig. 2 Marginal posterior distributions for the fixed effects. Vertical lines represent the true values

Fig. 3 Marginal posterior distributions for the hyperparameters. Vertical lines represent the true values

5 Spatial LNDM and model selection

Once the LNDM is defined, a particular focus lies on how more intricate structures within the linear predictor can be accommodated within the R-INLA framework. A further issue is model selection. Hence, this section is dedicated to spatial LNDMs and to the use of measures such as the Deviance Information Criterion (DIC), the Watanabe-Akaike information criterion (WAIC) and the LCPO for model selection.

5.1 Spatial LNDMs

Of particular interest are LNDMs in the spatial context. Spatial analysis refers to the analysis of data collected in space, which can be indexed over a discrete or a continuous domain. Spatial statistics is thus traditionally divided into three main areas depending on the type of problem and data: lattice data, geostatistics and point patterns. For a review of models for different types of spatial data, see Haining (2003) and Cressie and Wikle (2015). When a spatial effect has to be included in the model, it is common to formulate mixed-effects regression models in which the linear predictor is made up of a trend plus a spatial variation, the spatial effect being modelled with correlated random effects, matching perfectly the structure presented in Eq. (16).

R-INLA provides many options for implementing Gaussian latent spatial effects (Gómez-Rubio 2020), including intrinsic conditional autoregressive (iCAR) or conditional autoregressive (CAR) models for areal data (Besag et al. 1991), or spatial effects with a Matérn covariance function for continuous processes (Lindgren et al. 2011). In this manuscript we focus on the latter, but the approach is easily applicable to other latent Gaussian effects.

The Matérn covariance function is one of the most widely used in geostatistics due to its flexibility. Although initially it could not be directly incorporated into the R-INLA structure, Lindgren et al. (2011) introduced a solution through the SPDE module, approximating the spatial latent effect with a Matérn covariance function as the solution to a stochastic partial differential equation using the finite element method (FEM). Since then, this methodology has been applied in numerous scientific articles across different areas (Martínez-Minaya et al. 2018).

These effects can easily be included in the LNDM. As we are adopting a multiple-likelihood modelling strategy, we make use of the features that R-INLA provides for fitting multiple likelihoods jointly. The copy feature is intended to share random effects, i.e. to use the same latent effect in different linear predictors; it also allows sharing exactly the same latent effect while adding a proportionality hyperparameter. The replicate feature provides a way to add a different random effect per linear predictor while sharing the same hyperparameters. For details about their implementation, we refer the reader to the website https://www.r-inla.org/ and the books by Krainski et al. (2018) and Gómez-Rubio (2020).
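
Purely as orientation, a hedged sketch of the three options (spde and the spatial index vectors s1, s2 and s are assumed to come from the usual SPDE set-up, and cat denotes coordinate-specific intercepts):

```r
# Shared field (Types III/IV): both predictors use the same realisation
form.shared <- Y ~ -1 + cat + f(s1, model = spde) + f(s2, copy = "s1")

# Proportional fields (Types V/VI): same realisation, scaled by an extra
# proportionality hyperparameter estimated from the data
form.prop <- Y ~ -1 + cat + f(s1, model = spde) +
  f(s2, copy = "s1", fixed = FALSE)

# Different realisations with common hyperparameters (Types VII/VIII)
form.repl <- Y ~ -1 + cat + f(s, model = spde, replicate = alr.idx)
```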

Applying these principles and emphasizing both fixed effects and continuous spatial random effects, the examples presented in this paper follow a systematic framework that leads to the development of eight distinct model types. Then, the model structure employed for the remainder of the paper is as follows:

$$\begin{aligned} alr(\varvec{Y}) \sim {{\mathcal {N}}}{{\mathcal {D}}}((\varvec{\mu }^{(1)}, \ldots , \varvec{\mu }^{(D-1)}), \varvec{\Sigma }), \end{aligned}$$
(20)
$$\begin{aligned} \varvec{\mu }^{(d)} = {\varvec{X}} \varvec{\beta }^{(d)}+\varvec{\omega }^{(d)} , \quad d = 1,\ldots ,D-1 , \end{aligned}$$
(21)

\(\varvec{\mu }^{(d)} = (\mu ^{(d)}_{1}, \ldots , \mu ^{(d)}_{N})\) being the linear predictors for the dth alr-coordinate, and \({\varvec{X}}_{N \times (M + 1)}\) the design matrix, containing 1s in the first column if intercepts are considered in the model. \(\varvec{\omega }^{(d)}\) represents the spatial random effect with Matérn covariance for the dth alr-coordinate, \(\varvec{\omega }^{(d)} \sim {\mathcal {N}}(\varvec{0}, {\varvec{Q}}^{-1}(\sigma _{\varvec{\omega }}, \phi ))\), depending on the standard deviation of the spatial effect \(\sigma _{\varvec{\omega }}\) and its range \(\phi \). \(\varvec{\beta }^{(d)}_{(M + 1)\times 1}\) is the parameter vector corresponding to the fixed effects. The latent field is composed of the parameters corresponding to the fixed effects and the realisations of the random field:

$$\begin{aligned} {\mathcal {\varvec{X}}} = \{\varvec{\beta }^{(d)}, \varvec{\omega }^{(d)}: d = 1, \ldots , (D-1)\}. \end{aligned}$$

In contrast, \(\varvec{\theta }_1 = \{\sigma _d^2, \gamma : d = 1, \ldots , (D-1) \}\) are the hyperparameters corresponding to the likelihood, and \(\varvec{\theta }_2 = \{\sigma _{\varvec{\omega }}, \phi \}\) those corresponding to the spatial random effect. Together they form the set of hyperparameters. Gaussian priors are usually assigned to the fixed effects and PC-priors to the hyperparameters (Simpson et al. 2017).

Based on the model structure defined in Eq. (21), R-INLA offers flexibility by allowing us to introduce fixed and random effects in different ways using the features explained above. For the fixed effects, two different assumptions about the parameters of the different alr-coordinates are plausible. The first is that the effect of the mth covariate is the same for all alr-coordinates, i.e. they share the same parameter: \(\beta _m^{(d)} = \beta _m^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\), \(m = 0, \ldots , M\); we denote it by \(\beta _m\). The second is that the effect of the mth covariate can differ for each alr-coordinate. Note that this is more general, as it includes the case where the effects are equal as well as the case where the linear predictors do not share the same covariates; we denote these by \(\beta _m^{(d)}\).

With regard to the random effects, we distinguish three cases. The first considers the spatial random field to be the same for all linear predictors, i.e. \(\varvec{\omega }^{(d)} = \varvec{\omega }^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\); they share exactly the same spatial term, so we denote it by \(\varvec{\omega }\), as it does not depend on the alr-coordinate. The second assumes the spatial fields to be proportional, in other words \({\varvec{\omega }}^{(d)} = \alpha ^{(d)} {\varvec{\omega }}^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\); we denote this by \(\varvec{\omega }^{(*d)}\). Finally, the third case states that the realisation of the spatial random effect is different for each linear predictor, although they share the same hyperparameters, i.e. \(\varvec{\omega }^{(d)} \ne \varvec{\omega }^{(k)}\), \(d \ne k\), \(d, k = 1,\ldots , (D-1)\), with \(\varvec{\omega }^{(d)} \sim {\mathcal {N}}(0, {\varvec{Q}}^{-1}(\sigma _{\varvec{\omega }}, \phi ))\); we denote this by \(\varvec{\omega }^{(d)}\).

By combining fixed and random terms, we reach eight different structures for the linear predictors (See Table 1 for details about the latent field and hyperparameters):

  • Type I: share the same parameters for fixed effects, and do not include spatial random effects.

  • Type II: have different parameters for fixed effects, and do not include spatial random effects.

  • Type III: share the same parameters for fixed effects, and share the same spatial effect.

  • Type IV: have different parameters for fixed effects, and share the same spatial effect.

  • Type V: share the same parameters for fixed effects, and the spatial effects between linear predictors are proportional. Realisations of the spatial field are the same, but a proportionality hyperparameter is added in all but one of the linear predictors.

  • Type VI: have different parameters for fixed effects, and the spatial effects between linear predictors are proportional. Realisations of the spatial field are the same, but a proportionality hyperparameter is added in all but one of the linear predictors.

  • Type VII: share the same parameters for fixed effects, and different realisations of the spatial effect for each linear predictor. Although realisations of random effects are different, they share the same hyperparameters.

  • Type VIII: have different parameters for fixed effects, and different realisations of the spatial effect for each linear predictor. Although realisations of random effects are different, they share the same hyperparameters.

Table 1 Different structures included in the model in an additive way with their corresponding latent field and the hyperparameters to be estimated

5.2 Model selection and validation

Regarding the model selection process, there can be a large number of models resulting from all the possible combinations of covariates, and combining these with the possible latent effects increases the number of possibilities exponentially. R-INLA has proved fast enough to compute huge numbers of models, as well as different measures that make the model selection process feasible. Such measures include the Deviance Information Criterion (Spiegelhalter et al. 2002, DIC), defined as a hierarchical-modelling generalisation of the Akaike information criterion (AIC); the Watanabe-Akaike information criterion (Watanabe and Opper 2010; Gelman et al. 2014, WAIC), the sum of a term quantifying model fit and a term evaluating model complexity; and the cross-validation measure conditional predictive ordinate (CPO) for evaluating predictive capacity, with its log-score (Pettit 1990; Roos and Held 2011, LCPO). Models with the lowest values of DIC, WAIC or LCPO are preferred over the rest.

However, R-INLA is programmed to handle univariate likelihoods, and the variability added by the inclusion of the new random effect is not taken into account when the deviance is computed. This affects the computation of the DIC and WAIC, so an additional step is needed to calculate them when the response variable follows a multivariate normal distribution. This step must be able to incorporate the elements off the diagonal of the variance–covariance matrix. To achieve this, a post-processing of the model is performed: samples from the joint posterior distribution are obtained with the inla.posterior.sample function, and the likelihood of the multivariate normal distribution is calculated. The remaining calculations for the DIC follow the formula defined in Spiegelhalter et al. (2002), while the WAIC is computed following Watanabe and Opper (2010). Both have been implemented in two R functions, DIC.mult and WAIC.mult, available in the repository https://github.com/jmartinez-minaya/INLAcomp.
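
Purely as an illustration of the post-processing idea behind DIC.mult (the released implementation lives in the repository above), the sketch below assumes that S posterior draws of the linear predictors (eta) and of the covariance matrix (Sigmas) have already been rebuilt from inla.posterior.sample output, and that z holds the observed alr-coordinates:

```r
library(mvtnorm)  # for dmvnorm()

# z: N x (D-1) observed alr-coordinates
# eta: list of S posterior draws of the N x (D-1) linear predictors
# Sigmas: list of S posterior draws of the (D-1) x (D-1) covariance
dic_mult <- function(z, eta, Sigmas) {
  S <- length(eta)
  dev <- sapply(seq_len(S), function(s)
    -2 * sum(dmvnorm(z - eta[[s]], sigma = Sigmas[[s]], log = TRUE)))
  eta_bar   <- Reduce(`+`, eta) / S
  Sigma_bar <- Reduce(`+`, Sigmas) / S
  dev_bar   <- -2 * sum(dmvnorm(z - eta_bar, sigma = Sigma_bar, log = TRUE))
  p_D <- mean(dev) - dev_bar  # effective number of parameters
  c(DIC = dev_bar + 2 * p_D, p_D = p_D)
}
```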

The same does not apply to the CPO, since it is based on the posterior predictive distribution; Appendix A contains a proof that the CPO is unaffected by the approach proposed here. However, we believe that the CPO cannot be calculated in the usual way when dealing with CoDa, and we therefore propose a new definition.

Fig. 4 Simulated CoDa: proportion per category

5.2.1 CPO

In the CoDa cross-validation process, excluding a single category from a CoDa point may not make sense, since CoDa are subject to a constraint: their sum must be 1. This implies that the remaining categories provide valuable information about the category we are excluding. One might think that working in log-ratio coordinates could alleviate this issue, but that is not the case: the reference category is present in all the log-ratios, so we encounter a similar situation, with the remaining log-ratio coordinates providing information about the coordinate removed during cross-validation. In this manner, the concept of friendship emerges: the first alr-coordinate of individual n is a friend of the second alr-coordinate of individual n, and thereby contributes information about it. Hence, to conduct cross-validation for individual n and alr-coordinate d, it is necessary to exclude the values of all alr-coordinates pertaining to that individual. Accordingly, we define the CPO for the nth data point and dth alr-coordinate as:

$$\begin{aligned} \text {CPO}_n^{(d)} = \int p(alr(\varvec{y})_n^{(d)} \mid {\mathcal {\varvec{X}}}, \varvec{\theta }) \, p({\mathcal {\varvec{X}}}, \varvec{\theta } \mid alr(\varvec{y})_{-n}^{\bullet }) \, d{\mathcal {\varvec{X}}} \, d \varvec{\theta } , \end{aligned}$$
(22)

being \(alr(\varvec{y})_n^{(d)}\) the observed value for the nth data point and dth alr-coordinate, while \(alr(\varvec{y})_{-n}^{\bullet }\) represents the observed data in alr-coordinates (\(N-1\) data points with \(D-1\) components each) excluding the nth data point with its corresponding \(D-1\) alr-coordinates. We then easily compute the log-score (Gneiting and Raftery 2007) as:

$$\begin{aligned} \text {LCPO} = -\frac{1}{N \cdot (D-1)} \sum _{d=1}^{D-1} \sum _{n=1}^{N} \log {\left( \text {CPO}_n^{(d)}\right) }. \end{aligned}$$
(23)

6 Continuous spatial data: a simulation study

The goals of this simulation are twofold. Firstly, we seek to assess the reliability of the model selection criteria presented above; as we have pointed out, these metrics play a crucial role in identifying the model that best represents the underlying process. Secondly, we aim to demonstrate the capability of R-INLA to accurately recover the original parameters.

6.1 Simulated data

We simulated from a spatial LNDM of Type VIII, the most flexible structure, since the fixed effects vary by linear predictor and the realizations of the spatial effects differ accordingly. The simulation involved one covariate, simulated from a Uniform distribution between \(-0.5\) and 0.5; two different realizations of a Matérn field on the square [0,10] \(\times \) [0,10] with range \(\phi = 4\) and \(\sigma _{\omega } = 1\) (see Fig. 6); one thousand observations (\(N = 1000\)); and three dimensions (\(D = 3\)). Given that \(D = 3\), applying the alr transformation yields two linear predictors. In the context of Type VIII, and considering that we simulated only one covariate, we have to estimate two parameters, denoted \(\beta _1^{(1)}\) and \(\beta _1^{(2)}\), which were pre-set to 2.27 and \(-2.3\), respectively. Turning to the likelihood hyperparameters, we have two variance hyperparameters, \(\sigma _1^2\) and \(\sigma _2^2\), and one covariance parameter, \(\gamma \), fixed at 0.32, 0.59 and 0.1, respectively. The resulting simulated data are depicted in Fig. 4, and the alr-coordinates using the third category as reference are displayed in Fig. 5. We selected the third category as the reference because it was the one whose logarithm had the lowest variance.

Fig. 5 Additive log-ratio transformation of the CoDa using the third category as the reference

6.2 Model selection

The simulation originates from the Type VIII model, and we fitted the alternative model types (see Table 1), computing the DIC, WAIC and LCPO for each model. The results are shown in Table 2. In all three cases, the Type VIII model consistently exhibits the best fit to our simulated data, with the smallest values across the three evaluation metrics.

Table 2 LNDMs with their corresponding DIC, WAIC and LCPO

6.3 Parameters recovery

As previously discussed, the optimal model is the Type VIII model. It comprises two parameters corresponding to fixed effects, \(\beta _{1}^{(1)}\) and \(\beta _{1}^{(2)}\), plus the realizations of the spatial random effects, which form the latent Gaussian field (\({\mathcal {\varvec{X}}}\)); and three hyperparameters related to the likelihood, \(\sigma _1^2\), \(\sigma _2^2\) and \(\gamma \), plus two hyperparameters associated with the spatial random effects, which form the set of hyperparameters (\(\varvec{\theta }\)).

The 95% credible interval of the parameter \(\beta _1^{(1)}\) is [2.103, 2.4] with a median value of 2.251. In contrast, for the parameter \(\beta _1^{(2)}\), the 95% credible interval is \([-2.469, -2.086]\) with a median value of \(-2.277\). Comparing these intervals with the true parameter values, 2.27 and \(-2.3\) respectively, we conclude that the estimation is accurate enough. A similar pattern emerges for the latent fields with Matérn covariance matrices. In Fig. 6, we depict the original spatial latent fields alongside the medians and estimated 95% credible intervals; once again, we observe a reliable estimation. Finally, we examine the behaviour of the hyperparameters. In Fig. 7, the posterior distributions of the hyperparameters are illustrated together with the true values, and the estimations again align well with the actual values. From these findings, we conclude that the method recovers the true parameter values effectively.

Fig. 6 True values of the latent fields with Matérn covariance matrix used in the simulation, together with the median and 95% credible intervals of the estimated fields

Fig. 7 Marginal posterior distributions for the hyperparameters. Vertical lines represent the true values

7 The case of Arabidopsis thaliana

This section is devoted to showing an application of continuous spatial LNDMs in a real setting.

7.1 The data and the model

We worked with a collection of 301 accessions of the annual plant Arabidopsis thaliana on the Iberian Peninsula. For each accession, the probabilities of belonging to each of the four genetic clusters (GCs) inferred in Martínez-Minaya et al. (2019), namely GC1, GC2, GC3 and GC4, were available (Fig. 8), their sum being 1. We were interested in estimating the probability of membership, which in this particular context can be thought of as the habitat suitability of each genetic cluster. To do so, we employed LNDMs including climate covariates and spatial terms in the linear predictor. In particular, two bioclimatic variables were used to define the climatic part: annual mean temperature (BIO1) and annual precipitation (BIO12). The complete dataset was downloaded from the repository of Martínez-Minaya et al. (2019). Climate covariates were scaled before conducting the analysis.

Fig. 8 Probability of membership of GC1, GC2, GC3 and GC4 on the Iberian Peninsula

As mentioned, four categories were employed in this problem: GC1, GC2, GC3 and GC4, so we dealt with proportions in \(\mathbb {S}^4\). To build the LNDM, we selected GC4 as the reference category because it was the one whose logarithm had the lowest variance. We were thus dealing with a three-dimensional \({{\mathcal {N}}}{{\mathcal {D}}}(\varvec{\mu }, \varvec{\Sigma })\). The transformed data are shown in Fig. 9.

Fig. 9 Additive log-ratio transformation of the proportions of GC1, GC2, GC3 and GC4 on the Iberian Peninsula, using GC4 as the reference category

7.2 Model selection, model fitting and prediction

Model selection was conducted including the intercept and the two climatic covariates, combined with the spatial effects under the different structures presented in Table 1. Eight models were fitted, and the DIC, WAIC and LCPO were computed (Table 3).

In view of the model selection results, and based on DIC and WAIC, the model with Type VIII structure seemed to best represent the process of interest. In contrast, the LCPO indicated that the best model featured a Type VI structure; however, as the difference was just 0.019, we proceeded with the Type VIII model to compute the posterior distributions and make predictions. R-INLA then allowed us to compute the posterior distributions of the fixed effects (Fig. 10) for each alr-coordinate. As we have argued in favour of the alr, these are easy to interpret in terms of ratios.

If we focus on the covariate BIO1 (annual mean temperature), we observe that, in the presence of BIO12, it is relevant, with a probability of 0.972 of the coefficient being lower than 0 in the first alr-coordinate, 0.99 in the second, and 0.99 in the third. Therefore, in all three cases we presume the covariate to be relevant and proceed to interpret the coefficients (Fig. 10). The ratio between the probability of belonging to GC1 and the probability of belonging to GC4 is reduced by approximately 20% when the scaled annual mean temperature increases by one unit. The ratio between the probabilities of belonging to GC2 and to GC4 decreases by 32% when the scaled annual mean temperature increases by one unit. Finally, the ratio between the probabilities of belonging to GC3 and to GC4 decreases by 50% under the same increase.

If we focus on the covariate BIO12 (annual precipitation), we note that, in the presence of BIO1, it is relevant, with a probability of 0.72 of the coefficient being lower than 0 in the first alr-coordinate. The same does not happen for the second and third alr-coordinates, where the probabilities of the coefficient being lower than 0 are 0.43 and 0.46, respectively. As a result, we assume the covariate's relevance only in the first alr-coordinate and proceed to interpret its coefficient (Fig. 10): the ratio between the probabilities of belonging to GC1 and to GC4 decreases by approximately 6% when the scaled BIO12 increases by one unit and BIO1 remains constant.

Table 3 LNDMs with their corresponding DIC, WAIC and LCPO
Fig. 10 Marginal posterior distributions for the parameters corresponding to the fixed effects for each of the alr-coordinates: BIO1 and BIO12

With the method implemented here, we are able to make predictions not only on the alr-coordinate scale (Fig. 11), but also on the original scale (Fig. 12). Focusing on Fig. 11, we observe that in the north-west of Spain the ratio between the probabilities of belonging to GC1 and to GC4 reaches 12, meaning that at those locations the probability of belonging to GC1 is 12 times greater than that of belonging to GC4. Something similar happens in the north-east of the Iberian Peninsula, where the probability of belonging to GC2 is 12 times greater than that of GC4. The third alr-coordinate behaves somewhat differently, with the greatest difference between the probabilities of belonging to GC3 and to GC4 found in the centre of the Iberian Peninsula.

Fig. 11 Mean and standard deviation of the posterior predictive distribution for the alr-coordinates

Fig. 12 Mean and standard deviation of the posterior predictive distribution for the probability of belonging to GC1, GC2, GC3 and GC4

Finally, the marginal posterior distributions of the hyperparameters, and consequently of the covariance parameter between the alr-coordinates, can also be computed (Fig. 13).

Fig. 13 Marginal posterior distributions for the hyperparameters of the model

8 Conclusions and future work

CoDa are becoming more and more common, especially in the context of genomics, and require increasingly powerful computational tools for their analysis. We believe that including a likelihood that can deal with CoDa in the context of LGMs facilitates inference and prediction. That is why, in this manuscript, we have introduced a different way to perform inference in Bayesian CoDa analysis, placing it in the context of LGMs and thereby making the range of possibilities that R-INLA offers available to the logistic-normal likelihood with Dirichlet covariance.

The main idea underlying the proposed method is to approximate the multivariate likelihood with univariate Gaussian likelihoods sharing an independent random effect, a structure that R-INLA can fit. This idea is similar to the one used for the Multinomial likelihood in R-INLA, where the Poisson trick (Baker 1994) reparameterises the model so that independent Poisson observations are fitted, and to the approach in Martínez-Minaya et al. (2023) that approximates the Dirichlet likelihood using conditionally independent Gaussians. Simpson et al. (2016) also used a similar strategy, constructing a Poisson approximation to the true log-Gaussian Cox process likelihood that makes it possible to carry out inference on a regular lattice over the observation window by counting the number of points in each cell. This work does not intend to substitute the dirinla package (Martínez-Minaya et al. 2023) or the Bayesian ilr approach (Mota-Bertran et al. 2022): it is simply a viable alternative for dealing with CoDa that allows the estimation and prediction of very complex models. Furthermore, functions are provided for the computation of DIC and WAIC within the framework of R-INLA, accompanied by a definition of the CPO for CoDa.

We have reported an example in the field of Ecology, showing the potential of R-INLA when continuous spatial effects are added to the linear predictor. We have exploited the options that R-INLA makes available in the context of multiple likelihoods, such as copy and replicate (Gómez-Rubio 2020), with the aim of showing practitioners the number of models that can be fitted in this context. Although here we have focused mainly on spatial processes, this tool can easily be applied in other contexts (temporal, spatiotemporal, etc.), as long as we express the model in the context of LGMs.

9 Supplementary information

Code: the functions are stored in an R package called INLAcomp, available at https://github.com/jmartinez-minaya/INLAcomp. The results shown in the paper are stored at https://jmartinez-minaya.github.io/supplementary.html.