Appendix
A: Integrated Nested Laplace Approximation (INLA)
Integrated nested Laplace approximation (INLA, Rue et al. 2009) is a powerful methodology that allows the user to fit a variety of Bayesian models. A model can be fitted with INLA if, for a random variable \({\varvec{Y}}\), its mean \({\varvec{\mu }}\) can be modeled through a link function \(g(\cdot )\) in an additive way as:
$$\begin{aligned} g(\mu _i) = \eta _i = \beta _0 + \sum _{j = 1}^{n_{\xi }}\xi ^{(j)}(z_{ji}) + \sum _{k = 1}^{n_{\beta }}\beta _kX_{ki} + \epsilon _i, \end{aligned}$$
(A-1)
where the \(\xi ^{(j)}(z_{ji})\) are unknown functions of the covariates \(z_{ji}\), \(\beta _0\) is an intercept, the \(\beta _k\) are coefficients associated with the fixed effects \(X_{ki}\), and the \(\epsilon _i\) are unstructured terms. INLA assigns Gaussian priors to the vector \({\varvec{u}} = \{\beta _0, {\varvec{\xi }}, {\varvec{\beta }}, {\varvec{\epsilon }}\}\), giving rise to a Gaussian Markov random field (GMRF, Rue and Held 2005). If the latent structure of a model can be written as a GMRF, the INLA methodology can be applied; most common models in the GLMM family can be fitted in this framework.
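To make the structure of the linear predictor in (A-1) concrete, the following Python sketch assembles \(\eta \) for simulated data with one smooth term and two fixed effects; the covariates, the smooth function, and all coefficient values are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200

# Hypothetical covariates: z enters through an unknown smooth function,
# the columns of X enter linearly through fixed-effect coefficients.
z1 = rng.uniform(-2.0, 2.0, size=n)
X = rng.normal(size=(n, 2))

beta0 = 1.0                           # intercept
beta = np.array([0.5, -1.2])          # fixed-effect coefficients (assumed values)
eps = rng.normal(scale=0.1, size=n)   # unstructured terms


def xi1(z):
    """One 'unknown' smooth function; the sine shape is a hypothetical choice."""
    return np.sin(np.pi * z)


# Structured additive predictor of Eq. (A-1) with a single smooth term
eta = beta0 + xi1(z1) + X @ beta + eps

# With an identity link, mu = eta; with a log link, mu = exp(eta).
mu = eta
```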
The vector \({\varvec{u}} = \{\beta _0, {\varvec{\xi }}, {\varvec{\beta }}, {\varvec{\epsilon }} \}\) may depend on hyperparameters \({\varvec{\theta }}\), for example variances and correlation parameters, with, in general, \(\text {dim}({\varvec{u}}) \gg \text {dim}({\varvec{\theta }}) = n_{\theta }\). One must therefore specify a prior distribution for the vector \(\{{\varvec{u}}, {\varvec{\theta }}\}\). INLA assigns priors \(\pi ({\varvec{u}}, {\varvec{\theta }}) = \pi ({\varvec{u}}|{\varvec{\theta }})\pi ({\varvec{\theta }})\), where \(\pi ({\varvec{u}}|{\varvec{\theta }})\) is a GMRF and \(\pi ({\varvec{\theta }})\) may be decomposed as \(\prod _{j = 1}^{n_{\theta }}\pi (\theta _j)\). The marginal posterior distributions for the set of parameters are given by:
$$\begin{aligned} \pi (u_j|{\varvec{y}}) = \int \pi (u_j, {\varvec{\theta }}|{\varvec{y}})d{\varvec{\theta }} = \int \pi (u_j|{\varvec{\theta }}, {\varvec{y}})\pi ({\varvec{\theta }}|{\varvec{y}})d{\varvec{\theta }},\\ \pi (\theta _k|{\varvec{y}}) = \int \pi ({\varvec{\theta }}|{\varvec{y}}) d{\varvec{\theta _{-k}}}. \end{aligned}$$
In the absence of an analytical solution to these integrals, numerical approximations are necessary to obtain \({\tilde{\pi }}(u_j|{\varvec{y}})\) and \({\tilde{\pi }}(\theta _k|{\varvec{y}})\), where \({\tilde{\pi }}(\cdot )\) denotes an approximation to \(\pi (\cdot )\).
Marginal Distribution for \(\theta _k\)
We can rewrite \(\displaystyle \pi ({\varvec{\theta }}|{\varvec{y}}) = \frac{\pi ({\varvec{u}}, {\varvec{\theta }}|{\varvec{y}})}{\pi ({\varvec{u}}| {\varvec{\theta }}, {\varvec{y}})}\). To approximate this quantity, Rue et al. (2009) suggest a Gaussian approximation for the denominator as:
$$\begin{aligned} {\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}}) \propto \frac{\pi ({\varvec{u}}, {\varvec{\theta }}, {\varvec{y}})}{\pi _G({\varvec{u}}| {\varvec{\theta }}, {\varvec{y}})}\Bigg |_{u = u^{*}({\varvec{\theta }})}, \end{aligned}$$
where \(\pi _G(\cdot )\) is the Gaussian approximation of a density and \(u^{*}({\varvec{\theta }})\) is the mode of \(\pi ({\varvec{u}}| {\varvec{\theta }}, {\varvec{y}})\) for a given \({\varvec{\theta }}\). To obtain the marginal distribution \({\tilde{\pi }}(\theta _k|{\varvec{y}})\), a numerical integration is then conducted. Using a grid of \({\varvec{\theta }}\) values, the marginal is obtained as:
$$\begin{aligned} {\tilde{\pi }}(\theta _k|{\varvec{y}}) \approx \sum _{h=1}^H {\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}})\Delta _{kh}. \end{aligned}$$
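To make these two steps concrete (the Laplace-type approximation of \({\tilde{\pi }}({\varvec{\theta }}|{\varvec{y}})\) at the conditional mode, followed by summation over a grid), the Python sketch below uses a toy conjugate Gaussian model with \({\varvec{\theta }} = (\tau , \kappa )\), the observation and latent precisions. Because this toy model is conditionally Gaussian, the Gaussian approximation of the denominator is exact; the data, priors, and grid ranges are assumptions made purely for illustration.

```python
import numpy as np
from scipy.stats import norm, gamma

# Toy model: y_i | u, tau ~ N(u, 1/tau), u | kappa ~ N(0, 1/kappa),
# theta = (tau, kappa) with (hypothetical) Gamma(1, 1) priors.
rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=50)
n = y.size
prior_tau = gamma(a=1.0, scale=1.0)
prior_kappa = gamma(a=1.0, scale=1.0)

tau_grid = np.linspace(0.3, 3.0, 60)      # grid over theta (assumed ranges)
kappa_grid = np.linspace(0.05, 2.0, 60)
d_tau = tau_grid[1] - tau_grid[0]
d_kappa = kappa_grid[1] - kappa_grid[0]

log_post = np.empty((tau_grid.size, kappa_grid.size))
for i, tau in enumerate(tau_grid):
    for j, kappa in enumerate(kappa_grid):
        prec = n * tau + kappa                  # precision of pi(u | theta, y)
        u_star = n * tau * y.mean() / prec      # its mode u*(theta)
        # Numerator: joint pi(u*, theta, y); denominator: Gaussian approximation
        # of pi(u | theta, y) evaluated at its own mode (exact for this toy model).
        log_num = (norm.logpdf(y, loc=u_star, scale=tau ** -0.5).sum()
                   + norm.logpdf(u_star, loc=0.0, scale=kappa ** -0.5)
                   + prior_tau.logpdf(tau) + prior_kappa.logpdf(kappa))
        log_den = norm.logpdf(u_star, loc=u_star, scale=prec ** -0.5)
        log_post[i, j] = log_num - log_den

post = np.exp(log_post - log_post.max())
post /= post.sum() * d_tau * d_kappa            # normalised pi~(theta | y) on the grid

# Marginal for tau: sum pi~(theta | y) over the kappa grid, weighted by the cell width.
marg_tau = post.sum(axis=1) * d_kappa
```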
Marginal Distribution for \(u_j\)
Rue et al. (2009) propose three different approximations for this quantity: (1) the Gaussian approximation; (2) the Laplace approximation; and (3) the simplified Laplace approximation. The Gaussian approximation is the easiest to obtain but can provide poor results. The Laplace approximation produces better results at the cost of being computationally expensive. The simplified Laplace approximation provides satisfactory results at a reduced computational cost. Taking one of them as the approximation \({\tilde{\pi }}(u_j|{\varvec{\theta }}, {\varvec{y}})\), one can calculate the posterior marginal distribution as:
$$\begin{aligned} {\tilde{\pi }}(u_j|{\varvec{y}}) \approx \sum _{h=1}^H {\tilde{\pi }}(u_j | \theta ^*_h, {\varvec{y}}) {\tilde{\pi }}(\theta ^*_h|{\varvec{y}}) \Delta _h. \end{aligned}$$
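As a hedged illustration of this mixture construction, the Python sketch below uses a toy conjugate Gaussian model with a single hyperparameter: for each grid value \(\theta ^*_h\) it forms the Gaussian approximation of \(\pi (u|\theta ^*_h, {\varvec{y}})\), weights it by \({\tilde{\pi }}(\theta ^*_h|{\varvec{y}})\Delta _h\), and sums over the grid. The model, prior, and grid ranges are assumptions chosen only to keep the example self-contained.

```python
import numpy as np
from scipy.stats import norm, gamma

# Toy model: y_i | u ~ N(u, 1), u | kappa ~ N(0, 1/kappa), with a single
# hyperparameter kappa and a (hypothetical) Gamma(1, 1) prior.
rng = np.random.default_rng(1)
y = rng.normal(loc=1.0, scale=1.0, size=50)
n = y.size
prior_kappa = gamma(a=1.0, scale=1.0)

kappa_grid = np.linspace(0.05, 2.0, 80)   # integration points theta*_h (assumed range)
u_grid = np.linspace(-0.5, 2.0, 400)      # values at which the marginal of u is evaluated

cond = np.empty((kappa_grid.size, u_grid.size))
log_w = np.empty(kappa_grid.size)
for h, kappa in enumerate(kappa_grid):
    prec = n + kappa                       # precision of pi(u | kappa, y)
    u_star = n * y.mean() / prec           # its mode / mean
    # Gaussian approximation of pi(u | theta*_h, y) (exact in this conjugate toy model)
    cond[h] = norm.pdf(u_grid, loc=u_star, scale=prec ** -0.5)
    # Unnormalised pi~(theta*_h | y), computed as in the previous subsection
    log_w[h] = (norm.logpdf(y, loc=u_star, scale=1.0).sum()
                + norm.logpdf(u_star, loc=0.0, scale=kappa ** -0.5)
                + prior_kappa.logpdf(kappa)
                - norm.logpdf(u_star, loc=u_star, scale=prec ** -0.5))

w = np.exp(log_w - log_w.max())
w /= w.sum()                               # weights pi~(theta*_h | y) Delta_h, normalised

# Posterior marginal of u: mixture of the conditional approximations over the grid
post_u = w @ cond
```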
B: Additional Simulation Results
Table 6 presents the simulation results for scenarios SM2 (linear and cubic) and SM3 (cubic), comparing the SCM without the MSPOCK adjustment to the SCM with the adjustment.
Table 6 Simulation results comparing SCM (shared component model, without confounding adjustment) and MSPOCK (shared component model, with confounding adjustment) for scenarios SM2 (linear and cubic) and SM3 (cubic)
C: Widely Applicable Information Criterion
In any application, it is common in practice to have several competing models. These models may differ in the number of parameters and/or in the likelihood, and therefore in complexity. One important aspect to consider is the parsimony principle, which consists in finding a trade-off between model fit and model complexity. In practice, we search for the best fit; however, the best fit does not necessarily come from the most complex model, since complex models may have undesirable properties such as overfitting, high computational cost, and identifiability issues.
Under the Bayesian paradigm, the deviance information criterion (DIC, Spiegelhalter et al. 2002) remains a widely popular metric. However, Gelman et al. (2014) studied and compared different model selection criteria and concluded that the widely applicable information criterion (WAIC, Watanabe 2010) is a promising alternative for this task. To calculate the WAIC, one must compute the log pointwise posterior predictive density (\( {lppd}\)):
$$\begin{aligned} lppd = \log \left( \prod _{i=1}^n \pi _{post}(y_i) \right) = \sum _{i=1}^n \log \left( \int \pi (y_i|{\varvec{u}},{\varvec{\theta }}) \pi _{post}({\varvec{u}},{\varvec{\theta }})\, d{\varvec{u}}\, d{\varvec{\theta }} \right) , \end{aligned}$$
where \(\pi _{post}(\cdot )\) represents the posterior distribution of some quantity. Next, to adjust for possible overfitting, a correction term for the effective number of parameters, \(p_{\text {WAIC}} = \sum _{i=1}^n V(\log \pi (y_i|{\varvec{u}}, {\varvec{\theta }}))\), is subtracted from the \(lppd\), where \(V(\cdot )\) is the posterior variance of the log predictive density. Finally, the WAIC is given by:
$$\begin{aligned} \text {WAIC} = -2 (lppd - p_{\text {WAIC}}). \end{aligned}$$
The model with the smallest WAIC value is considered the best-fitting model for the dataset.
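As a minimal sketch of the WAIC computation, assuming an \(S \times n\) matrix of pointwise log-likelihoods evaluated at posterior draws is available (here both the draws and the Gaussian likelihood are simulated placeholders), one could proceed as follows in Python:

```python
import numpy as np
from scipy.special import logsumexp
from scipy.stats import norm

# Stand-in posterior draws for a Gaussian model; in practice the S x n matrix of
# pointwise log-likelihoods would come from the actual fitted model.
rng = np.random.default_rng(2)
y = rng.normal(loc=1.0, scale=1.0, size=100)
S = 2000
mu_draws = rng.normal(loc=y.mean(), scale=0.1, size=S)          # placeholder draws of the mean
sd_draws = np.abs(rng.normal(loc=1.0, scale=0.05, size=S))      # placeholder draws of the std. dev.

# S x n matrix of log pi(y_i | u^(s), theta^(s))
log_lik = norm.logpdf(y[None, :], loc=mu_draws[:, None], scale=sd_draws[:, None])

# lppd: for each observation, log of the posterior-mean predictive density
lppd = np.sum(logsumexp(log_lik, axis=0) - np.log(S))

# p_WAIC: posterior variance of the pointwise log predictive density
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

waic = -2 * (lppd - p_waic)
print(f"lppd = {lppd:.2f}, p_WAIC = {p_waic:.2f}, WAIC = {waic:.2f}")
```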