
6.1 Bayes Theorem and Bayesian Linear Regression

Unlike classical statistical inference, which assumes that the parameter θ that defines the model is a fixed unknown quantity, Bayesian inference treats θ as a random variable whose variation represents the knowledge, or ignorance, about it before the data points are collected (Box and Tiao 1992). The probability density function that describes this variation is known as the prior distribution, and it is an additional component in the specification of the complete model in a Bayesian framework.

Given a data set y = (y1, …, yn) whose distribution is assumed to be f(y| θ), and a prior distribution f(θ) for the parameter θ, Bayesian analysis uses Bayes' theorem to combine these two pieces of information into the posterior distribution of the parameters, on which the inference is fully based (Christensen et al. 2011):

$$ f\left(\boldsymbol{\theta} |\boldsymbol{y}\right)=\frac{f\left(\boldsymbol{y},\boldsymbol{\theta} \right)}{f\left(\boldsymbol{y}\right)}=\frac{f\left(\boldsymbol{\theta} \right)f\left(\boldsymbol{y}|\boldsymbol{\theta} \right)}{f\left(\boldsymbol{y}\right)}\propto f\left(\boldsymbol{\theta} \right)L\left(\boldsymbol{\theta}; \boldsymbol{y}\right), $$

where f(y) =  ∫ f(y| θ)f(θ)dθ = Eθ[f(y| θ)] is the marginal distribution of y. This conditional distribution describes what is known about θ after the data are collected and can be thought of as the prior knowledge about θ updated with the information contained in the data, which enters through the likelihood function L(θ; y) (Box and Tiao 1992).

In general, because the posterior distribution does not always have a recognizable form and it is often not easy to simulate from it, numerical approximation methods are employed. Once a sample from the posterior distribution is obtained, a parameter estimate is usually computed by averaging the sampled values, or by averaging a function of the sampled values when another quantity is of interest. For example, in genomic prediction with dense molecular markers, the main interest is to predict the trait of interest for the non-phenotyped individuals that have only genotypic information, environmental variables, or other information (covariates). In this situation, a convenient practice is to include the individuals to be predicted (yp) in the posterior distribution to be sampled.
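In practice this can be done, for example, with the BGLR R package used throughout this chapter: the phenotypes of the individuals to be predicted are set to NA and are then sampled together with the model parameters. A minimal sketch, assuming a phenotype vector y and a marker matrix X1 are already in memory:

library(BGLR)
tst <- sample(seq_along(y), size = round(0.2 * length(y)))  # individuals to predict
yNA <- y
yNA[tst] <- NA                                   # treat their records as unknown
ETA <- list(list(model = 'BRR', X = X1))
fm <- BGLR(y = yNA, ETA = ETA, nIter = 1e4, burnIn = 1e3, verbose = FALSE)
fm$yHat[tst]                                     # posterior predictive means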

Specifically, a standard Bayesian framework for a normal linear regression model (see Chap. 3)

$$ Y={\beta}_0+\sum \limits_{j=1}^p{X}_j{\beta}_j+\epsilon $$
(6.1)

with ϵ a random error normally distributed with mean 0 and variance σ2, is fully specified by assuming the following non-informative prior distribution, under which β and log(σ) are approximately independent and locally uniform:

$$ f\left(\boldsymbol{\beta}, {\sigma}^2\right)\propto {\sigma}^{-2} $$
(6.2)

which is not a proper distribution because it does not integrate to 1 (Box and Tiao 1992; Gelman et al. 2013). However, when X is of full column rank, the posterior distribution is a proper distribution and is given by

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{\beta}, {\sigma}^2|\boldsymbol{y},\boldsymbol{X}\right)\propto {\left({\sigma}^2\right)}^{-\frac{n}{2}}\exp \left[-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } \right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } \right)\right]{\left({\sigma}^2\right)}^{-1}\\ {}\propto {\left({\sigma}^2\right)}^{-\frac{n}{2}}\exp \left[-\frac{1}{2{\sigma}^2}\left({\boldsymbol{\beta}}^{\mathrm{T}}{\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\boldsymbol{\beta } -2{\boldsymbol{y}}^{\mathrm{T}}\boldsymbol{X}\boldsymbol{\beta } +{\boldsymbol{y}}^{\mathrm{T}}\boldsymbol{y}\right)\right]{\left({\sigma}^2\right)}^{-1}\\ {}\propto {\left({\sigma}^2\right)}^{-\frac{p+1}{2}}\exp \left[-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{\beta} -\tilde{\boldsymbol{\beta}}\right)}^{\mathrm{T}}{\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\left(\boldsymbol{\beta} -\tilde{\boldsymbol{\beta}}\right)-\frac{1}{2{\sigma}^2}\left({\boldsymbol{y}}^{\mathrm{T}}\boldsymbol{y}-{\tilde{\boldsymbol{\beta}}}^{\mathrm{T}}{\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\tilde{\boldsymbol{\beta}}\right)\right]{\left({\sigma}^2\right)}^{-1-\left(n-p-1\right)/2}\\ {}\propto {\left({\sigma}^2\right)}^{-\frac{p+1}{2}}\exp \left[-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{\beta} -\tilde{\boldsymbol{\beta}}\right)}^{\mathrm{T}}{\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\left(\boldsymbol{\beta} -\tilde{\boldsymbol{\beta}}\right)\right]{\left({\sigma}^2\right)}^{-1-\left(n-p-1\right)/2}\exp\ \left[-\frac{1}{2{\sigma}^2}{\boldsymbol{y}}^{\mathrm{T}}\left(\boldsymbol{I}-\boldsymbol{H}\right)\boldsymbol{y}\right]\\ {}\propto {\left({\sigma}^2\right)}^{-\frac{p+1}{2}}\exp \left[-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{\beta} -\tilde{\boldsymbol{\beta}}\right)}^{\mathrm{T}}{\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\left(\boldsymbol{\beta} -\tilde{\boldsymbol{\beta}}\right)\right]{\left({\sigma}^2\right)}^{-1-\frac{n-p-1}{2}}\exp \left(-\frac{\left(n-p-1\right){\tilde{\sigma}}^2}{2{\sigma}^2}\right),\end{array}} $$

where \( \overset{\sim }{\boldsymbol{\beta}}={\left({\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\right)}^{-1}{\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{y} \), \( {\overset{\sim }{\sigma}}^2=\frac{1}{n-p-1}{\boldsymbol{y}}^{\mathrm{T}}\left(\boldsymbol{I}-\boldsymbol{H}\right)\boldsymbol{y}, \) and H = X(XTX)−1XT. From this, the marginal posterior distribution of σ2 is \( {\sigma}^2\mid \boldsymbol{y},\boldsymbol{X}\sim IG\left(\left(n-p-1\right)/2,\frac{\left(n-p-1\right){\overset{\sim }{\sigma}}^2}{2}\right) \) with mean \( \frac{\left(n-p-1\right){\overset{\sim }{\sigma}}^2}{n-p-3} \), and given σ2, the conditional posterior distribution of β is \( \boldsymbol{\beta} \mid {\sigma}^2,\boldsymbol{y},\boldsymbol{X}\sim N\left(\overset{\sim }{\boldsymbol{\beta}},{\sigma}^2{\left({\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\right)}^{-1}\right). \)
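Because this posterior factors as f(σ2| y, X)f(β| σ2, y, X), exact Monte Carlo draws from it are straightforward. A minimal R sketch, assuming y and a full-column-rank design matrix X (including the intercept column) are in memory:

# Exact draws from the posterior under the non-informative prior (6.2).
n <- nrow(X); p <- ncol(X) - 1
XtX <- crossprod(X)
beta_hat <- solve(XtX, crossprod(X, y))
df <- n - p - 1
s2_tilde <- sum((y - X %*% beta_hat)^2) / df
B <- 5000
sigma2 <- df * s2_tilde / rchisq(B, df = df)   # sigma^2 | y: scaled inv-Chi-square
beta <- t(sapply(sigma2, function(s2)          # beta | sigma^2, y: multivariate normal
  MASS::mvrnorm(1, as.vector(beta_hat), s2 * solve(XtX))))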

6.2 Bayesian Genome-Based Ridge Regression

When p > n, X is not of full column rank and the posterior under model (6.1) may not be proper (Gelman et al. 2013), so a solution is instead to adopt independent proper prior distributions, \( \boldsymbol{\beta} \sim N\left(\mathbf{0},{\sigma}_0^2\boldsymbol{I}\right) \) and σ2 ∼ IG(α0, α0), which for large values of \( {\sigma}_0^2 \) (e.g., \( {10}^6 \)) and small values of α0 (e.g., \( {10}^{-3} \)) approximate the standard non-informative prior given in (6.2) (Christensen et al. 2011). A similar prior specification is adopted in genomic prediction, where different models are obtained by using different prior distributions for the parameters. For example, the Bayesian Ridge Regression (BRR; Pérez and de los Campos 2014) with standardized covariates (Xj) is given by

$$ Y=\mu +\sum \limits_{j=1}^p{X}_j{\beta}_j+\epsilon $$
(6.3)

with a flat prior for the mean parameter (μ), f(μ) ∝ 1, which can be approximately specified as \( \mu \sim N\left(0,{\sigma}_0^2\right) \) with a large value of \( {\sigma}_0^2 \) (e.g., \( {10}^{10} \)); a multivariate normal prior with mean vector 𝟎 and covariance matrix \( {\sigma}_{\beta}^2{\boldsymbol{I}}_p \) for the beta coefficients, \( {\boldsymbol{\beta}}_0={\left({\beta}_1,\dots, {\beta}_p\right)}^{\mathrm{T}}\mid {\sigma}_{\beta}^2\sim {N}_p\left(\mathbf{0},{\boldsymbol{I}}_p{\sigma}_{\beta}^2\right) \); and scaled inverse Chi-square distributions as priors for the variance components: \( {\sigma}_{\beta}^2\sim {\chi}_{v_{\beta },\kern0.5em {S}_{\beta}}^{-2} \) (prior for the variance of the regression coefficients βj) and \( {\sigma}^2\sim {\chi}_{v,S}^{-2} \) (prior for the variance of the random errors ϵ), where \( {\chi}_{v,S}^{-2} \) denotes the scaled inverse Chi-squared distribution with shape parameter v and scale parameter S. The joint posterior distribution of the parameters of this model, \( \boldsymbol{\theta} ={\left(\mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}^2,{\sigma}_{\beta}^2\right)}^{\mathrm{T}}, \) is given by

$$ {\displaystyle \begin{array}{c}f\left(\mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}_{\beta}^2,{\sigma}^2|\boldsymbol{y},\boldsymbol{X}\right)\propto L\left(\mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}^2;\boldsymbol{y}\right)f\left(\boldsymbol{\theta} \right)\\ {}\propto L\left(\mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}^2;\boldsymbol{y}\right)f\left(\mu \right)f\left({\boldsymbol{\beta}}_0|{\sigma}_{\beta}^2\right)f\left({\sigma}_{\beta}^2\right)f\left({\sigma}^2\right)\\ {}\propto \frac{1}{{\left(2{\pi \sigma}^2\right)}^{\frac{n}{2}}}\exp \left[-\frac{1}{2{\sigma}^2}{\left\Vert \boldsymbol{y}-{1}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2\right]\times \exp \left(-\frac{1}{2{\sigma}_0^2}{\mu}^2\right)\\ {}\times \frac{1}{{\left({\sigma}_{\beta}^2\right)}^{\frac{p}{2}}}\exp \left(\left[-\frac{1}{2{\sigma}_{\beta}^2}{\boldsymbol{\beta}}_0^{\mathrm{T}}{\boldsymbol{\beta}}_0\right]\right)\times \frac{{\left(\frac{S_{\beta }}{2}\right)}^{\frac{v_{\beta }}{2}}}{\Gamma \left(\frac{v_{\beta }}{2}\right){\left({\sigma}_{\beta}^2\right)}^{1+\frac{v_{\beta }}{2}}}\exp \left(-\frac{S_{\beta }}{2{\sigma}_{\beta}^2}\right)\kern0.5em \\ {}\times \frac{{\left(\frac{S}{2}\right)}^{\frac{v}{2}}}{\Gamma \left(\frac{v}{2}\right){\left({\sigma}^2\right)}^{1+\frac{v}{2}}}\exp \left(-\frac{S}{2{\sigma}^2}\right).\end{array}} $$

This posterior has no known closed form and it is not easy to simulate values from it, so numerical methods are required to explore it. One way to simulate values from this distribution is by means of the Gibbs sampler, which consists of alternately generating samples from the full conditional distribution of each parameter (or block of parameters) given the rest of the parameters (Casella and George 1992).

The full conditional posteriors needed to implement the Gibbs sampler are derived in the lines below.

The conditional posterior distribution of β0 is given by

$$ {\displaystyle \begin{array}{c}f\left({\boldsymbol{\beta}}_0|-\right)\propto L\left(\mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}^2;\boldsymbol{y}\right)f\left({\boldsymbol{\beta}}_0|{\sigma}_{\beta}^2\right)\\ {}\propto \exp \left[-\frac{1}{2{\sigma}^2}{\left\Vert \boldsymbol{y}-{1}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2-\frac{1}{2{\sigma}_{\beta}^2}{\boldsymbol{\beta}}_0^{\mathrm{T}}{\boldsymbol{\beta}}_0\right]\\ {}\propto \exp \left\{-\frac{1}{2}\left[{\boldsymbol{\beta}}_0^{\mathrm{T}}\left({\sigma}_{\beta}^{-2}{\boldsymbol{I}}_p+{\sigma}^{-2}{\boldsymbol{X}}_1^{\mathrm{T}}{\boldsymbol{X}}_1\right){\boldsymbol{\beta}}_0-2{\sigma}^{-2}{\left(\boldsymbol{y}-{1}_n\mu \right)}^{\mathrm{T}}{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right]\right\}\ \\ {}\propto \exp \left\{-\frac{1}{2}\left[{\left({\boldsymbol{\beta}}_0-{\tilde{\boldsymbol{\beta}}}_0\right)}^{\mathrm{T}}{\tilde{\boldsymbol{\Sigma}}}_0^{-1}\left({\boldsymbol{\beta}}_0-{\tilde{\boldsymbol{\beta}}}_0\right)\right]\right\},\ \end{array}} $$

where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_0={\left({\sigma}_{\beta}^{-2}{\boldsymbol{I}}_p+{\sigma}^{-2}{\boldsymbol{X}}_1^{\mathrm{T}}{\boldsymbol{X}}_1\right)}^{-1} \) and \( {\overset{\sim }{\boldsymbol{\beta}}}_0={\sigma}^{-2}{\overset{\sim }{\boldsymbol{\Sigma}}}_0{\boldsymbol{X}}_1^{\mathrm{T}}\left(\boldsymbol{y}-{\mathbf{1}}_n\mu \right) \). That is, \( {\boldsymbol{\beta}}_0\mid -\sim {\boldsymbol{N}}_p\left({\overset{\sim }{\boldsymbol{\beta}}}_0,{\overset{\sim }{\boldsymbol{\Sigma}}}_0\right) \). Similarly, the conditional distribution of μ is \( \mu \mid -\sim \boldsymbol{N}\left(\overset{\sim }{\mu },{\overset{\sim }{\sigma}}_{\mu}^2\right) \), where \( {\overset{\sim }{\sigma}}_{\mu}^2=\frac{\sigma^2}{n} \) and \( \overset{\sim }{\mu }=\frac{1}{n}{\mathbf{1}}_n^{\mathrm{T}}\left(\boldsymbol{y}-{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right) \).

The conditional distribution of σ2 is

$$ {\displaystyle \begin{array}{c}f\left({\sigma}^2|-\right)\propto L\left(\mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}^2;\boldsymbol{y}\right)f\left({\sigma}^2\right)\\ {}\propto \frac{1}{{\left({\sigma}^2\right)}^{\frac{n}{2}}}\exp \left[-\frac{1}{2{\sigma}^2}{\left\Vert \boldsymbol{y}-{1}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2\right]\frac{{\left(\frac{S}{2}\right)}^{\frac{v}{2}}}{\Gamma \left(\frac{v}{2}\right){\left({\sigma}^2\right)}^{1+\frac{v}{2}}}\exp \left(-\frac{S}{2{\sigma}^2}\right)\\ {}\propto \frac{{\left(\frac{\tilde{S}}{2}\right)}^{\frac{\tilde{v}}{2}}}{{\left({\sigma}^2\right)}^{1+\frac{\tilde{v}}{2}}}\exp \left(-\frac{\tilde{S}}{2{\sigma}^2}\right),\end{array}} $$

where \( \overset{\sim }{v}=v+n \) and \( \overset{\sim }{S}=S+{\left\Vert \boldsymbol{y}-{\mathbf{1}}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2 \). So \( {\sigma}^2\mid -\sim {\chi}_{\overset{\sim }{v},\overset{\sim }{S}}^{-2}, \) where \( {\chi}_{v,S}^{-2} \) denotes a scaled inverse Chi-squared distribution with parameters v and S. Similarly, \( {\sigma}_{\beta}^2\mid -\sim {\chi}_{{\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}}^{-2}, \) where \( {\overset{\sim }{v}}_{\beta }={v}_{\beta }+p \) and \( {\overset{\sim }{S}}_{\beta }={S}_{\beta }+{\boldsymbol{\beta}}_0^{\mathrm{T}}{\boldsymbol{\beta}}_0 \).

In summary, for the Ridge regression model, a Gibbs sampler consists of the following steps:

1. Choose initial values for μ, β0, and σ2.

2. Simulate a value from the full conditional distribution of \( {\sigma}_{\beta}^2 \):

$$ {\sigma}_{\beta}^2\mid \mu, {\boldsymbol{\beta}}_0,{\sigma}^2\sim {\chi}_{{\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}}^{-2}, $$

where \( {\chi}_{{\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}}^{-2} \) denotes a scaled inverse Chi-square distribution with shape parameter \( {\overset{\sim }{v}}_{\beta }={v}_{\beta }+p \) and scale parameter \( {\overset{\sim }{S}}_{\beta }={S}_{\beta }+{\boldsymbol{\beta}}_0^{\mathrm{T}}{\boldsymbol{\beta}}_0 \).

3. Simulate a value from the full conditional posterior distribution of β0:

$$ {\boldsymbol{\beta}}_0\mid \mu, {\sigma}_{\beta}^2,{\sigma}^2\sim {\boldsymbol{N}}_p\left({\overset{\sim }{\boldsymbol{\beta}}}_0,{\overset{\sim }{\boldsymbol{\Sigma}}}_0\right), $$

where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_0={\left({\sigma}_{\beta}^{-2}{\boldsymbol{I}}_p+{\sigma}^{-2}{\boldsymbol{X}}_1^{\mathrm{T}}{\boldsymbol{X}}_1\right)}^{-1} \) and \( {\overset{\sim }{\boldsymbol{\beta}}}_0={\sigma}^{-2}{\overset{\sim }{\boldsymbol{\Sigma}}}_0{\boldsymbol{X}}_1^{\mathrm{T}}\left(\boldsymbol{y}-{\mathbf{1}}_n\mu \right) \).

4. Simulate a value from the full conditional distribution of μ:

$$ \mu \mid {\boldsymbol{\beta}}_0,{\sigma}_{\beta}^2,{\sigma}^2\sim \boldsymbol{N}\left(\overset{\sim }{\mu },{\overset{\sim }{\sigma}}_{\mu}^2\right), $$

where \( {\overset{\sim }{\sigma}}_{\mu}^2=\frac{\sigma^2}{n} \) and \( \overset{\sim }{\mu }=\frac{1}{n}{\mathbf{1}}_n^{\mathrm{T}}\left(\boldsymbol{y}-{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right). \)

5. Simulate a value from the full conditional distribution of σ2:

$$ {\sigma}^2\mid \mu, {\boldsymbol{\beta}}_0,{\sigma}_{\beta}^2\sim {\chi}_{\overset{\sim }{v},\overset{\sim }{S}}^{-2}, $$

where \( \overset{\sim }{v}=v+n\ \mathrm{and}\ \overset{\sim }{S}=S+{\left\Vert \boldsymbol{y}-{\mathbf{1}}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2 \).

6. Repeat steps 2–5 until the desired number of samples of the parameter vector (\( \mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\sigma}_{\beta}^2,{\sigma}^2 \)) has been simulated. Usually a large number of iterations is needed; an early part of them (the burn-in) is discarded, and the remaining draws of each parameter are averaged to obtain the estimates. A didactic R implementation of these steps is sketched below.
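The steps above translate almost line by line into R. The following is a didactic sketch (not the BGLR implementation), assuming y, X1, and hyperparameter values stored in the R objects v, S, v_beta, and S_beta are in memory:

# Minimal Gibbs sampler for the BRR model (steps 1-6 above).
rinvchisq <- function(df, S) S / rchisq(1, df = df)   # scaled inv-Chi-square draw
n <- nrow(X1); p <- ncol(X1)
XtX <- crossprod(X1)
nIter <- 10000; burnIn <- 1000
mu <- mean(y); b0 <- rep(0, p); sigma2 <- var(y)      # step 1: initial values
draws <- matrix(NA, nIter, p + 3)
for (it in 1:nIter) {
  sigma2_b <- rinvchisq(v_beta + p, S_beta + sum(b0^2))          # step 2
  Sig0 <- solve(diag(1 / sigma2_b, p) + XtX / sigma2)            # step 3
  bmean <- as.vector(Sig0 %*% crossprod(X1, y - mu)) / sigma2
  b0 <- as.vector(MASS::mvrnorm(1, bmean, Sig0))
  mu <- rnorm(1, mean(y - X1 %*% b0), sqrt(sigma2 / n))          # step 4
  e <- y - mu - X1 %*% b0
  sigma2 <- rinvchisq(v + n, S + sum(e^2))                       # step 5
  draws[it, ] <- c(mu, b0, sigma2_b, sigma2)                     # step 6: store draws
}
estimates <- colMeans(draws[-(1:burnIn), ])   # posterior means after burn-in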

The Gibbs sampler described above is implemented in the BGLR R package: if the hyperparameters (v, S) and (vβ, Sβ) are not specified, by default the BGLR function sets v = vβ = 5 and assigns to S and Sβ values such that the modes of the priors of σ2 and \( {\sigma}_{\beta}^2 \) (scaled inverse Chi-square) match a certain proportion of the total variance (1 − R2 and R2, respectively): S = Var(y) × (1 − R2) × (v + 2) and Sβ = Var(y) × R2 × (vβ + 2) (see Appendix 2 for more details). Explicitly, in BGLR this model can be implemented by running the following R code:

ETA = list(list(model = 'BRR', X = X1, df0 = v_beta, S0 = S_beta, R2 = 1 - R2))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

where nIter = 1e4 and burnIn = 1e3 are the desired number of iterations and the number of them to be discarded when computing the parameter estimates. In this code, the hyperparameters v, S, vβ, and Sβ are assumed to be stored in the R objects v, S, v_beta, and S_beta. Remember that when the hyperparameter values are not given, they are set to the default values previously described.
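For reference, these defaults can be reproduced manually from the formulas above; a small sketch, assuming y and a chosen value of R2 (e.g., 0.5) are in memory:

# Reproducing the default BRR hyperparameters stated above.
v <- 5; v_beta <- 5
S <- var(y) * (1 - R2) * (v + 2)        # prior scale for sigma^2
S_beta <- var(y) * R2 * (v_beta + 2)    # prior scale for sigma_beta^2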

A sub-model of the BRR that does not induce shrinkage of the beta coefficients is obtained by keeping the assumption \( {\left({\beta}_1,\dots, {\beta}_p\right)}^{\mathrm{T}}\mid {\sigma}_{\beta}^2\sim {N}_p\left(\mathbf{0},{\boldsymbol{I}}_p{\sigma}_{\beta}^2\right), \) ignoring the prior distribution of \( {\sigma}_{\beta}^2 \), and fixing \( {\sigma}_{\beta}^2 \) at a very large value (e.g., \( {10}^{10} \)). Note that this model is very similar to the Bayesian model obtained by adopting the prior (6.2), under which the beta coefficients are estimated solely with the information contained in the likelihood function (Pérez and de los Campos 2014). This prior model can also be implemented in the BGLR package, where it is called FIXED. Indeed, the Gibbs sampler steps for its implementation are the same as those described above for the BRR, except that step 2 is removed (no simulations are obtained for \( {\sigma}_{\beta}^2 \)) and \( {\sigma}_{\beta}^{-2} \) is set equal to zero in the full conditional of β0 (step 3).

6.3 Bayesian GBLUP Genomic Model

In genomic-enabled prediction, the number of markers used to predict the performance of a trait of interest is often very large compared to the number of phenotyped individuals in the sample (p ≫ n); for this reason, some computational difficulties may arise when exploring the posterior distribution of the beta coefficients. When the main objective is to use the model for predictive purposes, a solution consists of reducing the dimension of the problem by directly simulating values of g = X1β0 (breeding values or genomic effects; Lehermeier et al. 2013) instead of β0. To do this, first note that because \( {\boldsymbol{\beta}}_0\mid {\sigma}_{\beta}^2\sim {N}_p\left(\mathbf{0},{\boldsymbol{I}}_p{\sigma}_{\beta}^2\right), \) the induced prior for g is \( \boldsymbol{g}={\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\mid {\sigma}_{\beta}^2\sim {N}_n\left(\mathbf{0},{\sigma}_{\beta}^2{\boldsymbol{X}}_1{\boldsymbol{X}}_1^{\mathrm{T}}\right)={N}_n\left(\mathbf{0},{\sigma}_g^2\boldsymbol{G}\right) \), where \( {\sigma}_g^2=p{\sigma}_{\beta}^2 \) and \( \boldsymbol{G}=\frac{1}{p}{\boldsymbol{X}}_1{\boldsymbol{X}}_1^{\mathrm{T}}, \) which is known as the genomic relationship matrix (VanRaden 2007). Then, under this parameterization (g = X1β0 and \( {\sigma}_g^2=p{\sigma}_{\beta}^2 \)), the model specified in (6.3) takes the following matrix form:

$$ \boldsymbol{Y}={1}_n\mu +\boldsymbol{g}+\upepsilon $$
(6.4)

with a flat prior for the mean parameter (μ), \( {\sigma}^2\sim {\chi}_{v,S}^{-2} \), and the induced priors \( \boldsymbol{g}={\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\mid {\sigma}_g^2\sim {N}_n\left(\mathbf{0},{\sigma}_g^2\boldsymbol{G}\right) \) and \( {\sigma}_g^2\sim {\chi}_{v_g,\kern0.5em {S}_g}^{-2} \) (vg = vβ, Sg = pSβ).
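For reference, a small sketch of how G can be computed in R from a raw marker matrix (here called M, an assumed object with markers coded 0/1/2):

# Genomic relationship matrix G = (1/p) X1 X1^T from standardized markers.
X1 <- scale(M, center = TRUE, scale = TRUE)   # standardize marker columns
p <- ncol(X1)
G <- tcrossprod(X1) / p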

Similarly to what was done for model (6.3), the full conditional posterior distribution of g in model (6.4) is given by

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{g}|-\right)\propto L\left(\mu, \boldsymbol{g},{\sigma}^2;\boldsymbol{y}\right)f\left(\boldsymbol{g}|{\sigma}_g^2\right)\\ {}\propto \frac{1}{{\left(2{\pi \sigma}^2\right)}^{\frac{n}{2}}}\exp \left[-\frac{1}{2{\sigma}^2}{\left\Vert \boldsymbol{y}-{1}_n\mu -\boldsymbol{g}\right\Vert}^2\right]\ \frac{1}{{\left({\sigma}_g^2\right)}^{\frac{n}{2}}}\exp \left(\left[-\frac{1}{2{\sigma}_g^2}{\boldsymbol{g}}^{\mathrm{T}}{\boldsymbol{G}}^{-1}\boldsymbol{g}\right]\right)\\ {}\propto \exp \left\{-\frac{1}{2}\left[{\left(\boldsymbol{g}-\tilde{\boldsymbol{g}}\right)}^{\mathrm{T}}{\tilde{\boldsymbol{G}}}^{-1}\left(\boldsymbol{g}-\tilde{\boldsymbol{g}}\right)\right]\right\},\end{array}} $$

where \( \overset{\sim }{\boldsymbol{G}}={\left({\sigma}_g^{-2}{\boldsymbol{G}}^{-1}+{\sigma}^{-2}{\boldsymbol{I}}_n\right)}^{-1} \) and \( \overset{\sim }{\boldsymbol{g}}={\sigma}^{-2}\overset{\sim }{\boldsymbol{G}}\left(\boldsymbol{y}-{\mathbf{1}}_n\mu \right) \), and from here \( \boldsymbol{g}\mid -\sim {N}_n\left(\overset{\sim }{\boldsymbol{g}},\overset{\sim }{\boldsymbol{G}}\right) \). The mean/mode of g ∣ − is \( \overset{\sim }{\boldsymbol{g}}={\sigma}^{-2}\overset{\sim }{\boldsymbol{G}}\left(\boldsymbol{y}-{\mathbf{1}}_n\mu \right) \), which is also the best linear unbiased predictor (BLUP) of g obtained from the mixed model equations of Henderson (1975), that is, the one obtained by applying the machinery of the classic linear mixed model described in the previous chapter to model (6.4), after recognizing the prior distribution of g as the distribution of the random effects, ignoring the prior specification of the rest of the parameters and assuming they are known (Henderson 1975). For this reason, model (6.4) is often referred to as GBLUP. If G is replaced by the pedigree matrix A, the resulting model is known as PBLUP or ABLUP.
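This equality is easy to verify numerically; a small sketch, assuming G (nonsingular), y, mu, sigma2, and sigma2_g are in memory:

# Check: conditional posterior mean of g equals Henderson's BLUP.
n <- nrow(G)
G_til <- solve(solve(G) / sigma2_g + diag(n) / sigma2)
g_til <- G_til %*% (y - mu) / sigma2                         # posterior mean of g
g_blup <- sigma2_g * G %*% solve(sigma2_g * G + sigma2 * diag(n), y - mu)
max(abs(g_til - g_blup))                                     # ~ 0 up to rounding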

The full conditional posteriors of the rest of the parameters are similar to those of the BRR model: \( \mu \mid -\sim \boldsymbol{N}\left(\overset{\sim }{\mu },{\overset{\sim }{\sigma}}_{\mu}^2\right) \), where \( {\overset{\sim }{\sigma}}_{\mu}^2=\frac{\sigma^2}{n} \) and \( \overset{\sim }{\mu }=\frac{1}{n}{\mathbf{1}}_n^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{g}\right) \); \( {\sigma}^2\mid -\sim {\chi}_{\overset{\sim }{v},\overset{\sim }{S}}^{-2} \), where \( \overset{\sim }{v}=v+n \) and \( \overset{\sim }{S}=S+{\left\Vert \boldsymbol{y}-{\mathbf{1}}_n\mu -\boldsymbol{g}\right\Vert}^2; \) and \( {\sigma}_g^2\mid -\sim {\chi}_{{\overset{\sim }{v}}_g,{\overset{\sim }{S}}_g}^{-2}, \) where \( {\overset{\sim }{v}}_g={v}_g+n \) and \( {\overset{\sim }{S}}_g={S}_g+{\boldsymbol{g}}^{\mathrm{T}}{\boldsymbol{G}}^{-1}\boldsymbol{g} \).

Note that when p ≫ n, the dimension of the parameter space of the posterior distribution of the GBLUP model is lower than that of the BRR model.

The GBLUP model (6.4) can also be implemented easily with the BGLR R package; when the hyperparameters (v, S) and (vg, Sg) are not specified, v = vg = 5 is used by default and the scale parameters are set similarly as in the BRR.

The BGLR code to fit this model is:

ETA = list(list(model = 'RKHS', K = G, df0 = v_g, S0 = S_g, R2 = 1 - R2))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

The GBLUP can be equivalently expressed, and consequently fitted, with the BRR model by making the design matrix equal to the lower triangular factor of the Cholesky decomposition of the genomic relationship matrix, that is, X = L, where G = LLT. So, with the BGLR package, the BRR implementation of a GBLUP model is:

L = t(chol(G))
ETA = list(list(model = 'BRR', X = L, df0 = v_beta, S0 = S_beta, R2 = 1 - R2))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

When there is more than one record per individual in the data at hand, or a more sophisticated design is used in the data collection, model (6.4) can be specified in a more general way to take this structure into account, as follows:

$$ \boldsymbol{Y}={\mathbf{1}}_n\mu +\boldsymbol{Zg}+\boldsymbol{\epsilon} $$
(6.5)

with Z the incidence matrix of the genotypes. This model cannot be fitted directly in BGLR, and some precalculation is needed first to compute the "covariance" matrix of the predictor Zg in model (6.5): \( {\boldsymbol{K}}_L={\sigma}_g^{-2}\mathrm{Var}\left(\boldsymbol{Zg}\right)=\boldsymbol{ZG}{\boldsymbol{Z}}^{\mathrm{T}} \). The BGLR code for implementing this model is the following:

Z = model.matrix(~0 + GID, data = dat_F, xlev = list(GID = unique(dat_F$GID)))
K_L = Z %*% G %*% t(Z)
ETA = list(list(model = 'RKHS', K = K_L, df0 = v_g, S0 = S_g, R2 = 1 - R2))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

where dat_F is the data set that contains the necessary phenotypic information (GID: Lines or individuals; y: response variable of the trait of interest).

6.4 Genomic-Enabled Prediction BayesA Model

Another variant of the standard Bayesian model (6.1) is the BayesA model proposed by Meuwissen et al. (2001), which is a slight modification of the BRR model obtained with the same prior distributions, except that now a specific variance \( {\sigma}_{\beta_j}^2 \) is assumed for each covariate (marker) effect, that is, \( {\beta}_j\mid {\sigma}_{\beta_j}^2\sim N\left(0,{\sigma}_{\beta_j}^2\right) \), and these variance parameters are assumed to be independent random variables with a scaled inverse Chi-square distribution with parameters vβ and Sβ, \( {\sigma}_{\beta_j}^2\sim {\chi}^{-2}\left({v}_{\beta },{S}_{\beta}\right) \). These marker-specific variances allow heterogeneous shrinkage of the estimated effects across covariates. Furthermore, a gamma distribution is assigned to Sβ, Sβ ∼ G(r, s), where G(r, s) denotes a gamma distribution with rate parameter r and shape parameter s. By providing a different prior variance for each βj, this model has the potential of inducing covariate-specific shrinkage of estimated effects (Pérez and de los Campos 2013).

Note that by choosing the scale parameter of the prior of \( {\sigma}_{\beta_j}^2 \) proportional to vβ and taking very large values of vβ, this prior collapses to a degenerate distribution at Sβ, and the BRR model is obtained, but with a gamma distribution with parameters r and s as the prior for the common variance of the effects, \( {\sigma}_{\beta}^2=\mathrm{Var}\left({\beta}_j\right)={S}_{\beta } \), instead of χ−2(vβ, Sβ). Furthermore, the marginal distribution of each beta coefficient βj given Sβ, that is, the distribution of βj ∣ Sβ unconditional on \( {\sigma}_{\beta_j}^2 \), is a scaled Student's t distribution (scaled by \( \sqrt{S_{\beta }/{v}_{\beta }} \)). Compared to the normal distribution, these distributions have heavier tails and put higher mass around 0; consequently, relative to the BRR, they induce less shrinkage of the estimates of covariates with sizable effects and stronger shrinkage toward zero of the estimates of covariates with smaller effects (de los Campos et al. 2013).
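This scaled-t result can be checked by simulation; a small sketch, with v_beta and S_beta as assumed hyperparameter values:

# Simulating from the BayesA marginal prior of beta_j.
set.seed(1)
v_beta <- 5; S_beta <- 0.5; B <- 1e5
sigma2_bj <- S_beta / rchisq(B, df = v_beta)   # sigma_bj^2 ~ scaled inv-Chi-square
beta_j <- rnorm(B, 0, sqrt(sigma2_bj))         # beta_j | sigma_bj^2 ~ normal
qqplot(qt(ppoints(B), df = v_beta) * sqrt(S_beta / v_beta), beta_j,
       xlab = "scaled t quantiles", ylab = "simulated beta_j")  # straight line expected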

A Gibbs sampler for estimating the parameters of this model can be implemented by following steps 1–6 of the BRR model, with step 2 replaced by steps 2.1 and 2.2 below, and steps 3 and 6 modified as follows:

2.1 Simulate from the full conditional of each \( {\sigma}_{\beta_j}^2 \):

$$ {\sigma}_{\beta_j}^2\mid \mu, {\boldsymbol{\beta}}_0,{\boldsymbol{\sigma}}_{-j}^2,{S}_{\beta },{\sigma}^2\sim {\chi}_{{\overset{\sim }{v}}_{\beta_j},{\overset{\sim }{S}}_{\beta_j}}^{-2}, $$

where \( {\overset{\sim }{v}}_{\beta_j}={v}_{\beta }+1 \) and \( {\overset{\sim }{S}}_{\beta_j}={S}_{\beta }+{\beta}_j^2 \) are the shape and scale parameters, and \( {\boldsymbol{\sigma}}_{-j}^2 \) is the vector \( {\boldsymbol{\sigma}}_{\beta}^2=\left({\sigma}_{\beta_1}^2,\dots, {\sigma}_{\beta_p}^2\right) \) without the jth entry.

2.2 Simulate from the full conditional of Sβ:

$$ {\displaystyle \begin{array}{c}f\left({S}_{\beta }|-\right)\propto \left[\prod \limits_{j=1}^pf\left({\sigma}_{\beta_j}^2|{S}_{\beta }\ \right)\right]f\left({S}_{\beta}\right)\\ {}\propto \prod \limits_{j=1}^p\left[\frac{{\left(\frac{S_{\beta }}{2}\right)}^{\frac{v_{\beta }}{2}}}{\Gamma \left(\frac{v_{\beta }}{2}\right){\left({\sigma}_{\beta_j}^2\right)}^{1+\frac{v_{\beta }}{2}}}\exp \left(-\frac{S_{\beta }}{2{\sigma}_{\beta_j}^2}\right)\right]{S}_{\beta}^{s-1}\exp \left(-{rS}_{\beta}\right)\\ {}\propto {S}_{\beta}^{s+\frac{pv_{\beta }}{2}-1}\exp \left[-\left(r+\frac{1}{2}\sum \limits_{j=1}^p\frac{1}{\sigma_{\beta j}^2}\right){S}_{\beta}\right]\end{array}} $$

which corresponds to the kernel of a gamma distribution with rate parameter \( \overset{\sim }{r}=r+\frac{1}{2}{\sum}_{j=1}^p\frac{1}{\sigma_{\beta j}^2} \) and shape parameter \( \overset{\sim }{s}=s+\frac{pv_{\beta }}{2} \), so \( {S}_{\beta}\mid -\sim \mathrm{Gamma}\left(\overset{\sim }{r},\overset{\sim }{s}\right). \)

3. Simulate a value from the full conditional posterior distribution of β0:

$$ {\boldsymbol{\beta}}_0\mid \mu, {\boldsymbol{\sigma}}_{\beta}^2,{\sigma}^2\sim {N}_p\left({\overset{\sim }{\boldsymbol{\beta}}}_0,{\overset{\sim }{\boldsymbol{\Sigma}}}_0\right), $$

where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_0={\left({\boldsymbol{D}}_p^{-1}+{\sigma}^{-2}{\boldsymbol{X}}_1^{\mathrm{T}}{\boldsymbol{X}}_1\right)}^{-1} \), \( {\overset{\sim }{\boldsymbol{\beta}}}_0={\sigma}^{-2}{\overset{\sim }{\boldsymbol{\Sigma}}}_0{\boldsymbol{X}}_1^{\mathrm{T}}\left(\boldsymbol{y}-{\mathbf{1}}_n\mu \right) \), and \( {\boldsymbol{D}}_p=\mathrm{Diag}\left({\sigma}_{\beta_1}^2,\dots, {\sigma}_{\beta_p}^2\right). \)

6. Repeat steps 2–5 (as given for the BRR model, with these modifications) until the desired number of samples of the parameter vector (\( \mu, {\boldsymbol{\beta}}_0^{\mathrm{T}},{\boldsymbol{\sigma}}_{\beta}^2,{\sigma}^2,{S}_{\beta } \)) has been simulated. A sketch of these modified updates in R is given after this list.
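The following sketch shows the modified updates (steps 2.1, 2.2, and 3 above) in R, to be dropped into the body of the BRR sampler sketched earlier; b0, mu, sigma2, S_beta, y, X1, and the hyperparameters v_beta, r, and s are assumed to be in memory:

# BayesA-specific Gibbs updates.
p <- ncol(X1)
sigma2_bj <- (S_beta + b0^2) / rchisq(p, df = v_beta + 1)      # step 2.1, all j at once
S_beta <- rgamma(1, shape = s + p * v_beta / 2,
                 rate = r + sum(1 / sigma2_bj) / 2)            # step 2.2
Sig0 <- solve(diag(1 / sigma2_bj) + crossprod(X1) / sigma2)    # step 3
b0 <- as.vector(MASS::mvrnorm(1,
        as.vector(Sig0 %*% crossprod(X1, y - mu)) / sigma2, Sig0))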

When implementing this model in the BGLR package, by default v = vβ = 5 is used and S = Var(y) × (1 − R2) × (v + 2), which makes the mode of the prior of σ2 (χ−2(v, S)) match a certain proportion of the total variance (1 − R2). If the hyperparameters s and r of the prior for Sβ are not specified, by default BGLR takes s = 1.1, which gives a relatively non-informative prior (Sβ ∼ G(r, s)) with a coefficient of variation (\( 1/\sqrt{s} \)) of approximately 95%. Then the rate parameter is assigned the value r = (s − 1)/Sβ, with \( {S}_{\beta }=\mathrm{Var}(y)\times {R}^2\times \left({v}_{\beta }+2\right)/{S}_x^2 \), where \( {S}_x^2 \) is the sum of the variances of the columns of X and R2 is the proportion of the total variance that a priori is attributed to the covariates in X. The BGLR code for implementing this model is

ETA = list(list(model = 'BayesA', X = X1, df0 = v_beta, rate0 = r, shape0 = s, R2 = 1 - R2))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

6.5 Genomic-Enabled Prediction BayesB and BayesC Models

Other variants of model (6.1) are the BayesC and BayesB models, which can be considered direct extensions of the BRR and BayesA models, respectively, obtained by adding a parameter π that represents the prior proportion of covariates with a nonzero effect (Pérez and de los Campos 2014).

BayesC is the same as the BRR except that, instead of assuming a priori that the beta coefficients are independent normal random variables with mean 0 and variance \( {\sigma}_{\beta}^2 \), it assumes that each βj comes from a \( N\left(0,{\sigma}_{\beta}^2\right) \) with probability πp and from a degenerate distribution at zero (DG(0)) with probability 1 − πp, that is, \( {\beta}_1,\dots, {\beta}_p\mid {\sigma}_{\beta}^2,{\pi}_p\overset{iid}{\sim }{\pi}_p\mathrm{N}\left(0,{\sigma}_{\beta}^2\right)+\left(1-{\pi}_p\right)\mathrm{DG}(0) \) (a mixture of a normal distribution with mean 0 and variance \( {\sigma}_{\beta}^2 \) and a degenerate distribution at zero). In addition, a beta distribution is assigned as the prior for πp, that is, πp ∼ Beta(πp0, ϕ0), where πp0 = E(πp) is the mean and \( {\phi}_0^{-1} \) is the "dispersion" parameter (\( \operatorname{var}\left({\pi}_p\right)=\frac{\pi_{p0}\left(1-{\pi}_{p0}\right)}{\phi_0+1} \)). If ϕ0 = 2 and πp0 = 0.5, the prior for πp is the uniform distribution on (0, 1). For large values of ϕ0, the distribution of πp is highly concentrated around πp0, and so BayesC reduces to the BRR when πp0 = 1 and ϕ0 is large.

For this model, the full conditional distributions of μ and σ2 are the same as in the models described before, that is, \( \mu \mid -\sim N\left(\overset{\sim }{\mu },{\overset{\sim }{\sigma}}_{\mu}^2\right) \) and \( {\sigma}^2\mid -\sim {\chi}_{\overset{\sim }{v},\overset{\sim }{S}}^{-2}. \) However, the full conditionals of the rest of the parameters do not have a known form and are not easy to simulate from. A solution is to introduce a latent variable to represent the prior distribution of each βj and to compute all the conditional distributions in this augmented scheme, including the distribution corresponding to the latent variable. To do this, note that this prior can be specified by assuming that, conditional on a binary latent variable Zj,

$$ {\beta}_j\mid {\sigma}_{\beta}^2,{Z}_j=z\sim \left\{\begin{array}{c}N\left(0,{\sigma}_{\beta}^2\right),\kern0.5em \mathrm{if}\ z=1\\ {}\mathrm{DG}(0),\kern0.5em \mathrm{if}\ z=0,\end{array}\right. $$

where Zj is a Bernoulli random variable with parameter πp (Zj ∼ Ber(πp)). With this latent variable introduced, all the full conditionals can be derived, as described next.

If the current value of zj is 1, the full conditional posterior of βj is

$$ {\displaystyle \begin{array}{c}f\left({\beta}_j|-\right)\propto L\left(\mu, {\boldsymbol{\beta}}_0,{\sigma}^2;\boldsymbol{y}\right)f\left({\beta}_j|{\sigma}_{\beta}^2,{z}_j\right)\\ {}\propto \exp \left(-\frac{1}{2{\sigma}^2}{\left\Vert \boldsymbol{y}-{1}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2\right)\frac{1}{\sqrt{2{\pi \sigma}_{\beta}^2}}\exp \left(-\frac{\beta_j^2}{2{\sigma}_{\beta}^2}\right)\\ {}\propto \exp \left(-\frac{1}{2{\sigma}^2}\sum \limits_{i=1}^n{\left({y}_{ij}-{x}_{ij}{\beta}_j\right)}^2-\frac{1}{2{\sigma}_{\beta}^2}{\beta}_j^2\right)\\ {}\propto \exp \left\{-\frac{1}{2}\left[\left({\sigma}_{\beta}^{-2}+{\sigma}^{-2}\sum \limits_{i=1}^n{x}_{ij}^2\right){\beta}_j^2-2{\sigma}^{-2}\sum \limits_{i=1}^n{x}_{ij}{y}_{ij}{\beta}_j+\frac{1}{\sigma^2}\sum \limits_{i=1}^n{y}_{ij}^2\right]\right\}\\ {}\propto \exp \left[-\frac{{\left({\beta}_j-{\tilde{\beta}}_j\right)}^2}{2{\tilde{\sigma}}_j^2}\right],\end{array}} $$

where \( {y}_{ij}={y}_i-\mu -{\sum}_{\begin{array}{c}k=1\\ {}k\ne j\end{array}}^p{x}_{ik}{\beta}_k \), \( {\overset{\sim }{\sigma}}_j^2={\left({\sigma}_{\beta}^{-2}+{\sigma}^{-2}{\sum}_{i=1}^n{x}_{ij}^2\right)}^{-1} \), and \( {\overset{\sim }{\beta}}_j={\sigma}^{-2}{\overset{\sim }{\sigma}}_j^2{\sum}_{i=1}^n{x}_{ij}{y}_{ij}. \) That is, when the current value of zj is 1, \( {\beta}_j\mid -\sim N\left({\overset{\sim }{\beta}}_j,{\overset{\sim }{\sigma}}_j^2\right) \). However, if zj = 0, the full conditional posterior of βj is a distribution degenerate at 0, that is, βj ∣  −  ∼ DG(0).

The full conditional distribution of Zj is

$$ {\displaystyle \begin{array}{c}f\left({z}_j|-\right)\propto f\left({\beta}_j|{\sigma}_{\beta}^2,{z}_j\right)f\left({z}_j\right)\\ {}\propto f\left({\beta}_j|{\sigma}_{\beta}^2,{z}_j\right){\pi}_p^{z_j}{\left(1-{\pi}_p\right)}^{1-{z}_j}\end{array}} $$

from which, conditional on the rest of the parameters, Zj is a Bernoulli random variable with parameter \( {\overset{\sim }{\pi}}_{pj}=\frac{\frac{\pi_p}{\sqrt{2\pi {\sigma}_{\beta}^2}}\exp \left(-\frac{\beta_j^2}{2{\sigma}_{\beta}^2}\right)}{\frac{\pi_p}{\sqrt{2\pi {\sigma}_{\beta}^2}}\exp \left(-\frac{\beta_j^2}{2{\sigma}_{\beta}^2}\right)+\left(1-{\pi}_p\right){\delta}_0\left({\beta}_j\right)} \). Note, however, that if βj ≠ 0, then \( {\overset{\sim }{\pi}}_{pj}=1 \) and Zj = 1 with probability 1; since simulating from the full conditional posterior of βj then always yields values different from zero, this cyclic behavior remains permanent. On the other hand, if βj = 0, then \( {\overset{\sim }{\pi}}_{pj}=\frac{\frac{\pi_p}{\sqrt{2\pi {\sigma}_{\beta}^2}}}{\frac{\pi_p}{\sqrt{2\pi {\sigma}_{\beta}^2}}+\left(1-{\pi}_p\right)} \) is not 0, so the next simulated value of βj will be different from 0, and from then on Zj will always be 1; hence the chain has absorbing states and will not explore the entire sampling space. A solution to this problem consists of simulating from the joint conditional distribution of βj and Zj, that is, from βj, Zj ∣ −. This joint full conditional distribution can be computed as

$$ f\left({\beta}_j,{z}_j|-\right)\propto {f}_{\ast}\left({z}_j\right)f\left({\beta}_j|{z}_j,-\right)\propto {f}_{\ast}\left({z}_j\right)L\left(\mu, {\boldsymbol{\beta}}_0,{\sigma}^2;\boldsymbol{y}\right)f\left({\beta}_j|{\sigma}_{\beta}^2,{z}_j\right), $$

where f(zj) is the marginal conditional distribution of Zj conditioned to all parameters except βj (Zj ∣  − j ∼ f(·)). Specifically, this is given by “integrating” (f(βj, zj| −) with respect to βj)

$$ {\displaystyle \begin{array}{c}{f}_{\ast}\left({z}_j\right)\propto \underset{-\infty }{\overset{\infty }{\int }}f\left({\beta}_j,{z}_j|-\right)d{\beta}_j\propto \underset{-\infty }{\overset{\infty }{\int }}L\left(\mu, {\boldsymbol{\beta}}_0,{\sigma}^2;\boldsymbol{y}\right)f\left({\beta}_j|{\sigma}_{\beta}^2,{z}_j\right)f\left({z}_j|{\pi}_p\right)d{\beta}_j\\ {}\propto \left\{\begin{array}{c}\underset{-\infty }{\overset{\infty }{\int }}\exp \left[-\frac{1}{2{\sigma}^2}\sum \limits_{i=1}^n{\left({y}_{ij}-{x}_{ij}{\beta}_j\right)}^2\right]\frac{1}{\sqrt{2{\pi \sigma}_{\beta}^2}}\exp \left(-\frac{\beta_j^2}{2{\sigma}_{\beta}^2}\right){\pi}_pd{\beta}_j\\ {}\underset{-\infty }{\overset{\infty }{\int }}\exp \left[-\frac{1}{2{\sigma}^2}\sum \limits_{i=1}^n{\left({y}_{ij}-{x}_{ij}{\beta}_j\right)}^2\right]{\delta}_0\left({\beta}_j\right)\left(1-{\pi}_p\right)d{\beta}_j\end{array}\right.\\ {}\propto \left\{\begin{array}{c}{\pi}_p\sqrt{\frac{{\tilde{\sigma}}_j^2}{\sigma_{\beta}^2}}\exp \left(\frac{{\tilde{\beta}}_j^2}{2{\tilde{\sigma}}_j^2}\right)\\ {}1-{\pi}_p.\end{array}\right.\end{array}} $$

From here, Zj ∣ − is a Bernoulli random variable with parameter \( {\overset{\sim }{\pi}}_p=\left[{\pi}_p\sqrt{\frac{{\overset{\sim }{\sigma}}_j^2}{\sigma_{\beta}^2}}\exp \left(\frac{{\overset{\sim }{\beta}}_j^2}{2{\overset{\sim }{\sigma}}_j^2}\right)\right]/\left[{\pi}_p\sqrt{\frac{{\overset{\sim }{\sigma}}_j^2}{\sigma_{\beta}^2}}\exp \left(\frac{{\overset{\sim }{\beta}}_j^2}{2{\overset{\sim }{\sigma}}_j^2}\right)+1-{\pi}_p\right] \). With this and the full conditional distribution derived above for βj, an easy way to simulate values from βj, Zj ∣ − consists of first simulating a value zj from \( {Z}_j\mid -j\sim \mathrm{Ber}\left({\overset{\sim }{\pi}}_p\right) \) and then, if zj = 1, simulating a value of βj from \( {\beta}_j\mid -\sim N\left({\overset{\sim }{\beta}}_j,{\overset{\sim }{\sigma}}_j^2\right) \); otherwise, take βj = 0. A per-marker sketch of this joint update is given below.
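A per-marker sketch of this joint update in R (to be applied for j = 1, …, p inside the Gibbs loop), assuming the current residual vector e = y − 1nμ − X1β0 and the current values of b0, sigma2, sigma2_b, and pi_p are in memory:

# Joint update of (Z_j, beta_j) in the BayesC Gibbs sampler, for one marker j.
xj <- X1[, j]
yj <- e + xj * b0[j]                            # y_ij: residuals excluding marker j
s2j <- 1 / (1 / sigma2_b + sum(xj^2) / sigma2)  # tilde sigma_j^2
bj <- s2j * sum(xj * yj) / sigma2               # tilde beta_j
odds <- pi_p * sqrt(s2j / sigma2_b) * exp(0.5 * bj^2 / s2j)
pj <- odds / (odds + 1 - pi_p)                  # tilde pi_p
zj <- rbinom(1, 1, pj)
b0[j] <- if (zj == 1) rnorm(1, bj, sqrt(s2j)) else 0
e <- yj - xj * b0[j]                            # restore residuals with new beta_j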

Now, note that the full conditional distribution of \( {\sigma}_{\beta}^2 \) is

$$ {\displaystyle \begin{array}{c}f\left({\sigma}_{\beta}^2|-\right)\propto \left\{\prod \limits_j^p{\left[f\left({\beta}_j|{\sigma}_{\beta}^2,{z}_j\right)\right]}^{z_j}\right\}f\left({\sigma}_{\beta}^2\right)\\ {}\propto \frac{1}{{\left({\sigma}_{\beta}^2\right)}^{\frac{p{\overline{z}}_p}{2}}}\exp \left(-\frac{1}{2{\sigma}_{\beta}^2}\sum \limits_{j=1}^p{z}_j{\beta}_j^2\right)\frac{{\left(\frac{S_{\beta }}{2}\right)}^{\frac{v_{\beta }}{2}}}{\Gamma \left(\frac{v_{\beta }}{2}\right){\left({\sigma}_{\beta}^2\right)}^{1+\frac{v_{\beta }}{2}}}\exp \left(-\frac{S_{\beta }}{2{\sigma}_{\beta}^2}\right)\\ {}\propto \frac{1}{{\left({\sigma}_{\beta}^2\right)}^{1+\frac{{\tilde{v}}_{\beta }}{2}}}\exp \left(-\frac{{\tilde{S}}_{\beta }}{2{\sigma}_{\beta}^2}\right),\end{array}} $$

where \( {\overline{z}}_p=\frac{1}{p}{\sum}_{j=1}^p{z}_j \), \( {\overset{\sim }{S}}_{\beta }={S}_{\beta }+{\sum}_{j=1}^p{z}_j{\beta}_j^2, \) and \( {\overset{\sim }{v}}_{\beta }={v}_{\beta }+p{\overline{z}}_p \). That is, \( {\sigma}_{\beta}^2\mid \mu, {\boldsymbol{\beta}}_0,{\sigma}^2,\mathbf{z}\sim {\chi}_{{\overset{\sim }{v}}_{\beta },{\overset{\sim }{S}}_{\beta}}^{-2} \). The full conditional distributions of μ and σ2 are the same as in the BRR, that is, \( \mu \mid -\sim N\left(\overset{\sim }{\mu },{\overset{\sim }{\sigma}}_{\mu}^2\right) \) and \( {\sigma}^2\mid -\sim {\chi}_{\overset{\sim }{v},\overset{\sim }{S}}^{-2} \), with \( {\overset{\sim }{\sigma}}_{\mu}^2=\frac{\sigma^2}{n}, \) \( \overset{\sim }{\mu }=\frac{1}{n}{\mathbf{1}}_n^{\mathrm{T}}\left(\boldsymbol{y}-{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right), \) \( \overset{\sim }{v}=v+n,\kern0.5em \mathrm{and}\ \overset{\sim }{S}=S+{\left\Vert \boldsymbol{y}-{\mathbf{1}}_n\mu -{\boldsymbol{X}}_1{\boldsymbol{\beta}}_0\right\Vert}^2 \).

The full conditional distribution of πp is

$$ {\displaystyle \begin{array}{c}f\left({\pi}_p|-\right)\propto \left[\prod \limits_{j=1}^pf\left({z}_j|{\pi}_p\right)\right]\ f\left({\pi}_p\right)\\ {}\propto {\pi}_p^{p{\overline{z}}_p}{\left(1-{\pi}_p\right)}^{p\left(1-{\overline{z}}_p\right)}{\pi}_p^{\phi_0{\pi}_{p0}+1-1}{\left(1-{\pi}_p\right)}^{\phi_0\left(1-{\pi}_{p0}\right)-1}\\ {}\propto {\pi}_p^{\phi_0{\pi}_{p0}+p{\overline{z}}_p-1}{\left(1-{\pi}_p\right)}^{\phi_0\left(1-{\pi}_{p0}\right)+p\left(1-{\overline{z}}_p\right)-1}\end{array}} $$

which means that \( {\pi}_p\mid -\sim \mathrm{Beta}\ \left({\overset{\sim }{\pi}}_{p0},{\overset{\sim }{\phi}}_0\right) \), with \( {\overset{\sim }{\phi}}_0={\phi}_0+p \) and \( {\overset{\sim }{\pi}}_{p0}=\frac{\phi_0{\pi}_{p0}+p{\overline{z}}_p}{\phi_0+p} \).

The BayesB model is a variant of BayesA that assumes almost the same priors for the parameters, except that instead of assuming independent normal random variables with mean 0 and marker-specific variances \( {\sigma}_{\beta_j}^2 \) for the beta coefficients, this model adopts a mixture distribution, that is, \( {\beta}_j\mid {\sigma}_{\beta_j}^2,\pi \overset{iid}{\sim}\pi N\left(0,{\sigma}_{\beta_j}^2\right)+\left(1-\pi \right)\mathrm{DG}(0) \), with π ∼ Beta(π0, ϕ0). This model has the potential to perform variable selection and produce covariate-specific shrinkage estimates (Pérez et al. 2010).

This model can also be considered an extension of the BayesC model obtained by assigning a gamma prior to the scale parameter of the prior distribution of the variances of the beta coefficients, that is, Sβ ∼ G(r, s). It is interesting to point out that this model reduces to BayesA when π = 1, which is obtained by taking π0 = 1 and letting ϕ0 go to ∞. Also, it reduces to BayesC by setting \( s/r={S}_{\beta}^0 \) and choosing a very large value for r.

To explore the posterior distribution of this model, the same Gibbs sampler given for BayesC can be used, adding to the process the full conditional posterior distribution of Sβ: \( {S}_{\beta}\mid -\sim \mathrm{Gamma}\left(\overset{\sim }{r},\overset{\sim }{s}\right) \), with rate parameter \( \overset{\sim }{r}=r+\frac{1}{2}{\sum}_{j=1}^p\frac{1}{\sigma_{\beta_j}^2} \) and shape parameter \( \overset{\sim }{s}=s+\frac{pv_{\beta }}{2} \), as in BayesA.

When implementing both of these models in the BGLR R package, by default πp0 = 0.5 and ϕ0 = 10 are assigned to the hyperparameters of the prior of πp, Beta(πp0, ϕ0), which results in a weakly informative prior. For the remaining hyperparameters of the BayesC model, BGLR by default assigns values like those assigned in the BRR model, but modified to take into account that a priori only a proportion π0 of the covariates (columns of X) has nonzero effects:

$$ {\displaystyle \begin{array}{l}v={v}_{\beta }=5,\\ {}S=\mathrm{Var}(Y)\times \left(1-{R}^2\right)\times \left(v+2\right),\\ {}{S}_{\beta }=\mathrm{Var}(y)\times {R}^2\times \frac{\left({v}_{\beta }+2\right)}{S_x^2{\pi}_0}.\end{array}} $$

For the remaining hyperparameters of BayesB, by default BGLR also assigns values similar to those of BayesA: v = vβ = 5, S = Var(y) × (1 − R2) × (v + 2), and r = (s − 1)/Sβ, with \( {S}_{\beta }=\mathrm{Var}(y)\times {R}^2\times \frac{\left({v}_{\beta }+2\right)}{S_x^2{\pi}_0} \), where \( {S}_x^2 \) is the sum of the variances of the columns of X.

The BGLR codes to implement these models are, respectively:

ETA = list(list(model = 'BayesC', X = X1, probIn = pi_p0, counts = phi_0))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, df0 = v, S0 = S, R2 = R2)

and

ETA = list(list(model = 'BayesB', X = X1, probIn = pi_p0, counts = phi_0, rate0 = r, shape0 = s))
A = BGLR(y = y, ETA = ETA, nIter = 1e4, burnIn = 1e3, df0 = v, S0 = S, R2 = R2)

6.6 Genomic-Enabled Prediction Bayesian Lasso Model

Another variant of model (6.1) is the Bayesian Lasso linear regression model (BL). This model assumes independent Laplace (double-exponential) distributions with location parameter 0 and scale parameter \( \frac{\sqrt{\sigma^2}}{\lambda } \) for the beta coefficients, that is, \( {\beta}_1,\dots, {\beta}_p\mid {\sigma}^2,\lambda \overset{iid}{\sim }L\left(0,\frac{\sqrt{\sigma^2}}{\lambda}\right) \). Furthermore, the priors for the parameters μ and σ2 are the same as in the models described before, while for λ2, a gamma distribution with parameters sλ and rλ is often adopted.

Because, compared to the normal distribution, the Laplace distribution has fatter tails and puts higher density around 0, this prior induces stronger shrinkage of the estimates for covariates with relatively small effects and weaker shrinkage of the estimates for covariates with larger effects (Pérez et al. 2010).

A more convenient specification of the prior for the beta coefficients in this model is obtained with the representation proposed by Park and Casella (2008), which is a continuous scale mixture of normal distributions: βj ∣ τj ∼ N(0, τjσ2) and τj ∼ Exp(2/λ2), j = 1, …, p, where Exp(θ) denotes an exponential distribution with scale parameter θ. So, unlike the prior used in the BRR model, this prior distribution puts higher mass at zero and has heavier tails, which induces stronger shrinkage of estimates for covariates with relatively small effects and less shrinkage of estimates for markers with sizable effects (Pérez et al. 2010).
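This scale-mixture representation can be checked by simulation; a small sketch, with lambda and sigma2 as assumed values (recall that a scale of 2/λ2 corresponds to a rate of λ2/2 in R's rexp):

# Checking Park and Casella's scale-mixture representation of the Laplace prior.
set.seed(1)
lambda <- 2; sigma2 <- 1; B <- 1e5
tau <- rexp(B, rate = lambda^2 / 2)       # tau_j ~ Exp with scale 2/lambda^2
beta <- rnorm(B, 0, sqrt(tau * sigma2))   # beta_j | tau_j ~ N(0, tau_j * sigma2)
# E|beta_j| equals the Laplace scale sqrt(sigma2)/lambda:
c(empirical = mean(abs(beta)), theoretical = sqrt(sigma2) / lambda)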

Note that the prior distribution of the beta coefficients in BayesB and BayesC can be equivalently expressed through a prior variance that is a mixture of a scaled inverse Chi-squared distribution with parameters vβ and Sβ and a degenerate distribution at zero, that is, \( {\beta}_j\sim N\left(0,{\sigma}_{\beta}^2\right) \) and \( {\sigma}_{\beta}^2\sim {\pi}_p{\chi}^{-2}\left({v}_{\beta },{S}_{\beta}\right)+\left(1-{\pi}_p\right)\mathrm{DG}(0) \). So, based on this result and the connections between the models described before, the main difference between all these models is the manner in which the prior variance of the predictor variables is modelled.

Example 1

To illustrate how to use the models described before, here we consider the prediction of grain yield (tons/ha) based on marker information. The data set used consists of 30 lines in four environments, with one or two repetitions per line, and the genotypic information contains 500 markers for each line. The numbers of lines with one (two) repetitions are 6 (24), 2 (28), 0 (30), and 3 (27) in Environments 1, 2, 3, and 4, respectively, resulting in 229 observations. The prediction performance of all these models was evaluated with 10 random partitions in a cross-validation strategy, where 80% of the complete data set was used to fit the model and the rest to evaluate it in terms of the mean squared error of prediction (MSE).

The results for all models (shown in Table 6.1) were obtained by running the corresponding Gibbs sampler for 10,000 iterations and discarding the first 1000, using the default hyperparameter values implemented in BGLR. They indicate that the behavior of all the models is similar, except for BayesC, whose MSE is slightly greater than that of the rest.

Table 6.1 Mean squared error (MSE) of prediction across 10 random partitions, with 80% for training and the rest for testing, in five Bayesian linear models

The R code to obtain the results in Table 6.1 is given in Appendix 3.

What happens when other hyperparameter values are used? The values used here (proposed by Pérez et al. 2010) did not always produce the best prediction performance (Lehermeier et al. 2013), and there are other ways to choose the hyperparameter values of these models (Habier et al. 2010, 2011). However, it is important to point out that the default values in BGLR work reasonably well, that it is not easy to find other combinations that work better in all applications, and that care is needed when using other combinations of hyperparameters, because they can dramatically degrade the predictive performance relative to the defaults.

Indeed, by means of simulated and experimental data, Lehermeier et al. (2013) observed a strong influence of the hyperparameters of the prior distributions on the predictive performance of BayesA, BayesB, and the Bayesian Lasso with fixed λ. Specifically, in the first two models they observed that the scale parameter Sβ of the prior distribution of the variance of βj had a strong effect on predictive ability: overfitting occurred when too large a value was chosen, whereas underfitting was observed when too small a value was used. Note that this is approximately what one would expect, since in both models (BayesA and BayesB) \( \mathrm{Var}\left({\beta}_j\right)=E\left({\sigma}_{\beta_j}^2\right)={S}_{\beta }/\left({v}_{\beta }-2\right) \), which plays the role of the inverse of the regularization parameter in a Ridge-type regression model.

6.7 Extended Predictor in Bayesian Genomic Regression Models

All the Bayesian formulations of the model (6.1) described before can be extended, in terms of the predictor, to easily take into account the effects of other factors. For example, effects of environments and environment–marker interaction can be added as

$$ \boldsymbol{y}={\mathbf{1}}_n\mu +{\boldsymbol{X}}_E{\boldsymbol{\beta}}_E+\boldsymbol{X}\boldsymbol{\beta } +{\boldsymbol{X}}_{EM}{\boldsymbol{\beta}}_{EM}+\boldsymbol{\epsilon}, $$
(6.6)

where XE and XEM are the design matrices of the environments and of the environment–marker interaction, respectively, while βE and βEM are the vectors of environment effects and interaction effects, respectively, with prior distributions that can be specified as was done for β. Indeed, with the BGLR function all of this is possible, and all the prior options described before can also be adopted for the rest of the effects added to the model: FIXED, BRR, BayesA, BayesB, BayesC, and BL.

Under the RKHS model with genomic and environment–genomic interaction effects in the predictor, the modified model (6.6) is expressed as

$$ \boldsymbol{Y}={\mathbf{1}}_n\mu +{\boldsymbol{X}}_E{\boldsymbol{\beta}}_E+{\boldsymbol{Z}}_L\boldsymbol{g}+{\boldsymbol{Z}}_{EL}\boldsymbol{gE}+\boldsymbol{\epsilon}, $$
(6.7)

where ZL and ZLE are the incidence matrices of the genomic and environment–genomic interaction effects, respectively. Similarly to model (6.5), this model cannot be fitted directly in BGLR, and some precalculations are needed first to compute the "covariance" matrices of the predictors ZLg and ZLEgE, which are \( {\boldsymbol{K}}_L={\sigma}_g^{-2}\ \mathrm{Var}\left({\boldsymbol{Z}}_L\boldsymbol{g}\right)={\boldsymbol{Z}}_L\boldsymbol{G}{\boldsymbol{Z}}_L^{\mathrm{T}} \) and \( {\boldsymbol{K}}_{LE}={\sigma}_{gE}^{-2}\mathrm{Var}\left({\boldsymbol{Z}}_{LE}\boldsymbol{gE}\right)={\boldsymbol{Z}}_{LE}\left({\boldsymbol{I}}_I\boldsymbol{\bigotimes}\boldsymbol{G}\right){\boldsymbol{Z}}_{LE}^{\mathrm{T}} \), respectively, where I is the number of environments. The BGLR code for implementing this model is the following:

I = length(unique(dat_F$Env))
XE = model.matrix(~0 + Env, data = dat_F)[, -1]
Z_L = model.matrix(~0 + GID, data = dat_F, xlev = list(GID = unique(dat_F$GID)))
K_L = Z_L %*% G %*% t(Z_L)
Z_LE = model.matrix(~0 + GID:Env, data = dat_F,
                    xlev = list(GID = unique(dat_F$GID), Env = unique(dat_F$Env)))
K_LE = Z_LE %*% kronecker(diag(I), G) %*% t(Z_LE)
ETA = list(list(model = 'FIXED', X = XE),
           list(model = 'RKHS', K = K_L, df0 = v_g, S0 = S_g, R2 = 1 - R2),
           list(model = 'RKHS', K = K_LE))
A = BGLR(y, ETA = ETA, nIter = 1e4, burnIn = 1e3, S0 = S, df0 = v, R2 = R2)

where dat_F is the data set that contains the necessary phenotypic information (GID: Lines or individuals; Env: Environment; y: response variable of trait under study).

Example 2 (Includes Models with Only Env Effects and Models with Env and Line×Env Effects)

To illustrate how to fit the extended genomic regression models described before, here we consider the prediction of grain yield (tons/ha) based on marker information and the genomic relationship matrix derived from it. The data set used consists of 30 lines in four environments, with genotypic information on 500 markers for each line. The prediction performance of 18 models was evaluated with five-fold cross-validation, where 80% of the complete data set was used to fit the model and the rest to evaluate it in terms of the mean squared error of prediction (MSE). These models were obtained by considering different predictors (markers, environments, and/or environment–marker interaction) and different prior models for the parameters of each predictor included.

Model M1 included only the marker effects in the predictor, and six sub-models were obtained by adopting one of the six options (BRR, RKHS, BayesA, BayesB, BayesC, or BL) as the prior model for β (or g). Model M2 is model M1 plus the environment effects with a FIXED prior, for each prior sub-model of the marker predictor. Model M3 is model M2 plus the environment–marker interaction, with a prior model of the same family as the one chosen for the marker predictor (see Table 6.2 for all the models compared).

Table 6.2 Fitted models: Mmd, m = 1, 2, 3, d = 1, …, 6

The prediction performance of the models presented in Table 6.2 is shown in Table 6.3. The first column gives the kind of prior model used for both the marker effects and the env:marker interaction term, when the latter is included in the model. For each of the first five prior models, model M2 showed the best MSE performance, while with the BL prior, model M3, the model with the interaction term, was better. The greatest difference is between M1 and M2: the average MSE across all priors of the first model is approximately 21% greater than the corresponding average of the M2 model. Similar behavior was observed with Pearson's correlation, the average of this criterion across all priors being about 32% greater in model M2 than in M1. So the inclusion of the environment effect was important, but not that of the environment:marker interaction.

Table 6.3 Performance prediction of models in Table 6.2: Mean squared error of prediction (MSE) and average Pearson’s correlation (PC), each with its standard deviation across the five partitions

The best prediction performance in terms of MSE was obtained with sub-model M25 (M2 with a BayesC prior), followed by M21 (M2 with a BRR prior). However, the difference between these and sub-models M22, M23, and M24, also derived from M2, is only slight, and a little larger with respect to M26, which, as commented before, is the only M2 sub-model (the one with a BL prior) that performed worse than its interaction counterpart M36 (M3 with a BL prior for marker effects and environment–marker interaction).

6.8 Bayesian Genomic Multi-trait Linear Regression Model

The univariate models described for continuous outcomes do not exploit the possible correlation between traits when the selection of better individuals is based on several traits and these univariate models are fitted separately to each trait. Some advantages of jointly modelling multiple traits are that the correlation among the traits is appropriately accounted for and that statistical power and the precision of parameter estimation can improve. This is very important in genomic selection because it can help to increase prediction accuracy and reduce trait selection bias (Schaeffer 1984; Pollak et al. 1984; Montesinos-López et al. 2018).

An example of this is when crop breeders collect phenotypic data for multiple traits such as grain yield and its components (grain type, grain weight, biomass, etc.), tolerance to biotic and abiotic stresses, and grain quality (taste, shape, color, and/or nutrient content) (Montesinos-López et al. 2016). In this and many other cases, the interest sometimes lies in predicting traits that are difficult or expensive to measure using those that are easy to measure, or the aim may be to improve all these correlated traits simultaneously (Henderson and Quaas 1976; Calus and Veerkamp 2011; Jiang et al. 2015). Along these lines, there is evidence of the usefulness of multi-trait modelling. For example, Jia and Jannink (2012) showed that, compared to single-trait modelling, the prediction accuracy of low-heritability traits can be increased by using a multi-trait model when the degree of correlation between traits is at least moderate. They also found that multi-trait models had better prediction accuracy when phenotypes were not available on all individuals and traits. Joint modelling has also been found useful for increasing prediction accuracy when the traits of interest are not measured on the individuals of the testing set but were observed, together with other traits, on the individuals of the training set (Pszczola et al. 2013).

6.8.1 Genomic Multi-trait Linear Model

The genomic multi-trait linear model adopts the structure of a univariate genomic linear model for each trait, but allows the genotypic effects and residuals of the traits measured on the same individual to be correlated. Assuming that nT traits (Yjt, t = 1, …, nT) are measured on individual j, this model assumes that

$$ \left[\begin{array}{c}{Y}_{j1}\\ {}{Y}_{j2}\\ {}\vdots \\ {}{Y}_{j{n}_T}\end{array}\right]=\left[\begin{array}{c}{\mu}_1\\ {}{\mu}_2\\ {}\vdots \\ {}{\mu}_{n_T}\end{array}\right]+\left[\begin{array}{c}{\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{\beta}}_1\\ {}{\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{\beta}}_2\\ {}\vdots \\ {}{\boldsymbol{x}}_j^{\mathrm{T}}{\boldsymbol{\beta}}_{n_T}\end{array}\right]+\left[\begin{array}{c}{g}_{j1}\\ {}{g}_{j2}\\ {}\vdots \\ {}{g}_{jn_T}\end{array}\right]+\left[\begin{array}{c}{\epsilon}_{j1}\\ {}{\epsilon}_{j2}\\ {}\vdots \\ {}{\epsilon}_{jn_T}\end{array}\right], $$

where μt, t = 1, …, nT, are the trait-specific intercepts, xj is a vector of covariates common to all traits, gjt, t = 1, …, nT, are the trait-specific genotype effects, and ϵjt, t = 1, …, nT, are the random error terms corresponding to each trait. In matrix notation, it can be expressed as

$$ {\boldsymbol{Y}}_j=\boldsymbol{\mu} +{\boldsymbol{B}}^{\mathrm{T}}\ {\boldsymbol{x}}_j+{\boldsymbol{g}}_j+{\boldsymbol{\epsilon}}_j, $$
(6.8)

where \( {\boldsymbol{Y}}_j={\left[{Y}_{j1},\dots, {Y}_{j{n}_T}\right]}^{\mathrm{T}} \), \( \boldsymbol{\mu} ={\left[{\mu}_1,\dots, {\mu}_{n_T}\right]}^{\mathrm{T}} \), \( \boldsymbol{B}=\left[{\boldsymbol{\beta}}_1,\kern0.5em \dots, \kern0.5em {\boldsymbol{\beta}}_{n_T}\right] \), \( {\boldsymbol{g}}_j={\left[{g}_{j1},\dots, {g}_{j{n}_T}\right]}^{\mathrm{T}}, \) and \( {\boldsymbol{\epsilon}}_j={\left[{\epsilon}_{j1},\dots, {\epsilon}_{j{n}_T}\right]}^{\mathrm{T}}. \) The residual vectors are assumed to be independent with multivariate normal distribution, that is, \( {\boldsymbol{\epsilon}}_j\sim {N}_{n_T}\left(\mathbf{0},\boldsymbol{R}\right), \) and all the random genotype effects are assumed to be \( \boldsymbol{g}={\left[{\boldsymbol{g}}_1^{\mathrm{T}},\dots, {\boldsymbol{g}}_J^{\mathrm{T}}\right]}^{\mathrm{T}}\sim N\left(\mathbf{0},\boldsymbol{G}\mathbf{\bigotimes }{\boldsymbol{\Sigma}}_T\right), \) with ⊗ being the Kronecker product. For a full Bayesian specification of this model, we suppose that β = vec(B) ∼ N(β0, Σβ); that is, marginally, for the fixed effects of each trait, a multivariate normal prior distribution is adopted, \( {\boldsymbol{\beta}}_t\sim {N}_p\left({\boldsymbol{\beta}}_{t0},{\boldsymbol{\Sigma}}_{{\boldsymbol{\beta}}_t}\right),t=1,\dots, {n}_T; \) a flat prior for the intercepts, f(μ) ∝ 1; and independent inverse Wishart distributions for the residual covariance matrix R and for ΣT, that is, ΣT ∼ IW(vT, ST) and R ∼ IW(vR, SR).

Putting all the information together where the measured traits of each individual (Yj) are accommodated in the rows of a matrix response (Y), model (6.8) can be expressed as

$$ \boldsymbol{Y}={\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}+\boldsymbol{XB}+{\boldsymbol{Z}}_1{\boldsymbol{b}}_1+\boldsymbol{E}, $$
(6.9)

where \( \boldsymbol{Y}={\left[{\boldsymbol{Y}}_1,\dots, {\boldsymbol{Y}}_J\right]}^{\mathrm{T}} \), X = [x1, …, xJ]T, b1 = [g1, …, gJ]T, and E = [ϵ1, …, ϵJ]T. Note that under this notation, \( {\boldsymbol{E}}^{\mathrm{T}}\sim M{N}_{n_T\times J}\left(\mathbf{0},\boldsymbol{R},{\boldsymbol{I}}_J\right) \) or equivalently \( \boldsymbol{E}\sim M{N}_{J\times {n}_T}\left(\mathbf{0},{\boldsymbol{I}}_J,\boldsymbol{R}\right) \), and \( {\boldsymbol{b}}_1^{\mathrm{T}}\sim {MN}_{n_T\times J}\left(\mathbf{0},{\boldsymbol{\Sigma}}_T,\boldsymbol{G}\right) \) or \( {\boldsymbol{b}}_1\sim {MN}_{J\times {n}_T}\left(\mathbf{0},\boldsymbol{G},{\boldsymbol{\Sigma}}_T\right) \). Here \( \boldsymbol{Z}\sim {MN}_{J\times {n}_T}\left(\boldsymbol{M},\boldsymbol{U},\boldsymbol{V}\right) \) means that the random matrix Z follows the matrix-variate normal distribution with parameters M, U, and V, or equivalently, that the JnT-dimensional random vector vec(Z) is distributed as \( {N}_{J{n}_T}\left(\mathrm{vec}\left(\boldsymbol{M}\right),\boldsymbol{V}\otimes \boldsymbol{U}\right), \) with vec(·) denoting the vectorization operator that stacks the columns of a matrix into a single vector. Note that when ΣT and R are diagonal matrices, model (6.9) is equivalent to separately fitting a univariate GBLUP model to each trait.
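To make the matrix-variate notation concrete, the following minimal R sketch draws one matrix Z ~ MN(M, U, V) through the vec(Z) ~ N(vec(M), V ⊗ U) equivalence stated above; the dimensions and covariance matrices are illustrative choices, not taken from any data set.

# One draw from Z ~ MN(M, U, V) via vec(Z) ~ N(vec(M), V %x% U)
set.seed(1)
J <- 4; nT <- 3
M <- matrix(0, J, nT)                    # mean matrix
U <- diag(J)                             # row (individual) covariance
V <- 0.5^abs(outer(1:nT, 1:nT, "-"))     # column (trait) covariance, AR(1)-like
z <- as.vector(M) + t(chol(kronecker(V, U))) %*% rnorm(J * nT)
Z <- matrix(z, J, nT)                    # one matrix-variate normal draw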

The conditional distribution of all traits is given by

$$ {\displaystyle \begin{array}{c}f\left(\boldsymbol{Y}|\boldsymbol{\mu}, \boldsymbol{B},{\boldsymbol{b}}_1,{\boldsymbol{\Sigma}}_T,\boldsymbol{R}\right)=\frac{{\left|\boldsymbol{R}\right|}^{-\frac{J}{2}}}{{\left(2\pi \right)}^{\frac{J{n}_T}{2}}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{R}}^{-1}{\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)}^{\mathrm{T}}{\boldsymbol{I}}_J\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right]\right\}\\ {}=\frac{{\left|\boldsymbol{R}\right|}^{-\frac{J}{2}}}{{\left(2\pi \right)}^{\frac{J{n}_T}{2}}}\exp \left\{-\frac{1}{2}{\left[\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right]}^{\mathrm{T}}\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{I}}_J\right)\left[\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right]\right\}\end{array}} $$

and the joint posterior of parameters μ, B, b1, ΣT, and R is given by

$$ f\left(\boldsymbol{\mu}, \boldsymbol{B},{\boldsymbol{b}}_{\mathbf{1}},{\boldsymbol{\Sigma}}_T,\boldsymbol{R}|\boldsymbol{Y}\right)\propto f\left(\boldsymbol{Y}|\boldsymbol{\mu}, \boldsymbol{B},{\boldsymbol{b}}_1,{\boldsymbol{\Sigma}}_T,\mathbf{R}\right)f\left({\boldsymbol{b}}_1|{\boldsymbol{\Sigma}}_T\right)f\left({\boldsymbol{\Sigma}}_T\right)f\left(\boldsymbol{\beta} \right)f\left(\boldsymbol{R}\right), $$

where f(b1| ΣT) denotes the conditional distribution of the genotype effects, and f(ΣT), f(β), and f(R) denote the prior densities of ΣT, β, and R, respectively. This joint posterior distribution does not have a closed form; for this reason, the full conditional distributions needed for a Gibbs sampler implementation are derived next.
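To fix ideas before deriving the full conditionals, the following hedged R sketch evaluates the logarithm of the conditional density above at given parameter values; all object names (Y, X, B, Z1, b1, mu, R) are assumed to already be defined with the dimensions of model (6.9).

# log-likelihood of model (6.9) at (mu, B, b1, R)
E <- Y - tcrossprod(rep(1, nrow(Y)), mu) - X %*% B - Z1 %*% b1   # residual matrix
J <- nrow(Y); nT <- ncol(Y)
loglik <- -0.5 * (J * nT * log(2 * pi) +
                  J * as.numeric(determinant(R)$modulus) +       # J * log|R|
                  sum(diag(solve(R) %*% crossprod(E))))          # tr(R^{-1} E^T E)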

Let β0 and Σβ be the prior mean and variance of β = vec(B). Because tr(AB) = vec(AT)Tvec(B) = vec(B)Tvec(AT) and vec (AXB) = (BT ⊗ A)vec(X), we have that

$$ {\displaystyle \begin{array}{c}P\left(\boldsymbol{\beta} |-\right)\propto f\left(\boldsymbol{Y}|\boldsymbol{\mu}, \boldsymbol{B},{\boldsymbol{b}}_1,{\boldsymbol{\Sigma}}_T,\mathbf{R}\right)f\left(\boldsymbol{\beta} \right)\\ {}\propto \exp \left\{\begin{array}{c}-\frac{1}{2}{\left[\mathrm{vec}\left(\boldsymbol{Y}-{1}_J{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)-\left({\boldsymbol{I}}_{n_T}\otimes \boldsymbol{X}\right)\mathrm{vec}\left(\boldsymbol{B}\right)\right]}^{\mathrm{T}}\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{I}}_J\right)\\ {}\times \left[\mathrm{vec}\left(\boldsymbol{Y}-{1}_J{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)-\left({\boldsymbol{I}}_{n_T}\otimes \boldsymbol{X}\right)\mathrm{vec}\left(\boldsymbol{B}\right)\right]-\frac{1}{2}{\left(\boldsymbol{\beta} -{\boldsymbol{\beta}}_0\right)}^{\mathrm{T}}{\boldsymbol{\Sigma}}_{\beta}^{-1}\left(\boldsymbol{\beta} -{\boldsymbol{\beta}}_0\right)\end{array}\right\}\\ {}\propto \exp \left\{\begin{array}{c}-\frac{1}{2}{\left[\mathrm{vec}\left(\boldsymbol{Y}-{1}_J{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)-\left({\boldsymbol{I}}_{n_T}\otimes \boldsymbol{X}\right)\boldsymbol{\beta} \right]}^{\mathrm{T}}\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{I}}_J\right)\\ {}\times \left[\mathrm{vec}\left(\boldsymbol{Y}-{1}_J{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)-\left({\boldsymbol{I}}_{n_T}\otimes \boldsymbol{X}\right)\boldsymbol{\beta} \right]-\frac{1}{2}{\left(\boldsymbol{\beta} -{\boldsymbol{\beta}}_0\right)}^{\mathrm{T}}{\boldsymbol{\Sigma}}_{\beta}^{-1}\left(\boldsymbol{\beta} -{\boldsymbol{\beta}}_0\right)\end{array}\right\}\\ {}\propto \exp \left\{-\frac{1}{2}{\left[\boldsymbol{\beta} -{\tilde{\boldsymbol{\beta}}}_0\right]}^{\mathrm{T}}{\tilde{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}^{-1}\left[\boldsymbol{\beta} -{\tilde{\boldsymbol{\beta}}}_0\right]\right\},\end{array}} $$

where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}={\left[{\boldsymbol{\Sigma}}_{\beta}^{-1}+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\right)\right]}^{-1} \) and \( {\overset{\sim }{\boldsymbol{\beta}}}_0={\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}\left[{\boldsymbol{\Sigma}}_{\beta}^{-1}{\boldsymbol{\beta}}_0+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{X}}^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right] \). So, the full conditional distribution of β is \( {N}_{p{n}_T}\left({\overset{\sim }{\boldsymbol{\beta}}}_0,{\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}\right) \). Similarly, the full conditional distribution of g = vec(b1) is \( {N}_{J{n}_T}\left(\overset{\sim }{\boldsymbol{g}},\overset{\sim }{\boldsymbol{G}}\right) \), with \( \overset{\sim }{\boldsymbol{G}}={\left[\left({\boldsymbol{\Sigma}}_T^{-1}\otimes {\boldsymbol{G}}^{-1}\right)+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_1^{\mathrm{T}}{\boldsymbol{Z}}_1\right)\right]}^{-1} \) and \( \overset{\sim }{\boldsymbol{g}}=\overset{\sim }{\boldsymbol{G}}\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_1^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}\right). \) Now, because \( \mathrm{vec}\left({\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}\right)=\left({\boldsymbol{I}}_{n_T}\otimes {\mathbf{1}}_J\right)\boldsymbol{\mu} \), similarly as before, the full conditional of μ is \( {N}_{n_T}\left(\overset{\sim }{\boldsymbol{\mu}},{\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu}\right) \), where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu }={J}^{-1}\boldsymbol{R} \) and \( \overset{\sim }{\boldsymbol{\mu}}={\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu}\left({\boldsymbol{R}}^{-1}\otimes {\mathbf{1}}_J^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right) \); note that \( \overset{\sim }{\boldsymbol{\mu}} \) simplifies to the vector of column means of Y − XB − Z1b1.

The full conditional distribution of ΣT is

$$ {\displaystyle \begin{array}{c}P\left({\boldsymbol{\Sigma}}_T|-\right)\propto P\left({\boldsymbol{b}}_1|{\boldsymbol{\Sigma}}_T\right)P\left({\boldsymbol{\Sigma}}_T\right)\\ {}\propto {\left|{\boldsymbol{\Sigma}}_T\right|}^{-\frac{J}{2}}{\left|\boldsymbol{G}\right|}^{-\frac{n_T}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1{\boldsymbol{\Sigma}}_T^{-1}\right]\right\}P\left({\boldsymbol{\Sigma}}_T\right)\\ {}\propto {\left|{\boldsymbol{\Sigma}}_T\right|}^{-\frac{v_T+J+{n}_T+1}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[\left({\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1+{\boldsymbol{S}}_T\right){\boldsymbol{\Sigma}}_T^{-1}\right]\right\}.\end{array}} $$

From here we have that \( {\boldsymbol{\Sigma}}_T\mid -\sim \mathrm{IW}\left({v}_T+J,{\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1+{\boldsymbol{S}}_T\right). \) Now, because

$$ {\displaystyle \begin{array}{c}P\left(\boldsymbol{R}|-\right)\propto f\left(\boldsymbol{Y}|\boldsymbol{\mu}, \boldsymbol{B},{\boldsymbol{b}}_1,{\boldsymbol{\Sigma}}_T,\boldsymbol{R}\right)f\left(\boldsymbol{R}\right)\\ {}\propto {\left|\boldsymbol{R}\right|}^{-\frac{J}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{R}}^{-1}{\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)}^{\mathrm{T}}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right]\right\}\\ {}\times {\left|\boldsymbol{R}\right|}^{-\frac{v_R+{n}_T+1}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left({\boldsymbol{S}}_R{\boldsymbol{R}}^{-1}\right)\right\}\\ {}\propto {\left|\boldsymbol{R}\right|}^{-\frac{v_R+J+{n}_T+1}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[\left({\boldsymbol{S}}_R+{\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)}^{\mathrm{T}}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right){\boldsymbol{R}}^{-1}\right]\right\},\end{array}} $$

the full conditional distribution of R is \( \mathrm{IW}\left({\overset{\sim }{v}}_R,{\overset{\sim }{\boldsymbol{S}}}_R\right) \), where \( {\overset{\sim }{v}}_R={v}_R+J \) and \( {\overset{\sim }{\boldsymbol{S}}}_R={\boldsymbol{S}}_R+{\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)}^{\mathrm{T}}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right). \)
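The inverse Wishart draws required in the sampler summarized below can be generated with the base function stats::rWishart, using the fact that if W ~ Wishart(v, S−1), then W−1 ~ IW(v, S); the helper below is a sketch under this convention.

# draw one matrix from IW(v, S) using stats::rWishart
riw <- function(v, S) solve(rWishart(1, v, solve(S))[, , 1])
# e.g., Sigma_T | - can be drawn as riw(vT + J, crossprod(b1, solve(G) %*% b1) + ST)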

In summary, a Gibbs sampler exploration of the joint posterior distribution of μ, β, g, ΣT, and R can be carried out with the following steps (a compact R sketch is given after the list):

1. Simulate β from a multivariate normal distribution \( {N}_{p{n}_T}\left({\overset{\sim }{\boldsymbol{\beta}}}_0,{\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}\right) \), where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}={\left[{\boldsymbol{\Sigma}}_{\beta}^{-1}+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\right)\right]}^{-1} \) and \( {\overset{\sim }{\boldsymbol{\beta}}}_0={\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}\left[{\boldsymbol{\Sigma}}_{\beta}^{-1}{\boldsymbol{\beta}}_0+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{X}}^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)\right] \).

2. Simulate μ from \( {N}_{n_T}\left(\overset{\sim }{\boldsymbol{\mu}},{\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu}\right), \) where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu }={J}^{-1}\boldsymbol{R} \) and \( \overset{\sim }{\boldsymbol{\mu}}={\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu}\left({\boldsymbol{R}}^{-1}\otimes {\mathbf{1}}_J^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right) \).

3. Simulate g = vec(b1) from \( {N}_{J{n}_T}\left(\overset{\sim }{\boldsymbol{g}},\overset{\sim }{\boldsymbol{G}}\right), \) where \( \overset{\sim }{\boldsymbol{G}}={\left[\left({\boldsymbol{\Sigma}}_T^{-1}\otimes {\boldsymbol{G}}^{-1}\right)+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_1^{\mathrm{T}}{\boldsymbol{Z}}_1\right)\right]}^{-1} \) and \( \overset{\sim }{\boldsymbol{g}}=\overset{\sim }{\boldsymbol{G}}\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_1^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}\right). \)

4. Simulate ΣT from \( \mathrm{IW}\left({v}_T+J,{\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1+{\boldsymbol{S}}_T\right) \).

5. Simulate R from \( \mathrm{IW}\left({\overset{\sim }{v}}_R,{\overset{\sim }{\boldsymbol{S}}}_R\right), \) where \( {\overset{\sim }{v}}_R={v}_R+J \) and \( {\overset{\sim }{\boldsymbol{S}}}_R={\boldsymbol{S}}_R+{\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right)}^{\mathrm{T}}\left(\boldsymbol{Y}-{\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right). \)

6. Return to step 1 or terminate when the chain length is adequate to meet convergence diagnostics and the required sample size has been reached.
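Putting the steps together, the following is a minimal, non-optimized R sketch of this Gibbs sampler for model (6.9); it reuses the riw() helper defined above, and all inputs (Y, X, Z1, G and the hyperparameters beta0, Sigma_beta, vT, ST, vR, SR) are assumed to be supplied by the user.

gibbs_mt <- function(Y, X, Z1, G, beta0, Sigma_beta, vT, ST, vR, SR, nIter = 1000) {
  J <- nrow(Y); nT <- ncol(Y); p <- ncol(X)
  # initial values
  mu <- colMeans(Y); B <- matrix(0, p, nT); b1 <- matrix(0, J, nT)
  SigmaT <- diag(nT); R <- diag(nT)
  Ginv <- solve(G); Sb_inv <- solve(Sigma_beta)
  for (it in 1:nIter) {
    Rinv <- solve(R)
    # Step 1: beta = vec(B) | -
    res <- Y - tcrossprod(rep(1, J), mu) - Z1 %*% b1
    Vb <- solve(Sb_inv + kronecker(Rinv, crossprod(X)))
    mb <- Vb %*% (Sb_inv %*% beta0 + kronecker(Rinv, t(X)) %*% as.vector(res))
    B  <- matrix(mb + t(chol(Vb)) %*% rnorm(p * nT), p, nT)
    # Step 2: mu | - (its mean reduces to the column means of the residuals)
    mu <- colMeans(Y - X %*% B - Z1 %*% b1) + drop(t(chol(R / J)) %*% rnorm(nT))
    # Step 3: g = vec(b1) | -
    res <- Y - tcrossprod(rep(1, J), mu) - X %*% B
    Vg <- solve(kronecker(solve(SigmaT), Ginv) + kronecker(Rinv, crossprod(Z1)))
    mg <- Vg %*% (kronecker(Rinv, t(Z1)) %*% as.vector(res))
    b1 <- matrix(mg + t(chol(Vg)) %*% rnorm(J * nT), J, nT)
    # Steps 4 and 5: covariance matrices from their inverse Wishart conditionals
    SigmaT <- riw(vT + J, crossprod(b1, Ginv %*% b1) + ST)
    E <- Y - tcrossprod(rep(1, J), mu) - X %*% B - Z1 %*% b1
    R <- riw(vR + J, SR + crossprod(E))
  }
  # for clarity only the last draw is returned; in practice the whole chain
  # is stored after a burn-in period
  list(mu = mu, B = B, b1 = b1, SigmaT = SigmaT, R = R)
}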

An implementation of this model is available in the GitHub version of the BGLR R package, which can be obtained from https://github.com/gdlc/BGLR-R and installed directly from the R console by running the following commands: install.packages('devtools'); library(devtools); install_git('https://github.com/gdlc/BGLR-R'). This implementation also uses a flat prior for the fixed effect regression coefficients β, in which case the corresponding full conditional of this parameter is the same as in step 1 of the Gibbs sampler given before, but removing \( {\boldsymbol{\Sigma}}_{\beta}^{-1} \) and \( {\boldsymbol{\Sigma}}_{\beta}^{-1}{\boldsymbol{\beta}}_0 \) from \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}} \) and \( {\overset{\sim }{\boldsymbol{\beta}}}_0 \), respectively. Specifically, model (6.8) can be implemented with this version of the BGLR package as follows:

ETA = list(list(X = X, model = 'FIXED'),
           list(K = Z1 %*% G %*% t(Z1), model = 'RKHS'))
A = Multitrait(y = Y, ETA = ETA,
               resCov = list(type = 'UN', S0 = SR, df0 = vR),
               nIter = nI, burnIn = nb)

The first argument of the Multitrait function is the response variable, which in many applications is a phenotype matrix in which each row contains the measurements of the nT traits on one individual. The second argument is a list of predictors: the first sub-list specifies the design matrix and the prior model for the fixed effects part of the predictor in model (6.9), and the second sub-list specifies the distribution of the random genetic effects b1, where the covariance structure K = Z1GZ1T is built from the genomic relationship matrix G, which accounts for the similarity between individuals based on marker information, and df0 = vT and S0 = ST are the degrees of freedom parameter (vT) and the scale matrix parameter (ST) of the inverse Wishart prior distribution for ΣT, respectively. In the third argument (resCov), S0 and df0 are the scale matrix parameter (SR) and the degrees of freedom parameter (vR) of the inverse Wishart prior distribution for R. The last two arguments are the required number of iterations (nI) and the burn-in period (nb) for running the Gibbs sampler.
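As a hedged illustration of how the inputs of the call above might be assembled, suppose M_raw is a J × p matrix of marker codes and dat_F a data frame with a line identifier GID (both assumed objects, not part of the package): a VanRaden-type genomic relationship matrix and the incidence matrix of lines can then be built as follows.

M <- scale(M_raw)                  # centered and scaled markers (assumption)
G <- tcrossprod(M) / ncol(M)       # VanRaden-type genomic relationship matrix
Z1 <- model.matrix(~0 + GID, data = dat_F)   # line incidence (GID assumed a factor)
K <- Z1 %*% G %*% t(Z1)            # covariance structure passed to the RKHS term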

Similarly to the univariate case, model (6.9) can be equivalently described and implemented as a multivariate Ridge regression model, as follows:

$$ \boldsymbol{Y}={\mathbf{1}}_J{\boldsymbol{\mu}}^{\mathrm{T}}+\boldsymbol{XB}+{\boldsymbol{X}}_1{\boldsymbol{B}}_1+\boldsymbol{E}, $$
(6.10)

where X1 = Z1LG, \( \boldsymbol{G}={\boldsymbol{L}}_G{\boldsymbol{L}}_G^{\mathrm{T}} \) is the Cholesky factorization of G, \( {\boldsymbol{B}}_1={\boldsymbol{L}}_G^{-1}{\boldsymbol{b}}_1\sim {MN}_{J\times {n}_T}\left(\mathbf{0},{\boldsymbol{I}}_J,{\boldsymbol{\Sigma}}_T\right) \), and the specifications of the rest of the parameters and prior distributions are the same as given for model (6.8). A Gibbs sampler implementation of this model is very similar to the one described before, with minor modifications. Indeed, an implementation with the same Multitrait function is as follows:

LG = t(chol(G))
X1 = Z1 %*% LG
ETA = list(list(X = X, model = 'FIXED'),
           list(X = X1, model = 'BRR'))
A = Multitrait(y = Y, ETA = ETA,
               resCov = list(type = 'UN', S0 = SR, df0 = vR),
               nIter = nI, burnIn = nb)

with the only change in the second sub-list of the predictor, where now the design matrix X1 and the Ridge regression model (BRR) are specified.
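A quick numeric check of this reparameterization, assuming Z1, G, and a draw b1 are available as R objects: since B1 = LG−1b1, the predictor X1B1 reproduces Z1b1 exactly (up to rounding).

LG <- t(chol(G))                   # lower triangular factor, G = LG %*% t(LG)
X1 <- Z1 %*% LG
B1 <- solve(LG) %*% b1
max(abs(X1 %*% B1 - Z1 %*% b1))    # should be numerically zero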

Example 3

To illustrate the prediction power of these models and how to implement them in the R software, we considered a reduced data set consisting of 50 wheat lines grown in two environments. For each individual, two traits were measured: FLRSDS and MIXTIM. The evaluation was done with a five-fold cross-validation in which lines were observed in some environments for all traits but were missing for all traits in the other environments. Model (6.9) was fitted with the environment effect treated as fixed.
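The masking scheme of this cross-validation can be sketched in R as follows; Y is assumed to hold the response matrix, and the fold assignment shown is illustrative.

set.seed(2)
folds <- sample(rep(1:5, length.out = nrow(Y)))   # assign records to 5 folds
for (k in 1:5) {
  Y_trn <- Y
  Y_trn[folds == k, ] <- NA     # all traits set to missing for the test records
  # fit the model on Y_trn, then compare predictions against Y[folds == k, ]
}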

The results are shown in Table 6.4, where the average (standard deviation) across partitions of two performance criteria is reported for each trait in each environment: the average Pearson’s correlation (PC) and the mean squared error of prediction (MSE). Table 6.4 shows good correlation performance for both traits in both environments, with better predictions obtained in environment 2 under both criteria. The larger magnitude of the MSE for the first trait is mainly because its measurement scale is greater than that of the second trait.

Table 6.4 Average Pearson’s correlation (PC) and mean squared error of prediction (MSE) between predicted and observed values across five random partitions where lines were evaluated in some environments with all traits but were missing for all traits in other environments; SD represents the standard deviation across partitions

The R codes to reproduce these results (Table 6.4) are shown in Appendix 5.

6.9 Bayesian Genomic Multi-trait and Multi-environment Model (BMTME)

Model (6.9) does not take into account the possible trait–genotype–environment interaction (T × G × E) when environment information is available. An extension of this model is the one proposed by Montesinos-López et al. (2016), who added this interaction term to allow the trait-specific genetic effects (gj) to vary across environments. If the information of nT traits of J lines is collected in I environments, this model is given by

$$ \boldsymbol{Y}={\mathbf{1}}_{IJ}{\boldsymbol{\mu}}^{\mathrm{T}}+\boldsymbol{XB}+{\boldsymbol{Z}}_1{\boldsymbol{b}}_1+{\boldsymbol{Z}}_2{\boldsymbol{b}}_2+\boldsymbol{E}, $$
(6.11)

where \( \boldsymbol{Y}={\left[{\boldsymbol{Y}}_1,\dots, {\boldsymbol{Y}}_{IJ}\right]}^{\mathrm{T}} \), X = [x1, …, xIJ]T, Z1 and Z2 are the incidence matrices of lines and of environment–line combinations, respectively, b1 = [g1, …, gJ]T, b2 = [g21, …, g2IJ]T, and E = [ϵ1, …, ϵIJ]T. Here, \( {\boldsymbol{b}}_2\mid {\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\sim M{N}_{IJ\times {n}_T}\left(\mathbf{0},{\boldsymbol{\Sigma}}_E\mathbf{\bigotimes}\boldsymbol{G},{\boldsymbol{\Sigma}}_T\right) \), and, similar to model (6.9), \( {\boldsymbol{b}}_1\mid {\boldsymbol{\Sigma}}_T\sim {MN}_{J\times {n}_T}\left(\mathbf{0},\boldsymbol{G},{\boldsymbol{\Sigma}}_T\right) \) and \( \boldsymbol{E}\sim M{N}_{IJ\times {n}_T}\left(\mathbf{0},{\boldsymbol{I}}_{IJ},\boldsymbol{R}\right) \). The complete Bayesian specification of this model assumes independent multivariate normal distributions for the columns of B, that is, for the fixed effects of each trait a multivariate normal prior distribution is adopted, \( {\boldsymbol{\beta}}_t\sim {N}_p\left({\boldsymbol{\beta}}_{t0},{\boldsymbol{\Sigma}}_{{\boldsymbol{\beta}}_t}\right),t=1,\dots, {n}_T; \) a flat prior for the intercepts, f(μ) ∝ 1; independent inverse Wishart distributions for the residual covariance matrix R and for ΣT, ΣT ∼ IW(vT, ST) and R ∼ IW(vR, SR); and also an inverse Wishart distribution for ΣE, ΣE ∼ IW(vE, SE).
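The covariance structure of the interaction term can be made explicit with a small R sketch: since b2 | ΣT, ΣE ~ MN(0, ΣE ⊗ G, ΣT), the vector vec(b2) is distributed N(0, ΣT ⊗ ΣE ⊗ G); all matrices below (SigmaT, SigmaE, G) are assumed to be given.

Omega <- kronecker(SigmaT, kronecker(SigmaE, G))   # covariance of vec(b2)
b2 <- matrix(t(chol(Omega)) %*% rnorm(nrow(Omega)),
             ncol = ncol(SigmaT))                  # one draw of b2 (IJ x nT)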

The full conditional distributions of μ, B, b1, b2, and R can be derived as in model (6.9). The full conditional distribution of ΣT is

$$ {\displaystyle \begin{array}{c}P\left({\boldsymbol{\Sigma}}_T|-\right)\propto P\left({\boldsymbol{b}}_1|{\boldsymbol{\Sigma}}_T\right)P\left({\boldsymbol{b}}_2|{\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\right)P\left({\boldsymbol{\Sigma}}_T\right)\\ {}\propto {\left|{\boldsymbol{\Sigma}}_T\right|}^{-\frac{J}{2}}{\left|\boldsymbol{G}\right|}^{-\frac{n_T}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1{\boldsymbol{\Sigma}}_T^{-1}\right]\right\}\\ {}\times {\left|{\boldsymbol{\Sigma}}_T\right|}^{-\frac{IJ}{2}}{\left|{\boldsymbol{\Sigma}}_E\otimes \boldsymbol{G}\right|}^{-\frac{n_T}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{b}}_2^{\mathrm{T}}\left({\boldsymbol{\Sigma}}_E^{-1}\otimes {\boldsymbol{G}}^{-1}\right){\boldsymbol{b}}_2{\boldsymbol{\Sigma}}_T^{-1}\right]\right\}P\left({\boldsymbol{\Sigma}}_T\right)\\ {}\propto {\left|{\boldsymbol{\Sigma}}_T\right|}^{-\frac{v_T+J+ IJ+{n}_T+1}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[\left({\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1+{\boldsymbol{b}}_2^{\mathrm{T}}\left({\boldsymbol{\Sigma}}_E^{-1}\otimes {\boldsymbol{G}}^{-1}\right){\boldsymbol{b}}_2+{\boldsymbol{S}}_T\right){\boldsymbol{\Sigma}}_T^{-1}\right]\right\},\end{array}} $$

that is, \( {\boldsymbol{\Sigma}}_T\mid -\sim \mathrm{IW}\left({v}_T+J+ IJ,{\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1+{\boldsymbol{b}}_2^{\mathrm{T}}\left({\boldsymbol{\Sigma}}_E^{-1}\otimes {\boldsymbol{G}}^{-1}\right){\boldsymbol{b}}_2+{\boldsymbol{S}}_T\right). \)

Now, let \( {\boldsymbol{b}}_2^{\ast } \) be a JnT × I matrix such that \( \mathrm{vec}\left({\boldsymbol{b}}_2^{\mathrm{T}}\right)=\mathrm{vec}\left({\boldsymbol{b}}_2^{\ast}\right). \) Then, because \( {\boldsymbol{b}}_2^{\mathrm{T}}\mid {\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\sim M{N}_{n_T\times IJ}\left(\mathbf{0},{\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\boldsymbol{\bigotimes}\boldsymbol{G}\right) \), \( \mathrm{vec}\left({\boldsymbol{b}}_2^{\mathrm{T}}\right)\mid {\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\sim N\left(\mathbf{0},{\boldsymbol{\Sigma}}_E\boldsymbol{\bigotimes}\left(\boldsymbol{G}\boldsymbol{\bigotimes }{\boldsymbol{\Sigma}}_T\right)\right), \) and \( {\boldsymbol{b}}_2^{\ast}\mid {\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\sim M{N}_{J{n}_T\times I}\left(\mathbf{0},\boldsymbol{G}\boldsymbol{\bigotimes }{\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\right) \), the full conditional posterior distribution of ΣE is

$$ {\displaystyle \begin{array}{c}P\left({\boldsymbol{\Sigma}}_E|-\right)\propto P\left({\boldsymbol{b}}_2^{\ast }|{\boldsymbol{\Sigma}}_T,{\boldsymbol{\Sigma}}_E\right)P\left({\boldsymbol{\Sigma}}_E\right)\\ {}\propto {\left|{\boldsymbol{\Sigma}}_E\right|}^{-\frac{J{n}_T}{2}}{\left|\boldsymbol{G}\otimes {\boldsymbol{\Sigma}}_T\right|}^{-\frac{I}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[{\boldsymbol{\Sigma}}_E^{-1}{\boldsymbol{b}}_2^{\ast \mathrm{T}}\left({\boldsymbol{G}}^{-1}\otimes {\boldsymbol{\Sigma}}_T^{-1}\right){\boldsymbol{b}}_2^{\ast}\right]\right\}\\ {}\times {\left|{\boldsymbol{\Sigma}}_E\right|}^{-\frac{\upsilon_E+I+1}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left({\boldsymbol{S}}_E{\boldsymbol{\Sigma}}_E^{-1}\right)\right\}\\ {}\propto {\left|{\boldsymbol{\Sigma}}_E\right|}^{-\frac{\upsilon_E+J{n}_T+I+1}{2}}\exp \left\{-\frac{1}{2}\mathrm{tr}\left[\left({\boldsymbol{b}}_2^{\ast \mathrm{T}}\left({\boldsymbol{G}}^{-1}\otimes {\boldsymbol{\Sigma}}_T^{-1}\right){\boldsymbol{b}}_2^{\ast }+{\boldsymbol{S}}_E\right){\boldsymbol{\Sigma}}_E^{-1}\right]\right\},\end{array}} $$

which means that \( {\boldsymbol{\Sigma}}_E\mid -\sim \mathrm{IW}\left({\upsilon}_E+J{n}_T,{\boldsymbol{b}}_2^{\ast \mathrm{T}}\left({\boldsymbol{G}}^{-1}\otimes {\boldsymbol{\Sigma}}_T^{-1}\right){\boldsymbol{b}}_2^{\ast }+{\boldsymbol{S}}_E\right) \).

A Gibbs sampler to explore the joint posterior distribution of parameters of model (6.11), μ, β, b1,b2, ΣT, ΣE, and R, can be implemented with the following steps:

1. Simulate β from a multivariate normal distribution \( {N}_{p{n}_T}\left({\overset{\sim }{\boldsymbol{\beta}}}_0,{\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}\right) \), where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}={\left[{\boldsymbol{\Sigma}}_{\beta}^{-1}+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\right)\right]}^{-1} \) and \( {\overset{\sim }{\boldsymbol{\beta}}}_0={\overset{\sim }{\boldsymbol{\Sigma}}}_{\boldsymbol{\beta}}\left[{\boldsymbol{\Sigma}}_{\beta}^{-1}{\boldsymbol{\beta}}_0+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{X}}^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_{IJ}{\boldsymbol{\mu}}^{\mathrm{T}}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1-{\boldsymbol{Z}}_2{\boldsymbol{b}}_2\right)\right] \).

2. Simulate μ from \( {N}_{n_T}\left(\overset{\sim }{\boldsymbol{\mu}},{\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu}\right) \), where \( {\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu }={(IJ)}^{-1}\boldsymbol{R} \) and \( \overset{\sim }{\boldsymbol{\mu}}={\overset{\sim }{\boldsymbol{\Sigma}}}_{\mu}\left({\boldsymbol{R}}^{-1}\otimes {\mathbf{1}}_{IJ}^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1-{\boldsymbol{Z}}_2{\boldsymbol{b}}_{\mathbf{2}}\right) \).

3. Simulate g1 = vec(b1) from \( {N}_{J{n}_T}\left({\overset{\sim }{\boldsymbol{g}}}_1,\overset{\sim }{\boldsymbol{G}}\right) \), where \( \overset{\sim }{\boldsymbol{G}}={\left[\left({\boldsymbol{\Sigma}}_T^{-1}\otimes {\boldsymbol{G}}^{-1}\right)+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_1^{\mathrm{T}}{\boldsymbol{Z}}_1\right)\right]}^{-1} \) and \( {\overset{\sim }{\boldsymbol{g}}}_1=\overset{\sim }{\boldsymbol{G}}\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_1^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_{IJ}{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_2{\boldsymbol{b}}_2\right). \)

4. Simulate g2 = vec(b2) from \( {N}_{IJ{n}_T}\left({\overset{\sim }{\boldsymbol{g}}}_2,{\overset{\sim }{\boldsymbol{G}}}_2\right) \), where \( {\overset{\sim }{\boldsymbol{G}}}_2={\left[\left({\boldsymbol{\Sigma}}_T^{-1}\otimes {\boldsymbol{\Sigma}}_E^{-1}\bigotimes {\boldsymbol{G}}^{-1}\right)+\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_2^{\mathrm{T}}{\boldsymbol{Z}}_2\right)\right]}^{-1} \) and \( {\overset{\sim }{\boldsymbol{g}}}_2={\overset{\sim }{\boldsymbol{G}}}_2\left({\boldsymbol{R}}^{-1}\otimes {\boldsymbol{Z}}_2^{\mathrm{T}}\right)\mathrm{vec}\left(\boldsymbol{Y}-{\mathbf{1}}_{IJ}{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1\right). \)

5. Simulate ΣT from \( \mathrm{IW}\left({v}_T+J+ IJ,{\boldsymbol{b}}_1^{\mathrm{T}}{\boldsymbol{G}}^{-1}{\boldsymbol{b}}_1+{\boldsymbol{b}}_2^{\mathrm{T}}\left({\boldsymbol{\Sigma}}_E^{-1}\otimes {\boldsymbol{G}}^{-1}\right){\boldsymbol{b}}_2+{\boldsymbol{S}}_T\right) \).

6. Simulate ΣE from \( \mathrm{IW}\left({\upsilon}_E+J{n}_T,{\boldsymbol{b}}_2^{\ast \mathrm{T}}\left({\boldsymbol{G}}^{-1}\otimes {\boldsymbol{\Sigma}}_T^{-1}\right){\boldsymbol{b}}_2^{\ast }+{\boldsymbol{S}}_E\right) \) (see the sketch after this list).

7. Simulate R from \( \mathrm{IW}\left({\overset{\sim }{v}}_R,{\overset{\sim }{\boldsymbol{S}}}_R\right) \), where \( {\overset{\sim }{v}}_R={v}_R+ IJ \) and \( {\overset{\sim }{\boldsymbol{S}}}_R={\boldsymbol{S}}_R+{\left(\boldsymbol{Y}-{\mathbf{1}}_{IJ}{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1-{\boldsymbol{Z}}_2{\boldsymbol{b}}_2\right)}^{\mathrm{T}}\left(\boldsymbol{Y}-{\mathbf{1}}_{IJ}{\boldsymbol{\mu}}^{\mathrm{T}}-\boldsymbol{XB}-{\boldsymbol{Z}}_1{\boldsymbol{b}}_1-{\boldsymbol{Z}}_2{\boldsymbol{b}}_2\right). \)

8. Return to step 1 or terminate when the chain length is adequate to meet convergence diagnostics and the required sample size has been reached.
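As referenced in step 6, the rearrangement of b2 into b2* and the corresponding ΣE draw can be sketched in R as follows, reusing the riw() helper from before; b2, G, SigmaT, vE, SE and the dimensions I, J, nT are assumed to be available.

# rearrange b2 (IJ x nT) into b2_star (J*nT x I) so that vec(t(b2)) = vec(b2_star)
b2_star <- matrix(as.vector(t(b2)), nrow = J * nT, ncol = I)
SigmaE <- riw(vE + J * nT,
              t(b2_star) %*% kronecker(solve(G), solve(SigmaT)) %*% b2_star + SE)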

A similar Gibbs sampler is implemented in the BMTME R package, with the main difference that this package does not allow specifying a general fixed effects design matrix X, but only the design matrix corresponding to the environment effects; also, the intercept vector μ is not included separately because it is absorbed into the fixed environment effects. Specifically, to fit model (6.11) when the only fixed effects taken into account are the environment effects, the R code using the BMTME package is as follows:

XE = model.matrix(~Env, data = dat_F)
Z1 = model.matrix(~0 + GID, data = dat_F)
Lg = t(chol(G))
Z1_a = Z1 %*% Lg
Z2 = model.matrix(~0 + GID:Env, data = dat_F)
G2 = kronecker(diag(dim(XE)[2]), Lg)
Z2_a = Z2 %*% G2
A = BMTME(Y = Y, X = XE, Z1 = Z1_a, Z2 = Z2_a,
          nIter = nI, burnIn = nb, thin = 2, bs = 50)

where Y is the matrix of response variables in which each row corresponds to the measurements of the nT traits on one individual, XE is the design matrix of the environment effects, Z1 is the incidence matrix of the genetic effects, Z2 is the design matrix of the genotype–environment interaction effects, nI and nb are the required number of iterations and the burn-in period, and bs is the number of blocks to use internally when sampling from vec(b2).

Example 4

To illustrate how to implement this model with the BMTME R package, we considered the data of Example 3, but now the explored model includes the trait–genotype–environment interaction.

The average results of the prediction performance in terms of PC and MSE under the same five-fold cross-validation used in Example 3 are shown in Table 6.5. These results show an improvement in prediction performance with this model in all trait–environment combinations and under both criteria (PC and MSE), except for trait MIXTIM in Env 2, where the MSE is slightly greater than the one obtained with model (6.9), which does not take into account the triple interaction (trait–genotype–environment).

Table 6.5 Average Pearson’s correlation (PC) and mean squared error of prediction (MSE) between predicted and observed values across five random partitions where lines were evaluated in some environments with all traits but are missing for all traits in other environments

The R code used to obtain these results is given in Appendix 5.