5.1 Overview of Linear Mixed Models

Linear mixed models (LMMs) are flexible extensions of linear models in which both fixed and random effects enter the model linearly. They are useful in many disciplines for modeling repeated, longitudinal, or clustered observations, where random effects are introduced to capture correlation and/or random variation among observations in the same group of individuals. Random effects are random values associated with the levels of a random factor, and they often represent random deviations from the population mean and from the linear relationships described by the fixed effects (Pinheiro and Bates 2000; West et al. 2014).

The first formulation of a linear mixed model was applied in the field of astronomy to analyze repeated telescopic observations made at various hourly intervals over a range of nights (West et al. 2014). The mixed model approach goes by various names, depending on the discipline in which it is applied. For example, in the social sciences it is known as a multilevel or hierarchical model and is often used to flexibly model the different levels of grouping present in the data structure (e.g., an impact evaluation of a new teaching method, a survey of job satisfaction, education applications, etc.) (Goldstein 2011; Speelman et al. 2018; Finch et al. 2019).

Other application areas can be found in medicine (health care research; Leyland and Goldstein 2001; Brown and Prescott 2014), agriculture, ecology, industry, and animal science, where this model is often referred to as the random-effects or mixed-effects model (Pinheiro and Bates 2000; Raudenbush and Bryk 2002; Meeker et al. 2011; Zuur et al. 2009). In particular, there is now an increasing number of applications of this model in genomic selection for plant and animal breeding, where molecular markers obtained by genotyping-by-sequencing or other technologies are used to predict breeding values for non-phenotyped lines, so that candidate lines can be selected prior to phenotypic evaluation (Meuwissen et al. 2001; Poland et al. 2012; Cabrera-Bosquet et al. 2012; Araus and Cairns 2014; Crossa et al. 2017; Covarrubias-Pazaran et al. 2018; Wang et al. 2018; Cappa et al. 2019). Indeed, the use of this model in animal science can be traced back to Henderson (1950).

The general univariate linear mixed model (Harville 1977) is provided by the formula

$$ \boldsymbol{Y}=\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{Zb}+\boldsymbol{\epsilon}, $$
(5.1)

where Y is the n × 1 random response vector, X is the n × (p + 1) design matrix for the fixed effects, β = (β0, β1, …, βp)T is the (p + 1) × 1 coefficient vector of fixed effects, b is a q × 1 vector of random effects, Z is the n × q design matrix associated with the random effects, and ϵ is the n × 1 vector of random errors. It is assumed that ϵ is a random vector with mean vector 0 and variance–covariance matrix R, that b is a random vector with mean 0 and variance–covariance matrix D, and that ϵ and b are uncorrelated, Cov(ϵ, b) = 0n × q. In genomic applications, b often includes the genotypic effects and genotype × environment interaction effects, while X may contain information about environmental covariates and other related information.

Note that under this model, E(Y) = Xβ and the variance–covariance matrix of the response vector is Var(Y) = ZDZT + R.
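
To make these moments concrete, below is a minimal simulation sketch of model (5.1) in R; all dimensions, parameter values, and the use of MASS::mvrnorm are illustrative assumptions, not taken from the text.

library(MASS)  # for mvrnorm()
set.seed(1)
n <- 100; q <- 5
X <- cbind(1, matrix(rnorm(2 * n), n, 2))              # fixed-effects design (intercept + 2 covariates)
Z <- model.matrix(~ 0 + factor(sample(1:q, n, TRUE)))  # incidence matrix of a random factor with q levels
beta <- c(1, 0.5, -0.3)
D <- 0.8 * diag(q)                                     # Var(b)
R <- 1.5 * diag(n)                                     # Var(epsilon)
b   <- mvrnorm(1, rep(0, q), D)                        # random effects
eps <- mvrnorm(1, rep(0, n), R)                        # random errors
y <- X %*% beta + Z %*% b + eps                        # model (5.1)
V <- Z %*% D %*% t(Z) + R                              # marginal Var(Y) = ZDZ^T + R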

5.2 Estimation of the Linear Mixed Model

5.2.1 Maximum Likelihood Estimation

One method typically used for the estimation of the parameters of the LMM is the maximum likelihood approach. For estimation under an LMM, distributional assumptions for the random errors and the random effects are needed. Assuming that ϵ ∼ Nn(0, R) and b ∼ Nq(0, D), with R and D positive semidefinite matrices, the marginal distribution of the response vector Y is Nn(Xβ, ZDZT + R), and so the likelihood of the parameters is given by

$$ L\left(\boldsymbol{\beta}, \boldsymbol{D},\boldsymbol{R};\boldsymbol{y}\right)=\frac{{\left|\boldsymbol{V}\right|}^{-\frac{1}{2}}}{{\left(2\pi \right)}^{\frac{n}{2}}}\exp \left[-\frac{1}{2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } \right)}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } \right)\right], $$
(5.2)

where V = ZDZT + R is the marginal variance of Y.

The maximum likelihood estimators (MLE) of the parameters β, D, and R are the values that maximize the likelihood function (5.2) (Searle et al. 2006; Stroup 2012). Because no closed-form expressions for these estimators exist, numerical methods such as Newton–Raphson and Fisher scoring are used. Details for implementing these methods are given in Jennrich and Sampson (1976) for the case where D is block diagonal with diagonal submatrices of the form \( {\sigma}_j^2{\boldsymbol{A}}_j \), where Aj is a known matrix and \( {\sigma}_j^2 \) is the variance component to be estimated, and where R = σ2C, with C a known matrix, so that only σ2 needs to be estimated. For a more general treatment, see Jennrich and Schluchter (1986), and for an improvement of the algorithms proposed by these authors, consult Lindstrom and Bates (1988).

Another numerical method that can be used to obtain the MLE is the expectation–maximization (EM) algorithm, which is conceptually a simple algorithm for parameter estimation in this model (Laird and Ware 1982). The EM algorithm is an iterative numerical method for obtaining maximum likelihood estimates in the context of missing or hidden data (Borman 2004). Below, the EM algorithm is described for the case where R = σ2In, with In the identity matrix of dimension n. For some specific variance–covariance matrices of the random effects, as described and used below, this algorithm can be implemented with the sommer R package (Covarrubias-Pazaran 2016, 2018), which also provides two additional algorithms for obtaining the MLE of the parameters in this same model.

5.2.1.1 EM Algorithm

The likelihood for complete data, y and b, is given by

$$ {f}_{\boldsymbol{Y},\boldsymbol{b}}\left(\boldsymbol{y},\boldsymbol{b}\right)={f}_{\boldsymbol{Y}\mid \boldsymbol{b}}\left(\boldsymbol{y}|\boldsymbol{b}\right){f}_{\boldsymbol{b}}\left(\boldsymbol{b}\right)=\frac{{\left|\boldsymbol{D}\right|}^{-\frac{1}{2}}}{{\left(2\pi {\sigma}^2\right)}^{\frac{n}{2}}}\exp \left[-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)-\frac{1}{2}{\boldsymbol{b}}^{\mathrm{T}}{\boldsymbol{D}}^{-1}\boldsymbol{b}\right] $$

As such, the log-likelihood for the complete data, y and b, is given by

$$ {\ell}_c\left(\boldsymbol{\beta}, \boldsymbol{\theta}; \boldsymbol{y},\boldsymbol{b}\right)=\log \left[{f}_{\boldsymbol{Y},\boldsymbol{b}}\left(\boldsymbol{y},\boldsymbol{b}\right)\right]=-\frac{n}{2}\log \left(2\pi {\sigma}^2\right)-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Zb}\right)-\frac{1}{2}{\boldsymbol{b}}^{\mathrm{T}}{\boldsymbol{D}}^{-1}\boldsymbol{b}-\frac{1}{2}\log \left(\left|\boldsymbol{D}\right|\right), $$

where θ is the parameter vector that defines the variance–covariance matrix of the random effects (D) and of the random errors (R). Some specific examples are given below.

5.2.1.1.1 E Step

Because E(uTAu) = tr[AVar(u)] + E(u)TAE(u) and \( \boldsymbol{b}\mid \boldsymbol{Y}=\boldsymbol{y}\sim {N}_q\left(\overset{\sim }{\boldsymbol{b}},\overset{\sim }{\boldsymbol{D}}\right) \) (see Appendix 1), given the current values of the parameters β(t) and θ(t), the conditional expected value of the complete log-likelihood, ℓc(β, θ; y, b), is given by (E step):

$$ {\displaystyle \begin{array}{c}Q\left(\boldsymbol{\beta}, \boldsymbol{\theta} |{\boldsymbol{\beta}}_{(t)},{\boldsymbol{\theta}}_{(t)},\right)={E}_{\boldsymbol{b}\mid \boldsymbol{y}}\left[{\mathrm{\ell}}_c\left(\boldsymbol{\beta}, \boldsymbol{b},\boldsymbol{\theta}; \boldsymbol{y}\right)\right]\\ {}=-\frac{n}{2}\log \left(2{\pi \sigma}^2\right)-\frac{1}{2{\sigma}^2}\mathrm{tr}\left(\boldsymbol{Z}{\tilde{\boldsymbol{D}}}_{\left(\boldsymbol{t}\right)}{\boldsymbol{Z}}^{\mathrm{T}}\right)-\frac{1}{2{\sigma}^2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Z}{\tilde{\boldsymbol{b}}}_{(t)}\right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Z}{\tilde{\boldsymbol{b}}}_{(t)}\right)\\ {}-\frac{1}{2}\mathrm{tr}\left({\boldsymbol{D}}^{-1}{\tilde{\boldsymbol{D}}}_{\left(\boldsymbol{t}\right)}\right)-\frac{1}{2}{\tilde{\boldsymbol{b}}}_{(t)}^{\mathrm{T}}{\boldsymbol{D}}^{-1}{\tilde{\boldsymbol{b}}}_{(t)}-\frac{1}{2}\log \left(\left|\boldsymbol{D}\right|\right)\\ {}=-\frac{n}{2}\log \left(2{\pi \sigma}^2\right)-\frac{1}{2{\sigma}^2}\left[\mathrm{tr}\left(\boldsymbol{Z}{\tilde{\boldsymbol{D}}}_{(t)}{\boldsymbol{Z}}^{\mathrm{T}}\right)+{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Z}{\tilde{\boldsymbol{b}}}_{(t)}\right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Z}{\tilde{\boldsymbol{b}}}_{(t)}\right)\right]\\ {}-\frac{1}{2}\left\{\mathrm{tr}\left[\left({\tilde{\boldsymbol{D}}}_{(t)}+{\tilde{\boldsymbol{b}}}_{(t)}{\tilde{\boldsymbol{b}}}_{(t)}^{\mathrm{T}}\right){\boldsymbol{D}}^{-1}\right]+\log \left(\left|\boldsymbol{D}\right|\right)\right\}\end{array}} $$

where \( {\overset{\sim }{\boldsymbol{D}}}_{(t)}={\left({\boldsymbol{D}}_{(t)}^{-1}+{\sigma}_{(t)}^{-2}{\boldsymbol{Z}}^{\mathrm{T}}\boldsymbol{Z}\right)}^{-1},\kern0.5em {\overset{\sim }{\boldsymbol{b}}}_{(t)}={\sigma}_{(t)}^{-2}{\overset{\sim }{\boldsymbol{D}}}_{(t)}{\boldsymbol{Z}}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}{\boldsymbol{\beta}}_{(t)}\right), \) and β(t), \( {\sigma}_{(t)}^2 \), and D(t) are the current values of β, σ2, and D, respectively.

5.2.1.1.2 M Step

The second step of the EM algorithm is the M step, which consists of updating the parameters by maximizing the conditional expected value of the complete log-likelihood. First, observe that for any value of θ, the value of β that maximizes Q(β, θ| β(t), θ(t)) is given by

$$ {\boldsymbol{\beta}}_{\left(t+1\right)}={\left({\boldsymbol{X}}^{\mathrm{T}}\boldsymbol{X}\right)}^{-1}{\boldsymbol{X}}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{Z}{\overset{\sim }{\boldsymbol{b}}}_{(t)}\right) $$

which does not depend on the value of θ, that is, on the values of σ2 and D. Then, by equating to zero the derivative of Q(β(t + 1), θ| β(t), θ(t)) with respect to σ2 and solving for σ2, we obtain that the value of σ2 that maximizes Q(β(t + 1), θ| β(t), θ(t)), for fixed D, is given by

$$ {\sigma}_{\left(t+1\right)}^2=\frac{1}{n}\left[\mathrm{tr}\left(\boldsymbol{Z}{\overset{\sim }{\boldsymbol{D}}}_{(t)}{\boldsymbol{Z}}^{\mathrm{T}}\right)+{\left(\boldsymbol{y}-\boldsymbol{X}{\boldsymbol{\beta}}_{\left(t+1\right)}-\boldsymbol{Z}{\overset{\sim }{\boldsymbol{b}}}_{(t)}\right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}{\boldsymbol{\beta}}_{\left(t+1\right)}-\boldsymbol{Z}{\overset{\sim }{\boldsymbol{b}}}_{(t)}\right)\right] $$

which is independent of the value of D. Now, according to result 4.10 in Johnson and Wichern (2002), the value of D that maximizes Q(β, θ| β(t), θ(t)) is given by

$$ {\boldsymbol{D}}_{\left(t+1\right)}={\overset{\sim }{\boldsymbol{D}}}_{(t)}+{\overset{\sim }{\boldsymbol{b}}}_{(t)}{\overset{\sim }{\boldsymbol{b}}}_{(t)}^{\mathrm{T}} $$

and does not depend on β and σ2. So, combining the above maximizations, the M step consists of updating the parameters β, σ2, and D with β(t + 1), \( {\sigma}_{\left(t+1\right)}^2 \), and D(t + 1), respectively. Note that the current values of the parameters β(t), \( {\sigma}_{(t)}^2 \), and D(t) are used in the computation of \( \overset{\sim }{\boldsymbol{b}} \) and \( \overset{\sim }{\boldsymbol{D}} \).

In the case of \( \boldsymbol{D}={\sigma}_g^2\boldsymbol{A} \), where A is a known matrix (the case in some genomic prediction models, where A corresponds to the genomic relationship matrix, the pedigree matrix, or the environmental matrix), the only variance parameters to estimate are \( {\sigma}_g^2 \) and σ2, that is, \( \boldsymbol{\theta} =\left({\sigma}_g^2,{\sigma}^2\right) \). In the same fashion, very often in genomic applications, but also in more general settings, \( \boldsymbol{D}=\mathrm{Diag}\left({\sigma}_1^2{\boldsymbol{A}}_1,\dots, {\sigma}_K^2\ {\boldsymbol{A}}_K\right), \) where each Ak represents a known variance–covariance (or correlation) structure (genomic relationship matrix, pedigree relationship matrix, etc.; Burgueño et al. 2012) between the levels of the corresponding random effect. Here the Ak, k = 1, …, K, are positive definite known matrices of dimensions qk × qk such that \( {\sum}_{k=1}^K{q}_k=q \), and the variance components to estimate are \( {\sigma}_k^2,k=1,\dots, K, \) and σ2, that is, \( \boldsymbol{\theta} =\left({\sigma}_1^2,\dots, {\sigma}_K^2,{\sigma}^2\right) \). In this case, the E step can be reduced (see Appendix 2) to

$$ {\displaystyle \begin{array}{c}Q\left(\boldsymbol{\beta}, \boldsymbol{\theta} |{\boldsymbol{\beta}}_{(t)},{\boldsymbol{\theta}}_{(t)}\right)=-\frac{n}{2}\log \left(2\pi {\sigma}^2\right)-\frac{1}{2{\sigma}^2}\left[\mathrm{tr}\left(\boldsymbol{Z}{\overset{\sim }{\boldsymbol{D}}}_{(t)}{\boldsymbol{Z}}^{\mathrm{T}}\right)+{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Z}{\overset{\sim }{\boldsymbol{b}}}_{(t)}\right)}^{\mathrm{T}}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } -\boldsymbol{Z}{\overset{\sim }{\boldsymbol{b}}}_{(t)}\right)\right]\\ {}-\frac{1}{2}\sum \limits_{k=1}^K\left\{\frac{\sigma_{k(t)}^2}{\sigma_k^2}\left[{q}_k-{\sigma}_{k(t)}^2\mathrm{tr}\left({\boldsymbol{A}}_k{\boldsymbol{Z}}_k^{\mathrm{T}}{\boldsymbol{V}}_{(t)}^{-1}{\boldsymbol{Z}}_k\right)+{\sigma}_{k(t)}^{-2}{\overset{\sim }{\boldsymbol{b}}}_{k(t)}^{\mathrm{T}}{\boldsymbol{A}}_k^{-1}{\overset{\sim }{\boldsymbol{b}}}_{k(t)}\right]+{q}_k\log \left({\sigma}_k^2\right)+\log \left(\left|{\boldsymbol{A}}_k\right|\right)\right\},\end{array}} $$

where V(t) is the marginal variance–covariance matrix of the response vector evaluated at the current values of the parameters, Z = [Z1 Z2 … ZK] is the partitioned design matrix of the random effects, with Zk the n × qk design matrix corresponding to the random effects bk (\( {\boldsymbol{b}}^{\mathrm{T}}=\left({\boldsymbol{b}}_1^{\mathrm{T}},\dots, {\boldsymbol{b}}_K^{\mathrm{T}}\ \right) \)), \( {\overset{\sim }{\boldsymbol{b}}}_{(t)}^{\mathrm{T}}=\left({\overset{\sim }{\boldsymbol{b}}}_{1(t)}^{\mathrm{T}},\dots, {\overset{\sim }{\boldsymbol{b}}}_{K(t)}^{\mathrm{T}}\right), \) and \( {\sigma}_{k(t)}^2,k=1,\dots, K, \) are the current values of the variance parameters. Finally, for this specific model, the maximization updates for the beta coefficients and the residual variance are the same as before, while the updates for the variance components are

$$ {\sigma}_{k\left(t+1\right)}^2=\frac{1}{q_k}{\sigma}_{k(t)}^2\left[{q}_k-{\sigma}_{k(t)}^2\mathrm{tr}\left({\boldsymbol{A}}_k{\boldsymbol{Z}}_k^{\mathrm{T}}{\boldsymbol{V}}_{(t)}^{-1}{\boldsymbol{Z}}_k\right)+{\sigma}_{k(t)}^{-2}{\overset{\sim }{\boldsymbol{b}}}_{k(t)}^{\mathrm{T}}{\boldsymbol{A}}_k^{-1}{\overset{\sim }{\boldsymbol{b}}}_{k(t)}\right],k=1,\dots, K. $$

These are obtained by maximizing Q(β, θ| β(t), θ(t)) (defined above) with respect to \( {\sigma}_k^2,k=1,\dots, K. \)
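
For concreteness, the following is a minimal R sketch of these EM updates for the single-component case D = σg2A and R = σ2In (i.e., K = 1); the function name, starting values, and convergence rule are our own illustrative assumptions, not the book's implementation.

em_lmm <- function(y, X, Z, A, maxit = 200, tol = 1e-8) {
  n <- length(y); q <- ncol(Z)
  beta <- solve(crossprod(X), crossprod(X, y))               # OLS starting value for beta
  sig2 <- as.numeric(var(y - X %*% beta)); sg2 <- sig2       # crude starting variances
  Ainv <- solve(A)
  for (t in seq_len(maxit)) {
    D    <- sg2 * A
    Dtil <- solve(solve(D) + crossprod(Z) / sig2)            # D-tilde(t) = (D^-1 + Z'Z/sigma^2)^-1
    btil <- Dtil %*% crossprod(Z, y - X %*% beta) / sig2     # b-tilde(t)
    V    <- Z %*% D %*% t(Z) + sig2 * diag(n)                # marginal Var(Y) at current values
    beta <- solve(crossprod(X), crossprod(X, y - Z %*% btil))           # update beta
    r    <- y - X %*% beta - Z %*% btil
    sig2_new <- (sum(diag(Z %*% Dtil %*% t(Z))) + sum(r^2)) / n         # update sigma^2
    sg2_new  <- (sg2 / q) * (q - sg2 * sum(diag(A %*% crossprod(Z, solve(V, Z)))) +
                             as.numeric(crossprod(btil, Ainv %*% btil)) / sg2)  # update sigma_g^2
    if (abs(sig2_new - sig2) + abs(sg2_new - sg2) < tol) { sig2 <- sig2_new; sg2 <- sg2_new; break }
    sig2 <- sig2_new; sg2 <- sg2_new
  }
  list(beta = beta, sigma2 = sig2, sigma2_g = sg2, blup = btil)
}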

5.2.1.2 REML

An alternative to ML estimation of the variance components of model (5.1), which avoids the downward bias of the maximum likelihood method, is the restricted maximum likelihood (REML) estimation method proposed by Patterson and Thompson (1971). Among the several ways to define it, one is discussed by Laird and Ware (1982): under a Bayesian paradigm, REML consists of estimating the variance components by maximizing their marginal posterior distribution after assuming a “locally” uniform prior distribution for β and θ, that is, f(β, θ) ∝ 1 (Pinheiro and Bates 2000). The marginal posterior of the variance components is given by (see Appendix 3)

$$ {\displaystyle \begin{array}{l}f\left(\boldsymbol{\theta} |\boldsymbol{y}\right)\propto \int f\left(\boldsymbol{\beta}, \boldsymbol{\theta} |\boldsymbol{y}\right)d\boldsymbol{\beta} \propto \int f\left(\boldsymbol{y}|\boldsymbol{\beta}, \boldsymbol{\theta} \right)d\boldsymbol{\beta} \\ {}\propto {\left|\boldsymbol{V}\right|}^{-\frac{1}{2}}{\left|{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{X}\right|}^{\frac{1}{2}}\exp \left\{-\frac{1}{2}{\left(\boldsymbol{y}-\boldsymbol{X}\tilde{\boldsymbol{\beta}}\right)}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\tilde{\boldsymbol{\beta}}\right)\right\},\end{array}} $$

where \( \overset{\sim }{\boldsymbol{\beta}}={\left({\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{X}\right)}^{-1}{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{y} \) is the maximum likelihood estimator of the fixed effects when the variance components are assumed known, also known as the generalized least squares (GLS) estimator of β. The REML estimators of θ are then the values that maximize f(θ| y), or equivalently, they can be defined as the values that maximize

$$ {\mathbf{\ell}}_R\left(\boldsymbol{\theta}; \boldsymbol{y}\right)=-\frac{1}{2}\log \left(\left|{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{X}\right|\right)-\frac{1}{2}\log \left(\left|\boldsymbol{V}\right|\right)-\frac{1}{2}{\left(\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\right)}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\right) $$

This function is known as the restricted likelihood because it can be shown that it also corresponds to the likelihood associated with the maximum number (n − p − 1) of linearly independent error contrasts FY, where F is a full-row-rank (n − p − 1) × n known matrix such that FX = 0. It is important to point out that the likelihood based on the transformed data FY gives the same result for any chosen error contrast matrix F and, consequently, is invariant to the fixed effects parameters (Harville 1974). Equivalently, the REML estimators of β and θ can be defined as the values that maximize

$$ {\mathbf{\ell}}_R\left(\boldsymbol{\beta}, \boldsymbol{\theta}; \boldsymbol{y}\right)=-\frac{1}{2}\log \left(\left|{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{X}\right|\right)-\frac{1}{2}\log \left(\left|\boldsymbol{V}\right|\right)-\frac{1}{2}{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } \right)}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\beta } \right) $$

This objective function is the same as the natural logarithm of the likelihood function given in Eq. (5.2) (the log-likelihood) except for the first term. To obtain the REML solutions, or the maximum a posteriori estimates (when adopting a locally uniform prior for the parameters, as described before) of the variance components, numerical methods are required, as for the MLE. See Jennrich and Schluchter (1986) and Lindstrom and Bates (1988) for details on the Newton–Raphson and Fisher scoring algorithms, and consult the lme4 R package (Bates et al. 2015), which uses a generic nonlinear optimizer and implements a large variety of models (MLE and REML) that arise from the LMM under different variance structures for the random effects and errors. For a derivation of the EM algorithm to obtain the REML estimates under model (5.1) for longitudinal data, see Laird and Ware (1982); this same approach can be used for the genomic model previously described, where \( \boldsymbol{D}=\mathrm{Diag}\left({\sigma}_1^2{\boldsymbol{A}}_1,\dots, {\sigma}_K^2\ {\boldsymbol{A}}_K\right) \) and R = σ2In. Consult Searle (1993) and Covarrubias-Pazaran (2016, 2018) for an implementation of this algorithm with the EM function in the sommer R package.
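
As an illustration, here is a hedged sketch for evaluating the restricted log-likelihood ℓR(θ; y) under D = σg2A and R = σ2In; the function and argument names are ours, and the resulting function could then be maximized numerically, e.g., with optim().

reml_loglik <- function(theta, y, X, Z, A) {
  # theta = (sigma_g^2, sigma^2) under D = sigma_g^2 * A and R = sigma^2 * I_n
  n    <- length(y)
  V    <- theta[1] * Z %*% A %*% t(Z) + theta[2] * diag(n)  # marginal Var(Y)
  Vinv <- solve(V)
  XtVX <- t(X) %*% Vinv %*% X
  beta <- solve(XtVX, t(X) %*% Vinv %*% y)                  # GLS estimate of beta
  r    <- y - X %*% beta
  as.numeric(-0.5 * (determinant(XtVX)$modulus + determinant(V)$modulus +
                     t(r) %*% Vinv %*% r))                  # log-determinants by default
}
# e.g.: optim(c(1, 1), function(th) -reml_loglik(th, y, X, Z, A), method = "L-BFGS-B", lower = 1e-6)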

5.2.1.3 BLUPs

In many situations, in addition to the estimation of the fixed effects, the prediction of the random effects is also of interest. A standard method for “estimating” the random effects is the best linear unbiased predictor (BLUP; Robinson 1991), which was originally developed by Henderson (1975) in animal breeding for estimating merit in dairy cattle and is now commonly employed in many research areas (Piepho et al. 2008). If the variance components D and R are known, the BLUP of the random effects b is given by

$$ {\overset{\sim }{\boldsymbol{b}}}^{\ast }=\boldsymbol{D}{\boldsymbol{Z}}^{\mathrm{T}}{\boldsymbol{V}}^{-\mathbf{1}}\left(\boldsymbol{y}-\boldsymbol{X}\overset{\sim }{\boldsymbol{\beta}}\right), $$

where \( \overset{\sim }{\boldsymbol{\beta}}={\left({\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{X}\right)}^{-1}{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{V}}^{-1}\boldsymbol{y} \) is the generalized least squares (GLS) estimator of β. The BLUP can be obtained by maximizing the joint density of y and b, fY,b(y, b), with respect to β and b, which is why Harville (1985) called these quantities estimates of the realized values of b (McLean et al. 1991), or likewise by solving the mixed model equations (MME) (Henderson 1975):

$$ \left[\begin{array}{cc}{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\boldsymbol{X}& {\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\boldsymbol{Z}\\ {}{\boldsymbol{Z}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\boldsymbol{X}& {\boldsymbol{Z}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\boldsymbol{Z}+{\boldsymbol{D}}^{-1}\end{array}\right]\left[\begin{array}{c}\overset{\sim }{\boldsymbol{\beta}}\\ {}{\overset{\sim }{\boldsymbol{b}}}^{\ast}\end{array}\right]=\left[\begin{array}{c}{\boldsymbol{X}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\boldsymbol{y}\\ {}{\boldsymbol{Z}}^{\mathrm{T}}{\boldsymbol{R}}^{-1}\boldsymbol{y}\end{array}\right] $$

which avoids the inversion of the variance–covariance matrix of Y and, in some situations, can save considerable computational resources. Note that \( {\overset{\sim }{\boldsymbol{b}}}^{\ast} \) corresponds to the posterior mean of the random effects with the fixed effects replaced by their GLS estimates (Searle et al. 2006).
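
A minimal sketch of solving the MME directly in base R, for known D and R, could look as follows; the function name and return structure are illustrative assumptions.

mme_solve <- function(y, X, Z, D, R) {
  Rinv <- solve(R)
  C    <- rbind(cbind(t(X) %*% Rinv %*% X, t(X) %*% Rinv %*% Z),
                cbind(t(Z) %*% Rinv %*% X, t(Z) %*% Rinv %*% Z + solve(D)))  # MME coefficient matrix
  rhs  <- c(t(X) %*% Rinv %*% y, t(Z) %*% Rinv %*% y)                       # right-hand side
  sol  <- solve(C, rhs)
  p    <- ncol(X)
  list(beta = sol[1:p],      # GLS estimate of the fixed effects
       b    = sol[-(1:p)])   # BLUP of the random effects
}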

When the variance components are unknown, which is most often the case, they are frequently estimated by restricted maximum likelihood and the estimates are plugged into the corresponding equations. The resulting approximate best linear unbiased predictor is referred to as the estimated or empirical best linear unbiased predictor (EBLUP) (Rencher 2008).

To solve the mixed model equations, several software packages can be useful; one of particular relevance in the genomic context is the sommer package (Covarrubias-Pazaran 2016, 2018), which internally solves the MME after the variance components have been estimated. The GitHub version of the sommer R package can be accessed at https://github.com/cran/sommer and can be installed with the following commands:

install.packages('devtools')
library(devtools)
install_github('covaruber/sommer')

5.3 Linear Mixed Models in Genomic Prediction

In a simple genomic prediction context where b includes the genotype effects and a genomic relationship matrix G (VanRaden 2008) is available, the variance–covariance matrix of the random effects is very often assumed to be \( {\sigma}_g^2\boldsymbol{G} \), and the errors are assumed independently and identically distributed, R = σ2In, where n is the total number of observations. The resulting model is known as the GBLUP model, and when the pedigree is used instead, it is referred to as PBLUP. Other kinds of information on the lines can also be used, such as relationship matrices derived from hyperspectral reflectance data (Krause et al. 2019). Further extensions of this model can be developed by taking into account other factors, for example, the genotype × environment interaction, as will be illustrated later in the genomic prediction context.

In this case, where only the genotypic effects are taken into account, in the linear mixed model (5.1) the fixed effects design matrix is X = 1n, a vector of ones of length n corresponding to the general mean β = β0, b = (b1, b2, …, bJ)T contains the genotypic effects of the J lines, and Z is the incidence matrix for the random line effects (ZL):

$$ \boldsymbol{Y}={\mathbf{1}}_n\mu +{\boldsymbol{Z}}_{\boldsymbol{L}}\boldsymbol{b}+\boldsymbol{\epsilon}, $$
(5.3)

where \( \boldsymbol{b}\sim {N}_J\left(\mathbf{0},{\sigma}_g^2\boldsymbol{G}\right) \) and R = σ2In.

The basic code to implement the GBLUP model (5.3) with the sommer package is the following:

A = mmer(y ~ 1, random = ~ vs(GID, Gu=G), rcov = ~ vs(units), data = dat_F, verbose = FALSE)

where y and GID are the column names of the response variable and the genotypes in the data set dat_F, and G is the genomic relationship matrix of the lines, specified in the Gu argument, which in general serves to provide a known variance–covariance matrix between the levels of the random effect (GID). In the “rcov” option, the argument “units” is always used to specify the error term.

5.4 Illustrative Examples of the Univariate LMM

Example 1

To illustrate the performance of the LMM in a genomic prediction context, with the fitting done using the sommer package, we considered a wheat data set consisting of 500 markers measured on each line as the genomic information and 229 grain yield observations (tons/ha) in total: 30 lines evaluated in four environments with one or two replicates.

The prediction performance of the model given in Eq. (5.3) (M1) was evaluated with 10 random partitions, where each partition comprised two subsets: one containing 80% of the data, used for training the model, and the other containing the remaining 20% of the data, used to evaluate the prediction performance of the model in terms of the mean squared error of prediction (MSE).
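
A minimal sketch of this random cross-validation scheme with sommer is shown below; the slot names used to extract the estimates vary across sommer versions, and the prediction rule (intercept plus line BLUP, assuming every line appears in the training set) is our own simplification. The full code is in Appendix 4.

library(sommer)
set.seed(1)
n <- nrow(dat_F)
MSE <- PC <- numeric(10)
for (k in 1:10) {
  tst <- sample(n, round(0.2 * n))                    # 20% of the records for testing
  fit <- mmer(y ~ 1, random = ~ vs(GID, Gu = G), rcov = ~ vs(units),
              data = dat_F[-tst, ], verbose = FALSE)
  u    <- fit$U[["u:GID"]]$y                          # line BLUPs (slot names vary by version)
  yhat <- as.numeric(fit$Beta$Estimate) + u[as.character(dat_F$GID[tst])]
  MSE[k] <- mean((dat_F$y[tst] - yhat)^2)
  PC[k]  <- cor(dat_F$y[tst], yhat)
}
c(MSE = mean(MSE), PC = mean(PC))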

Furthermore, this model assumes that the errors are independently and identically distributed, ϵ ∼ Nn(0, σ2In), independently of the genotypic effects; b was assumed multivariate normal with a null mean vector and variance–covariance matrix \( {\sigma}_g^2\boldsymbol{G} \), where the genomic relationship matrix was computed from the information of the 500 markers.
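
For reference, a common way to obtain the genomic relationship matrix from a J × p marker matrix is the VanRaden (2008) formula; the sketch below is a hedged illustration that assumes markers coded 0/1/2 and an object M holding the 30 × 500 marker matrix.

vanraden_G <- function(M) {
  pf <- colMeans(M) / 2                       # allele frequencies
  W  <- sweep(M, 2, 2 * pf)                   # column-centered markers
  tcrossprod(W) / (2 * sum(pf * (1 - pf)))    # G = WW^T / (2 * sum(p_j * (1 - p_j)))
}
G <- vanraden_G(M)                            # M assumed to be the 30 x 500 marker matrix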

The variance component parameters, in this case \( {\sigma}_g^2 \) and σ2, were estimated by restricted maximum likelihood with the mmer function of the sommer package, using the default optimization algorithm, the Newton–Raphson method. For univariate response variables, the EM algorithm, available through the EM function in this R package, can also be used.

The results are shown in Table 5.1, where we also present the Pearson’s correlation (PC) and MSE of the same model but without taking into account the genomic relationship between lines (G), that is, with the variance–covariance matrix of the genotypic effects assumed to be \( \mathrm{Var}\left(\boldsymbol{b}\right)={\sigma}_g^2{\boldsymbol{I}}_J \). This model is referred to as M10. From this table, we can observe that model M10 shows a slightly better performance than model M1 in terms of both the MSE and PC criteria: the MSE of model M1 is 3.15% greater than the MSE of M10, while the PC of M10 is 3.94% greater than that of M1. The better average performance of model M10, which did not consider genomic information, suggests that the marker information did not provide useful information in this particular case; however, this is not what is generally expected when using marker information for prediction, and it could change with larger data sets (more lines and more markers) or by improving the quality of the available data.

Table 5.1 Prediction performance of the GBLUP model (5.3, M1) and of the model (5.3) that results from ignoring the genomic information (M10): the mean squared error of prediction (MSE) and Pearson’s correlation (PC) are reported for each partition, together with the standard deviation of each criterion

The R code to reproduce this result is given in Appendix 4. This can be adapted easily to another CV strategy of interest where the objective, for example, can be the prediction of non-observed lines in some environments or the prediction of lines in a future year.

An extension of the GBLUP model is the G×E BLUP model, which takes into account the main environmental effects, the genotypic effects, and the genotype × environment interaction effects:

$$ Y={\mathbf{1}}_n\mu +{\boldsymbol{X}}_E{\boldsymbol{\beta}}_E+{\boldsymbol{Z}}_L{\boldsymbol{b}}_1+{\boldsymbol{Z}}_{EL}{\boldsymbol{b}}_2+\boldsymbol{\epsilon} $$
(5.4)

where now the fixed effects part of the linear mixed model (5.1) is explicitly split into the general mean term (1nμ) and the environment effects term (XEβE), so that X = [1n XE] and \( \boldsymbol{\beta} ={\left(\mu, {\boldsymbol{\beta}}_E^{\mathrm{T}}\right)}^{\mathrm{T}} \). Similarly, for the random effects, Z = [ZL ZEL] and \( \boldsymbol{b}={\left[{\boldsymbol{b}}_1^{\mathrm{T}},{\boldsymbol{b}}_2^{\mathrm{T}}\right]}^{\mathrm{T}} \), where b1 and b2 are the vectors of random genotypic effects and random genotype × environment interaction effects, with incidence matrices ZL and ZEL, respectively. For b1, the same distribution as in the GBLUP model is assumed, \( {\boldsymbol{b}}_1\sim {N}_J\left(\mathbf{0},{\sigma}_g^2\boldsymbol{G}\right) \), and for the second random effect, b2 ∼ NIJ(0, ΣE ⨂ G), where ΣE ⨂ G is the relationship matrix of the genotype × environment interaction term, with ΣE the genetic variance–covariance matrix between the I environments; the ith diagonal element of ΣE, \( {\sigma}_{Ei}^2, \) is the genetic variance in environment i, i = 1, …, I, and σEikG is the genetic variance–covariance matrix for lines in environments i and k, where σEik is the element (i, k) of ΣE.

When ΣE has a non-diagonal structure, the information from the genomic relationship matrix and the correlated environments can help improve the prediction performance of the model by borrowing information between lines within an environment and between lines across environments (Burgueño et al. 2012).

Example 2

To illustrate how model (5.4) can be implemented using the sommer package, the same data used in Example 1 are considered, where the same 30 genotypes are in the four environments. Besides the line indicator (GID), environment information (Env) was also available in the data set, which was needed for implementing model (5.4). The adopted structure for the variance–covariance matrix between environments is \( {\boldsymbol{\Sigma}}_E={\sigma}_{EG}^2{\boldsymbol{I}}_I \) and the resulting model is referred to as M2. Another explored model (M20) was obtained under the same specification, with the difference that G was set equal to the identity matrix.

Using the same validation scheme that was used in Example 1, the results for each of the 10 random partitions are shown in Table 5.2, in which, for illustrative purposes, model (5.3) plus environment as a fixed effect (M11) is also included, that is,

$$ \boldsymbol{Y}={\mathbf{1}}_n\mu +{\boldsymbol{X}}_E{\boldsymbol{\beta}}_E+{\boldsymbol{Z}}_L\boldsymbol{b}+\boldsymbol{\epsilon}, $$

where μ and b are as in model (5.3), and XEβE is the predictor term corresponding to the environment fixed effects.

Table 5.2 Prediction performance of two sub-models of (5.4): model M2, in which \( \mathrm{Var}\left({\boldsymbol{b}}_1\right)={\sigma}_g^2\boldsymbol{G} \), b2 ∼ NIJ(0, ΣE ⨂ G), and \( {\boldsymbol{\Sigma}}_E={\sigma}_{EG}^2{\boldsymbol{I}}_I \); and model M20, which is the same as model M2 except that the genomic information is not taken into account, that is, G = IJ

From Table 5.2 we can again observe a moderately better performance of model M20, which does not take into account the genomic information. Model M2 also confirms the lack of usefulness of the marker information in this case, although, as before, this is generally expected to change for other data sets with a greater number of lines, more markers, or better data quality. The MSE of model M2 is 5.11% greater than the MSE of model M20, while the PC value of M20 is 6.98% greater than the corresponding PC value obtained with model M2. When comparing models M20 and M11, the MSE of M11 is just 0.075% greater than the corresponding MSE of M20, but in terms of the PC value, M20 is 13.98% greater than M11, that is, model (5.3) plus the environment effects. Nonetheless, because of the high variation observed across partitions (SD values of PC and MSE), there is no significant difference between the models in Table 5.2.

Furthermore, in terms of the average MSE, the best-performing model in Table 5.1 (M10) is 13.73% greater than the average MSE of the best-performing model in Table 5.2 (M11), while in terms of the average Pearson’s correlation, the best model in Table 5.2 (M20) is 25.42% greater than the average Pearson’s correlation of the best model in Table 5.1. The worst average MSE in Table 5.1 (M1) is 17.31% greater than the best average MSE in Table 5.2 (M20), and the best average PC in Table 5.2 (M20) is 30.37% greater than the worst average PC in Table 5.1 (M1). Indeed, the best average MSE model in Table 5.1 (M10) is 8.19% greater than the worst average MSE model in Table 5.2 (M2), while the worst average PC model in Table 5.2 (M11) is 10.51% greater than the average PC of the corresponding model in Table 5.1 (M10).

The R code to reproduce these results is given in Appendix 5.

Other versions of model (5.4) can be obtained by adopting other variance–covariance structures. For example, when environmental covariates (W) are available, they can be used to model the G×E predictor term; specifically, the genetic variance–covariance matrix between environments, ΣE, can be modeled as \( {\boldsymbol{\Sigma}}_E={\sigma}_{EG}^2\boldsymbol{O} \), where \( \boldsymbol{O}=\frac{1}{p_w}\boldsymbol{W}{\boldsymbol{W}}^{\mathrm{T}} \) is a similarity matrix between environments computed like the genomic relationship matrix (G) from the information of pw environmental covariates (Jarquín et al. 2014; Martini et al. 2020), or O can be obtained from phenotypic correlations across environments based on related historical data (Martini et al. 2020). In the sommer package, this can be implemented with the following basic R code:

O = diag(I)  # Specify the O matrix for the I environments
dat_F$Env_GID = paste(dat_F$Env, dat_F$GID, sep='_')
GE = kronecker(O, G)
rnGWE = expand.grid(row.names(G), unique(dat_F$Env))
row.names(GE) = paste(rnGWE[,2], rnGWE[,1], sep='_')
colnames(GE) = row.names(GE)
A = mmer(y ~ Env, random = ~ vs(GID, Gu=G) + vs(Env_GID, Gu=GE), rcov = ~ vs(units), data = dat_F)

Other more complex models can be explored with the sommer package when more data are available, such as specifying an unstructured variance–covariance matrix for ΣE. A simpler model with non-correlated, heterogeneous variance components (one per environment) arises by assuming a diagonal structure, \( {\boldsymbol{\Sigma}}_E=\mathrm{Diag}\left({\sigma}_1^2,\dots, {\sigma}_I^2\right) \). This can be implemented by replacing the interaction term vs(Env_GID, Gu = GE) in the sommer predictor with vs(ds(Env), GID, Gu = G). Similarly, for environment-specific residual variances, vs(units) needs to be replaced by vs(ds(Env), units) or vs(at(Env), units), as sketched below. See Appendix 7 for basic code implementing all these models and see Covarrubias-Pazaran (2018) for more variance structures that can be exploited in this model.
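
Putting these pieces together, a hedged sketch of the heterogeneous (diagonal) G×E model with environment-specific residual variances could be as follows; object and column names are assumed as above.

A_diag = mmer(y ~ Env,
              random = ~ vs(GID, Gu=G) + vs(ds(Env), GID, Gu=G),  # diagonal Sigma_E for the G x E term
              rcov = ~ vs(ds(Env), units),                        # one residual variance per environment
              data = dat_F, verbose = FALSE)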

5.5 Multi-trait Genomic Linear Mixed-Effects Models

In some genomic applications, there are several traits of interest, all of which are measured on some lines, while on other lines only subsets of those traits are measured. Although separate univariate genomic linear mixed models can be used to analyze all measured traits, single univariate genomic models sometimes do not work well, especially for traits with low heritability. When low-heritability traits have at least a moderate correlation with high-heritability traits, the predictive ability for these low-heritability traits can increase strongly when a multi-trait model is used (Jia and Jannink 2012; Montesinos-López et al. 2016; Budhlakoti et al. 2019).

If nT traits, Yjt, t = 1, …, nT, are measured on each line (j = 1, …, J), the multi-trait genomic linear mixed-effects model adopts an unstructured covariance matrix for the residuals between traits and for the random genotypic effects between traits; similar to the univariate trait model (5.3), it can be expressed as

$$ \left[\begin{array}{c}{Y}_{j1}\\ {}{Y}_{j2}\\ {}\vdots \\ {}{Y}_{j{n}_T}\end{array}\right]=\left[\begin{array}{c}{\mu}_1\\ {}{\mu}_2\\ {}\vdots \\ {}{\mu}_{n_T}\end{array}\right]+\left[\begin{array}{c}{g}_{j1}\\ {}{g}_{j2}\\ {}\vdots \\ {}{g}_{jn_T}\end{array}\right]+\left[\begin{array}{c}{\epsilon}_{j1}\\ {}{\epsilon}_{j2}\\ {}\vdots \\ {}{\epsilon}_{jn_T}\end{array}\right],j=1,\dots, J, $$
(5.5)

where μt, t = 1, …, nT, are the specific trait means, gjt, t = 1, …, nT, are the specific trait genotypic effects, and ϵjt, t = 1, …, nT, are the random error terms corresponding to each trait. Furthermore, \( \boldsymbol{b}={\left[{\boldsymbol{g}}_1^{\mathrm{T}},\dots, {\boldsymbol{g}}_J^{\mathrm{T}}\right]}^{\mathrm{T}}\sim N\left(\mathbf{0},\boldsymbol{G}\boldsymbol{\bigotimes }{\boldsymbol{\Sigma}}_T\right), \) with \( {\boldsymbol{g}}_j={\left[{g}_{j1},\dots, {g}_{j{n}_T}\right]}^{\mathrm{T}}, \) j = 1, …, J, the \( {\boldsymbol{\epsilon}}_j={\left[{\epsilon}_{j1},\dots, {\epsilon}_{j{n}_T}\right]}^{\mathrm{T}} \), j = 1, …, J, are independent multivariate normal random vectors with null mean and variance–covariance matrix \( {\mathbf{R}}_{n_T} \), ΣT is the nT × nT matrix that represents the genetic covariance between traits, and ⨂ denotes the Kronecker product.
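
The Kronecker structure can be checked with a toy example; here the ordering (lines as outer blocks, traits innermost) matches the stacking of Y used below, and the toy values are illustrative only.

J <- 3; nT <- 2
G_toy <- diag(J)                          # toy line relationship matrix
Sig_T <- matrix(c(1, 0.5, 0.5, 2), nT)    # toy genetic covariance between traits
Vb    <- kronecker(G_toy, Sig_T)          # Var(b): J x J blocks, each of size nT x nT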

In matrix notation, this is the linear mixed model (5.1) with \( \boldsymbol{Y}={\left[{\boldsymbol{Y}}_1^{\mathrm{T}},\dots, {\boldsymbol{Y}}_J^{\mathrm{T}}\right]}^{\mathrm{T}}, \) \( {\boldsymbol{Y}}_j={\left[{Y}_{j1},\dots, {Y}_{j{n}_T}\right]}^{\mathrm{T}}, \) \( \boldsymbol{X}={\mathbf{1}}_J\bigotimes {\boldsymbol{I}}_{n_T}, \) \( \boldsymbol{\beta} =\boldsymbol{\mu} ={\left({\mu}_1,\dots, {\mu}_{n_T}\right)}^{\mathrm{T}}, \) \( \boldsymbol{Z}={\boldsymbol{I}}_{n_TJ}, \) \( \boldsymbol{\epsilon} ={\left[{\boldsymbol{\epsilon}}_1^{\mathrm{T}}\dots {\boldsymbol{\epsilon}}_J^{\mathrm{T}}\right]}^{\mathrm{T}}\sim N\left(\mathbf{0},{\boldsymbol{I}}_J\boldsymbol{\bigotimes}{\boldsymbol{R}}_{n_{\boldsymbol{T}}}\right), \) and \( \boldsymbol{b}={\left[{\boldsymbol{g}}_1^{\mathrm{T}},\dots, {\boldsymbol{g}}_J^{\mathrm{T}}\right]}^{\mathrm{T}}\sim N\left(\mathbf{0},\boldsymbol{G}\boldsymbol{\bigotimes }{\boldsymbol{\Sigma}}_T\right). \) Similarly, the extended model that arises by adding more fixed effects (X) can be specified by adding a term to the predictor:

$$ \boldsymbol{Y}=\left({\mathbf{1}}_J\bigotimes {\boldsymbol{I}}_{n_T}\right)\boldsymbol{\mu} +\boldsymbol{X}\boldsymbol{\beta } +\boldsymbol{Zb}+\boldsymbol{\epsilon} $$
(5.5a)

When ΣT and R are diagonal matrices, model (5.5) is equivalent to separately fitting a univariate GBLUP model to each trait.

The R code to fit this multivariate model with the sommer package is

A = mmer(cbind(T1, …, TnT) ~ x1 + x2 + … + xp, random = ~ vs(GID, Gu=G), rcov = ~ vs(units), data = dat_F)

where GID is again the column name of the genotypes in data set dat_F, T1, …, TnT are the column names of the response variables in dat_F corresponding to the traits to be analyzed, and similarly, x1, x2, …, xp are the column names of the p covariates to be included in the fitting process (see the R code for Example 3 below). The rest of the arguments are the same as those described in the R code of model (5.3).

Example 3

To illustrate the fitting of the multi-trait genomic model (5.5) (M3), we considered the same data set used in Examples 1 and 2, but with the addition of trait (y2) to be able to explore the implementation of a bivariate trait genomic model. The same CV strategy implemented in Example 1 was used. In addition to model (5.5), we also evaluated a sub-model that was obtained by considering a diagonal structure for ΣT, that is, \( {\boldsymbol{\Sigma}}_T=\mathrm{Diag}\left({\sigma}_{T_1}^2,{\sigma}_{T_2}^2\right) \). This model will be referred to as M32.
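
A hedged sketch of the two bivariate fits (unstructured vs. diagonal ΣT) with sommer follows, using the Gtc argument to constrain the trait variance–covariance structure; the column names y and y2 and the verbose setting are assumptions.

M3  = mmer(cbind(y, y2) ~ 1,
           random = ~ vs(GID, Gu=G, Gtc=unsm(2)),     # unstructured Sigma_T (model M3)
           rcov = ~ vs(units, Gtc=unsm(2)), data = dat_F, verbose = FALSE)
M32 = mmer(cbind(y, y2) ~ 1,
           random = ~ vs(GID, Gu=G, Gtc=diag(2)),     # diagonal Sigma_T (model M32)
           rcov = ~ vs(units, Gtc=unsm(2)), data = dat_F, verbose = FALSE)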

The results are shown in Table 5.3. On average, the two evaluated models (M3 and M32) showed a similar performance in terms of the two criteria used, MSE and PC, for both traits, but in all partitions a slightly better performance was observed in favor of model M3. For trait T1, the simpler model (M32) gave an MSE 0.785% greater than that of model M3, while the more complex model (M3) gave a PC only 1.066% greater than that of model M32. The difference was smaller for the second trait (T2), where the average MSE of M32 was only 0.165% greater than that of model M3, while the PC of M3 was only 0.046% greater than the PC of M32.

Table 5.3 Prediction performance of a bivariate trait model (5.5)

Furthermore, note that the difference between the univariate models presented in Tables 5.1 and 5.2 and the multivariate models of Table 5.3 is not significant (only on average do the models in Table 5.2 perform better than the models in Table 5.1) because of the large standard deviations observed across partitions in MSE and PC, which in this case indicates that the multivariate model does not help improve the prediction accuracy for the trait of interest (the first trait). But as commented before, this benefit could be obtained with more closely related auxiliary secondary traits and larger data sets of good quality.

The R code to obtain the results given in Table 5.3 is provided in Appendix 6. At the end of this Appendix, in the comment lines, code is also given for a CV strategy in which we are interested in evaluating the performance of a bivariate model where only trait y2 is missing in the testing data set and all the information on the other trait (y1) is available. This could be useful in real applications where the interest lies in predicting traits that are difficult or expensive to measure using the phenotypic information of correlated traits that are easy or inexpensive to measure (Calus and Veerkamp 2011; Jiang et al. 2015). Of course, the code can be adapted to any other relevant strategy.

In a similar fashion, just as with the univariate genomic linear mixed model (5.4), model (5.5) can be directly extended to a model that includes the genotype × environment interaction term. Next, we do this for the balanced case, assuming that in each environment i = 1, …, I, J lines were phenotyped for nT traits, Yijt, t = 1, …, nT. In matrix notation, the extended G×E model (5.4) plus fixed effects (Xβ) is given by

$$ \boldsymbol{Y}=\left({\mathbf{1}}_{IJ}\bigotimes {\boldsymbol{I}}_{n_T}\right)\boldsymbol{\mu} +\boldsymbol{X}\boldsymbol{\beta } +{\boldsymbol{Z}}_L{\boldsymbol{b}}_1+{\boldsymbol{Z}}_{EL}{\boldsymbol{b}}_2+\boldsymbol{\epsilon}, $$
(5.6)

where \( \boldsymbol{Y}={\left[{\boldsymbol{Y}}_1^{\mathrm{T}}\dots {\boldsymbol{Y}}_I^{\mathrm{T}}\right]}^{\mathrm{T}} \), \( {\boldsymbol{Y}}_i={\left[{\boldsymbol{Y}}_{i1}^{\mathrm{T}},\dots, {\boldsymbol{Y}}_{iJ}^{\mathrm{T}}\right]}^{\mathrm{T}} \), \( {\boldsymbol{Y}}_{ij}={\left[{\boldsymbol{Y}}_{ij1},\dots {\boldsymbol{Y}}_{ij{n}_T}\right]}^{\mathrm{T}} \), i = 1, …, I, j = 1, …, J, 1IJ is the vector of ones of order IJ, \( {\boldsymbol{I}}_{n_T} \) is the identity matrix of dimension nT, \( \boldsymbol{\mu} ={\left({\mu}_1,\dots, {\mu}_{n_T}\right)}^{\mathrm{T}} \) is the vector of general specific trait means, and \( {\boldsymbol{Z}}_L={\mathbf{1}}_I\bigotimes {\boldsymbol{I}}_{n_TJ} \) and \( {\boldsymbol{Z}}_{EL}={\boldsymbol{I}}_{I{Jn}_T} \) are the incidence matrices of the genotype random effects (b1) and the genotype × environment interaction random effects (b2), respectively, with \( {\boldsymbol{b}}_1={\left[{\boldsymbol{g}}_1^{\mathrm{T}},\dots, {\boldsymbol{g}}_J^{\mathrm{T}}\right]}^{\mathrm{T}} \) and \( {\boldsymbol{b}}_{\mathbf{2}}={\left[{\boldsymbol{g}}_{21}^{\mathrm{T}},\dots, {\boldsymbol{g}}_{2I}^{\mathrm{T}}\right]}^{\mathrm{T}} \), \( {\boldsymbol{g}}_j={\left[{g}_{j1},\dots, {g}_{j{n}_T}\right]}^{\mathrm{T}}, \) \( {\boldsymbol{g}}_{2i}={\left[{\boldsymbol{g}}_{2i1}^{\mathrm{T}},\dots, {\boldsymbol{g}}_{2 iJ}^{\mathrm{T}}\right]}^{\mathrm{T}}, \) and \( {\boldsymbol{g}}_{2 ij}={\left[{\boldsymbol{g}}_{2 ij1},\dots {\boldsymbol{g}}_{2 ij{n}_T}\right]}^{\mathrm{T}} \), i = 1, …, I, j = 1, …, J. In addition, it is assumed that \( \boldsymbol{\epsilon} ={\left[{\boldsymbol{\epsilon}}_1^{\mathrm{T}}\dots {\boldsymbol{\epsilon}}_I^{\mathrm{T}}\right]}^{\mathrm{T}}\sim N\left(\mathbf{0},{\boldsymbol{I}}_{IJ}\mathbf{\bigotimes}{\boldsymbol{R}}_{n_{\boldsymbol{T}}}\right) \), b1 ∼ N(0, G ⨂ ΣT), and b2 ∼ N(0, ΣE ⨂ G ⨂ Σ2T).

Note that when ΣT, Σ2T, ΣE, and R are diagonal matrices, model (5.6) is equivalent to separately fitting a univariate GBLUP model for each trait.

Example 4

To illustrate the fitting and evaluation process of model (5.6), we considered a data set containing information on two traits, for which 150 lines were phenotyped in each of two environments, giving a total of 300 bivariate phenotypic data points. A genomic relationship matrix for the lines, computed from marker information, is also available.

The first explored model is referred to as M4 and assumes an unstructured variance–covariance matrix for all the components in model (5.6), except for the assumption that ΣE = II, that is, the model assumes the same variance–covariance across environments. In addition to this model (M4), three sub-models were also explored: M42, which considers a diagonal structure for the genetic variance–covariance between traits in the interaction term, \( {\boldsymbol{\Sigma}}_{2T}=\mathrm{Diag}\left({\sigma}_{2{T}_1}^2,{\sigma}_{2{T}_2}^2\right) \); model M43, in which \( {\boldsymbol{\Sigma}}_{1T}=\mathrm{Diag}\left({\sigma}_{1{T}_1}^2,{\sigma}_{1{T}_2}^2\right) \) and \( {\boldsymbol{\Sigma}}_{2T}=\mathrm{Diag}\left({\sigma}_{2{T}_1}^2,{\sigma}_{2{T}_2}^2\right) \); and model M44, which is the same as M43 but with an assumed diagonal variance–covariance matrix for the errors, \( \boldsymbol{R}=\mathrm{Diag}\left({\sigma}_{e_1}^2,{\sigma}_{e_2}^2\right) \).
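
A hedged sketch of model M4 in sommer, reusing the Env_GID and GE objects constructed as in Sect. 5.4, could look as follows; the trait column names GY and Testwt follow Table 5.4, and the exact constraint syntax may vary across sommer versions (the full code is in Appendix 7).

M4 = mmer(cbind(GY, Testwt) ~ Env,
          random = ~ vs(GID, Gu=G, Gtc=unsm(2)) +       # unstructured Sigma_1T for the main genotypic effects
                     vs(Env_GID, Gu=GE, Gtc=unsm(2)),   # unstructured Sigma_2T for the G x E effects
          rcov = ~ vs(units, Gtc=unsm(2)),              # unstructured residual covariance R
          data = dat_F, verbose = FALSE)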

The results are shown in Table 5.4, from which we can observe that for trait 1 (GY), the best performance under both criteria (MSE and PC) was obtained with the most complex model, M4. For this trait (GY), the MSEs of models M42, M43, and M44 were 7.53%, 8.21%, and 8.17% greater, respectively, than the MSE of model M4. For the same trait (GY), the PC of model M4 was 47.73%, 52.24%, and 72.57% greater than the PCs of models M42, M43, and M44, respectively.

Table 5.4 Prediction performance of some sub-models of model (5.6)

For the second trait (Testwt), model M4 also showed the best performance, but only under the MSE criterion: the MSEs of models M42, M43, and M44 were 4.81%, 6.44%, and 10.08% greater, respectively, than the MSE corresponding to model M4, again suggesting an increasing degradation of the MSE performance as the model becomes simpler, with fewer parameters to estimate relative to M4. In terms of PC, the best performance was achieved with model M42, whose PC was 20.54%, 0.583%, and 2.247% greater than those of models M4, M43, and M44, respectively.

Appendix 7 shows the R code used to reproduce the results in Table 5.4 with the sommer package. At the end of this code, we also include the basic code to explore other variance–covariance structures; specifically, the code used to explore the model with a heterogeneous genetic variance–covariance matrix, Σ2iT, across environments, that is, g2i ∼ N(0, G ⨂ Σ2iT), i = 1, …, I, assumed independent across environments.

5.6 Final Comments

The multi-trait linear model proposed by Henderson and Quaas (1976) in animal breeding can bring benefits over single-trait modeling by improving prediction accuracy through the incorporation of correlated traits, as well as by providing an optimal and simplified total merit selection index (Okeke et al. 2017).

When the goal is to predict difficult or expensive traits that are correlated with inexpensive secondary traits, the use of multi-trait models could be helpful in developing better genomic selection strategies. Similarly, improvement of the accuracy of prediction for low-heritability key traits can follow from the use of high-heritability secondary traits (Jia and Jannink 2012; Muranty et al. 2015). Furthermore, this can be combined with the information of traits obtained using the speed breeding methodology to shorten the breeding cycles and accelerate breeding programs (Ghosh et al. 2018; Watson et al. 2019).

While the advantage of the multi-trait model is clearly documented, larger data sets and more computing resources are required, as there are additional parameters that need to be estimated (genetic and error covariances), which may affect the accuracy of genomic prediction. Additionally, convergence problems often arise when implementing complex mixed linear models and especially when small data sets are used.

Although the application of multi-trait models can be easily adapted with regard to genetic correlation, heritability, training population composition, and data set size, some other factors need to be carefully considered when using these methods to improve genomic prediction accuracy (Lorenz and Smith 2015; Covarrubias-Pazaran et al. 2018). For example, biased and suboptimal choices between univariate and multi-trait models can result from using auxiliary traits measured on the individuals to be tested, but appropriate cross-validation strategies can help determine the usefulness of combining multi-trait information with multi-trait models (Runcie and Cheng 2019).