Abstract
The linear genomic selection index (LGSI) is a linear combination of genomic estimated breeding values (GEBVs) used to predict the individual net genetic merit and select individual candidates from a nonphenotyped testing population as parents of the next selection cycle. In the LGSI, phenotypic and marker data from the training population are fitted into a statistical model to estimate all individual available genome marker effects; these estimates can then be used in subsequent selection cycles to obtain GEBVs that are predictors of breeding values in a testing population for which there is only marker information. The GEBVs are obtained by multiplying the estimated marker effects in the training population by the coded marker values obtained in the testing population in each selection cycle. Applying the LGSI in plant or animal breeding requires the candidates to be genotyped for selection to obtain the GEBV, and predicting and ranking the net genetic merit of the candidates for selection using the LGSI. We describe the LGSI and show that it is a direct application of the linear phenotypic selection index theory in the genomic selection context; next, we present the combined LGSI (CLGSI), which uses phenotypic and GEBV information jointly to predict the net genetic merit. The CLGSI can be used only in training populations when there are phenotypic and maker information, whereas the LGSI is used in testing populations where there is only marker information. We validate the theoretical results of the LGSI and CLGSI using real and simulated data.
Download chapter PDF
5.1 The Linear Genomic Selection Index
5.1.1 Basic Conditions for Constructing the LGSI
Conditions described in Chap. 4 (Sect. 4.1.1) for constructing a valid linear molecular selection index (LMSI), are also necessary for the linear genomic selection index (LGSI); however, in addition to those conditions, the LGSI also requires:

1.
All marker effects to be estimated simultaneously in the training population.

2.
The estimated marker effects to be used in subsequent selection cycles to obtain GEBVs that are predictors of the individual breeding values in the testing population (candidates for selection) for which there is only marker information.

3.
The GEBV values to be composed entirely of the additive genetic effects.

4.
Phenotypes to be used to estimate all marker effects in the training population, not to make selections in the testing population (Heffner et al. 2009; Lorenz et al. 2011).
5.1.2 Genomic Breeding Values and Marker Effects
The breeding value (g_{i}) is the average additive effects of the genes an individual receives from both parents; thus, it is a function of the genes transmitted from parents to progeny and is the only component that can be selected and, therefore, the main component of interest in breeding programs (Mrode 2005). The ith phenotypic value (y_{i}) can be denoted as y_{i} = g_{i} + e_{i}, where g_{i} is the breeding value and e_{i} the residual. Basic assumptions for g_{i} and e_{i} are: both g_{i} and e_{i} have normal distribution with expectation equal to zero and variance \( {\sigma}_{g_i}^2 \) and \( {\sigma}_{e_i}^2 \) respectively. This means that y_{i} = μ_{i} + g_{i} + e_{i} is a linear mixed model (Mrode 2005; Searle et al. 2006), where μ_{i} is the mean of y_{i}.
Let \( {\mathbf{y}}_i^{\prime }=\left[{y}_{i1}\kern0.5em {y}_{i2}\kern0.5em \cdots \kern0.5em {y}_{in}\right] \) be a vector 1 × n of observations in the ith trait and let \( {\mathbf{g}}_i^{\prime }=\left[{g}_{i1}\kern0.5em {g}_{i2}\kern0.5em \cdots \kern0.5em {g}_{in}\right] \) be a vector 1 × n of unobservable breeding values associated with y_{i}; then y_{i} can be written as
where μ_{i} is the mean of the ith trait, 1 is a vector n × 1 of 1s, Z is a design matrix of 0s and 1s, g_{i} ~ MVN (0, \( \mathbf{A}{\sigma}_{g_i}^2 \)) is a vector of breeding values, and e_{i} ~ MVN (0, \( {\mathbf{I}}_n{\sigma}_{e_i}^2 \)) is a vector of residuals; 0 is the mean and \( \mathbf{A}{\sigma}_{g_i}^2 \) and \( {\mathbf{I}}_n{\sigma}_{e_i}^2 \) the covariance matrix of g_{i} and e_{i} respectively; A is the numerical relationship matrix (Mrode 2005) and I_{n} an identity matrix n × n; \( {\sigma}_{g_i}^2 \) and \( {\sigma}_{e_i}^2 \) are the additive and residual variances associated with g_{i} and e_{i}; and MVN stands for multivariate normal distribution.
Suppose that A, Z, μ_{i}, \( {\sigma}_{g_i}^2 \), and \( {\sigma}_{e_i}^2 \) are known; then, according to Mrode (2005), the best linear unbiased predictor (BLUP) of g_{i} can be written as
where V^{−1} is the inverse matrix of the variance of y_{i}, i.e., \( Var\left({\mathbf{y}}_i\right)={\sigma}_{g_i}^2\mathbf{ZA}{\mathbf{Z}}^{\prime }+{\mathbf{I}}_n{\sigma}_{e_i}^2=\mathbf{V} \). In the context of animal breeding, Eq. (5.2) is considered a univariate linear phenotypic selection index (LPSI) (Mrode 2005) and is used to rank and select individuals as parents of the next generation in the context of one trait. Equation (5.2) can be extended to the multitrait phenotypic selection index case, but to predict the net genetic merit (H = w′g, see Chap. 2 for details) it would be necessary to construct linear combinations of the predicted values of g_{i} associated with the traits of interest as was described in the Foreword of this book.
The vector of the individual genomic breeding values (γ_{i}) associated with the ith characteristic (i = 1, 2,…,t; t = number of traits) of the candidates for selection can be written as
where X is an n × m matrix (n = number of observations and m = number of markers in the population) of coded marker values (2 − 2p, 1 − 2p, and −2p for genotypes AA, Aa, and aa respectively) associated with the additive effects of the quantitative trait loci (QTL) and u_{i} is an m × 1 vector of the additive effects of the QTL associated with markers that affect the ith trait. It is assumed that γ_{i} has MVN with mean 0 and variance \( \mathbf{G}{\sigma}_{\gamma}^2 \), i.e., γ_{i} ~ MVN (0, \( \mathbf{G}{\sigma}_{\upgamma_{\mathrm{i}}}^2 \)), where \( {\sigma}_{\upgamma_{\mathrm{i}}}^2 \) is the additive genomic variance of γ_{i} and G = XX′/c is the n × n additive genomic relationship matrix between genotypes; \( c=\sum \limits_{j=1}^m2{p}_j\left(1{p}_j\right) \) in an F_{2} population, and \( c=\sum \limits_{j=1}^m4{p}_j\left(1{p}_j\right) \) in a double haploid population; p is the frequency of allele A and 1 − p is the frequency of allele a in the jth marker (j = 1, 2, …, m).
The additive genomic relationship matrix G = XX′/c has special properties. For example, in the asymptotic context, the expectation of matrix G is equal to the numerical relationship matrix A, i.e., E(G) = A (Habier et al. 2007; Van Raden 2008); this means that G is a particular realization of A and when the number of markers and genotypes increases in the training population, the value of G tends to concentrate around A. Thus, it can be assumed that at the limit, when the number of markers and genotypes is very high, G = A (CerónRojas and SahagúnCastellanos 2016).
The vector of genomic breeding values (Eq. 5.3) has a similar function in genomic selection as g_{i} in the phenotypic selection context. In addition, g_{i} can be written as g_{i} = γ_{i} + η_{i}, where η_{i} = g_{i} − γ_{i} (Gianola et al. 2003). Also, note that
i.e., the covariance between γ_{i} and g_{i} is equal to the variance of γ_{i} (Dekkers 2007).
Let \( {\mathbf{y}}_i^{\prime }=\left[{y}_{i1}\kern0.5em {y}_{i2}\kern0.5em \cdots \kern0.5em {y}_{in}\right] \) be a vector 1 × n of observation of the ith trait in the training population and let \( {\boldsymbol{\upgamma}}_i^{\prime }=\left[{\gamma}_{i1}\kern0.5em {\gamma}_{i2}\kern0.5em \cdots \kern0.5em {\gamma}_{in}\right] \) be a vector 1 × n of unobservable genomic breeding values associated with y_{i}; then, y_{i} can also be written as
where μ_{i} is the mean of the ith trait, 1 is a vector n × 1 of 1s, Z is a design matrix, γ_{i} ~ MVN (0, \( \mathbf{G}{\sigma}_{\gamma}^2 \)) and ε_{i} ~ MVN (0, \( {\mathbf{I}}_n{\sigma}_{\varepsilon_i}^2 \)) are vectors of genomic breeding values and of residuals respectively, and \( {\sigma}_{\varepsilon_i}^2 \) is the residual variance. I_{n}, G, and \( {\sigma}_{\upgamma}^2 \) were defined in Eqs. (5.2) and (5.3).
According to Eqs. (5.2) and (5.3), when μ_{i}, \( {\sigma}_{\upgamma}^2 \) and \( {\sigma}_{\varepsilon_i}^2 \) are known, the vector of GEBVs for the individuals with the ith trait can be obtained as
where the variance of y_{i} should now be written as \( \mathbf{V}={\sigma}_{\gamma_i}^2\mathbf{ZG}{\mathbf{Z}}^{\prime }+{\mathbf{I}}_n{\sigma}_{\varepsilon_i}^2 \). In the context of genomic selection, Eq. (5.6) is considered a univariate LGSI and is used to rank and select individuals as parents of the next generation (Van Raden 2008; Togashi et al. 2011). Equation (5.6) is the BLUP of γ_{i} and can be extended to a multitrait genomic selection index, but to predict the net genetic merit (H = w′g), it is necessary to construct an LGSI, which is a linear combination of γ_{i}.
Although Eq. (5.6) is theoretically very important in LGSI, in practice we need to estimate the marker effects associated with all the traits of interest and to use these estimates in the testing population to obtain the GEBV of the candidates for selection. Let \( {\mathbf{u}}^{\prime }=\left[{\mathbf{u}}_1^{\prime}\kern0.5em {\mathbf{u}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{u}}_t^{\prime}\right] \) be a vector 1 × nt associated with t traits. In the univariate context, Van Raden (2008) showed that the ith vector u_{i} of marker effects in the training population can be estimated as
where \( \upsilon =\frac{\sigma_{e_i}^2}{\sigma_{g_i}^2} \); \( {\sigma}_{g_i}^2 \), \( {\sigma}_{e_i}^2 \) and the other parameters were defined earlier. According to CeronRojas et al. (2015), to estimate the vector \( {\mathbf{u}}^{\prime }=\left[{\mathbf{u}}_1^{\prime}\kern0.5em {\mathbf{u}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{u}}_t^{\prime}\right] \) in the multitrait context, Eq. (5.7) can be written as
where W_{t} = I_{t} ⊗ X, “⊗” denotes the Kronecker product (Schott 2005), c and X were defined in Eq. (5.3); N = RC^{−1}, whereR and C are the residual and breeding value covariance matrices for t traits respectively; \( {\mathbf{y}}^{\prime }=\left[{\mathbf{y}}_1^{\prime}\kern0.5em {\mathbf{y}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{y}}_t^{\prime}\right] \) ~ MVN(μ, V) is a vector of size 1 × tn, with covariance matrix V = C ⊗ G + R ⊗ I_{n}; I_{t} is an identity matrix of size t × t and I_{n} was defined earlier; \( {\boldsymbol{\upmu}}^{\prime }=\left[{\mu}_1\kern0.5em {\mu}_2\kern0.5em \cdots \kern0.5em {\mu}_t\right] \) is a vector 1 × t of means associated with vector y, and 1 is a vector n × 1 of 1s. In this case, the estimator of the vector of subvectors of genomic breeding values \( {\boldsymbol{\upgamma}}^{\prime }=\left[{\boldsymbol{\upgamma}}_1\kern0.5em {\boldsymbol{\upgamma}}_2\kern0.5em \dots \kern0.5em {\boldsymbol{\upgamma}}_t\right] \) in the testing population can be obtained as
Equation (5.9) is the vector of GEBVs for the multitrait case. Thus, in the testing population, in Eq. (5.9), only the coded values in matrix X change, whereas \( \widehat{\mathbf{u}} \) is the same in each selection cycle. Note that to obtain Eqs. (5.7) and (5.8), we assumed that μ, C, and R are known.
We indicated that the genomic breeding values have normal distribution (Eq. 5.5). Using the simulated data described in Chap. 2, Sect. 2.8.1, in Fig. 5.1 we present the distribution of the GEBVs (Eq. 5.9) associated with traits T1 in the first (Fig. 5.1a) and the fifth (Fig. 5.1b) selection cycles in the testing population. In effect, the frequency distribution of the GEBVs approaches normal distribution in both selection cycles.
5.1.3 The LGSI and Its Parameters
Similar to the LPSI (Chap. 2), the objective of the LGSI is to predict the net genetic merit H = w′g, where \( {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em {g}_2\kern0.5em \dots \kern0.5em {g}_t\right] \) (t = number of traits) is a vector of unobservable true breeding values and \( {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em {w}_2\kern0.5em \dots \kern0.5em {w}_t\right] \) is a vector of economic weights. Suppose that the genomic breeding values γ_{i} = Xu_{i} are known; then, the LGSI can be written as
where β is an unknown vector of weights.
The main advantage of the LGSI over the LPSI lies in the possibility of reducing the intervals between selection cycles (L_{G}) by more than two thirds (Lorenz et al. 2011); thus, this parameter should be incorporated into the LGSI selection response and the expected genetic gain per trait to reflect the main advantage of the LGSI over the LPSI and the other indices. Assuming that L_{G} = 1, in the LPSI context we wrote the selection response as R_{I} = k_{I}σ_{H}ρ_{HI}; however, if L_{G} ≠ 1, the LGSI selection response can be written as
where k_{I} is the standardized selection differential (or selection intensity) associated with the LGSI, \( {\sigma}_{HI_G} \) is the covariance between H = w′g and the LGSI, \( {\sigma}_{I_G}^2 \) is the variance of the LGSI, σ_{H} is the standard deviation of H, \( {\rho}_{HI_G} \) is the correlation between H and the LGSI, and L_{G} denotes the intervals between selection cycles.
Let C and Γ be matrices of covariance of the breeding values (g) and of the genomic breeding values (γ) respectively; then, the correlation between H = w′g and I_{G} = β′γ can be written as
where \( {\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta} ={\sigma}_{HI_G} \) is the covariance between H = w′g and I_{G} = β′γ, \( {\sigma}_H=\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}} \) is the standard deviation of the variance of H = w′g, and \( {\sigma}_{I_G}=\sqrt{{\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}} \) is the standard deviation of the variance of I_{G} = β′γ.
5.1.4 Maximizing LGSI Parameters
To maximize the genomic selection response (Eq. 5.11), suppose that k_{I}, σ_{H} and L_{G} are fixed and take the derivative of the natural logarithm (ln) of the correlation between H and I_{G} (Eq. 5.12) with respect to vector β, equate the result of the derivative to the null vector, and isolate β, i.e.,
The result is β = sw, where s = β′Γβ/w′Γβ is a proportional constant that does not affect the maximum value of \( {\rho}_{HI_G} \), because this is invariant to the scale change; then, assuming that β = w, the maximized LGSI selection response can be written as
Hereafter, we refer to the LGSI genomic selection response as that of Eq. (5.14). Also, because β = w, Eq. (5.12) can be written as
which is the maximized correlation between H = w′g and I_{G} = β′γ, or LGSI accuracy; \( {\sigma}_H=\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}} \) is the standard deviation of the variance of H, and \( {\sigma}_{I_G}=\sqrt{{\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}} \) is the standard deviation of the variance of I_{G}.
The LGSI expected genetic gain per trait (\( {\mathbf{E}}_{I_G} \)) can be written as
All the terms in Eq. (5.16) were previously defined.
Let \( {\lambda}_G=\frac{\rho_{HI_G}}{\rho_{HI}} \) be LGSI efficiency versus LPSI efficiency to predict the net genetic merit, where \( {\rho}_{HI_G} \) is the LGSI accuracy and ρ_{HI} the LPSI accuracy; in percentage terms, LGSI efficiency versus LPSI efficiency for each selection cycle can be written as
According to Eq. (5.17), if p_{G} > 0, LGSI efficiency is greater than LPSI efficiency; if p_{G} = 0, the efficiency of both selection indices is equal, and if p_{G} < 0, the LPSI is more efficient than the LGSI at predicting H = w′g.
Equation (5.17) is useful for measuring LGSI efficiency in terms of accuracy when predicting the net genetic merit (H = w′g), whereas the Technow et al. (2013) inequality measures LGSI efficiency in terms of the time needed to complete one selection cycle. In the context of the LGSI and the LPSI, the Technow inequality can be written as
where L_{G} and L_{P} denote the time required to complete one selection cycle for the LGSI and the LPSI respectively, \( {\rho}_{HI_G} \) is the LGSI accuracy, and h_{I} is the square root of the heritability (Lin and Allaire 1977; Nordskog 1978) of the LPSI, which can be denoted as \( {h}_I=\sqrt{\frac{{\mathbf{b}}^{\prime}\mathbf{Cb}}{{\mathbf{b}}^{\prime}\mathbf{Pb}}} \) (see Chap. 2 for details). Then, assuming that the selection intensity is the same for both selection indices, if Eq. (5.18) is true, the LGSI is more efficient than the LPSI per unit of time.
5.1.5 Relationship Between the LGSI and LPSI Selection Responses
To obtain the relationship between \( {R}_{I_G} \) and R_{I} in the asymptotic context, we omitted the intervals between selection cycles (L_{G} and L_{I} respectively) to simplify the algebra. Consider a population where the number of genotypes and markers tends to infinity; in this case, markers explain most of the true additive genetic variances and covariances. Thus, we can assume that matrices Γ and C are very similar, and at the limit, Γ = C. Now suppose that in this population the phenotypic variance–covariance matrix (P) is known and comprises matrix Γ and the variance–covariance residual matrix (R). In this case, the inverse of P can be written as P^{−1} = (Γ + R)^{−1} = Γ^{−1} − Γ^{−1}(Γ^{−1} + R^{−1})^{−1}Γ^{−1}, where Γ^{−1} and R^{−1} are the inverses of matrices Γ and R respectively. Thus, the LPSI selection response is given by
where b = P^{−1}Γw is the vector of coefficients of the LPSI in the asymptotic context. Note that b′Pb ≥ 0 and w′Γw ≥ 0, i.e., b′Pb and w′Γw are positive semidefinite, meaning that w′Γw ≥ w′(Γ^{−1} + R^{−1})^{−1}w ≥ 0; then, in the asymptotic context, \( {R}_{I_G}\ge {R}_I \). This result is not common when the number of genotypes and markers is small; however, it gives an idea of the theoretical behavior of \( {R}_{I_G} \) with respect to R_{I} when the number of markers and genotypes is very large.
Because g_{q} can be written as g_{q} = γ_{q} + η_{q}, where η_{q} = g_{q} − γ_{q} (q = 1, 2, ⋯, t), for low numbers of markers and genotypes, the covariance genotypic matrix C can be written as C = Γ + E, where E = C − Γ; then, the inverse of matrix P can be written as P^{−1} = [(Γ + E) + R]^{−1} = (Γ + E)^{−1} − (Γ + E)^{−1}[(Γ + E)^{−1} + R^{−1}]^{−1}(Γ + E)^{−1}. In the latter case, the LPSI selection response R_{I} can be written as
Equation (5.20) indicates that in the nonasymptotic context (low numbers of markers and genotypes), \( {R}_{I_G} \) and R_{I} are related in three possible ways:

1.
\( {R}_I>{R}_{I_G} \) if w′Ew > w′[(Γ + E)^{−1} + R^{−1}]^{−1}w

2.
\( {R}_I={R}_{I_G} \) if w′Ew = w′[(Γ + E)^{−1} + R^{−1}]^{−1}w

3.
\( {R}_{I_G}>{R}_I \) if w′Ew < w′[(Γ + E)^{−1} + R^{−1}]^{−1}w
The second and third points indicate that \( {R}_{I_G} \) may be equal to or larger than R_{I}, even under a small number of markers, depending on the size of w′Ew and w′[(Γ + E)^{−1} + R^{−1}]^{−1}w. These three points explain the theoretical relationship between R_{I} and \( {R}_{I_G} \) for a low number of markers and genotypes. When Γ = C, E = 0, and \( {R}_I={k}_I\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}{\mathbf{w}}^{\prime }{\left({\boldsymbol{\Gamma}}^{1}+{\mathbf{R}}^{1}\right)}^{1}\mathbf{w}} \), then \( {R}_{I_G}\ge {R}_I \).
5.1.6 Statistical LGSI Properties
Assuming that H and I_{G} have joint bivariate normal distribution and that Γ, C, and w are known, the LGSI has the following properties:

1.
The variance of I_{G} (\( {\sigma}_{I_G}^2 \)) and the covariance between H and I_{G} (\( {\sigma}_{HI_G} \)) are equal, i.e., \( {\sigma}_{I_G}^2={\sigma}_{HI_G} \).

2.
The maximized correlation between H and I_{G} (or LGSI accuracy) is equal to \( {\rho}_{HI_G}=\frac{\sigma_{I_G}}{\sigma_H} \), where \( {\sigma}_{I_G} \) is the standard deviation of \( {\sigma}_{I_G}^2 \) and σ_{H} is the standard deviation of the variance of H (\( {\sigma}_H^2 \)).

3.
The variance of the predicted error, \( Var\left(H{I}_G\right)=\left(1{\rho}_{HI_G}^2\right){\sigma}_H^2 \), is minimal. Note that \( Var\left(H{I}_G\right)={\sigma}_{I_G}^2+{\sigma}_H^22{\sigma}_{HI_G} \), and when β = w, \( {\sigma}_{I_G}^2={\sigma}_{HI_G} \), whence \( Var\left(H{I}_G\right)={\sigma}_H^2{\sigma}_{I_G}^2=\left(1{\rho}_{HI_G}^2\right){\sigma}_H^2 \) is minimal.

4.
The total variance of H explained by I_{G} is \( {\sigma}_{I_G}^2={\rho}_{HI_G}^2{\sigma}_H^2 \). It is evident that if \( {\rho}_{HI_G}=1 \), \( {\sigma}_{I_G}^2={\sigma}_H^2 \), and if \( {\rho}_{HI_G}=0 \), \( {\sigma}_{I_G}^2=0 \). That is, the variance of H explained by I_{G} is proportional to \( {\rho}_{HI_G} \), and when \( {\rho}_{HI_G} \) is close to 1, \( {\sigma}_{I_G}^2 \) is close to \( {\sigma}_H^2 \); if \( {\rho}_{HI_G} \) is close to 0, \( {\sigma}_{I_G}^2 \) is close to 0.
The LGSI properties described in points 1–4 of this subsection are the same as the LPSI properties described in Chap. 2. This corroborates the LGSI as an application of the LPSI theory to the genomic selection context.
5.1.7 Genomic Covariance Matrix in the Training and Testing Population
To derive the LGSI theory, we assumed that the true genomic additive variance–covariance matrix Γ was known. However, in practice, we need to estimate it. In the training population, matrix Γ can be estimated by restricted maximum likelihood (REML) using phenotypic and genomic information, as described by Vattikuti et al. (2012) and Su et al. (2012). In Eqs. (2.22) to (2.24) of Chap. 2, we presented the formulas for estimating the genotypic and residual variance and covariance based on the formulas described by Lynch and Walsh (1998). Here, we present a brief description of how we can estimate the qth component (σ_{γqq}) of Γ in the training population using the REML method.
We estimated \( {\sigma}_{\upgamma qq}={\sigma}_{\upgamma q}^2 \) (q, q′ = t = number of traits) in the absence of dominance and epistatic effects, using the model y_{q} = 1μ_{q} + Zγ_{q} + ε_{q}, where the vector y_{q} ~ NMV(1μ_{q}, V_{q}) g × 1 (g = number of genotypes in the population) had a multivariate normal distribution; 1 was a g × 1 vector of 1s, μ_{q} was the mean of the qth trait, Z was an identity matrix g × g; γ_{q} ~ NMV(0, \( \mathbf{G}{\sigma}_{\upgamma q}^2 \)) was a vector of genomic breeding values, and ε_{q} ~ NMV(0, \( \mathbf{I}{\sigma}_{\varepsilon_q}^2 \)) was a g × 1 vector of residuals. Matrix G = XX′/c was the genomic relationship matrix, and in an F_{2} population, \( c=\sum \limits_{j=1}^N2{p}_j{q}_j \); X was a g × m matrix (m = number of markers) of the coded marker values (2 − 2p for AA, 1 − 2p for Aa, and −2p for aa) for the additive effects of the markers; p and q denote the frequency of allele A and the frequency of allele a in the jth marker (j = 1, 2, …, m), and \( {\mathbf{V}}_q=\mathbf{G}{\sigma}_{\upgamma q}^2+\mathbf{I}{\sigma}_{\varepsilon_q}^2 \).
The expectation–maximization algorithm allowed the REML for the variance components \( {\sigma}_{\upgamma q}^2 \) and \( {\sigma}_{\varepsilon_q}^2 \) to be computed by iterating the following equations:
and
where g is the number of genotypes. After n iterations, when \( {\sigma}_{\upgamma q}^{2\left(n+1\right)} \) was very similar to \( {\sigma}_{\upgamma q}^{2(n)} \) and \( {\sigma}_{\varepsilon_q}^{2\left(n+1\right)} \) was very similar to \( {\sigma}_{\varepsilon_q}^{2(n)} \), \( {\sigma}_{\upgamma q}^{2\left(n+1\right)} \) and \( {\sigma}_{\varepsilon_q}^{2\left(n+1\right)} \) were the estimated variance components of \( {\sigma}_{\upgamma q}^2 \) and \( {\sigma}_{\varepsilon_q}^2 \) respectively. In Eqs. (5.21) and (5.22) tr(.) denoted the trace of the matrices within brackets; \( \mathbf{T}={\mathbf{V}}_q^{1}{\mathbf{V}}_q^{1}\mathbf{1}{\left({\mathbf{1}}^{\prime }{\mathbf{V}}_q^{1}\mathbf{1}\right)}^{1}{\mathbf{1}}^{\prime }{\mathbf{V}}_q^{1} \), and \( {\mathbf{V}}_q^{1} \) was the inverse of \( {\mathbf{V}}_q=\mathbf{G}{\sigma}_{\upgamma q}^2+\mathbf{I}{\sigma}_{\varepsilon_q}^2 \). In matrix T^{(n)}, \( {\mathbf{V}}_q^{1(n)} \) was the inverse of matrix \( {\mathbf{V}}_q^{(n)}=\mathbf{G}{\sigma}_{\upgamma q}^{2(n)}+\mathbf{I}{\sigma}_{\varepsilon_q}^{2(n)} \).
The genomic additive genetic covariance between the observations of the qth and ith traits, y_{q} and y_{i} (\( {\sigma}_{\gamma_{\mathrm{qi}}} \), q, i = 1, 2,…,t), can be estimated by REML. Here, we adapted Eqs. (5.21) and (5.22) using the variance of the sum of y_{q} and y_{i}, i.e., Var(y_{i} + y_{q}) = V_{i} + V_{q} + 2C_{iq}, where \( {\mathbf{V}}_i=\mathbf{G}{\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{i}}}^2+\mathbf{I}{\sigma}_{\varepsilon_i}^2= Var\left({\mathbf{y}}_i\right) \) is the variance of y_{i} and \( {\mathbf{V}}_q=\mathbf{G}{\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{q}}}^2+\mathbf{I}{\sigma}_{\varepsilon_q}^2= Var\left({\mathbf{y}}_q\right) \) is the variance of y_{q}; \( 2{\mathbf{C}}_{iq}=2\mathbf{G}{\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{iq}}}+2\mathbf{I}{\sigma}_{\varepsilon_{iq}}=2 Cov\left({\mathbf{y}}_i,{\mathbf{y}}_q\right) \) is the covariance of y_{q} and y_{i}, and \( {\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{iq}}} \) and \( {\sigma}_{\varepsilon_{iq}} \) are the genomic and residual covariance respectively, associated with y_{i} and y_{q}. Thus, one way of estimating \( {\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{iq}}} \) and \( {\sigma}_{e_{iq}} \) is by using the following equation:
for which Eqs. (5.21) and (5.22) can be adapted.
If there is only marker information on the testing population, then it is not possible to estimate Γ using Eqs. (5.21) to (5.23). Another way of estimating Γ is to use the method proposed by CeronRojas et al. (2015), which requires the estimated values of γ_{q} (\( {\widehat{\boldsymbol{\upgamma}}}_q \)) in the cycle of interest. Let \( \widehat{\mathbf{u}} \) be the estimator of the vector of marker effects \( {\mathbf{u}}^{\prime }=\left[{\mathbf{u}}_1^{\prime}\kern0.5em {\mathbf{u}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{u}}_t^{\prime}\right] \) for t traits obtained in the training population. We obtained the qth GEBVs (q = 1, 2, …, t) in the lth selection cycle (l = 1, 2, …, number of cycles) as
where \( {\widehat{\mathbf{u}}}_q \) is the vector of size m × 1 of the estimated marker effects of the qth trait in the training population and X_{l} is a matrix of size n × m of the coded values of marker genotypes in the lth selection cycle of the testing population.
Now suppose that γ_{q} and \( {\boldsymbol{\upgamma}}_{q^{\prime }} \) have multivariate normal distribution jointly, with mean \( \mathbf{1}{\mu}_{\gamma_q} \) and \( \mathbf{1}{\mu}_{\gamma_{q^{\prime }}} \) respectively, and covariance matrix \( \mathbf{G}{\sigma}_{\gamma_{q{q}^{\prime }}} \), where 1 is an n × 1 vector of 1s and G = XX′/c is the additive genomic relationship matrix. Then, \( \boldsymbol{\Gamma} =\left\{{\sigma}_{\gamma_{q{q}^{\prime }}}\right\} \) can be estimated as
where \( {\widehat{\sigma}}_{\gamma_{q{q}^{\prime }}}=\frac{1}{g}{\left({\widehat{\boldsymbol{\upgamma}}}_{ql}\mathbf{1}{\widehat{\mu}}_{\gamma_{ql}}\right)}^{\prime }{\mathbf{G}}_l^{1}\left({\widehat{\boldsymbol{\upgamma}}}_{q^{\prime }l}\mathbf{1}{\widehat{\mu}}_{\gamma_{q^{\prime }l}}\right) \) is the estimated covariance between γ_{q} and \( {\boldsymbol{\upgamma}}_{q^{\prime }} \) in the lth selection cycle of the testing population; g is the number of genotypes; \( {\widehat{\boldsymbol{\upgamma}}}_{ql} \) was defined in Eq. (5.24); \( {\widehat{\mu}}_{\gamma_{ql}} \) and \( {\widehat{\mu}}_{\gamma_{q^{\prime }l}} \) are the estimated arithmetic means of the values of \( {\widehat{\boldsymbol{\upgamma}}}_{ql} \) and \( {\widehat{\boldsymbol{\upgamma}}}_{q^{\prime }l} \); 1 is a g × 1 vector of 1s and \( {\mathbf{G}}_l={c}^{1}{\mathbf{X}}_l{\mathbf{X}}_l^{\prime } \) is the additive genomic relationship matrix in the lth selection cycle (l = 1, 2, …, number of cycles) in the testing population.
From Eq. (5.25) we can estimate the LGSI response and expected genetic gain per trait in the testing population as
respectively. The estimated LGSI (\( {\widehat{I}}_{\mathrm{G}} \)) values in the lth selection cycle can be obtained as
where w_{q} is the qth economic weight and \( {\widehat{\boldsymbol{\upgamma}}}_{ql} \) was defined in Eq. (5.24). Equation (5.27) is a vector of size g × 1 (g= number of genotypes). In practice, \( {\widehat{I}}_{\mathrm{G}} \) values are ranked to select individual genotypes with optimal GEBVs.
5.1.8 Numerical Examples
To estimate matrices C and R and the marker effects in the training population, we used a real maize (Zea mays) F_{2} population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits—grain yield (GY, ton ha^{−1}), ear height (EHT, cm), and plant height (PHT, cm)—evaluated in one environment. The estimated matrices were \( \widehat{\mathbf{C}}=\left[\begin{array}{ccc}0.07& 0.61& 1.06\\ {}0.61& 17.93& 22.75\\ {}1.06& 22.75& 44.53\end{array}\right] \) and \( \widehat{\mathbf{R}}=\left[\begin{array}{ccc}0.38& 0.72& 1.27\\ {}0.72& 47.14& 60.96\\ {}1.27& 60.96& 121.46\end{array}\right] \), which were estimated by Eqs. (5.21) to (5.23) using the numerical relationship A instead of the genomic relationship matrix (G = XX′/c).
Table 5.1 presents the first 20 BLUPs of the estimated marker effects (Eq. 5.8) in the training population and the first 20 marker coded values and GEBVs (Eq. 5.9) obtained in the testing population associated with trait GY. In the testing population, there were 380 genotypes and 233 molecular markers. In this population, the estimated genomic covariance matrix \( \boldsymbol{\Gamma} =\left\{{\sigma}_{\gamma_{q{q}^{\prime }}}\right\} \) was \( \widehat{\boldsymbol{\Gamma}}=\left[\begin{array}{ccc}0.21& 2.95& 5.00\\ {}2.95& 42.41& 71.11\\ {}5.00& 71.11& 121.53\end{array}\right] \). The first GEBV (0.195) related to GY in Table 5.1 was obtained as 0.195 = − 0.0003(1) − 0.0038(1) − 0.0085(0) + ⋯ − 0.03(−1). The other GEBVs can be obtained in a similar manner.
Suppose a selection intensity of 10% (k_{I} = 1.755) and a vector of economic weights of \( {\mathbf{w}}^{\prime }=\left[5\kern0.5em 0.1\kern0.5em 0.1\right] \); then, the estimated LGSI selection response and the expected genetic gain per trait without including the interval between selection cycle is \( {\widehat{R}}_{I_G}=(1.755)\sqrt{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}\mathbf{w}}=0.92 \) and \( {\widehat{\mathbf{E}}}_{I_G}^{\prime }=(1.755)\frac{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}}{\sqrt{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}\mathbf{w}}}=\left[0.80\kern0.5em 11.41\kern0.5em 19.28\right] \) respectively, whereas the estimated LGSI accuracy was \( {\widehat{\rho}}_{HI_G}=0.48 \).
Chapter 11 presents RIndSel, a graphical unit interface that uses selection index theory to select individual candidates as parents for the next selection cycle, which can be used to obtain the results of the real numerical example described in this subsection.
To compare LGSI efficiency versus LPSI efficiency we used the simulated data described in Chap. 2, Sect. 2.8.1. According to Beyene et al. (2015), at least 4 years are required to complete one phenotypic selection cycle in maize, whereas genomic selection requires only 1.5 years. Thus, to compare LGSI efficiency versus LPSI efficiency in terms of time, we can use the Technow et al. (2013) inequality described in Eq. (5.18).
Table 5.2 presents the estimated value of Eq. (5.18) for five simulated selection cycles. The LGSI efficiency was higher than LPSI efficiency in terms of time, because the Technow et al. (2013) inequality was true in the five selection cycles. An additional result obtained by CeronRojas et al. (2015) is presented in Fig. 5.2, which shows the correlation among the LGSI, the LPSI, and the true net genetic merit values in seven selection cycles. According to Fig. 5.2, the correlation between the LGSI and the true net genetic merit values was higher than the correlation between the LPSI and the true net genetic merit values for the first three selection cycles; after this cycle, the correlation between LGSI and the true net genetic merit values tended to decrease.
5.2 The Combined Linear Genomic Selection Index
The combined LGSI (CLGSI) developed by Dekkers (2007) is a slightly modified version of the LMSI (see Chap. 4 for details), which, instead of using the marker scores, uses the GEBVs and the phenotypic information jointly to predict the net genetic merit. The main difference between the CLGSI and the LGSI is that the CLGSI can only be used in training populations, whereas the LGSI is used in testing populations. The basic conditions for constructing a valid CLGSI include conditions for constructing the LPSI, the LMSI, and the LGSI, because the CLGSI uses GEBVs and phenotypic information jointly to predict the net genetic merit.
5.2.1 The CLGSI Parameters
The net genetic merit can be written in a similar manner to that in the LMSI context, that is, as
where \( {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em \dots \kern0.5em {g}_t\right] \) is the vector of breeding values, \( {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em \cdots \kern0.5em {w}_t\right] \) is the vector of economic weights associated with breeding values, \( {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] \) is a null vector associated with the vector of genomic breeding values \( {\boldsymbol{\upgamma}}^{\prime }=\left[{\gamma}_1\kern0.5em {\gamma}_2\kern0.5em \dots \kern0.5em {\gamma}_t\right] \), \( {\mathbf{a}}_G^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right] \) and \( {\mathbf{z}}_G=\left[{\mathbf{g}}^{\prime}\kern0.5em {\boldsymbol{\upgamma}}^{\prime}\right] \).
The CLGSI can be written as
where \( {\mathbf{y}}^{\prime }=\left[{y}_1\kern0.5em \cdots \kern0.5em {y}_t\right] \) (t = number of traits) is the vector of phenotypic values; γ was defined earlier; \( {\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime } \) and β_{G} are vectors of coefficients of phenotypic and genomic weight values respectively; \( {\boldsymbol{\upbeta}}_C^{\prime }=\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right] \) and \( {\mathbf{t}}_G^{\prime }=\left[{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upgamma}}^{\prime}\right] \).
The CLGSI selection response can be written as
where k_{I} is the standardized selection differential of the CLGSI, \( {\sigma}_H^2={\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\mathbf{a}}_C \) and \( Var\left({I}_C\right)={\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C \) are the variances of H and I_{C}, whereas \( {\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C \) and \( {\rho}_{HI_C} \) are the covariance and the correlation between H and I_{C} respectively; \( {\mathbf{T}}_C= Var\left[\begin{array}{c}\mathbf{y}\\ {}\boldsymbol{\upgamma} \end{array}\right]=\left[\begin{array}{cc}\mathbf{P}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] \) and \( {\boldsymbol{\Psi}}_C= Var\left[\begin{array}{c}\mathbf{g}\\ {}\boldsymbol{\upgamma} \end{array}\right]=\left[\begin{array}{cc}\mathbf{C}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] \) are block matrices of the phenotypic covariance matrix, P = Var(y), the genomic covariance matrix, Γ = Var(γ), and the genetic breeding values covariance matrix, C = Var(g).
Suppose that matrices Ψ_{C} and T_{C} are known; then the CLGSI vector of coefficients that simultaneously maximizes \( {\rho}_{HI_C} \) and R_{C} can be written as
whence the optimized CLGSI is
Equations (5.31) and (5.32) indicate that the CLGSI is an application of the LPSI to the genomic selection context.
From Eq. (5.31), the maximized CLGSI selection response, expected genetic gain per trait and accuracy can be written as
and
respectively. Note that the maximized LPSI accuracy is \( {\rho}_{HI}=\frac{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}} \) (see Chap. 2). The denominator of the accuracy of the CLGSI and \( {\rho}_{HI}=\frac{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}} \) is the same; however, the numerator of the two indices accuracy is different. We would expect that \( \sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}\ge \sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} \), and then \( {\rho}_{HI_C}\ge {\rho}_{HI} \). Similar results can be observed when we compared the maximized LPSI selection response and expected genetic gain per trait with the maximized CLGSI selection response and expected genetic gain per trait.
5.2.2 Relationship Between the CLGSI and the LGSI
As we have indicated, the CLGSI is mathematically equivalent to the LMSI; thus, it has similar statistical properties to those of the LMSI, some of which are described in this section. The rest can be seen in Chap. 4. Let \( {\mathbf{Q}}_C={\mathbf{T}}_C^{1}{\boldsymbol{\Psi}}_C \), then matrix Q_{C} can be written as
whence as \( {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] \), the two subvectors that conform vector β_{C} = Q_{C}a_{C} or \( {\boldsymbol{\upbeta}}_C^{\prime }=\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right] \) can be written as
and
When Γ is equal to the null matrix (no genomic information), Eq. (5.37) is equal to β_{y} = P^{−1}Cw = b and \( {R}_C={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={R}_I \), which are the LPSI vector of coefficients and the selection response.
By Eqs. (5.37) and (5.38), the maximized CLGSI selection response and the optimized CLGSI can be written as
and
respectively.
Assume that when the number of markers and genotypes increases, matrix Γ tends to matrix C and that, at the limit, Γ = C; then, Eq. (5.39) can be written as \( {R}_C={k}_I\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}}={R}_G \) (except by L_{G}); in addition, β_{y} = 0 and β_{G} = w, the weights of the LGSI, and, in this latter case, the CLGSI is equal to the LGSI, as we would expect. Thus, in the asymptotic context, the LGSI and the CLGSI are the same.
An additional interesting result of the relationship between the CLGSI and the LGSI is as follows. The maximized correlation between H and I_{C} (or CLGSI accuracy) can be written as
However, when Γ = C, \( {\boldsymbol{\Psi}}_C=\left[\begin{array}{cc}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] \), β_{y} = 0, β_{G} = w, and \( {\boldsymbol{\upbeta}}_C^{\prime }=\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right]=\left[\begin{array}{cc}\mathbf{0}& {\mathbf{w}}^{\prime}\end{array}\right] \), whence \( {\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C={\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\mathbf{a}}_C={\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C={\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w} \), and Eq. (5.41) is equal to 1. That is, the maximum correlation between H and I_{C} in the asymptotic context is equal to the maximum correlation between H and the LGSI, and that value will be equal to 1.
The asymptotic relationship between the CLGSI expected genetic gain per trait, E_{C} (Eq. 5.34), and the LGSI expected genetic gain per trait, \( {\mathbf{E}}_{I_G} \) (Eq. 5.16), is as follows. When Γ = C, \( {\boldsymbol{\Psi}}_C=\left[\begin{array}{cc}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] \) and \( {\boldsymbol{\upbeta}}_C^{\prime }=\left[\mathbf{0}\kern0.5em {\mathbf{w}}^{\prime}\right] \), whence
This means that in the asymptotic context, the CLGSI expected genetic gain per trait is twice the LGSI expected genetic gain per trait. Of course, 2 is only a proportionality constant; thus, in reality, \( {\mathbf{E}}_C={\mathbf{E}}_{I_G} \).
5.2.3 Statistical Properties of the CLGSI
Assume that H and I_{C} have bivariate joint normal distribution; P, C, Γ, and w are known, and \( {\boldsymbol{\upbeta}}_C={\mathbf{T}}_C^{1}{\boldsymbol{\Psi}}_C{\mathbf{a}}_C \); then, the CLGSI properties are as follow:

1.
\( {\sigma}_{I_C}^2={\sigma}_{HI_C} \), i.e., the variance of I_{C} (\( {\sigma}_{I_C}^2 \)) and the covariance between H and I_{C} (\( {\sigma}_{HI_C} \)) are the same.

2.
The maximized correlation between H and I_{C} is \( {\rho}_{HI_C}=\frac{\sigma_{I_C}}{\sigma_H} \), where \( {\sigma}_{I_C} \) is the standard deviation of the variance of I_{C} (\( {\sigma}_{I_C}^2 \)) and σ_{H} is the standard deviation of the variance of H(\( {\sigma}_H^2 \)).

3.
The variance of the predicted error, \( Var\left(H{I}_C\right)=\left(1{\rho}_{HI_C}^2\right){\sigma}_H^2 \), is minimal.

4.
The total variance of H explained by I_{C} is \( {\sigma}_{I_C}^2={\rho}_{HI_C}^2{\sigma}_H^2 \).
Note that CLGSI properties 1 to 4 are the same as LMSI properties 1 to 4 and that both indices jointly incorporate phenotypic and marker information to predict the net genetic merit; however, the LMSI incorporates the marker information by the marker score values, whereas the CLGSI uses the GEBVs.
5.2.4 Estimating the CLGSI Parameters
Using the real maize (Zea mays) F_{2} population with 248 genotypes (each with two repetitions), 233 molecular markers and three traits—GY (ton ha^{−1}), EHT (cm), and PHT (cm)—described in Sect. 5.1.8 of this chapter, we estimated matrices P and C using Eqs. (2.22) to (2.24) described in Chap. 2 of this book. The estimated matrices were \( \widehat{\mathbf{P}}=\left[\begin{array}{ccc}0.45& 1.33& 2.33\\ {}1.33& 65.07& 83.71\\ {}2.33& 83.71& 165.99\end{array}\right] \) and \( \widehat{\mathbf{C}}=\left[\begin{array}{ccc}0.07& 0.61& 1.06\\ {}0.61& 17.93& 22.75\\ {}1.06& 22.75& 44.53\end{array}\right] \).
In a similar manner, we estimated matrix Γ using Eqs. (5.21) to (5.23). The estimated matrix was \( \widehat{\boldsymbol{\Gamma}}=\left[\begin{array}{ccc}0.07& 0.65& 1.05\\ {}0.65& 10.62& 14.25\\ {}1.05& 14.25& 26.37\end{array}\right] \). Note that matrices \( \widehat{\mathbf{C}} \) and \( \widehat{\boldsymbol{\Gamma}} \) have similar values. This means that, in the asymptotic context, we can assume that matrix Γ tends to matrix C.
To estimate the CLMSI and its associated parameters (selection response, expected genetic gain per trait, etc.), we need to estimate the vector of coefficients \( {\boldsymbol{\upbeta}}_C={\mathbf{T}}_C^{1}{\boldsymbol{\Psi}}_C{\mathbf{a}}_C \) as \( {\widehat{\boldsymbol{\upbeta}}}_C={\widehat{\mathbf{T}}}_C^{1}{\widehat{\boldsymbol{\Psi}}}_C{\mathbf{a}}_C \), where \( {\widehat{\mathbf{T}}}_C=\left[\begin{array}{cc}\widehat{\mathbf{P}}& \widehat{\boldsymbol{\Gamma}}\\ {}\widehat{\boldsymbol{\Gamma}}& \widehat{\boldsymbol{\Gamma}}\end{array}\right] \) and \( {\widehat{\boldsymbol{\Psi}}}_C=\left[\begin{array}{cc}\widehat{\mathbf{C}}& \widehat{\boldsymbol{\Gamma}}\\ {}\widehat{\boldsymbol{\Gamma}}& \widehat{\boldsymbol{\Gamma}}\end{array}\right] \) are estimates of matrices \( {\mathbf{T}}_C=\left[\begin{array}{cc}\mathbf{P}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] \) and \( {\boldsymbol{\Psi}}_C=\left[\begin{array}{cc}\mathbf{C}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] \) respectively. The estimated CLGSI vector of coefficients \( {\widehat{\boldsymbol{\upbeta}}}_C={\widehat{\mathbf{T}}}_C^{1}{\widehat{\boldsymbol{\Psi}}}_C{\mathbf{a}}_C \) is conformed by the vector of phenotypic weights, \( {\widehat{\boldsymbol{\upbeta}}}_{\mathbf{y}}={\left(\widehat{\mathbf{P}}\widehat{\boldsymbol{\Gamma}}\right)}^{1}\left(\widehat{\mathbf{C}}\widehat{\boldsymbol{\Gamma}}\right)\mathbf{w} \), and by the vector of genomic weights, \( {\widehat{\boldsymbol{\upbeta}}}_G=\left[\mathbf{I}{\left(\widehat{\mathbf{P}}\widehat{\boldsymbol{\Gamma}}\right)}^{1}\left(\widehat{\mathbf{C}}\widehat{\boldsymbol{\Gamma}}\right)\right]\mathbf{w} \).
Let \( {\mathbf{w}}^{\prime }=\left[5\kern0.5em 0.1\kern0.5em 0.1\right] \) be the vector of economic weights; then, according to the estimated matrices \( \widehat{\mathbf{P}} \), \( \widehat{\mathbf{C}} \), and \( \widehat{\boldsymbol{\Gamma}} \), \( {\widehat{\boldsymbol{\upbeta}}}_{\mathbf{y}}^{\prime }=\left[0.08\kern0.5em 0.02\kern0.5em 0.01\right] \) and \( {\widehat{\boldsymbol{\upbeta}}}_G^{\prime }=\left[4.92\kern0.5em 0.08\kern0.5em 0.09\right] \), whence the estimated CLGSI in the training population can be written as
Suppose a selection intensity of 10% (k_{I} = 1.755); then, the estimated CLGSI selection response and expected genetic gain per trait were \( {\widehat{R}}_C={k}_I\sqrt{{\widehat{{\boldsymbol{\upbeta}}^{\prime}}}_C{\widehat{\mathbf{T}}}_C{\widehat{\boldsymbol{\upbeta}}}_C}=1.54 \) and \( {\widehat{\mathbf{E}}}_C^{\prime }={k}_I\frac{{\widehat{\boldsymbol{\upbeta}}}_C^{\prime }{\widehat{\boldsymbol{\Psi}}}_C}{\sqrt{{\widehat{{\boldsymbol{\upbeta}}^{\prime}}}_C{\widehat{\mathbf{T}}}_C{\widehat{\boldsymbol{\upbeta}}}_C}}=\left[0.36\kern0.5em 1.04\kern0.5em 1.70\kern0.5em 0.36\kern0.5em 1.53\kern0.5em 2.38\right] \) respectively, whereas the estimated CLGSI accuracy was \( {\widehat{\rho}}_{HI_C}=\frac{{\widehat{\sigma}}_{I_C}}{{\widehat{\sigma}}_H}=0.814 \).
The estimated LPSI selection response, expected genetic gain per trait, and accuracy were 0.601, \( \left[0.09\kern0.5em 0.81\kern0.5em 0.89\right] \), and 0.32 respectively; thus, the CLGSI was more efficient to predict the net genetic merit than the LPSI because the CLGSI accuracy and selection response were 0.814 and 1.54 respectively.
5.2.5 LGSI and CLGSI Efficiency Vs LMSI, GWLMSI and LPSI Efficiency
In this subsection, we compare the accuracy, selection response, and efficiency of the LGSI and CLGSI with the LMSI, the GWLMSI, and the LPSI using the simulated data for a maize (Zea mays) population described in Chap. 2, Sect. 2.8.1.
Figure 5.3 presents the estimated accuracy values of the LMSI, the LGSI, the CLGSI, the LPSI, and the GWLMSI for five simulated selection cycles. According to these results, for the first three selection cycles, the estimated accuracies of the indices, in decreasing order, were LMSI > LGSI > CLGSI > LPSI > GWLMSI. That is, the highest estimated accuracy was obtained with the LMSI, whereas the lowest was obtained with the GWLMSI. For the fourth and fifth selection cycles, the estimated accuracies, in decreasing order, were LMSI > LPSI > CLGSI > LGSI > GWLMSI. This means that in all five selection cycles, the LMSI had the highest accuracy and the GWLMSI had the lowest accuracy, whereas the estimated LGSI accuracy was reduced to fourth place. Thus, the accuracy of the LGSI tended to decrease after the first three selection cycles whereas LPSI accuracy was a constant.
To compare LGSI efficiency versus the efficiency of the other selection indices, we assumed that the interval between selection cycles in the LGSI is 1.5 years, whereas for CLGSI, LMSI, GWLMSI, and LPSI, the interval was 4.0 years. Table 5.3 presents the estimated selection response of the LPSI, the LMSI, the GWLMSI, the LGSI, and the CLGSI, including and not including the interval between selection cycles (first and second parts of Table 5.3 respectively), obtained using five simulated selection cycles. According to the first part of Table 5.3, the average estimated selection responses, in decreasing order, of the LMSI, CLGSI, LPSI, GWLMSI, and LGSI for the five simulated selection cycles were 17.82, 15.30, 15.22, 13.24, and 13.11 respectively, when the length of the interval between selection was not included. If the length of the interval between selection cycles is included when comparing the selection response of the indices in terms of time, the estimated selection response of LMSI, CLGSI, LPSI, GWLMSI must be divided by 4 in each selection cycle, and the estimated LGSI selection response should be divided by 1.5. Thus, according to the second part of Table 5.3, if we include the length of the interval between selection cycles, the average estimated selection responses, in decreasing order, of LGSI, LMSI, CLGSI, LPSI, and GWLMSI for the five simulated selection cycles were 8.74, 4.46, 3.83, 3.80, and 3.31. This means that in terms of time, the efficiency of the LGSI was higher than the efficiency of the other four selection indices.
Table 5.4 presents the estimated accuracy of the LMSI, LGSI, CLGSI, LPSI, and the GWLMSI. In addition, Table 5.4 presents the efficiency when predicting the net genetic merit of the LMSI with respect to the LGSI, CLGSI, LPSI, and GWLMSI as percentages, for five simulated selection cycles. Note that in this case, LMSI efficiency was higher than the efficiency of the other four selection indices, because the LMSI had the highest correlation with the net genetic merit.
References
Beyene Y, Semagn K, Mugo S, Tarekegne A, Babu R et al (2015) Genetic gains in grain yield through genomic selection in eight biparental maize populations under drought stress. Crop Sci 55:154–163
CeronRojas JJ, Crossa J, Arief VN, Basford K, Rutkoski J, Jarquín D, Alvarado G, Beyene Y, Semagn K, DeLacy I (2015) A genomic selection index applied to simulated and real data. G3 (Bethesda) 5:2155–2164
CerónRojas JJ, SahagúnCastellanos J (2016) Multivariate empirical Bayes to predict plant breeding values. Agrociencia 50:633–648
Dekkers JCM (2007) Prediction of response to markerassisted and genomic selection using selection index theory. J Anim Breed Genet 124:331–341
Gianola D, PerezEnciso M, Toro MA (2003) On markerassisted prediction of genetic value: beyond the ridge. Genetics 163:347–365
Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genomeassisted breeding values. Genetics 177:2389–2397
Heffner EL, Sorrells ME, Jannink JL (2009) Genomic selection for crop improvement. Crop Sci 49(1):12
Lin CY, Allaire FR (1977) Heritability of a linear combination of traits. Theor Appl Genet 51:1–3
Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T et al (2011) Genomic selection in plant breeding: knowledge and prospects. Adv Agron 110:77–123
Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, ME
Mrode RA (2005) Linear models for the prediction of animal breeding values, 2nd edn. CABI Publishing, Cambridge, MA
Nordskog AW (1978) Some statistical properties of an index of multiple traits. Theor Appl Genet 52:91–94
Schott JR (2005) Matrix analysis for statistics, 2nd edn. Wiley, Hoboken, NJ
Searle S, Casella G, McCulloch CE (2006) Variance components. Wiley, Hoboken, NJ
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS (2012) Estimating additive and nonadditive genetic variances and predicting genetic merits using genomewide dense single nucleotide polymorphism markers. PLoS One 7(9):e45293
Technow F, Bürger A, Melchinger AE (2013) Genomic prediction of northern corn leaf blight resistance in maize with combined or separate training sets for heterotic groups. G3 (Bethesda) 3:197–203
Togashi K, Lin CY, Yamazaki T (2011) The efficiency of genomewide selection for genetic improvement of net merit. J Anim Sci 89:2972–2980
Van Raden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Vattikuti S, Guo J, Chow CC (2012) Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet 8(3):e1002637
Author information
Authors and Affiliations
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2018 The Author(s)
About this chapter
Cite this chapter
CéronRojas, J.J., Crossa, J. (2018). Linear Genomic Selection Indices. In: Linear Selection Indices in Modern Plant Breeding. Springer, Cham. https://doi.org/10.1007/9783319912233_5
Download citation
DOI: https://doi.org/10.1007/9783319912233_5
Published:
Publisher Name: Springer, Cham
Print ISBN: 9783319912226
Online ISBN: 9783319912233
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)