Linear Genomic Selection Indices

Céron-Rojas, J. Jesus; Crossa, José

doi:10.1007/978-3-319-91223-3_5

J. Jesus Céron-Rojas⁴ &
José Crossa⁴

5088 Accesses
1 Citations

Abstract

The linear genomic selection index (LGSI) is a linear combination of genomic estimated breeding values (GEBVs) used to predict the individual net genetic merit and select individual candidates from a nonphenotyped testing population as parents of the next selection cycle. In the LGSI, phenotypic and marker data from the training population are fitted into a statistical model to estimate all individual available genome marker effects; these estimates can then be used in subsequent selection cycles to obtain GEBVs that are predictors of breeding values in a testing population for which there is only marker information. The GEBVs are obtained by multiplying the estimated marker effects in the training population by the coded marker values obtained in the testing population in each selection cycle. Applying the LGSI in plant or animal breeding requires the candidates to be genotyped for selection to obtain the GEBV, and predicting and ranking the net genetic merit of the candidates for selection using the LGSI. We describe the LGSI and show that it is a direct application of the linear phenotypic selection index theory in the genomic selection context; next, we present the combined LGSI (CLGSI), which uses phenotypic and GEBV information jointly to predict the net genetic merit. The CLGSI can be used only in training populations when there are phenotypic and maker information, whereas the LGSI is used in testing populations where there is only marker information. We validate the theoretical results of the LGSI and CLGSI using real and simulated data.

You have full access to this open access chapter, Download chapter PDF

An analytical framework to derive the expected precision of genomic selection

Article Open access 27 December 2017

Overview of Genomic Prediction Methods and the Associated Assumptions on the Variance of Marker Effect, and on the Architecture of the Target Trait

Genomic Selection

5.1 The Linear Genomic Selection Index

5.1.1 Basic Conditions for Constructing the LGSI

Conditions described in Chap. 4 (Sect. 4.1.1) for constructing a valid linear molecular selection index (LMSI), are also necessary for the linear genomic selection index (LGSI); however, in addition to those conditions, the LGSI also requires:

1.
All marker effects to be estimated simultaneously in the training population.
2.
The estimated marker effects to be used in subsequent selection cycles to obtain GEBVs that are predictors of the individual breeding values in the testing population (candidates for selection) for which there is only marker information.
3.
The GEBV values to be composed entirely of the additive genetic effects.
4.
Phenotypes to be used to estimate all marker effects in the training population, not to make selections in the testing population (Heffner et al. 2009; Lorenz et al. 2011).

5.1.2 Genomic Breeding Values and Marker Effects

The breeding value (g_i) is the average additive effects of the genes an individual receives from both parents; thus, it is a function of the genes transmitted from parents to progeny and is the only component that can be selected and, therefore, the main component of interest in breeding programs (Mrode 2005). The ith phenotypic value (y_i) can be denoted as y_i = g_i + e_i, where g_i is the breeding value and e_i the residual. Basic assumptions for g_i and e_i are: both g_i and e_i have normal distribution with expectation equal to zero and variance $ {\sigma}_{g_i}^2 $ and $ {\sigma}_{e_i}^2 $ respectively. This means that y_i = μ_i + g_i + e_i is a linear mixed model (Mrode 2005; Searle et al. 2006), where μ_i is the mean of y_i.

Let $ {\mathbf{y}}_i^{\prime }=\left[{y}_{i1}\kern0.5em {y}_{i2}\kern0.5em \cdots \kern0.5em {y}_{in}\right] $ be a vector 1 × n of observations in the ith trait and let $ {\mathbf{g}}_i^{\prime }=\left[{g}_{i1}\kern0.5em {g}_{i2}\kern0.5em \cdots \kern0.5em {g}_{in}\right] $ be a vector 1 × n of unobservable breeding values associated with y_i; then y_i can be written as

$$ {\mathbf{y}}_i=\mathbf{1}{\mu}_i+{\mathbf{Zg}}_i+{\mathbf{e}}_i, $$

(5.1)

where μ_i is the mean of the ith trait, 1 is a vector n × 1 of 1s, Z is a design matrix of 0s and 1s, g_i ~ MVN (0, $ \mathbf{A}{\sigma}_{g_i}^2 $) is a vector of breeding values, and e_i ~ MVN (0, $ {\mathbf{I}}_n{\sigma}_{e_i}^2 $) is a vector of residuals; 0 is the mean and $ \mathbf{A}{\sigma}_{g_i}^2 $ and $ {\mathbf{I}}_n{\sigma}_{e_i}^2 $ the covariance matrix of g_i and e_i respectively; A is the numerical relationship matrix (Mrode 2005) and I_n an identity matrix n × n; $ {\sigma}_{g_i}^2 $ and $ {\sigma}_{e_i}^2 $ are the additive and residual variances associated with g_i and e_i; and MVN stands for multivariate normal distribution.

Suppose that A, Z, μ_i, $ {\sigma}_{g_i}^2 $, and $ {\sigma}_{e_i}^2 $ are known; then, according to Mrode (2005), the best linear unbiased predictor (BLUP) of g_i can be written as

$$ {\widehat{\mathbf{g}}}_i={\sigma}_{g_i}^2\mathbf{A}{\mathbf{Z}}^{\prime }{\mathbf{V}}^{-1}\left({\mathbf{y}}_i-\mathbf{1}{\mu}_i\right), $$

(5.2)

where V⁻¹ is the inverse matrix of the variance of y_i, i.e., $ Var\left({\mathbf{y}}_i\right)={\sigma}_{g_i}^2\mathbf{ZA}{\mathbf{Z}}^{\prime }+{\mathbf{I}}_n{\sigma}_{e_i}^2=\mathbf{V} $. In the context of animal breeding, Eq. (5.2) is considered a univariate linear phenotypic selection index (LPSI) (Mrode 2005) and is used to rank and select individuals as parents of the next generation in the context of one trait. Equation (5.2) can be extended to the multi-trait phenotypic selection index case, but to predict the net genetic merit (H = w′g, see Chap. 2 for details) it would be necessary to construct linear combinations of the predicted values of g_i associated with the traits of interest as was described in the Foreword of this book.

The vector of the individual genomic breeding values (γ_i) associated with the ith characteristic (i = 1, 2,…,t; t = number of traits) of the candidates for selection can be written as

$$ {\boldsymbol{\upgamma}}_i={\mathbf{Xu}}_i, $$

(5.3)

where X is an n × m matrix (n = number of observations and m = number of markers in the population) of coded marker values (2 − 2p, 1 − 2p, and −2p for genotypes AA, Aa, and aa respectively) associated with the additive effects of the quantitative trait loci (QTL) and u_i is an m × 1 vector of the additive effects of the QTL associated with markers that affect the ith trait. It is assumed that γ_i has MVN with mean 0 and variance $ \mathbf{G}{\sigma}_{\gamma}^2 $, i.e., γ_i ~ MVN (0, $ \mathbf{G}{\sigma}_{\upgamma_{\mathrm{i}}}^2 $), where $ {\sigma}_{\upgamma_{\mathrm{i}}}^2 $ is the additive genomic variance of γ_i and G = XX′/c is the n × n additive genomic relationship matrix between genotypes; $ c=\sum \limits_{j=1}^m2{p}_j\left(1-{p}_j\right) $ in an F₂ population, and $ c=\sum \limits_{j=1}^m4{p}_j\left(1-{p}_j\right) $ in a double haploid population; p is the frequency of allele A and 1 − p is the frequency of allele a in the jth marker (j = 1, 2, …, m).

The additive genomic relationship matrix G = XX′/c has special properties. For example, in the asymptotic context, the expectation of matrix G is equal to the numerical relationship matrix A, i.e., E(G) = A (Habier et al. 2007; Van Raden 2008); this means that G is a particular realization of A and when the number of markers and genotypes increases in the training population, the value of G tends to concentrate around A. Thus, it can be assumed that at the limit, when the number of markers and genotypes is very high, G = A (Cerón-Rojas and Sahagún-Castellanos 2016).

The vector of genomic breeding values (Eq. 5.3) has a similar function in genomic selection as g_i in the phenotypic selection context. In addition, g_i can be written as g_i = γ_i + η_i, where η_i = g_i − γ_i (Gianola et al. 2003). Also, note that

$$ Cov\left({\mathbf{g}}_i,{\boldsymbol{\upgamma}}_i\right)={\sigma}_{\upgamma_{\mathrm{i}}}^2, $$

(5.4)

i.e., the covariance between γ_i and g_i is equal to the variance of γ_i (Dekkers 2007).

Let $ {\mathbf{y}}_i^{\prime }=\left[{y}_{i1}\kern0.5em {y}_{i2}\kern0.5em \cdots \kern0.5em {y}_{in}\right] $ be a vector 1 × n of observation of the ith trait in the training population and let $ {\boldsymbol{\upgamma}}_i^{\prime }=\left[{\gamma}_{i1}\kern0.5em {\gamma}_{i2}\kern0.5em \cdots \kern0.5em {\gamma}_{in}\right] $ be a vector 1 × n of unobservable genomic breeding values associated with y_i; then, y_i can also be written as

$$ {\mathbf{y}}_i=\mathbf{1}{\mu}_i+{\mathbf{Z}\boldsymbol{\upgamma}}_i+{\boldsymbol{\upvarepsilon}}_i, $$

(5.5)

where μ_i is the mean of the ith trait, 1 is a vector n × 1 of 1s, Z is a design matrix, γ_i ~ MVN (0, $ \mathbf{G}{\sigma}_{\gamma}^2 $) and ε_i ~ MVN (0, $ {\mathbf{I}}_n{\sigma}_{\varepsilon_i}^2 $) are vectors of genomic breeding values and of residuals respectively, and $ {\sigma}_{\varepsilon_i}^2 $ is the residual variance. I_n, G, and $ {\sigma}_{\upgamma}^2 $ were defined in Eqs. (5.2) and (5.3).

According to Eqs. (5.2) and (5.3), when μ_i, $ {\sigma}_{\upgamma}^2 $ and $ {\sigma}_{\varepsilon_i}^2 $ are known, the vector of GEBVs for the individuals with the ith trait can be obtained as

$$ {\widehat{\boldsymbol{\upgamma}}}_i={\sigma}_{\gamma_i}^2\mathbf{G}{\mathbf{Z}}^{\prime }{\mathbf{V}}^{-1}\left({\mathbf{y}}_i-\mathbf{1}{\mu}_i\right), $$

(5.6)

where the variance of y_i should now be written as $ \mathbf{V}={\sigma}_{\gamma_i}^2\mathbf{ZG}{\mathbf{Z}}^{\prime }+{\mathbf{I}}_n{\sigma}_{\varepsilon_i}^2 $. In the context of genomic selection, Eq. (5.6) is considered a univariate LGSI and is used to rank and select individuals as parents of the next generation (Van Raden 2008; Togashi et al. 2011). Equation (5.6) is the BLUP of γ_i and can be extended to a multi-trait genomic selection index, but to predict the net genetic merit (H = w′g), it is necessary to construct an LGSI, which is a linear combination of γ_i.

Although Eq. (5.6) is theoretically very important in LGSI, in practice we need to estimate the marker effects associated with all the traits of interest and to use these estimates in the testing population to obtain the GEBV of the candidates for selection. Let $ {\mathbf{u}}^{\prime }=\left[{\mathbf{u}}_1^{\prime}\kern0.5em {\mathbf{u}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{u}}_t^{\prime}\right] $ be a vector 1 × nt associated with t traits. In the univariate context, Van Raden (2008) showed that the ith vector u_i of marker effects in the training population can be estimated as

$$ {\widehat{\mathbf{u}}}_i={c}^{-1}{\mathbf{X}}^{\prime }{\left[\mathbf{G}+\upsilon {\mathbf{I}}_n\right]}^{-1}\left({\mathbf{y}}_i-\mathbf{1}{\mu}_i\right), $$

(5.7)

where $ \upsilon =\frac{\sigma_{e_i}^2}{\sigma_{g_i}^2} $; $ {\sigma}_{g_i}^2 $, $ {\sigma}_{e_i}^2 $ and the other parameters were defined earlier. According to Ceron-Rojas et al. (2015), to estimate the vector $ {\mathbf{u}}^{\prime }=\left[{\mathbf{u}}_1^{\prime}\kern0.5em {\mathbf{u}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{u}}_t^{\prime}\right] $ in the multi-trait context, Eq. (5.7) can be written as

$$ \widehat{\mathbf{u}}={c}^{-1}{\mathbf{W}}_t^{\prime }{\left[\left({\mathbf{I}}_t\otimes \mathbf{G}\right)+\left(\mathbf{N}\otimes {\mathbf{I}}_n\right)\right]}^{-1}\left(\mathbf{y}-\boldsymbol{\upmu} \otimes \mathbf{1}\right), $$

(5.8)

where W_t = I_t ⊗ X, “⊗” denotes the Kronecker product (Schott 2005), c and X were defined in Eq. (5.3); N = RC⁻¹, whereR and C are the residual and breeding value covariance matrices for t traits respectively; $ {\mathbf{y}}^{\prime }=\left[{\mathbf{y}}_1^{\prime}\kern0.5em {\mathbf{y}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{y}}_t^{\prime}\right] $ ~ MVN(μ, V) is a vector of size 1 × tn, with covariance matrix V = C ⊗ G + R ⊗ I_n; I_t is an identity matrix of size t × t and I_n was defined earlier; $ {\boldsymbol{\upmu}}^{\prime }=\left[{\mu}_1\kern0.5em {\mu}_2\kern0.5em \cdots \kern0.5em {\mu}_t\right] $ is a vector 1 × t of means associated with vector y, and 1 is a vector n × 1 of 1s. In this case, the estimator of the vector of sub-vectors of genomic breeding values $ {\boldsymbol{\upgamma}}^{\prime }=\left[{\boldsymbol{\upgamma}}_1\kern0.5em {\boldsymbol{\upgamma}}_2\kern0.5em \dots \kern0.5em {\boldsymbol{\upgamma}}_t\right] $ in the testing population can be obtained as

$$ \widehat{\boldsymbol{\upgamma}}={\mathbf{W}}_t\widehat{\mathbf{u}}. $$

(5.9)

Equation (5.9) is the vector of GEBVs for the multi-trait case. Thus, in the testing population, in Eq. (5.9), only the coded values in matrix X change, whereas $ \widehat{\mathbf{u}} $ is the same in each selection cycle. Note that to obtain Eqs. (5.7) and (5.8), we assumed that μ, C, and R are known.

We indicated that the genomic breeding values have normal distribution (Eq. 5.5). Using the simulated data described in Chap. 2, Sect. 2.8.1, in Fig. 5.1 we present the distribution of the GEBVs (Eq. 5.9) associated with traits T1 in the first (Fig. 5.1a) and the fifth (Fig. 5.1b) selection cycles in the testing population. In effect, the frequency distribution of the GEBVs approaches normal distribution in both selection cycles.

5.1.3 The LGSI and Its Parameters

Similar to the LPSI (Chap. 2), the objective of the LGSI is to predict the net genetic merit H = w′g, where $ {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em {g}_2\kern0.5em \dots \kern0.5em {g}_t\right] $ (t = number of traits) is a vector of unobservable true breeding values and $ {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em {w}_2\kern0.5em \dots \kern0.5em {w}_t\right] $ is a vector of economic weights. Suppose that the genomic breeding values γ_i = Xu_i are known; then, the LGSI can be written as

$$ {I}_{\mathrm{G}}={\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\upgamma}, $$

(5.10)

where β is an unknown vector of weights.

The main advantage of the LGSI over the LPSI lies in the possibility of reducing the intervals between selection cycles (L_G) by more than two thirds (Lorenz et al. 2011); thus, this parameter should be incorporated into the LGSI selection response and the expected genetic gain per trait to reflect the main advantage of the LGSI over the LPSI and the other indices. Assuming that L_G = 1, in the LPSI context we wrote the selection response as R_I = k_Iσ_Hρ_HI; however, if L_G ≠ 1, the LGSI selection response can be written as

$$ {R}_{I_G}=\frac{k_I}{L_G}\frac{\sigma_{HI_G}}{\sigma_{I_G}^2}=\frac{k_I}{L_G}{\sigma}_H{\rho}_{HI_G}, $$

(5.11)

where k_I is the standardized selection differential (or selection intensity) associated with the LGSI, $ {\sigma}_{HI_G} $ is the covariance between H = w′g and the LGSI, $ {\sigma}_{I_G}^2 $ is the variance of the LGSI, σ_H is the standard deviation of H, $ {\rho}_{HI_G} $ is the correlation between H and the LGSI, and L_G denotes the intervals between selection cycles.

Let C and Γ be matrices of covariance of the breeding values (g) and of the genomic breeding values (γ) respectively; then, the correlation between H = w′g and I_G = β′γ can be written as

$$ {\rho}_{HI_G}=\frac{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}\sqrt{{\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}}}, $$

(5.12)

where $ {\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta} ={\sigma}_{HI_G} $ is the covariance between H = w′g and I_G = β′γ, $ {\sigma}_H=\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}} $ is the standard deviation of the variance of H = w′g, and $ {\sigma}_{I_G}=\sqrt{{\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}} $ is the standard deviation of the variance of I_G = β′γ.

5.1.4 Maximizing LGSI Parameters

To maximize the genomic selection response (Eq. 5.11), suppose that k_I, σ_H and L_G are fixed and take the derivative of the natural logarithm (ln) of the correlation between H and I_G (Eq. 5.12) with respect to vector β, equate the result of the derivative to the null vector, and isolate β, i.e.,

$$ \frac{\partial }{\partial \boldsymbol{\upbeta}}\ln {\rho}_{HI_g}=\frac{\partial }{\partial \boldsymbol{\upbeta}}\ln \left(\frac{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}\sqrt{{\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}}}\right)=\mathbf{0}. $$

(5.13)

The result is β = sw, where s = β′Γβ/w′Γβ is a proportional constant that does not affect the maximum value of $ {\rho}_{HI_G} $, because this is invariant to the scale change; then, assuming that β = w, the maximized LGSI selection response can be written as

$$ {R}_{I_G}=\frac{k_I}{L_G}\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}}. $$

(5.14)

Hereafter, we refer to the LGSI genomic selection response as that of Eq. (5.14). Also, because β = w, Eq. (5.12) can be written as

$$ {\rho}_{HI_G}=\frac{\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}}=\frac{\sigma_{I_G}}{\sigma_H}, $$

(5.15)

which is the maximized correlation between H = w′g and I_G = β′γ, or LGSI accuracy; $ {\sigma}_H=\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}} $ is the standard deviation of the variance of H, and $ {\sigma}_{I_G}=\sqrt{{\boldsymbol{\upbeta}}^{\prime}\boldsymbol{\Gamma} \boldsymbol{\upbeta}} $ is the standard deviation of the variance of I_G.

The LGSI expected genetic gain per trait ($ {\mathbf{E}}_{I_G} $) can be written as

$$ {\mathbf{E}}_{I_G}=\frac{k_I}{L_G}\frac{\boldsymbol{\Gamma} \mathbf{w}}{\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}}}. $$

(5.16)

All the terms in Eq. (5.16) were previously defined.

Let $ {\lambda}_G=\frac{\rho_{HI_G}}{\rho_{HI}} $ be LGSI efficiency versus LPSI efficiency to predict the net genetic merit, where $ {\rho}_{HI_G} $ is the LGSI accuracy and ρ_HI the LPSI accuracy; in percentage terms, LGSI efficiency versus LPSI efficiency for each selection cycle can be written as

$$ {p}_G=100\left({\lambda}_G-1\right). $$

(5.17)

According to Eq. (5.17), if p_G > 0, LGSI efficiency is greater than LPSI efficiency; if p_G = 0, the efficiency of both selection indices is equal, and if p_G < 0, the LPSI is more efficient than the LGSI at predicting H = w′g.

Equation (5.17) is useful for measuring LGSI efficiency in terms of accuracy when predicting the net genetic merit (H = w′g), whereas the Technow et al. (2013) inequality measures LGSI efficiency in terms of the time needed to complete one selection cycle. In the context of the LGSI and the LPSI, the Technow inequality can be written as

$$ {L}_G<\frac{\rho_{HI_G}}{h_I}{L}_P, $$

(5.18)

where L_G and L_P denote the time required to complete one selection cycle for the LGSI and the LPSI respectively, $ {\rho}_{HI_G} $ is the LGSI accuracy, and h_I is the square root of the heritability (Lin and Allaire 1977; Nordskog 1978) of the LPSI, which can be denoted as $ {h}_I=\sqrt{\frac{{\mathbf{b}}^{\prime}\mathbf{Cb}}{{\mathbf{b}}^{\prime}\mathbf{Pb}}} $ (see Chap. 2 for details). Then, assuming that the selection intensity is the same for both selection indices, if Eq. (5.18) is true, the LGSI is more efficient than the LPSI per unit of time.

5.1.5 Relationship Between the LGSI and LPSI Selection Responses

To obtain the relationship between $ {R}_{I_G} $ and R_I in the asymptotic context, we omitted the intervals between selection cycles (L_G and L_I respectively) to simplify the algebra. Consider a population where the number of genotypes and markers tends to infinity; in this case, markers explain most of the true additive genetic variances and covariances. Thus, we can assume that matrices Γ and C are very similar, and at the limit, Γ = C. Now suppose that in this population the phenotypic variance–covariance matrix (P) is known and comprises matrix Γ and the variance–covariance residual matrix (R). In this case, the inverse of P can be written as P⁻¹ = (Γ + R)⁻¹ = Γ⁻¹ − Γ⁻¹(Γ⁻¹ + R⁻¹)⁻¹Γ⁻¹, where Γ⁻¹ and R⁻¹ are the inverses of matrices Γ and R respectively. Thus, the LPSI selection response is given by

$$ {R}_I={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={k}_I\sqrt{{\mathbf{w}}^{\prime }{\boldsymbol{\Gamma} \mathbf{P}}^{-1}\boldsymbol{\Gamma} \mathbf{w}}={k}_I\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}-{\mathbf{w}}^{\prime }{\left({\boldsymbol{\Gamma}}^{-1}+{\mathbf{R}}^{-1}\right)}^{-1}\mathbf{w}}, $$

(5.19)

where b = P⁻¹Γw is the vector of coefficients of the LPSI in the asymptotic context. Note that b′Pb ≥ 0 and w′Γw ≥ 0, i.e., b′Pb and w′Γw are positive semi-definite, meaning that w′Γw ≥ w′(Γ⁻¹ + R⁻¹)⁻¹w ≥ 0; then, in the asymptotic context, $ {R}_{I_G}\ge {R}_I $. This result is not common when the number of genotypes and markers is small; however, it gives an idea of the theoretical behavior of $ {R}_{I_G} $ with respect to R_I when the number of markers and genotypes is very large.

Because g_q can be written as g_q = γ_q + η_q, where η_q = g_q − γ_q (q = 1, 2, ⋯, t), for low numbers of markers and genotypes, the covariance genotypic matrix C can be written as C = Γ + E, where E = C − Γ; then, the inverse of matrix P can be written as P⁻¹ = [(Γ + E) + R]⁻¹ = (Γ + E)⁻¹ − (Γ + E)⁻¹[(Γ + E)⁻¹ + R⁻¹]⁻¹(Γ + E)⁻¹. In the latter case, the LPSI selection response R_I can be written as

$$ {R}_I={k}_I\sqrt{{\mathbf{w}}^{\prime}\left(\boldsymbol{\Gamma} +\mathbf{E}\right){\mathbf{P}}^{-1}\left(\boldsymbol{\Gamma} +\mathbf{E}\right)\mathbf{w}}={k}_I\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}+{\mathbf{w}}^{\prime}\mathbf{Ew}-{\mathbf{w}}^{\prime }{\left[{\left(\boldsymbol{\Gamma} +\mathbf{E}\right)}^{-1}+{\mathbf{R}}^{-1}\right]}^{-1}\mathbf{w}}. $$

(5.20)

Equation (5.20) indicates that in the non-asymptotic context (low numbers of markers and genotypes), $ {R}_{I_G} $ and R_I are related in three possible ways:

1.
$ {R}_I>{R}_{I_G} $ if w′Ew > w′[(Γ + E)⁻¹ + R⁻¹]⁻¹w
2.
$ {R}_I={R}_{I_G} $ if w′Ew = w′[(Γ + E)⁻¹ + R⁻¹]⁻¹w
3.
$ {R}_{I_G}>{R}_I $ if w′Ew < w′[(Γ + E)⁻¹ + R⁻¹]⁻¹w

The second and third points indicate that $ {R}_{I_G} $ may be equal to or larger than R_I, even under a small number of markers, depending on the size of w′Ew and w′[(Γ + E)⁻¹ + R⁻¹]⁻¹w. These three points explain the theoretical relationship between R_I and $ {R}_{I_G} $ for a low number of markers and genotypes. When Γ = C, E = 0, and $ {R}_I={k}_I\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}-{\mathbf{w}}^{\prime }{\left({\boldsymbol{\Gamma}}^{-1}+{\mathbf{R}}^{-1}\right)}^{-1}\mathbf{w}} $, then $ {R}_{I_G}\ge {R}_I $.

5.1.6 Statistical LGSI Properties

Assuming that H and I_G have joint bivariate normal distribution and that Γ, C, and w are known, the LGSI has the following properties:

1.
The variance of I_G ($ {\sigma}_{I_G}^2 $) and the covariance between H and I_G ($ {\sigma}_{HI_G} $) are equal, i.e., $ {\sigma}_{I_G}^2={\sigma}_{HI_G} $.
2.
The maximized correlation between H and I_G (or LGSI accuracy) is equal to $ {\rho}_{HI_G}=\frac{\sigma_{I_G}}{\sigma_H} $, where $ {\sigma}_{I_G} $ is the standard deviation of $ {\sigma}_{I_G}^2 $ and σ_H is the standard deviation of the variance of H ($ {\sigma}_H^2 $).
3.
The variance of the predicted error, $ Var\left(H-{I}_G\right)=\left(1-{\rho}_{HI_G}^2\right){\sigma}_H^2 $, is minimal. Note that $ Var\left(H-{I}_G\right)={\sigma}_{I_G}^2+{\sigma}_H^2-2{\sigma}_{HI_G} $, and when β = w, $ {\sigma}_{I_G}^2={\sigma}_{HI_G} $, whence $ Var\left(H-{I}_G\right)={\sigma}_H^2-{\sigma}_{I_G}^2=\left(1-{\rho}_{HI_G}^2\right){\sigma}_H^2 $ is minimal.
4.
The total variance of H explained by I_G is $ {\sigma}_{I_G}^2={\rho}_{HI_G}^2{\sigma}_H^2 $. It is evident that if $ {\rho}_{HI_G}=1 $, $ {\sigma}_{I_G}^2={\sigma}_H^2 $, and if $ {\rho}_{HI_G}=0 $, $ {\sigma}_{I_G}^2=0 $. That is, the variance of H explained by I_G is proportional to $ {\rho}_{HI_G} $, and when $ {\rho}_{HI_G} $ is close to 1, $ {\sigma}_{I_G}^2 $ is close to $ {\sigma}_H^2 $; if $ {\rho}_{HI_G} $ is close to 0, $ {\sigma}_{I_G}^2 $ is close to 0.

The LGSI properties described in points 1–4 of this subsection are the same as the LPSI properties described in Chap. 2. This corroborates the LGSI as an application of the LPSI theory to the genomic selection context.

5.1.7 Genomic Covariance Matrix in the Training and Testing Population

To derive the LGSI theory, we assumed that the true genomic additive variance–covariance matrix Γ was known. However, in practice, we need to estimate it. In the training population, matrix Γ can be estimated by restricted maximum likelihood (REML) using phenotypic and genomic information, as described by Vattikuti et al. (2012) and Su et al. (2012). In Eqs. (2.22) to (2.24) of Chap. 2, we presented the formulas for estimating the genotypic and residual variance and covariance based on the formulas described by Lynch and Walsh (1998). Here, we present a brief description of how we can estimate the qth component (σ_γqq) of Γ in the training population using the REML method.

We estimated $ {\sigma}_{\upgamma qq}={\sigma}_{\upgamma q}^2 $ (q, q′ = t = number of traits) in the absence of dominance and epistatic effects, using the model y_q = 1μ_q + Zγ_q + ε_q, where the vector y_q ~ NMV(1μ_q, V_q) g × 1 (g = number of genotypes in the population) had a multivariate normal distribution; 1 was a g × 1 vector of 1s, μ_q was the mean of the qth trait, Z was an identity matrix g × g; γ_q ~ NMV(0, $ \mathbf{G}{\sigma}_{\upgamma q}^2 $) was a vector of genomic breeding values, and ε_q ~ NMV(0, $ \mathbf{I}{\sigma}_{\varepsilon_q}^2 $) was a g × 1 vector of residuals. Matrix G = XX′/c was the genomic relationship matrix, and in an F₂ population, $ c=\sum \limits_{j=1}^N2{p}_j{q}_j $; X was a g × m matrix (m = number of markers) of the coded marker values (2 − 2p for AA, 1 − 2p for Aa, and −2p for aa) for the additive effects of the markers; p and q denote the frequency of allele A and the frequency of allele a in the jth marker (j = 1, 2, …, m), and $ {\mathbf{V}}_q=\mathbf{G}{\sigma}_{\upgamma q}^2+\mathbf{I}{\sigma}_{\varepsilon_q}^2 $.

The expectation–maximization algorithm allowed the REML for the variance components $ {\sigma}_{\upgamma q}^2 $ and $ {\sigma}_{\varepsilon_q}^2 $ to be computed by iterating the following equations:

$$ {\sigma}_{\upgamma q}^{2\left(n+1\right)}={\sigma}_{\upgamma q}^{2(n)}+\frac{{\left({\sigma}_{\upgamma q}^{2(n)}\right)}^2}{g}\left[{\mathbf{y}}_q^{\prime}\left({\mathbf{T}}^{(n)}{\mathbf{GT}}^{(n)}\right){\mathbf{y}}_q- tr\left({\mathbf{T}}^{(n)}\mathbf{G}\right)\right] $$

(5.21)

and

$$ {\sigma}_{\varepsilon_q}^{2\left(n+1\right)}={\sigma}_{\varepsilon_q}^{2(n)}+\frac{{\left({\sigma}_{\varepsilon_q}^{2(n)}\right)}^2}{g}\left[{\mathbf{y}}_q^{\prime}\left({\mathbf{T}}^{(n)}{\mathbf{T}}^{(n)}\right){\mathbf{y}}_q- tr\left({\mathbf{T}}^{(n)}\right)\right], $$

(5.22)

where g is the number of genotypes. After n iterations, when $ {\sigma}_{\upgamma q}^{2\left(n+1\right)} $ was very similar to $ {\sigma}_{\upgamma q}^{2(n)} $ and $ {\sigma}_{\varepsilon_q}^{2\left(n+1\right)} $ was very similar to $ {\sigma}_{\varepsilon_q}^{2(n)} $, $ {\sigma}_{\upgamma q}^{2\left(n+1\right)} $ and $ {\sigma}_{\varepsilon_q}^{2\left(n+1\right)} $ were the estimated variance components of $ {\sigma}_{\upgamma q}^2 $ and $ {\sigma}_{\varepsilon_q}^2 $ respectively. In Eqs. (5.21) and (5.22) tr(.) denoted the trace of the matrices within brackets; $ \mathbf{T}={\mathbf{V}}_q^{-1}-{\mathbf{V}}_q^{-1}\mathbf{1}{\left({\mathbf{1}}^{\prime }{\mathbf{V}}_q^{-1}\mathbf{1}\right)}^{-1}{\mathbf{1}}^{\prime }{\mathbf{V}}_q^{-1} $, and $ {\mathbf{V}}_q^{-1} $ was the inverse of $ {\mathbf{V}}_q=\mathbf{G}{\sigma}_{\upgamma q}^2+\mathbf{I}{\sigma}_{\varepsilon_q}^2 $. In matrix T⁽ⁿ⁾, $ {\mathbf{V}}_q^{-1(n)} $ was the inverse of matrix $ {\mathbf{V}}_q^{(n)}=\mathbf{G}{\sigma}_{\upgamma q}^{2(n)}+\mathbf{I}{\sigma}_{\varepsilon_q}^{2(n)} $.

The genomic additive genetic covariance between the observations of the qth and ith traits, y_q and y_i ($ {\sigma}_{\gamma_{\mathrm{qi}}} $, q, i = 1, 2,…,t), can be estimated by REML. Here, we adapted Eqs. (5.21) and (5.22) using the variance of the sum of y_q and y_i, i.e., Var(y_i + y_q) = V_i + V_q + 2C_iq, where $ {\mathbf{V}}_i=\mathbf{G}{\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{i}}}^2+\mathbf{I}{\sigma}_{\varepsilon_i}^2= Var\left({\mathbf{y}}_i\right) $ is the variance of y_i and $ {\mathbf{V}}_q=\mathbf{G}{\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{q}}}^2+\mathbf{I}{\sigma}_{\varepsilon_q}^2= Var\left({\mathbf{y}}_q\right) $ is the variance of y_q; $ 2{\mathbf{C}}_{iq}=2\mathbf{G}{\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{iq}}}+2\mathbf{I}{\sigma}_{\varepsilon_{iq}}=2 Cov\left({\mathbf{y}}_i,{\mathbf{y}}_q\right) $ is the covariance of y_q and y_i, and $ {\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{iq}}} $ and $ {\sigma}_{\varepsilon_{iq}} $ are the genomic and residual covariance respectively, associated with y_i and y_q. Thus, one way of estimating $ {\sigma}_{{\boldsymbol{\upgamma}}_{\mathrm{iq}}} $ and $ {\sigma}_{e_{iq}} $ is by using the following equation:

$$ 0.5 Var\left({\mathbf{y}}_i+{\mathbf{y}}_q\right)-0.5 Var\left({\mathbf{y}}_i\right)-0.5 Var\left({\mathbf{y}}_q\right), $$

(5.23)

for which Eqs. (5.21) and (5.22) can be adapted.

If there is only marker information on the testing population, then it is not possible to estimate Γ using Eqs. (5.21) to (5.23). Another way of estimating Γ is to use the method proposed by Ceron-Rojas et al. (2015), which requires the estimated values of γ_q ($ {\widehat{\boldsymbol{\upgamma}}}_q $) in the cycle of interest. Let $ \widehat{\mathbf{u}} $ be the estimator of the vector of marker effects $ {\mathbf{u}}^{\prime }=\left[{\mathbf{u}}_1^{\prime}\kern0.5em {\mathbf{u}}_2^{\prime}\kern0.5em \cdots \kern0.5em {\mathbf{u}}_t^{\prime}\right] $ for t traits obtained in the training population. We obtained the qth GEBVs (q = 1, 2, …, t) in the lth selection cycle (l = 1, 2, …, number of cycles) as

$$ {\widehat{\boldsymbol{\upgamma}}}_{ql}={\mathbf{X}}_l{\widehat{\mathbf{u}}}_q $$

(5.24)

where $ {\widehat{\mathbf{u}}}_q $ is the vector of size m × 1 of the estimated marker effects of the qth trait in the training population and X_l is a matrix of size n × m of the coded values of marker genotypes in the lth selection cycle of the testing population.

Now suppose that γ_q and $ {\boldsymbol{\upgamma}}_{q^{\prime }} $ have multivariate normal distribution jointly, with mean $ \mathbf{1}{\mu}_{\gamma_q} $ and $ \mathbf{1}{\mu}_{\gamma_{q^{\prime }}} $ respectively, and covariance matrix $ \mathbf{G}{\sigma}_{\gamma_{q{q}^{\prime }}} $, where 1 is an n × 1 vector of 1s and G = XX′/c is the additive genomic relationship matrix. Then, $ \boldsymbol{\Gamma} =\left\{{\sigma}_{\gamma_{q{q}^{\prime }}}\right\} $ can be estimated as

$$ {\widehat{\boldsymbol{\Gamma}}}_l=\left\{{\widehat{\sigma}}_{\gamma_{q{q}^{\prime }}}\right\}, $$

(5.25)

where $ {\widehat{\sigma}}_{\gamma_{q{q}^{\prime }}}=\frac{1}{g}{\left({\widehat{\boldsymbol{\upgamma}}}_{ql}-\mathbf{1}{\widehat{\mu}}_{\gamma_{ql}}\right)}^{\prime }{\mathbf{G}}_l^{-1}\left({\widehat{\boldsymbol{\upgamma}}}_{q^{\prime }l}-\mathbf{1}{\widehat{\mu}}_{\gamma_{q^{\prime }l}}\right) $ is the estimated covariance between γ_q and $ {\boldsymbol{\upgamma}}_{q^{\prime }} $ in the lth selection cycle of the testing population; g is the number of genotypes; $ {\widehat{\boldsymbol{\upgamma}}}_{ql} $ was defined in Eq. (5.24); $ {\widehat{\mu}}_{\gamma_{ql}} $ and $ {\widehat{\mu}}_{\gamma_{q^{\prime }l}} $ are the estimated arithmetic means of the values of $ {\widehat{\boldsymbol{\upgamma}}}_{ql} $ and $ {\widehat{\boldsymbol{\upgamma}}}_{q^{\prime }l} $; 1 is a g × 1 vector of 1s and $ {\mathbf{G}}_l={c}^{-1}{\mathbf{X}}_l{\mathbf{X}}_l^{\prime } $ is the additive genomic relationship matrix in the lth selection cycle (l = 1, 2, …, number of cycles) in the testing population.

From Eq. (5.25) we can estimate the LGSI response and expected genetic gain per trait in the testing population as

$$ {\widehat{R}}_{I_G}=\frac{k_I}{L_G}\sqrt{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}\mathbf{w}}\kern1em \mathrm{and}\kern1em {\widehat{\mathbf{E}}}_{I_G}=\frac{k_I}{L_G}\frac{\widehat{\boldsymbol{\Gamma}}\mathbf{w}}{\sqrt{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}\mathbf{w}}}, $$

(5.26)

respectively. The estimated LGSI ($ {\widehat{I}}_{\mathrm{G}} $) values in the lth selection cycle can be obtained as

$$ {\widehat{I}}_{\mathrm{G}}=\sum \limits_{q=1}^{\mathrm{t}}{w}_q{\widehat{\boldsymbol{\upgamma}}}_{ql}, $$

(5.27)

where w_q is the qth economic weight and $ {\widehat{\boldsymbol{\upgamma}}}_{ql} $ was defined in Eq. (5.24). Equation (5.27) is a vector of size g × 1 (g= number of genotypes). In practice, $ {\widehat{I}}_{\mathrm{G}} $ values are ranked to select individual genotypes with optimal GEBVs.

5.1.8 Numerical Examples

To estimate matrices C and R and the marker effects in the training population, we used a real maize (Zea mays) F₂ population with 248 genotypes (each with two repetitions), 233 molecular markers, and three traits—grain yield (GY, ton ha⁻¹), ear height (EHT, cm), and plant height (PHT, cm)—evaluated in one environment. The estimated matrices were $ \widehat{\mathbf{C}}=\left[\begin{array}{ccc}0.07& 0.61& 1.06\\ {}0.61& 17.93& 22.75\\ {}1.06& 22.75& 44.53\end{array}\right] $ and $ \widehat{\mathbf{R}}=\left[\begin{array}{ccc}0.38& 0.72& 1.27\\ {}0.72& 47.14& 60.96\\ {}1.27& 60.96& 121.46\end{array}\right] $, which were estimated by Eqs. (5.21) to (5.23) using the numerical relationship A instead of the genomic relationship matrix (G = XX′/c).

Table 5.1 presents the first 20 BLUPs of the estimated marker effects (Eq. 5.8) in the training population and the first 20 marker coded values and GEBVs (Eq. 5.9) obtained in the testing population associated with trait GY. In the testing population, there were 380 genotypes and 233 molecular markers. In this population, the estimated genomic covariance matrix $ \boldsymbol{\Gamma} =\left\{{\sigma}_{\gamma_{q{q}^{\prime }}}\right\} $ was $ \widehat{\boldsymbol{\Gamma}}=\left[\begin{array}{ccc}0.21& 2.95& 5.00\\ {}2.95& 42.41& 71.11\\ {}5.00& 71.11& 121.53\end{array}\right] $. The first GEBV (0.195) related to GY in Table 5.1 was obtained as 0.195 = − 0.0003(1) − 0.0038(1) − 0.0085(0) + ⋯ − 0.03(−1). The other GEBVs can be obtained in a similar manner.

Table 5.1 The 20 best linear unbiased predictors (BLUPs) of the estimated marker effects in the training population and the first 20 marker coded values and genomic estimated breeding values (GEBVs) obtained in the testing population associated with grain yield

Full size table

Suppose a selection intensity of 10% (k_I = 1.755) and a vector of economic weights of $ {\mathbf{w}}^{\prime }=\left[5\kern0.5em -0.1\kern0.5em -0.1\right] $; then, the estimated LGSI selection response and the expected genetic gain per trait without including the interval between selection cycle is $ {\widehat{R}}_{I_G}=(1.755)\sqrt{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}\mathbf{w}}=0.92 $ and $ {\widehat{\mathbf{E}}}_{I_G}^{\prime }=(1.755)\frac{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}}{\sqrt{{\mathbf{w}}^{\prime}\widehat{\boldsymbol{\Gamma}}\mathbf{w}}}=\left[0.80\kern0.5em 11.41\kern0.5em 19.28\right] $ respectively, whereas the estimated LGSI accuracy was $ {\widehat{\rho}}_{HI_G}=0.48 $.

Chapter 11 presents RIndSel, a graphical unit interface that uses selection index theory to select individual candidates as parents for the next selection cycle, which can be used to obtain the results of the real numerical example described in this subsection.

To compare LGSI efficiency versus LPSI efficiency we used the simulated data described in Chap. 2, Sect. 2.8.1. According to Beyene et al. (2015), at least 4 years are required to complete one phenotypic selection cycle in maize, whereas genomic selection requires only 1.5 years. Thus, to compare LGSI efficiency versus LPSI efficiency in terms of time, we can use the Technow et al. (2013) inequality described in Eq. (5.18).

Table 5.2 presents the estimated value of Eq. (5.18) for five simulated selection cycles. The LGSI efficiency was higher than LPSI efficiency in terms of time, because the Technow et al. (2013) inequality was true in the five selection cycles. An additional result obtained by Ceron-Rojas et al. (2015) is presented in Fig. 5.2, which shows the correlation among the LGSI, the LPSI, and the true net genetic merit values in seven selection cycles. According to Fig. 5.2, the correlation between the LGSI and the true net genetic merit values was higher than the correlation between the LPSI and the true net genetic merit values for the first three selection cycles; after this cycle, the correlation between LGSI and the true net genetic merit values tended to decrease.

Table 5.2 Five simulated selection cycles

Full size table

5.2 The Combined Linear Genomic Selection Index

The combined LGSI (CLGSI) developed by Dekkers (2007) is a slightly modified version of the LMSI (see Chap. 4 for details), which, instead of using the marker scores, uses the GEBVs and the phenotypic information jointly to predict the net genetic merit. The main difference between the CLGSI and the LGSI is that the CLGSI can only be used in training populations, whereas the LGSI is used in testing populations. The basic conditions for constructing a valid CLGSI include conditions for constructing the LPSI, the LMSI, and the LGSI, because the CLGSI uses GEBVs and phenotypic information jointly to predict the net genetic merit.

5.2.1 The CLGSI Parameters

The net genetic merit can be written in a similar manner to that in the LMSI context, that is, as

$$ H={\mathbf{w}}^{\prime}\mathbf{g}+{\mathbf{w}}_2^{\prime}\boldsymbol{\upgamma} =\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right]\left[\begin{array}{c}\mathbf{g}\\ {}\boldsymbol{\upgamma} \end{array}\right]={\mathbf{a}}_G^{\prime }{\mathbf{z}}_G, $$

(5.28)

where $ {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em \dots \kern0.5em {g}_t\right] $ is the vector of breeding values, $ {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em \cdots \kern0.5em {w}_t\right] $ is the vector of economic weights associated with breeding values, $ {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] $ is a null vector associated with the vector of genomic breeding values $ {\boldsymbol{\upgamma}}^{\prime }=\left[{\gamma}_1\kern0.5em {\gamma}_2\kern0.5em \dots \kern0.5em {\gamma}_t\right] $, $ {\mathbf{a}}_G^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right] $ and $ {\mathbf{z}}_G=\left[{\mathbf{g}}^{\prime}\kern0.5em {\boldsymbol{\upgamma}}^{\prime}\right] $.

The CLGSI can be written as

$$ {I}_C={\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\mathbf{y}+{\boldsymbol{\upbeta}}_G^{\prime}\boldsymbol{\upgamma} =\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right]\left[\begin{array}{c}\mathbf{y}\\ {}\boldsymbol{\upgamma} \end{array}\right]={\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{t}}_C, $$

(5.29)

where $ {\mathbf{y}}^{\prime }=\left[{y}_1\kern0.5em \cdots \kern0.5em {y}_t\right] $ (t = number of traits) is the vector of phenotypic values; γ was defined earlier; $ {\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime } $ and β_G are vectors of coefficients of phenotypic and genomic weight values respectively; $ {\boldsymbol{\upbeta}}_C^{\prime }=\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right] $ and $ {\mathbf{t}}_G^{\prime }=\left[{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upgamma}}^{\prime}\right] $.

The CLGSI selection response can be written as

$$ {R}_C={k}_I{\sigma}_H{\rho}_{HI_C}={k}_I{\sigma}_H\frac{{\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C}{\sqrt{{\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\mathbf{a}}_C}{\sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C\boldsymbol{\upbeta}}}_C}, $$

(5.30)

where k_I is the standardized selection differential of the CLGSI, $ {\sigma}_H^2={\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\mathbf{a}}_C $ and $ Var\left({I}_C\right)={\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C $ are the variances of H and I_C, whereas $ {\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C $ and $ {\rho}_{HI_C} $ are the covariance and the correlation between H and I_C respectively; $ {\mathbf{T}}_C= Var\left[\begin{array}{c}\mathbf{y}\\ {}\boldsymbol{\upgamma} \end{array}\right]=\left[\begin{array}{cc}\mathbf{P}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] $ and $ {\boldsymbol{\Psi}}_C= Var\left[\begin{array}{c}\mathbf{g}\\ {}\boldsymbol{\upgamma} \end{array}\right]=\left[\begin{array}{cc}\mathbf{C}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] $ are block matrices of the phenotypic covariance matrix, P = Var(y), the genomic covariance matrix, Γ = Var(γ), and the genetic breeding values covariance matrix, C = Var(g).

Suppose that matrices Ψ_C and T_C are known; then the CLGSI vector of coefficients that simultaneously maximizes $ {\rho}_{HI_C} $ and R_C can be written as

$$ {\boldsymbol{\upbeta}}_C={\mathbf{T}}_C^{-1}{\boldsymbol{\Psi}}_C{\mathbf{a}}_C, $$

(5.31)

whence the optimized CLGSI is

$$ {I}_C={\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{t}}_C, $$

(5.32)

Equations (5.31) and (5.32) indicate that the CLGSI is an application of the LPSI to the genomic selection context.

From Eq. (5.31), the maximized CLGSI selection response, expected genetic gain per trait and accuracy can be written as

$$ {R}_C={k}_I\sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}, $$

(5.33)

$$ {\mathbf{E}}_C={k}_I\frac{{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C}{\sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}} $$

(5.34)

and

$$ {\rho}_{HI_C}=\frac{\sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}}, $$

(5.35)

respectively. Note that the maximized LPSI accuracy is $ {\rho}_{HI}=\frac{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}} $ (see Chap. 2). The denominator of the accuracy of the CLGSI and $ {\rho}_{HI}=\frac{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}} $ is the same; however, the numerator of the two indices accuracy is different. We would expect that $ \sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}\ge \sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} $, and then $ {\rho}_{HI_C}\ge {\rho}_{HI} $. Similar results can be observed when we compared the maximized LPSI selection response and expected genetic gain per trait with the maximized CLGSI selection response and expected genetic gain per trait.

5.2.2 Relationship Between the CLGSI and the LGSI

As we have indicated, the CLGSI is mathematically equivalent to the LMSI; thus, it has similar statistical properties to those of the LMSI, some of which are described in this section. The rest can be seen in Chap. 4. Let $ {\mathbf{Q}}_C={\mathbf{T}}_C^{-1}{\boldsymbol{\Psi}}_C $, then matrix Q_C can be written as

$$ {\mathbf{Q}}_C=\left[\begin{array}{cc}{\left(\mathbf{P}-\boldsymbol{\Gamma} \right)}^{-1}\left(\mathbf{C}-\boldsymbol{\Gamma} \right)& \mathbf{0}\\ {}\mathbf{I}-{\left(\mathbf{P}-\boldsymbol{\Gamma} \right)}^{-1}\left(\mathbf{C}-\boldsymbol{\Gamma} \right)& \mathbf{I}\end{array}\right], $$

(5.36)

whence as $ {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] $, the two sub-vectors that conform vector β_C = Q_Ca_C or $ {\boldsymbol{\upbeta}}_C^{\prime }=\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right] $ can be written as

$$ {\boldsymbol{\upbeta}}_{\mathbf{y}}={\left(\mathbf{P}-\boldsymbol{\Gamma} \right)}^{-1}\left(\mathbf{C}-\boldsymbol{\Gamma} \right)\mathbf{w}, $$

(5.37)

and

$$ {\boldsymbol{\upbeta}}_G=\left[\mathbf{I}-{\left(\mathbf{P}-\boldsymbol{\Gamma} \right)}^{-1}\left(\mathbf{C}-\boldsymbol{\Gamma} \right)\right]\mathbf{w}=\mathbf{w}-{\boldsymbol{\upbeta}}_{\mathbf{y}}. $$

(5.38)

When Γ is equal to the null matrix (no genomic information), Eq. (5.37) is equal to β_y = P⁻¹Cw = b and $ {R}_C={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={R}_I $, which are the LPSI vector of coefficients and the selection response.

By Eqs. (5.37) and (5.38), the maximized CLGSI selection response and the optimized CLGSI can be written as

$$ {R}_C={k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{C}{\left(\mathbf{P}-\boldsymbol{\Gamma} \right)}^{-1}\left(\mathbf{C}-\boldsymbol{\Gamma} \right)\mathbf{w}+{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \left[\mathbf{I}-{\left(\mathbf{P}-\boldsymbol{\Gamma} \right)}^{-1}\left(\mathbf{C}-\boldsymbol{\Gamma} \right)\right]\mathbf{w}} $$

(5.39)

and

$$ {I}_C={\boldsymbol{\upbeta}}_{\mathbf{y}}\mathbf{y}+{\boldsymbol{\upbeta}}_G\boldsymbol{\upgamma} ={\mathbf{w}}^{\prime}\boldsymbol{\upgamma} +{\boldsymbol{\upbeta}}_{\mathbf{y}}\left(\mathbf{y}-\boldsymbol{\upgamma} \right), $$

(5.40)

respectively.

Assume that when the number of markers and genotypes increases, matrix Γ tends to matrix C and that, at the limit, Γ = C; then, Eq. (5.39) can be written as $ {R}_C={k}_I\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}}={R}_G $ (except by L_G); in addition, β_y = 0 and β_G = w, the weights of the LGSI, and, in this latter case, the CLGSI is equal to the LGSI, as we would expect. Thus, in the asymptotic context, the LGSI and the CLGSI are the same.

An additional interesting result of the relationship between the CLGSI and the LGSI is as follows. The maximized correlation between H and I_C (or CLGSI accuracy) can be written as

$$ {\rho}_{HI_C}=\frac{{\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C}{\sqrt{{\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\mathbf{a}}_C}\sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}}; $$

(5.41)

However, when Γ = C, $ {\boldsymbol{\Psi}}_C=\left[\begin{array}{cc}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] $, β_y = 0, β_G = w, and $ {\boldsymbol{\upbeta}}_C^{\prime }=\left[{\boldsymbol{\upbeta}}_{\mathbf{y}}^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_G^{\prime}\right]=\left[\begin{array}{cc}\mathbf{0}& {\mathbf{w}}^{\prime}\end{array}\right] $, whence $ {\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C={\mathbf{a}}_C^{\prime }{\boldsymbol{\Psi}}_C{\mathbf{a}}_C={\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C={\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w} $, and Eq. (5.41) is equal to 1. That is, the maximum correlation between H and I_C in the asymptotic context is equal to the maximum correlation between H and the LGSI, and that value will be equal to 1.

The asymptotic relationship between the CLGSI expected genetic gain per trait, E_C (Eq. 5.34), and the LGSI expected genetic gain per trait, $ {\mathbf{E}}_{I_G} $ (Eq. 5.16), is as follows. When Γ = C, $ {\boldsymbol{\Psi}}_C=\left[\begin{array}{cc}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] $ and $ {\boldsymbol{\upbeta}}_C^{\prime }=\left[\mathbf{0}\kern0.5em {\mathbf{w}}^{\prime}\right] $, whence

$$ {\mathbf{E}}_C={k}_I\frac{{\boldsymbol{\Psi}}_C{\boldsymbol{\upbeta}}_C}{\sqrt{{\boldsymbol{\upbeta}}_C^{\prime }{\mathbf{T}}_C{\boldsymbol{\upbeta}}_C}}={k}_I\frac{2\boldsymbol{\Gamma} \mathbf{w}}{\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}}}=2{\mathbf{E}}_{I_G}. $$

(5.42)

This means that in the asymptotic context, the CLGSI expected genetic gain per trait is twice the LGSI expected genetic gain per trait. Of course, 2 is only a proportionality constant; thus, in reality, $ {\mathbf{E}}_C={\mathbf{E}}_{I_G} $.

5.2.3 Statistical Properties of the CLGSI

Assume that H and I_C have bivariate joint normal distribution; P, C, Γ, and w are known, and $ {\boldsymbol{\upbeta}}_C={\mathbf{T}}_C^{-1}{\boldsymbol{\Psi}}_C{\mathbf{a}}_C $; then, the CLGSI properties are as follow:

1.
$ {\sigma}_{I_C}^2={\sigma}_{HI_C} $, i.e., the variance of I_C ($ {\sigma}_{I_C}^2 $) and the covariance between H and I_C ($ {\sigma}_{HI_C} $) are the same.
2.
The maximized correlation between H and I_C is $ {\rho}_{HI_C}=\frac{\sigma_{I_C}}{\sigma_H} $, where $ {\sigma}_{I_C} $ is the standard deviation of the variance of I_C ($ {\sigma}_{I_C}^2 $) and σ_H is the standard deviation of the variance of H($ {\sigma}_H^2 $).
3.
The variance of the predicted error, $ Var\left(H-{I}_C\right)=\left(1-{\rho}_{HI_C}^2\right){\sigma}_H^2 $, is minimal.
4.
The total variance of H explained by I_C is $ {\sigma}_{I_C}^2={\rho}_{HI_C}^2{\sigma}_H^2 $.

Note that CLGSI properties 1 to 4 are the same as LMSI properties 1 to 4 and that both indices jointly incorporate phenotypic and marker information to predict the net genetic merit; however, the LMSI incorporates the marker information by the marker score values, whereas the CLGSI uses the GEBVs.

5.2.4 Estimating the CLGSI Parameters

Using the real maize (Zea mays) F₂ population with 248 genotypes (each with two repetitions), 233 molecular markers and three traits—GY (ton ha⁻¹), EHT (cm), and PHT (cm)—described in Sect. 5.1.8 of this chapter, we estimated matrices P and C using Eqs. (2.22) to (2.24) described in Chap. 2 of this book. The estimated matrices were $ \widehat{\mathbf{P}}=\left[\begin{array}{ccc}0.45& 1.33& 2.33\\ {}1.33& 65.07& 83.71\\ {}2.33& 83.71& 165.99\end{array}\right] $ and $ \widehat{\mathbf{C}}=\left[\begin{array}{ccc}0.07& 0.61& 1.06\\ {}0.61& 17.93& 22.75\\ {}1.06& 22.75& 44.53\end{array}\right] $.

In a similar manner, we estimated matrix Γ using Eqs. (5.21) to (5.23). The estimated matrix was $ \widehat{\boldsymbol{\Gamma}}=\left[\begin{array}{ccc}0.07& 0.65& 1.05\\ {}0.65& 10.62& 14.25\\ {}1.05& 14.25& 26.37\end{array}\right] $. Note that matrices $ \widehat{\mathbf{C}} $ and $ \widehat{\boldsymbol{\Gamma}} $ have similar values. This means that, in the asymptotic context, we can assume that matrix Γ tends to matrix C.

To estimate the CLMSI and its associated parameters (selection response, expected genetic gain per trait, etc.), we need to estimate the vector of coefficients $ {\boldsymbol{\upbeta}}_C={\mathbf{T}}_C^{-1}{\boldsymbol{\Psi}}_C{\mathbf{a}}_C $ as $ {\widehat{\boldsymbol{\upbeta}}}_C={\widehat{\mathbf{T}}}_C^{-1}{\widehat{\boldsymbol{\Psi}}}_C{\mathbf{a}}_C $, where $ {\widehat{\mathbf{T}}}_C=\left[\begin{array}{cc}\widehat{\mathbf{P}}& \widehat{\boldsymbol{\Gamma}}\\ {}\widehat{\boldsymbol{\Gamma}}& \widehat{\boldsymbol{\Gamma}}\end{array}\right] $ and $ {\widehat{\boldsymbol{\Psi}}}_C=\left[\begin{array}{cc}\widehat{\mathbf{C}}& \widehat{\boldsymbol{\Gamma}}\\ {}\widehat{\boldsymbol{\Gamma}}& \widehat{\boldsymbol{\Gamma}}\end{array}\right] $ are estimates of matrices $ {\mathbf{T}}_C=\left[\begin{array}{cc}\mathbf{P}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] $ and $ {\boldsymbol{\Psi}}_C=\left[\begin{array}{cc}\mathbf{C}& \boldsymbol{\Gamma} \\ {}\boldsymbol{\Gamma} & \boldsymbol{\Gamma} \end{array}\right] $ respectively. The estimated CLGSI vector of coefficients $ {\widehat{\boldsymbol{\upbeta}}}_C={\widehat{\mathbf{T}}}_C^{-1}{\widehat{\boldsymbol{\Psi}}}_C{\mathbf{a}}_C $ is conformed by the vector of phenotypic weights, $ {\widehat{\boldsymbol{\upbeta}}}_{\mathbf{y}}={\left(\widehat{\mathbf{P}}-\widehat{\boldsymbol{\Gamma}}\right)}^{-1}\left(\widehat{\mathbf{C}}-\widehat{\boldsymbol{\Gamma}}\right)\mathbf{w} $, and by the vector of genomic weights, $ {\widehat{\boldsymbol{\upbeta}}}_G=\left[\mathbf{I}-{\left(\widehat{\mathbf{P}}-\widehat{\boldsymbol{\Gamma}}\right)}^{-1}\left(\widehat{\mathbf{C}}-\widehat{\boldsymbol{\Gamma}}\right)\right]\mathbf{w} $.

Let $ {\mathbf{w}}^{\prime }=\left[5\kern0.5em -0.1\kern0.5em -0.1\right] $ be the vector of economic weights; then, according to the estimated matrices $ \widehat{\mathbf{P}} $, $ \widehat{\mathbf{C}} $, and $ \widehat{\boldsymbol{\Gamma}} $, $ {\widehat{\boldsymbol{\upbeta}}}_{\mathbf{y}}^{\prime }=\left[0.08\kern0.5em -0.02\kern0.5em -0.01\right] $ and $ {\widehat{\boldsymbol{\upbeta}}}_G^{\prime }=\left[4.92\kern0.5em -0.08\kern0.5em -0.09\right] $, whence the estimated CLGSI in the training population can be written as

$$ {\widehat{I}}_C={\widehat{\boldsymbol{\upbeta}}}_{\mathbf{y}}\mathbf{y}+{\widehat{\boldsymbol{\upbeta}}}_G\widehat{\boldsymbol{\upgamma}}. $$

(5.43)

Suppose a selection intensity of 10% (k_I = 1.755); then, the estimated CLGSI selection response and expected genetic gain per trait were $ {\widehat{R}}_C={k}_I\sqrt{{\widehat{{\boldsymbol{\upbeta}}^{\prime}}}_C{\widehat{\mathbf{T}}}_C{\widehat{\boldsymbol{\upbeta}}}_C}=1.54 $ and $ {\widehat{\mathbf{E}}}_C^{\prime }={k}_I\frac{{\widehat{\boldsymbol{\upbeta}}}_C^{\prime }{\widehat{\boldsymbol{\Psi}}}_C}{\sqrt{{\widehat{{\boldsymbol{\upbeta}}^{\prime}}}_C{\widehat{\mathbf{T}}}_C{\widehat{\boldsymbol{\upbeta}}}_C}}=\left[0.36\kern0.5em 1.04\kern0.5em 1.70\kern0.5em 0.36\kern0.5em 1.53\kern0.5em 2.38\right] $ respectively, whereas the estimated CLGSI accuracy was $ {\widehat{\rho}}_{HI_C}=\frac{{\widehat{\sigma}}_{I_C}}{{\widehat{\sigma}}_H}=0.814 $.

The estimated LPSI selection response, expected genetic gain per trait, and accuracy were 0.601, $ \left[0.09\kern0.5em -0.81\kern0.5em -0.89\right] $, and 0.32 respectively; thus, the CLGSI was more efficient to predict the net genetic merit than the LPSI because the CLGSI accuracy and selection response were 0.814 and 1.54 respectively.

5.2.5 LGSI and CLGSI Efficiency Vs LMSI, GW-LMSI and LPSI Efficiency

In this subsection, we compare the accuracy, selection response, and efficiency of the LGSI and CLGSI with the LMSI, the GW-LMSI, and the LPSI using the simulated data for a maize (Zea mays) population described in Chap. 2, Sect. 2.8.1.

Figure 5.3 presents the estimated accuracy values of the LMSI, the LGSI, the CLGSI, the LPSI, and the GW-LMSI for five simulated selection cycles. According to these results, for the first three selection cycles, the estimated accuracies of the indices, in decreasing order, were LMSI > LGSI > CLGSI > LPSI > GW-LMSI. That is, the highest estimated accuracy was obtained with the LMSI, whereas the lowest was obtained with the GW-LMSI. For the fourth and fifth selection cycles, the estimated accuracies, in decreasing order, were LMSI > LPSI > CLGSI > LGSI > GW-LMSI. This means that in all five selection cycles, the LMSI had the highest accuracy and the GW-LMSI had the lowest accuracy, whereas the estimated LGSI accuracy was reduced to fourth place. Thus, the accuracy of the LGSI tended to decrease after the first three selection cycles whereas LPSI accuracy was a constant.

To compare LGSI efficiency versus the efficiency of the other selection indices, we assumed that the interval between selection cycles in the LGSI is 1.5 years, whereas for CLGSI, LMSI, GW-LMSI, and LPSI, the interval was 4.0 years. Table 5.3 presents the estimated selection response of the LPSI, the LMSI, the GW-LMSI, the LGSI, and the CLGSI, including and not including the interval between selection cycles (first and second parts of Table 5.3 respectively), obtained using five simulated selection cycles. According to the first part of Table 5.3, the average estimated selection responses, in decreasing order, of the LMSI, CLGSI, LPSI, GW-LMSI, and LGSI for the five simulated selection cycles were 17.82, 15.30, 15.22, 13.24, and 13.11 respectively, when the length of the interval between selection was not included. If the length of the interval between selection cycles is included when comparing the selection response of the indices in terms of time, the estimated selection response of LMSI, CLGSI, LPSI, GW-LMSI must be divided by 4 in each selection cycle, and the estimated LGSI selection response should be divided by 1.5. Thus, according to the second part of Table 5.3, if we include the length of the interval between selection cycles, the average estimated selection responses, in decreasing order, of LGSI, LMSI, CLGSI, LPSI, and GW-LMSI for the five simulated selection cycles were 8.74, 4.46, 3.83, 3.80, and 3.31. This means that in terms of time, the efficiency of the LGSI was higher than the efficiency of the other four selection indices.

Table 5.3 Estimated selection response of the linear phenotypic selection index (LPSI), the linear molecular selection index (LMSI), the genome-wide LMSI (GW-LMSI), the linear genomic selection index (LGSI), and the combined LGSI (CLGSI), not including (first part of the Table) and including (second part of the Table) the interval length between selection cycles, obtained using five simulated selection cycles

Full size table

Table 5.4 presents the estimated accuracy of the LMSI, LGSI, CLGSI, LPSI, and the GW-LMSI. In addition, Table 5.4 presents the efficiency when predicting the net genetic merit of the LMSI with respect to the LGSI, CLGSI, LPSI, and GW-LMSI as percentages, for five simulated selection cycles. Note that in this case, LMSI efficiency was higher than the efficiency of the other four selection indices, because the LMSI had the highest correlation with the net genetic merit.

Table 5.4 Estimated accuracy of the LMSI, the LGSI, the CLGSI, the LPSI, and the GW-LMSI; LMSI efficiency compared with LGSI, CLGSI, LPSI, and GW-LMSI efficiencies, expressed in percentages, for five simulated selection cycles

Full size table

References

Beyene Y, Semagn K, Mugo S, Tarekegne A, Babu R et al (2015) Genetic gains in grain yield through genomic selection in eight bi-parental maize populations under drought stress. Crop Sci 55:154–163
Article Google Scholar
Ceron-Rojas JJ, Crossa J, Arief VN, Basford K, Rutkoski J, Jarquín D, Alvarado G, Beyene Y, Semagn K, DeLacy I (2015) A genomic selection index applied to simulated and real data. G3 (Bethesda) 5:2155–2164
Article Google Scholar
Cerón-Rojas JJ, Sahagún-Castellanos J (2016) Multivariate empirical Bayes to predict plant breeding values. Agrociencia 50:633–648
Google Scholar
Dekkers JCM (2007) Prediction of response to marker-assisted and genomic selection using selection index theory. J Anim Breed Genet 124:331–341
Article CAS Google Scholar
Gianola D, Perez-Enciso M, Toro MA (2003) On marker-assisted prediction of genetic value: beyond the ridge. Genetics 163:347–365
CAS PubMed PubMed Central Google Scholar
Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397
CAS PubMed PubMed Central Google Scholar
Heffner EL, Sorrells ME, Jannink JL (2009) Genomic selection for crop improvement. Crop Sci 49(1):12
Article Google Scholar
Lin CY, Allaire FR (1977) Heritability of a linear combination of traits. Theor Appl Genet 51:1–3
Article CAS Google Scholar
Lorenz AJ, Chao S, Asoro FG, Heffner EL, Hayashi T et al (2011) Genomic selection in plant breeding: knowledge and prospects. Adv Agron 110:77–123
Article Google Scholar
Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sinauer Associates, Sunderland, ME
Google Scholar
Mrode RA (2005) Linear models for the prediction of animal breeding values, 2nd edn. CABI Publishing, Cambridge, MA
Book Google Scholar
Nordskog AW (1978) Some statistical properties of an index of multiple traits. Theor Appl Genet 52:91–94
CAS PubMed Google Scholar
Schott JR (2005) Matrix analysis for statistics, 2nd edn. Wiley, Hoboken, NJ
Google Scholar
Searle S, Casella G, McCulloch CE (2006) Variance components. Wiley, Hoboken, NJ
Google Scholar
Su G, Christensen OF, Ostersen T, Henryon M, Lund MS (2012) Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS One 7(9):e45293
Article CAS Google Scholar
Technow F, Bürger A, Melchinger AE (2013) Genomic prediction of northern corn leaf blight resistance in maize with combined or separate training sets for heterotic groups. G3 (Bethesda) 3:197–203
Article Google Scholar
Togashi K, Lin CY, Yamazaki T (2011) The efficiency of genome-wide selection for genetic improvement of net merit. J Anim Sci 89:2972–2980
Article CAS Google Scholar
Van Raden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423
Article CAS Google Scholar
Vattikuti S, Guo J, Chow CC (2012) Heritability and genetic correlations explained by common SNPs for metabolic syndrome traits. PLoS Genet 8(3):e1002637
Article CAS Google Scholar

Download references

Author information

Authors and Affiliations

Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico
J. Jesus Céron-Rojas & José Crossa

Authors

J. Jesus Céron-Rojas
View author publications
You can also search for this author in PubMed Google Scholar
José Crossa
View author publications
You can also search for this author in PubMed Google Scholar

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this chapter

Cite this chapter

Céron-Rojas, J.J., Crossa, J. (2018). Linear Genomic Selection Indices. In: Linear Selection Indices in Modern Plant Breeding. Springer, Cham. https://doi.org/10.1007/978-3-319-91223-3_5

Download citation

DOI: https://doi.org/10.1007/978-3-319-91223-3_5
Published: 27 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91222-6
Online ISBN: 978-3-319-91223-3
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)

Publish with us

Policies and ethics

Linear Genomic Selection Indices

Abstract

Similar content being viewed by others

An analytical framework to derive the expected precision of genomic selection

Overview of Genomic Prediction Methods and the Associated Assumptions on the Variance of Marker Effect, and on the Architecture of the Target Trait

Genomic Selection

5.1 The Linear Genomic Selection Index

5.1.1 Basic Conditions for Constructing the LGSI

5.1.2 Genomic Breeding Values and Marker Effects

5.1.3 The LGSI and Its Parameters

5.1.4 Maximizing LGSI Parameters

5.1.5 Relationship Between the LGSI and LPSI Selection Responses

5.1.6 Statistical LGSI Properties

5.1.7 Genomic Covariance Matrix in the Training and Testing Population

5.1.8 Numerical Examples

5.2 The Combined Linear Genomic Selection Index

5.2.1 The CLGSI Parameters

5.2.2 Relationship Between the CLGSI and the LGSI

5.2.3 Statistical Properties of the CLGSI

5.2.4 Estimating the CLGSI Parameters

5.2.5 LGSI and CLGSI Efficiency Vs LMSI, GW-LMSI and LPSI Efficiency

References

Author information

Authors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Publish with us

Navigation

Linear Genomic Selection Indices

Abstract

Similar content being viewed by others

An analytical framework to derive the expected precision of genomic selection

Overview of Genomic Prediction Methods and the Associated Assumptions on the Variance of Marker Effect, and on the Architecture of the Target Trait

Genomic Selection

5.1 The Linear Genomic Selection Index

5.1.1 Basic Conditions for Constructing the LGSI

5.1.2 Genomic Breeding Values and Marker Effects

5.1.3 The LGSI and Its Parameters

5.1.4 Maximizing LGSI Parameters

5.1.5 Relationship Between the LGSI and LPSI Selection Responses

5.1.6 Statistical LGSI Properties

5.1.7 Genomic Covariance Matrix in the Training and Testing Population

5.1.8 Numerical Examples

5.2 The Combined Linear Genomic Selection Index

5.2.1 The CLGSI Parameters

5.2.2 Relationship Between the CLGSI and the LGSI

5.2.3 Statistical Properties of the CLGSI

5.2.4 Estimating the CLGSI Parameters

5.2.5 LGSI and CLGSI Efficiency Vs LMSI, GW-LMSI and LPSI Efficiency

References

Author information

Authors and Affiliations

Rights and permissions

Copyright information

About this chapter

Cite this chapter

Download citation

Share this chapter

Publish with us

Search

Navigation