Linear Marker and Genome-Wide Selection Indices

Céron-Rojas, J. Jesus; Crossa, José

doi:10.1007/978-3-319-91223-3_4

Abstract

There are two main linear marker selection indices employed in marker-assisted selection (MAS) to predict the net genetic merit and to select individual candidates as parents for the next generation: the linear marker selection index (LMSI) and the genome-wide LMSI (GW-LMSI). Both indices maximize the selection response, the expected genetic gain per trait, and the correlation with the net genetic merit; however, applying the LMSI in plant or animal breeding requires genotyping the candidates for selection; performing a linear regression of phenotypic values on the coded values of the markers such that the selected markers are statistically linked to quantitative trait loci that explain most of the variability in the regression model; constructing the marker score, and combining the marker score with phenotypic information to predict and rank the net genetic merit of the candidates for selection. On the other hand, the GW-LMSI is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit and select candidates. We describe the LMSI and GW-LMSI theory and show that both indices are direct applications of the linear phenotypic selection index theory to MAS. Using real and simulated data we validated the theory of both indices.

4.1 The Linear Marker Selection Index

4.1.1 Basic Conditions for Constructing the LMSI

In Chap. 2, Sect. 2.1, we indicated ten basic conditions for constructing a valid linear phenotypic selection index (LPSI). These ten conditions are also necessary for the linear marker selection index (LMSI); however, in addition to those conditions, the LMSI also requires the following conditions:

1. The markers and the quantitative trait loci (QTL) should be in linkage disequilibrium in the population under selection.
2. The QTL effects should be combined additively both within and between loci.
3. The QTL should be in coupling mode, that is, one of the initial lines should carry all the alleles that have a positive effect, and the other line should carry all the alleles that have a negative effect.
4. The traits of interest should be affected by a few QTL with large effects (and possibly a number of very small QTL effects) rather than many small QTL effects.
5. The heritability of the traits should be low.
6. Markers correlated with the traits of interest should be identified.

Under these conditions, the LMSI should be more efficient than the LPSI, at least in the first selection cycles (Whittaker 2003; Moreau et al. 2007).

4.1.2 The LMSI Parameters

Let y_i = g_i + e_i be the ith trait (i = 1, 2, …, t, t = number of traits), where e_i~N(0, $ {\sigma}_{e_i}^2 $) is the residual with expectation equal to zero and variance value $ {\sigma}_{e_i}^2 $, and N stands for normal distribution. Assuming that the QTL effects combine additively both within and between loci, the ith unobservable genetic value g_i can be written as

$$ {g}_i=\sum \limits_{k=1}^{N_Q}{\alpha}_k{q}_k, $$

(4.1)

where α_k is the effect of the kth QTL, q_k is the number of favorable alleles at the kth QTL (2, 1 or 0), and N_Q is the number of QTL affecting the ith trait of interest.

If the QTL effect values are not observable, the g_i values in Eq. (4.1) are also not observable; however, we can use a linear combination (s_i) of the markers linked to the QTL that affect the ith trait to predict the g_i value as

$$ {s}_i=\sum \limits_{j=1}^M{\theta}_j{x}_j, $$

(4.2)

where s_i is a predictor of g_i, θ_j is the regression coefficient of the linear regression model, x_j is the coded value of the jth marker (e.g., 1, 0, and −1 for marker genotypes AA, Aa, and aa respectively), and M is the number of selected markers linked to the QTL that affect the ith trait. Equation (4.2) is called the marker score (Lande and Thompson 1990; Whittaker 2003) and this is the main reason why the LMSI is not equal to the LPSI described in Chap. 2. The number of selected markers is only a subset of potential markers linked to QTL in the population under selection; thus, the s_i values should be lower than or equal to the g_i values. One way of estimating the s_i values is to perform a linear regression of phenotypic values on the coded values of the markers, select the markers that are statistically linked to QTL and that explain most of the variability in the regression model, and then obtain the estimated value of s_i ($ {\widehat{s}}_i $) as the sum of the products of the estimated regression coefficients of the selected markers and their coded values for the ith trait. Some authors (e.g., Moreau et al. 2007) call $ {\widehat{s}}_i $ the molecular score; in this book, we call s_i the marker score and $ {\widehat{s}}_i $ the estimated marker score.

The objective of the LMSI is to predict the net genetic merit of each individual and select the individuals with the highest net genetic merit for further breeding. In the LMSI context, the net genetic merit can be written as

$$ H={\mathbf{w}}^{\prime}\mathbf{g}+{\mathbf{w}}_2^{\prime}\mathbf{s}=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right]\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{s}\end{array}\right]={\mathbf{a}}^{\prime}\mathbf{z}, $$

(4.3)

where $ {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em \dots \kern0.5em {g}_t\right] $ is the vector of breeding values; $ {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em \cdots \kern0.5em {w}_t\right] $ is the vector of economic weights associated with g; $ {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] $ is a null vector associated with the vector of marker scores $ {\mathbf{s}}^{\prime }=\left[{s}_1\kern0.5em \cdots \kern0.5em {s}_t\right] $; s_i is the ith marker score; $ {\mathbf{a}}^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right] $ and $ {\mathbf{z}}^{\prime }=\left[{\mathbf{g}}^{\prime}\kern0.5em {\mathbf{s}}^{\prime}\right] $.

The information provided by the marker score can be used in breeding programs to increase the accuracy of predicting the net genetic merit of the individuals under selection. The LMSI combines the phenotypic and marker scores to predict H in each selection cycle and can be written as

$$ {I}_M={\boldsymbol{\upbeta}}_y^{\prime}\mathbf{y}+{\boldsymbol{\upbeta}}_s^{\prime}\mathbf{s}=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_s^{\prime}\right]\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{s}\end{array}\right]={\boldsymbol{\upbeta}}^{\prime}\mathbf{t}, $$

(4.4)

where $ {\boldsymbol{\upbeta}}_y^{\prime } $ and β_s are vectors of phenotypic and marker score weights respectively; $ {\mathbf{y}}^{\prime }=\left[{y}_1\kern0.5em \cdots \kern0.5em {y}_t\right] $ is the vector of trait phenotypic values and s was defined in Eq. (4.3); $ {\boldsymbol{\upbeta}}^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_s^{\prime}\right] $ and $ {\mathbf{t}}^{\prime }=\left[{\mathbf{y}}^{\prime}\kern0.5em {\mathbf{s}}^{\prime}\right] $.

The LMSI selection response can be written as

$$ {R}_M={k}_I{\sigma}_H{\rho}_{I_MH}={k}_I{\sigma}_H\frac{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\boldsymbol{\upbeta}}{\sqrt{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\mathbf{a}}\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}}}, $$

(4.5)

where k_I is the standardized selection differential of the LMSI, $ {\sigma}_H=\sqrt{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\mathbf{a}} $ and $ {\sigma}_{I_M}=\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}} $ are the standard deviations of H and I_M respectively, whereas $ {\rho}_{I_MH} $ and a′Z_Mβ are the correlation and the covariance between H and I_M respectively; $ {\mathbf{T}}_M= Var\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{s}\end{array}\right]=\left[\begin{array}{cc}\mathbf{P}& \mathbf{S}\\ {}\mathbf{S}& \mathbf{S}\end{array}\right] $ and $ {\mathbf{Z}}_M= Var\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{s}\end{array}\right]=\left[\begin{array}{cc}\mathbf{C}& \mathbf{S}\\ {}\mathbf{S}& \mathbf{S}\end{array}\right] $ are block covariance matrices, where P = Var(y), S = Var(s), and C = Var(g) are the covariance matrices of the phenotypic values (y), the marker scores (s), and the genetic values (g) respectively in the population. Vectors a and β were defined in Eqs. (4.3) and (4.4) respectively.

The LMSI expected genetic gain per trait can be written as

$$ {\mathbf{E}}_M={k}_I\frac{{\mathbf{Z}}_M\boldsymbol{\upbeta}}{\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}}}. $$

(4.6)

All the parameters in Eq. (4.6) were previously defined.
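Equations (4.5) and (4.6) can be evaluated for any vector of index weights, not only the optimal one. The following is a minimal sketch in Python with NumPy; the matrices P, C, and S, the economic weights w, the index weights beta, and the value k_I = 1.755 (the standardized selection differential for selecting roughly the top 10%) are all hypothetical values chosen for illustration and are not taken from the chapter's data.

```python
import numpy as np

def lmsi_response_and_gain(beta, a, T_M, Z_M, k_I):
    """Evaluate Eqs. (4.5) and (4.6) for an arbitrary weight vector beta."""
    sigma_I = np.sqrt(beta @ T_M @ beta)      # standard deviation of I_M
    sigma_H = np.sqrt(a @ Z_M @ a)            # standard deviation of H
    cov_HI = a @ Z_M @ beta                   # covariance between H and I_M
    rho = cov_HI / (sigma_H * sigma_I)        # correlation between H and I_M
    R_M = k_I * sigma_H * rho                 # selection response, Eq. (4.5)
    E_M = k_I * (Z_M @ beta) / sigma_I        # expected gain per component, Eq. (4.6)
    return R_M, E_M, rho

# Hypothetical two-trait covariance matrices (illustrative values only).
C = np.array([[1.0, 0.3], [0.3, 0.8]])        # Var(g)
S = np.array([[0.4, 0.1], [0.1, 0.3]])        # Var(s)
P = np.array([[2.5, 0.5], [0.5, 2.0]])        # Var(y)
w = np.array([1.0, 0.5])                      # economic weights (assumed)

T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros(2)])          # a' = [w' 0']
beta = np.array([0.3, 0.2, 0.7, 0.3])         # arbitrary, not necessarily optimal
k_I = 1.755

R_M, E_M, rho = lmsi_response_and_gain(beta, a, T_M, Z_M, k_I)
print(R_M, E_M, rho)
```

Because H = a′z, the selection response equals the economic-weighted sum of the expected gains, a′E_M = R_M, which gives a quick consistency check on any implementation.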

4.1.3 The Maximized LMSI Parameters

Suppose that P, S, and C are known matrices; then, matrices T_M and Z_M are known and, according to the LPSI theory (see Chap. 2 for details), the LMSI vector of coefficients (β_M) that maximizes $ {\rho}_{I_MH} $, R_M, and E_M can be written as

$$ \boldsymbol{\upbeta} ={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M\mathbf{a}, $$

(4.7)

whence the maximized selection response and the maximized correlation (or LMSI accuracy) between H and I_M can be written as

$$ {R}_M={k}_I\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}}, $$

(4.8a)

and

$$ {\rho}_{I_MH}=\frac{\sigma_{I_M}}{\sigma_H}, $$

(4.8b)

respectively, where $ {\sigma}_{I_M}=\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}} $ is the standard deviation of I_M and $ {\sigma}_H=\sqrt{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\mathbf{a}} $ is the standard deviation of H. Equations (4.8a) and (4.8b) show that the LMSI is a direct application of the LPSI theory in the marker-assisted selection (MAS) context.
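Equations (4.7), (4.8a), and (4.8b) translate almost line by line into code. The sketch below assumes that P, C, and S are known and that T_M is positive definite; the function name `optimal_lmsi` and all numerical values are hypothetical.

```python
import numpy as np

def optimal_lmsi(P, C, S, w, k_I):
    """Return the optimal LMSI weights, selection response, and accuracy."""
    t = len(w)
    T_M = np.block([[P, S], [S, S]])
    Z_M = np.block([[C, S], [S, S]])
    a = np.concatenate([w, np.zeros(t)])       # a' = [w' 0']
    beta = np.linalg.solve(T_M, Z_M @ a)       # Eq. (4.7), avoids an explicit inverse
    sigma_I = np.sqrt(beta @ T_M @ beta)
    sigma_H = np.sqrt(a @ Z_M @ a)
    R_M = k_I * sigma_I                        # Eq. (4.8a)
    rho = sigma_I / sigma_H                    # Eq. (4.8b)
    return beta, R_M, rho

# Hypothetical two-trait inputs (illustrative values only).
C = np.array([[1.0, 0.3], [0.3, 0.8]])
S = np.array([[0.4, 0.1], [0.1, 0.3]])
P = np.array([[2.5, 0.5], [0.5, 2.0]])
w = np.array([1.0, 0.5])

beta, R_M, rho = optimal_lmsi(P, C, S, w, k_I=1.755)
print(beta, R_M, rho)
```

Solving the linear system with `np.linalg.solve` rather than forming the inverse of T_M is a numerical design choice of this sketch, not something prescribed by the theory.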

Let $ \mathbf{Q}={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M $; then, matrix Q can be written as

$$ \mathbf{Q}=\left[\begin{array}{cc}{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)& \mathbf{0}\\ {}\mathbf{I}-{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)& \mathbf{I}\end{array}\right], $$

(4.9)

whence β = Qa, and as $ {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] $, we can write the two vectors of $ {\boldsymbol{\upbeta}}^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_s^{\prime}\right] $ as

$$ {\boldsymbol{\upbeta}}_y={\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\mathbf{w}\kern1em \mathrm{and}\kern1em {\boldsymbol{\upbeta}}_s=\left[\mathbf{I}-{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\right]\mathbf{w}. $$

(4.10a)

Another way of writing the marker score vector weights is

$$ {\boldsymbol{\upbeta}}_s=\mathbf{w}-{\boldsymbol{\upbeta}}_y, $$

(4.10b)

where β_y = (P − S)⁻¹(C − S)w. By Eq. (4.10b), the optimal LMSI can be written as

$$ {I}_M={\mathbf{w}}^{\prime}\mathbf{s}+{\boldsymbol{\upbeta}}_y^{\prime}\left(\mathbf{y}-\mathbf{s}\right). $$

(4.11)

Equation (4.11) indicates that, in practice, to estimate the optimal LMSI, we only need to estimate the vector of coefficients β_y. By Eq. (4.10a), Eq. (4.8a) can be written as

$$ {R}_M={k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{C}{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\mathbf{w}+{\mathbf{w}}^{\prime}\mathbf{S}\left[\mathbf{I}-{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\right]\mathbf{w}}. $$

(4.12)

Thus, by Eqs. (4.10a) and (4.12), when S is a null matrix, vector β_y is equal to β_y = P⁻¹Cw = b and $ {R}_M={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={R}_I $, which are the LPSI vector of coefficients and its selection response respectively.
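The block form of the weights in Eqs. (4.10a) and (4.10b) and the response in Eq. (4.12) can be checked numerically against the direct solution of Eq. (4.7). The sketch below uses hypothetical covariance matrices and economic weights, and it also computes the LPSI response and the limiting value k_I√(w′Cw) for comparison.

```python
import numpy as np

# Hypothetical two-trait inputs (illustrative values only).
C = np.array([[1.0, 0.3], [0.3, 0.8]])
S = np.array([[0.4, 0.1], [0.1, 0.3]])
P = np.array([[2.5, 0.5], [0.5, 2.0]])
w = np.array([1.0, 0.5])
k_I = 1.755

# Block form of the optimal weights.
beta_y = np.linalg.solve(P - S, (C - S) @ w)     # Eq. (4.10a)
beta_s = w - beta_y                              # Eq. (4.10b)

# Maximized LMSI response written as in Eq. (4.12).
R_M = k_I * np.sqrt(w @ C @ beta_y + w @ S @ beta_s)

# Direct solution of Eq. (4.7) for comparison.
T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros(2)])
beta_direct = np.linalg.solve(T_M, Z_M @ a)

# LPSI (S = 0) and the limiting case S = C.
b = np.linalg.solve(P, C @ w)
R_I = k_I * np.sqrt(b @ P @ b)
R_max = k_I * np.sqrt(w @ C @ w)

print(beta_y, beta_s)
print(R_I, R_M, R_max)
```

For these inputs the three responses are ordered as R_I ≤ R_M ≤ k_I√(w′Cw), as expected when S lies between the null matrix and C.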

Assume that when the number of markers and genotypes tend to infinity, S tends to C; then, at the limit, we can suppose that S = C, and by this latter result, R_M is equal to

$$ {k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}. $$

(4.13)

That is, Eq. (4.13) is the maximum value of the LMSI selection response when the numbers of markers and genotypes tend to infinity. Thus, the possible LMSI selection response values of Eq. (4.12) should be between $ {k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} $ and $ {k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}} $, i.e.,

$$ {k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}\le {R}_M\le {k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}, $$

(4.14)

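or, after dividing by the LPSI selection response $ {R}_I={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} $, the ratio of R_M to R_I lies between 1 and $ \frac{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}}{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}=\frac{\sigma_H}{\sigma_I} $, that is,

$$ 1\le \frac{R_M}{R_I}\le \frac{\sigma_H}{\sigma_I}. $$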

(4.15)

Note that $ \frac{\sigma_H}{\sigma_I}=\frac{1}{\rho_{HI}} $, where ρ_HI is the maximized correlation between the net genetic merit (H) and the LPSI (I) described in Chap. 2. Equation (4.15) therefore indicates that the upper bound of LMSI efficiency relative to the LPSI tends to infinity when the ρ_HI value tends to zero; this is an additional way of denoting the paradox of LMSI efficiency described by Knapp (1998).

4.1.4 The LMSI for One Trait

For the one-trait case, matrices T_M, Z_M, and Q can be written as

$$ {\mathbf{T}}_M=\left[\begin{array}{cc}{\sigma}_y^2& {\sigma}_s^2\\ {}{\sigma}_s^2& {\sigma}_s^2\end{array}\right],\kern0.5em {\mathbf{Z}}_M=\left[\begin{array}{cc}{\sigma}_g^2& {\sigma}_s^2\\ {}{\sigma}_s^2& {\sigma}_s^2\end{array}\right]\kern1em \mathrm{and}\kern1em \mathbf{Q}=\left[\begin{array}{cc}\frac{\sigma_g^2-{\sigma}_s^2}{\sigma_y^2-{\sigma}_s^2}& 0\\ {}\frac{\sigma_y^2-{\sigma}_g^2}{\sigma_y^2-{\sigma}_s^2}& 1\end{array}\right], $$

(4.16)

where $ {\sigma}_y^2 $, $ {\sigma}_g^2 $, and $ {\sigma}_s^2 $ are the phenotypic, genetic, and marker score variances respectively. By Eqs. (4.10a) and (4.10b), when $ {\mathbf{a}}^{\prime }=\left[1\kern0.5em 0\right] $, the elements of vector β = Qa are

$$ {\beta}_y=\frac{\sigma_g^2-{\sigma}_s^2}{\sigma_y^2-{\sigma}_s^2}\kern1em \mathrm{and}\kern1em {\beta}_s=1-{\beta}_y, $$

(4.17a)

whence the optimal LMSI can be written as

$$ {I}_M=s+{\beta}_y\left(y-s\right); $$

(4.17b)

whereas by Eq. (4.12), the maximized LMSI selection response can be written as

$$ {R}_M={k}_I\sqrt{\frac{\sigma_g^2\left({\sigma}_g^2-{\sigma}_s^2\right)+{\sigma}_s^2\left({\sigma}_y^2-{\sigma}_g^2\right)}{\sigma_y^2-{\sigma}_s^2}}. $$

(4.18)

When $ {\sigma}_s^2=0 $, $ {\beta}_y=\frac{\sigma_g^2}{\sigma_y^2}={h}^2 $, I_M = h²y, and $ {R}_M=k\frac{\sigma_g^2}{\sigma_y}=k{\sigma}_y{h}^2=R $, the selection response for the one-trait case without markers.
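The one-trait formulas in Eqs. (4.17a) and (4.18) are simple enough to evaluate by hand, but a small function makes the limiting case easy to verify. This is a sketch with hypothetical variance values; the function name and the default k = 1.755 are assumptions made for illustration.

```python
import math

def lmsi_one_trait(var_y, var_g, var_s, k=1.755):
    """One-trait LMSI weights (Eq. 4.17a) and maximized response (Eq. 4.18)."""
    beta_y = (var_g - var_s) / (var_y - var_s)
    beta_s = 1.0 - beta_y
    numerator = var_g * (var_g - var_s) + var_s * (var_y - var_g)
    R_M = k * math.sqrt(numerator / (var_y - var_s))
    return beta_y, beta_s, R_M

# Hypothetical variances: phenotypic 2.0, genetic 0.8, marker score 0.3.
beta_y, beta_s, R_M = lmsi_one_trait(2.0, 0.8, 0.3)

# Without marker information the index reduces to phenotypic selection.
beta_y0, beta_s0, R0 = lmsi_one_trait(2.0, 0.8, 0.0)
h2 = 0.8 / 2.0
print(beta_y, beta_s, R_M)
print(beta_y0, R0, 1.755 * math.sqrt(2.0) * h2)
```

The second call reproduces β_y = h² and R_M = kσ_yh², and the first shows that adding marker information increases the response for the same variances.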

4.1.5 Efficiency of LMSI Versus LPSI Efficiency for One Trait

Suppose that the intensity of selection is the same in both indices; then, to compare LMSI versus LPSI efficiency for predicting the net genetic merit, we can use the ratio $ {\lambda}_M=\frac{\rho_{I_MH}}{\rho_{HI}}=\frac{R_M}{R_I} $ (Bulmer 1980; Moreau et al. 1998), where R_I is the maximized LPSI selection response. In percentage terms, the LMSI versus LPSI efficiency can be written as

$$ {p}_M=100\left({\lambda}_M-1\right). $$

(4.19)

When p_M = 0, the efficiency of both indices is the same; when p_M > 0, the efficiency of the LMSI is higher than that of the LPSI, and when p_M < 0, LPSI efficiency is higher than LMSI efficiency for predicting the net genetic merit.

In the case of one trait, Lande and Thompson (1990) showed that LMSI efficiency (not in percentage terms) with respect to phenotypic efficiency can be written as

$$ {\lambda}_M=\frac{R_M}{R}=\sqrt{\frac{q}{h^2}+\frac{{\left(1-q\right)}^2}{1-{qh}^2}}, $$

(4.20)

where R_M was defined in Eq. (4.18), R = kσ_yh², h² is the trait heritability, and $ q=\frac{\sigma_s^2}{\sigma_g^2} $ is the proportion of additive genetic variance explained by the markers. According to Eq. (4.20), the advantage of the LMSI over phenotypic selection increases as the population size increases and heritability decreases, because in such cases, $ q=\frac{\sigma_s^2}{\sigma_g^2} $ tends to 1 and Eq. (4.20) approaches $ \frac{1}{h} $. Therefore, the LMSI is most efficient for traits with low heritability and when the marker score explains a large proportion of the genetic variance. Note, however, that when h² tends to zero, $ \frac{1}{h} $ tends to infinity; that is, asymptotically, LMSI efficiency with respect to phenotypic efficiency for one trait (Eq. 4.20) tends to infinity, and this is the LMSI paradox pointed out by Knapp (1998). There are other problems associated with the LMSI: it increases the selection response only in the short term and can result in lower cumulative responses in the longer term than phenotypic selection, as the LMSI fixes the QTL at a faster rate than phenotypic selection. In addition, it requires the weights (Eq. 4.17a) to be updated in each generation, because the frequency of the QTL changes from one generation to the next (Dekkers and Settar 2004).
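The behaviour of Eq. (4.20) is easiest to see by tabulating it. The sketch below evaluates λ_M and the percentage efficiency p_M of Eq. (4.19) over a small grid; the grid values of h² and q are arbitrary choices for illustration.

```python
import math

def lmsi_efficiency(h2, q):
    """Efficiency of the one-trait LMSI relative to phenotypic selection, Eq. (4.20)."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

def percent_efficiency(lam):
    """Efficiency in percentage terms, Eq. (4.19)."""
    return 100.0 * (lam - 1.0)

# Arbitrary grid of heritabilities and proportions of variance explained.
for h2 in (0.1, 0.5, 0.9):
    for q in (0.0, 0.5, 1.0):
        lam = lmsi_efficiency(h2, q)
        print(f"h2={h2:.1f} q={q:.1f} lambda={lam:.3f} p_M={percent_efficiency(lam):.1f}%")
```

With q = 0 the ratio is 1 (no advantage), with q = 1 it equals 1/h, and for a fixed q it grows as h² decreases, which is the pattern behind the paradox discussed above.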

4.1.6 Statistical LMSI Properties

Assume that H and I_M have bivariate joint normal distribution, $ \boldsymbol{\upbeta} ={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M\mathbf{a} $, and that P, C, S, and w are known; then, the statistical LMSI properties are the same as the LPSI properties described in Chap. 2. That is,

1. $ {\sigma}_{I_M}^2={\sigma}_{HI_M} $: the variance of I_M ($ {\sigma}_{I_M}^2 $) and the covariance between H and I_M ($ {\sigma}_{HI_M} $) are the same.
2. The maximized correlation between H and I_M (or I_M accuracy) is $ {\rho}_{HI_M}=\frac{\sigma_{I_M}}{\sigma_H} $.
3. The variance of the predicted error, $ Var\left(H-{I}_M\right)=\left(1-{\rho}_{HI_M}^2\right){\sigma}_H^2 $, is minimal.
4. The total variance of H explained by I_M is $ {\sigma}_{I_M}^2={\rho}_{HI_M}^2{\sigma}_H^2 $.
5. The heritability of I_M is $ {\mathrm{h}}_{\mathrm{M}}^2=\frac{{\boldsymbol{\upbeta}}_M^{\prime }{\mathbf{Z}}_M{\boldsymbol{\upbeta}}_M}{{\boldsymbol{\upbeta}}_M^{\prime }{\mathbf{T}}_M{\boldsymbol{\upbeta}}_M} $.

Properties 1 to 4 are the same as LPSI properties 1 to 4, but, because the LMSI jointly incorporates the phenotypic and marker information to predict the net genetic merit, LMSI accuracy should be higher than LPSI accuracy. The same is true of the LMSI selection response and expected genetic gain per trait when compared with the LPSI selection response and expected genetic gain per trait.
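Properties 1 to 3 follow directly from Eq. (4.7). As a short worked step, using only the symbols already defined and assuming that T_M is non-singular, substituting $ \boldsymbol{\upbeta} ={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M\mathbf{a} $ into the variance of I_M gives

$$ {\sigma}_{I_M}^2={\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta} ={\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M{\mathbf{T}}_M^{-1}{\mathbf{Z}}_M\mathbf{a}={\boldsymbol{\upbeta}}^{\prime }{\mathbf{Z}}_M\mathbf{a}={\sigma}_{HI_M}, $$

so that

$$ {\rho}_{HI_M}=\frac{\sigma_{HI_M}}{\sigma_H{\sigma}_{I_M}}=\frac{\sigma_{I_M}}{\sigma_H}\kern1em \mathrm{and}\kern1em Var\left(H-{I}_M\right)={\sigma}_H^2-2{\sigma}_{HI_M}+{\sigma}_{I_M}^2={\sigma}_H^2-{\sigma}_{I_M}^2=\left(1-{\rho}_{HI_M}^2\right){\sigma}_H^2. $$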

4.2 The Genome-Wide Linear Selection Index

The genome-wide linear marker selection index (GW-LMSI) is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit. In a similar manner to the LMSI, the GW-LMSI exploits the linkage disequilibrium between markers and the QTL produced when inbred lines are crossed.

4.2.1 The GW-LMSI Parameters

In a similar manner to the LPSI, the main objective of the GW-LMSI is to predict the net genetic merit values of each individual and select the best individuals for further breeding. In the GW-LMSI context, the net genetic merit can be written as

$$ H={\mathbf{w}}^{\prime}\mathbf{g}+{\mathbf{w}}_2^{\prime}\mathbf{m}=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right]\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{m}\end{array}\right]={\mathbf{a}}_W^{\prime }{\mathbf{z}}_W, $$

(4.21)

where $ {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em \dots \kern0.5em {g}_t\right] $ (t = number of traits) is the vector of breeding values, $ {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em \cdots \kern0.5em {w}_t\right] $ is the vector of economic weights associated with the breeding values, and $ {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_m\right] $ is a null vector associated with the coded values of the markers $ {\mathbf{m}}^{\prime }=\left[{m}_1\kern0.5em \cdots \kern0.5em {m}_m\right] $, where m_j (j = 1, 2, …, m = number of markers) is the jth marker in the training population; $ {\mathbf{a}}_W^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right] $ and $ {\mathbf{z}}_W^{\prime }=\left[{\mathbf{g}}^{\prime}\kern0.5em {\mathbf{m}}^{\prime}\right] $.

The GW-LMSI (I_W) combines the phenotypic value and the molecular information linked to the individual traits to predict H values in each selection cycle. It can be written as

$$ {I}_W={\boldsymbol{\upbeta}}_y^{\prime}\mathbf{y}+{\boldsymbol{\upbeta}}_m^{\prime}\mathbf{m}=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_m^{\prime}\right]\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{m}\end{array}\right]={\boldsymbol{\upbeta}}_W^{\prime }{\mathbf{t}}_W, $$

(4.22)

where $ {\boldsymbol{\upbeta}}_y^{\prime } $ and β_m are vectors of phenotypic and marker weights respectively; $ {\mathbf{y}}^{\prime }=\left[{y}_1\kern0.5em \cdots \kern0.5em {y}_t\right] $ is the vector of phenotypic values and m was defined in Eq. (4.21); $ {\boldsymbol{\upbeta}}_W^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_m^{\prime}\right] $ and $ {\mathbf{t}}_W^{\prime }=\left[{\mathbf{y}}^{\prime}\kern0.5em {\mathbf{m}}^{\prime}\right] $.

The GW-LMSI selection response can be written as

$$ {R}_W={k}_I{\sigma}_H{\rho}_{I_WH}={k}_I{\sigma}_H\frac{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W}{\sqrt{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W}\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}}, $$

(4.23a)

where k_I is the standardized selection differential of the GW-LMSI, $ {\sigma}_H^2={\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W $ and $ Var\left({I}_W\right)={\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W $ are the variances of H and I_W respectively, whereas $ {\rho}_{I_WH}=\frac{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W}{\sqrt{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W}\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}} $ and $ {\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W $ are the correlation and the covariance between H and I_W respectively; $ \boldsymbol{\Phi} = Var\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{m}\end{array}\right]=\left[\begin{array}{cc}\mathbf{P}& {\mathbf{W}}^{\prime}\\ {}\mathbf{W}& \mathbf{M}\end{array}\right] $ and $ \boldsymbol{\Psi} = Var\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{m}\end{array}\right]=\left[\begin{array}{cc}\mathbf{C}& {\mathbf{W}}^{\prime}\\ {}\mathbf{W}& \mathbf{M}\end{array}\right] $ are block covariance matrices, where P = Var(y), M = Var(m), and C = Var(g) are the covariance matrices of the phenotypic values (y), the coded values of the molecular markers (m), and the genetic values (g) respectively, whereas W = Cov(y, m) = Cov(g, m) is the covariance matrix between y and m, and between g and m. The size of matrices P and C is t × t, but the sizes of matrices M and W are m × m and m × t respectively.

From a theoretical point of view, Crossa and Cerón-Rojas (2011) showed that matrix M can be written as

$$ \mathbf{M}=\left[\begin{array}{cccc}1& \left(1-2{\delta}_{12}\right)& \cdots & \left(1-2{\delta}_{1m}\right)\\ {}\left(1-2{\delta}_{21}\right)& 1& \cdots & \left(1-2{\delta}_{2m}\right)\\ {}\vdots & \vdots & \ddots & \vdots \\ {}\left(1-2{\delta}_{m1}\right)& \left(1-2{\delta}_{m2}\right)& \cdots & 1\end{array}\right], $$

(4.23b)

where (1 − 2δ_ij) is the covariance (or correlation) and δ_ij the recombination frequency between the ith and jth marker (i, j = 1, 2, …, m = number of markers). According to Crossa and Cerón-Rojas (2011), matrix W can be written as

$$ \mathbf{W}=\left[\begin{array}{cccc}\left(1-2{r}_{11}\right){\alpha}_{11}& \left(1-2{r}_{11}\right){\alpha}_{12}& \cdots & \left(1-2{r}_{1N}\right){\alpha}_{1{N}_Q}\\ {}\left(1-2{r}_{21}\right){\alpha}_{21}& \left(1-2{r}_{22}\right){\alpha}_{22}& \cdots & \left(1-2{r}_{2N}\right){\alpha}_{2{N}_Q}\\ {}\vdots & \vdots & \ddots & \vdots \\ {}\left(1-2{r}_{t1}\right){\alpha}_{t1}& \left(1-2{r}_{N2}\right){\alpha}_{t2}& \cdots & \left(1-2{r}_{NN}\right){\alpha}_{tN_Q}\end{array}\right], $$

(4.23c)

where (1 − 2r_ik)α_qk (i = 1, 2, …, m, k = 1, 2, …, N_Q = number of QTL, q = 1, 2, …, t) is the covariance between the qth trait and the ith marker; r_ik is the recombination frequency between the ith marker and the kth QTL; and α_qk is the effect of the kth QTL over the qth trait.
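The structure of matrix M in Eq. (4.23b) can be illustrated with a few lines of code: each off-diagonal entry is 1 − 2δ_ij and the diagonal is 1. The recombination frequencies below are hypothetical, and the function name `marker_covariance` is an assumption of this sketch.

```python
import numpy as np

def marker_covariance(delta):
    """Build M from a symmetric matrix of pairwise recombination frequencies."""
    delta = np.asarray(delta, dtype=float)
    M = 1.0 - 2.0 * delta          # off-diagonal entries: 1 - 2 * delta_ij
    np.fill_diagonal(M, 1.0)       # a marker has unit covariance with itself
    return M

# Hypothetical recombination frequencies for three markers:
# markers 1 and 2 are linked (0.1); marker 3 is unlinked to both (0.5).
delta = np.array([[0.0, 0.1, 0.5],
                  [0.1, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])

M = marker_covariance(delta)
print(M)
```

Unlinked markers (δ_ij = 0.5) have zero covariance, whereas tightly linked markers have covariance close to 1, which is what can make M nearly singular when many markers are densely spaced.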

The GW-LMSI expected genetic gain per trait can be written as

$$ {\mathbf{E}}_{LW}={k}_I\frac{{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W}{\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}}. $$

(4.24)

All parameters in Eq. (4.24) were previously defined.

Matrix Φ may be singular, i.e., its inverse (Φ⁻¹) may not exist, because matrix W is singular. Suppose that matrices Φ and Ψ are known; then, according to the LPSI theory, the GW-LMSI vector of coefficients (β_W) that maximizes $ {\rho}_{I_WH} $ can be written as

$$ {\boldsymbol{\upbeta}}_W={\boldsymbol{\Phi}}^{-}{\boldsymbol{\Psi} \mathbf{a}}_W, $$

(4.25a)

where matrix Φ⁻ denotes a generalized inverse of Φ. By Eq. (4.25a), the maximized GW-LMSI selection response is

$$ {R}_W={k}_I\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}. $$

(4.25b)

Equations (4.25a) and (4.25b) show that the GW-LMSI is a direct application of the LPSI to MAS. By Eq. (4.25a), the maximized correlation between H and I_W is

$$ {\rho}_{I_WH}=\frac{\sigma_{I_W}}{\sigma_H}, $$

(4.25c)

where $ {\sigma}_{I_W}=\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W} $ is the standard deviation of I_W and $ {\sigma}_H=\sqrt{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W} $ is the standard deviation of H.
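Equations (4.25a) to (4.25c) can be sketched as follows. The example assumes a hypothetical data-generating structure in which the genetic values are a linear function of the markers plus a genetic remainder, so that Cov(y, m) = Cov(g, m) holds by construction; the matrices A, U, and E are illustrative assumptions, not quantities defined in the chapter. The Moore-Penrose pseudoinverse is used as one possible generalized inverse, since the text does not specify which one to take.

```python
import numpy as np

# Hypothetical example with t = 2 traits and m = 3 markers.
M = np.array([[1.0, 0.8, 0.6],
              [0.8, 1.0, 0.8],
              [0.6, 0.8, 1.0]])        # Var(m), structured as in Eq. (4.23b)
A = np.array([[0.5, 0.0, 0.2],
              [0.0, 0.4, 0.1]])        # assumed marker effects on each trait (t x m)
U = np.diag([0.3, 0.2])                # genetic covariance not captured by markers
E = np.diag([1.5, 1.2])                # residual covariance

W = M @ A.T                            # Cov(m, g) = Cov(m, y), size m x t
C = A @ M @ A.T + U                    # Var(g)
P = C + E                              # Var(y)
w = np.array([1.0, 0.5])               # economic weights (assumed)
k_I = 1.755

Phi = np.block([[P, W.T], [W, M]])
Psi = np.block([[C, W.T], [W, M]])
a_W = np.concatenate([w, np.zeros(M.shape[0])])

beta_W = np.linalg.pinv(Phi) @ Psi @ a_W       # Eq. (4.25a)
sigma_I = np.sqrt(beta_W @ Phi @ beta_W)
sigma_H = np.sqrt(a_W @ Psi @ a_W)
R_W = k_I * sigma_I                            # Eq. (4.25b)
rho_W = sigma_I / sigma_H                      # Eq. (4.25c)

# LPSI response for comparison (no marker information).
b = np.linalg.solve(P, C @ w)
R_I = k_I * np.sqrt(b @ P @ b)
print(beta_W, R_W, rho_W, R_I)
```

In this example Φ happens to be non-singular, so the pseudoinverse coincides with the ordinary inverse; the same call also returns a valid generalized inverse when Φ is singular.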

4.2.2 Relationship Between the GW-LMSI and the LPSI

Matrix Φ⁻ can be written as

$$ {\boldsymbol{\Phi}}^{-}=\left[\begin{array}{cc}{\mathbf{L}}^{-}& -{\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\\ {}-{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}& {\mathbf{M}}^{-}+{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\end{array}\right], $$

(4.26)

where L⁻ is a generalized inverse of matrix L = P − W^′M⁻W, and M⁻ is a generalized inverse of matrix M. In matrix Φ⁻, the inverse of matrix W is not required and the standard inverse of matrix M (M⁻¹) may exist. In the latter case, the standard inverse of matrix L (L⁻¹) exists and can be written as L⁻¹ = (P − W^′M⁻¹W)⁻¹ = P⁻¹ + P⁻¹W^′[M − WP⁻¹W^′]⁻¹WP⁻¹ (Searle et al. 2006).

By Eq. (4.26) and because $ {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_m\right] $, the vector components of $ {\boldsymbol{\upbeta}}_W^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_m^{\prime}\right] $, or β_W = Φ⁻Ψa_W, can be written as

$$ {\boldsymbol{\upbeta}}_y=\left[{\mathbf{L}}^{-}\mathbf{C}-{\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\mathbf{W}\right]\mathbf{w} $$

(4.27)

and

$$ {\boldsymbol{\upbeta}}_m=\left[\left({\mathbf{M}}^{-}+{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\right)\mathbf{W}-{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}\mathbf{C}\right]\mathbf{w}, $$

(4.28)

where w is the vector of economic weights. Suppose that there is no marker information; then, matrices M and W are null and Eq. (4.27) is equal to β_y = P⁻¹Cw = b (the LPSI vector of coefficients), whereas β_m = 0 and $ {R}_W={k}_I\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={R}_I $, the LPSI selection response. Now suppose that the markers explain all the genetic variability; in this case, β_y = 0 and β_m = (X^′X)⁻X^′Y, the matrix of linear regression coefficients in the multivariate context, where (X^′X)⁻ is a generalized inverse matrix of X^′X and Y is a matrix of phenotypic observations.
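The partitioned expressions in Eqs. (4.27) and (4.28) can be verified against the direct solution β_W = Φ⁻Ψa_W, and the limiting case in which the markers explain all the genetic variability can be checked numerically. The sketch below reuses a hypothetical structure in which g is linear in the markers (effects A) plus a remainder with covariance U; these names and values are assumptions for illustration, and the pseudoinverse stands in for the generalized inverse.

```python
import numpy as np

def gw_lmsi_blocks(P, C, M, W, w):
    """Phenotypic and marker weights from Eqs. (4.27) and (4.28)."""
    M_inv = np.linalg.pinv(M)
    L_inv = np.linalg.pinv(P - W.T @ M_inv @ W)          # L = P - W' M^- W
    beta_y = (L_inv @ C - L_inv @ W.T @ M_inv @ W) @ w   # Eq. (4.27)
    beta_m = ((M_inv + M_inv @ W @ L_inv @ W.T @ M_inv) @ W
              - M_inv @ W @ L_inv @ C) @ w               # Eq. (4.28)
    return beta_y, beta_m

M = np.array([[1.0, 0.8, 0.6], [0.8, 1.0, 0.8], [0.6, 0.8, 1.0]])
A = np.array([[0.5, 0.0, 0.2], [0.0, 0.4, 0.1]])   # assumed marker effects (t x m)
E = np.diag([1.5, 1.2])                            # residual covariance
w = np.array([1.0, 0.5])
W = M @ A.T                                        # m x t

# Case 1: markers explain only part of the genetic variance.
U = np.diag([0.3, 0.2])
C = A @ M @ A.T + U
P = C + E
beta_y, beta_m = gw_lmsi_blocks(P, C, M, W, w)
Phi = np.block([[P, W.T], [W, M]])
Psi = np.block([[C, W.T], [W, M]])
a_W = np.concatenate([w, np.zeros(3)])
beta_direct = np.linalg.pinv(Phi) @ Psi @ a_W

# Case 2: markers explain all the genetic variance (U = 0).
C0 = A @ M @ A.T
P0 = C0 + E
beta_y0, beta_m0 = gw_lmsi_blocks(P0, C0, M, W, w)
print(beta_y, beta_m)
print(beta_y0, beta_m0)
```

In the second case the phenotypic weights vanish and the marker weights reduce to A′w, the economic-weighted marker effects, which is the population analogue of the regression coefficients mentioned above.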

4.2.3 Statistical Properties of GW-LMSI

Assume that H and I_W have bivariate joint normal distribution, β_W = Φ⁻Ψa_W, and P, C, M, W, and w are known; then, the statistical GW-LMSI properties are the same as the LMSI properties. That is,

1. $ {\sigma}_{I_W}^2={\sigma}_{HI_W} $, i.e., the variance of I_W ($ {\sigma}_{I_W}^2 $) and the covariance between H and I_W ($ {\sigma}_{HI_W} $) are the same.
2. The maximized correlation between H and I_W, or I_W accuracy, is $ {\rho}_{HI_W}=\frac{\sigma_{I_W}}{\sigma_H} $.
3. The variance of the predicted error, $ Var\left(H-{I}_W\right)=\left(1-{\rho}_{HI_W}^2\right){\sigma}_H^2 $, is minimal.
4. The total variance of H explained by I_W is $ {\sigma}_{I_W}^2={\rho}_{HI_W}^2{\sigma}_H^2 $.

According to Lange and Whittaker (2001), GW-LMSI efficiency should be greater than LMSI efficiency. However, this would be true only if matrices P, C, M, and W are known and trait heritability is very low.

4.3 Estimating the LMSI Parameters

When covariance matrices P, C, and S, and the vector of economic weights (w) are known, there is no error in the estimation of the LMSI parameters (selection response, expected genetic gain, etc.); the same is true for the GW-LMSI when, in addition to P, C, and w, the covariance matrices M and W are known. In such cases, the relative efficiency of the LMSI (GW-LMSI) depends only on the heritability of the traits and on the portion of phenotypic variation associated with markers. Using simulated data, Lange and Whittaker (2001) found that GW-LMSI efficiency was higher than LMSI efficiency when trait heritability was 0.2 and matrices P, C, M, and W were known. When P, C, S, M, and W are unknown, it is necessary to estimate them; then, the LMSI and GW-LMSI vectors of coefficients and the effects associated with markers are estimated with some error. This error leads to lower LMSI and GW-LMSI efficiency than expected under the assumption that the parameters are known; however, even when the parameters were estimated, Lange and Whittaker (2001) found that GW-LMSI efficiency was greater than that of the LMSI when trait heritability was 0.05. Moreover, in the LMSI there is additional bias in the estimation of the parameters because only markers with significant effects are included in the index (Moreau et al. 1998).

In Chap. 2, we described the restricted maximum likelihood (REML) method for estimating matrices P and C. Some authors (Lande and Thompson 1990; Charcosset and Gallais 1996; Hospital et al. 1997; Moreau et al. 1998, 2007) have described methods for estimating marker scores, the variance of the marker scores, the LMSI vector of coefficients, etc., in the context of one trait; however, up to now there have been no reports on the estimation of matrix S in the multi-trait case. Lange and Whittaker (2001) only indicated that matrix S can be estimated as $ \widehat{\mathbf{S}}= Var\left(\widehat{\mathbf{s}}\right) $, where $ \widehat{\mathbf{s}} $ is a vector of estimated marker scores associated with several individual traits.

The main problems associated with the estimated LMSI parameters are:

1. The estimated values of the covariance matrix S ($ \widehat{\mathbf{S}} $) tend to overestimate the genetic covariance matrix (C).
2. The estimated variances of the marker scores can be negative.

When the first point is true, the estimated LMSI selection response and efficiency could be negative because the estimated matrix $ {\widehat{\mathbf{T}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{P}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] $ is not positive definite (all eigenvalues positive) and the estimated matrix $ {\widehat{\mathbf{Z}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{C}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] $ is not positive semi-definite (no negative eigenvalues). In addition, the results can lead to all weights being placed on the molecular score and the weights on the phenotype values can be negative (Moreau et al. 2007). When the second point is true, the variance of the marker scores is not useful. The two problems indicated above could be caused by using the same data set to select markers and to estimate marker effects, and there is no simple way of solving them. Lande and Thompson (1990) proposed that the markers used to obtain $ \widehat{\mathbf{S}} $ be selected a priori as those with the most highly significant partial regression coefficients from among all the markers in the linkage group analyzed in the previous generation. Zhang and Smith (1992, 1993) proposed using two independent sets of markers: one to estimate marker effects and the other to select markers. Additional solutions to these problems were described by Moreau et al. (2007).
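A simple diagnostic for the first problem is to inspect the eigenvalues of the estimated block matrices before computing the index. The one-trait sketch below uses hypothetical estimates in which the marker-score variance exceeds the estimated genetic variance; the tolerance and the helper name `min_eigenvalue` are assumptions of this example.

```python
import numpy as np

def min_eigenvalue(A):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(A).min()

# Hypothetical one-trait estimates: the marker-score variance (0.7)
# overestimates the genetic variance (0.5).
var_y, var_g, var_s = 2.0, 0.5, 0.7

T_hat = np.array([[var_y, var_s], [var_s, var_s]])
Z_hat = np.array([[var_g, var_s], [var_s, var_s]])

T_is_pd = min_eigenvalue(T_hat) > 0.0          # positive definite?
Z_is_psd = min_eigenvalue(Z_hat) >= -1e-12     # positive semi-definite?

beta_y = (var_g - var_s) / (var_y - var_s)     # Eq. (4.17a) with estimates
beta_s = 1.0 - beta_y
print(T_is_pd, Z_is_psd, beta_y, beta_s)
```

With these values the estimated Z_M has a negative eigenvalue, the phenotypic weight is negative, and the marker-score weight exceeds 1, which reproduces the symptom described by Moreau et al. (2007).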

In this subsection, we describe methods (in the univariate and multivariate context) for estimating molecular marker effects, marker scores, and their variance and covariance, and for estimating the LMSI and GW-LMSI vector of coefficients, selection response, expected genetic gain, and accuracy. This subsection is only for illustration; we use the same data set to select markers, and to estimate marker effects and the variance of marker scores.

4.3.1 Estimating the Marker Score

According to Eqs. (4.11) and (4.17b), when the vector of economic weights is equal to $ {\mathbf{a}}^{\prime }=\left[1\kern0.5em 0\right] $, the LMSI for the ith trait y_i (i = 1, 2, ⋯, t; t = number of traits) value can be written as $ {I}_{M_{li}}\kern0.5em =\kern0.5em {s}_i+{\beta}_{y_i}\left({y}_i-{s}_i\right) $ (l = 1, 2, ⋯, n; n = number of individuals or genotypes), where $ {\beta}_{yi}=\frac{\sigma_{g_i}^2-{\sigma}_{s_i}^2}{\sigma_{y_i}^2-{\sigma}_{s_i}^2}=\frac{h_i^2\left(1-{q}_i\right)}{1-{q}_i{h}_i^2} $ is the LMSI coefficient, $ {h}_i^2=\frac{\sigma_{g_i}^2}{\sigma_{y_i}^2} $ is the heritability of the ith trait, and $ {q}_i=\frac{\sigma_{s_i}^2}{\sigma_{g_i}^2} $ is the proportion of genetic variance explained by the QTL or markers associated with the ith trait; $ {s}_i=\sum \limits_{j=1}^M{\theta}_j{x}_j $ (j = 1, 2, ⋯, M; M = number of selected markers) is the ith individual trait marker score; and $ {\sigma}_{y_i}^2 $, $ {\sigma}_{g_i}^2 $, and $ {\sigma}_{s_i}^2 $ are the ith variances of the phenotypic, genetic, and marker score values respectively.

The simplest way of estimating the ith marker score s_i is to perform a multiple linear regression of phenotypic values (y_i) on the coded values of the markers (x_j) and then select the markers statistically linked to the ith QTL that explain most of the variability in the regression model and use them to construct $ {s}_i=\sum \limits_{j\in M}{\theta}_j{x}_j $.

We can fit the model $ {y}_i^{\ast }=\sum \limits_{j\in M}{\theta}_j{x}_j+e $, where $ {y}_i^{\ast }={y}_i-{\overline{y}}_i $ and $ {\overline{y}}_i $ is the average value of the ith trait, by maximum likelihood or least squares. When estimating θ_j, the main problem is to choose the set of markers M based on criteria for declaring markers as significant and then use the estimated values of θ_j ($ {\widehat{\theta}}_j $) to estimate the ith marker score s_i as $ {\widehat{s}}_i=\sum \limits_{j\in M}{\widehat{\theta}}_j{x}_j $. The values of $ {\widehat{s}}_i $ may increase or decrease according to the number of markers (x_j) included in the model, and $ {\widehat{s}}_i $ affects LMSI selection response and efficiency by means of the estimated variance of $ {\widehat{s}}_i $ ($ {\widehat{\sigma}}_{{\widehat{s}}_i}^2 $) (Figs. 4.1 and 4.2).

According to the least squares method of estimation, $ \widehat{\boldsymbol{\uptheta}}={\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime }{\mathbf{y}}^{\ast } $ is an estimator of the vector of regression coefficients $ {\boldsymbol{\uptheta}}^{\prime }=\left[{\theta}_1\kern0.5em {\theta}_2\kern0.5em \cdots \kern0.5em {\theta}_m\right] $, where m (m < n) is the number of markers, X is a matrix n × m of coded marker values (e.g., 1, 0 and −1 for marker genotypes AA, Aa, and aa respectively) and y^∗ is a vector n × 1 of phenotypic values centered based on its average values. Only a subset M(M < m) of the m markers is statistically linked to the QTL and then only a subset M of the estimated vector $ \widehat{\boldsymbol{\uptheta}} $ values is selected to estimate s_i as $ {\widehat{s}}_i=\sum \limits_{j=1}^M{\widehat{\theta}}_j{x}_j $.

To illustrate how to obtain $ {\widehat{s}}_i=\sum \limits_{j\in M}{\widehat{\theta}}_j{x}_j $, we use a real maize (Zea mays) F₂ population with 247 genotypes (each one with two repetitions), 195 molecular markers, and four traits – grain yield (GY, ton ha⁻¹); plant height (PHT, cm), ear height (EHT, cm), and anthesis day (AD, days) – evaluated in one environment. In an F₂ population, the marker homozygous loci for the allele from the first parental line can be coded by 1, whereas the marker homozygous loci for the allele from the second parental line can be coded by −1, and the marker heterozygous loci by 0.

For this example, we used trait PHT. Only seven markers were statistically linked to the PHT. The estimated vector of regression coefficients for these seven markers was $ \widehat{{\boldsymbol{\uptheta}}^{\prime }}=\left[5.46\kern0.5em -4.54\kern0.5em 0.98\kern0.5em 7.39\kern0.5em -7.75\kern0.5em -1.91\kern0.5em -3.53\right] $. Table 4.1 presents the first 20 genotypes, the coded values of the seven selected markers, and the first 20 estimated $ {\widehat{s}}_{PHT} $ values of the 247 genotypes in the maize (Zea mays) F₂ population. According to $ \widehat{{\boldsymbol{\uptheta}}^{\prime }} $ and the coded values of the seven markers, the first estimated $ {\widehat{s}}_{PHT} $ value was obtained as $ {\widehat{s}}_{PHT1}=\left(-1.91\right)(1)+\left(-3.53\right)\left(-1\right)=1.62 $; the second estimated $ {\widehat{s}}_{PHT} $ value was obtained as $ {\widehat{s}}_{PHT2}=5.46\left(-1\right)+\left(-4.54\right)\left(-1\right)+\left(-1.91\right)\left(-1\right)=0.99 $, etc. The 20th estimated $ {\widehat{s}}_{PHT} $ value was obtained as $ {\widehat{s}}_{PHT20}=\left(-3.53\right)\left(-1\right)=3.53 $. This estimation procedure is valid for any number of genotypes and markers.
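The marker-score arithmetic above can be reproduced with a short Python sketch. The coefficient vector is the one reported in the text; the three rows of coded marker values are an assumption, inferred from the non-zero terms shown in the three worked sums rather than copied from Table 4.1.

```python
import numpy as np

# Estimated regression coefficients of the seven selected markers (from the text).
theta_hat = np.array([5.46, -4.54, 0.98, 7.39, -7.75, -1.91, -3.53])

# Coded marker values (1, 0, -1) for genotypes 1, 2, and 20.
# ASSUMPTION: rows inferred from the worked sums in the text, not taken from Table 4.1.
X = np.array([
    [ 0,  0, 0, 0, 0,  1, -1],   # genotype 1
    [-1, -1, 0, 0, 0, -1,  0],   # genotype 2
    [ 0,  0, 0, 0, 0,  0, -1],   # genotype 20
])

# Marker score of each genotype: s_hat = sum_j theta_hat_j * x_j.
s_hat = X @ theta_hat
print(np.round(s_hat, 2))
```

The same matrix product applies unchanged to all 247 genotypes once the full matrix of coded values is available.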

Table 4.1 Number of selected genotypes, coded values of seven selected markers, and estimated marker score values obtained from a maize (Zea mays) F₂ population with 247 genotypes and 195 molecular markers

Figure 4.3 shows the distribution of the 247 estimated marker scores associated with traits PHT and EHT of the maize F₂ population. Note that the estimated marker score values are approximately normally distributed.

4.3.2 Estimating the Variance of the Marker Score

There are many methods of estimating the variance of the marker score associated with the ith trait ($ {\sigma}_{s_i}^2 $); the first one was proposed by Lande and Thompson (1990). According to these authors, $ {\sigma}_{s_i}^2 $ can be estimated as

$$ {\widehat{\sigma}}_{{\widehat{s}}_i}^2={\widehat{\boldsymbol{\uptheta}}}_i^{\prime }{\mathbf{M}}_i{\widehat{\boldsymbol{\uptheta}}}_i-\frac{M{\widehat{\sigma}}_{e_i}^2}{n}, $$

(4.29)

where $ {\widehat{\boldsymbol{\uptheta}}}_i $ is the estimated vector of regression coefficients of the selected markers, $ {\mathbf{M}}_i=\frac{2}{n}{\mathbf{X}}_i^{\prime }{\mathbf{X}}_i $ is the M × M covariance matrix of the selected markers that are statistically linked to the ith trait marker loci; $ {\widehat{\sigma}}_{e_i}^2=\frac{{\mathbf{y}}^{\prime}\left(\mathbf{I}-\mathbf{H}\right)\mathbf{y}}{n-M-1} $ is the unbiased estimated variance of the residuals, $ \mathbf{H}={\mathbf{X}}_i{\left({\mathbf{X}}_i^{\prime }{\mathbf{X}}_i\right)}^{-1}{\mathbf{X}}_i^{\prime } $ is the projection (hat) matrix, so that $ \left(\mathbf{I}-\mathbf{H}\right)\mathbf{y} $ is the vector of residuals, I is an identity matrix n × n, M is the number of selected markers statistically linked to the QTL, and X_i is a matrix n × M with the coded values of the selected markers. According to Lande and Thompson (1990), Eq. (4.29) is an unbiased estimator of $ {\sigma}_{s_i}^2 $ and its variance can be written as

$$ Var\left({\widehat{\sigma}}_{{\widehat{s}}_i}^2\right)=\frac{4{\sigma}_{s_i}^2{\sigma}_{e_i}^2}{n}+\frac{2M{\left({\sigma}_{e_i}^2\right)}^2}{n^2}+\frac{2{M}^2{\left({\sigma}_{e_i}^2\right)}^2}{n^2\left(n-M\right)}, $$

(4.30)

which tends to zero when n, the number of genotypes or individuals, is very high.
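As an illustration of Eq. (4.29), the following Python sketch computes the Lande and Thompson (1990) estimate on simulated data. The function name, the simulated markers, and the effect sizes are hypothetical; the residual variance is taken from the least-squares residuals of the centered phenotypes, and the marker covariance matrix uses the factor 2/n exactly as defined in the text.

```python
import numpy as np

def lande_thompson_variance(X, y):
    """Sketch of Eq. (4.29): theta' M theta - (n_sel * sigma2_e) / n."""
    n, n_sel = X.shape                        # genotypes, selected markers
    y_c = y - y.mean()                        # centered phenotypes (y* in the text)
    theta_hat, *_ = np.linalg.lstsq(X, y_c, rcond=None)
    resid = y_c - X @ theta_hat               # least-squares residuals
    sigma2_e = resid @ resid / (n - n_sel - 1)
    M_cov = 2.0 / n * X.T @ X                 # marker covariance matrix as in the text
    var_s = theta_hat @ M_cov @ theta_hat - n_sel * sigma2_e / n
    return var_s, sigma2_e

# Hypothetical F2-like data: 247 genotypes, 7 selected markers coded 1, 0, -1.
rng = np.random.default_rng(0)
n, n_sel = 247, 7
X = rng.choice([-1, 0, 1], size=(n, n_sel), p=[0.25, 0.5, 0.25]).astype(float)
theta_true = np.array([5.0, -4.0, 1.0, 7.0, -8.0, -2.0, -3.0])   # assumed effects
y = 150.0 + X @ theta_true + rng.normal(0.0, 13.0, size=n)

var_s_hat, sigma2_e_hat = lande_thompson_variance(X, y)
print(var_s_hat, sigma2_e_hat)
```

Because the correction term is subtracted, nothing in the function prevents a negative result when the marker effects are small relative to the residual variance.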

From Eq. (4.29), it is possible to obtain an estimator of the covariance between the ith and jth marker scores when the number of selected markers statistically linked to the QTL is the same in the ith and jth traits. Thus, by Eq. (4.29), the covariance between the ith and jth marker scores can be estimated as

$$ {\widehat{\sigma}}_{{\widehat{s}}_{ij}}={\widehat{\boldsymbol{\uptheta}}}_i^{\prime }{\mathbf{M}}_{ij}{\widehat{\boldsymbol{\uptheta}}}_j-\frac{M{\widehat{\sigma}}_{e_{ij}}}{n}, $$

(4.31)

where $ {\widehat{\boldsymbol{\uptheta}}}_i $ and $ {\widehat{\boldsymbol{\uptheta}}}_j $ are the estimated vectors of regression coefficients of the selected markers associated with the ith and jth trait loci respectively; $ {\mathbf{M}}_{ij}=\frac{2}{n}{\mathbf{X}}_i^{\prime }{\mathbf{X}}_j $ is the M × M covariance matrix of the markers statistically linked to the ith and jth trait marker loci; X_i and X_j are n × M matrices with the coded values of the selected markers associated with the ith and jth trait loci respectively; $ {\widehat{\sigma}}_{e_{ij}}=\frac{{\mathbf{y}}_i^{\prime}\left(\mathbf{I}-{\mathbf{H}}_{ij}\right){\mathbf{y}}_j}{n-M-1} $ is the estimated covariance of the residuals between the ith (y_i) and jth (y_j) trait values, $ {\mathbf{H}}_{ij}={\mathbf{X}}_i{\left({\mathbf{X}}_i^{\prime }{\mathbf{X}}_j\right)}^{-1}{\mathbf{X}}_j^{\prime } $, I is an identity matrix n × n, and M is the number of selected markers statistically linked to the QTL.

According to the PHT values described in Sect. 4.3.1 of this chapter, M = 7, n = 247, $ {\widehat{\sigma}}_{e_i}^2=180.80 $ and $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 $ (Eq. 4.29). Note that $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2\le {\widehat{\sigma}}_{g_{PHT}}^2 $, where $ {\widehat{\sigma}}_{g_{PHT}}^2=83.0 $ is an estimate of the genetic variance of PHT. The estimated portion of the genetic variance attributable to $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 $ was $ {\widehat{q}}_{PHT}=\frac{48.23}{83}=0.5811 $; that is, the seven markers explain 58.11% of the genetic variance associated with PHT.

Charcosset and Gallais (1996) considered two possible methods of estimating $ {\sigma}_{s_i}^2 $ based on the coefficient of multiple determination or squared multiple correlation R² (note that in this case R² is not the square of the selection response). The coefficient R² gives the portion of the total variation in the phenotypic values that is “explained” by, or attributable to, the markers and can be written as

$$ {R}^2=\frac{{\widehat{\boldsymbol{\uptheta}}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{y}-n{\overline{y}}^2}{{\mathbf{y}}^{\prime}\mathbf{y}-n{\overline{y}}^2}=\frac{{\widehat{\sigma}}_s^2}{{\widehat{\sigma}}_y^2}, $$

(4.32a)

where $ {\widehat{\boldsymbol{\uptheta}}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{y}-n{\overline{y}}^2 $ is the overall regression sum of squares adjusted for the intercept and $ {\mathbf{y}}^{\prime}\mathbf{y}-n{\overline{y}}^2 $ is the total sum of squares adjusted for the mean. The coefficient R² is equal to 1 if the fitted equation $ {y}_i={\theta}_0+\sum \limits_{j\in M}{\theta}_j{x}_j+{e}_i $ passes through all the data points, so that all residuals are null; then, the markers explain all the phenotypic variance. At the other extreme, R² is zero if $ {\overline{y}}_i={\widehat{\theta}}_0 $ and the estimated regression coefficients are null, i.e., $ {\widehat{\theta}}_1={\widehat{\theta}}_2=\cdots ={\widehat{\theta}}_M=0 $. In the latter case, markers do not affect the phenotypic observations and the variance of the marker score values is zero. Thus, the R² values lie between 0 and 1, i.e., 0 ≤ R² ≤ 1. Equation (4.32a) is useful for estimating $ {\sigma}_{s_i}^2 $ as $ {\widehat{\sigma}}_{y_i}^2\sum \limits_{j=1}^M{R}_j^2={\widehat{\sigma}}_s^2 $, where $ {R}_j^2 $ is the estimated R² value of the jth marker and $ {\widehat{\sigma}}_{y_i}^2 $ is the estimated phenotypic variance of the ith trait; however, this is a biased estimator of $ {\sigma}_{s_i}^2 $ (Hospital et al. 1997).

Charcosset and Gallais (1996) and Hospital et al. (1997) proposed an unbiased estimator of $ {\sigma}_{s_i}^2 $ based on all the selected markers using the adjusted coefficient of multiple determination, i.e.,

$$ {R}_{Adj}^2=1-\frac{n-1}{n-M-1}\left(1-{R}^2\right)=\frac{{\widehat{\sigma}}_s^2}{{\widehat{\sigma}}_y^2}, $$

(4.32b)

whence we can obtain an unbiased estimator of $ {\sigma}_{s_i}^2 $ as $ {\widehat{\sigma}}_y^2{R}_{Adj}^2={\widehat{\sigma}}_{{\widehat{s}}_i}^2 $ by jointly using all the markers that affect the phenotypic values. The problem with Eq. (4.32b) is that the $ {R}_{Adj}^2 $ values could be negative; in that case, the estimated value of $ {\sigma}_{s_i}^2 $ would also be negative. One additional problem with Eq. (4.32b) is that the $ {R}_{Adj}^2 $ values can produce $ {\widehat{\sigma}}_s^2 $ values that are higher than those of the estimated variance of the breeding values $ {\widehat{\sigma}}_g^2 $.

Using Eqs. (4.32a) and (4.32b), we can estimate $ {\sigma}_{s_i}^2 $, but from them it is not clear how we can estimate the covariance between two different estimated marker score values.

Consider the case of the PHT values described in Sect. 4.3.1 of this chapter, where M = 7, n = 247, and the estimated variance of PHT was $ {\widehat{\sigma}}_{PHT}^2=191.81 $. The estimated values of R² for each of the seven markers were 0.0038, 0.0005, 0.0006, 0.0013, 0.0036, 0.0114, and 0.0298 (which sum to 0.0510), whence, by multiplying each estimated R² value by $ {\widehat{\sigma}}_{PHT}^2=191.81 $ and summing the results, we found that the estimated value of $ {\sigma}_{s_{PHT}}^2 $ was $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 $. In this case, the estimated portion of the genetic variance attributable to $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 $ was $ {\widehat{q}}_{PHT}=\frac{9.78}{83}=0.1178 $; thus, when we estimated $ {\sigma}_{s_{PHT}}^2 $ according to Eq. (4.32a), the seven markers explained only 11.78% of the genetic variance associated with PHT.

The estimated value of $ {R}_{Adj}^2 $ for the seven markers jointly was 0.06, whence $ {\widehat{\sigma}}_{s_{PHT}}^2=(191.81)(0.06)=11.50 $ is an estimate of $ {\sigma}_{s_{PHT}}^2 $. In the latter case, the estimated portion of the genetic variance attributable to $ {\widehat{\sigma}}_{s_{PHT}}^2=11.50 $ was $ {\widehat{q}}_{PHT}=\frac{11.5}{83}=0.1385 $; that is, according to Eq. (4.32b), the seven markers explain 13.85% of the genetic variance associated with PHT.
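The adjusted-R² route of Eq. (4.32b) is simple enough to check numerically. The sketch below uses the PHT figures quoted in the text (phenotypic variance 191.81, genetic variance 83.0, joint adjusted R² of 0.06); the helper name and the weak-fit value R² = 0.01 are assumptions added only to show how a negative estimate can arise. Small differences from the figures in the text are due to rounding.

```python
def adjusted_r2(r2, n, n_markers):
    """Adjusted coefficient of multiple determination, Eq. (4.32b)."""
    return 1.0 - (n - 1) / (n - n_markers - 1) * (1.0 - r2)

var_y = 191.81      # estimated phenotypic variance of PHT (from the text)
var_g = 83.0        # estimated genetic variance of PHT (from the text)
r2_adj = 0.06       # joint adjusted R^2 of the seven markers (from the text)

var_s = var_y * r2_adj          # estimated marker-score variance
q_hat = var_s / var_g           # estimated share of genetic variance explained
print(round(var_s, 2), round(q_hat, 4))

# Hypothetical weak fit: with n = 247 and 7 markers, a raw R^2 of 0.01 gives a
# negative adjusted R^2, and therefore a negative variance estimate.
weak = adjusted_r2(0.01, n=247, n_markers=7)
print(weak, var_y * weak)
```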

One additional way of estimating the variance of the marker score $ {\sigma}_{s_i}^2 $ was proposed by Lange and Whittaker (2001) as

$$ \frac{1}{n-1}\sum \limits_{i=1}^n{\left({\widehat{s}}_i-{\widehat{\mu}}_{s_i}\right)}^2, $$

(4.33)

where $ {\widehat{s}}_i=\sum \limits_{j=1}^M{\widehat{\theta}}_j{x}_j $ and $ {\widehat{\mu}}_{s_i} $ is the mean of $ {\widehat{s}}_i $ values. The covariance between the ith and jth marker scores can be estimated as the cross products of the marker score values divided by n − 1. Note that in this case, the number of markers associated with the ith and jth traits may be different.

For the PHT values described in Sect. 4.3.1 of this chapter, where n = 247, the estimated value of $ {\sigma}_{s_i}^2 $ was $ {\widehat{\sigma}}_{s_{PHT}}^2=15.75 $ and the estimated portion of the genetic variance attributable to $ {\widehat{\sigma}}_{s_{PHT}}^2=15.75 $ was $ {\widehat{q}}_{PHT}=\frac{15.75}{83}=0.1897 $. That is, the seven markers jointly explain 18.97% of the genetic variance associated with PHT according to Eq. (4.33).
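A minimal Python sketch of the Lange and Whittaker (2001) approach follows, assuming marker scores are already available for two traits. The coded markers and coefficients are simulated and purely hypothetical; the point is that the traits may use different numbers of markers, because only the n score values per trait enter the sample covariance.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 247

# Hypothetical coded markers: trait 1 uses three markers, trait 2 uses two.
X1 = rng.choice([-1, 0, 1], size=(n, 3), p=[0.25, 0.5, 0.25])
X2 = rng.choice([-1, 0, 1], size=(n, 2), p=[0.25, 0.5, 0.25])
theta1 = np.array([2.0, -1.5, 0.8])     # assumed coefficients, trait 1
theta2 = np.array([1.2, -0.7])          # assumed coefficients, trait 2

# n x 2 matrix of estimated marker scores, one column per trait.
scores = np.column_stack([X1 @ theta1, X2 @ theta2])

# Sample covariance matrix with divisor n - 1; the diagonal follows Eq. (4.33).
S_hat = np.cov(scores, rowvar=False, ddof=1)
print(S_hat)
```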

4.3.3 Estimating LMSI Selection Response and Efficiency

With the estimated phenotypic variance ($ {\widehat{\sigma}}_{PHT}^2=191.81 $), the estimated genetic variance ($ {\widehat{\sigma}}_{g_{PHT}}^2=83.0 $) and the estimated marker score variances $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 $ (Eq. 4.29), $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 $ (Eq. 4.32a), $ {\widehat{\sigma}}_{s_{PHT}}^2=11.50 $ (Eq. 4.32b), and $ {\widehat{\sigma}}_{s_{PHT}}^2=15.75 $ (Eq. 4.33), we can estimate the LMSI coefficient, selection response, and efficiency.

Using the estimated value $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 $ obtained with Eq. (4.29), it is possible to estimate the LMSI weight as $ {\widehat{\beta}}_{PHT}=\frac{{\widehat{\sigma}}_{g_{PHT}}^2-{\widehat{\sigma}}_{s_{PHT}}^2}{{\widehat{\sigma}}_{PHT}^2-{\widehat{\sigma}}_{s_{PHT}}^2}=\frac{83.0-48.23}{191.81-48.23}=0.242 $, whereas for $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 $, $ {\widehat{\sigma}}_{s_{PHT}}^2=11.50 $, and $ {\widehat{\sigma}}_{s_{PHT}}^2=15.75 $, the estimated values of β_PHT were 0.402, 0.40, and 0.382 respectively. These results indicate that the estimated weight β_PHT on the phenotypic values decreases as the estimated variance of the marker score increases. In the limit, when all the genetic variance is explained by the markers, the estimated value of β_PHT is zero and the estimated LMSI is equal to $ {\widehat{I}}_M=\widehat{s} $. Thus, for trait PHT, when the estimated values of β_PHT are not zero, the estimated LMSI can be written as $ {\widehat{I}}_{M_{PHT}}={\widehat{s}}_{PHT}+{\widehat{\beta}}_{PHT}\left({PHT}_i-{\widehat{s}}_{PHT}\right) $. The $ {\widehat{I}}_{M_{PHT}} $ values are used to predict the net genetic merit of each candidate and to rank and select the candidates.
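The four estimated weights can be recomputed directly from the variances quoted in the text. The following Python sketch does so; the function and dictionary names are hypothetical, and the results agree with the text to the reported rounding.

```python
def lmsi_weight(var_g, var_y, var_s):
    """LMSI weight on the phenotype: (var_g - var_s) / (var_y - var_s)."""
    return (var_g - var_s) / (var_y - var_s)

var_y, var_g = 191.81, 83.0            # PHT phenotypic and genetic variances (text)

# Marker-score variance estimates reported in the text, keyed by equation.
estimates = {"Eq. 4.29": 48.23, "Eq. 4.32a": 9.78, "Eq. 4.32b": 11.50, "Eq. 4.33": 15.75}

weights = {eq: lmsi_weight(var_g, var_y, var_s) for eq, var_s in estimates.items()}
for eq, beta in weights.items():
    print(eq, round(beta, 3))

# Limiting case: markers explain all the genetic variance, so the weight is zero.
print(lmsi_weight(var_g, var_y, var_g))
```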

Based on the result $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 $ obtained with Eq. (4.29) and using a selection intensity of 10% (k_I= 1.755), the estimated LMSI selection response can be obtained as

$$ {\widehat{R}}_M={k}_I\sqrt{\frac{{\widehat{\sigma}}_g^2\left({\widehat{\sigma}}_g^2-{\widehat{\sigma}}_s^2\right)+{\widehat{\sigma}}_s^2\left({\widehat{\sigma}}_y^2-{\widehat{\sigma}}_g^2\right)}{{\widehat{\sigma}}_y^2-{\widehat{\sigma}}_s^2}}=1.755\sqrt{\frac{83\left(83-48.23\right)+48.23\left(191.81-83\right)}{191.81-48.23}} $$

$$ =1.755\sqrt{56.65}=13.21. $$

In a similar manner, using the result $ {\widehat{\sigma}}_{s_{PHT}}^2=15.75 $, the estimated selection response was $ {\widehat{R}}_M=1.755\sqrt{\frac{83\left(83-15.75\right)+15.75\left(191.81-83\right)}{191.81-15.75}}=1.755\sqrt{41.44}=11.30. $ With $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 $ and $ {\widehat{\sigma}}_{s_{PHT}}^2=11.50 $, the estimated values of the LMSI selection responses were 10.99 and 11.10 respectively. The latter results indicate that the estimated values of the LMSI selection responses tend to increase when the estimated values of the variance of the marker score increase.
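The two fully worked selection responses above can be checked with a few lines of Python. This is a sketch of the single-trait response formula as written in this subsection; the function name is hypothetical and k_I = 1.755 is the value used in the text.

```python
import math

def lmsi_response(var_g, var_y, var_s, k_i=1.755):
    """Single-trait LMSI selection response for a given marker-score variance."""
    numerator = var_g * (var_g - var_s) + var_s * (var_y - var_g)
    return k_i * math.sqrt(numerator / (var_y - var_s))

var_y, var_g = 191.81, 83.0                        # PHT variances from the text

r_eq_429 = lmsi_response(var_g, var_y, 48.23)      # variance from Eq. (4.29)
r_eq_433 = lmsi_response(var_g, var_y, 15.75)      # variance from Eq. (4.33)
print(round(r_eq_429, 2), round(r_eq_433, 2))
```

Setting the marker-score variance to zero reduces the expression to the response to phenotypic selection alone, which gives a convenient baseline for comparison.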

We can estimate LMSI versus phenotypic efficiency for one trait as $ {\widehat{\lambda}}_M=\sqrt{\frac{\widehat{q}}{{\widehat{h}}^2}+\frac{{\left(1-\widehat{q}\right)}^2}{1-{\widehat{q}\widehat{h}}^2}} $, where $ {\widehat{h}}^2 $ is the estimated trait heritability and $ \widehat{q}=\frac{{\widehat{\sigma}}_s^2}{{\widehat{\sigma}}_g^2} $ is the estimated portion of additive genetic variance explained by the markers. When $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 $, $ {\widehat{q}}_{PHT}=\frac{48.23}{83}=0.5811 $, and $ {\widehat{h}}^2=0.433 $, the estimated LMSI efficiency was $ {\widehat{\lambda}}_M=\sqrt{1.58}=1.25 $. For $ {\widehat{\sigma}}_{s_{PHT}}^2=15.75 $, $ {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 $, and $ {\widehat{\sigma}}_{s_{PHT}}^2=11.50 $, the estimated portions of the additive genetic variance explained by the markers were $ {\widehat{q}}_{PHT}=\frac{15.75}{83}=0.1897 $, $ {\widehat{q}}_{PHT}=\frac{9.78}{83}=0.1178 $, and $ {\widehat{q}}_{PHT}=\frac{11.5}{83}=0.1385 $ respectively, whence the estimated LMSI efficiencies were 1.1, 1.04, and 1.05 respectively. The latter results indicate that the estimated values of LMSI efficiency tend to increase when the estimated values of the variance of the marker score increase (Fig. 4.1).

Figure 4.1 presents the change in LMSI efficiency with respect to phenotypic selection for different values of the variance of the marker score when the phenotypic (191.81) and genetic (83) variances are fixed. In a similar manner, Fig. 4.2 presents the change in the LMSI selection response for different values of the variance of the marker score when the phenotypic (191.81) and genetic (83) variances are fixed. In effect, LMSI efficiency and the selection response depend on the genetic variance explained by the markers.
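The trend summarized by Fig. 4.1 can be sketched numerically by sweeping the marker-score variance from zero up to the genetic variance while holding the phenotypic (191.81) and genetic (83) variances fixed. The grid of eleven values and the function name are assumptions for illustration; the efficiency formula is the one given in this subsection.

```python
import math

def lmsi_efficiency(q, h2):
    """LMSI efficiency relative to phenotypic selection for one trait."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

var_y, var_g = 191.81, 83.0
h2 = var_g / var_y                                   # estimated heritability

# Marker-score variances from 0 to var_g in ten equal steps (assumed grid).
grid = [var_g * step / 10 for step in range(11)]
efficiency = [lmsi_efficiency(var_s / var_g, h2) for var_s in grid]

for var_s, lam in zip(grid, efficiency):
    print(round(var_s, 1), round(lam, 3))
```

The efficiency starts at 1 when the markers explain none of the genetic variance and rises to 1/h when they explain all of it.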

4.3.4 Estimating the Variance of the Marker Score in the Multi-Trait Case

Equation (4.33) can be used in the multi-trait context when the numbers of markers associated with the ith and jth traits are different. Also, it is possible to adapt Eqs. (4.32a) and (4.32b) to the multi-trait case. However, in the latter case, in addition to the markers linked to the QTL that affect one specific trait, we need to find markers that affect more than one trait, which may be very difficult. For this reason, in the multi-trait context, Eqs. (4.32a) and (4.32b) could be used to estimate the covariance matrix of the marker scores (S) without preselecting the markers that affect the phenotypic traits, but only when the number of genotypes is higher than the number of markers.

Let y₁, y₂, …, y_t be t independent multivariate normal vectors of observations, each with n observations, such that $ \mathbf{Y}=\left[\begin{array}{cccc}{y}_{11}& {y}_{12}& \cdots & {y}_{1t}\\ {}{y}_{21}& {y}_{22}& \cdots & {y}_{2t}\\ {}\vdots & \vdots & \cdots & \vdots \\ {}{y}_{n1}& {y}_{n2}& \cdots & {y}_{nt}\end{array}\right] $ is a matrix n × t of observations for t traits; then, the multivariate linear regression model can be written as Y = XB + U, where X is a matrix n × m (m = number of markers and m < n) of known coded marker values, B is a matrix m × t of regression coefficients, and U is a matrix n × t of unobserved random disturbances whose rows for given X are uncorrelated, each with mean 0 and common covariance matrix E (Mardia et al. 1982; Rencher 2002). According to the least squares method of estimation, $ \widehat{\mathbf{B}}={\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y} $ is an estimator of B and $ \widehat{\mathbf{E}}=\frac{{\left(\mathbf{Y}-\mathbf{X}\widehat{\mathbf{B}}\right)}^{\prime}\left(\mathbf{Y}-\mathbf{X}\widehat{\mathbf{B}}\right)}{n-m-1} $ is an estimator of the residual covariance matrix E assuming that n > m (Johnson and Wichern 2007).

Note that $ 1-{R}^2=\frac{{\widehat{\mathbf{e}}}^{\prime}\widehat{\mathbf{e}}}{{\mathbf{y}}^{\prime}\mathbf{y}} $, where $ \widehat{\mathbf{e}} $ is a vector of estimated residual values of the model $ {y}_i={\theta}_0+\sum \limits_{j\in M}{\theta}_j{x}_j+{e}_i $ and R² is the coefficient of multiple determination (Eq. 4.32a). In addition, as in the multi-trait context the estimated matrix of residuals is $ \widehat{\mathbf{U}}=\mathbf{Y}-\mathbf{X}\widehat{\mathbf{B}} $, 1 − R² can be written as $ \mathbf{D}={\left({\mathbf{Y}}^{\prime}\mathbf{Y}\right)}^{-1}{\widehat{\mathbf{U}}}^{\prime}\widehat{\mathbf{U}} $ (Mardia et al. 1982), whence R² in the multivariate context can be written as

$$ {\mathbf{R}}^2=\mathbf{I}-\mathbf{D}={\widehat{\mathbf{P}}}^{-1}\widehat{\mathbf{S}}, $$

(4.34a)

whereas $ {R}_{Adj}^2 $ (Eq. 4.32b) can be written as

$$ {\mathbf{R}}_{Adj}^2=\mathbf{I}-\frac{n-1}{n-m-1}\mathbf{D}={\widehat{\mathbf{P}}}^{-1}\widehat{\mathbf{S}}, $$

(4.34b)

where I is an identity matrix t × t, $ {\widehat{\mathbf{P}}}^{-1} $ is the inverse of the estimated covariance matrix of phenotypic values ($ \widehat{\mathbf{P}} $), and $ \widehat{\mathbf{S}} $ is the estimated covariance matrix of marker score values. From Eq. (4.34b),

$$ \widehat{\mathbf{P}}{\mathbf{R}}_{Adj}^2=\widehat{\mathbf{S}} $$

(4.34c)

is an unbiased estimator of matrix S, whereas $ \widehat{\mathbf{P}}{\mathbf{R}}^2=\widehat{\mathbf{S}} $ (Eq. 4.34a) is a biased estimator of matrix S. The main problem of Eq. (4.34c) is that the diagonal elements of $ \widehat{\mathbf{S}} $ could be negative.
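The multivariate estimators in Eqs. (4.34a) to (4.34c) can be sketched in Python as follows. The simulated markers, the effect matrix, and the function name are hypothetical; the sketch assumes that the phenotypes are centered by trait and that the estimated phenotypic covariance matrix is the sample covariance of the centered phenotypes with divisor n − 1.

```python
import numpy as np

def marker_score_cov(X, Y, adjusted=True):
    """Sketch of S_hat = P_hat R^2 (Eq. 4.34a) or P_hat R^2_Adj (Eq. 4.34c)."""
    n, m = X.shape
    t = Y.shape[1]
    Yc = Y - Y.mean(axis=0)                           # center each trait
    B_hat, *_ = np.linalg.lstsq(X, Yc, rcond=None)    # m x t least-squares fit
    U_hat = Yc - X @ B_hat                            # n x t residual matrix
    D = np.linalg.solve(Yc.T @ Yc, U_hat.T @ U_hat)   # (Y'Y)^{-1} U'U
    factor = (n - 1) / (n - m - 1) if adjusted else 1.0
    R2 = np.eye(t) - factor * D
    P_hat = Yc.T @ Yc / (n - 1)                       # assumed phenotypic covariance
    return P_hat @ R2

# Hypothetical data: 200 genotypes, 5 coded markers, 2 traits.
rng = np.random.default_rng(2)
n, m, t = 200, 5, 2
X = rng.choice([-1, 0, 1], size=(n, m), p=[0.25, 0.5, 0.25]).astype(float)
B_true = rng.normal(0.0, 2.0, size=(m, t))            # assumed marker effects
Y = 100.0 + X @ B_true + rng.normal(0.0, 5.0, size=(n, t))

S_raw = marker_score_cov(X, Y, adjusted=False)        # Eq. (4.34a)
S_adj = marker_score_cov(X, Y, adjusted=True)         # Eq. (4.34c)
print(S_raw)
print(S_adj)
```

Under these assumptions the adjusted version equals the phenotypic covariance minus the residual covariance with divisor n − m − 1, which is why its diagonal can turn negative when the markers explain little of the variation.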

From the maize F₂ population including 247 genotypes (each one with two repetitions) and 195 molecular markers described in Sect. 4.3.1, we used two traits—PHT (cm) and EHT (cm)—to illustrate the multivariate method of estimating the LMSI parameters. The estimated phenotypic and genetic covariance matrices were $ \widehat{\mathbf{P}}=\left[\begin{array}{cc}191.81& 106.89\\ {}106.89& 167.93\end{array}\right] $ and $ \widehat{\mathbf{C}}=\left[\begin{array}{cc}83.00& 57.44\\ {}57.44& 59.80\end{array}\right] $, whereas the estimated covariance matrix of marker scores, using Eq. (4.33), was $ \widehat{\mathbf{S}}=\left[\begin{array}{cc}15.750& 0.983\\ {}0.983& 28.083\end{array}\right] $. When we used Eq. (4.34a) and Eq. (4.34c), we obtained estimated values of the variance and covariance of the marker scores that were higher than the genetic values (data not presented). Equations (4.29) and (4.31) are used later to compare LMSI efficiency versus GW-LMSI efficiency using the simulated data described in Chap. 2, Sect. 2.8.1.

With matrices $ \widehat{\mathbf{P}} $, $ \widehat{\mathbf{C}} $, and $ \widehat{\mathbf{S}} $, and the vector of economic weights $ {\mathbf{a}}^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{0}}^{\prime}\right] $, where $ {\mathbf{w}}^{\prime }=\left[-1\kern0.5em -1\right] $ and $ {\mathbf{0}}^{\prime }=\left[0\kern0.5em 0\right] $, we obtained the estimated matrices $ {\widehat{\mathbf{T}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{P}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] $ and $ {\widehat{\mathbf{Z}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{C}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] $, whence the estimated LMSI vector of coefficients was $ {\widehat{\boldsymbol{\upbeta}}}^{\prime }={\mathbf{a}}^{\prime }{\widehat{\mathbf{Z}}}_M{\widehat{\mathbf{T}}}_M^{-1}=\left[-0.59\kern0.5em -0.18\kern0.5em -0.41\kern0.5em -0.82\right] $. Using a selection intensity of 10% (k_I = 1.755), the estimated LMSI selection response and the expected genetic gains per trait were $ {\widehat{R}}_M={k}_I\sqrt{\widehat{{\boldsymbol{\upbeta}}^{\prime }}{\widehat{\mathbf{T}}}_M\widehat{\boldsymbol{\upbeta}}}=20.41 $ and $ {\widehat{\mathbf{E}}}_M^{\prime }={k}_I\frac{\widehat{{\boldsymbol{\upbeta}}^{\prime }}{\widehat{\mathbf{Z}}}_M}{\sqrt{\widehat{{\boldsymbol{\upbeta}}^{\prime }}{\widehat{\mathbf{T}}}_M\widehat{\boldsymbol{\upbeta}}}}=\left[-10.09\kern0.5em -10.31\kern0.5em -2.53\kern0.5em -4.39\right] $ respectively, whereas the estimated LMSI accuracy was $ {\widehat{\rho}}_{H{\widehat{I}}_M}=\frac{{\widehat{\sigma}}_{I_M}}{{\widehat{\sigma}}_H}=0.72 $.
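The multi-trait LMSI figures above can be recomputed from the three estimated matrices given in the text. The Python sketch below builds the block matrices and evaluates the coefficients, response, expected gains, and accuracy; variable names are hypothetical, and the standard deviation of the net genetic merit is taken as the square root of w′Ĉw.

```python
import numpy as np

# Estimated matrices for PHT and EHT, as reported in the text.
P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic covariance
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic covariance
S = np.array([[15.750, 0.983], [0.983, 28.083]])     # marker-score covariance

T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.array([-1.0, -1.0, 0.0, 0.0])                 # economic weights [w' 0']
k_I = 1.755                                          # 10% selection intensity

beta = np.linalg.solve(T_M, Z_M @ a)                 # LMSI coefficients
sigma_I = np.sqrt(beta @ T_M @ beta)                 # std. deviation of the index
R_M = k_I * sigma_I                                  # selection response
E_M = k_I * (beta @ Z_M) / sigma_I                   # expected gains per trait
sigma_H = np.sqrt(a @ Z_M @ a)                       # std. dev. of net genetic merit
accuracy = sigma_I / sigma_H

print(np.round(beta, 2), round(R_M, 2), np.round(E_M, 2), round(accuracy, 2))
```

Solving the linear system instead of forming the inverse of the 4 × 4 matrix gives the same coefficients, because both block matrices are symmetric.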

The estimated LPSI parameters (see Chap. 2 for details) using the phenotypic information from the maize F₂ population for traits PHT and EHT are as follows. The estimated LPSI vector of coefficients was $ \widehat{{\mathbf{b}}^{\prime }}={\mathbf{w}}^{\prime}\widehat{\mathbf{C}}{\widehat{\mathbf{P}}}^{-1}=\left[-0.53\kern0.5em -0.36\right] $, and, with a selection intensity of 10% (k_I = 1.755), the estimated LPSI selection response and the expected genetic gains per trait were $ {\widehat{R}}_I={k}_I\sqrt{\widehat{{\mathbf{b}}^{\prime }}\widehat{\mathbf{P}}\widehat{\mathbf{b}}}=18.97 $ and $ \widehat{{\mathbf{E}}^{\prime }}={k}_I\frac{{\widehat{\mathbf{b}}}^{\prime}\widehat{\mathbf{C}}}{{\widehat{\sigma}}_I}=\left[-10.52\kern0.5em -8.45\right] $ respectively, whereas the estimated LPSI accuracy was $ {\widehat{\rho}}_{H\widehat{I}}=\frac{{\widehat{\sigma}}_I}{{\widehat{\sigma}}_H}=0.67 $.

We can determine LMSI efficiency versus LPSI efficiency to predict the net genetic merit using the ratio of estimated accuracy values $ {\widehat{\rho}}_{H{\widehat{I}}_M}=0.72 $ and $ {\widehat{\rho}}_{H\widehat{I}}=0.67 $ of the LMSI and LPSI respectively, i.e., $ {\widehat{\lambda}}_M=\frac{0.72}{0.67}=1.075 $, whence, according to Eq. (4.19), the estimated LMSI efficiency versus the LPSI efficiency, in percentage terms, was $ {\widehat{p}}_M=100\left(1.075-1\right)=7.5 $. That is, for these data, the estimated LMSI efficiency was only 7.5% greater than LPSI efficiency at predicting the net genetic merit.
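For comparison, the LPSI figures and the efficiency ratio can be recomputed from the same estimated matrices. In this Python sketch the variable names are hypothetical, and the ratio is formed from unrounded accuracies, so the percentage gain differs slightly from the 7.5% obtained in the text from the rounded values 0.72 and 0.67.

```python
import numpy as np

P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic covariance (text)
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic covariance (text)
S = np.array([[15.750, 0.983], [0.983, 28.083]])     # marker-score covariance (text)
w = np.array([-1.0, -1.0])                           # economic weights
k_I = 1.755

# LPSI: b = P^{-1} C w, response, expected gains, and accuracy.
b = np.linalg.solve(P, C @ w)
sigma_I = np.sqrt(b @ P @ b)
sigma_H = np.sqrt(w @ C @ w)
R_I = k_I * sigma_I
E_I = k_I * (b @ C) / sigma_I
rho_lpsi = sigma_I / sigma_H

# LMSI accuracy from the block matrices, for the efficiency ratio.
T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros(2)])
rho_lmsi = np.sqrt(a @ Z_M @ np.linalg.solve(T_M, Z_M @ a)) / sigma_H

gain_pct = 100.0 * (rho_lmsi / rho_lpsi - 1.0)       # LMSI gain over LPSI, in percent
print(np.round(b, 2), round(R_I, 2), np.round(E_I, 2), round(rho_lpsi, 2), round(gain_pct, 1))
```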

4.4 Estimating the GW-LMSI Parameters in the Asymptotic Context

Lange and Whittaker (2001) proposed the GW-LMSI. However, these authors did not provide detailed procedures for estimating matrices P, C, W, and M. They indicated that matrix C can be estimated using the estimated matrix of covariance of marker scores ($ \widehat{\mathbf{S}} $) and that matrices P, W, and M can be estimated directly by their empirical variances and covariances, but this assertion does not indicate a clear method for estimating those covariance matrices. In Chap. 2, we described the REML method of estimating C and P. Crossa and Cerón-Rojas (2011) described matrices W and M in a doubled haploid population. In this study, we describe and estimate matrices W and M for an F₂ population in the asymptotic context according to the Wright and Mowers (1994) approach, which is based on regressing phenotype values on marker coded values. We used this latter approach to estimate W and M, because it is a clearer estimation method than that of Lange and Whittaker (2001); however, the Wright and Mowers (1994) approach is an asymptotic method and should be treated with caution.

Matrix M is the covariance matrix of the coded molecular marker values. All marker information used to construct matrix M is presented in Table 4.2. Based on this information, we found that the expectations (E(X₁) and E(X₂)) and the variances (V(X₁) and V(X₂)) of the marker coded values X₁ and X₂ are E(X₁) = E(X₂) = 0 and V(X₁) = V(X₂) = 1, whereas the covariance (Cov(X₁, X₂)) and correlation (Corr(X₁, X₂)) between X₁ and X₂ were

$$ Cov\left({X}_1,{X}_2\right)= Corr\left({X}_1,{X}_2\right)=1-2\delta . $$

(4.35)

Table 4.2 Marker genotypes, expected frequency, and coded values (X₁ and X₂) of the marker genotypes in an F₂ population

Thus, as the variances of X₁ and X₂ are equal to 1, the correlation between X₁ and X₂ is $ Corr\left({X}_1,{X}_2\right)=\frac{Cov\left({X}_1,{X}_2\right)}{\sqrt{V\left({X}_1\right)V\left({X}_2\right)}}=1-2\delta $, i.e., the covariance and correlation between X₁ and X₂ are the same. Here δ denotes the recombination frequency between the two markers. Applying the same operation to any other pair of markers gives a result of the same form as Eq. (4.35), and this is how matrix M is constructed.

Let X be a matrix of coded markers of size n × m, where n ≥ m and m = number of markers; then according to Wright and Mowers (1994), because all marker information is contained in matrix X^′X, when the number of observations (n) tends to infinity, the product $ {\mathbf{x}}_i^{\prime }{\mathbf{x}}_j/n $ tends to the covariance between the ith and jth markers, whence matrix n⁻¹X^′X should tend to the covariance matrix between the markers that make up matrix X, with the ijth element equal to (0.5 − δ_ij). Thus, matrix 2n⁻¹X^′X should tend to a covariance matrix where the ijth entry is equal to (1 − 2δ_ij). Based on the latter result, an estimator of matrix M in the asymptotic context is

$$ \widehat{\mathbf{M}}=2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{X}. $$

(4.36)

Equation (4.36) is an asymptotic result and should be taken with caution. To date, there has been no clear method for estimating M in the non-asymptotic context; for this reason, Eq. (4.36) is used to estimate the GW-LMSI parameters.

Assume that a QTL is between the two markers in Table 4.2; then, δ can be written as δ = r₁ + r₂ − 2r₁r₂, where r₁ and r₂ denote the recombination frequencies between the QTL and marker 1 and between the QTL and marker 2 respectively. When the number of genotypes or individuals tends to infinity, the covariance between the phenotypic trait values (y) and the marker 1 coded values (X₁) in an F₂ population can be written as

$$ Cov\left({X}_1,y\right)=\frac{1}{2}{\alpha}_1\left(1-2{r}_1\right), $$

(4.37)

where α₁(1 − 2r₁) is the portion of the additive effect (α₁) of the QTL linked to marker 1 (Edwards et al. 1987), and r₁ is the recombination frequency between the QTL and marker 1. We can assume that, for many markers, the covariance between the phenotypic values and the coded values of each marker has the same form as Eq. (4.37), whence matrix W can be obtained.

Let y be a vector n × 1 of recorded phenotypic values, where n denotes the number of observations or records, and X is a matrix of coded markers of size n × m. When n tends to infinity, 2n⁻¹X^′y tends to be a vector with elements equal to α_i(1 − 2r_i), where α_i is the additive effect of the ith QTL linked to the ith marker, and r_i is the recombination frequency between the ith QTL and the ith marker. Now let $ \mathbf{Y}=\left[\begin{array}{cccc}{y}_{11}& {y}_{12}& \cdots & {y}_{1t}\\ {}{y}_{21}& {y}_{22}& \cdots & {y}_{2t}\\ {}\vdots & \vdots & \cdots & \vdots \\ {}{y}_{n1}& {y}_{n2}& \cdots & {y}_{nt}\end{array}\right] $ be a matrix of observations for t traits; then, an estimator of matrix W in the asymptotic context is

$$ \widehat{\mathbf{W}}=2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y}. $$

(4.38)

Once again, Eq. (4.38) is an asymptotic result and should be taken with caution. To date, however, there has been no clear method for estimating W in the non-asymptotic context; for this reason, Eq. (4.38) is used to estimate the GW-LMSI parameters.
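The estimator in Eq. (4.38) can be illustrated with a small simulation. The sketch below assumes one marker linked to one QTL in an F₂ population, genotypes coded 1, 0, and −1, a purely additive QTL effect on two traits, and phenotypes centered at zero; the function name `estimate_W`, the effects in `alpha`, the recombination frequency `r`, and the residual variance are hypothetical values chosen only for illustration.

```python
import numpy as np


def estimate_W(X, Y):
    """Asymptotic estimator of Eq. (4.38): W_hat = 2 * X'Y / n."""
    n = X.shape[0]
    return 2.0 * X.T @ Y / n


rng = np.random.default_rng(1)
n = 50000
r = 0.1                        # assumed marker-QTL recombination frequency
alpha = np.array([2.0, -1.0])  # hypothetical additive QTL effects, two traits

# Build the QTL and marker codes gamete by gamete (two gametes per individual).
qtl = np.zeros(n)
marker = np.zeros(n)
for _ in range(2):
    q_allele = rng.choice([0.5, -0.5], size=n)
    recomb = rng.random(n) < r
    qtl += q_allele
    marker += np.where(recomb, -q_allele, q_allele)

X = marker[:, None]                         # n x 1 matrix of coded markers
noise = rng.normal(size=(n, alpha.size))    # residuals with unit variance
Y = qtl[:, None] * alpha[None, :] + noise   # n x t matrix of centered traits

W_hat = estimate_W(X, Y)
expected = alpha * (1.0 - 2.0 * r)          # limit alpha_i * (1 - 2 r_i)
print(np.round(W_hat, 2), expected)
```

Under these assumptions `W_hat` should approach α(1 − 2r) for each trait as n grows. If the phenotypes were not centered, the product X^′Y/n would still estimate the covariance only because the coded markers have mean zero in an F₂ population; with real data, centering both X and Y first is the safer choice.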

4.5 Comparing LMSI Versus LPSI and GW-LMSI Efficiency

To compare the efficiency of the LMSI with that of the LPSI and the GW-LMSI for predicting the net genetic merit, we use the simulated data set described in Chap. 2, Sect. 2.8.1.

Figure 4.4 presents the estimated accuracy values of the LPSI ($ {\widehat{\rho}}_{H\widehat{I}}=\frac{{\widehat{\sigma}}_{\widehat{I}}}{{\widehat{\sigma}}_H} $), the LMSI ($ {\widehat{\rho}}_{H{\widehat{I}}_M}=\frac{{\widehat{\sigma}}_{{\widehat{I}}_M}}{{\widehat{\sigma}}_H} $), and the GW-LMSI ($ {\widehat{\rho}}_{H{\widehat{I}}_W}=\frac{{\widehat{\sigma}}_{{\widehat{I}}_W}}{{\widehat{\sigma}}_H} $) for five simulated selection cycles. In addition, Table 4.3 presents the estimated LPSI, LMSI, and GW-LMSI selection responses, the estimated LPSI, LMSI, and GW-LMSI variances of the predicted error ($ \left(1-{\widehat{\rho}}_{H\widehat{I}}^2\right){\widehat{\sigma}}_H^2 $, $ \left(1-{\widehat{\rho}}_{H{\widehat{I}}_M}^2\right){\widehat{\sigma}}_H^2 $ and $ \left(1-{\widehat{\rho}}_{H{\widehat{I}}_W}^2\right){\widehat{\sigma}}_H^2 $ respectively), the ratios of the estimated LMSI accuracy to the estimated LPSI accuracy and the estimated LMSI accuracy to the estimated GW-LMSI accuracy, expressed as percentages (Eq. 4.19), for five simulated selection cycles.

Table 4.3 Estimated linear phenotypic, molecular, and genome-wide selection indices (LPSI, LMSI, and GW-LMSI respectively), selection responses and variance of the predicted error, and estimated ratio of LMSI accuracy to LPSI and GW-LMSI accuracy expressed in percentages for 4 traits, 2500 markers and 500 genotypes (each with four repetitions) in one environment for five simulated selection cycles


According to Fig. 4.4, for this data set the estimated LMSI accuracy ($ {\widehat{\rho}}_{H{\widehat{I}}_M} $) was higher than the estimated LPSI and GW-LMSI accuracy ($ {\widehat{\rho}}_{H\widehat{I}} $ and $ {\widehat{\rho}}_{H{\widehat{I}}_W} $ respectively), for the five simulated selection cycles, that is, $ {\widehat{\rho}}_{H{\widehat{I}}_M}>{\widehat{\rho}}_{H\widehat{I}}>{\widehat{\rho}}_{H{\widehat{I}}_W} $. In a similar manner, Table 4.3 results indicate that the estimated LMSI selection response ($ {\widehat{R}}_M $) was higher than the estimated LPSI and GW-LMSI selection responses ($ {\widehat{R}}_I $ and $ {\widehat{R}}_W $ respectively): $ {\widehat{R}}_M>{\widehat{R}}_I>{\widehat{R}}_W $.

Note that the estimated LPSI, LMSI, and GW-LMSI variances of the predicted error, and the estimated LMSI efficiency versus LPSI efficiency and versus GW-LMSI efficiency (expressed in percentages) are related to the estimated LMSI, LPSI, and GW-LMSI accuracies, and that in all five selection cycles, $ {\widehat{\rho}}_{H{\widehat{I}}_M}>{\widehat{\rho}}_{H\widehat{I}}>{\widehat{\rho}}_{H{\widehat{I}}_W} $. This implies that the estimated LMSI variance of the predicted error was lower than the estimated LPSI and GW-LMSI variance of the predicted error. In a similar manner, because $ {\widehat{\rho}}_{H{\widehat{I}}_M}>{\widehat{\rho}}_{H\widehat{I}}>{\widehat{\rho}}_{H{\widehat{I}}_W} $, the estimated LMSI efficiency was higher than the estimated LPSI efficiency and the estimated GW-LMSI efficiency.
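The link between accuracy, variance of the predicted error, and relative efficiency can be made concrete with a short sketch. The standard deviations below are made-up numbers chosen only to reproduce the ordering reported above; they are not the values in Table 4.3. The percentage formula 100(ρ_a/ρ_b − 1) is one common way of expressing an accuracy ratio as a percentage and is assumed here; this section does not restate Eq. (4.19), so the exact form should be checked against it.

```python
def accuracy(sigma_index, sigma_H):
    """Estimated accuracy: ratio of the index s.d. to the s.d. of H."""
    return sigma_index / sigma_H


def prediction_error_variance(rho, sigma_H):
    """Variance of the predicted error: (1 - rho^2) * sigma_H^2."""
    return (1.0 - rho ** 2) * sigma_H ** 2


def relative_efficiency_percent(rho_a, rho_b):
    """Assumed percentage form of the accuracy ratio: 100 * (rho_a / rho_b - 1)."""
    return 100.0 * (rho_a / rho_b - 1.0)


# Hypothetical standard deviations (illustrative only, not from Table 4.3).
sigma_H = 10.0
sigma_index = {"LPSI": 7.0, "LMSI": 8.0, "GW-LMSI": 6.0}

rho = {name: accuracy(s, sigma_H) for name, s in sigma_index.items()}
pev = {name: prediction_error_variance(r, sigma_H) for name, r in rho.items()}

for name in sigma_index:
    print(f"{name}: accuracy={rho[name]:.2f}, error variance={pev[name]:.1f}")

print("LMSI vs LPSI (%):", relative_efficiency_percent(rho["LMSI"], rho["LPSI"]))
print("LMSI vs GW-LMSI (%):", relative_efficiency_percent(rho["LMSI"], rho["GW-LMSI"]))
```

Because the variance of the predicted error is a decreasing function of the accuracy for a fixed variance of H, the index with the highest accuracy necessarily has the lowest error variance, and any ratio of its accuracy to that of another index exceeds one.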

Based on the results in Fig. 4.4 and Table 4.3, we conclude that, for this simulated data set, the LMSI was a better predictor of the net genetic merit than the LPSI, and the LPSI was a better predictor of the net genetic merit than the GW-LMSI.

References

Bulmer MG (1980) The mathematical theory of quantitative genetics. Lectures in biomathematics. University of Oxford, Clarendon Press, Oxford
Charcosset A, Gallais A (1996) Estimation of the contribution of quantitative trait loci (QTL) to the variance of a quantitative trait by means of genetic markers. Theor Appl Genet 93:1193–1201
Crossa J, Cerón-Rojas JJ (2011) Multi-trait multi-environment genome-wide molecular marker selection indices. J Indian Soc Agric Stat 62(2):125–142
Dekkers JCM, Settar P (2004) Long-term selection with known quantitative trait loci. Plant Breed Rev 24:311–335
Edwards MD, Stuber CW, Wendel JF (1987) Molecular-marker-facilitated investigations of quantitative-trait loci in maize. I. Numbers, genomic distribution and types of gene action. Genetics 116:113–125
Hospital F, Moreau L, Lacoudre F, Charcosset A, Gallais A (1997) More on the efficiency of marker-assisted selection. Theor Appl Genet 95:1181–1189
Johnson RA, Wichern DW (2007) Applied multivariate statistical analysis, 6th edn. Pearson Prentice Hall, Upper Saddle River, NJ
Knapp SJ (1998) Marker-assisted selection as a strategy for increasing the probability of selecting superior genotypes. Crop Sci 38:1164–1174
Lande R, Thompson R (1990) Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743–756
Lange C, Whittaker JC (2001) On prediction of genetic values in marker-assisted selection. Genetics 159:1375–1381
Mardia KV, Kent JT, Bibby JM (1982) Multivariate analysis. Academic Press, New York
Moreau L, Charcosset A, Hospital F, Gallais A (1998) Marker-assisted selection efficiency in populations of finite size. Genetics 148:1353–1365
Moreau L, Hospital F, Whittaker J (2007) Marker-assisted selection and introgression. In: Balding DJ, Bishop M, Cannings C (eds) Handbook of statistical genetics, vol 1, 3rd edn. Wiley, New York, pp 718–751
Rencher AC (2002) Methods of multivariate analysis. Wiley, New York
Searle S, Casella G, McCulloch CE (2006) Variance components. Wiley, Hoboken, NJ
Whittaker JC (2003) Marker-assisted selection and introgression. In: Balding DJ, Bishop M, Cannings C (eds) Handbook of statistical genetics, vol 1, 2nd edn. Wiley, New York, pp 554–574
Wright AJ, Mowers RP (1994) Multiple regression for molecular marker, quantitative trait data from large F₂ population. Theor Appl Genet 89:305–312
Zhang W, Smith C (1992) Computer simulation of marker-assisted selection utilizing linkage disequilibrium. Theor Appl Genet 83:813–820
Zhang W, Smith C (1993) Simulation of marker-assisted selection utilizing linkage disequilibrium: the effects of several additional factors. Theor Appl Genet 86:492–496


Author information

Authors and Affiliations

Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Mexico, Mexico
J. Jesus Céron-Rojas & José Crossa


Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.


About this chapter

Cite this chapter

Céron-Rojas, J.J., Crossa, J. (2018). Linear Marker and Genome-Wide Selection Indices. In: Linear Selection Indices in Modern Plant Breeding. Springer, Cham. https://doi.org/10.1007/978-3-319-91223-3_4


DOI: https://doi.org/10.1007/978-3-319-91223-3_4
Published: 27 September 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-91222-6
Online ISBN: 978-3-319-91223-3
eBook Packages: Biomedical and Life Sciences (R0)
