4.1 The Linear Marker Selection Index

4.1.1 Basic Conditions for Constructing the LMSI

In Chap. 2, Sect. 2.1, we indicated ten basic conditions for constructing a valid linear phenotypic selection index (LPSI). These ten conditions are also necessary for the linear marker selection index (LMSI); however, in addition to those conditions, the LMSI also requires the following conditions:

  1. The markers and the quantitative trait loci (QTL) should be in linkage disequilibrium in the population under selection.

  2. The QTL effects should be combined additively both within and between loci.

  3. The QTL should be in coupling mode, that is, one of the initial lines should have all the alleles that have a positive effect on the chromosome, and the other lines should have all the negative effects.

  4. The traits of interest should be affected by a few QTL with large effects (and possibly a number of very small QTL effects) rather than many small QTL effects.

  5. The heritability of the traits should be low.

  6. Markers correlated with the traits of interest should be identified.

Under these conditions, the LMSI should be more efficient than the LPSI, at least in the first selection cycles (Whittaker 2003; Moreau et al. 2007).

4.1.2 The LMSI Parameters

Let yi = gi + ei be the ith trait (i = 1, 2, …, t, t = number of traits), where ei~N(0, \( {\sigma}_{e_i}^2 \)) is the residual with expectation equal to zero and variance value \( {\sigma}_{e_i}^2 \), and N stands for normal distribution. Assuming that the QTL effects combine additively both within and between loci, the ith unobservable genetic value gi can be written as

$$ {g}_i=\sum \limits_{k=1}^{N_Q}{\alpha}_k{q}_k, $$
(4.1)

where αk is the effect of the kth QTL, qk is the number of favorable alleles at the kth QTL (2, 1 or 0), and NQ is the number of QTL affecting the ith trait of interest.

If the QTL effect values are not observable, the gi values in Eq. (4.1) are also not observable; however, we can use a linear combination of the markers linked to the QTL (si) that affect the ith trait to predict the gi value as

$$ {s}_i=\sum \limits_{j=1}^M{\theta}_j{x}_j, $$
(4.2)

where si is a predictor of gi, θj is the regression coefficient of the linear regression model, xj is the coded value of the jth marker (e.g., 1, 0, and −1 for marker genotypes AA, Aa, and aa respectively), and M is the number of selected markers linked to the QTL that affect the ith trait. Equation (4.2) is called the marker score (Lande and Thompson 1990; Whittaker 2003) and this is the main reason why the LMSI is not equal to the LPSI described in Chap. 2. The selected markers are only a subset of the potential markers linked to QTL in the population under selection; thus, the si values should be lower than or equal to the gi values. One way of estimating the si values is to perform a linear regression of phenotypic values on the coded values of the markers, select the markers that are statistically linked to QTL and explain most of the variability in the regression model, and then obtain the estimated value of si (\( {\widehat{s}}_i \)) as the sum of the products of the estimated marker effects and the coded values of the selected markers associated with the ith trait. Some authors (e.g., Moreau et al. 2007) call \( {\widehat{s}}_i \) the molecular score; in this book, we call si the marker score and \( {\widehat{s}}_i \) the estimated marker score.

The objective of the LMSI is to predict the net genetic merit of each individual and select the individuals with the highest net genetic merit for further breeding. In the LMSI context, the net genetic merit can be written as

$$ H={\mathbf{w}}^{\prime}\mathbf{g}+{\mathbf{w}}_2^{\prime}\mathbf{s}=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right]\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{s}\end{array}\right]={\mathbf{a}}^{\prime}\mathbf{z}, $$
(4.3)

where \( {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em \dots \kern0.5em {g}_t\right] \) is the vector of breeding values; \( {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em \cdots \kern0.5em {w}_t\right] \) is the vector of economic weights associated with g; \( {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] \) is a null vector associated with the vector of marker scores \( {\mathbf{s}}^{\prime }=\left[{s}_1\kern0.5em \cdots \kern0.5em {s}_t\right] \); si is the ith marker score; \( {\mathbf{a}}^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right] \) and \( {\mathbf{z}}^{\prime }=\left[{\mathbf{g}}^{\prime}\kern0.5em {\mathbf{s}}^{\prime}\right] \).

The information provided by the marker score can be used in breeding programs to increase the accuracy of predicting the net genetic merit of the individuals under selection. The LMSI combines the phenotypic and marker scores to predict H in each selection cycle and can be written as

$$ {I}_M={\boldsymbol{\upbeta}}_y^{\prime}\mathbf{y}+{\boldsymbol{\upbeta}}_s^{\prime}\mathbf{s}=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_s^{\prime}\right]\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{s}\end{array}\right]={\boldsymbol{\upbeta}}^{\prime}\mathbf{t}, $$
(4.4)

where \( {\boldsymbol{\upbeta}}_y^{\prime } \) and βs are vectors of phenotypic and marker score weights respectively; \( {\mathbf{y}}^{\prime }=\left[{y}_1\kern0.5em \cdots \kern0.5em {y}_t\right] \) is the vector of trait phenotypic values and s was defined in Eq. (4.3); \( {\boldsymbol{\upbeta}}^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_s^{\prime}\right] \) and \( {\mathbf{t}}^{\prime }=\left[{\mathbf{y}}^{\prime}\kern0.5em {\mathbf{s}}^{\prime}\right] \).

The LMSI selection response can be written as

$$ {R}_M={k}_I{\sigma}_H{\rho}_{I_MH}={k}_I{\sigma}_H\frac{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\boldsymbol{\upbeta}}{\sqrt{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\mathbf{a}}\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}}}, $$
(4.5)

where kI is the standardized selection differential of the LMSI, \( {\sigma}_H=\sqrt{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\mathbf{a}} \) and \( {\sigma}_{I_M}=\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}} \) are the standard deviations of H and IM, whereas \( {\rho}_{I_MH} \) and \( {\mathbf{a}}^{\prime }{\mathbf{Z}}_M\boldsymbol{\upbeta} \) are the correlation and the covariance between H and IM respectively; \( {\mathbf{T}}_M= Var\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{s}\end{array}\right]=\left[\begin{array}{cc}\mathbf{P}& \mathbf{S}\\ {}\mathbf{S}& \mathbf{S}\end{array}\right] \) and \( {\mathbf{Z}}_M= Var\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{s}\end{array}\right]=\left[\begin{array}{cc}\mathbf{C}& \mathbf{S}\\ {}\mathbf{S}& \mathbf{S}\end{array}\right] \) are block covariance matrices, where P = Var(y), S = Var(s), and C = Var(g) are the covariance matrices of the phenotypic values (y), the marker scores (s), and the genetic values (g) respectively in the population. Vectors a and β were defined in Eqs. (4.3) and (4.4) respectively.

The LMSI expected genetic gain per trait can be written as

$$ {\mathbf{E}}_M={k}_I\frac{{\mathbf{Z}}_M\boldsymbol{\upbeta}}{\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}}}. $$
(4.6)

All the parameters in Eq. (4.6) were previously defined.
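As a rough numerical illustration of Eqs. (4.5) and (4.6), the following Python sketch evaluates the LMSI selection response and expected genetic gain for an arbitrary (not optimized) weight vector. The function names, the two-trait matrices, the choice S = 0.4C, the entries of `beta`, and the value of `k_I` are hypothetical and serve only to show how the block matrices enter the formulas.

```python
import numpy as np

def lmsi_response(beta, a, T_M, Z_M, k_I=1.0):
    """Selection response R_M of Eq. (4.5) for any weight vector beta."""
    cov_HI = a @ Z_M @ beta                  # covariance between H and I_M
    sigma_H = np.sqrt(a @ Z_M @ a)           # standard deviation of H
    sigma_I = np.sqrt(beta @ T_M @ beta)     # standard deviation of I_M
    rho = cov_HI / (sigma_H * sigma_I)       # correlation between H and I_M
    return k_I * sigma_H * rho

def lmsi_expected_gain(beta, T_M, Z_M, k_I=1.0):
    """Expected genetic gain vector E_M of Eq. (4.6)."""
    return k_I * (Z_M @ beta) / np.sqrt(beta @ T_M @ beta)

# Hypothetical covariance matrices for t = 2 traits (illustration only).
C = np.array([[1.0, 0.3], [0.3, 0.8]])       # genetic covariance matrix
P = C + np.diag([2.0, 1.5])                  # phenotypic covariance matrix
S = 0.4 * C                                  # assumed marker score covariance
T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])

w = np.array([1.0, 0.5])                     # hypothetical economic weights
a = np.concatenate([w, np.zeros(2)])         # a' = [w' 0']
beta = np.array([0.2, 0.1, 0.8, 0.4])        # arbitrary index weights

k_I = 1.755                                  # assumed selection differential
R_M = lmsi_response(beta, a, T_M, Z_M, k_I)
E_M = lmsi_expected_gain(beta, T_M, Z_M, k_I)
```

Because a′ZMβ is the covariance between H and IM, the inner product of a with the expected gain vector reproduces RM, which gives a quick consistency check on any implementation.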

4.1.3 The Maximized LMSI Parameters

Suppose that P, S, and C are known matrices; then, matrices TM and ZM are known and, according to the LPSI theory (see Chap. 2 for details), the LMSI vector of coefficients (βM) that maximizes \( {\rho}_{I_MH} \), RM, and EM can be written as

$$ \boldsymbol{\upbeta} ={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M\mathbf{a}, $$
(4.7)

whence the maximized selection response and the maximized correlation (or LMSI accuracy) between H and IM can be written as

$$ {R}_M={k}_I\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}}, $$
(4.8a)

and

$$ {\rho}_{I_MH}=\frac{\sigma_{I_M}}{\sigma_H}, $$
(4.8b)

respectively, where \( {\sigma}_{I_M}=\sqrt{{\boldsymbol{\upbeta}}^{\prime }{\mathbf{T}}_M\boldsymbol{\upbeta}} \) is the standard deviation of IM and \( {\sigma}_H=\sqrt{{\mathbf{a}}^{\prime }{\mathbf{Z}}_M\mathbf{a}} \) is the standard deviation of H. Equations (4.8a) and (4.8b) show that the LMSI is a direct application of the LPSI theory in the marker-assisted selection (MAS) context.

Let \( \mathbf{Q}={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M \); then, matrix Q can be written as

$$ \mathbf{Q}=\left[\begin{array}{cc}{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)& \mathbf{0}\\ {}\mathbf{I}-{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)& \mathbf{I}\end{array}\right], $$
(4.9)

whence β = Qa, and as \( {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_t\right] \), we can write the two vectors of \( {\boldsymbol{\upbeta}}^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_s^{\prime}\right] \) as

$$ {\boldsymbol{\upbeta}}_y={\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\mathbf{w}\kern1em \mathrm{and}\kern1em {\boldsymbol{\upbeta}}_s=\left[\mathbf{I}-{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\right]\mathbf{w}. $$
(4.10a)

Another way of writing the marker score vector weights is

$$ {\boldsymbol{\upbeta}}_s=\mathbf{w}-{\boldsymbol{\upbeta}}_y, $$
(4.10b)

where \( {\boldsymbol{\upbeta}}_y={\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\mathbf{w} \). By Eq. (4.10b), the optimal LMSI can be written as

$$ {I}_M={\mathbf{w}}^{\prime}\mathbf{s}+{\boldsymbol{\upbeta}}_y^{\prime}\left(\mathbf{y}-\mathbf{s}\right). $$
(4.11)

Equation (4.11) indicates that, in practice, to estimate the optimal LMSI, we only need to estimate the vector of coefficients βy. By Eq. (4.10a), Eq. (4.8a) can be written as

$$ {R}_M={k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{C}{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\mathbf{w}+{\mathbf{w}}^{\prime}\mathbf{S}\left[\mathbf{I}-{\left(\mathbf{P}-\mathbf{S}\right)}^{-1}\left(\mathbf{C}-\mathbf{S}\right)\right]\mathbf{w}}. $$
(4.12)

Thus, by Eqs. (4.10a) and (4.12), when S is a null matrix, vector βy is equal to \( {\boldsymbol{\upbeta}}_y={\mathbf{P}}^{-1}\mathbf{Cw}=\mathbf{b} \) and \( {R}_M={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={R}_I \), which are the LPSI vector of coefficients and its selection response respectively.
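The relations among Eqs. (4.7), (4.8a), (4.10a), (4.10b), and (4.12) can be checked numerically. The sketch below assumes hypothetical two-trait matrices with S = 0.4C; none of the numbers come from the text, and the variable names are chosen only for this illustration.

```python
import numpy as np

# Hypothetical covariance matrices for t = 2 traits (illustration only).
C = np.array([[1.0, 0.3], [0.3, 0.8]])
P = C + np.diag([2.0, 1.5])
S = 0.4 * C                                   # assumed marker score covariance
w = np.array([1.0, 0.5])                      # hypothetical economic weights
t = len(w)

T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros(t)])

# Eq. (4.7): beta = T_M^{-1} Z_M a, solved without forming the inverse.
beta = np.linalg.solve(T_M, Z_M @ a)
beta_y, beta_s = beta[:t], beta[t:]

# Eq. (4.10a): closed form for the phenotypic weights.
beta_y_closed = np.linalg.solve(P - S, (C - S) @ w)

# Eq. (4.8a) and Eq. (4.12): two expressions for the maximized response.
k_I = 1.0
R_M = k_I * np.sqrt(beta @ T_M @ beta)
R_M_412 = k_I * np.sqrt(w @ C @ beta_y_closed + w @ S @ (w - beta_y_closed))
```

In this setting the marker score weights satisfy βs = w − βy, so only βy has to be computed, as Eq. (4.11) indicates.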

Assume that when the numbers of markers and genotypes tend to infinity, S tends to C; then, at the limit, we can suppose that S = C, and by this latter result, RM is equal to

$$ {k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}. $$
(4.13)

That is, Eq. (4.13) is the maximum value of the LMSI selection response when the numbers of markers and genotypes tend to infinity. Thus, the possible LMSI selection response values of Eq. (4.12) should be between \( {k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} \) and \( {k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}} \), i.e.,

$$ {k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}\le {R}_M\le {k}_I\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}, $$
(4.14)

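or, dividing each term by \( {R}_I={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}} \), the ratio \( \frac{R_M}{R_I} \) should be between 1 and \( \frac{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}}{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}=\frac{\sigma_H}{\sigma_I} \), that is,

$$ 1\le \frac{R_M}{R_I}\le \frac{\sigma_H}{\sigma_I}. $$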
(4.15)

Note that \( \frac{\sigma_H}{\sigma_I}=\frac{1}{\rho_{HI}} \), where ρHI is the maximized correlation between the net genetic merit (H) and the LPSI (I) described in Chap. 2. Equation (4.15) therefore indicates that the upper bound on LMSI efficiency relative to the LPSI tends to infinity when the ρHI value tends to zero; this is an additional way of denoting the paradox of LMSI efficiency described by Knapp (1998).
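The bounds in Eq. (4.14) can be explored numerically. The following sketch assumes, purely for illustration, that S = qC for a scalar q between 0 and 1, so that q = 0 corresponds to no marker information and q close to 1 to markers explaining nearly all of the genetic covariance; the matrices, weights, and function name are hypothetical.

```python
import numpy as np

def max_lmsi_response(P, C, S, w, k_I=1.0):
    """Maximized LMSI response of Eq. (4.12)."""
    beta_y = np.linalg.solve(P - S, (C - S) @ w)   # Eq. (4.10a)
    beta_s = w - beta_y                            # Eq. (4.10b)
    return k_I * np.sqrt(w @ C @ beta_y + w @ S @ beta_s)

# Hypothetical two-trait covariance matrices and economic weights.
C = np.array([[1.0, 0.3], [0.3, 0.8]])
P = C + np.diag([2.0, 1.5])
w = np.array([1.0, 0.5])

# Lower bound: LPSI response with b = P^{-1} C w (k_I = 1).
b = np.linalg.solve(P, C @ w)
R_I = np.sqrt(b @ P @ b)

# Upper bound: response when S = C (k_I = 1).
R_upper = np.sqrt(w @ C @ w)

# Assumed structure S = q C for increasing values of q.
qs = [0.0, 0.25, 0.5, 0.75, 0.99]
responses = [max_lmsi_response(P, C, q * C, w) for q in qs]
ratios = [r / R_I for r in responses]              # R_M / R_I
```

Under this assumed structure the ratio RM/RI starts at 1 for q = 0 and approaches σH/σI as q approaches 1.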

4.1.4 The LMSI for One Trait

For the one-trait case, matrices TM, ZM, and Q can be written as

$$ {\mathbf{T}}_M=\left[\begin{array}{cc}{\sigma}_y^2& {\sigma}_s^2\\ {}{\sigma}_s^2& {\sigma}_s^2\end{array}\right],\kern0.5em {\mathbf{Z}}_M=\left[\begin{array}{cc}{\sigma}_g^2& {\sigma}_s^2\\ {}{\sigma}_s^2& {\sigma}_s^2\end{array}\right]\kern1em \mathrm{and}\kern1em \mathbf{Q}=\left[\begin{array}{cc}\frac{\sigma_g^2-{\sigma}_s^2}{\sigma_y^2-{\sigma}_s^2}& 0\\ {}\frac{\sigma_y^2-{\sigma}_g^2}{\sigma_y^2-{\sigma}_s^2}& 1\end{array}\right], $$
(4.16)

where \( {\sigma}_y^2 \), \( {\sigma}_g^2 \), and \( {\sigma}_s^2 \) are the phenotypic, genetic, and marker score variances respectively. By Eqs. (4.10a) and (4.10b), when \( {\mathbf{a}}^{\prime }=\left[1\kern0.5em 0\right] \), the elements of vector β = Qa are

$$ {\beta}_y=\frac{\sigma_g^2-{\sigma}_s^2}{\sigma_y^2-{\sigma}_s^2}\kern1em \mathrm{and}\kern1em {\beta}_s=1-{\beta}_y, $$
(4.17a)

whence the optimal LMSI can be written as

$$ {I}_M=s+{\beta}_y\left(y-s\right); $$
(4.17b)

whereas by Eq. (4.12), the maximized LMSI selection response can be written as

$$ {R}_M={k}_I\sqrt{\frac{\sigma_g^2\left({\sigma}_g^2-{\sigma}_s^2\right)+{\sigma}_s^2\left({\sigma}_y^2-{\sigma}_g^2\right)}{\sigma_y^2-{\sigma}_s^2}}. $$
(4.18)

When \( {\sigma}_s^2=0 \), \( {\beta}_y=\frac{\sigma_g^2}{\sigma_y^2}={h}^2 \), \( {I}_M={h}^2y \), and \( {R}_M=k\frac{\sigma_g^2}{\sigma_y}=k{\sigma}_y{h}^2=R \), which is the selection response for the one-trait case without markers.
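For the one-trait case, Eqs. (4.17a), (4.17b), and (4.18) reduce to a few scalar operations. The sketch below uses hypothetical variance values and a hypothetical function name; it is meant only to show the arithmetic, including the reduction to phenotypic selection when the marker score variance is zero.

```python
import math

def one_trait_lmsi(var_y, var_g, var_s, k=1.0):
    """Weights (Eq. 4.17a) and maximized response (Eq. 4.18) for one trait."""
    beta_y = (var_g - var_s) / (var_y - var_s)
    beta_s = 1.0 - beta_y
    r_m = k * math.sqrt(
        (var_g * (var_g - var_s) + var_s * (var_y - var_g)) / (var_y - var_s)
    )
    return beta_y, beta_s, r_m

def lmsi_value(y, s, beta_y):
    """Optimal one-trait index of Eq. (4.17b): I_M = s + beta_y (y - s)."""
    return s + beta_y * (y - s)

# Hypothetical variances: phenotypic 4.0, genetic 1.0, marker score 0.5.
beta_y, beta_s, r_m = one_trait_lmsi(4.0, 1.0, 0.5)

# Without marker information (var_s = 0) the index is h^2 * y.
beta_y0, beta_s0, r_0 = one_trait_lmsi(4.0, 1.0, 0.0)
```

With these illustrative numbers, the response with markers exceeds the response obtained when the marker score variance is zero, in line with Eq. (4.14).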

4.1.5 Efficiency of LMSI Versus LPSI Efficiency for One Trait

Suppose that the intensity of selection is the same in both indices; then, to compare LMSI versus LPSI efficiency for predicting the net genetic merit, we can use the ratio \( {\lambda}_M=\frac{\rho_{I_MH}}{\rho_{HI}}=\frac{R_M}{R_I} \) (Bulmer 1980; Moreau et al. 1998), where RI is the maximized LPSI selection response. In percentage terms, the LMSI versus LPSI efficiency can be written as

$$ {p}_M=100\left({\lambda}_M-1\right). $$
(4.19)

When pM = 0, the efficiency of both indices is the same; when pM > 0, the efficiency of the LMSI is higher than that of the LPSI, and when pM < 0, LPSI efficiency is higher than LMSI efficiency for predicting the net genetic merit.

In the case of one trait, Lande and Thompson (1990) showed that LMSI efficiency (not in percentage terms) with respect to phenotypic efficiency can be written as

$$ {\lambda}_M=\frac{R_M}{R}=\sqrt{\frac{q}{h^2}+\frac{{\left(1-q\right)}^2}{1-{qh}^2}}, $$
(4.20)

where RM was defined in Eq. (4.18), \( R=k{\sigma}_y{h}^2 \), \( {h}^2 \) is the trait heritability, and \( q=\frac{\sigma_s^2}{\sigma_g^2} \) is the proportion of additive genetic variance explained by the markers. According to Eq. (4.20), the advantage of the LMSI over phenotypic selection increases as the population size increases and heritability decreases, because in such cases, \( q=\frac{\sigma_s^2}{\sigma_g^2} \) tends to 1 and Eq. (4.20) approaches \( \frac{1}{h} \). Therefore, the LMSI is most efficient for traits with low heritability and when the marker score explains a large proportion of the genetic variance. Note that when \( {h}^2 \) tends to zero, \( \frac{1}{h} \) tends to infinity; this means that, in the asymptotic context, LMSI efficiency with respect to phenotypic efficiency for one trait (Eq. 4.20) tends to infinity, and this is the LMSI paradox pointed out by Knapp (1998). There are other problems associated with the LMSI: it increases the selection response only in the short term and can result in lower cumulative responses in the longer term than phenotypic selection, as the LMSI fixes the QTL at a faster rate than phenotypic selection. In addition, it requires the weights (Eq. 4.17a) to be updated, because in each generation the frequency of the QTL changes (Dekkers and Settar 2004).
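Equations (4.19) and (4.20) are easy to tabulate. The following sketch uses hypothetical function names and an arbitrary grid of heritability and q values; it also cross-checks Eq. (4.20) against the ratio obtained directly from Eq. (4.18) and the phenotypic response.

```python
import math

def lmsi_efficiency(h2, q):
    """Relative efficiency lambda_M of Eq. (4.20)."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

def percent_advantage(lam):
    """Efficiency in percentage terms, Eq. (4.19)."""
    return 100.0 * (lam - 1.0)

def efficiency_from_variances(var_y, var_g, var_s):
    """R_M / R computed from Eq. (4.18) and R = k * var_g / sd_y (k cancels)."""
    r_m = math.sqrt(
        (var_g * (var_g - var_s) + var_s * (var_y - var_g)) / (var_y - var_s)
    )
    r = var_g / math.sqrt(var_y)
    return r_m / r

# Arbitrary grid of heritabilities and proportions q (illustration only).
for h2 in (0.1, 0.3, 0.5):
    for q in (0.25, 0.5, 1.0):
        lam = lmsi_efficiency(h2, q)
        print(f"h2={h2:.1f}  q={q:.2f}  lambda_M={lam:.3f}  "
              f"p_M={percent_advantage(lam):.1f}%")
```

When q = 1 the function returns 1/h, and when q = 0 it returns 1, matching the two limiting cases discussed above.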

4.1.6 Statistical LMSI Properties

Assume that H and IM have bivariate joint normal distribution, \( \boldsymbol{\upbeta} ={\mathbf{T}}_M^{-1}{\mathbf{Z}}_M\mathbf{a} \), and that P, C, S, and w are known; then, the statistical LMSI properties are the same as the LPSI properties described in Chap. 2. That is,

  1. \( {\sigma}_{I_M}^2={\sigma}_{HI_M} \): the variance of IM (\( {\sigma}_{I_M}^2 \)) and the covariance between H and IM (\( {\sigma}_{HI_M} \)) are the same.

  2. The maximized correlation between H and IM (or IM accuracy) is \( {\rho}_{HI_M}=\frac{\sigma_{I_M}}{\sigma_H} \).

  3. The variance of the predicted error, \( Var\left(H-{I}_M\right)=\left(1-{\rho}_{HI_M}^2\right){\sigma}_H^2 \), is minimal.

  4. The total variance of H explained by IM is \( {\sigma}_{I_M}^2={\rho}_{HI_M}^2{\sigma}_H^2 \).

  5. The heritability of IM is \( {\mathrm{h}}_{\mathrm{M}}^2=\frac{{\boldsymbol{\upbeta}}_M^{\prime }{\mathbf{Z}}_M{\boldsymbol{\upbeta}}_M}{{\boldsymbol{\upbeta}}_M^{\prime }{\mathbf{T}}_M{\boldsymbol{\upbeta}}_M} \).

Properties 1 to 4 are the same as LPSI properties 1 to 4, but, because the LMSI jointly incorporates the phenotypic and marker information to predict the net genetic merit, LMSI accuracy should be higher than LPSI accuracy. The same is true of the LMSI selection response and expected genetic gain per trait when compared with the LPSI selection response and expected genetic gain per trait.
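The properties listed above can be verified numerically for any valid set of covariance matrices. The sketch below does so for hypothetical two-trait matrices with S = 0.4C and also compares LMSI accuracy with LPSI accuracy; all numbers and variable names are assumptions made for the illustration.

```python
import numpy as np

# Hypothetical covariance matrices for t = 2 traits (illustration only).
C = np.array([[1.0, 0.3], [0.3, 0.8]])
P = C + np.diag([2.0, 1.5])
S = 0.4 * C
w = np.array([1.0, 0.5])

T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros_like(w)])
beta = np.linalg.solve(T_M, Z_M @ a)           # optimal LMSI weights

var_I = beta @ T_M @ beta                      # variance of I_M
cov_HI = a @ Z_M @ beta                        # covariance between H and I_M
var_H = a @ Z_M @ a                            # variance of H

rho = np.sqrt(var_I / var_H)                   # property 2: accuracy
var_pred_error = var_H + var_I - 2.0 * cov_HI  # Var(H - I_M), property 3
h2_M = (beta @ Z_M @ beta) / var_I             # property 5: heritability of I_M

# LPSI accuracy for comparison (no marker information).
b = np.linalg.solve(P, C @ w)
rho_lpsi = np.sqrt((b @ P @ b) / (w @ C @ w))
```

For these illustrative matrices the LMSI accuracy is at least as large as the LPSI accuracy, as expected when the marker scores carry additional information.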

4.2 The Genome-Wide Linear Selection Index

The genome-wide linear marker selection index (GW-LMSI) is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit. In a similar manner to the LMSI, the GW-LMSI exploits the linkage disequilibrium between markers and the QTL produced when inbred lines are crossed.

4.2.1 The GW-LMSI Parameters

In a similar manner to the LPSI, the main objective of the GW-LMSI is to predict the net genetic merit values of each individual and select the best individuals for further breeding. In the GW-LMSI context, the net genetic merit can be written as

$$ H={\mathbf{w}}^{\prime}\mathbf{g}+{\mathbf{w}}_2^{\prime}\mathbf{m}=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right]\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{m}\end{array}\right]={\mathbf{a}}_W^{\prime }{\mathbf{z}}_W, $$
(4.21)

where \( {\mathbf{g}}^{\prime }=\left[{g}_1\kern0.5em \dots \kern0.5em {g}_t\right] \) (j = 1, 2, …, t = number of traits) is the vector of breeding values, \( {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em \cdots \kern0.5em {w}_t\right] \) is the vector of economic weights associated with the breeding values, and \( {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_m\right] \) is a null vector associated with the coded values of the markers \( {\mathbf{m}}^{\prime }=\left[{m}_1\kern0.5em \cdots \kern0.5em {m}_m\right] \), where mj (j = 1, 2, …, m = number of markers) is the jth marker in the training population; \( {\mathbf{a}}_W^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{w}}_2^{\prime}\right] \) and \( {\mathbf{z}}_W=\left[{\mathbf{g}}^{\prime}\kern0.5em {\mathbf{m}}^{\prime}\right] \).

The GW-LMSI (IW) combines the phenotypic value and the molecular information linked to the individual traits to predict H values in each selection cycle. It can be written as

$$ {I}_W={\boldsymbol{\upbeta}}_y^{\prime}\mathbf{y}+{\boldsymbol{\upbeta}}_m^{\prime}\mathbf{m}=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_m^{\prime}\right]\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{m}\end{array}\right]={\boldsymbol{\upbeta}}_W^{\prime }{\mathbf{t}}_W, $$
(4.22)

where \( {\boldsymbol{\upbeta}}_y^{\prime } \) and βm are vectors of phenotypic and marker weights respectively; \( {\mathbf{y}}^{\prime }=\left[{y}_1\kern0.5em \cdots \kern0.5em {y}_t\right] \) is the vector of phenotypic values and m was defined in Eq. (4.21); \( {\boldsymbol{\upbeta}}_W^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_m^{\prime}\right] \) and \( {\mathbf{t}}_W^{\prime }=\left[{\mathbf{y}}^{\prime}\kern0.5em {\mathbf{m}}^{\prime}\right] \).

The GW-LMSI selection response can be written as

$$ {R}_W={k}_I{\sigma}_H{\rho}_{I_WH}={k}_I{\sigma}_H\frac{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W}{\sqrt{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W}\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}}, $$
(4.23a)

where kI is the standardized selection differential of the GW-LMSI, \( {\sigma}_H^2={\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W \) and \( Var\left({I}_W\right)={\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W \) are the variances of H and IW, whereas \( {\rho}_{I_WH}=\frac{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W}{\sqrt{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W}\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}} \) and \( {\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W \) are the correlation and the covariance between H and IW respectively; \( \boldsymbol{\Phi} = Var\left[\begin{array}{c}\mathbf{y}\\ {}\mathbf{m}\end{array}\right]=\left[\begin{array}{cc}\mathbf{P}& {\mathbf{W}}^{\prime}\\ {}\mathbf{W}& \mathbf{M}\end{array}\right] \) and \( \boldsymbol{\Psi} = Var\left[\begin{array}{c}\mathbf{g}\\ {}\mathbf{m}\end{array}\right]=\left[\begin{array}{cc}\mathbf{C}& {\mathbf{W}}^{\prime}\\ {}\mathbf{W}& \mathbf{M}\end{array}\right] \) are block covariance matrices, where P = Var(y), M = Var(m), and C = Var(g) are the covariance matrices of the phenotypic values (y), the coded values of the molecular markers (m), and the genetic values (g) respectively, whereas W = Cov(y, m) = Cov(g, m) is the covariance matrix between y and m, and between g and m. The size of matrices P and C is t × t, but the sizes of matrices M and W are m × m and m × t respectively.

From a theoretical point of view, Crossa and Cerón-Rojas (2011) showed that matrix M can be written as

$$ \mathbf{M}=\left[\begin{array}{cccc}1& \left(1-2{\delta}_{12}\right)& \cdots & \left(1-2{\delta}_{1m}\right)\\ {}\left(1-2{\delta}_{21}\right)& 1& \cdots & \left(1-2{\delta}_{2m}\right)\\ {}\vdots & \vdots & \ddots & \vdots \\ {}\left(1-2{\delta}_{m1}\right)& \left(1-2{\delta}_{m2}\right)& \cdots & 1\end{array}\right], $$
(4.23b)

where (1 − 2δij) is the covariance (or correlation) and δij the recombination frequency between the ith and jth marker (i, j = 1, 2, …, m = number of markers). According to Crossa and Cerón-Rojas (2011), matrix W can be written as

$$ \mathbf{W}=\left[\begin{array}{cccc}\left(1-2{r}_{11}\right){\alpha}_{11}& \left(1-2{r}_{11}\right){\alpha}_{12}& \cdots & \left(1-2{r}_{1N}\right){\alpha}_{1{N}_Q}\\ {}\left(1-2{r}_{21}\right){\alpha}_{21}& \left(1-2{r}_{22}\right){\alpha}_{22}& \cdots & \left(1-2{r}_{2N}\right){\alpha}_{2{N}_Q}\\ {}\vdots & \vdots & \ddots & \vdots \\ {}\left(1-2{r}_{t1}\right){\alpha}_{t1}& \left(1-2{r}_{N2}\right){\alpha}_{t2}& \cdots & \left(1-2{r}_{NN}\right){\alpha}_{tN_Q}\end{array}\right], $$
(4.23c)

where (1 − 2rik)αqk (i = 1, 2, …, m, k = 1, 2, …, NQ = number of QTL, q = 1, 2, …, t) is the covariance between the qth trait and the ith marker; rik is the recombination frequency between the ith marker and the kth QTL; and αqk is the effect of the kth QTL over the qth trait.
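The structure of matrix M in Eq. (4.23b) can be sketched in a few lines. The recombination frequencies below are hypothetical (two closely linked markers and a third unlinked one), and the function name is chosen only for this illustration.

```python
import numpy as np

def marker_covariance(delta):
    """Build M with off-diagonal entries 1 - 2*delta_ij and unit diagonal."""
    delta = np.asarray(delta, dtype=float)
    M = 1.0 - 2.0 * delta          # new array; the input is not modified
    np.fill_diagonal(M, 1.0)       # each marker has unit variance in Eq. (4.23b)
    return M

# Hypothetical recombination frequencies among m = 3 markers.
# Markers 1 and 2 are linked (delta = 0.1); marker 3 is unlinked (delta = 0.5).
delta = np.array([[0.0, 0.1, 0.5],
                  [0.1, 0.0, 0.5],
                  [0.5, 0.5, 0.0]])
M = marker_covariance(delta)
```

A recombination frequency of 0.5 gives a zero covariance, so unlinked markers contribute no off-diagonal structure to M.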

The GW-LMSI expected genetic gain per trait can be written as

$$ {\mathbf{E}}_{LW}={k}_I\frac{{\boldsymbol{\Psi} \boldsymbol{\upbeta}}_W}{\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}}. $$
(4.24)

All parameters in Eq. (4.24) were previously defined.

Matrix Φ could be singular, i.e., its inverse (\( {\boldsymbol{\Phi}}^{-1} \)) might not exist because matrix W is singular. Suppose that matrices Φ and Ψ are known; then, according to the LPSI theory, the GW-LMSI vector of coefficients (βW) that maximizes \( {\rho}_{I_WH} \) can be written as

$$ {\boldsymbol{\upbeta}}_W={\boldsymbol{\Phi}}^{-}{\boldsymbol{\Psi} \mathbf{a}}_W, $$
(4.25a)

where matrix \( {\boldsymbol{\Phi}}^{-} \) denotes a generalized inverse of Φ. By Eq. (4.25a), the maximized GW-LMSI selection response is

$$ {R}_W={k}_I\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}. $$
(4.25b)

Equations (4.25a) and (4.25b) show that the GW-LMSI is a direct application of the LPSI to MAS. By Eq. (4.25a), the maximized correlation between H and IW is

$$ {\rho}_{I_WH}=\frac{\sigma_{I_W}}{\sigma_H}, $$
(4.25c)

where \( {\sigma}_{I_W}=\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W} \) is the standard deviation of IW and \( {\sigma}_H=\sqrt{{\mathbf{a}}_W^{\prime }{\boldsymbol{\Psi} \mathbf{a}}_W} \) is the standard deviation of H.
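Equations (4.25a) to (4.25c) can be illustrated with a small singular example. The sketch below uses the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) as one possible generalized inverse; this choice, the generative structure used to build consistent matrices (g = Bm + u, y = g + e), and all numerical values are assumptions made for the illustration. Two of the three hypothetical markers are perfectly correlated, which makes M and Φ singular.

```python
import numpy as np

# Hypothetical marker covariance: markers 1 and 2 are perfectly linked.
M = np.array([[1.0, 1.0, 0.2],
              [1.0, 1.0, 0.2],
              [0.2, 0.2, 1.0]])
B = np.array([[0.5, 0.3, 0.2],      # assumed marker effects on trait 1
              [0.1, 0.2, 0.4]])     # assumed marker effects on trait 2
U = np.diag([0.3, 0.2])             # genetic variance not tagged by markers
R = np.diag([1.5, 1.0])             # residual (environmental) variance

W = M @ B.T                         # Cov(m, g) = Cov(m, y), size m x t
C = B @ M @ B.T + U                 # Var(g)
P = C + R                           # Var(y)
w = np.array([1.0, 0.5])            # hypothetical economic weights

Phi = np.block([[P, W.T], [W, M]])
Psi = np.block([[C, W.T], [W, M]])
a_W = np.concatenate([w, np.zeros(M.shape[0])])

# Eq. (4.25a) with the pseudoinverse as the generalized inverse.
beta_W = np.linalg.pinv(Phi) @ Psi @ a_W

k_I = 1.0
sigma_I_W = np.sqrt(beta_W @ Phi @ beta_W)
sigma_H = np.sqrt(a_W @ Psi @ a_W)
R_W = k_I * sigma_I_W               # Eq. (4.25b)
rho_W = sigma_I_W / sigma_H         # Eq. (4.25c)

# LPSI response for comparison (k_I = 1).
b = np.linalg.solve(P, C @ w)
R_I = np.sqrt(b @ P @ b)
```

Although Φ has no ordinary inverse here, the resulting coefficients still satisfy ΦβW = ΨaW, so the response and accuracy are well defined.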

4.2.2 Relationship Between the GW-LMSI and the LPSI

Matrix \( {\boldsymbol{\Phi}}^{-} \) can be written as

$$ {\boldsymbol{\Phi}}^{-}=\left[\begin{array}{cc}{\mathbf{L}}^{-}& -{\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\\ {}-{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}& {\mathbf{M}}^{-}+{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\end{array}\right], $$
(4.26)

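where \( {\mathbf{L}}^{-} \) is a generalized inverse of matrix \( \mathbf{L}=\mathbf{P}-{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\mathbf{W} \), and \( {\mathbf{M}}^{-} \) is a generalized inverse of matrix M. In matrix \( {\boldsymbol{\Phi}}^{-} \), the inverse of matrix W is not required and the standard inverse of matrix M (\( {\mathbf{M}}^{-1} \)) may exist. In the latter case, the standard inverse of matrix L (\( {\mathbf{L}}^{-1} \)) exists and can be written as \( {\mathbf{L}}^{-1}={\left(\mathbf{P}-{\mathbf{W}}^{\prime }{\mathbf{M}}^{-1}\mathbf{W}\right)}^{-1}={\mathbf{P}}^{-1}+{\mathbf{P}}^{-1}{\mathbf{W}}^{\prime }{\left[\mathbf{M}-\mathbf{W}{\mathbf{P}}^{-1}{\mathbf{W}}^{\prime}\right]}^{-1}\mathbf{W}{\mathbf{P}}^{-1} \) (Searle et al. 2006).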

By Eq. (4.26) and because \( {\mathbf{w}}_2^{\prime }=\left[{0}_1\kern0.5em \cdots \kern0.5em {0}_m\right] \), the vector components of \( {\boldsymbol{\upbeta}}_W^{\prime }=\left[{\boldsymbol{\upbeta}}_y^{\prime}\kern0.5em {\boldsymbol{\upbeta}}_m^{\prime}\right] \), or \( {\boldsymbol{\upbeta}}_W={\boldsymbol{\Phi}}^{-}{\boldsymbol{\Psi} \mathbf{a}}_W \), can be written as

$$ {\boldsymbol{\upbeta}}_y=\left[{\mathbf{L}}^{-}\mathbf{C}-{\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\mathbf{W}\right]\mathbf{w} $$
(4.27)

and

$$ {\boldsymbol{\upbeta}}_m=\left[\left({\mathbf{M}}^{-}+{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}{\mathbf{W}}^{\prime }{\mathbf{M}}^{-}\right)\mathbf{W}-{\mathbf{M}}^{-}{\mathbf{W}\mathbf{L}}^{-}\mathbf{C}\right]\mathbf{w}, $$
(4.28)

where w is the vector of economic weights. Suppose that there is no marker information; then, matrices M and W are null and Eq. (4.27) is equal to \( {\boldsymbol{\upbeta}}_y={\mathbf{P}}^{-1}\mathbf{Cw}=\mathbf{b} \) (the LPSI vector of coefficients), whereas βm = 0 and \( {R}_W={k}_I\sqrt{{\boldsymbol{\upbeta}}_W^{\prime }{\boldsymbol{\Phi} \boldsymbol{\upbeta}}_W}={k}_I\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}={R}_I \), the LPSI selection response. Now suppose that the markers explain all the genetic variability; in this case, βy = 0 and \( {\boldsymbol{\upbeta}}_m={\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-}{\mathbf{X}}^{\prime}\mathbf{Y} \), the matrix of linear regression coefficients in the multivariate context, where \( {\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-} \) is a generalized inverse matrix of \( {\mathbf{X}}^{\prime}\mathbf{X} \) and Y is a matrix of phenotypic observations.
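When M is invertible, the partitioned expressions in Eqs. (4.26) to (4.28) can be compared with a direct solution of ΦβW = ΨaW. The following sketch does this for hypothetical matrices built from an assumed generative structure (g = Bm + u, y = g + e) and also checks the identity for the inverse of L = P − W′M⁻¹W quoted from Searle et al. (2006); all names and numbers are illustrative.

```python
import numpy as np

# Hypothetical nonsingular marker covariance matrix (m = 3 markers).
M = np.array([[1.0, 0.6, 0.2],
              [0.6, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
B = np.array([[0.5, 0.3, 0.2],      # assumed marker effects, t = 2 traits
              [0.1, 0.2, 0.4]])
U = np.diag([0.3, 0.2])
R = np.diag([1.5, 1.0])

W = M @ B.T                         # m x t covariance between markers and traits
C = B @ M @ B.T + U
P = C + R
w = np.array([1.0, 0.5])

M_inv = np.linalg.inv(M)
L = P - W.T @ M_inv @ W
L_inv = np.linalg.inv(L)

# Eqs. (4.27) and (4.28) with ordinary inverses in place of generalized ones.
beta_y = (L_inv @ C - L_inv @ W.T @ M_inv @ W) @ w
beta_m = ((M_inv + M_inv @ W @ L_inv @ W.T @ M_inv) @ W
          - M_inv @ W @ L_inv @ C) @ w

# Direct solution of Phi beta_W = Psi a_W for comparison.
Phi = np.block([[P, W.T], [W, M]])
Psi = np.block([[C, W.T], [W, M]])
a_W = np.concatenate([w, np.zeros(M.shape[0])])
beta_direct = np.linalg.solve(Phi, Psi @ a_W)

# Identity for L^{-1} expressed through P^{-1}.
P_inv = np.linalg.inv(P)
L_inv_alt = P_inv + P_inv @ W.T @ np.linalg.inv(M - W @ P_inv @ W.T) @ W @ P_inv
```

Explicit inverses are used here only to mirror the notation of the equations; in practice a linear solver is usually preferable numerically.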

4.2.3 Statistical Properties of GW-LMSI

Assume that H and IW have bivariate joint normal distribution, \( {\boldsymbol{\upbeta}}_W={\boldsymbol{\Phi}}^{-}{\boldsymbol{\Psi} \mathbf{a}}_W \), and P, C, M, W, and w are known; then, the statistical GW-LMSI properties are the same as the LMSI properties. That is,

  1. \( {\sigma}_{I_W}^2={\sigma}_{HI_W} \), i.e., the variance of IW (\( {\sigma}_{I_W}^2 \)) and the covariance between H and IW (\( {\sigma}_{HI_W} \)) are the same.

  2. The maximized correlation between H and IW, or IW accuracy, is \( {\rho}_{HI_W}=\frac{\sigma_{I_W}}{\sigma_H} \).

  3. The variance of the predicted error, \( Var\left(H-{I}_W\right)=\left(1-{\rho}_{HI_W}^2\right){\sigma}_H^2 \), is minimal.

  4. The total variance of H explained by IW is \( {\sigma}_{I_W}^2={\rho}_{HI_W}^2{\sigma}_H^2 \).

According to Lange and Whittaker (2001), GW-LMSI efficiency should be greater than LMSI efficiency. However, this would be true only if matrices P, C, M, and W are known and trait heritability is very low.

4.3 Estimating the LMSI Parameters

When covariance matrices P, C, and S, and the vector of economic weights (w) are known, there is no error in the estimation of the LMSI parameters (selection response, expected genetic gain, etc.); the same is true for the GW-LMSI when, in addition to P, C, and w, the covariance matrices M and W are known. In such cases, the relative efficiency of the LMSI (GW-LMSI) depends only on the heritability of the traits and on the portion of phenotypic variation associated with markers. Using simulated data, Lange and Whittaker (2001) found that GW-LMSI efficiency was higher than LMSI efficiency when trait heritability was 0.2 and matrices P, C, M, and W were known. When P, C, S, M, and W are unknown, it is necessary to estimate them; then, the LMSI and GW-LMSI vector of coefficients and the effects associated with markers are estimated with some error. This error leads to lower LMSI and GW-LMSI efficiency than expected under the assumption that the parameters are known; however, in the latter case, Lange and Whittaker (2001) also found that GW-LMSI efficiency was greater than that of the LMSI when trait heritability was 0.05. Moreover, in the LMSI there is additional bias in the estimation of the parameters because only markers with significant effects are included in the index (Moreau et al. 1998).

In Chap. 2, we described the restricted maximum likelihood (REML) method for estimating matrices P and C. Some authors (Lande and Thompson 1990; Charcosset and Gallais 1996; Hospital et al. 1997; Moreau et al. 1998, 2007) have described methods for estimating marker scores, the variance of the marker scores, the LMSI vector of coefficients, etc., in the context of one trait; however, up to now there have been no reports on the estimation of matrix S in the multi-trait case. Lange and Whittaker (2001) only indicated that matrix S can be estimated as \( \widehat{\mathbf{S}}= Var\left(\widehat{\mathbf{s}}\right) \), where \( \widehat{\mathbf{s}} \) is a vector of estimated marker scores associated with several individual traits.

The main problems associated with the estimated LMSI parameters are:

  1. The estimated values of the covariance matrix S (\( \widehat{\mathbf{S}} \)) tend to overestimate the genetic covariance matrix (C).

  2. The estimated variances of the marker scores can be negative.

When the first point is true, the estimated LMSI selection response and efficiency could be negative because the estimated matrix \( {\widehat{\mathbf{T}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{P}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] \) is not positive definite (all eigenvalues positive) and the estimated matrix \( {\widehat{\mathbf{Z}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{C}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] \) is not positive semi-definite (no negative eigenvalues). In addition, the results can lead to all weights being placed on the molecular score, and the weights on the phenotypic values can be negative (Moreau et al. 2007). When the second point is true, the variance of the marker scores is not useful. The two problems indicated above could be caused by using the same data set to select markers and to estimate marker effects, and there is no simple way of solving them. Lande and Thompson (1990) proposed that the markers used to obtain \( \widehat{\mathbf{S}} \) be selected a priori as those with the most highly significant partial regression coefficients from among all the markers in the linkage group analyzed in the previous generation. Zhang and Smith (1992, 1993) proposed using two independent sets of markers: one to estimate marker effects and the other to select markers. Additional solutions to these problems were described by Moreau et al. (2007).
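A simple diagnostic for the first problem is to inspect the eigenvalues of the estimated block matrices before computing the index. The sketch below uses hypothetical one-trait estimates in which the marker score variance exceeds the genetic variance; the helper name and the numbers are assumptions made for the illustration.

```python
import numpy as np

def min_eigenvalue(A):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(A).min())

# Hypothetical one-trait estimates: the marker score variance (1.5)
# overestimates the genetic variance (1.0).
var_y_hat, var_g_hat, var_s_hat = 4.0, 1.0, 1.5

T_hat = np.array([[var_y_hat, var_s_hat],
                  [var_s_hat, var_s_hat]])
Z_hat = np.array([[var_g_hat, var_s_hat],
                  [var_s_hat, var_s_hat]])

T_is_pd = min_eigenvalue(T_hat) > 0.0          # positive definite?
Z_is_psd = min_eigenvalue(Z_hat) >= -1e-12     # positive semi-definite?

# One-trait weights of Eq. (4.17a) computed from the same estimates.
beta_y_hat = (var_g_hat - var_s_hat) / (var_y_hat - var_s_hat)
beta_s_hat = 1.0 - beta_y_hat
```

In this particular hypothetical case the estimated ZM has a negative eigenvalue and the phenotypic weight is negative, whereas the estimated TM remains positive definite because the estimated marker score variance is still smaller than the phenotypic variance.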

In this subsection, we describe methods (in the univariate and multivariate context) for estimating molecular marker effects, marker scores, and their variance and covariance, and for estimating the LMSI and GW-LMSI vector of coefficients, selection response, expected genetic gain, and accuracy. This subsection is only for illustration; we use the same data set to select markers, and to estimate marker effects and the variance of marker scores.

4.3.1 Estimating the Marker Score

According to Eqs. (4.11) and (4.17b), when the vector of economic weights is equal to \( {\mathbf{a}}^{\prime }=\left[1\kern0.5em 0\right] \), the LMSI for the ith trait yi (i  =  1, 2, ⋯, t; t = number of traits) value can be written as \( {I}_{M_{li}}\kern0.5em =\kern0.5em {s}_i+{\beta}_{y_i}\left({y}_i-{s}_i\right) \) (l  =  1, 2, ⋯, n; n = number of individuals or genotypes), where \( {\beta}_{yi}=\frac{\sigma_{g_i}^2-{\sigma}_{s_i}^2}{\sigma_{y_i}^2-{\sigma}_{s_i}^2}=\frac{h_i^2\left(1-{q}_i\right)}{1-{q}_i{h}_i^2} \) is the LMSI coefficient, \( {h}_i^2=\frac{\sigma_{g_i}^2}{\sigma_{y_i}^2} \) is the heritability of the ith trait, and \( {q}_i=\frac{\sigma_{s_i}^2}{\sigma_{g_i}^2} \) is the proportion of genetic variance explained by the QTL or markers associated with the ith trait; \( {s}_i=\sum \limits_{j=1}^M{\theta}_j{x}_j \) (j  =  1,  2, ⋯, M; M = number of selected markers) is the ith individual trait marker score; and \( {\sigma}_{y_i}^2 \), \( {\sigma}_{g_i}^2 \), and \( {\sigma}_{s_i}^2 \) are the ith variances of the phenotypic, genetic, and marker score values respectively.

The simplest way of estimating the ith marker score si is to perform a multiple linear regression of phenotypic values (yi) on the coded values of the markers (xj) and then select the markers statistically linked to the ith QTL that explain most of the variability in the regression model and use them to construct \( {s}_i=\sum \limits_{j\in M}{\theta}_j{x}_j \).

We can fit the model \( {y}_i^{\ast }=\sum \limits_{j\in M}{\theta}_j{x}_j+e \) by maximum likelihood or least squares, where \( {y}_i^{\ast }={y}_i-{\overline{y}}_i \) and \( {\overline{y}}_i \) is the average value of the ith trait. When estimating θj, the main problem is to choose the set of markers M based on criteria for declaring markers significant; the estimated values of θj (\( {\widehat{\theta}}_j \)) are then used to estimate the ith marker score si as \( {\widehat{s}}_i=\sum \limits_{j\in M}{\widehat{\theta}}_j{x}_j \). The values of \( {\widehat{s}}_i \) may increase or decrease according to the number of markers (xj) included in the model, and \( {\widehat{s}}_i \) affects LMSI selection response and efficiency by means of the estimated variance of \( {\widehat{s}}_i \) (\( {\widehat{\sigma}}_{{\widehat{s}}_i}^2 \)) (Figs. 4.1 and 4.2).

Fig. 4.1 Efficiency of the linear molecular selection index with respect to phenotypic selection for the one-trait case for different values of the variance of the marker score when the phenotypic and genetic variances are fixed

Fig. 4.2 Selection response values of the linear molecular selection index for the one-trait case for different values of the variance of the marker score when the phenotypic and genetic variances are fixed

According to the least squares method of estimation, \( \widehat{\boldsymbol{\uptheta}}={\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime }{\mathbf{y}}^{\ast } \) is an estimator of the vector of regression coefficients \( {\boldsymbol{\uptheta}}^{\prime }=\left[{\theta}_1\kern0.5em {\theta}_2\kern0.5em \cdots \kern0.5em {\theta}_m\right] \), where m (m < n) is the number of markers, X is an n × m matrix of coded marker values (e.g., 1, 0 and −1 for marker genotypes AA, Aa, and aa respectively), and \( {\mathbf{y}}^{\ast } \) is an n × 1 vector of phenotypic values centered on their mean. Only a subset of M (M < m) of the m markers is statistically linked to the QTL, so only the corresponding M elements of the estimated vector \( \widehat{\boldsymbol{\uptheta}} \) are used to estimate si as \( {\widehat{s}}_i=\sum \limits_{j=1}^M{\widehat{\theta}}_j{x}_j \).
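The least squares step can be sketched in Python with NumPy. This is only an illustration under stated assumptions: the data are simulated (they are not the maize data), the names `X`, `y_star`, `selected`, and `s_hat` are hypothetical, and the rule used to pick markers (keeping the largest absolute estimated effects) is a stand-in for whatever significance criterion is actually applied.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 200, 10                      # genotypes and markers (m < n), hypothetical sizes

# Coded marker values 1, 0, -1 with F2-like frequencies 1/4, 1/2, 1/4
X = rng.choice([1.0, 0.0, -1.0], size=(n, m), p=[0.25, 0.5, 0.25])

# Simulated trait: only markers 0 and 3 have non-zero effects (assumption)
theta_true = np.zeros(m)
theta_true[[0, 3]] = [5.0, -4.0]
y = 150.0 + X @ theta_true + rng.normal(0.0, 2.0, size=n)

y_star = y - y.mean()               # phenotypic values centered on their mean

# Least squares estimate; lstsq solves the same problem as (X'X)^{-1} X'y*
theta_hat, *_ = np.linalg.lstsq(X, y_star, rcond=None)

# Illustrative selection rule (an assumption): keep the two largest |theta_hat|
selected = np.sort(np.argsort(np.abs(theta_hat))[-2:])

# Marker score of each genotype, built from the selected markers only
s_hat = X[:, selected] @ theta_hat[selected]
```

In practice the subset of markers would be chosen with a formal test rather than by ranking absolute effects; the ranking is used here only to keep the sketch short.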

To illustrate how to obtain \( {\widehat{s}}_i=\sum \limits_{j\in M}{\widehat{\theta}}_j{x}_j \), we use a real maize (Zea mays) F2 population with 247 genotypes (each one with two repetitions), 195 molecular markers, and four traits – grain yield (GY, ton ha−1); plant height (PHT, cm), ear height (EHT, cm), and anthesis day (AD, days) – evaluated in one environment. In an F2 population, the marker homozygous loci for the allele from the first parental line can be coded by 1, whereas the marker homozygous loci for the allele from the second parental line can be coded by −1, and the marker heterozygous loci by 0.

For this example, we used trait PHT. Only seven markers were statistically linked to the PHT. The estimated vector of regression coefficients for these seven markers was \( \widehat{{\boldsymbol{\uptheta}}^{\prime }}=\left[5.46\kern0.5em -4.54\kern0.5em 0.98\kern0.5em 7.39\kern0.5em -7.75\kern0.5em -1.91\kern0.5em -3.53\right] \). Table 4.1 presents the first 20 genotypes, the coded values of the seven selected markers, and the first 20 estimated \( {\widehat{s}}_{PHT} \) values of the 247 genotypes in the maize (Zea mays) F2 population. According to \( \widehat{{\boldsymbol{\uptheta}}^{\prime }} \) and the coded values of the seven markers, the first estimated \( {\widehat{s}}_{PHT} \) value was obtained as \( {\widehat{s}}_{PHT1}=-1.91(1)-3.53\left(-1\right)=1.62 \); the second estimated \( {\widehat{s}}_{PHT} \) value was obtained as \( {\widehat{s}}_{PHT2}=5.46\left(-1\right)-4.54\left(-1\right)-1.91\left(-1\right)=0.99 \), etc. The 20th estimated \( {\widehat{s}}_{PHT} \) value was obtained as \( {\widehat{s}}_{PHT20}=-3.53\left(-1\right)=3.53 \). This estimation procedure is valid for any number of genotypes and markers.
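The three marker scores worked out above can be reproduced as dot products. In this sketch the coded marker rows are an assumption: they are inferred from the non-zero terms shown in the text (markers that do not appear in a sum are taken to be heterozygous, coded 0), not copied from Table 4.1.

```python
import numpy as np

# Estimated regression coefficients of the seven selected markers (from the text)
theta_hat = np.array([5.46, -4.54, 0.98, 7.39, -7.75, -1.91, -3.53])

# Coded marker values per genotype, inferred from the worked sums (assumption)
x_rows = {
    1:  np.array([0, 0, 0, 0, 0, 1, -1]),
    2:  np.array([-1, -1, 0, 0, 0, -1, 0]),
    20: np.array([0, 0, 0, 0, 0, 0, -1]),
}

# Marker score of each genotype: sum over markers of theta_j * x_j
scores = {g: float(x @ theta_hat) for g, x in x_rows.items()}

for g, s in scores.items():
    print(f"genotype {g}: s_PHT = {s:.2f}")
# genotype 1: s_PHT = 1.62
# genotype 2: s_PHT = 0.99
# genotype 20: s_PHT = 3.53
```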

Table 4.1 Number of selected genotypes, coded values of seven selected markers, and estimated marker score values obtained from a maize (Zea mays) F2 population with 247 genotypes and 195 molecular markers

Figure 4.3 shows the distribution of the 247 estimated marker scores associated with traits PHT and EHT of the maize F2 population. Note that the estimated marker score values are approximately normally distributed.

Fig. 4.3 Distribution of the marker scores associated with traits (a) plant height and (b) ear height of a maize (Zea mays) F2 population. Note that the distribution of frequencies of the marker score values approaches normal distribution

4.3.2 Estimating the Variance of the Marker Score

There are many methods of estimating the variance of the marker score associated with the ith trait (\( {\sigma}_{s_i}^2 \)); the first one was proposed by Lande and Thompson (1990). According to these authors, \( {\sigma}_{s_i}^2 \) can be estimated as

$$ {\widehat{\sigma}}_{{\widehat{s}}_i}^2={\widehat{\boldsymbol{\uptheta}}}_i^{\prime }{\mathbf{M}}_i{\widehat{\boldsymbol{\uptheta}}}_i-\frac{M{\widehat{\sigma}}_{e_i}^2}{n}, $$
(4.29)

where \( {\widehat{\boldsymbol{\uptheta}}}_i \) is the estimated vector of regression coefficients of the selected markers; \( {\mathbf{M}}_i=\frac{2}{n}{\mathbf{X}}_i^{\prime }{\mathbf{X}}_i \) is the M × M covariance matrix of the selected markers that are statistically linked to the ith trait marker loci; \( {\widehat{\sigma}}_{e_i}^2=\frac{{\mathbf{y}}^{\prime}\left(\mathbf{I}-\mathbf{H}\right)\mathbf{y}}{n-M-1} \) is the unbiased estimated variance of the residuals, where \( \mathbf{H}={\mathbf{X}}_i{\left({\mathbf{X}}_i^{\prime }{\mathbf{X}}_i\right)}^{-1}{\mathbf{X}}_i^{\prime } \) is the projection (hat) matrix, so that \( {\mathbf{y}}^{\prime}\left(\mathbf{I}-\mathbf{H}\right)\mathbf{y} \) is the residual sum of squares; I is an n × n identity matrix; M is the number of selected markers statistically linked to the QTL; and Xi is an n × M matrix with the coded values of the selected markers. According to Lande and Thompson (1990), Eq. (4.29) is an unbiased estimator of \( {\sigma}_{s_i}^2 \) and its variance can be written as

$$ Var\left({\widehat{\sigma}}_{{\widehat{s}}_i}^2\right)=\frac{4{\sigma}_{s_i}^2{\sigma}_{e_i}^2}{n}+\frac{2M{\left({\sigma}_{e_i}^2\right)}^2}{n^2}+\frac{2{M}^2{\left({\sigma}_{e_i}^2\right)}^2}{n^2\left(n-M\right)}, $$
(4.30)

which tends to zero as n, the number of genotypes or individuals, becomes very large.

From Eq. (4.29), it is possible to obtain an estimator of the covariance between the ith and jth marker scores when the number of selected markers statistically linked to the QTL is the same in the ith and jth traits. Thus, by Eq. (4.29), the covariance between the ith and jth marker scores can be estimated as

$$ {\widehat{\sigma}}_{{\widehat{s}}_{ij}}={\widehat{\boldsymbol{\uptheta}}}_i^{\prime }{\mathbf{M}}_{ij}{\widehat{\boldsymbol{\uptheta}}}_j-\frac{M{\widehat{\sigma}}_{e_{ij}}}{n}, $$
(4.31)

where \( {\widehat{\boldsymbol{\uptheta}}}_i \) and \( {\widehat{\boldsymbol{\uptheta}}}_j \) are the estimated vectors of regression coefficients of the selected markers associated with the ith and jth trait loci respectively; \( {\mathbf{M}}_{ij}=\frac{2}{n}{\mathbf{X}}_i^{\prime }{\mathbf{X}}_j \) is the M × M covariance matrix of the markers statistically linked to the ith and jth trait marker loci; Xi and Xj are n × M matrices with the coded values of the selected markers associated with the ith and jth trait loci respectively; \( {\widehat{\sigma}}_{e_{ij}}=\frac{{\mathbf{y}}_i^{\prime}\left(\mathbf{I}-{\mathbf{H}}_{ij}\right){\mathbf{y}}_j}{n-M-1} \) is the estimated covariance of the residuals between the ith (yi) and jth (yj) trait values, where \( {\mathbf{H}}_{ij}={\mathbf{X}}_i{\left({\mathbf{X}}_i^{\prime }{\mathbf{X}}_j\right)}^{-1}{\mathbf{X}}_j^{\prime } \); I is an n × n identity matrix; and M is the number of selected markers statistically linked to the QTL.

According to the PHT values described in Sect. 4.3.1 of this chapter, M = 7, n = 247, \( {\widehat{\sigma}}_{e_i}^2=180.80 \) and \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 \) (Eq. 4.29). Note that \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2\le {\widehat{\sigma}}_{g_{PHT}}^2 \), where \( {\widehat{\sigma}}_{g_{PHT}}^2=83.0 \) is an estimate of the genetic variance of PHT. The estimated portion of the genetic variance attributable to \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 \) was \( {\widehat{q}}_{PHT}=\frac{48.23}{83}=0.5811 \); that is, the seven markers explain 58.11% of the genetic variance associated with PHT.
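A minimal sketch of Eq. (4.29) in Python follows. The function name `lande_thompson_variance` and the simulated inputs are hypothetical; the residual variance is computed from the residual sum of squares with n − M − 1 degrees of freedom, and the marker covariance matrix is taken as (2/n)X′X, as defined in the text. The last lines only evaluate the size of the correction term \( M{\widehat{\sigma}}_{e_i}^2/n \) for the PHT values quoted above.

```python
import numpy as np

def lande_thompson_variance(X_sel, y):
    """Marker score variance following Eq. (4.29); X_sel is n x M, y has length n."""
    n, M = X_sel.shape
    y_c = y - y.mean()                                  # centered phenotypes
    theta_hat = np.linalg.solve(X_sel.T @ X_sel, X_sel.T @ y_c)
    resid = y_c - X_sel @ theta_hat
    sigma2_e = resid @ resid / (n - M - 1)              # residual variance
    M_mat = 2.0 / n * X_sel.T @ X_sel                   # marker covariance matrix (text definition)
    return float(theta_hat @ M_mat @ theta_hat - M * sigma2_e / n)

# Simulated illustration (not the maize data)
rng = np.random.default_rng(5)
X_sim = rng.choice([1.0, 0.0, -1.0], size=(247, 7), p=[0.25, 0.5, 0.25])
y_sim = X_sim @ np.array([5.0, -4.0, 1.0, 7.0, -8.0, -2.0, -3.5]) + rng.normal(0, 13, 247)
var_s_sim = lande_thompson_variance(X_sim, y_sim)

# Correction term for the PHT example: M = 7, n = 247, residual variance 180.80
correction_pht = 7 * 180.80 / 247
print(round(correction_pht, 2))   # 5.12
```

With these values, a reported estimate of 48.23 implies that the quadratic form \( {\widehat{\boldsymbol{\uptheta}}}_i^{\prime }{\mathbf{M}}_i{\widehat{\boldsymbol{\uptheta}}}_i \) was about 53.35 before the correction.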

Charcosset and Gallais (1996) considered two possible methods of estimating \( {\sigma}_{s_i}^2 \) based on the coefficient of multiple determination or squared multiple correlation R2 (note that in this case R2 is not the square of the selection response). The coefficient R2 gives the portion of the total variation in the phenotypic values that is “explained” by, or attributable to, the markers and can be written as

$$ {R}^2=\frac{{\widehat{\boldsymbol{\uptheta}}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{y}-n{\overline{y}}^2}{{\mathbf{y}}^{\prime}\mathbf{y}-n{\overline{y}}^2}=\frac{{\widehat{\sigma}}_s^2}{{\widehat{\sigma}}_y^2}, $$
(4.32a)

where \( {\widehat{\boldsymbol{\uptheta}}}^{\prime }{\mathbf{X}}^{\prime}\mathbf{y}-n{\overline{y}}^2 \) is the overall regression sum of squares adjusted for the intercept and \( {\mathbf{y}}^{\prime}\mathbf{y}-n{\overline{y}}^2 \) is the total sum of squares adjusted for the mean. The coefficient R2 is equal to 1 if the fitted equation \( {y}_i={\theta}_0+\sum \limits_{j\in M}{\theta}_j{x}_j+{e}_i \) passes through all the data points, so that all residuals are null; then, the markers explain all the phenotypic variance. At the other extreme, R2 is zero if \( {\overline{y}}_i={\widehat{\theta}}_0 \) and the estimated regression coefficients are null, i.e., \( {\widehat{\theta}}_1={\widehat{\theta}}_2=\cdots ={\widehat{\theta}}_M=0 \). In the latter case, markers do not affect the phenotypic observations and the variance of the marker score values is zero. Thus, the R2 values are between 0 and 1, i.e., 0 ≤ R2 ≤ 1.0. Equation (4.32a) is useful for estimating \( {\sigma}_{s_i}^2 \) as \( {\widehat{\sigma}}_{y_i}^2\sum \limits_{j=1}^M{R}_j^2={\widehat{\sigma}}_s^2 \), where \( {R}_j^2 \) is the estimated R2 value for the jth marker and \( {\widehat{\sigma}}_{y_i}^2 \) is the estimated phenotypic variance of the ith trait; however, this is a biased estimator of \( {\sigma}_{s_i}^2 \) (Hospital et al. 1997).

Charcosset and Gallais (1996) and Hospital et al. (1997) proposed an unbiased estimator of \( {\sigma}_{s_i}^2 \) based on all the selected markers using the adjusted coefficient of multiple determination, i.e.,

$$ {R}_{Adj}^2=1-\frac{n-1}{n-M-1}\left(1-{R}^2\right)=\frac{{\widehat{\sigma}}_s^2}{{\widehat{\sigma}}_y^2}, $$
(4.32b)

whence we can obtain an unbiased estimator of \( {\sigma}_{s_i}^2 \) as \( {\widehat{\sigma}}_y^2{R}_{Adj}^2={\widehat{\sigma}}_{{\widehat{s}}_i}^2 \) by jointly using all the markers that affect the phenotypic values. The problem with Eq. (4.32b) is that the \( {R}_{Adj}^2 \) values could be negative; in that case, the estimated value of \( {\sigma}_{s_i}^2 \) would also be negative. One additional problem with Eq. (4.32b) is that the \( {R}_{Adj}^2 \) values can produce \( {\widehat{\sigma}}_s^2 \) values that are higher than those of the estimated variance of the breeding values \( {\widehat{\sigma}}_g^2 \).

Using Eqs. (4.32a) and (4.32b), we can estimate \( {\sigma}_{s_i}^2 \), but from them it is not clear how we can estimate the covariance between two different estimated marker score values.

Consider the case of the PHT values described in Sect. 4.3.1 of this chapter, where M = 7, n = 247, and the estimated variance of PHT was \( {\widehat{\sigma}}_{PHT}^2=191.81 \). The estimated values of R2 for each of the seven markers were 0.0038, 0.0005, 0.0006, 0.0013, 0.0036, 0.0114, and 0.0298, whence, by multiplying each estimated R2 value by \( {\widehat{\sigma}}_{PHT}^2=191.81 \) and summing the results, we found that the estimated value of \( {\sigma}_{s_{PHT}}^2 \) was \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 \). In this case, the estimated portion of the genetic variance attributable to \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 \) was \( {\widehat{q}}_{PHT}=\frac{9.78}{83}=0.1178 \); thus, when we estimated \( {\sigma}_{s_{PHT}}^2 \) according to Eq. (4.32a), the seven markers explained only 11.78% of the genetic variance associated with PHT.

The estimated value of \( {R}_{Adj}^2 \) for the seven markers jointly was 0.06, whence \( {\widehat{\sigma}}_{s_{PHT}}^2=(191.81)(0.06)=11.50 \) is an estimate of \( {\sigma}_{s_{PHT}}^2 \). In the latter case, the estimated portion of the genetic variance attributable to \( {\widehat{\sigma}}_{s_{PHT}}^2=11.50 \) was \( {\widehat{q}}_{PHT}=\frac{11.5}{83}=0.1385 \); that is, according to Eq. (4.32b), the seven markers explain 13.85% of the genetic variance associated with PHT.
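The adjusted-R2 route of Eq. (4.32b) is simple enough to check numerically. The sketch below uses the values quoted in the text (phenotypic variance 191.81, genetic variance 83.0, adjusted R2 of 0.06); the helper name `adjusted_r2` and the small R2 value of 0.01 used to show a negative result are hypothetical.

```python
def adjusted_r2(r2, n, M):
    """Adjusted coefficient of multiple determination, Eq. (4.32b)."""
    return 1.0 - (n - 1) / (n - M - 1) * (1.0 - r2)

sigma2_y = 191.81        # estimated phenotypic variance of PHT (from the text)
sigma2_g = 83.0          # estimated genetic variance of PHT (from the text)
r2_adj_pht = 0.06        # adjusted R2 reported for the seven markers jointly

sigma2_s = sigma2_y * r2_adj_pht     # estimated marker score variance, about 11.5
q_hat = sigma2_s / sigma2_g          # share of genetic variance explained, about 0.14

# With a weak fit the adjusted R2, and hence the variance estimate, turns negative
r2_adj_weak = adjusted_r2(0.01, n=247, M=7)   # hypothetical R2 = 0.01
print(r2_adj_weak < 0)                        # True
```

The last line illustrates the drawback mentioned above: when R2 is small relative to M/(n − 1), the penalty for the number of markers pushes the estimate below zero.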

One additional way of estimating the variance of the marker score \( {\sigma}_{s_i}^2 \) was proposed by Lange and Whittaker (2001) as

$$ {\widehat{\sigma}}_{{\widehat{s}}_i}^2=\frac{1}{n-1}\sum \limits_{l=1}^n{\left({\widehat{s}}_{il}-{\widehat{\mu}}_{s_i}\right)}^2, $$
(4.33)

where \( {\widehat{s}}_{il}=\sum \limits_{j=1}^M{\widehat{\theta}}_j{x}_{lj} \) is the estimated marker score of the lth individual (l = 1, 2, …, n) for the ith trait, \( {x}_{lj} \) is the coded value of the jth marker in the lth individual, and \( {\widehat{\mu}}_{s_i} \) is the mean of the \( {\widehat{s}}_{il} \) values. The covariance between the ith and jth marker scores can be estimated as the sum of cross products of the centered marker score values divided by n − 1. Note that in this case, the number of markers associated with the ith and jth traits may be different.

For the PHT values described in Sect. 4.3.1 of this chapter, where n = 247, the estimated value of \( {\sigma}_{s_i}^2 \) was \( {\widehat{\sigma}}_{s_{PHT}}^2=15.75 \) and the estimated portion of the genetic variance attributable to \( {\widehat{\sigma}}_{s_{PHT}}^2=15.75 \) was \( {\widehat{q}}_{PHT}=\frac{15.75}{83}=0.1897 \). That is, the seven markers jointly explain 18.97% of the genetic variance associated with PHT according to Eq. (4.33).
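Equation (4.33) is the ordinary sample variance of the marker scores, and the covariance between two traits is the matching sample covariance. A minimal NumPy sketch, using simulated scores (not the maize data) and hypothetical variable names:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 247

# Hypothetical marker scores of n individuals for two traits
s_trait_i = rng.normal(0.0, 4.0, size=n)
s_trait_j = 0.3 * s_trait_i + rng.normal(0.0, 5.0, size=n)

# Eq. (4.33): squared deviations from the mean divided by n - 1
var_i = np.var(s_trait_i, ddof=1)

# Covariance matrix of marker scores; np.cov divides by n - 1 by default
S_hat = np.cov(np.vstack([s_trait_i, s_trait_j]))

# Same covariance written out as centered cross products over n - 1
cov_ij = (s_trait_i - s_trait_i.mean()) @ (s_trait_j - s_trait_j.mean()) / (n - 1)
```

Because only the score vectors enter the computation, the two traits may be scored with different sets of markers.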

4.3.3 Estimating LMSI Selection Response and Efficiency

With the estimated phenotypic variances (\( {\widehat{\sigma}}_{PHT}^2=191.81 \)), the estimated genetic variance (\( {\widehat{\sigma}}_{g_{PHT}}^2=83.0 \)) and the estimated marker score variances: \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 \) (Eq. 4.29), \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 \) (Eq. 4.32a), \( {\widehat{\sigma}}_{s_{PHT}}^2=11.50 \) (Eq. 4.32b), and \( {\widehat{\sigma}}_{s_{PHT}}^2=15.75 \) (Eq. 4.33), we can estimate the LMSI coefficient, selection response, and efficiency.

Using the estimated value \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 \) obtained with Eq. (4.29), it is possible to estimate the LMSI weight as \( {\widehat{\beta}}_{PHT}=\frac{{\widehat{\sigma}}_{g_{PHT}}^2-{\widehat{\sigma}}_{s_{PHT}}^2}{{\widehat{\sigma}}_{PHT}^2-{\widehat{\sigma}}_{s_{PHT}}^2}=\frac{83.0-48.23}{191.81-48.23}=0.242 \), whereas for \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 \), \( {\widehat{\sigma}}_{s_{PHT}}^2=11.50 \), and \( {\widehat{\sigma}}_{s_{PHT}}^2=15.75 \), the estimated values of βPHT were 0.402, 0.40, and 0.382 respectively. These results indicate that the estimated weight βPHT on the phenotypic values tends to decrease as the estimated variance of the marker score increases. At the limit, when all the genetic variance is explained by the markers, the estimated value of βPHT is zero and the estimated LMSI is equal to \( {\widehat{I}}_M=\widehat{s} \). Thus, for trait PHT, when the estimated value of βPHT is not zero, the estimated LMSI can be written as \( {\widehat{I}}_{M_{PHT}}={\widehat{s}}_{PHT}+{\widehat{\beta}}_{PHT}\left({PHT}_i-{\widehat{s}}_{PHT}\right) \). The \( {\widehat{I}}_{M_{PHT}} \) values are used to predict the net genetic merit of each candidate for selection and to rank and select the candidates.
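The four LMSI weights quoted above can be recomputed directly from the variance estimates. In this sketch the function name `lmsi_weight` and the dictionary keys are hypothetical labels; all numbers come from the text.

```python
sigma2_y = 191.81   # estimated phenotypic variance of PHT
sigma2_g = 83.0     # estimated genetic variance of PHT

# Estimated marker score variances, keyed by the equation that produced them
sigma2_s = {"Eq. 4.29": 48.23, "Eq. 4.32a": 9.78, "Eq. 4.32b": 11.50, "Eq. 4.33": 15.75}

def lmsi_weight(s2, g2=sigma2_g, y2=sigma2_y):
    """Weight on the phenotype in the one-trait LMSI: (g2 - s2) / (y2 - s2)."""
    return (g2 - s2) / (y2 - s2)

beta = {label: lmsi_weight(s2) for label, s2 in sigma2_s.items()}

for label, b in beta.items():
    print(f"{label}: beta_PHT = {b:.3f}")
```

Setting the marker score variance equal to the genetic variance, `lmsi_weight(83.0)`, returns zero, which is the limiting case in which the index reduces to the marker score alone.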

Based on the result \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 \) obtained with Eq. (4.29) and using a selection intensity of 10% (kI= 1.755), the estimated LMSI selection response can be obtained as

$$ {\widehat{R}}_M={k}_I\sqrt{\frac{{\widehat{\sigma}}_g^2\left({\widehat{\sigma}}_g^2-{\widehat{\sigma}}_s^2\right)+{\widehat{\sigma}}_s^2\left({\widehat{\sigma}}_y^2-{\widehat{\sigma}}_g^2\right)}{{\widehat{\sigma}}_y^2-{\widehat{\sigma}}_s^2}}=1.755\sqrt{\frac{83\left(83-48.23\right)+48.23\left(191.81-83\right)}{191.81-48.23}} $$
$$ =1.755\sqrt{56.65}=13.21. $$

In a similar manner, using the result \( {\widehat{\sigma}}_{s_{PHT}}^2=15.75 \), the estimated selection response was \( {\widehat{R}}_M=1.755\sqrt{\frac{83\left(83-15.75\right)+15.75\left(191.81-83\right)}{191.81-15.75}}=1.755\sqrt{41.44}=11.30. \) With \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 \) and \( {\widehat{\sigma}}_{s_{PHT}}^2=11.50 \), the estimated values of the LMSI selection responses were 10.99 and 11.10 respectively. The latter results indicate that the estimated values of the LMSI selection responses tend to increase when the estimated values of the variance of the marker score increase.
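The selection response formula can be evaluated for each of the four marker score variances in a few lines. The function name `lmsi_response` is hypothetical; the variances and the selection intensity (kI = 1.755) are those given in the text, and the results agree with the reported responses up to rounding.

```python
import math

def lmsi_response(s2, g2=83.0, y2=191.81, k=1.755):
    """One-trait LMSI selection response for marker score variance s2."""
    numerator = g2 * (g2 - s2) + s2 * (y2 - g2)
    return k * math.sqrt(numerator / (y2 - s2))

# Marker score variances from Eqs. (4.32a), (4.32b), (4.33), and (4.29), in increasing order
variances = [9.78, 11.50, 15.75, 48.23]
responses = [lmsi_response(s2) for s2 in variances]

for s2, r in zip(variances, responses):
    print(f"sigma2_s = {s2:5.2f} -> R_M = {r:.2f}")
```

Listing the variances in increasing order makes the trend described above visible: the response grows with the share of genetic variance captured by the markers.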

We can estimate LMSI versus phenotypic efficiency for one trait as \( {\widehat{\lambda}}_M=\sqrt{\frac{\widehat{q}}{{\widehat{h}}^2}+\frac{{\left(1-\widehat{q}\right)}^2}{1-{\widehat{q}\widehat{h}}^2}} \), where \( {\widehat{h}}^2 \) is the estimated trait heritability and \( \widehat{q}=\frac{{\widehat{\sigma}}_s^2}{{\widehat{\sigma}}_g^2} \) is the estimated portion of additive genetic variance explained by the markers. When \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=48.23 \), \( {\widehat{q}}_{PHT}=\frac{48.23}{83}=0.5811 \), and \( {\widehat{h}}^2=0.433 \), the estimated LMSI efficiency was \( {\widehat{\lambda}}_M=\sqrt{1.58}=1.25 \). For \( {\widehat{\sigma}}_{s_{PHT}}^2=15.75 \), \( {\widehat{\sigma}}_{{\widehat{s}}_{PHT}}^2=9.78 \), and \( {\widehat{\sigma}}_{s_{PHT}}^2=11.50 \), the estimated portions of the additive genetic variance explained by the markers were \( {\widehat{q}}_{PHT}=\frac{15.75}{83}=0.1897 \), \( {\widehat{q}}_{PHT}=\frac{9.78}{83}=0.1178 \), and \( {\widehat{q}}_{PHT}=\frac{11.5}{83}=0.1385 \) respectively, whence the estimated LMSI efficiencies were 1.1, 1.04, and 1.05 respectively. The latter results indicate that the estimated values of LMSI efficiency tend to increase when the estimated values of the variance of the marker score increase (Fig. 4.1).
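The efficiency formula can be checked the same way. The function name `lmsi_efficiency` is hypothetical; the heritability (0.433) and the four values of \( \widehat{q} \) are taken from the text, and the results match the reported efficiencies up to rounding.

```python
import math

def lmsi_efficiency(q, h2):
    """Efficiency of the one-trait LMSI relative to phenotypic selection."""
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))

h2 = 0.433                                  # estimated heritability of PHT
q_values = [0.1178, 0.1385, 0.1897, 0.5811]  # estimated share of genetic variance explained

efficiency = {q: lmsi_efficiency(q, h2) for q in q_values}

for q, lam in efficiency.items():
    print(f"q = {q:.4f} -> lambda_M = {lam:.2f}")
```

When q is zero the function returns 1, so the LMSI is then no more efficient than phenotypic selection.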

Figure 4.1 presents the change in LMSI efficiency with respect to phenotypic selection for different values of the variance of the marker score when the phenotypic (191.81) and genetic (83) variances are fixed. In a similar manner, Fig. 4.2 presents the change in the LMSI selection response for different values of the variance of the marker score when the phenotypic (191.81) and genetic (83) variances are fixed. In effect, LMSI efficiency and the selection response depend on the genetic variance explained by the markers.

4.3.4 Estimating the Variance of the Marker Score in the Multi-Trait Case

Equation (4.33) can be used in the multi-trait context when the numbers of markers associated with the ith and jth traits are different. Also, it is possible to adapt Eqs. (4.32a) and (4.32b) to the multi-trait case. However, in the latter case, in addition to the markers linked to the QTL that affect one specific trait, we need to find markers that affect more than one trait, which may be very difficult. For this reason, in the multi-trait context, Eqs. (4.32a) and (4.32b) could be used to estimate the variance of the marker score (S) without preselecting the markers that affect the phenotypic traits, only when the number of genotypes is higher than the number of markers.

Let y1, y2, …, yt be t multivariate normal vectors of observations, each with n observations, such that \( \mathbf{Y}=\left[\begin{array}{cccc}{y}_{11}& {y}_{12}& \cdots & {y}_{1t}\\ {}{y}_{21}& {y}_{22}& \cdots & {y}_{2t}\\ {}\vdots & \vdots & \ddots & \vdots \\ {}{y}_{n1}& {y}_{n2}& \cdots & {y}_{nt}\end{array}\right] \) is an n × t matrix of observations for t traits; then, the multivariate linear regression model can be written as Y = XB + U, where X is an n × m matrix (m = number of markers and m < n) of known coded marker values, B is an m × t matrix of regression coefficients, and U is an n × t matrix of unobserved random disturbances whose rows for given X are uncorrelated, each with mean 0 and common covariance matrix E (Mardia et al. 1982; Rencher 2002). According to the least squares method of estimation, \( \widehat{\mathbf{B}}={\left({\mathbf{X}}^{\prime}\mathbf{X}\right)}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y} \) is an estimator of B and \( \widehat{\mathbf{E}}=\frac{{\left(\mathbf{Y}-\mathbf{X}\widehat{\mathbf{B}}\right)}^{\prime}\left(\mathbf{Y}-\mathbf{X}\widehat{\mathbf{B}}\right)}{n-m-1} \) is an estimator of the residual covariance matrix E assuming that n > m (Johnson and Wichern 2007).

Note that \( 1-{R}^2=\frac{{\widehat{\mathbf{e}}}^{\prime}\widehat{\mathbf{e}}}{{\mathbf{y}}^{\prime}\mathbf{y}} \), where \( \widehat{\mathbf{e}} \) is a vector of estimated residual values of the model \( {y}_i={\theta}_0+\sum \limits_{j\in M}{\theta}_j{x}_j+{e}_i \) and R2 is the coefficient of multiple determination (Eq. 4.32a). In addition, as in the multi-trait context the estimated matrix of residuals is \( \widehat{\mathbf{U}}=\mathbf{Y}-\mathbf{X}\widehat{\mathbf{B}} \), 1 − R2 can be written as \( \mathbf{D}={\left({\mathbf{Y}}^{\prime}\mathbf{Y}\right)}^{-1}{\widehat{\mathbf{U}}}^{\prime}\widehat{\mathbf{U}} \) (Mardia et al. 1982), whence R2 in the multivariate context can be written as

$$ {\mathbf{R}}^2=\mathbf{I}-\mathbf{D}={\widehat{\mathbf{P}}}^{-1}\widehat{\mathbf{S}}, $$
(4.34a)

whereas \( {R}_{Adj}^2 \) (Eq. 4.32b) can be written as

$$ {\mathbf{R}}_{Adj}^2=\mathbf{I}-\frac{n-1}{n-m-1}\mathbf{D}={\widehat{\mathbf{P}}}^{-1}\widehat{\mathbf{S}}, $$
(4.34b)

where I is an identity matrix t × t, \( {\widehat{\mathbf{P}}}^{-1} \) is the inverse of the estimated covariance matrix of phenotypic values (\( \widehat{\mathbf{P}} \)), and \( \widehat{\mathbf{S}} \) is the estimated covariance matrix of marker score values. From Eq. (4.34b),

$$ \widehat{\mathbf{P}}{\mathbf{R}}_{Adj}^2=\widehat{\mathbf{S}} $$
(4.34c)

is an unbiased estimator of matrix S, whereas \( \widehat{\mathbf{P}}{\mathbf{R}}^2=\widehat{\mathbf{S}} \) (Eq. 4.34a) is a biased estimator of matrix S. The main problem of Eq. (4.34c) is that the diagonal elements of \( \widehat{\mathbf{S}} \) could be negative.
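The matrix versions of R2 and adjusted R2 can be sketched with NumPy as follows. This is an illustration under stated assumptions: the data are simulated, the variable names are hypothetical, both X and Y are centered so that no intercept is needed, and the phenotypic covariance matrix is taken as the plain sample covariance of Y rather than a REML estimate.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m, t = 300, 5, 2                        # hypothetical sizes with n > m

X = rng.choice([1.0, 0.0, -1.0], size=(n, m), p=[0.25, 0.5, 0.25])
B_true = rng.normal(0.0, 2.0, size=(m, t))
Y = X @ B_true + rng.normal(0.0, 5.0, size=(n, t))

# Center columns so that the regression needs no intercept
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

B_hat = np.linalg.solve(Xc.T @ Xc, Xc.T @ Yc)     # m x t least squares estimate
U_hat = Yc - Xc @ B_hat                           # n x t residual matrix

D = np.linalg.solve(Yc.T @ Yc, U_hat.T @ U_hat)   # (Y'Y)^{-1} U'U
I_t = np.eye(t)
R2 = I_t - D                                      # Eq. (4.34a)
R2_adj = I_t - (n - 1) / (n - m - 1) * D          # Eq. (4.34b)

P_hat = Yc.T @ Yc / (n - 1)                       # sample phenotypic covariance matrix
S_biased = P_hat @ R2                             # from Eq. (4.34a)
S_adj = P_hat @ R2_adj                            # Eq. (4.34c)
```

In this setting `S_biased` equals the sample covariance matrix of the fitted values `Xc @ B_hat`, which is one way to see why Eq. (4.34a) counts fitted noise as marker variance and why the adjustment in Eq. (4.34b) shrinks it.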

From the maize F2 population including 247 genotypes (each one with two repetitions) and 195 molecular markers described in Sect. 4.3.1, we used two traits—PHT (cm) and EHT (cm)—to illustrate the multivariate method of estimating the LMSI parameters. The estimated phenotypic and genetic covariance matrices were \( \widehat{\mathbf{P}}=\left[\begin{array}{cc}191.81& 106.89\\ {}106.89& 167.93\end{array}\right] \) and \( \widehat{\mathbf{C}}=\left[\begin{array}{cc}83.00& 57.44\\ {}57.44& 59.80\end{array}\right] \), whereas the estimated covariance matrix of marker scores, using Eq. (4.33), was \( \widehat{\mathbf{S}}=\left[\begin{array}{cc}15.750& 0.983\\ {}0.983& 28.083\end{array}\right] \). When we used Eq. (4.34a) and Eq. (4.34c), we obtained estimated values of the variance and covariance of the marker scores that were higher than the genetic values (data not presented). Equations (4.29) and (4.31) are used later to compare LMSI efficiency versus GW-LMSI efficiency using the simulated data described in Chap. 2, Sect. 2.8.1.

With matrices \( \widehat{\mathbf{P}} \), \( \widehat{\mathbf{C}} \), and \( \widehat{\mathbf{S}} \), and the vector of economic weights \( {\mathbf{a}}^{\prime }=\left[{\mathbf{w}}^{\prime}\kern0.5em {\mathbf{0}}^{\prime}\right] \), where \( {\mathbf{w}}^{\prime }=\left[-1\kern0.5em -1\right] \) and \( {\mathbf{0}}^{\prime }=\left[0\kern0.5em 0\right] \), we obtained the estimated matrices \( {\widehat{\mathbf{T}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{P}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] \) and \( {\widehat{\mathbf{Z}}}_M=\left[\begin{array}{cc}\widehat{\mathbf{C}}& \widehat{\mathbf{S}}\\ {}\widehat{\mathbf{S}}& \widehat{\mathbf{S}}\end{array}\right] \), whence the estimated LMSI vector of coefficients was \( {\widehat{\boldsymbol{\upbeta}}}^{\prime }={\mathbf{a}}^{\prime }{\widehat{\mathbf{Z}}}_M{\widehat{\mathbf{T}}}_M^{-1}=\left[-0.59\kern0.5em -0.18\kern0.5em -0.41\kern0.5em -0.82\right] \). Using a selection intensity of 10% (kI = 1.755), the estimated LMSI selection response and the expected genetic gains per trait were \( {\widehat{R}}_M={k}_I\sqrt{\widehat{{\boldsymbol{\upbeta}}^{\prime }}{\widehat{\mathbf{T}}}_M\widehat{\boldsymbol{\upbeta}}}=20.41 \) and \( {\widehat{\mathbf{E}}}_M^{\prime }={k}_I\frac{\widehat{{\boldsymbol{\upbeta}}^{\prime }}{\widehat{\mathbf{Z}}}_M}{\sqrt{\widehat{{\boldsymbol{\upbeta}}^{\prime }}{\widehat{\mathbf{T}}}_M\widehat{\boldsymbol{\upbeta}}}}=\left[-10.09\kern0.5em -10.31\kern0.5em -2.53\kern0.5em -4.39\right] \) respectively, whereas the estimated LMSI accuracy was \( {\widehat{\rho}}_{H{\widehat{I}}_M}=\frac{{\widehat{\sigma}}_{I_M}}{{\widehat{\sigma}}_H}=0.72 \).
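The multi-trait LMSI results above can be reproduced from the three estimated covariance matrices. In the sketch below the matrices, economic weights, and selection intensity are those given in the text; the variable names are hypothetical, and the standard deviation of the net genetic merit is computed as the square root of w′Cw.

```python
import numpy as np

P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic covariance matrix
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic covariance matrix
S = np.array([[15.750, 0.983], [0.983, 28.083]])     # marker score covariance matrix
a = np.array([-1.0, -1.0, 0.0, 0.0])                 # economic weights [w' 0']
k_I = 1.755                                          # 10% selection intensity

T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])

# beta = T_M^{-1} Z_M a (both matrices are symmetric, so this equals a' Z_M T_M^{-1})
beta = np.linalg.solve(T_M, Z_M @ a)

sigma_I = np.sqrt(beta @ T_M @ beta)       # standard deviation of the index
R_M = k_I * sigma_I                        # selection response
E_M = k_I * (beta @ Z_M) / sigma_I         # expected genetic gains per trait
sigma_H = np.sqrt(a @ Z_M @ a)             # standard deviation of net genetic merit
accuracy = sigma_I / sigma_H

print(np.round(beta, 2), round(R_M, 2), np.round(E_M, 2), round(accuracy, 2))
```

Note that the coefficients on each trait's phenotype and marker score add up to the economic weight of that trait (for example, −0.59 and −0.41 sum to −1), which is a quick consistency check on the solution.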

The estimated LPSI parameters (see Chap. 2 for details) using the phenotypic information from the maize F2 population for traits PHT and EHT are as follows. The estimated LPSI vector of coefficients was \( \widehat{{\mathbf{b}}^{\prime }}={\mathbf{w}}^{\prime}\widehat{\mathbf{C}}{\widehat{\mathbf{P}}}^{-1}=\left[-0.53\kern0.5em -0.36\right] \), and, with a selection intensity of 10% (kI = 1.755), the estimated LPSI selection response and the expected genetic gains per trait were \( {\widehat{R}}_I={k}_I\sqrt{\widehat{{\mathbf{b}}^{\prime }}\widehat{\mathbf{P}}\widehat{\mathbf{b}}}=18.97 \) and \( \widehat{{\mathbf{E}}^{\prime }}={k}_I\frac{{\widehat{\mathbf{b}}}^{\prime}\widehat{\mathbf{C}}}{{\widehat{\sigma}}_I}=\left[-10.52\kern0.5em -8.45\right] \) respectively, whereas the estimated LPSI accuracy was \( {\widehat{\rho}}_{H\widehat{I}}=\frac{{\widehat{\sigma}}_I}{{\widehat{\sigma}}_H}=0.67 \).
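For comparison, the LPSI quantities follow from the phenotypic and genetic covariance matrices alone. As before, the matrices, weights, and selection intensity come from the text and the variable names are hypothetical.

```python
import numpy as np

P = np.array([[191.81, 106.89], [106.89, 167.93]])   # phenotypic covariance matrix
C = np.array([[83.00, 57.44], [57.44, 59.80]])       # genetic covariance matrix
w = np.array([-1.0, -1.0])                           # economic weights
k_I = 1.755                                          # 10% selection intensity

# b = P^{-1} C w (P and C are symmetric, so this equals w' C P^{-1})
b = np.linalg.solve(P, C @ w)

sigma_I = np.sqrt(b @ P @ b)             # standard deviation of the index
R_I = k_I * sigma_I                      # selection response
E = k_I * (b @ C) / sigma_I              # expected genetic gains per trait
accuracy = sigma_I / np.sqrt(w @ C @ w)  # correlation between index and net genetic merit

print(np.round(b, 2), round(R_I, 2), np.round(E, 2), round(accuracy, 2))
```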

We can determine LMSI efficiency versus LPSI efficiency to predict the net genetic merit using the ratio of estimated accuracy values \( {\widehat{\rho}}_{H{\widehat{I}}_M}=0.72 \) and \( {\widehat{\rho}}_{H\widehat{I}}=0.67 \) of the LMSI and LPSI respectively, i.e., \( {\widehat{\lambda}}_M=\frac{0.72}{0.67}=1.075 \), whence, according to Eq. (4.19), the estimated LMSI efficiency versus the LPSI efficiency, in percentage terms, was \( {\widehat{p}}_M=100\left(1.075-1\right)=7.5 \). That is, for these data, the estimated LMSI efficiency was only 7.5% greater than LPSI efficiency at predicting the net genetic merit.

4.4 Estimating the GW-LMSI Parameters in the Asymptotic Context

Lange and Whittaker (2001) proposed the GW-LMSI. However, these authors did not provide detailed procedures for estimating matrices P, C, W, and M. They indicated that matrix C can be estimated using the estimated matrix of covariance of marker scores (\( \widehat{\mathbf{S}} \)) and that matrices P, W, and M can be estimated directly by their empirical variances and covariances, but this assertion does not indicate a clear method for estimating those covariance matrices. In Chap. 2, we described the REML method of estimating C and P. Crossa and Cerón-Rojas (2011) described matrices W and M in a doubled haploid population. In this section, we describe and estimate matrices W and M for an F2 population in the asymptotic context according to the Wright and Mowers (1994) approach, which is based on regressing phenotype values on marker coded values. We used this latter approach to estimate W and M because it is a clearer estimation method than that of Lange and Whittaker (2001); however, the Wright and Mowers (1994) approach is an asymptotic method and its results should be treated with caution.

Matrix M is the covariance matrix of the molecular marker code values. All marker information used to construct matrix M is presented in Table 4.2. Based on this information, we found that the expectations (E(X1) and E(X2)) and the variances (V(X1) and V(X2)) of the marker coded values X1 and X2 are E(X1) = E(X2) = 0 and V(X1) = V(X2) = 1, whereas the covariance (Cov(X1, X2)) and correlation (Corr(X1, X2)), between X1 and X2 were

$$ Cov\left({X}_1,{X}_2\right)= Corr\left({X}_1,{X}_2\right)=1-2\delta . $$
(4.35)
Table 4.2 Marker genotypes, expected frequency, and coded values (X1 and X2) of the marker genotypes in an F2 population

Thus, as the variances of X1 and X2 are equal to 1, the correlation between X1 and X2 is \( Corr\left({X}_1,{X}_2\right)=\frac{Cov\left({X}_1,{X}_2\right)}{\sqrt{V\left({X}_1\right)V\left({X}_2\right)}}=1-2\delta \), i.e., the covariance and correlation between X1 and X2 are the same. Applying Eq. (4.35) to every pair of markers gives analogous results, and this is how matrix M is constructed.

Let X be a matrix of coded markers of size n × m, where n ≥ m and m = number of markers; then according to Wright and Mowers (1994), because all marker information is contained in matrix \( {\mathbf{X}}^{\prime}\mathbf{X} \), when the number of observations (n) tends to infinity, the product \( {\mathbf{x}}_i^{\prime }{\mathbf{x}}_j/n \) tends to the covariance between the ith and jth markers, whence matrix \( {n}^{-1}{\mathbf{X}}^{\prime}\mathbf{X} \) should tend to the covariance matrix of the markers that make up matrix X, with the ijth element equal to (0.5 − δij). Thus, matrix \( 2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{X} \) should tend to a covariance matrix where the ijth entry is equal to (1 − 2δij). Based on the latter result, an estimator of matrix M in the asymptotic context is

$$ \widehat{\mathbf{M}}=2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{X}. $$
(4.36)

Equation (4.36) is an asymptotic result and should be taken with caution. To date, there has been no clear method for estimating M in the non-asymptotic context; for this reason, Eq. (4.36) is used to estimate the GW-LMSI parameters.
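The asymptotic behavior of Eq. (4.36) can be illustrated by simulation. The sketch below is an assumption-laden toy model, not the procedure used for the results in this chapter: each F2 individual is built as the sum of two independent F1 gametes, each gamete contributes +0.5 or −0.5 at a marker depending on its parental origin, and the two markers recombine with a hypothetical probability `delta`.

```python
import numpy as np

rng = np.random.default_rng(3)
n, delta = 200_000, 0.1        # hypothetical sample size and recombination value

def gametes(n, delta, rng):
    """Parental origin (+0.5 or -0.5) at two linked markers for n F1 gametes."""
    first = rng.choice([0.5, -0.5], size=n)
    recombined = rng.random(n) < delta
    second = np.where(recombined, -first, first)
    return np.column_stack([first, second])

# An F2 individual is the sum of two gametes, giving coded values 1, 0, -1
X = gametes(n, delta, rng) + gametes(n, delta, rng)

M_hat = 2.0 / n * X.T @ X      # Eq. (4.36)
print(np.round(M_hat, 2))      # diagonal near 1, off-diagonal near 1 - 2*delta
```

With a large n the diagonal of `M_hat` is close to 1 and the off-diagonal entry is close to 1 − 2δ; with a few hundred genotypes the sampling error is much larger, which is why the asymptotic result calls for caution.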

Assume that a QTL is between the two markers in Table 4.2; then, δ can be written as δ = r1 + r2 − 2r1r2, where r1 and r2 denote the recombination frequencies between the QTL and marker 1 and between the QTL and marker 2 respectively. When the number of genotypes or individuals tends to infinity, the covariance between the phenotypic trait values (y) and the marker 1 coded values (X1) in an F2 population can be written as

$$ Cov\left({X}_1,y\right)=\frac{1}{2}{\alpha}_1\left(1-2{r}_1\right), $$
(4.37)

where α1(1 − 2r1) is the portion of the additive effect (α1) of the QTL linked to marker 1 (Edwards et al. 1987), and r1 is the recombination frequency between the QTL and marker 1. We can assume that, for many markers, the covariance between each marker and the phenotypic values has the same form as Eq. (4.37), whence matrix W can be obtained.

Let y be an n × 1 vector of recorded phenotypic values, where n denotes the number of observations or records, and let X be a matrix of coded markers of size n × m. When n tends to infinity, \( 2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{y} \) tends to a vector with elements equal to αi(1 − 2ri), where αi is the additive effect of the ith QTL linked to the ith marker, and ri is the recombination frequency between the ith QTL and the ith marker. Now let \( \mathbf{Y}=\left[\begin{array}{cccc}{y}_{11}& {y}_{12}& \cdots & {y}_{1t}\\ {}{y}_{21}& {y}_{22}& \cdots & {y}_{2t}\\ {}\vdots & \vdots & \cdots & \vdots \\ {}{y}_{n1}& {y}_{n2}& \cdots & {y}_{nt}\end{array}\right] \) be a matrix of observations for t traits; then, an estimator of matrix W in the asymptotic context is

$$ \widehat{\mathbf{W}}=2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y}. $$
(4.38)

Once again, Eq. (4.38) is an asymptotic result and should be accepted with caution. But to date, there has been no clear method for estimating W in the non-asymptotic context; for this reason, Eq. (4.38) is used to estimate the GW-LMSI parameters.
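As an illustration of Eqs. (4.37) and (4.38), the sketch below simulates one marker linked to one QTL in an F2 population and compares \( 2{n}^{-1}{\mathbf{X}}^{\prime}\mathbf{Y} \) with α(1 − 2r) for two traits. It rests on several assumptions that are not taken from the text: marker and QTL genotypes are coded 1, 0, and −1, the QTL effects in `alpha`, the recombination frequency `r`, and the unit residual variance are hypothetical values, and the trait means are zero by construction.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000                      # large n, because Eq. (4.38) is asymptotic
r = 0.05                         # hypothetical marker-QTL recombination frequency
alpha = np.array([2.0, -1.0])    # hypothetical QTL effects on two traits

def f1_gametes(n, r, rng):
    """Return marker and QTL alleles (1 = parent 1, 0 = parent 2) of F1 gametes."""
    qtl = rng.integers(0, 2, size=n)
    # The marker allele differs in parental origin from the QTL allele
    # with probability r.
    marker = np.where(rng.random(n) < r, 1 - qtl, qtl)
    return marker, qtl

# Each F2 individual is formed from two independent F1 gametes.
m_a, q_a = f1_gametes(n, r, rng)
m_b, q_b = f1_gametes(n, r, rng)
x = (m_a + m_b - 1).astype(float)    # coded marker genotype: 1, 0, -1
q = (q_a + q_b - 1).astype(float)    # coded QTL genotype: 1, 0, -1

X = x[:, None]                                    # n x m matrix with m = 1
E = rng.normal(0.0, 1.0, size=(n, alpha.size))    # residuals
Y = q[:, None] * alpha + E                        # n x t matrix of trait values

# Asymptotic estimator of Eq. (4.38): W_hat = 2 n^{-1} X'Y.
W_hat = 2.0 * X.T @ Y / n
expected = alpha * (1 - 2 * r)

print(np.round(W_hat, 3))
print(expected)
```

Each entry of `W_hat` should approach the corresponding entry of `expected` as n grows. With real data whose trait means are not zero, centering the columns of Y before applying the estimator would be a reasonable precaution, although the text does not state this step.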

4.5 Comparing LMSI Versus LPSI and GW-LMSI Efficiency

To compare the efficiency of the LMSI with that of the LPSI and the GW-LMSI in predicting the net genetic merit, we use the simulated data set described in Chap. 2, Sect. 2.8.1.

Figure 4.4 presents the estimated accuracy values of the LPSI (\( {\widehat{\rho}}_{H\widehat{I}}=\frac{{\widehat{\sigma}}_{\widehat{I}}}{{\widehat{\sigma}}_H} \)), the LMSI (\( {\widehat{\rho}}_{H{\widehat{I}}_M}=\frac{{\widehat{\sigma}}_{{\widehat{I}}_M}}{{\widehat{\sigma}}_H} \)), and the GW-LMSI (\( {\widehat{\rho}}_{H{\widehat{I}}_W}=\frac{{\widehat{\sigma}}_{{\widehat{I}}_W}}{{\widehat{\sigma}}_H} \)) for five simulated selection cycles. In addition, Table 4.3 presents the estimated LPSI, LMSI, and GW-LMSI selection responses, the estimated LPSI, LMSI, and GW-LMSI variances of the predicted error (\( \left(1-{\widehat{\rho}}_{H\widehat{I}}^2\right){\widehat{\sigma}}_H^2 \), \( \left(1-{\widehat{\rho}}_{H{\widehat{I}}_M}^2\right){\widehat{\sigma}}_H^2 \) and \( \left(1-{\widehat{\rho}}_{H{\widehat{I}}_W}^2\right){\widehat{\sigma}}_H^2 \) respectively), the ratios of the estimated LMSI accuracy to the estimated LPSI accuracy and the estimated LMSI accuracy to the estimated GW-LMSI accuracy, expressed as percentages (Eq. 4.19), for five simulated selection cycles.

Fig. 4.4

Estimated correlation values of the linear phenotypic selection index (LPSI), the linear marker selection index (LMSI), and the genome-wide LMSI (GW-LMSI) with the net genetic merit for four traits, 2500 markers and 500 genotypes (each with four repetitions) in one environment for five simulated selection cycles

Table 4.3 Estimated linear phenotypic, marker, and genome-wide selection indices (LPSI, LMSI, and GW-LMSI respectively), selection responses and variances of the predicted error, and estimated ratios of LMSI accuracy to LPSI and GW-LMSI accuracy expressed as percentages for four traits, 2500 markers and 500 genotypes (each with four repetitions) in one environment for five simulated selection cycles

According to Fig. 4.4, for this data set the estimated LMSI accuracy (\( {\widehat{\rho}}_{H{\widehat{I}}_M} \)) was higher than the estimated LPSI and GW-LMSI accuracy (\( {\widehat{\rho}}_{H\widehat{I}} \) and \( {\widehat{\rho}}_{H{\widehat{I}}_W} \) respectively), for the five simulated selection cycles, that is, \( {\widehat{\rho}}_{H{\widehat{I}}_M}>{\widehat{\rho}}_{H\widehat{I}}>{\widehat{\rho}}_{H{\widehat{I}}_W} \). In a similar manner, Table 4.3 results indicate that the estimated LMSI selection response (\( {\widehat{R}}_M \)) was higher than the estimated LPSI and GW-LMSI selection responses (\( {\widehat{R}}_I \) and \( {\widehat{R}}_W \) respectively): \( {\widehat{R}}_M>{\widehat{R}}_I>{\widehat{R}}_W \).

Note that the estimated LPSI, LMSI, and GW-LMSI variances of the predicted error, and the estimated LMSI efficiency versus LPSI efficiency and versus GW-LMSI efficiency (expressed in percentages), are functions of the estimated LMSI, LPSI, and GW-LMSI accuracies, and that in all five selection cycles, \( {\widehat{\rho}}_{H{\widehat{I}}_M}>{\widehat{\rho}}_{H\widehat{I}}>{\widehat{\rho}}_{H{\widehat{I}}_W} \). Because the variance of the predicted error, \( \left(1-{\widehat{\rho}}^2\right){\widehat{\sigma}}_H^2 \), decreases as the accuracy increases, the estimated LMSI variance of the predicted error was lower than the estimated LPSI and GW-LMSI variances of the predicted error. For the same reason, the estimated LMSI efficiency was higher than the estimated LPSI efficiency and the estimated GW-LMSI efficiency.
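The link between accuracy and the variance of the predicted error can be illustrated with a short sketch. The accuracy values and the variance of the net genetic merit below are hypothetical and are not taken from Fig. 4.4 or Table 4.3. The ratio is expressed simply as 100 times the quotient of two accuracies; because Eq. (4.19) is not reproduced in this section, that percentage form is an assumption.

```python
# Hypothetical accuracies (correlations with the net genetic merit H);
# these are NOT the values reported in Fig. 4.4.
rho = {"LPSI": 0.80, "LMSI": 0.90, "GW-LMSI": 0.70}
sigma_H2 = 10.0  # hypothetical variance of the net genetic merit

def prediction_error_variance(accuracy, var_H):
    """Variance of the predicted error: (1 - rho^2) * sigma_H^2."""
    return (1.0 - accuracy ** 2) * var_H

def accuracy_ratio_percent(acc_a, acc_b):
    """Ratio of two accuracies as a percentage (assumed form, see lead-in)."""
    return 100.0 * acc_a / acc_b

pev = {name: prediction_error_variance(r, sigma_H2) for name, r in rho.items()}
ratio_vs_lpsi = accuracy_ratio_percent(rho["LMSI"], rho["LPSI"])
ratio_vs_gw = accuracy_ratio_percent(rho["LMSI"], rho["GW-LMSI"])

for name in rho:
    print(f"{name}: accuracy = {rho[name]:.2f}, error variance = {pev[name]:.2f}")
print(f"LMSI vs LPSI: {ratio_vs_lpsi:.1f}%  LMSI vs GW-LMSI: {ratio_vs_gw:.1f}%")
```

Whatever values are used, a higher accuracy always gives a lower variance of the predicted error for a fixed variance of the net genetic merit, so the ordering of the three indices by accuracy is reversed when they are ordered by error variance.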

Based on the results in Fig. 4.4 and Table 4.3, we conclude that, for this simulated data set, the LMSI was a better predictor of the net genetic merit than the LPSI, and that the LPSI was a better predictor of the net genetic merit than the GW-LMSI.