Linear Marker and Genome-Wide Selection Indices

There are two main linear marker selection indices employed in marker-assisted selection (MAS) to predict the net genetic merit and to select individual candidates as parents for the next generation: the linear marker selection index (LMSI) and the genome-wide LMSI (GW-LMSI). Both indices maximize the selection response, the expected genetic gain per trait, and the correlation with the net genetic merit; however, applying the LMSI in plant or animal breeding requires genotyping the candidates for selection; performing a linear regression of phenotypic values on the coded values of the markers such that the selected markers are statistically linked to quantitative trait loci that explain most of the variability in the regression model; constructing the marker score, and combining the marker score with phenotypic information to predict and rank the net genetic merit of the candidates for selection. On the other hand, the GW-LMSI is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit and select candidates. We describe the LMSI and GW-LMSI theory and show that both indices are direct applications of the linear phenotypic selection index theory to MAS. Using real and simulated data we validated the theory of both indices.

3. The QTL should be in coupling mode; that is, one of the initial lines should carry all the alleles that have a positive effect on the chromosome, and the other lines should carry all the alleles with negative effects.
4. The traits of interest should be affected by a few QTL with large effects (and possibly a number of QTL with very small effects) rather than by many QTL with small effects.
5. The heritability of the traits should be low.
6. Markers correlated with the traits of interest should be identified.
Under these conditions, the LMSI should be more efficient than the LPSI, at least in the first selection cycles (Whittaker 2003; Moreau et al. 2007).

The LMSI Parameters
Let $y_i = g_i + e_i$ be the ith trait ($i = 1, 2, \ldots, t$; $t$ = number of traits), where $e_i \sim N(0, \sigma^2_{e_i})$ is the residual, with expectation zero and variance $\sigma^2_{e_i}$, and $N$ denotes the normal distribution. Assuming that the QTL effects combine additively both within and between loci, the ith unobservable genetic value $g_i$ can be written as

$$g_i = \sum_{k=1}^{N_Q} \alpha_k q_k, \tag{4.1}$$

where $\alpha_k$ is the effect of the kth QTL, $q_k$ is the number of favorable alleles at the kth QTL (2, 1, or 0), and $N_Q$ is the number of QTL affecting the ith trait of interest. If the QTL effect values are not observable, the $g_i$ values in Eq. (4.1) are also not observable; however, we can use a linear combination of the markers linked to the QTL that affect the ith trait ($s_i$) to predict the $g_i$ value as

$$s_i = \sum_{j=1}^{M} \theta_j x_j, \tag{4.2}$$

where $s_i$ is a predictor of $g_i$, $\theta_j$ is the regression coefficient of the linear regression model, $x_j$ is the coded value of the jth marker (e.g., 1, 0, and −1 for marker genotypes AA, Aa, and aa respectively), and $M$ is the number of selected markers linked to the QTL that affect the ith trait. Equation (4.2) is called the marker score (Lande and Thompson 1990; Whittaker 2003) and this is the main reason why the LMSI is not equal to the LPSI described in Chap. 2. The number of selected markers is only a subset of potential markers linked to QTL in the population under selection; thus, the $s_i$ values should be lower than or equal to the $g_i$ values. One way of estimating the $s_i$ values is to perform a linear regression of phenotypic values on the coded values of the markers, select the markers that are statistically linked to quantitative trait loci that explain most of the variability in the regression model, and then obtain the estimated value of $s_i$ ($\hat s_i$) as the sum of the estimated effects of the selected markers multiplied by the corresponding marker coded values associated with the ith trait. Some authors (e.g., Moreau et al. 2007) call $\hat s_i$ the molecular score; in this book, we call $s_i$ the marker score and $\hat s_i$ the estimated marker score.

The objective of the LMSI is to predict the net genetic merit of each individual and select the individuals with the highest net genetic merit for further breeding. In the LMSI context, the net genetic merit can be written as

$$H = w'g + w_2's = \begin{bmatrix} w' & w_2' \end{bmatrix}\begin{bmatrix} g \\ s \end{bmatrix} = a'z, \tag{4.3}$$

where $g' = [g_1 \cdots g_t]$ is the vector of breeding values; $w' = [w_1 \cdots w_t]$ is the vector of economic weights associated with $g$; $w_2' = [0_1 \cdots 0_t]$ is a null vector associated with the vector of marker scores $s' = [s_1 \cdots s_t]$; $s_i$ is the ith marker score; $a' = [w' \; w_2']$ and $z' = [g' \; s']$. The information provided by the marker score can be used in breeding programs to increase the accuracy of predicting the net genetic merit of the individuals under selection. The LMSI combines the phenotypic values and marker scores to predict $H$ in each selection cycle and can be written as

$$I_M = \beta_y'y + \beta_s's = \begin{bmatrix} \beta_y' & \beta_s' \end{bmatrix}\begin{bmatrix} y \\ s \end{bmatrix} = \beta't, \tag{4.4}$$

where $\beta_y$ and $\beta_s$ are vectors of phenotypic and marker score weights respectively; $y' = [y_1 \cdots y_t]$ is the vector of trait phenotypic values and $s$ was defined in Eq. (4.3); $\beta' = [\beta_y' \; \beta_s']$ and $t' = [y' \; s']$. The LMSI selection response can be written as

$$R_M = k_I \sigma_H \rho_{HI_M} = k_I \frac{a'Z_M\beta}{\sqrt{\beta'T_M\beta}}, \tag{4.5}$$

where $k_I$ is the standardized selection differential of the LMSI, $\sigma_H = \sqrt{a'Z_M a}$ is the standard deviation of $H$, $\rho_{HI_M}$ is the correlation between $H$ and $I_M$, and

$$T_M = Var\begin{bmatrix} y \\ s \end{bmatrix} = \begin{bmatrix} P & S \\ S & S \end{bmatrix} \quad \text{and} \quad Z_M = Var\begin{bmatrix} g \\ s \end{bmatrix} = \begin{bmatrix} C & S \\ S & S \end{bmatrix}$$

are block matrices of covariance where $P = Var(y)$, $S = Var(s)$, and $C = Var(g)$ are the covariance matrices of phenotypic values ($y$), the marker score ($s$), and the genetic value ($g$) respectively in the population. Vectors $a$ and $\beta$ were defined in Eqs. (4.3) and (4.4) respectively.
The LMSI expected genetic gain per trait can be written as

$$E_M = k_I \frac{Z_M\beta}{\sqrt{\beta'T_M\beta}}. \tag{4.6}$$

All the parameters in Eq. (4.6) were previously defined.

The Maximized LMSI Parameters
Suppose that $P$, $S$, and $C$ are known matrices; then, matrices $T_M$ and $Z_M$ are known and, according to the LPSI theory (see Chap. 2 for details), the LMSI vector of coefficients ($\beta_M$) that maximizes $\rho_{HI_M}$, $R_M$, and $E_M$ can be written as

$$\beta_M = T_M^{-1}Z_M a, \tag{4.7}$$

whence the maximized selection response and the maximized correlation (or LMSI accuracy) between $H$ and $I_M$ can be written as

$$R_M = k_I\sqrt{\beta_M'T_M\beta_M} \tag{4.8a}$$

and

$$\rho_{HI_M} = \frac{\sigma_{I_M}}{\sigma_H}, \tag{4.8b}$$

respectively, where $\sigma_{I_M} = \sqrt{\beta_M'T_M\beta_M}$ is the standard deviation of $I_M$ and $\sigma_H = \sqrt{a'Z_M a}$ is the standard deviation of $H$. Equations (4.8a) and (4.8b) show that the LMSI is a direct application of the LPSI theory in the marker-assisted selection (MAS) context.
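Let $Q = T_M^{-1}Z_M$; then, matrix $Q$ can be written as

$$Q = \begin{bmatrix} (P-S)^{-1}(C-S) & 0 \\ I - (P-S)^{-1}(C-S) & I \end{bmatrix}, \tag{4.9}$$

whence $\beta = Qa$, and as $w_2' = [0 \cdots 0]$,

$$\beta_y = (P-S)^{-1}(C-S)w \quad \text{and} \quad \beta_s = \left[I - (P-S)^{-1}(C-S)\right]w. \tag{4.10a}$$

Another way of writing the marker score vector weights is

$$\beta_s = w - \beta_y, \tag{4.10b}$$

where $\beta_y = (P-S)^{-1}(C-S)w$. By Eq. (4.10b), the optimal LMSI can be written as

$$I_M = \beta_y'y + (w - \beta_y)'s. \tag{4.11}$$

Equation (4.11) indicates that, in practice, to estimate the optimal LMSI, we only need to estimate the vector of coefficients $\beta_y$. By Eq. (4.10a), Eq. (4.8a) can be written as

$$R_M = k_I\sqrt{\beta_y'(P-S)\beta_y + w'Sw}. \tag{4.12}$$

Thus, by Eqs. (4.10a) and (4.12), when $S$ is a null matrix, vector $\beta_y$ is equal to $b = P^{-1}Cw$ and $R_M = k_I\sqrt{b'Pb}$, which are the LPSI vector of coefficients and its selection response respectively.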
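The relationships among the LMSI weights can be checked numerically. The following is a minimal sketch in Python with NumPy, assuming the block structure $T_M = [[P, S], [S, S]]$ and $Z_M = [[C, S], [S, S]]$; the matrices P, C, S and the weights w are hypothetical values chosen only for illustration, and the function names are not part of the original text.

```python
import numpy as np


def lmsi_coefficients(P, C, S, w):
    """Phenotypic and marker-score weights: beta_y = (P-S)^{-1}(C-S)w, beta_s = w - beta_y."""
    beta_y = np.linalg.solve(P - S, (C - S) @ w)  # solve() avoids an explicit inverse
    beta_s = w - beta_y
    return beta_y, beta_s


def lmsi_response(P, S, w, beta_y, k_I=1.755):
    """Maximized LMSI selection response written in terms of beta_y."""
    return k_I * np.sqrt(beta_y @ (P - S) @ beta_y + w @ S @ w)


# Hypothetical covariance matrices for two traits (illustration only)
P = np.array([[10.0, 3.0], [3.0, 8.0]])   # phenotypic covariance
C = np.array([[4.0, 1.0], [1.0, 3.0]])    # genetic covariance
S = np.array([[2.0, 0.5], [0.5, 1.5]])    # marker-score covariance
w = np.array([1.0, 1.0])                  # economic weights

beta_y, beta_s = lmsi_coefficients(P, C, S, w)
R_M = lmsi_response(P, S, w, beta_y)

# Cross-check against the full block solution beta = T_M^{-1} Z_M a
T_M = np.block([[P, S], [S, S]])
Z_M = np.block([[C, S], [S, S]])
a = np.concatenate([w, np.zeros_like(w)])
beta_full = np.linalg.solve(T_M, Z_M @ a)
R_full = 1.755 * np.sqrt(beta_full @ T_M @ beta_full)

print("beta_y:", beta_y, "beta_s:", beta_s, "R_M:", R_M)
```

Working with $\beta_y$ alone only requires solving a t × t system, whereas the block form requires a 2t × 2t system; both routes should give the same weights and response.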
Assume that when the numbers of markers and genotypes tend to infinity, $S$ tends to $C$; then, at the limit, we can suppose that $S = C$, and by this latter result, $R_M$ is equal to

$$R_M = k_I\sqrt{w'Cw}. \tag{4.13}$$

That is, Eq. (4.13) is the maximum value of the LMSI selection response when the numbers of markers and genotypes tend to infinity. Thus, the possible LMSI selection response values of Eq. (4.12) should be between $k_I\sqrt{b'Pb}$ and $k_I\sqrt{w'Cw}$, i.e.,

$$k_I\sqrt{b'Pb} \le R_M \le k_I\sqrt{w'Cw}, \tag{4.14}$$

or, dividing by the LPSI selection response $k_I\sqrt{b'Pb}$, the ratio of the LMSI to the LPSI selection response should be between 1 and

$$\frac{\sqrt{w'Cw}}{\sqrt{b'Pb}} = \frac{1}{\rho_{HI}}, \tag{4.15}$$

where $\rho_{HI}$ is the maximized correlation between the net genetic merit ($H$) and the LPSI ($I$) described in Chap. 2. Equation (4.15) indicates that LMSI efficiency tends to infinity when the $\rho_{HI}$ value tends to zero; this is an additional way of expressing the paradox of LMSI efficiency described by Knapp (1998).

The LMSI for One Trait
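For the one-trait case, matrices $T_M$, $Z_M$, and $Q$ can be written as

$$T_M = \begin{bmatrix} \sigma^2_y & \sigma^2_s \\ \sigma^2_s & \sigma^2_s \end{bmatrix}, \quad Z_M = \begin{bmatrix} \sigma^2_g & \sigma^2_s \\ \sigma^2_s & \sigma^2_s \end{bmatrix}, \quad \text{and} \quad Q = \begin{bmatrix} \dfrac{\sigma^2_g - \sigma^2_s}{\sigma^2_y - \sigma^2_s} & 0 \\ \dfrac{\sigma^2_y - \sigma^2_g}{\sigma^2_y - \sigma^2_s} & 1 \end{bmatrix}, \tag{4.16}$$

where $\sigma^2_y$, $\sigma^2_g$, and $\sigma^2_s$ are the phenotypic, genetic, and marker score variances respectively. By Eqs. (4.10a) and (4.10b), when $a' = [1 \; 0]$, the elements of vector $\beta = Qa$ are

$$\beta_y = \frac{\sigma^2_g - \sigma^2_s}{\sigma^2_y - \sigma^2_s} \quad \text{and} \quad \beta_s = 1 - \beta_y = \frac{\sigma^2_y - \sigma^2_g}{\sigma^2_y - \sigma^2_s}, \tag{4.17a}$$

whence the optimal LMSI can be written as

$$I_M = \beta_y y + \beta_s s, \tag{4.17b}$$

whereas by Eq. (4.12), the maximized LMSI selection response can be written as

$$R_M = k_I\sqrt{\beta_y^2(\sigma^2_y - \sigma^2_s) + \sigma^2_s}. \tag{4.18}$$

When $\sigma^2_s = 0$, $\beta_y = \sigma^2_g/\sigma^2_y = h^2$ and Eq. (4.18) reduces to $R = k_I\sigma_y h^2$, the selection response for the one-trait case without markers.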

Efficiency of LMSI Versus LPSI Efficiency for One Trait
Suppose that the intensity of selection is the same in both indices; then, to compare LMSI versus LPSI efficiency for predicting the net genetic merit, we can use the ratio $\lambda_M = R_M/R_I$ (Bulmer 1980; Moreau et al. 1998), where $R_I$ is the maximized LPSI selection response. In percentage terms, the LMSI versus LPSI efficiency can be written as

$$p_M = 100(\lambda_M - 1). \tag{4.19}$$

When $p_M = 0$, the efficiency of both indices is the same; when $p_M > 0$, the efficiency of the LMSI is higher than that of the LPSI, and when $p_M < 0$, LPSI efficiency is higher than LMSI efficiency for predicting the net genetic merit.
In the case of one trait, Lande and Thompson (1990) showed that LMSI efficiency (not in percentage terms) with respect to phenotypic efficiency can be written as

$$\lambda_M = \frac{R_M}{R} = \sqrt{\frac{q}{h^2} + \frac{(1-q)^2}{1 - qh^2}}, \tag{4.20}$$

where $R_M$ was defined in Eq. (4.18), $R = k\sigma_y h^2$, $h^2$ is the trait heritability, and $q = \sigma^2_s/\sigma^2_g$ is the proportion of additive genetic variance explained by the markers. According to Eq. (4.20), the advantage of the LMSI over phenotypic selection increases as the population size increases and heritability decreases, because in such cases, $q = \sigma^2_s/\sigma^2_g$ tends to 1 and Eq. (4.20) approaches $1/h$. Therefore, the LMSI is most efficient for traits with low heritability and when the marker score explains a large proportion of the genetic variance. Note that when $h^2$ tends to zero, $1/h$ tends to infinity; this means that in the asymptotic context, LMSI efficiency with respect to phenotypic efficiency for one trait (Eq. 4.20) tends to infinity, and this is the LMSI paradox pointed out by Knapp (1998). There are other problems associated with the LMSI: it increases the selection response only in the short term and can result in lower cumulative responses in the longer term than phenotypic selection, as the LMSI fixes the QTL at a faster rate than phenotypic selection. In addition, it requires the weights (Eq. 4.17a) to be updated, because in each generation the frequency of the QTL changes (Dekkers and Settar 2004).
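The behavior of the one-trait efficiency ratio can be explored with a few lines of Python. This is a minimal sketch that assumes the Lande and Thompson (1990) ratio takes the form $\sqrt{q/h^2 + (1-q)^2/(1-qh^2)}$, which follows from dividing the one-trait LMSI response by $k\sigma_y h^2$; the function name and the grid of heritability values are illustrative assumptions.

```python
import math


def lmsi_relative_efficiency(h2, q):
    """Ratio of the one-trait LMSI response to the phenotypic selection response.

    h2: trait heritability (0 < h2 < 1)
    q:  proportion of genetic variance explained by the markers (0 <= q <= 1)
    """
    if not 0.0 < h2 < 1.0:
        raise ValueError("h2 must lie strictly between 0 and 1")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie between 0 and 1")
    return math.sqrt(q / h2 + (1.0 - q) ** 2 / (1.0 - q * h2))


# Efficiency grows as heritability falls and as the markers explain more variance
for h2 in (0.05, 0.25, 0.50):
    row = [round(lmsi_relative_efficiency(h2, q), 3) for q in (0.0, 0.5, 1.0)]
    print(f"h2 = {h2:.2f}: efficiency at q = 0, 0.5, 1 ->", row)
```

With $q = 0$ the ratio equals 1 (no marker information), and with $q = 1$ it equals $1/h$, which grows without bound as $h^2$ approaches zero; this is the paradox discussed in the text.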

Statistical LMSI Properties
Assume that $H$ and $I_M$ have a bivariate joint normal distribution, $\beta = T_M^{-1}Z_M a$, and that $P$, $C$, $S$, and $w$ are known; then, the statistical LMSI properties are the same as the LPSI properties described in Chap. 2. That is, Properties 1 to 4 are the same as LPSI properties 1 to 4, but, because the LMSI jointly incorporates the phenotypic and marker information to predict the net genetic merit, LMSI accuracy should be higher than LPSI accuracy. The same is true of the LMSI selection response and expected genetic gain per trait when compared with the LPSI selection response and expected genetic gain per trait.

The Genome-Wide Linear Selection Index
The genome-wide linear marker selection index (GW-LMSI) is a single-stage procedure that treats information at each individual marker as a separate trait. Thus, all marker information can be entered together with phenotypic information into the GW-LMSI, which is then used to predict the net genetic merit. In a similar manner to the LMSI, the GW-LMSI exploits the linkage disequilibrium between markers and the QTL produced when inbred lines are crossed.

The GW-LMSI Parameters
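In a similar manner to the LPSI, the main objective of the GW-LMSI is to predict the net genetic merit of each individual and select the best individuals for further breeding. In the GW-LMSI context, the net genetic merit can be written as

$$H = w'g + w_2'm = \begin{bmatrix} w' & w_2' \end{bmatrix}\begin{bmatrix} g \\ m \end{bmatrix} = a_W'z_W, \tag{4.21}$$

where $g$ is the vector of breeding values, $w' = [w_1 \cdots w_t]$ is the vector of economic weights associated with the breeding values, and $w_2' = [0 \cdots 0]$ is a null vector associated with the vector of marker coded values $m$; $a_W' = [w' \; w_2']$ and $z_W' = [g' \; m']$. The GW-LMSI ($I_W$) combines the phenotypic value and the molecular information linked to the individual traits to predict $H$ values in each selection cycle. It can be written as

$$I_W = \beta_y'y + \beta_m'm = \begin{bmatrix} \beta_y' & \beta_m' \end{bmatrix}\begin{bmatrix} y \\ m \end{bmatrix} = \beta_W't_W, \tag{4.22}$$

where $\beta_y$ and $\beta_m$ are vectors of phenotypic and marker weights respectively; $y' = [y_1 \cdots y_t]$ is the vector of phenotypic values and $m$ was defined in Eq. (4.21); $\beta_W' = [\beta_y' \; \beta_m']$ and $t_W' = [y' \; m']$. The GW-LMSI selection response can be written as

$$R_W = k_I\sigma_H\rho_{HI_W} = k_I\frac{a_W'\Psi\beta_W}{\sqrt{\beta_W'\Phi\beta_W}}, \tag{4.23}$$

where $k_I$ is the standardized selection differential of the GW-LMSI, and $\rho_{HI_W}$ and $a_W'\Psi\beta_W$ are the correlation and the covariance between $H$ and $I_W$ respectively;

$$\Phi = Var\begin{bmatrix} y \\ m \end{bmatrix} = \begin{bmatrix} P & W' \\ W & M \end{bmatrix} \quad \text{and} \quad \Psi = Var\begin{bmatrix} g \\ m \end{bmatrix} = \begin{bmatrix} C & W' \\ W & M \end{bmatrix}$$

are block covariance matrices, where $P$, $M$, and $C$ are the covariance matrices of phenotypic values ($y$), the molecular marker ($m$) coded values, and the genetic ($g$) values, whereas $W$ is the covariance matrix between $y$ and $m$, and between $g$ and $m$. The size of matrices $P$ and $C$ is $t \times t$, but the sizes of matrices $M$ and $W$ are $m \times m$ and $m \times t$ respectively. From a theoretical point of view, Crossa and Cerón-Rojas (2011) showed that matrix $M$ can be written as

$$M = \begin{bmatrix} 1 & (1-2\delta_{12}) & \cdots & (1-2\delta_{1m}) \\ (1-2\delta_{21}) & 1 & \cdots & (1-2\delta_{2m}) \\ \vdots & \vdots & \ddots & \vdots \\ (1-2\delta_{m1}) & (1-2\delta_{m2}) & \cdots & 1 \end{bmatrix},$$

where $(1 - 2\delta_{ij})$ is the covariance (or correlation) and $\delta_{ij}$ the recombination frequency between the ith and jth marker ($i, j = 1, 2, \ldots, m$; $m$ = number of markers). According to Crossa and Cerón-Rojas (2011), matrix $W$ can be written as

$$W = \begin{bmatrix} \sum_{k}(1-2r_{1k})\alpha_{1k} & \cdots & \sum_{k}(1-2r_{1k})\alpha_{tk} \\ \vdots & \ddots & \vdots \\ \sum_{k}(1-2r_{mk})\alpha_{1k} & \cdots & \sum_{k}(1-2r_{mk})\alpha_{tk} \end{bmatrix},$$

where $\sum_{k}(1-2r_{ik})\alpha_{qk}$ (the sum being taken over the $N_Q$ QTL) is the covariance between the qth trait and the ith marker; $r_{ik}$ is the recombination frequency between the ith marker and the kth QTL; and $\alpha_{qk}$ is the effect of the kth QTL over the qth trait. The GW-LMSI expected genetic gain per trait can be written as

$$E_W = k_I\frac{\Psi\beta_W}{\sqrt{\beta_W'\Phi\beta_W}}. \tag{4.24}$$

All parameters in Eq. (4.24) were previously defined. Matrix $\Phi$ may be singular; that is, its inverse ($\Phi^{-1}$) may not exist because matrix $W$ is singular.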
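Suppose that matrices $\Phi$ and $\Psi$ are known; then, according to the LPSI theory, the GW-LMSI vector of coefficients ($\beta_W$) that maximizes $\rho_{HI_W}$ can be written as

$$\beta_W = \Phi^{-}\Psi a_W, \tag{4.25a}$$

where matrix $\Phi^{-}$ denotes a generalized inverse of $\Phi$. By Eq. (4.25a), the maximized GW-LMSI selection response is

$$R_W = k_I\sqrt{\beta_W'\Phi\beta_W}. \tag{4.25b}$$

Equations (4.25a) and (4.25b) show that the GW-LMSI is a direct application of the LPSI to MAS. By Eq. (4.25a), the maximized correlation between $H$ and $I_W$ is

$$\rho_{HI_W} = \frac{\sigma_{I_W}}{\sigma_H}, \tag{4.25c}$$

where $\sigma_{I_W} = \sqrt{\beta_W'\Phi\beta_W}$ is the standard deviation of $I_W$ and $\sigma_H = \sqrt{a_W'\Psi a_W}$ is the standard deviation of $H$.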

Relationship Between the GW-LMSI and the LPSI
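Matrix $\Phi^{-}$ can be written as

$$\Phi^{-} = \begin{bmatrix} L^{-} & -L^{-}W'M^{-} \\ -M^{-}WL^{-} & M^{-} + M^{-}WL^{-}W'M^{-} \end{bmatrix}, \tag{4.26}$$

where $L^{-}$ is a generalized inverse of matrix $L = P - W'M^{-}W$, and $M^{-}$ is a generalized inverse of matrix $M$. In matrix $\Phi^{-}$, the inverse of matrix $W$ is not required and the standard inverse of matrix $M$ ($M^{-1}$) may exist. In the latter case, the standard inverse of matrix $L$ ($L^{-1}$) exists and can be written as $L^{-1} = P^{-1} + P^{-1}W'(M - WP^{-1}W')^{-1}WP^{-1}$ (… et al. 2006). By Eq. (4.26) and because $w_2' = [0 \cdots 0]$, the two subvectors of $\beta_W$ can be written as

$$\beta_y = L^{-}(C - W'M^{-}W)w \quad \text{and} \quad \beta_m = M^{-}W(w - \beta_y), \tag{4.27}$$

where $w$ is the vector of economic weights. Suppose that there is no marker information; then, matrices $M$ and $W$ are null and Eq. (4.27) is equal to $\beta_y = P^{-1}Cw = b$ (the LPSI vector of coefficients), whereas $\beta_m = 0$ and … where $(X'X)^{-}$ is a generalized inverse matrix of $X'X$ and $Y$ is a matrix of phenotypic observations.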
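The GW-LMSI weights can be computed directly from the block matrices. The sketch below is written in Python with NumPy and assumes $\Phi = [[P, W'], [W, M]]$, $\Psi = [[C, W'], [W, M]]$, and the Moore-Penrose pseudoinverse (`numpy.linalg.pinv`) as one particular choice of generalized inverse. The matrices are hypothetical values built from an assumed additive model $g = Am + r$, not data from the text.

```python
import numpy as np


def gw_lmsi_coefficients(P, C, M, W, w):
    """Return (beta_y, beta_m) from beta_W = pinv(Phi) @ Psi @ a_W."""
    P, C, M, W, w = (np.asarray(x, dtype=float) for x in (P, C, M, W, w))
    t, m = P.shape[0], M.shape[0]
    Phi = np.block([[P, W.T], [W, M]])      # covariance of (y, m)
    Psi = np.block([[C, W.T], [W, M]])      # covariance of (g, m)
    a_W = np.concatenate([w, np.zeros(m)])  # economic weights padded with zeros
    beta_W = np.linalg.pinv(Phi) @ Psi @ a_W
    return beta_W[:t], beta_W[t:]


# Hypothetical example: three markers, two traits
M = np.array([[1.0, 0.6, 0.2], [0.6, 1.0, 0.6], [0.2, 0.6, 1.0]])
A = np.array([[1.0, 0.5, 0.0], [0.0, 0.5, 1.0]])  # assumed marker effects on g
W = M @ A.T                                        # Cov(m, g), size m x t
C = A @ M @ A.T + 0.5 * np.eye(2)                  # genetic covariance
P = C + 2.0 * np.eye(2)                            # phenotypic covariance
w = np.array([1.0, 1.0])

beta_y, beta_m = gw_lmsi_coefficients(P, C, M, W, w)

# Block-form check, valid here because M is invertible
M_inv = np.linalg.inv(M)
L = P - W.T @ M_inv @ W
beta_y_block = np.linalg.solve(L, (C - W.T @ M_inv @ W) @ w)
beta_m_block = M_inv @ W @ (w - beta_y_block)

# Without marker information (M and W null) the weights reduce to the LPSI
by0, bm0 = gw_lmsi_coefficients(P, C, np.zeros((3, 3)), np.zeros((3, 2)), w)
print(beta_y, beta_m, by0)
```

The pseudoinverse is used so that the same function still runs when $\Phi$ is singular, as in the no-marker case shown at the end.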

Statistical Properties of GW-LMSI
Assume that $H$ and $I_W$ have a bivariate joint normal distribution, $\beta_W = \Phi^{-}\Psi a_W$, and $P$, $C$, $M$, $W$, and $w$ are known; then, the statistical GW-LMSI properties are the same as the LMSI properties. That is,
1. The variance of $I_W$ ($\sigma^2_{I_W}$) and the covariance between $H$ and $I_W$ ($\sigma_{HI_W}$) are the same.
2. The maximized correlation between $H$ and $I_W$, or $I_W$ accuracy, is $\rho_{HI_W} = \sigma_{I_W}/\sigma_H$.
…
According to Lange and Whittaker (2001), GW-LMSI efficiency should be greater than LMSI efficiency. However, this would be true only if matrices $P$, $C$, $M$, and $W$ are known and trait heritability is very low.

Estimating the LMSI Parameters
When covariance matrices P, C, and S, and the vector of economic weights (w) are known, there is no error in the estimation of the LMSI parameters (selection response, expected genetic gain, etc.); the same is true for the GW-LMSI when, in addition to P, C, and w, the covariance matrices M and W are known. In such cases, the relative efficiency of the LMSI (GW-LMSI) depends only on the heritability of the traits and on the portion of phenotypic variation associated with markers. Using simulated data, Lange and Whittaker (2001) found that GW-LMSI efficiency was higher than LMSI efficiency when trait heritability was 0.2 and matrices P, C, M, and W were known. When P, C, S, M, and W are unknown, it is necessary to estimate them; then, the LMSI and GW-LMSI vector of coefficients and the effects associated with markers are estimated with some error. This error leads to lower LMSI and GW-LMSI efficiency than expected under the assumption that the parameters are known; however, in the latter case, Lange and Whittaker (2001) also found that GW-LMSI efficiency was greater than that of the LMSI when trait heritability was 0.05. Moreover, in the LMSI there is additional bias in the estimation of the parameters because only markers with significant effects are included in the index (Moreau et al. 1998).
In Chap. 2, we described the restricted maximum likelihood (REML) method for estimating matrices $P$ and $C$. Some authors (Lande and Thompson 1990; Charcosset and Gallais 1996; Hospital et al. 1997; Moreau et al. 1998, 2007) have described methods for estimating marker scores, the variance of the marker scores, the LMSI vector of coefficients, etc., in the context of one trait; however, up to now there have been no reports on the estimation of matrix $S$ in the multi-trait case. Lange and Whittaker (2001) only indicated that matrix $S$ can be estimated as $\hat S = Var(\hat s)$, where $\hat s$ is a vector of estimated marker scores associated with several individual traits.
The main problems associated with the estimated LMSI parameters are: 1. The estimated values of the covariance matrix S ( b S ) tend to overestimate the genetic covariance matrix (C). 2. The estimated variances of the marker scores can be negative.
When the first point is true, the estimated LMSI selection response and efficiency could be negative because the estimated matrix $\hat C - \hat S$ is not positive semi-definite (i.e., it has negative eigenvalues). In addition, the results can lead to all weights being placed on the molecular score, and the weights on the phenotypic values can be negative (Moreau et al. 2007). When the second point is true, the variance of the marker scores is not useful. The two problems indicated above could be caused by using the same data set to select markers and to estimate marker effects, and there is no simple way of solving them. Lande and Thompson (1990) proposed that the markers used to obtain $\hat S$ be selected a priori as those with the most highly significant partial regression coefficients from among all the markers in the linkage group analyzed in the previous generation. Zhang and Smith (1992, 1993) proposed using two independent sets of markers: one to estimate marker effects and the other to select markers. Additional solutions to these problems were described by Moreau et al. (2007).
In this subsection, we describe methods (in the univariate and multivariate context) for estimating molecular marker effects, marker scores, and their variance and covariance, and for estimating the LMSI and GW-LMSI vector of coefficients, selection response, expected genetic gain, and accuracy. This subsection is only for illustration; we use the same data set to select markers, and to estimate marker effects and the variance of marker scores.

Estimating the Marker Score
According to Eqs. (4.11) and (4.17b), when the vector of economic weights is equal to $a' = [1 \; 0]$, the LMSI for the ith trait $y_i$ ($i = 1, 2, \ldots, t$; $t$ = number of traits) can be written as $I_{M_{li}} = s_i + \beta_{y_i}(y_i - s_i)$ ($l = 1, 2, \ldots, n$; $n$ = number of individuals or genotypes), where $\beta_{y_i} = \dfrac{\sigma^2_{g_i} - \sigma^2_{s_i}}{\sigma^2_{y_i} - \sigma^2_{s_i}} = \dfrac{h_i^2(1 - q_i)}{1 - q_i h_i^2}$; $h_i^2 = \sigma^2_{g_i}/\sigma^2_{y_i}$ is the heritability of the ith trait, and $q_i = \sigma^2_{s_i}/\sigma^2_{g_i}$ is the proportion of genetic variance explained by the QTL or markers associated with the ith trait; $s_i$ is the ith individual trait marker score; and $\sigma^2_{y_i}$, $\sigma^2_{g_i}$, and $\sigma^2_{s_i}$ are the ith variances of the phenotypic, genetic, and marker score values respectively.

The simplest way of estimating the ith marker score $s_i$ is to perform a multiple linear regression of phenotypic values ($y_i$) on the coded values of the markers ($x_j$), select the markers statistically linked to the ith QTL that explain most of the variability in the regression model, and use them to construct $\hat s_i$. We can fit the model $y^* = X\theta + e$. According to the least squares method of estimation, $\hat\theta = (X'X)^{-1}X'y^*$ is an estimator of the vector of regression coefficients $\theta' = [\theta_1 \; \theta_2 \cdots \theta_m]$, where $m$ ($m < n$) is the number of markers, $X$ is an $n \times m$ matrix of coded marker values (e.g., 1, 0, and −1 for marker genotypes AA, Aa, and aa respectively), and $y^*$ is an $n \times 1$ vector of phenotypic values centered on their mean. Only a subset $M$ ($M < m$) of the $m$ markers is statistically linked to the QTL, and then only a subset $M$ of the estimated vector $\hat\theta$ values is selected to estimate $s_i$ as $\hat s_i = \sum_{j \in M}\hat\theta_j x_j$.

To illustrate how to obtain $\hat s_i = \sum_{j \in M}\hat\theta_j x_j$, we use a real maize (Zea mays) F$_2$ population with 247 genotypes (each one with two repetitions), 195 molecular markers, and four traits: grain yield (GY, ton ha$^{-1}$), plant height (PHT, cm), ear height (EHT, cm), and anthesis day (AD, days), evaluated in one environment.
In an F 2 population, the marker homozygous loci for the allele from the first parental line can be coded by 1, whereas the marker homozygous loci for the allele from the second parental line can be coded by À1, and the marker heterozygous loci by 0.
For this example, we used trait PHT. Only seven markers were statistically linked to the PHT. The estimated vector of regression coefficients for these seven markers was $\hat\theta' = [5.46 \;\; {-4.54} \;\; 0.98 \;\; 7.39 \;\; {-7.75} \;\; {-1.91} \;\; {-3.53}]$. Using $\hat\theta'$ and the coded values of the seven markers (Table 4…), the first estimated $\hat s_{PHT}$ value was obtained as $\hat s_{PHT_1} = -1.91(1) + (-3.53)(-1) = 1.62$; the second estimated $\hat s_{PHT}$ value was obtained as $\hat s_{PHT_2} = 5.46(-1) + (-4.54)(-1) - 1.91(-1) = 0.99$, etc. The 20th estimated $\hat s_{PHT}$ value was obtained as $\hat s_{PHT_{20}} = -3.53(-1) = 3.53$. This estimation procedure is valid for any number of genotypes and markers. Figure 4.3 shows the distribution of the 247 estimated marker scores associated with traits PHT and EHT of the maize F$_2$ population. Note that the estimated marker score values are approximately normally distributed.
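The marker score calculation above is a matrix-vector product, as the following minimal Python sketch shows. The regression coefficients are those reported in the text; the rows of coded marker values for genotypes 1, 2, and 20 are an assumption, inferred from which terms appear in the worked sums (markers not appearing are taken to be coded 0).

```python
import numpy as np

# Estimated regression coefficients of the seven markers linked to PHT (from the text)
theta_hat = np.array([5.46, -4.54, 0.98, 7.39, -7.75, -1.91, -3.53])

# Coded marker values (1, 0, -1); rows inferred from the worked sums, not from the table
X_selected = np.array([
    [0, 0, 0, 0, 0, 1, -1],     # genotype 1
    [-1, -1, 0, 0, 0, -1, 0],   # genotype 2
    [0, 0, 0, 0, 0, 0, -1],     # genotype 20
])

# Estimated marker score: sum over selected markers of theta_hat_j * x_j
s_hat = X_selected @ theta_hat
print(np.round(s_hat, 2))
```

For the full population, `X_selected` would have one row per genotype (247 rows here), and the same product would return all the estimated marker scores at once.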

Estimating the Variance of the Marker Score
There are many methods of estimating the variance of the marker score associated with the ith trait ($\sigma^2_{s_i}$); the first one was proposed by Lande and Thompson (1990). According to these authors, $\sigma^2_{s_i}$ can be estimated as … (4.29), where $\hat\theta_i$ is the estimated vector of regression coefficients of the selected markers, $I$ is an $n \times n$ identity matrix, $M$ is the number of selected markers statistically linked to the QTL, and $X_i$ is an $n \times M$ matrix with the coded values of the selected markers. According to Lande and Thompson (1990), Eq. (4.29) is an unbiased estimator of $\sigma^2_{s_i}$ and its variance can be written as …, which tends to zero when $n$, the number of genotypes or individuals, is very high. From Eq. (4.29), it is possible to obtain an estimator of the covariance between the ith and jth marker scores when the number of selected markers statistically linked to the QTL is the same in the ith and jth traits. Thus, by Eq. (4.29), the covariance between the ith and jth marker scores can be estimated as …, where $\hat\theta_i$ and $\hat\theta_j$ are the estimated vectors of regression coefficients of the selected markers associated with the ith and jth trait loci respectively; $M_{ij} = \frac{2}{n}X_i'X_j$ is the $M \times M$ covariance matrix of the markers statistically linked to the ith and jth trait marker loci; $X_i$ and $X_j$ are $n \times M$ matrices with the coded values of the selected markers associated with the ith and jth trait loci respectively; $\hat\sigma_{e_{ij}} = \dfrac{y_i'(I - H_{ij})y_j}{n - M - 1}$ is the estimated covariance of the residuals between the ith ($y_i$) and jth ($y_j$) trait values, $I$ is an $n \times n$ identity matrix, and $M$ is the number of selected markers statistically linked to the QTL.
According to the PHT values described in Sect. 4.3.1 of this chapter, $M = 7$, $n = 247$, and Eq. (4.29) gives $\hat\sigma^2_{\hat s_{PHT}} = 48.23$, whereas $\hat\sigma^2_{g_{PHT}} = 83.0$ is an estimate of the genetic variance of PHT. The estimated portion of the genetic variance attributable to $\hat\sigma^2_{\hat s_{PHT}} = 48.23$ was $\hat q_{PHT} = 48.23/83 = 0.5811$; that is, the seven markers explain 58.11% of the genetic variance associated with PHT.

Charcosset and Gallais (1996) considered two possible methods of estimating $\sigma^2_{s_i}$ based on the coefficient of multiple determination or squared multiple correlation $R^2$ (note that in this case $R^2$ is not the square of the selection response). The coefficient $R^2$ gives the portion of the total variation in the phenotypic values that is "explained" by, or attributable to, the markers and can be written as

$$R^2 = \frac{\hat\theta'X'y - n\bar y^2}{y'y - n\bar y^2}, \tag{4.32a}$$

where $\hat\theta'X'y - n\bar y^2$ is the overall regression sum of squares adjusted for the intercept and $y'y - n\bar y^2$ is the total sum of squares adjusted for the mean. The coefficient $R^2$ is equal to 1 if the fitted equation passes through all the data points, so that all residuals are null; then, the markers explain all the phenotypic variance. At the other extreme, $R^2$ is zero if $\hat y_i = \hat\theta_0$ and the estimated regression coefficients are null, i.e., $\hat\theta_1 = \hat\theta_2 = \cdots = \hat\theta_M = 0$. In the latter case, markers do not affect the phenotypic observations and the variance of the marker score values is zero. Thus, the $R^2$ values are between 0 and 1, i.e., $0 \le R^2 \le 1$. Equation (4.32a) is useful for estimating $\sigma^2_{s_i}$ as $\hat\sigma^2_{\hat s_i} = \hat\sigma^2_y\sum_{j=1}^{M}R_j^2$, where $R_j^2$ is the estimated $R^2$ value of the jth marker and $\hat\sigma^2_y$ is the phenotypic variance of the ith trait; however, this is a biased estimator of $\sigma^2_{s_i}$ (Hospital et al. 1997). Charcosset and Gallais (1996) and Hospital et al. (1997) proposed an unbiased estimator of $\sigma^2_{s_i}$ based on all the selected markers using the adjusted coefficient of multiple determination, i.e.,

$$R^2_{Adj} = 1 - \frac{n-1}{n-M-1}\left(1 - R^2\right), \tag{4.32b}$$

whence we can obtain an unbiased estimator of $\sigma^2_{s_i}$ as $\hat\sigma^2_y R^2_{Adj} = \hat\sigma^2_{\hat s_i}$ by jointly using all the markers that affect the phenotypic values. The problem with Eq. (4.32b) is that the $R^2_{Adj}$ values could be negative; in that case, the estimated value of $\sigma^2_{s_i}$ would also be negative. One additional problem with Eq. (4.32b) is that the $R^2_{Adj}$ values can produce $\hat\sigma^2_s$ values that are higher than the estimated variance of the breeding values $\hat\sigma^2_g$. Using Eqs. (4.32a) and (4.32b), we can estimate $\sigma^2_{s_i}$, but from them it is not clear how we can estimate the covariance between two different estimated marker score values.
Consider the case of the PHT values described in Sect. 4.3.1 of this chapter, where $M = 7$, $n = 247$, and the estimated variance of PHT was $\hat\sigma^2_{PHT} = 191.81$. The estimated values of $R^2$ for each of the seven markers were 0.0038, 0.0005, 0.0006, 0.0013, 0.0036, 0.0114, and 0.0298 (sum = 0.0510), whence, by multiplying each estimated $R^2$ value by $\hat\sigma^2_{PHT} = 191.81$ and summing the results, we found that the estimated value of $\sigma^2_{s_{PHT}}$ was $\hat\sigma^2_{\hat s_{PHT}} = 9.78$. In this case, the estimated portion of the genetic variance attributable to $\hat\sigma^2_{\hat s_{PHT}} = 9.78$ was $\hat q_{PHT} = 9.78/83 = 0.1178$; thus, when we estimated $\sigma^2_{s_{PHT}}$ according to Eq. (4.32a), the seven markers explained only 11.78% of the genetic variance associated with PHT.
The estimated value of $R^2_{Adj}$ for the seven markers jointly was 0.06, whence $\hat\sigma^2_{s_{PHT}} = (191.81)(0.06) = 11.50$ is an estimate of $\sigma^2_{s_{PHT}}$. In the latter case, the estimated portion of the genetic variance attributable to $\hat\sigma^2_{s_{PHT}} = 11.50$ was $\hat q_{PHT} = 11.5/83 = 0.1385$; that is, according to Eq. (4.32b), the seven markers explain 13.85% of the genetic variance associated with PHT.

One additional way of estimating the variance of the marker score $\sigma^2_{s_i}$ was proposed by Lange and Whittaker (2001) as

$$\hat\sigma^2_{s_i} = \frac{1}{n-1}\sum_{l=1}^{n}\left(\hat s_{il} - \hat\mu_{s_i}\right)^2, \tag{4.33}$$

where $\hat s_{il} = \sum_{j \in M}\hat\theta_j x_j$ is the estimated marker score of the lth individual and $\hat\mu_{s_i}$ is the mean of the $\hat s_i$ values. The covariance between the ith and jth marker scores can be estimated as the cross products of the centered marker score values divided by $n - 1$. Note that in this case, the number of markers associated with the ith and jth traits may be different. For the PHT values described in Sect. 4.3.1 of this chapter, where $n = 247$, the estimated value of $\sigma^2_{s_i}$ was $\hat\sigma^2_{s_{PHT}} = 15.75$ and the estimated portion of the genetic variance attributable to $\hat\sigma^2_{s_{PHT}} = 15.75$ was $\hat q_{PHT} = 15.75/83 = 0.1897$. That is, the seven markers jointly explain 18.97% of the genetic variance associated with PHT according to Eq. (4.33).
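The empirical variance and covariance of marker scores described by Lange and Whittaker (2001) can be sketched as follows in Python with NumPy. The marker data and regression coefficients below are simulated, hypothetical values (not the maize data), and the two traits deliberately use different numbers of selected markers to show that this estimator does not require them to match.

```python
import numpy as np


def marker_score_covariance(scores):
    """Sample covariance matrix (divisor n - 1) of marker scores.

    scores: array of shape (n individuals, t traits).
    """
    scores = np.asarray(scores, dtype=float)
    centered = scores - scores.mean(axis=0)          # subtract the mean score per trait
    return centered.T @ centered / (scores.shape[0] - 1)


rng = np.random.default_rng(0)
n = 200

# Hypothetical F2-type codes (1, 0, -1) for the markers selected for each trait
X_i = rng.choice([-1, 0, 1], size=(n, 3), p=[0.25, 0.5, 0.25])  # three markers, trait i
X_j = rng.choice([-1, 0, 1], size=(n, 2), p=[0.25, 0.5, 0.25])  # two markers, trait j

# Hypothetical estimated regression coefficients
theta_i = np.array([1.5, -0.8, 0.4])
theta_j = np.array([0.9, 0.3])

# Estimated marker scores per individual and trait
s_hat = np.column_stack([X_i @ theta_i, X_j @ theta_j])

S_hat = marker_score_covariance(s_hat)
print(S_hat)
```

The diagonal of `S_hat` holds the estimated marker score variances and the off-diagonal element the estimated covariance between the two marker scores; `numpy.cov(s_hat, rowvar=False)` returns the same matrix.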
Using the estimated value $\hat\sigma^2_{\hat s_{PHT}} = 48.23$ obtained with Eq. (4.29), it is possible to estimate the LMSI weight as $\hat\beta_{PHT} = \dfrac{83.0 - 48.23}{191.81 - 48.23} = 0.242$, whereas for $\hat\sigma^2_{\hat s_{PHT}} = 9.78$, $\hat\sigma^2_{s_{PHT}} = 11.50$, and $\hat\sigma^2_{s_{PHT}} = 15.75$, the estimated values of $\beta_{PHT}$ were 0.402, 0.40, and 0.382 respectively. The latter results indicate that the estimated values of $\beta_{PHT}$ associated with the phenotypic values tend to decrease when the estimated values of the variance of the marker score increase. This means that at the limit, when all the genetic variance is explained by the markers, the estimated values of $\beta_{PHT}$ are zero and the estimated LMSI is equal to $\hat I_M = \hat s$. Thus, for trait PHT, when the estimated values of $\beta_{PHT}$ are not zero, the estimated LMSI can be written as $\hat I_{M_{PHT}} = \hat s_{PHT} + \hat\beta_{PHT}(y_{PHT} - \hat s_{PHT})$. The $\hat I_{M_{PHT}}$ values are used to predict and rank the net genetic merit of each candidate for selection.
Based on the result $\hat\sigma^2_{\hat s_{PHT}} = 48.23$ obtained with Eq. (4.29) and using a selection intensity of 10% ($k_I = 1.755$), the estimated LMSI selection response can be obtained as $\hat R_M = 1.755\sqrt{\hat\beta^2_{PHT}(191.81 - 48.23) + 48.23} \approx 13.21$, whereas for $\hat\sigma^2_{\hat s_{PHT}} = 9.78$ and $\hat\sigma^2_{s_{PHT}} = 11.50$, the estimated values of the LMSI selection responses were 10.99 and 11.10 respectively. The latter results indicate that the estimated values of the LMSI selection responses tend to increase when the estimated values of the variance of the marker score increase.
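The one-trait weights and selection responses quoted above can be recomputed with a short Python sketch. It assumes the one-trait forms $\beta_y = (\sigma^2_g - \sigma^2_s)/(\sigma^2_y - \sigma^2_s)$ and $R_M = k_I\sqrt{\beta_y^2(\sigma^2_y - \sigma^2_s) + \sigma^2_s}$, and uses the PHT estimates given in the text (phenotypic variance 191.81, genetic variance 83.0, $k_I = 1.755$); the function name is illustrative.

```python
import math


def lmsi_one_trait(var_y, var_g, var_s, k_I=1.755):
    """Phenotypic weight and selection response of the one-trait LMSI."""
    beta_y = (var_g - var_s) / (var_y - var_s)
    response = k_I * math.sqrt(beta_y ** 2 * (var_y - var_s) + var_s)
    return beta_y, response


var_y, var_g = 191.81, 83.0  # estimated phenotypic and genetic variances of PHT

# The four marker-score variance estimates discussed in the text
results = {}
for var_s in (48.23, 9.78, 11.50, 15.75):
    beta_y, response = lmsi_one_trait(var_y, var_g, var_s)
    results[var_s] = (beta_y, response)
    print(f"var_s = {var_s:6.2f}  beta_y = {beta_y:.3f}  R_M = {response:.2f}")
```

Running the loop makes the two trends noted in the text visible side by side: the phenotypic weight shrinks and the estimated response grows as the estimated marker score variance increases.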
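We can estimate LMSI versus phenotypic efficiency for one trait by substituting the estimates of heritability ($\hat h^2$) and of the proportion of genetic variance explained by the markers ($\hat q$) into Eq. (4.20), i.e., as $\hat\lambda_M = \sqrt{\dfrac{\hat q}{\hat h^2} + \dfrac{(1 - \hat q)^2}{1 - \hat q\hat h^2}}$.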

Estimating the Variance of the Marker Score in the Multi-Trait Case
Equation (4.33) can be used in the multi-trait context when the numbers of markers associated with the ith and jth traits are different. Also, it is possible to adapt Eqs. (4.32a) and (4.32b) to the multi-trait case. However, in the latter case, in addition to the markers linked to the QTL that affect one specific trait, we need to find markers that affect more than one trait, which may be very difficult. For this reason, in the multi-trait context, Eqs. (4.32a) and (4.32b) could be used to estimate the variance of the marker score (S) without preselecting the markers that affect the phenotypic traits, only when the number of genotypes is higher than the number of markers.
Let $y_1, y_2, \ldots, y_r$ be $r$ independent multivariate normal vectors of observations, each with $n$ observations, such that

$$Y = \begin{bmatrix} y_{11} & y_{12} & \cdots & y_{1t} \\ y_{21} & y_{22} & \cdots & y_{2t} \\ \vdots & \vdots & \cdots & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{nt} \end{bmatrix}$$

is an $n \times t$ matrix of observations for $t$ traits; then, the multivariate linear regression model can be written as $Y = XB + U$, where $X$ is an $n \times m$ matrix ($m$ = number of markers and $m < n$) of known coded marker values, $B$ is an $m \times t$ matrix of regression coefficients, and $U$ is an $n \times t$ matrix of unobserved random disturbances whose rows for given $X$ are uncorrelated, each with mean 0 and common covariance matrix $E$ (Mardia et al. 1982; Rencher 2002). According to the least squares method of estimation, $\hat B = (X'X)^{-1}X'Y$ is an estimator of $B$, and the matrix of residual cross products $(Y - X\hat B)'(Y - X\hat B)$, divided by its degrees of freedom, is an estimator of the residual covariance matrix $E$ assuming that $n > m$ (Johnson and Wichern 2007).
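The multivariate least squares fit can be sketched in Python with NumPy as follows. The data are simulated, hypothetical values; the divisor $n - m$ used for the residual covariance matrix is an assumption made for this sketch (other divisors, such as $n$ or $n - m - 1$, appear in the literature), and `numpy.linalg.lstsq` is used in place of the explicit normal-equations inverse for numerical stability.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, t = 120, 5, 2  # genotypes, markers, traits (n > m)

# Hypothetical coded marker values (1, 0, -1) and simulated phenotypes
X = rng.choice([-1.0, 0.0, 1.0], size=(n, m), p=[0.25, 0.5, 0.25])
B_true = rng.normal(size=(m, t))
Y = X @ B_true + rng.normal(scale=0.5, size=(n, t))

# Center the phenotypes on their trait means, as in the one-trait case
Y_centered = Y - Y.mean(axis=0)

# Least squares estimate of the m x t matrix of regression coefficients
B_hat, *_ = np.linalg.lstsq(X, Y_centered, rcond=None)

# Residuals and residual covariance matrix (divisor n - m is an assumption)
residuals = Y_centered - X @ B_hat
E_hat = residuals.T @ residuals / (n - m)

print("B_hat shape:", B_hat.shape)
print("E_hat:\n", E_hat)
```

Each column of `B_hat` is the vector of marker regression coefficients for one trait, so the columns can be used trait by trait to build estimated marker scores.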
Note that 1 À R 2 ¼ b e 0 b e y 0 y , where b e is a vector of estimated residual values of the θ j x j þ e i and R 2 is the coefficient of multiple determination (Eq. 4.32a). In addition, as in the multi-trait context the estimated matrix of residuals Mardia et al. 1982), whence R 2 in the multivariate context can written as whereas R 2 Adj (Eq. 4.32b) can be written as where I is an identity matrix t Â t, b P À1 is the inverse of the estimated covariance matrix of phenotypic values ( b P), and b S is the estimated covariance matrix of marker score values. From Eq. (4.34b), is an unbiased estimator of matrix b S, whereas b PR 2 ¼ b S (Eq. 4.34a) is a biased estimator of matrix b S. The main problem of Eq. (4.34c) is that the diagonal elements of b S could be negative. From the maize F 2 population including 247 genotypes (each one with two repetitions) and 195 molecular markers described in Sect. 4.3.1, we used two traits-PHT (cm) and EHT (cm)-to illustrate the multivariate method of estimating the LMSI parameters. The estimated phenotypic and genetic covariance matrices M ¼ À0:59 À0:18 À0:41 À0:82 ½ . Using a selection intensity of 10% (k I ¼ 1.755), the estimated LMSI selection response and the expected genetic gains per trait were b À10:09 À10:31 À2:53 À4:39 ½ respectively, whereas the estimated LMSI The estimated LPSI parameters (see Chap. 2 for details) using the phenotypic information from the maize F 2 population for traits PHT and EHT are as follows. 
The estimated LPSI vector of coefficients was $\widehat{\mathbf{b}}' = \mathbf{w}'\widehat{\mathbf{C}}\widehat{\mathbf{P}}^{-1} = [-0.53\ \ -0.36]$, and, with a selection intensity of 10% ($k_I = 1.755$), the estimated LPSI selection response and the expected genetic gains per trait ($[\ldots.52\ \ -8.45]$) were obtained, whereas the estimated LPSI accuracy was $\widehat{\rho}_{H\hat{I}} = 0.67$.

We can determine the LMSI efficiency versus the LPSI efficiency at predicting the net genetic merit using the ratio of the estimated accuracy values $\widehat{\rho}_{H\hat{I}_M} = 0.72$ and $\widehat{\rho}_{H\hat{I}} = 0.67$ of the LMSI and LPSI respectively, i.e., $\widehat{\lambda}_M = \dfrac{0.72}{0.67} = 1.075$, whence, according to Eq. (4.19), the estimated LMSI efficiency versus the LPSI efficiency, in percentage terms, was $\widehat{p}_M = 100(1.075 - 1) = 7.5$. That is, for these data, the estimated LMSI efficiency was only 7.5% greater than the LPSI efficiency at predicting the net genetic merit.
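The efficiency comparison is simple arithmetic and can be reproduced with a short sketch. The function name `efficiency_percent` is a hypothetical helper introduced here for illustration; the two accuracy values are the ones quoted in the text.

```python
def efficiency_percent(rho_index: float, rho_reference: float) -> float:
    """Percentage efficiency of one index over a reference index,
    computed as 100 * (lambda - 1) with lambda the ratio of accuracies."""
    lam = rho_index / rho_reference
    return 100.0 * (lam - 1.0)

rho_lmsi = 0.72  # estimated LMSI accuracy quoted in the text
rho_lpsi = 0.67  # estimated LPSI accuracy quoted in the text

lam = rho_lmsi / rho_lpsi
p = efficiency_percent(rho_lmsi, rho_lpsi)
print(f"lambda = {lam:.3f}, p = {p:.1f}%")
```

Without intermediate rounding of the ratio, the percentage is about 7.46, which rounds to the 7.5 reported in the text.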

Estimating the GW-LMSI Parameters in the Asymptotic Context

Lange and Whittaker (2001) proposed the GW-LMSI. However, these authors did not provide detailed procedures for estimating matrices P, C, W, and M. They indicated that matrix C can be estimated using the estimated covariance matrix of marker scores ($\widehat{\mathbf{S}}$) and that matrices P, W, and M can be estimated directly from their empirical variances and covariances, but this statement does not amount to a clear method for estimating those covariance matrices. In Chap. 2, we described the REML method of estimating C and P. Crossa and Cerón-Rojas (2011) described matrices W and M in a doubled haploid population. In this study, we describe and estimate matrices W and M for an F2 population in the asymptotic context according to the Wright and Mowers (1994) approach, which is based on regressing phenotypic values on coded marker values. We used this latter approach to estimate W and M because it is a clearer estimation method than that of Lange and Whittaker (2001); however, the Wright and Mowers (1994) approach is an asymptotic method and its results should be interpreted with caution.
Matrix M is the covariance matrix of the coded molecular marker values. All marker information used to construct matrix M is presented in Table 4.2. Based on this information, we found that the expectations ($E(X_1)$ and $E(X_2)$) and the variances ($V(X_1)$ and $V(X_2)$) of the marker coded values $X_1$ and $X_2$ are $E(X_1) = E(X_2) = 0$ and $V(X_1) = V(X_2) = 1$, whereas the covariance ($\mathrm{Cov}(X_1, X_2)$) and correlation ($\mathrm{Corr}(X_1, X_2)$) between $X_1$ and $X_2$ were

$$\mathrm{Cov}(X_1, X_2) = \mathrm{Corr}(X_1, X_2) = 1 - 2\delta. \tag{4.35}$$

Thus, as the variances of $X_1$ and $X_2$ are equal to 1, the correlation between $X_1$ and $X_2$ is $\mathrm{Corr}(X_1, X_2) = \dfrac{\mathrm{Cov}(X_1, X_2)}{\sqrt{V(X_1)V(X_2)}} = 1 - 2\delta$, i.e., the covariance and correlation between $X_1$ and $X_2$ are the same. Note that unit variances correspond to codes rescaled by $\sqrt{2}$: with the raw codes 1, 0, and $-1$ of Table 4.2, each variance is $1/2$ and the covariance is $0.5 - \delta$, whereas the correlation is $1 - 2\delta$ under either scaling. Equation (4.35) indicates that the same operation applied to many markers gives analogous results, and that this is the way to construct matrix M.
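The moments of the marker codes can be checked directly from the genotype frequencies of Table 4.2. The sketch below is an illustration only; the function name `f2_marker_moments` and the value `delta = 0.1` are assumptions. Computed from the raw codes 1, 0, and −1, the variance comes out as 1/2 and the covariance as 0.5 − δ, which matches the limit $(0.5 - \delta_{ij})$ quoted for $n^{-1}\mathbf{X}'\mathbf{X}$, while the correlation equals $1 - 2\delta$ as in Eq. (4.35).

```python
import numpy as np

def f2_marker_moments(delta: float) -> dict:
    """Moments of the raw marker codes (X1, X2), using the genotype
    frequencies of Table 4.2 for an F2 population."""
    d = delta
    # Each row: (expected frequency, X1, X2), in the order of Table 4.2.
    table = [
        ((1 - d) ** 2 / 4, 1, 1),
        (2 * (d - d**2) / 4, 1, 0),
        (d**2 / 4, 1, -1),
        (2 * (d - d**2) / 4, 0, 1),
        (2 * (1 - 2 * d + 2 * d**2) / 4, 0, 0),
        (2 * (d - d**2) / 4, 0, -1),
        (d**2 / 4, -1, 1),
        (2 * (d - d**2) / 4, -1, 0),
        ((1 - d) ** 2 / 4, -1, -1),
    ]
    freq = np.array([row[0] for row in table])
    x1 = np.array([row[1] for row in table], dtype=float)
    x2 = np.array([row[2] for row in table], dtype=float)

    mean1, mean2 = freq @ x1, freq @ x2
    var1 = freq @ (x1 - mean1) ** 2
    var2 = freq @ (x2 - mean2) ** 2
    cov = freq @ ((x1 - mean1) * (x2 - mean2))
    return {
        "total": freq.sum(),
        "mean": (mean1, mean2),
        "var": (var1, var2),
        "cov": cov,
        "corr": cov / np.sqrt(var1 * var2),
    }

moments = f2_marker_moments(0.1)  # delta = 0.1 is a hypothetical value
print(moments)
```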

Let $\mathbf{X}$ be an $n \times m$ matrix of coded markers, where $n \geq m$ and $m$ is the number of markers; then, according to Wright and Mowers (1994), because all marker information is contained in matrix $\mathbf{X}'\mathbf{X}$, when the number of observations ($n$) tends to infinity, the product $\mathbf{x}_i'\mathbf{x}_j/n$ tends to the covariance between the $i$th and $j$th markers, whence matrix $n^{-1}\mathbf{X}'\mathbf{X}$ should tend to the covariance matrix of the markers that make up matrix $\mathbf{X}$, with the $ij$th element equal to $(0.5 - \delta_{ij})$. Thus, matrix $2n^{-1}\mathbf{X}'\mathbf{X}$ should tend to a covariance matrix whose $ij$th entry is equal to $(1 - 2\delta_{ij})$. Based on the latter result, an estimator of matrix M in the asymptotic context is

$$\widehat{\mathbf{M}} = 2n^{-1}\mathbf{X}'\mathbf{X}. \tag{4.36}$$

Equation (4.36) is an asymptotic result and should be taken with caution. To date, there has been no clear method for estimating M in the non-asymptotic context; for this reason, Eq. (4.36) is used to estimate the GW-LMSI parameters.
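The asymptotic behaviour of Eq. (4.36) can be illustrated with a small simulation. This is a sketch under stated assumptions, not the procedure used for the maize data: the function `simulate_f2_two_markers`, the sample size, and the value `delta = 0.1` are hypothetical, and the F2 individuals are generated from an F1 of type A1B1/A2B2 by drawing two independent gametes per individual.

```python
import numpy as np

def simulate_f2_two_markers(n, delta, rng):
    """Simulate raw codes (1, 0, -1) at two linked markers in an F2.

    Each individual receives two gametes from F1 parents (A1B1/A2B2).
    a == 1 means the gamete carries A1; with probability delta the gamete
    is recombinant and carries the B allele of the other haplotype.
    """
    a = rng.integers(0, 2, size=(n, 2))
    recombinant = rng.random((n, 2)) < delta
    b = np.where(recombinant, 1 - a, a)
    x1 = a.sum(axis=1) - 1  # number of A1 alleles minus one
    x2 = b.sum(axis=1) - 1  # number of B1 alleles minus one
    return np.column_stack([x1, x2]).astype(float)

rng = np.random.default_rng(1)
n, delta = 200_000, 0.1  # hypothetical sample size and recombination value
X = simulate_f2_two_markers(n, delta, rng)

# Asymptotic estimator of M (Eq. 4.36).
M_hat = 2.0 * X.T @ X / n
print(np.round(M_hat, 2))
```

For a large `n`, the diagonal entries should be close to 1 and the off-diagonal entries close to 1 − 2δ; with small samples the deviations can be substantial, which is the reason for the caution attached to Eq. (4.36).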
Assume that a QTL is located between the two markers in Table 4.2; then, $\delta$ can be written as $\delta = r_1 + r_2 - 2r_1r_2$, where $r_1$ and $r_2$ denote the recombination frequencies between the QTL and marker 1 and between the QTL and marker 2 respectively. When the number of genotypes or individuals tends to infinity, the covariance between the phenotypic trait values ($y$) and the marker 1 coded values ($X_1$) in an F2 population can be written as

$$\mathrm{Cov}(X_1, y) = \tfrac{1}{2}\alpha_1(1 - 2r_1), \tag{4.37}$$

where $\alpha_1(1 - 2r_1)$ is the portion of the additive effect ($\alpha_1$) of the QTL linked to marker 1 (Edwards et al. 1987), and $r_1$ is the recombination frequency between the QTL and marker 1. We can assume that, for many markers, the covariance between the phenotypic values and each marker has the same form as Eq. (4.37), whence matrix W can be obtained. Let $\mathbf{y}$ be an $n \times 1$ vector of recorded phenotypic values, where $n$ denotes the number of observations or records, and let $\mathbf{X}$ be an $n \times m$ matrix of coded markers.

Table 4.2 Marker genotypes, expected frequencies, and coded values ($X_1$ and $X_2$) of the marker genotypes in an F2 population

| Marker genotype | Expected frequency | $X_1$ | $X_2$ |
|---|---|---|---|
| $A_1B_1/A_1B_1$ | $(1-\delta)^2/4$ | 1 | 1 |
| $A_1B_1/A_1B_2$ | $2(\delta-\delta^2)/4$ | 1 | 0 |
| $A_1B_2/A_1B_2$ | $\delta^2/4$ | 1 | $-1$ |
| $A_1B_1/A_2B_1$ | $2(\delta-\delta^2)/4$ | 0 | 1 |
| $A_1B_2/A_2B_1$ | $2(1-2\delta+2\delta^2)/4$ | 0 | 0 |
| $A_1B_2/A_2B_2$ | $2(\delta-\delta^2)/4$ | 0 | $-1$ |
| $A_2B_1/A_2B_1$ | $\delta^2/4$ | $-1$ | 1 |
| $A_2B_1/A_2B_2$ | $2(\delta-\delta^2)/4$ | $-1$ | 0 |
| $A_2B_2/A_2B_2$ | $(1-\delta)^2/4$ | $-1$ | $-1$ |
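Equation (4.37) and the expression $\delta = r_1 + r_2 - 2r_1r_2$ can be checked with a three-locus simulation (marker 1, QTL, marker 2). The sketch below rests on several assumptions that are not part of the text: crossovers in the two intervals are independent (no interference), the QTL acts additively on codes 1, 0, and −1 with no dominance, the residual is standard normal, and the values of `r1`, `r2`, and `alpha` are hypothetical.

```python
import numpy as np

def simulate_f2(n, r1, r2, alpha, rng):
    """Simulate an F2 with locus order marker 1 -- QTL -- marker 2.

    Assumes independent crossovers in the two intervals and a purely
    additive QTL effect alpha on the codes 1, 0, -1 (illustrative only).
    """
    m1 = rng.integers(0, 2, size=(n, 2))  # 1 = allele from the first parent
    q = np.where(rng.random((n, 2)) < r1, 1 - m1, m1)  # QTL allele per gamete
    m2 = np.where(rng.random((n, 2)) < r2, 1 - q, q)   # marker 2 allele per gamete
    x1 = m1.sum(axis=1) - 1.0
    xq = q.sum(axis=1) - 1.0
    x2 = m2.sum(axis=1) - 1.0
    y = alpha * xq + rng.normal(size=n)  # phenotype = QTL effect + residual
    return x1, x2, y

rng = np.random.default_rng(2)
n, r1, r2, alpha = 200_000, 0.1, 0.2, 2.0  # hypothetical values

x1, x2, y = simulate_f2(n, r1, r2, alpha, rng)

delta = r1 + r2 - 2 * r1 * r2             # recombination between the two markers
cov_x1_y = np.cov(x1, y)[0, 1]            # compare with 0.5 * alpha * (1 - 2 * r1)
cov_x1_x2 = np.cov(x1, x2)[0, 1]          # compare with 0.5 - delta

print(f"delta = {delta:.2f}")
print(f"Cov(X1, y):  simulated {cov_x1_y:.3f}, Eq. (4.37) {0.5 * alpha * (1 - 2 * r1):.3f}")
print(f"Cov(X1, X2): simulated {cov_x1_x2:.3f}, expected {0.5 - delta:.3f}")
```

The simulated covariances agree with the formulas only up to sampling error, which shrinks as `n` grows; this is the sense in which the results are asymptotic.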