Abstract
The plant net genetic merit is a linear combination of trait breeding values weighted by its respective economic weights whereas a linear selection index (LSI) is a linear combination of phenotypic or genomic estimated breeding values (GEBV) which is used to predict the net genetic merit of candidates for selection. Because economic values are difficult to assign, some authors developed economic weight-free LSI. The economic weights LSI are associated with linear regression theory, while the economic weight-free LSI is associated with canonical correlation theory. Both LSI can be unconstrained or constrained. Constrained LSI imposes restrictions on the expected genetic gain per trait to make some traits change their mean values based on a predetermined level, while the rest of the traits change their values without restriction. This work is geared towards plant breeders and researchers interested in LSI theory and practice in the context of wheat breeding. We provide the phenotypic and genomic unconstrained and constrained LSI, which together cover the theoretical and practical cornerstone of the single-stage LSI theory in plant breeding. Our main goal is to offer researchers a starting point for understanding the core tenets of LSI theory in plant selection.
Gregorio Alvarado: To the memory of ‘Goyito’ our forever dear friend, and unique colleague.
You have full access to this open access chapter, Download chapter PDF
Similar content being viewed by others
Keywords
1 Learning Objectives
-
To understand the advantages of linear selection index (LSI) theory for making selection decisions.
-
To understand and apply the unconstrained and constrained LSI in plant breeding.
-
To understand how to estimate LSI parameters.
2 Introduction
The linear phenotypic selection index (LPSI) theory was first described in the plant breeding context [1] and later in the animal [2] breeding phenotypic selection context. When the phenotypic and genotypic covariance matrices of the traits are known, the LPSI is the best phenotype-based linear predictor of the individual net genetic merit. In LPSI theory, it is assumed that the genotypic values that define the net genetic merit are composed entirely of the additive effects of genes and that the LPSI and the net genetic merit have joint bivariate normal distribution [3]. The main objectives of using a selection index (LSI) are (i) to predict the unobservable net genetic merit values of the candidates for selection, (ii) to maximize the expected genetic gain per trait or multi-trait selection response, and (iii) to provide the breeder with an objective rule for evaluating and selecting for several traits simultaneously. The advantages of an LPSI are that it modifies the predefined economic weights according to the trait heritability, that it considers indirect selection effects resulting from the genetic correlation between traits, and that it is relatively easy to use. Its disadvantages are that it may be difficult to assign economic weights to some traits, and that it requires large amounts of information to reliably estimate the genetic covariance between traits. This may cause a large sampling error.
Because economic weights are difficult to assign to some traits, several modified indices, such as the base index, the modified base index, the non-weighted multiplicative index [4] and the eigen selection index method (ESIM) [5, 6], have been proposed. The main LSI theory was developed assuming that the economic weights are fixed and known, which is, for instance, not the case for the ESIM.
In the LPSI structure, each trait has an economic weight. This could also imply that, for each trait, a directional change is desired. This may not be suitable to achieve a breeding objective in which some traits should remain unchanged. Assuming that the breeder is interested in keeping a certain trait within a range, we could either combine the use of an LPSI with independent culling for the restricted traits, or we could incorporate the fact that not changing the trait is desired. This was the main idea of the restricted LPSI (RLPSI) [3] which solves the usual LPSI equations subject to the restriction that the covariance between the LPSI and some linear function of the genotypes involved equals zero, thus preventing selection on the index from causing any genetic change in the expected genetic advance of the restricted traits.
Later the RLPSI results were extended [7] to a selection index called constrained LPSI (CLPSI) that attempts to make some traits change their mean based on a predetermined level while the rest of them are unrestricted. The CLPSI equals the covariance between the LPSI and some linear functions of the genotypes to a constant or genetic gain predetermined by the breeder. Some authors [5] developed a constrained index ESIM (CESIM) that does not use economic weights. The CLPSI (CESIM) is the most general LPSI and includes the LPSI (ESIM) and the RLPSI as particular cases.
In a similar manner, in the marker-assisted selection (MAS) context, a linear marker selection index (LMSI) was proposed [8] that uses phenotypic and marker score values jointly to predict the net genetic merit. The LMSI combines information on markers linked to quantitative trait locus (QTLs) and the phenotypic values of the traits to predict the net genetic merit of the candidates for selection because it is not possible to identify all QTLs affecting the economically important traits. Several authors [9, 10] have criticized the LMSI approach because it makes inefficient use of the available data. In addition, because the LMSI is based on only a few large QTL effects, it violates the selection index assumptions of multivariate normality and small changes in allele frequencies. We shall not describe the LMSI. Readers interested in the LMSI can see [11] for details.
The linear genomic selection index (LGSI) and constrained linear genomic selection index (CLGSI) were developed in the genomic selection (GS) context in which animals and plants are selected based on the GEBV of the candidates for selection [12, 13]. In the LGSI context, all marker effects of the genotyped individuals in the training population are estimated using marker and phenotypic data. These estimated effects are used in subsequent selection cycles to obtain predictors (GEBVs) of the individual breeding values in the testing population for which there is only marker information about the candidates for selection.
It has been shown [9] that GS increased the accuracy of predicting the breeding values of the candidates for selection, and reduced the intervals between selection cycles and the costs of the breeding programs (See Chaps. 5, 6 and 30). Because GS decreases the generation interval, it leads to a much higher genetic gain per year. Some authors [14] indicated that GS could replace traditional progeny testing when maximizing the genetic gain per year, as long as the accuracy of GEBV is higher than or equal to 0.45.
The expected selection response of the net genetic merit and the expected genetic gain per trait are the main quantities to consider when comparing different LSI. These parameters give breeders an objective basis to compare different selection methods. We describe the practical applications of the phenotypic and genomic LSI using real wheat data. Readers unfamiliar with LSI theory should read the Appendix of this work first and then return to the manuscript. A complete exposition of LSI theory is in [15].
3 Definitions
Breeding Value
the value of an individual measured by the mean phenotype of its progeny obtained by random mating with the population. It is also the sum of the average additive effects of the genes of the individual.
Economic Weight
the increase in profit achieved by improving a particular trait by one unit.
Expected Genetic Gain Per Trait (Multi-trait Selection Response)
a vector of expected genetic gains associated with the traits of the offspring of the selected parents.
GEBV
the sum of additive whole genome allele effects of an individual. Allele effects are estimated by a regression of the phenotypic values on the whole genome DNA markers. It is used to predict breeding values of individuals in animal and plant breeding programs in the genomic selection context.
Genomic Selection
the selection of parents based on the higher GEBV values or on a linear combination of them (e.g., LGSI or CLGSI).
Genotypic Value
the average of the phenotypic values across a (large) population of environments.
Linear Selection Index (LSI)
a linear combination of phenotypic and/or GEBV values, or marker scores. In addition, it can be unconstrained or constrained.
Net Genetic Merit
a linear combination of breeding values of the individual traits of interest, each of them weighted by its respective economic value. It is also called the total economic value of one individual.
Phenotype Value
the sum of genotypic (or breeding) value, environment value, and genotype-by-environment interaction.
Quantitative Traits
plant and animal characteristics (or phenotypic expression) that exhibit continuous variability, which is the result of many gene effects interacting among themselves and with the environment.
Selection Response
the expectation of the net genetic merit of the selected individuals when the mean of the original population is zero. It is also defined as the difference between the mean phenotypic values of the offspring of the selected parents and the mean of the entire parental generation before selection.
4 Key Points
-
Selection indices are fundamental tools for modern plant breeding.
-
The use of selection indices is a key to better estimate the net genetic merits of candidates for selection. Selection indices will ensure that wheat improvement research maximizes its impact.
-
New breeding technologies like genomic assisted breeding and rapid cycle selection has to be combined with the use of selection indices to maximize response to selection.
5 Phenotypic and Genomic Selection Indices Theoretical Results
5.1 The Net Genetic Merit and the LPSI
The net genetic merit (H = w′g) is related to the vector of trait phenotypic (y) values as
where g′ = [G1 G2 … Gt] and \( {\mathbf{y}}^{\prime }=\left[{Y}_1\kern0.5em {Y}_2\kern0.5em \dots \kern0.5em {Y}_t\right] \) are vector 1 × t (t=number of traits) of true unobservable breeding values and observable trait phenotypic values, respectively, I = b′y is the LPSI, and w′ = [w1 w2…wt] is the vector of economic weights. In Eq. 32.1, we assume that e has normal distribution with expectation E(e) = 0 and variance \( {\sigma}_e^2 \), and that I and e are independent; thus \( {\sigma}_H^2={\sigma}_I^2+{\sigma}_e^2 \) is the variance of H, \( {\sigma}_I^2={\mathbf{b}}^{\prime}\mathbf{Pb} \) is the variance of I, P is the phenotypic covariance matrix, and \( {\sigma}_e^2={\sigma}_H^2-{\sigma}_I^2 \) is the residual variance.
The LPSI (I = b′y) can be written as
where b = P−1Cw, C is the genotypic covariance matrix, Cov(H, y) = Cw is the covariance among H = w′g and y, and P−1 is the inverse matrix of P.
5.2 Economic Weights for LPSI
A method for assigning economic weights to the traits [1] is as follows. Suppose that in a wheat-selection program we are required to consider the vector \( {\mathbf{y}}^{\prime }=\left[{Y}_1\kern0.5em {Y}_2\kern0.5em \dots \kern0.5em {Y}_t\right] \) of t traits. Let us evaluate each in terms of Y1. Suppose that Y1 denotes grain yield, Y2 baking quality and Y3 denotes resistance to flag smut. Suppose that an advance of 10 in baking score (Y2) is equal in value to an advance of 1 bushel per acre in yield (Y1) and that a decrease of 20% infection (Y3) is worth 1 bushel of yield (Y1), and so on. Then, taking Y1 as standard and units as indicated, w1 = 1.0, w2 = 0.1, w3 = − 0.05, etc., will be the economic values of each trait.
One additional method for assigning economic weights to the traits (which we have used in this work) is based on the expected genetic gain per trait (Appendix, Eq. 32.A5). Let us consider the real data HarvestPlus Association Mapping (HPAM) panel, which consists of 330 wheat lines from CIMMYT, and assume that the objective of the selection is to increase the mean value of Zn content in the grain (Zn), the Fe content in the grain (Fe), and grain yield (GY, t/h), while decreasing or maintaining the same plant height (PHT, cm). We found that the vector \( \mathbf{w}=\left[0.1\kern0.5em 0.5\kern0.5em 2.8\kern0.5em -0.6\right] \) (see Sect. 32.10) is adequate for obtaining the expected genetic gain per trait described in the Results Section of this work. This method is by assay and error and requires the evaluation of Eq. 32.A5 until we obtain the desired results.
5.3 The Maximized Correlation and the Maximized LPSI Selection Response
The maximized correlation between H and I (ρHI) and the maximized LPSI selection response are
respectively, where b = P−1Gw (Appendix, Eq. 32.A3). Equation 32.4 predicts the mean improvement in H due to indirect selection on I = b′y. Here, k is the intensity of selection. The heritability of I = b′y is \( {h}_I^2=\frac{{\mathbf{b}}^{\prime}\mathbf{Cb}}{{\mathbf{b}}^{\prime}\mathbf{Pb}} \).
6 The Retrospective Index
This index is useful when, instead of the index values, the breeder observes only the vector of selection differentials (s). In this case, the index that would give the same observed s is called the retrospective index and its vector of coefficients can be obtained as b = P−1s [16].
7 Constrained LPSI (CLPSI)
The CLPSI vector of coefficients is
where K = [It − Q],Q = P−1M(M′P−1M)−1M′, M′ = D′U′C, It is a t×t identity matrix and b = P−1Cw.
7.1 The Maximized CLPSI Selection Response and Expected Genetic Gain Per Trait
The maximized CLPSI selection response and expected genetic gain per trait are
respectively, where k is the selection intensity.
8 The ESIM and CESIM Theory
8.1 The Maximized ESIM Selection Response and the Maximized \( {\rho}_{H{I}_1} \)
The maximized ESIM selection response (RE) and the maximized correlation between \( {I}_{E_1}={\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{y} \) and \( {H}_{E_1}={\mathbf{w}}_{E_1}^{\prime}\mathbf{g} \) (\( {\rho}_{H{I}_1} \)) are
respectively, where \( {\boldsymbol{\beta}}_{E_1}=\mathbf{F}{\mathbf{b}}_{E_1} \) is the first eigenvector of equation \( \left({\mathbf{T}}_2-{\rho}_{H{I}_j}^2\mathbf{I}\right){\boldsymbol{\beta}}_{E_j}=\mathbf{0} \) (Appendix, Eqs. 32.A7 and 32.A8). When F is not used, Eq. 32.8 is equal to \( {R}_E=k\sqrt{{\mathbf{b}}_{E_1}^{\prime}\mathbf{P}{\mathbf{b}}_{E_1}} \) (Appendix, Eq. 32.A7), whereas Eq. 32.9 is the square root of the first eigenvalue of Eq. 32.A7, i.e., \( {\rho}_{H{I}_1}=\sqrt{\rho_{H{I}_1}^2} \). The heritability of \( {I}_{E_1}={\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{y} \) is \( {h}_E^2=\frac{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{C}{\boldsymbol{\beta}}_{E_1}}{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{P}{\boldsymbol{\beta}}_{E_1}} \)
8.2 The Maximized CESIM Selection Response and Expected Genetic Gain Per Trait
The maximized CESIM selection response (RCE) and expected genetic gain per trait (ECE) are
respectively, where all the terms were defined earlier.
9 The Unconstrained and Constrained Linear Genomic Selection Index Theory
The LGSI and the CLGSI are, respectively, an application of the LPSI and CLPSI to the genomic selection context. Thus, the LGSI and the CLGSI theoretical results are very similar to the LPSI and CLPSI theoretical results.
9.1 The Unconstrained Linear Genomic Selection Index (LGSI)
Let \( {\mathbf{z}}^{\prime }=\left[ GEB{V}_1\kern0.5em GEB{V}_2\kern0.5em \cdots \kern0.5em GEB{V}_t\right] \) be a vector of GEBVs for t traits. The individual LGSI is
where w is the vector of economic weights for t traits.
9.2 The CLGSI Vector of Coefficients
The CLGSI vector of coefficients is
where w is the vector of economic weights, KG = [It − QG], QG = UD(D′U′ΓUD)−1D′U′Γ, Γ = Var(z) is the covariance matrix of GEBV, and It is an identity matrix of size t×t, whereas D and U are the matrices described in Eq. 32.A6 (Appendix). When d = 0, D=U and matrix KG can be written as KG = [It − QG], where QG = U(U′ΓU)−1U'Γ. In this case, the CLGSI is a null restricted LGSI. When D=U and U′ is a null matrix, βG = w. Thus, the CLGSI includes the null restricted and the unrestricted LGSI as particular cases.
9.3 Maximized CLGSI Selection Response and Expected Genetic Gain Per Trait
The maximized CLGSI selection response and expected genetic gain per trait are
respectively. The methods to estimate the index parameters are in [15].
9.4 The Genomic Estimated Breeding Values (GEBV)
To obtain the GEBV, we used a multi-trait genomic best linear unbiased predictor (GBLUP) described in [12, 13].
10 Real Wheat Data
We used the HarvestPlus Association Mapping (HPAM) panel, which consists of 330 wheat lines from CIMMYT and four traits: Zn content in the grain (Zn), Fe content in the grain (Fe), grain yield (GY, t/h), and plant height (PHT, cm). The objective of the selection was to increase the mean values of Zn, Fe, and GY while PHT decreased or stayed the same.
Using CLPSI, CESIM, and CLGSI, we constrained traits Zn, Fe and GY with the vector of constraints \( {\mathbf{d}}^{\prime }=\left[1.5\kern0.5em 1.6\kern0.5em 0.45\right] \) and matrices \( {\mathbf{U}}^{\prime }=\left[\begin{array}{cccc}1& 0& 0& 0\\ {}0& 1& 0& 0\\ {}0& 0& 1& 0\end{array}\right] \) and \( {\mathbf{D}}^{\prime }=\left[\begin{array}{ccc}0.45& 0& -1.5\\ {}0& 0.45& -1.6\end{array}\right] \). Each element of vector d is the standard deviation of the genotypic variance of Zn, Fe, and GY, respectively. The vector of economic weights for LPSI, CLPSI, LGSI, and CLGSI was \( \mathbf{w}=\left[0.1\kern0.5em 0.5\kern0.5em 2.8\kern0.5em -0.6\right] \), whereas for ESIM and CESIM, matrix F was \( \mathbf{F}=\left[\begin{array}{cccc}-0.5& 0& 0& 0\\ {}0& 1.0& 0& 0\\ {}0& 0& 2.0& 0\\ {}0& 0& 0& -0.5\end{array}\right] \) and \( \mathbf{F}=\left[\begin{array}{cccc}1.0& 0& 0& 0\\ {}0& -1.0& 0& 0\\ {}0& 0& 2.0& 0\\ {}0& 0& 0& -0.8\end{array}\right] \), respectively. The total proportion (p) retained was 6% (k=1.98) for the phenotypic indices and 12.45% (k=1.65) for the genomic indices. The estimated phenotypic (\( \hat{\mathbf{P}} \)) and genotypic (\( \hat{\mathbf{C}} \)) covariance matrices among the four traits were
With the data described above, we obtained the estimated matrix Γ (\( \hat{\boldsymbol{\Gamma}} \)) for three cases denoted as G, G-COP and COP, where \( \hat{\varGamma}=\left[\begin{array}{cccc}0.47& 0.11& -0.16& -0.09\\ {}0.11& 0.82& 0.72& 0.17\\ {}-0.16& 0.72& 1.82& 0.24\\ {}-0.09& 0.17& 0.24& 0.13\end{array}\right] \), \( \hat{\varGamma}=\left[\begin{array}{cccc}0.87& 0.35& -0.03& -0.10\\ {}0.35& 1.01& 0.89& 0.17\\ {}-0.03& 0.89& 2.51& 0.35\\ {}-0.10& 0.17& 0.35& 0.17\end{array}\right] \), and \( \hat{\varGamma}=\left[\begin{array}{cccc}0.77& 0.38& 0.03& -0.08\\ {}0.38& 0.91& 0.78& 0.12\\ {}0.03& 0.78& 2.26& 0.31\\ {}-0.08& 0.12& 0.31& 0.14\end{array}\right] \), respectively.
11 Results
11.1 Phenotypic Results
Figure 32.1 presents the averages for four traits of the 20 selected individuals (with LPSI and ESIM) with a proportion of 6% (k = 1.985). In this case, those averages were very similar. We found similar results when we made selections using CLPSI and CESIM for this real dataset.
Table 32.1 presents the estimated LPSI, ESIM, CLPSI, and CESIM selection response, coefficient of correlation and heritability. The estimated ESIM and CESIM selection response, correlation and heritability were higher than the estimated LPSI and CLPSI selection response, correlation and heritability. Thus, ESIM and CESIM efficiency for predicting the net genetic merit was higher than LPSI and CLPSI efficiency.
Table 32.2 presents the estimated LPSI, ESIM, CLPSI, and CESIM expected genetic gain for four traits selected with a proportion of 6% (k = 1.985). The estimated CLPSI and CESIM expected genetic gains per trait were constrained by vector \( {\mathbf{d}}^{\prime }=\left[1.5\kern0.5em 1.6\kern0.5em 0.45\right] \) values. Thus, the estimated expected genetic gains of traits Zn, Fe, and GY should be similar to the d values. The estimated CLPSI and CESIM expected genetic gain values were lower than the d values. This means that to reach d values, breeders will need to select once again using CLPSI and CESIM. However, the estimated CESIM expected genetic gain values were higher than the estimated CLPSI expected genetic gain values.
11.2 Genomic Selection Index Results
For datasets G, G-COP and COP, in Table 32.3 we present the estimated LGSI and CLGSI selection response and expected genetic gain for four traits with a selected proportion of 12.45% (k = 1.65). In this case, the estimated CLGSI expected genetic gains per trait were constrained by vector \( {\mathbf{d}}^{\prime }=\left[1.5\kern0.5em 1.6\kern0.5em 0.45\right] \) values. Thus, the estimated expected genetic gain of traits Zn, Fe, and GY should be similar to the d values. The estimated CLGSI expected genetic gain values were lower than the d values. This means that to reach d values, breeders will need to select once again using CLGSI. Note, however, that the estimated LGSI and CLGSI selection response and expected genetic gain values were higher than the estimated LPSI and CLPSI expected genetic gain values. This means that for the predicted data, LGSI and CLGSI efficiency was higher than LPSI and CLPSI efficiency. In addition, the estimated LGSI and CLGSI selection response was not affected by the restriction imposed on the LGSI and CLGSI expected genetic gain, as we would expect.
12 How to Incorporate a Selection Index in Practice?
Incorporating a selection index requires a step-by-step approach to ensure its successful implementation. Most of the time, breeders use a customized procedure to select individuals based on independent culling that comprises multiple steps.
The first step consists of understanding the selection procedure executed by the program. The steps in the selection procedure can be mapped back to a set of reduction and selection steps applied to a selection unit (i.e., lines, families, etc.); each step consists of trait conditions (value and directionality). Each selection step can consist of meeting more than one trait condition (Table 32.4).
The second step identifies which parts of the selection process can be replaced by an index. For example, by looking at Table 32.4, you can decide to pick a single step and replace independent culling or replace multiple steps with a single selection index. Here we will replace steps 2–11 with a selection index.
13 Retrospective Index
A third step consists of building the index. Indices that depend on economic weights are difficult to implement. Instead, a retrospective index is the best way to start implementing an index. For example, assume that the matrix of estimates for the traits indicated above is available together with an indicator column in which the material was selected by the breeder using the steps indicated in Table 32.4. The formula \( \hat{\mathbf{b}}={\hat{\mathbf{P}}}^{-1}\mathbf{s} \) is then used to infer the weights.
Suppose that \( \hat{\mathbf{P}} \)and \( {\hat{\mathbf{P}}}^{-1} \)are as follow:
Then, according to Table 32.5 values,
The obtained weights can be confusing if they have a different direction than the desired direction. For example, the 4th weight for Xa2 (Table 32.5) resistance is positive, when we would expect it to be negative. This is because covariances among traits are expected to account for taking the trait in the right direction despite the value of the weight.
To show that these weights are better than the current approach, we can calculate what would be the selected individuals and the selection differentials using the index and compare them to the selection differentials obtained with the current approach. As can be seen in Table 32.6, if these weights are considered the real weights (even economic weights), the index can select better individuals than the breeder’s eyeball method. This example shows how the selection index theory can provide higher selection differentials than the breeder.
14 Discussion
14.1 The Unconstrained LSI Theory
The LSI theory includes, as particular cases, the unconstrained LPSI and LGSI, and any other unconstrained LSI associated with this theory that is based on the quantitative genetics and the multivariate normal distribution theory. The LSI theory is based on multivariate normal distribution theory because this distribution allows the LSI to be completely described using only means, variances and covariances. When the phenotypic traits and GEBV values have multivariate normal distribution, linear combinations of phenotypic traits and GEBV are normal. Even if the phenotypic traits and GEBV values do not have multivariate normal distribution, this distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, by the central limit theorem, have multivariate normal distribution [17]. By this reasoning, a fundamental assumption in LSI theory is that the LSI and the net genetic merit have joint bivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic or GEBV values is linear [3].
The selection response and the expected genetic gain per trait were the main parameters of the LSI and the criteria to compare LSI efficiency and predict the net genetic merit of any linear index. These parameters give breeders a clearer base on which to objectively validate the effectiveness of the adopted selection method.
The LPSI was the first LSI used to predict the net genetic merit and has good statistical properties when the phenotypic and genotypic covariances matrices are known. The LGSI is the most recent LSI and has the advantage of reducing the intervals between selection cycles by more than two thirds.
14.2 The Constrained LSI
The constrained LPSI (CLPSI) and the constrained LGSI (CLGSI) impose constraints on the expected genetic gain per trait. These indices include the unconstrained indices as particular cases. There are two types of CLPSI and CLGSI: the null restricted index and the predetermined proportional gain index. The null restricted index allows imposing restrictions equal to zero on the expected genetic gain of some traits, while the expected genetic gain of other traits increases (or decreases) without imposing any restrictions. In a similar manner, the constrained index attempts to make some traits change their expected genetic gain values based on a predetermined level, while the rest of the traits remain without restrictions. The objective of both types of selection indices is to predict the net genetic merit and select parents for the next generation. The CLPSI and CLGSI are projections of the vector coefficients of the LPSI and LGSI, respectively, to a different space, and the constraining effects are observed on the CLPSI and CLGSI expected genetic gains per trait where each restricted trait has an expected genetic gain according to the constrained values imposed by the breeder.
14.3 Statistical Properties of the LSI
Both the unconstrained and constrained indices have the same statistical properties when the phenotypic and genotypic covariance matrices and the economic weights are known. For example, they have maximum correlation with the net genetic merit and the variance of the predicted error is minimal; however, when the phenotypic and genotypic covariance matrices and the economic weights are unknown, the statistical sampling properties of the indices described in this work are difficult to know. Assuming that the estimated LSI have normal distribution, some authors [18] found the statistical sampling properties of the LSI selection responses in the phenotypic and genomic selection context while others [15] reported the statistical sampling properties of ESIM and CESIM.
15 Key Concepts
-
Using a selection index in plant breeding maximizes the expected genetic gain per trait or multi-trait selection response and provides an objective rule for evaluating and selecting for several traits simultaneously.
-
The advantages of a selection index is that it considers indirect selection effects resulting from the genetic correlation between traits. Main disadvantages are that it may be difficult to assign economic weights to some traits. Several modified indices exists to overcome this problem.
-
Recently genomic selection indices have been developed and used based on the genomic estimated breeding.
16 Conclusions
Our main goal was to offer researchers a starting point for understanding the core tenets of LSI theory in plant selection. We provided the unconstrained and constrained LSI theory associated with phenotypic and genomic selection. We validated the LSI phenotypic and genomic theoretical results in the wheat breeding context using a real wheat dataset with four traits.
References
Smith HF (1936) A discriminant function for plant selection. Ann Eugen 7:240–250. https://doi.org/10.1111/j.1469-1809.1936.tb02143.x
Hazel LN (1943) The genetic basis for constructing selection indexes. Genetics 28:476–490
Kempthorne O, Nordskog AW (1959) Restricted selection indices. Biometrics 15:10–19
Baker RJ (1986) Selection indices in plant breeding. CRC Press Inc., Boca Raton
Cerón-Rojas JJ, Sahagún-Castellanos J, Castillo-González F, Santacruz-Varela A, Crossa J (2008) A restricted selection index method based on eigen analysis. J Agric Biol Environ Stat 13:421–438. https://doi.org/10.1198/108571108X378911
Cerón-Rojas JJ, Crossa J, Toledo FH, Sahagún-Castellanos J (2016) A predetermined proportional gains eigen selection index method. Crop Sci 56:2436–2447. https://doi.org/10.2135/cropsci2015.11.0718
Mallard J (1972) The theory and computation of selection indices with constraints: a critical synthesis. Biometrics 28:713–735
Lande R, Thompson R (1990) Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743–756
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Dekkers JCM (2007) Prediction of response to marker-assisted and genomic selection using selection index theory. J Anim Breed Genet 124:331–341. https://doi.org/10.1111/j.1439-0388.2007.00701.x
Céron-Rojas JJ, Crossa J (2018) Linear marker and genome-wide selection indices. In: Linear selection indices in modern plant breeding. Springer, Cham, pp 71–98
Cerón-Rojas JJ, Crossa J (2019) Efficiency of a constrained linear genomic selection index to predict the net genetic merit in plants. G3 9:3981–3994. https://doi.org/10.1534/g3.119.400677
Cerón-Rojas JJ, Crossa J, Arief V, Basford K, Rutkoski J, Jarquín D, Alvarado G, Beyene Y, Semagn K, DeLacy I (2015) A genomic selection index applied to simulated and real data. G3 5:2155–2164. https://doi.org/10.1534/g3.115.019869
Börner V, Reinsch N (2012) Optimising multistage dairy cattle breeding schemes including genomic selection using decorrelated or optimum selection indices. Genet Sel Evol 44:11. https://doi.org/10.1186/1297-9686-44-1
Cerón-Rojas JJ, Crossa J (2018) Linear selection indices in modern plant breeding. Springer, Cham
Dickerson GE, Blunn CT, Chapman AB, Kottman RM, Krider JL, Warwick EJ, Whatley J, Baker ML, Lush JL, Winters LM (1954) Evaluation of selection in developing inbred lines of swine. Research Bulletin 551. University of Missouri, College of Agriculture, Agricultural Experiment Station
Rencher AC (2002) Methods of multivariate analysis, 2nd edn. Wiley-Interscience, New York
Cerón-Rojas JJ, Crossa J (2020) Expectation and variance of the estimator of the maximized selection response of linear selection indices with normal distribution. Theor Appl Genet. https://doi.org/10.1007/s00122-020-03629-6
Céron-Rojas JJ, Crossa J (2018) Linear phenotypic eigen selection index methods. In: Linear selection indices in modern plant breeding. Springer, Cham, pp 149–176
Author information
Authors and Affiliations
Editor information
Editors and Affiliations
Appendix
Appendix
This Appendix is a brief review of the LSI theory. Readers interested in this theory should see [15], who describe the complete LSI theory.
1.1 Breeding and Trait Phenotypic Values
Let \( {\mathbf{g}}^{\prime }=\left[{G}_1\kern0.5em {G}_2\kern0.5em \dots \kern0.5em {G}_t\right] \) be a vector 1 × t (t= number of traits) of true unobservable breeding values associated with the observable vector of trait phenotypic values \( {\mathbf{y}}^{\prime }=\left[{Y}_1\kern0.5em {Y}_2\kern0.5em \dots \kern0.5em {Y}_t\right] \), such that the jth(j= 1, 2, …, t) individual trait phenotypic value for one environment is
where Gj is composed entirely of additive genetic effects and includes all types of gene and interaction values, whereas εj denotes the deviations of Yj from the Gj values. In Eq. 32.A1, Gj and εj are independent unobservable random variables, have normal distribution with expectation E(Gj) = 0 and E(εj) = 0, and variance \( {\sigma}_{G_j}^2 \)and \( {\sigma}_{\varepsilon_j}^2 \), respectively. In addition, Yj is an observable random variable, with normal distribution, expectation E(Yj) = 0 and variance \( {\sigma}_{G_j}^2+{\sigma}_{\varepsilon_j}^2 \). Finally, \( \mathit{\operatorname{cov}}\left({Y}_j,{G}_j\right)={\sigma}_{G_j}^2 \)is the covariance between Yj and Gj.
1.2 The Unconstrained Linear Phenotypic Selection Index (LPSI)
The random vectors \( {\mathbf{g}}^{\prime }=\left[{G}_1\kern0.5em {G}_2\kern0.5em \dots \kern0.5em {G}_t\right] \) and \( {\mathbf{y}}^{\prime }=\left[{Y}_1\kern0.5em {Y}_2\kern0.5em \dots \kern0.5em {Y}_t\right] \) (Equation 32.A1) have joint multivariate normal distribution with mean \( {\mu}^{\prime }=\left[\mathbf{0}\kern0.5em \mathbf{0}\right] \) and covariance matrix \( Var\left(\begin{array}{c}\mathbf{y}\\ {}\mathbf{g}\end{array}\right)=\left[\begin{array}{cc}\mathbf{P}& \mathbf{C}\\ {}\mathbf{C}& \mathbf{C}\end{array}\right] \), where P and C are t × t covariance matrices of trait phenotypic (y) and breeding (g) values, respectively. The joint distribution of the linear combination of y(I = b′y, called LPSI) and g(H = w′g, called “net genetic merit”) values is bivariate normal distribution with mean \( {\mathbf{m}}^{\prime }=\left[{m}_H\kern0.5em {m}_I\right] \) and covariance matrix
where \( {\mathbf{b}}^{\prime }=\left[{b}_1\kern0.5em {b}_2\kern0.5em \dots \kern0.5em {b}_t\right] \) is an unknown vector of coefficients associated with y, and w′ = [w1 w2…wt] is a vector of known economic weight values associated with g. In Eq. 32.A2, \( {\sigma}_H^2={\mathbf{w}}^{\prime}\mathbf{Cw} \) and \( {\sigma}_I^2={\mathbf{b}}^{\prime}\mathbf{Pb} \) are the variance of H and I, respectively, whereas σHI = w′Cb is the covariance between H and I.
1.3 The Best Linear Predictor of the Mean Value of H
Suppose that mH = 0 and mI = 0; then the conditional expectation of H given y (H/y) is
where b = P−1Cw, I = b′y = w′CP−1y; Cov(H, y) = Cw is the covariance among H and y, and P−1 is the inverse matrix of P. Eq. 32.A3 is the best linear predictor of the mean value of H.
1.4 The Selection Response
The selection response (R) is the expectation of H for a proportion p (Fig. 32.A1) of individuals selected and can be written as
where k is the intensity of selection, σH is the standard deviation of H and ρHI is the correlation between H and I.
Equation 32.A4.1 is the same for all LSI; the only change is the type of information (phenotypic or genomic) and restrictions used when the index vector of coefficients is obtained to predict H and to maximize Eq. 32.A4.1, which is the main objective of any LSI. The genetic gain in Eq. 32.A4.1 will be larger as p becomes smaller—i.e., as the selection intensity becomes more intense (Fig. 32.A2). For example, in the LGSI context, the maximized selection response (Eq. 32.A4.1) can be written as
where k is the selection intensity, L denotes the interval between selection cycles, Γ = Var(z) is the covariance matrix of GEBV, and \( {\sigma}_I=\sqrt{{\mathbf{w}}^{\prime}\boldsymbol{\Gamma} \mathbf{w}} \) is the standard deviation of IG = w′z.
1.5 Constrained LPSI (CLPSI)
The main objective of the CLPSI is to maximize Eq. 32.A4 under some restrictions imposed on the expected genetic gain per trait (E), which can be written as
The type of restriction imposed on Equation 32.A5 can be a null restriction (RLPSI) or a predetermined constraint (CLPSI). Thus, let \( {\mathbf{d}}^{\prime }=\left[{d}_1\kern0.5em {d}_2\kern0.5em \dots \kern0.5em {d}_r\right] \) be a vector of r constraints and assume that μq is the population mean of the qth trait (q=1,2,…,r , and r is the number of constraints) before selection. The CLPSI changes μq to μq + dq, where dq is a predetermined change in μq imposed by the breeder. The restriction effects will be observed on the CLPSI expected genetic gains per trait (Equation 32.A5), where each restricted trait will have an expected genetic gain according to the d values imposed by the breeder.
1.6 The CLPSI Vector of Coefficients
Let \( {\mathbf{D}}^{\prime }=\left[\begin{array}{c}{d}_r\\ {}0\\ {}\begin{array}{c}\vdots \\ {}0\end{array}\end{array}\kern0.5em \begin{array}{c}0\\ {}{d}_r\\ {}\begin{array}{c}\vdots \\ {}0\end{array}\end{array}\kern0.5em \begin{array}{ccc}\begin{array}{c}\dots \\ {}\dots \\ {}\begin{array}{c}\ddots \\ {}\dots \end{array}\end{array}& \begin{array}{c}0\\ {}0\\ {}\begin{array}{c}\vdots \\ {}{d}_r\end{array}\end{array}& \begin{array}{c}-{d}_1\\ {}-{d}_2\\ {}\begin{array}{c}\vdots \\ {}-{d}_{r-1}\end{array}\end{array}\end{array}\right] \)be a Mallard matrix [7] (r−1)×r of predetermined proportional gains, where dq (q=1, 2…, r) is the qth element of vector d, and let U′ be a matrix of 1’s and 0’s, where 1 indicates that the traits are restricted and 0 that the traits are not restricted [3]. To obtain the CLPSI vector of coefficients, we minimized the mean squared difference between I and H, E[(H−I)2], with respect to b under the restriction D′U′Cb = 0, where C is the covariance matrix of genotypic values.
The CLPSI vector of coefficients is
where K = [It − Q],Q = P−1M(M′P−1M)−1M′, M′ = D′U′C, It is a t×t identity matrix and b = P−1Cw. When d = 0, D=U, Q = P−1CU(U′CP−1CU)−1U′C, and CLPSI=RLPSI. When D=U and U′ is a null matrix, β = b. Thus, the CLPSI is the most general index and includes the LPSI and the RLPSI as particular cases.
1.7 The Eigen Selection Index Method (ESIM)
The ESIM maximizes the correlation between H=w′g and I = b′y, does not require a vector of economic weights w, and is associated with the canonical correlation theory [17]. In ESIM, I and H are canonical variables, whereas b and w are canonical vectors. The correlation between I and H (ρHI) is the canonical correlation. Thus, the measure of association between the jth linear combination of y(\( {I}_E={\mathbf{b}}_{E_j}^{\prime}\mathbf{y} \)) and the jth linear combination of g(\( {H}_E={\mathbf{w}}_{E_j}^{\prime}\mathbf{g} \)) is the jth canonical correlation (\( {\rho}_{H{I}_j} \)) value obtained from Eq. 32.A7
where \( {\mathbf{b}}_{E_j} \) is the jth canonical vector (j = 1, 2,…, t) of matrix P−1C, and \( {\mathbf{w}}_{E_j}={\mathbf{C}}^{-1}\mathbf{P}{\mathbf{b}}_{E_j} \). The first eigenvector (\( {\mathbf{b}}_{E_1} \)) of matrix P−1C is used in \( {I}_E={\mathbf{b}}_{E_1}^{\prime}\mathbf{y} \) and in the maximized ESIM selection response.
Let T=P−1C; then Eq. 32.A7 can be written as \( \mathbf{TI}{\mathbf{b}}_{E_j}={\rho}_{H{I}_j}^2\mathbf{I}{\mathbf{b}}_{E_j} \), where I = F−1F is an identity matrix of size t×t, and F=diag{f1,f2,…,ft} is a diagonal matrix with values equal to any real number, except zero values. Thus, Eq. 32.A7 is equivalent to
where T2=FTF−1 and \( {\boldsymbol{\beta}}_{E_j}=\mathbf{F}{\mathbf{b}}_{E_j} \); T and T2=FTF−1 are similar matrices and both have the same eigenvalues but different eigenvectors. Matrix T2=FTF−1 is called the similarity transformation, and matrix F is called the transforming matrix [19]. When the F values are only 1’s, vector \( {\mathbf{b}}_{E_j} \) is not affected; when the F values are only −1’s, vector \( {\mathbf{b}}_{E_j} \) will change its direction, and if the F values are different from 1 and −1, matrix F will change the proportional values of \( {\mathbf{b}}_{E_j} \). In practice, \( {\mathbf{b}}_{E_j} \) is first obtained from Eq. 32.A7 and then multiplied by matrix F to obtain \( {\boldsymbol{\beta}}_{E_j} \), that is, \( {\boldsymbol{\beta}}_{E_j} \) is a linear transformation of \( {\mathbf{b}}_{E_j} \). When vector \( {\boldsymbol{\beta}}_{E_j} \) substitutes , the ESIM index should be written as \( {I}_{E_1}={\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{y} \).
1.8 The Constrained Eigen Selection Index Method (CESIM)
The CESIM is a constrained ESIM and its vector of coefficients is the first eigenvector of Eq. 32.A9
where matrix K was described in Eq. 32.A6, and \( {\mathbf{b}}_{C{E}_1} \) is the first eigenvector of matrix KP−1C. When D′ = U′, \( {\mathbf{b}}_{C{E}_1}={\mathbf{b}}_{R_1} \) (the vector of coefficients of RESIM), and when U′ is a null matrix, \( {\mathbf{b}}_{C{E}_1}={\mathbf{b}}_{E_1} \).
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2022 The Author(s)
About this chapter
Cite this chapter
Crossa, J. et al. (2022). Theory and Practice of Phenotypic and Genomic Selection Indices. In: Reynolds, M.P., Braun, HJ. (eds) Wheat Improvement. Springer, Cham. https://doi.org/10.1007/978-3-030-90673-3_32
Download citation
DOI: https://doi.org/10.1007/978-3-030-90673-3_32
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90672-6
Online ISBN: 978-3-030-90673-3
eBook Packages: Biomedical and Life SciencesBiomedical and Life Sciences (R0)