1 Learning Objectives

  • To understand the advantages of linear selection index (LSI) theory for making selection decisions.

  • To understand and apply the unconstrained and constrained LSI in plant breeding.

  • To understand how to estimate LSI parameters.

2 Introduction

The linear phenotypic selection index (LPSI) theory was first described in the plant breeding context [1] and later in the animal breeding context [2]. When the phenotypic and genotypic covariance matrices of the traits are known, the LPSI is the best phenotype-based linear predictor of the individual net genetic merit. LPSI theory assumes that the genotypic values that define the net genetic merit are composed entirely of the additive effects of genes and that the LPSI and the net genetic merit have a joint bivariate normal distribution [3]. The main objectives of using a linear selection index (LSI) are (i) to predict the unobservable net genetic merit values of the candidates for selection, (ii) to maximize the expected genetic gain per trait or multi-trait selection response, and (iii) to provide the breeder with an objective rule for evaluating and selecting for several traits simultaneously. The advantages of an LPSI are that it modifies the predefined economic weights according to the trait heritability, that it considers indirect selection effects resulting from the genetic correlation between traits, and that it is relatively easy to use. Its disadvantages are that it may be difficult to assign economic weights to some traits, and that it requires large amounts of information to reliably estimate the genetic covariances between traits; with limited data, these estimates may carry a large sampling error.

Because economic weights are difficult to assign to some traits, several modified indices, such as the base index, the modified base index, the non-weighted multiplicative index [4] and the eigen selection index method (ESIM) [5, 6], have been proposed. The main LSI theory was developed assuming that the economic weights are fixed and known, which is, for instance, not the case for the ESIM.

In the LPSI structure, each trait has an economic weight. This could also imply that, for each trait, a directional change is desired. This may not be suitable to achieve a breeding objective in which some traits should remain unchanged. Assuming that the breeder is interested in keeping a certain trait within a range, we could either combine the use of an LPSI with independent culling for the restricted traits, or we could incorporate the fact that not changing the trait is desired. This was the main idea of the restricted LPSI (RLPSI) [3] which solves the usual LPSI equations subject to the restriction that the covariance between the LPSI and some linear function of the genotypes involved equals zero, thus preventing selection on the index from causing any genetic change in the expected genetic advance of the restricted traits.

Later the RLPSI results were extended [7] to a selection index called constrained LPSI (CLPSI) that attempts to make some traits change their mean based on a predetermined level while the rest of them are unrestricted. The CLPSI sets the covariance between the LPSI and some linear functions of the genotypes equal to a constant, or genetic gain, predetermined by the breeder. Some authors [5] developed a constrained ESIM (CESIM) that does not use economic weights. The CLPSI (CESIM) is the most general LPSI and includes the LPSI (ESIM) and the RLPSI as particular cases.

In a similar manner, in the marker-assisted selection (MAS) context, a linear marker selection index (LMSI) was proposed [8] that uses phenotypic and marker score values jointly to predict the net genetic merit. The LMSI combines information on markers linked to quantitative trait loci (QTLs) and the phenotypic values of the traits to predict the net genetic merit of the candidates for selection because it is not possible to identify all QTLs affecting the economically important traits. Several authors [9, 10] have criticized the LMSI approach because it makes inefficient use of the available data. In addition, because the LMSI is based on only a few large QTL effects, it violates the selection index assumptions of multivariate normality and small changes in allele frequencies. We shall not describe the LMSI here; interested readers can see [11] for details.

The linear genomic selection index (LGSI) and constrained linear genomic selection index (CLGSI) were developed in the genomic selection (GS) context in which animals and plants are selected based on the GEBV of the candidates for selection [12, 13]. In the LGSI context, all marker effects of the genotyped individuals in the training population are estimated using marker and phenotypic data. These estimated effects are used in subsequent selection cycles to obtain predictors (GEBVs) of the individual breeding values in the testing population for which there is only marker information about the candidates for selection.

It has been shown [9] that GS increases the accuracy of predicting the breeding values of the candidates for selection, and reduces the intervals between selection cycles and the costs of the breeding programs (see Chaps. 5, 6 and 30). Because GS decreases the generation interval, it leads to a much higher genetic gain per year. Some authors [14] indicated that GS could replace traditional progeny testing when maximizing the genetic gain per year, as long as the accuracy of GEBV is higher than or equal to 0.45.

The expected selection response of the net genetic merit and the expected genetic gain per trait are the main quantities to consider when comparing different LSI. These parameters give breeders an objective basis to compare different selection methods. We describe the practical applications of the phenotypic and genomic LSI using real wheat data. Readers unfamiliar with LSI theory should read the Appendix of this work first and then return to the manuscript. A complete exposition of LSI theory is in [15].

3 Definitions

Breeding Value

the value of an individual measured by the mean phenotype of its progeny obtained by random mating with the population. It is also the sum of the average additive effects of the genes of the individual.

Economic Weight

the increase in profit achieved by improving a particular trait by one unit.

Expected Genetic Gain Per Trait (Multi-trait Selection Response)

a vector of expected genetic gains associated with the traits of the offspring of the selected parents.

GEBV

the sum of additive whole genome allele effects of an individual. Allele effects are estimated by a regression of the phenotypic values on the whole genome DNA markers. It is used to predict breeding values of individuals in animal and plant breeding programs in the genomic selection context.

Genomic Selection

the selection of parents based on the higher GEBV values or on a linear combination of them (e.g., LGSI or CLGSI).

Genotypic Value

the average of the phenotypic values across a (large) population of environments.

Linear Selection Index (LSI)

a linear combination of phenotypic and/or GEBV values, or marker scores. In addition, it can be unconstrained or constrained.

Net Genetic Merit

a linear combination of breeding values of the individual traits of interest, each of them weighted by its respective economic value. It is also called the total economic value of one individual.

Phenotype Value

the sum of genotypic (or breeding) value, environment value, and genotype-by-environment interaction.

Quantitative Traits

plant and animal characteristics (or phenotypic expression) that exhibit continuous variability, which is the result of many gene effects interacting among themselves and with the environment.

Selection Response

the expectation of the net genetic merit of the selected individuals when the mean of the original population is zero. It is also defined as the difference between the mean phenotypic values of the offspring of the selected parents and the mean of the entire parental generation before selection.

4 Key Points

  • Selection indices are fundamental tools for modern plant breeding.

  • The use of selection indices is a key to better estimate the net genetic merits of candidates for selection. Selection indices will ensure that wheat improvement research maximizes its impact.

  • New breeding technologies like genomic-assisted breeding and rapid cycle selection have to be combined with the use of selection indices to maximize response to selection.

5 Phenotypic and Genomic Selection Indices Theoretical Results

5.1 The Net Genetic Merit and the LPSI

The net genetic merit (\( H={\mathbf{w}}^{\prime}\mathbf{g} \)) is related to the vector of trait phenotypic values (y) as

$$ H={\mathbf{b}}^{\prime}\mathbf{y}+e=I+e, $$
(32.1)

where \( {\mathbf{g}}^{\prime }=\left[{G}_1\kern0.5em {G}_2\kern0.5em \dots \kern0.5em {G}_t\right] \) and \( {\mathbf{y}}^{\prime }=\left[{Y}_1\kern0.5em {Y}_2\kern0.5em \dots \kern0.5em {Y}_t\right] \) are 1 × t vectors (t = number of traits) of true unobservable breeding values and observable trait phenotypic values, respectively, \( I={\mathbf{b}}^{\prime}\mathbf{y} \) is the LPSI, and \( {\mathbf{w}}^{\prime }=\left[{w}_1\kern0.5em {w}_2\kern0.5em \dots \kern0.5em {w}_t\right] \) is the vector of economic weights. In Eq. 32.1, we assume that e has normal distribution with expectation E(e) = 0 and variance \( {\sigma}_e^2 \), and that I and e are independent; thus \( {\sigma}_H^2={\sigma}_I^2+{\sigma}_e^2 \) is the variance of H, \( {\sigma}_I^2={\mathbf{b}}^{\prime}\mathbf{Pb} \) is the variance of I, P is the phenotypic covariance matrix, and \( {\sigma}_e^2={\sigma}_H^2-{\sigma}_I^2 \) is the residual variance.

The LPSI (\( I={\mathbf{b}}^{\prime}\mathbf{y} \)) can be written as

$$ I={\mathbf{w}}^{\prime}\mathbf{C}{\mathbf{P}}^{-1}\mathbf{y}, $$
(32.2)

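where \( \mathbf{b}={\mathbf{P}}^{-1}\mathbf{Cw} \), C is the genotypic covariance matrix, \( Cov\left(H,\mathbf{y}\right)=\mathbf{Cw} \) is the covariance between \( H={\mathbf{w}}^{\prime}\mathbf{g} \) and y, and \( {\mathbf{P}}^{-1} \) is the inverse of P.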

5.2 Economic Weights for LPSI

A method for assigning economic weights to the traits [1] is as follows. Suppose that in a wheat-selection program we are required to consider the vector \( {\mathbf{y}}^{\prime }=\left[{Y}_1\kern0.5em {Y}_2\kern0.5em \dots \kern0.5em {Y}_t\right] \) of t traits. Let us evaluate each in terms of Y1. Suppose that Y1 denotes grain yield, Y2 baking quality and Y3 denotes resistance to flag smut. Suppose that an advance of 10 in baking score (Y2) is equal in value to an advance of 1 bushel per acre in yield (Y1) and that a decrease of 20% infection (Y3) is worth 1 bushel of yield (Y1), and so on. Then, taking Y1 as standard and units as indicated, w1 = 1.0, w2 = 0.1, w3 =  − 0.05, etc., will be the economic values of each trait.
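As a small numerical sketch of this rule, each weight can be read as the value of a one-unit increase in the trait, expressed in units of the standard trait Y1. The dictionary keys below are illustrative names and are not part of the original method.

```python
# Units of each trait that are worth one bushel per acre of yield (Y1 is the standard).
# A negative entry means that a decrease in the trait is what carries the value.
units_worth_one_bushel = {
    "yield_bu_per_acre": 1.0,  # Y1: the standard trait
    "baking_score": 10.0,      # Y2: +10 points are worth +1 bushel
    "flag_smut_pct": -20.0,    # Y3: -20% infection is worth +1 bushel
}

# Economic weight = value of a one-unit increase, in bushels of yield.
w = {trait: 1.0 / units for trait, units in units_worth_one_bushel.items()}
print(w)
```

The sign of each weight carries the desired direction of change, which is why the disease trait receives a negative weight.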

One additional method for assigning economic weights to the traits (which we have used in this work) is based on the expected genetic gain per trait (Appendix, Eq. 32.A5). Let us consider the real HarvestPlus Association Mapping (HPAM) panel data set, which consists of 330 wheat lines from CIMMYT, and assume that the objective of the selection is to increase the mean value of Zn content in the grain (Zn), the Fe content in the grain (Fe), and grain yield (GY, t/ha), while decreasing or maintaining the same plant height (PHT, cm). We found that the vector \( \mathbf{w}=\left[0.1\kern0.5em 0.5\kern0.5em 2.8\kern0.5em -0.6\right] \) (see Sect. 32.10) is adequate for obtaining the expected genetic gain per trait described in the Results Section of this work. This method works by trial and error: Eq. 32.A5 is evaluated for successive candidate weight vectors until the desired expected gains are obtained.
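The search over weight vectors can be sketched in Python with NumPy. This is a minimal sketch under two assumptions: that Eq. 32.A5 (not reproduced in this section) has the usual LPSI form E = k C b / sqrt(b′Pb) with b = P⁻¹Cw, and that the estimated matrices are those reported for the HPAM panel in Sect. 32.10. The first candidate weight vector is hypothetical and is included only to show the comparison.

```python
import numpy as np

traits = ["Zn", "Fe", "GY", "PHT"]

# Estimated phenotypic (P) and genotypic (C) covariance matrices from Sect. 32.10.
P = np.array([[4.20, 1.65, -0.37, 0.24],
              [1.65, 4.95, 0.27, 2.36],
              [-0.37, 0.27, 0.58, 1.14],
              [0.24, 2.36, 1.14, 14.40]])
C = np.array([[2.22, 0.95, -0.19, 0.06],
              [0.95, 2.57, 0.15, 2.08],
              [-0.19, 0.15, 0.20, 0.80],
              [0.06, 2.08, 0.80, 6.97]])
k = 1.985  # selection intensity for a retained proportion of about 6%


def expected_gain(w):
    """Expected genetic gain per trait for economic weights w (assumed form of Eq. 32.A5)."""
    b = np.linalg.solve(P, C @ w)              # LPSI coefficients b = P^-1 C w
    return k * (C @ b) / np.sqrt(b @ P @ b)


candidates = [
    np.array([1.0, 1.0, 1.0, -1.0]),   # hypothetical starting guess
    np.array([0.1, 0.5, 2.8, -0.6]),   # weights adopted in the chapter
]
for w in candidates:
    gains = expected_gain(w)
    print(w, dict(zip(traits, np.round(gains, 3))))
```

Each pass prints the per-trait gains for one candidate, so the breeder can adjust the weights and re-evaluate until the signs and magnitudes match the breeding objective.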

5.3 The Maximized Correlation and the Maximized LPSI Selection Response

The maximized correlation between H and I (ρHI) and the maximized LPSI selection response are

$$ {\rho}_{HI}=\frac{\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}}{\sqrt{{\mathbf{w}}^{\prime}\mathbf{Cw}}}, $$
(32.3)
$$ R=k\sqrt{{\mathbf{b}}^{\prime}\mathbf{Pb}}, $$
(32.4)

respectively, where \( \mathbf{b}={\mathbf{P}}^{-1}\mathbf{Cw} \) (Appendix, Eq. 32.A3). Equation 32.4 predicts the mean improvement in H due to indirect selection on \( I={\mathbf{b}}^{\prime}\mathbf{y} \). Here, k is the intensity of selection. The heritability of I is \( {h}_I^2=\frac{{\mathbf{b}}^{\prime}\mathbf{Cb}}{{\mathbf{b}}^{\prime}\mathbf{Pb}} \).
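Equations 32.3 and 32.4 and the index heritability can be evaluated directly once P, C and w are available. The following is a minimal NumPy sketch that uses the estimated matrices and weights reported for the wheat data in Sect. 32.10; the variable names are illustrative.

```python
import numpy as np

P = np.array([[4.20, 1.65, -0.37, 0.24],
              [1.65, 4.95, 0.27, 2.36],
              [-0.37, 0.27, 0.58, 1.14],
              [0.24, 2.36, 1.14, 14.40]])
C = np.array([[2.22, 0.95, -0.19, 0.06],
              [0.95, 2.57, 0.15, 2.08],
              [-0.19, 0.15, 0.20, 0.80],
              [0.06, 2.08, 0.80, 6.97]])
w = np.array([0.1, 0.5, 2.8, -0.6])  # economic weights
k = 1.985                            # selection intensity

b = np.linalg.solve(P, C @ w)        # b = P^-1 C w, without forming the inverse
var_I = b @ P @ b                    # variance of the index, b'Pb
var_H = w @ C @ w                    # variance of the net genetic merit, w'Cw

rho_HI = np.sqrt(var_I / var_H)      # Eq. 32.3
R = k * np.sqrt(var_I)               # Eq. 32.4
h2_I = (b @ C @ b) / var_I           # heritability of the index

print("b      =", np.round(b, 3))
print("rho_HI =", round(float(rho_HI), 3))
print("R      =", round(float(R), 3))
print("h2_I   =", round(float(h2_I), 3))
```

Solving the linear system with `np.linalg.solve` rather than inverting P is a numerical design choice; it gives the same b as the formula in the text.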

6 The Retrospective Index

This index is useful when, instead of the index values, the breeder observes only the vector of selection differentials (s). In this case, the index that would give the same observed s is called the retrospective index and its vector of coefficients can be obtained as \( \mathbf{b}={\mathbf{P}}^{-1}\mathbf{s} \) [16].

7 Constrained LPSI (CLPSI)

The CLPSI vector of coefficients is

$$ \boldsymbol{\beta} =\mathbf{Kb}, $$
(32.5)

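where \( \mathbf{K}=\left[{\mathbf{I}}_t-\mathbf{Q}\right] \), \( \mathbf{Q}={\mathbf{P}}^{-1}\mathbf{M}{\left({\mathbf{M}}^{\prime }{\mathbf{P}}^{-1}\mathbf{M}\right)}^{-1}{\mathbf{M}}^{\prime } \), \( {\mathbf{M}}^{\prime }={\mathbf{D}}^{\prime }{\mathbf{U}}^{\prime}\mathbf{C} \), \( {\mathbf{I}}_t \) is a t × t identity matrix and \( \mathbf{b}={\mathbf{P}}^{-1}\mathbf{Cw} \).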

7.1 The Maximized CLPSI Selection Response and Expected Genetic Gain Per Trait

The maximized CLPSI selection response and expected genetic gain per trait are

$$ {R}_C=k\sqrt{{\boldsymbol{\beta}}^{\prime}\mathbf{P}\boldsymbol{\beta }}, $$
(32.6)
$$ {\mathbf{E}}_C=k\frac{\mathbf{C}\boldsymbol{\beta }}{\sqrt{{\boldsymbol{\beta}}^{\prime}\mathbf{P}\boldsymbol{\beta }}}, $$
(32.7)

respectively, where k is the selection intensity.
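A minimal NumPy sketch of Eqs. 32.5 to 32.7 follows. It assumes that the projection is built as K = I − Q with Q = P⁻¹M(M′P⁻¹M)⁻¹M′ and M′ = D′U′C, and it borrows the matrices P, C, U, D and the weights w reported for the wheat data in Sect. 32.10; the variable names are illustrative.

```python
import numpy as np

P = np.array([[4.20, 1.65, -0.37, 0.24],
              [1.65, 4.95, 0.27, 2.36],
              [-0.37, 0.27, 0.58, 1.14],
              [0.24, 2.36, 1.14, 14.40]])
C = np.array([[2.22, 0.95, -0.19, 0.06],
              [0.95, 2.57, 0.15, 2.08],
              [-0.19, 0.15, 0.20, 0.80],
              [0.06, 2.08, 0.80, 6.97]])
w = np.array([0.1, 0.5, 2.8, -0.6])
k = 1.985
d = np.array([1.5, 1.6, 0.45])       # predetermined gains for Zn, Fe, GY

# U (t x r) picks the constrained traits; D (r x (r-1)) satisfies D'd = 0.
U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
D = np.array([[0.45, 0.0],
              [0.0, 0.45],
              [-1.5, -1.6]])

M = C @ U @ D                                    # so that M' = D'U'C
Pinv_M = np.linalg.solve(P, M)                   # P^-1 M
Q = Pinv_M @ np.linalg.solve(M.T @ Pinv_M, M.T)  # P^-1 M (M'P^-1 M)^-1 M'
K = np.eye(4) - Q

b = np.linalg.solve(P, C @ w)                    # unconstrained LPSI coefficients
beta = K @ b                                     # Eq. 32.5
sd_index = np.sqrt(beta @ P @ beta)

R_C = k * sd_index                               # Eq. 32.6
E_C = k * (C @ beta) / sd_index                  # Eq. 32.7

print("beta =", np.round(beta, 3))
print("R_C  =", round(float(R_C), 3))
print("E_C  =", np.round(E_C, 3))
```

Under this construction M′β = 0, so the expected gains of the constrained traits are proportional to d; their absolute size still depends on k and on the index variance.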

8 The ESIM and CESIM Theory

8.1 The Maximized ESIM Selection Response and the Maximized \( {\rho}_{H{I}_1} \)

The maximized ESIM selection response (RE) and the maximized correlation between \( {I}_{E_1}={\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{y} \) and \( {H}_{E_1}={\mathbf{w}}_{E_1}^{\prime}\mathbf{g} \) (\( {\rho}_{H{I}_1} \)) are

$$ {R}_E=k\sqrt{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{P}{\boldsymbol{\beta}}_{E_1}}, $$
(32.8)
$$ {\rho}_{H{I}_1}=\sqrt{\frac{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{C}{\boldsymbol{\beta}}_{E_1}}{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{P}{\boldsymbol{\beta}}_{E_1}}}, $$
(32.9)

respectively, where \( {\boldsymbol{\beta}}_{E_1}=\mathbf{F}{\mathbf{b}}_{E_1} \) is the first eigenvector of equation \( \left({\mathbf{T}}_2-{\rho}_{H{I}_j}^2\mathbf{I}\right){\boldsymbol{\beta}}_{E_j}=\mathbf{0} \) (Appendix, Eqs. 32.A7 and 32.A8). When F is not used, Eq. 32.8 is equal to \( {R}_E=k\sqrt{{\mathbf{b}}_{E_1}^{\prime}\mathbf{P}{\mathbf{b}}_{E_1}} \) (Appendix, Eq. 32.A7), whereas Eq. 32.9 is the square root of the first eigenvalue of Eq. 32.A7, i.e., \( {\rho}_{H{I}_1}=\sqrt{\rho_{H{I}_1}^2} \). The heritability of \( {I}_{E_1}={\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{y} \) is \( {h}_E^2=\frac{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{C}{\boldsymbol{\beta}}_{E_1}}{{\boldsymbol{\beta}}_{E_1}^{\prime}\mathbf{P}{\boldsymbol{\beta}}_{E_1}} \)
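The eigen-analysis behind ESIM can be sketched with NumPy. This sketch assumes that Eq. 32.A7 (not reproduced in this section) is the eigenproblem (P⁻¹C − ρ²I)b = 0 and that F is not used; both points are assumptions that are consistent with Eq. 32.9 but are not confirmed here. It uses the estimated P and C from Sect. 32.10. Because an eigenvector is defined only up to scale, the sketch reports only the scale-free quantities (the correlation and the heritability), not the selection response.

```python
import numpy as np

P = np.array([[4.20, 1.65, -0.37, 0.24],
              [1.65, 4.95, 0.27, 2.36],
              [-0.37, 0.27, 0.58, 1.14],
              [0.24, 2.36, 1.14, 14.40]])
C = np.array([[2.22, 0.95, -0.19, 0.06],
              [0.95, 2.57, 0.15, 2.08],
              [-0.19, 0.15, 0.20, 0.80],
              [0.06, 2.08, 0.80, 6.97]])

# Solve C b = lambda P b through a symmetric problem: with P = L L',
# A = L^-1 C L^-T has the same eigenvalues as P^-1 C.
L = np.linalg.cholesky(P)
A = np.linalg.solve(L, np.linalg.solve(L, C).T)
A = (A + A.T) / 2.0                       # remove round-off asymmetry

eigvals, eigvecs = np.linalg.eigh(A)      # eigenvalues in ascending order
lam1 = eigvals[-1]                        # first (largest) eigenvalue
b_E1 = np.linalg.solve(L.T, eigvecs[:, -1])   # back-transform: b = L^-T v

rho_HI1 = np.sqrt(lam1)                                # Eq. 32.9
h2_E = (b_E1 @ C @ b_E1) / (b_E1 @ P @ b_E1)           # index heritability

print("b_E1    =", np.round(b_E1, 3))
print("rho_HI1 =", round(float(rho_HI1), 3))
print("h2_E    =", round(float(h2_E), 3))
```

The Cholesky step is a numerical convenience that keeps the problem symmetric so that `np.linalg.eigh` returns real eigenvalues; the resulting vector satisfies the same equation as an eigenvector of P⁻¹C.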

8.2 The Maximized CESIM Selection Response and Expected Genetic Gain Per Trait

The maximized CESIM selection response (RCE) and expected genetic gain per trait (ECE) are

$$ {R}_{CE}=k\sqrt{{\mathbf{b}}_{C{E}_1}^{\prime}\mathbf{P}{\mathbf{b}}_{C{E}_1}}, $$
(32.10)
$$ {\mathbf{E}}_{CE}=k\frac{\mathbf{C}{\mathbf{b}}_{C{E}_1}}{\sqrt{{\mathbf{b}}_{C{E}_1}^{\prime}\mathbf{P}{\mathbf{b}}_{C{E}_1}}}, $$
(32.11)

respectively, where all the terms were defined earlier.

9 The Unconstrained and Constrained Linear Genomic Selection Index Theory

The LGSI and the CLGSI are, respectively, an application of the LPSI and CLPSI to the genomic selection context. Thus, the LGSI and the CLGSI theoretical results are very similar to the LPSI and CLPSI theoretical results.

9.1 The Unconstrained Linear Genomic Selection Index (LGSI)

Let \( {\mathbf{z}}^{\prime }=\left[ GEB{V}_1\kern0.5em GEB{V}_2\kern0.5em \cdots \kern0.5em GEB{V}_t\right] \) be a vector of GEBVs for t traits. The individual LGSI is

$$ {I}_G={w}_1 GEB{V}_1+{w}_2 GEB{V}_2+\dots +{w}_t GEB{V}_t={\mathbf{w}}^{\prime}\mathbf{z}, $$
(32.12)

where w is the vector of economic weights for t traits.
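Equation 32.12 amounts to one matrix-vector product per set of candidates. The sketch below uses the economic weights from Sect. 32.10 together with a small matrix of GEBVs that is entirely hypothetical and serves only to show the ranking step.

```python
import numpy as np

w = np.array([0.1, 0.5, 2.8, -0.6])   # economic weights for Zn, Fe, GY, PHT

# Hypothetical GEBVs: one row per candidate, one column per trait.
Z = np.array([[1.0, 2.0, 0.5, -1.0],
              [0.2, -0.4, 0.1, 0.3],
              [-0.5, 1.1, 0.3, 0.8],
              [0.9, 0.6, -0.2, -0.4],
              [-1.2, -0.8, 0.4, 1.5]])

I_G = Z @ w                            # LGSI value of each candidate (Eq. 32.12)
ranking = np.argsort(-I_G)             # candidates from best to worst
selected = ranking[:2]                 # keep the top two (illustrative proportion)

print("I_G      =", np.round(I_G, 3))
print("selected =", selected)
```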

9.2 The CLGSI Vector of Coefficients

The CLGSI vector of coefficients is

$$ {\boldsymbol{\beta}}_G={\mathbf{K}}_G\mathbf{w}, $$
(32.13)

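where w is the vector of economic weights, \( {\mathbf{K}}_G=\left[{\mathbf{I}}_t-{\mathbf{Q}}_G\right] \), \( {\mathbf{Q}}_G=\mathbf{UD}{\left({\mathbf{D}}^{\prime }{\mathbf{U}}^{\prime}\boldsymbol{\Gamma} \mathbf{UD}\right)}^{-1}{\mathbf{D}}^{\prime }{\mathbf{U}}^{\prime}\boldsymbol{\Gamma} \), \( \boldsymbol{\Gamma} = Var\left(\mathbf{z}\right) \) is the covariance matrix of the GEBVs, and \( {\mathbf{I}}_t \) is an identity matrix of size t × t, whereas D and U are the matrices described in Eq. 32.A6 (Appendix). When d = 0, D = U and matrix \( {\mathbf{K}}_G \) can be written as \( {\mathbf{K}}_G=\left[{\mathbf{I}}_t-{\mathbf{Q}}_G\right] \), where \( {\mathbf{Q}}_G=\mathbf{U}{\left({\mathbf{U}}^{\prime}\boldsymbol{\Gamma} \mathbf{U}\right)}^{-1}{\mathbf{U}}^{\prime}\boldsymbol{\Gamma} \). In this case, the CLGSI is a null restricted LGSI. When D = U and U is a null matrix, \( {\boldsymbol{\beta}}_G=\mathbf{w} \). Thus, the CLGSI includes the null restricted and the unrestricted LGSI as particular cases.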
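The null restricted case can be illustrated numerically. In the sketch below, the choice of plant height (PHT) as the single restricted trait is hypothetical and is made only for illustration; the matrix Γ is the estimate reported for dataset G in Sect. 32.10, and Q_G is taken as U(U′ΓU)⁻¹U′Γ.

```python
import numpy as np

# Estimated covariance matrix of the GEBVs for dataset G (Sect. 32.10).
Gamma = np.array([[0.47, 0.11, -0.16, -0.09],
                  [0.11, 0.82, 0.72, 0.17],
                  [-0.16, 0.72, 1.82, 0.24],
                  [-0.09, 0.17, 0.24, 0.13]])
w = np.array([0.1, 0.5, 2.8, -0.6])

# Hypothetical restriction: hold the fourth trait (PHT) at zero expected gain.
U = np.array([[0.0], [0.0], [0.0], [1.0]])

Q_G = U @ np.linalg.solve(U.T @ Gamma @ U, U.T @ Gamma)   # U (U' Gamma U)^-1 U' Gamma
K_G = np.eye(4) - Q_G
beta_G = K_G @ w                                          # null restricted coefficients

gain_direction = Gamma @ beta_G    # proportional to the expected gain per trait
print("beta_G         =", np.round(beta_G, 3))
print("gain direction =", np.round(gain_direction, 3))
```

The last entry of the gain direction is zero up to round-off, which is the defining property of the null restriction; the unrestricted traits keep non-zero expected gains.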

9.3 Maximized CLGSI Selection Response and Expected Genetic Gain Per Trait

The maximized CLGSI selection response and expected genetic gain per trait are

$$ {R}_{CG}=k\sqrt{{\boldsymbol{\beta}}_G^{\prime}\boldsymbol{\Gamma} {\boldsymbol{\beta}}_G}, $$
(32.14)
$$ {\mathbf{E}}_{CG}=k\frac{\boldsymbol{\Gamma} {\boldsymbol{\beta}}_G}{\sqrt{{\boldsymbol{\beta}}_G^{\prime}\boldsymbol{\Gamma} {\boldsymbol{\beta}}_G}}, $$
(32.15)

respectively. The methods to estimate the index parameters are in [15].
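Equations 32.13 to 32.15 can be evaluated in the same way as their phenotypic counterparts. The following NumPy sketch uses the Γ estimate for dataset G and the matrices U and D, the vector d, the weights w and the selection intensity reported in Sect. 32.10; it takes Q_G = UD(D′U′ΓUD)⁻¹D′U′Γ, and the variable names are illustrative.

```python
import numpy as np

Gamma = np.array([[0.47, 0.11, -0.16, -0.09],
                  [0.11, 0.82, 0.72, 0.17],
                  [-0.16, 0.72, 1.82, 0.24],
                  [-0.09, 0.17, 0.24, 0.13]])
w = np.array([0.1, 0.5, 2.8, -0.6])
k = 1.65                               # selection intensity for the genomic indices
d = np.array([1.5, 1.6, 0.45])         # predetermined gains for Zn, Fe, GY

U = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
D = np.array([[0.45, 0.0],
              [0.0, 0.45],
              [-1.5, -1.6]])

UD = U @ D
Q_G = UD @ np.linalg.solve(UD.T @ Gamma @ UD, UD.T @ Gamma)
K_G = np.eye(4) - Q_G

beta_G = K_G @ w                                   # Eq. 32.13
sd_index = np.sqrt(beta_G @ Gamma @ beta_G)
R_CG = k * sd_index                                # Eq. 32.14
E_CG = k * (Gamma @ beta_G) / sd_index             # Eq. 32.15

print("beta_G =", np.round(beta_G, 3))
print("R_CG   =", round(float(R_CG), 3))
print("E_CG   =", np.round(E_CG, 3))
```

Replacing `Gamma` with the G-COP or COP estimate repeats the calculation for the other two cases.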

9.4 The Genomic Estimated Breeding Values (GEBV)

To obtain the GEBV, we used a multi-trait genomic best linear unbiased predictor (GBLUP) described in [12, 13].

10 Real Wheat Data

We used the HarvestPlus Association Mapping (HPAM) panel, which consists of 330 wheat lines from CIMMYT and four traits: Zn content in the grain (Zn), Fe content in the grain (Fe), grain yield (GY, t/ha), and plant height (PHT, cm). The objective of the selection was to increase the mean values of Zn, Fe, and GY while PHT decreased or stayed the same.

Using CLPSI, CESIM, and CLGSI, we constrained traits Zn, Fe and GY with the vector of constraints \( {\mathbf{d}}^{\prime }=\left[1.5\kern0.5em 1.6\kern0.5em 0.45\right] \) and matrices \( {\mathbf{U}}^{\prime }=\left[\begin{array}{cccc}1& 0& 0& 0\\ {}0& 1& 0& 0\\ {}0& 0& 1& 0\end{array}\right] \) and \( {\mathbf{D}}^{\prime }=\left[\begin{array}{ccc}0.45& 0& -1.5\\ {}0& 0.45& -1.6\end{array}\right] \). Each element of vector d is the genotypic standard deviation (the square root of the estimated genotypic variance) of Zn, Fe, and GY, respectively. The vector of economic weights for LPSI, CLPSI, LGSI, and CLGSI was \( \mathbf{w}=\left[0.1\kern0.5em 0.5\kern0.5em 2.8\kern0.5em -0.6\right] \), whereas for ESIM and CESIM, matrix F was \( \mathbf{F}=\left[\begin{array}{cccc}-0.5& 0& 0& 0\\ {}0& 1.0& 0& 0\\ {}0& 0& 2.0& 0\\ {}0& 0& 0& -0.5\end{array}\right] \) and \( \mathbf{F}=\left[\begin{array}{cccc}1.0& 0& 0& 0\\ {}0& -1.0& 0& 0\\ {}0& 0& 2.0& 0\\ {}0& 0& 0& -0.8\end{array}\right] \), respectively. The total proportion (p) retained was 6% (k = 1.985) for the phenotypic indices and 12.45% (k = 1.65) for the genomic indices. The estimated phenotypic (\( \hat{\mathbf{P}} \)) and genotypic (\( \hat{\mathbf{C}} \)) covariance matrices among the four traits were

$$ \hat{\mathbf{P}}=\left[\begin{array}{cccc}4.2& 1.65& -0.37& 0.24\\ {}1.65& 4.95& 0.27& 2.36\\ {}-0.37& 0.27& 0.58& 1.14\\ {}0.24& 2.36& 1.14& 14.40\end{array}\right]\kern0.5em \mathrm{and}\kern0.5em \hat{\mathbf{C}}=\left[\begin{array}{cccc}2.22& 0.95& -0.19& 0.06\\ {}0.95& 2.57& 0.15& 2.08\\ {}-0.19& 0.15& 0.20& 0.80\\ {}0.06& 2.08& 0.80& 6.97\end{array}\right] $$
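Two properties of this setup can be checked numerically: the constraint vector d matches the square roots of the first three diagonal elements of the estimated genotypic covariance matrix, and D′d = 0, which is what ties the constrained gains to the proportions in d. The NumPy sketch below performs both checks; it is an illustration, not part of the original analysis.

```python
import numpy as np

C = np.array([[2.22, 0.95, -0.19, 0.06],
              [0.95, 2.57, 0.15, 2.08],
              [-0.19, 0.15, 0.20, 0.80],
              [0.06, 2.08, 0.80, 6.97]])
d = np.array([1.5, 1.6, 0.45])
D = np.array([[0.45, 0.0],
              [0.0, 0.45],
              [-1.5, -1.6]])          # transpose of the 2 x 3 matrix D' in the text

genotypic_sd = np.sqrt(np.diag(C)[:3])   # Zn, Fe, GY
print("genotypic SD =", np.round(genotypic_sd, 2))
print("D'd          =", D.T @ d)
```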

With the data described above, we obtained the estimated matrix Γ (\( \hat{\boldsymbol{\Gamma}} \)) for three cases denoted as G, G-COP and COP, where \( \hat{\varGamma}=\left[\begin{array}{cccc}0.47& 0.11& -0.16& -0.09\\ {}0.11& 0.82& 0.72& 0.17\\ {}-0.16& 0.72& 1.82& 0.24\\ {}-0.09& 0.17& 0.24& 0.13\end{array}\right] \), \( \hat{\varGamma}=\left[\begin{array}{cccc}0.87& 0.35& -0.03& -0.10\\ {}0.35& 1.01& 0.89& 0.17\\ {}-0.03& 0.89& 2.51& 0.35\\ {}-0.10& 0.17& 0.35& 0.17\end{array}\right] \), and \( \hat{\varGamma}=\left[\begin{array}{cccc}0.77& 0.38& 0.03& -0.08\\ {}0.38& 0.91& 0.78& 0.12\\ {}0.03& 0.78& 2.26& 0.31\\ {}-0.08& 0.12& 0.31& 0.14\end{array}\right] \), respectively.
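The selection intensities quoted for the two retained proportions can be reproduced from p alone. The sketch below assumes truncation selection on a normally distributed index, for which k is the standard normal density at the truncation point divided by p; it uses only the Python standard library, and the function name is illustrative.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Standardized selection differential when the top proportion p is retained."""
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)   # truncation point on the standard normal scale
    return std_normal.pdf(z) / p


for p in (0.06, 0.1245):
    print(f"p = {p:.4f}  k = {selection_intensity(p):.3f}")
```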

11 Results

11.1 Phenotypic Results

Figure 32.1 presents the averages for four traits of the 20 selected individuals (with LPSI and ESIM) with a proportion of 6% (k = 1.985). In this case, those averages were very similar. We found similar results when we made selections using CLPSI and CESIM for this real dataset.

Fig. 32.1 Averages for four traits of 20 selected individuals with LPSI (linear phenotypic selection index) and ESIM (eigen selection index method)

Table 32.1 presents the estimated LPSI, ESIM, CLPSI, and CESIM selection response, coefficient of correlation and heritability. The estimated ESIM and CESIM selection response, correlation and heritability were higher than the estimated LPSI and CLPSI selection response, correlation and heritability. Thus, ESIM and CESIM efficiency for predicting the net genetic merit was higher than LPSI and CLPSI efficiency.

Table 32.1 Estimated unconstrained and constrained linear phenotypic selection indices (LPSI and CLPSI, respectively) and eigen selection index methods (ESIM and CESIM, respectively) selection response, coefficient of correlation and heritability

Table 32.2 presents the estimated LPSI, ESIM, CLPSI, and CESIM expected genetic gain for four traits selected with a proportion of 6% (k = 1.985). The estimated CLPSI and CESIM expected genetic gains per trait were constrained by vector \( {\mathbf{d}}^{\prime }=\left[1.5\kern0.5em 1.6\kern0.5em 0.45\right] \) values. Thus, the estimated expected genetic gains of traits Zn, Fe, and GY should be similar to the d values. The estimated CLPSI and CESIM expected genetic gain values were lower than the d values. This means that to reach d values, breeders will need to select once again using CLPSI and CESIM. However, the estimated CESIM expected genetic gain values were higher than the estimated CLPSI expected genetic gain values.

Table 32.2 Estimated unconstrained and constrained linear phenotypic selection indices (LPSI and CLPSI, respectively) and eigen selection index methods (ESIM and CESIM, respectively) expected genetic gain for four traits selected with a proportion of 6% (k = 1.985)

11.2 Genomic Selection Index Results

For datasets G, G-COP and COP, in Table 32.3 we present the estimated LGSI and CLGSI selection response and expected genetic gain for four traits with a selected proportion of 12.45% (k = 1.65). In this case, the estimated CLGSI expected genetic gains per trait were constrained by vector \( {\mathbf{d}}^{\prime }=\left[1.5\kern0.5em 1.6\kern0.5em 0.45\right] \) values. Thus, the estimated expected genetic gain of traits Zn, Fe, and GY should be similar to the d values. The estimated CLGSI expected genetic gain values were lower than the d values. This means that to reach d values, breeders will need to select once again using CLGSI. Note, however, that the estimated LGSI and CLGSI selection response and expected genetic gain values were higher than the estimated LPSI and CLPSI expected genetic gain values. This means that for the predicted data, LGSI and CLGSI efficiency was higher than LPSI and CLPSI efficiency. In addition, the estimated LGSI and CLGSI selection response was not affected by the restriction imposed on the LGSI and CLGSI expected genetic gain, as we would expect.

Table 32.3 Unconstrained and constrained estimated linear genomic selection indices (LGSI and CLGSI) selection response and expected genetic gain for four traits with a proportion of 12.45% (k = 1.65) for three datasets: G, G-COP and COP

12 How to Incorporate a Selection Index in Practice?

Incorporating a selection index requires a step-by-step approach to ensure its successful implementation. Most of the time, breeders use a customized procedure to select individuals based on independent culling that comprises multiple steps.

The first step consists of understanding the selection procedure executed by the program. The steps in the selection procedure can be mapped back to a set of reduction and selection steps applied to a selection unit (i.e., lines, families, etc.); each step consists of trait conditions (value and directionality). Each selection step can consist of meeting more than one trait condition (Table 32.4).

Table 32.4 Example of the type of independent culling selection steps carried out by a breeding program in a preliminary yield trial

The second step identifies which parts of the selection process can be replaced by an index. For example, by looking at Table 32.4, you can decide to pick a single step and replace independent culling or replace multiple steps with a single selection index. Here we will replace steps 2–11 with a selection index.

13 Retrospective Index

A third step consists of building the index. Indices that depend on economic weights are difficult to implement. Instead, a retrospective index is the best way to start implementing an index. For example, assume that the matrix of estimates for the traits indicated above is available together with an indicator column in which the material was selected by the breeder using the steps indicated in Table 32.4. The formula \( \hat{\mathbf{b}}={\hat{\mathbf{P}}}^{-1}\mathbf{s} \) is then used to infer the weights.

Suppose that \( \hat{\mathbf{P}} \) and \( {\hat{\mathbf{P}}}^{-1} \) are as follows:

$$ \hat{\mathbf{P}}=\left[\begin{array}{cccccc}1& -0.151& -0.108& -0.279& 0.321& -0.026\\ {}-0.151& 1& -0.228& -0.076& 0.054& -0.045\\ {}-0.108& -0.228& 1& 0.083& -0.080& -0.035\\ {}-0.279& -0.076& 0.083& 1& -0.099& -0.083\\ {}0.321& 0.054& -0.080& -0.099& 1& 0.064\\ {}-0.026& -0.045& -0.035& -0.083& 0.064& 1\end{array}\right] $$
$$ {\hat{\mathbf{P}}}^{-1}=\left[\begin{array}{cccccc}1.27& 0.272& 0.144& 0.321& -0.382& 0.048\\ {}0.272& 1.12& 0.267& 0.122& -0.118& 0.064\\ {}0.144& 0.267& 1.08& -0.033& 0.019& 0.055\\ {}0.321& 0.122& -0.033& 1.10& 0.003& -0.080\\ {}-0.382& -0.118& 0.019& 0.003& 1.13& -0.087\\ {}0.048& 0.064& 0.055& -0.080& -0.087& 1.01\end{array}\right]. $$

Then, according to Table 32.5 values,

$$ \hat{\mathbf{b}}={\hat{\mathbf{P}}}^{-1}\mathbf{s}=\left[\begin{array}{cccccc}1.27& 0.272& 0.144& 0.321& -0.382& 0.048\\ {}0.272& 1.12& 0.267& 0.122& -0.118& 0.064\\ {}0.144& 0.267& 1.08& -0.033& 0.019& 0.055\\ {}0.321& 0.122& -0.033& 1.10& 0.003& -0.080\\ {}-0.382& -0.118& 0.019& 0.003& 1.13& -0.087\\ {}0.048& 0.064& 0.055& -0.080& -0.087& 1.01\end{array}\right]\left[\begin{array}{c}0.65\\ {}0.40\\ {}-0.19\\ {}-0.06\\ {}0.04\\ {}-0.08\end{array}\right]=\left[\begin{array}{c}0.873\\ {}0.561\\ {}-0.004\\ {}0.208\\ {}-0.251\\ {}-0.032\end{array}\right] $$
Table 32.5 Example of a trait matrix used for building a retrospective index. An indicator column is used to derive the selection differentials and phenotypic covariance matrix required for the calculation of the retrospective index

The inferred weights can look confusing when their sign differs from the desired direction of change. For example, the fourth weight, for Xa2 resistance (Table 32.5), is positive, when we would expect it to be negative. This can happen because the index accounts for the covariances among traits: the correlated responses from the other traits can still move the trait in the desired direction, whatever the sign of its own weight.

To show that these weights are better than the current approach, we can calculate what would be the selected individuals and the selection differentials using the index and compare them to the selection differentials obtained with the current approach. As can be seen in Table 32.6, if these weights are considered the real weights (even economic weights), the index can select better individuals than the breeder’s eyeball method. This example shows how the selection index theory can provide higher selection differentials than the breeder.
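The comparison described above can be sketched end to end with simulated data. Everything in the block below is hypothetical: the trait values are random numbers standing in for the trait matrix, and the indicator column is generated by a two-threshold culling rule that stands in for the breeder's decisions. The sketch derives the selection differentials and the phenotypic covariance matrix from the indicator column, infers the retrospective weights, and then selects the same number of lines with the index.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 200 lines by 6 standardized trait estimates.
n_lines, n_traits = 200, 6
Y = rng.normal(size=(n_lines, n_traits))
Y = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

# Hypothetical indicator column: lines kept by independent culling on two traits.
selected_by_breeder = (Y[:, 0] > 0.5) & (Y[:, 1] > 0.0)
n_selected = int(selected_by_breeder.sum())

# Selection differentials and phenotypic covariance matrix from the data.
s_breeder = Y[selected_by_breeder].mean(axis=0) - Y.mean(axis=0)
P_hat = np.cov(Y, rowvar=False)

# Retrospective weights and the index value of every line.
b_hat = np.linalg.solve(P_hat, s_breeder)
index = Y @ b_hat

# Select the same number of lines, this time ranking on the index.
top = np.argsort(-index)[:n_selected]
s_index = Y[top].mean(axis=0) - Y.mean(axis=0)

print("lines selected        :", n_selected)
print("differentials, culling:", np.round(s_breeder, 2))
print("differentials, index  :", np.round(s_index, 2))
print("weighted gain, culling:", round(float(b_hat @ s_breeder), 3))
print("weighted gain, index  :", round(float(b_hat @ s_index), 3))
```

For an equal number of selected lines, ranking on the index cannot give a lower weighted selection differential than the culling rule, because the top-ranked lines maximize the mean index value; the per-trait differentials can still differ in either direction.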

Table 32.6 Selection differentials for six traits involved in selection steps 2–11 using two selection methods, the independent culling normally applied by breeders versus the selection index based on a retrospective analysis

14 Discussion

14.1 The Unconstrained LSI Theory

The LSI theory includes, as particular cases, the unconstrained LPSI and LGSI, and any other unconstrained LSI associated with this theory that is based on the quantitative genetics and the multivariate normal distribution theory. The LSI theory is based on multivariate normal distribution theory because this distribution allows the LSI to be completely described using only means, variances and covariances. When the phenotypic traits and GEBV values have multivariate normal distribution, linear combinations of phenotypic traits and GEBV are normal. Even if the phenotypic traits and GEBV values do not have multivariate normal distribution, this distribution serves as a useful approximation, especially in inferences involving sample mean vectors, which, by the central limit theorem, have multivariate normal distribution [17]. By this reasoning, a fundamental assumption in LSI theory is that the LSI and the net genetic merit have joint bivariate normal distribution. Under the latter assumption, the regression of the net genetic merit on any linear function of the phenotypic or GEBV values is linear [3].

The selection response and the expected genetic gain per trait are the main parameters of the LSI and the criteria used to compare the efficiency of any linear index in predicting the net genetic merit. These parameters give breeders a clearer basis on which to objectively validate the effectiveness of the adopted selection method.

The LPSI was the first LSI used to predict the net genetic merit and has good statistical properties when the phenotypic and genotypic covariance matrices are known. The LGSI is the most recent LSI and has the advantage of reducing the intervals between selection cycles by more than two thirds.

14.2 The Constrained LSI

The constrained LPSI (CLPSI) and the constrained LGSI (CLGSI) impose constraints on the expected genetic gain per trait. These indices include the unconstrained indices as particular cases. There are two types of CLPSI and CLGSI: the null restricted index and the predetermined proportional gain index. The null restricted index allows imposing restrictions equal to zero on the expected genetic gain of some traits, while the expected genetic gain of other traits increases (or decreases) without imposing any restrictions. In a similar manner, the constrained index attempts to make some traits change their expected genetic gain values based on a predetermined level, while the rest of the traits remain without restrictions. The objective of both types of selection indices is to predict the net genetic merit and select parents for the next generation. The CLPSI and CLGSI are projections of the vector coefficients of the LPSI and LGSI, respectively, to a different space, and the constraining effects are observed on the CLPSI and CLGSI expected genetic gains per trait where each restricted trait has an expected genetic gain according to the constrained values imposed by the breeder.

14.3 Statistical Properties of the LSI

Both the unconstrained and constrained indices have the same statistical properties when the phenotypic and genotypic covariance matrices and the economic weights are known. For example, they have maximum correlation with the net genetic merit and the variance of the prediction error is minimal; however, when the phenotypic and genotypic covariance matrices and the economic weights are unknown, the statistical sampling properties of the indices described in this work are difficult to determine. Assuming that the estimated LSI have normal distribution, some authors [18] found the statistical sampling properties of the LSI selection responses in the phenotypic and genomic selection context while others [15] reported the statistical sampling properties of ESIM and CESIM.

15 Key Concepts

  • Using a selection index in plant breeding maximizes the expected genetic gain per trait or multi-trait selection response and provides an objective rule for evaluating and selecting for several traits simultaneously.

  • The main advantage of a selection index is that it considers indirect selection effects resulting from the genetic correlation between traits. Its main disadvantage is that it may be difficult to assign economic weights to some traits. Several modified indices exist to overcome this problem.

  • Recently, genomic selection indices based on genomic estimated breeding values (GEBVs) have been developed and used.

16 Conclusions

Our main goal was to offer researchers a starting point for understanding the core tenets of LSI theory in plant selection. We provided the unconstrained and constrained LSI theory associated with phenotypic and genomic selection. We validated the LSI phenotypic and genomic theoretical results in the wheat breeding context using a real wheat dataset with four traits.