Deep scoping: a breeding strategy to preserve, reintroduce and exploit genetic variation

Vanavermaete, David; Fostier, Jan; Maenhout, Steven; De Baets, Bernard

doi:10.1007/s00122-021-03932-w

Original Article
Open access
Published: 13 August 2021

Volume 134, pages 3845–3861, (2021)

Theoretical and Applied Genetics

David Vanavermaete ORCID: orcid.org/0000-0002-2087-9062¹,
Jan Fostier²,
Steven Maenhout³ &
…
Bernard De Baets¹

Abstract

Key message

The deep scoping method incorporates the use of a gene bank together with different population layers to reintroduce genetic variation into the breeding population, thus maximizing the long-term genetic gain without reducing the short-term genetic gain or increasing the total financial cost.

Abstract

Genomic prediction is often combined with truncation selection to identify superior parental individuals that can pass on favorable quantitative trait locus (QTL) alleles to their offspring. However, truncation selection reduces genetic variation within the breeding population, causing a premature convergence to a sub-optimal genetic value. In order to also increase genetic gain in the long term, different methods have been proposed that better preserve genetic variation. However, when the genetic variation of the breeding population has already been reduced as a result of prior intensive selection, even those methods will not be able to avert such premature convergence. Pre-breeding provides a solution for this problem by reintroducing genetic variation into the breeding population. Unfortunately, as pre-breeding often relies on a separate breeding population to increase the genetic value of wild specimens before introducing them in the elite population, it comes with an increased financial cost. In this paper, on the basis of a simulation study, we propose a new method that reintroduces genetic variation in the breeding population on a continuous basis without the need for a separate pre-breeding program or a larger population size. This way, we are able to introduce favorable QTL alleles into an elite population and maximize the genetic gain in the short as well as in the long term without increasing the financial cost.

Introduction

Truncation selection is often used in genomic selection to rapidly increase the short-term genetic gain of a breeding population. By selecting individuals with the highest genomic estimated breeding values (GEBVs), breeders hope to maximally pass favorable properties to their offspring. The underlying idea is easy to understand and matches the gut feeling of most breeders, making it one of the most popular strategies in plant breeding. Unfortunately, truncation selection is also associated with a loss in genetic variation (Jannink 2010). Besides entailing the loss of favorable QTL alleles from the breeding population, truncation selection causes a premature convergence of the genetic value, reducing the long-term genetic gain (Vanavermaete et al. 2020). Therefore, truncation selection can only promise a temporary, short-term increase in the genetic gain. To ensure a continuous increase in the genetic value, new selection methods are needed that maximize both the short-term and the long-term genetic gains.

Different variants of truncation selection that try to remedy the loss in genetic variation have already been proposed in the literature. One way to achieve this is by weighting the marker effects of favorable or low-frequency marker alleles and thus reducing the risk of eliminating important QTL alleles during breeding (Jannink 2010; Liu et al. 2015). The genetic variation can also be preserved by avoiding the selection of closely related individuals as in the population merit method (Lindgren and Mullin 1997) or by penalizing the GEBV when two parents with high coancestry are selected as in the maximum variance total method (Cervantes et al. 2016). The latter was further improved upon by also minimizing the rate of inbreeding, thus controlling the allele heterozygosity as well as the allele diversity (Brisbane and Gibson 1995; Akdemir and Sánchez 2016). In another strategy, the GEBV was replaced by the usefulness criterion (UC), which takes into account not only the mean predicted genetic value of the offspring, but also the selection intensity, prediction accuracy and genetic variation of the offspring (Lehermeier et al. 2017). The scoping method combines pre-selection with a score function to avoid the selection of individuals whose GEBV is too low while preserving the genetic variation of the breeding population, thus maximizing the long-term genetic gain (Vanavermaete et al. 2020). Whereas the GEBV is based on the total sum of the additive marker effects, the optimal haploid value (OHV) scores individuals based on their haplotypes, and can therefore better preserve favorable QTL alleles in the breeding population, increasing the long-term genetic gain (Daetwyler et al. 2015). Müller et al. (2018) propose the expected maximum haploid breeding value (EMBV) to evaluate the potential of a candidate by measuring a limited number of gametes of each parent. 
The optimal cross selection (OCS) scores a crossing block based on the mean predicted genetic value of the offspring, but also constrains the loss in genetic diversity of the offspring (Akdemir and Sánchez 2016; Gorjanc et al. 2018).

Unfortunately, the aforementioned methods are generally tested on breeding populations that demonstrate a broad genetic variation. In reality, however, the genetic variation present in most breeding populations has been eroded to some extent by years of consecutive truncation selection. In such cases, the options to further increase the genetic value in the breeding population are strongly reduced. To demonstrate this, we simulate three breeding populations that suffer, to a varying degree, from reduced genetic variation by applying respectively 0, 5, and 20 breeding cycles of truncation selection. Next, using these three breeding populations as a starting point, the performances of the population merit method (Lindgren and Mullin 1997) and the scoping method (Vanavermaete et al. 2020) are compared. When these methods are initiated at a later point, the maximum reachable genetic value of the breeding population is lower, indicating that during truncation selection, favorable QTL alleles have been eliminated from the breeding population (see Fig. 1). Both methods will only be able to preserve a fraction of the genetic variation that is still present in the breeding population. Therefore, the added value of these methods is dramatically reduced when the genetic variation in the breeding population is limited.

When the genetic variation has already been substantially reduced, a gene bank could be used to (re)introduce alleles and haplotypes into the breeding population, resulting in an increase in the maximum reachable genetic value. A gene bank is an (inter)national collection of different plants ranging from wild specimens to different crop varieties at different stages of selection. To optimally reintroduce genetic variation into a breeding population and thus increase the genetic gain in the long term, the gene bank must show a broad genetic variation (Simmonds 1993; Salhuana and Pollak 2006). The introduction of gene bank accessions into the breeding population generally implies a reduction in short-term genetic gain. Depending on the available germplasm collection of the gene bank, different methods have been proposed to introduce such individuals into an elite breeding population. When a phenotypic trait is controlled by only a few genes with large effects, the favorable genes can be introgressed in the breeding population using marker-assisted backcrossing (Han et al. 2017; Smith and Beavis 1996). However, this proved unsuccessful when the phenotypic trait is controlled by many genes of small effect, which is the case for quantitative traits such as grain yield (Bouchez et al. 2002). In this setting, genomic selection (GS) can be used to rapidly introduce (new) QTL alleles from a gene bank into the breeding population (Bernardo 2009). Different mating designs use multi-parental crosses to combine elite individuals with donor individuals selected from a gene bank (Allier et al. 2019; Schopp et al. 2017). Gene bank accessions are first intercrossed to increase the frequency of favorable alleles before they are introduced into the breeding population. Cramer and Kannenberg (1992) proposed a five-year open-ended hierarchical breeding program (HOPE) to introduce new wild specimens into the breeding population using three consecutive gene pools. 
The HOPE method allows favorable QTL alleles to be passed on effectively from the gene bank to the elite breeding population, but the need for additional pre-breeding populations drives up the total cost of the breeding program.

Allier et al. (2020a) recently proposed a new selection method, combining the haploid estimated breeding value (HEBV) and the UC to select and cross elite individuals with donor individuals. However, the calculation of the UC requires the construction of a covariance matrix, which considerably increases the computational requirements of the simulation while the bridging population can only reintroduce a fraction of the genetic variation into the breeding population. The parental selection can also be guided using genotyping-by-sequencing or related techniques, in which the relatedness of germplasm collections and elite individuals in the breeding population can be quantified and used to preserve the genetic variation in the breeding population (Glaubitz et al. 2014; Gouesnard et al. 2017). The genetic variation of a breeding population can also be increased by using exotic material, but a higher investment is needed to successfully incorporate those alleles in an elite breeding population (Salhuana and Pollak 2006; Wu et al. 2016).

We propose a new method that incorporates the use of a gene bank to reintroduce genetic variation into the breeding population, maximizing the long-term genetic gain without reducing the short-term genetic gain. By using a fraction of the breeding population for pre-breeding, the sizes of both the breeding population and the parental population remain unchanged, avoiding additional costs. This method, coined deep scoping, divides the breeding population into an elite population and different layers of pre-breeding individuals. The elite population contains the accessions that have the highest GEBVs and delivers high short-term genetic gain. In the first layer (Layer 0), individuals from a gene bank are crossed with individuals of the elite population, reintroducing genetic variation in the breeding population. Next, different layers are added, allowing for a gradual flow of favorable QTL alleles from the first layer to the elite population. Over each layer, the genetic variation is exploited, increasing the genetic gain and maximizing the transition of pre-bred individuals into the elite population.

Materials and methods

We adopt the base population and breeding scheme of Neyhart et al. (2017). The base population consists of two datasets of North American barley (Hordeum vulgare) from the University of Minnesota (UMN) and North Dakota State University (NDSU), containing, respectively, 384 and 380 six-row spring inbred lines with 1590 biallelic SNP loci. The same base population was also used by Vanavermaete et al. (2020), ensuring that the performance of the deep scoping method can be compared with that of the scoping method. The parental selection methods are compared using four base populations that differ in their available genetic variation for a single trait of interest. These four base populations (referred to as Population BC05, Population BC10, Population BC15 and Population BC20) are created by reducing the genetic variation using truncation selection in a recurrent breeding scheme for, respectively, 5, 10, 15, and 20 breeding cycles.

Breeding scheme

The recurrent breeding scheme shown in Fig. 2 has been described by Vanavermaete et al. (2020) as well as by Neyhart et al. (2017). In this paper, minor modifications are made to this scheme. Over the first breeding cycles, the recurrent breeding scheme is used to decrease the genetic variation of the breeding population. Starting at breeding cycle 0, based on phenotypic data, the top-50 individuals of the NDSU dataset are crossed with the top-50 individuals of the UMN dataset. In the subsequent breeding cycles, the parental selection is completely based on GEBVs, reducing the financial cost of phenotyping. The GEBVs are predicted based on a linear mixed effects model (see Sect. Prediction model). In the recurrent breeding scheme, each parental couple is crossed 20 times, creating in total 1000 F1-hybrids. The F3-individuals are obtained after two cycles of single-seed descent. The recurrent breeding scheme is used to reduce the genetic variation of the breeding population by using truncation selection over 5, 10, 15 or 20 breeding cycles, selecting 100 parents with the highest GEBVs and crossing them at random. In the subsequent breeding cycles, the parents can be selected according to the deep scoping method or the HUC method with bridging. Additionally, both methods will also be able to select parents from a gene bank. Each simulation consists of 50 breeding cycles, and all results are averaged over 100 simulation runs.

Truncation selection

Truncation selection selects the 100 individuals with the highest GEBVs and couples them randomly. Breeders have been using truncation selection for centuries in the hope of passing favorable properties on to the next generation. Unfortunately, this method also causes a strong reduction in the genetic variation. Therefore, truncation selection is an ideal and realistic method to simulate the loss of genetic variation in a breeding population as a result of selection.

Haploid estimated breeding values

In plant breeding, GEBVs are commonly used to select the parental population. Daetwyler et al. (2015) proposed the OHV as an alternative selection metric in which the highest genetic value of each haplotype segment is used instead of the marker effects. In theory, a haplotype segment contains several alleles and markers that are always inherited together, but in the OHV approach, each chromosome is divided into different haplotype segments containing an equal number of markers. A diploid individual contains $n_H$ different haplotype segments and will have two haplotype values per segment representing the sum of the additive marker effects that are present in that segment on each homologous chromosome. The OHV is obtained by taking the sum of the highest haploid values per segment. In contrast to the GEBV, the OHV is better able to capture the potential benefits of heterozygous states in the breeding population. The HEBV proposed by Allier et al. (2020a) is similar to the OHV but allows for an overlap between the different haplotype segments. In this simulation study, the genotype is split into different haplotype segments containing 20 markers (window size) with an overlap of five markers (step size) (see Fig. 3). The same simulation parameters were adopted as reported by Allier et al. (2020a) and remained unchanged during the whole simulation study to allow for a fair comparison between the different methods. A matrix ${\mathbf {M}}$ of size $k\times n_H$, with $k$ the number of markers, is constructed to keep track of the selected markers per haplotype segment, such that $M_{ij}=1$ if marker $i$ is part of the $j$-th haplotype segment and $M_{ij}=0$ otherwise. Mathematically, the HEBV matrix ${{\mathbf {H}}}$ can be written as:

$$\begin{aligned} {{\mathbf {H}}}=({\mathbf {X}} \circ {\mathbf {1}}_{2n}\varvec{\beta }^{T}){\mathbf {M}}\, \end{aligned}$$

(1)

with ${\mathbf {X}}$ a matrix of size $2n\times k$ containing the haplotype of $n$ different individuals and $k$ different markers coded as 0 and 1 (such that the haplotype of individual $i$ is represented at rows $2i-1$ and $2i$), $\circ$ the Hadamard product operator, ${\mathbf {1}}_{2n}$ a vector of size $2n$ containing 1s and $\varvec{\beta }$ a vector of size $k$ with estimated marker effects. Similar to the OHV, the HEBV between two individuals $i$ and $j$ is calculated as:

$$\begin{aligned} \mathrm {HEBV}(i,j)=\lambda \sum _{h=1}^{n_H} \max \left( {{\mathbf {H}}}_{2i-1,h}, {{\mathbf {H}}}_{2i,h}, {{\mathbf {H}}}_{2j-1,h}, {{\mathbf {H}}}_{2j, h} \right) \, \end{aligned}$$

(2)

with $\lambda$ a scaling parameter defined as the ratio between the step size and the window size. If the step size and window size are equal, then $\lambda = 1$ and the HEBV reduces to the OHV.
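
Equations (1) and (2) can be illustrated with a minimal Python sketch. It is not the authors' R implementation: the function names are hypothetical, individuals are indexed from 0 (so rows $2i$ and $2i+1$ hold individual $i$), chromosome boundaries are ignored, and the sliding window is assumed to start at the first marker and to drop any trailing markers that do not fill a complete window.

```python
def segment_starts(n_markers, window, step):
    """Start indices of the haplotype segments (sliding window over the markers)."""
    return list(range(0, n_markers - window + 1, step))


def haploid_segment_values(haplotypes, effects, window, step):
    """Matrix H of Eq. (1): one row per haplotype, one column per segment."""
    starts = segment_starts(len(effects), window, step)
    return [
        [sum(hap[m] * effects[m] for m in range(s, s + window)) for s in starts]
        for hap in haplotypes
    ]


def hebv(H, i, j, window, step):
    """Eq. (2) for individuals i and j (0-based indices)."""
    lam = step / window  # scaling parameter lambda
    rows = (H[2 * i], H[2 * i + 1], H[2 * j], H[2 * j + 1])
    return lam * sum(max(column) for column in zip(*rows))


# Toy data: 2 individuals (4 haplotypes), 4 markers with made-up effects.
effects = [1.0, -0.5, 0.25, 2.0]
haplotypes = [
    [1, 0, 0, 0], [0, 1, 1, 0],  # individual 0
    [0, 0, 0, 1], [1, 1, 0, 0],  # individual 1
]

# Non-overlapping segments (step == window, lambda = 1): the OHV case.
H_ohv = haploid_segment_values(haplotypes, effects, window=2, step=2)
print(hebv(H_ohv, 0, 1, window=2, step=2))  # 3.0

# Overlapping segments (step < window, lambda = 0.5).
H_overlap = haploid_segment_values(haplotypes, effects, window=2, step=1)
print(hebv(H_overlap, 0, 1, window=2, step=1))  # 1.5
```

In the toy example, the overlapping windows count most markers twice, and the factor $\lambda$ rescales the sum accordingly.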

In a breeding population, an elite subpopulation (denoted E), containing individuals with high GEBVs, can be distinguished. The H-score $H(i)$ of an individual $i$ represents the maximal HEBV between this individual and any member of the elite subpopulation E (Allier et al. 2020a):

$$\begin{aligned} H(i) = \max _{j\in E} \mathrm {HEBV}(i,j)\,. \end{aligned}$$

(3)

In other words, an individual with a high H-score contains different favorable haplotype segments that are not available in the elite subpopulation (E) and should thus be selected as a parent.
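
As a small illustration of Eq. (3), the sketch below ranks candidate donors by their H-score. The helper names and the toy table of pairwise HEBVs are hypothetical; in practice the pairwise values would come from Eq. (2).

```python
def h_score(candidate, elite, pair_hebv):
    """Eq. (3): best HEBV between the candidate and any elite individual."""
    return max(pair_hebv(candidate, e) for e in elite)


def top_donors(candidates, elite, pair_hebv, n_donors):
    """Candidates with the highest H-scores, best first."""
    ranked = sorted(candidates, key=lambda c: h_score(c, elite, pair_hebv), reverse=True)
    return ranked[:n_donors]


# Made-up pairwise HEBVs between gene bank accessions (g*) and elite lines (e*).
toy_hebv = {
    ("g1", "e1"): 2.0, ("g1", "e2"): 3.5,
    ("g2", "e1"): 3.0, ("g2", "e2"): 1.0,
    ("g3", "e1"): 4.0, ("g3", "e2"): 0.5,
}


def lookup(candidate, elite_member):
    return toy_hebv[(candidate, elite_member)]


donors = top_donors(["g1", "g2", "g3"], ["e1", "e2"], lookup, n_donors=2)
print(donors)  # ['g3', 'g1']
```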

Deep scoping method

The deep scoping method combines truncation selection with the (re)introduction of (new) QTL alleles in the breeding population with the aim of maximizing both the short- and long-term genetic gain. To introduce new QTL alleles, a gene bank is used, containing a population with a high genetic variation, but lower mean genetic value. When individuals of the gene bank are introduced into the breeding population, their lower genetic value prevents them from being selected during truncation selection. This will create a gap between the genetic value of the elite individuals and the rest of the breeding population, isolating them from one another. Although both QTL alleles will still be present in the breeding population, the QTL alleles of the individuals in the elite population will still be fixed causing a premature convergence of the genetic value. Therefore, a three-step selection procedure was designed to not only introduce QTL alleles into the breeding population but also in the elite population. To do so, the breeding population is divided into two subpopulations: the elite population and the pre-breeding population (see Fig. 4). Individuals of the elite population are selected based on the highest GEBVs and are crossed to maximize short-term genetic gain. The selection of the pre-breeding population is divided into two steps: the selection for Layer 0 and the selection for Layers 1–4. For Layer 0, elite individuals are crossed with individuals from the gene bank to maximally introduce QTL alleles into the breeding population. The parental selection for the subsequent layers maximizes the flow of individuals between the pre-breeding population and the elite population, exploiting the genetic variation such that (new) favorable QTL alleles can be introduced into the elite population. 
Loosely inspired by deep learning (Ivakhnenko 1971), the deep scoping method uses different layers in which individuals flow from one layer to the next, in the hope that the information that was once present in the first layer can be useful in the future and thus be transferred to the elite population.

The breeding population consists of an elite (sub)population containing 500 individuals and a pre-breeding (sub)population containing five different layers of 100 individuals each. To create the elite population, the 50 individuals with the highest GEBVs are selected. In contrast to truncation selection, the parents are not crossed at random. The individual with the highest GEBV is selected as the P1 parent and is coupled with a P2 parent that minimizes the genetic relationship between both parents. Other crossing block designs have been considered as well, such as crossing the two individuals with the highest GEBVs with each other or crossing the top-50 individuals with the top 51–100 individuals, but both designs resulted in a significantly lower long-term genetic gain.
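
One possible reading of this elite crossing design is a greedy pairing, sketched below in Python. Several points are assumptions rather than details given in the text: the pairing proceeds from the highest GEBV downward, each P2 is drawn from the remaining selected parents, and the relationship measure is supplied as a precomputed table (hypothetical values here).

```python
def elite_crossing_block(gebv, relationship, n_parents):
    """Pair the top n_parents by GEBV, each P1 with its least related remaining partner."""
    remaining = sorted(gebv, key=gebv.get, reverse=True)[:n_parents]
    crosses = []
    while len(remaining) >= 2:
        p1 = remaining.pop(0)  # highest GEBV still unpaired
        p2 = min(remaining, key=lambda c: relationship[p1][c])
        remaining.remove(p2)
        crosses.append((p1, p2))
    return crosses


# Toy example with made-up GEBVs and pairwise relationship coefficients.
toy_gebv = {"a": 9.0, "b": 8.0, "c": 7.0, "d": 6.0, "e": 1.0}
toy_rel = {
    "a": {"b": 0.9, "c": 0.2, "d": 0.5},
    "b": {"a": 0.9, "c": 0.4, "d": 0.1},
    "c": {"a": 0.2, "b": 0.4, "d": 0.3},
    "d": {"a": 0.5, "b": 0.1, "c": 0.3},
}

block = elite_crossing_block(toy_gebv, toy_rel, n_parents=4)
print(block)  # [('a', 'c'), ('b', 'd')]
```

With the sizes used in the text, 50 selected parents give 25 couples, which at 20 crosses per couple yields the 500 elite individuals.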

The pre-breeding population tries to introduce favorable marker alleles into the breeding population and ultimately in the elite population. To select the first parents for Layer 0, the HEBVs for the individuals of the gene bank are calculated. Next, the H-score is calculated for each individual of the gene bank. The five individuals with the highest H-score and thus containing the most favorable haplotype segments are selected as P1 parents. The five P2 parents are selected from the elite population to maximize the genetic value of the offspring. To maximize the genetic variation of the offspring, the scoping method is used instead of truncation selection. The scoping method has been proposed by the present authors (Vanavermaete et al. 2020) and consists of two important steps: the pre-selection and the parental selection. The pre-selection will select a fraction of the breeding population containing individuals with the highest GEBVs. Next, each selected P1 parent is crossed with a pre-selected individual that maximizes the S-score between both parents. The S-score between two individuals $i$ and $j$ is computed as:

$$\begin{aligned} S(i,j)=\sum _{m=1}^{k}\text{ var }\{Z_{im},Z_{jm}\}p_{m}\,, \end{aligned}$$

(4)

with $k$ the number of markers, ${\mathbf {Z}}$ a matrix of size $n\times k$ containing the genotype of $n$ selected individuals and $k$ different markers coded as −1, 0, or 1 and ${\mathbf {p}}$ a vector of size $k$ with $p_{m}=0$ if both alleles of marker $m$ have been selected in the parental population or $p_{m}=1$ otherwise (Vanavermaete et al. 2020). An individual with a high S-score contains different marker alleles that are not yet present in the parental population and should thus be selected as a parent. It is possible that an individual of Layer 0 is selected as an elite P2 parent as long as it maximizes the genetic variation of the offspring.
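
The S-score of Eq. (4) can be sketched as follows. Two choices are assumptions made for this illustration: the variance of the two genotype codes is taken as the sample variance, which for two values equals $(Z_{im}-Z_{jm})^2/2$, and both alleles of a marker are considered selected once the chosen parents include a heterozygote or both homozygotes. Function names are hypothetical.

```python
def allele_mask(parent_genotypes, n_markers):
    """Vector p: 0 where both alleles already occur among the chosen parents, else 1."""
    mask = []
    for m in range(n_markers):
        calls = {genotype[m] for genotype in parent_genotypes}
        both_present = 0 in calls or {-1, 1} <= calls
        mask.append(0 if both_present else 1)
    return mask


def s_score(z_i, z_j, mask):
    """Eq. (4) with the sample variance of two values, (a - b)^2 / 2."""
    return sum((a - b) ** 2 / 2 * p for a, b, p in zip(z_i, z_j, mask))


# One parent chosen so far; markers coded -1, 0, 1.
selected = [[1, 1, -1, 0]]
mask = allele_mask(selected, n_markers=4)
print(mask)  # [1, 1, 1, 0]

p1 = selected[0]
candidates = {"c1": [1, -1, -1, 1], "c2": [-1, -1, 1, 0]}
scores = {name: s_score(p1, z, mask) for name, z in candidates.items()}
print(scores)  # {'c1': 2.0, 'c2': 6.0}

best = max(scores, key=scores.get)
print(best)  # c2
```

The last marker contributes nothing to either score because the heterozygous parent already carries both alleles.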

The subsequent layers of the pre-breeding population gradually increase the genetic value of the Layer 0 individuals, while the genetic variation is slowly decreased such that favorable QTL alleles can be passed to the elite population. To allow for a continuous flow of favorable QTL alleles into the elite population, four additional layers are used. The effect of using a different number of layers will be discussed later (see Sect. “Flow from the pre-breeding population into the elite population”). In the subsequent layers, the P1 parents are selected from the previous layer, selecting individuals with the highest H-score. This ensures that individuals with favorable haplotype segments can flow to the next layer. The P2 parents are selected such that the genetic value of the offspring is maximized while preserving the genetic variation as much as possible. Individuals of previous layers are not considered as potential parents because they could reduce the genetic value of the offspring and thus interrupt the flow of QTL alleles in the breeding population. Both pre-selection and the S-score are used to select the P2 parent. First, based on the GEBV, candidate parents are pre-selected. Next, P2 parents are selected such that the S-score is maximized between both parents. In the parental selection for Layer 1, the top-400 individuals are pre-selected and can thus be used to select the P2 parents. In the parental selection for the subsequent layers, the number of individuals that are pre-selected decreases over each layer to increase the genetic gain. The parental selection for Layer 2 only pre-selects 300 individuals, followed by 200 and 100 individuals for the selection for Layer 3 and Layer 4, respectively. Again, it is possible that an elite parent is also selected as a pre-breeding parent as long as it maximizes the genetic variation of the offspring. 
The use of the scoping method during the parental selection helps to preserve the genetic variation, allowing for a slower but more accurate fixation of the QTL alleles. Individuals of the fourth and last layer should have the highest genetic values and could therefore be selected during truncation selection, finally introducing favorable QTL alleles into the elite population. Note that the elite population selects the individuals with the highest GEBVs over the entire breeding population, making it possible to select individuals of any layer into the elite population as long as the GEBV is high enough.
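
The layer-dependent pre-selection described above can be sketched as below. The pre-selection sizes are those stated in the text; everything else is an assumption for illustration, including the function names, the `excluded` argument standing in for individuals of earlier layers, and the pluggable `pair_score` function that would normally be the S-score.

```python
# Number of pre-selected candidates per layer, as stated in the text.
PRESELECTION_SIZE = {1: 400, 2: 300, 3: 200, 4: 100}


def preselect(gebv, layer):
    """Individuals with the highest GEBVs that may serve as P2 for this layer."""
    ranked = sorted(gebv, key=gebv.get, reverse=True)
    return ranked[:PRESELECTION_SIZE[layer]]


def choose_p2(p1, gebv, layer, pair_score, excluded=()):
    """Pre-selected candidate that maximizes pair_score with the given P1 parent."""
    candidates = [c for c in preselect(gebv, layer) if c != p1 and c not in excluded]
    return max(candidates, key=lambda c: pair_score(p1, c))


# Toy population of 1000 individuals whose GEBV equals their index.
toy_gebv = {i: float(i) for i in range(1000)}


def toy_score(p1, candidate):
    """Stand-in for the S-score: peaks at individual 950."""
    return -abs(candidate - 950)


p2 = choose_p2(p1=10, gebv=toy_gebv, layer=4, pair_score=toy_score)
print(p2)  # 950
```

Shrinking the candidate pool from Layer 1 to Layer 4 trades diversity for genetic value: later layers may only pair with increasingly elite partners.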

The implementation of the deep scoping method will require several breeding cycles. Starting with a truncation-selected breeding population, when the deep scoping method is used for the first time, the parental selection for Layer 0 crosses individuals of the elite population with individuals of the gene bank, but the parents of the subsequent layers will still be selected from truncation-selected individuals. In the next breeding cycle, the parental selection for Layer 1 crosses individuals of Layer 0 with individuals of the elite population, but the parents for Layers 2–4 will be selected from the offspring of truncation-selected individuals. For each layer that is used in the deep scoping method, one additional breeding cycle will be required before the deep scoping method becomes fully operational.

HUC method with bridging

The HUC method combines the HEBV and the UC to select the parental population. A full description of the HUC method has been reported by Allier et al. (2020a). The HUC method combines an elite population (PopE) with a second donor population (PopD). The donor population is selected from a gene bank containing 500 different individuals. First, individuals with the highest GEBVs are selected as elite parents. Next, the individuals in the donor population with the highest H-scores are selected as donor parents. Next, a crossing block between the selected parents from the elite population and the donor population is built by maximizing the UC, which is calculated as:

$$\begin{aligned} U={\hat{\mu }}_{p}+i\rho {\hat{\sigma }}_{p}\,, \end{aligned}$$

(5)

with $U$ the UC, ${\hat{\mu }}_{p}$ the predicted mean genetic value of the progeny, $i$ the selection intensity, $\rho$ the model performance and ${\hat{\sigma }}_{p}$ the predicted genetic standard deviation of the progeny. Both parameters $i$ and $\rho$ are kept constant during the entire simulation. The UC was calculated using the implementation and parameter settings as published by Allier et al. (2020a) with $i = 2.06$, corresponding to a selected fraction of 5%, and $\rho =1$.
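
The value $i = 2.06$ can be checked with a short Python sketch using the standard formula for the selection intensity under a normal distribution, $i = \phi(z)/p$ with $z$ the truncation point for a selected fraction $p$. The sketch treats ${\hat{\sigma }}_{p}$ as a standard deviation, an assumption made here so that both terms of Eq. (5) share the same units; the function names and the example values of the progeny mean and spread are made up.

```python
from statistics import NormalDist


def selection_intensity(fraction_selected):
    """Mean of the selected upper tail of a standard normal distribution."""
    z = NormalDist().inv_cdf(1.0 - fraction_selected)  # truncation point
    return NormalDist().pdf(z) / fraction_selected


def usefulness(mu, sigma, intensity, accuracy=1.0):
    """Eq. (5): U = mu + i * rho * sigma."""
    return mu + intensity * accuracy * sigma


i_5pct = selection_intensity(0.05)
print(round(i_5pct, 2))  # 2.06

# Hypothetical cross with progeny mean 10.0 and standard deviation 2.0.
u = usefulness(mu=10.0, sigma=2.0, intensity=2.06, accuracy=1.0)
print(round(u, 2))  # 14.12
```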

In our simulation study, the genetic value of the individuals of the gene bank is low. In such a case, to allow for a fair comparison between the HUC method and the deep scoping method, the HUC method should be extended with a bridging population to assist the introduction of the individuals of the gene bank into the elite population (Allier et al. 2020b). This means that the breeding population is split into two parts: an elite population and a pre-breeding population. According to Allier et al. (2020b), 75% of the parental population is used to select the elite population, while the remaining 25% is used to select the pre-breeding individuals. Because the recurrent breeding scheme used in our simulation study requires the selection of an even number of parents, 80% of the parental population is used to select the elite population and the remaining 20% is used to select the pre-breeding population. In the elite population, 80 individuals with the highest GEBVs are selected and crossed using truncation selection as described in the deep scoping method. In the pre-breeding population, 10 elite individuals are crossed with 10 individuals of the gene bank (donors) according to the HUC method.

Prediction model

The GEBVs are predicted by fitting a linear mixed effects model:

$$\begin{aligned} {\mathbf {y}}={\mathbf {1}}_{n} \beta + {\mathbf {Z}} {\mathbf {u}} + \varvec{\epsilon }\,, \end{aligned}$$

(6)

with ${\mathbf {y}}$ a vector of phenotypic values, ${\mathbf {1}}_{n}$ a vector of size $n$ containing 1s, $n$ the number of individuals in the training panel, $\beta$ the fixed effect (phenotypic mean), ${\mathbf {Z}}$ the incidence matrix of the training panel with marker information, ${\mathbf {u}}$ the marker effects following a normal distribution ${\mathcal {N}}({\mathbf {0}},{\mathbf {G}})$ with ${\mathbf {G}}=\sigma _{u}^{2}{\mathbf {I}}_{k}$ (with ${\mathbf {I}}_{k}$ the identity matrix of dimension $k$), $k$ the number of markers and $\varvec{\epsilon }$ the residual effects following a normal distribution ${\mathcal {N}}({\mathbf {0}},{\mathbf {R}})$ with ${\mathbf {R}}=\sigma _{e}^{2}{\mathbf {I}}_{n}$. Both variance components $\sigma _{u}^{2}$ and $\sigma _{e}^{2}$ are estimated by means of restricted maximum likelihood using the rrBLUP package (Endelman 2011). The GEBVs of the individuals are calculated as:

$$\begin{aligned} \hat{{\mathbf {g}}} = {\mathbf {Z}}\hat{{\mathbf {u}}}\,, \end{aligned}$$

(7)

with $\hat{{\mathbf {g}}}$ the GEBVs, ${\mathbf {Z}}$ the marker information and $\hat{{\mathbf {u}}}$ the predicted marker effects.
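
To make Eqs. (6) and (7) concrete, the sketch below solves the mixed model equations for a tiny data set in plain Python. It does not reproduce the REML step performed by rrBLUP: the ratio $\sigma _{e}^{2}/\sigma _{u}^{2}$ is simply passed in as a known value, which is an assumption of this example, and all function names are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[r]] for r, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] / aug[r][r] for r in range(n)]


def ridge_blup(y, Z, ratio):
    """Mean and marker effects of Eq. (6) for ratio = sigma_e^2 / sigma_u^2."""
    n, k = len(y), len(Z[0])
    W = [[1.0] + list(row) for row in Z]  # design matrix [1_n, Z]
    lhs = [[sum(W[r][a] * W[r][b] for r in range(n)) for b in range(k + 1)]
           for a in range(k + 1)]
    for m in range(1, k + 1):
        lhs[m][m] += ratio  # shrink the marker effects only, not the mean
    rhs = [sum(W[r][a] * y[r] for r in range(n)) for a in range(k + 1)]
    solution = solve(lhs, rhs)
    return solution[0], solution[1:]


def gebv(Z, u_hat):
    """Eq. (7): g_hat = Z u_hat."""
    return [sum(z * u for z, u in zip(row, u_hat)) for row in Z]


# Toy training panel: 4 individuals, 2 markers coded -1 / 1.
Z = [[1, -1], [1, 1], [-1, 1], [-1, -1]]
y = [3.0, 5.0, 1.0, -1.0]

beta_hat, u_hat = ridge_blup(y, Z, ratio=4.0)
print(beta_hat, u_hat)
print(gebv(Z, u_hat))
```

In this balanced toy panel the unshrunk least-squares effects would be 2.0 and 1.0; with a ratio of 4.0 they are halved, which shows the shrinkage imposed by the random-effect assumption on ${\mathbf {u}}$.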

In breeding cycle 1, the complete base population is used as a training panel. In the subsequent breeding cycles, 150 individuals are phenotyped and added to the training panel according to the tails method, which selects 75 individuals from each tail of the normally distributed GEBVs (Neyhart et al. 2017). According to Neyhart et al. (2017), the tails method delivers a higher genetic gain than other update methods, although the difference is not significant. In the case of the deep scoping method, the tails method builds a training panel with both elite individuals and pre-breeding individuals, improving the prediction of the GEBVs of the whole breeding population without the need for two separate prediction models. Each time the training panel is updated, the 150 individuals that have been in the training panel the longest are removed to reduce computational time without reducing the prediction accuracy (Neyhart et al. 2017). To calculate the UC, the Markov chain Monte Carlo (MCMC) samples of the marker effects are required. This matrix is obtained by estimating the GEBVs that are used in the HUC method via the BGLR package using a Gibbs sampler with Gaussian prior (BRR) (Allier et al. 2020a; Pérez and de los Campos 2014).
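
The training panel update can be sketched as follows, assuming that the tails method takes the 75 lowest and the 75 highest GEBVs and that the panel is stored oldest first. The function names are hypothetical and the GEBVs are toy values.

```python
def tails(gebv, n_per_tail):
    """Individuals from both tails of the GEBV distribution (lowest, then highest)."""
    ranked = sorted(gebv, key=gebv.get)
    return ranked[:n_per_tail] + ranked[-n_per_tail:]


def update_panel(panel, new_ids, n_drop):
    """Drop the n_drop oldest entries (panel is ordered oldest first), then append."""
    return panel[n_drop:] + list(new_ids)


# Toy breeding population of 1000 individuals whose GEBV equals their index.
toy_gebv = {i: float(i) for i in range(1000)}
chosen = tails(toy_gebv, n_per_tail=75)
print(len(chosen))  # 150

# Toy panel of 764 labelled lines (384 + 380, the size of the base population).
panel = [f"base{i}" for i in range(764)]
panel = update_panel(panel, chosen, n_drop=150)
print(len(panel))  # 764
```

Because as many individuals are dropped as are added, the panel size stays constant from one cycle to the next.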

Simulation of the population

The simulation is built upon the work of Neyhart et al. (2017), using the packages GSSimTPUpdate and hypred in R (version 3.6.3). The dataset contains 1590 biallelic SNP markers from which 100 are selected as QTLs ($L=100$) and 1490 are used as markers to predict the genetic value. The true phenotypic value of the $i$-th individual ($y_{i}$) is calculated over three different environments:

$$\begin{aligned} y_{i}=\frac{1}{3}\sum _{j=1}^{3} \left( g_{i}+e_{j}+\epsilon _{ij}\right) \,, \end{aligned}$$

(8)

with $g_{i}$ the genetic value of the $i$-th individual, $e_j$ the $j$-th environmental effect, and $\epsilon _{ij}$ the residual effect of the $i$-th individual and the $j$-th environment. The genetic value is calculated by taking the sum of the QTL effects. The QTL effects are sampled from a geometric series such that at the $k$-th QTL, the favorable homozygote has a value of $a^k$, the unfavorable homozygote has a value of $-a^k$ and the heterozygote has a value of zero with $a=(L-1)/(L+1)$. Both the environmental and residual effects are drawn from a normal distribution with mean 0 and a variance component $\sigma _{E}^2$ and $\sigma _{e}^{2}$, respectively. The variance component of the environmental effect is defined as eight times the genetic variance, while the variance component of the residual effect is scaled to simulate a heritability of 0.5 (Bernardo 2014).
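
The geometric series of QTL effects and the averaging over three environments can be sketched as below. The sketch assumes that the QTL index runs from 1 to $L$ and that QTL genotypes are coded 1, 0, and −1 for the favorable homozygote, the heterozygote, and the unfavorable homozygote; the scaling of the variance components to reach the target heritability is not reproduced. Function names and example values are hypothetical.

```python
import random


def qtl_effects(n_qtl):
    """Effects a^k for k = 1..L with a = (L - 1) / (L + 1)."""
    a = (n_qtl - 1) / (n_qtl + 1)
    return [a ** k for k in range(1, n_qtl + 1)]


def genetic_value(qtl_genotype, effects):
    """Sum of QTL effects for genotypes coded 1, 0, -1."""
    return sum(z * e for z, e in zip(qtl_genotype, effects))


def phenotype(g, env_effects, sd_resid, rng):
    """Eq. (8): mean over environments of g + e_j + residual."""
    total = sum(g + e + rng.gauss(0.0, sd_resid) for e in env_effects)
    return total / len(env_effects)


effects = qtl_effects(100)
print(effects[0])  # a = 99/101

best_possible = genetic_value([1] * 100, effects)  # all favorable homozygotes
print(best_possible)

# Hypothetical environment effects and residual standard deviation.
rng = random.Random(1)
y = phenotype(g=2.5, env_effects=[1.0, -1.0, 0.0], sd_resid=0.5, rng=rng)
print(y)
```

With $L=100$, the effects decay slowly (from about 0.98 down to about 0.14), so the trait is controlled by many QTLs of gradually decreasing effect.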

The simulation of the different breeding cycles is described by Vanavermaete et al. (2020). In this paper, a gene bank is added to the simulation. The gene bank is created by crossing individuals of the UMN population with individuals of the NDSU population. First, individuals of the UMN dataset are selected at random. For each parent, an individual of the NDSU dataset is selected that maximizes the S-score between both parents. The size of the gene bank is set at 500 individuals, which gives a good balance between preserving the genetic variation of the base population and keeping the simulation time low.

Data availability

The scripts, figures, datasets of the base population and supplementary data are available from the GitHub repository https://github.com/biointec/deep-scoping. The dataset and the simulation of the recurrent breeding cycle have been reported and published by Neyhart et al. (2017). The HUC method has been reported and published by Allier et al. (2020a).