Genomic prediction in hybrid breeding: II. Reciprocal recurrent genomic selection with full-sib and half-sib families

Key message Genomic prediction of GCA effects based on model training with full-sib rather than half-sib families yields higher short- and long-term selection gain in reciprocal recurrent genomic selection for hybrid breeding, if SCA effects are important. Abstract Reciprocal recurrent genomic selection (RRGS) is a powerful tool for ensuring sustainable selection progress in hybrid breeding. For training the statistical model, one can use half-sib (HS) or full-sib (FS) families produced by inter-population crosses of candidates from the two parent populations. Our objective was to compare HS-RRGS and FS-RRGS for the cumulative selection gain (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Sigma \Delta G$$\end{document}ΣΔG), the genetic, GCA and SCA variances (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma_{G}^{2}$$\end{document}σG2,\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma_{gca}^{2}$$\end{document}σgca2, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma_{sca}^{2}$$\end{document}σsca2) of the hybrid population, and prediction accuracy (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{gca}$$\end{document}rgca) for GCA effects across cycles. Using SNP data from maize and wheat, we simulated RRGS programs over 10 cycles, each consisting of four sub-cycles with genomic selection of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{e} = 20$$\end{document}Ne=20 out of 950 candidates in each parent population. Scenarios differed for heritability \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\left( {h^{2} } \right)$$\end{document}h2 and the proportion \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau = 100 \times \sigma_{sca}^{2} :\sigma_{G}^{2}$$\end{document}τ=100×σsca2:σG2 of traits, training set (TS) size (\documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{TS}$$\end{document}NTS), and maize vs. wheat. Curves of \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Sigma \Delta G$$\end{document}ΣΔG over selection cycles showed no crossing of both methods. If \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau$$\end{document}τ was high, \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Sigma \Delta G$$\end{document}ΣΔG was generally higher for FS-RRGS than HS-RRGS due to higher \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$r_{gca}$$\end{document}rgca. In contrast, HS-RRGS was superior or on par with FS-RRGS, if \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau$$\end{document}τ or \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^{2}$$\end{document}h2 and \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{TS}$$\end{document}NTS were low. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\Sigma \Delta G$$\end{document}ΣΔG showed a steeper increase and higher selection limit for scenarios with low \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\tau$$\end{document}τ, high \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$h^{2}$$\end{document}h2 and large \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$N_{TS}$$\end{document}NTS. \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma_{gca}^{2}$$\end{document}σgca2 and even more so \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma_{sca}^{2}$$\end{document}σsca2 decreased rapidly over cycles for both methods due to the high selection intensity and the role of the Bulmer effect for reducing \documentclass[12pt]{minimal} \usepackage{amsmath} \usepackage{wasysym} \usepackage{amsfonts} \usepackage{amssymb} \usepackage{amsbsy} \usepackage{mathrsfs} \usepackage{upgreek} \setlength{\oddsidemargin}{-69pt} \begin{document}$$\sigma_{gca}^{2}$$\end{document}σgca2. Since the TS for FS-RRGS can additionally be used for hybrid prediction, we recommend this method for achieving simultaneously the two major goals in hybrid breeding: population improvement and cultivar development. Supplementary Information The online version contains supplementary material available at 10.1007/s00122-023-04446-3.


Introduction
Recurrent selection (RS) comprises a multitude of breeding methods sharing as common feature the testing, selection and recombination of genetic units in successive steps of each selection cycle. The basic idea of RS is to improve the population mean by increasing the frequency of favorable alleles over selection cycles without loosing desirable genetic variation for future cycles (Hallauer et al. 2010). While recurrent selection in a broad sense is practiced in every breeding program, scientific studies on the efficiency of RS were generally conducted with closed populations. Quantitative genetics provided the theoretical basis for quantifying the short-term selection gain ( ΔG ) expected under different RS schemes and the relevant factors contributing to it (Falconer and Mackay 1996;Hallauer et al. 2010). Numerous RS methods have been described in textbooks (e.g., Bernardo 2002, Chap. 9), which differ with regard to the genetic units used in the various steps and whether they deal with intra-population or inter-population improvement.
In hybrid breeding, usually two genetically distant populations are used as base material to take full advantage of hybrid vigor (Melchinger and Gumber 1998). For this reason, RS is mainly concerned with inter-population Communicated by Antonio Augusto Franco Garcia. improvement of both parent populations in reciprocal terms to maximize the mean performance of their hybrid population and the top hybrids between lines developed from the two parent populations (Hallauer et al. 2010). While the technical details of reciprocal recurrent selection can vary depending on the biological requisites of the crop (e.g., multiplication coefficient, ease of pollination control and production of selfed and cross-pollinated seed or doubledhaploid lines), two general categories are distinguished, depending on whether the test units are half-sib (HS) or full-sib (FS) families (Hallauer et al. 2010). Comstock et al. (1949) proposed HS reciprocal recurrent selection as a method to maximize general combining ability (GCA) in hybrid breeding and mistakenly also mentioned improvement of specific combining ability (SCA). They suggested to select candidates from each population based on their GCA evaluated in HS progenies produced with the opposite population as common tester. Later, the HS method was modified replacing as tester the opposite population by an inbred line or single cross produced from it (Horner et al. 1973).
Full-sib recurrent reciprocal selection was proposed as an alternative to the HS methods (Hallauer 1967;Lonnquist and Williams 1967). The basic idea is to produce and evaluate FS families, corresponding to single crosses (SCs), between pairs of candidates from the two parent populations and recombine within each population the parents (or selfed progenies of them) of crosses with superior hybrid performance to generate the base material for the next breeding cycle. The main advantage of FS over the HS selection is that twice as many candidates can be sampled from each parent population, yielding the same number of SC entries for phenotyping in trials. However, a line with superior GCA is unlikely to be selected in the FS method if crossed by chance to a partner with poor GCA or if their cross has negative SCA. Thus, the positive effect of higher selection intensity is partly offset by a reduced correlation between the selection criterion and GCA, as reflected in the formula for the selection response of the FS method (Bernardo 2002, p. 201).
Genomic selection can readily be integrated into RS methods because the phenotypic data from testing of the genetic units can be used for training the statistical model for genomic prediction. This offers the opportunity to substantially improve the selection response as demonstrated by numerous studies on intra-population improvement in animal and plant breeding (e.g., Crossa et al. 2017;Hickey et al. 2017;Schaeffer 2006). First, the low costs of genotyping permits predicting the breeding value of a large number of non-phenotyped individuals, enabling a tremendous increase in the selection intensity. Second, applying genomic selection for several cycles benefits the selection response by savings in time and costs due to bypassing the time-consuming and expensive step of phenotyping. However, little is known about the persistency of the prediction accuracy and selection gain under different schemes of recurrent genomic selection . Furthermore, integration of genomic selection into FS reciprocal recurrent selection has so far received little attention in hybrid breeding.
In the present study, we applied fully stochastic forwardin-time simulations based on molecular data from breeding programs in maize and wheat to generate in silico base populations. These were subject to ten cycles of HS-RRGS or FS-RRGS, each cycle consisting of (re-)training and four sub-cycles of genomic selection. Our goal was to (i) investigate the cumulative selection gain ( ΣΔG ) of the hybrid population over selection (sub-)cycles, (ii) monitor the corresponding changes in the genetic variance ( 2 G ) and the GCA and SCA variances ( 2 gca , 2 sca ), and (iii) examine the prediction accuracy ( r gca ) for GCA effects over cycles and sub-cycles. Our main objective was to compare HS-RRGS and FS-RRGS with regard to the short-and long-term selection progress under various scenarios differing in the heritability ( h 2 ) and proportion = 100 × 2 sca ∕ 2 G at the beginning of the selection program as well as the presence or absence of genetically distant parent populations.

Genetic markers, founder lines and generation of the base populations
Our starting point were two data sets of SNP marker genotypes from maize and wheat. Data set DS1 comprised 145 dent and 111 flint founder lines, serving as female and male lines, respectively, from the maize breeding program of the University of Hohenheim detailed in previous publications (Schrag et al. 2018;Technow et al. 2014;Westhues et al. 2017). Briefly, the lines had been developed either by recurrent selfing for more than six generations or the doubledhaploid (DH) technique and had been selected for per se performance as well as for general combining ability (GCA) of important agronomic traits in testcrosses (TCs) with line testers from the opposite population. All lines had been genotyped with the 50k Illumina SNP chip MaizeSNP50 (Ganal et al. 2011). After a rigorous quality check, a total of 13,813 markers (corresponding to panel SNP all ) polymorphic in the 256 founder lines were available for our simulations, providing a fairly uniform coverage of the entire maize genome. The genetic map of these SNPs was constructed as detailed by Lanzl et al. (2023) and covered in total 1,442 cM of the maize genome.
Data set DS2 comprised 667 female and 18 male founder lines from spring bread wheat (Triticum aestivum L.) used as parents of 1,888 SC hybrids phenotyped in large experiments described by Basnet et al. (2019). The elite parent lines were chosen from CIMMYT's spring bread wheat program based on per se performance for agriculturally important traits, suitability for producing hybrids and ancestral diversity measured with the co-ancestry coefficient. The female and male parents were genotyped with customary SNP arrays. For our analyses, CIMMYT kindly provided the 10,250 SNP markers (corresponding to panel SNP all ) underlying the comparison of different methods for genomic prediction of hybrids reported by Basnet et al. (2019). The genetic map of these markers covered in total 3,009 cM of the wheat genome.
For each data set, the base material (denoted as C 1,0 ) for the parent populations F (females) and M(males) was generated in silico by the following steps: (i) random mate the founder female or male lines, excluding selfing. (ii) Sample 950 S 0 genotypes from each parent population under the restriction that each founder line contributed no more than 12 gametes to the sample, except for the males in set DS2, where the threshold was raised to a maximum of 55 gametes per founder line due to their small number. (iii) Simulate meiosis in each of the S 0 genotypes and produce from a randomly chosen gamete one DH line. Thus, we obtained a set of 950 largely unrelated DH lines in both set F and M representing the parent populations at the beginning of genomic selection for both breeding schemes.

Genetic architecture of the simulated traits
For each data set we followed the same procedure to simulate traits differing in their architecture with regard to the importance of dominance effects detailed in our companion paper ). First, a panel QP of 3,000 SNPs, representing an equal proportion of the markers positioned on each chromosome, was randomly sampled from the entire panel SNP all of SNP markers and retained as pool for choosing the QTL positions. The remaining panel R of SNPs in each data set served as markers for genomic prediction. Second, a panel Q of 1,000 QTL randomly chosen from QP were assigned additive ( a l ) and dominance ( d l ) effects as defined in the textbook of Lynch and Walsh (1998). The additive effects a i were drawn from a Gamma distribution with parameter scale = 1.66 and shape = 0.4 following previous studies (Meuwissen et al. 2001;Technow et al. 2012). The dominance effects d l = a l × k l were obtained multiplying a i by the degree of dominance k l sampled from a normal distribution k l ∼ N k , 2 k . Third, a random panel Q d ⊂ Q of n d QTL displaying only dominance effects d l was obtained replacing their a l value by zero. Fourth, the additive effects were scaled such that the genetic variance of the hybrid population F × M in C 1,0 was 2 G = 1. The parameters k , 2 k and n d were chosen based on estimates of the degree of dominance and QTL mapping results from numerous studies in maize as detailed in Melchinger et al. (2023) and for wheat, based on a literature survey of various (partially) autogamous crops (see Suppl. Table S1).
For each trait, determined by the location and the genetic effects of the QTL in Q , the genotypic value of each candidate from every selection cycle was obtained by summing the respective additive and dominance effects over all QTL. Phenotypic values of TCs (HS-RRGS) or SCs (FS-RRGS) in the training set (TS) of the initial and later selection cycles were obtained by adding to the genotypic values a noise variable from a normal distribution N 0, 2 e , where 2 e = 2 G 1 − h 2 ∕h 2 , 2 G is the genetic variance and h 2 the desired broad-sense heritability of the hybrid population F × M in C 1,0 .

Simulation of half-sib and full-sib reciprocal recurrent genomic selection
In our notation, numbers with capital and lower case letters (e.g., N TS and n TS ) refer to the hybrid and parent populations, respectively. Each selection cycle C t (t = 1, ..., 10) includes (re-)training the model in sub-cycle C t,0 and s (s = 1, ..., 4) sub-cycles C t,s of genomic selection. The candidates selected in C t,4 are used to generate the materials for C t+1,0 by production of DH lines as described below. The two methods HS-RRGS and FS-RRGS differ with respect to (i) the number of line candidates sampled from each population for producing the test units in the TS, (ii) the test units (inter-population HS versus FS families) in the TS used for phenotyping and (re-)training the model in C t,0 , and (iii) the selection criterion (GBLUPs for TC performance in HS-RRGS versus GBLUPs for GCA effects in FS-RRGS) used for genomic selection in sub-cycles C t,s . However, the two breeding schemes are identical with respect to (a) the identification of superior candidates based on the selection criterion applied by each method, (b) the recombination scheme for generating the material of the next sub-cycle, and (c) the production of DH lines after recombining the candidates selected in C t,4 to be used as base material for C t+1,0 (Suppl. Figure S1).
In each cycle C t of the HS-RRGS scheme, n F female and n M = n F male lines were randomly sampled from parent population F and M , respectively, of subcycle C t,0 and crossed to a tester from the opposite population (Fig. 1). The TCs from each parent population, adding to a total of N TS = 2n F entries over both parent populations, were phenotyped and served as TS for training the model for TC performance of the respective parent population described below [Eq.
(1)]. The GBLUPs of the candidates in sub-cycles C t,s obtained with this model served as selection criterion within cycle C t . For each parent population, the tester in cycle C t was the genotype with highest GBLUP value among the 950 DH lines from the opposite population identified in the previous cycle C t−1,0 . For cycle C 1 , the tester was the best among 100 randomly chosen DH lines with highest GCA to the opposite population determined by phenotypic evaluation of TCs with the same heritability as in C 1,0 .
In each cycle C t of the FS-RRGS scheme, n * F female and n * M = n * F male lines were randomly sampled from parent population F and M of subcycle C t,0 and pairwise crossed (Fig. 1). The N TS = n * F inter-population single-cross (SC) hybrids were phenotyped and served as TS for training the statistical model for hybrid performance described below [Eq.
(2)]. The GBLUPs for the GCA of the candidates in C t,s , obtained with this model served as selection criterion. Under the assumption of equal expenditures for phenotying the TS for each method, N TS is equal for both methods so that n * F = n * M = 2n F = 2n M , meaning that twice as many lines are sampled from each parent population for (re-)training in FS-RRGS than HS-RRGS.
For both RRGS methods and both parent populations, we applied the same scheme for recombination and recurrent genomic selection in sub-cycles C t,s (Suppl. Figure S1). In C t,0 , the top n sel,1 = 40 out of the 950 DH lines were selected from each parent population on the basis of their GBLUPs for TC performance in HS-RRGS and GCA in FS-RRGS, respectively. In each parent population, each selected line was crossed at random with one other selected line. Subsequently, the 20 intra-population single crosses were mated according to a half-diallel design and each of the 190 matings (corresponding to intra-population FS families) contributed exactly five S 0 progeny to the next generation. The n = 950 S 0 plants constituted the parent population comprising the recombined material of subcycle C t,1 and served as starting material for subcycle C t,2 . They were genotyped in the juvenile stage to select the n sel,2 = 20 plants with top GBLUPs based on their marker genotype and the statistical model trained in C t,0 . The 20 top S 0 plants were again pairwise mated and five S 0 progeny plants per intra-population FS family were genotyped for a total of n = 950 candidates, comprising the recombined material of subcycle C t,2 . This procedure was continued up to subcycle C t,4 , where from each of the 950 candidates after recombination one DH line was produced in silico to generate 950 DH lines corresponding the material in subcycle C t+1,0 . Thus, the allele frequencies in C t,4 and C t+1,0 and consequently the mean of the hybrid population between the two parent populations in these sub-cycles are expected to be identical except for sampling effects. Our rationale for selecting n sel,1 = 40 DH lines from C t,0 but n sel,s = 20 S 0 plants in advanced subcycles C t,s was to have the same effective population size N e in all sub-cycles.
Our decision to conduct within each cycle four subcycles of genomic selection before re-training was based on results examining the persistency of prediction accuracy for genomic selection with synthetic populations . Our goal was that the prediction accuracy of GBLUPs for the selection criterion should not fall below 25% of the level in C t,0 .

Genomic prediction models
For HS-RRGS, we calculated GBLUPs for TC performance of all candidates from each parent population separately and but describe the procedure here only for candidates from population F . We used the model, where TC is the vector of TC performance (corresponding to inter-population HS families) of the n F candidates in the TS of population F in sub-cycle C t,0 , µ is the fixed model intercept, F is the design matrix linking the vector of TC effects of t he genotypes from F wit h TC , TC,F refers to the genetic TC variance of the DH lines from population F in C t,0 , F is the genomic relationship matrix among the genotypes in population F calculated as described below, and is the residual error. The TC performance of all candidates in F was predicted with the formula ̂ F = 2 TC,F F,TS −1 TC,TC TC , where F,TS is a sub-matrix of F , referring to the genomic relationship of the genotypes in F and the candidates in the TS, and TC,TC is the phenotypic covariance matrix for TC performance of the candidates in the TS. The variance components required in the above formulas were estimated from the data in the respective TS using the R package regress version 1.3 (Clifford and McCullagh 2006).
For FS-RRGS, we calculated GBLUPs for the GCA of the genotypes in each population based on the following model for inter-population SC hybrids among DH lines (corresponding to inter-population FS families) from the TS where SC is the vector of phenotypic values of the N TS = n * F = 2n F SC hybrids in the TS, µ is the fixed model intercept, F and M are the design matrices linking the random GCA effects of the parent lines from F and M , respectively, with their hybrids in the TS, S the design matrix of SCA effects for the SC hybrids in the TS. The residuals are again represented by vector . The covariance matrix of the GCA effects F and M was F 2 gcaF and M 2 gcaM , respectively, and that of SCA effects was S 2 sca , where 2 gcaF , 2 gcaM and 2 sca are the variance components pertaining to GCA and SCA effects of the complete factorial F × M , F and M are the genomic relationship matrices among the genotypes from F and M , respectively, and The GCA of all candidates in population F was predicted with the formula ̂ F = 2 gcaF F,SC −1 SC,SC SC , where F,SC is the genomic relationship matrix of the individuals in population F and the parents from this population used for producing the SC hybrids in the TS, and SC,SC is the phenotypic covariance matrix of the SC hybrids in the TS. Variance components required for application of GBLUP were obtained from the data in the TS using again the R package regress version 1.3 (Clifford and McCullagh 2006).

Statistical analyses of the selection (sub-)cycles
With the SNP data from panel R = SNP all �Q , we calculated the genomic relationship matrices F and M for all genotypes i and j from all sub-cycles of population F and M , respectively, according to Schopp et al. (2017), here exemplified for genotypes i and j from population F: are the genotypic scores of i and j , respectively, reflecting the number of reference alleles in i and j , and p i l and p j l are the frequencies of the reference allele at marker l in the sub-cycle(s) of population F , from which i and j originated. We also calculated modified Rogers' distances (Rogers 1972(Rogers , 1986 on the basis of the markers in panel R between the entire population of 950 DH lines from different sub-cycles C t,0 of the same or the opposite parent population. For both HS-RRGS and FS-RRGS and all sub-cycles C t,s , we determined the genotypic values of all hybrids in the factorial between F and M using from each population (a) the 950 DH lines if s = 0 , (b) the 40 selected DH lines if s = 1 , or (c) the 40 DH lines corresponding to the parental gametes of the 20 selected S 0 genotypes if s > 1 . These provided the basis for calculating with standard procedures (i) the mean F×M,t,s of all hybrids as well as the GCA and SCA effects and (ii) the genetic variance 2 G and variance components 2 gcaF , 2 gcaM and 2 sca as well as the proportion t = 100 × 2 sca ∶ 2 G for all sub-cycles C t,0 . The cumulative selection gain ΣΔG F×M of the hybrid population F × M in sub-cycle C t,s was calculated as F×M,t,s − F×M,1,0 using the fact that G = 1 in C 1,0 .
For each sub-cycle C t,s of HS-RRGS, the prediction accuracy r gca (TC) for GCA effects was calculated by correlating the true GCA values in vectors F and M of the female and male parents of the factorials with their GBLUPs ̂ F and ̂ M , respectively. Further, we calculated the correlation r u,û (TC) of the GBLUPs ̂ F or ̂ M with their true TC performance F or M determined from the genotypic values of the TCs of the candidates in parent population F and used the same procedure for parent population M . For each sub-cycle C t,s of FS-RRGS, the prediction accuracy r gca (SC) for GCA effects was calculated by correlating the true GCA values from the factorials with their GBLUPs ̂ F or ̂ M in each sub-cycle C t,s . For sub-cycles C t,0 , these correlations were calculated for the entire set of 950 DH lines and also separately calculated for (i) the DH lines used as parents of the TS and (ii) the DH lines used as parents of the PS. For both methods, r gca values were finally averaged over both parent populations. With regard to the reduction in the genetic variance due to negative linkage disequilibrium caused by selection, known as the Bulmer effect (Bulmer 1971), we calculated for each sub-cycle C t,s also the genic GCA variances ( ̃2 gcaF and ̃2 gcaM ),the genic SCA variance ( ̃2 sca ), and the total genic variance ( ⌢ 2 G ) as follows: where p F l and p M l is the frequency of the reference allele at QTL l ∈ Q in sub-cycle C t,s of population F and M , respectively, and a l and d l the additive and dominance effect at the QTL. Finally, we computed the ratios 2 G ∶̃2 G , 2 gcaF + 2 gcaM ∶ ̃2 gcaF +̃2 gcaM and 2 sca ∶̃2 sca . For all statistics mentioned above, we calculated the mean and corresponding standard deviations from 400 simulation runs, starting with the production of the parent populations in C 1,0 from the founder lines in data set DS1 and DS2.

Results
For the maize data set, the curves of the cumulative selection gain ΣΔG as a function of the (sub-)cycles showed for each scenario a similar pattern for both breeding methods with no intersection between them (Fig. 2). A striking difference existed between the scenarios with high and low , where with low the curves for ΣΔG displayed a much steeper slope in the first cycle and approached faster the selection limit, irrespective of N TS . For both values of , the selection limit increased by ~ 30% when doubling h 2 from 0.4 to 0.8, and by ~ 15% when doubling N TS from 190 to 380. For high , FS-RRGS had higher ΣΔG than HS-RRGS with differences being largest (10%) for low h 2 and N TS = 380 , intermediate (4-6%) for high h 2 irrespective of N TS , and small (3%) for low h 2 and N TS = 190 . For small , ΣΔG for HS-RRGS was ~ 5% greater than for FS-RRGS.
The curves for ΣΔG obtained with the wheat data showed essentially the same picture as for maize (Fig. 3). Despite identical assumptions about the genetic architecture of heterotic traits, the contribution of 2 sca to the genetic variance G among hybrids was much larger in wheat than in maize ( = 45% vs. 24%). This was associated with a much slower increase and substantially lower selection limit of ΣΔG if was high. Furthermore, differences between both methods were more pronounced in that FS-RRGS had 10-12% higher ΣΔG than HS-RRGS for high , whereas HS-RRGS surpassed FS-RRGS by ~ 2 to 10% for all scenarios with small . For the maize data set, the reduction of the prediction accuracy r gca as a function of C t,0 followed approximately an inverse sigmoid function for high h 2 and an inverse logarithmic function for low h 2 (Figs. 4, Suppl. Figure S2). Increasing h 2 from 0.4 to 0.8 increased r gca by up to 25% for both breeding schemes with only minor modifications due to . By comparison, doubling N TS improved r gca by up to 15%. For high , in the first cycles r gca was ~ 10% higher for FS-RRGS than HS-RRGS irrespective of h 2 , but this advantage reduced in advanced cycles. For scenarios with low , r gca was for both methods at the same level in the first cycle but slightly higher for HS-RRGS in advanced cycles. While r gca for the TS was ~ 15% higher than for the PS in HS-RRGS, differences between both sets were less than 10% for FS-RRGS. Independent of the breeding scheme and scenario, the reduction of r gca in consecutive sub-cycles within a cycle amounted to ~ 40% from sub-cycle C t,0 to C t,1 but only to ~ 20% in later sub-cycles.
For the wheat data set, the r gca values for FS-RRGS displayed essentially the same trends as for maize regarding the level and the rate of reduction over (sub-)cycles in all scenarios (Figs. 5, Suppl. Figure S3). For HS-RRGS and scenarios with high , r gca started in C 1,0 at a much lower level yet decreased less in the following sub-cycles and reached in sub-cycles C 2,s about the same level as FS-RRGS. For HS-RRGS and scenarios with low , r gca was at a similar or Fig. 2 Cumulative selection gain ΣΔG (expressed in units of σ G in cycle C 1,0 ) in the hybrid population for full-sib and halfsib reciprocal recurrent genomic selection (RRGS). Results for 10 selection cycles, each consisting of four subcycles, based on SNP data from maize. Scenarios differed for the size N TS of the training set (TS), heritability h 2 , and proportion = 100% × 2 sca ∶ 2 G of the trait higher level for sub-cycles C 1,s and C t,s ( t ≥ 2 ), respectively, and discrepancies between r gca for the TS and PS were much larger than for FS-RRGS.
For the maize data, the curves for 2 G in C t,0 were almost identical for both breeding methods (Fig. 6). Lower h 2 and smaller N TS delayed the reduction in 2 G over cycles to a minor extent, whereas low had an accelerating effect. Likewise, the reduction in 2 gca = 2 gcaF + 2 gcaM was almost identical for both breeding methods and followed the same trends as for 2 G . The only difference was a faster reduction in 2 sca for HS-RRGS in C 2,0 and C 3,0 for scenarios with high . The ratio t = 100 × 2 sca ∶ 2 G decreased in an inverse logarithmic function with selection cycles t and the reduction rate was higher for HS-RRGS than FS-RRGS (Suppl. Figure S5).
For the wheat data set, the curves of 2 G , 2 gca , and 2 sca over selection cycles were at a similar level as for the corresponding scenario in maize (Suppl. Figure S4). For all scenarios with high , the rate of reduction in 2 G was for both breeding methods slightly higher than in maize due to the high initial value of and the steep reduction in 2 sca , especially for HS-RRGS. The curves for 2 gca were flatter than for maize. For the scenarios with small , the reduction in 2 sca was congruent for both methods in wheat but remained at higher level than in maize. The proportion t = 100 × 2 sca ∶ 2 G dropped substantially during the first cycle with a higher rate for HS-RRGS than FS-RRGS, especially for large N TS (Suppl. Figure S6).
The curves of 2 gca ∶̃2 gca (ratio of genetic to genic GCA variance) were for the maize and wheat data set largely congruent for both breeding methods (Fig. 7, Suppl. Figure S7). In maize, the ratio dropped from ~ 1.0 at the beginning of selection to values in advanced cycles below 0.9 for h 2 = 0.8 and N TS = 380, above 0.9 for h 2 = 0.4 and N TS = 190, and close to 0.9 in the other two scenarios. The Fig. 3 Cumulative selection gain ΣΔG (expressed in units of σ G in cycle C 1,0 ) in the hybrid population for full-sib and halfsib reciprocal recurrent genomic selection (RRGS). Results for 10 selection cycles, each consisting of four subcycles, based on SNP data from wheat. Scenarios differed for the size N TS of the training set (TS), heritability h 2 , and proportion = 100% × 2 sca ∶ 2 G of the trait ratio 2 sca ∶̃2 sca remained for all scenarios close to 1.0. In wheat, the ratio 2 gca ∶ ⌢ 2 gca started for scenarios with high at a level above 1.6 and approached rapidly its assymptote 1.0, whereas for low , the ratio was consistently below 1.0 with smaller values for large N TS . The ratio 2 sca ∶̃2 sca exceeded 1.5 at the beginning and rapidly approached the asymptote 1.0.

Discussion
Reciprocal recurrent selection of the parent populations is routine in hybrid breeding to support sustainable selection progress. Numerous studies have been conducted in maize and other crops to compare the effectiveness of different methods of reciprocal recurrent selection based on phenotypic data. The comprehensive review by Hallauer et al. (2010) indicates a slightly higher selection gain per cycle for FS over HS reciprocal recurrent selection in accordance with theoretical results (Jones et al. 1971). While these studies were invaluable for experimental quantitative genetics and breeding methodology, they are Fig. 4 Prediction accuracy r gca for GCA averaged over populations Π F and Π M for full-sib and half-sib reciprocal recurrent genomic selection (RRGS). Results for 10 selection cycles, each consisting of four subcycles, based on SNP data from maize. Scenarios differed in the heritability h 2 and the proportion = 100% × 2 sca ∶ 2 G of the trait. The training set size was N TS = 380 not meaningful for the comparison of HS-RRGS and FS-RRGS due to the fundamental differences between phenotypic and genomic selection in these breeding methods discussed below. In particularly, it was unknown, whether both methods differ in the prediction accuracy r gca and how this will affect the development of the GCA and SCA variances and the selection gain in the hybrid population over selection cycles. We resorted to simulations for investigating these questions, because they offer a unique flexibility to analyze different scenarios.

Comparison of reciprocal recurrent selection with half-sib and full-sib families
Reciprocal recurrent selection methods based on phenotypic or genomic data have three steps in common: (i) determination of the selection criterion for all candidates in each parent population, (ii) selection of candidates on the basis of the selection criterion, and (iii) recombination of the selected candidates within each parent population to generate the base material for the next breeding cycle. In phenotypic selection, the selection criterion is based on phenotypic evaluation of HS or FS families produced by inter-population crosses of the candidates in each parent population. By comparison, in genomic selection the selection criterion is Fig. 5 Prediction accuracy r gca for GCA averaged over populations Π F and Π M for full-sib and half-sib reciprocal recurrent genomic selection (RRGS). Results for 10 selection cycles, each consisting of four subcycles, based on SNP data from wheat. Scenarios differed in the heritability h 2 and the proportion = 100% × 2 sca ∶ 2 G of the trait. The training set size was N TS = 380 calculated on the basis of genomic data of the candidates as input for a statistical model (re-)trained with phenotypic and genomic data of the TS in sub-cycle C t,0 . With the HS method, the observed or predicted TC performance is used as proxy for the GCA of the candidates under phenotypic and genomic selection, respectively. For phenotypic selection with the FS method, the candidates from each parent population are selected based on the performance of their SC combination with another randomly chosen parent from the opposite population, serving as a proxy for their GCA. Consequently, selection of a candidate depends not only on its own GCA but also on the GCA of its crossing partner and SCA of the hybrid combination. This confounding of effects is reflected in the formula for the selection gain under FS reciprocal recurrent selection, where 2 sca and the GCA variance of the opposite population contribute to the denominator (Bernardo 2002, p. 201). The new feature of genomic selection applied to the FS method is that by taking advantage of the genomic relationships among the candidates in the female and male parent populations, it is possible to disentangle the GCA effects of both parents as well as the SCA of the hybrid and predict them individually (cf. our companion paper, Melchinger et al. 2023). This feature of genomic selection enables improving the selection gain in FS-RRGS far beyond that of phenotypic selection with the FS method and has received little attention hitherto. In summary, the methods of HS-RRGS and FS-RRGS compared Fig. 6 Genetic variances 2 G , GCA variances 2 gcaF + 2 gcaM , and SCA variances 2 sca among hybrids for full-sib and half-sib reciprocal recurrent genomic selection (RRGS). Results for 10 selection cycles based on SNP data from maize. Scenarios differed for the training set size N TS , heritability h 2 , and proportion = 100% × 2 sca ∶ 2 G of the trait 1 3 203 Page 12 of 17 in our study are identical with regard to steps (ii) and (iii) but they differ with regard to the selection criterion applied in step (i) for all sub-cycles to identify the candidates with highest GCA for subsequent recombination.

Prediction accuracy for GCA effects and persistency over cycles
Both the HS and FS method have in common that phenotyping of the TCs and SCs, respectively, in the TS does not yield a direct estimate of the GCA of their parental candidates unless-contrary to common practice-a broad-based tester from the opposite population would be used for the HS method. Considering genomic selection as a form of indirect selection, according to the breeder's equation (Falconer and Mackay 1996, p. 317) the selection gain ΔG for the target character Y is proportional to the coheritability r(X, Y) = h X r A (X, Y) , where h X is the square root of the heritability of the selection criterion X , r A (X, Y) is the genetic correlation between X and Y , and r(X, Y) is the correlation of X with Y , or synonymously the prediction accuracy of X for Y . In both HS-RRGS and FS-RRGS, the target character Y is the GCA of the candidates in each parent population with regard to the opposite population. For brevity, we confine our further discussion to the GCA of candidates from population F . For HS-RRGS, X is the predicted TC performance of the female Fig. 7 Ratio of genetic to genic variances 2 G ∶̃2 G , of genetic to genic GCA variances ( 2 gcaF + 2 gcaM ) ∶ (̃2 gcaF +̃2 gcaM ) , and genetic to genic SCA variances 2 sca ∶̃2 sca among hybrids for full-sib and half-sib reciprocal recurrent genomic selection (RRGS). Results for 10 selection cycles based on SNP data from maize. Scenarios differed for the training set size N TS , heritability h 2 , and proportion = 100% × 2 sca ∶ 2 G of the trait candidates in combination with the tester from the male population on the basis of Eq. (1). Thus, h X = r u,û (TC) is the prediction accuracy of the GBLUPs for TC performance and r A = r g (TC, gca) is the genetic correlation of TC performance with GCA. If the GCA of candidates and their SCA effects with the tester are uncorrelated, which holds strictly true for a randomly chosen tester but approximately also for testers selected for high or low GCA (results not shown), we can express r g (TC, gca) as a function of 2 gcaF and 2 sca where depends on the tester, with = 1 for an inbred line, = 1∕2 for a SC tester from two unrelated lines, and = 0 if the entire opposite population is used as tester. Thus, r g (TC, gca) can be much smaller than 1.0 (e.g., 0.77 and 0.62 for an inbred tester, if = 24% or 45%, respectively). Summarizing, the prediction accuracy r gca (TC) for GCA effects in HS-RRGS is obtained as product Since r u,û (TC) is little affected by the size of (results no shown), this formula explains why the values for r gca (TC) differed in C 1,0 only by a constant factor between traits with high and low values of (Suppl. Figure S8).
In FS-RRGS, the GCA of the candidates is directly predicted on the basis of the SC performance of the hybrids in the TS. Thus, h X = r u,û (SC, gca) is the prediction accuracy of the GBLUPs for the GCA effects obtained from the statistical model in Eq. (2) and r A = r g (gca, gca) = 1 so that r gca (SC) = r u,û (SC, gca) . We disregarded SCA effects in the calculation of GBLUPs for GCA effects in Eq. (2), which would require reliable estimates of 2 sca that are difficult to obtain in applied breeding programs. Altogether, the higher r gca values for FS-RRGS than HS-RRGS in our simulations are in harmony with a recent experimental study showing for forage traits in maize a higher prediction accuracy for sparse factorial designs than TC designs (Lorenzi et al. 2022).
The persistency of r gca over sub-cycles is of crucial importance for the decision, after how many sub-cycles retraining becomes necessary. The decline of r gca in successive sub-cycles followed generally the same pattern for FS-RRGS and HS-RRGS (Figs. 4, 5, Suppl. Figure S2 and S3), albeit at slightly different levels which depended mainly on the r gca values in C t,0 discussed above. The substantial drop of ~ 50% from C t,0 to C t,1 and the much smaller reductions in subsequent sub-cycles are in harmony with the simulation results of Müller et al. (2017) for genomic prediction of TC performance in synthetics produced from a large number of parents. These authors attributed this pattern to the fact that in sub-cycle C t,0 , pedigree relationships between r gca (TC) = r u,û (TC)r g (TC, gca) candidates in the TS and PS captured by markers are the main driver of r gca . However, the variation in pedigree relationships between TS and PS erodes in advanced sub-cycles so that linkage disequilibrium (LD) between SNPs and QTL becomes the primary source for genomic prediction, which is more persistent due to the low recombination rate between adjacent loci.
The reduction in r gca across cycles occurred at a much lower rate than expected from theory (Daetwyler et al. 2008) based on the decline in h 2 calculated from the values for 2 gca (Figs. 4, 6, Suppl. Figure S2 and S3) and constant 2 e in all cycles. This is most likely attributable to the accumulation of negative LD between adjacent loci over (sub-) cycles as a result of the Bulmer effect discussed below. Thus, LD became in advanced cycles a more important driver of genomic prediction and this hypothesis is supported by the relatively smaller gap in r gca between C t,0 and C t,1 for higher values of t. A notable exception from this trend was the increase in r gca between C 1,0 and C 2,0 in HS-RRGS of wheat for = 45% . This was accompanied by a substantial reduction in 2 sca , much stronger than for 2 gca (Suppl. Figure S4 and S6), which explains the increased prediction accuracy of GCA effects in the subsequent cycle.
We decided to re-train the model after four sub-cycles because at this stage, r gca had in both breeding schemes and all scenarios dropped to a level, where it seems no longer rewarding to continue genomic selection. In practice, the decision for re-training aims at maximizing the selection gain per time unit under given financial and logistic resources. Hence, it depends on numerous factors detailed by Müller et al. (2017), which are beyond the scope of this paper.
Combining the TS data from cycle t − 1 to the TS for retraining the model in cycle t hardly improved the prediction accuracy of both methods (results not shown). Since successive cycles were separated by four generations of selection and intermating, they display low levels of pedigree relationships and linkage phase similarity so that combining old and new sources of information had little effect on the prediction accuracy. For HS-RRGS, the carry-over of phenotypic information across selection cycles was further hampered by choosing in each cycle a new tester with outstanding TC performance in the previous cycle.

Development of GCA and SCA variances over selection cycles
Since r gca differed little between HS-RRGS and FS-RRGS and the selection and recombination schemes were identical for both methods (Suppl. Figure S1), the curves for 2 G (and its major component 2 gca ) as a function of the selection cycle were almost congruent for both breeding methods (Fig. 6, Suppl. Figure S4). Based on the effective population size N e = 20 of the selected fraction in all sub-cycles, the expected reduction in 2 gca due to genetic drift in one cycle (≙ four sub-cycles) amounts to 1 − (0.975) 4 ∧ = 10% (Falconer and Mackay 1996, p. 59). This corresponds to about one quarter of the ~ 40% drop in 2 gca observed from C 1,0 to C 2,0 in all scenarios.
The remaining larger part of the reduction in 2 gca is due to selection and reflects the changes in allele frequencies and build-up of negative covariances among QTL due to the Bulmer effect (Bulmer 1971). To disentangle these causes, we examined the reduction in the genic GCA variance ̃2 gca , which reflects exclusively the changes in allele frequencies due to drift and selection, including hitchhiking effects, but which is unaffected by the Bulmer effect. Since the QTL effects were randomly assigned to the QTL in set Q , 2 gca and ̃2 gca were similar in cycle C 1,0 as expected (Fig. 7, Suppl. Figure S7). An exception was the wheat data set, where the ratio of genetic to genic variance was much larger than 1.0 for all variance components, most likely due to the high level of LD among adjacent loci in the population of males and its high linkage phase similarity with the population of females. After two selection cycles, however, 2 gca was always about 10% smaller than its counterpart ̃2 gca due to the predominance of negative covariances between pairs of QTL, revealing that the Bulmer effect explains a substantial proportion of the reduction in 2 gca in all scenarios of both breeding schemes. The increase in 2 gca from cycle C 1,0 to C 2,0 for = 45% in the wheat data set (Suppl. Figure S4) is attributable to the substantial increase of the genetic distance between the parent populations (results not shown) as a consequence of selection, which entails an increase of 2 gca at the expense of a sharp reduction in 2 sca , as expected from theory (Reif et al. 2007).
The higher rate of reduction in 2 sca than 2 gca over selection cycles for scenarios with high (Fig. 6,Suppl. Figure S4) observed for both methods is partly attributable to the stronger effect of genetic drift on the reduction in 2 sca because the latter depends on the product of the variance-reducing effect of inbreeding in each parent population and amounts to 1 − (0.975) 4 2 ∧ = 19% over four subcycles. Interestingly, the ratio 2 sca ∶̃2 sca was close to 1.0 across all selection cycles in maize and after four cycles in wheat suggesting that covariances of dominance effects display no Bulmer effect. The slower reduction of 2 sca for FS-RRGS compared to HS-RRGS can be explained by differences in the selection pressure for SCA effects. Since the TS for FS-RRGS is composed of a large number of inter-population SCs, each parent population acts as a kind of broad-based tester for the opposite population so that the predicted GCA effects are hardly affected by SCA effects. By contrast, GCA and SCA are confounded in the TC performance for HS-RRGS so that selection affects equally both types of effects.
Altogether, our simulation results support the apprehension of previous studies (e.g., Heslot et al. 2015) that the high selection intensity enabled by genomic selection can cause a rapid depletion of 2 G in both breeding schemes. The rate of reduction in GCA variances is closely associated with the level of r gca and consequently, the risk of narrowing the genetic base is greatest under those conditions, where genomic selection is most successful. Hence, it seems prudent to estimate regularly the genetic variance components in the TS so that breeders can take effective counter measures to avoid a genetic narrowing of their germplasm.

Selection gain and limits under both breeding methods
We determined the cumulative selection gain ΣΔG achieved in the hybrid population by the mean of the genotypic values of the complete factorial of crosses among the gametes produced by the selected candidates. This corresponds to the sum of the selection gain for GCA in the two parent populations except for marginal deviations due to dominance effects. Thus, differences in the curves of ΣΔG between both methods are primarily due to the corresponding differences in r gca and GCA variances. A further factor was that the TS included twice as many candidates for FS-RRGS as for HS-RRGS because they had slightly higher r gca than the candidates in the PS (Figs. 4, 5, Suppl. Figure S2 and S3), their contribution to ΔG in C t,0 was slightly higher.
We observed no crossing of the curves of ΣΔG for HS-RRGS and FS-RRGS (Figs. 2, 3). Hence, the superiority of one method observed in the first cycle carries over to subsequent cycles. Thus, only few sub-cycles are required for comparing both methods in breeding experiments. Moreover, there is no incentive for changing the breeding method between early and late cycles. Only if no genetically distant parent populations are available in the initial phase of the program as applied to our example from wheat, one might start with FS-RRGS due to its higher r gca in C 1,0 and possibly continue afterwards with HS-RRGS depending on the genetic architecture of the trait(s) of primary interest.
For both methods, the rate of increase in ΣΔG and the selection limit were much higher for traits with low . In this case, half of the upper limit for ΣΔG was already reached after one cycle (≙ four sub-cycles), whereas for large at least twice as many cycles were required. Doubling h 2 increased ΣΔG more than doubling N TS from 190 to 380.
Hence, to ensure a high long-term ΣΔG , it is particularly important to use large N TS in combination with high h 2 at the beginning of RRGS because otherwise numerous favorable Page 15 of 17 203 alleles are lost, which cannot be compensated by increased inputs in later cycles.
Within all cycles, the selection gain was by far largest in sub-cycle C t,0 (Figs. 2, 3). This is attributable to the strong reduction in r gca and GCA variances in later sub-cycles discussed above. Further, selection in C t,0 was among DH lines with the full amount of 2 gca whereas selection in later subcycles was among S 0 plants with half the amount of 2 gca . An important question for ΣΔG in HS-RRGS concerns the choice of the tester. According to theory (Rawlings and Thompson 1962) and experimental results, low-performing testers with low frequency of favorable alleles at important loci are most effective for improving GCA (Hallauer et al. 2010). By contrast, we adapted the common practice in hybrid breeding and chose from the previous cycle the top line with highest predicted GCA from the opposite population as tester, because in this case the best TCs evaluated in the TS represent already promising hybrids of direct use for commercialization. For large , it follows from Eq. (5) that using a single-cross tester increases prediction accuracy for GCA effects and consequently also ΣΔG , but this would rule out direct development. Updating and choice of the best tester from the lines of the previous cycle most likely helped to push the genetic composition of each parent population in a direction that it optimally complemented the opposite population of the heterotic pattern, but further research is warranted to investigate the choice of the tester on the longterm selection in HS-RRGS.
In our selection scheme, we restricted the maximum number of individuals selected from an intra-population FS family in the recombination step to five genotypes. Alternatively, inbreeding in each parent population could be mitigated by applying optimum contribution selection (Woolliams et al. 2015) and/or selection amongst heterozygous genotypes with greater gametic variance (Bijma et al. 2020;Lehermeier et al. 2017;Müller et al. 2018), which could be further combined with optimal cross selection (Gorjanc et al. 2018). These modifications could be applied equally to both methods but most likely do not change their ranking, yet further research is warranted to quantify their effect.
As a spin-off of reciprocal recurrent selection, breeders are interested to identify in every (sub-)cycle the best genotypes for cultivar development. The usefulness criterion proposed by Schnell and Utz (1975) is a suitable tool for comparing the two methods with respect to the expected performance level of the selected genotypes. For hybrid genotypes, usefulness is a function of the mean of the hybrid population and 2 gca of each parent population. Since the two methods hardly differed in 2 gca for most scenarios, their ranking with regard to the usefulness criterion followed closely that for ΣΔG discussed above. With FS-RRGS, however, it is possible to predict not only the GCA but also the performance of the hybrids in the entire factorial between all lines (i.e., TS ∪ PS) in the parent populations, which opens new avenues for improving the efficiency of hybrid breeding.

Limitations of our study
While simulations are a powerful tool for analyzing the questions addressed in our study, they are meaningful only if the underlying model allows a simplified but nevertheless faithful representation of reality. This is of utmost importance for long-term selection because even minor deviations irrelevant for a single selection cycle will accumulate exponentially over cycles and may therefore lead to erroneous conclusions regarding the long-term prospects. A critical assumption underlying our simulations concerned the genetic architecture of the traits. We choose our genetic models ignoring epistasis based on experimental data of maize detailed in our companion paper ). Since little comparable information is available on self-fertilizing species, we resorted to a literature review about the importance of GCA and SCA variances in autogamous and partially allogamous crops (Suppl . Table S1). In general, the values underlying our simulations were in agreement with the estimates from these experimental studies.
The marker densities in our simulations correspond to those of custom-made SNP chips currently used in the commercial sector. Higher marker densities resulted only in marginal improvements of r gca (DoVale et al. 2022;Müller et al. 2017). We ignored the effects of mutation for retarding the depletion of 2 G (Walsh and Lynch 2018) as this should affect the long-term ΣΔG of both methods equally. We also disregarded genotype × environment interactions in the phenotyping of the TS. If a highly selected tester with proven stability over environments is used in HS-RRGS, one might expect that TCs show smaller interactions and consequently higher h 2 than the unselected hybrids in the TS for FS-RRGS, which would increase their r gca . Experimental comparisons of both methods should therefore be conducted in multiple environments and consider genotype × environment interactions in the statistical model for analyses due to their impact on the prediction accuracy (Acosta-Pech et al. 2017;Basnet et al. 2019;Crossa et al. 2017).

Conclusions
For hybrid breeding in crops and heterotic patterns with high 2 sca for important agronomic traits such as yield in temperate maize, FS-RRGS is the method of choice because with high h 2 or large N TS , ΣΔG is ~ 3-10% higher than what is achieved with HS-RRGS. In contrast, for traits with mainly additive gene action, one might consider HS-RRGS, because