Genomic Prediction: Progress and Perspectives for Rice Improvement

Bartholomé, Jérôme; Prakash, Parthiban Thathapalli; Cobb, Joshua N.

doi:10.1007/978-1-0716-2205-6_21


Part of the book series: Methods in Molecular Biology (MIMB, volume 2467)


Abstract

Genomic prediction can be a powerful tool to achieve greater rates of genetic gain for quantitative traits if thoroughly integrated into a breeding strategy. In rice as in other crops, the interest in genomic prediction is very strong with a number of studies addressing multiple aspects of its use, ranging from the more conceptual to the more practical. In this chapter, we review the literature on rice (Oryza sativa) and summarize important considerations for the integration of genomic prediction in breeding programs. The irrigated breeding program at the International Rice Research Institute is used as a concrete example on which we provide data and R scripts to reproduce the analysis but also to highlight practical challenges regarding the use of predictions. The adage “To someone with a hammer, everything looks like a nail” describes a common psychological pitfall that sometimes plagues the integration and application of new technologies to a discipline. We have designed this chapter to help rice breeders avoid that pitfall and appreciate the benefits and limitations of applying genomic prediction, as it is not always the best approach nor the first step to increasing the rate of genetic gain in every context.



1 Introduction

The objective of every plant breeding program is to provide improved varieties that meet the needs of key stakeholders (value chain participants from farmers up to consumers). A clear understanding of the biology and the genetics of the species combined with a targeted product concept is a key element to achieve this objective [1]. However, the genetic landscape that a breeder needs to explore to identify superior products is very large and materially exceeds the capacity of breeding programs [2]. Indeed, plant breeding can be considered as a numbers game where breeding schemes are designed to increase the probability of finding genotypes with desirable combinations of characteristics using a limited amount of resources [3]. The breeding scheme is the conceptual framework that captures all the activities that a breeder does during a breeding cycle. A single breeding cycle can be summarized in four major parts: creation, evaluation, selection, and recombination [4] and is designed to create new variation, accurately assess the performance of the breeding germplasm, and to recombine selected individuals to form an improved cohort. Evaluation is a central part of a breeding scheme which involves multiple phenotyping steps designed to estimate the heritable genetic value (or breeding value) of the selection candidates [5]. In the case of yield, usually a set of genotypes preselected for highly heritable traits are evaluated in multi-environment trials (MET) intended to represent the target population of environments (TPE) in which the product is expected to perform [6, 7]. These final steps of the evaluation process require significant resources and span over multiple years in a majority of plant breeding programs [3]. To overcome this limitation and increase the efficiency of breeding programs, several methodologies and tools have emerged over the last three decades due in large part to improvements in the characterization of DNA polymorphisms and computing power [8]. 
Among them, methods that use molecular information to infer phenotypic performance (such as marker-assisted selection [9, 10] and genomic selection [11]) are important tools that allow modern breeding programs to maximize the use of their limited resources. Contrary to classical marker-assisted selection, genomic prediction accounts for quantitative trait loci of both large and small effect, thus capturing a higher proportion of the genetic variance of a trait [12, 13].

The concept of genomic selection was first proposed by Meuwissen et al. [11] for animal breeding. In this simulation study, the authors used molecular markers to predict the genetic value of juveniles without phenotypic records, estimating the marker effects from the animals of the two previous generations. They obtained high accuracies for the predicted breeding values (genomic estimated breeding values, GEBV) and concluded that this approach to increase the rate of genetic gain has potential when coupled with techniques to reduce generation intervals. Genomic selection commonly refers to the process where selection candidates, which are only genotyped, are selected based on their GEBV (genomic predictions). To achieve this, the marker-phenotype relationship is first modeled using a training set (a smaller representative set of individuals that reflects as closely as possible the genetics of the individuals intended for prediction) on which phenotypic and genome-wide marker data are both generated [12, 14]. Model performance is most often evaluated as the correlation between the predicted and observed values in a validation population whose composition depends on the validation strategy [15]. This metric is usually called accuracy when predictions are compared to breeding values and predictive ability when they are compared to phenotypic performances.
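As a minimal sketch of the metric described above, the correlation between predicted and observed values in a validation set can be computed as follows. The function name `pearson` and the example values are hypothetical and serve only as an illustration; they are not taken from the chapter's data or R scripts.

```python
import math


def pearson(predicted, observed):
    """Pearson correlation between predicted and observed values."""
    n = len(predicted)
    if n != len(observed) or n < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_p = sum(predicted) / n
    mean_o = sum(observed) / n
    # Sums of cross-products and squares of the centered values
    s_po = sum((p - mean_p) * (o - mean_o) for p, o in zip(predicted, observed))
    s_pp = sum((p - mean_p) ** 2 for p in predicted)
    s_oo = sum((o - mean_o) ** 2 for o in observed)
    if s_pp == 0 or s_oo == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return s_po / math.sqrt(s_pp * s_oo)


# Hypothetical validation set: observed yields (t/ha) and their GEBV-based predictions
observed_yield = [5.1, 4.8, 6.0, 5.5, 4.2]
predicted_yield = [5.0, 4.9, 5.7, 5.6, 4.5]
print(round(pearson(predicted_yield, observed_yield), 3))
```

When the observed values are phenotypes, the result corresponds to predictive ability; when they are (true or estimated) breeding values, it corresponds to accuracy.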

The accelerated development of mid- and high-density genotyping technology during the 2000s led to the first report of the practical use of genomic prediction in dairy cattle [16] followed by important contributions by breeders working in agriculturally important plant species [17, 18]. Indeed, genomic prediction is now an intense field of research seeking to optimize its use and integration into both plant and animal breeding programs globally. Important advancements have been made regarding our understanding of the major factors affecting the accuracy of the GEBVs including the effective population size of the breeding program, the heritability and genetic architecture of the target traits, the size and the composition of the training population, as well as the number, distribution, and informativeness of the markers [19]. Genomic prediction models and their implementation in software tools have also received special attention in order to efficiently leverage all information contained not only in genomic and phenotypic datasets, but also in other sources of “omics” data [20]. While the drivers of prediction accuracy are increasingly well understood, the question of how genomic prediction best integrates into an existing plant breeding strategy remains a challenge since breeding programs operate in a wide variety of contexts (target traits, species, resources, scale, etc.).

Rice (Oryza sativa) is a model species for molecular biology [21] and a staple food for a large part of humanity. Important gains in productivity were obtained thanks to the breeding efforts during and immediately following the green revolution [22, 23]. These improvements were realized mostly through phenotypic selection in large segregating pedigree nurseries [22, 24, 25]. The use of molecular markers was also key for the introgression of major alleles conferring resistance to biotic [26] or abiotic stress [27, 28]. The success of this strategy depended heavily on the high heritability and simple genetic architectures of the traits under selection (plant height, maturity, disease resistance, grain type) and the very large and well-characterized genetic diversity of O. sativa [29, 30] and closely related species such as O. glaberrima (African rice), O. rufipogon, or O. nivara [31, 32]. This may explain why interest in implementing genomic prediction in the global rice breeding community has been delayed relative to animal breeding or the breeding of traditionally cross-pollinated crops like corn. It bears mentioning that, during this time, some key advancements were made through population improvement via a recurrent selection strategy in Latin America [25, 33]. More recently, however, the acceleration in genetic gain for yield in other species, the decreasing costs of genotyping, and the growing importance of sustainability in rice production have contributed to an increased interest in deploying genomic prediction in rice breeding.

In this chapter, we first give an overview of the research on genomic prediction in rice with a focus on studies that make use of the strategy in a breeding program. Then we highlight important considerations for the integration of genomic prediction into a rice breeding scheme. In this second part, aspects such as identifying the entry points for genomic selection in a breeding scheme, the effective design of training populations, strategies to reduce the generation interval, and the importance of data management systems are presented. In the third part, we take the International Rice Research Institute (IRRI) breeding program for irrigated systems as an example for the integration of genomic prediction into a product development program and provide the associated data and R scripts to run and interpret the analysis (available in Data 1, 2, and 3). In the last part, we present interesting progress in genomic prediction that can further help rice breeding programs to increase their efficiency. Our objective for this chapter is to provide rice breeders with a solid foundation for understanding the advantages and limitations of using genomic prediction in their breeding strategy to maximize the rate of genetic gain for relevant traits. Due to the heavy presence of inbred rice in Asia, we chose to focus the scope of this chapter to inbred Asian rice (O. sativa) though the specificities of applying genomic prediction to hybrid rice are addressed to a lesser extent. For another viewpoint on the importance of genomic prediction for rice breeding, we refer the reader to the book chapters from Spindel and Iwata [34] and Ahmadi et al. [35].

2 Genomic Prediction Works in Rice

The literature on genomic prediction for crop species is very rich. With over 50 studies published since 2014 (Table 1 [36, 38, 40–91]), genomic prediction in rice is no exception. We report on most of the studies published in rice (either exclusively or in concert with other species) in order to highlight the volume and diversity of the work conducted to date and their relevance for improving breeding strategies. To achieve the latter, we intentionally emphasize studies focused on integration with breeding programs, which tend to report the more practical challenges of the implementation.

Table 1 Studies on genomic prediction in rice. When multiple data sets were used in a study, the information is reported only for the rice dataset


2.1 General Overview

The first studies reporting the use of genomic prediction on rice were published in 2014 (Table 1). Despite the wealth of genomic and marker resources available in rice, these studies came, surprisingly, 5 years after the first studies on genomic prediction (using real data) that were published in maize [92], wheat [93], or barley [94]. The breadth of genomic resources available to rice and the depth of genetic diversity that has been characterized so far have led to the discovery of many major QTL with reasonable effect sizes. While a unique and valuable resource for the rice breeding community, the heavy focus on discovery, characterization, and introgression of large-effect QTL from exotic germplasm may have served to delay the transition toward genomic prediction [95]. The type of populations evaluated in these early genomic prediction studies in rice tends to reinforce that impression (Fig. 1a). Indeed, among the first three studies published in 2014, two were based on the same diversity panel [37] and one on hybrids derived from a mapping population (an immortalized F₂ [39]). Overall, diversity panels which were, in many cases, designed for association studies [37] represented a large proportion of the studies published so far (Fig. 1a). For most of these studies, the objective was methodological: understanding the impacts of population structure, integration of prior knowledge on trait genetic architecture, training set optimization , model comparison or integration of crop models without direct implication in a breeding program. Given the extent of ancestral subpopulation structure in rice, the use of diversity panels to assess genomic prediction models is likely to induce bias in the estimation of the predictive ability. Indeed if the population structure is not taken into account, most of the predictive ability can arise from the ability to predict between subpopulations and not within subpopulations [36]. 
Apart from studies based on diversity panels, 16 studies used breeding lines, nine studies focused on hybrids, six used a mapping population, four studies were based on synthetic populations, and three used cultivars (Fig. 1a).

In addition to the wide variety of populations encountered in these studies, the size of the population, the number of markers, the number of phenotypes, and the number of environments used to characterize the populations were highly variable (Fig. 1b). The largest population size (2265) was achieved using publicly available data from the 3000 rice genome project [87]. Given the limitations and difficulties surrounding the collection of high-quality phenotype data, understandably most studies employed population sizes around 300 (Table 1). In cases where large populations of 1000 or 2000 individuals were used, the phenotyping was done in a very limited number of environments (usually 1 or 2). In fact, fewer than half of the studies used more than three environments for phenotypic evaluations (Fig. 1b). Among the three studies having phenotypic information in 10 or more environments (year, season, or location), two are based on germplasm from breeding programs [51, 66] but the datasets were unbalanced (not all individuals phenotyped in all environments or genotyped). The third study, by Jarquin et al. [88], used the information from 51 environments in combination with day length to predict days to heading for untested genotypes. Among the wide variety of traits considered, flowering time (or maturity), plant height, and grain yield were the most common. The number of markers ranged from 162 [50] to four million [89], with the majority of the studies using a few thousand markers (Table 1). Genotyping by sequencing and fixed SNP (single nucleotide polymorphism) arrays were the most commonly used technologies. In some cases, very high marker densities were obtained through whole-genome re-sequencing at generally low coverage (1× or 2×) followed by imputation [75, 87].

Statistical methods for genomic prediction have been a central focus of many studies across all species where it has been applied. Across the 54 rice studies, 33 different methods were evaluated with the genomic best linear unbiased prediction (GBLUP) method being the most used (Fig. 1c). Since this method was proposed [96], its flexibility and robustness have enabled it to quickly become a reference method for both animal and plant breeding. Similar to the traditional pedigree BLUP [97], GBLUP uses an additive relationship matrix that is based on markers instead of pedigree information. Several extensions or variations of this additive model have been proposed to account for dominance and/or epistasis [38, 55] or to use other “omics” data (transcriptome or metabolome) to estimate relatedness among individuals [82, 91]. In addition to GBLUP , RKHS (reproducing kernel Hilbert space), frequentist and Bayesian LASSO (least absolute shrinkage and selection operator), RR-BLUP (ridge regression BLUP), RF (random forest), SVM (support vector machine), PLSR (partial least squares regression), BayesB, and BayesC were the most used methods in these studies on rice (Fig. 1c). Other methods from the large family of machine learning approaches, such as gradient boosting machine (GBM) or artificial neural network (ANN), were also evaluated in the context of genomic prediction with mixed results [70, 87].
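To make the idea of a marker-based additive relationship matrix more concrete, the sketch below builds one from allele counts. It assumes one common formulation (markers coded as 0, 1, or 2 copies of an allele, centered by twice the allele frequency and scaled by the sum of 2p(1 − p) across markers); the text above does not specify which formulation each study used, and the function name `genomic_relationship` and the toy genotypes are hypothetical.

```python
def genomic_relationship(markers):
    """Marker-based additive relationship matrix.

    markers: list of individuals, each a list of allele counts (0, 1, or 2),
    with no missing values. Assumed formulation: G = Z Z' / (2 * sum(p * (1 - p))),
    where Z holds the allele counts centered by twice the allele frequency p.
    """
    n = len(markers)
    m = len(markers[0])
    if any(len(ind) != m for ind in markers):
        raise ValueError("all individuals must have the same number of markers")
    # Allele frequency of the counted allele at each marker
    freqs = [sum(ind[j] for ind in markers) / (2 * n) for j in range(m)]
    scale = 2 * sum(p * (1 - p) for p in freqs)
    if scale == 0:
        raise ValueError("all markers are monomorphic")
    centered = [[ind[j] - 2 * freqs[j] for j in range(m)] for ind in markers]
    return [
        [sum(a * b for a, b in zip(centered[i], centered[k])) / scale for k in range(n)]
        for i in range(n)
    ]


# Toy example: four individuals genotyped at three markers
toy_genotypes = [[0, 1, 2], [2, 1, 0], [1, 1, 1], [0, 2, 2]]
for row in genomic_relationship(toy_genotypes):
    print([round(value, 2) for value in row])
```

In a GBLUP analysis, a matrix of this kind takes the place that the pedigree-based relationship matrix occupies in traditional BLUP; in practice, dedicated mixed-model software would be used rather than hand-written code.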

The composition of the validation set, which can play an important role in determining the accuracy of predictions, was highly dependent on the validation strategy used in each study (Table 1). Sallam et al. [15] defined three main types of validation methods: cross-validation (subset validation), interset validation, and progeny validation depending on the composition of the training and validation sets. The cross-validation or subset validation (k-fold, leave-one-out, random, or stratified sampling) was by far the most used strategy among all of the studies that we have compiled (Fig. 1d). This validation method is very convenient because the data only need to be partitioned into training and validation sets to estimate accuracies, without an “independent” dataset (as is needed for interset or progeny validation). Due to its nature, cross-validation tends to overestimate the accuracy of prediction compared to more realistic validation scenarios [59, 98]. The situation becomes even more complex when multivariate models are used [99]. Another approach close to cross-validation, the HAT method [57, 100], was used in four studies. This method, based on the hat matrix of the random effects, uses the predicted residual sum of squares to estimate accuracy of prediction and works in the context of GBLUP, RKHS, and Bayesian models. This method is considerably faster than cross-validation as no additional model retraining is necessary [100]. The interset and the progeny validation methods were only used in three studies each (Fig. 1d). Considering the context of breeding programs where the integration of genomic prediction is primarily targeted to reduce cycle time, progeny validation represents a more meaningful assessment of the performance of prediction models. Indeed, in the initial concept of genomic selection, Meuwissen et al. [11] used progeny validation: models were built with data from generations 1001 and 1002 and the accuracies were calculated using the predicted values and the true breeding values from the generation 1003. Moreover, the decay of linkage disequilibrium occurring between markers and QTL due to the recombination in progeny generations tends to decrease the accuracy of predictions [101], but makes them more realistically interpretable in terms of applications to practical breeding scenarios. For example, Ben Hassen et al. [59] used progeny validation of inbred lines with a limited number of individuals and found lower predictive ability compared to cross-validation for the same traits.
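The partitioning step of k-fold cross-validation discussed in this section can be sketched as follows. This is a generic illustration under simple assumptions (random assignment of lines to folds, no stratification by family or subpopulation); the function name `k_fold_splits` and the line identifiers are hypothetical.

```python
import random


def k_fold_splits(ids, k, seed=0):
    """Return k (training, validation) pairs covering every id exactly once."""
    if not 2 <= k <= len(ids):
        raise ValueError("k must be between 2 and the number of individuals")
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)  # reproducible random assignment
    folds = [shuffled[i::k] for i in range(k)]
    splits = []
    for i, validation in enumerate(folds):
        # Training set = all folds except the one held out for validation
        training = [g for j, fold in enumerate(folds) if j != i for g in fold]
        splits.append((training, validation))
    return splits


# Hypothetical set of ten breeding lines split into five folds
lines = [f"line_{i:02d}" for i in range(10)]
for training, validation in k_fold_splits(lines, k=5):
    print(len(training), "training lines; validation:", validation)
```

A model would be fitted on each training set and its predictions compared with the observed values of the matching validation set. Interset and progeny validation differ in that the validation individuals come from a separate population or from the next generation rather than from a random partition of the same dataset.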

2.2 Important Findings and Current Limitations for Genomic Prediction in Rice

2.2.1 Important Findings

Table 1 provides a short summary of the main objective of each study in this review. The reader can thus be directed toward the publications that are most relevant to his or her questions. Hereafter, we summarize important results focusing mainly on those most related to the implementation in breeding programs.

1.
Genomic prediction works in different contexts. The most important result to emerge from these studies is that predicting performance from molecular markers works. Indeed, the accuracy of GEBVs is relatively high even for traits like grain yield. Many rice breeders have concerns about the efficiency of genomic prediction, but these concerns are not supported by the literature on rice, and more specifically by studies using breeding germplasm [43, 47, 55, 56, 59, 64].
2.
Prediction accuracy can be increased. Breeders can act on several factors to increase the accuracy of predictions or to reduce the cost of implementation. Better accuracies can be obtained by optimizing the composition and evaluation of the training set, by targeting informative molecular markers (polymorphic, with a medium to high minor allele frequency, and spread along the genome), or by integrating additional data (historical data, environmental covariates, crop models, etc.). The size and the composition of the training set define the strength of the genetic relationship with the selection candidates, which is one of the most important factors driving accuracy. Therefore, algorithms have been developed to select the training set [41, 44, 48, 75, 80]. Concerning molecular markers, different studies show that marker density can, to some extent, be reduced without affecting the prediction accuracy. For example, Arbelaez et al. [69] designed a cost-effective SNP assay with only 1000 markers selected to be informative in elite breeding material and obtained good accuracies.
3.
Models can predict offspring performance. The initial concept of genomic selection was based on the prediction of breeding values of offspring with the objective of decreasing the duration of breeding cycles [11]. The few studies on rice that performed progeny validation [58, 59, 91] show promising results when parental information is used to predict progeny performance. However, more remains to be done in that direction, since most of the increase in genetic gain expected from the integration of genomic prediction comes from the reduction of breeding cycle time.
4.
Genomic prediction is efficient in the context of hybrids. Many of the lessons learned regarding marker densities, training set identification, and model selection apply equally to hybrid and inbred breeding schemes. Hybrid programs do present unique challenges where predictions could be applied that are not applicable to other breeding schemes. Of note is the prediction of how males and females might be combined to create superior hybrid combinations. In hybrid rice there is some evidence that hybrid performance is driven by a convergence of additive genetics from the male and female lines. Incorporating nonadditive parameters into the prediction does not seem to help [38]. While this seems reasonable, other crops have shown a significant nonadditive component to hybrid performance (e.g., in corn [102, 103]). This particular conclusion was likely biased by a very narrow genetic base and very low accuracy for interset prediction of grain yield. There is also evidence that multi-trait models can improve prediction accuracy for low heritability traits in hybrid rice [56, 83]. This is of particular importance in the hybrid context as many traits (especially cost of goods traits like hybrid seed production yield) are particularly difficult to measure early in a breeding program. A distinctive feature of hybrid programs is the opportunity to measure per se performance of the inbred parents as well as hybrid performance of the same material, which yields a set of correlated phenotypes. Using parental phenotype data combined with data on hybrid performance can improve the prediction accuracy of hybrid rice yield by about 13% [91].
5.
Modeling GxE increases prediction accuracy. Whether through multi-environment genomic prediction models [58, 64] or by combining crop growth models and genomic prediction models [50, 88], several studies demonstrated that these approaches predict environment-specific performance more accurately. A key advantage of genomic selection over traditional phenotypic selection in the case of multi-environment models is the ability of models to assess marker effects and marker-by-environment interaction effects and ultimately increase the prediction accuracy [18, 104]. With the integration of crop growth models in the genomic prediction framework, the response of the genotype to environmental variation is modeled, which allows the prediction of the performance of selection candidates in untested environments [105]. This approach is very promising for rice improvement because it takes better account of GxE. However, the routine use of crop growth models in breeding programs requires a substantial investment in terms of data acquisition and analysis and is therefore most relevant for specific rice systems prone to environmental constraints.
6.
Differences between genomic prediction models are marginal. Most of the studies comparing statistical models for genomic prediction found small or no differences between them in terms of accuracy [20]. In general, none of the models is consistently better for all the traits or validation methods. GBLUP is usually used as a reference due to its simplicity, its versatility to include different types of information, and its robustness to different trait architectures. Bayesian models (such as B-LASSO, BayesB or BayesC) or RKHS can perform better when dealing with traits influenced by large-effect genes (such as flowering time or blast resistance). The few studies that used machine learning methods (such as ANN or SVM) reported disappointing results with very variable performances even after optimization of the parameters [70, 87]. Further work in this direction is probably needed before conclusions can be drawn about the value of these methods for routine genomic prediction.
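As a small illustration of the second finding above (targeting informative markers), the sketch below filters markers on minor allele frequency. The 0.05 threshold, the function names, and the toy genotypes are assumptions chosen for illustration only; the studies cited above applied their own criteria, which also included the distribution of markers along the genome.

```python
def minor_allele_frequency(genotypes):
    """Minor allele frequency of one marker.

    genotypes: allele counts (0, 1, or 2) per individual; None marks a missing call.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))
    return min(p, 1 - p)


def filter_markers(marker_columns, min_maf=0.05):
    """Keep markers whose minor allele frequency reaches min_maf (assumed threshold)."""
    return {
        name: column
        for name, column in marker_columns.items()
        if minor_allele_frequency(column) >= min_maf
    }


# Toy data: three markers scored on four inbred lines
toy_markers = {
    "snp_1": [0, 0, 0, 0],      # monomorphic, carries no information
    "snp_2": [0, 2, 2, 0],      # segregating in the material
    "snp_3": [2, 2, 2, None],   # monomorphic among called genotypes
}
print(sorted(filter_markers(toy_markers)))
```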

2.2.2 Current Limitations

In spite of the number and diversity of studies, there are still some points that are not well covered in the literature on rice. Depending on the context, they can be limiting for harnessing the full potential of genomic selection.

1.
Accuracy alone is not enough to assess the effectiveness of genomic prediction. Almost all the studies based their evaluation of genomic selection on the accuracy of the predictions. Although accuracy is an important factor in assessing the efficiency of a prediction model, it does not indicate which individuals are ultimately selected by the different methods. The realized selection differential would probably be a better metric to compare different genomic prediction approaches, because breeders jointly consider several traits to advance material, which makes the evaluation of each trait separately less relevant. Finally, as rightly pointed out by Bassi et al. [106], the phenotype is also only a predictor of the true breeding value and has an error variance just like a GEBV.
2.
Within-family prediction accuracy is not sufficiently taken into account. No study on rice has looked in detail into within-family prediction accuracy using multiple biparental families or parental information as the training set. Indeed, except for the specific case of studies using one biparental family, reports on within-family accuracy are scarce. This is manifest as well in the hybrid literature where most papers focus on predicting specific hybrid combinations and do not attempt to estimate general combining ability among a cohort of new males or females. This is however a key point when it comes to the implementation of genomic prediction since greater within-family accuracy can help to increase the rate of genetic gain while balancing the level of inbreeding in the population. Differences between crosses are better predicted as both within and between family variations are captured by the model [107, 108].
3.
Grain quality and disease resistance traits have been neglected. No study related to the nutritional value of the polished grain (zinc content, glycemic index, etc.) has been published to date. Only one study assessed the potential of genomic prediction to help decrease the level of arsenic in the grain through breeding [74]. Regarding disease resistance, the only study, by Huang et al. [77], reported accuracies ranging from 0.15 to 0.72 for the prediction of resistance to several isolates of Magnaporthe oryzae (blast). For disease resistance, rice geneticists focus mainly on major genes, but targeting quantitative variation is also important to address concerns such as resistance being overcome by the pathogen. For grain nutritional value, negative correlations between traits can be better addressed using multi-trait genomic prediction.
4.
Implementation in breeding programs is secondary. While it is clear that the underlying goal of all studies is to improve our knowledge of genomic prediction to optimize breeding strategies, few of them place their findings in the concrete case of a breeding program. For example, Spindel et al. [47] proposed to integrate genomic prediction into an irrigated rice breeding pipeline and discussed the advantages and constraints of such a scheme. However, for most of the studies working on breeding germplasm (see Table 1) this is not the case. The results therefore remain more theoretical than practical, even though such analyses are important to justify investments in genomic selection and to understand potential barriers to its implementation.
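The first limitation above suggests comparing approaches on what is actually selected rather than on accuracy alone. One simple way to do this, sketched below for a single trait, is to compute the difference between the mean observed value of the individuals selected on a given criterion and the mean of all candidates. This single-trait definition, the function name, and the values are illustrative assumptions; in practice breeders select on several traits jointly, as noted above.

```python
def realized_selection_differential(observed, scores, n_selected):
    """Mean observed value of the top-ranked individuals minus the mean of all candidates.

    observed: observed performance of each candidate.
    scores: selection criterion for each candidate (e.g., GEBV or phenotype).
    n_selected: number of candidates advanced.
    """
    if len(observed) != len(scores):
        raise ValueError("observed and scores must have the same length")
    if not 0 < n_selected <= len(observed):
        raise ValueError("n_selected must be between 1 and the number of candidates")
    # Rank candidates from best to worst on the selection criterion
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    chosen = ranked[:n_selected]
    mean_all = sum(observed) / len(observed)
    mean_selected = sum(observed[i] for i in chosen) / n_selected
    return mean_selected - mean_all


# Hypothetical candidates: observed yields and the GEBVs used to rank them
observed_yield = [4.0, 5.0, 6.0, 7.0]
gebv = [0.1, 0.4, 0.3, 0.9]
print(realized_selection_differential(observed_yield, gebv, n_selected=2))
print(realized_selection_differential(observed_yield, observed_yield, n_selected=2))
```

Comparing the value obtained when ranking on GEBVs with the value obtained when ranking on the observed phenotype shows how much of the achievable differential the predictions capture for that trait.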

3 Integration of Genomic Prediction into Rice Breeding Programs: Key Aspects

Entry points for genomic selection in a rice breeding program will vary depending on the objectives of the program, the breeding strategy in place, the genetic and/or environmental constraints the breeder has to account for, and the cost of genotyping and of phenotyping the traits under selection. However, a breeding program’s readiness to implement genomic prediction should be assessed against key prerequisites before the technology is integrated. In the absence of essential components such as (a) clear objectives, (b) meticulous data management, (c) effective operations, (d) effective phenotyping, and (e) selection based on BLUP, the usefulness of genomic predictions is extremely limited [4]. Executing genomic prediction using breeding data or specially designed training sets is useful for establishing baseline capacity to do prediction, but integrating the technology into an existing breeding program can be a challenge. Breeding programs represent multi-year pipelines that manage overlapping cohorts of germplasm, so changes to the strategy are often made stepwise so as not to disrupt the product development process. The purpose of this section is to provide guidelines regarding important elements to consider before implementing a genomic selection strategy in a rice breeding program.

3.1 Map the Breeding Strategy

The main value of genomic prediction lies in its use in decision making to efficiently select breeding material at one or several stages of the breeding scheme. Therefore, a clear understanding of the breeding strategy and its different components is the basis for an efficient integration of genomic prediction. Oftentimes, the breeding scheme resides in the head of the breeder, and translating this knowledge into a structured framework is a mandatory step to carefully design alternative schemes [109]. Genomic prediction is a long-term investment for the breeding program and the direct transition to an optimal genomic selection strategy is not always possible. Therefore, a transition plan needs to be elaborated by the breeding team and experts in order to define clear steps to achieve the objectives. This aspect is usually not reported in the literature on genomic prediction as it comes down to more technical information regarding the breeding scheme. In rice, only one study placed the results in the framework of a breeding program and detailed the use of genomic prediction and its potential impacts [47]. However, as shown in wheat, this step of breeding scheme characterization is essential for the integration or the optimization of genomic selection based on the knowledge acquired during the last years [106, 110].

Optimal genomic selection schemes are usually not simple evolutions of the current breeding scheme. The majority of conventional breeding schemes in rice, and self-pollinated crops in general, rely on pedigree breeding [25], but genomic selection is best suited to recurrent selection schemes based on elite by elite crosses to improve complex traits. Indeed, a well-structured breeding program where the elite germplasm has been clearly identified and with a small effective population size (Ne ≈ 40) is more likely to benefit from the use of genomic prediction due to higher linkage disequilibrium between markers and QTL, little or no population structure, and higher relatedness among genotypes. In addition, several major changes are needed to fully leverage genomic predictions: reduce cycle time, build a training set, store/use phenotypic and genotypic data, and reallocate budget and staff [106, 111]. Understanding the interconnections between these changes and how they will impact the sequence of current operations makes it possible to anticipate potential obstacles.

Key recommendations:

1.
Define clearly the current breeding strategy and its objectives.
2.
Plan the integration of genomic prediction as a long-term investment with a clear roadmap.
3.
Use recurrent selection in an elite population to maximize the potential of genomic prediction.

3.2 Reduce the Cycle Time

An interesting aspect of genomic selection is that it has led to a greater focus on the fundamentals of breeding in the plant breeding community [112]. The concept of response to selection captured in the Breeder’s equation is perhaps the best example [4, 109]. Among the parameters of the equation, the generation interval (or cycle time) is the easiest to understand and to adjust. As highlighted by Meuwissen et al. [11] in their seminal paper, the use of genomic predictions can greatly increase the rate of genetic gain by reducing cycle time: “It was concluded that selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval.” This conclusion was confirmed 15 years later by the first report of the impact of genomic selection on the rate of genetic gain in dairy cattle [113]. The authors found a dramatic reduction in the generation interval related to a sharp increase in the rate of genetic gain for yield traits (50–100%). In plant breeding, methods to reduce cycle time (independently from the use of genomic selection) have been studied for several decades now [114, 115]. Rapid generation advance (RGA) or doubled haploids are probably the most common in crop species, even if more modern approaches have been proposed lately [116, 117]. In rice, RGA has regained interest recently as it is a cost-efficient way to quickly fix material (typically from F2 to F6 in 1 year) for its evaluation in replicated trials [118]. This can be realized in greenhouses, screenhouses or in the field depending on the resources available. For breeders working on a classical pedigree breeding scheme, the use of RGA could be a first step toward the implementation of genomic selection [119].
For breeding programs already implementing RGA or similar methods to reduce cycle time, genomic selection can further help to shorten the breeding cycle. However, this requires a genomic prediction model that can efficiently predict the genetic value of the next generation (progeny). Therefore, a training set based on material from one or several previous cycles has to be constituted before implementing this type of scheme. This is also the case for more aggressive strategies based on recurrent selection that aim at recombining non-fixed material (S₀) selected based on predicted values only. In that type of scheme, the population improvement part is partially decoupled from the product development part, which allows a 1-year or even shorter breeding cycle [120, 121]. For the moment, only simulation studies have reported this type of scheme since several technical challenges have to be solved before implementation. Indeed, a drastic reduction of breeding cycle time can lead to overlapping activities between different cycles during the transition period that may disrupt ongoing cycles or substantially increase the workload.
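
The role of cycle time in the Breeder's equation can be illustrated with a short calculation. The sketch below uses the usual form of the equation, gain per year = (selection intensity × selection accuracy × additive genetic standard deviation) / cycle length. The function name and all numerical values are hypothetical and chosen only for illustration; they are not estimates from any breeding program.

```python
def genetic_gain_per_year(intensity, accuracy, sigma_a, cycle_years):
    """Expected response to selection per year (Breeder's equation).

    intensity   : standardized selection intensity (i)
    accuracy    : correlation between predicted and true breeding values (r)
    sigma_a     : additive genetic standard deviation, in trait units
    cycle_years : length of one breeding cycle in years (L)
    """
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return intensity * accuracy * sigma_a / cycle_years


# Hypothetical comparison: a 5-year phenotypic cycle versus a 2-year cycle
# that relies on somewhat less accurate genomic predictions.
long_cycle = genetic_gain_per_year(intensity=1.4, accuracy=0.6, sigma_a=1.0, cycle_years=5)
short_cycle = genetic_gain_per_year(intensity=1.4, accuracy=0.5, sigma_a=1.0, cycle_years=2)
print(f"5-year cycle: {long_cycle:.3f} per year; 2-year cycle: {short_cycle:.3f} per year")
```

Under these assumed values, the shorter cycle yields more gain per year even though its selection accuracy is lower, which is the trade-off discussed in this section.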

Key recommendations:

1.
Use genomic prediction in conjunction with robust methods to produce inbred lines (e.g., rapid generation advance) to effectively reduce cycle time.
2.
Account for the technical constraints associated with cycle time reduction in the genomic prediction roadmap.

3.3 Design the Training Set

Once the entry point of genomic prediction in the breeding scheme has been defined, the design of the training set is the first step toward the implementation of genomic selection. Three major choices have to be made regarding the training set: its composition and size, its phenotyping, and its genotyping. The breeder must find a balance between these three aspects in order to optimize the training set according to available resources. A simple way for most breeding programs to get started is to begin genotyping every line that enters the yield trial. From there, those datasets can be empirically optimized to increase prediction accuracy.

It is well known that prediction accuracy increases with the size of the training set. Theoretical [122,123,124] and empirical studies [67, 125, 126] suggest that the training set size should be maximized when dealing with complex traits. However, large training sets are not always feasible, mainly due to genotyping and phenotyping costs. Several methods have been developed to optimize the training set composition in order to achieve high accuracies while keeping its size manageable [41, 44, 48, 71, 75, 80, 127,128,129]. All of these methods use the additive genetic relationships (usually based on marker data) to optimally sample a set of representative genotypes. A key aspect of the optimization of the training set is the definition of the predicted set (selection candidates). Indeed, close genetic relationships between the training set and the selection candidates are key to maximizing prediction accuracy [130, 131]. Therefore, most of the optimization methods jointly consider the genotypes that will compose the training and the predicted sets to either directly compute criteria based on relatedness (the average of the relationship coefficients between the training set and the predicted set [128, 132]) or to estimate criteria based on mixed model theory (the prediction error variance, the coefficient of determination, or the expected accuracy [41, 80, 127]). In the cases where the training and the predicted sets come from the same population (e.g., selection candidates from the same cohort) or the information on the predicted individuals is not yet available (e.g., offspring), optimization methods have been developed to minimize the genetic relationships between individuals of the training set [48, 75]. Depending on the availability of data and the prediction objectives, the breeder can choose among these optimization methods to shape the training set and update it when selection candidates from a new cycle need to be predicted.
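
As an illustration of the simplest relatedness-based criterion mentioned above (the average relationship coefficient between the training set and the predicted set), the following minimal sketch ranks the lines of a pool by their mean relationship to the selection candidates and keeps the top ones. The function names and the toy relationship matrix are hypothetical. This is not one of the published optimization methods cited here: unlike criteria based on the prediction error variance or the coefficient of determination, it ignores redundancy among the training lines.

```python
def mean_relationship(G, training, candidates):
    """Average relationship coefficient between training and candidate lines."""
    total = sum(G[i][j] for i in training for j in candidates)
    return total / (len(training) * len(candidates))


def select_training_set(G, pool, candidates, size):
    """Pick the `size` lines of `pool` with the highest mean relationship to `candidates`.

    Because the criterion is an average of per-line averages, ranking the
    lines individually maximizes it exactly for a fixed training set size.
    """
    score = {i: sum(G[i][j] for j in candidates) / len(candidates) for i in pool}
    ranked = sorted(pool, key=lambda i: score[i], reverse=True)
    return ranked[:size]


# Toy relationship matrix for five lines (hypothetical values).
G = [
    [1.0, 0.1, 0.2, 0.5, 0.4],
    [0.1, 1.0, 0.3, 0.0, 0.1],
    [0.2, 0.3, 1.0, 0.3, 0.2],
    [0.5, 0.0, 0.3, 1.0, 0.5],
    [0.4, 0.1, 0.2, 0.5, 1.0],
]
pool = [0, 1, 2]      # lines that could be phenotyped
candidates = [3, 4]   # lines to be predicted
chosen = select_training_set(G, pool, candidates, size=2)
print(chosen, mean_relationship(G, chosen, candidates))
```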

The optimization of the composition of the training set has to be done in conjunction with the phenotyping strategy. In most cases, the selection candidates that will be used to update the prediction model are evaluated for key traits in MET to estimate G × E. Since the total number of plots available for the evaluation is almost fixed, the breeder needs to balance the population size with the level of replication (within and across environments). Classically, the level of replication increases during the breeding cycle to dedicate more resources to a smaller number of more promising lines in the final stages. In the context of genomic selection, where the evaluation unit is the allele rather than the individual, increasing the size of the training set while decreasing the level of replication tends to increase the accuracy of prediction [133, 134]. The typical size of a training population (150–300) to be phenotyped in a classical fully replicated experiment can therefore be multiplied by 1.5–3 with sparse testing. However, it is advisable to have a sufficient level of replication within and across environments to: (1) maintain repeatability, especially for low heritability traits, (2) assess the level of G × E, and (3) avoid model convergence issues with too few replicates. Reducing replication through sparse testing can also be an advantage when seed availability is a constraint.
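
The trade-off between population size and replication under a fixed plot budget can be made concrete with a small calculation. The sketch below assumes a hypothetical budget of 1800 plots across three environments; the function names and the simple rotating allocation are illustrative assumptions, not the design procedure used in the studies cited above.

```python
def lines_testable(total_plots, n_environments, envs_per_line, reps_per_env):
    """Number of lines that fit in a fixed plot budget.

    Each line is grown in `envs_per_line` of the `n_environments`
    environments, with `reps_per_env` plots in each of them.
    """
    if not 1 <= envs_per_line <= n_environments:
        raise ValueError("envs_per_line must be between 1 and n_environments")
    return total_plots // (envs_per_line * reps_per_env)


def sparse_allocation(lines, n_environments, envs_per_line):
    """Assign each line to a rotating subset of environments (balanced loads)."""
    return {
        line: [(k + s) % n_environments for s in range(envs_per_line)]
        for k, line in enumerate(lines)
    }


budget = 1800  # hypothetical total number of plots
full = lines_testable(budget, n_environments=3, envs_per_line=3, reps_per_env=2)
sparse_a = lines_testable(budget, n_environments=3, envs_per_line=2, reps_per_env=2)
sparse_b = lines_testable(budget, n_environments=3, envs_per_line=2, reps_per_env=1)
print(full, sparse_a, sparse_b)

plan = sparse_allocation(["line_1", "line_2", "line_3"], n_environments=3, envs_per_line=2)
print(plan)
```

With these assumed numbers, the fully replicated design accommodates 300 lines, while the two sparse designs accommodate 1.5 and 3 times as many, at the price of fewer observations per line.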

Finally, the technology used to genotype the training and predicted sets needs to be carefully considered in order to efficiently capture distinct QTL alleles as well as general relatedness in the population. Several factors come into play when choosing or developing the appropriate genotyping technology: cost, type of markers, density, informativeness in the target population, reproducibility rate, etc. For genomic prediction, a good characterization of the genetic diversity managed by the breeding program is essential to determine the marker density needed to achieve an optimal prediction accuracy. It has been shown using both deterministic [13, 135] and stochastic [136] simulations that the marker density has to increase when the effective population size increases to maintain the accuracy [135,136,137]. However, most empirical studies in rice found that the accuracy reaches a plateau when the marker density goes beyond 2–5 markers per centiMorgan for breeding programs with an effective population size lower than 50.

Key recommendations:

1.
Maximize the relatedness between the training and the predicted sets where possible.
2.
Use sparse testing for phenotyping in order to balance the size of the training set and the level of available resources.
3.
Avoid using a training set from one breeding pipeline in order to predict the candidates from another breeding pipeline.

3.4 Generate and Integrate Good Quality Data

As highlighted before, data acquisition and management are essential components of a breeding program. All advancement decisions are made based on recorded data from multiple sources (field, laboratory, service provider, etc.). Careful data management from the seed to the phenotype and/or to the genotype has to be in place to ensure accuracy. The use of digital data collection tools is a key way to minimize the errors introduced during the data collection process. Concerningly, it has been demonstrated with simulated data that even a small percentage of severe errors (0.1% or 1%) in phenotypic records can severely reduce the response to selection [138]. Similar conclusions were reached when errors are present in the pedigree records [139]. Besides accurate data, robust and appropriately designed analysis pipelines are needed to curate the data and turn it into interpretable intelligence. Genomic prediction adds a layer of complexity compared to traditional marker-assisted selection in that it can require the integration of different types of data (phenotypes, genotypes, pedigree, and/or weather data) collected over several years to be useful. Consistency of data type and format and the stability of data structures over time are key aspects to leveraging the full power of historical breeding data to train and continuously update genomic prediction models [140].

To help the breeders with data management, software solutions such as the Breeding Management System (https://bmspro.io), Breeding4Results (B4R) (https://riceinfo.atlassian.net/wiki/spaces/ABOUT/pages/326172737/Breeding4Results+B4R), Breedbase (https://breedbase.org), or GOBii Genomic Data Management (https://gobiiproject.atlassian.net/wiki/spaces/GD/overview) are available and used in different public organizations. Despite the significant efforts to develop analysis pipelines (like the RiceGalaxy, https://galaxy.irri.org, [141]) and the Breeding API project (https://brapi.org) designed to enable interoperability among plant breeding databases, no efficient end-to-end solution is publicly available to perform genomic prediction in the context of an applied breeding program. Indeed, several limitations are present among available software for implementing genomic prediction, including a lack of direct linkages between genotypic and phenotypic data, limited multi-environment or multi-trait analytical capability, no possibility to integrate dominance or epistasis effects into a prediction model, and no meaningful integration of weather data into an analytical pipeline. The majority of public breeding programs therefore extract the phenotypic and genotypic data from their respective data management software and use ad hoc analysis pipelines to run genomic prediction models. Hopefully, projects such as the Breeding API or the Enterprise Breeding System (https://ebs.excellenceinbreeding.org) will offer these possibilities in the near future within a coherent framework designed to enable applied breeding programs.

Key recommendations:

1.
Use digital data collection systems where possible.
2.
Work with data management systems and efficient analysis routines for genomic prediction (GBLUP, RR-BLUP).
3.
Use consistent genotypic and phenotypic data structures over years to facilitate data integration.

3.5 Take into Account the Costs

The integration of genomic selection in a breeding program is a long-term investment that must translate into a better rate of genetic gain to be worth implementing. Even if the advantages of using genomic selection are clear, the optimal breeding scheme relative to genetic, operational, and cost constraints is not easy to identify. After setting a vision for what’s optimal, the need to convert to this new strategy in a budget friendly way is probably the most important limitation for the strengthening of modern breeding programs. Nevertheless, there are several levers that can be used to liberate resources in a program aiming to fully deploy genomic selection.

The first levers are related to phenotyping. Thanks to genomic prediction, some phenotyping steps can be reduced or even eliminated, saving the related costs de facto. Indeed, this is one of the main advantages of genomic prediction which, with the right data structures in place, allows for both a reduction of cycle time and phenotyping costs [111]. The costs of phenotyping and the potential to replace a phenotyping activity with a prediction should be carefully evaluated when planning the integration of genomic prediction as it may sometimes require a modification of the breeding scheme. One key example of this is the cost savings incurred when transitioning from a traditional pedigree breeding program, where the selection that occurs during the fixation steps (F2–F5) can be delayed until after inbred lines have been extracted by substituting a field-based pedigree nursery with a much cheaper and faster SSD-based RGA method. The cost savings made at this level can easily cover the cost of genotyping since advancing material through RGA is much less expensive (around 1 US dollar per F5/F6 line) [119]. Organizations must however look to multi-year budgeting strategies to accommodate the fixed costs that may be incurred if existing greenhouse facilities cannot be leveraged for this activity. Initial capital investments can often be paid for by reduced operational costs over several years. Furthermore, organizations must factor in the additional funding that could be generated due to the increase in genetic gain that will accompany a shortening of the breeding cycle and an improvement in selection accuracy.

Another direct way to recover costs is by using genomic prediction to reduce the volume of an expensive phenotyping exercise [75, 142]. This can be done by selectively phenotyping a carefully chosen subset of a trial for expensive traits like grain biochemistry or other post-harvest traits and using the cost savings to pay for DNA fingerprinting. Additionally, developing an index of correlated traits that are less expensive to measure or offer higher throughput than the target trait can decrease the cost of phenotyping and offer similar accuracy. In that context, multi-trait genomic prediction offers an ideal framework to integrate correlated traits to maximize prediction accuracy [143].

The second levers are related to genotyping. In a crop breeding program, the choice of the genotyping technology to characterize the breeding germplasm (training and prediction sets) is mostly driven by the cost of genotyping per sample (and not really well captured by the cost per data point) [144]. Indeed, the cost per sample with available tools (genotyping-by-sequencing or fixed SNP arrays) is often too high to be used routinely in a public breeding program. In small to medium size breeding programs, the cost per sample has to be around 10 US dollars or less in order to assess a sufficient number of individuals. In that price bracket, the number of loci that can be currently targeted is around 1000–5000 SNPs. One option to keep costs down in the long term is to design a custom genotyping assay with SNPs selected to be specifically informative in the target breeding population. This would be a cheaper option than GBS or public fixed arrays and allow for higher density of information content in the genotype dataset. A custom SNP panel has the additional benefit of potentially surveying specific trait markers of relevance to a breeding program in addition to the genome-wide markers included in the set, thus allowing for more extensive QTL profiling of lines for known alleles that are not necessarily prioritized for MAS. In fact, depending on the capability of the genotyping service provider, it is not unreasonable to save sampling and DNA extraction costs by combining MAS and fingerprinting such that the cohort is screened with a few markers intended for MAS, then to have the DNA from selected lines re-arrayed into a new plate for genome-wide fingerprinting.

It is also possible to achieve low genotyping cost by using low-coverage genotyping-by-sequencing [145]. Given the limitation of genotyping-by-sequencing when the sequencing depth is lowered (high rate of missing data, high error rate for heterozygous loci), this approach won’t capture heterozygous loci efficiently and must be used for genotyping fixed lines, coupled with an efficient imputation framework based on high-quality sequence data of ancestral lines in the pedigree. This therefore requires expertise in bioinformatics and access to high-performance computing resources.
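
The cost levers discussed in this section can be combined into a simple budget check. The sketch below compares the cost of genotyping a cohort with the savings obtained by removing phenotyping plots. The cost of 10 US dollars per sample follows the order of magnitude given above, whereas the cohort size, the number of plots removed, and the cost per plot are hypothetical values that each program would need to replace with its own figures.

```python
def genotyping_cost(n_lines, cost_per_sample):
    """Total cost of fingerprinting a cohort."""
    return n_lines * cost_per_sample


def net_cost(n_lines, cost_per_sample, plots_removed, cost_per_plot):
    """Genotyping cost minus phenotyping savings.

    A value of zero or less means the change is cost neutral or better.
    """
    savings = plots_removed * cost_per_plot
    return genotyping_cost(n_lines, cost_per_sample) - savings


# Hypothetical scenario: 2000 lines genotyped at 10 USD per sample,
# financed by dropping 1000 field plots assumed to cost 25 USD each.
cohort_cost = genotyping_cost(n_lines=2000, cost_per_sample=10)
balance = net_cost(n_lines=2000, cost_per_sample=10, plots_removed=1000, cost_per_plot=25)
print(f"Genotyping: {cohort_cost} USD; net cost after savings: {balance} USD")
```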

Key recommendations:

1.
Consider reducing the number of phenotyping steps, only phenotyping a subset of a trial, or using cheaper or higher throughput correlated traits.
2.
Design a genotyping platform with a set of markers selected specifically for the germplasm managed in the breeding program and deploy it at a service provider.

4 An Example on IRRI Breeding Program for Irrigated Systems

Here, we give a practical example of the integration and use of genomic predictions in an active rice breeding program. The recently redesigned breeding program for irrigated systems at IRRI offers an ideal context to understand the key elements of an applied breeding program using genomic predictions [146, 147]. Indeed, with its global mandate of Southeast Asia, South Asia, and Eastern Africa as the main areas of intervention, it represents the direct derivation of the early breeding efforts that resulted in the Green Revolution in Asia. As such, it is the best representation possible of an effort to produce materials that combine high yield potential and adaptation to diverse environmental conditions.

4.1 The Transition from Pedigree Breeding to Recurrent Genomic Selection

The applications of genomic selection to the IRRI breeding program came in two broad categories: within cohort predictions (full and half sibs predicting other full and half sibs) to optimize our testing strategy and across cohort predictions (grandmothers and mothers predicting daughters and granddaughters) to accelerate our breeding cycles, both of which required changes to the breeding strategy. First and foremost, both applications required the cost-effective deployment of a genotyping technology that allowed for the routine fingerprinting of the breeding material. This marker set (known as the 1k-RiCA amplicon panel [69]) had recently been developed and populated with markers that were specifically informative in our germplasm. Publicly available fixed array genotyping technology would not have served this purpose well as many of the markers on these arrays are chosen to differentiate germplasm globally [148] and were often very expensive with relatively few (or worse, biased) polymorphisms.

With the marker panel in place and deployed at a service provider, in the immediate term, the most useful application of genomic selection was to allow for selections to be made based on performance in the target environments rather than depending on a correlated response to selection with Philippine environments (where IRRI’s headquarters are located). The program as it is currently resourced generates a stage 1 yield trial of approximately 2000 new lines each year. As all of IRRI’s yield trials are conducted by national agricultural research partners, the ability to test 2000 lines in multilocation yield trials in Africa, South Asia, and Southeast Asia was extremely limited. Up to this point, the early generation breeding material was selected based on performance in the Philippines and a small number of advanced lines were sent to the regional locations for testing and evaluation (Fig. 2). Genomic selection using full and half sibs was employed to enable direct selection based on the target environment and avoid needing to rely on indirect selection. By selecting an optimized subset of the cohort and sending it to be tested in the region of interest, phenotype data from the specific region of interest could be used to predict the performance of the remaining cohort in that region. In this way, the entire cohort is tested somewhere, but no individual is tested everywhere, and thus a set of superior lines tailored to the partners’ unique conditions can be advanced and sent to them. Doing this, however, required that funds be identified to fingerprint the full cohort of about 2000 new lines every year. In order to make this form of genomic selection cost neutral, it was noted that the testing strategy in the Philippines was testing lines for 3 years (Fig. 2, former scheme). By eliminating the middle testing phase and selecting a region-specific set of lines for advance testing, sufficient funds were recovered to cover the cost of fingerprinting.

The genomic prediction application with more long-term value to the program was to enable across cohort predictions so that superior lines in each region could be recycled back into the breeding pipeline prior to regional testing, thus accelerating the breeding cycle (Fig. 2, future scheme). This kind of prediction however requires a more robust, multi-year dataset consisting of regional phenotype data on ancestral lines, as phenotype data from full and half sibs of the emergent candidates would be unavailable at the time the prediction needs to be made. With the first application of genomic prediction in place, the program is now well positioned to begin generating multi-year datasets with region-specific phenotypic observations needed to predict new parents. However, to make this kind of prediction possible, a more directed manipulation of the crossing strategy needed to be implemented. The most important decision a breeder makes is selecting and crossing parents on the basis of breeding values for relevant traits. As this metric was not routinely calculated at IRRI, our first step was to gather our historical data together into a single model and generate the best estimates possible for breeding values and reliabilities for yield, maturity, and plant height. Breeding values for other important traits such as grain quality, disease resistance, and other agronomic traits were not collected routinely enough or at enough locations to provide meaningful estimates of breeding value. This process was substantially accelerated due to the efforts made to migrate data into the B4R data management system. As DNA fingerprint data was not available on the vast majority of our historical lines, pedigree data stored in the genealogy management system was used to estimate relatedness coefficients.
This multi-year evaluation of our historical data permitted the identification of a unique core set of lines with high and reliable breeding values for yield, which would form the basis of further breeding and germplasm characterization efforts. Once identified, this set of high breeding value lines was fingerprinted, and the resulting data was used to estimate the effective population size and the frequencies of major genes for other traits (such as amylose content or resistance to blast). These metrics would be used to guide selection strategies among the progeny and evaluate the risk/benefit of introducing new genetics into the program.

This step, while not specifically motivated by genomic selection, was critically important because along with the development and characterization of the core germplasm came a commitment from the program to primarily cross within this new gene pool to drive genetic gain. This relatedness across generations (and aversion to frequent introduction of new germplasm into the program) creates genetic continuity over multigenerational cohorts that makes it possible to use phenotype data from ancestors to predict the performance of newly created descendants. Corresponding with that relatedness was the development of business rules for crossing and population development. These rules ensure that new crosses generated by the breeding program maximize genetic variation in the next generation to the extent possible. They also allow for sufficient numbers of full and half-siblings in each cohort to be generated, from which predictive power can be obtained. Among others, these business rules included a commitment to cross with lines from the most recent cohorts whenever possible (rather than older released lines), preventing the use of any one line in more than 10% of the crosses to avoid bottlenecking the variability, the complete avoidance of sub-lining so that each F2 plant generates a unique F6 line, and ensuring that sufficient new fixed lines from each cross were entered into the stage 1 yield trial such that there was a reasonable probability of identifying a new line that was at least one standard deviation better than the average yield of the cross.
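
Two of these business rules lend themselves to a simple check. The sketch below flags parents used in more than 10% of the crosses and computes the probability that at least one of n lines from a cross exceeds the cross mean by one standard deviation. The probability calculation assumes independent, normally distributed line values within a cross, which is a simplification introduced here; the function names and the toy crossing list are hypothetical.

```python
from collections import Counter
from math import erf, sqrt


def overused_parents(crosses, max_share=0.10):
    """Return the parents involved in more than `max_share` of the crosses."""
    usage = Counter()
    for female, male in crosses:
        usage.update({female, male})  # a set, so each cross counts a parent once
    limit = max_share * len(crosses)
    return sorted(parent for parent, n in usage.items() if n > limit)


def prob_superior_line(n_lines, threshold_sd=1.0):
    """P(at least one of n lines exceeds the cross mean by `threshold_sd` SDs).

    Assumes independent, normally distributed line values within the cross.
    """
    p_below = 0.5 * (1.0 + erf(threshold_sd / sqrt(2.0)))  # standard normal CDF
    return 1.0 - p_below ** n_lines


# Toy crossing list: parent "A" appears in 2 of 10 crosses (20%).
crosses = [("A", "B"), ("A", "C"), ("D", "E"), ("F", "G"), ("H", "I"),
           ("J", "K"), ("L", "M"), ("N", "O"), ("P", "Q"), ("R", "S")]
flagged = overused_parents(crosses)
print("Overused parents:", flagged)

for n in (5, 10, 20):
    print(n, "lines per cross:", round(prob_superior_line(n), 3))
```

Under the normality assumption, the second function gives a rough idea of how many fixed lines per cross are needed to reach a chosen probability of finding a superior line.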

With these two applications of genomic prediction underway, the program went from a long-cycle pedigree nursery to a rapid-cycle genomics enabled breeding strategy. This strategy involved making crosses and setting population size targets according to predefined business rules, generating new lines through RGA approaches, employing MAS after line fixation, and using bulk harvests of the selected head rows to create seed for shipping to regional locations for testing. Predictions of the entire cohort across all regions would ensure that every line had either an observation or a prediction in every region, from which a core set of superior region-specific lines were identified and shipped to partners for stage 2 yield trial evaluation and testing. As data accumulate in the regions on cohorts of lines, and as the progeny and grand progeny of the original core set of lines begin to fill the pipeline, the capacity for predicting regional performance across cohorts will grow until sufficient data becomes available to allow for the identification of new parents prior to stage 1 yield testing.

4.2 Description of the Breeding Schemes and Integrating Genomic Prediction

The mapping of the breeding scheme is a key component for the optimal use of breeding program resources and to understand where the entry points for genomic selection could be placed. The current breeding strategy summarized in Fig. 2 was initiated in 2017 at IRRI in order to reduce cycle time and optimize multi-environment evaluations thanks to the introduction of genomic prediction. In this strategy, most of the activities take place at IRRI headquarters in the Philippines. In the first year, the crosses (80–100) are made and the F1 plants are validated using dedicated SNP markers. In the second year, the segregating families go through SSD from F2 to F6 via RGA. At that stage, 7500 to 10,000 lines are advanced: this corresponds to 200–400 lines per cross. Population sizes for each cross are determined based on the anticipated segregation of major genes. In the third year, the lines are evaluated in the field in panicle rows for seed increase and for the evaluation of uniformity, plant architecture, and maturity. At the same time, the lines are genotyped for marker-assisted selection for major loci prioritized for each breeding pipeline. These include the waxy gene for amylose content and a number of disease resistance genes for major pests and diseases (blast, bacterial leaf blight, ...) [10]. The second season of the third year is dedicated to the preparation of the seeds to be shipped to the regions. In the fourth year, the lines advanced based on MAS and head row selection (1500–2000) are genotyped using a low-density platform with less than 1000 SNP markers [69]. The same lines are also evaluated in the first stage yield trial at IRRI headquarters in the Philippines. In parallel, a subset of the cohort (250–300 lines) is sent to the regional partners in South Asia and Eastern Africa for multi-environment evaluation of key agronomic traits (plant height, time to flowering, grain yield).
This subset (training set) is used to build the genomic prediction model that is later used to select an advanced class of superior lines among the entire cohort. Since no historical data were available for building reliable genomic prediction models, the integration of genomic prediction in this scheme relies on the use of half sibs or full sibs to maximize the accuracy with highly related training and predicted sets [142, 149]. The genomic prediction models are used to select parental lines for the following cycle and to select promising lines (30–40) for the second stage yield trial that is conducted in the fifth year of the breeding scheme. The best performing lines at the end of this stage can then go through advance testing in the national variety release system or can be used by partners in the regions in their breeding program to enrich their gene pools.

In this strategy, the breeding cycle spans over 5 years with the recycling of advanced lines as parents occurring during the fourth year (Fig. 2). Compared to previous breeding schemes that were in place at IRRI, the cycle time is shortened by 2 years [147]. Reduction of cycle time is a key factor to increase the rate of genetic gain [109]. In this scheme, one of the major tools for cycle reduction is RGA. This approach, known for a long time [150, 151], was optimized in 2013 and implemented at large scale at IRRI in 2014 [118]. Currently, genomic prediction is not used to decrease cycle time and is mainly used to increase the intensity and accuracy of selection in regional environments, especially for yield. The main reason for this is the lack of historical data in the breeding program suitable for genomic prediction. Indeed, very few breeding lines have been consistently genotyped and phenotyped to build a reliable database. Therefore, the current phase is a transition phase where the data currently generated feeds a database that will be used to predict the performance of future progeny (across cohort predictions). This is highlighted in Fig. 2 as the future scheme. This ability to directly predict the performance of selection candidates before evaluating them in the field will enable us to decrease the cycle time by 2 additional years resulting in a 2-year breeding cycle. However, this comes with operational challenges such as ensuring four generations per year in a stable manner during RGA, producing enough seed at the end of the RGA to enable multi-environment trials, and navigating the import/export process quickly enough to ensure the seed reaches the partners in time for planting in the main season.

4.3 A Practical Example of the Analytical Pipeline

In this section, we present the analysis pipeline that we currently use at IRRI to perform genomic selection. This corresponds to the activities mapped to the fourth year of the current breeding strategy (first stage yield trial, Fig. 2). The analysis pipeline is divided into three main steps (Fig. 3):

1. The selection of the training set. This step is based on SNP markers specifically chosen to be informative in the elite germplasm used in the breeding program [69] and the optimization method of Akdemir et al. [41] that minimizes the prediction error variance (PEV) in the predicted set.
2. The single trial analysis. In this step, phenotypic data (plant height, days to flowering, and grain yield) are measured on the training set in several regional locations, which are analyzed separately to assess the quality of the data at each location and estimate spatial adjustments to genotypic values with a mixed model, taking the experimental design into consideration.
3. The genomic prediction analysis. In this last step, a GBLUP model trained with the genotypic and phenotypic data from the training set is used to predict genomic estimated breeding values (GEBVs) for all the untested lines.

To illustrate the analysis pipeline, real data from the IRRI breeding program for irrigated systems are used as an example. The analyses were conducted within the R environment using the R packages asreml (under license) or sommer (freely available) for mixed model analyses, together with functions developed specifically for the analysis pipeline or taken from the literature. Users can choose between asreml and sommer according to their preferences. All the R scripts and the data are provided in the supplementary material (Data 1, 2, and 3).

4.3.1 Selection of the Training Set

In the current breeding scheme, genomic prediction is used for within-cohort predictions. In order to identify the best subset (training set) to be phenotyped in regional MET, we use an optimization method based on mixed model theory that minimizes the prediction error variance [41]. This method, available in the R package STPGA (for Selection of Training Populations by Genetic Algorithm), requires the genomic relationship matrix (G matrix) as an input. In the example, the entire cohort of 1722 lines is genotyped with 1079 SNP markers. We use the rrBLUP package to compute the G matrix from the genotypic matrix (geno_data), which contains marker information coded as [−1, 0, 1]. The G matrix is then passed to the OptiTS function along with the desired size of the training set (sTS = 300) and the number of replicates (rep = 5). Running several replicates makes it possible to retain the individuals most often selected across runs, which avoids suboptimal solutions from the genetic algorithm [41]. To evaluate the representativeness of the training set compared to the entire cohort, the individuals are plotted using the first two principal components of the G matrix (Fig. 4).
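
As a rough illustration of the G matrix and principal component steps described above, the following base R sketch uses simulated markers rather than the IRRI data. It computes a VanRaden-style genomic relationship matrix by hand, which should be close to what rrBLUP::A.mat returns with default settings, but that equivalence is an assumption here. The objects W, pcs, train_idx, and in_train are hypothetical names, and the training set is a random placeholder, not the output of the OptiTS optimization.

```r
# Toy marker matrix standing in for geno_data: rows = lines, columns = SNPs,
# coded -1/0/1 (simulated here; not the IRRI data)
set.seed(1)
n_lines <- 50
n_snp <- 200
geno_data <- matrix(sample(c(-1, 0, 1), n_lines * n_snp, replace = TRUE),
                    nrow = n_lines,
                    dimnames = list(paste0("line", seq_len(n_lines)), NULL))

# Frequency of the allele coded +1 at each SNP
p <- (colMeans(geno_data) + 1) / 2

# Centre each marker column, then scale the cross-product (VanRaden-style G)
W <- sweep(geno_data, 2, 2 * p - 1)
G <- tcrossprod(W) / (2 * sum(p * (1 - p)))

# First two principal components of G from its eigendecomposition
eig <- eigen(G, symmetric = TRUE)
pcs <- eig$vectors[, 1:2] %*% diag(sqrt(eig$values[1:2]))
dimnames(pcs) <- list(rownames(G), c("PC1", "PC2"))
var_explained <- eig$values[1:2] / sum(eig$values)

# Placeholder training set: a random subset, NOT the optimized selection
train_idx <- sample(n_lines, 15)
in_train <- seq_len(n_lines) %in% train_idx
```

Plotting pcs with points coloured by in_train gives a figure in the spirit of Fig. 4: a representative training set should cover the same region of the plot as the full cohort.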

4.3.2 Single Trial Analysis

Once the training set is identified, it is sent to regional partners to be evaluated in MET. For this case study, actual trial data from five different locations in Bangladesh were used. These trials were conducted in the 2020 dry season (Jan–May). Each trial comprises 362 breeding lines, of which 299 are training set lines; the rest are advanced lines from the previous cohort and check varieties. All the trials used a partially replicated design with 20% of lines replicated. Three traits are used in this example: plant height (cm), days to flowering, and grain yield (t/ha). The trial data are uploaded into the B4R database, which has been adopted by IRRI for managing all breeding trial data. The data exported from the B4R database for each location are used to perform individual single trial analyses (pheno_data object). The objective of this step is to remove potential errors in the dataset and to adjust for spatial variation using the experimental design. The following mixed model (asreml or sommer) is used to obtain the BLUP and deregressed BLUP for each line:

# Option 1: asreml (licensed)
model <- asreml(fixed = trait ~ 1,
                random = ~ DESIGN_X + DESIGN_Y + GID,
                na.action = na.method(x = "include"),
                data = dataset)

# Option 2: sommer (freely available)
model <- sommer::mmer(fixed = trait ~ 1,
                      random = ~ DESIGN_X + DESIGN_Y + GID,
                      rcov = ~ units,
                      data = dataset,
                      verbose = FALSE)

The variables DESIGN_X and DESIGN_Y represent the coordinates of the plots within the field. The variable GID represents the ID of the genotypes. The BLUP and deregressed BLUP values are then calculated. The single trial analysis is embedded in a function called single_trial_asreml or single_trial_sommer that takes the formatted raw phenotypic data as an input and returns a data frame with several variables, including location, trait, genotype ID, BLUP, deregressed BLUP, and repeatability (H²). The function is then applied to all locations and traits to run the model and retrieve the BLUPs (Fig. 5a).
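
The text does not spell out how the deregressed BLUP and the repeatability are computed, so the sketch below shows one common convention only: the BLUP is divided by its reliability, 1 − PEV/σ²g, and repeatability is approximated from the mean PEV. This is an assumption, not necessarily the formula used in single_trial_asreml or single_trial_sommer. The function deregress_blup and all values are hypothetical, and the variance components and trial mean are treated as known for simplicity.

```r
# Deregress BLUPs by dividing by their reliability (one common definition;
# assumed here, not confirmed as the formula used in the chapter's scripts)
deregress_blup <- function(blup, pev, var_g) {
  reliability <- 1 - pev / var_g
  blup / reliability
}

# Toy example with variance components treated as known
var_g <- 4                                 # genotypic variance
var_e <- 2                                 # residual variance
n_rep <- c(1, 1, 2, 2)                     # plots per line (partially replicated)
line_dev <- c(-1.5, 0.8, 2.0, -0.4)        # line mean minus trial mean

shrink <- var_g / (var_g + var_e / n_rep)  # shrinkage applied by the BLUP
blup <- shrink * line_dev                  # BLUP of each line
pev <- var_g * (1 - shrink)                # prediction error variance

dblup <- deregress_blup(blup, pev, var_g)  # removes the shrinkage
rep_est <- 1 - mean(pev) / var_g           # simple repeatability approximation
```

In this simplified setting the deregressed BLUP recovers the unshrunk deviation of each line from the trial mean. That is the reason for using it as the response in the next step: the genomic model applies its own shrinkage, so feeding it raw BLUPs would shrink the values twice.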

4.3.3 Genomic Predictions

The deregressed BLUP values of the training set lines from the single trial analysis and the genome-wide marker data of the entire cohort (training set and predicted set), consisting of 1722 lines, are used in the genomic prediction model. The genome-wide marker data are used to construct the additive relationship matrix (G matrix) with the sommer package. When asreml is used for the GBLUP analysis, the inverse of this matrix (invG) is then computed. The GEBV for each line is computed using the GBLUP model in which the deregressed BLUP from each location is the response variable, location is a fixed effect, and the breeding line (gid) is a random effect whose covariance structure is given by the G matrix (supplied as its inverse, invG, in asreml).

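# Option 1: asreml (licensed), using the inverse of the G matrix
model <- asreml(fixed = trait ~ 1 + location,
                random = ~ vm(gid, invG),
                data = dataset)

# Option 2: sommer (freely available), using the G matrix directly
model <- sommer::mmer(fixed = trait ~ 1 + location,
                      random = ~ vs(gid, Gu = G),
                      rcov = ~ vs(units),
                      data = dataset,
                      verbose = FALSE)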

Similarly to the single location analysis, this model is embedded in a function (gblup_asreml or gblup_sommer) with two parameters: the first is the output from the single location analysis and the second is the inverse of the G matrix. The output of the function is a table containing the GEBVs for the entire cohort (Fig. 5b). The GEBVs are then combined with trait marker information and used by the breeder to select lines for advanced testing and parents for the next breeding cycle.
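
To show what the GBLUP step does without asreml or sommer, the following base R sketch solves Henderson's mixed model equations directly on simulated data. It is not the gblup_asreml or gblup_sommer function. The variance ratio lambda is fixed by assumption instead of being estimated by REML, a small ridge is added to G so that it can be inverted, and all object names and simulated values are hypothetical.

```r
set.seed(42)
n_lines <- 60    # entire cohort (training set + predicted set)
n_snp <- 300
n_train <- 20    # lines phenotyped in the trials
locs <- c("loc1", "loc2")

# Simulated markers coded -1/0/1 and a VanRaden-style G matrix
M <- matrix(sample(c(-1, 0, 1), n_lines * n_snp, replace = TRUE), nrow = n_lines)
p <- (colMeans(M) + 1) / 2
W <- sweep(M, 2, 2 * p - 1)
G <- tcrossprod(W) / (2 * sum(p * (1 - p)))
G <- G + diag(0.01, n_lines)   # small ridge (assumption) so that G is invertible
invG <- solve(G)

# Simulated phenotypes: training lines observed once in each location
true_u <- as.vector(W %*% rnorm(n_snp, sd = 0.1))
train <- sort(sample(n_lines, n_train))
dataset <- expand.grid(gid = train, location = locs)
loc_eff <- c(loc1 = 5, loc2 = 6)
dataset$trait <- unname(loc_eff[as.character(dataset$location)]) +
  true_u[dataset$gid] + rnorm(nrow(dataset), sd = 0.5)

# Design matrices: X for location (fixed), Z linking records to all lines.
# Columns of Z for untested lines stay at zero.
X <- model.matrix(~ location, data = dataset)
Z <- matrix(0, nrow(dataset), n_lines)
Z[cbind(seq_len(nrow(dataset)), dataset$gid)] <- 1

# Mixed model equations with an assumed ratio lambda = sigma2_e / sigma2_u
lambda <- 1
LHS <- rbind(cbind(crossprod(X), crossprod(X, Z)),
             cbind(crossprod(Z, X), crossprod(Z) + lambda * invG))
RHS <- c(crossprod(X, dataset$trait), crossprod(Z, dataset$trait))
sol <- solve(LHS, RHS)

b_hat <- unname(sol[seq_len(ncol(X))])   # location effects
gebv <- unname(sol[-seq_len(ncol(X))])   # GEBVs for all lines, tested or not
```

The untested lines receive a GEBV only through their genomic relationships with the training lines. This is why the within-cohort scheme relies on closely related training and predicted sets.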

5 Other Applications of Genomic Prediction for Rice Improvement

In the previous parts of the chapter, we saw that genomic selection requires both methodological research and a carefully designed breeding program to be implemented efficiently. In this last part, we present ongoing developments regarding the use of genomic predictions for rice improvement. We think it is important for breeders to be aware of upcoming approaches and tools so that they are ready to integrate them into breeding programs, where appropriate, once these tools are mature enough.

5.1 Characterization of Genetic Diversity for Pre-breeding

The characterization and the use of genetic diversity is important to meet long-term breeding objectives and maintain the adaptive potential of the breeding populations [152]. In the case of recurrent selection in elite germplasm, the addition of new material threatens the genetic gain in the short term by diluting the impact of high value alleles carefully accumulated through successive cycles of selection. However, in the long term, the loss of genetic diversity due to selection but also to negative or neutral linkage drag or genetic drift can be compensated by careful introduction of genetic variation into the elite pool [153]. The identification of the best accessions for particular breeding objectives is laborious, as it requires an accurate phenotyping of a large number of diverse lines that often mask valuable haplotypes in low breeding value backgrounds. In this context, genomic prediction can be used to identify superior accessions in germplasm collections and be applied to pre-breeding, which aims to identify high-potential genotypes among a large number of accessions [154,155,156]. In rice, the availability of large genomic resources such as the 3000 rice genomes [30] or the high-density rice array panel [157] offers a unique opportunity to use genomic prediction to target valuable genotypes relative to the breeding objectives.

5.2 Definition of Heterotic Groups for Hybrid Breeding

In hybrid breeding, heterotic groups are usually needed to optimally use the heterosis within a species [158]. To this end, hybrid selection causes the germplasm to become structured into genetically distinct groups that display superior hybrid performance when individuals from complementary groups are crossed. Contrary to other major crops (e.g., corn [159]), heterotic groups in rice are defined largely according to complementarity with a particular sterility system and not according to gene pools defined by complementary heterotic potential. The situation is further complicated in rice because the strong population structure that characterizes rice diversity can be confused with heterotic differentiation of complementary gene pools [29, 30]. Efforts to coerce ancestral subpopulations into heterotic groups, as in the case of the two major types (indica and japonica), have limitations due to sterility, contrasting adaptations, and very different distributions of major grain quality parameters [160]. Further research is required to identify natural patterns of heterosis [161], and in some cases genomic prediction can assist this exploration. Recently, the use of predictions to define heterotic pools based on complementary yield performance has been proposed in rice [162]. In this study based on real data, the authors applied the approach developed by Zhao et al. [163] to detect heterotic patterns for yield by combining the predicted performances of all unique single-cross hybrids with a simulated annealing algorithm with different group sizes.

5.3 Integration of High-Throughput Phenotyping and Environmental Information

The significant progress made with genomics in breeding programs has reinforced the idea that phenotyping is still a bottleneck for genetic improvement [164]. This may seem paradoxical since one of the advantages of genomic selection lies in the reduction of some phenotyping steps. However, accurate field phenotyping for important traits (e.g., grain yield) in METs is even more important to efficiently train the prediction model and capture G × E. In addition, selection for more expensive or difficult traits (drought resistance, lodging tolerance, grain quality, etc.) can be integrated earlier in the breeding scheme thanks to genomic prediction and therefore increase the selection intensity. These observations have led to an ever-increasing interest in high-throughput phenotyping methods [165, 166]. Several tools (RGB and multispectral cameras, thermal sensors, etc.) and platforms (phenomobiles, unmanned aerial vehicles, etc.) are available for field and laboratory phenotyping with a wide range of applications. When integrated in a genomic prediction model, high-throughput phenotypic data can substantially increase the prediction accuracy [167, 168]. In the case of phenomic selection, high-throughput near-infrared spectroscopy data can even replace genotypic data and offer similar accuracy [169, 170]. However, to be useful in a breeding context, the large quantity of data generated by high-throughput phenotyping techniques needs to be stored in a data management system, properly vetted relative to the costs and selection accuracies available from manual phenotypes, and associated with correct genotype data if it is to improve the decision-making process.
Although tools and analysis pipelines have evolved in recent years, there are still important constraints to the routine use of these approaches: the acquisition of multi-environment field data and not just data from a central research station, the availability of data management systems that can handle large time-series datasets, and the initial cost of related equipment. It is expected as the technologies and regulations mature that dedicated companies offering high-throughput phenotyping services will emerge, much like has been the case with genotyping.

In addition to high-throughput phenotyping, a better characterization of environmental factors affecting the performance of crop plants will enhance our ability to explain nongenetic sources of variation. Such “envirotyping” is an area of active research that shows great promise [171]. To become truly useful, technologies that permit the high-throughput collection of envirotype data in real time need to continue to mature, as do the data management and analytical strategies for extracting intelligence from these datasets.

6 Conclusion: A Point of View of a Rice Breeder

Based on the literature in rice and in other species, the ability to do genomic prediction and the value of applying genomic selection to rice breeding programs are beyond question. The capacity to estimate the prediction values and the key datasets and models that underlie the estimation of GEBVs is also very well understood. The marker resources and phenotyping capacity in rice are present and available at this point to even the most remote breeding organizations. Furthermore, the rules that describe how quantitative trait variation is inherited in populations are well understood, and it would seem the infinitesimal model applies to quantitative traits in rice in most cases. What remains to capture the full value of this technology is the reorientation of rice breeding programs around a short-cycle recurrent selection strategy within a defined gene pool. During that transition, genomic prediction can additionally be helpful for improving selection within cohorts and save money on field evaluation. As a result, generating genotype data or building an analytical pipeline is often not the starting point for implementation of genomic selection in most programs. Clear business rules for data collection and management, clearly defined best practices for parental selection, and a commitment to work within elite gene pools must come first. Second to these foundational activities, breeding programs must standardize and systematize their operations in such a way that resources are optimized, workflows are clear, and breeders are not spending inordinate amounts of time managing logistics. Field work needs to focus more on data quality and data collection, reserving selection decisions for after data has been collected, analyzed, and interpreted. Marker systems for routine genotyping are also necessary but must be developed such that the genotype data is specifically informative to the breeding germplasm of interest.

The public rice literature to date has largely focused on questions related to whether predictions work in rice or how to optimize prediction accuracy. Very few rice publications address how predictions can be practically applied to enhance rates of genetic gain. As a result, in an attempt to modernize, many breeders get stuck in “proof of concept purgatory” by trying to replicate analyses done by others. Breeders seeking to improve their strategy would instead benefit from considering whether the appropriate foundations are laid in their programs and then considering carefully what the entry points for prediction are in their stated breeding strategy. Commercial breeding programs may have the advantage of having the freedom to invest resources in additional capital or operational expenditures up front in order to capture value in the long term. However, as budgets are often tight, fixed, or subject to congressional approval for publicly funded programs, cost-saving adjustments to the breeding strategy (such as applying a sparse testing design or implementing rapid generation advance for line fixation) may liberate resources in the short term, which can be applied to laying the proper foundations for a fully genomic prediction-enabled breeding strategy.

References

Ragot M, Bonierbale M, Weltzien E (2018) From market demand to breeding decisions: a framework
Gallais A (2011) Méthodes de création de variétés en amélioration des plantes. Quae
Brown J, Caligari P (2011) An introduction to plant breeding. Wiley
Rutkoski JE (2019) Chapter four—a practical guide to genetic gain. In: Sparks DL (ed) Advances in agronomy. Academic, pp 217–249
Lynch M, Walsh B (1998) Genetics and analysis of quantitative traits. Sunderland, MA, Sinauer
Cooper M, Hammer GL (1996) Plant adaptation and crop improvement. CAB International, Wallingford
Chenu K (2015) Chapter 13—characterizing the crop environment—nature, significance and applications. In: Sadras VO, Calderini DF (eds) Crop physiology, 2nd edn. Academic, San Diego, pp 321–348
Xu Y, Li P, Zou C et al (2017) Enhancing genetic gain in the era of molecular breeding. J Exp Bot 68:2641–2666. https://doi.org/10.1093/jxb/erx135
Lande R, Thompson R (1990) Efficiency of marker-assisted selection in the improvement of quantitative traits. Genetics 124:743–756
Cobb JN, Biswas PS, Platten JD (2018) Back to the future: revisiting MAS as a tool for modern plant breeding. Theor Appl Genet 132(3):647–667. https://doi.org/10.1007/s00122-018-3266-4
Meuwissen TH, Hayes BJ, Goddard ME (2001) Prediction of total genetic value using genome-wide dense marker maps. Genetics 157:1819–1829
Goddard ME, Hayes BJ (2007) Genomic selection. J Anim Breed Genet 124:323–330. https://doi.org/10.1111/j.1439-0388.2007.00702.x
Goddard M (2008) Genomic selection: prediction of accuracy and maximisation of long term response. Genetica 136:245–257. https://doi.org/10.1007/s10709-008-9308-0
Heffner EL, Sorrells ME, Jannink J-L (2009) Genomic selection for crop improvement. Crop Sci 49:1–12. https://doi.org/10.2135/cropsci2008.08.0512
Sallam AH, Endelman JB, Jannink J-L, Smith KP (2015) Assessing genomic selection prediction accuracy in a dynamic barley breeding population. Plant Genome 8:20. https://doi.org/10.3835/plantgenome2014.05.0020
VanRaden PM, Van Tassell CP, Wiggans GR et al (2009) Invited review: reliability of genomic predictions for North American Holstein bulls. J Dairy Sci 92:16–24. https://doi.org/10.3168/jds.2008-1514
Hickey JM, Chiurugwi T, Mackay I et al (2017) Genomic prediction unifies animal and plant breeding programs to form platforms for biological discovery. Nat Genet 49:1297–1303. https://doi.org/10.1038/ng.3920
Crossa J, Pérez-Rodríguez P, Cuevas J et al (2017) Genomic selection in plant breeding: methods, models, and perspectives. Trends Plant Sci 22:961–975. https://doi.org/10.1016/j.tplants.2017.08.011
Hayes BJ, Bowman PJ, Chamberlain AJ, Goddard ME (2009) Invited review: genomic selection in dairy cattle: progress and challenges. J Dairy Sci 92:433–443. https://doi.org/10.3168/jds.2008-1646
de los Campos G, Hickey JM, Pong-Wong R et al (2013) Whole-genome regression and prediction methods applied to plant and animal breeding. Genetics 193:327–345. https://doi.org/10.1534/genetics.112.143313
Izawa T, Shimamoto K (1996) Becoming a model plant: the importance of rice to plant science. Trends Plant Sci 1:95–99. https://doi.org/10.1016/S1360-1385(96)80041-0
Peng S, Khush G (2003) Four decades of breeding for varietal improvement of irrigated lowland rice in the International Rice Research Institute. Plant Prod Sci 6:157–164. https://doi.org/10.1626/pps.6.157
Chandler RF (1982) An adventure in applied science: a history of the International Rice Research Institute. IRRI
Breth S (1985) International rice research: 25 years of partnership. IRRI
Guimaraes EP (2009) Rice breeding. In: Cereals. Springer, pp 99–126
Jena KK, Mackill DJ (2008) Molecular markers and their use in marker-assisted selection in rice. Crop Sci 48:1266–1276. https://doi.org/10.2135/cropsci2008.02.0082
Ismail AM, Singh US, Singh S et al (2013) The contribution of submergence-tolerant (Sub1) rice varieties to food security in flood-prone rainfed lowland areas in Asia. Field Crops Res 152:83–93. https://doi.org/10.1016/j.fcr.2013.01.007
Steele KA, Price AH, Shashidhar HE, Witcombe JR (2006) Marker-assisted selection to introgress rice QTLs controlling root traits into an Indian upland rice variety. Theor Appl Genet 112:208–221. https://doi.org/10.1007/s00122-005-0110-4
Glaszmann JC (1987) Isozymes and classification of Asian rice varieties. Theor Appl Genet 74:21–30. https://doi.org/10.1007/BF00290078
Wang W, Mauleon R, Hu Z et al (2018) Genomic variation in 3,010 diverse accessions of Asian cultivated rice. Nature 557:43–49. https://doi.org/10.1038/s41586-018-0063-9
Brar D, Khush G (2002) Transferring genes from wild species into rice. In: Kang MS (ed) Quantitative genetics, genomics, and plant breeding. CABI, Wallingford, p 197
Brar DS, Khush GS (2018) Wild relatives of rice: a valuable genetic resource for genomics and breeding research. In: Mondal TK, Henry RJ (eds) The wild oryza genomes. Springer, Cham, pp 1–25
Breseghello F, de Morais OP, Pinheiro PV et al (2011) Results of 25 years of upland rice breeding in Brazil. Crop Sci 51:914–923. https://doi.org/10.2135/cropsci2010.06.0325
Spindel J, Iwata H (2018) Genomic selection in rice breeding. In: Sasaki T, Ashikari M (eds) Rice genomics, genetics and breeding. Springer, Singapore, pp 473–496
Ahmadi N, Bartholomé J, Tuong-Vi C, Grenier C (2020) Genomic selection in rice: empirical results and implications for breeding. In: Quantitative genetics, genomics and plant breeding. CABI, Wallingford, pp 243–258
Guo Z, Tucker DM, Basten CJ et al (2014) The impact of population structure on genomic prediction in stratified populations. Theor Appl Genet 127:749–762. https://doi.org/10.1007/s00122-013-2255-x
Zhao K, Tung C-W, Eizenga GC et al (2011) Genome-wide association mapping reveals a rich genetic architecture of complex traits in Oryza sativa. Nat Commun 2:467. https://doi.org/10.1038/ncomms1467
Xu SH, Zhu D, Zhang QF (2014) Predicting hybrid performance in rice using genomic best linear unbiased prediction. Proc Natl Acad Sci U S A 111:12456–12461. https://doi.org/10.1073/pnas.1413750111
Hua JP, Xing YZ, Xu CG et al (2002) Genetic dissection of an elite rice hybrid revealed that heterozygotes are not always advantageous for performance. Genetics 162:1885–1895
Zhang Z, Ober U, Erbe M et al (2014) Improving the accuracy of whole genome prediction for complex traits using the results of genome wide association studies. PLoS One 9:e93017. https://doi.org/10.1371/journal.pone.0093017
Akdemir D, Sanchez JI, Jannink J-L (2015) Optimization of genomic selection training populations with a genetic algorithm. Genet Sel Evol 47:38. https://doi.org/10.1186/s12711-015-0116-6
Blondel M, Onogi A, Iwata H, Ueda N (2015) A ranking approach to genomic selection. PLoS One 10:e0128570. https://doi.org/10.1371/journal.pone.0128570
Grenier C, Cao T-V, Ospina Y et al (2015) Accuracy of genomic selection in a rice synthetic population developed for recurrent selection breeding. PLoS One 10:e0136594. https://doi.org/10.1371/journal.pone.0136594
Isidro J, Jannink J-L, Akdemir D et al (2015) Training set optimization under population structure in genomic selection. Theor Appl Genet 128:145–158. https://doi.org/10.1007/s00122-014-2418-4
Iwata H, Ebana K, Uga Y, Hayashi T (2015) Genomic prediction of biological shape: elliptic fourier analysis and kernel partial least squares (PLS) regression applied to grain shape prediction in rice (Oryza sativa L.). PLoS One 10:e0120610. https://doi.org/10.1371/journal.pone.0120610
Onogi A, Ideta O, Inoshita Y et al (2015) Exploring the areas of applicability of whole-genome prediction methods for Asian rice (Oryza sativa L.). Theor Appl Genet 128:41–53. https://doi.org/10.1007/s00122-014-2411-y
Spindel J, Begum H, Akdemir D et al (2015) Genomic selection and association mapping in rice (Oryza sativa): effect of trait genetic architecture, training population composition, marker number and statistical model on accuracy of rice genomic selection in elite, tropical rice breeding lines. PLoS Genet 11:e1004982. https://doi.org/10.1371/journal.pgen.1004982
Bustos-Korts D, Malosetti M, Chapman S et al (2016) Improvement of predictive ability by uniform coverage of the target genetic space. G3 6:3733–3747. https://doi.org/10.1534/g3.116.035410
Jacquin L, Cao T-V, Ahmadi N (2016) A unified and comprehensible view of parametric and kernel methods for genomic prediction with application to rice. Front Genet 7:145. https://doi.org/10.3389/fgene.2016.00145
Onogi A, Watanabe M, Mochizuki T et al (2016) Toward integration of genomic selection with crop modelling: the development of an integrated approach to predicting rice heading dates. Theor Appl Genet 129:805–817. https://doi.org/10.1007/s00122-016-2667-5
Spindel JE, Begum H, Akdemir D et al (2016) Genome-wide prediction models that incorporate de novo GWAS are a powerful new tool for tropical rice improvement. Heredity 116:395–408. https://doi.org/10.1038/hdy.2015.113
Campbell MT, Du Q, Liu K et al (2017) A comprehensive image-based phenomic analysis reveals the complex genetic architecture of shoot growth dynamics in rice (Oryza sativa). Plant Genome 10. https://doi.org/10.3835/plantgenome2016.07.0064
Gao N, Martini JWR, Zhang Z et al (2017) Incorporating gene annotation into genomic prediction of complex phenotypes. Genetics 207:489–501. https://doi.org/10.1534/genetics.117.300198
Matias FI, Galli G, Granato ISC, Fritsche-Neto R (2017) Genomic prediction of autogamous and allogamous plants by SNPs and haplotypes. Crop Sci 57:2951–2958. https://doi.org/10.2135/cropsci2017.01.0022
Morais OP, Duarte JB, Breseghello F et al (2017) Relevance of additive and non-additive genetic relatedness for genomic prediction in rice population under recurrent selection breeding. Genet Mol Res 16. https://doi.org/10.4238/gmr16039849
Wang X, Li L, Yang Z et al (2017) Predicting rice hybrid performance using univariate and multivariate GBLUP models based on North Carolina mating design II. Heredity 118:302–310. https://doi.org/10.1038/hdy.2016.87
Xu S (2017) Predicted residual error sum of squares of mixed models: an application for genomic prediction. G3 7:895–909. https://doi.org/10.1534/g3.116.038059
Ben Hassen M, Bartholome J, Vale G et al (2018) Genomic prediction accounting for genotype by environment interaction offers an effective framework for breeding simultaneously for adaptation to an abiotic stress and performance under normal cropping conditions in rice. G3 8:2319–2332. https://doi.org/10.1534/g3.118.200098
Ben Hassen M, Cao TV, Bartholome J et al (2018) Rice diversity panel provides accurate genomic predictions for complex traits in the progenies of biparental crosses involving members of the panel. Theor Appl Genet 131:417–435. https://doi.org/10.1007/s00122-017-3011-4
Campbell M, Walia H, Morota G (2018) Utilizing random regression models for genomic prediction of a longitudinal trait derived from high-throughput phenotyping. Plant Direct 2:e00080. https://doi.org/10.1002/pld3.80
Du C, Wei JL, Wang SB, Jia ZY (2018) Genomic selection using principal component regression. Heredity 121:12–23. https://doi.org/10.1038/s41437-018-0078-x
Gao N, Teng J, Ye S et al (2018) Genomic prediction of complex phenotypes using genic similarity based relatedness matrix. Front Genet 9:364. https://doi.org/10.3389/fgene.2018.00364
Mathew B, Léon J, Sillanpää MJ (2018) Impact of residual covariance structures on genomic prediction ability in multienvironment trials. PLoS One 13:e0201181. https://doi.org/10.1371/journal.pone.0201181
Monteverde E, Rosas JE, Blanco P et al (2018) Multienvironment models increase prediction accuracy of complex traits in advanced breeding lines of rice. Crop Sci 58:1519–1530. https://doi.org/10.2135/cropsci2017.09.0564
Morais Júnior OP, Breseghello F, Duarte JB et al (2018) Assessing prediction models for different traits in a rice population derived from a recurrent selection program. Crop Sci 58:2347–2359. https://doi.org/10.2135/cropsci2018.02.0087
Morais Júnior OP, Duarte JB, Breseghello F et al (2018) Single-step reaction norm models for genomic prediction in multienvironment recurrent selection trials. Crop Sci 58:592–607. https://doi.org/10.2135/cropsci2017.06.0366
Xu Y, Wang X, Ding XW et al (2018) Genomic selection of agronomic traits in hybrid rice using an NCII population. Rice 11:32. https://doi.org/10.1186/s12284-018-0223-4
Yabe S, Yoshida H, Kajiya-Kanegae H et al (2018) Description of grain weight distribution leading to genomic selection for grain-filling characteristics in rice. PLoS One 13:e0207627. https://doi.org/10.1371/journal.pone.0207627
Arbelaez JD, Dwiyanti MS, Tandayu E et al (2019) 1k-RiCA (1K-Rice Custom Amplicon) a novel genotyping amplicon-based SNP assay for genetics and breeding applications in rice. Rice 12:55. https://doi.org/10.1186/s12284-019-0311-0
Azodi CB, Bolger E, McCarren A et al (2019) Benchmarking parametric and machine learning models for genomic prediction of complex traits. G3 9:3691–3702. https://doi.org/10.1534/g3.119.400498
Berro I, Lado B, Nalin RS et al (2019) Training population optimization for genomic selection. Plant Genome 12:1–14. https://doi.org/10.3835/plantgenome2019.04.0028
Bhandari A, Bartholomé J, Cao-Hamadoun T-V et al (2019) Selection of trait-specific markers and multi-environment models improve genomic predictive ability in rice. PLoS One 14:e0208871. https://doi.org/10.1371/journal.pone.0208871
Article PubMed PubMed Central Google Scholar
e Sousa MB, Galli G, Lyra DH et al (2019) Increasing accuracy and reducing costs of genomic prediction by marker selection. Euphytica 215:18. https://doi.org/10.1007/s10681-019-2339-z
Article CAS Google Scholar
Frouin J, Labeyrie A, Boisnard A et al (2019) Genomic prediction offers the most effective marker assisted breeding approach for ability to prevent arsenic accumulation in rice grains. PLoS One 14:e0217516. https://doi.org/10.1371/journal.pone.0217516
Article CAS PubMed PubMed Central Google Scholar
Guo T, Yu X, Li X et al (2019) Optimal designs for genomic selection in hybrid crops. Mol Plant 12:390–401. https://doi.org/10.1016/j.molp.2018.12.022
Article CAS PubMed Google Scholar
Hu X, Xie W, Wu C, Xu S (2019) A directed learning strategy integrating multiple omic data improves genomic prediction. Plant Biotechnol J 17:2011–2020. https://doi.org/10.1111/pbi.13117
Article CAS PubMed PubMed Central Google Scholar
Huang M, Balimponya EG, Mgonja EM et al (2019) Use of genomic selection in breeding rice (Oryza sativa L.) for resistance to rice blast (Magnaporthe oryzae). Mol Breed 39:114. https://doi.org/10.1007/s11032-019-1023-2
Article CAS Google Scholar
Lima LP, Azevedo CF, De Resende MDV et al (2019) New insights into genomic selection through population-based non-parametric prediction methods. Sci Agric 76:290–298. https://doi.org/10.1590/1678-992X-2017-0351
Article Google Scholar
Monteverde E, Gutierrez L, Blanco P et al (2019) Integrating molecular markers and environmental covariates to interpret genotype by environment interaction in rice (Oryza sativa L.) grown in subtropical areas. G3 9:1519–1531. https://doi.org/10.1534/g3.119.400064
Article PubMed PubMed Central Google Scholar
Ou JH, Liao CT (2019) Training set determination for genomic selection. Theor Appl Genet 132:2781–2792. https://doi.org/10.1007/s00122-019-03387-0
Article PubMed Google Scholar
Suela MM, Lima LP, Azevedo CF et al (2019) Combined index of genomic prediction methods applied to productivity traits in rice. Cienc Rural 49. https://doi.org/10.1590/0103-8478cr20181008
Wang S, Wei J, Li R et al (2019) Identification of optimal prediction models using multi-omic data for selecting hybrid rice. Heredity 123(3):395–406. https://doi.org/10.1038/s41437-019-0210-6
Article PubMed PubMed Central Google Scholar
Wang X, Xu Y, Li PC et al (2019) Efficiency of linear selection index in predicting rice hybrid performance. Mol Breed 39:1–13. https://doi.org/10.1007/s11032-019-0986-3
Article Google Scholar
Baba T, Momen M, Campbell MT et al (2020) Multi-trait random regression models increase genomic prediction accuracy for a temporal physiological trait derived from high-throughput phenotyping. PLoS One 15:e0228118. https://doi.org/10.1371/journal.pone.0228118
Article CAS PubMed PubMed Central Google Scholar
Banerjee R, Marathi B, Singh M (2020) Efficient genomic selection using ensemble learning and ensemble feature reduction. J Crop Sci Biotechnol 23:311–323. https://doi.org/10.1007/s12892-020-00039-4
Article Google Scholar
Cui YR, Li RD, Li GW et al (2020) Hybrid breeding of rice via genomic selection. Plant Biotechnol J 18:57–67. https://doi.org/10.1111/pbi.13170
Article CAS PubMed Google Scholar
Grinberg NF, Orhobor OI, King RD (2020) An evaluation of machine-learning for predicting phenotype: studies in yeast, rice, and wheat. Mach Learn 109:251–277. https://doi.org/10.1007/s10994-019-05848-5
Article PubMed Google Scholar
Jarquin D, Kajiya-Kanegae H, Taishen C et al (2020) Coupling day length data and genomic prediction tools for predicting time-related traits under complex scenarios. Sci Rep 10:13382. https://doi.org/10.1038/s41598-020-70267-9
Article CAS PubMed PubMed Central Google Scholar
Schrauf MF, Martini JWR, Simianer H et al (2020) Phantom epistasis in genomic selection: on the predictive ability of epistatic models. G3 10:3137–3145. https://doi.org/10.1534/g3.120.401300
Article CAS PubMed PubMed Central Google Scholar
Toda Y, Wakatsuki H, Aoike T et al (2020) Predicting biomass of rice with intermediate traits: modeling method combining crop growth models and genomic prediction models. PLoS One 15:e0233951. https://doi.org/10.1371/journal.pone.0233951
Article CAS PubMed PubMed Central Google Scholar
Xu Y, Zhao Y, Wang X et al (2020) Incorporation of parental phenotypic data into multi-omic models improves prediction of yield-related traits in hybrid rice. Plant Biotechnol J 19(2):261–272. https://doi.org/10.1111/pbi.13458
Article CAS PubMed PubMed Central Google Scholar
Piepho HP (2009) Ridge regression and extensions for genomewide selection in maize. Crop Sci 49:1165–1176. https://doi.org/10.2135/cropsci2008.10.0595
Article Google Scholar
de los Campos G, Naya H, Gianola D et al (2009) Predicting quantitative traits with regression models for dense molecular markers and pedigree. Genetics 182:375–385. https://doi.org/10.1534/genetics.109.101501
Article CAS PubMed PubMed Central Google Scholar
Zhong S, Dekkers JCM, Fernando RL, Jannink J-L (2009) Factors affecting accuracy from genomic selection in populations derived from multiple inbred lines: a barley case study. Genetics 182:355–364. https://doi.org/10.1534/genetics.108.098277
Article CAS PubMed PubMed Central Google Scholar
Fukuoka S, Ebana K, Yamamoto T, Yano M (2010) Integration of genomics into rice breeding. Rice 3:131–137. https://doi.org/10.1007/s12284-010-9044-9
Article Google Scholar
VanRaden PM (2008) Efficient methods to compute genomic predictions. J Dairy Sci 91:4414–4423. https://doi.org/10.3168/jds.2007-0980
Article CAS PubMed Google Scholar
Henderson CR (1975) Best linear unbiased estimation and prediction under a selection model. Biometrics 31:423–447. https://doi.org/10.2307/2529430
Article CAS PubMed Google Scholar
Michel S, Ametz C, Gungor H et al (2016) Genomic selection across multiple breeding cycles in applied bread wheat breeding. Theor Appl Genet 129:1179–1189. https://doi.org/10.1007/s00122-016-2694-2
Article PubMed PubMed Central Google Scholar
Runcie D, Cheng H (2019) Pitfalls and remedies for cross validation with multi-trait genomic prediction methods. G3 9:3727–3741. https://doi.org/10.1534/g3.119.400598
Article PubMed PubMed Central Google Scholar
Gianola D, Schön C-C (2016) Cross-validation without doing cross-validation in genome-enabled prediction. G3 6:3107–3128. https://doi.org/10.1534/g3.116.033381
Article PubMed PubMed Central Google Scholar
Habier D, Fernando RL, Dekkers JCM (2007) The impact of genetic relationship information on genome-assisted breeding values. Genetics 177:2389–2397. https://doi.org/10.1534/genetics.107.081190
Article CAS PubMed PubMed Central Google Scholar
Technow F, Schrag TA, Schipprack W et al (2014) Genome properties and prospects of genomic prediction of hybrid performance in a breeding program of maize. Genetics 197:1343–1355. https://doi.org/10.1534/genetics.114.165860
Article PubMed PubMed Central Google Scholar
González-Diéguez D, Legarra A, Charcosset A et al (2021) Genomic prediction of hybrid crops allows disentangling dominance and epistasis. Genetics 218:iyab026. https://doi.org/10.1093/genetics/iyab026
Article PubMed PubMed Central Google Scholar
Crossa J (2012) From genotype × environment interaction to gene × environment interaction. Curr Genomics 13:225–244. https://doi.org/10.2174/138920212800543066
Article CAS PubMed PubMed Central Google Scholar
Voss-Fels KP, Cooper M, Hayes BJ (2019) Accelerating crop genetic gains with genomic selection. Theor Appl Genet 132:669–686. https://doi.org/10.1007/s00122-018-3270-8
Article PubMed Google Scholar
Bassi FM, Bentley AR, Charmet G et al (2016) Breeding schemes for the implementation of genomic selection in wheat (Triticum spp.). Plant Sci 242:23–36. https://doi.org/10.1016/j.plantsci.2015.08.021
Article CAS PubMed Google Scholar
Würschum T, Maurer HP, Weissmann S et al (2017) Accuracy of within- and among-family genomic prediction in triticale. Plant Breed 136:230–236. https://doi.org/10.1111/pbr.12465
Article CAS Google Scholar
Edwards SM, Buntjer JB, Jackson R et al (2019) The effects of training population design on genomic prediction accuracy in wheat. Theor Appl Genet 132:1943–1952. https://doi.org/10.1007/s00122-019-03327-y
Article CAS PubMed PubMed Central Google Scholar
Cobb JN, Juma RU, Biswas PS et al (2019) Enhancing the rate of genetic gain in public-sector plant breeding programs: lessons from the breeder’s equation. Theor Appl Genet 132:627–645. https://doi.org/10.1007/s00122-019-03317-0
Article PubMed PubMed Central Google Scholar
Dreisigacker S, Crossa J, Pérez-Rodríguez P et al (2021) Implementation of genomic selection in the cimmyt global wheat program, findings from the past 10 years. Crop Breed Genet Genomics 3:e210005. https://doi.org/10.20900/cbgg20210005
Article Google Scholar
Heffner EL, Lorenz AJ, Jannink J-L, Sorrells ME (2010) Plant breeding with genomic selection: gain per unit time and cost. Crop Sci 50:1681–1690. https://doi.org/10.2135/cropsci2009.11.0662
Article Google Scholar
Bernardo R (2020) Reinventing quantitative genetics for plant breeding: something old, something new, something borrowed, something BLUE. Heredity 125(6):375–385. https://doi.org/10.1038/s41437-020-0312-1
Article PubMed PubMed Central Google Scholar
García-Ruiz A, Cole JB, VanRaden PM et al (2016) Changes in genetic selection differentials and generation intervals in US Holstein dairy cattle as a result of genomic selection. Proc Natl Acad Sci 113:E3995–E4004. https://doi.org/10.1073/pnas.1519061113
Article CAS PubMed PubMed Central Google Scholar
Bardhan Roy SK, Pateña GF, Vergara BS (1982) Feasibility of selection for traits associated with cold tolerance in rice under rapid generation advance method. Euphytica 31:25–31. https://doi.org/10.1007/BF00028303
Article Google Scholar
Niizeki H, Oono K (1968) Induction of haploid rice plant from anther culture. Proc Jpn Acad 44:554–557. https://doi.org/10.2183/pjab1945.44.554
Article Google Scholar
Watson A, Ghosh S, Williams MJ et al (2018) Speed breeding is a powerful tool to accelerate crop research and breeding. Nat Plants 4:23–29. https://doi.org/10.1038/s41477-017-0083-8
Article PubMed Google Scholar
Yan G, Liu H, Wang H et al (2017) Accelerated generation of selfed pure line plants for gene identification and crop breeding. Front Plant Sci 8:1786. https://doi.org/10.3389/fpls.2017.01786
Article PubMed PubMed Central Google Scholar
Collard BCY, Beredo JC, Lenaerts B et al (2017) Revisiting rice breeding methods—evaluating the use of rapid generation advance (RGA) for routine rice breeding. Plant Prod Sci 20:337–352. https://doi.org/10.1080/1343943X.2017.1391705
Article Google Scholar
Bonnecarrere V, Rosas J, Ferraro B (2019) Economic impact of marker-assisted selection and rapid generation advance on breeding programs. Euphytica 215:197. https://doi.org/10.1007/s10681-019-2529-8
Article Google Scholar
Gaynor RC, Gorjanc G, Bentley AR et al (2017) A two-part strategy for using genomic selection to develop inbred lines. Crop Sci 57:2372–2386. https://doi.org/10.2135/cropsci2016.09.0742
Article Google Scholar
Muleta KT, Pressoir G, Morris GP (2019) Optimizing genomic selection for a sorghum breeding program in Haiti: a simulation study. G3 9:391–401. https://doi.org/10.1534/g3.118.200932
Article PubMed Google Scholar
Daetwyler HD, Pong-Wong R, Villanueva B, Woolliams JA (2010) The impact of genetic architecture on genome-wide evaluation methods. Genetics 185:1021–1031. https://doi.org/10.1534/genetics.110.116855
Article CAS PubMed PubMed Central Google Scholar
Goddard ME, Hayes BJ, Meuwissen THE (2011) Using the genomic relationship matrix to predict the accuracy of genomic selection. J Anim Breed Genet 128:409–421. https://doi.org/10.1111/j.1439-0388.2011.00964.x
Article CAS PubMed Google Scholar
Elsen J-M (2017) An analytical framework to derive the expected precision of genomic selection. Genet Sel Evol 49:95. https://doi.org/10.1186/s12711-017-0366-6
Article PubMed PubMed Central Google Scholar
Norman A, Taylor J, Edwards J, Kuchel H (2018) Optimising genomic selection in wheat: effect of marker density, population size and population structure on prediction accuracy. G3 8:2889. https://doi.org/10.1534/g3.118.200311
Article PubMed PubMed Central Google Scholar
Tayeh N, Klein A, Le Paslier M-C et al (2015) Genomic prediction in pea: effect of marker density and training population size and composition on prediction accuracy. Front Plant Sci 6:941. https://doi.org/10.3389/fpls.2015.00941
Article PubMed PubMed Central Google Scholar
Rincent R, Laloë D, Nicolas S et al (2012) Maximizing the reliability of genomic selection by optimizing the calibration set of reference individuals: comparison of methods in two diverse groups of maize inbreds (Zea mays L.). Genetics 192:715–728. https://doi.org/10.1534/genetics.112.141473
Article CAS PubMed PubMed Central Google Scholar
Rincent R, Charcosset A, Moreau L (2017) Predicting genomic selection efficiency to optimize calibration set and to assess prediction accuracy in highly structured populations. Theor Appl Genet 130:2231–2247. https://doi.org/10.1007/s00122-017-2956-7
Article CAS PubMed PubMed Central Google Scholar
Mangin B, Rincent R, Rabier C-E et al (2019) Training set optimization of genomic prediction by means of EthAcc. PLoS One 14:e0205629. https://doi.org/10.1371/journal.pone.0205629
Article CAS PubMed PubMed Central Google Scholar
Pszczola M, Strabel T, Mulder HA, Calus MPL (2012) Reliability of direct genomic values for animals with different relationships within and to the reference population. J Dairy Sci 95:389–400. https://doi.org/10.3168/jds.2011-4338
Article CAS PubMed Google Scholar
Habier D, Tetens J, Seefried F-R et al (2010) The impact of genetic relationship information on genomic breeding values in German Holstein cattle. Genet Sel Evol 42:5. https://doi.org/10.1186/1297-9686-42-5
Article CAS PubMed PubMed Central Google Scholar
Lorenz AJ, Smith KP (2015) Adding genetically distant individuals to training populations reduces genomic prediction accuracy in barley. Crop Sci 55:2657–2667. https://doi.org/10.2135/cropsci2014.12.0827
Article CAS Google Scholar
Lorenz AJ (2013) Resource allocation for maximizing prediction accuracy and genetic gain of genomic selection in plant breeding: a simulation experiment. G3 3:481–491. https://doi.org/10.1534/g3.112.004911
Article PubMed PubMed Central Google Scholar
Jarquin D, Howard R, Crossa J et al (2020) Genomic prediction enhanced sparse testing for multi-environment trials. G3 10:2725. https://doi.org/10.1534/g3.120.401349
Article CAS PubMed PubMed Central Google Scholar
Grattapaglia D, Resende MV (2011) Genomic selection in forest tree breeding. Tree Genet Genomes 7:241–255. https://doi.org/10.1007/s11295-010-0328-4
Article Google Scholar
Hickey JM, Dreisigacker S, Crossa J et al (2014) Evaluation of genomic selection training population designs and genotyping strategies in plant breeding programs using simulation. Crop Sci 54:1476–1488. https://doi.org/10.2135/cropsci2013.03.0195
Article Google Scholar
Meuwissen TH (2009) Accuracy of breeding values of “unrelated” individuals predicted by dense SNP genotyping. Genet Sel Evol 41:35. https://doi.org/10.1186/1297-9686-41-35
Article CAS PubMed PubMed Central Google Scholar
Mackay IJ, Caligari PDS (1999) Major errors in data and their effect on response to selection. Crop Sci 39:cropsci1999.0011183X003900020016x. https://doi.org/10.2135/cropsci1999.0011183X003900020016x
Article Google Scholar
Israel C, Weller JI (2000) Effect of misidentification on genetic gain and estimation of breeding value in dairy cattle populations. J Dairy Sci 83:181–187. https://doi.org/10.3168/jds.S0022-0302(00)74869-7
Article CAS PubMed Google Scholar
Breseghello F, de Mello RN, Pinheiro PV et al (2021) Building the Embrapa rice breeding dataset for efficient data reuse. Crop Sci 61:3445–3457. https://doi.org/10.1002/csc2.20550
Juanillas V, Dereeper A, Beaume N et al (2019) Rice galaxy: an open resource for plant science. GigaScience 8:giz028. https://doi.org/10.1093/gigascience/giz028
Article PubMed PubMed Central Google Scholar
Akdemir D, Isidro-Sánchez J (2019) Design of training populations for selective phenotyping in genomic prediction. Sci Rep 9:1446. https://doi.org/10.1038/s41598-018-38081-6
Article CAS PubMed PubMed Central Google Scholar
Ben-Sadoun S, Rincent R, Auzanneau J et al (2020) Economical optimization of a breeding scheme by selective phenotyping of the calibration set in a multi-trait context: application to bread making quality. Theor Appl Genet 133:2197–2212. https://doi.org/10.1007/s00122-020-03590-4
Article CAS PubMed Google Scholar
Rasheed A, Hao Y, Xia X et al (2017) Crop breeding chips and genotyping platforms: progress, challenges, and perspectives. Mol Plant 10:1047–1064. https://doi.org/10.1016/j.molp.2017.06.008
Article CAS PubMed Google Scholar
Gorjanc G, Dumasy J-F, Gonen S et al (2017) Potential of low-coverage genotyping-by-sequencing and imputation for cost-effective genomic selection in biparental segregating populations. Crop Sci 57:1404–1420. https://doi.org/10.2135/cropsci2016.08.0675
Article CAS Google Scholar
Cobb J, Rafiqul M, Kumar Katiyar S et al (2020) The evolution of a revolution: re-designing green revolution breeding programs in Asia and Africa to increase rates of genetic gain. [W020]. PAG, public, p 9
Google Scholar
Collard BCY, Gregorio GB, Thomson MJ et al (2019) Transforming rice breeding: re-designing the irrigated breeding pipeline at the international rice research institute (IRRI). Crop Breed Genet Genomics 1:e190008. https://doi.org/10.20900/cbgg20190008
Article Google Scholar
Thomson MJ, Singh N, Dwiyanti MS et al (2017) Large-scale deployment of a rice 6K SNP array for genetics and breeding applications. Rice 10:40. https://doi.org/10.1186/s12284-017-0181-2
Article PubMed PubMed Central Google Scholar
Habier D, Fernando RL, Garrick DJ (2013) Genomic BLUP decoded: a look into the black box of genomic prediction. Genetics 194:597–607. https://doi.org/10.1534/genetics.113.152207
Article CAS PubMed PubMed Central Google Scholar
Maruyama K (1989) Using rapid generation advance with single seed descent in rice breeding. International Rice Research Institute, pp 253–259
Google Scholar
De Pauw RM, Clarke JM (1976) Acceleration of generation advancement in spring wheat. Euphytica 25:415–418. https://doi.org/10.1007/BF00041574
Article Google Scholar
McCouch S, Baute GJ, Bradeen J et al (2013) Feeding the future. Nature 499:23. https://doi.org/10.1038/499023a
Article CAS PubMed Google Scholar
Cowling WA (2013) Sustainable plant breeding. Plant Breed 132:1–9. https://doi.org/10.1111/pbr.12026
Article Google Scholar
Gorjanc G, Jenko J, Hearne SJ, Hickey JM (2016) Initiating maize pre-breeding programs using genomic selection to harness polygenic variation from landrace populations. BMC Genomics 17:30. https://doi.org/10.1186/s12864-015-2345-z
Article CAS PubMed PubMed Central Google Scholar
Yu X, Li X, Guo T et al (2016) Genomic prediction contributing to a promising global strategy to turbocharge gene banks. Nat Plants 2:16150. https://doi.org/10.1038/nplants.2016.150
Article CAS PubMed Google Scholar
Tanaka R, Iwata H (2018) Bayesian optimization for genomic selection: a method for discovering the best genotype among a large number of candidates. Theor Appl Genet 131:93–105. https://doi.org/10.1007/s00122-017-2988-z
Article PubMed Google Scholar
Wang DR, Agosto-Pérez FJ, Chebotarov D et al (2018) An imputation platform to enhance integration of rice genetic resources. Nat Commun 9:3519. https://doi.org/10.1038/s41467-018-05538-1
Article CAS PubMed PubMed Central Google Scholar
Melchinger AE, Gumber RK (1998) Overview of heterosis and heterotic groups in agronomic crops. In: Lamkey KR, Staub JE (eds) Concepts and breeding of heterosis in crop plants, vol 24. CSSA, Madison, WI, pp 29–44
Google Scholar
Reif JC, Melchinger AE, Xia XC et al (2003) Use of SSRs for establishing heterotic groups in subtropical maize. Theor Appl Genet 107:947–957. https://doi.org/10.1007/s00122-003-1333-x
Article CAS PubMed Google Scholar
Ouyang Y, Liu Y-G, Zhang Q (2010) Hybrid sterility in plant: stories from rice. Curr Opin Plant Biol 13:186–192. https://doi.org/10.1016/j.pbi.2010.01.002
Article PubMed Google Scholar
Xie F, He Z, Esguerra MQ et al (2014) Determination of heterotic groups for tropical Indica hybrid rice germplasm. Theor Appl Genet 127:407–417. https://doi.org/10.1007/s00122-013-2227-1
Article Google Scholar
Beukert U, Li Z, Liu G et al (2017) Genome-based identification of heterotic patterns in rice. Rice 10:22. https://doi.org/10.1186/s12284-017-0163-4
Article PubMed PubMed Central Google Scholar
Zhao Y, Li Z, Liu G et al (2015) Genome-based establishment of a high-yielding heterotic pattern for hybrid wheat breeding. Proc Natl Acad Sci 112:15624–15629. https://doi.org/10.1073/pnas.1514547112
Article CAS PubMed PubMed Central Google Scholar
Araus JL, Kefauver SC, Zaman-Allah M et al (2018) Translating high-throughput phenotyping into genetic gain. Trends Plant Sci 23:451–466. https://doi.org/10.1016/j.tplants.2018.02.001
Article CAS PubMed PubMed Central Google Scholar
Araus JL, Cairns JE (2014) Field high-throughput phenotyping: the new crop breeding frontier. Trends Plant Sci 19:52–61. https://doi.org/10.1016/j.tplants.2013.09.008
Article CAS PubMed Google Scholar
Pauli D, Chapman SC, Bart R et al (2016) The quest for understanding phenotypic variation via integrated approaches in the field environment. Plant Physiol 172:622–634. https://doi.org/10.1104/pp.16.00592
Article CAS PubMed PubMed Central Google Scholar
Rutkoski J, Poland J, Mondal S et al (2016) Canopy temperature and vegetation indices from high-throughput phenotyping improve accuracy of pedigree and genomic selection for grain yield in wheat. G3 6:2799–2808. https://doi.org/10.1534/g3.116.032888
Article PubMed PubMed Central Google Scholar
Juliana P, Montesinos-López OA, Crossa J et al (2019) Integrating genomic-enabled prediction and high-throughput phenotyping in breeding for climate-resilient bread wheat. Theor Appl Genet 132:177–194. https://doi.org/10.1007/s00122-018-3206-3
Article CAS PubMed Google Scholar
Rincent R, Charpentier J-P, Faivre-Rampant P et al (2018) Phenomic selection is a low-cost and high-throughput method based on indirect predictions: proof of concept on wheat and poplar. G3 8(12):3961–3972. https://doi.org/10.1534/g3.118.200760
Article CAS PubMed PubMed Central Google Scholar
Lane HM, Murray SC, Montesinos-López OA et al (2020) Phenomic selection and prediction of maize grain yield from near-infrared reflectance spectroscopy of kernels. Plant Phenome J 3:e20002. https://doi.org/10.1002/ppj2.20002
Article Google Scholar
Xu Y (2016) Envirotyping for deciphering environmental impacts on crop plants. Theor Appl Genet 129:653–673. https://doi.org/10.1007/s00122-016-2691-5
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgments

The authors are grateful to Adam Famoso and Flavio Breseghello for their valuable comments and comprehensive review of the chapter. We also thank the irrigated rice team at IRRI (Rose Imee Zhella Morantte, Vitaliano Lopena, Holden Verdeprado, and Juan David Arbelaez) for their help with data acquisition and management for the example from the IRRI breeding program. We thank the IRRI Bangladesh team, in particular Rafiqul M. Islam, and our partners in Bangladesh for their support in obtaining the phenotypic data for the training set presented in the example.

Funding

This work was sponsored and funded by the Bill and Melinda Gates Foundation through the Accelerated Genetic Gain in Rice (AGGRi) Alliance project.

Author information

Authors and Affiliations

CIRAD, UMR AGAP Institut, Montpellier, France
Jérôme Bartholomé
AGAP Institut, Univ Montpellier, CIRAD, INRAE, Montpellier SupAgro, Montpellier, France
Jérôme Bartholomé
Rice Breeding Platform, International Rice Research Institute, Manila, Philippines
Jérôme Bartholomé & Parthiban Thathapalli Prakash
RiceTec Inc, Alvin, TX, USA
Joshua N. Cobb

Authors

Jérôme Bartholomé
Parthiban Thathapalli Prakash
Joshua N. Cobb

Corresponding author

Correspondence to Jérôme Bartholomé.

Editor information

Editors and Affiliations

CIRAD, UMR AGAP Institut, Montpellier, France
Nourollah Ahmadi
UMR AGAP Institut, CIRAD, Montpellier, France
Jérôme Bartholomé

1 Electronic Supplementary Material

Data 1

Data from the irrigated breeding program provided as a real case example. IRRI_GS_data (ZIP 495 kb)

Data 2

R functions for the genomic prediction analysis pipeline currently used at IRRI. IRRI_GS_functions (ZIP 10 kb)

Data 3

R scripts for the genomic prediction analysis pipeline currently used at IRRI. IRRI_GS_script (ZIP 11 kb)

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this protocol

Cite this protocol

Bartholomé, J., Prakash, P.T., Cobb, J.N. (2022). Genomic Prediction: Progress and Perspectives for Rice Improvement. In: Ahmadi, N., Bartholomé, J. (eds) Genomic Prediction of Complex Traits. Methods in Molecular Biology, vol 2467. Humana, New York, NY. https://doi.org/10.1007/978-1-0716-2205-6_21

DOI: https://doi.org/10.1007/978-1-0716-2205-6_21
Published: 22 April 2022
Publisher Name: Humana, New York, NY
Print ISBN: 978-1-0716-2204-9
Online ISBN: 978-1-0716-2205-6
eBook Packages: Springer Protocols
