Theoretical and Applied Genetics, Volume 116, Issue 6, pp 815–824

Genomewide selection in oil palm: increasing selection gain per unit time and cost with small populations

Authors

  • C. K. Wong
    • Advanced Agriecological Research Sdn Bhd
    • Department of Agronomy and Plant Genetics, University of Minnesota
Original Paper

DOI: 10.1007/s00122-008-0715-5

Cite this article as:
Wong, C.K. & Bernardo, R. Theor Appl Genet (2008) 116: 815. doi:10.1007/s00122-008-0715-5

Abstract

Oil palm (Elaeis guineensis Jacq.) requires 19 years per cycle of phenotypic selection. The use of molecular markers may reduce the generation interval and the cost of oil-palm breeding. Our objectives were to compare, by simulation, the response to phenotypic selection, marker-assisted recurrent selection (MARS), and genomewide selection with small population sizes in oil palm, and assess the efficiency of each method in terms of years and cost per unit gain. Markers significantly associated with the trait were used to calculate the marker scores in MARS, whereas all markers were used (without significance tests) to calculate the marker scores in genomewide selection. Responses to phenotypic selection and genomewide selection were consistently greater than the response to MARS. With population sizes of N = 50 or 70, responses to genomewide selection were 4–25% larger than the corresponding responses to phenotypic selection, depending on the heritability and number of quantitative trait loci. Cost per unit gain was 26–57% lower with genomewide selection than with phenotypic selection when markers cost US $1.50 per data point, and 35–65% lower when markers cost $0.15 per data point. With population sizes of N = 50 or 70, time per unit gain was 11–23 years with genomewide selection and 14–25 years with phenotypic selection. We conclude that for a realistic yet relatively small population size of N = 50 in oil palm, genomewide selection is superior to MARS and phenotypic selection in terms of gain per unit cost and time. Our results should be generally applicable to other tree species that are characterized by long generation intervals, high costs of maintaining breeding plantations, and small population sizes in selection programs.

Introduction

Oil palm (Elaeis guineensis Jacq.) has emerged as the world’s most important oil crop, with the world production of palm oil (25%) surpassing that of soybean [Glycine max (L.) Merr.] (24%) (Mielke 2007; Yusof 2007). Oil palm is a diploid (2n = 32), cross-pollinated, perennial tropical tree crop whose economic life span ranges from 25 to 30 years. Palm oil is mainly used as food (90%) and oleochemical substitutes for mineral oil (10%) (Corley and Tinker 2003, p. 474; Yusof 2007). Palm oil is the cheapest vegetable oil available, and the current interest in biodiesel has led to a high demand and all-time-high prices for the commodity (Yusof 2007).

Oil-palm breeding is limited primarily by the long generation interval of the species and the cost of evaluating and maintaining palms in a breeding program (Soh et al. 1990; Rance et al. 2001). One cycle of selection, which includes phenotypic evaluation of testcrosses and intercrossing of the best palms to form the next cycle, requires approximately 19 years. Oil-palm breeding is expensive not only because of the length of time required per cycle, but also because large planting areas are required. The common planting density for oil palm is 138 palms ha−1, with 16 palms plot−1 (Soh et al. 1990). A typical experiment, in which testcross progenies of 25 parental palms are evaluated in four replications, requires 12 ha. Methods that reduce the duration and size of the breeding program would improve the efficiency of oil-palm breeding (Rance et al. 2001).
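The 12 ha figure follows from the planting density and plot size quoted above. A minimal sketch of the arithmetic, assuming one testcross progeny per parental palm and one plot per progeny in each replication (the variable names are illustrative only):

```python
# Land required for a typical oil-palm testcross trial (values from the text).
PALMS_PER_HA = 138     # common planting density
PALMS_PER_PLOT = 16    # palms in one plot
n_parents = 25         # parental palms whose testcross progenies are evaluated
n_reps = 4             # replications

# Assumption: each progeny occupies exactly one plot in each replication.
n_plots = n_parents * n_reps
n_palms = n_plots * PALMS_PER_PLOT
area_ha = n_palms / PALMS_PER_HA

print(f"{n_plots} plots, {n_palms} palms, about {area_ha:.1f} ha")
```

Under these assumptions the trial needs 1,600 palms, which rounds up to the 12 ha stated in the text.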

Other plant species may serve as models for the application of molecular markers in oil-palm breeding. Molecular markers have increased the gain per unit time in maize (Zea mays L.), which, like oil palm, is a cross-pollinated crop with distinct heterotic groups (Hallauer 1990). A strategy for using markers to increase gain per unit time in maize has been as follows (Johnson 2004). First, marker-trait associations are evaluated based on maize performance in the US Corn Belt. Second, two to three cycles of selection are performed based on the markers alone in a continuous nursery in Hawaii or Puerto Rico, where phenotypic evaluations are irrelevant to performance in the US Corn Belt but where marker genotypes remain the same. On average, two to three cycles of marker-based selection in only 1 year have led to a 9% improvement in maize grain yield (Johnson 2004). Most approaches for marker-based selection in maize have involved marker-assisted recurrent selection (MARS), in which 25–30 markers with significant effects are used in calculating marker scores (Edwards and Johnson 1994; Koebner 2003). Genomewide selection, a form of marker-based selection that uses all markers by assigning breeding values to each marker (Meuwissen et al. 2001), has been recently proposed in maize (Bernardo and Yu 2007). Simulations in maize indicate genomewide selection is superior to MARS, with 18–43% larger selection responses (Bernardo and Yu 2007).

Population size is a key factor that affects the usefulness of both MARS and genomewide selection (Bernardo and Charcosset 2006; Bernardo and Yu 2007). Due to space limitations, typical population sizes (i.e., number of plants evaluated for their testcross performance) are much smaller in oil palm (∼25 palms; Soh et al. 1990; Corley and Tinker 2003, pp 191–192) than in maize (∼100–150 plants; Johnson and Mumm 1996; Johnson 2001). It is unknown whether genomewide selection with small population sizes will be superior to MARS and phenotypic selection because previous studies on genomewide selection assumed large population sizes (Meuwissen et al. 2001; Schaeffer 2006; Bernardo and Yu 2007). Furthermore, small population sizes will also reduce the ability to detect markers with significant effects and may render MARS less effective. On the other hand, the prospects for increasing gain per unit time and cost through markers are arguably greater in oil palm than in annual crops such as maize.

Our first objective was to compare, by simulation, the selection response resulting from phenotypic selection, MARS, and genomewide selection with small population sizes. Our second objective was to assess the efficiency of these selection procedures in terms of years and cost per unit gain.

Materials and methods

Overview of selection procedures

The main features of phenotypic selection, MARS, and genomewide selection are depicted in Fig. 1. The main heterotic groups in oil palm are dura (thick shell) and pisifera (shell-less) germplasm (Corley and Tinker 2003, p 145). Commercially grown oil palms have the tenera (thin shell) fruit type, which is the result of dura × pisifera matings. We assumed a common base population (Cycle 0) for each procedure. In practice, fully inbred oil palms are unavailable for making an F1 cross and the subsequent F2 population, and selection in oil palm is done in a segregating population with unknown allele frequencies. For simplicity, however, we assumed that the Cycle 0 population was a segregating F2 population (i.e., from the cross between two inbreds) with allele frequencies of 0.50 at biallelic quantitative trait loci (QTL) and marker loci. We wrote a FORTRAN program for our simulations and analysis.
Fig. 1. Genomewide selection, MARS, and phenotypic selection in oil palm. The Cycle 0 population was a segregating F2 population with allele frequencies of 0.50 (i.e., cross between two inbreds). Phenotypic selection requires testcrossing to an arbitrary tester (3 years), evaluation and selection of testcrosses [physiological immature phase (3 years) and data collection phase (7 years)], and recombination [crossing phase (3 years) and physiological immature phase (3 years)]. In MARS and genomewide selection, genotyping is performed at the physiological immature phase (3 years) and the best palms are selected and recombined to form the next cycle (3 years)

A simulation experiment was defined as a combination of population size (N), number of replications (NRep), number of QTL, and heritability. In a given repeat of a simulation experiment, the same Cycle 0 population was subjected to phenotypic selection, MARS, and genomewide selection. Testcross performance with an unrelated, arbitrary tester was modeled. In Cycle 0, testcrosses of N palms were evaluated in field trials with NRep replications (Fig. 1). The best NSel parental palms were selected based on their mean testcross performance and were random-mated to develop N palms in Cycle 1. With phenotypic selection, the procedures in Cycle 0 were repeated in Cycle 1 to form Cycle 2. As previously mentioned, a cycle of testcross selection requires 19 years, and selection was assumed completed when Cycle 2 palms were obtained (i.e., 38 years).

With MARS, selection in Cycles 1–3 was based on molecular markers only (Fig. 1). We assumed that the markers mapped to a single locus and were codominant and biallelic. The N palms in Cycle 0 were genotyped. Based on Cycle 0 phenotypic and marker data, markers with significant effects were detected and marker effects were estimated. The marker effects estimated in Cycle 0 were used to calculate marker scores for the N palms in Cycle 1. The best NSel palms were selected based on marker scores and random-mated to produce Cycle 2. The procedures were repeated until Cycle 4 palms were produced. Because testcross phenotypic evaluations are not needed with MARS, the time required per cycle is 6 years. MARS was assumed completed when the Cycle 4 palms were obtained; this requires 37 years, a duration comparable to two cycles of phenotypic selection.
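The durations quoted for the three procedures can be checked by adding up the phases described in Fig. 1. The sketch below is only a bookkeeping aid; the phase labels are shorthand chosen here, not terms defined by the authors.

```python
# Phase lengths in years, as described in the text and in the Fig. 1 caption.
PHENOTYPIC_CYCLE = {
    "testcrossing": 3,
    "immature phase of testcrosses": 3,
    "data collection": 7,
    "intercrossing of selected palms": 3,
    "immature phase of next cycle": 3,
}
MARKER_CYCLE = {
    "intercrossing of selected palms": 3,
    "immature phase (genotyping done here)": 3,
}

years_phenotypic_cycle = sum(PHENOTYPIC_CYCLE.values())
years_marker_cycle = sum(MARKER_CYCLE.values())

# Two cycles of phenotypic selection (Cycle 0 -> Cycle 2).
total_phenotypic = 2 * years_phenotypic_cycle
# One phenotypic cycle (Cycle 0 -> Cycle 1) plus three marker-only cycles.
total_marker_based = years_phenotypic_cycle + 3 * years_marker_cycle

print(years_phenotypic_cycle, years_marker_cycle)
print(total_phenotypic, total_marker_based)
```

The totals of 38 and 37 years are what make Cycle 2 of phenotypic selection and Cycle 4 of marker-based selection comparable end points.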

With genomewide selection, the procedures were similar to those with MARS except in the way that marker scores were calculated: all markers were considered simultaneously and identification of markers with significant effects was not required (Fig. 1). Breeding values for each marker were obtained by best linear unbiased prediction (Meuwissen et al. 2001; Bernardo and Yu 2007).

Each simulation experiment was repeated 1,000 times and the mean responses to phenotypic selection, MARS, and genomewide selection were expressed in units of the testcross genetic standard deviation in the base population. A different Cycle 0 population was used in each repeat, with each population differing in the locations of QTL. The markers were equally spaced in the genome and each repeat consequently had the same location of markers. Approximate least significant differences (P = 0.05) between the mean responses to phenotypic selection, MARS, and genomewide selection were obtained based on the standard deviation of the responses across the 1,000 repeats.
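The text does not give the formula used for the approximate least significant difference. One plausible reading, shown here purely as an assumption, is the usual large-sample comparison of two means, each averaged over the same number of independent repeats and sharing a common standard deviation. The standard deviation of 0.8 in the example is a hypothetical value, not a result from the study.

```python
import math

def approx_lsd(sd_response, n_repeats, z=1.96):
    """Approximate LSD (two-sided, 5% level by default) between two means.

    Assumes each mean is the average of n_repeats independent repeats and
    that both methods have the same standard deviation of response.
    """
    standard_error_of_difference = sd_response * math.sqrt(2.0 / n_repeats)
    return z * standard_error_of_difference

# Hypothetical standard deviation of 0.8 genetic SD units across 1,000 repeats.
print(round(approx_lsd(0.8, 1000), 3))
```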

N, NRep, and NSel for typical versus modified schemes

We considered typical values of N and NRep for oil palm as well as modified values of N and NRep that could possibly lead to greater selection responses. In accordance with common practices in oil-palm breeding, typical schemes had N = 15, 25 or 35 and NRep = 4 (Corley and Tinker 2003, pp 191–192; Soh et al. 1990). In the modified schemes, the population sizes were increased twofold (N = 30, 50 or 70) while the number of replications was reduced by half (NRep = 2) to maintain the same total numbers of palms evaluated for their testcross phenotypes in Cycle 0. Genomewide selection required an estimate of genetic variance (VG). To allow the estimation of VG, a minimum of NRep = 2 was required with genomewide selection. The number of selected palms was constant at NSel = 4, according to the rule of thumb that the response is largest when the number of selected individuals is roughly equal to the total number of cycles over which selection is performed (Bernardo et al. 2006).

Genetic model

A trait controlled by NQTL = 20, 40, 60, or 80 QTL, each with two alleles was considered. The QTL were randomly located among 16 chromosomes according to a uniform distribution across the entire genome. The sizes of the chromosomes and of the entire genome (1,743 cM) corresponded to those in a published oil-palm linkage map (Billotte et al. 2005). For MARS, a total of 100 equally spaced markers were used in accordance with previous studies indicating that responses to MARS were largest when 64–128 markers were used (Bernardo and Yu 2007). For genomewide selection, our preliminary studies suggested that responses to genomewide selection in oil palm were largest when the number of markers ranged from 120 to 160 (results not shown). A total of 140 equally spaced markers were therefore used in genomewide selection. The two marker alleles at the jth locus were denoted by Mj and mj. In simulating meiosis and crossing over, the Kosambi mapping function was used to determine the correspondence between expected recombination frequencies and map distances.
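For readers unfamiliar with it, the Kosambi mapping function converts a map distance into an expected recombination frequency. The sketch below states the standard formula and its inverse; the function names are illustrative and the code is not taken from the authors' FORTRAN program.

```python
import math

def kosambi_recombination(map_distance_cm):
    """Expected recombination frequency for a map distance in centimorgans."""
    d = map_distance_cm / 100.0           # convert cM to Morgans
    return 0.5 * math.tanh(2.0 * d)

def kosambi_distance_cm(recombination_frequency):
    """Inverse function: map distance in cM for a recombination frequency < 0.5."""
    r = recombination_frequency
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))

for cm in (5, 10, 25, 50):
    print(cm, round(kosambi_recombination(cm), 4))
```

The recombination frequency stays close to the map distance for short intervals and approaches 0.50 for distant loci, which is why closely spaced markers retain linkage disequilibrium with nearby QTL over several cycles.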

Testcross performance with an unrelated, arbitrary tester was simulated by modeling the average testcross effect of an allele substitution (Bernardo 2002, p. 65). The testcross effects of individual QTL were modeled according to a geometric distribution (Lande and Thompson 1990; Bernardo and Yu 2007) so that few QTL had large effects and many QTL had smaller effects. Testcross means in a cross-pollinated crop behave in an additive manner even if dominance is present (Bernardo 2002, p. 78). Testcross genetic effects were therefore assumed to be purely additive.
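To illustrate what a geometric distribution of QTL effects looks like, the sketch below uses one common parameterization, in which the effect of the kth QTL is a^k with a = (L − 1)/(L + 1) for L QTL. This specific constant is an assumption made here for illustration; the text only states that a geometric distribution was used.

```python
def geometric_qtl_effects(n_qtl):
    """Effects a**k for k = 1..n_qtl, with an assumed ratio a = (L - 1)/(L + 1)."""
    a = (n_qtl - 1) / (n_qtl + 1)
    return [a ** k for k in range(1, n_qtl + 1)]

effects = geometric_qtl_effects(20)
total = sum(effects)
shares = [e / total for e in effects]   # each QTL's share of the summed effects

print("largest share :", round(shares[0], 3))
print("smallest share:", round(shares[-1], 3))
print("share of the five largest QTL:", round(sum(shares[:5]), 3))
```

Each successive QTL has a smaller effect than the one before it, so a handful of loci account for a disproportionate part of the total while most loci contribute little.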

Random non-genetic effects, which had a normal distribution with a mean of zero, were added to the genotypic values to obtain phenotypic values. Testcross heritability was H = VG/(VG + VE), where VG was the testcross genetic variance in Cycle 0 and VE was the non-genetic variance. The random non-genetic effects were scaled so that the testcross heritability in Cycle 0 was H = 0.20 or 0.50 on an individual-replication basis.
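Because heritability is defined on an individual-replication basis, the non-genetic variance needed for a target H follows directly from H = VG/(VG + VE). The sketch below also shows the heritability of testcross means over NRep replications, using the standard expression VG/(VG + VE/NRep); that second formula is not stated in the text and assumes non-genetic effects are independent across replications.

```python
def nongenetic_variance(var_g, h_single_rep):
    """VE that gives the target single-replication heritability H."""
    return var_g * (1.0 - h_single_rep) / h_single_rep

def entry_mean_heritability(var_g, var_e, n_rep):
    """Heritability of means over n_rep replications (assumed formula)."""
    return var_g / (var_g + var_e / n_rep)

var_g = 1.0  # genetic variance scaled to 1 for illustration
for h in (0.20, 0.50):
    var_e = nongenetic_variance(var_g, h)
    for n_rep in (4, 2):
        h_mean = entry_mean_heritability(var_g, var_e, n_rep)
        print(f"H = {h:.2f}, NRep = {n_rep}: VE = {var_e:.2f}, "
              f"entry-mean heritability = {h_mean:.3f}")
```

Halving the number of replications lowers the precision of each testcross mean, which is the price paid in the modified schemes for doubling N.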

Statistical procedure in MARS and genomewide selection

With MARS, marker effects associated with the trait were identified and estimated only in Cycle 0. First, marker effects at the jth locus were estimated by multiple regression of testcross phenotypic values on the number of Mj marker alleles on a chromosome-by-chromosome basis. Significant markers (P = 0.20) on each chromosome were identified by backward elimination. The significance level of P = 0.20 was used because relaxed significance levels have been found to maximize the response to MARS (Hospital et al. 1997; Johnson 2001). Next, multiple regression coefficients were obtained for all the significant markers (Bernardo 2004; Bernardo and Charcosset 2006). The multiple regression coefficients were used to calculate marker scores in Cycles 1–3 (Hospital et al. 1997). Marker effects were assumed fixed in MARS.

With genomewide selection, the value associated with each marker was obtained by best linear unbiased prediction (BLUP). The BLUP linear model was
$$ {\mathbf{y}} = \mu \mathbf{1} + {\mathbf{Xg}} + {\mathbf{e}} $$
where y was an N × 1 vector of testcross phenotypic means of the individuals in Cycle 0; μ was the overall testcross mean of the individuals in Cycle 0; 1 was an N × 1 vector with all elements equal to 1; X was an N × 140 design matrix (where the jth column corresponded to the jth marker locus) with elements equal to 1 if the individual in Cycle 0 was homozygous for the Mj marker allele, −1 if it was homozygous for the mj marker allele, and 0 if it was heterozygous; g was a 140 × 1 vector of breeding values associated with the Mj marker alleles; and e was an N × 1 vector of residual effects. VG and VE were estimated by analysis of variance of the testcross phenotypic values in Cycle 0. The variance of the breeding value at each of the marker loci was assumed equal to the estimated VG divided by the number of marker loci (Meuwissen et al. 2001; 140 marker loci in genomewide selection in this study). The coefficients in g were used to calculate marker scores in Cycles 1–3 (Bernardo and Yu 2007). Marker effects were assumed random in genomewide selection.
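A minimal sketch of how marker breeding values could be obtained from this model is given below, using Henderson's mixed-model equations with a single shrinkage parameter equal to the residual variance divided by the per-marker variance VG/m. It is not the authors' FORTRAN program. Assumptions made here: the residual variance is supplied on an entry-mean basis (`var_e_mean`, for example VE/NRep), all function names are hypothetical, and a small pure-Python solver stands in for a linear algebra library. The toy data have four palms and two markers only to keep the example readable.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def blup_marker_effects(X, y, var_g, var_e_mean):
    """Return (mu, g) for y = mu*1 + X g + e with var(g_j) = var_g / m."""
    n, m = len(X), len(X[0])
    lam = var_e_mean / (var_g / m)          # shrinkage added to X'X diagonal
    size = m + 1
    lhs = [[0.0] * size for _ in range(size)]
    rhs = [0.0] * size
    lhs[0][0] = float(n)                    # 1'1
    rhs[0] = float(sum(y))                  # 1'y
    for j in range(m):
        col_sum = sum(X[i][j] for i in range(n))
        lhs[0][j + 1] = lhs[j + 1][0] = float(col_sum)       # 1'X and X'1
        rhs[j + 1] = sum(X[i][j] * y[i] for i in range(n))   # X'y
        for k in range(m):
            lhs[j + 1][k + 1] = float(sum(X[i][j] * X[i][k] for i in range(n)))
        lhs[j + 1][j + 1] += lam
    solution = solve_linear(lhs, rhs)
    return solution[0], solution[1:]

def marker_scores(X_new, g):
    """Marker score of each candidate: sum over loci of genotype code * effect."""
    return [sum(x * gj for x, gj in zip(row, g)) for row in X_new]

# Toy Cycle 0 data: genotype codes 1, 0, -1 and testcross means.
X0 = [[1, 0], [-1, 1], [0, -1], [1, 1]]
y0 = [2.0, 0.5, 1.0, 2.5]
mu_hat, g_hat = blup_marker_effects(X0, y0, var_g=1.0, var_e_mean=0.5)

# Toy Cycle 1 candidates, scored with the Cycle 0 effects; keep the best two.
scores = marker_scores([[1, 1], [-1, -1], [0, 1]], g_hat)
best = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:2]
print(mu_hat, g_hat, scores, best)
```

Because every marker effect is shrunk toward zero rather than tested for significance, the system remains solvable even when the number of markers exceeds N.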

Costs and time frame of phenotypic selection, MARS and genomewide selection

One cycle of oil-palm phenotypic selection, which requires 19 years, could be broken down into phenotypic evaluation of testcrosses (13 years), intercrossing of the best palms (3 years) to form the next cycle, and the time required for physiological maturity (3 years). Phenotypic evaluation of testcrosses could be further divided into the testcrossing phase (3 years), physiological immature phase (3 years), and data collection phase (7 years). Data on performance per se (i.e., non-testcross) for each individual in the population (dura or pisifera) was not necessary because selection was for testcross performance (tenera) only. The following costs (calculated from 2007 budget estimates from a private oil-palm research company; C.K. Wong, 2007, unpublished) in US dollars were assumed: $150 entry−1 year−1 in the crossing phase; $75 entry−1 year−1 replicate−1 in the physiological immature phase; and $200 entry−1 year−1 replicate−1 in the data collection phase.

MARS and genomewide selection do not require phenotypic evaluation of testcrosses after Cycle 0. Each subsequent cycle requires 6 years: intercrossing of the best palms to form the next-cycle population (3 years), followed by growth to physiological maturity (3 years). Within these 6 years, palms can be genotyped and selected based on significant markers (MARS) or all markers (genomewide selection). The cost of each marker data point was assumed to be US $0.15 or $1.50. The lower cost of $0.15 per data point was consistent with the current cost of genotyping for single nucleotide polymorphisms (SNP; Schaeffer 2006) or Diversity Arrays Technology (DArT) markers (Kilian et al. 2005). Linkage maps for oil palm are unavailable for SNP or DArT markers but have been constructed based on restriction fragment length polymorphisms (RFLPs; Mayes et al. 1997), amplified fragment length polymorphisms (Chua et al. 2001), and simple sequence repeats (SSRs; Billotte et al. 2005). The higher cost of $1.50 was consistent with the approximate cost of genotyping for SSRs in commercial molecular marker laboratories in the US (A. Kahler, pers. comm. 2007; R. Rasmusson, pers. comm. 2007).

Total projected costs of each scheme for MARS and genomewide selection were calculated based on the following: N; NRep; number of markers used (100 in MARS and 140 in genomewide selection); cost of crossing, field maintenance, and data collection for palms; and cost per marker data point. At the end of selection, MARS and genomewide selection require 37 years (i.e., 19 years for one cycle of phenotypic selection plus 18 years for three cycles of MARS or genomewide selection). Breeding efficiency of each method was expressed in cost per unit gain and time per unit gain.
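The sketch below shows how two of the cost components could be tallied from the unit costs given above. It is a partial illustration, not a reconstruction of the authors' budget: it assumes the crossing-phase rate applies to the 3-year testcrossing phase, that N palms are genotyped in Cycle 0 and in each of Cycles 1–3, and it leaves out intercrossing costs because the text does not state how many entries are carried through those phases.

```python
# Unit costs in US dollars, taken from the text.
COST_CROSSING = 150   # per entry per year, crossing phase
COST_IMMATURE = 75    # per entry per year per replicate, immature phase
COST_DATA = 200       # per entry per year per replicate, data collection

def cycle0_testcross_cost(n, n_rep):
    """Testcrossing (3 yr) + immature phase (3 yr) + data collection (7 yr)."""
    crossing = 3 * COST_CROSSING * n                 # assumed rate for testcrossing
    immature = 3 * COST_IMMATURE * n * n_rep
    data_collection = 7 * COST_DATA * n * n_rep
    return crossing + immature + data_collection

def genotyping_cost(n, n_markers, cost_per_point, n_rounds=4):
    """Assumption: N palms genotyped in Cycle 0 and in each of Cycles 1-3."""
    return n * n_markers * cost_per_point * n_rounds

def cost_per_unit_gain(total_cost, response):
    """Total cost divided by response in genetic standard deviation units."""
    return total_cost / response

# Modified scheme with N = 50 and NRep = 2, genomewide selection (140 markers).
phenotyping = cycle0_testcross_cost(50, 2)
for price in (1.50, 0.15):
    markers = genotyping_cost(50, 140, price)
    print(f"${price:.2f} per data point: phenotyping ${phenotyping:,}, "
          f"genotyping ${markers:,.0f}")
```

Even at the higher marker price, field evaluation in Cycle 0 dominates the budget in this sketch, so the cost advantage of marker-based cycles comes mainly from avoiding repeated testcross trials.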

Results

Response to phenotypic selection, MARS and genomewide selection for typical schemes

As expected, the responses to phenotypic selection, MARS and genomewide selection increased as population size (N) and heritability (H) increased (Table 1). The responses decreased as the number of QTL (NQTL) increased.
Table 1. Response (in units of the genetic standard deviation in Cycle 0) to phenotypic selection, MARS, and genomewide selection in typical schemes (four replications) with different numbers of QTL, population sizes (N), and trait heritabilities (H)

Number                               N = 15              N = 25              N = 35
of QTL  Method                Cycle  H = 0.20  H = 0.50  H = 0.20  H = 0.50  H = 0.20  H = 0.50
20      Phenotypic selection  2      1.36 (a)  1.71      1.72      2.12      1.86      2.39
        MARS                  4      0.58      0.79      0.73      1.04      1.22      1.64
        Genomewide selection  4      1.24      1.63      1.67      2.22*     2.04*     2.63*
        RMARS:PS (b)                 42%       46%       42%       49%       65%       69%
        RGWS:PS (c)                  91%       95%       97%       105%      110%      110%
40      Phenotypic selection  2      1.31      1.65      1.62      2.07      1.83      2.32
        MARS                  4      0.44      0.65      0.60      0.98      1.08      1.42
        Genomewide selection  4      1.14      1.42      1.50      2.07      1.93*     2.53*
        RMARS:PS                     33%       40%       37%       48%       59%       61%
        RGWS:PS                      86%       86%       93%       101%      105%      109%
60      Phenotypic selection  2      1.20      1.58      1.53      2.00      1.76      2.23
        MARS                  4      0.41      0.59      0.57      0.85      0.96      1.36
        Genomewide selection  4      0.92      1.33      1.46      2.00      1.82*     2.37*
        RMARS:PS                     34%       37%       37%       43%       55%       61%
        RGWS:PS                      77%       84%       95%       100%      103%      106%
80      Phenotypic selection  2      1.23      1.56      1.52      1.96      1.67      2.19
        MARS                  4      0.33      0.50      0.53      0.80      0.89      1.23
        Genomewide selection  4      0.91      1.31      1.33      1.91      1.68      2.09
        RMARS:PS                     27%       32%       35%       41%       53%       56%
        RGWS:PS                      74%       84%       88%       97%       101%      95%

* For a given number of QTL, heritability, and population size, * indicates the response to genomewide selection (Cycle 4) was significantly greater (P = 0.05) than the response to phenotypic selection (Cycle 2). The least significant difference at the 5% level was 0.06–0.07

(a) For a given number of QTL, heritability, and population size, bold fonts indicate the largest numerical response obtained across the selection methods

(b) RMARS:PS is the ratio between the response to MARS and the response to phenotypic selection

(c) RGWS:PS is the ratio between the response to genomewide selection and the response to phenotypic selection

For the typical schemes, in which testcrosses were evaluated in NRep = 4 replications, the responses to phenotypic selection and genomewide selection were consistently greater than the response to MARS (Table 1). This result was obtained across all population sizes (N = 15, 25, 35), levels of heritability (H = 0.20 and 0.50), and number of QTL (NQTL = 20 to 80) studied. The response to phenotypic selection (in units of the testcross genetic standard deviation in the base population) ranged from 1.20 to 2.39. In comparison, the response to genomewide selection ranged from 0.91 to 2.63, whereas the response to MARS ranged from 0.33 to 1.64.

All responses to MARS were statistically smaller than the corresponding responses to phenotypic selection, with ratios of response to MARS and phenotypic selection (RMARS:PS) ranging from 27 to 69%. When the population size was N = 35 with 20–60 QTL controlling the trait, the responses to genomewide selection were statistically larger than the responses to phenotypic selection, with ratios of response to genomewide selection and phenotypic selection (RGWS:PS) ranging from 103 to 110%. In contrast, when the population sizes were N = 15 or 25, responses to genomewide selection were mostly either statistically smaller than or equal to the responses to phenotypic selection, with RGWS:PS ratios ranging from 74 to 105%.
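The ratios reported in Table 1 can be recomputed from the tabulated responses. A short sketch for one cell (20 QTL, N = 35, H = 0.50) follows; because the published percentages were presumably calculated from unrounded responses, recomputing from the two-decimal values can differ by a percentage point in some cells.

```python
def response_ratio_percent(response_marker_based, response_phenotypic):
    """Response to a marker-based method as a percentage of phenotypic selection."""
    return 100.0 * response_marker_based / response_phenotypic

# Table 1 values for 20 QTL, N = 35, H = 0.50.
phenotypic_cycle2 = 2.39
mars_cycle4 = 1.64
genomewide_cycle4 = 2.63

r_mars_ps = response_ratio_percent(mars_cycle4, phenotypic_cycle2)
r_gws_ps = response_ratio_percent(genomewide_cycle4, phenotypic_cycle2)
print(f"RMARS:PS = {r_mars_ps:.0f}%, RGWS:PS = {r_gws_ps:.0f}%")
```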

Response to phenotypic selection, MARS and genomewide selection for modified schemes

Population sizes of N = 50 or 70

For the modified schemes (NRep = 2) with population sizes of N = 50 or 70, at the end of selection, the responses to genomewide selection were statistically larger (4–25%) than the corresponding responses to phenotypic selection, depending on the heritability and number of QTL (Table 2). The responses to genomewide selection ranged from 1.58 to 3.24. In comparison, the responses to phenotypic selection ranged from 1.51 to 2.63. By Cycle 3 (one cycle prior to the end of selection), nearly all of the responses to genomewide selection were either statistically larger than or equal to the responses to phenotypic selection, except for NQTL = 80 and H = 0.20. As in the typical schemes (NRep = 4), gains from genomewide selection were statistically larger than the gains from MARS (results not shown).
Table 2. Response (in units of the genetic standard deviation in Cycle 0) to phenotypic selection and genomewide selection in modified schemes (two replications) with different numbers of QTL, population sizes (N), and trait heritabilities (H)

Number                               N = 30              N = 50              N = 70
of QTL  Method                Cycle  H = 0.20  H = 0.50  H = 0.20  H = 0.50  H = 0.20  H = 0.50
20      Phenotypic selection  1      0.88      1.23      1.02      1.44      1.12      1.58
                              2      1.51      2.10      1.69      2.41      1.88      2.63
        Genomewide selection  2      1.17      1.69      1.47      2.08      1.68      2.35
                              3      1.35      1.98      1.75*     2.50*     2.08*     2.87*
                              4      1.47      2.22*     1.95*     2.82*     2.35*     3.24*
        RGWS:PS (a)                  98%       106%      115%      117%      125%      123%
40      Phenotypic selection  1      0.85      1.21      0.99      1.41      1.09      1.52
                              2      1.38      1.99      1.62      2.34      1.75      2.47
        Genomewide selection  2      1.05      1.59      1.33      1.99      1.57      2.23
                              3      1.21      1.85      1.62      2.40*     1.92*     2.72*
                              4      1.30      2.06*     1.76*     2.71*     2.15*     3.08*
        RGWS:PS                      94%       103%      109%      116%      122%      125%
60      Phenotypic selection  1      0.82      1.19      0.98      1.40      1.06      1.46
                              2      1.33      1.93      1.57      2.25      1.68      2.38
        Genomewide selection  2      1.01      1.54      1.29      1.94      1.50      2.10
                              3      1.11      1.78      1.53      2.32*     1.81*     2.54*
                              4      1.16      1.96      1.67*     2.54*     2.01*     2.89*
        RGWS:PS                      87%       101%      107%      113%      120%      122%
80      Phenotypic selection  1      0.83      1.18      0.95      1.37      1.02      1.45
                              2      1.31      1.86      1.51      2.19      1.62      2.35
        Genomewide selection  2      0.98      1.49      1.24      1.86      1.44      2.07
                              3      1.07      1.72      1.43      2.23      1.72*     2.54*
                              4      1.11      1.86      1.58*     2.47*     1.91*     2.85*
        RGWS:PS                      84%       100%      104%      113%      118%      121%

* For a given number of QTL, heritability, and population size, * indicates the response to genomewide selection was significantly greater (P = 0.05) than the response to phenotypic selection (Cycle 2). The least significant difference at the 5% level was 0.06–0.08

(a) RGWS:PS is the ratio between the response to genomewide selection (Cycle 4) and the response to phenotypic selection (Cycle 2)

Population size of N = 30

For the modified schemes (NRep = 2) with a population size of N = 30, the trend of responses to genomewide selection in comparison to phenotypic selection was not obvious. The responses to genomewide selection ranged from 1.11 to 2.22. In comparison, the responses to phenotypic selection ranged from 1.31 to 2.10 (Table 2). When heritability was H = 0.50, the responses to genomewide selection were either statistically larger than or equal to the corresponding responses to phenotypic selection, with RGWS:PS ratios ranging from 100 to 106%. In contrast, when heritability was H = 0.20, the responses to genomewide selection were mostly smaller than the corresponding responses to phenotypic selection, with RGWS:PS ratios ranging from 84 to 98%.

Across the population sizes studied (N = 30, 50, and 70), the gain in phenotypic selection from Cycle 0 to Cycle 1 ranged from 0.82 to 1.58, which accounted for 59–63% of the total gain at the end of phenotypic selection (Table 2). With genomewide selection, the response from Cycle 1 to Cycle 2 was greater than the response from Cycle 2 to Cycle 3, which in turn was greater than the response from Cycle 3 to Cycle 4. At the end of genomewide selection, 48–75% of the total gain was due to phenotypic selection in Cycle 0.
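The diminishing gains per cycle can be read directly from Table 2. The sketch below works through one column (20 QTL, N = 70, H = 0.50); the cumulative response at Cycle 1 is the phenotypic-selection value, because selection in Cycle 0 is phenotypic in both procedures.

```python
# Cumulative responses from Table 2 for 20 QTL, N = 70, H = 0.50.
cumulative = {
    1: 1.58,  # after phenotypic selection in Cycle 0
    2: 2.35,  # genomewide selection
    3: 2.87,
    4: 3.24,
}

# Gain added by each marker-based cycle.
increments = [cumulative[c] - cumulative[c - 1] for c in (2, 3, 4)]

# Share of the final gain contributed by phenotypic selection in Cycle 0.
share_cycle0 = 100.0 * cumulative[1] / cumulative[4]

print([round(x, 2) for x in increments])
print(f"{share_cycle0:.0f}% of the total gain came from Cycle 0")
```

For this column the share is at the low end of the 48–75% range reported in the text, which is expected because large N and high H favor the marker-based cycles.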

Modified versus typical schemes

For phenotypic selection, the ratio of responses between a modified scheme (e.g., N = 30, NRep = 2) and a typical scheme that required the same amount of resources (i.e., N = 15, NRep = 4) ranged from 96 to 122%. The modified schemes were least advantageous for phenotypic selection with N = 70, NRep = 2 (versus N = 35, NRep = 4 in the typical scheme). For genomewide selection, the ratio between the response in the modified scheme and the response in the typical scheme ranged from 110 to 147%.

Cost and time per unit gain

Phenotypic selection and genomewide selection (the two methods that led to the largest responses) differed in their gains per unit cost and gains per year. For the modified schemes (which led to higher responses than the typical schemes), cost per unit gain was always higher with phenotypic selection than with genomewide selection (Fig. 2). This result was obtained across all levels of heritability, numbers of QTL, and population sizes studied. The cost per unit gain (in thousands of US $) ranged from 116 to 333 for phenotypic selection, from 86 to 194 for genomewide selection at $1.50 per marker data point, and from 75 to 167 for genomewide selection at $0.15 per marker data point. Costs per unit gain increased as the number of QTL and population size increased and as the heritability decreased. As the population size increased, steeper increases of cost per unit gain were observed for phenotypic selection than for genomewide selection.
Fig. 2. Cost (in thousands of US dollars) per unit gain in phenotypic selection (circle) and genomewide selection at US $0.15 per data point (filled triangle) and US $1.50 per data point (filled square). Responses are at the end of selection in modified schemes (two replications) with different numbers of QTL and population sizes. Gain is in units of the genetic standard deviation in Cycle 0

With genomewide selection at $1.50 per marker data point, 13–16% of the total cost was for marker genotyping, 12–23% was for intercrossing selected palms for the next cycle, and 64–72% was for testcross phenotyping. At $0.15 per marker data point, only 2% of the total cost was for marker genotyping, 14–26% was for intercrossing, and 73–85% was for phenotyping.

For the modified schemes, years per unit gain were higher with phenotypic selection than with genomewide selection, except when population size was small (N = 30), heritability was low (H = 0.20), and the number of QTL was large (NQTL = 80, Fig. 3). Years per unit gain ranged from 14 to 29 for phenotypic selection and from 11 to 33 for genomewide selection. Years per unit gain decreased as the number of QTL decreased, population size increased, and heritability increased.
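Years per unit gain is simply the total duration of a procedure divided by its response. The sketch below reproduces the reported ranges from the extreme responses in Table 2, using 38 years for phenotypic selection and 37 years for genomewide selection.

```python
YEARS_PHENOTYPIC = 38   # two 19-year cycles
YEARS_GENOMEWIDE = 37   # one 19-year cycle plus three 6-year cycles

def years_per_unit_gain(total_years, response):
    """Years needed per genetic standard deviation of response."""
    return total_years / response

# Largest and smallest end-of-selection responses in Table 2 (N = 30, 50, 70).
phenotypic = [years_per_unit_gain(YEARS_PHENOTYPIC, r) for r in (2.63, 1.31)]
genomewide = [years_per_unit_gain(YEARS_GENOMEWIDE, r) for r in (3.24, 1.11)]

print("phenotypic:", [round(v) for v in phenotypic])
print("genomewide:", [round(v) for v in genomewide])
```

The upper end of the genomewide range comes from the N = 30, H = 0.20, 80-QTL case, the one combination in which phenotypic selection needed fewer years per unit gain.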
Fig. 3. Years per unit gain in phenotypic selection (circle) and genomewide selection (filled square) at the end of selection in modified schemes (two replications) with different numbers of QTL and population sizes. Gain is in units of the genetic standard deviation in Cycle 0

Discussion

Previous studies have shown that MARS and genomewide selection are useful breeding procedures when population sizes are relatively large (e.g., ∼100–150; Johnson and Mumm 1996; Hospital et al. 1997; Johnson 2001; Bernardo and Charcosset 2006; Bernardo and Yu 2007). Such population sizes are not feasible in oil palm, and we therefore investigated the usefulness of MARS and genomewide selection with the smaller population sizes that are typical in oil palm and other tree species. In this study, we assumed for simplicity that the base population (Cycle 0) was an F2 population formed from two inbreds. Consequently, each locus had two alleles with frequencies of 0.50. In practice, oil-palm populations are formed by crossing heterozygous parents, and a locus may have multiple alleles that occur in frequencies unequal to 0.50. The factor that would directly affect the response to genomewide selection and MARS is the level of linkage disequilibrium between markers and QTL, rather than the number and frequency of alleles at the QTL and marker loci. If the non-inbred nature of the parents leads to a decrease in linkage disequilibrium, then a larger number of marker loci may need to be used to compensate for the faster decline in linkage disequilibrium between QTL and marker loci.

In this study and in previous studies (Hospital et al. 1997; Bernardo and Charcosset 2006; Bernardo and Yu 2007), per-cycle responses were greater with phenotypic selection than with genomewide selection and MARS. However, per-cycle comparisons do not account for time and cost. Genomewide selection and MARS are potentially superior to phenotypic selection because of the greater response per unit time, which justified comparing these breeding methods for a fixed amount of time (Edwards and Johnson 1994; Hospital et al. 1997; Koebner 2003; Bernardo and Yu 2007).

MARS depends on having markers significantly associated with the trait of interest. Small mapping populations lead to low power of QTL detection and high false discovery rate (Lande and Thompson 1990; Bernardo 2004). Therefore, as expected, when N was small, MARS was inferior to phenotypic selection and genomewide selection. Previous studies in maize (Johnson 2004) and our preliminary simulation studies for oil palm (results not shown) indicated that when N was sufficiently large (> 165), the response to MARS in Cycle 4 was greater than the response to phenotypic selection in Cycle 2. However, such a population size is unrealistic for oil palm. Even if such large N were feasible in oil palm, MARS would be inferior to genomewide selection (Bernardo and Yu 2007).

The assumption of random effects of markers in genomewide selection circumvents the problem of over-parameterization that may occur in MARS (i.e., number of markers > N; Meuwissen et al. 2001). Genomewide selection was previously found effective in maize with relatively large population sizes, e.g., 144 (Bernardo and Yu 2007). In the current study, genomewide selection was likewise superior to phenotypic selection when N was moderately large, e.g., N = 50 or 70 with NRep = 2. For these population sizes, which are realistic in oil palm, the response to genomewide selection in Cycle 3 was either greater than or equal to the response to phenotypic selection in Cycle 2. Although a significant response is still achieved from Cycle 3 to Cycle 4, breeders could choose to stop genomewide selection at Cycle 3 (instead of Cycle 4) if their goal is to obtain the same gain achieved in phenotypic selection at Cycle 2.
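The way random marker effects sidestep over-parameterization can be illustrated with a minimal sketch. It is not the simulation code used in this study: it uses ridge regression as a stand-in for best linear unbiased prediction of marker effects, and the data, the marker coding (-1, 0, 1), the number of selected palms, and the shrinkage parameter `lam` are all hypothetical. Because the system that is solved has N rows rather than one row per marker, the effects remain estimable even with more markers than palms.

```python
import random

def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ridge_marker_effects(z, y, lam):
    """Shrunken effects for all markers: Z'(ZZ' + lam*I)^-1 (y - mean(y))."""
    n, p = len(z), len(z[0])
    mean_y = sum(y) / n
    yc = [v - mean_y for v in y]
    # N x N matrix ZZ' + lam*I; positive definite whenever lam > 0.
    k = [[sum(z[i][j] * z[h][j] for j in range(p)) + (lam if i == h else 0.0)
          for h in range(n)] for i in range(n)]
    alpha = solve(k, yc)
    return [sum(z[i][j] * alpha[i] for i in range(n)) for j in range(p)]

def marker_scores(z, effects):
    """Marker score of each individual: sum of its marker codes times effects."""
    return [sum(code * b for code, b in zip(row, effects)) for row in z]

# Hypothetical toy data: 20 palms, 60 markers (more markers than palms).
random.seed(1)
n_palms, n_markers = 20, 60
z = [[random.choice((-1, 0, 1)) for _ in range(n_markers)]
     for _ in range(n_palms)]
true_effects = [1.0 if j < 5 else 0.0 for j in range(n_markers)]
y = [sum(code * b for code, b in zip(row, true_effects)) + random.gauss(0, 1)
     for row in z]

effects = ridge_marker_effects(z, y, lam=10.0)  # lam is an assumed value
scores = marker_scores(z, effects)
# Indices of the five palms with the highest marker scores.
selected = sorted(range(n_palms), key=lambda i: scores[i], reverse=True)[:5]
print(selected)
```

An ordinary least-squares fit of 60 marker effects to 20 phenotypes would have no unique solution, which is why MARS must first reduce the marker set through significance tests.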

Moreover, populations from genomewide selection in Cycle 2 and Cycle 3 could be released as improved planting materials. This would speed up the commercialization of improved germplasm. To illustrate, more than 4 million ha of oil palm were planted in Malaysia in 2005 (Corley and Tinker 2003, p. 317; Yusof 2007). Only 2–3% of this area is for new plantings or replantings each year, partly because genetic improvement in oil palm is slow. Instead of commercializing improved planting materials every 19 years through conventional breeding, improved germplasm could be developed through genomewide selection every 6 years. This reduction in generation interval translates to a major advantage to plantation owners and growers.

For genomewide selection in oil palm, comparisons between the typical and modified schemes indicated that increasing the population size is more important than increasing the number of replications. Increasing the population size after Cycle 0 (i.e., when selection is based only on markers) but retaining the same number of selected individuals will increase the selection differential. This approach will further increase the gain per cycle in genomewide selection without significantly increasing the total cost, because the percentage of marker expenses relative to the total cost is low and the cost of growing additional palms would not be substantially large because replication is not necessary after Cycle 0.
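The effect of a larger population on the selection differential can be sketched with the standard truncation-selection formula for the standardized selection differential, i = φ(z)/p, where p is the proportion selected, z is the truncation point, and φ is the standard normal density. This is only an illustration: it assumes normally distributed selection criteria and an infinite-population approximation (which slightly overstates intensity for small N), and the number of selected palms used here (5) is hypothetical.

```python
from statistics import NormalDist

def selection_intensity(n_selected, n_evaluated):
    """Standardized selection differential under truncation selection.

    Uses the infinite-population approximation i = pdf(z) / p, where
    p = n_selected / n_evaluated and z is the normal truncation point.
    """
    if n_selected >= n_evaluated:
        return 0.0  # no selection if every individual is retained
    p = n_selected / n_evaluated
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - p)
    return std_normal.pdf(z) / p

# Hypothetical: keep 5 selected palms while the evaluated population grows.
for n in (50, 70, 100):
    print(n, round(selection_intensity(5, n), 2))
```

With the number of selected individuals held constant, the intensity rises as more palms are genotyped, which is the mechanism behind the larger per-cycle gain described above.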

For genomewide selection to be superior to phenotypic selection in cost and time, the minimum population size is N = 50 palms evaluated in NRep = 2 replications. Coincidentally, this combination is identical in total size to a typical oil-palm field trial, i.e., N = 25 and NRep = 4. Genomewide selection was more cost efficient than phenotypic selection even when the cost per marker data point was US $1.50. This result implies that genomewide selection would be economically feasible even with expensive marker systems, such as SSRs, which generally cost more than US $1 per data point (A. Kahler, pers. comm., 2007; R. Rasmussen, pers. comm., 2007). The molecular markers used in oil palm to date have been mainly RFLPs and SSRs (Corley and Tinker 2003, pp 163–167; Billotte et al. 2005). However, SNP markers may be developed in oil palm in the future. In dairy cattle (Bos taurus), a DNA chip for 10,000 SNPs costs less than US $380 per animal, which is equivalent to less than US $0.04 per data point (Schaeffer 2006). The assumption of $0.15 per data point in the current study was therefore conservative for SNPs. Genotyping costs are expected to decrease in the future while field costs increase because oil palm breeding is labor intensive (Corley and Tinker 2003, p. 154). Given such a scenario, the previously mentioned scheme of increasing the population size in Cycles 1–4 in oil palm would not only increase the gain per unit time but may also increase the gain per unit cost.
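The arithmetic behind these comparisons is simple enough to check directly. The short sketch below uses only figures quoted in the paragraph above; the function names are hypothetical helpers introduced for illustration.

```python
def cost_per_data_point(cost_per_sample, n_markers):
    """Genotyping cost per marker data point for one genotyped individual."""
    return cost_per_sample / n_markers

def total_plots(n_palms, n_reps):
    """Total field-trial size as population size times replications."""
    return n_palms * n_reps

# Dairy-cattle SNP chip: at most US $380 per animal for 10,000 SNPs.
snp_chip = cost_per_data_point(380.0, 10_000)
print(f"{snp_chip:.3f}")  # 0.038, i.e. less than US $0.04 per data point

# How much cheaper the chip is than the two marker prices assumed in the study.
for assumed_price in (0.15, 1.50):
    print(assumed_price, round(assumed_price / snp_chip, 1))

# N = 50 with 2 replications matches a typical trial of N = 25 with 4.
print(total_plots(50, 2), total_plots(25, 4))  # 100 100
```

The lower assumed price of $0.15 is roughly four times the per-point cost of the cattle chip, which is why it can be regarded as conservative for SNPs.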

In our simulations, the gain in each repeat of an experiment was not always larger with genomewide selection than with phenotypic selection. In particular, the standard deviation of the response across 1,000 repeats was about 1.0 for each method. These results indicate that in practice, genomewide selection would at times lead to observed gains smaller than those from phenotypic selection. Empirical results for MARS in maize have indeed shown variation in response: while mean gains from MARS were positive across six populations, mean performance in two populations decreased from Cycle 1 to Cycle 2 (Johnson 2004). However, large-scale implementation of MARS in 248 maize populations and in 43 soybean (Glycine max L. Merr.) populations has proved MARS effective in these two species (Eathington et al. 2007). In practice, if genomewide selection in a particular oil-palm population seems ineffective after the initial cycles of selection, then the breeder should consider discontinuing genomewide selection in that population.

We conclude that with the relatively small population sizes that are feasible in oil palm (e.g., N = 50 with two replications), genomewide selection for 3–4 cycles is superior to both MARS and phenotypic selection in terms of gain per unit cost and time. Furthermore, genomewide selection has the advantage of staggering the total genetic gain across a few shorter cycles of selection, which may lead to more frequent commercialization of improved planting materials. While we evaluated genomewide selection in the context of oil-palm breeding, our results should be generally applicable to other tree species that are characterized by long generation intervals, high costs of maintaining breeding plantations, and small population sizes in selection programs.

Copyright information

© Springer-Verlag 2008