Introduction

Quantitative trait locus (QTL) detection procedures are routinely used in plant breeding. A typical QTL experiment aims to find associations between quantitative traits and DNA polymorphisms in designed segregating populations. Such associations arise as a direct consequence of linkage disequilibrium between loci affecting the target trait and DNA polymorphisms. Widely used methods for QTL detection include those based on mixture models (Lander and Botstein 1989; Jansen and Stam 1994; Zeng 1994) and those based on regression (Haley and Knott 1992). Within the regression-based framework, mixed model QTL detection has been advocated for more complex situations, such as modelling QTL by environment interaction (Malosetti et al. 2004; Boer et al. 2007) or mapping QTL for several traits simultaneously (Malosetti et al. 2008). In these two cases, the advantage of the mixed model approach lies in its ability to model the underlying genetic correlation present in the data (between environments or between traits), leading to appropriate tests for QTL effects (Piepho 2005). In this paper, we introduce a new motivation for mixed model QTL detection in a designed segregating population. When the standard assumptions of absence of selection, mutation, migration and genetic drift in the mapping population are not met, the need to model the genetic covariance between individuals in the population calls for mixed model QTL detection.

Quantitative trait locus detection is commonly performed by analysing standard segregating populations derived from two or more founder genotypes (parental lines). The usual assumptions are the absence of selection, mutation and genetic drift. In practice, these assumptions do not necessarily hold, in which case the results of standard testing procedures require extra caution in their interpretation. In the presence of selection, an extra source of genetic covariance arises as a consequence of the uneven sharing of the genetic background between genotypes. As in linkage disequilibrium mapping, ignoring this source of genetic covariance (relatedness) will increase the rate of false positives, leading to spurious associations (Yu et al. 2006; Malosetti et al. 2007). A mixed model QTL detection approach can accommodate the extra genetic covariance by embedding kinship information in the model, leading to appropriate tests and minimizing the rate of false QTL or gene detection. We demonstrate this with a three-way barley cross designed to segregate for several genes, including dwarfing and vernalization genes. We use these genes as controls for our QTL analysis, since a valid analysis should be able to detect them.

A three-way cross population was created to meet two requirements: (1) to encompass a large genetic variation, including Spanish landraces and central and northern European germplasm from both two- and six-row and winter- and spring-type parents, and (2) to contain a number of important genes of known chromosomal position, which we used as checks of the statistical procedure employed in this paper. The population of 161 inbred lines was screened with 1,536 SNP markers, of which 744 were polymorphic. Distortions from expected allele frequencies showed evidence of selection in the population. The genetic relatedness between lines (kinship) was therefore inferred from the SNPs, and the resulting relationship matrix was included in the mixed model. For all traits, the model incorporating kinship information was consistently superior to the model without kinship according to the Akaike Information Criterion, and it was used for QTL detection. The detected QTLs, including QTLs associated with the genes known to segregate in this population, explained 85–95% of the total genetic variance of the evaluated traits. In contrast, an analysis using the standard QTL detection model produced a higher number of significant SNP–trait associations without a clear indication of QTL location, for example in chromosomal regions known to segregate for major QTLs.

These results highlight the importance of including kinship information when detecting QTL in populations that have undergone some process of selection. They also show that genetically complex crosses can be used in QTL analysis, as long as the covariance structure is modelled appropriately. This is particularly relevant when segregating populations are created within a plant breeding context, where the elimination of clearly ill-conditioned genotypes by natural or artificial selection will introduce departures from the standard assumptions for segregating populations.

Materials and methods

Plant material

A segregating barley population was produced by crossing three barley cultivars/genotypes: Candela, 915006 and Plaisant. An F1 was first obtained by crossing the cultivar Candela with the line 915006, and this F1 was then crossed with the cultivar Plaisant, from which 161 recombinant inbred lines (RILs) were derived (Fig. 1). The parents were chosen to guarantee a broad genetic diversity. The first cross, made in 1979, involved the Dutch two-row spring barley line VDH 044-78, which carries the ari-e.GP dwarfing gene (M.E. Roothaan and A.G. Balkema-Boomstra, personal communication), and the Spanish six-row winter landrace Precoz de Cadreita, in an attempt to produce short-strawed lines with a Spanish genetic background. The line 82033A1, selected from this cross, is a two-row dwarf spring genotype carrying the ari-e.GP dwarfing gene; it was crossed in 1991 with the German cultivar Cheri, which carries the sdw1 dwarfing gene derived from Diamant (Baumer and Cais 2000), in an attempt to combine both short-straw sources into a single line. Doubled haploids were produced from the F1 of this cross, and among them we selected the genotype 915006, which combines both dwarfing genes. The Spanish six-row winter cultivar Candela was derived from the cross Barberousse (French six-row winter) × Pané (Spanish six-row winter, itself selected from a landrace). The cross 915006 × Candela was made in 1997, and its F1 was crossed in 1998 with the French six-row winter cultivar Plaisant. The RIL population derived from this three-way cross is now in F10. Because the first cross leading to this complex population was made in 1979 and the last one in 1998 (Fig. 1), it is not surprising that conscious or unconscious selection may have operated on it over such a long period.

Fig. 1

Pedigree of the inbred lines (RILs) derived from a three-way cross (915006 × Candela) × Plaisant

Phenotyping

The RIL population was characterized for the dwarfing genes from F6 onwards in the field, as a qualitative score, since both dwarf phenotypes are easily identified: sdw1 has a very prostrate habit at tillering stages, whereas ari-e.GP has very erect leaves. Other easily identified major genes were related to spike type: two-row, six-row and intermedium. Two experiments were carried out in 2006 to characterize the vernalization and photoperiod phenotypes, in both cases by evaluating heading time: one in the field with autumn sowing, allowing the plants to vernalize and to grow under a short-day regime, and one in the greenhouse under natural light and temperature conditions (no vernalization, increasing day length), sown on 15 March (12 h day, 12 h night) and harvested on 30 June (15 h day, 9 h night). Both experiments followed a randomized-block design with two replications. The traits used in the QTL analyses were the number of days from emergence to heading (in the field and the glasshouse) and plant height (in the field).

Genotyping

The 161 RILs and the parents of the population were genetically characterized with the 1,536-SNP Illumina GoldenGate oligonucleotide pool assay BOPA1, developed for barley (Close et al. 2009). In total, 744 SNPs that were polymorphic in this population and had a known map position were used in the QTL analyses. Map information is given as a supplementary file (S1), and the data set is available on request from JL Molina/Ana Casas.

Statistical models for QTL detection

Preliminary SNP analysis showed severe allele frequency distortions in many chromosomal regions (more details in the Results section), which are likely the outcome of selection during the inbreeding process. This contradicts the assumptions of conventional segregating populations, in which selection and other allele-frequency-distorting forces such as bottlenecks, mutation and migration are assumed absent (Lynch and Walsh 1998). Violation of these basic assumptions implies that the genetic covariance between genotypes (i.e. genetic relatedness) in the population is not homogeneous. As in association mapping studies, a QTL detection model should account for the heterogeneity in genetic relatedness between genotypes to avoid false positives.

We used a mixed model for QTL detection that explicitly included in the model information about the genetic relatedness between RILs. A single SNP model reads (fixed terms in Greek letters, random variables/terms in underlined Roman letters):

$$ \underline{y}_{i} = \mu + x_{i}\alpha + \underline{u}_{i} + \underline{e}_{i} \tag{1} $$

with \( \underline{y}_{i} \) the observed phenotype of RIL i, μ a constant, \( x_{i} \) the SNP genotype of RIL i (coded 0, 1 or 2 for SNP genotypes AA, AB and BB, respectively), α the SNP effect (which, under this coding, corresponds to the additive allele substitution effect of allele A by allele B), \( \underline{u}_{i} \) the random genetic background effect of RIL i and \( \underline{e}_{i} \) a random residual effect. The random effects \( \underline{u}_{i} \) are assumed to follow a normal distribution with mean zero and variance-covariance matrix \( \sigma_{\text{u}}^{2} G \), with G = 2K and K the matrix of coancestry coefficients between RILs. Residual effects were assumed normally distributed with mean zero and variance \( \sigma^{2} \).
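As an illustration only, the following Python sketch computes the generalized least squares (GLS) estimate and a Wald statistic for the SNP effect α in model (1) at a single SNP. It assumes the variance components \( \sigma_{\text{u}}^{2} \) and \( \sigma^{2} \) are already known; in the analyses reported here they were estimated by REML in GenStat, which this sketch does not reproduce. The function name `gls_snp_test` and the example values are hypothetical.

```python
import numpy as np


def gls_snp_test(y, x, G, sigma2_u, sigma2_e):
    """GLS fit of model (1) at one SNP, with variance components treated as known.

    y: phenotypes (length n); x: SNP codes 0/1/2; G: n x n relationship matrix (2K).
    Returns (alpha_hat, standard error, Wald statistic for H0: alpha = 0).
    """
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    n = y.size
    # Marginal covariance of y: background sigma2_u * G plus residual sigma2_e * I
    V = sigma2_u * np.asarray(G, dtype=float) + sigma2_e * np.eye(n)
    # Fixed-effects design: intercept (mu) and SNP covariate (alpha)
    X = np.column_stack([np.ones(n), x])
    XtVinvX = X.T @ np.linalg.solve(V, X)
    XtVinvy = X.T @ np.linalg.solve(V, y)
    beta = np.linalg.solve(XtVinvX, XtVinvy)  # (mu_hat, alpha_hat)
    cov_beta = np.linalg.inv(XtVinvX)
    alpha_hat = beta[1]
    se = np.sqrt(cov_beta[1, 1])
    wald = (alpha_hat / se) ** 2  # approximately chi-square with 1 df under H0
    return alpha_hat, se, wald


# Hypothetical example with four RILs
K = np.array([[1.0, 0.5, 0.2, 0.1],
              [0.5, 1.0, 0.3, 0.2],
              [0.2, 0.3, 1.0, 0.6],
              [0.1, 0.2, 0.6, 1.0]])
alpha_hat, se, wald = gls_snp_test([10.0, 14.5, 11.0, 15.0], [0, 2, 0, 2], 2 * K, 0.5, 1.0)
```

Whether the variance components are re-estimated for every tested SNP or fixed at their values from a model without the SNP is a separate choice; the sketch simply takes them as inputs.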

The coancestry matrix K was estimated from the SNP data as the average allele sharing between lines. With M SNP markers, the coancestry between RILs i and i* was estimated by:

$$ \theta_{i,i^{*}} = \frac{1}{M}\sum_{m=1}^{M}\left(1 - \frac{\left| x_{im} - x_{i^{*}m} \right|}{2}\right) \tag{2} $$

with \( x_{im} \) and \( x_{i^{*}m} \) the genotypes of RILs i and i* at SNP m, coded as above. Because SNPs are markers with a very low mutation rate, identity by state essentially implies identity by descent, so more elaborate methods for estimating coancestries would not have improved the estimates (Zhao et al. 2007b).
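A minimal Python sketch of Eq. 2, assuming a complete genotype matrix (no missing calls) coded 0/1/2 as in model (1). The helper `coancestry_matrix` and the example genotypes are hypothetical; the relationship matrix of model (1) then follows as G = 2K.

```python
def coancestry_matrix(genotypes):
    """Coancestry matrix K of Eq. 2: average allele sharing over M SNPs.

    genotypes: one list per RIL of SNP codes 0, 1 or 2 (AA, AB, BB); no missing values.
    """
    n = len(genotypes)
    M = len(genotypes[0])
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            # Per SNP: 1 for identical genotypes, 0.5 for one shared allele, 0 for none
            sharing = sum(1 - abs(a - b) / 2 for a, b in zip(genotypes[i], genotypes[j]))
            K[i][j] = K[j][i] = sharing / M
    return K


# Hypothetical genotypes of three RILs at four SNPs
geno = [[0, 0, 2, 2],
        [0, 2, 2, 0],
        [0, 0, 2, 1]]
K = coancestry_matrix(geno)
G = [[2 * k for k in row] for row in K]  # relationship matrix used in model (1)
```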

For QTL detection, we fitted model (1) at every SNP position on the chromosomes. A genome-wide threshold ensuring a type I error rate of 0.05 was derived from an empirical null distribution constructed from 1,000 genome-wide scans of randomly permuted data sets (phenotypic responses were permuted while retaining the coancestry coefficients between lines).
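The permutation scheme can be sketched generically in Python. Here `scan` is a hypothetical callable that receives a random reassignment of phenotypes to lines (a permutation of line indices) and returns the −log10(P) values of one full genome scan; how the coancestry matrix is handled inside that scan is left to the callable. The threshold is taken as the empirical 95% quantile of the genome-wide maximum −log10(P) over permutations.

```python
import math
import random


def permutation_threshold(n_lines, scan, n_perm=1000, alpha=0.05, seed=1):
    """Empirical genome-wide threshold on the -log10(P) scale.

    scan(order) must return the -log10(P) values of one genome scan in which line i
    receives the phenotype of line order[i]. How the coancestry matrix is treated
    within the scan is left to the caller.
    """
    rng = random.Random(seed)
    maxima = []
    for _ in range(n_perm):
        order = list(range(n_lines))
        rng.shuffle(order)  # random reassignment of phenotypes to lines
        maxima.append(max(scan(order)))  # genome-wide maximum for this permutation
    maxima.sort()
    # Empirical (1 - alpha) quantile of the distribution of genome-wide maxima
    k = math.ceil((1 - alpha) * n_perm) - 1
    return maxima[k]
```

Taking the maximum over the whole genome in each permutation is what turns the per-SNP test into a genome-wide error rate.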

After a first round of genome-wide scanning, the model in Eq. 1 was extended to include a number of tag SNPs for putative QTLs as cofactors:

$$ \underline{y}_{i} = \mu + \sum_{c \in C} x_{ic}\alpha_{c} + x_{i}\alpha + \underline{u}_{i} + \underline{e}_{i} \tag{3} $$

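Here C denotes the set of cofactor SNPs, with \( x_{ic} \) the genotype of RIL i at cofactor c and \( \alpha_{c} \) its effect. The cofactors control for the genetic background, which improves the power to detect minor QTLs (Jansen and Stam 1994; Zeng 1994). Cofactors were removed from the model when they lay closer than 20 cM to the tested SNP.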
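The cofactor exclusion rule can be sketched as a simple filter. The cofactor labels below are hypothetical, placed at the plant height QTL positions given in the Results; the sketch assumes that only cofactors on the tested chromosome can fall within the 20 cM window.

```python
def active_cofactors(tested_chrom, tested_pos_cm, cofactors, window_cm=20.0):
    """Labels of the cofactors kept in model (3) when testing the SNP at the given position.

    cofactors: list of (label, chromosome, position in cM). A cofactor is dropped when it
    lies on the tested chromosome closer than window_cm to the tested SNP.
    """
    return [label for label, chrom, pos in cofactors
            if chrom != tested_chrom or abs(pos - tested_pos_cm) >= window_cm]


# Hypothetical cofactors at the plant height QTL positions
cofactors = [("cof_2H", "2H", 63.5), ("cof_3H", "3H", 127.1),
             ("cof_5H", "5H", 69.3), ("cof_7H", "7H", 9.8)]
print(active_cofactors("3H", 110.0, cofactors))  # cof_3H is 17.1 cM away and is dropped
```

Dropping nearby cofactors prevents a cofactor from absorbing the very QTL signal being tested.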

For the trait heading time in glasshouse conditions, we investigated an epistatic effect between markers in regions on chromosomes 4H and 5H by fitting the following model:

$$ \underline{y}_{i} = \mu + x_{i1}\alpha_{1} + x_{i2}\alpha_{2} + x_{i1}x_{i2}\delta + \underline{u}_{i} + \underline{e}_{i} \tag{4} $$

with \( x_{i1} \) the SNP genotype of RIL i on chromosome 4H, \( x_{i2} \) the SNP genotype of RIL i on chromosome 5H, and \( x_{i1}x_{i2} \) the cross-product term between SNP genotypes. The parameters \( \alpha_{1} \) and \( \alpha_{2} \) correspond to the QTL main effects, and δ to the epistatic effect. We performed a two-dimensional epistatic QTL search by fitting model (4) for all pairs of markers in the ranges 50–123 cM on chromosome 4H and 100–196 cM on chromosome 5H, as these were the candidate regions for the epistatic QTLs.
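The two-dimensional search can be sketched as a double loop over SNP pairs that fits model (4) by GLS and records the estimate and a Wald statistic for δ. As in a single-SNP GLS fit, the covariance V is assumed known here, and `epistasis_scan` is a hypothetical name. With inbred lines coded 0 or 2, the four fixed effects (μ, α1, α2, δ) are estimable only if all four two-locus homozygous classes occur.

```python
import numpy as np


def epistasis_scan(y, geno_4h, geno_5h, V):
    """Two-dimensional search for the epistatic effect delta of model (4).

    geno_4h, geno_5h: arrays (n_lines x n_snps) of SNP codes for the candidate regions
    on 4H and 5H. V: marginal covariance sigma2_u * G + sigma2_e * I, treated as known.
    Returns (delta_hat, wald) matrices with one entry per 4H x 5H SNP pair.
    """
    y = np.asarray(y, dtype=float)
    g4 = np.asarray(geno_4h, dtype=float)
    g5 = np.asarray(geno_5h, dtype=float)
    n = y.size
    Vinv = np.linalg.inv(V)
    delta_hat = np.zeros((g4.shape[1], g5.shape[1]))
    wald = np.zeros_like(delta_hat)
    for a in range(g4.shape[1]):
        for b in range(g5.shape[1]):
            x1, x2 = g4[:, a], g5[:, b]
            # Columns: mu, alpha_1 (4H), alpha_2 (5H), delta (cross product x_i1 * x_i2)
            X = np.column_stack([np.ones(n), x1, x2, x1 * x2])
            XtVinvX = X.T @ Vinv @ X
            beta = np.linalg.solve(XtVinvX, X.T @ Vinv @ y)
            var_delta = np.linalg.inv(XtVinvX)[3, 3]
            delta_hat[a, b] = beta[3]
            wald[a, b] = beta[3] ** 2 / var_delta
    return delta_hat, wald


# Hypothetical example: eight RILs, one candidate SNP on each chromosome
x1 = np.array([0, 0, 2, 2, 0, 0, 2, 2], dtype=float)
x2 = np.array([0, 2, 0, 2, 0, 2, 0, 2], dtype=float)
noise = np.array([0.3, -0.2, 0.1, -0.4, -0.3, 0.2, -0.1, 0.4])
y = 100 + 1.0 * x1 + 2.0 * x2 + 3.0 * x1 * x2 + noise
delta_hat, wald = epistasis_scan(y, x1[:, None], x2[:, None], np.eye(8))
```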

Finally, we fitted a multi-QTL model to estimate SNP effects. We first included all candidate QTLs identified as peaks in the QTL scanning profiles and retained only those that remained significant in the multi-QTL model. The genetic variance explained by the final QTL model was determined by comparing the variance-covariance matrix \( \sigma_{\text{u}}^{2} G \) before and after inclusion of all detected QTLs in the model (Mathews et al. 2008). The contribution of each individual QTL to the total explained variance was determined analogously, by comparing \( \sigma_{\text{u}}^{2} G \) before and after inclusion of that particular QTL in the model. Note that the individual QTL contributions need not sum to the total percentage of explained variance, because of slight correlations between QTLs. All models were fitted with GenStat 12th edition (Payne et al. 2009).
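Because G is fixed, comparing \( \sigma_{\text{u}}^{2} G \) before and after adding QTLs amounts to comparing the two estimates of \( \sigma_{\text{u}}^{2} \). A minimal sketch of the resulting percentage, with a hypothetical helper name and hypothetical input values (not estimates from this study):

```python
def percent_explained(sigma2_u_without, sigma2_u_with):
    """Percentage of genetic variance explained by the QTLs in a model.

    sigma2_u_without: estimate of sigma2_u with no QTLs in the model;
    sigma2_u_with: estimate after adding the QTL(s). G is fixed, so the ratio of the
    two covariance matrices sigma2_u * G reduces to the ratio of the scalars.
    """
    return 100.0 * (sigma2_u_without - sigma2_u_with) / sigma2_u_without


# Hypothetical variance estimates (not taken from this study)
print(percent_explained(40.0, 4.0))
```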

Results and interpretation

The 744 SNPs provided dense coverage of the seven barley chromosomes, with 82–145 markers per chromosome and 90% of the gaps between adjacent markers shorter than 5 cM (Fig. 2). The allele frequencies in the RIL population revealed clear segregation distortions. We expected allele frequencies in the offspring of the three-way cross (Candela × 915006) × Plaisant to be 0.25/0.25/0.50 for Candela, 915006 and Plaisant alleles, respectively. However, the observed frequencies deviated significantly from expectation in many regions of all linkage groups, as shown in Fig. 3, which presents the result of the Chi-square test of no segregation distortion (P values on a −log10 scale). Regions with high −log10(P) values point to segregation distortion loci (SDL). SDL have also been found in other RIL populations, for example in well-known maize populations (McMullen et al. 2009). The observed segregation distortions motivated the use of a QTL detection model that accounts for the heterogeneous genetic relatedness between lines caused by the uneven sharing of genetic background between RILs.
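For a bi-allelic SNP, the three parental classes collapse into two SNP alleles, so the expected frequency of SNP allele A is the summed expected contribution (0.25, 0.25 or 0.5) of the parents carrying A. The Python sketch below tests this with a 1-df Chi-square goodness-of-fit statistic. It is an illustration under stated assumptions: RILs are treated as homozygous (heterozygous calls are assumed to have been excluded), `distortion_test` is a hypothetical helper, and the exact test behind Fig. 3 may differ in detail.

```python
import math

# Expected contribution of each parent to the RILs of (Candela x 915006) x Plaisant
CONTRIBUTION = {"Candela": 0.25, "915006": 0.25, "Plaisant": 0.5}


def distortion_test(parent_alleles, n_a, n_b):
    """1-df Chi-square test of no segregation distortion at one bi-allelic SNP.

    parent_alleles: dict parent -> SNP allele ('A' or 'B'); the SNP must be
    polymorphic among the parents. n_a, n_b: numbers of RILs homozygous for A and B.
    Returns (chi-square statistic, -log10 P).
    """
    p_a = sum(w for parent, w in CONTRIBUTION.items() if parent_alleles[parent] == "A")
    n = n_a + n_b
    expected_a, expected_b = n * p_a, n * (1 - p_a)
    chi2 = (n_a - expected_a) ** 2 / expected_a + (n_b - expected_b) ** 2 / expected_b
    # Upper tail of a Chi-square with 1 df: P(chi2_1 > c) = erfc(sqrt(c / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, -math.log10(p_value)


# Hypothetical SNP separating 915006 from Candela/Plaisant; expected A frequency 0.25
print(distortion_test({"Candela": "B", "915006": "A", "Plaisant": "B"}, 70, 91))
```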

Fig. 2

Genetic map of the 744 SNPs polymorphic in the RIL population

Fig. 3

Result of a Chi-square test of no segregation distortion of the SNP markers along the seven linkage groups in a three-way RIL population from the cross (Candela × 915006) × Plaisant. Allele frequencies are expected to be 0.25/0.25/0.5 for Candela, 915006 and Plaisant, respectively

The mixed model assuming that the polygenic genetic background effect follows a covariance structure proportional to the coancestry matrix between RILs was superior to the model that ignores coancestry information. For all three traits, the Akaike Information Criterion (\( \text{AIC} = \text{deviance}_{\text{REML}} + 2p \), with p the number of parameters in the random model) was smallest for the model that includes the kinship information, making that model the preferred one (Table 1). This agreed with our expectation, as the allele distortions observed in the population made the usual assumption of compound symmetry, with an identity matrix for the genetic relatedness between lines, a highly unrealistic model. Heritabilities were 0.78, 0.82 and 0.74 for plant height, heading time in the glasshouse and heading time in the field, respectively. Inspection of residuals showed good agreement with the normality assumption, and no suspicious outlying observations were found.
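The AIC comparison reduces to simple arithmetic on the REML deviances. A minimal sketch with hypothetical helper names and inputs; REML deviances are only comparable between models with the same fixed effects, which is the case when, as assumed here, the two models differ only in the covariance structure of the genetic background effect.

```python
def reml_aic(deviance_reml, n_random_params):
    """AIC = REML deviance + 2p, with p the number of parameters in the random model."""
    return deviance_reml + 2 * n_random_params


def kinship_model_preferred(dev_identity, p_identity, dev_kinship, p_kinship):
    """True when the model with the coancestry-based covariance has the smaller AIC.

    Assumes both models share the same fixed effects, so their REML deviances are comparable.
    """
    return reml_aic(dev_kinship, p_kinship) < reml_aic(dev_identity, p_identity)


# Hypothetical deviances (not the values of Table 1); two variance parameters per model
print(kinship_model_preferred(dev_identity=1520.4, p_identity=2, dev_kinship=1488.9, p_kinship=2))
```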

Table 1 Akaike Information Criterion (AIC) of models that ignore genetic relatedness between RILs versus models that include a relationship matrix (K)

Plant height

The upper frame in Fig. 4 gives the result of the genome-wide QTL search for plant height (PH). Significant SNP associations with PH were observed on chromosomes 2H, 3H, 5H and 7H. The two major peaks were observed on chromosomes 3H (127.1 cM) and 5H (69.3 cM). These two chromosomes are known to carry semidwarfing genes: the denso gene (also known as sdw1) maps on the long arm of chromosome 3H (Laurie et al. 1993), and the ari-e.GP gene maps on the short arm of chromosome 5H (Thomas et al. 1984). Neither gene has been cloned yet, and neither was present in BOPA1. Jia et al. (2009) have recently proposed GA-20 oxidase as a candidate for barley sdw1/denso. Chloupek et al. (2006) evaluated the effect of semi-dwarfing genes in the population Derkado × B83-12/21/5, in which sdw1 was mapped on the long arm of chromosome 3H between the two SNP markers abc08541 and abc08208. SNP ABC08208 was used to generate the BOPA2 marker 12_30096, which maps on chromosome 3H at 127.1 cM, the same position identified in this study. This result suggests that the QTL identified corresponds to sdw1/denso. The same population segregates for ari-e.GP, a gene that was mapped on the short arm of chromosome 5H between the two SSR markers Bmag0337 and Bmag0357. The BOPA1 SNP identified in our study, 11_21239, has only been mapped in the Haruna Nijo × H602 population, at 50.72 cM (Sato and Takeda 2009). This marker was mapped as a Transcript Derived Marker (Contig9835_8, on 5H at 58.09 cM) in the Steptoe × Morex population (Potokina et al. 2008) and has been included in the barley integrated map of Aghnoum et al. (2010), in bin 5H-06. Some of the SSR markers mapped by Chloupek et al. (2006) around ari-e.GP are included in the same integrated map and point to the same region, suggesting that the identified association may well correspond to this dwarfing gene.

Fig. 4

Genome-wide results of the test for association for each of 744 SNPs with plant height, heading time (HT) in glasshouse and in field conditions. Results are given as P values on a −log10 scale. Each spike corresponds to a particular SNP, with those exceeding the horizontal line being significantly associated with the trait

Table 2 lists the SNPs tagging each of the detected QTLs and their effects (plus the parental genotypes). The largest peak, on chromosome 3H, was associated with SNP 11_10867, which according to the parental genotypes represents the contrast between 915006 and Candela/Plaisant. The allele substitution effect of the 915006 allele (SNP allele A) by the allele from Candela or Plaisant (SNP allele B) was 10.4 cm, pointing to 915006 as the donor of the height-reducing allele. This agrees with expectation, as 915006 is the known donor of the denso gene in this cross. The SNP associated with the major peak on chromosome 5H (11_21239) had an effect of −9.6 cm, pointing to 915006 as the donor of the allele that reduces plant height, which agrees with line 915006 being the source of the ari-e.GP dwarfing gene in this cross.

Table 2 Effects of tag SNP of plant height QTL (cm) expressed as the substitution effect of the SNP allele A by the SNP allele B, with their corresponding standard errors (SE) between brackets

In addition to the two major genes, two other PH QTLs were found, on chromosomes 2H and 7H (Table 2). The QTL on chromosome 2H (at 63.5 cM) had an effect of −4.0 cm, pointing to both Candela and Plaisant as donors of the height-reducing allele. Finally, the QTL on chromosome 7H (9.8 cM) had an effect of −2.4 cm, again pointing to Candela and Plaisant as the parental lines contributing the height-reducing allele.

The full QTL model explained 91.5% of the genetic variance for PH, suggesting that all major genes and QTLs were detected. Since most of the genetic signal is then explicitly modelled, imposing a covariance structure on the polygenic effect should become less important. This was confirmed by the marginal difference between the AIC values of models that include or omit the coancestry information (Table 1), in sharp contrast to the initial situation, in which no QTLs were included in the model.

Finally, the importance of including the coancestry information in the QTL detection model can be illustrated by repeating the analysis with a naive model without kinship information. In that case, a substantial number of significant associations would have been detected (Fig. 5). However, it would have been very hard to conclude from this result where the QTLs are located, as many of these significant associations are potential false positives caused by the heterogeneous genetic structure of the population. Another QTL mapping approach that does not use kinship information is inclusive composite interval mapping (iCIM; Li et al. 2007). We applied iCIM to these data and observed that, although it gave a clearer profile with indications of QTL locations, some of the peaks corresponded to false positives caused by high LD between unlinked chromosomal regions (a more detailed description of the results and discussion is given in Supplementary material S2).

Fig. 5

Naive genome-wide result of the test of association for each of 744 SNPs with plant height, heading time (HT) in glasshouse and in field conditions in barley when ignoring kinship information. Results are given as P values on a −log10 scale

Heading time (glasshouse)

The temperature regime in the glasshouse created unfavourable conditions for vernalization, which allowed the expression of vernalization genes (plus other plant cycle QTLs) to be observed. Vernalization requirements in barley are largely determined by two epistatic genes: Vrn-H1 on the long arm of chromosome 5H and Vrn-H2 on the long arm of chromosome 4H (Zitzewitz et al. 2005). Vernalization is required when the recessive allele at Vrn-H1 (vrn-H1) and the dominant allele at Vrn-H2 (Vrn-H2) are both present (Zitzewitz et al. 2005). In our cross, the recessive winter allele vrn-H1 is present in Plaisant, and the dominant allele Vrn-H2 is present in both Candela and Plaisant. Because both genes are known to segregate in the three-way cross, we expected to observe signals on chromosomes 4H and 5H. As expected, a large peak was observed on chromosome 5H (with a maximum at 137.1 cM), and several peaks were found on the long arm of chromosome 4H (Fig. 4, middle frame). Given the epistatic interaction between QTLs on the long arms of chromosomes 4H and 5H, we complemented the one-dimensional QTL search with a two-dimensional QTL search based on an epistatic model. Specifically, we tested for interaction between pairs of SNPs on the long arms of chromosomes 4H and 5H. The two-dimensional search revealed a significant epistatic interaction between the region from 100 to 120 cM on chromosome 4H and the region from 137 to 144 cM on chromosome 5H (Fig. 6). We do not claim a very precise location for the QTL on chromosome 4H, as the heavy distortions observed at the end of chromosome 4H (Fig. 3) may have reduced the power for QTL detection in that chromosome region.
To estimate effects, we selected as tag markers for those regions the pair of SNPs that gave the highest overall signal (in terms of main effects and interaction): SNP 11_10334 for the region on chromosome 4H and SNP 11_21241 for the region on chromosome 5H. Remarkably, the epistatic interaction detected between the two tag SNP markers agreed with the epistatic interaction described for vernalization in barley (Takahashi and Yasuda 1971; Zitzewitz et al. 2005). Table 3 presents the heading time of the different genotypes at the two tag SNPs: a vernalization requirement was observed in lines that carried the Plaisant-derived allele at SNP 11_21241 (Plaisant being the source of the recessive allele vrn-H1) together with the SNP allele inherited from either Candela or Plaisant at SNP 11_10334 (Candela and Plaisant being the source of the dominant allele Vrn-H2). This result gave strong support to the choice of the selected SNPs as tags for QTLs related to the vernalization genes Vrn-H1 and Vrn-H2.
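The two-locus rule behind Table 3 can be written as a small predicate. This is a hypothetical illustration of the genetic model, not a prediction method used in the analysis: each argument is a boolean indicating which SNP allele a RIL carries at the corresponding tag SNP.

```python
def vernalization_required(plaisant_allele_at_11_21241, candela_or_plaisant_allele_at_11_10334):
    """Two-locus rule for vernalization requirement in this cross.

    First argument: the RIL carries the Plaisant-derived allele at the 5H tag SNP
    (proxy for the recessive vrn-H1). Second argument: the RIL carries the allele
    shared by Candela and Plaisant at the 4H tag SNP (proxy for the dominant Vrn-H2).
    """
    # Vernalization is needed only when vrn-H1 and Vrn-H2 occur together (epistasis)
    return plaisant_allele_at_11_21241 and candela_or_plaisant_allele_at_11_10334


# Print the four two-locus classes, following the layout of Table 3
for h1 in (False, True):
    for h2 in (False, True):
        print(f"5H Plaisant allele={h1}, 4H Candela/Plaisant allele={h2}: "
              f"vernalization required={vernalization_required(h1, h2)}")
```

Only one of the four classes carries a vernalization requirement, which is why the interaction term δ in model (4), rather than additive effects alone, is needed to describe these data.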

Fig. 6

Two-dimensional search for epistatic QTL for heading time (HT) along the long arms of barley chromosomes 4H and 5H. The result of the test for an epistatic interaction between each SNP pair is reported as the associated P value (−log10 scale), with the strength of a significant epistatic interaction indicated by the size and colour of the dots (from low to high): small/grey, intermediate/grey and large/black

Table 3 Predicted heading time based on the alleles present at the QTL tag SNP 11_10334 on the long arm of chromosome 4H, and the QTL tag SNP 11_21241 on the long arm of chromosome 5H

In addition to the two major QTLs driving heading time, three other QTLs were detected, on chromosomes 3H (127.1 cM), 4H (87.5 cM) and 7H (46.2 cM) (Table 4). The QTL on chromosome 3H had an additive effect of −2.0 days, with the delaying allele coming from 915006. This QTL coincides with the position of the dwarfing gene denso, which has been shown to be associated with delayed heading (Powell et al. 1985). The QTL on chromosome 4H at 87.5 cM (SNP 11_20765) had an effect of −3.0 days, with delayed heading associated with the allele from Candela. This position may well correspond to the earliness per se QTL eps4L identified by Laurie et al. (1995). The remaining QTL, on chromosome 7H, had an effect of −3.0 days, with the allele delaying flowering coming from Candela.

Table 4 Effects of tag SNP of HT QTL in glasshouse conditions (days) expressed as the substitution effect of the SNP allele A by the SNP allele B, with their corresponding standard errors (SE) between brackets

The final model, with three additive QTLs plus two epistatic genes, explained 95.7% of the total genetic variation for heading time in this population grown in glasshouse conditions. Again, modelling the covariance of the residual polygenic effect became immaterial once all QTLs and genes had been included in the model, with only a marginal difference in AIC (Table 1). However, including the kinship information at the initial QTL scanning stage was important, as shown in the middle frame of Fig. 5: omitting the coancestry information would have resulted in an excess of significant associations without leading to clear conclusions about QTL locations.

Heading time (field)

As expected, the major flowering time genes identified in the glasshouse were barely detected in the field, under natural conditions (Fig. 4, bottom frame). The population grown in the field had the opportunity to fulfil its vernalization requirements, which blurred the effect of vernalization QTLs. The total genetic variation for heading time (HT) was smaller than in glasshouse conditions (54 days in the field vs. 195 days in the glasshouse), which is explained by the fewer restrictions imposed by the vernalization genes on the HT response. In spite of the lower genetic variation, five QTLs were found for HT (Table 5).

Table 5 Effects of tag SNP of HT QTL in field conditions (days) expressed as the substitution effect of the SNP allele A by the SNP allele B, with their corresponding standard errors (SE) between brackets

Plaisant was the parental line contributing the HT-delaying allele in three of the five QTLs: the QTLs on chromosome 1H at 91.0 and 138.3 cM, and the QTL on chromosome 5H (Table 5). The delays caused by Plaisant's alleles at these QTLs were 1.6, 2.6 and 1.6 days, respectively. The remaining two QTLs (2H, 67.5 cM, and 3H, 127.1 cM) had in common that 915006 contributed the HT-delaying allele, with estimated HT delays of 2.5 and 3.7 days, respectively. The QTL on chromosome 3H is consistent with previous results, as the same SNP was associated with the denso gene, which, as mentioned before, has been associated with delayed HT (dwarfing allele from 915006). The regions of Ppd-H2 on 1H and Eam6 on 2H were the main determinants of heading time in autumn sowings in the Beka × Mogador population under the same growing conditions in Spain (Cuesta-Marcos et al. 2008). The QTL on 1H (11_20792) could correspond to Ppd-H2, a gene that is present in 915006 and Candela and absent in Plaisant. The QTL on 2H (11_21110) may represent the region of Eam6, where Candela and Plaisant would carry the early allele and 915006 the late allele. This same region was the main determinant of flowering time in the Beka × Logan population, also grown with autumn sowings in northern Spain (Moralejo et al. 2004). The QTL on the long arm of chromosome 1H (11_20915) may correspond to the earliness per se gene eam8 (Franckowiak et al. 1997). A QTL for heading time has been mapped at this position in different populations (Borner et al. 2002; Emebiri and Moody 2006; Sameri et al. 2006). The QTL on 5H (marker 11_11532) is located a few cM from Vrn-H1, in the same bin 5H-11, which suggests that it may also reflect an effect of this vernalization gene.

As observed for the other traits, the QTL model captured a substantial percentage (85%) of the total genetic variance for HT in field conditions. That most of the background polygenic effect has been captured by the model is reflected in the marginal difference in AIC values between models with and without coancestry information (Table 1). Once more, the importance of including the coancestry information in the QTL scanning model is illustrated in the bottom frame of Fig. 5.

Discussion

Influence of selection

In conventional QTL mapping, segregating populations are assumed to experience no selection, mutation, migration or genetic drift. When those conditions are met, the expected genetic similarity between genotypes in the segregating population is constant, which translates into a homogeneous coancestry or kinship matrix. This is the basic assumption of conventional QTL models, in which the genetic relatedness between lines is represented by an identity matrix. In the presence of allele-frequency-distorting phenomena, such as artificial or natural selection, the genetic similarity between genotypes is not constant, which translates into a heterogeneous coancestry matrix. An important implication of uneven relatedness between genotypes is that it introduces genetic covariance (correlation) between observations on different individuals. When testing the effect of a given DNA polymorphism, this genetic background covariance has to be accounted for to avoid confounding QTL and background effects. The covariance caused by the genetic background is a major cause of spurious associations, widely recognized in the association or linkage disequilibrium mapping literature (Mackay and Powell 2007).

Mixed models accounting for the varying genetic covariances between genotypes in the population have been successfully applied in association mapping studies (Yu et al. 2006; Malosetti et al. 2007; Stich et al. 2008). In this paper, we applied a similar mixed model to the QTL analysis of a designed cross, using a structured variance–covariance matrix whose structure was induced by selection. Questions may arise about the validity of inference on genetic parameters from this type of mixed model in the presence of selection (Piepho et al. 2008; Thompson 2008). Note that our QTL modelling approach, like most mixed model association mapping, looks for fixed-effect QTLs against a background of random residual genetic (genotypic) effects. The focus is on modelling the residual genetic variances and covariances at the level of the population as it is (in this case, a population that underwent selection), with the covariances depending on the genetic distances between individuals; see also Piepho et al. (2008). We want to model genotypic differences by SNP polymorphisms while simultaneously taking into account the residual genetic relationships between those genotypes. Our case should be distinguished from one in which the objective is to rank genotypes on their overall genotypic performance through Best Linear Unbiased Predictions, and in which interest centres on estimating the additive genetic variance in the unselected base population rather than the observed variance among the selected genotypes (Lynch and Walsh 1998, p. 793). Still, theory indicates that even under selection, REML variance component estimates are correct for the base population as long as the phenotypic data contain information on both selected and unselected genotypes (Piepho and Mohring 2005; Piepho et al. 2008; Thompson 2008).

Another valid question is when the inclusion of kinship information in a mixed model gives an advantage over the traditional QTL mapping approach. To answer it, the researcher should consider how far the population under study departs from the standard population assumptions. The larger the departure, the more heterogeneous the kinship matrix will be, and the more advantageous the mixed model will be over conventional QTL mapping. If, on the contrary, there is little departure from the assumptions (for example, if only a few markers show segregation distortion), the kinship matrix will be close to homogeneous, and the mixed model and a conventional QTL mapping method will give similar results. An empirical approach would be to perform both analyses and compare their results. Alternatively, one can take the mixed model with kinship information as the default choice, since in the absence of selection it will largely coincide with the conventional method.
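
The claim that the two approaches converge as the kinship matrix becomes homogeneous can be checked numerically. The sketch below is our illustration, not the software used in this study. The function name `gls_marker_wald`, the simulated data and the treatment of the variance components as known are assumptions; in practice the variance components would be estimated, e.g. by REML. The sketch computes the generalized least squares (GLS) marker estimate and Wald statistic under V = σ_g²K + σ_e²I. With K = I it reproduces ordinary least squares. With a homogeneous K (all off-diagonal values equal), the marker estimate is still identical to the least-squares estimate, and only the variance scale changes.

```python
import numpy as np


def gls_marker_wald(y, marker, K, sigma_g2, sigma_e2):
    """GLS estimate and Wald statistic for one marker effect.

    Model: y = 1*mu + marker*beta + g + e, Var(y) = V = sigma_g2*K + sigma_e2*I.
    Variance components are treated as known here (an illustrative
    simplification; in practice they would be estimated, e.g. by REML).
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    X = np.column_stack([np.ones(n), np.asarray(marker, dtype=float)])
    V = sigma_g2 * np.asarray(K, dtype=float) + sigma_e2 * np.eye(n)
    XtVinvX = X.T @ np.linalg.solve(V, X)     # X' V^-1 X
    XtVinvy = X.T @ np.linalg.solve(V, y)     # X' V^-1 y
    beta = np.linalg.solve(XtVinvX, XtVinvy)  # GLS estimates (mu, beta)
    cov_beta = np.linalg.inv(XtVinvX)
    return beta[1], beta[1] ** 2 / cov_beta[1, 1]


rng = np.random.default_rng(1)
n = 20
marker = np.tile([0, 1], n // 2)
y = 1.0 + 0.5 * marker + rng.normal(size=n)

# Conventional model: K = I, so GLS reduces to ordinary least squares.
b_id, w_id = gls_marker_wald(y, marker, np.eye(n), 0.3, 0.7)
# Homogeneous kinship: equal off-diagonal coancestry (0.4) for all pairs.
K_hom = 0.6 * np.eye(n) + 0.4 * np.ones((n, n))
b_hom, w_hom = gls_marker_wald(y, marker, K_hom, 0.3, 0.7)
```

With a heterogeneous K, for example one computed from marker data in a selected population, neither identity holds in general. The marker estimate and its test statistic then change. This is the sense in which the two approaches converge as the kinship matrix approaches homogeneity.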

Missing markers and segregation distortion

We restricted our analysis to marker genotypes, without attempting to infer missing marker information or to estimate pseudo-markers between observed markers. Estimating missing or pseudo-marker genotypes is common in QTL mapping, where conditional probabilities of (pseudo-)marker genotypes can be computed from flanking markers and map information (Jiang and Zeng 1997). Extensions to estimate identity-by-descent probabilities in multi-parental populations are also available (Crepieux et al. 2004; Paulo et al. 2008; van Eeuwijk et al. 2010). However, the low proportion of missing values in our SNP data, together with the dense SNP coverage of the linkage groups, made estimating conditional identity-by-descent probabilities between SNPs less urgent. With the advent of high-throughput genotyping techniques, we expect such high densities to become the norm in future genetic studies. Nevertheless, one reason to still use linkage information would be to infer identity-by-descent probabilities from the bi-allelic markers in our population, which descends from three ancestral genotypes, since a single bi-allelic SNP cannot distinguish all three founders. Our results show, however, that an analysis restricted to marker positions was powerful enough to detect most of the genetic signal in the segregating population, as the a priori most relevant QTLs were detected and characterized. Simulation studies of SNP-trait associations have shown that SNP-based analysis can efficiently detect QTLs without requiring haplotype information (Zhao et al. 2007a), in agreement with our empirical observation.
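
As an illustration of the conditional probabilities mentioned above, the sketch below computes the probability of a putative QTL genotype between two flanking markers. It makes several simplifying assumptions:

- a backcross or doubled-haploid population with two genotype classes (coded 'A' and 'B');
- distances in Morgans;
- the Haldane map function (no crossover interference).

Our RIL population derived from a three-way cross would require the more involved multi-parental extensions cited above. The function names are hypothetical.

```python
import math


def haldane_r(d_morgan):
    """Haldane map function: map distance (Morgans) -> recombination fraction."""
    return 0.5 * (1.0 - math.exp(-2.0 * d_morgan))


def prob_qtl_A(left, right, d_left, d_right):
    """P(QTL genotype = 'A' | flanking marker genotypes).

    left, right: genotypes 'A' or 'B' at the flanking markers.
    d_left, d_right: distances (Morgans) from the QTL to each marker.
    Assumes a backcross/doubled-haploid population and no interference,
    so recombination in the two intervals is independent.
    """
    r1, r2 = haldane_r(d_left), haldane_r(d_right)

    def joint(q):
        # P(left, q, right) up to a common factor: one term per interval
        p1 = r1 if left != q else 1.0 - r1
        p2 = r2 if q != right else 1.0 - r2
        return p1 * p2

    return joint("A") / (joint("A") + joint("B"))


# QTL in the middle of a 20 cM interval
print(prob_qtl_A("A", "A", 0.1, 0.1))  # close to 1: no crossover is most likely
print(prob_qtl_A("A", "B", 0.1, 0.1))  # 0.5 by symmetry
```

For two 'A' markers, this reproduces the familiar expression (1 − r₁)(1 − r₂)/(1 − r). Here r = r₁ + r₂ − 2r₁r₂ is the recombination fraction between the flanking markers under no interference.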

An interesting alternative to our association mapping approach to QTL analysis in a designed cross under selection or segregation distortion is that of Wu et al. (2007), who extend models for linkage analysis with parameters for segregation distortion. Their approach is based on maximum likelihood estimation of parameters related to segregation distortion (gametic and zygotic selection) in addition to recombination frequencies. Log-likelihood functions and maximum likelihood estimators are described for backcross and F2 populations. Simulations showed that the estimates of those parameters can be rather imprecise and require large populations (Wu et al. 2007). Xu (2008) discusses the same issue of segregation distortion in relation to QTL detection, showing that distortion can either decrease or increase the power of QTL detection. Zhu and Zhang (2007) proposed a multi-point maximum likelihood method to estimate the positions and effects of segregation distortion loci (SDL) based on a liability model. A particularly interesting approach is that of Xu and Hu (2009), which shows the advantage of detecting SDL and QTLs simultaneously. The method is discussed for F2 populations but has been extended to backcrosses, RILs, doubled haploids and four-way crosses (segregating F1 populations). As pointed out by one of the reviewers, exact expectation-maximization (EM) methods are an interesting alternative to the approach presented in this manuscript. However, none of these methods has yet been implemented for the type of population used in this research (RILs derived from a three-way cross), and extending EM methods to this kind of population is outside the scope of this research. This highlights that, while the mixed model approach is generally applicable to any kind of population under any type of distortion, EM approaches require specific adaptations for each new type of population.
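
Distortion at individual markers is commonly screened with a Pearson chi-square goodness-of-fit test against the expected segregation ratio. This is a simpler, single-marker check. It is not the maximum likelihood approach of Wu et al. (2007) or the SDL methods above. The sketch below assumes inbred lines with two genotype classes per SNP, giving one degree of freedom. The function name and the example counts are hypothetical.

Expected proportions depend on the cross. For instance, in RILs from a hypothetical three-way cross (P1 × P2) × P3 without selection, P3 alleles are expected at frequency 1/2 and P1 and P2 alleles at 1/4 each. A SNP allele shared by P1 and P3 would then be expected at 3:1 rather than 1:1.

```python
import math


def segregation_chi2(counts, expected_props):
    """Pearson chi-square test of two genotype-class counts against
    expected segregation proportions (1 degree of freedom)."""
    if len(counts) != 2 or len(expected_props) != 2:
        raise ValueError("this sketch handles two genotype classes only")
    n = sum(counts)
    expected = [n * p for p in expected_props]
    stat = sum((o - e) ** 2 / e for o, e in zip(counts, expected))
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


print(segregation_chi2([60, 40], [0.5, 0.5]))    # tested against 1:1
print(segregation_chi2([60, 40], [0.75, 0.25]))  # tested against 3:1
```

The same counts can be compatible with one expected ratio and clearly distorted under another. Knowing the founder origin of each allele therefore matters when judging distortion in a multi-parent cross.
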
Finally, the expected increase in marker density should reduce the need for exact methods that estimate conditional QTL genotypes between markers, because sets of adjacent, partially informative markers can be translated into fully informative multi-allelic haplotypes (Waugh et al. 2009).
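
The idea of combining partially informative bi-allelic markers into multi-allelic haplotypes can be sketched as follows. The founder names P1 to P3 and their alleles are invented for illustration and not taken from our data. In this example, no single SNP separates all three founders, but the two-SNP haplotype does, so a line's haplotype in the block identifies its founder of origin directly. The function names are hypothetical. The sketch does not model genotyping errors or recombination within the block; it simply flags such haplotypes as unmatched.

```python
def haplotype_to_founders(founder_haps):
    """Map each multi-SNP haplotype to the set of founders that carry it.

    founder_haps: dict founder name -> tuple of alleles (0/1) at a block
    of adjacent SNPs.
    """
    table = {}
    for founder, hap in founder_haps.items():
        table.setdefault(tuple(hap), set()).add(founder)
    return table


def assign_founder_origin(line_haps, founder_haps):
    """For each line's haplotype, return the founders it could descend from.

    An empty set flags a haplotype carried by no founder (e.g. a recombinant
    within the block or a genotyping error).
    """
    table = haplotype_to_founders(founder_haps)
    return [table.get(tuple(h), set()) for h in line_haps]


founders = {"P1": (0, 0), "P2": (0, 1), "P3": (1, 1)}
# The first SNP alone cannot separate P1 from P2:
# {(0,): {"P1", "P2"}, (1,): {"P3"}}
single = haplotype_to_founders({f: h[:1] for f, h in founders.items()})
print(assign_founder_origin([(0, 1), (1, 1), (0, 0), (1, 0)], founders))
```

In practice, blocks would be formed from adjacent markers in tight linkage, so that recombination within a block is rare.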

Conclusion

The development of segregating populations for genetic studies requires contrasting parents, which can result in populations that are not representative of the genetic backgrounds actually used in breeding programs. Alternatively, populations can be produced by crossing parental genotypes that represent more relevant genetic backgrounds. However, for those populations to remain relevant from a breeder's perspective, selection should be allowed to eliminate poorly adapted genotypes, which are irrelevant for genetic improvement. We have shown here that such a population can still serve for genetic analysis, provided that the genetic covariance structure is modelled appropriately.

We illustrated for three traits that ignoring coancestry information results in an unrealistically high number of marker-trait associations, without providing clear conclusions about QTL locations. We used several widely recognized dwarfing and vernalization genes known to segregate in the studied population as landmarks to assess the agreement of the results with a priori expectations. The presence of major genes governing the traits did not preclude the identification of additional QTLs.