Background

New molecular technologies, such as DNA sequencing and SNP detection, allow high-throughput genotyping for QTL mapping with dense genetic maps. Classical linkage analysis (LA) methods can therefore be improved by integrating linkage disequilibrium information (LDLA), or may even be considered obsolete and abandoned in favor of LD methods. However, the relative merit of the two approaches depends on several parameters, such as the experimental design (number and size of families), the LD status between QTL and markers, the density of the genetic map and the QTL effects on the traits. In this study, we investigate some of these points using the XIIIth QTLMAS workshop simulated dataset.

Methods

Simulated data

Data for 2,025 individuals across 2 generations were simulated [1]. In the first generation, 5 sires were mated with 20 dams to produce 2,000 offspring divided into 100 full-sib families, one for each possible sire-dam combination (20 offspring per family). All individuals were genotyped for 453 SNP markers distributed over 5 linkage groups, and only individuals from 50 families were phenotyped for one trait measured at 5 different time points across the production curve.
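As a quick check of the design arithmetic, the following minimal Python sketch enumerates the sire-dam combinations described above. The labels (S1, D1, ...) are hypothetical and are not the identifiers used in the simulated dataset.

```python
from itertools import product

# Hypothetical labels for the 5 sires and 20 dams of the first generation.
sires = [f"S{i}" for i in range(1, 6)]
dams = [f"D{j}" for j in range(1, 21)]

# One full-sib family per sire-dam combination.
families = list(product(sires, dams))

# 20 offspring per family.
OFFSPRING_PER_FAMILY = 20
offspring = [(sire, dam, k) for sire, dam in families
             for k in range(OFFSPRING_PER_FAMILY)]

total_individuals = len(sires) + len(dams) + len(offspring)
print(len(families), len(offspring), total_individuals)  # 100 2000 2025
```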

Models

Data were analyzed by fitting several models, and the results were compared with each other.

In the first three models, the markers are assumed to affect the trait only if they are in linkage disequilibrium with a QTL (LD models).

The first model was fitted without taking any population structure into account. The association between the marker and the trait was tested using a marker fixed effect with 4 levels (00, 01, 10, 11):

  y = μ + Xg + e  (LD1)

Number of levels: g (4)

Where y is a vector of phenotypes, μ is the overall mean, X is a design matrix allocating records to the marker effect, g is the vector of marker effects and e is a vector of random deviates distributed as $N(0, \sigma_e^2)$, where $\sigma_e^2$ is the error variance.
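With a single fixed effect, testing the marker effect in LD1 amounts to a one-way analysis of variance. The sketch below illustrates that F statistic in plain Python on hypothetical toy phenotypes; it is not the software used in the study, and the function name one_way_f is an assumption made for this example.

```python
def one_way_f(phenotypes, genotypes):
    """Return (F, df_marker, df_error) for a one-way fixed-effect model."""
    # Group phenotypes by marker genotype level (e.g. 00, 01, 10, 11).
    groups = {}
    for y, g in zip(phenotypes, genotypes):
        groups.setdefault(g, []).append(y)

    n = len(phenotypes)
    k = len(groups)
    grand_mean = sum(phenotypes) / n

    # Sum of squares explained by the marker (between genotype classes).
    ss_marker = sum(len(v) * (sum(v) / len(v) - grand_mean) ** 2
                    for v in groups.values())
    # Residual sum of squares (within genotype classes).
    ss_error = sum((y - sum(v) / len(v)) ** 2
                   for v in groups.values() for y in v)

    df_marker, df_error = k - 1, n - k
    f_stat = (ss_marker / df_marker) / (ss_error / df_error)
    return f_stat, df_marker, df_error


# Hypothetical toy data: two records per genotype class.
toy_y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
toy_g = ["00", "00", "01", "01", "10", "10", "11", "11"]
f_stat, df_marker, df_error = one_way_f(toy_y, toy_g)
print(f_stat, df_marker, df_error)
```

The p-value is then obtained from the F distribution with (df_marker, df_error) degrees of freedom.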

The second model considers the association between the marker and the trait while taking into account also the parental effect:

   y = µ + sire + dam + Xg + e   (LD2)

Number of levels: sire (5), dam (20), g (4)

Where dam is the dam fixed effect, sire is the sire fixed effect.

The third model considers the effects of the SNP alleles:

   y = µ + sire + dam + HS + HD + e   (LD3)

Number of levels: sire (5), dam (20), HS (2), HD (2)

Where HS and HD are the marker alleles received by a progeny from its sire and from its dam, respectively.

The last model is a linkage analysis model taking into account the parental haplotype received by a progeny from its parent, within family:

   y = µ + sire + dam + HS (sire) + HD (dam) + e   (LA)
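The difference between LD3 and LA lies in how the transmitted allele or haplotype effects are indexed: across the whole population in LD3, within parent in LA. The sketch below counts the resulting effect levels for the design described above, assuming (for illustration only) that alleles are coded 0/1 and that each parent carries two distinguishable haplotypes labeled 1 and 2.

```python
from itertools import product

N_SIRES, N_DAMS = 5, 20
ALLELES = (0, 1)      # assumed allele coding at a SNP
HAPLOTYPES = (1, 2)   # hypothetical labels for the two haplotypes of a parent

# LD3: one allele effect shared by all sires (HS) and by all dams (HD).
ld3_hs_levels = set(ALLELES)
ld3_hd_levels = set(ALLELES)

# LA: haplotype effects nested within each parent, HS(sire) and HD(dam).
la_hs_levels = set(product(range(1, N_SIRES + 1), HAPLOTYPES))
la_hd_levels = set(product(range(1, N_DAMS + 1), HAPLOTYPES))

print(len(ld3_hs_levels), len(ld3_hd_levels))  # 2 2
print(len(la_hs_levels), len(la_hd_levels))    # 10 40
```

Because each parent has its own contrast in the LA model, the test relies on within-family segregation only and does not require the marker allele to be associated with the same QTL allele across families.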

Statistical methods

The 4 linear models were applied marker by marker using the SAS GLM procedure [2]. The association between the marker and the trait was tested, marker by marker, through the significance of the marker or haplotype fixed effect, where relevant separately by parental sex. As the LA analyses were also performed marker by marker, this approach is referred to below as MLA.

The LA model was also applied in a QTL interval mapping framework (hereafter called IMLA), using the QTLMAP software [3], which was developed for populations containing a mixture of full- and half-sib families [4]. The presence of a QTL was assessed using the likelihood ratio of the hypothesis of one QTL vs. no QTL linked to a given set of markers [5]. A fast algorithm was developed to estimate transmission probabilities at each location of a linkage group from the SNP information [6]. QTLMAP was also used to test more complex hypotheses, such as two linked QTLs influencing the same trait [7]. In this case, the H1 hypothesis (there is one QTL on the linkage group) is compared with the H2 hypothesis (there are two QTLs on the linkage group). The two QTL locations under H2 are estimated by considering all possible combinations on a two-dimensional grid. This is of particular interest for testing whether a QTL detected in a single QTL LA could be a ghost QTL.
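To illustrate how flanking markers inform the transmission probability at a putative QTL position, here is a minimal sketch. It assumes two fully informative flanking markers, no crossover interference (Haldane map function) and haplotypes labeled 1 and 2; it is a textbook simplification, not the fast algorithm of [6] implemented in QTLMAP, and the function names are hypothetical.

```python
import math


def haldane_r(distance_cm):
    """Recombination fraction for a map distance in cM (Haldane function)."""
    return 0.5 * (1.0 - math.exp(-2.0 * distance_cm / 100.0))


def prob_first_haplotype(left_hap, right_hap, d_left_cm, d_right_cm):
    """P(progeny received parental haplotype 1 at the tested position).

    left_hap / right_hap: haplotype (1 or 2) transmitted at the flanking
    markers; d_left_cm / d_right_cm: distances from the tested position
    to the left and right markers.
    """
    r1 = haldane_r(d_left_cm)
    r2 = haldane_r(d_right_cm)
    # Joint probability of the flanking observations if haplotype 1 was
    # transmitted at the position: no recombination towards a marker
    # showing haplotype 1, recombination towards a marker showing 2.
    p_q1 = (1 - r1 if left_hap == 1 else r1) * (1 - r2 if right_hap == 1 else r2)
    # Same quantity if haplotype 2 was transmitted at the position.
    p_q2 = (1 - r1 if left_hap == 2 else r1) * (1 - r2 if right_hap == 2 else r2)
    return p_q1 / (p_q1 + p_q2)


# Concordant flanking markers 1 cM away on each side: nearly certain.
p_concordant = prob_first_haplotype(1, 1, 1.0, 1.0)
# Discordant flanking markers, position at the midpoint: uninformative.
p_discordant = prob_first_haplotype(1, 2, 1.0, 1.0)
print(p_concordant, p_discordant)
```

When a marker is not informative in a family, an interval mapping approach falls back on the next informative markers on either side, which is why it keeps some power where a single-marker test has none.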

Finally, a possible interaction between QTLs was evaluated by fitting a previously detected QTL as a fixed effect and testing for an interaction between this QTL and the rest of the linkage group. For each progeny, the level of the fixed effect is deduced from the probability of allele transmission at the QTL location when this probability is higher than 0.8 or lower than 0.2; other progeny are discarded. In this way, the effect of the known QTL should be removed, so that another QTL may be detected. The test only considers previously detected QTL against other locations, and only within a linkage group. Two interacting QTLs without main effects, or located on different linkage groups, cannot be detected.
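The assignment rule described above can be sketched as follows. The 0.8 and 0.2 cut-offs come from the text; the function name, the level labels and the example probabilities are hypothetical.

```python
def assign_qtl_level(transmission_prob, high=0.8, low=0.2):
    """Return the fixed-effect level at the known QTL, or None to discard.

    transmission_prob: probability that the progeny received the first
    parental allele at the QTL location.
    """
    if transmission_prob > high:
        return 1          # first parental allele considered transmitted
    if transmission_prob < low:
        return 2          # second parental allele considered transmitted
    return None           # ambiguous transmission: progeny discarded


# Hypothetical transmission probabilities for five progeny.
example_probs = [0.95, 0.10, 0.50, 0.81, 0.20]
levels = [assign_qtl_level(p) for p in example_probs]
kept = [lv for lv in levels if lv is not None]
print(levels, len(kept))
```

Note that progeny with a probability exactly equal to a cut-off are discarded here, since the text specifies strict inequalities.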

For all these analyses, significance thresholds were determined by simulating performances under a polygenic model with a given heritability (h² = 0.5). For the two-QTL model, the most likely location and effect estimated under the single QTL hypothesis were used to add this QTL effect to the simulated performances. Up to 200 simulations were performed for each trait × linkage group combination, and rejection thresholds were estimated according to the Harrell and Davis method [8].
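For readers unfamiliar with the Harrell and Davis estimator, the sketch below computes it in plain Python as a weighted average of the ordered test statistics, with weights given by increments of a Beta((n+1)p, (n+1)(1-p)) distribution function. The Beta increments are approximated here by Simpson integration, which is an implementation choice made for this example; the quantile level and the input values are hypothetical, and this is not the code used in the study.

```python
import math


def beta_pdf(x, a, b):
    """Beta(a, b) density; taken as 0 at the boundaries (valid for a, b > 1)."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1.0) * math.log(x)
                    + (b - 1.0) * math.log(1.0 - x))


def harrell_davis_quantile(values, p, steps=20):
    """Harrell-Davis estimate of the p-th quantile of `values`."""
    xs = sorted(values)
    n = len(xs)
    a = p * (n + 1)
    b = (1.0 - p) * (n + 1)
    if a <= 1.0 or b <= 1.0:
        raise ValueError("this sketch requires (n+1)p > 1 and (n+1)(1-p) > 1")
    weights = []
    for i in range(1, n + 1):
        lo, hi = (i - 1) / n, i / n
        h = (hi - lo) / steps
        # Composite Simpson rule for the Beta mass on [(i-1)/n, i/n].
        total = beta_pdf(lo, a, b) + beta_pdf(hi, a, b)
        for k in range(1, steps):
            total += (4 if k % 2 else 2) * beta_pdf(lo + k * h, a, b)
        weights.append(total * h / 3.0)
    norm = sum(weights)  # close to 1; renormalized to absorb integration error
    return sum(w * x for w, x in zip(weights, xs)) / norm


# Hypothetical null test statistics from 200 simulations.
null_stats = [float(v) for v in range(1, 201)]
threshold_95 = harrell_davis_quantile(null_stats, 0.95)
median_check = harrell_davis_quantile([1.0, 2.0, 3.0, 4.0, 5.0], 0.5)
print(threshold_95, median_check)
```

Compared with a single order statistic, this estimator uses all simulated values, which tends to give smoother thresholds when the number of simulations is limited.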

Results and discussion

Linkage disequilibrium analyses

The first model (LD1) identifies a very large number of significant SNPs across all the linkage groups (Figure 1). When the parental (sire and dam) effects, which account for the polygenic background, were introduced in the model (LD2), a smaller number of significant SNPs was identified across linkage groups, which gives a better picture of the association between markers and QTL. The third model (LD3) gave results similar to those obtained with LD2.

Figure 1

Single QTL detection with the LD and LA models for P530 (P<0.0001). Each SNP p-value is plotted according to its physical location. The linkage groups are separated by red lines. For LD1 and LD2, the overall model effect is used, while the sire effect is used for the LD3 and LA models. The threshold line corresponds to the natural logarithm of 0.0001 and is common to all models.

Single QTL linkage analysis

Single QTL interval mapping analysis detected 9 additive QTLs, summarized in Table 1. Interestingly, some of the QTLs are detected only at some time points, revealing that the genetic determinism of the trait changes over time. Most of these regions were not identified by the MLA. In addition, except for the first chromosome, the MLA analysis does not give a precise location for the QTL, and it did not detect the change in QTL effects over time.

Table 1 Locations, effects and test statistic values for the QTLs detected by the different analyses.

These discrepancies between the MLA and IMLA analyses are probably due mainly to two factors. The first is marker density (about one SNP per cM) and informativity. Even in regions with low informativity, the interval mapping method can calculate a probability of allele transmission by using flanking markers, whereas single-point analysis loses all its power. This is particularly striking for the distal region of chromosome 3 (92 cM). At this location, QTLMAP detects a significant QTL (P<0.05), while the MLA analysis does not. The very low informativity of the markers in this region (see Figure 2) probably explains these observations. This suggests that point-by-point and multipoint approaches would give similar results with a very dense genetic map and/or large QTL effects. The second factor is the test statistic used for QTL detection. The MLA method performs a Fisher test at the marker location, while QTLMAP performs a likelihood ratio test, estimating both the QTL effect and its location. When MLA is performed on sires selected for their heterozygosity at a QTL, or on subsets of families (selection of sires and dams), the QTL on chromosome 3 can be detected (see Figure 3).

Figure 2

Informativity of chromosome 3 markers in the population. The informativity values correspond to the average of transmission probabilities at the SNP location. The arrow highlights the location of the QTL detected at 92 cM.

Figure 3

MLA results on chromosome 3 with all parents (A), selected sires (B) and selected sires and dams (C)

MultiQTL analysis

The two-QTL vs. one-QTL tests were performed on all chromosomes and traits. With this model, two additive QTLs can be identified (whether or not their effects are reciprocal), but interactions are not tested. Two QTLs (chromosome 4, at 9.3 and 75.3 cM) were identified as having a significant effect on the trait at P0 and P132.

When the QTL observed on chromosome 3 at 17 cM was used as a fixed effect, a QTL was detected at 48.7 cM.

Finally, when testing the interaction between a previously detected QTL and other locations on its linkage group, two new QTL were identified: a non-significant one on chromosome 3, at 1.3 cM, interacting with the QTL previously identified at 17 cM, and another on chromosome 5, at 80 cM, interacting with the QTL identified at 72 cM. For both regions, neither the single QTL analysis nor the multiQTL analysis identified QTLs for any trait. Another interesting result is that for the second linkage group, at P0, a highly significant QTL is observed at 0.7 cM when testing the interaction with the QTL located at 43 cM, whereas in the single QTL analysis this QTL has no effect on the trait until P263.

Comparison of LD and LA methods

As illustrated in Figure 1, the different LD models detect QTL only on chromosomes 1, 2 and 4, and with very low accuracy in location. The results obtained by MLA are very similar. In contrast, IMLA (QTLMAP) detects more QTL and can identify two QTL located on the same linkage group.

Comparison of detected QTL with simulated QTL

As IMLA performed by QTLMAP gave the best results, only the QTL identified through this method are compared with the simulated QTL. Of the 13 QTL identified using all the different strategies (9 by single QTL analyses and 4 by multiple QTL analyses), five are located within 5 cM of one of the simulated QTL (see Table 2). Five other QTL were located between 5 and 10 cM from a simulated QTL location. Two detected QTL are 11 and 17 cM away from the most probable location and could be considered false positives. For the QTL on chromosome 5 (77.19 cM) affecting the asymptote, we found two flanking QTL (72 and 80 cM). This is probably a bias in the analysis due to the data structure. In the end, 10 of the 18 simulated QTL were detected, 5 with good accuracy in location, and 3 reported QTL were false positives. The estimated effects of the detected QTL mostly overestimate the simulated QTL effects. This bias is often observed and could be due to the "additive" analysis strategy by linkage group, to the possible confounding between polygenic and QTL effects with the sire/dam model used [9], and also to the use of time point traits instead of the growth curve.
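The distance classes used in this comparison can be expressed as a small helper. The sketch below is an illustration only: the function name and class labels are hypothetical, the handling of distances exactly equal to 5 or 10 cM is an assumption, and the only positions taken from the text are those of the chromosome 5 example (simulated QTL at 77.19 cM, detected QTL at 72 and 80 cM).

```python
def classify_detection(detected_cm, simulated_cm):
    """Classify a detected QTL by its distance to a simulated QTL (in cM)."""
    distance = abs(detected_cm - simulated_cm)
    if distance < 5.0:
        return "within 5 cM"
    if distance <= 10.0:
        return "5 to 10 cM"
    return "possible false positive"


# Chromosome 5 example from the text.
simulated_position = 77.19
class_at_72 = classify_detection(72.0, simulated_position)
class_at_80 = classify_detection(80.0, simulated_position)
print(class_at_72, "|", class_at_80)

# Hypothetical positions that are 17 cM apart, as for one of the two
# distant detections mentioned above.
class_far = classify_detection(0.0, 17.0)
print(class_far)
```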

Table 2 Comparison of the detected QTL with the simulated data.

QTLMAP is freely available through the Quantitative Genetic Platform (QGP) at the following address: https://qgp.jouy.inra.fr/

Conclusion

Our results show that, at this marker density, the interval mapping strategy still gives better results than linkage disequilibrium models alone. While the structure of the experimental design gives considerable power to both approaches, marker density and informativity clearly affect the efficiency of linkage disequilibrium methods for QTL detection. In addition, an interval mapping strategy makes it possible to test interactions between markers. However, the LDLA strategy was not tested here and would be expected to improve QTL detection.