Background

The identification of chromosomal segments showing association or linkage is only the first step toward discovery of genetic factors underlying susceptibility to disease. The typical genome-wide linkage analysis based on microsatellites with an average density of 10 cM results in large genomic regions for fine-mapping. There is therefore considerable interest in developing maps based on genomic markers that will lead to higher resolution linkage results, with the hope of reducing the cost and time of subsequent fine-mapping. With the availability of several million new SNPs in public databases and new technologies for large-scale, high-throughput SNP genotyping at affordable costs, there is growing interest in using SNPs to create high resolution linkage maps. In this paper we evaluate strategies to systematically compare genome-wide linkage results from microsatellites and SNPs using maps of different densities.

Methods

Materials

The dataset for the Collaborative Study on the Genetics of Alcoholism (COGA) was provided as problem 1 for GAW14. The dataset included 1,350 individuals in 143 pedigrees, 318 microsatellite genotypes for a 10 cM genome map, 4,763 SNP loci from Illumina, 11,555 SNP loci from Affymetrix, and phenotypic information. We used MAXDRINK and TTTH1 as phenotypes and the panel of 4,763 Illumina SNPs. MAXDRINK is defined as the maximum number of drinks in a 24-hour period [1]. TTTH1 is a measure from the Visual Oddball Experiment and the Eyes Closed Resting EEG dataset for the frontal left-side channel. The extracted measures correspond to the 'late' time window, which is set at 300 to 700 ms following stimulus presentation (bounding the visual P3 event), and the theta band power (3 to 7 Hz) [2]. These phenotypes were log transformed for all analyses. Three chromosomes (1, 4, and 7) that showed linkage signals for the MAXDRINK or TTTH1 phenotypes in previous reports [1, 2] were selected for our analyses.
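As a rough illustration of the log transformation, together with the kind of kurtosis check reported in the Results, the following Python sketch may be helpful. It is not the code used in the study: the function names are hypothetical, the kurtosis is computed on the transformed values themselves rather than on model residuals, and the handling of non-positive values (rejected here) is an assumption, since the text does not describe it.

```python
import math


def log_transform(values):
    """Natural-log transform; rejects non-positive values (an assumption)."""
    if any(v <= 0 for v in values):
        raise ValueError("log transform requires strictly positive values")
    return [math.log(v) for v in values]


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # second central moment
    m4 = sum((v - mean) ** 4 for v in values) / n  # fourth central moment
    return m4 / m2 ** 2 - 3.0


# Made-up phenotype values, for illustration only (not COGA data).
toy_phenotype = [2, 4, 6, 8, 12, 18, 24, 30, 48]
transformed = log_transform(toy_phenotype)
print(round(excess_kurtosis(transformed), 3))
```

Values of excess kurtosis close to zero are usually taken as compatible with the normality assumption of variance components methods.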

Statistical analysis

For each chromosome, we constructed haplotypes using GENEHUNTER2 (GH2) [3]. GH2 assumes linkage equilibrium among markers. However, as discussed by Schaid et al. [4], if closely spaced markers are useful for haplotype fine mapping, it is reasonable to assume that the markers themselves are in linkage disequilibrium (LD), because the implicit basis of fine mapping by haplotypes is LD. Haplotype blocks were generated using the statistical framework method [5], in which the inference of the optimal haplotype block partitioning is formulated as a problem of statistical model selection based on the likelihood of the observed data, so as to define regions in which only a very small proportion of comparisons among informative SNP pairs show strong evidence of historical recombination. We selected SNPs at random from each block to test for the minimum number of SNPs required to achieve the same results as using all the SNPs in a block. Rep1 denotes the random selection of one SNP from each block and Rep2 the random selection of two SNPs from each block; this process was repeated up to a maximum of five SNPs (Rep5) from each block. We stopped at five because the smallest observed block contained five SNPs. We also conducted linkage analysis using all available SNPs. A variance components approach as implemented in SOLAR was used for all analyses [6]. The linkage results using microsatellite markers were then compared with those from the constructed haplotype blocks, from the reduced numbers of SNPs from each block (Rep1 through Rep5), and from the entire set of SNPs. The range of positional candidate regions was defined by a logarithm of odds (LOD) score of ≥ 1.0.
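The random selection of SNPs within blocks (Rep1 through Rep5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure actually run for the paper: the block labels and SNP names are placeholders, the function name is hypothetical, and each replicate is drawn independently (the text does not say whether, for example, the Rep2 SNPs include the Rep1 SNP).

```python
import random


def select_snps_per_block(blocks, k, seed=None):
    """Randomly pick k SNPs (without replacement) from every haplotype block.

    blocks: dict mapping a block label to its list of SNP names in map order.
    Returns a dict with the same labels and the chosen SNPs, still in map order.
    """
    rng = random.Random(seed)
    selected = {}
    for label, snps in blocks.items():
        if k > len(snps):
            raise ValueError(f"block {label!r} has only {len(snps)} SNPs, cannot pick {k}")
        # Sample positions rather than names so map order is easy to restore.
        chosen = sorted(rng.sample(range(len(snps)), k))
        selected[label] = [snps[i] for i in chosen]
    return selected


# Hypothetical blocks with placeholder SNP names (not from the COGA data).
blocks = {
    "block_A": ["snp1", "snp2", "snp3", "snp4", "snp5"],
    "block_B": ["snp6", "snp7", "snp8", "snp9", "snp10", "snp11"],
}

# Rep1 through Rep5: one to five SNPs per block.
reps = {f"Rep{k}": select_snps_per_block(blocks, k, seed=k) for k in range(1, 6)}
for name, picks in reps.items():
    print(name, picks)
```

Capping k at five mirrors the stopping rule described above: with a smallest block of five SNPs, Rep5 is the largest selection that every block can supply.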

Results

The residual kurtosis values of the log-transformed MAXDRINK and TTTH1 phenotypes were -0.18 and 0.57, respectively, allowing the assumption of normality in our analyses. The distribution of haplotype blocks for chromosomes 1, 4, and 7 is displayed in Table 1. Although the LOD scores from the linkage analyses based on SNPs were consistently larger than those based on microsatellites (p < 0.01), the locations of the signals were for the most part similar (Figures 1 and 2). Interestingly, two linkage regions on chromosome 7 (154 cM and 163 cM) were not detected in the SNP analyses for the TTTH1 phenotype (Table 2 and Figure 1). The SNP density and associated information content around the chromosome 7 linkage peaks from the STRP (microsatellite) scan are displayed in Table 3. No significant linkage signals were observed for chromosome 4. Overall, the largest LOD score of 1.66 was observed on chromosome 1 for the analyses based on the entire set of SNPs using the log of MAXDRINK as the phenotype (Table 2). Table 4 shows the widths and boundaries of linkage regions on chromosome 7 for log-transformed TTTH1 and on chromosome 1 for log-transformed MAXDRINK. The width of the linkage region for log-transformed TTTH1 was 58 cM from microsatellite markers, compared with 24 cM, 40 cM, 34 cM, 38 cM, 30 cM, and 33 cM, respectively, from Rep1 to Rep5 and the entire set of SNPs.
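To make the notion of region width concrete, the sketch below shows one way to derive widths from a LOD curve using the LOD ≥ 1.0 criterion given in the Methods. It is an illustrative assumption, not the authors' procedure: the function name is hypothetical, boundaries are taken at the evaluated map positions without interpolation, and the positions and LOD scores are made up.

```python
def linkage_regions(positions_cm, lod_scores, threshold=1.0):
    """Return (start_cM, end_cM, width_cM) for each run of LOD >= threshold.

    positions_cm must be sorted; boundaries fall on evaluated positions only.
    """
    regions = []
    start = None  # first position of the current run, if any
    last = None   # most recent position at or above the threshold
    for pos, lod in zip(positions_cm, lod_scores):
        if lod >= threshold:
            if start is None:
                start = pos
            last = pos
        elif start is not None:
            regions.append((start, last, last - start))
            start = None
    if start is not None:  # run extends to the end of the scan
        regions.append((start, last, last - start))
    return regions


# Made-up multipoint scan on a 10 cM grid (not results from the paper).
positions = [0, 10, 20, 30, 40, 50, 60]
lods = [0.2, 1.1, 1.5, 0.9, 0.4, 1.0, 1.3]
for start, end, width in linkage_regions(positions, lods):
    print(f"{start}-{end} cM (width {width} cM)")
```

On a coarse grid this approach can understate the true width; interpolating the crossing points of the threshold would be a natural refinement.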

Table 1 Haplotype block distributions for chromosomes 1, 4, and 7
Table 2 Significant linkage results
Figure 1

Linkage plot for the log-transformed TTTH1 phenotype on chromosome 7. LOD scores for microsatellites (solid line) and SNPs (dashed lines). The different colored dashed lines represent the results of the linkage analyses based on one, two, three, four, and five randomly selected SNPs from each haplotype block and on the entire set of SNPs.

Table 3 SNP density and information content around the chromosome 7 linkage peak (114–172 cM) for STRP scan
Table 4 Widths of linkage regions by chromosome
Figure 2

Linkage plot for the log-transformed MAXDRINK phenotype on chromosome 1. LOD scores for microsatellites (solid line) and SNPs (dashed lines). The different colored dashed lines represent the results of the linkage analyses based on one, two, three, four, and five randomly selected SNPs from each haplotype block and on the entire set of SNPs.

Discussion

Overall, the patterns of linkage results from microsatellites were similar to those obtained from the SNP analyses for chromosomes 1, 4, and 7. It was notable, however, that the SNP analyses did not detect two linkage regions on chromosome 7 (LOD = 1.87 and 2.01; Table 2). As displayed in Figures 1 and 2, the LOD score peaks generated from SNPs were slightly shifted to the left compared with those from microsatellite markers. Potential reasons for this observation include the different builds of the genetic maps used for the microsatellite markers and SNPs, and/or errors in genotyping [7]. Kruglyak [8] observed that LOD scores increased proportionately with the information content of the linkage map derived from a denser SNP map. In our results, reducing the number of SNPs in each block to 1, 2, 3, 4, or 5 did not significantly change the shape of the linkage signals, albeit with a small drop in peak height. Because expected LOD scores correlate with information content, this is consistent with Table 3, which shows only a small reduction in information content for Rep1 and no difference for the other sets. It has been estimated that 1.7–2.5 SNP markers provide information equivalent to that of one microsatellite marker [8, 9] and that a 10 K SNP array provides at least equal power to detect linkage compared with a search based upon a 5 Mb microsatellite screen [10]. In our results, 2.5 SNP markers provided the same information content as one microsatellite marker. These observations support the idea that the use of high-density SNP maps for linkage analysis should result in more precisely defined loci at substantially reduced cost.

Conclusion

Linkage analysis based on SNP maps can yield narrower linkage regions with higher LOD scores than analysis based on microsatellite marker maps. The linkage results from reduced sets of SNPs provided signals in the same linkage regions but with smaller LOD scores, suggesting that loss of information content influenced the expected LOD scores. The different builds of the genetic maps used for microsatellite markers and SNPs, and/or errors in genotyping, may explain why the significant linkage region observed on chromosome 7 in the microsatellite scan was not detected in the genome scan based on SNPs, and why the SNP peaks were slightly shifted to the left of the microsatellite peaks.