Background

Alcoholism is a complex disorder in which multiple genes may contribute to the risk [1]. To address this complexity, the Collaborative Study on the Genetics of Alcoholism (COGA) researchers designed a large-scale family study and collected multiple alcoholism and alcohol-related phenotypes from the participants.

ALDX1, the primary COGA definition of alcohol dependence, requires a person to meet both the DSM-III-R criteria and the Feighner criteria [1, 2]. Linkage analyses using ALDX1 have provided evidence of linkage on chromosomes 1, 2, and 7 [3, 4]. Highly heritable electrophysiological variables, such as electroencephalography (EEG) measures and event-related potentials (ERPs), have been used to identify genes that affect brain activity related to alcoholism. Data from the Eyes Closed Resting EEG experiment (ECB21) have revealed strong linkage on chromosome 4 [5, 6]. In further studies, the far frontal left side channel measure extracted from the target case of the Visual Oddball experiment for 4 electrode placements (TTTH1) revealed strong linkage on chromosome 7 [7]. In these studies, a 10-cM map of microsatellites was used for the initial scan. The recent addition of single nucleotide polymorphisms (SNPs) within these linkage regions has improved the resolution of the mapping results [1, 7].

The microsatellite-based screening approach has been used successfully for mapping Mendelian diseases. However, this technique has proven unreliable for complex genetic diseases [8, 9]. It has been suggested that a 1–2 cM map of moderately polymorphic biallelic markers would be more powerful than a 5–10 cM map of microsatellite screening sets [10]. If that is the case, the recently developed whole-genome sampling analysis approach based on high-density oligonucleotide arrays [11] should provide an ideal set of genotype data for a whole-genome scan.

The COGA dataset provided to participants at GAW14 included data from 1,350 participants from 143 families. The genotype dataset included data produced by a 10-cM map of microsatellites, the Affymetrix GeneChip® Human Mapping 10K Array (HMA10K), and the Illumina SNP-based Linkage III Panel. To identify susceptibility regions for alcoholism, we performed a genome-wide multipoint linkage analysis using the alcohol dependence phenotypes ALDX1 and ALDX2 (diagnosed by DSM-IV criteria) and the quantitative traits TTTH1 and ECB21. The performance of the microsatellites, the Affymetrix HMA10K Array, and the Illumina Linkage III Panel was compared in terms of information content, identified linkage regions, and the 1-LOD support intervals of those regions.

Methods

Map construction

The 10-cM microsatellite map contained 328 microsatellites, of which 309 have unique locations on the deCode high-resolution genetic maps. To map the SNPs, we first obtained their physical locations from build 34 of the human genome dbSNP database at the National Center for Biotechnology Information (NCBI). We then interpolated the genetic map locations using the microsatellites with unique physical locations in the deCode genetic maps. In total, 11,050 Affymetrix SNPs and 4,700 Illumina SNPs with unique sex-averaged genetic map locations were used in our study. We also created an even denser map of markers by combining the Affymetrix and Illumina SNPs (Comb2). In addition, we combined the Affymetrix and Illumina SNPs with the microsatellite data (Comb3) to determine the contribution of the microsatellite markers.
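The interpolation step can be illustrated with a minimal sketch. It assumes simple linear interpolation between the two flanking anchor markers on the same chromosome; the function name, the anchor values, and the choice to clamp positions outside the anchor range are hypothetical and are not taken from the study.

```python
from bisect import bisect_right


def interpolate_cm(pos_bp, anchors):
    """Linearly interpolate a genetic position (cM) from a physical position (bp).

    anchors: list of (bp, cM) pairs for one chromosome, sorted by bp,
    with distinct bp values (e.g. microsatellites with known map positions).
    Positions outside the anchor range are clamped to the nearest anchor,
    which is an assumption of this sketch.
    """
    bps = [bp for bp, _ in anchors]
    if pos_bp <= bps[0]:
        return anchors[0][1]
    if pos_bp >= bps[-1]:
        return anchors[-1][1]
    i = bisect_right(bps, pos_bp)          # first anchor strictly to the right
    (bp0, cm0), (bp1, cm1) = anchors[i - 1], anchors[i]
    frac = (pos_bp - bp0) / (bp1 - bp0)    # fraction of the physical interval
    return cm0 + frac * (cm1 - cm0)


# Hypothetical anchors: (physical position in bp, genetic position in cM)
anchors = [(1_000_000, 0.0), (5_000_000, 8.0), (9_000_000, 10.0)]
snp_cm = interpolate_cm(3_000_000, anchors)  # halfway between the first two anchors
```

Linear interpolation treats the recombination rate as constant between adjacent anchors, so its accuracy depends on how closely the anchors are spaced.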

Genotype error detection

The datasets were prepared with PEDCHECK [12] to remove Mendelian inconsistencies. A small number of erroneous genotypes can reduce the power of linkage analysis [13–15]. We therefore used MERLIN [16] to eliminate the genotypes with unlikely recombination patterns (0.39% for microsatellites, 0.16% for Affymetrix SNPs, 0.13% for Illumina SNPs).

Linkage analysis

Information content (IC) measures how much of the inheritance information can be extracted from available genotype data. It closely predicts the power of a map to detect linkage [10]. We used MERLIN to calculate IC at every marker locus.
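One common way to quantify IC is an entropy-based measure: one minus the ratio of the entropy of the inheritance-vector distribution given the genotype data to its entropy without any genotype data. The sketch below assumes that definition and a uniform prior over inheritance vectors; the function name and the example distributions are hypothetical and do not reproduce MERLIN's internal code.

```python
import math


def information_content(posterior):
    """Entropy-based information content at one locus.

    posterior: probabilities of each possible inheritance vector given the
    genotype data (must sum to 1). The prior is assumed uniform, so the
    prior entropy is log2(number of vectors).
    """
    n = len(posterior)
    if n < 2:
        raise ValueError("need at least two inheritance vectors")
    entropy = -sum(p * math.log2(p) for p in posterior if p > 0)
    prior_entropy = math.log2(n)
    return 1.0 - entropy / prior_entropy


# Hypothetical distributions over four inheritance vectors
ic_none = information_content([0.25, 0.25, 0.25, 0.25])  # data say nothing
ic_half = information_content([0.5, 0.5, 0.0, 0.0])      # half the uncertainty resolved
ic_full = information_content([1.0, 0.0, 0.0, 0.0])      # inheritance fully determined
```

Under this definition IC ranges from 0 (genotypes carry no inheritance information) to 1 (inheritance is fully determined).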

The alcohol dependence phenotypes ALDX1 and ALDX2 include five categories: no information; pure unaffected; never drank; unaffected with some symptoms; affected. We treated "never drank" as "no information", and combined "pure unaffected" and "unaffected with some symptoms" as "unaffected." We performed nonparametric linkage (NPL) analysis based on the identity-by-descent (IBD) sharing among affected individuals in a pedigree. We used MERLIN to calculate NPLall [17] and the corresponding nonparametric LOD scores based on the linear model [18]. NPL scores are generally regarded as conservative [17, 18]. The nonparametric LOD scores give more accurate p-values and can be used to construct 1-LOD support intervals [9]. For this reason we used the nonparametric LOD scores in our analysis. We chose a point-wise significance level of 0.05 to report linkage regions.
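The recoding of the five phenotype categories can be written as a small lookup. This is a sketch only: the text labels stand in for whatever codes the data files use, and the output values follow the common linkage-format convention (0 = unknown, 1 = unaffected, 2 = affected), which is assumed here rather than taken from the study.

```python
# Assumed output convention: 0 = unknown, 1 = unaffected, 2 = affected
RECODE = {
    "no information": 0,
    "never drank": 0,                    # treated as no information
    "pure unaffected": 1,
    "unaffected with some symptoms": 1,  # merged into unaffected
    "affected": 2,
}


def recode_affection(label):
    """Map a five-category ALDX label to a three-level affection status."""
    return RECODE[label.strip().lower()]


# Hypothetical individuals
statuses = [recode_affection(x) for x in
            ["Affected", "never drank", "Unaffected with some symptoms"]]
```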

We conducted variance components analyses on the log transformed quantitative traits TTTH1 and ECB21, adjusting for age and sex. Heritability of the traits and the LOD scores at every marker locus were calculated by using MERLIN.

Computational implementation

MERLIN is a software package designed for dense genetic maps in pedigree data. It efficiently implements the Lander-Green algorithm [19] by using sparse binary trees to represent gene flow. We modified the tree structure in the source code of MERLIN and adjusted different compilation options to improve efficiency on highly dense maps and extensive pedigrees by 25% on a 32-bit UNIX machine and 50% on a 64-bit UNIX machine.

Results

Summary of the maps

The summary statistics for the data are shown in Table 1. Among the three data sets, the Affymetrix SNPs have the highest and most uniform density across the genome. Because the microsatellites are sparse, there is only a slight difference in density between the combined SNPs (Comb2) and the combined SNPs and microsatellites (Comb3).

Table 1 Summary of the maps

Information content

Microsatellites had the lowest mean and highest standard deviation (SD) of genome-wide IC due to their limited coverage of the genome (Table 2). With the densest map, the Affymetrix SNPs produced the highest mean and lowest SD, as well as the lowest interquartile range and the narrowest range of IC. This indicates a uniform and robust distribution of IC across the genome. The same trend remained for the combined map with microsatellites and SNPs (Comb3). The combined SNP data (Comb2) show IC similar to that of Comb3 (Table 3). These results show that marker density plays a key role in extracting inheritance information from the available genotype data.

Table 2 Summary of IC
Table 3 Comparison of IC and density for combined data

Linkage analysis with alcohol dependence phenotypes ALDX1 and ALDX2

Both ALDX1 and ALDX2 phenotypes showed significant linkage on chromosome 7 (96–106 cM) and chromosome 10 (149–176 cM) in all the datasets. In the SNP datasets, both phenotypes detected the same region on chromosome X (30–46 cM), although the LOD scores for ALDX2 were much less significant (Table 4, Table 5).

Table 4 Summary of linkage results for ALDX1
Table 5 Summary of linkage results for ALDX2

Our analyses also detected linkage regions unique to each phenotype. For ALDX1, linkage was detected on chromosomes 2, 7, 10, and 11 (Table 4) in all of the data sets. Additional linkage regions on chromosomes 1, 6, 9, 12, 13, 18, and X were detected in the SNP datasets. Consistent with previous literature [3, 4], the most significant linkage region in the combined data (Comb3) was located on chromosome 7 (Figure 1). In the Affymetrix panel, the maximum LOD score was 2.25 at 100.871 cM (Table 4). The 1-LOD intervals for this linkage region (Affymetrix: 6.551 cM; Illumina: 9.557 cM; microsatellites: 26.80 cM) revealed a substantial difference between microsatellites and SNPs. An even higher maximum LOD score of 2.52 at 101.049 cM, with a narrower 1-LOD interval of 3.647 cM (Figure 1), was achieved with the denser map in the combined dataset (Comb3). On chromosomes 1 and 2, we found linkage regions adjacent to the significant regions reported in [3, 4] in both SNP datasets. In contrast, we did not find any evidence of linkage to these two regions in the microsatellite data. Analyses using ALDX1 on selected chromosomes of the Comb2 and Comb3 data showed almost identical results.
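A 1-LOD support interval can be read off a LOD curve as the contiguous region around the peak in which the LOD score stays within one unit of the maximum. The sketch below is a simplified illustration that works on the grid of evaluated positions without interpolating between them; the function name and the example curve are hypothetical and are not values from this study.

```python
def one_lod_interval(positions, lods, drop=1.0):
    """Return (left, right, peak_position, peak_lod) for one LOD curve.

    positions: map positions in cM, sorted in increasing order.
    lods: LOD score at each position.
    The interval is the contiguous run of grid points around the peak
    whose LOD is at least (peak LOD - drop).
    """
    peak = max(range(len(lods)), key=lods.__getitem__)
    cutoff = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= cutoff:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= cutoff:
        right += 1
    return positions[left], positions[right], positions[peak], lods[peak]


# Hypothetical LOD curve evaluated every 2 cM
positions = [96, 98, 100, 102, 104, 106]
lods = [0.9, 1.6, 2.5, 2.0, 1.4, 0.7]
left, right, peak_pos, peak_lod = one_lod_interval(positions, lods)
width_cm = right - left
```

On a sparse grid the interval width is limited by the spacing of the evaluated positions, which is one reason denser maps tend to yield narrower support intervals.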

Figure 1

Linkage region on chromosome 7. LOD scores of the linkage region (83–119 cM) for microsatellite markers (pink), Affymetrix SNPs (blue), Illumina SNPs (green) and the combined data (red). Horizontal dashed line is the critical value at point-wise significance level of 0.05. Bars on the bottom indicate 1-LOD intervals for this region in microsatellites (pink), Affymetrix SNPs (blue), Illumina SNPs (green), and the combined data (red).

For ALDX2, linkage regions on chromosomes 1 (247–259 cM), 7 (96–106 cM), 10 (141–176 cM), and 17 (30–53 cM) were significant at the 0.05 level in all three datasets. Other significant regions on chromosomes 2, 3, 6, 7, 9, 10, and X were detected in the SNP datasets but were not present in the microsatellite data (Table 5).

Linkage analysis with EEG measures TTTH1 and ECB21

The kurtosis values of the log transformed TTTH1 and ECB21 are -0.13 and -0.44, respectively, indicating no significant deviation from the normal distribution. Therefore, these trait values were acceptable for use in variance components analysis.
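As an illustration of this check, the sketch below computes excess kurtosis (zero for a normal distribution) of log-transformed trait values. It uses the simple moment-based estimator m4/m2^2 - 3; which estimator was used for the values reported above is not stated, so this choice and the example data are assumptions.

```python
import math


def excess_kurtosis(values):
    """Moment-based excess kurtosis: m4 / m2**2 - 3 (0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0


# Hypothetical raw trait values, log-transformed before the check
raw = [1.2, 2.5, 3.1, 4.8, 7.4, 9.9, 15.3]
log_trait = [math.log(v) for v in raw]
k = excess_kurtosis(log_trait)
```

Variance components methods assume multivariate normality of the trait, and strongly leptokurtic traits are known to inflate LOD scores, which is why the transformation is checked before the analysis.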

The estimate of heritability, after adjusting for age and sex, is 35.17% for TTTH1. The most significant region on chromosome 7 (112–153 cM: LOD score 1.47 for Affymetrix SNPs, 2.01 for Illumina SNPs, and 2.44 for microsatellites) overlaps with the one reported by Jones et al. [7], but the heritability is lower. The difference in heritability values could be due to different sample structure or different algorithms (Jones et al. [7] performed the variance components analyses using the t-distribution option of SOLAR) used in these two studies.

The estimate of heritability for ECB21 is 55.54% after adjusting for age and sex. One of the linkage regions on chromosome 4 (58–79 cM: LOD score 1.10 for Affymetrix, 1.25 for Illumina, and 1.40 for microsatellites) overlaps with the highly significant linkage region found in previous studies [5, 6]. The most significant region is on chromosome 6 (169–185 cM: LOD score 2.18 for Affymetrix, 2.11 for Illumina, and 0.71 for microsatellites).

Discussion

Based on the COGA data provided to participants at GAW14, we have presented an NPL analysis for the alcohol dependence phenotypes ALDX1 and ALDX2, and a variance components analysis for the EEG measures TTTH1 and ECB21. Our results confirmed some of the linkage findings in previous studies [3–7]. The increased density of the SNP data extends the number of regions detected and increases the resolution of the linkage results.

In our study, we used a p-value of 0.05 as the significance level to report linkage regions. However, in order to define true linkage and explain the inconsistencies among the results of different datasets, it is important to choose a level of genome-wide significance. Commonly used resampling-based and gene-drop simulation approaches are computationally intensive and do not lend themselves to the analysis of the large amount of data in this study. We are investigating a more efficient Monte Carlo procedure to assess genome-wide significance in linkage analysis [20].
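To make the point-wise threshold concrete, the sketch below converts between a LOD score and a point-wise p-value. It assumes the usual asymptotic relation in which 2 ln(10) × LOD follows a 50:50 mixture of a point mass at zero and a chi-square distribution with 1 degree of freedom; the function names are hypothetical, and a genome-wide threshold would require a multiple-testing correction that this sketch does not attempt.

```python
import math
from statistics import NormalDist


def lod_to_pointwise_p(lod):
    """One-sided point-wise p-value for a LOD score (asymptotic approximation)."""
    lod = max(lod, 0.0)  # non-positive LOD scores carry no evidence for linkage
    # 0.5 * P(chi2_1 > 2 ln(10) * LOD) = 0.5 * erfc(sqrt(LOD * ln 10))
    return 0.5 * math.erfc(math.sqrt(lod * math.log(10)))


def pointwise_lod_threshold(alpha):
    """LOD score whose point-wise p-value equals alpha (0 < alpha < 0.5)."""
    z = NormalDist().inv_cdf(1.0 - alpha)
    return z * z / (2.0 * math.log(10))


threshold_05 = pointwise_lod_threshold(0.05)   # critical LOD at point-wise 0.05
p_at_threshold = lod_to_pointwise_p(threshold_05)
```

Under this approximation the point-wise 0.05 level corresponds to a LOD score of roughly 0.59, far below conventional genome-wide criteria, which illustrates why a genome-wide significance level is needed to separate true linkage from chance findings.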

Our results show that a denser map can be more powerful for linkage analysis. However, linkage analysis algorithms based on IBD sharing usually assume linkage equilibrium between markers, and strong linkage disequilibrium between closely adjacent markers could potentially introduce false linkage results [21, 22]. A study using Affymetrix HMA10K arrays indicated that there was no substantial difference in the results whether SNPs in linkage disequilibrium were retained or removed [15].

This study represents an extensive performance comparison of three different platforms (microsatellite markers, the Affymetrix HMA10K Array, and the Illumina Linkage III Panel) in a series of linkage analyses for alcoholism. The high density and robust performance of SNPs make the SNP-based whole-genome scan a desirable approach for linkage analysis. This new approach may bring renewed power to linkage analysis based on IBD sharing.