Background

With over 4 million individuals affected by Alzheimer's disease (AD) in the U.S., dementia of the Alzheimer's type (DAT) is the leading cause of dementia in the elderly. These estimates are projected to triple over the next 50 years as the population ages [1–3]. AD has a complex etiology with strong genetic and environmental determinants. Substantial evidence suggests the involvement of at least three genes in early-onset autosomal dominant AD: amyloid precursor protein (APP on chromosome 21) [4, 5], presenilin I (PSEN1 on chromosome 14) [6–9], and presenilin II (PSEN2 on chromosome 1) [10, 11]. Understanding of the more common late-onset Alzheimer's disease (LOAD) is centered on the role of one universally accepted risk gene, the apolipoprotein E locus (APOE) [12]. The APOE ε4 allele (frequency approximately 16%) [13, 14] acts in a dose-related manner to increase risk for LOAD and decrease age-of-onset [15, 16]. Although its involvement is without question, APOE accounts for less than half of late-onset AD susceptibility [15]. Given the strong heritability of AD, other genetic factors are likely to be involved. Multiple linkage screens have been conducted to identify additional regions harboring susceptibility genes for late-onset AD [17–35]. While regions on chromosomes 9, 10 and 12 are most consistently identified, candidate genes within those regions have yet to be clearly implicated in AD. Meanwhile, numerous other regions have been implicated but have not been the focus of detailed study due to the prominence of chromosomes 9, 10, and 12.

Though numerous promising LOAD candidate genes have been examined, the lack of replication across studies has made a definitive declaration of their involvement difficult (reviewed in [36, 37]). Genetic heterogeneity is likely to be one of the underlying reasons for this lack of replication. Given this, one possible solution is to study populations likely to be more genetically homogeneous, thereby enriching for a more homogeneous set of risk alleles. The North American Amish population is a relatively isolated, genetically well-defined, homogeneous population that is well suited for this type of study. Further detail regarding the establishment of the North American Amish population has been described elsewhere [38–43]. While there may be a number of LOAD susceptibility genes contributing to disease in the general population, the relatively homogeneous Amish population is likely to contain a smaller set of risk alleles.

One challenge in performing linkage analysis in Amish pedigrees is to utilize the extensive pedigree information available while maintaining tractability of the computations. Due to their strong religious and cultural beliefs, the Amish very rarely marry outside of their communities, thereby promoting a genetically isolated population [38–41]. This in turn has led to an elevated degree of consanguinity, yielding family pedigrees that contain many loops that can often be traced back three or more generations. In fact, through use of the Anabaptist Genealogy Database (AGDB), we find that 93% of our overall study population of 460 individuals, and more specifically 25/27 (93%) of the individuals used in this study, can be traced back 10 generations to a single founding couple. Moreover, 100% of individuals within our entire Amish sample (460) belong to one very large extended pedigree when parent-child and marriage links are included [44]. Since LOAD cannot be ascertained until late in life, affected individuals are usually available for genotyping in only a single generation. Hence, the great majority of individuals in the pedigree have unknown phenotype and genotype status. Few methodologies are available to analyze disease gene linkage in such large, complex pedigrees. One such method is SimWalk2, which utilizes descent graph theory and Markov chain Monte Carlo (MCMC) simulation to compute lod scores [45]. This is a computationally demanding process and, because of the uncertainty of MCMC convergence, the accuracy of the scores obtained may be difficult to assess.

Combinatorial mismatch scanning (CMS) is an alternative technique to search for identity-by-state (IBS) sharing in distantly related individuals from isolated founder populations where standard linkage and association analyses are difficult to implement. While several other methods could be implemented, this approach was used because of its simplicity in examining existing data. This method was also chosen because, at the onset of analysis, we lacked the more detailed knowledge of the inter-relatedness of our sample that is often required to perform similar, but more sophisticated, approaches within large inbred pedigrees [46–48]. This strategy is designed to circumvent the confounding issue of genetic heterogeneity by examining affected and unaffected persons from relatively small founder populations [49]. By genetically evaluating case and control individuals who are selected from such a population and are unrelated through the grandparental generation, some prevailing problems in allelic association studies of complex disease within generally outbred populations can be avoided. Population stratification can lead to allelic association and be misinterpreted as linkage disequilibrium. In this approach, population stratification is less of an issue due to the relative isolation and common heritage of the study population. Another difficulty facing genetic studies within large outbred populations is that these populations are likely to exhibit locus heterogeneity. Within an isolated population, the likelihood that a risk allele of interest entered the gene pool only once, or rarely, provides a great advantage. This in turn will likely facilitate the distinction between true and spurious association. Heath and colleagues highlight another advantage of examining isolated populations by alluding to the potential for detecting gene-gene interactions [49]. These epistatic interactions may play a substantial role in complex disease, effectively hampering the ability to detect association using single-locus methods within heterogeneous populations. With reference to this problem, there are good reasons to believe that epistatic (gene-gene) interactions are ubiquitous in complex disease and may in fact be more important than single-gene effects [50].

Methods

Subjects and phenotypes

The Amish often have large sibships and extensive pedigree records that permit IBS gene sharing to be accurately evaluated. The estimated coefficient of inbreeding for the entire population is 0.0151, which is approximately equivalent to having second cousins as parents [51]. As a result, present-day Amish possess genes inherited identically from a common ancestor at rates higher than those observed in the general population. By searching the Anabaptist Genealogy Database (AGDB) with the query software PedHunter, we have determined the level of relatedness of our sample more precisely [44, 52]. We calculated the average kinship coefficient for our overall ascertained Amish sample to be 0.019 ± 0.00053 (mean ± SEM). This value differs significantly from the average kinship coefficients calculated for the within cases group (0.011 ± 0.0013, mean ± SEM), the within controls group (0.0094 ± 0.0011, mean ± SEM), and the between cases and controls group (0.010 ± 0.00065, mean ± SEM). These calculations provide us with additional confidence that our cases and controls are more distantly related to each other than are individuals in our overall sample population.
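The "second cousins as parents" comparison can be checked with the standard path-counting formula for kinship. The sketch below is only an idealized illustration: it assumes non-inbred common ancestors and a single line of descent to each of them, which does not hold for the looped Amish pedigrees described above. The function name is hypothetical.

```python
from fractions import Fraction


def kinship_via_common_ancestors(paths):
    """Kinship coefficient from (n1, n2) generation counts to each shared ancestor.

    Assumes every common ancestor is non-inbred and is reached by one path only.
    """
    return sum(Fraction(1, 2) ** (n1 + n2 + 1) for n1, n2 in paths)


# Second cousins share two great-grandparents, each three generations above both.
second_cousins = kinship_via_common_ancestors([(3, 3), (3, 3)])
# First cousins share two grandparents, each two generations above both.
first_cousins = kinship_via_common_ancestors([(2, 2), (2, 2)])

# The inbreeding coefficient of a child equals the kinship coefficient of its parents.
print(float(second_cousins))  # 0.015625
print(float(first_cousins))   # 0.0625
```

Under these assumptions the offspring of second cousins have an inbreeding coefficient of 1/64 (about 0.0156), close to the population estimate of 0.0151 quoted above.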

The subjects included in this study are a subset of individuals described in extensive detail elsewhere [43]. Briefly, each individual enrolled in the study was assigned to one of three clinical impression categories: dementia (probable or possible Alzheimer's disease), unclear (includes mild cognitive impairment (MCI)), or unaffected (cognitively normal). Participants were administered the Mini-Mental State Exam (MMSE) [53], with possible scores ranging from 0 to 30. All individuals scoring 27 or greater were classified as cognitively normal/unaffected. Those scoring 23 or less were classified as cognitively impaired and labeled as probable dementia. Those who scored 24–26 had additional neuropsychological testing including the Dementia Rating Scale (DRS) [54], the Boston Naming Test (BNT) [55], and a reading subtest from the Wide Range Achievement Test-Revised (WRAT-R) [56]. Persons were categorized as having mild cognitive impairment if their DRS score fell below an age-adjusted threshold. Each case was discussed and a consensus "final" diagnosis was determined using all available information. For analytical purposes, the cases were classified as affected (demented), unclear (includes MCI), and unaffected (cognitively normal).

Five Amish pedigrees were included in this study. Three families were from Elkhart and LaGrange Counties in Indiana, one extended family was from Adams County, Indiana, and one extended family was from Holmes County, Ohio. The extended pedigree from Adams County has been the subject of other previous and ongoing studies of dementia in the Amish [16, 57]. Among the 115 individuals who were genotyped, 40 were classified as having dementia, 9 were classified as unclear, and 66 individuals were unaffected. To minimize chance IBS inheritance, individuals selected for the CMS analysis were unrelated through the grandparental generation [49]. For the current study we identified 13 dementia cases and 14 cognitively normal individuals who met this requirement. This study was undertaken after Institutional Review Board review and approval.

Molecular analysis

Following informed consent, blood samples were collected from each individual and genomic DNA was extracted from blood using standard procedures. Cell lines have been initiated on most sampled individuals. All DNA samples were coded and stored at 4°C until used.

Markers were genotyped at both the Vanderbilt and Duke laboratories for all DNA samples. Laboratory personnel were blinded to pedigree structure, affection status, and location of quality control samples. Duplicate quality control samples were placed both within and across plates and equivalent genotypes were required for all quality control samples to ensure accurate genotyping. At the Vanderbilt laboratory, marker primer sequences were obtained from the Genome Database [58] or designed with Primer3 software [59] and synthesized by Invitrogen Life Technologies (Carlsbad, CA). Amplification was performed in a PCR Express machine (ThermoHybaid, Needham Heights, MA) with the following conditions: 94°C-4 min.; 94°C-15 sec., AT-30 sec., 72°C-45 sec. (35 cycles); 72°C-4 min. PCR products were denatured for 3 min. at 95°C and run on a 6% polyacrylamide gel (Sequagel-6® from National Diagnostics, Atlanta, GA) for ~1 hr. at 75 W. Gels were stained with a SybrGold® rinse (Molecular Probes, Eugene, OR) and scanned with the Hitachi Biosystems FMBIOII laser scanner (Brisbane, CA). Marker genotyping at the Duke laboratory was performed using fluorescence imaging (Molecular Dynamics SI Fluorimager) and a semi-automated allele calling system [60].

Hardy-Weinberg equilibrium calculations were performed for each marker and Mendelian inconsistencies were identified using PedCheck in the overall dataset [61]. Suspect genotypes were re-read by a different technician or re-run as necessary to reduce errors. All microsatellite markers were required to have >90% of possible genotypes to be included in the analysis.
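The genotyping-completeness criterion can be expressed as a simple filter. The following is a minimal sketch under stated assumptions: the data layout (a dictionary of per-individual genotype tuples, with `None` for a failed call), the marker names, and both function names are hypothetical and are not taken from the study's pipeline.

```python
def call_rate(genotypes):
    """Fraction of individuals with a non-missing genotype (None marks a failed call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def passing_markers(genotype_table, min_rate=0.90):
    """Keep markers whose call rate is strictly greater than min_rate."""
    return [m for m, calls in genotype_table.items() if call_rate(calls) > min_rate]


# Toy data, invented for illustration only.
toy = {
    "marker_A": [(1, 2)] * 10,            # 10/10 called
    "marker_B": [(1, 1)] * 9 + [None],    # 9/10 = 0.90, fails the strict > 0.90 cut
    "marker_C": [(2, 3)] * 19 + [None],   # 19/20 = 0.95
}
print(passing_markers(toy))  # ['marker_A', 'marker_C']
```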

Statistical analysis

Comparisons between case and control genotype data for the 407 microsatellite markers were conducted using the R software package to perform Fisher's exact tests in r × c contingency tables [62–64]. Each marker was examined for both allele and genotype differences between individuals affected with dementia and those without dementia. The first test was for IBS allele frequency inequality between the two classes, and the second test was for IBS genotype frequency inequality between the two classes. Fisher's exact test computes the probability p that a pattern of alleles at least as extreme as that observed in the sample would be obtained if there were truly no difference between the allele frequencies among affected and unaffected individuals. While our current sample size is adequate to detect moderate to major effects (odds ratio of >6 with 80% power), smaller effects may still be detectable, because these power calculations assume complete independence of samples and random sampling of the population, neither of which holds here.
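To make the test concrete, the sketch below computes a two-sided exact p-value for a 2 × c table (cases and controls by allele or genotype) by enumerating every table with the same margins. It is an illustrative pure-Python version, not the R routine used in the study; the function name is hypothetical and the counts are invented. Integer weights are used so that the "no more probable than observed" comparison needs no floating-point tolerance.

```python
from math import comb


def fisher_exact_2xc(cases, controls):
    """Two-sided exact p-value for a 2 x c table with fixed row and column totals.

    cases[j] and controls[j] are the counts in column j (one allele or genotype).
    """
    col_totals = [a + b for a, b in zip(cases, controls)]
    n_cases = sum(cases)
    denom = comb(sum(col_totals), n_cases)
    # tail[j] = number of observations available in columns j, j+1, ...
    tail = [sum(col_totals[j:]) for j in range(len(col_totals) + 1)]

    def weights(j, remaining, weight):
        # Yield the unnormalized probability of every admissible case row.
        if j == len(col_totals):
            yield weight
            return
        lo = max(0, remaining - tail[j + 1])
        hi = min(col_totals[j], remaining)
        for a in range(lo, hi + 1):
            yield from weights(j + 1, remaining - a, weight * comb(col_totals[j], a))

    observed = 1
    for total, a in zip(col_totals, cases):
        observed *= comb(total, a)

    # Sum the probabilities of all tables no more probable than the observed one.
    return sum(w for w in weights(0, n_cases, 1) if w <= observed) / denom


# Invented counts for three alleles in cases and controls.
print(fisher_exact_2xc([8, 2, 0], [3, 5, 4]))
```

Full enumeration is practical here because the tables are small (27 subjects, 54 alleles); larger tables would call for a network algorithm or Monte Carlo sampling instead.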

To empirically evaluate the statistical significance of the p-values computed in the CMS study, we permuted our dataset. We randomly re-assigned affection status for each of the 27 individuals, maintaining the original total of 13 cases and 14 controls. We then executed Fisher's exact test using the same allele and genotype data as in the original dataset for each of the 407 markers. The distribution of p-values obtained from Fisher's exact testing on 1000 randomized sets of data was then created for both the allele and genotype comparisons to assess the empiric thresholds. If the assumptions of the test held, we would expect each nominal Fisher's exact p-value to match the corresponding p-value within this large permutation distribution. These permutations were needed to correct for any residual bias from unrecognized kinship correlation.
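The label-permutation idea can be sketched for a single marker as follows. This is an assumption-laden illustration rather than the study's code: a Pearson chi-square statistic on allele counts stands in for the Fisher's exact p-value to keep the example short, the add-one estimator is a common convention that the text does not specify, and the genotypes are invented. Affection status is shuffled per individual, so both alleles of a person always move together and the 13/14 split is preserved.

```python
import random
from collections import Counter


def allele_chi2(genotypes, labels):
    """Pearson chi-square on the 2 x c allele-count table (stand-in statistic)."""
    counts = {0: Counter(), 1: Counter()}
    for (a1, a2), lab in zip(genotypes, labels):
        counts[lab].update((a1, a2))
    row_totals = {lab: sum(counts[lab].values()) for lab in (0, 1)}
    total = row_totals[0] + row_totals[1]
    stat = 0.0
    for allele in set(counts[0]) | set(counts[1]):
        col = counts[0][allele] + counts[1][allele]
        for lab in (0, 1):
            expected = col * row_totals[lab] / total
            stat += (counts[lab][allele] - expected) ** 2 / expected
    return stat


def empirical_p(genotypes, labels, n_perm=1000, seed=0):
    """Share of label permutations whose statistic is at least the observed one."""
    rng = random.Random(seed)
    observed = allele_chi2(genotypes, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # case and control totals stay fixed
        if allele_chi2(genotypes, shuffled) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction so p is never zero


# Invented, perfectly separated toy marker: 13 cases and 14 controls.
genotypes = [(1, 1)] * 13 + [(2, 2)] * 14
labels = [1] * 13 + [0] * 14
print(empirical_p(genotypes, labels))
```

Repeating this over all markers, with the test actually used in the analysis, yields the permutation distribution against which nominal p-values are compared.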

Results

We tested 407 microsatellite markers for differences in both allele frequency and genotype frequency between Amish dementia cases and controls. We considered all pointwise p-values and have chosen to report only markers demonstrating Fisher's exact p-values < 0.05 for either allele or genotype frequency differences. This arbitrary threshold was chosen to limit the results to be displayed and to provide a reference point for discussion of markers demonstrating nominally significant (albeit within the null expectation given the number of markers examined) evidence of association to dementia within our population.

As an example, Table 1a shows the allele count data for marker D5S211. There are eight D5S211 alleles in the sample with 27 subjects being typed for 54 alleles. In the example, the probability of these data being obtained if there were no underlying difference between the allele distributions for the two classes is < 0.005 (Table 2). Table 1b shows the genotype data for marker D5S211. Note that out of the thirteen genotypes observed in the data only one genotype is present in both affected and unaffected classes. The probability of the data being obtained if there were no underlying difference between the genotype distributions for the cases and controls is < 0.005 (Table 2). Markers demonstrating nominally significant (p ≤ 0.05) differences between cases and controls are listed in Table 2. There were 24 out of 407 markers (6%) demonstrating significant differences in allele frequency. The most significant markers were D3S1262, D5S211, and D19S1165. When examining the markers for genotype frequency differences, 21 out of 407 markers (5%) were significantly different between our dementia cases and controls. The most significant markers were D5S1480 and D5S211. While there were eight markers (D3S1262, D4S1625, D5S211, D6S1031, D8S1477, D8S272, D17S921, and D18S481) with p-values ≤ 0.05 for both the allelic and genotypic tests, only one marker (D5S211) was significant at the empiric p ≤ 0.005 level for both tests. Although all findings are uncorrected and there are no findings with genome-wide significance, markers in close proximity to those regions previously identified are of particular interest for future study.
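The statement that the number of nominally significant markers is within the null expectation can be illustrated with a short calculation. The sketch assumes that the 407 tests are independent and that each has exactly a 5% chance of a nominal hit under the null hypothesis; both are simplifications, since neighbouring markers are correlated and exact tests on small samples are discrete. The helper name is hypothetical.

```python
from math import comb


def binom_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n_markers, alpha = 407, 0.05

# Expected number of markers with p < 0.05 by chance alone.
expected_hits = n_markers * alpha
print(round(expected_hits, 2))  # 20.35

# Chance of seeing at least the observed counts if no marker is truly associated.
print(binom_tail(n_markers, 24, alpha))  # allele test: 24 markers observed
print(binom_tail(n_markers, 21, alpha))  # genotype test: 21 markers observed
```

With roughly 20 hits expected by chance, the observed counts of 24 and 21 do not by themselves indicate an excess of association signals.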

Table 1 Comparison of Allele and Genotype frequencies for D5S211 in dementia cases and controls
Table 2 Microsatellite markers demonstrating nominally significant (p ≤ 0.05) empiric p-values for allele and genotype frequency differences between dementia cases and controls. Microsatellite markers in close proximity to those demonstrating significance in this study and found to be either linked (lod ≥ 1) or associated (p ≤ 0.05) in previous studies are also listed.

Discussion

We have detected a few microsatellite markers of particular interest, which demonstrate significant differences between dementia cases and controls within our Amish founder population using the combinatorial mismatch scanning approach. The CMS concept is based on excess IBS allele/genotype sharing between individuals sharing a distant set of common founders [49]. The most noteworthy finding is on chromosome 5q35.2 at approximately 183 cM where we find evidence for both allele and genotype differences between our dementia cases and controls for marker D5S211. In their large genome-wide linkage study of Alzheimer's disease, Blacker et al. detected a multipoint lod score of 1.3 at this same marker [32]. In a recent study of consanguineous Israeli-Arab communities, Farrer and colleagues found significant evidence for allele frequency differences between AD cases and controls at the closest marker (D5S400 at 175 cM) on chromosome 5 run in their study [33]. Positive findings across three distinct study populations suggest that a gene or genes within this region of chromosome 5 may be involved in risk for dementia of the Alzheimer's type (DAT). Thus future examination of this region on chromosome 5 within our Amish families is warranted. While another marker on chromosome 5q31.3 (D5S1480 at 147 cM) demonstrated genotype differences between dementia cases and controls, this location is novel with respect to other previous studies.

We also found evidence of allele frequency differences on chromosome 3q27.3 at D3S1262 (201 cM). One study tested for association with AD in a geographically distinct Finnish population descended from a small group of original founders [65]. This group found significant association (empiric p = 0.007) at marker D3S1602 (also located at 201 cM) within their AD sample. An interesting candidate gene at this location is SST, the gene encoding somatostatin, which functions as a neurotransmitter in the central nervous system. Somatostatin inhibits the release of glucagon, growth hormone, gastrin, insulin, and secretin. Additional evidence for this region stems from our genome-wide linkage study within the Amish population, for which we observed a suggestive two-point lod score of 2.42 at the nearby marker D3S2398 (209 cM) [43].

An additional marker demonstrating suggestive allele frequency differences is located on chromosome 19p13.2 (D19S1165 at 36 cM). Hiltunen et al. had detected evidence for association at two nearby markers (D19S1034 and D19S433) spanning the region containing our significant results [65]. ICAM-1 (Intercellular Adhesion Molecule 1), a previously-reported AD candidate gene, also lies within this region of interest. Pola et al. showed that the ICAM-1 K469E gene polymorphism was associated with AD in an Italian population [66]. This association was not, however, supported in studies of the gene in Finnish and Spanish populations [67, 68]. Additional strong evidence from previous work indicates the presence of a late-onset AD locus within this region. A study by Wijsman et al. provides substantial evidence for a locus at approximately 35 cM affecting AD age at onset [69]. While our study does not address age at onset, it further suggests the involvement of this region in AD.

Another region of relative interest is on chromosome 4q31.2 at marker D4S1625. This marker located on chromosome 4q at approximately 146 cM lies between two markers (D4S2394 at 130 cM and D4S1548 at 154 cM) demonstrating highly suggestive evidence for linkage within our Amish population [43]. Further evidence for this region stems from work by Pericak-Vance et al. where they detect modest evidence for linkage to a marker only 4 cM away (D4S1629, lod = 1.32) from D4S1625 (Table 2) [26].

Given that our data may violate assumptions (i.e. normality and/or unrealized correlation) of the Fisher's exact test, we determined the empiric p-value for our results through permutation. We performed the Fisher's exact test on 1000 replicates containing the same original genotype data, but with randomized affection status. The resulting distribution of p-values was then used as an empiric measure of significance for our results (Table 2). On the whole, the empiric p-value thresholds for our study showed the Fisher's exact p-value to be somewhat more liberal than expected.

We have previously performed a genome-wide linkage screen for dementia within this population; however, the complex nature of the Amish pedigrees provides a challenge for linkage analysis, given the size and number of consanguineous loops within these extended families. Accordingly, linkage analysis by itself does not take full advantage of the data available to us. To examine our data more thoroughly, we performed the combinatorial mismatch scan. Both this approach and the linkage analysis use the high level of inter-relatedness within the Amish population to their advantage. The nature of the CMS analysis allowed us to examine these data without being computationally burdened by the size or family structure of our population. These two methods complement each other by allowing the examination of the same data using both a family-based approach and a "pseudo" case-control approach to identify regions across the genome that are potentially involved in AD susceptibility. We are fully aware of the limited power of our current sample; however, these analyses should be viewed as an adjunct to our recent genomic screen.

Conclusion

We have reported several markers across the genome (on chromosomes 3, 4, 5, and 19) that show nominally significant allelic and/or genotypic frequency differences between dementia cases and controls within the combined Amish communities of Ohio and Indiana. While the evidence presented here is not overwhelming for any specific region, these results must be viewed in conjunction not only with our genomic screen but also with findings from other studies in additional populations. In conclusion, our results provide the groundwork for future detailed study of these regions within our growing sample of Amish individuals.