Background

Age-related neurological diseases such as stroke and dementia represent a substantial population burden; one in three persons will develop either stroke or dementia in their lifetime [1]. Twin studies suggest that 37–78% of the variance in the age of onset of Alzheimer's disease (AD), the most common cause of dementia in the elderly, can be attributed to additive genetic effects [2, 3]. Conversely, cognitively healthy aging also has a substantial genetic basis [4]. Finally, ischemic stroke [5–7] and vascular cognitive impairment are also heritable [8]. However, surprisingly few genes have been identified that determine the risk of developing stroke (PDE4D, ALOX5AP) [9–11] or Alzheimer's disease (APOE4) [12] in the community as a whole, that is, in persons who are not members of families with autosomal dominant, early-onset disease. One reason may be that studies to date have been underpowered to detect small effects. Two additional challenges to a more complete understanding of the genetic basis of these aging-related brain diseases have been the late phenotypic manifestation of these conditions and their complex, polygenic mode of inheritance. Multiple genes interacting with each other and with environmental factors likely create a complex gradient of susceptibility to disease. We hypothesized that studying the genetic basis of the gradient of susceptibility underlying AD and stroke, using endophenotypes, would provide insights into the genetics of these late-onset neurological diseases. Endophenotypes (or intermediate phenotypes) are heritable traits that reveal the actions of genes predisposing an individual to a disease, and they often manifest years before clinical and pathological diagnostic criteria for the disease are met.

Volumetric brain MRI and comprehensive cognitive testing have been used to define heritable, reproducible, quantitative endophenotypes that in turn relate to the risk of developing dementia or stroke [13–19]. Twin studies have demonstrated substantial heritability of these endophenotypes [20]. The recent availability of high-throughput platforms permits genome-wide association studies (GWAS) that incorporate a more comprehensive and unbiased approach to detect genes with modest phenotypic effects. We present the results of a GWAS of structural and functional phenotypes previously associated with cellular and vascular brain aging.

Methods

Study sample

The study design, selection criteria and participant demographics of the Framingham Original and Offspring cohorts have been detailed in prior publications [21, 22]. A total of 1345 persons, who were members of the 330 largest families across these two cohorts, underwent genotyping using the Affymetrix GeneChip Human Mapping 100K single nucleotide polymorphism (SNP) set. The Overview provides details of this sample [23]. The study sample for the current analyses comprised 705 stroke- and dementia-free Framingham Study participants who were genotyped and had undergone volumetric brain MRI and/or cognitive testing between 1999 and 2002. Of the 1345 eligible persons who were genotyped, 508 were excluded because they died before their 7th Offspring examination, did not attend this examination, or declined or were unable to complete MRI or cognitive testing; 12 were excluded for prevalent stroke at the time of MRI and cognitive testing; and 11 were excluded for neurological diseases, such as multiple sclerosis or brain tumor, that could affect study phenotypes. All participants were screened for dementia at the time of MRI, but none required exclusion. Nine further individuals were excluded because covariate information was not available. This study was approved by the Institutional Review Board of Boston University Medical Center; all participants provided written informed consent, including consent for genetic studies.

Phenotype definition

The list of study phenotypes is shown in Column 1 of Table 1.

Table 1 Structural (Volumetric MRI) and Functional (Cognitive Testing) Brain Aging Phenotypes

Volumetric brain MRI

Details of brain MRI acquisition parameters, blinded image analysis, definition of brain volumes (indexed for cranial cavity size) and the mean and standard deviation (SD) values for these measures in the larger sample of all Framingham subjects (n = 2259) who underwent brain MRI have been published previously [14, 15, 24–27]. Mean and SD values and heritability estimates for each of these parameters in the current study sample are available online at http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. Digital information from the MRI scans was transferred to a central laboratory directed by one of the authors (C.D.) for processing and analysis. Analysis was done blind to the subjects' genotype, demographic and vascular risk factor data. Analyses were done using semi-automated measurements of pixel distributions based on mathematical modeling of MRI pixel intensity histograms for cerebrospinal fluid and brain matter (white matter and gray matter) to determine the optimal pixel intensity threshold that distinguished cerebrospinal fluid (CSF) from brain matter. Brain volume was determined in coronal sections by manually outlining the intracranial vault above the tentorium to determine the total cranial volume (TCV). Next, the skull and other non-brain tissues were removed from the image, followed by mathematical modeling to determine total brain volume (TBV). TBV included the supratentorial gray and white matter and excluded the CSF. We used the ratio of TBV to TCV (Total Cerebral Brain Volume, TCBV) as a measure of brain volume to correct for differences in head size. Regional brain volumes were measured as the sum of the segmented right and left lobar volumes for that region indexed to the intracranial volume; frontal (FBV), parietal (PBV), occipital (OBV) and temporal (TBV) lobar brain volumes and the regional brain volume of the hippocampus (based on hand-drawn outlines) were assessed.
Two measures of ventricular volume were used: the lateral ventricular volume and the temporal horn volume, each of which was measured as the sum of the volumes for the two sides, log-normalized and indexed to TCV. Finally, the white matter hyperintensity volume was measured as a z-score, within 10-year age- and sex-specific categories, of the logarithmically transformed continuous variable (WMH). All analyses were performed using a custom-designed image analysis package, QUANTA 6.2, operating on a Sun Microsystems (Santa Clara, CA) Ultra 5 workstation. Inter-rater reliabilities ranged between 0.90 and 0.94 for TCV, TCBV, regional brain and ventricular volumes and white matter hyperintensities, and intra-rater reliabilities averaged 0.98 across all measures.

Cognitive measures

Subjects were administered a neuropsychological test battery by trained examiners using standard administration protocols. Details of the tests administered and normative values for the Framingham Original and Offspring cohorts have been previously published [13, 28]. Since individual cognitive tests are scored on different scales and scores are known to be associated with age and sex, we transformed the variables, separately by sex, to obtain variables that are comparable across tests. First, natural logarithmic transformations were applied to normalize raw scores that had a skewed distribution. Next, each variable was regressed on age, and the residuals from these regressions were standardized using a z-score transformation. The resulting standardized cognitive test scores were then either summed to create 3 factors, each characterizing a specific cognitive domain (verbal memory, Factor 1 [F1]; visuospatial memory and organization, Factor 2 [F2]; and attention and executive function, Factor 3 [F3]), or used individually (Similarities [Sim], Boston Naming Test [BNT] and Wide Range Achievement Test [WRAT]). Details of the test sources and parameters used to define each individual test and factor are outlined in Additional data file 1, table 1.
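The transformation described above can be sketched as follows. This is a minimal illustration in Python with NumPy, not the Framingham analysis code: the function names are hypothetical, the age adjustment is assumed to be an ordinary least-squares fit with an intercept and a linear age term, and the assignment of individual tests to F1–F3 (given in Additional data file 1) is left to the caller.

```python
import numpy as np

def standardize_within_sex(raw, age, sex, log_transform=False):
    """Age-adjusted z-scores for one cognitive test, computed within each sex.

    Hypothetical helper. Scores are optionally log-transformed (this needs
    strictly positive values), regressed on age by ordinary least squares
    separately in each sex, and the residuals are scaled to mean 0, SD 1.
    """
    x = np.asarray(raw, dtype=float)
    age = np.asarray(age, dtype=float)
    sex = np.asarray(sex)
    if log_transform:
        if np.any(x <= 0):
            raise ValueError("log transform requires positive scores")
        x = np.log(x)
    z = np.empty_like(x)
    for s in np.unique(sex):
        m = sex == s
        # Design matrix: intercept plus a linear age term (an assumption).
        design = np.column_stack([np.ones(m.sum()), age[m]])
        coef, *_ = np.linalg.lstsq(design, x[m], rcond=None)
        resid = x[m] - design @ coef
        z[m] = (resid - resid.mean()) / resid.std(ddof=1)
    return z

def factor_score(z_scores):
    """Sum a list of standardized test scores into one domain factor.

    Assumes every test is oriented so that higher means better performance.
    """
    return np.column_stack(z_scores).sum(axis=1)
```

Because the residuals are computed within each sex, the resulting z-scores have mean 0 and SD 1 in men and in women separately, and they carry no linear association with age.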

Genotyping

The Overview [23] describes the Affymetrix 100K SNP GeneChip genotyping http://gmed.bu.edu/about/genotyping.html and the Marshfield short-tandem repeat genotyping performed by the Mammalian Genotyping Service http://research.marshfieldclinic.org/genetics. Only the SNP data were used for GWA studies whereas both SNP and STR data were combined for linkage analyses.

Statistical analysis

As detailed in the Overview [23], we used linear models adjusting for first-degree relationships via generalized estimating equations (GEE), and family-based association tests (FBAT). All tests used additive genetic models to relate qualifying SNPs to multivariable-adjusted residuals of the 9 MRI measures and the 6 cognitive factors/tests described earlier. Qualifying SNPs (n = 70,897) were defined as autosomal SNPs with a genotypic call rate ≥ 80%, a minor allele frequency ≥ 10% and a Hardy-Weinberg equilibrium p ≥ 0.001. Additionally, FBAT analyses required ≥ 10 informative families. For the linkage analyses, we used Merlin software to compute multipoint identity-by-descent using 10,592 informative SNPs and 613 short tandem repeats selected to minimize linkage disequilibrium (LD) [29, 30]; we then used maximum likelihood variance component analyses in SOLAR to compute LOD scores as a measure of linkage [31].
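The SNP qualification criteria can be expressed as a simple per-SNP filter. The sketch below assumes genotypes coded as 0/1/2 copies of the coded allele, with missing calls stored as NaN, and uses a one-degree-of-freedom chi-square goodness-of-fit test for Hardy-Weinberg equilibrium; the Methods do not state which HWE test was used, and an exact test is a common alternative. The FBAT requirement of ≥ 10 informative families depends on pedigree structure and is not modeled here.

```python
import math
import numpy as np

def hwe_chisq_p(n0, n1, n2):
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium.

    n0, n1, n2: counts of genotypes carrying 0, 1 and 2 copies of the
    coded allele. Using chi-square rather than an exact test is an assumption.
    """
    n = n0 + n1 + n2
    f = (n1 + 2 * n2) / (2 * n)  # coded-allele frequency
    expected = np.array([(1 - f) ** 2, 2 * f * (1 - f), f ** 2]) * n
    if np.any(expected == 0):
        return 1.0  # monomorphic SNP: test undefined, treat as not deviating
    observed = np.array([n0, n1, n2], dtype=float)
    stat = float(np.sum((observed - expected) ** 2 / expected))
    # Upper tail of a chi-square with 1 df: P(X > stat) = erfc(sqrt(stat / 2))
    return math.erfc(math.sqrt(stat / 2))

def qualifies(genotypes, min_call=0.80, min_maf=0.10, min_hwe_p=0.001):
    """Apply call-rate, MAF and HWE filters to one SNP.

    genotypes: per-person allele counts 0/1/2, with NaN for missing calls.
    """
    g = np.asarray(genotypes, dtype=float)
    called = g[~np.isnan(g)]
    if called.size == 0 or called.size / g.size < min_call:
        return False
    f = called.mean() / 2
    maf = min(f, 1 - f)
    counts = [int(np.sum(called == k)) for k in (0, 1, 2)]
    return maf >= min_maf and hwe_chisq_p(*counts) >= min_hwe_p
```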

Multivariable-adjusted trait residuals for the phenotypic traits listed in Table 1 were computed using linear regression in the full set of Framingham Study participants in whom the phenotype of interest was available. For the MRI analyses, residuals were derived from multivariable linear regressions in SAS [32], adjusting for variables that we had previously found to be related to MRI measures: age (and, where appropriate, age squared), current smoking status, systolic blood pressure (mm Hg), use of antihypertensive drugs, and the presence or absence of diabetes mellitus, atrial fibrillation and electrocardiographic left ventricular hypertrophy. Similarly, residuals were derived for each cognitive measure from multiple linear regression adjusting for the following covariates: birth cohort by decade, education (high school, high school graduate, some college or college graduate), Framingham Stroke Risk Profile score, plasma homocysteine concentration (at the 20th Original cohort and the 6th Offspring examinations) and apolipoprotein E genotype (ε4 positive/negative). Unless otherwise specified, covariate data for all 15 phenotypic measures were drawn from the 26th Original cohort and the 7th Offspring examinations. Residuals from sex-specific regressions were pooled for the SNP-phenotype association and linkage analyses. Winsorized residuals (with extreme values truncated at ± 3.5 standard deviations) were used for linkage analysis of phenotypes that departed from normality, as assessed by skewness and kurtosis (TBV, temporal horn volume, F1, F2, F3, Sim, BNT and WRAT).
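The residual-processing steps above (covariate adjustment by linear regression within each sex, pooling, and winsorizing at ± 3.5 SD) are sketched below. This assumes ordinary least squares with an intercept and numerically coded covariates; the helper names are hypothetical and the actual analyses were run in SAS.

```python
import numpy as np

def adjusted_residuals(y, covariates):
    """OLS residuals of a trait on covariates (an intercept is added)."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), np.asarray(covariates, dtype=float)])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

def sex_specific_residuals(y, covariates, sex):
    """Fit the covariate model separately in each sex, then pool residuals."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(covariates, dtype=float)
    sex = np.asarray(sex)
    out = np.empty_like(y)
    for s in np.unique(sex):
        m = sex == s
        out[m] = adjusted_residuals(y[m], X[m])
    return out

def winsorize_sd(resid, k=3.5):
    """Truncate values lying beyond mean +/- k SD to those bounds."""
    r = np.asarray(resid, dtype=float)
    mu, sd = r.mean(), r.std(ddof=1)
    return np.clip(r, mu - k * sd, mu + k * sd)
```

Here the winsorizing bounds are computed from the full residual distribution, including the extreme values being truncated; that convention is an assumption, since the text does not specify it.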

Presentation of results

We used several strategies to explore the resulting phenotype-SNP association and linkage results. First, we used an unbiased approach and collated the 50 strongest phenotype-SNP associations (those with the smallest p-values), comprising the top 25 associations from each of the GEE and FBAT analyses, together with all linkage results with a LOD score > 2.0. All SNPs were annotated using the UCSC genome browser tables http://genome.ucsc.edu/ [33, 34] to determine whether each SNP lay within a gene and, if so, to identify that gene.
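The ranking step amounts to keeping the smallest p-values separately for each association method. A minimal sketch, assuming results are held as simple records; the `Assoc` type and `top_associations` function are hypothetical names used only for illustration:

```python
import heapq
from dataclasses import dataclass

@dataclass(frozen=True)
class Assoc:
    """One phenotype-SNP test result (hypothetical record type)."""
    snp: str
    phenotype: str
    method: str  # e.g. "GEE" or "FBAT"
    p: float

def top_associations(results, per_method=25):
    """Map each method to its per_method smallest-p results, sorted ascending."""
    by_method = {}
    for r in results:
        by_method.setdefault(r.method, []).append(r)
    return {
        method: heapq.nsmallest(per_method, rows, key=lambda r: r.p)
        for method, rows in by_method.items()
    }
```

`heapq.nsmallest` avoids sorting all roughly 70,000 × 15 results per method when only the top 25 are needed.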

Next, we examined the data for genes with pleiotropic effects. We assessed whether genes that were associated with TCBV or WMH at p < 0.001 (as primary structural indicators of cellular and vascular brain damage) were also associated with at least two of the other brain MRI measures at p < 0.01. Similarly, we evaluated whether genes that were associated with lower scores on either F1 or F3 at p < 0.001 (as primary indicators of amnestic, Alzheimer-type and vascular cognitive impairment) were also associated with at least two other cognitive test measures at p < 0.01.
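The pleiotropy screen combines a strict threshold on a key phenotype with a looser threshold on the other phenotypes in the same group. A sketch, assuming p-values are stored per SNP in a nested dictionary (a hypothetical layout) and treating a phenotype with no recorded p-value as non-significant:

```python
def pleiotropic_snps(pvals, key, group, key_alpha=1e-3, other_alpha=1e-2, min_other=2):
    """Return SNPs with p < key_alpha for `key` and p < other_alpha for
    at least `min_other` other phenotypes in `group`.

    pvals: dict mapping SNP id -> dict mapping phenotype -> p-value.
    """
    hits = []
    for snp, by_pheno in pvals.items():
        if by_pheno.get(key, 1.0) >= key_alpha:
            continue  # fails the key-phenotype threshold
        n_other = sum(
            1 for ph in group
            if ph != key and by_pheno.get(ph, 1.0) < other_alpha
        )
        if n_other >= min_other:
            hits.append(snp)
    return hits
```

Running it once per key phenotype (TCBV and WMH for the MRI group, F1 and F3 for the cognitive group) mirrors the four screens described in the text.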

Finally, we investigated SNP associations in candidate genes. There are few candidate genes that have been directly linked in prior studies to the endophenotypes described in these analyses. Hence, we investigated genes previously reported to be associated with stroke, Alzheimer's disease, brain aging and vascular dementia in established databases including the NCBI Gene, PubMed and OMIM databases [35], the Alzforum Alzgene database http://www.alzforum.org/res/com/gen/alzgene [36], and the Science of Aging Knowledge Environment genes/intervention database http://sageke.sciencemag.org/cgi/genesdb [37]. All SNPs within 60 kb of the candidate genes (listed in Additional data file 1, Additional table 2) were examined for association with the 15 phenotypic traits described in this paper. Only phenotype-SNP associations with a p-value < 0.001 are described in Table 4.
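Selecting SNPs near a candidate gene reduces to an interval test on genomic coordinates, followed by the p < 0.001 reporting filter. A sketch, assuming SNP and gene positions come from the same genome build and that the 60 kb window extends from both ends of the gene; the function names and any coordinates passed to them are hypothetical, not real annotations.

```python
from typing import Dict, Iterable, List, Tuple

Snp = Tuple[str, str, int]  # (rsid, chromosome, base-pair position)

def snps_near_gene(snps: Iterable[Snp], chrom: str, start: int, end: int,
                   flank: int = 60_000) -> List[str]:
    """rsids on `chrom` lying within [start - flank, end + flank], inclusive."""
    lo, hi = start - flank, end + flank
    return [rs for rs, ch, pos in snps if ch == chrom and lo <= pos <= hi]

def significant(pvals: Dict[str, float], snp_ids: Iterable[str],
                alpha: float = 1e-3) -> List[str]:
    """Keep SNPs whose association p-value is below alpha."""
    return [s for s in snp_ids if pvals.get(s, 1.0) < alpha]
```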

Table 2 Structural and Functional Brain Aging (MRI and Cognitive Testing) Phenotypes† for FHS 100K Project: Results of Association and Linkage Analyses
Table 4 Phenotypic Associations With Candidate Genes Previously Related To Stroke, Dementia And Brain MRI Or Cognitive Function Phenotypes: Phenotype-SNP Associations With A GEE Or FBAT P-Value < 0.001

Results

The brain aging phenotypic traits available in the Framingham Study 100K SNP resource, with details of the sample size, statistical transformation and covariates used for multivariable adjustment of each phenotype, are provided in Table 1. The mean age of the 705 subjects was 62 ± 12 years and 46% were male; 79 were from the Original Framingham cohort (enrolled in 1948–50) and 626 belonged to the Offspring cohort. Table 2 (sections a and b) presents the top twenty-five phenotype-SNP associations, ranked by p-value, for the GEE and FBAT models, and Table 2 (section c) presents the phenotype-SNP associations with LOD scores ≥ 2.0 and the corresponding 1.5-LOD support intervals. The strongest phenotype-SNP association in GEE analyses was between a SNP on the retinal cadherin gene CDH4 and TCBV (rs1970546; p = 3.7 × 10⁻⁸); this was the only association that achieved genome-wide significance under the conservative Bonferroni correction detailed in the Overview (p < 5 × 10⁻⁸). In FBAT analyses, the strongest phenotype-SNP association was between a SNP on the gene SORL1 (rs1131497; p = 3.2 × 10⁻⁶) and performance on Sim, a test of abstract reasoning. Assuming an additive genetic model, a minor allele frequency of 10% and a very conservative α of 1 × 10⁻⁸, we had 80% power to detect an effect of 0.52 standard deviations (SD) in a given variable. For TCBV, this translates to an effect size of 1.71%, equivalent to 8.5 years of brain aging.

We had previously reported high heritability for WMH. In the current analyses of associations between individual SNPs and WMH, one association was in the top 50 and others were in the top 100, but none fell within the arbitrarily chosen cut-off for Table 2, which details only the top 25 phenotype-SNP associations. In FBAT analyses, rs1822285 and rs166085, on chromosomes 11 and 5 respectively, were associated with WMH (p = 6.4 × 10⁻⁵ and 9.3 × 10⁻⁵), but these SNPs are not within known genes. In GEE analyses, two SNPs on the biologically plausible gene CLDN10 (claudin 10), encoding an integral membrane protein that is a component of the tight junction, were related to WMH (rs10508012 and rs10508013; p = 3.3 × 10⁻⁵ and 4.9 × 10⁻⁵). Other extragenic SNPs and SNPs on biologically interesting genes (the glial growth factor NRG1 and the potassium channel protein KCNMA1) were also associated with WMH, with p-values in the 10⁻⁵ to 10⁻⁴ range. We again observed the linkage between WMH and a region on chromosome 4 that we had previously reported [38]. Within this linkage peak (1.5-LOD support interval) were biologically interesting candidate genes such as EVC and EVC1, related to the Ellis-van Creveld syndrome, and GRK4, previously related to salt-sensitive hypertension [39, 40].

We observed that performance on the Wide-Range Achievement Test (WRAT), a test of reading ability, was linked to a region on chromosome 18p with a maximum LOD score of 5.1 at rs1846090. The 1.5 LOD support interval of this linkage peak includes an STR marker, D18S53, that has been associated with dyslexia in some prior studies [41] although not in others [42]. In the current study the observed LOD score for WRAT at D18S53 (GATA11A06) was 2.5.
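A 1.5-LOD support interval is the region around a linkage peak where the LOD score stays within 1.5 units of its maximum. The grid-based sketch below assumes LOD scores evaluated at ordered map positions and reports the outermost grid points that meet the threshold, without interpolating between them; the function name is hypothetical, and linkage software may compute the bounds differently.

```python
def lod_support_interval(positions, lods, drop=1.5):
    """Bounds of the contiguous region around the peak with LOD >= max - drop.

    positions: ordered map positions; lods: LOD score at each position.
    Grid-based: returns the outermost grid points meeting the threshold,
    without interpolating between them.
    """
    if not lods or len(positions) != len(lods):
        raise ValueError("positions and lods must be non-empty and equal length")
    peak = max(range(len(lods)), key=lods.__getitem__)
    threshold = lods[peak] - drop
    left = peak
    while left > 0 and lods[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(lods) - 1 and lods[right + 1] >= threshold:
        right += 1
    return positions[left], positions[right]
```

Walking outward from the peak, rather than taking every position above the threshold, keeps the interval contiguous even when a second, separate peak elsewhere on the chromosome also exceeds it.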

Table 3 provides all phenotype-SNP associations with a GEE or FBAT p < 0.001 for a key phenotype identified a priori and a GEE or FBAT p < 0.01 for at least two other phenotypes within each of two groups of related phenotypes. These two groups were the brain MRI parameters (with TCBV and WMH as the key phenotypes) and the cognitive tests (run once with F1 and once with F3 as the key phenotype). If adjacent SNPs were in strong linkage disequilibrium (LD; r² > 0.80), results are presented only for the strongest phenotype-SNP association within the LD block. For the MRI parameters, GEE models identified 10 SNPs and FBAT models identified 7 SNPs using TCBV as the index phenotype, and none using WMH as the index phenotype; among these were 4 SNPs on PDE3A and one each on PDE4B and SCN8A. For the cognitive phenotypes, GEE models identified 7 phenotype-SNP associations using F1 as the key phenotype and 4 using F3 as the key phenotype; FBAT models did not identify any phenotype-SNP associations meeting these prespecified criteria.
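Reporting only the strongest association within a block of SNPs in high LD can be approximated by greedy pruning: visit SNPs from smallest to largest p-value and keep one only if its r² with every SNP already kept is at most 0.80. This is a sketch; the `r2` callback is a hypothetical lookup that a real analysis would fill from pairwise LD estimates, and the text's restriction to adjacent SNPs is not enforced here.

```python
def prune_by_ld(snps, pvals, r2, threshold=0.80):
    """Greedy LD pruning that keeps the strongest association in each cluster.

    snps: iterable of SNP ids; pvals: dict SNP -> p-value;
    r2: callable (snp_a, snp_b) -> pairwise r^2 (a hypothetical lookup).
    SNPs are visited from smallest to largest p-value, and a SNP is kept
    only if its r^2 with every already-kept SNP is <= threshold.
    """
    kept = []
    for s in sorted(snps, key=lambda s: pvals[s]):
        if all(r2(s, k) <= threshold for k in kept):
            kept.append(s)
    return kept
```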

Table 3 SNP Associations with a GEE or FBAT p-value < 0.001 for selected phenotype and p values < 0.01 for at least two other phenotypes within selected group of related phenotypes

We identified 163 potential candidate genes and looked for phenotype-SNP associations using all SNPs on the 100K Affymetrix GeneChip that were within 60 kb of each candidate gene. Of these genes, 23 had no analyzable SNPs on the chip, while the remaining 140 had 1430 analyzable SNPs within 60 kb. Table 4 shows the candidate genes and all phenotype-SNP associations with a GEE or FBAT p-value < 0.001. In this analysis we included all SNPs regardless of MAF, since in prior studies significant phenotype-SNP associations had been demonstrated for some of these genes with SNPs having MAF < 10%.

Discussion

This is the first GWA study of volumetric brain MRI and cognitive phenotypes in a community-based sample of adults with data drawn from two generations of persons within the same families. The complete results of the association and linkage analyses are available at our website http://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?id=phs000007. This resource has the potential to detect novel susceptibility genes for brain aging, to examine the relevance in humans of promising candidate gene associations with these diseases reported in animal models, and to replicate findings observed in other cohort studies. We used several strategies to prioritize phenotype-SNP associations, but there remain other ways of looking at these data that we and others will continue to explore.

In our untargeted approach of ranking SNP associations by p-value, we found several phenotype-SNP associations within biologically interesting genes (Table 2). The most striking was a strong association between two SNPs in or adjacent to the gene SORL1 and performance on tests of abstract reasoning (rs1131497, FBAT p = 3.2 × 10⁻⁶; rs726601, FBAT p = 8.2 × 10⁻⁴). SORL1 is an apolipoprotein E receptor that binds alpha-2-macroglobulin and is one component of a large multimeric complex, termed the retromer complex, that is involved in retrograde transport of proteins from endosomes to the trans-Golgi network [43, 44]. This retromer complex appears to play a crucial role in the transport of transmembrane proteins implicated in Alzheimer's disease, such as amyloid precursor protein (APP) and β-site APP cleaving enzyme (BACE1). SORL1 protein is underexpressed in the frontal lobes of persons with AD compared to controls, and the SORL1 gene has recently been associated with the risk of developing AD in 6 population samples [45, 46]. Only 7 SNPs on or adjacent to the SORL1 gene were genotyped on the 100K Affymetrix GeneChip. One of these, rs726601, which was associated with abstract reasoning (FBAT p = 8.2 × 10⁻⁴, Table 4), was in LD (r² > 0.8) with SNPs (rs2282649, rs1010159) strongly associated with AD in these studies [45, 46].

In unbiased analyses, we also identified 3 genes that were associated with measures of frontal or parietal brain volume and with tests of executive function and abstract reasoning. These 3 genes, ERBB4, PDLIM5 and RFX4 (FBAT ranks #11 and 12, GEE rank #9), have each been previously associated with schizophrenia or mood disorders, conditions known to be associated with smaller frontal brain volumes and poorer performance on tests of executive function, even in unaffected family members [47, 48]. ERBB4 is a neuregulin (NRG1) receptor involved in forebrain development and N-methyl-D-aspartate (NMDA) receptor function. It has been associated with schizophrenia, in which an excess of the IVS12-15C>T variant has been noted (odds ratio 2.98) [49, 50]. NRG1 itself has been associated with schizophrenia in the Icelandic deCODE population [51] and in other studies [52–54], with accelerated lobar atrophy [52], and with bipolar disorder [55, 56]. As shown in Table 4, NRG1, like ERBB4, was associated with frontal brain volume (FBV) in our sample. PDLIM5 polymorphisms have been associated with schizophrenia (rs2433320 and rs2433322) [52, 55] and bipolar disorder (rs10008257 and rs2433320) [57]. Additionally, the PDLIM5 protein is a homolog of AD7c-NTP, a neural thread protein associated with Alzheimer's disease, and is being studied as a possible CSF biomarker of AD [58]. A final group of 3 genes, CDH4, VIPR2 and CTNNB1 (GEE rank #1 and FBAT ranks #17 and 24), have been shown in animal studies to play an important role in neural tract and synaptic development [59–61]. Using linkage analyses, we were able to replicate a previous report that dyslexia was linked to a short-tandem repeat marker, D18S53, on chromosome 18p11.2.

We examined pleiotropic effects by identifying SNP associations across two sets of related phenotypes. In these analyses, we uncovered a different set of genes, none of which has been related to brain volumes, cognitive function, stroke or dementia in prior population studies. However, they include biologically interesting genes related to brain volumes, such as PDE3A, previously related to thrombosis [62]; SCN8A, linked to cerebellar ataxia with mental retardation [63]; and PDE4B, which has been associated with schizophrenia [64].

We also evaluated SNPs within some candidate genes previously reported to be associated with stroke and dementia in animal studies or in population samples, and observed that several of these SNPs were associated with MRI and cognitive endophenotypes that increase the risk of these conditions; this gene list is representative but not comprehensive. Among these genes are PDE4D and LTA4H, which have been previously related to stroke in several population samples [9, 10]; NGFB, NTRK2 and NTRK3 (nerve growth factor and two neurotrophin receptors), previously associated with performance on memory tasks in animal studies [65, 66]; BACE1, PRNP and A2M, genes associated with AD in case-control or family-based association studies [36, 67, 68]; VLDLR, a gene previously associated with an increased risk of dementia in the presence of vascular risk factors [69]; and LRRK2, a gene associated with an increased risk of Parkinson's disease in population samples [70] but also thought to be an enabling gene for tau pathology [71]. Only one prior study has directly related a gene (KIBRA) to one of the phenotypes (verbal memory) included in the current analyses, and none of our SNPs was in significant LD with the SNP (rs17070145) described in that study [72]. We have chosen not to include details of the correlation between SNPs on the 100K chip and the specific SNPs studied within candidate genes by prior investigators, since doing so would have expanded Table 4 beyond the size and scope of this article. For example, prior associations of several of the candidate genes with related clinical disease phenotypes (for example, PDE4D with ischemic stroke and SORL1 with AD) have shown allelic heterogeneity: multiple SNPs and haplotypes within the gene were associated with the phenotype, even within Caucasian populations [73–75].

Limitations

Our study had several limitations. A healthy survivor bias is likely as participants in this sample had to survive beyond 1990 to provide DNA. Further, persons undergoing MRI had to travel to an MRI center, provide informed consent, and have no contraindication to the study. We have previously shown that persons undergoing brain MRI were significantly healthier than the overall sample of Framingham participants alive at the time [15].

Our sample of 705 related persons may have limited power to uncover associations compared with the larger sample that includes unrelated subjects (in whom 100K genotyping was not obtained). This is especially true for hippocampal volumes, which were computed from hand-drawn hippocampal outlines; only 327 persons in our study dataset had available hippocampal volumes. Further, we currently have only a single measure of brain MRI and cognitive tests in these subjects. However, all these participants are being restudied with a second cycle of MRI and cognitive testing. The genes associated with changes in these measures over time may be stronger candidate genes for usual and pathological brain aging processes than the genes related in the current analyses to cross-sectional endophenotypes.

The 100K Affymetrix GeneChip provides limited (~30%) coverage of the genome, with no coverage of several gene-rich areas and key candidate genes such as APOE [76]. However, the forthcoming NHLBI-funded 550K genome-wide scan on over 9000 Framingham participants (discussed in the Overview) should permit validation of our initial 100K SNP associations in a larger sample and will provide denser coverage of the genome. Population stratification is not a major concern in this study sample because of its highly homogeneous (European) ancestry; however, for the same reason we cannot detect race- or ethnicity-specific variation in these phenotype-SNP associations. Multiple testing raises significant issues, which are addressed in the Overview. When testing for association with all alleles having a minor allele frequency > 5%, it has been estimated that 1,000,000 tests are conducted across the entire human genome; hence, for an α of 0.05, a conservative Bonferroni correction (0.05/10⁶) would consider only tests with p < 5 × 10⁻⁸ significant. Others have argued that this threshold is too stringent because it ignores correlation between individual SNPs [77–79]. We emphasize that the current study is hypothesis-generating and that our findings need to be replicated in other population samples.

Conclusion

The untargeted genome-wide approach to detect genetic associations with brain aging identified several biologically interesting genes (such as genes previously related to AD and schizophrenia) as possible novel candidates related to brain structure and function in middle-aged to elderly populations. Our data also suggest that genes previously associated with clinical disease may be associated with clinical endophenotypes known to increase the risk of developing these conditions. Finally, our database will serve as a resource for in silico replication of findings noted in other population-based samples, and in animal models of brain aging, stroke, and neurodegenerative diseases.