Population stratification may bias analysis of PGC-1α as a modifier of age at Huntington disease motor onset
- Cite this article as:
- Ramos, E.M., Latourelle, J.C., Lee, JH. et al. Hum Genet (2012) 131: 1833. doi:10.1007/s00439-012-1205-z
Huntington’s disease (HD) is an inherited neurodegenerative disorder characterized by motor, cognitive and behavioral disturbances, caused by the expansion of a CAG trinucleotide repeat in the HD gene. The CAG allele size is the major determinant of age at onset (AO) of motor symptoms, although the remaining variance in AO is highly heritable. The rs7665116 SNP in PPARGC1A, encoding the mitochondrial regulator PGC-1α, has been reported to be a significant modifier of AO in three European HD cohorts, perhaps due to affected cases from Italy. We attempted to replicate these findings in a large collection of (1,727) HD patient DNA samples of European origin. In the entire cohort, rs7665116 showed a significant effect in the dominant model (p value = 0.008) and the additive model (p value = 0.009). However, when examined by origin, cases of Southern European origin had an increased rs7665116 minor allele frequency (MAF), consistent with this being an ancestry-tagging SNP. The Southern European cases, despite similar mean CAG allele size, had a significantly older mean AO (p < 0.001), suggesting population-dependent phenotype stratification. When the generalized estimating equations models were adjusted for ancestry, the effect of the rs7665116 genotype on AO decreased dramatically. Our results do not support rs7665116 as a modifier of AO of motor symptoms, as we found evidence for a dramatic effect of phenotypic (AO) and genotypic (MAF) stratification among European cohorts that was not considered in previously reported association studies. A significantly older AO in Southern Europe may reflect population differences in genetic or environmental factors that warrant further investigation.
Huntington’s disease (HD) is a neurodegenerative disorder with classic symptoms that include progressive chorea, motor deficits, cognitive changes and dementia. Age at onset (AO) of the overt symptoms is highly variable: while some individuals show signs in the first decade, others remain asymptomatic even after 60 years of age, though in most cases death follows on average about 17 years after the first symptoms. HD is inherited as an autosomal dominant trait and is caused by the expansion of an unstable CAG repeat, in the first exon of the HD gene (now called HTT), on chromosome 4p16.3 (The Huntington’s Disease Collaborative Research Group 1993), resulting in an expanded polyglutamine tract in the huntingtin protein.
The major determinant of AO in HD is the size of the expanded CAG repeat allele (Lee et al. 2012), such that the longer the repeat the earlier the onset of clinical symptoms, though most HD cases occur in adulthood with about 40–45 CAG repeats. Repeat length alone explains about 70 % of the variability in onset age (Duyao et al. 1993). The remaining variance in AO (residual AO) is highly heritable but remains unexplained. Nonetheless, recent genetic studies have nominated about 20 loci that may modify AO or progression of HD. However, many of the specific polymorphisms assessed in multiple studies have failed to be replicated, including attractive biological candidate genes such as glutamate receptor, ionotropic kainate 2 (GRIK2), apolipoprotein E (APOE) and brain-derived neurotrophic factor (BDNF) (Rubinsztein et al. 1997; Metzger et al. 2006; Panas et al. 1999; Alberch et al. 2005; Di Maria et al. 2006).
One biologically compelling candidate thought to be involved in HD pathogenesis is PPARGC1A, localized at 4p15.1-2, which encodes peroxisome proliferator-activated receptor γ coactivator 1α (PGC-1α), a transcriptional regulator of adaptive thermogenesis (Puigserver et al. 1998) and mitochondrial respiration and oxidative stress (Puigserver and Spiegelman 2003; St-Pierre et al. 2006). The lack of PGC-1α expression produces an HD-like phenotype in mice (Lin et al. 2004; Leone et al. 2005) and mutant, but not wild type, huntingtin down-regulates the expression of PGC-1α and its target genes (Cui et al. 2006; Lin et al. 2004; Weydt et al. 2006). Moreover, three recent studies, with DNA samples from European HD patients, mainly from Italy and Germany, have reported an association of PGC-1α with AO of HD symptoms (Che et al. 2011; Weydt et al. 2009; Taherzadeh-Fard et al. 2009). Although different sets of PPARGC1A SNPs were included in these studies, one polymorphism (rs7665116), located at the 3′-end region of intron 2, was associated with a later AO in all three studies, displaying a significant effect both in an additive and dominant model. Following these results, it was reported that polymorphisms in PGC-1α downstream target genes, namely nuclear respiratory factor 1 (NRF1) and mitochondrial transcription factor A (TFAM), may influence the AO in HD (Taherzadeh-Fard et al. 2011). Given that results based on analyses of the PPARGC1A rs7665116 SNP are motivating a broader range of research into the functional basis of the effect, the aim of the present study was to attempt to replicate the association of this SNP with AO, in a much larger cohort of 1,727 HD patients of different European populations.
We analyzed 1,929 HD patients with known AO of overt motor symptoms. The DNA samples were from subjects involved in long-term genetic studies from collaborating investigators (HD-MAPS), the HD observational study COHORT, the Harvard Tissue Resource Center Bank (McLean Hospital, Belmont MA) and the National Neurological Research Bank (VAMC Wadsworth Division, Los Angeles CA). These studies included related individuals [from 1,676 different families defined either based on the likelihood of genetic similarity from genome-wide genotyping information (Western European samples) or membership in nuclear (parents and children) families (Southern European samples)]. Of these, 934 were self-reported as originally from Southern European countries (263 from Portugal, 664 from Italy, 5 from Spain and 2 from Greece); the rest of the cases had unconfirmed or no geographical origin data. Of these, 1,020 were genotyped using the GeneChip Human Mapping 500K Array Set (Affymetrix) at the Broad Institute of Harvard and MIT as part of a genome-wide scan for HD genetic modifiers.
The HD CAG repeat length was determined by a polymerase chain reaction (PCR) amplification assay, using fluorescently labeled primers, as previously described (Warner et al. 1993). The size of the fragments was determined using the ABI PRISM 3730xl automated DNA Sequencer (Applied Biosystems, Foster City, CA, USA) and GeneMapper version 3.7 software. A set of HD CAG alleles, determined by DNA sequencing, were used as standards. Genotyping of the PGC-1α polymorphism (rs7665116) was performed by real-time PCR using the commercially available Taqman Genotyping probe (Applied Biosystems, Foster City, CA, USA) carried out on the LightCycler® 480 (Roche Diagnostics, Mannheim), following manufacturer’s instructions.
For the 1,020 samples with whole-genome genotyping, PCA was carried out using PLINK v1.05 (http://pngu.mgh.harvard.edu/Purcell/plink/) (Purcell et al. 2007) in order to determine the genetic ancestry of these individuals. Briefly, genotypes of HD samples were combined with HapMap Phase 2 data (CEPH, Yoruba, Han-Chinese and Japanese populations) for pairwise IBD estimation and subsequent IBS clustering.
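The pairwise comparison underlying IBS clustering can be illustrated with a minimal sketch. It assumes genotypes coded as minor-allele counts (0, 1 or 2) and computes the proportion of alleles two samples share identical by state; the function name and the toy genotype vectors are hypothetical, and this is not PLINK's implementation.

```python
def ibs_similarity(genotypes_a, genotypes_b):
    """Proportion of alleles shared identical by state between two samples.

    Each genotype is a minor-allele count (0, 1 or 2). At every SNP the two
    samples share 2 - |a - b| alleles out of a possible 2.
    """
    if len(genotypes_a) != len(genotypes_b) or not genotypes_a:
        raise ValueError("genotype vectors must be non-empty and equal length")
    total_difference = sum(abs(a - b) for a, b in zip(genotypes_a, genotypes_b))
    return 1.0 - total_difference / (2 * len(genotypes_a))


# Hypothetical genotypes at four SNPs for two samples.
sample_1 = [0, 1, 2, 1]
sample_2 = [1, 1, 0, 2]
print(ibs_similarity(sample_1, sample_2))
```

Samples from the same ancestral population tend to have higher pairwise similarity, which is what allows clustering against reference panels such as HapMap.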
To assess differences in the mean motor AO among Western and Southern European samples, we used generalized estimating equations (GEE), thereby adjusting for related samples. Multivariate analyses were generated using GEE to assess the effect of the rs7665116 SNP at the PGC-1α gene on HD residual motor onset, adjusting for familial correlation. Residual motor onsets were computed as the difference between the observed and expected age of onset and were standardized to a mean of zero and standard deviation of one. The weighted GEE was computed assuming an independent correlation structure and using the robust estimator of the variance to account for familial relationships. All statistical analyses were performed using PASW Statistics (version 18).
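The residual computation described above can be sketched as follows. The text does not state here how the expected age of onset was derived, so this sketch assumes a log-linear regression of onset on CAG length; the function names and the toy data are hypothetical and serve only to show the subtraction and standardization steps.

```python
import math
import statistics


def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def standardized_residual_onset(cag, onset):
    """Observed minus expected onset, scaled to mean 0 and SD 1."""
    # Assumption: expected onset comes from regressing log(onset) on CAG length.
    intercept, slope = fit_line(cag, [math.log(a) for a in onset])
    expected = [math.exp(intercept + slope * c) for c in cag]
    raw = [obs - exp for obs, exp in zip(onset, expected)]
    mean, sd = statistics.fmean(raw), statistics.stdev(raw)
    return [(r - mean) / sd for r in raw]


# Hypothetical CAG lengths and ages at motor onset, for illustration only.
cag_lengths = [40, 41, 42, 43, 44, 45, 46, 48, 50, 53]
motor_onsets = [62, 57, 50, 49, 43, 40, 38, 31, 27, 22]
residuals = standardized_residual_onset(cag_lengths, motor_onsets)
print([round(r, 2) for r in residuals])
```

A positive residual marks a patient whose onset was later than their CAG length predicts, which is the quantity a candidate modifier is tested against.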
Table 1 Genetic and clinical data among European cases of Western and Southern origin. Columns: origin of samples; then, for each of the HD CAG ranges 36–87 and 40–53: number of samples, mean HD CAG (median), mean motor onset (median). (Table values not preserved.)
Remarkably, analysis of the clinical data for these 1,727 HD cases, which had CAG alleles ranging from 36 to 87 repeats, revealed that the self-reported Southern Europeans (n = 934) had significantly later onset of motor HD symptoms (p < 0.001), by 4–5 years, compared to the Western Europeans (n = 793), though the means/medians for HD CAG repeat length were similar in both groups (Table 1). Furthermore, the observed rs7665116 genotypes for samples from the Southern European countries revealed a higher minor allele frequency (MAF) (~17 %) than in the Western European set (~12 %) (Table 1). These findings, together with the striking differences in the MAF of this polymorphism among the different HapMap populations, strongly suggested that population stratification might increase type I errors in the AO association analysis.
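For reference, a minor allele frequency is obtained from genotype counts as in the sketch below. It treats C as the minor allele, in line with the T–T, T–C and C–C genotype labels used for this SNP; the genotype counts are hypothetical and were chosen only to reproduce the approximate frequencies quoted above, not taken from the study data.

```python
def minor_allele_frequency(n_tt, n_tc, n_cc):
    """Frequency of the C allele given T-T, T-C and C-C genotype counts."""
    total_alleles = 2 * (n_tt + n_tc + n_cc)
    if total_alleles == 0:
        raise ValueError("at least one genotype is required")
    # Heterozygotes carry one C allele, C-C homozygotes carry two.
    return (n_tc + 2 * n_cc) / total_alleles


# Hypothetical counts per 100 samples (not the study's data).
western = minor_allele_frequency(78, 20, 2)
southern = minor_allele_frequency(69, 28, 3)
print(f"Western: {western:.2f}, Southern: {southern:.2f}")
```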
Finally, we have recently shown that the non-normal distribution of CAG allele size (and AO) also introduces error in conventional statistical analysis (Lee et al. 2012). Even a single CAG outlier sample, with a very long CAG repeat and extremely young age at onset relative to all others, can have a profound effect on the final result when testing for the effects of potential genetic modifiers (Lee et al. 2012). Therefore, as a final filter, we chose only the Southern and Western European HD cases with CAG alleles in the 40- to 53-repeat range, shown previously to yield a statistically well-behaved data set that conforms to the fundamental assumptions of linear regression analysis (constant variance and normally distributed error) (Lee et al. 2012).
Table: Multivariate correlation of rs7665116 with residual age at motor onset, reported with 95 % confidence limits. Additive model (T–T vs. T–C vs. C–C): PGC1α rs7665116 alone, and PGC1α rs7665116 + ancestry (WE vs. SE). Dominant model (T–T vs. T–C + C–C): PGC1α rs7665116 alone, and PGC1α rs7665116 + ancestry (WE vs. SE). (Table values not preserved.)
Previous studies have reported the presence of a common polymorphism in PPARGC1A (rs7665116) that is associated with a delay in AO of HD motor symptoms in three European HD cohorts (Che et al. 2011; Taherzadeh-Fard et al. 2009; Weydt et al. 2009), primarily contributed by patients from Italy. Our study, which involved a larger collection of HD cases, did not provide strong evidence for this SNP, and therefore, for PGC-1α as a modifier of HD motor onset, but did strongly support further investigation of the factors that contribute to the striking differences in AO of motor symptoms in ‘Southern Europeans’.
The results of our study expose genetic ancestry as a critical factor in HD association studies. It is expected that a disease-associated polymorphism may have varying effects in different populations, but the variation in minor allele frequency across different genetic backgrounds related to ancestry may also be critical and should be taken into account in genetic association analysis. Cases that are poorly matched for genetic background may lead to false positives in association studies. It is important to control for ancestry by use of PCA or related methods, even in apparently closely related populations such as Europeans (Novembre et al. 2008), where there is strong evidence of recent population selection that has led to intra-European variation in allele frequency (Price et al. 2008). This is particularly important in studies using the CEPH-CEU panel as controls, since the genetic matching to different populations may differ considerably in different countries (Lao et al. 2008). It is difficult to accurately infer ancestry in candidate gene association studies, leading to imperfect correction for stratification. However, with increasing availability of genome-wide datasets, the assessment of population structure should become a common procedure for candidate association studies.
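How poorly matched genetic backgrounds produce a false positive can be illustrated with a small simulation. This is a sketch under stated assumptions, not the GEE analysis used in the study: two groups differ in minor allele frequency (12 % vs. 17 %) and in mean standardized residual onset (a hypothetical shift of 0.5 SD), while the SNP itself has no effect on onset. All function names, sample sizes and the random seed are hypothetical. Ancestry adjustment is done by centering both variables within each group, which gives the same genotype coefficient as adding a group indicator to an ordinary least squares model.

```python
import random
import statistics


def simulate_group(n, maf, mean_onset, label, rng):
    """Simulate n samples in which genotype has NO effect on onset."""
    rows = []
    for _ in range(n):
        # Additive coding: number of minor alleles (0, 1 or 2).
        genotype = int(rng.random() < maf) + int(rng.random() < maf)
        residual_onset = rng.gauss(mean_onset, 1.0)
        rows.append((genotype, residual_onset, label))
    return rows


def ols_slope(x, y):
    """Slope of the simple least squares regression of y on x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def center_by_group(values, labels):
    """Subtract each group's mean from its members."""
    group_means = {}
    for group in set(labels):
        members = [v for v, lab in zip(values, labels) if lab == group]
        group_means[group] = statistics.fmean(members)
    return [v - group_means[lab] for v, lab in zip(values, labels)]


rng = random.Random(2012)  # fixed seed so the sketch is repeatable
cohort = (simulate_group(20000, 0.12, 0.0, "WE", rng)
          + simulate_group(20000, 0.17, 0.5, "SE", rng))
genotypes = [row[0] for row in cohort]
onsets = [row[1] for row in cohort]
origins = [row[2] for row in cohort]

unadjusted = ols_slope(genotypes, onsets)
adjusted = ols_slope(center_by_group(genotypes, origins),
                     center_by_group(onsets, origins))
print(f"unadjusted slope: {unadjusted:.3f}")
print(f"ancestry-adjusted slope: {adjusted:.3f}")
```

In the pooled analysis the minor allele appears to delay onset only because it is more common in the group with later onset; once group membership is accounted for, the apparent effect should shrink toward zero.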
Our study points out the dramatic effect that population stratification can have in testing a candidate gene for an association with disease phenotype. We found that the variation in rs7665116 minor allele frequency could lead to a false positive if genetic ancestry is not corrected for in the analysis. The Southern European cases seem to be different genetically and clinically, in terms of age at diagnosis of motor symptoms, from other European samples. By adjusting for ancestry, we observed striking effects on both the p values (increased ~4×) and effect sizes (decreased by ~25 %). Even though the post-adjustment p values remained nominally significant, the dramatic reduction in significance occasioned by considering ancestry does not lend confidence to its being a true effect and rather suggests that it is due to insufficiently rigorous ancestry categorization of the ‘Southern European’ set, which was based solely on self-reporting rather than unbiased genome-wide genotyping data. Consistent with this interpretation, while this manuscript was under review, an article by Soyal et al. (2012) reported that no association of rs7665116 with AO to first symptom (not necessarily motor) was found in a European cohort of 1,706 HD patients whose MAF for this SNP closely resembled the MAF that we report for the Western European samples in our study.
A remarkable finding from our analysis is the strong evidence of later motor onset for HD patients originally from Southern European countries (Portugal and Italy), who have a reported onset of motor symptoms that is 4–5 years later than that of HD patients from other European regions, despite similar mean/median CAG allele size. This striking difference may reflect population differences elsewhere in the genome, since extensive genome-wide SNP analyses have shown that even though European populations share much of their genetic background, they also exhibit a notable degree of non-sharing (ancestry) (Lao et al. 2008; Novembre et al. 2008). It is important to note that, in addition to genetic background differences, a number of other factors may contribute to the difference in age at onset. One possibility is environmental influences, for example, differences related to lifestyle, or perhaps types of medication. A meta-analysis has found evidence that people who adhere to a Mediterranean diet appear to have a reduced risk of developing Parkinson’s and Alzheimer’s disease (Sofi et al. 2008), and an altered Alzheimer’s disease course (Scarmeas et al. 2007). A recent observational study of HD in Europe has shown that Southern European (and Polish) clinicians prescribed anti-dyskinetic medication more frequently than clinicians in other European regions (Orth et al. 2010). Another environment-related factor likely to contribute is the set of criteria and procedures for diagnosing HD, which may differ between cultures. It is now important to understand which of many potential population-specific genetic and/or environmental factors are associated with later reported AO of motor symptoms in Southern Europeans.
The results of our study do not provide strong evidence for PPARGC1A SNP rs7665116, and therefore, for PGC-1α, as a modifier of age at onset of HD motor symptoms. However, we have found evidence of a significantly later age at onset of motor symptoms in Southern European countries, which may reflect genetic effects and/or environmental (lifestyle, diagnosis) factors that should be further explored. Our data strongly illustrate the false contribution that population stratification may make in a candidate gene association study, while providing genetic evidence that the contribution of PGC-1α as a modifier of the disease process that leads to onset of HD motor symptoms may not be significant.
We thank the HD families, whose participation in genetic studies made this work possible, the COHORT co-investigators and contributors (see “Appendix”), contributors to the HD-MAPS study, Ruth Abramson, Alexandra Durr, Adam Rosenblatt, Luigi Frati, Susan Perlman, P. Michael Conneally, Mary Lou Klimek, Melissa Diggin, Tiffany Hadzi and Ayana Duckett, as well as the Harvard Brain Tissue Resource Center at McLean Hospital and the National Neurological Research Specimen Bank at the VA West Los Angeles Healthcare Center. EMR is the recipient of a scholarship from the Fundação para a Ciência e a Tecnologia of Portugal (SFRH/BD/44335/2008). Study funding supported by NIH grants from the NINDS NS16367 (The Massachusetts HD Center Without Walls) and NS32765, and the CHDI Foundation, Inc.
Conflict of interest
The authors declare that they have no conflict of interest.
This study used only deidentified, previously collected DNA samples and phenotypic data in a manner approved by the Institutional Review Board of Partners HealthCare, Inc.
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.