Population stratification may bias analysis of PGC-1α as a modifier of age at Huntington disease motor onset

Huntington’s disease (HD) is an inherited neurodegenerative disorder characterized by motor, cognitive and behavioral disturbances, caused by the expansion of a CAG trinucleotide repeat in the HD gene. The CAG allele size is the major determinant of age at onset (AO) of motor symptoms, although the remaining variance in AO is highly heritable. The rs7665116 SNP in PPARGC1A, encoding the mitochondrial regulator PGC-1α, has been reported to be a significant modifier of AO in three European HD cohorts, perhaps due to affected cases from Italy. We attempted to replicate these findings in a large collection of (1,727) HD patient DNA samples of European origin. In the entire cohort, rs7665116 showed a significant effect in the dominant model (p value = 0.008) and the additive model (p value = 0.009). However, when examined by origin, cases of Southern European origin had an increased rs7665116 minor allele frequency (MAF), consistent with this being an ancestry-tagging SNP. The Southern European cases, despite similar mean CAG allele size, had a significantly older mean AO (p < 0.001), suggesting population-dependent phenotype stratification. When the generalized estimating equations models were adjusted for ancestry, the effect of the rs7665116 genotype on AO decreased dramatically. Our results do not support rs7665116 as a modifier of AO of motor symptoms, as we found evidence for a dramatic effect of phenotypic (AO) and genotypic (MAF) stratification among European cohorts that was not considered in previously reported association studies. A significantly older AO in Southern Europe may reflect population differences in genetic or environmental factors that warrant further investigation.

Introduction
Huntington's disease (HD) is a neurodegenerative disorder with classic symptoms that include progressive chorea, motor deficits, cognitive changes and dementia. Age at onset (AO) of the overt symptoms is highly variable: while some individuals show signs in the first decade, others remain asymptomatic even after 60 years of age, though in most cases death follows on average about 17 years after the first symptoms. HD is inherited as an autosomal dominant trait and is caused by the expansion of an unstable CAG repeat in the first exon of the HD gene (now called HTT) on chromosome 4p16.3 (The Huntington's Disease Collaborative Research Group 1993), resulting in an expanded polyglutamine tract in the huntingtin protein.
The major determinant of AO in HD is the size of the expanded CAG repeat allele (Lee et al. 2012), such that the longer the repeat the earlier the onset of clinical symptoms, though most HD cases occur in adulthood with about 40-45 CAG repeats. Repeat length alone explains about 70% of the variability in onset age (Duyao et al. 1993). The remaining variance in AO (residual AO) is highly heritable but remains unexplained. Nonetheless, recent genetic studies have nominated about 20 loci that may modify AO or progression of HD. However, many of the specific polymorphisms assessed in multiple studies have failed to be replicated, including attractive biological candidate genes such as glutamate receptor, ionotropic kainate 2 (GRIK2), apolipoprotein E (APOE) and brain-derived neurotrophic factor (BDNF) (Rubinsztein et al. 1997; Metzger et al. 2006; Panas et al. 1999; Alberch et al. 2005; Di Maria et al. 2006).
One biologically compelling candidate thought to be involved in HD pathogenesis is PPARGC1A, localized at 4p15.1-2, which encodes peroxisome proliferator-activated receptor γ coactivator 1α (PGC-1α), a transcriptional regulator of adaptive thermogenesis (Puigserver et al. 1998) and of mitochondrial respiration and oxidative stress (Puigserver and Spiegelman 2003; St-Pierre et al. 2006). The lack of PGC-1α expression produces an HD-like phenotype in mice (Lin et al. 2004; Leone et al. 2005), and mutant, but not wild-type, huntingtin down-regulates the expression of PGC-1α and its target genes (Lin et al. 2004; Weydt et al. 2006). Moreover, three recent studies, with DNA samples from European HD patients, mainly from Italy and Germany, have reported an association of PGC-1α with AO of HD symptoms (Che et al. 2011; Weydt et al. 2009; Taherzadeh-Fard et al. 2009). Although different sets of PPARGC1A SNPs were included in these studies, one polymorphism (rs7665116), located at the 3′-end region of intron 2, was associated with a later AO in all three studies, displaying a significant effect in both an additive and a dominant model. Following these results, it was reported that polymorphisms in PGC-1α downstream target genes, namely nuclear respiratory factor 1 (NRF1) and mitochondrial transcription factor A (TFAM), may influence the AO in HD (Taherzadeh-Fard et al. 2011). Given that results based on analyses of the PPARGC1A rs7665116 SNP are motivating a broader range of research into the functional basis of the effect, the aim of the present study was to attempt to replicate the association of this SNP with AO in a much larger cohort of 1,727 HD patients from different European populations.

Subjects
We analyzed 1,929 HD patients with known AO of overt motor symptoms. The DNA samples were from subjects involved in long-term genetic studies from collaborating investigators (HD-MAPS), the HD observational study COHORT, the Harvard Tissue Resource Center Bank (McLean's Hospital, Belmont MA) and the National Neurological Research Bank (VAMC Wadsworth Division, Los Angeles CA). These studies included related individuals [from 1,676 different families, defined based either on the likelihood of genetic similarity from genome-wide genotyping information (Western European samples) or on membership in nuclear (parents and children) families (Southern European samples)]. Of these, 934 were self-reported as originally from Southern European countries (263 from Portugal, 664 from Italy, 5 from Spain and 2 from Greece); the rest of the cases had unconfirmed or no geographical origin data. In total, 1,020 of these were genotyped using the GeneChip Human Mapping 500K Array Set (Affymetrix) at the Broad Institute of Harvard and MIT as part of a genome-wide scan for HD genetic modifiers.

Genotyping
The HD CAG repeat length was determined by a polymerase chain reaction (PCR) amplification assay, using fluorescently labeled primers, as previously described (Warner et al. 1993). The size of the fragments was determined using the ABI PRISM 3730xl automated DNA Sequencer (Applied Biosystems, Foster City, CA, USA) and GeneMapper version 3.7 software. A set of HD CAG alleles, determined by DNA sequencing, was used as standards. Genotyping of the PGC-1α polymorphism (rs7665116) was performed by real-time PCR using the commercially available TaqMan Genotyping probe (Applied Biosystems, Foster City, CA, USA), carried out on the LightCycler® 480 (Roche Diagnostics, Mannheim), following the manufacturer's instructions.

Statistics
For the 1,020 samples with whole-genome genotyping, principal component analysis (PCA) was carried out using PLINK v1.05 (http://pngu.mgh.harvard.edu/Purcell/plink/) (Purcell et al. 2007) in order to determine the genetic ancestry of these individuals. Briefly, genotypes of HD samples were combined with HapMap Phase 2 data (CEPH, Yoruba, Han-Chinese and Japanese populations) for pairwise IBD estimation and subsequent IBS clustering.
To assess differences in the mean motor AO between Western and Southern European samples, we used generalized estimating equations (GEE), thereby adjusting for related samples. Multivariate analyses were generated using GEE to assess the effect of the rs7665116 SNP at the PGC-1α gene on HD residual motor onset, adjusting for familial correlation. Residual motor onsets were computed as the difference between the observed and expected age of onset and were standardized to a mean of zero and a standard deviation of one. The weighted GEE was computed assuming an independent correlation structure and using the robust estimator of the variance to account for familial relationships. All statistical analyses were performed using PASW Statistics (version 18).
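The standardization of residual motor onset described above can be illustrated with a minimal sketch. The function `expected_onset` and its coefficients are hypothetical placeholders chosen only to give plausible ages; they are not the CAG-to-onset model used in this study. The sketch assumes the population standard deviation is used for scaling.

```python
import math
from statistics import mean, pstdev


def expected_onset(cag, intercept=6.0, slope=-0.05):
    """Hypothetical log-linear CAG-to-onset model (placeholder coefficients)."""
    return math.exp(intercept + slope * cag)


def standardized_residuals(cags, onsets, expected=expected_onset):
    """Observed minus expected onset, rescaled to mean 0 and SD 1."""
    raw = [ao - expected(cag) for cag, ao in zip(cags, onsets)]
    mu = mean(raw)
    sd = pstdev(raw)  # population SD; assumes the residuals are not all equal
    return [(r - mu) / sd for r in raw]
```

A positive standardized residual then marks a patient whose motor onset was later than expected for the CAG repeat length, and a negative one marks earlier onset.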

Results
We genotyped a collection of 1,929 HD DNA samples, with known HD CAG allele sizes and known age at onset of motor symptoms, for the PPARGC1A rs7665116 polymorphism. The observed genotype frequency of this SNP was in Hardy-Weinberg equilibrium. Since, in two of the previous reports, the association with AO was primarily observed in HD patients of Italian ancestry (Che et al. 2011; Weydt et al. 2009), we split our large cohort by ancestry into either Southern European or Western European HD cases. The Southern European HD cases (n = 934) consisted of self-reported Portuguese (n = 263), Italian (n = 664), Spanish (n = 5) and Greek (n = 2) HD cases. The Western European HD cases were chosen from amongst another 1,020 HD patients by use of principal component analysis (PCA) on available whole-genome genotyping data, to infer their genetic background. The first principal component (PC1) distinguished Africans from non-Africans and the second principal component (PC2) distinguished Africans and Europeans from Asians (data not shown), which allowed us to exclude from our analysis the few samples that had a significant contribution of either Asian or African ancestry. Among the remaining (n = 952) European cases, the Western European cluster (n = 793) was defined by overlap with the US Northern-Western European origin CEPH (HapMap) cluster, and consisted mainly of persons with self-reported North-American origin (Canada and US) as well as French and Irish. Thus, we had a total of 1,727 HD patients with assigned ancestry: 934 Southern European and 793 Western European (Table 1).
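As an illustration of a Hardy-Weinberg equilibrium check, the following sketch applies the standard one-degree-of-freedom chi-square goodness-of-fit test to genotype counts. The text does not state which test was used in this study, so this is one common choice rather than the authors' procedure, and the function name is hypothetical. It assumes both alleles are present in the sample.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square test (1 df) of Hardy-Weinberg proportions from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a 1-df chi-square: P(X > chi2) = erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value
```

A large p value is consistent with Hardy-Weinberg proportions, whereas a small one may point to genotyping error or to population substructure.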
Remarkably, analysis of the clinical data for these 1,727 HD cases, which had CAG alleles ranging from 36 to 87 repeats, revealed that the self-reported Southern Europeans (n = 934) had significantly later onset of motor HD symptoms (p < 0.001), by 4-5 years, compared to the Western Europeans (n = 793), though the means/medians for HD CAG repeat length were similar in both groups (Table 1). Furthermore, the observed rs7665116 genotypes for samples from the Southern European countries revealed a higher minor allele frequency (MAF) (~17%) when compared to the Western European set (~12%) (Table 1). These findings, together with the striking differences in the MAF of this polymorphism among the different HapMap populations, strongly suggested that population stratification might increase type I errors in the AO association analysis.
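For reference, the MAF of a biallelic SNP follows directly from genotype counts, as in this minimal sketch. The function name and the counts in the usage note are hypothetical and are not taken from Table 1.

```python
def minor_allele_frequency(n_aa, n_ab, n_bb):
    """Frequency of the rarer allele, computed from the three genotype counts."""
    total_alleles = 2 * (n_aa + n_ab + n_bb)
    freq_b = (n_ab + 2 * n_bb) / total_alleles
    return min(freq_b, 1.0 - freq_b)
```

For example, hypothetical counts of 50, 40 and 10 for the three genotypes give 60 copies of the rarer allele among 200 alleles.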
Finally, we have recently shown that the non-normal distribution of CAG allele size (and AO) also introduces error in conventional statistical analysis (Lee et al. 2012). Even a single CAG outlier sample, with a very long CAG repeat and extremely young age at onset relative to all others, can have a profound effect on the final result when testing for the effects of potential genetic modifiers (Lee et al. 2012). Therefore, as a final filter, we chose only the Southern and Western European HD cases with CAG alleles in the 40- to 53-repeat range, shown previously to yield a statistically well-behaved data set that conforms to the fundamental assumptions of linear regression analysis (constant variance and normally distributed error) (Lee et al. 2012).
As summarized in Table 1, these filtering steps yielded a total of 879 self-reported Southern European cases from 823 families and 749 Western European cases (matched by use of PCA) from 620 different families, with CAG repeats ranging from 40 to 53 repeats. Notably, as observed for the larger set of HD patients with a broader CAG repeat range, in this final set of 1,628 patients the self-reported Southern Europeans had a significantly older mean age at onset of motor symptoms than Western Europeans (Table 1), which was observed across the spectrum of CAG allele sizes (Fig. 1), despite a similar mean/median HD CAG repeat length (Table 1). Furthermore, the rs7665116 MAF was higher in the Southern European than in the Western European patients (Table 1). Using this final set of 1,628 HD patients, we then performed an analysis to determine whether rs7665116 may contribute to variance in HD motor onset not explained by the length of the expanded CAG repeat. In order to adjust for familial relationships, the effect of rs7665116 on residual motor onset was calculated using generalized estimating equations (GEE). In the unadjusted analysis, a significant association with later residual AO was observed for both the additive genetic model (β = 0.090, p value = 0.009) and the dominant model (β = 0.113, p value = 0.008) (Table 2). However, adjusting the analysis for ancestry (Southern vs. Western European), in both the additive and the dominant models, produced a striking impact on the effect sizes (β decreased by ~25%) and the p values (p increased ~4×) (Table 2), thereby revealing that population stratification is a large contributor to an apparent rs7665116 association.
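The mechanism by which ancestry adjustment can shrink an apparent genotype effect can be sketched with a deliberately extreme toy example. All values below are invented: within each group the minor allele dosage has no relation to residual onset, but the group with the higher MAF also has the later onset. The sketch uses ordinary least squares point estimates, which coincide with GEE coefficient estimates under an independence working correlation and identity link; robust standard errors and p values are omitted.

```python
from statistics import mean


def slope(x, y):
    """Ordinary least squares slope of y on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def adjusted_slope(x, y, group):
    """Slope of y on x adjusted for a categorical covariate.

    Centering x and y within each group and regressing the centered values
    is equivalent to including group indicators in the regression.
    """
    xc, yc = [], []
    for g in set(group):
        idx = [i for i, v in enumerate(group) if v == g]
        mx = mean(x[i] for i in idx)
        my = mean(y[i] for i in idx)
        xc += [x[i] - mx for i in idx]
        yc += [y[i] - my for i in idx]
    return slope(xc, yc)


# Invented toy data: minor allele dosage (0, 1 or 2) and standardized residual onset.
west_dosage = [0, 0, 0, 0, 1, 1]
west_resid = [-0.7, -0.3, -0.6, -0.4, -0.6, -0.4]
south_dosage = [0, 0, 1, 1, 1, 2]
south_resid = [0.3, 0.7, 0.4, 0.6, 0.5, 0.5]

dosage = west_dosage + south_dosage
resid = west_resid + south_resid
ancestry = ["West"] * 6 + ["South"] * 6

pooled_beta = slope(dosage, resid)                       # ignores ancestry
adjusted_beta = adjusted_slope(dosage, resid, ancestry)  # accounts for ancestry
```

In this toy data set the pooled slope is clearly positive while the ancestry-adjusted slope is essentially zero, because the whole signal comes from between-group differences. In the real data the adjustment reduced the effect only partially, so the toy example exaggerates the phenomenon for clarity.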

Discussion
Previous studies have reported the presence of a common polymorphism in PPARGC1A (rs7665116) that is associated with a delay in AO of HD motor symptoms in three European HD cohorts (Che et al. 2011; Taherzadeh-Fard et al. 2009; Weydt et al. 2009), primarily contributed by patients from Italy. Our study, which involved a larger collection of HD cases, did not provide strong evidence for this SNP, and therefore for PGC-1α, as a modifier of HD motor onset, but did strongly support further investigation of the factors that contribute to the striking differences in AO of motor symptoms in 'Southern Europeans'.
The results of our study expose genetic ancestry as a critical factor in HD association studies. It is expected that a disease-associated polymorphism may have varying effects in different populations, but the variation in minor allele frequency across different genetic backgrounds related to ancestry may also be critical and should be taken into account in genetic association analysis. Cases that are poorly matched for genetic background may lead to false positives in association studies. It is important to control for ancestry by use of PCA or related methods, even in apparently closely related populations such as Europeans (Novembre et al. 2008), where there is strong evidence of recent population selection that has led to intra-European variation in allele frequency (Price et al. 2008). This is particularly important in studies using the CEPH-CEU panel as controls, since the genetic matching to different populations may differ considerably in different countries (Lao et al. 2008). It is difficult to accurately infer ancestry in candidate gene association studies, leading to imperfect correction for stratification. However, with increasing availability of genome-wide datasets, the assessment of population structure should become a common procedure for candidate association studies.
Our study points out the dramatic effect that population stratification can have in testing a candidate gene for an association with disease phenotype. We found that the variation in rs7665116 minor allele frequency could lead to a false positive if genetic ancestry is not corrected for in the analysis. The Southern European cases seem to be different genetically and clinically, in terms of age at diagnosis of motor symptoms, from other European samples. By adjusting for ancestry, we observed striking effects on both the p values (increased ~4×) and effect sizes (decreased by ~25%). Even though the post-adjustment p values remained nominally significant, the dramatic reduction in significance occasioned by considering ancestry does not lend confidence to its being a true effect and rather suggests that it is due to insufficiently rigorous ancestry categorization of the 'Southern European' set, which was based solely on self-reporting rather than unbiased genome-wide genotyping data. Consistent with this interpretation, while this manuscript was under review, an article by Soyal et al. (2012) reported that no association of rs7665116 with AO to first symptom (not necessarily motor) was found in a European cohort of 1,706 HD patients whose MAF for this SNP closely resembled the MAF that we report for the Western European samples in our study. A remarkable finding from our analysis is the strong evidence of later motor onset for HD patients originally from Southern European countries (Portugal and Italy), who have a reported onset of motor symptoms that is 4-5 years later than that of HD patients from other European regions, despite similar mean/median CAG allele size.
This striking difference may reflect population differences elsewhere in the genome, since extensive genome-wide SNP analyses have shown that even though European populations share much of their genetic background, they also exhibit a notable degree of non-sharing (ancestry) (Lao et al. 2008; Novembre et al. 2008). It is important to note that, in addition to genetic background differences, a number of other factors may contribute to the difference in age at onset. One possibility is environmental influences, for example, differences related to lifestyle, or perhaps types of medication. A meta-analysis has found evidence that people who adhere to a Mediterranean diet appear to have a reduced risk of developing Parkinson's and Alzheimer's disease (Sofi et al. 2008), and an altered Alzheimer's disease course (Scarmeas et al. 2007). A recent observational study of HD in Europe has shown that Southern European (and Polish) clinicians prescribed anti-dyskinetic medication more frequently than clinicians in other European regions (Orth et al. 2010). Another environment-related factor that is likely to contribute is variation in the criteria and procedures for diagnosing HD, which may differ between cultures. It is now important to understand which of many potential population-specific genetic and/or environmental factors are associated with later reported AO of motor symptoms in Southern Europeans.

Conclusion
The results of our study do not provide strong evidence for PPARGC1A SNP rs7665116, and therefore for PGC-1α, as a modifier of age at onset of HD motor symptoms. However, we have found evidence of a significantly later age at onset of motor symptoms in Southern European countries, which may reflect genetic effects and/or environmental (lifestyle, diagnosis) factors that should be further explored.
Our data strongly illustrate the false contribution that population stratification may make in a candidate gene association study, while providing genetic evidence that the contribution of PGC-1α as a modifier of the disease process that leads to onset of HD motor symptoms may not be significant.
Acknowledgments We thank the HD families, whose participation in genetic studies made this work possible, the COHORT co-investigators and contributors (see "Appendix"), contributors to the HD-MAPS study, Ruth Abramson, Alexandra Durr, Adam Rosenblatt, Luigi Frati, Susan Perlman, P. Michael Conneally, Mary Lou Klimek, Melissa Diggin, Tiffany Hadzi and Ayana Duckett, as well as the Harvard Brain Tissue Resource Center at McLean Hospital and the National Neurological Research Specimen Bank at the VA West Los Angeles Healthcare Center. EMR is the recipient of a scholarship from the Fundação para a Ciência e a Tecnologia of Portugal (SFRH/BD/44335/2008). Study funding was provided by NIH grants from the NINDS, NS16367 (The Massachusetts HD Center Without Walls) and NS32765, and by the CHDI Foundation, Inc.

Conflict of interest
The authors declare that they have no conflict of interest.

Ethical Standards
This study used only deidentified, previously collected DNA samples and phenotypic data in a manner approved by the Institutional Review Board of Partners HealthCare, Inc.
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.