Introduction

Type 2 diabetes and obesity are complex traits, caused by multiple environmental and genetic factors. In recent decades, there has been a dramatic rise in the prevalence of type 2 diabetes and obesity in the Western and developing world. Adaptation to powerful selective forces for genotypes that provide survival advantage has been proposed as an explanation for this observed capacity of a genetic disease to become so prevalent when unmasked by changes in environment. In 1962, James Neel suggested that exposure to periods of famine during human evolutionary history resulted in selection pressures in favour of a thrifty genotype that led to highly efficient fat storage during periods of abundance [1]. In the current climate of food overabundance and sedentary lifestyle, this thrifty genotype is suggested to lead to metabolically disadvantageous phenotypes.

Signals of positive selection resulting in reduced haplotype diversity can be identified by investigating haplotype structure and allelic architecture. For example, if the thrifty genotype hypothesis were true, we would expect to observe some of the following characteristics at disease loci: risk alleles would be derived alleles; there would be substantial differences in allele frequency across different populations; and there would be evidence that relatively recently emerging alleles have been swept to high frequency. These tests detect selection signals operating over different time scales (ranging from recent positive selection identified through extreme integrated haplotype scores [iHSs] to the much older time frame of derived/ancestral allele status), and we would therefore not expect to obtain consistent evidence across the different tests.

The fields of type 2 diabetes and obesity genetics had until recent years met with limited success in identifying replicating loci. The advent of large-scale, well-designed association studies, coupled with large-scale follow-up and stringent criteria for declaring reproducible association, has led to the identification of well-established type 2 diabetes and obesity loci. This enables us for the first time to carry out a systematic examination of these genomic loci for evidence of signatures of selection, and thereby seek to corroborate or refute the thrifty genotype hypothesis.

Methods

For the purposes of this study, we define a confirmed type 2 diabetes or obesity locus as one that has been robustly replicated, reaching a genome-wide significance threshold of p < 5 × 10⁻⁸. This criterion yields 17 loci for type 2 diabetes (in or near the TCF7L2, PPARG, KCNJ11, CDKAL1, SLC30A8, IGF2BP2, NOTCH2, THADA, JAZF1, CDC123/CAMK1D, TSPAN8/LGR5, HHEX/IDE, CDKN2A/B, ADAMTS9, TCF2, WFS1 and KCNQ1 genes) [2] and 13 for obesity (associations with BMI) (in or near the FTO, TMEM18, MC4R, GNPDA2, SH2B1, KCTD15, MTCH2, NEGR1, PCSK1, LGR4/LIN7C/BDNF [two independent single nucleotide polymorphisms {SNPs}], ETV5/SFRS10/DGKG and MAF genes) [3–8] (Tables 1 and 2). We have selected a representative (index) SNP for each of these 30 independently associated loci and have examined several characteristics of the genomic sequence that might indicate evidence for selection.

Table 1 Type 2 diabetes-associated risk allele characteristics
Table 2 Obesity-associated risk allele characteristics

First, we determined whether the risk allele at the index SNPs is the ancestral or derived allele, using information available through dbSNP build 128 (www.ncbi.nlm.nih.gov/SNP/, accessed February 2009), based on chimpanzee/human sequence alignment.
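The ancestral/derived assignment described above can be sketched as follows. This is a minimal illustration, assuming that the chimpanzee allele at the aligned position is taken as the ancestral state; the function name and the example alleles are hypothetical and are not taken from dbSNP.

```python
from typing import Optional


def risk_allele_status(risk: str, other: str, chimp: str) -> Optional[str]:
    """Classify a risk allele as 'ancestral' or 'derived'.

    Assumption (hypothetical helper): the chimpanzee allele at the aligned
    position represents the ancestral state. Returns None when the
    chimpanzee allele matches neither human allele.
    """
    risk, other, chimp = risk.upper(), other.upper(), chimp.upper()
    if chimp == risk:
        return "ancestral"
    if chimp == other:
        return "derived"
    return None  # ancestral state cannot be inferred from this alignment


# Made-up alleles, for illustration only.
print(risk_allele_status(risk="T", other="C", chimp="C"))  # prints "derived"
```

A locus for which the function returns None would need to be resolved by other means, for example a further outgroup species.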

We also calculated population differentiation statistics (fixation index FST) for the 30 loci in the three HapMap phase II populations: Centre d’Etude du Polymorphisme Humain (CEPH) (Utah residents with northern and western European ancestry) (CEU); Yoruba in Ibadan, Nigeria (YRI); and Japanese in Tokyo (JPT) + Han Chinese in Beijing, China (CHB) [9]. FST measures the proportion of total genetic variance that is caused by differences between two or more population samples. Local selection acting on a given locus can result in elevated FST values between two populations. We can identify loci that have unusually high FST values by comparing against the rest of the genome, which provides an empirical null distribution. The use of an empirical FST distribution in this case is advantageous, because it does not require assumptions about the structure of human populations, SNP ascertainment bias (which differs among the three HapMap population samples) and differences in local linkage disequilibrium patterns among different populations. We constructed an empirical FST distribution using over 2.9 million SNPs, that is, the subset of all HapMap Phase II SNPs with genotype data available in all three reference samples (HapMap Release 22, April 2007). We compared the observed FST values for the obesity and type 2 diabetes loci with the upper 5% tail of the distribution (values above the 95th percentile) to obtain a one-tailed test for diversifying selection.
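The comparison of a locus against an empirical genome-wide distribution can be sketched as follows. The text does not state which estimator of the fixation index was used, so this sketch assumes the simple heterozygosity-based form (H_T − H_S)/H_T for two equally weighted populations; the function names and all numerical values are hypothetical, and the output is not expected to reproduce the values in Tables 1 and 2.

```python
from bisect import bisect_left
from typing import Sequence


def fst_two_pop(p1: float, p2: float) -> float:
    """Fixation index for one biallelic SNP in two equally weighted populations.

    Assumed estimator: (H_T - H_S) / H_T, where H_T is the expected
    heterozygosity at the pooled allele frequency and H_S is the mean
    within-population expected heterozygosity.
    """
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both populations
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    return (h_t - h_s) / h_t


def upper_tail_percentile(observed: float, genome_wide: Sequence[float]) -> float:
    """Percentage of genome-wide values that are >= the observed value."""
    ranked = sorted(genome_wide)
    n_smaller = bisect_left(ranked, observed)  # values strictly below observed
    return 100.0 * (len(ranked) - n_smaller) / len(ranked)


# Toy genome-wide distribution (made-up values, for illustration only).
toy_distribution = [0.01, 0.02, 0.05, 0.10, 0.15, 0.20, 0.30, 0.40, 0.50, 0.60]
observed = fst_two_pop(0.9, 0.1)  # strongly differentiated hypothetical SNP
pct = upper_tail_percentile(observed, toy_distribution)
print(f"value = {observed:.3f}, upper-tail percentile = {pct:.1f}")
```

Under this convention, a locus would be flagged by the one-tailed test when its upper-tail percentile is below 5.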

We additionally investigated evidence for natural selection by examining the iHS, a measure of recent positive selection for variants that have not yet reached fixation [10, 11]. This statistic identifies SNPs for which alleles have rapidly changed in frequency by comparing the haplotype background of the ancestral and derived alleles. Negative iHS values indicate that the derived allele resides on a longer haplotype, whereas positive iHS values suggest that the ancestral allele resides on a longer haplotype. For the purposes of this study, we define iHS <−1.5 or iHS >1.5 as suggestive evidence for natural selection, and iHS <−2 or iHS >2 as evidence for a powerful selection signal [10]. We determined the iHS for each locus in HapMap phase II data using Haplotter (http://hg-wen.uchicago.edu/selection/haplotter.htm, accessed February 2009) [10, 11].
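The thresholds and sign convention defined above can be written as a small classifier. This is a minimal sketch; the function name and the returned labels are hypothetical, and the example values are the two iHS values reported in the Results.

```python
from typing import Tuple


def classify_ihs(ihs: float) -> Tuple[str, str]:
    """Return (strength of signal, allele on the longer haplotype).

    Thresholds follow the text: |iHS| > 2 is treated as a powerful signal
    and |iHS| > 1.5 as suggestive. Negative values point to the derived
    allele, positive values to the ancestral allele.
    """
    magnitude = abs(ihs)
    if magnitude > 2.0:
        strength = "strong"
    elif magnitude > 1.5:
        strength = "suggestive"
    else:
        strength = "none"

    if ihs < 0:
        longer_haplotype = "derived"
    elif ihs > 0:
        longer_haplotype = "ancestral"
    else:
        longer_haplotype = "neither"
    return strength, longer_haplotype


print(classify_ihs(2.249))  # prints ('strong', 'ancestral')
print(classify_ihs(1.991))  # prints ('suggestive', 'ancestral')
```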

Results

Evidence that type 2 diabetes- or obesity-associated risk alleles were more often derived than ancestral would be consistent with positive selection. In type 2 diabetes, we found the risk allele to be the derived allele at six of the 17 loci (CDC123/CAMK1D, TSPAN8/LGR5, NOTCH2, CDKN2A/B, IGF2BP2 and KCNJ11) (binomial test one-sided p = 0.93) (Table 1). Similarly, we did not observe a significant overrepresentation of derived status for the obesity-risk alleles (seven [MC4R, GNPDA2, MTCH2, NEGR1, PCSK1, LGR4/LIN7C/BDNF and ETV5/SFRS10/DGKG], p = 0.50) (Table 2). Among the type 2 diabetes loci, ten risk alleles are major and seven minor (binomial test two-sided p = 0.63) (Table 1). Among the obesity-risk alleles, six are major and seven are minor (p = 1.00) (Table 2).
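The binomial p values above can be checked with an exact calculation. This is a minimal sketch, assuming a null probability of 0.5 for each locus and, for the two-sided test, the convention of summing the probabilities of all outcomes that are no more likely than the observed one; the text does not state which software or convention was used, and the function names are hypothetical.

```python
from math import comb


def binom_pmf(k: int, n: int, p: float = 0.5) -> float:
    """Probability of exactly k successes in n trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binom_one_sided_upper(k: int, n: int, p: float = 0.5) -> float:
    """One-sided p value: probability of k or more successes."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))


def binom_two_sided(k: int, n: int, p: float = 0.5) -> float:
    """Two-sided p value (assumed convention): total probability of all
    outcomes that are no more likely than the observed count."""
    observed = binom_pmf(k, n, p)
    total = 0.0
    for i in range(n + 1):
        prob = binom_pmf(i, n, p)
        if prob <= observed * (1 + 1e-9):  # tolerance for floating-point ties
            total += prob
    return min(1.0, total)


# Derived risk alleles: 6 of 17 (type 2 diabetes), 7 of 13 (obesity).
print(f"{binom_one_sided_upper(6, 17):.2f}")  # prints 0.93
print(f"{binom_one_sided_upper(7, 13):.2f}")  # prints 0.50
# Major risk alleles: 10 of 17 (type 2 diabetes), 6 of 13 (obesity).
print(f"{binom_two_sided(10, 17):.2f}")  # prints 0.63
print(f"{binom_two_sided(6, 13):.2f}")  # prints 1.00
```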

Only one locus (rs7901695 at TCF7L2) showed a clearly elevated FST value: 0.579 (2.1 percentile) between the JPT + CHB and YRI samples (previously also noted [12]). The same SNP showed a borderline value in the comparison between CEU and JPT + CHB (FST = 0.323, 5.2 percentile) (Table 1). SNP rs5215 at KCNJ11 showed a similarly borderline FST value of 0.384 between CEU and YRI (5.9 percentile) (Table 1).

Among the type 2 diabetes-associated loci, the NOTCH2 rs10923931 index SNP demonstrated an elevated iHS value (2.249, 2.3 percentile) for the protective, ancestral allele (Table 1). Among the BMI-associated SNPs, the strongest signal of positive selection was obtained for the FTO locus, with an iHS value of 1.991 (4.4 percentile) (Table 2). No general enrichment for high FST or long haplotypes was observed for the set of diabetes- or obesity-associated SNPs (using Mann–Whitney significance testing).

Discussion

We have not observed significant evidence for overrepresentation of ancestral/derived status or for minor/major frequency at type 2 diabetes- or obesity-risk alleles. Only one locus (TCF7L2, associated with type 2 diabetes) demonstrates large allele frequency differences across populations. Although this is consistent with chance, we note that TCF7L2 has the largest effect size identified in type 2 diabetes to date and, as such, may have been more susceptible to selection forces. Notably, we did not find strong evidence for high differentiation of rs2237892 at KCNQ1 between the European and East Asian samples (FST = 0.209, 13.3 percentile of the empirical distribution). The risk allele C at this locus has frequencies close to 90% in the CEU and YRI HapMap samples and close to 60% in the two East Asian samples.

Our analyses indicate the presence of extended haplotypes at the FTO locus, which has the largest effect size for obesity found to date. However, we have not identified any consistent footprint of selection across the loci that would support the notion of a universal mechanism to explain the high prevalence of type 2 diabetes and obesity. The number of robustly replicating type 2 diabetes and obesity loci is poised to grow, offering the promise of an extended list of established disease loci. In addition, expansion of association studies to populations of non-European descent is likely to broaden the spectrum of robustly associated allelic variation and may help identify loci with prominent evidence for population differentiation, for example where risk alleles at a SNP have rapidly changed in frequency since population separation. Importantly, the truly causal, functional variants for the majority, if not all, of the established type 2 diabetes- and obesity-susceptibility loci have not yet been determined. We have therefore been restricted to studying index SNPs, representative of the replicating associations. This could affect the variant-specific analyses we have carried out, as index SNPs may provide only indirect glimpses of the history of the causal mutations.

This study has comprehensively considered all known, well-established type 2 diabetes- and BMI-susceptibility variants. Some loci appear to have more ‘thrifty gene’ characteristics than others, but no clear, globally consistent picture emerges. Further insights into the genetic aetiology of these complex traits are likely to help us distinguish between apparent and real signals for positive selection.