Introduction

The heritability seen in type 2 diabetes remains largely unexplained, despite substantial progress in identifying genetic variants conferring increased risk of this condition. To date, ~40 such variants have been identified, largely through genome-wide association studies (GWAS) [1–8]. However, the sibling relative risk (λS) of type 2 diabetes conferred by all these variants combined is ~1.15, well below the epidemiological estimate (~3.0) [8, 9].

There has recently been great interest in the potential role of low-frequency (LF) variants in individual susceptibility to complex diseases such as type 2 diabetes. As GWAS to date have focused on detecting common-variant associations, the contribution to type 2 diabetes risk of variants with a minor allele frequency (MAF) below 5% remains largely unexplored.

A logical place to initiate the search for LF variants influencing multifactorial type 2 diabetes lies in exploring those genes already implicated in diabetes pathogenesis because they contain either rare mutations causal for monogenic forms of diabetes, or common variants associated with multifactorial type 2 diabetes. In genes implicated because of their role in monogenic diabetes, there is evidence that large-effect mutations are compatible with life, and that they result in a phenotype with substantial similarities to (and clinical overlap with) type 2 diabetes. It is likely, therefore, that variants with less dramatic effects on function and/or expression, where they exist, result in less extreme clinical phenotypes including multifactorial type 2 diabetes. As existing GWAS and linkage study data argue against the possibility of common variants (MAF > 5%) of medium- to large-effect size, variants with such effect sizes are also likely to be rare (MAF < 0.1%) or of low frequency (MAF 0.1–5%). LF variants have been implicated in the pathogenesis of other complex diseases, such as type 1 diabetes [10], although their contribution to type 2 diabetes predisposition is as yet uncertain.

Rare, highly penetrant mutations in the gene HNF4A, encoding the transcription factor hepatocyte nuclear factor 4α (HNF-4A), account for approximately 5% of cases of MODY [11]. Though HNF4A is expressed in multiple tissues, its expression in the pancreatic beta cells and liver is of particular interest. In pancreatic beta cells, HNF-4A is required for glucose metabolism as well as normal insulin gene expression and secretion [12]. In the liver, HNF-4A is required for hepatic gluconeogenesis [13]. Several studies have shown linkage between multifactorial type 2 diabetes and the region of chr20q where HNF4A is located [14–17]. Previous candidate gene analyses have demonstrated weak evidence of association (p ~ 0.01) between common variants in the P1 and P2 promoters of HNF4A and multifactorial type 2 diabetes [17, 18], but these have not been substantiated in GWAS to date [1–5, 8]. As common variants in HNF4A do not explain the findings of linkage studies, it is possible that this region harbours more penetrant LF variants that might explain this observation [19].

HNF4A has been extensively re-sequenced, not least as part of clinical diagnostic screening for MODY. These re-sequencing efforts have, inter alia, identified two LF coding non-synonymous variants of particular interest: V255M (c.763G>A p.Val255Met) and T130I (c.389C>T p.Thr130Ile, rs1800961). V255M was first described following re-sequencing of Danish samples, but no evidence of association with type 2 diabetes was seen in an analysis of 1,434 cases and 4,790 controls [20]. T130I, positioned in the DNA binding domain of HNF4A, showed modest (p = 0.04) association with type 2 diabetes in the same sample [20], though subsequent efforts at replication failed to confirm this [21]. One arm of a meta-analysis of the association of HNF4A genetic variants with type 2 diabetes [22] also included some previous association studies of T130I (by our estimation including approximately 3,500 cases and 3,700 controls for this variant), and demonstrated a modest association (p = 0.045). Most recently, and of particular interest given the relationship between lipids and type 2 diabetes, a significant association between T130I and HDL-cholesterol levels has been demonstrated (p = 8 × 10⁻¹⁰) in a GWAS meta-analysis incorporating 30,714 individuals [23].

Both variants have been shown to be functional based on studies of the transcriptional regulation of HNF-4A target genes in a range of cell lines and primary mouse hepatocytes [20, 24–26]. We therefore reasoned that they remain interesting candidates for assessment in larger samples to more clearly establish their likely contribution to type 2 diabetes susceptibility.

Methods

Individuals studied

Three categories of samples were included. Category 1 consisted of samples specifically genotyped for this study. Category 2 comprised samples with previously reported genotyping information for these single nucleotide polymorphisms. Category 3 included samples for which only summary statistics were available from previous published reports.

Category 1 samples were derived from three sources (two UK samples and one Danish sample). UK sample 1 (‘UK1’, n = 4,124 cases, 5,126 controls) included the UK Type 2 Diabetes Genetics Consortium (UKT2DGC) collection recruited in Tayside, Scotland: these have been previously described [1, 27]. UK sample 2 (‘UK2’) comprised type 2 diabetes cases (n = 1,853 for V255M; 1,193 for T130I) ascertained from a subset of the Diabetes UK Warren 2 repository [28]. The controls for UK2 were taken from the population-based British 1958 Birth Cohort (n = 7,133), and the UK Blood Services Collection (n = 3,087) [27].

Danish sample 1 (‘DK1’, n = 2,646 cases) was also included in category 1 for the study of T130I. DK1 represents samples collected in the Steno Diabetes Centre and Danish samples from the Anglo–Danish–Dutch study of Intensive Treatment in People with Screen-Detected Diabetes in Primary Care (ADDITION) [20, 29]. The new samples in DK1 were combined with the previously reported case and control data from DK2 (described below) to generate a combined DK analysis of 3,771 cases and 4,727 controls.

Category 2 included samples from Denmark, Sweden, Finland and Canada. Danish sample 2 (‘DK2’, n = 1,397 cases; 4,865 controls) was previously genotyped for T130I and V255M [20]. Two samples from the Finland–United States Investigation of Non Insulin Dependent Diabetes Mellitus Genetics (FUSION) study were included for T130I (FUSION sample 1, ‘FS1’, [n = 1,160 cases; 1,173 controls] and FUSION sample 2, ‘FS2’, [n = 1,211 cases; 1,264 controls]) [4]. FS1 and FS2 represent the FUSION GWAS and replication samples, respectively, and have been included in a type 2 diabetes [2] and a lipid GWAS [23] and subsequent follow-up of significant findings. The numbers of individuals quoted for FS1 and FS2 differ slightly from those in the reference article as a consequence of DNA availability, the withdrawal of some individuals and the updated type 2 diabetes status of others. The recruitment criteria for these samples have been reported by Zeggini et al. [4]. We also included samples from the Metabolic Syndrome in Men (METSIM) study (‘MS1’, n = 801 cases; 3,043 controls) recruited in Finland [30]. The T130I genotype data for MS1 were included as part of the lipid GWAS follow-up [23], though type 2 diabetes data have not been published. Previously reported genotyping results for T130I from three samples from the Broad Institute were also included [21]. These comprised a Canadian sample (Broad sample 1, ‘BR1’, n = 127 cases; 127 controls), a combined Swedish/Finnish sample (Broad sample 2, ‘BR2’, n = 490 cases; 490 controls) and a Swedish sample (Broad sample 3, ‘BR3’, n = 514 cases; 514 controls).

All studies were approved by local ethics committees and were performed in accordance with the principles of the Helsinki Declaration II. Informed consent was obtained from all individuals before participation. Detailed descriptions of category 1 and 2 samples are included in the Electronic supplementary material (ESM). All participant characteristics are summarised in Table 1.

Table 1 Study participant characteristics

For a more complete study of T130I, the results from these category 1 and 2 samples were included in a meta-analysis together with those from all previously published studies (category 3) for which summary statistics were available. For these category 3 samples we had no access to genotype information. These included a Pima Indian sample (‘PI1’, n = 573 cases; 464 controls) [31] and a Japanese sample (‘JP1’, n = 423 cases; 354 controls) [24] included in the meta-analysis by Sookoian et al. [22] in addition to a Mexican sample (‘MX1’, n = 100 cases; 75 controls) [32].

Genotyping and quality control

Genotyping of T130I in the UK samples was carried out using a TaqMan assay on the ABI 7900HT platform (Applied Biosystems, Warrington, Cheshire, UK). A KBioscience allele-specific PCR (KASPar) assay (KBioscience, Hoddesdon, UK) was used in the genotyping of V255M in the UK samples and T130I in DK1. The quality of the genotyping was assured by: (1) assessing the genotyping pass rate (>96% globally); (2) evaluating the estimated error rate based on completed duplicate pairs (UK samples: 0.00% for V255M, n = 314 duplicate pairs and 0.18% for T130I, 268 duplicate pairs; DK1: 0.20% based on 521 duplicate samples); and (3) assessing for departure (p < 0.05) from Hardy–Weinberg equilibrium (none detected). Genotyping methods and quality control measures for DK2 [20], Broad samples [21], FUSION samples [2, 23] and MS1 [23] have been previously reported.
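As an illustration of check (3), the sketch below tests genotype counts for departure from Hardy–Weinberg equilibrium using a 1-df χ² goodness-of-fit test. The source does not state which HWE test was used, and for low-frequency variants an exact test is generally preferable because the expected count of rare homozygotes is small. The function name and the genotype counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Return (chi2, p_value) for a 1-df chi-square test of Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)   # frequency of allele A
    freq_b = 1 - freq_a
    expected = (n * freq_a ** 2, 2 * n * freq_a * freq_b, n * freq_b ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Chi-square survival function for 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Hypothetical counts for a variant with a minor allele frequency of 3%
chi2, p_value = hwe_chi_square(9409, 582, 9)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

Under the criterion described above, a stratum would be flagged only if the resulting p value fell below 0.05.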

Statistical analysis

No heterogeneity of genotype counts was seen between category 1 cases when assessed by an exact Pearson χ² test using StatXact (v6.0: Cytel Software Corp., Cambridge, MA, USA). The same was true of controls. We subsequently carried out a primary association analysis of category 1 samples (UK1, UK2, DK1 and DK2 for T130I, UK1 and UK2 for V255M) (each as a separate stratum) followed by a secondary association analysis including category 1 and category 2 samples (UK1, UK2, DK1, DK2, FS1, FS2, MS1, BR1, BR2 and BR3 for T130I and UK1, UK2 and DK2 for V255M) as separate strata. We used an exact Cochran–Armitage trend test (StatXact v6.0) for all association analyses in this report. The ORs and sample sizes for each stratum from this study were subsequently used in a meta-analysis of T130I incorporating the previously defined category 3 samples [22, 32] using an additive model performed with the Genome Wide Association Meta-Analysis (GWAMA) software package (www.well.ox.ac.uk/gwama). Though these samples are geographically disparate, a low level of heterogeneity of the T130I association effect sizes was detected using GWAMA (I² = 61%; Q statistic p value = 0.10; quantified by the comparison of the samples in categories 1 and 2 with category 3 samples).
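The association analyses here used an exact Cochran–Armitage trend test in StatXact. The sketch below shows the asymptotic (normal-approximation) form of the same statistic for a single stratum, with additive scores 0, 1 and 2 for the number of minor alleles. It is not the exact, stratified procedure used in the study, and the function name and genotype counts are hypothetical.

```python
import math

def cochran_armitage_trend(cases, controls, scores=(0, 1, 2)):
    """Asymptotic Cochran-Armitage trend test; returns (z, two-sided p).

    cases and controls are genotype counts ordered by number of minor alleles.
    """
    total_cases, total_controls = sum(cases), sum(controls)
    n_total = total_cases + total_controls
    columns = [r + s for r, s in zip(cases, controls)]

    # Trend statistic T = sum_i t_i * (r_i * S - s_i * R)
    t_stat = sum(t * (r * total_controls - s * total_cases)
                 for t, r, s in zip(scores, cases, controls))

    # Variance of T under the null hypothesis of no association
    squares = sum(t * t * c * (n_total - c) for t, c in zip(scores, columns))
    cross = sum(scores[i] * scores[j] * columns[i] * columns[j]
                for i in range(len(scores)) for j in range(i + 1, len(scores)))
    variance = (total_cases * total_controls / n_total) * (squares - 2 * cross)

    z = t_stat / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical counts (major homozygote, heterozygote, minor homozygote)
z, p_value = cochran_armitage_trend((900, 95, 5), (940, 58, 2))
print(f"z = {z:.2f}, two-sided p = {p_value:.2g}")
```

A positive z indicates that the minor allele is more frequent in cases than in controls under the additive coding.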

Power calculations derived from QUANTO [33], using the previously reported OR for T130I [20], indicated 99% power to detect an effect size of 1.3 (for α = 0.001) for this variant in our expanded association analysis incorporating UK, Danish, FUSION, METSIM and Broad samples. The power for V255M is lower (12% power to detect the same effect size [for α = 0.001]) because of the much lower MAF. For this variant, we had 80% power to detect an effect size of 3.0 (for α = 0.001).
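The dependence of power on MAF can be illustrated with a simple normal approximation for a two-sided comparison of allele frequencies between cases and controls. This is only a sketch: it is not the QUANTO model and is not expected to reproduce the power figures quoted above. The function name and the sample sizes of 10,000 cases and 10,000 controls are hypothetical.

```python
import math
from statistics import NormalDist

def allelic_power(maf_controls, odds_ratio, n_cases, n_controls, alpha):
    """Approximate power of a two-sided allele-frequency comparison."""
    p0 = maf_controls
    # Allele frequency in cases implied by the per-allele odds ratio
    p1 = odds_ratio * p0 / (1 + (odds_ratio - 1) * p0)
    # Each individual contributes two alleles
    se = math.sqrt(p1 * (1 - p1) / (2 * n_cases) + p0 * (1 - p0) / (2 * n_controls))
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # The negligible probability of rejecting in the wrong direction is ignored
    return NormalDist().cdf(abs(p1 - p0) / se - z_alpha)

# Same hypothetical sample size and OR, two different allele frequencies
for maf in (0.03, 0.001):
    power = allelic_power(maf, odds_ratio=1.3, n_cases=10_000,
                          n_controls=10_000, alpha=0.001)
    print(f"MAF {maf:.3f}: power = {power:.3f}")
```

With everything else held fixed, the rarer variant yields far lower power, which is the qualitative pattern described for V255M relative to T130I.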

Results

T130I was successfully genotyped in 7,645 cases and 14,756 controls in category 1. This variant had a MAF of 3.76% in cases and 3.00% in controls. In category 1 samples, the T allele of this variant was modestly associated with type 2 diabetes (additive per-allele OR 1.20 [95% CI 1.08–1.33]; p = 5 × 10⁻⁴). The expanded association analysis incorporating the category 1 and 2 samples marginally strengthened the evidence for this association (OR 1.17 [95% CI 1.08–1.28]; p = 1.5 × 10⁻⁴) (Table 2). The meta-analysis (Fig. 1) incorporating all available studies of T130I further strengthened it (n = 14,279 cases; 26,835 controls; OR 1.20 [95% CI 1.10–1.30]; p = 2.1 × 10⁻⁵). There was no statistically significant evidence of heterogeneity in our meta-analysis (I² = 61%; Q statistic p value = 0.10).
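The pooled estimate and heterogeneity statistics reported above can be illustrated with a fixed-effect, inverse-variance meta-analysis on the log-OR scale, with Cochran's Q and I² = max(0, (Q − df)/Q). This is a minimal sketch rather than the GWAMA implementation. The per-stratum ORs and standard errors are hypothetical, because the stratum-level values are not reproduced in this text.

```python
import math

def fixed_effect_meta(log_ors, std_errors):
    """Inverse-variance pooling; returns (pooled log OR, pooled SE, Q, I2)."""
    weights = [1 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, log_ors)) / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    # Cochran's Q: weighted squared deviations from the pooled estimate
    q = sum(w * (b - pooled) ** 2 for w, b in zip(weights, log_ors))
    df = len(log_ors) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, pooled_se, q, i2

def chi2_sf_1df(x):
    """Upper-tail probability of a chi-square variable with 1 df."""
    return math.erfc(math.sqrt(x / 2))

# Hypothetical stratum-level odds ratios and standard errors of the log OR
odds_ratios = [1.25, 1.10, 1.30]
std_errors = [0.08, 0.07, 0.15]
pooled, pooled_se, q, i2 = fixed_effect_meta(
    [math.log(o) for o in odds_ratios], std_errors)
low = math.exp(pooled - 1.96 * pooled_se)
high = math.exp(pooled + 1.96 * pooled_se)
print(f"pooled OR = {math.exp(pooled):.2f} [{low:.2f}, {high:.2f}]")
print(f"Q = {q:.2f}, I2 = {100 * i2:.0f}%")

# Q implied by I2 = 61% when df = 1 (an assumption), and its p value
print(f"p = {chi2_sf_1df(1 / (1 - 0.61)):.2f}")
```

As a consistency check, assuming the reported heterogeneity test compares two groups (df = 1), I² = 61% implies Q = 1/(1 − 0.61) ≈ 2.56. The last line gives the corresponding 1-df p value, about 0.11, which is close to the reported 0.10.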

Table 2 Association testing of the T130I variant of HNF4A

Fig. 1 Meta-analysis of the association studies of the T130I variant to type 2 diabetes. The plot was generated using Comprehensive Meta Analysis software version 2.2050 (Biostat, Englewood, NJ, USA). CS, current study

The V255M variant was successfully genotyped in 5,745 cases and 15,044 controls in the UK study. The MAF for V255M was far lower than for T130I (cases 0.08%; controls 0.10%), and no type 2 diabetes association was observed in either the UK sample (p = 0.28) or the larger association analysis incorporating Danish genotyping data (p = 0.40) (Table 3).

Table 3 Association testing of the V255M variant of HNF4A

We did not find any evidence for linkage disequilibrium (r² < 0.005) between either T130I or V255M and the common variants in the promoter region of HNF4A that had previously shown a weak association to type 2 diabetes susceptibility.

Discussion

When genes implicated in diabetes pathogenesis undergo monogenic screening, the variants discovered tend to be put into two categories: they are either considered to be causal for monogenic diabetes or neutral ‘polymorphisms’. The latter group has been assumed to have no role in disease susceptibility. With the large sample sizes now available, it is possible to go back to some of these coding variants that are clearly not causal for monogenic diabetes and re-examine whether they could, nevertheless, be influencing susceptibility to common forms of diabetes. Previous functional and association studies had highlighted two coding LF variants within HNF4A as interesting candidates in this respect and we have carried out the largest association analysis to date for the V255M and T130I variants of HNF4A to better understand their role in type 2 diabetes pathogenesis.

We found no association between the V255M variant of HNF4A and type 2 diabetes risk in the UK samples or in our larger analysis. It is worth emphasising that the low MAF of this variant means that our power to detect association was limited to large effect sizes only.

In contrast, evidence of association between the T130I variant of HNF4A and type 2 diabetes risk was found in our analysis of category 1 and 2 samples. The evidence for association was increased when we added category 3 samples, reaching a p of 2.1 × 10⁻⁵. To determine whether there was any additional evidence for association available from recent large-scale genome-wide association meta-analyses for type 2 diabetes, we examined data from the recently published Diabetes Genetics Replication and Meta-analysis Consortium (DIAGRAM)+ meta-analysis [8], after excluding samples already in our meta-analysis. T130I (rs1800961) is represented on Illumina arrays but is neither present on nor can it be reliably imputed into genome-wide genotypes obtained on early Affymetrix platforms, limiting the data available from DIAGRAM+ to 3,590 cases and 32,326 controls. In these samples, there was a directionally consistent but non-significant association with T130I (p = 0.13) such that the combined analysis (17,869 cases and 59,197 controls) reached p = 1.0 × 10⁻⁵.
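The text does not state how the meta-analysis and DIAGRAM+ results were combined. One common approach, shown here purely as an assumption, is a sample-size-weighted combination of z scores (Stouffer's method), with each weight based on the effective sample size 4/(1/cases + 1/controls). The function names are hypothetical. The p values and sample counts are those quoted above, and both effects are treated as having the same sign because the association is described as directionally consistent.

```python
import math
from statistics import NormalDist

def z_from_two_sided_p(p_value):
    """Convert a two-sided p value to a positive z score."""
    return NormalDist().inv_cdf(1 - p_value / 2)

def effective_n(n_cases, n_controls):
    """Effective sample size of an unbalanced case-control sample."""
    return 4 / (1 / n_cases + 1 / n_controls)

def combine_z(z_scores, n_effs):
    """Sample-size-weighted combination; returns (z, two-sided p)."""
    numerator = sum(math.sqrt(n) * z for z, n in zip(z_scores, n_effs))
    z_combined = numerator / math.sqrt(sum(n_effs))
    return z_combined, math.erfc(abs(z_combined) / math.sqrt(2))

z_meta = z_from_two_sided_p(2.1e-5)     # meta-analysis of categories 1 to 3
z_diagram = z_from_two_sided_p(0.13)    # DIAGRAM+ samples not already included
z_all, p_all = combine_z(
    [z_meta, z_diagram],
    [effective_n(14279, 26835), effective_n(3590, 32326)],
)
print(f"combined z = {z_all:.2f}, p = {p_all:.1e}")
```

Under these assumptions the combined p value is of the order of 10⁻⁵, in line with the reported combined result, although the authors may have used a different procedure.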

However, this association fails, by some margin, to reach widely accepted thresholds for genome-wide significance which, in the context of LF variants, should be even more stringent than those required for common variants (perhaps around α = 5 × 10⁻⁹), given the larger number of independent tests that are possible once lower-frequency variants are considered. We estimate that achieving this level of significance for T130I (using the effect size [an OR of 1.20] observed in our expanded meta-analysis) would require almost 100,000 samples (in fact, 48,697 cases and 48,697 controls). Recent evidence (achieving such levels of genome-wide significance) that T130I is associated with altered HDL-cholesterol levels raises the prior odds that the type 2 diabetes association we observe here is genuine, as does the strong biological candidacy of this gene given its proven causal role in monogenic forms of diabetes. The previously reported association with lipid levels also raises an interesting question as to whether the type 2 diabetes association is mediated by a direct influence of the variant (or a causal variant with which it is in linkage disequilibrium) on beta cell function, or by a primary effect on lipid physiology. This question could, in principle, be answered by a suitably scaled Mendelian randomisation experiment. The broad transcriptional effects of HNF-4A would be consistent with pleiotropic effects of the variant on multiple systems.
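The scale of the sample size estimate can be illustrated with a normal approximation for a two-sided comparison of allele frequencies in equal numbers of cases and controls. This is a sketch under stated assumptions, not the authors' calculation: the control MAF of 3.00% is taken from the Results, the target power of 80% is an assumption because the text does not state one, and the function name is hypothetical.

```python
import math
from statistics import NormalDist

def cases_needed(maf_controls, odds_ratio, alpha, power):
    """Cases (with an equal number of controls) needed to reach the given power."""
    p0 = maf_controls
    # Allele frequency in cases implied by the per-allele odds ratio
    p1 = odds_ratio * p0 / (1 + (odds_ratio - 1) * p0)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance_sum = p0 * (1 - p0) + p1 * (1 - p1)
    # Each group of N individuals carries 2N alleles, so
    # Var(p1_hat - p0_hat) = variance_sum / (2N)
    n = (z_alpha + z_beta) ** 2 * variance_sum / (2 * (p1 - p0) ** 2)
    return math.ceil(n)

n_per_group = cases_needed(maf_controls=0.03, odds_ratio=1.20,
                           alpha=5e-9, power=0.80)
print(f"about {n_per_group:,} cases and {n_per_group:,} controls")
```

Under these assumptions the requirement is tens of thousands of cases and as many controls, the same order of magnitude as the estimate given above. The exact figure depends on the assumed power, allele frequency and test.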

As efforts are increasingly aimed at understanding the full allelic spectrum of variants involved in multifactorial disease pathogenesis, large-scale genotyping will be required to clarify the role played by LF variants. As our results exemplify, the numbers required for association testing of such variants are substantial when the effect size is modest. It is clear that large collaborative efforts will be needed to maximise the samples available for any such studies.