Introduction

The insulin receptor substrate-1 (IRS1) is an early mediator in the insulin receptor signal transduction pathway [1]. Tyrosine phosphorylation of IRS proteins causes them to bind phosphatidylinositol 3-kinase, which in turn initiates a phosphorylation cascade resulting in various downstream effects of insulin [2]. Tissue-specific knockout experiments in mice have shown that IRS1 is a necessary component of insulin action in skeletal muscle, adipose tissue and pancreatic beta cells [3]. Its gene IRS1 has therefore been proposed as a candidate gene that might cause diabetes in humans [4].

Indeed, the common missense variant glycine → arginine at codon 972 (G972R) [5] was associated with type 2 diabetes in a meta-analysis of 27 studies comprising 8,827 subjects, although the statistical significance of this result was modest [6]. To test the reproducibility of this association, we recently attempted to replicate the same genetic model in large samples totalling 9,000 white individuals. Despite >95% power to obtain a p < 0.05 for the odds ratio (OR) of 1.25 that had been estimated in the meta-analysis, we did not observe an association of G972R with type 2 diabetes, related traits or age of onset [7]. A simultaneous independent report in over 2,000 samples also failed to confirm the association [8].

Other variants in IRS1 may affect gene expression or protein function and thereby induce insulin resistance; in such a scenario, varying degrees of linkage disequilibrium (LD) in different populations may have given rise to discrepant association signals. In order to assess whether common genetic variation in IRS1 increases risk of type 2 diabetes, we set out to characterise the haplotype structure of IRS1, select a set of markers that capture common variation in the region and genotype them in several large family-based and case-control samples of northern European ancestry.

Subjects and methods

Clinical samples

To maximise genotyping efficiency, we performed this association study in two stages. In stage 1, we genotyped 20 tag single nucleotide polymorphisms (SNPs) in two case-control diabetes samples of European descent obtained from Genomics Collaborative (Cambridge, MA, USA). One sample comprised 1,226 case-control pairs from the USA and one comprised 1,009 case-control pairs from Poland, both matched for age, sex and grandparental country of origin. A subset of these samples has been formally examined for the presence of population stratification [9]. For stage 2, we selected the only nominally significant result obtained in stage 1 (p < 0.05) and genotyped it in Scandinavian samples from the Botnia Study [10], which include 211 trios, 874 siblings discordant for type 2 diabetes and a case-control sample totalling 755 subjects matched for age, BMI and region of origin. Cases with severe impaired glucose tolerance (studied in previous reports from our group [7, 11–15]) were excluded. In addition, we studied: (1) an individually matched case-control sample totalling 254 subjects from the Saguenay Lac-St Jean region in Quebec (Canada); (2) a case-control sample from Sweden totalling 948 subjects matched for sex, age and BMI; and (3) an additional Scandinavian case-control sample including 1,999 cases with type 2 diabetes (from a southern Sweden Diabetes Registry [16]) and 2,260 unrelated ethnically matched control subjects from the Malmö Diet and Cancer Study (http://www.mdcs.mas.lu.se/), who had fasting blood glucose <5.6 mmol/l and no known family history of type 2 diabetes. These samples were validated by replication of the three most widely reproduced associations in type 2 diabetes: (1) the P12A variant in the peroxisome proliferator-activated receptor γ (PPARG) [11], (2) the E23K variant in the islet ATP-sensitive potassium inwardly-rectifying channel (KCNJ11) [7, 12] and (3) common variants in the transcription factor seven-like 2 gene (TCF7L2) [17].
All subjects gave informed consent. Appropriate institutional review board approvals were obtained and all investigations were carried out according to the Declaration of Helsinki. The phenotypic characteristics of all patient sub-samples are presented in Table 1.

Table 1 Characteristics of patient samples

Genotyping

Genotyping was generally performed by allele-specific primer extension of multiplex products with detection by matrix-assisted laser desorption ionisation-time of flight mass spectroscopy on the Sequenom platform [18]. The Scandinavian case-control sample was genotyped by the allelic discrimination method on an ABI7900 machine (Applied Biosystems, Foster City, CA, USA). Average genotyping success (taking into account all SNPs and all samples) was 96.9% and our consensus rate was 99.98%, based on 8,940 duplicate genotypes.

Haplotype structure

To evaluate the haplotype structure of the IRS1 gene, we first downloaded data for the CEU (Caucasian) samples from phase 1 of the HapMap project [19]. We targeted a segment that would begin ∼20 kb upstream of the IRS1 transcription start site and end ∼10 kb downstream from the end of the 3′ untranslated region (UTR), expanding this region in both directions until we noted decay of LD as defined by the end of a haplotype block [20]. Additional SNPs were genotyped in the HapMap CEU plate and integrated into the map to refine areas of low SNP density or clarify the extent of LD. Results were updated when genotypes from phase 2 of the HapMap became available.

Tag SNP selection

Tag SNPs were selected with Tagger [21] (http://www.broad.mit.edu/mpg/tagger/), based on data from phase 1 of the HapMap, complemented by limited additional genotyping in the HapMap CEU plate. By setting a threshold r² ≥ 0.8, forcing in G972R (rs1801278) as a tag and using an ‘aggressive’ strategy as implemented in Tagger, we obtained 20 single-marker tags only (equivalent to the ‘pairwise’ approach), which were carried forward in the disease samples.
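The pairwise tagging idea described above can be illustrated with a minimal sketch. This is not Tagger's implementation: it is a simple greedy selection written for illustration, assuming phased haplotypes coded 0/1, and the function names, SNP labels and toy haplotypes are all hypothetical.

```python
from itertools import combinations


def r_squared(a, b):
    """r^2 between two biallelic SNPs, given phased haplotypes coded 0/1."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(1 for x, y in zip(a, b) if x == 1 and y == 1) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:  # a monomorphic SNP carries no LD information
        return 0.0
    d = pab - pa * pb
    return d * d / denom


def greedy_pairwise_tags(haplotypes, threshold=0.8, forced=()):
    """Pick tags until every SNP has r^2 >= threshold with at least one tag.

    haplotypes: dict mapping SNP id -> list of 0/1 alleles (same haplotype order).
    forced: SNP ids that must be included as tags regardless of their coverage.
    """
    snps = list(haplotypes)
    # Each SNP captures itself plus every SNP in strong LD with it.
    captures = {s: {s} for s in snps}
    for s, t in combinations(snps, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            captures[s].add(t)
            captures[t].add(s)

    tags = []
    uncovered = set(snps)
    for s in forced:
        tags.append(s)
        uncovered -= captures[s]
    while uncovered:
        # Choose the SNP that captures the most still-uncovered SNPs.
        best = max(snps, key=lambda s: len(captures[s] & uncovered))
        tags.append(best)
        uncovered -= captures[best]
    return tags


# Toy data (hypothetical): snpA, snpB and snpC are in perfect LD; snpD is not.
toy = {
    "snpA": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpB": [0, 0, 1, 1, 0, 1, 0, 1],
    "snpC": [1, 1, 0, 0, 1, 0, 1, 0],
    "snpD": [0, 1, 0, 1, 0, 1, 0, 1],
}
print(greedy_pairwise_tags(toy, threshold=0.8, forced=("snpD",)))
```

Forcing a marker in, as was done for G972R, simply seeds the tag list before the greedy step; the remaining tags are then chosen only for SNPs the forced marker does not already capture.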

Statistical analysis

To examine the association of alleles with type 2 diabetes we used simple χ² analysis in the case-control samples, the transmission disequilibrium test [22] in the diabetic trios and the discordant allele test [23] in the sib pairs; the first two were implemented in Haploview (http://www.broad.mit.edu/mpg/haploview/) [24]. Results from the various samples were combined by Mantel–Haenszel meta-analysis of the odds ratios [25]. Homogeneity of ORs among study samples was tested using an asymptotic Breslow–Day statistic [26]. Adjustment for covariates was performed using the software program whap (http://pngu.mgh.harvard.edu/purcell//whap/) and the test for genotype-BMI interaction was performed with PLINK (http://pngu.mgh.harvard.edu/~purcell/plink/).
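For readers unfamiliar with these statistics, the following sketch shows the standard textbook forms of the Mantel–Haenszel pooled odds ratio and the transmission disequilibrium test statistic. It is not the code used in this study (which relied on Haploview and related programs); the function names and all counts are hypothetical.

```python
def mantel_haenszel_or(strata):
    """Pooled odds ratio across strata (e.g. study samples).

    strata: iterable of 2x2 allele-count tables (a, b, c, d), where
      a = risk alleles in cases,    b = other alleles in cases,
      c = risk alleles in controls, d = other alleles in controls.
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def tdt_chi2(transmitted, untransmitted):
    """TDT statistic (1 df): counts of the allele transmitted versus not
    transmitted from heterozygous parents to affected offspring."""
    return (transmitted - untransmitted) ** 2 / (transmitted + untransmitted)


# Hypothetical allele counts for two samples.
samples = [(20, 80, 10, 90), (30, 70, 20, 80)]
print(mantel_haenszel_or(samples))
print(tdt_chi2(60, 40))  # 4.0
```

Because each stratum contributes its own 2x2 table, the pooled estimate is not distorted by differences in allele frequency or case-control ratio between samples, which is the reason for combining samples this way rather than pooling raw counts.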

In order to correct for the multiple variants examined, we performed 10,000 permutations in the US case-control sample, obtaining an empiric p value based on the number of times the best χ² statistic was exceeded. The ratio of this empiric p value to the best nominal p value is an indicator of the statistical correction needed to account for the number of markers tested; this correction factor (10.95×) is lower than a pure Bonferroni correction (20×) due to the correlation among variants brought about by LD in the region. The correction factor was applied to the overall meta-analysis p value and used in our power calculations, which were performed with the program of Purcell et al. [27], available at http://pngu.mgh.harvard.edu/~purcell/gpc/.
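The permutation procedure can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the software used here: genotypes are assumed to be complete (no missing data) and coded as minor-allele counts, the allelic χ² test is used for every SNP, a permuted statistic equal to the observed one is counted as exceeding it, and all function names are hypothetical.

```python
import math
import random


def allelic_chi2(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square (1 df) on a 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def chi2_1df_p(stat):
    """Upper-tail p value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def best_chi2(genotypes, is_case):
    """Largest allelic chi-square across SNPs.

    genotypes: one list per subject, holding minor-allele counts (0/1/2) per SNP.
    is_case: one 0/1 label per subject.
    """
    n_cases = sum(is_case)
    n_ctrls = len(is_case) - n_cases
    best = 0.0
    for j in range(len(genotypes[0])):
        case_alt = sum(g[j] for g, s in zip(genotypes, is_case) if s)
        ctrl_alt = sum(g[j] for g, s in zip(genotypes, is_case) if not s)
        stat = allelic_chi2(case_alt, 2 * n_cases, ctrl_alt, 2 * n_ctrls)
        best = max(best, stat)
    return best


def empiric_p(genotypes, is_case, n_perm=1000, seed=0):
    """Share of label permutations whose best chi-square reaches the observed one."""
    rng = random.Random(seed)
    observed = best_chi2(genotypes, is_case)
    labels = list(is_case)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)  # break the genotype-phenotype link, keep LD intact
        if best_chi2(genotypes, labels) >= observed:
            hits += 1
    return hits / n_perm


def correction_factor(genotypes, is_case, n_perm=1000, seed=0):
    """Ratio of the empiric p value to the best nominal p value."""
    nominal = chi2_1df_p(best_chi2(genotypes, is_case))
    return empiric_p(genotypes, is_case, n_perm, seed) / nominal
```

Shuffling case-control labels while leaving each subject's genotypes untouched preserves the correlation among SNPs, which is why the resulting factor can be smaller than the number of markers tested.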

We note that although we chose a two-stage strategy for the purposes of SNP selection and genotyping efficiency, the SNP promoted to the second stage was analysed in all samples jointly; we therefore applied a statistical correction for the total number of variants tested at the joint analysis stage.

Quantitative trait comparisons

A 75-g OGTT with insulin and glucose measurements at 0, 30, 60 and 120 min was performed in a subset of the control Botnian subjects (n = 850, 415 female). Plasma glucose was measured by a glucose oxidase method on a glucose analyser (Beckman Instruments, Fullerton, CA, USA) and fasting insulin was measured by radioimmunoassay. The insulinogenic index was calculated as: \( [(\text{insulin at 30 min}) - (\text{insulin at 0 min})] / [(\text{glucose at 30 min}) - (\text{glucose at 0 min})] \) [28]; the whole-body insulin sensitivity index was calculated as: \( 10,000 / \sqrt{(\text{mean OGTT glucose}) \times (\text{mean OGTT insulin}) \times (\text{fasting glucose}) \times (\text{fasting insulin})} \) [29]; insulin resistance by homeostasis model assessment (HOMA-IR) was calculated as: \( (\text{fasting serum insulin} \times \text{fasting plasma glucose}) / 22.5 \) [30]; and insulin AUC was calculated by the trapezoidal method as: \( (\text{value at 30 min} \times 30) + (\text{value at 60 min} \times 45) + (\text{value at 120 min} \times 30) - (\text{value at 0 min} \times 105) \). Values were log-transformed to approximate normality and compared by t test between major allele homozygotes and heterozygotes at rs934167 (no minor allele homozygotes were detected in this subset).
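The four indices defined above translate directly into code. The sketch below is illustrative only: the function and argument names are hypothetical, and measurement units are left to the caller because they are not restated here. It also checks that the AUC expression equals the trapezoidal area over 0 to 120 min minus the baseline rectangle, i.e. an incremental AUC.

```python
from math import sqrt


def insulinogenic_index(ins0, ins30, glu0, glu30):
    """Change in insulin over change in glucose during the first 30 min."""
    return (ins30 - ins0) / (glu30 - glu0)


def insulin_sensitivity_index(mean_glu, mean_ins, fasting_glu, fasting_ins):
    """Whole-body insulin sensitivity index from OGTT means and fasting values."""
    return 10000 / sqrt(mean_glu * mean_ins * fasting_glu * fasting_ins)


def homa_ir(fasting_ins, fasting_glu):
    """Insulin resistance by homeostasis model assessment."""
    return fasting_ins * fasting_glu / 22.5


def incremental_auc(v0, v30, v60, v120):
    """AUC above baseline, using the closed-form weights given in the text."""
    return v30 * 30 + v60 * 45 + v120 * 30 - v0 * 105


def trapezoid_minus_baseline(v0, v30, v60, v120):
    """Same quantity computed step by step, as a cross-check of the weights."""
    total = (v0 + v30) / 2 * 30 + (v30 + v60) / 2 * 30 + (v60 + v120) / 2 * 60
    return total - v0 * 120


# Hypothetical insulin values at 0, 30, 60 and 120 min.
print(incremental_auc(10, 50, 40, 20))           # 2850
print(trapezoid_minus_baseline(10, 50, 40, 20))  # 2850.0
```

The weight of 45 on the 60-min value arises because that time point borders a 30-min interval on one side and a 60-min interval on the other (15 + 30), and the weight of −105 on the baseline value is 15 − 120.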

Results

Characterisation of common sequence variation at IRS1

The 130 polymorphic SNPs compiled by us span 183 kb and contain the entire IRS1 gene, from ∼105 kb upstream of the transcription start site to ∼13 kb downstream of the end of the 3′ UTR. The average spacing between SNPs is 1.4 kb and the maximum gap 8.6 kb. There is significant LD in the region, which results in limited haplotype diversity and a sizeable efficiency gain when attempting to capture common genetic variation (Fig. 1).

Fig. 1

Linkage disequilibrium (LD) plot across the IRS1 gene region. The horizontal black line depicts the chromosomal segment analysed in the HapMap CEU sample and the purple line indicates the IRS1 gene (5′ to 3′ is right to left). The tag SNP locations are indicated by hatch marks above the black line. The LD plot in the bottom part of the figure is based on the measure D′: each diamond represents the magnitude of LD for a single pair of markers, with red indicating that LD is strong (D′ >0.8) and statistically significant (LOD >2.0). Due to the high number of consecutive rare SNPs in the first exon of IRS1, there were not enough observations to merit the statistical definition of haplotype blocks according to Gabriel et al. [20]; however, the ‘spine of LD’ definition as implemented in Haploview revealed two large haplotype blocks, which begin 93.6 kb upstream of the IRS1 transcription start site and end 12.4 kb downstream of the 3′ UTR respectively (a third smaller haplotype block is noted further upstream). These haplotypes are shown by the blue line above the LD plot, with the thickness of the blue line indicating their frequency in the CEU reference sample. pct genotyped, per cent genotyped. Figure prepared using LocusView version 2.0 (T. Petryshen, A. Kirby, M. Ainscow, Broad Institute of Harvard and MIT, Cambridge, MA, USA; unpublished software)

Due to the temporal sequence of data availability, we used the haplotype plot constructed with genotypes from phase 1 of the HapMap (supplemented with additional genotyping in the CEU plate) to select our tag SNPs. After genotyping was completed in the disease samples, phase 2 of the HapMap became available; we therefore evaluated our original set of tags against the updated dataset. We observed that our 20 tag SNPs capture 85% of all common variants (minor allele frequency [MAF] ≥0.05) with r² ≥ 0.8 and 94% of all such variants with r² ≥ 0.4 (see Electronic supplementary material [ESM] Table 1).

Power calculations

We established a nominal p value of 0.05 at stage 1 as the threshold for SNPs to be carried forward to stage 2. Assuming a type 2 diabetes prevalence of 8%, our initial (stage 1) sample of 2,235 US and Polish case-control pairs had >86% power to detect associations of SNPs with MAF ≥5% and a genotypic relative risk (GRR) ≥1.3 under an additive model; power drops to ∼56% for a GRR of 1.2. Due to the statistical correction necessary to account for the number of markers tested (see Methods), we required a nominal p value <0.0046 at the joint analysis stage to declare a statistically significant association; our combined sample (US, Poland and Scandinavia) had ∼73% power to detect associations of SNPs with MAF ≥5% and GRR ≥1.2 under an additive model, achieving >98% power for GRR ≥1.3.
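The joint-analysis threshold quoted above follows from dividing the conventional significance level by the permutation-derived correction factor described in the Methods. A short check of that arithmetic, with hypothetical variable names, alongside the stricter Bonferroni equivalent for 20 tags:

```python
ALPHA = 0.05
PERMUTATION_FACTOR = 10.95  # correction factor reported in the Methods
N_TAGS = 20                 # number of tag SNPs; the pure Bonferroni divisor


def corrected_threshold(alpha, factor):
    """Nominal p value required so that the corrected p value stays below alpha."""
    return alpha / factor


permutation_threshold = corrected_threshold(ALPHA, PERMUTATION_FACTOR)
bonferroni_threshold = corrected_threshold(ALPHA, N_TAGS)

print(round(permutation_threshold, 4))  # 0.0046
print(round(bonferroni_threshold, 4))   # 0.0025
```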

For our quantitative trait comparisons, and taking fasting insulin as an example, our sample of 850 control subjects had 99.8% power to detect a 1-SD increase in fasting insulin caused by an allele of MAF 4% (e.g. rs934167), assuming genotype affects 5% of the variance; if genotype only affects 1% of the variance, power would be reduced to 42.6%.

Association study

The results of stage 1 of our association study are presented in Table 2; genotype counts for this stage are shown in ESM Table 2. After a combined meta-analysis of the US and Polish samples, only one SNP (rs934167) showed nominal evidence of association (OR 1.25, 95% CI 1.03–1.51, p = 0.03). We therefore took rs934167 forward to stage 2: in a meta-analysis of the subsamples studied at this stage, rs934167 showed a trend in the same direction and approached nominal statistical significance (Table 3). Joint analysis of all samples resulted in an OR of 1.20 (95% CI 1.05–1.37) and a two-sided p value of 0.008, but after applying a statistical correction for multiple hypothesis testing, the resulting p value was 0.086. No heterogeneity was detected among our subsamples (p = 0.29).

Table 2 Association study of IRS1 SNPs (stage 1)
Table 3 Genotype counts for rs934167 in stage 2 (Scandinavian and Canadian samples)

Genotype-phenotype correlations

It is possible that, while we were not able to detect a statistically robust effect on risk of type 2 diabetes, this IRS1 variant (which is in perfect LD with several SNPs in the 3′ UTR) might influence glycaemic quantitative traits, either by decreasing insulin secretion in the pancreatic beta cells or by increasing peripheral insulin resistance. We detected no significant differences in fasting insulin, insulinogenic index, 2-h insulin, insulin AUC, HOMA-IR or insulin sensitivity index between major allele homozygotes and heterozygotes at rs934167 (Table 4). A nominal effect of rs934167 on BMI in the subset of Botnian normoglycaemic subjects was not replicated in the control subjects from the USA (C/C vs C/T, 27.4 ± 5.2 vs 27.3 ± 5.4 kg/m², p = 0.71), Poland (C/C vs C/T, 26.1 ± 3.6 vs 26.4 ± 3.1 kg/m², p = 0.45) or Scandinavia (C/C vs C/T, 25.4 ± 3.7 vs 25.5 ± 3.2 kg/m², p = 0.65). Because of the nominal association of rs934167 with BMI in the Botnian subsample, we further adjusted the initial association of this SNP with type 2 diabetes for log-transformed BMI; this adjustment attenuated the nominal statistical significance obtained in the unadjusted analysis (p = 0.24 and 0.11 in the US and Polish case-control samples, respectively). In addition, we did not detect any interaction between BMI and genotype at rs934167 (p = 0.48 and 0.41 in the US and Polish case-control samples, respectively). Finally, we noted no effect of genotype at rs934167 on diabetes age of onset for the 1,210 Scandinavian subjects for whom we had precise information on age at diagnosis (Fig. 2).

Table 4 Genotype-phenotype correlations according to genotype at rs934167
Fig. 2

Cumulative incidence of type 2 diabetes, by age of onset and genotype at rs934167. The 1,210 Scandinavian subjects for whom we had precise age of onset of type 2 diabetes were stratified by rs934167 genotype and the proportion developing type 2 diabetes was plotted over time. Due to the small number of T/T homozygotes (n = 3), we analysed C/C homozygotes (black line with diamonds) vs T carriers (T/X, grey line with squares). There was no statistically significant difference in the proportion of T carriers diagnosed before age 50 (p = 0.26)

Discussion

Our study was a logical next step in our exploration of the hypothesis that common genetic variants in IRS1 influence insulin resistance. Having been unable to reproduce the specific association of G972R with type 2 diabetes suggested by Jellema et al. [6], we wondered whether other nearby variants (in LD with G972R in some samples but not in others) might account for the association signal noted by other groups. Our study design was well powered to detect associations with similar allele frequencies and putative GRR as proposed for G972R; in addition, its ability to capture common variants in the region was reasonably comprehensive. Nevertheless, we were unable to conclusively identify such an associated variant.

The non-coding SNP rs934167 showed a modest nominal association in the US and Polish case-control pairs, with a similar trend noted in the Scandinavian subsamples. There is little-to-no LD between rs934167 and rs1801278 (G972R) in white subjects (D′ = 0.08, r² = 0), such that our finding, if real, cannot be considered to support the proposed G972R association. Although rs934167 is located downstream of the IRS1 gene, it is a perfect proxy for several SNPs located in its 3′ UTR, which might conceivably affect mRNA stability and thereby influence IRS1 levels. To determine conclusively whether rs934167 is associated with type 2 diabetes would require independent confirmation in adequately powered samples, estimated to require ∼4,700 case-control pairs; even larger samples would probably be needed to detect an effect of rs934167 on glycaemic traits. If the association were confirmed, the question of whether rs934167 (or another variant in LD with it) impacts IRS1 mRNA levels could be examined in cells isolated from individuals with different genotypes at that locus.

Although our attempt to capture common genetic variation at IRS1 was adequate, the possibility remains that as-yet uncaptured polymorphisms do increase risk of type 2 diabetes. Upcoming large-scale whole-genome association scans and their analytical integration should allow for a focused test of all common variants in the region, including rs934167 and those with more modest effects or those which were not perfectly captured here. On the other hand, if an aggregate of multiple rare polymorphisms at IRS1, rather than common variants, leads to type 2 diabetes, alternative association methodologies in large resequenced samples will be required.