Introduction

The publication of the first type 1 diabetes locus found by a genome-wide association (GWA) study in 2006 (IFIH1) [1] heralded a new era in susceptibility locus discovery in this common autoimmune disease. Over 50 susceptibility loci have now been identified (www.t1dbase.org). Eighteen of these were identified by Barrett et al. [2] in a GWA meta-analysis of 7,514 cases and 9,045 controls (meta-analysis p < 1 × 10−6) and confirmed in 4,267 cases, 4,670 controls and 2,319 affected sib-pair families (providing 4,342 parent–child trios; replication p < 0.01; discovery and replication p < 5 × 10−8) [2]. However, in the family component of the replication samples, eight of the confirmed 18 susceptibility loci failed to reach nominal levels of significance (p < 0.05; inferred from the reported 95% confidence intervals for the relative risks and assuming two-sided significance tests). Although replication was based on the combined evidence from case/control and family collections, and no evidence of population stratification in the case/control collection had been found previously [2, 3], family-based evidence, if possible, remains important in order to demonstrate that these associations did not arise through population stratification bias [4]. Such a bias can occur when a single nucleotide polymorphism (SNP) differs in allele frequency across subgroups of the population and risk of disease differs between these subgroups.

Based on the number of case/control and parent–child trio replication samples used in Barrett et al. [2], if we assume that the parent–child trios equate to an equal number of cases and controls, the power of the case/control and family replication sets would have been similar and the potential impact of winner’s curse (the upward bias of the effect size of the initial finding) on replication would not differ between the replication sample sets. However, in type 1 diabetes, the effects (as measured by relative risk) of non-HLA loci tend to be smaller in affected sib-pair families [2, 5], which are enriched for type 1 diabetes with a higher frequency of high-risk HLA genotypes. Consequently, when the family component of the replication samples used in Barrett et al. [2] is considered in isolation, the 2,319 affected sib-pair families are likely to have been underpowered (too few samples analysed) to replicate the initial associations. Therefore, in the present study, we genotyped the best disease-predicting SNPs at the 18 susceptibility loci [2] in an additional 3,108 families (providing 2,801 parent–child trios to the analysis) from the Type 1 Diabetes Genetics Consortium (T1DGC) and the Juvenile Diabetes Research Foundation/Wellcome Trust Diabetes and Inflammation Laboratory. The analyses of these additional families, combined with the original 2,319 families [2], provided protection from population stratification bias, and increased power to provide further replication support for the associations of these 18 susceptibility loci [2].

Methods

Subjects

After the additional genotyping of 3,108 families (of which 2,322 were of white European ancestry and provided at least one parent–child trio; electronic supplementary material [ESM] Table 1), we had a collection of 5,427 families (including the 2,319 families previously genotyped [2]). All families were collected with appropriate informed consent. We analysed the 4,429 families that were of white European ancestry and provided one or more parent–child trios (ESM Table 1).
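The family counts quoted above can be cross-checked with simple arithmetic. The following is a minimal bookkeeping sketch in Python; the variable names are illustrative, and the derived counts (exclusions and retained original families) are implied by the quoted totals rather than taken from ESM Table 1.

```python
# Family counts quoted in the Subjects paragraph.
newly_genotyped = 3108           # additional families genotyped in this study
newly_genotyped_analysed = 2322  # of those, white European with >= 1 trio
previously_genotyped = 2319      # families already genotyped in Barrett et al.
total_collected = 5427
total_analysed = 4429

# The two genotyping rounds should add up to the full collection.
assert newly_genotyped + previously_genotyped == total_collected

# Counts implied by the totals (derived here, not quoted in the paragraph).
new_excluded = newly_genotyped - newly_genotyped_analysed
previous_analysed = total_analysed - newly_genotyped_analysed
previous_excluded = previously_genotyped - previous_analysed

print("newly genotyped families excluded:", new_excluded)
print("previously genotyped families analysed:", previous_analysed)
print("previously genotyped families excluded:", previous_excluded)
```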

Genotyping

The best disease-predicting SNPs at the 18 susceptibility loci [2] were genotyped in the additional family samples using the TaqMan 5′ nuclease assay (Applied Biosystems, Warrington, UK) according to the manufacturer’s protocol. Genotyping was performed blind to disease status and double scored to minimise error. Genotype frequencies were tested for deviation from Hardy–Weinberg equilibrium (HWE), and genotype checks were conducted for SNPs that deviated from HWE. We note that disease association can result in deviation from HWE in affected offspring and parents of affected offspring, who are not representative of the general population. The same genotyping technology and protocols had been applied in Barrett et al. [2] for the replication samples.
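As an illustration of the HWE check, the sketch below computes the standard one-degree-of-freedom Pearson chi-square test for a biallelic SNP from genotype counts. It is an assumption that a test of this form was used; the text does not state which HWE test or which individuals were tested, and the function name and example counts are hypothetical.

```python
import math

def hwe_chi_square(n_aa, n_ab, n_bb):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes counts of the three genotypes at a biallelic SNP and returns
    (chi-square statistic, asymptotic p value).
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        raise ValueError("monomorphic SNP: HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical genotype counts, for illustration only.
chi2, p_value = hwe_chi_square(1020, 1950, 1030)
print(f"chi2 = {chi2:.3f}, p = {p_value:.3f}")
```

An exact test may be preferable to the chi-square approximation when a genotype class is rare.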

Statistical analysis

All statistical analyses were performed in either Stata (www.stata.com) or R (www.r-project.org). In R, we used the snpStats package available from the Bioconductor project (www.bioconductor.org), and, in Stata, we used some additional routines available from www-gene.cimr.cam.ac.uk/clayton/software.

The family-based power to replicate the 18 type 1 diabetes susceptibility loci [2] is reported in ESM Table 2. Based on the odds ratios from the case/control component of the replication samples in Barrett et al. [2], which are not subject to winner’s curse, the expanded family collection is well powered, except for 17q21.2/CCR7 (53.4% power at α = 0.05; 17.2% power at α = 2.8 × 10−3, which corresponds to the Bonferroni adjustment of the 0.05 significance level for the 18 independent tests; ESM Table 2). We have greater than 90% power at α = 0.05 for 17/18 loci, and greater than 80% power at α = 2.8 × 10−3 for 14/18 loci (17/18 have greater than 60% power at α = 2.8 × 10−3).
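The Bonferroni threshold quoted above is 0.05/18, and the dependence of family-based power on effect size can be sketched with a normal approximation. The code below is a rough illustration, not the calculation behind ESM Table 2: it assumes a multiplicative allelic model, under which a heterozygous parent transmits the risk allele to an affected child with probability γ/(1 + γ) for allelic relative risk γ, and it takes the number of transmissions from heterozygous parents as given. The function names and example inputs are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()

def bonferroni_alpha(alpha, n_tests):
    """Per-test significance level after Bonferroni adjustment."""
    return alpha / n_tests

def tdt_power(n_het_transmissions, allelic_rr, alpha):
    """Approximate power of a one-sided TDT (normal approximation).

    n_het_transmissions: transmissions from heterozygous parents.
    allelic_rr: allelic relative risk under a multiplicative model (assumed).
    alpha: one-sided significance level.
    """
    n = n_het_transmissions
    p = allelic_rr / (1.0 + allelic_rr)  # P(risk allele is transmitted)
    z_alpha = STD_NORMAL.inv_cdf(1.0 - alpha)
    mean_diff = n * (2.0 * p - 1.0)      # E[transmitted - untransmitted]
    sd_diff = 2.0 * math.sqrt(n * p * (1.0 - p))
    # Reject when (b - c) / sqrt(n) exceeds z_alpha.
    return STD_NORMAL.cdf((mean_diff - z_alpha * math.sqrt(n)) / sd_diff)

alpha_adj = bonferroni_alpha(0.05, 18)
print(f"Bonferroni-adjusted alpha: {alpha_adj:.2e}")

# Hypothetical inputs: 5,000 informative transmissions, relative risk 1.10.
for a in (0.05, alpha_adj):
    print(f"alpha = {a:.2e}: power = {tdt_power(5000, 1.10, a):.3f}")
```

Under this approximation the power at the adjusted threshold is always lower than at α = 0.05 for the same locus, which is why fewer loci reach 80% power at α = 2.8 × 10−3.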

The best disease-predicting SNPs at the 18 susceptibility loci were analysed using the transmission disequilibrium test, except for the chromosome X locus, rs2664170 Xq28/GAB3, which was analysed using the method proposed by Clayton [6]. As we were attempting to replicate the associations reported in the case/control component of the replication samples analysed in Barrett et al. [2], we performed one-sided significance tests. We tested for population heterogeneity in SNP genotype frequencies across unaffected parents using Kruskal–Wallis one-way analysis of variance. We tested for population heterogeneity in disease association, after generating pseudo-controls [7], by testing the addition of the genotype–population interaction term to the conditional logistic regression model of disease status on genotype and population. Parent-of-origin and imprinting effects were tested using the Wallace et al. extension of the Weinberg method [8, 9].
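For an autosomal SNP, the transmission disequilibrium test with a one-sided alternative can be sketched as follows. This is a minimal illustration of the standard asymptotic (McNemar-type) form of the test, not the snpStats or Stata routines used in the analysis; the function name and the transmission counts are hypothetical, and the chromosome X method [6] is not covered.

```python
import math

def tdt(b, c):
    """Transmission disequilibrium test from heterozygous-parent counts.

    b: transmissions of the putative risk allele to affected offspring.
    c: transmissions of the other allele (risk allele not transmitted).
    Returns (z, chi-square, one-sided p, two-sided p). The one-sided
    alternative is over-transmission of the risk allele, that is, the
    direction of effect reported in the study being replicated.
    """
    n = b + c
    if n == 0:
        raise ValueError("no transmissions from heterozygous parents")
    z = (b - c) / math.sqrt(n)
    chi2 = z * z  # the familiar (b - c)^2 / (b + c), 1 df
    p_one_sided = 0.5 * math.erfc(z / math.sqrt(2.0))
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, chi2, p_one_sided, p_two_sided

# Hypothetical counts, for illustration only.
z, chi2, p1, p2 = tdt(b=2150, c=2000)
print(f"z = {z:.3f}, chi2 = {chi2:.3f}, one-sided p = {p1:.4f}, two-sided p = {p2:.4f}")
```

When the observed effect is in the expected direction, the one-sided p value is half the two-sided value, so a locus with a two-sided p between 0.05 and 0.10 is nominally significant under the one-sided test.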

Results

As no p values have been reported previously for the 18 novel susceptibility loci in the family component of the replication samples [2], we reanalysed the original data. We excluded 212 of the 2,319 families because of either non-white European ancestry, based on updated sample information, or not providing at least one parent–child trio. Seven of the 18 loci failed to reach p < 0.05 in the remaining 2,107 families (providing 4,212 parent–child trios; Table 1). In other words, 11 of the 18 loci reached at least nominal levels of significance. With a Bonferroni adjustment for the 18 independent tests, 15 loci failed to reach p < 2.8 × 10−3.

Table 1 A summary of the analysis of the 18 novel type 1 diabetes susceptibility loci discovered in Barrett et al. [2]

The inclusion of the additional 2,322 families (providing 2,801 trios; 786 of the 3,108 newly genotyped families were excluded) increased the number of susceptibility loci replicated at p < 0.05 from 11 to 17 of the 18 loci. Only ZFP36L1, C14orf181/14q24.1 (p = 0.055) failed to reach p < 0.05 (Table 1). The number of susceptibility loci replicated at p < 2.8 × 10−3 increased from three to ten (Table 1). Importantly, all of the susceptibility loci had directions of effect consistent with the case/control and family replication samples reported in Barrett et al. [2], and there was no evidence of heterogeneity in the disease associations across family collections, despite significant differences in SNP genotype frequencies (ESM Table 3). The differences in SNP genotype frequencies across family collections were not surprising given that Europe is a large and diverse collection of countries. For example, we have a large number of families from Finland, a genetically isolated population that shows many large differences in common SNP allele frequencies.

We tested the 17 autosomal loci for parent-of-origin and imprinting effects; only COBL/7p12.1 showed any evidence of biased maternal transmission, p = 1.1 × 10−3 (ESM Table 4). However, this needs to be replicated in an independent dataset.

Discussion

In the expanded family collection, only one of the previously confirmed susceptibility loci, ZFP36L1, C14orf181/14q24.1, failed to reach nominal levels of significance, with a p value just above 0.05 (p = 0.055). All of the susceptibility loci had directions of effect consistent with the case/control component of the replication samples reported in Barrett et al. [2] (ESM Table 5), and even with our over-conservative threshold for multiple testing, given the very strong prior information that these were true effects [2], ten loci remained significant after the adjustment for multiple testing. This study demonstrates that additional replication families were required to bring 17 of the 18 susceptibility loci to nominal levels of significance in families. It also supports the conclusion that the previously reported associations (discovery and replication p < 5 × 10−8) with odds ratios often less than 1.15 ([2]; ESM Table 5) did not arise through population stratification bias, thereby further validating the case/control collection (see Results).

After unequivocal replication of type 1 diabetes loci, the next steps involve dense SNP mapping in even larger sample sets and experiments analysing genotype–phenotype associations. For example, studying correlations between type 1 diabetes SNP risk alleles and haplotypes and expression of genes at the RNA and protein levels [10] can identify which genes in the associated regions are more likely to be causal. Consequently, genes with both positional and functional evidence for a role in disease aetiology can reveal the pathways and early precursors or biomarkers underlying the pathogenesis of type 1 diabetes.