Background

Type 1 diabetes (T1D) is the second most common chronic disease in children. It develops as a result of a complex interaction of genetic and environmental factors leading to the immune-mediated destruction of the insulin-producing pancreatic β-cells. Genetic predisposition has a significant role in T1D as suggested by familial clustering of the disease and increased concordance among monozygotic twins [1]. The identification and localization of susceptibility genes for complex traits or common diseases has made slow progress, owing to many factors including small effect sizes, incomplete knowledge of the polymorphism content of the genome and its patterns of linkage disequilibrium and lack of inexpensive genotyping technologies. One approach to narrowing down to specific genome regions that might contain a susceptibility gene or genes has been to carry out linkage studies in affected sib-pair families. In contrast to monogenic diseases, this approach has had limited success in multifactorial diseases.

Nevertheless, in T1D, combined analyses of several studies provided evidence for four linked regions: the major locus MHC on 6p21 (previously designated IDDM1), 10p14-q11 (IDDM10), 2q31-q33 (IDDM7 and IDDM12) and 16q22-q24 [2]. Here we undertook an analysis of sequence polymorphisms in the putative IDDM10 region, comprising a 23 Mb region on chromosome 10p12-q11. We identified a large number of single nucleotide polymorphisms (SNPs) and performed an association scan of this region in a large collection of T1D families, as well as in unrelated patients and controls.

Results

Initially, to identify T1D genes in the IDDM10 region we adopted a candidate gene approach. Previously we examined the GAD2 gene, which encodes the major T1D autoantigen GAD65, and found no evidence of association [3]. Here we resequenced two further candidate genes that map to this region, CREM and SDF1, and tested them for association. The cyclic adenosine 5'-monophosphate responsive element modulator (CREM) has been shown to bind to the Interleukin-2 gene promoter and suppress expression of this cytokine [4], which is critical for the initiation and termination of the immune response as well as for T cell development. Increased CREM expression was found in T cells of patients with another autoimmune disease, systemic lupus erythematosus [5]. By resequencing the CREM gene we identified 32 SNPs, including 13 novel SNPs (Supplementary Table 1, see Additional file 1). We selected six tag SNPs and genotyped them in 1,612 T1D patients and 1,828 controls from the UK. We found a multilocus P = 0.98, indicating that common variants of CREM do not affect T1D susceptibility in a major way.

Table 1 Association analysis in the extended T1D family set.

By resequencing the cytokine stromal cell-derived factor 1 (SDF1 or CXCL12) gene we identified 33 variants, including two insertion/deletion polymorphisms, 21 of which were novel (Supplementary Table 2, see Additional file 2). We selected six tag SNPs, genotyped them in 1,612 cases and 1,828 controls and found no association (multilocus P = 0.67). We also tested SNP rs1801157, also known as 3'A(801G>A), which lies in the evolutionarily conserved 3' untranslated segment of SDF1 and had previously been associated with early onset of T1D [6, 7]. To attempt replication of these findings we genotyped rs1801157 in 1,800 T1D families from the UK, USA and Norway. The A allele and AA genotype frequencies were very similar to those reported previously (19.4% and 5.8%, respectively). The transmission disequilibrium test revealed no association with T1D (507 transmitted A alleles and 530 untransmitted, P = 0.47; relative risk for AA genotype = 1.01, 95% CI = 0.87–1.16, P = 0.89). Even though we obtained no evidence of association, we subdivided the families by age-at-onset and by HLA-DRB1 genotype because the two previous studies had carried out subgroup analyses. However, we found no association in any subgroup (data not shown).
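The transmission disequilibrium test result for rs1801157 can be checked approximately from the reported allele counts. The sketch below assumes the standard McNemar-type TDT statistic, (b − c)² / (b + c) with one degree of freedom; the function name `tdt` is illustrative and is not taken from the software used in the study.

```python
import math

def tdt(transmitted, untransmitted):
    """McNemar-type TDT: return (chi-square with 1 df, P-value)."""
    b, c = transmitted, untransmitted
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square with 1 df, via the complementary error function.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts reported in the text: 507 transmitted and 530 untransmitted A alleles.
chi2, p = tdt(507, 530)
print(f"chi2 = {chi2:.3f}, P = {p:.3f}")
```

Under these assumptions the P-value comes out close to the reported 0.47; small differences may reflect rounding or the exact test implementation used in the study.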

Table 2 Association analysis in the 1,693 T1D patients and 1,805 controls from the UK.

We then conducted a comprehensive genetic analysis of the whole IDDM10 region in order to systematically identify new T1D gene(s). As part of the Human Genome Project (HGP) the Wellcome Trust Sanger Institute constructed a single tile path, i.e. a set of overlapping BACs derived from two different libraries [8, 9]. The overlaps between clones in this tile path had previously been checked for SNPs, which were deposited in the dbSNP database [10]. To discover additional novel SNPs in the IDDM10 region we constructed a second tile path that covers the whole region using clones from both BAC libraries, so that finished genome sequence from one library was complemented by a clone from the second library, i.e. from a different individual. This second tile path was then shotgun sequenced. In this way we revealed additional polymorphic sites located outside the BAC overlaps of the initial HGP tile path. In total we identified and submitted to dbSNP 12,058 SNPs, of which 10,808 were uniquely mapped onto the human genome build 34, including 1,320 SNPs that were novel, i.e. not present in dbSNP build 120. These SNPs contributed substantially to the known polymorphism content of the IDDM10 region.

We then screened sequence polymorphisms between 21.0 Mb and 44.3 Mb of chromosome 10 (NCBI genome build 34), an interval that includes the IDDM10 region, for association with T1D. In total 303 SNPs and 25 polymorphic microsatellite markers/short tandem repeats (STRs) were genotyped in up to 765 families with two affected offspring (Supplementary Table 3, see Additional file 3). This sample includes families in which linkage of IDDM10 was characterized initially [11–13]. We found 14 polymorphisms in nine loci showing nominal evidence of association with T1D (P < 0.05). To investigate these results further, we genotyped these polymorphisms in an additional set of T1D families (Table 1). In the combined analyses of up to 2,857 families we found some evidence for T1D association of D10S193, rs1963187 and rs2480285 (P = 0.037, 0.0074 and 0.0026, respectively), which are clustered in a 97 kb region (coordinates: chr10; 30,577,375..30,674,697; NCBI genome build 34). Their associations with the disease were largely independent of each other (r² = 0.24 between rs1963187 and rs2480285, while between the risk-associated allele 226 of D10S193 and rs1963187 and rs2480285 r² = 0 and 0.01, respectively).
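The r² values quoted above measure the squared correlation between alleles at two markers. As a minimal sketch, assuming two biallelic markers (a microsatellite allele such as 226 of D10S193 can be treated as "allele versus all others") and known haplotype frequencies, r² can be computed as follows. The function name and the input frequencies are hypothetical and are not taken from the study data.

```python
def ld_r2(p_ab, p_a, p_b):
    """r^2 between two biallelic markers.

    p_ab: frequency of the haplotype carrying allele A (marker 1) and allele B (marker 2)
    p_a, p_b: frequencies of allele A and allele B (each strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b  # coefficient of linkage disequilibrium, D
    return d * d / (p_a * (1 - p_a) * p_b * (1 - p_b))

# Hypothetical frequencies, chosen only to illustrate the calculation.
print(round(ld_r2(p_ab=0.20, p_a=0.30, p_b=0.40), 3))
```

A value near 0 means that the alleles occur together about as often as expected by chance, which is why markers with low pairwise r² are described as largely independent.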

Two genes localize within 150 kb of the D10S193, rs1963187 and rs2480285 markers (Figure 1 and [14]). The polyA polymerase associated domain containing 1 (PAPD1) gene encodes a protein with a nucleic acid binding PAP/25A-associated domain. Associated polymorphisms flank PAPD1, while the second gene, known as MAP3K8 (mitogen-activated protein kinase kinase kinase 8) is located 50 – 177 kb away from the associated polymorphisms. We then searched for novel sequence polymorphisms in this region. Using a panel comprising eight Caucasian individuals we resequenced 38.8 kb in the 177.2 kb region between D10S193 and exon 9 of the MAP3K8 gene. Additionally, to discover rarer genetic variants we resequenced exons, exon-intron boundaries and putative regulatory regions of the PAPD1 gene using a panel of 96 T1D patients (each representing one of the UK multiplex families in which affected sibs share both chromosomes identical-by-descent in the IDDM10 region). In total we found 147 SNPs (Supplementary Table 4, see Additional file 4).

Figure 1

The PAPD1-MAP3K8 gene region on chromosome 10. Tracks indicate (from top to bottom): position in the human genome chromosome 10, NCBI build 34; annotated genes; regions where resequencing of the amplified fragments has been attempted; sequenced BAC clones; SNPs identified in the clone overlaps; polymorphisms genotyped in up to 765 UK and USA families with two affected offspring; SNP rs or ss numbers; a plot showing -log(P-value) for association in up to 765 families with red line at 1.3 corresponding to P = 0.05; linkage disequilibrium (LD) plots for SNPs with minor allele frequency > 0.1 showing pairwise LD by colour ranging from red (high D') or dark green (high r2), indicating strong LD, to white, indicating weak or no LD between SNPs. LD plots have been generated using Haploview [27].

Then, in addition to the five polymorphisms (two STRs and three SNPs) in the PAPD1-MAP3K8 region that were already genotyped, we tested 84 SNPs identified by resequencing. At first we studied association in 458 UK families (Supplementary Table 5, see Additional file 5). Subsequently, seven SNPs that were suggestively associated in these UK families (P = 0.012 – 0.073) were genotyped in the extended set of up to 2,857 T1D families. Thus, overall we studied 12 polymorphisms in the PAPD1-MAP3K8 region in all available T1D families. We found that alleles of six SNPs that localize in the PAPD1 gene show nominal evidence of association with T1D risk or protection (P = 0.0026 – 0.031, Table 1). Then we genotyped an additional sample of 1,693 unrelated T1D patients and 1,805 controls from the UK (Table 2) for six PAPD1 polymorphisms that were associated in the previous analysis of the extended family set. We found that only microsatellite marker D10S193 located 28 kb downstream of PAPD1 was weakly associated in this sample (P = 0.03, Table 2). Thus allele 226 of D10S193 was weakly associated with T1D risk both in the families (relative risk [RR] = 1.15, P = 0.019) and in the case-control analysis (OR = 1.16, P = 0.078). Another D10S193 allele 228 was associated with protection from T1D in cases and controls (OR = 0.73, P = 0.006), but not in the families (RR = 0.96, P = 0.59).

Additionally, we further studied seven SNPs in the MYO3A, HRNPF, NRP1 and SVIL gene regions that have shown some evidence of association in the T1D families (P = 0.0092 – 0.04, Table 1). We genotyped these SNPs in 1,693 T1D patients and 1,805 controls, but found no association with T1D (Table 2).

Discussion

Overall, the association signals that we detected in the IDDM10 region near the PAPD1 gene did not reach genome-wide significance levels, despite having tested large samples of T1D families, cases and controls. This association could be spurious or could indicate a small genuine effect located near the PAPD1 gene that we did not have statistical power to demonstrate at a genome-wide significance level. Such an effect could not explain the reported evidence of T1D linkage at this region of chromosome 10, λs = 1.12. If IDDM10 is a true disease locus, it could be caused by a single common contributory variant with strong effect (such as OR = 2 and minor allele frequency of 0.2) or, more likely, by a number of variants with smaller effects located in this region. Therefore, further association studies in datasets that are powered to identify weak genetic effects (e.g. OR = 1.3 – 1.5) are needed to discover these type 1 diabetes genes.

Conclusion

We identified a large number of SNPs for genetic studies in the IDDM10 region using a novel sequencing strategy, performed a first T1D association scan of this region and eliminated the possibility that two functional candidate genes, CREM and SDF1, have a major effect on T1D. The weak association signal near the PAPD1 gene detected in the association scan may be either false or due to a small genuine effect, and cannot explain the previously observed strong linkage in the IDDM10 region.

Methods

Double tile path construction and shotgun sequencing

We constructed a double tile path between 21,022,316 and 44,323,745 bp of chromosome 10 (NCBI build 34) from two bacterial artificial chromosome (BAC) libraries, RPCI-11 and RPCI-13, and sequenced selected clones at the Wellcome Trust Sanger Institute. In total the shotgun tile paths comprised 110 RPCI-11 clones and 86 RPCI-13 clones that represent 12.6 Mb of overlap sequence. In addition to the finished sequence of chromosome 10, we shotgun sequenced to draft quality complementary clones from the other library in the tile path. In 1,060 overlaps the RPCI-11 clone was completely finished and in 76 overlaps the RPCI-13 clone was completely finished. Of the unfinished clones 77 (98.72%) were in multiple smaller contigs, i.e. draft quality.

SNP identification

Repeats were masked using RepeatMasker [15], and the masked sequences were then used in pair-wise sequence alignments by Sequence Search and Alignment by Hashing Algorithm (SSAHA) to map clone sequences and find SNPs [16]. Overlaps ≥ 2 kb were considered for SNP identification. We only checked overlaps between clones in different tile paths, as overlaps from the finished tile path had been checked for SNPs previously. A file of overlap pairs was derived and used as an input for a script that calls SSAHA and then parses the resulting alignments. To avoid false SNP calling due to misalignment, clusters of five or more SNPs were rejected when each one was less than 10 bp away from neighboring SNPs. We then mapped SNPs on the human genome consensus path (NCBI build 34) using the mapping information from the clone and the SSAHA algorithm. The tile paths and SNPs can be viewed on our website [14]. Information on all polymorphisms has been submitted to dbSNP [17].
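The cluster filter described above can be sketched as follows. This is one possible reading of the rule, not the script used in the study: SNP positions within one alignment are sorted, consecutive SNPs less than 10 bp apart are grouped into a run, and any run of five or more SNPs is discarded. The function name, parameter names and example positions are hypothetical.

```python
def filter_snp_clusters(positions, max_gap=10, min_cluster=5):
    """Drop runs of >= min_cluster SNPs whose adjacent spacing is < max_gap bp."""
    kept, run = [], []
    for pos in sorted(positions):
        if run and pos - run[-1] >= max_gap:
            # The gap closes the current run; keep it only if it is short.
            if len(run) < min_cluster:
                kept.extend(run)
            run = []
        run.append(pos)
    if len(run) < min_cluster:  # flush the final run
        kept.extend(run)
    return kept

# Hypothetical positions: five tightly spaced SNPs, then two isolated ones.
print(filter_snp_clusters([100, 104, 109, 113, 120, 500, 900]))  # [500, 900]
```

Rejecting the whole run, rather than only some of its members, reflects the rationale given in the text: a dense cluster of mismatches is more likely to be a misalignment than a set of genuine polymorphisms.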

Subjects

The study was done according to the principles of the Helsinki Declaration. We obtained permission from the relevant ethical committees and informed consent from the participating subjects. Initially, we genotyped 329 polymorphisms in up to 765 families of Caucasian ethnic group, each with two children affected with T1D, comprising 458 Diabetes UK Warren families and 307 Human Biological Data Interchange (HBDI) families. An extended set of families with at least one affected child comprised families from the UK (n = 1,781, including 458 Diabetes UK Warren families), Norway (n = 359), Romania (n = 352) and the USA (n = 365, including 307 HBDI families). Therefore, in total we studied 2,857 T1D families; the exact number of affected offspring genotyped for each SNP is shown in Table 1. Assuming a multiplicative genetic model, this extended set of T1D families provides over 80% power to detect genetic effects with odds ratios (OR) of 1.25 and 1.5 for alleles at 40% and 7% frequency, respectively, at α = 10⁻⁶. Additionally we studied an independent sample of 1,693 T1D patients collected across the UK and 1,828 control subjects that were selected from the 1958 British Birth cohort [18]. This case-control collection would have over 99% power to detect OR = 1.25 and 1.5 for alleles at 40% and 7% frequency, respectively, at α = 0.05.

Candidate gene resequencing and genotyping

We designed primers using Primer3 [19], amplified genomic DNA by PCR and sequenced PCR fragments with an ABI Big Dye Terminator v3.1 kit and an ABI3700 capillary sequencer (ABI, Foster City, CA). Sequence reads were aligned using the Staden package [20]. We resequenced in 32 individuals all CREM and SDF1 exons, exon-intron boundaries, and up to 3 kb upstream and downstream of the gene. All identified SNPs have been submitted to dbSNP [17]. SNP genotyping was carried out using Invader (Third Wave Technologies, Madison, WI), TaqMan (Perkin Elmer Applied Biosystems, Foster City, CA) or BeadArray (Illumina Inc, San Diego, CA). Microsatellite markers were genotyped as described elsewhere [13].

Statistical analysis

We assessed genotype frequencies among parents for each polymorphism using Arlequin version 2.000 [21] and found no unexpected deviation from Hardy-Weinberg equilibrium (P > 0.01). Statistical analysis was carried out within Stata version 8.1 [22]. Tag SNPs that capture common allelic variation (MAF > 0.03) with r² ≥ 0.8 were selected using the htstep, htsearch and haptag programs within Stata [23, 24]. When a tag SNP approach was taken, we used a global multilocus association test implemented in the mlpop program in Stata. It tests for association between the disease and the tag SNPs due to linkage disequilibrium with one or more causal variants in the region. This test contrasts the allele frequencies of a non-redundant set of tag SNPs between cases and controls by use of Hotelling's T² test [25, 26]. We did not apply multiple testing corrections in this study and all P-values reported are uncorrected.
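As an illustration of the Hardy-Weinberg check, the sketch below uses a simple chi-square goodness-of-fit test with one degree of freedom. This is a stand-in chosen for brevity; it is not the procedure implemented in Arlequin, and the function name and genotype counts are hypothetical. It assumes a biallelic marker with both alleles observed.

```python
import math

def hwe_chi2(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg proportions; returns (chi2, P)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p_value

# Hypothetical parental genotype counts, for illustration only.
chi2, p = hwe_chi2(400, 400, 200)
print(f"chi2 = {chi2:.2f}, deviates at P <= 0.01: {p <= 0.01}")
```

With the threshold used in the text, a marker would be flagged for inspection when the P-value is 0.01 or lower, since strong deviations in parents often point to genotyping error.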