Introduction

Alleles of multiple genes within the MHC region are associated with type 1 diabetes. In particular, the HLA-DRB1, HLA-DQA1 and HLA-DQB1 genes are most strongly associated with type 1 diabetes. Specifically, the common genotype with the highest risk for type 1 diabetes in Europeans is DRB1*0301-DQA1*0501-DQB1*0201/DRB1*04-DQA1*0301-DQB1*0302 (DR3/4), which is present in 2.4% of the general US population [1, 2]. Class I HLA alleles have also been associated with type 1 diabetes, particularly the HLA-B*39 and HLA-A*24 alleles [3–6]. The high risk of HLA-B*3906 has been documented for DRB1*0101-DQB1*0501 and DRB1*0801-DQB1*0402 chromosomes [4], and the B*3906 allele alone has been associated with a younger age of onset of type 1 diabetes [4, 5, 7, 8]. The risk of the B*3901 allele has not been well documented.

B*3906 and B*3901 are the most common HLA-B*39 alleles. Because we are interested in extended haplotypes that span HLA-DR/DQ (class II) and HLA-B (class I), in this paper we investigate the type 1 diabetes risk associated with the B*3906 and B*3901 alleles across multiple HLA-DR/DQ haplotype groups. We show that both B*3906 and B*3901 confer increased risk on specific HLA-DR/DQ haplotypes, and we extend the analysis to the single-nucleotide polymorphism (SNP)-level variation of these haplotypes.

Methods

Study populations and genotyping

This analysis included 2,300 affected sibling-pair families (10,012 individuals typed for HLA and/or SNPs) from the Type 1 Diabetes Genetics Consortium (T1DGC), using the 2007.11.MHC data freeze [9]. Affected sibling pairs and their parents were enrolled in nine cohorts worldwide. Within the analysed cohorts of Asia-Pacific, Europe, North America, UK, British Diabetes Association (BDA), Danish, Human Biological Data Interchange (HBDI), Joslin and Sardinian, 99% of individuals are classified as white or unknown. The T1DGC performed basic quality control analyses on the data. All study participants or their parents/surrogates provided written informed consent to participate, and the study protocol was approved by the relevant Ethics Committees and Institutional Review Boards.

Genotyping of 3,072 SNP assays was performed at the Wellcome Trust Sanger Institute using custom Illumina exon-centric and mapping panels (1,536 SNPs in each panel, 115 of them shared between the panels, giving 2,957 distinct SNPs; 2,837 of the 2,957 were successfully typed, a 96% SNP success rate). In addition, complete four-digit HLA typing (HLA-DPB1, HLA-DPA1, HLA-DQB1, HLA-DQA1, HLA-DRB1, HLA-B, HLA-C and HLA-A), performed using immobilised probe linear arrays, was available for all samples [10].
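The panel arithmetic in the preceding paragraph can be checked directly. The short Python sketch below recomputes the assay total, the number of distinct SNPs and the success rate from the figures given in the text; the variable names are ours and purely illustrative.

```python
# Figures taken from the text; names are illustrative only
PANEL_SIZE = 1536   # SNPs per Illumina panel
N_PANELS = 2        # exon-centric and mapping panels
OVERLAP = 115       # SNPs present on both panels
TYPED_OK = 2837     # distinct SNPs successfully typed

total_assays = N_PANELS * PANEL_SIZE       # SNP assays attempted across both panels
distinct_snps = total_assays - OVERLAP     # shared SNPs counted once
success_rate = TYPED_OK / distinct_snps    # fraction of distinct SNPs typed

print(total_assays, distinct_snps, f"{success_rate:.0%}")  # 3072 2957 96%
```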

MICA genotyping

To analyse the association of HLA-B*39 with MICA alleles, we typed 341 T1DGC individuals who had a B*39 allele or were family members of individuals with a DRB1*0404 allele. MICA genotypes were determined using a fluorescence-based method as previously reported by Zake et al., Park et al. and Triolo et al. [11–13]; details of the exact method used by our laboratory are given by Triolo et al. [13].

Data processing

SNP positions are based on NCBI Build 36 (National Center for Biotechnology Information, U.S. National Library of Medicine, Bethesda, MD, USA). T1DGC chromosomes were generated from SNP and HLA genotype data using several software packages. First, to confirm that the genotype data were consistent with Mendelian inheritance within each family, the PedCheck program (http://watson.hgen.pitt.edu) [14] was applied separately to the Illumina panel data and the HLA data. All families showed Mendelian-consistent inheritance. Next, data from the Illumina mapping SNP panel, the exon-centric SNP panel and HLA typing were combined using custom Java programs. Merlin software (www.sph.umich.edu/csg/abecasis/Merlin) [15] was then used to phase the family SNP and HLA genotype data into chromosomes. Where phase was ambiguous (resulting from heterozygosity at a SNP or HLA locus in all family members), phase was not inferred; the unphased alleles were labelled as such and excluded from analyses where appropriate.
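To make the consistency and phasing logic concrete, the Python sketch below handles only the simplest case: a single locus in one parent-offspring trio. It is a hypothetical illustration, not the algorithm used by PedCheck or Merlin, which work on whole pedigrees and (for Merlin) use information from multiple linked markers; the function names and the tuple-of-labels genotype encoding are our assumptions. Alleles may be SNP alleles or HLA alleles.

```python
from itertools import product
from typing import Optional, Set, Tuple

# Assumed encoding: an unordered pair of allele labels at one locus
Genotype = Tuple[str, str]

def transmissions(father: Genotype, mother: Genotype, child: Genotype) -> Set[Tuple[str, str]]:
    """All (paternal, maternal) allele pairs that reproduce the child's genotype."""
    target = sorted(child)
    return {(p, m) for p, m in product(father, mother) if sorted((p, m)) == target}

def is_mendelian_consistent(father: Genotype, mother: Genotype, child: Genotype) -> bool:
    # Consistent if at least one paternal/maternal combination explains the child
    return bool(transmissions(father, mother, child))

def phase_child(father: Genotype, mother: Genotype, child: Genotype) -> Optional[Tuple[str, str]]:
    """(paternal, maternal) alleles if uniquely determined; None if ambiguous or inconsistent."""
    options = transmissions(father, mother, child)
    return next(iter(options)) if len(options) == 1 else None

print(is_mendelian_consistent(("A", "A"), ("A", "A"), ("G", "G")))  # False: G cannot come from either parent
print(phase_child(("A", "A"), ("A", "G"), ("A", "G")))              # ('A', 'G'): A from father, G from mother
print(phase_child(("A", "G"), ("A", "G"), ("A", "G")))              # None: all heterozygous, phase ambiguous
```

The last call mirrors the situation described above: when every family member is heterozygous for the same alleles, either parent could have transmitted either allele, so the sketch leaves the alleles unphased rather than guessing.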

Founder chromosomes were used in these analyses, yielding four founder (parental) chromosomes per family. An affected family-based control (AFBAC) method was used to assign case or control status to chromosomes, using Microsoft Excel macros as previously described [16–19]. Founder chromosomes were labelled as case if they were ever transmitted to an affected child and as control if they were never transmitted to an affected child. The two affected children in each family were used to label the chromosomes, as this was the T1DGC ascertainment scheme. Analysing only the four parental (founder) chromosomes in each family avoids the non-independence of siblings within a family.
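The AFBAC labelling rule can be sketched in a few lines of Python. This is a hypothetical illustration (the original analysis used Microsoft Excel macros); the chromosome identifiers, the example family and the HLA-B alleles assigned to it are invented. Each affected child inherits one paternal and one maternal founder chromosome; any founder chromosome inherited by at least one affected child is a case chromosome, and the rest are controls.

```python
from collections import Counter
from typing import Dict, List, Tuple

def afbac_labels(founders: List[str], transmitted: List[Tuple[str, str]]) -> Dict[str, str]:
    """Label each founder chromosome 'case' if any affected child inherited it, else 'control'.

    founders: IDs of the four parental chromosomes, e.g. ["F1", "F2", "M1", "M2"].
    transmitted: for each affected child, the (paternal, maternal) founder IDs it inherited.
    """
    ever_transmitted = {hap for pair in transmitted for hap in pair}
    return {hap: ("case" if hap in ever_transmitted else "control") for hap in founders}

# Hypothetical family: both siblings inherited F1; they inherited different maternal chromosomes
labels = afbac_labels(["F1", "F2", "M1", "M2"], [("F1", "M1"), ("F1", "M2")])
print(labels)  # {'F1': 'case', 'F2': 'control', 'M1': 'case', 'M2': 'case'}

# Hypothetical HLA-B allele on each founder chromosome, tallied by AFBAC label
b_allele = {"F1": "B*3906", "F2": "B*0801", "M1": "B*4402", "M2": "B*3906"}
tally = Counter((b_allele[h], lab) for h, lab in labels.items())
print(tally[("B*3906", "case")])  # 2
```

With two affected siblings, a parent contributes a control chromosome only when both siblings inherited the same chromosome from that parent, which is why case chromosomes outnumber control chromosomes under this scheme.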

Amino acid sequences for HLA-B*3906, B*3901 and B*1402 were taken from the international ImMunoGeneTics information system (IMGT) database (www.ebi.ac.uk/imgt/hla/align.html). Three-dimensional modelling of the HLA-B peptide binding region was performed using IMGT (http://imgt.cines.fr/3Dstructure-DB/) [20].

Statistical analysis

A two-sided Fisher’s exact test (α = 0.05) was used to calculate p values for the association of AFBAC-classified founder chromosomes with type 1A diabetes.
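For readers who want to reproduce the association statistics, the Python sketch below implements a two-sided Fisher’s exact test on a 2 × 2 table of allele (present/absent) by chromosome status (case/control), together with the odds ratio. The text does not say which software computed the p values; this pure-Python version (hypothetical helper names) uses the common two-sided definition, also used by SciPy's `scipy.stats.fisher_exact`: it sums the hypergeometric probabilities of all tables with the same margins that are no more likely than the observed one. The usage example takes its counts from the Results.

```python
import math

def _log_comb(n: int, k: int) -> float:
    # log of the binomial coefficient C(n, k), via log-gamma to avoid huge integers
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def _log_prob(x: int, row1: int, col1: int, n: int) -> float:
    # Hypergeometric log-probability that the top-left cell equals x, given fixed margins
    return _log_comb(col1, x) + _log_comb(n - col1, row1 - x) - _log_comb(n, row1)

def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p for the table [[a, b], [c, d]].

    Rows: allele present / absent; columns: case / control chromosomes.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    log_obs = _log_prob(a, row1, col1, n)
    p = 0.0
    for x in range(max(0, row1 + col1 - n), min(row1, col1) + 1):
        lp = _log_prob(x, row1, col1, n)
        if lp <= log_obs + 1e-7:  # tolerance so floating-point ties with the observed table are counted
            p += math.exp(lp)
    return min(1.0, p)

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    return (a * d) / (b * c)

# B*3906 (case, control) vs non-B*3906 (case, control), counts from the Results
b3906 = (127, 141 - 127, 4963, 7650 - 4963)
print(round(odds_ratio(*b3906), 1))    # 4.9
print(fisher_exact_two_sided(*b3906))  # a very small p value
```

Working on the log scale keeps the calculation stable for tables with several thousand chromosomes, where the raw binomial coefficients would be astronomically large.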

Each founder chromosome was classified as transmitted or not transmitted according to its transmission to one randomly selected affected child in each family. A modified transmission disequilibrium test (TDT) was used to evaluate significance within each HLA-DR/DQ group. For each group, we counted the B*3906 alleles transmitted and not transmitted to the affected child and the non-B*3906 alleles transmitted and not transmitted, and compared these four counts as a 2 × 2 table using Fisher’s exact test (α = 0.05). The same approach was used for B*3901. The underlying counts were computed using custom Java programs, which are available upon request.
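The four counts in each HLA-DR/DQ group form a 2 × 2 table. The hypothetical Python sketch below shows one way to tally them (it does not reproduce the authors' Java programs); each record is an assumed (haplotype group, HLA-B allele, transmitted) tuple, and the resulting tables would then be passed to Fisher’s exact test.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (HLA-DR/DQ haplotype group, HLA-B allele on that chromosome, transmitted to the selected affected child?)
Record = Tuple[str, str, bool]

def tdt_tables(records: Iterable[Record], focal_allele: str) -> Dict[str, List[List[int]]]:
    """Per DR/DQ group: [[focal transmitted, focal not transmitted],
                         [other transmitted, other not transmitted]]."""
    tables: Dict[str, List[List[int]]] = defaultdict(lambda: [[0, 0], [0, 0]])
    for group, b_allele, transmitted in records:
        row = 0 if b_allele == focal_allele else 1
        col = 0 if transmitted else 1
        tables[group][row][col] += 1
    return dict(tables)

# Hypothetical toy records (not study data)
records = [
    ("DRB1*0801-DQB1*0402", "B*3906", True),
    ("DRB1*0801-DQB1*0402", "B*3906", True),
    ("DRB1*0801-DQB1*0402", "B*0801", False),
    ("DRB1*0101-DQB1*0501", "B*3906", False),
]
print(tdt_tables(records, "B*3906"))
```

Running the same function with `focal_allele="B*3901"` gives the corresponding tables for B*3901.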

Relative risk for the DRB1*08-B*39/DRB1*03 genotype was estimated using affected T1DGC individuals (one case per family) and control genotype frequencies calculated from non-transmitted (control) T1DGC chromosomes. We required the B*39 allele to be present on the DRB1*08 chromosome rather than the DRB1*03 chromosome. Relative risk was calculated using GraphPad 4.0, and the absolute risk estimate assumed a background population risk of 1/300.
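The text does not spell out the absolute-risk formula. One standard approach, sketched below under stated assumptions, uses Bayes' theorem: the absolute risk of a genotype is its frequency among cases multiplied by the background risk (here 1/300) and divided by its frequency in the population. We assume that the population frequency is approximated by the control frequency (reasonable for a disease with a background risk of 1/300) and that control genotype frequencies are built from control haplotype frequencies under Hardy-Weinberg equilibrium (2pq). The function names and all inputs in the example are hypothetical.

```python
def expected_genotype_freq(p_hap1: float, p_hap2: float) -> float:
    """Frequency of a heterozygous diplotype from two haplotype frequencies, assuming HWE (2pq)."""
    return 2 * p_hap1 * p_hap2

def absolute_risk(freq_in_cases: float, freq_in_population: float, prevalence: float) -> float:
    """Bayes' theorem: P(T1D | genotype) = P(genotype | T1D) * P(T1D) / P(genotype)."""
    return freq_in_cases * prevalence / freq_in_population

# Hypothetical inputs chosen only to illustrate the arithmetic (not study estimates)
f_pop = expected_genotype_freq(0.01, 0.05)   # control haplotype frequencies give a 0.1% genotype frequency
risk = absolute_risk(0.015, f_pop, 1 / 300)  # genotype assumed present in 1.5% of cases
print(f"{risk:.1%}")                          # 5.0%
```

Under this relation, a 5% absolute risk at a 1/300 background corresponds to a genotype that is 15 times more frequent in cases than in the population (0.05 × 300 = 15).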

For Fig. 1 and electronic supplementary material (ESM) Figs 1–3, consensus sequences were identified using an automated version of the process previously described for evaluating extended haplotypes [21]. Briefly, given a haplotype matrix of contiguous SNPs across a set of chromosomes, we identified the most common SNP sequence over a small window (e.g. 30 SNPs). We then narrowed the window (e.g. to ten SNPs) and repeated the process, thereby deriving a consensus sequence for the entire range. As the sequence emerged, we scored each chromosome for identity with it, and chromosomes that did not sufficiently match the consensus were eliminated from further consideration. A consensus sequence was derived for the B*3906 chromosomes in the region surrounding the HLA-B gene, and the other groups of chromosomes were then compared with this B*3906 consensus in Fig. 1 and ESM Figs 1–3.
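The consensus procedure summarised above can be sketched in Python as follows. Only an outline of the published method [21] is given here, so the details are our assumptions: chromosomes are encoded as equal-length strings of '0'/'1' with '?' for unknown or unphased SNPs, each window's consensus is its most common fully called sub-sequence, identity is scored over called SNPs only, and the 0.9 identity threshold is hypothetical. The narrowing step is modelled as a second pass with a smaller window over the chromosomes that survived the first.

```python
from collections import Counter
from typing import List, Tuple

def consensus(chroms: List[str], window: int, min_identity: float = 0.9) -> Tuple[str, List[int]]:
    """Window-by-window majority consensus, dropping chromosomes that stop matching it.

    Returns the consensus string and the indices of chromosomes still matching it.
    """
    n_snps = len(chroms[0])
    active = list(range(len(chroms)))
    matches = [0] * len(chroms)  # called SNPs agreeing with the consensus so far
    called = [0] * len(chroms)   # called SNPs compared so far
    pieces = []
    for start in range(0, n_snps, window):
        end = min(start + window, n_snps)
        # Most common fully called sub-sequence among the chromosomes still in play
        candidates = [chroms[i][start:end] for i in active if "?" not in chroms[i][start:end]]
        best = Counter(candidates).most_common(1)[0][0] if candidates else "?" * (end - start)
        pieces.append(best)
        for i in active:
            for a, b in zip(chroms[i][start:end], best):
                if a != "?" and b != "?":
                    called[i] += 1
                    matches[i] += a == b
        # Keep chromosomes whose identity with the emerging consensus is high enough
        active = [i for i in active if called[i] == 0 or matches[i] / called[i] >= min_identity]
    return "".join(pieces), active

chroms = ["0101", "0101", "0111", "1010"]  # toy haplotypes, four SNPs each
cons, keep = consensus(chroms, window=2)
print(cons, keep)  # 0101 [0, 1]

# Narrowing step: repeat with a smaller window on the surviving chromosomes
survivors = [chroms[i] for i in keep]
cons_fine, keep_fine = consensus(survivors, window=1)
```

On real data the two passes would use windows of about 30 and ten SNPs, as in the text; the toy example uses tiny windows only so that the result can be checked by hand.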

Fig. 1

Comparison of HLA-B alleles based on SNPs. Each column is a chromosome and each row is a SNP. a HLA-B*3906 chromosomes compared with the HLA-B*3906 consensus (124 chromosomes, 219 SNPs, 0.26 Mb). b HLA-B*3901 chromosomes compared with the HLA-B*3906 consensus (80 chromosomes, 219 SNPs, 0.26 Mb). c HLA-B*1402 chromosomes compared with the HLA-B*3906 consensus (144 chromosomes, 219 SNPs, 0.26 Mb). The same consensus sequence, based on B*3906 chromosomes, was used in a–c. The chromosomes are organised by HLA-DR/DQ type in each panel, and only the region surrounding the HLA-B gene is shown. Yellow boxes show that the allele matches that of the B*3906 consensus, whereas blue represents the opposite allele. The telomeric end of the MHC is at the top of the graph. Type 1 diabetes case chromosomes are on the left and control chromosomes are on the right.

The number of chromosomes differs between Fig. 1 and the ESM figures because a chromosome is excluded from the congruence analysis if more than 20% of the SNPs within the range considered are unknown or unphased. Because Fig. 1 and the ESM figures cover different ranges, different chromosomes fail this criterion in each.
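The 20% exclusion rule depends on the range examined, which is why the chromosome counts differ between figures. A minimal Python sketch, assuming a '?' encoding for unknown or unphased SNPs (the function name and example chromosome are hypothetical):

```python
def passes_missingness(chrom: str, start: int, end: int, max_missing: float = 0.20) -> bool:
    """True if at most `max_missing` of the SNPs in chrom[start:end] are unknown/unphased ('?')."""
    segment = chrom[start:end]
    if not segment:
        return False  # an empty range cannot be assessed
    return segment.count("?") / len(segment) <= max_missing

chrom = "0" * 10 + "?" * 10  # hypothetical chromosome: first 10 SNPs called, last 10 unknown
print(passes_missingness(chrom, 0, 10))  # True  (0% missing in a narrow range)
print(passes_missingness(chrom, 0, 20))  # False (50% missing in the wider range)
```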

Results

We interrogated a readily available dataset from the T1DGC to examine the risk associated with the two most common HLA-B*39 alleles, B*3906 and B*3901. The B*3906 allele is present in 2.6% of case and 0.4% of control chromosomes, whereas B*3901 is present in 1.3% of case and 1.0% of control chromosomes. With founder chromosomes classified as case or control by the AFBAC method, and with all HLA-DR/DQ haplotypes combined so that only the overall risk of each allele is assessed, the B*3906 allele is associated with very high risk for type 1 diabetes (127/141 B*3906 [90% case chromosomes] vs 4963/7650 non-B*3906 [65% case], p = 1.1 × 10^−11, OR 4.9). In the same combined analysis, the B*3901 allele is not significantly associated with type 1 diabetes (68/95 B*3901 [72% case] vs 5022/7696 non-B*3901 [65% case], p = 0.2, OR 1.3). We next investigated the associations of specific HLA-DR/DQ haplotypes with B*3901 and B*3906 (Table 1). Most B*3901 and B*3906 alleles are concentrated on relatively few HLA-DR/DQ haplotypes: B*3906 is present almost exclusively on DRB1*0101-DQB1*0501, DRB1*0801-DQB1*0402 and DRB1*0401-DQB1*0302 haplotypes, whereas B*3901 is present on DRB1*0101-DQB1*0501 and DRB1*1601-DQB1*0502 haplotypes.

Table 1 Distribution of B*3901 and B*3906 alleles on HLA-DR/DQ haplotypes

We then assessed the risk of each B*39 allele within these HLA-DR/DQ haplotypes, rather than across all haplotypes combined as above. B*3906 is associated with case chromosomes on DRB1*0801-DQB1*0402 (p = 1.6 × 10^−6, OR 25.4; Table 2) and on DRB1*0101-DQB1*0501 haplotypes (p = 4.9 × 10^−5, OR 10.3). There is a trend toward increased risk with B*3906 on other HLA-DR/DQ haplotypes, with the exception of DRB1*0401-DQB1*0302, on which B*3906 confers no increased risk (p = 0.7, OR 0.8). The B*3901 allele is associated with case chromosomes on the DRB1*1601-DQB1*0502 haplotype (p = 3.7 × 10^−3, OR 7.2; Table 3) but is not significantly associated with diabetes on other HLA-DR/DQ haplotypes. Analysis of the transmission of these B*39-bearing haplotypes to one affected child per family is concordant with the case/control analyses (Tables 4 and 5). Thus, both B*3906 and B*3901 significantly enhance type 1 diabetes risk, each on specific HLA-DR/DQ haplotypes.

Table 2 The HLA-B*3906 allele is significantly associated with type 1 diabetes on specific HLA-DR/DQ haplotypes
Table 3 The HLA-B*3901 allele is significantly associated with type 1 diabetes on specific HLA-DR/DQ haplotypes
Table 4 Transmission of the B*3906 allele
Table 5 Transmission of the B*3901 allele

The B*3906 and B*3901 amino acid sequences are closely related, and their nearest non-B*39 HLA-B allele is B*1402 [22]. We therefore investigated these three alleles in more detail. Only two amino acids differ between B*3906 and B*3901, both in the base of the peptide-binding pocket (B*3906 to B*3901: threonine to arginine and tryptophan to leucine). B*1402 and B*3906 differ by seven amino acids, five of which appear to be in the peptide-binding region. Despite this similarity at the amino acid level, the B*1402 allele does not enhance diabetes risk (82/163 B*1402 [50% case] vs 5008/7628 non-B*1402 [66% case], p = 8.2 × 10^−5, OR 0.5). B*1402 is marginally associated with decreased type 1 diabetes risk on DRB1*0701-DQB1*0201 and DRB1*0301-DQB1*0201 haplotypes (ESM Table 1).
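Counting residue differences between aligned sequences, and flagging those that fall in the peptide-binding region, is simple bookkeeping. The Python sketch below illustrates only that bookkeeping: the function name is hypothetical, and the sequences and binding-region positions are toy placeholders, not HLA-B data. A real comparison would use the aligned IMGT/HLA sequences and a defined set of binding-groove positions.

```python
from typing import List, Set, Tuple

def aa_differences(seq1: str, seq2: str, binding_positions: Set[int]) -> List[Tuple[int, str, str, bool]]:
    """(1-based position, residue in seq1, residue in seq2, in binding region?) for each mismatch."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return [(pos, r1, r2, pos in binding_positions)
            for pos, (r1, r2) in enumerate(zip(seq1, seq2), start=1) if r1 != r2]

# Toy sequences and binding positions (placeholders, NOT real HLA-B sequences)
diffs = aa_differences("ACDEFGHIK", "ACTEFGHLK", binding_positions={3})
n_binding = sum(1 for d in diffs if d[3])
print(len(diffs), n_binding)  # 2 1
```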

We analysed SNPs surrounding the HLA-B gene for the three HLA-B alleles of interest to determine their SNP-level differences (Fig. 1a–c). In these panels, each column is one chromosome (B*3906, B*3901 and B*1402 chromosomes, respectively) and each row is one SNP. Only the region surrounding the HLA-B gene is shown (full chromosome plots are given in ESM Figs 1–3). All three groups of chromosomes are compared with a common B*3906 consensus sequence: yellow boxes represent SNP alleles that match the consensus and blue boxes represent SNP alleles that do not. All three groups (B*3906, B*3901 and B*1402) are congruent (i.e. match) over a 59,530 base pair region surrounding HLA-B, even though B*3906 and B*3901 are associated with higher risk and B*1402 is not. B*3901 and B*3906 chromosomes are congruent (based on SNPs) with each other for 142,861 base pairs surrounding HLA-B. We typed the MICA gene, which lies within this region of SNP identity, in a subset of families enriched for B*39. Of the individuals with a B*39 allele (a mixture of patients and unaffected members of T1DGC families), 100% (26/26) had the MICA 9 allele, compared with 17% (54/314) of individuals without B*39 (p = 1.3 × 10^−18).
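A region of SNP congruence can be measured as the run of agreeing SNPs around a focal position. The Python sketch below (hypothetical function name and toy data) returns the base-pair positions of the first and last agreeing SNP in that run. The text does not state whether the reported lengths are measured between the outermost matching SNPs or between the flanking mismatches, so this is one assumed convention; treating unknown alleles ('?') as non-contradicting is also an assumption.

```python
from typing import List, Tuple

def congruent_span(positions: List[int], hap_a: str, hap_b: str, focal: int) -> Tuple[int, int]:
    """Base-pair bounds of the contiguous run of agreeing SNPs that contains index `focal`.

    positions: sorted SNP base-pair positions; hap_a/hap_b: allele strings at those SNPs.
    """
    def agree(i: int) -> bool:
        # Unknown alleles are treated as not contradicting the other haplotype
        return hap_a[i] == hap_b[i] or "?" in (hap_a[i], hap_b[i])

    if not agree(focal):
        raise ValueError("haplotypes differ at the focal SNP")
    left = focal
    while left > 0 and agree(left - 1):
        left -= 1
    right = focal
    while right < len(positions) - 1 and agree(right + 1):
        right += 1
    return positions[left], positions[right]

# Toy data: six SNPs, haplotypes disagree at the first and last SNP
positions = [100, 250, 400, 700, 900, 1200]
start_bp, end_bp = congruent_span(positions, "010110", "110111", focal=2)
print(start_bp, end_bp, end_bp - start_bp)  # 250 900 650
```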

Discussion

HLA-B*39 risk has been well defined in the literature [3, 4, 7, 8]. With the characterisation of conserved extended haplotypes in type 1 diabetes risk [18, 23], studies of the genetic context of HLA-B*39 alleles (via haplotype and SNP analysis) have become increasingly important for characterising haplotypic risk for type 1 diabetes and the MHC regions involved in determining risk [4, 24, 25]. Here we confirm previously reported haplotypic associations of HLA-B*39 alleles and place these associations in the context of high-density SNP analysis.

B*3906 is associated with increased risk on DRB1*0801-DQB1*0402 and DRB1*0101-DQB1*0501 haplotypes, whereas B*3901 is associated with increased risk on DRB1*1601-DQB1*0502 haplotypes. Although both B*39 alleles are associated with increased risk, B*1402, their nearest non-B*39 HLA-B allele by amino acid sequence, is not [22]. Even a single amino acid difference, particularly in the peptide-binding region, could influence the disease risk associated with an allele. Thus, although B*1402 and B*3906 differ by only seven amino acids (five in the peptide-binding region), those changes could account for the marked difference in risk seen in this analysis. Likewise, the two amino acid differences between B*3906 and B*3901, both in the peptide-binding region, could result in differences in disease risk. We therefore hypothesise either that the increased risk associated with B*39 is due to the amino acid differences between B*3901/B*3906 and B*1402, or that it is associated with polymorphisms at which B*3906 and B*3901 are congruent for almost all SNPs but do not match the B*1402 SNP haplotype. Gene-conversion events between alleles of the HLA-B gene have been reported previously; such events are one possible explanation for the highly similar SNP sequences surrounding the HLA-B gene for the B*3906, B*3901 and B*1402 alleles [26–28].

There are five genes and pseudogenes in the 142,861 base pair region of near SNP identity between the B*3906 and B*3901 haplotypes: HLA-B, DHFRP2 (dihydrofolate reductase pseudogene 2), HLA-S (pseudogene), MICA (MHC class I polypeptide-related sequence A) and HLA-X (pseudogene). In principle, any of these genes, or even variation in non-coding sequence in the region, could contribute to the increased risk associated with B*3901 and B*3906.

This work has limitations. First, B*39 is not a common allele: we see it in only 3% of individuals with type 1 diabetes in this dataset. Consequently, we cannot tell whether B*3906 haplotypes confer greater risk than B*3901 haplotypes, as the only HLA-DR/DQ haplotype common to both alleles is DRB1*0101-DQB1*0501, and the numbers in this group are too small to distinguish the risks associated with the two alleles. A larger dataset is therefore needed to fully differentiate them. B*3906 has previously been described as high risk on the DRB1*0404-DQB1*0302 haplotype [8, 29, 30]. Our results are in the same direction, but a larger dataset would be needed to confirm the effect (DRB1*0404-DQB1*0302: 9/9 B*3906 [100% of B*3906 are case chromosomes] compared with 249/304 non-B*3906 [82% of non-B*3906 are case chromosomes], p = 0.4, OR 4.2).
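An odds ratio cannot be computed directly when one cell of the table is zero, as for the 9/9 B*3906 chromosomes above. The text does not state which convention was used, but adding 0.5 to every cell (the Haldane-Anscombe correction) reproduces the reported OR of 4.2, so the Python sketch below adopts that assumption; without the correction, the same formula reproduces the OR of 0.82 reported for B*3906 on DRB1*0401-DQB1*0302. The function name is hypothetical.

```python
def odds_ratio_from_counts(case_with: int, total_with: int,
                           case_without: int, total_without: int,
                           haldane: bool = False) -> float:
    """Odds of being a case chromosome with vs without the allele.

    haldane=True adds 0.5 to every cell (Haldane-Anscombe correction),
    which keeps the ratio finite when one cell is zero.
    """
    a, b = case_with, total_with - case_with           # allele present: case, control
    c, d = case_without, total_without - case_without  # allele absent: case, control
    if haldane:
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    return (a * d) / (b * c)

# DRB1*0404-DQB1*0302: 9/9 B*3906 vs 249/304 non-B*3906 (zero control cell)
print(round(odds_ratio_from_counts(9, 9, 249, 304, haldane=True), 1))  # 4.2
# DRB1*0401-DQB1*0302: 17/19 B*3906 vs 1034/1134 non-B*3906 (no zero cell)
print(round(odds_ratio_from_counts(17, 19, 1034, 1134), 2))            # 0.82
```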

We did not observe a difference in risk due to the B*3906 allele on the DRB1*0401-DQB1*0302 haplotype (DRB1*0401-DQB1*0302: 17/19 B*3906 [89% of B*3906 are case chromosomes] compared with 1034/1134 non-B*3906 [91% of non-B*3906 are case chromosomes], p = 0.7, OR 0.82). It is possible that the risk conferred by DRB1*0401-DQB1*0302 is so high that we are unable to see an incremental effect of B*3906. In addition, because the B*3906-DRB1*0401-DQB1*0302 group is small, the study may have been underpowered to detect an association. Similarly, the B*3901 allele is rare (1.2%). Given the limited number of chromosomes carrying B*39, even in this large T1DGC dataset, the current study should be considered hypothesis-generating.

To replicate our findings from the T1DGC dataset, we used a large, publicly available dataset from the Wellcome Trust Case Control Consortium (WTCCC) [5] (ESM Table 2). Diplotypes containing high-risk DRB1-DQB1 haplotypes (namely DRB1*0801-DQB1*0402, DRB1*0101-DQB1*0501, DRB1*0301-DQB1*0201, DRB1*0401-DQB1*0302 and DRB1*1601-DQB1*0502) were evaluated for the presence of HLA-B*3906 and HLA-B*3901. Diplotypes containing DRB1*0801-DQB1*0402 (but none of the other high-risk haplotypes) were more likely to carry HLA-B*3906 in cases than in controls (case 42% [8/19], control 6% [3/53], p = 6.3 × 10^−4, OR 12). Similar results were found for genotypes containing DRB1*0101-DQB1*0501 (case 9% [7/81], control 2% [7/367], p = 5.8 × 10^−3, OR 5) and DRB1*0301-DQB1*0201 (case 5% [25/502], control 1% [9/628], p = 6.7 × 10^−4, OR 4). Our finding that HLA-B*3906 is not associated with increased risk on the DRB1*0401-DQB1*0302 haplotype was also confirmed, with a similar frequency of HLA-B*3906 in cases (3% [5/181]) and controls (2% [3/186], p = 0.5).

Our analyses of both the T1DGC and WTCCC datasets therefore indicate that HLA-B*3906 is high risk on specific DRB1-DQB1 haplotypes. At a practical level, identifying specific HLA-DR/DQ haplotypes that also carry B*39 will aid type 1 diabetes genetic prediction for individuals, particularly in a research setting. For quantitative risk estimation and clinical application, genotypes (i.e. diplotypes, or both HLA haplotypes) are critical determinants, especially high-risk diplotypes such as DR3/4 with HLA-B*3906. Because B*39 alleles are uncommon among patients with type 1 diabetes, identifying them will have little impact on the prediction of overall type 1 diabetes risk (as illustrated by an ROC curve). In particular, the absence of a B*39 allele would not appreciably decrease the overall risk of type 1 diabetes. In contrast, the very high odds ratios of B*39 in combination with specific HLA-DR/DQ haplotypes (e.g. DRB1*0801-DQB1*0402) indicate that, for individuals carrying these genotypes, the presence of B*39 could substantially increase risk. The absolute diabetes risk of the DRB1*08-B*39/DRB1*03 genotype is estimated to be 5%, compared with 0.26% for the same genotype without B*39. This 5% risk is comparable with that of the highest-risk genotype, DR3/4.

From a theoretical point of view, if the effects of B*39 alleles depend both on specific HLA-DR/DQ haplotypes and on the specific B*39 sequence, a potentially complex model of pathogenesis is suggested. The leading hypothesis is that B*39 alleles present specific islet peptides to CD8 T lymphocytes: HLA-DR/DQ alleles might determine CD4 T cell targeting of specific autoantigens, and B*39 alleles would then enhance CD8 T cell targeting of peptides from these autoantigens. Alternatively, class I polymorphisms may even influence thymic selection and the repertoire of CD4 T cell receptors. Understanding the mechanism by which B*39 alleles enhance diabetes risk on specific HLA-DR/DQ haplotypes will likely contribute to our overall understanding of the loss of tolerance leading to type 1 diabetes.