Current Diabetes Reports

2011, 11:445

The Past, Present, and Future of Genetic Associations in Type 1 Diabetes

Authors

  • Peter R. Baker II
    • The Barbara Davis Center for Childhood Diabetes, University of Colorado Denver
  • A.K. Steck
    • The Barbara Davis Center for Childhood Diabetes, University of Colorado Denver

DOI: 10.1007/s11892-011-0212-0

Cite this article as:
Baker, P.R. & Steck, A.K. Curr Diab Rep (2011) 11: 445. doi:10.1007/s11892-011-0212-0

Abstract

Type 1 diabetes mellitus (T1DM) is an autoimmune disease affecting approximately one in 300 individuals in the United States. The majority of genetic research to date has focused on the heritability that predisposes to islet autoimmunity and T1DM. The evidence so far points to T1DM being a polygenic, common, complex disease with major susceptibility lying in the major histocompatibility complex (MHC) on chromosome 6 and smaller effects seen in loci outside the MHC. With recent advances in technology, novel means of exploring the human genome have yielded new information about the development of T1DM. The newest technologies, namely high-throughput polymorphism typing and sequencing, have led to a paradigm shift in studying common diseases such as T1DM. In this review we highlight the advances in genetic associations in T1DM over the last several decades and how they have led to a better understanding of T1DM pathogenesis.

Keywords

Type 1 diabetes · Autoimmune diseases · Major histocompatibility complex · Human leukocyte antigen · Genetic association study · Linkage study · Candidate gene · Extended haplotype · Single nucleotide polymorphism · Genome-wide association study · Whole exome sequencing

Introduction

The last several decades have produced some of the greatest advances in understanding the genetic basis of common human diseases such as type 1 diabetes mellitus (T1DM). Much of this progress can be attributed to the exploration and characterization of the human genome. Although the techniques used to study the genome have changed over the years, the basic experimental design has not. Genetic association studies have provided insight into disease risk, pathogenesis, and treatment by comparing genetic material between a case group and a control group. Regions found more often in the case group can be termed “risk regions,” and those found more often in the control group can be thought of as “protective regions.”

For T1DM, as with other “common” diseases, polygenicity and the contribution of environmental factors have posed great challenges in determining the genetic cause(s) of disease. Association studies of the major histocompatibility complex (MHC) on chromosome 6 began in the 1970s and identified the HLA region as the greatest contributor to diabetes risk. Refinement of risk loci inside and outside the MHC continued in the 1980s and 1990s with linkage studies, candidate gene sequencing, and eventually the use of single nucleotide polymorphisms (SNPs) in a more targeted comparison of cases versus controls (Fig. 1). Today, large-scale, well-powered genome-wide association studies (GWAS), which examine hundreds of thousands of SNPs across the genome, have led to a greater understanding of T1DM pathogenesis. However, these new technologies have also created greater challenges in data quality control, statistical analysis, and interpretation of results. As T1DM genetics research moves forward, techniques that examine the entire human genome (via high-throughput sequencing) are becoming more cost effective and applicable to larger populations. Interpreting the results of these studies is a challenge that must be addressed before GWAS and high-throughput sequencing can be applied in the clinical setting.
Fig. 1

Time trends in type 1 diabetes mellitus (T1DM) genetic associations. Changes in study methods have dramatically influenced the rate of genetic discoveries in T1DM, but the magnitude of the genetic effect found has diminished. The results of whole-exome sequencing are still under investigation. GWAS—genome-wide association studies; MHC—major histocompatibility complex; OR—odds ratio

Here, we explore the goals, methods, and results of T1DM genetic studies through the last several decades, and highlight some of the factors that have contributed to our understanding of T1DM pathogenesis. We will also explore the most current understanding of, and technologies used in the study of, T1DM genetics.

The MHC Region

The earliest studies in the genetics of T1DM found a strong association with the MHC region, which has proven to be the most influential region in diabetes pathogenesis. It is estimated that approximately 50% of the genetic risk for T1DM is attributable to the MHC region. This 4-Mb region on the short arm of chromosome 6 (6p21.3) contains more than 200 genes, 40% of which are immune related [1, 2]. The region also displays strong linkage disequilibrium (LD) and is highly polymorphic, making it difficult to pinpoint a single gene associated with the development of T1DM.

The earliest studies located the major T1DM susceptibility locus in the class II HLA region. Effects of class II alleles are consistent across ethnic groups despite large differences in allele frequencies [3]. The DRB1-DQB1 haplotypes DR3-DQ2 (DRB1*0301-DQB1*0201) and DR4-DQ8 (DRB1*04-DQB1*0302) have long been associated with high risk of developing T1DM, especially in Caucasians. Together, as the genotype DR3-DQ2/DR4-DQ8, these two haplotypes are present in 2.4% of Denver newborns (versus 30% to 40% of all T1DM patients) and carry an absolute risk of 1/15, compared with 1/300 in the general population [4, 5]. In long-term follow-up studies of children selected for either high-risk HLA or a first-degree relative with T1DM, 41% of DR3/4 siblings and 16% of DR3/4 offspring had positive islet autoantibodies by age 7 years [6]. Further, among DR3/4 siblings sharing both HLA haplotypes identical by descent with their affected sibling, 63% had positive autoantibodies by age 7 and 85% by age 15. In contrast, only 20% of DR3/4 siblings sharing 0 or 1 haplotype identical by descent developed autoantibodies [6]. For individuals in the general population carrying this high-risk class II genotype, the risk after further stratification for protective DRB1*04 alleles and family history exceeds 20% for T1DM and 45% for autoantibodies [7]. Interestingly, DR3 or DR4 haplotypes in homozygous form (DR3/DR3 or DR4/DR4) confer lower risk than the DR3/4 genotype. The mechanism of this heterozygote excess risk is not completely understood, but it has been hypothesized that the DQA1*0501 allele of DR3 haplotypes and the DQB1*0302 allele of DR4 haplotypes combine in trans to create a “chimeric” DQ molecule (DQA1*0501, DQB1*0302) for antigen presentation that increases risk for diabetes [8].
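The population figures above can be cross-checked with Bayes' rule: the absolute risk for a genotype equals the fraction of patients carrying it, times the overall prevalence, divided by the genotype's population frequency. The sketch below is a back-of-the-envelope calculation, not the method behind [4, 5]; it assumes that the Denver newborn frequency (2.4%) and the general-population risk (1/300) describe the same population.

```python
def absolute_risk(p_genotype_given_case: float,
                  prevalence: float,
                  p_genotype_population: float) -> float:
    """Bayes' rule: P(T1DM | genotype) = P(genotype | T1DM) * P(T1DM) / P(genotype)."""
    return p_genotype_given_case * prevalence / p_genotype_population


PREVALENCE = 1 / 300      # general-population risk quoted in the text
DR34_FREQUENCY = 0.024    # DR3-DQ2/DR4-DQ8 genotype frequency in Denver newborns

# 30% to 40% of T1DM patients carry the DR3/4 genotype
for share_of_patients in (0.30, 0.40):
    risk = absolute_risk(share_of_patients, PREVALENCE, DR34_FREQUENCY)
    print(f"{share_of_patients:.0%} of patients: absolute risk {risk:.3f} "
          f"(about 1 in {1 / risk:.0f}), {risk / PREVALENCE:.1f} times the population risk")
```

With these inputs the calculation gives about 1 in 24 to 1 in 18, the same order of magnitude as the published 1/15, which comes from the cited studies rather than from this simplified arithmetic.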

Some MHC class II alleles are protective. DQB1*0602 is present in 20% of the general population but in only 1% of children with T1DM [9]. Other protective alleles include DRB1*0403 (even when DQB1*0302 is present) and DRB1*1401 [8]. Risk varies among DRB1*04 alleles: it is greatest for DRB1*0405-DQB1*0302, followed closely by DRB1*0401-DQB1*0302, and much lower for DRB1*0402-DQB1*0302 and DRB1*0404-DQB1*0302 [8, 10]. Class II alleles of DPB1 (and associated haplotypes) have more recently been examined for protection and risk in T1DM, and they may have effects independent of DRB1-DQB1 haplotypes. Earlier studies showed a predisposing effect of DPB1*0301 and a protective effect of DPB1*0402 [11, 12], whereas more recent and detailed analyses have implicated susceptibility associated with DPA1*0103-DPB1*0301 and DPA1*0103-DPB1*0202, and protection with DPA1*0103-DPB1*0402 and DPA1*0103-DPB1*0101 (Table 1) [13].
Table 1

Summary of the MHC class I and II associations in type 1 diabetes mellitus (P values < 10⁻³)

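MHC class | Allele/haplotype | Association | Odds ratio | P value | Reference
Class II | DPA1*0103-DPB1*0101 | Protection | 0.2 | 2E-10 | Varney et al. [13]
Class II | DPA1*0103-DPB1*0202 | Risk | 2.1 | 2E-03 | Varney et al. [13]
Class II | DPA1*0103-DPB1*0301 | Risk | 1.6 | 5E-08 | Varney et al. [13]
Class II | DPA1*0103-DPB1*0402 | Protection | 0.5 | 7E-15 | Varney et al. [13]
Class II | DRB1*0301-DQA1*0501-DQB1*0201 | Risk | 3.64 | 2E-22 | Erlich et al. [8]
Class II | DRB1*0401-DQA1*0301-DQB1*0301 | Protection | 0.35 | 4E-04 | Erlich et al. [8]
Class II | DRB1*0401-DQA1*0301-DQB1*0302 | Risk | 8.4 | 6E-36 | Erlich et al. [8]
Class II | DRB1*0402-DQA1*0301-DQB1*0302 | Risk | 3.6 | 3E-04 | Erlich et al. [8]
Class II | DRB1*0405-DQA1*0301-DQB1*0302 | Risk | 11.4 | 4E-05 | Erlich et al. [8]
Class II | DRB1*0407-DQA1*0301-DQB1*0301 | Protection | 0.11 | 6E-04 | Erlich et al. [8]
Class II | DRB1*0701-DQA1*0201-DQB1*0201 | Protection | 0.32 | 2E-09 | Erlich et al. [8]
Class II | DRB1*0701-DQA1*0201-DQB1*0303 | Protection | 0.02 | 4E-12 | Erlich et al. [8]
Class II | DRB1*1101-DQA1*0501-DQB1*0301 | Protection | 0.18 | 3E-10 | Erlich et al. [8]
Class II | DRB1*1104-DQA1*0501-DQB1*0301 | Protection | 0.07 | 3E-06 | Erlich et al. [8]
Class II | DRB1*1301-DQA1*0103-DQB1*0603 | Protection | 0.13 | 4E-11 | Erlich et al. [8]
Class II | DRB1*1401-DQA1*0101-DQB1*0503 | Protection | 0.02 | 1E-06 | Erlich et al. [8]
Class II | DRB1*1501-DQA1*0102-DQB1*0602 | Protection | 0.03 | 2E-29 | Erlich et al. [8]
Class I | A*0101 | Protection | 0.77 | 2E-03 | Noble et al. [15]
Class I | A*0201 | Risk | 1.4 | 3E-04 | Noble et al. [15]
Class I | A*1101 | Protection | 0.53 | 8E-06 | Noble et al. [15]
Class I | A*2402 | Risk | 1.7 | 5E-06 | Noble et al. [15]
Class I | A*3201 | Protection | 0.54 | 4E-04 | Noble et al. [15]
Class I | A*6601 | Protection | 0.16 | 9E-04 | Noble et al. [15]
Class I | B*0702 | Protection | 0.58 | 3E-07 | Noble et al. [15]
Class I | B*1801 | Risk | 2.1 | 3E-08 | Noble et al. [15]
Class I | B*3502 | Protection | 0.29 | 5E-04 | Noble et al. [15]
Class I | B*3906 | Risk | 10.3 | 4E-10 | Noble et al. [15]
Class I | B*4403 | Protection | 0.42 | 4E-07 | Noble et al. [15]
Class I | B*5701 | Protection | 0.19 | 4E-11 | Noble et al. [15]
Class I | C*0401 | Protection | 0.63 | 6E-05 | Noble et al. [15]
Class I | C*0501 | Risk | 1.56 | 9E-05 | Noble et al. [15]
Class I | C*1601 | Protection | 0.39 | 5E-06 | Noble et al. [15]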

Data are from the most recently published analyses of the Type 1 Diabetes Genetic Consortium. DPA1-DPB1 and class I associations are adjusted for DRB1-DQB1.

MHC major histocompatibility complex
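Table 1 reports associations as odds ratios: the odds of carrying an allele or haplotype among cases divided by the odds among controls, so OR > 1 is labeled risk and OR < 1 protection. The minimal sketch below computes an OR and a Woolf (log-scale) 95% confidence interval from one 2×2 carrier table. The counts are hypothetical, and the published estimates come from the cited analyses, which (per the footnote) also account for DRB1-DQB1; this unadjusted calculation does not reproduce them.

```python
import math


def odds_ratio_ci(case_carriers: int, case_noncarriers: int,
                  control_carriers: int, control_noncarriers: int,
                  z: float = 1.96) -> tuple[float, float, float]:
    """Odds ratio with a Woolf (log-scale) confidence interval from a 2x2 carrier table."""
    odds_ratio = (case_carriers * control_noncarriers) / (case_noncarriers * control_carriers)
    se_log = math.sqrt(1 / case_carriers + 1 / case_noncarriers
                       + 1 / control_carriers + 1 / control_noncarriers)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


def direction(odds_ratio: float) -> str:
    """Table 1 convention: OR > 1 is labeled risk, OR < 1 protection."""
    if odds_ratio > 1:
        return "Risk"
    if odds_ratio < 1:
        return "Protection"
    return "No association"


# Hypothetical counts (not from Table 1): 60 of 1000 cases and 150 of 1000 controls carry the haplotype
or_, lo, hi = odds_ratio_ci(60, 940, 150, 850)
print(f"OR = {or_:.2f} (95% CI {lo:.2f}-{hi:.2f}): {direction(or_)}")
```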

Over the last several decades, MHC class II allele frequencies among patients with T1DM have changed. The highest-risk genotype, DR3-DQ2/DR4-DQ8, has decreased in frequency over the last four decades, with the greatest decrease among children with onset before age 5 years and a lesser decrease among those with onset between ages 6 and 10 years [14]. Over the same period, HLA genotypes without DR3 or DR4 have increased. There is some evidence of a possible stepwise decrease in the proportion of DR3/4 in the 1980s, which would suggest either an acute environmental change (e.g., infectious, dietary, or pharmacologic) or a rapid rise of diabetes in other ethnic groups in which risk is not associated with DR3-DQ2 or DR4-DQ8 (e.g., the high-risk HLA-DR7 found in African American individuals or HLA-DR9 found in the Japanese). Given the population in the reporting study, the latter is less likely than the former.

MHC class I loci (HLA-A, -B, and -C) play a lesser role in diabetes susceptibility. Certain HLA-A, -B, and -C alleles have independent effects, but LD in the MHC region has made distinguishing class I from class II effects very difficult. Some class I alleles appear to confer high risk, but this effect changes or disappears when LD with HLA class II haplotypes is taken into account. For instance, HLA-A1 appears to be a risk allele for T1DM; however, most of this risk is due to the HLA-DR3 haplotype that accompanies it. When DR3-DQ2 is accounted for, HLA-A1 is actually modestly protective (A*0101 OR 0.77; Table 1) [15]. Likewise, HLA-B8 is very prevalent in individuals with T1DM, but the allele itself has no independent effect apart from DR3-DQ2 [15]. Class I HLA-A alleles with significant (albeit relatively weak) independent effects include A2, A11, A24, A32, and A66 [15]. A3 has also been associated with slower progression to diabetes after the appearance of autoantibodies [16]. HLA-C4, -C5, and -C16 have also been weakly associated with T1DM [15, 17]. HLA-B appears to play a more important role than either HLA-A or HLA-C in T1DM risk (Table 1). HLA-B8 confers lower risk when linked with DR3 (and has no effect when evaluated independently), whereas HLA-B18 confers one of the highest risks both when associated with HLA-DR3 (on the “Basque haplotype” DR3-B18-A30) and independently [15, 17, 18]. HLA-B39 (specifically B*3906) confers high risk in several populations [15, 16]. This risk depends on the presence of specific DRB1-DQB1 haplotypes, namely DRB1*0801-DQB1*0402 and DRB1*0101-DQB1*0501, but not DRB1*0301-DQB1*0201 or DRB1*0401-DQB1*0302 [19]. When DRB1-DQB1 haplotypes are accounted for, HLA-B*3906 has the largest independent effect [15]. Other HLA-B alleles associated with T1DM are B7, B35, B44, and B57 [15].
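The HLA-A1 example is confounding by LD: an allele that travels with DR3 looks like a risk allele in a crude case-control comparison, yet not within groups stratified by DR3. One standard way to adjust for such a stratifying factor is the Mantel-Haenszel pooled odds ratio, sketched below. The counts are hypothetical, chosen only to reproduce the pattern described for HLA-A1, and the analysis in [15] is not necessarily based on this estimator.

```python
from dataclasses import dataclass


@dataclass
class Stratum:
    """2x2 counts for one stratum: carriers / non-carriers of the class I allele."""
    case_carriers: int
    case_noncarriers: int
    control_carriers: int
    control_noncarriers: int

    @property
    def total(self) -> int:
        return (self.case_carriers + self.case_noncarriers
                + self.control_carriers + self.control_noncarriers)

    def odds_ratio(self) -> float:
        return (self.case_carriers * self.control_noncarriers) / (
            self.case_noncarriers * self.control_carriers)


def pooled_crude(strata: list[Stratum]) -> Stratum:
    """Collapse strata into one table, ignoring the stratifying haplotype."""
    return Stratum(sum(s.case_carriers for s in strata),
                   sum(s.case_noncarriers for s in strata),
                   sum(s.control_carriers for s in strata),
                   sum(s.control_noncarriers for s in strata))


def mantel_haenszel_or(strata: list[Stratum]) -> float:
    """Mantel-Haenszel odds ratio: sum(a*d/n) / sum(b*c/n) over strata."""
    numerator = sum(s.case_carriers * s.control_noncarriers / s.total for s in strata)
    denominator = sum(s.case_noncarriers * s.control_carriers / s.total for s in strata)
    return numerator / denominator


# Hypothetical counts: the allele is common on DR3 chromosomes, and DR3 is enriched in cases
dr3_positive = Stratum(300, 200, 120, 60)
dr3_negative = Stratum(40, 460, 90, 730)
strata = [dr3_positive, dr3_negative]

print(f"crude OR:           {pooled_crude(strata).odds_ratio():.2f}")
print(f"OR within DR3+:     {dr3_positive.odds_ratio():.2f}")
print(f"OR within DR3-:     {dr3_negative.odds_ratio():.2f}")
print(f"Mantel-Haenszel OR: {mantel_haenszel_or(strata):.2f}")
```

With these counts the crude OR is about 1.9, while the OR is below 1 within both DR3 strata and the Mantel-Haenszel estimate is about 0.73.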

Knowledge of the role of the MHC region has evolved with technology over the last several decades. Initial evaluation of class I and II MHC alleles relied on serologic testing of the MHC antigen itself. Genetic technology in the 1970s and 1980s made it possible to rapidly detect genotypes in class I and class II loci (via serologic, linear array, and eventually sequence-based techniques), but it is only relatively recently that more in-depth analyses have been undertaken in this region.

SNP Associations in the MHC

Extensive long-range LD between alleles of MHC genes makes it difficult to pinpoint specific genes contributing to risk. SNP-based technology has therefore been invaluable for refining our understanding of the roles that class I and II MHC alleles play in disease risk. Several consortia, including the T1DGC (Type 1 Diabetes Genetic Consortium), have undertaken fine mapping of the MHC region to better understand disease associations in this gene-dense, highly polymorphic, and tightly linked region. In one of the most detailed genotyping studies of the MHC region to date, over 3000 SNPs and 66 microsatellite markers were typed in 2300 type 1 diabetes families (approximately 10,000 individuals) [20]. Besides confirming many previous hypotheses regarding class I and class II allelic risk, this dataset has helped define SNPs lying outside these two major risk regions. It has also given a clearer picture of LD in the region, leading to a better understanding of the involvement of extended haplotypes in T1DM risk.
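For two biallelic markers, LD is commonly summarized by D (the observed haplotype frequency minus the product of the allele frequencies), its normalized form D′, and r². An r² near 1 means one marker can stand in for (tag) the other, which is what makes SNP-based fine mapping of the MHC informative. The minimal sketch below uses hypothetical frequencies and assumes all frequencies lie strictly between 0 and 1; real MHC alleles are multi-allelic and haplotypes usually have to be phased or estimated, which this sketch does not handle.

```python
def ld_statistics(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for alleles A and B at two biallelic loci.

    p_ab: frequency of the haplotype carrying both A and B
    p_a, p_b: frequencies of allele A and allele B (assumed strictly between 0 and 1)
    """
    d = p_ab - p_a * p_b
    # D' scales D by its maximum possible magnitude given the allele frequencies (signed here)
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: allele A (e.g., a class II allele) and allele B (a nearby SNP allele)
d, d_prime, r2 = ld_statistics(p_ab=0.11, p_a=0.15, p_b=0.12)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```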

One of the best-known extended haplotypes (the “8.1 haplotype”) consists of (in order from centromeric to telomeric) DQA1*0501, DQB1*0201, DRB1*0301, HLA-B8, and HLA-A1. It is the most common extended haplotype in the Caucasian population, and copies of it share over 99% SNP identity across the MHC. It is increased in individuals with T1DM (18% vs 9% of Caucasian controls) [21], presumably because of the DRB1*0301 and DQB1*0201 alleles rather than the HLA-B8 and HLA-A1 portion of the haplotype. The DR3-B8-A1 haplotype confers less risk than other DR3 haplotypes, with higher risk found in the less common DR3-B18-A30 (Basque) haplotype as well as in other non-B8, DR3-positive individuals [22, 23]. This implicates susceptibility loci telomeric to the class II alleles, although association studies alone have not been able to pinpoint a specific locus in such a complex region of the genome. Furthermore, recent studies have shown differences in DPB1 alleles (DPB1*0301, *0402, and *0202) when only DR3-B8-A1 chromosomes are examined, indicating a risk locus centromeric to the DQ region [13]. Given the plethora of immune-associated genes in the MHC region and its high degree of LD, higher-resolution SNP typing (or deep sequencing) is a useful technology for refining candidate loci in the MHC. Even so, functional tests exploring MHC molecular interactions will be necessary to confirm which loci are involved in T1DM risk.

Associations Outside the MHC

Non-HLA regions with modest effects relative to the MHC began to be discovered in the 1980s. Typically, linkage analysis and candidate gene studies were used to test associations between suspected regions and disease. Only recently, with the application of SNP analysis and GWAS, have a large number of loci been discovered and validated. Only a handful of genes outside the MHC are currently thought to have a sizable genetic effect, whereas most loci have much smaller effects [24, 25•, 26].

One of the first non-MHC regions of interest to be discovered was the insulin gene (INS), a natural candidate given the known correlation between development of diabetes and anti-insulin antibodies. For almost two decades, a variable number tandem repeat (VNTR) at the 5′ end of the insulin gene was the marker associated with T1DM risk [27]. Longer repeats are protective and are associated with increased insulin mRNA expression in the thymus [28, 29]. The length of the VNTR serves as a marker of disease risk and implicates a region as large as 4000 bp. With the application of SNP association analysis, the insulin VNTR was found to be in strong LD with two SNPs within INS, namely −23HphI and +1140A/C [30]. These SNPs are presumably more closely associated with the causal variation than the VNTR itself, and they illustrate how new technology is refining our understanding of autoimmune pathogenesis.

PTPN22, located on 1p13, encodes LYP (lymphoid tyrosine phosphatase), a protein tyrosine phosphatase. A candidate gene approach, repeatedly validated through GWAS, identified a nonsynonymous SNP at nucleotide 1858 that changes arginine to tryptophan at amino acid position 620 (R620W). This polymorphism results in a gain of function that increases inhibition of T-cell receptor signaling. Numerous groups in many populations have confirmed its association with T1DM, with an odds ratio (OR) of 3.4 for the homozygous form [31, 32]. It is hypothesized that by dampening T-cell receptor signaling, the variant decreases negative selection of autoreactive T cells in the thymus.

CTLA4, located on 2q33, encodes cytotoxic T-lymphocyte–associated protein 4, a known negative regulator of T-cell activation and an attractive candidate gene in T1DM pathogenesis [33]. Early studies revealed functional and transcriptional alterations in CTLA4 alleles carrying one or more common SNPs [34, 35]. Functional studies have confirmed its role in T-cell regulation [36, 37], and associations for CTLA4 have been replicated in multiple studies. CTLA4 polymorphisms have also been associated with other autoimmune diseases, including Graves’ disease and Addison’s disease. The association is stronger for thyroid autoimmunity, and among T1DM patients it is seen primarily in those who also have thyroid autoimmunity [38].

IL2RA/CD25 on 10p15.1 is one of the more recently associated regions, identified via tag SNP and candidate gene analysis [39]. Functional studies have confirmed the role of IL2RA/CD25 in regulating T-cell function [40, 41], and patients with T1DM have been shown to have altered circulating concentrations of CD25 [42]. The IL2RA locus has also been implicated in other autoimmune diseases such as multiple sclerosis. Several other non-MHC loci have been analyzed for association with T1DM, especially since the advent of high-powered GWAS.

Genome-wide Association Studies

GWAS provide information on thousands (and now millions) of SNPs spread across the entire genome. The goal is to detect differences in thousands of genes (or in SNPs in LD with particular alleles of genes) by simultaneously comparing SNP allele frequencies between cases and controls on a “chip” the size of a microscope slide. These studies generate large amounts of data quickly and cost effectively in large numbers of subjects, but they require strict quality control and data curation before interpretation can be attempted. For these reasons, genetics research now increasingly requires collaboration with biostatisticians and computational analysts.
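At a single SNP, the case-control comparison described above reduces to a 2×2 table of allele counts. A minimal sketch of a Pearson allelic chi-square test (1 degree of freedom) follows; the counts are hypothetical, and real GWAS pipelines add genotype quality control and adjustment for confounders such as ancestry, which this sketch omits.

```python
import math


def allelic_chi_square(case_minor: int, case_alleles: int,
                       control_minor: int, control_alleles: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    case_alleles / control_alleles are total allele counts (2 per diploid individual).
    Returns (chi-square statistic, P value).
    """
    a, b = case_minor, case_alleles - case_minor
    c, d = control_minor, control_alleles - control_minor
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail of a 1-df chi-square equals a two-sided normal tail: erfc(sqrt(chi2 / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical SNP: 2000 cases (4000 alleles) and 3000 controls (6000 alleles)
chi2, p = allelic_chi_square(case_minor=1360, case_alleles=4000,
                             control_minor=1800, control_alleles=6000)
print(f"case MAF 0.34 vs control MAF 0.30: chi2 = {chi2:.1f}, P = {p:.1e}")
```

Repeating this test at hundreds of thousands of SNPs is what creates the multiple-testing problem that GWAS significance thresholds must address.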

Over the last several years, GWAS have been the primary method of studying associations in many common diseases without prior hypotheses about specific genes or regions. Besides technological complexity, the main limitation of GWAS lies in the reliability of the results, specifically the rate of false-positive findings. Studies have therefore needed very large populations with well-defined cases and controls, and P values below a genome-wide significance threshold on the order of 10⁻⁸. Researchers have accordingly turned to consortia such as the T1DGC and the Wellcome Trust to achieve adequate numbers to power GWAS. Other potential confounders in GWAS include ethnicity and ancestral proclivity to disease within a population, technical errors in genotyping, errors in phenotype definition, and inappropriate analysis of the data.
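The genome-wide threshold of 5 × 10⁻⁸ used for Table 2 is conventionally motivated by a Bonferroni correction: a family-wise error rate of 0.05 divided by roughly one million effectively independent common-variant tests. The one-million figure is a widely used approximation and an assumption here, not a number from this review. A minimal sketch, applied to three P values from Table 2:

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test P value cutoff that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


# Assumption: about one million effectively independent common-variant tests genome-wide
threshold = bonferroni_threshold(alpha=0.05, n_tests=1_000_000)

# P values transcribed from Table 2 (Smyth et al. [46])
reported = {
    "rs689 (INS)": 8.93e-195,
    "rs2476601 (PTPN22)": 1.13e-88,
    "rs2069763 (IL2)": 1.28e-07,
}
for snp, p in reported.items():
    verdict = "genome-wide significant" if p <= threshold else "not significant on its own"
    print(f"{snp}: P = {p:.2e} -> {verdict} (threshold {threshold:.0e})")
```

The IL2 SNP misses the cutoff in this single report; the locus appears in Table 2 because a second SNP, rs4505848, reached P = 4.7E-13 in Barrett et al. [25•].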

The use of meta-analyses to validate results has become commonplace, because pooling more studies (and thus more individuals) increases power and reduces the likelihood of a false-positive result. For many SNPs and regions of interest discovered initially through GWAS, meta-analyses have provided larger numbers across multiple populations and have therefore validated earlier findings [25•, 43, 44]. Other ways to confirm GWAS findings include functional studies (e.g., animal studies involving the locus of interest), resequencing studies, and replication in another population.
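Pooling gains power because each study's estimate is weighted by its precision. A minimal sketch of one common approach, fixed-effect inverse-variance meta-analysis of log odds ratios, follows. The per-study values are hypothetical, standard errors are back-calculated from 95% confidence intervals, and the cited meta-analyses [25•, 43, 44] may have used different models (for example, random effects).

```python
import math


def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Standard error of log(OR) recovered from a symmetric 95% CI on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def fixed_effect_meta(studies: list[tuple[float, float, float]], z: float = 1.96):
    """Inverse-variance fixed-effect pooling of (OR, CI lower, CI upper) tuples.

    Returns (pooled OR, CI lower, CI upper, two-sided P value).
    """
    total_weight = 0.0
    weighted_sum = 0.0
    for odds_ratio, lower, upper in studies:
        weight = 1 / se_from_ci(lower, upper, z) ** 2   # precision = 1 / variance
        total_weight += weight
        weighted_sum += weight * math.log(odds_ratio)
    pooled_log = weighted_sum / total_weight
    pooled_se = math.sqrt(1 / total_weight)
    p_value = math.erfc(abs(pooled_log / pooled_se) / math.sqrt(2))
    return (math.exp(pooled_log),
            math.exp(pooled_log - z * pooled_se),
            math.exp(pooled_log + z * pooled_se),
            p_value)


# Hypothetical per-study results for one SNP, each individually modest
studies = [(1.15, 1.02, 1.30), (1.10, 0.98, 1.24), (1.18, 1.05, 1.33)]
pooled, lo, hi, p = fixed_effect_meta(studies)
print(f"pooled OR = {pooled:.2f} (95% CI {lo:.2f}-{hi:.2f}), P = {p:.1e}")
```

Each hypothetical study is borderline on its own, but the pooled interval is narrower than any single study's and excludes 1.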

High-density SNP analysis (over 300,000 SNPs per individual) and follow-up meta-analyses have added to the list of regions involved in T1DM. The strongest signals by far are still associated with the MHC (OR > 6), with the next highest non-MHC signals in INS and PTPN22 (both with an OR of ~2) [26]. Other loci confirmed in robust studies across multiple ethnicities include IFIH1, CTLA4, CD25/IL2RA, IL2, PRKCQ, BACH2, PTPN2, ERBB3, UBASH3A, CLEC16A, CTSH, and C1QTNF6 [25•, 42–49]. Even with their limitations, GWAS have robustly yielded over 15 non-MHC loci of interest for T1DM, including validation of previously discovered associations (e.g., INS, PTPN22, and CTLA4) and discovery of novel loci such as CLEC16A and IFIH1. Over 50 non-MHC loci in total have been described and are currently being validated and confirmed (http://www.t1dbase.org) (Table 2) [50•]. Again, many of these loci modify risk only modestly (OR between 0.6 and 2); however, they also help define disease pathogenesis and may point to targets for future therapeutics.
Table 2

Summary of the non-MHC SNP associations in type 1 diabetes mellitus

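Chromosome region | Gene candidate | Position range (GRCh37) | Size of region (Mb) | SNP | P value | Odds ratio | MAF | Reference
1p13.2 | PTPN22 | 113818477-114658477 | 0.84 | rs2476601 | 1.13E-88 | 2.05 | 0.095 | Smyth et al. [46]
1p13.2 | PTPN22 | 113818477-114658477 | 0.84 | rs2476601 | 8.49E-85 | 2.05 | 0.142 | Barrett et al. [25•]
2q24.2 | IFIH1 | 162960873-163392761 | 0.43 | rs1990760 | 2.13E-10 | 0.86 | 0.389 | Smyth et al. [46]
2q24.2 | IFIH1 | 162960873-163392761 | 0.43 | rs1990760 | 6.57E-09 | 0.86 | 0.4 | Barrett et al. [25•]
2q33.2 | CTLA4 | 204672809-204820058 | 0.15 | rs3087243 | 1.27E-14 | 0.82 | 0.452 | Smyth et al. [46]
2q33.2 | CTLA4 | 204672809-204820058 | 0.15 | rs3087243 | 1.21E-15 | 0.88 | 0.452 | Barrett et al. [25•]
4q27 | IL2 | 122909415-123614282 | 0.7 | rs2069763 | 1.28E-07 | 1.13 | 0.329 | Smyth et al. [46]
4q27 | IL2 | 122909415-123614282 | 0.7 | rs4505848 | 4.7E-13 |  |  | Barrett et al. [25•]
6q15 | BACH2 | 90806835-91046297 | 0.24 | rs117555 | 8.57E-09 | 1.13 | 0.465 | Smyth et al. [46]
6q15 | BACH2 | 90806835-91046297 | 0.24 | rs117555 | 5.38E-08 |  |  | Barrett et al. [25•]
6q22.32 | None cited | 126438029-127419834 | 0.98 | rs9388489 | 5.14E-08 | 1.17 | 0.452 | Barrett et al. [25•]
7p15.2 | SKAP2 | 26657962-27205282 | 0.55 | rs7804356 | 3.27E-08 | 0.88 | 0.238 | Barrett et al. [25•]
7p12.2 | IKZF1 | 50366637-50695317 | 0.33 | rs10272724 | 2.60E-09 | 0.87 | 0.28 | Unpublished
9p24.2 | GLIS3 | 4228550-4321558 | 0.09 | rs7020673 | 1.88E-09 | 0.88 | 0.502 | Barrett et al. [25•]
10p15.1 | IL2RA | 6029726-6197536 | 0.17 | rs12722495 | 2.94E-11 | 0.63 | 0.11 | Lowe et al. [42]
10p15.1 | IL2RA | 6029726-6197536 | 0.17 | rs12722495 | 1.74E-30 | 0.62 | 0.113 | Smyth et al. [46]
10p15.1 | PRKCQ | 6435374-6545104 | 0.11 | rs11258747 | 9.84E-09 | 0.69 | 0.13 | Lowe et al. [42]
10p15.1 | PRKCQ | 6435374-6545104 | 0.11 | rs11258747 | 1.16E-07 |  |  | Barrett et al. [25•]
10p15.1 | PRKCQ | 6435374-6545104 | 0.11 | rs947474 | 3.65E-09 | 0.91 | 0.187 | Cooper et al. [44]
10p15.1 | PRKCQ | 6435374-6545104 | 0.11 | rs947474 | 1.48E-05 | 0.88 | 0.187 | Smyth et al. [46]
10p15.1 | PRKCQ | 6435374-6545104 | 0.11 | rs947474 | 5.53E-06 |  |  | Barrett et al. [25•]
10q23.31 | RNLS | 90008047-90278380 | 0.27 | rs10509540 | 6.92E-09 | 0.75 | 0.285 | Barrett et al. [25•]
11p15.5 | INS | 2068424-2308304 | 0.24 | rs689 | 8.93E-195 | 0.42 | 0.293 | Smyth et al. [46]
12q13.2 | None cited | 56351346-56805309 | 0.45 | rs2292239 | 2.22E-25 | 1.31 | 0.34 | Barrett et al. [25•]
12q24.12 | SH2B3 | 111287726-113238728 | 1.95 | rs3184504 | 2.72E-24 | 1.28 | 0.485 | Smyth et al. [46]
12q24.12 | SH2B3 | 111287726-113238728 | 1.95 | rs3184504 | 2.78E-27 | 1.28 | 0.488 | Barrett et al. [25•]
13q32.3 | GPR183 | 99925872-100117793 | 0.19 | rs9585056 | 5.20E-09 | 1.15 | 0.25 | Heinig et al. [48]
14q24.1 | None cited | 69167738-69318062 | 0.15 | rs1465788 | 1.37E-08 | 0.86 | 0.287 | Barrett et al. [25•]
14q32.2 | DLK1 | 101288031-101328739 | 0.04 | rs941576 | 1.62E-10 | 0.9 | 0.43 | Wallace et al. [49]
15q25.1 | None cited | 78986805-79263361 | 0.28 | rs3825932 | 3.17E-15 | 0.86 | 0.318 | Cooper et al. [44]
15q25.1 | None cited | 78986805-79263361 | 0.28 | rs3825932 | 4.62E-10 | 0.86 | 0.318 | Smyth et al. [46]
15q25.1 | None cited | 78986805-79263361 | 0.28 | rs3825932 | 7.65E-08 |  |  | Barrett et al. [25•]
16p13.13 | None cited | 10924559-11246558 | 0.32 | rs12708716 | 3.19E-13 | 0.81 | 0.351 | Smyth et al. [46]
16p13.13 | None cited | 10924559-11246558 | 0.32 | rs12708716 | 2.17E-16 |  |  | Barrett et al. [25•]
16p11.2 | IL27 | 28283735-29036915 | 0.75 | rs4788084 | 5.19E-08 | 0.86 | 0.424 | Barrett et al. [25•]
16q23.1 | None cited | 75202730-75528511 | 0.33 | rs7202877 | 5.71E-11 | 1.28 | 0.096 | Barrett et al. [25•]
17q21.2 | None cited | 38737374-38878474 | 0.14 | rs7221109 | 9.89E-10 | 0.95 | 0.353 | Barrett et al. [25•]
18p11.21 | PTPN2 | 12736556-12926278 | 0.19 | rs478582 | 8.83E-12 | 0.83 | 0.449 | Smyth et al. [46]
18p11.21 | PTPN2 | 12736556-12926278 | 0.19 | rs478582 | 2.18E-12 |  |  | Barrett et al. [25•]
18p11.21 | PTPN2 | 12736556-12926278 | 0.19 | rs45450798 | 1.15E-16 | 1.28 | 0.166 | Smyth et al. [46]
18q22.2 | CD226 | 67479515-67571610 | 0.09 | rs763361 | 1.56E-08 | 1.16 | 0.471 | Smyth et al. [46]
18q22.2 | CD226 | 67479515-67571610 | 0.09 | rs763361 | 1.24E-05 | 1.16 | 0.465 | Barrett et al. [25•]
19p13.2 | TYK2 | 10395447-10628468 | 0.23 | rs2304256 | 4.13E-09 | 0.86 | 0.293 | Wallace et al. [49]
21q22.3 | UBASH3A | 43808809-43888353 | 0.08 | rs3788013 | 3.09E-08 | 1.13 | 0.433 | Smyth et al. [46]
21q22.3 | UBASH3A | 43808809-43888353 | 0.08 | rs3788013 | 2.09E-06 |  |  | Barrett et al. [25•]
22q12.2 | None cited | 29807855-30669883 | 0.86 | rs5753037 | 1.83E-14 | 1.1 | 0.391 | Barrett et al. [25•]
22q13.1 | IL2RB | 37568670-37666786 | 0.1 | rs229541 | 1.98E-08 | 1.11 | 0.427 | Cooper et al. [44]
22q13.1 | IL2RB | 37568670-37666786 | 0.1 | rs229541 | 6.96E-07 | 1.12 | 0.428 | Smyth et al. [46]
22q13.1 | IL2RB | 37568670-37666786 | 0.1 | rs229541 | 2.07E-07 |  |  | Barrett et al. [25•]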

Adapted from the online data resource “T1DBase.org” (Burren et al. [50•]), a collaborative effort between Cambridge University and the Type 1 Diabetes Genetic Consortium

All SNP data and P values are based on genome-wide association studies. SNPs listed were required to have at least one \( P \leq 5 \times 10^{-8} \). Odds ratio, MAF, and candidate gene in the SNP region are listed when available

MAF minor allele frequency, MHC major histocompatibility complex, SNP single nucleotide polymorphism
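The “Size of region” column in Table 2 is the span of the GRCh37 position range in megabases, and the footnote's inclusion rule is a threshold on P. The small sketch below recomputes both for a few rows transcribed from the table, using the smallest reported P value for each SNP; the tuple layout is illustrative and is not a T1DBase export format.

```python
# A few rows transcribed from Table 2: (gene, chromosome region, GRCh37 range, SNP, smallest P, OR)
ROWS = [
    ("PTPN22", "1p13.2", "113818477-114658477", "rs2476601", 1.13e-88, 2.05),
    ("IL2RA", "10p15.1", "6029726-6197536", "rs12722495", 1.74e-30, 0.62),
    ("INS", "11p15.5", "2068424-2308304", "rs689", 8.93e-195, 0.42),
    ("SH2B3", "12q24.12", "111287726-113238728", "rs3184504", 2.78e-27, 1.28),
]
GENOME_WIDE = 5e-8  # inclusion threshold from the Table 2 footnote


def region_size_mb(position_range: str) -> float:
    """Span of a 'start-end' base-pair range, in megabases."""
    start, end = (int(x) for x in position_range.split("-"))
    return (end - start) / 1_000_000


for gene, region, span, snp, p, odds in ROWS:
    direction = "OR > 1" if odds > 1 else "OR < 1"
    print(f"{gene:<7}{region:<10}{region_size_mb(span):5.2f} Mb  {snp:<11}"
          f"P = {p:.2e}  {direction}  meets threshold: {p <= GENOME_WIDE}")
```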

Recently, the Immunochip initiative (Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK) was launched by taking SNPs of interest from multiple previous GWAS of multiple immune-mediated diseases and combining them on a single 200,000-SNP, Illumina-based platform (Illumina, San Diego, CA). Data are currently being generated with this technology for consortia such as the T1DGC. The Immunochip includes deep coverage of the MHC region (> 6000 SNPs), replication of signals from previous GWAS, and a plethora of nonsynonymous SNPs in immune-related genes not previously targeted. This “disease-based” approach will allow well-powered distinction among related immunologic diseases (e.g., T1DM, thyroid disease, celiac disease, and Addison’s disease) without the hypothesis-free, genome-wide scope of the traditional GWAS. Similar “custom” chips, containing up to 5 million SNPs, are being used across many fields of genetic research.

Another limitation of the current GWAS paradigm is that the SNPs used are typically “common” in the population (> 5% minor allele frequency [MAF]). Therefore, the variants these SNPs tag (or the haplotypes they identify) are likely to be common as well. Although this model fits well with the “common disease–common variant hypothesis,” it is not designed to detect rare (< 1% MAF), potentially disease-causing variants in a population [51–53].
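The MAF cutoffs above translate directly into how many copies of a variant a study can expect to observe. A minimal sketch follows; the “low-frequency” label for the 1% to 5% band is an assumption (the text names only common and rare), and the 2000-person sample is hypothetical.

```python
def maf_class(maf: float) -> str:
    """Bin a minor allele frequency using the cutoffs in the text (common > 5%, rare < 1%)."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("a minor allele frequency lies between 0 and 0.5")
    if maf > 0.05:
        return "common"
    if maf < 0.01:
        return "rare"
    return "low-frequency"  # 1% to 5%: intermediate band, label is an assumption


def expected_minor_alleles(maf: float, n_individuals: int) -> float:
    """Expected copies of the minor allele in n diploid individuals (2 alleles each)."""
    return 2 * n_individuals * maf


# PTPN22 rs2476601 (MAF 0.095, Table 2) versus a hypothetical rare variant, in 2000 people
for label, maf in (("rs2476601", 0.095), ("hypothetical rare variant", 0.005)):
    print(f"{label}: {maf_class(maf)}, ~{expected_minor_alleles(maf, 2000):.0f} copies in 2000 people")
```

In 2000 people the PTPN22 SNP (MAF 0.095 in Smyth et al. [46]) is expected about 380 times and a variant with MAF 0.005 only about 20 times, which illustrates why rare variants need very large samples or direct sequencing to be studied.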

Furthermore, one recent review suggests that over 1000 protein-coding genes and 500 non-coding genes (including regulatory elements, pseudogenes, and RNA-coding sequences) could be involved specifically in T1DM heritability, based on currently known SNP associations [54]. For most GWAS findings, the actual causal variant has not been found, despite resequencing of the region around the original signal and follow-up functional studies. Although rare variants can potentially be detected in GWAS (through their LD with the common SNPs typed), GWAS are not an optimal approach for finding them [55]. For T1DM, the best example is IFIH1, where a “common” signal was originally found and confirmed by GWAS [56], and resequencing revealed several rarer, likely disease-associated variants conferring protection from T1DM [57]. Identifying the causal genes and variants within GWAS-implicated regions will require further genetic mapping, genotype–phenotype correlation, and evaluation of the mechanisms and molecular differences in gene function.

High-Throughput Sequencing and Beyond

In contrast to GWAS, whole-genome sequencing explores rare genomic variation via millions of high-throughput sequencing reads spanning the genome. This approach follows the “common disease–rare variant hypothesis,” which posits that common diseases resemble Mendelian disorders, in that multiple rare variants within genes create functional problems that lead to pathology [58]. By analyzing billions of bases, sequencing can systematically uncover millions of single base pair changes across the genome. Rare changes could have large effects and a direct functional impact on genes responsible for disease development [59]. Whole-exome sequencing, in which only the protein-coding portion of the genome (~1% of total genomic content) is sequenced, is a starting point for genome-wide exploration in both common and rare disease. It offers a comprehensive, potentially cost-effective way to interrogate the genome, including regions already implicated by GWAS. Regarding clinical application, whole-exome sequencing has been successful in discovering genes responsible for rare disorders [60•, 61]. However, very little has been published to date applying high-throughput exome sequencing to common disorders, and nothing for T1DM.
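The “~1% of total genomic content” figure accounts for much of the potential cost advantage of exome sequencing. A rough sketch of the raw sequencing required follows; the genome length (about 3.1 billion bases) and the mean depths (30× for a genome, 100× for an exome) are illustrative assumptions, and real costs also depend on capture efficiency, off-target reads, and analysis.

```python
GENOME_BP = 3.1e9       # assumption: approximate haploid human genome length
EXOME_FRACTION = 0.01   # protein-coding share of the genome quoted in the text (~1%)


def bases_to_sequence(target_bp: float, mean_depth: float) -> float:
    """Raw sequenced bases needed to cover a target at a given mean read depth."""
    return target_bp * mean_depth


# Assumed depths, chosen only for illustration: 30x for a genome, 100x for an exome
genome_bases = bases_to_sequence(GENOME_BP, mean_depth=30)
exome_bases = bases_to_sequence(GENOME_BP * EXOME_FRACTION, mean_depth=100)
print(f"whole genome: ~{genome_bases / 1e9:.0f} Gb; whole exome: ~{exome_bases / 1e9:.1f} Gb; "
      f"ratio ~{genome_bases / exome_bases:.0f}x")
```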

Efforts are underway at major research centers to sequence hundreds of individuals with T1DM. As technology improves, larger sequencing projects with more individuals at more locations will become feasible. Eventually, true whole-genome sequencing, which reveals intronic, promoter, and splice site sequence changes, will be applicable to large populations. For now, cost, data management, data interpretation, and our limited understanding of the structure and composition of the human genome restrict the use of this new technology. However, technological improvements will probably make whole-genome sequencing a practical and potentially revolutionary approach to genome-wide exploration in T1DM in the coming years.

Other genetic contributors may emerge from genome-wide sequencing techniques, including epigenetic factors, copy number variation, RNA transcription and function, and gene regulation via promoters, enhancers, and methylation. Even if the entire genetic contribution to T1DM were uncovered, attention would still need to focus on nongenetic factors contributing to its development. Because concordance in monozygotic twins reaches only up to 66% even with long-term follow-up, there remains a significant environmental component in T1DM, with environmental triggers presumably acting on underlying genetic risk. Studies such as TEDDY (The Environmental Determinants of Diabetes in the Young) are currently underway in children in the United States, Germany, and Northern Europe to examine potential causal environmental factors in the development of T1DM [62].

Conclusions

Genetic associations in T1DM have progressed rapidly in the last three decades, with the greatest acceleration in the last 10 years. With new technologies enabling comprehensive and efficient genome-wide analyses, these associations are becoming increasingly robust, revealing risk inside and outside the MHC region. Although the first and major T1DM susceptibility locus maps to the MHC class II loci, many additional non-MHC loci have been identified by genome-wide SNP genotyping studies in the last several years. Most of these loci appear to act in the immune system. These genetic discoveries are leading to practical clinical applications in the form of T1DM risk prediction, a better understanding of pathogenesis, and, ultimately, potential therapies for individuals with or at risk for T1DM.

Acknowledgment

A.K. Steck has received a JDRF Early Career Patient Oriented Research Award (11-2010-206).

Disclosure

No potential conflicts of interest relevant to this article were reported.

Copyright information

© Springer Science+Business Media, LLC 2011