Introduction
We are all witnesses to a period of astonishing progress in our understanding of the genetic basis of diabetes, and the advances of recent months are arguably the most important made since the role of the HLA region was recognised in type 1 diabetes. The number of genetic regions causally implicated is now 11 each for type 1 and type 2 diabetes [1–9], and is set to rise further. The bewildering pace of new discovery stands in stark contrast to the slow progress that characterised the previous two decades, with a total combined output of three confirmed genes for type 2 diabetes and six for type 1 (Fig. 1). At last, it seems, our understanding of the genetic basis of complex, multifactorial forms of diabetes is catching up with that of rarer, single-gene disorders.
This leap in knowledge is the result of major advances in technology plus an improved understanding of patterns of human genetic variation. Using single nucleotide polymorphism (SNP) chips it is now possible to analyse up to a million SNPs—the most common form of genetic variation—in a single analysis. Repeated analyses with DNA from several thousands of patients and controls allow the identification of those variants that differ in frequency between the two groups. Previous analyses in diabetes genetics typically targeted individual candidate genes and rarely sampled more than a few hundred SNPs (often far fewer). Current genome-wide association (GWA) studies survey about 75% of common variation across the human genome.
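The case–control comparison described above can be illustrated with a minimal sketch, assuming a simple allelic chi-square test on a 2×2 table of allele counts for a single SNP. The function name and the counts are invented for illustration and are not taken from any study cited here. Real GWA analyses repeat such a test across all typed SNPs and add quality-control and population-stratification checks.

```python
import math


def allelic_association(case_risk, case_other, ctrl_risk, ctrl_other):
    """Allelic odds ratio and 1-df chi-square test from a 2x2 allele-count table."""
    n = case_risk + case_other + ctrl_risk + ctrl_other
    case_total = case_risk + case_other
    ctrl_total = ctrl_risk + ctrl_other
    risk_total = case_risk + ctrl_risk
    other_total = case_other + ctrl_other

    # Pearson chi-square for a 2x2 table (no continuity correction).
    cross_diff = case_risk * ctrl_other - case_other * ctrl_risk
    chi2 = n * cross_diff ** 2 / (case_total * ctrl_total * risk_total * other_total)

    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return odds_ratio, chi2, p_value


# Hypothetical counts: 2,000 cases and 3,000 controls, two alleles each.
odds_ratio, chi2, p_value = allelic_association(1320, 2680, 1800, 4200)
print(f"OR = {odds_ratio:.2f}, chi2 = {chi2:.2f}, p = {p_value:.2g}")
```

With these invented counts the risk allele is more frequent in cases than in controls, yet the p value falls far short of a genome-wide threshold, which is why sample size matters so much.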
This change in capacity has enormous ramifications for researchers seeking to understand the molecular basis of diabetes and related traits. To take one example, the adoption of liberal data release policies means that results from these studies are often freely available. If a researcher wants to know if a gene of particular interest contains diabetes risk variants, this information is only a few mouse clicks away.
As a result, the focus and scale of genetic studies are poised for dramatic shifts. Here we outline some of the insights forthcoming from this first wave of GWA studies and highlight some of the questions that need answering. One such question is the focus of an article by Schulze et al. in this month’s issue of Diabetologia [10]: do the common variants identified in case–control studies predict incident diabetes in prospective studies?
Lessons from GWA studies
The most important lesson is the demonstration of the power of genetics to provide novel insights into disease aetiology. Of the 11 genes or regions now implicated in type 2 diabetes, only four were strong biological candidates (PPARG, KCNJ11, WFS1, TCF2) [8, 9, 11–14]. Three had some corroborating evidence (IGF2BP2, the HHEX–IDE gene region, SLC30A8) [2–6], but for the remainder, evidence of their link to diabetes came as a complete surprise. These studies provide the first evidence implicating Wnt-signalling pathways (TCF7L2) and cell cycle control (CDKAL1 and CDKN2A/2B) in the pathogenesis of type 2 diabetes [2, 3, 5, 6]. For type 1, the key new discoveries highlight the contribution to disease pathogenesis of the PTPN gene family and IL-2 signalling [1, 7].
The most salient methodological lesson is the confirmation that, where genetics is concerned, power is everything. Most of the variants identified to date have modest effects on disease risk (odds ratios between 1.1 and 1.4 for each copy of the risk allele inherited). For type 2 diabetes all the regions identified, other than TCF7L2, lie in the lower part of this range; this is also likely to be the case for most of the loci yet to be discovered. Small effect sizes mean that large sample sizes (tens of thousands of individuals) are required if the signals are to be found. This problem is compounded by the extremely high levels of significance (p < 10⁻⁷) needed when evidence of association is sought across a genome’s worth of variation.
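The link between effect size, significance threshold and sample size can be sketched with the standard normal-approximation formula for comparing two proportions, applied here to allele frequencies. This is a rough illustration only. The control allele frequency of 0.30, the 80% power, the equal numbers of cases and controls and the function names are all assumptions introduced here, not values from the studies discussed.

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    return odds_ratio * control_freq / (1.0 + control_freq * (odds_ratio - 1.0))


def individuals_per_group(odds_ratio, control_freq=0.30, alpha=1e-7, power=0.80):
    """Approximate number of cases (and of controls) needed.

    All default values are illustrative assumptions.
    """
    z_alpha = -NormalDist().inv_cdf(alpha / 2.0)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    variance = p0 * (1.0 - p0) + p1 * (1.0 - p1)
    alleles = (z_alpha + z_power) ** 2 * variance / (p1 - p0) ** 2
    return math.ceil(alleles / 2.0)  # each individual contributes two alleles


n_small_effect = individuals_per_group(1.1)
n_large_effect = individuals_per_group(1.4)
print(n_small_effect, n_large_effect)
```

Under these assumed inputs, an odds ratio of 1.1 calls for roughly an order of magnitude more individuals than an odds ratio of 1.4, in line with the requirement for tens of thousands of samples noted above.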
The accompanying article by Schulze and colleagues [10] illustrates this point well. The study set out to test whether risk variants reported in the first GWA study [4] were associated with incident type 2 diabetes. Using 727 incident cases and 2,500 controls from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort, the study found an effect with the HHEX–KIF11–IDE variants but not with the SLC30A8 variant. One interpretation is that the HHEX–KIF11–IDE SNPs predict the incidence of type 2 diabetes but the SLC30A8 SNP does not. This may be the case, but the failure to detect an effect at the SLC30A8 variant may well be due to the combined effects of low power and sampling error. Whilst the estimate of relative risk was only 1.09, the upper limit of its confidence interval was 1.28, and so the interval includes effect sizes much larger than those expected from the original GWA study data. In other words, the apparently ‘negative’ result at SLC30A8 is not incompatible with the previous data implicating this gene in type 2 diabetes risk [2–6].
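The reasoning about the SLC30A8 estimate can be made concrete with a small calculation. The sketch below assumes that the reported interval is a 95% confidence interval that is symmetric on the log scale, which the text does not state. The value of 1.15 is a hypothetical placeholder for a GWA-sized effect, not a figure taken from the cited studies.

```python
import math


def interval_from_upper(estimate, upper, z=1.96):
    """Recover the standard error and lower limit of a log-symmetric interval."""
    log_estimate = math.log(estimate)
    standard_error = (math.log(upper) - log_estimate) / z
    lower = math.exp(log_estimate - z * standard_error)
    return lower, standard_error


estimate, upper = 1.09, 1.28  # values quoted in the text
lower, standard_error = interval_from_upper(estimate, upper)

hypothetical_effect = 1.15  # placeholder for a modest GWA-sized effect (assumption)
includes_null = lower < 1.0 < upper
includes_effect = lower < hypothetical_effect < upper
print(f"lower limit = {lower:.2f}, SE(log RR) = {standard_error:.3f}")
print(f"includes 1.0: {includes_null}, includes {hypothetical_effect}: {includes_effect}")
```

Under these assumptions the interval spans both no effect and a modest positive effect, so the result cannot distinguish between the two.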
Where do we go from here?
The advent of GWA studies has important implications for future genetics research. Now that several large analyses have been completed for both forms of diabetes, there is little need for ‘traditional’ candidate gene studies, at least within European samples, since good coverage of common SNPs (exceeding 80% once typed SNPs are supplemented with novel methods for imputing genotypes at many untyped SNPs [15]) is now available genome-wide. The next wave of candidate gene studies will increasingly focus on those variants that currently go undetected by GWA studies; for example, by using deep resequencing approaches to analyse the phenotypic associations of all variants within a gene, both common and rare [16]. These and other association studies will need to be judged against much stricter genome-wide yardsticks, with declarations of genuine association reserved for findings that meet the criteria of significance (p < 10⁻⁷) and quality (e.g. with respect to genotyping performance and freedom from population stratification) demanded of GWA scans [1]. Data from previous GWA scans, available through liberal data release policies, can and should be used for in silico replication of emerging findings, though such usage needs to be unbiased rather than selective.
With so many data sets available, underpowered studies are of minimal value when it comes to supporting or refuting the original findings [17]. In contrast, large, well-performed studies retain the capacity to reveal important new facts, even about ‘proved’ associations. Paradoxically, studies that provide compelling evidence that a previously well-replicated association is not detectable in an otherwise well-performed and well-powered study can be the most illustrative of all (informative heterogeneity). For example, the observation that the association between variants in the FTO gene and type 2 diabetes was exclusive to those studies in which cases and controls differed substantially in BMI (by accident or design) provided the first clue that the primary effect of this gene was to influence weight [2–5, 18].
Future questions
Scientific progress typically raises as many new questions as it answers. Therefore, it is no surprise that there is much new soil for diabetes geneticists to till (see Text box: Where do we go from here?).
To begin with, there are many more genes to find for both type 1 and type 2 diabetes. Research efforts to date have followed up on only a small subset of SNPs with the strongest associations, and the confirmed susceptibility genes (other than HLA) explain only a small proportion of the observed familial aggregation. Current GWA studies focus on the discovery of common SNP-based susceptibility variants; future efforts will be directed towards structural (copy number) variants and towards rare variants in general.
Next, since GWA studies involve direct typing of only a representative subset of the common SNPs across the genome, variants showing the strongest association are unlikely to be those which are causal. In several instances the association signals uncovered have involved several genes, any one of which could be the culprit. Systematic evaluation of the regions of interest will be required to pinpoint the genetic changes that underlie disease predisposition.
In addition, the mechanisms whereby a given DNA change leads to an increased risk of diabetes need to be reconstructed. In type 1 diabetes we need to understand how the susceptibility variants influence immune response and tolerance. In type 2, we need to know whether they influence disease predisposition through primary effects on beta cell function, through insulin action, or by some other mechanism.
The ultimate objective is, of course, to understand how genetic findings can translate into advances in clinical management. One route lies in the potential to exploit knowledge of individual patterns of genetic predisposition to beneficial effect when selecting a given form of therapy. Whilst information from any single variant has limited predictive value, the same may not be true when information from many susceptibility genes is combined. In principle, genetic testing might offer insights into disease risk and predict response to the various therapeutic and preventative options available, but much work will be required to understand how to deploy such tests in clinically effective ways. Another route is to apply the insights gained into the mechanisms of disease predisposition to identify new targets for drug development. Here, genes of small effect are likely to provide clues just as valuable as those of large effect.
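One simple way to picture how information from several susceptibility variants might be combined is a multiplicative (log-additive) score. The sketch below assumes that variants act independently and that each risk allele multiplies the odds by a fixed per-allele odds ratio. The text does not propose this model, and the variant names and odds ratios are hypothetical.

```python
import math

# Hypothetical per-allele odds ratios; not taken from any cited study.
PER_ALLELE_OR = {"variant_a": 1.35, "variant_b": 1.15, "variant_c": 1.10}


def combined_odds_ratio(genotype, per_allele_or=PER_ALLELE_OR):
    """Odds ratio relative to carrying no risk alleles, under a multiplicative model.

    `genotype` maps each variant name to the number of risk alleles carried (0, 1 or 2).
    """
    log_score = 0.0
    for variant, n_alleles in genotype.items():
        if n_alleles not in (0, 1, 2):
            raise ValueError(f"invalid risk-allele count for {variant}: {n_alleles}")
        log_score += n_alleles * math.log(per_allele_or[variant])
    return math.exp(log_score)


no_risk = combined_odds_ratio({"variant_a": 0, "variant_b": 0, "variant_c": 0})
all_risk = combined_odds_ratio({"variant_a": 2, "variant_b": 2, "variant_c": 2})
print(f"no risk alleles: {no_risk:.2f}, six risk alleles: {all_risk:.2f}")
```

Even with individually modest effects, the difference in odds between the two extremes of such a score is larger than for any single variant, although only a small fraction of people would sit at either extreme.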
Concluding remarks
For the past two decades, genetics has been widely advocated as a tool for unravelling the pathogenesis of common forms of diabetes, but the complexity of the problem defied easy solutions. Recent advances have made it possible to find many of the genes that predispose to both major types of diabetes. Much work is still needed to translate knowledge of these genes into benefits for patients. The greatest benefit is likely to come from new and better therapies derived from an improved understanding of the aetiology of the disease.
Abbreviations
- GWA: genome-wide association
- SNP: single nucleotide polymorphism
References
1. Wellcome Trust Case Control Consortium (2007) Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447:661–678
2. Saxena R, Voight BF, Lyssenko V et al (2007) Genome-wide association analysis identifies loci for type 2 diabetes and triglyceride levels. Science 316:1331–1336
3. Scott LJ, Mohlke KL, Bonnycastle LL et al (2007) A genome-wide association study of type 2 diabetes in Finns detects multiple susceptibility variants. Science 316:1341–1345
4. Sladek R, Rocheleau G, Rung J et al (2007) A genome-wide association study identifies novel risk loci for type 2 diabetes. Nature 445:881–885
5. Steinthorsdottir V, Thorleifsson G, Reynisdottir I et al (2007) A variant in CDKAL1 influences insulin response and risk of type 2 diabetes. Nat Genet 39:770–775
6. Zeggini E, Weedon MN, Lindgren CM et al (2007) Replication of genome-wide association signals in UK samples reveals risk loci for type 2 diabetes. Science 316:1336–1341
7. Todd JA, Walker NM, Cooper JD et al (2007) Robust associations of four new chromosome regions from genome-wide analyses of type 1 diabetes. Nat Genet 39:857–864
8. Sandhu MS, Weedon MN, Fawcett KA et al (2007) Common variants in WFS1 confer risk of type 2 diabetes. Nat Genet 39:951–953
9. Gudmundsson J, Sulem P, Steinthorsdottir V et al (2007) Two variants on chromosome 17 confer prostate cancer risk, and the one in TCF2 protects against type 2 diabetes. Nat Genet 39:977–983
10. Schulze MB, Al-Hassani H, Boeing H, Fisher E, Döring F, Joost HG (2007) Variation in the HHEX–IDE gene region predisposes to type 2 diabetes in the prospective, population-based EPIC-Potsdam cohort. Diabetologia DOI 10.1007/s00125-007-0766-1
11. Altshuler D, Hirschhorn JN, Klannemark M et al (2000) The common PPARγ Pro12Ala polymorphism is associated with decreased risk of type 2 diabetes. Nat Genet 26:76–80
12. Gloyn AL, Weedon MN, Owen KR et al (2003) Large-scale association studies of variants in genes encoding the pancreatic β-cell KATP channel subunits Kir6.2 (KCNJ11) and SUR1 (ABCC8) confirm that the KCNJ11 E23K variant is associated with type 2 diabetes. Diabetes 52:568–572
13. Nielsen EM, Hansen L, Carstensen B et al (2003) The E23K variant of Kir6.2 associates with impaired post-OGTT serum insulin response and increased risk of type 2 diabetes. Diabetes 52:573–577
14. Winckler W, Weedon MN, Graham RR et al (2007) Evaluation of common variants in the six known maturity-onset diabetes of the young (MODY) genes for association with type 2 diabetes. Diabetes 56:685–693
15. Marchini J, Howie B, Myers S, McVean G, Donnelly P (2007) A new multipoint method for genome-wide association studies by imputation of genotypes. Nat Genet 39:906–913
16. Romeo S, Pennacchio LA, Fu Y et al (2007) Population-based resequencing of ANGPTL4 uncovers variations that reduce triglycerides and increase HDL. Nat Genet 39:513–516
17. Hattersley AT, McCarthy MI (2005) What makes a good genetic association study? Lancet 366:1315–1323
18. Frayling TM, Timpson NJ, Weedon MN et al (2007) A common variant in the FTO gene is associated with body mass index and predisposes to childhood and adult obesity. Science 316:889–894
Duality of interest
The authors declare that there is no duality of interest associated with this manuscript.
Cite this article
Frayling, T.M., McCarthy, M.I. Genetic studies of diabetes following the advent of the genome-wide association study: where do we go from here?. Diabetologia 50, 2229–2233 (2007). https://doi.org/10.1007/s00125-007-0825-7
DOI: https://doi.org/10.1007/s00125-007-0825-7