Introduction

We are all witnesses to a period of astonishing progress in our understanding of the genetic basis of diabetes, and the advances of recent months are arguably the most important made since the role of the HLA region was recognised in type 1 diabetes. The number of genetic regions causally implicated is now 11 each for type 1 and type 2 diabetes [1–9], and is set to rise further. The bewildering pace of new discovery stands in stark contrast to the slow progress that characterised the previous two decades, with a total combined output of three confirmed genes for type 2 diabetes and six for type 1 (Fig. 1). At last, it seems, our understanding of the genetic basis of complex, multifactorial forms of diabetes is catching up with that of rarer, single-gene disorders.

Fig. 1

Progress in identifying susceptibility genes for multifactorial forms of diabetes. Type 1 diabetes is shown above the dateline and type 2 diabetes below. The date shown does not always indicate the year of the first reported association, but rather the date of the papers showing the most compelling evidence for association. Regions identified through candidate gene approaches are in black, and those from GWA studies are in red. Regions shown in blue (WFS1, IFIH1, TCF7L2) were identified through genetic surveys which, though extensive, were not comprehensive. Note that in many cases it is not yet clear whether the genes listed are actually causal, since variants within the associated regions may be acting through other nearby genes.

This leap in knowledge is the result of major advances in technology plus an improved understanding of patterns of human genetic variation. Using single nucleotide polymorphism (SNP) chips it is now possible to analyse up to a million SNPs (the most common form of genetic variation) in a single analysis. Repeated analyses with DNA from several thousand patients and controls allow the identification of those variants that differ in frequency between the two groups. Previous analyses in diabetes genetics typically targeted individual candidate genes and rarely sampled more than a few hundred SNPs (often far fewer). Current genome-wide association (GWA) studies survey about 75% of common variation across the human genome.
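The comparison of variant frequencies between patients and controls can be illustrated with a minimal sketch. It assumes a simple allelic test (a Pearson chi-square test with one degree of freedom on a 2x2 table of allele counts), which is only one of several tests used in practice and is not necessarily the analysis applied in the studies discussed here. The function name and the allele counts are hypothetical and chosen purely for illustration.

```python
import math


def allelic_test(case_risk, case_other, ctrl_risk, ctrl_other):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts.

    Returns the chi-square statistic, its p value and the allelic odds ratio.
    """
    table = [[case_risk, case_other], [ctrl_risk, ctrl_other]]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper-tail probability of a chi-square variable with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    odds_ratio = (case_risk * ctrl_other) / (case_other * ctrl_risk)
    return chi2, p_value, odds_ratio


# Hypothetical counts: 2,000 cases and 2,000 controls (4,000 alleles each),
# with the risk allele at 33% in cases and 30% in controls.
chi2, p_value, odds_ratio = allelic_test(1320, 2680, 1200, 2800)
print(f"chi2={chi2:.2f}  p={p_value:.4f}  OR={odds_ratio:.2f}")
```

In a GWA study a test of this kind is repeated for every SNP on the chip, which is why the threshold for declaring an association has to be far stricter than in a single-variant study.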

This change in capacity has enormous ramifications for researchers seeking to understand the molecular basis of diabetes and related traits. To take one example, the adoption of liberal data release policies means that results from these studies are often freely available. If a researcher wants to know if a gene of particular interest contains diabetes risk variants, this information is only a few mouse clicks away.

As a result, the focus and scale of genetic studies are poised for dramatic shifts. Here we outline some of the insights forthcoming from this first wave of GWA studies and highlight some of the questions that need answering. One such question is the focus of an article by Schulze et al. in this month’s issue of Diabetologia [10]: do the common variants identified in case–control studies predict incident diabetes in prospective studies?

Lessons from GWA studies

The most important lesson is the demonstration of the power of genetics to provide novel insights into disease aetiology. Of the 11 genes or regions now implicated in type 2 diabetes, only four were strong biological candidates (PPARG, KCNJ11, WFS1, TCF2) [8, 9, 11–14]. Three had some corroborating evidence (IGF2BP2, the HHEX–IDE gene region, SLC30A8) [2–6], but for the remainder, evidence of their link to diabetes came as a complete surprise. These studies provide the first evidence implicating Wnt-signalling pathways (TCF7L2) and cell cycle control (CDKAL1 and CDKN2A/2B) in the pathogenesis of type 2 diabetes [2, 3, 5, 6]. For type 1, the key new discoveries highlight the contribution to disease pathogenesis of the PTPN gene family and IL-2 signalling [1, 7].

The most salient methodological lesson is the confirmation that, where genetics is concerned, power is everything. Most of the variants identified to date have modest effects on disease risk (odds ratios between 1.1 and 1.4 for each copy of the risk allele inherited). For type 2 diabetes all the regions identified, other than TCF7L2, lie in the lower part of this range; this is also likely to be the case for most of the loci yet to be discovered. Small effect sizes mean that large sample sizes (tens of thousands of individuals) are required if the signals are to be found. This problem is compounded by the extremely high levels of significance (p < 10⁻⁷) needed when evidence of association is sought across a genome’s worth of variation.
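The link between effect size, significance threshold and sample size can be made concrete with a rough calculation. The sketch below assumes an allelic comparison of two proportions under the usual normal approximation, equal numbers of cases and controls, 80% power, and a risk-allele frequency of 30% in controls. These are illustrative assumptions, as are the function names, and none of them is taken from the studies cited.

```python
import math
from statistics import NormalDist


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def individuals_per_group(control_freq, odds_ratio, alpha=1e-7, power=0.8):
    """Approximate number of cases (and of controls) for an allelic test.

    Uses the standard normal-approximation formula for comparing two
    proportions; each individual contributes two alleles.
    """
    p0 = control_freq
    p1 = case_allele_frequency(control_freq, odds_ratio)
    p_bar = (p0 + p1) / 2.0
    z_alpha = -NormalDist().inv_cdf(alpha / 2.0)  # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    numerator = (
        z_alpha * math.sqrt(2.0 * p_bar * (1.0 - p_bar))
        + z_beta * math.sqrt(p0 * (1.0 - p0) + p1 * (1.0 - p1))
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    return math.ceil(alleles_per_group / 2.0)


# Hypothetical control allele frequency of 0.30; odds ratios span the
# range quoted in the text.
n_small_effect = individuals_per_group(0.30, 1.1)
n_large_effect = individuals_per_group(0.30, 1.4)
print(n_small_effect, n_large_effect)
```

Under these assumptions the required sample grows many times over as the odds ratio falls from 1.4 to 1.1, and the stringent value of alpha inflates it further. Rarer alleles or unequal group sizes would change the figures.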

The accompanying article by Schulze and colleagues [10] illustrates this point well. The study set out to test whether risk variants reported in the first GWA study [4] were associated with incident type 2 diabetes. Using 727 incident cases and 2,500 controls from the European Prospective Investigation into Cancer and Nutrition (EPIC)-Potsdam cohort, the study found an effect with the HHEX–KIF11–IDE variants but not with the SLC30A8 variant. One interpretation is that the HHEX–KIF11–IDE SNPs predict the incidence of type 2 diabetes but the SLC30A8 SNP does not. That may be the case, but the failure to detect an effect at the SLC30A8 variant may well be due to the combined effects of low power and sampling error. Whilst the estimate of relative risk was only 1.09, the upper limit of the confidence interval for this estimate was 1.28, and so the interval includes effect sizes much larger than those expected from the original GWA study data. In other words, the apparently ‘negative’ result at SLC30A8 is not incompatible with the previous data implicating this gene in type 2 diabetes risk [2–6].
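The reasoning about the SLC30A8 estimate can be sketched numerically. The only figures taken from the text are the point estimate (1.09) and the upper limit (1.28). The lower limit is reconstructed here on the assumption that the interval is symmetric on the log scale, as is usual for ratio measures, so it is an approximation and not a value reported by the study. The comparison values 1.2 and 1.4 are hypothetical.

```python
import math


def log_scale_interval(estimate, upper_limit):
    """Rebuild an interval assumed to be symmetric on the log scale."""
    half_width = math.log(upper_limit) - math.log(estimate)
    lower_limit = math.exp(math.log(estimate) - half_width)
    return lower_limit, upper_limit


def is_compatible(value, interval):
    """True if a candidate effect size lies inside the interval."""
    lower, upper = interval
    return lower <= value <= upper


# Point estimate and upper limit as quoted in the text.
interval = log_scale_interval(1.09, 1.28)

# The interval spans both "no effect" (1.0) and a hypothetical modest
# risk-increasing effect (1.2), so the data cannot tell them apart.
print(interval, is_compatible(1.0, interval), is_compatible(1.2, interval))
```

An interval that contains both the null value and plausible risk-increasing effects is uninformative, not negative, which is the distinction the paragraph above draws.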

Where do we go from here?

The advent of GWA studies has important implications for future genetics research. Now that several large analyses have been completed for both forms of diabetes, there is little need for ‘traditional’ candidate gene studies, at least within European samples, since good coverage of common SNPs (exceeding 80% once typed SNPs are supplemented with novel methods for imputing genotypes at many untyped SNPs [15]) is now available genome-wide. The next wave of candidate gene studies will increasingly focus on those variants that currently go undetected by GWA studies; for example, by using deep resequencing approaches to analyse the phenotypic associations of all variants within a gene, both common and rare [16]. These and other association studies will need to be judged against much stricter genome-wide yardsticks, with declarations of genuine association reserved for findings that meet the criteria of significance (p < 10⁻⁷) and quality (e.g. with respect to genotyping performance and freedom from population stratification) demanded of GWA scans [1]. Data from previous GWA scans, available through liberal data release policies, can and should be used for in silico replication of emerging findings, though such usage needs to be unbiased rather than selective.

With so many data sets available, underpowered studies are of minimal value when it comes to supporting or refuting the original findings [17]. In contrast, large, well-performed studies retain the capacity to reveal important new facts, even about ‘proved’ associations. Paradoxically, studies that provide compelling evidence that a previously well-replicated association is not detectable in an otherwise well-performed and well-powered study can be the most illustrative of all (informative heterogeneity). For example, the observation that the association between variants in the FTO gene and type 2 diabetes was exclusive to those studies in which cases and controls differed substantially in BMI (by accident or design) provided the first clue that the primary effect of this gene was to influence weight [2–5, 18].

Future questions

Scientific progress typically raises as many new questions as it answers. Therefore, it is no surprise that there is much new soil for diabetes geneticists to till (see Text box: Where do we go from here?).

To begin with, there are many more genes to find for both type 1 and type 2 diabetes. Research efforts to date have followed up on only a small subset of SNPs with the strongest associations, and the confirmed susceptibility genes (other than HLA) explain only a small proportion of the observed familial aggregation. Current GWA studies focus on the discovery of common SNP-based susceptibility variants; future efforts will be directed towards structural (copy number) variants and towards rare variants in general.

Next, since GWA studies involve direct typing of only a representative subset of the common SNPs across the genome, variants showing the strongest association are unlikely to be those which are causal. In several instances the association signals uncovered have involved several genes, any one of which could be the culprit. Systematic evaluation of the regions of interest will be required to pinpoint the genetic changes that underlie disease predisposition.

In addition, the mechanisms whereby a given DNA change leads to an increased risk of diabetes need to be reconstructed. In type 1 diabetes we need to understand how the susceptibility variants influence immune response and tolerance. In type 2, we need to know whether they influence disease predisposition through primary effects on beta cell function, through insulin action, or by some other mechanism.

The ultimate objective is, of course, to understand how genetic findings can translate into advances in clinical management. One route lies in the potential to exploit knowledge of individual patterns of genetic predisposition to beneficial effect when selecting a given form of therapy. Whilst information from any single variant has limited predictive value, the same may not be true when information from many susceptibility genes is combined. In principle, genetic testing might offer insights into disease risk and predict response to the various therapeutic and preventative options available, but much work will be required to understand how to deploy such tests in clinically effective ways. Another route is to apply the insights gained into the mechanisms of disease predisposition to identify new targets for drug development. Here, genes of small effect are likely to provide clues just as valuable as those of large effect.
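The suggestion that many variants of small effect might be more informative in combination than singly can be illustrated with a simple sketch. It assumes that the variants act independently and multiplicatively on the odds of disease (a log-additive model), which is a common simplification and not an established property of the loci discussed here. The odds ratios and genotype counts are hypothetical.

```python
import math


def combined_odds_ratio(per_allele_odds_ratios, risk_allele_counts):
    """Combined odds ratio relative to carrying no risk alleles.

    Assumes independent, multiplicative per-allele effects, so the
    logarithms of the odds ratios simply add up across variants.
    """
    if len(per_allele_odds_ratios) != len(risk_allele_counts):
        raise ValueError("one allele count is needed per variant")
    log_score = 0.0
    for odds_ratio, count in zip(per_allele_odds_ratios, risk_allele_counts):
        if count not in (0, 1, 2):
            raise ValueError("allele counts must be 0, 1 or 2")
        log_score += count * math.log(odds_ratio)
    return math.exp(log_score)


# Hypothetical per-allele odds ratios within the 1.1 to 1.4 range quoted
# earlier, and one individual's risk-allele counts at each variant.
hypothetical_odds_ratios = [1.1, 1.15, 1.2, 1.4]
hypothetical_counts = [2, 1, 0, 2]
print(combined_odds_ratio(hypothetical_odds_ratios, hypothetical_counts))
```

A score of this kind only ranks individuals relative to one another. Turning it into an absolute risk, or into a guide to therapy, would require calibration in prospective cohorts, which is the work the text anticipates.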

Concluding remarks

For the past two decades, genetics has been widely advocated as a tool for unravelling the pathogenesis of common forms of diabetes, but the complexity of the problem defied easy solutions. Recent advances have made it possible to find many of the genes that predispose to both major types of diabetes. Much work is still needed to translate knowledge of these genes into benefits for patients. The greatest benefit is likely to come from new and better therapies derived from an improved understanding of the aetiology of the disease.