The Contribution of Low-Frequency and Rare Coding Variation to Susceptibility to Type 2 Diabetes

Purpose of Review: Soon after the first genome-wide association study (GWAS) for type 2 diabetes (T2D) was published, it was hypothesized that rare and low-frequency variants might explain a substantial proportion of disease risk. Rare coding variants in particular were emphasized given their large expected role in disease. This review summarizes the extent to which recent T2D genetic studies provide evidence for or against this hypothesis.
Recent Findings: Following a comprehensive study of T2D genetic architecture using three sequencing and genotyping technologies, four even larger studies have provided a yet higher resolution view of the role of rare and low-frequency coding variation in T2D susceptibility.
Summary: Empirical evidence strongly suggests that common regulatory variants are the dominant contributor to T2D heritability. However, rare coding variants may nonetheless be pervasive across T2D-relevant genes. A strategy using common variants to map disease genes, and rare coding variants to link molecular gene perturbations to cellular and phenotypic effects, may be an effective means to investigate T2D pathogenesis and potential new therapies.


Introduction
Genetic studies of complex diseases are largely motivated by two goals: to understand the heritable risk factors for disease in the population, and to identify biological processes relevant to disease pathogenesis [1]. The first goal seeks to quantify the contribution of different classes of genetic variation to disease heritability [2]. The second seeks to identify genetic "experiments of nature" that link genes or pathways to disease risk and potentially suggest new therapeutic strategies [3].
Coding variants have long been an emphasis in genetic studies for type 2 diabetes (T2D) and other complex diseases.
Because they constitute the bulk of known genetic risk factors for Mendelian diseases, they have been hypothesized to contribute disproportionately to complex disease heritability [4][5][6]. Because their effects are usually easier to interpret than those of noncoding variants, they can lead to clear hypotheses about a disease-relevant gene and its directional relationship with disease risk (i.e., whether loss of function predisposes to or protects from disease) [5,7]. The demonstration in 2004 that loss of function mutations in PCSK9 lower low-density lipoprotein levels [8] and protect from coronary artery disease [9], and the successful cholesterol-lowering PCSK9 inhibitors consequently developed [10], have served as longstanding exemplars for many complex diseases.
When the first genome-wide association studies (GWAS) for T2D were published in 2007, some observers were therefore surprised that (a) most associations mapped outside of protein-coding regions of the genome [11] and (b) the identified associations explained only a relatively small portion of disease risk [2]. Early GWAS thus produced the first robust associations for T2D (a clear success [1,12]) but in few cases provided clear insight into T2D's genetic basis or its molecular and cellular mechanisms [5,7,13]. However, because GWAS directly or indirectly analyze only a limited set of common (minor allele frequency [MAF] > 5%) variants in the genome, their associations are not expected to explain all (or even most of) disease heritability, and might in fact tag disease-causal variants some distance away [2,5]. This review will discuss how these early GWAS findings inspired a decade of studies to understand the role of low-frequency (0.5% < MAF < 5%) and rare (MAF < 0.5%) coding variation in T2D susceptibility. In the past few years, a clear picture has begun to emerge as to how these variants contribute to T2D heritability and might be used to better understand T2D biology.
This article is part of the Topical Collection on Genetics. * Jason Flannick, flannick@broadinstitute.org

Hypotheses, Conceptual Frameworks, and Experimental Approaches
Following early GWAS findings, three hypotheses (or models) emerged about the contribution of low-frequency and rare variants to the "genetic architecture" of complex diseases. First, rare variants were hypothesized to have significantly larger effects on disease risk than do common variants [5,7,14,15]. Purifying natural selection might prevent strong-effect variants from becoming common in a population [5,16,17], which could explain the empirically modest effects (odds ratio [OR] < 1.1) on disease risk of most common variants [1,2]. Strong-effect, low-frequency variants could be more clinically or therapeutically actionable than modest-effect common variants [3,18].
Second, rare variants were hypothesized to explain a significant amount of disease heritability [2,7,13]. There are many more rare variants than common variants within the population [19,20], and GWAS by design do not interrogate them. If rare or low-frequency variants have significantly larger effects on average than do common variants, then they in aggregate could explain much of the heritability not captured by GWAS.
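The trade-off between allele frequency and effect size behind the first two hypotheses can be sketched numerically. The snippet below is a minimal illustration, assuming Hardy-Weinberg equilibrium and using the variance in log-odds contributed by a variant, 2p(1-p)·ln(OR)², as a rough proxy for its contribution to heritability; the MAF and OR values are illustrative assumptions, not estimates from any T2D study.

```python
import math

def genotype_variance(maf: float) -> float:
    """Variance of an additive 0/1/2 genotype under Hardy-Weinberg equilibrium."""
    return 2.0 * maf * (1.0 - maf)

def log_odds_variance(maf: float, odds_ratio: float) -> float:
    """Variance in log-odds of disease contributed by a single variant.

    This is a rough proxy for heritability explained, not a liability-scale estimate.
    """
    beta = math.log(odds_ratio)  # per-allele effect on the log-odds scale
    return genotype_variance(maf) * beta ** 2

# Illustrative (assumed) values: a typical common GWAS variant and a rare,
# strong-effect variant of the kind the rare variant hypotheses proposed.
common = log_odds_variance(maf=0.30, odds_ratio=1.1)
rare = log_odds_variance(maf=0.001, odds_ratio=3.0)

print(f"common variant (MAF 30%, OR 1.1): {common:.5f}")
print(f"rare variant (MAF 0.1%, OR 3.0): {rare:.5f}")
```

Under these assumed values the two contributions are of the same order of magnitude, which is why a large number of strong-effect rare variants could, in principle, have accounted for much of the heritability not captured by GWAS.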
Third, rare variants were hypothesized to cause some, and perhaps a significant portion of, common variant GWAS associations. By chance, it is possible that one or more disease-causal rare variants may segregate non-randomly with a common variant, creating a "synthetic association" detected by a common variant GWAS [5,21]. If synthetic associations are commonplace, they could impact the design of "fine mapping" studies (efforts to localize a GWAS "index variant" association to a causal variant or variants) because index variants may lie significantly further from causal rare variants than they are expected to lie from causal common variants [21].
Testing these three hypotheses for T2D and other complex diseases required advances in rare variant ascertainment, genotyping, and association analysis. Foremost, rare variants can only be comprehensively ascertained through sequencing, and their large-scale study therefore required technology to advance from traditional Sanger sequencing of individual genes to cost-effective high-throughput next-generation sequencing [22]. By 2010, next-generation sequencing technologies were inexpensive enough to apply to thousands of samples at select regions of the genome, beginning with several genes [23,24] and soon expanding to the entire exome [25,26]. Because cost considerations limited the total size of regions sequenced, most early studies focused on protein-coding regions of the genome.
As whole-exome sequence (and later whole-genome sequence) data progressively accrued, opportunities also emerged to genotype a subset of coding variants detected by sequencing in much larger sample sizes. By 2012, enough European ancestry exomes had been sequenced to enable design of an inexpensive SNP microarray (the Illumina Exome Array) capturing (at a cost one order of magnitude less than sequencing) over 80% of MAF > 0.5% coding variants in Europeans [27]. By 2016, enough European ancestry genomes had been sequenced to enable a reference panel (compiled by the Haplotype Reference Consortium [HRC]) for high-quality imputation of MAF > 0.1% variants in European ancestry samples (provided the samples had been previously genotyped by a genome-wide SNP microarray) [28]. Exome array analysis and HRC-based imputation complement exome sequencing by trading off full variant ascertainment for increased study sample size (and therefore association power).
Exome array or HRC-based imputation studies predominantly employ traditional GWAS analyses, which test variants individually for disease association ("single-variant analysis"). By contrast, analyses of rare variants from exome sequencing studies require different approaches [25]. In the early 2010s, a significant number of methods were advanced to aggregate rare variants and test for association at the level of genes ("gene-level analysis") [29]. The most basic methods collapse variants of similar molecular effect and test for different frequencies of variation between disease cases and controls ("burden tests"). While simple and easy to interpret, burden tests rely on judicious selection of variants included in the test: their application has therefore been aided by bioinformatic algorithms for predicting protein-damaging variants and theoretical frameworks for understanding how different variant selection strategies impact power to detect association [30,31]. Alternatively, statistical tests can be made more robust to the inclusion of benign variants in gene-level analysis, a strategy that motivated the design of tests such as SKAT [32] and SKAT-O [33] that test for an overdispersion of variant associations within a gene, rather than simply a directional excess of variation in cases or controls. Most rare variant studies therefore apply multiple methods for gene-level analysis, which increases the number of tests performed, but because most studies have far fewer genes than variants, the study-wide multiple testing burden is ultimately reduced relative to GWAS.
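The collapsing logic of a burden test, and the multiple-testing comparison with GWAS, can be sketched as follows. This is a minimal illustration only: it collapses variants into a carrier/non-carrier indicator and applies a one-sided Fisher's exact test, whereas published analyses typically use regression frameworks with covariates and tests such as SKAT. The toy genotypes, the figure of 20,000 genes, the two tests per gene, and the one million independent common variant tests are all assumptions made for illustration.

```python
from math import comb

def is_carrier(genotypes):
    """True if an individual carries at least one qualifying rare allele in the gene."""
    return any(count > 0 for count in genotypes)

def fisher_enrichment_p(case_carriers, n_cases, control_carriers, n_controls):
    """One-sided Fisher's exact p-value for an excess of carriers among cases."""
    total = n_cases + n_controls
    carriers = case_carriers + control_carriers
    # Sum hypergeometric probabilities for case_carriers or more carriers among cases.
    tail = sum(
        comb(carriers, k) * comb(total - carriers, n_cases - k)
        for k in range(case_carriers, min(carriers, n_cases) + 1)
    )
    return tail / comb(total, n_cases)

def burden_test(case_genotypes, control_genotypes):
    """Collapse variants to carrier status per individual, then test for enrichment."""
    case_carriers = sum(is_carrier(g) for g in case_genotypes)
    control_carriers = sum(is_carrier(g) for g in control_genotypes)
    return fisher_enrichment_p(
        case_carriers, len(case_genotypes), control_carriers, len(control_genotypes)
    )

def bonferroni_threshold(n_units, tests_per_unit=1, alpha=0.05):
    """Study-wide significance threshold after Bonferroni correction."""
    return alpha / (n_units * tests_per_unit)

# Toy data (hypothetical): rows are individuals, columns are three rare variants in one gene.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 0]] + [[0, 0, 0]] * 4
controls = [[0, 0, 0]] * 8

p_value = burden_test(cases, controls)
gene_threshold = bonferroni_threshold(20_000, tests_per_unit=2)  # assumed gene count and tests
gwas_threshold = bonferroni_threshold(1_000_000)                 # assumed independent tests

print(f"burden p-value: {p_value:.4f}")
print(f"gene-level threshold: {gene_threshold:.2e}, GWAS threshold: {gwas_threshold:.2e}")
```

Even with two tests per gene, the assumed gene-level threshold is less stringent than the conventional genome-wide threshold, which is the sense in which the multiple testing burden is reduced.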
By the early part of the 2010s, therefore, sequencing technologies and rare variant analysis methodologies were sufficiently advanced to begin the first empirical assessments of the role of low-frequency and rare variation in the development of T2D (Table 1).

The First Studies of Low-Frequency and Rare Coding Variants
The first searches for low-frequency or rare variant T2D associations began even before the GWAS era. Following paradigms for Mendelian disease genetic mapping, many linkage and candidate gene studies were conducted for T2D in the late 1990s and early 2000s [34]. Other than an association near TCF7L2 [35] (still the largest genetic contributor to T2D risk), these studies produced few replicable associations [34,36,37] and today are cited less for their discoveries than as cautionary contrasts to the statistical rigor of GWAS [1,13].
The first large-scale sequencing studies of T2D focused on genes with prior genetic links to T2D. One class of study focused on genes within GWAS regions, showing MTNR1B [38], SLC30A8 [39,40], PPARG [41], and HNF1A [42] to harbor collections of rare variants with moderate (OR 2-7) effects on T2D risk. Notably, in each case, stringent filtering of variants was necessary to reveal an association: the SLC30A8 association was detected with the small fraction of variants predicted to truncate the SLC30A8-encoded protein, while systematic characterizations of rare variants in (gene-specific) assays were needed to identify associations for MTNR1B, PPARG, and HNF1A. Furthermore, each of these genes was already widely believed (prior to the sequencing studies) to mediate the original GWAS association; early sequencing studies were less successful at identifying truly novel GWAS effector genes [40,43].
A second class of targeted sequencing studies focused on genes for Maturity Onset Diabetes of the Young (MODY) or other Mendelian diseases with clinical similarities to T2D. Beginning with small studies that showed rare variants in MODY genes to have effects on T2D risk in the general population [44] (albeit with penetrances much lower than might have been expected) and continuing with larger studies that provided stronger statistical evidence of association [27, 44, 45••, 46, 47], MODY genes have been consistently shown to harbor not only rare variants that cause early onset Mendelian diseases but also a broader "allelic series" of variants that predispose to the later onset form of T2D. These findings are now widely interpreted as evidence that MODY and T2D are not distinct conditions but rather opposite extremes of a continuum of diabetes subtypes [15,48,49].
As exome sequencing, the exome array, and sequence-based imputation reference panels began to mature, the first genome-wide scans for rare and low-frequency variant T2D associations began to appear. The earliest exome array studies relevant to T2D were focused on glycemic traits; while some coding variants of moderate effect emerged from these studies (e.g., PAM for insulinogenic index [50], G6PC2 for fasting glucose [51,52], and AKT2 for fasting insulin [53]), the number of significant associations was much smaller than would be expected from the hypotheses positing large contributions of rare variants to complex trait heritability. Early T2D sequencing studies [46,47,54,55] (each in a few thousand individuals) similarly were successful at identifying some, but not many, novel rare or low-frequency coding variant T2D associations (e.g., in PAM [47], PDX1 [47], HNF1A [46], and ADCY3 [56]). Perhaps the biggest lesson to emerge from these investigations is the value offered by studies of populations either isolated or subject to historical bottlenecks (e.g., Iceland [47], Mexico [46], Finland [53], Greenland [56,57]) as a means to identify strong-effect T2D variants that have, by chance (e.g., genetic drift) or perhaps even positive selection, risen to moderate (or even high) frequency.
Table 1 Four genotyping technologies and/or study designs (columns) have been used to identify low-frequency and rare variants associated with T2D. Each ascertains different variants (Ascertainment), enables different association analysis methodologies (Analysis), and has been applied to different sample sizes for T2D (Current T2D sample size). The bottom half of the table summarizes the historical contribution of each study design toward evaluating the validity of three rare variant hypotheses about the role of rare variation in T2D susceptibility
Early studies of low-frequency and rare variation thus suggested that rare variant hypotheses might have been overly optimistic in their predictions about the contribution of rare variation to T2D. However, a definitive assessment of these hypotheses would require global and systematic analyses of larger datasets.

An Emerging Picture of T2D Genetic Architecture
While a study of a few thousand sequenced individuals might be expected to detect a substantial number of rare variant associations under optimistic models, more measured early-stage simulations [58,59] and analytical calculations [60] had predicted that tens of thousands of sequenced individuals would be required for reasonable power to interrogate the rare variant hypothesis for most complex diseases. An early sequencing study of 1000 T2D cases and 1000 controls, for example, had power to exclude only extreme models in which rare variants in < 20 genes explained the majority of T2D risk [55]. The lack of rare variant associations from early studies did not, therefore, rule out rare variant models for T2D, and a systematic simulation study showed that, prior to large-scale sequencing studies, rare and common variant models could each be constructed as consistent with empirical T2D genetic associations [61•].
The study that took the largest step toward constraining potential T2D genetic architectures was published in 2016 [45••], analyzing ~13,000 multi-ethnic exomes, ~2700 European ancestry whole genomes, ~80,000 samples genotyped on the exome array, and ~44,000 samples with genotypes imputed from a whole-genome sequence reference panel enriched for T2D cases. Using these data collectively, the study provided insights into all three major hypotheses about the role of rare and low-frequency variants in T2D genetic susceptibility. First, despite near-complete variant ascertainment in a modest-size European ancestry sample, only one low-frequency variant (a previously reported noncoding variant in CCND2 [47]) achieved genome-wide significance, enabling quantitative bounds on the T2D effect sizes of low-frequency variants, which, in short, rejected models proposing a significant number of low-frequency strong-effect T2D variants. Second, simulated rare variant models predicted far more rare and low-frequency variant associations than were observed empirically, instead supporting a T2D genetic architecture characterized by many modest-effect common variants. Third, no rare variants could plausibly explain any significant T2D GWAS signals, rejecting synthetic associations as a common phenomenon for T2D. A fourth finding of the study was that no gene-level coding variant associations reached exome-wide significance, although the implications of this finding for the validity of rare variant models were not pursued in detail.
Since the publication of the 2016 study, four other large-scale studies have further constrained the contribution of rare and low-frequency variants to T2D susceptibility. The first study [62] performed deep whole-genome sequencing of 20 large Hispanic pedigrees (spanning ~1000 individuals), providing the opportunity to observe and analyze multiple copies of extremely rare variants (e.g., those private to a family). Although the power of the study design was validated by the identification of several rare variant associations with gene expression (cis-expression quantitative trait loci), no evidence of large-effect rare variant associations was observed for T2D in these families.
The second study [63••] applied the exome array to ~450,000 samples (~80,000 with T2D), significantly increasing power to interrogate MAF > 0.5% coding variation for T2D association. Although the study identified 40 coding variant associations, only five had observed MAF < 5% and none had observed OR > 1.4, strongly suggesting the fruitlessness of searches for low-frequency or common coding variants with even moderate effects on T2D risk. Furthermore, through fine mapping with densely imputed GWAS data, < 50% of the 40 coding variants identified in the study were shown to be causally linked to T2D risk, with the remainder likely proxies for nearby noncoding causal variants. Coding variant associations therefore cannot be immediately assumed to implicate specific variants or genes, although (because most of the 40 associations analyzed in the study were observed with common variants) the proportion of rare coding variant T2D associations that are causal may well be significantly higher.
The third study [64••] used HRC-based imputation to analyze ~900,000 European samples (~75,000 with T2D), providing even greater power to detect T2D associations with variants as rare as MAF ~0.1% (although imputation quality, and therefore effective sample size, is lower for rarer variants). This study produced by far the largest catalog of low-frequency and rare variant associations to date for T2D, identifying associations with 56 low-frequency (0.5% < MAF < 5%) and 14 rare (MAF < 0.5%) variants across 60 loci; many of these variants are nearby but independent from common variants identified by earlier GWAS. Although variant OR estimates were not independently validated (and may be overestimates), some of the identified low-frequency variants had moderate to high estimated effects on T2D risk, with 14 having observed OR > 2 and two having observed OR ~8. However, only seven of the 56 low-frequency variant associations lie within coding regions, and all of these had estimated OR < 2. Collectively, low-frequency variant associations in the study were estimated to explain roughly 15× less T2D heritability than were common variant associations in the same study (1.13% vs. 16.3%), implying the heritability explained by low-frequency coding variants to be even lower (by perhaps an order of magnitude).
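The arithmetic behind the heritability comparison can be made explicit. The sketch below uses only the figures quoted above (16.3%, 1.13%, and 7 of 56 coding associations). The final step assumes, purely for illustration, that heritability is spread evenly across the 56 low-frequency associations; that assumption is not made by the study itself.

```python
# Figures quoted in the text for the HRC-based imputation study [64].
common_h2_pct = 16.3     # T2D heritability explained by common variant associations (%)
low_freq_h2_pct = 1.13   # T2D heritability explained by low-frequency associations (%)
coding_low_freq = 7      # low-frequency associations within coding regions
total_low_freq = 56      # all low-frequency associations

# Ratio of common to low-frequency heritability explained.
fold_difference = common_h2_pct / low_freq_h2_pct

# Crude illustrative assumption: each low-frequency association explains an equal share.
coding_fraction = coding_low_freq / total_low_freq
coding_low_freq_h2_pct = low_freq_h2_pct * coding_fraction

print(f"common vs. low-frequency: {fold_difference:.1f}-fold")
print(f"coding share of low-frequency associations: {coding_fraction:.3f}")
print(f"implied low-frequency coding heritability: ~{coding_low_freq_h2_pct:.2f}%")
```

Because the seven coding associations all had estimated OR < 2, an equal split is a simplification; the point is only that one eighth of 1.13% is already close to an order of magnitude below the low-frequency total.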
These three studies progressively limited the role of low-frequency and rare coding variants in T2D susceptibility; however, they collectively ascertained at most only a small fraction of rare coding variation in the population. The final, and most recent, large-scale genetic study of T2D [65••] used exome sequencing in five major ethnic groups to analyze ~45,000 samples (~20,000 with T2D) across ~3M coding variants, ~95% of which are rare and ~90% of which are absent from exome array or HRC-based imputation studies. This study, as expected, identified essentially no novel coding single-variant associations (only one low-frequency variant in the known obesity and T2D gene MC4R). However, it did demonstrate, for the first time, exome-wide significant gene-level associations (PAM, MC4R, and SLC30A8). Notably, these three genes had been previously implicated in T2D via GWAS, and the rare variants contributing to the gene-level signals explain significantly less T2D heritability than do the nearby independently associated common variants.
Nonetheless, exome sequencing at this scale did reveal evidence for pervasive rare variant associations across T2D-relevant genes. Twelve gene sets, defined based on prior evidence from mouse models, T2D drug targets, or monogenic diabetes, each exhibited significantly more rare variant gene-level associations than expected by chance. Individually, the gene-level associations were modest at best, requiring (in the case of T2D drug targets) perhaps 500K-1M samples to be detected at exome-wide statistical significance. However, these results suggest that, once a gene is established as relevant to T2D, it might reasonably be expected to carry a rare coding variant allelic series that could be mined for more insight into its function.
In the last 3 years, a clearer picture of T2D genetic architecture has therefore begun to emerge. The set of rare and low-frequency coding variants with effect sizes large enough to be detected via single-variant analysis of even 1 million samples appears quite limited, even though ~350 independent common variants (most of them noncoding) have been identified from those same studies. This does not imply that rare variants play no role in T2D susceptibility; indeed, the most recent T2D exome sequencing study suggests that rare variant gene-level signals may be more of a norm than a surprise in T2D-relevant genes [65••], supporting other studies that have shown an excess of associations (genome-wide) in coding exons [45••, 66, 67]. However, rare variant signals have effect sizes significantly lower than might have been expected by optimistic early hypotheses and are often detectable only after the disease relevance of a gene has been established (e.g., at a relaxed significance threshold justified by the gene's "prior" evidence). For this reason, in the past and likely in the near future, most T2D rare variant signals have been or will be identified within genes harboring additional variants that, by chance, either (a) cause an extreme form of diabetes (e.g., MODY) or (b) have risen to sufficient frequency to be detectable via GWAS (Fig. 1).

The Use of Coding Variants to Understand Disease Biology
Although it seems likely that rare and low-frequency variants contribute modestly to T2D heritability, they may still play a significant role in efforts to understand disease biology or design new therapies. It has long been appreciated that a variant need not explain much heritability to offer valuable disease insight [12,13,15], as a modest-effect variant could point to a gene of which larger therapeutic perturbations may have a significant impact [68] or to a pathway that might suggest an important new disease mechanism [69]. Translating genetic associations to biological function, and then investigating how natural or potential therapeutic alteration of these functions affects disease risk or progression, represents the most pressing current and future challenge in complex disease research [70].
Rare coding variants offer unique value in this endeavor. Because they localize to protein sequence, and because they are less likely to be highly correlated to hidden causal variants via linkage disequilibrium than are common variants, they can directly implicate genes in disease pathogenesis. Furthermore, since their effects are easier to interpret than those of noncoding variants, and because they can be introduced into model systems and evaluated for effects on a variety of molecular or cellular processes, they can help basic researchers to experimentally evaluate the downstream effects of a gene on a biological process. The empirical difficulty of identifying T2D rare variant associations limits their ability to identify novel disease genes. However, once a gene is hypothesized to be involved in disease, rare coding variants can provide valuable "handles" on the gene to probe the relationship between its function and disease (Table 2).
The identification and subsequent characterization of SLC30A8 provides an illustrative example of this approach. A coding variant in SLC30A8 was one of the earliest findings of T2D GWAS [71], and the gene immediately piqued research and therapeutic interest: its protein product ZnT8 is expressed mainly in pancreatic islets and transports zinc into insulin-containing granules, thus contributing to insulin processing and storage. Because of its known function, the initial hypothesis was that reduced ZnT8 activity would increase risk of T2D [72,73]; however, when numerous Slc30a8 knockout mice were subsequently created and tested for a variety of glycemic phenotypes, no consistent effects on hyperglycemia emerged [73-77].
It was therefore surprising when 12 rare protein-truncating variants in SLC30A8 were shown to associate, in aggregate, with protection from T2D [40]. While these data offered no insight into a potential protective mechanism, their clear predicted molecular effects (reduced ZnT8 activity) and expression within the full human system (rather than mice) prompted re-evaluation of the link between ZnT8 and T2D. After several years of further research, a recent mouse model of the most common SLC30A8 rare protein-truncating variant offered a potential mechanism for T2D protection, through increased first-phase insulin secretion [78•].
Much still remains to be determined about the mechanism linking ZnT8 loss of function to increased insulin secretion, not only in animal models but also in humans. One limitation of using the 12 SLC30A8 protein-truncating variants to explore this relationship is that all are expected to result in human haploinsufficiency, and they thus do not inform on potential effects of either full loss of function or more measured reductions in ZnT8 activity. Coding variants could help address this knowledge gap in two respects. First, if SLC30A8 "human knockouts" (individuals with homozygous or compound heterozygous loss of function mutations) were identified [18,79], they could be deeply characterized for various phenotypes to better understand the intermediate human physiological processes responsible for T2D protection. Second, the exome-wide significant series of > 100 SLC30A8 protective missense alleles from the most recent T2D exome sequencing study [65••] could be used to probe the SLC30A8 "dose-response" curve: each allele could be introduced into a molecular or cellular assay (such as zinc transport or insulin secretion), and their effects on these assays could then be compared to their T2D protective effects to calibrate the relationship between T2D risk and the biological process measured by the assay. Particularly interesting variants could be subsequently introduced into an Slc30a8 mouse model for further characterization. This paradigm has been previously demonstrated for T2D in the context of PPARG and insulin resistance, in which simultaneous effects of rare coding variants on T2D risk (when analyzed in a genetic study) and adipogenesis (when introduced into a cellular assay) validated the gene's mechanism of action [41]. Furthermore, once an assay has been established as a proxy for human disease risk, it represents an attractive asset for future therapeutic screens [3].
Fig. 1 Profiles of rare coding variation in T2D-relevant genes. Based on recent empirical evidence, many T2D-relevant genes seem likely to harbor a series of rare coding variant associations. However, based on empirical aggregate effect sizes, typical rare variant associations may require an order of magnitude more exome sequences to detect than are available today. (a) Some genes, by chance, will lie near or contain a low-frequency or common variant T2D association, as is the case for the three exome-wide significant T2D gene-level associations identified to date. Such genes will likely be detected by GWAS or exome array single-variant analysis long before they are detected by exome sequence gene-level analysis. (b) Some genes will harbor extremely rare, severe mutations associated with a monogenic form of diabetes. Given evidence of a genetic overlap between monogenic forms of diabetes and T2D, these genes are strong candidates to harbor an allelic series of variants associated with T2D. (c) Many genes will only harbor a rare variant T2D gene-level association. Based on empirically observed aggregate effect sizes, these genes will likely be very hard to identify for the foreseeable future. For each example gene, variants are shown as tics on the transcript map; red and blue bars indicate variant case and control frequencies, respectively, and black boxes indicate the variant's "prominence" (e.g., detectability via GWAS or a Mendelian gene mapping study)

Conclusion
The past 5 years of T2D genetic research have addressed the goal of understanding T2D genetic architecture: it is now highly likely that T2D genetic risk is mostly determined by many common, modest-effect regulatory variants rather than a few rare, large-effect coding variants. The jury is still out on the extent to which rare and low-frequency variants of individually large effect might aid disease risk prediction in selected groups: only a few examples of low-frequency variants with large effects on T2D risk have been identified [57,80], and these are typically highly population-specific (and in fact often common in the particular population). Collections of large-effect rare variants may reach sufficient aggregate frequency for useful risk prediction in some genes, but to provide clinical utility, these variants will likely require functional characterization [42, 81•, 82, 83], a significant limitation for variants in novel genes. By contrast, polygenic risk scores constructed from many common variants have steadily increased in predictive power: those constructed from the latest T2D GWAS show a ninefold difference in disease prevalence for individuals at the high and low extremes of risk [64••].
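The basic construction of a polygenic risk score can be sketched as a weighted sum of risk allele counts, with each weight taken as the natural log of the variant's odds ratio. This is a minimal sketch: the four odds ratios and the genotypes are hypothetical, and scores built from real GWAS use far more variants and additional steps (such as accounting for correlation between variants) that are omitted here.

```python
import math

def polygenic_risk_score(risk_allele_counts, odds_ratios):
    """Sum of risk allele counts (0, 1, or 2 per variant) weighted by ln(OR)."""
    if len(risk_allele_counts) != len(odds_ratios):
        raise ValueError("need exactly one odds ratio per genotyped variant")
    return sum(
        count * math.log(odds_ratio)
        for count, odds_ratio in zip(risk_allele_counts, odds_ratios)
    )

# Hypothetical per-allele odds ratios for four common variants (not real T2D estimates).
odds_ratios = [1.10, 1.05, 1.08, 1.03]

# Two hypothetical individuals at the extremes: no risk alleles vs. all risk alleles.
low_risk = polygenic_risk_score([0, 0, 0, 0], odds_ratios)
high_risk = polygenic_risk_score([2, 2, 2, 2], odds_ratios)

print(f"low-risk score: {low_risk:.3f}")
print(f"high-risk score: {high_risk:.3f}")
print(f"implied odds ratio between extremes: {math.exp(high_risk - low_risk):.2f}")
```

Because each weight is small, a score separates individuals meaningfully only when many variants are summed, which is consistent with predictive power rising as GWAS identify more associations.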
The most effective path toward the goal of T2D biological discovery seems likely to include both traditional GWAS approaches and novel methods incorporating rare coding variant analyses (Table 2). GWAS approaches, which will only increase in power to interrogate low-frequency and rare variants as imputation reference panels expand, will likely persist for the foreseeable future as the most efficient means to identify genomic loci associated with disease. Coding variants seem better placed to help understand the functional consequences of an association by providing a variety of gene perturbations that can be both experimentally and phenotypically characterized, an area in which improved methods for conducting and dissecting gene-level tests could have a significant impact. Additionally, it seems plausible that some (maybe many) disease-relevant genes may prove undetectable by GWAS, as an association depends not only on the disease relevance of a gene but also on the historical emergence of a (reasonably common) genetic variant that sufficiently perturbs its function [17]. For T2D-relevant genes lacking a GWAS association (which might be identified as therapeutic or biological candidates from high-throughput functional screens or because they participate in a hypothesized disease-relevant pathway), coding variants offer the opportunity to conduct "reverse genetic" analyses in which variants that alter function of the gene are identified and then characterized for their effects on human phenotypes.
Although rare and low-frequency coding variants are not the panacea that some optimistic genetic models proposed after the first T2D GWAS, they are not as irrelevant to T2D as might be naively inferred from the now-appreciated dominant contribution of common noncoding variants to disease heritability. Instead, they provide one of several genetic tools necessary to understand and ultimately develop better treatments for complex diseases such as T2D, in particular a tool for probing and refining gene function. The true revolution enabled by rare coding variation may therefore be not the previous phase of complex disease discovery but the next phase: the functional and clinical translation of complex disease associations.
Table 2 Future studies to understand the biology of T2D and identify potential new therapies will require a combination of approaches to identify new genetic associations (Locus discovery), evaluate the role candidate genes play in human disease (Reverse genetics), translate associations to biological insights (Biological function), and suggest new therapies (Therapeutic translation). The table summarizes potential strategies to use common and rare variants toward each goal
Reverse genetics. Rare variants: Identify individuals with loss of function mutations ("human knockouts" or haploinsufficiency) or severe missense mutations, and analyze deep phenotypes
Biological function. Common variants: Determine mechanism of original common variant association through functional genomic predictions, genome editing, and readouts from cellular/animal models. Rare variants: Characterize an "allelic series" of missense mutations to assess the molecular and cellular consequences of varied gene perturbations
Therapeutic translation. Common variants: Of potential clinical utility to define subgroups or stratify populations through common variant polygenic risk scores. Rare variants: Use coding variants to link molecular and cellular readouts (effects of variants on an assay) to physiological phenotypes (genetic associations of the same variants) and potentially identify putative drug targets