The Contribution of Low-Frequency and Rare Coding Variation to Susceptibility to Type 2 Diabetes
Purpose of Review
Soon after the first genome-wide association study (GWAS) for type 2 diabetes (T2D) was published, it was hypothesized that rare and low-frequency variants might explain a substantial proportion of disease risk. Rare coding variants in particular were emphasized given their large expected role in disease. This review summarizes the extent to which recent T2D genetic studies provide evidence for or against this hypothesis.
Following a comprehensive study of T2D genetic architecture using three sequencing and genotyping technologies, four even larger studies have provided a yet higher resolution view of the role of rare and low-frequency coding variation in T2D susceptibility.
Empirical evidence strongly suggests that common regulatory variants are the dominant contributor to T2D heritability. However, rare coding variants may nonetheless be pervasive across T2D-relevant genes. A strategy using common variants to map disease genes, and rare coding variants to link molecular gene perturbations to cellular and phenotypic effects, may be an effective means to investigate T2D pathogenesis and potential new therapies.
Keywords: Rare variants; Coding variants; Exome sequencing; GWAS; RVAS; Genetic architecture
Genetic studies of complex diseases are largely motivated by two goals: to understand the heritable risk factors for disease in the population, and to identify biological processes relevant to disease pathogenesis. The first goal seeks to quantify the contribution of different classes of genetic variation to disease heritability. The second seeks to identify genetic “experiments of nature” that link genes or pathways to disease risk and potentially suggest new therapeutic strategies.
Coding variants have long been an emphasis in genetic studies for type 2 diabetes (T2D) and other complex diseases. Because they constitute the bulk of known genetic risk factors for Mendelian diseases, they have been hypothesized to contribute disproportionately to complex disease heritability [4, 5, 6]. Because their effects are usually easier to interpret than those of noncoding variants, they can lead to clear hypotheses about a disease-relevant gene and its directional relationship with disease risk (i.e., whether loss of function predisposes to or protects from disease) [5, 7]. The demonstration in 2005 that loss of function mutations in PCSK9 lower low-density lipoprotein levels and protect from coronary artery disease, and the successful cholesterol-lowering PCSK9 inhibitors consequently developed, have served as longstanding exemplars for many complex diseases.
When the first genome-wide association studies (GWAS) for T2D were published in 2007, some observers were therefore surprised that (a) most associations mapped outside of protein-coding regions of the genome and (b) the identified associations explained only a relatively small portion of disease risk. Early GWAS thus produced the first robust associations for T2D (a clear success [1, 12]) but in few cases provided clear insight into T2D’s genetic basis or its molecular and cellular mechanisms [5, 7, 13]. However, because GWAS directly or indirectly analyze only a limited set of common (minor allele frequency [MAF] > 5%) variants in the genome, their associations are not expected to explain all (or even most of) disease heritability, and might in fact tag disease-causal variants some distance away [2, 5].
This review will discuss how these early GWAS findings inspired a decade of studies to understand the role of low-frequency (MAF < 5%) and rare (MAF < 0.5%) coding variation in T2D susceptibility. In the past few years, a clear picture has begun to emerge as to how these variants contribute to T2D heritability and might be used to better understand T2D biology.
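The frequency classes used throughout this review can be written as a small helper. This is only an illustrative sketch in Python: the function name `frequency_class` is hypothetical, and the treatment of variants lying exactly on the 0.5% and 5% boundaries is an assumption, since the text gives only strict inequalities.

```python
def frequency_class(maf: float) -> str:
    """Classify a variant by minor allele frequency (MAF).

    Thresholds follow the review: rare (MAF < 0.5%), low-frequency
    (MAF < 5%), common otherwise. Boundary handling is an assumption.
    """
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie between 0 and 0.5")
    if maf < 0.005:
        return "rare"
    if maf < 0.05:
        return "low-frequency"
    return "common"


# Hypothetical example frequencies.
examples = {maf: frequency_class(maf) for maf in (0.001, 0.02, 0.30)}
```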
Hypotheses, Conceptual Frameworks, and Experimental Approaches
Following early GWAS findings, three hypotheses (or models) emerged about the contribution of low-frequency and rare variants to the “genetic architecture” of complex diseases. First, rare variants were hypothesized to have significantly larger effects on disease risk than do common variants [5, 7, 14, 15]. Purifying natural selection might prevent strong-effect variants from becoming common in a population [5, 16, 17], which could explain the empirically modest effects (odds ratio [OR] < 1.1) on disease risk of most common variants [1, 2]. Strong-effect, low-frequency variants could be more clinically or therapeutically actionable than modest-effect common variants [3, 18].
Second, rare variants were hypothesized to explain a significant amount of disease heritability [2, 7, 13]. There are many more rare variants than common variants within the population [19, 20], and GWAS by design do not interrogate them. If rare or low-frequency variants have significantly larger effects on average than do common variants, then they in aggregate could explain much of the heritability not captured by GWAS.
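The interplay between frequency and effect size in this second hypothesis can be illustrated with a standard approximation: under an additive model, a biallelic variant with allele frequency p and per-allele effect β contributes roughly 2p(1 − p)β² to trait variance. The sketch below applies this on the log-odds scale, taking β = ln(OR). That choice is a simplifying assumption for illustration only (it is not a liability-scale heritability estimate), and the function name and example values are hypothetical.

```python
from math import log


def variance_explained(maf: float, odds_ratio: float) -> float:
    """Approximate additive variance on the log-odds scale: 2p(1-p) * ln(OR)^2."""
    beta = log(odds_ratio)
    return 2.0 * maf * (1.0 - maf) * beta ** 2


# Hypothetical examples: one common modest-effect variant and
# one rare strong-effect variant.
common = variance_explained(maf=0.30, odds_ratio=1.1)
rare = variance_explained(maf=0.001, odds_ratio=2.0)

# Number of such rare variants needed to match the single common variant.
ratio = common / rare
```

Under these assumed values, a single rare variant with OR 2 contributes roughly a quarter of the variance of a single common variant with OR 1.1, so rare variants would need to be both numerous and strong in aggregate to account for much heritability.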
Third, rare variants were hypothesized to cause some, and perhaps a significant portion of, common variant GWAS associations. One or more disease-causal rare variants may, by chance, segregate non-randomly with a common variant, creating a “synthetic association” detected by a common variant GWAS [5, 21]. If synthetic associations are commonplace, they could impact the design of “fine mapping” studies (efforts to localize a GWAS “index variant” association to one or more causal variants), because index variants may lie significantly further from causal rare variants than they are expected to lie from causal common variants.
Testing these three hypotheses for T2D and other complex diseases required advances in rare variant ascertainment, genotyping, and association analysis. Foremost, rare variants can only be comprehensively ascertained through sequencing, and their large-scale study therefore required technology to advance from traditional Sanger sequencing of individual genes to cost-effective high-throughput next-generation sequencing. By 2010, next-generation sequencing technologies were inexpensive enough to apply to thousands of samples at select regions of the genome, beginning with several genes [23, 24] and soon expanding to the entire exome [25, 26]. Because cost considerations limited the total size of regions sequenced, most early studies focused on protein-coding regions of the genome.
As whole-exome sequence (and later whole-genome sequence) data progressively accrued, opportunities also emerged to genotype a subset of coding variants detected by sequencing in much larger sample sizes. By 2012, enough European ancestry exomes had been sequenced to enable design of an inexpensive SNP microarray (the Illumina Exome Array) capturing (at a cost one order of magnitude less than sequencing) roughly 70–80% of MAF > 0.5% coding variants in Europeans. By 2016, enough European ancestry genomes had been sequenced to enable a reference panel (compiled by the Haplotype Reference Consortium [HRC]) for high-quality imputation of MAF > 0.1% variants in European ancestry samples (provided the samples had been previously genotyped by a genome-wide SNP microarray). Exome array analysis and HRC-based imputation complement exome sequencing by trading off full variant ascertainment for increased study sample size (and therefore association power).
Exome array or HRC-based imputation studies predominantly employ traditional GWAS analyses, which test variants individually for disease association (“single-variant analysis”). By contrast, analyses of rare variants from exome sequencing studies require different approaches. In the early 2010s, many methods were advanced to aggregate rare variants and test for association at the level of genes (“gene-level analysis”). The most basic methods collapse variants of similar molecular effect and test for different frequencies of variation between disease cases and controls (“burden tests”). While simple and easy to interpret, burden tests rely on judicious selection of variants included in the test: their application has therefore been aided by bioinformatic algorithms for predicting protein-damaging variants and theoretical frameworks for understanding how different variant selection strategies impact power to detect association [30, 31]. Alternatively, statistical tests can be made more robust to the inclusion of benign variants in gene-level analysis, a strategy that motivated the design of tests such as SKAT and SKAT-O that test for an overdispersion of variant associations within a gene, rather than simply a directional excess of variation in cases or controls. Most rare variant studies therefore apply multiple methods for gene-level analysis. This increases the number of tests performed, but because most studies have far fewer genes than variants, the study-wide multiple testing burden is still lower than that of GWAS.
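The simplest form of burden test described above can be sketched in a few lines. The version below is an illustrative assumption rather than the method of any cited study: it collapses the selected variants into a per-sample carrier status and compares carrier counts between cases and controls with a two-sided Fisher's exact test. The function names, the toy genotypes, and the figure of roughly 20,000 genes used for the Bonferroni threshold are all hypothetical choices.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = comb(row1 + row2, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x carriers among the cases.
        return comb(row1, x) * comb(row2, col1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def burden_test(genotypes, is_case, qualifying):
    """Collapse qualifying variants into carrier status; compare cases and controls.

    genotypes:  one list per sample of alternate-allele counts (one per variant)
    is_case:    one boolean per sample
    qualifying: indices of variants selected for the test (e.g., predicted damaging)
    """
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for sample, case in zip(genotypes, is_case):
        carrier = any(sample[i] > 0 for i in qualifying)
        counts[(bool(case), carrier)] += 1
    p_value = fisher_exact_two_sided(
        counts[(True, True)], counts[(True, False)],
        counts[(False, True)], counts[(False, False)],
    )
    return counts[(True, True)], counts[(False, True)], p_value


# Toy data (hypothetical): three cases carry a qualifying allele, three controls do not.
genotypes = [[1, 0], [0, 1], [1, 0], [0, 0], [0, 0], [0, 0]]
is_case = [True, True, True, False, False, False]
case_carriers, control_carriers, p_value = burden_test(genotypes, is_case, [0, 1])

# Bonferroni threshold, assuming about 20,000 genes and two gene-level methods each.
n_genes, n_methods = 20_000, 2
threshold = 0.05 / (n_genes * n_methods)
```

Which variants enter `qualifying` is the judgment call the text describes: including benign variants dilutes the carrier contrast, which is the weakness that overdispersion tests such as SKAT are designed to mitigate.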
Table 1 Technologies for interrogating low-frequency and rare variants
Exome array: 70–80% of MAF > 0.5% variants
Sequence-based imputation: statistical inference of MAF > 0.1% variants (Europeans) or MAF > 1% variants (other populations)
Exome sequencing: all coding variants
Also compared across technologies: current T2D sample size; contribution to testing rare and low-frequency variant hypotheses
The First Studies of Low-Frequency and Rare Coding Variants
The first searches for low-frequency or rare variant T2D associations began even before the GWAS era. Following paradigms for Mendelian disease genetic mapping, many linkage and candidate gene studies were conducted for T2D in the late 1990s and early 2000s. Other than an association near TCF7L2 (still the largest genetic contributor to T2D risk), these studies produced few replicable associations [34, 36, 37] and today are cited less for their discoveries than as cautionary contrasts to the statistical rigor of GWAS [1, 13].
The first large-scale sequencing studies of T2D focused on genes with prior genetic links to T2D. One class of study focused on genes within GWAS regions, showing MTNR1B, SLC30A8 [39, 40], PPARG, and HNF1A to harbor collections of rare variants with moderate (OR 2–7) effects on T2D risk. Notably, in each case, stringent filtering of variants was necessary to reveal an association: the SLC30A8 association was detected with the small fraction of variants predicted to truncate SLC30A8-encoded protein, while systematic characterizations of rare variants in (gene-specific) assays were needed to identify associations for MTNR1B, PPARG, and HNF1A. Furthermore, each of these genes was already widely believed (prior to the sequencing studies) to mediate the original GWAS association; early sequencing studies were less successful at identifying truly novel GWAS effector genes [40, 43].
A second class of targeted sequencing studies focused on genes for Maturity Onset Diabetes of the Young (MODY) or other Mendelian diseases with clinical similarities to T2D. Beginning with small studies that showed rare variants in MODY genes to have effects on T2D risk in the general population (albeit with penetrances much lower than might have been expected) and continuing with larger studies that provided stronger statistical evidence of association [27, 44, 45••, 46, 47], MODY genes have been consistently shown to harbor not only rare variants that cause early onset Mendelian diseases but also a broader “allelic series” of variants that predispose to the later onset form of T2D. These findings are now widely interpreted as evidence that MODY and T2D are not distinct conditions but rather opposite extremes of a continuum of diabetes subtypes [15, 48, 49].
As exome sequencing, the exome array, and sequence-based imputation reference panels began to mature, the first genome-wide scans for rare and low-frequency variant T2D associations began to appear. The earliest exome array studies relevant to T2D were focused on glycemic traits; while some coding variants of moderate effect emerged from these studies (e.g., PAM for insulinogenic index, G6PC2 for fasting glucose [51, 52], and AKT2 for fasting insulin), the number of significant associations was much smaller than would be expected from the hypotheses positing large contributions of rare variants to complex trait heritability. Early T2D sequencing studies [46, 47, 54, 55] (each in a few thousand individuals) similarly were successful at identifying some, but not many, novel rare or low-frequency coding variant T2D associations (e.g., in PAM, PDX1, HNF1A, and ADCY3). Perhaps the biggest lesson to emerge from these investigations is the value offered by studies of populations either isolated or subject to historical bottlenecks (e.g., Iceland, Mexico, Finland, Greenland [56, 57]) as a means to identify strong-effect T2D variants that have, by chance (e.g., genetic drift) or perhaps even positive selection, risen to moderate (or even high) frequency.
Early studies of low-frequency and rare variation thus suggested that rare variant hypotheses might have been overly optimistic in their predictions about the contribution of rare variation to T2D. However, a definitive assessment of these hypotheses would require global and systematic analyses of larger datasets.
An Emerging Picture of T2D Genetic Architecture
While a study of a few thousand sequenced individuals might be expected to detect a substantial number of rare variant associations under optimistic models, more measured early-stage simulations [58, 59] and analytical calculations had predicted that tens of thousands of sequenced individuals would be required for reasonable power to interrogate the rare variant hypothesis for most complex diseases. An early sequencing study of 1000 T2D cases and 1000 controls, for example, had power to exclude only extreme models in which rare variants in < 20 genes explained the majority of T2D risk. The lack of rare variant associations from early studies did not, therefore, rule out rare variant models for T2D, and a systematic simulation study showed that, prior to large-scale sequencing studies, rare and common variant models could each be constructed as consistent with empirical T2D genetic associations [61•].
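The dependence of power on sample size can be made concrete with a rough calculation. The sketch below is not the simulation framework of the cited studies; it assumes a simple two-sided two-proportion z-test on carrier frequency, and the aggregate carrier frequency (0.5% in controls), odds ratio (2), and exome-wide significance level (2.5 × 10⁻⁶) are hypothetical values chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist


def burden_power(n_cases, n_controls, carrier_freq_controls, odds_ratio, alpha):
    """Approximate power of a two-sided two-proportion z-test on carrier frequency."""
    q0 = carrier_freq_controls
    # Convert the control carrier frequency and odds ratio to a case carrier frequency.
    odds_cases = odds_ratio * q0 / (1.0 - q0)
    q1 = odds_cases / (1.0 + odds_cases)
    se = sqrt(q1 * (1.0 - q1) / n_cases + q0 * (1.0 - q0) / n_controls)
    z_effect = abs(q1 - q0) / se
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Ignores the negligible chance of rejecting in the wrong direction.
    return NormalDist().cdf(z_effect - z_crit)


# Hypothetical gene: 0.5% of controls carry a qualifying variant, aggregate OR of 2.
small_study = burden_power(1_000, 1_000, 0.005, 2.0, alpha=2.5e-6)
large_study = burden_power(20_000, 20_000, 0.005, 2.0, alpha=2.5e-6)
```

Under these assumed parameters, power is well below 1% with 1000 cases and 1000 controls and rises to well over half with 20,000 of each, in line with the prediction that tens of thousands of sequenced individuals are needed.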
The study that took the largest step toward constraining potential T2D genetic architectures was published in 2016 [45••], analyzing ~ 13,000 multi-ethnic exomes, ~ 2700 whole European ancestry genomes, ~ 80,000 samples genotyped on the exome array, and ~ 44,000 samples with genotypes imputed from a whole-genome sequence reference panel enriched for T2D cases. Using these data collectively, the study provided insights into all three major hypotheses about the role of rare and low-frequency variants in T2D genetic susceptibility. First, despite near-complete variant ascertainment in a modest-size European ancestry sample, only one low-frequency variant (a previously reported noncoding variant in CCND2) achieved genome-wide significance, enabling quantitative bounds on the T2D effect sizes of low-frequency variants, which, in short, rejected models proposing a significant number of low-frequency strong-effect T2D variants. Second, simulated rare variant models predicted far more rare and low-frequency variant associations than were observed empirically, instead supporting a T2D genetic architecture characterized by many modest-effect common variants. Third, no rare variants could plausibly explain any significant T2D GWAS signals, rejecting synthetic associations as a common phenomenon for T2D. A fourth finding of the study was that no gene-level coding variant associations reached exome-wide significance, although the implications of this finding for the validity of rare variant models were not pursued in detail.
Since the publication of the 2016 study, four other large-scale studies have further constrained the contribution of rare and low-frequency variants to T2D susceptibility. The first study performed deep whole-genome sequencing of 20 large Hispanic pedigrees (spanning ~ 1000 individuals), providing the opportunity to observe and analyze multiple copies of extremely rare variants (e.g., those private to a family). Although the power of the study design was validated by the identification of several rare variant associations with gene expression (cis-expression quantitative trait loci), no evidence of large-effect rare variant associations was observed for T2D in these families.
The second study [63••] applied the exome array to ~ 450,000 samples (~ 80,000 with T2D), significantly increasing power to interrogate MAF > 0.5% coding variation for T2D association. Although the study identified 40 coding variant associations, only five had observed MAF < 5% and none had observed OR > 1.4, strongly suggesting the fruitlessness of searches for low-frequency or common coding variants with even moderate effects on T2D risk. Furthermore, through fine mapping with densely imputed GWAS data, < 50% of the 40 coding variants identified in the study were shown to be causally linked to T2D risk, with the remainder likely proxies for nearby noncoding causal variants. Coding variant associations therefore cannot be immediately assumed to implicate specific variants or genes, although (because most of the 40 associations analyzed in the study were observed with common variants) the proportion of rare coding variant T2D associations that are causal may well be significantly higher.
The third study [64••] used HRC-based imputation to analyze ~ 900,000 European samples (~ 75,000 with T2D), providing even greater power to detect T2D associations with variants as rare as MAF ~ 0.1% (although imputation quality, and therefore effective sample size, is lower for rarer variants). This study produced by far the largest catalog of low-frequency and rare variant associations to date for T2D, identifying associations with 56 low-frequency (0.5% < MAF < 5%) and 14 rare (MAF < 0.5%) variants across 60 loci; many of these variants are nearby but independent from common variants identified by earlier GWAS. Although variant OR estimates were not independently validated (and may be overestimates), some of the identified low-frequency variants had moderate to high estimated effects on T2D risk, with 14 having observed OR > 2 and two having observed OR ~ 8. However, only seven of the 56 low-frequency variant associations lie within coding regions, and all of these had estimated OR < 2. Collectively, low-frequency variant associations in the study were estimated to explain roughly 15-fold less T2D heritability than were common variant associations in the same study (1.13% vs. 16.3%), implying the heritability explained by low-frequency coding variants to be even lower (by perhaps an order of magnitude).
These three studies progressively limited the role of low-frequency and rare coding variants in T2D susceptibility; however, they collectively ascertained at most only a small fraction of rare coding variation in the population. The final, and most recent, large-scale genetic study of T2D [65••] used exome sequencing in five major ethnic groups to analyze ~ 45,000 samples (~ 20,000 with T2D) across ~ 3M coding variants, ~ 95% of which are rare and ~ 90% of which are absent from exome array or HRC-based imputation studies. This study, as expected, identified essentially no novel coding single-variant associations (only one low-frequency variant in the known obesity and T2D gene MC4R). However, it did demonstrate, for the first time, exome-wide significant gene-level associations (PAM, MC4R, and SLC30A8). Notably, these three genes had been previously implicated in T2D via GWAS, and the rare variants contributing to the gene-level signals explain significantly less T2D heritability than do the nearby independently associated common variants.
Nonetheless, exome sequencing at this scale did reveal evidence for pervasive rare variant associations across T2D-relevant genes. Twelve gene sets, defined based on prior evidence from mouse models, T2D drug targets, or monogenic diabetes, each exhibited significantly more rare variant gene-level associations than expected by chance. Individually, the gene-level associations were modest at best, requiring (in the case of T2D drug targets) perhaps 500K–1M samples to be detected at exome-wide statistical significance. However, these results suggest that, once a gene is established as relevant to T2D, it might reasonably be expected to carry a rare coding variant allelic series that could be mined for more insight into its function.
The Use of Coding Variants to Understand Disease Biology
Although it seems likely that rare and low-frequency variants contribute modestly to T2D heritability, they may still play a significant role in efforts to understand disease biology or design new therapies. It has long been appreciated that a variant need not explain much heritability to offer valuable disease insight [12, 13, 15], as a modest-effect variant could point to a gene of which larger therapeutic perturbations may have a significant impact or to a pathway that might suggest an important new disease mechanism. Translating genetic associations to biological function, and then investigating how natural or potential therapeutic alteration of these functions affects disease risk or progression, represents the most pressing current and future challenge in complex disease research.
Table 2 The role of common and rare variation in future T2D genetic studies
Common variants: GWAS in large sample sizes, imputed from progressively larger whole-genome sequence reference panels
Rare coding variants: Limited role for the foreseeable future (and possibly longer) due to significantly greater efficiency of GWAS
Common variants: Limited role due to difficulties in identifying variants with clear molecular function
Rare coding variants: Identify individuals with loss of function mutations (“human knockouts” or haploinsufficiency), or severe missense mutations and analyze deep phenotypes
Common variants: Determine mechanism of original common variant association through functional genomic predictions, genome editing, and readouts from cellular/animal models
Rare coding variants: Characterize an “allelic series” of missense mutations to assess the molecular and cellular consequences of varied gene perturbations
Common variants: Of potential clinical utility to define subgroups or stratify populations through common variant polygenic risk scores
Rare coding variants: Use coding variants to link molecular and cellular readouts (effects of variants on an assay) to physiological phenotypes (genetic associations of the same variants) and potentially identify putative drug targets
The identification and subsequent characterization of SLC30A8 provides an illustrative example of this approach. A coding variant in SLC30A8 was one of the earliest findings of T2D GWAS, and the gene immediately piqued research and therapeutic interest: its protein product ZnT8 is expressed mainly in pancreatic islets and transports zinc into insulin-containing granules, thus contributing to insulin processing and storage. Because of its known function, the initial hypothesis was that reduced ZnT8 activity would increase risk of T2D [72, 73]; however, when numerous Slc30a8 knockout mice were subsequently created and tested for a variety of glycemic phenotypes, no consistent effects on hyperglycemia emerged [73, 74, 75, 76, 77].
It was therefore surprising when 12 rare protein-truncating variants in SLC30A8 were shown to associate, in aggregate, with protection from T2D. While these data offered no insight into a potential protective mechanism, their clear predicted molecular effects (reduced ZnT8 activity) and expression within the full human system (rather than mice) prompted re-evaluation of the link between ZnT8 and T2D. After several years of further research, a recent mouse model of the most common SLC30A8 rare protein-truncating variant offered a potential mechanism for T2D protection, through increased first-phase insulin secretion [78•].
Much still remains to be determined about the mechanism linking ZnT8 loss of function to increased insulin secretion, not only in animal models but also in humans. One limitation of using the 12 SLC30A8 protein-truncating variants to explore this relationship is that all are expected to result in human haploinsufficiency, and they thus do not inform on potential effects of either full loss of function or more measured reductions in ZnT8 activity. Coding variants could help address this knowledge gap in two respects. First, if SLC30A8 “human knockouts” (individuals with homozygous or compound heterozygous loss of function mutations) were identified [18, 79], they could be deeply characterized for various phenotypes to better understand the intermediate human physiological processes responsible for T2D protection. Second, the exome-wide significant series of > 100 SLC30A8 protective missense alleles from the most recent T2D exome sequencing study [65••] could be used to probe the SLC30A8 “dose-response” curve: each allele could be introduced into a molecular or cellular assay (such as zinc transport or insulin secretion), and their effects on these assays could then be compared to their T2D protective effects to calibrate the relationship between T2D risk and the biological process measured by the assay. Particularly interesting variants could be subsequently introduced into an Slc30a8 mouse model for further characterization. This paradigm has been previously demonstrated for T2D in the context of PPARG and insulin resistance, in which simultaneous effects of rare coding variants on T2D risk (when analyzed in a genetic study) and adipogenesis (when introduced into a cellular assay) validated the gene’s mechanism of action. Furthermore, once an assay has been established as a proxy for human disease risk, it represents an attractive asset for future therapeutic screens.
The past 5 years of T2D genetic research have addressed the goal of understanding T2D genetic architecture: it is now highly likely that T2D genetic risk is mostly determined by many common, modest-effect regulatory variants rather than a few rare, large-effect coding variants. The jury is still out on the extent to which rare and low-frequency variants of individually large effect might aid disease risk prediction in selected groups: only a few examples of low-frequency variants with large effects on T2D risk have been identified [57, 80], and these are typically highly population-specific (and in fact often common in the particular population). Collections of large-effect rare variants may reach sufficient aggregate frequency for useful risk prediction in some genes, but to provide clinical utility, these variants will likely require functional characterization [42, 81•, 82, 83], a significant limitation for variants in novel genes. By contrast, polygenic risk scores constructed from many common variants have steadily increased in predictive power: those constructed from the latest T2D GWAS show a ninefold difference in disease prevalence for individuals at the high and low extremes of risk [64••].
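A polygenic risk score of the kind mentioned above is, in its simplest form, a weighted sum of risk-allele counts. The sketch below assumes the common convention of weighting each variant by the natural log of its GWAS odds ratio; the function name and the example dosages and odds ratios are hypothetical, and published scores typically involve additional steps (variant selection, weight shrinkage) not shown here.

```python
from math import log


def polygenic_risk_score(dosages, odds_ratios):
    """Weighted sum of risk-allele dosages, with ln(OR) as the per-allele weight.

    dosages:     risk-allele count per variant for one individual (0, 1, or 2,
                 or a fractional imputed dosage)
    odds_ratios: per-allele odds ratio for the same variants, in the same order
    """
    if len(dosages) != len(odds_ratios):
        raise ValueError("dosages and odds_ratios must have the same length")
    return sum(d * log(o) for d, o in zip(dosages, odds_ratios))


# Hypothetical individual genotyped at three common variants.
score = polygenic_risk_score([2, 0, 1], [1.10, 1.20, 1.05])
```

Individuals are then usually ranked by score within a reference population, which is how comparisons between the high and low extremes of risk are made.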
The most effective path toward the goal of T2D biological discovery seems likely to include both traditional GWAS approaches and novel methods incorporating rare coding variant analyses (Table 2). GWAS approaches, which will only increase in power to interrogate low-frequency and rare variants as imputation reference panels expand, will likely persist for the foreseeable future as the most efficient means to identify genomic loci associated with disease. Coding variants seem better placed to help understand the functional consequences of an association by providing a variety of gene perturbations that can be both experimentally and phenotypically characterized, an area in which improved methods for conducting and dissecting gene-level tests could have a significant impact. Additionally, it seems plausible that some (maybe many) disease-relevant genes may prove undetectable by GWAS, as an association depends not only on the disease relevance of a gene but also on the historical emergence of a (reasonably common) genetic variant that sufficiently perturbs its function. For T2D-relevant genes lacking a GWAS association (which might be identified as therapeutic or biological candidates from high-throughput functional screens or because they participate in a hypothesized disease-relevant pathway), coding variants offer the opportunity to conduct “reverse genetic” analyses in which variants that alter function of the gene are identified and then characterized for their effects on human phenotypes.
Although rare and low-frequency coding variants are not the panacea that some optimistic genetic models proposed after the first T2D GWAS, they are not as irrelevant to T2D as might be naively inferred from the now-appreciated dominant contribution of common noncoding variants to disease heritability. Instead, they provide one of several genetic tools necessary to understand and ultimately develop better treatments for complex diseases such as T2D: in particular, a tool for probing and refining gene function. The true revolution enabled by rare coding variation may therefore lie not in the previous phase of complex disease discovery but in the next phase, the functional and clinical translation of complex disease associations.
Compliance with Ethical Standards
Conflict of Interest
Jason Flannick reports personal fees from Decibel Therapeutics.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
Papers of particular interest, published recently, have been highlighted as: • Of importance •• Of major importance
- 11.Hindorff LA, Sethupathy P, Junkins HA, Ramos EM, Mehta JP, Collins FS, et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc Natl Acad Sci U S A. 2009;106(23):9362–7. https://doi.org/10.1073/pnas.0903103106.CrossRefPubMedPubMedCentralGoogle Scholar
- 27.Bansal V, Gassenhuber J, Phillips T, Oliveira G, Harbaugh R, Villarasa N, et al. Spectrum of mutations in monogenic diabetes genes identified from high-throughput DNA sequencing of 6888 individuals. BMC Med. 2017;15(1):213. https://doi.org/10.1186/s12916-017-0977-3.CrossRefPubMedPubMedCentralGoogle Scholar
- 33.Lee S, Emond MJ, Bamshad MJ, Barnes KC, Rieder MJ, Nickerson DA, et al. Optimal unified approach for rare-variant association testing with application to small-sample case-control whole-exome sequencing studies. Am J Hum Genet. 2012;91(2):224–37. https://doi.org/10.1016/j.ajhg.2012.06.007.CrossRefPubMedPubMedCentralGoogle Scholar
- 36.Guan W, Pluzhnikov A, Cox NJ, Boehnke M. International Type 2 Diabetes Linkage Analysis C. Meta-analysis of 23 type 2 diabetes linkage studies from the International Type 2 Diabetes Linkage Analysis Consortium. Hum Hered. 2008;66(1):35–49. https://doi.org/10.1159/000114164.CrossRefPubMedGoogle Scholar
- 39. Billings LK, Jablonski KA, Ackerman RJ, Taylor A, Fanelli RR, McAteer JB, et al. The influence of rare genetic variation in SLC30A8 on diabetes incidence and beta-cell function. J Clin Endocrinol Metab. 2014;99(5):E926–30. https://doi.org/10.1210/jc.2013-2378.
- 41. Majithia AR, Flannick J, Shahinian P, Guo M, Bray MA, Fontanillas P, et al. Rare variants in PPARG with decreased activity in adipocyte differentiation are associated with increased risk of type 2 diabetes. Proc Natl Acad Sci U S A. 2014;111(36):13127–32. https://doi.org/10.1073/pnas.1410428111.
- 44. Flannick J, Beer NL, Bick AG, Agarwala V, Molnes J, Gupta N, et al. Assessing the phenotypic effects in the general population of rare variants in genes for a dominant Mendelian form of diabetes. Nat Genet. 2013;45(11):1380–5. https://doi.org/10.1038/ng.2794.
- 45.•• Fuchsberger C, Flannick J, Teslovich TM, Mahajan A, Agarwala V, Gaulton KJ, et al. The genetic architecture of type 2 diabetes. Nature. 2016;536(7614):41–7. https://doi.org/10.1038/nature18642. This paper used a combination of next-generation sequencing technologies and novel analytical approaches to provide the most comprehensive characterization of T2D genetic architecture to date.
- 51. Mahajan A, Sim X, Ng HJ, Manning A, Rivas MA, Highland HM, et al. Identification and functional characterization of G6PC2 coding variants influencing glycemic traits define an effector transcript at the G6PC2-ABCB11 locus. PLoS Genet. 2015;11(1):e1004876. https://doi.org/10.1371/journal.pgen.1004876.
- 53. Manning A, Highland HM, Gasser J, Sim X, Tukiainen T, Fontanillas P, et al. A low-frequency inactivating AKT2 variant enriched in the Finnish population is associated with fasting insulin levels and type 2 diabetes risk. Diabetes. 2017;66(7):2019–32. https://doi.org/10.2337/db16-1329.
- 55. Lohmueller KE, Sparso T, Li Q, Andersson E, Korneliussen T, Albrechtsen A, et al. Whole-exome sequencing of 2,000 Danish individuals and the role of rare coding variants in type 2 diabetes. Am J Hum Genet. 2013;93(6):1072–86. https://doi.org/10.1016/j.ajhg.2013.11.005.
- 59. Moutsianas L, Agarwala V, Fuchsberger C, Flannick J, Rivas MA, Gaulton KJ, et al. The power of gene-based rare variant methods to detect disease-associated variation and test hypotheses about complex disease. PLoS Genet. 2015;11(4):e1005165. https://doi.org/10.1371/journal.pgen.1005165.
- 62.• Jun G, Manning A, Almeida M, Zawistowski M, Wood AR, Teslovich TM, et al. Evaluating the contribution of rare variants to type 2 diabetes and related traits using pedigrees. Proc Natl Acad Sci U S A. 2017;115:379–84. https://doi.org/10.1073/pnas.1705859115. This paper employed a novel pedigree strategy to characterize ultra-rare variants (private to a family) for effects on T2D, showing that they contribute minimally to T2D.
- 63.•• Mahajan A, Wessel J, Willems SM, Zhao W, Robertson NR, Chu AY, et al. Refining the accuracy of validated target identification through coding variant fine-mapping in type 2 diabetes. Nat Genet. 2018;50(4):559–71. https://doi.org/10.1038/s41588-018-0084-1. This paper is the largest exome array study of T2D to date, further limiting the contribution to T2D risk from low-frequency coding variants.
- 64.•• Mahajan A, Taliun D, Thurner M, Robertson NR, Torres JM, Rayner NW, et al. Fine-mapping type 2 diabetes loci to single-variant resolution using high-density imputation and islet-specific epigenome maps. Nat Genet. 2018;50(11):1505–13. https://doi.org/10.1038/s41588-018-0241-6. This paper is the largest T2D GWAS to date, significantly expanding the number of common and low-frequency variants associated with T2D but finding no new evidence for moderate-effect coding variants.
- 65.•• Flannick J, Mercader JM, Fuchsberger C, Udler MS, Mahajan A, Wessel J, et al. Genetic discovery and translational decision support from exome sequencing of 20,791 type 2 diabetes cases and 24,440 controls from five ancestries. bioRxiv. 2018. https://doi.org/10.1101/371450. This paper is the largest T2D exome sequencing study to date, demonstrating evidence for pervasive rare variant T2D gene-level signals but showing them to contribute minimally to T2D heritability.
- 73. Nicolson TJ, Bellomo EA, Wijesekara N, Loder MK, Baldwin JM, Gyulkhandanyan AV, et al. Insulin storage and glucose homeostasis in mice null for the granule zinc transporter ZnT8 and studies of the type 2 diabetes-associated variants. Diabetes. 2009;58(9):2070–83. https://doi.org/10.2337/db09-0551.
- 74. Pound LD, Sarkar SA, Ustione A, Dadi PK, Shadoan MK, Lee CE, et al. The physiological effects of deleting the mouse SLC30A8 gene encoding zinc transporter-8 are influenced by gender and genetic background. PLoS One. 2012;7(7):e40972. https://doi.org/10.1371/journal.pone.0040972.
- 76. Wijesekara N, Dai FF, Hardy AB, Giglou PR, Bhattacharjee A, Koshkin V, et al. Beta cell-specific Znt8 deletion in mice causes marked defects in insulin processing, crystallisation and secretion. Diabetologia. 2010;53(8):1656–68. https://doi.org/10.1007/s00125-010-1733-9.
- 77. Lemaire K, Ravier MA, Schraenen A, Creemers JW, Van de Plas R, Granvik M, et al. Insulin crystallization depends on zinc transporter ZnT8 expression, but is not required for normal glucose homeostasis in mice. Proc Natl Acad Sci U S A. 2009;106(35):14872–7. https://doi.org/10.1073/pnas.0906587106.
- 78.• Kleiner S, Gomez D, Megra B, Na E, Bhavsar R, Cavino K, et al. Mice harboring the human SLC30A8 R138X loss-of-function mutation have increased insulin secretory capacity. Proc Natl Acad Sci U S A. 2018;115(32):E7642–E9. https://doi.org/10.1073/pnas.1721418115. This paper provides experimental evidence that SLC30A8 loss of function protects from T2D, confirming one of the earliest predictions from a rare variant T2D association.
- 81.• Majithia AR, Tsuda B, Agostini M, Gnanapradeepan K, Rice R, Peloso G, et al. Prospective functional classification of all possible missense variants in PPARG. Nat Genet. 2016;48(12):1570–5. https://doi.org/10.1038/ng.3700. This paper introduces a paradigm for systematically characterizing coding variants in a T2D-relevant functional assay, of potential import for the future clinical and biological utility of coding variants in T2D.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.