GWASs are revealing increasing numbers of loci associated with various diseases. However, our understanding of the biological mechanisms behind these genetic variants is, in many cases, incomplete. eQTLs have the potential to aid in deciphering these variants by associating them with gene expression.
Although a recent study discovered no significant eQTL relationships for well-known type 2 diabetes SNPs in colon, pancreas or liver tissue [23], our initial analysis of type 2 diabetes expression traits revealed several such examples. This most likely reflects the large number of individuals from whom we could collect samples, as well as our inclusion of adipose tissue in this study. For instance, the expression of TCF7L2 is not associated with any of the SNPs studied, but there are several genes, including VTI1A, PDCD4 and CASP7, whose expression is associated with SNPs in TCF7L2. VTI1A, which we observed as an expression trait of the TCF7L2 SNP in omental adipose tissue, is a vesicle-soluble NSF attachment protein receptor (v-SNARE) that is a component of insulin-sensitive GLUT4-containing vesicles and affects insulin-dependent glucose transport in adipocytes [24]; PDCD4, another expression trait of the same SNP, plays a crucial role in pancreatic beta cell death in type 1 diabetes [25]; CASP7 has also been identified as a positional candidate gene for type 1 diabetes [26]. Each of these expression traits suggests a possible mechanism by which the TCF7L2 SNP could contribute to type 2 diabetes.
As another example, rs1111875 and rs5015480, variants located close to the HHEX gene, are actually associated with the expression of IDE (encoding insulin-degrading enzyme) in subcutaneous fat. This association suggests that the functional significance of these SNPs in type 2 diabetes is relayed through the expression of IDE, which plays a central role in insulin metabolism [27]. In other words, explaining GWAS findings using eSNPs, as demonstrated here, might help to distinguish between two nearby genes with radically different potential mechanisms for disease.
Finally, rs564398, a variant in CDKN2B antisense RNA 1 (CDKN2B-AS1), is associated with the expression of CDKN2A in omental adipose tissue and PTPLAD2 in liver. A recent study reported that this SNP was associated with the expression of CDKN2B-AS1 but not CDKN2A/B in peripheral blood [28]; our result may be specific to the tissue types that we investigated. Here, our analysis using eSNPs suggests several different candidate mechanisms across separate tissues; it could be that the variants most significantly associated with type 2 diabetes act through distinct mechanisms in multiple relevant tissues.
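The eSNP examples above can be organised as a simple lookup from SNP to expression trait and tissue. The following is a minimal sketch, not the pipeline used in this study: the function names and the flat triple layout are assumptions made for illustration, and only the associations whose SNP, gene and tissue are all stated in the text are included.

```python
from collections import defaultdict

# (SNP, expression trait, tissue) triples taken from the examples in the text.
# The triple layout is an illustrative assumption, not the study's data format.
ESNP_ASSOCIATIONS = [
    ("rs1111875", "IDE", "subcutaneous fat"),
    ("rs5015480", "IDE", "subcutaneous fat"),
    ("rs564398", "CDKN2A", "omental adipose"),
    ("rs564398", "PTPLAD2", "liver"),
]


def expression_traits_by_tissue(associations, snp):
    """Return {tissue: sorted gene list} for one SNP (hypothetical helper)."""
    by_tissue = defaultdict(set)
    for assoc_snp, gene, tissue in associations:
        if assoc_snp == snp:
            by_tissue[tissue].add(gene)
    return {tissue: sorted(genes) for tissue, genes in by_tissue.items()}


def multi_tissue_snps(associations):
    """Return SNPs whose expression traits span more than one tissue."""
    tissues_per_snp = defaultdict(set)
    for assoc_snp, _gene, tissue in associations:
        tissues_per_snp[assoc_snp].add(tissue)
    return sorted(s for s, ts in tissues_per_snp.items() if len(ts) > 1)


if __name__ == "__main__":
    print(expression_traits_by_tissue(ESNP_ASSOCIATIONS, "rs564398"))
    print(multi_tissue_snps(ESNP_ASSOCIATIONS))
```

Grouping by tissue makes the point of the rs564398 example explicit: a single variant can map to different expression traits in different tissues.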
In this study, we devised a ranking system that uses these type 2 diabetes expression traits in combination with coexpression networks in metabolically important tissues to discover novel genes associated with type 2 diabetes. By combining eQTL datasets from two different studies, we discovered that eSNPs regulating highly ranked genes in these tissues showed a significantly increasing trend of association with type 2 diabetes in two well-known GWASs, specifically those performed by the WTCCC and the GENEVA initiative. Although the trend observed in the GENEVA study reached significance only at a higher quantile cut-off, we consider it sufficient to support our findings.
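To make the logic of ranking followed by a trend check concrete, here is a toy sketch. It is not the scoring scheme or the statistical test used in this study: the scoring rule (counting coexpressed seed genes), the function names, the gene labels and all p-values are hypothetical, and the enrichment fraction shown is descriptive rather than a significance test.

```python
def rank_by_seed_neighbours(network, seeds):
    """Rank non-seed genes by how many seed genes they are coexpressed with.

    `network` maps each gene to the set of its coexpression neighbours.
    This scoring rule is an illustrative assumption.
    """
    scores = {
        gene: len(neighbours & seeds)
        for gene, neighbours in network.items()
        if gene not in seeds
    }
    # Highest score first; ties are broken alphabetically for reproducibility.
    return sorted(scores, key=lambda g: (-scores[g], g))


def enrichment_by_cutoff(ranked, gene_to_pvalues, cutoffs, alpha=0.05):
    """For each top-N cut-off, return the fraction of eSNP GWAS p-values below alpha."""
    fractions = []
    for n in cutoffs:
        pvals = [p for g in ranked[:n] for p in gene_to_pvalues.get(g, [])]
        if pvals:
            fractions.append(sum(p < alpha for p in pvals) / len(pvals))
        else:
            fractions.append(float("nan"))
    return fractions


# Toy coexpression network and toy GWAS p-values (all values are made up).
network = {
    "seed1": {"g1", "g2"},
    "seed2": {"g1", "g3"},
    "g1": {"seed1", "seed2"},
    "g2": {"seed1"},
    "g3": {"seed2", "g4"},
    "g4": {"g3"},
}
seeds = {"seed1", "seed2"}
gene_to_pvalues = {
    "g1": [0.01, 0.03],
    "g2": [0.04, 0.2],
    "g3": [0.6],
    "g4": [0.8, 0.5],
}

ranked = rank_by_seed_neighbours(network, seeds)
fractions = enrichment_by_cutoff(ranked, gene_to_pvalues, cutoffs=[1, 2, 4])

if __name__ == "__main__":
    print(ranked)
    print(fractions)
```

In this toy example the fraction of nominally associated eSNPs falls as the cut-off is relaxed from the top-ranked gene to all genes, which is the kind of pattern the trend analysis looks for; a real analysis would need a formal test and a correction for linkage between SNPs.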
Having thus confirmed our hypothesis, we reasoned that the highly ranked genes that were not even marginally associated with type 2 diabetes in a GWAS might also be relevant to the pathogenesis of type 2 diabetes. We therefore investigated the other novel genes highlighted by our algorithm. Many of the novel genes we identified are primarily expressed in skeletal or cardiac muscle, and several are involved in insulin signalling or glucose metabolism. For example, Ca2+ influx through L-type Ca2+ channels (CACNA1S) is essential for glucose-stimulated insulin secretion [29], while sarcosin (KBTBD10) is a cytoskeletal protein that, like VTI1A, is associated with the insulin-stimulated glucose transporter GLUT4; interestingly, this association is suppressed in the presence of insulin [30]. PYGM (phosphorylase, glycogen, muscle) and PFKM (phosphofructokinase, muscle) are key enzymes in glycogenolysis and glycolysis, respectively.
We also discovered several genes involved in cholesterol (INSIG1, insulin induced gene 1; HMGCS1, HMG-CoA synthase 1; IDI1, isopentenyl diphosphate isomerase 1) and fatty acid (FADS2, fatty acid desaturase 2; ECHDC1, enoyl CoA hydratase domain containing 1) metabolism. In particular, FADS2 activity has been linked to the risk of developing type 2 diabetes [31].
Fructokinase (KHK) is another interesting candidate gene revealed by our analysis. The end product of KHK is fructose-1-phosphate, which accelerates the release of glucokinase (GK) from its regulatory protein (GKRP) [32]. Glucokinase serves as a glucose sensor in the pancreatic beta cells and is being evaluated as a potential drug target for type 2 diabetes [33].
The method presented here represents a paradigm for using eQTLs and prior knowledge of SNPs associated with a disease to discover additional candidate genes and variants for that disease. The strength of this approach lies in the fact that it incorporates the functional significance of the SNPs encapsulated in the association of eQTLs and expression traits. In addition, using coexpression networks constructed in various tissues enables the discovery of candidate genes not expressed in the tissues used for mapping eQTLs. As the number and quality of tissue-specific eQTL studies increase and improve, we anticipate that the power of this type of analysis to detect novel associations will also be enhanced dramatically. This highlights once again the importance of making this type of data available, so that the greater community of scientists may benefit.