Human Genetics, Volume 137, Issue 5, pp 413–425

Leveraging epigenomics and contactomics data to investigate SNP pairs in GWAS

  • Elisabetta Manduchi
  • Scott M. Williams
  • Alessandra Chesi
  • Matthew E. Johnson
  • Andrew D. Wells
  • Struan F. A. Grant
  • Jason H. Moore
Original Investigation


Abstract

Although genome-wide association studies (GWAS) have yielded many valuable insights into the genetic bases of common diseases over the past decade, the issue of missing heritability has surfaced: the main-effect genetic variants discovered to date do not account for much of a trait’s predicted genetic component. We present a workflow, integrating epigenomics and topologically associating domain data, aimed at discovering trait-associated SNP pairs from GWAS where neither SNP achieved independent genome-wide significance. Each analyzed SNP pair consists of one SNP in a putative active enhancer and another SNP in a putative physically interacting gene promoter in a trait-relevant tissue. As a proof-of-principle case study, we used this approach to identify focused collections of SNP pairs that we analyzed in three independent Type 2 diabetes (T2D) GWAS. This approach led us to discover 35 significant SNP pairs, encompassing both novel signals and signals for which we have found orthogonal support from other sources. Nine of these pairs are consistent with eQTL results, two are consistent with our own capture C experiments, and seven involve signals supported by recent T2D literature.
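As a rough illustration of the pairing step described in the abstract, the sketch below enumerates candidate (enhancer SNP, promoter SNP) pairs. It assumes, for illustration only, that a putative physical interaction means both SNPs fall within the same topologically associating domain (TAD). The `Interval` type, the function names, and the SNP identifiers are hypothetical and are not taken from the authors' workflow.

```python
from itertools import product
from typing import Dict, List, NamedTuple, Tuple


class Interval(NamedTuple):
    """A genomic interval; start is inclusive, end is exclusive."""
    chrom: str
    start: int
    end: int


def contains(region: Interval, chrom: str, pos: int) -> bool:
    """Return True if the position (chrom, pos) lies inside the region."""
    return region.chrom == chrom and region.start <= pos < region.end


def candidate_pairs(
    snps: Dict[str, Tuple[str, int]],
    enhancers: List[Interval],
    promoters: List[Interval],
    tads: List[Interval],
) -> List[Tuple[str, str]]:
    """List (enhancer SNP, promoter SNP) pairs that share a TAD.

    Assumption: sharing a TAD stands in for "putatively interacting".
    """
    pairs = []
    for tad in tads:
        # Restrict to SNPs located in this TAD.
        in_tad = {s: loc for s, loc in snps.items() if contains(tad, *loc)}
        enh = sorted(s for s, loc in in_tad.items()
                     if any(contains(e, *loc) for e in enhancers))
        pro = sorted(s for s, loc in in_tad.items()
                     if any(contains(p, *loc) for p in promoters))
        # Pair every enhancer SNP with every promoter SNP in the TAD.
        pairs.extend((e, p) for e, p in product(enh, pro) if e != p)
    return pairs


# Toy data with hypothetical SNP identifiers and coordinates.
snps = {"rsA": ("chr1", 150), "rsB": ("chr1", 900), "rsC": ("chr1", 5000)}
enhancers = [Interval("chr1", 100, 200)]
promoters = [Interval("chr1", 850, 950), Interval("chr1", 4900, 5100)]
tads = [Interval("chr1", 0, 1000), Interval("chr1", 1000, 6000)]

pairs = candidate_pairs(snps, enhancers, promoters, tads)
print(pairs)
```

Restricting the search to such pairs keeps the number of tests far below an exhaustive all-by-all scan, which is the point of a focused collection of SNP pairs.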



Acknowledgements

The authors thank B. Cole, M. Hall, and D. Cousminer for useful conversations, and Kenyaita Hodge and Michelle Leonard for the HEPG2 capture C library preparation. We also thank the reviewers for their constructive feedback. Funding for this work was provided by National Institutes of Health Grants LM010098, DK112217, ES013508, R21 HD089824, and the Children’s Hospital of Philadelphia Center for Spatial and Functional Genomics and Daniel B. Burke Endowed Chair for Diabetes Research.

Supplementary material

439_2018_1893_MOESM1_ESM.xlsx (57 kb)
Online Resource 1, sheet1. Significant UNPHASED results with a BH adjusted combined p value < 0.1 in either the full or the interaction model are reported for all tissues. In each row, SNP1 is harbored in a predicted enhancer and SNP2 in a predicted promoter (some SNPs are in multiple promoters). These are also reported together with the list of FIMO predicted motifs with hits to the enhancer. We report the main effect adjusted combined p value for each SNP, the UNPHASED p values for the SNP pair in each study, and the combined p values (unadjusted and BH adjusted). In addition, the r2 and absolute distance (in bp) between the two SNPs are reported.
Online Resource 1, sheet2. GWAMA results for the significant SNP pairs with no main effect SNPs, reporting meta-analysis odds ratio results (including 95% confidence interval bounds) for each observed haplotype with respect to the indicated reference haplotype. The Cochran’s heterogeneity statistic (q) and Higgins heterogeneity index (i2) are also indicated, together with the directions of effect in the 3 studies. The ‘undirOR’ in the last 3 columns is defined as OR when OR ≥ 1 and 1/OR when OR < 1. This is provided for quick OR comparisons regardless of direction. SNP pairs where GWAMA failed and outliers are not shown (XLSX 56 KB)
439_2018_1893_MOESM2_ESM.pdf (121 kb)
HEPG2 capture C evidence for (rs12900028, rs8037641). The loop track indicates the ARID3B promoter bait region containing rs8037641 and the captured region containing rs12900028. The SNP track shows the location of the 2 SNPs, and the location of ARID3B is indicated. Finally, the tracks with the capture C reads corresponding to the ARID3B bait in 3 replicate experiments are shown. In these read tracks we limited the y-axis range to 0 to 9 because, in capture C, peak height varies with distance from the bait, so including the full peaks near the bait would have rendered the other peaks visually indiscernible (PDF 120 KB)
439_2018_1893_MOESM3_ESM.xlsx (17 kb)
Frequencies of the alternate allele (A1) for the SNPs from Table 2 in the 3 GWAS studies. The absolute differences between these frequencies are also indicated (XLSX 17 KB)
439_2018_1893_MOESM4_ESM.xlsx (12 kb)
ENCODE/RoadMap and EnhancerAtlas data sets used to define putative active promoters and enhancers (XLSX 11 KB)
439_2018_1893_MOESM5_ESM.txt (106 kb)
List of proxies (r2 > 0.4) to known T2D sentinel SNPs (TXT 106 KB)
439_2018_1893_MOESM6_ESM.xlsx (11 kb)
Promoter DpnII bait fragments for ARID3B and HECW2 together with coordinates of subregions used for SureSelect oligonucleotides to bait them. All coordinates refer to hg19 (XLSX 10 KB)
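
The two summary quantities described for Online Resource 1 can be sketched in a few lines. The first function follows the stated definition of ‘undirOR’. The second is a generic Benjamini-Hochberg adjustment, included on the assumption that this is what "BH adjusted" refers to; it is not the authors' code, and both function names are hypothetical.

```python
from typing import List


def undirected_or(odds_ratio: float) -> float:
    """Return OR when OR >= 1 and 1/OR when OR < 1."""
    if odds_ratio <= 0:
        raise ValueError("an odds ratio must be positive")
    return odds_ratio if odds_ratio >= 1 else 1.0 / odds_ratio


def bh_adjust(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, in the input order."""
    m = len(pvalues)
    # Indices of the p values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy values, not taken from the supplementary tables.
adjusted = bh_adjust([0.005, 0.5, 0.02])
significant = [i for i, p in enumerate(adjusted) if p < 0.1]
print(adjusted, significant)
print(undirected_or(0.5), undirected_or(1.25))
```

With the toy input, the first and third p values remain below the 0.1 threshold after adjustment, mirroring the cutoff quoted for sheet1.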



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Department of Biostatistics, Epidemiology, and Informatics, University of Pennsylvania, Philadelphia, USA
  2. Division of Human Genetics, The Children’s Hospital of Philadelphia, Philadelphia, USA
  3. Center for Spatial and Functional Genomics, The Children’s Hospital of Philadelphia, Philadelphia, USA
  4. Department of Population and Quantitative Health Sciences, Case Western Reserve University, Cleveland, USA
  5. Department of Pathology and Laboratory Medicine, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, USA
  6. Department of Genetics, Perelman School of Medicine at the University of Pennsylvania, Philadelphia, USA
