Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

From Nature Genetics

Abstract

Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP–gene–disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency.

Fig. 1: Overview of the S2G framework.
Fig. 2: Accuracy of individual S2G strategies and the cS2G strategy.
Fig. 3: SNP–gene–disease triplets identified by cS2G and other S2G strategies.
Fig. 4: Examples of high-confidence SNP–gene–disease triplets identified by cS2G.
Fig. 5: Empirical assessment of disease omnigenicity using cS2G.

Data availability

The list of 19,995 genes, summary statistics of the 63 independent traits, training and validation critical gene sets, S2G and cS2G strategies, SNP annotations, predicted causal SNP–disease pairs from UK Biobank fine-mapping analyses and from the NHGRI-EBI GWAS Catalog and SNP heritability causally explained by SNPs linked to each gene have been made publicly available at https://alkesgroup.broadinstitute.org/cS2G and https://doi.org/10.5281/zenodo.6354007. Links for all data sets used to create S2G strategies are provided in Supplementary Table 26.

Access to the UK Biobank resource is available via application at http://www.ukbiobank.ac.uk/.

The GWAS Catalog is available at https://www.ebi.ac.uk/gwas/api/search/downloads/full.

Open Targets SNP–gene pairs are available at https://raw.githubusercontent.com/opentargets/genetics-gold-standards/master/gold_standards/processed/gwas_gold_standards.191108.tsv.

SNP–gene pairs from ref. 48 are available at https://www.dropbox.com/s/kz2c49rpm2yanf5/all_byCS_rev1.txt?dl=0.

Code availability

The code to estimate precision and recall of S2G strategies and the code to create combined S2G strategies have been made publicly available at https://alkesgroup.broadinstitute.org/cS2G/code and https://doi.org/10.5281/zenodo.6415925.

References

  1. Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).

  2. Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).

  3. Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).

  4. Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).

  5. Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).

  6. Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).

  7. Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).

  8. Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).

  9. Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).

  10. Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).

  11. Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).

  12. Claussnitzer, M. et al. FTO obesity variant circuitry and adipocyte browning in humans. N. Engl. J. Med. 373, 895–907 (2015).

  13. Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).

  14. Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).

  15. Gamazon, E. R. et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat. Genet. 50, 956–967 (2018).

  16. Porcu, E. et al. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat. Commun. 10, 3300 (2019).

  17. Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).

  18. The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).

  19. Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).

  20. Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).

  21. Lee, D. et al. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics 31, 1176–1182 (2015).

  22. Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).

  23. Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).

  24. Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).

  25. Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 1091–1098 (2015).

  26. Hormozdiari, F. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047 (2018).

  27. Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).

  28. Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).

  29. Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).

  30. Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).

  31. Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).

  32. Liu, Y., Sarkar, A., Kheradpour, P., Ernst, J. & Kellis, M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 18, 193 (2017).

  33. Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 (2018).

  34. Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).

  35. Jung, I. et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat. Genet. 51, 1442–1449 (2019).

  36. Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).

  37. Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).

  38. Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).

  39. Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017, bax028 (2017).

  40. Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).

  41. GEMO Study Collaborators et al. Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet. 52, 56–73 (2020).

  42. Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).

  43. Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).

  44. Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).

  45. Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell 173, 1573–1580 (2018).

  46. Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).

  47. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).

  48. Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Preprint at medRxiv https://doi.org/10.1101/2020.09.08.20190561 (2020).

  49. Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).

  50. Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).

  51. Musunuru, K. et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010).

  52. Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).

  53. Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).

  54. O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).

  55. Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics. Preprint at bioRxiv https://doi.org/10.1101/2021.03.19.436212 (2021).

  56. Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).

  57. Freund, M. K. et al. Phenotype-specific enrichment of Mendelian disorder genes near GWAS regions across 62 complex traits. Am. J. Hum. Genet. 103, 535–552 (2018).

  58. Povysil, G. et al. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nat. Rev. Genet. 20, 747–759 (2019).

  59. Van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).

  60. Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).

  61. Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res. 48, D682–D688 (2020).

  62. O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).

  63. Kapoor, A. et al. An enhancer polymorphism at the cardiomyocyte intercalated disc protein NOS1AP locus is a major regulator of the QT interval. Am. J. Hum. Genet. 94, 854–869 (2014).

  64. Bauer, D. E. et al. An erythroid enhancer of BCL11A subject to genetic variation determines fetal hemoglobin level. Science 342, 253–257 (2013).

  65. van den Boogaard, M. et al. A common genetic variant within SCN10A modulates cardiac SCN5A expression. J. Clin. Invest. 124, 1844–1852 (2014).

  66. Soldner, F. et al. Parkinson-associated risk variant in distal enhancer of α-synuclein modulates target gene expression. Nature 533, 95–99 (2016).

  67. Glubb, D. M. et al. Fine-scale mapping of the 5q11.2 breast cancer locus reveals at least three independent risk variants regulating MAP3K1. Am. J. Hum. Genet. 96, 5–20 (2015).

  68. Gupta, R. M. et al. A genetic variant associated with five vascular diseases is a distal regulator of endothelin-1 gene expression. Cell 170, 522–533 (2017).

  69. Wang, X. et al. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 5, e10557 (2016).

  70. Huang, Q. et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nat. Genet. 46, 126–135 (2014).

  71. The GAME-ON/ELLIPSE Consortium et al. CAUSEL: an epigenome- and genome-editing pipeline for establishing function of noncoding GWAS variants. Nat. Med. 21, 1357–1363 (2015).

  72. Stadhouders, R. et al. HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. J. Clin. Invest. 124, 1699–1710 (2014).

  73. Gallagher, M. D. et al. A dementia-associated risk variant near TMEM106B alters chromatin architecture and gene expression. Am. J. Hum. Genet. 101, 643–663 (2017).

  74. Guthridge, J. M. et al. Two functional lupus-associated BLK promoter variants control cell-type- and developmental-stage-specific transcription. Am. J. Hum. Genet. 94, 586–598 (2014).

  75. Vicente, C. T. et al. Long-range modulation of PAG1 expression by 8q21 allergy risk variants. Am. J. Hum. Genet. 97, 329–336 (2015).

  76. Fogarty, M. P., Cannon, M. E., Vadlamudi, S., Gaulton, K. J. & Mohlke, K. L. Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 10, e1004633 (2014).

  77. Simeonov, D. R. et al. Discovery of stimulation-responsive immune enhancers with CRISPR activation. Nature 549, 111–115 (2017).

  78. Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).

  79. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  80. Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).

  81. Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).

  82. de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

Acknowledgements

We thank X. Jiang, C. Boix and M. Kellis for helpful discussion. S.G. is funded by National Institutes of Health grant R00 HG010160. A.L.P. is funded by National Institutes of Health grants U01 HG009379, R01 MH101244, R37 MH107649, R01 MH115676, R01 MH109978, U01 HG012009 and R01 HG006399. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. This research was conducted using the UK Biobank Resource under application 16549.

Author information

Contributions

S.G. and A.L.P. designed experiments. S.G. performed experiments. S.G., O.W., F.H., K.D., J.N., and K.J. analyzed data. D.W., H.S., C.P.F., L.OC., B.P. and J.M.E. provided suggestions on the analyses. S.G. and A.L.P., with assistance from all authors, wrote the manuscript.

Corresponding authors

Correspondence to Steven Gazal or Alkes L. Price.

Ethics declarations

Competing interests

C.P.F. is now an employee of Bristol Myers Squibb. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Guillaume Lettre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 S2G strategy linking each SNP to best gene leads to higher precision than linking SNPs to multiple target genes.

We report the precision of S2G strategies linking SNPs to target genes using three different approaches for converting raw linking values into linking scores: assigning the same linking score to each gene with a non-zero raw linking value (unweighted), assigning to each gene a linking score proportional to its raw linking value (weighted), and retaining only the gene(s) with the highest linking score (best gene). Values were estimated using the non-trait-specific training critical gene set and meta-analyzed across 63 independent traits. Error bars represent 95% confidence intervals around meta-analyzed values. For most S2G strategies, precision was very similar across the three approaches (except for EpiMap, ABC and Open Targets), but it was generally highest for the ‘best gene’ approach. However, we note that this choice does not reflect biological reality, in which a regulatory element may target more than one gene, and that refinements to this choice are a direction for future research.
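
The three conversions described in this legend can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `linking_scores`, the gene labels, and the choice to normalize each SNP's scores to sum to 1 are assumptions made for illustration, and raw linking values are assumed to be non-negative.

```python
from typing import Dict


def linking_scores(raw: Dict[str, float], mode: str) -> Dict[str, float]:
    """Convert one SNP's raw linking values into per-gene linking scores.

    Assumption (not stated in the legend): scores are normalized so that
    they sum to 1 across the genes linked to the SNP.
    """
    # Keep only genes with a non-zero (here: positive) raw linking value.
    linked = {gene: value for gene, value in raw.items() if value > 0}
    if not linked:
        return {}

    if mode == "unweighted":
        # Every linked gene receives the same score.
        return {gene: 1.0 / len(linked) for gene in linked}

    if mode == "weighted":
        # Each score is proportional to the raw linking value.
        total = sum(linked.values())
        return {gene: value / total for gene, value in linked.items()}

    if mode == "best_gene":
        # Retain only the gene(s) with the highest raw value; ties share the score.
        top = max(linked.values())
        best = [gene for gene, value in linked.items() if value == top]
        return {gene: 1.0 / len(best) for gene in best}

    raise ValueError(f"unknown mode: {mode!r}")


if __name__ == "__main__":
    raw_values = {"GENE_A": 0.6, "GENE_B": 0.2, "GENE_C": 0.0}  # hypothetical values
    for mode in ("unweighted", "weighted", "best_gene"):
        print(mode, linking_scores(raw_values, mode))
```

In this toy example, the ‘best gene’ conversion keeps only GENE_A, whereas the other two conversions split the score between GENE_A and GENE_B.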

Extended Data Fig. 2 Precision of 27 S2G strategies based on physical distance to TSS.

We report the precision of the closest TSS strategy as a function of the distance between a SNP and its closest TSS (a) (numbers in parentheses represent the fraction of common SNPs linked by the strategy), and the precision of the ith closest TSS strategy (each strategy links 100% of the SNPs) (b). Values were estimated using trait-specific validation critical gene sets and meta-analyzed across 63 independent traits. Error bars represent 95% confidence intervals around meta-analyzed values. The mean precision of 0.043 for the 6th–20th closest TSS suggests that genes located relatively close to causal disease genes have a slightly elevated probability of being causal. Numerical results, including values of recall and corresponding standard errors, are reported in Supplementary Table 5.
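
As a rough illustration of what an ‘ith closest TSS’ strategy computes, the sketch below ranks genes by the distance between a SNP and each gene's TSS and returns the gene at rank i. The function name `ith_closest_tss`, the coordinates, and the alphabetical tie-breaking rule are hypothetical, and all positions are assumed to lie on the same chromosome.

```python
from typing import Dict


def ith_closest_tss(snp_pos: int, tss_by_gene: Dict[str, int], i: int) -> str:
    """Return the gene whose TSS is the i-th closest to the SNP (i starts at 1).

    Assumptions: all coordinates are on one chromosome, and ties in distance
    are broken alphabetically by gene name.
    """
    ranked = sorted(
        tss_by_gene,
        key=lambda gene: (abs(tss_by_gene[gene] - snp_pos), gene),
    )
    if not 1 <= i <= len(ranked):
        raise ValueError(f"i must be between 1 and {len(ranked)}")
    return ranked[i - 1]


if __name__ == "__main__":
    tss = {"GENE_A": 1_000, "GENE_B": 5_000, "GENE_C": 12_000}  # hypothetical TSS positions
    print(ith_closest_tss(4_000, tss, 1))  # closest TSS
    print(ith_closest_tss(4_000, tss, 2))  # second-closest TSS
```

Because every SNP has an ith closest TSS as long as enough genes are present, a strategy of this kind links every SNP, consistent with the 100% noted in the legend.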

Extended Data Fig. 3 Precision of functional S2G strategies using all available cell types and tissues or restricted to blood and immune cell types and tissues.

We report the precisions of functional S2G strategies built using either all available cell types and tissues (All CT; in light color) and/or blood and immune cell types and tissues (Blood CT; in dark color) meta-analyzed across 63 independent traits (All traits; in blue) and 11 blood cell traits and autoimmune diseases (Blood traits; in red) (UK Biobank all autoimmune diseases, Crohn’s Disease, Rheumatoid Arthritis, Ulcerative Colitis, Lupus, Celiac, Platelet Count, Red Blood Cell Count, Red Blood Cell Distribution Width, Eosinophil Count, White Blood Cell Count; see Supplementary Table 3). Error bars represent 95% confidence intervals around meta-analyzed values. We considered 5 S2G strategies with data available for cell types and tissues: GTEx cis-eQTLs (GTEx), GTEx fine-mapped cis-eQTL (GTEx fine-mapped), Roadmap enhancer-gene linking (Roadmap), EpiMap enhancer-gene linking (EpiMap), and Activity-By-Contact (ABC). We considered 3 S2G strategies with data available only for blood and immune cell types and tissues: eQTLGen fine-mapped blood cis-eQTL (eQTLGen fine-mapped), PCHi-C (blood), and Cicero blood/basal (Cicero). We observed 1) that S2G strategies using data from all cell types and tissues were more precise than S2G strategies restricted to blood and immune cell types and tissues in both analyses of all traits (light blue vs. dark blue) and blood cell traits and autoimmune diseases (light red vs. dark red), and 2) that S2G strategies using data from blood and immune cell types and tissues are more precise in all traits than in blood cell traits and autoimmune diseases (dark blue vs. dark red).

Extended Data Fig. 4 Proportion of common and low-frequency variant heritability linked to genes explained by each individual gene.

We report the proportion of common and low-frequency variant heritability linked to genes (\(h_{gene,common}^2\) and \(h_{gene,low\text{-}freq}^2\), respectively) explained by each individual gene in 16 independent UK Biobank traits. Genes in the top 200 genes (top 1% of all genes) contributing to both \(h_{gene,common}^2\) and \(h_{gene,low\text{-}freq}^2\) are denoted in red (median of 26 genes across the 16 traits), genes in the top 200 genes contributing to only \(h_{gene,common}^2\) (resp. \(h_{gene,low\text{-}freq}^2\)) are colored in black (resp. blue) (median of 174 genes each), and remaining genes are colored in gray (median of 19,621 genes, with values close to 0 on both axes). We observe low concordance between per-gene contributions to gene architectures for common vs. low-frequency SNPs.
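
The color coding in this legend amounts to comparing two top-gene lists. A minimal sketch of that comparison follows; the function names, labels and toy per-gene values are hypothetical, and `n_top` is set to 2 in the example only to keep it small (the legend uses the top 200 genes).

```python
from typing import Dict, Set


def top_genes(h2_by_gene: Dict[str, float], n_top: int) -> Set[str]:
    """Return the n_top genes with the largest per-gene heritability.

    Ties are broken alphabetically by gene name (an arbitrary choice).
    """
    ranked = sorted(h2_by_gene, key=lambda gene: (-h2_by_gene[gene], gene))
    return set(ranked[:n_top])


def classify_genes(
    h2_common: Dict[str, float],
    h2_low_freq: Dict[str, float],
    n_top: int = 200,
) -> Dict[str, str]:
    """Label each gene by membership in the common and low-frequency top lists."""
    top_common = top_genes(h2_common, n_top)
    top_low_freq = top_genes(h2_low_freq, n_top)
    labels = {}
    for gene in h2_common.keys() | h2_low_freq.keys():
        if gene in top_common and gene in top_low_freq:
            labels[gene] = "both"            # red in the figure
        elif gene in top_common:
            labels[gene] = "common_only"     # black
        elif gene in top_low_freq:
            labels[gene] = "low_freq_only"   # blue
        else:
            labels[gene] = "other"           # gray
    return labels


if __name__ == "__main__":
    common = {"G1": 0.4, "G2": 0.3, "G3": 0.2, "G4": 0.1}    # hypothetical values
    low_freq = {"G1": 0.5, "G2": 0.1, "G3": 0.3, "G4": 0.1}  # hypothetical values
    print(classify_genes(common, low_freq, n_top=2))
```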

Extended Data Fig. 5 Excess overlap between top genes contributing to common and low-frequency variant heritability linked to genes and disease-specific Mendelian disorder genes.

We report the excess overlap between phenotype-specific Mendelian disorder genes (ref. 57) and the top 200 genes contributing to common and low-frequency variant heritability linked to genes (left), and the gene enrichment of disease-specific Mendelian disorder genes (that is, [SNP heritability linked to Mendelian disorder genes / SNP heritability linked to all genes] / [number of Mendelian disorder genes / total number of genes]) across common and low-frequency variants (right). Each dot represents a disease/trait–Mendelian disorder gene set pair, and is colored by the Mendelian disorder gene set. These two results suggest that both the set of top 200 genes and the per-gene heritability estimates are unlikely to be driven by noisy estimates arising from finite sample size. We restricted analyses to 21 traits analyzed in ref. 57.
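
The gene enrichment ratio defined in this legend can be written out directly. The sketch below is illustrative only: the function name `gene_enrichment` and the toy per-gene heritability values are hypothetical, and genes in the gene set that are absent from the heritability table are simply ignored, which is an assumption.

```python
from typing import Dict, Set


def gene_enrichment(h2_by_gene: Dict[str, float], gene_set: Set[str]) -> float:
    """Compute (share of gene-linked heritability in the set) / (share of genes in the set)."""
    total_h2 = sum(h2_by_gene.values())
    # Only genes present in the heritability table are counted (assumption).
    set_h2 = sum(h2 for gene, h2 in h2_by_gene.items() if gene in gene_set)
    n_in_set = sum(1 for gene in h2_by_gene if gene in gene_set)
    if total_h2 <= 0 or n_in_set == 0:
        raise ValueError("need positive total heritability and a non-empty gene set")
    h2_fraction = set_h2 / total_h2
    gene_fraction = n_in_set / len(h2_by_gene)
    return h2_fraction / gene_fraction


if __name__ == "__main__":
    h2 = {"G1": 0.5, "G2": 0.3, "G3": 0.1, "G4": 0.1}  # hypothetical per-gene heritability
    mendelian = {"G1"}                                  # hypothetical gene set
    print(gene_enrichment(h2, mendelian))
```

In this toy example, one gene out of four carries half of the gene-linked heritability, giving an enrichment of about 2; a value of 1 would indicate no enrichment.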

Extended Data Fig. 6 Excess overlap between top genes contributing to common and low-frequency variant heritability linked to genes and differentially expressed gene sets.

We report the excess overlap between 205 differentially expressed gene sets (ref. 56) and the top 200 genes contributing to common and low-frequency variant heritability linked to genes across 16 independent UK Biobank traits. Each dot represents a differentially expressed gene set, and is colored by the tissue category. We generally observed excess overlap for disease-critical tissues/cell types. We observed high correlations between excess overlaps for common vs. low-frequency variant architectures, suggesting that common and low-frequency variant architectures are driven by different genes pertaining to similar biological processes.

Supplementary information

Supplementary Information

Supplementary Figures 1–11, Supplementary table legends and Supplementary Notes

Reporting Summary

Peer Review File

Supplementary Table 1

This file contains Supplementary Tables 1–26

Cite this article

Gazal, S., Weissbrod, O., Hormozdiari, F. et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat Genet 54, 827–836 (2022). https://doi.org/10.1038/s41588-022-01087-y

  • DOI: https://doi.org/10.1038/s41588-022-01087-y

  • Springer Nature America, Inc.
