Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

Gazal, Steven; Weissbrod, Omer; Hormozdiari, Farhad; Dey, Kushal K.; Nasser, Joseph; Jagadeesh, Karthik A.; Weiner, Daniel J.; Shi, Huwenbo; Fulco, Charles P.; O’Connor, Luke J.; Pasaniuc, Bogdan; Engreitz, Jesse M.; Price, Alkes L.

doi:10.1038/s41588-022-01087-y

Article
Published: 06 June 2022

Volume 54, pages 827–836 (2022)

Abstract

Disease-associated single-nucleotide polymorphisms (SNPs) generally do not implicate target genes, as most disease SNPs are regulatory. Many SNP-to-gene (S2G) linking strategies have been developed to link regulatory SNPs to the genes that they regulate in cis. Here, we developed a heritability-based framework for evaluating and combining different S2G strategies to optimize their informativeness for common disease risk. Our optimal combined S2G strategy (cS2G) included seven constituent S2G strategies and achieved a precision of 0.75 and a recall of 0.33, more than doubling the recall of any individual strategy. We applied cS2G to fine-mapping results for 49 UK Biobank diseases/traits to predict 5,095 causal SNP–gene–disease triplets (with S2G-derived functional interpretation) with high confidence. We further applied cS2G to provide an empirical assessment of disease omnigenicity; we determined that the top 1% of genes explained roughly half of the SNP heritability linked to all genes and that gene-level architectures vary with variant allele frequency.

**Fig. 1: Overview of the S2G framework.**

**Fig. 2: Accuracy of individual S2G strategies and the cS2G strategy.**

**Fig. 3: SNP–gene–disease triplets identified by cS2G and other S2G strategies.**

**Fig. 4: Examples of high-confidence SNP–gene–disease triplets identified by cS2G.**

**Fig. 5: Empirical assessment of disease omnigenicity using cS2G.**

Data availability

The list of 19,995 genes, summary statistics of the 63 independent traits, training and validation critical gene sets, S2G and cS2G strategies, SNP annotations, predicted causal SNP–disease pairs from UK Biobank fine-mapping analyses and from the NHGRI-EBI GWAS Catalog and SNP heritability causally explained by SNPs linked to each gene have been made publicly available at https://alkesgroup.broadinstitute.org/cS2G and https://doi.org/10.5281/zenodo.6354007. Links for all data sets used to create S2G strategies are provided in Supplementary Table 26.

Access to the UK Biobank resource is available via application at http://www.ukbiobank.ac.uk/.

The GWAS Catalog is available at https://www.ebi.ac.uk/gwas/api/search/downloads/full.

Open Targets SNP–gene pairs are available at https://raw.githubusercontent.com/opentargets/genetics-gold-standards/master/gold_standards/processed/gwas_gold_standards.191108.tsv.

SNP–gene pairs from ref. ⁴⁸ are available at https://www.dropbox.com/s/kz2c49rpm2yanf5/all_byCS_rev1.txt?dl=0.

Code availability

The code to estimate precision and recall of S2G strategies and the code to create combined S2G strategies have been made publicly available at https://alkesgroup.broadinstitute.org/cS2G/code and https://doi.org/10.5281/zenodo.6415925.

References

Visscher, P. M. et al. 10 years of GWAS discovery: biology, function, and translation. Am. J. Hum. Genet. 101, 5–22 (2017).
Buniello, A. et al. The NHGRI-EBI GWAS Catalog of published genome-wide association studies, targeted arrays and summary statistics 2019. Nucleic Acids Res. 47, D1005–D1012 (2019).
Claussnitzer, M. et al. A brief history of human disease genetics. Nature 577, 179–189 (2020).
Benner, C. et al. FINEMAP: efficient variable selection using summary data from genome-wide association studies. Bioinformatics 32, 1493–1501 (2016).
Schaid, D. J., Chen, W. & Larson, N. B. From genome-wide associations to candidate causal variants by statistical fine-mapping. Nat. Rev. Genet. 19, 491–504 (2018).
Wang, G., Sarkar, A., Carbonetto, P. & Stephens, M. A simple new approach to variable selection in regression, with application to genetic fine mapping. J. R. Stat. Soc. Series B Stat. Methodol. 82, 1273–1300 (2020).
Weissbrod, O. et al. Functionally informed fine-mapping and polygenic localization of complex trait heritability. Nat. Genet. 52, 1355–1363 (2020).
Hindorff, L. A. et al. Potential etiologic and functional implications of genome-wide association loci for human diseases and traits. Proc. Natl Acad. Sci. USA 106, 9362–9367 (2009).
Maurano, M. T. et al. Systematic localization of common disease-associated variation in regulatory DNA. Science 337, 1190–1195 (2012).
Trynka, G. et al. Chromatin marks identify critical cell types for fine mapping complex trait variants. Nat. Genet. 45, 124–130 (2013).
Finucane, H. K. et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat. Genet. 47, 1228–1235 (2015).
Claussnitzer, M. et al. FTO obesity variant circuitry and adipocyte browning in humans. N. Engl. J. Med. 373, 895–907 (2015).
Farh, K. K.-H. et al. Genetic and epigenetic fine mapping of causal autoimmune disease variants. Nature 518, 337–343 (2015).
Gusev, A. et al. Integrative approaches for large-scale transcriptome-wide association studies. Nat. Genet. 48, 245–252 (2016).
Gamazon, E. R. et al. Using an atlas of gene regulation across 44 human tissues to inform complex disease- and trait-associated variation. Nat. Genet. 50, 956–967 (2018).
Porcu, E. et al. Mendelian randomization integrating GWAS and eQTL data reveals genetic determinants of complex and clinical traits. Nat. Commun. 10, 3300 (2019).
Wainberg, M. et al. Opportunities and challenges for transcriptome-wide association studies. Nat. Genet. 51, 592–599 (2019).
The GTEx Consortium. The GTEx Consortium atlas of genetic regulatory effects across human tissues. Science 369, 1318–1330 (2020).
Võsa, U. et al. Large-scale cis- and trans-eQTL analyses identify thousands of genetic loci and polygenic scores that regulate blood gene expression. Nat. Genet. 53, 1300–1310 (2021).
Giambartolomei, C. et al. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 10, e1004383 (2014).
Lee, D. et al. JEPEG: a summary statistics based tool for gene-level joint testing of functional variants. Bioinformatics 31, 1176–1182 (2015).
Hormozdiari, F. et al. Colocalization of GWAS and eQTL signals detects target genes. Am. J. Hum. Genet. 99, 1245–1260 (2016).
Chun, S. et al. Limited statistical evidence for shared genetic effects of eQTLs and autoimmune-disease-associated loci in three major immune-cell types. Nat. Genet. 49, 600–605 (2017).
Liu, B., Gloudemans, M. J., Rao, A. S., Ingelsson, E. & Montgomery, S. B. Abundant associations with gene expression complicate GWAS follow-up. Nat. Genet. 51, 768–769 (2019).
Gamazon, E. R. et al. A gene-based association method for mapping traits using reference transcriptome data. Nat. Genet. 47, 10 (2015).
Hormozdiari, F. et al. Leveraging molecular quantitative trait loci to understand the genetic architecture of diseases and complex traits. Nat. Genet. 50, 1041–1047 (2018).
Yao, D. W., O’Connor, L. J., Price, A. L. & Gusev, A. Quantifying genetic effects on disease mediated by assayed gene expression levels. Nat. Genet. 52, 626–633 (2020).
Umans, B. D., Battle, A. & Gilad, Y. Where are the disease-associated eQTLs? Trends Genet. 37, 109–124 (2021).
Ernst, J. et al. Mapping and analysis of chromatin state dynamics in nine human cell types. Nature 473, 43–49 (2011).
Kundaje, A. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Javierre, B. M. et al. Lineage-specific genome architecture links enhancers and non-coding disease variants to target gene promoters. Cell 167, 1369–1384 (2016).
Liu, Y., Sarkar, A., Kheradpour, P., Ernst, J. & Kellis, M. Evidence of reduced recombination rate in human regulatory domains. Genome Biol. 18, 193 (2017).
Pliner, H. A. et al. Cicero predicts cis-regulatory DNA interactions from single-cell chromatin accessibility data. Mol. Cell 71, 858–871 (2018).
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Jung, I. et al. A compendium of promoter-centered long-range chromatin interactions in the human genome. Nat. Genet. 51, 1442–1449 (2019).
Satpathy, A. T. et al. Massively parallel single-cell chromatin landscapes of human immune cell development and intratumoral T cell exhaustion. Nat. Biotechnol. 37, 925–936 (2019).
Boix, C. A., James, B. T., Park, Y. P., Meuleman, W. & Kellis, M. Regulatory genomic circuitry of human disease loci by integrative epigenomics. Nature 590, 300–307 (2021).
Nasser, J. et al. Genome-wide enhancer maps link risk variants to disease genes. Nature 593, 238–243 (2021).
Fishilevich, S. et al. GeneHancer: genome-wide integration of enhancers and target genes in GeneCards. Database (Oxford) 2017, bax028 (2017).
Michailidou, K. et al. Association analysis identifies 65 new breast cancer risk loci. Nature 551, 92–94 (2017).
GEMO Study Collaborators et al. Fine-mapping of 150 breast cancer risk regions identifies 191 likely target genes. Nat. Genet. 52, 56–73 (2020).
Mountjoy, E. et al. An open approach to systematically prioritize causal variants and genes at all published human GWAS trait-associated loci. Nat. Genet. 53, 1527–1533 (2021).
Bycroft, C. et al. The UK Biobank resource with deep phenotyping and genomic data. Nature 562, 203–209 (2018).
Boyle, E. A., Li, Y. I. & Pritchard, J. K. An expanded view of complex traits: from polygenic to omnigenic. Cell 169, 1177–1186 (2017).
Wray, N. R., Wijmenga, C., Sullivan, P. F., Yang, J. & Visscher, P. M. Common disease is more complex than implied by the core gene omnigenic model. Cell 173, 1573–1580 (2018).
Liu, X., Li, Y. I. & Pritchard, J. K. Trans effects on gene expression can drive omnigenic inheritance. Cell 177, 1022–1034.e6 (2019).
The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Weeks, E. M. et al. Leveraging polygenic enrichments of gene features to predict genes underlying complex traits and diseases. Preprint at medRxiv https://doi.org/10.1101/2020.09.08.20190561 (2020).
Wang, X. & Goldstein, D. B. Enhancer domains predict gene pathogenicity and inform gene discovery in complex disease. Am. J. Hum. Genet. 106, 215–233 (2020).
Gallagher, M. D. & Chen-Plotkin, A. S. The post-GWAS era: from association to function. Am. J. Hum. Genet. 102, 717–730 (2018).
Musunuru, K. et al. From noncoding variant to phenotype via SORT1 at the 1p13 cholesterol locus. Nature 466, 714–719 (2010).
Gazal, S. et al. Linkage disequilibrium-dependent architecture of human complex traits shows action of negative selection. Nat. Genet. 49, 1421–1427 (2017).
Gazal, S., Marquez-Luna, C., Finucane, H. K. & Price, A. L. Reconciling S-LDSC and LDAK functional enrichment estimates. Nat. Genet. 51, 1202–1204 (2019).
O’Connor, L. J. et al. Extreme polygenicity of complex traits is explained by negative selection. Am. J. Hum. Genet. 105, 456–476 (2019).
Jagadeesh, K. A. et al. Identifying disease-critical cell types and cellular processes across the human body by integration of single-cell profiles and human genetics. Preprint at bioRxiv https://doi.org/10.1101/2021.03.19.436212 (2021).
Finucane, H. K. et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat. Genet. 50, 621–629 (2018).
Freund, M. K. et al. Phenotype-specific enrichment of Mendelian disorder genes near GWAS regions across 62 complex traits. Am. J. Hum. Genet. 103, 535–552 (2018).
Povysil, G. et al. Rare-variant collapsing analyses for complex traits: guidelines and applications. Nat. Rev. Genet. 20, 747–759 (2019).
Van Hout, C. V. et al. Exome sequencing and characterization of 49,960 individuals in the UK Biobank. Nature 586, 749–756 (2020).
Frankish, A. et al. GENCODE reference annotation for the human and mouse genomes. Nucleic Acids Res. 47, D766–D773 (2019).
Yates, A. D. et al. Ensembl 2020. Nucleic Acids Res. 48, D682–D688 (2020).
O’Leary, N. A. et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res. 44, D733–D745 (2016).
Kapoor, A. et al. An enhancer polymorphism at the cardiomyocyte intercalated disc protein NOS1AP locus is a major regulator of the QT interval. Am. J. Hum. Genet. 94, 854–869 (2014).
Bauer, D. E. et al. An erythroid enhancer of BCL11A subject to genetic variation determines fetal hemoglobin level. Science 342, 253–257 (2013).
van den Boogaard, M. et al. A common genetic variant within SCN10A modulates cardiac SCN5A expression. J. Clin. Invest. 124, 1844–1852 (2014).
Soldner, F. et al. Parkinson-associated risk variant in distal enhancer of α-synuclein modulates target gene expression. Nature 533, 95–99 (2016).
Glubb, D. M. et al. Fine-scale mapping of the 5q11.2 breast cancer locus reveals at least three independent risk variants regulating MAP3K1. Am. J. Hum. Genet. 96, 5–20 (2015).
Gupta, R. M. et al. A genetic variant associated with five vascular diseases is a distal regulator of endothelin-1 gene expression. Cell 170, 522–533 (2017).
Wang, X. et al. Discovery and validation of sub-threshold genome-wide association study loci using epigenomic signatures. eLife 5, e10557 (2016).
Huang, Q. et al. A prostate cancer susceptibility allele at 6q22 increases RFX6 expression by modulating HOXB13 chromatin binding. Nat. Genet. 46, 126–135 (2014).
The GAME-ON/ELLIPSE Consortium et al. CAUSEL: an epigenome- and genome-editing pipeline for establishing function of noncoding GWAS variants. Nat. Med. 21, 1357–1363 (2015).
Stadhouders, R. et al. HBS1L-MYB intergenic variants modulate fetal hemoglobin via long-range MYB enhancers. J. Clin. Invest. 124, 1699–1710 (2014).
Gallagher, M. D. et al. A dementia-associated risk variant near TMEM106B alters chromatin architecture and gene expression. Am. J. Hum. Genet. 101, 643–663 (2017).
Guthridge, J. M. et al. Two functional lupus-associated BLK promoter variants control cell-type- and developmental-stage-specific transcription. Am. J. Hum. Genet. 94, 586–598 (2014).
Vicente, C. T. et al. Long-range modulation of PAG1 expression by 8q21 allergy risk variants. Am. J. Hum. Genet. 97, 329–336 (2015).
Fogarty, M. P., Cannon, M. E., Vadlamudi, S., Gaulton, K. J. & Mohlke, K. L. Identification of a regulatory variant that binds FOXA1 and FOXA2 at the CDC123/CAMK1D type 2 diabetes GWAS locus. PLoS Genet. 10, e1004633 (2014).
Simeonov, D. R. et al. Discovery of stimulation-responsive immune enhancers with CRISPR activation. Nature 549, 111–115 (2017).
Ulirsch, J. C. et al. Systematic functional dissection of common genetic variation affecting red blood cell traits. Cell 165, 1530–1545 (2016).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Siepel, A. et al. Evolutionarily conserved elements in vertebrate, insect, worm, and yeast genomes. Genome Res. 15, 1034–1050 (2005).
Lindblad-Toh, K. et al. A high-resolution map of human evolutionary constraint using 29 mammals. Nature 478, 476–482 (2011).
de Leeuw, C. A., Mooij, J. M., Heskes, T. & Posthuma, D. MAGMA: generalized gene-set analysis of GWAS data. PLoS Comput. Biol. 11, e1004219 (2015).

Acknowledgements

We thank X. Jiang, C. Boix and M. Kellis for helpful discussion. S.G. is funded by National Institutes of Health grant R00 HG010160. A.L.P. is funded by National Institutes of Health grants U01 HG009379, R01 MH101244, R37 MH107649, R01 MH115676, R01 MH109978, U01 HG012009 and R01 HG006399. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. This research was conducted using the UK Biobank Resource under application 16549.

Author information

Charles P. Fulco
Present address: Bristol Myers Squibb, Cambridge, MA, USA

Authors and Affiliations

Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Steven Gazal
Center for Genetic Epidemiology, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA
Steven Gazal
Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal K. Dey, Karthik A. Jagadeesh, Huwenbo Shi & Alkes L. Price
Broad Institute of MIT and Harvard, Cambridge, MA, USA
Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal K. Dey, Joseph Nasser, Karthik A. Jagadeesh, Daniel J. Weiner, Huwenbo Shi, Charles P. Fulco, Luke J. O’Connor, Jesse M. Engreitz & Alkes L. Price
Department of Systems Biology, Harvard Medical School, Boston, MA, USA
Charles P. Fulco
Departments of Computational Medicine, Human Genetics, Pathology and Laboratory Medicine, David Geffen School of Medicine, University of California, Los Angeles, Los Angeles, CA, USA
Bogdan Pasaniuc
Department of Genetics, Stanford University School of Medicine, Stanford, CA, USA
Jesse M. Engreitz
BASE Initiative, Betty Irene Moore Children’s Heart Center, Lucile Packard Children’s Hospital, Stanford University School of Medicine, Stanford, CA, USA
Jesse M. Engreitz
Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
Alkes L. Price

Contributions

S.G. and A.L.P. designed experiments. S.G. performed experiments. S.G., O.W., F.H., K.D., J.N., and K.J. analyzed data. D.W., H.S., C.P.F., L.OC., B.P. and J.M.E. provided suggestions on the analyses. S.G. and A.L.P., with assistance from all authors, wrote the manuscript.

Corresponding authors

Correspondence to Steven Gazal or Alkes L. Price.

Ethics declarations

Competing interests

C.P.F. is now an employee of Bristol Myers Squibb. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks Guillaume Lettre and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 S2G strategy linking each SNP to best gene leads to higher precision than linking SNPs to multiple target genes.

We report the precision of S2G strategies linking SNPs to target genes using three different approaches for converting raw linking values into linking scores: by assigning to each gene with non-zero raw linking value the same linking score (unweighted), by assigning to each gene a linking score proportional to its raw linking value (weighted), and by retaining only the gene(s) with the highest linking score (best gene). Values were estimated using non-trait-specific training critical gene set and meta-analyzed across 63 independent traits. Error bars represent 95% confidence intervals around meta-analyzed values. For most of the S2G strategies the precision was very similar (except for EpiMap, ABC and Open Targets), but the precision was generally highest for the ‘best gene’ strategy. However, we note that this choice does not reflect biological reality, in which a regulatory element may target more than one gene, and that refinements to this choice are a direction for future research.
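The three conversions described in this caption can be illustrated with a minimal Python sketch. The function name `linking_scores`, the gene labels and the toy raw values are hypothetical, and normalizing the scores of each SNP to sum to 1 is an assumption of this sketch rather than a detail stated in the caption.

```python
from typing import Dict


def linking_scores(raw: Dict[str, float], mode: str) -> Dict[str, float]:
    """Convert the raw linking values of one SNP into linking scores.

    Assumptions of this sketch: raw values are non-negative, and the
    scores of each SNP are normalized to sum to 1.
    """
    # Keep only genes with a non-zero raw linking value.
    linked = {gene: value for gene, value in raw.items() if value > 0}
    if not linked:
        return {}
    if mode == "unweighted":
        # Every linked gene receives the same score.
        return {gene: 1.0 / len(linked) for gene in linked}
    if mode == "weighted":
        # Scores are proportional to the raw linking values.
        total = sum(linked.values())
        return {gene: value / total for gene, value in linked.items()}
    if mode == "best_gene":
        # Retain only the gene(s) with the highest raw value (ties are kept).
        top = max(linked.values())
        best = [gene for gene, value in linked.items() if value == top]
        return {gene: 1.0 / len(best) for gene in best}
    raise ValueError(f"unknown mode: {mode}")


# Hypothetical raw linking values for a single SNP.
raw_values = {"GENE_A": 3.0, "GENE_B": 1.0, "GENE_C": 0.0}
for mode in ("unweighted", "weighted", "best_gene"):
    print(mode, linking_scores(raw_values, mode))
```

In this toy case the 'best gene' mode keeps only GENE_A, whereas the other two modes spread the link over GENE_A and GENE_B.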

Extended Data Fig. 2 Precision of 27 S2G strategies based on physical distance to TSS.

We report precision of the closest TSS strategy as a function of the distance between a SNP and its closest TSS (a) (numbers between parentheses represent the fraction of common SNPs linked by the strategy), and the precision of the \(i\)th closest TSS (each strategy links 100% of the SNPs) (b). Values were estimated using trait-specific validation critical gene sets and meta-analyzed across 63 independent traits. Error bars represent 95% confidence intervals around meta-analyzed values. The mean value of 0.043 for 6th-20th closest TSS suggests that genes located relatively close to causal disease genes have a slightly elevated probability of being causal. Numerical results including values of recall and corresponding standard errors are reported in Supplementary Table 5.
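As an illustration of a distance-based strategy of this kind, the following Python sketch ranks genes by the distance between a SNP and their TSS. It assumes a single chromosome and one TSS per gene; the function name `ith_closest_tss` and the toy coordinates are hypothetical.

```python
from typing import List, Tuple


def ith_closest_tss(
    snp_pos: int, tss: List[Tuple[str, int]], i: int
) -> Tuple[str, int]:
    """Return (gene, distance) for the i-th closest TSS; i = 1 is the closest.

    Assumes all positions are on the same chromosome and that each gene
    is represented by a single TSS.
    """
    if not 1 <= i <= len(tss):
        raise ValueError("i must be between 1 and the number of TSSs")
    # Rank genes by absolute distance between the SNP and their TSS.
    ranked = sorted(tss, key=lambda item: abs(item[1] - snp_pos))
    gene, pos = ranked[i - 1]
    return gene, abs(pos - snp_pos)


# Hypothetical TSS coordinates (base pairs).
tss_positions = [("GENE_A", 1_000), ("GENE_B", 5_500), ("GENE_C", 12_000)]
print(ith_closest_tss(5_000, tss_positions, 1))
print(ith_closest_tss(5_000, tss_positions, 2))
```

Because every SNP has an \(i\)th closest TSS whenever at least \(i\) genes are available, a strategy of this form links all SNPs, in line with the 100% noted in the caption.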

Extended Data Fig. 3 Precision of functional S2G strategies using all available cell types and tissues or restricted to blood and immune cell types and tissues.

We report the precisions of functional S2G strategies built using either all available cell types and tissues (All CT; in light color) and/or blood and immune cell types and tissues (Blood CT; in dark color) meta-analyzed across 63 independent traits (All traits; in blue) and 11 blood cell traits and autoimmune diseases (Blood traits; in red) (UK Biobank all autoimmune diseases, Crohn’s Disease, Rheumatoid Arthritis, Ulcerative Colitis, Lupus, Celiac, Platelet Count, Red Blood Cell Count, Red Blood Cell Distribution Width, Eosinophil Count, White Blood Cell Count; see Supplementary Table 3). Error bars represent 95% confidence intervals around meta-analyzed values. We considered 5 S2G strategies with data available for cell types and tissues: GTEx cis-eQTLs (GTEx), GTEx fine-mapped cis-eQTL (GTEx fine-mapped), Roadmap enhancer-gene linking (Roadmap), EpiMap enhancer-gene linking (EpiMap), and Activity-By-Contact (ABC). We considered 3 S2G strategies with data available only for blood and immune cell types and tissues: eQTLGen fine-mapped blood cis-eQTL (eQTLGen fine-mapped), PCHi-C (blood), and Cicero blood/basal (Cicero). We observed 1) that S2G strategies using data from all cell types and tissues were more precise than S2G strategies restricted to blood and immune cell types and tissues in both analyses of all traits (light blue vs. dark blue) and blood cell traits and autoimmune diseases (light red vs. dark red), and 2) that S2G strategies using data from blood and immune cell types and tissues are more precise in all traits than in blood cell traits and autoimmune diseases (dark blue vs. dark red).

Extended Data Fig. 4 Proportion of common and low-frequency variant heritability linked to genes explained by each individual gene.

We report the proportion of common and low-frequency variant heritability linked to genes (\(h^2_{\text{gene,common}}\) and \(h^2_{\text{gene,low-freq}}\), respectively) explained by each individual gene in 16 independent UK Biobank traits. Genes in the top 200 genes (top 1% of all genes) contributing to both \(h^2_{\text{gene,common}}\) and \(h^2_{\text{gene,low-freq}}\) are denoted in red (median of 26 genes across the 16 traits), genes in the top 200 genes contributing to only \(h^2_{\text{gene,common}}\) (resp. \(h^2_{\text{gene,low-freq}}\)) are colored in black (resp. blue) (median of 174 genes each), and remaining genes are colored in gray (median of 19,621 genes, with values close to 0 on both axes). We observe low concordance between per-gene contributions to gene architectures for common vs. low-frequency SNPs.
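The four gene groups in this caption can be reproduced from two per-gene heritability vectors. The Python sketch below is illustrative only: the function names, gene labels and values are hypothetical, and ties at the cut-off are broken by input order.

```python
from typing import Dict, Set


def top_genes(h2_by_gene: Dict[str, float], k: int) -> Set[str]:
    """Return the k genes with the largest per-gene heritability."""
    ranked = sorted(h2_by_gene, key=h2_by_gene.get, reverse=True)
    return set(ranked[:k])


def classify_genes(
    common: Dict[str, float], low_freq: Dict[str, float], k: int
) -> Dict[str, Set[str]]:
    """Split genes by membership in the top-k sets of the two architectures."""
    top_common = top_genes(common, k)
    top_low = top_genes(low_freq, k)
    all_genes = set(common) | set(low_freq)
    return {
        "both": top_common & top_low,          # red in the figure
        "common_only": top_common - top_low,   # black
        "low_freq_only": top_low - top_common, # blue
        "neither": all_genes - top_common - top_low,  # gray
    }


# Hypothetical per-gene heritability proportions for one trait.
h2_common = {"G1": 0.4, "G2": 0.3, "G3": 0.2, "G4": 0.1}
h2_low_freq = {"G1": 0.1, "G2": 0.5, "G3": 0.1, "G4": 0.3}
groups = classify_genes(h2_common, h2_low_freq, k=2)
print(groups)
```

With k = 200, the 'both' and 'common_only' groups of a trait necessarily sum to 200, which is consistent with the medians of 26 and 174 reported above.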

Extended Data Fig. 5 Excess overlap between top genes contributing to common and low-frequency variant heritability linked to genes and disease-specific Mendelian disorder genes.

We report the excess overlap between phenotype-specific Mendelian disorder genes⁷² and the top 200 genes contributing to common and low-frequency variant heritability linked to genes (left), and the gene enrichment of disease-specific Mendelian disorder genes (that is, [SNP heritability linked to Mendelian disorder genes / SNP heritability linked to all genes] / [number of Mendelian disorder genes / total number of genes]) across common and low-frequency variants (right). Each dot represents a disease/trait–Mendelian disorder gene set pair, and is colored by the Mendelian disorder gene set. These two results suggest that both the set of top 200 genes and the per-gene heritability estimates are unlikely to be driven by noisy estimates arising from finite sample size. We restricted analyses to 21 traits analyzed in ref. ⁷².
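The gene enrichment ratio defined in this caption can be written as a short Python function. This is a sketch of the stated ratio only; the function name and the numerical inputs are hypothetical and are not results from the paper.

```python
def gene_enrichment(h2_set: float, h2_all: float, n_set: int, n_all: int) -> float:
    """Gene enrichment of a gene set.

    Computed as (share of gene-linked SNP heritability carried by the set)
    divided by (share of all genes that belong to the set).
    """
    if h2_all <= 0 or n_set <= 0 or n_all <= 0:
        raise ValueError("h2_all, n_set and n_all must be positive")
    return (h2_set / h2_all) / (n_set / n_all)


# Hypothetical inputs: a set of 100 genes out of 20,000 that carries
# one fifth of the SNP heritability linked to all genes.
value = gene_enrichment(h2_set=0.05, h2_all=0.25, n_set=100, n_all=20_000)
print(round(value, 3))
```

A value of 1 indicates that the gene set carries exactly its proportional share of gene-linked heritability; values above 1 indicate enrichment.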

Extended Data Fig. 6 Excess overlap between top genes contributing to common and low-frequency variant heritability linked to genes and differentially expressed gene sets.

We report the excess overlap between 205 differentially expressed gene sets⁴¹ and the top 200 genes contributing to common and low-frequency variant heritability linked to genes across 16 independent UK Biobank traits. Each dot represents a differentially expressed gene set, and is colored by the tissue category. We generally observed excess overlap for disease-critical tissues/cell types. We observed high correlations between excess overlaps for common vs. low-frequency variant architectures, suggesting that common and low-frequency variant architectures are driven by different genes pertaining to similar biological processes.

Supplementary information

Supplementary Information

Supplementary Figures 1–11, Supplementary table legends and Supplementary Notes

Reporting Summary

Peer Review File

Supplementary Table 1

This file contains Supplementary Tables 1–26

About this article

Cite this article

Gazal, S., Weissbrod, O., Hormozdiari, F. et al. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nat Genet 54, 827–836 (2022). https://doi.org/10.1038/s41588-022-01087-y

Received: 29 July 2021
Accepted: 27 April 2022
Published: 06 June 2022
Issue Date: June 2022
DOI: https://doi.org/10.1038/s41588-022-01087-y
Springer Nature America, Inc.
