Abstract
Drug development and biological discovery require effective strategies to map existing genetic associations to causal genes. To approach this problem, we selected 12 common diseases and quantitative traits for which highly powered genome-wide association studies (GWAS) were available. For each disease or trait, we systematically curated positive control gene sets from Mendelian forms of the disease and from targets of medicines used for disease treatment. We found that these positive control genes were highly enriched in proximity to GWAS-associated single-nucleotide variants (SNVs). We then quantitatively assessed the contribution of commonly used genomic features, including open chromatin maps, expression quantitative trait loci (eQTL), and chromatin conformation data. Using these features, we trained and validated an Effector Index (Ei) to map target genes for these 12 common diseases and traits. Ei demonstrated high predictive performance, both in cross-validation on the training set and on an independently derived set for type 2 diabetes. Key predictive features included coding or transcript-altering SNVs, distance to gene, and open chromatin-based metrics. This work outlines a simple, understandable, and systematic approach to prioritizing target genes at GWAS loci for functional follow-up and drug development.
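As an illustration of the enrichment claim above, the following is a minimal sketch of how one might test whether positive control genes lie near GWAS-associated SNVs more often than other genes. It is not the authors' pipeline: the function names, the 500 kb window, the single-chromosome coordinates, and the toy gene and SNV positions are all assumptions made for the example, and the one-sided hypergeometric test is just one reasonable choice of statistic.

```python
from math import comb

def nearest_distance(gene_pos, snv_positions):
    """Distance in base pairs from a gene position to the closest SNV."""
    return min(abs(gene_pos - p) for p in snv_positions)

def enrichment(genes, positive_controls, snv_positions, window=500_000):
    """Compare positive control genes with other genes for proximity to SNVs.

    genes: dict of gene name -> position (toy data, one chromosome).
    positive_controls: set of gene names, assumed to be a subset of genes.
    Returns the 2x2 counts, the odds ratio, and a one-sided
    hypergeometric p-value for seeing at least this many nearby controls.
    """
    near = {g for g, pos in genes.items()
            if nearest_distance(pos, snv_positions) <= window}
    a = len(near & positive_controls)   # positive control, near an SNV
    b = len(positive_controls - near)   # positive control, not near
    c = len(near - positive_controls)   # other gene, near an SNV
    d = len(genes) - a - b - c          # other gene, not near
    odds_ratio = (a * d) / (b * c) if b * c else float("inf")
    # P(X >= a) when the "near" genes are drawn at random from all genes.
    n_total, n_pos, n_near = len(genes), a + b, a + c
    tail = sum(comb(n_pos, k) * comb(n_total - n_pos, n_near - k)
               for k in range(a, min(n_pos, n_near) + 1))
    return (a, b, c, d), odds_ratio, tail / comb(n_total, n_near)

# Hypothetical toy data: eight genes, three lead SNVs, three positive controls.
toy_genes = {"G1": 100_000, "G2": 1_200_000, "G3": 2_500_000, "G4": 4_000_000,
             "G5": 5_500_000, "G6": 7_000_000, "G7": 8_500_000, "G8": 9_900_000}
toy_snvs = [150_000, 2_600_000, 4_100_000]
toy_positives = {"G1", "G3", "G6"}

counts, odds_ratio, p_value = enrichment(toy_genes, toy_positives, toy_snvs)
print(counts, odds_ratio, round(p_value, 4))
```

With real data, the window size and the definition of gene position (for example, transcription start site versus gene body) would need to be chosen deliberately, since both affect the resulting counts.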
Data availability
All accession codes and URLs for publicly available data are provided in the “Methods”. Newly generated DNase-seq data are available from the GEO repository under accession GSE142160. The data that support the findings of this study are available from GitHub at https://github.com/richardslab/Ei. This includes raw data underlying the figures. Results can be visualized at http://hugeamp.org/effectorgenes.html.
Code availability
The code that supports the findings of this study is available from GitHub at https://github.com/richardslab/Ei and https://github.com/mauranolab/UKBB_FINEMAP_targetgene.
Acknowledgements
This research was conducted using the UK Biobank Resource under project number 27449.
Funding
The funding agencies had no role in the design, implementation or interpretation of this study. The views expressed in this article are those of the author(s) and not necessarily those of the funders. MIM has received funding from the NIH (U01-DK105535) and the Wellcome Trust (090532, 098381, 106130, 203141, 212259). The Greenwood lab acknowledges support from Compute Canada (RAPI: nzt-671-aa). MTM is partially funded by National Institutes of Health grant R35GM119703. The Richards research group is supported by the Canadian Institutes of Health Research (CIHR), the Lady Davis Institute of the Jewish General Hospital, the Canadian Foundation for Innovation, the NIH Foundation, Cancer Research UK and the Fonds de Recherche Québec Santé (FRQS). JBR is supported by a FRQS Clinical Research Scholarship. TwinsUK is funded by the Wellcome Trust, Medical Research Council, European Union, the National Institute for Health Research (NIHR)-funded BioResource, Clinical Research Facility and Biomedical Research Centre based at Guy’s and St Thomas’ NHS Foundation Trust in partnership with King’s College London.
Ethics declarations
Conflict of interest
EF is an employee of Pfizer. MIM has served on advisory panels for Pfizer, Novo Nordisk and Zoe Global, has received honoraria from Merck, Pfizer, Novo Nordisk and Eli Lilly, and has received research funding from AbbVie, AstraZeneca, Boehringer Ingelheim, Eli Lilly, Janssen, Merck, Novo Nordisk, Pfizer, Roche, Sanofi Aventis, Servier, and Takeda. MIM is currently at Genentech, 1 DNA Way, South San Francisco, CA 94080, and is a holder of Roche stock. JBR has served as an advisor to GlaxoSmithKline and Deerfield Capital. JBR’s institution has received investigator-initiated grant funding from Eli Lilly, GlaxoSmithKline and Biogen for projects unrelated to this research. JBR is the CEO of 5 Prime Sciences (http://www.5primesciences.com). VF is an employee of 5 Prime Sciences.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Forgetta, V., Jiang, L., Vulpescu, N.A. et al. An effector index to predict target genes at GWAS loci. Hum Genet 141, 1431–1447 (2022). https://doi.org/10.1007/s00439-022-02434-z