Abstract
Genome-wide association studies (GWASs) have identified tens of thousands of single nucleotide polymorphisms (SNPs) associated with human diseases and characteristics. A significant fraction of GWAS findings may be false positives, and the gold standard for confirming a true positive is independent validation. The goal of this study was to identify SNP features associated with validation success. The analysis used summary statistics from the Catalog of Published GWASs. Because our goal was to analyze reproducibility, we focused on diseases and phenotypes targeted by at least 10 GWASs. GWASs were arranged in discovery-validation pairs by time of publication, with the discovery GWAS published before the validation GWAS. We used four definitions of validation success that differ in stringency. Associations of SNP features with validation success were consistent across the definitions. The strongest predictor of SNP validation was the level of statistical significance in the discovery GWAS. The magnitude of the effect size was associated with validation success in a non-linear manner. SNPs with risk allele frequencies in the range 30–70% showed a higher validation success rate than rarer or more common SNPs. Missense, 5' UTR, and stop-gained SNPs, and SNPs located in transcription factor binding sites, had a higher validation success rate than intergenic, intronic, and synonymous SNPs. Validation success was positively associated with the level of evolutionary conservation of the site. In addition, validation success was higher when the discovery and validation GWASs targeted the same ethnicity. All predictors of validation success remained significant in a multivariate logistic regression model, indicating that each contributes independently. In conclusion, we identified SNP features that predict the validation success of GWAS hits. These features can be used to select SNPs for validation and downstream functional studies.
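The pairing step described in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the `Gwas` record and its fields are hypothetical and do not follow the GWAS Catalog schema, and the single `threshold` criterion in `validated_snps` is a stand-in, since the abstract does not spell out the four definitions of validation success.

```python
from dataclasses import dataclass
from datetime import date
from itertools import combinations


@dataclass(frozen=True)
class Gwas:
    """One study. Hypothetical fields, not the GWAS Catalog schema."""
    study_id: str
    phenotype: str
    published: date
    hits: dict  # SNP id -> reported p-value


def discovery_validation_pairs(studies):
    """Yield (discovery, validation) pairs for the same phenotype,
    with the discovery study published strictly earlier."""
    ordered = sorted(studies, key=lambda s: s.published)
    for earlier, later in combinations(ordered, 2):
        same_phenotype = earlier.phenotype == later.phenotype
        if same_phenotype and earlier.published < later.published:
            yield earlier, later


def validated_snps(discovery, validation, threshold=0.05):
    """SNPs from the discovery study that the validation study also
    reports at p < threshold (an illustrative criterion only)."""
    return {
        snp
        for snp in discovery.hits
        if validation.hits.get(snp, 1.0) < threshold
    }


# Made-up example data: three studies of one phenotype.
studies = [
    Gwas("A", "asthma", date(2010, 1, 1), {"rs1": 1e-9, "rs2": 3e-8}),
    Gwas("B", "asthma", date(2012, 6, 1), {"rs1": 2e-4}),
    Gwas("C", "asthma", date(2015, 3, 1), {"rs1": 1e-10, "rs2": 0.2}),
]
pairs = list(discovery_validation_pairs(studies))
for discovery, validation in pairs:
    print(discovery.study_id, validation.study_id,
          sorted(validated_snps(discovery, validation)))
```

Each study can act as the discovery GWAS for every later study of the same phenotype, so three studies give three pairs here. A stricter or looser definition of validation success would only change the body of `validated_snps`.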
Availability of data and materials
This project used data from the Catalog of Published Genome-Wide Association Studies (https://www.genome.gov/catalog-of-published-genomewide-association-studies), the UCSC Human Genome Browser (https://genome.ucsc.edu), the Ensembl Regulatory Build (http://useast.ensembl.org/info/genome/funcgen/regulatory_build.html), and ENCODE (https://www.encodeproject.org/), all of which are in the public domain.
Code availability
Not applicable.
Funding
Partial financial support was received from National Institutes of Health Grants U19CA203654, U19CA203654S1, R01CA231141, and P01 CA206980-01A1, and from Cancer Prevention and Research Institute of Texas Grant RR170048. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics declarations
Conflict of interest
The authors have no conflicts of interest to declare that are relevant to the content of this article.
Ethics approval
Not applicable: the study used aggregate statistics from datasets in the public domain.
Consent to participate
Not applicable.
Consent for publication
Not applicable.
Cite this article
Gorlova, O.Y., Xiao, X., Tsavachidis, S. et al. SNP characteristics and validation success in genome wide association studies. Hum Genet 141, 229–238 (2022). https://doi.org/10.1007/s00439-021-02407-8
DOI: https://doi.org/10.1007/s00439-021-02407-8