Abstract
Successful independent replication is the most direct approach for distinguishing real genotype–disease associations from false discoveries in genome-wide association studies (GWAS). SNPs have been selected for replication primarily on the basis of P values from the discovery stage, although additional SNP characteristics may be used to improve replication success. We used disease-associated SNPs from more than 2,000 published GWAS to identify predictors of SNP reproducibility. SNP reproducibility was defined as the proportion of successful replications among all replication attempts. The study that first reported an association was considered the discovery study, and all subsequent studies targeting the same phenotype were considered replications. We found that −Log(P), where P is the P value from the discovery study, is the strongest predictor of SNP reproducibility. Other significant predictors include the type of SNP (e.g., missense vs. intronic SNPs) and minor allele frequency. Features of the genes linked to the disease-associated SNP also predict SNP reproducibility. Based on empirically defined rules, we developed a reproducibility score (RS) to predict SNP reproducibility independently of −Log(P). We used data from two lung cancer GWAS as well as recently reported disease-associated SNPs to validate RS. −Log(P) outperforms RS when only the very top SNPs are selected, whereas RS works better with relaxed selection criteria. In conclusion, we propose an empirical model to predict SNP reproducibility, which can be used to select SNPs for validation and prioritization.
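The two quantities defined in the abstract, SNP reproducibility and −Log(P), can be illustrated with a minimal sketch. This is not the authors' code: the class and function names are hypothetical, the logarithm is assumed to be base 10 (the abstract does not state the base), and what counts as a "successful" replication is left to the caller because the abstract does not give the criterion.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReplicationAttempt:
    """One follow-up study of a SNP-phenotype association (hypothetical record)."""
    study_id: str   # illustrative identifier, not taken from the paper
    success: bool   # True if the attempt was judged a successful replication


def reproducibility(attempts: List[ReplicationAttempt]) -> Optional[float]:
    """Proportion of successful replications among all replication attempts.

    Returns None when there are no attempts, since the proportion is undefined.
    """
    if not attempts:
        return None
    return sum(a.success for a in attempts) / len(attempts)


def minus_log_p(p_value: float) -> float:
    """-log10 of the discovery-stage P value (base 10 is an assumption)."""
    if not 0.0 < p_value <= 1.0:
        raise ValueError("P value must lie in (0, 1]")
    return -math.log10(p_value)


# Made-up example: four replication attempts, three of them successful.
attempts = [
    ReplicationAttempt("rep1", True),
    ReplicationAttempt("rep2", False),
    ReplicationAttempt("rep3", True),
    ReplicationAttempt("rep4", True),
]
print(reproducibility(attempts))   # 0.75
print(minus_log_p(5e-8))           # a little above 7
```

Under this definition the discovery study itself is excluded from the denominator; only subsequent studies of the same phenotype count as attempts.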
Acknowledgments
This work was supported in part by the National Institutes of Health U19 CA148127 Grant and the National Institutes of Health Grants 5 P30 CA016672, LM009012, LM010098 and GM103534. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Cite this article
Gorlov, I.P., Moore, J.H., Peng, B. et al. SNP characteristics predict replication success in association studies. Hum Genet 133, 1477–1486 (2014). https://doi.org/10.1007/s00439-014-1493-6
DOI: https://doi.org/10.1007/s00439-014-1493-6