Human Genetics, Volume 133, Issue 12, pp 1477–1486

SNP characteristics predict replication success in association studies

  • Ivan P. Gorlov
  • Jason H. Moore
  • Bo Peng
  • Jennifer L. Jin
  • Olga Y. Gorlova
  • Christopher I. Amos
Original Investigation


Successful independent replication is the most direct way to distinguish real genotype–disease associations from false discoveries in genome-wide association studies (GWAS). SNPs have been selected for replication primarily on the basis of P values from the discovery stage, although additional SNP characteristics may be used to improve replication success. We used disease-associated SNPs from more than 2,000 published GWASs to identify predictors of SNP reproducibility. SNP reproducibility was defined as the proportion of successful replications among all replication attempts. The study reporting an association for the first time was considered the discovery study, and all subsequent studies targeting the same phenotype were considered replications. We found that −Log(P), where P is the P value from the discovery study, is the strongest predictor of SNP reproducibility. Other significant predictors include the type of SNP (e.g., missense vs. intronic SNPs) and minor allele frequency. Features of the genes linked to the disease-associated SNP also predict SNP reproducibility. Based on empirically defined rules, we developed a reproducibility score (RS) to predict SNP reproducibility independently of −Log(P). We used data from two lung cancer GWASs as well as recently reported disease-associated SNPs to validate RS. −Log(P) outperforms RS when only the very top SNPs are selected, whereas RS works better with relaxed selection criteria. In conclusion, we propose an empirical model to predict SNP reproducibility, which can be used to select SNPs for validation and prioritization.
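As a rough illustration of the two quantities defined above (not the authors' code), the sketch below computes SNP reproducibility as successful replications divided by replication attempts, and −Log(P) from a discovery P value. The base-10 logarithm, the `SnpRecord` layout, the function names, and all numbers are assumptions made for illustration; the abstract does not specify them.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class SnpRecord:
    """Hypothetical record for one disease-associated SNP (illustrative only)."""
    snp_id: str
    discovery_p: float          # P value from the discovery study
    replication_attempts: int   # later studies targeting the same phenotype
    replication_successes: int  # attempts that reproduced the association


def reproducibility(rec: SnpRecord) -> Optional[float]:
    """Proportion of successful replications among all replication attempts."""
    if rec.replication_attempts == 0:
        return None  # undefined when no replication was attempted
    return rec.replication_successes / rec.replication_attempts


def minus_log_p(rec: SnpRecord) -> float:
    """-log10 of the discovery P value (base 10 is an assumption)."""
    return -math.log10(rec.discovery_p)


# Made-up example values, not data from the study.
records = [
    SnpRecord("snp_A", 1e-12, 4, 3),
    SnpRecord("snp_B", 5e-8, 5, 1),
    SnpRecord("snp_C", 1e-6, 0, 0),
]

for rec in records:
    print(rec.snp_id, round(minus_log_p(rec), 2), reproducibility(rec))
```

Returning `None` for SNPs without any replication attempt keeps them distinct from SNPs that were attempted and never replicated, which would have a reproducibility of 0.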


Keywords: Small Effect Size · Significant SNPs · Synonymous SNPs · Conservation Index · OMIM Gene
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.



This work was supported in part by the National Institutes of Health U19 CA148127 Grant and the National Institutes of Health Grants 5 P30 CA016672, LM009012, LM010098 and GM103534. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplementary material

439_2014_1493_MOESM1_ESM.docx (59 kb)
Supplementary material 1 (DOCX 301 kb)
439_2014_1493_MOESM2_ESM.xls (4 mb)
Supplementary material 2 (XLS 4135 kb)
439_2014_1493_MOESM3_ESM.xls (40 kb)
Supplementary material 3 (XLS 39 kb)



Copyright information

© Springer-Verlag Berlin Heidelberg 2014

Authors and Affiliations

  • Ivan P. Gorlov (1)
  • Jason H. Moore (2)
  • Bo Peng (3)
  • Jennifer L. Jin (4)
  • Olga Y. Gorlova (1)
  • Christopher I. Amos (1)
  1. Department of Community and Family Medicine, Geisel School of Medicine, Dartmouth College, Hanover, USA
  2. The Geisel School of Medicine, Dartmouth College, HB 7937, One Medical Center Dr., Dartmouth-Hitchcock Medical Center, Lebanon, USA
  3. Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, USA
  4. Department of Mathematics, Dartmouth College, Hanover, USA
