SNP characteristics predict replication success in association studies
Successful independent replication is the most direct approach for distinguishing real genotype–disease associations from false discoveries in genome-wide association studies (GWAS). SNPs have been selected for replication primarily on the basis of their P values in the discovery stage, although additional SNP characteristics may be used to improve replication success. We used disease-associated SNPs from more than 2,000 published GWASs to identify predictors of SNP reproducibility, defined as the proportion of successful replications among all replication attempts. The study that first reported an association was considered the discovery study, and all subsequent studies targeting the same phenotype were considered replications. We found that −Log(P), where P is the P value from the discovery study, is the strongest predictor of SNP reproducibility. Other significant predictors include SNP type (e.g., missense vs. intronic SNPs) and minor allele frequency. Features of the genes linked to a disease-associated SNP also predict its reproducibility. Based on empirically defined rules, we developed a reproducibility score (RS) that predicts SNP reproducibility independently of −Log(P). We validated RS using data from two lung cancer GWASs as well as recently reported disease-associated SNPs. Minus Log(P) outperforms RS when only the very top SNPs are selected, whereas RS works better under relaxed selection criteria. In conclusion, we propose an empirical model of SNP reproducibility that can be used to select and prioritize SNPs for validation.
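The reproducibility measure and the selection comparison described above can be illustrated with a short computation over a table of published reports. The following is a minimal Python sketch, not the authors' pipeline: the `Report` record, its fields, and the toy SNPs (`SNP_A`, `SNP_B`, `SNP_C`) are hypothetical; −Log(P) is assumed to be base 10 (ranking does not depend on the base); and because the abstract does not state the RS rules, the selection helper accepts any scoring function, so −Log(P) and an RS implementation could be compared at strict (small k) and relaxed (large k) selection thresholds.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str]  # (snp, phenotype)


@dataclass
class Report:
    """Hypothetical record of one study reporting on one SNP-phenotype pair."""
    snp: str
    phenotype: str
    year: int         # used only to order studies; the earliest is the discovery
    p_value: float
    replicated: bool  # outcome of a replication attempt (ignored for the discovery)


def snp_reproducibility(reports: List[Report]) -> Dict[Key, Tuple[float, float]]:
    """Map each SNP-phenotype pair to (discovery -log10 P, reproducibility).

    Reproducibility = successful replications / all replication attempts.
    Pairs that were never followed up have no attempts and are skipped.
    """
    groups: Dict[Key, List[Report]] = defaultdict(list)
    for r in reports:
        groups[(r.snp, r.phenotype)].append(r)

    out: Dict[Key, Tuple[float, float]] = {}
    for key, group in groups.items():
        group.sort(key=lambda r: r.year)  # earliest study = discovery
        discovery, attempts = group[0], group[1:]
        if not attempts:
            continue
        successes = sum(1 for r in attempts if r.replicated)
        out[key] = (-math.log10(discovery.p_value), successes / len(attempts))
    return out


def top_k_reproducibility(
    table: Dict[Key, Tuple[float, float]],
    score: Callable[[Key], float],
    k: int,
) -> float:
    """Mean reproducibility of the k pairs ranked highest by `score`."""
    chosen = sorted(table, key=score, reverse=True)[:k]
    if not chosen:
        raise ValueError("no SNP-phenotype pairs selected")
    return sum(table[key][1] for key in chosen) / len(chosen)


# Toy, made-up data for illustration only.
reports = [
    Report("SNP_A", "lung cancer", 2008, 1e-12, False),
    Report("SNP_A", "lung cancer", 2010, 0.01, True),
    Report("SNP_A", "lung cancer", 2011, 0.20, False),
    Report("SNP_B", "lung cancer", 2009, 1e-7, False),
    Report("SNP_B", "lung cancer", 2012, 0.03, True),
    Report("SNP_C", "asthma", 2010, 5e-8, False),  # never followed up: skipped
]
table = snp_reproducibility(reports)


def by_log_p(key: Key) -> float:
    """Rank pairs by the discovery-stage -log10 P."""
    return table[key][0]


print(top_k_reproducibility(table, by_log_p, k=1))  # 0.5
print(top_k_reproducibility(table, by_log_p, k=2))  # 0.75
```

Passing the ranking as a function keeps the evaluation separate from the scoring rule, so a hypothetical `by_rs` function could be swapped in for `by_log_p` and both curves traced over a range of k.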
Keywords: Small Effect Size; Significant SNPs; Synonymous SNPs; Conservation Index; OMIM Gene
This work was supported in part by the National Institutes of Health U19 CA148127 Grant and the National Institutes of Health Grants 5 P30 CA016672, LM009012, LM010098 and GM103534. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.