The Null Distributions of Test Statistics in Genomewide Association Studies

Statistics in Biosciences

Abstract

Genomewide association (GWA) studies assay hundreds of thousands of single nucleotide polymorphisms (SNPs) simultaneously across the entire genome and associate them with diseases or other biological or clinical traits. The association analysis usually tests each SNP as an independent entity and ignores biological information such as linkage disequilibrium. Although the Bonferroni correction and other approaches have been proposed to address the multiple comparisons that arise from testing many SNPs, the distribution of an association test statistic when the entire genome is considered together remains poorly understood. In other words, there have been extensive efforts in hypothesis testing but almost no attempts to estimate the density under the null hypothesis. By estimating the true null distribution, we can apply the result directly to hypothesis testing, better assess the existing approaches to multiple comparisons, and evaluate the impact of linkage disequilibrium on GWA studies. To this end, we estimate the empirical null distribution of an association test statistic in GWA studies using simulated population data. We further propose a convenient and accurate method based on adaptive splines to estimate the empirical value in GWA studies, and we validate our findings using a real data set. Our method enables us to fully characterize the null distribution of an association test, which not only can be used to test the null hypothesis of no association but also provides important information about the impact of the density of the genetic markers on the significance of the tests. Our method does not require users to perform computationally intensive permutations and hence provides a timely solution to an important and difficult problem in GWA studies.
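To make the role of linkage disequilibrium concrete, the following minimal Python sketch simulates the null distribution of the largest 1-df chi-square statistic across a set of markers and compares its empirical 95% quantile with the Bonferroni threshold. It is an illustration only and is not the adaptive-spline method proposed in the article. The AR(1) correlation between adjacent z-scores is an assumed, crude stand-in for linkage disequilibrium, and the function names, marker count, and replicate count are all hypothetical choices made for this example.

```python
import math
import random
from statistics import NormalDist


def simulate_max_stats(n_snps, rho, n_reps, seed=0):
    """Return sorted maxima of 1-df chi-square statistics under the null.

    Adjacent z-scores follow an AR(1) process with correlation rho, which is
    an assumed proxy for linkage disequilibrium (not the article's model).
    """
    rng = random.Random(seed)
    innov_sd = math.sqrt(1.0 - rho * rho)  # keeps each z-score at unit variance
    maxima = []
    for _ in range(n_reps):
        z = rng.gauss(0.0, 1.0)
        best = z * z
        for _ in range(n_snps - 1):
            z = rho * z + innov_sd * rng.gauss(0.0, 1.0)
            best = max(best, z * z)
        maxima.append(best)
    maxima.sort()
    return maxima


def empirical_threshold(sorted_maxima, alpha=0.05):
    """Empirical (1 - alpha) quantile of the simulated null maxima."""
    n = len(sorted_maxima)
    idx = min(n - 1, math.ceil((1.0 - alpha) * n) - 1)
    return sorted_maxima[idx]


def bonferroni_threshold(n_snps, alpha=0.05):
    """Chi-square (1 df) critical value at per-test level alpha / n_snps."""
    z = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_snps))
    return z * z


if __name__ == "__main__":
    n_snps, n_reps = 200, 2000  # small illustrative sizes, not genome scale
    for rho in (0.0, 0.95):
        thr = empirical_threshold(simulate_max_stats(n_snps, rho, n_reps))
        print(f"rho={rho}: empirical 95% threshold = {thr:.2f}")
    print(f"Bonferroni threshold = {bonferroni_threshold(n_snps):.2f}")
```

With independent markers the empirical threshold should sit close to the Bonferroni value, whereas strong correlation between neighbouring markers lowers it. This illustrates how a Bonferroni correction that ignores linkage disequilibrium can be conservative, which motivates estimating the null distribution directly.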

Author information

Corresponding author

Correspondence to Heping Zhang.

About this article

Cite this article

Chen, X., Zhang, H. The Null Distributions of Test Statistics in Genomewide Association Studies. Stat Biosci 1, 214–227 (2009). https://doi.org/10.1007/s12561-009-9011-4

  • DOI: https://doi.org/10.1007/s12561-009-9011-4
