
Human Genetics, 123:617

MAX-rank: a simple and robust genome-wide scan for case-control association studies

  • Qizhai Li
  • Kai Yu
  • Zhaohai Li
  • Gang Zheng
Original Investigation

Abstract

In genome-wide association studies (GWAS), single-marker analysis is usually employed to identify the most significant single nucleotide polymorphisms (SNPs). The trend test has been proposed for the analysis of case-control association. Three trend tests are available, optimal for the recessive, additive and dominant models respectively. When the underlying genetic model is unknown, the maximum of the three trend tests (MAX) has been shown to be robust against genetic model misspecification. Since the asymptotic distribution of MAX depends on the allele frequency of the SNP, ranking SNPs by the P-value of MAX may differ from ranking them by the MAX statistic itself. Calculating the P-value of MAX for 300,000 (300 K) or more SNPs is computationally intensive, and software for obtaining the P-value of MAX is not widely available. The MAX statistic, on the other hand, is very easy to calculate without complex computer programs. We therefore study whether the MAX statistic can be used instead of its P-value to rank SNPs in GWAS. The approaches that rank SNPs by MAX and by its P-value are referred to as MAX-rank and P-rank, respectively. By applying MAX-rank and P-rank to simulated data and to four real GWAS datasets, we found that the ranks of SNPs with true association are very similar under both approaches. Thus, we recommend using MAX-rank for genome-wide scans. After the top-ranked SNPs are identified, their P-values based on MAX can be calculated and compared with the significance level.
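The MAX-rank idea can be illustrated with a minimal sketch. It is not the authors' program. It assumes the standard Cochran-Armitage trend statistic with scores (0, x, 1) for genotypes carrying 0, 1 and 2 copies of the risk allele, with x = 0, 0.5 and 1 corresponding to the recessive, additive and dominant models. The function names (`trend_z`, `max_stat`, `max_rank`), the input layout and the handling of degenerate tables are hypothetical choices made for illustration only.

```python
import math

def trend_z(cases, controls, x):
    """Cochran-Armitage trend statistic with scores (0, x, 1).

    cases, controls: genotype counts for 0, 1 and 2 copies of the
    risk allele (assumed coding). x = 0, 0.5, 1 is taken to mean the
    recessive, additive and dominant model respectively.
    """
    scores = (0.0, x, 1.0)
    R, S = sum(cases), sum(controls)
    N = R + S
    n = [r + s for r, s in zip(cases, controls)]  # genotype totals
    num = sum(w * (S * r - R * s) for w, r, s in zip(scores, cases, controls))
    var = R * S * (N * sum(w * w * k for w, k in zip(scores, n))
                   - sum(w * k for w, k in zip(scores, n)) ** 2)
    if var <= 0:
        # Degenerate table (e.g. an empty genotype class): the statistic
        # is undefined; returning 0.0 is an illustrative assumption.
        return 0.0
    return math.sqrt(N) * num / math.sqrt(var)

def max_stat(cases, controls):
    """MAX: largest absolute trend statistic over the three models."""
    return max(abs(trend_z(cases, controls, x)) for x in (0.0, 0.5, 1.0))

def max_rank(snps):
    """Return SNP names sorted by decreasing MAX (the MAX-rank order).

    snps: dict mapping a SNP name to a (cases, controls) pair of
    genotype-count triples (hypothetical input layout).
    """
    return sorted(snps, key=lambda name: max_stat(*snps[name]), reverse=True)

if __name__ == "__main__":
    # Made-up counts, for illustration only.
    snps = {
        "snp_null": ((10, 20, 30), (10, 20, 30)),
        "snp_assoc": ((5, 20, 35), (15, 20, 25)),
    }
    for name in max_rank(snps):
        print(name, round(max_stat(*snps[name]), 3))
```

Ranking needs only the genotype counts per SNP, which is why it scales easily to hundreds of thousands of markers. P-values of MAX would then be computed only for the top-ranked SNPs by a separate procedure not shown here.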

Keywords

Genetic Model; Dominant Model; Trend Test; Robust Test; Asymptotic Null Distribution

Notes

Acknowledgments

We thank the Center for Information Technology, NIH, for providing access to the high-performance computational capabilities of the Biowulf cluster computer system. The authors would like to thank J Hoh for sharing her AMD data with us and BJ Stone of NCI for her help with the English edits. Three reviewers provided useful comments and suggestions that helped us improve the presentation.


Copyright information

© Springer-Verlag 2008

Authors and Affiliations

  1. Biostatistics Branch, National Cancer Institute, Bethesda, USA
  2. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China
  3. Department of Statistics, George Washington University, Washington, USA
  4. Biometry and Mathematical Statistics Branch, National Institute of Child Health and Human Development, Rockville, USA
  5. Office of Biostatistics Research, National Heart, Lung and Blood Institute, Bethesda, USA
