MAX-rank: a simple and robust genome-wide scan for case-control association studies
In genome-wide association studies (GWAS), single-marker analysis is usually employed to identify the most significant single nucleotide polymorphisms (SNPs). For case-control association, the trend test is commonly used, and three versions of it are available, optimal for the recessive, additive and dominant models respectively. When the underlying genetic model is unknown, the maximum of the three trend test statistics (MAX) has been shown to be robust against genetic model misspecification. Because the asymptotic distribution of MAX depends on the allele frequency of the SNP, ranking SNPs by the P-value of MAX may differ from ranking them by the MAX statistic itself. Calculating the P-value of MAX for 300,000 (300 K) or more SNPs is computationally intensive, and software to obtain the P-value of MAX is not widely available. The MAX statistic, on the other hand, is very easy to calculate without complex computer programs. We therefore study whether the MAX statistic can be used instead of its P-value to rank SNPs in GWAS. We refer to ranking SNPs by the MAX statistic as MAX-rank and to ranking them by its P-value as P-rank. Applying MAX-rank and P-rank to simulated data and to four real GWAS datasets, we found that the ranks of SNPs with true association are very similar under both approaches. We therefore recommend using MAX-rank for genome-wide scans. After the top-ranked SNPs are identified, their P-values based on MAX can be calculated and compared with the significance level.
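To illustrate how little computation MAX-rank needs, the following is a minimal Python sketch under stated assumptions, not the authors' software. It assumes the three trend tests are Cochran-Armitage trend tests with genotype scores (0, 0, 1), (0, 0.5, 1) and (0, 1, 1) for the recessive, additive and dominant models, that genotype counts are ordered by the number of copies of the putative risk allele (0, 1, 2), and that MAX is the largest absolute value of the three statistics. The function names `trend_z`, `max_statistic` and `max_rank` are hypothetical.

```python
import math

# Assumed genotype scores (0, 1, 2 copies of the putative risk allele).
SCORES = {
    "recessive": (0.0, 0.0, 1.0),
    "additive": (0.0, 0.5, 1.0),
    "dominant": (0.0, 1.0, 1.0),
}

def trend_z(cases, controls, scores):
    """Cochran-Armitage trend statistic for one SNP and one score vector."""
    R, S = sum(cases), sum(controls)
    N = R + S
    totals = [r + s for r, s in zip(cases, controls)]
    numerator = sum(x * (S * r - R * s)
                    for x, r, s in zip(scores, cases, controls))
    sum_x2 = sum(x * x * n for x, n in zip(scores, totals))
    sum_x = sum(x * n for x, n in zip(scores, totals))
    variance = R * S * (N * sum_x2 - sum_x ** 2)
    if variance <= 0:
        # Degenerate table (e.g. no variation in scores): report no signal.
        return 0.0
    return math.sqrt(N) * numerator / math.sqrt(variance)

def max_statistic(cases, controls):
    """Largest absolute trend statistic over the three genetic models."""
    return max(abs(trend_z(cases, controls, x)) for x in SCORES.values())

def max_rank(snps):
    """Rank SNP identifiers by decreasing MAX statistic.

    `snps` maps a SNP identifier to (case_counts, control_counts).
    """
    return sorted(snps, key=lambda k: max_statistic(*snps[k]), reverse=True)

# Illustrative counts only, not data from the study.
example = {
    "snp_a": ((20, 30, 50), (50, 30, 20)),
    "snp_b": ((30, 40, 30), (30, 40, 30)),
}
ranking = max_rank(example)
```

Only the top of `ranking` would then need the more expensive P-value calculation for MAX, which is the two-stage use the abstract recommends.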
Keywords: Genetic Model, Dominant Model, Trend Test, Robust Test, Asymptotic Null Distribution
We thank the Center for Information Technology, NIH, for providing access to the high-performance computational capabilities of the Biowulf cluster computer system. We also thank J Hoh for sharing her AMD data with us and BJ Stone of NCI for her help with the English edits. Three reviewers provided useful comments and suggestions that improved our presentation.