Abstract
The burden test and the sequence kernel association test (SKAT) are two popular methods for detecting association with rare variants. Treated as two different sources of association information, they are adaptively combined in the optimal SKAT (SKAT-O) method to achieve optimal power. We show that the burden test is part of, rather than independent of, the SKAT. We introduce a new test statistic that is the sum of the burden statistic and a statistic that is asymptotically independent of the burden statistic. The performance of this new test statistic is demonstrated through extensive simulation studies and applications to a Genetic Analysis Workshop 17 data set and the Ocular Hypertension Treatment Study data.
Acknowledgements
The authors would like to thank the editor and three anonymous referees for their insightful comments which resulted in a substantial improvement of the paper.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Appendix: Proofs
Proof of Proposition 2.1
Under the model assumption, asymptotically, \( \tilde{y}\sim N(0,1) \). The covariance between \( Q_{\text{B}} \) and \( Q_{\text{s}} \) is
Therefore, \( Q_{\text{B}} \) and \( Q_{\text{s}} \) are correlated in general.
Proof of Theorem 2.1
Since \( \tilde{G} = G - \tilde{v}_{0} \tilde{v}_{0}^{T} G \) with \( \tilde{v}_{0} = \frac{G1}{\sqrt{1^{T} G^{T} G1}} \) and \( \tilde{v}_{0}^{T} \tilde{v}_{0} = 1 \), we have \( \tilde{G}^{T} \tilde{v}_{0} = G^{T} \tilde{v}_{0} - G^{T} \tilde{v}_{0} \tilde{v}_{0}^{T} \tilde{v}_{0} = 0 \), and hence \( \tilde{G}\tilde{G}^{T} \tilde{v}_{0} = 0 \). Therefore, \( \tilde{v}_{0} \) is orthogonal to the space spanned by the column vectors of \( \tilde{G}\tilde{G}^{T} \). Hence, for each eigenvector \( \tilde{v}_{j} \) of \( \tilde{G}\tilde{G}^{T} \) with a positive eigenvalue, we have \( \tilde{v}_{j}^{T} \tilde{v}_{0} = 0 \). Under the null hypothesis, \( \tilde{y} \) is asymptotically standard normal, so each \( \tilde{v}_{j}^{T} \tilde{y} \) has an asymptotic standard normal distribution with \( E[\tilde{v}_{j}^{T} \tilde{y}] = 0 \) and \( {\text{Var}}[\tilde{v}_{j}^{T} \tilde{y}] = 1 \), and the covariance between \( \tilde{v}_{j}^{T} \tilde{y} \) and \( \tilde{v}_{k}^{T} \tilde{y} \) is \( {\text{Cov}} ( {\tilde{v}_{j}^{T} \tilde{y},\tilde{v}_{k}^{T} \tilde{y}} ) = \tilde{v}_{j}^{T} \tilde{v}_{k} = 0 \) if \( j \ne k \), for \( j,k = 0, 1, \ldots ,m \).
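The key algebraic step in this proof, \( \tilde{G}^{T} \tilde{v}_{0} = 0 \), can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names and the small genotype matrix are hypothetical and chosen for illustration, not taken from the paper.

```python
import math

def orthogonal_decomposition(G):
    """Return (v0, G_tilde) for an n x m matrix G given as a list of rows.

    v0      = G1 / sqrt(1^T G^T G 1)
    G_tilde = G - v0 v0^T G
    """
    n, m = len(G), len(G[0])
    g1 = [sum(row) for row in G]                 # G1: row sums of G
    norm = math.sqrt(sum(x * x for x in g1))     # sqrt(1^T G^T G 1)
    v0 = [x / norm for x in g1]
    # v0^T G is a row vector of length m
    v0_G = [sum(v0[i] * G[i][j] for i in range(n)) for j in range(m)]
    G_tilde = [[G[i][j] - v0[i] * v0_G[j] for j in range(m)] for i in range(n)]
    return v0, G_tilde

def max_abs_Gt_T_v0(G_tilde, v0):
    """Largest absolute entry of G_tilde^T v0 (zero in exact arithmetic)."""
    n, m = len(G_tilde), len(G_tilde[0])
    return max(abs(sum(G_tilde[i][j] * v0[i] for i in range(n))) for j in range(m))

# Hypothetical toy genotype matrix: 6 subjects, 3 variants, coded 0/1/2.
G = [
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 2],
    [1, 1, 0],
    [0, 0, 0],
    [2, 0, 1],
]
v0, G_tilde = orthogonal_decomposition(G)
residual = max_abs_Gt_T_v0(G_tilde, v0)
# The burden direction is removed from G_tilde: its row sums vanish too.
row_sum_residual = max(abs(sum(row)) for row in G_tilde)
print(residual, row_sum_residual)
```

Both printed values should be zero up to floating-point rounding. The second check reflects that \( \tilde{G}1 = G1 - \tilde{v}_{0} \tilde{v}_{0}^{T} G1 = 0 \), so the residual matrix carries no component along the burden score.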
Proof of Theorem 2.2
Under the null hypothesis, asymptotically, the p values \( P_{1} \) and \( P_{2} \) from \( Q_{1} \) and \( Q_{2} \), respectively, are each uniformly distributed between 0 and 1. Using the quantile transformation described in the text, the variables \( \left( {\chi_{1}^{2} } \right)^{ - 1} \left( {P_{i} } \right) (i = 1,2) \) asymptotically and independently follow a Chi-square distribution with 1 degree of freedom. Therefore, the new test statistic \( Q_{\text{new}} = \mathop \sum \nolimits_{i = 1}^{2} \left( {\chi_{1}^{2} } \right)^{ - 1} (P_{i} ) \) has an asymptotic Chi-square distribution with 2 degrees of freedom.
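The combination step in this proof can be sketched in a few lines of standard-library Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, and the sketch assumes that \( \left( {\chi_{1}^{2} } \right)^{ - 1} \) denotes the upper-tail quantile, so that small p values map to large statistics. Under the null hypothesis either tail convention yields a Chi-square variable with 1 degree of freedom.

```python
import math
from statistics import NormalDist

def chi2_1_upper_quantile(p):
    """Return x such that P(X > x) = p for X ~ Chi-square(1), with 0 < p <= 1.

    Uses X = Z^2 with Z standard normal, so P(X > x) = 2 * Phi(-sqrt(x)).
    """
    z = -NormalDist().inv_cdf(p / 2.0)
    return z * z

def combine_two_p_values(p1, p2):
    """Return (Q_new, combined p value) for two independent p values."""
    q_new = chi2_1_upper_quantile(p1) + chi2_1_upper_quantile(p2)
    # The survival function of Chi-square(2) has the closed form exp(-x / 2).
    p_new = math.exp(-q_new / 2.0)
    return q_new, p_new

# Hypothetical p values from the two component tests.
q_new, p_new = combine_two_p_values(0.05, 0.05)
print(q_new, p_new)
```

Working through the normal quantile avoids a dependency on a Chi-square routine; the closed-form tail for 2 degrees of freedom is what makes the final p value a one-line computation.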
Cite this article
Chen, Z., Wang, K. A gene-based test of association through an orthogonal decomposition of genotype scores. Hum Genet 136, 1385–1394 (2017). https://doi.org/10.1007/s00439-017-1839-y
DOI: https://doi.org/10.1007/s00439-017-1839-y