Robust Association Tests Under Different Genetic Models, Allowing for Binary or Quantitative Traits and Covariates

So, Hon-Cheong; Sham, Pak C.

doi:10.1007/s10519-011-9450-9

Original Research
Open access
Published: 09 February 2011

Volume 41, pages 768–775, (2011)

Behavior Genetics

Hon-Cheong So¹ & Pak C. Sham¹,²,³

Abstract

The association of genetic variants with outcomes is usually assessed under an additive model, for example by the trend test. However, misspecification of the genetic model will lead to a reduction in power. More robust tests for association might therefore be preferred. A useful approach is to consider the maximum of the three test statistics under additive, dominant and recessive models (MAX3). The p-value however has to be adjusted to maintain the type I error rate. Previous studies and software on robust association tests have focused on binary traits without covariates. In this study we developed an analytic approach to robust association tests using MAX3, allowing for quantitative or binary traits as well as covariates. The p-values from our theoretical calculations match very well with those from a bootstrap resampling procedure. The methodology is implemented in the R package RobustSNP which is able to handle both small-scale studies and GWAS. The package and documentation are available at http://sites.google.com/site/honcheongso/software/robustsnp.

Association studies are a very useful tool for revealing susceptibility variants in diseases. With recent advances in technology, genome-wide association studies (GWAS) have become increasingly popular. The association of a genetic variant with a disease or quantitative trait is usually assessed under an additive model of inheritance. In other words, we assume that the disease risk or trait value changes in proportion to the number of copies of the risk allele. For example, the commonly used Cochran–Armitage trend test for binary outcomes assumes an additive model (Sasieni 1997). More generally, the genotype is usually coded as 0, 1 or 2 according to the dose of the risk allele in regression models.

However, in reality it is often impossible to know the true model of inheritance beforehand. Misspecification of the genetic model leads to a reduction in power. For instance, when the true model is recessive or dominant, assuming additivity will result in power loss. More robust tests for association might therefore be preferred over model-dependent methods such as the trend test. An intuitive approach is to consider the maximum of the three test statistics under additive, dominant and recessive models (MAX3) (Freidlin et al. 2002; Gonzalez et al. 2008). Nevertheless, multiple testing needs to be taken into account to prevent inflation of the type I error rate. Since the test statistics under these three models are not independent, a Bonferroni correction is over-conservative. Resampling-based methods, such as permutation and bootstrap, can be used to estimate the distribution of the MAX3 statistic under the null, but they are computationally expensive. In GWAS, very large numbers of markers are genotyped and an enormous number of permutations (or runs of other resampling procedures) is needed to resolve very low p-values.
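The cost of resampling can be made concrete with a little arithmetic. The sketch below assumes the common convention that the observed statistic is counted among the resampled ones, so that the smallest attainable empirical p-value with B resamples is 1/(B + 1). The function names and the threshold of 5 × 10⁻⁸ are illustrative assumptions and are not taken from the study.

```python
def min_empirical_p(n_resamples: int) -> float:
    """Smallest non-zero p-value attainable with n_resamples permutations,
    counting the observed statistic as one of the n_resamples + 1 outcomes."""
    return 1.0 / (n_resamples + 1)


def resamples_needed(reciprocal_of_target_p: int) -> int:
    """Fewest resamples whose smallest attainable p-value is at most
    1 / reciprocal_of_target_p (an integer is used to avoid rounding issues)."""
    return reciprocal_of_target_p - 1


# Illustrative resampling budgets per SNP.
for b in (1_000, 1_000_000):
    print(f"{b:>9,d} resamples -> smallest attainable p = {min_empirical_p(b):.2e}")

# Reaching p <= 1/20,000,000 (i.e. 5e-8, an assumed genome-wide threshold)
# takes about twenty million resamples for every SNP.
print(resamples_needed(20_000_000))
```

Under these assumptions, even one million resamples per SNP cannot produce a p-value below about 10⁻⁶, which is why an analytic null distribution is attractive at genome-wide scale.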

Gonzalez et al. (2008) derived the asymptotic distribution of the likelihood ratio test statistics under H₀ for a 2 × K table (K is the number of independent variables), so that the p-value can be calculated analytically. In a similar vein, Zheng and Ng (2008) proposed the genetic model selection (GMS) test. In the first stage, the best genetic model is chosen based on a Hardy–Weinberg disequilibrium trend test between controls and cases, and the chosen genetic model is tested in the second stage. The authors computed the p-value analytically by considering the proper null distribution of the GMS statistic.

The majority of previous studies on robust association tests considering different models of inheritance have focused on binary outcomes and assumed no covariates, with the exception of Li et al. (2008). In practice, other types of outcomes such as quantitative traits are often studied. Covariates are also commonly included in association studies. For instance in GWAS, researchers often correct for population stratification by including principal components (e.g. from EIGENSTRAT) (Price et al. 2006) that capture the ancestry differences in the sample. In many instances other clinical covariates (e.g. age) are also included in association studies.

Li et al. (2008) considered the Wald test and proposed estimating the covariance matrix between the 3 test statistics by solving estimating equations. The p-values for MAX3 were approximated by the “rhombus formula” that was developed based on Efron (1997). In this study, we propose and implement an alternative analytic approach to robust association tests employing MAX3, allowing for quantitative or binary outcomes as well as covariates. The approach is based on previous work by Lin (2005a), who developed a Monte-Carlo procedure to evaluate significance levels in large-scale genomic studies. We found that the concept can also be applied to robust association tests.

Our approach is based on score tests and can potentially be employed in other scenarios, as long as a score statistic can be formed. Compared to the Wald test as applied in Li et al. (2008), the score test is computationally much faster as it does not require computation of the maximum likelihood estimate (MLE) of the regression coefficients. As we are usually only interested in the coefficients of the few top SNPs in a GWAS, the score test saves the time otherwise spent estimating coefficients for the majority of SNPs that do not show high levels of significance. In addition, the Wald test may not be reliable in logistic regression, especially when the effect size is large (or more generally when the true parameter value is far away from the null) (Hauck and Donner 1977).

Many other related tests have also been proposed. An example is the constrained likelihood ratio test (CLRT) (Wang and Sheffield 2005), which makes the restriction that the heterozygous genotype has a mean effect in between the two homozygous genotypes (i.e. no over-dominance). CLRT can deal with binary or quantitative traits and the authors have pointed out its potential to be generalized to models with covariates. The issue of covariates, however, was not explored in Wang and Sheffield (2005). Programs implementing CLRT are not yet publicly available. Compared to CLRT, MAX3 might be easier to interpret and is conceptually more familiar to researchers since it is simply based on taking the maximum of the test statistics under the three well-known inheritance models. Also based on the assumption of no over-dominance, Yamada and Okada (2009) proposed a very similar test known as the optimal dose–effect mode trend test. Alternatively, one may also take the minimum of the p-values from Pearson’s chi-square test and the trend test. This approach (denoted MIN2) was studied by Joo et al. (2009). Simulation studies on MAX3, CLRT and MIN2 under various genetic models suggest that they have similar power (Joo et al. 2009, 2010). We shall focus on MAX3 in the current study.

Relatively few programs are available for obtaining valid p-values when testing multiple genetic models. SNPassoc (Gonzalez et al. 2007) and Rassoc (Zang et al. 2010) are two R packages that offer such options. SNPassoc includes a function (maxstat) that implements the approach of Gonzalez et al. (2008). Rassoc allows the calculation of MAX3 and GMS for case–control association studies (Zang et al. 2010). However, none of the available programs allows continuous traits and none offers the option of including covariates in association tests. We have implemented our proposed methodology in a new R package called RobustSNP that is able to tackle these problems.

Methods

General theory: covariance of score functions

The theory described below closely follows the Monte-Carlo simulation approach proposed by Lin (2005a) for assessing statistical significance in multiple testing scenarios. As pointed out by Lin, all the commonly employed statistics are related to the score statistic and can be expressed as or approximated by

$$ T_{j} = U^{\prime}_{j} V_{j}^{ - 1} U_{j} $$

where the subscript j refers to the jth hypothesis we want to test and

$$ U_{j} = \sum\limits_{i = 1}^{n} {U_{ji} } $$

where U_ji is the score function calculated from the data of the ith subject only and n is the sample size.

V_j is given by

$$ V_{j} = \sum\limits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ji} $$

When the jth hypothesis is truly null, U_j is approximately normally distributed with mean 0 and covariance matrix V_j in large samples. Hence T_j approximately follows a chi-square distribution with degrees of freedom equal to the dimension of U_j.
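As a minimal sketch of these formulae for a single hypothesis with a scalar score (1 df), the following computes U, V and T from per-subject score contributions and converts T to a p-value. The numerical values and function names are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def score_statistic(per_subject_scores):
    """Return (U, V, T) for a scalar score:
    U = sum_i U_i, V = sum_i U_i^2 and T = U^2 / V."""
    u = sum(per_subject_scores)
    v = sum(s * s for s in per_subject_scores)
    return u, v, u * u / v


def chi2_1df_pvalue(t):
    """Upper-tail probability of a chi-square variable with 1 df,
    using P(chi2_1 > t) = P(|Z| > sqrt(t)) = erfc(sqrt(t / 2))."""
    return math.erfc(math.sqrt(t / 2.0))


# Hypothetical per-subject score contributions U_ji for one hypothesis j.
scores = [0.8, -0.3, 1.1, 0.4, -0.6, 0.9, 0.2, -0.1]
u, v, t = score_statistic(scores)
p = chi2_1df_pvalue(t)
print(f"U = {u:.3f}, V = {v:.3f}, T = {t:.3f}, p = {p:.4f}")
```

For a vector-valued score the same quantities become a vector U_j and a matrix V_j, and the degrees of freedom equal the dimension of U_j.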

Consider testing a total of m hypotheses. If all of them are truly null, then with large samples (U_1, U_2, …, U_m) approximately follows a multivariate normal distribution with mean vector 0, and the covariance between U_j and U_k for any two hypothesis tests j and k is

$$ V_{jk} = \sum\limits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ki} $$

This result forms the basis of our procedure to correct for the testing of multiple genetic models. In brief, we construct the score statistic for each of the three genetic models (dominant, recessive and additive) and use the above formula to calculate the covariance matrix of the three statistics under the null. The appropriate significance level is obtained by trivariate integration.

When covariates are present, U_ji in the above formulae should represent the ith subject’s efficient score function for β_j, the parameter of interest (Bickel et al. 1993; Lin 2005a, b). We have

$$ U_{ji} = U_{{\beta_{j} ,i}} - V_{{\beta_{j} \alpha_{j} }} V_{{\alpha_{j} \alpha_{j} }}^{ - 1} U_{{\alpha_{j} ,i}} $$

where $ U_{\beta_{j},i} $ and $ U_{\alpha_{j},i} $ are the ith subject’s score functions for the parameters β_j and α_j, with α_j being the nuisance parameter(s). $ V_{\beta_{j}\alpha_{j}} $ and $ V_{\alpha_{j}\alpha_{j}} $ are sub-matrices of the limiting Fisher information matrix of β_j and α_j [$ V_{\beta_{j}\alpha_{j}} $ equals $ \text{cov}\left( U_{\beta_{j}}, U_{\alpha_{j}} \right) $ and $ V_{\alpha_{j}\alpha_{j}} $ equals $ \text{var}\left( U_{\alpha_{j}} \right) $].

Application to genetic association studies

An example of application of score tests to genetic association studies may be found in Schaid et al. (2002). Here we shall focus on generalized linear models (GLMs) and adapt some of the work by Schaid (with modifications) in the following derivations.

For simplicity, we shall just consider a single test and the subscript j will be dropped. We are interested in testing the effect of a genetic marker under different genetic models, with or without covariates. For the ith subject, let y_i be the measured outcome, X_gi be the coding of the genotype and X_ei be a vector of environmental covariates (“environmental” here just refers to any covariates to be adjusted for) including 1 as the first element (for the intercept). X_gi is coded differently under different genetic models. Denoting the three genotypes of a marker by aa, Aa and AA, they will be coded as (0, 1, 2), (0, 1, 1) and (0, 0, 1) under additive, dominant and recessive models respectively. A is assumed to be the risk allele.

One can adjust the above coding scheme to deal with imputed genotypes. Most imputation programs produce explicit probabilities of the genotypes aa, Aa and AA. For each individual, the coding under an additive model is Pr(Aa) + 2 Pr(AA) (i.e. the standard dosage output by programs). The coding under a dominant model is Pr(Aa) + Pr(AA) while the coding under a recessive model is Pr(AA).
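The two coding schemes can be written as a single function of the genotype probabilities, since directly genotyped subjects correspond to probabilities of 0 or 1. This is a minimal sketch; the function name and the example probabilities are assumptions made for illustration.

```python
def model_codings(p_aa: float, p_het: float, p_hom: float) -> dict:
    """Codings of X_g under the three genetic models, given the genotype
    probabilities Pr(aa), Pr(Aa) and Pr(AA); A is the assumed risk allele."""
    if abs(p_aa + p_het + p_hom - 1.0) > 1e-6:
        raise ValueError("genotype probabilities must sum to 1")
    return {
        "additive": p_het + 2.0 * p_hom,  # expected dosage of allele A
        "dominant": p_het + p_hom,        # Pr(at least one copy of A)
        "recessive": p_hom,               # Pr(two copies of A)
    }


# Directly genotyped subjects are the special case of probabilities 0 or 1,
# which reproduces the codes (0, 1, 2), (0, 1, 1) and (0, 0, 1).
hard_calls = {"aa": (1.0, 0.0, 0.0), "Aa": (0.0, 1.0, 0.0), "AA": (0.0, 0.0, 1.0)}
for genotype, probs in hard_calls.items():
    print(genotype, model_codings(*probs))

# A hypothetical imputed subject.
print(model_codings(0.1, 0.6, 0.3))
```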

Assume that the outcome y and the predictor variables (X_gi, X_ei) are related through a GLM,

$$ \eta_{i} = {\mathbf{X}}^{\prime}_{ei} {\varvec{\upalpha}} +{\mathbf{X}}^{\prime}_{gi} \beta = {\mathbf{Z}}^{\prime}_{i}{\varvec{\upgamma}} $$

where $ Z^{\prime}_{i} = (X^{\prime}_{ei} ,X^{\prime}_{gi} ) $ and γ = (α, β). Consistent with the previous notation, the parameters α and β reflect the effects of the environmental covariates and the genetic marker on the outcome respectively. η is related to the actual outcome y through the link function f, such that $ E(y_{i} |Z_{i} ) = f^{ - 1} (\eta_{i} ) $. The likelihood of the observed outcome y_i given covariates Z_i for the ith subject is

$$ L_{i} (y_{i} |Z_{i} ) = \exp \left[ {{\frac{{y_{i} \eta_{i} - b(\eta_{i} )}}{a(\phi )}} + c(y_{i} ,\phi )} \right] $$

where a, b and c are known functions and ϕ is the dispersion parameter.

We are interested only in testing the parameter β. The score function for genetic markers, with adjustment for environmental covariates, can be written as

$$ U_{\beta } = \sum\limits_{i = 1}^{n} {{\frac{{\partial \log L_{i} }}{\partial \beta }}} = \sum\limits_{i = 1}^{n} {{\frac{{y_{i} - \tilde{y}_{i} }}{a(\phi )}}} {\mathbf{X}}_{gi} $$

Note that the score test is constructed under the null hypothesis, i.e. β = 0, hence $ \tilde{y}_{i} $ is the fitted value when the trait is regressed on the environmental covariates only. $ \tilde{y}_{i} $ needs to be calculated only once even when a large number of SNPs is tested.
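The following sketch illustrates why the null fit is needed only once. It assumes a linear model whose only covariate is the intercept, so that the null fitted value is the sample mean, and it uses the standard GLM result a(ϕ) = σ² for the normal distribution with σ² estimated from the null residuals. The SNP names, data and function names are hypothetical.

```python
def null_residuals(y):
    """Residuals y_i - y~_i under an intercept-only linear null model,
    for which the fitted value y~_i is the sample mean of y."""
    mean_y = sum(y) / len(y)
    return [y_i - mean_y for y_i in y]


def score_for_snp(residuals, x_g, dispersion):
    """U_beta = sum_i (y_i - y~_i) / a(phi) * X_gi for one SNP."""
    return sum(r * x for r, x in zip(residuals, x_g)) / dispersion


# Hypothetical quantitative trait for six subjects.
y = [1.2, 0.7, 2.1, 1.5, 0.9, 1.8]

# The null model is fitted once ...
resid = null_residuals(y)
sigma2 = sum(r * r for r in resid) / len(y)  # assumed ML estimate of a(phi)

# ... and the same residuals are reused for every SNP (additive codes shown).
snps = {"snp1": [0, 1, 2, 1, 0, 2], "snp2": [1, 0, 0, 2, 1, 0]}
scores = {name: score_for_snp(resid, x_g, sigma2) for name, x_g in snps.items()}
print(scores)
```

With covariates the only change at this step is that the fitted values come from a regression of the trait on the covariates, which is still performed once for all SNPs.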

The contribution from the ith subject is

$$ U_{\beta ,i} = {\frac{{\partial \log L_{i} }}{\partial \beta }} = {\frac{{y_{i} - \tilde{y}_{i} }}{a(\phi )}}{\mathbf{X}}_{gi} $$

Similarly, we have

$$ U_{\alpha ,i} = {\frac{{\partial \log L_{i} }}{\partial \alpha }} = {\frac{{y_{i} - \tilde{y}_{i} }}{a(\phi )}}{\mathbf{X}}_{ei} $$

The variance and covariance of the score functions of α and β are

$$ V_{\alpha \alpha } = \sum\limits_{i = 1}^{n} {{\frac{{b^{\prime\prime}(\eta_{i} )}}{a(\phi )}}} {\mathbf{X}}_{ei} {\mathbf{X}}^{\prime}_{ei} $$

$$ V_{\alpha \beta } = \sum\limits_{i = 1}^{n} {{\frac{{b^{\prime\prime}(\eta_{i} )}}{a(\phi )}}{\mathbf{X}}_{ei} {\mathbf{X}}^{\prime}_{gi} } $$

$$ V_{\beta \beta } = \sum\limits_{i = 1}^{n} {{\frac{{b^{\prime\prime}(\eta_{i} )}}{a(\phi )}}} {\mathbf{X}}_{gi} {\mathbf{X}}^{\prime}_{gi} $$

Using the above results, the ith subject’s contribution to the efficient score function can be calculated by

$$ U_{i} = U_{\beta ,i} - V_{\beta \alpha } V_{\alpha \alpha }^{ - 1} U_{\alpha ,i} $$

as described previously, where $ V_{\beta \alpha } = V^{\prime}_{\alpha \beta } $. The forms of a(ϕ), b″(η_i) and $ \tilde{y}_{i} $ for linear, logistic and Poisson regressions are given by Schaid et al. (2002). They are included in Table 1 for easy reference.

Table 1 Parameters for different distributions in a GLM
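To make the efficient score concrete, the sketch below works through the simplest logistic case, in which the intercept is the only covariate (X_ei = 1). It relies on the standard GLM results a(ϕ) = 1 and b″(η) = ỹ(1 − ỹ) for logistic regression; the data and function name are hypothetical. With further covariates, V_αα becomes a matrix and the null fit requires an iterative regression, which is not shown here.

```python
def efficient_scores_logistic(y, x_g):
    """Per-subject efficient scores U_i for a logistic model whose only
    covariate is the intercept (X_ei = 1 for every subject)."""
    n = len(y)
    y_tilde = sum(y) / n             # null fitted probability
    w = y_tilde * (1.0 - y_tilde)    # b''(eta_i); a(phi) = 1 for logistic
    v_aa = n * w                     # V_alpha,alpha (a scalar here)
    v_ba = w * sum(x_g)              # V_beta,alpha
    scores = []
    for y_i, x_i in zip(y, x_g):
        u_beta_i = (y_i - y_tilde) * x_i   # score contribution for beta
        u_alpha_i = y_i - y_tilde          # score contribution for the intercept
        scores.append(u_beta_i - v_ba / v_aa * u_alpha_i)
    return scores


y = [1, 1, 1, 0, 0, 0, 0, 1]        # hypothetical case (1) / control (0) status
x_add = [2, 1, 2, 0, 1, 0, 1, 1]    # hypothetical additive genotype codes
u = efficient_scores_logistic(y, x_add)

# In this special case the adjustment simply centres the genotype code:
y_bar = sum(y) / len(y)
x_bar = sum(x_add) / len(x_add)
centred = [(y_i - y_bar) * (x_i - x_bar) for y_i, x_i in zip(y, x_add)]
print(u)
print(centred)
```

The two printed lists agree, which shows that adjusting for the intercept amounts to centring the genotype code at its sample mean.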

The efficient score functions are calculated for each subject and for each genetic model. Since each test has 1 df, we use the z-statistic in the form $ U_{j} /\sqrt {V_{j} } $. Denoting the z-statistics from two genetic models by Z_j and Z_k, the covariance between them is given by

$$\begin{aligned} \text{cov} (Z_{j} ,Z_{k} )&= {\frac{{\text{cov}}(U_{j} ,U_{k} )}{{\sqrt {V_{j} } \sqrt {V_{k} } }}} \hfill \\ &={\frac{{\sum\nolimits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ki}}}{{\sqrt {\sum\nolimits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ji} }\sqrt {\sum\nolimits_{i = 1}^{n} {U_{ki} } U^{\prime}_{ki} } }}}\hfill \\ \end{aligned} $$
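A minimal sketch of this calculation for the three genetic models is given below. For brevity it assumes that the intercept is the only covariate, in which case the efficient score is proportional to (y_i − ȳ)(x_i − x̄) and the constant of proportionality cancels in the z-statistics; the data and function names are hypothetical.

```python
import math


def centred_scores(y, x):
    """Efficient scores (up to a constant factor) when the intercept is the
    only covariate, an assumption made to keep the example short."""
    y_bar = sum(y) / len(y)
    x_bar = sum(x) / len(x)
    return [(y_i - y_bar) * (x_i - x_bar) for y_i, x_i in zip(y, x)]


def z_statistics_and_correlation(score_lists):
    """score_lists[j][i] is subject i's efficient score U_ji under model j."""
    m = len(score_lists)
    # cross[j][k] = sum_i U_ji * U_ki, the estimated covariance of U_j and U_k
    cross = [[sum(a * b for a, b in zip(score_lists[j], score_lists[k]))
              for k in range(m)] for j in range(m)]
    z = [sum(score_lists[j]) / math.sqrt(cross[j][j]) for j in range(m)]
    corr = [[cross[j][k] / math.sqrt(cross[j][j] * cross[k][k])
             for k in range(m)] for j in range(m)]
    return z, corr


y = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]            # hypothetical binary outcome
additive = [0, 1, 2, 1, 0, 2, 1, 0, 1, 2]     # hypothetical genotype counts
dominant = [min(g, 1) for g in additive]      # (0, 1, 1) coding
recessive = [g // 2 for g in additive]        # (0, 0, 1) coding

score_lists = [centred_scores(y, x) for x in (additive, dominant, recessive)]
z, corr = z_statistics_and_correlation(score_lists)
print("z:", [round(v, 3) for v in z])
for row in corr:
    print([round(v, 3) for v in row])
```

Because each z-statistic has unit variance, the covariance matrix computed in this way is also the correlation matrix Σ used for the multiplicity correction.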

Hence the covariance matrix of the z-statistics for all genetic models can be determined. Consider the case where the additive, dominant and recessive models are tested. Let c be the largest absolute value among the three observed z-statistics and let Z_null,max be the z-statistic of largest absolute value under the complete null hypothesis. Then

$$ \begin{aligned} p_{\text{corrected}} &= 1 - \Pr(|Z_{\text{null,max}}| \le c) \\ &= 1 - \int\limits_{-c}^{c} \int\limits_{-c}^{c} \int\limits_{-c}^{c} \varphi_{3}({\mathbf{z}}; {\mathbf{0}}, \Sigma)\, d{\mathbf{z}} \end{aligned} $$

where φ₃ is the trivariate normal density with mean vector 0 and covariance matrix Σ. The integral is computed by numerical methods (Genz 1992) implemented in the R package mvtnorm.
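The corrected p-value can also be approximated by simulation, which is convenient for checking the integral. The sketch below does not use the numerical integration of Genz (1992); instead it follows the spirit of the Monte-Carlo procedure of Lin (2005a), multiplying each subject's efficient scores by an independent standard normal variable so that the simulated statistics have the covariance given above. It assumes an intercept-only model, and the data, seed and function names are hypothetical.

```python
import math
import random


def centred_scores(y, x):
    """Efficient scores (up to a constant) for an intercept-only model."""
    y_bar = sum(y) / len(y)
    x_bar = sum(x) / len(x)
    return [(y_i - y_bar) * (x_i - x_bar) for y_i, x_i in zip(y, x)]


def max3_pvalue_monte_carlo(score_lists, n_sim=20000, seed=2011):
    """Approximate Pr(max_j |Z_j| >= c) under the null, where c is the
    observed maximum absolute z-statistic over the genetic models."""
    rng = random.Random(seed)
    n = len(score_lists[0])
    sds = [math.sqrt(sum(u * u for u in scores)) for scores in score_lists]
    observed = max(abs(sum(scores)) / sd for scores, sd in zip(score_lists, sds))
    exceed = 0
    for _ in range(n_sim):
        g = [rng.gauss(0.0, 1.0) for _ in range(n)]   # one multiplier per subject
        simulated = max(abs(sum(u * g_i for u, g_i in zip(scores, g))) / sd
                        for scores, sd in zip(score_lists, sds))
        exceed += simulated >= observed
    return observed, exceed / n_sim


y = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]            # hypothetical binary outcome
additive = [0, 1, 2, 1, 0, 2, 1, 0, 1, 2]     # hypothetical genotype counts
dominant = [min(g, 1) for g in additive]
recessive = [g // 2 for g in additive]
score_lists = [centred_scores(y, x) for x in (additive, dominant, recessive)]

observed_max, p_corrected = max3_pvalue_monte_carlo(score_lists)
p_single = math.erfc(observed_max / math.sqrt(2.0))   # unadjusted two-sided p
print(f"c = {observed_max:.3f}, unadjusted p = {p_single:.4f}, "
      f"MAX3 p = {p_corrected:.4f}, Bonferroni p = {min(1.0, 3 * p_single):.4f}")
```

The simulated MAX3 p-value is expected to fall between the unadjusted p-value of the best model and its Bonferroni-corrected value, reflecting the positive correlation among the three statistics. A simulation of this kind is too slow and too coarse for genome-wide use, which is the motivation for the analytic integral.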

Working with the R package RobustSNP

We developed an R package RobustSNP that implements the previously described methodology. Here we briefly describe how users may perform analyses with this program. The inputs required include a file containing the outcomes (binary or quantitative) and genotypes coded as 0, 1 or 2 according to allelic counts. A file of covariates may also be included but is optional. Alternatively users can directly specify the inputs as matrices or data-frames in R.

To facilitate the analysis of GWAS, we also provide two other functions, Rbin.block and Rlinear.block. These two functions accept binary PED files from PLINK (Purcell et al. 2007) as inputs. Binary PED files are very commonly used in GWAS due to their compact size. The binary PED files are first read by the “read.plink” function in the package snpMatrix (Clayton and Leung 2007). The genotype file is then loaded in blocks (e.g. 5,000 SNPs at a time) for association analysis under different genetic models. This strategy aims to reduce the memory requirement when analyzing large-scale datasets.

The program outputs include (1) the z-statistics and p-values under additive, dominant and recessive models using the score test; (2) the p-value based on the maximum of the three genetic models, adjusted for multiple testing; (3) the error estimate from trivariate integration. The results are arranged in a tabular format with each row representing a SNP.

Results

Example application to a real dataset

To illustrate the utility of the proposed approach, we applied the methodology via RobustSNP to a real dataset from a genome-wide association study of schizophrenia in a Chinese population (So et al. 2010). After quality control procedures, the dataset consisted of 473,931 SNPs from 481 cases and 2,034 controls. SNP associations with the disease were tested by logistic regression. Population stratification was corrected by including the top 10 principal components derived from EIGENSTRAT (Price et al. 2006) as covariates. Table 2 shows an excerpt of the results from chromosome 1 together with the p-values from bootstrap resampling (the bootstrap procedure is detailed below).

Table 2 Example of robust association tests as applied to a schizophrenia dataset with 10 covariates

Running time

A block size of 5,000 was used (i.e. loading 5,000 SNPs at a time). The entire analysis by RobustSNP took 17.9 h (excluding X chromosome SNPs), including the time for loading the dataset. The average time taken for a single SNP analysis was therefore ~0.139 s. For comparison, we also employed PLINK to run logistic regressions on the same dataset for a single genetic model. The time taken was 5 h and 38 min, so the equivalent time for three models would be ~16.9 h for PLINK. The time taken for a standard regression analysis and for a robust analysis that maximizes test statistics over genetic models is therefore not very different. In practice, one can also perform the analysis in parallel, for example by analyzing each chromosome separately.

Comparing our theoretical results with bootstrap

To check the validity of our approach, we compared the p-values obtained from our theoretical calculations with those from a bootstrap procedure. Note that when covariates are present, a permutation approach that shuffles the phenotype values may not be valid. As pointed out in Lin (2005a), the empirical distribution generated by permutations may be invalid, in particular when covariates are correlated with both the genotype and the phenotype. We therefore chose to test the validity of our proposed methodology by a bootstrap procedure. We employed the null-shift and scale-transformed bootstrap procedure as detailed in Dudoit et al. (2004) and procedure 2.3 in Dudoit and van der Laan (2007). Briefly, the cases and controls are sampled with replacement separately and the test statistics are re-calculated on each bootstrapped dataset. The test statistics are then null-centered (the mean over bootstrap samples is subtracted from each test statistic) and scale-transformed as described in the references. The null-centered and scale-transformed test statistic $[{Z_{n}^{b}(j)}]$ takes the following form:

$$ Z_{n}^{b} (j) = \sqrt {\min \left( {1,{\frac{{\tau_{0} (j)}}{{Var\left[ {T_{n}^{b} (j)} \right]}}}} \right)} (T_{n}^{b} (j) - E\left[ {T_{n}^{b} (j)} \right]) + \lambda_{0} (j) $$

$ T_{n}^{b} (j)$ denotes the test statistic of test j from a bootstrap sample of size n, with the expectation and variance taken over the bootstrap samples. Since we are testing three genetic models, j ranges from 1 to 3. λ₀(j) and τ₀(j) are the known null mean and variance of the test statistic corresponding to the jth test (e.g. for a z-statistic under the null, the mean is 0 and the variance is 1). We performed 1,000 bootstraps for each SNP. The number of bootstraps was not further increased since the procedure is time-consuming.
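The transformation can be sketched for a single test j as follows. The bootstrap statistics are hypothetical, the function name is an assumption, and the variance is computed with divisor B (the number of bootstrap samples), which is one possible convention rather than a detail stated in the references above.

```python
import math


def null_shift_scale(boot_stats, null_mean=0.0, null_var=1.0):
    """Null-centre and scale-transform bootstrap statistics T^b(j) for one test:
    Z^b = sqrt(min(1, tau0 / Var[T^b])) * (T^b - E[T^b]) + lambda0."""
    b = len(boot_stats)
    mean_b = sum(boot_stats) / b
    var_b = sum((t - mean_b) ** 2 for t in boot_stats) / b  # divisor B (assumed)
    scale = math.sqrt(min(1.0, null_var / var_b))
    return [scale * (t - mean_b) + null_mean for t in boot_stats]


# Hypothetical bootstrap z-statistics with mean 3 and variance 4:
# they are shifted to mean 0 and shrunk to variance 1.
print(null_shift_scale([2.0, 4.0, 6.0, 0.0, 3.0]))

# Statistics whose variance is already below the null variance are only shifted.
print(null_shift_scale([0.1, -0.1]))
```

Applying the transformation to each of the three models and taking, within every bootstrap sample, the maximum absolute transformed statistic gives an empirical null distribution against which the observed MAX3 statistic can be compared.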

We compared our theoretical p-values with bootstrap p-values on a block of 100 SNPs chosen from chromosome 1 (mimicking a fine-mapping association study). Figure 1 shows a plot of the results. The resampling and theoretical approaches produce very similar results, and the correlation between the two sets of p-values was almost perfect (r = 0.997). We also picked a random set of 300 SNPs (such that the chosen SNPs are uncorrelated) and compared the p-values under the two approaches. SNPs with only two genotypes were excluded. The plot in Fig. 2 again shows excellent correspondence of theoretical p-values with bootstrap p-values (r = 0.995). In addition, we picked a panel of nine random SNPs with very low p-values (p < 10⁻⁴) and investigated the concordance between the theoretical and the bootstrap p-values (using 300,000 bootstraps). Table 3 shows that the p-values agreed reasonably well.

Table 3 Concordance between the theoretical and bootstrap results for a random panel of SNPs with low p-values

Discussion

We have developed and implemented an algorithm for maximizing test statistics over different genetic models. The method is based on theory developed by Lin (2005a, b) concerning the covariance of score statistics. The asymptotic theory presented in Lin (2005a) assumes that the number of hypothesis tests m is fixed and the sample size n tends to infinity. Simulations by Lin (2005a), however, showed that proper control of the family-wise error rate was attained when the sample size exceeded 100 and m ranged from a few hundred to a few thousand. For the current application, we consider only three tests at a time (additive, dominant and recessive, i.e. m = 3), and the sample sizes of genetic association studies or GWAS are usually at least a few hundred and commonly more than a thousand. The number of subjects is likely to continue to rise in view of increasing collaboration between study groups. Therefore, in our case we have n ≫ m, and the asymptotic theory is expected to hold well for the proposed analytic method.

We have not studied the power of different robust association procedures in this paper. In fact there are already numerous studies that investigated the power of various procedures such as MAX3, CLRT, MIN2 and the trend test alone (Freidlin et al. 2002; Gonzalez et al. 2008; Joo et al. 2009, 2010). Overall, the trend test performs the best when the true model is additive, but the gain in power is small compared to other robust tests (MAX3, CLRT, MIN2). Under the dominant model, all tests have comparable power. However, when the underlying model is recessive, the robust tests are more powerful than the trend test which assumes additivity. Freidlin et al. (2002) showed that employing the additive test results in substantial power loss if the true disease model is recessive, especially for alleles with low frequency (say <0.1). For instance, according to Freidlin et al. (2002), for a study with 500 cases and 500 controls and a risk allele frequency of 0.1, the power estimates of the additive, recessive and MAX3 test are 35.7, 79.4 and 71.4% respectively. If the risk allele frequency is 0.3, the power estimates of the three tests are 54, 79.5 and 72% respectively. These results suggest that recessive effects may be missed if additive models are used. The robust test MAX3 protects against model misspecification and substantially improves the power particularly for lower-frequency variants.

The three types of robust procedures MAX3, CLRT and MIN2 have similar power in general. While previous simulations were conducted without consideration of covariates, we expect that the performance of the various tests will be similar even when covariates are included. Note that for MIN2, there are as yet no analytic methods for calculating the correct p-value for models with covariates, so resampling procedures are needed if its performance is to be investigated. Extensive simulations to test the performance of different methods in the presence of covariates may be warranted and will be a topic for further investigation.

We have focused on population-based studies in this paper. Extension to family-based studies might be of interest. The MAX3 test has been extended to TDT (Joo et al. 2010; Zheng et al. 2002), but a methodology to deal with covariates and more complex family structure has yet to be developed. Our proposed approach can potentially be applied to family-based studies if the efficient score statistics can be specified under the three inheritance models.

Two-stage designs are also very common for GWAS, and how to account for an uncertain genetic model in this setting is another interesting topic. In a two-stage design, a set of the most significant SNPs is chosen from the first stage and replication is performed at the second stage. Kwak et al. (2009) proposed a robust procedure performing GMS in this scenario; however, quantitative traits and covariates have not been considered. Further work is required to extend the procedure of Kwak et al. to deal with more diverse models.

Another question is how to combine the results across different studies in meta-analyses. Typically the inputs for meta-analysis are summary test statistics rather than the raw data. For a study that includes covariates, one cannot perform the MAX3 test based on summary statistics alone. However, if robust tests have been performed for each individual study, then one may directly combine the p-values, for example by Fisher’s method.
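Fisher’s method combines k independent p-values through the statistic X = −2 Σ ln p_i, which follows a chi-square distribution with 2k df under the null. A minimal sketch is given below; it uses the closed-form tail probability of a chi-square variable with even degrees of freedom, and the function name and study p-values are hypothetical.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values by Fisher's method.
    X = -2 * sum(ln p_i) is chi-square with 2k df under the null, and for even
    df the tail is P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!."""
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)   # this is X / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical MAX3-adjusted p-values for the same SNP from three studies.
study_pvalues = [0.04, 0.20, 0.01]
print(f"combined p = {fisher_combined_pvalue(study_pvalues):.5f}")
```

The method assumes that the studies are independent, and because p-values carry no sign it does not distinguish between studies whose effects point in opposite directions.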

In conclusion, we have developed an algorithm and an R package RobustSNP for obtaining valid p-values for robust association testing of different genetic models. The algorithm avoids the need for resampling procedures which are computationally expensive. Compared to other studies (or software packages) that focus on robust association tests, the method presented here allows for both quantitative and binary outcomes and is able to deal with covariates. We believe the method and program presented here will be useful to genetic researchers and will help to uncover susceptibility variants that may otherwise be missed by standard analysis assuming additive models only.

References

Bickel PJ, Klassen CAJ, Ritov Y, Wellner JA (1993) Efficient and adaptive estimation in semiparametric models. The Johns Hopkins University Press, Baltimore
Clayton D, Leung HT (2007) An R package for analysis of whole-genome association studies. Hum Hered 64(1):45–51
Dudoit S, van der Laan MJ (2007) Multiple testing procedures and applications to genomics. Springer, New York
Dudoit S, van der Laan MJ, Pollard KS (2004) Multiple testing. Part I. Single-step procedures for control of general type I error rates. Stat Appl Genet Mol Biol 3:Article 13
Efron B (1997) The length heuristic for simultaneous hypothesis tests. Biometrika 84(1):143–157
Freidlin B, Zheng G, Li Z, Gastwirth JL (2002) Trend tests for case–control studies of genetic markers: power, sample size and robustness. Hum Hered 53(3):146–152
Genz A (1992) Numerical computation of multivariate normal probabilities. J Comput Graph Stat 1(2):141–149
Gonzalez JR, Armengol L, Sole X, Guino E, Mercader JM, Estivill X, Moreno V (2007) SNPassoc: an R package to perform whole genome association studies. Bioinformatics 23(5):644–645
Gonzalez JR, Carrasco JL, Dudbridge F, Armengol L, Estivill X, Moreno V (2008) Maximizing association statistics over genetic models. Genet Epidemiol 32(3):246–254
Hauck W Jr, Donner A (1977) Wald’s test as applied to hypotheses in logit analysis. J Am Stat Assoc 72(360):851–853
Joo J, Kwak M, Ahn K, Zheng G (2009) A robust genome-wide scan statistic of the Wellcome Trust Case–Control Consortium. Biometrics 65(4):1115–1122
Joo J, Kwak M, Chen Z, Zheng G (2010) Efficiency robust statistics for genetic linkage and association studies under genetic model uncertainty. Stat Med 29(1):158–180
Kwak M, Joo J, Zheng G (2009) A robust test for two-stage design in genome-wide association studies. Biometrics 65(4):1288–1295
Li Q, Zheng G, Li Z, Yu K (2008) Efficient approximation of P-value of the maximum of correlated tests, with applications to genome-wide association studies. Ann Hum Genet 72(Pt 3):397–406
Lin DY (2005a) An efficient Monte Carlo approach to assessing statistical significance in genomic studies. Bioinformatics 21(6):781–787
Lin DY (2005b) On rapid simulation of P values in association studies. Am J Hum Genet 77(3):513–514 author reply 514–515
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet 38(8):904–909
Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de Bakker PI, Daly MJ, Sham PC (2007) PLINK: a tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81(3):559–575
Sasieni PD (1997) From genotypes to genes: doubling the sample size. Biometrics 53(4):1253–1261
Schaid DJ, Rowland CM, Tines DE, Jacobson RM, Poland GA (2002) Score tests for association between traits and haplotypes when linkage phase is ambiguous. Am J Hum Genet 70(2):425–434
So HC, Li M, Chen RY, Cheung EF, Chen EY, Cherny SS, Li T, Sham PC (2010) Genome-wide association study of schizophrenia in a Chinese population. Int J Neuropsychopharmacol 13(Supplement S1):171
Wang K, Sheffield VC (2005) A constrained-likelihood approach to marker-trait association studies. Am J Hum Genet 77(5):768–780
Yamada R, Okada Y (2009) An optimal dose-effect mode trend test for SNP genotype tables. Genet Epidemiol 33(2):114–127
Zang Y, Fung WK, Zheng G (2010) Simple algorithms to calculate asymptotic null distributions of robust tests in case–control genetic association studies in R. J Stat Softw 33(8):1–24
Zheng G, Freidlin B, Gastwirth JL (2002) Robust TDT-type candidate–gene association tests. Ann Hum Genet 66(Pt 2):145–155
Zheng G, Ng HK (2008) Genetic model selection in two-phase analysis for case-control association studies. Biostatistics 9(3):391–399

Acknowledgments

The work was supported by the Hong Kong Research Grants Council General Research Fund grants HKU 766906M and HKU 774707M and the University of Hong Kong Strategic Research Theme of Genomics. Hon-Cheong So was supported by a Croucher Foundation Scholarship.

Open Access

This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

Author information

Authors and Affiliations

Department of Psychiatry, 10/F Laboratory Block, LKS Faculty of Medicine, University of Hong Kong, Pokfulam, Hong Kong SAR, China
Hon-Cheong So & Pak C. Sham
Genome Research Centre, University of Hong Kong, Hong Kong SAR, China
Pak C. Sham
State Key Laboratory of Brain and Cognitive Sciences, University of Hong Kong, Hong Kong SAR, China
Pak C. Sham

Corresponding author

Correspondence to Pak C. Sham.

Additional information

Edited by Sarah Medland.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

So, HC., Sham, P.C. Robust Association Tests Under Different Genetic Models, Allowing for Binary or Quantitative Traits and Covariates. Behav Genet 41, 768–775 (2011). https://doi.org/10.1007/s10519-011-9450-9

Received: 17 July 2010
Accepted: 18 January 2011
Published: 09 February 2011
Issue Date: September 2011
DOI: https://doi.org/10.1007/s10519-011-9450-9
