Association studies are a powerful tool for revealing susceptibility variants in diseases. With recent advances in technology, genome-wide association studies (GWAS) have become increasingly popular. The association of a genetic variant with a disease or quantitative trait is usually assessed under an additive model of inheritance. In other words, we assume that the disease risk or trait value depends on the number of copies of the risk allele. For example, the commonly used Cochran–Armitage trend test for binary outcomes assumes an additive model (Sasieni 1997). More generally, the genotype is usually coded as 0, 1 or 2 according to the dose of the risk allele in regression models.

However, in reality it is often impossible to know the true model of inheritance beforehand. Misspecification of the genetic model leads to a reduction in power. For instance, when the true model is recessive or dominant, assuming additivity will result in power loss. More robust tests for association might therefore be preferred over model-dependent methods such as the trend test. An intuitive approach is to consider the maximum of the three test statistics under additive, dominant and recessive models (MAX3) (Freidlin et al. 2002; Gonzalez et al. 2008). Nevertheless, multiple testing needs to be taken into account to prevent inflation of the type I error rate. Since the test statistics under these three models are not independent, a Bonferroni correction is over-conservative. Resampling-based methods, such as permutation and bootstrap, can be used to estimate the distribution of the MAX3 statistic under the null, but they are computationally expensive. In GWAS, very large numbers of markers are genotyped and an enormous number of permutations (or runs of other resampling procedures) is needed to resolve the very low p-values required for significance.
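To make the computational burden concrete, the following minimal sketch computes how many permutations are needed before a resampling p-value can even reach a given threshold. It assumes the common estimator (b + 1)/(B + 1), whose smallest attainable value is 1/(B + 1); the function name and the 5 × 10⁻⁸ threshold in the usage note are illustrative assumptions, not values taken from the text.

```python
import math

def min_permutations(p_target: float) -> int:
    """Smallest number of permutations B such that 1 / (B + 1) <= p_target.

    Assumes the permutation p-value is estimated as (b + 1) / (B + 1),
    where b counts permuted statistics at least as extreme as the observed one.
    """
    if not 0.0 < p_target <= 1.0:
        raise ValueError("p_target must lie in (0, 1]")
    return max(math.ceil(1.0 / p_target) - 1, 0)
```

For example, `min_permutations(5e-8)` is about twenty million permutations per marker, which is why an analytic p-value is attractive when hundreds of thousands of markers are tested.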

Gonzalez et al. (2008) derived the asymptotic distribution of the likelihood ratio test statistics under H0 for a 2 × K table (K is the number of independent variables), so that the p-value can be calculated analytically. In a similar vein, Zheng and Ng (2008) proposed the genetic model selection (GMS) test. In the first stage, the best genetic model is chosen based on a Hardy–Weinberg disequilibrium trend test between controls and cases; in the second stage, the chosen genetic model is tested. The authors computed the p-value analytically by considering the proper null distribution of the GMS statistic.

The majority of previous studies on robust association tests considering different models of inheritance have focused on binary outcomes and assumed no covariates, with the exception of Li et al. (2008). In practice, other types of outcomes such as quantitative traits are often studied. Covariates are also commonly included in association studies. For instance in GWAS, researchers often correct for population stratification by including principal components (e.g. from EIGENSTRAT) (Price et al. 2006) that capture the ancestry differences in the sample. In many instances other clinical covariates (e.g. age) are also included in association studies.

Li et al. (2008) considered the Wald test and proposed estimating the covariance matrix between the 3 test statistics by solving estimating equations. The p-values for MAX3 were approximated by the “rhombus formula” that was developed based on Efron (1997). In this study, we propose and implement an alternative analytic approach to robust association tests employing MAX3, allowing for quantitative or binary outcomes as well as covariates. The approach is based on previous work by Lin (2005a), who developed a Monte-Carlo procedure to evaluate significance levels in large-scale genomic studies. We found that the concept can also be applied to robust association tests.

Our approach is based on score tests and can potentially be employed in other scenarios, as long as a score statistic can be formed. Compared to the Wald test as applied in Li et al. (2008), the score test is computationally much faster as it does not require computation of the maximum likelihood estimate (MLE) of regression coefficients. As we are usually only interested in the coefficients of the few top SNPs in a GWAS, the score test saves the time in estimating coefficients for the majority of SNPs that do not show high levels of significance. In addition, the Wald test may not be reliable in logistic regression especially when the effect size is large (or more generally when the true parameter value is far away from the null) (Hauck and Donner 1977).

Many other related tests have also been proposed. An example is the constrained likelihood ratio test (CLRT) (Wang and Sheffield 2005), which imposes the restriction that the heterozygous genotype has a mean effect between those of the two homozygous genotypes (i.e. no over-dominance). CLRT can deal with binary or quantitative traits and the authors have pointed out its potential to be generalized to models with covariates. The issue of covariates, however, was not explored in Wang and Sheffield (2005). Programs implementing CLRT are not yet publicly available. Compared to CLRT, MAX3 might be easier to interpret and is conceptually more familiar to researchers, since it simply takes the maximum of the test statistics under the three well-known inheritance models. Also based on the assumption of no over-dominance, Yamada and Okada (2009) proposed a very similar test known as the optimal dose–effect mode trend test. Alternatively, one may take the minimum of the p-values from Pearson’s chi-square test and the trend test. This approach (denoted MIN2) was studied by Joo et al. (2009). Simulation studies on MAX3, CLRT and MIN2 under various genetic models suggest that they have similar power (Joo et al. 2009, 2010). We shall focus on MAX3 in the current study.

Relatively few programs are available for obtaining valid p-values when testing multiple genetic models. SNPassoc (Gonzalez et al. 2007) and Rassoc (Zang et al. 2010) are two R packages that offer such options. SNPassoc includes a function (maxstat) that implements the approach of Gonzalez et al. (2008). Rassoc allows the calculation of MAX3 and GMS for case–control association studies (Zang et al. 2010). However, none of the available programs allows continuous traits and none offers the option of including covariates in association tests. We have implemented our proposed methodology in a new R package called RobustSNP that is able to tackle these problems.

Methods

General theory: covariance of score functions

The theory described below closely follows the Monte-Carlo simulation approach proposed by Lin (2005a) for assessing statistical significance in multiple testing scenarios. As pointed out by Lin, all the commonly employed statistics are related to the score statistic and can be expressed as or approximated by

$$ T_{j} = U^{\prime}_{j} V_{j}^{ - 1} U_{j} $$

where the subscript j refers to the jth hypothesis we want to test and

$$ U_{j} = \sum\limits_{i = 1}^{n} {U_{ji} } $$

where \( U_{ji} \) is the score function calculated from the data of the ith subject only and n is the sample size.

\( V_{j} \) is given by

$$ V_{j} = \sum\limits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ji} $$

When the jth hypothesis is truly null, \( U_{j} \) is approximately normally distributed with mean 0 and covariance matrix \( V_{j} \) in large samples. Hence \( T_{j} \) approximately follows a chi-square distribution with degrees of freedom equal to the dimension of \( U_{j} \).

Consider testing a total of m hypotheses. If all of them are truly null, then in large samples \( (U_{1}, U_{2}, \ldots, U_{m}) \) approximately follows a multivariate normal distribution with mean vector 0, and the covariance between \( U_{j} \) and \( U_{k} \) for any two hypothesis tests j and k is

$$ V_{jk} = \sum\limits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ki} $$

This result forms the basis of our procedure to correct for the testing of multiple genetic models. In brief, we construct the score statistic for each of the three genetic models (dominant, recessive and additive) and use the above formula to calculate the covariance matrix of the three statistics under the null. The appropriate significance level is obtained by trivariate integration.
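The quantities defined above can be sketched in a few lines of Python. This is a minimal illustration, assuming each test has a scalar (1 df) score so that the per-subject scores fit in an n × m array; the function name `score_summary` and the use of NumPy are choices made here, not part of the original method description.

```python
import numpy as np

def score_summary(U):
    """Summarize per-subject scores for m one-df tests.

    U is an (n, m) array whose column j holds the per-subject scores U_ji.
    Returns the total scores U_j, the matrix V with V[j, k] = sum_i U_ji * U_ki,
    and the score statistics T_j = U_j^2 / V_jj.
    """
    U = np.asarray(U, dtype=float)
    U_total = U.sum(axis=0)          # U_j = sum_i U_ji
    V = U.T @ U                      # diagonal: V_j, off-diagonal: V_jk
    T = U_total ** 2 / np.diag(V)    # scalar form of U_j' V_j^{-1} U_j
    return U_total, V, T
```

The off-diagonal entries of `V` are exactly the cross-products that later allow the three genetic-model statistics to be treated jointly rather than corrected by Bonferroni.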

When covariates are present, \( U_{ji} \) in the above formulae should represent the ith subject’s efficient score function for \( \beta_{j} \), the parameter of interest (Bickel et al. 1993; Lin 2005a, b). We have

$$ U_{ji} = U_{{\beta_{j} ,i}} - V_{{\beta_{j} \alpha_{j} }} V_{{\alpha_{j} \alpha_{j} }}^{ - 1} U_{{\alpha_{j} ,i}} $$

where \( U_{{\beta_{j} ,i}} \) and \( U_{{\alpha_{j} ,i}} \) are the ith subject’s score functions for the parameters \( \beta_{j} \) and \( \alpha_{j} \), \( \alpha_{j} \) being the nuisance parameter(s). \( V_{{\beta_{j} \alpha_{j} }} \) and \( V_{{\alpha_{j} \alpha_{j} }} \) are sub-matrices of the limiting Fisher information matrix of \( \beta_{j} \) and \( \alpha_{j} \) [\( V_{{\beta_{j} \alpha_{j} }} \) equals \( {\text{cov}}\left( {U_{{\beta_{j} }} ,U_{{\alpha_{j} }} } \right) \) and \( V_{{\alpha_{j} \alpha_{j} }} \) equals \( {\text{var}}\left( {U_{{\alpha_{j} }} } \right) \)].

Application to genetic association studies

An example of application of score tests to genetic association studies may be found in Schaid et al. (2002). Here we shall focus on generalized linear models (GLMs) and adapt some of the work by Schaid (with modifications) in the following derivations.

For simplicity, we shall consider a single test and drop the subscript j. We are interested in testing the effect of a genetic marker under different genetic models, with or without covariates. For the ith subject, let \( y_{i} \) be the measured outcome, \( X_{gi} \) be the coding of the genotype and \( X_{ei} \) be a vector of environmental covariates (“environmental” here just refers to any covariates to be adjusted for) including 1 as the first element (for the intercept). \( X_{gi} \) is coded differently under different genetic models. Denoting the three genotypes of a marker by aa, Aa and AA, they are coded as (0, 1, 2), (0, 1, 1) and (0, 0, 1) under the additive, dominant and recessive models respectively. A is assumed to be the risk allele.
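The three coding schemes can be written compactly as follows. This is an illustrative sketch that assumes genotypes are supplied as counts of the risk allele A (0 for aa, 1 for Aa, 2 for AA); the function name `code_genotypes` is hypothetical.

```python
import numpy as np

def code_genotypes(allele_counts):
    """Recode risk-allele counts (0, 1, 2) under the three genetic models."""
    g = np.asarray(allele_counts)
    return {
        "additive": g.astype(float),           # aa, Aa, AA -> 0, 1, 2
        "dominant": (g >= 1).astype(float),    # aa, Aa, AA -> 0, 1, 1
        "recessive": (g == 2).astype(float),   # aa, Aa, AA -> 0, 0, 1
    }
```

Note that the additive coding is always the sum of the dominant and recessive codings, so the three resulting test statistics are strongly dependent.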

One can adjust the above coding scheme to deal with imputed genotypes. Most imputation programs produce explicit probabilities of the genotypes aa, Aa and AA. For each individual, the coding under an additive model is Pr(Aa) + 2 Pr(AA) (i.e. the standard dosage output by programs). The coding under a dominant model is Pr(Aa) + Pr(AA) while the coding under a recessive model is Pr(AA).
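The same idea for imputed data can be sketched as below, assuming the genotype probabilities are available as an n × 3 array with columns Pr(aa), Pr(Aa) and Pr(AA); the array layout and the name `code_imputed` are assumptions made for illustration.

```python
import numpy as np

def code_imputed(probs):
    """Expected genotype codings from imputed genotype probabilities.

    probs is an (n, 3) array with columns Pr(aa), Pr(Aa), Pr(AA).
    """
    p = np.asarray(probs, dtype=float)
    return {
        "additive": p[:, 1] + 2.0 * p[:, 2],   # dosage
        "dominant": p[:, 1] + p[:, 2],         # Pr(at least one copy of A)
        "recessive": p[:, 2],                  # Pr(two copies of A)
    }
```

With hard genotype calls (probabilities of exactly 0 or 1), these expressions reduce to the (0, 1, 2), (0, 1, 1) and (0, 0, 1) codings.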

Assume that the outcome y and the predictor variables \( (X_{gi}, X_{ei}) \) are related through a GLM,

$$ \eta_{i} = {\mathbf{X}}^{\prime}_{ei} {\varvec{\upalpha}} +{\mathbf{X}}^{\prime}_{gi} \beta = {\mathbf{Z}}^{\prime}_{i}{\varvec{\upgamma}} $$

where \( Z^{\prime}_{i} = (X^{\prime}_{ei} ,X^{\prime}_{gi} ) \) and γ = (α, β). Consistent with the previous notation, the parameters α and β reflect the effects of the environmental covariates and the genetic marker on the outcome respectively. η is related to the actual outcome y through the link function f, such that \( E(y_{i} |Z_{i} ) = f^{ - 1} (\eta_{i} ) \). The likelihood of the observed outcome \( y_{i} \) given covariates \( Z_{i} \) for the ith subject is

$$ L_{i} (y_{i} |Z_{i} ) = \exp \left[ {{\frac{{y_{i} \eta_{i} - b(\eta_{i} )}}{a(\phi )}} + c(y_{i} ,\phi )} \right] $$

where a, b and c are known functions and ϕ is the dispersion parameter.

We are interested only in testing the parameter β. The score function for genetic markers, with adjustment for environmental covariates, can be written as

$$ U_{\beta } = \sum\limits_{i = 1}^{n} {{\frac{{\partial \log L_{i} }}{\partial \beta }}} = \sum\limits_{i = 1}^{n} {{\frac{{y_{i} - \tilde{y}_{i} }}{a(\phi )}}} {\mathbf{X}}_{gi} $$

Note that the score test is constructed under the null hypothesis, i.e. β = 0, hence \( \tilde{y}_{i} \) is the fitted value when the trait is regressed on the environmental covariates only. \( \tilde{y}_{i} \) needs to be calculated only once even when a large number of SNPs is tested.
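The second equality in the score function can be unpacked as follows. This is a brief sketch using only the symbols already defined, together with the standard exponential-family identity \( E(y_{i} |Z_{i} ) = b^{\prime}(\eta_{i} ) \), which is not stated explicitly in the text:

$$ {\frac{{\partial \log L_{i} }}{\partial \beta }} = {\frac{{y_{i} - b^{\prime}(\eta_{i} )}}{a(\phi )}}{\frac{{\partial \eta_{i} }}{\partial \beta }} = {\frac{{y_{i} - b^{\prime}(\eta_{i} )}}{a(\phi )}}{\mathbf{X}}_{gi} $$

Evaluating \( b^{\prime}(\eta_{i} ) \) at β = 0, with α set to its estimate from the covariate-only fit, gives the fitted value \( \tilde{y}_{i} \).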

The contribution from the ith subject is

$$ U_{\beta ,i} = {\frac{{\partial \log L_{i} }}{\partial \beta }} = {\frac{{y_{i} - \tilde{y}_{i} }}{a(\phi )}}{\mathbf{X}}_{gi} $$

Similarly, we have

$$ U_{\alpha ,i} = {\frac{{\partial \log L_{i} }}{\partial \alpha }} = {\frac{{y_{i} - \tilde{y}_{i} }}{a(\phi )}}{\mathbf{X}}_{ei} $$

The variance and covariance of the score functions of α and β are

$$ V_{\alpha \alpha } = \sum\limits_{i = 1}^{n} {{\frac{{b^{\prime\prime}(\eta_{i} )}}{a(\phi )}}} {\mathbf{X}}_{ei} {\mathbf{X}}^{\prime}_{ei} $$
$$ V_{\alpha \beta } = \sum\limits_{i = 1}^{n} {{\frac{{b^{\prime\prime}(\eta_{i} )}}{a(\phi )}}{\mathbf{X}}_{ei} {\mathbf{X}}^{\prime}_{gi} } $$
$$ V_{\beta \beta } = \sum\limits_{i = 1}^{n} {{\frac{{b^{\prime\prime}(\eta_{i} )}}{a(\phi )}}} {\mathbf{X}}_{gi} {\mathbf{X}}^{\prime}_{gi} $$

Using the above results and writing \( V_{\beta \alpha } = V^{\prime}_{\alpha \beta } \), the ith subject’s contribution to the efficient score function can be calculated by

$$ U_{i} = U_{\beta ,i} - V_{\beta \alpha } V_{\alpha \alpha }^{ - 1} U_{\alpha ,i} $$

as described previously. The forms of \( a(\phi ) \), \( b^{\prime\prime}(\eta_{i} ) \) and \( \tilde{y}_{i} \) for linear, logistic and Poisson regressions are given by Schaid et al. (2002). They are included in Table 1 for easy reference.
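The calculation of the efficient scores can be sketched as follows for the linear and logistic cases. This is an illustrative outline rather than the RobustSNP implementation: the function names are hypothetical, the null model is fitted by ordinary least squares or a plain Newton-Raphson loop, and the values used for a(ϕ) and b″(η) are the standard GLM ones (residual variance and 1 for linear regression; 1 and ỹ(1 − ỹ) for logistic regression), which are assumed here to match Table 1.

```python
import numpy as np

def fit_null(y, Xe, family, n_iter=25):
    """Fit the covariate-only model; return fitted values, b''(eta_i) and a(phi)."""
    y = np.asarray(y, dtype=float)
    Xe = np.asarray(Xe, dtype=float)
    if family == "linear":
        alpha = np.linalg.lstsq(Xe, y, rcond=None)[0]
        fitted = Xe @ alpha
        a_phi = np.sum((y - fitted) ** 2) / (len(y) - Xe.shape[1])
        return fitted, np.ones(len(y)), a_phi
    if family == "logistic":
        alpha = np.zeros(Xe.shape[1])
        for _ in range(n_iter):  # Newton-Raphson updates for the null model
            mu = 1.0 / (1.0 + np.exp(-(Xe @ alpha)))
            w = mu * (1.0 - mu)
            alpha = alpha + np.linalg.solve(Xe.T @ (w[:, None] * Xe), Xe.T @ (y - mu))
        mu = 1.0 / (1.0 + np.exp(-(Xe @ alpha)))
        return mu, mu * (1.0 - mu), 1.0
    raise ValueError("family must be 'linear' or 'logistic'")

def efficient_scores(y, Xe, xg, fitted, b2, a_phi):
    """Per-subject efficient scores U_i = U_beta,i - V_beta_alpha V_alpha_alpha^{-1} U_alpha,i."""
    y, Xe, xg = (np.asarray(a, dtype=float) for a in (y, Xe, xg))
    resid = (y - fitted) / a_phi
    U_beta = resid * xg                      # shape (n,)
    U_alpha = resid[:, None] * Xe            # shape (n, q)
    w = b2 / a_phi
    V_aa = Xe.T @ (w[:, None] * Xe)          # V_alpha_alpha, shape (q, q)
    V_ab = Xe.T @ (w * xg)                   # V_alpha_beta, shape (q,)
    return U_beta - U_alpha @ np.linalg.solve(V_aa, V_ab)
```

Because `fit_null` does not depend on the genotype, it needs to be called only once per trait; only `efficient_scores` is repeated for each SNP and each genetic coding, and no per-SNP maximum likelihood fit is required.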

Table 1 Parameters for different distributions in a GLM

The efficient score functions are calculated for each subject and for each genetic model. Since each test has 1 df, we use the z-statistic in the form \( U_{j} /\sqrt {V_{j} } \). Denoting the z-statistics from two genetic models by \( Z_{j} \) and \( Z_{k} \), the covariance between them is given by

$$\begin{aligned} \text{cov} (Z_{j} ,Z_{k} )&= {\frac{{\text{cov}}(U_{j} ,U_{k} )}{{\sqrt {V_{j} } \sqrt {V_{k} } }}} \hfill \\ &={\frac{{\sum\nolimits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ki}}}{{\sqrt {\sum\nolimits_{i = 1}^{n} {U_{ji} } U^{\prime}_{ji} }\sqrt {\sum\nolimits_{i = 1}^{n} {U_{ki} } U^{\prime}_{ki} } }}}\hfill \\ \end{aligned} $$

Hence the covariance matrix of the z-statistics for all genetic models can be determined. Consider the case where the additive, dominant and recessive models are tested. Let c be the observed maximum absolute z-statistic and \( Z_{\text{null,max}} \) be the z-statistic with the maximum absolute value under the complete null hypothesis. Then

$$ \begin{aligned} p_{\text{corrected}} &= 1 - \Pr(|Z_{\text{null,max}}| \le c) \\ &= 1 - \int\limits_{-c}^{c} \int\limits_{-c}^{c} \int\limits_{-c}^{c} \varphi_{3}({\mathbf{z}};{\mathbf{0}},\Sigma)\,d{\mathbf{z}} \end{aligned} $$

where \( \varphi_{3} \) is the trivariate normal density with mean vector 0 and covariance matrix Σ. The integral is computed by numerical methods (Genz 1992) implemented in the R package mvtnorm.
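The corrected p-value can be approximated as in the sketch below. Note the substitution: instead of the Genz (1992) numerical integration used in the text, this illustration estimates the rectangle probability by plain Monte Carlo sampling from the trivariate normal distribution, which is simpler but far less accurate for very small p-values. The function name `max3_pvalue`, the number of draws and the seed are assumptions.

```python
import numpy as np

def max3_pvalue(U, n_draws=200_000, seed=0):
    """Monte Carlo estimate of the MAX3 p-value from per-subject efficient scores.

    U is an (n, 3) array with one column per genetic model
    (additive, dominant, recessive). Returns (z, Sigma, p_corrected).
    """
    U = np.asarray(U, dtype=float)
    V = U.T @ U                              # V[j, k] = sum_i U_ji * U_ki
    sd = np.sqrt(np.diag(V))
    z = U.sum(axis=0) / sd                   # z_j = U_j / sqrt(V_j)
    Sigma = V / np.outer(sd, sd)             # covariance (correlation) of the z-statistics
    c = np.max(np.abs(z))                    # observed maximum absolute z
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(np.zeros(3), Sigma, size=n_draws)
    p = float(np.mean(np.max(np.abs(draws), axis=1) > c))
    return z, Sigma, p
```

Because the additive coding is the sum of the dominant and recessive codings, Σ is typically singular (rank 2); the sampler above tolerates this, but any integration routine used in practice should be checked for how it handles singular covariance matrices.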

Working with the R package RobustSNP

We developed an R package RobustSNP that implements the previously described methodology. Here we briefly describe how users may perform analyses with this program. The inputs required include a file containing the outcomes (binary or quantitative) and genotypes coded as 0, 1 or 2 according to allelic counts. A file of covariates may also be included but is optional. Alternatively users can directly specify the inputs as matrices or data-frames in R.

To facilitate the analysis of GWAS, we also provide two other functions, Rbin.block and Rlinear.block. These two functions accept binary PED files from PLINK (Purcell et al. 2007) as inputs. Binary PED files are very commonly used in GWAS due to their compact size. The binary PED files are first read by the “read.plink” function in the package snpMatrix (Clayton and Leung 2007). The genotype file is then loaded in blocks (e.g. 5,000 SNPs at a time) for association analysis under different genetic models. This strategy aims to reduce the memory requirement when analyzing large-scale datasets.
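The block-wise strategy can be illustrated with a small generic helper. This sketch only produces the SNP index ranges for each block; it does not read PLINK files and is not how Rbin.block or Rlinear.block are implemented. The name `iter_blocks` is hypothetical.

```python
def iter_blocks(n_snps, block_size=5000):
    """Yield (start, stop) index pairs covering n_snps in blocks of block_size."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    for start in range(0, n_snps, block_size):
        yield start, min(start + block_size, n_snps)
```

Each (start, stop) pair would be used to load and analyze only that slice of the genotype matrix, so that peak memory scales with the block size rather than with the total number of SNPs.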

The program outputs include (1) the z-statistics and p-values under additive, dominant and recessive models using the score test; (2) the p-value based on the maximum of the three genetic models, adjusted for multiple testing; (3) the error estimate from trivariate integration. The results are arranged in a tabular format with each row representing a SNP.

Results

Example application to a real dataset

To illustrate the utility of the proposed approach, we applied the methodology via RobustSNP to a real dataset from a genome-wide association study of schizophrenia in a Chinese population (So et al. 2010). After quality control procedures, the dataset consisted of 473,931 SNPs from 481 cases and 2,034 controls. SNP associations with the disease were tested by logistic regression. Population stratification was corrected for by including the top 10 principal components derived from EIGENSTRAT (Price et al. 2006) as covariates. Table 2 shows an excerpt of the results from chromosome 1 together with the p-values from bootstrap resampling (the bootstrap procedure is detailed below).

Table 2 Example of robust association tests as applied to a schizophrenia dataset with 10 covariates

Running time

A block size of 5,000 was used (i.e. loading 5,000 SNPs at a time). The entire analysis by RobustSNP took 17.9 h (excluding X chromosome SNPs), including the time for loading the dataset. The average time taken to analyze a single SNP was therefore ~0.139 s. For comparison, we also employed PLINK to run logistic regressions on the same dataset for a single genetic model. The time taken was 5 h and 38 min, so the equivalent time for three models would be ~16.9 h for PLINK. The running times of a standard regression analysis and of a robust analysis that maximizes test statistics over genetic models are therefore not very different. In practice, one can also perform the analysis in parallel, for example by analyzing each chromosome separately.

Comparing our theoretical results with bootstrap

To check the validity of our approach, we compared the p-values obtained from our theoretical calculations with those from a bootstrap procedure. Note that when covariates are present, a permutation approach that shuffles the phenotype values may not be valid. As pointed out in Lin (2005a), the empirical distribution generated by permutations may be invalid, in particular when covariates are correlated with both the genotype and the phenotype. We therefore chose to test the validity of our proposed methodology by a bootstrap procedure. We employed the null-shift and scale-transformed bootstrap procedure as detailed in Dudoit et al. (2004) and procedure 2.3 in Dudoit and van der Laan (2007). Briefly, the cases and controls are sampled with replacement separately and the test statistics are re-calculated on each bootstrapped dataset. The test statistics are then null-centered (the mean over bootstrap samples is subtracted from each test statistic) and scale-transformed as described in the references. The null-centered and scale-transformed test statistic \([{Z_{n}^{b}(j)}]\) has the following form:

$$ Z_{n}^{b} (j) = \sqrt {\min \left( {1,{\frac{{\tau_{0} (j)}}{{Var\left[ {T_{n}^{b} (j)} \right]}}}} \right)} (T_{n}^{b} (j) - E\left[ {T_{n}^{b} (j)} \right]) + \lambda_{0} (j) $$

\( T_{n}^{b} (j)\) denotes the test statistic of test j from a bootstrap sample of size n. Since we are testing three genetic models, j ranges from 1 to 3. \( \lambda_{0}(j) \) and \( \tau_{0}(j) \) are the known null mean and variance of the test statistic corresponding to the jth test (e.g. for a z-statistic under the null, the mean is 0 and the variance is 1). We performed 1,000 bootstraps for each SNP. The number of bootstraps was not increased further since the procedure is time-consuming.
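The null-shift and scale transformation above can be sketched as follows, given a matrix of bootstrap test statistics. This is an illustration only: the function name `null_transform` is hypothetical, and estimating the bootstrap variance with the n − 1 denominator is an assumption, as the text does not specify it.

```python
import numpy as np

def null_transform(T_boot, null_mean=0.0, null_var=1.0):
    """Null-center and scale-transform bootstrap test statistics.

    T_boot is a (B, m) array: B bootstrap replicates of m test statistics.
    null_mean and null_var play the roles of lambda_0(j) and tau_0(j).
    """
    T = np.asarray(T_boot, dtype=float)
    boot_mean = T.mean(axis=0)                            # E[T_n^b(j)]
    boot_var = T.var(axis=0, ddof=1)                      # Var[T_n^b(j)]
    scale = np.sqrt(np.minimum(1.0, null_var / boot_var))
    return scale * (T - boot_mean) + null_mean
```

The `min(1, ·)` term means that statistics are only ever shrunk towards the null variance, never inflated, when the bootstrap variance exceeds the known null variance.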

We compared our theoretical p-values with bootstrap p-values on a block of 100 SNPs chosen from chromosome 1 (mimicking a fine-mapping association study). Figure 1 shows a plot of the results. The resampling and theoretical approaches produce very similar results, and the correlation between the two sets of p-values was almost perfect (r = 0.997). We also picked a random set of 300 SNPs (such that the chosen SNPs are uncorrelated) and compared the p-values under the two approaches. SNPs with only two genotypes were excluded. The plot in Fig. 2 again shows excellent correspondence of theoretical p-values with bootstrap p-values (r = 0.995). In addition, we picked a panel of nine random SNPs with very low p-values (\( p < 10^{-4} \)) and investigated the concordance between the theoretical and the bootstrap p-values (using 300,000 bootstraps). Table 3 shows that the p-values agreed reasonably well.

Fig. 1

A block of 100 SNPs from a real dataset was extracted. Analytic p-values from robust association tests were plotted against the p-values obtained from a bootstrap resampling procedure. One thousand bootstraps were run for each SNP. The correlation (r) is 0.997

Fig. 2

A random set of 300 SNPs from a real dataset were extracted. Analytic p-values from robust association tests were plotted against the p-values obtained from a bootstrap resampling procedure. One thousand bootstraps were run for each SNP. The correlation (r) is 0.995

Table 3 Concordance between the theoretical and bootstrap results for a random panel of SNPs with low p-values

Discussion

We have developed and implemented an algorithm for maximizing test statistics over different genetic models. The method is based on theory developed by Lin (2005a, b) concerning the covariance of score statistics. The asymptotic theory presented in Lin (2005a) assumes that the number of hypothesis tests m is fixed and the sample size n tends to infinity. Simulations in Lin (2005a), however, showed that proper control of the family-wise error rate was attained when the sample size exceeded 100 and m ranged from a few hundred to a few thousand. In the current application, we consider only three tests at a time (additive, dominant and recessive, i.e. m = 3), and the sample sizes of genetic association studies or GWAS are usually at least a few hundred and commonly more than a thousand. The number of subjects is likely to continue to rise in view of increasing collaboration between study groups. Therefore, in our case n ≫ m and the asymptotic approximation underlying the proposed analytic method should be adequate.

We have not studied the power of different robust association procedures in this paper. In fact there are already numerous studies that investigated the power of various procedures such as MAX3, CLRT, MIN2 and the trend test alone (Freidlin et al. 2002; Gonzalez et al. 2008; Joo et al. 2009, 2010). Overall, the trend test performs the best when the true model is additive, but the gain in power is small compared to other robust tests (MAX3, CLRT, MIN2). Under the dominant model, all tests have comparable power. However, when the underlying model is recessive, the robust tests are more powerful than the trend test which assumes additivity. Freidlin et al. (2002) showed that employing the additive test results in substantial power loss if the true disease model is recessive, especially for alleles with low frequency (say <0.1). For instance, according to Freidlin et al. (2002), for a study with 500 cases and 500 controls and a risk allele frequency of 0.1, the power estimates of the additive, recessive and MAX3 test are 35.7, 79.4 and 71.4% respectively. If the risk allele frequency is 0.3, the power estimates of the three tests are 54, 79.5 and 72% respectively. These results suggest that recessive effects may be missed if additive models are used. The robust test MAX3 protects against model misspecification and substantially improves the power particularly for lower-frequency variants.

The three types of robust procedures, MAX3, CLRT and MIN2, have similar power in general. While previous simulations were conducted without consideration of covariates, we expect that the performance of the various tests will be similar even when covariates are included. Note that for MIN2 there are as yet no analytic methods for calculating the correct p-value for models with covariates; resampling procedures are therefore needed if its performance is to be investigated. Extensive simulations to test the performance of different methods in the presence of covariates may be warranted and will be a topic for further investigation.

We have focused on population-based studies in this paper. Extension to family-based studies might be of interest. The MAX3 test has been extended to TDT (Joo et al. 2010; Zheng et al. 2002), but a methodology to deal with covariates and more complex family structure has yet to be developed. Our proposed approach can potentially be applied to family-based studies if the efficient score statistics can be specified under the three inheritance models.

Two-stage designs are also very common for GWAS, and how to account for an uncertain genetic model in this setting is another interesting topic. In a two-stage design, a set of the most significant SNPs is chosen from the first stage and replication is performed in the second stage. Kwak et al. (2009) proposed a robust procedure performing GMS in this scenario; however, quantitative traits and covariates have not been considered. Further work is required to extend the procedure of Kwak et al. to deal with more diverse models.

Another question is how to combine the results across different studies in meta-analyses. Typically the inputs for meta-analysis are summary test statistics rather than the raw data. For a study that includes covariates, one cannot perform the MAX3 test based on summary statistics alone. However, if robust tests have been performed for each individual study, then one may directly combine the p-values, for example by Fisher’s method.
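Fisher's method for combining p-values can be sketched as follows. It assumes the k studies are independent, so that −2 Σ ln p follows a chi-square distribution with 2k degrees of freedom under the null; the closed-form tail probability for even degrees of freedom is used to avoid external dependencies. The function name `fisher_combine` is hypothetical.

```python
import math

def fisher_combine(pvalues):
    """Combine independent p-values by Fisher's method.

    Returns (X, p) where X = -2 * sum(log p_i) and p is the upper tail
    probability of a chi-square distribution with 2k degrees of freedom.
    """
    k = len(pvalues)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in pvalues):
        raise ValueError("need at least one p-value, each in (0, 1]")
    x = -2.0 * sum(math.log(p) for p in pvalues)
    half = x / 2.0
    # For even df = 2k: P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return x, math.exp(-half) * total
```

For instance, two studies that each report a corrected p-value of 0.05 combine to a p-value of about 0.0175.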

In conclusion, we have developed an algorithm and an R package RobustSNP for obtaining valid p-values for robust association testing of different genetic models. The algorithm avoids the need for resampling procedures which are computationally expensive. Compared to other studies (or software packages) that focus on robust association tests, the method presented here allows for both quantitative and binary outcomes and is able to deal with covariates. We believe the method and program presented here will be useful to genetic researchers and will help to uncover susceptibility variants that may otherwise be missed by standard analysis assuming additive models only.