European Journal of Epidemiology, Volume 28, Issue 10, pp 785–797

Comparisons of power of statistical methods for gene–environment interaction analyses



Any genome-wide analysis is hampered by reduced statistical power due to multiple comparisons. This is particularly true for interaction analyses, which have lower statistical power than analyses of associations. To assess gene–environment interactions in population settings, we have recently proposed a statistical method based on a modified two-step approach, in which genetic loci are first selected by their associations with disease and environment, respectively, and subsequently tested for interactions. We simulated various data sets resembling real-world scenarios and compared single-step and two-step approaches with respect to true positive rate (TPR) in 486 scenarios and (study-wide) false positive rate (FPR) in 252 scenarios. Our simulations confirmed that, in all two-step methods, the two steps are essentially uncorrelated. In terms of TPR, two-step approaches combining information on gene–disease association and gene–environment association in the first step were superior to all other methods, while preserving a low FPR in over 250 million simulations under the null hypothesis. Our weighted modification yielded the highest power across various degrees of gene–environment association in the controls. The optimal threshold for step 1 depended on the interacting allele frequency and the disease prevalence. In all scenarios, the least powerful method was to proceed directly to an unbiased full interaction model, applying conventional genome-wide significance thresholds. This simulation study confirms the practical advantage of two-step approaches to interaction testing over more conventional one-step designs, at least in the context of dichotomous disease outcomes and other parameters that might apply in real-world settings.


Keywords: Gene–environment interaction; Statistical modeling; False positive rate; True positive rate; Genome-wide testing


With the completion of genome-wide genotyping in many epidemiological studies, opportunities exist for testing gene–environment (G*E) interactions on an unprecedented scale, and on an “agnostic” (hypothesis-free) basis. However, testing for G*E interactions on a genome-wide basis involves multiple comparisons with the associated problems of limiting the overall (genome-wide) false discovery rate [1].

Several methods have been proposed to improve the power to detect G*E interactions when many thousands of comparisons are made. First, a restriction to cases only was introduced [2]. This approach was later combined with a full interaction analysis in cases and controls via a Bayesian shrinkage factor [3, 4]. Global tests for genetic effects or interactions have also been suggested [5]. In 2009, Murcray et al. introduced the concept of a two-step method [6]. In the first step, the gene–environment association is tested, and only genetic variants associated with the exposure at a prespecified alpha level are tested in the second step, in a conventional model containing the genetic and environmental main effects and an interaction term. In 2011, we proposed an additional module (step 1B), in which we perform a classical genome-wide association study (GWAS) separately in the two exposure strata and integrate the signals with the gene–environment associations within the disease strata (step 1A) [7]. Independently, Murcray et al. [8] developed a similar, but distinct, method of integrating information on the gene–disease association.

The power of one-step and two-step methods has recently been compared in the context of a case–control study design [9], but the relative power of these methods has not been formally compared for detecting different types of G*E interaction, or across different types of epidemiological study design. In this paper, we applied each of the published methods [2, 3, 4, 5, 6, 7, 8], and our modifications of them [7], to simulated data representing both case–control and population-wide (cross-sectional or cohort) designs, under conditions representing the null hypothesis and alternative hypotheses of varying degrees of G*E interaction. We compared the false positive rate (FPR), as determined by null simulations, and the true positive rate (TPR) or statistical power, as quantified by the alternative simulations. Furthermore, we explored an optimal threshold for the step 1 analysis, which comprises both modules.

Materials and methods

Types of interaction

The power to detect an interaction is primarily determined by its effect size and the precision with which it is estimated. A common measure of effect size is the interaction odds ratio (ORG*E), which can be expressed equivalently as the ratio of the stratum-specific ORs across the strata of exposure, of disease, or of genotype, as shown in formula (1):
$$ {\text{OR}}_{{{\text{G}}*{\text{E}}}} = {\text{OR}}_{{{\text{GD}}|{\text{E}} + }} /{\text{OR}}_{{{\text{GD}}|{\text{E}} - }} = {\text{OR}}_{{{\text{GE}}|{\text{D}} + }} /{\text{OR}}_{{{\text{GE}}|{\text{D}} - }} = {\text{OR}}_{{{\text{DE}}|{\text{G}} + }} /{\text{OR}}_{{{\text{DE}}|{\text{G}} - }} , $$
where OR GD|E+ denotes the OR for the association of genotype (G) and disease (D) in the stratum exposed to the environmental exposure (E+) and OR GD|E− the OR for the respective association in the unexposed stratum (E−); OR GE|D+ and OR GE|D− refer to the associations of genotype and environmental exposure in the respective strata of cases (D+) and controls (D−); OR DE|G+ and OR DE|G– refer to the associations of disease and environmental exposure in the strata of individuals with (G+) or without (G−) the respective genotype.
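Formula (1) can be checked numerically. The sketch below assumes a dichotomous genotype and a hypothetical 2 × 2 × 2 table of counts indexed as n[d][g][e] (1 = present, 0 = absent); the counts and function names are illustrative, not taken from the study. Stratifying on E, on D, or on G gives the same interaction OR, because each ratio reduces to the same quotient of cell-count products.

```python
def odds_ratio(a: float, b: float, c: float, d: float) -> float:
    """Odds ratio a*d / (b*c) of the 2x2 table [[a, b], [c, d]]."""
    return (a * d) / (b * c)


def interaction_ors(n):
    """Return OR_G*E computed three ways from counts n[d][g][e].

    In every 2x2 table, 'present' (1) rows/columns come before 'absent' (0).
    """
    def or_gd(e):  # genotype-disease OR within exposure stratum e
        return odds_ratio(n[1][1][e], n[1][0][e], n[0][1][e], n[0][0][e])

    def or_ge(d):  # genotype-exposure OR within disease stratum d
        return odds_ratio(n[d][1][1], n[d][1][0], n[d][0][1], n[d][0][0])

    def or_de(g):  # disease-exposure OR within genotype stratum g
        return odds_ratio(n[1][g][1], n[1][g][0], n[0][g][1], n[0][g][0])

    return or_gd(1) / or_gd(0), or_ge(1) / or_ge(0), or_de(1) / or_de(0)


# Hypothetical counts, indexed counts[d][g][e]
counts = [
    [[400, 300], [90, 60]],  # controls: G=0 (E=0, E=1), G=1 (E=0, E=1)
    [[50, 30], [40, 30]],    # cases:    G=0 (E=0, E=1), G=1 (E=0, E=1)
]
print(interaction_ors(counts))  # all three ratios are about 1.406
```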

With respect to stratification by the environmental exposure, different kinds of interaction can be distinguished, which we term full effect concentration, partial effect concentration, and cross-over interaction. A full effect concentration represents a genetic effect that is present only in the stratum exposed to the environmental factor. A partial effect concentration is present when the genetic effect is weaker but still present in the unexposed stratum. A cross-over interaction represents opposite genetic effects in the two exposure strata.
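The three types can be distinguished from ORGD|E+ and ORGD|E− alone. The following is a minimal sketch, assuming the classification is made on the log-OR scale with a small numerical tolerance, and treating an effect confined to either stratum as a full effect concentration; the function name and tolerance are hypothetical.

```python
import math


def interaction_type(or_gd_exposed: float, or_gd_unexposed: float, tol: float = 1e-9) -> str:
    """Classify a G*E interaction from the stratum-specific gene-disease ORs."""
    b_exp = math.log(or_gd_exposed)      # genetic log OR in the exposed stratum (E+)
    b_unexp = math.log(or_gd_unexposed)  # genetic log OR in the unexposed stratum (E-)
    if abs(b_exp - b_unexp) <= tol:
        return "no interaction"
    if abs(b_exp) <= tol or abs(b_unexp) <= tol:
        return "full effect concentration"  # genetic effect in one stratum only
    if b_exp * b_unexp < 0:
        return "cross-over"  # opposite directions in the two strata
    return "partial effect concentration"  # same direction, different magnitude


print(interaction_type(2.0, 1.0), interaction_type(2.0, 1.3), interaction_type(1.5, 0.8))
```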

Creating data sets for simulations

For the simulation experiments, data sets were constructed based on the parameters displayed in Table 1. The number of observations (N) was generally set to 10,000. When testing the effect of different case:control ratios, we generated data sets with a constant 1,000 cases and between 1,000 and 9,000 controls, giving sample sizes of 2,000–10,000. The expected values for the prevalences of disease (PD), of the environmental exposure (PE), and of the interacting allele (PG) were varied. Settings with PD = 0.5 were termed “case–control settings”, since case–control studies often have equal numbers of cases and controls. All other settings, which predominantly occur in surveys or cohorts, were termed “population settings”. To simulate various effect sizes, the population attributable risk fractions (PARF) of the environmental effect (PARFE) and of the genetic effect (PARFG) were manipulated. From PARFE and PE, the relative risk of the environmental effect was derived as RRE = 1 + PARFE/(1 − PARFE)/PE. From RRE, PE, and PD, the contingency table for the environmental effect on the disease was calculated from the joint probability of disease and exposure, PDE = PD/(1 + (1 − PE)/PE/RRE) if RRE ≠ 1 and PDE = PD*PE if RRE = 1. The RRG and the contingency table for the genetic effect on the disease were derived by an analogous approach.
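The two formulas above can be sketched as follows, assuming RR denotes the ratio of disease risks in subjects with and without the factor (the reading under which the PDE formula holds). The generic names p_factor, rr_from_parf and contingency_table are ours; per the text, the same steps applied to PARFG and PG would give the genetic table.

```python
def rr_from_parf(parf: float, prevalence: float) -> float:
    """Relative risk implied by an attributable fraction (Levin's formula solved for RR)."""
    return 1 + parf / (1 - parf) / prevalence


def joint_prob(p_disease: float, p_factor: float, rr: float) -> float:
    """P(D = 1, F = 1) for a factor F with prevalence p_factor and relative risk rr."""
    if rr == 1:
        return p_disease * p_factor
    return p_disease / (1 + (1 - p_factor) / p_factor / rr)


def contingency_table(p_disease: float, p_factor: float, rr: float) -> dict:
    """2x2 table of joint probabilities keyed by (disease, factor)."""
    p11 = joint_prob(p_disease, p_factor, rr)
    return {
        (1, 1): p11,
        (1, 0): p_disease - p11,
        (0, 1): p_factor - p11,
        (0, 0): 1 - p_disease - p_factor + p11,
    }


rr_e = rr_from_parf(0.2, 0.3)              # PARF_E = 0.2, P_E = 0.3
table = contingency_table(0.1, 0.3, rr_e)  # P_D = 0.1
print(rr_e, table)
```

Dividing the risk in exposed subjects, table[(1, 1)]/PE, by the risk in unexposed subjects, table[(1, 0)]/(1 − PE), recovers RRE, which is a quick consistency check on the closed form.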
Table 1

Parameters entered into the power simulation study

Parameter | Range (steps)
Prevalence of disease (PD) | 0.05, 0.1 … (0.1) … 0.6
Number of control subjects [N*(1−PD)] | 1,000 … (1,000) … 9,000
Prevalence of exposure (PE) | 0.1 … (0.1) … 0.8
Prevalence of genotype (PG) | 0.1 … (0.1) … 0.9
Population attributable risk fraction of exposure (PARFE) | 0.1 … (0.1) … 0.5
Population attributable risk fraction of genotype (PARFG) | 0 … (0.05) … 0.25
Interaction odds ratio (ORG*E) | 1.05, 1.1 … (0.1) … 2.0
OR in controls (OREG|D−) | 0.8^±1, 0.9^±1, 0.95^±1, 1.0 (i.e. 0.8 or 1.25, 0.9 or 1.11, 0.95 or 1.05, and 1.0)
Proportion of corrected alpha level allocated to step 1 (ψ) | 0 … (0.1) … 1

The strength of the interaction was controlled by varying ORG*E. The impact of gene–environment association (GEA), i.e. the deviation from gene–environment independence, was simulated by varying the OR in controls (ORGE|D−) or, in an additional analysis, in the full data set (ORGE). When ORGE|D− was varied, ORGE|D+ was derived as ORGE|D+ = ORGE|D− * ORG*E. All the above parameters refer to the study sample, which for simplicity we call the “population”; the source population might differ. From the two contingency tables for the environmental and the genetic effect on the disease, together with ORGE|D− and ORGE|D+, all 8 cells of the contingency table of the three dichotomous variables D, E, and G were computed using the quadratic formula. Finally, the 8 cells were expanded to 12 cells by coding G in three categories (0, 1, and 2) to model two alleles per locus. The simulated counts within each cell represented numbers of individuals. Additive genetic models were generally applied to model the genotype, but dominant and recessive models were also explored.
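The text does not spell out the quadratic step. One standard construction, sketched here under that assumption, fixes the two margins of a 2 × 2 table (within one disease stratum, P(E) and P(G)) and solves for the cell that reproduces a target odds ratio such as ORGE|D+ = ORGE|D− × ORG*E. Function names and input values are hypothetical, and the expansion to three genotype categories is not shown.

```python
import math


def cell_from_margins(p_row: float, p_col: float, odds_ratio: float) -> float:
    """Solve for p11 in a 2x2 probability table with P(row = 1) = p_row,
    P(col = 1) = p_col and odds ratio p11*p00 / (p10*p01) = odds_ratio."""
    if math.isclose(odds_ratio, 1.0):
        return p_row * p_col  # independence: product of the margins
    # p11*(1 - p_row - p_col + p11) = OR*(p_row - p11)*(p_col - p11)
    # rearranges to a*p11**2 + b*p11 + c = 0 with these coefficients:
    a = 1 - odds_ratio
    b = 1 - p_row - p_col + odds_ratio * (p_row + p_col)
    c = -odds_ratio * p_row * p_col
    disc = math.sqrt(b * b - 4 * a * c)
    lower, upper = max(0.0, p_row + p_col - 1), min(p_row, p_col)
    for root in ((-b + disc) / (2 * a), (-b - disc) / (2 * a)):
        if lower - 1e-12 <= root <= upper + 1e-12:  # keep the admissible root
            return root
    raise ValueError("no admissible solution for these margins")


def stratum_table(p_e: float, p_g: float, or_ge: float) -> dict:
    """Conditional (E, G) table within one disease stratum, keyed by (E, G)."""
    p11 = cell_from_margins(p_e, p_g, or_ge)
    return {(1, 1): p11, (1, 0): p_e - p11, (0, 1): p_g - p11,
            (0, 0): 1 - p_e - p_g + p11}


# Hypothetical cases stratum: P(E|D+) = 0.35, P(G|D+) = 0.3,
# OR_GE|D+ = OR_GE|D- * OR_G*E = 1.0 * 1.5
cases = stratum_table(0.35, 0.3, 1.0 * 1.5)
print(cases)
```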

Performing the simulations and calculating power

Random simulations of counts in the presence of G*E interactions, i.e. the alternative hypotheses, were performed with 10³ iterations for each 3-way combination of PD, PE, and PG. The null hypothesis, i.e. absence of an interaction, was simulated with 10⁶ iterations using parallel computing on the National Supercomputers HLRB-II SGI Altix 4700, Linux-Cluster, and superMUC at the Leibniz Supercomputer Center in Munich, Germany.

Within each iteration, 7 individual logistic regression models were estimated using an additive genetic model, as listed in Table 2. The beta estimates (β = lnOR) of the models and their variances were retrieved, and p values for the interaction were calculated for the respective methods as explained in the following paragraphs. In general, p values were derived from the ratio of the squared beta estimate to its variance (a Wald statistic), tested against a χ² distribution with one degree of freedom.
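The Wald test described above needs only the standard library, since a χ² variable with one degree of freedom is the square of a standard normal variable. A minimal sketch; the function name is ours.

```python
import math


def wald_p_value(beta: float, variance: float) -> float:
    """p value of the 1-df Wald test with statistic chi2 = beta**2 / variance."""
    chi2 = beta * beta / variance
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


# beta = ln(1.5) with variance 0.01 (hypothetical values)
print(wald_p_value(math.log(1.5), 0.01))
```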
Table 2

Logistic regression models performed to calculate the interaction ORs and p values

Modelᵃ | Description
(i) D = G + E + G*E | Full interaction model
(ii) E = G in D+ | Gene–environment association in cases
(iii) E = G in D− | Gene–environment association in controls or non-cases
(iv) D = G in E+ | Gene–disease association in exposed subjects
(v) D = G in E− | Gene–disease association in unexposed subjects
(vi) E = G | Gene–environment association in all subjects combined
(vii) D = G | Gene–disease association in all subjects combined

ᵃ Additive genetic models were used if not stated otherwise
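Table 2 assumes additive coding of G, and dominant and recessive models were also explored. The conventional codings of the allele count are sketched below; the paper does not list its exact coding, so treat this as an assumption (the function name is hypothetical).

```python
def code_genotype(allele_count: int, model: str = "additive") -> int:
    """Code the number of interacting alleles (0, 1 or 2) under a genetic model."""
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1 or 2")
    if model == "additive":
        return allele_count            # per-allele trend: 0, 1, 2
    if model == "dominant":
        return int(allele_count >= 1)  # at least one copy: 0, 1, 1
    if model == "recessive":
        return int(allele_count == 2)  # two copies required: 0, 0, 1
    raise ValueError(f"unknown genetic model: {model}")


print([code_genotype(g, "dominant") for g in (0, 1, 2)])
```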

Corrections for multiple testing assumed that interactions were being examined “agnostically” as part of a genome-wide study performing tests on 651,550 statistically independent loci, as proposed by Dudbridge and Gusnanto [10]. A study-wide false positive rate of 0.05 was selected, and a Bonferroni correction was applied, resulting in a corrected alpha level (αcorr). Thus, for one-step methods, an interaction was considered significant if its p value was below αcorr = 0.05/651,550 = 7.67 × 10⁻⁸. Power was calculated as the proportion of significant interaction results among the respective iterations. For the two-step methods, the choice of significance levels was simulated for various step 1 thresholds α1 = αcorr^ψ, with ψ ranging from 0 to 1 in steps of 0.1 (Table 1), where ψ = 0 corresponds to a full interaction model (step 2 only) and ψ = 1 corresponds to a “step 1-only” analysis. The FPR was calculated as αcorr^(1−ψ). As an approximation, ψ = 0.5 was used in the first simulation experiments.
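The threshold arithmetic can be sketched as follows (the function name is hypothetical). Under the null hypothesis roughly 651,550 × α1 SNPs pass step 1, so a Bonferroni correction over them gives α2 = 0.05/(651,550 × α1) = αcorr^(1−ψ), and, if the two steps are independent, the per-SNP false positive probability is α1 × α2 = αcorr.

```python
def two_step_thresholds(psi: float, n_loci: int = 651_550, fwer: float = 0.05):
    """Return (alpha_corr, alpha1, alpha2) for a share psi of alpha in step 1."""
    alpha_corr = fwer / n_loci          # Bonferroni-corrected level
    alpha1 = alpha_corr ** psi          # step 1 screening threshold
    alpha2 = alpha_corr ** (1.0 - psi)  # step 2 threshold
    return alpha_corr, alpha1, alpha2


for psi in (0.0, 0.5, 0.8, 1.0):
    a_corr, a1, a2 = two_step_thresholds(psi)
    # About n_loci * a1 SNPs pass step 1 under the null; Bonferroni over them gives a2.
    # With independent steps, the per-SNP false positive probability is a1 * a2 = a_corr.
    print(f"psi={psi:.1f}  alpha1={a1:.3g}  alpha2={a2:.3g}  product={a1 * a2:.3g}")
```

Setting ψ = 0 leaves α1 = 1 (every SNP reaches the full interaction model), while ψ = 1 leaves α2 = 1 (a step 1-only analysis), matching the two limiting cases described above.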

Case–control analysis

The full interaction model including cases and controls (“case–control”) comprises both main effects of the genetic and environmental factors and estimates the interaction effect directly by a multiplicative interaction term corresponding to model (i) in Table 2. This model is also used by all two-step approaches to calculate the interaction effect estimate (in step 2).

Case-only analysis

The case-only analysis (“case-only”) assumes independence of G and E in the controls, i.e. OR GE|D− = 1. In this case, by formula (1), the interaction OR equals the OR in the cases (OR GE|D+). Therefore, βEG|D+ and its variance, estimated by model (ii) in Table 2, were used directly to calculate the interaction p value of the case-only method.
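For a dichotomous genotype (a simplification; the study coded G additively in three categories), both estimates have closed forms: the case-only estimate is the log OR of G and E among cases, and the interaction term of the saturated model D = G + E + G*E equals ln(ORGE|D+/ORGE|D−), with a variance equal to the sum of the reciprocal counts of all 8 cells. The sketch below applies Woolf's variance formula to hypothetical counts indexed n[d][g][e]; p values would then follow from the one-degree-of-freedom Wald test. Function names are ours.

```python
import math


def log_or_and_var(a: float, b: float, c: float, d: float):
    """Log odds ratio and Woolf variance (1/a + 1/b + 1/c + 1/d) of [[a, b], [c, d]]."""
    return math.log((a * d) / (b * c)), 1 / a + 1 / b + 1 / c + 1 / d


def case_only(n):
    """beta_EG|D+ and its variance from the cases only; n is indexed n[d][g][e]."""
    return log_or_and_var(n[1][1][1], n[1][1][0], n[1][0][1], n[1][0][0])


def case_control(n):
    """Interaction beta of the saturated model D = G + E + G*E (binary G and E)."""
    b_cases, v_cases = log_or_and_var(n[1][1][1], n[1][1][0], n[1][0][1], n[1][0][0])
    b_ctrls, v_ctrls = log_or_and_var(n[0][1][1], n[0][1][0], n[0][0][1], n[0][0][0])
    # ln(OR_GE|D+ / OR_GE|D-); cases and controls are disjoint, so the variances add
    return b_cases - b_ctrls, v_cases + v_ctrls


counts = [[[400, 300], [90, 60]], [[50, 30], [40, 30]]]  # hypothetical n[d][g][e]
beta_co, var_co = case_only(counts)
beta_cc, var_cc = case_control(counts)
print(beta_co, var_co, beta_cc, var_cc)
```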

Combination of case–control and case-only analysis

Mukherjee and Chatterjee [3, 4] combined the power advantage of the case-only approach with the rigour of the full interaction model by introducing a shrinkage factor that moves the case-only estimate towards the case–control (full interaction) estimate according to the observed GEA among controls.

In the first approach, the variance of the interaction estimate V(βG*E) is used for weighting [4], whereas in the second approach it is replaced by the variance in the controls V(βEG|D−) [3]. As our analyses and those by the originators [9] consistently showed that the first approach was more powerful, only this one is shown in the graphs (Mukherjee and Chatterjee 2008 [4]). The respective terms were estimated by models (i) and (iii) in Table 2.
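A point-estimate sketch of the shrinkage idea, under our reading of [4]: θ = βCC − βCO measures the departure from G–E independence in controls (for binary G and E it equals −βEG|D−), and the weight on the case–control estimate is θ²/(θ² + V(βG*E)), as in the first approach. Treat this form as an assumption to be checked against [4]; the variance estimator used there for the p value is not reproduced, and the function name is ours.

```python
def eb_shrinkage(beta_cc: float, var_cc: float, beta_co: float) -> float:
    """Empirical-Bayes-type combination of case-control and case-only estimates.

    theta estimates the departure from G-E independence in controls; the weight
    on the case-control estimate grows as theta**2 becomes large relative to var_cc.
    """
    theta = beta_cc - beta_co
    weight_cc = theta ** 2 / (theta ** 2 + var_cc)
    return weight_cc * beta_cc + (1.0 - weight_cc) * beta_co


# Hypothetical inputs: case-control beta 0.34 (variance 0.145), case-only beta 0.22
print(eb_shrinkage(0.34, 0.145, 0.22))
```

When the controls show no G–E association (θ near 0), the estimate stays close to the efficient case-only value; a strong association pushes it towards the robust case–control value.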

Two-step approaches proposed by Murcray et al.

Murcray et al. [6] proposed a two-step approach, performing a logistic regression for the gene–environment association (model (vi) in Table 2) in the first step and a full interaction analysis (model (i) in Table 2) in the second step. Bonferroni correction was applied only for the number of SNPs tested in the second step, resulting in less extreme significance thresholds in the full interaction model. For comparability, we used the same step 1 thresholds as in our modifications of the method, described below.
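The screening logic of a Murcray-type two-step procedure can be sketched as below. Step 1 and step 2 p values (from models (vi) and (i) in Table 2) are passed in as plain lists; the function name and example values are hypothetical.

```python
def two_step_hits(step1_p, step2_p, alpha1: float, fwer: float = 0.05):
    """Indices of SNPs declared significant by a two-step procedure."""
    passed = [i for i, p in enumerate(step1_p) if p < alpha1]  # step 1 screen
    if not passed:
        return []
    alpha2 = fwer / len(passed)  # Bonferroni only over the SNPs that reach step 2
    return [i for i in passed if step2_p[i] < alpha2]


hits = two_step_hits([1e-5, 0.5, 1e-6, 0.2], [0.01, 1e-9, 0.03, 1e-9], alpha1=1e-4)
print(hits)
```

In this example only SNPs 0 and 2 reach step 2, where α2 = 0.05/2 = 0.025, so only SNP 0 is declared significant; SNP 1 is never tested despite its small step 2 p value.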

In a second publication, Murcray et al. [8] introduced a second GWAS at step 1, testing for the marginal association of the disease with the interacting allele [model (vii) in Table 2]. A weighting factor ‘rho’ then has to be chosen to partition the step 1 significance level between the two GWASes performed at step 1 when selecting SNPs for inclusion in step 2. The criteria for selecting rho are not specified, although Murcray et al. show that their approach is relatively insensitive to the choice of rho within the range 0.1–0.9; we chose rho = 0.5 for our power comparisons. Common to both Murcray approaches is that the gene–disease and gene–environment associations are estimated in the pooled population, not stratified by environmental exposure or by disease status, respectively.

Modifications of the Murcray method

As described previously, we modified the first step of the approach by Murcray 2009 [6] (“Ege 2011”) [7]. First, we did not assess the gene–environment association in the entire population but separately in cases and controls (models (ii) and (iii) in Table 2), and derived a simple (unweighted) average of the estimates and the variance of this average as follows:
$$ \beta_{\text{step}\;1\text{A}} = \tfrac{1}{2}\left( \beta_{\text{EG}|\text{D}+} + \beta_{\text{EG}|\text{D}-} \right) $$
$$ \text{V}\left( \beta_{\text{step}\;1\text{A}} \right) = \tfrac{1}{4}\left( \text{V}\left( \beta_{\text{EG}|\text{D}+} \right) + \text{V}\left( \beta_{\text{EG}|\text{D}-} \right) \right) $$
Second, anticipating the spirit of Murcray 2011 [8], we introduced a step 1B, where we assessed the gene-disease association separately in exposed and unexposed subjects [models (iv) and (v) in Table 2] and averaged the estimates as follows:
$$ \beta_{\text{step}\;1\text{B}} = \tfrac{1}{2}\left( \beta_{\text{GD}|\text{E}+} + \beta_{\text{GD}|\text{E}-} \right) $$
$$ \text{V}\left( \beta_{\text{step}\;1\text{B}} \right) = \tfrac{1}{4}\left( \text{V}\left( \beta_{\text{GD}|\text{E}+} \right) + \text{V}\left( \beta_{\text{GD}|\text{E}-} \right) \right) $$
Instead of partitioning the step 1 significance threshold between steps 1A and 1B, we calculated a global p value for step 1 by combining the χ² values of step 1A and step 1B in a two-degree-of-freedom χ² test, since the χ² values of step 1A and step 1B are independent under the null hypothesis of no interaction.
$$ \chi^{2}_{{{\text{step}}\;1}} = \beta^{2}_{{{\text{step}}\;1{\text{A}}}} /{\text{V}}\left( {\beta_{{{\text{step}}\;1{\text{A}}}} } \right) + \beta^{2}_{{{\text{step}}\;1{\text{B}}}} /{\text{V}}\left( {\beta_{{{\text{step}}\;1{\text{B}}}} } \right) $$
Only SNPs with a step 1 p value below α1 = (0.05/651,550)^ψ, i.e. the Bonferroni-corrected significance level raised to the power ψ, were passed to step 2 for testing in a full interaction model [model (i) in Table 2]. In simulations of the null hypothesis of no interaction, or of few interacting SNPs as expected in real data, the step 2 significance threshold was corrected for the number of SNPs (n) tested in this step, i.e. α2 = 0.05/n; since about 651,550 × α1 SNPs pass step 1 under the null hypothesis, this is approximately α2 = (0.05/651,550)^(1−ψ). In simulations of the alternative hypothesis with virtually all SNPs interacting with the environment, we used the complementary exponent directly as the significance level, α2 = (0.05/651,550)^(1−ψ).
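Steps 1A and 1B and their combination can be sketched as follows, using the formulas above; for two degrees of freedom the χ² upper tail is simply exp(−χ²/2). All input betas and variances are hypothetical, and the function names are ours.

```python
import math


def unweighted_average(beta_plus, var_plus, beta_minus, var_minus):
    """Simple average of two stratum-specific betas and the variance of that average."""
    return 0.5 * (beta_plus + beta_minus), 0.25 * (var_plus + var_minus)


def step1_p_value(step1a, step1b):
    """Global step 1 p value from (beta, variance) pairs of steps 1A and 1B."""
    chi2 = sum(beta * beta / var for beta, var in (step1a, step1b))
    return math.exp(-chi2 / 2.0)  # upper tail of a chi-square with 2 df


step1a = unweighted_average(0.20, 0.010, 0.15, 0.004)  # beta_EG|D+, beta_EG|D- (hypothetical)
step1b = unweighted_average(0.10, 0.020, 0.05, 0.008)  # beta_GD|E+, beta_GD|E- (hypothetical)
p1 = step1_p_value(step1a, step1b)
alpha1 = (0.05 / 651_550) ** 0.5  # psi = 0.5
print(p1, p1 < alpha1)  # SNP passes to step 2 only if p1 < alpha1
```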

“Weighted two-step” method

The full interaction model may be considered as a test of the difference between the stratum-specific beta estimates contributing to the averages at step 1A (βEG|D+ and βEG|D−) or at step 1B (βGD|E+ and βGD|E−). Under the null hypothesis of no interaction (zero difference), the difference will be independent of the unweighted average only if the variances of the two components are equal. Unequal variances may arise when PD ≠ 0.5 or PE ≠ 0.5 and would lead to a correlation between the averages (tested at step 1) and the difference (tested at step 2), even in the absence of G*E interaction. In theory, therefore, a simple Bonferroni correction at step 2 would not be sufficiently conservative.

In the case–control study design, the variances of the beta estimates contributing to the averages at step 1A will be similar. However, the variances will not be similar for the beta estimates contributing to the average at step 1B, unless the environmental exposure prevalence is close to 50 %. In the context of a population survey or a case–control study with many more controls than cases, the variances will be unequal at step 1A.

We assessed the extent of the bias (in practice) arising from the unweighted averaging approach of Ege 2011 [7] by comparing it with an alternative approach that used inverse-variance weighting for averaging in step 1 (termed here the “weighted 2-step” method). Weighting the beta estimates contributing to the step 1A and step 1B averages inversely to their respective variances removes the correlation between steps 1 and 2 under the null hypothesis, ensuring that the two steps are statistically independent in the absence of G*E interaction. Both the weighted and the unweighted approaches always included step 1A and step 1B.
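The argument behind the weighting can be made concrete. For independent stratum estimates b1 and b2 (independent because cases and controls, or exposed and unexposed subjects, are disjoint), the covariance between an average w·b1 + (1 − w)·b2 and the difference b1 − b2 is w·V1 − (1 − w)·V2. With w = 1/2 this vanishes only when V1 = V2, whereas inverse-variance weights make it vanish for any variances. A minimal sketch, using b1 − b2 as a stand-in for the step 2 statistic as in the reasoning above; function names and values are hypothetical.

```python
def inverse_variance_average(beta1, var1, beta2, var2):
    """Inverse-variance weighted average of two independent estimates and its variance."""
    w1, w2 = 1.0 / var1, 1.0 / var2
    return (w1 * beta1 + w2 * beta2) / (w1 + w2), 1.0 / (w1 + w2)


def cov_average_difference(var1, var2, weighted):
    """Cov(w*b1 + (1 - w)*b2, b1 - b2) = w*var1 - (1 - w)*var2 for independent b1, b2."""
    w = (1.0 / var1) / (1.0 / var1 + 1.0 / var2) if weighted else 0.5
    return w * var1 - (1.0 - w) * var2


# e.g. step 1A in a population setting: few cases (large variance), many controls
print(cov_average_difference(0.09, 0.01, weighted=False))  # unweighted: nonzero
print(cov_average_difference(0.09, 0.01, weighted=True))   # weighted: zero up to rounding
```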


False positive rate under the null hypothesis of no interaction

The false positive rate (FPR) was assessed in relation to GEA in controls by varying OREG|D−. Figure 1 contrasts the FPR in a population setting (PD = 0.1) and a case–control setting (PD = 0.5) for each of the seven methods. Detailed results are presented in Supplemental Table 1.
Fig. 1

SNP-wise false positive rate for different methods in relation to the G–E association in controls for population survey and case–control settings. The null hypothesis of absence of interaction was simulated in 9 × 10⁶ SNPs (1 million at each of 9 interacting allele frequencies) in a population setting (PD = 0.1, left panel) and a case–control setting (PD = 0.5, right panel). All other parameters were kept constant (N = 10,000, PE = 0.3, ORG*E = 1.0, PARFE = 0.2, PARFG = 0.1, ψ = 0.5)

As expected, in both settings the case-only approach yielded a substantial number of false positive findings in the presence of GEA. In the case–control setting (Fig. 1, right panel), the FPR for the Mukherjee and Chatterjee [4] approach exceeded 10⁻⁵ when the GEA was at least as strong as ORGE|D− = 0.9^±1 (i.e. ORGE|D− ≤ 0.9 or ≥ 1/0.9 ≈ 1.11). This inflated FPR was also present, but less marked, in the population setting. The second approach by Mukherjee et al. [3] yielded comparable results (Supplemental Table 1). In contrast, regardless of the study design or degree of GEA, all two-step methods showed FPRs below 10⁻⁶ (Fig. 1). The genome-wide false positive rate is given explicitly in Table 3 at various levels of study-wide significance (alpha).
Fig. 2

True positive rate in relation to disease prevalence and sample size. Left panel: the alternative hypothesis of an interaction was simulated for 9,000 SNPs at 9 different interacting allele frequencies in 10,000 individuals, for a disease prevalence ranging from 5 to 60 %. Right panel: the number of cases was kept constant at 1,000 individuals, and the number of controls was successively increased from 1,000 to 9,000 individuals. All other parameters were kept constant (PE = 0.3, ORG*E = 1.5, OREG|D− = 1.0, PARFE = 0.2, PARFG = 0.1, ψ = 0.5)

Table 3

Absolute numbers of false and true positive findings across all methods in a population setting (PD = 0.1) at a genome-wide scale





Finding | Full interaction (case–control) | Case-only | Mukherjee and Chatterjee 2008 [4] | Murcray 2009 [6] | Murcray 2011 [8] | Ege 2011 [7] | Weighted 2-step

Study-wide α = …
False positives | 0 [0–0] | 9 [8.4–9.6] | 0 [0–0] | 0 [0–0] | 0 [0–0] | … | 0 [0–0]
True positives | 1.2 [1.2–1.2] | 10.2 [9.6–10.8] | 1.9 [1.9–2] | 19.3 [18.4–20.3] | 47.7 [46.5–48.9] | … | 51.8 [50.6–53]
% True | 83 [81–84] | 63 [62–65] | 87 [86–88] | 60 [59–62] | 89 [88–90] | … | 90 [90–91]

Study-wide α = …
False positives | 0 [0–0] | 43.6 [40.8–46.5] | 0 [0–0] | 0 [0–0] | 0 [0–0] | 0.2 [0.2–0.3] | 0 [0–0]
True positives | 3 [2.9–3] | 14.7 [14–15.4] | 4.2 [4.1–4.3] | 23.3 [22.2–24.3] | 53.2 [52–54.4] | 67.2 [66.1–68.3] | 56.7 [55.4–57.9]
% True | 92 [91–93] | 59 [57–60] | 90 [89–91] | 65 [64–67] | 89 [88–90] | 94 [93–95] | 92 [91–93]

Study-wide α = …
False positives | 0 [0–0] | 126.7 [118.6–134.8] | 0.1 [0.1–0.1] | 0 [0–0] | 0 [0–0] | 0.7 [0.6–0.8] | 0 [0–0]
True positives | 5.4 [5.2–5.5] | 18.7 [17.8–19.5] | 7.1 [6.9–7.3] | 26.3 [25.3–27.4] | 57.2 [56–58.4] | 70.7 [69.6–71.8] | 60.1 [58.9–61.4]
% True | 100 [100–100] | 54 [52–55] | 93 [92–94] | 75 [73–76] | 92 [91–93] | 95 [94–95] | 95 [95–96]

Study-wide α = …
False positives | 0 [0–0] | 198.7 [186.1–211.2] | 0.3 [0.3–0.3] | 0.1 [0–0.1] | 0 [0–0] | 1.1 [1–1.2] | 0 [0–0]
True positives | 6.8 [6.7–7] | 20.7 [19.8–21.6] | 8.8 [8.5–9.1] | 27.8 [26.7–28.9] | 59 [57.8–60.2] | 72.2 [71.2–73.3] | 61.7 [60.5–62.9]
% True | 100 [100–100] | 52 [50–53] | 89 [88–90] | 78 [76–79] | 95 [95–96] | 94 [93–94] | 95 [95–96]

Study-wide α = …
False positives | 0.1 [0.1–0.1] | 307.8 [288.6–327.1] | 0.7 [0.6–0.7] | 0.1 [0.1–0.1] | 0 [0–0] | 1.8 [1.7–2] | 0 [0–0]
True positives | 8.7 [8.5–8.9] | 22.8 [21.9–23.8] | 10.8 [10.5–11.1] | 29.4 [28.2–30.5] | 60.8 [59.6–62] | 73.8 [72.8–74.8] | 63.3 [62–64.5]
% True | 99 [99–99] | 49 [48–51] | 85 [84–86] | 80 [78–81] | 95 [95–96] | 92 [91–93] | 95 [95–96]

Study-wide α = …
False positives | 0.2 [0.2–0.2] | 395.5 [370.9–420.1] | 0.9 [0.9–1] | 0.2 [0.2–0.2] | … | 2.4 [2.2–2.6] | …
True positives | 9.9 [9.7–10.1] | 24.2 [23.2–25.1] | 12.1 [11.8–12.4] | 30.3 [29.2–31.5] | … | 74.7 [73.7–75.7] | …
% True | 98 [98–98] | 48 [47–50] | 84 [83–85] | 76 [74–77] | … | 92 [91–92] | …

Absolute numbers of false positives are calculated at various overall alpha levels for a genome-wide analysis with 651,550 statistically independent loci [10]. Absolute numbers of true positives refer to 100 expected interactions in a genome-wide analysis. “% True” refers to the proportion of true positives among all positive findings, i.e. among the sum of the true and false positives. For all methods, the respective means over 63 scenarios were calculated with 95 % confidence intervals. The scenarios comprised variations of the interacting allele frequency from 0.1 to 0.9 and of ORGE|D− from 0.8 to 1.25; ψ was set to 0.8. The three scenarios alluded to in the text are printed in bold

These null simulations also showed that the two steps of the two-step methods were uncorrelated (Supplemental Fig. 1), except for a slight correlation for Ege 2011 [7] in the population setting (median rho = −0.03). The two components of step 1 were correlated only for Murcray 2011 [8] in the case–control setting, with a median correlation of rho = 0.28 (Supplemental Fig. 2). When the individual components of step 1 were correlated with step 2, step 1A and step 2 were slightly correlated for the unweighted approach in the population setting (Ege 2011 [7], median rho = 0.03, Supplemental Fig. 3).

When simulating GEA in the entire sample and not just in the controls, similar results emerged (Supplemental Fig. 4).

True positive rate (power) in relation to the prevalence of D, E and G

The true positive rate (TPR) was first explored in relation to disease prevalence (Fig. 2, left panel, and Supplemental Table 2). The full interaction model had the lowest power. At low disease prevalences, the Ege et al. [7] approach displayed the highest power, followed by Murcray et al. [8] and the weighted two-step method. The case-only approach achieved a higher TPR at higher prevalences (PD > 0.3), although with the potential for a high FPR if ORGE|D− ≠ 1 (see above). In the typical case–control scenario (PD ~ 0.5), the TPR was similar for all four two-step methods, which had a small power advantage over Mukherjee and Chatterjee 2008 [4]. The apparent gain in power of Ege et al. [7] compared with the other two-step approaches was accompanied by a slightly increased FPR. To determine whether the power advantage was maintained after adjustment for the inflated FPR, we estimated both TPR and FPR for a genome-wide setting with 651,550 statistically independent loci [10], of which 100 were truly interacting. The TPR and FPR of Ege 2011 [7] at an alpha of 0.01 corresponded to those of Murcray et al. [8] and the weighted approach at an alpha of 0.3 (see Table 3, figures printed in bold). Thus, the weighted and unweighted approaches have similar power when the study-wide FPR is held constant.
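The genome-wide counts reported in Table 3 follow from simple arithmetic on per-SNP rates. A sketch, assuming false positives arise only among the 651,550 − 100 non-interacting loci; the rates passed in are hypothetical inputs, not results from the study, and the function name is ours.

```python
def genome_wide_counts(snp_fpr, tpr, n_loci=651_550, n_true=100):
    """Expected false/true positives and % true among positives for a genome-wide scan."""
    false_pos = (n_loci - n_true) * snp_fpr  # assumption: null loci = all but the true ones
    true_pos = n_true * tpr
    total = true_pos + false_pos
    pct_true = 100.0 * true_pos / total if total > 0 else float("nan")
    return false_pos, true_pos, pct_true


# Hypothetical per-SNP FPR of 1e-6 and TPR of 0.5
print(genome_wide_counts(1e-6, 0.5))
```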

The TPR was also examined in relation to total sample size, keeping the absolute number of cases constant at 1,000 (Fig. 2, right panel). As the number of controls was increased above 1,000, the TPR of the case-only approach became inferior to that of the two-step approaches. The TPR of Murcray 2009 [6] decreased above 5,000 controls.

Comparisons of power for each method in relation to variations in the prevalence of the environmental exposure (Fig. 3) confirmed that, across a wide range of non-null scenarios, all two-step methods outperform the Mukherjee and Chatterjee 2008 [4] method in the case–control context.
Fig. 3

True positive rate in relation to the prevalence of the environmental exposure. The alternative hypothesis of an interaction was simulated for 9,000 SNPs at different exposure prevalences in a population setting and a case–control setting. All other parameters were kept constant (N = 10,000, OREG|D− = 1.0, ORG*E = 1.5, PARFE = 0.2, PARFG = 0.1, ψ = 0.5). With these parameters (particularly PARFE) fixed, the exposure prevalence influenced the gene–disease association OR in the strata of exposed and unexposed individuals (table within the figure), thereby creating different interaction types. In the case–control setting, the line of the weighted 2-step approach partially overlaps the lines of the Murcray 2009 [6], Murcray 2011 [8], and Ege 2011 [7] approaches

The variation of the exposure prevalence in the context of a fixed PARFE and a fixed ORG*E led to various combinations of gene-disease associations in the exposure strata, thereby reflecting the different interaction types (Table included in Fig. 3). The highest power was achieved for a full effect concentration in the population setting and a mild cross-over in the case–control setting (Fig. 3).

The prevalence of the genotype also affected power: the highest power was achieved at intermediate genotype prevalences in both population and case–control settings by all approaches except Murcray 2009 [6], whose power was highest at low genotype prevalences in the population setting (Supplemental Fig. 5).

True positive rate (power) in relation to the magnitude and direction of ORG*E

The TPR was related to the size of the interaction as determined by the ORG*E (Fig. 4), assuming ORGE|D− = 1, and holding the attributable fractions for environmental and genetic effects (PARFE and PARFG) constant. In the population setting, the two-step approaches exhibited the highest power up to an ORG*E of 1.8, above which Murcray 2011 [8] and the weighted two-step design became relatively less powerful. In the case–control setting the Mukherjee and Chatterjee 2008 method [4] yielded TPRs very similar to the two-step designs across the range of ORG*E.
Fig. 4

True positive rate in relation to interaction odds ratio. The alternative hypothesis of an interaction was simulated for 9,000 SNPs at 11 different interaction odds ratios (ORG*E = 1.05, 1.10 … (0.1) … 2.0) in a population setting (PD = 0.1, left panel) and a case–control setting (PD = 0.5, right panel). All other parameters were kept constant (N = 10,000, PE = 0.3, OREG|D− = 1.0, PARFE = 0.2, PARFG = 0.1, ψ = 0.5). In the case–control setting, the line of the weighted 2-step approach covers the lines of the Murcray 2009 [6], Murcray 2011 [8] and Ege 2011 [7] approaches

Optimal step 1 threshold

As a first approach, the step 1 threshold was chosen to be the square root of the Bonferroni-corrected alpha, i.e. α1 = (0.05/651,550)^ψ with ψ = 0.5. We then varied ψ from 0 to 1, thereby distributing the overall alpha in different shares between the two steps. As shown in Fig. 5 for the weighted approach, the optimal choice of ψ depended on the interacting allele frequency PG and the disease prevalence PD (population vs. case–control setting). On average, a suitable threshold would lie in the range from α1 = αcorr^0.8 (for PD = 0.5) to α1 = αcorr^0.5 (for PD = 0.1), corresponding to 2.0 × 10⁻⁶ < α1 < 2.7 × 10⁻⁴. The Murcray 2011 [8] two-step approach displayed similar patterns (data not shown).
Fig. 5

Optimal step 1 threshold. To determine an optimal step 1 threshold for the two-step methods the true positive rates of the weighted two-step approach were plotted against the proportion of alpha allocated to step 1 (ψ) ranging from ψ = 0 to ψ = 1 by steps of 0.1 at disease prevalences PD = 0.1 (left panel) and PD = 0.5 (right panel). All other parameters were kept constant (9,000 SNPs at 9 different interaction allele frequencies, N = 10,000, PE = 0.3, OREG|D− = 1.0, PARFE = 0.2, PARFG = 0.1)

True and false positive rates in relation to GEA in controls

As illustrated by Fig. 6, the degree of gene–environment association (GEA) among controls influences the power to detect an interaction. The weighted approach gains power over Murcray 2011 [8] in the presence of GEA. This explains why the percentage of true positive findings among all positive findings is at least as high for the weighted approach as for Murcray 2011 [8] (Table 3).
Fig. 6

True positive rate in relation to gene–environment association. The TPR is given for various degrees of GEA with an OREG|D- ranging from 0.8 to 1.25 in a population setting (PD = 0.1, left panel) and a case–control setting (PD = 0.5, right panel). All other parameters were kept constant (N = 10,000, PE = 0.3, ORG*E = 1.5, PARFE = 0.2, PARFG = 0.1, ψ = 0.8)

Effect of the underlying genetic model

All methods exhibited the highest power under additive genetic models. The weighted two-step approach had higher power for dominant models than Murcray 2011 [8], irrespective of GEA in controls, whereas for recessive models Murcray 2011 [8] was equally powerful and, in the absence of GEA in controls, more powerful (Fig. 7).
Fig. 7

True positive rate in relation to underlying genetic model. The three genetic models dominant (D), additive (A), and recessive (R) are compared across different levels of GEA in controls, with OREG|D− taking the values 0.9, 1.0 and 1.1 (from left to right). The alternative hypothesis of an interaction is tested in 1,000 SNPs at an interacting allele frequency of PG = 0.5 in a population setting (PD = 0.1, N = 10,000, PE = 0.3, ORG*E = 1.5, PARFE = 0.2, PARFG = 0.1, ψ = 0.8)
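For readers less familiar with these models, the conventional genotype coding can be sketched as follows. The text does not state how the simulations coded genotypes, so this is an assumption based on the usual convention; the helper `code_genotype` is hypothetical.

```python
# Hypothetical helper: recode the number of copies of the interacting
# allele (0, 1 or 2) under the three genetic models compared in Fig. 7.
def code_genotype(n_alleles, model):
    if n_alleles not in (0, 1, 2):
        raise ValueError("n_alleles must be 0, 1 or 2")
    if model == "dominant":      # one copy suffices: 0, 1, 1
        return int(n_alleles >= 1)
    if model == "additive":      # effect per copy: 0, 1, 2
        return n_alleles
    if model == "recessive":     # two copies required: 0, 0, 1
        return int(n_alleles == 2)
    raise ValueError(f"unknown genetic model: {model}")


for model in ("dominant", "additive", "recessive"):
    print(model, [code_genotype(g, model) for g in (0, 1, 2)])
# dominant [0, 1, 1]
# additive [0, 1, 2]
# recessive [0, 0, 1]
```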


The issue of multiple comparisons is a major challenge for any genome-wide analysis. In the case of interaction analyses, which tend to have lower statistical power than analyses of marginal associations (main effects), the balance between true positive rate (TPR) and (study-wide) false positive rate (FPR) requires careful evaluation [11, 12]. In the present study we have simulated various data sets resembling real-world scenarios (Table 1) and compared our modification [7] of the two-step approach originally proposed by Murcray et al. [6] with all relevant methods currently available for GWIS analyses. We find that, in terms of TPR, the weighted variant of our modification is superior to all other methods when analyzing population-based data in the context of GEA, while preserving a low FPR in over 250 million simulations under the null hypothesis. The apparent gain in power of our unweighted variant (Ege 2011 [7]) is offset by a slightly increased FPR in the population survey setting. This is attributable to the correlation between the step 1A and step 2 test statistics, which arises from unequal stratum sizes at step 1A in a population survey. In case–control studies, both modifications match the performance of the other two-step methods and have a slightly higher TPR than the empirical Bayes one-step method proposed by Mukherjee and Chatterjee [4]. In all scenarios, the least powerful method is to proceed directly to a full interaction model, applying conventional genome-wide significance thresholds.

Case-only approach

The susceptibility of the case-only approach [2] to GEA is generally acknowledged and is also reflected by our simulations (Fig. 1). Moreover, we note that the power advantage of the case-only method over other, more rigorous approaches disappears with even a modest increase in the number of controls per case (Fig. 2, right panel).
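The case-only estimate itself is simply the G-E odds ratio among cases. Under a multiplicative (logistic) model, this odds ratio equals the interaction odds ratio multiplied by the G-E odds ratio among controls, so any GEA in controls is absorbed into the estimate; this is the susceptibility noted above. A minimal sketch, assuming a binary carrier status and a binary exposure; the function name and counts are hypothetical.

```python
import math


def case_only_or(n11, n10, n01, n00):
    """G x E odds ratio among cases only (case-only interaction estimate).

    n11: exposed carriers,      n10: unexposed carriers,
    n01: exposed non-carriers,  n00: unexposed non-carriers.
    Returns the odds ratio and the Woolf standard error of its logarithm.
    """
    if min(n11, n10, n01, n00) <= 0:
        raise ValueError("all cell counts must be positive")
    odds_ratio = (n11 * n00) / (n10 * n01)
    se_log_or = math.sqrt(1 / n11 + 1 / n10 + 1 / n01 + 1 / n00)
    return odds_ratio, se_log_or


# Hypothetical counts among cases
or_hat, se = case_only_or(60, 140, 240, 560)
print(round(or_hat, 6), round(se, 4))
```

With these hypothetical counts the odds ratio is exactly 1, i.e. no apparent interaction; no controls enter the calculation at all, which is why the estimate cannot separate interaction from GEA.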

Weighted combination of case-only and full interaction models

The approach by Mukherjee and Chatterjee [4] performs much better than the case-only approach in terms of FPR and much better than the full interaction model in terms of TPR, as shown in their recent comparison [9] and in our simulation study. However, its performance in terms of both TPR and FPR falls short of all two-step approaches.
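The weighting idea behind this estimator can be sketched as follows. This is our reading of the shrinkage form in [4], not the authors' code: the case-control (full model) interaction estimate and the case-only estimate are combined with a data-driven weight that favours the case-only estimate when the apparent G-E log odds ratio in controls (taken here as the difference of the two estimates) is small relative to the variance of the case-control estimate. Function and argument names are hypothetical.

```python
def eb_combination(beta_cc, var_cc, beta_co):
    """Shrinkage-type combination of interaction estimates (log-OR scale).

    beta_cc: case-control (full model) estimate, with variance var_cc.
    beta_co: case-only estimate.
    theta = beta_co - beta_cc is used as the estimated log G-E odds ratio
    in controls; the weight on beta_cc grows with theta**2 relative to var_cc.
    """
    if var_cc <= 0:
        raise ValueError("var_cc must be positive")
    theta_sq = (beta_co - beta_cc) ** 2
    w_cc = theta_sq / (theta_sq + var_cc)
    return w_cc * beta_cc + (1.0 - w_cc) * beta_co


# Hypothetical estimates: small vs large apparent G-E association in controls
print(eb_combination(0.40, 0.04, 0.45))   # close to the case-only value
print(eb_combination(0.40, 0.04, 1.40))   # close to the case-control value
```

The design makes the estimator efficient when GEA is absent and protects it against bias when GEA is present, which is consistent with its intermediate position between the case-only and full interaction models.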

Two-step procedures

The major advantage of all two-step approaches over single-step methods lies in reducing the number of tests performed in the full interaction model: since the two steps are asymptotically independent, only the tests performed in the second step require correction for multiple comparisons [6]. This advantage is particularly pronounced in a genome-wide scenario. When only a few SNPs, or SNPs in high linkage disequilibrium, are tested, the advantage is partially lost.
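The generic logic of a two-step procedure can be sketched as follows, assuming that screening and interaction p-values have already been computed for each SNP and that step 2 applies a Bonferroni correction over the m SNPs retained at step 1, as in the original approach [6]. The names are hypothetical, and the screening statistic is left abstract because it differs between the methods compared here.

```python
from typing import List, Sequence


def two_step(p_screen: Sequence[float], p_interaction: Sequence[float],
             alpha: float = 0.05, alpha1: float = 1e-4) -> List[int]:
    """Generic two-step interaction test (sketch).

    Step 1: keep SNPs whose screening p-value is below alpha1.
    Step 2: test only the m retained SNPs, at a Bonferroni level alpha / m.
    Returns the indices of SNPs declared significant at step 2.
    """
    if len(p_screen) != len(p_interaction):
        raise ValueError("p-value sequences must have equal length")
    selected = [i for i, p in enumerate(p_screen) if p < alpha1]
    m = len(selected)
    if m == 0:
        return []
    return [i for i in selected if p_interaction[i] < alpha / m]


# Hypothetical p-values for four SNPs
p_screen = [1e-6, 0.3, 5e-5, 0.9]
p_interaction = [0.01, 1e-9, 0.03, 1e-9]
print(two_step(p_screen, p_interaction))   # [0]
```

SNP 1, despite its very small interaction p-value, is never tested at step 2 because it fails the screen; in exchange, the step 2 threshold is 0.05/2 rather than 0.05/4.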

The original Murcray approach is limited by its restriction to a case–control study design and loses power with increasing numbers of controls at a constant step 1 threshold (Fig. 2) [12]. Furthermore, the original approach neglects additional information from the genotype-disease association. This has now been addressed by a modification recently published by the same authors [8]. However, the additional information obtained is not fully integrated, but only combined in a trade-off of type 1 error rates at step 1 through an arbitrary partitioning parameter “rho”. Over a broad range of intermediate values of rho, the method appears to be fairly insensitive to the choice of rho [8].
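To illustrate what a partitioning parameter does at step 1, consider a hypothetical rule in which a SNP passes if either its G-E screening p-value falls below ρ·α1 or its G-D screening p-value falls below (1 − ρ)·α1. This rule is an assumption made for illustration only and is not claimed to reproduce the exact procedure of [8]; it merely shows how the same SNP can pass or fail step 1 depending on an externally chosen ρ.

```python
def passes_step1(p_gd, p_ge, alpha1, rho):
    """Hypothetical step 1 rule with a partitioning parameter rho.

    A SNP passes if its G-E screening p-value is below rho * alpha1 or its
    G-D screening p-value is below (1 - rho) * alpha1. Illustration only.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must lie in [0, 1]")
    return p_ge < rho * alpha1 or p_gd < (1.0 - rho) * alpha1


# The same hypothetical SNP under two choices of rho
print(passes_step1(p_gd=3e-5, p_ge=0.2, alpha1=1e-4, rho=0.5))  # True
print(passes_step1(p_gd=3e-5, p_ge=0.2, alpha1=1e-4, rho=0.8))  # False
```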

Modifications of the two-step approach

In order to integrate as much information as possible, we have introduced a step 1B, in which we calculate the unweighted average of the association estimates from two gene-disease GWASes stratified by the environmental exposure [7]. Because the unweighted average induces some correlation between step 1 and step 2 (Supplemental Figs. 1 and 3), it is affected by a slightly increased FPR. Although integrating step 1A and step 1B via a χ² statistic with 2 degrees of freedom removed the correlation between steps 1 and 2 almost completely (rho = −0.03), there was still some evidence of an inflated FPR due to incomplete statistical independence between the two steps. Therefore, in a variant of our earlier approach, we integrated the stratified analyses by an inverse-variance weighted average, rendering it perfectly robust against false positive findings (Fig. 1, Table 3) while maintaining high power across a spectrum of degrees of GEA (Fig. 6). In practical terms, the inflated FPR of the unweighted approach led to only one false positive finding at a genome-wide alpha level of 0.05 (Table 3), which is more than compensated for by 68 true positive findings. In retrospect, these findings justify our previous analysis of a genome-wide data set [7]. However, the characteristics of the unweighted approach at a genome-wide alpha level of 0.001 were comparable to those of the weighted approach (or Murcray 2011 [8]) at a genome-wide alpha level of 0.3 (Table 3), ultimately suggesting that either a more liberal approach or a more progressive significance level should be applied to genome-wide interaction testing.
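Why only the weighted average is uncorrelated with step 2 can be checked with a short calculation. The sketch assumes two independent exposure strata with stratum-specific log odds ratios b_pos and b_neg and standard errors se_pos and se_neg, and takes the step 2 interaction estimate to be the difference b_pos − b_neg (which holds for a multiplicative interaction term with binary E). For any average a·b_pos + (1 − a)·b_neg, the covariance with that difference is a·se_pos² − (1 − a)·se_neg²: it vanishes for the inverse-variance weight but not for a = 1/2 when the strata differ in size. Function names and numbers are hypothetical.

```python
import math


def unweighted_average(b_pos, se_pos, b_neg, se_neg):
    """Simple mean of two independent stratum-specific log odds ratios."""
    est = (b_pos + b_neg) / 2.0
    se = math.sqrt(se_pos ** 2 + se_neg ** 2) / 2.0
    return est, se


def inverse_variance_average(b_pos, se_pos, b_neg, se_neg):
    """Inverse-variance weighted (fixed-effect) mean of the two log odds ratios."""
    w_pos, w_neg = 1.0 / se_pos ** 2, 1.0 / se_neg ** 2
    est = (w_pos * b_pos + w_neg * b_neg) / (w_pos + w_neg)
    se = 1.0 / math.sqrt(w_pos + w_neg)
    return est, se


def cov_with_difference(a, se_pos, se_neg):
    """Cov(a*b_pos + (1-a)*b_neg, b_pos - b_neg) for independent strata."""
    return a * se_pos ** 2 - (1.0 - a) * se_neg ** 2


# Hypothetical unequal strata: few exposed (large SE), many unexposed
se_pos, se_neg = 0.20, 0.05
a_iv = (1 / se_pos ** 2) / (1 / se_pos ** 2 + 1 / se_neg ** 2)
print(cov_with_difference(0.5, se_pos, se_neg))   # nonzero (about 0.019)
print(cov_with_difference(a_iv, se_pos, se_neg))  # zero up to rounding
```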

The two symmetric components of step 1 are uncorrelated (Supplemental Fig. 2) and can thus be integrated by a two degree of freedom test, avoiding the need to choose a partitioning parameter at step 1. A further advantage of this approach is enhanced power for cross-over interactions [13] with equally weak main genetic and environmental effects, a situation in which Murcray 2011 [8] is relatively less powerful (Fig. 3). Moreover, the weighted approach is more powerful than Murcray 2011 [8] under dominant genetic models (Fig. 7).
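Because the two components are uncorrelated, their squared z-statistics can simply be added. A minimal sketch, assuming each component is summarised as an approximately standard-normal z-statistic under the null; for 2 degrees of freedom the χ² survival function has the closed form exp(−x/2), so no statistics library is needed. The function name is hypothetical.

```python
import math


def two_df_step1(z_a, z_b):
    """Combine two uncorrelated, approximately N(0, 1) step 1 statistics.

    Under the null, z_a**2 + z_b**2 is chi-square with 2 df, whose
    survival function has the closed form exp(-x / 2).
    """
    chi2 = z_a ** 2 + z_b ** 2
    return chi2, math.exp(-chi2 / 2.0)


chi2, p_joint = two_df_step1(3.0, 3.0)
p_single = math.erfc(3.0 / math.sqrt(2.0))   # two-sided p for one z = 3
print(chi2, p_joint, p_single)
```

With z = 3 in both components, the combined p-value is exp(−9) ≈ 1.2 × 10−4, smaller than the two-sided p ≈ 0.0027 of either component alone, which illustrates how jointly moderate signals can pass step 1.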

In the situation we were facing, with a single non-genetic exposure of interest, the environment-disease association was fixed and therefore identical for all SNPs. However, in situations where a multitude of environmental exposures are present, one might integrate the environment-disease associations as a step 1C and perform a three degree of freedom test at step 1. This extension to a step 1C would be appealing (and symmetrical) in the search for G*G interactions, where the “E” could be one of many thousands of SNPs [14].
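The proposed step 1C extension would add a third squared z-statistic. A sketch under the same assumptions (three mutually uncorrelated, approximately standard-normal components); the χ² survival function for 3 degrees of freedom has the closed form erfc(√(x/2)) + √(2x/π)·exp(−x/2). The names are hypothetical.

```python
import math


def chi2_sf_3df(x):
    """Survival function of a chi-square distribution with 3 df."""
    if x <= 0:
        return 1.0
    return (math.erfc(math.sqrt(x / 2.0))
            + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0))


def three_df_step1(z_a, z_b, z_c):
    """Combine three uncorrelated, approximately N(0, 1) step 1 statistics."""
    chi2 = z_a ** 2 + z_b ** 2 + z_c ** 2
    return chi2, chi2_sf_3df(chi2)


print(round(chi2_sf_3df(7.8147), 4))   # about 0.05, the usual 3 df critical value
```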

A further advantage of the stratified assessment of the associations in step 1, over the unstratified assessment as proposed in the original two-step approach, is that it takes mutual confounding of genetic and environmental effects into account. In theory, the Murcray methods [6, 8] are prone to confounding at step 1, since the environmental exposure has presumably been selected because of prior knowledge or suspicion of an environment-disease association, leading to a spurious GEA for SNPs that are associated with disease, even if there is no G*E interaction. In practice, confounding may explain the weak correlation of step 1A and step 1B of Murcray 2011 [8] (Supplemental Fig. 2).
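The mechanism behind this confounding can be illustrated with a pooled sample of cases and controls. The sketch assumes, purely for illustration, that G and E are independent within cases and within controls; if both the carrier frequency and the exposure frequency differ between cases and controls, the pooled sample nonetheless shows a G-E odds ratio above 1, which an unstratified step 1 G-E test would pick up. The function name and frequencies are hypothetical.

```python
def pooled_ge_odds_ratio(pg_case, pe_case, pg_ctrl, pe_ctrl, frac_case=0.5):
    """G-E odds ratio in a pooled sample of cases and controls.

    pg_*: carrier frequency, pe_*: exposure frequency in cases / controls.
    Assumes, for illustration, that G and E are independent within cases
    and within controls, so any pooled association comes from mixing.
    """
    def cell(g, e):
        case = (pg_case if g else 1 - pg_case) * (pe_case if e else 1 - pe_case)
        ctrl = (pg_ctrl if g else 1 - pg_ctrl) * (pe_ctrl if e else 1 - pe_ctrl)
        return frac_case * case + (1 - frac_case) * ctrl

    return (cell(1, 1) * cell(0, 0)) / (cell(1, 0) * cell(0, 1))


# Hypothetical frequencies: SNP and exposure both more common in cases
print(pooled_ge_odds_ratio(0.30, 0.40, 0.25, 0.30))   # about 1.03
# SNP not associated with disease: no spurious G-E association
print(pooled_ge_odds_ratio(0.25, 0.40, 0.25, 0.30))   # 1.0 up to rounding
```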

As a first approximation, we selected the step 1 significance threshold as the square root of the genome-wide significance level, i.e. α1 ≈ 2.7 × 10−4, in order to distribute the overall alpha equally (on a logarithmic scale) between the two steps (ψ = 0.5). The simulation experiment, however, revealed that the optimal threshold depended on PD and PG. Ignoring PG, the best choice of ψ would be between 0.5 and 0.6 in a population survey setting (at PD ≈ 0.1) and around 0.8 in a case–control setting (Fig. 5). The finding that the TPR is maximized at ψ = 1 in the case–control setting is in line with a recent commentary by Thomas et al. [12]. However, choosing such an extreme partitioning between steps 1 and 2 would imply that the step 1 statistic alone is taken as evidence of gene–environment interaction, i.e. all variants passing step 1 also pass step 2. Without a supplementary test of the full interaction model, the overall analysis is highly prone to false positives in the context of gene-disease or gene–environment associations in the total population.

The two degree of freedom test we used in this approach is fundamentally different from the two degree of freedom test suggested by Kraft et al. [5] to detect genetic effects in an entire population or in subsamples characterized by specific exposures. The Kraft 2 df test compares the full interaction model (D = E + G + G*E) with a model containing only the marginal environmental effect (D = E); its likelihood ratio χ² equals the sum of the likelihood ratio χ² statistics of the two models D = G in E+ and D = G in E− (models iv and v in Table 2). This differs from our step 1B, which is a 1 df test of the pooled stratum-specific gene-disease associations. The purpose of the Kraft approach is not to distinguish between marginal genetic effects and interactions; inherently, it is less specific as a test for interactions than the methods evaluated in this paper.
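The identity between the Kraft 2 df statistic and the sum of the two stratum-specific likelihood ratio χ² statistics can be checked directly for binary G and E, where the model D = E + G + G*E is saturated over the four G-E cells. The sketch below works from (cases, controls) counts per cell; the table layout, function names and counts are hypothetical, and the p-value again uses the 2 df closed form exp(−x/2).

```python
import math


def _loglik(cells):
    """Binomial log-likelihood at the MLE, one probability per cell.

    cells: iterable of (cases, controls) pairs.
    """
    ll = 0.0
    for cases, controls in cells:
        n = cases + controls
        for k in (cases, controls):
            if k > 0:
                ll += k * math.log(k / n)
    return ll


def _pool(a, b):
    """Add two (cases, controls) pairs."""
    return (a[0] + b[0], a[1] + b[1])


def kraft_2df(table):
    """LR test of D = E + G + G*E against D = E for binary G and E.

    table[(g, e)] = (cases, controls). The full model is saturated over the
    four G-E cells; the reduced model has one probability per E stratum.
    """
    full = _loglik(table[(g, e)] for g in (0, 1) for e in (0, 1))
    reduced = _loglik(_pool(table[(0, e)], table[(1, e)]) for e in (0, 1))
    lr = 2.0 * (full - reduced)
    return lr, math.exp(-lr / 2.0)   # chi-square 2 df survival function


def stratum_lr(table, e):
    """LR chi-square (1 df) for D = G within exposure stratum e."""
    cells = [table[(0, e)], table[(1, e)]]
    return 2.0 * (_loglik(cells) - _loglik([_pool(*cells)]))


# Hypothetical counts: (cases, controls) by (G, E)
table = {(0, 0): (100, 200), (1, 0): (60, 90),
         (0, 1): (80, 100), (1, 1): (70, 50)}
lr, p = kraft_2df(table)
print(lr, stratum_lr(table, 0) + stratum_lr(table, 1), p)
```

The two printed statistics coincide, whereas a step 1B-type test would instead pool the two stratum-specific gene-disease estimates into a single 1 df statistic.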

Here we have focused on modelling G*E interaction by multiplicative interaction terms in order to render the current state-of-the-art analysis methods comparable. Other modelling strategies, e.g. those based on risk differences, would require special analysis techniques that remain to be explored in the future.


This extensive set of simulations, using a range of parameters that might apply in real-world settings, confirms the practical advantage of two-step approaches to interaction testing over more conventional one-step designs, at least in the context of dichotomous disease outcomes. The method can be easily adapted to the assessment of gene–gene interactions, as discussed above. The underlying concept of averaging associations between two potentially interacting variables (E and G) across strata defined by the level of an outcome (D) could, in theory, be extended to genome-wide G*E or G*G interaction analyses for continuous outcome variables, such as physiological traits or levels of gene expression. However, further developmental work is required to evaluate the detailed application in such circumstances.



This work was supported by the European Commission as part of GABRIEL (A multidisciplinary study to identify the genetic and environmental causes of asthma in the European Community) Contract Number 018996 under the Integrated Program LSH-2004-1.2.5-1. M.J.E. received the Stephan-Weiland Fellowship of the GABRIEL consortium.

Supplementary material

Supplementary material 1 (PDF 255 kb)


1. Thomas D. Gene–environment-wide association studies: emerging approaches. Nat Rev Genet. 2010;11(4):259–72.
2. Piegorsch WW, Weinberg CR, Taylor JA. Non-hierarchical logistic models and case-only designs for assessing susceptibility in population-based case-control studies. Stat Med. 1994;13(2):153–62.
3. Mukherjee B, Ahn J, Gruber SB, Rennert G, Moreno V, Chatterjee N. Tests for gene–environment interaction from case–control data: a novel study of type I error, power and designs. Genet Epidemiol. 2008;32(7):615–26.
4. Mukherjee B, Chatterjee N. Exploiting gene–environment independence for analysis of case-control studies: an empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics. 2008;64(3):685–94.
5. Kraft P, Yen YC, Stram DO, Morrison J, Gauderman WJ. Exploiting gene–environment interaction to detect genetic associations. Hum Hered. 2007;63(2):111–9.
6. Murcray CE, Lewinger JP, Gauderman WJ. Gene–environment interaction in genome-wide association studies. Am J Epidemiol. 2009;169(2):219–26.
7. Ege MJ, Strachan DP, Cookson WO, Moffatt MF, Gut I, Lathrop M, et al. Gene–environment interaction for childhood asthma and exposure to farming in Central Europe. J Allergy Clin Immunol. 2011;127(1):138–44.
8. Murcray CE, Lewinger JP, Conti DV, Thomas DC, Gauderman WJ. Sample size requirements to detect gene–environment interactions in genome-wide association studies. Genet Epidemiol. 2011;35(3):201–10.
9. Mukherjee B, Ahn J, Gruber SB, Chatterjee N. Testing gene–environment interaction in large-scale case-control association studies: possible choices and comparisons. Am J Epidemiol. 2012;175(3):177–90.
10. Dudbridge F, Gusnanto A. Estimation of significance thresholds for genomewide association scans. Genet Epidemiol. 2008;32(3):227–34.
11. Chatterjee N, Wacholder S. Invited commentary: efficient testing of gene–environment interaction. Am J Epidemiol. 2009;169(2):231–3. doi:10.1093/aje/kwn352.
12. Thomas DC, Lewinger JP, Murcray CE, Gauderman WJ. Invited commentary: GE-Whiz! ratcheting gene–environment studies up to the whole genome and the whole exposome. Am J Epidemiol. 2012;175(3):203–7.
13. Wason JM, Dudbridge F. A general framework for two-stage analysis of genome-wide association studies and its application to case-control studies. Am J Hum Genet. 2012;90(5):760–73.
14. Kooperberg C, Leblanc M. Increasing the power of identifying gene × gene interactions in genome-wide association studies. Genet Epidemiol. 2008;32(3):255–63.

Copyright information

© Springer Science+Business Media Dordrecht 2013

Authors and Affiliations

1. Dr. von Hauner Children’s Hospital, Ludwig Maximilians University Munich, Munich, Germany
2. Division of Community Health Sciences, St George’s, University of London, London, UK
