Review of Statistical Methods for Gene-Environment Interaction Analysis

  • Genetic Epidemiology (C Amos, Section Editor)
Current Epidemiology Reports


Purpose of Review

Complex diseases arise from a combination of genetic and environmental factors, which makes their mechanisms difficult to disentangle. Understanding the interplay between the two is important because genes do not operate in isolation but in complex networks and pathways that are influenced by environmental factors. New technologies have made massive amounts of genetic data available, and various statistical methods have been developed to analyze these data and to identify interactions between genes and the environment, i.e., gene-environment (G-E) interactions.

Recent Findings

In this review, we introduce statistical methods for identifying G-E interactions in case-control studies. We first review disease risk models for the joint effects of genetic and environmental factors, such as multiplicative and additive models. We then introduce inference methods under these models: the standard prospective likelihood, the case-only design, a retrospective likelihood that exploits a gene-environment independence assumption to boost power, and an empirical Bayes-type approach that uses the independence assumption in a data-adaptive way. We also introduce several tests for detecting genetic associations in the presence of G-E interactions, including a joint test and a maximum score test, which unifies a class of disease risk models by maximizing over the corresponding class of score tests.
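As a rough illustration of the multiplicative and additive scales and of the case-only idea, the sketch below works through a single binary genotype G and binary exposure E. The counts are hypothetical (not taken from any study), and the helper names `odds_ratio`, `cross_product`, and `woolf_se` are introduced here purely for illustration. It assumes a rare disease, so that odds ratios approximate relative risks, and it uses the simple Woolf large-sample standard error rather than any of the likelihood-based methods named above.

```python
import math

# Hypothetical case-control counts keyed by (G, E); 1 = carrier / exposed.
CASES = {(0, 0): 200, (1, 0): 120, (0, 1): 260, (1, 1): 270}
CONTROLS = {(0, 0): 400, (1, 0): 200, (0, 1): 300, (1, 1): 150}


def odds_ratio(g, e):
    """Odds ratio for stratum (g, e) relative to the reference stratum (0, 0)."""
    return (CASES[(g, e)] * CONTROLS[(0, 0)]) / (CONTROLS[(g, e)] * CASES[(0, 0)])


def cross_product(table):
    """G-E odds ratio within one 2x2 table (cases only, or controls only)."""
    return (table[(1, 1)] * table[(0, 0)]) / (table[(1, 0)] * table[(0, 1)])


def woolf_se(*tables):
    """Large-sample SE of a log odds ratio: sqrt of the summed reciprocal counts."""
    return math.sqrt(sum(1.0 / n for t in tables for n in t.values()))


or_g, or_e, or_ge = odds_ratio(1, 0), odds_ratio(0, 1), odds_ratio(1, 1)

# Multiplicative scale: departure of the joint OR from the product of the two ORs.
mult_interaction = or_ge / (or_g * or_e)

# Additive scale: relative excess risk due to interaction (RERI) on the OR scale.
reri = or_ge - or_g - or_e + 1.0

# Case-only estimate: the G-E odds ratio among cases. It equals the
# multiplicative interaction only if G and E are independent in controls.
case_only = cross_product(CASES)
control_ge = cross_product(CONTROLS)

se_standard = woolf_se(CASES, CONTROLS)  # uses all eight cells
se_case_only = woolf_se(CASES)           # uses the four case cells only

print(f"multiplicative interaction OR: {mult_interaction:.3f}")
print(f"RERI (additive interaction):   {reri:.3f}")
print(f"case-only OR: {case_only:.3f}, G-E OR in controls: {control_ge:.3f}")
print(f"SE(log OR): standard {se_standard:.3f}, case-only {se_case_only:.3f}")
```

In these made-up counts G and E are exactly independent among controls, so the case-only estimate coincides with the standard multiplicative interaction estimate while having a smaller standard error. When the independence assumption fails, the case-only estimate is biased by the factor `control_ge`, which is the trade-off that the data-adaptive approaches aim to manage.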


Summary

G-E interaction analysis faces several challenges, including difficulty replicating findings. More powerful statistical methods for detecting interactions help, but ultimately larger sample sizes, obtained through consortium-based studies, are needed to achieve adequate power for G-E analysis.
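To give a feel for why sample size dominates, the following sketch approximates the power of a two-sided 5% Wald test for a multiplicative interaction as the number of cases and controls grows. Everything in it is an assumption made for illustration: the cell proportions are hypothetical, cases and controls are taken to be equal in number, the standard error is the Woolf large-sample approximation, and the case-only comparison is valid only because the hypothetical controls show no G-E association.

```python
import math

# Hypothetical (G, E) cell proportions among cases and among controls.
CASE_PROPS = {(0, 0): 0.40, (1, 0): 0.15, (0, 1): 0.30, (1, 1): 0.15}
CONTROL_PROPS = {(0, 0): 0.45, (1, 0): 0.15, (0, 1): 0.30, (1, 1): 0.10}

Z_CRIT = 1.959964  # two-sided 5% critical value of the standard normal


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def cross_product(props):
    """G-E odds ratio within one group."""
    return (props[(1, 1)] * props[(0, 0)]) / (props[(1, 0)] * props[(0, 1)])


def wald_power(log_or, se):
    """Approximate two-sided power of a Wald test of log_or = 0."""
    shift = abs(log_or) / se
    return normal_cdf(shift - Z_CRIT) + normal_cdf(-shift - Z_CRIT)


def interaction_power(n_per_group, case_only=False):
    """Power with n_per_group cases and, for the standard test, as many controls."""
    log_or = math.log(cross_product(CASE_PROPS) / cross_product(CONTROL_PROPS))
    variance = sum(1.0 / (n_per_group * p) for p in CASE_PROPS.values())
    if not case_only:
        variance += sum(1.0 / (n_per_group * p) for p in CONTROL_PROPS.values())
    return wald_power(log_or, math.sqrt(variance))


for n in (1000, 5000, 20000):
    standard = interaction_power(n)
    co = interaction_power(n, case_only=True)
    print(f"n = {n:>6} per group: standard {standard:.2f}, case-only {co:.2f}")
```

Under these assumptions the case-only test is more powerful at every sample size, but neither test is well powered at the smallest size, which is consistent with the point that methodological gains complement rather than replace larger studies.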

Author information

Corresponding author

Correspondence to Nilanjan Chatterjee.

Ethics declarations

Conflict of Interest

The authors declare that they have no conflict of interest.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

This article is part of the Topical Collection on Genetic Epidemiology

Electronic Supplementary Material

Supplemental Figure 1

The R CGEN package: various likelihoods, statistical models, and functions for testing GxE interactions (PDF 29 kb)

Cite this article

Han, S.S., Chatterjee, N. Review of Statistical Methods for Gene-Environment Interaction Analysis. Curr Epidemiol Rep 5, 39–45 (2018).
