
The impact of exposure-biased sampling designs on detection of gene–environment interactions in case–control studies with potential exposure misclassification

  • METHODS
European Journal of Epidemiology

Abstract

With limited funding and biological specimen availability, choosing an optimal sampling design to maximize power for detecting gene-by-environment (G–E) interactions is critical. Exposure-enriched sampling is often used to select subjects with rare exposures for genotyping to enhance power for tests of G–E effects. However, exposure misclassification (MC) combined with biased sampling can affect characteristics of tests for G–E interaction and joint tests for marginal association and G–E interaction. Here, we characterize the impact of exposure-biased sampling under conditions of perfect exposure information and exposure MC on properties of several methods for conducting inference. We assess the Type I error, power, bias, and mean squared error properties of case-only, case–control, and empirical Bayes methods for testing/estimating G–E interaction and a joint test for marginal G (or E) effect and G–E interaction across three biased sampling schemes. Properties are evaluated via empirical simulation studies. With perfect exposure information, exposure-enriched sampling schemes enhance power as compared to random selection of subjects irrespective of exposure prevalence but yield bias in estimation of the G–E interaction and marginal E parameters. Exposure MC modifies the relative performance of sampling designs when compared to the case of perfect exposure information. Those conducting G–E interaction studies should be aware of exposure MC properties and the prevalence of exposure when choosing an ideal sampling scheme and method for characterizing G–E interactions and joint effects.
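The relationship between the case-only and case–control interaction estimators named above can be sketched for the simplest setting. This is an illustration only, assuming binary G and binary E and a saturated logistic model; the function names and the cell counts are hypothetical and are not taken from the article. In that setting the case–control estimate of the interaction log-odds ratio equals the G–E log-odds ratio in cases minus the G–E log-odds ratio in controls, and the case-only estimate is the G–E log-odds ratio in cases alone, which is a valid interaction estimate only when G and E are independent in controls.

```python
import math


def log_or(n11, n10, n01, n00):
    """Log odds ratio of a 2x2 G-by-E table.

    Cell order: (G=1,E=1), (G=1,E=0), (G=0,E=1), (G=0,E=0).
    All counts are assumed to be positive.
    """
    return math.log((n11 * n00) / (n10 * n01))


def interaction_estimates(cases, controls):
    """Return case-only and case-control estimates of the G-E interaction.

    `cases` and `controls` are 4-tuples of counts in the cell order
    used by `log_or`. Hypothetical helper for illustration.
    """
    case_only = log_or(*cases)          # G-E log OR among cases
    ge_controls = log_or(*controls)     # G-E log OR among controls
    case_control = case_only - ge_controls
    return {
        "case_only": case_only,
        "ge_controls": ge_controls,
        "case_control": case_control,
    }


# Hypothetical counts: G and E are exactly independent in controls,
# so the two interaction estimates coincide.
cases = (40, 60, 100, 300)
controls = (25, 75, 100, 300)
est = interaction_estimates(cases, controls)
```

With these made-up counts the G–E odds ratio in controls is exactly 1, so both estimators return the same value. Changing the control counts so that G and E are associated makes the two estimates diverge, which is the discrepancy that empirical Bayes shrinkage is designed to trade off.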

Abbreviations

CC: Case–control
CO: Case-only
EB: Empirical Bayes
G: Genetic variant
E: Environmental exposure
D: Disease/outcome status
αg: Marginal log-odds ratio associated with the genetic factor
MAg: Marginal genetic association
αe: Marginal log-odds ratio associated with the environmental factor
MAe: Marginal environmental association
ORge: Odds ratio for the association between the genetic and environmental variables in controls
βg: Main effect log-odds ratio associated with the genetic factor
ORg: exp(βg)
βe: Main effect log-odds ratio associated with the environmental factor
ORe: exp(βe)
βg×e: Gene-by-environment interaction log-odds ratio
ORg×e: exp(βg×e)
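
For orientation, the coefficients defined in this list are conventionally related through a logistic disease model with an interaction term and two marginal models. The display below is a sketch under that assumption, not a restatement of the article's own equations. G and E are taken to be binary, and the intercepts \beta_0, \alpha_{0g}, and \alpha_{0e} are notation introduced here for illustration.

```latex
\begin{align}
\operatorname{logit} P(D=1 \mid G, E) &= \beta_0 + \beta_g G + \beta_e E + \beta_{g\times e}\, G E, \\
\operatorname{logit} P(D=1 \mid G) &= \alpha_{0g} + \alpha_g G, \\
\operatorname{logit} P(D=1 \mid E) &= \alpha_{0e} + \alpha_e E, \\
OR_{ge} &= \frac{P(G=1, E=1 \mid D=0)\, P(G=0, E=0 \mid D=0)}
               {P(G=1, E=0 \mid D=0)\, P(G=0, E=1 \mid D=0)}.
\end{align}
```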

Acknowledgments

Support for this study was provided by National Science Foundation DMS 1007494, National Institutes of Health ES 20811, National Institutes of Health CA 156608, and National Institutes of Health CA 148107. Funding for SLS was provided by the National Human Genome Research Institute at the National Institutes of Health (T32 HG00040), the National Institute of Environmental Health Sciences at the National Institutes of Health (T32 ES013678), and a fellowship from the University of Michigan Rackham Graduate School. Funding for PB and BM was partially provided by the University of Michigan Cancer Center Support Grant NIH P30 CA 046592.

Conflict of interest

The authors declare no conflicts of interest.

Author information

Corresponding author

Correspondence to Stephanie L. Stenzel.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary material 1 (PDF 624 kb)

Cite this article

Stenzel, S.L., Ahn, J., Boonstra, P.S. et al. The impact of exposure-biased sampling designs on detection of gene–environment interactions in case–control studies with potential exposure misclassification. Eur J Epidemiol 30, 413–423 (2015). https://doi.org/10.1007/s10654-014-9908-1

  • DOI: https://doi.org/10.1007/s10654-014-9908-1
