Abstract
With limited funding and biological specimen availability, choosing an optimal sampling design to maximize power for detecting gene-by-environment (G–E) interactions is critical. Exposure-enriched sampling is often used to select subjects with rare exposures for genotyping to enhance power for tests of G–E effects. However, exposure misclassification (MC) combined with biased sampling can affect characteristics of tests for G–E interaction and joint tests for marginal association and G–E interaction. Here, we characterize the impact of exposure-biased sampling under conditions of perfect exposure information and exposure MC on properties of several methods for conducting inference. We assess the Type I error, power, bias, and mean squared error properties of case-only, case–control, and empirical Bayes methods for testing/estimating G–E interaction and a joint test for marginal G (or E) effect and G–E interaction across three biased sampling schemes. Properties are evaluated via empirical simulation studies. With perfect exposure information, exposure-enriched sampling schemes enhance power as compared to random selection of subjects irrespective of exposure prevalence but yield bias in estimation of the G–E interaction and marginal E parameters. Exposure MC modifies the relative performance of sampling designs when compared to the case of perfect exposure information. Those conducting G–E interaction studies should be aware of exposure MC properties and the prevalence of exposure when choosing an ideal sampling scheme and method for characterizing G–E interactions and joint effects.
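As a rough illustration of the estimators named above (not the authors' simulation code), the sketch below computes case-only and case–control estimates of the G–E interaction log-odds ratio from 2×2 tables of binary G by binary E, and shows how nondifferential exposure misclassification shifts the expected cell counts. The function names, the table layout, the counts, and the sensitivity and specificity values are all hypothetical choices made for this example.

```python
import math

# Each table is (n11, n10, n01, n00): counts for (G=1,E=1), (G=1,E=0),
# (G=0,E=1), (G=0,E=0). This layout is an assumption of the sketch.

def log_or(table):
    """Log odds ratio between G and E in one 2x2 table."""
    n11, n10, n01, n00 = table
    return math.log((n11 * n00) / (n10 * n01))

def interaction_estimates(cases, controls):
    """Case-only (CO) and case-control (CC) interaction log-odds ratios.

    CO uses the G-E odds ratio among cases alone, which is valid only
    under G-E independence and a rare disease. CC subtracts the G-E
    log odds ratio observed among controls.
    """
    co = log_or(cases)
    cc = co - log_or(controls)
    return {"CO": co, "CC": cc}

def misclassify(table, sens, spec):
    """Expected counts after nondifferential misclassification of E.

    sens = P(observed E=1 | true E=1); spec = P(observed E=0 | true E=0).
    Misclassification is applied separately within each genotype stratum.
    """
    n11, n10, n01, n00 = table
    return (
        sens * n11 + (1 - spec) * n10,   # G=1, observed E=1
        (1 - sens) * n11 + spec * n10,   # G=1, observed E=0
        sens * n01 + (1 - spec) * n00,   # G=0, observed E=1
        (1 - sens) * n01 + spec * n00,   # G=0, observed E=0
    )

# Hypothetical counts: G and E are independent among controls.
cases = (40, 60, 20, 80)
controls = (25, 75, 25, 75)

true_est = interaction_estimates(cases, controls)
noisy_est = interaction_estimates(
    misclassify(cases, sens=0.8, spec=0.9),
    misclassify(controls, sens=0.8, spec=0.9),
)
```

With these illustrative counts the two estimators agree because G and E are exactly independent among controls, and the misclassified tables give a case-only estimate closer to zero than the error-free one.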
Abbreviations
- CC: Case–control
- CO: Case-only
- EB: Empirical Bayes
- G: Genetic variant
- E: Environmental exposure
- D: Disease/outcome status
- αg: Marginal log-odds ratio associated with the genetic factor
- MAg: Marginal genetic association
- αe: Marginal log-odds ratio associated with the environmental factor
- MAe: Marginal environmental association
- ORge: Odds ratio for the association between the genetic and environmental variables in controls
- βg: Main effect log-odds ratio associated with the genetic factor
- ORg: exp(βg)
- βe: Main effect log-odds ratio associated with the environmental factor
- ORe: exp(βe)
- βg×e: Gene-by-environment interaction log-odds ratio
- ORg×e: exp(βg×e)
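One way to read these symbols together is through the standard logistic models for disease status. The display below is a sketch under stated assumptions rather than a formula quoted from the article: it assumes binary G and E coded 0/1, and the intercepts β0, α0, and α0′ are introduced here for illustration only.

```latex
% Joint model: main effects and G-E interaction
\operatorname{logit} P(D = 1 \mid G, E)
  = \beta_0 + \beta_g G + \beta_e E + \beta_{g \times e}\, G E

% Marginal models: each factor considered alone
\operatorname{logit} P(D = 1 \mid G) = \alpha_0 + \alpha_g G,
\qquad
\operatorname{logit} P(D = 1 \mid E) = \alpha_0' + \alpha_e E

% Odds ratios are the exponentiated log-odds ratios
OR_g = e^{\beta_g}, \qquad
OR_e = e^{\beta_e}, \qquad
OR_{g \times e} = e^{\beta_{g \times e}}
```

Under this reading, a test of G–E interaction examines βg×e = 0, while a joint test of marginal G association and interaction examines αg = 0 and βg×e = 0 together.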
Acknowledgments
Support for this study was provided by National Science Foundation DMS 1007494, National Institutes of Health ES 20811, National Institutes of Health CA 156608, and National Institutes of Health CA 148107. Funding for SLS was provided by the National Human Genome Research Institute at the National Institutes of Health (T32 HG00040), the National Institute of Environmental Health Sciences at the National Institutes of Health (T32 ES013678), and a fellowship from the University of Michigan Rackham Graduate School. Funding for PB and BM was partially provided by the University of Michigan Cancer Center Support Grant NIH P30 CA 046592.
Conflict of interest
The authors declare no conflicts of interest.
Cite this article
Stenzel, S.L., Ahn, J., Boonstra, P.S. et al. The impact of exposure-biased sampling designs on detection of gene–environment interactions in case–control studies with potential exposure misclassification. Eur J Epidemiol 30, 413–423 (2015). https://doi.org/10.1007/s10654-014-9908-1
DOI: https://doi.org/10.1007/s10654-014-9908-1