Power Analysis of Exposure Mixture Studies Via Monte Carlo Simulations

Statistics in Biosciences

Abstract

Estimating sample size and statistical power is an essential part of a good epidemiological study design. Closed-form formulas exist for simple hypothesis tests but not for advanced statistical methods designed for exposure mixture studies. Estimating power with Monte Carlo simulations is flexible and applicable to these methods. However, coding a simulation is not straightforward for inexperienced programmers, and it is often hard for a researcher to manually specify multivariate associations among exposure mixtures to set up a simulation. To simplify this process, we present the R package mpower for power analysis of observational studies of environmental exposure mixtures involving recently developed mixtures analysis methods. The components within mpower are also versatile enough to accommodate mixtures methods developed in the future. The package allows users to simulate realistic exposure data and mixed-type covariates based on public datasets such as the National Health and Nutrition Examination Survey or other existing datasets from prior studies. Users can generate power curves to assess the trade-offs between sample size, effect size, and power of a design. This paper presents tutorials and examples of power analysis using mpower.

Data Availability

Publicly available on GitHub (https://github.com/phuchonguyen/mpower).

Code Availability

Publicly available on GitHub (https://github.com/phuchonguyen/mpower).

Funding

This work was partially supported by Grants R01ES027498 and R01ES028804 of the National Institute of Environmental Health Sciences of the United States National Institutes of Health.

Author information

Contributions

AHH devised and supervised the project. PHN developed the software and examples and wrote the first draft of the manuscript. SME provided critical feedback and helped shape features of the software. All authors contributed to the final manuscript.

Corresponding author

Correspondence to Phuc H. Nguyen.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Consent to Participate

Not applicable.

Consent for Publication

Not applicable.

Ethical Approval

Not applicable.

Appendix A Estimated Signal-to-Noise Ratio as a Function of m

We will estimate the SNR of the following data-generating process using different values for m:

[Code listing of the data-generating process not available in this copy]

Fig. 7 The estimated SNR for the linear model example is unbiased but the standard error might be large with a small sample of simulated data. The red horizontal line is the ground truth SNR (Color figure online)

Since the predictors are independent standard normal variables and the noise variance is 1, we can calculate the true SNR as \([0.3^2(1) + 0.3^2(1)]/1 = 0.18\). Figure 7 shows the estimated SNR and its standard error from 1000 bootstrap replicates for m \(\in \{500, 5000, 50000, 10000, 200000\}\). A larger m results in a more precise estimate. When the mixture model is defined based on resampling, it may not be possible to choose a large m without duplicating observations and underestimating the signal.
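The relationship between m and the precision of the estimated SNR can be illustrated with a minimal sketch. It is not the mpower implementation and uses only the Python standard library. It assumes a linear model with two independent standard normal predictors, both with coefficient 0.3, and noise variance 1, which is the setup implied by the true-SNR calculation above. The function names `simulate_signal`, `estimate_snr`, and `bootstrap_se` are hypothetical, the estimator (sample variance of the linear predictor divided by the noise variance) is an assumption suited to this linear case, and the sketch uses 200 bootstrap replicates rather than 1000 to keep the run short.

```python
import math
import random


def sample_variance(xs):
    """Unbiased sample variance."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


def simulate_signal(m, rng, beta=(0.3, 0.3)):
    """Draw m values of the linear predictor x'beta with independent N(0, 1) predictors.

    The coefficients are an assumption taken from the true-SNR formula in the text.
    """
    return [sum(b * rng.gauss(0.0, 1.0) for b in beta) for _ in range(m)]


def estimate_snr(signal, noise_var=1.0):
    """Assumed estimator: variance of the simulated signal over the noise variance."""
    return sample_variance(signal) / noise_var


def bootstrap_se(signal, rng, n_boot=200, noise_var=1.0):
    """Bootstrap standard error of the SNR estimate (resampling with replacement)."""
    m = len(signal)
    reps = [estimate_snr(rng.choices(signal, k=m), noise_var) for _ in range(n_boot)]
    return math.sqrt(sample_variance(reps))


# Ground truth from the text: [0.3^2 * 1 + 0.3^2 * 1] / 1
true_snr = (0.3 ** 2 * 1 + 0.3 ** 2 * 1) / 1

rng = random.Random(2023)  # fixed seed so the sketch is reproducible
results = {}
for m in (500, 5000):
    signal = simulate_signal(m, rng)
    results[m] = (estimate_snr(signal), bootstrap_se(signal, rng))
    print(f"m={m:>5}  SNR estimate={results[m][0]:.3f}  bootstrap s.e.={results[m][1]:.4f}")
```

Under these assumptions the standard error shrinks roughly in proportion to \(1/\sqrt{m}\), so a tenfold increase in m reduces it by about a factor of three.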

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Nguyen, P.H., Herring, A.H. & Engel, S.M. Power Analysis of Exposure Mixture Studies Via Monte Carlo Simulations. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09385-7

  • DOI: https://doi.org/10.1007/s12561-023-09385-7
