Purpose of Review
The purpose of this review is to outline the main questions in environmental mixtures research and provide a non-technical explanation of novel or advanced methods to answer these questions.
Recent Findings
Machine learning techniques are now being incorporated into environmental mixtures research to overcome limitations of traditional methods. Although some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. We discuss four main questions in environmental mixtures research: (1) Are there specific exposure patterns in the study population? (2) Which are the toxic agents in the mixture? (3) Are mixture members acting synergistically? (4) What is the overall effect of the mixture?
Summary
We emphasize the importance of robust methods and interpretable results over predictive accuracy. We encourage collaboration with computer scientists, data scientists, and biostatisticians in future mixture method development.
This work was supported by NIEHS F31 ES030263, T32 ES007322, P30 ES009089, and R01 ES028805.
Conflict of Interest
Elizabeth A. Gibson, Jeff Goldsmith, and Marianthi-Anna Kioumourtzoglou declare that they have received grants from the NIEHS during this study.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.
This article is part of the Topical Collection on Methods in Environmental Epidemiology
Cite this article
Gibson, E.A., Goldsmith, J. & Kioumourtzoglou, M. Complex Mixtures, Complex Analyses: an Emphasis on Interpretable Results. Curr Envir Health Rpt 6, 53–61 (2019). https://doi.org/10.1007/s40572-019-00229-5
- Environmental mixtures
- Dimension reduction
- Variable selection
- Bayesian statistics