Complex Mixtures, Complex Analyses: an Emphasis on Interpretable Results
Purpose of Review
The purpose of this review is to outline the main questions in environmental mixtures research and provide a non-technical explanation of novel or advanced methods to answer these questions.
Machine learning techniques are now being incorporated into environmental mixtures research to overcome issues with traditional methods. Though some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. We discuss four main questions in environmental mixtures research: (1) Are there specific exposure patterns in the study population? (2) Which are the toxic agents in the mixture? (3) Are mixture members acting synergistically? and (4) What is the overall effect of the mixture?
We emphasize the importance of robust methods and interpretable results over predictive accuracy. We encourage collaboration with computer scientists, data scientists, and biostatisticians in future mixture method development.
Keywords: Environmental mixtures; Multi-pollutant; Dimension reduction; Variable selection; Bayesian statistics
This work was supported by NIEHS F31 ES030263, T32 ES007322, P30 ES009089, and R01 ES028805.
Compliance with Ethical Standards
Conflict of Interest
Elizabeth A. Gibson, Jeff Goldsmith, and Marianthi-Anna Kioumourtzoglou declare that they have received grants from the NIEHS during this study.
Human and Animal Rights and Informed Consent
This article does not contain any studies with human or animal subjects performed by any of the authors.