Current Environmental Health Reports, Volume 6, Issue 2, pp 53–61

Complex Mixtures, Complex Analyses: an Emphasis on Interpretable Results

  • Elizabeth A. Gibson
  • Jeff Goldsmith
  • Marianthi-Anna Kioumourtzoglou
Methods in Environmental Epidemiology (AZ Pollack and NJ Perkins, Section Editors)
Part of the following topical collections:
  1. Topical Collection on Methods in Environmental Epidemiology


Purpose of Review

The purpose of this review is to outline the main questions in environmental mixtures research and provide a non-technical explanation of novel or advanced methods to answer these questions.

Recent Findings

Machine learning techniques are now being incorporated into environmental mixtures research to overcome issues with traditional methods. Though some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. We discuss four main questions in environmental mixtures research: (1) Are there specific exposure patterns in the study population? (2) Which are the toxic agents in the mixture? (3) Are mixture members acting synergistically? (4) What is the overall effect of the mixture?


Summary

We emphasize the importance of robust methods and interpretable results over predictive accuracy. We encourage collaboration with computer scientists, data scientists, and biostatisticians in future mixture method development.


Keywords

Environmental mixtures, Multi-pollutant, Dimension reduction, Variable selection, Bayesian statistics


Funding information

This work was supported by NIEHS F31 ES030263, T32 ES007322, P30 ES009089, and R01 ES028805.

Compliance with Ethical Standards

Conflict of Interest

Elizabeth A. Gibson, Jeff Goldsmith, and Marianthi-Anna Kioumourtzoglou declare that they have received grants from the NIEHS during this study.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.



Copyright information

© Springer Nature Switzerland AG 2019

Authors and Affiliations

  • Elizabeth A. Gibson (1)
  • Jeff Goldsmith (2)
  • Marianthi-Anna Kioumourtzoglou (1)

  1. Department of Environmental Health Sciences, Mailman School of Public Health, Columbia University, New York, USA
  2. Department of Biostatistics, Columbia University Mailman School of Public Health, New York, USA
