Complex Mixtures, Complex Analyses: an Emphasis on Interpretable Results


Purpose of Review

The purpose of this review is to outline the main questions in environmental mixtures research and provide a non-technical explanation of novel or advanced methods to answer these questions.

Recent Findings

Machine learning techniques are now being incorporated into environmental mixtures research to overcome issues with traditional methods. Although some methods perform well on specific tasks, no method consistently outperforms all others in complex mixture analyses, largely because different methods were developed to answer different research questions. We discuss four main questions in environmental mixtures research: (1) Are there specific exposure patterns in the study population? (2) Which are the toxic agents in the mixture? (3) Are mixture members acting synergistically? (4) What is the overall effect of the mixture?


Summary

We emphasize the importance of robust methods and interpretable results over predictive accuracy. We encourage collaboration with computer scientists, data scientists, and biostatisticians in future mixture method development.





This work was supported by NIEHS F31 ES030263, T32 ES007322, P30 ES009089, and R01 ES028805.

Author information



Corresponding author

Correspondence to Marianthi-Anna Kioumourtzoglou.

Ethics declarations

Conflict of Interest

Elizabeth A. Gibson, Jeff Goldsmith, and Marianthi-Anna Kioumourtzoglou declare that they have received grants from the NIEHS during this study.

Human and Animal Rights and Informed Consent

This article does not contain any studies with human or animal subjects performed by any of the authors.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the Topical Collection on Methods in Environmental Epidemiology


Cite this article

Gibson, E.A., Goldsmith, J. & Kioumourtzoglou, M. Complex Mixtures, Complex Analyses: an Emphasis on Interpretable Results. Curr Envir Health Rpt 6, 53–61 (2019).



Keywords

  • Environmental mixtures
  • Multi-pollutant
  • Dimension reduction
  • Variable selection
  • Bayesian statistics