Considerations and Targeted Approaches to Identifying Bad Actors in Exposure Mixtures

  • Original Paper
  • Statistics in Biosciences

Abstract

Variable importance is a key statistical issue in exposure mixtures, as it allows a ranking of exposures as potential targets for intervention, and helps to identify bad actors within a mixture. In settings where mixtures have many constituents or high between-constituent correlations, estimators of importance can be subject to bias or high variance. Current approaches to assessing variable importance have major limitations, including reliance on overly strong or incorrect constraints or assumptions, excessive model extrapolation, or poor interpretability, especially regarding practical significance. We sought to overcome these limitations by applying an established doubly robust, machine learning-based approach to estimating variable importance in a mixtures context. This method reduces model extrapolation, appropriately controls confounding, and provides both interpretability and model flexibility. We illustrate its use with an evaluation of the relationship between telomere length, a measure of biologic aging, and exposure to a mixture of polychlorinated biphenyls (PCBs), dioxins, and furans among 979 US adults from the National Health and Nutrition Examination Survey (NHANES). In contrast with standard approaches for mixtures, our approach selected PCB 180 and PCB 194 as important contributors to telomere length. We hypothesize that this difference could be due to residual confounding in standard methods that rely on variable selection. Further empirical evaluation of this method is needed, but it is a promising tool in the search for bad actors within a mixture.

Funding

This study was funded by the Intramural Research Program of the National Institutes of Health, NCI, Division of Cancer Epidemiology and Genetics (Grant No. Z01CP010119), and the Intramural Research Program of the National Institutes of Health, NIEHS (Grant No. Z01ES044005).

Author information

Corresponding author

Correspondence to Alexander P. Keil.

About this article

Cite this article

Keil, A.P., O’Brien, K.M. Considerations and Targeted Approaches to Identifying Bad Actors in Exposure Mixtures. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09409-2

  • DOI: https://doi.org/10.1007/s12561-023-09409-2
