Considerations and Targeted Approaches to Identifying Bad Actors in Exposure Mixtures

Keil, Alexander P.; O’Brien, Katie M.

doi:10.1007/s12561-023-09409-2

Original Paper
Published: 12 December 2023

Statistics in Biosciences

Abstract

Variable importance is a key statistical issue in exposure mixtures, as it allows a ranking of exposures as potential targets for intervention, and helps to identify bad actors within a mixture. In settings where mixtures have many constituents or high between-constituent correlations, estimators of importance can be subject to bias or high variance. Current approaches to assessing variable importance have major limitations, including reliance on overly strong or incorrect constraints or assumptions, excessive model extrapolation, or poor interpretability, especially regarding practical significance. We sought to overcome these limitations by applying an established doubly robust, machine learning-based approach to estimating variable importance in a mixtures context. This method reduces model extrapolation, appropriately controls confounding, and provides both interpretability and model flexibility. We illustrate its use with an evaluation of the relationship between telomere length, a measure of biologic aging, and exposure to a mixture of polychlorinated biphenyls (PCBs), dioxins, and furans among 979 US adults from the National Health and Nutrition Examination Survey (NHANES). In contrast with standard approaches for mixtures, our approach selected PCB 180 and PCB 194 as important contributors to telomere length. We hypothesize that this difference could be due to residual confounding in standard methods that rely on variable selection. Further empirical evaluation of this method is needed, but it is a promising tool in the search for bad actors within a mixture.
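To make the idea of double robustness concrete, the following is a minimal sketch of an augmented inverse probability weighted (AIPW) estimator on simulated data. It is not the authors' method: it assumes a single binary exposure `a`, a single confounder `w`, and simple parametric models where the abstract describes machine learning. All names (`simulate`, `fit_logistic`, `aipw_ate`) and the simulated data-generating values are hypothetical and chosen only for illustration.

```python
import numpy as np


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + np.exp(-x))


def fit_logistic(X, a, n_iter=25):
    """Logistic regression by Newton-Raphson; returns the coefficient vector."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (a - p)
        hess = (X * (p * (1 - p))[:, None]).T @ X
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def aipw_ate(y, a, w):
    """AIPW estimate of E[Y(1)] - E[Y(0)] with an influence-function-based SE.

    Assumption: binary exposure and parametric nuisance models, used here
    as stand-ins for the flexible learners described in the abstract.
    """
    n = len(y)
    ones = np.ones(n)

    # Exposure (propensity) model: P(A = 1 | W), truncated away from 0 and 1.
    Xg = np.column_stack([ones, w])
    g = np.clip(expit(Xg @ fit_logistic(Xg, a)), 0.01, 0.99)

    # Outcome model: E[Y | A, W], fit by ordinary least squares.
    Xq = np.column_stack([ones, a, w])
    coef, *_ = np.linalg.lstsq(Xq, y, rcond=None)
    q1 = np.column_stack([ones, ones, w]) @ coef          # predicted Y if A = 1
    q0 = np.column_stack([ones, np.zeros(n), w]) @ coef   # predicted Y if A = 0
    qa = np.where(a == 1, q1, q0)

    # Plug-in contrast plus a weighted-residual correction term.
    eif = (a / g - (1 - a) / (1 - g)) * (y - qa) + q1 - q0
    return eif.mean(), eif.std(ddof=1) / np.sqrt(n)


def simulate(n=5000, seed=0):
    """Hypothetical data: W confounds A and Y; the true effect of A is 1.0."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)
    a = rng.binomial(1, expit(0.5 * w))
    y = 1.0 * a + 2.0 * w + rng.normal(size=n)
    return y, a, w


if __name__ == "__main__":
    y, a, w = simulate()
    est, se = aipw_ate(y, a, w)
    print(f"AIPW estimate: {est:.3f} (SE {se:.3f})")
```

The estimator combines an outcome model and an exposure model, and it remains consistent if either one is correctly specified, which is the property the term "doubly robust" refers to. In a mixtures setting one such contrast would be estimated per constituent, with the remaining constituents treated as confounders; that extension is not shown here.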

Funding

This study was funded by the Intramural Research Program of the National Institutes of Health, NCI, Division of Cancer Epidemiology and Genetics (Grant No. Z01CP010119), and by the Intramural Research Program of the National Institutes of Health, NIEHS (Grant No. Z01ES044005).

Author information

Authors and Affiliations

Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, NIH, 9609 Medical Center Drive, Rockville, MD, 20850, USA
Alexander P. Keil
Epidemiology Branch, National Institute of Environmental Health Sciences, NIH, 111 T.W. Alexander Drive, Durham, NC, 27709, USA
Katie M. O’Brien

Corresponding author

Correspondence to Alexander P. Keil.

About this article

Cite this article

Keil, A.P., O’Brien, K.M. Considerations and Targeted Approaches to Identifying Bad Actors in Exposure Mixtures. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09409-2

Received: 12 May 2023
Revised: 24 October 2023
Accepted: 06 November 2023
Published: 12 December 2023
DOI: https://doi.org/10.1007/s12561-023-09409-2
