Statistical Methods for Modeling Exposure Variables Subject to Limit of Detection

Seok, Eunsil; Ghassabian, Akhgar; Wang, Yuyan; Liu, Mengling

doi:10.1007/s12561-023-09408-3

Original Paper
Published: 28 November 2023

Statistics in Biosciences (2023)

Eunsil Seok¹,
Akhgar Ghassabian¹,²,
Yuyan Wang¹ &
Mengling Liu¹ (ORCID: orcid.org/0000-0001-9758-8522)

Abstract

Environmental health research assesses how environmental exposures affect health, and because these exposures reach much of the general population, understanding their effects is crucial. However, when exposures are measured from bio-samples in the laboratory, values below the limit of detection (LOD) are often either left unreported or read inaccurately by instruments, which in turn affects the analysis and assessment of exposure effects on health outcomes. We address the challenge of handling exposure variables subject to LOD when they are treated either as covariates or as an outcome. We evaluate the performance of commonly used methods, including complete-case analysis and the fill-in method, and of more advanced techniques such as multiple imputation, the missing-indicator model, the two-part model, the Tobit model, and several others. We compare these methods through simulations and an application to a dataset from NHANES 2013–2014. Our numerical studies show that the missing-indicator model generally yields reasonable estimates when exposure variables are treated as covariates under various settings, whereas other methods tend to be sensitive to the proportion of values missing due to LOD and/or the distributional skewness of the exposures. When an exposure variable is modeled as the outcome, the Tobit model performs well under a Gaussian distribution, and quantile regression generally provides robust estimates across various shapes of the outcome’s distribution. In the presence of missing data due to LOD, the statistical model should be chosen to align with the scientific question, the model assumptions, the distributional requirements of the data, and the intended interpretation. Sensitivity analyses for handling LOD-missing exposures can improve the robustness of conclusions.
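To make three of the methods named in the abstract concrete, the following is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not the authors' implementation (their analysis code is in R, in the supplementary files). Non-detects are represented as `None`. The substitution value LOD/√2 is one common fill-in convention and is an assumption here, since the abstract does not say which value is used. The placeholder 0.0 in the missing-indicator coding and all function names are likewise hypothetical.

```python
import math


def fill_in(values, lod):
    """Fill-in method: replace each non-detect (None) with LOD / sqrt(2).

    LOD / sqrt(2) is an assumed convention; other constants (e.g. LOD / 2)
    are also used in practice.
    """
    sub = lod / math.sqrt(2)
    return [sub if v is None else v for v in values]


def missing_indicator(values):
    """Build the two covariate columns of a missing-indicator model.

    d = 1 flags a value below the LOD. x_star holds the observed value,
    or the placeholder 0.0 (an assumption) when the value is below the LOD.
    A regression would then include both x_star and d as covariates.
    """
    d = [1 if v is None else 0 for v in values]
    x_star = [0.0 if v is None else v for v in values]
    return x_star, d


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))


def tobit_loglik(y, x, lod, b0, b1, sigma):
    """Log-likelihood of a left-censored normal (Tobit) model.

    Assumes y = b0 + b1 * x + e with e ~ N(0, sigma^2), where y is the
    exposure treated as the outcome and None marks a value below the LOD.
    """
    total = 0.0
    for yi, xi in zip(y, x):
        mu = b0 + b1 * xi
        if yi is None:
            # Censored observation: contributes P(Y < LOD).
            total += math.log(norm_cdf((lod - mu) / sigma))
        else:
            # Observed value: contributes the normal log-density.
            z = (yi - mu) / sigma
            total += -0.5 * z * z - 0.5 * math.log(2 * math.pi) - math.log(sigma)
    return total
```

In practice the Tobit log-likelihood would be maximized over `b0`, `b1`, and `sigma` with a numerical optimizer; the function above only evaluates it at given parameter values.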

Funding

This research was partially supported by NIH grants R01ES032808, R01ES032826, and UH3OD023305, and by CDC/NIOSH grant U01OH012637.

Author information

Authors and Affiliations

Department of Population Health, New York University Grossman School of Medicine, New York, NY, 10016, USA
Eunsil Seok, Akhgar Ghassabian, Yuyan Wang & Mengling Liu
Department of Pediatrics, New York University Grossman School of Medicine, New York, NY, 10016, USA
Akhgar Ghassabian

Corresponding author

Correspondence to Mengling Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 3528 KB)

Appendix A:
Figure A1. Data flow diagram for deriving the final dataset for analysis.
Figure A2. Histograms of original and of log-transformed and standardized exposures in the NHANES dataset.
Table A1. Coefficient estimates (and standard errors) of exposure variables in a single exposure model.
Table A2. Coefficient estimates (and standard errors) of exposure variables in a multiple exposures model.
Table A3. Coefficient estimates (and standard errors) of AGE and IPR variables in an exposure-outcome model.
Figures A3–A14. Estimated effects of GENDER, RACE, EDU, MWD, and CREATININE variables with 95% confidence intervals when a chemical exposure is the outcome in a model.
Table A4. LOD values for two exposure variables and corresponding missing rates.
Table A5. Results from five methods in a single exposure model under Scenario 2.
Figure A15. Histogram of the generated skewed exposure variable with different missing rates due to LOD.

Appendix B: Details on ML approach for single exposure model.

Supplementary file 2 (html 2718 KB)

Appendix C: R markdown document implementing all descriptive and analytical processes of the NHANES data application in this article.

Supplementary file 3 (pdf 30 KB)

Appendix D: R code for preprocessing and merging four datasets from NHANES 2013-2014 cycle.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Seok, E., Ghassabian, A., Wang, Y. et al. Statistical Methods for Modeling Exposure Variables Subject to Limit of Detection. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09408-3

Received: 27 December 2022
Revised: 19 August 2023
Accepted: 12 October 2023
Published: 28 November 2023
DOI: https://doi.org/10.1007/s12561-023-09408-3
