Abstract
Environmental health research aims to assess the impact of environmental exposures, whose effects matter because they reach broad segments of the general population. A common problem when exposures are measured from bio-samples in the laboratory is that values below the limit of detection (LOD) are either left unreported or read inaccurately by instruments, which in turn affects the analysis and assessment of exposure effects on health outcomes. We address the challenge of handling exposure variables subject to an LOD when they are treated either as covariates or as an outcome. We evaluate the performance of commonly used methods, including complete-case analysis and the fill-in method, and of more advanced techniques such as multiple imputation, the missing-indicator model, the two-part model, the Tobit model, and several others. We compare these methods through simulations and an application to a dataset from NHANES 2013–2014. Our numerical studies show that the missing-indicator model generally yields reasonable estimates when exposure variables are treated as covariates under various settings, while other methods tend to be sensitive to the proportion of values missing due to LOD and/or the distributional skewness of the exposures. When an exposure variable is modeled as the outcome, the Tobit model performs well under a Gaussian distribution, and quantile regression generally provides robust estimates across various shapes of the outcome’s distribution. In the presence of missing data due to LOD, the statistical model should be chosen to align with the scientific question, the model assumptions, the distributional requirements, and the interpretation of the results. Sensitivity analyses for LOD-missing exposures can improve the robustness of conclusions.
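As a rough illustration of three of the approaches named above, the following Python sketch shows how a fill-in value, a missing-indicator coding, and a Tobit log-likelihood might be set up. It is not the authors' implementation (their supplementary code is in R). The function names, the use of `None` for a below-LOD value, the LOD/√2 fill-in constant, the placeholder value 0 in the missing-indicator coding, and the intercept-only Gaussian Tobit model are all assumptions made for this example.

```python
import math

def fill_in(values, lod):
    """Fill-in method: substitute a constant for each below-LOD value.
    The constant LOD / sqrt(2) is an assumed, commonly used convention."""
    return [lod / math.sqrt(2) if v is None else v for v in values]

def missing_indicator_columns(values):
    """Missing-indicator coding for a covariate subject to LOD.
    Returns (x_obs, d) pairs: d = 1 flags a below-LOD value, and x_obs is
    set to 0 (an assumed placeholder) so that the indicator's coefficient
    absorbs the mean outcome shift for the below-LOD group."""
    return [(0.0, 1) if v is None else (v, 0) for v in values]

def _norm_logpdf(z):
    # Log density of the standard normal distribution.
    return -0.5 * z * z - 0.5 * math.log(2 * math.pi)

def _norm_cdf(z):
    # Standard normal CDF expressed through the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2)))

def tobit_loglik(y, lod, mu, sigma):
    """Log-likelihood of an intercept-only Gaussian Tobit model for an
    outcome left-censored at the LOD. Observed values contribute the normal
    log density; censored values contribute log P(Y < LOD)."""
    total = 0.0
    for yi in y:
        if yi is None:
            total += math.log(_norm_cdf((lod - mu) / sigma))
        else:
            total += _norm_logpdf((yi - mu) / sigma) - math.log(sigma)
    return total

# Hypothetical measurements; None marks a value below an LOD of 1.0.
exposure = [None, 2.5, 1.7, None, 3.1]
filled = fill_in(exposure, lod=1.0)
design = missing_indicator_columns(exposure)
loglik = tobit_loglik(exposure, lod=1.0, mu=1.5, sigma=1.0)
```

In practice the pairs from `missing_indicator_columns` would enter an outcome regression as two covariates, and `tobit_loglik` would be maximized over `mu` and `sigma` (with `mu` replaced by a linear predictor when covariates are present).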
Funding
This research was partially supported by NIH R01ES032808, NIH R01ES032826, and NIH UH3OD023305, and CDC/NIOSH grant U01OH012637.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Supplementary Information
Below is the link to the electronic supplementary material.
Supplementary file 1 (pdf 3528 KB)
Appendix A: Figure A1. Data flow diagram for deriving the final dataset for analysis. Figure A2. Histograms of original and of log-transformed and standardized exposures in the NHANES dataset. Table A1. Coefficient estimates (and standard errors) of exposure variables in a single exposure model. Table A2. Coefficient estimates (and standard errors) of exposure variables in a multiple exposures model. Table A3. Coefficient estimates (and standard errors) of AGE and IPR variables in an exposure-outcome model. Figures A3–A14. Estimated effects of GENDER, RACE, EDU, MWD, and CREATININE variables with 95% confidence intervals when a chemical exposure is the outcome in a model. Table A4. LOD values for two exposure variables and corresponding missing rates. Table A5. Results from five methods in a single exposure model under Scenario 2. Figure A15. Histogram of the generated skewed exposure variable with different missing rates due to LOD.
Appendix B: Details on ML approach for single exposure model.
Supplementary file 2 (html 2718 KB)
Appendix C: R markdown document implementing all descriptive and analytical processes of the NHANES data application in this article.
Supplementary file 3 (pdf 30 KB)
Appendix D: R code for preprocessing and merging four datasets from the NHANES 2013–2014 cycle.
About this article
Cite this article
Seok, E., Ghassabian, A., Wang, Y. et al. Statistical Methods for Modeling Exposure Variables Subject to Limit of Detection. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09408-3
DOI: https://doi.org/10.1007/s12561-023-09408-3