Statistical Methods for Modeling Exposure Variables Subject to Limit of Detection

  • Original Paper
  • Statistics in Biosciences

Abstract

Environmental health research assesses how environmental exposures affect health, and understanding these effects is crucial because such exposures reach broad segments of the general population. However, a common issue when measuring exposures from bio-samples in the laboratory is that values below the limit of detection (LOD) are either left unreported or read inaccurately by machines, which in turn affects the analysis and assessment of exposure effects on health outcomes. We address the challenge of handling exposure variables subject to LOD when they are treated as either covariates or an outcome. We evaluate the performance of commonly used methods, including complete-case analysis and the fill-in method, and of more advanced techniques such as multiple imputation, the missing-indicator model, the two-part model, the Tobit model, and several others. We compare these methods through simulations and a dataset from NHANES 2013–2014. Our numerical studies show that the missing-indicator model generally yields reasonable estimates under various settings when exposure variables are treated as covariates, while other methods tend to be sensitive to the LOD-missing proportion and/or the distributional skewness of the exposures. When an exposure variable is modeled as the outcome, the Tobit model performs well under a Gaussian distribution, and quantile regression generally provides robust estimates across various shapes of the outcome's distribution. In the presence of missing data due to LOD, statistical models should be chosen to align with the scientific question, the model assumptions, the requirements on the data distribution, and the intended interpretation. Sensitivity analyses of how LOD-missing exposures are handled can improve the robustness of model conclusions.
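To make the two roles of an LOD-limited exposure concrete, the following is a minimal Python sketch, not the authors' implementation (their analysis code is in R and is provided in the supplementary material). It assumes nondetects are recorded as `None`. The helper `missing_indicator_columns` illustrates one common way to encode a covariate for a missing-indicator model (a constant substituted for nondetects plus a 0/1 indicator), and `tobit_loglik` evaluates the standard left-censored Gaussian log-likelihood used when the exposure is the outcome. All function names, the `fill` constant, and the data layout are assumptions made for illustration.

```python
import math


def missing_indicator_columns(exposure, fill=0.0):
    """Encode an exposure covariate for a missing-indicator regression.

    `exposure` holds measured values, with None for readings below the LOD.
    Returns (x_filled, below_lod): the value column with the constant `fill`
    substituted for nondetects, and a 0/1 indicator column for nondetects.
    The choice of `fill` is an assumption of this sketch.
    """
    x_filled = [fill if v is None else v for v in exposure]
    below_lod = [1 if v is None else 0 for v in exposure]
    return x_filled, below_lod


def normal_logpdf(z):
    """Log density of the standard normal distribution at z."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def normal_logcdf(z):
    """Log CDF of the standard normal; adequate for moderate z only."""
    return math.log(0.5 * math.erfc(-z / math.sqrt(2.0)))


def tobit_loglik(y, mu, sigma, lod):
    """Log-likelihood of a left-censored Gaussian (Tobit) outcome model.

    y[i] is None when the outcome fell below `lod`; mu[i] is the linear
    predictor for observation i; sigma is the residual standard deviation.
    """
    total = 0.0
    for yi, mi in zip(y, mu):
        if yi is None:
            # Censored: contributes P(Y* < LOD) = Phi((LOD - mu) / sigma).
            total += normal_logcdf((lod - mi) / sigma)
        else:
            # Observed: contributes the Gaussian density at the measured value.
            total += normal_logpdf((yi - mi) / sigma) - math.log(sigma)
    return total


x_filled, below_lod = missing_indicator_columns([1.2, None, 0.8])
```

In a covariate analysis, both returned columns would enter the regression together. In an outcome analysis, `tobit_loglik` would be maximized over the regression coefficients and `sigma` with a numerical optimizer, which this sketch leaves out.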

Funding

This research was partially supported by NIH R01ES032808, NIH R01ES032826, and NIH UH3OD023305, and CDC/NIOSH grant U01OH012637.

Author information

Corresponding author

Correspondence to Mengling Liu.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Supplementary Information

Below is the link to the electronic supplementary material.

Supplementary file 1 (pdf 3528 KB)

Appendix A: Figure A1. Data flow diagram for deriving the final dataset for analysis. Figure A2. Histogram of original and log-transformed and standardized exposures in the NHANES dataset. Table A1. Coefficient estimates (and standard errors) of exposure variables in a single-exposure model. Table A2. Coefficient estimates (and standard errors) of exposure variables in a multiple-exposure model. Table A3. Coefficient estimates (and standard errors) of AGE and IPR variables in an exposure-outcome model. Figures A3–A14. Estimated effects of GENDER, RACE, EDU, MWD, and CREATININE variables with 95% confidence intervals when a chemical exposure is the outcome in a model. Table A4. LOD values for two exposure variables and corresponding missing rates. Table A5. Results from five methods in the single-exposure model under Scenario 2. Figure A15. Histogram of the generated skewed exposure variable with different missing rates due to LOD.

Appendix B: Details on the ML approach for the single-exposure model.

Supplementary file 2 (html 2718 KB)

Appendix C: R markdown document implementing all descriptive and analytical processes of the NHANES data application in this article.

Supplementary file 3 (pdf 30 KB)

Appendix D: R code for preprocessing and merging four datasets from the NHANES 2013–2014 cycle.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Seok, E., Ghassabian, A., Wang, Y. et al. Statistical Methods for Modeling Exposure Variables Subject to Limit of Detection. Stat Biosci (2023). https://doi.org/10.1007/s12561-023-09408-3

  • DOI: https://doi.org/10.1007/s12561-023-09408-3
