
Environmental and Ecological Statistics, Volume 19, Issue 3, pp 369–391

Handling covariates subject to limits of detection in regression

  • Srikesh G. Arunajadai
  • Virginia A. Rauh
Article

Abstract

In the environmental health sciences, measurements of toxic exposures are often constrained by a lower limit called the limit of detection (LOD), and observations below this limit are called non-detects. Although valid inference may be obtained by excluding non-detects when estimating exposure effects, this practice can substantially reduce the power to detect a significant effect, depending on the proportion of censoring and the closeness of the effect size to the null value. A variety of methods have therefore been used in the environmental science literature to substitute values for the non-detects when estimating exposure effects, including ad hoc values such as \(LOD/2\), \(LOD/\sqrt{2}\), and LOD. Another method substitutes the expected value of the non-detects, i.e., E[X|X ≤ LOD], but this requires that the inference be robust to mild misspecification of the distribution of the exposure variable. In this paper, we demonstrate that the estimate of the exposure effect is extremely sensitive to ad hoc substitutions and to moderate distributional misspecification when the sample size is large and the effect size is moderate, potentially leading to biased estimates. We propose instead the use of the generalized gamma distribution to estimate imputed values for the non-detects, and show that this method avoids the risk of distributional misspecification within the class of distributions represented by the generalized gamma distribution. A multiple imputation-based procedure is employed to estimate the regression parameters. Compared to excluding non-detects, the proposed method can substantially increase the power to detect a significant effect when the effect size is close to the null value in small samples with moderate levels of censoring (≤ 50%), without compromising the coverage and relative bias of the estimates.
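As a rough illustration of the substitution rules named in the abstract, the sketch below computes the ad hoc values LOD/2, LOD/√2, and LOD, together with the conditional mean E[X|X ≤ LOD]. It assumes, purely for illustration, that the exposure is lognormal with known parameters `mu` and `sigma`; this is not the generalized gamma fit proposed in the paper, and the function names are hypothetical.

```python
import math
from statistics import NormalDist


def substitution_values(lod):
    """Ad hoc fill-in values commonly used for non-detects."""
    return {
        "LOD/2": lod / 2.0,
        "LOD/sqrt(2)": lod / math.sqrt(2.0),
        "LOD": lod,
    }


def lognormal_mean_below_lod(mu, sigma, lod):
    """E[X | X <= lod] when log(X) ~ Normal(mu, sigma**2).

    Assumption for illustration only: a lognormal exposure with known
    parameters, not the generalized gamma model used in the paper.
    """
    std_normal = NormalDist()
    z = (math.log(lod) - mu) / sigma
    # Partial expectation E[X; X <= lod] divided by P(X <= lod).
    partial_mean = math.exp(mu + sigma**2 / 2.0) * std_normal.cdf(z - sigma)
    return partial_mean / std_normal.cdf(z)


if __name__ == "__main__":
    lod = 1.0
    print(substitution_values(lod))
    print(lognormal_mean_below_lod(mu=0.0, sigma=1.0, lod=lod))
```

The conditional mean depends on the assumed distribution through `mu` and `sigma`, whereas the ad hoc values do not; this dependence is why the choice of distribution matters for the expected-value substitution.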

Keywords

  • Limit of detection
  • Multiple imputation
  • Generalized gamma distribution
  • Regression

Copyright information

© Springer Science+Business Media, LLC 2012

Authors and Affiliations

  1. Department of Biostatistics, New York, USA
  2. Department of Population and Family Health, New York, USA
