Semiparametric approaches for matched case–control studies with error-in-covariates

  • Nels G. Johnson
  • Inyoung Kim
Original paper


The matched case–control study is a popular design for clustered binary outcomes in public health, biomedical, and epidemiological research on human, animal, and other subjects. Covariates in such studies are often measured with error, and failing to account for this error can lead to incorrect inference for all covariates in the model. Methods for assessing and characterizing error-in-covariates in matched case–control studies are quite limited. In this article we propose several approaches for handling error-in-covariates that detect both parametric and nonparametric relationships between the covariates and the binary outcome. We propose a Bayesian approach and two approximate-Bayesian approaches for addressing additive, Gaussian error-in-covariates when the variable measured with error has an unknown, nonlinear relationship with the response. The Bayesian approaches use an approximate latent variable probit model. All methods are developed using the nonparametric method of low-rank thin-plate splines. We assess the performance of each method in terms of mean squared error and mean bias, both in simulations and in a perturbed example of a 1–4 matched case-crossover study.
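As a rough illustration of why ignoring additive Gaussian covariate error distorts inference, the sketch below simulates the simplest linear analogue, not the matched binary-outcome models proposed in the article. It assumes a true covariate x, an observed surrogate w = x + u with Gaussian u, and a continuous response; the function name, parameter values, and sample size are all hypothetical choices made for this example.

```python
import numpy as np


def simulate_attenuation(n=20000, beta=2.0, sigma_x=1.0, sigma_u=1.0, seed=0):
    """Compare regression slopes using the true covariate and its noisy surrogate.

    Illustrative assumption: a linear model y = beta * x + noise, which is a
    simplification and not the matched case-control setting of the article.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(0.0, sigma_x, n)        # true (unobserved) covariate
    u = rng.normal(0.0, sigma_u, n)        # additive Gaussian measurement error
    w = x + u                              # observed, error-prone covariate
    y = beta * x + rng.normal(0.0, 0.5, n)

    slope_true = np.polyfit(x, y, 1)[0]    # least-squares slope using x
    slope_naive = np.polyfit(w, y, 1)[0]   # least-squares slope using w, ignoring error
    # Classical attenuation factor for this linear setting.
    reliability = sigma_x**2 / (sigma_x**2 + sigma_u**2)
    return slope_true, slope_naive, reliability


slope_true, slope_naive, reliability = simulate_attenuation()
print(f"slope using true covariate:  {slope_true:.3f}")
print(f"slope using noisy covariate: {slope_naive:.3f}")
print(f"beta * reliability ratio:    {2.0 * reliability:.3f}")
```

In this toy setting the naive slope shrinks toward zero by roughly the reliability ratio. With nonlinear effects and binary outcomes the distortion is generally less predictable, which is part of the motivation for model-based corrections.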


Keywords: Bayesian methods, Latent variable probit, Mixed model, Thin-plate splines



We would like to thank Pang Du, Leanna House, Scotland Leman, George Terrell, and Matt Williams for their advice and assistance. We would also like to thank Ho Kim for supplying the aseptic meningitis data.



Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Authors and Affiliations

  1. National Institute for Mathematical and Biological Synthesis, University of Tennessee, Knoxville, USA
  2. Department of Statistics, Virginia Tech, Blacksburg, USA
