Correction for misclassification of caries experience in the absence of internal validation data

  • Review
Clinical Oral Investigations

Abstract

Objectives

To quantify the effects of risk factors and/or determinants on disease occurrence, it is important that the risk factors as well as the variable that measures the disease outcome are recorded with as little error as possible. When investigating the factors that influence a binary outcome, a logistic regression model is often fitted under the assumption that the data are collected without error. However, most categorical outcomes (e.g., caries experience) are subject to misclassification, and this needs to be accounted for. The aim of this research was to adjust for binary outcome misclassification using an external validation study when investigating factors influencing caries experience in schoolchildren.

Materials and methods

Data from the Signal Tandmobiel® study were used. A total of 500 children from the main and 148 from the validation study were included in the analysis. Regression models (with several covariates) for sensitivity and specificity were used to adjust for misclassification in the main data.

Results

The use of sensitivity and specificity modeled as functions of several covariates resulted in a better correction compared to using point estimates of sensitivity and specificity. Age, geographical location of the school to which the child belongs, dentition type, tooth type, and surface type were significantly associated with the prevalence of caries experience.

Conclusions

Sensitivity and specificity calculated based on an external validation study may resemble those obtained from an internal study if conditioned on a rich set of covariates.

Clinical relevance

Main data can be corrected for misclassification using information obtained from an external validation study when a rich set of covariates is recorded during calibration.

References

  1. Pine CM, Pitts NB, Nugent Z (1997) British Association for the Study of Community Dentistry (BASCD) guidance on the statistical aspects of training and calibration of examiners for surveys of child dental health: a BASCD coordinated dental epidemiology programme quality standard. Community Dent Health 14(Suppl 1):18–29

  2. Neuhaus JM (1999) Bias and efficiency loss due to misclassified responses in binary regression. Biometrika 86:843–855

  3. Neuhaus JM (2002) Analysis of clustered and longitudinal binary data subject to response misclassification. Biometrics 58:675–683

  4. Magder LS, Hughes JP (1997) Logistic regression when the outcome is measured with uncertainty. Am J Epidemiol 146(2):195–203

  5. Mwalili S, Lesaffre E, Declerck D (2005) A Bayesian ordinal logistic regression model to correct for inter-observer measurement error in a geographical oral health study. J R Stat Soc Ser C Appl 54:77–93

  6. Lesaffre E, Mwalili S, Declerck D (2004) Analysis of caries experience taking inter-observer bias and variability into account. J Dent Res 83(12):951–955

  7. Tenenbein A (1986) A double sampling scheme for estimating from misclassified multinomial data with applications to sampling inspection. Biometrika 73:13–22

  8. Marshall RJ (1990) Validation study methods for estimating exposure proportions and odds ratios with misclassified data. J Clin Epidemiol 43:941–947

  9. Küchenhoff H (2009) Misclassification and measurement error in oral health. In: Lesaffre E, Feine J, Leroux B, Declerck D (eds) Statistical and methodological aspects of oral health research. Wiley, New York, pp 279–290

  10. Agbaje JO, Mutsvari T, Lesaffre E, Declerck D (2012) Examiner performance in calibration exercises compared with field conditions when scoring caries experience. Clin Oral Investig 16(2):481–488

  11. Declerck D, Lesaffre E, Leroy R, Vanobbergen J (2009) Examples from oral health epidemiology: the Signal Tandmobiel and smile for life studies. In: Lesaffre E, Feine J, Leroux B, Declerck D (eds) Statistical and methodological aspects of oral health research. Wiley, New York, pp 341–357

  12. Pitts NB, Evans DJ, Pine CM (1997) British Association for the Study of Community Dentistry (BASCD) diagnostic criteria for caries prevalence surveys–1996/7. Community Dent Health 14(Suppl 1):6–9

  13. Klein H, Palmer CE, Knutson JW (1938) Studies on dental caries. I. Dental status and dental needs of elementary school children. Public Health Rep 53:751–765

  14. Mutsvari T (2012) Misclassification in multilevel models with applications in dental caries research, PhD Dissertation, KU Leuven

  15. Brenner H, Savitz DA (1990) The effects of sensitivity and specificity of case selection on validity, sample size, precision, and power in hospital-based case–control studies. Am J Epidemiol 132(1):181–192

  16. Wacholder S, Armstrong B, Hartge P (1993) Validation studies using an alloyed gold standard. Am J Epidemiol 137:1251–1258

  17. McInturf P, Johnson WO, Cowling D, Gardner IA (2004) Modelling risk when binary outcomes are subject to error. Stat Med 23:1095–1109

  18. D'Agostino RB Jr (1998) Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Stat Med 17:2265–2281

  19. Gustafson P (2004) Measurement error and misclassification in statistics and epidemiology: impacts and Bayesian adjustments. Chapman & Hall, London

  20. Greenland S (1980) The effects of misclassification in the presence of covariates. Am J Epidemiol 112:564–569

  21. Lesaffre E, Lawson B (2012) Bayesian biostatistics. Wiley, New York

  22. Plummer M (2011) JAGS: a program for analysis of Bayesian graphical models using Gibbs sampling. In: Hornik K, Leisch F, Zeileis A (eds) Proceedings of the 3rd International Workshop on Distributed Statistical Computing (DSC, 2003). Technische Universitaet Wien, Vienna, Austria. http://www.ci.tuwien.ac.at/Conferences/DSC.html. Accessed 24 Nov 2011

  23. Plummer M, Best N, Cowles K, Vines K (2008) CODA: output analysis and diagnostics for MCMC. R Package version 0.13-3

Acknowledgments

This investigation was supported by Research Grant OT/05/60, KU Leuven; data collection was supported by Unilever, Belgium. The Signal Tandmobiel® project has the following partners: D. Declerck (Department of Oral Health Sciences, KU Leuven), L. Martens (Dental School, University Ghent), J. Vanobbergen (Dental School, University Ghent), P. Bottenberg (Dental School, University Brussels), E. Lesaffre (Department of Biostatistics, Erasmus Medical Center, Rotterdam, The Netherlands and L-Biostat, KU Leuven), and K. Hoppenbrouwers (Youth Health Department, KU Leuven; Flemish Association for Youth Health Care).

Conflict of interest

The authors declare no conflicts of interest.

Author information

Corresponding author

Correspondence to E. Lesaffre.

Appendix

Multilevel model assuming error-free CE data

Let \( Y_{\mathrm{M},\mathrm{stme}} \) be the true CE score of surface \( s \) (\( s = 1, \ldots, n_{\mathrm{t}} \)) nested in tooth \( t \) (\( t = 1, \ldots, n_{\mathrm{m}} \)), which is nested in child/mouth \( m \) (\( m = 1, \ldots, N_{\mathrm{M}} \)), according to examiner \( e \) (\( e = 1, \ldots, n_{\mathrm{e}} \)) in the main study. The model uses \( {\pi_{\mathrm{stme}}}=Pr\left( {{Y_{\mathrm{M},\mathrm{stme}}}=1\left| {\boldsymbol{\upbeta}, {{\mathbf{x}}_{\mathrm{stme}}},{u_{\mathrm{m}}},\ {u_{\mathrm{tm}}},{u_{\mathrm{e}}}} \right.} \right) \), which is the true conditional probability for CE on surface \( s \) nested in tooth \( t \) in mouth \( m \) by examiner \( e \). The multilevel logistic model for the true main data is then given by:

$$ \mathrm{logit}\left( {{\pi_{\mathrm{stme}}}} \right)=\mathbf{x}_{\mathrm{stme}}^T\boldsymbol{\upbeta} +{u_{\mathrm{m}}}+{u_{\mathrm{tm}}}+{u_{\mathrm{e}}} $$

where \( \mathbf{x}_{\mathrm{stme}} \) represents the risk factors and/or determinants and \( \boldsymbol{\upbeta} \) is a vector of regression coefficients that quantifies their effects. The quantities \( u_{\mathrm{m}} \), \( u_{\mathrm{tm}} \), and \( u_{\mathrm{e}} \) are random intercepts at the mouth, tooth (nested in mouth), and examiner level, respectively. They are independently distributed with mean zero and variances \( \sigma_{\mathrm{m}}^2,\sigma_{\mathrm{tm}}^2,\;\mathrm{and}\;\sigma_{\mathrm{e}}^2 \), i.e., \( \left( u_{\mathrm{m}}, u_{\mathrm{tm}}, u_{\mathrm{e}} \right) \sim N\left( \mathbf{0}, \mathbf{D} \right) \) where \( \mathbf{D}=\mathrm{diag}\left( {\sigma_{\mathrm{m}}^2,\sigma_{\mathrm{tm}}^2,\sigma_{\mathrm{e}}^2} \right) \). They take into account the clustering of teeth within mouths, surfaces within teeth, and an examiner recording many surfaces, respectively.
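As a minimal sketch of how the multilevel logistic model maps a linear predictor and three random intercepts to a probability, the following Python fragment evaluates \( \pi_{\mathrm{stme}} \) for a single surface. All numerical values (the coefficients, the standard deviations of the random intercepts, and the covariate vector) are hypothetical and chosen only for illustration; they are not estimates from the Signal Tandmobiel® data.

```python
import math
import random


def expit(v: float) -> float:
    """Inverse logit, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def true_ce_probability(x, beta, u_m, u_tm, u_e):
    """pi_stme = expit(x^T beta + u_m + u_tm + u_e)."""
    linear_predictor = sum(xi * bi for xi, bi in zip(x, beta)) + u_m + u_tm + u_e
    return expit(linear_predictor)


# Hypothetical values, for illustration only.
rng = random.Random(1)
beta = [-2.0, 0.3]                            # intercept and one covariate effect
sigma_m, sigma_tm, sigma_e = 1.0, 0.5, 0.2    # assumed random-intercept SDs

u_m = rng.gauss(0.0, sigma_m)     # random intercept for one mouth
u_tm = rng.gauss(0.0, sigma_tm)   # random intercept for one tooth in that mouth
u_e = rng.gauss(0.0, sigma_e)     # random intercept for one examiner

x = [1.0, 1.0]                    # intercept term plus one covariate value
pi = true_ce_probability(x, beta, u_m, u_tm, u_e)
y = 1 if rng.random() < pi else 0  # simulated true CE score for one surface
```

Surfaces on the same tooth would share `u_tm`, teeth in the same mouth would share `u_m`, and all surfaces scored by one examiner would share `u_e`, which is how the model induces the clustering described above.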

Dealing with external validation data

There are two types of misclassification: differential and non-differential. Non-differential misclassification occurs when the misclassification does not depend on determinants [19, 20]. Differential misclassification occurs when it does, e.g., when misclassification is different in boys and girls.

If scoring in the main and validation studies is done by the same fallible dental examiners, then one of two situations holds. Either the misclassification probabilities are the same in the two studies, in which case the external validation data can be used immediately to correct for misclassification in the main model; or they differ, in which case the misclassification process is differential (i.e., it depends on covariates). In the latter case, conditional on a rich set of covariates (e.g., gender, dentition type, tooth type, and surface type, as in the ST validation study), the misclassification probabilities in the two studies may become identical.

Consider an external validation data set as in the ST study. Assume \( P_1 \) and \( P_2 \) are the misclassification processes in the main study and validation study, respectively. Suppose that subjects are well characterized by a rich covariate vector \( \mathbf{z} \). Then, under the settings described above, our claim is that often \( {P_1}\left( {{Y^{*}}\left| {Y,\mathbf{z}} \right.} \right)={P_2}\left( {{Y^{*}}\left| {Y,\mathbf{z}} \right.} \right) \) even when \( {P_1}\left( {{Y^{*}}\left| Y \right.} \right)\ne {P_2}\left( {{Y^{*}}\left| Y \right.} \right) \). Inequality of the (assumed non-differential) misclassification processes \( P_1 \) and \( P_2 \) occurs because \( f_1(\mathbf{z}) \ne f_2(\mathbf{z}) \) in \( {P_j}\left( {{Y^{*}}\left| Y \right.} \right)=\int {{P_j}\left( {{Y^{*}}\left| {Y,\mathbf{z}} \right.} \right){f_j}\left( \mathbf{z} \right)d\mathbf{z}} \) for \( j = 1, 2 \), where \( f_j \) denotes the distribution of \( \mathbf{z} \) in study \( j \). As a result, the misclassification probabilities become identical given \( \mathbf{z} \).
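The argument can be illustrated numerically. The sketch below assumes a single discrete covariate (surface type, with two hypothetical levels), so that the integral reduces to a sum. Both studies share the same conditional sensitivity given the covariate, but the covariate mix differs; the sensitivities and mixing proportions are invented for illustration and do not come from the ST data.

```python
def marginal_sensitivity(se_given_z, f_z):
    """Marginal P(Y*=1 | Y=1) as the sum over z of P(Y*=1 | Y=1, z) * f(z)."""
    if abs(sum(f_z.values()) - 1.0) > 1e-12:
        raise ValueError("covariate distribution must sum to 1")
    return sum(se_given_z[z] * f_z[z] for z in f_z)


# Hypothetical conditional sensitivity, identical in both studies.
se_given_surface = {"occlusal": 0.9, "smooth": 0.6}

# Hypothetical covariate distributions f_1 (main) and f_2 (validation).
f_main = {"occlusal": 0.5, "smooth": 0.5}
f_validation = {"occlusal": 0.8, "smooth": 0.2}

se_main = marginal_sensitivity(se_given_surface, f_main)              # about 0.75
se_validation = marginal_sensitivity(se_given_surface, f_validation)  # about 0.84
```

Although the examiners behave identically on each surface type, the marginal sensitivities differ because the validation sample contains more occlusal surfaces. Applying the marginal validation estimate to the main data would therefore be misleading, whereas the covariate-specific values transfer directly.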

Multilevel model for cross-sectional CE data adjusting for misclassification

Using the parameter estimates of the logistic models for SE and SP from the validation data, say \( \boldsymbol{\upalpha} \) and \( \boldsymbol{\upeta} \), a corrected multilevel logistic model for the observed main data uses \( \pi_{\mathrm{stme}}^{*}=Pr\left( {Y_{\mathrm{M},\mathrm{stme}}^{*}=1\left| {\boldsymbol{\upbeta}, {{\mathbf{x}}_{\mathrm{stme}}},{u_{\mathrm{m}}},\ {u_{\mathrm{tm}}},{u_{\mathrm{e}}},\boldsymbol{\upalpha}, \boldsymbol{\upeta}, \mathbf{z}} \right.} \right) \), which is the observed (corrected for misclassification) probability for CE on surface \( s \) nested in tooth \( t \) in mouth \( m \) from the main data set. It is conditional on \( \mathbf{x}_{\mathrm{stme}} \) and \( \mathbf{z} \) (vectors of covariates from the main data and validation data, respectively), on the random effects \( u_{\mathrm{m}} \), \( u_{\mathrm{tm}} \), and \( u_{\mathrm{e}} \), and on the SE and SP parameter estimates \( \boldsymbol{\upalpha} \) and \( \boldsymbol{\upeta} \), respectively. The estimation was carried out in one joint model, i.e., a model that estimates SE and SP and corrects for misclassification at the same time. The corrected model is given by:

$$ \pi_{\mathrm{stme}}^{*}=\left( {1-{\tau_{00 }}} \right)+\left[ {{\tau_{11 }}+{\tau_{00 }}-1} \right]\left[ {{g^{-1 }}\left( {\mathbf{x}_{\mathrm{stme}}^T\boldsymbol{\upbeta} +{u_{\mathrm{m}}}+{u_{\mathrm{tm}}}+{u_{\mathrm{e}}}} \right)} \right] $$

where \( \tau_{11}=\tau_{11}(\mathbf{z}) \) and \( \tau_{00}=\tau_{00}(\mathbf{z}) \) are the differential SE and SP, and \( g^{-1} \) is the inverse of the logit link.
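The corrected probability can be sketched in a few lines of Python. The fragment assumes that SE and SP follow logistic models in the covariates, i.e., \( \tau_{11}(\mathbf{z}) = g^{-1}(\mathbf{z}^T\boldsymbol{\upalpha}) \) and \( \tau_{00}(\mathbf{z}) = g^{-1}(\mathbf{z}^T\boldsymbol{\upeta}) \); this functional form is one reading of "logistic models of SE and SP", and all coefficient values are hypothetical. The function names are introduced here for illustration only.

```python
import math


def expit(v: float) -> float:
    """Inverse logit, written to avoid overflow for large |v|."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def logistic_rate(z, coef):
    """Assumed logistic model for SE or SP: expit(z^T coef)."""
    return expit(sum(zi * ci for zi, ci in zip(z, coef)))


def observed_ce_probability(pi_true, tau11, tau00):
    """pi* = (1 - tau00) + (tau11 + tau00 - 1) * pi_true."""
    return (1.0 - tau00) + (tau11 + tau00 - 1.0) * pi_true


# Hypothetical coefficients for the SE and SP models, illustration only.
alpha = [1.5, -0.4]
eta = [3.0, 0.2]
z = [1.0, 1.0]                     # intercept term plus one covariate value

tau11 = logistic_rate(z, alpha)    # sensitivity for this covariate pattern
tau00 = logistic_rate(z, eta)      # specificity for this covariate pattern
pi_true = expit(-1.0)              # true CE probability from the main model
pi_star = observed_ce_probability(pi_true, tau11, tau00)
```

Two boundary cases serve as a check on the formula: with perfect scoring (`tau11 = tau00 = 1`) the observed probability equals the true one, and as the true probability goes to 0 or 1 the observed probability goes to `1 - tau00` (the false-positive rate) or `tau11`, respectively.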

Bayesian estimation approach

The posterior summary measures of the parameters were obtained using Markov chain Monte Carlo (MCMC) sampling [21], implemented in JAGS 3.1.0 [22]. Non-informative (vague) priors were used, expressing that there is no prior information on the parameters. For each model, three MCMC chains were run for 100,000 iterations each. Convergence of the chains was checked using the CODA package [23] in R. In particular, the Gelman and Rubin diagnostic \( \widehat{\mathrm{R}} \) was close to 1 for all parameters, which means there was no evidence against convergence. Finally, a sensitivity analysis was performed on the model corrected for misclassification by changing the prior distributions of the fixed effects, in order to check whether the model was robust to such perturbations.
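To make the convergence check concrete, the following Python sketch computes the basic form of the Gelman and Rubin potential scale reduction factor for one scalar parameter from several chains. It is an illustration of the idea only: the CODA package applies additional refinements, so its reported values need not match this simplified version exactly, and the simulated chains below are synthetic rather than output from the models in this article.

```python
import math
import random
import statistics


def gelman_rubin_rhat(chains):
    """Basic potential scale reduction factor for one scalar parameter.

    chains: list of equal-length lists of posterior draws, one list per chain.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average of the within-chain sample variances.
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B: n times the sample variance of the chain means.
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance.
    var_plus = (n - 1) / n * w + b / n
    return math.sqrt(var_plus / w)


# Synthetic chains, for illustration only.
rng = random.Random(2013)
mixed = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
stuck = [[rng.gauss(mu, 1.0) for _ in range(2000)] for mu in (0.0, 5.0, 10.0)]

rhat_mixed = gelman_rubin_rhat(mixed)   # chains sample the same distribution
rhat_stuck = gelman_rubin_rhat(stuck)   # chains sit in different regions
```

When all chains explore the same distribution, the between-chain variance is small relative to the within-chain variance and the statistic is close to 1; when chains are stuck in different regions, it is well above 1.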

Cite this article

Mutsvari, T., Declerck, D. & Lesaffre, E. Correction for misclassification of caries experience in the absence of internal validation data. Clin Oral Invest 17, 1799–1805 (2013). https://doi.org/10.1007/s00784-013-0993-4

  • DOI: https://doi.org/10.1007/s00784-013-0993-4
