Nonparametric estimation of ROC curves based on Bayesian models when the true disease state is unknown

  • Chong Wang
  • Bruce W. Turnbull
  • Yrjö T. Gröhn
  • Søren S. Nielsen


We develop a Bayesian methodology for nonparametric estimation of ROC curves used to evaluate the accuracy of a diagnostic procedure. We consider the situation where there is no perfect reference test, that is, no “gold standard”. The method is based on a multinomial model for the joint distribution of test-positive and test-negative observations. We use a Bayesian approach that ensures the natural monotonicity of the resulting ROC curve estimate. Markov chain Monte Carlo (MCMC) methods are used to compute posterior estimates of the sensitivities and specificities that provide the basis for inference about the accuracy of the diagnostic procedure. Because there is no gold standard, identifiability requires that the data come from at least two populations with different prevalences. No assumption is needed concerning the shape of the distributions of test values for the diseased and non-diseased individuals in these populations. We discuss an application to ELISA scores from the diagnostic testing of paratuberculosis (Johne’s Disease) in several herds of dairy cows and compare the results with those obtained from some previously proposed methods.
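The abstract describes the model only in outline. As a rough illustration of the general idea, here is a minimal sketch of a generic two-class latent-class (mixture) model fitted by a data-augmentation Gibbs sampler; it is not the authors' model, priors, or code. It assumes that test values (for example ELISA scores) have been binned into J ordered categories, that population k has its own prevalence `pi[k]`, and that the category distributions `theta` (diseased) and `phi` (non-diseased) are shared across populations. The flat Beta(1, 1) and Dirichlet(1, ..., 1) priors, the starting values, and the function names `gibbs_no_gold_standard` and `roc_from_cell_probs` are hypothetical. In this sketch monotonicity comes for free: calling a result positive when its category is at least c makes sensitivity and the false-positive rate tail sums of nonnegative probabilities, so every posterior draw yields a monotone ROC curve.

```python
import numpy as np


def roc_from_cell_probs(theta, phi):
    """Return (false-positive rate, sensitivity) for cutoffs c = 0..J.

    theta[j] = P(category j | diseased), phi[j] = P(category j | non-diseased).
    A result is called positive when its category index is >= c, so both
    rates are tail sums of nonnegative probabilities and never increase in c.
    """
    theta = np.asarray(theta, dtype=float)
    phi = np.asarray(phi, dtype=float)
    se = np.append(np.cumsum(theta[::-1])[::-1], 0.0)   # Se(c) = sum_{j>=c} theta_j
    fpr = np.append(np.cumsum(phi[::-1])[::-1], 0.0)    # 1 - Sp(c) = sum_{j>=c} phi_j
    return fpr, se


def gibbs_no_gold_standard(counts, n_iter=2000, burn_in=500, seed=0):
    """Hypothetical data-augmentation Gibbs sampler for a two-class mixture.

    counts[k, j] = number of subjects in population k whose test value falls
    in ordered category j. Population k has prevalence pi[k]; theta and phi
    are shared across populations. Flat Beta(1, 1) and Dirichlet(1, ..., 1)
    priors are an illustrative assumption, not the paper's choice.
    """
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=np.int64)
    n_pop, n_cat = counts.shape
    pi = np.full(n_pop, 0.5)
    # Starting values tilt "diseased" toward high categories; this fixes the
    # labelling, because the symmetric priors do not distinguish the classes.
    theta = np.linspace(1.0, 2.0, n_cat)
    theta /= theta.sum()
    phi = theta[::-1].copy()
    draws = []
    for it in range(n_iter):
        # Step 1: impute how many subjects in each (population, category)
        # cell are diseased, given the current parameter values.
        p_dis = pi[:, None] * theta[None, :]
        p_non = (1.0 - pi[:, None]) * phi[None, :]
        diseased = rng.binomial(counts, p_dis / (p_dis + p_non))
        non_diseased = counts - diseased
        # Step 2: conjugate draws given the completed (augmented) data.
        pi = rng.beta(1 + diseased.sum(axis=1), 1 + non_diseased.sum(axis=1))
        theta = rng.dirichlet(1 + diseased.sum(axis=0))
        phi = rng.dirichlet(1 + non_diseased.sum(axis=0))
        if it >= burn_in:
            draws.append((pi.copy(), theta.copy(), phi.copy()))
    return draws
```

A posterior ROC summary can then be formed by applying `roc_from_cell_probs` to each retained draw's `(theta, phi)` and averaging sensitivity and false-positive rate across draws at each cutoff. The sketch does not reproduce any additional constraints or prior information the paper may use, and it relies on its starting values rather than a formal constraint to avoid label switching.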

Key Words

Diagnostic test; Johne’s Disease; Markov chain Monte Carlo; No gold standard; Sensitivity; Specificity


Copyright information

© International Biometric Society 2007

Authors and Affiliations

  • Chong Wang (1, 2)
  • Bruce W. Turnbull (3)
  • Yrjö T. Gröhn (4)
  • Søren S. Nielsen (5)

  1. Department of Mathematics, College of Veterinary Medicine, Cornell University, Ithaca
  2. Postdoctoral Associate, Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca
  3. School of Operations Research and Industrial Engineering and Department of Statistical Science, Cornell University, Ithaca
  4. Department of Population Medicine and Diagnostic Sciences, College of Veterinary Medicine, Cornell University, Ithaca
  5. Department of Large Animal Sciences, The Royal Veterinary and Agricultural University, Frederiksberg C, Denmark
