Methods and recommendations for evaluating and reporting a new diagnostic test

  • A. S. Hess
  • M. Shardell
  • J. K. Johnson
  • K. A. Thom
  • P. Strassle
  • G. Netzer
  • A. D. Harris


No standardized guidelines exist for the biostatistical methods appropriate for studies evaluating diagnostic tests. Publication recommendations such as the STARD statement provide guidance for the analysis of data, but biostatistical advice is minimal and application is inconsistent. This article aims to provide investigators with a self-contained, accessible resource on the biostatistical aspects of study design and reporting. For all dichotomous diagnostic tests, estimates of sensitivity and specificity should be reported with confidence intervals. Power calculations are strongly recommended to ensure that investigators achieve desired levels of precision. In the absence of a gold standard reference test, the composite reference standard method is recommended for improving estimates of the sensitivity and specificity of the test under evaluation.
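
The recommendation to report sensitivity and specificity with confidence intervals can be illustrated with a minimal Python sketch. It computes both measures from the four cells of a 2 × 2 table against the reference standard and attaches a 95% Wilson score interval to each. The choice of the Wilson interval (rather than, for example, an exact Clopper-Pearson interval), the helper names `wilson_interval` and `sens_spec`, and the counts are assumptions made for illustration, not the article's own method.

```python
import math
from typing import NamedTuple

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


class Estimate(NamedTuple):
    point: float
    lower: float
    upper: float


def wilson_interval(successes: int, n: int, z: float = Z_95) -> Estimate:
    """Wilson score interval for the binomial proportion successes / n (hypothetical helper)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    # Clip to [0, 1] to guard against floating-point overshoot at the boundaries.
    return Estimate(p, max(0.0, centre - half), min(1.0, centre + half))


def sens_spec(tp: int, fn: int, tn: int, fp: int) -> tuple[Estimate, Estimate]:
    """Sensitivity = TP/(TP+FN) and specificity = TN/(TN+FP), each with a 95% CI."""
    return wilson_interval(tp, tp + fn), wilson_interval(tn, tn + fp)


# Hypothetical 2 x 2 counts against the reference standard (illustrative only).
sens, spec = sens_spec(tp=90, fn=10, tn=180, fp=20)
print(f"Sensitivity {sens.point:.3f} (95% CI {sens.lower:.3f} to {sens.upper:.3f})")
print(f"Specificity {spec.point:.3f} (95% CI {spec.lower:.3f} to {spec.upper:.3f})")
```

With these hypothetical counts both point estimates are 0.90, but the specificity interval is narrower because it rests on twice as many subjects, which is exactly the information a point estimate alone would hide.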

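One common way to plan for a desired level of precision, sketched here under stated assumptions, is to choose the number of diseased subjects so that the expected 95% confidence interval for sensitivity has a target half-width d, using the normal approximation n = z²p(1 - p)/d², and likewise for specificity among non-diseased subjects; dividing by the expected prevalence converts these counts into a total enrolment. This precision-based formula, the function names `n_for_precision` and `total_enrolment`, and the planning values are assumptions for illustration; the article's own power calculations may use a different approach, and the normal approximation is rough when the expected proportion is close to 0 or 1.

```python
import math

# Two-sided 95% standard normal quantile.
Z_95 = 1.959963984540054


def n_for_precision(expected: float, half_width: float, z: float = Z_95) -> int:
    """Subjects needed so a normal-approximation CI for a proportion has the given half-width.

    Hypothetical helper: n = z^2 * p * (1 - p) / d^2, rounded up.
    """
    if not 0 < expected < 1 or half_width <= 0:
        raise ValueError("expected must lie in (0, 1) and half_width must be positive")
    return math.ceil(z**2 * expected * (1 - expected) / half_width**2)


def total_enrolment(expected_sens: float, expected_spec: float,
                    half_width: float, prevalence: float) -> int:
    """Total subjects so that both sensitivity and specificity reach the target precision."""
    n_diseased = n_for_precision(expected_sens, half_width)
    n_healthy = n_for_precision(expected_spec, half_width)
    # Diseased subjects are a fraction `prevalence` of enrolment; non-diseased are the rest.
    return max(math.ceil(n_diseased / prevalence),
               math.ceil(n_healthy / (1 - prevalence)))


# Hypothetical planning values: sensitivity 0.90, specificity 0.95, half-width 0.05, prevalence 25%.
print(n_for_precision(0.90, 0.05))              # diseased subjects needed
print(total_enrolment(0.90, 0.95, 0.05, 0.25))  # total subjects to enrol
```

With these assumed values, 139 diseased subjects are needed for the sensitivity estimate, so at 25% prevalence about 556 subjects would have to be enrolled; the sensitivity target, not the specificity target, drives the study size.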

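The composite reference standard combines the results of several imperfect reference tests into a single reference classification using a rule that is defined in advance and does not depend on the result of the test under evaluation; the new test is then scored against that classification. The sketch below assumes one such rule (a subject is reference-positive when at least `min_positive` of its reference tests are positive). The rule, the function names `composite_reference` and `accuracy_vs_reference`, and the six-subject data set are hypothetical; in practice the rule should be chosen on clinical grounds.

```python
from typing import Sequence


def composite_reference(ref_results: Sequence[Sequence[bool]], min_positive: int = 1) -> list[bool]:
    """Classify each subject with a composite reference standard (hypothetical rule).

    A subject is CRS-positive when at least `min_positive` of its reference
    test results are positive. The test under evaluation is not used.
    """
    return [sum(results) >= min_positive for results in ref_results]


def accuracy_vs_reference(index: Sequence[bool], reference: Sequence[bool]) -> tuple[float, float]:
    """Return (sensitivity, specificity) of the index test against a reference classification."""
    if len(index) != len(reference):
        raise ValueError("index and reference must have the same length")
    pairs = list(zip(index, reference))
    tp = sum(1 for i, r in pairs if i and r)
    fn = sum(1 for i, r in pairs if not i and r)
    tn = sum(1 for i, r in pairs if not i and not r)
    fp = sum(1 for i, r in pairs if i and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical data: two reference tests per subject, plus the new (index) test result.
refs = [(True, True), (True, False), (False, False), (False, True), (False, False), (True, True)]
index = [True, True, False, False, True, True]

crs_any = composite_reference(refs, min_positive=1)   # positive if either reference test is positive
crs_both = composite_reference(refs, min_positive=2)  # positive only if both reference tests are positive
print(accuracy_vs_reference(index, crs_any))
print(accuracy_vs_reference(index, crs_both))
```

Changing the hypothetical rule from "either positive" to "both positive" moves the estimated sensitivity from 0.75 to 1.00 while specificity stays at 0.50, which illustrates why the composite rule should be prespecified and reported.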



MS’s work on this article was supported by NIH grant 1K25AG034216.

ADH’s work on this article was supported by NIH grant 1K24AI079040-01A1.

Conflict of interest

JKJ has received funding from Becton Dickinson. The other authors declare that they have no conflict of interest.



Copyright information

© Springer-Verlag 2012

Authors and Affiliations

All authors (A. S. Hess, M. Shardell, J. K. Johnson, K. A. Thom, P. Strassle, G. Netzer, A. D. Harris): University of Maryland School of Medicine, Baltimore, USA
