Journal of General Internal Medicine, Volume 19, Issue 5, pp 460–465

The use of “overall accuracy” to evaluate the validity of screening or diagnostic tests

  • Anthony J. Alberg
  • Ji Wan Park
  • Brant W. Hager
  • Malcolm V. Brock
  • Marie Diener-West


OBJECTIVE: Evaluations of screening or diagnostic tests sometimes incorporate measures of overall accuracy, diagnostic accuracy, or test efficiency. These terms refer to a single summary measurement calculated from 2 × 2 contingency tables that is the overall probability that a patient will be correctly classified by a screening or diagnostic test. We assessed the value of overall accuracy in studies of test validity, a topic that has not received adequate emphasis in the clinical literature.
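As a minimal sketch of the summary measure described above, overall accuracy can be computed from the four cells of a 2 × 2 table as the proportion of patients who are correctly classified. The function name, argument names, and counts below are hypothetical and chosen only for illustration; they are not taken from the article.

```python
def overall_accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Proportion of all patients correctly classified in a 2 x 2 table.

    tp: diseased, test positive      fp: not diseased, test positive
    fn: diseased, test negative      tn: not diseased, test negative
    """
    total = tp + fp + fn + tn
    if total == 0:
        raise ValueError("the 2 x 2 table is empty")
    # Correct classifications are the true positives plus the true negatives.
    return (tp + tn) / total


# Hypothetical counts, for illustration only.
print(overall_accuracy(tp=40, fp=30, fn=10, tn=120))  # (40 + 120) / 200
```

In this made-up table, sensitivity is 40/50 and specificity is 120/150, yet the single summary value does not reveal either one.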

DESIGN: Guided by previous reports, we summarize the issues concerning the use of overall accuracy. To document its use in contemporary studies, a search was performed for test evaluation studies published in the clinical literature from 2000 to 2002 in which overall accuracy derived from a 2 × 2 contingency table was reported.

MEASUREMENTS AND MAIN RESULTS: Overall accuracy is the weighted average of a test’s sensitivity and specificity, where sensitivity is weighted by prevalence and specificity is weighted by the complement of prevalence. Overall accuracy becomes particularly problematic as a measure of validity as 1) the difference between sensitivity and specificity increases and/or 2) the prevalence deviates from 50%. Both situations lead to an increasing deviation between overall accuracy and either sensitivity or specificity. A summary of results from published studies (N=25) illustrated that the prevalence-dependent nature of overall accuracy has potentially negative consequences that can lead to a distorted impression of the validity of a screening or diagnostic test.
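The weighted-average relationship stated above can be written as overall accuracy = prevalence × sensitivity + (1 − prevalence) × specificity. The sketch below evaluates that expression across several prevalence values for one fixed test. The function name and the sensitivity and specificity values are hypothetical, chosen only to show the pattern; they are not results from the article.

```python
def weighted_accuracy(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Overall accuracy as a prevalence-weighted average of sensitivity and specificity."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    # Sensitivity is weighted by prevalence, specificity by its complement.
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


# Hypothetical test with a large gap between sensitivity and specificity.
sens, spec = 0.50, 0.95
for prev in (0.05, 0.25, 0.50, 0.75, 0.95):
    acc = weighted_accuracy(sens, spec, prev)
    print(f"prevalence={prev:.2f}  overall accuracy={acc:.4f}")
```

With these assumed values the same test yields an overall accuracy of roughly 0.93 at 5% prevalence and roughly 0.52 at 95% prevalence, although its sensitivity and specificity never change. Only at 50% prevalence does overall accuracy equal their simple average.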

CONCLUSIONS: Despite the intuitive appeal of overall accuracy as a single measure of test validity, its dependence on prevalence renders it inferior to the careful and balanced consideration of sensitivity and specificity.

Key words

accuracy; screening; diagnostic test; research methods; sensitivity; specificity; validity





Copyright information

© Society of General Internal Medicine 2004

Authors and Affiliations

  • Anthony J. Alberg (3)
  • Ji Wan Park (3)
  • Brant W. Hager (3)
  • Malcolm V. Brock (2)
  • Marie Diener-West (1)

  1. Department of Biostatistics, The Johns Hopkins Bloomberg School of Public Health, USA
  2. Department of Surgery, Johns Hopkins School of Medicine, Baltimore
  3. Department of Epidemiology, Room E6132B, Johns Hopkins Bloomberg School of Public Health, Baltimore
