OBJECTIVE: Evaluations of screening or diagnostic tests sometimes incorporate measures of overall accuracy, diagnostic accuracy, or test efficiency. These terms refer to a single summary measurement calculated from 2 × 2 contingency tables that is the overall probability that a patient will be correctly classified by a screening or diagnostic test. We assessed the value of overall accuracy in studies of test validity, a topic that has not received adequate emphasis in the clinical literature.
DESIGN: Guided by previous reports, we summarize the issues concerning the use of overall accuracy. To document its use in contemporary studies, we searched the clinical literature from 2000 to 2002 for test evaluation studies that reported overall accuracy derived from a 2 × 2 contingency table.
MEASUREMENTS AND MAIN RESULTS: Overall accuracy is the weighted average of a test’s sensitivity and specificity, where sensitivity is weighted by prevalence and specificity is weighted by the complement of prevalence. Overall accuracy becomes particularly problematic as a measure of validity as 1) the difference between sensitivity and specificity increases and/or 2) the prevalence deviates from 50%. Both situations widen the gap between overall accuracy and either sensitivity or specificity. A summary of results from published studies (N=25) illustrated that the prevalence dependence of overall accuracy can give a distorted impression of the validity of a screening or diagnostic test.
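The weighted-average relationship described above can be illustrated with a minimal Python sketch. The function names and all numbers (sensitivity 0.40, specificity 0.90, and the 2 × 2 cell counts) are hypothetical values chosen for illustration; they are not taken from the studies summarized in this report.

```python
def overall_accuracy(sensitivity, specificity, prevalence):
    """Prevalence-weighted average of sensitivity and specificity."""
    return prevalence * sensitivity + (1.0 - prevalence) * specificity


def accuracy_from_table(tp, fp, fn, tn):
    """Proportion of all subjects correctly classified in a 2 x 2 table."""
    return (tp + tn) / (tp + fp + fn + tn)


# Hypothetical 2 x 2 table: 100 diseased (40 TP, 60 FN) and
# 900 non-diseased (810 TN, 90 FP), i.e. prevalence 0.10,
# sensitivity 0.40, specificity 0.90.
table_accuracy = accuracy_from_table(tp=40, fp=90, fn=60, tn=810)
weighted_accuracy = overall_accuracy(0.40, 0.90, prevalence=0.10)
print(f"from table: {table_accuracy:.2f}, weighted average: {weighted_accuracy:.2f}")

# Same hypothetical test, different prevalences: sensitivity and
# specificity stay fixed, but overall accuracy changes.
for prevalence in (0.10, 0.50, 0.90):
    acc = overall_accuracy(0.40, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  overall accuracy={acc:.2f}")
```

Under these assumed values, the same test has an overall accuracy of about 0.85 at 10% prevalence and about 0.45 at 90% prevalence, although its sensitivity and specificity do not change. At low prevalence the summary figure sits close to the specificity and gives little indication of the test's poor sensitivity.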
CONCLUSIONS: Despite the intuitive appeal of overall accuracy as a single measure of test validity, its dependence on prevalence renders it inferior to the careful and balanced consideration of sensitivity and specificity.
KEY WORDS: accuracy; screening; diagnostic test; research methods; sensitivity; specificity; validity.