Abstract
Purpose
Sensitivity and the false positive rate are usually defined with the patient as the unit of observation, i.e., the diagnostic test detects or does not detect disease in a patient. For tests designed to find and diagnose lesions, e.g., lung nodules, the usual definitions of sensitivity and specificity may be misleading. In this paper we describe and compare five measures of accuracy of lesion detection.
Methods
The five levels of evaluation considered were patient level without localization, patient level with localization, region of interest (ROI) level without localization, ROI level with localization, and lesion level.
Results
We found that estimators of sensitivity that do not require the reader to correctly locate the lesion overstate sensitivity. Patient-level estimators of sensitivity can be misleading when there is more than one lesion per patient, and they reduce study power. Patient-level estimators of the false positive rate can conceal important differences between techniques. Referring clinicians rely on a test’s reported accuracy both to choose the appropriate test and to plan management for their patients. If reported sensitivity is overstated, the clinician could choose the test for disease screening and have false confidence that a negative test represents the true absence of lesions. Similarly, the lower false positive rate associated with patient-level estimators can mislead clinicians about the diagnostic value of the test and, consequently, about whether a positive finding is real.
Conclusion
We present clear recommendations for studies assessing and comparing the accuracy of tests tasked with the detection and interpretation of lesions...
References
Fryback DG, Thornbury JR (1991) The efficacy of diagnostic imaging. Med Decis Mak 11:88–94
Zhou XH, Obuchowski NA, McClish DL (2002) Statistical methods in diagnostic medicine. Wiley, New York
Pepe MS (2004) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford
Metz CE (1978) Basic principles of ROC analysis. Semin Nucl Med 8:283–298
Zweig MH, Campbell G (1993) Receiver operating characteristic plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39:561–577
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36
Kundel HL, Nodine CF (1983) A visual concept shapes image perception. Radiology 146:363–368
Chakraborty DP (2006) A search model and figure of merit for observer data acquired according to the free-response paradigm. Phys Med Biol 51:3449–3462
Edwards DC, Kupinski MA, Metz CE, Nishikawa RM (2002) Maximum likelihood fitting of FROC curves under an initial-detection-and-candidate-analysis model. Med Phys 29:2861–2870
Wagner RF, Metz CE, Campbell G (2007) Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 14:723–748
Rockette HE (1994) An index of diagnostic accuracy in the multiple disease setting. Acad Radiol 1:283–286
Chakraborty DP (2006) ROC curves predicted by a model of visual search. Phys Med Biol 51:3463–3482
Song T, Bandos AI, Rockette HE (2008) On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 35:1547–1558
Pickhardt PJ, Nugent PA, Mysliwiec PA, Choi RJ, Schindler WR (2004) Location of adenomas missed by optical colonoscopy. Ann Intern Med 141:352–359
Obuchowski NA (1998) On the comparison of correlated proportions for clustered data. Stat Med 17:1495–1507
Obuchowski NA (1997) Nonparametric analysis of clustered ROC curve data. Biometrics 53:170–180
Obuchowski NA, Lieber ML, Powell KA (2000) Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 7:516–525; Author’s Response to Comments, 7:554–555
Beam CA (1998) Analysis of clustered data in receiver operating characteristic studies. Stat Methods Med Res 7:324–336
Rutter CM (2000) Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol 7:516–525
Chakraborty DP, Berbaum KS (2004) Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys 31:2313–2330
Chakraborty DP (2006) Analysis of location specific observer performance data: validated extensions of the jackknife free-response (JAFROC) method. Acad Radiol 13:1187–1193
Kish L (1965) Survey sampling. Wiley, New York
Acknowledgement
This manuscript was partially prepared under a grant from the State of Ohio, Department of Development. The content reflects the views of the Cleveland Clinic Foundation and does not necessarily reflect the views of the State of Ohio, Department of Development.
Appendix
The sample size calculations in Table 4 follow the methods described in Zhou et al. [2]. For 80% power and a 5% type I error rate, a formula for the number of diseased patients, \(N_D\), needed to test the null hypothesis that two sensitivities are equal against the alternative hypothesis that the sensitivities differ is:

\[N_D = \frac{\left[z_{\alpha/2}\sqrt{2\,\mathrm{Sen}_0\left(1-\mathrm{Sen}_0\right)} + z_{\beta}\sqrt{\mathrm{Sen}_1\left(1-\mathrm{Sen}_1\right)+\mathrm{Sen}_2\left(1-\mathrm{Sen}_2\right)}\right]^2}{\left(\mathrm{Sen}_1-\mathrm{Sen}_2\right)^2}\]

where \(\mathrm{Sen}_0\) is the sensitivity under the null hypothesis, \(\mathrm{Sen}_1\) is the sensitivity of the first technique under the alternative hypothesis, and \(\mathrm{Sen}_2\) is the sensitivity of the second technique under the alternative hypothesis. The sample size needed for studies using the level 2 estimators can be estimated by substituting the conjectured values of patient-level sensitivity into this equation.
For studies using lesion-level estimators (or ROI-level estimators) when more than one lesion (or more than one ROI with a lesion) is expected per patient, several modifications to this formula are required. First, instead of patient-level values of sensitivity, we substitute lesion-level (or ROI-level) values of sensitivity. Second, we divide \(N_D\) by the number of lesions (or number of ROIs) per patient, then multiply by the design effect. The design effect [22] is a factor that accounts for the correlation between lesions within the same patient. The design effect equals \(1 + \left(s - 1\right)r\), where s is the number of lesions per patient (or the number of ROIs with lesions per patient), and r is the correlation between lesions within the same patient. If there is no interlesion correlation, then the design effect is 1.0. If there is perfect positive interlesion correlation, then the design effect is s.
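The two appendix steps can be sketched numerically. The snippet below is an illustrative implementation, not code from the paper: it assumes the standard two-sided normal-approximation formula for comparing two sensitivities described in Zhou et al. [2], and the function names are our own. It first computes the patient-level \(N_D\), then applies the lesion-level adjustment (divide by lesions per patient, multiply by the design effect \(1+(s-1)r\)).

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_diseased(sen0, sen1, sen2, alpha=0.05, power=0.80):
    """Number of diseased patients for a two-sided test that two
    sensitivities are equal (normal approximation, as in Zhou et al.)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for a 5% type I error rate
    z_b = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    numerator = (z_a * sqrt(2 * sen0 * (1 - sen0))
                 + z_b * sqrt(sen1 * (1 - sen1) + sen2 * (1 - sen2))) ** 2
    return numerator / (sen1 - sen2) ** 2


def n_lesion_level(n_d, s, r):
    """Adjust a patient-level sample size for lesion-level (or ROI-level)
    analysis: divide by s lesions per patient, multiply by the design
    effect 1 + (s - 1) * r, where r is the interlesion correlation."""
    design_effect = 1 + (s - 1) * r
    return ceil(n_d / s * design_effect)


# Example: null sensitivity 0.80, alternatives 0.90 vs 0.70,
# two lesions per patient, interlesion correlation 0.3.
n_d = n_diseased(sen0=0.80, sen1=0.90, sen2=0.70)
n_adj = n_lesion_level(n_d, s=2, r=0.3)
```

Note the two limiting cases in the text: with r = 0 the design effect is 1.0 and the required sample size is simply \(N_D/s\), while with r = 1 the design effect is s and the adjustment cancels, leaving the patient-level \(N_D\) unchanged.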
Cite this article
Obuchowski, N.A., Mazzone, P.J. & Dachman, A.H. Bias, underestimation of risk, and loss of statistical power in patient-level analyses of lesion detection. Eur Radiol 20, 584–594 (2010). https://doi.org/10.1007/s00330-009-1590-4