Bias, underestimation of risk, and loss of statistical power in patient-level analyses of lesion detection

European Radiology · Computer Applications

Abstract

Purpose

Sensitivity and the false positive rate are usually defined with the patient as the unit of observation, i.e., the diagnostic test detects or does not detect disease in a patient. For tests designed to find and diagnose lesions, e.g., lung nodules, the usual definitions of sensitivity and specificity may be misleading. In this paper we describe and compare five measures of accuracy of lesion detection.

Methods

The five levels of evaluation considered were patient level without localization, patient level with localization, region of interest (ROI) level without localization, ROI level with localization, and lesion level.

Results

We found that estimators of sensitivity that do not require the reader to correctly locate the lesion overstate sensitivity. Patient-level estimators of sensitivity can be misleading when there is more than one lesion per patient, and they reduce study power. Patient-level estimators of the false positive rate can conceal important differences between techniques. Referring clinicians rely on a test’s reported accuracy both to choose the appropriate test and to plan management for their patients. If reported sensitivity is overstated, the clinician could choose the test for disease screening and have false confidence that a negative test represents the true absence of lesions. Similarly, the lower false positive rate associated with patient-level estimators can mislead clinicians about the diagnostic value of the test and foster unwarranted confidence that a positive finding is real.

Conclusion

We present clear recommendations for studies assessing and comparing the accuracy of tests tasked with the detection and interpretation of lesions...


References

  1. Fryback DG, Thornbury JR (1991) The efficacy of diagnostic imaging. Med Decis Mak 11:88–94

  2. Zhou XH, Obuchowski NA, McClish DL (2002) Statistical methods in diagnostic medicine. Wiley, New York

  3. Pepe MS (2004) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford

  4. Metz CE (1978) Basic principles of ROC analysis. Semin Nucl Med 8:283–298

  5. Zweig MH, Campbell G (1993) Receiver operating characteristic plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39:561–577

  6. Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36

  7. Kundel HL, Nodine CF (1983) A visual concept shapes image perception. Radiology 146:363–368

  8. Chakraborty DP (2006) A search model and figure of merit for observer data acquired according to the free-response paradigm. Phys Med Biol 51:3449–3462

  9. Edwards DC, Kupinski MA, Metz CE, Nishikawa RM (2002) Maximum likelihood fitting of FROC curves under an initial-detection-and-candidate-analysis model. Med Phys 29:2861–2870

  10. Wagner RF, Metz CE, Campbell G (2007) Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 14:723–748

  11. Rockette HE (1994) An index of diagnostic accuracy in the multiple disease setting. Acad Radiol 1:283–286

  12. Chakraborty DP (2006) ROC curves predicted by a model of visual search. Phys Med Biol 51:3463–3482

  13. Song T, Bandos AI, Rockette HE (2008) On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 35:1547–1558

  14. Pickhardt PJ, Nugent PA, Mysliwiec PA, Choi RJ, Schindler WR (2004) Location of adenomas missed by optical colonoscopy. Ann Intern Med 141:352–359

  15. Obuchowski NA (1998) On the comparison of correlated proportions for clustered data. Stat Med 17:1495–1507

  16. Obuchowski NA (1997) Nonparametric analysis of clustered ROC curve data. Biometrics 53:170–180

  17. Obuchowski NA, Lieber ML, Powell KA (2000) Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 7:516–525; Author’s Response to Comments, 7:554–555

  18. Beam CA (1998) Analysis of clustered data in receiver operating characteristic studies. Stat Methods Med Res 7:324–336

  19. Rutter CM (2000) Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol 7:516–525

  20. Chakraborty DP, Berbaum KS (2004) Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys 31:2313–2330

  21. Chakraborty DP (2006) Analysis of location specific observer performance data: validated extensions of the jackknife free-response (JAFROC) method. Acad Radiol 13:1187–1193

  22. Kish L (1965) Survey sampling. Wiley, New York

Acknowledgement

This manuscript was partially prepared under a grant from the State of Ohio, Department of Development. The content reflects the views of the Cleveland Clinic Foundation and does not necessarily reflect the views of the State of Ohio, Department of Development.

Author information

Correspondence to Nancy A. Obuchowski.

Appendix

The sample size calculations in Table 4 follow the methods described in Zhou et al. [2]. For 80% power and 5% type I error rate, a formula for determining the sample size for testing the null hypothesis that two sensitivities are equal, versus the alternative hypothesis that the sensitivities differ, is:

$$N_{\text{D}} = \frac{2 \times \text{Sen}_{0} \times \left( 1 - \text{Sen}_{0} \right) \times 7.84}{\left( \text{Sen}_{1} - \text{Sen}_{2} \right)^{2}}$$

where Sen0 is the sensitivity under the null hypothesis, Sen1 is the sensitivity of the first technique under the alternative hypothesis, and Sen2 is the sensitivity of the second technique under the alternative hypothesis. The sample size needed for studies using the level 2 estimators can be estimated by substituting the conjectured values of patient-level sensitivity into this equation.
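This calculation can be sketched directly in code (a minimal illustration, not part of the original appendix; the function name and example sensitivities are hypothetical). The constant 7.84 is the squared sum of the standard normal quantiles for a two-sided 5% test at 80% power, (1.96 + 0.84)².

```python
import math

def patients_needed(sen0: float, sen1: float, sen2: float) -> int:
    """Diseased patients required to detect a difference between Sen1 and Sen2,
    where Sen0 is the common sensitivity under the null hypothesis.
    Uses 7.84 = (z_0.975 + z_0.80)^2 for 80% power, 5% type I error."""
    n_d = (2 * sen0 * (1 - sen0) * 7.84) / (sen1 - sen2) ** 2
    return math.ceil(n_d)  # round up to a whole number of patients

# Hypothetical example: null sensitivity 0.85, alternatives 0.90 vs 0.80
print(patients_needed(0.85, 0.90, 0.80))  # → 200
```

With conjectured patient-level sensitivities substituted for Sen0, Sen1, and Sen2, this gives the sample size for a level 2 (patient-level with localization) study directly.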

For studies using lesion-level estimators (or ROI-level estimators) when there is more than one lesion expected per patient (or more than one ROI with a lesion per patient), several modifications to this formula are required. First, instead of patient-level values of sensitivity, we substitute lesion-level (or ROI-level) values of sensitivity. Second, we divide N_D by the number of lesions (or number of ROIs) per patient, then we multiply by the design effect. The design effect [22] is a factor that accounts for the correlation between lesions within the same patient. The design effect equals \(1 + \left( s - 1 \right)r\), where s is the number of lesions per patient (or the number of ROIs with lesions per patient), and r is the correlation between lesions within the same patient. If there is no interlesion correlation, then the design effect is 1.0. If there is perfect positive interlesion correlation, then the design effect is s.
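The clustering adjustment above can be sketched the same way (again a hypothetical illustration; N_D here is assumed to have been computed from lesion-level sensitivities, and the example values are made up):

```python
import math

def design_effect(s: float, r: float) -> float:
    """Design effect 1 + (s - 1) * r for s lesions (or ROIs with lesions)
    per patient and interlesion correlation r [22]."""
    return 1 + (s - 1) * r

def lesion_level_patients(n_d: float, s: float, r: float) -> int:
    """Adjust a sample size N_D for clustered lesions: divide by the number
    of lesions per patient, then multiply by the design effect."""
    return math.ceil(n_d / s * design_effect(s, r))

# Hypothetical example: N_D = 200 lesions, two lesions per patient
print(lesion_level_patients(200, 2, 0.0))  # no interlesion correlation → 100
print(lesion_level_patients(200, 2, 1.0))  # perfect correlation → 200
```

The two extremes match the text: with no interlesion correlation the required number of patients is N_D/s, and with perfect positive correlation the clustering buys nothing, so the requirement reverts to N_D.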

About this article

Obuchowski, N.A., Mazzone, P.J. & Dachman, A.H. Bias, underestimation of risk, and loss of statistical power in patient-level analyses of lesion detection. Eur Radiol 20, 584–594 (2010). https://doi.org/10.1007/s00330-009-1590-4