Abstract
Purpose
Sensitivity and the false positive rate are usually defined with the patient as the unit of observation, i.e., the diagnostic test detects or does not detect disease in a patient. For tests designed to find and diagnose lesions, e.g., lung nodules, the usual definitions of sensitivity and specificity may be misleading. In this paper we describe and compare five measures of accuracy of lesion detection.
Methods
The five levels of evaluation considered were patient level without localization, patient level with localization, region of interest (ROI) level without localization, ROI level with localization, and lesion level.
Results
We found that estimators of sensitivity that do not require the reader to correctly locate the lesion overstate sensitivity. Patient-level estimators of sensitivity can be misleading when there is more than one lesion per patient, and they reduce study power. Patient-level estimators of the false positive rate can conceal important differences between techniques. Referring clinicians rely on a test’s reported accuracy both to choose the appropriate test and to plan management for their patients. If reported sensitivity is overstated, the clinician could choose the test for disease screening and have false confidence that a negative test represents the true absence of lesions. Similarly, the lower false positive rate associated with patient-level estimators can mislead clinicians about the diagnostic value of the test and, consequently, about whether a positive finding is real.
Conclusion
We present clear recommendations for studies assessing and comparing the accuracy of tests tasked with the detection and interpretation of lesions...
References
Fryback DG, Thornbury JR (1991) The efficacy of diagnostic imaging. Med Decis Mak 11:88–94
Zhou XH, Obuchowski NA, McClish DL (2002) Statistical methods in diagnostic medicine. Wiley, New York
Pepe MS (2004) The statistical evaluation of medical tests for classification and prediction. Oxford University Press, Oxford
Metz CE (1978) Basic principles of ROC analysis. Semin Nucl Med 8:283–298
Zweig MH, Campbell G (1993) Receiver operating characteristic plots: a fundamental evaluation tool in clinical medicine. Clin Chem 39:561–577
Hanley JA, McNeil BJ (1982) The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143:29–36
Kundel HL, Nodine CF (1983) A visual concept shapes image perception. Radiology 146:363–368
Chakraborty DP (2006) A search model and figure of merit for observer data acquired according to the free-response paradigm. Phys Med Biol 51:3449–3462
Edwards DC, Kupinski MA, Metz CE, Nishikawa RM (2002) Maximum likelihood fitting of FROC curves under an initial-detection-and-candidate-analysis model. Med Phys 29:2861–2870
Wagner RF, Metz CE, Campbell G (2007) Assessment of medical imaging systems and computer aids: a tutorial review. Acad Radiol 14:723–748
Rockette HE (1994) An index of diagnostic accuracy in the multiple disease setting. Acad Radiol 1:283–286
Chakraborty DP (2006) ROC curves predicted by a model of visual search. Phys Med Biol 51:3463–3482
Song T, Bandos AI, Rockette HE (2008) On comparing methods for discriminating between actually negative and actually positive subjects with FROC type data. Med Phys 35:1547–1558
Pickhardt PJ, Nugent PA, Mysliwiec PA, Choi RJ, Schindler WR (2004) Location of adenomas missed by optical colonoscopy. Ann Intern Med 141:352–359
Obuchowski NA (1998) On the comparison of correlated proportions for clustered data. Stat Med 17:1495–1507
Obuchowski NA (1997) Nonparametric analysis of clustered ROC curve data. Biometrics 53:170–180
Obuchowski NA, Lieber ML, Powell KA (2000) Data analysis for detection and localization of multiple abnormalities with application to mammography. Acad Radiol 7:516–525; Author’s Response to Comments, 7:554–555
Beam CA (1998) Analysis of clustered data in receiver operating characteristic studies. Stat Methods Med Res 7:324–336
Rutter CM (2000) Bootstrap estimation of diagnostic accuracy with patient-clustered data. Acad Radiol 7:516–525
Chakraborty DP, Berbaum KS (2004) Observer studies involving detection and localization: modeling, analysis, and validation. Med Phys 31:2313–2330
Chakraborty DP (2006) Analysis of location specific observer performance data: validated extensions of the jackknife free-response (JAFROC) method. Acad Radiol 13:1187–1193
Kish L (1965) Survey sampling. Wiley, New York
Acknowledgement
This manuscript was partially prepared under a grant from the State of Ohio, Department of Development. The content reflects the views of the Cleveland Clinic Foundation and does not necessarily reflect the views of the State of Ohio, Department of Development.
Appendix
The sample size calculations in Table 4 follow the methods described in Zhou et al. [2]. For 80% power and a 5% type I error rate, a formula for the number of diseased patients, \(N_D\), needed to test the null hypothesis that two sensitivities are equal against the alternative hypothesis that the sensitivities differ is:

\[N_D = \frac{\left[z_{\alpha/2}\sqrt{2\,\mathrm{Sen}_0\left(1-\mathrm{Sen}_0\right)} + z_{\beta}\sqrt{\mathrm{Sen}_1\left(1-\mathrm{Sen}_1\right)+\mathrm{Sen}_2\left(1-\mathrm{Sen}_2\right)}\right]^2}{\left(\mathrm{Sen}_1-\mathrm{Sen}_2\right)^2}\]

where \(\mathrm{Sen}_0\) is the sensitivity under the null hypothesis, \(\mathrm{Sen}_1\) is the sensitivity of the first technique under the alternative hypothesis, and \(\mathrm{Sen}_2\) is the sensitivity of the second technique under the alternative hypothesis. The sample size needed for studies using the level 2 estimators can be estimated by substituting the conjectured values of patient-level sensitivity into this equation.
For studies using lesion-level estimators (or ROI-level estimators) when more than one lesion (or more than one ROI with a lesion) is expected per patient, several modifications to this formula are required. First, instead of patient-level values of sensitivity, we substitute lesion-level (or ROI-level) values of sensitivity. Second, we divide \(N_D\) by the number of lesions (or number of ROIs) per patient, then multiply by the design effect. The design effect [22] is a factor that accounts for the correlation between lesions within the same patient. The design effect equals \(1 + \left(s - 1\right)r\), where s is the number of lesions per patient (or the number of ROIs with lesions per patient), and r is the correlation between lesions within the same patient. If there is no interlesion correlation, then the design effect is 1.0. If there is perfect positive interlesion correlation, then the design effect is s.
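The two appendix steps can be sketched numerically. The snippet below is an illustrative implementation, not code from the paper: it assumes the standard two-sided normal-approximation formula for comparing two sensitivities described in Zhou et al. [2], and the function names are our own. It first computes the patient-level \(N_D\), then applies the lesion-level adjustment (divide by lesions per patient, multiply by the design effect \(1+(s-1)r\)).

```python
from math import ceil, sqrt
from statistics import NormalDist


def n_diseased(sen0, sen1, sen2, alpha=0.05, power=0.80):
    """Number of diseased patients for a two-sided test that two
    sensitivities are equal (normal approximation, as in Zhou et al.)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96 for a 5% type I error rate
    z_b = NormalDist().inv_cdf(power)          # 0.84 for 80% power
    numerator = (z_a * sqrt(2 * sen0 * (1 - sen0))
                 + z_b * sqrt(sen1 * (1 - sen1) + sen2 * (1 - sen2))) ** 2
    return numerator / (sen1 - sen2) ** 2


def n_lesion_level(n_d, s, r):
    """Adjust a patient-level sample size for lesion-level (or ROI-level)
    analysis: divide by s lesions per patient, multiply by the design
    effect 1 + (s - 1) * r, where r is the interlesion correlation."""
    design_effect = 1 + (s - 1) * r
    return ceil(n_d / s * design_effect)


# Example: null sensitivity 0.80, alternatives 0.90 vs 0.70,
# two lesions per patient, interlesion correlation 0.3.
n_d = n_diseased(sen0=0.80, sen1=0.90, sen2=0.70)
n_adj = n_lesion_level(n_d, s=2, r=0.3)
```

Note the two limiting cases in the text: with r = 0 the design effect is 1.0 and the required sample size is simply \(N_D/s\), while with r = 1 the design effect is s and the adjustment cancels, leaving the patient-level \(N_D\) unchanged.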
Cite this article
Obuchowski, N.A., Mazzone, P.J. & Dachman, A.H. Bias, underestimation of risk, and loss of statistical power in patient-level analyses of lesion detection. Eur Radiol 20, 584–594 (2010). https://doi.org/10.1007/s00330-009-1590-4