Abstract
Ideal reference standards for comparing segmentation algorithms balance trade-offs between data set size, the cost of creating the reference standard, and the resulting accuracy. Because reference standard quality affects the likelihood of detecting significant improvements (i.e. the statistical power), we derived a sample size formula for segmentation accuracy comparison using an imperfect reference standard. We expressed this formula as a function of algorithm performance and reference standard quality (e.g. measured with a high-quality reference standard on pilot data) to reveal the relationship between reference standard quality and statistical power, addressing two key study design questions: (1) How many validation images are needed to compare segmentation algorithms? (2) How accurate should the reference standard be? The resulting formula predicted statistical power to within 2% of Monte Carlo simulations across a range of model parameters. A case study using the PROMISE12 prostate segmentation data set shows the practical use of the formula.
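To make the link between sample size and statistical power concrete, the following is a minimal sketch and not the formula derived in the paper. It assumes a generic textbook normal-approximation sample size for a paired comparison and checks it against a small Monte Carlo simulation. The function names and the parameters `delta` (mean per-image accuracy difference between two algorithms) and `sd_diff` (standard deviation of that difference) are hypothetical, and reference standard quality is not modeled explicitly here.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def paired_sample_size(delta, sd_diff, alpha=0.05, power=0.8):
    """Approximate number of images needed to detect a mean paired
    difference `delta`, using the standard normal approximation.
    This is a generic textbook formula, not the paper's derivation."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_beta = z.inv_cdf(power)           # quantile for the target power
    return math.ceil(((z_alpha + z_beta) * sd_diff / delta) ** 2)


def simulated_power(n, delta, sd_diff, alpha=0.05, trials=2000, seed=0):
    """Monte Carlo estimate of power: the fraction of simulated studies
    in which a two-sided test on the paired differences rejects."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Assumed model: per-image differences are i.i.d. Gaussian.
        diffs = [rng.gauss(delta, sd_diff) for _ in range(n)]
        z_stat = mean(diffs) / (stdev(diffs) / math.sqrt(n))
        if abs(z_stat) > z_crit:
            rejections += 1
    return rejections / trials


# Illustrative values only (assumed, not taken from the paper).
n = paired_sample_size(delta=0.02, sd_diff=0.05)
power_est = simulated_power(n, delta=0.02, sd_diff=0.05)
print(n, round(power_est, 2))
```

Comparing the simulated power with the target used in the formula mirrors, in a much simpler setting, the kind of check against Monte Carlo simulations described in the abstract.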
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Gibson, E., Huisman, H.J., Barratt, D.C. (2015). Statistical Power in Image Segmentation: Relating Sample Size to Reference Standard Quality. In: Navab, N., Hornegger, J., Wells, W., Frangi, A. (eds) Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015. MICCAI 2015. Lecture Notes in Computer Science, vol 9351. Springer, Cham. https://doi.org/10.1007/978-3-319-24574-4_13
DOI: https://doi.org/10.1007/978-3-319-24574-4_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-24573-7
Online ISBN: 978-3-319-24574-4
eBook Packages: Computer Science (R0)