Validation of Image Segmentation by Estimating Rater Bias and Variance

  • Simon K. Warfield
  • Kelly H. Zou
  • William M. Wells
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4191)


The accuracy and precision of segmentations of medical images have been difficult to quantify in the absence of a “ground truth” or reference standard segmentation for clinical data. Although physical or digital phantoms can help by providing a reference standard, they cannot reproduce the full range of imaging and anatomical characteristics observed in clinical data.

An alternative assessment approach is to compare to segmentations generated by domain experts. Segmentations may be generated by raters who are trained experts or by automated image analysis algorithms. Typically these segmentations differ due to intra-rater and inter-rater variability. The most appropriate way to compare such segmentations has been unclear.

We present here a new algorithm to enable the estimation of performance characteristics, and a true labeling, from observations of segmentations of imaging data where segmentation labels may be ordered or continuous measures. This approach may be used with, amongst others, surface, distance transform or level set representations of segmentations, and can be used to assess whether or not a rater consistently over-estimates or under-estimates the position of a boundary.
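
To make the idea of estimating rater bias and variance from continuous labels concrete, the following is a minimal sketch and not the authors' published algorithm. It assumes a simple additive model that the abstract does not spell out: each observation is d[i, j] = t[i] + bias[j] + noise, with Gaussian noise of rater-specific variance, a Gaussian prior on the true score t[i], and biases constrained to sum to zero so that they are identifiable. The function name `estimate_rater_bias_variance` and all parameter names are hypothetical.

```python
import numpy as np


def estimate_rater_bias_variance(d, n_iter=300):
    """EM sketch for d[i, j] = t[i] + bias[j] + noise, noise ~ N(0, var[j]).

    Assumptions (not taken from the source): true scores t[i] ~ N(mu, tau2),
    and rater biases sum to zero so they are identifiable.
    d has shape (n_items, n_raters), e.g. signed boundary distances per voxel.
    """
    d = np.asarray(d, dtype=float)
    n_raters = d.shape[1]
    bias = np.zeros(n_raters)
    var = np.ones(n_raters)
    mu = d.mean()
    tau2 = max(d.mean(axis=1).var(), 1e-6)

    for _ in range(n_iter):
        # E-step: Gaussian posterior of each true score under current parameters.
        post_var = 1.0 / (1.0 / tau2 + np.sum(1.0 / var))
        post_mean = post_var * (mu / tau2 + ((d - bias) / var).sum(axis=1))

        # M-step: per-rater bias and variance, then the prior on true scores.
        resid = d - post_mean[:, None]
        bias = resid.mean(axis=0)
        var = np.maximum(((resid - bias) ** 2).mean(axis=0) + post_var, 1e-8)
        mu = post_mean.mean()
        tau2 = ((post_mean - mu) ** 2).mean() + post_var

        # Move any common offset from the biases into the prior mean; the
        # likelihood is unchanged and the biases again sum to zero.
        shift = bias.mean()
        bias = bias - shift
        mu = mu + shift

    return post_mean, bias, var
```

Under this sign convention, a positive estimated bias for a rater would indicate that the rater tends to place values (for example, a boundary position) above the consensus, and a negative bias below it. Because of the sum-to-zero constraint, biases are only meaningful relative to the other raters, so an offset shared by all raters cannot be detected by this sketch.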


Keywords (machine-generated, not supplied by the authors): Image Segmentation; True Score; Rater Segmentation; Rater Bias; True Label



Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Simon K. Warfield (1, 2)
  • Kelly H. Zou (2)
  • William M. Wells (2)
  1. Computational Radiology Laboratory, Dept. Radiology, Children’s Hospital
  2. Dept. Radiology, Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
