Comparative Validation of Graphical Models for Learning Tumor Segmentations from Noisy Manual Annotations

  • Frederik O. Kaster
  • Bjoern H. Menze
  • Marc-André Weber
  • Fred A. Hamprecht
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6533)


Classification-based approaches for segmenting medical images commonly suffer from missing ground truth: often one has to resort to manual labelings by human experts, which may show considerable intra-rater and inter-rater variability. We experimentally evaluate several latent class and latent score models for tumor classification based on manual segmentations of different quality, using approximate variational techniques for inference. For the first time, we also study models that make use of image feature information on this specific task. Additionally, we analyze the outcome of hybrid techniques formed by combining aspects of different models. Benchmarking results on simulated MR images of brain tumors are presented: while simple baseline techniques already gave very competitive performance, significant improvements could be made by explicitly accounting for rater quality. Furthermore, we point out the transfer of these models to the task of fusing manual tumor segmentations derived from different imaging modalities on real-world data.
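To make the idea of explicitly accounting for rater quality concrete, the following is a minimal sketch of a latent class model for binary tumor/non-tumor labels. It rests on several assumptions that are not taken from the paper: raters are conditionally independent given the true label, each rater is described by a single sensitivity and specificity, and plain EM is used in place of the approximate variational inference the abstract mentions. The function names and the toy data are hypothetical.

```python
from typing import List, Tuple


def majority_vote(labels: List[List[int]]) -> List[int]:
    """Baseline fusion: labels[i][r] is rater r's binary label for pixel i."""
    return [1 if 2 * sum(row) > len(row) else 0 for row in labels]


def em_latent_class(
    labels: List[List[int]], n_iter: int = 50, eps: float = 1e-6
) -> Tuple[List[float], List[float], List[float]]:
    """Estimate per-pixel tumor probabilities and per-rater quality with EM.

    Returns (mu, sens, spec): mu[i] is the posterior tumor probability of
    pixel i; sens[r] and spec[r] are rater r's estimated sensitivity and
    specificity.
    """
    n_pixels, n_raters = len(labels), len(labels[0])

    def clip(v: float) -> float:
        # Keep probabilities away from 0 and 1 so no product vanishes entirely.
        return min(max(v, eps), 1.0 - eps)

    # Initialise the posteriors with the soft majority vote.
    mu = [sum(row) / n_raters for row in labels]
    sens = [0.5] * n_raters
    spec = [0.5] * n_raters

    for _ in range(n_iter):
        # M-step: re-estimate the prior and each rater's sensitivity/specificity.
        pos = sum(mu)
        neg = n_pixels - pos
        prior = clip(pos / n_pixels)
        sens = [
            clip(sum(m * row[r] for m, row in zip(mu, labels)) / max(pos, eps))
            for r in range(n_raters)
        ]
        spec = [
            clip(sum((1 - m) * (1 - row[r]) for m, row in zip(mu, labels)) / max(neg, eps))
            for r in range(n_raters)
        ]

        # E-step: posterior probability that each pixel is tumor.
        new_mu = []
        for row in labels:
            p, q = prior, 1.0 - prior
            for r, y in enumerate(row):
                p *= sens[r] if y else 1.0 - sens[r]
                q *= 1.0 - spec[r] if y else spec[r]
            new_mu.append(p / (p + q))
        mu = new_mu

    return mu, sens, spec


# Toy data (hypothetical): 12 pixels, 3 raters of differing reliability.
truth = [1] * 6 + [0] * 6
rater0 = list(truth)                      # makes no mistakes
rater1 = list(truth); rater1[0] = 0       # misses one tumor pixel
rater2 = list(truth); rater2[1] = 0; rater2[7] = 1  # one miss, one false alarm
toy_labels = [list(votes) for votes in zip(rater0, rater1, rater2)]

mu, sens, spec = em_latent_class(toy_labels)
fused = [1 if m > 0.5 else 0 for m in mu]
```

On data this clean the majority vote is already a strong baseline, which is in line with the abstract's remark about simple baselines; the latent class model additionally yields per-rater quality estimates. For many raters, the products in the E-step should be computed in log space to avoid underflow.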


Keywords (added by machine, not by the authors): Hybrid Model; Area Under Curve; Manual Segmentation; Tumor Segmentation; Tumor Probability





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Frederik O. Kaster (1, 2)
  • Bjoern H. Menze (3, 4)
  • Marc-André Weber (5)
  • Fred A. Hamprecht (1)

  1. Heidelberg Collaboratory for Image Processing, University of Heidelberg, Germany
  2. German Cancer Research Center, Heidelberg, Germany
  3. CSAIL, Massachusetts Institute of Technology, Cambridge, USA
  4. INRIA Sophia-Antipolis Méditerranée, France
  5. Department of Diagnostic Radiology, University of Heidelberg, Germany
