Balancing the Role of Priors in Multi-Observer Segmentation Evaluation

Journal of Signal Processing Systems

Abstract

Comparing a group of segmentations produced by multiple observers is known to be a challenging problem. A good segmentation evaluation method would allow different segmentations not only to be compared but also to be combined into a “true” segmentation with higher consensus. Numerous multi-observer segmentation evaluation approaches have been proposed in the literature. STAPLE in particular probabilistically estimates the true segmentation by optimally combining the observed segmentations with a prior model of the truth. Because STAPLE is an Expectation–Maximization (EM) algorithm, its convergence to the desired local optimum depends on good initializations for the truth prior and the observer-performance prior. However, accurate modeling of the initial truth prior is nontrivial. Moreover, of the two priors, the truth prior always dominates, so that in scenarios where meaningful observer-performance priors are available, STAPLE cannot take advantage of that information. In this paper, we propose a Bayesian decision formulation of the problem that permits the two types of prior knowledge to be integrated in a complementary manner in four cases with differing application purposes: (1) with a known truth prior; (2) with an observer prior; (3) with neither a truth prior nor an observer prior; and (4) with both a truth prior and an observer prior. The third and fourth cases are not discussed (or are effectively ignored) by STAPLE. We propose a new method for combining multiple-observer segmentations based on the maximum a posteriori (MAP) principle, which respects the observer prior regardless of whether a truth prior is available. Based on the four scenarios, we have developed a web-based software application that implements the flexible segmentation evaluation framework for digitized uterine cervix images. Experimental results show that our framework is flexible in effectively integrating different priors for multi-observer segmentation evaluation, and that its results compare favorably with those of the STAPLE algorithm and the Majority Vote Rule.
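
As a minimal illustration of why observer-performance priors can matter, the sketch below compares the Majority Vote Rule with a per-pixel posterior decision. It is not the method proposed in this paper. It only applies Bayes' rule to each pixel under the assumption that observers err independently, which is the form of posterior that also appears in the expectation step of STAPLE. The sensitivities, specificities, truth prior, toy decisions, and function names are all hypothetical values chosen for illustration.

```python
from math import prod


def majority_vote(decisions):
    """Label a pixel foreground when more than half of the observers do."""
    return [int(2 * sum(pixel) > len(pixel)) for pixel in decisions]


def posterior_foreground(decisions, sensitivity, specificity, truth_prior):
    """Per-pixel P(truth = 1 | decisions), assuming independent observers."""
    posteriors = []
    for pixel in decisions:
        # Likelihood of the observed labels if the true label is foreground.
        like_fg = prod(p if d else 1 - p for d, p in zip(pixel, sensitivity))
        # Likelihood of the observed labels if the true label is background.
        like_bg = prod(1 - q if d else q for d, q in zip(pixel, specificity))
        numerator = truth_prior * like_fg
        posteriors.append(numerator / (numerator + (1 - truth_prior) * like_bg))
    return posteriors


# Hypothetical data: 4 pixels (rows) labeled by 3 observers (columns).
decisions = [(1, 1, 1), (1, 0, 0), (0, 1, 1), (0, 0, 0)]
sensitivity = [0.95, 0.60, 0.60]  # assumed P(observer says 1 | truth is 1)
specificity = [0.95, 0.60, 0.60]  # assumed P(observer says 0 | truth is 0)

vote = majority_vote(decisions)
post = posterior_foreground(decisions, sensitivity, specificity, truth_prior=0.5)
map_labels = [int(w > 0.5) for w in post]

print("majority vote:", vote)
print("posterior    :", [round(w, 3) for w in post])
print("MAP labels   :", map_labels)
```

With these assumed values, the two pixels on which the observers disagree are labeled differently by the two rules: the single, more reliable observer outweighs the two weaker ones in the posterior decision, whereas majority voting treats all three observers equally.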

References

  1. Warfield, S. K., Zou, K. H., & Wells, W. M. (2004). Simultaneous Truth and Performance Level Estimation (STAPLE): An algorithm for the validation of image segmentation. IEEE Transactions on Medical Imaging, July.

  2. Lotenberg, S., Greenspan, H., Gordon, S., Long, L. R., Jeronimo, J., & Antani, S. K. (2007). Automatic evaluation of uterine cervix segmentations. Proceedings of SPIE Medical Imaging, 6515, 65151J–1-12.

  3. Zhu, Y., Long, L. R., Antani, S. K., Xue, Z., & Thoma, G. R. (2007). Web-based STAPLE for quality estimation of multiple image segmentations. Poster at 20th NIH Research Festival (IMAG-12), National Institutes of Health, September.

  4. Zhang, Y. J. (1996). A survey on evaluation methods for image segmentation. Pattern Recognition, 29(8), 1335–1346.

  5. Yasnoff, W. A., Mui, J. K., & Bacus, J. W. (1977). Error measures in scene segmentation. Pattern Recognition, 9(4), 217–231.

  6. Huang, Q., & Dom, B. (1995). Quantitative methods of evaluating image segmentation. Proceedings IEEE International Conference on Image Processing, 3, 53–56.

  7. Martin, D. (2002). An empirical approach to grouping and segmentation. PhD dissertation, University of California, Berkeley.

  8. Cardoso, J. S., & Corte-Real, L. (2005). Toward a generic evaluation of image segmentation. IEEE Transactions on Image Processing, 14(11), 1773–1782.

  9. Monteiro, F. C., & Campilho, A. C. (2006). Performance evaluation of image segmentation. ICIAR06 (I: 248–259).

  10. Kittler, J., Hatef, M., Duin, R. P. W., & Matas, J. (1998). On combining classifiers. IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), 226–239.

  11. Windridge, D., & Kittler, J. (2003). A morphologically optimal strategy for classifier combination: Multiple expert fusion as a tomographic process. IEEE Transactions on Pattern Analysis and Machine Intelligence, 25(3), 343–353.

  12. Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3, 79–87.

  13. Jordan, M. I., & Jacobs, R. A. (1993). Hierarchical mixtures of experts and the EM algorithm. Tech. Rep. AIM-1440.

  14. Restif, C. (2007). Revisiting the evaluation of segmentation results: Introducing confidence maps. Medical Image Computing and Computer-Assisted Intervention, 2, 588–595.

  15. Martin, A., Laanaya, H., & Arnold-Bos, A. (2006). Evaluation for uncertain image classification and segmentation. Pattern Recognition, 39(11), 1987–1995.

  16. Berger, J. (1985). Statistical decision theory and Bayesian analysis. New York: Springer-Verlag.

  17. Prasad, M., Sowmya, A., & Koch, I. (2004). Feature subset selection using ICA for classifying emphysema in HRCT images. 17th International Conference on Pattern Recognition (ICPR), 4, 515–518.

  18. Prasad, M., Sowmya, A., & Wilson, P. Multi-level classification of emphysema in HRCT lung images. Pattern Analysis & Applications

  19. Herrero, R., Schiffman, M. H., Bratti, C., et al. (1997). Design and methods of a population-based natural history study of cervical neoplasia in a rural province of Costa Rica: The Guanacaste Project. Revista Panamericana de Salud Pública, 1(5), 362–375.

  20. Huang, X., Wang, W., Xue, Z., Antani, S., Long, L. R., & Jeronimo, J. (2008). Tissue classification using cluster features for lesion detection in digital cervigrams. San Diego: SPIE Medical Imaging.

Author information

Corresponding author

Correspondence to Yaoyao Zhu.

About this article

Cite this article

Zhu, Y., Huang, X., Wang, W. et al. Balancing the Role of Priors in Multi-Observer Segmentation Evaluation. J Sign Process Syst Sign Image Video Technol 55, 185–207 (2009). https://doi.org/10.1007/s11265-008-0215-5

  • DOI: https://doi.org/10.1007/s11265-008-0215-5
