Abstract
Comparing a group of segmentations produced by multiple observers is known to be a challenging problem. A good segmentation evaluation method would allow different segmentations not only to be compared, but also to be combined to generate a “true” segmentation with higher consensus. Numerous multi-observer segmentation evaluation approaches have been proposed in the literature; STAPLE in particular probabilistically estimates the true segmentation by optimally combining the observed segmentations with a prior model of the truth. Because STAPLE is an Expectation–Maximization (EM) algorithm, its convergence to the desired local optimum depends on good initializations for the truth prior and the observer-performance prior. However, accurate modeling of the initial truth prior is nontrivial. Moreover, of the two priors, the truth prior always dominates, so that in scenarios where meaningful observer-performance priors are available, STAPLE cannot take advantage of that information. In this paper, we propose a Bayesian decision formulation of the problem that permits the two types of prior knowledge to be integrated in a complementary manner in four cases with differing application purposes: (1) with a known truth prior; (2) with an observer prior; (3) with neither a truth prior nor an observer prior; and (4) with both a truth prior and an observer prior. The third and fourth cases are not discussed (or are effectively ignored) by STAPLE; we propose a new method for combining multiple-observer segmentations based on the maximum a posteriori (MAP) principle, which respects the observer prior regardless of the availability of the truth prior. Based on the four scenarios, we have developed a web-based software application that implements the flexible segmentation evaluation framework for digitized uterine cervix images. Experimental results show that our framework is flexible in effectively integrating different priors for multi-observer segmentation evaluation, and that it generates results that compare favorably with those of the STAPLE algorithm and the Majority Vote Rule.
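To make the interplay between the two priors concrete, the following is a minimal per-pixel sketch in Python. It is not the formulation developed in the paper: it assumes binary labels, conditionally independent observers, and an observer prior expressed as a sensitivity and a specificity per observer. The function names `majority_vote` and `map_label` and all numeric values are hypothetical and chosen only for illustration.

```python
from math import log


def majority_vote(decisions):
    """Label a pixel 1 when more than half of the observers marked it."""
    return 1 if 2 * sum(decisions) > len(decisions) else 0


def map_label(decisions, sensitivity, specificity, truth_prior=0.5):
    """Illustrative MAP label for one pixel (assumed model, not the paper's).

    decisions   : list of 0/1 labels, one per observer
    sensitivity : assumed P(observer says 1 | truth is 1), one per observer
    specificity : assumed P(observer says 0 | truth is 0), one per observer
    truth_prior : assumed P(truth is 1); 0.5 stands in for "no truth prior"
    """
    # Work in log space so that products of many probabilities stay stable.
    log_fg = log(truth_prior)
    log_bg = log(1.0 - truth_prior)
    for d, p, q in zip(decisions, sensitivity, specificity):
        if d == 1:
            log_fg += log(p)        # correct detection if truth is 1
            log_bg += log(1.0 - q)  # false alarm if truth is 0
        else:
            log_fg += log(1.0 - p)  # miss if truth is 1
            log_bg += log(q)        # correct rejection if truth is 0
    return 1 if log_fg > log_bg else 0


# One reliable observer marks the pixel; two less reliable observers do not.
decisions = [1, 0, 0]
expert_first = [0.95, 0.6, 0.6]

vote = majority_vote(decisions)
observer_prior_only = map_label(decisions, expert_first, expert_first)
strong_truth_prior = map_label(decisions, expert_first, expert_first,
                               truth_prior=0.05)
no_priors = map_label(decisions, [0.7] * 3, [0.7] * 3)

print(vote, observer_prior_only, strong_truth_prior, no_priors)
```

In this toy setting the majority vote discards the reliable observer's label, the observer prior alone recovers it, and a strongly skewed truth prior overrides it again, which is the kind of imbalance the abstract describes. With equal observer parameters and a uniform truth prior, the sketch agrees with the majority vote.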
Cite this article
Zhu, Y., Huang, X., Wang, W. et al. Balancing the Role of Priors in Multi-Observer Segmentation Evaluation. J Sign Process Syst Sign Image Video Technol 55, 185–207 (2009). https://doi.org/10.1007/s11265-008-0215-5
DOI: https://doi.org/10.1007/s11265-008-0215-5