Combining Multiple Image Segmentations by Maximizing Expert Agreement
A common difficulty in collecting ground truth for medical images is that multiple experts provide manual segmentations that are only partially consistent and, in some cases, marked with varying confidence. As a result, there is considerable spatial variation between the expert segmentations, and for training and testing the “true” ground truth must be estimated by disambiguating (combining) the provided segments. STAPLE and its derivatives are the state-of-the-art approach for disambiguating multiple spatial segments provided by clinicians. In this work, we propose a simple yet effective procedure based on maximizing the joint agreement of the experts. Our algorithm produces the disambiguation that is optimal in the sense of maximal agreement, and it uses no priors. In the experimental part, we generate a new ground truth for the popular diabetic retinopathy benchmark DiaRetDB1, for which the original expert markings are publicly available. We demonstrate performance superior to both the original and the STAPLE-generated ground truth. In addition, the DiaRetDB1 baseline method performs better with the new ground truth.
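The abstract does not spell out the agreement measure or the search procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes binary expert masks, uses the Jaccard overlap ratio as the agreement measure, and searches over vote-count thresholds for the fused mask whose summed overlap with all experts is highest. The names `jaccard` and `fuse_by_agreement` are hypothetical, and expert confidence levels are ignored.

```python
from typing import List, Sequence

Mask = List[List[int]]  # binary mask: 1 = marked region, 0 = background


def jaccard(a: Mask, b: Mask) -> float:
    """Overlap ratio (intersection over union) of two binary masks.

    Two empty masks are treated as being in full agreement.
    """
    inter = union = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            inter += pa & pb
            union += pa | pb
    return 1.0 if union == 0 else inter / union


def fuse_by_agreement(experts: Sequence[Mask]) -> Mask:
    """Pick the vote-thresholded mask with the highest summed overlap.

    Assumption: candidates are restricted to "at least k experts marked
    this pixel" for k = 1..n. No prior on experts or pixels is used.
    """
    n = len(experts)
    rows, cols = len(experts[0]), len(experts[0][0])
    # Per-pixel count of experts who marked the pixel.
    votes = [[sum(m[r][c] for m in experts) for c in range(cols)]
             for r in range(rows)]
    best_mask: Mask = []
    best_score = -1.0
    for k in range(1, n + 1):
        candidate = [[1 if v >= k else 0 for v in row] for row in votes]
        # Joint agreement: sum of overlaps with every expert.
        score = sum(jaccard(candidate, m) for m in experts)
        if score > best_score:  # ties keep the smaller threshold
            best_mask, best_score = candidate, score
    return best_mask


if __name__ == "__main__":
    expert_a = [[1, 1, 1, 0]]
    expert_b = [[0, 1, 1, 0]]
    expert_c = [[0, 1, 1, 1]]
    print(fuse_by_agreement([expert_a, expert_b, expert_c]))
```

Restricting the candidates to vote thresholds keeps the search to n evaluations per image; a different agreement measure or candidate set could be substituted without changing the overall structure.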
Keywords: Ground Truth, Diabetic Retinopathy, Receiver Operating Characteristic Curve, Lesion Type, Equal Error Rate