Abstract
Purpose
Knowledge of the exact shape of a lesion, or ground truth (GT), is necessary for the development of diagnostic tools: it underpins algorithm validation, measurement metric analysis, and accurate size estimation. Four methods that estimate GTs from multiple readers’ markings by considering the spatial location of voxels were compared: the probability map thresholded at 0.50 (TPM0.50) and at 0.75 (TPM0.75), simultaneous truth and performance level estimation (STAPLE), and truth estimate from self distances (TESD).
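As an illustration of the thresholded probability map idea, the following minimal Python sketch computes, for each voxel, the fraction of readers who marked it and keeps the voxels at or above a threshold. The function name, the representation of markings as sets of voxel coordinates, the toy data, and the inclusive comparison (>=) are all assumptions made for this example, not details taken from the article.

```python
from collections import Counter


def thresholded_probability_map(markings, threshold):
    """Keep voxels marked by at least `threshold` (a fraction) of the readers.

    `markings` is a list of sets of voxel coordinates, one set per reader.
    The inclusive comparison (>=) is an assumption of this sketch.
    """
    n_readers = len(markings)
    # Count how many readers included each voxel.
    counts = Counter(voxel for marking in markings for voxel in set(marking))
    return {voxel for voxel, c in counts.items() if c / n_readers >= threshold}


# Hypothetical markings from four readers (toy coordinates).
readers = [
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(0, 0, 0), (1, 0, 0)},
    {(0, 0, 0), (1, 0, 0), (3, 0, 0)},
    {(0, 0, 0), (2, 0, 0)},
]

tpm_050 = thresholded_probability_map(readers, 0.50)  # at least 2 of 4 readers
tpm_075 = thresholded_probability_map(readers, 0.75)  # at least 3 of 4 readers
```

With four readers, a threshold of 0.50 keeps voxels marked by at least two readers and 0.75 keeps voxels marked by at least three, so the 0.75 estimate is always contained in the 0.50 estimate.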
Methods
A subset of the publicly available Lung Image Database Consortium archive was used, restricted to pulmonary nodules documented by all four radiologists. The pair-wise similarities between the estimated GTs were analyzed by computing the respective Jaccard coefficients. The estimated volumes were then ranked with respect to the readers’ marking volumes, and a sign test was performed on the differences between them.
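The two measures named above can be sketched in a few lines of Python. The Jaccard coefficient is the size of the intersection of two voxel sets divided by the size of their union. The sign test shown here is an exact two-sided binomial version that drops ties; whether the article used a one-sided or two-sided variant is not stated in this text, so that choice, the function names, and the volume values are assumptions made for illustration.

```python
from math import comb


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two voxel sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets are identical
    return len(a & b) / len(union)


def sign_test_p_value(xs, ys):
    """Exact two-sided sign test on paired samples; tied pairs are dropped."""
    diffs = [x - y for x, y in zip(xs, ys) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Probability of a split at least this extreme under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical voxel sets from two estimators.
overlap = jaccard({1, 2, 3}, {2, 3, 4})

# Hypothetical paired volume estimates from two methods.
volumes_a = [10, 12, 9, 11, 13, 14]
volumes_b = [8, 11, 7, 10, 12, 15]
p_value = sign_test_p_value(volumes_a, volumes_b)
```

The sign test uses only the direction of each paired difference, not its magnitude, which makes it a conservative choice when volume differences are far from normally distributed.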
Results
(a) The rank variations among the four methods and the volume differences between STAPLE and TESD are not statistically significant; (b) TPM0.50 estimates are statistically larger; (c) TPM0.75 estimates are statistically smaller; (d) there is some spatial disagreement among the estimates, as shown by the one-sided 90% confidence intervals for each pair: TPM0.75 and TPM0.50, [0.67, 1.00]; TPM0.75 and STAPLE, [0.67, 1.00]; TPM0.75 and TESD, [0.77, 1.00]; TPM0.50 and STAPLE, [0.93, 1.00]; TPM0.50 and TESD, [0.85, 1.00]; STAPLE and TESD, [0.85, 1.00].
Conclusions
The method used to estimate the GT is important: the differences found indicate that STAPLE and TESD, notwithstanding a few weaknesses, appear to be equally viable GT estimators, while the increased availability of computing power is reducing the appeal of TPMs. Ultimately, the choice between STAPLE and TESD depends on the specific characteristics of the marked data with respect to the two elements that differentiate the approaches: the relative reliabilities of the readers and the reliability of the region boundaries.
Cite this article
Biancardi, A.M., Jirapatnakul, A.C. & Reeves, A.P. A comparison of ground truth estimation methods. Int J CARS 5, 295–305 (2010). https://doi.org/10.1007/s11548-009-0401-3