A comparison of ground truth estimation methods

  • Alberto M. Biancardi
  • Artit C. Jirapatnakul
  • Anthony P. Reeves
Original Article



Knowledge of the exact shape of a lesion, or ground truth (GT), is necessary for the development of diagnostic tools by means of algorithm validation, measurement metric analysis, and accurate size estimation. Four methods that estimate GTs from multiple readers’ markings by considering the spatial location of voxels were compared: thresholded probability map at 0.50 (TPM0.50) and at 0.75 (TPM0.75), simultaneous truth and performance level estimation (STAPLE), and truth estimate from self distances (TESD).
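
As an illustration of the simplest of the four estimators, a thresholded probability map can be sketched as follows. This is a minimal sketch and not the authors' implementation: it assumes each reader's marking is given as a set of voxel coordinates, that the probability of a voxel is the fraction of readers who marked it, and that the threshold is inclusive (`>=`). The function name and the toy data are hypothetical.

```python
from collections import Counter


def thresholded_probability_map(reader_masks, threshold):
    """Return the voxels marked by at least `threshold` of the readers.

    reader_masks: list of sets of voxel coordinates, one set per reader
                  (an assumed representation, chosen for brevity).
    threshold:    fraction in (0, 1]; the inclusive comparison is an assumption.
    """
    if not reader_masks:
        raise ValueError("at least one reader mask is required")
    n_readers = len(reader_masks)
    # Count how many readers marked each voxel.
    votes = Counter(voxel for mask in reader_masks for voxel in set(mask))
    # Keep voxels whose agreement fraction reaches the threshold.
    return {v for v, count in votes.items() if count / n_readers >= threshold}


# Toy data: four readers with progressively smaller markings.
readers = [
    {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)},
    {(0, 0, 0), (0, 0, 1), (0, 1, 0)},
    {(0, 0, 0), (0, 0, 1)},
    {(0, 0, 0)},
]
tpm_050 = thresholded_probability_map(readers, 0.50)  # marked by >= 2 of 4
tpm_075 = thresholded_probability_map(readers, 0.75)  # marked by >= 3 of 4
```

With an inclusive threshold, the TPM0.75 estimate is always a subset of the TPM0.50 estimate, so its volume can never be larger.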


A subset of the publicly available Lung Image Database Consortium archive was used, selecting pulmonary nodules documented by all four radiologists. The pair-wise similarities between the estimated GTs were analyzed by computing the respective Jaccard coefficients. Then, with respect to the readers’ marking volumes, the estimated volumes were ranked and the sign test of the differences between them was performed.
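
The two analysis steps above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the authors' code: estimates are represented as sets of voxel coordinates, the sign test is the exact two-sided binomial version with ties discarded, and both function names are hypothetical.

```python
from math import comb


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two voxel sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here: two empty sets are identical
    return len(a & b) / len(union)


def sign_test_p_value(x, y):
    """Exact two-sided sign test for paired samples (ties are dropped)."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [xi - yi for xi, yi in zip(x, y) if xi != yi]
    n = len(diffs)
    if n == 0:
        return 1.0  # no informative pairs
    positives = sum(d > 0 for d in diffs)
    k = min(positives, n - positives)
    # Probability of a count at least this extreme under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical paired volumes (arbitrary units) from two estimators.
volumes_a = [5, 6, 7, 8, 9, 10]
volumes_b = [1, 1, 1, 1, 1, 1]
p = sign_test_p_value(volumes_a, volumes_b)
overlap = jaccard({1, 2, 3}, {2, 3, 4})
```

The sign test uses only the direction of each paired difference, not its magnitude, which makes it a reasonable choice when the volume differences cannot be assumed to be symmetric or normally distributed.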


The comparison showed that (a) the rank variations among the four methods and the volume differences between STAPLE and TESD are not statistically significant; (b) TPM0.50 estimates are statistically larger; (c) TPM0.75 estimates are statistically smaller; and (d) there is some spatial disagreement in the estimates, as shown by the one-sided 90% confidence intervals for each pair of methods: TPM0.75 and TPM0.50, [0.67, 1.00]; TPM0.75 and STAPLE, [0.67, 1.00]; TPM0.75 and TESD, [0.77, 1.00]; TPM0.50 and STAPLE, [0.93, 1.00]; TPM0.50 and TESD, [0.85, 1.00]; STAPLE and TESD, [0.85, 1.00].


The method used to estimate the GT is important: the differences highlighted that STAPLE and TESD, notwithstanding a few weaknesses, appear to be equally viable GT estimators, while the increased availability of computing power is decreasing the appeal of TPMs. Ultimately, the choice between the two depends on the specific characteristics of the marked data with respect to the two elements that differentiate the methods: the relative reliabilities of the readers and the reliability of the region boundaries.


CAD development, Algorithm validation, Volumetric measurement, Diagnosis, Response to therapy





Copyright information

© CARS 2009

Authors and Affiliations

  • Alberto M. Biancardi (1)
  • Artit C. Jirapatnakul (1)
  • Anthony P. Reeves (1)
  1. Cornell University, Ithaca, USA
