A comparison of ground truth estimation methods



Knowledge of the exact shape of a lesion, or ground truth (GT), is necessary for the development of diagnostic tools, as it supports algorithm validation, measurement metric analysis, and accurate size estimation. Four methods that estimate GTs from multiple readers’ documentations by considering the spatial location of voxels were compared: thresholded Probability-Map at 0.50 (TPM0.50) and at 0.75 (TPM0.75), simultaneous truth and performance level estimation (STAPLE), and truth estimate from self distances (TESD).
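As an illustration of the thresholded probability map idea, the following is a minimal sketch and not the authors' implementation. It assumes each reader's marking is given as a set of voxel coordinates, that the probability of a voxel is the fraction of readers who included it, and that the threshold is inclusive (with four readers, 0.50 then means at least two readers and 0.75 at least three). The function names and the toy data are hypothetical.

```python
from collections import Counter


def probability_map(markings):
    """Map each voxel to the fraction of readers that included it."""
    counts = Counter(voxel for marking in markings for voxel in set(marking))
    n_readers = len(markings)
    return {voxel: count / n_readers for voxel, count in counts.items()}


def thresholded_probability_map(markings, threshold):
    """Return the voxels whose reader agreement is >= threshold.

    The inclusive comparison is an assumption of this sketch.
    """
    pmap = probability_map(markings)
    return {voxel for voxel, p in pmap.items() if p >= threshold}


# Hypothetical toy data: four readers marking voxels (x, y, z) along one row.
readers = [
    {(0, 0, 0), (1, 0, 0), (2, 0, 0)},
    {(0, 0, 0), (1, 0, 0)},
    {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)},
    {(0, 0, 0)},
]

tpm050 = thresholded_probability_map(readers, 0.50)
tpm075 = thresholded_probability_map(readers, 0.75)
print(sorted(tpm050))
print(sorted(tpm075))
```

Under these assumptions the estimate at 0.75 is always a subset of the estimate at 0.50, so its volume can never be larger.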


A subset of the publicly available Lung Image Database Consortium archive was used, selecting pulmonary nodules documented by all four radiologists. The pair-wise similarities between the estimated GTs were analyzed by computing the respective Jaccard coefficients. Then, the estimated volumes were ranked with respect to the readers’ marking volumes, and a sign test was performed on the differences between them.
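The two analysis steps above can be sketched as follows. This is a hedged illustration, not the procedure used in the study: it assumes GT estimates are represented as sets of voxel coordinates, uses an exact two-sided sign test that drops zero differences (the study's exact variant is not stated here), and all function names and numbers are hypothetical.

```python
from math import comb


def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two voxel sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention assumed here for two empty sets
    return len(a & b) / len(union)


def sign_test_p_value(differences):
    """Exact two-sided sign test p-value; zero differences are dropped."""
    positives = sum(1 for d in differences if d > 0)
    negatives = sum(1 for d in differences if d < 0)
    n = positives + negatives
    if n == 0:
        return 1.0
    k = min(positives, negatives)
    # Probability of k or fewer successes under Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical GT estimates from two methods for one nodule.
gt_a = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
gt_b = {(0, 0, 0), (1, 0, 0)}
similarity = jaccard(gt_a, gt_b)

# Hypothetical per-nodule volume differences between two methods.
volume_differences = [1.2, 0.4, -0.3, 0.8, 0.5, 0.9, 0.1, 0.6]
p_value = sign_test_p_value(volume_differences)
print(similarity, p_value)
```

A Jaccard coefficient of 1 means the two estimates coincide voxel for voxel, while lower values indicate spatial disagreement even when the volumes are similar.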


The results show that (a) the rank variations among the four methods and the volume differences between STAPLE and TESD are not statistically significant; (b) TPM0.50 estimates are statistically larger; (c) TPM0.75 estimates are statistically smaller; and (d) there is some spatial disagreement among the estimates, as shown by the one-sided 90% confidence intervals for the pairs TPM0.75 and TPM0.50, TPM0.75 and STAPLE, TPM0.75 and TESD, TPM0.50 and STAPLE, TPM0.50 and TESD, and STAPLE and TESD, respectively: [0.67, 1.00], [0.67, 1.00], [0.77, 1.00], [0.93, 1.00], [0.85, 1.00], [0.85, 1.00].


The method used to estimate the GT is important: the observed differences indicate that STAPLE and TESD, notwithstanding a few weaknesses, appear to be equally viable GT estimators, while the increased availability of computing power is decreasing the appeal of TPMs. Ultimately, the choice between STAPLE and TESD depends on the specific characteristics of the marked data with respect to the two elements that differentiate the methods' approaches: the relative reliabilities of the readers and the reliability of the region boundaries.




Author information



Corresponding author

Correspondence to Alberto M. Biancardi.


About this article

Cite this article

Biancardi, A.M., Jirapatnakul, A.C. & Reeves, A.P. A comparison of ground truth estimation methods. Int J CARS 5, 295–305 (2010). https://doi.org/10.1007/s11548-009-0401-3



  • CAD development
  • Algorithm validation
  • Volumetric measurement
  • Diagnosis
  • Response to therapy