Inter-observer variability of manual contour delineation of structures in CT
Objectives
To quantify the inter-observer variability of manual delineation of lesion and organ contours in CT, in order to establish a reference standard for volumetric measurements used in clinical decision making and for the evaluation of automatic segmentation algorithms.
Materials and methods
Eleven radiologists manually delineated 3193 contours of liver tumours (896), lung tumours (1085), kidney contours (434) and brain hematomas (497) on 490 slices of clinical CT scans. A comparative analysis of the delineations was then performed to quantify the inter-observer delineation variability with standard volume metrics and with new group-wise metrics for delineations produced by groups of observers.
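The pairwise comparison with standard volume metrics can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the study's implementation. Each delineation is represented as a set of voxel coordinates, and the volume overlap variability of two delineations A and B is assumed to be the volume overlap error 100 · (1 − |A ∩ B| / |A ∪ B|). The paper may define the metric differently, and the function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def volume_overlap_variability(a: set, b: set) -> float:
    """Assumed metric: 100 * (1 - |A & B| / |A | B|), i.e. the volume overlap error."""
    union = a | b
    if not union:
        return 0.0  # two empty delineations are treated as full agreement
    return 100.0 * (1.0 - len(a & b) / len(union))


def mean_pairwise_variability(delineations: list) -> float:
    """Average the assumed metric over every unordered pair of observers."""
    pairs = combinations(delineations, 2)
    return mean(volume_overlap_variability(a, b) for a, b in pairs)


# Toy delineations of one structure on a single slice, as (row, col) voxel sets.
obs_a = {(0, 0), (0, 1), (1, 0), (1, 1)}
obs_b = {(0, 1), (1, 1), (1, 2)}
obs_c = set(obs_a)  # a third observer who agrees exactly with obs_a

print(volume_overlap_variability(obs_a, obs_b))
print(mean_pairwise_variability([obs_a, obs_b, obs_c]))
```

Sets keep the sketch dependency-free; for full CT volumes, boolean NumPy arrays combined with logical AND and OR would be the practical equivalent.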
Results
The mean volume overlap variability values and ranges between the delineations of two observers were: liver tumours 17.8 [-5.8, +7.2]%, lung tumours 20.8 [-8.8, +10.2]%, kidney contours 8.8 [-0.8, +1.2]% and brain hematomas 18 [-6.0, +6.0]%. For any two randomly selected observers, the mean delineation volume overlap variability ranged from 5% to 57%. Groups of two, three and five observers captured on average 37%, 53% and 72% of the variability, respectively; eight observers accounted for 75–94% of the total variability. Across all cases, 38.5% of the delineation non-agreement was due to parts of a single observer's delineation disagreeing with the others. No statistically significant difference in delineation variability was found between observers with different levels of expertise.
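The group-wise percentages can be illustrated with a minimal sketch. The definitions here are assumptions for illustration, not the paper's group-wise metrics: the variability region of a group is taken to be the voxels marked by some but not all of its observers; the variability captured by a subgroup is the size of its region relative to the region of all observers; and a single-observer disagreement is a voxel in the total region that exactly one observer marked, or exactly one observer left out. Delineations are sets of voxel indices, and all function names are hypothetical.

```python
from collections import Counter
from itertools import combinations
from statistics import mean


def variability_region(group) -> set:
    """Voxels marked by at least one, but not all, observers in the group."""
    return set().union(*group) - set.intersection(*group)


def captured_fraction(subgroup, all_observers) -> float:
    """Share of the full group's variability region covered by a subgroup's region."""
    total = variability_region(all_observers)
    if not total:
        return 1.0  # no disagreement at all, so nothing is missed
    # A subgroup's region is always a subset of the full group's region.
    return len(variability_region(subgroup)) / len(total)


def mean_captured_fraction(all_observers, k: int) -> float:
    """Average captured fraction over all subgroups of k observers."""
    return mean(captured_fraction(sub, all_observers)
                for sub in combinations(all_observers, k))


def single_observer_fraction(all_observers) -> float:
    """Share of disagreement voxels where exactly one observer differs from the rest."""
    n = len(all_observers)
    counts = Counter(v for d in all_observers for v in d)
    total = variability_region(all_observers)
    if not total:
        return 0.0
    # count == 1: only one observer included the voxel;
    # count == n - 1: only one observer left it out.
    lone = sum(1 for v in total if counts[v] in (1, n - 1))
    return lone / len(total)


# Four hypothetical observers delineating a 1D toy structure (voxel indices).
observers = [{1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}, {3, 4, 5, 6}, {3, 4, 5}]
print(mean_captured_fraction(observers, 2))
print(single_observer_fraction(observers))
```

Averaging over every subgroup of size k, rather than picking one subgroup, is what makes a statement like "groups of two observers captured on average X% of the variability" independent of which observers happen to be chosen.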
Conclusions
The variability of manual delineations is large and spans a wide range across observers, structures and pathologies. Two and even three observers may not be sufficient to establish the full range of inter-observer variability.
Key Points
• This study quantifies the inter-observer variability of manual delineation of lesions and organ contours in CT.
• The variability of manual delineations between two observers can be large. Two and even three observers capture only a fraction of the full range of inter-observer variability observed in common practice.
• Quantifying inter-observer manual delineation variability is necessary to establish a reference standard for radiologist training and evaluation and for the evaluation of automatic segmentation algorithms.
Keywords: Humans; Observer variation; Reproducibility of results
Abbreviations
- HD LED: High-definition light-emitting diode
- IRB: Institutional review board
- MDCT: Multiple-detector computed tomography
We thank Dr. Alexander Benstein and the team of radiologists of the Department of Radiology, Hadassah Hebrew University Medical Center, Jerusalem, Israel, for their participation in the manual delineation project. We also thank Dr. Tammy Riklin-Raviv, Ben-Gurion University of the Negev, for providing the brain CT scans and the brain hematoma delineations.
This study received partial funding from the Israel Ministry of Science, Technology and Space (grant 53681, 2016-19), from the Oppenheimer Applied Research Grant, The Hebrew University, and from TUBITAK ARDEB (grant no. 110E264, 2015-16).
Compliance with ethical standards
The scientific guarantor of this publication is Prof. Leo Joskowicz.
Conflict of interest
The authors of this manuscript declare no relationships with any companies whose products or services may be related to the subject matter of the article.
Statistics and biometry
No complex statistical methods were necessary for this paper.
Written informed consent was waived by the institutional review board.
Institutional review board approval was obtained.
• performed at one institution