Comparative methods for PET image segmentation in pharyngolaryngeal squamous cell carcinoma
- Cite this article as:
- Zaidi, H., Abdoli, M., Fuentes, C.L. et al. Eur J Nucl Med Mol Imaging (2012) 39: 881. doi:10.1007/s00259-011-2053-0
Several methods have been proposed for the segmentation of 18F-FDG uptake in PET. In this study, we assessed the performance of four categories of 18F-FDG PET image segmentation techniques in pharyngolaryngeal squamous cell carcinoma using clinical studies where the surgical specimen served as the benchmark.
Nine PET image segmentation techniques were compared including: five thresholding methods; the level set technique (active contour); the stochastic expectation-maximization approach; fuzzy clustering-based segmentation (FCM); and a variant of FCM, the spatial wavelet-based algorithm (FCM-SW) which incorporates spatial information during the segmentation process, thus allowing the handling of uptake in heterogeneous lesions. These algorithms were evaluated using clinical studies in which the segmentation results were compared to the 3-D biological tumour volume (BTV) defined by histology in PET images of seven patients with T3–T4 laryngeal squamous cell carcinoma who underwent a total laryngectomy. The macroscopic tumour specimens were collected “en bloc”, frozen and cut into 1.7- to 2-mm thick slices, then digitized for use as reference.
The clinical results suggested that four of the thresholding methods and expectation-maximization overestimated the average tumour volume, while a contrast-oriented thresholding method, the level set technique and the FCM-SW algorithm underestimated it, with the FCM-SW algorithm providing relatively the highest accuracy in terms of volume determination (−5.9 ± 11.9%) and overlap index. The mean overlap index varied between 0.27 and 0.54 for the different image segmentation techniques. The FCM-SW segmentation technique showed the best compromise in terms of 3-D overlap index and statistical analysis results with values of 0.54 (0.26–0.72) for the overlap index.
The BTVs delineated using the FCM-SW segmentation technique were seemingly the most accurate and approximated closely the 3-D BTVs defined using the surgical specimens. Adaptive thresholding techniques need to be calibrated for each PET scanner and acquisition/processing protocol, and should not be used without optimization.
Keywords: PET, Segmentation, Biological tumour volume, Head and neck, Treatment planning
Image-guided adaptive radiation therapy using modern technology has emerged as a promising approach to dose escalation in pharyngolaryngeal squamous cell carcinoma. For many years, anatomical imaging was used to delineate gross tumour volumes for radiotherapy treatment planning. However, the wide adoption of PET/CT in the clinical setting and the efficacy of 18F-FDG PET imaging in a wide variety of malignant tumours with high sensitivity, specificity and accuracy stimulated the use of this technology in radiation therapy. PET/CT has been shown to provide superior sensitivity compared to CT simulation, which in some cases might miss regions that appear suspicious on PET images, including distant metastases. PET/CT also sheds light on the actual tumour volume, which might in reality be smaller or bigger on the PET study than on the CT scan alone. In addition, in many studies the findings of anatomical and molecular imaging modalities have disagreed, and the addition of PET has been shown to result in modification of the treatment plans in a substantial number of clinical studies. Another important consideration is the reduction in inter- and intraobserver variability achieved with PET for delineation of the biological tumour volume (BTV) compared to anatomical imaging techniques.
Accurate target volume delineation using PET has proved to be a challenging task owing to the intrinsic properties of PET data. As a result, a wide variety of image segmentation techniques have been proposed. However, validation of the accuracy (fidelity to the truth) and precision (reproducibility) of these algorithms remains unresolved and requires further research and development effort. A challenging and even problematic issue for validation of segmentation algorithms is the identification of a gold standard (i.e. the benchmark). The lack of guidelines established by nuclear medicine and radiation oncology professional societies renders this task more complex. Four approaches have been used in the literature to assess the accuracy of PET image segmentation techniques. These include manual delineation performed by experienced physicians, the use of simulated or experimental phantom studies where the tumour volume and spatial extent are already known, comparison with correlated anatomical gross tumour volumes defined on CT or MRI, and comparison of tumour volumes delineated on clinical PET data with actual tumour volumes measured on the registered macroscopic specimens derived from histology (where a PET scan was undertaken before surgery).
Evaluation and validation of PET image segmentation techniques using simulated or experimental phantom studies is very popular thanks to the availability of a wide variety of static and dynamic physical phantoms and comprehensive anatomical and physiological anthropomorphic models of the human body. The use of the macroscopic surgical specimen for validation of PET image segmentation techniques is one of the most promising approaches reported so far for clinical studies [7, 8], provided that deformations related to the shrinkage of the specimen after surgical excision are taken into account.
We present here an assessment of the performance of four categories of 18F-FDG PET image segmentation techniques for pharyngolaryngeal squamous cell carcinoma in clinical studies, where the 3-D contour defined on the surgical specimen served as the reference.
Materials and methods
PET image segmentation methods can be divided into four broad categories based on the underlying methodology: (1) thresholding methods, (2) variational approaches, (3) stochastic modelling-based techniques, and (4) learning methods. For this study we selected representative methods from each category. Five thresholding methods, one variational method, one stochastic modelling method and two learning methods were selected. These methods are reviewed below briefly for completeness. Detailed descriptions of these methods can be found in the recent review by Zaidi and El Naqa. PET segmentation algorithms were implemented using MATLAB (Mathworks, Natick, MA) with a Mac Pro 8 quad-core Intel Xeon “Nehalem” processor and 64-bit architecture running a Snow Leopard operating system.
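The simplest of the four categories, fixed thresholding, can be illustrated with a short sketch. The example below applies a 40% of maximum uptake cutoff, the fixed threshold referred to later in the text as the 40% of SUVmax algorithm. It is a minimal illustration in plain Python rather than the MATLAB implementation used in the study; the function name `threshold_segment` and the toy image are assumptions made for the example.

```python
def threshold_segment(image, fraction=0.40):
    """Binary mask of voxels whose uptake is at least fraction * maximum uptake.

    `image` is a 3-D volume stored as nested lists [plane][row][column].
    Hypothetical helper, not the authors' code.
    """
    peak = max(v for plane in image for row in plane for v in row)
    cutoff = fraction * peak
    return [[[1 if v >= cutoff else 0 for v in row] for row in plane]
            for plane in image]


# Toy volume with a single 2x2 plane (made-up uptake values).
toy_image = [[[1.0, 2.0], [5.0, 10.0]]]
print(threshold_segment(toy_image))  # [[[0, 0], [1, 1]]]
```

A fixed fraction of this kind ignores background activity and lesion size, which is one reason the adaptive and learning-based methods were included in the comparison.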
A variation of this approach was successfully used for PET image segmentation. PET images are first smoothed using a nonlinear anisotropic diffusion filter and are then added as a second input to the FCM algorithm to optimize the objective function with knowledge about spatial constraint, thus incorporating spatial information (FCM-S). In addition, a methodology was developed to integrate the à trous wavelet transform in the standard FCM algorithm (FCM-SW) to allow handling of uptake in heterogeneous lesions. This is achieved by adding a regularization term to the FCM objective function using the transformation result of the PET image by the à trous wavelet transform with the aim of incorporating information about lesion heterogeneity.
The influence of the wavelet on the voxel clustering update is controlled by βk coefficients, which depend on two parameters set by trial and error. This approach is well established for solving problems where there are multiple chances of obtaining the correct solution. These values were chosen in such a way that the influence of the wavelet-filtered image remained most important for most regions of the image showing high tracer uptake, such that there is a stronger influence on the objective function when the current voxel is located inside the tumour. These values were optimized once and were not adjusted again by hand for the particular images being processed.
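For orientation, the standard FCM iteration that FCM-S and FCM-SW build on can be sketched as follows. This is plain fuzzy C-means on voxel intensities only; it does not include the anisotropic diffusion input, the à trous wavelet regularization term or the βk coefficients described above. The function names, the fuzziness exponent `m = 2` and the deterministic initialisation are assumptions made for the example, not details taken from the study.

```python
def memberships_for(values, centres, m):
    """Fuzzy membership of each intensity value to each cluster centre."""
    exponent = 2.0 / (m - 1.0)
    table = []
    for v in values:
        dists = [abs(v - c) for c in centres]
        zeros = [d == 0.0 for d in dists]
        if any(zeros):
            # Value lies exactly on one or more centres: share full membership.
            share = 1.0 / sum(zeros)
            table.append([share if z else 0.0 for z in zeros])
        else:
            table.append([1.0 / sum((dk / dj) ** exponent for dj in dists)
                          for dk in dists])
    return table


def fuzzy_c_means(values, n_clusters=2, m=2.0, max_iter=100, tol=1e-6):
    """Standard FCM on a flat list of intensities (requires n_clusters >= 2)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centres spread evenly over the intensity range.
    centres = [lo + (hi - lo) * k / (n_clusters - 1) for k in range(n_clusters)]
    for _ in range(max_iter):
        u = memberships_for(values, centres, m)
        new_centres = []
        for k in range(n_clusters):
            weights = [row[k] ** m for row in u]
            new_centres.append(
                sum(w * v for w, v in zip(weights, values)) / sum(weights))
        shift = max(abs(a - b) for a, b in zip(centres, new_centres))
        centres = new_centres
        if shift < tol:
            break
    return centres, memberships_for(values, centres, m)


# Made-up intensities: four background-like and four lesion-like voxels.
voxels = [1.0, 1.2, 0.9, 1.1, 8.0, 8.5, 7.9, 8.2]
centres, u = fuzzy_c_means(voxels)
labels = [row.index(max(row)) for row in u]
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1]
```

Each voxel receives a graded membership to every cluster rather than a hard label, and a hard segmentation is obtained only at the end by taking the cluster with the highest membership.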
Seven patients with T3–T4 laryngeal squamous cell carcinoma from the Louvain database who had undergone an 18F-FDG PET study prior to treatment were included for comparative analysis of the investigated PET image segmentation methods. The patients were immobilized with a tailored thermoplastic mask (Sinmed, Reeuwijk, The Netherlands) attached to a flat table-top to avoid neck motion during scanning. A preinjection transmission scan (10 min) was acquired prior to intravenous injection of 185–370 MBq of 18F-FDG, which was followed by a 60-min dynamic 3-D PET emission scan using an ECAT EXACT HR camera (CTI/Siemens, Knoxville, TN). PET data were reconstructed using a 3-D AW-OSEM algorithm (four subsets and eight iterations) following correction for dead time, randoms, scatter, attenuation and radioactive decay.
Patients underwent a total laryngectomy a few days (average 5 days) following the PET study. A special procedure was developed to allow 3-D coregistration of the macroscopic specimens with the imaging modalities. Fresh surgical specimens were placed in a polystyrene cast containing three longitudinally placed wooden rods that were equally spaced in the transverse plane of the specimen. The cast was filled with a 16% gelatin solution and kept at −20°C for 48 h and then at −80°C for at least 72 h. The authors reported that these fixation and freezing procedures did not result in retraction compared to other methods, as evidenced by their animal data. The macroscopic tumour specimens were gathered, frozen and cut into 1.7- to 2-mm thick slices. The 3-D specimen was reconstructed following digitization and realignment of the obtained thin slices, also taking into account the material lost during slicing. A semiautomated rigid-body registration algorithm was then used to coregister the PET and macroscopic surgical specimen images. The last step consisted of creating the fully 3-D macroscopic tumour volume by delineating separately on each slice the macroscopic tumour extension. The volume obtained served as the reference for evaluation of the investigated PET segmentation techniques.
The EM delineation overestimated the tumour volume in relation to the reference by a mean volume of 22.4 cm3. The level set, FCM and FCM-SW methods achieved good results compared to the thresholding and stochastic models; nevertheless, they also tended to slightly underestimate tumour volumes on average. The FCM-SW approach achieved the closest standard deviation to the reference volumes.
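Volume accuracy of this kind is typically summarized as a signed relative error per patient, reported as mean ± standard deviation across patients (as in the −5.9 ± 11.9% figure quoted for FCM-SW). The sketch below shows that arithmetic under the assumption that the error is defined as (segmented − reference) / reference; the function name and the input volumes are hypothetical and are not data from the study.

```python
from statistics import mean, stdev


def relative_volume_errors(segmented_cm3, reference_cm3):
    """Signed relative volume error (%) per patient.

    Negative values mean the segmentation underestimates the reference volume.
    """
    return [100.0 * (s - r) / r for s, r in zip(segmented_cm3, reference_cm3)]


# Made-up volumes for two patients, in cm3.
errors = relative_volume_errors([9.0, 22.0], [10.0, 20.0])
print(errors)                        # [-10.0, 10.0]
print(mean(errors), stdev(errors))   # mean and sample SD across patients
```

Because over- and underestimates cancel in the mean, the standard deviation and an overlap measure are needed alongside it to judge a method.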
Table: Mean SOIs resulting from the different PET image segmentation methods relative to the reference 3-D contour defined on the surgical specimens. The levels of statistical significance for paired samples are also shown.
Column: SOI in relation to histology. Rows include: Black et al., Biehl et al., Nestle et al., Schaefer et al.
Molecular imaging-guided clinical diagnosis and radiation therapy treatment planning for pharyngolaryngeal squamous cell carcinoma is highly complex, not only because of the diversity of tumour subsites, but also because of the anatomical restrictions of the head and neck region and the critical importance of preserving the function of organs at risk. The additional information provided by PET/CT combined with sophisticated quantitative methodologies could certainly help to improve diagnostic and treatment strategies. The accurate determination of tumour shape and volume from FDG PET images remains a challenging task owing to the limitations of the current generation of PET scanners, particularly their limited spatial resolution and high noise characteristics resulting from their low sensitivity. Despite the very worthwhile research and development efforts and advances in PET image segmentation, there is still scope for improvement given the complexity of the image generation process and the limitations of available techniques, particularly in those with inhomogeneous tumours [31, 32].
This study involved a qualitative and quantitative evaluation of nine PET-based delineation techniques using clinical studies where the 3-D contour defined on in vivo macroscopic surgical specimens served as the reference. Although no reference is perfect, such histological specimens provide the best possible approximation of the tumour boundary. The objective was to select the best available segmentation tool for PET-guided radiotherapy treatment planning and assessment of response to treatment. Although the Louvain database is well documented, the histopathology data were delineated on macroscopic visualization by a single expert histopathologist rather than at the microscopic level and by agreement of an expert panel. In addition, non-uniform uptake of FDG within the tumour volume was lost when compared to the histopathology boundaries, thus excluding the possibility of distinguishing regions in tumours with heterogeneous uptake. Moreover, the data were acquired on an old stand-alone ECAT EXACT HR scanner and reconstructed using predefined parameters, the raw data not being available for reprocessing. These are some of the limitations of this unique dataset. However, it was the only available clinical dataset for pharyngolaryngeal squamous cell carcinoma that provided surgical specimens as reference. We therefore adopted this database as the only available choice for comparison and validation of the investigated PET segmentation techniques. Yet this scanner has a spatial resolution (transaxial spatial resolution varying from 3.6 mm FWHM at the centre to 4.5 mm FWHM tangentially at 20 cm) that is comparable to the resolution achieved by the current generation of PET/CT scanners used to derive actual calibration parameters required by adaptive thresholding approaches [11, 13, 14, 15].
The performance of the nine segmentation techniques was highly dependent on the contrast and noise characteristics of the PET images. As such, the histology-derived contours correlated very differently with the PET images shown in Figs. 2 and 3. The PET images used for the comparative assessment were preprocessed following the work of Geets et al. to account for noise and blurring artefacts that may degrade the definition of tumour boundaries. These methods were applied here for consistency with previous work on these data. However, such preprocessing needs to be carefully applied depending on the scanning and acquisition protocols used. All methods used in this work were implemented as described by the authors in their original articles. It is, however, recognized that adaptive thresholding techniques need to be calibrated and optimized in the clinic for each PET scanner and acquisition/processing protocol. It should be emphasized that calibration is mandatory for any PET segmentation algorithm to adjust the parameters of the method with respect to the spatial resolution and noise properties of the PET scanner.
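To make the calibration requirement concrete, the sketch below fits a scanner-specific threshold curve from phantom measurements. It assumes a relation of the form T = a + b / SBR between the optimal threshold T (as a percentage of maximum uptake) and the measured signal-to-background ratio (SBR), which is one commonly used form for signal-to-background methods; whether each adaptive method in this study uses exactly this form is not stated here. The function names and all numbers are hypothetical and serve only to show the fitting step.

```python
def fit_threshold_curve(sbr_values, optimal_thresholds_pct):
    """Least-squares fit of T = a + b / SBR from phantom calibration points."""
    x = [1.0 / s for s in sbr_values]
    y = list(optimal_thresholds_pct)
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


def adaptive_threshold_pct(sbr, a, b):
    """Threshold (% of maximum uptake) predicted for a lesion with the given SBR."""
    return a + b / sbr


# Made-up calibration points: (SBR, threshold that recovers the known sphere volume).
a, b = fit_threshold_curve([2.0, 4.0, 8.0], [60.0, 45.0, 37.5])
print(round(a, 6), round(b, 6))                    # 30.0 60.0
print(round(adaptive_threshold_pct(5.0, a, b), 6)) # 42.0
```

The fitted coefficients depend on spatial resolution, reconstruction and smoothing, so a curve derived on one scanner or protocol cannot be assumed to transfer to another.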
The results obtained in clinical studies for various lesion sizes and contrasts demonstrate that none of the methods is adequate for all conditions independent of lesion characteristics and scanning conditions. In this clinically realistic set-up with heterogeneous target and background, thresholding methods [11, 13, 14] performed less well than the method of Schaefer et al. Moreover, the latter technique, which incorporates prior knowledge of both contrast and volume, performed as well as the more sophisticated approaches assessed in this work. However, the implementation of techniques such as that of Schaefer et al. depends on the scanner, data acquisition and processing protocol, and often requires extensive phantom studies for calibration, which renders their standardization difficult. This makes studies comparing different sites or multicentre clinical trials difficult to carry out. Furthermore, the thresholding methods do not consider inhomogeneities in the BTV [24, 30].
The clinical assessment of segmentation techniques using surgical specimens as reference showed quite similar results to those of previous physical phantom studies. Previous studies in non-small-cell lung cancer have provided similar conclusions, but with small differences owing to the presence of respiratory motion artefacts in thoracic clinical images.
The level set and FCM algorithms were a good compromise in terms of the relative error in the volume estimate in comparison with the better performance of the FCM-SW algorithm, with SOI values of up to 0.72 and CEs which were smaller than those with the thresholding, variational and stochastic methods. The CE gives an idea of the spatial location and geometrical shape of the segmented image compared to the reference, and could have values higher than 100%. For this criterion, it was observed that the FCM-SW algorithm out-performed three of the other automated techniques compared in this work, and it better handled typical low-contrast and highly noisy images without clearly defined edges. The methods of Biehl et al. and Black et al. and the 40% of SUVmax algorithm were not successful and gave estimated volumes that were significantly different. In addition, they resulted in higher CEs than all the algorithms evaluated in this work. Overall, the automated methods produced more accurate and robust performance than the thresholding methods except that of Schaefer et al. The reasons for this exception are not yet clear and are still being investigated using simulated data. Nevertheless, there are still opportunities to refine the algorithms in our study using a larger sample for learning. Another promising direction is combining complementary information from PET and CT images in a joint segmentation process.
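The two voxel-wise criteria discussed here can be illustrated with a short sketch. The formulas are not given in this text, so the example assumes a Dice-style spatial overlap index, SOI = 2|A∩B| / (|A| + |B|), and a classification error defined as the number of misclassified voxels (false positives plus false negatives) divided by the number of reference voxels. The latter is at least consistent with the remark that CE can exceed 100%. The function name and the toy voxel sets are hypothetical.

```python
def overlap_and_error(segmented, reference):
    """Return (SOI, CE in %) for two collections of voxel coordinates.

    SOI: Dice-style overlap, 0 (no overlap) to 1 (identical volumes).
    CE:  misclassified voxels as a percentage of the reference volume.
    Both definitions are assumptions made for this sketch.
    """
    seg, ref = set(segmented), set(reference)
    overlap = len(seg & ref)
    false_positives = len(seg - ref)   # segmented but outside the reference
    false_negatives = len(ref - seg)   # reference voxels that were missed
    soi = 2.0 * overlap / (len(seg) + len(ref))
    ce = 100.0 * (false_positives + false_negatives) / len(ref)
    return soi, ce


# Toy example: a 4-voxel reference and a 3-voxel segmentation sharing 2 voxels.
reference = [(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1)]
segmented = [(0, 0, 1), (0, 1, 1), (0, 2, 1)]
soi, ce = overlap_and_error(segmented, reference)
print(round(soi, 3), ce)  # 0.571 75.0
```

Under these definitions a segmentation much larger than the reference can have more false positives than there are reference voxels, which is how CE rises above 100% even when part of the tumour is correctly captured.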
Geets et al. developed a gradient-based method using the watershed transform and hierarchical cluster analysis and compared it with an adaptive thresholding method based on the signal-to-background ratio approach as implemented by Daisne et al. Using the same clinical data, the adaptive thresholding method overestimated the actual volume determined in the macroscopic specimens by 68%, whereas the gradient-based approach overestimated the actual volume by about 20%. The FCM-SW approach out-performed both techniques and underestimated the actual volume by only 6% on average.
The results of the study are not straightforward to project to general clinical practice, for example, in the context of the image reconstruction method or acquisition parameters. These parameters are likely to affect the image properties, such as spatial resolution and noise levels, essential for algorithm performance. A more comprehensive assessment of the impact of acquisition, reconstruction and processing techniques on algorithm performance in a more methodical manner over a wider range of resolution/noise levels is required.
With the limited sample size used in this comparative study, the FCM-SW algorithm achieved the highest accuracy for clinical studies in terms of relative error and overlap in comparison with all the other techniques evaluated for biological target definition in pharyngolaryngeal squamous cell carcinoma. Moreover, it was less parametric than stochastic and variational methods, which could benefit from parameter optimization in some cases. Overall, the BTVs delineated using the FCM-SW technique were the most accurate and approximated closely the 3-D BTVs defined in the corresponding surgical specimens. With slight fine tuning, this technique could be a good candidate for PET-guided radiation treatment planning and assessment of response to treatment. Adaptive thresholding techniques need to be calibrated for each PET scanner and acquisition/processing protocol, and should not be used without optimization.
This work was supported by the Swiss National Science Foundation under grant SNSF 31003A-135576, Geneva Cancer League, Indo-Swiss Joint Research Programme ISJRP 138866 and the Natural Sciences and Engineering Research Council of Canada under grant NSERC-RGPIN 397711-11. The authors would like to thank Dr. John Lee (Université catholique de Louvain, Brussels) for providing the clinical PET datasets.
Conflicts of interest
This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.