Background

Target volume delineation in the treatment of head and neck cancers is a critical issue in the current era of highly conformal radiotherapy with intensity modulated radiotherapy (IMRT) techniques. Steep dose gradients allow sparing of adjacent critical structures but also introduce the potential for geographical misses leading to marginal recurrences if target volume delineation is not accurate [1–3]. Delineation variability can have a large impact on the dose to the tumour and organs at risk [4], and tumour delineation inaccuracy is recognised as a key source of error in radiotherapy delivery [5, 6]. Computed tomography (CT) remains the core of radiotherapy planning, with the electron density map generated providing accurate dosimetry. However, for delineation of the gross tumour volume (GTV) the limitations of CT-based delineation are widely acknowledged, and were clearly demonstrated in a study of the delineation of supra-glottic tumours with a 50 % degree of agreement between experienced physicians [7].

The integration of multimodality imaging into the radiotherapy planning process provides the opportunity to improve upon the reliance on CT-based tumour delineation. Magnetic resonance imaging (MR) offers excellent soft tissue discrimination, multiplanar imaging capabilities and, importantly, image quality that is less susceptible than CT to artefact from dental amalgam [8, 9]. Anatomical imaging with CT or MR is inherently limited in its ability to discriminate tumour tissue from surrounding soft tissues. As a result, there has been considerable interest in utilising functional imaging to complement anatomical imaging [10, 11]. 2-Deoxy-2-[18F]-Fluoro-D-glucose positron emission tomography-computed tomography (FDG PET-CT) is a widely used functional imaging technique in oncology; tumour cells exhibit differential glucose uptake (the ‘Warburg effect’), which provides the basis for the identification of cancer [12]. The potential relevance of FDG PET-CT to radiotherapy planning is highlighted by the finding that loco-regional recurrences occur in-field in regions which are FDG-avid at baseline [13].

Some major institutions employ tight volumetric margins in the treatment of oropharyngeal cancer; for example, recently reported series from major institutions [14–16] have employed GTV to clinical target volume (CTV) margins of 0–10 mm. However, the limited soft tissue contrast of CT, commonly combined with interference from dental artefact, makes CT-based delineation of oropharyngeal primary tumours in routine clinical practice particularly challenging [17]. Therefore, the use of multimodality imaging to aid accurate GTV delineation for oropharyngeal primaries is appealing. However, only limited data are available to inform the intermodality comparison of CT, MR and FDG PET-CT for oropharyngeal carcinoma [18, 19].

The primary aim of this prospective study was to quantitatively investigate the variation in oropharyngeal squamous cell carcinoma (OSCC) primary GTV delineation with CT, MR and FDG PET-CT, using volumetric and positional analyses.

Methods

Inclusion criteria

Inclusion criteria for this prospective single centre pilot imaging study were: age ≥18 years, histologically proven squamous cell carcinoma of the head and neck region, WHO performance status 0–2, decision to proceed with (chemo)radiotherapy with curative intent following discussion in a multi-disciplinary meeting, measurable primary cancer on routine pre-treatment imaging (CT and/or MR), and provision of fully informed consent. Patients were excluded from the study if they had poorly controlled diabetes, a contraindication to MR or an estimated glomerular filtration rate <30 ml/min/1.73 m². This study was approved by the Research Ethics Committee (National Research Ethics Committee Yorkshire and the Humber-Bradford, 11/YH/0212) and the Administration of Radioactive Substances Advisory Committee (ARSAC), and was registered with the ISRCTN Registry (ISRCTN34165059). All patients provided informed written consent prior to study entry.

The study protocol included contrast enhanced FDG PET-CT and MR scans performed in a 5-point thermoplastic radiotherapy immobilization mask. Target delineation and treatment proceeded according to institutional clinical protocols.

Fifteen patients entered the study; one patient withdrew consent prior to imaging. Eleven of the 14 patients who underwent pre-treatment imaging according to the study protocol had a diagnosis of oropharyngeal cancer and form the basis of this report.

Image acquisition

FDG PET-CT

FDG PET-CT imaging was performed on a 64-section GE Discovery 690 PET-CT system (GE Healthcare, Amersham, UK). Baseline half-body PET acquisition and an additional dedicated head and neck acquisition in the immobilization mask (3–4 bed positions, 2 minutes per bed position) from skull vertex to carina were performed 60 minutes after intravenous injection of 400 MBq of Fluorine-18 FDG. The CT component of the head and neck acquisition was obtained after a 25 second delay following a bolus of 100 ml of iodinated contrast (Niopam 300, Bracco Ltd, High Wycombe, UK) injected at 3 ml/s, using the following settings: 120 kV, variable mA (min 10, max 600, noise index 12.2), tube rotation 0.5 s per rotation, and pitch 0.969 with a 2.5 mm slice reconstruction. The head and neck component of the FDG PET-CT scan, acquired with a 5-point thermoplastic radiotherapy immobilization mask fitted and room laser alignment to radiopaque reference markers placed on the mask, was also used for radiotherapy planning according to routine clinical protocols.

MR

MR images were acquired on a 1.5 T Siemens Magnetom Avanto system (Siemens Healthcare, Erlangen, Germany). Patients were immobilized in the same treatment position and the same mask as for FDG PET-CT imaging. Axial post-contrast T1-weighted (TR = 831 ms, TE = 8.6 ms, 105 × 2 mm thick contiguous slices, acquired voxel size = 0.9 × 0.9 × 2.0 mm) and axial fat saturated T2-weighted (TR = 4430 ms, TE = 76 ms, voxel size = 0.8 × 0.7 × 3.0 mm) sequences were acquired.

Image co-registration

To allow the spatial comparison of the FDG PET-CT, CT and MR scans, rigid image registration was undertaken using Mirada RTx v1.4 software (Mirada Medical, Oxford, UK) between the CT dataset and the T1-weighted MR dataset. FDG PET-CT scans were inherently spatially co-registered.

Gross tumour volume delineation of primary tumour

In order to simulate the clinical scenario, all outlining was performed with access to the clinical history, the findings of clinical examination, and the diagnostic imaging (CT and/or MR) performed as part of the diagnostic process prior to entry into the study. FDG PET-CT was not performed as a routine diagnostic investigation and was therefore not available to the observers.

CT and MR based GTV contours

For each patient, five observers (two radiologists and three radiation oncologists) were provided with lists of contours to be performed on study images of primary tumours (CT, MR and combined CT and MR (CT-MR)); the order in which contours were performed was systematically varied for each individual observer. To minimize any potential for recall, a minimum interval of two weeks was mandated between generating contours for an individual patient using different imaging modalities. For CT based contours, observers were blinded to the MR and PET images acquired as part of the study protocol. For MR based contours, post-contrast T1-weighted and fat saturated T2-weighted images were available and inherently co-registered, and observers were blinded to the CT and PET images acquired as part of the study protocol. For combined CT-MR contours, the post-contrast T1-weighted and fat saturated T2-weighted MR and CT were available.

FDG PET-CT GTV contours

Image analysis was undertaken on Mirada RTx v1.4 software. The maximum standardized uptake value (SUVmax) was derived by drawing a region of interest (ROI) encompassing the primary tumour. The PET GTV was defined by using an adaptive thresholding technique, known as the Schaefer algorithm [20], calculated from the mean primary tumour SUV (SUVmean) when applying a 70 % of SUVmax isocontour, the background tissue SUVmean and two scanner specific coefficients (determined from phantom studies).
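
As a rough illustration of the contrast-oriented adaptive threshold described above, the following sketch assumes the commonly cited linear form, threshold = a × SUVmean70 + b × background SUVmean, where SUVmean70 is the mean SUV inside the 70 % of SUVmax isocontour. The function names, the toy SUV values and the coefficients a and b are hypothetical placeholders; the scanner-specific coefficients used in this study were derived from phantom measurements and are not reproduced here.

```python
from statistics import mean


def suv_mean_70(roi_suvs):
    """Mean SUV of the ROI voxels at or above 70 % of SUVmax."""
    suv_max = max(roi_suvs)
    inside = [v for v in roi_suvs if v >= 0.7 * suv_max]
    return mean(inside)


def adaptive_threshold(roi_suvs, background_suv_mean, a, b):
    """Assumed linear form: a * SUVmean70 + b * background SUVmean."""
    return a * suv_mean_70(roi_suvs) + b * background_suv_mean


def segment(roi_suvs, threshold):
    """Indices of ROI voxels whose SUV is at or above the threshold."""
    return [i for i, v in enumerate(roi_suvs) if v >= threshold]


# Toy ROI (one SUV per voxel) and hypothetical coefficients, illustration only.
roi = [1.0, 2.0, 4.0, 6.0, 8.0, 10.0]
a, b = 0.5, 0.5  # placeholders, NOT the study's phantom-derived values
threshold = adaptive_threshold(roi, background_suv_mean=2.0, a=a, b=b)
pet_gtv_voxels = segment(roi, threshold)
```

In this toy case the threshold adapts to both the tumour signal and the background, so a higher background SUVmean raises the threshold and shrinks the segmented volume.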

Data analysis

Data analysis was split into GTV volume analysis and positional analysis. All statistical analysis was performed using MATLAB R2013b (MATLAB and Statistics Toolbox Release 2013b, The MathWorks, Inc., Natick, Massachusetts, United States).

Volume analysis

Variation in volume of GTV with modality

Linear mixed effects models were used to determine the significance of differences in GTV volume with modality, where modality and clinician role (radiologist or radiation oncologist) were fixed effect variables and patient and clinician were random effect variables [21]. Because multiple clinician-delineated PET GTVs were not available, inter-clinician variability could not practically be modelled when PET was included; separate models were therefore used, with the clinician and clinician role terms excluded from the PET GTV model. The distribution of the data was assessed using Q-Q plots, and cube-root (x^(1/3)) transformations were used to obtain approximately normal distributions. A p-value of < 0.02 was considered significant, to account for the multiple model comparisons required by the fixed variable comparison method in linear mixed effects models [22].
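
The effect of the cube-root transformation on right-skewed volume data can be sketched as follows. This is an illustration only: the volumes are invented, and sample skewness is used here as a simple numeric stand-in for the visual Q-Q plot inspection described above; it is not the study's analysis code.

```python
from statistics import mean, pstdev


def cube_root(volumes):
    """Apply the x^(1/3) transformation to each (positive) volume."""
    return [v ** (1.0 / 3.0) for v in volumes]


def skewness(values):
    """Population skewness: mean of the cubed standardised deviations."""
    m = mean(values)
    s = pstdev(values)
    return sum(((v - m) / s) ** 3 for v in values) / len(values)


# Hypothetical GTV volumes in cm3 with a long right tail.
volumes_cm3 = [1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 27.0, 64.0]
transformed = cube_root(volumes_cm3)

print(f"skewness before: {skewness(volumes_cm3):.2f}")
print(f"skewness after:  {skewness(transformed):.2f}")
```

A volume scales roughly with the cube of a linear dimension, which is one reason a cube-root transformation is a natural choice for volumetric data.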

Variation in volume of GTV with clinician group

The mean GTV volumes for the CT, MR and CT-MR modalities were calculated for each clinician group (radiologist and oncologist). Significance testing between clinician groups for each modality was undertaken using linear mixed effects models.

Variation in inter-observer variability with imaging modality

The variation in inter-observer delineation was measured by taking, within each modality, the standard deviation of all observers' GTV volumes for each patient and then the mean of these standard deviations over all patients. This was repeated for CT, MR and CT-MR volumes. Significance testing was then performed between modalities using an ANOVA test combined with a Tukey multiple comparison test [23].
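
The summary statistic described above can be sketched in a few lines. The volumes below are invented, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which form was used; the ANOVA and Tukey steps are not shown.

```python
from statistics import mean, stdev

# Hypothetical GTV volumes (cm3): patient -> one volume per observer.
ct_volumes = {
    "patient_1": [10.0, 12.0, 14.0, 16.0, 18.0],
    "patient_2": [5.0, 5.0, 6.0, 7.0, 7.0],
}


def inter_observer_sd(volumes_by_patient):
    """Mean over patients of the per-patient SD across observers."""
    per_patient_sd = [stdev(v) for v in volumes_by_patient.values()]
    return mean(per_patient_sd)


ct_variability = inter_observer_sd(ct_volumes)
print(f"mean inter-observer SD for CT: {ct_variability:.2f} cm3")
```

Repeating the same calculation on the MR and CT-MR volumes gives one value per modality, which can then be compared between modalities.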

Positional analysis

Six positional metrics were calculated using ImSimQA software (v3.1.5, OSL, Shrewsbury, UK): Mean distance to conformity (MDC); Centre of gravity distance (CGD); Conformity index (CI); DICE index; sensitivity index (Se. Idx); and inclusion index (Incl. Idx). The conformity index and DICE index both produce output values between 0 and 1 (using different calculation methods), where 0 represents two contours with no overlap and 1 represents two contours that are perfectly overlapping [24]. The Se. Idx and Incl. Idx calculate the overlapping volume between two contours as a percentage of the volume of one of the two contours. When used together they calculate the percentage of volume A which is within volume B and vice versa. CGD is the distance between the geometric centres of two contours [25]. MDC is the mean of the distances between contours averaged over all positions not within the overlapping contour [25].
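
To make the overlap-based metrics concrete, the sketch below computes them for two contours represented as sets of voxel coordinates. It is not the ImSimQA implementation: the conformity index is assumed here to be intersection over union, the direction assigned to the sensitivity and inclusion indices is left open because the text does not specify it, and MDC is omitted because it requires surface distances. All function names are hypothetical.

```python
from math import dist


def overlap_fraction(a, b):
    """Fraction of contour a's voxels that also lie inside contour b."""
    return len(a & b) / len(a)


def dice(a, b):
    """Dice index: twice the overlap divided by the summed volumes."""
    return 2 * len(a & b) / (len(a) + len(b))


def jaccard(a, b):
    """Intersection over union, assumed here as the conformity index."""
    return len(a & b) / len(a | b)


def centre_of_gravity(voxels):
    """Mean voxel coordinate along each of the three axes."""
    n = len(voxels)
    return tuple(sum(v[i] for v in voxels) / n for i in range(3))


def cgd(a, b, voxel_size_mm=(1.0, 1.0, 1.0)):
    """Distance in mm between the geometric centres of two contours."""
    ca = [c * s for c, s in zip(centre_of_gravity(a), voxel_size_mm)]
    cb = [c * s for c, s in zip(centre_of_gravity(b), voxel_size_mm)]
    return dist(ca, cb)


def cube(x0, y0, z0, side):
    """Set of voxel coordinates filling a cube, used as a toy contour."""
    return {
        (x, y, z)
        for x in range(x0, x0 + side)
        for y in range(y0, y0 + side)
        for z in range(z0, z0 + side)
    }


gtv_a = cube(0, 0, 0, 4)  # 64 voxels
gtv_b = cube(2, 0, 0, 4)  # same size, shifted by 2 voxels along x
print(dice(gtv_a, gtv_b), jaccard(gtv_a, gtv_b), cgd(gtv_a, gtv_b))
```

Calling overlap_fraction(gtv_a, gtv_b) and overlap_fraction(gtv_b, gtv_a) gives the two directional percentages (the proportion of volume A within volume B and vice versa) once multiplied by 100.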

Variation in inter-observer variability with imaging modality

The positional inter-observer variability for each modality was assessed by comparing all GTVs delineated using the same modality for each patient. The final positional comparison values were calculated for each metric by calculating the mean of the metric results for each patient and subsequently the overall mean result for all patients. Significance testing was then performed between modalities using an ANOVA test combined with a Tukey multiple comparison test [23].
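
The two-stage averaging described above can be sketched as follows, using the Dice index as the example metric. The assumption that every unordered pair of observers is compared is ours, and the one-dimensional toy contours and function names are hypothetical.

```python
from itertools import combinations
from statistics import mean


def dice(a, b):
    """Dice index for two contours given as sets of voxel indices."""
    return 2 * len(a & b) / (len(a) + len(b))


def patient_mean_metric(contours, metric):
    """Mean of a metric over every unordered pair of observer contours."""
    return mean(metric(a, b) for a, b in combinations(contours, 2))


def overall_mean_metric(contours_by_patient, metric):
    """Mean over patients of the per-patient mean metric."""
    return mean(
        patient_mean_metric(c, metric) for c in contours_by_patient.values()
    )


# Hypothetical 1-D contours (sets of voxel indices), three observers each.
contours_by_patient = {
    "patient_1": [set(range(0, 10)), set(range(0, 10)), set(range(5, 15))],
    "patient_2": [set(range(0, 4)), set(range(0, 4)), set(range(0, 4))],
}

overall_dice = overall_mean_metric(contours_by_patient, dice)
print(f"overall mean pairwise Dice: {overall_dice:.3f}")
```

Averaging within each patient first means that every patient contributes equally to the overall mean, regardless of how many contours were completed for that patient.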

Variation in GTV position with imaging modality

The variation in GTV position between modalities was assessed using ImSimQA by comparing, for each patient, the GTVs delineated by the same clinician on each modality with one another and with the PET GTV.

Results

Eleven patients with histologically proven OSCC entered the study. Baseline characteristics are summarised in Table 1. Diagnostic imaging included MR for all patients. The median time between FDG PET-CT and MR scans performed within the study was 7 days (range 0–12). Within the time constraints for completing contouring of the primary tumour GTV, all CT contours, 51/55 MR contours and 42/55 CT-MR GTV contours were completed; 10/11 combined CT-MR GTVs were incomplete for one radiologist. A representative example of contours delineated by each observer on CT, MR, CT-MR and by automatic segmentation of PET is shown in Fig. 1. Figure 2 provides an example of contouring by a single observer on CT, MR, CT-MR and by automatic segmentation of PET superimposed upon the CT scan.

Table 1 Patient demographics and tumour characteristics

Fig. 1 Example of inter-observer variability in contouring GTVs based on CT, MR, CT-MR and of auto-segmented contour based on PET for a patient with a T2 N2b poorly differentiated squamous cell carcinoma of the right tonsil. Contours shown are: radiation oncologist 1 red, radiation oncologist 2 yellow, radiation oncologist 3 orange, radiologist 1 green, radiologist 2 purple, PET contour blue

Fig. 2 Representative example of GTVs delineated on CT, MR, CT-MR by a single radiation oncologist, displayed on an axial CT scan, for a patient with a T1 N2b well differentiated squamous cell carcinoma of the right tonsil. CT GTV red; MR GTV yellow; CT-MR contour orange; PET contour blue

Volume analysis of GTVs

The volumes of the primary tumour contours for CT, MR, CT-MR and PET are shown for each patient in Fig. 3 and are summarised in Table 2. Table 2 illustrates the median and mean volumes of GTVs delineated on CT, MR, CT-MR and generated by automatic segmentation of the PET. Figure 4 demonstrates the volume of GTVs delineated by individual observers using CT, MR and CT-MR. Table 3 illustrates the standard deviation of the GTV volume delineations for each patient for each modality. Compared with CT GTVs, CT-MR GTVs were significantly larger (p = 0.0052). MR had a significantly smaller GTV volume standard deviation than CT (p < 0.05). Average PET GTVs were smaller than CT, MR and CT-MR volumes, a difference which was significant compared with MR and CT-MR GTVs (p = 0.003 and p < 0.001 respectively).

Fig. 3 Median volumes of GTVs delineated on CT, MR, CT-MR and PET for each patient

Table 2 Summary of volume of GTVs contoured using CT, MR, CT-MR and PET

Fig. 4 Volumes of GTVs delineated by individual observers on CT, MR and CT-MR

Table 3 The standard deviation of the GTV volume delineations undertaken by clinicians for each patient for each modality

Significant differences were found between radiologist- and oncologist-delineated GTV volumes for each individual modality: CT (radiologist 9.1 cm³ vs. oncologist 13.8 cm³, p = 0.022); MR (radiologist 9.9 cm³ vs. oncologist 14.4 cm³, p = 0.00013); CT-MR (radiologist 10.5 cm³ vs. oncologist 15.8 cm³, p = 0.12); and overall for all modalities (radiologist 9.7 cm³ vs. oncologist 14.6 cm³, p = 0.001).

Positional analysis of GTVs

The analysis of positional inter-observer variability is summarized in Table 4. Inter-observer variability was found to be significantly higher for CT compared to MR and CT-MR, with no significant differences between MR and CT-MR contours.

Table 4 Mean positional metric results for the inter-observer variability

The results of the comparison of GTV position between CT, MR, CT-MR and PET are shown in Table 5. CT, MR and CT-MR were all found to have similar, large differences in position compared to PET. A mean of 64 %, 74 % and 77 % of the PET GTV was included within the CT, MR and CT-MR GTVs respectively. A mean of 56 %, 58 % and 54 % of the CT, MR and CT-MR GTVs respectively was included within the PET GTVs. MR and CT GTVs were found to have a low level of overlap and a large variation in CGD and MDC. A mean of 57 % of the MR GTV was included within the CT GTV; conversely, a mean of 63 % of the CT GTV was included within the MR GTV. MR and CT-MR were found to have a high level of overlap and a small variation in CGD and MDC; a mean of 85 % of the CT-MR GTV was included within the MR GTV.

Table 5 Inter-modality positional GTV analysis

Discussion

There is considerable interest in improving the accuracy of tumour delineation in the era of highly conformal IMRT [10]. The current standard of CT-based delineation is particularly limited for oropharyngeal primary tumours, which are often barely visible even with contrast-enhanced CT-simulation scans [9, 19]. Multimodality imaging has the potential to improve the accuracy and reproducibility of tumour delineation.

Clinical experience suggests that oropharyngeal primary tumours are more readily identifiable on MR than CT. There was no significant difference in the volume of GTVs outlined on MR and CT. Although there was considerable inter-observer variability for CT, MR and CT-MR GTV delineation, there was significantly less variability for MR and CT-MR than for CT GTVs. Analysis of positional metrics demonstrated a low degree of volume overlap between CT and MR GTVs. MR and CT-MR GTVs showed a large degree of overlap; this is likely to reflect the clinicians’ propensity to base the CT-MR GTV contours on the MR on which the edge of the primary tumour is more readily identifiable. These data suggest that the implementation of either combined CT-MR or MR-based planning would have a considerable impact upon GTV delineation compared with CT-based planning.

These data are broadly in line with a previous study by Daisne et al. [18] who did not find a significant difference in the volume of GTVs contoured by a single observer on CT or MR in a series of 10 patients with oropharyngeal carcinoma. Consistent with our results, this series also showed significant areas of non-overlap between CT and MR defined GTVs. Another prior study by Ahmed et al. compared CT and MR-based GTVs in a series of six patients with base of tongue cancers [17]. This study also found that there was only limited overlap between CT and MR GTVs although, by contrast with our results, reported that there was no difference in inter-observer variability between CT and MR and that the primary tumour GTV was larger on MR than CT.

Interestingly our data showed that GTVs delineated on CT, MR or CT-MR were significantly smaller when contoured by radiologists compared with oncologists. Similarly, Ahmed et al. [17] reported that average GTVs delineated by a single radiologist were smaller than those contoured by oncologists. Clinical information and the findings of clinical examination remain critical to avoid geometric misses due to disease such as mucosal extension which may not be identified on imaging. Variations in this study between oncologists and radiologists emphasize the potential benefit of a multidisciplinary collaborative approach to GTV delineation, including radiation oncologists, radiologists and surgeons (who may have valuable additional input, for example based on the findings of an examination under anaesthetic).

With regard to the use of FDG PET-CT for radiotherapy planning, a key issue is the methodology used to define the edge of the functional volume of interest. Current generation PET-CT scanners have limitations including image noise, voxel sizes of 4–5 mm, partial volume effects and reconstruction uncertainties, which lead to blurring of the edge of PET-avid tumours [9]. A host of methods have been proposed for ‘contouring’ a PET-avid tumour, varying from manual visual delineation to fully automated algorithms [26, 27]. Altering the SUV scale when viewing PET images can alter the apparent tumour volume by a factor of around two [28]; manual delineation is therefore an inevitably subjective process leading to inter-observer variability [29]. Although many automated methods have been developed for segmenting PET-avid tumours [30], few have histopathological correlation. In the absence of a widely accepted method, we made a pragmatic decision to use a previously described contrast-orientated method, with coefficients derived from individual phantom data on the PET-CT scanner, which had performed favourably in comparative phantom and simulated patient studies [20, 31, 32] and in pathological correlation in other tumour sites [33]. The results from the PET delineation component of this study need to be interpreted with the unresolved difficulty regarding the optimal method of PET delineation in mind.

PET-based GTVs were significantly smaller than MR and CT-MR GTVs (Table 2), and non-significantly smaller than CT GTVs. Despite this difference in volume, there were substantial areas of the PET GTV which were not included in the CT or MR GTVs; conversely, large areas of the CT and MR GTVs were not included within the PET GTV. These findings are consistent with the series reported by Daisne et al. [18] of 10 patients with oropharyngeal cancer, in which the PET GTV was significantly smaller than CT or MR-based GTVs, with areas of mismatch between PET GTVs and CT or MR GTVs. Interestingly, for patient 6 the PET GTV volume was greater than the GTV volume for any other modality, in contrast to all other patients and the overall results of this study. This could be due to the inherent difficulty in delineating a PET GTV, even using the semi-automatic contouring algorithm, when the tumour 18F-FDG uptake lies in an area of natural 18F-FDG uptake caused by, for example, inflammation or brown fat. In such cases the PET GTV delineation can incorrectly identify physiological 18F-FDG uptake as tumour uptake, leading to false positive GTV tissue and a larger GTV than appropriate. In this case, visual review showed that the PET GTV extended further inferiorly than the other modality GTVs and lay in a region of relatively high background uptake around the tonsils.

The main limitation of this series is the absence of histological validation. Two series including nine [18] and ten [34] patients who underwent a laryngectomy/laryngopharyngectomy for laryngeal or hypopharyngeal cancer following CT, MR and FDG PET-CT imaging have provided histological validation. Both series reported that the pathological tumour was smaller than the volume defined on any individual imaging modality, but that no single imaging modality encompassed the whole pathological tumour. The inability of imaging to depict the whole tumour volume was thought likely to be due to superficial mucosal extension in that tumour site. To the best of our knowledge, no similar series with pathological correlation have been performed for oropharynx cancers. By contrast with the larynx, a resected specimen from the oropharynx would lack the cartilage structure to provide registration with imaging; in addition, oropharyngeal cancer is commonly managed non-surgically. In the absence of pathological validation, our series is descriptive without a ground truth; it is important to recognise that increasing the consistency of contours does not necessarily imply superior target volume delineation. Other limitations include the necessity for co-registration between MR and FDG PET-CT scans; since both scans were performed within the same immobilisation mask, co-registration errors would be expected to be small.

In the absence of histological validation, it is not possible to select which imaging modality is superior for target volume delineation. It is perhaps not surprising that anatomical and functional imaging techniques provide potentially complementary information. The smaller FDG-PET volume may be demonstrating the inability of the other techniques to discriminate between inactive necrotic/cystic tissue and the active cancerous tissue; however, FDG uptake is non-specific, so areas of FDG uptake beyond the CT or MR-delineated tumour volume may relate to adjacent inflammatory changes or alternatively to areas of sub-clinical tumour infiltration. It seems likely that incorporating multimodality imaging with accurate clinical examination will minimise the risk of a geographical miss. For example, PET may add to the accuracy of target delineation based on anatomical imaging by the detection of areas which are FDG-avid but sub-clinical on CT and MR. This is supported by the findings of Thiagarajan et al. [19], who reported on the impact of PET, MR and physical examination in target delineation in a series of 41 patients with oropharyngeal cancer. This study compared a reference GTV based on CT, PET, MR and physical examination; the concordance indices for both GTVs based on CT and MR or based on CT and PET were low compared with the reference GTV, implying a potential benefit for incorporating all imaging modalities. Importantly, the study highlighted the importance of clinical examination in addition to multimodality imaging for the detection of mucosal extension.

These data show the potential complementary role for multimodality imaging in target volume delineation. Clearly, additional multicentre prospective clinical studies are needed to analyse the impact of this approach on clinical outcomes. Incorporation of multimodality imaging may be more beneficial in the advanced disease setting (patients in this study all had stage III/IV disease) compared with the treatment of early disease. The impact of multimodality imaging on the balance of achieving local control whilst minimising toxicity will depend upon the approach and margins adopted in delineating the clinical target volume, as a multimodality imaging-defined GTV may be larger than that defined on CT alone. A cost-effectiveness analysis will be useful prior to widespread incorporation into routine practice.

Conclusion

In summary, this study showed that using CT, MR and PET produced significantly different GTVs which varied in volume and/or position, with no single imaging modality encompassing all potential GTV regions. These data support the increased incorporation of multimodality imaging for target volume delineation, to minimise the risk of geographical misses.