Background

In the planning and delivery of external beam radiotherapy (EBRT) for prostate cancer, several potential sources of variability have been documented. Prostate organ motion between [1–17] and during fractions [6–8] has been described by several authors. Variations in organ contouring using different imaging modalities [10–12] and variations in interobserver measures have also been reported [18, 19].

Electronic portal imaging (EPI) enables the verification of treatment fields during a course of EBRT. Several EPI protocols are currently in use across institutions; they vary mainly in the number, timing, and frequency of images acquired over a treatment course. Another potential source of variability in the treatment verification process is the type of correction: on-line and off-line protocols represent two different strategies to reduce variability [5–7]. While EPI protocols may differ among institutions, the common goal of treatment delivery is to ensure accuracy and consistency throughout the process of image verification.

Few reports in the literature have specifically addressed interobserver variability in portal image verification. In a study of 16 observers of different professional disciplines (radiation oncologists, radiation therapists and physicists) using images of various anatomical sites and a 5-point scale to assess conformity between simulator films and portal images, Bissett et al. demonstrated significant inconsistencies between observers [20]. In another series of electronic portal imaging verification in 18 prostate cancer patients, Dalen et al. reported intraclass correlation coefficients (ICC) consistent with significant agreement between radiation oncologists (ICC 0.58) and radiation therapists (ICC 0.72) [2]. Lewis et al. reported good agreement in a study of 9 observers matching a total of 17 images of pelvic radiotherapy portals [9]. However, there are few data from other institutions to corroborate these findings. Within the framework of an effective EPI protocol, the present report focuses on interobserver consistency associated with EPI registration performed by 6 trained radiation therapists on a cohort of 20 prostate cancer patients undergoing daily EPIs during the first ten fractions.

Methods

EPI protocol

The protocol for the verification of treatment fields during EBRT for prostate cancer employed at this institution consisted of amorphous silicon EPI acquisition daily during the first 3 days of treatment. For each fraction, reference images and EPIs are generated for the Anterior-Posterior (AP) and Left Lateral (Lat) beam incidences. The EPIs are then registered to a template generated from the reference images or digitally reconstructed radiographs (DRR). The registration process is based on anatomy matching of the EPI to the reference DRR. On the AP images the key structures are the superior and inferior pubic rami, the pubic symphysis and the obturator foramen. On the Lat images the key structures are the pubic symphysis, the femoral head and the acetabulum. The registration process yields 2 measures of displacement of the isocentre for each beam incidence: Superior-Inferior (SI) and Left-Right (LR) directions for the AP images, and SI and Anterior-Posterior (AP) directions for the Lat images. For the AP images, the directions of displacement are signed as follows: superior/inferior = +/-; right/left = +/-. For the Lat images: superior/inferior = +/-; anterior/posterior = +/-. Based on a review of the literature, the tolerance for displacements was set at 5 mm. Any single value of displacement greater than twice the tolerance limit (2 × 5 mm = 10 mm) leads to an off-line correction and a repeat EPI. The values of displacement calculated for the first three fractions are averaged. If this 3-day average exceeds the set tolerance of 5 mm, an off-line correction is also applied and the EPI is repeated. To examine possible time trends, EPIs were obtained daily during the first 10 fractions of each course of EBRT. Ten fractions were examined since previous series have suggested that the position in early fractions may not be as representative of the overall systematic error as that in later fractions [21].
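The two off-line correction rules described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name is hypothetical, the comparison is made on absolute values, and the 3-day average is taken over the signed displacements in a single direction; the protocol text does not specify these details.

```python
from statistics import mean

TOLERANCE_MM = 5.0   # tolerance stated in the protocol
SINGLE_FACTOR = 2    # a single value beyond 2 x tolerance triggers a correction


def needs_offline_correction(first_three_mm):
    """Apply the two stated rules to one direction's displacements (mm).

    Assumption: magnitudes are compared with the limits, and the
    3-day average is the mean of the signed values.
    """
    single_limit = SINGLE_FACTOR * TOLERANCE_MM  # 10 mm
    if any(abs(d) > single_limit for d in first_three_mm):
        return True
    return abs(mean(first_three_mm)) > TOLERANCE_MM


# Hypothetical displacements for the first three fractions, in mm.
print(needs_offline_correction([2.0, -3.0, 4.0]))   # no rule triggered
print(needs_offline_correction([11.0, 0.0, 0.0]))   # single value > 10 mm
print(needs_offline_correction([6.0, 6.0, 6.0]))    # 3-day average > 5 mm
```

Under the signed-average assumption, displacements that alternate in direction can cancel out, which is consistent with the average being an estimate of systematic rather than random error.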

Patients

Twenty consecutive patients with prostate cancer undergoing radical EBRT over at least 6 weeks were analysed for this study. The AP and Lat EPI of the first ten fractions were registered by 6 trained radiation therapists. Measures of displacements in the SI and LR directions for the AP images and SI and AP directions for the Lat images were independently recorded by the 6 observers, all blinded to each other's results. This yielded a total of 2400 registrations and 4800 values of displacement for the analysis.

Statistics

Mean displacement values and their corresponding standard deviations were calculated for the whole group and for each of the 6 observers individually. Inter-observer variation was assessed by calculating the standard deviation of the six observers' measurements within each image. The sources of variation in measurements of displacement between the observers and the images were compared using variance components analysis. Time trends were estimated using repeated measures analysis. Random and systematic deviations were subsequently calculated for each observer in accordance with previously published definitions [7].

Random error, defined as the variation between fractions during a treatment series, was determined by calculating the spread (1 SD) of differences around the corresponding mean in each patient and then calculating the average of these SDs for the whole group.

Systematic error, defined as the deviation between the planned position and the average patient position over the treatment course, was obtained by calculating the mean displacement per patient and then the SD of all patients' means.
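The two definitions above translate directly into a short calculation. The following is a minimal sketch, assuming the sample standard deviation (n - 1 denominator) is intended and that displacements for one observer and one direction are grouped by patient; the function name and the data are hypothetical.

```python
from statistics import mean, stdev


def systematic_and_random_error(displacements_by_patient):
    """Return (systematic, random) error in mm for one observer and direction.

    systematic: SD of the per-patient mean displacements.
    random:     average of the per-patient SDs.
    Assumption: sample SD (n - 1 denominator) is used throughout.
    """
    patient_means = [mean(v) for v in displacements_by_patient.values()]
    patient_sds = [stdev(v) for v in displacements_by_patient.values()]
    return stdev(patient_means), mean(patient_sds)


# Hypothetical displacements (mm) for three patients over three fractions.
example = {
    "patient_1": [1.0, 2.0, 3.0],
    "patient_2": [3.0, 4.0, 5.0],
    "patient_3": [5.0, 6.0, 7.0],
}
systematic, random_error = systematic_and_random_error(example)
print(systematic, random_error)
```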

Results

Descriptive statistics of the measurements are presented in Table 1. Errors of a larger magnitude were identified at the beginning of treatment in a few patients, which yielded maximum or minimum displacements close to or exceeding +/-10 mm.

Table 1 Mean, standard deviations and range of displacement measures for the entire study cohort

An alternative approach to evaluating consistency in measurements is to calculate the individual mean values of the 6 observers and their corresponding standard deviations (SD) for each of the 200 images and the four directions of displacement. These results are presented in Table 2. For the entire group of 6 observers, the individual means range from -1.61 mm to 1.04 mm, while the SDs range from 2.21 mm to 3.59 mm.

Table 2 Mean displacement measures by each observer for all patients and fractions

In order to assess the impact of displacement on previously set tolerance limits, we calculated the proportion of measurements within 3 mm and 5 mm from zero for each observer. Overall proportions are presented in Table 3. This assumes that the ideal displacement measurement is equal to zero. The values of 3 mm and 5 mm were selected according to the EPI guidelines currently in effect at our institution. This calculation provides an estimate of the proportion of fractions that would require a correction based on a daily on-line EPI protocol. Reducing the level of tolerance from 5 mm to 3 mm increases the number of corrections that need to be applied. Differences between observers in meeting tolerance limits were examined using the chi-square test. The proportion of measurements within +/- 3 mm and +/- 5 mm from zero varied significantly between observers for all measures except for the measures of AP images in the LR direction. While the reason for this is unclear, we note that the proportion of measurements within tolerance for these measures had the smallest range among the 6 observers (range 69.5–79.5% = 10% for APLR-3 mm and range 91–93% = 2% for APLR-5 mm, respectively). This suggests that there is less variation between observers for the APLR measurement under the conditions stated above. This may also indicate that a discrepancy in this direction and incidence is more readily visualized and agreed upon by a group of observers.
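The proportion within tolerance and the chi-square comparison between observers can be sketched as follows. This is an illustration only: the function names are hypothetical, the tolerance bounds are assumed to be inclusive, and the statistic is the ordinary Pearson chi-square for a 2 × k table (within or outside tolerance, by observer). The source does not state which software or variant was used, and the p-value lookup is left out.

```python
def proportion_within(displacements_mm, tolerance_mm):
    """Fraction of measurements with |displacement| <= tolerance (inclusive bound assumed)."""
    inside = sum(1 for d in displacements_mm if abs(d) <= tolerance_mm)
    return inside / len(displacements_mm)


def chi_square_statistic(within_counts, totals):
    """Pearson chi-square statistic and degrees of freedom for a 2 x k table.

    within_counts[i] is the number of observer i's measurements inside the
    tolerance; totals[i] is that observer's total number of measurements.
    Requires that the pooled proportion is neither 0 nor 1.
    """
    pooled = sum(within_counts) / sum(totals)
    statistic = 0.0
    for within, total in zip(within_counts, totals):
        expected_in = total * pooled
        expected_out = total * (1 - pooled)
        outside = total - within
        statistic += (within - expected_in) ** 2 / expected_in
        statistic += (outside - expected_out) ** 2 / expected_out
    return statistic, len(totals) - 1


# Hypothetical data: four displacements (mm) and counts for two observers.
print(proportion_within([1.0, -2.0, 4.0, -6.0], 3.0))
print(proportion_within([1.0, -2.0, 4.0, -6.0], 5.0))
stat, dof = chi_square_statistic([60, 40], [100, 100])
print(stat, dof)
```

The statistic would then be compared with the chi-square distribution on the returned degrees of freedom (5 for six observers).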

Table 3 Proportions of measurements within +/-3 mm and +/-5 mm from 0 and chi-square analysis of variations between observers

To further assess the level of agreement between observers, we calculated the standard deviation of the six observers' measured displacements for each image. The results are presented in Table 4. The lowest inter-observer variation was for the AP images, with a mean SD of 0.7 mm and 1.0 mm for the LR and SI directions respectively. Lateral images showed wider interobserver variation, with a mean SD of 1.7 mm and 1.4 mm for the AP and SI directions respectively.
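The per-image measure of agreement described here amounts to one SD per image, averaged over all images. A minimal sketch, assuming the sample SD is used and with hypothetical data and function name:

```python
from statistics import mean, stdev


def mean_interobserver_sd(measurements_by_image):
    """Average, over images, of the SD of the observers' displacements (mm).

    Each inner list holds the measurements of all observers for one image
    in one direction. Assumption: sample SD (n - 1 denominator).
    """
    return mean(stdev(observers) for observers in measurements_by_image)


# Hypothetical data: two images, six observers each.
images = [
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],   # perfect agreement, SD = 0
    [0.0, 2.0, 0.0, 2.0, 0.0, 2.0],   # observers split 2 mm apart
]
result = mean_interobserver_sd(images)
print(result)
```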

Table 4 Estimated variance attributed to observers and images using variance components analysis

The contribution of the observers and the images to the overall variability in measured displacement was estimated using variance components analysis. The results are presented in Table 5. The variance between images is considerably larger than that between observers for all directions of displacement. This indicates that the interobserver variability is very small and contributes little to the overall variability in the measured displacements.
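One way to separate the image and observer contributions is the classical ANOVA (method-of-moments) estimator for a crossed design in which every observer measures every image once. The sketch below uses that estimator as an assumption; the source does not state which estimation method was used, and the function name and data are hypothetical.

```python
def variance_components(table):
    """Method-of-moments variance components for an images x observers table.

    table[i][j] is the displacement (mm) measured on image i by observer j.
    Returns estimated variances for image, observer and residual effects.
    Negative estimates are truncated at zero.
    """
    n_images, n_observers = len(table), len(table[0])
    grand = sum(sum(row) for row in table) / (n_images * n_observers)
    image_means = [sum(row) / n_observers for row in table]
    observer_means = [
        sum(row[j] for row in table) / n_images for j in range(n_observers)
    ]

    ms_image = n_observers * sum((m - grand) ** 2 for m in image_means) / (n_images - 1)
    ms_observer = n_images * sum((m - grand) ** 2 for m in observer_means) / (n_observers - 1)
    ss_residual = sum(
        (table[i][j] - image_means[i] - observer_means[j] + grand) ** 2
        for i in range(n_images)
        for j in range(n_observers)
    )
    ms_residual = ss_residual / ((n_images - 1) * (n_observers - 1))

    return {
        "image": max(0.0, (ms_image - ms_residual) / n_observers),
        "observer": max(0.0, (ms_observer - ms_residual) / n_images),
        "residual": ms_residual,
    }


# Hypothetical additive data: image effects 0, 2, 4 and observer effects 0, 1.
components = variance_components([[0.0, 1.0], [2.0, 3.0], [4.0, 5.0]])
print(components)
```

In this toy table the image component is much larger than the observer component, which is the pattern described for the study data.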

Table 5 Sources of variation in measurements of displacement expressed as the variance.

The presence of time trends was examined using repeated measures analysis. Time was used as a continuous variable in the model and a t-test was used to assess statistical significance. The results are presented in Table 6. Time trends were observed for the APLR and Lat SI measurements (p < 0.001 and p = 0.003 respectively). The magnitude of these trends was however small (0.09 mm/fraction for APLR and -0.067 mm/fraction for Lat SI).
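A trend expressed in mm/fraction can be approximated with a simpler two-stage summary: fit a least-squares slope of displacement against fraction number for each patient, then average the slopes. This is a swapped-in simplification for illustration, not the repeated measures model used in the study; the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def ols_slope(displacements_mm):
    """Least-squares slope (mm/fraction) against fraction numbers 1..n."""
    fractions = list(range(1, len(displacements_mm) + 1))
    x_bar, y_bar = mean(fractions), mean(displacements_mm)
    sxx = sum((x - x_bar) ** 2 for x in fractions)
    sxy = sum(
        (x - x_bar) * (y - y_bar) for x, y in zip(fractions, displacements_mm)
    )
    return sxy / sxx


def mean_trend(series_by_patient):
    """Mean per-patient slope and its standard error across patients."""
    slopes = [ols_slope(series) for series in series_by_patient]
    return mean(slopes), stdev(slopes) / sqrt(len(slopes))


# Hypothetical displacement series (mm) for two patients over three fractions.
trend, standard_error = mean_trend([[0.0, 0.1, 0.2], [0.0, 0.2, 0.4]])
print(trend, standard_error)
```

Dividing the mean slope by its standard error gives a t statistic for the null hypothesis of no trend, with the number of patients minus one as degrees of freedom.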

Table 6 Estimated time trends and their significance from repeated measures analysis

Table 7 presents the systematic and random errors in displacements calculated for each observer for the four possible displacements. Systematic errors were the largest in the Lat AP measurements, while random errors were the largest in the AP SI measurements.

Table 7 Systematic and random errors of each observer for each measurement

Discussion

Several protocols of EPI verification have been described in the literature [5–7].

The current study specifically assesses interobserver consistency in the EPI registration within an institutionally-defined EPI protocol.

In this protocol, the EPI registration is performed by trained radiation therapists who are responsible for assessing treatment accuracy. The analysis was intended to verify that consistency can be achieved among the individuals independently performing the measurements.

There are few studies available in the literature dealing with interobserver variation in EPI registration. In a study by Dalen et al. investigating the concordance of approval between groups of radiation oncologists and radiation therapists, no statistically significant difference between the two groups was demonstrated [2]. In that study, results were analyzed using intraclass correlation coefficients (ICC). This method is often used when several observers are measuring a common parameter. A high ICC, however, does not imply agreement on all measurements. Hence, this method of comparing observers should be weighed against the goal of the comparison itself or, in this instance, the accepted variation or tolerance between measurements. For example, if observer 1 measures displacements of about 1 mm on 10 consecutive fractions while observer 2 consistently measures displacements 3 mm larger (about 4 mm) on the same 10 fractions, a correlation coefficient of 1 will be obtained. Yet the measurements are different: if the tolerance is set at 3 mm, the measurements by observer 1 would be considered within acceptable limits, while those of observer 2 would require corrections for exceeding the tolerance limit.
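The gap between correlation and agreement can be made concrete with a small numerical sketch. The values below are hypothetical, and the coefficient computed is the ordinary Pearson correlation, chosen because it matches the two-observer example in the text; it is not the ICC calculation of the cited study.

```python
from math import sqrt
from statistics import mean

TOLERANCE_MM = 3.0  # tolerance used in the example


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical displacements (mm) over 10 fractions: observer 2 reads
# every image exactly 3 mm larger than observer 1.
observer_1 = [0.8, 1.0, 1.2, 0.9, 1.1, 1.0, 0.7, 1.3, 1.0, 1.0]
observer_2 = [value + 3.0 for value in observer_1]

r = pearson_r(observer_1, observer_2)
all_within_1 = all(abs(v) <= TOLERANCE_MM for v in observer_1)
all_within_2 = all(abs(v) <= TOLERANCE_MM for v in observer_2)
print(r, all_within_1, all_within_2)
```

The correlation is perfect to within floating-point rounding, yet every measurement by observer 1 falls inside the 3 mm tolerance and every measurement by observer 2 falls outside it.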

Lewis et al. [9] investigated variability among 9 observers in assessing patient movement during pelvic EBRT. These authors demonstrated that interobserver variability may be less than 1 mm. Similarly, our analysis showed that an observer effect was present for only 1 of the 4 measurements, and a mean difference of only 1 mm was noted between the 2 observers who differed the most in their measurements.

Our center has recently moved toward a fiducial marker protocol for EPI registration in patients receiving EBRT for prostate cancer. The use of gold fiducial markers has been shown to be a feasible and effective method of tracking prostate motion during treatment [3–17]. Nederveen et al. showed that the application of a marker-based verification system can reduce systematic errors when compared to the use of bony anatomy alone [11]. In another study by Ullman et al., high intra- and inter-radiation therapist reproducibility was demonstrated in daily verification and correction of isocenter positions relative to fiducial markers. In that study, using a 5 mm threshold, only 0.5% of treatments required shifts due to intra- or inter-observer error [17]. Analysis of our institution's data using fiducial markers is ongoing to potentially corroborate these results. Although the use of fiducial markers implanted in the prostate is increasingly adopted as a standard in some centres, a large proportion of centres worldwide continue to rely on bony landmarks in image verification for prostate cancer treatment. In this context, we believe that information regarding the interobserver variability of EPI verification using bony anatomy still provides an important measure of quality assurance. Furthermore, the accuracy of bony anatomy matching remains an important factor since pelvic treatment continues to rely on bony anatomy rather than prostate position [12].

Conclusion

This study demonstrated significant consistency among trained radiation therapists in EPI registration for treatment verification of radical prostate cancer EBRT. No significant systematic observer effect was identified, and the time trends observed were small in magnitude. These findings may serve as measures of quality assurance of the institutional verification protocol.