Background

Treatment of head and neck cancers using Intensity Modulated Radiation Therapy (IMRT) or Volumetric Modulated Arc Therapy (VMAT) is a promising technique due to its ability to conform high doses to irregularly shaped volumes and to steer dose away from multiple critical normal organs. However, these more demanding modern treatment techniques require better modeling of treatment beams, and more sophisticated modeling in the presence of inhomogeneities, in order to guarantee accuracy in the calculated dose distribution.

Advanced (‘type b’) dose calculation algorithms, such as the Anisotropic Analytical Algorithm (AAA), are now routinely available in commercial treatment planning systems. By accounting for lateral electron transport, they show improved accuracy compared with the previous pencil beam (‘type a’) algorithms, but some errors still persist. The convolution-superposition algorithm, the AAA and the collapsed cone convolution algorithm (all type b algorithms) have been shown to significantly overestimate the dose near air/tissue interfaces [1,2,3,4].

The nasopharyngeal region is surrounded by a considerable amount of bony structures and air cavities, so the limitations of the algorithms mentioned above may affect the reliability of the dose distribution calculated for nasopharyngeal carcinomas.

The Acuros XB (AXB) algorithm, recently introduced in the Eclipse treatment planning system (Varian Medical Systems, Palo Alto, USA) [5], accounts for the effects of heterogeneities in patient dose calculation by explicitly solving the linear Boltzmann transport equation (LBTE), which describes the macroscopic behavior of radiation particles as they travel through and interact with matter. Some recent investigations have shown that AXB is able to achieve accuracy comparable to the gold standard of Monte Carlo calculations in heterogeneous media [6,7,8].

Previous studies quantified the difference between the use of AXB vs AAA for calculating dose for breast, lung and nasopharyngeal cancer treatments.

For breast cancer treatments, Fogliata et al. [9] analysed two breast structures with densities comparable to muscle and to adipose tissue. For muscle-like tissue (the lobular breast), the average dose difference between AXB and AAA was 1.6%, with AAA predicting the higher dose, while the difference for adipose tissue was negligible.

For non-small-cell lung cancer treatments, Fogliata et al. [10] investigated the clinical impact of AXB. The planning target dose difference was stratified between the target in soft tissue, where the mean dose was lower for AXB by 0.4 to 1.7%, and the target in lung tissue, where the mean dose was higher by 0.2 to 1.2% for 6 MV and lower by up to 2.0% for 15 MV.

For nasopharyngeal carcinoma treatments, Kan et al. [11] showed that, when AXB was used instead of AAA, the averaged mean dose to the PTV was up to 1.2% lower and the averaged minimum dose to the PTV in bone was 4% lower, whereas it was 1.5% lower for the PTV in tissue.

The radiobiological impact of AXB compared with AAA in treatment planning has also been investigated. For lung cancer treatments, the impact of the dose distribution differences on the normal tissue complication probability (NTCP) of the lungs and the heart was reported [12, 13]. For whole breast cancer treatments, Petillion et al. [14] showed that the more advanced algorithms predicted a significantly lower tumor control probability (TCP) and a significantly lower NTCP for moderate breast fibrosis; the differences varied between 1 and 2.1% for TCP and between 2.9 and 5.5% for NTCP. In the study of Padmanaban et al. [15], AXB was found to significantly alter the TCP for the treatment of oesophageal cancer compared with AAA.

Studies on the radiobiological impact of recalculating the dose distribution with AXB instead of AAA for nasopharyngeal cancer (NPC) treatments are lacking. The question is relevant because the target volumes include a considerable amount of air cavities and bony structures. We therefore investigated the radiobiological impact (on both the TCP and the NTCP) in NPC patients treated with VMAT.

Methods

Patient data, treatment planning and delivery technique

Twenty-six clinical treatment plans of NPC patients with stages I through IV were reviewed for this study.

The target volume of each patient was defined by the oncologist in charge using 1.25 mm thick axial CT images. The gross tumor volume (GTV) included all known gross disease as determined by imaging and clinical findings. A margin of 1.0 cm was added to the GTV to obtain the CTV; the CTV was then expanded symmetrically by 0.3 cm in all directions to account for patient setup and motion within the thermoplastic mask.

The prescribed doses were 69.96 Gy to the high-risk PTV (PTV1), 59.40 Gy to the intermediate-risk PTV (PTV2) and 54.45 Gy to the low-risk PTV (PTV3), delivered with a simultaneous integrated boost in 33 fractions. The patients were irradiated with RapidArc (RA) VMAT treatments consisting of two complete arcs with collimator angles of 10° and 350°, respectively, plus one complete arc with a collimator angle of 0°. All plans were generated using a 6 MV beam and modulated with a 120-leaf multileaf collimator on a linear accelerator (TrueBeam, Varian Medical Systems, Palo Alto, USA).

The treatment plans were developed using the Eclipse 15.5 treatment planning system (TPS); the dose distributions of the clinical treatment plans, initially calculated with the AAA algorithm, were recalculated with AXB using the same number of monitor units provided by AAA. Dose-to-medium calculation was selected for Acuros XB, accounting for the elemental composition of specific anatomical regions as derived from the CT dataset. Tissue segmentation was performed automatically, based on density ranges derived from the HU values read from the CT dataset of each patient. For each tissue, the specific chemical composition was based on ICRP Report 23 [16].

By visual inspection of the isodose distribution and DVHs, a treatment plan was deemed satisfactory if certain normal tissue dose criteria were met and the isodose lines indicated “good” tumor coverage. In general, the dose heterogeneity was kept within +7% and −5% of the prescribed dose, in accordance with ICRU Report 62 [17].

Data were tested for normality with the Shapiro-Wilk test and different datasets were compared with the Wilcoxon Signed Rank test. A p value < 0.05 was considered the threshold for statistical significance.

For the validation of both algorithms implemented in the TPS, the tests, the analysis and the acceptability criteria were in large part based on AAPM Report 55 [18]; other documents, such as the technical report by the IAEA [19], were also consulted. For AAA and AXB, the outcomes of some tests were comparable to those reported by Van Esch et al. [20] and Fogliata et al. [21], respectively.

NTCP and TCP analysis

The NTCP was evaluated by applying different radiobiological models according to the analyzed endpoints. To take dose fractionation into account, dose-volume histograms (DVHs) were corrected to the 2 Gy/fraction equivalent dose (LQED2) [22], assuming an α/β value of 3 Gy.
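The fractionation correction can be illustrated with a short sketch. It assumes the standard linear-quadratic conversion, LQED2 = D (d + α/β) / (2 + α/β), where D is the total dose of a DVH bin and d = D/n is the corresponding dose per fraction over n fractions. The function names are hypothetical, and the code is not taken from the software used in this study.

```python
def lqed2(total_dose_gy, n_fractions, alpha_beta_gy=3.0):
    """Convert a total physical dose to its 2 Gy/fraction equivalent (LQ model)."""
    dose_per_fraction = total_dose_gy / n_fractions
    return total_dose_gy * (dose_per_fraction + alpha_beta_gy) / (2.0 + alpha_beta_gy)


def correct_dvh(dvh_bins, n_fractions, alpha_beta_gy=3.0):
    """Apply the correction bin by bin to a differential DVH.

    dvh_bins: list of (total_dose_gy, volume) pairs; volumes are left unchanged.
    """
    return [(lqed2(dose, n_fractions, alpha_beta_gy), volume)
            for dose, volume in dvh_bins]


if __name__ == "__main__":
    # A bin at the PTV1 prescription level: 69.96 Gy delivered in 33 fractions.
    print(round(lqed2(69.96, 33), 2))
```

With α/β = 3 Gy, a bin receiving 69.96 Gy in 33 fractions (2.12 Gy per fraction) corresponds to roughly 71.6 Gy in 2 Gy fractions, while a bin receiving exactly 2 Gy per fraction is unchanged.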

To quantify the risk of xerostomia from irradiation of the parotid glands, of developing grade ≥ 2 laryngeal edema, of mandible necrosis and of myelopathy, the NTCP was calculated using the Lyman-Kutcher-Burman (LKB) model [23,24,25] (details on the model are given in the Appendix). The applied parameters are listed in Table 1.
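As an illustration of the LKB model, the following sketch uses its commonly quoted form: the DVH is reduced to a generalised equivalent uniform dose, gEUD = (Σ v_i D_i^(1/n))^n, and the NTCP is the standard normal cumulative distribution evaluated at t = (gEUD − TD50) / (m · TD50). The function names are hypothetical, and the parameter values in the usage example are placeholders, not the values of Table 1.

```python
import math


def geud(dvh_bins, n):
    """Generalised EUD of a differential DVH given as (dose_gy, volume) pairs."""
    total_volume = sum(volume for _, volume in dvh_bins)
    return sum((volume / total_volume) * dose ** (1.0 / n)
               for dose, volume in dvh_bins) ** n


def ntcp_lkb(dvh_bins, td50_gy, m, n):
    """LKB NTCP: standard normal CDF of t = (gEUD - TD50) / (m * TD50)."""
    t = (geud(dvh_bins, n) - td50_gy) / (m * td50_gy)
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


if __name__ == "__main__":
    # Placeholder DVH and parameters, for illustration only.
    dvh = [(20.0, 0.3), (35.0, 0.4), (50.0, 0.3)]
    print(ntcp_lkb(dvh, td50_gy=40.0, m=0.4, n=1.0))
```

With n = 1 the gEUD reduces to the mean dose, which suits parallel organs; small values of n weight the highest doses, which suits serial organs.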

Table 1 Summary of NTCP modeling studies (SWALM6: physician-rated swallowing dysfunction 6 months after (CH) RT)

To calculate the NTCP and the TCP, the DVHs were imported to Biosuite (Clatterbridge Cancer Center, Bebington, Wirral, UK) [33].

The following equation [34]:

$$ NTCP={\left(1+{e}^{-S}\right)}^{-1} $$
(1)

was used to calculate the risk of radiation-induced hypothyroidism.

The supraglottic larynx and the superior pharyngeal constrictor muscle (PCM) were also contoured, except in three patients for whom the surgical intervention was so invasive that these contours could not be delineated. The NTCP for physician-rated swallowing dysfunction 6 months after (CH) RT (SWALM6) (primary endpoint) and for the secondary endpoint, swallowing dysfunction for solid food, was calculated using Eq. (1).

The values of the S parameter are reported in Table 1.
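A minimal sketch of Eq. (1) follows. It assumes that S is a linear predictor built from an intercept and weighted covariates (for example, mean doses to the relevant structures), as is usual for logistic NTCP models; the coefficients in the usage example are placeholders and not the values of Table 1, and the function names are hypothetical.

```python
import math


def linear_predictor(intercept, coefficients, covariates):
    """S = intercept + sum(coefficient_i * covariate_i)."""
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))


def ntcp_logistic(s):
    """Eq. (1): NTCP = 1 / (1 + exp(-S)), written to avoid overflow for large |S|."""
    if s >= 0.0:
        return 1.0 / (1.0 + math.exp(-s))
    e = math.exp(s)
    return e / (1.0 + e)


if __name__ == "__main__":
    # Placeholder coefficients and mean doses (Gy), for illustration only.
    s = linear_predictor(-6.0, [0.05, 0.06], [60.0, 45.0])
    print(ntcp_logistic(s))
```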

Using the LQ model, the TCP was calculated from the DVHs of PTV1. The radiobiological parameters used in the model were derived from the study by Lee et al. [35]: the values of α and α/β were taken as 0.33 Gy⁻¹ and 10 Gy, respectively; a clonogenic cell density of 10⁷ cells/cm³ was assumed [36].
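The role of these parameters can be sketched with a simple Poisson TCP based on LQ cell survival. This is an assumption about the general form of such a model, not a description of the Biosuite implementation: it ignores repopulation and any inter-patient spread in radiosensitivity, so it will not reproduce the TCP values reported in this study. The function name is hypothetical.

```python
import math


def tcp_poisson_lq(dvh_bins, n_fractions, alpha=0.33, alpha_beta=10.0,
                   cell_density=1.0e7):
    """Poisson TCP from a differential DVH of (total_dose_gy, volume_cm3) pairs.

    alpha is in 1/Gy, alpha_beta in Gy and cell_density in clonogens per cm^3.
    """
    expected_survivors = 0.0
    for total_dose, volume_cm3 in dvh_bins:
        dose_per_fraction = total_dose / n_fractions
        # LQ surviving fraction after n fractions of size d: exp(-alpha*D*(1 + d/(alpha/beta)))
        surviving_fraction = math.exp(
            -alpha * total_dose * (1.0 + dose_per_fraction / alpha_beta))
        expected_survivors += cell_density * volume_cm3 * surviving_fraction
    # Probability that no clonogen survives, assuming Poisson statistics.
    return math.exp(-expected_survivors)


if __name__ == "__main__":
    # Placeholder DVH: 100 cm^3 split between two dose levels, 33 fractions.
    print(tcp_poisson_lq([(55.0, 10.0), (60.0, 90.0)], 33))
```

Because the expected number of surviving clonogens is summed over bins, a small cold sub-volume can dominate the result, which is one reason why a reduction in D95% would be expected to translate into a lower TCP.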

Dose analysis

For PTV1, PTV2 and PTV3, we evaluated D95% and D2%, the dose levels on the DVH received by at least 95% and 2% of the PTV volume, respectively; they were used as surrogates for the minimum and maximum dose, respectively. The mean (physical) dose to each PTV was also recorded.

The mean dose was assessed for all OARs; for the spinal cord and the mandible, D2% was also considered because their structure is predominantly serial.
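The dose indices defined above can be extracted as in the following sketch. It assumes the structure is represented by a list of doses to equal-volume voxels, which is a simplification of how a TPS stores a DVH; the function names are hypothetical.

```python
import math


def dose_at_volume(voxel_doses_gy, volume_percent):
    """D_x%: the highest dose received by at least x% of the structure volume."""
    if not voxel_doses_gy:
        raise ValueError("the structure contains no voxels")
    ranked = sorted(voxel_doses_gy, reverse=True)
    # Number of hottest voxels needed to cover the requested volume fraction.
    k = max(1, math.ceil(volume_percent * len(ranked) / 100.0))
    return ranked[k - 1]


def mean_dose(voxel_doses_gy):
    """Arithmetic mean dose over equal-volume voxels."""
    return sum(voxel_doses_gy) / len(voxel_doses_gy)


if __name__ == "__main__":
    # Placeholder structure: 100 voxels with doses of 1, 2, ..., 100 Gy.
    doses = [float(d) for d in range(1, 101)]
    print(dose_at_volume(doses, 95), dose_at_volume(doses, 2), mean_dose(doses))
```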

Results

The results of the comparison of the treatment plans as calculated by the two algorithms, AAA and AXB, are summarized in Tables 2 and 3. A comparison of the total physical dose DVHs of PTV1, PTV2, PTV3 and the OARs for a typical patient plan calculated using the two dose algorithms is shown in Fig. 1.

Table 2 Comparison of dose to PTVs calculated using AAA and AXB for all patients
Table 3 Median and range of Dmean and D2% to OAR estimated by AAA and AXB over all patients
Fig. 1 Example of a comparative DVH for an NPC plan. The curves calculated by the AAA algorithm are depicted by solid lines and those calculated by AXB by dotted lines

Hereafter, the NTCP values calculated with the AAA and AXB algorithms are referred to as NTCPAAA and NTCPAXB, respectively; NTCP values less than 0.1% are assumed to be zero.

Dose to PTV and TCP

The recalculated AXB plans showed lower D95%, D2% and Dmean values than the AAA plans (Table 2).

When AXB was used, the median D95%, D2% and Dmean of PTV1 were reduced by 1.5% (range: 0.1, 4.0%; p < 0.001), 0.8% (range: 0.3, 1.8%; p < 0.001) and 1.1% (range: 0.1, 1.4%; p < 0.001), respectively. For PTV2 and PTV3, the results for D2% and Dmean were similar to those for PTV1, while the difference in D95% was not statistically significant. The largest reduction in D95% was observed in PTV1, which generally encompassed a larger proportion of bony structures, such as the mandible, cervical vertebrae and skull base.

The poorer coverage of PTV1 was reflected in the TCP, which was significantly lower when AXB was used: the median value was 81.55% (range: 74.90, 88.60%) for AXB and 84.10% (range: 77.70, 89.90%) for AAA (p < 0.001) (Fig. 2). Figure 3 shows the percentage TCP difference between the AAA and AXB plans (ΔTCP%) versus the percentage D95% difference between the AAA and AXB plans (ΔD95%%). It clearly shows that ΔTCP% increases with ΔD95%%. The percentage TCP difference can be as large as 5.3% in the case with a 4.0% difference in D95%.

Fig. 2 Comparison of TCP for PTV1 computed with the AAA (abscissa) and the AXB (ordinate) algorithm. Each symbol represents data of an individual patient. The dotted line indicates the line of identity

Fig. 3 Δ(TCP)% (AAA vs AXB) versus Δ(D95)% (AAA vs AXB) regarding PTV1

Dose to OARs and NTCP

The maximum percentage difference in the Dmean of the OARs, averaged over the 26 patients, was 3.4% for the mandible; the minimum percentage difference was 0.9% for the PCM. The difference between the two algorithms in terms of Dmean to the OARs was statistically significant for all the structures.

The percentage differences in D2% for the mandible and the spinal cord were 3.1% and 1.9%, respectively.

Interestingly, the Eisbruch et al. [26] parameters predicted a much higher NTCP for the risk of a decrease in salivary flow to 25% of the pre-treatment flow at 1 year post treatment than the Roesink et al. [27] parameters, which considered the same endpoint (see Table 1). This is because the Eisbruch et al. parameters describe a much steeper slope of the response curve than the other parameter set, which results in a more dose-sensitive NTCP prediction. The percentage difference between the median NTCPAXB and NTCPAAA values was −9.3% when the Eisbruch et al. parameters were applied and −5.1% when the Roesink et al. parameters were applied.

The risk of developing mandible necrosis was found to be much higher when AAA was used: the median NTCP was 6.5% (range: 1.8, 31.8%) vs 2.8% (range: 0.5, 17.7%) when AXB was used, a percentage difference of 56.6%.

Regarding the larynx, the use of AAA resulted in a median Dmean of 43.5 Gy (range: 33.0, 63.2 Gy) vs 42.7 Gy (range: 32.1, 62.2 Gy) for AXB. The median NTCPAXB for larynx edema of Grade ≥ 2 was significantly lower than NTCPAAA: 19.2% (range: 2.4, 72.6%) vs 21.8% (range: 3, 75.2%); the percentage difference was 12.2%.

The percentage differences between AXB and AAA for the median thyroid gland Dmean and the median NTCP for developing hypothyroidism were −1.9% and −1.7%, respectively; the differences were statistically significant.

Dmean to the superior pharyngeal constrictor muscle (PCM) and the supraglottic larynx was recorded for the plans calculated with both AXB and AAA. Moderate (though statistically significant) percentage differences between AXB and AAA were seen for the median values: −0.94% and −1.9% for the PCM and the supraglottic larynx, respectively. For SWALM6, the median NTCPAXB was 31.7% (range: 20.4, 54.2%) vs 33.1% (range: 21.5, 55.7%) for NTCPAAA, a percentage difference of −4.2%; the median of the percentage differences between NTCP values, Δ(NTCP)%, across the whole patient population was 4.1% (range: 2.9, 7.4%).

For the secondary endpoint, the median NTCPAXB was 28.1% (range: 11.6, 58.4%) vs 29.2% (range: 12.4, 59.9%) for NTCPAAA and the median of the percentage differences between NTCP values, Δ(NTCP)%, across the whole patient population was 4.5% (range: 2.5, 6.9%).
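Two different summary statistics appear in these comparisons: the percentage difference between the two median values, and the median of the per-patient percentage differences. The sketch below shows how they differ. It assumes the sign convention 100 · (AXB − AAA) / AAA, which reproduces the −4.2% quoted for the SWALM6 medians; the per-patient values in the usage example are invented for illustration, and the function names are hypothetical.

```python
from statistics import median


def percentage_difference(axb, aaa):
    """Relative difference of an AXB value with respect to the AAA value, in percent."""
    return 100.0 * (axb - aaa) / aaa


def difference_of_medians(axb_values, aaa_values):
    """Percentage difference between the two population medians."""
    return percentage_difference(median(axb_values), median(aaa_values))


def median_of_differences(axb_values, aaa_values):
    """Median of the paired, per-patient percentage differences."""
    return median(percentage_difference(x, a)
                  for x, a in zip(axb_values, aaa_values))


if __name__ == "__main__":
    # Reported SWALM6 medians: 31.7% (AXB) vs 33.1% (AAA).
    print(round(percentage_difference(31.7, 33.1), 1))
    # Invented per-patient NTCP values (%), for illustration only.
    aaa = [20.0, 30.0, 50.0]
    axb = [19.0, 29.4, 47.0]
    print(difference_of_medians(axb, aaa), median_of_differences(axb, aaa))
```

The two statistics need not agree, because the patient with the median NTCP is not necessarily the patient with the median difference.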

The incidence of myelopathy predicted by the available parameter set was zero; this is consistent with the fact that all the plans respected the maximum dose constraint for the spinal cord, which remained below 46 Gy.

Discussion

Previous studies investigating the use of AXB in heterogeneous media suggest that this algorithm is more accurate than the widely used AAA. Consequently, comparing the AXB and AAA dose distributions in terms of the analysed dose indices provides an indication of the difference between the dose predicted by AAA and the dose considered a better approximation of the true delivered dose. In our study, we showed that the photon dose calculation algorithm used in NPC treatments has a radiobiological and, therefore, clinical impact. This study quantifies the radiobiological impact of the differences between the physical dose distributions in NPC treatments in terms of NTCP and TCP.

The differences in dose to the target predicted by the two algorithms are of a magnitude such that the choice of algorithm has clinical impact: the TCP percentage difference can be up to 6.8%. Normalizing the AXB treatment plans to meet the protocol dose prescription of 69.96 Gy would require an increase in MU of around 1.7% (range 1.0 to 2.2%). Delivering more radiation output to produce the same coverage as AAA involves a corresponding increase in the dose delivered to the surrounding OARs.

This is in line with the results reported by Kan et al. [11], mentioned in the Background section.

Figure 4 shows a box plot of the percentage ΔNTCP (AAA vs AXB) for the different OARs. The NTCP for developing mandible necrosis shows the largest median ΔNTCP (56.6%), followed by the NTCP for larynx edema of Grade ≥ 2, with a percentage ΔNTCP of 12.2%. For the other OARs, the percentage ΔNTCP is lower than 5%, except when the Eisbruch et al. parameters are used, as these discriminate better between the dose calculation algorithms.

Fig. 4 Box plot of Δ(NTCP)% (AAA vs AXB) for the different endpoints. The bold line represents the median of the percentage difference and the black bars represent the range of the data. (R and E refer to Roesink et al. and Eisbruch et al. parameters, respectively)

AXB calculates dose taking the elemental composition into account. Unlike most water-like tissues in the body, such as muscle and lung, compact bone (such as the mandible) has an elemental composition very different from that of water. Siebers et al. [37] reported that neglecting the elemental composition in dose calculations had a negligible effect in lighter tissues but not in compact bone. Consistently, we found the largest differences in PTVs and OARs containing bone.

The larynx is a structure surrounding air. AXB shows better agreement with Monte Carlo calculations [38] in regions of re-buildup in soft tissue after the beam has passed through low-density material such as air, and therefore predicts lower doses than AAA beyond the air/tissue interface along the central axis. This dose reduction in air and near air/tissue interfaces appears to be responsible for the higher ΔNTCP for larynx edema of Grade ≥ 2 compared with the remaining ΔNTCPs.

The comparison of the two algorithms in the present study is in accordance with the literature: in NPC treatments, the differences are of minor clinical significance in some situations, such as when the PTVs and OARs do not involve air or bone. Adopting AXB in clinical treatment planning practice requires a full understanding of its effects and potential consequences, so that dose-effect relationships and the parameters used in treatment planning decisions can be re-evaluated.

Similarly, a predictive model has to be introduced into clinical practice with prudence, since it is necessary to assess whether it is based on calculations and treatments similar to those for which the NTCP has to be calculated. There are large uncertainties in the biological models and their associated parameters; the more accurate dose distribution given by AXB would help in better understanding treatment outcomes. As more clinical data are collected, they may help in formulating models to predict radiobiological response and result in more accurate predictions of TCP and NTCP.

The published TCP/NTCP model parameters that we used were obtained from studies that used treatment techniques and dose algorithms different from those of the present study. Nevertheless, the use of these TCP/NTCP model parameters is appropriate because our study performs a relative comparison between two dose calculation algorithms rather than studying the absolute expected values.

The results of this study show that, for NPC treatments, the differences between the dose distributions of the two tested algorithms yield statistically significant differences in the NTCP and TCP values.

Conclusion

In this study, we qualitatively investigated the possible clinical consequences of using AAA versus AXB (keeping the same number of monitor units provided by AAA and clinically delivered to each patient) for NPC treatments by comparing NTCP and TCP values. The NTCPAXB and TCPAXB values were lower than the corresponding NTCPAAA and TCPAAA values; the difference could be clinically significant. The availability of the AXB algorithm could improve patient dose estimation, increasing the data consistency of clinical trials. This could in turn improve radiobiological models and yield more robust radiobiological parameters.