Background

Whole brain radiation (WBRT) is a common treatment for patients with brain metastases [1,2,3,4,5,6]. Many patients who receive WBRT have a poor prognosis, and it is important to minimize both acute and late toxicities. Commonly-acknowledged consequences of WBRT include neurocognitive effects, fatigue, and hair loss [7,8,9].

More-recently, in a prospective study, we reported that patients receiving standard WBRT (without prospective delineation of the parotid or lacrimal glands) resulted in clinically-significant acute xerostomia and dry eye in roughly 35% and 25% of cases, respectively with toxicity rates associated with glandular doses [1, 10]. In those reports, a summary of the reported toxicities was provided together with a statistical analysis of their correlation against given dose volume metrics. However, as there are no prior studies reporting NTCP model parameter values for these acute toxicities, we performed additional dosimetric analyses including normal tissue complication probability (NTCP) modelling. Different groups in research and clinical domain are more familiar with and prone to use a given NTCP model. For this reason, in this study, parameter values were derived for two popular NTCP models in order to facilitate such groups incorporate NTCP metrics for the examined toxicities in their analyses. Also, dose thresholds were identified in this study, which seeks to generate risk assessment tools to aid in clinical decision making.

Methods

Patient selection, treatment and OAR delineation

Patients were treated with WBRT on a prospective, IRB-approved study (ClinicalTrials.gov #NCT02682199), with details of study procedures previously reported [1, 10]. In brief, patients were eligible if they were planned to receive WBRT for any indication using 3D planning to a total dose of 25–40 Gy in 10–20 fractions, at 2–3 Gy per fraction. Patients provided written consent and were enrolled at one academic center and two affiliated community hospitals.

All patients were treated in the supine position using customized head-cast immobilization. Patients were treated using 3D planning without prospective delineation of the parotid glands or lacrimal glands, though some providers delineated the globe and/or lens for avoidance. Using digitally-reconstructed radiographs, WBRT fields were designed based on bony anatomy and covered the entire skull and extended to the inferior border of the C1 or C2 vertebrae [1].

The bilateral parotid and lacrimal glands were retrospectively delineated without knowledge of clinical outcomes. Parotids were delineated using planning CT alone, whereas MRI fusion was used for lacrimal gland delineation. The dose volume histograms (DVH) were calculated for the bilateral parotid and lacrimal glands. DVH-based metrics were correlated with patient reported outcomes and definitions of toxicity as described below.

Definition of toxicity for xerostomia and dry eye

Patients completed xerostomia and dry eye questionnaires at baseline pre-WBRT, at the conclusion of WBRT, and at 1, 3, and 6 months post-WBRT. All toxicity analyses refer to outcomes at 1 month (1M) post-WBRT, which was the prospectively-specified primary time point. To allow NTCP modelling, clinically significant toxicity was defined as a binary variable using threshold worsening of patient-reported symptom scores, as described below.

For xerostomia, patients completed the validated University of Michigan Xerostomia Questionnaire (xerostomia score), calculated using eight questions each scored from 0 to 10 and linearly converted to a 100-point scale, with higher scores representing worse symptoms [11,12,13]. Patients also completed a 2-question xerostomia bother score that assessed the degree to which xerostomia bothered patients while eating and while not eating [11]. Each bother question was answered on a 4-point Likert scale (0: Bothered not at all, 1: Bothered a little bit, 2: Bothered quite a bit, and 3: Bothered very much) adapted from the EORTC QLQ-H&N35 head and neck cancer-specific QOL questionnaire [11, 14]. The higher score of the two bother questions was considered the overall bother score at that time point. Two definitions of clinically significant toxicity were used for the xerostomia analysis: (1) ≥ 20 point increase in xerostomia score and (2) ≥ 2 point increase in xerostomia bother score [11]. Regarding those two scores, as it has been reported in the literature and has been confirmed by our data, lower levels of increase from baseline were characterized by large variability. However, for the levels that we used in this study the responses were more stable over time constituting a clear worsening of the symptoms from baseline. The results and analysis of xerostomia using bother score are presented in the “Appendix”.

For dry eye, patients completed the Subjective Evaluation of Symptoms of Dryness (SESoD) [1, 15,16,17]. The SESoD is a single question assessing the presence and significance of dry eye and is scored on a 5-point Likert scale (0: None, 1: Minimal, 2: Mild, 3: Moderate, 4: Severe).

Clinically significant toxicity was defined as a ≥ 2 point increase in SESoD score [1].

Radiobiological models

Physical dose distributions were converted to equivalent doses of 2 Gy per fraction (EDQ2Gy) using an α/β value of 3 Gy [18,19,20]. For each organ, the corresponding generalized equivalent uniform doses were calculated (gEUD2Gy) [21, 22]. The clinical data relating to the observed rate of NTCP was used to calculate the model parameters for both the Lyman–Kutcher–Burman (LKB) model [23, 24] and the relative seriality (RS) [25]. The basic parameters of each model are: D50 (or TD50), which is the dose for a complication rate of 50%, the slope (gradient) of the dose response curve (m for LKB and γ for RS), and the parameter that accounts for the volume dependence of the organ (n for LKB and s for RS). From the formulation of the RS model, the biologically effective uniform dose (\(\overline{\overline{D}}\)) is derived. \(\overline{\overline{D}}\) is the dose that causes exactly the same normal tissue complication probability as the real dose distribution [26]. The mathematical formulations of NTCP models and biological doses are provided in the “Appendix”. In the figures and tables presenting results of the NTCP models, equivalent to 2 Gy per fraction doses are used otherwise the physical doses are shown.

Statistical methods

For the two NTCP models, the parameter values and their 95% confidence intervals were determined using the maximum likelihood method [18, 27, 28]. The fitting calculations were performed through the use of a minimization package (MINOS) [29]. The confidence intervals of the model parameters were determined using the profile likelihood method. The ability of the NTCP models to distinguish patients with and without the examined symptoms was evaluated using the area under the curve (AUC) measure, which is used as a summary of the ROC curve [18, 30]. The goodness-of-fit of the different NTCP models was assessed through the Hosmer–Lemeshow test [31]. Additionally, the Odds Ratio (OR) method was applied to identify NTCP thresholds beyond which the risk of toxicity increases significantly [18, 32]. Those thresholds were identified in three steps. First, we identified the thresholds for which the OR values are larger than 1 and sorted them by OR value (largest to lowest); second, we identified the thresholds for which the low limit of the 95% confidence interval (95% CI) is larger than one; and third, we identified the threshold with the smallest 95% CI.

Results

100 patients were enrolled and treated between 2015 and 2018, of whom 55 and 54 were eligible for the analyses of 1-month xerostomia and dry eye, respectively (45 patients were prospectively excluded from analysis due to lack of baseline score or baseline SESoD score ≥ 3, or did not complete WBRT, or did not complete any follow-up questionnaires at 1 month post-RT). Patient characteristics are shown in Table 1.

Table 1 Patient characteristics (n = 55) [1]

Most patients received 30 Gy in 10 fractions (58%) or 35 Gy in 14 fractions (31%). For the xerostomia analysis, clinically significant toxicity was observed in 19 patients (35%) who had a ≥ 20 point increase in xerostomia score. For the dry eye analysis, clinically significant toxicity was observed in 13 patients (24%) who had a ≥ 2 point increase in SESoD score.

Figure 1 shows a coronal view of the spatial dose distribution for a representative patient in the plans of the parotid and lacrimal glands, respectively. It also illustrates the bilateral parotid and lacrimal dose volume histograms (DVHs) for patients with and without toxicity. Table 2 presents average OAR mean and volumetric dose in patients with and without toxicity. Figure 2 shows the areas under the ROC curves (AUC) for different parotid and lacrimal dose volume metrics for their corresponding toxicity endpoints.

Fig. 1
figure 1

Upper: the spatial dose distribution of a representative patient and DVHs of the combined parotid glands for patients with (red solid lines) versus without (green dotted lines) clinically significant xerostomia defined using xerostomia score (≥ 20 point worsening in xerostomia score). Lower: the spatial dose distribution of a representative patient and DVHs of the combined lacrimal glands for patients with versus without clinically significant dry eye (≥ 2 point worsening in SESoD score). The isodose lines represent percentages of the prescription dose (30 Gy)

Table 2 Average dosimetric parameters for patients with and without toxicity
Fig. 2
figure 2

Upper: AUC curves of the combined parotid glands for clinically significant xerostomia. Lower: AUC curves of the combined lacrimal glands for clinically significant dry eye. The x-axis refers to the dose (D) of the dose volume metric (VD) and has units of Gy

For xerostomia, the AUC values for Dmean and V20Gy of the parotid glands were 0.64 and 0.68, respectively. Patients with parotid V20Gy ≥ 49% had 8.6 (95% CI: 2.4–30.8) times higher risk for clinically significant toxicity as defined using xerostomia score (this cutoff was determined using the Odds Ratio method).

For dry eye, the AUC value for Dmean to the lacrimal glands was 0.60, whereas the dose–volume indices in the range V6Gy–V15Gy were found to correlate best with toxicity with AUC values ranging between 0.65 and 0.68. Patients with lacrimal V15Gy ≥ 80% had a 5.8 (95% CI 1.1–29.4) times higher risk for clinically significant toxicity (this cutoff was determined using the Odds Ratio method).

Results for NTCP modelling parameters using LKB and RS methods are shown in Table 3, for xerostomia and dry eye, respectively. Figure 3 shows the parotid and lacrimal dose response curves generated using these models, for a range of gEUD and \(\overline{\overline{D}}\) doses (for the LKB and RS models, respectively). Overall results for the NTCP modelling, AUC analysis, and assessment of statistical significance are summarized in Table 4 for xerostomia and dry eye, respectively. The goodness of fit of the NTCP models was evaluated by the Hosmer–Lemeshow test, which showed that the p values of the LKB and RS models were 0.35 for xerostomia and 0.52 for dry eye, respectively. Both values are larger than 0.05, so the null hypothesis that the observed and expected response rates are the same across all doses cannot be rejected.

Table 3 Summary of the parameter values and 95% confidence intervals for the LKB and RS models of the parotid glands for the endpoint of xerostomia and lacrimal glands for dry eye
Fig. 3
figure 3

Upper: dose response curves for the parotid glands and clinically significant xerostomia. Lower: dose response curves for the lacrimal glands and clinically significant dry eye. The dashed lines correspond to the 95% confidence intervals of the dose response curves. The individual binary response data are shown as open circles. The doses on the x-axis correspond to gEUD and \(\overline{\overline{D}}\) (for the LKB and RS models, respectively). The shaded histograms represent the response rates as a function of mean dose to the parotid or lacrimal glands, respectively

Table 4 Summary of the results from the fit of the two normal tissue complication probability models for xerostomia and dry eye. AUC = area under the curve. Response rate is the average of the response rates predicted by the models. OR = Odds ratio. Threshold doses refer to gEUD and \(\overline{\overline{D}}\) (for the LKB and RS models, respectively). The p value was calculated using the two sample t-test for the patient subgroups above and below the dose threshold (the null hypothesis is that the true mean difference is zero)

Discussion

In this analysis, we expanded on our recent reports [1, 10] by creating NTCP models and refining the dosimetric relationships linking parotid dose to xerostomia and lacrimal dose to dry eye in patients receiving WBRT. The dose–volume metrics V13Gy–V23Gy and V6Gy–V15Gy of the parotid and lacrimal glands were best correlated with the endpoints of xerostomia and dry eye, respectively (see Fig. 2). The AUC values of those dose metrics are close to the AUC values of Dmean for both organs (parotid and lacrimal glands). This is because they demonstrate a parallel-like volume effect against the endpoints of xerostomia and dry eye, respectively. In both cases, the respective V20 and V15 metrics showed higher AUC values but not at a statistically significant level.

The OR values that are reported here are the highest values, which are also statistically significant (lower limit of the 95% CI should be larger than one) and have the have the smallest confidence interval. However, it has to be stated that the accompanied thresholds depend on the cohort characteristics (e.g. distribution of dose values among patients). Those results have an immediate clinical applicability since they can be used as dose constraints in treatment plan optimization and evaluation.

In WBRT, the vast majority of patients are treated with parallel-opposed lateral fields. IMRT/VMAT are not well-accepted approaches since the cost and planning/QA time for IMRT/VMAT are far higher than for opposed laterals, and given the clinical setting, might not be justified. In order to establish a closer dose–response relation for dry eye, the response of the individual eyes could be recorded. However, in most cases the status of the patient is not good. Getting good QOL data is challenging and asking questions about each eye separately would add mental burden on the patients.

For xerostomia, our findings are consistent with recent reports from radiotherapy of head and neck tumors finding that parotid V20 correlates best with PRO-CTCAE scores [33]. For example, the difference in parotid mean dose between patients with vs. without xerostomia was only 2.4 Gy, whereas the difference in parotid V20Gy was 8.4% (Table 2). A statistically significant threshold of V20Gy ≤ 47% corresponded to an odds ratio of 8.6 (95% CI 2.4–30.8).

For dry eye, there are no good prior studies for comparison. More specifically, although there are a couple of studies performing NTCP modeling for lacrimal glands, neither of those use the LKB and RS models or have the same or a similar clinical endpoint [e.g. Bhandare et al. use a different model (Logit) and a different toxicity endpoint (ophthalmologic diagnosis of severe DES)] [34]. In our analysis, the same pattern was found as with xerostomia in that volumetric dose metrics were also better correlated with toxicity than mean doses. For example, in our group, the mean lacrimal dose difference between patients with vs. without dry eye was only 1.6 Gy, whereas the difference in lacrimal V15Gy was 8%. A statistically significant threshold of V15Gy ≤ 80% corresponded to an odds ratio of 4.4 (95% CI 1.3–15.4).

There are many studies reporting NTCP model parameters for xerostomia after head and neck radiotherapy. The values of the NTCP model parameters we computed for the same endpoint but after WBRT are not very different than those reported in the literature. When the later ones were applied to the present dataset many of them were found compatible producing similar AUC and OR values with the fitted model parameters. The reason is that the previously published model parameter values fall within the confidence intervals of the derived values (see Tables 3, 5). For the parameter sets that were not found compatible, there are several potential reasons to explain this difference. First, most prior reports address patients with head and neck cancer, and the character of the 3D dose distribution is very different in the setting of WBRT. More specifically, dose fall off inside the volume of parotids is more pronounced in the case of head and neck radiotherapy.

Table 5 Summary of the parameter values that have been reported by different groups for the LKB model, regarding the endpoint of xerostomia. The difference between the physician-scored (CTCAE) patient-reported (PRO-CTCAE) scoring systems is also indicated. Those model parameter sets were applied on the current dataset and the corresponding AUC, OR (with 95% confidence interval and dose threshold) values were calculated

Second, most prior studies have used physician-scored toxicity (CTCAE), while the current study uses PRO [35,36,37,38]. It is generally understood that providers might underestimate the rate/severity of toxicities. In this light, the differences between our results and the prior studies (see Table 5) are logical as lower values for TD50, and higher values for m, and higher values for n would be associated with predicting a higher rate of toxicity (as our use of PROs would be expected to yield higher risk rates). Interestingly, our computed parameters are close to parameters derived from our recent analysis of patient-reported symptoms (PRO-CTCAE) in patients with head and neck cancer as presented in Table 5 [33]. This agreement suggests that the models might be reasonably applicable across a broad range of 3D-dose/volume characters.

Third, most prior studies have considered the dose/volume metrics for the contralateral parotid alone (since there is usually/often no attempt made to spare the ipsilateral parotid gland), where we considered both parotids as a pooled single structure in the setting of WBRT.

We found that both the LKB and RS NTCP models were able to be fitted to the clinical xerostomia data (Table 4), with similar goodness-of-fit. Indeed, the threshold doses for xerostomia in both NTCP models were identical (18 Gy). This is similar to what has been found in other studies where both models provide a reasonable fit to the data [23, 35].

The results of xerostomia based on bother score have very similar pattern with those based on Michigan Questionnaire. More specifically, both scoring systems have almost the same V20Gy and Odds ratio statistics. However, the NTCP model values show some deviation (TD50/D50 values are higher and m/γ values are lower for bother score).

For dry eye, there is minimal published dose/volume data/modelling results to which we can compare our findings. As with the parotids, the threshold doses for dry eye were similar for the two models (28 Gy for the gEUD in the LKB and 25 Gy for the \(\overline{\overline{D}}\) in the RS). Although the data show that there an increased risk for dry eye beyond those threshold doses, the OR results were not statistically significant. This means that a more conservative approach should be followed regarding the use of those threshold doses in the clinic.

In this study, serial observations were collected (from 1 to 6 months) aiming at analyzing the dependence of velocity and extent of symptom resolution on dose. Unfortunately, the follow-up data at 3 and 6 months are sparse to allow a valid statistical analysis. More specifically, the available follow-up data were 55, 33 and 28 at 1, 3 and 6 months, respectively.

Our study has several limitations. First, we did not consider provider-defined toxicity scoring. Nevertheless, we think our approach is reasonable as there is increasing recognition of the importance of patient-reported outcomes [35]. Second, the definition of a significant toxicity was somewhat subjective (e.g., a threshold symptom score increase from baseline). However, this a common approach that has been used by other groups using patient-reported outcomes, and it is reasonable since it inherently acknowledges the importance of considering baseline status [35]. This is particularly important in our setting since many patients with brain metastases requiring WBRT would have received prior systemic therapies that might impact these symptoms. Third, the total number of patients and events was modest. A rule of thumb in modelling is to have 10 events per parameter in the dataset. When this rule breaks, the results show large confidence intervals. However, in our case this is reasonable as the study was prospective, and to our knowledge this is one of the largest studies of its kind to address this issue in the setting of WBRT.

Finally, it is known that all the existing NTCP models have inherent limitations. These models do not account for biological mechanisms (cell redistribution, re-oxygenation, etc.) that may have an impact on treatment outcome. Further, such models do not consider the spatial distribution of dose, and hence ignore the possibility that there may be particularly-important regions of such “parallel organs” in determining clinical response [39]. Despite these uncertainties, the correlations of the model-based NTCP values with clinical outcome data were fairly good.

Conclusions

In conclusion, we created NTCP models that reasonably-well fit acute parotid and lacrimal gland toxicity after WBRT. These models and their findings may help establish levels of risk for these acute toxicities and help clinicians determine reasonable dosimetric goals to maintain quality of life in patients receiving palliative WBRT. Further investigations are needed to refine/confirm these model-based estimates.