Introduction

Many patients with metastatic cancer receive oncological treatment, and radiotherapy (RT) is an important component of palliative treatment1. RT can effectively palliate symptoms arising from cancer, including pain from bone metastases and neurological compromise from brain or spinal metastases with cord or nerve root compression. The aim of palliative RT is to alleviate symptoms and improve quality of life. Evidence has shown that approximately 10% of patients who died of cancer received palliative RT near the end of life2,3. In one population-based study of 15,287 patients who received RT in the last month of life, 17.8% received more than 10 days of treatment4. This finding is corroborated by a German study, which showed that 50% of patients spent more than 60% of their remaining 30 days of life receiving RT4.

RT can be delivered using different dose-fractionation regimens (e.g., a single fraction on one day versus multiple fractions over several weeks)5,6,7,8,9. Multi-fractionation (splitting the total dose into smaller fractions) is often perceived to be associated with fewer long-term complications and a lower need for retreatment4, whereas the larger fraction size of single-fraction regimens theoretically carries an increased risk of late-onset radiation toxicity. Besides radiobiological considerations, medical training and experience, departmental policies, and insurance reimbursement all influence the choice of dose-fractionation regimen. The use of hypofractionation or single fractionation is associated with the oncologist’s perception of a poor prognosis. Protracted courses of RT can place a considerable burden on patients with terminal cancer7: their symptoms can be aggravated by transportation to the RT facility and repeated positioning in the treatment suite. Such courses are also costly to the healthcare system and might delay or prevent the initiation of end-of-life measures for this group of patients.

As highlighted in a 2011 statement by the American Society of Clinical Oncology, the transition from cancer-directed therapy to palliative care often occurs within days of death10. This is possibly related to clinicians’ poor accuracy in predicting the prognosis and survival of patients with advanced malignancies11,12,13. An accurate and practical short-term mortality prediction score for patients with metastatic cancer receiving palliative RT could help clinicians tailor the use of palliative RT and select dose-fractionation schedules appropriate to the expected short-term risk of death. It would also allow an earlier and more thorough assessment of patient management options, goals, and preferences, facilitating personalized palliative care along the disease trajectory. When a cure is not possible, the goals of treatment appropriately shift from prolonging life to controlling symptoms and improving quality of life. However, clinician estimates of survival tend to be optimistic and poorly reproducible11. Overestimation of prognosis may have contributed to mismatched fractionation schedules and to many patients needing to discontinue therapy.

Determining prognosis and life expectancy is critical to the care of patients with advanced cancer. Prognostic factors and predictive tools have been explored and developed to improve on clinicians’ estimates of life expectancy. However, most of these studies were based on small samples in hospital settings with short follow-up and survival times (in terms of weeks), or on single tumor sites, and were not specific to patients referred for RT14,15,16,17. Other studies examined prognostic factors for survival after palliative RT but did not develop a predictive model3,18. Palliative RT rarely affects survival time but often improves quality of life19,20. To improve the delivery of palliative RT and optimize resource allocation, we conducted a cohort study using routinely collected electronic data from patients with metastatic cancer receiving their first course of palliative RT. We studied the factors associated with short-term mortality and developed a predictive score model for 30-day mortality.

Methods

Data source

We retrieved data for our patient cohort on April 1, 2019. All RT episodes were identified using the MOSAIQ information system42, which archives and integrates RT planning and treatment details. These data were then linked to the records in the Clinical Data Analysis and Reporting System, an electronic medical record (EMR) database operated by the Hospital Authority of Hong Kong. Diseases were coded using the International Classification of Diseases, Ninth Revision (ICD-9). We followed the Transparent Reporting of a multivariable prediction model for Individual Prognosis Or Diagnosis (TRIPOD) guideline for model development and validation43,44. The Research Ethics Committee, New Territories West Cluster, Hospital Authority, Hong Kong approved the study on October 1, 2018, and waived the requirement for patient consent (reference no: NTWC/REC/18093). The research was conducted in accordance with the 1964 Declaration of Helsinki and its later amendments.

Patients, data, and settings

We included patients with metastatic cancer who received palliative RT at Tuen Mun Hospital, Hong Kong, between July 1, 2007 and December 31, 2017, and who had not received palliative RT at the center before July 1, 2007. A first course of palliative RT was identified by a combination of an identified palliative dose-fractionation schedule and the treating oncologist’s indication; an RT course was defined as one or more fractions of external beam RT delivered to a defined area.
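As a minimal sketch of how the cohort-entry rule above could be applied to RT course records, the Python below assumes a hypothetical record layout and hypothetical helper names (`is_palliative_course`, `first_palliative_course`). It reads “a combination of” as requiring both a recognized palliative schedule and a palliative indication, and it uses three common palliative regimens (8 Gy/1, 20 Gy/5, 30 Gy/10) purely as placeholders for the study’s own list, which is not given here.

```python
from datetime import date

STUDY_START = date(2007, 7, 1)
STUDY_END = date(2017, 12, 31)

# Hypothetical set of palliative schedules: (total dose in Gy, number of fractions).
# Illustrative placeholders only; not the study's actual list.
PALLIATIVE_SCHEDULES = {(8.0, 1), (20.0, 5), (30.0, 10)}


def is_palliative_course(course):
    """Assumed rule: a recognized palliative schedule AND a palliative indication."""
    schedule = (course["total_dose_gy"], course["n_fractions"])
    return schedule in PALLIATIVE_SCHEDULES and course["indication"] == "palliative"


def first_palliative_course(courses):
    """Return the patient's first palliative course if it qualifies for cohort entry.

    Returns None if the patient has no palliative course, had one before
    STUDY_START (prior palliative RT at the center), or first started after STUDY_END.
    """
    palliative = sorted(
        (c for c in courses if is_palliative_course(c)), key=lambda c: c["start"]
    )
    if not palliative:
        return None
    first = palliative[0]
    if first["start"] < STUDY_START or first["start"] > STUDY_END:
        return None
    return first
```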

Short-term mortality predictive risk factors

To transform raw EMR data into variables usable in a prediction model, we first collected all data from a look-back window of 180 to 365 days (depending on the variable) ending the day before palliative RT initiation; patients were not excluded for lack of data during this window. Raw data were aggregated into potential predictors in the following categories: demographics, prescribed medications, comorbidities and other grouped ICD-9 diagnoses, surgical procedures, health care resource use, and laboratory results. No data on the first course of palliative RT itself (e.g., dose-fractionation and techniques) were used in the predictive model. Detailed information on the variables used as short-term mortality predictors is provided in Appendix 1 and Supplementary Table 3.
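The look-back aggregation can be illustrated with a short Python sketch. The helper names, the (date, value) record layout, and the choice of the most recent result (or the median) as the summary are assumptions for illustration; the variable-specific windows and aggregation rules actually used are described in Appendix 1.

```python
from datetime import date, timedelta
from statistics import median


def lookback_bounds(rt_start, window_days):
    """Window of window_days days ending the day before RT starts (both ends inclusive)."""
    return rt_start - timedelta(days=window_days), rt_start - timedelta(days=1)


def aggregate_lab(records, rt_start, window_days, how="last"):
    """Summarize one patient's results for one lab test within the look-back window.

    records: iterable of (date, value) tuples. Results on or after rt_start are ignored.
    Returns None if no result falls in the window (the value is left for imputation).
    """
    start, end = lookback_bounds(rt_start, window_days)
    in_window = sorted((d, v) for d, v in records if start <= d <= end)
    if not in_window:
        return None
    if how == "last":
        return in_window[-1][1]  # most recent result before RT
    if how == "median":
        return median(v for _, v in in_window)
    raise ValueError(f"unknown aggregation: {how!r}")


def count_in_window(event_dates, rt_start, window_days):
    """Count events (e.g., emergency attendances) within the look-back window."""
    start, end = lookback_bounds(rt_start, window_days)
    return sum(1 for d in event_dates if start <= d <= end)
```

Returning `None` rather than dropping the patient mirrors the statement that patients were not excluded for lack of data in the window.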

Outcomes

Our primary outcome was 30-day overall mortality, measured from the start of the first course of palliative RT until death or censoring (April 1, 2019). We used the start date of RT because it is closer than the end of treatment to the date of the clinical decision to treat, and it provides a uniform time point across all fractionation regimens.
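A minimal Python sketch of the outcome definition follows. The helper name `died_within` is hypothetical, and counting a death on day 30 as occurring “within 30 days” is an assumption. Because every patient started RT by December 31, 2017, all had more than 30 days of potential follow-up before the censoring date, so the guard against short follow-up never triggers in this cohort.

```python
from datetime import date

CENSOR_DATE = date(2019, 4, 1)  # data retrieval date, used as the censoring date


def died_within(rt_start, death_date, horizon_days=30, censor_date=CENSOR_DATE):
    """1 if the patient died within horizon_days of the first RT start, else 0.

    Time zero is the START of the first palliative RT course. Counting day 30
    itself as "within 30 days" is an assumption of this sketch.
    """
    if (censor_date - rt_start).days < horizon_days:
        # Too little potential follow-up to determine the binary outcome.
        raise ValueError("follow-up shorter than the prediction horizon")
    if death_date is not None and death_date <= censor_date:
        return int((death_date - rt_start).days <= horizon_days)
    return 0  # alive at censoring, hence alive at day horizon_days
```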

Model selection, performance, and scoring

We used multivariable logistic regression models to predict the primary outcome, 30-day mortality45,46. The predictor functions of the models were pre-specified based on subject-matter knowledge (Table 2). We assumed that data were missing at random and created one imputed dataset using fully conditional specification based on a multivariate normal distribution47. Different combinations of 13 covariates were chosen for the regression models (Table 2): age, sex, Royal College of Surgeons modified comorbidity score48, log peripheral white blood cell count, log peripheral blood neutrophil-to-lymphocyte ratio (NLR), log plasma urea, log serum bilirubin, serum albumin, lactate dehydrogenase (LDH), red cell distribution width, emergency room attendance, sites receiving palliative RT, and primary lung cancer.
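To show how log-transformed laboratory covariates enter a logistic model, the sketch below encodes a subset of the pre-specified predictors and converts the linear predictor into a probability. The use of the natural log, the field names, and all coefficient values are hypothetical placeholders, not the fitted model 2.

```python
import math


def design_row(patient):
    """Encode a subset of the covariates; log transforms use the natural log (assumed)."""
    return {
        "age": patient["age"],
        "log_nlr": math.log(patient["neutrophils"] / patient["lymphocytes"]),
        "log_urea": math.log(patient["urea"]),
        "albumin": patient["albumin"],
        "lung_primary": 1.0 if patient["primary_site"] == "lung" else 0.0,
    }


def predicted_risk(row, intercept, coefs):
    """Logistic model: risk = 1 / (1 + exp(-(intercept + sum(beta_j * x_j))))."""
    linear_predictor = intercept + sum(coefs[name] * x for name, x in row.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical placeholder coefficients; NOT the fitted values from this study.
EXAMPLE_INTERCEPT = -3.0
EXAMPLE_COEFS = {"age": 0.01, "log_nlr": 0.5, "log_urea": 0.4,
                 "albumin": -0.05, "lung_primary": 0.5}
```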

Data-adaptive methods based on cross-validation and the mean absolute prediction error (MAE) were used to evaluate the predictive performance of different model specifications. We used ten-fold cross-validation to reduce the risk of overfitting the final model to the training set49. In each round, a candidate model for the primary outcome was fitted using data from nine of the ten blocks (the “derivation set”), and its performance was evaluated in the held-out block (the “validation set”). We repeated this process ten times, each time using a different block as the validation set, and averaged the performance over the ten validation sets.
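The ten-fold procedure can be sketched in pure Python as below. It assumes simple (unstratified) random blocks and uses a placeholder model that predicts the derivation-set mortality proportion for everyone; in practice the `fit` and `predict` arguments would wrap a logistic regression. All function names are hypothetical.

```python
import random
from statistics import mean


def k_fold_blocks(n, k=10, seed=2019):
    """Randomly partition indices 0..n-1 into k disjoint, near-equal blocks."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(X, y, fit, predict, metric, k=10, seed=2019):
    """Fit on k-1 blocks (derivation set) and score on the held-out block (validation set).

    Returns the mean score over the k validation blocks and the per-block scores.
    """
    scores = []
    for held_out in k_fold_blocks(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        preds = [predict(model, X[i]) for i in held_out]
        scores.append(metric([y[i] for i in held_out], preds))
    return mean(scores), scores


def mean_absolute_error(observed, predicted):
    """Average |observed outcome - predicted probability|."""
    return mean(abs(o - p) for o, p in zip(observed, predicted))


# Placeholder model: predicts the derivation-set 30-day mortality proportion for everyone.
def fit_prevalence(X, y):
    return mean(y)


def predict_prevalence(model, x):
    return model
```

Keeping `fit`, `predict`, and `metric` as arguments means the same loop can compare several candidate model specifications on identical blocks, which is what a fixed `seed` guarantees.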

As the overall performance metric, we used the MAE, the average absolute difference between the predicted probability and the observed outcome in the validation set, i.e., the average prediction error50; smaller values indicate predictions closer to the eventual outcomes. Our measure of model discrimination was the cross-validated area under the receiver operating characteristic (ROC) curve51,52. An ROC curve plots the sensitivity of a model (vertical axis) against 1 minus the specificity (horizontal axis) for all possible cut-off values that might be used to classify patients as predicted to die within 30 days or not51. Given two randomly selected patients, one who died within 30 days and one who did not, the area under the ROC curve (AUC) equals the probability that the model assigns a higher risk to the patient who died53. We calculated 95% confidence intervals (CIs) of the AUC following the method of DeLong et al.54. We assessed model calibration by examining the agreement between observed outcomes and predictions55, using a graphical assessment with predicted probabilities on the x-axis and observed outcomes on the y-axis. We performed a sensitivity analysis to evaluate the robustness of model performance by testing different model specifications.
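The discrimination and calibration measures described above can be illustrated with the following pure-Python sketch (the study itself used Stata and R). It computes the AUC as the probability that a patient who died is ranked above one who survived, counting ties as one half, and a 95% CI from the DeLong variance. Grouping patients into risk-ordered groups (deciles by default) for the calibration plot is an assumption, as are the function names.

```python
import math
from statistics import mean

Z_95 = 1.959963984540054  # standard normal quantile for a two-sided 95% CI


def _psi(x_pos, x_neg):
    """Kernel: 1 if the case outranks the non-case, 0.5 for ties, else 0."""
    if x_pos > x_neg:
        return 1.0
    return 0.5 if x_pos == x_neg else 0.0


def auc_with_delong_ci(risks, died, z=Z_95):
    """AUC as a concordance probability, with a DeLong et al. (1988) variance.

    risks: predicted probabilities; died: 1 for death within 30 days, else 0.
    Requires at least two patients in each outcome group. O(m * n) pairwise loop.
    """
    pos = [r for r, d in zip(risks, died) if d == 1]
    neg = [r for r, d in zip(risks, died) if d == 0]
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two patients with and without the outcome")
    v10 = [sum(_psi(x, y) for y in neg) / n for x in pos]  # per-case components
    v01 = [sum(_psi(x, y) for x in pos) / m for y in neg]  # per-non-case components
    auc = mean(v10)
    s10 = sum((v - auc) ** 2 for v in v10) / (m - 1)
    s01 = sum((v - auc) ** 2 for v in v01) / (n - 1)
    se = math.sqrt(s10 / m + s01 / n)
    # Bounds clipped to [0, 1] here as a presentation choice.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


def calibration_table(risks, died, n_groups=10):
    """Mean predicted risk vs observed death proportion in groups ordered by risk."""
    pairs = sorted(zip(risks, died))
    size = len(pairs)
    table = []
    for g in range(n_groups):
        chunk = pairs[g * size // n_groups:(g + 1) * size // n_groups]
        if chunk:
            table.append((mean(r for r, _ in chunk), mean(d for _, d in chunk)))
    return table
```

Plotting the first element of each calibration pair on the x-axis against the second on the y-axis gives the graphical assessment described above; points on the diagonal indicate perfect calibration.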

Finally, we produced a point score system from the best model. In this system, points are assigned based on a patient’s predictor values, and the total score corresponds to the risk of 30-day overall mortality56. The steps to develop the point score system are summarized in Appendix 2. For each point score cut-off, we summarized the positive predictive value (PPV) and negative predictive value (NPV), which represent, respectively, the probability that the outcome occurs given a positive test result and the probability that it does not occur given a negative test result57.
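One common way to turn a logistic model into an integer point score is the Framingham-style approach of Sullivan and colleagues, in which each coefficient is divided by a fixed amount of log-odds worth one point and rounded. The sketch below applies that idea to categorized predictors. The coefficients, the one-point unit, and the helper names are hypothetical, and the authors’ actual steps (Appendix 2) may differ.

```python
import math


def assign_points(coefs, points_unit):
    """Convert category coefficients into integer points.

    coefs: {predictor: {category: beta}}, with beta = 0 for the reference category.
    points_unit: the amount of log-odds worth one point (a hypothetical choice).
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    return {pred: {cat: round(beta / points_unit) for cat, beta in cats.items()}
            for pred, cats in coefs.items()}


def total_points(points, patient):
    """Sum the points for a patient given as {predictor: category}."""
    return sum(points[pred][cat] for pred, cat in patient.items())


def risk_for_total(total, intercept, points_unit):
    """Approximate risk: log-odds taken as intercept + points_unit * total points."""
    return 1.0 / (1.0 + math.exp(-(intercept + points_unit * total)))


# Hypothetical categorized coefficients (log-odds); NOT the study's model.
EXAMPLE_CATEGORY_COEFS = {
    "primary_site": {"other": 0.0, "lung": 0.55},
    "urea_band": {"normal": 0.0, "raised": 0.9},
}
```

Because rounding discards part of each coefficient, the risk attached to a points total is an approximation of the underlying model’s prediction; this is the price paid for a score that can be added up by hand.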

Statistical analysis

Descriptive analyses were conducted to describe the cohort of patients receiving first course palliative RT. We used frequencies and proportions for categorical variables and means with standard deviations (when normally distributed) or medians with interquartile ranges (when not normally distributed) for continuous variables. To describe the association between patient factors and an increased or decreased short-term mortality, we reported odds ratios (OR) from univariable and multivariable logistic regressions with their respective 95% CI.
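Odds ratios and their 95% CIs follow from the logistic regression coefficients on the log-odds scale. The sketch below assumes Wald-type intervals, exp(β ± 1.96·SE). It also shows the reverse calculation, which recovers an approximate coefficient and standard error from a reported odds ratio and CI. The function names are illustrative.

```python
import math

Z95 = 1.959963984540054  # two-sided 95% standard normal quantile


def odds_ratio_ci(beta, se, z=Z95):
    """Odds ratio and Wald 95% CI from a log-odds coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_or_ci(odds_ratio, lower, upper, z=Z95):
    """Approximate coefficient and SE implied by a reported OR and Wald 95% CI."""
    beta = math.log(odds_ratio)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se
```

Applied to the reported OR for primary lung cancer (1.73, 95% CI: 1.47–2.04), the reverse calculation gives β ≈ 0.548 and SE ≈ 0.084, and the forward calculation then reproduces the interval after rounding to two decimals; the slight asymmetry of the published bounds on the log scale reflects that rounding.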

The analysis was performed using Stata v.15.1 (StataCorp LLC, College Station, Texas, USA) and R v. 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria)58,59.

Results

Description of the cohort

We identified 5,795 patients who commenced palliative RT between July 1, 2007 and December 31, 2017. Patient characteristics are summarized in Table 1. The median age was 64 (interquartile range: 55–75) years, and 61.8% were male. Patients with lung cancer (39.7%) constituted the largest group in the cohort. According to the Royal College of Surgeons modified Charlson score, 55.1%, 29.2%, and 15.7% of patients had a score of 0, 1, and ≥2, respectively. A total of 5,291 patients died during follow-up (median follow-up: 3 months), and 995 patients (17.2% of the cohort) died within 30 days of the start of RT. Data were complete except for albumin, peripheral blood cell counts, urea, bilirubin, and LDH, which were imputed21.

Table 1 Patient characteristics for model derivation.

Thirty-day mortality and model performance

Of the 5,795 patients receiving their first course of palliative RT, 995 (17.2%) died within 30 days. Model 2 was chosen as the best-performing model among candidate models 1–4 from the regression analyses (Table 2). The most important predictors of short-term mortality were primary lung cancer (OR: 1.73, 95% CI: 1.47–2.04), log peripheral blood NLR (OR: 1.71, 95% CI: 1.52–1.92), and log plasma urea (OR: 1.55, 95% CI: 1.32–1.82).

Table 2 Comparison of different model specifications.

Figure 1 shows the ROC curves of the candidate models, all of which showed good discrimination. Figure 2 shows the 10-fold cross-validated receiver-operating characteristic (cv-ROC) curve for 30-day mortality prediction from the best model (model 2 in Table 2). Model 2 showed the highest discrimination, with a cross-validated area under the curve (cvAUC) of 0.81 (95% CI: 0.79–0.82) (Figs. 1, 2).

Figure 1

Receiver-operating characteristic (ROC) curves of the four candidate models, with their respective areas under the curve. (created using Stata v.15.1, https://www.stata.com/).

Figure 2

10-fold cv-ROC curves for 30-day mortality prediction from the best candidate model (model 2), which has a cross-validated area under the curve of 0.811. SD: standard deviation; cv-ROC: cross-validated receiver-operating characteristic. (created using Stata v.15.1, https://www.stata.com/).

Tables 3 and 4 show the point score system and the average predicted probabilities of 30-day mortality based on model 2, respectively. For ease of interpretation, predictor values on the log scale were back-transformed to their original scale. A point score cut-off of 6 (positive predictive value: 33.0%; negative predictive value: 94.2%; sensitivity: 80.3%; specificity: 66.2%) yielded the highest Youden’s index (46.5%), i.e., the greatest value of sensitivity + specificity − 1 on the ROC curve.
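The predictive values at the cut-off of 6 can be checked against the sensitivity, the specificity, and the cohort’s observed 30-day mortality (995 of 5,795 patients) using Bayes’ theorem. In the sketch below (function names hypothetical), these inputs give a Youden’s index of 46.5%, a PPV of about 33.0%, and an NPV of about 94.2%, consistent with the reported figures.

```python
def youden_index(sensitivity, specificity):
    """Youden's J = sensitivity + specificity - 1 (0 = no better than chance, 1 = perfect)."""
    return sensitivity + specificity - 1.0


def predictive_values(sensitivity, specificity, prevalence):
    """PPV and NPV implied by sensitivity, specificity, and outcome prevalence."""
    tp = sensitivity * prevalence                # true positives per patient
    fp = (1.0 - specificity) * (1.0 - prevalence)  # false positives per patient
    fn = (1.0 - sensitivity) * prevalence          # false negatives per patient
    tn = specificity * (1.0 - prevalence)          # true negatives per patient
    return tp / (tp + fp), tn / (tn + fn)


def best_cutoff(rows):
    """rows: (cutoff, sensitivity, specificity) tuples; return the row maximizing J."""
    return max(rows, key=lambda r: youden_index(r[1], r[2]))


# Reported values at a point-score cut-off of 6; prevalence = 995 / 5,795 deaths.
PREVALENCE = 995 / 5795
ppv, npv = predictive_values(0.803, 0.662, PREVALENCE)
```

Because PPV and NPV depend on prevalence, the same cut-off would yield different predictive values in a population with a different 30-day mortality rate, whereas sensitivity, specificity, and Youden’s index would not change.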

Table 3 Point score system for the probability of 30-day mortality in patients with metastatic cancer receiving a first course of palliative radiotherapy, derived from model 2.
Table 4 Probabilities of the outcome (30-day mortality) that correspond to the points total.

Supplementary Figure 1 shows the predicted probabilities of 30-day mortality by mortality status. The calibration plot suggests a reasonably good fit (Supplementary Fig. 2; likelihood-ratio statistic: 2.81, P = 0.094), with accurate predictions across almost the entire range of death probabilities; the predicted probabilities stay close to the ideal calibration line for both low and high probabilities of death. Sensitivity analyses showed no association between comorbidities and 30-day mortality and no interaction between comorbidities and age. Systemic treatments, including chemotherapy, were not associated with 30-day mortality and did not improve predictive performance, regardless of whether comorbidity was included in the model. Additionally, in sensitivity analyses we assessed whether our model was consistent across different time windows (0–29, 0–35, and 0–45 days) and applied our point score system to predict 3- and 6-month mortality in the same patient cohort. We found similar NPVs and PPVs (Supplementary Tables 1 and 2).
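The degrees of freedom of the calibration likelihood-ratio test are not stated. As an arithmetic check under that uncertainty, a chi-square statistic of 2.81 with 1 degree of freedom gives P ≈ 0.094, matching the reported value, whereas 2 degrees of freedom would give P ≈ 0.245; this suggests, but does not confirm, a 1-df test. The helper name below is hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square statistic, closed forms for df = 1 or 2.

    df = 1: P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    df = 2: the chi-square(2) distribution is exponential with mean 2.
    """
    if df == 1:
        return math.erfc(math.sqrt(x / 2.0))
    if df == 2:
        return math.exp(-x / 2.0)
    raise ValueError("this sketch only handles df = 1 or 2")
```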

Discussion

We found that primary lung cancer, peripheral blood NLR, and plasma urea were strong predictors of short-term mortality among patients with stage IV cancer. Our score system predicted short-term mortality well: the ROC and calibration curves showed good discrimination and calibration, respectively.

Among more recent studies, successful predictive models for survival after palliative RT in patients with advanced cancer were developed by Chow et al. and Krishnan et al.22,23,24. These studies are similar to ours; however, ours has important advantages. We developed a scoring system that uses objective measurements (complete blood counts and liver and renal function tests within 180 days) to estimate the risk of 30-day mortality in patients receiving palliative RT. Furthermore, our data were obtained from routine practice, which increases the model’s clinical applicability; by contrast, the patients studied by Chow et al. came from the Radiotherapy Rapid Response Programme, which was established to provide quick palliative RT for patients with terminal cancer, mostly referred by medical oncologists and palliative care doctors22,23. Our model for predicting 30-day mortality was developed in a large population of unselected adults referred for palliative RT (5,795 patients, versus 395 in Chow et al. and 862 in Krishnan et al.). The concordance (C)-statistic measures the discrimination of a logistic regression model for a binary outcome: it is the probability that, for a randomly selected pair of patients in which one had the outcome and the other did not, the model assigns the higher predicted risk to the patient with the outcome25. The C-statistics of the TEACHH model and of Chow’s model based on 3 risk factors were 0.59 and 0.65, respectively23,24, whereas the AUC (equivalent to the C-statistic) of our model was 0.81. Our AUC was cross-validated, which reduces optimism bias relative to the estimates of the other two models and provides an internally validated estimate of performance in the absence of external validation. The tool is easy to calculate for patients with metastatic cancer referred for palliative RT, who account for 20–40% of patients treated in radiation oncology departments26,27,28,29.

Furthermore, given the mandatory status of death certification in Hong Kong and the automated collection of RT and vital status data, the dates of the first course of palliative RT and of death are reliable. The referring clinician’s indication was included in our definition of palliative RT, which is more robust than relying solely on predefined dose and fractionation schedules. Our model performed reasonably well across a range of cancer types and other variables, despite lacking genetic data, cancer-specific biomarkers, or any detailed information beyond the EMR. This emphasizes that commonly available EMR data contain important predictors of clinically relevant outcomes in patients with cancer receiving palliative care. Most of the model’s inputs are standard structured data components of the EMR, so the algorithm could easily be integrated into existing clinical management systems, importing data directly from the EMR without specialized infrastructure. Additionally, once implemented, the tool’s predictive performance could be continuously and independently validated in an ongoing prospective cohort. This is important for reflecting secular changes in cancer epidemiology, treatment variations, and referral patterns in an evolving real-world setting.

The model could improve on clinician estimates of survival and help guide clinical judgment on treatment, resource allocation, and early palliative care referral with advance care planning30. The NPV exceeded 90%, meaning that patients whom the model predicts will survive beyond 30 days have a very high probability of doing so. This could provide a sounder basis for starting a dialogue with patients. Realistic and honest disclosure of prognosis can encourage shared decision-making between the patient and the care team, allowing patients to settle personal, family, and financial issues earlier instead of embarking on another course of treatment based on an inaccurate prognosis. However, if, after thorough discussion with the patient and family, the patient still opts for RT despite a considerable chance of early mortality, we argue that hypofractionation is preferable to avoid a protracted course of RT near death, given the well-documented evidence of equivalent effects on a range of symptoms31.

Regarding the choice of covariates and development of the model, patients referred for palliative RT have often already received oncological treatment and blood tests; hence, we included commonly performed biochemical and hematological markers. Clinical experience has shown that patients with lung cancer generally die earlier than patients with other cancers, such as breast cancer14, and that patients with certain sites of metastases (e.g., bone only) live longer than those with others, such as brain and spinal metastases with cord compression32,33,34,35. Because no data were available on sites of metastatic disease, we used irradiated sites as a substitute; hence, we included both primary cancer site and irradiated site in the score determination. Age may influence not only treatment recommendations but also remaining lifespan, and it was therefore analyzed in our model. Clinician estimates of survival were excluded because they are likely based on experience and training, are poorly reproducible, and are not commonly recorded in routine electronic databases.

Our study had limitations. First, the prediction model was built on data from patients treated with RT and might not be accurate for untreated patients. Second, our procedure for categorizing the predictor variables may not have identified the cut-off values with the best discriminating capacity. Third, some important prognostic factors may have been omitted. For example, data on performance status, quality of life assessed with validated scales, and frailty status36,37, considered prognostic in previous studies, were not analyzed22. However, we used patient comorbidities as a proxy for frailty. Fourth, in some cases where curative and palliative intent could not be clearly distinguished (e.g., patients with limited metastases receiving higher-dose RT for better local control), the classification of RT as palliative was at the oncologist’s discretion. Finally, we considered only the first course of palliative RT, without accounting for the effects of subsequent RT courses and other treatments.

A prediction tool using EMR data retrieved from routine clinical practice can accurately predict short-term mortality among patients with advanced cancer starting radiotherapy. Such a tool could facilitate shared decision-making among patients, families, and the medical care team. Additionally, it could help clinicians identify patients who are unlikely to survive beyond 30 days and who may therefore benefit more from earlier palliative care referral and end-of-life planning than from protracted RT. Machine learning techniques have the potential to improve clinical decision-making by identifying patients at increased risk of mortality38. In 3 studies summarized by a systematic review, machine learning techniques outperformed conventional logistic regression in building mortality prediction models for older and/or hospitalized adults when sufficient data were available38,39,40,41. Future research is needed to incorporate machine learning techniques and to determine the generalizability and feasibility of applying the prediction tool in clinical settings.