Introduction

Survival estimates are important when treating patients with metastatic bone disease to help set patient, family, and physician expectations. For orthopaedic surgeons, objective means of estimating life expectancy can be used to guide surgical decision-making, thereby helping to identify which patients may benefit from surgery and which may require more durable reconstructive options. In addition, validated means by which to estimate longevity may be used to risk-stratify patients before enrollment in clinical trials. Unfortunately, physician estimates alone are notoriously inaccurate [7, 8], which stimulated the development of more objective means of estimating life expectancy in this terminal but not necessarily terminally ill patient population.

One application designed for use in patients with metastatic bone disease is PATHFx [11]. Introduced in 2011 [4], this validated clinical decision-support tool [6] uses clinical and physiologic variables to generate the probability of survival at 3 and 12 months after orthopaedic surgery. It is available free at www.pathfx.org. These times were chosen by expert opinion as a means to help surgeons determine which patients would benefit from surgery (probability of survival at 3 months) and whether a more-durable implant may be necessary (probability of survival at 12 months). However, given the relatively broad array of treatment options for patients with symptomatic bone metastases, survival estimates at other times may be preferred in certain situations. For instance, those considering palliative options for metastatic lesions may seek to estimate very short (1-month) life expectancy [10]. In addition, more-favorable estimates of 1–6 months may justify less-invasive approaches to stabilization such as intramedullary fixation (Fig. 1A). Finally, a survey of Musculoskeletal Tumor Society members [12] indicated that the probability of survival at 6 months postsurgery would be helpful in determining whether to choose a more-durable reconstruction option involving conventional or modular “tumor” prostheses, specifically for lesions involving the peritrochanteric or subtrochanteric femur (Fig. 1D). Similarly, for patients with favorable estimates at 12 months postsurgery, the evidence supporting the need for more-durable implants becomes stronger.

Fig. 1A–D

Two implant options for patients with impending pathologic proximal femur fractures are shown. Relatively sick patients with high likelihood of perioperative complications and survival estimates less than 6 months may benefit from intramedullary stabilization. (A) Preoperative and (B) postoperative radiographs are shown. Healthier patients who have a lower likelihood of perioperative complications and more favorable survival estimates of greater than 6 months may require more durable implants, such as the prosthesis shown here. (C) Preoperative and (D) postoperative radiographs are shown. Patients with very short life expectancies of less than 1 month may be candidates for palliative radiotherapy, without surgery.

Although various statistical techniques could be used to estimate life expectancy, a Bayesian approach using probabilistic theory appears well suited to this clinical application, in part because it can accommodate uncertainty by functioning in the presence of incomplete information, which is common in the clinical setting. In addition, Bayesian networks are capable of estimating the likelihood of rare events, which may be useful when attempting to estimate very short (< 1 month) life expectancy in a population of surgical candidates. The original Bayesian Belief Networks that we used to estimate 3- and 12-month survival have been externally validated twice [6, 11] and possess favorable characteristics on decision curve analysis (DCA) [14], indicating that the models are suitable for clinical use. As such, it is reasonable to determine whether 1-month and 6-month survival could be estimated using similar techniques. In doing so, we respond to treatment philosophies preferred by the orthopaedic oncology community [12] and provide clinicians with an objective means by which to characterize each patient’s life expectancy. If successful, www.pathfx.org could be upgraded to provide the user with a comprehensive, no-cost assessment of each patient’s survival trajectory by estimating 1-, 3-, 6-, and 12-month postoperative survival.

With this in mind, we developed two Bayesian models capable of estimating the likelihood of 1- and 6-month survival in patients undergoing surgery for metastatic bone disease. On external validation, we performed receiver operating characteristic (ROC) analysis and DCA [14] to determine whether each model was suitable for clinical use.

Patients and Methods

After local institutional review board approval, we retrospectively reviewed our longitudinally maintained database and identified the records of 189 patients who underwent surgery for skeletal metastases at Memorial Sloan-Kettering Cancer Center (MSKCC) between 1999 and 2003 (MSKCC training set). Each record contained 15 variables, and every record contained sufficient information to establish survival at 6 months postsurgery (yes or no). No patient was lost to followup. Other recorded variables included age at surgery, race, sex, primary oncologic diagnosis, indication for surgery (impending or complete pathologic fracture), number of bone metastases (solitary or multiple), presence or absence of visceral (organ) metastases, estimated glomerular filtration rate (mL/minute/1.73 m²), serum calcium concentration (mg/dL), serum albumin concentration (g/dL), presence or absence of lymph node metastases, prior chemotherapy (yes or no), preoperative hemoglobin (g/dL; on admission, before transfusion, if applicable), absolute lymphocyte count (K/µL), and the senior surgeons’ (JHH and PJB) estimates of survival (postoperatively in months).

Including the surgeon’s estimate, a subjective assessment, may seem controversial. However, doing so allows surgeons with considerable expertise to provide a weighted estimate of survival that can be used in conjunction with the other, more-objective, features contained in PATHFx. Although the surgeon’s estimate contains many important subjective features that cannot be quantified, it does demand a certain level of experience. If a surgeon is unsure whether an estimate is appropriate, he or she should select “unknown.” Doing so will maintain accuracy of the model, while not introducing undue bias, as shown in two prior external validation studies [6, 11].
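The variables listed above, together with the option to mark an input as “unknown,” can be pictured as a simple record type. The following Python is only a sketch under stated assumptions: the class name, field names, and the use of `None` to represent “unknown” are hypothetical choices made for illustration and do not describe how PATHFx or the study database stores data.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class PatientRecord:
    """Hypothetical container for the 15 candidate features.

    None stands for "unknown"; every field name here is illustrative.
    """
    age_at_surgery_years: Optional[float] = None
    race: Optional[str] = None
    sex: Optional[str] = None
    primary_oncologic_diagnosis: Optional[str] = None
    completed_pathologic_fracture: Optional[bool] = None  # False = impending
    multiple_bone_metastases: Optional[bool] = None       # False = solitary
    visceral_metastases: Optional[bool] = None
    estimated_gfr: Optional[float] = None
    serum_calcium_mg_dl: Optional[float] = None
    serum_albumin_g_dl: Optional[float] = None
    lymph_node_metastases: Optional[bool] = None
    prior_chemotherapy: Optional[bool] = None
    preoperative_hemoglobin_g_dl: Optional[float] = None
    absolute_lymphocyte_count: Optional[float] = None
    surgeon_estimate_months: Optional[float] = None


def missing_features(record: PatientRecord) -> List[str]:
    """Return the names of the features left as "unknown"."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Example: a surgeon who is unsure of the survival estimate leaves it unknown.
record = PatientRecord(age_at_surgery_years=80, preoperative_hemoglobin_g_dl=8.0)
print(missing_features(record))
```

Representing “unknown” explicitly, rather than substituting a default value, keeps a missing surgeon’s estimate distinct from an estimate that was actually entered.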

We used these data to develop two Bayesian models in a manner previously described [5] using commercially available machine-learning software (FasterAnalytics™; DecisionQ, Washington, DC, USA). In brief, all 15 variables were considered as candidate features for inclusion in the models. Prior distributions, that is, the values each variable is likely to assume under various circumstances, were learned from the MSKCC training set and not specified a priori. We then generated two models containing the outcomes, 1-month or 6-month survival (yes or no), and generated calibration curves that plotted predicted risk against actual risk to assess the accuracy of model predictions.
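The idea behind a calibration curve, comparing predicted with observed risk, can be sketched in a few lines of Python. This is not the procedure implemented in the modeling software; the function name, the equal-width binning, and the example values are assumptions made purely for illustration.

```python
from statistics import mean


def calibration_bins(predicted, observed, n_bins=5):
    """Group predictions into equal-width probability bins and return, for
    each non-empty bin, (mean predicted probability, observed rate, count)."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(predicted, observed):
        # A probability of exactly 1.0 belongs in the last bin.
        index = min(int(p * n_bins), n_bins - 1)
        bins[index].append((p, y))
    rows = []
    for members in bins:
        if not members:
            continue  # skip empty bins rather than divide by zero
        mean_predicted = mean(p for p, _ in members)
        observed_rate = mean(y for _, y in members)
        rows.append((mean_predicted, observed_rate, len(members)))
    return rows


# Hypothetical predictions and outcomes (1 = survived), for illustration only.
predicted = [0.1, 0.1, 0.9, 0.9]
observed = [0, 0, 1, 1]
for mean_predicted, observed_rate, count in calibration_bins(predicted, observed, n_bins=2):
    print(f"predicted {mean_predicted:.2f}  observed {observed_rate:.2f}  n={count}")
```

Plotting the first value of each row against the second gives the calibration curve; points near the 45° line indicate agreement between predicted and observed risk.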

We next used data from eight major referral centers across Scandinavia (n = 815) to perform external validation (Scandinavian external validation set). We chose this registry because it is well-characterized, and was used to externally validate the original 3- and 12-month PATHFx models [6]. In general, models are not suitable for clinical practice until external validation is performed. The Scandinavian external validation set provided a means to test model performance by requiring the 1- and 6-month models to estimate survival in each of the 815 “unknowns.” The diversity of the data between the MSKCC training set and the Scandinavian external validation set is important to help ensure the models can be used to guide treatment in various settings with differing patient populations, demographics, and treatment philosophies. Each record contained the preoperative features required to validate the 1-month and 6-month models. Followup was sufficient to establish postoperative survival at 6 months in 100% of records. Ethical approval was not required before using these deidentified registry data. Importantly, the senior surgeons’ estimates of survival were not contained in the external validation data set. Choosing a validation set without the surgeon’s estimate was intentional and helps ensure these models are applicable to centers that may lack expertise in estimating life expectancy for this patient population.

Using the Scandinavian external validation set, we determined the discriminatory ability of each model in estimating the likelihood of 1- or 6-month survival by sensitivity, specificity, and ROC analysis [2]. A minimum ROC area under the curve (AUC) of 0.7 was considered acceptable and was specified a priori.
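As a minimal sketch of these discrimination measures, and not the software used in this study, the following Python computes sensitivity, specificity, and the ROC AUC from predicted survival probabilities. The function names, the 0.5 classification cutoff, and the six example records are assumptions made for illustration. The AUC is computed with the rank-based (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
def sensitivity_specificity(probabilities, survived, cutoff=0.5):
    """Classify a patient as a predicted survivor when the probability is at
    least `cutoff`, then return (sensitivity, specificity)."""
    tp = fn = tn = fp = 0
    for p, y in zip(probabilities, survived):
        predicted_survivor = p >= cutoff
        if y and predicted_survivor:
            tp += 1
        elif y:
            fn += 1
        elif predicted_survivor:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(probabilities, survived):
    """Probability that a randomly chosen survivor receives a higher predicted
    probability than a randomly chosen nonsurvivor (ties count one-half)."""
    survivors = [p for p, y in zip(probabilities, survived) if y]
    nonsurvivors = [p for p, y in zip(probabilities, survived) if not y]
    wins = 0.0
    for p in survivors:
        for q in nonsurvivors:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(survivors) * len(nonsurvivors))


# Hypothetical model outputs and outcomes (1 = survived), for illustration only.
probabilities = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2]
survived = [1, 1, 1, 0, 0, 0]

sensitivity, specificity = sensitivity_specificity(probabilities, survived)
auc = roc_auc(probabilities, survived)
print(f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} AUC={auc:.2f}")
print("meets the a priori minimum of 0.7:", auc >= 0.7)
```

The pairwise comparison is quadratic in the number of records, which is acceptable for a validation set of several hundred patients.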

Measures of accuracy and discrimination alone cannot determine whether the models are suitable for clinical use. Other methods that weigh the consequences of false-positive and false-negative results are also necessary. To accomplish this, we used DCA [14], an analytic technique that helps quantify the consequences of under- and/or overtreatment of the disease. When constructing decision curves, we assumed surgical decisions would be based strictly on the output of each model. For instance, the decision to offer palliative care or surgery would be based on the likelihood of 1-month survival, and the choice of implant (intramedullary nail vs endoprostheses) would be based on the output of the 6-month model. In each setting, the consequences of undertreatment differ from those associated with overtreatment, which serves as the basis for decision analysis. When interpreting the curves, a surgeon must consider his or her “threshold probability,” or the point of equipoise at which he or she is indecisive regarding which treatment is best. Surgeons usually have a low threshold for treatment when dealing with healthy patients, and higher thresholds with sicker patients. Because thresholds vary by surgeon, patient, and situation, it is important to choose a decision analytic technique that allows one to evaluate one or more models across a range of clinically relevant thresholds. Importantly, the decision curves do not estimate the likelihood of survival (the PATHFx models serve this purpose), but rather help determine which model(s) should and should not be used in certain clinical situations represented by a given surgeon’s threshold. Wherever possible, we included information recommended by the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) [1]. In addition to these, we believe decision analytic techniques, such as those used in this study, are important requirements for any model destined for the clinical setting [3].
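The weighing of true-positive against false-positive results can be written out explicitly. The expressions below are the standard net benefit formulation from the decision curve literature, offered as a sketch rather than a restatement of the exact computation performed in this study. The symbols are notation introduced for this illustration: $n$ is the number of patients, $p_t$ is the threshold probability, $\mathrm{TP}(p_t)$ and $\mathrm{FP}(p_t)$ are the numbers of patients with a predicted probability of survival of at least $p_t$ who did and did not survive, and $\pi$ is the observed proportion of survivors.

```latex
\[
\mathrm{NB}_{\text{model}}(p_t) = \frac{\mathrm{TP}(p_t)}{n} - \frac{\mathrm{FP}(p_t)}{n}\cdot\frac{p_t}{1-p_t},
\qquad
\mathrm{NB}_{\text{all}}(p_t) = \pi - (1-\pi)\,\frac{p_t}{1-p_t},
\qquad
\mathrm{NB}_{\text{none}}(p_t) = 0
\]
```

The odds term $p_t/(1-p_t)$ encodes the relative harm of overtreatment: at a threshold of 50%, one false positive cancels one true positive, whereas at a threshold of 75%, one false positive cancels three.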

Results

The Bayesian models successfully estimated 1- and 6-month survival. The graphic structure of each model helps one understand the relationships between features (Fig. 2), and by inspection of the calibration plots, both were reasonably calibrated to the MSKCC training data (Fig. 3). In terms of model structure (Fig. 2), those features connected to one another are known as “first-degree associates,” and are conditionally dependent; that is, the value of one feature can be represented in terms of one or more other features. First-degree associates are the features that are most closely related to and most influential in estimating the likelihood of a particular outcome. In fact, if the values for the first-degree associates are known, the values for the rest of the features are irrelevant. However, if a first-degree associate is missing, the models retain functionality and other features are used in its place. For instance, some surgeons may choose not to provide a surgeon’s estimate of survival. For the 1-month model (Fig. 2A), the oncologic diagnosis, presence of visceral metastases, and Eastern Cooperative Oncology Group (ECOG) performance status serve as objective surrogates. For the 6-month model (Fig. 2B), ECOG performance status is used instead. As shown, there are two first-degree associates of 1-month survival (Fig. 2A) and six first-degree associates of 6-month survival (Fig. 2B).

On external validation, the 1- and 6-month models correctly estimated survival in 80% and 66% of records, respectively. For the 1-month model, 11 of the 166 erroneous estimates (7%) were underestimates and 155 (93%) were overestimates. For the 6-month model, 100 of the 274 erroneous estimates (37%) were underestimates and 174 (64%) were overestimates. The AUCs for the 1- and 6-month models were 0.76 (95% CI, 0.72–0.80) and 0.76 (95% CI, 0.73–0.79), respectively. Importantly, the models retained their discriminatory ability despite missing data, and notable differences in most categorical (Table 1) and continuous (Table 2) features. This highlights the unique advantages of Bayesian models to function in the presence of missing input data, even when confronted with differing patient populations and treatment philosophies.
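The behavior of first-degree associates, and the way other features stand in when one is missing, can be illustrated with a toy network. The sketch below is not the PATHFx network: it assumes a hypothetical three-node chain (diagnosis, then surgeon’s estimate, then 1-month survival), and every probability in it is invented for illustration. Inference is done by brute-force enumeration of the joint distribution.

```python
from itertools import product

# Hypothetical probabilities, invented for illustration only.
P_DIAGNOSIS = {"lung": 0.4, "breast": 0.6}
P_ESTIMATE_GIVEN_DIAGNOSIS = {
    "lung": {"short": 0.7, "long": 0.3},
    "breast": {"short": 0.2, "long": 0.8},
}
# The surgeon's estimate is the only first-degree associate of survival here.
P_SURVIVE_GIVEN_ESTIMATE = {"short": 0.6, "long": 0.95}


def joint(diagnosis, estimate, survives):
    """Joint probability of one complete assignment in the toy chain."""
    p_survive = P_SURVIVE_GIVEN_ESTIMATE[estimate]
    return (P_DIAGNOSIS[diagnosis]
            * P_ESTIMATE_GIVEN_DIAGNOSIS[diagnosis][estimate]
            * (p_survive if survives else 1.0 - p_survive))


def probability_of_survival(evidence):
    """P(survive | evidence), summing the joint over any unobserved feature."""
    numerator = denominator = 0.0
    for d, e, s in product(P_DIAGNOSIS, P_SURVIVE_GIVEN_ESTIMATE, (True, False)):
        # Skip assignments that contradict what has been observed.
        if evidence.get("diagnosis", d) != d or evidence.get("estimate", e) != e:
            continue
        p = joint(d, e, s)
        denominator += p
        if s:
            numerator += p
    return numerator / denominator


# First-degree associate known: adding the diagnosis changes nothing.
print(probability_of_survival({"estimate": "short"}))
print(probability_of_survival({"estimate": "short", "diagnosis": "lung"}))
# First-degree associate unknown: the diagnosis now informs the estimate.
print(probability_of_survival({"diagnosis": "lung"}))
print(probability_of_survival({"diagnosis": "breast"}))
```

In this toy chain, the first two queries return the same probability because the diagnosis influences survival only through the surgeon’s estimate, whereas the last two differ because the diagnosis is the only remaining evidence.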

Fig. 2A–B

The Bayesian Belief Network structures of the (A) 1-month and (B) 6-month models are shown. There are two first-degree associates of 1-month survival. These are the features that are most highly related to the outcome of interest and include the senior surgeon’s estimate of survival and the presence of a completed (as opposed to an impending) pathologic fracture. In comparison, there are six first-degree associates of 6-month survival, including preoperative hemoglobin, absolute lymphocyte count, oncologic diagnosis, presence of a completed pathologic fracture, the number of bone metastases, and the senior surgeon’s estimate of survival.

Fig. 3A–B

The calibration curves show the agreement between observed outcomes and those predicted by the (A) 1-month and (B) 6-month PATHFx models. The shaded region depicts the 95% CI of the predictions. Perfect calibration to the training data should overlie the 45° dotted line. Both models are reasonably well calibrated to the MSKCC training data.

Table 1 Categorical variables contained in the training and validation sets
Table 2 Continuous variables contained within the training and validation sets

DCA (Fig. 4) indicated that both models possess clinical utility. Net benefit is defined as a 1- or 6-month survivor who duly receives an operation and implant commensurate with his or her estimated survival. Importantly, decision curves do not recommend a course of treatment, but rather help determine whether PATHFx should be used to achieve better outcomes in various clinical situations [5]. To interpret the decision curves, a surgeon must first determine his or her threshold for treatment, known as the threshold probability. This is the point of equipoise between recommending surgery or nonoperative treatment, in the case of 1-month survival, or deciding between an intramedullary nail or prosthesis, in the case of 6-month survival. The threshold differs between surgeons and is patient-, tumor-, and situation-dependent. For example, some may require a 50% probability of 1-month survival before offering surgery, while others require only 10%. However, surgeons who are considering whether to offer surgery to a very sick patient, thought to be at high risk of having perioperative complications, may have a very high threshold. For example, if a surgeon’s threshold exceeds 60% (ie, he or she would offer surgery only if the probability of survival exceeded 60%) (Fig. 4A), he or she should use PATHFx to derive 1-month survival estimates rather than assume the patient will or will not survive greater than 1 month. If a surgeon’s threshold is less than 60%, which can be the case in healthier patients thought to be at lower risk of perioperative complications, the PATHFx curves and the “assume all survive” curves are collinear. As such, he or she may either use PATHFx to derive a probability of 1-month survival, or simply proceed as if the patient will survive greater than 1 month.

Fig. 4A–B

The decision curves depict the net benefit of the (A) 1-month and (B) 6-month models when applied to the Scandinavian external validation set. Net benefit is defined as a single patient who duly receives the correct treatment based on the model output. Each of the models could be used rather than assuming all or none of the patients will survive greater than 1 or 6 months, respectively. However, surgeons requiring a high degree of probability of 6-month survival (B, arrow) before offering endoprostheses should base treatment decisions on the assumption that the patient will not survive greater than 6 months rather than use PATHFx. This situation can be encountered with very sick patients for whom the risks of arthroplasty outweigh the benefits. In this case, surgeons may choose a less-invasive approach to stabilization, or palliative treatment, depending on the patient’s 1- and 6-month survival estimates.

Another important use of decision curves is to determine in which patients PATHFx should not be used. For instance, a surgeon who is using the 6-month model to consider whether a more-durable implant may be necessary may have a high threshold for doing so in the setting of very sick patients. If his or her threshold for offering a prosthesis rather than an intramedullary nail exceeds 72% (Fig. 4B, arrow), the surgeon may expect better outcomes by basing treatment decisions on the assumption that the patient will not survive greater than 6 months, rather than using the PATHFx tool.
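The comparison that a decision curve makes at each threshold (model vs “assume all survive” vs “assume none survive”) can be sketched as follows. This Python is an illustration under stated assumptions, not the analysis code used in this study: the function names and the six example records are hypothetical, and net benefit is computed with the standard formulation from the decision curve literature.

```python
def net_benefit(probabilities, survived, threshold):
    """True positives minus threshold-weighted false positives, per patient.

    A patient is treated as a predicted survivor when the predicted
    probability is at least `threshold` (which must be below 1).
    """
    n = len(survived)
    tp = sum(1 for p, y in zip(probabilities, survived) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probabilities, survived) if p >= threshold and not y)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def decision_curve(probabilities, survived, thresholds):
    """Net benefit of the model and of the two default strategies."""
    assume_all = [1.0] * len(survived)  # everyone is treated as a survivor
    return [
        {
            "threshold": t,
            "model": net_benefit(probabilities, survived, t),
            "assume_all": net_benefit(assume_all, survived, t),
            "assume_none": 0.0,  # nobody treated: no true or false positives
        }
        for t in thresholds
    ]


# Hypothetical model outputs and outcomes (1 = survived), for illustration only.
probabilities = [0.9, 0.8, 0.7, 0.4, 0.6, 0.2]
survived = [1, 1, 1, 1, 0, 0]

for row in decision_curve(probabilities, survived, [0.10, 0.75]):
    print(f"threshold {row['threshold']:.2f}: model {row['model']:.3f}, "
          f"assume all {row['assume_all']:.3f}, assume none {row['assume_none']:.3f}")
```

In this toy example the model and “assume all survive” strategies coincide at the low threshold, because every patient exceeds it, and separate at the high threshold; a model should be preferred only over the range of thresholds where its net benefit is the highest of the three.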

Discussion

The ability to derive objective, personalized survival estimates is important for all providers treating patients with metastatic bone disease. As the number of medical, surgical, and palliative therapies increases, there is a need to risk-stratify patients in clinical and research settings. Toward this end, we successfully developed and externally validated two additional models able to estimate the likelihood of 1- and 6-month survival. We also showed that the models possess clinical utility, and added them to www.pathfx.org, which now is capable of estimating the likelihood of survival at 1, 3, 6, and 12 months after surgery. In doing so, we acknowledge that surgeons may find it useful to obtain very short (1-month) survival estimates in an effort to avoid expensive and unsuccessful surgery during the last month of life. In addition, favorable survival estimates of greater than 6 months may help justify more-durable, but complicated, reconstructive endoprosthetic options. By improving our ability to estimate each patient’s postoperative survival trajectory, we ensure PATHFx remains current and relevant to the surgeon, who may now obtain a more complete representation of their patients’ survival trajectory at www.pathfx.org. In addition, we provide a validated tool that may be used for risk stratification before enrollment in clinical trials in this challenging and quite diverse patient population.

We note the following limitations. Given the data used in this study, it is probable that other statistical techniques could yield prognostic models. However, we have considerable experience using and applying Bayesian statistics. Moreover, the existing 3- and 12-month models have been externally validated twice [6, 11] and shown to possess clinical utility in other patient populations. Next, the data in the training and validation sets were derived from tertiary referral centers. As such, it is unknown if and how the resultant models would apply to the community setting. However, it was our intent to use an independent data set that did not contain the senior surgeons’ estimates of survival to illustrate an important feature in Bayesian modeling: the ability to accommodate uncertainty in the clinical setting, and function in the presence of missing input data. Next, as with any modeling approach, overfitting can occur and produce overly optimistic results. This is typically the case when authors report internal validation statistics. However, by focusing our results on external validation, rather than internal validation, we suggest that the models retain accuracy when confronted by otherwise unrelated patient information and therefore are not overfit to the training set. In addition, because the models were developed using data from patients who had undergone orthopaedic stabilization, only a small proportion (8% of the MSKCC training set) survived less than 1 month. Nevertheless, Bayesian Belief Networks are able to estimate the likelihood of rare events, like very short-term survivors, by describing their unique characteristics and the relationships between features. Next, it is unclear whether PATHFx would be useful in nonsurgical settings to estimate life expectancy; however, we intend to validate the models in patients undergoing palliative radiation therapy of symptomatic bone metastases. Finally, before using any clinical decision support tool, orthopaedic surgeons should apply the same level of scrutiny and healthy skepticism as they do for the implants they select, and the literature they read. The techniques used to establish discriminatory ability, and net benefit based on external validation, are considered to be the minimum statistical requirements for would-be clinical decision support tools. As such, these results apply to patients similar to those represented by the United States and Scandinavian populations who were treated at oncologic referral centers in a multidisciplinary fashion. If PATHFx is to be used in other settings, including international centers with differing patient populations and treatment philosophies, additional external validation is recommended and currently is underway.

When approaching patients with metastatic bone disease, surgeons base treatment decisions on various factors, including the location and physiology of the lesion, the desired mechanical properties of the implant, the patient’s and family’s wishes, and perhaps most importantly, how long the patient is likely to live. For instance, an 80-year-old male with multiple skeletal metastases, an impending pathologic femur fracture attributable to lung cancer, poor ECOG performance status (≥ 3), organ and lymph node metastases, hemoglobin of 8 g/dL on admission, and an absolute lymphocyte count of 0.9 K/µL has a very unfavorable survival profile (Fig. 5A), which may help justify palliative measures, rather than more-complicated and expensive surgical options. However, a 65-year-old female with multiple skeletal metastases and an impending pathologic femur fracture attributable to estrogen receptor-positive breast cancer, organ metastases, lymph node metastases, a hemoglobin of 12 g/dL on admission, and an absolute lymphocyte count of 1.4 K/µL has a much more favorable survival profile (Fig. 5B), which we believe helps justify the need for more-durable reconstructive options.

Fig. 5A–B

This figure from www.pathfx.org shows two typical patient scenarios encountered by surgeons who treat metastatic bone disease. The patient characteristics and PATHFx inputs are shown on the right, and the individualized estimates of survival are displayed as horizontal bar graphs on the left. (A) The first graph shows a very poor survival profile that may help provide surgeons and families with objective information to choose palliative therapy, or in some cases, less-invasive means of surgical stabilization. (B) However, more favorable survival estimates can be used to justify the use of more durable, complicated, and expensive orthopaedic implants, such as conventional or “tumor” prostheses. ECOG = Eastern Cooperative Oncology Group.

On external validation, the discriminatory ability of the 1- and 6-month models was similar to that of the models previously studied [6, 11]. Specifically, the 3-month and 12-month models contained in PATHFx showed AUCs of 0.79 and 0.76, respectively, when confronted with this external validation data set, and DCA showed that both of these models were suitable for use in the Scandinavian patient population [6]. The results of the current study show similar discriminatory ability, with AUCs of 0.76 (95% CI, 0.72–0.80) and 0.76 (95% CI, 0.73–0.79) for the 1- and 6-month models, respectively, and favorable DCA indicating that both models could be used rather than assuming all patients, or none, would survive greater than 1 or 6 months, respectively.

One disadvantage of the PATHFx models has been an inability to compare them with other published means of estimating survival in patients with skeletal metastases. For instance, the method of Tokuhashi et al. [13], a widely used means to guide surgical decision-making in patients with symptomatic spine metastases, classifies estimated postoperative survival using 6 and 12 months. Before now, this method could not be compared with PATHFx, which previously contained only 3- and 12-month models. In addition, the method of Katagiri et al. [9] generates short-, intermediate-, and long-term survival estimates using a scoring system. Future work is necessary to determine whether the methods described by Tokuhashi et al. [13], Katagiri et al. [9], or PATHFx are better suited for clinical use. The techniques used in this article to establish discriminatory ability and net benefit, which are considered to be the minimum statistical requirements for would-be clinical decision support tools, may be used to answer this question.

The Bayesian Belief Network appears well suited for estimating 1- and 6-month survival in this patient population. Measures of discriminatory ability and decision analysis confirm that both models are suitable for clinical use and we upgraded www.pathfx.org to include the algorithms described in this study. Surgeons may now use a web browser to generate survival estimates at 1, 3, 6, and 12 months after surgery, at no cost. Although we plan further external validation studies in surgical and nonsurgical patients, we believe these objective estimates may be used not only to guide treatment, but also as a risk stratification method to support future clinical trials involving patients with metastatic bone disease.