Introduction

Accurate prediction of life expectancy in patients with skeletal metastases helps guide appropriate treatment selection. Survival estimates are particularly relevant for patients with short expected survival (< 1 month), in whom unsuccessful and expensive procedures during the last month of life should be avoided. For patients with intermediate survival estimates (< 6 months), less-invasive treatment may be indicated, whereas durable reconstruction may be justified for those with more-favorable estimates of 6- or 12-month postoperative survival. Currently, these decisions are left to the treating surgeon and his or her team without valid, objective decision-making tools to help decide who should undergo surgery and who should not. In the US and other Western countries, PATHFx, which is available free of charge at www.pathfx.org, has been validated to help physicians with these decisions [3,4,5, 10], but it has not been shown to be applicable to Japanese populations.

A Bayesian model is a statistical method that explores conditional, probabilistic relationships between variables to estimate the likelihood of an outcome from observed data. One of the authors (JAF) [3] developed a series of Bayesian models that estimate 1-, 3-, 6-, and 12-month postoperative survival in patients undergoing stabilization procedures for metastatic bone disease. The PATHFx variables, which comprise diagnosis-specific information [3, 7,8,9,10], extent of disease [2, 13], performance status [10], and basic laboratory assessments [6], are intended to support good clinical judgment by providing the likelihood of survival rather than subjective survival estimates.

Before PATHFx can be recommended for clinical use in a particular population, it should be externally validated in that population. All previous studies were based on patients from Western countries [3,4,5, 10]. However, patient background and care differ in several respects, including referral patterns, medical resources, and treatment strategies for skeletal metastases. This poses a problem when applying the PATHFx models to patients from Eastern countries, whose races, cultures, patient populations, and treatment philosophies differ from those previously studied.

We asked (1) whether the PATHFx models are as predictive in Japanese patients as in the Western populations previously studied, as measured by the area under the receiver operating characteristic curve (AUC); we considered an AUC greater than 0.70 to indicate adequate predictive value. We also (2) performed decision curve analysis [10] at each time to determine whether and how PATHFx should be used clinically at each interval.

Patients and Methods

Bayesian Belief Networks

Bayesian belief network modeling is a statistical method that represents conditional and probabilistic relationships between variables. These relationships enable the development of a graphic n-dimensional structure, or model, that codifies outcomes into a single hierarchical network. A Bayesian approach can be helpful in analyzing multidimensional and uncertain data. Moreover, Bayesian belief network models remain robust when clinical data are incomplete or discordant. For this reason, Bayesian belief networks have been used successfully to model complex relationships and to classify outcomes in various oncologic diagnoses [1].
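To illustrate how a Bayesian belief network can still return a survival probability when an input is missing, the following minimal sketch uses a hypothetical three-node network (survival, visceral metastases, low hemoglobin) with invented probabilities; it is not the PATHFx network, and its structure and numbers are assumptions for illustration only. Inference is done by exact enumeration, and any unobserved variable is summed out (marginalized) rather than imputed.

```python
from itertools import product

# Hypothetical toy network: Survival (S) -> Visceral metastases (V),
# Survival (S) -> Low hemoglobin (H). All probabilities are invented for
# illustration and are NOT PATHFx parameters.
PRIOR_SURVIVE = 0.6                           # P(S = True)
P_VISCERAL_GIVEN_S = {True: 0.3, False: 0.6}  # P(V = True | S)
P_LOW_HB_GIVEN_S = {True: 0.2, False: 0.5}    # P(H = True | S)


def _bernoulli(p_true: float, value: bool) -> float:
    """Probability of `value` for a binary variable with P(True) = p_true."""
    return p_true if value else 1.0 - p_true


def joint(s: bool, v: bool, h: bool) -> float:
    """Joint probability P(S=s, V=v, H=h), factorized along the network edges."""
    return (_bernoulli(PRIOR_SURVIVE, s)
            * _bernoulli(P_VISCERAL_GIVEN_S[s], v)
            * _bernoulli(P_LOW_HB_GIVEN_S[s], h))


def posterior_survival(evidence: dict) -> float:
    """P(S=True | evidence) by exact enumeration.

    `evidence` maps "V" and/or "H" to observed booleans. Any variable that is
    absent from `evidence` (missing data) is summed out, not imputed.
    """
    weight = {True: 0.0, False: 0.0}
    for s, v, h in product([True, False], repeat=3):
        assignment = {"V": v, "H": h}
        # Skip joint assignments that contradict the observed evidence.
        if any(assignment[name] != value for name, value in evidence.items()):
            continue
        weight[s] += joint(s, v, h)
    return weight[True] / (weight[True] + weight[False])
```

With hemoglobin missing, `posterior_survival({"V": True})` evaluates to 0.18 / 0.42, or about 0.43; adding `"H": True` lowers it to about 0.23, and an empty evidence dictionary returns the prior of 0.6.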

Eligibility

This retrospective study was designed as a multiinstitutional collaboration among five cancer referral centers in Japan (National Cancer Center Hospital, Cancer Institute Hospital, The University of Tokyo Hospital, Teikyo University Hospital, and Juntendo University Hospital), each of which provided ethics approval. We chose these centers because all have institutional tumor boards composed of experts in different specialties, including orthopaedic oncology, medical oncology, radiation oncology, and palliative care, who discuss the treatment approach for skeletal metastases. For patients with skeletal metastases of the extremities, surgery was indicated to prevent or treat pathologic fracture according to Mirels' criteria [10]. For patients with skeletal metastases of the spine, surgery was indicated for intractable pain and progressive neurologic deficits thought, on the basis of MRI, to be caused by compression of the myeloradicular structures. We included the records of all patients who underwent surgery for metastatic bone disease at any site (appendicular and axial skeleton) between 2009 and 2015 and who had followup of 12 months or more. From the databases, we identified 270 patients (105 at the National Cancer Center Hospital, 83 at the Cancer Institute Hospital, 34 at The University of Tokyo Hospital, 28 at Teikyo University Hospital, and 20 at Juntendo University Hospital) who underwent surgery for metastatic bone disease; we excluded nine patients with followup of less than 12 months. Finally, the records of 261 patients (103 from the National Cancer Center Hospital, 81 from the Cancer Institute Hospital, 31 from The University of Tokyo Hospital, 27 from Teikyo University Hospital, and 19 from Juntendo University Hospital) were analyzed.
Because this study focused only on patients who were treated operatively, the applicability of the tool to all patients with metastatic carcinoma to bone cannot be established.
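The per-center counts can be checked for internal consistency with a short tally. This is a minimal sketch; the per-center exclusion numbers it prints are derived by subtraction from the reported counts, not reported directly in the study.

```python
# Per-center counts as reported in the Eligibility section.
IDENTIFIED = {
    "National Cancer Center Hospital": 105,
    "Cancer Institute Hospital": 83,
    "The University of Tokyo Hospital": 34,
    "Teikyo University Hospital": 28,
    "Juntendo University Hospital": 20,
}
ANALYZED = {
    "National Cancer Center Hospital": 103,
    "Cancer Institute Hospital": 81,
    "The University of Tokyo Hospital": 31,
    "Teikyo University Hospital": 27,
    "Juntendo University Hospital": 19,
}

# Exclusions per center are implied by the difference between the two tallies.
EXCLUDED = {center: IDENTIFIED[center] - ANALYZED[center] for center in IDENTIFIED}

total_identified = sum(IDENTIFIED.values())
total_analyzed = sum(ANALYZED.values())
total_excluded = sum(EXCLUDED.values())

print(total_identified, total_excluded, total_analyzed)  # 270 9 261
```

The implied exclusions (2, 2, 3, 1, and 1 patients) sum to the nine patients excluded for followup of less than 12 months.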

As expected, the continuous (Table 1) and categorical (Table 2) characteristics of our patients differed from those of the patients in the training [3] and validation [10] datasets previously studied.

Table 1 Comparison of continuous features between Japanese dataset and Western datasets
Table 2 Comparison of categorical features between Japanese dataset and Western datasets

Prognostic Variables and Data Collection

We retrospectively reviewed the databases at the five referral centers to identify patients undergoing surgery for metastatic bone disease. The databases contained information collected between January 2009 and December 2015. The following data were extracted: age at the time of surgery, sex, indication for surgery (impending fracture or completed pathologic fracture), number of bone metastases (solitary or multiple), presence or absence of visceral or lymph node metastases, preoperative hemoglobin concentration (g/dL, measured on admission to the hospital and before transfusion, if applicable), absolute lymphocyte count (K/μL), and the patient's primary oncologic diagnosis, classified into one of three groups as previously described [9]. Patients with lung, gastric, and hepatocellular carcinoma and melanoma were assigned to Group 1; patients with sarcomas and other carcinomas, to Group 2; and patients with breast, prostate, renal cell, and thyroid carcinoma, multiple myeloma, and malignant lymphoma, to Group 3. The surgeon's estimate of postoperative survival (in months) is also used; this allows surgeons with expertise in treating patients with metastatic bone disease to provide a weighted estimate of survival that can be used in conjunction with the other, more-objective features contained in PATHFx.
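The three-group diagnosis classification can be expressed as a simple lookup. In this sketch the diagnosis strings and the function name `diagnosis_group` are hypothetical, and treating Group 2 as the catch-all for any diagnosis not listed in Groups 1 or 3 is our reading of "sarcomas and other carcinomas", not a rule stated by PATHFx.

```python
# Hypothetical diagnosis labels; real records would need their own mapping.
GROUP_1 = {"lung carcinoma", "gastric carcinoma", "hepatocellular carcinoma", "melanoma"}
GROUP_3 = {
    "breast carcinoma", "prostate carcinoma", "renal cell carcinoma",
    "thyroid carcinoma", "multiple myeloma", "malignant lymphoma",
}


def diagnosis_group(diagnosis: str) -> int:
    """Map a primary diagnosis to oncologic Group 1, 2, or 3.

    Anything not listed in Group 1 or Group 3 falls into Group 2
    (sarcomas and other carcinomas); using Group 2 as the catch-all is an
    assumption of this sketch.
    """
    key = diagnosis.strip().lower()
    if key in GROUP_1:
        return 1
    if key in GROUP_3:
        return 3
    return 2
```

For example, `diagnosis_group("Lung carcinoma")` returns 1, `diagnosis_group("multiple myeloma")` returns 3, and `diagnosis_group("osteosarcoma")` returns 2.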

The definitions used in this study were similar to those previously described [4], except for the criteria for determining lymph node metastases, because lymph node biopsies are seldom performed at our centers. An impending pathologic fracture was one in which the degree of bone and/or cortical disruption warranted, in the opinion of the treating surgeon, prophylactic surgical stabilization to prevent fracture. Lesions that resulted in a change in bone length, alignment, or rotation, or in loss of height, as determined by imaging were considered completed pathologic fractures. Biopsy-proven and/or clinically obvious metastases to organs in the chest, abdomen, or brain were considered visceral metastases. Similarly, biopsy-proven or clinically obvious metastases to the lymph nodes were considered to indicate lymph node involvement. We also collected the senior surgeons' (KO, TG, YS, HK, TT) estimates of survival in months (if recorded) and categorized each patient's overall survival at 1, 3, 6, and 12 months as yes or no.
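Categorizing survival at 1, 3, 6, and 12 months as yes or no can be sketched as follows. The function name and inputs are hypothetical, and two conventions are assumptions: time is already expressed in months, and a patient whose death or last followup falls exactly on a horizon counts as alive at that horizon. A censored patient whose followup ends before a horizon gets no label; under the 12-month minimum followup used here, that case should not arise.

```python
from typing import Dict, Optional

HORIZONS_MONTHS = (1, 3, 6, 12)


def survival_labels(months_to_event: float, died: bool) -> Dict[int, Optional[bool]]:
    """Binary survival label at each horizon.

    months_to_event: months from surgery to death (died=True) or to last
    followup (died=False). Returns True if alive at the horizon, False if the
    patient died before it, and None if censored before it (undetermined).
    """
    labels: Dict[int, Optional[bool]] = {}
    for t in HORIZONS_MONTHS:
        if months_to_event >= t:
            labels[t] = True    # death or last followup at or after t (assumed convention)
        elif died:
            labels[t] = False   # died before t
        else:
            labels[t] = None    # censored before t: outcome unknown
    return labels
```

For example, a patient who died 4.5 months after surgery is labeled as surviving at 1 and 3 months but not at 6 or 12 months.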

Statistical Analysis

For external validation, we entered the features listed above for each record into the PATHFx Bayesian models, which estimated the likelihood of postoperative survival at 1, 3, 6, and 12 months for that record. We then performed receiver operating characteristic (ROC) curve analysis and estimated the area under the ROC curve (AUC) as a measure of discriminatory ability. The models were used as-is; they were not refit or otherwise modified using the Japanese validation set. Because Bayesian belief networks function in the presence of missing data, no imputation methods were used. Before the analysis, we defined successful validation as an AUC greater than 0.70, because we consider this the lowest acceptable limit of discriminatory ability.
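The AUC has an equivalent rank interpretation: the probability that a randomly chosen patient who survived a given interval received a higher predicted survival probability than a randomly chosen patient who did not, with ties counted as one half (the Mann-Whitney form). The sketch below computes it that way and applies the a priori 0.70 criterion; it is an illustration of the statistic, not the JMP or R code used in the study, and it omits confidence intervals.

```python
def auc_mann_whitney(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    scores: predicted probabilities of survival; labels: True if the patient
    survived the interval. Ties between a positive and a negative count 0.5.
    """
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one survivor and one non-survivor")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


AUC_THRESHOLD = 0.70  # a priori criterion for successful validation


def validation_successful(auc: float) -> bool:
    """True when the AUC exceeds the prespecified 0.70 threshold."""
    return auc > AUC_THRESHOLD
```

The pairwise loop is quadratic, which is adequate for a few hundred records; library routines such as scikit-learn's `roc_auc_score` compute the same quantity more efficiently.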

Decision curve analysis is a statistical method for evaluating the benefit of a predictive model across a range of patient preferences for accepting the risks of undertreatment and overtreatment, to help in decision making [12]. Physicians and patients must decide whether to proceed with surgery, and this decision depends on the benefits (effectiveness) and harms (complications or tumor progression despite surgery) of the operation. For example, if survival is expected to exceed 12 months, most patients will accept the risks and burden of surgery; if not, some may feel that surgery is unwarranted. Decision curve analysis assesses the value of the information provided by a predictive model across the range of patients' risk and benefit preferences, without the need to measure those preferences directly. In this study, decision curve analysis showed that the 6- and 12-month PATHFx models could be used in the clinical setting, but that caution should be applied when using the 1- and 3-month models.
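Decision curves are built from the net benefit, which at a threshold probability p_t is commonly defined as TP/n − (FP/n) × p_t / (1 − p_t), where a record counts as positive when its predicted probability is at or above p_t. The sketch below computes this for a model and for the two reference strategies (assume all patients survive the interval, or assume none do, whose net benefit is 0). Treating survival as the positive outcome and the function names used here are assumptions of this illustration, not a description of the study's software.

```python
def net_benefit(probs, outcomes, threshold):
    """Net benefit of acting on the model at threshold probability pt.

    NB = TP/n - FP/n * pt / (1 - pt); a record is called positive (predicted
    to survive) when its predicted probability is at or above pt.
    """
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(probs)
    tp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and y)
    fp = sum(1 for p, y in zip(probs, outcomes) if p >= threshold and not y)
    odds = threshold / (1.0 - threshold)
    return tp / n - (fp / n) * odds


def net_benefit_assume_all(outcomes, threshold):
    """Reference strategy: assume every patient survives the interval."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(outcomes)
    prevalence = sum(1 for y in outcomes if y) / n
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


NET_BENEFIT_ASSUME_NONE = 0.0  # reference strategy: assume no patient survives


def decision_curve(probs, outcomes, thresholds):
    """(pt, model NB, assume-all NB, assume-none NB) for each threshold."""
    return [
        (pt, net_benefit(probs, outcomes, pt),
         net_benefit_assume_all(outcomes, pt), NET_BENEFIT_ASSUME_NONE)
        for pt in thresholds
    ]
```

A model is useful at a given threshold when its net benefit lies above both reference lines; the thresholds at which the model curve crosses the assume-all line mark where simply assuming survival becomes the better strategy.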

To estimate overall survival using the Kaplan-Meier method, we defined survival as the time elapsed from the date of surgery until the date of death or, for survivors, the date of last followup. All statistical analyses were conducted using JMP® Version 9.0.2 (SAS Institute, Inc, Cary, NC, USA), FasterAnalytics™ Version 6.5 (DecisionQ Inc, Washington, DC, USA), and R Version 3.0.2 (R Foundation for Statistical Computing, Vienna, Austria).
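For readers unfamiliar with the Kaplan-Meier (product-limit) estimator, the sketch below shows the textbook calculation: at each distinct death time, survival is multiplied by (1 − deaths / patients at risk), and censored patients leave the risk set without contributing a death. It is an illustrative implementation, not the JMP or R routines used in the analysis.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times: time from surgery to death or last followup; events: True for
    death, False for censoring. Returns [(t, S(t))] at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all records sharing the same time.
        while i < len(data) and data[i][0] == t:
            deaths += 1 if data[i][1] else 0
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # deaths and censored records leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function value S(t): last estimate at or before t (1.0 before any death)."""
    value = 1.0
    for time, surv in curve:
        if time > t:
            break
        value = surv
    return value
```

Reading `survival_at` at 1, 3, 6, and 12 months yields interval survival rates of the kind reported in the Results.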

Results

The 1-, 3-, 6-, and 12-month survival rates were 92.0%, 87.4%, 71.3%, and 59.4%, respectively (Fig. 1).

Fig. 1

The Kaplan-Meier curve shows overall survival after surgery for the patients.

PATHFx correctly classified survival in the majority of patients at each of the times studied. The 1-, 3-, 6-, and 12-month models correctly estimated survival in 234 (90%), 212 (81%), 208 (80%), and 184 (70%) records, respectively. On ROC curve analysis, the AUCs were 0.77 (95% CI, 0.63–0.86), 0.80 (95% CI, 0.72–0.87), 0.83 (95% CI, 0.77–0.89), and 0.80 (95% CI, 0.75–0.86), respectively (Table 3). However, the AUC estimate for the 1-month model was less precise, with a lower bound of the 95% CI below 0.70, probably owing to the small number of events.
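The reported percentages and the comparison of each lower 95% CI bound with the 0.70 criterion can be reproduced from the figures given above; this is a minimal arithmetic check, not additional analysis.

```python
N_RECORDS = 261
CORRECT = {1: 234, 3: 212, 6: 208, 12: 184}           # correctly classified records
AUC_CI_LOWER = {1: 0.63, 3: 0.72, 6: 0.77, 12: 0.75}  # lower 95% CI bounds of the AUCs

percent_correct = {m: round(100 * c / N_RECORDS) for m, c in CORRECT.items()}
below_threshold = [m for m, lo in AUC_CI_LOWER.items() if lo < 0.70]

print(percent_correct)  # {1: 90, 3: 81, 6: 80, 12: 70}
print(below_threshold)  # [1]
```

Only the 1-month model has a lower confidence bound under 0.70, even though it has the highest proportion of correctly classified records.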

Table 3 Summary of the accuracy of the predictive model at each time

The decision curves at 6 and 12 months showed that each of these models may be used clinically (a positive net benefit, ie, above the lines assuming that none or all of the patients survive at each time) (Fig. 2), but those at 1 and 3 months highlighted the need for caution when applying these models to the Japanese patient population. Japanese surgeons may expect better outcomes by assuming that all patients will survive longer than 1 or 3 months unless their threshold for treatment exceeds 0.80 or 0.60 for the 1-month and 3-month models, respectively. For the 6- and 12-month models, however, use of PATHFx yields a greater net benefit at all thresholds than assuming that all or none of the patients will survive longer than 6 or 12 months, respectively.

Fig. 2A–D

The decision curve analysis (dashed line) of the predictive models based on the Japanese dataset at (A) 1, (B) 3, (C) 6, and (D) 12 months indicates that each model could be used rather than assuming that all patients (continuous line) or none of the patients (thick continuous line) will survive longer than the period of each predictive model.

Discussion

PATHFx provides objective survival estimates for patients undergoing surgery for skeletal metastases, which helps surgeons avoid under- or overtreatment. One of its advantages is that the tool is available to orthopaedic surgeons worldwide. However, no prognostic model or scoring system, no matter how convenient, should be assumed to be suitable for a particular patient population unless it has been externally validated in that setting. Herein, we externally validated the PATHFx models using information from Japanese patients to determine whether the tool, developed using patients from Western countries, is generalizable to Japanese patients. In doing so, we showed that its estimates of the likelihood of survival at four times are useful for surgical decision making and, on the basis of decision curve analysis, that it is suitable for clinical use.

We note the following limitations. The validation data were derived from cancer referral centers in Japan, so some may question whether these models would be useful in community hospitals with different patient characteristics and outcomes. Predictions made with PATHFx in Japan apply only to the degree that surgeons use the same indications for surgery as those applied to the patients in this series. Nevertheless, most, if not all, Japanese patients undergoing surgery for metastatic bone disease receive care at one of the five cancer centers that contributed data to this study, suggesting that the data are representative of the Japanese patient population. Although we externally validated the PATHFx models, their applicability to all patients with metastatic bone disease cannot be established, because our study focused only on patients who were treated operatively. If others use palliative or nonoperative treatment more or less often than was done at these five centers, use different adjuvant therapies or surgical approaches, or apply particular approaches differently, PATHFx might not apply to their patients. Although this is a topic of ongoing study, it remains unclear whether the models may be applied to patients undergoing nonoperative or other palliative treatment for metastatic bone disease. Other statistical techniques, such as nomograms or scoring systems, also may yield prognostic information and possibly different results; however, the Bayesian models tested in this study were previously externally validated and shown to possess clinical utility in Western patient populations. PATHFx showed predictive accuracy in our Japanese cohort similar to that reported in other countries, but we were not able to show that these cohorts were unbiased, and there is no documented evidence that PATHFx can be used to compare results obtained in different countries.
Although the original study and other validation series [3,4,5, 10] used only biopsy-proven lymph node metastases, we judged the presence or absence of lymph node metastases clinically, because lymph node biopsies are not usually performed in Japan. Our dataset may have been underpowered to document the predictive ability of the model at 1 month, because the confidence interval was wide. Further study of the 1-month model is indicated.

On external validation, the discriminatory ability of the PATHFx models (AUC 0.77, 95% CI, 0.63–0.86; 0.80, 95% CI, 0.72–0.87; 0.83, 95% CI, 0.77–0.89; and 0.80, 95% CI, 0.75–0.86 for the 1-, 3-, 6-, and 12-month models, respectively) was similar to that reported in the populations previously studied [3,4,5, 10]. Favorable decision curve analysis results indicated that all models could be used rather than assuming that all patients, or none, would survive longer than each time. Decision curve analysis also highlighted the need for caution when applying the 1- and 3-month models to the Japanese patient population.

Several prognostic scoring systems have been developed for patients with skeletal metastases [2, 9, 11]. Most were designed to predict the degree of risk of death (ie, low or high risk) in surgically or nonsurgically treated patients with skeletal metastases. Although these instruments are brief and easy to use, PATHFx has the potential to predict outcomes in more detail by generating a probability of survival at several times, thereby depicting each patient's most likely survival trajectory.

As expected, the distributions of demographic and clinical data from Japanese patients differed from those reported in previous (Western) external validation datasets (Scandinavian, Italian, and Memorial Sloan Kettering Cancer Center) (Tables 1 and 2). This suggests differences in many aspects of care, including referral patterns and treatment strategies for skeletal metastases. For example, Japanese patients had the best prognosis despite a low proportion of patients in oncologic diagnosis Group 3 (breast, prostate, renal cell, and thyroid carcinoma, multiple myeloma, and malignant lymphoma), which is usually associated with a better prognosis. This may reflect the higher proportion of impending fractures, higher hemoglobin concentrations, and better performance status in the Japanese dataset. These trends in patient background may have resulted from differences in referral patterns and treatment strategies for patients with skeletal metastases in Japan compared with Western countries. In addition, the Japanese patients survived longer than the Western patients even though their surgeons' survival estimates were the lowest, which also suggests stricter indications for surgery in Japan.

Despite these key differences in patient characteristics, our study showed that PATHFx retained its discriminatory ability and, more importantly, its clinical utility. In addition, we showed that the PATHFx Bayesian models function in the presence of missing data, which we consider an important characteristic of clinical decision-support tools. This ability is particularly important given that half of the records in the Japanese dataset lacked a surgeon's estimate of survival. PATHFx is suitable for use when treating or studying Japanese patients. Measures of discriminatory ability and decision curve analysis confirm that each of the models may be used, although caution is needed with the 1- and 3-month models. PATHFx may now support international collaborative studies involving Japanese and Western patients by providing objective survival estimates.