Key Points

A disease model using real-world data from patients with castration-resistant prostate cancer in the Netherlands showed comparable patient characteristics and survival outcomes between the observed and simulated populations.

The disease model was unable to predict differences between treatment groups due to unobserved differences.

Future research should explore the use of a combination of real-world data (to improve generalisability) and data from randomised controlled trials (to ensure the internal validity) to develop disease models.

1 Introduction

With over 12,000 newly diagnosed patients per year, prostate cancer is the most common cancer in men in the Netherlands [1]. Metastatic prostate cancer that progresses while the patient is receiving androgen-deprivation therapy (either alone or in combination with chemotherapy, new androgen-receptor targeting agents, or palliative radiotherapy [2,3,4]) is considered castration-resistant prostate cancer (CRPC) [5]. The median overall survival (OS) of patients with CRPC receiving solely best supportive care is 14 months [6]. Since 2004, multiple new treatments have become available that have improved the OS of these patients [7,8,9,10,11,12,13,14,15].

There is increasing interest in real-world data (RWD) as a complement to data from randomised controlled trials (RCTs). Traditionally, RCTs are designed to show the efficacy of treatments in precisely defined groups under controlled circumstances. However, patients included in RCTs are not a good representation of patients in clinical practice. Studies have shown that real-world patients with CRPC differ from those in trials because of patient selection (i.e., patients in real-world practice are older and have more comorbidities) [16, 17]. (Cost-)effectiveness studies based on RCT data provide the (cost) effectiveness of a treatment for patients in a study setting, which might differ from that in real-world patients. Furthermore, information on the full disease course is lacking in RCTs, as efficacy is estimated during a limited time period and often for only one treatment line. Moreover, RCTs usually evaluate a new treatment compared with standard of care (or placebo). If different drugs have positive trial results compared with standard of care or placebo, direct comparisons between these drugs are often lacking. Consequently, the effectiveness of different treatment sequences is unknown. Real-world disease models spanning multiple sequential treatment lines can provide insight into the (cost) effectiveness of treatment sequences in clinical practice.

Due to extrapolation, combination of data sources, and correction for differences between patients, models are needed to enable lifetime cost-effectiveness analyses. A well-performing model should be able to simulate reality, i.e., replicate observed outcomes [18]. Using the same baseline characteristics, simulated outcomes should be similar to observed outcomes. Moreover, relative differences in survival outcomes between treatments in the simulated data should be similar to the observed differences between treatments. In this article, we describe our experiences in developing a disease model based on RWD from patients with CRPC.

2 Methods

2.1 Data and Patients

Data were derived from the Castration-Resistant Prostate Cancer (CAPRI) registry, an observational study in the Netherlands [16]. In the CAPRI registry, patients newly diagnosed with CRPC between 1 January 2010 and 31 December 2015 in 20 Dutch hospitals were retrospectively included and followed until 31 December 2017 (N = 3616). Patients treated with docetaxel or androgen-receptor targeting agents for metastatic hormone-sensitive prostate cancer were excluded from the analysis (N = 16), leaving 3600 patients. An estimated 20% of all patients with CRPC in the Netherlands are included in the study population [16].

For this study, data from patients treated with at least one life-prolonging drug (LPD) (i.e., docetaxel, cabazitaxel, abiraterone acetate plus prednisone [ABI+P], enzalutamide, or radium-223) were included, whereas patients not treated with an LPD were excluded. Docetaxel treatment was available during the entire study period. Cabazitaxel became available as a second LPD (LPD2) in the Netherlands from 2011 onwards, and ABI+P, enzalutamide, and radium-223 were available as first LPDs (LPD1) from 2014 onwards.

Missing values in the dataset were handled using multiple imputation by chained equations [19]. For each treatment line, the following patient characteristics were imputed: World Health Organisation performance status (WHO PS) (or Eastern Cooperative Oncology Group performance status), opioid use, prostate-specific antigen (PSA), alkaline phosphatase (ALP), haemoglobin, lactate dehydrogenase (LDH), bone metastases, and visceral metastases. These characteristics were both imputed and used as predictors in the imputation models. Type of treatment, age, OS, and OS state (alive, dead, or lost to follow-up) were used only as predictors for multiple imputation [19]. Data after multiple imputation were used in the data analysis.

2.2 Model Type

The CRPC population is heterogeneous, with different patient and disease characteristics affecting the course of the disease. To be able to simulate individual patients with specific characteristics and events during their full disease course, patients were simulated using a patient-level discrete-event simulation model with a lifetime time horizon. This model type enables the modelling of the course of a patient in a natural way by accounting for entities (patients) with attributes (patient characteristics) and events [20].

2.3 Time to Event

The OS was divided into three time periods (Fig. 1). For each patient, we calculated the time from the start of LPD1 until the first event (TTE1), which could be either the start of LPD2 or death. TTE2 (i.e., the time from the start of LPD2 to either the start of a third LPD [LPD3] or death) was determined in a similar way, whereas TTE3 was calculated as the time from the start of LPD3 to death. TTE3 can thus span multiple treatment lines; however, since only 10% of patients received more than three treatment lines, the model simulated at most three treatment lines. Because patients could die earlier, not all simulated patients received all three treatment lines; some received only one or two.
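The division of OS into TTE1, TTE2, and TTE3 can be illustrated with a minimal sketch. It assumes that times are expressed in months since the start of LPD1 and that the patient is followed until death (censoring is ignored); the function name and data layout are illustrative and are not taken from the original analysis.

```python
from typing import List, Tuple


def split_time_to_events(lpd_starts: List[float], death_time: float) -> List[Tuple[str, float]]:
    """Split overall survival into at most three time-to-event periods.

    lpd_starts: start times of successive treatment lines, in months since
                the start of LPD1 (so lpd_starts[0] is 0.0).
    death_time: time of death, in months since the start of LPD1.
    Returns one (event_type, duration) pair per modelled treatment line.
    """
    starts = lpd_starts[:3]  # lines beyond the third are absorbed into TTE3
    periods = []
    for i, start in enumerate(starts):
        is_last = i == len(starts) - 1
        stop = death_time if is_last else starts[i + 1]
        event = "death" if is_last else "next_treatment"
        periods.append((event, stop - start))
    return periods


# A patient with four treatment lines who dies 20 months after starting LPD1:
# the fourth line (started at month 15) falls within TTE3.
example = split_time_to_events([0.0, 6.0, 11.0, 15.0], 20.0)
```

By construction, the durations of the periods add up to the OS measured from the start of LPD1.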

Fig. 1

Flow chart of the patient simulation. *Event can be either next treatment line or death; **event is death. OS overall survival

2.4 Regression Models

Since a lifetime horizon is required for economic evaluations in the Netherlands [21], survival data were extrapolated beyond the follow-up period by fitting several parametric models (i.e., exponential, Weibull, lognormal, log-logistic, generalised gamma, and Gompertz [22]) to the observed survival data. The log-logistic distribution had the best fit for TTE1, TTE2, and TTE3 (Table S1 in the electronic supplementary material [ESM]). Multivariate regression models were built to predict TTE. Based on literature and expert opinion [23], the following predictive variables were included to predict TTE1, TTE2, and TTE3: type of treatment (docetaxel, cabazitaxel, ABI+P, enzalutamide, or radium-223), age (in years), WHO PS (0–1/>1), opioid use (yes/no), PSA (in µg/L), ALP (in U/L), haemoglobin (in mmol/L), LDH (in U/L), bone metastases (yes/no), and visceral metastases (yes/no) (Tables S2–4 in the ESM). As the type of event for TTE1 and TTE2 could be either next treatment or death, multivariate logistic regression models for the probability of dying versus switching to the next treatment were used to predict the type of event. These multivariate logistic regression models included the same predictive variables as the TTE regression models (Tables S5 and 6 in the ESM).
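How a fitted log-logistic model and a logistic regression model translate into simulated outcomes can be sketched as follows. The sketch assumes an accelerated failure time parameterisation in which the covariates act on the logarithm of the scale parameter; the article does not state the parameterisation used. The function names and all coefficient values are hypothetical and do not come from Tables S2–6.

```python
import math
import random


def loglogistic_survival(t, scale, shape):
    """S(t) = 1 / (1 + (t / scale) ** shape); the median equals `scale`."""
    return 1.0 / (1.0 + (t / scale) ** shape)


def sample_loglogistic(scale, shape, rng):
    """Draw one time to event by inverting the cumulative distribution function."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep u strictly inside (0, 1)
    return scale * (u / (1.0 - u)) ** (1.0 / shape)


def linear_predictor(intercept, coefs, covariates):
    """Intercept plus the sum of coefficient * covariate value."""
    return intercept + sum(coefs[name] * covariates[name] for name in coefs)


def aft_scale(intercept, coefs, covariates):
    """Assumed AFT form: log(scale) is linear in the covariates."""
    return math.exp(linear_predictor(intercept, coefs, covariates))


def probability_of_death(intercept, coefs, covariates):
    """Logistic model for dying (versus starting the next treatment line)."""
    return 1.0 / (1.0 + math.exp(-linear_predictor(intercept, coefs, covariates)))


# Hypothetical coefficients for two of the predictors named in the text.
tte_coefs = {"who_ps_poor": -0.4, "visceral_metastases": -0.3}
patient = {"who_ps_poor": 1, "visceral_metastases": 0}

rng = random.Random(42)
scale = aft_scale(2.3, tte_coefs, patient)       # median TTE in months for this patient
tte = sample_loglogistic(scale, 1.8, rng)        # one simulated time to event
p_death = probability_of_death(-0.5, tte_coefs, patient)
```

Under this parameterisation, a negative coefficient shortens the median time to event, because the median of the log-logistic distribution equals its scale parameter.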

2.5 Model Simulation

Patients from the CAPRI registry were sampled with replacement to create a patient population for the simulation model. For each simulation, a population of 5000 patients was simulated to obtain stable results. The individual patient simulation consisted of several steps (Fig. 1). First, a patient with specific patient characteristics (as observed in the registry) was randomly drawn from the observed data. Second, a type of treatment was assigned to each individual patient. LPD1 was the actual first treatment received in the CAPRI registry, whereas LPD2 and LPD3 allocation was based on probabilities conditional on the previous treatment, as observed in the CAPRI registry (Table S7 in the ESM). Third, TTE1 was estimated using the TTE multivariate regression model (Table S2 in the ESM). Finally, the type of event (i.e., next treatment or death) was estimated using the multivariate logistic regression model (Table S5 in the ESM). Second- and third-line treatments were simulated in a similar way, except that death was the only possible event for TTE3 (Tables S3, S4, S6 in the ESM). Every time a patient started the next treatment line, patient characteristics were updated based on conditional probabilities depending on the patient characteristics in the previous line, estimated from the CAPRI registry (Tables S8 and S9 in the ESM). All analyses were conducted using SPSS Statistics 25 and R version 3.6.1.
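The simulation steps described above can be sketched in a few lines. This is a simplified illustration and not the model used in the study: it keeps a single patient characteristic (poor WHO PS), assumes a log-logistic accelerated failure time form for TTE, and uses placeholder coefficients and transition probabilities. All names and numbers are hypothetical and are not estimates from the CAPRI registry.

```python
import math
import random

# Every number below is a hypothetical placeholder, NOT an estimate from the CAPRI registry.
TTE_MODEL = {1: (2.3, -0.4, 1.8), 2: (2.0, -0.4, 1.8), 3: (2.0, -0.4, 1.8)}  # intercept, WHO PS coef, shape
TREATMENT_COEF = {"docetaxel": 0.0, "cabazitaxel": -0.1, "ABI+P": 0.1,
                  "enzalutamide": 0.2, "radium-223": 0.0}
DEATH_MODEL = {1: (-0.6, 0.8), 2: (-0.2, 0.8)}  # logistic intercept, WHO PS coef
NEXT_TREATMENT = {  # options and probabilities conditional on the previous treatment
    "docetaxel": (["cabazitaxel", "ABI+P", "enzalutamide"], [0.3, 0.4, 0.3]),
    "ABI+P": (["docetaxel", "enzalutamide"], [0.7, 0.3]),
    "enzalutamide": (["docetaxel", "ABI+P"], [0.7, 0.3]),
    "cabazitaxel": (["ABI+P", "enzalutamide"], [0.5, 0.5]),
    "radium-223": (["docetaxel", "ABI+P"], [0.6, 0.4]),
}
P_WORSEN = 0.3  # P(poor WHO PS at next line | good WHO PS at previous line)


def sample_loglogistic(scale, shape, rng):
    """Inverse-CDF draw from a log-logistic distribution."""
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
    return scale * (u / (1.0 - u)) ** (1.0 / shape)


def simulate_patient(patient, rng):
    """Follow one patient through up to three treatment lines until death."""
    poor_ps, treatment = patient["poor_ps"], patient["lpd1"]
    os_months, line = 0.0, 0
    for line in (1, 2, 3):
        intercept, coef, shape = TTE_MODEL[line]
        scale = math.exp(intercept + coef * poor_ps + TREATMENT_COEF[treatment])
        os_months += sample_loglogistic(scale, shape, rng)  # TTE for this line
        if line == 3:
            break  # death is the only possible event after LPD3
        d_intercept, d_coef = DEATH_MODEL[line]
        p_death = 1.0 / (1.0 + math.exp(-(d_intercept + d_coef * poor_ps)))
        if rng.random() < p_death:
            break  # the event is death
        options, weights = NEXT_TREATMENT[treatment]
        treatment = rng.choices(options, weights=weights)[0]  # allocate next line
        if not poor_ps and rng.random() < P_WORSEN:
            poor_ps = 1  # update the characteristic at the start of the next line
    return {"os_months": os_months, "lines": line}


def simulate_population(registry, n, seed=0):
    """Sample patients with replacement and simulate each one."""
    rng = random.Random(seed)
    return [simulate_patient(rng.choice(registry), rng) for _ in range(n)]


registry = [{"poor_ps": 0, "lpd1": "docetaxel"},
            {"poor_ps": 1, "lpd1": "enzalutamide"},
            {"poor_ps": 0, "lpd1": "radium-223"}]
population = simulate_population(registry, 5000, seed=1)
```

Fixing the seed makes a run reproducible, which helps when comparing simulated and observed outcomes across model versions.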

2.6 Model Validation

A valid model should be able to simulate the observed data while using the same baseline characteristics, and simulated relative survival differences between treatments should be similar to the observed differences between treatments. Therefore, internal validation of the model was performed by mimicking the real-world patient population (i.e., same patient characteristics at start LPD1, same LPD1) in the model.
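One way to compare observed and simulated survival is to estimate a survival curve and its median for each population. The sketch below assumes a Kaplan-Meier estimator, which the article does not name explicitly; the function names and the small example dataset are illustrative only.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) at each death time.

    times:  follow-up time per patient (e.g., months)
    events: 1 if the patient died at that time, 0 if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            censored += 1 - data[i][1]
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= deaths + censored
    return curve


def median_survival(curve):
    """First time at which the estimated survival drops to 0.5 or below."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None  # median not reached during follow-up


# Illustrative data: five patients, one censored at 4 months.
curve = kaplan_meier([2, 4, 4, 6, 8], [1, 1, 0, 1, 1])
median = median_survival(curve)
```

Applying the same two functions to the observed registry data and to the simulated population gives curves and medians that can be compared directly.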

3 Results

From the CAPRI registry, 1937 of 3600 patients (54%) were eligible for analysis (we excluded patients who received no treatment [N = 1205] and those who received other [experimental] treatment [N = 458]). Most patients were treated with docetaxel in the first line (N = 1131), whereas 407 patients received enzalutamide as LPD1, 373 patients received ABI+P, and 26 patients received radium-223. Of all patients, 62% (N = 1186) received a second-line and 30% a third-line treatment. Patient and disease characteristics of the simulated population were comparable to those in the observed population after multiple imputation (Table 1).

Table 1 Patient and disease characteristics of all patients at start of first life-prolonging drug

Overall (including all treatments), the survival curves of the simulated and observed data were similar. However, the simulation model overestimated OS during the first years and underestimated OS in later years (Fig. 2a). TTE1 and TTE2 were similar between simulated and observed data in the first years; however, the simulation model overestimated them in later years (Fig. 2b and Fig. S1 in the ESM) and underestimated TTE3 in later years (Fig. S2 in the ESM).

Fig. 2

Survival curves of observed and simulated data of total population. a Overall survival and b time to event 1 of observed and simulated total population. OS overall survival

Median TTE1 and type of event (i.e., next treatment or death) after LPD1 and LPD2 were similar for the simulated and observed populations (Table 2). Simulated median TTE2 and TTE3 deviated from the observed data, although the differences were small (TTE2: 7.5 vs. 7.1 months; TTE3: 7.9 vs. 8.2 months). Median OS was 0.8 months longer in the simulated than in the observed population (20.6 vs. 19.8 months).

Table 2 Median time to event (in months) and overall survival in observed and simulated population

As missing values were frequent for some patient characteristics (i.e., WHO PS, visceral metastases, opioid use, and LDH), simulation of TTE and OS was also performed for patients with complete data (N = 411). The characteristics of these patients are presented in Table S10 in the ESM. The simulation model overestimated OS during the first years and underestimated it in later years compared with the observed estimates (Fig. S3 in the ESM). Simulated median TTE1 and TTE3 were comparable to the observed results. However, there were differences between simulated and observed median TTE2 (7.4 vs. 6.4 months, difference: 1 month) and OS (20.2 vs. 18.7 months, difference: 1.5 months) (Table S11 in the ESM).

Differences in median OS stratified by LPD1 between simulated and observed data were similar to those in the total population (0.8 months) for ABI+P (0.7 months) and enzalutamide (1 month). However, simulated median OS deviated from the observed outcomes for docetaxel (1.5 months) and radium-223 (3.8 months) (Table 3). Plotted TTE1 stratified by LPD1 showed that the simulated curves deviated from the observed curves, especially for patients receiving docetaxel and enzalutamide (Figs. S8–11). Furthermore, Table 3 shows that the model was unable to validly replicate the differences between types of LPD1. For example, the difference in median OS between docetaxel and ABI+P was 1.6 months in the simulated data and 0.8 months in the observed data. Docetaxel versus enzalutamide had a simulated difference in median OS of 3.8 months and an observed difference of 6.3 months. In addition, the observed data showed crossing curves for enzalutamide and radium-223 (Fig. 3a), whereas the survival curves of these two treatments were distant from each other in the simulated data (Fig. 3b). The model was thus unable to replicate the observed differences between treatments.
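The mismatch in treatment contrasts can be made explicit with a short calculation. The sketch below uses only the differences in median OS quoted above; the variable and function names are illustrative.

```python
# Differences in median OS (months) between first-line treatments, as quoted in the text.
contrasts = {
    "docetaxel vs ABI+P": {"simulated": 1.6, "observed": 0.8},
    "docetaxel vs enzalutamide": {"simulated": 3.8, "observed": 6.3},
}


def contrast_errors(contrasts):
    """Simulated minus observed difference per treatment comparison (months)."""
    return {
        name: round(values["simulated"] - values["observed"], 1)
        for name, values in contrasts.items()
    }


errors = contrast_errors(contrasts)
```

A positive value means the model exaggerates the difference between two treatments and a negative value means it shrinks it; both occur here (0.8 and -2.5 months).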

Table 3 Observed and simulated time to event and overall survival stratified by first-line treatment

Fig. 3

Survival curves stratified by first-line treatment. Survival curves stratified by first-line treatment of a the observed population and b the simulated population. ABI abiraterone acetate plus prednisone, DOC docetaxel, ENZ enzalutamide, OS overall survival, RAD radium-223

4 Discussion

In this study, we developed a full disease model of real-world patients with CRPC. Internal validation showed similar TTE in the simulated and observed total CRPC populations. However, simulated median OS deviated from the observed median OS (difference of 0.8 months) as simulated OS was overestimated during the first years but underestimated in later years. Model simulation based on only complete cases resulted in a larger overestimation of median OS (difference of 1.5 months). This disease model could not adequately estimate the differences between treatments, as these differences became smaller or larger in the model compared with the observed differences. We consider this to be the main limitation of our disease model, since using these results for cost-effectiveness analyses would lead to biased results. Although we were unable to build a valid model for patients with CRPC, we believe that—in the context of honesty and transparency—this ‘brilliant failure’ should be reported as others may learn from our experiences, which can be beneficial for science.

4.1 Challenges of Using Real-World Data (RWD) in Disease Models

During the development of the disease model, we faced several challenges with using RWD. Although RWD provide insight into the effectiveness and safety of treatments in daily practice, they have important limitations in a disease model. First, in the real world, patients are not randomly allocated to a treatment; instead, treatment choices can be influenced by patient and disease characteristics, clinician experience, or patient preference. It is challenging, maybe even impossible, to consider, identify, and measure all confounders in treatment decisions [24]. The real-world patient population is heterogeneous, and the strict conditions for randomisation and a controlled setting are not applicable to RWD. As a consequence, the observed differences in outcomes between two treatment groups may be caused by case-mix or other (unmeasured) confounders and not by the type of treatment [22]. Although we tried to control for possible confounders by correcting for various patient characteristics that may influence treatment allocation and prognosis, this approach is inferior to a randomised design and may thus be biased. Simulated TTE and OS of all patients were comparable to the observed estimates, which was also true for ABI+P or enzalutamide as LPD1. However, survival curves of simulated and observed patients with docetaxel or radium-223 as LPD1 differed. Since only docetaxel was available as LPD1 before 2014, patients received docetaxel between 2010 and 2013 regardless of their patient characteristics, although they might have been eligible for another treatment (e.g., ABI+P or enzalutamide) had other treatments been available. This might explain the differences in survival curves of simulated and observed patients receiving docetaxel. Differences in simulated and observed survival curves for radium-223 might be due to the small number of patients (N = 26).

Moreover, one of the main findings of this study is the inability of the disease model to validly replicate the differences between treatments, as these differences became smaller or larger in the simulated data compared with the observed data (e.g., docetaxel vs. ABI+P: simulated difference of 1.6 months and observed difference of 0.8 months; docetaxel vs. enzalutamide: simulated difference of 3.8 months and observed difference of 6.3 months). Thus, despite using multivariate regression models to control for possible confounders, we could not adequately control for all differences between treatments. This might be due to unobserved differences between treatments (e.g., patient preference) that could not be identified and controlled by multivariate regression models. Therefore, the current disease model is unable to predict differences between treatments. The lack of randomisation and the existence of unobserved confounders unrelated to different types of treatment in observational data are issues that might also be faced in other disease models based on RWD. Therefore, the findings from and discussion of this study might also be relevant in other populations.

Alternative methods could have been used to model survival and events. Degeling et al. describe four different modelling approaches (event-specific distribution, event-specific probability and distribution, unimodal joint distribution and regression model, multimodal joint distribution and regression model), which all performed differently in a simulation study [25]. According to modelling good research practices, estimating the TTE first and defining the event second is the preferred modelling approach [26], and this approach was applied to our disease model. Considering the time span of our study, we did not apply alternative methods to model survival and events. However, this might be interesting for further research.

Propensity score matching (PSM) is another method that could control for observed differences in patient characteristics and enable comparison of a treatment and a comparator. However, since PSM can only match on observed characteristics, unobserved differences cannot be excluded. Moreover, PSM is not feasible for the comparison of more than two treatment options [27, 28]. Since the model could not adequately replicate the observed data (i.e., simulated data should be similar to the observed data, and simulated relative differences should be similar to the observed differences between treatments), we considered the CRPC model based on only RWD to be invalid.

Second, RWD are prone to missing data, particularly when the follow-up period is long [29, 30]. In this study, values were missing for almost all patient characteristics, varying from 9% missing values for PSA to 47% for visceral disease state (Table 1). This is a disadvantage of retrospective data collection, which should be considered when designing a disease model. Multiple imputation could offer a valid solution for missing patient characteristics, provided the missingness of data is not related to unobserved variables [29]. We tested the disease model including only data from complete cases (i.e., without any missing values). The differences between simulated and observed results were similar to those found when the imputed data of all patients were used. Despite the need to deal with missing values, observational data enable the analysis of large amounts of data. Differences between simulated and observed data (i.e., overestimation and underestimation of the observed data), as seen in this study, might be due to the amount of missing data. Uncertainty regarding RWD will diminish and survival estimations might improve when missing data are minimised. Therefore, standardised reporting of data should be improved.

The third challenge with RWD is the timeliness of reporting results. RWD can be collected from the moment a new treatment is approved by healthcare authorities and used in clinical practice. To provide insight into long-term effects of a certain treatment, the follow-up period should be of sufficient length. By the time results from RWD become available, treatment practices might already have changed due to new developments. RWD results may thus lag behind. In the CAPRI registry, first-line treatments with ABI+P, enzalutamide, and radium-223 are underrepresented, since patients diagnosed with CRPC between 2010 and 2015 were included and ABI+P, enzalutamide, and radium-223 became available as LPD1 in the Netherlands only from 2014 onwards. The results of this disease model should thus be regarded against the backdrop of the time period in which data were collected and might not be representative of current clinical practice. Further research with more up-to-date data is recommended.

The update of patient characteristics and treatment allocation could be regarded as a limitation. In the current model, changes in patient characteristics and treatment allocation were based only on the value of the characteristic at the start of the previous treatment line or on the previous treatment. These probabilities did not take other variables into account. With this simplified method, we were able to replicate the mean patient characteristics of the CAPRI registry; however, multivariate regression models that include other patient and disease characteristics may yield better individual replications. Therefore, in future research, we recommend that patient characteristics be updated using multivariate regression models. Additionally, the same dataset (CAPRI registry) was used for the development and validation of the disease model, and no other dataset was used for external validation of the disease model. Since the model lacked internal validity, external validation with an external dataset was not considered useful.

Moreover, side effects and adverse events of treatments were not considered in the disease model. Adverse events could be taken into account when calculating quality-adjusted life-years (QALYs). Survival with treatments associated with a higher toxicity level might decrease when considering adverse events. On the other hand, costs might increase due to the treatment of adverse events.

4.2 Potential Opportunities with and Recommendations for Using RWD in Disease Models

Although the use of RWD in disease models is associated with several challenges, it also has benefits. RWD provide insight into the use and uptake of new interventions in clinical practice. For example, the CAPRI registry showed that, in clinical practice, 40% of the patients who were fit for docetaxel according to the clinical guidelines did not receive docetaxel [31]. Furthermore, where results from RCTs often lack generalisability to daily practice, RWD show the effectiveness of new treatments in the real world. Patients with CRPC in the real world differ from those treated in clinical trials, generally having unfavourable patient and disease characteristics (i.e., older, more comorbidities, and worse WHO PS). These differences in characteristics may result in the observed differences in median OS between trial and real-world patients. Previous studies showed a longer OS for trial patients than for real-world patients (from CRPC diagnosis: 35 vs. 24 months, and from start LPD2: 13.6 vs. 9.6 months) [16, 17]. Additionally, RWD could provide insight into the full disease course comprising sequential treatments. In the CAPRI registry, a large range of different treatment sequences was observed (26 different sequences with N > 20). This information could be used to compare various treatment sequences and to estimate which treatment sequence is most preferable in terms of effects and costs. Thus, RWD are of importance for obtaining insight into the use, uptake, and (cost) effectiveness of a (new) treatment in daily practice.

Considering the challenges of and benefits from using RWD in disease models, a combination of RWD and data from clinical studies in a disease model may offer the best of both worlds. RWD could provide insight into the effectiveness and safety of a treatment in daily practice, whereas RCT data provide an unbiased estimate of the effectiveness of treatments. Using both RWD and RCT data might be an opportunity to build a well-performing disease model that accurately replicates observed data. A well-performing disease model could be used to estimate cost effectiveness by randomly drawing individual patients from the observed data, following the patient until death, and assigning QALYs and costs to the patient. This could enable the study of costs and effects of different treatment sequences, which consist of not only a single but multiple treatment lines. Moreover, such a disease model could be used to study the impact on costs and effects if patients received another treatment sequence. Furthermore, to increase the relevance of results from RWD, we recommend the use of up-to-date data. However, the urge to provide timely and relevant results should not come at the expense of sufficient follow-up.

5 Conclusions

We developed a disease model for patients with CRPC using RWD. Across all treatments combined, the model closely replicated the observed data, but it could not replicate the observed differences in outcomes between treatments. Because replicating these differences is crucial for a meaningful cost-effectiveness analysis, the model was not considered valid for that purpose. Therefore, further research should explore the use of a combination of up-to-date RWD and RCT data in disease models.