Key points for decision makers

Outcomes of economic evaluations based on registry data are to be assessed differently than economic evaluations based on trial data

Frequently encountered issues, such as confounding by indication, missing values, and an insufficient number of comparable patients, need to be adequately addressed to maximize internal validity

Real-world data provide generalizable outcomes and provide insights into a drug’s value for money in daily practice

1 Introduction

Considerations of costs and cost effectiveness are increasingly important for decision making on healthcare resource allocation. Economic evaluations enable a comparison of the cost effectiveness of alternative treatments and are thus especially important for decision making on the reimbursement of new expensive drugs. Until recently, economic evaluations mainly consisted of cost-effectiveness analyses (CEAs) modeled from randomized clinical trial (RCT) data. RCTs aim to demonstrate the efficacy of interventions and ensure internal validity by randomly assigning which patients receive the new intervention. However, the circumstances in trials, especially phase III trials, are not generalizable (i.e., externally valid) to the more heterogeneous group of patients treated in a real-world setting. Therefore, many uncertainties remain regarding the relevance of RCT results in a real-world setting.

Cost-effectiveness evidence based on RCT data may, therefore, not be sufficiently informative for decision makers. In such cases, evidence needs to be obtained from other sources, for example patient registries. A patient registry enables the evaluation of specified outcomes for a population defined by a particular disease, condition, or exposure, and when thoroughly designed and performed a patient registry can provide real-world evidence of clinical practice, patient outcomes, safety, and comparative effectiveness [1].

Guidelines on conducting and reporting economic evaluations are readily available [2–4], as are questionnaires to assess the relevance and credibility of observational studies [5]. However, barriers to using evidence from economic evaluations in actual decision making still exist [6, 7]. This necessitates an evaluation of the strengths and limitations of different types of evidence [8]. Moreover, practical guidance on using registry data for economic evaluations, as well as on how these evaluations can be used in decision making, is currently lacking.

This paper presents a practical guide on how to use registry data to inform decisions about the cost effectiveness of new drugs. We discuss the required steps of conducting a sound economic evaluation; the steps are explained by using the Population based HAematological Registry for Observational Studies (PHAROS) as an example. Although using registry data imposes some challenges, we illustrate that it is feasible to conduct an economic evaluation. We also discuss potential issues and limitations of economic evaluations based on registry data. The last section highlights the value of real-world economic evaluations for decision makers.

2 PHAROS and Its Context

In the Netherlands, outcomes research requirements were implemented in 2006 for new expensive drugs to ensure timely access to promising drugs. If a drug is included in this policy, hospitals receive an additional ear-marked budget, on the condition that they gather data on appropriate drug use and real-world cost effectiveness [9, 10]. A reassessment after 4 years determines whether or not additional financing will continue. Real-world data are often collected within a patient registry.

One of the first Dutch patient registries was PHAROS. PHAROS is a population-based disease registry that started in 2010 with three hematologic malignancies (non-Hodgkin lymphoma, multiple myeloma, and chronic lymphatic leukemia) in three regions; these regions cover 40 % of the Netherlands [11]. PHAROS expanded over the years to other hematological malignancies (chronic myeloid leukemia, myelodysplastic syndromes, and myelofibrosis) and is currently expanding to nationwide coverage. Like many other registries, PHAROS was created to serve multiple purposes, including measuring and improving the quality of care and determining the clinical and cost effectiveness of treatments used in a real-world setting. This paper uses examples from the economic evaluation [12] based on data from PHAROS. This economic evaluation was conducted to inform the reassessment of rituximab maintenance therapy for patients with follicular lymphoma, a subtype of non-Hodgkin lymphoma. A Markov model with a 20-year time horizon was used to compare rituximab maintenance therapy in patients who responded to second-line chemotherapy with best supportive care (i.e., observation after a response to second-line chemotherapy). For further details we refer to Blommestein et al. [12].

3 Conducting Sound Economic Evaluations with Registry Data

Economic evaluations typically include a number of steps, irrespective of the source of data. These steps, derived from existing guidelines in the academic literature [2–4], are presented in Table 1.

Table 1 Steps of an economic evaluation

3.1 The Policy Issue

Above all, it is important to define a clear objective for the economic evaluation and ascertain its relevance to healthcare decision making. One of the reasons to initiate PHAROS was to support decision making on the reimbursement of expensive drugs for three hematologic malignancies. Consequently, PHAROS data should facilitate the conduct of economic evaluations with real-world data.

3.2 Define the Research Question

It is crucial to determine the main research questions of the economic evaluation before setting up a registry that should collect the required data. For example, if a registry needs to be able to answer questions about the incremental cost-effectiveness ratio (ICER), the relevant costs and effects of at least two groups of patients need to be collected. Decision makers in the Netherlands require real-world evidence on appropriate use, effectiveness, and incremental cost effectiveness of drugs. Based on these requirements, the following research questions were defined for PHAROS:

  i) To whom and how is the drug of interest prescribed in daily practice?

  ii) What is the real-world effectiveness of this drug?

  iii) What is the real-world incremental cost effectiveness of this drug?

Regarding the first research question, PHAROS needed to include detailed data on baseline patient characteristics (including prognostic information) of patients who were treated as well as of patients who were not treated with the drug of interest. While a registry can be intervention based, PHAROS was set up as a disease-based registry. The advantage of a disease-based registry is that all patients who meet the disease criteria are included. Therefore, PHAROS included patients eligible for treatment as well as patients ineligible for treatment. This also enabled the identification of patients eligible for treatment but not treated with the drug of interest; these patients may serve as a comparator group. In addition, PHAROS needed to provide evidence on how drugs were used in daily practice. PHAROS not only included data on types of treatment, but also data on treatment regimens, dosages, dose modifications, treatment interruptions, and treatment duration. Furthermore, from a policy perspective, it is important to obtain insight into equitable access to (expensive) drugs. Population-based registries can provide evidence on uptake by hospital and region; they may thus reveal differences in access to a drug between regions and between university and general hospitals. In cases where data are based on a non-population-based registry, it is crucial that the selection is representative of the entire patient population and that a sufficient number of patients is included to ensure generalizability.

Regarding the second research question, PHAROS had to provide evidence on real-world effectiveness of the drug of interest. RCTs are the gold standard to demonstrate efficacy and assure internal validity by randomly assigning patients to a treatment strategy. In contrast, registries involve observational data and provide details on patients treated in daily practice. Reimbursement decisions may depend on the real-world use, effectiveness, and costs; in cases where a drug is not effective or not cost effective in daily practice, reimbursement of the drug may be reconsidered. If well designed, a registry includes information that enables accounting for heterogeneity in daily practice patients, physician variation, and the healthcare context. Therefore, effectiveness estimates based on registry data assure external validity and are thus generalizable to the real-world patient population. Ideally, the data should cover all treatments from diagnosis until death. However, this also depends on the length of follow-up and the time at which an analysis is required for policy making.

Regarding the third research question, PHAROS data needed to be able to demonstrate incremental real-world cost effectiveness of the drug of interest. Similarly to the second research question, a well-designed disease registry enables the estimation of incremental real-world effects, costs, and cost effectiveness simultaneously.

3.3 Define the Perspective of the Study

The perspective of the economic evaluation determines what types of costs and outcomes are to be included in the analyses. Most economic evaluations are conducted from a third-party payer or societal perspective. A societal perspective implies the inclusion of all relevant costs (direct and indirect, medical and non-medical costs) and relevant outcomes (quality of life and life-years). In contrast, a third-party payer perspective does not include non-medical costs (e.g., traveling costs, productivity costs). Other perspectives in use are the healthcare, hospital, and patient perspectives. Requirements regarding the perspective may differ per country. It is, however, best to define the perspective before the start of data collection because it determines what costs and outcomes are needed for the economic evaluation. The objective of PHAROS was to gather evidence for the reassessment of expensive drugs in the Netherlands. Such a reassessment requires a societal perspective in the Netherlands.

3.4 Identify the Comparator(s)

Economic evaluations involve a “comparative analysis of alternative courses of action in terms of both their costs and consequences” [2]. The choice of comparator is crucial for the outcomes of the economic evaluation and may be a source of bias. In economic evaluations based on real-world data, it may not always be clear which alternative treatment is the most appropriate comparator, and the choice may depend on the policy issue at stake. The most relevant alternative for decision makers is usually the current standard of treatment; this may also be best supportive care or a wait-and-see policy [12]. The inclusion of control groups in a registry adds to its complexity, time, and costs [1], but it allows a sound economic evaluation that compares a new treatment with the current standard of care. Collecting data over a long time period increases the chance that a registry includes an appropriate comparator group and avoids incomparable patient groups caused by, for example, a rapid uptake of a new drug. Such incomparability was, for example, illustrated by a Dutch observational study among patients with stage III colon cancer. Patients ineligible for the drug of interest had higher levels of unfavorable prognostic factors, i.e., carcinoembryonic antigen levels at baseline [13]. PHAROS included patients diagnosed from 2004 to 2012; the comparator group included relatively more patients from the earlier years of the registry, while the intervention group included more patients who were diagnosed in the later years of the registry.

3.5 Identify, Measure, and Value Relevant Costs

Costs can be identified in the following categories: hospital resources, community care resources, patient and family resource use, and resource use in other sectors [2]. Guidelines regarding economic evaluations and valuation of unit costs can differ per country, as can the available data. We used Dutch data and the methods set out in Dutch guidelines [14].

Relevant cost items for inclusion in the registry depend on disease characteristics, the patient population, treatment strategies of interest, and the perspective of the study. It is usually not efficient to collect all potential cost components and a balance needs to be established between the relevance of the cost item relative to the burden of collection [1]. This balance can be based on previous research findings and/or determined in collaboration with treating physicians and based on professional guidelines. In PHAROS, data on hospital resource use were collected for outpatient visits, daycare treatment, inpatient days, and intensive care days. In addition, data on drug dosages, treatment duration, and supportive care were collected. Data on services provided outside the hospital were not collected.

Generally, data on hospital resource use can be collected from electronic hospital records and patient files. However, data can only be retrieved if they have been adequately reported by physicians. Adequate reporting may be hampered in daily practice because physicians are not bound by strict criteria as they are in trials. Patient questionnaires can be used to collect data on additional direct medical costs (e.g., healthcare providers outside the hospital, concomitant medication), direct non-medical costs (e.g., traveling costs), and indirect non-medical costs (e.g., productivity costs). It is important to note, however, that the inclusion of cost items other than direct medical costs may be hampered in a registry in which data are retrospectively collected. In PHAROS, we encountered several issues. First, information on resource use outside hospitals was expected to be extremely fragmented, especially in cases of severe diseases with centralized treatment. Patients in the PHAROS registry were often discharged from hospital and referred to different rehabilitation centers. Second, although PHAROS was initiated as a prospective registry, clinical and cost data were mainly collected retrospectively at several points in time. In other words, we started in 2010 to collect data from patients diagnosed from 2004 onwards. Patients were identified using the nationwide Netherlands Cancer Registry. This resulted, however, in a delay in the inclusion of patients.

Regarding productivity costs, PHAROS was supplemented with information from the Patient Reported Outcomes Following Initial treatment and Long term Evaluation of Survivorship (PROFILES) study. This longitudinal cross-sectional study was conducted to obtain insight into, among other things, quality of life and productivity losses of patients with follicular lymphoma [15]. However, the reassessment of the drug of interest was bounded by a 4-year re-evaluation period. At the time our economic evaluation needed to be conducted for Dutch decision makers, the number of patients included in the longitudinal study was limited and data could not be matched to the disease states in our model. Therefore, the economic evaluation did not include productivity costs. We assumed that this was a conservative approach because the productivity costs for the rituximab maintenance group are most likely lower than those for the best supportive care group [16, 17].

Furthermore, economic evaluations should only concern costs related to the disease and/or its treatment instead of the costs induced by unrelated diseases occurring simultaneously. It is important to note, however, that establishing such a relation is not always easy or clear-cut when using registry data. For example, admission of older patients to a nursing home may be related to the disease but may also have occurred for other reasons. Moreover, determining which costs are related to the disease and/or its treatment is even less straightforward for an older population and in cases where comorbidities are present. Therefore, the inclusion of some cost items may be debatable.

The inclusion of cost items in the PHAROS economic evaluation was based on our previous experiences and supported by the literature that reported the same main cost drivers in treating hematologic patients [12, 18]. Therefore, it was believed that an appropriate balance was achieved between registration burden and relevance of the cost items. Such an evaluation of assumptions is crucial and depends on the characteristics of the patient population and the type of drug of interest. More detailed information regarding the included cost items and the unit costs are reported elsewhere [12].

The definition of the policy issue and research questions determines the cost components included in a registry. It is possible that researchers who conduct the economic evaluation are not yet involved at the start of the registry and must therefore rely on available data. In these cases, confirmations from the literature should be obtained to ensure that the most important cost components are included in the economic evaluation.

3.6 Identify, Measure, and Value Outcomes of Each Alternative

The most preferred effectiveness outcomes for policy makers are overall survival (OS)/life-years gained (LYG) and quality-adjusted life-years (QALYs), but other clinical objectives linked to improvement in patients’ outcomes can also be included [14]. The follow-up in registries is generally much longer compared with RCTs and data are collected on subsequent treatments. Therefore, registries usually provide more information on OS. In addition, if data on life-time follow-up are collected, extrapolation of survival data, which is associated with uncertainty, is no longer necessary. Life-time follow-up is extremely valuable for economic evaluations because a lifetime horizon is usually required to incorporate all potential differences in effects and costs for the remainder of the patient’s life [2]. However, because economic evaluations should provide timely results, it may be necessary to conduct evaluations prior to reaching the ideal follow-up time. Regarding other effectiveness outcome measures, it is important to be aware that they may differ from the endpoints of an RCT. For example, primary endpoints of RCTs in cancer are most often response, time to progression, and progression-free survival; OS is rarely a primary endpoint in an RCT. In observational registries, however, data on response and progression may be biased because they may not be accurately captured in patient files [19]. Moreover, physicians in daily practice often do not report using standardized response criteria [20], whereas RCTs dictate response criteria. This may especially be the case when data are retrospectively collected by individuals other than the treating physician. The moment at which progression is established may also differ from an RCT because there is no strict monitoring scheme; progression could thus be established much later than it occurs.

Therefore, we advocate using time-to-next-treatment (TTNT) as a proxy for progression, in addition to survival, in economic evaluations based on registry data. Whenever a physician changes to another treatment, there must be a reason for doing so; progression can be one of them. In PHAROS, we used TTNT to model final outcomes (i.e., LYG and QALYs).
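As an illustration of the TTNT idea, the sketch below derives TTNT from treatment-line dates. It is a minimal example under stated assumptions: the field names (`start`, `next`, `last`) and the records are hypothetical and are not PHAROS data, and patients without a subsequent treatment line are treated as censored at last follow-up.

```python
from datetime import date

def time_to_next_treatment(line_start, next_line_start, last_follow_up):
    """Return (months, event) for one patient.

    event is True when a next treatment line was started; otherwise the
    patient is censored at the date of last follow-up.
    """
    end = next_line_start if next_line_start is not None else last_follow_up
    if end < line_start:
        raise ValueError("end date precedes start of treatment line")
    months = (end - line_start).days / 30.4375  # average month length
    return months, next_line_start is not None

# Hypothetical records (assumption, for illustration only).
patients = [
    {"start": date(2008, 1, 15), "next": date(2009, 7, 15), "last": date(2012, 1, 1)},
    {"start": date(2010, 3, 1), "next": None, "last": date(2012, 3, 1)},
]
ttnt = [time_to_next_treatment(p["start"], p["next"], p["last"]) for p in patients]
```

The resulting (time, event) pairs have the same shape as standard time-to-event data, so they could feed a survival analysis in the usual way.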

Regarding the adverse effects of treatments, these should be accounted for in the economic evaluation. However, identifying and measuring toxicity data may be hampered in a registry. Although adverse events and their severity grading were collected in PHAROS, we encountered substantial issues establishing causal relations between the treatment and the adverse event.

Regarding the outcome quality of life, these data can be collected in a registry using patient-reported outcome measures. As mentioned previously, the number of patients included in the PROFILES study was still limited, and we could not match the data to the disease states in our model. Therefore, we based the utilities on findings in the literature.

3.7 Calculate the ICER

This step usually involves modeling methods such as Markov modeling or patient-level simulation modeling [21]. It is important to carefully select the model that best fits the data from the registry [22]. This step can differ greatly from an evaluation that uses data from an RCT only. The main issues in calculating the real-world incremental cost effectiveness are associated with confounding by indication, missing data, and insufficient numbers of (comparable) patients. These issues will be further discussed in the next section. The ability to deal with these issues determines whether it is possible to develop a feasible model for the economic evaluation and to obtain valid incremental estimates based on real-world data only [19]. We used the methods set out in Dutch guidelines. Detailed information on the cost-effectiveness calculations performed with PHAROS data is reported elsewhere [12].
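To make the calculation concrete, the following sketch runs a simple three-state Markov cohort model (progression-free, progressed, dead) for two strategies and computes the ICER as the difference in costs divided by the difference in QALYs. It is not the model reported in [12]: the state structure, transition probabilities, costs, utilities, and discount rates are all hypothetical values chosen for illustration, and no half-cycle correction is applied.

```python
def run_markov(p_progress, cost_pf, p_die_pf=0.02, p_die_prog=0.15,
               cost_prog=15000.0, u_pf=0.80, u_prog=0.60,
               years=20, disc_cost=0.04, disc_effect=0.015):
    """Three-state cohort model with annual cycles.

    Returns discounted (total cost, total QALYs) per patient.
    All default parameter values are illustrative assumptions.
    """
    pf, prog = 1.0, 0.0  # cohort starts progression-free
    total_cost = total_qaly = 0.0
    for t in range(years):
        total_cost += (pf * cost_pf + prog * cost_prog) / (1 + disc_cost) ** t
        total_qaly += (pf * u_pf + prog * u_prog) / (1 + disc_effect) ** t
        new_pf = pf * (1 - p_progress - p_die_pf)
        new_prog = pf * p_progress + prog * (1 - p_die_prog)
        pf, prog = new_pf, new_prog
    return total_cost, total_qaly

def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio: delta cost / delta QALY."""
    d_effect = qaly_new - qaly_old
    if d_effect == 0:
        raise ZeroDivisionError("no incremental effect; ICER undefined")
    return (cost_new - cost_old) / d_effect

# Hypothetical strategies: maintenance lowers the yearly progression
# probability but raises the yearly cost while progression-free.
cost_m, qaly_m = run_markov(p_progress=0.15, cost_pf=8000.0)
cost_o, qaly_o = run_markov(p_progress=0.25, cost_pf=1000.0)
example_icer = icer(cost_m, qaly_m, cost_o, qaly_o)
```

Keeping the costs and QALYs of each strategy as separate outputs, rather than returning only the ratio, makes it easier to see whether an ICER is driven by the cost difference or by a small difference in effects.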

3.8 Assessment of Uncertainty

The outcomes of an economic evaluation are surrounded by uncertainties, irrespective of whether the economic evaluation is based on data from an RCT or a registry. Therefore, it is important to extensively analyze the most important uncertainties. This information may be crucial for deciding on the adoption of a new drug. The uncertainty of input parameters can be analyzed by scenario analysis as well as probabilistic and univariate sensitivity analyses [2]. In PHAROS, we observed great patient heterogeneity which resulted, in combination with small numbers of eligible patients treated with the drug of interest, in wide confidence intervals. In addition, as presented in Table 2, different scenarios based on different assumptions lead to different cost-effectiveness ratios (e.g., costs per QALY ranged from €11,499 to €12,789 to €23,919 in three scenarios [12]). While information regarding the assumptions for the model and appropriate sensitivity analyses on assumptions apply to all economic evaluations, we believe this is even more important when using registry data. Because of the absence of randomization, the assumptions needed to calculate incremental outcomes might be less straightforward.
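A probabilistic sensitivity analysis can be sketched as follows. This is a simplified illustration, not the analysis performed in [12]: incremental costs and incremental QALYs are drawn from independent normal distributions (an assumption; a real analysis would sample the underlying model parameters and preserve their correlation), and all numeric inputs are hypothetical.

```python
import random

def probability_cost_effective(mean_dc, sd_dc, mean_dq, sd_dq,
                               threshold, n_draws=5000, seed=1):
    """Share of simulated draws with a positive net monetary benefit.

    mean_dc / sd_dc: mean and SD of incremental costs.
    mean_dq / sd_dq: mean and SD of incremental QALYs.
    threshold: willingness to pay per QALY.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    favourable = 0
    for _ in range(n_draws):
        d_cost = rng.gauss(mean_dc, sd_dc)
        d_qaly = rng.gauss(mean_dq, sd_dq)
        # Net monetary benefit avoids dividing by near-zero QALY gains.
        if threshold * d_qaly - d_cost > 0:
            favourable += 1
    return favourable / n_draws

# Hypothetical inputs with wide uncertainty around both increments.
p_example = probability_cost_effective(12000.0, 30000.0, 0.9, 0.6,
                                       threshold=50000.0)
```

Repeating the call over a range of thresholds would give the points of a cost-effectiveness acceptability curve.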

Table 2 Scenario analysis of the PHAROS economic evaluation [12]

3.9 Presentation of the Results and Discussion of All Issues of Concern to Users

Presenting and discussing the results in an understandable manner is of utmost importance for the use of economic evaluations in decision making [6]. This may be even more important when the economic evaluation is based on data from registries because registry data are often less straightforward and more prone to bias. Topics that need to be reported depend on the conducted economic evaluation but should at least include: information on confounders, methods to account for missing values, validity, and generalizability of the results. The latter two are extremely important to determine usefulness of the results for decision makers [2]. It is also important to separately report both the effects and costs per alternative. Extremely high ICERs may, for example, indicate large cost differences between alternatives, but they can also result from small incremental effects.

4 The Main Issues in Economic Evaluations Based on Registry Data

There are three main issues with conducting economic evaluations with real-world data from registries: (i) confounding by indication; (ii) missing data; and (iii) insufficient number of patients. If encountered, it is crucial to appropriately deal with these issues to maximize the validity of the results of the economic evaluation and its value to decision makers.

4.1 Confounding by Indication

One of the main concerns about observational data raised in academic literature is the lack of a randomized controlled setting, which results in problems with internal validity [23–25]. Instead of treatment being randomly assigned as in an RCT, the choice of treatment is made by the treating physician based on characteristics of the patient. In addition, insurance coverage or national guidelines may also influence outcomes [8]. It is important to be aware that confounding by indication is a major challenge for economic evaluations based on observational data from registries. PHAROS showed that the real-world patient population was highly heterogeneous. When baseline patient characteristics associated with the outcome of interest differ between the treatment groups, the results of a study are biased if not appropriately corrected for these differences. We are aware that no correction method can substitute for randomization, but there are several methods that can be used to increase the validity of the outcomes.

Methods to deal with confounding by indication include multivariable regression modeling, propensity score (PS) matching, and data synthesis. Multivariable regression modeling has been the conventional method to reduce bias related to confounding by indication. Potential confounders are included simultaneously in a regression model that estimates final outcomes. Using multivariable regression models for registry data requires information on patient and disease characteristics.

In the past decade, there has been an increasing trend of using PS matching techniques [26]. This technique estimates the probability of receiving the treatment of interest from observed patient characteristics [27]. Propensity scores are then used to match a treatment group to a comparator group, pairing patients who have similar probabilities of receiving the treatment of interest. Other applications of the PS include stratification, covariance adjustment, and weighting [27, 28].
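The matching step can be sketched as below. The example assumes that propensity scores have already been estimated (for instance with a logistic regression on observed characteristics) and shows one common variant, greedy 1:1 nearest-neighbour matching without replacement and with a caliper. The patient identifiers, scores, and caliper width are hypothetical; this is not necessarily the matching algorithm used in [12].

```python
def greedy_match(treated, controls, caliper=0.05):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: dicts mapping patient id -> propensity score.
    A pair is formed only if the scores differ by at most `caliper`.
    Returns a list of (treated_id, control_id) pairs.
    """
    available = dict(controls)  # copy so the caller's dict is untouched
    pairs = []
    for t_id, t_ps in sorted(treated.items(), key=lambda kv: kv[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_ps))
        if abs(available[c_id] - t_ps) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # each control is used at most once
    return pairs

# Hypothetical propensity scores (assumption, for illustration only).
treated = {"T1": 0.62, "T2": 0.35, "T3": 0.91}
controls = {"C1": 0.60, "C2": 0.33, "C3": 0.40, "C4": 0.10}
pairs = greedy_match(treated, controls)
```

In this toy example patient T3 has no control within the caliper and is dropped, which illustrates how matching can shrink the analytic population.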

Although PS matching techniques are increasingly and successfully used [26, 29], these techniques are less attractive when multiple treatment strategies are compared simultaneously. A better understanding of the benefits and limitations in practical circumstances of PS matching vs multivariate risk modeling is still needed [26].

Finally, when correcting for confounding is hampered (e.g., by missing values or the lack of a control group), data synthesis can be used to model incremental outcomes. For example, it may be a good option to synthesize efficacy data from an RCT with effectiveness data from daily clinical practice, especially when an appropriate comparator group is lacking [30]. However, it is important to remember why data from daily practice were needed in the first place: baseline characteristics may differ between patients treated within an RCT and patients treated outside one.

4.2 Missing Data

Even when a registry is well designed and executed by an active interdisciplinary collaborative research group, it is to be expected that missing values on certain variables will exist. Therefore, analyzing only complete cases is most likely not possible. Although imputing mean values might be less of a problem for RCT data, this method is not recommended for registry data because the patient population in daily practice is usually far more heterogeneous. We recommend using multiple imputation because this method not only imputes missing values but also accounts for the uncertainty associated with the imputed values by creating multiple datasets [31]. Missing values are imputed based on observed variables. To account for the uncertainty of the predicted values, each missing value is imputed multiple times, resulting in several complete datasets. Each dataset is analyzed separately, and combining the results produces overall estimates and standard errors that reflect the uncertainty around the imputed values. However, it is important to note that this method can only be used for missing values that depend on known and observed variables (i.e., values missing at random) [32].
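The combination step of multiple imputation is commonly done with Rubin's rules, sketched below: the pooled estimate is the mean of the per-dataset estimates, and the total variance adds the average within-imputation variance to the between-imputation variance inflated by a factor (1 + 1/m). The imputation itself is not shown, and the input numbers are hypothetical.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Pool results from m imputed datasets using Rubin's rules.

    estimates: point estimate from each imputed dataset.
    variances: squared standard error from each imputed dataset.
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching inputs from at least two datasets")
    q_bar = mean(estimates)
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return q_bar, total ** 0.5

# Hypothetical incremental QALY estimates from five imputed datasets.
pooled, pooled_se = pool_rubin([0.82, 0.95, 0.88, 0.79, 0.91],
                               [0.040, 0.050, 0.045, 0.042, 0.048])
```

When the estimates differ across imputed datasets, the pooled standard error is larger than the average per-dataset standard error, which is how the extra uncertainty from imputation is carried into the results.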

4.3 Insufficient Number of Comparable Patients

Sufficient numbers of patients and follow-up data are required for conducting a sound economic evaluation with registry data. This is, however, sometimes difficult to realize in daily practice. A large difference may exist between the actual patient population (i.e., the population included in the registry) and the analytic patient population (i.e., the population that met the criteria for analysis [1]). RCTs usually base the number of patients included on power calculations and continue including patients until the desired number has been reached. This is, however, not possible in daily practice; for example, if physicians no longer use the alternative treatment, the analytic population will be small. The minimal required number of patients also depends on the extensiveness of the heterogeneity of the real-world patients, which may not be known in advance. The option to actively search for extra patients treated with the drug of interest has to be balanced with a potential diminishing generalizability.

In PHAROS, we faced confounding by indication, missing data, as well as a small analytic patient population. First, confounding by indication was present because the comparator group included relatively more patients with a worse prognosis compared with the treatment group [12]. We used PS matching methods to correct for observed differences in patient and disease characteristics. After matching, both groups were more balanced regarding characteristics of re-induction therapy, B symptoms, and disease progression. Table 2 illustrates the variation in outcomes of our scenario analyses in which we used both matched and unmatched data.

Second, we encountered a small analytic patient population in PHAROS. The actual population included nearly 700 patients with follicular lymphoma. However, the required analyses came too early for most patients because they had not (yet) received a second line of chemotherapy. Therefore, only 14 % of the actual population was included in the analytic population. To increase the number of patients, data were obtained from Hemobase, a multidisciplinary Web-based electronic patient record in the north-eastern part of the Netherlands that collected similar data. Although this increased the analytic population from 89 to 113 patients, the number of patients remained small. The rather small and highly heterogeneous population led to wide confidence intervals for treatment with rituximab maintenance (e.g., OS of matched real-world effects ranged from 1.0 to 3.9 years and costs ranged from −€44,362 to +€105,977).

Third, because missing data were present for relevant outcomes (e.g., response rates), the number of patients included in our analyses was reduced even further after applying PS matching (e.g., N = 51 reduced to N = 43 in the rituximab group).

5 The Value of Real-World Economic Evaluations for Decision Makers

Decision makers often make limited use of evidence from economic evaluations [33, 34]. There is, however, a higher chance that decision makers use such evidence if the evidence is accessible (i.e., timeliness and understandability) and acceptable (i.e., accuracy and validity of research methods given institutional requirements) [7].

Above all, it is crucial that decision makers realize that registry data differ from RCT data and that the outcomes of their economic evaluations should thus be assessed differently. This should, however, not be seen as a drawback, but rather as an important opportunity. Both data sources complement each other; they allow balancing internal validity and generalizability and answer different questions.

The economic evaluation based on PHAROS data demonstrated these differences by calculating different scenarios. Table 2 presents these scenarios as well as their outcomes. We discuss the value of each scenario for healthcare decision makers regarding whether the research methods were accessible and acceptable.

Scenario 1 was based only on RCT results; no real-world data were included in the analyses. Randomization ensured the internal validity; therefore, the difference between the intervention group (i.e., patients who received rituximab maintenance therapy) and the control group (i.e., patients who were only observed) could be attributed to the treatment. In other words, treating patients with rituximab maintenance therapy costs €12,655 per QALY gained compared with observation only. This scenario used well-known conventional methods (RCT data) and may thus be highly accessible and acceptable to decision makers. Accessibility and acceptability are ensured by the understandability of the results, i.e., economic evaluations based on trial data are intuitive because conventional methods are used. This comes, however, at the cost of generalizability, because no data from daily practice were used. The results do not inform decision makers on the expected costs and effects in the real-world patient population, although this was the policy issue at stake. As a consequence, none of the questions raised by decision makers (i.e., to whom and how is the drug prescribed and what is the real-world cost effectiveness) can be answered with scenario 1.

In scenarios 2.1 and 2.2, efficacy data from the RCT were combined with matched and unmatched real-world cost data, respectively. This resulted in substantial differences in the estimated costs per QALY gained (€23,821/QALY for scenario 2.1 and €5,162/QALY for scenario 2.2). Because both scenarios combined RCT data with real-world data, the interpretation of the outcomes may be more complicated: it is unclear to whom the results apply, i.e., trial patients, real-world patients, or both. In other words, results are less accessible for decision makers. The effectiveness estimates are internally valid because they are based on RCT data, but they do not inform decision makers on the effectiveness in daily practice. In contrast to scenario 1, both scenarios 2.1 and 2.2 provide information on real-world costs. It should be noted, however, that the accuracy of the incremental costs in scenario 2.2 may be impaired because patients treated and not treated with rituximab maintenance therapy were not comparable and we did not correct for these differences by using a matching method. Moreover, it is questionable for whom the ICER is actually valid (i.e., the efficacy estimates apply to trial patients while the cost estimates apply to the real-world patient population). Therefore, both ICERs should be interpreted carefully.
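To make concrete why the same efficacy estimate can yield very different costs per QALY gained, the following minimal sketch applies the standard definition of the ICER (incremental costs divided by incremental effects) to two cost inputs. All numbers are hypothetical and are not the values underlying the PHAROS scenarios; the function name `icer` is likewise an assumption for illustration.

```python
def icer(cost_new, cost_old, effect_new, effect_old):
    """Incremental cost-effectiveness ratio: extra cost per extra unit of effect."""
    delta_cost = cost_new - cost_old
    delta_effect = effect_new - effect_old
    if delta_effect == 0:
        raise ValueError("ICER is undefined when the incremental effect is zero")
    return delta_cost / delta_effect


# Hypothetical QALYs taken from a trial (identical in both scenarios).
qaly_treatment, qaly_observation = 6.0, 5.0

# Hypothetical mean costs in euros; only the comparator costs differ,
# as they might between matched and unmatched real-world controls.
cost_treatment = 50000
cost_control_matched = 30000
cost_control_unmatched = 44000

icer_matched = icer(cost_treatment, cost_control_matched,
                    qaly_treatment, qaly_observation)
icer_unmatched = icer(cost_treatment, cost_control_unmatched,
                      qaly_treatment, qaly_observation)

print(f"Matched costs:   EUR {icer_matched:,.0f} per QALY gained")
print(f"Unmatched costs: EUR {icer_unmatched:,.0f} per QALY gained")
```

Because the denominator is fixed by the trial, any imbalance between treated and untreated patients in the unmatched cost data passes directly into the ratio.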

Scenarios 3.1, 3.2, and 3.3 used real-world data for both cost and effectiveness estimates. Consequently, the results are generalizable to the real-world patient population and applicable to the policy issue at stake. Because decision makers are less familiar with interpreting real-world data, these scenarios may be less accessible for decision makers. It is, therefore, crucial that the methods and results are extensively reported in understandable language. Unmatched data as used in scenarios 3.2 and 3.3 inform decision makers on the real-world costs and effects, but a major drawback is that differences between cases and controls cannot be assessed because the incremental estimates are not sufficiently valid. Both scenarios 3.2 and 3.3 show higher total costs for the control group, while the opposite was expected and shown by the other scenarios. Although matching methods reduced the analytic population, we believe that scenario 3.1 provides the most accurate and valid results because matching methods were used for both costs and effects to reduce bias related to confounding by indication.

Decision makers were interested in real-world outcomes and, in the Dutch case, required evidence from daily clinical practice to reduce the uncertainty of both real-world costs and effects of rituximab maintenance therapy. We believe that the computed ICERs can only be used if the applied methods are accurate and valid. In other words, incremental outcomes of economic evaluations can only be used when cases and controls are comparable or when appropriate methods are used to correct for differences in baseline characteristics (scenarios 1 and 3.1). In cases where baseline characteristics differ greatly between patient groups and no matching methods have been used, the outcomes of an economic evaluation should not be considered acceptable by decision makers because the incremental outcomes are neither accurate nor valid. We believe that scenario 3.1 is most valuable to decision makers because this scenario achieves an appropriate balance between generalizability and internal validity. The estimated cost-effectiveness ratio (€11,245) also provides reassurance to decision makers that efficacy from the trial can be realized at favorable costs in the real-world patient population. However, because a formal decision has not yet been made, it is currently unknown how decision makers interpreted and evaluated the outcomes.

6 Further Research Areas for Registry Data

Expensive cancer drugs are increasingly developed for patient populations stratified by genetic characteristics and this trend illustrates an increasing role for biochemical, histological, and genetic markers to aid treatment decisions [35]. While the PHAROS registry focused on expensive drugs, registries may also be used to collect information on biochemical, histological, and genetic markers, which can be used for economic evaluations of these markers. This may be an important subject for further research using registry data.

7 Final Remarks

It is important for decision makers that a drug provides sufficient value in relation to its costs in daily practice. Economic evaluations based on real-world data can provide extremely valuable insights into real-world incremental cost effectiveness [12, 30, 36]. In PHAROS, both matched and unmatched outcomes seem favorable for the decision to adopt rituximab maintenance therapy. In other cases, the variation in outcomes can be much greater and less favorable than in PHAROS, which necessitates a careful evaluation of the causes of the conflicting results between RCT and real-world data. Moreover, it may not always be possible to develop a feasible model with real-world data to calculate incremental estimates [19]. We advocate that incremental estimates (ICERs) should always be based on matched patients when patient groups are not comparable. However, unmatched real-world data are still valuable for decision makers because they provide evidence on costs and effects of a treatment in a real-world setting, although not incremental [18, 19, 34]. Real-world evidence can also be used to obtain a certain level of reassurance regarding the extent to which the evidence from the RCT is applicable to the real-world patient population. It is, however, crucial that decision makers realize that the outcomes of an economic evaluation based on registry data should be assessed differently compared with the outcomes of an economic evaluation based on RCT data. The need for generalizable outcomes has to be balanced with the need for internally valid outcomes. While registries are able to provide insight into the use, effectiveness, and costs of a therapy in routine clinical practice and therefore provide healthcare decision makers with realistic expectations for outcomes in real-world patients, it should be noted that other solutions exist to balance internal and external validity. For example, pragmatic trials can include a broad patient population and can thus also ensure generalizability. Pragmatic trials have the major advantage of randomizing treatment but are, on the other hand, associated with logistical, ethical, and sample size challenges as well as high resource investments [37].

In PHAROS, we demonstrated that it was feasible to conduct a real-world economic evaluation using registry data. We believe that we provided decision makers with acceptable and accessible information and showed that the real-world outcomes confirmed the efficacy observed in the trial. In our opinion, this provided reassurance to decision makers about a drug’s value for money in daily clinical practice.