Key Points

We characterize the pan-Canadian Oncology Drug Review (pCODR) process as two stages: a decision to reject or not reject, followed by a decision to recommend full or conditional approval (conditional on non-rejection).

Clinical aspects appeared to carry the greatest weight in the decision to reject or not reject, whereas value for money carried the greatest weight in full versus conditional approvals.

Notwithstanding pCODR’s implicit review process, there appears to be an identifiable and consistent set of factors driving pCODR recommendations.

1 Introduction

Health technology assessment involves the joint consideration of multiple criteria in aid of specific policy decisions. In this paper, we explore how the pan-Canadian Oncology Drug Review (pCODR) balances clinical, economic, social and organizational criteria when making recommendations to drug plans across Canada on the funding of oncology drugs. Specifically, we evaluate the impact of the content and quality of economic analyses and clinical studies on recommendations to fund, not fund, or conditionally fund specific cancer drugs.

The Canadian healthcare system is publicly funded from general taxation revenues. In line with the broader federalist structure, the federal government collects and pools revenue and oversees the general legislative framework, but allocations of monies to specific healthcare services, including drug plans, are made at the sub-national (provincial) level. In Canada and elsewhere, there has been a movement to centralize the drug review process [1, 2] to provide national recommendations to be acted on at the sub-national level.

The pCODR was established in 2011 and is administered alongside the broader Common Drug Review (CDR) within the Canadian Agency for Drugs and Technologies in Health (CADTH). While CDR is responsible primarily for non-cancer community-based treatments, the pCODR is responsible for cancer drugs administered in and outside of hospitals [3, 4].

The stated objective of the pCODR review process is “to bring consistency and clarity to the assessment of cancer drugs” in focusing on four explicit dimensions of value: clinical benefit, economic evaluation (value for money), patient-based values and adoption feasibility [5]. pCODR guidelines do not offer explicit weights for these dimensions or a threshold that would need to be met for any single element of the review. Each dimension is to be discussed with reference to the uniqueness of the individual drug, disease and context [5]. This implicit approach contrasts with the UK National Institute for Health and Care Excellence (NICE), which has effectively adopted a more explicit, though still contextual, cost-effectiveness ‘range’ [6,7,8].

The ambiguity of implicit approaches can provide flexibility in exercising appropriate contextual judgement and addressing the inherently complex nature of healthcare priority setting [9,10,11]. However, ambiguity can also adversely impact transparency, rigour and consistency and may create an opportunity—perceived or real—for special interest groups to unduly influence decisions [12,13,14,15]. More explicit decision weights and acceptable trade-offs (for example, a maximum cost-effectiveness threshold) arguably avoid some of these issues.

Given the implicit nature of the pCODR process, it is of interest to understand how these decisions are actually made. The processes and committees involved in the review of drugs (cancer and otherwise), and the basis of recommendations for reimbursement, have previously been investigated in the literature from several angles. There are three broad strands of this literature, focusing on (1) comparisons of processes, (2) the assessment of regional variations between recommendations, and (3) the factors that appear most important in driving recommendations. Our study contributes to this third category.

High-level comparisons of review processes between jurisdictions show that drug reimbursement decisions are important across healthcare systems yet vary in the structures and decision criteria used [16], process indicators [17, 18] and resulting recommendations [19,20,21]. Variations are attributed to general differences between health systems [19, 22] and insufficient economic evidence [23]. Some cross-country variations are explained by considerations of therapeutic value [24], disease severity [25] and economic considerations [22, 26, 27]. Clinical considerations have been shown to be significant predictors of funding recommendations in Canada, the UK and Poland [28,29,30,31], with clinical uncertainty being significant in Belgium and Wales [32, 33] and not significant in Scotland [34]. Cost effectiveness was a significant influence on funding recommendations in all countries except Wales [33]. In addition, funding bodies have considered the recommendations made elsewhere [22].

We were interested in assessing the relative importance placed on the four dimensions of value defined by pCODR guidelines to understand the basis of their recommendations. We were also interested in testing whether there may be unstated acceptable thresholds for some aspects, particularly around the economic dimensions. Using publicly available information on pCODR recommendations to date, we estimated the relative importance of individual criteria on the likelihood of a drug receiving a positive, negative or conditional recommendation. We did not address the unique separation of responsibilities in Canada between pCODR and CDR, although this issue has been debated elsewhere in the literature [35,36,37].

Our study builds on the existing literature by focusing on recommendations made by a review body that has not previously been investigated using a revealed preferences methodology. Our study differs from that of Rocchi et al. [28] in that they investigated the recommendations of the CDR, whose mandate includes the review of drugs and therapies considered for public funding in Canada except for oncology drugs. pCODR is a separate review body, with a strict focus on cancer drugs.

2 Methods

2.1 Data and Data Extraction

Data for this study were extracted from publicly available reports of all recommendations made by pCODR between 2011 and 2017 [38]. These reports are prepared by the pCODR Expert Review Committee and provide a summary of the evidence for each of the four dimensions that are used to guide recommendations: clinical benefit, economic evaluation (or value for money), patient-based values and adoption feasibility. pCODR makes three possible recommendations: full approval, conditional approval or rejection. This format is consistent with many other review bodies in that at least one intermediate level exists between ‘accept’ and ‘reject’ recommendations (e.g. restricted use, conditional acceptance, time-limited acceptance).

Funding is recommended (full approval) if a submission is judged to meet all four dimensions of value (clinical benefit + value for money + patient values + adoption feasibility). If a submission does not (and cannot) meet one or more dimensions, the recommendation is to reject for funding. However, if the committee feels that the neglected dimension(s) could potentially be addressed by some change, such as a reduction in price or additional clinical evidence, then the committee can recommend a conditional approval, pending this change or clarification. In practice, it appears that conditional approval has been exclusively granted pending a reduction in price.

We used an initial sample of ten summaries to identify variables for extraction and to validate their labels. Subsequently, each report was coded by two independent reviewers. Conflicts were resolved via discussion and, if consensus could not be reached, a deciding vote by a third reviewer. The unit of analysis was submissions rather than drugs, as the same drug could be submitted for more than one indication. A particular drug–indication combination could also be re-submitted for consideration on the basis of new evidence or a new price following a conditional approval. Each submission, whether initial or re-submission, was treated as an independent observation.

Within the decision summaries, we identified ten consistently reported elements, which we subjectively categorized according to the pCODR decision criteria in Table 1. We recorded observed survival with the intervention and the comparator as reported in each submission. However, different submissions reported average or median overall survival (OS), progression-free survival (PFS), or 12-month, 24-month or 36-month survival rates. To accommodate these different measures of clinical benefit, we analysed survival data in terms of relative benefit, taking the ratio of whichever statistic was reported for the intervention and comparator groups. When more than one survival statistic was reported, we estimated relative benefit in this order of priority: OS, PFS, and finally x-month survival. If there was no comparator in the submission, we set relative survival to 1. We also added a flag indicating whether the primary outcome of the clinical evidence was OS, and an interaction term between clinical benefit and the OS flag to test whether clinical outcomes reported in terms of OS carried more weight in pCODR deliberations than disease-free survival or PFS.
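The relative-benefit rule described above can be sketched as follows. This is a minimal illustration, not the authors' extraction code: the dictionary keys (os, pfs, x_month_rate) and both function names are hypothetical, and the sketch assumes the ratio is taken only when the same statistic is reported for both arms.

```python
from typing import Mapping, Optional

# Priority order stated in the text: overall survival, then progression-free
# survival, then an x-month survival rate. The key names are hypothetical.
PRIORITY = ("os", "pfs", "x_month_rate")


def relative_benefit(
    intervention: Mapping[str, float],
    comparator: Optional[Mapping[str, float]],
) -> float:
    """Return intervention/comparator for the first statistic reported in both arms."""
    if comparator is None:
        # No comparator in the submission: relative survival is set to 1.
        return 1.0
    for stat in PRIORITY:
        treated = intervention.get(stat)
        control = comparator.get(stat)
        if treated is not None and control is not None and control > 0:
            return treated / control
    raise ValueError("no survival statistic is reported for both arms")


def os_interaction(benefit: float, primary_outcome_is_os: bool) -> float:
    """Interaction term: relative benefit multiplied by the 0/1 OS flag."""
    return benefit * (1.0 if primary_outcome_is_os else 0.0)
```

For example, a submission reporting median OS of 12 versus 8 months would be coded with a relative benefit of 1.5, regardless of any PFS figures reported alongside it.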

Table 1 Description and distribution of attributes (complete case analysis N = 91)

The quality of clinical evidence was assessed on the basis of whether the submission was based on a phase III, double-blinded randomized controlled trial with an appropriate control arm and was judged as methodologically strong in the pCODR report. The severity of adverse events was defined relative to existing treatments; a substantial increase in the risk of an adverse event relative to the clinical standard was coded as high, whereas similar or lesser risks were coded as low. Submissions were considered to address an issue of ‘limited treatment options’ if no or very few alternative treatments were available to treat the particular condition.

Uncertainty of the economic model was judged on the basis of comments in the economic summary as well as whether the summary reported a specific incremental cost-effectiveness ratio (ICER). We used the ICER estimate judged most plausible by pCODR economic reviewers; this was often based on a version of the original economic model adapted by pCODR reviewers to incorporate different assumptions or input parameters. We divided the ICER by 10,000 to facilitate model convergence, so ICER coefficients are expressed per $Can10,000.

Patient time and infrastructure requirements both tended to closely correspond with whether the drug was an oral or intravenous form of the drug. Oral drugs were associated with less patient time, in terms of both chemotherapy ‘chair time’ as well as travel time to a chemotherapy clinic. Likewise, oral drugs tended to reduce the infrastructure requirements associated with therapy, although some oral drugs required additional testing or blood monitoring. We coded these subjectively, on the basis of comments in the decision summary, and—although we arbitrarily assigned patient time to the patient values dimension and infrastructure to the adoption feasibility dimension—we recognise there is overlap between these attributes and dimensions. Likewise, aspects expressed in patient value statements tended to correspond with survival benefit or unmet need, and therefore these attributes can be seen as measures of patient value as well as clinical benefit. Finally, specific budget impact was not typically reported as this was dependent on the characteristics and circumstances of each province, but we coded the expected budget impact as high or low on the basis of the size of relevant patient population, the cost of the drug, and expected new infrastructure or testing requirements.

2.2 Statistical Analysis

We conducted a complete-case analysis, excluding records with missing values. We used a chi-squared bivariate test for associations between individual attributes and the final recommendation and used multivariate regression methods to estimate the impact of different factors on pCODR recommendations. We addressed the following two questions:

  1. Which of the identified factors (and dimensions) appear to carry the greatest weight in pCODR recommendations?

  2. Is there an implicit maximum willingness-to-pay or cost-utility threshold in pCODR recommendations?

Similar recommendations in other jurisdictions have been analysed using binary [28, 29, 33, 34, 39,40,41] or multinomial [31] regression methods or bivariate analysis alone [42]. We addressed the first question via a two-stage binary approach. In the first stage, the outcome of interest was defined as rejection versus non-rejection (including full and conditional approval). In the second stage, the outcome of interest was full approval versus conditional approval and was limited to alternatives that were not rejected in the first stage.
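The two-stage coding can be illustrated with a short sketch. The recommendation labels and the 0/1 coding direction are assumptions made for illustration (rejection coded as 1 in the first stage, full approval coded as 1 in the second); the counts are those of the complete-case sample reported in the Results.

```python
# Complete-case counts reported in the Results: 14 full, 53 conditional, 24 rejected.
# The string labels are hypothetical.
recommendations = ["full"] * 14 + ["conditional"] * 53 + ["reject"] * 24

# Stage 1: rejection (1) versus non-rejection (0), using every submission.
stage_one = [1 if rec == "reject" else 0 for rec in recommendations]

# Stage 2: full (1) versus conditional (0) approval, restricted to the
# submissions that were not rejected in the first stage.
stage_two = [1 if rec == "full" else 0 for rec in recommendations if rec != "reject"]

print(len(stage_one), sum(stage_one))  # stage 1 sample size and number of rejections
print(len(stage_two), sum(stage_two))  # stage 2 sample size and number of full approvals
```

Conditioning the second stage on non-rejection means its sample is smaller than the first, which is one reason a penalized estimator is attractive here.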

In both stages, we used a penalized binary logistic regression model to account for the relatively small number of observations and the sparse contingency table. Using this model, we systematically tested all possible main effects combinations, as well as plausible interaction terms. We selected a preferred specification on the basis of Akaike’s Information Criterion corrected for small samples (AICc) and the correspondence between the predicted and the actual decision (predictive accuracy). We reported estimated coefficients, p values and marginal effects for the preferred specification in each stage.
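As a rough sketch of the selection step, the small-sample correction to AIC and the enumeration of candidate specifications could look like the following. The function names are hypothetical, the fitted log-likelihood is assumed to come from whatever penalized logistic routine is used, and k is assumed to count every estimated parameter including the intercept.

```python
from itertools import combinations
from typing import Iterable, Iterator, Tuple


def aicc(log_likelihood: float, k: int, n: int) -> float:
    """AIC with the small-sample correction: AIC + 2k(k + 1) / (n - k - 1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined unless n > k + 1")
    aic = -2.0 * log_likelihood + 2.0 * k
    return aic + (2.0 * k * (k + 1)) / (n - k - 1)


def nonempty_subsets(terms: Iterable[str]) -> Iterator[Tuple[str, ...]]:
    """Yield every non-empty combination of candidate model terms."""
    pool = list(terms)
    for size in range(1, len(pool) + 1):
        yield from combinations(pool, size)


# Thirteen candidate terms give 2**13 - 1 non-empty specifications.
candidates = [f"term_{i}" for i in range(13)]
n_specifications = sum(1 for _ in nonempty_subsets(candidates))
print(n_specifications)
```

Each specification would then be fitted and the one with the smallest AICc retained. The count of 13 candidate terms is an inference from the number of combinations reported in the Results, not a figure stated in the text.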

To assess the relative importance of the included attributes, we re-estimated the preferred model at each stage, systematically excluding one attribute at a time to record its impact on the log-likelihood (LL) and predictive accuracy. A greater impact on these measures was interpreted as signalling a greater impact of the attribute on pCODR recommendations. Attributes not included in the preferred specifications were judged to have had little influence on the pCODR recommendation.
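The drop-one comparison can be sketched generically. Here fit_log_likelihood stands in for any routine that fits the model on a given list of terms and returns its log-likelihood; the toy fitter, the term names and all numbers below are invented purely to exercise the function and are not the paper's estimates.

```python
from typing import Callable, Dict, List, Sequence


def drop_one_importance(
    terms: Sequence[str],
    fit_log_likelihood: Callable[[List[str]], float],
) -> Dict[str, float]:
    """Loss in log-likelihood when each term is removed from the preferred model."""
    full_ll = fit_log_likelihood(list(terms))
    impact = {}
    for term in terms:
        reduced = [t for t in terms if t != term]
        # A larger drop in log-likelihood signals a more influential term.
        impact[term] = full_ll - fit_log_likelihood(reduced)
    return impact


# Toy stand-in for a real model fit: each term adds a fixed amount to a
# baseline log-likelihood. These values are made up for illustration only.
TOY_CONTRIBUTION = {"evidence_quality": 6.0, "survival_x_low_ae": 2.0, "alternatives_x_low_ae": 2.0}


def toy_fit(terms: List[str]) -> float:
    return -50.0 + sum(TOY_CONTRIBUTION[t] for t in terms)


importance = drop_one_importance(list(TOY_CONTRIBUTION), toy_fit)
```

In a real analysis the contributions are not additive, so each reduced model has to be re-estimated rather than computed from the full fit.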

The second study question was addressed via a segmented linear probability model to identify any statistically significant inflection points in the likelihood of full approval by reported ICER. We also estimated the specific ICER threshold at which there was a 50% probability of approval.
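To make the threshold calculation concrete, the sketch below evaluates a piecewise-linear probability with a single breakpoint and solves for the ICER at which it equals 50%. The parametrisation (a slope plus a change in slope after the breakpoint) is an assumption about how such a model is commonly written, and the coefficients in the example are illustrative values, not the estimates from this study. ICERs are in units of $Can10,000.

```python
from typing import Optional


def segmented_probability(
    x: float, intercept: float, slope: float, slope_change: float, breakpoint: float
) -> float:
    """Piecewise-linear probability: the slope changes by slope_change after breakpoint."""
    return intercept + slope * x + slope_change * max(0.0, x - breakpoint)


def crossing_point(
    target: float, intercept: float, slope: float, slope_change: float, breakpoint: float
) -> Optional[float]:
    """Return the smallest x at which the fitted line equals target, or None."""
    # First segment: intercept + slope * x = target.
    if slope != 0.0:
        x = (target - intercept) / slope
        if x <= breakpoint:
            return x
    # Second segment: the slope is slope + slope_change beyond the breakpoint.
    second_slope = slope + slope_change
    if second_slope != 0.0:
        x = (target - intercept + slope_change * breakpoint) / second_slope
        if x >= breakpoint:
            return x
    return None


# Illustrative coefficients only: the 50% crossing falls before the breakpoint.
threshold = crossing_point(0.5, intercept=0.9, slope=-0.04, slope_change=0.03, breakpoint=12.0)
```

A linear probability model is not bounded between 0 and 1, so predictions far from the observed ICER range should be read with caution.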

All analyses were performed using R statistical software, version 3.2.3. The binary logit model was estimated using the brglm package, and segmented regression was performed using the segmented package.

3 Results

3.1 Descriptive Statistics

Our review identified 94 unique decisions up to 30 January 2017. Of these, 15 (16%) were fully approved, 55 (59%) were conditionally approved and 24 (26%) were rejected. There were 81 unique drugs represented among these decisions, with 13 submissions for more than one indication and six re-submissions. We were unable to extract a full set of attributes for three recommendations (two did not report an ICER, and we were unable to estimate budget impact for one). These were excluded from the complete case analysis, giving a final sample size of 91 recommendations for the first stage of the analysis: 14 (15%) fully approved, 53 (58%) conditionally approved and 24 (26%) rejected. The 67 full or conditional approvals were analysed in the second stage of the analysis. The frequency distributions of the complete case attributes are shown in Table 1.

3.2 Statistical Analysis

The frequency distributions of the extracted attribute levels and the significance of chi-squared tests of their association with the final recommendation are shown in Table 1. Our full multivariate model specification included all the attributes in Table 1 except overall clinical benefit and the flag for ICER not reported. We excluded overall clinical benefit as it was a summary of the other attributes and was closely correlated with the final decision. We also included four interaction terms: relative survival gain with OS flag, relative survival gain with ICER, relative survival gain with side effects, and available alternatives with side effects. This full specification had an AICc of 111.95.

After systematically testing all 8191 combinations of these variables, the specification that minimised AICc (91.31) included a flag for high-quality clinical evidence, the interaction between relative survival gain and low adverse events, and the interaction between availability of alternatives and low adverse events. This specification correctly predicted 82% of all decisions and 50% of rejections. The coefficients and marginal effects for this preferred specification are summarised in Table 2. A positive coefficient indicates an increase in the likelihood of rejection, and a negative coefficient indicates an increase in the likelihood of approval (full/conditional approval).
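Overall predictive accuracy and accuracy within the rejected class are separate quantities, which is why the two percentages above can differ so much. A minimal sketch of both calculations follows; the function name and the ten toy observations are hypothetical and are not the study data.

```python
from typing import Optional, Sequence


def accuracy(
    actual: Sequence[int], predicted: Sequence[int], only_class: Optional[int] = None
) -> float:
    """Share of correct predictions, optionally restricted to one actual class."""
    pairs = [
        (a, p)
        for a, p in zip(actual, predicted)
        if only_class is None or a == only_class
    ]
    if not pairs:
        raise ValueError("no observations to score")
    return sum(a == p for a, p in pairs) / len(pairs)


# Toy data: 1 = rejected, 0 = not rejected.
actual = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]

overall = accuracy(actual, predicted)
within_rejections = accuracy(actual, predicted, only_class=1)
```

When rejections are the minority outcome, a model can score well overall while identifying only a modest share of them.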

Table 2 Summary of preferred specification: rejection vs. non-rejection (full/conditional approval) (n = 91)

This model suggested that, holding all other factors constant, drugs with high-quality clinical evidence, better relative survival gain, without alternatives, and with low adverse events were less likely to be rejected. The marginal effects suggested that a submission with high-quality clinical evidence was 26% less likely to be rejected than a submission with low-quality clinical evidence. This was the only attribute that had a statistically significant influence on the recommendation. This interpretation is supported by the change in log-likelihood resulting from the exclusion of each parameter. The quality of the clinical evidence appeared to be approximately three times more important than either of the other two variables.

In the second-stage analysis, estimating the likelihood of full versus conditional approval (excluding rejections), the full specification had an AICc of 60.52. A specification that included only the ICER and a flag indicating a low incidence of adverse events minimised AICc (Table 3). This specification had an AICc of 27.87 and correctly predicted 91% of full versus conditional approvals.

Table 3 Summary of preferred specification: full vs. conditional approval (n = 67)

Submissions with a higher ICER were more likely to receive a conditional than a full approval. Each $Can10,000 increase in ICER was associated with a 3.3% decrease in the likelihood of full approval. The ICER was the only statistically significant contributor to the full versus conditional approval recommendation. The impact of low adverse events was not statistically significant. This was supported by the change in log likelihood, which suggested that the exclusion of the ICER had a much greater impact than the exclusion of the adverse events flag.

The predicted probability of full approval by the ICER is plotted in Fig. 1. Submissions with an ICER < $Can87,500 had a > 50% probability of full approval, and there was a sharp inflection point in the probability of full approval at an ICER of $Can140,700 per quality-adjusted life-year (QALY) gained.

Fig. 1 Predicted probability of full approval by incremental cost-effectiveness ratio (ICER) and final recommendation

An implicit maximum acceptable cost-effectiveness threshold around $Can140,000 per QALY gained is consistent with a simple cross-tabulation of recommendations by ICER category, shown in Table 4. All the full approvals had a reported ICER < $Can150,000, whereas the proportion of conditional approvals increased with ICERs > $Can100,000 per QALY.

Table 4 pCODR recommendations by incremental cost-effectiveness ratio
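A cross-tabulation of this kind can be built by assigning each submission to an ICER band and counting recommendations within bands. In the sketch below the band edges, labels and toy submissions are all assumptions for illustration; the categories actually used in Table 4 may differ.

```python
from collections import Counter
from typing import Iterable, Tuple

# Hypothetical bands in $Can per QALY gained: (label, lower bound, upper bound).
BANDS = [
    ("<50k", 0, 50_000),
    ("50k to <100k", 50_000, 100_000),
    ("100k to <150k", 100_000, 150_000),
    (">=150k", 150_000, float("inf")),
]


def icer_band(icer: float) -> str:
    """Return the label of the band containing a non-negative ICER."""
    for label, low, high in BANDS:
        if low <= icer < high:
            return label
    raise ValueError("ICER must be non-negative")


def crosstab(submissions: Iterable[Tuple[float, str]]) -> Counter:
    """Count submissions by (ICER band, recommendation)."""
    return Counter((icer_band(icer), rec) for icer, rec in submissions)


# Toy submissions, not the study data.
toy = [
    (40_000, "full"),
    (90_000, "full"),
    (120_000, "conditional"),
    (160_000, "conditional"),
    (210_000, "reject"),
]
table = crosstab(toy)
```

Negative ICERs (for example, dominant interventions) would need their own category, which this sketch does not handle.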

4 Discussion

The pCODR deliberative framework [5] considers four dimensions of value: clinical benefit, economic evaluation (value for money), patient-based values and adoption feasibility. If all four dimensions are judged as having been met, the drug is recommended for funding. When one (or more) of the four criteria is judged not to have been met, the drug is not recommended for funding. If the unmet criterion could potentially be met through changes to one of the variables considered, the drug is recommended for funding conditional on improving some aspect (most often price). Our characterization of the decision process as a two-stage process is consistent with this framework: an initial decision on whether or not to reject a submission outright and, given non-rejection, a decision on whether to grant full or conditional approval.

Our first model suggested that pCODR’s decision to reject versus approve (fully or conditionally) a submission is driven almost exclusively by the clinical profile (quality of the clinical evidence, relative survival gains and the incidence of adverse events) and the consideration of alternatives. This finding is consistent with other literature that demonstrates the influence of clinical uncertainty [28, 33] and clinical superiority [28,29,30, 32, 40] on funding recommendations in Belgium, Canada, Poland, the UK, and Wales. The number of alternative treatments has also been shown to be important in Australia and Belgium [32, 43].

Value for money, in the form of cost per QALY gained, did not appear to play a role in the initial decision to reject or not reject, but it was a key factor in the decision over full versus conditional approval (in non-rejected cases). There was evidence of a maximum cost-effectiveness threshold around $Can140,000 per QALY gained, with no submissions beyond this threshold receiving full approval. Submissions that reported a low incidence of adverse events were also more likely to receive full approval, although this effect was not statistically significant.

The factors retained in the two preferred specifications appeared to represent three of the four dimensions considered by pCODR: clinical (quality of evidence, relative survival gain, adverse events), economic (ICER) and patient values (availability of alternatives). Adoption feasibility, particularly as represented in terms of budget impact, did not appear in either model. This may reflect that budget implications are province specific and likely play a much greater role in the provincial decision to fund a drug. At the national review level, it is difficult to accurately account for provincial budget considerations.

Notwithstanding consideration of the availability of alternatives in pCODR recommendations, patient values are still largely neglected in the review process. Patient input to submissions often expresses a willingness to accept greater risks of adverse events in exchange for longer survival or more treatment options. Some consideration of a greater willingness to accept risk, or (presumably) a greater willingness to pay for health gains, would arguably be a more relevant expression of patient values than a simple count of treatment alternatives, but neither is included in the current review process. Appropriate methods for measuring and incorporating such patient values and preferences merit further research.

Subjectivity in coding the qualitative factors, and a relative lack of variability in many of them, are key limitations of the study. As noted, the decision summaries were largely descriptive, and interpretations of aspects such as unmet need, the severity of adverse events, the quality of clinical and economic evidence, and patient values were unavoidably subjective—for us as well as for the pCODR committee members. This risk was managed with standard qualitative coding techniques, including the consistent application of a priori coding criteria (emerging from the initial sample of ten reports), the use of two independent reviewers, and the resolution of conflicts in consultation with a third reviewer. The statistical analysis was also limited by a relatively small number of observations, which may lead to a sparse contingency table and unrealistic parameter estimates. This is mitigated by our use of a penalized model and by emphasising the change in log-likelihood rather than marginal effects as a measure of relative importance.

5 Conclusion

Among the four dimensions of value highlighted in pCODR guidelines (clinical, economic, adoption feasibility, and patient values), clinical aspects appeared to carry the greatest weight in the decision to reject or not reject, along with aspects of patient value (treatments with no alternatives were less likely to be rejected). Cost effectiveness did not appear to play a direct role in the initial decision to reject or not reject but was critical in full versus conditional approvals. There was also evidence of a maximum acceptable threshold around $Can140,000 per QALY gained. These results are plausible and have a face validity consistent with anecdotal descriptions of the pCODR review process. Together, they suggest there is an identifiable set of factors driving pCODR decisions, supporting the consistency of the review process despite the absence of explicit decision weights or thresholds. However, the implicit nature of the review process, and the difficulty of extracting and interpreting some of the attribute levels used in the analysis, suggest that the process may still lack transparency.

Data Availability Statement

The R dataset used in the statistical analysis is available via Figshare at https://figshare.com/articles/Cleaned_pCODR_decision_summaries/5759646. The R code used in the analysis is available from the authors upon request.