Key Points

A framework to assess the potential benefit of a new drug based on real-world data and its target product profile.

The approach allows for early data-driven portfolio decisions to select drug candidates based on their expected cost savings.

A worked example is included.

1 Introduction

For a new drug to be developed, the desired properties are described in a target product profile [1]. This target product profile states the expected efficacy (“intended”) and safety (“unintended”) outcomes. For example, as compared with the current standard treatment, the target product profile could state that a reduction of 30% in efficacy outcome A (e.g., myocardial infarction) and of 20% in efficacy outcome B (e.g., ischemic stroke), at the potential risk of an increase of 20% in an adverse event C (e.g., gastrointestinal bleeding), is expected [2]. This is a typical benefit/risk trade-off and, for the purposes of this paper, we assume that this trade-off yields a net positive clinical effect for patients. This is, of course, an assumption that needs to be considered carefully and with input from all relevant stakeholders; should this assumption not hold, a wholesale reconsideration of the development program would be clearly warranted.

For payers (governments or commercial entities making formulary and other financial decisions), the question is how such a product profile translates into benefit and risk for patients and into costs for the healthcare system. On the cost question, payers ask whether the additional costs of the new drug and any incremental risk of adverse events will be offset by savings attributable to the drug's efficacy. This evaluation of costs is done in comparison with an existing standard of care, whose costs may be unknown but are generally measurable with existing data sources (e.g., healthcare claims). This cost evaluation is a critical step, as investment decisions in drug development are often (though not always) made in favor of drugs that offer clear advantages over existing treatments with respect to benefits, risks, and/or costs. Among payers, health technology assessment agencies, and formulary decision makers, only drugs with clear benefit/risk and benefit/cost advantages will be viewed positively and ultimately accepted for reimbursement.

We propose a framework for using real-world data (RWD) to (1) measure the disease-specific costs of the current standard of care, (2) project the costs of the proposed new product, and (3) calculate the resulting cost differential. The approach employs retrospective RWD to characterize the observed healthcare costs for a cohort of patients receiving the existing standard of care. Recognizing that the new product may not be suitable for all patients using the current standard (for example, because their disease is less severe than that of the patients for whom the new medication is intended, or because they have contraindications to the new drug), the approach then takes a weighted sample from this baseline cohort to create a new cohort of patients who fit the target profile of the hypothetical drug under development and in whom hypothesized incremental effects on efficacy and safety can be estimated. This new sampled cohort and the baseline cohort representing the standard of care are then used to estimate differential disease-specific costs for the hypothetical treatment at both the population and the patient level.

The use of sampling and reweighting methods to generate a “pseudo-population” that reflects the characteristics of a target population has a rich history in the epidemiology and causal inference literature. In these settings, the objective is to generate a pseudo-population that reflects the covariate distributions of a target population to improve internal and external validity when estimating treatment effects (e.g., confounding control or generalizing study results) [3,4,5,6,7,8,9,10]. For these purposes, a large number of variables are often considered when reweighting the original cohort, which is frequently achieved through various propensity score weighting/resampling approaches [7, 9,10,11,12].

With that said, the objectives for generating a pseudo-population for the purposes described here differ from those of causal inference. Instead of generating a pseudo-population that mimics the covariate distributions of a target population, the sampling objective here is simply to generate a pseudo-population in which the expected number of outcome events reflects the expectations described in the target product profile for a new drug. For simplicity, we chose to perform the sampling within outcome strata of the pseudo-population so that these targeted outcome expectations are matched.

Here, we describe a framework that samples from a cohort of patients representing the current standard of care to generate a hypothetical cohort of patients that fits a given target product profile. We then illustrate the sampling framework using an example based on RWD from patients with incident heart failure. With the proposed framework, we seek to address two audiences. For developers of medical products, we seek to provide a framework for data-driven decision making on the allocation of investments in new products. For governmental and commercial payers, we seek to simplify cost-effectiveness analyses of a new drug compared with the standard of care and to augment such analyses with assumptions drawn from actual patient experience. Compared with traditional cost-effectiveness analyses, the proposed framework is more flexible, as it does not require a pre- versus post-treatment comparison within a clinical trial. It further requires minimal assumptions about healthcare costs when assessing cost effectiveness, as these are derived directly from RWD.

2 Methods

The sampling approach for generating the hypothetical cohort, for which healthcare costs are then determined and compared, differs according to the number of outcomes defined in the target product profile. In the following, the cases of one, two, and three outcome variables are described.

2.1 Generating the Hypothetical Sample for One Outcome Variable

For one outcome variable, e.g., A, we observe the proportions in the first row of Table 1 from the real-world population. Based on the target product profile, a reduction of 30% for outcome A is envisaged. Thus, a hypothesized sample as shown in the second row of Table 1 is needed. As all the individual cell proportions are known in this case, a sample with the hypothesized margins can be generated directly by stratified sampling among the cases with and without outcome A, as sketched below.

Table 1 Observed and hypothesized proportions in real-world data for one outcome variable
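To make the stratified sampling concrete, the following R sketch resamples a cohort so that the event proportion is reduced by 30%. The data frame, the column names (outcome_A, cost_per_day), and the simulated toy data are illustrative assumptions, not the study database or the software used in the worked example.

```r
## Minimal sketch (R): stratified sampling for one outcome variable.
## 'cohort', 'outcome_A', and 'cost_per_day' are hypothetical placeholders.
set.seed(2024)
cohort <- data.frame(
  outcome_A    = rbinom(10000, 1, 0.10),          # assumed 10% event rate
  cost_per_day = rgamma(10000, shape = 2, scale = 5)
)

hypothetical_sample_one <- function(df, outcome, reduction, n = nrow(df)) {
  p_obs   <- mean(df[[outcome]])                   # observed event proportion
  n_event <- round(n * (1 - reduction) * p_obs)    # hypothesized stratum sizes
  n_none  <- n - n_event
  events  <- df[df[[outcome]] == 1, , drop = FALSE]
  none    <- df[df[[outcome]] == 0, , drop = FALSE]
  rbind(events[sample(nrow(events), n_event, replace = TRUE), ],
        none[sample(nrow(none), n_none, replace = TRUE), ])
}

hypo <- hypothetical_sample_one(cohort, "outcome_A", reduction = 0.30)
mean(hypo$outcome_A) / mean(cohort$outcome_A)      # approx. 0.70, per the target profile
```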

2.2 Generating the Hypothetical Sample for Two Outcome Variables

For two outcome variables, consider Table 2, which shows the distribution of the outcomes in the real-world population. Suppose that the hypothesized effects include a reduction of the rate of A by 30% and a reduction of the rate of B by 20%. To obtain the marginal distribution of the outcomes in a hypothetical population under these constraints, the hypothetical effect sizes are applied to the margins of Table 2 as illustrated in Table 3. Unlike the case of one outcome variable, the table cells e, f, g, and h can no longer be calculated from the margins because the matrix in Eq. (1) has only rank 3 and can therefore not be inverted. In other words, the solutions for the cell values for e, f, g, and h are not unique. For small total sample sizes, all possible solutions of Eq. (1) can be determined numerically. However, to get a unique solution, additional constraints need to be applied:

$$\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 1 \end{pmatrix}\begin{pmatrix} e \\ f \\ g \\ h \end{pmatrix} = \begin{pmatrix} 0.8\,(a + b) \\ 0.7\,(a + c) \\ 0.3\,(a + c) + (b + d) \\ 0.2\,(a + b) + (c + d) \end{pmatrix}.$$
(1)
Table 2 Observed proportions in real-world data for two outcome variables
Table 3 Hypothesized proportions in real-world data for two outcome variables

For example, for a sample size of 100 and the margins given in Table 4, there are a total of seven solutions (i.e., possible cell count combinations). These solutions are given in Table 5; a sketch of the enumeration follows below. In this example, the event frequencies vary widely across solutions: 0–6% for outcome A and 2–8% for outcome B. Since cost is strongly correlated with outcome events, any cost estimate would depend heavily on the particular solution of Eq. (1) that is chosen. Thus, we need to define an additional constraint on the cell frequencies e, f, g, and h in Table 3 to obtain a unique solution.

Table 4 Example with a total count of 100
Table 5 Solutions to example in Table 4 with a total count of 100
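The enumeration of all feasible integer tables can be sketched in R as follows. The margins in the example call (6 patients with outcome A and 8 with outcome B out of 100) are assumptions chosen for illustration and should be replaced by the hypothesized margins of Table 4; the cell labels are generic and do not necessarily correspond to the e, f, g, h layout of Table 3.

```r
## Minimal sketch (R): enumerate all integer 2x2 tables that satisfy given margins.
enumerate_solutions <- function(n, n_A, n_B) {
  both <- 0:min(n_A, n_B)                        # candidate counts with both outcomes
  sol  <- data.frame(both    = both,
                     A_only  = n_A - both,
                     B_only  = n_B - both,
                     neither = n - n_A - n_B + both)
  sol[sol$neither >= 0, ]                        # keep only non-negative tables
}

enumerate_solutions(n = 100, n_A = 6, n_B = 8)   # assumed margins for illustration
```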

Huber [13] provides a way to identify unique cell frequencies by constraining the odds ratio of the cell frequencies in the sampled population,

$$\text{OR} := \frac{e/f}{g/h},$$
(2)

to equal the corresponding odds ratio observed in the original population. Alternatively, one can of course fix any one of the four cell frequencies e, f, g, and h in Table 3. In the latter case, the calculation of the remaining three cell frequencies in Table 3 is straightforward, whereas the calculation given the odds ratio is more complicated and is laid out in the Appendix.

Lastly, one could also assume that the occurrences of the two events are independent of each other. In this special case, the determination of the cell counts is straightforward, as the odds ratio equals 1 by definition. Under independence of the two outcomes A and B, an even simpler way to obtain the cell frequencies is to distribute the margin counts of, for example, each row according to the assumed column proportions. Needless to say, two efficacy outcomes are almost always correlated rather than independent. Thus, the researcher must define a meaningful constraint, be it for the odds ratio or for one of the table cells.
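As a sketch of how such a constraint yields a unique solution, the following R code solves for the joint cell proportion given hypothesized margins and a fixed odds ratio, e.g., the odds ratio observed in the original population. This is a generic numerical construction under the stated assumptions, not a reproduction of the derivation in the Appendix, and all input values are illustrative.

```r
## Minimal sketch (R): unique 2x2 cell proportions from hypothesized margins and a fixed odds ratio.
cells_from_or <- function(m_A, m_B, odds_ratio) {
  ## m_A, m_B: hypothesized marginal proportions of outcomes A and B
  g <- function(e) e * (1 - m_A - m_B + e) - odds_ratio * (m_A - e) * (m_B - e)
  e <- uniroot(g, c(max(0, m_A + m_B - 1), min(m_A, m_B)))$root  # proportion with both outcomes
  c(both = e, A_only = m_A - e, B_only = m_B - e, neither = 1 - m_A - m_B + e)
}

## e.g., margins after 30%/20% reductions of assumed original rates of 10% and 8%,
## holding an assumed odds ratio of 2 taken from the original population
cells_from_or(m_A = 0.7 * 0.10, m_B = 0.8 * 0.08, odds_ratio = 2)

## an odds ratio of 1 recovers the independence case, i.e., both = m_A * m_B
cells_from_or(m_A = 0.07, m_B = 0.064, odds_ratio = 1)
```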

2.3 Generating the Hypothetical Sample for Three Outcome Variables

The next case involves three outcome variables. Consider Table 6, which represents the real-world population. From Table 6, we obtain the following four summation equations:

$$a + c + e + g = o_{A} \quad {\text{ with }}o_{A} {\text{ the proportion of }}A$$
(3)
$$a + b + e + f = o_{B} \quad {\text{ with }}o_{B} {\text{ the proportion of }}B$$
(4)
$$a + b + c + d = o_{C} \quad {\text{ with }}o_{C} {\text{ the proportion of }}C$$
(5)
$$a + b + c + d + e + f + g + h = 1.$$
(6)
Table 6 Real-world data for three outcome variables

Similar to the case of two outcome variables, there is no unique solution: either several of the individual cell proportions need to be fixed or independence must be assumed to arrive at a unique solution. In our example, we assume independence of the safety outcome C from both efficacy outcomes A and B. Hence, the problem of defining the eight cell frequencies in Table 6 reduces to solving the two-outcome problem of Table 2 separately within stratum C and stratum not C.
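Reusing the cells_from_or helper from the sketch above, and again using illustrative rates, the reduction to two within-stratum problems can be written as follows; under independence of C, the same A/B cell proportions apply within stratum C and stratum not C.

```r
## Minimal sketch (R): three outcomes, with safety outcome C assumed independent of A and B.
p_C <- 1.1 * 0.05                               # assumed original 5% rate of C, increased by 10%

cells_AB <- cells_from_or(m_A = 0.7 * 0.10,     # A/B cells from the two-outcome solution above
                          m_B = 0.8 * 0.08,
                          odds_ratio = 2)

## the eight cell proportions of Table 6: A/B cells split across stratum C and stratum not C
cells_three <- c(cells_AB * p_C, cells_AB * (1 - p_C))
names(cells_three) <- c(paste0(names(cells_AB), "_C"), paste0(names(cells_AB), "_notC"))
cells_three
sum(cells_three)                                # equals 1
```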

2.4 Methods for the Worked Example

2.4.1 Data Source

The data source for this study was the Optum Clinformatics Data Mart (CDM), which is a US health insurance claims database that includes longitudinally linked patient records. All patients in the Optum CDM database who met the inclusion criteria were included in the study population. Data in the Optum CDM are collected in an observational manner such that the management of the patient is determined by the patient and the caregiver and represents care as it is provided in routine clinical practice.

2.4.2 Cohort

We generated a cohort of patients with incident heart failure using appropriate International Classification of Diseases, Ninth Revision (ICD-9) and Tenth Revision (ICD-10) codes. This subset of the Optum CDM database included all relevant parameters needed, including medical history, baseline characteristics, outcome A, outcome B, outcome C, and disease-specific costs. Subgroups were defined using the information from the medical history and baseline characteristics.

2.4.3 Hypothetical Sample

We developed software in the R programming environment to generate the hypothetical sample as described above and to perform all analyses required to estimate absolute risk reductions (overall and in subgroups) and potential cost savings based on the risk reductions resulting from the hypothesized effects. If the assumptions in the target product profile are correct, the newly derived database, i.e., the original plus the hypothetical sample, should mirror the claims database a couple of years after launch.
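A minimal sketch of the two final steps, resampling the original cohort to the hypothesized cell counts and comparing median disease-specific costs per day, is given below. It is not the software used in this study; the outcome and cost column names and the format of the target counts are assumptions.

```r
## Minimal sketch (R): build the hypothetical cohort from target cell counts and compare costs.
## Assumes 0/1 columns outcome_A, outcome_B, outcome_C and a cost_per_day column;
## each outcome combination is assumed to be represented in the original cohort.
build_hypothetical_cohort <- function(df, target_counts) {
  ## target_counts: named counts per outcome combination, e.g. c("1_1_0" = 25, "0_0_0" = 8800, ...),
  ## derived from the hypothesized cell proportions as in Sects. 2.1-2.3
  stratum <- paste(df$outcome_A, df$outcome_B, df$outcome_C, sep = "_")
  do.call(rbind, lapply(names(target_counts), function(s) {
    pool <- df[stratum == s, , drop = FALSE]
    pool[sample(nrow(pool), target_counts[[s]], replace = TRUE), , drop = FALSE]
  }))
}

## cost saving per patient per day implied by the target product profile
## (additional drug costs of the new product are not yet included)
cost_saving_per_day <- function(original, hypothetical) {
  median(original$cost_per_day) - median(hypothetical$cost_per_day)
}
```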

3 Results

The described methodology generated cell frequencies and proportions for the hypothetical cohort based on the respective numbers from the original cohort (see Table 7). Based on the assumed hypothetical treatment effect in this example (a 30% reduction in efficacy outcome A, a 20% reduction in efficacy outcome B, and a 10% increase in safety outcome C), the absolute risk and event reductions were estimated (see Table 8). The median cost per day for one patient was estimated at $10.37 in the original cohort and $8.39 in the hypothetical cohort. This means that the assumed target product profile would result in cost savings of $1.98 per patient per day, not accounting for any additional drug costs.

Table 7 Patient counts by outcome
Table 8 Risk and event reductions

4 Discussion

For the development of a new drug, the desired properties of the compound are described in a target product profile, which outlines the expected efficacy and safety outcomes. In this paper, we presented a framework that uses RWD to determine how such a product profile would translate into cost savings for the healthcare system, in order to inform portfolio decisions. The approach estimates these cost savings by creating a hypothetical sample from RWD that is based on the target product profile.

Generating a unique hypothetical sample from RWD with no additional assumptions other than those provided in the target product profile is only possible in the case of one outcome variable. For two or more outcome variables, additional constraints or the assumption of independence between the outcome variables is required. While independence between two efficacy outcome variables would appear to be unrealistic, it can be envisaged that, for example, an efficacy and a safety variable are independent. That is, the occurrence of a side effect is unrelated to whether the drug is efficacious in a given patient. Our approach can easily be extended to analyze subgroups of the intended target population. This could, for example, help to predict potential randomized controlled trial subpopulations with a favorable relationship between anticipated absolute risk reduction and size of the respective subgroup. In combination with available evidence of the value of the identified subgroups, the approach might help to justify the inclusion of the subgroups in the statistical analysis plan of a randomized controlled trial.

A limitation of our approach to estimating potential cost savings is that we require all relevant outcomes to be available in a real-world database. While this holds true for hard endpoints such as hospitalization or myocardial infarction, it is not always the case for softer endpoints such as pain. Further, when sampling for multiple outcomes, the sampling framework requires investigators to specify the relationship between the outcomes to obtain a unique solution. Decisions on which constraints to implement when specifying these relationships are subjective, and findings can depend on the constraints specified. Two solutions to this could be as follows: (1) applying an integer linear programming optimization to maximize the cost reduction and (2) providing a range of cost reductions by looping through all possible solutions and estimating the median along with the upper and lower bounds of the calculated cost reductions. However, these enhancements are beyond the scope of this manuscript, and we leave them to future research.

5 Conclusion

We have presented a simple approach to assessing the potential absolute clinical and economic benefit of a new drug based on RWD and its target product profile. The approach allows for early data-driven portfolio decisions to select drug candidates for development based on their expected cost savings. One application of the described approach is to assess the relative value of different subgroups, which may support evidence-driven decisions on portfolio candidates, research and development plans, and market access strategies. Potential future extensions of the methodology should be explored and may include RWD-based estimations of incremental cost-effectiveness ratios or indirect comparisons.