Key Points

The transportability of comparative effectiveness evidence generated using data from other countries has become a key concern of health technology assessment agencies.

Methods have been developed to measure and adjust for a lack of transportability, but their consideration in health technology assessment is limited.

This article provides an introduction to transportability and related methods, and discusses important considerations regarding their use in health technology assessment settings.

1 Introduction

Health technology assessment (HTA) agencies commonly express a preference for evidence from randomised controlled trials (RCTs) to guide decision making, citing that randomisation eliminates non-chance confounding from unmeasured patient characteristics [1,2,3]. Conditional on no loss to follow-up and no measurement error, blinded randomisation in large samples ensures RCTs produce unbiased estimates of comparative efficacy within the trial sample, i.e. effects that are internally valid [4].

However, evidence-based decision making also requires that evidence is externally valid, i.e. that treatment effects from a study represent unbiased estimates of efficacy in a specific target population of interest. When study populations are not randomly selected from a target population, external validity is more uncertain and it is possible that distributions of effect modifiers (characteristics that predict variation in treatment effects) differ between the trial sample and target population [5]. Trial samples rarely represent random samples of the target population, often having strict inclusion/exclusion criteria, targeting patients with the highest risk of an outcome, and containing a selected sample of patients agreeing to participate [6].

Studies using real-world data (RWD) are increasingly common sources of comparative-effectiveness evidence in both regulatory and HTA submissions, particularly in rare cancers and other diseases, where conducting well-powered RCTs is difficult [7,8,9]. Although the absence of randomisation makes internal validity less likely, RWD studies are often conducted in samples of patients treated in clinical practice, which are likely to be more representative of a decision-maker’s target population.

However, external validity does not just refer to generalisability but also to transportability [6]. Whereas generalisability relates to whether inferences from a study can be extended to a target population from which the study dataset was sampled, transportability relates to whether inferences can be extended to a separate (external) population from which the study sample was not derived [10]. The availability and quality of RWD vary substantially by country, meaning that submissions to some HTA agencies often include evidence from patients residing in other countries. Similarly, in many countries, new therapies will not be available to patients prior to reimbursement, meaning local RWD sources will not collect data on all treatments of interest, and manufacturers must use RWD from countries where the therapy is already adopted [11]. The transportability of evidence from other countries, estimated either from RCTs or RWD, has therefore become a key concern of HTA agencies [12], driven for example by cross-country differences in potential effect modifiers such as disease characteristics, comparator therapies and treatment settings, and has led to a preference for local data [1, 13].

When evidence from other countries is used to inform comparative effectiveness (or efficacy), consideration of transportability in study design and analysis should be a key step. Methods have been developed to correct for a lack of transportability [14]. However, these methods have received little attention in an HTA setting. This may be because of limited knowledge of the methods and/or uncertainties in how best to utilise them within existing HTA frameworks.

Existing studies have provided detailed technical information on these methods [14, 15]. This article aims to provide a concise summary of methods available for identifying and correcting for a lack of transportability, with a specific focus on transporting evidence from either RCTs or RWD and transporting evidence across countries. We also extend existing studies by discussing important considerations regarding their use in HTA settings. Although we focus specifically on transporting results across countries, we note that HTA agencies may also be interested in transporting inferences across patient subgroups and treatment settings. Similar considerations will also be relevant when transporting results across regions within the same country. We also focus on transporting estimates of relative treatment effects. However, RWD studies in particular may be used for a variety of other purposes in HTAs, for example deriving model inputs such as utilities relating to disease states, in which case the transportability of absolute rather than comparative effects is the concern.

2 Methods for Transportability

2.1 Key Assumptions

Typically, decision makers are interested in the target population average treatment effect (PATE): the average effect that would be observed if all individuals in the target population were assigned the treatment rather than the comparator. However, researchers commonly have access only to a sample and must estimate the study sample average treatment effect (SATE).

Westreich et al. define the ability of the SATE to approximate the PATE as “target validity”, and deviations from the PATE as “target bias” [6]. These deviations can occur because of threats to internal validity, such as confounding, and because of threats to external validity (Fig. 1).
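For concreteness, the two estimands can be written in potential-outcomes notation. The notation is an assumption introduced here for illustration and is not taken from the cited sources: $Y^{a}$ denotes the outcome a patient would have under treatment $a$ (1 for the treatment, 0 for the comparator), $S = 1$ indicates membership of the study sample and $S = 0$ indicates membership of a separate target population.

```latex
\begin{align*}
\text{PATE} &= \mathbb{E}\left[\, Y^{1} - Y^{0} \mid S = 0 \,\right], \\
\text{SATE} &= \mathbb{E}\left[\, Y^{1} - Y^{0} \mid S = 1 \,\right].
\end{align*}
```

Written this way, the two quantities differ only in the population over which the individual-level contrast $Y^{1} - Y^{0}$ is averaged.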

Fig. 1 Drivers of the differences between the sample average treatment effect (SATE) and the population average treatment effect (PATE). Adapted from Degtiar and Rose [14]. External validity bias and internal validity bias can be positive or negative and may bias the SATE in different directions. Sampling variability may also cause deviation from the PATE

Transportability is a specific form of external validity, where the study sample is not a subset of the target population (Fig. 2), for example, when extending inferences from a sample of patients from one country to patients in a different country. This differs from generalisability, where the study sample is a subset of the target population, for example, if extending inferences from a sample of patients with a given condition to all patients with that condition in the same country.

Fig. 2 Relevant study and target populations in the case of generalisability and transportability

In addition to data on the study sample, methods to identify and adjust for a lack of transportability require a sample of data on the target population (the “target sample”). The key assumptions required to identify the PATE under transportability are outlined in Box 1 [14, 16, 17]. Standard assumptions for internal validity are required, including: (1) no differences in unmeasured outcome predictors (confounders) between treatment arms (conditional treatment exchangeability); (2) overlap in the characteristics of measured outcome predictors between treatment arms (positivity of treatment assignment); and (3) that the outcomes of individuals do not depend on the treatments received by others (stable unit treatment value assumption). Analogues of these assumptions for external validity are then required, and relate to differences between the study and target samples (rather than between treatment arms).

The conditional mean difference exchangeability of study selection assumption will often be most crucial, and will be the focus of the analysis. This includes identification and adjustment for observed effect modifiers, and arguments over the plausibility of no unmeasured effect modification. Where treatment effects are estimated from RWD, the plausibility of no unobserved confounding is also key. Whilst transporting absolute effects is not the focus of this paper, no unobserved confounding is also required for external validity in this case.
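As a sketch of how these assumptions combine, the identification argument can be written as follows. The notation is an assumption made here for illustration: $X$ is the set of measured effect modifiers, $A$ the treatment received, $Y$ the observed outcome, $Y^{a}$ the potential outcome under treatment $a$, $S = 1$ the study sample and $S = 0$ the target population.

```latex
\begin{align*}
&\text{Conditional mean difference exchangeability of study selection:} \\
&\qquad \mathbb{E}\left[\, Y^{1} - Y^{0} \mid X, S = 1 \,\right]
      = \mathbb{E}\left[\, Y^{1} - Y^{0} \mid X, S = 0 \,\right]. \\[6pt]
&\text{Combined with the internal validity assumptions in the study sample:} \\
&\qquad \text{PATE}
      = \mathbb{E}\Big[\, \mathbb{E}\left[\, Y \mid A = 1, X, S = 1 \,\right]
                       - \mathbb{E}\left[\, Y \mid A = 0, X, S = 1 \,\right]
        \;\Big|\; S = 0 \,\Big].
\end{align*}
```

The outer expectation averages the study-sample conditional effects over the distribution of $X$ in the target population, which is why data on $X$ from a target sample, and overlap in $X$ between the two samples, are needed.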

Box 1 Assumptions for identifying the PATE under transportability

To explain these concepts, a hypothetical example is provided in Box 2.

Box 2 Hypothetical example illustrating the assumptions for identifying the PATE under transportability

2.2 Overview of Key Methods

Methods for identifying and correcting for barriers to transportability are summarised in Fig. 3, based on information in existing published reviews [14, 15]. Detailed information on these methods is provided in the Appendix. Because the burden of evidence generation in HTA submissions falls on manufacturers (who will commonly have access to the individual patient-level data [IPD] from the study sample), we focus on methods where IPD are available for the study sample and where either IPD or summary data are available from a target sample. However, we note that alternative methods such as meta-analytic techniques can be used when only summary data are available on both samples [14].

Fig. 3 Methods of identifying and adjusting for barriers to transportability. IO inverse odds, IPD individual patient data, ML machine learning, SMD standardised mean difference

Identifying transportability involves testing its assumptions, and the methods available depend on whether outcome data are available in the target sample. Where outcome data are unavailable, the requirements underlying the conditional mean difference exchangeability of study selection assumption are tested in two steps: (1) assessing differences in the distributions of characteristics between study and target populations and (2) identifying whether the characteristics driving these differences are effect modifiers (i.e. explain treatment effect heterogeneity). Differences in distributions can also be used to test for potential violations of the positivity of selection. Where outcome data are available, the degree of transportability can be measured directly by testing for unobserved effect modification using outcome or treatment effect differences between study and target samples after adjustment for observed effect modifiers.
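The first of the two steps can be illustrated with a standardised mean difference for a single characteristic. The following is a minimal sketch, assuming IPD on a continuous characteristic in both samples; the function name, the pooled standard deviation convention and the age values are hypothetical choices made for illustration, not taken from the cited reviews.

```python
import math
import statistics


def standardised_mean_difference(study_values, target_values):
    """Difference in means divided by the pooled standard deviation.

    The pooled SD used here is the square root of the average of the two
    sample variances, which is one common convention among several.
    """
    mean_study = statistics.mean(study_values)
    mean_target = statistics.mean(target_values)
    var_study = statistics.variance(study_values)    # sample variance
    var_target = statistics.variance(target_values)  # sample variance
    pooled_sd = math.sqrt((var_study + var_target) / 2)
    return (mean_study - mean_target) / pooled_sd


# Hypothetical ages (years) in the study and target samples.
study_age = [52, 58, 61, 63, 66, 70]
target_age = [60, 65, 68, 71, 74, 79]

smd_age = standardised_mean_difference(study_age, target_age)
print(f"SMD for age: {smd_age:.2f}")  # negative: study patients are younger
```

A large absolute value flags age as a candidate driver of differences between samples; whether that matters for transportability still depends on the second step, i.e. whether age modifies the treatment effect.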

Accounting for a lack of transportability involves adjustment to ensure similarity in observed effect modifiers between study and target samples. Methods depend on whether IPD are available on characteristics of the target sample. Where IPD are available, analogues of methods used to adjust for observed confounding can be employed, including outcome regression-based methods, matching, stratification, inverse odds of participation weighting and doubly robust methods combining matching/weighting with regression adjustment. Where only aggregate data are available, methods including those employed for population-adjusted indirect treatment comparisons, such as matching-adjusted indirect comparisons, can be used.
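To make the weighting idea concrete, the sketch below computes inverse odds of participation weights in the simplest possible case: a single categorical effect modifier, with the participation odds estimated nonparametrically from stratum counts. This is an illustrative simplification; in practice the odds would usually come from a fitted participation model (for example, a logistic regression on several covariates). All names and counts are hypothetical.

```python
from collections import Counter


def inverse_odds_weights(study_strata, target_strata):
    """Return one weight per study patient.

    For a patient in stratum x the weight is the estimated odds of being in
    the target sample rather than the study sample, which with stratum
    counts reduces to n_target(x) / n_study(x).
    """
    n_study = Counter(study_strata)
    n_target = Counter(target_strata)
    return [n_target[x] / n_study[x] for x in study_strata]


def weighted_share(strata, weights, level):
    """Weighted proportion of patients whose stratum equals `level`."""
    total = sum(weights)
    return sum(w for x, w in zip(strata, weights) if x == level) / total


# Hypothetical samples: 20% of study patients are high risk versus 60% of
# target patients.
study = ["low"] * 80 + ["high"] * 20
target = ["low"] * 40 + ["high"] * 60

weights = inverse_odds_weights(study, target)
share_high_after = weighted_share(study, weights, "high")
print(f"Weighted share of high-risk study patients: {share_high_after:.2f}")
```

After weighting, the study sample reproduces the target distribution of the effect modifier. A stratum present in the target sample but absent from the study sample cannot be reweighted at all, which is the positivity of selection problem noted earlier.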

A key assumption of all methods is no unmeasured effect modification, and sensitivity analyses are available to assess the extent to which results are sensitive to deviations from this assumption [18, 19]. The appropriate method depends on whether effect modifiers are unmeasured in the target sample only or in both the study and target samples.

Finally, many of these methods have been developed for transporting treatment effects from RCTs, but can also be applied to comparative-effectiveness studies using RWD. This can involve pre-processing study sample data using weighting/matching to balance observed confounders prior to applying transportability methods [19], or using bespoke methods that generate weights to simultaneously correct for confounding and transportability [20].

3 Considerations for the Application of Transportability Methods for HTA

In the following, we discuss several considerations for the application of transportability methods in an HTA context (Table 1), building on good practices outlined in the epidemiological literature [14, 15].

Table 1 Summary of the key considerations for the application of transportability methods in HTAs

3.1 Choice of Target Population and Estimand

Discussions regarding the relevant target population(s) should be driven by the decision the study aims to inform. Where HTA agencies make decisions over national reimbursement, the relevant target population will be the whole population of individuals with a disease. However, where HTA agencies make decisions over regional reimbursement or reimbursement within certain subgroups (e.g. patients with high-risk genetics), national disease-wide populations will not be appropriate target populations.

The decision an analysis aims to inform should also guide the appropriate estimand. Decision makers will typically be interested in the PATE; however, if treatment discontinuation or switching is substantial or is not expected to reflect routine practice or outcomes, other estimands may be appropriate.

Following the choice of estimand, investigators must clearly outline the assumptions required to identify it. This will guide the choice of methods and allow HTA committees to judge the plausibility of the assumptions.

3.2 Identification of Effect Modifiers

Potential effect modifiers should be identified during study design [15]. Four classes of effect modifiers should generally be considered: patient/disease characteristics (e.g. biomarker prevalence), setting (e.g. location of and access to care), treatment (e.g. timing, dosage, comparator therapies, concomitant medications) and outcomes (e.g. follow-up or timing of measurements) [14, 21]. Beal et al. have derived a checklist of effect modifiers specifically relevant to the transportability of data from US electronic health records for HTA use cases in oncology [22]. Although this checklist is designed for oncology, many of its components (such as differences in baseline demographics, disease and biomarker prevalence, disease assessment frequency, concomitant medications and access to supportive care) are relevant in the majority of disease areas. However, development of checklists for other specific disease areas is warranted.

Directed acyclic graphs represent a simple and transparent approach to identifying potential causal relationships between variables [23], and may be useful both in identifying effect modifiers and in communicating sources of transportability bias to HTA agencies. This should be supplemented with systematic literature searches and elicitation of expert opinion. As similar processes are recommended by HTA agencies to identify potential confounders [1, 24], we recommend that confounders and effect modifiers are identified concurrently when a RWD study is being used to estimate comparative effectiveness. Once study and target samples have been selected, the two-step approach used to identify transportability outlined above could be used to identify effect modifiers relevant to the particular study setting.

3.3 Choice of Study Sample

The choice of the study sample (e.g. between a RCT and a RWD source) should be guided by a trade-off between internal validity and external validity. Where a RWD source is considered, the choice of a specific RWD source could be informed by completion of tools such as EUnetHTA’s REQueST tool [25] or the Data Suitability Assessment Tool (DataSAT) produced by the National Institute for Health and Care Excellence [26], which assess data quality, suitability of use for HTA purposes and appropriateness for answering a specific research question. This should be supplemented with a checklist similar to that produced by Beal et al. [22].

Where possible, a study sample should be chosen that both has characteristics ensuring internal validity and draws samples directly from the target population (country), avoiding problems with transportability (e.g. RCTs conducted within the target population or high-quality local RWD). Where prospective studies (either trial or RWD) are used, these should be designed to ensure generalisability of the study sample, for example through pragmatic RCTs or purposeful sampling [14], or, where this is not possible, designed so that data on relevant effect modifiers are collected.

However, where existing RWD sources are used, manufacturers may be forced to use study samples from countries outside of a decision-maker’s population. This could be owing to local data sources either not existing or lacking clinical detail or because of an intervention not currently being available in the target population. In this case, it is important that the study sample is chosen to maximise transportability, by choosing a sample that is most similar in effect modifiers to the target population and/or collects data on all effect modifiers. Importantly, the absence of data at a patient level will generally prevent adjustment for effect modifiers relating to setting and treatment differences across countries, and thus ideally, the study sample should be located in a country with a similar healthcare design and with similar treatment guidelines for the disease of interest.

We must also recognise that manufacturers typically seek reimbursement in multiple markets, resulting in separate target populations for each HTA agency. As it is unlikely that separate study samples will be chosen to estimate comparative effectiveness in each market, manufacturers are likely to choose a single study sample. This could be chosen to be similar in effect modifiers to a population in a single key market, or chosen to be as representative as possible of multiple key markets (with the caveat that this may mean the sample is not representative of any market). In each case, data should be available to apply methods to transport evidence to other markets.

A special case is where RWD is used to derive an external control arm for a single-arm trial; a case increasingly common in HTA submissions [9]. If either the single-arm trial or the external control arm is sampled from a country outside of the target population, external validity (in addition to internal validity) requires transporting absolute effects, and therefore requires an assumption of no differences in unmeasured prognostic variables across two countries (in addition to no differences in unmeasured effect modifiers). Where arms are sourced from different countries both not located in the target population, adjustment would need to ensure a balance of all prognostic variables both between treatment arms and between each arm and the target population, and would require a stronger assumption of no differences in unmeasured prognostic variables across three countries. Given the limited plausibility of these assumptions in this setting, the use of methods to assess the sensitivity of results to unobserved confounding and effect modification will be crucial, although residual uncertainty surrounding results is likely to be unavoidable in these cases.

3.4 Choice of Target Sample

The target sample, if data are available, should be chosen with representativeness of the target population in mind, otherwise transporting results to the target sample may not ensure identification of the PATE. A RWD source will typically be used as the target sample. Unless a census forms the target sample (meaning the target sample equals the target population), representativeness is not guaranteed and will need to be assessed.

To enable identification and adjustment for a lack of transportability, the target sample should report data on all effect modifiers. For interventions where many effect modifiers are plausible, sources that include detailed information on patient/disease characteristics, such as registries or electronic health records, may be most appropriate. Where plausible effect modifiers are restricted to commonly recorded patient characteristics, less rich data sources such as claims databases may be sufficient. If the general population is the target population, data from censuses could serve as the target sample, although analysis would be restricted to methods that only require aggregate data. All else equal, a target sample with IPD available should be preferred over aggregate data on the target population, as this allows a more robust adjustment for effect modification. Effect modifiers must also be defined consistently with the study sample, and must be reported with sufficient completeness and accuracy.

3.5 Methods to Identify and Adjust for Barriers to Transportability

Determining the most appropriate transportability methods in practice is not trivial. Here, we provide some important considerations in an HTA setting.

3.5.1 Identifying/Measuring Transportability

As it is unlikely that outcomes of all treatments will be available in the target sample, the two-step approach, which first examines differences in characteristics between the target and study samples and then assesses treatment effect heterogeneity in the study sample, will be most appropriate. Multiple methods should be used to examine differences in characteristics, including standardised mean differences of propensity scores (here, the estimated probability of membership of the study sample) and examinations of propensity score distributions, alongside formal diagnostic tests to identify a lack of overlap [14]. Univariate standardised mean differences and tests can then be used to examine drivers of overall differences. With only summary target sample data available, analysis could include computing simple differences in averages, alongside standardised mean differences and univariate tests if standard deviations are reported.

Parametric models with treatment-covariate interactions can be used to detect effect modification. Where small study samples result in power issues or where unknown functional forms increase the risk of model misspecification, machine learning techniques such as Bayesian additive regression trees could be considered, and the use of directed acyclic graphs may be particularly crucial for selecting effect modifiers in this case.
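In the simplest case of a binary treatment and a single binary candidate effect modifier, the interaction coefficient of a saturated linear model equals the difference between the two stratum-specific treatment effects, so it can be computed directly from cell means. The sketch below shows only this point estimate under that assumption; it omits standard errors and testing, and the records are hypothetical.

```python
from statistics import mean


def interaction_contrast(records):
    """Stratum-specific effects and their difference.

    `records` is an iterable of (treated, modifier, outcome) tuples with
    treated and modifier coded 0/1. The returned contrast equals the
    treatment-by-modifier interaction coefficient in the saturated model
    outcome ~ treated + modifier + treated:modifier.
    """
    cells = {(a, x): [] for a in (0, 1) for x in (0, 1)}
    for treated, modifier, outcome in records:
        cells[(treated, modifier)].append(outcome)
    effect_x0 = mean(cells[(1, 0)]) - mean(cells[(0, 0)])
    effect_x1 = mean(cells[(1, 1)]) - mean(cells[(0, 1)])
    return effect_x0, effect_x1, effect_x1 - effect_x0


# Hypothetical study-sample records: (treated, biomarker positive, outcome).
records = [
    (1, 0, 5.0), (1, 0, 7.0), (0, 0, 4.0), (0, 0, 4.0),
    (1, 1, 10.0), (1, 1, 12.0), (0, 1, 5.0), (0, 1, 5.0),
]

effect_neg, effect_pos, contrast = interaction_contrast(records)
print(effect_neg, effect_pos, contrast)
```

A contrast well away from zero suggests the characteristic modifies the treatment effect on the additive scale; with continuous or multiple modifiers a regression model with interaction terms, or the machine learning approaches mentioned above, would be needed instead.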

Where tests indicate no differences in characteristics across target and study samples, or that the characteristics that differ are not effect modifiers, comparative effectiveness can be estimated using the study sample without adjustment for transportability, noting that unobserved effect modification is always possible. If assessments indicate a lack of overlap such that methods to adjust for a lack of transportability cannot be applied, investigators could consider changing the target population [27]. For example, whilst evidence may not be transportable to all patients with a condition, it may be transportable to specific patient subgroups, which may still be of interest to HTA agencies.

3.5.2 Adjusting for Barriers to Transportability

Where IPD are available for the target sample, existing reviews suggest that doubly robust methods should be used as the base case, as they remain consistent if either the propensity model or the outcome model (but not both) is misspecified [15]. Normalised inverse odds weights are recommended where weights are highly variable. Machine learning could be explored to select functional forms in outcome/propensity models when sample sizes are small and/or where functional forms are unknown. Standardised mean differences between study and target samples should be compared before and after weighting to test for balance. Doubly robust or regression-based methods may be preferable in cases of a limited overlap in characteristics between study and target samples, as they allow for adjustment using extrapolation beyond the common support.
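The role of weight normalisation can be sketched as follows. The example covers only the weighting component (it is not a doubly robust estimator) and assumes that inverse odds weights have already been estimated for each study patient. Normalising within each treatment arm, and reporting the Kish effective sample size as a diagnostic for variable weights, are illustrative choices made here; the data are hypothetical.

```python
def normalised_weighted_difference(treated, outcomes, weights):
    """Difference in weighted mean outcomes between arms.

    Weights are normalised to sum to one within each arm, so the estimate
    does not change if all weights are multiplied by a constant.
    """
    sum_w1 = sum(w for a, w in zip(treated, weights) if a == 1)
    sum_w0 = sum(w for a, w in zip(treated, weights) if a == 0)
    mean1 = sum(w * y for a, y, w in zip(treated, outcomes, weights) if a == 1) / sum_w1
    mean0 = sum(w * y for a, y, w in zip(treated, outcomes, weights) if a == 0) / sum_w0
    return mean1 - mean0


def effective_sample_size(weights):
    """Kish effective sample size: (sum of weights)^2 / sum of squared weights."""
    return sum(weights) ** 2 / sum(w * w for w in weights)


# Hypothetical study sample with previously estimated inverse odds weights.
treated = [1, 1, 0, 0]
outcomes = [10.0, 20.0, 5.0, 9.0]
weights = [0.5, 3.0, 0.5, 3.0]

transported_effect = normalised_weighted_difference(treated, outcomes, weights)
ess = effective_sample_size(weights)
print(f"Transported effect: {transported_effect:.2f}, ESS: {ess:.2f} of {len(weights)}")
```

An effective sample size far below the number of study patients indicates that a few patients dominate the weighted estimate, which is one signal that overlap is limited and that regression-based or doubly robust approaches may deserve more weight in the analysis.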

Where study samples are derived from RWD, pre-processing using matching/weighting to ensure balance on observed confounders across treatment arms should be conducted, with newer methods that simultaneously correct for confounding and transportability also considered [20]. Given that existing simulation studies have been conducted primarily by the creators of these methods, “neutral comparisons” that are free from potential investigator bias could be useful in assessing the relative performance of methods [28, 29]. Further targeted simulation studies that explore relative performance in scenarios similar to those in which the methods would be used in practice could also be warranted.

Where only aggregate data are available on the target sample, the adapted inverse odds weighting approach should be considered if data are available on the joint distribution of covariates. If only univariate statistics are available, methods for population-adjusted indirect comparisons can be used, with existing guidelines informing method selection [30]. However, the potential for residual bias when using these methods must be noted.
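When only a reported mean is available for the target sample, matching-adjusted indirect comparison weights can be obtained by the method of moments. The sketch below handles a single covariate and solves for the weights with Newton's method; the function name, tolerance and ages are hypothetical, and a real analysis would match several covariates at once, typically using an established statistical package.

```python
import math


def maic_weights(study_values, target_mean, tol=1e-10, max_iter=100):
    """Weights exp(alpha * (x - target_mean)) chosen so that the weighted
    study mean equals the reported target mean.

    alpha minimises the convex function sum(exp(alpha * centred_x)); its
    gradient being zero is exactly the moment-matching condition.
    """
    centred = [x - target_mean for x in study_values]
    alpha = 0.0
    for _ in range(max_iter):
        expo = [math.exp(alpha * c) for c in centred]
        gradient = sum(c * e for c, e in zip(centred, expo))
        if abs(gradient) < tol:
            break
        hessian = sum(c * c * e for c, e in zip(centred, expo))
        alpha -= gradient / hessian  # Newton step
    return [math.exp(alpha * c) for c in centred]


# Hypothetical study-sample ages and a reported target-sample mean age.
study_age = [50.0, 55.0, 60.0, 65.0, 70.0]
target_mean_age = 63.0

weights = maic_weights(study_age, target_mean_age)
weighted_mean_age = sum(w * x for w, x in zip(weights, study_age)) / sum(weights)
print(f"Weighted study mean age: {weighted_mean_age:.4f}")
```

The approach only works when the target mean lies strictly within the range of the study values. Matching on means alone also leaves the joint distribution of covariates unadjusted, which is one source of the residual bias noted above.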

The accessibility of methods to HTA agencies could also be considered. Recognising existing concerns regarding the acceptability of RWD methods to HTA agencies [31], some argue that the potential benefits of complex approaches such as machine learning should be weighed against the potential reduction in the ability to communicate findings [32]. However, a counter argument is that the optimal methods should be used, with the onus on decision makers to ensure knowledge of these methods.

Unadjusted results should also be presented as a comparison to demonstrate the impact of adjustment. For effect modifiers relating to setting and treatment differences across countries, where data will generally be absent, their impact must primarily be argued qualitatively, minimised through a careful choice of the study sample and explored in a sensitivity analysis.

3.5.3 Sensitivity Analysis

Consistent with the latest HTA guidance for estimating comparative effectiveness using RWD [1], sensitivity analyses should explore the impact of uncertain decisions made during an analysis, and explore the impact of violations of key assumptions. Results using alternative methods for adjusting for transportability could be presented alongside those using doubly robust methods. However, Bayesian model averaging approaches, which have been recommended for addressing model uncertainty more generally in health decision models, could be explored rather than presenting decision makers with numerous alternative treatment effects [33].

A sensitivity analysis assessing potential unobserved effect modification should always be conducted, and approaches specifically designed for this purpose, such as the Nguyen et al. and bias function approaches, should be considered [18, 19]. Quantitative bias analysis methods addressing unobserved confounding, such as the E-value, are receiving increased attention in HTAs, and allow analysts to either (1) examine the size of bias required to change a result (e.g. treatment effect relative to some decision-making threshold) or (2) estimate the direction, magnitude and uncertainty of bias associated with treatment effects [34, 35]. However, to our knowledge, these have not yet been applied in a transportability setting. Additional analyses may include comparing transported treatment-specific outcomes with observed outcomes in the target sample, where these outcome data are available. A country-specific subgroup analysis may also be useful where a multi-country RCT forms the study sample, noting that reduced sample sizes will lead to a loss of power and that randomisation will be broken unless the trial design ensured random treatment allocation within countries.
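As an illustration of the first type of quantitative bias analysis, the E-value for a point estimate on the risk ratio scale has a simple closed form. The sketch below uses the standard formula for unmeasured confounding; it is shown only to make the idea concrete and is not an adaptation to unobserved effect modification. The function name and the example risk ratios are hypothetical.

```python
import math


def e_value(risk_ratio):
    """E-value for a risk ratio point estimate.

    Protective estimates (risk ratio below 1) are inverted first, then the
    formula RR + sqrt(RR * (RR - 1)) is applied.
    """
    if risk_ratio <= 0:
        raise ValueError("risk_ratio must be positive")
    rr = risk_ratio if risk_ratio >= 1 else 1 / risk_ratio
    return rr + math.sqrt(rr * (rr - 1))


# Hypothetical estimates: a harmful and an equally strong protective effect.
print(f"{e_value(2.0):.2f}")  # 3.41
print(f"{e_value(0.5):.2f}")  # 3.41
```

The result is read as the minimum strength of association, on the risk ratio scale, that an unmeasured confounder would need with both treatment and outcome to fully explain away the estimate.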

3.6 Integrating Transportability into the HTA Submission and Decision Process

Considerations regarding the choice of study and target samples and use of transportability methods should be pre-specified in a study protocol and statistical analysis plan. Where it is anticipated that transportability will form a key part of a submission, early engagement with HTA agencies and other key stakeholders will be crucial to ensure alignment on whether transportability is a concern and how transportability methods can help address that. To support this, HTA agencies should provide clear guidelines on transportability methods, including whether they should be used in the base-case or as a sensitivity analysis in cost-effectiveness models. A review of RWD guidelines highlighted that only two HTA guidelines discuss transportability, and none describes methods to adjust for a lack of transportability [21]. Guidelines should also explicitly outline a hierarchy of study designs, recognising the trade-off between internal and external validity in the use of RCTs.

As with other evidence-generation activities for HTA, primary responsibility for carrying out transportability analysis will often lie with the manufacturer. However, where a manufacturer does not provide consideration of transportability in a submission, it may be prudent for HTA bodies to utilise simple methods to examine the likely impacts on results, noting any limitations. This could include the identification of effect modifiers from targeted reviews and/or expert elicitation, comparisons of summary statistics between the study sample and aggregate data on an easily accessible target sample, followed by a conservative sensitivity analysis. Knowledge that a default conservative approach would be applied in the absence of more considered approaches may incentivise manufacturers to conduct a thorough investigation addressing transportability.

As with evidence from non-randomised studies, acceptance of results transported from other countries by HTA agencies will depend on their confidence that transportability methods sufficiently address potential bias. We note that in many cases, providing HTA agencies with this confidence may be difficult. Unlike methods for correcting other biases, transportability methods require data from a second suitable data source (the target sample) to be identified. Often, the lack of appropriate data to conduct an analysis within the target population will be the reason such methods are needed. As a result, the applicability of these methods is likely higher when transporting data to a country with high-quality RWD in the disease of interest, but where the lack of adoption of the key treatment prevents the use of this data source for assessing comparative effectiveness. Even when an appropriate target sample is available, concerns over residual bias from unobserved effect modifiers will remain. As when addressing confounding, a sensitivity analysis will be crucial for highlighting and quantifying uncertainty in study findings to HTA agencies. However, the bar is arguably higher when adjusting for effect modification (across study and target samples) compared with when addressing confounding (between patients on different treatments), as the absence of patient-level data on important potential effect modifiers relating to treatment and setting characteristics will generally prevent adjustment (Sect. 3.3). Transportability methods will therefore be most suitable where differences in these effect modifiers can be minimised through a careful choice of the location of the study sample.

Given these threats to validity, building on applied studies in a generalisability setting [36] and examples transporting absolute effects [37], applied demonstration studies are needed to highlight that relative treatment effects can be transported across borders with minimal bias. To date, no such application exists. Replication studies, similar in style to RCT DUPLICATE and other initiatives that assess the ability of non-randomised studies to replicate RCT evidence [38], are warranted. Such studies would use published treatment effects as a benchmark and aim to replicate these effects by applying transportability methods to data from other countries. Continued methods development is also required, as methods are currently not available for transporting dynamic treatment effects.

4 Conclusions

A variety of methods are available to adjust comparative-effectiveness evidence to attempt to improve transportability, and are applicable using either patient-level or aggregate data. These methods require good data on the target population, which may not always be available, and the use of sensitivity analyses to explore the likely impact of unmeasured effect modification. Nevertheless, transportability methods can play an important role in ensuring that comparative-effectiveness evidence is appropriately considered in HTAs.

This article highlights several methodological and practical considerations that must be examined if these methods are to be successfully used in the HTA setting. These points include the choice of target population and study and target sample based on the relevant decision and decision maker(s) an analysis aims to inform, careful identification of effect modifiers and methods to adjust for them, and considerations on how these methods can be integrated into existing HTA evidence frameworks. Clear methodological guidelines on the application of transportability methods in the HTA setting, alongside further simulation studies to identify optimal methods and demonstration studies applying these methods, will facilitate their wider uptake.