Introduction

Decentralized clinical trials (DCTs) are defined by the Clinical Trial Transformation Initiative as “trials executed through telemedicine, mobile/local healthcare providers (HCPs), and/or mobile technologies.” [1] A DCT does not need to be completely remote [1, 2], and others have proposed metrics for the degree of decentralization for a clinical trial which are based on how much a study depends on specific sites and the methods of data collection used [2]. In this article, we use the term decentralized to denote data not collected through the traditional approach (i.e., during in-person clinical visits). Decentralized measurements may include measurements collected at home by patients or health workers, through a remote electronic device (e.g., electronic clinical outcome assessment), a mobile device, or at a local medical facility.

Prior to the COVID-19 pandemic, there was interest in DCTs due to the potential reduction in time and cost associated with patient travel which may improve patient retention [1, 3]. Additional potential benefits include faster and more diverse patient recruitment due to fewer geographical restrictions, especially for rare diseases [1, 3, 4]. Technological advances including the wide use of mobile devices and increasing popularity in the adoption of wearable devices also facilitate DCTs [1, 5, 6].

During the COVID-19 pandemic, interest in decentralization turned into necessity and use of decentralization in clinical trials accelerated quickly due to patient safety concerns about traveling to clinical trial sites or temporary closure of some sites [7]. This public health emergency triggered regulatory support for alternative methods of data collection with an emphasis on maintaining patient safety, which helped temporarily facilitate trial decentralization [3, 7,8,9].

Despite the advantages and recent regulatory support for DCTs, these trials still present many challenges. For example, there is uncertainty about which of the additional regulatory flexibilities will be allowed once the COVID-19 pandemic ends [3, 7]. Additional legal complications in the USA include restriction of telehealth visits over state lines, which may require having primary investigators with a medical license in each patient’s home state. Delivery of study medications directly to participants may also be difficult for practical reasons, due to some state laws, the need for preservation of patient privacy during delivery, and potential need for refrigeration [1, 10]. From a patient safety perspective, there is also concern about how sponsors and Institutional Review Boards will monitor and enforce safety requirements on decentralized sites, as well as the accuracy of data collected to monitor adverse events [8, 9].

Alternative data collection in DCTs may lead to data quality concerns including whether the data is traceable/auditable, how to securely store or transmit patient data, whether the data will have increased variability or biases, and how updates to the software or hardware used to collect the data will impact longitudinal studies [6, 11]. The United States Food and Drug Administration supports some decentralized data collection for trials ongoing during the pandemic [12], but it may take time before the effects of decentralization on data quality are well understood.

In some cases, local, home, or virtual data collection may not be as reliable as data collected onsite. Some virtual data collection tools and methods require validation. Despite the sponsor’s best effort to validate the devices and tools for decentralized data collection, the quality of the decentralized measurements may not be as consistent as onsite measurements. Using body weight as an example, even if the sponsor provides a standardized weight scale and detailed instructions, patients generally do not have the necessary training to be fully compliant to the procedure.

One approach to addressing some of these challenges while planning a new DCT is to utilize a hybrid approach for decentralization by conducting a few critical visits on site and all other data collection and visits in a decentralized manner [1]. This approach was recommended for one DCT study by a regulatory agency [Unpublished Communication]. In this article, we use clinical trials in chronic weight management (CWM) as an example to understand the impact of a fully decentralized approach and a hybrid approach on the estimates of mean response and the treatment difference in the primary outcome under various scenarios including increased variability and bias for decentralized measurements. Section 2 describes the clinical trial setting and the data generating models, Section 3 provides simulation results, and in Sections 4 and 5 we summarize our findings.

Materials and Methods

Simulation Setting

We consider a 52-week trial comparing an experimental treatment with placebo for the purpose of CWM for participants with obesity, defined by baseline body mass index (BMI) \(\ge\) 30 kg/m\(^2\) or BMI \(\ge 27\) kg/m\(^2\) with at least 1 weight-related comorbidity. The primary analysis variable is the change in body weight from baseline to 52 weeks. We assume body weight is measured at baseline and every 4 weeks after randomization until week 52, and the study enrolls 600 participants randomly assigned to placebo or an experimental treatment with a 1:1 randomization ratio.

The onsite weight measurement for subject \(j \; (1 \le j \le n_i)\) at time point \(k \; (k=0,1,\ldots ,13)\) on treatment i (\(i=0\) for the control group and \(i=1\) for the experimental treatment group) is modeled as

$$\begin{aligned} Y_{ijk} = \mu _{ik} + s_{ij} + e_{ijk}, \end{aligned}$$
(1)

where \(\mu _{ik}\) is the population mean response for treatment group i at time point k, \(s_{ij} \sim \text{ NID }(0, \sigma _s^2)\), \(e_{ijk} \sim \text{ NID }(0,\sigma _e^2)\), where \(s_{ij}\) and \(e_{ijk}\) are independent and NID stands for “normally independently distributed”. We assume the decentralized measurement \(Y_{ijk}^*\) follows

$$\begin{aligned} Y_{ijk}^{*} = (1+r)(\mu _{ik} + s_{ij} + e_{ijk} + s_{ij}^* + e_{ijk}^*), \end{aligned}$$
(2)

where \(s_{ij}^* \sim \text{ NID }(0,\sigma _s^{2*})\) is the additional between-subject variability, \(e_{ijk}^* \sim \text{ NID }(0,\sigma _e^{2*})\) is the additional within-subject variability, \(s_{ij}^*\), \(s_{ij}\), \(e_{ijk}^*\) and \(e_{ijk}\) are all mutually independent, and r (\(-1< r < 1\)) is the proportion of bias in the decentralized measurement.

We consider 3 clinical trial settings:

  1. 1.

    All onsite—Body weight is measured on site for all time points.

  2. 2.

    Mixed—Body weight is measured on site for Week 0 (baseline), Week 24, and Week 52, and decentralized measurements are used for all other time points.

  3. 3.

    All decentralized—Decentralized body weight measurements are used for all time points.

For decentralized measurements, we consider 3 scenarios: (A) additional within-subject variability but without additional between-subject variability and bias (\(\sigma _e^{2*}>0\), \(\sigma _s^{2*}=0\), and \(r=0\)), (B) additional within- and between-subject variability but without bias (\(\sigma _e^{2*}>0\), \(\sigma _s^{2*}>0\), and \(r=0\)), and (C) additional within- and between-subject variability and additional bias (\(\sigma _e^{2*}>0\), \(\sigma _s^{2*}>0\), and \(r \ne 0\)).

For the onsite measurements, we use a between-patient variance (\(\sigma ^2_s\)) of 300 and a within-patient variance (\(\sigma ^2_e\)) of 70. These values are selected based on historical studies. For decentralized measurements, the parameters for additional variability and bias are \(\sigma _s^{2*} = 130\), \(\sigma _e^{2*} = 30\), and \(r = -0.05\). The \(\sigma _s^{2*}\) and \(\sigma _e^{2*}\) values assume an approximate 20% increase in the within- and between-patient standard errors beyond the \(\sigma _e^2\) and \(\sigma _s^2\) values from historical studies. Literature reviews have suggested self-reported weight measurements are typically 0.5 to 1 kg lower compared to weight measurements taken on site [13, 14]. We assume an \(-5\)% (approximately \(-5\) kg) bias to evaluate the performance of estimators under an extreme situation.

In CWM studies, often the primary endpoint is the percent change in body weight from baseline. Under the data generation models (1) and (2), the theoretic mean percent weight loss is difficult to calculate, so we first use the change in body weight (kg) from baseline as the primary analysis variable and perform analyses to estimate the estimands for the change in body weight. The true mean body weight, the mean change in body weight, and the mean percent change in body weight by treatment group are provided in Table 1. The mean percent change was calculated based on 1,000,000 Monte Carlo simulation samples.

Table 1 True Mean Body Weight by Treatment Arm

To introduce missing data under the missing at random assumption [15], we first calculate the change in body weight from baseline for each post-baseline visit. Then, we generate dropouts assuming the probability of patients staying in the study at the k-th time point (\(k=2,3,\ldots ,13\)) is:

$$\begin{aligned} \pi _{ijk} = \frac{1}{ 1 + \exp \left\{ \alpha _0 + \alpha _1 \left[ \{Z_{ij,k-1} Y_{ij,k-1} +(1-Z_{ij,k-1}) Y_{ij,k-1}^* \} - \{Z_{ij0} Y_{ij0} +(1-Z_{ij0}) Y_{ij0}^* \} \right] \right\} }, \end{aligned}$$
(3)

where \(Z_{ijk}\) is a binary variable indicating whether the measurement at time point k is collected on site. The probability of missingness at each time point depends on the observed intermediate change in weight measurement at the previous time point. This missingness mechanism is ignorable missingness, also called missing at random (MAR) [15]. We choose \(\alpha _1 = 1\) to ensure the data is missing at random and \(\alpha _0\) ranging from − 17.5 to − 24.5 to obtain an overall proportion of patients with missing values at Week 52 of approximately 20%.

Inferences

Under the assumption of missing at random, we estimate the mean change in body weight using a mixed model for repeated measures (MMRM) including factors of treatment, time point, and treatment by time point interaction, with baseline body weight as a covariate. An unstructured variance–covariance structure is used to model the within-subject correlation. Commonly used software packages (SAS and R) as well as theory for (mixed) linear models provide inference conditional on the baseline covariate. Let \(\hat{\mu }_{ik}\) denote the estimator for \(\mu _{ik}\) and \({\text{Var}}\{\hat{\mu }_{ik}|\bar{X}_{\cdot \cdot }\}\) be the variance of \(\hat{\mu }_{ik}\) conditional on baseline covariate, where \(\bar{X}_{\cdot \cdot }\) represents the overall mean for X across treatment groups. By the law of total variance, we have

$$\begin{aligned} {\text{Var}}\{\hat{\mu }_{ik}\} = E[ {\text{Var}}\{\hat{\mu }_{ik}|\bar{X}_{\cdot \cdot }\}] + {\text{Var}}[ E\{\hat{\mu }_{ik})|\bar{X}_{\cdot \cdot }\}]. \end{aligned}$$

Then,

$$\begin{aligned} {\widehat{\text{Var}}}\{\hat{\mu }_{ik}\} = {\widehat{\text{Var}}}\{\hat{\mu }_{ik}|\bar{X}_{\cdot \cdot }\} + {\widehat{\text{Var}}}[ E\{\hat{\mu }_{ik}|\bar{X}_{\cdot \cdot }\}]. \end{aligned}$$
(4)

Linear model packages give the variance estimate \({\widehat{\text{Var}}}\{\hat{\mu }_{ik}|\bar{X}_{\cdot \cdot }\}\). We need to add the second part to obtain the unconditional variance estimation. The second part can be estimated as

$$\begin{aligned} {\widehat{\text{Var}}}[ E(\hat{\mu }_{ik}|\bar{X}_{\cdot \cdot })] = (\hat{\beta }_x)^2{\text{Var}}(\bar{X}_{\cdot \cdot }) = (\hat{\beta }_x)^2 S_x^2/n, \end{aligned}$$

where \(S_x^2\) is the sample variance for the baseline observations (across treatment groups) and \(n=n_1+n_2\) is the overall sample size. Since the estimator \({\hat{\mu }}_{1k} - {\hat{\mu }}_{0k}\) for the treatment effect does not contain \({\bar{X}}_{\cdot \cdot }\), the conditional and unconditional variance are the same and the direct variance estimation from software packages can be directly used for inference. In the simulation results in Section 3, the inference for the mean in each treatment group is based on the variance estimation in Eq. (4).

Results

We conducted simulation studies based on 5,000 simulated samples for 3 clinical settings (all onsite measurements, measurements with mixed modality, and all decentralized measurements) for scenarios A, B, and C for increased variability and bias. In all scenarios, the proportion of missing values at the 52-week visit is between 13.8 and 15.6% for the experimental treatment arm and 24.3 and 26.6% for the placebo arm.

Figure 1 summarizes the simulation results for Scenario A in which we assume decentralized measurements only increase the residual error [\(\sigma _e^{2*}=30\), \(\sigma _s^{2*}=0\), and \(r=0\) in Model (2)]. When measurements are all collected on site, the bias is minimal and the estimated 95% confidence intervals have a coverage probability of approximately 0.95. For the mixed onsite/decentralized measurement modality, the standard errors at visits with decentralized measurements are larger than the case of all onsite measurements, but at the visits where weights are measured onsite (Weeks 24 and 52), the point estimates and standard errors are similar between the mixed modality and all onsite measurements. For the case of all decentralized measurements, all estimates also have minimal bias but the standard errors are consistently larger than the all onsite measurement scenario.

Figure 1
figure 1

Summary of Simulation Results for the Treatment Difference in the Mean Change in Body Weight from Baseline for the Scenarios with All Onsite Measurements, with a Mixture of Onsite and Decentralized Measurements, and with All Decentralized Measurements Assuming Increased Within-Subject Variability for Decentralized Measurements (Based on 5000 Simulated Samples). The True values indicate the true weight change from baseline for each treatment arm and time point.

Figure 2 summarizes the simulation results for Scenario B in which we assume decentralized measurements increase the residual error variability and between-subject variability [\(\sigma _e^{2*}=30\), \(\sigma _s^{2*}=130\), and \(r=0\) in Model (2)]. The results are generally similar to those for Scenario A (Fig. 1), with slight increases in standard errors for the decentralized measurements in the mixed modality.

Figure 2
figure 2

Summary of Simulation Results for the Treatment Difference in the Mean Change in Body Weight from Baseline for the Scenarios with All Onsite Measurements, with a Mixture of Onsite and Decentralized Measurements, and with All Decentralized Measurements Assuming Increased Within- and Between-Subject Variability for Decentralized Measurements (Based on 5000 Simulated Samples). The True values indicate the true weight change from baseline for each treatment arm and time point.

Figure 3 summarizes the simulation results for Scenario C in which we assume the decentralized measurements not only increase the within- and between-subject variabilities but also introduce a bias [\(\sigma _e^{2*}=30\), \(\sigma _s^{2*}=130\), and \(r=-0.05\) in Model (2)]. For mixed modality measurements, the biases and standard errors in the estimated mean change in body weight from baseline to 24 and 52 weeks are similar to the case of only onsite measurements, but the biases are larger for other time points. Recall that in the mixed modality scenario the measurements at baseline, Week 24 and Week 52 were collected on site, so we expect to see no bias in the change from baseline estimates to 24 and 52 weeks, but the bias for the change in body weight for intermediate time points is amplified due to the mismatch of data collection methods at baseline and the intermediate time points. For the case of all decentralized measurements, there is a clear bias in all time points but the bias is smaller than the decentralized measurement time points in the mixed modality scenario. This is likely because the bias is largely canceled out when calculating the change from baseline when both baseline and postbaseline measurements are decentralized. Note the bias cancelation in the all decentralized modality is specific to our simulation scenarios. If the effect of bias cancelation is smaller, the bias for the all decentralized scenario will be larger.

Figure 3
figure 3

Summary of Simulation Results for the Treatment Difference in the Mean Change in Body Weight from Baseline for the Scenarios with all Onsite Measurements, with a Mixture of Onsite and Decentralized Measurements, and with All Decentralized Measurements Assuming Increased Within- and Between-Subject Variability, and Bias for Decentralized Measurements (Based on 5000 Simulated Samples). The True values indicate the true weight change from baseline for each treatment arm and time point.

Table 2 summarizes the bias, standard deviation, the mean of the standard error over the simulation iterations, and the coverage probability for the 95% confidence interval for the treatment difference in the mean change in body weight from baseline to 12, 24, 36, and 52 weeks. Data for other intermediate time points are not shown here, as the results were similar to those at 12 and 36 weeks. Note for the mixed modality setting, onsite measurements are collected at baseline, 24, and 52 weeks only. Similar to the results for each treatment group, for the mixed data collection modality the estimates for the treatment difference are very similar to those for the non-DCT (traditional) setting for 24 and 52 weeks, have larger variability at 12 and 36 weeks when decentralized measurements have increased variabilities, and are biased at 12 and 36 weeks when decentralized measurements have measurement bias. Note the bias is much smaller for the treatment difference compared to individual treatment group results. This is likely because some bias is canceled out during calculation of the treatment difference. For the case of all decentralized measurements, estimates for the treatment difference have increased variability at all time points and are slightly biased when the decentralized measurements are biased.

Table 2 Summary of simulation results for the difference between placebo and experimental treatment arm in the difference in weight from baseline—comparing the all onsite scenario to scenarios with decentralized measurements at select time points (based on 5000 simulated samples)

As mentioned previously, primary outcomes from CWM trials are commonly presented as percent change in weight from baseline. Therefore, we carried out analogous simulation studies based on percent change from baseline and the results can be found in the Supplemental Information online. Overall, the coverage probabilities for the percent change results are slightly lower for all cases. This is likely because the percent change data does not satisfy all the assumptions of the MMRM model used (i.e., the percent change values have heterogeneous variance and non-normality). However, the trends in increasing standard deviations and bias at decentralized visits, as well as lower coverage probabilities in the scenario where bias is present at decentralized visits, are all consistent with the results discussed previously when the absolute change in weight from baseline is the primary outcome of interest.

We did not simulate the case under the null hypothesis of no treatment difference. Based on the theory of linear models, the coverage probability does not change when the true mean treatment difference changes for linear models. Therefore, the coverage probability in Table 2 can be used gauge the type I error rate (type I error is equal to 1 minus the coverage probability). Referring to this table for the Scenario C results (which correspond to Fig. 3), the type I error rate at 52 weeks in the mixed modality is approximately 5% and the type I error rate in the all decentralized modality is  9%. Due to the inflated type I error and biased estimates in the all decentralized modality at critical time points, the mixed modality is a reasonable approach to increase the number of clinical trial visits which are decentralized while maintaining the validity of the inference at key time points if the decentralized data measurements are not fully validated.

Discussion

DCTs are drawing more interest due to the convenience for patients and easier access for traditionally under-represented patients. One challenge for DCTs is the validity of decentralized measurements (collected at home, virtually, or at local facilities). For measurements collected by patients, even if standardized devices and detailed instructions are provided to patients, they may not be able to closely follow the instructions as they are not trained technicians or nurses. In addition, completely virtual clinical trials can be challenging as patients and physicians may prefer to have some face-to-face interactions with sites and certain measurements can be more reliably obtained on site versus at home or locally. A mixed modality of on site and decentralized visits may be attractive because they can take advantage of the benefits of DCTs (less travel time and more access to under-represented patient populations) while maintaining the benefit of face-to-face interactions between healthcare providers and patients, and accurate measurements taken at traditional site-based clinical trials. In Eli Lilly and Company, most new diabetes clinical trials utilizing DCT capabilities use a mixed modality, in which onsite measurements are collected at key time points (e.g., baseline and the primary time points).

The impact of the mixed DCT modality on statistical inference (bias, variance, and coverage probability of the confidence interval) has not been studied previously. In this article, we evaluated the impact of the mixed DCT modality on statistical inference through simulation for longitudinal continuous outcomes. Since we assume the measurements collected at baseline and primary time points are taken onsite, the DCT measurements at intermediate time points would not impact the primary analysis if there were no missing values. In our simulation, we evaluate the performance of the MMRM under the assumption of MAR and our results show the modality at intermediate outcomes has little impact on the estimation of treatment effect at key time points.

The MMRM analysis is often used to estimate treatment effects when the hypothetical strategy is used to handle intercurrent events and missing or unobserved values are considered missing at random. Another important strategy commonly used in handling intercurrent events is the treatment policy strategy. To estimate the treatment effect under the treatment policy strategy, imputation methods (e.g., retrieved dropout [16] and reference-based imputation [17] methods) based on the pattern mixture model may be used. Since retrieved dropout and jump-to-reference imputation methods only use the baseline and primary time point (for the primary analysis), the modality of the intermediate measurements will not impact these 2 imputation methods. For copy-reference and return-to-baseline [18] imputation methods, intermediate outcomes are needed to estimate the within-subject variance–covariance structure for the reference arm and/or treatment arm, which requires the assumption of ignorable missingness for hypothetical outcomes. Based on simulation results in the “Results” section under the missing at random assumption, it is reasonable to conclude the mixed modality also has little impact on estimation using copy-reference and return-to-baseline imputation methods. However, we note the different modality for data collection at intermediate time points may impact the amount of intercurrent events or missing values which may affect estimation for the estimands associated with the treatment policy strategy and aforementioned imputation methods, but we expect such an impact on the final estimation results to be small. Overall, data collected in the mixed modality setting can be used to handle missing data in a wide variety of ways and the best approach to handling missing data in a particular study depends on the reason for missing data and the estimand being used.

One drawback of using the mixed modality approach is the potential for high bias at non-critical time points as demonstrated in Fig. 3. An alternative approach which may help address this drawback is to collect both onsite and decentralized measurements at baseline and use the decentralized baseline measurement to calculate the change from baseline at decentralized time points. This is a more complicated option which assumes that the bias observed in a decentralized measurement is consistent across time points. This could also potentially increase the challenge of statistical analysis, e.g., which (onsite or decentralized) baseline value should be included in the analysis model. We do not recommend this approach.

Another challenge in using the mixed DCT modality is understanding the mean response over time since onsite and DCT measurements may not be “comparable”. For most measurements, the difference between measurement modalities may not be large. To present the estimates from onsite measurements taken at key time points and estimates from DCT measurements at intermediate time points, one can use different line type and point symbols to differentiate the measurement modalities (see Fig. 4 for an illustration). Additionally, for a variable that is collected through a decentralized approach for the first time, it may not be a bad idea to collect both decentralized and onsite measurements at some time points so that the difference between the 2 data collection modalities can be assessed.

Figure 4
figure 4

Visualization of the Mean Change in Body Weight Over Time When Mixed-Modality Data Collection Methods are Used in a Decentralized Clinical Trial

Conclusion

In conclusion, clinical trials with mixed decentralized and onsite measurement modalities are an attractive option to reduce the burden of patient travel to the trial site while maintaining the advantages of traditional clinical trials. This article shows a mixed modality approach has little impact on analysis results at the primary time point(s) for various estimands, even if the decentralized measurement devices or procedures are not fully validated. Therefore, more clinical trials with mixed modality should be conducted to attract traditionally under-represented patient populations, reduce patients’ burden, and reduce the cost of clinical trials while maintaining the quality of inference for key study objectives.