Key Points

This methodological study compared the effect estimates among different time-zero settings in a comparative effectiveness study with a non-user comparator using actual real-world data.

The treatment effects, based on the adjusted hazard ratio for the outcome, varied among the six time-zero setting methods, even though the same dataset was used for the same research question.

An incorrectly set time zero can result in a biased result and mislead the conclusion; therefore, researchers should carefully select the time zero to minimize those biases when designing a study.

1 Introduction

An increasing body of pharmacoepidemiologic research has been conducted using real-world data (RWD). Real-world data studies can generate evidence on the effectiveness or safety of a treatment in clinical settings, which complements the evidence obtained from randomized controlled trials [1]. Moreover, RWD can be utilized in research throughout the life cycle of drug development, including regulatory decision making [2, 3]. Real-world data studies have research advantages, such as representativeness, large data size, longitudinal data, and low cost [4]. However, using RWD also poses serious challenges because of potential biases, such as misclassification of exposure and outcomes, confounding, and selection bias, some of which are inherent in observational studies [4].

Time-related biases, such as immortal-time bias and time-lag bias, are major issues in comparative effectiveness studies using RWD [5, 6]. Immortal time is a period during which the outcome cannot occur because of the definition of the cohorts [7]. For example, patients in a treatment group inevitably remain event-free until they receive the study treatment, and inappropriate handling of this period can introduce bias in the result (immortal-time bias) [8, 9]. Time-lag bias can arise when comparing groups at different stages of the disease, for instance, when the time from diagnosis to the start of follow-up differs between groups [8]. These time-related biases can significantly impact the estimates of treatment effects, and failure to control for them may lead to misleading conclusions [5, 10].

Time-related biases are typically introduced when baseline or follow-up–related time points differ between the study groups. Therefore, aligning the three time points—meeting eligibility, initiating treatment, and starting follow-up—is crucial to reduce these biases in a comparative effectiveness study [11, 12]. One solution is to use a new-user, active-comparator design, that is, comparing new users of treatment of interest with new users of alternative therapy (ideally with the same indication) [1, 13]. Using this comparator instead of prevalent users or patients without active comparators (non-users) can reduce selection bias and increase between-group comparability [13, 14]. Moreover, in the new-user design, the three time points described above (eligibility, treatment start, and follow-up start) can be aligned by setting the starting point of follow-up (“time zero”) at treatment initiation (TI) in both groups, thereby reducing the time-related biases [11].

Researchers may want to compare against non-users for their study purposes, or because of the absence of a proper active comparator. Indeed, of 89 studies on drug effectiveness or safety using RWD, 34% used a non-user comparator [12]. However, comparison with non-users poses a significant challenge in setting the time zero because non-users do not have a time point for starting treatment [12]. Several solutions to the time-zero problem in comparisons with non-users have been proposed [11], and previous studies selected time zero for the non-use group in various ways: for example, at cohort entry, which is the same as for the treatment group [15]; or at a time point different from that of the treatment group, including the first day of the first diagnosed month [16], a randomly selected visit date [17], and the same date as their matched patient [18]. As such, various options are available for selecting time zero, but the proper selection is crucial to avoid time-related bias [11, 12]. Studies using simulated data [19] or RWD [20] have examined various approaches for selecting time zero, although they focused on external control cohorts in oncology, in which the initiation of each line of therapy is a potential time zero.

However, the degree to which the study results would be affected by the different time-zero settings using RWD, especially in a non-cancer study with a non-user comparator, has not been thoroughly examined. Therefore, this study aimed to illustrate the impact of “time-zero setting,” the setting of the starting point of follow-up, on the effect estimates in a cohort study using RWD to compare the treatment effect between the treatment and non-use groups.

2 Methods

2.1 Study Overview

This was a methodological study to illustrate the impact of different settings of time zero of follow-up in a pharmacoepidemiological, non-user comparator study using actual RWD. We conceived several methods for setting time zero and compared the effect estimates obtained in each setting. As an example of an RWD study on treatment effectiveness, we used a previous cohort study using Japanese RWD that compared the risk of diabetic retinopathy (DR) between users and non-users of lipid-lowering medications in patients with type 2 diabetes [21]. The present study was approved by the ethics committee of Juntendo University, Tokyo, Japan (approval number: E21-0041).

2.2 Data Source

This study used an administrative claims database compiled by JMDC Inc., one of Japan's most frequently used RWD sources [22]. As of September 2021, the number of patients in the database is approximately 13 million. The database contains health checkup results and claims (for hospitalization, outpatient treatment, drug preparation, and dental treatment) for insured employees and their dependents collected from health insurance societies [23]. The database includes patient attributes (age and sex), medical facilities, diagnoses, procedures, medications, and medical costs. In this database, data are available from 2005, and anonymized individual IDs enable the tracing of patients between institutions so long as they are insured by the same insurance society.

2.3 Study Cohort

The study cohort comprised patients with type 2 diabetes who initiated glucose-lowering medications between January 1, 2005, and March 31, 2017 (study period). The study entry date (SED) was defined as the date of the first prescription of a glucose-lowering agent (Anatomical Therapeutic Chemical [ATC] classification system code, A10) accompanied by a diagnostic record of type 2 diabetes (International Classification of Diseases [ICD]-10 codes, E11 or E14) in the same month. To assess pre-entry patient characteristics and confirm eligibility, we required patients to have data for at least 180 days before SED.

Patients were excluded if they had a prescription record of lipid-lowering agents (ATC code, C10) or a diagnostic record of type 1 diabetes (ICD-10 code, E10) during the 180 days before SED or if they were aged < 18 years at SED. A history of DR before SED was handled by setting a washout period for the outcomes (described in Sect. 2.5).
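The eligibility rules above can be sketched in code as follows. This is a minimal illustration only, not the implementation used in the study: the `Candidate` record, its field names, and the treatment of the window boundaries (the 180 days before SED, excluding the SED itself) are assumptions introduced here.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta

LOOKBACK_DAYS = 180  # pre-entry window stated in the text


@dataclass
class Candidate:
    """Hypothetical per-patient record; field names are illustrative only."""
    birth_date: date
    first_record_date: date   # first date with any data in the database
    sed: date                 # study entry date
    lipid_rx_dates: list = field(default_factory=list)  # ATC C10 prescriptions
    t1d_dx_dates: list = field(default_factory=list)    # ICD-10 E10 diagnoses


def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))


def is_eligible(p: Candidate) -> bool:
    window_start = p.sed - timedelta(days=LOOKBACK_DAYS)

    def in_window(d: date) -> bool:
        # Assumption: the window covers the 180 days before SED, SED excluded.
        return window_start <= d < p.sed

    if p.first_record_date > window_start:   # fewer than 180 days of history
        return False
    if age_on(p.birth_date, p.sed) < 18:
        return False
    if any(in_window(d) for d in p.lipid_rx_dates):
        return False
    if any(in_window(d) for d in p.t1d_dx_dates):
        return False
    return True
```

Note that a lipid-lowering prescription dated after SED does not exclude a patient in this sketch; it is what later defines treatment initiation.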

2.4 Variables

The selection and definition of variables were based on a study by Kawasaki et al [21].

2.4.1 Outcome

The outcome was identified using a diagnostic record of DR (ICD-10 codes: E103, E113, E143, H360, H431, H356, or H450). The event date was defined as the date of the first diagnostic record of DR in the patient data.

2.4.2 Exposure

Patients were classified into the treatment group if they had a prescription record of lipid-lowering agents (ATC code C10) during the period from SED to the end of follow-up (defined in Sect. 2.5) or otherwise into the non-use group.

2.4.3 Covariates

The covariates were age, sex, and duration of type 2 diabetes at SED, in addition to the following variables assessed over 180 days before SED: weighted Charlson Comorbidity Index [24], antihypertensive medications (ATC codes, C02, C03, or C07–09), and anticoagulant agents (ATC codes, B01AA-AF or B01AX).

2.5 Follow-Up and Wash-Out Periods

We implemented several methods for setting “time zero,” the starting point of the follow-up. Each method is presented in Sect. 2.6. Regardless of the method used, the follow-up period began at time zero and ended at the occurrence of an outcome event, exit from the database, or the end of the study period, whichever occurred first. For simplicity, treatment discontinuation was not considered for the treatment group. In other words, the “as-started effect” (the effect of the initial treatment choice) was evaluated without considering the treatment duration [1].

We set a washout period for the outcomes, during which an outcome event should not occur, as “before time zero.” As the time point of time zero varies depending on the setting method, the washout period also changes according to the settings. For example, when time zero was set at SED, patients with an outcome event at any time before SED were excluded. However, when time zero was set at TI, patients with an outcome event after SED but before TI were also excluded.
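The follow-up and washout rules described in this section can be illustrated with a short sketch. The function name, its arguments, and the handling of boundary cases (an event recorded exactly on time zero, or a time zero after the end of available data) are assumptions made for illustration and are not taken from the study code.

```python
from datetime import date
from typing import Optional

STUDY_END = date(2017, 3, 31)  # end of the study period given in the text


def follow_up(time_zero: date,
              first_event: Optional[date],
              db_exit: date,
              study_end: date = STUDY_END):
    """Return (days_at_risk, had_event), or None if the patient is excluded.

    A first outcome record before time zero falls in the washout period,
    so the patient is excluded. Counting an event recorded exactly on
    time zero as part of follow-up is an assumption of this sketch.
    """
    if first_event is not None and first_event < time_zero:
        return None
    end = min(db_exit, study_end)
    if end < time_zero:          # assumption: no follow-up time available
        return None
    had_event = first_event is not None and first_event <= end
    if had_event:
        end = first_event
    return (end - time_zero).days, had_event


# The same patient under two time-zero choices (hypothetical dates).
sed, ti = date(2010, 1, 1), date(2011, 1, 1)
event, db_exit = date(2010, 6, 1), date(2015, 1, 1)
print(follow_up(sed, event, db_exit))  # (151, True): event counted after SED
print(follow_up(ti, event, db_exit))   # None: event falls in the washout before TI
```

The example shows why the analyzed population changes with the time-zero setting: the same outcome event is a follow-up event when time zero is SED and an exclusion criterion when time zero is TI.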

2.6 “Time Zero” Settings

We conceived four different settings of time zero that were previously proposed or used (SED vs SED, TI vs SED, TI vs Random, and TI vs Matched), of which SED versus SED and TI versus Matched had two approaches; thus, a total of six time-zero settings were implemented in this study. The study timeframe and definition of time zero in the respective settings are displayed in Fig. 1.

Fig. 1

Study time frame for each time-zero setting method. The follow-up period started at time zero and ended when censored on the incidence of an outcome event, exit from the database, or the end of the study period, whichever occurred first. The time before time zero was the washout period for the outcomes; patients with an outcome event during the washout period were excluded from the analysis. SED study entry date (the date of the first prescription of a glucose-lowering agent plus the diagnostic record of type 2 diabetes in the same month), TI treatment initiation (first prescription date of lipid-lowering agents)

2.6.1 SED versus SED

Time zero was set at SED for both the treatment and non-use groups (Fig. 1a). Two approaches were used to implement this setting. The first used the “naïve approach”, in which patients who had an outcome event before TI were treated as non-users [15].

The second approach used the “cloning method”, which was proposed to address biases caused by misalignment of follow-up and treatment starting points [11, 14]. Step-by-step instructions for this method have been published elsewhere [25]. In brief, two clones (copies) were created for each patient; one was allocated to the treatment arm and the other to the non-use arm. A clone was then censored when it became incompatible with the definition of its allocated arm (e.g., a prescription of lipid-lowering agents in the non-use arm). This artificial censoring introduces a selection bias; therefore, inverse-probability-of-censoring weighting [26] was applied to address this informative censoring. Weights were calculated using a Cox regression model including all covariates (listed in Sect. 2.4.3). In contrast to the original method, we did not consider a grace period in this study. Consequently, in the treatment arm, clones of patients from both groups were never artificially censored because they were always compatible with the arm. In the non-use arm, clones of patients in the treatment group were artificially censored at treatment initiation, i.e., the time when they became incompatible with the arm, whereas clones of patients in the non-use group were not censored. Hence, the events in the non-use group were counted in both arms. Because we suspected that this may dilute the treatment effect, we also performed an analysis in the per-protocol setting using different grace periods (Supplementary materials).
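The cloning and artificial-censoring step described above, in the variant without a grace period, might look like the following sketch. The `Patient` and `Clone` records and their fields are hypothetical, times are expressed in days since SED, and the inverse-probability-of-censoring weights are deliberately left out, so this is only the data-preparation part of the method.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    """Times in days since SED; all names are hypothetical."""
    pid: int
    end: int                  # end of follow-up (event, database exit, or study end)
    event: bool               # True if follow-up ended with an outcome event
    ti: Optional[int] = None  # day of treatment initiation, None if never treated


@dataclass
class Clone:
    pid: int
    arm: str                  # "treatment" or "non-use"
    end: int
    event: bool
    artificially_censored: bool


def make_clones(p: Patient) -> list:
    """Create two clones per patient, both starting at SED (day 0)."""
    # Treatment arm: with no grace period, clones are never artificially
    # censored, as described in the text.
    treated = Clone(p.pid, "treatment", p.end, p.event, False)
    # Non-use arm: censor at treatment initiation when it occurs during
    # follow-up; the clone's later event is then not counted in this arm.
    if p.ti is not None and p.ti < p.end:
        untreated = Clone(p.pid, "non-use", p.ti, False, True)
    else:
        untreated = Clone(p.pid, "non-use", p.end, p.event, False)
    return [treated, untreated]
```

For a never-treated patient, both clones carry the same follow-up and the same event, which is the mechanism by which events in the non-use group are counted in both arms.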

2.6.2 TI versus SED

Time zero was set in the treatment group at TI, the first prescription date of lipid-lowering agents [16]. In contrast, it was set at SED for the non-use group (Fig. 1b). In this setting, patients with an outcome event before TI were excluded from the treatment group because the outcome occurred within the washout period.

2.6.3 TI Versus Random

Time zero for the treatment group was set at TI, as in TI versus SED. In contrast, for the non-use group, the date of time zero was randomly selected from all hospital visit dates after SED (Fig. 1c) [11, 17].
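As a minimal sketch of the random selection of time zero for non-users, the following function draws one visit date after SED. Uniform sampling over visits, the strict "after SED" comparison, and the function and argument names are assumptions; the text does not specify these details.

```python
import random
from datetime import date


def random_time_zero(visit_dates, sed, rng):
    """Pick one visit date after SED uniformly at random; None if there is none."""
    candidates = sorted(d for d in visit_dates if d > sed)
    return rng.choice(candidates) if candidates else None


# Hypothetical non-user with three visits after SED; a seeded generator
# keeps the draw reproducible.
sed = date(2010, 1, 1)
visits = [date(2010, 1, 1), date(2010, 3, 15), date(2010, 9, 2), date(2011, 4, 20)]
rng = random.Random(2021)
time_zero = random_time_zero(visits, sed, rng)
```

Because later visits are as likely to be drawn as earlier ones, the time from SED to time zero in the non-use group need not resemble the time from SED to TI in the treatment group.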

2.6.4 TI versus Matched

Time zero for the treatment group was set at TI. In contrast, time zero for the non-use group was defined as the date corresponding to the time zero of the matched treatment patient, in terms of the length of time from SED to TI. Therefore, the duration from SED to time zero was equal between the matched pairs (Fig. 1d). To implement this setting, treatment patients were matched with non-users in a 1:1 ratio using a propensity score for lipid-lowering treatment calculated using a logistic regression model including all covariates (listed in Sect. 2.4.3).

We used two approaches for matching, differing in the order of patient matching. The first approach matched patients in random order (TI vs Matched [random order]). Patients in the treatment group were listed randomly, and matching was performed sequentially from the top of the list [27]. For each treatment patient, a non-user with the nearest propensity score was identified as a potential match. The time zero of this non-user was set to correspond to the duration from SED to the time zero of the matched treatment patient. A non-user was discarded if this patient had been censored by this date, and matching was repeated until a suitable non-user was identified. However, this non-user could still be a potential match for another treatment patient.

The second approach matched patients in a systematic order (TI vs Matched [systematic order]). Patients in the treatment group were listed in ascending order of the length of time from SED to TI, and matching was performed sequentially from the top of the list (i.e., starting with the treatment patients who had the shortest time from SED to TI) [28]. As in TI versus Matched (random order), for each treatment patient, a non-user with the nearest propensity score was identified as a potential match, and time zero was set to correspond to the duration from SED to the time zero of the matched treatment patient. A non-user was discarded if censored by this date. In the systematic order matching, a non-user who had once been discarded because of censoring before the time zero of a potential match was also excluded from all subsequent matching.
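The two matching orders can be contrasted in a short sketch. It assumes matching without replacement (a matched non-user leaves the pool), represents each patient as a plain tuple, and uses hypothetical names throughout; the propensity scores are taken as given rather than estimated.

```python
import random


def match_time_zero(treated, non_users, order, seed=0):
    """1:1 nearest-propensity-score matching (assumed without replacement).

    treated:   list of (pid, ps, lag)     lag    = days from SED to TI
    non_users: list of (pid, ps, fu_end)  fu_end = days from SED to end of follow-up
    order:     "random" or "systematic"
    Returns {treated_pid: (non_user_pid, time_zero_in_days_since_SED)}.
    """
    queue = list(treated)
    if order == "systematic":
        queue.sort(key=lambda t: t[2])           # shortest SED-to-TI first
    elif order == "random":
        random.Random(seed).shuffle(queue)
    else:
        raise ValueError(order)

    pool = {pid: (ps, fu_end) for pid, ps, fu_end in non_users}
    matches = {}
    for t_pid, t_ps, lag in queue:
        # Candidate non-users from nearest to farthest propensity score.
        ranked = sorted(pool, key=lambda pid: abs(pool[pid][0] - t_ps))
        for c_pid in ranked:
            if pool[c_pid][1] > lag:             # still under follow-up at time zero
                matches[t_pid] = (c_pid, lag)
                del pool[c_pid]                  # matched non-users leave the pool
                break
            if order == "systematic":
                del pool[c_pid]                  # discarded once, discarded for good
            # In random order, the discarded non-user stays in the pool.
    return matches
```

In the systematic order, the lags are processed in ascending order, so a non-user whose follow-up ended before one candidate time zero would also be ineligible for every later treatment patient; removing that non-user permanently therefore loses no valid match.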

2.7 Statistical Analyses

In each time-zero setting, the following effect measurements were calculated: (1) cumulative incidence at 1 year from time zero; (2) incidence rate of DR; (3) crude incidence rate ratio of DR; and (4) hazard ratio (HR) and its 95% confidence interval (CI), estimated using a Cox regression model adjusted for covariates (listed in Sect. 2.4.3). However, for SED versus SED (cloning method), only the HR was estimated because the calculation of the incidence rate and crude incidence rate ratio was not applicable due to the use of an artificial population in the cloning method. In addition, only the group (treatment or non-use) was included as an explanatory variable in the Cox regression model for the outcome because covariate adjustment was already performed in inverse-probability-of-censoring weighting.
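For orientation, the crude incidence rate and rate ratio listed above can be computed as in the following sketch. The Wald confidence interval on the log scale is an assumption made here; the text does not state which interval method was used, and the counts in the example are invented.

```python
import math


def incidence_rate(events, person_years, per=1000.0):
    """Rate per `per` person-years with a Wald 95% CI on the log scale."""
    rate = events / person_years * per
    se_log = 1.0 / math.sqrt(events)
    return rate, rate * math.exp(-1.96 * se_log), rate * math.exp(1.96 * se_log)


def rate_ratio(events_trt, py_trt, events_ref, py_ref):
    """Crude incidence rate ratio (treatment vs reference) with a Wald 95% CI."""
    rr = (events_trt / py_trt) / (events_ref / py_ref)
    se_log = math.sqrt(1.0 / events_trt + 1.0 / events_ref)
    return rr, rr * math.exp(-1.96 * se_log), rr * math.exp(1.96 * se_log)


# Hypothetical counts, not taken from the study.
rr, lower, upper = rate_ratio(events_trt=50, py_trt=1000.0,
                              events_ref=100, py_ref=1000.0)
```

Both quantities depend on person-time counted from time zero, which is why the same patients can yield different crude rates under different time-zero settings.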

As a sensitivity analysis, the effect measurements were also calculated after excluding patients with an outcome event within 90 days after time zero in each method. Analyses were performed using SAS release 9.4 (SAS Institute, Cary, NC, USA) and R version 4.1.0 (R Foundation for Statistical Computing, Vienna, Austria). The latter was used for analysis of the SED versus SED (cloning method) only.

3 Results

3.1 Baseline Characteristics of Patients

Of 12,706,548 individuals with ICD-10 codes of E11–14 available in the JMDC database at the time of the study, we obtained data for 42,308 patients who met the eligibility criteria. The flow of patient disposition is provided in the Online Resource Fig. S1.

These eligible patients were classified into treatment or non-use groups, and each time-zero setting was applied. In each method, patients with an outcome event during the washout period, which differed depending on the time-zero settings, were excluded. As a result, 10,929 patients comprised the treatment group in all time-zero settings except for SED versus SED (cloning method); and 24,767, 23,935, 21,860, and 10,929 patients formed the non-use groups in SED versus SED (naïve approach), TI versus SED, TI versus Random, and TI versus Matched (random order, systematic order), respectively. The baseline characteristics of the patients in each group are summarized in Table 1.

Table 1 Baseline characteristics of patients in each group according to time zero setting methods

The treatment group in TI versus SED, TI versus Random, and TI versus Matched (time zero at TI) had a mean (standard deviation [SD]) duration from SED to time zero of 154.4 (345.5) days. In contrast, the non-use group had 0 days in TI versus SED (time zero at SED) and 306.6 (435.0) days in TI versus Random (time zero at randomly sampled hospital visit date).

3.2 Relative Risk of Diabetic Retinopathy

Table 2 and Fig. 2 summarize the effect measurements in each time-zero setting.

Table 2 Incidence rate, rate ratio, and cumulative incidence of the outcome in each time zero setting method
Fig. 2

Hazard ratio in each time-zero setting method. SED versus SED (naïve approach): time zero set at SED for both groups (naïve approach). SED versus SED (cloning method): time zero set at SED for both groups (cloning method). TI versus SED: time zero set at treatment initiation for the treatment group and SED for the non-use group. TI versus Random: time zero set at treatment initiation for the treatment group and at a randomly sampled hospital visit date for the non-use group. TI versus Matched (random order): time zero set at treatment initiation for the treatment group and at the matched date for the non-use group (matching in random order). TI versus Matched (systematic order): time zero set at treatment initiation for the treatment group and at the matched date for the non-use group (matching in systematic order). For SED versus SED (cloning method), only the “group” was included as an explanatory variable in the Cox regression model because covariate adjustment was already performed in the estimation of weights for the inverse-probability-of-censoring weighting. CI confidence interval, HR hazard ratio, SED study entry date (the date of the first prescription of a glucose-lowering agent plus the diagnostic record of type 2 diabetes in the same month), TI treatment initiation (first prescription date of lipid-lowering agents)

In SED versus SED (naïve approach), the cumulative incidence at 1 year of follow-up (95% CI) was 0.070 (0.065–0.075) in the treatment group and 0.124 (0.119–0.128) in the non-use group. In SED versus SED (cloning method), it was 0.101 (0.097–0.104) in the treatment group and 0.100 (0.096–0.103) in the non-use group. In TI versus SED, it was 0.087 (0.081–0.092) in the treatment group, which was the same as in TI versus Random, TI versus Matched (random order), and TI versus Matched (systematic order), and 0.098 (0.094–0.103) in the non-use group. In the non-use group, it was 0.054 (0.050–0.058) in TI versus Random, 0.122 (0.115–0.129) in TI versus Matched (random order), and 0.090 (0.084–0.096) in TI versus Matched (systematic order).

In SED versus SED (naïve approach), the incidence rate per 1000 person-years (95% CI) was 57.8 (55.0–60.6) in the treatment group and 94.9 (91.9–97.8) in the non-use group. In TI versus SED, it was 69.1 (65.8–72.5) in the treatment group, which was the same as in TI versus Random, TI versus Matched (random order), and TI versus Matched (systematic order), and 76.5 (73.8–79.1) in the non-use group. In the non-use group, it was 51.6 (48.6–54.6) in TI versus Random, 100.8 (95.9–105.7) in TI versus Matched (random order), and 72.2 (68.5–75.9) in TI versus Matched (systematic order).

Among all methods, the crude rate ratio (RR) and the adjusted HR showed the same trends. However, the estimated treatment effects were inconsistent among the six methods. Lipid-lowering treatment was associated with a reduced risk of DR in SED versus SED (naïve approach) (RR, 0.61; 95% CI 0.58–0.65, adjusted HR, 0.65; 95% CI 0.61–0.69), TI versus SED (adjusted HR, 0.92; 95% CI 0.86–0.97), and TI versus Matched (random order) (RR, 0.69; 95% CI 0.64–0.73, adjusted HR, 0.76; 95% CI 0.71–0.82). In contrast, TI versus Random showed that treatment was associated with an increased risk (RR, 1.34; 95% CI 1.24–1.44, adjusted HR, 1.52; 95% CI 1.40–1.64). Finally, treatment was associated with neither an increased nor a decreased risk in SED versus SED (cloning method) (adjusted HR, 0.95; 95% CI 0.93–1.13) and TI versus Matched (systematic order) (RR, 0.96; 95% CI 0.89–1.03, adjusted HR, 0.99; 95% CI 0.93–1.07).

In this population, outcome events occurred more frequently in the early days after SED (Online Resource Fig. S2). Therefore, we calculated the effect estimates after excluding patients with an outcome within 90 days after time zero as a sensitivity analysis. Although the effect estimates approached null in every time-zero setting, each method showed the same trend as above, except that TI versus SED no longer showed any treatment effect (Online Resource Table S1).

For SED versus SED (cloning method), we performed an analysis in the per-protocol setting using different grace periods (either 3 or 6 months) and did not observe any differences in a sensitivity analysis (supplementary materials).

4 Discussion

In a comparative effectiveness study, multiple choices are available for when and how to set the time zero of follow-up [11, 19, 20], and the time-zero setting is particularly challenging in a study with a non-user comparator arm [12]. In this study, we implemented six methods to set time zero in a comparative RWD study with a non-user comparator and compared the effect estimates obtained from these settings. The six methods estimated the treatment effects contradictorily, with three showing a reduced risk of the outcome, two showing a null effect, and one showing an increased risk. These results demonstrate that different time-zero settings can lead to different conclusions, even if the same dataset is analyzed for the same research question, shedding light on the significant impact of time-zero settings on the effect estimates in such a comparative effectiveness study.

No "gold standard" exists for setting time zero in studies with a non-user comparator, and the present study does not aim to determine superiority among the six setting methods. Nevertheless, SED versus SED (cloning method) and TI versus Matched (systematic order) were based on approaches proposed to avoid possible biases, such as immortal-time bias, time-lag bias, and selection bias [11, 25, 28]. Therefore, these two settings would presumably have higher between-group comparability than the other settings. For example, SED versus SED (cloning method) used the method proposed by Hernán et al [11, 25]. By allocating clones of each patient to the respective arms, the patients in both arms became identical at time zero, eliminating confounding at baseline. Moreover, despite the time zero set at SED, clones and artificial censoring in this method successfully avoided immortal-time bias, which would otherwise occur when time zero is set at SED in both groups [9].

In TI versus Matched (systematic order), the duration from SED to time zero was equated in the two groups by matching, preventing time-lag bias. Indeed, this time-zero setting has been reported as a good setting example for non-users [12]. However, the way of choosing non-users can introduce a selection bias [28]. For example, in random order matching as in TI versus Matched (random order), a non-user, who should be excluded due to an outcome event before time zero if matched with a particular patient, may be included in the cohort by being matched with another treatment patient with a different time zero. In other words, a non-user patient who should be excluded because of the outcome event remains in the pool of potential matches of subsequent patients in the treatment group. As a result, the probability of being matched for a non-user who had outcome events is increased and more patients with outcomes would be included in the non-use group [8, 28]. One way to avoid this bias is to select non-users chronologically or systematically such that a non-user who was once discarded will never be included as a match for another treatment patient [28]. In this sense, TI versus Matched (systematic order) likely managed to avoid this selection bias by systematically matching patients based on the length from SED to time zero.

As such, SED versus SED (cloning method) and TI versus Matched (systematic order) seemed to control for various biases, possibly preserving high between-group comparability. Interestingly, both methods yielded an adjusted HR of approximately 1, indicating a null effect of lipid-lowering treatment on DR. There is no known "true value" to compare against; however, a previous randomized clinical trial (RCT) showed no difference in the progression of DR between the lipid-lowering treatment and placebo groups in patients without pre-existing DR [29], although a risk reduction was shown in patients with pre-existing DR [29, 30]. Since the study population of the current study comprised patients without previous DR, the results of the above two methods were similar to the RCT findings. As explained above, the results of these two methods were presumably much less biased than the results obtained by the other setting methods, which probably had lower between-group comparability. Therefore, we discuss the results of the remaining four settings in terms of potential biases and their directions.

Three methods (SED versus SED [naïve approach], TI versus SED, and TI versus Matched [random order]) showed an adjusted HR of < 1, indicating that the treatment had a protective effect on the study outcome. However, these reduced outcome risks were likely attributable to the overestimation of the incidence rate in the non-use group due to biases caused by the respective time-zero settings. First, SED versus SED (naïve approach) is a typical case of immortal-time bias [7]: the event-free period between SED and TI was counted as follow-up in the treatment group, whereas patients who had an outcome event before TI were classified as non-users. This may have overestimated the incidence rate in the non-use group, causing a downward bias in the relative risk in the treatment group. Second, in TI versus SED, the follow-up period of the non-use group included the early days after SED, during which outcome events occurred more frequently, whereas that of the treatment group did not, which is a case of bias from time lag and latency [8]. This probably resulted in overestimating the incidence rate in the non-use group. The disproportionately high incidence rate soon after SED may be because patients were encouraged to undergo an ophthalmologic examination at diagnosis [31] or did so prompted by the start of pharmacotherapy. After excluding patients with outcomes in the early days after time zero as a sensitivity analysis, thereby removing the imbalance associated with this period, TI versus SED no longer showed a treatment effect (Online Resource Table S1). Finally, TI versus Matched (random order) probably introduced a selection bias by matching non-users in random order, as explained earlier [8].
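The immortal-time mechanism described for SED versus SED (naïve approach) can be reproduced in a toy simulation. The sketch below is illustrative only: the exponential event and treatment-start times, every parameter value, and the function name are assumptions, and the simulated data bear no relation to the study data. Event times are generated independently of treatment, so the true rate ratio is 1.

```python
import random


def naive_rate_ratio(n=20000, event_rate=0.10, start_rate=1.0,
                     horizon=5.0, seed=1):
    """Simulate a null treatment effect and analyse it with the naive approach.

    Times are in years since SED. All parameter values are arbitrary.
    """
    rng = random.Random(seed)
    events = {"treatment": 0, "non-use": 0}
    person_time = {"treatment": 0.0, "non-use": 0.0}
    for i in range(n):
        t_event = rng.expovariate(event_rate)
        # Half of the patients would start treatment at some point after SED.
        t_start = rng.expovariate(start_rate) if i % 2 == 0 else float("inf")
        end = min(t_event, horizon)
        # Naive classification: a patient counts as treated only if treatment
        # started before the end of follow-up, yet follow-up is counted from
        # SED (time 0) for everyone.
        group = "treatment" if t_start < end else "non-use"
        person_time[group] += end
        events[group] += t_event <= horizon
    rate = {g: events[g] / person_time[g] for g in events}
    return rate["treatment"] / rate["non-use"]
```

Under these assumptions, the function should return a crude rate ratio clearly below 1 although treatment has no effect, because the event-free time before treatment start is credited to the treatment group and early events among intended users are attributed to the non-use group.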

In contrast, TI versus Random yielded an adjusted HR of > 1, indicating that treatment was associated with an increased outcome risk. In this method, time zero of non-users was randomly selected from all hospital visit dates after SED. Although this type of solution was proposed in the previous literature [11], our results suggested that it was not suitable in this study. In particular, this setting would introduce a selection bias by removing those with an event before time zero in the non-use group, which had a longer period from SED to time zero than the treatment group (mean, 306.6 days in the non-use group vs 154.4 days in the treatment group).

To the best of our knowledge, this is the first study to demonstrate the direction and magnitude of potential biases caused by different time-zero settings in a non-user comparative effectiveness study using RWD. We assumed various time-zero settings based on previous studies, such as those considered problematic or desirable [8, 11, 12, 25, 28]. A major strength was that we applied six possible time-zero settings to the same dataset for the same research question using real-world data, purely focusing on evaluating the impact of time-zero settings.

However, the present study had limitations. There may have been residual biases in our effect estimates due to, for example, confounding by indication and time-varying confounding [28]. The covariate information used for the cloning method, the calculation of the propensity score, and the Cox regression model for the outcome was based on baseline data only, which may lead to residual bias due to time-varying confounding. In addition, we did not use analytical methods that can control for immortal-time bias, such as the time-dependent Cox regression model. For the TI versus Random method, although there might be more suitable approaches, such as selecting visits randomly based on the distribution of times in the treatment group, these were not applied. Nevertheless, these limitations do not diminish the value of this study because our aim was not to obtain accurate effect estimates. For simplicity, the present study focused on time-zero settings without considering other aspects, including latency time [5], grace period [14], and treatment discontinuation [1]. An analysis of the impact of these other aspects of the study design is warranted in future studies. It should also be noted that the present study did not aim to determine superiority among different time-zero settings. A desirable time-zero setting may vary depending on the study. Thus, the setting used for a particular study should be individually justified, considering the study design, analytical methods, diseases of interest, outcomes, and exposures. Finally, only a single data source was used in this study, which may limit the generalizability of these findings. However, the data source used contains a large number of patients and is quite representative of the active population in Japan.

5 Conclusion

In conclusion, the results of this study illustrated that the conclusions regarding the treatment effect varied depending on how time zero was set in a non-user comparative effectiveness study using RWD, highlighting the significant impact of time-zero settings on the study results. An improper setting of time zero may lead to biased and misleading conclusions. Therefore, to minimize biases, researchers should carefully select time zero to increase between-group comparability when designing a study. Further research on the impact of time-zero settings in studies of different diseases, outcomes, and exposures will facilitate further understanding of this important element that should be considered when designing a comparative effectiveness study using RWD.