Background

In April 2018, the operation of the Medical Information Database Network (MID-NET®) began as a national project aimed at utilizing real-world data for drug safety assessments in Japan [1,2,3,4]. MID-NET® is a database systems network consisting of 10 collaborative organizations (23 collaborative hospitals) that can analyze data derived from claim data, diagnosis procedure combination data, and electronic medical record (EMR) data at the individual-level [2].

Laboratory test results derived from EMR data have detailed information on clinical symptoms [5] and are expected to be used for confounding adjustments in drug safety assessments. However, the appropriate use of these test results is difficult since a number of data obtained during routine medical care may be missing in datasets for analysis. Therefore, it is essential to appropriately select and apply existing methods to reduce bias due to missing data (hereinafter referred to as “missing data methods”).

Choosing a missing data method requires an understanding of the missing data sources and missing data mechanism [6,7,8]. Raebel et al., in their study using the US Food and Drug Administration Mini-Sentinel Distributed Database [6], reported that missing data sources of laboratory test results include testing outside of contracted laboratories, patient location where testing was conducted, patient clinical features, etc. In the study of the 10 MID-NET®-collaborative hospitals of the Tokushukai Medical Group [9], authors reported that patient background and setting of ordering system of laboratory tests were the main missing data sources. Although their impact was not evaluated because they were unobserved, the measurement policies by physicians and institutions were considered as potential sources as well. Missing data mechanism can be classified as missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) based on the relationship of missing data probability with missing data and observed values [10] (See Table 1).

Table 1 Missing data mechanisms and their examples

In the utilization of laboratory test results contained in MID-NET® to confounding adjustments, it is not clear what impact different missing data methods have on the result. Furthermore, owing to the differences between hospitals in terms of the proportion of patients with missing data (hereinafter, “missing proportion”) and the relationship between missing data and patient background [9], hospital variations regarding missing data should be well-considered.

In this study, the application of missing data methods was carried out to two scenarios of drug safety assessment adjusted by one laboratory test item. We evaluated the impact that missing data methods/approaches to hospital variations had on the interpretation of effect estimates and results.

Methods

Study’s scope

We considered a drug safety assessment to estimate an exposure effect using the Cox proportional-hazards model, which is commonly used in cohort designs. Assuming one laboratory test item as a confounder, we applied five missing methods.

Database and target hospitals

This study used the database system for MID-NET®-collaborative organizations of Tokushukai Medical Group (hereinafter, “Tokushukai database”), which has the largest number of MID-NET®- collaborative hospitals (10 hospitals; Supplementary Table S1). Supplementary Table S2 demonstrates the data items used for analysis.

Definition of missing data

As per a previous study, missing data were defined as “data that would be meaningful for the analysis but not available during a specific period before the first prescription date” [9]. The specific period, based on the results of this prior study, was set to 90 days.

Missing methods

MAR, the reasoning behind which is explained in Missing methods section, was assumed as the missing data mechanism in this study. The following four methods were considered as missing data methods providing unbiased results (hereinafter, “MAR-based methods”) when both the MAR assumption and the correct model specification used in the missing data method (hereinafter collectively referred to as “missing data models”) were correct: single regression imputation (SRI), multiple imputation (MI), inverse probability weighted (IPW) method, and likelihood-based method [11,12,13,14]. In the MID-NET® database system, SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) can be used. Since SRI, MI, and IPW methods are implemented in SAS version 9.4, we adopted these MAR-based methods.

In this study, five missing methods were applied in order to evaluate their impact on the effect estimation of the outcome model. The methods included the above three MAR-based methods, a method excluding a laboratory test item from baseline covariates (Exclusion), and a complete case (CC) method (Table 2). Table 2 provides a brief description of the three MAR-based methods [7, 15,16,17,18,19,20,21,22,23] and the settings of this study. In the application of MAR-based methods, two approaches were adopted to consider hospital variations in missing data. The hospital-specific-approach implements a missing data model within each hospital cohort. The overall-approach, consequently, implements a missing data model to the overall cohort (combined hospitals cohorts) and uses hospital as a fixed effect covariate (Fig. 1).

Table 2 Overview of the five missing methods
Fig. 1
figure 1

Overview of the two approaches considering hospital variations. Abbreviations: aHR, adjusted hazard ratio. † In the overall approach, the effects of missing data sources at the hospital level, including unobserved missing data sources, can be considered as the hospital effect of the fixed effect, if not all. ‡ In the hospital-specific approach, the interaction of patient background factors with hospitals can be considered

Confounding adjustment and outcome model

For the confounding adjustment methods, we adopted two propensity score (PS) methods (PS weighting and PS matching) as well as an outcome model method with confounding factors as covariates (regression adjustment). The confounding factors, other than the laboratory test item, were patient related factors (see Table 3) and the corresponding hospitals. Since the target population in the scenarios of this study is an exposed population, the standardized mortality ratio weighting (SMRW) [27] was used for PS weighting.

Table 3 Details of scenarios 1 and 2

Then, the point estimate of adjusted hazard ratio (aHR) and 95% confidence interval (CI) were calculated using the Cox proportional-hazard model. In PS matching, stratified Cox proportional-hazard model by matched pairs was used. Robust standard error was used to calculate 95% CI. In the combination of PS methods and MI, aHR estimation was performed for m imputed datasets, and m estimates were combined (“within approach”) [34] (see Supplementary Fig. S1).

Scenarios

Based on our previous research [9], we selected the scenarios of two cohort studies that evaluated the relation between drugs and their known risks (see Supplementary Fig. S2). Scenario 1 was the risk of diabetes associated with antipsychotic drug use. In Scenario 1, the blood glucose level before prescription of antipsychotic drugs was the target laboratory test item. Scenario 2 was the risk of hepatic injury associated with HMG-CoA reductase inhibitors (statins) use. In Scenario 2, the target laboratory test item was either the alkaline phosphatase (ALP), alanine transaminase (ALT), low-density lipoprotein cholesterol (LDL-chol), or triglyceride (TG) levels before statins prescription. In Scenario 2, four laboratory test items were included in the baseline covariates of the outcome model, and CC method, SRI, MI, and IPW method were set for each laboratory test item. Table 3 shows the detailed settings of the scenarios, including cohort sizes and the number of complete cases.

Protocol approval and statistical analysis

Our study protocol was approved by the Kyoto University Graduate School, Faculty of Medicine, and Kyoto University Hospital Ethics Committee in November 2018 (R1793). Statistical analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA).

Results

Overall cohort sizes and the numbers of complete cases were summarized in Table 3 and Supplementary Figs. S3 and S4. Patient backgrounds including incidence rate are demonstrated in Supplementary Tables S3-S6 (Tables S3 and S5 for overall cohorts and complete cases, and Tables S4 and S6 for hospital cohorts).

Scenario 1

Confounding adjustment by PS weighting

The aHR of Exclusion was 0.52 (95% CI, 0.34–0.81) (Fig. 2). CC method, SRI, MI, and IPW method contained blood glucose level as a covariate. The aHRs of SRI and MI by the hospital-specific-approach were 0.61 (95% CI, 0.39–0.96) and 0.63 (95% CI, 0.42–0.93), respectively; the point estimates were relatively close, but the width of the 95% CI was slightly narrower in MI. Conversely, the aHRs of the CC and IPW methods by hospital-specific-approach were 0.78 and 0.76, respectively, and thus higher than those of SRI and MI.

Fig. 2
figure 2

Scenario1 (the risk of diabetes associated with SGA compared to FGA use). Hazard ratios from outcome models with and without baseline blood glucose (SRI, MI, and IPW method were applied by hospital-specific-approach). Abbreviations: aHR, adjusted hazard ratio; CC, complete cases; CI, confidence interval; IPW, inverse probability weighted; MI, multiple imputation; PS, propensity score; SRI, single regression imputation. †The sample size in PS matching with MI gives the mean of 10 matched samples

The aHRs of SRI, MI, and IPW method by the overall-approach showed no substantial differences compared with the estimates by the hospital-specific-approach (Fig. 3).

Fig. 3
figure 3

Scenario1 (the risk of diabetes associated with SGA use compared to FGA use). The difference in hazard ratios between approaches considering hospital variations. Abbreviations: aHR, adjusted hazard ratio; CI, confidence interval; IPW, inverse probability weighted; MI, multiple imputation; PS, propensity score; SRI, single regression imputation. †The sample size in PS matching with MI gives the mean of 10 matched samples

Confounding adjustment by PS matching

The aHRs of SRI and MI by the hospital-specific-approach were 1.10 (95% CI, 0.72–1.66) and 1.01 (95% CI, 0.34–2.97), respectively. Although there were no substantial differences in point estimates, the width of the 95% CI was larger in MI (Fig. 2). The aHR of the IPW method was 1.31 by the hospital-specific-approach but reduced to 0.94 by the overall-approach (Fig. 3).

Regression adjustment

The aHRs of SRI and MI by the hospital-specific-approach were relatively close in terms of the point estimates and the 95% CIs (Fig. 2).

Scenario 2

Confounding adjustment by PS weighting

The aHR of Exclusion was 1.32 (95% CI, 0.83–2.08). CC method, SRI, MI, and IPW method included each laboratory test item as a covariate. The aHRs of SRI and MI by hospital-specific-approach varied between 1.20 and 1.32 through all laboratory test items (Fig. 4). While there were no substantial differences in the point estimates of SRI and MI, the width of the 95% CI was slightly narrower in MI; for example, the aHR of SRI and MI for LDL-chol were 1.24 (95% CI, 0.78–1.96) and 1.20 (95% CI, 0.84–1.71), respectively. The aHR of IPW method for ALT was 1.21, whereas those for ALP and LDL-chol were close to 1. The aHRs of the CC method varied depending on the type of laboratory test item. Accordingly, the aHRs for ALT and LDL-chol were 1.29 and 1.08, respectively.

Fig. 4
figure 4

Scenario2 (the risk of hepatic injury associated with rosuvastatin use compared to atorvastatin use). Hazard ratios from outcome models with and without baseline ALT, ALP, LDL-chol, or TG (SRI, MI, and IPW method were applied by hospital-specific-approach). Abbreviations: aHR, adjusted hazard ratio; ALT, alanine transaminase; ALP, alkaline phosphatase; CC, complete cases; CI, confidence interval; IPW, inverse probability weighted; LDL-chol, low-density lipoprotein cholesterol; MI, multiple imputation; TG, triglyceride; PS, propensity score; SRI, single regression imputation. † Not shown in the forest plot if the aHR is 8.0 or more. ‡The sample size in PS matching with MI gives the mean of 10 matched samples

The aHRs of SRI and MI by the overall-approach ranged between 1.19 and 1.32 through all laboratory test items (Fig. 5). For some laboratory test items, especially ALP, the aHR of MI tended to be higher than those determined by the hospital-specific-approach.

Fig. 5
figure 5

Scenario2 (the risk of hepatic injury associated with rosuvastatin use compared to atorvastatin use). The difference in hazard ratios between approaches considering the hospital variations. Abbreviations: aHR, adjusted hazard ratio; ALT, alanine transaminase; ALP, alkaline phosphatase; CI, confidence interval; IPW, inverse probability weighted; LDL-chol, low-density lipoprotein cholesterol; MI, multiple imputation; TG, triglyceride; PS, propensity score; SRI, single regression imputation. † Not shown in the forest plot if the aHR is 8. or more. ‡The sample size in PS matching with MI gives the mean of 10 matched samples

Confounding adjustment by PS matching

For all laboratory test items, the differences in aHR between methods were higher and the width of 95% CI was greater than the values obtained with other confounding adjustment methods.

Regression adjustment

The aHRs of SRI and MI by the hospital-specific-approach ranged from 1.22 to 1.28 throughout all laboratory test items. For the results in each laboratory test item, both the point estimates and the 95% CIs were relatively close in SRI and MI (Fig. 4). However, the aHR of the IPW method was lower than those of SRI and MI.

Discussion

We evaluated the impact of five missing methods, approaches to hospital variations, and confounding adjustment methods on the effect estimation in two different scenarios. In Missing methods section of Discusssion and Approaches considering hospital variations sections, we excluded discussions on PS matching in Scenario 2 were excluded and were, instead, included in Confounding adjustment methods section.

Missing methods

SRI versus MI

Although the setting is different, Marshall et al. [35] reported that the bias in SRI and MI, using the same covariates in imputation models, is the same when the missing proportion of a continuous variable is around 25%, and that there was no substantial difference in bias even if the missing proportion increased to 50%. In this study, we had no difference in the point estimates of aHRs between SRI and MI. This might be due to the fact that we used the same covariates in imputation models.

However, for 95% CIs, we found some differences between SRI and MI in confounding adjustment with PS methods (PS weighting of Scenarios 1 and 2 and PS matching of Scenario 1). This will be further explained in Confounding adjustment methods section. Although SRI is generally described as underestimating SE [36], the reason that this study did not show a clear underestimation of SE regardless of the type of confounding adjustment method might be due to the targeted single laboratory test item.

IPW method versus imputation methods (SRI and MI)

The degree of difference in the point estimates of aHRs between the IPW method and the imputation methods varied depending on laboratory test items. In particular, there were large differences in Scenarios 1 and 2 adjusted by blood glucose level and ALP, respectively. This might be attributed to the large weights in the IPW method. In the analyses of blood glucose level and ALP, some hospital cohorts had IPWs of 20 or higher. Large weights can contribute to estimation instability. Therefore, it is important to check the stability of the effect estimation using the truncation of large weights [23] as a sensitivity analysis.

CC and exclusion methods

In the CC method, large differences in aHRs from those in the MAR-based methods were observed, especially in Scenario 1 adjusted by blood glucose level and in Scenario 2 adjusted by LDL-chol. When the standardized mean differences (SMD) [37] between complete cases and overall cohorts were calculated for patient background factors, factors with SMD of 0.1 or higher existed in Scenarios 1 and 2 (complete cases for LDL-chol) (Supplementary Tables S3 and S5). Such factors were considered to have non-negligible differences in patient backgrounds [38, 39], and the aHR difference between the CC method and the missing data methods was due to the fact that complete cases did not represent overall cohorts.

In the method excluding the laboratory test item in Scenario 1, a relatively large difference in aHR from those in the MAR-based methods occurred. However, in some Scenario 2 cases, the difference was relatively small and no noticeable difference in the analysis for ALT existed. This might be attributed to the fact that confounding by ALT was small because mean ALTs were the same for rosuvastatin and atorvastatin users.

Assumption of missing data mechanism

We considered the missing data mechanism in this study was a mixture of MAR and MNAR. Applying the missing data method based on the MNAR assumption requires extensive modeling of the missing data process. In such a case, the MAR-based method may be used as the main analysis method. Then, to evaluate the stability of the main results with the MAR assumption, the pre-planned sensitivity analysis should be considered [40].

When applying the MAR-based methods, the validity of the MAR assumption should be relatively increased by considering as many factors affecting the missing data as possible increases [41, 42]. In this study, all patient background characteristics were included in the missing data model. However, other missing data sources, such as the settings in the ordering system for laboratory tests, the measurement policy of tests, and the preferences of doctors were unobserved and could not be included in the missing data models.

Approaches considering hospital variations

The results of SRI and MI, and some of those in the IPW method showed no noticeable change in aHR due to different approaches. In the overall approach, hospitals were included in the missing data model as a fixed variable and were expected to capture some of the effects of unobserved missing data sources at the hospital level. In the hospital-specific approach, interactions between patient backgrounds and hospitals can be considered. It is not known whether the hospital effect can be explained by the fixed effect. Additionally, the presence or absence of interaction has not been evaluated, but the results of this study indicate that the difference in approach may not significantly affect the difference in aHRs.

Some changes in aHRs due to the difference in approach in the IPW method may have been caused by the change in the distribution of IPW. In the analysis for ALP in Scenario 2 of the hospital-specific-approach, unlike the overall-approach, some patients with an IPW of 20 or higher existed.

Confounding adjustment methods

Confounding adjustment methods affected the results in two ways. First, in PS methods (especially PS matching), there was a difference in 95% CIs between SRI and MI. For binary outcomes, Granger et al. [43] reported that coverage of 95% CI was too high in the PS matching compared to the PS weighting using SMRW due to the overestimation of variances. A combination of the PS matching and MI with the “within approach” may have affected the difference in 95% CIs in this study as well.

Second, in the analysis using PS matching, the degree of variation in aHRs and the ranges of 95% CI between missing data methods were large. One of the reasons was attributed to the decrease in the number of patients to be analyzed (Figs. 2 and 3). In Scenario 2, the incidence rate was low (Supplementary Table S5), and the impact of decreasing sample size was large. PS matching should be used carefully, considering the results from this study and its features (e.g., the target population will no longer be an exposed population if all patients in the exposed population do not have matched controls).

In scenario 1, unlike a previous study [44], the increased risk of diabetes associated with SGA use was not observed. In our study, although we did not confirm the frequency of laboratory measurements during the follow-up period, there were differences between groups in the missing proportion before drug prescription (SGA users: 20.6%, FGA users: 9.2%), suggesting that a detection bias may have existed and affected the results. In scenario 2, the point estimates of aHR were around or over 1, consistent with previous studies [32, 33].

When selecting scenarios and laboratory test items, we also considered the feasibility of the number of events; thus, it was not covered in this study, but the following case can also be a typical scenario under which missing laboratory values occur in the MID-NET® based on the knowledge from our previous study [9]; the scenario using laboratory tests with restrictions on the implementation interval based on the health insurance system (e.g., measurement can only be performed once every 3 months) that may be unique to the Japanese medical environment. It is important to Considering the impact and characteristics of missing data is important when planning research.

There are two main limitations to this study. The first is that the scenarios and laboratory test items were limited. Since we used five missing methods, two approaches for considering hospital variations, and three confounder adjustment methods, we had to limit the scenarios and the number of laboratory test items. As more than one laboratory test item can be a confounding factor, further studies focusing on such situations are needed. The second is that we only examined the Tokushukai Database. When applying the missing data method to the entire MID-NET®, one must refer to the findings of this study considering the differences from the Tokushukai Database regarding the target population and missing data sources.

Conclusions

Based on the five main findings of this study (Table 4), we concluded that, although the different missing methods may contribute to differences in parameter estimates of the outcome model, SRI and MI can provide similar point estimates, and two approaches considering hospital variations do not have a major impact on the results. Confounding adjustment by PS matching gave unstable point estimates and wide confidence intervals and should therefore be used carefully. Although we report findings based on a case study and cannot draw generalizable recommendation, our research results may help in the selection of missing data imputation methods and the interpretation of obtained results in the future utilization of MID-NET®.

Table 4 Main findings in this study