Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET® in Japan

Komamine, Maki; Fujimura, Yoshiaki; Omiya, Masatomo; Sato, Tosiya

doi:10.1186/s12911-023-02345-7

Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET^® in Japan

Research
Open access
Published: 30 October 2023

Volume 23, article number 242, (2023)
Cite this article

Download PDF

You have full access to this open access article

BMC Medical Informatics and Decision Making Aims and scope Submit manuscript

Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET^® in Japan

Download PDF

Maki Komamine ORCID: orcid.org/0000-0002-8390-0922^1,2,
Yoshiaki Fujimura³,
Masatomo Omiya⁴ &
…
Tosiya Sato¹

754 Accesses
Explore all metrics

Abstract

Background

To evaluate missing data methods applied to laboratory test results used for confounding adjustment, utilizing data from 10 MID-NET^®-collaborative hospitals.

Methods

Using two scenarios, five methods dealing with missing laboratory test results were applied, including three missing data methods (single regression imputation (SRI), multiple imputation (MI), and inverse probability weighted (IPW) method). We compared the point estimates of adjusted hazard ratios (aHRs) and 95% confidence intervals (CIs) between the five methods. Hospital variability in missing data was considered using the hospital-specific approach and overall approach. Confounding adjustment methods were propensity score (PS) weighting, PS matching, and regression adjustment.

Results

In Scenario 1, the risk of diabetes due to second-generation antipsychotics was compared with that due to first-generation antipsychotics. The aHR adjusted by PS weighting using SRI, MI, and IPW by the hospital-specific-approach was 0.61 [95%CI, 0.39–0.96], 0.63 [95%CI, 0.42–0.93], and 0.76 [95%CI, 0.46–1.25], respectively. In Scenario 2, the risk of liver injuries due to rosuvastatin was compared with that due to atorvastatin. Although PS matching largely contributed to differences in aHRs between methods, PS weighting provided no substantial difference in point estimates of aHRs between SRI and MI, similar to Scenario 1. The results of SRI and MI in both scenarios showed no considerable changes, even upon changing the approaches considering hospital variations.

Conclusions

SRI and MI provide similar point estimates of aHR. Two approaches considering hospital variations did not markedly affect the results. Adjustment by PS matching should be used carefully.

View this article's peer review reports

Characteristics of hospital differences in missing of clinical laboratory test results in a multi-hospital observational database contributing to MID-NET® in Japan

Article Open access 06 June 2021

Incorporating Linked Healthcare Claims to Improve Confounding Control in a Study of In-Hospital Medication Use

Article 03 May 2015

Robustness of Multiple Imputation Methods for Missing Risk Factor Data from Electronic Medical Records for Observational Studies

Article 10 September 2022

Background

In April 2018, the operation of the Medical Information Database Network (MID-NET^®) began as a national project aimed at utilizing real-world data for drug safety assessments in Japan [1,2,3,4]. MID-NET^® is a database systems network consisting of 10 collaborative organizations (23 collaborative hospitals) that can analyze data derived from claim data, diagnosis procedure combination data, and electronic medical record (EMR) data at the individual-level [2].

Laboratory test results derived from EMR data have detailed information on clinical symptoms [5] and are expected to be used for confounding adjustments in drug safety assessments. However, the appropriate use of these test results is difficult since a number of data obtained during routine medical care may be missing in datasets for analysis. Therefore, it is essential to appropriately select and apply existing methods to reduce bias due to missing data (hereinafter referred to as “missing data methods”).

Choosing a missing data method requires an understanding of the missing data sources and missing data mechanism [6,7,8]. Raebel et al., in their study using the US Food and Drug Administration Mini-Sentinel Distributed Database [6], reported that missing data sources of laboratory test results include testing outside of contracted laboratories, patient location where testing was conducted, patient clinical features, etc. In the study of the 10 MID-NET^®-collaborative hospitals of the Tokushukai Medical Group [9], authors reported that patient background and setting of ordering system of laboratory tests were the main missing data sources. Although their impact was not evaluated because they were unobserved, the measurement policies by physicians and institutions were considered as potential sources as well. Missing data mechanism can be classified as missing completely at random (MCAR), missing at random (MAR), or missing not at random (MNAR) based on the relationship of missing data probability with missing data and observed values [10] (See Table 1).

Table 1 Missing data mechanisms and their examples

Full size table

In the utilization of laboratory test results contained in MID-NET^® to confounding adjustments, it is not clear what impact different missing data methods have on the result. Furthermore, owing to the differences between hospitals in terms of the proportion of patients with missing data (hereinafter, “missing proportion”) and the relationship between missing data and patient background [9], hospital variations regarding missing data should be well-considered.

In this study, the application of missing data methods was carried out to two scenarios of drug safety assessment adjusted by one laboratory test item. We evaluated the impact that missing data methods/approaches to hospital variations had on the interpretation of effect estimates and results.

Methods

Study’s scope

We considered a drug safety assessment to estimate an exposure effect using the Cox proportional-hazards model, which is commonly used in cohort designs. Assuming one laboratory test item as a confounder, we applied five missing methods.

Database and target hospitals

This study used the database system for MID-NET^®-collaborative organizations of Tokushukai Medical Group (hereinafter, “Tokushukai database”), which has the largest number of MID-NET^®- collaborative hospitals (10 hospitals; Supplementary Table S1). Supplementary Table S2 demonstrates the data items used for analysis.

Definition of missing data

As per a previous study, missing data were defined as “data that would be meaningful for the analysis but not available during a specific period before the first prescription date” [9]. The specific period, based on the results of this prior study, was set to 90 days.

Missing methods

MAR, the reasoning behind which is explained in Missing methods section, was assumed as the missing data mechanism in this study. The following four methods were considered as missing data methods providing unbiased results (hereinafter, “MAR-based methods”) when both the MAR assumption and the correct model specification used in the missing data method (hereinafter collectively referred to as “missing data models”) were correct: single regression imputation (SRI), multiple imputation (MI), inverse probability weighted (IPW) method, and likelihood-based method [11,12,13,14]. In the MID-NET^® database system, SAS version 9.4 (SAS Institute Inc., Cary, NC, USA) can be used. Since SRI, MI, and IPW methods are implemented in SAS version 9.4, we adopted these MAR-based methods.

In this study, five missing methods were applied in order to evaluate their impact on the effect estimation of the outcome model. The methods included the above three MAR-based methods, a method excluding a laboratory test item from baseline covariates (Exclusion), and a complete case (CC) method (Table 2). Table 2 provides a brief description of the three MAR-based methods [7, 15,16,17,18,19,20,21,22,23] and the settings of this study. In the application of MAR-based methods, two approaches were adopted to consider hospital variations in missing data. The hospital-specific-approach implements a missing data model within each hospital cohort. The overall-approach, consequently, implements a missing data model to the overall cohort (combined hospitals cohorts) and uses hospital as a fixed effect covariate (Fig. 1).

Table 2 Overview of the five missing methods

Full size table

Confounding adjustment and outcome model

For the confounding adjustment methods, we adopted two propensity score (PS) methods (PS weighting and PS matching) as well as an outcome model method with confounding factors as covariates (regression adjustment). The confounding factors, other than the laboratory test item, were patient related factors (see Table 3) and the corresponding hospitals. Since the target population in the scenarios of this study is an exposed population, the standardized mortality ratio weighting (SMRW) [27] was used for PS weighting.

Table 3 Details of scenarios 1 and 2

Full size table

Then, the point estimate of adjusted hazard ratio (aHR) and 95% confidence interval (CI) were calculated using the Cox proportional-hazard model. In PS matching, stratified Cox proportional-hazard model by matched pairs was used. Robust standard error was used to calculate 95% CI. In the combination of PS methods and MI, aHR estimation was performed for m imputed datasets, and m estimates were combined (“within approach”) [34] (see Supplementary Fig. S1).

Scenarios

Based on our previous research [9], we selected the scenarios of two cohort studies that evaluated the relation between drugs and their known risks (see Supplementary Fig. S2). Scenario 1 was the risk of diabetes associated with antipsychotic drug use. In Scenario 1, the blood glucose level before prescription of antipsychotic drugs was the target laboratory test item. Scenario 2 was the risk of hepatic injury associated with HMG-CoA reductase inhibitors (statins) use. In Scenario 2, the target laboratory test item was either the alkaline phosphatase (ALP), alanine transaminase (ALT), low-density lipoprotein cholesterol (LDL-chol), or triglyceride (TG) levels before statins prescription. In Scenario 2, four laboratory test items were included in the baseline covariates of the outcome model, and CC method, SRI, MI, and IPW method were set for each laboratory test item. Table 3 shows the detailed settings of the scenarios, including cohort sizes and the number of complete cases.

Protocol approval and statistical analysis

Our study protocol was approved by the Kyoto University Graduate School, Faculty of Medicine, and Kyoto University Hospital Ethics Committee in November 2018 (R1793). Statistical analyses were performed using SAS version 9.4 (SAS Institute, Cary, NC, USA).

Results

Overall cohort sizes and the numbers of complete cases were summarized in Table 3 and Supplementary Figs. S3 and S4. Patient backgrounds including incidence rate are demonstrated in Supplementary Tables S3-S6 (Tables S3 and S5 for overall cohorts and complete cases, and Tables S4 and S6 for hospital cohorts).

Scenario 1

Confounding adjustment by PS weighting

The aHR of Exclusion was 0.52 (95% CI, 0.34–0.81) (Fig. 2). CC method, SRI, MI, and IPW method contained blood glucose level as a covariate. The aHRs of SRI and MI by the hospital-specific-approach were 0.61 (95% CI, 0.39–0.96) and 0.63 (95% CI, 0.42–0.93), respectively; the point estimates were relatively close, but the width of the 95% CI was slightly narrower in MI. Conversely, the aHRs of the CC and IPW methods by hospital-specific-approach were 0.78 and 0.76, respectively, and thus higher than those of SRI and MI.

The aHRs of SRI, MI, and IPW method by the overall-approach showed no substantial differences compared with the estimates by the hospital-specific-approach (Fig. 3).

Confounding adjustment by PS matching

The aHRs of SRI and MI by the hospital-specific-approach were 1.10 (95% CI, 0.72–1.66) and 1.01 (95% CI, 0.34–2.97), respectively. Although there were no substantial differences in point estimates, the width of the 95% CI was larger in MI (Fig. 2). The aHR of the IPW method was 1.31 by the hospital-specific-approach but reduced to 0.94 by the overall-approach (Fig. 3).

Regression adjustment

The aHRs of SRI and MI by the hospital-specific-approach were relatively close in terms of the point estimates and the 95% CIs (Fig. 2).

Scenario 2

Confounding adjustment by PS weighting

The aHR of Exclusion was 1.32 (95% CI, 0.83–2.08). CC method, SRI, MI, and IPW method included each laboratory test item as a covariate. The aHRs of SRI and MI by hospital-specific-approach varied between 1.20 and 1.32 through all laboratory test items (Fig. 4). While there were no substantial differences in the point estimates of SRI and MI, the width of the 95% CI was slightly narrower in MI; for example, the aHR of SRI and MI for LDL-chol were 1.24 (95% CI, 0.78–1.96) and 1.20 (95% CI, 0.84–1.71), respectively. The aHR of IPW method for ALT was 1.21, whereas those for ALP and LDL-chol were close to 1. The aHRs of the CC method varied depending on the type of laboratory test item. Accordingly, the aHRs for ALT and LDL-chol were 1.29 and 1.08, respectively.

The aHRs of SRI and MI by the overall-approach ranged between 1.19 and 1.32 through all laboratory test items (Fig. 5). For some laboratory test items, especially ALP, the aHR of MI tended to be higher than those determined by the hospital-specific-approach.

Confounding adjustment by PS matching

For all laboratory test items, the differences in aHR between methods were higher and the width of 95% CI was greater than the values obtained with other confounding adjustment methods.

Regression adjustment

The aHRs of SRI and MI by the hospital-specific-approach ranged from 1.22 to 1.28 throughout all laboratory test items. For the results in each laboratory test item, both the point estimates and the 95% CIs were relatively close in SRI and MI (Fig. 4). However, the aHR of the IPW method was lower than those of SRI and MI.

Discussion

We evaluated the impact of five missing methods, approaches to hospital variations, and confounding adjustment methods on the effect estimation in two different scenarios. In Missing methods section of Discusssion and Approaches considering hospital variations sections, we excluded discussions on PS matching in Scenario 2 were excluded and were, instead, included in Confounding adjustment methods section.

Missing methods

SRI versus MI

Although the setting is different, Marshall et al. [35] reported that the bias in SRI and MI, using the same covariates in imputation models, is the same when the missing proportion of a continuous variable is around 25%, and that there was no substantial difference in bias even if the missing proportion increased to 50%. In this study, we had no difference in the point estimates of aHRs between SRI and MI. This might be due to the fact that we used the same covariates in imputation models.

However, for 95% CIs, we found some differences between SRI and MI in confounding adjustment with PS methods (PS weighting of Scenarios 1 and 2 and PS matching of Scenario 1). This will be further explained in Confounding adjustment methods section. Although SRI is generally described as underestimating SE [36], the reason that this study did not show a clear underestimation of SE regardless of the type of confounding adjustment method might be due to the targeted single laboratory test item.

IPW method versus imputation methods (SRI and MI)

The degree of difference in the point estimates of aHRs between the IPW method and the imputation methods varied depending on laboratory test items. In particular, there were large differences in Scenarios 1 and 2 adjusted by blood glucose level and ALP, respectively. This might be attributed to the large weights in the IPW method. In the analyses of blood glucose level and ALP, some hospital cohorts had IPWs of 20 or higher. Large weights can contribute to estimation instability. Therefore, it is important to check the stability of the effect estimation using the truncation of large weights [23] as a sensitivity analysis.

CC and exclusion methods

In the CC method, large differences in aHRs from those in the MAR-based methods were observed, especially in Scenario 1 adjusted by blood glucose level and in Scenario 2 adjusted by LDL-chol. When the standardized mean differences (SMD) [37] between complete cases and overall cohorts were calculated for patient background factors, factors with SMD of 0.1 or higher existed in Scenarios 1 and 2 (complete cases for LDL-chol) (Supplementary Tables S3 and S5). Such factors were considered to have non-negligible differences in patient backgrounds [38, 39], and the aHR difference between the CC method and the missing data methods was due to the fact that complete cases did not represent overall cohorts.

In the method excluding the laboratory test item in Scenario 1, a relatively large difference in aHR from those in the MAR-based methods occurred. However, in some Scenario 2 cases, the difference was relatively small and no noticeable difference in the analysis for ALT existed. This might be attributed to the fact that confounding by ALT was small because mean ALTs were the same for rosuvastatin and atorvastatin users.

Assumption of missing data mechanism

We considered the missing data mechanism in this study was a mixture of MAR and MNAR. Applying the missing data method based on the MNAR assumption requires extensive modeling of the missing data process. In such a case, the MAR-based method may be used as the main analysis method. Then, to evaluate the stability of the main results with the MAR assumption, the pre-planned sensitivity analysis should be considered [40].

When applying the MAR-based methods, the validity of the MAR assumption should be relatively increased by considering as many factors affecting the missing data as possible increases [41, 42]. In this study, all patient background characteristics were included in the missing data model. However, other missing data sources, such as the settings in the ordering system for laboratory tests, the measurement policy of tests, and the preferences of doctors were unobserved and could not be included in the missing data models.

Approaches considering hospital variations

The results of SRI and MI, and some of those in the IPW method showed no noticeable change in aHR due to different approaches. In the overall approach, hospitals were included in the missing data model as a fixed variable and were expected to capture some of the effects of unobserved missing data sources at the hospital level. In the hospital-specific approach, interactions between patient backgrounds and hospitals can be considered. It is not known whether the hospital effect can be explained by the fixed effect. Additionally, the presence or absence of interaction has not been evaluated, but the results of this study indicate that the difference in approach may not significantly affect the difference in aHRs.

Some changes in aHRs due to the difference in approach in the IPW method may have been caused by the change in the distribution of IPW. In the analysis for ALP in Scenario 2 of the hospital-specific-approach, unlike the overall-approach, some patients with an IPW of 20 or higher existed.

Confounding adjustment methods

Confounding adjustment methods affected the results in two ways. First, in PS methods (especially PS matching), there was a difference in 95% CIs between SRI and MI. For binary outcomes, Granger et al. [43] reported that coverage of 95% CI was too high in the PS matching compared to the PS weighting using SMRW due to the overestimation of variances. A combination of the PS matching and MI with the “within approach” may have affected the difference in 95% CIs in this study as well.

Second, in the analysis using PS matching, the degree of variation in aHRs and the ranges of 95% CI between missing data methods were large. One of the reasons was attributed to the decrease in the number of patients to be analyzed (Figs. 2 and 3). In Scenario 2, the incidence rate was low (Supplementary Table S5), and the impact of decreasing sample size was large. PS matching should be used carefully, considering the results from this study and its features (e.g., the target population will no longer be an exposed population if all patients in the exposed population do not have matched controls).

In scenario 1, unlike a previous study [44], the increased risk of diabetes associated with SGA use was not observed. In our study, although we did not confirm the frequency of laboratory measurements during the follow-up period, there were differences between groups in the missing proportion before drug prescription (SGA users: 20.6%, FGA users: 9.2%), suggesting that a detection bias may have existed and affected the results. In scenario 2, the point estimates of aHR were around or over 1, consistent with previous studies [32, 33].

When selecting scenarios and laboratory test items, we also considered the feasibility of the number of events; thus, it was not covered in this study, but the following case can also be a typical scenario under which missing laboratory values occur in the MID-NET^® based on the knowledge from our previous study [9]; the scenario using laboratory tests with restrictions on the implementation interval based on the health insurance system (e.g., measurement can only be performed once every 3 months) that may be unique to the Japanese medical environment. It is important to Considering the impact and characteristics of missing data is important when planning research.

There are two main limitations to this study. The first is that the scenarios and laboratory test items were limited. Since we used five missing methods, two approaches for considering hospital variations, and three confounder adjustment methods, we had to limit the scenarios and the number of laboratory test items. As more than one laboratory test item can be a confounding factor, further studies focusing on such situations are needed. The second is that we only examined the Tokushukai Database. When applying the missing data method to the entire MID-NET^®, one must refer to the findings of this study considering the differences from the Tokushukai Database regarding the target population and missing data sources.

Conclusions

Based on the five main findings of this study (Table 4), we concluded that, although the different missing methods may contribute to differences in parameter estimates of the outcome model, SRI and MI can provide similar point estimates, and two approaches considering hospital variations do not have a major impact on the results. Confounding adjustment by PS matching gave unstable point estimates and wide confidence intervals and should therefore be used carefully. Although we report findings based on a case study and cannot draw generalizable recommendation, our research results may help in the selection of missing data imputation methods and the interpretation of obtained results in the future utilization of MID-NET^®.

Table 4 Main findings in this study

Full size table

Availability of data and materials

The datasets generated and/or analyzed during the current study are not publicly available due to the terms of use for MID-NET^® to which we adhered when conducting this study; the accessibility of the dataset used for this analysis is restricted to specific authors including the corresponding author in a predetermined secure environment. No outside researchers are allowed to access the dataset. This study used the database system for MID-NET^®-collaborative organizations of the Tokushukai Medical Group, a part of MID-NET^®, and not the entire MID-NET^®. However, we followed the terms of use for MID-NET^®, as the datasets were included in the entire MID-NET^®.

Abbreviations

aHR:: Adjusted hazard ratio
ALP:: Alkaline phosphatase
ALT:: Alanine transaminase
CC:: Complete case
CI:: Confidence interval
EMR:: Electronic medical record
FGA:: First-generation antipsychotic
HbA1c:: Hemoglobin A1c
ICD:: International Classification of Diseases
IPW:: Inverse probability weighted
JDS:: The Japan Diabetes Society
LDL-chol:: Low-density lipoprotein cholesterol
MAR:: Missing at random
MCAR:: Missing completely at random
MNAR:: Missing not at random
MI:: Multiple imputation
NGSP:: The National Glycohemoglobin Standardization Program
PS:: Propensity score
SGA:: Second-generation antipsychotic
SMD:: Standardized mean differences
SMRW:: Standardized mortality ratio weighting
SRI:: Single regression imputation
TG:: Triglyceride

References

Yamada K, Itoh M, Fujimura Y, et al. The utilization and challenges of Japan’s MID-NET® medical information database network in postmarketing drug safety assessments: a summary of pilot pharmacoepidemiological studies. Pharmacoepidemiol Drug Saf. 2019;28(5):601–8.
Article PubMed Google Scholar
Yamaguchi M, Inomata S, Harada S, et al. Establishment of the MID-NET® medical information database network as a reliable and valuable database for drug safety assessments in Japan. Pharmacoepidemiol Drug Saf. 2019;28(10):1395–404.
Article PubMed PubMed Central Google Scholar
Pharmaceuticals and Medical Devices Agency. Summary of MID-NET^® study: no. 2018-001. 2020. https://www.pmda.go.jp/files/000233987.pdf. Accessed 7 May 2022.
Pharmaceuticals and Medical Devices Agency. Summary of MID-NET^® study: no. 2018-002. 2020. https://www.pmda.go.jp/files/000234446.pdf. Accessed 7 May 2022.
Schneeweiss S, Rassen JA, Glynn RJ, et al. Supplementing claims data with outpatient laboratory test results to improve confounding adjustment in effectiveness studies of lipid-lowering treatments. BMC Med Res Methodol. 2012;12:180.
Article CAS PubMed PubMed Central Google Scholar
Raebel MA, Shetterly S, Lu CY, et al. Methods for using clinical laboratory test results as baseline confounders in multi-site observational database studies when missing data are expected. Pharmacoepidemiol Drug Saf. 2016;25(7):798–814.
Article CAS PubMed Google Scholar
Wells BJ, Chagin KM, Nowacki AS, Kattan MW. Strategies for handling missing data in electronic health record-derived data. EGEMS (Wash DC). 2013;1(3):1035.
PubMed Google Scholar
Eekhout I, de Boer RM, Twisk JW, de Vet HC, Heymans MW. Missing data: a systematic review of how they are reported and handled. Epidemiology. 2012;23(5):729–32.
Article PubMed Google Scholar
Komamine M, Fujimura Y, Nitta Y, Omiya M, Doi M, Sato T. Characteristics of hospital differences in missing of clinical laboratory test results in a multi-hospital observational database contributing to MID-NET® in Japan. BMC Med Inform Decis Mak. 2021;21(1):181.
Article PubMed PubMed Central Google Scholar
Molenberghs G, Fitzmaurice G, Kenward MG, Tsiatis A, Verbeke G. Handbook of missing data methodology. Boca Raton: CRC Press; 2014.
Book Google Scholar
Herring AH, Ibrahim JG. Likelihood-based methods for missing covariates in the Cox proportional hazards model. J Am Stat Assoc. 2001;96(453):292–302.
Article Google Scholar
Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol. 2006;59(10):1087–91.
Article PubMed Google Scholar
Perkins NJ, Cole SR, Harel O, et al. Principled approaches to missing data in epidemiologic studies. Am J Epidemiol. 2018;187(3):568–75.
Article PubMed Google Scholar
Dziura JD, Post LA, Zhao Q, Fu Z, Peduzzi P. Strategies for dealing with missing data in clinical trials: from design to analysis. Yale J Biol Med. 2013;86(3):343–58.
PubMed PubMed Central Google Scholar
Little RJA, Rubin DB. Statistical analysis with missing data. 2nd ed. Hoboken: Wiley; 2002.
Book Google Scholar
Baraldi AN, Enders CK. An introduction to modern missing data analyses. J Sch Psychol. 2010;48(1):5–37.
Article PubMed Google Scholar
Rubin DB. Inference and missing data. Biometrika. 1976;63:581–92.
Article Google Scholar
Rubin DB. Multiple imputation for nonresponse in surveys. New York: Wiley; 1987.
Book Google Scholar
Graham JW, Olchowski AE, Gilreath TD. How many imputations are really needed? Some practical clarifications of multiple imputation theory. Prev Sci. 2007;8(3):206–13.
Article PubMed Google Scholar
Bounthavong M, Watanabe JH, Sullivan KM. Approach to addressing missing data for electronic medical records and pharmacy claims data research. Pharmacotherapy. 2015;35(4):380–7.
Article PubMed Google Scholar
van Buuren S. Flexible imputation of missing data. CRC Press; 2018.
Wang CY, Chen HY. Augmented inverse probability weighted estimator for Cox missing covariate regression. Biometrics. 2001;57(2):414–9.
Article CAS PubMed Google Scholar
Seaman SR, White IR. Review of inverse probability weighting for dealing with missing data. Stat Methods Med Res. 2013;22(3):278–95.
Article PubMed Google Scholar
Von Elm E, Altman DG, Egger M, et al. The strengthening the reporting of observational studies in epidemiology (STROBE) Statement: guidelines for reporting observational studies. Ann Intern Med. 2007;147(8):573–7.
Article Google Scholar
White IR, Royston P. Imputing missing covariate values for the Cox model. Stat Med. 2009;28(15):1982–98.
Article PubMed PubMed Central Google Scholar
Xu Q, Paik MC, Rundek T, Elkind MS, Sacco RL. Reweighting estimators for Cox regression with missing covariate data: analysis of insulin resistance and risk of stroke in the Northern Manhattan Study. Stat Med. 2011;30(28):3328–40.
Article PubMed PubMed Central Google Scholar
Sato T, Matsuyama Y. Marginal structural models as a tool for standardization. Epidemiology. 2003;14(6):680–6.
Article PubMed Google Scholar
Newcomer JW, Haupt DW, Fucetola R, et al. Abnormalities in glucose regulation during antipsychotic treatment of schizophrenia. Arch Gen Psychiatry. 2002;59(4):337–45.
Article CAS PubMed Google Scholar
Lindenmayer JP, Nathan AM, Smith RC. Hyperglycemia associated with the use of atypical antipsychotics. J Clin Psychiatry. 2001;62Suppl23:30–8.
Google Scholar
Crestor (rosuvastatin) [package insert]. Osaka: AstraZeneca K.K.; 2022. https://www.pmda.go.jp/PmdaSearch/iyakuDetail/ResultDataSetPDF/670227_2189017F1022_1_28. Accessed 7 May 2022.
Lipitor (atorvastatin) [package insert]. Tokyo: Viatris Inc; 2021. https://www.pmda.go.jp/PmdaSearch/iyakuDetail/ResultDataSetPDF/671450_2189015F1023_2_01. Accessed 7 May 2022.
Clarke AT, Johnson PC, Hall GC, Ford I, Mills PR. High dose atorvastatin associated with increased risk of significant hepatotoxicity in comparison to simvastatin in UK GPRD cohort. PLoS One. 2016;11(3):e0151587.
Article PubMed PubMed Central Google Scholar
Chang CH, Chang YC, Lee YC, Liu YC, Chuang LM, Lin JW. Severe hepatic injury associated with different statins in patients with chronic liver disease: a nationwide population-based cohort study. J Gastroenterol Hepatol. 2015;30(1):155–62.
Article CAS PubMed Google Scholar
Leyrat C, Seaman SR, White IR, et al. Propensity score analysis with partially observed covariates: how should multiple imputation be used? Stat Methods Med Res. 2019;28(1):3–19.
Article PubMed Google Scholar
Marshall A, Altman DG, Holder RL. Comparison of imputation methods for handling missing covariate data when fitting a Cox proportional hazards model: a resampling study. BMC Med Res Methodol. 2010;10:112.
Article PubMed PubMed Central Google Scholar
Little RJA, Rubin DB. Statistical analysis with missing data. Wiley; 1987, pp.44–47.
Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med. 2009;28(25):3083–107.
Article PubMed PubMed Central Google Scholar
Normand ST, Landrum MB, Guadagnoli E, et al. Validating recommendations for coronary angiography following an acute myocardial infarction in the elderly: a matched analysis using propensity scores. J Clin Epidemiol. 2001;54(4):387–98.
Article CAS PubMed Google Scholar
Cappabianca G, Mariscalco G, Biancari F, et al. Safety and efficacy of prothrombin complex concentrate as first-line treatment in bleeding after cardiac surgery. Crit Care. 2016;20:5.
Article PubMed PubMed Central Google Scholar
E9 (R1): addendum to statistical principles for clinical trials on choosing appropriate estimands and defining sensitivity analyses in clinical trials. https://database.ich.org/sites/default/files/E9-R1_Step4_Guideline_2019_1203.pdf. Accessed 7 May 2022.
Beaulieu-Jones BK, Lavage DR, Snyder JW, Moore JH, Pendergrass SA, Bauer CR. Characterizing and managing missing structured data in electronic health records: data analysis. JMIR Med Inform. 2018;6(1):e11.
Article PubMed PubMed Central Google Scholar
van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med. 1999;18(6):681–94.
Article PubMed Google Scholar
Granger E, Sergeant JC, Lunt M. Avoiding pitfalls when combining multiple imputation and propensity scores. Stat Med. 2019;38(26):5120–32.
Article PubMed PubMed Central Google Scholar
Smith M, Hopkins D, Peveler RC, Holt RI, Woodward M, Ismail K. First- v. second-generation antipsychotics and risk for diabetes in schizophrenia: systematic review and meta-analysis. Br J Psychiatry. 2008;192(6):406–11.
Article CAS PubMed Google Scholar

Download references

Acknowledgements

We thank Dr. Masaaki Doi, Dr. Yoshiaki Uyama, and Tokushukai Medical Group for their valuable assistance. Views expressed here are those of the authors and do not necessarily represent the official views and findings of the Pharmaceuticals and Medical Devices Agency. This research was funded by the Agency for Medical Research and Development (AMED) under Grant Number JP21lk0201702.

Funding

This research was supported by the Japan Agency for Medical Research and Development (AMED) under Grant Number JP21lk0201702. The funding bodies played no role in the design of the study and collection, analysis, and interpretation of data and in writing the manuscript.

Author information

Authors and Affiliations

Department of Biostatistics, Kyoto University School of Public Health, Yoshida-konoecho, Sakyo-ku, Kyoto, 606-8501, Japan
Maki Komamine & Tosiya Sato
Office of Medical Informatics and Epidemiology, Pharmaceuticals and Medical Devices Agency, Tokyo, Japan
Maki Komamine
Head Office, Tokushukai Information System Incorporated, Osaka, Japan
Yoshiaki Fujimura
Department of Clinical Biostatistics, Graduate School of Medicine, Kyoto University, Kyoto, Japan
Masatomo Omiya

Authors

Maki Komamine
View author publications
You can also search for this author in PubMed Google Scholar
Yoshiaki Fujimura
View author publications
You can also search for this author in PubMed Google Scholar
Masatomo Omiya
View author publications
You can also search for this author in PubMed Google Scholar
Tosiya Sato
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

MK, MO, and TS conceptualized the study. MK analyzed the data. MK wrote the initial draft of the manuscript. MK, YF, MO, and TS contributed to the interpretation of findings and manuscript revisions. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Maki Komamine.

Ethics declarations

Ethics approval and consent to participate

The study was conducted in accordance with the Declaration of Helsinki and approved by the Kyoto University Graduate School and Faculty of Medicine Kyoto University Hospital Ethics Committee in November 2018 (R1793). Based on Japanese Ethical Guidelines for Medical and Health Research Involving Human Subjects, the need for informed consent from individual patients was waived by the Kyoto University Graduate School and Faculty of Medicine Kyoto University Hospital Ethics Committee. The use of extracted data for this study from the database system for MID-NET^®-collaborative organizations of Tokushukai Medical Group was approved by the administrative board of General Incorporated Association Tokushukai, which is the data holder. The data used in this study were anonymized before their use (refer to the previous study by Yamaguchi et al. [2] for details of the data anonymization).

Consent for publication

Not applicable.

Competing interests

Maki Komamine is employed by the Pharmaceuticals and Medical Devices Agency and has no financial or personal relationships with other people or organizations that could inappropriately influence or bias the contents of this paper. Other authors have no financial or personal relationships with other people or organizations that could inappropriately influence or bias the contents of this paper.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary Table S1.

List of the 10 hospitals in the database system for MID-NET^® collaborative organizations of Tokushukai Medical Group. Supplementary Table S2. Data items used in this study. Supplementary Table S3. Scenario 1. Patient backgrounds among the cohort, complete cases, and cases with missing data. Supplementary Table S4. Scenario 1. Patient backgrounds by hospital. Supplementary Table S5. Scenario 2. Patient backgrounds among the cohort, complete cases, and cases with missing data. Supplementary Table S6. Scenario 2. Patient backgrounds by hospital. Supplementary Figure S1. Sequence of steps from handling missing data to statistical analysis. Supplementary Figure S2. Study scenario selection flowchart. Supplementary Figure S3. Scenario 1. Number of patients in the study cohort: risk of diabetes associated with SGA. Supplementary Figure S4. Scenario 2. Number of patients in the study cohort: risk of hepatic injury associated with rosuvastatin.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article

Komamine, M., Fujimura, Y., Omiya, M. et al. Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET^® in Japan. BMC Med Inform Decis Mak 23, 242 (2023). https://doi.org/10.1186/s12911-023-02345-7

Download citation

Received: 30 November 2022
Accepted: 19 October 2023
Published: 30 October 2023
DOI: https://doi.org/10.1186/s12911-023-02345-7

Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET® in Japan

Abstract

Background

Methods

Results

Conclusions

Similar content being viewed by others

Characteristics of hospital differences in missing of clinical laboratory test results in a multi-hospital observational database contributing to MID-NET® in Japan

Incorporating Linked Healthcare Claims to Improve Confounding Control in a Study of In-Hospital Medication Use

Robustness of Multiple Imputation Methods for Missing Risk Factor Data from Electronic Medical Records for Observational Studies

Background

Methods

Study’s scope

Database and target hospitals

Definition of missing data

Missing methods

Confounding adjustment and outcome model

Scenarios

Protocol approval and statistical analysis

Results

Scenario 1

Confounding adjustment by PS weighting

Confounding adjustment by PS matching

Regression adjustment

Scenario 2

Confounding adjustment by PS weighting

Confounding adjustment by PS matching

Regression adjustment

Discussion

Missing methods

SRI versus MI

IPW method versus imputation methods (SRI and MI)

CC and exclusion methods

Assumption of missing data mechanism

Approaches considering hospital variations

Confounding adjustment methods

Conclusions

Availability of data and materials

Abbreviations

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Ethics approval and consent to participate

Consent for publication

Competing interests

Additional information

Publisher’s Note

Supplementary Information

Additional file 1: Supplementary Table S1.

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation

Dealing with missing data in laboratory test results used as a baseline covariate: results of multi-hospital cohort studies utilizing a database system contributing to MID-NET^® in Japan