2.1 Dataset
The Medical Information Mart for Intensive Care (MIMIC-III) database contains clinical and administrative data on over 60,000 ICU stays at Beth Israel Deaconess Medical Center (BIDMC) between 2001 and 2012. It includes operational-level data on bed assignments and service transfers, as well as ICD-9-CM diagnoses and several mortality measures (ICU stay mortality, hospital mortality, and survival duration up to one year).
2.2 Methodology
Cohort Selection
We included all adult subjects (aged 18 years or older) who were cared for by the medical ICU (MICU) at any point during their admission. The study period ran from June 2002 through December 2012. To ensure independence of observations, only the last ICU admission for each subject was included in the analysis.
We excluded subjects whose primary hospital team was non-medical (i.e. surgical or cardiac) at any point during their admission, as this might imply a specific reason, aside from capacity constraints, for a patient to be a boarder in a non-medical ICU. For example, a postoperative subject in the surgical ICU might be transferred from the surgical ICU team to the medical ICU team for persistent respiratory failure.
The final study population included 8442 subjects, of whom 1881 (22%) were exposed to the effects of boarding.
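The two inclusion rules above (adults only, last ICU admission per subject) can be sketched in plain Python as follows. The record layout and the field names `subject_id`, `age`, and `intime` are assumptions made for illustration and are not taken from the code supplement.

```python
from datetime import datetime

def select_cohort(icu_stays, min_age=18):
    """Keep adult stays, then only the last ICU admission per subject."""
    last_stay = {}
    for stay in icu_stays:
        if stay["age"] < min_age:
            continue  # exclude subjects younger than 18 years
        current = last_stay.get(stay["subject_id"])
        # Retain the stay with the latest ICU admission time for this subject.
        if current is None or stay["intime"] > current["intime"]:
            last_stay[stay["subject_id"]] = stay
    return sorted(last_stay.values(), key=lambda s: s["subject_id"])

# Hypothetical toy records, not real MIMIC-III data.
stays = [
    {"subject_id": 1, "age": 54, "intime": datetime(2005, 3, 1)},
    {"subject_id": 1, "age": 56, "intime": datetime(2007, 8, 9)},
    {"subject_id": 2, "age": 17, "intime": datetime(2006, 1, 5)},
    {"subject_id": 3, "age": 71, "intime": datetime(2010, 11, 20)},
]
cohort = select_cohort(stays)
```

In this toy input, subject 1 contributes only the 2007 stay and subject 2 is dropped as a minor.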
Statistical Approach
A naive estimate of the effect of boarding on mortality would compare the outcomes of patients who were boarders to those who were not. However, the decision to board a patient is not random. It takes into account the level of severity of a given patient’s condition, as well as how that compares with the severity levels of other incoming patients also in need of an ICU bed. It is likely that much of the information that informs this decision is unobservable. As a consequence, if we conducted this study as a simple regression analysis we would obtain biased estimates of the effect of boarding.
For example, assume that boarding increases mortality, but also that ICU staff preferentially select less severely ill patients to be boarders. In this hypothetical scenario, the observed association between boarding and mortality could appear protective if the harmful effect of boarding on mortality is smaller than the apparent reduction in observed mortality that results from selecting healthier patients. While one may, and should, control for patients’ severity of illness and pre-existing health levels, it is not usually possible to observe these with the same granularity and accuracy as the hospital staff who decide whether the patient will become a boarder. As a result, boarders may still be healthier than non-boarders even after conditioning on a measure of severity of illness.
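The hypothetical scenario can be made concrete with a small simulation. The sketch below is illustrative only: the severity distribution, the boarding rule, and every probability in it are invented assumptions and are not estimates from MIMIC-III.

```python
import random

def simulate_naive_estimate(n=200_000, true_effect=0.03, seed=0):
    """Return the naive mortality difference (boarders minus non-boarders)."""
    rng = random.Random(seed)
    deaths = {True: 0, False: 0}
    counts = {True: 0, False: 0}
    for _ in range(n):
        severity = rng.random()  # unobserved severity of illness in [0, 1)
        # Assumed rule: staff preferentially board less severely ill patients.
        boarded = rng.random() < 0.7 - 0.6 * severity
        # Assumed risk: rises with severity and, by construction, with boarding.
        p_death = 0.05 + 0.30 * severity + true_effect * boarded
        died = rng.random() < p_death
        counts[boarded] += 1
        deaths[boarded] += died
    return deaths[True] / counts[True] - deaths[False] / counts[False]

naive = simulate_naive_estimate()
```

With these made-up parameters, boarding raises each patient's risk of death by 0.03 in absolute terms, yet the naive difference is negative (roughly -0.03 in expectation), so boarding appears protective.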
An instrumental variable analysis (IVA) is an attractive approach in this situation. In this study, we focus on MICU patients. We propose that the number of remaining available beds in the western campus MICU at the time of patient intake (west_initial_remaining_beds) may serve as a valid instrument for boarding status. It is important to note that west_initial_remaining_beds does not include beds that are available outside of the MICU (i.e. beds to which boarders can be assigned). The boarder status of the patient is the causal variable and the outcome is death during ICU stay (Fig. 19.2).
The Oxford Acute Severity of Illness Score (OASIS) is employed to help account for residual differences between the health status of boarders and non-boarders at the time of their intake into the ICU. OASIS is an ICU scoring system that has been shown to have non-inferior performance characteristics relative to APACHE (Acute Physiology and Chronic Health Evaluation), MPM (Mortality Probability Model), and SAPS (Simplified Acute Physiology Score) [2]. We preferentially use OASIS for severity of illness adjustment because its scores can be more accurately reconstructed in MIMIC-III in a retrospective manner than the aforementioned alternatives.
At times when hospital load is high, the total number of patients being cared for by the ICU team (west_initial_team_census) is likely to be high, and west_initial_remaining_beds is likely to be low. Furthermore, it is plausible that higher values of west_initial_team_census might affect mortality as a relatively fixed quantity of ICU resources (e.g. physicians) is stretched across a greater number of patients.
At first it may be unclear why west_initial_team_census and west_initial_remaining_beds are imperfectly correlated, as one might anticipate that the number of remaining beds simply falls one-for-one as the total number of patients being cared for by the ICU team rises. The source of variation between these variables is twofold. The primary driver is the stochastic pattern of ICU discharges. It is improbable that all boarders will be discharged prior to any of the non-boarders. Discharging a non-boarder while other patients remain as boarders creates a situation where the total team census may continue to exceed the bed capacity of the MICU, yet the number of available beds in the MICU becomes non-zero. The second, smaller source of variation is occupancy of MICU beds by patients being cared for by other ICU teams (e.g. a SICU patient boarding in the MICU).
Using west_initial_remaining_beds as an instrument is therefore valid, provided that we control for west_initial_team_census. To check that west_initial_remaining_beds is correlated with the propensity of patients to board, we fit a generalized additive model with a logistic link function.
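A cruder first look at instrument relevance is to tabulate the boarding rate at each value of the instrument. The sketch below does only that. It is not the generalized additive model described above, and the rows are synthetic values chosen for illustration.

```python
from collections import defaultdict

def boarding_rate_by_beds(rows):
    """Map each remaining-bed count to the share of intakes that boarded."""
    totals = defaultdict(int)
    boarders = defaultdict(int)
    for remaining_beds, boarded in rows:
        totals[remaining_beds] += 1
        boarders[remaining_beds] += int(boarded)
    return {beds: boarders[beds] / totals[beds] for beds in sorted(totals)}

# Synthetic (west_initial_remaining_beds, boarded) pairs, not study data.
rows = [(0, True), (0, True), (1, True), (1, False),
        (2, False), (2, True), (2, False), (2, False),
        (3, False), (3, False)]
rates = boarding_rate_by_beds(rows)
```

A boarding rate that falls as the number of remaining beds rises is the pattern a relevant instrument would be expected to show.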
Once a natural experiment has been identified and the validity of the instrumental variable confirmed, an IVA can be conducted to estimate the causal effect of the treatment. The standard in the econometrics literature has been to use two-stage least squares regression, in which ordinary least squares (OLS) is applied in two successive steps. There are two important limitations to this approach in biomedical settings. Firstly, it requires continuous treatment and outcome variables, whereas both tend to be discrete or binary in medical applications. Secondly, it requires knowledge of the functional form of the underlying relationships, so that the data can be transformed to make the relationships linear in the parameters of the estimated model. This is often beyond what is known in the biomedical field.
Several approaches have been developed to address these limitations. Probit models are part of a family of generalized linear models (GLM) that is well suited to working with discrete data, thereby addressing the first aforementioned limitation. Furthermore, use of a basis expansion may allow the functional form to be approximated flexibly using penalized splines, substantially relaxing the second limitation related to knowledge of functional forms. At least one statistical package, SemiParBIVProbit for R, combines these two approaches in an accessible implementation.
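For orientation, a recursive bivariate probit model of the kind this package fits can be written as below. This is a sketch of the general form, not the exact specification estimated in this study. The symbols are introduced here for illustration: $B_i$ is boarding status, $Y_i$ is death during ICU stay, $Z_i$ is the instrument (west_initial_remaining_beds), $X_i$ collects the control variables, and $f_1$ and $f_2$ are smooth functions represented by penalized splines.

$$ B_i = \mathbf{1}\{\gamma Z_i + f_1(X_i) + \varepsilon_{1i} > 0\} $$
$$ Y_i = \mathbf{1}\{\delta B_i + f_2(X_i) + \varepsilon_{2i} > 0\} $$
$$ (\varepsilon_{1i}, \varepsilon_{2i}) \sim N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right) $$

The correlation $\rho$ absorbs unobserved confounding between boarding and mortality, and $\delta$ is the parameter of interest. The instrument could equally enter the first equation through a smooth term rather than linearly.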
In addition to the probit model, we used the survival package for R to estimate a non-instrumental Cox proportional hazards model as a robustness check. To minimize selection bias in this non-instrumental model, we used a subset of the dataset in which it is intuitive that selective pressures would be reduced or non-existent: west_initial_remaining_beds equal to zero (all patients must board irrespective of their severity of illness) or west_initial_remaining_beds greater than or equal to three (no imminent capacity constraint exerting pressure on physicians to board patients). The linearity assumptions of the Cox model are strong and not justified a priori. To test for potential nonlinearities in the instrumental model, we therefore used the Vuong and Clarke tests of the SemiParBIVProbit package.
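The subset used for the Cox robustness check amounts to a simple filter on the instrument. A minimal sketch, assuming each intake is a dictionary with a `west_initial_remaining_beds` field; the `stay_id` field and the toy values are hypothetical.

```python
def low_selection_subset(rows):
    """Keep intakes with zero remaining beds or with three or more."""
    return [
        row for row in rows
        if row["west_initial_remaining_beds"] == 0
        or row["west_initial_remaining_beds"] >= 3
    ]

# Hypothetical intake records; only the instrument column matters here.
intakes = [{"stay_id": i, "west_initial_remaining_beds": beds}
           for i, beds in enumerate([0, 1, 2, 3, 5])]
subset = low_selection_subset(intakes)
```

Intakes with one or two remaining beds are dropped, since those are the situations in which physicians are most likely to choose whom to board.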
All of our models included controls for patient age, gender, OASIS and Elixhauser comorbidity scores, length of hospital stay prior to ICU admission, and calendar year. In addition to controlling for west_initial_team_census, we also controlled for the total number of boarders under the care of the MICU team.
2.3 Pre-processing
We used a software package called Chatto-Transform [3] that connects to a local PostgreSQL instance of MIMIC-III and simplifies the process of importing table data into an interactive Jupyter notebook [4]. Python 3 and the Pandas library [5] were used for data extraction and analysis (see code supplement).
The publicly available version of MIMIC-III applies random time-shifts to records to help prevent subjects from being identified. After institutional review board approval, we obtained the exact dates and bed assignments for each subject’s ICU stay and used these to reconstruct the entire hospital ICU census.
The services table in MIMIC-III documents the specific service (e.g. medicine, general surgery, cardiology) responsible for a patient at a given moment in time. The service providing MICU care is classified as ‘medicine’. Therefore, general medicine patients who are initially admitted to a ward and later require a MICU bed will still have only one entry per admission in this table, provided that they are not transferred to the care of a different service. We consider a refined copy of the services table (‘med_service_only’) that retains only those rows pertaining to patients cared for exclusively by the medicine service during their stay. The resulting table therefore has only one row per hospital admission.
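One way to build such a table is sketched below in plain Python. The column names `hadm_id` and `curr_service` and the service code `"MED"` are assumptions about the table layout, and the rows are hypothetical.

```python
from collections import defaultdict

def build_med_service_only(services, medicine_code="MED"):
    """Keep admissions whose single service entry is the medicine service."""
    by_admission = defaultdict(list)
    for row in services:
        by_admission[row["hadm_id"]].append(row)
    med_only = []
    for hadm_id, rows in sorted(by_admission.items()):
        # Exclusive medicine care means exactly one entry, and it is medicine.
        if len(rows) == 1 and rows[0]["curr_service"] == medicine_code:
            med_only.append(rows[0])
    return med_only

# Hypothetical toy rows, not real MIMIC-III data.
services = [
    {"hadm_id": 100, "curr_service": "MED"},
    {"hadm_id": 101, "curr_service": "SURG"},
    {"hadm_id": 101, "curr_service": "MED"},
    {"hadm_id": 102, "curr_service": "CMED"},
]
med_service_only = build_med_service_only(services)
```

Admission 101 is dropped because it changed service during the stay, which leaves one row per retained admission.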
The transfers table documents every change in a patient’s location during their hospital admission, including exact bed assignments and timestamp data. A new table df can be created by performing a left join between transfers and med_service_only. In the resulting table, rows pertaining to the population of interest (i.e. medicine patients who incurred a MICU stay at some point during their admission) will have data corresponding to both the left (transfers) and right (med_service_only) tables. Rows pertaining to all other patients will only have data from the transfers table. We further subdivide this table into inboarders (which contains rows pertaining to non-MICU patients occupying beds in the MICU) and df5 (which contains rows pertaining to our population of interest).
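The join and the subsequent split can be sketched without dependencies as follows; in Pandas the equivalent operation is a `merge` with `how='left'`. The column names `hadm_id`, `curr_careunit`, and `curr_service`, the added `service` field, and the toy rows are assumptions made for illustration.

```python
def left_join(transfers, med_service_only, key="hadm_id"):
    """Left-join transfers to med_service_only; unmatched rows get service=None."""
    service_by_key = {row[key]: row["curr_service"] for row in med_service_only}
    return [dict(row, service=service_by_key.get(row[key])) for row in transfers]

# Hypothetical toy rows; 'curr_careunit' marks the physical unit of the bed.
transfers = [
    {"hadm_id": 100, "curr_careunit": "MICU"},
    {"hadm_id": 100, "curr_careunit": "SICU"},
    {"hadm_id": 101, "curr_careunit": "MICU"},
    {"hadm_id": 102, "curr_careunit": "CSRU"},
]
med_service_only = [{"hadm_id": 100, "curr_service": "MED"}]

df = left_join(transfers, med_service_only)
# Population of interest: rows with data from both tables.
df5 = [r for r in df if r["service"] is not None]
# Inboarders: patients of other services occupying a MICU bed.
inboarders = [r for r in df
              if r["service"] is None and r["curr_careunit"] == "MICU"]
```

Every transfers row survives the join, and the presence or absence of right-hand data is what separates the population of interest from everyone else.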
Looping through each row in df5, we identify rows in inboarders that represent a MICU bed occupied by a non-MICU patient at the time a MICU patient began their ICU stay. We also determine whether the new MICU patient was assigned a bed outside the geographic confines of the MICU, in which case they were classified as a boarder. Lastly, a count of the total number of patients being cared for by the MICU team is generated and added to each row of df5. These variables allow for calculation of the number of remaining MICU beds through the formula:
$$ \text{Remaining Beds} = (\text{MICU Capacity} - \text{No. of Inboarders}) - (\text{Team Census} - \text{No. of Boarders}) $$
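The formula translates directly into a small function. The argument names and the numbers in the example, including the 16-bed capacity, are hypothetical.

```python
def remaining_beds(micu_capacity, n_inboarders, team_census, n_boarders):
    """Remaining MICU beds at the time of a new patient's intake."""
    # Beds not taken by patients of other ICU teams.
    beds_open_to_team = micu_capacity - n_inboarders
    # MICU team patients who are physically located in the MICU.
    team_patients_in_micu = team_census - n_boarders
    return beds_open_to_team - team_patients_in_micu

# Hypothetical numbers: 16-bed unit, 1 inboarder, team census of 17, 4 boarders.
example = remaining_beds(16, 1, 17, 4)
```

In this example the team census (17) exceeds the unit's capacity (16), yet two MICU beds are free because four of the team's patients are boarding elsewhere. This is the situation that decouples the team census from the number of remaining beds.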
Death during ICU stay was determined a priori to be our primary outcome of interest. We identified a number of instances in the dataset where death occurred within minutes or hours of discharge from the ICU. This was most likely due to a combination of expected deaths (subjects transitioned to comfort-focused care who were transferred out of the ICU shortly prior to death), unexpected deaths, and minor time discrepancies inherent to large datasets that include administrative details. Prior to data analysis, it was decided that our preferred definition of death during ICU stay would include deaths occurring within 24 h of leaving the ICU.
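The outcome definition can be expressed as a short predicate. This is a minimal sketch, assuming ICU admission, ICU discharge, and death times are available as `datetime` values; the function name, argument names, and example timestamps are hypothetical.

```python
from datetime import datetime, timedelta

def died_during_icu_stay(icu_intime, icu_outtime, death_time,
                         grace=timedelta(hours=24)):
    """True if death occurred in the ICU or within `grace` of leaving it."""
    if death_time is None:
        return False  # no recorded death
    return icu_intime <= death_time <= icu_outtime + grace

# Hypothetical ICU stay used to exercise the definition.
intime = datetime(2008, 5, 1, 8, 0)
outtime = datetime(2008, 5, 4, 14, 0)
```

A death recorded six hours after ICU discharge counts toward the primary outcome under this definition, whereas one recorded two days later does not.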