INTRODUCTION

Delirium, an acute confusional state, is a common, morbid, and costly geriatric syndrome.1 It occurs in 30–50 % of medical patients and 27–51 % of orthopedic surgical patients, with the highest rates observed among patients in the intensive care unit.1 Delirium after major surgery is associated with longer hospitalization,2 more nosocomial complications,3 and higher rates of discharge to nursing homes.4 , 5 Moreover, patients with post-surgical delirium have an increased risk of cognitive and functional decline, incident dementia, and a 2- to 20-fold increased risk of mortality,2 , 6 with hospital mortality rates of 25–33 %.5 , 7 Taken together, the estimated annual health care costs attributable to delirium range upwards of $182 billion (U.S. dollars).8

Determining which episodes of delirium are likely to lead to poor clinical outcomes has remained a major challenge. To date, clinicians and investigators have characterized delirium by its intensity at a single timepoint9–14 or its total duration4 in an effort to quantify “severity.” However, previous work has not quantified the severity of an entire episode of delirium (such as across an entire hospitalization), which is critical for advancing our understanding of the influence of delirium on clinical outcomes. Such an exploration requires expanding the concept of delirium severity beyond a single instantaneous rating or a count of days with delirium.

Thus, the present study was designed to better quantify the severity of an entire episode of delirium during hospitalization across multiple (and combined) domains, including delirium intensity, delirium duration, and change in cognitive function over time. Our aim was to identify the ‘best’ measures of delirium episode severity by examining the predictive validity of multiple delirium episode severity measures for 30- and 90-day post-hospital outcomes in two distinct cohorts of older adults. We hypothesized that a combination measure of delirium episode severity that incorporated both intensity and duration would improve prediction of adverse post-hospital outcomes relative to any single instantaneous delirium severity measure.

METHODS

Study Samples

Two prospective cohort studies were examined: the Successful AGing after Elective Surgery (SAGES) Study (surgical) and Project Recovery (medical). The two samples were not combined because of baseline differences and in order to provide cross-validation for predictive validity. The ongoing SAGES Study enrolled 566 patients aged ≥70 years scheduled for elective major noncardiac surgery, including: cervical or lumbar laminectomy, total hip or knee replacement, open abdominal aortic aneurysm repair, lower extremity vascular bypass, or colectomy. Participants were recruited from June 2010 to August 2013 from two Harvard Medical School-affiliated hospitals. Major inclusion and exclusion criteria have been previously published.15 Briefly, 1052 patients were screened for eligibility based on medical record review (age ≥70 years; scheduled for major orthopedic, vascular, or general surgical procedures; having general or regional anesthesia; projected hospital stay ≥2 days). Among these patients, 318 declined an interview, 163 were ineligible (residing within 50 miles, evidence of dementia, rescheduled surgery, prior hospitalization, severe blindness or deafness), and 5 were eligible but refused participation, resulting in a final sample of 566 patients (eFigure 1).16

Project Recovery enrolled patients aged ≥70 years, who were admitted to the medicine service at Yale-New Haven Hospital from March 1995 to March 1998. Sample inclusion and exclusion criteria have been reported.8 , 17 , 18 Briefly, 2434 patients were eligible based on: age ≥70 years and no delirium upon admission, but with medium to high risk for delirium. Among these patients, 1265 were excluded given their inability to complete interviews (n = 298), coma or terminal illness (n = 69), hospital stay <48 h (n = 219), previous enrollment (n = 324), or other reasons (n = 355). Lastly, 250 patients declined enrollment, yielding a final sample of 919 patients (eFigure 2). This reflects a combined cohort, which has been previously examined.10 , 18 , 19

Clinical outcomes were assessed by trained full-time clinical research associates (e.g., chart abstractors and interviewers) who were blinded to the patients’ delirium status on the Confusion Assessment Method (CAM; described below). A separate group of trained full-time clinical research associates conducted in-hospital CAM assessments. For SAGES, informed consent for study participation was obtained from all subjects according to procedures approved by the institutional review boards of Beth Israel Deaconess Medical Center and Brigham and Women’s Hospital, the two surgical sites, and Hebrew SeniorLife, the study coordinating center, all located in Boston, MA. For Project Recovery, informed consent was obtained from the patients or a proxy for patients with significant cognitive impairment, as approved by the Yale University Institutional Review Board.

Delirium Assessment

For both studies, delirium was determined using the CAM,20 which was assessed daily throughout hospitalization. The CAM was scored after a 10–15 min structured interview with cognitive testing (of attention, orientation, memory) conducted by trained study staff. Delirium was present if the patient had an acute onset of change or fluctuation in mental status, inattention, and either disorganized thinking or altered level of consciousness. Study interviewers underwent intensive training and standardization.16 , 21

Delirium Episode Severity Measures

Individual Measures: We chose to examine different types of measures based on evidence in the literature. For example, previous studies have examined delirium intensity, persistence, or duration as severity indicators.1 , 4 Other studies examined delirium at discharge and cognitive change as severity indicators.22 , 23 Thus, we chose to examine nine individual measures of delirium episode severity, classified into four groups: 1) measures requiring delirium intensity only (based on the 10-item CAM-S long form), 2) measures requiring delirium intensity and duration, 3) measures requiring delirium diagnosis and duration, including delirium at discharge, and 4) a measure of cognitive change. We considered two measures that require delirium intensity (peak CAM-S score and mean CAM-S score), along with three measures that require delirium intensity plus delirium duration (sum of all CAM-S scores; sum of all CAM-S scores, only on delirium days; and peak CAM-S score × delirium days). Peak CAM-S × delirium days was considered since it explicitly considers both the intensity and duration of delirium, whereas the sum of all CAM-S scores implicitly considers intensity and duration without including number of delirium days in its definition. Three measures required delirium diagnosis (without intensity) and delirium duration: total number of delirium days, percentage of delirium days, and delirium at discharge. The final delirium episode severity measure captured change in cognition. This was computed, in Project Recovery only, as the absolute difference between the highest and lowest Mini-Mental State Examination scores (instrument purchased from Psychological Assessment Resources). We considered the entire episode of delirium for each patient as our unit of analysis; thus, we did not treat each delirium day as an independent time point. eTable 1 contains more information on these delirium episode severity measures.
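To make the definitions above concrete, the following is a minimal Python sketch of how the nine measures might be computed from one patient's daily assessments. It is an illustration only, not the study's code (the analyses were run in SAS and Stata). The names `HospitalDay`, `episode_severity`, and `cognitive_change` are hypothetical, and two points are assumptions: that "delirium at discharge" means CAM-positive on the last assessed day, and that the mean and percentage are taken over all assessed days.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Union


@dataclass
class HospitalDay:
    """One daily assessment (hypothetical record layout)."""
    cam_s: int       # CAM-S long-form score that day (higher = more intense)
    delirium: bool   # True if the CAM was positive for delirium that day


def episode_severity(days: List[HospitalDay]) -> Dict[str, Union[int, float, str]]:
    """Summarize one hospitalization as a single delirium episode."""
    if not days:
        raise ValueError("at least one assessed hospital day is required")
    scores = [d.cam_s for d in days]
    delirium_days = sum(1 for d in days if d.delirium)
    peak = max(scores)

    # Three-level discharge status; "discharge" = last assessed day (assumption).
    if delirium_days == 0:
        discharge_status = "no delirium"
    elif days[-1].delirium:
        discharge_status = "delirium at discharge"
    else:
        discharge_status = "delirium not at discharge"

    return {
        # Intensity only
        "peak_cam_s": peak,
        "mean_cam_s": sum(scores) / len(scores),
        # Intensity plus duration
        "sum_cam_s": sum(scores),
        "sum_cam_s_delirium_days": sum(d.cam_s for d in days if d.delirium),
        "peak_x_delirium_days": peak * delirium_days,
        # Diagnosis plus duration
        "delirium_days": delirium_days,
        "pct_delirium_days": 100.0 * delirium_days / len(days),
        "delirium_at_discharge": discharge_status,
    }


def cognitive_change(mmse_scores: Sequence[int]) -> int:
    """Absolute difference between the highest and lowest MMSE score."""
    if not mmse_scores:
        raise ValueError("at least one MMSE score is required")
    return max(mmse_scores) - min(mmse_scores)


# Invented example: a 4-day stay with delirium on days 2 and 3.
stay = [HospitalDay(2, False), HospitalDay(8, True),
        HospitalDay(5, True), HospitalDay(1, False)]
summary = episode_severity(stay)
```

In this invented stay, the sum of all CAM-S scores (16) happens to equal peak CAM-S × delirium days (8 × 2 = 16), whereas the sum restricted to delirium days is 13. This illustrates how the intensity-plus-duration measures can agree or diverge depending on the scores recorded on non-delirium days.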

Combination Measures: Two combination measures were considered to see whether predictive validity could be improved by considering several individual measures simultaneously (see Online-only Supplement for details).

Association with Clinical Outcomes

We examined the associations of the nine delirium episode severity measures and two combination measures with relevant 30- and 90-day post-hospital outcomes in Project Recovery, where collection of clinical outcomes has been completed. These 30- and 90-day outcomes included: death, nursing home placement, and readmission. Information on death was obtained from medical records, the National Death Index, Social Security and Medicare Part A databases, and death certificates.8 , 24 Nursing home placement was determined from Medicare Part A information. Hospital readmission was determined from Medicare Part A and medical record review. Since participants who die can no longer experience the other outcomes, two hierarchical outcomes were considered to avoid inferential errors: 1) nursing home placement or death, and 2) readmission or death. We had complete information (i.e., no missing data) for our outcome measures.

Since SAGES data collection is ongoing, we investigated two currently available 30-day outcomes: nursing home placement and readmission. Only one patient died within 30 days post-discharge; this patient was included in these outcome groups for all analyses.

Statistical Analysis

We conducted our analysis in two stages. First, we examined the association of each of the nine delirium episode severity measures with 30- and 90-day clinical outcomes. To aid in interpretation of the post-hospital outcomes, we categorized all delirium episode severity measures, except for delirium at discharge, into four categories defined by sample distribution quartiles and labeled: none, low, moderate, high. Patients without delirium were included in the analyses, and may have contributed severity points (i.e., not all patients without delirium had 0 points). Delirium at discharge was classified as: no delirium, delirium not at discharge, and delirium at discharge. The association of each delirium episode severity measure with each outcome was estimated using a generalized linear model with a Poisson error distribution for binary outcomes (death, nursing home residence, and readmission). Since our goal was to develop a relative ranking to compare delirium severity measures, and not to develop a predictive model for clinical outcomes following delirium, we adjusted parsimoniously, for age, sex, and race only, in all models; thus, we did not control for every possible contributor to the outcomes (e.g., Charlson comorbidity index).
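As a sketch of the model form described above, and under two assumptions not stated in the text (a log link, which is the usual choice for a Poisson model, and indicator coding with “none” as the reference category), the model for a binary outcome $Y_i$ in patient $i$ could be written as follows. The covariate coding shown (a single term each for sex and race) is illustrative only.

```latex
\log \mathrm{E}[Y_i \mid \mathbf{x}_i]
  = \beta_0
  + \beta_1\,\mathrm{low}_i
  + \beta_2\,\mathrm{moderate}_i
  + \beta_3\,\mathrm{high}_i
  + \gamma_1\,\mathrm{age}_i
  + \gamma_2\,\mathrm{sex}_i
  + \gamma_3\,\mathrm{race}_i ,
\qquad
\mathrm{RR}_{\text{high vs. none}} = e^{\beta_3}
```

Under this form, each exponentiated coefficient $e^{\beta_k}$ is read as the adjusted relative risk of the outcome for that severity category compared with the reference category. A graded increase across $e^{\beta_1}$, $e^{\beta_2}$, and $e^{\beta_3}$ would correspond to a dose–response pattern.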

Two types of c-statistics were considered to examine model fit for the categorical and continuous measures of delirium episode severity. The c-statistic is widely used as a measure of predictive accuracy. It indicates how well predicted values agree with observed values and also how well the model discriminates between observations at different levels of an outcome. The c-statistic can be thought of as the probability that, of two people chosen at random from the sample, one with a lower value on the risk factor (delirium episode severity) and the other with a higher value, the person with the higher value on the risk factor has a greater value on the outcome variable. We did not anticipate high values for our c-statistics since we did not develop explanatory models intended to maximize predictive ability for the clinical outcomes; instead we used the c-statistic as a relative metric to compare the nine delirium episode severity measures to each other. Since Project Recovery was an intervention trial, we conducted additional analyses to evaluate the potential influence of the intervention on these relationships.
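The pairwise interpretation above can be illustrated with a short Python sketch. This is not the study's computation: the reported c-statistics came from adjusted models fit in SAS and Stata, whereas this sketch ranks patients on a raw severity score. The function name `c_statistic` is hypothetical. For a binary outcome, it uses the common formulation that compares every (event, non-event) pair and counts ties in severity as half-concordant, which is an assumption about tie handling.

```python
from itertools import product
from typing import Sequence


def c_statistic(severity: Sequence[float], outcome: Sequence[int]) -> float:
    """Concordance (c-statistic) of a severity score for a binary outcome.

    Returns the proportion of (event, non-event) pairs in which the patient
    who had the event also had the higher severity score; ties count 0.5.
    """
    if len(severity) != len(outcome):
        raise ValueError("severity and outcome must have the same length")
    events = [s for s, y in zip(severity, outcome) if y == 1]
    non_events = [s for s, y in zip(severity, outcome) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")

    concordant = 0.0
    for e, n in product(events, non_events):
        if e > n:
            concordant += 1.0   # higher severity in the patient with the event
        elif e == n:
            concordant += 0.5   # tie in severity (assumed half credit)
    return concordant / (len(events) * len(non_events))


# Invented data: five patients, two of whom had the outcome.
scores = [0, 1, 2, 3, 3]
had_outcome = [0, 0, 1, 0, 1]
c = c_statistic(scores, had_outcome)   # 4.5 concordant of 6 pairs = 0.75
```

A value of 0.5 indicates no discrimination and 1.0 indicates perfect discrimination, so comparing this quantity across severity measures for the same outcome gives a relative ranking of the kind used in this study.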

Additional Analyses

Additional analyses aimed to explore the importance of delirium intensity versus duration. We conducted a stratified analysis of the delirium episode severity measure ‘peak CAM-S × delirium days’ to determine whether a more intense delirium lasting a single day was associated with poorer clinical outcomes than a less intense delirium lasting ≥2 days (see Online-only Supplement for details).

Analyses were conducted using SAS version 9.3 (SAS Institute, Cary, NC) and Stata version 13 (StataCorp, College Station, TX).

RESULTS

Sample characteristics of the patients are reported in Table 1. The Project Recovery and SAGES samples included older adults (mean age, 80 and 77, respectively) with fewer men than women (40 and 42 % men, respectively). A greater proportion of Project Recovery patients had a Charlson comorbidity index of ≥2 relative to SAGES patients (70 vs. 30 %, respectively). Incident delirium during hospitalization was higher in SAGES than Project Recovery patients (24 vs. 13 %, respectively).

Table 1 Sample Characteristics of the Project Recovery and SAGES Studies

In Table 2, we report the means and proportions of the nine delirium episode severity measures. The average levels in Project Recovery and SAGES were similar across all measures.

Table 2 Means and Proportions of Nine Delirium Episode Severity Measures

In Table 3, we report the association of the nine delirium episode severity measures with 30-day outcomes in Project Recovery, including death, nursing home residence, and readmission. The risk of death increased across each severity category for: peak CAM-S, mean CAM-S, sum of all CAM-S scores, and delirium at discharge. An increased risk for hospital readmission by delirium episode severity category was observed for: sum of all CAM-S scores, peak CAM-S, mean CAM-S, and delirium at discharge. For every measure, the intervention did not significantly influence the relationship between delirium severity and the outcomes; thus, this variable was not included in our analyses. As anticipated, the associations between the delirium episode severity measures and the 90-day outcomes (eTable 2) were weaker than those with the 30-day outcomes (Table 3).

Table 3 Association of Nine Delirium Episode Severity Measures with Three 30-Day Outcomes (Death, Nursing Home Residence, Readmission) in Project Recovery

Results at 30 days in the SAGES study (Table 4) were generally similar to those in Project Recovery (Table 3). For instance, the risk of readmission increased across each severity category for: peak CAM-S, mean CAM-S, and sum of all CAM-S scores.

Table 4 Association of Eight Delirium Episode Severity Measures with Two 30-Day Outcomes (Nursing Home Residence and Readmission) in the Successful Aging after Elective Surgery (SAGES) Study

Table 5 lists the components of the: 1) full combination measure that includes four delirium episode severity measures (sum of all CAM-S scores, delirium at discharge, total number of delirium days, and change in cognition), and 2) alternate combination measure, which is similar to the full combination measure except for the absence of sum of all CAM-S scores. Sum of all CAM-S scores yielded the highest c-statistic across all 30-day outcomes, with one exception (nursing home residence in SAGES). Since peak and mean CAM-S were highly correlated with the sum of all CAM-S scores (rs = 0.8–1.0; eTable 3), they were not included in the combination measures. None of the combination measures improved predictive validity compared to the individual measures (eTable 4).

Table 5 C-statistic of Individual Delirium Episode Severity Measures with 30-Day Outcomes in Project Recovery and SAGES

DISCUSSION

Our findings across every delirium episode severity measure indicate a “dose–response” relationship between delirium severity and clinical outcomes; that is, more severe delirium was associated with worse outcomes than mild delirium on every measure. In particular, episodes of delirium with the highest intensity and longest duration combined were associated with the worst clinical outcomes (death, nursing home placement, or readmission). The ability to quantify severity of an entire episode of delirium rather than severity at a single point in time allows us to better capture the full clinical impact of delirium. Although all nine delirium episode severity measures (based on ratings of delirium intensity, duration, or cognitive change) were predictive of 30- and 90-day outcomes, the measure that summed the daily CAM-S intensity scores over the entire hospitalization (sum of all CAM-S scores) demonstrated the strongest association with these outcomes. It is important to note, however, that the differences between the sum of all CAM-S and other CAM-S measures (peak CAM-S, mean CAM-S) were relatively small. In situations where a delirium intensity measure is unavailable, a combination of other measures (e.g., delirium at discharge, delirium days, and change in cognition, as included in the alternate combination score) represents the next best delirium episode severity measure. Moreover, our results suggest that considering both delirium intensity and duration is important to fully understanding the downstream effects of an entire episode of delirium. These findings hold important implications for clinicians and researchers interested in the care of delirious patients, since addressing both intensity and duration of delirium may be necessary to improve its poor prognosis.

The predictive validity of the CAM-S for hospital and post-hospital outcomes has been previously reported.10 The present study extends this work substantially by: 1) considering the CAM-S in multiple ways; 2) creating additional delirium episode severity measures that include delirium diagnosis, duration, and cognitive change; 3) examining combination measures of severity that take into account the best-performing individual delirium episode severity measures; and 4) investigating the association between these individual and combination delirium episode severity measures with both shorter term (30-day) and longer term (90-day) outcomes. Other important strengths of the study are the inclusion of two large datasets from distinct clinical settings (a medical and surgical sample) of older adults with rich data collection that allowed for quantifying delirium episode severity.

One of our aims was to examine the relative contributions of delirium intensity or duration as predictors of 30- and 90-day outcomes. Both appear to be important, and combined may best reflect the severity of the pathologic insult or precipitating factors associated with delirium. However, the intensity and duration measures were highly correlated and very few patients exhibited high delirium intensity with few delirium days, or conversely, low delirium intensity with many delirium days (eFigures 3 and 4). Thus, we were unable to conduct the appropriate statistical analyses to evaluate their independent contributions in this study. While our results demonstrate that measures considering both delirium intensity and duration are superior for predicting the clinical impact of an episode of delirium, we were unable to evaluate whether intensity or duration considered separately was more important. Future work is needed to address this important area. While the measures based on daily measurements of CAM-S (required to capture both the intensity and duration of delirium) performed in a superior fashion, we acknowledge the additional time and expense to obtain sequential CAM-S ratings may limit the broad applicability of this approach. Availability of brief approaches to measure CAM-S (<3–5 min)25 , 26 will enhance this process. Moreover, such measures will provide the best insight into identifying patients at highest risk of poor clinical outcomes, who could benefit most from intervention. Under circumstances with limited time and resources, the use of measures that consider solely delirium intensity or duration is the next preferred measure for capturing the severity of an episode of delirium.

This study provides proof-of-concept for the importance of both delirium intensity and duration combined. Our findings align with and extend previous work that has linked delirium severity, delirium duration, and outcomes. A study of older hip fracture patients undergoing surgery reported a higher risk of death and nursing home placement at 6 months among patients with severe relative to mild delirium.27 Several studies have documented that persistent delirium (defined, depending on the study, as delirium 1 or 6 months following discharge) was associated with poor outcomes, including increased nursing home placement, functional decline, and mortality.4 , 28 , 29 Our findings extend the previous work by combining delirium intensity and duration measurement within the context of an entire episode of delirium, demonstrating predictive validity for both 30- and 90-day outcomes, and illustrating a graded dose–response relationship. Future work will be needed to describe the characteristics of patients in the highest levels of the delirium episode severity measures, including how many have subsyndromal delirium and baseline cognitive impairment or dementia.

Several caveats about this study are important to mention. First, our analyses were limited to the examination of a single intensity measure (CAM-S). It will be important to compare and examine other delirium intensity measures, such as the Memorial Delirium Assessment Scale, Delirium Rating Scale, and Delirium Index in future work. Second, the characteristics of both study populations may not be generalizable to all older adults. Future validation of our findings in other samples and settings will be important. Third, while the sum of all CAM-S scores consistently demonstrated the highest ranking by c-statistics across all of the clinical outcomes examined (with one exception), we acknowledge that the sum, peak, and mean CAM-S measures are highly correlated and their respective results are very close. One of our major points is that all of these measures require daily CAM-S scores, and thus, daily measurement of delirium severity offers the optimal approach to measuring delirium severity, when possible.

This work is innovative in advancing the conceptualization of delirium that is damaging or harmful in terms of long-term outcomes. The finding that both the intensity and duration of delirium contribute to poor outcomes may help both to advance future pathophysiologic studies and to focus future interventions. Our findings suggest that interventions that address both the intensity of delirium and its duration will be key to improving clinical outcomes. Importantly, interventions that reduce intensity but increase duration (e.g., antipsychotics) may not have a positive impact on clinical outcomes.30 Ultimately, delirium episode severity measures, such as the sum of all CAM-S scores, should prove useful for monitoring delirium clinically and for providing a quantifiable dose–response outcome measure for intervention trials and for pathophysiologic investigations, all vital steps to advance our understanding of this common, morbid, and costly geriatric syndrome.