Introduction

To improve the quality of care in the intensive care unit (ICU), it is essential to measure the effectiveness of new and existing ICU interventions on long-term patient outcomes [1, 2]. Measuring long-term outcomes, and the patient and ICU characteristics (“exposures”) that predict them, presents critical care investigators with unique challenges. Although general reviews exist in the epidemiological literature [3, 4], there is no systematic guidance on exposure and outcome measurement in the ICU. Such guidance may help improve the quality and comparability of outcomes research in critical care.

In this report we use examples from the critical care literature to describe measurement issues related to outcomes studies. As a framework for this discussion we focus on the three key elements of study design: subjects, outcomes and exposures, and time [5]. Using this framework we review principles of measurement, describe relevant challenges, and suggest recommendations for future critical care long-term outcomes research.

Subjects

Principles

Inclusion and exclusion criteria are used to create a patient population that is both relevant to the research question and comparable with prior studies. Extensive inclusion and exclusion criteria may enhance a study’s feasibility and internal validity by focusing precisely on a specific research question. However, such restrictive criteria also may bias the exposure-outcome relationship, reduce comparability with prior research, and limit generalizability. These considerations must be carefully weighed in designing and evaluating a study’s eligibility criteria.

Challenges and recommendations

Patient heterogeneity

Patient characteristics vary by ICU and hospital [6]. Thus, understanding these characteristics is important when interpreting and comparing the results of outcomes studies. To reduce heterogeneity between study populations, we recommend harmonizing patient eligibility criteria across similar types of long-term outcomes studies. For example, consensus on the inclusion or exclusion of patients in certain subgroups (e.g., neurological trauma, length of stay <24 h) would improve the ability to compare results across studies. Furthermore, to improve the description of this heterogeneity, we recommend developing guidelines for reporting patient characteristics, similar to those created for meta-analyses of observational studies [7].
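Harmonized eligibility criteria are easiest to compare across studies when they are written as explicit, executable rules rather than prose. The following Python sketch is illustrative only: the record fields (`icu_los_hours`, `neurological_trauma`) and the exclusion labels are hypothetical, and the 24 h threshold simply reuses the example above. It also tallies exclusion reasons, which supports the reporting of patient characteristics recommended above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Optional, Tuple


# Hypothetical record layout; a real study would map its own data dictionary here.
@dataclass
class IcuAdmission:
    patient_id: str
    icu_los_hours: float
    neurological_trauma: bool


# Hypothetical consensus threshold, reusing the "<24 h" example from the text.
MIN_ICU_LOS_HOURS = 24.0


def exclusion_reason(adm: IcuAdmission) -> Optional[str]:
    """Return the first exclusion criterion met, or None if the admission is eligible."""
    if adm.icu_los_hours < MIN_ICU_LOS_HOURS:
        return "icu_los_lt_24h"
    if adm.neurological_trauma:
        return "neurological_trauma"
    return None


def apply_criteria(admissions: List[IcuAdmission]) -> Tuple[List[IcuAdmission], Counter]:
    """Split admissions into an eligible list and a tally of exclusion reasons."""
    eligible: List[IcuAdmission] = []
    reasons: Counter = Counter()
    for adm in admissions:
        reason = exclusion_reason(adm)
        if reason is None:
            eligible.append(adm)
        else:
            reasons[reason] += 1
    return eligible, reasons


cohort = [
    IcuAdmission("p1", icu_los_hours=72.0, neurological_trauma=False),
    IcuAdmission("p2", icu_los_hours=12.0, neurological_trauma=False),
    IcuAdmission("p3", icu_los_hours=96.0, neurological_trauma=True),
]
eligible, reasons = apply_criteria(cohort)
print([a.patient_id for a in eligible])  # ['p1']
print(dict(reasons))
```

Because each admission is assigned only its first matching exclusion reason, the order of the checks determines the tallies; a shared protocol would need to fix that order as well.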

Even within specific clinical syndromes patients are not homogeneous. For example, there is heterogeneity in the underlying pathology of patients meeting the clinical definition of acute respiratory distress syndrome (ARDS) [8]. Such heterogeneity may create variation in the effectiveness of specific therapies and reduce the comparability of findings between outcomes studies. Consequently the refinement and standardization of clinical definitions should continue [9, 10] in order to improve the comparability of study populations in outcomes research.

Selection bias

Survivors of critical illness frequently have significant morbidity (e.g., cognitive or physical impairment), which may give rise to selection bias in outcomes studies. This bias can occur when a study excludes patients who are too impaired to provide information, particularly for patient-reported outcomes such as quality of life (QOL). In this situation, if outcomes of the study population are compared to an external standard (e.g., normal QOL values in an age- and sex-matched population), excluding these patients biases the study toward more favorable outcomes. Furthermore, when such exclusions are associated with the exposure of interest, the estimated exposure-outcome relationship may be severely biased in either a positive or negative direction [11].

Selection bias is a fundamental threat to the validity of a study and cannot be removed by statistical analysis. Consequently, we recommend developing robust methods for measuring outcomes in impaired ICU survivors. For example, the Sickness Impact Profile (SIP) survey for QOL assessment [12] measures patient behaviors, which proxies can observe directly, rather than patients’ attitudes or opinions. Alternatively, a brief instrument may allow more complete ascertainment of outcomes in ICU survivors who cannot tolerate lengthy evaluations. For example, the EuroQol-5D is a simple five-question QOL survey recommended for use in critical care. It may be more feasible than lengthier instruments, such as the Medical Outcomes Study Short-Form 36 Item (SF-36) and the SIP, in studies where the potential for patient impairment and selection bias is great [13].

High mortality and loss to follow-up after ICU discharge also make long-term outcomes studies vulnerable to selection bias. This bias occurs when the subjects evaluated at follow-up are not representative of the original study population. For example, a study of long-term neuromuscular dysfunction in ICU survivors identified 195 eligible patients; only 86 were alive at follow-up, 47 of these lived close to the study site, and 22 agreed to participate, a follow-up rate of 26% (22/86) among surviving patients [14]. Many critical care studies encounter this problem [15, 16]. Among ICU survivors, the sickest or most disabled patients may be difficult to reach [17]. Conversely, survivors who regain mobility may move away from the study site or return to work and be unavailable for follow-up. Thus the patients remaining in the study may differ substantially from the original cohort of ICU survivors.
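The attrition described above can be tabulated stage by stage, so that retention is reported at each step rather than only as a single final rate. This Python sketch uses the counts reported for [14]; the function name and the output layout are our own choices.

```python
from typing import List, Tuple


def attrition_cascade(stages: List[Tuple[str, int]]) -> List[Tuple[str, int, float, float]]:
    """For each stage, return (name, n, % retained from previous stage, % of first stage)."""
    first = stages[0][1]
    previous = first
    rows = []
    for name, n in stages:
        rows.append((name, n, 100.0 * n / previous, 100.0 * n / first))
        previous = n
    return rows


# Counts reported for the neuromuscular follow-up study [14].
stages = [
    ("eligible", 195),
    ("alive at follow-up", 86),
    ("lived close to study site", 47),
    ("agreed to participate", 22),
]
for name, n, pct_prev, pct_first in attrition_cascade(stages):
    print(f"{name:<28}{n:>5}{pct_prev:>8.1f}%{pct_first:>8.1f}%")

# Follow-up rate among survivors: 22 of 86.
print(f"{100 * 22 / 86:.0f}%")  # 26%
```

The "% of first stage" column makes explicit that only about 11% of the originally eligible patients were evaluated, a figure that the survivor-based rate alone does not convey.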

Since loss to follow-up cannot be eliminated in long-term outcomes studies, we recommend that investigators clearly describe the magnitude of patient loss and the characteristics of those who were lost vs. those remaining in the study. Systematic collection and reporting of basic data on lost patients may suggest the magnitude and direction of the selection bias. Such data could include demographic and other baseline information. In addition, when patients can be contacted but are not able to participate, investigators could collect the reason for nonparticipation (e.g., too ill or moved away) and a brief description of the patient’s status at the time of contact (e.g., unable to perform usual activities).
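One common way to summarize how lost and retained patients differ at baseline is the standardized difference, which, unlike a p-value, does not depend on sample size. The sketch below assumes that convention (difference in means or proportions divided by a pooled standard deviation); the ages and proportions are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev
from typing import Sequence


def smd_continuous(retained: Sequence[float], lost: Sequence[float]) -> float:
    """Standardized mean difference using the average of the two sample variances."""
    pooled_sd = sqrt((stdev(retained) ** 2 + stdev(lost) ** 2) / 2)
    return (mean(retained) - mean(lost)) / pooled_sd


def smd_binary(p_retained: float, p_lost: float) -> float:
    """Standardized difference for a proportion (e.g., fraction with a given comorbidity)."""
    pooled_sd = sqrt((p_retained * (1 - p_retained) + p_lost * (1 - p_lost)) / 2)
    return (p_retained - p_lost) / pooled_sd


# Hypothetical baseline ages for patients retained vs. lost to follow-up.
age_retained = [50, 55, 60, 65]
age_lost = [60, 65, 70, 75]
print(round(smd_continuous(age_retained, age_lost), 2))  # -1.55
print(round(smd_binary(0.40, 0.60), 2))  # -0.41
```

Larger absolute values indicate greater imbalance; the sign shows the direction (here, lost patients were older), which is the information needed to judge the likely direction of selection bias.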

We also recommend that investigators develop and report methods for minimizing loss to follow-up [18], as has been done extensively in survey-based research [19, 20]. For example, Herridge et al. [17] described the frequency of, and effort required for, home visits to patients who could not travel to the hospital for follow-up. Extensive use of telephone and mail reminders, tracking of nonresponders, outreach teams that follow patients at home, and proxy respondents may also reduce loss to follow-up, but the effects of such measures have not been sufficiently studied in critical care [17, 18]. Newsletters, small tokens of appreciation (e.g., magnets or mugs with the study logo), and other incentives also merit further investigation [18]. Thus research on methods to maximize follow-up rates would benefit long-term outcomes research.

Outcomes and exposures

Principles

A classification of critical care outcomes is provided below (see also Table S5, Electronic Supplementary Material):

  • Medical outcomes

    • Survival (short- and long-term)

    • Surrogate markers for outcomes (e.g., organ dysfunction [21])

    • New medical diagnoses

    • Hospital readmission

  • Patient outcomes

    • Impairment and disability (e.g., pulmonary function, hearing impairment, swallowing dysfunction, neuromuscular dysfunction)

    • Functional status (physical, mental, neuropsychological, recovery)

    • Quality of life

  • Caregiver outcomes

    • Functional status of the caregivers (mental status, recovery)

    • Use of time and restriction of activities

  • Societal outcomes

    • Resource utilization and economic burden

    • Ethical and legal appropriateness

Interested readers can find a comprehensive review and evidence-based appraisal of patient outcome measures in existing publications [22, 23]. Detailed reviews of neuropsychological outcome measures have also been recently summarized elsewhere [24, 25]. Outcomes related to the perspectives of caregivers and society are relatively new to critical care research and address the broader impact of critical illness, such as the emotional burden on caregivers [26, 27] and the economic burden on society [28, 29].

Although outcomes have been well categorized [22, 23, 24, 25], a clear classification of ICU exposures has not been generally accepted. One relevant framework can be adapted from the “system factors” classification used for adverse-event reporting [30]. Within this classification there are seven types of factors contributing to adverse events: patient, provider, team, task, training, management, and organizational factors [31]. This classification can be simplified to three major categories in order to provide a more general classification of ICU exposures: patient-based exposures (patient system factors), clinical management exposures (provider, team, and task system factors), and ICU organizational exposures (training, management, and organizational system factors). This classification of critical care exposures is described in more detail below:

  • Patient-based exposures

    • Demographics (e.g., age, gender, race)

    • Comorbidity

    • ICU admission diagnosis

    • Severity of illness

  • Clinical management exposures

    • Medications

    • Mechanical ventilation technique and settings

    • Procedures (e.g., tracheotomy)

    • Other medical therapies (e.g., nutrition, blood products)

    • Other technological therapies (e.g., renal replacement therapy)

  • ICU organizational exposures

    • ICU physician staffing (e.g., intensivist)

    • Nurse-to-patient ratio

    • ICU and hospital volume

    • Hospital teaching status

    • Use of clinical protocols

    • Available technology

    • Teamwork factors
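One way to make this classification operational in a study data dictionary is to encode it as a lookup table, so that every collected variable is tagged with its category. The Python sketch below is illustrative; the string labels are our own shorthand for the list entries above, not a standard vocabulary.

```python
from enum import Enum


class ExposureCategory(Enum):
    PATIENT_BASED = "patient-based"
    CLINICAL_MANAGEMENT = "clinical management"
    ICU_ORGANIZATIONAL = "ICU organizational"


# The seven adverse-event "system factors" [30, 31] collapsed into three categories.
SYSTEM_FACTOR_TO_CATEGORY = {
    "patient": ExposureCategory.PATIENT_BASED,
    "provider": ExposureCategory.CLINICAL_MANAGEMENT,
    "team": ExposureCategory.CLINICAL_MANAGEMENT,
    "task": ExposureCategory.CLINICAL_MANAGEMENT,
    "training": ExposureCategory.ICU_ORGANIZATIONAL,
    "management": ExposureCategory.ICU_ORGANIZATIONAL,
    "organizational": ExposureCategory.ICU_ORGANIZATIONAL,
}

# Exposure types from the list above, grouped by category (labels are shorthand).
EXPOSURES_BY_CATEGORY = {
    ExposureCategory.PATIENT_BASED: {
        "demographics", "comorbidity", "ICU admission diagnosis", "severity of illness",
    },
    ExposureCategory.CLINICAL_MANAGEMENT: {
        "medications", "mechanical ventilation technique and settings", "procedures",
        "other medical therapies", "other technological therapies",
    },
    ExposureCategory.ICU_ORGANIZATIONAL: {
        "ICU physician staffing", "nurse-to-patient ratio", "ICU and hospital volume",
        "hospital teaching status", "use of clinical protocols", "available technology",
        "teamwork factors",
    },
}


def categorize(exposure: str) -> ExposureCategory:
    """Look up the category of a named exposure; unknown names raise KeyError."""
    for category, exposures in EXPOSURES_BY_CATEGORY.items():
        if exposure in exposures:
            return category
    raise KeyError(f"unclassified exposure: {exposure!r}")


print(categorize("nurse-to-patient ratio").value)  # ICU organizational
```

Raising an error for unlisted exposures forces each new variable to be classified explicitly rather than silently omitted.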

Selection of appropriate measurement instruments for study outcomes and exposures is an essential step in study design. This topic has been extensively reviewed in several comprehensive publications [22, 23, 24, 25]. Thus we highlight only two issues relevant to instrument selection in critical care. First, some exposures and outcomes (e.g., QOL) are measured quantitatively. For these measurements primary considerations for instrument selection include assessment of validity, reliability, responsiveness, and interpretability [32]. These measurement characteristics vary with the patient population. Consequently, where possible, instruments should be validated in ICU patients. Second, exposures or interventions that are directly modifiable (e.g., mechanical ventilator settings and drug dosage) should be measured in a way that reflects the proposed exposure-outcome relationship. For example, in measuring the dose of aminoglycoside antibiotics, the maximum dose (peak concentration) may be the primary determinant of treatment efficacy in critically ill patients, whereas the cumulative dose may be a more important determinant of toxic side effects [33, 34, 35, 36]. Thus the most appropriate measurement depends on the research question and existing knowledge regarding the exposure-outcome relationship.
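The distinction between peak and cumulative exposure can be made concrete with a small summary function. In this hypothetical Python sketch, doses are in arbitrary units and are not dosing guidance; the maximum single dose stands in for peak exposure, whereas a study of efficacy would ideally use measured peak serum concentrations.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DoseEvent:
    hours_from_admission: float
    amount: float  # arbitrary illustrative units


def dose_metrics(doses: List[DoseEvent]) -> Dict[str, float]:
    """Summarize one patient's dosing record in two ways."""
    return {
        # Stand-in for peak exposure when serum levels are unavailable (assumption).
        "max_single_dose": max(d.amount for d in doses),
        # Total exposure over the recorded period.
        "cumulative_dose": sum(d.amount for d in doses),
    }


# Two hypothetical patients: one large dose vs. several smaller doses.
patient_a = [DoseEvent(0, 10.0)]
patient_b = [DoseEvent(0, 4.0), DoseEvent(24, 4.0), DoseEvent(48, 4.0), DoseEvent(72, 4.0)]
a, b = dose_metrics(patient_a), dose_metrics(patient_b)
print(a["max_single_dose"] > b["max_single_dose"])  # True
print(a["cumulative_dose"] > b["cumulative_dose"])  # False
```

The two metrics rank these patients in opposite orders, which is why the choice of exposure summary must follow the hypothesized mechanism (efficacy vs. toxicity).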

Challenges and recommendations

Selecting outcomes and exposures

Research on long-term outcomes of ICU survivors is in its infancy, which makes it difficult to select exposures and outcomes for measurement in new studies. Although there is growing consensus regarding the importance of measuring certain outcomes (e.g., cognitive status [24, 25]), the significance of others (e.g., muscle weakness and wasting [37]) is still being explored. Similarly, the relationship of many exposures to long-term outcomes remains uncertain. For example, hypoxemia and hypotension have some impact on long-term cognitive status, but the magnitude and relevance of this effect are uncertain [38, 39].

Because collecting data on all potential exposures and outcomes is not feasible, investigators must select those of greatest importance. Randomized trials with short-term outcomes may provide evidence regarding the efficacy of an exposure, which can then be explored from a longer-term perspective in subsequent observational studies. For example, the PROWESS study [40] demonstrated a short-term survival benefit of activated protein C in severe sepsis. A subsequent study of longer-term outcomes demonstrated that patient severity of illness was an important factor affecting the survival benefit of this treatment [41].

Based on these challenges we make two recommendations. First, to define the exposures and outcomes of greatest importance we recommend developing consensus regarding a research agenda for outcomes studies. Long-term data collection poses unique challenges, and communication and collaboration between investigators conducting long-term outcome studies will facilitate collection of complementary information and sharing of knowledge regarding difficulties encountered. Second, once consensus is reached regarding measures of exposures and outcomes, routine measurement of those values in the ICU will enable clinicians to learn continuously about the effect of interventions in their ICU and assist investigators in conducting multicenter studies with larger sample sizes.

Validating and standardizing existing instruments

Validation of preexisting instruments in ICU patient populations is essential for accurate measurement. This process is already under way for certain ICU outcomes (e.g., QOL). A 1998 review found that only 3 of 64 ICU QOL studies used instruments with previously documented validity and reliability [32]. However, by 2002 methodological research on QOL measurement in ICU patients had increased such that a consensus conference recommended the SF-36 and EuroQol-5D as the most appropriate instruments for future research [13]. Following this example, we recommend (a) continued methodological research to evaluate the measurement characteristics of existing instruments in ICU patient populations and (b) continued consensus building regarding the most appropriate measurement instruments for additional outcomes and exposures in critical care research.

Custom-made instruments

As critical care investigators explore novel exposures and outcomes, appropriate measurement instruments may not exist. In these circumstances, developing custom-made data collection instruments demands considerable time, effort, and expertise, yet such instruments may receive little formal evaluation of their measurement characteristics. Thus we recommend that study protocols include evaluation of the measurement characteristics of custom-made instruments and institute quality control efforts to minimize measurement variability. Furthermore, using existing, validated instruments rather than custom-made ones enhances comparability between studies and helps build knowledge regarding exposure-outcome relationships. Consequently, we recommend that custom-made instruments not be used when existing, validated instruments are available.
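For a multi-item custom instrument, internal consistency is one measurement characteristic that can be checked from pilot data. The sketch below computes Cronbach's alpha, chosen here as a familiar example; it is not the only relevant property (test-retest reliability and validity need separate evaluation), and the pilot responses are invented.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(responses: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents x items matrix (sample variances throughout)."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    items = list(zip(*responses))  # transpose: one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum_item_var / total_var)


# Hypothetical pilot data: 4 respondents answering a 3-item custom scale.
pilot = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 5, 4],
    [1, 2, 1],
]
print(round(cronbach_alpha(pilot), 2))  # 0.95
```

In practice the pilot sample would be far larger than four respondents; the tiny matrix only keeps the arithmetic checkable by hand.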

Time

Principles

Relevant principles regarding the timing and frequency of measurement include consideration of four issues: (a) prospective vs. retrospective data collection, (b) frequency of measurement, (c) the “biologically active window,” and (d) any “lag period” effect.

First, retrospective data collection can be convenient and time efficient but is limited by the data contained within existing records. As described in the accompanying article [5], a prospective design is most useful when existing records do not contain the data of interest (e.g., ICU organizational characteristics), are likely to be inaccurate without use of specific measurement tools (e.g., delirium [42]), or are not collected at the appropriate frequency for research purposes (e.g., certain laboratory tests).

Second, certain discrete exposures (e.g., tracheotomy) and outcomes (e.g., survival) occur during a specific period of time (e.g., inpatient hospital stay) and may be measured at a single point after the time period elapses. Other exposures (e.g., gender) are inherent traits that can be measured at any single time point. However, time-varying exposures (e.g., level of sedation) and outcomes (e.g., QOL) require repeated measurement and statistical analysis that accounts for the nonindependence of these measurements [43] to accurately reflect the exposure-outcome relationship.
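A back-of-the-envelope calculation shows why the nonindependence of repeated measurements matters. Assuming every patient contributes the same number of measurements with a common within-patient correlation (an exchangeable structure, which is a simplification), the familiar design effect 1 + (m - 1)ρ shrinks the effective sample size. This is a planning heuristic only; the analysis itself would use methods such as generalized estimating equations or mixed-effects models.

```python
def design_effect(measurements_per_patient: int, icc: float) -> float:
    """Variance inflation for m equally correlated measurements per patient."""
    return 1 + (measurements_per_patient - 1) * icc


def effective_sample_size(n_patients: int, measurements_per_patient: int, icc: float) -> float:
    """Number of independent observations carrying the same information."""
    total = n_patients * measurements_per_patient
    return total / design_effect(measurements_per_patient, icc)


# Hypothetical: 100 patients, sedation level recorded 10 times each, within-patient ICC 0.5.
print(round(effective_sample_size(100, 10, 0.5), 1))  # 181.8
```

Treating the 1,000 sedation scores as independent would overstate the available information more than fivefold in this example, producing confidence intervals that are too narrow.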

Third, measurement of outcomes should occur within the “biologically active window” of the exposure [3]. Some exposures may have their effect over a short period, whereas others may be long lasting. For example, Herridge et al. [17] found that ARDS had a time-limited impact on pulmonary function with significant decrement at 3 months after ICU discharge, but sustained improvement towards normal by 6 months. On the other hand, a decrement in patients’ 6-min walk distance persisted throughout 12 months of follow-up. A single outcome measurement at 12 months therefore would detect an impact of ARDS on physical functional status but not on pulmonary function. Thus the biologically active window of an exposure should be considered in determining the timing of outcome assessments.
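The timing argument can be expressed as a check of which outcomes fall inside their assumed active window at a given assessment time. The window boundaries below are hypothetical approximations of the pattern described in [17] (pulmonary decrement improving toward normal by about 6 months, walk-distance decrement persisting through follow-up), not values reported there.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical windows (months after ICU discharge) during which a decrement is
# assumed detectable; None means the decrement persists beyond the end of follow-up.
Windows = Dict[str, Tuple[float, Optional[float]]]
ACTIVE_WINDOWS: Windows = {
    "pulmonary function": (0.0, 6.0),
    "6-min walk distance": (0.0, None),
}


def detectable_at(month: float, windows: Windows = ACTIVE_WINDOWS) -> List[str]:
    """Outcomes whose assumed window of effect includes the assessment month."""
    return [
        outcome
        for outcome, (start, end) in windows.items()
        if start <= month and (end is None or month < end)
    ]


for month in (3, 6, 12):
    print(month, detectable_at(month))
```

Under these assumed windows, a 3-month assessment captures both effects, whereas a 6- or 12-month assessment captures only the walk-distance decrement.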

Finally, outcome measurement must account for any “lag period” during which the impact of an exposure has not yet manifested [11]. For example, in studying the effect of caloric intake on nosocomial bloodstream infections, Rubinson et al. [44] reasoned that decreased caloric intake requires a lag of longer than 48 h before causing clinically detectable infection. Thus in their analysis the investigators did not consider the level of caloric intake within the 48 h prior to any infection. Without accounting for an appropriate lag period, an exposure-outcome relationship may be distorted [11].
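A lag period can be implemented by shifting the exposure window back from the outcome time. This Python sketch follows the 48 h lag described for [44], but the 7-day lookback, the daily intake records, and the use of a simple mean are hypothetical choices, not the investigators' actual analysis.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def mean_intake_before_event(
    intake: List[Tuple[datetime, float]],
    event_time: datetime,
    lag: timedelta = timedelta(hours=48),
    lookback: timedelta = timedelta(days=7),  # hypothetical exposure window length
) -> Optional[float]:
    """Mean recorded intake in [event - lag - lookback, event - lag); None if no records."""
    window_end = event_time - lag
    window_start = window_end - lookback
    values = [kcal for t, kcal in intake if window_start <= t < window_end]
    return sum(values) / len(values) if values else None


start = datetime(2024, 1, 1, 8, 0)
daily_kcal = [500, 600, 700, 800, 1500, 1600]  # hypothetical daily totals
intake = [(start + timedelta(days=i), float(k)) for i, k in enumerate(daily_kcal)]
infection = start + timedelta(days=5, hours=4)  # day 6 at 12:00

print(mean_intake_before_event(intake, infection))  # 650.0
print(mean_intake_before_event(intake, infection, lag=timedelta(0)))  # 950.0
```

Ignoring the lag raises the exposure estimate from 650 to 950 kcal in this toy example, because intake recorded shortly before the infection is pulled into the exposure window.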

Challenges and recommendations

Baseline data

Given the sudden nature of critical illness, it may be impossible to measure patient baseline characteristics directly. Consequently, investigators often do not attempt baseline measurements [5]. However, baseline status can be estimated retrospectively from ICU survivors or from patient proxies. Retrospective measurement from patients results in survival bias, since data cannot be obtained from patients who died in the ICU [11]. It also introduces recall bias, since patients may not accurately remember their status prior to critical illness [11]. Proxies can provide baseline data in a timely manner, which can reduce survival and recall bias. However, proxies may not always be available (resulting in missing data) and may not accurately report baseline status because of stress, infrequent contact with the patient, or perceptions of baseline status that differ from the patient’s [45, 46]. If errors in proxy-reported data are nondifferential, they will tend to bias results toward the null hypothesis of no exposure-outcome association.
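The direction of bias from nondifferential error can be illustrated for the simplest case: a binary exposure (e.g., a hypothetical proxy-reported "impaired at baseline" indicator) whose sensitivity and specificity do not depend on the outcome. The sketch computes the risk ratio that would be observed under such misclassification; all inputs are hypothetical.

```python
def observed_risk_ratio(
    prevalence: float,
    risk_exposed: float,
    risk_unexposed: float,
    sensitivity: float,
    specificity: float,
) -> float:
    """Risk ratio after nondifferential misclassification of a binary exposure."""
    p, r1, r0 = prevalence, risk_exposed, risk_unexposed
    # Classified as exposed: true positives plus false positives.
    tp, fp = sensitivity * p, (1 - specificity) * (1 - p)
    # Classified as unexposed: false negatives plus true negatives.
    fn, tn = (1 - sensitivity) * p, specificity * (1 - p)
    risk_classified_exposed = (tp * r1 + fp * r0) / (tp + fp)
    risk_classified_unexposed = (fn * r1 + tn * r0) / (fn + tn)
    return risk_classified_exposed / risk_classified_unexposed


# True risk ratio is 0.4 / 0.2 = 2.0.
print(round(observed_risk_ratio(0.3, 0.4, 0.2, 1.0, 1.0), 2))  # 2.0
print(round(observed_risk_ratio(0.3, 0.4, 0.2, 0.8, 0.9), 2))  # 1.63
```

With sensitivity 0.8 and specificity 0.9, the observed ratio falls from 2.0 to about 1.63, closer to the null. This attenuation is guaranteed only in simple settings such as this one; with multilevel exposures or correlated errors the direction can differ.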

Despite these biases we recommend the collection of baseline data, when feasible, for assessing the impact of ICU exposures on long-term outcomes. We make this suggestion because baseline measurements of ICU patients may differ from those in the age- and sex-matched general population. For example, three studies of premorbid QOL in ICU survivors [47, 48, 49] found significant decrements in baseline status among the survivors vs. a matched general population. Furthermore, prior research has demonstrated that the baseline assessment of QOL by patient proxies may be reliable and valid [50]. Thus measurement of baseline data is important for more accurate assessment of the impact of exposures on long-term outcomes in ICU patients.

To further clarify the impact of using proxy or retrospective patient measurements to estimate baseline status for specific outcomes, we recommend building methodological evaluation into existing research protocols. For example, baseline measurements obtained from proxies could be compared with retrospective patient reports of baseline status for a subgroup of ICU survivors within a long-term outcomes study protocol.
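For a continuous score, agreement between proxy and retrospective patient reports of baseline status could be summarized with Bland-Altman limits of agreement (mean difference ± 1.96 SD of the paired differences). This is one commonly used approach, assumed here for illustration; the scores are hypothetical.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def bland_altman(proxy: Sequence[float], patient: Sequence[float]) -> Tuple[float, float, float]:
    """Return (mean difference, lower 95% limit, upper 95% limit) of proxy minus patient."""
    if len(proxy) != len(patient):
        raise ValueError("paired measurements required")
    diffs = [a - b for a, b in zip(proxy, patient)]
    bias = mean(diffs)  # systematic difference between proxy and patient reports
    sd = stdev(diffs)   # spread of individual disagreements
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical baseline QOL summary scores for five ICU survivors.
proxy_scores = [60, 70, 55, 80, 65]
patient_scores = [62, 68, 60, 78, 70]
bias, lower, upper = bland_altman(proxy_scores, patient_scores)
print(round(bias, 1), round(lower, 1), round(upper, 1))
```

The mean difference indicates whether proxies systematically under- or over-rate baseline status, while the width of the limits shows how far an individual proxy report may stray from the patient's own.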

Time-varying exposures and outcomes

The dynamic nature of critical illness and ICU management involves many time-varying exposures and outcomes. For example, mechanical ventilator parameters may change several times per day, and laboratory values such as serum glucose are even more variable. Such variables require frequent measurement and appropriate statistical analysis to adjust for the nonindependence of measurements [43] to accurately understand and model the underlying exposure or outcome. However, frequent measurement creates a substantial data collection burden, which must be weighed against the benefit of more complete assessment. For example, in the ARDSnet [51] trial of low tidal volume ventilation, ventilator parameters (a highly variable, primary exposure) were recorded twice daily, whereas medications (a more stable, secondary exposure) were recorded daily for 4 days and weekly thereafter. To ensure that the data collection burden is reasonable in the context of a particular study we recommend pilot testing measurement instruments and reporting the time requirements for administering new and existing instruments to provide data for the design of future studies.
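Pilot timings can be turned into a per-patient burden estimate. The sketch assumes one reading of the schedule described for the ARDSnet trial [51]: ventilator parameters twice daily on every study day (i.e., assuming continuous ventilation) and medications daily on days 1-4, then weekly on days 7, 14, 21, and so on. The minutes per record are hypothetical pilot values.

```python
from typing import Tuple


def recording_counts(days_on_study: int) -> Tuple[int, int]:
    """Count scheduled recordings under an assumed reading of the schedule."""
    ventilator = 2 * days_on_study  # twice daily, assuming ventilation on every day
    # Medications: daily on days 1-4, then (assumption) on days 7, 14, 21, ...
    medication = sum(1 for day in range(1, days_on_study + 1) if day <= 4 or day % 7 == 0)
    return ventilator, medication


def burden_minutes(days_on_study: int, min_per_vent_record: float, min_per_med_record: float) -> float:
    """Total data collection time per patient for the assumed schedule."""
    vent, med = recording_counts(days_on_study)
    return vent * min_per_vent_record + med * min_per_med_record


# Hypothetical pilot timings: 3 min per ventilator record, 10 min per medication record.
print(recording_counts(28))  # (56, 8)
print(burden_minutes(28, 3, 10))  # 248
```

Reporting such per-patient estimates alongside the instruments would let future investigators weigh measurement frequency against staff time before a study begins.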

Summary

In designing outcome studies of critically ill patients, each element of the study design (subjects, outcomes and exposures, and time) should reflect the primary research question and be measured in a manner that is valid, feasible, and comparable with prior studies. The ICU setting presents specific challenges for measuring these key elements. We present recommendations for addressing these challenges (Table 1). Only through appropriate measurement can we gain the knowledge necessary to improve the quality of care and long-term outcomes for ICU patients.

Table 1 Challenges and recommendations for measurement in critical care long-term outcomes research