The ICU is a setting where death is common, and the majority of these deaths involve decisions to withhold or withdraw life-sustaining treatments [1]. These two facts highlight the importance of addressing the quality of end-of-life care in the ICU, as well as the need to support patients and family members through this process. Unfortunately, both quality of care and support for patients and families vary markedly from hospital to hospital, influenced in large part by physician attitudes and hospital norms [1, 2]. Importantly, family members of patients who die in the ICU experience a significant burden of distress, with high levels of symptoms of anxiety, depression, and post-traumatic stress disorder [3, 4] that have long-lasting consequences. Evidence suggests that behaviors of ICU clinicians and the culture of care in the ICU can increase or decrease these symptoms [5, 6].

If we accept that ICU clinicians have an important responsibility to provide high-quality care to patients who die in the ICU and to their families, measuring the quality of this care and identifying interventions to improve it are crucial. Reliable and valid patient- and family-centered outcomes are essential to credible evaluation of end-of-life care [7]. Efforts to identify the key elements of such outcomes have explored the perspectives of patients [8, 9], families [8, 9], and professional organizations [10], generating broad consensus that such outcomes should include multiple domains, including the physical, psychological, social, spiritual, and ethical. However, creating an instrument that accurately summarizes such diverse domains—especially if the goal is to create a single score—is challenging and may in fact be fundamentally impossible.

In an article recently published in Intensive Care Medicine, Kentish-Barnes and colleagues present a new tool designed to measure family members’ experience with end-of-life care in the ICU [11]. The authors are to be congratulated for a thorough and careful development process that included an inter-professional team, review of existing studies and instruments, and input from family members and ICU clinicians. Starting with 50 items in eight domains, they iteratively tested, analyzed, and reduced the instrument to 15 items that contribute to a single score: CAESAR. They then conducted a prospective cohort study enrolling 600 consecutive family members of patients who died in one of 41 participating ICUs in France, achieving over 90% participation from these relatives, a remarkable accomplishment. They assessed CAESAR 3 weeks after patients’ deaths and later assessed family members’ symptoms of anxiety, depression, and post-traumatic stress at 3 months, as well as post-traumatic stress and prolonged grief at 6 and 12 months. Using the summary CAESAR score, divided into tertiles, they found that a lower CAESAR score was associated with increased symptoms of anxiety, depression, post-traumatic stress, and complicated grief at all subsequent time-points. These findings provide construct validation for CAESAR, showing that poorer ratings of the quality of care were associated with increased psychological symptoms. These findings also support prior studies suggesting that the care we provide in the ICU for patients and their families can have important implications for the mental health of family members for many months after a patient dies in the ICU.

The authors used exploratory factor analysis and Cronbach’s alpha to suggest that the 15-item CAESAR questionnaire can be summarized with a single summed score. This approach provides evidence that the items are correlated with one another and that the composite score, to some extent, reflects family members’ overall assessment of ICU care. However, the approach does not demonstrate that CAESAR measures a single, unidimensional construct. Convincing demonstration of the unidimensionality of these items should include a single-factor confirmatory factor analysis (CFA) model that defines the items as ordered categorical variables and shows evidence of non-significant misfit of the model to the observed data, based on the χ² test of fit. Although this test is often omitted because of its vulnerability to the influence of large sample size (significant χ² values sometimes resulting from trivial misfit when the sample is large), this possibility can be tested with Bayesian modeling [12]. Should both CFA and Bayesian models suggest significant misfit, additional modeling, aimed either at identifying unidimensional constructs measured with a subset of the items, or at identifying multi-factor models, is important until acceptable fit is demonstrated. Finally, even when a unidimensional model (using conventional CFA or Bayesian analysis) shows acceptable fit to the data, it is unclear whether a score constructed as the sum of ordered categorical variables is the most appropriate summary measure, given the impossibility of demonstrating that intervals between categories are equal. If a composite score is preferred to a latent construct, it might be better to generate a factor score that acknowledges the items’ level of measurement and accounts for the specific contribution of each item to the whole.
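The point that a high Cronbach’s alpha does not establish unidimensionality can be illustrated with a small simulation. The sketch below is not based on CAESAR data: it generates hypothetical responses to 15 five-category items that are driven by two modestly correlated latent factors, then computes alpha and the eigenvalues of the item correlation matrix. The function names, loadings, factor correlation, and category thresholds are all assumptions chosen for illustration, and the use of Pearson correlations on ordinal items is a simplification (a formal analysis would use polychoric correlations and a CFA for ordered categorical variables).

```python
import numpy as np


def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents, n_items) score matrix."""
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1)       # variance of each item
    total_var = items.sum(axis=1).var(ddof=1)   # variance of the summed score
    return k / (k - 1) * (1.0 - item_vars.sum() / total_var)


def simulate_two_factor_items(n: int = 2000, seed: int = 0) -> np.ndarray:
    """Hypothetical data: 15 ordinal items (1-5) driven by TWO latent factors."""
    rng = np.random.default_rng(seed)
    # Two latent factors with an assumed correlation of 0.3.
    factor_cov = np.array([[1.0, 0.3], [0.3, 1.0]])
    factors = rng.multivariate_normal([0.0, 0.0], factor_cov, size=n)
    # Items 1-8 load only on factor A, items 9-15 only on factor B.
    loadings = np.zeros((15, 2))
    loadings[:8, 0] = 0.8
    loadings[8:, 1] = 0.8
    noise = rng.normal(scale=0.6, size=(n, 15))
    latent = factors @ loadings.T + noise
    # Cut the continuous responses into five ordered categories (1..5).
    return np.digitize(latent, [-1.5, -0.5, 0.5, 1.5]) + 1


items = simulate_two_factor_items()
alpha = cronbach_alpha(items)
# Eigenvalues of the item correlation matrix, largest first.
eigenvalues = np.sort(np.linalg.eigvalsh(np.corrcoef(items, rowvar=False)))[::-1]

print(f"Cronbach's alpha: {alpha:.2f}")
print("Largest eigenvalues:", np.round(eigenvalues[:3], 2))
```

Under these assumptions, alpha comes out high even though the data were generated from two distinct constructs, and the second eigenvalue is well above 1, which hints at the second dimension. Internal consistency alone therefore cannot distinguish a single construct from several correlated ones.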

Why does it matter whether CAESAR measures a unidimensional construct? Imagine rating the quality of a restaurant. You might be asked to rate service, food, and atmosphere. If you have had a particularly good meal (or a particularly bad one), your ratings on these constructs are likely to correlate, but combining them into a single score may disguise important information: a low overall score could cause the chef to be fired when the real causes were inadequate air-conditioning and a rude waiter. In the same way, a total score may make it difficult to identify which interventions are effective. In addition, although we know that CAESAR correlates with family members’ psychological symptoms for at least a year after a death in the ICU, we do not yet know whether the score will improve with interventions that improve care.

These standards set a high bar for outcome measures, but we believe it is important that the bar be high so we can identify interventions that clearly improve patient and family outcomes and increase the value of the care we provide. Prior measures of constructs similar to that assessed by CAESAR, such as family experience in the ICU as measured by the Family Satisfaction in the ICU (FS-ICU) [13] or patient quality of dying as assessed by family or clinicians with the Quality of Dying and Death (QODD) [14] questionnaires, have not been shown to meet these rigorous standards. Furthermore, development of CAESAR offers the opportunity to consider direct comparisons of these available measures—an important step in identifying the best possible outcome measures.

We believe that CAESAR is an important tool that will advance the measurement of family experience with end-of-life care in the ICU. In a field where many important intervention studies have not assessed patient- and family-centered outcomes at all, often in favor of easier-to-obtain outcomes such as ICU length of stay [15], CAESAR is an exciting advance. However, we think caution is warranted about whether CAESAR is ready to serve as a primary endpoint in trials of interventions. We need additional information about the constructs it measures and whether these constructs can identify interventions that make care better.