The use of clinical process-of-care measures to assess the quality of health care has grown rapidly in the past 20 years. Process measures are commonly used for internal quality assessment and improvement activities, for external accountability, for pay-for-performance (P4P) and value-based purchasing, and for regulatory purposes.1, 2 This use is motivated by the following chain of events: 1) scientific studies—often randomized controlled trials (RCTs)—find health benefits for a specific process of care for a specific population; 2) professional consensus develops that delivering this process of care for a particular population represents good-quality care; and 3) the process of care is specified as a performance measure and used to assess, improve, and report quality of care. For example, RCTs found as early as 1981 that beta blocker use decreased mortality rates after acute myocardial infarction (AMI).3–6 The regular use of beta blockers in post-AMI patients was advocated in authoritative clinical practice guidelines a decade later,7 and the use of beta blockers was incorporated into the National Committee for Quality Assurance Health Plan Employer Data and Information Set in 1996,8 and into multiple other performance measurement systems thereafter.9, 10 Similar stories exist for other processes of care, such as the use of influenza vaccinations, colorectal cancer screening, and angiotensin-converting enzyme inhibitors for systolic heart failure. As providers have increased their delivery of these processes of care, the expectation has been that patient outcomes would improve accordingly. Yet many studies have found minimal or even no differences in clinical outcomes between high- and low-performing providers on certain process-of-care measures (where “high-performing” indicates that the provider has delivered the process of care to a high percentage of eligible patients).11–15 This has left providers and health policymakers wondering whether the focus on processes of care is misplaced.

In this paper, we discuss important challenges in analyses that attempt to relate the delivery of processes of care to changes in patient outcomes. We highlight analytical issues that should be considered when assessing the process–outcome link in practice, so that the results of such analyses can be appropriately interpreted by clinicians and health policymakers. These challenges can be grouped into four general domains: the choice of outcome, the power to detect differences in outcome, the ability to explain or control for confounding, and the stability of measure specifications over time. Acknowledging these challenges is important both for the individuals conducting the analysis and for those interpreting and disseminating the results.

What is the expected outcome of the process of care?

The first challenge in analyses that investigate the effect of a process of care on a clinical outcome is determining exactly which outcome should be used.16 Although the outcome evaluated in the clinical trial or other evidence that serves as the basis for a recommended process of care is known, this outcome is frequently not available in the data accessible to the analyst, or it may be measured with more error than in the clinical trial. For example, RCT evidence has demonstrated that aspirin reduces the combined outcome of any serious vascular event by 25 % among patients with acute or previous vascular disease.17 However, in assessing, for example, whether receipt of aspirin upon hospital arrival after an AMI has led to improved outcomes in the Medicare population, it would be very difficult to obtain information about confirmed and adjudicated serious vascular events if one were restricted to Medicare claims or other administrative data. Instead, analyses attempting to gauge the impact of receipt of aspirin have often approximated the outcome with available data, using vascular event-related readmission or death.18, 19 While these are outcomes of importance to both patients and providers, their use in process–outcome analyses requires an assumption that the expected improvement attributable to the process of care is approximately equal to that for the outcome used in the clinical trial. The validity of the analysis rests on the strength of this assumption.

What is the proximity of the outcome to the process of care?

Another issue in selecting an outcome is the proximity of the observed outcome to the delivered process of care. This is of particular concern with processes of care wherein multiple steps or long periods of time are required between the delivery of the process of care and the outcome. Preventive processes of care typically have both of these limitations. The more steps that are required, the more difficult it is to be confident that any difference in outcome is due to the delivery of the specific process of care. The longer the period of time that is needed to produce an observable effect on the outcome—thus involving a longer follow-up period—the more difficult it is to attribute differences in outcomes to care received in the distant past. This difficulty in determining an outcome that is proximate and appropriate for the process of care being evaluated means that many preventive processes of care may not be amenable to analyses aimed at evaluating associations between process measure performance and patient outcome.

What is the potential observed difference in outcome, given the observed variation in provider performance?

In a provider-level analysis (i.e., where the unit of analysis is the hospital or long-term care facility), a lack of power is often attributable to small differences in performance among providers. When there is little variation in provider performance (e.g., hospital performance on a measure clustered nationally around 80 %), the attainable difference in health outcomes is very small.24 As an example, we consider the use of beta blockers following an AMI. RCT evidence in more than 20,000 patients has demonstrated a 1.2-percentage-point absolute reduction in mortality rates in AMI patients following the use of beta blockers (Fig. 1a).25 In other words, by comparing a group where 0 % received therapy to a group where 100 % received therapy, it was estimated that receipt of beta blocker therapy decreased the probability of death by 1.2 percentage points. Now, suppose that one wanted to compare a hospital with “low” performance on the process measure assessing beta blocker therapy at discharge (e.g., a hospital whose performance rate was at the 25th percentile) vs. a hospital with “high” performance (e.g., a hospital at the 75th percentile). According to Hospital Compare, hospital performance in 2004 at the 25th percentile was 87 % (meaning 87 % of patients received beta blocker therapy at discharge), while performance at the 75th percentile was 97 %.26 Given this 10-percentage-point difference in performance and the 1.2-percentage-point expected reduction in mortality, we would expect to observe only a 0.12-percentage-point difference in mortality when comparing these two hospitals (Fig. 1b).12 Hence, the “failure” to demonstrate large differences in mortality between high- and low-performing hospitals may not be due to any lack of effect of the process of care on the outcome, but rather to insufficient variation in performance.

Fig. 1

(a) RCT evidence supporting beta blockers after an AMI demonstrated a 1.2-percentage-point absolute reduction in mortality; (b) a 10-percentage-point difference in performance between two hospitals would lead us to expect only a 0.12-percentage-point reduction in mortality.
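
To make the arithmetic behind Fig. 1b explicit, the brief sketch below (our illustration, not part of the original analysis) simply multiplies the between-hospital difference in performance by the absolute risk reduction reported in the RCT evidence.

```python
# Expected observable mortality difference between a "high"- and a "low"-
# performing hospital, using the values quoted in the text.
rct_absolute_reduction = 0.012   # 1.2-percentage-point reduction with beta blockers
performance_low = 0.87           # 25th-percentile hospital: 87 % of patients treated
performance_high = 0.97          # 75th-percentile hospital: 97 % of patients treated

expected_difference = (performance_high - performance_low) * rct_absolute_reduction
print(f"{expected_difference:.4%}")  # about 0.12 percentage points
```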

Are those not receiving the process of care a small proportion of the population?

Many processes of care now have mean performance rates exceeding 90 %, or even 95 %. When a very small proportion of the population specified by the measure is not receiving the process of care, there are two problems, one statistical and the other inferential. The statistical problem is that with a fixed total sample size, the ability to detect differences in outcomes is greatly diminished as the number of patients not receiving the process of care decreases. Table 1 illustrates this point, showing the minimum detectable percentage-point change in outcome with 80 % power, assuming a total population of 10,000 and an average outcome rate of 10 % without the process of care (e.g., 10 % mortality among those who do not receive a beta blocker prescription at hospital discharge after AMI). Table 1 shows that for a fixed total sample size, when the number of those receiving versus not receiving the process of care is more evenly split, smaller changes in the outcome can be detected; e.g., one could detect a 10.0 % versus 8.4 % difference with 5,000 patients in each group. However, when this split is highly unbalanced, only large changes in outcome are detectable; e.g., one could detect only a 10.0 % vs. 3.2 % difference in mortality with 9,900 patients in one group and 100 in the other. This makes it difficult to detect small effect sizes equivalent in magnitude to those generally seen in RCTs, which often report absolute risk reductions of less than 5 %.17, 20–22

Table 1 Minimum detectable effect sizes, assuming an average outcome rate of 10 % without the process of care and a total sample size of 10,000
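
As a rough illustration of how figures like those in Table 1 can be derived, the sketch below computes the smallest detectable outcome rate for a two-sided two-proportion test at α = 0.05 with 80 % power. The specific test and the statsmodels-based implementation are our assumptions for illustration, not a description of the calculation behind Table 1.

```python
# Minimum detectable outcome rate among patients receiving the process of care,
# given a 10 % rate among those not receiving it (cf. the Table 1 scenario).
from scipy.optimize import brentq
from statsmodels.stats.power import NormalIndPower
from statsmodels.stats.proportion import proportion_effectsize

def min_detectable_rate(n_without, n_with, p_without=0.10,
                        alpha=0.05, power=0.80):
    """Smallest outcome rate in the group receiving the process of care that is
    distinguishable from p_without with the requested power."""
    solver = NormalIndPower()

    def power_gap(p_with):
        effect = proportion_effectsize(p_without, p_with)  # Cohen's h
        achieved = solver.power(effect_size=effect, nobs1=n_without,
                                alpha=alpha, ratio=n_with / n_without)
        return achieved - power

    # Power falls toward alpha as p_with approaches p_without, so the root of
    # power_gap is the detection threshold closest to the baseline rate.
    return brentq(power_gap, 1e-4, p_without - 1e-4)

print(min_detectable_rate(5_000, 5_000))  # ~0.084: a 10.0 % vs. 8.4 % difference
print(min_detectable_rate(100, 9_900))    # ~0.032: a 10.0 % vs. 3.2 % difference
```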

The inferential problem arises if the small fraction of individuals who do not receive the process of care differ in important ways from those who do. With any process measure, it is improbable that the measure will be applicable to the entire target population. There will always be some subset of individuals with characteristics, such as rare contraindications or prior rare side effects, that make the process of care inappropriate for them but that fall outside the measure’s exclusion criteria. Supporting this conclusion is the experience of the UK National Health Service Quality and Outcomes Framework (QOF), a P4P program that allows general practitioners (GPs) to exclude from measurement any patient for whom the GP believes the process of care should not be applied (“exception reporting”). A recent UK study examining exception reporting in the QOF program found that a median of 5.3 % of patients were excluded from quality calculations.23 If such patients are similarly prevalent in the United States (where exception reporting is not in place), they likely comprise the majority of patients not receiving “topped out” processes of care, i.e., those for which 90–95 % of patients nationally are receiving the process of care. When this occurs, it is at least as likely that any differences in outcomes between patients who do and do not receive the process of care result from differences in the patients being compared rather than from receipt of the process of care.

Is the analysis robust to potential unmeasured confounding?

Another potential problem inherent in all observational studies is the possibility of unmeasured confounding. When this occurs, the observed relationship between the process of care and the outcome may be biased. For example, in an observational study examining the effect of beta blockers in a population-based cohort of elderly patients with heart failure, patients were less likely to be prescribed a beta blocker if they were older or had several comorbid conditions.27 Such patients are also at a higher risk of death and hospital readmission. Thus, if information concerning comorbid conditions is not available, the estimated effect of receiving the beta blockers may be overly optimistic, as it does not account for the fact that patients with these conditions are less likely to receive the process of care and more likely to experience negative outcomes.

While it is impossible to ensure that there are no unmeasured factors influencing the relationship between the process of care and the outcome, it is possible to quantitatively assess the robustness of the analysis to such unmeasured confounders.21, 27–30 To illustrate, we simulated a study investigating the association between beta blocker prescription at discharge after an AMI and a negative outcome (the details of which are in the online appendix). As shown in Table 2, confounders with relatively modest associations with receipt of the process of care and with the outcome (odds ratios of 1.3–1.5) can increase the probability of finding an association when none exists to as high as 1 in 10 or even 1 in 8, or they can increase the chance of not finding an association when one does exist to nearly 50 %. Thus, an important component in conducting these analyses should be to consider potential unmeasured confounders not simply by mentioning them as possible limitations, but by quantifying their potential impact. While clinical expertise is needed to identify potential confounding factors, statistical simulations such as this can provide estimates of the sensitivity of the results to the specified unmeasured confounders.28–30

Table 2 Simulation assessing the percentage of patients prescribed beta blockers at discharge after an AMI: sensitivity of type I and type II (power) error rates to an unmeasured confounder
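
The details of the simulation summarized in Table 2 are in the online appendix. As a loose, self-contained sketch of the same idea, the example below simulates a binary unmeasured confounder that both lowers the odds of receiving a beta blocker prescription and raises the odds of the negative outcome, then records how often a model that omits the confounder “finds” a treatment effect when none truly exists. All parameter values are arbitrary assumptions for illustration; the resulting error rate depends on them and will not reproduce the figures in Table 2.

```python
# Sketch: type I error inflation from an unmeasured confounder (illustrative only).
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

def type1_error_rate(n_patients=10_000, n_sims=1_000, p_conf=0.3,
                     or_conf_treat=1.4, or_conf_outcome=1.4,
                     p_treat_base=0.85, p_outcome_base=0.10):
    """Fraction of simulated studies in which the beta blocker 'effect' is
    significant (p < 0.05) even though its true effect on the outcome is zero."""
    false_positives = 0
    for _ in range(n_sims):
        u = rng.binomial(1, p_conf, n_patients)  # unmeasured confounder
        # U lowers the odds of receiving the process of care ...
        logit_t = np.log(p_treat_base / (1 - p_treat_base)) - np.log(or_conf_treat) * u
        treat = rng.binomial(1, 1 / (1 + np.exp(-logit_t)))
        # ... and raises the odds of the negative outcome; treatment itself has no effect.
        logit_y = np.log(p_outcome_base / (1 - p_outcome_base)) + np.log(or_conf_outcome) * u
        outcome = rng.binomial(1, 1 / (1 + np.exp(-logit_y)))
        # The analyst's model omits U, so the treatment coefficient absorbs the confounding.
        fit = sm.Logit(outcome, sm.add_constant(treat.astype(float))).fit(disp=False)
        if fit.pvalues[1] < 0.05:
            false_positives += 1
    return false_positives / n_sims

print(type1_error_rate())  # typically exceeds the nominal 5 % under these assumptions
```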

What are the potential analytical impacts of measure specification changes?

The ability to examine the association between changes in measure performance over time and outcomes can be hindered by changes in measure specifications. Changes to both the numerator and denominator of a measure often occur once a measure is in use. While these changes generally lead to better-specified measures, they make it very difficult to analyze changes in performance and associated changes in outcomes across time. For example, one of the Hospital Inpatient Quality Reporting Program process-of-care measures has had eight specification changes since January 2006.31 Following one of these changes, which added further detail on patients who should be excluded from eligibility, the number of eligible patients decreased dramatically, by 4,000 patients per month, and mortality rates increased among those receiving the process of care (from 26 % to 30 %). This indicates that the change likely removed from eligibility patients at lower risk of mortality. As such, any comparison between processes and outcomes over time for this measure would need to consider whether this change in specification may have influenced any observed relationship.

Implications for researchers and policymakers

Despite a strong desire to test for links between processes of care and outcomes in practice, such analyses need to be carefully considered. The clinical and analytic issues involved in conducting such studies are substantial, and lack of attention to the issues described in this paper will increase the chance of producing a result that may mislead providers and policymakers regarding associations, or lack thereof, between recommended processes of care and outcomes.

Just as researchers contemplating a clinical trial of a new therapy must consider beforehand whether their patient selection criteria are optimal, whether they are appropriately measuring the relevant outcome, and whether they have enough power to detect a clinically important effect, individuals assessing the relationship between a process of care and an outcome or interpreting the results of such studies should carefully consider the questions outlined in Fig. 2. Specifically, researchers should ensure that study designs and analyses are robust to these potential issues and that analysis results are appropriately presented in the context of the study limitations. Similarly, policymakers interpreting these results should review the analyses to determine whether the challenges discussed here raise questions regarding the validity of the findings.

Fig. 2

Questions to address before undertaking or when interpreting results from a process–outcome analysis

Observational studies of associations between processes of care and relevant short-term outcomes are an important component in the evaluation of policies that encourage better health care outcomes. However, the interpretation of study results must consider the strengths and weaknesses of the analysis as well as the strength of the existing evidence base supporting the process–outcome relationship. A single observational study that does not find a process–outcome relationship is unlikely to constitute stronger evidence than a series of RCTs that did.